Renesas offers several different processor architectures, including proprietary architectures unique to Renesas and others based on Arm. One area Renesas has consistently focused on to reduce latency is optimizing the pipelining within its microcontrollers. Using the RX family as an example, the design team created a pipelining methodology that combines RISC-style five-stage pipelining with the power and flexibility of variable-length CISC instructions. But without understanding how pipelining works in general, it may be hard to see how this affects performance.
Pipelining is one of the most fundamental principles of microprocessors, and of processors in general. Before an instruction takes effect, it passes through a series of steps called a pipeline. Pipelines can be built in many different ways, but one of the simplest is the classic five-stage RISC pipeline. For this example, we will walk through the pipeline shown in Figure 1 (courtesy of the Wikimedia Commons project).
The first thing that needs to happen is for the processor to get an instruction. For this, the processor keeps a special register called the program counter (PC), which holds the address of the next instruction to execute; you can think of it as stepping through the list of addresses of all the commands the processor needs to run. In the Instruction Fetch (IF) stage, the instruction at the address in the PC is read from a fast local memory called the instruction cache. Once the instruction has been fetched, it is handed to the processor's Control Unit (CU). The control unit will break the instruction down into commands based on the values in dedicated control registers. For example, assume you have a Renesas RX62 and are able to look into the control register. You would see something that is fundamentally like Figure 2:
The control unit, however, would read this as “HALT” and would stop all functions.
When the CU reads the fetched instruction, the processor enters what is called the Instruction Decode (ID) stage. While the CU is decoding the instruction, it sends out various control signals. These signals carry information such as which register addresses to read, whether or not the arithmetic logic unit (ALU) will be used, and many other things. Once the CU has prepared everything in the processor, the data is sent down the line with all the necessary units activated. This is called the Execution (EX) stage, and it is where all of the actual number crunching happens. Depending on the type of instruction being executed, it may then be necessary to access memory in the Memory Access (MEM) stage.
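The decode step can be sketched the same way. This is a hypothetical decoder, not the actual RX encoding: it splits a text-form instruction into an opcode and operands, and raises simple control signals of the kind described above (ALU needed, memory access needed, halt).

```python
def decode(instruction):
    """Instruction Decode (ID): split an instruction into opcode, operands,
    and a dictionary of control signals (all mnemonics are illustrative)."""
    opcode, _, rest = instruction.partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest else []
    signals = {
        "alu_enable": opcode in ("ADD", "SUB"),   # EX stage will use the ALU
        "mem_access": opcode in ("LOAD", "STORE"),  # MEM stage will touch memory
        "halt": opcode == "HALT",                  # stop all functions
    }
    return opcode, operands, signals


opcode, operands, signals = decode("ADD R1, R2")
print(opcode, operands, signals["alu_enable"])  # ADD ['R1', 'R2'] True
```

A hardware decoder does this with combinational logic on fixed bit fields rather than string handling, but the output is analogous: an operation plus the signals that steer the rest of the pipeline.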
Once the task is finished, the result is stored back into a register so it can be used by a later instruction (if the result is needed). This last phase is called the Write Back (WB) stage, and it is what allows the result to be ready for the next instruction. These five stages are executed one after another for each instruction, but that does not mean the processor can only work on one instruction at a time. The stages can run concurrently: as soon as a stage finishes with one instruction, it starts on the next one in the next clock cycle, as shown in Figure 3.
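The overlap described above can be sketched with a few lines of Python. In an ideal five-stage pipeline with no stalls (an assumption; real pipelines stall on hazards), instruction i occupies stage s at cycle i + s, so a new instruction enters the pipeline every cycle once it is full.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instruction, cycle):
    """Return the stage instruction `instruction` occupies at `cycle`,
    or None if it is not in the pipeline (assumes no stalls)."""
    s = cycle - instruction
    return STAGES[s] if 0 <= s < len(STAGES) else None


# Print a pipeline diagram for 4 instructions over 8 cycles.
for i in range(4):
    row = [(stage_at(i, c) or "--") for c in range(8)]
    print(f"I{i}: " + " ".join(f"{st:>3}" for st in row))
```

Reading the printed diagram row by row shows each instruction shifted one cycle to the right of the previous one, which is exactly the staircase pattern typically drawn in pipeline figures like Figure 3.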
Although simplified, this example gives a brief introduction to the stages common to most processors. As discussed with the RX family, however, many processors add extra paths to accelerate execution and add a level of resiliency, as well as more advanced schemes such as hyper-threading. Still, this should serve as a starting model that illustrates the basic principles.