Introduction to Pipelining in Processor Design

We are making this series on the pipelined core at the request of some users who asked us for a tutorial on a RISC-V pipelined core, like the one we provided for the RISC-V single-cycle core. In this series we are going to cover all the basic concepts: what pipelining is, how we can convert a simple single-cycle core into a pipelined version of it, what hazards we are going to face when we do that, and how we are going to provide the solutions. This is the first video of the series, and we will keep adding videos as we go further. So without any further delay, let's start with our first video.

In this video we will just cover the basic overview of what pipelining is and how we can convert a simple single-cycle core into a pipelined core. So let's start with the overview. The basic concept of pipelining is that you have some tasks which are happening in a pipeline.

Before digging deeper into the pipeline, let's go over the basics of the single-cycle core. How does the single-cycle core work? As the name suggests, all the operations of an instruction are performed in a single cycle. We have already covered what a clock cycle is and how it is generated, so I'm not going into those topics. Basically, when every instruction executes in a single cycle, that is what we call a single-cycle core. As we know, while designing the single-cycle core, the clock period is set by the load word instruction, because load word is the only instruction that uses all the hardware in the single-cycle core. We covered this in our previous single-cycle core series, so you can review that for the details.

So, single cycle. Let's say that what we designed, the single
cycle core, was a very basic architecture. But when we move toward more complex designs, like multithreaded designs, superscalar, and other processors with higher performance, executing every instruction in a single cycle means the clock period is set by the worst-case instruction, the one that uses all the hardware in the processor.

While designing any processor, we have to keep in mind the three main factors, PPA: power, performance, and area. You definitely never want a processor with poor performance. Suppose I say that you have designed a very optimized processor, but it can only run at a speed of 10 megahertz. That is not what we want to achieve. Technology and innovation have been growing rapidly, so the speed of processors is also increasing rapidly. That is why we come to the main concept of pipelining: what pipelining does is break the task into smaller chunks. By dividing our hardware into small chunks, where each operation takes only a short amount of time, we increase the performance of our processor. This is how we perform pipelining.

As you can see here, we have shown a small demonstration. Let's suppose that in the single-cycle processor one whole instruction executes from time zero to, let's say, 650 picoseconds. So a single instruction takes 650 picoseconds, and then the second instruction starts its execution, which runs from 650 picoseconds to 1300 picoseconds. So you can see that we have essentially wasted the
time frame of 650 picoseconds by executing just a single instruction in it.

What we do in pipelining, as the lower example illustrates, is execute multiple instructions in that time frame. If a single instruction takes 650 picoseconds to execute, then within that frame we are also performing some operations of the second instruction and some operations of the third instruction as well. This is how we increase our performance.

Let me give a daily-life example of pipelining. Let's suppose we have a washing machine with three features: washing, drying, and spinning. Suppose I tell you that you can use the washing machine for only a single lot of your laundry at a time: it will wash it, spin it, and dry it. The whole operation will definitely take longer, let's say 10 to 15 minutes. Now what if I say, let's distribute each task to a specific machine: a washer that only does the washing, a separate spinner, and a separate dryer. Then you can use all three of them in parallel, so you can work on three laundry sets simultaneously rather than working on just the first laundry set. This is how we obtain the benefit of pipelining. I will also explain this in easier language, in Urdu.

So this is the basic overview of pipelining: what pipelining means and how it affects the performance of the processor.

This is the basic block diagram. As you can see, this is how we are going to convert the simple single-cycle processor into a pipelined processor. Pipelining means that we are adding some
registers into our datapath. For the basic concept of registers, again, you can visit our Verilog tutorials, in which we covered registers in detail, so you can review that. Registers are just for saving something into them. When you break a task into multiple tasks, you have to save the result of the previous task so it can be used by the next task. As in the washing machine example, once you have washed the clothes, you have to keep those clothes so they can move forward to the dryer. What if I take a new lot and put it in the dryer? That is not going to work, because I haven't washed those clothes. The same is done in computer architecture: we have to save the previous result in some type of register so that we can use it in the next cycle, while the previous hardware works on the new data we provide to it.

So what we do is break the hardware into chunks. We put the mux, the program counter, the instruction memory, and the PC+4 adder into one part, and we introduce a register which we call the instruction fetch register. We call this the fetch stage. Then we make a chunk of the register file and the extender, and include the next register, which is called the decode register. We call this chunk the decode part of the processor. Then we group three pieces of hardware, the mux, the adder, and the ALU, into a chunk and add a register after it, which we call the execute register. We call this part the execute stage. Then we break the data memory into a separate stage and add a register, which we call the data memory register, and we call
this part the memory stage. Last but not least, we separate our write-back mux, and we call it the write-back stage of the pipeline.

So this is how we convert our single-cycle core, which had no registers in between the stages: we insert four registers and break it into five stages. This is how we convert a single-cycle core into a five-stage pipelined core of the RISC-V architecture. The first stage is called the fetch stage, the second the decode stage, the third the execute stage, the fourth the memory stage, and the fifth the write-back stage.

Now you can visualize that initially, when I fetch the first instruction, only the fetch hardware is operational. There is nothing in the decode stage, nothing in the execute stage, nothing in the memory stage, and nothing in the write-back stage. But as I move forward, my first instruction moves to the decode stage, and the fetch hardware is empty. So what can I do? I can fetch the next instruction and start working on it. So the first instruction is in the decode stage while the new instruction is in the fetch stage. In the next cycle, the first instruction moves to the execute stage, the second instruction comes to the decode stage, and I fetch the third instruction. Now you can see that at this time I am working on three instructions simultaneously: one is being executed, one is being decoded, and the third is being fetched.

As we move forward, the first instruction, if it is a load instruction, definitely goes to the memory stage, the second instruction moves to the execute stage, the third instruction moves to the decode stage, and I fetch the fourth instruction. Moving forward one more cycle, my first instruction moves to the write-back stage, the second instruction comes to the memory stage, the third instruction comes to the
execute stage, the fourth instruction comes to the decode stage, and the fifth instruction is being fetched.

So now you can see that initially I had just one instruction in the whole pipeline, but as I move forward, increasing the clock cycles one by one, I now have five instructions in my pipeline. I am working on five instructions simultaneously: one instruction is being written back, one is in the memory stage, one is in the execute stage, one is being decoded, and one is being fetched. This is how we implement the pipelining concept in our processors: we have distributed the work into five stages which operate simultaneously on different instructions.

This will be the whole datapath we will be designing in this series. The previous diagram I showed was without the control signals. We definitely have to pipeline the control signals as well, because we have to make sure that the correct control signals are used for the correct operation at the correct time for each instruction. That is why we are also pipelining the control signals, stage by stage, as required by each particular stage. We are going to go into the details of this step by step in future videos.

So this is the basic overview, and this is how the pipelined datapath will look at the end of the implementation. That is all for today. I hope this makes the basic concept of pipelining clear. Stay tuned for the future videos. Thank you very much.
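
The cycle-by-cycle walkthrough of the five stages can be sketched in a few lines of Python. This is only an illustration under assumptions: it models an ideal pipeline with no hazards or stalls, where one new instruction is fetched every clock cycle, and the names `STAGES`, `pipeline_schedule`, and `format_schedule` are hypothetical helpers, not code from the series.

```python
# Stage names follow the five stages described in the walkthrough.
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per clock cycle: stage name -> instruction number or None.

    Assumption: an ideal pipeline (no hazards, no stalls), so instruction k
    enters Fetch in cycle k and advances one stage per cycle.
    """
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            # The instruction in this stage entered Fetch `depth` cycles ago.
            instr = cycle - depth + 1
            row[stage] = instr if 1 <= instr <= num_instructions else None
        schedule.append(row)
    return schedule


def format_schedule(schedule):
    """Render the schedule as one text line per clock cycle."""
    lines = []
    for cycle, row in enumerate(schedule, start=1):
        cells = ", ".join(
            f"{stage}=I{instr}" if instr is not None else f"{stage}=-"
            for stage, instr in row.items()
        )
        lines.append(f"cycle {cycle}: {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Five instructions: the pipeline is completely full in cycle 5.
    print(format_schedule(pipeline_schedule(5)))
```

Running it for five instructions shows the same picture as the walkthrough: only the fetch stage is busy in the first cycle, and by the fifth cycle every stage holds a different instruction.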
