Hello everybody, welcome to the course VLSI Design Flow: RTL to GDS. This is the third lecture. In this lecture and the next five lectures we will take an overview of the VLSI design flow, and in the remaining lectures of the course we will look at the various design tasks in more detail. We take this approach of first surveying the flow so that later, when we study each design task closely, we can appreciate the linkages between one design task and another, understand the impact of one task on the other, and see how optimizing a given task can benefit the tasks further down the flow. Taking an overview first and then going into the details of each task helps us understand the entire design flow in depth. In this lecture we will cover the following topics: an overview of how the design flow is structured, the important concept of abstraction, pre-RTL methodologies, and hardware-software partitioning.

Let us first look at the top-level problem we are trying to solve in the VLSI design flow. We start with an idea for a system that does some useful work for us, for example some processing or computation, or perhaps controlling a robot. That idea is the starting point of chip design. At the end of the flow we want a chip that delivers that task, can be manufactured in bulk, and whose fabrication makes the whole venture profitable for the organization undertaking it. This is a complicated problem; it can be approached in many ways, some of which lead to a profitable chip design and some of which fail. To tackle this complicated problem we follow a divide-and-conquer strategy: we break the entire chip design and manufacturing task into multiple steps and carry out those steps sequentially.

Let us look, at a broad level, at what those steps are. In going from an idea to its implementation as a chip, the design goes through many transformations; nevertheless we can identify a few milestones that every design passes through. The first milestone is the RTL, a design representation at the register-transfer level, typically written in Verilog or VHDL. The design then goes through the rest of the design process and we finally get a layout, represented in the GDS (Graphic Database System) format. So from the idea to the chip we have identified two important milestones, the RTL and the GDS, and based on these milestones we can divide the entire flow into three parts. The first part goes from the idea to the RTL: we take a high-level concept of a product and represent the hardware portion of that idea as an RTL description. We can call this the idea-to-RTL flow; it is also known as
system-level design. Once we have an RTL, the next part is the RTL-to-GDS flow: we take the RTL through the various stages of logical and physical design (we will see what logical and physical design mean later in this course), and at the end we get a layout represented in GDS format. Once we have the layout it goes to the foundry for fabrication; this is the GDS-to-chip part of the flow. The foundry takes the GDS, prepares masks for it, and fabricates, tests, and packages the chips, which then go to the market. In today's lecture we will look primarily at the first part, the idea-to-RTL flow. In later lectures we will take an overview of the RTL-to-GDS flow and then look briefly at the GDS-to-chip processes. The main part of this course, however, is the RTL-to-GDS flow, where a lot of effort is needed not only in implementing the design but also in verifying it.

We have now taken a top-level view of what the VLSI design flow is and how an idea is converted into a chip. Next let us look at an important concept known as abstraction and how it relates to the VLSI design flow.

What is abstraction? Abstraction is basically hiding lower-level details in a description. As the design moves through the VLSI design flow, details keep getting added to it, and as a result the level of abstraction decreases: abstraction means hiding lower-level details and keeping only what is needed at the top level. If we look at the three parts of the flow we just discussed, in the idea-to-RTL flow the abstraction is very high and we represent the design in terms of system and behaviour. As the design moves through the stages of the RTL-to-GDS flow the abstraction keeps decreasing, and the design is transformed from RTL to logic gates, then to transistors, and finally to the layout. Once we have the layout there is no abstraction left: all the details required for fabrication are present, the actual implementation starts, and in the fabrication step the design is represented by masks and then by the final integrated circuit.

Why do we abstract out information in the VLSI design flow, especially at the earlier stages? To understand that, we should first see that for any design task there are two important considerations. The first is optimization: choosing the right combination of design parameters to obtain the desired quality of results (QoR), trading off some QoR measures against others; in other words, finding suitable parameter values so that we get better figures of merit. The second is turnaround time: the time taken to make changes in a design. Ideally we want to make these changes as quickly as possible, because we want to finish the design of the
chip as soon as possible, so that the chip can be fabricated, reach the market, and succeed against the competition there. The sooner we bring a product out, the more likely it is to succeed, and therefore turnaround time is very important, especially in the semiconductor industry.

What benefits does abstraction give us? At a higher level of abstraction the design contains fewer details, so a large number of candidate solutions can be analyzed in a small amount of time. When we remove details, only the parts of the design that really matter at that level, such as the functionality, remain visible, and only a few implementation details are available. Keeping only a few details allows us to explore the design space more easily, both in terms of tool runtime and in terms of human designer effort: with fewer entities to consider, a designer can focus on them, optimize them manually or give hints to the tool, and obtain a better QoR. As a result, the outcome of optimization at a higher level of abstraction is expected to be better. The scope of optimization is therefore very high at the top level, because with fewer details we can try out many options and choose the best among them, and in system-level design, the idea-to-RTL part of the flow, the turnaround time is also low, because there are fewer details to add or change. In the RTL-to-GDS flow, as the design moves forward more details are added, making changes becomes more difficult, and the scope of optimization decreases because we can try out fewer options. In the last part, fabrication, there is no optimization at all; only corrections are made if required, and those are very costly.

To illustrate the various levels of abstraction, let us take an example. Consider a functionality represented in two ways. The first is a logic formula, one level of abstraction, for example f = (a + b)′, the NOR of a and b. The same function can also be represented as a layout in terms of standard cells: a standard cell delivering the NOR function is laid out, placed, and its connections are drawn. These are two different levels of abstraction. Which of the two representations has greater abstraction? Since the logic formula carries fewer details (it only defines the functionality, and the details of how it is implemented as a layout are missing), the level of abstraction is higher in the first representation.

The turnaround time is also smaller in the first representation. Why? Suppose we want to rewrite f = (a + b)′ as a′ · b′ using De Morgan's law. This kind of transformation can be done very easily at the logic-formula level. But if we want to make a similar change in the layout, we might need to replace the NOR gate with, say, an AND gate plus an inverter, and then place these gates and redo the connections on the layout. So the turnaround time required to make a change in the layout representation (representation B) is much higher than in the formula representation (representation A).
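As a quick sanity check of the transformation just mentioned (a small sketch of my own, not part of the lecture), an exhaustive truth-table comparison confirms that the two formulas are equivalent, which is exactly the kind of change that costs almost nothing at the formula level:

```python
# Exhaustive check that f = (a + b)' equals a'.b' (De Morgan's law).
# At the logic-formula level this "change" is a one-line rewrite; the
# equivalent change in a layout would mean swapping and re-wiring cells.
from itertools import product

for a, b in product([False, True], repeat=2):
    nor_form = not (a or b)              # f = (a + b)'
    demorgan_form = (not a) and (not b)  # f = a'.b'
    assert nor_form == demorgan_form

print("f = (a + b)' is equivalent to a'.b' for all input combinations")
```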
The next question is where the accuracy in evaluating different options will be greater. In the logic formula the implementation details are missing: we do not know how the NOR gate is implemented, so we cannot compute its delay, area, and other metrics. With the layout, however, we can estimate the area, delay, and other figures of merit much more accurately, so greater accuracy in evaluation is obtained in representation B. What we see, then, is that as the design moves from a higher abstraction level to a lower one, details are added; the scope of optimization reduces, but the estimates of the figures of merit become more accurate.

Now let us look at pre-RTL methodologies. Pre-RTL methodologies are the design tasks that we undertake before obtaining an RTL. In this part of the VLSI design flow we decide on the various components that will be required for the system; these components may be hardware or software, and we also decide how the components interact with each other. This design step is therefore also known as system-level design. When we have an idea and want to turn it into a product, the first thing we do is evaluate whether the idea is worthwhile: we check the market requirement, analyze the financial viability, and examine the technical feasibility. If all of these are positive, we move to the next step, which is creating a specification for the product. In the specification we decide on the features we need and on the PPA (power, performance, area) or other figures of merit relevant for the product, for example at what clock frequency the chip should work, how much power dissipation is allowed, and what the cost should be. We also decide the schedule for making the product, which relates to what is known as time to market. Time to market is the time it takes for an idea to be implemented and for the product to come out to the market; ideally we want to keep it as small as possible. Once we have the specification and the features that need to be implemented, the next step is hardware-software partitioning, in which we identify the various components that will be required for the system.
At a broad level these components can be either hardware or software, and we need to decide which component is implemented in software and which in hardware; this decision is hardware-software partitioning. Once we have decided which portion of the specification will be implemented as hardware, that portion goes through the IC design and manufacturing phase, in which we build the hardware. In a similar manner, the software portion of the specification goes through the software development process, whose final results are the executables, firmware, device drivers, apps, and so on. There will also be some components that already exist and can be directly reused at the system level. When all of the hardware, software, and reused components are ready, we perform system-level integration: we combine all these components, do validation and testing, and at the end the product comes out. This is a very top-level view of system-level design.

We will now look at a few of these system-level design steps in more detail. It is worth pointing out that system-level design varies depending on the type of product we are designing or the type of functionality we are implementing, for example whether it is a processor, a signal-processing chip, or some other kind of controller or design; the exact design tasks undertaken in system-level design vary accordingly. However, let us look at a few design tasks that are common to, or widely used in, system-level design.

Let us start with hardware-software partitioning, which is an integral part of system-level design. What is hardware-software partitioning, and why do we want to do it? The motivation is that we want to exploit the merits of both hardware and software. Hardware has merits and demerits, and so does software. When we combine hardware and software into a system, we want the good qualities of hardware to be utilized in the system and the good qualities of software to be utilized as well, so that we get a product of excellent quality. What are the good things about hardware and about software? The good thing about hardware is that it can deliver very high performance, meaning very high speed. It can do so because hardware can be built with parallel circuits, all working concurrently, so a given task can be completed much more quickly. Software, on the other hand, runs on a processor; assuming the processor operates sequentially, taking one instruction at a time, processing it, and producing the result, the software executes in a sequential manner. In hardware a lot of parallelization can be done, and that is its biggest merit: it can deliver very high
performance. Now what are the good things about software? The good thing about software is that its development is much easier, because it is less complicated than designing and building hardware, and the risk involved is much lower: if we find an error in software we can simply debug it, change the code, recompile, and we are done, but if there is a problem in the hardware we may need to respin it, redesign the hardware, change the masks, and so on, which is costly and time-consuming. Another good thing about software is that a lot of customization can be done: we can write code that does many things based on the requirement, and it is easy to customize, whereas for hardware we need hardware hooks and similar mechanisms, which are more difficult to implement. The development time of software is also much shorter than that of hardware, because hardware design is a more complex task, as we will see in the subsequent lectures of this course.

To understand how hardware-software partitioning is done, let us take a look at a typical system in which hardware and software coexist, as shown in the figure. In this system there is a general-purpose microprocessor on which the software part runs: the processor executes one instruction after another, producing results that are used again, and so on, so the software runs sequentially on the general-purpose microprocessor. There is also dedicated hardware that performs some task; this is the hardware we can implement in the system, and it is also known as a hardware accelerator, since it can produce its result very fast. In this figure only one dedicated hardware block is shown, but there can be multiple dedicated hardware blocks, all possibly working in parallel, and as a result the whole system delivers results very fast. There can also be memory, and all of these components interact and share information over some bus. Here a simple bus is shown, but in a system realized on a chip the interconnections can be much more complicated; the figure is just for illustration, showing a system in which hardware and software work together. The hardware usually runs as parallel circuits and can have very good PPA, which is why such blocks are known as hardware accelerators; they can be implemented in various design styles, such as full-custom ICs, ASICs, or FPGAs. The software usually runs sequentially on the general-purpose processor. The hardware-software partitioning step tries to map each piece of functionality either to the software part running on the general-purpose microprocessor, or to the hardware part, possibly consisting of multiple
such dedicated hardware blocks of different kinds hooked onto the system.

Let us look at an example of how partitioning can be done. Consider a video-compression algorithm. We can divide it into two parts: the first part computes the discrete cosine transform (DCT), and for the time being assume that this computation is done many, many times and is the bottleneck of the algorithm, say 80 percent of the total runtime goes into computing the DCT. The rest of the algorithm, such as frame handling and other computations, takes the remaining time. How do we do a hardware-software partition in this case? The DCT part can be computed on a dedicated hardware accelerator built from parallel circuits; that circuit can be optimized so that its performance is, say, a thousand times faster than software doing a similar computation on a general-purpose processor, and it can be several orders of magnitude more energy efficient as well. Other QoR measures, such as energy efficiency or the power required for the DCT computation, can also be optimized in the hardware we design. The other part, frame handling and the remaining computations, can be delegated to software running on the general-purpose processor in the same system. Running that part in software gives a lot of flexibility: if the frame-handling algorithm needs to be changed later, the change can be made purely through software modifications, with no change to the hardware. This example shows how, given an algorithm, we can partition it into a hardware part and a software part: we typically choose the parts that are critical, the bottlenecks at the system level, for implementation in hardware, and the parts where we need flexibility or expect changes for implementation in software.
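To get a feel for how much such a partition helps overall, here is a small back-of-the-envelope sketch of my own (not from the lecture) applying Amdahl's-law-style reasoning to the numbers assumed above: 80 percent of the runtime in the DCT and a hypothetical thousand-fold hardware speedup for that portion.

```python
# Rough estimate of overall speedup when only the DCT portion is accelerated.
# The 0.8 fraction and the 1000x accelerator speedup are the illustrative
# numbers assumed in the lecture example, not measured values.

def overall_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Amdahl's-law estimate: the non-accelerated fraction still runs at 1x."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)

print(overall_speedup(0.8, 1000))   # ~4.98x: limited by the 20% left in software
print(overall_speedup(0.8, 10))     # ~3.57x: even a modest accelerator helps
```

The point is that even a very fast accelerator yields an overall gain bounded by the fraction of work left in software, which is exactly why partitioning decisions focus on the dominant bottleneck.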
Hardware-software partitioning can be done in many ways. For illustration, let us look at one way of doing it, just to show what challenges arise and what decisions need to be made during the partitioning step. Assume that the objective is to find a minimum set of functions that needs to be implemented in hardware to achieve a desired performance. We are given an algorithm that is initially implemented entirely in software, and out of all the functions implemented in software we want to move a subset to hardware; what that subset should be is what we need to find out.

The flowchart shows how such a partitioning can be done. We maintain two sets, S and H: S contains all the functions implemented in software, and H contains all the functions implemented in hardware. Initially all the functions of the algorithm are in S, H is the empty set, and everything is implemented in software. We are also given an acceptable performance P, meaning the performance should not be worse than P; what we want from partitioning is that the performance rises above this threshold. There is also a parameter n, which decides the maximum number of functions that can be moved to hardware; we will see its use shortly. So the input is the set of functions of the given algorithm, all initially in software, and the desired output of partitioning is the set H of functions to be implemented in hardware. Initially H is empty, but as the algorithm progresses H gets populated, and it should end up holding a minimal set of functions such that the performance goes above the given threshold.

Now let us see how this is achieved. During partitioning we need to measure the performance; in the flowchart this task is shown as the function Evaluate(H, S). When the procedure starts, it first measures the current speed, that is, the performance of the current partition, where initially H is empty and S contains all the functions. If the speed is already acceptable, that is, greater than P, then we need not do any partitioning: everything can stay in software and we already have the required partition. If the performance is below the threshold P, we perform a step called Profile(H, S), which measures the frequency and duration of the function calls. With all the functions of the algorithm implemented in software, profiling means running the given test cases under a profiler, a tool that measures how much runtime each function takes in a given executable. So initially everything is in software and we profile the code to find out which functions take the most time. For example, if there are four functions F1, F2, F3, and F4, the profiler might report that F1 takes 10 percent of the time, F2 takes 70 percent, F3 takes 10 percent, and F4 takes 10 percent.
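As a concrete illustration of this profiling step (a minimal sketch, not part of the lecture), here is how Python's standard cProfile module could be used to see where the runtime of a toy workload goes; the functions f1 to f4 are hypothetical stand-ins for the F1 to F4 above.

```python
# Minimal profiling sketch using Python's built-in cProfile.
# f1..f4 are hypothetical placeholder functions standing in for the
# F1..F4 of the lecture example; the cumulative-time column of the report
# plays the role of the 10%/70%/10%/10% breakdown discussed above.
import cProfile
import pstats

def f1(): sum(i * i for i in range(10_000))
def f2(): sum(i * i for i in range(70_000))   # deliberately the heaviest
def f3(): sum(i * i for i in range(10_000))
def f4(): sum(i * i for i in range(10_000))

def workload():
    for _ in range(100):
        f1(); f2(); f3(); f4()

profiler = cProfile.Profile()
profiler.runcall(workload)
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)   # f2 should dominate the cumulative-time column
```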
Using the profiler information we can identify the bottleneck function; clearly in this case F2 is the bottleneck. In each iteration we identify the i-th most severe bottleneck function f_i; here the first bottleneck function is F2. Once we have identified it, we assume that f_i is implemented in hardware: we add it to H, that is, H = H ∪ {f_i}, and simultaneously set S = S − {f_i}, removing F2 from the set of functions in software. This indicates that the function F2 has moved from software to hardware. We then evaluate the new speed. Ideally the new speed should be better than the earlier one, because hardware can typically do things faster than software. If the performance improves to the point that it crosses the threshold P, we have the required partition. If not, the next function is taken; in this case F1, F3, and F4 all have equal percentages, so the first of them, say F1, is picked. We move F1 into H as well, measure again, and if the speed is acceptable we are done; if not, we continue moving functions from the software side to the hardware side, each time measuring the performance and checking whether the target performance P is met. In this way the procedure moves at most n of the most critical bottleneck functions to hardware. Either the performance of the system improves to the point that it crosses the threshold P, in which case we have the required partition, or the moves do not deliver sufficient gain. The iteration therefore terminates in one of two ways: we achieve the target P, or there is no improvement even after moving n functions to hardware, in which case we say that no partition can be found.
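To make the flow of this greedy procedure concrete, here is a small sketch of my own (not code from the lecture) of the loop just described. The callbacks evaluate_speed and profile are assumed, hypothetical functions: in practice they would rely on the performance-estimation techniques discussed below.

```python
# Sketch of the greedy hardware/software partitioning loop described above.
# evaluate_speed(hw, sw) and profile(sw) are assumed, hypothetical callbacks:
# the first estimates system performance for a given partition, the second
# returns the software functions ordered from most to least time-consuming.

def partition(functions, target_perf, n, evaluate_speed, profile):
    sw = set(functions)              # S: all functions start in software
    hw = set()                       # H: initially empty
    if evaluate_speed(hw, sw) >= target_perf:
        return hw                    # already fast enough, nothing to move

    for _ in range(n):               # move at most n bottleneck functions
        if not sw:
            break
        bottleneck = profile(sw)[0]  # i-th most severe bottleneck f_i
        hw.add(bottleneck)           # H = H ∪ {f_i}
        sw.remove(bottleneck)        # S = S − {f_i}
        if evaluate_speed(hw, sw) >= target_perf:
            return hw                # target performance reached

    return None                      # no partition meeting P was found
```

This is the same greedy, illustrative strategy as in the flowchart; a real methodology would also have to account explicitly for hardware cost and for the communication effects discussed next.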
Why might we not get an improvement even after moving a function from software to hardware? There can be many reasons. One of the primary reasons is communication: when everything is implemented in software, data simply flows from one software function to another, but once there is a partition between software and hardware, time is also spent in communication between the two. There are also data dependencies: the hardware may be waiting for data and therefore cannot produce its result even though it is otherwise ready. Because of data dependencies the gain can be small, and whatever gain we get from the hardware implementation can be offset by the extra time spent in communication. So even after moving some functions from software to hardware we may not get the expected gain, and in that case we cannot find a suitable partition. This is a very simplistic method of hardware-software partitioning, given just for illustration: it takes a greedy approach and settles on a solution in which we meet the performance target while incurring the minimum hardware cost. We are assuming that hardware is costly, so we do not want to move everything to hardware; we want to move a minimal set of functions from software to hardware, which is what this procedure assumes.

What are the challenges in this kind of implementation, or in hardware-software partitioning in general? The first challenge is performance estimation. When we say that we move a function f from software to hardware and then evaluate the performance to see whether it is above the threshold, the problem is that we do not have the hardware yet. We have only assumed that the function has moved to hardware; the hardware is still non-existent, and it would have to be designed and fabricated before we could get exact numbers for how much the performance improves. This non-existence of the hardware is the primary concern in hardware-software partitioning. How do we tackle this problem? One approach is to implement the hardware on an FPGA, which can be done quickly because the hardware already exists and only needs to be programmed; based on the FPGA implementation we can estimate the gain we would get from moving that function from software to hardware, and we can even extrapolate that number to judge whether an ASIC implementation would give a sufficient gain. A second approach is to take the function through a quick design flow, for example behavioural synthesis or high-level synthesis followed by a crude placement and routing, and get an estimate out of that. The second challenge concerns the verification of hardware and software. Earlier everything was implemented in software; now part of it is in hardware and part in software, and we need to ensure that once the two are combined they still deliver the required functionality. This verification is difficult because the hardware is still non-existent. Various approaches are used to deal with this problem, for example hardware-software co-simulation, in which we assume some model of the hardware, such as a timing model, and use that model together with the existing software to simulate the system and get a fair idea of whether it will work
once the two parts are combined. The reference shown here is where you can find more information on this topic. This brings us to the end of this lecture. In this lecture we looked at the important concept of abstraction and then at the idea-to-RTL flow, or system-level design; in particular we looked at hardware-software partitioning. In the next lecture we will look at another aspect of system-level design, namely behavioural synthesis. Thank you very much.