[Lecture 6] Understanding Circuit Timing and Verification

Can you hear me? Okay, let's get started. Is it too loud? I guess so. Is it better? So, good afternoon, and welcome back to another lecture in Digital Design and Computer Architecture. Today we're going to continue what we started on timing and verification. I've been waiting for this lecture because this is one of the most exciting topics in digital design, and I really hope that you enjoy it too. As a reminder, these are the readings for this week. Next week we're going to start the von Neumann model and the foundations we need for microarchitecture, so make sure that you also check the readings for next week.

We were discussing circuit timing yesterday at the very end. We said that we already know how to investigate logical functionality, but we want to know about timing: how fast is a circuit, how can we make a circuit faster, and what happens if you run a circuit too fast? We mentioned that a design that is logically correct can still fail because of real-world implementation issues, and we reached the point where we wanted to start on combinational circuit timing.

The digital logic abstraction that we are using is quite convenient. For example, we assume that the output changes immediately with the input. Here is a NOT gate, and A is the input. A goes to 0 immediately, so the edge is sharp, and the output of the gate also goes from 0 to 1 immediately. This is a convenient abstraction when we only want to talk about the functionality of a circuit, but in reality that's not the case. Outputs are delayed from inputs because transistors take a finite amount of time to switch. Here we have a buffer gate; be careful, this is not a NOT gate. The input A goes from 0 to 1, and this is also not ideal, so
it's not sharp, and the output takes some time to react. So this is the delay that we have. Even this diagram is a kind of abstraction, because what we see in the real world is something like this; we don't really have straight lines in these waveforms. You can define different kinds of timing, and we're going to see the definitions, but this is one of them: the latency that you need to wait for the output to start changing. And here is another latency, a kind of propagation latency: the latency that you need to wait until your output is stable. We're going to learn these definitions.

Delay is fundamentally caused by capacitance and resistance in a circuit. Every gate that you have is made of transistors. First of all, when transistors are on, they don't act as an ideal wire; they act as resistors. They also all have capacitance, and you have parasitic capacitance in all wires and transistors. So whenever you want to change the output, you very likely need to go through this RC delay: you need to charge a capacitor, and from your study of physics, maybe from high school, you know that this takes some time. You cannot charge a capacitor immediately. Similarly, when you want to discharge the capacitor, that takes some time. That's one of the reasons. Another one is the finite speed of light: we know that the speed of light is fast, but it is not so fast on a nanosecond scale.

Anything affecting these quantities can change the delay. Rising inputs, meaning going from 0 to 1, versus falling inputs can cause different delays in your circuit. Different inputs also have different delays. If I want to give you some insight, it is
that you can think about the CMOS NAND gate that we discussed. We had two nMOS transistors in series in the pull-down network and two pMOS transistors in parallel in the pull-up network, right? In the pull-up network the two transistors are in parallel, so from Kirchhoff's law you know that you can have more current to charge the output. But in the pull-down network the two transistors are in series, and that can reduce your current. That's a very simple example showing that, for example, going from 0 to 1 at the output can take less time than going from 1 to 0, or vice versa. Different input vectors can also have different latencies.

Changing the environment, like the temperature or the supply voltage that you provide to the circuit, can also affect the latency, as can aging of the circuit. As you use your circuit more and more, it ages, and with that your latency can increase.

So we have a range of possible delays from input to output that a designer needs to deal with. To get a good, simple abstraction, we're going to define two delays: contamination delay and propagation delay. Contamination delay is the delay until the output, which is Y, starts changing. Propagation delay is the delay until the output finishes changing. We can see it in this example: this is a circuit, A and B are the inputs, and Y is the output. From the moment that we start changing one of the inputs, which is A, this is the minimum amount of time that you need to wait until your output starts changing. We call this the contamination delay. And from the moment the input starts changing until the output
finishes changing, we call this the propagation delay. So that is the definition of propagation delay, and the first one is the contamination delay. When you want to make sure that your circuit output is stable, you need to calculate propagation delays. But when you want to make sure that no single change affects the result of your circuit, you need to be conservative and check the contamination delay. We're going to see a lot about this in this lecture, so don't worry if you don't understand it clearly now. And this is the notation that we usually use in waveforms: the cross-hatching means the value is changing. It's just a notation for waveforms.

Okay, so we need to calculate the longest and shortest delay paths. We care about both the longest and the shortest delay path in a circuit, and we will see the reason later in this lecture. Here is an example. You have this AND here, then its output goes to this OR, and then you have another AND. By checking this circuit you can see that the critical path probably goes from A to Y, right? You need to go through this AND, then through this OR, and then through this AND. So this is the critical path, the longest-delay path of your circuit. And very likely your shortest path is this one from D to Y, because it goes directly to the last gate. If you want to calculate the minimum latency of this circuit, you need to check the shortest path; if you want to calculate the longest latency of the circuit, you need to go through the critical path. For the critical (longest) path, the propagation delay is going to be one propagation delay of this AND, then one propagation delay of this OR, and then one
propagation delay of this AND, so in the end we have two AND propagation delays plus one OR propagation delay, right? For the shortest-path latency, we calculate based on the contamination delay, as we already discussed. We consider only the shortest path of the circuit and only the contamination delay, so the contamination delay of the whole circuit is just the contamination delay of this AND. Does it make sense?

Okay, let's see some examples. This is the circuit, and these are the initial inputs: A = 1, B = 1, C = 0, and D = 1. With that, you can calculate that the output Y is 1. Then you change input A to 0, so this is your transition from 1 to 0, and things need to propagate. This is a very good example because you need to go through all of these gates. This AND gate's output, n1, was 1, but now A is 0, so you need to wait until n1 goes to 0. When n1 goes to 0, this OR also needs to go to 0, so you wait again until n2 goes from 1 to 0. After n2 is 0, you see that Y goes from 1 to 0. So it is exactly a propagation delay: the change propagates as you go through the critical path of your circuit.

For the shortest path, consider the same initial vector, 1, 1, 0, and 1, and now D goes from 1 to 0. You know that the output should go from 1 to 0, and this happens very quickly. Up to here is the propagation delay, but you can also read off the contamination delay: from when D starts changing until Y starts changing. That's why we use the contamination delay of the AND gate for the shortest path. Does it make sense? Any questions? Good. Okay, so now I'm going to
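The longest-path and shortest-path calculation described above can be sketched in Python. This is only an illustration: the per-gate delay numbers are hypothetical (the lecture gives none for this circuit), and the wiring of inputs B and C is assumed from the input values used in the example (n1 = A AND B, n2 = n1 OR C, Y = n2 AND D).

```python
# Hypothetical per-gate delays in picoseconds (not from the lecture).
GATE_DELAYS = {
    "AND": {"t_pd": 100, "t_cd": 60},
    "OR": {"t_pd": 120, "t_cd": 40},
}

# Assumed example circuit: n1 = A AND B, n2 = n1 OR C, Y = n2 AND D.
# Each node maps to (gate type, list of inputs).
CIRCUIT = {
    "n1": ("AND", ["A", "B"]),
    "n2": ("OR", ["n1", "C"]),
    "Y": ("AND", ["n2", "D"]),
}


def arrival(node, kind):
    """Latest (kind='t_pd') or earliest (kind='t_cd') time at which `node`
    can change, measured from an input change at time 0."""
    if node not in CIRCUIT:  # primary input: changes at time 0
        return 0
    gate, inputs = CIRCUIT[node]
    pick = max if kind == "t_pd" else min
    return GATE_DELAYS[gate][kind] + pick(arrival(i, kind) for i in inputs)


t_pd = arrival("Y", "t_pd")  # longest path: AND -> OR -> AND
t_cd = arrival("Y", "t_cd")  # shortest path: D -> last AND
print(f"t_pd = {t_pd} ps, t_cd = {t_cd} ps")
```

With these made-up numbers the propagation delay is two AND delays plus one OR delay, and the contamination delay is that of the last AND alone, which mirrors the hand calculation.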
provide some examples of propagation delays for different gates. Here is an example of a real NAND2 gate. It's from the very famous 74xx family of ICs, from Texas Instruments as far as I remember. In the past they used TTL, which is a completely different technology for building circuits: it uses BJT transistors, which are different from MOS transistors. Later they also started using CMOS. In this IC you have four NAND gates, and you can use each of them separately. What I want to show you here is that, depending on the supply voltage that you apply to the IC, you get different latencies: as you increase the supply voltage, the latency decreases. And depending on the temperature that you're working at, you also get different latencies. You can see that at the lowest temperature and the maximum voltage, the latency would be 7 ns, but it can be up to 135 ns if you work at 2 V and at a temperature of 125 degrees. That shows you that propagation delay depends heavily on voltage and temperature.

It also depends on the way that you implement your circuit. Here is an example of two different ways of implementing a 4-to-1 multiplexer. In this one you calculate the minterms and then OR them; in the other one you use tri-state buffers. You can see that the propagation delays are different between these two designs, so depending on how you implement your circuit, you get different latencies. Here is another way of implementing the multiplexer, using 2-to-1 multiplexers, and it also has different latencies. So the goal of this slide is to show that the latencies of our circuits depend heavily on the voltage, the temperature, the way that you design, and also the input vectors. So
we didn't show that here, but when you change to different input vectors you get different latencies, and the tools that deal with all of these latency calculations need to consider different input vectors for an accurate estimate.

Here I want to give a disclaimer: it's not always this easy to determine the long and short paths. In that circuit it was easy to see which path is the longest and which is the shortest. But not all input transitions affect the output, which is very important to verify, and there can be multiple different paths from an input to the output. Your output is driven through different paths, and you need to deal with all of them together to understand which one is the shortest and which one is the longest. So it's not really easy, and in reality it's usually automated using CAD tools.

Also, circuits are not all built equally. Different instances of the same gate have different delays. Here I showed you real ICs with some delays, but in that series we also have different versions of that IC, even for the NAND, that have different delays. Wires also have nonzero delay, because wires can be modeled with capacitors and resistors, and the delay increases with length. Temperature and voltage also affect circuit speed. When you put all of this together, you can see that sometimes it can even change the critical path of your design: when you use a different circuit, with a different temperature and a different voltage, you might have a completely different critical path than you thought before.

So in the end, designers assume worst-case conditions and run many statistical simulations to balance yield and performance. This is the usual way of designing. But this worst-case thinking, designing for worst-case
scenarios, actually causes a lot of inefficiencies in today's systems. You're going to see a bit more of this later in our lectures: we don't need to be that strict and always design for the worst case. You can also design intelligent techniques that decide based on the situation, that work for the best case sometimes, or that try to make more intelligent decisions based on the current state.

Yes? [Student asks whether the clock depends on this.] We will actually reach that soon. Yes? [Student question, partly inaudible.] That's also a bit out of the scope of this course; we can discuss it offline. But essentially, if you look into electronics, especially when you have long wires, modeling the wire requires considering some capacitors and resistors. There's a lot in electronics about that, and we can discuss it offline. Any other questions? Yes? [Student: why does aging affect delay?] Sorry, say it again? Oh, aging. Okay, so aging can affect the effectiveness of your transistors; that's one reason. Normally you consider that your transistor, when it's on, can drive a certain amount of current. But after using it a lot, switching it on and off over and over, the performance of the transistor can degrade. There are many other reasons as well, which are again out of scope, but we can discuss them offline if you're interested. Was that insightful? Okay, good. Any other questions? Good.

So this is a summary of what we have discussed about these delays. Now let's quickly take a look at glitches, which we briefly talked about. A glitch is when one input transition causes multiple output transitions. Here is an example. Look at this circuit. The initial
state is 011 and the output is 1, and this input wants to go from 1 to 0. Let's see what happens. Glitches usually happen when you have different paths driving the output, a slow path and a fast path. Here you have the slow path, which passes through three gates, and here is your fast path, which passes through only two gates. As a result, the output goes from 1 to 0 and then back to 1. Why is that? When this input moves from 1 to 0, the output of this AND becomes 0, and because the other input of the OR was initially also 0, the OR drives the output to 0. But after some time, the other input of the OR becomes 1 through the slow path, and the output has to go back to 1. That's the reason you might see glitches in your circuit. Here is the timing diagram for what I just described: for a very short amount of time the output goes to 0 and then comes back up to the high voltage.

Yes? [Student asks whether that time is t_pd minus t_cd.] Yeah, around that. But it's not that easy to calculate it like this. You can use this abstraction in your mind, but usually these delays are not calculated so easily from equations like these. [Student question, inaudible.] Sorry, what did you ask? [Student: is it just probability?] No, it's not probability. First of all, contamination delay and propagation delay have very exact meanings. For example, propagation delay is measured from something like 50% of the input transition to 50% of the output transition. If you consider that, then you cannot simply say it is the propagation delay minus the other one. So these are not that accurate. But I like your abstraction, you know,
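To make the slow-path/fast-path argument above concrete, here is a minimal unit-delay simulation sketch in Python. The lecture does not spell out the gate-level circuit in words, so the netlist below (Y = A'B' + BC, with A = 0, C = 1, and B falling) is an assumption chosen to match the description: a three-gate slow path, a two-gate fast path, and initial inputs 011. Giving every gate exactly one time unit of delay is also a simplification.

```python
def simulate_glitch(steps=5):
    """Unit-delay simulation of Y = (not A and not B) or (B and C).

    Assumed netlist (hypothetical, chosen to match the description):
      nB = NOT B; n1 = AND(NOT A, nB); n2 = AND(B, C); Y = OR(n1, n2)
    A stays 0 and C stays 1; B falls from 1 to 0 at t = 0.
    Every gate output at time t+1 is computed from signal values at time t.
    """
    A, C = 0, 1
    B = 0  # B has just fallen from 1 to 0
    # Steady-state internal values for the OLD input B = 1.
    nB, n1, n2, Y = 0, 0, 1, 1
    trace = [Y]
    for _ in range(steps):
        # The right-hand side uses only the previous values, so all
        # gates update simultaneously, one gate delay per step.
        nB, n1, n2, Y = (
            1 - B,          # inverter on B
            (1 - A) & nB,   # slow path: NOT -> AND
            B & C,          # fast path: AND
            n1 | n2,        # output OR
        )
        trace.append(Y)
    return trace


print(simulate_glitch())  # Y over time, starting at t = 0
```

In this sketch the single 0 in the middle of the trace is the glitch: the fast path through the B-and-C gate pulls Y low one gate delay before the slow path through the inverter pulls it back high.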
it's good for understanding. Any other questions? Okay, so this part is optional, and I'm not going into it in detail. If you were interested enough to check the Karnaugh map material, not a lecture but some optional slides that we had, there is a way to avoid glitches by going through the Karnaugh map. I'm not going to cover it; you can check it later if you're interested. Essentially, what you can do is add a circuit here, this additional AND gate. Now the circuit is not minimal, because you're adding another gate, but by adding it you make sure that there is no glitch in your output.

Now the question is: do we always care about glitches? First of all, fixing glitches is usually undesirable because it costs more chip area, more power consumption, and more design effort, and the circuit is eventually guaranteed to converge to the right value regardless of glitchiness. So the answer is no, not always. If we only care about the long-term steady-state output, we can safely ignore glitches, and it's up to the designer to decide whether glitches matter in their application. Glitches can also cause unnecessary dynamic power consumption, because your output goes down and then comes back up, and that consumes power. But if you add circuits to avoid glitches, those circuits are also going to consume power, so it may not be worth it.

You remember that when I was presenting Mealy and Moore machines, I said that Moore machines are better in terms of avoiding glitches. The reason is that in a Mealy machine, when you change the input, that input can cause a glitch in your output, and that output might be the input of another Mealy machine, where it can cause glitches as well, and this can propagate. So one simple glitch can actually propagate to
many, many other circuits, which can consume a lot of power. That's why, when we design sequential circuits, we usually prefer Moore machines, as long as they don't need a lot more states compared to a Mealy machine, because they work very well with respect to glitches.

Okay, now we can take a look at sequential circuit timing, which is a bit more interesting. As a recall, the D flip-flop samples the data D at the active clock edge. The important thing about a D flip-flop is that the data must be stable when it is sampled, meaning at the active clock edge. This is your clock, and the clock is also not ideal: it cannot go to 1 sharply, so the edge is drawn as a diagonal. In order to be sampled reliably, your data should be stable from this amount of time before the rising edge of the clock, which we call the setup time, and it should also stay stable for a bit after the rising edge, which we call the hold time. If you don't follow this, you might get into a metastability situation in the D flip-flop, and in the end you don't know what is stored in that flip-flop.

So the definition of setup time is the time before the clock edge during which the data must be stable, meaning the data is not changing. Hold time is the time after the clock edge during which the data must be stable. There is also another term, aperture time, which is the time around the clock edge during which the data must be stable; it is the sum of t_setup and t_hold.

[Student question, partly inaudible.] Say it again, sorry? [Student asks about using an enable for this.] For a D flip-flop with enable, yes, you should also make sure that you activate the enable signal soon enough. But the thing is that the flip-flop is not sampling the enable signal; this only talks about
your data. Oh, you're asking about the way that we implement it? Okay, no, it's not about enable. We're going to see how we ensure that soon. Yes? [Student question, inaudible.] Yeah, so you're saying that D is changing, because there might be a combinational circuit driving D here, right? That circuit causes D to change; there might be a lot of changes here, and I don't care about those. What I care about is that from this time on, D is stable. Any other questions?

So this is what I said about metastability. If D is changing when it is sampled, metastability can occur: the flip-flop output is stuck somewhere between 1 and 0. The output eventually settles, but nondeterministically, so you don't know whether it will be 1 or 0. This is an example for a NAND RS latch: this is the metastable region, something between 1 and 0, and in the end you converge to 1 or 0, but you don't know which because it is nondeterministic. So designers should avoid violating the setup time and the hold time.

Similar to combinational circuits, we also have some timing parameters to define for flip-flops. They describe the time it takes for the output to react after the rising edge of the clock. We have two. One is the contamination delay clock-to-Q, t_ccq, which is the earliest time after the clock edge that Q starts to change, that is, becomes unstable. The other is the propagation delay clock-to-Q, t_pcq, which is the latest time after the clock edge that Q stops changing, meaning that Q is stable. From this figure you can see the propagation delay from clock to Q and the contamination delay from clock to Q: Q starts changing at this time, and it stops changing and is stable exactly at this moment. Yes? [Student question, inaudible.] Well, here is the thing:
because of the way that we implement the flip-flop, there are gates inside that all need to react before you have your output. As I said, it was, for example, two latches, a first and a second latch, and each latch is, for example, a D latch with several NAND gates. It can also be a different version, but overall there are some gates. For example, your output wants to rise from 0 to 1. It cannot get to 1 immediately; it takes some time. Say your logic 1 is 5 volts. If you sample during that time, you're going to see that at this moment the output is 1 volt, which is not logic 0 and not logic 1 either. You wait a bit more and you see that the output is 3 volts, then 4 volts, and in the end it's 5 volts. You always want the stable situation where your output is 0 volts, which is logic 0, or the high voltage, which is logic 1. In between is the unstable situation that you want to avoid. Any other questions?

Yes? [Student question, inaudible.] It can also depend partially on the gates that you're using, but in the end the contamination delay depends on the shortest path of your circuit. Also, when you calculate the contamination delay, you need to consider the best-case scenario for latencies, in the sense that you assume everything is as fast as possible. For the contamination delay we want to be as conservative as possible, because we want to say that the latency of this circuit is not going to be lower than this amount, for sure. You're going to see why that is very important, but you should know that the latency of the circuit cannot come in lower than that. So even if you use different
gates, you know which of these gates can have the lowest latency, and you assume that one, because we want to be conservative in calculating the contamination delay. Any questions? Perfect.

Okay, so this is another recall to jog your memory. In sequential system design we have flip-flops or registers, then some combinational circuit, and that combinational circuit can drive other registers or flip-flops. So multiple flip-flops are connected with combinational logic, and the clock runs with period T_c, the cycle time. The circuit must meet the timing requirements for both R1, register one, and R2 in this example. We need to ensure correct sequential operation. Talking about R2, we need to ensure correct input timing on R2: specifically, D2, which is the output of the combinational circuit, must be stable at least t_setup before the clock edge and at least until t_hold after the clock edge. This is the definition of setup time and hold time: you need to make sure that D is stable during the aperture time.

This means there is both a minimum and a maximum delay between two flip-flops. Suppose the combinational logic is too fast. At the rising edge of the clock, register R1 can start changing, so Q1 starts changing, then the combinational circuit reacts, and in the end D2 also starts changing. If the combinational circuit is too fast, we might have a t_hold violation at R2. You can see it in this example: we have the clock edge, then Q1 starts changing at this moment, which is the clock-to-Q contamination delay. The moment Q1 changes, the combinational circuit also starts reacting, so you also have the contamination delay of the combinational circuit, and in
the end your D2 starts changing at this point in time, even though it should stay stable until this point, which is t_hold. So now we have a t_hold violation. That's why shortest-path latencies are also very important, specifically for hold violations.

If your combinational logic is too slow, then you might have a t_setup violation. It takes this amount of time until you have stable data on D2, but this already violates t_setup, because D2 should be stable at this point in time for the setup time to be respected. So now we have a potential t_setup violation here.

So we always need to make sure that we have safe timing. Safe timing depends on the maximum delay from R1 to R2: the input to R2 must be stable at least t_setup before the clock edge. Let's do some calculation; this is the setup time constraint. You need to work out how long the clock cycle has to be. You have the propagation delay from clock to Q, which is why we have t_pcq here, then the propagation delay of your combinational logic, and then you need to make sure that D2 is stable one setup time before the edge, so you add that too. Your clock cycle should be at least the sum of these values.

This gives you some insight. Whenever you see that your setup time is violated, one easy way to fix it is to decrease your clock frequency, that is, increase your clock period. Of course this is not desirable, because it reduces the performance of your circuit, but at least it fixes the correctness issue. We're going to see that for hold time it's not that easy.

In every clock cycle you can see that we have wasted work. This is the only useful work that we are
doing, because you are doing computation with your combinational circuit. The other two, the propagation delay from clock to Q and the setup time, are wasted work: time that you need to wait in order to make sure that your flip-flops work correctly. This is called the sequencing overhead: the amount of time wasted each cycle due to the timing requirements of the sequencing elements. And the sequencing elements are essentially the flip-flops. Any questions? Does this equation make sense to you, that the clock period should be greater than the sum of these values? Okay, good. It seems easy when you see it, but you need to think about it a bit more to digest it. Once you get the concept, you're going to see that it is very easy.

Okay, so that's the setup time. The critical path of your combinational circuit essentially determines your clock cycle. In this clock cycle, the clock-to-Q propagation delay is more or less constant; it depends on the flip-flop that you're using. The setup time is also constant; you assume some setup time for the flip-flop that you design. But the propagation delay of your combinational circuit can vary, and it depends on the circuit that you're implementing. So overall design performance is determined by the critical path: it determines the minimum clock period, or equivalently the maximum operating frequency. If the critical path is too long, the design will run slowly. If the critical path is too short, each cycle will do very little useful work. Remember that the combinational circuit is the useful work that we are doing, and the clock-to-Q propagation delay and t_setup are the sequencing overhead, so we need to make sure that the sequencing overhead does not dominate
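The setup-time argument above can be summarized in a short derivation. The symbols follow the ones used in the lecture (T_c, t_pcq, t_setup); t_pd here denotes the propagation delay of the combinational logic between the two registers, and the rearranged second line is a restatement added for illustration rather than something confirmed to be on the slides.

```latex
\begin{align*}
T_c &\ge t_{pcq} + t_{pd} + t_{\text{setup}} \\
t_{pd} &\le T_c - \underbrace{\left(t_{pcq} + t_{\text{setup}}\right)}_{\text{sequencing overhead}}
\end{align*}
```

Read this way, the clock period is a budget: whatever the flip-flops consume as sequencing overhead is no longer available to the combinational logic for useful work.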
your clock cycle. Okay, now let's take a look at the hold time constraint. For hold time, we want to make sure that D2 stays stable after the rising edge of the clock, at least for t_hold. To calculate the minimum latencies, as I said, because we want to be conservative, we go with contamination delays. So the minimum latency that would cause D2 to change is the contamination delay of clock-to-Q plus the contamination delay of the combinational circuit, and this should be greater than the hold time: t_ccq + t_cd >= t_hold. You can see that this equation has nothing to do with the clock cycle, and that's why if you have a hold time violation, you cannot fix it with the clock period. As I said, if your setup time is violated you can just reduce the clock frequency, but for hold time you sometimes need to actually change your circuit, and we're going to see in an example how we can do that. So the contamination delay of your combinational circuit should be greater than t_hold minus the clock-to-Q contamination delay of your flip-flop. Okay, so this is basically a summary of these parameters that you can also check later. But now let's go over these numbers with an example. This is our example circuit: we have four flip-flops here and two flip-flops here, and we have this combinational circuit in between. These are the latencies that we're going to assume: this is the contamination delay for clock-to-Q, 30 picoseconds, the propagation delay for clock-to-Q, and so on and so forth. Okay, let's make some calculations. First we need to calculate the propagation delay of our combinational circuit. Anyone, any guess? Yes? [Student, partly inaudible: it's three times the per-gate delay.] Exactly, so that's your longest path. With your contamination delay, you just go with the shortest path. There are actually two shortest paths: you have this shortest path and also this other short one, and
then you also need to get your contamination delay. We know that the contamination delay per gate is 25, so the contamination delay of this circuit is 25 picoseconds. Now we want to check the setup time constraint. We know that the clock period should be greater than this equation, so we just make the calculation: the clock-to-Q propagation delay is 50, plus the 105 that we already calculated, plus the setup time, which is 60. Then you're going to see that the clock period should be greater than 215 picoseconds, meaning that your frequency can be at most 4.65 GHz. That's your frequency. Now let's check hold time. The contamination delay of your flip-flop is 30 and the contamination delay of your combinational circuit is 25, so 30 + 25 is 55, and you can see that it's not greater than your hold time, which is 70. So now we have a hold time violation. Any idea how we can fix it? Yes? Yes, essentially you need to add logic: you need to artificially increase the latency of your circuit. So you add these buffers. These buffers do not change your longest path, your longest path is still that one, so the setup time calculation does not change at all. But for hold time, you now have the contamination delay of two gates, so it's going to be 2 × 25, which is 50, plus 30, which is 80. 80 is greater than 70, so now we don't have a hold time violation. Makes sense? Okay, I think I'm not going to start clock skew now, because we need some time for it, so let's have a break until 3:15 and then we will continue.

Okay, let's continue. We still have a lot to cover today. We already talked about sequential latency and how to ensure we don't violate setup time and hold time. But to make matters worse, clocks have delay too: the clock does not reach all parts of the chip at the same time.
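To make the arithmetic of the worked example easy to re-check, here is a minimal Python sketch of the two constraints, T_c >= t_pcq + t_pd + t_setup and t_ccq + t_cd >= t_hold, using the numbers quoted in the lecture. The function and variable names are illustrative and not from the lecture; Python is used here only as a calculator.

```python
# Setup and hold checks for a register-to-register path.
# All delays are in picoseconds and are the values quoted in the lecture example.

def min_clock_period(t_pcq, t_pd, t_setup):
    """Setup constraint: T_c >= t_pcq + t_pd + t_setup."""
    return t_pcq + t_pd + t_setup


def hold_slack(t_ccq, t_cd, t_hold):
    """Hold constraint: t_ccq + t_cd >= t_hold. Negative slack is a violation."""
    return t_ccq + t_cd - t_hold


T_PCQ = 50        # flip-flop clock-to-Q propagation delay
T_CCQ = 30        # flip-flop clock-to-Q contamination delay
T_SETUP = 60
T_HOLD = 70
T_PD_LOGIC = 105  # longest path through the combinational logic
T_CD_GATE = 25    # contamination delay of one gate

t_c_min = min_clock_period(T_PCQ, T_PD_LOGIC, T_SETUP)
f_max_ghz = 1000.0 / t_c_min  # 1 / (1 ps) is 1 THz, i.e. 1000 GHz

# Shortest path: one gate before the fix, two gates after adding a buffer.
slack_before = hold_slack(T_CCQ, 1 * T_CD_GATE, T_HOLD)
slack_after = hold_slack(T_CCQ, 2 * T_CD_GATE, T_HOLD)

print(f"minimum clock period: {t_c_min} ps (f_max about {f_max_ghz:.2f} GHz)")
print(f"hold slack without buffer: {slack_before} ps")
print(f"hold slack with buffer:    {slack_after} ps")
```

With the lecture's numbers this gives a minimum period of 215 ps (about 4.65 GHz), a hold slack of -15 ps on the original shortest path, and +10 ps once the buffer is added. Note that the clock period does not appear in `hold_slack` at all, which is why slowing the clock cannot fix a hold violation.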
That's what we call clock skew: the time difference between two clock edges. Here is an example. For this flip-flop, your clock arrives at this certain time. But this is your clock source, and it may take a long path to reach the second flip-flop, so this flip-flop would see the clock a bit skewed, because of the long, slow clock path. Here we are also showing this with waveforms, using our waveform abstraction: this is your clock source, and point A, because it's very close to the clock source, is observing the same thing, but at point B we are observing a bit of skew, and that can cause issues. Here is an example from a real processor, the Alpha processor, where they show the clock skew across the chip, and you can see that along the horizontal and vertical axes of the chip you actually see different skews. So now we need to revisit setup time and hold time in the presence of clock skew. For setup time, the case to consider is that the clock arrives at register 2 before register 1, meaning that register 2 sees the rising edge of the clock earlier than register 1. Why is that important? It's important because your setup time now effectively needs to be increased. I'll show you with this example. This is clock 2, and it comes earlier, and this is clock 1. You can see that Q1 starts to change a bit late, because clock 1 comes late, and D2 also gets delayed because clock 1 comes late. As a result, if you don't consider the fact that clock 2 is earlier than clock 1, you may violate setup time. To fix this issue, we need to increase our setup time: in our equation we just add t_skew, and we call the result the effective setup time. So essentially you need to calculate the worst-case scenario for your skew, and then you add that to your setup time.
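As a rough illustration of how skew tightens both constraints, the sketch below adds a worst-case skew term to the setup check as just described, and applies the mirror-image adjustment to the hold check, which the lecture turns to right after this point. The delay values reuse the earlier worked example; the 20 ps skew is a made-up number chosen only for illustration, and the function names are not from the lecture.

```python
# Skew-adjusted timing checks. All delays are in picoseconds.
# The 20 ps skew is a hypothetical value; the other numbers reuse the
# lecture's worked example (with the hold-fix buffer already inserted).

def min_clock_period_with_skew(t_pcq, t_pd, t_setup, t_skew):
    """Setup with skew: T_c >= t_pcq + t_pd + (t_setup + t_skew)."""
    effective_setup = t_setup + t_skew
    return t_pcq + t_pd + effective_setup


def hold_slack_with_skew(t_ccq, t_cd, t_hold, t_skew):
    """Hold with skew: t_ccq + t_cd >= t_hold + t_skew. Negative is a violation."""
    effective_hold = t_hold + t_skew
    return t_ccq + t_cd - effective_hold


T_SKEW = 20  # hypothetical worst-case skew between the two registers

period_no_skew = min_clock_period_with_skew(50, 105, 60, 0)
period_skewed = min_clock_period_with_skew(50, 105, 60, T_SKEW)

# Shortest path after the buffer fix: two gates at 25 ps each.
hold_no_skew = hold_slack_with_skew(30, 2 * 25, 70, 0)
hold_skewed = hold_slack_with_skew(30, 2 * 25, 70, T_SKEW)

print(f"min period: {period_no_skew} ps without skew, {period_skewed} ps with skew")
print(f"hold slack: {hold_no_skew} ps without skew, {hold_skewed} ps with skew")
```

Under these assumed numbers the minimum period grows from 215 ps to 235 ps, and the path that passed the hold check with 10 ps of slack now fails by 10 ps. Skew eats into both margins at once.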
That gives you the effective setup time that you need to use in your calculation. For hold time we think the other way around: we consider that the clock arrives at R2 after R1, and that increases the minimum required delay for the combinational circuit. Why is that? Can anyone provide some interpretation? Yes? [Student: the signal starts changing earlier because the (inaudible) is fast.] Yes, exactly. Q1 and D2 start changing earlier because we get the rising edge at R1 earlier, and that can violate hold time for R2. To account for that in your equation, you need to add t_skew to your hold time as well, and this is going to be your effective hold time in the end. Good. Okay, so that's the summary: skew increases your sequencing overhead, because you need to add that skew to your setup time and hold time, and then you're going to have less useful work done per cycle. Designers must keep skew to a minimum, which requires an intelligent clock network across the chip. There is actually a lot of research going on in this topic, where people design the clock network to make sure that the clock arrives at all locations at roughly the same time. There are many different ways of doing that. The H-tree, for example, is one of the famous clock network designs, and there are many others, but this is still a big problem for designers. Question? Remember that in the previous lecture we had this example where we wanted to divide the clock frequency by three, and we were using an FSM to do that. I said that this is an example, but this is not the way to do it when you want to change the clock frequency. If you look at the clock network, you usually have different layers in your chips to route clock wires. Now suppose you want to do some computation on your clock to change it.
You basically need to bring that clock wire to your logic, and that causes some delay that can exacerbate your skew. That's why people try to not touch the clock network at all: they try to avoid doing computation on the clock. So the clock should only be used as the input for your flip-flops, not as an input that you operate on to change the clock frequency. There are analog circuits, like a PLL for example, that are designed to provide different clock frequencies. I'm not sure if one is provided in the FPGA that you are using in the lab, but in FPGAs you can also choose the clock frequency that you want your design to work at, and there is a PLL there: by configuring that PLL you can choose the clock frequency that you're looking for.

Okay, so now let's get to another interesting part of this lecture, which is circuit verification. The question that we want to answer here is: how do you know that a circuit works? We design a circuit, and there are several questions that we want to ask. Is it functionally correct? Even if it is logically or functionally correct, does the hardware meet all timing constraints? How can you test for functionality and timing? To answer that we have different approaches, and we rely on simulation a lot. There are formal verification tools, like SAT solvers, that are useful and can provide a theoretical foundation that the design works, but they are also expensive, and you cannot use them easily for very complex and big designs. We also have HDL timing simulation, for example in Vivado, which you're going to use a lot in your labs, and we can also do these simulations at the circuit level, at the transistor level, using tools like SPICE for example. But in general, testing a large digital design is not easy, and it's the most time-consuming design
stage. There are reports that about 70% of the design time of a modern processor needs to be allocated to verification. So a lot of effort goes into making sure that the design works correctly, and test engineers have a lot to do in the design process of modern processors, as well as of any logic such as memory chips. We need to ensure the functional correctness of all logic paths, and the timing, power, etc. of all circuit elements. For example, you want to make sure that your circuit is not going to consume more power than is allowed. Say you want to put a circuit in your ear: you need to make sure that your ear does not burn, essentially, right? So that circuit should not consume more than, I don't know, some microwatts. It's not easy, so you need to test it and make sure that the design is not going to consume more than some amount of power. Unfortunately, low-level circuit simulation is much slower than high-level simulation, like HDL or C simulation, and the solution that people devised for that is to split responsibilities. We usually check only functionality at the high level, in C or HDL, which has relatively fast simulation time, allows high code coverage, and makes it much easier to write and run tests. Then we check only timing, power, etc. at the low level, meaning at the circuit level for example. There we don't do any functional testing anymore; instead we only make sure that functional equivalence holds, that is, we test functional equivalence to the high-level model of the circuit. For that there is also some theoretical foundation that people follow, meaning that if you follow the principles well in your HDL code, you write good code, and your code is synthesizable, then the gate-level
synthesized version of it, or even lower, the transistor-level version of it, would work the same: in terms of functionality it would be correct. So the functional equivalence would be there, but your circuit-level simulation might violate timing, and that's another issue which you need to fix. This functional equivalence checking is also hard, but easier than testing logical functionality at the low level. Any questions? Good. So we have tools to handle different levels of verification. We have logic synthesis tools that guarantee equivalence of the high-level logic and the synthesized circuit-level description. That's what I said: in Vivado, when you input your HDL code to the logic synthesis tool, and Vivado is actually not the greatest example because its tool chain is for FPGAs, but for example with Design Compiler from Synopsys, when you input your HDL code and your HDL code is synthesizable, then your gate-level output has exactly the logic that you are looking for. For timing verification you need to use other tools, and these tools check all circuit timings. There are also design rule checks to ensure that the physical circuits are buildable. So the task of a logic designer is to provide functional tests for logical correctness of the design, and also to provide timing constraints, like the desired operating frequency. The designer also needs to be involved in timing, as we have already seen: when we have a hold time violation, the designer actually needs to change the design. So unfortunately the designer also has to deal with timing a lot, like how to fix the timing. It's not only saying "okay, these are my constraints, now make the circuit for me"; at some point the designer also needs to find solutions to fix the timing violations. There are tools and circuit
engineers, and they will decide if it can be built: in the end, tools and/or circuit engineers will decide if your circuit can be built. But we will first start with functional verification, and we will go into a lot of detail in this part, because you're going to use this a lot in your labs. The goal of functional verification is to check the logical correctness of the design. Physical circuit timing, like setup time and hold time, is typically ignored; usually all timing is completely ignored in this logical correctness test. We may implement simple checks to catch obvious bugs, but the goal is not really to make sure that your design is not violating any timing. We will discuss timing verification later in this lecture. There are two primary approaches: one is logic simulation, using C, C++, or Verilog test routines, and the other is formal verification techniques, like SAT solvers for example. In our course we use logic simulation. So let me introduce you to the testbench. A testbench is a module created specifically to test a design. The design that we want to test is called the device under test, which is the DUT. This is your testbench, and it has different parts. In this part you need to generate test patterns, the patterns of different inputs that you are looking for. Then you input these test patterns to your device under test, and then you need to have logic to check the outputs and see if everything is running correctly or not. So again, a testbench provides inputs to the DUT, and these inputs can be handcrafted values or can be automatically generated, using sequential or random values. The testbench has to check the outputs of the device under test against handcrafted values, again, or against a golden design that is known to be bug free. We're going to see more about the golden design very soon, but very briefly,
you may have a model of your design that you call the golden design. It is usually implemented at a much higher abstraction level, and you are sure that the golden design is bug free. Of course, making sure that it is bug free also takes a lot of work, but as long as you are sure that the golden design is bug free, you can just compare the output of the device under test with the output of your golden design, and then you can see whether your design works correctly or not. We're going to see more about that soon. A testbench can be HDL code written to test other HDL modules, or it can be a circuit schematic used to test another circuit design, but we are going to use HDL code in this course. As you probably know, and as we also discussed a bit yesterday, a testbench is not designed for hardware synthesis: it runs in simulation only. There are many commands in your testbench that are not synthesizable, like the timing delays that we're going to use to generate pulses, or functions like display for printing. These are not synthesizable, right? But you're using them in your testbench in order to test your system in simulation. This testbench can run in an HDL simulator, like the Vivado simulator, or in SPICE circuit simulation for example, but we are not covering SPICE circuit simulation at all in this course, so don't worry about that. A testbench uses simulation-only constructs, like "wait 10 nanoseconds" for example, or ideal voltage and current sources, and of course these are not suitable to be physically built. There are three common types of Verilog testbench. One we call simple, in which input and output generation is manual and error checking is also manual, so you as the designer, or whoever is doing the test, need to do all these tasks manually. We can also do this in a
self-checking way, where input and output generation is manual but error checking is automatic. And we can also make everything automatic, meaning that input and output generation and error checking are all automatic. We're going to see some examples very soon. We're going to use this silly function to walk through these different kinds of testbench, the common ways that you can design one. This is the definition of your module, and you can see that we are using a completely structural style of programming, with gates: not, and, and or. Of course this is not the best way, but it is a very good way of designing for the sake of the example that we're going to see today. Before jumping into the different testbenches that we can write for this code, let me introduce some useful Verilog syntax for testbenches. One is the initial block, which is similar to an always block. You can actually also have a sensitivity list for initial, but here for example we don't have one. An initial block executes only once, and that's the difference between initial and always: an always block repeats over and over, but an initial block is executed only once, so you can use it in your testbench. Also there is this notation, like #10, that specifies a delay: we wait, or do nothing, for 10 nanoseconds. And you can also print to your console, in Vivado for example: you can use the $display operation with the string that is going to be shown. Okay, now let's start with the simple testbench. This is our testbench 1, and we want to test our silly function, so we need to instantiate our function, as you know. When you have a testbench, your testbench is going to be your top-level module, remember our bottom-up design. So you can consider your testbench as your top-level module, and there is a submodule that
you're actually instantiating, and that submodule is what you're going to test. So your top-level module, the testbench, has no inputs or outputs, but internally it has some wires and registers. You instantiate the module that you want to test, and then in the initial block you give some values to these inputs. For example, you set a to 0, b to 0, and c to 0, then wait for 10 nanoseconds, then change input c to 1, then wait for 10 nanoseconds, then another bit vector, and another, and another. Once you have this, you can run your simulation and generate the waveform diagrams, and in the waveform diagrams you need to manually check the output and make sure that it is correct for the different combinations of these inputs. As you can see, this is easy to create, but it's not really easy to check, and it's not really scalable when your design gets more complicated. So as pros: it is easy to design, and you can easily test a few specific inputs, like corner cases. But as cons: it is not scalable to many test cases, and the outputs must be checked manually outside of the simulation, which is not easy. It's very similar to print-style debugging: when you want to debug, for example, your Java or C++ code, one way is to add a lot of prints, which is not the best way of debugging for sure. Another approach is the self-checking testbench. So again, you are instantiating your silly function, and this is your initial block. You're initializing these values, but now you're also checking the output for these inputs in your code, so you are not checking the output in your waveform. You know that the output should be one for this input combination, so you're saying that if y is not equal to one, then display that this input vector, 000, failed. With that you can see the errors directly. So as pros, it is still easy to design and easy to test.
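The self-checking idea can be sketched outside of Verilog as well. The Python model below is a stand-in, not the lecture's testbench: it assumes the silly function is y = ~b & ~c | a & ~b built from not, and, and or gates (the transcript does not spell the function out, so this is an assumption), and it checks each hand-coded input against a hand-coded expected value, printing a message on mismatch in the spirit of the `$display` call.

```python
# Python stand-in for a self-checking testbench.
# Assumption: the device under test is y = ~b & ~c | a & ~b, built from gates.

def silly_function(a, b, c):
    """Gate-level model of the assumed DUT: two NOTs, two ANDs, one OR."""
    b_n = 1 - b          # not gate on b
    c_n = 1 - c          # not gate on c
    m1 = b_n & c_n       # and gate: ~b & ~c
    m2 = a & b_n         # and gate: a & ~b
    return m1 | m2       # or gate


# Hand-coded checks: (a, b, c, expected y). These are typed in by the test
# writer, which is exactly where a self-checking testbench can go wrong.
CHECKS = [
    (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (0, 1, 1, 0),
    (1, 0, 0, 1), (1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 0),
]


def run_self_checking(dut, checks):
    """Apply each input, compare with the expected value, report failures."""
    failures = []
    for a, b, c, expected in checks:
        y = dut(a, b, c)
        if y != expected:
            print(f"{a}{b}{c} failed: got {y}, expected {expected}")
            failures.append((a, b, c))
    return failures


failures = run_self_checking(silly_function, CHECKS)
print(f"{len(failures)} failure(s)")
```

The checking is automatic, but every expected value is still hand-written, so a typo in `CHECKS` looks exactly like a bug in the design.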
The simulator will print whenever an error occurs. But as cons, it is still not scalable to millions of test cases, and it is easy to make an error in the hand-coded values: you make just as many errors writing a testbench as writing the actual code. There is no guarantee that you are not making errors while you're writing your testbench, right? Humans make errors. And it is hard to debug whether an issue is in the testbench or in your device under test. You can also make it a bit smarter and use a test vector file. You make a file and put the input vectors and the outputs that you're expecting in it, like in this example. This test vector file can be created manually or automatically: manually means the designer just fills it in, and automatically means it is created using your golden model, which we're also going to see very soon. The way that we use these test vectors is that you need to read the values from your file. So you have test vectors stored in a file and you need to read them. For that, your book assumes that there is a clock. This clock has nothing to do with the clock that you add to your circuit for your sequential logic; this is another clock that you use only for your testbench. On the rising edge of this clock, you read from your file and apply the inputs to your device under test, and on the falling edge of the clock, you check the outputs of your device under test to see if it's working well or not. We're going to see an example, but make sure that you do not confuse this clock with the clock that you are feeding to your sequential circuit. This is different. Okay, so here is the Verilog code for that. Our silly function, for example, didn't have a clock, but here we define a clock specifically for our testbench, and we need
to generate the clock for that. This is a very standard way of generating a clock in your testbench: you use an always block and you say, okay, begin, clock is one, then you wait for five nanoseconds, then you make your clock zero, and then you wait for five nanoseconds. This means that your clock period is 10 nanoseconds, and this repeats all the time because your always block does not have any sensitivity list. Okay. In the initial statement, you need to open this file, example.tv, the file that has all the input vectors and outputs, and load it into testvectors. And testvectors is defined here as an array in which each element is four bits. Why four bits? Because we need three bits for the inputs and one bit for the output, so that's why we have four bits for each element of this test vector array. So you open this file and copy it into your test vectors, that's the read of the vectors. You also reset vectornum to zero, meaning that you want to start your test from vector zero, and you reset to zero the number of errors that you have observed so far. Then you also reset the system, and then you wait. So that's your initialization. Then we have this always at the positive edge of the clock: on the positive edge you are giving inputs to your device under test. So on the positive edge you read testvectors[vectornum], where vectornum is initially zero, so you are reading the first element, and you map that to your inputs as well as to yexpected. yexpected is the output that I'm expecting, so you are moving the output that is stored in your file into yexpected. So we apply the a, b, c inputs on the rising edge of the clock and get yexpected for checking the output on the falling edge. The rising and falling edges are chosen only by convention; your
book uses this, but you can use any part of the clock signal as you wish. So this is not really important; just the concept is important here. On the negative edge of the clock, if we are not in reset, we need to check: if the output of the circuit on that negative edge is not equal to yexpected, then we display an error and increment our error count. Outside of this if, we also increment vectornum in order to test the second element, and we go further and further. At some point you need to detect the end of your test vectors, so you may fill the last entry with the 4-bit binary value xxxx, and you check for it: if that's the case, then you say, okay, tests completed with however many errors you have seen. So this is again easy to design and easy to test, the simulator will print whenever an error occurs, and there is no need to change hard-coded values for different tests: you have this test vector file and you don't need to hard-code it. But as cons, it may still be error prone, depending on the source of the test vectors, that is, how you generate the test vectors. Are you using a golden model of your device? If so, how are you sure that the golden design is error free? This is not really easy to check. It is also more scalable, but still limited by reading a file, and you might have many more combinational paths to test than will fit in memory. But we also have another way of testing, which in my opinion is the best one: the automatic testbench, which uses golden models. Any questions so far? Okay, good. A golden model represents the ideal circuit behavior. It must be developed, and it might be difficult to write. This can be done in any language, even in Verilog. For our silly example, the golden model would be as easy as this, right? You had this Verilog code with a lot of gates, but you could easily write the same thing with this assign statement.
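The test-vector flow described above can be mimicked in a few lines of Python. This is a sketch under assumptions, not the lecture's Verilog: the silly function is assumed to be y = ~b & ~c | a & ~b, the vector file is an in-memory stand-in for example.tv with one 4-bit line per vector (three input bits, then the expected output) and an `xxxx` end marker, and the names `testvectors`, `vectornum`, and `errors` only echo the roles described in the lecture. The vectors are generated automatically from the one-line golden model.

```python
import io

# Python stand-in for the test-vector testbench.
# Assumption: DUT and golden model both implement y = ~b & ~c | a & ~b.

def golden(a, b, c):
    """One-line golden model, mirroring a single assign statement."""
    return (1 - b) & (1 - c) | a & (1 - b)


def dut(a, b, c):
    """Gate-level device under test: two NOTs, two ANDs, one OR."""
    b_n, c_n = 1 - b, 1 - c
    return (b_n & c_n) | (a & b_n)


def make_vector_file(model):
    """Build the vector file contents from a model, ending with the xxxx marker."""
    lines = []
    for n in range(8):
        a, b, c = (n >> 2) & 1, (n >> 1) & 1, n & 1
        lines.append(f"{a}{b}{c}{model(a, b, c)}")
    lines.append("xxxx")
    return "\n".join(lines) + "\n"


def run_vectors(device, vector_file):
    """Apply each vector, check the output, stop at the xxxx end marker."""
    testvectors = vector_file.read().split()
    vectornum = 0
    errors = 0
    while testvectors[vectornum] != "xxxx":
        a, b, c, yexpected = (int(ch) for ch in testvectors[vectornum])
        y = device(a, b, c)      # "rising edge": apply inputs
        if y != yexpected:       # "falling edge": check the output
            print(f"Error: inputs = {a}{b}{c}, got {y}, expected {yexpected}")
            errors += 1
        vectornum += 1
    print(f"{vectornum} tests completed with {errors} errors")
    return vectornum, errors


result = run_vectors(dut, io.StringIO(make_vector_file(golden)))
```

Changing the tests now means changing the file, not the testbench. The weak point moves to the vector source: if the model that generated the file is wrong, the testbench will happily confirm the wrong behavior.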
It's a high-level abstraction. Here it was easy, but for many real designs coming up with the golden model is not that easy, and there is a team in the design process that works on building this golden model and making sure that it is error free. Okay, so a golden model is usually easier to design and understand, and a golden model is much easier to verify, we hope. The automatic testbench works like this: you generate test patterns, you input them to your device under test as well as to your golden model, and then you check the equality of the outputs of these two models. The challenge is that we need to generate inputs to the design. These can be sequential values to cover the entire input space, or random values, or something in between, and there are many other ways as well. This is a way of implementing this testbench. Now you have two modules that you need to instantiate: your device under test and also your golden model. You give them the same input vector, a, b, c, and then you need to check whether the outputs agree. If the output of the device under test is not equal to the output of your golden model, then you need to display that some error is happening. So as pros, the output checking is fully automated, and you could even compare timing using a golden timing model. People have also come up with golden timing models to check the timing of your circuit; that's not in the scope of this course, but you can search for it if you're interested. And it's highly scalable, to as much simulation time as is feasible, and that leads to high coverage. The reason that scalability is important is that we want high coverage in our tests. Coverage means how many test vectors you could input and how much of the output you could check, and in the end high coverage is what you are after.
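A minimal Python sketch of the automatic testbench, again as a stand-in for the Verilog version and again assuming the silly function is y = ~b & ~c | a & ~b. The same inputs go to the device under test and to the golden model, and only their outputs are compared, so no expected values are written by hand. The deliberately broken `dut_buggy` (a missing inverter) and the helper names are hypothetical, added only to show a mismatch being caught.

```python
import random

# Python stand-in for an automatic testbench with a golden model.
# Assumption: the intended function is y = ~b & ~c | a & ~b.

def golden_model(a, b, c):
    """High-level reference, equivalent to a single assign statement."""
    return (1 - b) & (1 - c) | a & (1 - b)


def dut_ok(a, b, c):
    """Gate-level device under test, wired correctly."""
    b_n, c_n = 1 - b, 1 - c
    return (b_n & c_n) | (a & b_n)


def dut_buggy(a, b, c):
    """Hypothetical faulty DUT: the second AND gate uses b instead of ~b."""
    b_n, c_n = 1 - b, 1 - c
    return (b_n & c_n) | (a & b)


def sequential_patterns():
    """Exhaustive input generation: every (a, b, c) combination in order."""
    for n in range(8):
        yield (n >> 2) & 1, (n >> 1) & 1, n & 1


def random_patterns(count, seed=0):
    """Random input generation with a fixed seed for repeatability."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)


def run_automatic(dut, golden, patterns):
    """Feed identical inputs to both models and collect disagreeing inputs."""
    mismatches = []
    for a, b, c in patterns:
        if dut(a, b, c) != golden(a, b, c):
            mismatches.append((a, b, c))
    return mismatches


print("correct DUT:", run_automatic(dut_ok, golden_model, sequential_patterns()))
print("buggy DUT:  ", run_automatic(dut_buggy, golden_model, sequential_patterns()))
print("random run: ", run_automatic(dut_ok, golden_model, random_patterns(100)))
```

For three inputs the sequential generator covers the whole input space; for wider designs only the random or directed generators remain practical, which is where input selection becomes the hard part.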
With high coverage you are fairly confident that your design is working correctly. There is also a better separation of roles: now we can separate the designers, so some designers can work on the device under test and some others can work on the golden model, and the testing engineer can focus on important test cases instead of output checking, for example. As cons, creating a correct golden model may be very difficult, and coming up with good test inputs may be difficult as well. However, even with automatic testing, how long would it take to test a 32-bit adder? A 32-bit adder means that you want to add two numbers and each is 32 bits, so overall you have 64 inputs, which means 2^64 possible input patterns. If you test one input pattern in one nanosecond, you can test 10^9 inputs per second, meaning this many inputs per day and this many per year, and you would still need about 585 years to test all possibilities. That clearly shows that brute-force testing is not feasible for most circuits, and we need to prune the overall testing space. That's why we can use, for example, formal verification methods to choose the important cases. So you can benefit from different ways of ensuring correctness and combine them in order to have a more intelligent way of verification. As a conclusion, I would say that verification is a hard problem, and that's why, as I said, about 70% of the design time on average is allocated to testing and verification of a modern processor, for example. Any questions? And all these tested devices may still fail after they come to the market, because you cannot have 100% coverage in your tests. Okay, let me also quickly go over timing verification, and then we will conclude. Any questions? Good. For timing verification we have several approaches. We can do it in high-level simulation, because we can model timing in the HDL itself.
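The brute-force estimate for the 32-bit adder can be reproduced with a short calculation. The sketch below assumes, as in the lecture, one input pattern per nanosecond, and additionally assumes a 365-day year; the function name is illustrative.

```python
# Back-of-the-envelope cost of exhaustively testing a 32-bit adder.
# Assumptions: one input pattern per nanosecond, 365-day years.

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365


def years_to_exhaust(input_bits, tests_per_second):
    """Years needed to apply every possible pattern on input_bits inputs."""
    total_patterns = 2 ** input_bits
    tests_per_year = tests_per_second * SECONDS_PER_DAY * DAYS_PER_YEAR
    return total_patterns / tests_per_year


tests_per_second = 10 ** 9                       # one pattern per nanosecond
tests_per_day = tests_per_second * SECONDS_PER_DAY
tests_per_year = tests_per_day * DAYS_PER_YEAR
years = years_to_exhaust(64, tests_per_second)   # two 32-bit operands

print(f"patterns to test:  {2 ** 64:.3e}")
print(f"patterns per day:  {tests_per_day:.3e}")
print(f"patterns per year: {tests_per_year:.3e}")
print(f"years needed:      {years:.0f}")
```

Each extra input bit doubles the total, so adding a carry-in to the same adder would already push the estimate past a thousand years.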
With the #-delay statement in the device under test, which is useful for hierarchical modeling, we can insert delays in flip-flops, basic gates, memories, etc. I also briefly mentioned that in the cell libraries that you have, there are delays for the different cells. After you synthesize your HDL code to the gate level, that is, to gates that are defined in your cell library, you can do a simulation, which is the post-synthesis simulation. If you open that Verilog code, you can see all these # values: the minimum, typical, and maximum latency of gates, flip-flops, everything is annotated in that Verilog code, to make sure that when you are simulating this circuit you are considering the timing of all these elements. That can help you verify, using simulation, whether there is a timing violation in your circuit or not. This is usually not as accurate as real circuit timing, of course, because when we use these numbers we are abstracting the latency, and sometimes we are actually considering worst-case numbers, to be sure that my design would work at least as well as the latency that I'm considering. So you sometimes use the worst-case latency when you embed that latency in your Verilog code. That's why, in the end, real circuit timing would be best, but you cannot build your circuit for every timing verification. There is also a way in between: you can use a circuit-level simulator, like HSPICE for example, where you synthesize and then simulate the SPICE model of your circuit, which is a model in terms of transistors. This SPICE model can model all the resistances, capacitances, and transistors that you have, and then you can get a much better understanding of your latencies. But unfortunately there is no one
general approach; it is very design-flow specific. Your FPGA, ASIC, etc. technology has special tools for this. For example, Xilinx Vivado, which we are using for the lab, is for FPGAs. For ASICs, for example, we have Synopsys or Cadence tools; these are different companies, and their tools can be used for VLSI designs.

The good news is that the tools will try to meet timing for you. So we just define the setup time and hold time. I mean, the hold time is actually defined by your cell library, essentially, but the setup time, for example, you can also define. You also define the frequency you want your design to work at, and you can also define the clock skew. These tools do their best to meet the timing you are looking for, and in the end they usually provide a timing report or timing summary. In this report you can get some understanding of the worst-case delay paths, like what the worst-case latency is and from which input to which output. We can also get the maximum operating frequency and any timing errors that were found.

The bad news is that the tool can fail to find a solution. The desired clock frequency might be too aggressive and can result in a setup time violation, for example on a particularly long path. Or you might have too much logic on the clock path, which can introduce excessive clock skew; as I said, you should avoid having too much logic on your clock path. Timing issues with asynchronous logic can also cause problems. The tool will provide, hopefully, helpful errors, and the reports will contain the paths that fail to meet timing. When it fails, you can query some of these reports, and you're going to see that, okay, the setup time failed on a path from input a to output y, for example. That gives you a clue to go into your circuit and try to optimize your design in order to
make sure you finally meet your desired frequency, for example, because you can actually help your tool with that. These reports can give you a place from where to start debugging.

So now the question is: how can we fix timing errors? Meeting timing constraints, unfortunately, is often a manual and iterative process, and meeting a strict timing constraint, like in a high-performance design, can be tedious as well. We can try synthesis and place and route with different options. As I said, these tools use a lot of heuristics when they optimize and try to meet your constraints, so sometimes you can just repeat the run, and that repeat can maybe fix the problem. When you repeat and start with a different random seed, the heuristic may work a bit differently, and then you might be lucky, everything would be fine, and you're going to get what you wanted. You can also provide some hints to these tools for place and route, which can be used in advance. For example, for place and route on FPGAs, there are some people who go into the place-and-route output file and try to edit it by hand to reduce latencies. So you can try that, but in the end that may not work.

You can also manually optimize the reported problem paths. You can simplify complicated logic, and you can split up long combinational logic paths. For example, if you have several Mealy machines and you're connecting all these Mealy machines together, you have a very, very long combinational circuit, and that can affect your timing. So you may decide, okay, now I'm going to use a Moore machine, and that can fix a lot of problems for you. So splitting up long combinational logic paths is one way, and it's not only by changing from Mealy to Moore: you can also add registers in between.
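To make the idea of adding a register in between a bit more concrete, here is a minimal sketch in Python. It assumes the usual setup-time constraint for a path between two flip-flops, clock period >= t_pcq + t_pd + t_setup + t_skew. All the numbers, the a-to-y path delay, and the helper names are hypothetical; they are not taken from any real tool or timing report.

```python
# Hypothetical flip-flop and clock parameters, all in picoseconds.
T_PCQ_PS = 50     # clock-to-Q propagation delay (assumed value)
T_SETUP_PS = 60   # setup time (assumed value)
T_SKEW_PS = 20    # clock skew budget (assumed value)


def min_clock_period_ps(t_pd_ps):
    """Smallest clock period that satisfies the setup-time constraint
    for a register-to-register path with combinational delay t_pd_ps."""
    return T_PCQ_PS + t_pd_ps + T_SETUP_PS + T_SKEW_PS


def setup_slack_ps(t_pd_ps, clock_period_ps):
    """Positive slack: the path meets timing. Negative slack: setup violation."""
    return clock_period_ps - min_clock_period_ps(t_pd_ps)


def split_with_register(t_pd_ps, first_fraction=0.5):
    """Model inserting one register into a path: the logic is cut into two
    stages, and each stage becomes its own register-to-register path."""
    first = t_pd_ps * first_fraction
    return [first, t_pd_ps - first]


target_period_ps = 1000   # 1 GHz target clock (assumed)
a_to_y_delay_ps = 1200    # hypothetical failing path from input a to output y

slack_before = setup_slack_ps(a_to_y_delay_ps, target_period_ps)
stages = split_with_register(a_to_y_delay_ps)
slack_after = min(setup_slack_ps(d, target_period_ps) for d in stages)

print(f"before split: slack = {slack_before} ps")
print(f"after split:  stages = {stages}, worst slack = {slack_after} ps")
```

In this sketch the single long path has negative slack at the target period, while each of the two shorter stages has positive slack. The price is that the result now takes one more clock cycle to come out, and a real split is rarely an exact half-and-half cut.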
So you can actually divide the combinational circuit into two by adding a register in between, and you're going to see a lot about that when we talk about pipelining in the processor microarchitecture. Also recall that to fix hold time violations we might need to add more logic.

But there are some principles that we need to follow when we want to meet timing constraints, so let's go back to the fundamentals. Clock cycle time is determined by the maximum logic delay, the maximum logic delay that we can accommodate without violating timing constraints. There are several good design principles that we should follow. We're going to look into these design principles again later on, when we look into microprocessor design, but I'm going to present them here as well.

Critical path design: we want to minimize the maximum logic delay, because this maximizes your performance; your clock frequency is defined by your critical path, essentially. And you want to have a balanced design: you want to balance the maximum logic delays across different parts of the system, between different pairs of flip-flops. You might have, for example, one flip-flop, then a combinational circuit, another flip-flop, a combinational circuit, another flip-flop. Your clock frequency is defined by the longest combinational circuit that you have. You might have some combinational circuits that are quite fast, but your clock frequency cannot be determined by them, because you have other combinational circuits that are quite slow. So it's very important to make sure that your design is balanced, for example that all these combinational circuits have a latency of around 200 picoseconds. It's not good if you have some combinational circuit in between whose latency is, for example, 50, and some whose latency is, for example, 200 or 500 picoseconds. Then your design is not balanced, and this is one of the principles that you need to follow
in order to have a very good design.

Another design principle, one we like to call bread and butter, is that we optimize for the common case, but we should also make sure that the non-common cases do not overwhelm the design. This is going to be more detailed and easier to understand when we get to microprocessor design, but, for example, there are some input vectors or input patterns that you know are very, very likely to happen, and there are some input patterns that don't appear much, or may not appear at all. If you have that information, you can try to optimize your design for the input vectors that are very likely to happen, and that can provide overall better performance. That's why we say that we optimize for the common case, but we should also make sure that those non-common cases do not overwhelm the design.

Okay, so we learned a lot in this lecture. We covered timing in combinational circuits and timing in sequential circuits, and we also looked briefly into verification: how to functionally verify, and timing verification. That brings us to the end of today's lecture, and also to the end of the digital design part of this course. Next week we're going to start with computer architecture, and we're going to start with fun topics that are quite interesting. Any questions? Then I wish you a nice weekend. See you next week.
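Going back to the critical path and balanced design principles from this lecture, here is a minimal Python sketch of why balance matters. It uses the example stage latencies mentioned above (50, 200, and 500 picoseconds) and compares them with a hypothetical balanced design that has the same total logic delay. The function names are made up for illustration, and flip-flop overhead (clock-to-Q, setup, skew) is ignored by default, which is a simplifying assumption.

```python
def clock_period_ps(stage_delays_ps, register_overhead_ps=0):
    """The clock period is set by the slowest stage (the critical path).
    register_overhead_ps is an assumed lump sum for clock-to-Q, setup, skew."""
    return max(stage_delays_ps) + register_overhead_ps


def max_frequency_ghz(period_ps):
    """Convert a clock period in picoseconds to a frequency in GHz."""
    return 1000.0 / period_ps


# Unbalanced: fast stages cannot speed up the clock, the 500 ps stage decides.
unbalanced = [50, 200, 500]
# Hypothetical balanced version with the same total logic delay (750 ps).
balanced = [250, 250, 250]

for name, stages in (("unbalanced", unbalanced), ("balanced", balanced)):
    period = clock_period_ps(stages)
    print(f"{name}: period = {period} ps, "
          f"max frequency = {max_frequency_ghz(period)} GHz")
```

With the same amount of total logic, the balanced version in this sketch can be clocked twice as fast, because no single stage dominates the clock period. Real designs cannot always be cut into equal pieces, so this is an idealized picture.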
