[Lecture 8] Understanding Instruction Set Architecture

Is that the reason? Okay, there you go. Maybe human error, but it turned off by itself. You told me that you were hearing me. Okay, people did not miss anything if they didn't hear online; we're just getting started with some delay. Testing didn't work well. Now can you hear me online? Hopefully people can raise their hands online and we can do the testing with people online. Okay, nobody says anything, so hopefully they're hearing. If they're not hearing, they'd better be in class, right? I'm just joking, because we have other people attending who cannot come to class.

Okay, so office hours, some reminders: extra credit assignments one and two. Hopefully you want to get extra credit. Don't let LLMs do that work for you. We can actually see the difference when people write reviews using LLMs. This happens in the scientific community also: you submit a paper, original scientific work, and some person reviews the paper, but it turns out that person didn't review the paper, because you can tell an LLM very likely reviewed it, or this person was like an LLM. Both of these are not good in my opinion, because in the end you get something that's not very interesting, not very deep. You're looking for feedback from an expert who is supposed to give you really good quality feedback on a paper that you've written, and what you get is some generalized comments that are not very interesting. Okay, I'm going to stop harping on LLMs at this point, but we need better ways of developing these things. Let me finish it that way.

But we're not going to talk about those. We're going to talk about ISAs, instruction set architectures, and I'm going to go through some slides to remind you what we were talking about. We covered the von Neumann machine; I'm going to go through this very quickly again. We started looking at instruction set architectures, and these are your readings. This was the von Neumann model: you
remember the five components. We went through all of them, and we said there are also two major hallmarks of von Neumann machines: stored program and sequential instruction processing. You should remember this all the time. We started going through LC-3, which is a von Neumann machine. The von Neumann model is an execution model, and an ISA, an instruction set architecture, implements a version of that execution model, obeying the von Neumann principles. This picture is a microarchitecture that implements that ISA, and this is one implementation, as I said yesterday. We were going through some of it, and we started going through some instructions. We specifically covered an operate instruction, the ADD instruction for example, and a load instruction.

Today we're going to branch out, which is kind of synergistic with control flow: branching out, right? You change the control flow, change the sequence of execution. We're going to see more instructions today, basically. But again, just to remind you, instructions go through a processing cycle with multiple phases of execution: fetch, decode, evaluate address, fetch operands, execute, and store result. Then you go back to fetch to fetch the next instruction. That's the sequential execution, right? You keep doing this over and over for every instruction in the program until you get to the end.

We discussed that we have a stored program computer, meaning you don't distinguish between instructions and data. Whether a value fetched from memory is interpreted as an instruction depends on when that value is fetched in the instruction processing cycle. I said this yesterday but I didn't put it into words; now it's put into words on the slide also. For example, when you're fetching from memory and you're in the fetch cycle of the finite state machine you have seen, you interpret that memory value as an instruction: at that point, whatever you fetch from memory is interpreted as an instruction and put into the instruction register, the IR, if you remember. But if you're in the fetch
operand stage of the finite state machine, in the instruction processing cycle, and you're doing a fetch from memory, then you interpret the memory value as data, right? So it could be exactly the same memory location. The memory address you generate and put into the memory address register could be exactly the same in the fetch state and the fetch operands state; how you interpret it depends on whether you're doing it in the fetch stage or the fetch operands stage.

Okay, and this was one of the last slides we covered. Not the last one; the last one is going to be the next one. We were actually going into the control flow instructions that change the sequence of execution, so that you can do hopefully more meaningful things like decision-making in your programs. So this is an unconditional branch, or jump. You can see that this operates on a base register. This is the encoding; we went through some encodings also yesterday. This is the opcode and this is the base register. The rest are set to zeros, meaning that they don't matter for the purpose of this instruction. The semantics of this instruction is: change the program counter, or update the program counter, with the value that's stored in register 2, the base register in this case. This is called register addressing mode. We're going to see addressing modes later today, but this is an example of a jump, right? And there are other examples of jumps that we will not necessarily see, but we may look at them in later lectures.

So how does this work? If you want to implement this in the microarchitecture, you need to provide the datapath, a path from the register file that takes one register and places it into the program counter, and that's what we do. Basically, you access the register file and get source register 1 out. And how do you know it's source register 1? You basically index one of the source registers with bits coming from the instruction register over here, and you know
exactly which bits over here: these three bits. Once the data is out, you ensure that in the execute stage of this instruction the program counter is loaded, meaning you write-enable the program counter so that the data flowing from source register 1 goes into the program counter. Make sense? You should know essentially the logic that goes into all of this, because we have actually developed the registers and wires, and you know how a register file is accessed also. You're going to see more and more of this later on. Does that sound good? Okay, so that's a jump.

Okay, so this is the slide that we stopped at, actually. I will maybe go through this relatively quickly. This is the state machine, or part of the state machine, that we looked at. It's not exactly the same state machine, because this is a more instructional version, but essentially this is what you do during fetch. Every instruction goes through these three states. In the first state, state 1, you load the memory address register with the program counter, and for this to work, the sequential logic should assert GatePC and LD.MAR. If you remember, this is GatePC over here: you take the value in the program counter, this tri-state buffer is enabled so that the value in the program counter gets put on the processor bus, it's flowing, and you ensure that it goes to the memory address register. We went through this yesterday; I'm going through this again just to remind you. But to make sure this happens, you also need to write-enable the MAR in this state, meaning the LD.MAR signal is asserted.

Okay, so the key takeaway over here is that you have a finite state machine that's controlling the execution of every instruction in the machine, and every instruction in this case goes through states 1, 2, and 3. In state 1 you have some control signals asserted, in this case GatePC and LD.MAR, such that the program
counter gets put into the memory address register. Then you also update the program counter by incrementing it by one. This is LC-3: it's word-addressable and every instruction is a single word, so you go to the next word. So there's something else that's asserted, if you go back to that picture over here. Switching between these two is too much overhead, but essentially what we do in the fetch state is take the program counter incremented by one, and you need to have datapath elements to enable that. The PCMUX, this multiplexer, should select this input, and LD.PC should be asserted so that you actually increment the program counter by one.

The reason why we increment the program counter by one is that we're kind of done with this instruction, right? Done in the sense that we fetched it. We're going to write the instruction to the instruction register; we didn't fetch it completely yet, we're actually going to fetch it in state 3. But we're done with the program counter: we took the program counter, put it into the memory address register, and we're going to fetch from that address. Now we should move on to the next instruction. This is a design choice that someone microarchitecting the machine made. You could have incremented the program counter somewhere else, essentially, right? It didn't have to be done in this state necessarily, as long as you do it before you start the next instruction, and you don't start the next instruction until you finish this particular instruction that's pointed to by the PC.

Okay, so I'll go through the remaining states a little more quickly. The next state is the memory access state. Basically, you wait until the memory returns the data in the MDR, so you need to make sure that the LD.MDR signal is asserted. But this picture is just instructive, because it doesn't have an arrow over here that should be going back into the same state. This is a state where you stay conditionally if the memory is not ready.
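
The three fetch states (load the MAR from the PC and increment the PC, wait for memory, then move the MDR into the IR) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual LC-3 control logic: the `Machine` class, the `memory_ready` callback, and the list that records signal names are all made up for this example. It follows the word-addressable LC-3 convention, so the PC advances by one.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical LC-3-style machine state, used only for this sketch."""
    memory: dict                 # word address -> 16-bit word
    pc: int = 0x3000
    mar: int = 0
    mdr: int = 0
    ir: int = 0
    asserted: list = field(default_factory=list)  # log of signals per state visit


def fetch(m: Machine, memory_ready=lambda cycle: True) -> int:
    """Run the three fetch states once and return the fetched instruction word."""
    # State 1: MAR <- PC and PC <- PC + 1 (one word per instruction).
    m.asserted.append(("state1", "GatePC", "LD.MAR", "PCMUX=PC+1", "LD.PC"))
    m.mar = m.pc
    m.pc = (m.pc + 1) & 0xFFFF

    # State 2: stay in this state until memory is ready, then MDR <- M[MAR].
    cycle = 0
    while not memory_ready(cycle):
        m.asserted.append(("state2", "wait"))
        cycle += 1
    m.asserted.append(("state2", "LD.MDR"))
    m.mdr = m.memory.get(m.mar, 0)

    # State 3: IR <- MDR.
    m.asserted.append(("state3", "GateMDR", "LD.IR"))
    m.ir = m.mdr
    return m.ir


m = Machine(memory={0x3000: 0x1042})
# Assume memory needs two extra cycles before it reports ready.
word = fetch(m, memory_ready=lambda cycle: cycle >= 2)
print(hex(word), hex(m.pc), len(m.asserted))
```

Incrementing the PC in state 1 mirrors the design choice described in the lecture; the sketch would behave the same if the increment were moved to any point before the next fetch begins.
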
You keep staying in the state until the memory becomes ready. Okay, you will see the full picture in the next slide. Then, when the memory is ready, the data is in the MDR, and in state 3 the finite state machine needs to make sure that the data in the MDR gets put into the instruction register. Let's go back to that picture over here. In this picture, assume that the memory has finished and loaded the data into the MDR. We need to make sure that data flows from the MDR into the IR, the instruction register. To be able to do that, the control signal for this gate, GateMDR, should be asserted such that the tri-state buffer is enabled and the data from the MDR flows onto the wire. After it flows onto the wire, it's flowing here, and we need to capture it in the instruction register. To be able to do that, we need to write-enable the instruction register, so the LD.IR signal is also asserted. Okay, so it's very simple, basically. Of course, as a microarchitect you will have to design this finite state machine, but here someone designed the finite state machine and gave it to us.

Okay, the next state is decode. In this decode state, basically what happens is that the finite state machine checks the opcode, which is the top four bits in the LC-3 ISA, the top four bits of the IR, the instruction register, and based on that it decides which next state to jump to. The next state is determined by the opcode. For example, if it's an ADD opcode, you go and follow the sequence of these states. If it's a load register opcode, you go and follow these states. If it's a jump, like what we have seen, you go and follow these states. So in the case of a jump, you go to a state where the register that we have seen earlier gets loaded into the program counter, which is essentially what we have shown in the previous slide. This is state 63. In state 63, you ensure that the control signals are such that the PC is loaded again, and you actually access
the register file using SR1, which is coming from the instruction register over here. Okay, so hopefully this is clear. If not, you can study it a little bit more. There's no magic involved. This is all based on everything that we've learned in the digital design part of the course, and now a little bit in the microarchitecture part of the course. It's a finite state machine, and the full state diagram is here. This is actually the real diagram; the previous one was kind of a toy diagram to illustrate the decode phase. You don't really need that particular state, actually: you can jump directly into a state where you update the PC with the register, as I will show you right now. So these are the three fetch states, this is the decode phase, and then if this is a jump, if you can read the JMP over here, you directly go to, in this case, state 12, and you load the program counter with the base register. The next state is 18, where you do the fetch for the next instruction. And this is LC-3b. LC-3b is byte-addressable, so you increment the PC by two. These are all things that you need to think about for different ISAs, instruction set architectures. Make sense? But the principles are the same. You can build a state machine like this for MIPS, and you can build a state machine like this for x86 or this computer over here, except there will be a lot of states in that particular computer.

Okay, so that was the instruction cycle. Now let's go into instruction set architectures a little bit. Any questions before that? This all makes sense, hopefully. Okay, so we're going to spend a little more time. We have been talking about instruction set architectures, but we're going to talk a little bit more about other things like data types and addressing modes, and a little bit more on opcodes before that. We have clearly studied ADD and LDR in a little more detail. Even though you may not have realized it, I actually gave examples from ADD and load register, and these are actually using
opcodes. Clearly there are opcodes over here. They're using some addressing modes, which we called out, but we're going to see more addressing modes. And they're also using some data types. For example, this ADD is operating on two's complement integers, which you may have read about in your book. I'm going to briefly mention that also. That's the only data type in LC-3, but instructions could be operating on floating-point data types also, and we're going to have some fun with data types also.

Okay, so what is an instruction set architecture? Essentially, it's really the interface between what the software commands and what the hardware carries out. You saw a version of this in a previous lecture; this is a simplified version. You have written a program and you have a microarchitecture, and the instruction set architecture is really the interface between the software program and the hardware microarchitecture. I'm going to loosen this a little bit later on, but this is actually the hard definition of an ISA. It specifies many things, and as you will see it's actually more than that, but at the basic level it specifies a memory organization, which we have seen. It specifies the address space and the addressability, whether you're word- or byte-addressable for example, and we've seen examples of this. It specifies how many registers you have near the ALUs: in LC-3 it's eight, in MIPS it's 32, and we've seen that also. And then it specifies the set of instructions, and the set of instructions is defined by multiple things: opcodes, data types, and addressing modes, and on top of this the length, format, and encoding of instructions. You've seen all of these in some way. Now let's talk a little bit more about them.

Okay, opcodes. Essentially, there are a lot of trade-offs that you can make when you actually design an ISA. How many instructions do I have, for example? Do I provide thousands or millions of instructions, or do I have only a basic set, maybe 10 or 12? This is actually an interesting trade-off.
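
To make the list of things an ISA specifies more concrete, here is a small Python sketch that records a few of those parameters for LC-3 and MIPS and derives some encoding consequences from them. The `IsaSpec` class and its field names are hypothetical and exist only for this illustration. The numbers are the commonly cited ones: LC-3 has 16-bit addresses, 16-bit addressability, 8 registers, and a 4-bit opcode; MIPS has 32-bit addresses, byte addressability, 32 registers, and a 6-bit opcode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsaSpec:
    """Hypothetical summary of a few ISA-level parameters (illustration only)."""
    name: str
    address_bits: int          # address space has 2**address_bits locations
    addressability_bits: int   # bits stored at each address
    num_registers: int
    opcode_bits: int

    @property
    def register_field_bits(self) -> int:
        # Bits needed inside an instruction to name one register.
        return (self.num_registers - 1).bit_length()

    @property
    def max_primary_opcodes(self) -> int:
        # Upper bound on distinct opcodes before any extra function bits are used.
        return 2 ** self.opcode_bits

    @property
    def memory_bytes(self) -> int:
        # Total addressable memory, expressed in 8-bit bytes.
        return (2 ** self.address_bits) * self.addressability_bits // 8


LC3 = IsaSpec("LC-3", address_bits=16, addressability_bits=16,
              num_registers=8, opcode_bits=4)
MIPS = IsaSpec("MIPS", address_bits=32, addressability_bits=8,
               num_registers=32, opcode_bits=6)

for isa in (LC3, MIPS):
    print(isa.name, isa.register_field_bits,
          isa.max_primary_opcodes, isa.memory_bytes)
```

The width of the opcode field bounds how many distinct instructions the decoder has to recognize, which is one side of the opcode-count trade-off.
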
We're going to cover the trade-off a little bit, but it could be a large set of opcodes or a small set of opcodes. Clearly, if you have a very large set of opcodes, this complicates the hardware design, right? The hardware needs to support a million different instructions. If you have only 10, the hardware needs to support only 10. The hardware can be simpler if the number of opcodes is smaller. That's actually the thinking behind reduced instruction set computers. So MIPS, for example, the ISA that you're going to use in your labs and the ISA we're following, is really a reduced instruction set computer with fewer opcodes.

Okay, for example, this is an opcode that was in HP's Precision Architecture. You can see A * B + C: it's multiply and accumulate, which may not be a bad idea. You put two things together, right? As opposed to just having a multiply and separately having an add, they decided, in addition to having those separate things, let's also have a multiply-add. And you could make it a floating-point multiply-add, which makes it nicer for the machine learning workloads of today. Later in the lectures, when we talk about SIMD and GPUs, we'll also talk about multimedia extensions. These are instructions that operate on many data values at the same time, concurrently. You may have heard, for example, of Streaming SIMD Extensions, Advanced Vector Extensions, and more recently Advanced Matrix Extensions. These are opcodes that operate on many data elements at the same time, which could be useful for matrix operations, for example. There could be very complicated opcodes: the VAX ISA had an opcode to save all the information of one program prior to switching to another program, all of the registers and some of the special-purpose registers. So you could imagine many things over here; the sky is the limit in the end. And if you define it in the ISA, then somebody needs to implement it in hardware. So there are
many trade-offs involved: hardware complexity versus software complexity. Some of these opcodes make the software simpler, in the sense that at the high-level language you have some constructs, right? It could be a nice matrix multiplication. So if you're doing matrix multiplication and you have an opcode that says matrix multiply, that's a very nice fit, right? You don't need to translate it with a lot of effort. If your instructions are not that high level, if they're just, for example, add and shift, then you need to translate all of those matrix multiplication operations to adds and shifts, and that's a lot of work for someone. It could be the programmer, it could be the compiler, but someone needs to complicate their life. So software complexity increases if your opcodes do not match the high-level language. There are also trade-offs like this: if you have simple instructions, simple opcodes, latency is shorter; if you have complex instructions, they may take a long time. We'll learn more about that later on. So there are interesting trade-offs, and I'm going to get to this a little bit more.

In LC-3 and MIPS there are three types of opcodes, and we have seen them actually: operate, data movement, and control. We're going to see some examples. These are all of the opcodes in LC-3. LC-3 is a really simple educational machine. You can see the instructions over here. We're not going to go through all of them; if you're interested in them, you can read the book. But this is the opcode: basically, the top four bits are always the opcode, so this is very easy to decode. You look at the top four bits and you know what you're going to do in the instruction. I mean, there are also extended opcodes, as we will see in a little bit. There's this bit over here that kind of enables you to interpret some part of the instruction. This could have been part of the opcode, but they decided not to put it as part of
the opcode. So there are some bits over here that also determine what the instruction should do, in addition to the opcode.

Okay, this is LC-3b, and they changed it a little bit. This is byte-addressable, and instead of having a NOT they have an XOR. By now you should probably know how to implement NOT using an XOR. I'll let you think about it; I'm going to ask that question again later. So you don't need a NOT operation if you have XOR. NOT is unary, right? Basically there's a single operand. If you want to take the bitwise NOT of something, you could actually do it using a binary operator like XOR, and you can think about it. And there are some opcodes that are not used. In the future they may extend the ISA, for example, right? You may say, okay, a lot of programs may need this opcode, so in a future incarnation of the ISA I'm going to map it to something else.

Okay, so these are the MIPS instruction types. It's actually a little bit more complicated. There are three types of instructions in MIPS: R-type, I-type, and J-type, and we've seen all of those in yesterday's lecture. We've looked a lot into R-type, for example. The R-type opcode is zero, but the real opcode for R-type is really in the funct part, this function field. R-type operations are basically register-to-register operations: you have two register operands and one destination register, and the opcode and function together determine what you do to the registers, right? In I-type you have a source register and then you have an immediate value, and you do an operation on those, and the opcode determines what that operation is. And there's also J-type. This is jump, essentially, as you have seen: you have an opcode and an immediate, and using this huge immediate value you decide where to jump, for example.

Okay, so let's take a look at these R-type instructions a little bit. This is from your book. You can see that the opcode is zero in these R-type
instructions, and once you see that the opcode is zero, the decoder needs to look at the bottom six bits, which are funct, essentially. That defines the operation. For example, if funct is zero you do a shift left logical; if funct is this 26 you do a divide operation. You can see there are many operations over here: there are adds, there are subtracts, there's NOR, there's XOR. So this is what they decided. Again, somebody decided that these should be the opcodes, and those are the ISA designers when they designed MIPS. Make sense? And you can see that these opcodes are not that complicated, right? These are actually relatively simple. You don't see a matrix multiply here. In later incarnations of MIPS people may add them, but they're not here in this incarnation.

Okay, those are opcodes. Let's talk about data types now. An ISA supports one or several data types. Basically, this is how you interpret the data. You access data in the register file and you want to do an add operation: how do you interpret that data value? Is it a signed value, is it an unsigned value, is it a two's complement value, is it a floating-point value? In LC-3, the designers made the decision that it's all two's complement integers, right? Essentially, what is a two's complement integer? It's an arithmetic system where the negative of a binary value X is defined to be the bitwise NOT of X, plus one. This has the nice property that if you add X and negative X, you get zero. In some other arithmetic systems you don't get that. Again, we're not going into computer arithmetic that much in this course, but there is a huge design space of how you do arithmetic using computers, and this is one very commonly used way of doing arithmetic: the negative of a binary value X is the bitwise NOT of X, plus one. If you're interested in this, you can read the H&H chapter that talks about two's complement. MIPS
in addition supports other things: two's complement integers, yes, but also unsigned integers, meaning the integer is treated as unsigned. It's not signed; everything is positive, or you can think everything is negative, right? That's another way of thinking. And it has floating-point arithmetic also. So there are trade-offs involved over here: again, hardware complexity versus software complexity. If you have lots of data types, hardware becomes more complex, but software may be easier, because in software you may actually have different types of things. For example, you may have a matrix as a data type, right? None of these ISAs support matrices, but some of the advanced matrix extensions that are incorporated into general-purpose machines today do support matrices, because of the importance of machine learning. And again, there are trade-offs in terms of latency. If you have a supported data type, latency is low, but if you don't have a supported data type, for example, as I discussed earlier, if you want to do a matrix multiplication but matrix is not supported as a data type and what is supported is only unsigned integers, let's say, you have to convert a full matrix multiplication into many operations on unsigned integers.

Okay, now let me give you a high-level perspective, and then we're going to go into a little bit more detail. This course is about the hardware/software interface, right? We are going to talk a lot about that later on, especially when we go into other execution models. But having different data types in the ISA is a very good example of the trade-off that people make between hardware and software, in other words between programmer and microarchitect. The programmer is who implements software; the microarchitect is who implements hardware. If you have more data types, this is good for the programmer in general. If you have fewer data types, this is good for the hardware designer in
general. So why? Because if you have more data types, it enables better mapping of high-level programming constructs to the hardware. Hardware can directly operate on data types present in programming languages, for example. This way you get a small number of instructions and a smaller code size. Your programs in assembly, in machine code, will be compact, so they can fit in small memories, for example, right? I've already given you examples from matrices. You can have matrix operations: a single instruction can say multiply matrix A and matrix B and store the result in matrix C. Of course there's a description of what these are exactly, but one instruction can unleash huge amounts of work on matrices that could be very large. As opposed to that, if you only have multiply, add, load, and store instructions, like we are going to have in MIPS and in LC-3, and you convert the matrix operations into those smaller pieces of operations, then you have lots of instructions, right? As opposed to having a single instruction that does all the work, you may have millions of instructions that do a matrix multiplication. Think about that; there are energy consequences of this also. Another example could be graph operations. If you're operating on graphs, you may have a graph traversal, for example, and you may have graph traversal operations in the ISA as opposed to individual load, store, and add instructions. Again, you can have much better code size and also much better mapping of high-level programming constructs to hardware.

The disadvantage of these complex data types, or more data types, is much more work for the microarchitect. Now the hardware designer, the microarchitect, needs to implement the data types and the instructions that operate on the data types. Once you have data types, you also need to add instructions that operate on those data types, right? Matrix add, matrix multiply, matrix transpose: all of those are operations. Yes, also for certain operations. I mean, you can make all of
these fast, right? You can make a single add fast, a move fast, but yes, you can make the matrix multiply fast also. So if you have an instruction, then the hardware designer can customize it. Yes, exactly. That's another benefit of more data types, certainly. But it's still more work for the microarchitect; that's the disadvantage. It enables better mapping, and you can customize it, you can reduce the latency, but it's more work. Make sense?

Okay, so another way of looking at it is the semantic level, right? If you think about data types, they're really tightly coupled to the semantic level, or complexity, of the instructions. I'm going to introduce the concept of the semantic gap, which is a good way of thinking about instruction set architecture design. Essentially, semantic gap means how close your instructions and data types are to the high-level language. Pick your favorite high-level language: Java, Python, C (C is still high level even though it's relatively low level), Rust. If you're closer to the constructs in that high-level language, then you have a small semantic gap. So if you have complex instructions and complex data types, you have a small semantic gap. If you have simple instructions and simple data types, you have a large one.

Let me give you a few more examples. You can have an instruction that inserts into a doubly linked list; this makes your life easier, right? Or a hashmap instruction, an instruction that operates on hashmaps, or instructions that operate on key-value stores. VAX actually had some of these, so you had a small semantic gap: you had instructions that operate on doubly linked lists, and you had instructions that operate on multi-dimensional arrays. Today we have matrix operations, which are similar to what we had a long time ago with VAX, for example, and they cater to these heavily used workloads that work on matrix operations, right? Simple instructions we've already seen. Basically, you have primitive operations: load, store,
multiply, add, NOR. Essentially, early RISC machines, reduced instruction set machines, had only the integer data type, for example. Later they figured out: this is not good enough, so we're going to add more data types. If you're too simple, you quickly figure out that this is not enough, because there's a huge software world out there and you didn't really support the software nicely: your semantic gap is too large, you're too far from the high-level language. As a result, you keep adding to the ISA. If you're too close to the software, then your operations may at some point become not so useful, especially if the software world evolves and starts using something else, right? So your ISA may also get outdated. So it's good to think about this a bit.

Okay, so let me illustrate this pictorially. Essentially, we're talking about how close instructions and data types are to the high-level language. You have a high-level language, pick your favorite one again, and then you have control signals at the low level. In the end, your goal is to map the high-level language to the control signals, of course over many clock cycles, right? If you set your instruction set architecture close to the high-level language, you have a small semantic gap, and you're far away from the control signals. A matrix multiply is much farther away from the control signals than a NOT or AND or OR operation, right? If you have an instruction set architecture with simple instructions and simple data types, or few data types, you're closer to the control signals, so you have a large semantic gap. You see the complexity now, right? With the ISA close to the high-level language, this mapping is hopefully easier for the software and harder for the hardware designer. Again, the hardware designer, as your fellow student said, can optimize things; they can still make whatever instructions are defined over here as fast as possible, but they still need to do the work to do that. If you're here, close to the control signals, then the hardware designer's job is easier because you're closer to hardware. Now the
software designer's job is harder, perhaps, because they need to get high performance using very simple operations. Okay, so let me wrap it up. You will see a similar picture for addressing modes also, so we're going to add addressing modes, but usually people talk about complex instructions and simple instructions, complex data types and simple data types. A good definition of a complex instruction is an instruction that does a lot of work, many operations basically, and we talked about some of these. You could, for example, have an instruction that copies one string to another, or you could do that operation using loads and stores only. You could do a fast Fourier transform, for example. Simple instructions: an instruction does little work, but it's a primitive using which complex operations can be built: add, XOR, multiply. So the advantage of complex instructions and data types is you have much denser encoding. Your programs are much smaller, you have smaller code size, which means that you don't need a lot of memory, hopefully, at least for the code. You don't need to fetch that program from far away, from memory and storage, into your on-chip caches, as we will discuss later on, and because everything is smaller you keep things on chip. We're going to talk about caches; this is good for caching, for example, as we will see later on. Also you have a simpler compiler, no need to optimize small instructions as much, and maybe life is simpler for the person who is writing programs, because you can map the high-level language to the ISA more easily. But of course this comes with disadvantages as well. You get larger chunks of work, and as a result the compiler has less opportunity to optimize with complex instructions. But if you also provide the fine-grained instructions like NOT, XOR, multiply, etc., maybe the compiler can choose the best of both worlds; now the compiler is not as simple anymore, though. And clearly you get more complex hardware, because
you need to translate from a high level to control signals, and the optimization needs to be done by hardware. Okay, so this is a summary, essentially. Does this make sense? I didn't go into how to optimize, etc., because that's the subject of some of the later lectures, but you can also take a compiler course if you're interested. Yes? (Student question, partly inaudible: isn't it a huge disadvantage that, if you have multiple really complex instructions, the throughput is going to be really bad? You won't use them all the time, whereas with a few instructions you can use them.) I mean, not necessarily. You can do very similar optimizations for large instructions also. That's performance optimization, and that's the reason why I didn't get into performance optimization here. You can optimize the performance of a chip whether you have complex instructions or simpler instructions. If you have lots of complex instructions, yes, you need more area, perhaps, to support them, but as we will see, you can change the trade-offs also; we will see some ways of changing the trade-offs. It depends on the workloads you're executing. Yes, certainly, with primitive instructions you can map everything to them. And yes, if you have more specialized instructions, clearly they're going to be used fewer times, I agree, but it depends on the workloads you're targeting also. Clearly people are adding these complex instructions today because some of the workloads are so important, and you can optimize those operations because there are benefits to it. So in the end, when you're actually doing the performance optimization, you need to think about what workload you're optimizing for, and you need to optimize for the common case. Okay. Let me give you an example. This is a fun example that I use. This is a data type: binary coded decimal. Do people know what this data type is? It's not in your books, I think. It is supported by the x86 ISA. It is covered in your books, maybe.
Maybe not. Okay, basically you encode each decimal digit with a fixed number of bits. So this is how I would represent the time 10:37:49, and by now you should be able to read this, I hope. This is 10. Essentially, to represent the first digit you need two bits, and this is the least significant bit. To represent the second one you need four bits. Think about why: because there are 10 different values this can take, and four bits gives you 16, so you encode this with a four-bit binary value. This is 37: the 3 is 1 plus 2 over here, so you need three bits over here, and the 7 is 1 plus 2 plus 4. Now hopefully you got the idea. And this is 49: you need three bits again here, 1, 2, 4, and to encode 9 you have 8 + 1. Now if your brain was wired to process this data type, you wouldn't have to go through this exercise; you would just look at this and say this is 10:37:49. But your brain is better wired, either by nature or nurture, to read that. So essentially, maybe our computers are not using this data type natively; it is a data type, though, and people use it. Which clock do you prefer? Now, if you had always been using this one from your birth, maybe you would prefer that one. This is good to think about. Don't say "I prefer the left one because I'm a human"; maybe some other humans would prefer that one if they were using it from birth and never saw the other one. Okay, good to think about. That's data types, and I think that's a great example of a data type.

Okay, addressing modes. This will make things a little bit more complicated. Essentially, an addressing mode is a mechanism for specifying where an operand is located. We discussed this already. In LC-3 there are actually more addressing modes than in MIPS; LC-3 makes a different trade-off in this case, it's more complex. There's an immediate or literal mode: the operand is in some bits of the instruction, and we've seen this; we're going to see more of this. There's a register mode that we've been seeing: the operand is in one of the
general purpose registers. And there are three memory addressing modes, which we're going to see. We haven't seen some of these: we saw base plus offset, we saw PC-relative; actually, we didn't see indirect, which is an interesting mode, but we are going to see that. MIPS actually has pseudo-direct addressing but doesn't have indirect addressing. It has base plus offset also, and it does immediate as well. So there are quite a few of these addressing modes, but it doesn't have indirect, for example. Okay, why do we have different addressing modes? Again the same trade-off; I'll go through this relatively quickly. This is another example: if you have more addressing modes, you can map the high-level programming constructs to hardware better, because some accesses are better expressed with a different mode. You get a reduced number of instructions and reduced code size. For example, if you're indexing arrays, especially multi-dimensional arrays, you may have an addressing mode that makes it easier. You can have indirect addressing modes that make pointer-based accesses easier. You can have an addressing mode for matrix operations: if you want to access one element in the matrix, you can specify x, y coordinates, and if your matrix is multi-dimensional, for example x, y, z, t, p coordinates, if you can think multi-dimensionally, of course, that makes things easier. And if your matrix is sparse, meaning you have a huge matrix but most of the values are zeros, maybe you have a different addressing mode to encode where those zeros are, and your hardware can natively support those addressing modes. Okay, so clearly you can imagine many things, and you will dedicate hardware for these, so that hardware needs to be there. This is the disadvantage again: if you have more of these addressing modes, there's more work for the microarchitect, and also maybe more options for the compiler to decide what to use, so here the compiler actually may become a little bit more complicated as well
if you have too many options. Okay, so this is the semantic gap. I will not talk about this again, but essentially what I've done over here is to add addressing modes also. So your semantic gap depends on what your instructions are, what your data types are, and what your addressing modes are. Okay, there are many other trade-offs in ISA design, which we're not going to talk about, right now at least. I'm going to talk about instruction types and addressing mode types. So let's jump into operate instructions a little bit more. In LC-3 there are three operate instructions. NOT, which we discussed, is a unary operation: one source operand, and it executes bitwise NOT. There's ADD and AND; they are binary operations, two source operands, and we said that the data type for ADD is two's complement, and AND is bitwise, actually, SR1 AND SR2. In MIPS there are many more. There are many R-type instructions that I showed you earlier; I don't want to talk about everything, of course. There are I-type instructions, meaning there are versions of these with one immediate operand and one register operand, and there are F-type operations, which are floating-point operations. And again, I'm going through this relatively quickly because we've seen examples of this and these are not difficult concepts. But let's take a look at one example. This is NOT in LC-3: you basically take the NOT of R5 and store the value in R3. And again, we've gone through a similar exercise; you can figure this out easily by looking at a manual, the LC-3 manual, which is your book. NOT is an easy operation: you take the value in the source register and bring it to the ALU; the ALU performs bitwise NOT, and you get that from the finite state machine, meaning the decoded version of the instruction register specifies that this is a NOT and generates a control signal saying the ALU should perform a bitwise NOT, and you get the bitwise NOT of the source register into the destination register. So this is all done in the microarchitecture using control
signals. So there's no NOT in MIPS. How is it implemented? Okay, maybe I should... yes? Yes, exactly: XOR with all ones, that's one way of implementing it. Another way of implementing it is using NOR; you can think about that also. Basically, you could do it multiple ways in MIPS, actually, so you don't need a NOT in MIPS. That's the reason they don't have NOT in MIPS, and that's the reason they removed NOT in LC-3b also: LC-3b replaces NOT with an XOR. They don't have a NOR either, but again, I think you answered both of the questions over here; you could do a similar thing with NOR. Okay, so operate instructions: we're already familiar with ADD and AND with register mode, and R-type in MIPS. Now let's see how we can do things with immediate operands. We'll use subtraction as an example, because there is some instructive value to subtraction, especially with two's complement values. Let's take a look at how subtraction is implemented in LC-3 and MIPS. So recall, we have seen this already: this is a register-to-register operation. If the opcode is ADD, this is in LC-3, if the opcode is 0001, you add source register 1 to source register 2 and store the result in the destination register. If it's AND, you do a bitwise AND between source register 1 and source register 2. Okay, you've seen this before, yesterday. Now let's take a look at how this is done with immediates. Now we're going to make use of this bit over here, bit 5. It's called a steering bit. Let me finish this and we'll take a break. So bit 5 is a steering bit. Here you see that these are all set to zero; bit 5 is the important one over here. Now, if the opcode is the same opcode as in the register mode, 0001, but this bit is one, you interpret the lower five bits as an immediate, as some literal value that's encoded inside the instruction; you don't go to a register. So essentially what you do is you take the value in source register 1 and add to it, because the opcode is
add, this immediate, five bits, sign-extended. Sign-extended means: if you have a two's complement value that's five bits, the top bit tells you whether it's negative or positive. If it's zero, it's positive; if it's one, it's negative. Sign extension means that you take the top bit and replicate it for the remaining 11 bits; that gives you the full 16-bit value. Okay, so that's the idea over here. If the opcode is AND, which is 0101, you take the source register and you do a bitwise AND between that and the sign-extended imm5 value from here. Okay. So the key is basically that the same opcode is used, but this bit is an extension of the opcode. Your book actually talks about it as a steering bit; it's really an extension of the opcode. By just looking at the opcode, you don't exactly know what to do; you also need to look at this bit. Does that make sense? So when you're decoding an instruction, bits 15 to 12 give you some information, but not the full information about exactly what you will need to do. It's an encoding issue. So let me finish this and then we'll take a break. Essentially, this is the encoding, and you can write assembly like this. You can see that I can express a negative value, because it's a two's complement five-bit value over here. You can easily express -2 over here; it looks like this, this is the negative imm5 value. You can convince yourself that in two's complement arithmetic that is -2. And this is what will happen in the hardware. Essentially, the source register is R4 in this case, so you get the value in R4 and it goes into one input of the ALU. The other input of the ALU is selected from either the register file or the instruction register, which contains the bottom five bits that we want, and we have sign extension logic that replicates the top bit of those five and makes it a 16-bit value, and then you have a multiplexer.
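As a rough illustration of the steering bit and the sign extension just described, here is a minimal Python sketch of how an LC-3 ADD or AND could be decoded and executed. The function names `sext` and `execute_operate`, the list-of-registers model, and the choice of R1 as the destination in the example are assumptions made for illustration; they are not taken from the slides.

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of `value` to a 16-bit word."""
    mask = (1 << bits) - 1
    value &= mask
    if value >> (bits - 1):          # top bit of the field is 1: negative
        value |= 0xFFFF & ~mask      # replicate it into the upper bits
    return value


def execute_operate(instr, regs):
    """Execute an LC-3 ADD (0001) or AND (0101) on a list of 8 registers."""
    opcode = (instr >> 12) & 0xF     # bits 15..12
    dr = (instr >> 9) & 0x7          # bits 11..9: destination register
    sr1 = (instr >> 6) & 0x7         # bits 8..6: first source register
    a = regs[sr1]
    if instr & 0x20:                 # steering bit (bit 5) is 1: immediate
        b = sext(instr & 0x1F, 5)
    else:                            # steering bit is 0: second register
        b = regs[instr & 0x7]
    if opcode == 0b0001:
        regs[dr] = (a + b) & 0xFFFF
    elif opcode == 0b0101:
        regs[dr] = a & b
    else:
        raise ValueError("not an ADD or AND instruction")


regs = [0] * 8
regs[4] = 10
# ADD R1, R4, #-2 -> 0001 001 100 1 11110 (destination R1 is an assumption)
execute_operate(0x133E, regs)
print(regs[1])                       # prints 8
```

The `if instr & 0x20` branch plays the role of the multiplexer in the data path: the same opcode is decoded either way, and bit 5 alone picks the second ALU input.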
Let's guess what this multiplexer is controlled by: it's bit 5 from the instruction register. Basically, if bit 5 is zero, take the value from the register file; if bit 5 is one, take the value from this data path element, which is the sign-extended last five bits of the instruction register. And this way you can implement the full ADD: depending on whether bit 5 is one or zero, this still executes the right thing. If bit 5 is one you get the sign-extended immediate; if bit 5 is zero you get the register. Hopefully that makes sense. Okay, so this is where it sits in the data path, and this is where you see bit 5 over here. I think this is a good place to stop. I'm going to show you an example of how to do this in MIPS, and then we're going to look at the trade-offs in terms of subtract and add. Let's be back when the bell rings, and we're going to continue with this.

Okay, you can hear me, hopefully, right? So remember that we were talking about operate instructions with literals or immediate values encoded inside the instruction. One of the reasons, or a major reason, why this is provided in instruction set architectures is to make sure you don't waste a register. So if you know that you're decrementing or incrementing a register by some amount, for example, you can encode that information inside the instruction itself. That way you don't waste a general purpose register to store a +1 or -1. So it's good to think about the trade-offs. In fact, the x86 ISA has an increment instruction; it's basically a very specialized instruction that does increment. This is not an increment instruction; it's a more general purpose instruction, because you have five bits of immediate that you can use to do more than increment: you can add any five-bit value in two's complement. But you save a register, essentially, that way. But of course, if you need to
add a known value that's larger than what can be represented with five bits, then you need to store that in a register and then do a register-to-register add; you cannot use an immediate. So that's the trade-off with literal or immediate addressing modes. Okay, we've seen how this works, and we have seen that we're complicating the data path so that we can actually execute this instruction. It's not that bad, but it's still complicating the data path. Now let's take a look at a similar instruction in MIPS. These are I-type MIPS instructions; they actually have different opcodes than R-type. You essentially have two register operands and an immediate; some operate and data movement instructions are like this. It looks like this: in MIPS, instructions are longer, as you know; the opcode is six bits, and you have an immediate that's 16 bits. The opcode is the operation, rs is a source register, and rt could be the destination register in some instructions, like add immediate or load word; it could be a source register in others. So now it's not actually very clean, as you can see: you're using the same encoding, where the same bits, rt, these five bits over here, refer to either a source register or a destination register. You will see this in LC-3 also. So you will need to have data path elements that take those bits and route them to the right place in the register file, so that you can index into the register file. Okay, the most important thing for our purposes right now, because we're discussing literal values, is this literal or immediate: you can express a 16-bit immediate over here. So this is an example, add immediate, and you can see that the opcode is different from add; it's not add anymore, it's addi, add immediate. These are the field values if you want to add 5 to source register $s1. In this case the source register, rs, is really the 17 over here, and the destination register is 16. Okay, so basically you take rs and
add to it a sign-extended immediate, which is very similar to what we have done for LC-3, and store the result in rt. And this is the machine code that corresponds to all of this; this is what is stored in the memory, as you know. Okay, now why are we looking at this? Because one of the benefits of looking at this immediate is subtraction. The question is: do we want to have a subtraction in your ISA? In MIPS assembly you can express a subtraction in different ways. So this is the high-level code; now we have two operations, one addition and one subtraction, as you can see, and you store the result in a. This is one way of doing this in MIPS: assuming these are in registers, you add source register $s0 and source register $s1, put the result into a temporary register, and then subtract $s2 from that and put the result in $s3. So this is a subtract instruction; that looks nice. In LC-3 you don't have a subtract instruction, so what you do is this, and this essentially accomplishes the same thing, except the registers are different, of course. What you do is you add b and c, which are stored in R0 and R1, and store the sum into R2; then you take d, which is stored in R3, and you take the two's complement of it, meaning you negate it: you take the NOT of R3 and add one to it, that's the two's complement negative representation; and then you add that negative value to the b + c that you computed and stored into R2. Okay, so this is the same code implemented in LC-3. Yes? You could do that also, yes, that's possible; this is one way, the way people decided to do it over here. Okay, so you can see now immediately a trade-off, and this is actually the same trade-off that I was talking about: do you want complex instructions or simple instructions? Subtract is a complex instruction compared to NOT and ADD. Makes sense, right? If your data types can be converted to a negative value using NOT and ADD, which is the case for almost all data types, then you can do
whatever this is; you can do everything with NOT and ADD, you don't need a subtract operation. The benefit of subtract is, of course, now you can see, the denser encoding. I kind of told you that, but now you can see it even with a simple operation like subtract, which is more complex than NOT and ADD: you have two instructions here as opposed to four instructions here; your encoding is better over here. Okay, performance is a different thing. You need to optimize the performance of both, probably, and I don't know which one would perform better; it depends on other constraints. Maybe you can optimize subtract to be very fast, but maybe you can optimize NOT and ADD to be as fast. Okay, so you get more instructions in LC-3, but the control logic is hopefully simpler because you don't have as many instructions. So let's take a look at subtract immediate. This is our high-level code, b minus 3. Let's play some more games. In MIPS assembly you can express it this way, with a subtract immediate, because you have a subtract instruction, with $s0 and 3. But it's really not necessary, because you could actually do it this way also: in MIPS you could do add immediate with -3. In LC-3, essentially, you do the ADD. LC-3 doesn't have a separate add immediate; it basically uses the same mnemonic to specify ADD versus ADD immediate, and you distinguish them based on the operand here, but those are all programming constructs also. Okay, any questions? So hopefully that gave you an example of a slightly more complex instruction versus simpler instructions. Now you can generalize this to much more complicated instructions.

Okay, now let's talk about data movement instructions. We're going to deal with accessing memory, and we're going to complicate things more, because it's going to add more addressing modes: how do you specify the location of the data that you want to get to? So in LC-3 there are actually seven data movement instructions. We're going to see essentially all of them; we're going to focus more on the
loads. And the format of load and store instructions looks like this. Basically you have an opcode, the top four bits; you have either a destination register or a source register after that, the next three bits. Those are interpreted as a destination register if you're loading and as a source register if you're storing, because storing takes a source register and stores it into memory, while loading takes a memory location and stores it into a destination register. And then the rest are address generation bits, and how those are interpreted depends on the opcode. There are four ways to interpret these bits, and these are essentially the addressing modes that we have. We've seen some of these; we're going to see more. These are the names of the modes: PC-relative, indirect, base plus offset, immediate. And they all have either restrictions or downsides, or upsides; let's see. In MIPS there are only base plus offset and immediate modes; there's no indirect mode, and there's no PC-relative mode, at least for data movement instructions. There's a PC-relative mode for jump, J-type instructions, that we have seen in the past and are probably going to see soon again. Okay, let's take a look at this PC-relative addressing mode. This is load and store. I thought we had already seen an example of this, but actually we have not, sorry. This is the opcode; the opcode can be 0010 or 0011, and that distinguishes load versus store. The next three bits are either the destination register or the source register, depending on whether you're doing a load or a store. And this is how you calculate the address; this is actually the total semantics of the load. The address calculation is done by taking the PC, the program counter, which is the address of the instruction, actually the address of the next sequential instruction, because the ISA designers specified this as the incremented PC. Take the incremented PC; they had in mind
somebody incrementing the PC while you're fetching the instruction, so when you're about to execute the instruction you have the incremented PC at hand, and you add to it the bottom nine bits, sign-extended. We already learned about sign extension. Essentially, this way you can go to a memory address that's relative to the instruction at this location in memory: you can go up by about 255 or down by about 255; I'm going to give you the exact numbers soon. And then you take the value in that memory location and put it into the destination register. So this is a limited addressing mode, as you can see. Store does essentially the same thing to calculate the address, but you take the value in the source register and put it into that memory location. Okay, we're going to see the limitation soon. We're going to look at the load. This is the load, and let's assume that the offset is x1AF, nine bits, and this is the machine code that you have. This is how it's executed; let's take a look at this before we go into the downsides. So you basically fetch the instruction into the instruction register. This is the incremented PC; we're going to assume that that's the incremented PC for that instruction. To execute this instruction, you first calculate the address. You take the incremented PC and go through a specialized adder for address calculation, which we're going to place in the larger machine later on. You sign-extend the bottom nine bits of the instruction register and add that to the incremented PC; you get the result, which is the address, and you place it into the MAR, the memory address register. You access memory in the next state; memory becomes ready at some point, and then you move to the next state. When the memory is ready, the data at that address is loaded into the MDR, and then, when you go to the next state, you take the data value in the MDR and place it into the destination register that's specified by these three bits in the instruction. So there's some logic that
is not shown here, but these three bits need to get connected to the index, or address decoder, of the register file, and you've seen that address decoder when we built memories earlier. This is the address; there are only eight registers, so you need three bits, and you use these three bits to decide where to write. And you need to write-enable that location, because you're actually writing in that state, when you're moving data from the MDR into the destination register. Again, this is not magic; we've kind of seen it before. I'm going through this again and again so that hopefully it becomes like bread and butter for you. Okay, so clearly there's a restriction to this. The restriction is that the memory address that you can specify using this sort of addressing mode is limited. It's very similar to the limitation that we have when we use an immediate value to add or subtract: you basically have a small number of bits. In the previous case it was only five bits; if you want to add a value that's larger than what you can express in five bits, you cannot do that. Here, the memory address is at most 255 locations after the incremented PC, PC + 255, or at most 256 locations before it, because the offset is nine bits. Okay, yes? (Student question, partly inaudible: why is the offset signed?) Well, the memory addresses are unsigned. The address is unsigned, but the offset is signed, so that you can have a relative value. If your relative value was unsigned, you could only go forward; you could not go both backward and forward. Makes sense? Okay. But the address itself has no sign to it, because it's really a linear array that grows from zero to 2^N - 1, where N is 16 here. Okay, so basically the limitation is that a PC-relative addressing mode cannot address far away from the instruction. It could be useful because the data may be located close to your program: you have your program, and your
data is before the program or after the program. But if your program is too big, you cannot put the data that close, potentially. Okay, that's why people develop other addressing modes. You could argue that this is not a great addressing mode, and I agree; that's why MIPS doesn't have this addressing mode. This is an instructional addressing mode. Okay, indirect addressing mode. This is actually an interesting one that doesn't exist in MIPS, and it's a more complicated one. Essentially these are called load indirect and store indirect, and this could be useful when you're doing pointer chasing, meaning you have a pointer. A pointer is essentially stored in a memory location, but that memory location holds an address that tells you to go to some other memory location; that's a pointer, essentially. Okay, so what does this do? This will become clearer when we go through what this instruction specifies. So you can have load and store again. If the opcode is like this, the next three bits are the destination register for load or the source register for store, and again we have a PC offset, nine bits, but we're going to interpret it differently because of the opcode. The difference between this instruction and the previous instruction that we saw is the opcode, and this is how we're going to interpret it now. This may look very similar to the previous one, but we have an additional "memory" over here, as you can see: memory of memory. What does this mean? We calculate the first address the same way we did for the previous instruction: we take the incremented PC and add to it a sign-extended 9-bit offset that's coming from the instruction register. We go to that memory location, get the data in that memory location, and we say, oh, this is an address, so I'm going to use that again to access memory, and then get the data from that location. So we first calculate one address, PC plus sign-extended offset, go to memory at that location, and get the data value.
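The two address calculations discussed so far, PC-relative and indirect, can be sketched in a few lines of Python. This is only a sketch under stated assumptions: memory is modeled as a dictionary from 16-bit addresses to 16-bit words, the PC is assumed to advance by one per instruction (LC-3 is word-addressed), and the helper names `ld_address`, `ld` and `ldi` are hypothetical. The x1AF offset is the one used in the load example; the instruction addresses and memory contents below are made up for illustration.

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of `value` to a 16-bit word."""
    mask = (1 << bits) - 1
    value &= mask
    if value >> (bits - 1):
        value |= 0xFFFF & ~mask
    return value


def ld_address(pc, instr):
    """PC-relative address: incremented PC + sign-extended 9-bit offset."""
    incremented_pc = (pc + 1) & 0xFFFF
    return (incremented_pc + sext(instr & 0x1FF, 9)) & 0xFFFF


def ld(pc, instr, mem):
    """LD: one memory read at the PC-relative address."""
    return mem[ld_address(pc, instr)]


def ldi(pc, instr, mem):
    """LDI: the first read yields a pointer, the second read yields the data."""
    pointer = mem[ld_address(pc, instr)]
    return mem[pointer]


# Illustrative memory image (addresses and values are assumptions).
mem = {0x3FC8: 0x0005, 0x49E8: 0x2110, 0x2110: 0x1234}

# LD R2, x1AF at address x4018: 0010 010 110101111 -> 0x25AF
print(hex(ld_address(0x4018, 0x25AF)))   # prints 0x3fc8
print(hex(ld(0x4018, 0x25AF, mem)))      # prints 0x5

# LDI R3, x1CC at address x4A1B: 1010 011 111001100 -> 0xA7CC
print(hex(ldi(0x4A1B, 0xA7CC, mem)))     # prints 0x1234
```

The only difference between `ld` and `ldi` is the extra dictionary lookup, which corresponds to the second memory access. With a 9-bit signed offset, `ld_address` can only reach from 256 locations before to 255 locations after the incremented PC, while the pointer fetched by `ldi` can be any 16-bit address.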
That's our second address; we use that to index memory and get the data value. So you do two memory accesses, which means that the first address you calculate is the address of the pointer to the location that you're going to access in memory afterwards. And you could keep doing this; this is indexing using pointers. You could actually do memory of memory of memory of memory, and that would basically enable you to do many indirections through different memory locations to get to what you want, potentially. This could be very useful if you have linked data structures, like graphs, for example, or linked lists, or binary trees; these are constructed by linking different nodes through pointers in memory. Okay, and store is essentially the same thing: you just store the data by doing essentially the same type of address calculation. Okay, so this is the instruction, and this is an example: the PC offset is over here, and this is the machine encoding. Let's take a look at how this is done. We're going to use exactly the same hardware that we designed for the previous instruction, actually. So we have the incremented PC over here, we have the instruction register, we have the sign extension logic for the last nine bits. The first step is to calculate the first address, which basically takes the incremented PC and adds to it the sign-extended 9 bits coming from the instruction register. That's your address; you place it into the MAR. That's the end of one state, and then you go to the second state, which is the memory access state. Memory is accessed using the address in the MAR, and at some point memory becomes ready and places the result into the MDR. Once that's done, you take the... actually, we're not using exactly the same hardware, as you can see: we're using this connection, which goes through the bus, actually the processor bus, as we will see later on. But basically you take the value that is placed by memory into the memory data register; you don't put it
into the destination register immediately; you put it into the memory address register. Makes sense? It happens to be x2110 in this particular example in your book. Okay, so that's the critical step; this is the indirection part. You calculate the address and access memory; memory provides some data, you treat it as an address, and you put it into the MAR. And then we do another memory read. This memory read provides us the data that we're looking for, according to the specification of the instruction. Once the memory is ready, you get the data into the memory data register, and you place the data into the register file as specified by the destination register bits in the instruction register. Makes sense? Clearly you need to modify the state machine to enable this, so you do two memory reads to be able to do this. But now the benefit of this is that the address of the operand can be anywhere in the memory, and it's actually more powerful than that, because you're actually doing some pointer chasing, if you will. Okay, this is an interesting instruction. I think it's not used in MIPS because it's too complex for MIPS. The philosophy of the design of MIPS in the 1980s, actually the late 1970s, was: let's make the instruction set as simple as possible. So this is too complex for MIPS. The reason they had that philosophy, as we will talk about when we go into microarchitectures, is that if you make the machine as simple as possible, you can design the hardware as efficiently as possible. Now the software's job is much harder, of course. For example, the original MIPS actually didn't even have a multiply instruction, and it didn't even have a load byte instruction; they started with load word. Quickly they figured out: bad idea, people use multiplies all the time, so we cannot just have adds and shifts. They also figured out quickly that loading words is not enough: bad idea, people load bytes all the time, so let's have a load byte. So over time that ISA became more complicated. It's still not
that complex, as you will implement in your lectures, but this is the idea. That's why they don't have this addressing mode, but they do have this other addressing mode, which is equally capable, let's say. I shouldn't say equally capable, but it's expressive enough to address anywhere in memory.

Okay, base plus offset. We've seen this: essentially LDR and STR, and MIPS, do this. We've already gone through LDR a lot in the last lecture, so I'll go through this relatively quickly. This is the encoding, as you can see, and the way the address is calculated is: you access the base register, which is specified by these three bits in the instruction register, you take the value over there, you add to it this sign-extended 6-bit offset, and that's your address. Now you access memory, get the value in that memory location, and place it into the destination register. Makes sense, right? Store is essentially the opposite way.

We've seen this, but just to show you pictorially (we've not seen exactly this thing over here): assume that your base register is R2, the destination register is R1, and the offset is x1D. This is LDR; this is how it executes. You access the base register, which is register 2 in the register file. Actually, this is LC-3, sorry, but MIPS is going to be very similar; we're looking at LC-3 right now. You can get ISAs confused also, as you can see, because they're very similar to each other in some instructions. But you take the base register, you go through the address calculation adder, which is a specialized adder, and you add to it the sign-extended six bits from the instruction, and that is your address. That address gets stored in the memory address register, and then you move to the next state in the state machine. That next state does the memory read, and when the memory is ready the data is placed into the MDR by the memory. The last state in the execution of LDR is you take the data
in the MDR and place it into the destination register, which is R1 in this case. Make sense? Okay, so again the benefit of this is that the address of the operand can be anywhere in memory, but you don't do the indirection. With LDI you do two of these memory accesses with a single instruction; if you want to do what LDI does using this, you need to do another LDR. So LDI actually saves one instruction, as you can see, right? If you think about it, if you want to do two memory accesses, if you want to treat this thing that you've loaded into the destination register as another address, you need to do another LDR with an offset of zero for that.

Okay, so let's take a look at the address calculation now. We're actually making the address calculation more complicated by adding more addressing modes. This is the global bus; this is the more complicated version, like the full version. This is the memory address register multiplexer, so we have a multiplexer over here so that we can put data into the MAR, and we have the data path for that. This is the adder that I was showing you earlier; this is the address calculation adder, which is different from the adder in the ALU. We need to have a specialized adder for this purpose. And this is the sign extension logic that we added: these are the nine bits coming from the instruction register, sign-extended, for one of the addressing modes, and six bits coming from the instruction register for the other addressing mode. There are other addressing modes that we can use that we're not going to look at right now, but essentially, as you keep adding addressing modes, you need to complicate the hardware, which is what I said earlier, right? If you have more addressing modes, your hardware becomes more complex; as a result your design becomes more complex and potentially slower, but you could also optimize it in different ways.

Okay, so in MIPS, we've also seen LW earlier. You use essentially
the same addressing mode, base plus offset. In MIPS it's also called base addressing mode. I like base plus offset better because there's actually an offset over there; it's not just a base. So this is the high-level code, this is the MIPS assembly, and we're assuming byte addressability, so you know the reason for the eight: assuming this array is an array of words, to get to the second element, the second word, you need to multiply the index by four, because each word is four bytes. So this is how you calculate the address: you take the value in register $s0, add to it eight, and go to memory. Actually, this is a store word, sorry; we just decided to use store word. You calculate the address of that memory location and store register $s3 into that location. And these are the field values over here. In this case, because MIPS has larger instructions, the offset, or the immediate value, can be much larger: it can be 16 bits over here, and it's sign-extended to 32 bits, of course.

Okay, now let's take a look at how these programs work in MIPS and LC-3. We're going to look at both stores and loads. These are the MIPS registers that we're going to allocate. Again, I will not go through this in detail, and you can do it yourself, but what this does is: the first load word loads the value into a temporary register, then you do an add, A plus B, and then addi does the minus 5. Now $t2 stores what we have as C, and you do this store word from $t2 to the memory location that's specified by a base register that stores the beginning address of this array. Make sense? Okay, that's hopefully simple. We've gone through all of these instructions; store word may be slightly new, but it's the same thing as load except it goes the opposite way.

LC-3 is essentially very similar. So this is the similarity of the ISAs: they have the same addressing modes in this particular case, the same types of instructions, and once you have that, it's just a difference of syntax, right? It's
kind of like having two different languages, and you use the same type of instructions to implement a particular construct, right? So if your instructions are similar, your ISAs are also very similar in the end, and there's not much trade-off over here.

Okay, let's take a look at the immediate addressing mode. This is going to be a little bit different because we're not going to access memory. It may be surprising, because we're talking about data movement instructions, but this is really an initialization-of-a-register type of instruction. We're going to see the equivalent of this in MIPS. The instruction is load effective address. It should have been called what MIPS called it, in my opinion; it's really loading an immediate value into a register, and you will see "immediate" in the MIPS name: it's load upper immediate. You'll see that soon. But basically this is what the opcode looks like, 1110, and there's a destination register, and there's an immediate over here specified by the PC offset. Actually, sorry, this is a little bit different from MIPS because this is really using the PC, because you really want to load an address in this particular case. Maybe it's not as general purpose as what we will see in MIPS, so it's good to think about that.

So basically what you do in LEA in LC-3 is: you take the incremented PC, you add to it a sign-extended PC offset, just like we did in the PC-relative mode, and as opposed to going to memory, you just load this into the destination register. Does that make sense? You basically load a PC-relative address into the destination register without going through memory. So what is the difference? You don't access memory. Instructions with the PC-relative mode that we saw some number of slides ago load from memory, but LEA does not. That's why it's called load effective address. If you do this, now you can manipulate things based on the program counter; that's the benefit it provides. But
again, MIPS doesn't have this. MIPS actually doesn't provide PC-relative addressing for data movement instructions, but you can encode this in jump instructions, as we will see again.

Okay, so this is what it does. Even though we cover it under data movement instructions, it's really an odd instruction. It helps data movement because you're loading a PC-relative address: you get the incremented PC, you add to it a sign-extended 9-bit offset, very similar to PC-relative addressing. It's actually PC-relative addressing, but you don't access memory in the end; you take that and place it into the destination register. Now what you have in the destination register is PC plus offset. Make sense? Now you can manipulate something based on the current PC that you have, so it has its benefits.

Okay, now we've complicated this a little bit more. What happens over here is you need to go through, again, the address mux. I will not go through this, but you will study this more and more as the lectures come, and essentially there's a data path that's provided for this as well. So it's really this one, and there's the PC over here, right? You have the incremented PC that's already stored over here; you select that one and add to it this sign-extended 9-bit value, and that's your address. Then you go through the MAR mux over here, you need to set the control signals accordingly, and you put the value into the register file.

Okay, so let's take a look at this. It's a bit different in MIPS, as I said, because it's not PC-relative. Because MIPS doesn't allow you to do this PC-relative addressing mode, this is a way of initializing the registers. So in MIPS, for example, you have the lui instruction, load upper immediate, that loads a 16-bit immediate into the upper half of a register and sets the lower half to zero. Slightly different, right? You don't do anything with the program counter; you just take the 16-bit immediate and put it into the upper half. So it's used to
assign 32-bit constants to a register. For example, let's say that we have high-level code that looks like this: we assign a 32-bit value to this variable. The MIPS assembly looks like this. So load upper immediate: with this instruction you specify 0x6D5E, the most significant 16 bits, and what this does is set the top 16 bits of $s0 to this value, okay? The bottom 16 bits are zeroed out, and then you can OR that value with another immediate that's encoded in the instruction, 0x4F3C. Okay, this is one way of initializing values, hopefully it makes sense, without going through memory accesses. Another way of initializing a value is to write some data to some memory location and then load it into a register, but this way you don't have a memory access, right? You eliminate memory accesses. Okay, it's a very similar addressing mode; it's not exactly the same addressing mode, it's not PC-relative as you can see, but this is very useful for initialization.

Okay, this is something that I will not go through. Actually, I'd be interested in giving this to ChatGPT and asking what this program does. These are actually a good way of practicing reverse engineering. If someone gave you, for example, this entire program, you could reverse engineer it now, based on what we have seen. You need a manual, meaning the LC-3 manual in this case, but you know what to do: these are the instructions, you know the opcodes, and you can basically reverse engineer what each of these opcodes does, right? How? You basically take the Patt and Patel book, you start with the back cover, which has this, and then if you don't know exactly how the instructions operate, you need to flip a number of other pages to figure that out. In the end you go through the exercise: you figure out these are the instructions and these are the immediates, and then you figure out the addressing modes, and you figure out exactly what this program does, and you simulate it, and in the end
the final value of R3 is five. People who do security actually do a lot of this reverse engineering, because they don't know what a program does. For example, they're given a bunch of bits, binary zeros and ones, and they figure out what that program does, and then they hack it, for a good purpose or for a bad purpose. But this reverse engineering is actually fun. I don't know, maybe you can try it with an LLM to see if it's actually able to get you the correct value. It's good to check; actually, I'm curious. Maybe one of our TAs will check later on. If you check it, let me know if it gets the right value. But you should be able to do this right now, based on what we have learned about ISAs today and yesterday. With the right tools, meaning the right manual, you should be able to do this on your own, and it's good to try. It's actually in your book; your book already does this for you, but I would suggest doing it yourself so that you see how instructions operate.

Okay, let's jump into jump instructions: control flow instructions. We're going to treat control flow instructions a little bit more, and hopefully we'll be done with control flow instructions by the end of this lecture. These allow a program to execute out of sequence, and we've already discussed this. They could be unconditional jumps, like what we discussed yesterday; today we're going to talk about conditional branches. Conditional branches are used to make decisions. For example, you have an if-else statement: if x is greater than five, go to this part of the code; otherwise go to that part of the code, right? LC-3 and MIPS treat this differently, and there are different ISAs that do either what LC-3 does or what MIPS does. For example, x86 does something similar to LC-3; x86 also does other stuff, but condition codes, for example, which LC-3 uses, are used in x86. Let's take a look at what those are. Jumps, on the other hand, are used to implement
loops and function calls, and we've already seen jump: JMP in LC-3 and j in MIPS. We may see that again, but let's talk about conditional branches, because those enable us to do something more sophisticated: based on a condition, you jump to a particular part of the program or you go in sequence. That's the idea.

So how is this supported in LC-3? In LC-3, each time you write a general-purpose register, there are three single-bit registers that are updated. So while you're writing to the general-purpose register, there's also something else happening in the data path that says: check the value that I'm writing and set the condition codes. Each of these condition codes is either set (set to one) or cleared (set to zero). If the value that I'm writing to the destination register is negative, you set the N bit, and Z and P are cleared. That means the value is negative; clearly it's not zero and it's not positive. If the written value is zero, you set the Z bit. If the written value is positive, you set the P bit. So clearly now you get the idea, right? You have N, Z, P, and this specifies whether the last value that you wrote to a destination register was negative, zero, or positive. Clearly only one of those can be true, right? A value that you write is either negative or zero or positive, and you don't get any other combination: exactly one of the bits should be one. x86 and SPARC are examples of ISAs that use condition codes. This is used in real life; if you program x86, you will see it.

So let's take a look at branch if zero. This is the assembly, and this is the encoding. The branch opcode happens to be 0000. There are three bits in the instruction encoding which tell you which condition codes this instruction tests. In this case, branch if zero means that you should change the program counter to a target value only if the Z condition code is set. The lowercase n, z, p are
the instruction bits that identify the condition codes to be tested. These are the registers that are set when you write to a destination register: the uppercase N, Z, P are the values of the corresponding condition codes, in other words. PC offset is the usual PC offset that we're used to, an immediate or constant value, nine bits. And this is the semantics of the instruction: if I'm testing for the N condition code and the N bit is set, or if I'm testing for the P condition code and the P bit is set, or if I'm testing the Z condition code and the Z bit is set, meaning the value that was written was zero, then I change the program counter to the incremented program counter plus the sign-extended PC offset. Otherwise I don't do anything, meaning the PC was incremented anyway, so I go to the next instruction. So I branch only if I'm testing a condition code and that condition code is set. That's a more human way of describing it. Branch zero means I'm testing the Z condition code: if the last value written was zero, then I'm going to take that branch, meaning I'm going to change the PC to PC plus the sign-extended offset. Hopefully that makes sense, right? Okay, this may be a little bit mystical, but really you just test the condition code, and if it is set you take the branch; otherwise you go to the next instruction.

Okay, all right, so let's take a look at how this is executed. This is how it's implemented in hardware. Now we have the condition registers N, Z, P. These are actually special-purpose registers. Remember we talked about general-purpose registers; that's the register file. Now we have special-purpose registers, each of them one bit. They're called condition codes, N, Z, P, and I already described how they're set: the last value that you wrote to a destination register can be negative, zero, or positive. I'm not showing the logic that sets these condition codes here. There should be logic that sets the condition codes when you're writing to a destination register; there should be a comparator that compares that
value to zero, for example. Actually, yeah, there should be a zero comparator, unfortunately, and you know how to design this; you could design it. That's the hardest part; negative and positive are maybe easier, but you still need a zero comparator anyway.

So basically, how this instruction is executed: you have the opcode, it's a branch, the z bit is set in the instruction, so you're testing the Z bit, so you have an AND with the Z bit. This "yes" means whether the branch is taken or not, right, whether you're going to change the PC to PC plus offset. There's actually a write-enable signal or a multiplexer, depending on how you implement it. If this evaluates to yes, then you take the incremented PC and add to it; concurrently there's an addition going on. The incremented PC comes here to one input of the adder (this is a special adder also, one that generates what the next PC should be), the sign-extended 9-bit immediate comes over here, and then you basically decide whether you write this into the program counter. So if this "yes" is true, you write-enable the program counter. Again, it depends on the implementation; that's why I don't want to go into the implementation exactly over here, but you will see an implementation later when we discuss things. You set the program counter to the target value only if the condition code you're testing is true, meaning one. Makes sense, right? And you can guess this logic now.

There are interesting things you can play with. What if you're testing all of the condition codes: n is one, z is one, p is one? This is called BRnzp. Then what do you do? Yes, yes, exactly: basically you always take the branch. If this is the branch that you encoded, it's an unconditional branch at this point. Since you're testing all the condition codes and exactly one of them is always set, you should take the branch. Now there's another game you can play: what if n, z, p are all zero, if you're not testing any of these bits? Yes. Yeah, it's a
no-op, basically: no operation, meaning you always go to the next instruction. This branch is doing nothing, right? It's a way of implementing a no-operation, and there is a reason we use no-ops, as we will see in later lectures.

Okay, MIPS does it differently. There's a branch equal. Now MIPS is more complicated here. Basically, this is the branch opcode, there are two source registers, rs and rt, and there's an offset. This is the specification of the instruction: MIPS checks if the data value in rs is equal to the value in rt. If these two data values are equal, then the program counter is changed to PC plus the sign-extended offset, which is left-shifted because of byte addressability. Otherwise you don't take the branch. It's called branch if equal. Essentially you test two registers, and there are other variations of this: branch if not equal, branch if less than or equal to zero, branch if greater than or equal to zero, etc. There are other variations, but branch equal is useful for implementing branches.

So let's take a look at how to implement this in MIPS. The same instruction is implemented with four instructions in LC-3; that's the benefit of branch equal. You're checking if two registers are equal. Take a loop condition, for example: if one register is the counter of the loop, how many times you have iterated, and the other register is how many times you should iterate, then once you get to the number of iterations you should do, you take the branch, meaning you get out of the loop, right? That's the benefit of the beq instruction. If you want to implement this in LC-3, you don't have a beq; you just have condition codes. So what you do is you take one of the registers, R1, take the two's complement of it, and add that to the other register that you're comparing to, R0 in this particular case, which subtracts R1 from R0, and if the result is zero, then you take the branch. So this ADD will set the condition codes, and if the result is
zero, that means R0 minus R1 is zero, meaning R0 and R1 are equal to each other, and then you take the branch.

Now you see another trade-off in the instruction set, which is essentially very similar to the trade-off that we have seen: exactly the same functionality requires four instructions in LC-3 and only one instruction in MIPS, but the control logic requires more complexity in MIPS, because you need to take two registers, compare them, and then branch based on the result. So this is the same trade-off that we have seen earlier, which is: where do you put the complexity? Are the instructions more complex, or are they simpler? I didn't give you extremely complex instructions, clearly, but even with very simple instructions like beq, there is a level of complexity involved that's more than in primitive instructions like NOT, ADD, and branch with condition codes.

Okay, so we were actually able to cover all of what I really wanted to cover. We covered a lot of things today and yesterday. There is a lot more to cover on ISAs that we're not going to cover; if you're really interested, there are a lot of trade-offs over here that we're not going to go into. There have been many different ISAs over the decades. Later, for example, we're going to look at some GPU type of ISAs; we're going to talk about VLIW. It's too early to talk about these, but these are some names that I'm throwing at you. All kinds of ISAs are out there, and the fundamental differences, at least from an instruction set perspective (there are other differences from a system-level perspective that we may get into), are essentially in how instructions are specified and what they do: how complex the instructions are, the data types, and the addressing modes. For example, VAX was a very complex ISA; it had many instructions, maybe more than x86 today. x86 evolved over the course of 50, 60 years and added many, many instructions, so it's pretty complex. Some of the ISAs are much simpler, actually. Okay, I think this is a good
place to stop. Maybe, let's see... no, I'm right on time. This is just a good place to stop, to remind you of the semantic gap; this is where the ISAs differ. Next week we're going to cover a little bit more on the ISAs, and then we're going to start microarchitecture, which is really the implementation of the ISA. So have a good weekend; I'll see you all next week.
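
As a recap of the mechanisms walked through in this lecture, here is a minimal Python sketch of the address calculations (base plus offset, PC-relative, indirect, LEA), the MIPS lui/ori constant initialization, the LC-3 condition codes, and the branch decision. It is not from the lecture slides or the book: every function name is a hypothetical helper made up for illustration, memory is modeled as a plain dictionary, and registers are assumed to be 16 bits wide for LC-3 and 32 bits wide for MIPS.

```python
# Minimal sketch, assuming 16-bit LC-3 words and 32-bit MIPS registers.
# All names below are hypothetical helpers, not part of any real LC-3/MIPS tool.

MASK16 = 0xFFFF


def sext(value, bits):
    """Sign-extend the low `bits` bits of `value` to a signed Python int."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def ldr_address(base, offset6):
    """Base+offset (LDR/STR): base register + sign-extended 6-bit offset."""
    return (base + sext(offset6, 6)) & MASK16


def pc_relative(incremented_pc, offset9):
    """PC-relative: incremented PC + sign-extended 9-bit offset."""
    return (incremented_pc + sext(offset9, 9)) & MASK16


def ldi(memory, incremented_pc, offset9):
    """Indirect (LDI): the first read yields an address, the second the data."""
    pointer = memory[pc_relative(incremented_pc, offset9)]
    return memory[pointer]


def lea(incremented_pc, offset9):
    """LEA: same address calculation as PC-relative, but no memory access."""
    return pc_relative(incremented_pc, offset9)


def lui_ori(upper16, lower16):
    """MIPS-style constant: lui sets the top half, ori fills the bottom half."""
    register = (upper16 & 0xFFFF) << 16   # lui: lower 16 bits are zero
    return register | (lower16 & 0xFFFF)  # ori: OR in the low immediate


def condition_codes(value):
    """Return (N, Z, P) for a 16-bit value written to a register."""
    value &= MASK16
    n = int(value >= 0x8000)  # sign bit set means negative
    z = int(value == 0)
    p = int(not n and not z)
    return n, z, p


def branch_taken(test_nzp, codes):
    """BR is taken if any tested bit (n, z, p) lines up with a set code."""
    return any(t and c for t, c in zip(test_nzp, codes))


def next_pc(incremented_pc, offset9, test_nzp, codes):
    """Target if the branch is taken, otherwise fall through."""
    if branch_taken(test_nzp, codes):
        return pc_relative(incremented_pc, offset9)
    return incremented_pc


def lc3_registers_equal(r0, r1):
    """Emulate beq with condition codes: R0 + (NOT R1 + 1), then test Z."""
    negated = (~r1 + 1) & MASK16
    return condition_codes(r0 + negated)[1] == 1
```

With these helpers, `ldr_address(0x2345, 0x1D)` gives `0x2362`, `lui_ori(0x6D5E, 0x4F3C)` gives `0x6D5E4F3C`, testing all three bits with `branch_taken((1, 1, 1), ...)` is always true (an unconditional branch), and testing none with `(0, 0, 0)` is never true (a no-op). The sketch deliberately ignores the state machine and the data path; it only mirrors the arithmetic each instruction performs.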
