Hello there, and welcome to COMS10015 Computer Architecture, week 5, lecture 4. Earlier in the week we looked at various general concepts to do with instruction set architecture and instruction set architecture design. I'd like to finish off the week in this lecture by looking at how those general concepts apply within a specific, real-world instruction set architecture, namely ARMv7-A. This name is a little cryptic, but essentially what we're talking about is version 7 of the ARM instruction set architecture, and specifically the variant that they call the A, or application, profile. The A profile arguably represents the most general-purpose variant of ARMv7: it's intended for use in contexts where you might find an operating system, for example on mobile phones or laptops. It contrasts with alternatives such as the M, or microcontroller, profile, so ARMv7-M, which is essentially a special-purpose, cut-down variant intended for use in embedded computing platforms.

In contrast with some of the examples that we've used so far, ARMv7-A is a real instruction set architecture, which means it's implemented within a range of real microarchitectures you can actually hold in your hand and make use of. The example on the slide is of the ARM Cortex-A7 microarchitecture, an instance of which forms a central component within early generations of the Raspberry Pi platform.

The ARMv7-A instruction set architecture is a RISC-like design, in the sense that it follows a RISC-like philosophy in the majority of cases. However, it's also possible to identify cases where it deviates from this philosophy, and therefore you could argue that it's not a RISC design per se. Either way, it's reasonable to classify ARMv7-A as a load-store architecture using a word size of 32 bits.

I'm going to try and give you a limited introduction to interesting or relevant features in ARMv7-A, structured in the same way as when we looked at instruction set architectures in a general sense. That is, I'm first going to cover the specification of state, and then
the specification of instructions. Particularly when discussing the specification of instructions, I am going to make use of ARM assembly language in all of the examples on the slides. It's important to keep in mind that the goal of doing so is simply to show the relationship between how you'd normally express instructions as a programmer and what the execution semantics of those instructions are. The goal definitely isn't that you're an expert assembly language programmer by the end of the lecture, although the associated lab worksheet does explore this topic in some more detail.

ARMv7-A specifies a 16-entry general-purpose register file. Within an encoded instruction we refer to those registers using a 4-bit register address, or index, but in an assembly language program we can refer to the registers in one of two different ways, shown on the left-hand and right-hand sides of this slide respectively. On the left-hand side you can see the registers referred to by index: they're simply named r0 through to r15. On the right-hand side, in contrast, you can see those same registers referred to using a more human-readable identifier. For example, a1 through to a4 are so called because they're used to store arguments during function calls.

On the slides I've written that this is a general-ish purpose register file, and what I mean by that is that some of the registers clearly do have a special-purpose role. You can see, for example, that register 15 is actually the program counter. To put it a different way, in ARMv7-A the program counter isn't a separate special-purpose register: it forms part of the general-purpose register file. What that means in practice is that although some of the registers have a special-purpose role, they're all generally addressable. That is, any instruction that allows us to specify a register address as some operand can use the program counter as that operand, in more or less the same way as any other register.

ARMv7-A also specifies two special-purpose registers
called the Current Program Status Register, or CPSR, and the Saved Program Status Register, or SPSR. I'm going to focus on the CPSR, but keep in mind that both have the same format, as illustrated in the middle of the slide, and can be described as allowing us to control or configure execution in various ways and also to inspect the status of execution. What that means concretely is that if we write certain bits in the CPSR, this will act to control execution of subsequent instructions in some way. Conversely, if we read certain bits from the CPSR, this gives us some information about the execution of previous instructions.

In the diagram that describes the CPSR format you can see pertinent examples towards both the right-hand, or least-significant, and the left-hand, or most-significant, ends. Bits 0 through to 4 constitute a 5-bit field labeled M. This holds the processor mode and reflects the privilege level in some sense: for example, depending on the processor mode we might either allow or disallow access to certain regions of memory, or execution of certain instructions. Bits 28 through to 31, on the other hand, are four 1-bit flags labeled N, Z, C and V. These flags capture the results of any comparisons that might have been performed during execution of previous instructions. Keep in mind that because the CPSR is a special-purpose register, we can't make use of it in the same way as elements of the general-purpose register file: in order to read a value from the CPSR or write a value into it, we need to make use of two special-purpose instructions, called MRS and MSR respectively.

As part of the specification of state, ARMv7-A uses a memory model with a byte-addressable, 2-to-the-32 element address space, which you can see illustrated on the slide. Using the same terminology we have previously, this means the access granularity is 8 bits, i.e., each element that we're able to access is 8 bits in size. The addresses we use to do so are 32 bits in size, meaning that they range from 0 through to 2 to the
32, minus 1. This gives us a total address space size of four gigabytes.

The memory model also includes some rules which govern the semantics of accesses within that address space; you can see some examples listed on the slide here. The first rule relates to alignment. For example, if we focus on standard 32-bit ARM instructions, the rule says that such instructions must be located at, or aligned to, a 4-byte boundary. These instructions are therefore allowed to be located at addresses 0, 4 or 8, but not 1, 3 or 7. The same sort of rule applies to data: for example, 32-bit words, 16-bit half-words and 8-bit bytes must be located at, or aligned to, 4-byte, 2-byte or 1-byte boundaries respectively.

The second rule relates to endianness. The rule basically says that access to instructions, for instance within the fetch phase of the fetch-decode-execute cycle, will always be little-endian. In contrast, access to data, for example within the execute phase of the fetch-decode-execute cycle, could be either little-endian or big-endian. Although not unique to ARMv7-A, this feature is somewhat unusual. The decision between little-endian and big-endian data access isn't made on a per-instruction basis, but is instead controlled globally, in some sense by setting execution into a little-endian or big-endian mode. By default that mode is little-endian, and that's what we'll assume from here on.

Moving on to the specification of instructions, then, the first topic to consider is how instructions are encoded and therefore decoded. In short, ARMv7-A makes use of a fixed-length instruction encoding where each encoded instruction is 32 bits in length. Looking at the diagram on the slide, however, you could argue this is one topic where ARMv7-A deviates from a RISC-like philosophy, in the sense that it includes many different instruction formats. Some of those formats are relatively complex, which implies the process of encoding and decoding instructions is also relatively complex.

Notice that the majority of instruction
formats could be described as three-address, in the sense that they allow the specification of up to three register address operands. There are exceptions, however: both the short and long multiply formats allow the specification of four register address operands rather than three. Recall we used exactly this example while discussing various challenges related to instruction encoding. Also notice that the standard data processing format is somewhat unusual. Although this is a three-address format, in the sense that it allows the specification of three register address operands, the encoding is such that the field associated with the second operand, labeled operand here, is 12 bits rather than 4 bits. The purpose and meaning of this is something that we'll come back to and explain later in the lecture.

As you'd expect of a general-purpose instruction set architecture, ARMv7-A includes a range of fairly standard instruction classes. The most straightforward example is the so-called data processing, or ALU-like, class. This class is straightforward in the sense that the selection and semantics of instruction types and variants within it are straightforward. You can see some examples shown on the slide here. The first two such examples show variants of arithmetic-type addition instructions. These respectively allow us to add the immediate 1, or the contents of general-purpose register number 2, to general-purpose register number 1, in both cases storing the result in general-purpose register number 0.

By default, at least, these instructions don't update the N, Z, C or V flags in the special-purpose CPSR register. However, we can enable such updates by adding an S to the end of the instruction identifier. For example, the instruction ADDS r0, r1, r2 has the same meaning as ADD r0, r1, r2, in the sense that it adds general-purpose register number 1 to general-purpose register number 2 and stores the result in general-purpose register number 0. However, it also updates those flags in the
CPSR register to reflect the result computed. This allows us, for example, to capture whether or not a carry resulted from the associated addition, or whether the addition produced a result that was zero or non-zero.

The so-called flexible second operand of any data processing instruction can take one of four forms, shown on the slide here. These relate to the slightly unusual instruction format that we discussed earlier. We've already seen examples of forms one and two, because these relate to that second operand being either an immediate or a register address respectively. Forms three and four are slightly more exotic, in the sense that they relate to the idea that the second operand specifies data via some limited form of computation. The idea is that the data we end up with results from taking the contents of some general-purpose register and either shifting or rotating it by some distance. That distance might be specified by an immediate or by a register address operand. One way to think about this is that ARMv7-A is offering us the ability to fuse together a standard data processing instruction with either a shift or a rotate, and as a result we get a form of two for the price of one: we're able to perform two operations for the price of one instruction. This renders instruction execution more efficient in a range of different situations, such as the computation of addresses, where shifts are commonly applied in order to scale a base address or an offset.

In addition to the set of what you might call computational data processing instructions, there's also a small set of comparisons available in ARMv7-A. Whereas computational instructions typically store a result in one or more general-purpose registers, comparison instructions simply update the flags in the special-purpose CPSR register. The first example compares general-purpose register number 0 with general-purpose register number 1, updating the flags in the special-purpose CPSR
register with respect to the computation of their difference, i.e., subtracting general-purpose register number 1 from general-purpose register number 0. We can subsequently tell, for example, whether general-purpose register number 0 is equal to general-purpose register number 1 by inspecting the Z, or zero, flag within the CPSR: if subtracting register 1 from register 0 produced zero, then we know that they're equal to each other, and if it didn't, they aren't. Keep in mind that, unlike an addition for example, we don't need to add an S onto the end of the instruction identifier here in order to force an update to the flags in the CPSR. The only purpose of these instructions is to update those flags, so in some sense the S is implicit.

The second important instruction class is the so-called data movement class. Rather than performing computation on data, data movement instructions simply move or copy data around in some way. As a starting point, you could consider the two instruction types illustrated on the slide as being representative of this instruction class. The first type are so-called immediate-to-register or register-to-register moves. The first two examples move some source, either the immediate 1 or general-purpose register number 1, into a destination, which in both cases is general-purpose register number 0. Keep in mind that the term copy arguably describes what's going on here more precisely. What I mean by that is that having moved register 1 into register 0, for example, we don't lose or forget the value stored in register 1: after the move, both general-purpose registers have the same value, namely whatever value register 1 had beforehand.

The second type are so-called single-shot memory accesses. The two examples shown here respectively load general-purpose register number 0 from, or store it to, memory at an
address dictated by general-purpose register number 1. These are called single-shot memory accesses in the sense that both instructions perform one memory access only. As suggested by the notation used, these instructions are loading and storing 4 bytes, or 32 bits, of data respectively. The instruction identifiers LDR and STR can therefore be read as load register and store register.

This really is just a starting point, however, and in fact ARMv7-A includes a range of much more exotic data movement instructions and addressing modes. The first case on the slide here describes the so-called multi-shot memory accesses. The idea is that the LDM and STM instructions are analogous to LDR and STR, except that instead of loading or storing one general-purpose register from or to memory, they load or store n general-purpose registers. In this case n is equal to three. Of course, we'll perform the same number of memory accesses whether we use n LDR instructions or one LDM instruction. However, using one LDM instruction reduces the memory footprint of our program from n instructions to one instruction, and therefore also reduces the number of fetch-decode-execute cycles that have to be performed. As a result, using one LDM instruction might plausibly be more efficient.

The second case describes so-called multi-shot stack memory access instructions. The PUSH and POP instructions allow us to respectively push values to, or pop values from, a stack that's maintained in memory. The two instructions work analogously to STM and LDM, in the sense that they store or load multiple general-purpose registers to or from memory. However, in addition to doing so, they also manipulate the stack pointer, or SP, register so as to keep track of where the top of the stack is in memory.

This slide attempts to give a flavor of some of the different addressing modes that are available in ARMv7-A. The first case captures a range of examples that differ in terms of their access granularity. For example, the
first two variants load one byte, or 8 bits, of data from memory at an address determined by general-purpose register number 1, whereas the bottom two variants load two bytes, or 16 bits, of data. Within the top and bottom pairs, you can identify variants that either sign-extend or zero-extend the value loaded from memory in order to form a 32-bit result, which is then stored in general-purpose register number 0. The second case captures various examples that differ in terms of the addressing mode used. For example, although the assembly language syntax used might be completely unfamiliar, the second and third variants are very directly using the concepts of indexed and scaled-index addressing modes that we encountered previously.

The third and final important instruction class relates to the management of control flow. This is an interesting instruction class within the context of ARMv7-A because of the unusual way in which control flow is managed. At face value, at least, there are only two control flow instruction types available: the so-called branch, or B, instruction and the branch and link, or BL, instruction. Both of these instruction types have immediate and computed variants, which imply relative and absolute branch types respectively. You can see examples of the four possible variants on the slide here.

Beyond this, however, there are a couple of more subtle points to keep in mind. The first stems from the fact that the program counter forms part of the general-purpose register file, and so is generally addressable. This implies that we've extended the definition of what we've previously referred to as a computed branch, because basically any instruction, for example a data processing or data movement instruction, can perform the analog of a branch simply by specifying the program counter as the target operand. Such an instruction would write a new value into the program counter as a result of the instruction semantics, and thereby have some influence over control flow. The
second point is that in our discussion so far, we haven't mentioned the concept of conditional branches at all. Clearly support for conditional branches is important, so the question is how ARMv7-A provides it. The answer is a little unusual, in the sense that within ARMv7-A every instruction is conditionally executed, using a concept called predicated execution. As a result, the concept of a conditional branch instruction is simply a special case of conditional instructions more generally.

The idea is that within the encoding of every instruction i there's a 4-bit field which identifies a predicate, or condition, p. The execution of the instruction is modeled as shown in the middle of the slide. Basically, having fetched and decoded the instruction, we then evaluate the predicate. If the predicate evaluates to true, then execution of that instruction proceeds as normal. However, if the predicate evaluates to false, then the instruction is discarded and not executed or, if you prefer, it's translated into a no-op.

From the programmer's perspective, these predicates are specified by adding a suffix onto the end of the instruction identifier within the assembly language description of the instruction. Look at the second example on the slide, for instance. The instruction that looks like BNE is actually a branch, or B, instruction with the suffix NE added onto the end. NE stands for not equal. This identifies the predicate which tests whether the Z flag within the CPSR register is equal to zero. Overall, then, the semantics of this instruction are such that if the Z flag of the CPSR register is equal to zero, we execute an unconditional branch instruction derived by stripping off the suffix from the original. On the other hand, if the Z flag of the CPSR register is not equal to zero, we don't execute that branch instruction: we simply discard it and carry on as normal. Overall, then, this allows us to support exactly the style of conditional branch instruction we wanted, albeit via a more general
purpose mechanism.

This table captures all 16 possible predicates, for each one including a description in logical terms, a description in English, and also the suffix that one needs to add to the base instruction in order to specify the predicate itself. The final two predicates are interesting cases. The always predicate evaluates to true, meaning that the instruction always executes, unconditionally; in some sense this is the default. In contrast, the never predicate evaluates to false, meaning that the instruction never executes. I can't think of too many uses for this, but essentially what it means is that any instruction can be turned into a no-op, rather than there being a dedicated no-op instruction.

Effective use of predicated execution, as supported by ARMv7-A, requires quite some care with respect to the design and implementation of programs. As an example, consider the gcd function implemented in C, shown on the left-hand side of this slide. Notice that the body of the function makes use of a while loop whose condition expression tests whether a is not equal to b. The body of the while loop includes an if statement with both an if and an else clause; the condition expression for the if statement tests whether a is greater than b.

The assembly language on the right-hand side of the slide implements this structure in a fairly direct way. Assuming that a and b are stored in general-purpose registers number 0 and 1 respectively, it starts by comparing a and b with each other and then using a BEQ, or branch instruction with the predicate EQ for equal. If a is equal to b, then the branch is executed and the loop terminates. Otherwise, if a is not equal to b, then the branch is not executed and we proceed by executing the body of the loop. The result of the comparison between a and b is retained, so it then uses a BLT, or branch instruction with the predicate LT for less than. If a is less than b, then the branch is executed and control flow moves to the else clause. Otherwise, if a is not less than b,
then the branch is not executed and control flow moves to the if clause. Having executed either the if or the else clause, we then make use of an unconditional branch back to the start of the loop in order to repeat the process all over again. Keep in mind that our implementation constitutes seven instructions, and each iteration of the loop requires the execution of either one or three branch instructions, depending on whether the loop terminates or not.

Through careful use of predicated execution we can improve on our implementation, as shown on the right-hand side of this slide. This new implementation constitutes just four instructions, and requires the execution of just one branch instruction per iteration of the loop. I'm going to intentionally leave it as an exercise for you to think about how and why this works correctly, but at a high level, at least, keep in mind that what we've done is flatten the if statement by using predicated execution. Now we have a situation where only one or the other of those subtraction instructions will actually be executed, depending on the result of the comparison between a and b.

To sum up, then, this was really just a limited introduction to the ARMv7-A instruction set architecture, intended to act as a starting point for more hands-on exploration during the associated lab slots. Beyond that, though, there are lots of reasons why ARMv7-A makes an interesting case study; I've tried to enumerate some of those on the slide. Fundamentally, I think one of the most important reasons is that it acts as a very clear bridge between theory and practice. To put it a different way, in the rest of the week we've really looked at some fairly theoretical, general, and quite abstract concepts to do with instruction set architecture and instruction set architecture design. However, hopefully you can see now that a majority of those concepts pan out in this, a real-world instruction set architecture that you can actually make use of concretely. Don't underestimate the
value of this step within the context of the unit objectives more generally. There's a non-trivial chance that you have a device in your pocket or on your desk right now that supports ARMv7-A. So if you consider that one of the objectives of the unit was to explain, end to end, how real computers work, then you can think about this lecture as representing one more step towards doing so.
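
For anyone who wants to experiment with the predicated gcd loop discussed earlier before the lab, here is a minimal Python sketch of how comparison flags and predicates interact. It is an illustrative model, not ARM's specification. The four-instruction sequence in the docstring is an assumption, since the slide isn't reproduced in this transcript; it is one commonly quoted form. The names cmp_flags, PREDICATES and gcd_predicated are hypothetical helpers invented for this sketch, and only five of the sixteen predicates are modeled.

```python
# Illustrative model only: registers are Python ints masked to 32 bits.
MASK = 0xFFFFFFFF


def cmp_flags(rn, op2):
    """Model the N, Z, C, V flags produced by comparing rn with op2,
    i.e., by computing the 32-bit difference rn - op2."""
    rn &= MASK
    op2 &= MASK
    result = (rn - op2) & MASK
    n = result >> 31                        # sign bit of the result
    z = int(result == 0)                    # set when the difference is zero
    c = int(rn >= op2)                      # for subtraction, carry means no borrow
    # Signed overflow: operands differ in sign and the result's sign differs from rn's.
    v = ((rn ^ op2) & (rn ^ result)) >> 31
    return {"N": n, "Z": z, "C": c, "V": v}


# A subset of the predicates, each expressed as a test on the flags.
PREDICATES = {
    "eq": lambda f: f["Z"] == 1,
    "ne": lambda f: f["Z"] == 0,
    "gt": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "lt": lambda f: f["N"] != f["V"],
    "al": lambda f: True,
}


def gcd_predicated(a, b):
    """Model of an assumed four-instruction loop (positive inputs only):

        loop: cmp   r0, r1
              subgt r0, r0, r1
              sublt r1, r1, r0
              bne   loop

    Returns (gcd, number of branch instructions fetched).
    """
    r0, r1 = a, b
    branches = 0
    while True:
        flags = cmp_flags(r0, r1)           # cmp r0, r1
        if PREDICATES["gt"](flags):         # subgt: only executes if r0 > r1
            r0 = (r0 - r1) & MASK
        if PREDICATES["lt"](flags):         # sublt: only executes if r0 < r1
            r1 = (r1 - r0) & MASK
        branches += 1                       # bne: one branch per iteration
        if not PREDICATES["ne"](flags):     # predicate false, so fall through
            return r0, branches


if __name__ == "__main__":
    print(gcd_predicated(12, 18))
```

The point to notice is that neither subtraction carries the S suffix, so the flags set by the comparison survive until the final branch: at most one subtraction executes per iteration, and the same flags then decide whether the loop repeats.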