# Computer Architecture
Computer Engineering, 3rd year, Semester 1
Week 02

# What do we mean by architecture?
In general, architecture means a style, structure, or composition of buildings.

Computer architecture:
- Basic design and design concepts in computers
- Hardware specifications together with system software such as operating systems and compilers
- Architecture of the computer + behavior of the computer

Overall view of a computer
Computer architecture covers both:
- Architecture of the computer: instruction set, operative unit, control unit, memory unit, input and output units
- Behavior of the computer: operating system
What is a CPU?
Equivalent terms: CPU, MPU, processor.
The CPU is an IC that includes the arithmetic, control, and input/output units of a computer.
Companies that design or manufacture CPUs:
- Mainstream: Intel, AMD, ARM
- Embedded and industrial: Zilog, Atmel, Fujitsu, etc.
Architecture examples:
- x86 (Intel/AMD)
- ARM (mobile devices)
- MIPS, SPARC, PowerPC (once common in academia/industry)
- PIC, AVR, 68xxx, Z80 (microcontrollers)

Computer Processing
The flow of data and control between major parts of a computer.
1. CPU (Central Processing Unit)
2. Input Devices
3. Memory (RAM / Main Memory)
4. Output Devices
Three questions about computer processing
A computer takes in input data ("information"), processes, computes, and stores it, and sends output data to a display.
- Q1: How do computers represent information? In what format? → binary format
- Q2: How is the acquired data calculated? → by the ALU, built from switching elements
- Q3: What procedures are used to process (calculate) the "information"? → algorithms

Group work
Let's talk about:
- Q1: How do computers represent information? What format?
- Q2: How is the acquired data calculated?
- Q3: What procedures are used to process (calculate) the information?
Answers:
- Q1: Text and images are represented in binary format.
- Q2: Through the CPU, by the ALU.
- Q3: 1. Identify and gather the data; 2. transform it into a usable format; 3. perform the calculation. Typical processing steps: validation, sorting, summarization, aggregation, analysis, and classification.
Programming languages

| Language | Ease of understanding | Explanation |
|---|---|---|
| High-level language | Easy | Describes instructions to a computer in a language that is easy for humans to understand. High-level languages are converted into machine-language programs by compilers and interpreters. Ex.: C, C++, Java, Python |
| Assembly language (low-level language) | Harder | Uses mnemonics (e.g., MOV, ADD, B) to represent machine instructions. Closer to the hardware than high-level languages. Translated into machine language by an assembler. |
| Machine language | Impossible (in practice, for humans) | Binary code converted from assembly language; the only form computers can directly understand. |

Von Neumann Architecture (Von Neumann-type Computers)
Many computers are designed as von Neumann-type computers, proposed by John von Neumann in 1945.

The Basic Structure of a von Neumann Computer
Stored-program computer:
- Instructions (programs) and data are placed indistinguishably in main memory
- The distinction between instructions and data is made by the program
Sequential processing:
- A program is a list of instructions
- Instructions are fetched one by one from main memory and executed in a determined order
- The memory position of the next instruction to execute is stored in a register called the program counter (PC)
Linear address:
- Each cell of main memory is numbered sequentially
- This number is called the address
- The address is used to indicate the location of instructions and data
The Basic Structure of a von Neumann Computer
[Figure: main memory as a linear array of byte addresses 0, 4, 8, 12, ... Each word holds either an instruction (add $s1, $zero, $zero; addi $s4, $zero, 1; add $1, $s1, $s0; sub $s0, $s0, $s4; bne $s0, $s4, L1) or data (variable x, variable y, array(0,0), array(0,1)). The PC holds the memory address of the next instruction, illustrating the stored-program concept, sequential processing, and linear addressing.]

The Basic Structure of a von Neumann Computer
The central processing unit (CPU) is the part of the computer that contains the arithmetic and control units.
- Operative Unit: the device that performs arithmetic and logical operations and temporarily stores the operands and results.
- Control Unit: the device that controls the operation of the computer.
Memory Unit: the device that stores the program and the data.
- Main Memory Unit: the part of the storage to which the CPU has direct access.
- Auxiliary Memory Unit: the part of the memory that cannot be directly accessed by the CPU.
Input/Output Unit(s): the device(s) that perform the data transfers between the computer and its environment.

The detailed structure of the CPU: the operative unit
(1) General-Purpose Registers (GPR): the registers that temporarily store the operands and results of the computer's calculations. Compared to the memory units: faster, smaller, and without any address.
(2) Arithmetic and Logic Unit (ALU): the device that performs the arithmetic and logic operations.
(3) Flag Register (FR): described next.
The flag register stores the status of the previous operation. It has 4 or more bits. Role: conditional branching (if and for statements).

| Flag | Meaning |
|---|---|
| CF: Carry Flag | The output carry of the operation |
| OF: Overflow Flag | Overflow of signed calculations |
| ZF: Zero Flag | Indicates whether the result was 0 |
| SF: Sign Flag | Indicates whether the result was negative |
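To make the flags concrete, here is a minimal C sketch of how the four flags could be derived after a 32-bit addition (the struct and function names are invented for this illustration):

```c
#include <stdint.h>
#include <stdio.h>

struct flags { int cf, of, zf, sf; };   /* illustrative names */

/* Add two 32-bit values and derive CF, OF, ZF, SF from the result. */
struct flags add_and_set_flags(uint32_t a, uint32_t b, uint32_t *result) {
    uint32_t r = a + b;                  /* wraps modulo 2^32 */
    struct flags f;
    f.cf = r < a;                        /* unsigned carry out of bit 31 */
    f.of = (~(a ^ b) & (a ^ r)) >> 31;   /* operands share a sign the result lacks */
    f.zf = (r == 0);                     /* result is zero */
    f.sf = r >> 31;                      /* sign bit of the result */
    *result = r;
    return f;
}

int main(void) {
    uint32_t r;
    struct flags f = add_and_set_flags(0x7FFFFFFF, 1, &r);  /* INT_MAX + 1 */
    printf("r=%08X CF=%d OF=%d ZF=%d SF=%d\n", r, f.cf, f.of, f.zf, f.sf);
    return 0;   /* prints r=80000000 CF=0 OF=1 ZF=0 SF=1 */
}
```

The example overflows the signed range without producing an unsigned carry, which is exactly the case OF (but not CF) is meant to catch.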
The detailed structure of the CPU: the control unit
(1) Program Counter (PC)
The PC is the register that stores the address of the next instruction to be executed.
(2) Instruction Register (IR)
The IR is the register that stores the instruction to be executed.
(3) Decoder (DE)
The decoder is a circuit that decodes the value of the instruction stored in the
instruction register.
Its result drives a set of control signals for the computer.
(4) Sequencer
The sequencer is the circuit that generates the control signals related to the clock
and the status of the computer.
The Basic Behavior of a von Neumann Computer
Execution flow/Instruction Cycle: the following infinite loop
Fetch (FE) → Decode (DE) → Execute (EX) → Fetch (FE) → ...
(1) Fetch (FE): the step at which the instruction indicated by the PC is read from memory and stored in the IR.
(2) Decode (DE): the step at which the instruction stored in the IR is decoded.
(3) Execute (EX): the step at which the instruction is carried out, depending on the results of the decoding stage.
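As a concrete illustration of this infinite loop, here is a toy fetch-decode-execute simulator in C. The accumulator machine and its 8-bit-opcode/24-bit-operand encoding are invented for this sketch; they are not a real ISA:

```c
#include <stdint.h>
#include <stdio.h>

enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };   /* made-up opcodes */

int main(void) {
    /* Program and data share one memory: the stored-program concept. */
    uint32_t mem[16] = {
        (OP_LOAD << 24) | 5,   /* acc = mem[5]  */
        (OP_ADD  << 24) | 6,   /* acc += mem[6] */
        (OP_HALT << 24),
        0, 0, 40, 2,           /* data lives at addresses 5 and 6 */
    };
    uint32_t pc = 0, ir, acc = 0;

    for (;;) {
        ir = mem[pc++];                 /* Fetch: mem[PC] -> IR, advance PC */
        uint32_t op   = ir >> 24;       /* Decode: split opcode and operand */
        uint32_t addr = ir & 0xFFFFFF;
        switch (op) {                   /* Execute */
        case OP_LOAD: acc = mem[addr];  break;
        case OP_ADD:  acc += mem[addr]; break;
        case OP_HALT: printf("acc = %u\n", acc); return 0;  /* prints 42 */
        }
    }
}
```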
The von Neumann Bottleneck
What is a bottleneck?
A bottleneck is a situation that limits the overall performance of a given product, like the neck of a bottle.

# Computer Architecture
Computer Engineering, 3rd year, Semester 1
Week 08

# LAST WEEK
Register Operands
Arithmetic instructions use register operands.
MIPS has a register file of 32 32-bit registers:
- used for frequently accessed data
- numbered 0 to 31
- 32-bit data is called a word
Assembler names, e.g.:
- $t0, $t1, ..., $t9 for temporary values
- $s0, $s1, ..., $s7 for saved variables
Design Principle: Smaller is faster.
- Main memory has millions of locations → slower
- Keep frequently used data in registers → fast access
- Minimize memory accesses

Use $t when you don't care about keeping the value after a function call; use $s when the value must survive across calls.
MIPS: Temporary vs. Saved Registers
Example:
# Using $t0 for temporary math
add $t0, $a0, $a1
# Using $s0 to store a persistent value
add $s0, $a2, $a3
If this function calls another function, it must save $s0 (using sw) before the call and restore it (using lw) afterward. But $t0 doesn't need to be preserved; it might be overwritten.
MIPS Register Numbering and Assembler Names
MIPS has 32 general-purpose registers, labeled $0 to $31. These are hardware register numbers, but in assembly programming we use symbolic names to make them easier to read and understand.

| Register Number | Assembler Name | Common Use |
|---|---|---|
| $0 | $zero | Always 0 (read-only) |
| $1 | $at | Reserved for assembler |
| $2-$3 | $v0-$v1 | Function return values |
| $4-$7 | $a0-$a3 | Function arguments |
| $8-$15 | $t0-$t7 | Temporary registers |
| $16-$23 | $s0-$s7 | Saved registers |
| $24-$25 | $t8-$t9 | More temporary registers |
| $26-$27 | $k0-$k1 | Reserved for kernel |
| $28 | $gp | Global pointer |
| $29 | $sp | Stack pointer |
| $30 | $fp | Frame pointer |
| $31 | $ra | Return address |

Memory Operands
Main memory is used for composite data:
- arrays, structures, dynamic data
To apply arithmetic operations:
- MIPS does not perform arithmetic directly on memory
- load values from memory into registers
- store results from registers back to memory
Memory is byte-addressed:
- each address identifies an 8-bit byte
- MIPS loads/stores 32-bit words (4 bytes)
Words are aligned in memory:
- a word address must be a multiple of 4 (bytes at addresses 0-3 form the first word, 4-7 the next, and so on)
Memory Operand Example
C code:
g = h + A[8];
- g is in $s1, h is in $s2
- the base address of array A (that is, of A[0]) is in $s3
Compiled MIPS code (index 8 requires an offset of 32, since there are 4 bytes per word):
lw $t0, 32($s3) # load word: offset 32, base register $s3
add $s1, $s2, $t0
[Figure: word-aligned array layout. With the base address of A at 1000 (1 word = 4 bytes = 32 bits), A[0] is at 1000, A[1] at 1004, A[2] at 1008, ..., and A[8] at 1000 + 32 = 1032.]
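The offset arithmetic the compiler performs here is just index × word size; a quick C check (base address 1000 taken from the figure):

```c
#include <stdio.h>

int main(void) {
    unsigned base = 1000;        /* base address of A, from the figure */
    unsigned index = 8;
    unsigned offset = index * 4; /* 4 bytes per 32-bit word */
    printf("offset %u, address %u\n", offset, base + offset); /* 32, 1032 */
    return 0;
}
```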
The MIPS Instruction Set
Used as the example in this class:
- Stanford MIPS, commercialized by MIPS Technologies
- Typical of many modern ISAs (Instruction Set Architectures)
- Similar ISAs have a large share of the embedded core market, with applications in consumer electronics, network/storage equipment, cameras, printers, ...
- Known for its simplicity and efficiency

MIPS instructions are categorized into four major types:
1. Arithmetic instructions
2. Logical operations
3. Memory access instructions
4. Control flow instructions
Arithmetic Instructions
The add operation has three operands: two sources and one destination.
add a, b, c # a gets b + c
All arithmetic operations have this form (sub, and, or, slt, ...).
Design Principle: Simplicity favors regularity.
- Regularity makes the implementation simpler
- Simplicity enables higher performance at lower cost
Arithmetic Instructions

| Instruction | Description | C Equivalent | MIPS Example | Assumed Register Mapping |
|---|---|---|---|---|
| add | Addition | x = a + b | add $t0, $t1, $t2 | $t0 = x, $t1 = a, $t2 = b |
| sub | Subtraction | x = a - b | sub $t0, $t1, $t2 | $t0 = x, $t1 = a, $t2 = b |
| addi | Add immediate (constant) | x = a + 5 | addi $t0, $t1, 5 | $t0 = x, $t1 = a |
| slt | Set if less than (signed) | x = (a < b) ? 1 : 0 | slt $t0, $t1, $t2 | $t0 = x, $t1 = a, $t2 = b |
| sltu | Set if less than (unsigned) | x = (a < b) ? 1 : 0 | sltu $t0, $t1, $t2 | $t0 = x, $t1 = a, $t2 = b |

Logical Instructions
| Instruction | Description | C Equivalent | MIPS Example | Assumed Register Mapping |
|---|---|---|---|---|
| and | Bitwise AND | x = a & b | and $t0, $t1, $t2 | $t0 = x, $t1 = a, $t2 = b |
| or | Bitwise OR | x = a \| b | or $t0, $t1, $t2 | $t0 = x, $t1 = a, $t2 = b |
| xor | Bitwise XOR | x = a ^ b | xor $t0, $t1, $t2 | $t0 = x, $t1 = a, $t2 = b |
| nor | Bitwise NOR | x = ~(a \| b) | nor $t0, $t1, $t2 | $t0 = x, $t1 = a, $t2 = b |
| andi | AND with constant | x = a & 0xF0 | andi $t0, $t1, 0xF0 | $t0 = x, $t1 = a |

Memory Access Instructions
| Instruction | Description | C Equivalent | MIPS Example | Assumed Register Mapping |
|---|---|---|---|---|
| lw | Load word from memory | x = A[i] | lw $t0, 8($s1) | $t0 = x, $s1 = base address |
| sw | Store word to memory | A[i] = x | sw $t0, 8($s1) | $t0 = x, $s1 = base address |
| lb | Load byte from memory | x = byte[i] | lb $t0, 4($s1) | $t0 = x, $s1 = base address |
| sb | Store byte to memory | byte[i] = x | sb $t0, 4($s1) | $t0 = x, $s1 = base address |

Control Flow Instructions
| Instruction | Description | C Equivalent | MIPS Example | Assumed Register Mapping |
|---|---|---|---|---|
| beq | Branch if equal | if (a == b) | beq $t1, $t2, LABEL | $t1 = a, $t2 = b |
| bne | Branch if not equal | if (a != b) | bne $t1, $t2, LABEL | $t1 = a, $t2 = b |
| j | Unconditional jump | goto LABEL | j LABEL | - |
| jal | Jump and link (call) | call FUNC() | jal FUNC | return address in $ra |
| jr | Jump to register (return) | return | jr $ra | $ra = return address |

Quiz 1: Arithmetic Example
Suppose we place the sum of four variables b, c, d and e into variable a
a = b + c + d + e;
What is the compiled MIPS code?
add a, b, c # sum of b and c is placed in a
add a, a, d # sum of b,c and d is now in a
add a, a, e # sum of b,c,d and e is now in a
Quiz 2: Arithmetic Example
Suppose we compute the following, where a and d receive the results:
a = b + c;
d = a - e;
What is the compiled MIPS code?
add a, b, c
sub d, a, e
Quiz 4: Register Operand
C code:
f = (g + h) - (i + j);
The variables f, g, ..., j are assigned to the registers $s0, $s1, ..., $s4.
What is the compiled MIPS code?
add $t0, $s1, $s2 # register $t0 contains g + h
add $t1, $s3, $s4 # register $t1 contains i + j
sub $s0, $t0, $t1 # f gets (g + h) - (i + j)
Quiz 5: Memory Operand
C code:
A[12] = h + A[8];
> h in $s2, base address of A in $s3
What is the compiled code?
lw $t0, 32($s3) # load word
add $t0, $s2, $t0
sw $t0, 48($s3) # store word
Convert C code to MIPS assembly

Find the absolute value (abs):
if (a < 0)
    abs = -a;
else
    abs = a;

Find the maximum (max):
if (a > b)
    max = a;
else
    max = b;

Check whether a is between b and c:
if (b < a && a < c)
    result = 1;
else
    result = 0;

Convert C code to MIPS assembly
Find the maximum (max):
if (a > b)
    max = a;
else
    max = b;
Hint: use slt, beq, bne, add, sub. (The solution below assumes a in $t0, b in $t1, max in $t2.)

slt $t3, $t1, $t0    # $t3 = 1 if b < a (i.e., a > b)
beq $t3, $zero, ELSE # if false, go to ELSE
add $t2, $t0, $zero  # max = a
j END
ELSE:
add $t2, $t1, $zero  # max = b
END:

Convert C code to MIPS assembly
Find the absolute value (abs):
if (a < 0)
    abs = -a;
else
    abs = a;
Hint: use slt, beq, sub, add. (The solution below assumes a in $t0, abs in $t1.)

slt $t2, $t0, $zero  # $t2 = 1 if a < 0
beq $t2, $zero, ELSE # if a >= 0, go to ELSE
sub $t1, $zero, $t0  # abs = 0 - a = -a
j END
ELSE:
add $t1, $t0, $zero  # abs = a
END:

Convert C code to MIPS assembly
Check whether a is between b and c:
if (b < a && a < c)
    result = 1;
else
    result = 0;
(The solution below assumes b in $t1, a in $t0, c in $t2, result in $t3. Since beq compares two registers, the constant 2 is first placed in a register.)

slt $t4, $t1, $t0     # $t4 = 1 if b < a
slt $t5, $t0, $t2     # $t5 = 1 if a < c
add $t6, $t4, $t5     # $t6 = $t4 + $t5
addi $t7, $zero, 2    # $t7 = 2
beq $t6, $t7, YES     # if both were true, result = 1
add $t3, $zero, $zero # result = 0
j END
YES:
addi $t3, $zero, 1    # result = 1
END:

# TODAY
Today's topics
Radix conversion:
- binary numbers (base 2)
- decimal numbers (base 10)
- hexadecimal numbers (base 16)
2's complement
Representing instructions

We use decimal numbers in our daily lives
Decimal numbers (base 10) use the ten digits 0 to 9; when a count passes 9, a carry-over occurs (..., 8, 9, 10, 11, ..., 98, 99, 100, ...). Hexadecimal numbers (base 16) need sixteen digits, so to represent the values 10 to 15 in a single digit we use A, B, C, D, E, and F (..., 9, A, B, ..., F, 10, 11, ..., 1F, 20, ...).

Conversion from base-n to decimal:

$365_{(10)} = 3 \times 10^2 + 6 \times 10^1 + 5 \times 10^0 = 365_{(10)}$
$101_{(2)} = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 5_{(10)}$

$125_{(8)} = 1 \times 8^2 + 2 \times 8^1 + 5 \times 8^0 = 85_{(10)}$

$\mathrm{2AD}_{(16)} = 2 \times 16^2 + \mathrm{A} \times 16^1 + \mathrm{D} \times 16^0 = 2 \times 16^2 + 10 \times 16^1 + 13 \times 16^0 = 685_{(10)}$

Conversion from decimal to base-n (divide repeatedly by the base; the remainders, read from last to first, are the digits):

$85_{(10)}$ to base 8: $85 \div 8 = 10$ remainder $5$; $10 \div 8 = 1$ remainder $2$; $1 \div 8 = 0$ remainder $1$. Ans. $125_{(8)}$

$685_{(10)}$ to base 16: $685 \div 16 = 42$ remainder $13$ (D); $42 \div 16 = 2$ remainder $10$ (A); $2 \div 16 = 0$ remainder $2$. Ans. $\mathrm{2AD}_{(16)}$
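The repeated-division method translates directly into code; a small C sketch:

```c
#include <stdio.h>

/* Print `value` in the given base (2..16) by repeated division:
   each remainder is the next digit, least significant first. */
void to_base(unsigned value, unsigned base) {
    const char digits[] = "0123456789ABCDEF";
    char buf[32];
    int i = 0;
    do {
        buf[i++] = digits[value % base];  /* remainder = next digit */
        value /= base;
    } while (value > 0);
    while (i--) putchar(buf[i]);          /* print most significant first */
    putchar('\n');
}

int main(void) {
    to_base(85, 8);    /* prints 125 */
    to_base(685, 16);  /* prints 2AD */
    return 0;
}
```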
Hexadecimal
Base 16:
- compact representation of bit strings
- 4 bits per hex digit

| Hex | Binary | Hex | Binary | Hex | Binary | Hex | Binary |
|---|---|---|---|---|---|---|---|
| 0 | 0000 | 4 | 0100 | 8 | 1000 | c | 1100 |
| 1 | 0001 | 5 | 0101 | 9 | 1001 | d | 1101 |
| 2 | 0010 | 6 | 0110 | a | 1010 | e | 1110 |
| 3 | 0011 | 7 | 0111 | b | 1011 | f | 1111 |

Example: eca8 6420 = 1110 1100 1010 1000 0110 0100 0010 0000

Unsigned Binary Integers
Given an n-bit number $x = x_{n-1}x_{n-2} \cdots x_1 x_0$, its unsigned value is
$x = x_{n-1} \cdot 2^{n-1} + x_{n-2} \cdot 2^{n-2} + \cdots + x_1 \cdot 2^1 + x_0 \cdot 2^0$

Example: what is the decimal value of the following 32-bit binary code?
0000 0000 0000 0000 0000 0000 0000 1011₂
= 0 + ... + 1×2³ + 0×2² + 1×2¹ + 1×2⁰ = 0 + ... + 8 + 0 + 2 + 1 = 11₁₀
Unsigned Binary Integers
Given an n-bit number, the range is 0 to $2^n - 1$. Using 32 bits: 0 to +4,294,967,295.
2's-Complement Signed Integers
Given an n-bit number $x = x_{n-1}x_{n-2} \cdots x_1 x_0$, its 2's-complement value is
$x = -x_{n-1} \cdot 2^{n-1} + x_{n-2} \cdot 2^{n-2} + \cdots + x_1 \cdot 2^1 + x_0 \cdot 2^0$
(the weight of the top bit is negative).

Example: what is the decimal value of the 2's-complement binary code
1111 1111 1111 1111 1111 1111 1111 1100₂?
= −1×2³¹ + 1×2³⁰ + ... + 1×2² + 0×2¹ + 0×2⁰
= −2,147,483,648 + 2,147,483,644 = −4₁₀
2's-Complement Signed Integers
Given an n-bit number, the range is $-2^{n-1}$ to $+2^{n-1} - 1$. Using 32 bits: −2,147,483,648 to +2,147,483,647.
2's-Complement Signed Integers
Bit 31 is the sign bit:
- 1 for negative numbers
- 0 for non-negative numbers
Non-negative numbers have the same unsigned and 2's-complement representation.
Some specific numbers:
- 0: 0000 0000 ... 0000
- −1: 1111 1111 ... 1111
- Most negative: 1000 0000 ... 0000 = −2,147,483,648
- Most positive: 0111 1111 ... 1111 = +2,147,483,647
Negating a 2's-Complement Number
Complement and add 1 (complement means 1 → 0, 0 → 1).
Example: negate +2.
+2 = 0000 0000 ... 0010₂
−2 = 1111 1111 ... 1101₂ + 1 = 1111 1111 ... 1110₂
Why this works: $x + \bar{x} = 111\ldots111_2 = -1$, so $\bar{x} + 1 = -x$.
Example: X = 0000 1111, $\bar{X}$ = 1111 0000.
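The same rule in C, checked on the example above:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t x = 2;
    uint32_t neg = ~x + 1;        /* complement every bit, then add 1 */
    printf("%08X\n", neg);        /* FFFFFFFE: the pattern for -2 */
    printf("%d\n", (int32_t)neg); /* -2 when read as a signed integer */
    return 0;
}
```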
Sign Extension
Representing a number using more bits while preserving its numeric value.
Why it matters, when instructions handle:
- addi: the immediate value must be extended
- lb, lh: the loaded byte/halfword must be extended
Rule: replicate the sign bit to the left (cf. unsigned values: extend with 0s).
Examples (8 bits to 16 bits):
- +2: 0000 0010 => 0000 0000 0000 0010
- −2: 1111 1110 => 1111 1111 1111 1110
Sign extension in the MIPS instruction set:
- addi extends the immediate value: in addi $t0, $t1, -13 (that is, $t0 = $t1 + (−13)), the value −13 is 1111 0011 in 8 bits and 1111 1111 1111 0011 in 16 bits; it is sign-extended to 1111 1111 1111 1111 1111 1111 1111 0011 in 32 bits before the addition.
- lb (load byte) and lh (load halfword) load 8-bit and 16-bit data from memory into registers, sign-extending them to 32 bits.
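C widens signed types with exactly this sign extension (and unsigned types with zero extension), so the behavior of lb versus an unsigned byte load can be mimicked directly:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int8_t   b  = -13;           /* 8-bit pattern 1111 0011 */
    int32_t  w  = b;             /* sign-extended on widening, like lb */
    uint8_t  ub = 0xF3;          /* the same bits, but unsigned */
    uint32_t uw = ub;            /* zero-extended instead */
    printf("%08X %d\n", (uint32_t)w, w);  /* FFFFFFF3 -13 */
    printf("%08X %u\n", uw, uw);          /* 000000F3 243 */
    return 0;
}
```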
Representing Instructions
- MIPS R-format instructions: used for arithmetic and logical operations
- MIPS I-format instructions: used for operations involving constants (immediate values), load/store instructions, and branch instructions
Instructions are encoded in binary, called machine code.
MIPS instructions are encoded as 32-bit instruction words containing an operation code (opcode), register numbers, and more.
Register numbers:
- $t0-$t7 are registers number 8-15
- $t8-$t9 are registers number 24-25
- $s0-$s7 are registers number 16-23
($t registers hold temporary values; $s registers hold saved variables in a program.)
MIPS R-format Instructions
Instruction fields:
- op: operation code (opcode)
- rs: first source register number
- rt: second source register number
- rd: destination register number
- shamt: shift amount (00000 for now)
- funct: function code (extends the opcode)

| op | rs | rt | rd | shamt | funct |
|---|---|---|---|---|---|
| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
R-format Example (add operation)
add $t0, $s1, $s2 ($t0 = $s1 + $s2)

| Field | op (special) | rs ($s1) | rt ($s2) | rd ($t0) | shamt | funct (add) |
|---|---|---|---|---|---|---|
| Decimal | 0 | 17 | 18 | 8 | 0 | 32 |
| Binary | 000000 | 10001 | 10010 | 01000 | 00000 | 100000 |

0000 0010 0011 0010 0100 0000 0010 0000₂ = 0x02324020
R-format Example (sub operation)
sub $t0, $s1, $s2 ($t0 = $s1 − $s2)

| Field | op (special) | rs ($s1) | rt ($s2) | rd ($t0) | shamt | funct (sub) |
|---|---|---|---|---|---|---|
| Decimal | 0 | 17 | 18 | 8 | 0 | 34 |
| Binary | 000000 | 10001 | 10010 | 01000 | 00000 | 100010 |

0000 0010 0011 0010 0100 0000 0010 0010₂ = 0x02324022
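The field packing can be checked mechanically. A small C sketch (the helper function is ours) that reproduces both encodings above:

```c
#include <stdint.h>
#include <stdio.h>

/* Pack the six R-format fields (6/5/5/5/5/6 bits) into one 32-bit word. */
uint32_t r_format(unsigned op, unsigned rs, unsigned rt,
                  unsigned rd, unsigned shamt, unsigned funct) {
    return (op << 26) | (rs << 21) | (rt << 16)
         | (rd << 11) | (shamt << 6) | funct;
}

int main(void) {
    /* add $t0, $s1, $s2: op=0, rs=17, rt=18, rd=8, shamt=0, funct=32 */
    printf("%08X\n", r_format(0, 17, 18, 8, 0, 32));  /* 02324020 */
    /* sub $t0, $s1, $s2: same fields, funct=34 */
    printf("%08X\n", r_format(0, 17, 18, 8, 0, 34));  /* 02324022 */
    return 0;
}
```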
MIPS R-format Instructions

| Instruction | Opcode (dec) | Opcode (bin) | Funct (dec) | Funct (bin) |
|---|---|---|---|---|
| add | 0 | 000000 | 32 | 100000 |
| sub | 0 | 000000 | 34 | 100010 |
| and | 0 | 000000 | 36 | 100100 |
| or | 0 | 000000 | 37 | 100101 |
| xor | 0 | 000000 | 38 | 100110 |
| nor | 0 | 000000 | 39 | 100111 |
| slt | 0 | 000000 | 42 | 101010 |
| sll | 0 | 000000 | 0 | 000000 |
| srl | 0 | 000000 | 2 | 000010 |
| jr | 0 | 000000 | 8 | 001000 |

MIPS I-format Instructions
Immediate arithmetic and load/store instructions:
- rt: destination or source register number
- constant: $-2^{15}$ to $+2^{15} - 1$
- address: offset added to the base address in rs
Design Principle: Good design demands good compromises.
- Different formats complicate decoding, but allow uniformly 32-bit instructions
- Keep formats as similar as possible

| op | rs | rt | constant or address |
|---|---|---|---|
| 6 bits | 5 bits | 5 bits | 16 bits |
I-format Example (Load operation)
lw $t0, 32($s3) # A[8] is loaded into $t0

| Field | op (lw) | rs ($s3) | rt ($t0) | constant or address |
|---|---|---|---|---|
| Decimal | 35 | 19 | 8 | 32 |
| Binary | 100011 | 10011 | 01000 | 0000 0000 0010 0000 |

1000 1110 0110 1000 0000 0000 0010 0000₂ = 0x8E680020
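The same kind of check for the I-format (the helper function is ours):

```c
#include <stdint.h>
#include <stdio.h>

/* Pack op, rs, rt and a 16-bit constant/offset into an I-format word. */
uint32_t i_format(unsigned op, unsigned rs, unsigned rt, int16_t imm) {
    return (op << 26) | (rs << 21) | (rt << 16) | (uint16_t)imm;
}

int main(void) {
    /* lw $t0, 32($s3): op=35, rs=19, rt=8, offset=32 */
    printf("%08X\n", i_format(35, 19, 8, 32));   /* 8E680020 */
    return 0;
}
```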
MIPS I-format Instructions

| Instruction | Opcode (dec) | Opcode (bin) |
|---|---|---|
| addi | 8 | 001000 |
| andi | 12 | 001100 |
| ori | 13 | 001101 |
| xori | 14 | 001110 |
| lw | 35 | 100011 |
| sw | 43 | 101011 |
| lb | 32 | 100000 |
| sb | 40 | 101000 |
| lh | 33 | 100001 |
| sh | 41 | 101001 |
| beq | 4 | 000100 |
| bne | 5 | 000101 |
| slti | 10 | 001010 |

Logical Operations
Instructions for bitwise manipulation, useful for extracting and inserting groups of bits in a word:

| Operation | C | Java | MIPS |
|---|---|---|---|
| Shift left | << | << | sll |
| Shift right | >> | >>> | srl |
| Bitwise AND | & | & | and, andi |
| Bitwise OR | \| | \| | or, ori |
| Bitwise NOT | ~ | ~ | nor |

Shift Operations: MIPS R-format Instructions
shamt: how many positions to shift.
Shift left logical (sll):
- shift left and fill with 0 bits
- sll by i bits multiplies by $2^i$
Shift right logical (srl):
- shift right and fill with 0 bits
- srl by i bits divides by $2^i$ (unsigned only)

| op | rs | rt | rd | shamt | funct |
|---|---|---|---|---|---|
| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
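C's shift operators behave the same way on unsigned values, so the multiply/divide effect is easy to verify:

```c
#include <stdio.h>

int main(void) {
    unsigned x = 5;
    printf("%u\n", x << 2);   /* like sll by 2: 5 * 2^2 = 20 */
    unsigned y = 40;
    printf("%u\n", y >> 3);   /* like srl by 3: 40 / 2^3 = 5 (unsigned) */
    return 0;
}
```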
AND Operations
Useful to mask bits in a word: select some bits, clear the others to 0.
and $t0, $t1, $t2
$t1 = 0000 0000 0000 0000 0011 1100 0000 0000
$t2 = 0000 0000 0000 0000 0000 1101 1100 0000
$t0 = 0000 0000 0000 0000 0000 1100 0000 0000

OR Operations
Useful to include bits in a word: set some bits to 1, leave the others unchanged.
or $t0, $t1, $t2
$t1 = 0000 0000 0000 0000 0011 1100 0000 0000
$t2 = 0000 0000 0000 0000 0000 1101 1100 0000
$t0 = 0000 0000 0000 0000 0011 1101 1100 0000

NOT Operations
Useful to invert bits in a word: change 0 to 1, and 1 to 0.
MIPS has a 3-operand NOR instruction: a NOR b == NOT(a OR b).
nor $t0, $t1, $zero (register $zero always reads as zero)
$t1 = 0000 0000 0000 0000 0011 1100 0000 0000
$t0 = 1111 1111 1111 1111 1100 0011 1111 1111
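Using the exact bit patterns from these three slides, the mask/include/invert effects can be reproduced in C:

```c
#include <stdio.h>

int main(void) {
    unsigned t1 = 0x00003C00;      /* $t1 from the slides */
    unsigned t2 = 0x00000DC0;      /* $t2 from the slides */
    printf("%08X\n", t1 & t2);     /* AND masks bits:   00000C00 */
    printf("%08X\n", t1 | t2);     /* OR includes bits: 00003DC0 */
    printf("%08X\n", ~(t1 | 0u));  /* NOR with zero inverts t1: FFFFC3FF */
    return 0;
}
```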
# Computer Architecture
Computer Engineering, 3rd year, Semester 1
Week 07

# LAST WEEK
Power Saving
1. Parallelism
- Multi-core processors: incorporate more cores into a single processor chip
- Simultaneous multithreading (SMT)
- GPUs and accelerators: utilize graphics processing units (GPUs) and other accelerators (such as TPUs for machine learning)
2. Memory optimization
- Prefetching: predict and load data into the cache before the CPU needs it, to reduce wait times
- Memory bandwidth and latency improvements: use faster memory technologies (such as DDR5 or HBM) and optimize memory controller designs
3. Software and algorithm optimization
- Efficient algorithms: use algorithms with lower computational complexity to reduce the number of instructions required to complete a task
- Concurrency: design software to effectively utilize multiple cores and threads
4. Specialized hardware: develop custom hardware tailored to specific tasks
5. Energy-efficient computing: turn off portions of the processor when they are not in use
6. Heterogeneous computing: integrate different types of processors on the same chip to handle diverse workloads

Execution of Programs
[Figure: compilation flow. A high-level program in C, Java, Verilog-HDL, etc., e.g.

if (y == 1)
    c = a + b;
else
    c = a - b;

is translated by the compiler into assembly language for the target instruction set, e.g.

lw $4, 0($1)
lw $5, 4($1)
add $2, $4, $5

and then into binary machine code (e.g., 100101001010100...) that the CPU executes.]
Interface between programs and CPUs: a C program is CPU-independent. Each CPU (CPU-A, CPU-B, CPU-C) has its own instruction set, so a compiler targeting that CPU translates the same program into CPU-dependent machine language.
Operations of a Processor
[Figure: the processor (registers, PC, decoder, ALU) is connected to main memory by a bus. Registers: small capacity, fast access. Main memory: big capacity, slow access.]
- ALU (Arithmetic-Logic Unit): executes arithmetic and logic operations
- Registers: store data computed by the ALU; data is read from main memory and written back to main memory
- Main memory: stores programs, instructions, and data
Load/store architecture: main memory is accessed only through load (read) and store (write) instructions, and the ALU executes operations on register values.
Operations of a Processor
[Figure: processor with registers A and B and the ALU, connected to main memory by an address bus and a data bus.]
When you want to sum two data items in main memory and write the result back to memory:
1. Read the data from memory into registers (load instructions)
2. Let the ALU calculate using the register values
3. Write the value of the result register back to memory (store instruction)
# TODAY
Registers vs. Memory
- Registers are faster to access than memory.
- Operating on memory data requires loads and stores: more instructions to be executed.
- The compiler must use registers for variables as much as possible, spilling to memory only for less frequently used variables.
- Register optimization is important!
# Computer Architecture
Computer Engineering, 3rd year, Semester 1
Week 06

# LAST WEEK
Performance and Execution Time
Performance is defined as
$\text{Performance} = \dfrac{1}{\text{Execution Time}}$
For two computers X and Y, if the performance of X is greater than that of Y, we have
$\text{Performance}_X > \text{Performance}_Y \iff \text{Execution Time}_X < \text{Execution Time}_Y$
That is, the execution time on Y is longer if X is faster.

Relative Performance
X is n times faster than Y when
$\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$
Example: a program takes 10 s on computer A and 15 s on computer B.
Execution Time B / Execution Time A = 15 s / 10 s = 1.5, so A is 1.5 times faster than B.
CPU Clocking
Operation of digital hardware is governed by a constant-rate clock: each clock cycle consists of data transfer and computation followed by a state update.
- Clock period: duration of a clock cycle, e.g., 250 ps = 0.25 ns = 250 × 10⁻¹² s
- Clock frequency (rate): cycles per second, e.g., 4.0 GHz = 4000 MHz = 4.0 × 10⁹ Hz

CPU Time
e.g., 4.0GHz = 4000MHz = 4.0 10 9Hz CPU Time
CPU Time is the actual time the CPU spends processing instructions for a program.
Its a key measure of how efficiently a program runs on a processor.
Rate Clock
Cycles Clock CPU
Time Cycle Clock Cycles Clock CPU Time CPU
When executing a program, (using clock rate)
Unit: cycle Unit: seconds/cycle
Unit: cycles/second , Hz
Unit: second Instruction Count and CPI
Instruction count = the total number of instructions a CPU must execute to complete a program. It depends on:
- the program itself (what it is doing)
- the compiler (how it translates code)
- the instruction set architecture (ISA)

CPI (cycles per instruction) = the average number of clock cycles the CPU takes to execute one instruction. It depends on:
- the CPU hardware design
- the type of instruction (some are faster/slower)
- how well the CPU handles memory and branches

When executing a program:
$\text{Clock Cycles} = \text{Instruction Count} \times \text{CPI}$
$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \dfrac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$
(instruction count in instructions; CPI in cycles/instruction)

CPI in More Detail
If different instruction classes take different numbers of cycles: when a program runs, not all instructions are equal; some take more clock cycles than others. So instead of a single CPI value, we calculate an average CPI based on instruction classes.
Example:

| Instruction Type | % of Instructions | Cycles per Instruction |
|---|---|---|
| ALU (add/sub) | 50% | 1 cycle |
| Load/Store | 30% | 2 cycles |
| Branches (if/jump) | 20% | 3 cycles |

Weighted average CPI:
$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$
$\text{CPI} = \dfrac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \dfrac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$
(the last fraction is the relative frequency of class i)
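A quick C check of the weighted-average CPI for this instruction mix (the 10⁹ instruction count and 2 GHz clock in the second half are assumed values, added for illustration):

```c
#include <stdio.h>

int main(void) {
    double cpi[]  = {1.0, 2.0, 3.0};     /* ALU, load/store, branch */
    double frac[] = {0.50, 0.30, 0.20};  /* relative frequencies    */
    double avg = 0.0;
    for (int i = 0; i < 3; i++)
        avg += cpi[i] * frac[i];         /* weighted average CPI    */
    printf("average CPI = %.2f\n", avg); /* 1.70 */

    /* CPU time = IC * CPI / clock rate, e.g. 1e9 instructions at 2 GHz */
    printf("CPU time = %.3f s\n", 1e9 * avg / 2e9);  /* 0.850 */
    return 0;
}
```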
Power and Energy
Important for embedded systems and computers: embedded systems want to save as much battery as possible and to reduce the heat generated.
$\text{Energy per switching transition} = \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$
$\text{Power} = \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$
where the capacitive load C measures how much charge moves in each switch, V is the voltage, and f is the clock frequency (how often switching happens).
Power increases with the square of voltage, so lowering voltage is very effective for saving energy. Power is proportional to frequency: higher clock speeds burn more energy.
CPU Time Quiz I
Computer A has a 2 GHz clock and takes 10 s of CPU time to run a program.
Try to design computer B:
- aim for 6 s of CPU time
- it can use a faster clock, but this causes 1.2× as many clock cycles as computer A
How fast must computer B's clock be?
$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$
$\text{Clock Rate}_B = \dfrac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \dfrac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \dfrac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$
CPI Quiz II
Computer A has a clock cycle time of 250 ps and CPI = 2.0 for some program.
Computer B has a clock cycle time of 500 ps and CPI = 1.2 for the same program.
Which is faster, and by how much?
$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = 500\,\text{ps} \times I$
$\text{CPU Time}_B = I \times 1.2 \times 500\,\text{ps} = 600\,\text{ps} \times I$
A is faster, by $\dfrac{\text{CPU Time}_B}{\text{CPU Time}_A} = \dfrac{600\,\text{ps} \times I}{500\,\text{ps} \times I} = 1.2$
CPI Quiz III
Alternative compiled code sequences use instructions in classes A, B, C:

| Class | A | B | C |
|---|---|---|---|
| CPI for class | 1 | 2 | 3 |
| Instructions in sequence 1 | 2 | 1 | 2 |
| Instructions in sequence 2 | 4 | 1 | 1 |

1. Which sequence executes the most instructions?
Sequence 1 executes 2 + 1 + 2 = 5 instructions; sequence 2 executes 4 + 1 + 1 = 6 instructions. Sequence 1 executes fewer instructions.
2. Which will be faster? How about CPU clock cycles?
CPU clock cycles₁ = 2×1 + 1×2 + 2×3 = 10 cycles; CPU clock cycles₂ = 4×1 + 1×2 + 1×3 = 9 cycles. Sequence 2 is faster even though it executes one more instruction.

Quiz IV
Suppose you reduce the voltage by 17%, and this also reduces the switching frequency by 17%. By how much is the dynamic power reduced?
$\dfrac{P_{\text{new}}}{P_{\text{old}}} = \dfrac{\tfrac{1}{2} \times C \times (0.83\,V)^2 \times 0.83\,f}{\tfrac{1}{2} \times C \times V^2 \times f} = 0.83^3 \approx 0.57 \approx 0.6$
The dynamic power falls to roughly 0.6 of its old value.

Group Work: Reducing Power
The power wall:
- We can't reduce voltage much more: lower voltage saves energy, but too low causes instability.
- We can't remove heat more efficiently: smaller chips mean more transistors and more heat in less space.
So simply increasing the clock speed or the transistor count no longer works without overheating or wasting power.

# Presentation
Your goal is to explain how computer architects overcome the power wall and improve speed, efficiency, and energy use.

# TODAY: POWER SAVING PRESENTATION
# INSTRUCTION SET
Instruction and Instruction Set
- Instruction: a word of a computer's language
- Instruction set: the set of instructions a computer supports; it depends on each computer
[Figure: different computers support different instruction sets, e.g., Comp. A: ADD, SUB, Logic, Branch; Comp. B: ADD, Branch; Comp. C: ADD, SUB, And, Branch.]

Instruction set
Different computers have different instruction sets,
- but with many aspects in common: most instruction sets have similarities.
Early computers had very simple instruction sets, which simplified implementation; many modern computers also have simple instruction sets.
A part of the MIPS instructions
In this class, we will use the MIPS instruction set.

| Category | Instruction | Example | Meaning |
|---|---|---|---|
| Arithmetic | add | add $s1, $s2, $s3 | $s1 = $s2 + $s3 |
| Arithmetic | subtract | sub $s1, $s2, $s3 | $s1 = $s2 − $s3 |
| Data transfer | load word | lw $s1, 100($s2) | Load from memory address [$s2 + 100] into register $s1 |
| Data transfer | store word | sw $s1, 100($s2) | Store from register $s1 to memory address [$s2 + 100] |
| Branch | branch on equal | beq $s1, $s2, L | If $s1 == $s2, branch to L |
| Branch | branch on not equal | bne $s1, $s2, L | If $s1 != $s2, branch to L |
| Branch | set on less than | slt $s1, $s2, $s3 | If $s2 < $s3 then $s1 = 1, otherwise $s1 = 0 |
| Jump | jump | j L | Jump to L |
| Jump | jump register | jr $s1 | Jump to the address in $s1 |
# Computer Architecture
Computer Engineering, 3rd year, Semester 1
Week 04

# LAST WEEK
Computer Classification: Microcontrollers
Microcomputer / microcontroller
- Usage: embedded systems
- Features: miniaturization, energy saving, low power consumption; the CPU, memory, and peripheral circuits are contained in the same IC
- Currently: also known as an SoC (System on a Chip). This is a computer!
Computer Classification: Personal Computers
- Usage: office work, internet, gaming
- Features: low price; versatile and scalable; single architecture; large variety of peripherals, e.g., keyboard, mouse, display, sound card, printer
- Currently: research, industry
Computer Classification
Workstations
Usage : large calculation e.g., research, industry
Features : Like PCs but with higher performance, High price.
Gone with the success of personal computers but revived with
artificial intelligence.
> 5
Computer Classification
Mainframes
Usage : Internet companies, banks, industry, education, offices, etc.
Features : Very high parallelism. One computer, many users (access
via terminals)
> 6
Computer Classification
Supercomputers
Usage : Industry and research huge computations.
Features :
>
Huge performance
>
Huge parallelism
>
Huge area
>
Huge power consumption
>
One computer, few users
> 7
Mechanical Era
Timeframe: 1620s to the early 20th century.
Key developments:
- Mechanical calculators, which could perform basic arithmetic operations
- Charles Babbage's designs: the Difference Engine and the Analytical Engine, introducing concepts like programmability and mechanical computation
- Punched card systems: tabulating and processing data
Electro-Mechanical Era (Transition)
Timeframe: roughly from the late 19th century to the mid-20th century.
Key developments:
- Integration of electromechanical components such as relays and switches, e.g., the Harvard Mark I, which used electromechanical relays for computation
- Punched card systems continued to be widely used for data input and processing, particularly in industries requiring large-scale data manipulation

Electronic Era
Timeframe: starting in the 1940s and continuing today.
Key developments:
- ENIAC: the Electronic Numerical Integrator and Computer, completed in 1946, marked the beginning of the electronic era
- Transistors: in 1947, the invention of the transistor at Bell Labs revolutionized electronics; transistors replaced bulky vacuum tubes, leading to smaller, faster, and more reliable computers

Post-PC Era (Contemporary)
Timeframe: from the late 20th century to the present.
Key characteristics:
- Proliferation of mobile devices: smartphones and tablets have become popular
- Touch interfaces and intuitive user experiences
- Cloud computing
- Internet of Things (IoT)
- AI and voice assistants

The Post-PC Era: Technology Trends (Now)
Advanced ICs (AI, 3D, RISC-V, thermodynamic):
- AI chips: TPU (Tensor Processing Unit) from Google; NPU (Neural Processing Unit) from Apple, Huawei
- Chiplets: modular chip building blocks used in AMD and Intel architectures
- 3D-stacked ICs: vertical integration to boost speed and density
- RISC-V: an open-source CPU architecture with a growing ecosystem
- Thermodynamic computing (experimental): uses physics to power ultra-low-energy AI tasks
Semiconductor Technology
The foundation of all modern electronics, including smartphones, computers, cars, TVs, and even smart fridges.
A semiconductor is a material (like silicon) that can act as both a conductor and an insulator depending on how it is used. Materials are added to transform its properties into:
- conductors (let electricity flow)
- insulators (block electricity)
- switches (control the flow, on/off)
This makes it perfect for building transistors, the basic on/off switches in all digital devices.
Defining Performance
Question: which airplane has the best performance?

| Airplane | Passenger capacity | Cruising range (miles) | Cruising speed (mile/h) |
|---|---|---|---|
| Boeing 737 | 240 | 3,000 | 564 |
| BAC/Sud Concorde | 132 | 4,000 | 1,350 |
| Boeing 777 | 301 | 9,395 | 554 |
| Airbus A380-800 | 853 | 8,477 | 587 |

- Passenger capacity: Airbus
- Cruising range: Boeing 777
- Cruising speed: Concorde
Best performance depends on the definition of performance.
Group work: let's discuss the definition of performance.
1. Propose a definition for performance.
2. Under that definition, which airplane has the best performance?

If we define performance as less time required to transport people, i.e., passenger throughput = passenger capacity × cruising speed (mile/h):

| Airplane | Passenger capacity | Cruising speed (mile/h) | Passenger throughput (passengers × mile/h) |
|---|---|---|---|
| Boeing 737 | 240 | 564 | 135,360 |
| BAC/Sud Concorde | 132 | 1,350 | 178,200 |
| Boeing 777 | 301 | 554 | 166,754 |
| Airbus A380-800 | 853 | 587 | 500,711 |

The Airbus has the best performance (passenger throughput).

Response Time and Throughput
Response time (execution time):
- the total time required for a computer to complete a task, including disk access, memory access, and I/O activities
Throughput:
- the total amount of work done per unit time, e.g., the number of tasks/transactions per hour
How are response time and throughput affected by:
- replacing the processor with a faster version?
- adding more processors?
Question A: if we replace the processor in a computer with a faster version, do we
1. increase throughput,
2. decrease response time, or
3. both of the above?
Answer: 3. With a faster processor, program A runs in 5 s instead of 10 s (decreased response time), and program A can run twice in 10 s (increased throughput).
Question B: if we add additional processors to a system that uses multiple processors for separate tasks, do we
1. increase throughput,
2. decrease response time, or
3. both of the above?
Answer: 1. Instead of running program A (10 s) and then program B (10 s) on one processor, the two programs run on separate processors in 10 s total: throughput increases, but the response time of each individual program does not decrease.
# TODAY: PERFORMANCE OF COMPUTER
Measuring Execution Time
Elapsed time:
- total response time, including all aspects: processing, I/O, OS overhead, idle time (waiting)
- determines system performance
CPU time:
- time spent processing a given job; discounts I/O time and other jobs' shares
- user CPU time: time spent executing user code (e.g., a program's logic)
- system CPU time: time spent doing system-level operations (e.g., file I/O, memory allocation by the OS)
e.g., 4.0GHz = 4000MHz = 4.0 10 9Hz CPU Time
CPU Time is the actual time the CPU spends processing instructions for a program.
Its a key measure of how efficiently a program runs on a processor.
Rate Clock
Cycles Clock CPU
Time Cycle Clock Cycles Clock CPU Time CPU
When executing a program, (using clock rate)
Unit: cycle Unit: seconds/cycle
Unit: cycles/second , Hz
Unit: second Instruction Count and CPI
Instruction Count = The total number of instructions a CPU must execute to
complete a program.
It depends on:
The program itself (what it's doing)
The compiler (how it translates code)
The instruction set architecture (ISA)
> 27
Instruction Count and CPI
CPI = The average number of clock cycles the CPU takes to execute one
instruction . (Cycles per Instruction)
It depends on:
CPU hardware design
Type of instruction (some are faster/slower)
How well the CPU handles memory and branches
> 28
Instruction Count and CPI
Instruction Count for a program
>
Determined by program, instruction set
Average cycles per instruction (CPI)
>
Determined by CPU hardware
Rate Clock
CPI Count nInstructio
Time Cycle Clock CPI Count nInstructio Time CPU
nInstructio per Cycles Count nInstructio Cycles Clock
When executing a program,
Unit: Cycles/Instruction Unit: Instructions CPI CPI in More Detail
If different instruction classes take different numbers of cycles
When a program runs, not all instructions are equal - some take more clock
cycles than others.
So instead of a single CPI value, we calculate an average CPI based on
instruction classes .
Example,
> Cycles per Instruction % of Instructions Instruction Type
> 1 cycle 50% ALU (Add/Sub)
> 2 cycles 30% Load/Store
> 3 cycles 20% Branches (if/jump)
CPI in More Detail
If different instruction classes take different numbers of cycles
Weighted average CPI
#
> n
> 1i
> ii
)Count nInstructio (CPI Cycles Clock
#
> n
> 1i
> i
> i
Count nInstructio
Count nInstructio
CPI
Count nInstructio
Cycles Clock
CPI
> Relative frequency
Performance Summary
$\text{CPU Time} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$
Performance depends on:
- Algorithm: affects instruction count (IC), possibly CPI
- Programming language: affects IC, CPI
- Compiler: affects IC, CPI
- Instruction set architecture: affects IC, CPI, and the cycle time $T_c$
In short:
- fewer instructions = potentially faster performance
- lower CPI = better performance
- shorter cycles = faster execution
- higher clock rate = more instructions per second

MIPS (million instructions per second)
MIPS measures how many millions of instructions a CPU can execute per second, i.e., the execution speed of a computer's processor: $\text{MIPS} = \dfrac{\text{Instruction count}}{\text{Execution time} \times 10^6}$
- Comparative analysis: it allows easy comparison between different processors and systems.
- A higher MIPS value generally indicates a faster CPU.
The von Neumann Bottleneck
In von Neumann computers there is a single memory: both program and data are stored in the same memory and are transferred over the same bus. Instructions and data therefore cannot be retrieved from memory at the same time; this is the von Neumann bottleneck.
Summary
Computer architecture: the overall design of a computer system, including:
- hardware specifications
- software (OS, compilers)
- behavior and structure of the computer
Three key questions about computer processing:
- How is information represented?
- How is data calculated?
- What processes are used?
Von Neumann architecture:
- stored-program concept
- sequential processing
- linear memory addressing
- instruction cycle (Fetch → Decode → Execute)

# Computer Architecture
Computer Engineering, 3rd year, Semester 1
Week 03
## What do we mean by architecture?
In general, Architecture means a style, structure, or composition
of buildings
Computer Architecture
>
Basic design and design concepts in computers
>
Hardware specifications with system software such as
operating systems and compilers
>
Architecture of the computer + Behavior of the computer Overall view of a computer
> 4
> Computer Architecture
## Instruction set Operating system
## Operative unit Memory unit
## Control unit Input and output units
> Architecture of the computer
> Behavior of the computer
Three questions about computer processing
Input data
Information
Processing, computing and storage
of information
Output data
Information
Display
Q1
How do computers
represent information?
what format?
Q
How is the acquired data
calculated?
Q
What procedures are used
to process (calculate) the
"information"? Von Neumann Architecture (Von Neumann-type Computers)
Many computers are designed as Neumann-type calculators
Proposed by John von Neumann in 1945. The Basic Structure of Neumann computer
Stored-program computer
>
Instructions (programs) and data are placed indistinguishably in main memory
>
The distinction between instructions and data is made by the program
Sequential processing
>
A program is a list of instructions
>
Instructions are fetched one by one from main memory and executed in a
determined order
>
The position in memory of the next instruction to execute is stored into a
register called the program counter .
Linear address
>
Each cell of main memory is numbered sequentially
>
This number is called address
>
The Address is used to indicate the location of instructions and data 7
The memory address
PC The Basic Structure of Neumann computer
> 8
The Basic Structure of Neumann computer
The central processing unit (CPU) is the part of the computer that contains
the arithmetic and control units.
>
Operating Unit : The arithmetic unit is the device that performs arithmetic and
logical operations and that temporarily stores the operation terms and results.
>
Control Unit : The control unit is the device that controls the operation of the
computer.
Memory Unit : The memory unit is the device that stores the program and the
data.
>
Main Memory Unit : The main memory is the part of the storage unit to which
the CPU has direct access.
>
Auxiliary Memory Unit : The auxiliary storage is the part of the memory that
cannot be directly accessed by the CPU.
Input/Output Unit(s) : The input/output unit is the device that performs the
data transfers between the computer and its environment.
The detailed structure of the CPU
The operative unit
The detailed structure of the CPU
The operative unit
(1) General-Purpose Registers (GPR)
The general-purpose registers are the registers that temporarily store the
operands and results of the computer's calculations.
Compared to the memory units: faster, smaller, and without any address.
(2) Arithmetic and Logic Unit (ALU)
The ALU is the device that performs the arithmetic and logic operations.
(3) Flag Register (FR)
The detailed structure of the CPU
The operative unit
(1) General-Purpose Registers (GPR)
(2) Arithmetic and Logic Unit (ALU)
(3) Flag Register (FR)
The flag register stores the status of the previous operation.
The flag register is 4 or more bits wide.
Roles: Conditional branching (if and for statements)
CF OF ZF SF
CF: Carry Flag
The output carry of the operation
OF: Overflow Flag
The overflow of signed calculations
ZF: Zero Flag
Indicates whether the result was 0
SF: Sign Flag
Indicates whether the result was negative
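As an illustration of how these four flags could be derived, here is a Python sketch for an 8-bit addition; the function name and encoding are assumptions for this example, not the circuit of any particular CPU.

```python
# Sketch: deriving CF, OF, ZF, SF for an 8-bit addition (a, b in 0..255).

def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags)."""
    raw = a + b
    result = raw & 0xFF                  # keep the low 8 bits
    cf = raw > 0xFF                      # CF: carry out of bit 7
    sa, sb, sr = a >> 7, b >> 7, result >> 7
    of = (sa == sb) and (sr != sa)       # OF: operands share a sign,
                                         #     result has the other sign
    zf = result == 0                     # ZF: result is zero
    sf = sr == 1                         # SF: sign bit of the result
    return result, {"CF": cf, "OF": of, "ZF": zf, "SF": sf}

print(add8_flags(0x7F, 0x01))  # 127 + 1 -> 0x80: OF and SF set
print(add8_flags(0xFF, 0x01))  # 255 + 1 -> 0x00: CF and ZF set
```

The second example is exactly the situation a conditional branch tests: the wrapped result is 0, so ZF is set even though a carry occurred.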
The detailed structure of the CPU
The Control Unit
The detailed structure of the CPU
The Control Unit
(1) Program Counter (PC)
The PC is the register that stores the address of the next instruction to be executed.
(2) Instruction Register (IR)
The IR is the register that stores the instruction to be executed.
(3) Decoder (DE)
The decoder is a circuit that decodes the value of the instruction stored in the
instruction register.
Its result drives a set of control signals for the computer.
(4) Sequencer
The sequencer is the circuit that generates the control signals related to the clock
and the status of the computer.
The Basic Behavior of a von Neumann Computer
Execution flow/Instruction Cycle: the following infinite loop
Fetch (FE) → Decode (DE) → Execute (EX) → back to Fetch
The Basic Behavior of a von Neumann Computer
(1) Fetch (FE)
Fetch is the step at which the instruction indicated by the PC is read
from memory and stored in the IR.
(2) Decode (DE)
Decode is the step at which the instruction stored in the IR is
decoded.
(3) Execute (EX)
Execute is the step at which the instruction is carried out,
depending on the results of the decoding stage.
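To make the loop concrete, here is a minimal Python sketch of the fetch-decode-execute cycle over a toy instruction set; the mnemonics (LOADI, ADD, PRINT, HALT) are invented for this illustration and are not from the lecture's reference machine.

```python
# Sketch: the fetch-decode-execute infinite loop over a toy program in memory.
memory = [
    ("LOADI", 0, 5),     # r0 <- 5
    ("LOADI", 1, 7),     # r1 <- 7
    ("ADD",   0, 1),     # r0 <- r0 + r1
    ("PRINT", 0, None),  # print r0
    ("HALT",  None, None),
]
regs = [0, 0]
pc = 0                        # program counter

while True:
    instr = memory[pc]        # Fetch: read the cell the PC points to (into IR)
    pc += 1                   # PC now holds the address of the next instruction
    op, a, b = instr          # Decode: split into opcode and operands
    if op == "LOADI":         # Execute: act on the decoded opcode
        regs[a] = b
    elif op == "ADD":
        regs[a] = regs[a] + regs[b]
    elif op == "PRINT":
        print(regs[a])        # -> 12
    elif op == "HALT":
        break
```

Note that the PC is incremented right after the fetch, mirroring how the control unit already holds the next address before execution completes.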
The von Neumann Bottleneck
What is a bottleneck?
A bottleneck is a situation that limits the overall performance of a given
product.
[Image: a bottle's neck]
The von Neumann Bottleneck
In von Neumann computers:
Single memory: both program and data are stored in the same memory and
are transferred over the same bus.
Instructions and data cannot be retrieved from memory at the same time;
this is the von Neumann bottleneck.
# TODAY
Topic
## Computer Classification
## Computer Evolution
## Semiconductor
## Computer Performance
Computer Classification
Microcontrollers
Personal computers
Workstations
Mainframes
Supercomputers
Computer Classification
Microcontrollers
Microcomputer / Microcontroller
Usage : embedded systems
Features : Miniaturization, energy saving, low power consumption.
CPU, memory and peripheral circuits are contained in the same IC.
Currently : also known as SoC (System on a Chip)
This is a computer!
Computer Classification
Personal computers
Usage : Office work, internet, gaming
Features : Low price, Versatile and scalable, Single architecture,
Large variety of peripherals, e.g., keyboard, mouse, display, sound
card, printer.
Currently : also used in research and industry
Computer Classification
Workstations
Usage : large calculations, e.g., research, industry
Features : Like PCs but with higher performance; high price.
Workstations nearly disappeared with the success of personal computers but
were revived by artificial intelligence workloads.
Computer Classification
Mainframes
Usage : Internet companies, banks, industry, education, offices, etc.
Features : Very high parallelism. One computer, many users (access
via terminals)
Computer Classification
Supercomputers
Usage : huge computations in industry and research.
Features :
>
Huge performance
>
Huge parallelism
>
Huge area
>
Huge power consumption
>
One computer, few users
Computer Evolution
The evolution of computers can be broadly categorized into four main eras:
Mechanical Era
Electro-Mechanical Era (Transition)
Electronic Era
Post-PC Era (Contemporary)
Mechanical Era
Timeframe: 1620s to the early 20th century.
Key Developments:
Mechanical Calculators : could perform basic arithmetic operations.
Charles Babbage's Designs : Difference Engine and the Analytical Engine.
Introducing concepts like programmability and mechanical computation.
Punched Card Systems : Tabulating and processing data
Electro-Mechanical Era (Transition)
Timeframe: Roughly from the late 19th century to the mid-20th century.
Key Developments:
Integration of Electromechanical Components :
such as relays and switches.
Harvard Mark I, which used electromechanical relays for computation.
Punched Card Systems Continued : Punched card systems continued to be
widely used for data input and processing, particularly in industries requiring
large-scale data manipulation.
Electronic Era
Timeframe: Starting in the 1940s until now.
Key Developments:
ENIAC : The Electronic Numerical Integrator and Computer (ENIAC),
completed in 1946, marked the beginning of the electronic era.
Transistors : In 1947, the invention of the transistor at Bell Labs revolutionized
electronics.
Transistors replaced bulky vacuum tubes, leading to smaller, faster, and more
reliable computers.
Post-PC Era (Contemporary)
Timeframe: From the late 20th century to the present.
Key Characteristics:
Proliferation of Mobile Devices : Smartphones and tablets have become
popular
Touch Interfaces and Intuitive User Experiences
Cloud Computing
Internet of Things (IoT)
AI and Voice Assistants
The Post-PC Era
Personal Mobile Device(PMD)
>
Battery operated, offering computing capabilities on the go.
>
Connects to the Internet
>
Smart phones, tablets, electronic glasses
>
Popular
Cloud computing
>
Software as a Service (SaaS)
>
Services and applications increasingly rely on cloud infrastructure for
storage, processing, and collaboration.
>
Reducing the dependence on local computing resources.
>
e.g., Amazon and Google
The Post-PC Era
Internet of Things (IoT)
>
Connected devices, from smart thermostats to fitness trackers, contribute
to the expanding ecosystem of computing beyond traditional PCs.
AI and Voice Assistants
>
Advances in artificial intelligence and natural language processing have
led to the integration of voice-controlled assistants
>
like Siri, Alexa, and Google Assistant into various devices, changing how
users interact with technology.
Touchscreen
Post PC device
simplifying interaction with technology
Supersedes keyboard and mouse
Resistive and Capacitive types
>
Most tablets, smart phones use capacitive
>
Capacitive allows multiple touches simultaneously
Technology Trends
Electronics technology continues to evolve
>
Increased capacity and performance
>
Reduced cost
| Technology | Year | Relative performance/cost |
|------------|------|---------------------------|
| Vacuum tube | 1951 | 1 |
| Transistor | 1965 | 35 |
| Integrated circuit (IC) | 1975 | 900 |
| Very large-scale IC (VLSI) | 1995 | 2,400,000 |
| Ultra large-scale IC | 2013 | 250,000,000,000 |

(The 2013 figure is based on DRAM capacity.)
# Now?
Technology Trends (Now)
Advanced IC (AI, 3D, RISC-V, Thermodynamic)
AI chips :
TPU (Tensor Processing Unit): Google
NPU (Neural Processing Unit): Apple, Huawei
Chiplets : Modular chip building blocks used in AMD and Intel architectures
3D-Stacked ICs : Vertical integration to boost speed and density
RISC-V : Open-source CPU architecture with growing ecosystem
Thermodynamic computing (experimental): Uses physics to power ultra-low-
energy AI tasks
Semiconductor Technology
The foundation of all modern electronics including smartphones,
computers, cars, TVs, and even smart fridges!
Adding materials (doping) transforms its properties:
Conductor (lets electricity flow)
Insulator (blocks electricity)
Switch (controls the flow: on/off)
A semiconductor is a material (like silicon) that can act as both a conductor
and an insulator depending on how it is used.
This makes it perfect for building transistors, the basic "on/off switches" in
all digital devices.
Manufacturing ICs
Yield: proportion of working dies per wafer
Example: Intel Core 10th Gen
300mm wafer, 506 chips, 10nm technology
Each chip is 11.4 x 10.7 mm
Example: Intel Core Ultra
Source: https://www.thailand.intel.com/content/www/th/th/content-details/842532/intel-core-ultra-processors-200hx-series-processors-quick-reference-guide-pdf.html
Integrated Circuit Cost
Nonlinear relation to area and defect rate
>
Wafer cost and Die area are fixed.
>
Defect rate determined by manufacturing process.
>
Die area determined by architecture and circuit design.
$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area}/2\right)^{2}}$$
Wafer cost is fixed (e.g., $5,000 per 300mm wafer)
Dies per wafer depends on chip size
Yield is how many chips work (e.g., 90%)
This shows how defect rate and chip size affect the yield.
If die area increases, yield drops quickly (nonlinear).
If defect rate is high, fewer good chips are produced.
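A minimal Python sketch of this cost model follows; the $5,000 wafer cost comes from the text, while the defect rate is an assumed illustrative value, and the die size echoes the 10th-gen example above.

```python
# Sketch: cost per die as a function of die area and defect rate.
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Gross dies: wafer area / die area (ignores edge losses)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area / die_area_mm2

def yield_rate(defects_per_mm2, die_area_mm2):
    """Yield = 1 / (1 + defects_per_area * die_area / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2) ** 2

def cost_per_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * yield_rate(defects_per_mm2, die_area_mm2))
    return wafer_cost / good_dies

# Die roughly the size of the 10th-gen example (11.4 mm x 10.7 mm),
# on a $5,000 300 mm wafer with an assumed 0.001 defects/mm^2:
small = cost_per_die(5000, 300, 11.4 * 10.7, 0.001)
big   = cost_per_die(5000, 300, 2 * 11.4 * 10.7, 0.001)  # double the area
print(f"small die: ${small:.2f}, doubled die: ${big:.2f}")
```

Doubling the die area more than doubles the cost per die (roughly $9.70 versus $21.70 here), because the larger die both reduces dies per wafer and lowers yield: the nonlinearity the text describes.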
Response Time and Throughput
Response time (Execution time)
>
The total time required for a computer to complete a task, including disk
access, memory access, I/O activities
Throughput
>
Total amount of work done per unit time
> e.g., the number of tasks/transactions per hour
How are response time and throughput affected by
>
Replacing the processor with a faster version?
>
Adding more processors?
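One way to reason about these two questions is a back-of-the-envelope model, sketched below in Python; the task time and counts are invented for illustration and assume independent tasks.

```python
# Sketch: how a faster processor vs. more processors affects
# response time and throughput (illustrative numbers).
task_time = 10.0                      # seconds per task on one processor

# Baseline: one processor.
resp = task_time
thru = 3600 / task_time               # tasks per hour

# (a) Replace the processor with a 2x faster one: each task finishes sooner.
resp_fast = task_time / 2             # response time improves
thru_fast = 3600 / resp_fast          # throughput improves too

# (b) Add a second processor: independent tasks run in parallel.
resp_multi = task_time                # a single task is no faster
thru_multi = 2 * 3600 / task_time     # but twice as many finish per hour

print(f"baseline:  response {resp:.0f} s, throughput {thru:.0f}/h")
print(f"2x faster: response {resp_fast:.0f} s, throughput {thru_fast:.0f}/h")
print(f"2 CPUs:    response {resp_multi:.0f} s, throughput {thru_multi:.0f}/h")
```

Under these assumptions, a faster processor improves both metrics, while adding processors improves throughput but leaves the response time of any single task unchanged.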