You might not realize it, but we are surrounded by CPUs, or processors, and the computing they do for us. They touch every aspect of our lives. CPUs are in your laptop, in the machines you use to check out at the grocery store, and in the electronics that power the instruments in your car. They enable our artists and scientists to create things that were unimaginable only yesterday. CPUs are everywhere and shape just about everything we do.
Welcome to Architecture All Access: CPU Architecture, Part 1. I've been fascinated with computers ever since I was a kid, when I was first captivated by those revolutionary green-screen computers at my elementary school. A lot has changed since then. Over the last 23 years I've had the privilege of working on some of the most well-known chip designs in Intel's history, from the architectural definition and design of the Pentium 4 to the Nehalem, Westmere, Haswell, Broadwell, and Tiger Lake processors, and many more. Today I help lead Intel's client engineering teams, where I oversee the development of current and future products.
Hi, my name is Boyd Phelps, and today I want to talk to you about CPUs, and more specifically about CPU architecture, its future, and the technologies involved in modern CPU design. As we do so, my hope is that you will gain not only a greater appreciation for what a CPU is, what it does, and how it does it, but that you'll also be inspired about the possibilities of what lies ahead in the decades to come. And who knows, maybe some of you will even have a hand in shaping that future. There is so much innovation ahead of us in what is one of the most amazing industries on Earth. We've divided the content into two modules.
The first one focuses on a brief history of CPU architecture, the concept of computing abstraction layers, and the instruction set architecture. Module 2 will cover the building blocks of CPUs, broken down into what we call the front end and the back end, as well as a primer on caching. Think of Module 1 as the architecture module, where we describe things at a high level, and Module 2 as the microarchitecture module, where we begin to click down a bit into the details. We've got a lot to cover, so let's get going.
So what is a CPU? Well, CPU stands for Central Processing Unit, and it's often called the brain of the computer. The CPU sits at the center of everything in the computer and handles all of the computation needed to turn inputs from memory, like a photo on your hard drive, into outputs on your peripherals, like an image on your monitor. The CPU is a general-purpose, flexible architecture that takes in a stream of instructions from all types of workloads and computes, or processes, information based on those instructions. Simply put, CPUs do what we tell them, or program them, to do.
CPUs are what run your operating system, the web browser that you're using right now to watch this video, your favorite photo or video editing application, your productivity software, conferencing applications, and much, much more. With the advances in silicon technology over time, and our ability to keep miniaturizing transistors and making them more efficient, we've been able to pull more and more functionality onto the same piece of silicon that contains the CPU. This ability to continue shrinking transistors is based on a famous law, or observation, that we in the industry refer to as Moore's Law. Moore's Law is based on the observed trend that we can double the number of transistors per unit area about every two years.
This simple law drives tremendous innovation, making what was once thought to be impossible not only possible, but also cheaper over time. Several decades ago, computers were a luxury of large institutions, whereas today computers, in some form or fashion, touch the lives of nearly everyone on Earth.
When we look at a computer motherboard from 40 years ago, like this one, what we see is numerous expansion card slots with different functionalities: I/O devices, memory, printer drivers, graphics cards, display drivers. And what's really impressive is that when you take out one of these cards, you can look at the number of discrete components on it.
You can probably count them; there are well over 20 or 30 discrete components on this one card. What's impressive is that all of the functionality that existed in a PC of this size and complexity now fits, essentially, in a microprocessor of this size. Yesterday's supercomputer that once filled a large room now easily fits on a chip inside your laptop, and we've only begun to scratch the surface of what is possible. The CPU, or brain, continues to be the center of it all.
While there are other things like memory and I/O controllers, display, media, and graphics engines, and other components in your system-on-chip processor, in this class we'll focus on the brain, the central processing unit, which continues to be what we call the spark of life in modern computer architecture. Since the focus of this class is on how an individual CPU core works, when we say CPU or processor, we'll be referring to an individual core.
But before we jump into modern CPU design, let's go back to where it all started. At the birth of digital computing, computers were fragile, slow, and large. And I mean gigantic.
ENIAC, the first general-purpose digital computer from 1946, covered 1,800 square feet, about the size of a modern home, and weighed about 30 tons. ENIAC, which stood for Electronic Numerical Integrator and Computer, and other early computers were built using vacuum tube technology, which made them huge and unreliable. These were program-controlled computers, meaning that an operator programmed the computer with a set of switches and wires for each new calculation.
These early computers were technically general purpose, but programming them was complicated and error-prone, often taking weeks. Even so, they were on the cutting edge of technology in the late 1940s. As an aside, speaking of error-prone programming, the term bug had been used as part of engineering jargon long before ENIAC. But it was used in an account by the famous computer pioneer Grace Hopper in 1947 in an interesting way.
While working on the Mark II and Mark III computers, operators traced an error on the Mark II to a moth trapped in a relay, giving the term bug a very literal new meaning. The bug was carefully removed and taped into the logbook with the caption, "First actual case of a bug being found." Fun fact: the silicon die of a modern microprocessor is smaller than that bug.
On a more serious note, the importance of ease of programmability, along with an abstracted computer model that makes computers simpler to program and use, is paramount in CPU design. In the late 1940s, the prominent mathematician John von Neumann popularized a new kind of computer architecture that simplified computer design and programming. His idea for a stored-program computer reimagined the general-purpose computer as three separate systems.
A bank of memory for storing data and instructions, a central processing unit for decoding and executing instructions, and a set of input and output interfaces. This von Neumann architecture separated the units for processing information, the CPU, from the units that stored information, the memory, and allowed data and instructions to be stored and addressed in memory in the same way. It also introduced the four-step instruction cycle: fetching instructions from memory, decoding the instructions, executing the instructions, and storing the results back in memory. This architecture was so revolutionary that modern computers are still based on these basic principles today. In the 1950s, semiconductor-based transistors started to hit the market, replacing the larger, more unreliable mechanical and vacuum-tube-based technologies.
This brought smaller and faster circuits to the electronics industry. Then in 1959 Robert Noyce patented the first monolithic integrated circuit which combined multiple transistors on a single silicon chip. These early chips didn't have many transistors, but they enabled much smaller and more complex circuit designs.
By 1968, several companies were making integrated circuits, and Robert Noyce teamed up with his colleague Gordon Moore to get into the game, founding Intel, which was short for Integrated Electronics. Intel quickly found a niche making memory chips, but in 1971 it unveiled the Intel 4004 single-chip microprocessor for the calculator market. With a whopping 2,300 transistors and a blazing-fast clock speed of 740 kilohertz, it was the first general-purpose programmable processor and packed all of the computational power of the ENIAC into one tiny device. Engineers could now purchase a single component that could be customized with software for a variety of functions.
In 1972, Intel released the first 8-bit microprocessor, the Intel 8008. But its successor, released in 1974 and known as the Intel 8080, was a real breakthrough. The Intel 8080 was a giant leap in CPU design and became one of the world's most widespread microprocessors.
And when the 16-bit 8086 was released in 1978, the world would never be the same. The Intel 8086, with its x86 instruction set architecture, became the foundation for the modern CPU designs that we still use today. Here you can see a die picture of the 4004 microprocessor.
And here is the 11th Gen Intel Core processor, codenamed Tiger Lake. You might look at these two die pictures side by side and think that the difference in physical size is not that dramatic. But consider this: the 4004 CPU had 2,300 transistors, while the Tiger Lake processor packs in billions and billions of transistors. In fact, I would have to have about 4 to 5 million of these 4004 chips to have the same number of transistors as Tiger Lake.
What's even more remarkable is that the 4004 took a team of engineers to design, whereas today, due to the power of computing abstraction, a single engineer handles millions of transistors. There are a couple of concepts that are key to understand with regard to computer architecture.
First is the notion and use of the binary system, and second is the concept of computing abstraction layers I was just mentioning. Digital computers use a number system based on zeros and ones rather than the decimal system that most people are familiar with. For computers to function, it's critical that we represent all control, data, and numbers as simple on or off states.
It turns out that the characters and numbers humans use to communicate can be easily translated into a binary-based system of numbers, or encodings, like the popular ASCII representation for the characters of the alphabet. There is great power in what the binary system allows us to do. By representing data as a series of on or off states in a computer or its memory, we simplify things and create a foundation that lets computer engineers do much more complicated tasks. The second concept is that of computing abstraction layers: how you can start with very simple things like atoms and transistors, and add abstraction layer upon abstraction layer, to build up to complex applications that run in large data centers. In general, the abstraction layers that make up computing are well understood.
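To make that encoding idea concrete, here's a tiny Python sketch — purely an illustration, not anything from a real design — showing how the ASCII encoding turns the characters in a word into patterns of on/off bits:

```python
# A small sketch of the binary idea: every character we type already has an
# agreed-upon ASCII code, and that code is just a pattern of on/off bits.
message = "CPU"

for ch in message:
    code = ord(ch)              # ASCII code point, e.g. 'C' -> 67
    bits = format(code, "08b")  # the same value written as eight on/off states
    print(f"{ch} -> {code:3d} -> {bits}")

# Output:
# C ->  67 -> 01000011
# P ->  80 -> 01010000
# U ->  85 -> 01010101
```

Everything in the layers that follow is built on exactly this kind of on/off representation.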
And building from one layer to the next allows us to build complex computing structures from the bottom up that would be almost impossible to visualize from the top down. Allow me to explain. At the foundation of the computing abstraction layers, you have atoms, which get put together into materials like silicon, from which we build tiny transistors. These transistors act as switches that turn on or off with the application of an electrical current or voltage signal.
By connecting switches together in specific arrangements, we can form what we call the fundamental Boolean logic operators for performing calculations: AND, OR, and NOT. For example, one switch by itself can convert a 1 on its input to a 0 on its output; it is called a NOT, or inverting, operator. Two switches arranged in series form an AND operator, and if we wire switches in parallel with each other, we get the OR operator at the output.
These arrangements of switches are known as logic gates. So we can now organize, or abstract, ones and zeros into a language of logic that is more efficient to understand than the language of physics and the flow of electrons. Using transistors as switches and connecting the output of one to the input of another, we can build a variety of logic circuits, or functional blocks.
These functional blocks can take the form of adders, multiplexers, decoders, latches, flip-flops, registers, counters. The list goes on and on. The power of abstraction is incredible.
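As a rough illustration of that step from gates to functional blocks — a minimal sketch in Python, not how gates are actually built in silicon — we can model AND, OR, and NOT as simple functions on 0/1 values and wire them together into a half adder, one of the simplest arithmetic building blocks:

```python
# Boolean operators modeled as tiny functions on 0/1 values.
def NOT(a):    return 1 - a
def AND(a, b): return a & b
def OR(a, b):  return a | b

# XOR composed entirely from the three basic operators.
def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))

# A half adder: a small "functional block" built from gates.
def half_adder(a, b):
    return XOR(a, b), AND(a, b)   # (sum bit, carry bit)

for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} + {b} -> sum={s} carry={c}")
```

The half adder's sum and carry outputs come entirely from combinations of the three basic operators.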
And as you might expect, chaining functional blocks together allows for even more complex logical functions. With them, we can build custom execution units that perform specific calculations. For example, one of the most important execution units in a CPU is the arithmetic logic unit, or ALU.
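Here's a toy ALU sketch in the same spirit — again just an illustration with a made-up set of operation names, not how a real Intel ALU is implemented — that selects one arithmetic or logical operation based on a small operation code:

```python
# A toy ALU: selects one arithmetic or logical operation based on an op code.
# A real ALU computes these in parallel hardware, but the idea is the same.
def alu(op, a, b, width=8):
    mask = (1 << width) - 1           # keep results inside an 8-bit register
    if op == "ADD":
        return (a + b) & mask
    if op == "SUB":
        return (a - b) & mask
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "NOT":
        return (~a) & mask
    raise ValueError(f"unsupported operation: {op}")

print(alu("ADD", 200, 100))         # 44, because 300 wraps around in 8 bits
print(alu("AND", 0b1100, 0b1010))   # 8 (0b1000)
```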
Designing a whole CPU comes down to building multiple specialized processing elements and connecting them together in ways that allow complex computations to be done. The combination of those processing elements into a design that can fetch instructions from memory, decode the instructions, execute those instructions, and store the results back in memory is what we call a microarchitecture, or the instantiation of an architecture implemented in hardware. John von Neumann would be proud. So how do we get from hardware to software? With yet another abstraction layer, of course. The Instruction Set Architecture, or ISA, is a set of instructions that defines what kinds of operations can be performed in hardware.
It is nothing more than the language of the computer. Much like English or Spanish, most languages have dictionaries that describe the words, the format, the grammatical syntax, and the meanings to those who communicate in them. An ISA is an abstract model of the computer, sometimes also referred to as the architecture or computer architecture. The ISA describes the memory model, supported data types, registers, and the behavior of machine code.
Machine code is the sequence of zeros and ones that the CPU must execute. You may have heard of several types of ISAs, like x86, ARM, or MIPS.
The ISA acts as a sort of bridge between software and hardware. On the software side, a compiler uses the ISA to transform code written in a high-level language like C, Perl, or Java into machine code instructions, the language that the CPU can process. The ISA is the dictionary of the instructions, data types, and formats that a CPU adhering to that ISA must execute. This means the application programmer can usually ignore the ISA and focus on what the programming language provides. On the hardware side, when a CPU microarchitecture is being designed, the ISA is used as a design spec that tells the engineer what operations it needs to execute.
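To illustrate the dictionary idea, here's a small sketch with a made-up three-instruction ISA — the names and opcode values are invented purely for illustration — and a "compiler" step that translates a line of pseudo high-level code into instructions from that dictionary:

```python
# A made-up three-instruction "ISA": the shared dictionary between software
# and hardware. The names and opcode values here are purely illustrative.
ISA = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}

# The "compiler" side: translate a line of pseudo high-level code,
#   c = a + b
# into instructions drawn from that dictionary.
program = [
    ("LOAD",  "a"),   # bring a from memory into a register
    ("ADD",   "b"),   # add b from memory to that register
    ("STORE", "c"),   # write the result back to memory
]

machine_code = [(ISA[op], operand) for op, operand in program]
print(machine_code)   # [(1, 'a'), (2, 'b'), (3, 'c')]
```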
Because of this layer of abstraction, the instructions in the ISA are implementation independent. This means that even if a different company creates different microarchitecture designs, they can all run the same code based on the same ISA. Computer architects continue to evolve ISAs through extensions to the instruction set, much like new words are often added to dictionaries.
These additional instructions are often created to perform certain operations more efficiently, leveraging new processing elements in the microarchitecture. These ISA extensions can increase a CPU's performance by streamlining operations for a particular arrangement of processing elements. Modern CPUs support thousands of different instructions, but most fall into a few categories: arithmetic operations like addition, subtraction, and multiplication; logical operations like AND, OR, and NOT; memory operations like loading, storing, and moving; and flow control, like branching.
We'll explain what branching is later. In general, instructions consist of an opcode, which is the operation to be performed, like ADD, and a number of operands, the data to be operated on, as in add A in register X to B in memory location Y. ISA instructions often also include additional bits of data that give the CPU more information relevant to the operation, which the CPU uses to decode and execute the instruction in an efficient way according to its microarchitecture.
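Here's one more toy sketch of that structure — the field widths and encoding are invented for illustration, not taken from any real ISA — showing an instruction word with an opcode field and two operand fields, and the decode step that pulls them apart:

```python
# A toy instruction format (fields and widths invented for illustration):
# bits 15..12 = opcode, bits 11..6 = register operand, bits 5..0 = memory address.
OPCODES = {0x2: "ADD"}

def decode(instruction):
    opcode  = (instruction >> 12) & 0xF
    reg     = (instruction >> 6)  & 0x3F
    address = instruction         & 0x3F
    return OPCODES[opcode], reg, address

# "ADD the value in register 5 to the value at memory location 9"
word = (0x2 << 12) | (5 << 6) | 9
print(decode(word))   # ('ADD', 5, 9)
```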
Of course, this is a highly simplified description of instruction set architectures, and it's important to realize that modern ISAs are much more complex than we could hope to explain in this short video. ISAs are one of the most critical parts of modern CPU design, as they are the linchpin between software and hardware that allows efficient, high-performance computation and seamless software experiences across a variety of CPU microarchitectures. In Part 2 of this series, we'll go into the details of the microarchitecture, the specifics of the instruction cycle, and the different parts and functions of the front end and back end of a CPU, as well as a primer on caching.
Stay tuned.