Understanding Operating Systems and the Internet

Welcome back to McMaster University course Computer Science 1JC3 Introduction to Computational Thinking. We are working on the topic today of operating systems and the internet. We're going to start by talking about files.

A file is simply a finite sequence of data. It's possibly a gigantic sequence of data, and it's stored on a persistent data storage device of some kind, that is, a device where we can store this data for an indefinite amount of time.

The data is usually characters or bits. By bits, I just mean values that you can think of as true or false, yes or no, zero or one, and so forth. Now, there are many kinds of files. They usually break down into text files, which are either ASCII or Unicode files, and binary files. File names have extensions, like we see here: .exe, .txt, .dlc, and so forth. These file extensions indicate what kind of file we're talking about, but they could be incorrect, and they're not actually needed.

I could have a file which is, let's say, holding text, and instead of being a .txt file, it just has no extension at all. Files are important because they allow us to share data between programs, users, and storage devices. They are the fundamental way of sharing data.

Now, files can be very large. They can be so large that they can't be placed in a computer's memory all at one time. You can only put part of the file in at a time. An example of this would be a video file.

Now, programs use files for both input and output. The principal way a program gets a large amount of input or produces a large amount of output is with files. Files are really the most effective way of handling input and output when we have large amounts of it.
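
The point about files that are too large for memory can be made concrete. What follows is a minimal Python sketch, not anything shown in the lecture: it reads a file one chunk at a time, so only a small part of the file is ever in memory. The chunk size, the function names, and the throwaway demo file are all assumptions made for illustration.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # assumed chunk size: 64 KiB per read


def count_bytes_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Count the bytes in a file while holding at most one chunk in memory."""
    total = 0
    with open(path, "rb") as f:  # "rb" reads the raw bytes, with no text decoding
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            total += len(chunk)
    return total


def make_demo_file(size):
    """Create a throwaway file of `size` zero bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


demo_path = make_demo_file(200_000)
demo_total = count_bytes_in_chunks(demo_path)
os.remove(demo_path)  # clean up the throwaway file
print(demo_total)  # 200000
```

A video player works in roughly the same spirit: it processes the part of the file it currently needs rather than loading the whole file first.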

Computers have file systems that are set up by the operating system. A file system consists of a tree, and the tree has files and directories.

A directory is basically something that can hold files or more directories. By the way, directories are often called folders. So, for instance, you can think of this as a directory here. Inside the directory, we could hold other directories, and we could also have files in here, and files in here, or a whole lot more directories. That's basically the way the file system works.

Files and directories are accessed using path names. The top-level directory is called root, and in UNIX its path name is a single slash. This represents root, and if we had a directory in root called a, we would refer to it like that. We would call it ADirect. And then we could also have one called b. We'd represent that like this. And if we were going to represent a directory in here, this might be c dev. And then we might have a file, which I'll just call f.

So this would be a path name to f. This is what's called a full path name. And I could also just have something like this.
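
The difference between a full path name and a relative one can be sketched with Python's standard `pathlib` module. The board drawing is not part of the transcript, so the path `/a/c/f` and the current directory `/a` below are assumed for illustration, not the lecturer's exact example.

```python
from pathlib import PurePosixPath

# Assumed example tree: root "/" holds directory a, which holds c, which holds file f.
full = PurePosixPath("/a/c/f")      # a full path name starts at root
current_dir = PurePosixPath("/a")   # assumed current directory

# A relative path name describes the same file starting from the current directory.
relative = full.relative_to(current_dir)

# Joining the current directory and the relative name gives back the full name.
rebuilt = current_dir / relative

print(full.is_absolute())      # True
print(relative.is_absolute())  # False
print(relative)                # c/f
print(rebuilt == full)         # True
```

`PurePosixPath` only manipulates names, so this runs on any system without touching a real file system.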

If I'm in the directory ADEV, this would be a relative path name. Files and directories can be created, deleted, moved from one part of your file tree to another, copied, renamed, and modified in various ways.

Okay, so now let's move to processes. A process is simply an application program that's executing on a computer. As you know, a program is just a big piece of syntax. If we have this program and we're actually executing it, it becomes a process. And of course we could have several processes that are executing the same program.

Processes run on one of the computer's central processing units. These are called CPUs. Computers now are multi-core: they have multiple CPUs, and each of these cores could be executing a program. So if we had, let's say, just eight cores, we could execute eight programs. But actually, we can execute more than that using timesharing.

The way timesharing works is that a process is given a piece of time on a CPU. This is called a time slice. During this time slice it's allowed to execute, and when the time slice is done it is suspended. It's taken off, and then it waits around until it has access again to the CPU or to another CPU. This step of stopping a process and suspending it so that another can run is called a process context switch. So basically, a process is put on the CPU, executed for a very short period of time, and taken off, and then another process is put on the CPU, executed for a very short time, and taken off. And this is done over and over again.
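
The timesharing loop just described can be sketched as a small simulation. This is a toy round-robin model under stated assumptions, not how any particular operating system schedules: the process names, the amounts of work, and the time slice of 2 units are made up for illustration.

```python
from collections import deque


def round_robin(work, time_slice):
    """Simulate timesharing on one CPU.

    work: dict mapping a process name to the time units it still needs.
    Returns the list of (name, units_run) slices in execution order.
    """
    ready = deque(work.items())  # processes waiting for the CPU
    timeline = []
    while ready:
        name, remaining = ready.popleft()  # put a process on the CPU
        ran = min(time_slice, remaining)   # it runs for at most one time slice
        timeline.append((name, ran))
        remaining -= ran
        if remaining > 0:
            ready.append((name, remaining))  # context switch: back of the queue
        # otherwise the process has finished and is not queued again
    return timeline


schedule = round_robin({"A": 5, "B": 2, "C": 4}, time_slice=2)
print(schedule)
# [('A', 2), ('B', 2), ('C', 2), ('A', 2), ('C', 2), ('A', 1)]
```

Each tuple in the output is one time slice, and each move from one tuple to the next stands for one context switch.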

Processes can have a certain status. There are different ways of doing this, but we can think of a process as having five different statuses. It can be executing. This means it's on the CPU and it's executing. It can be blocked. This means it's waiting for some event to occur. It's not executing, and it will stay in this blocked state as long as the event has not happened yet. If it's in the ready state, the process is all ready to be executed, and it's just waiting to have control of the CPU. The final state is finished: the process has finished execution, or it's been permanently stopped.

Now, each process is given a virtual address space. It's given this big address space of memory that it can access. This memory holds the program code and its data. The reason for having a virtual address space is that the process could be running on the CPU or it could not be.

If it's not running on the CPU, its virtual address space is going to be mapped to secondary storage, which would be disk storage. So when it's not running, its data and its program are in disk storage, and the virtual address space is mapped to that. When it's running, the process's code and data will be in main memory, in RAM, and the virtual address space will be mapped to that. This way the program only worries about its virtual address space. The operating system worries about how the virtual address space is mapped to where the program's code and data actually are.
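
One way to picture this mapping is a lookup table kept by the operating system. The sketch below is a simplified model, and it assumes details the lecture does not give: the address space is divided into fixed-size pages, and the page size, frame numbers, and disk block numbers are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

# Hypothetical page table for one process: virtual page number -> (where, index)
page_table = {
    0: ("ram", 7),    # virtual page 0 currently sits in RAM frame 7
    1: ("disk", 42),  # virtual page 1 currently sits in disk block 42
    2: ("ram", 3),    # virtual page 2 currently sits in RAM frame 3
}


def translate(virtual_address):
    """Map a virtual address to (storage, index, offset) via the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise ValueError(f"address {virtual_address} is outside the address space")
    where, index = page_table[page]
    return where, index, offset


def bring_into_ram(page, free_frame):
    """Model the operating system loading a page from disk into a RAM frame."""
    page_table[page] = ("ram", free_frame)


print(translate(100))            # ('ram', 7, 100)
print(translate(PAGE_SIZE + 5))  # ('disk', 42, 5)
bring_into_ram(1, free_frame=9)
print(translate(PAGE_SIZE + 5))  # ('ram', 9, 5)
```

The process keeps using the same virtual address, `PAGE_SIZE + 5`, before and after the move. Only the table maintained by the operating system changes.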

And the interesting thing is that the virtual address space does not have to fit entirely into RAM, only the part that is needed by the program that's running at the moment.

Okay, so I have a question about program execution: programs can be executed in parallel on a computer with one CPU. It wasn't so long ago that computers usually had just one CPU. So the question is, is this statement true? Programs can be executed in parallel on a computer with one CPU. I'll give you a moment to think about that.

Okay, thank you for coming back. Well, the answer is really A and it's B. It depends on your perspective.

If you have one CPU, it can only execute one program, one process, at a time. But when it executes this process, it executes it for an extremely short period of time, let's say milliseconds. Then it switches it out, puts in a new program, and executes that program for a very short period of time. So it looks to the user like a whole bunch of programs are being executed in parallel. In a single second, many programs are being executed. So from the user's point of view, the CPU is executing programs in parallel. From the strict physical point of view, it's executing only one program at a time.

Okay, so now we're going to move on to talking about the internet. The first thing to discuss is physical networks. A physical network is a set of computers that exchange digital information with each other using a physical medium. There can be many different kinds of physical media, and these computers are connected to the physical medium via a network interface.

I like to draw this like this: this is the medium, these are the computers, and these are the interfaces. That's how I like to draw a physical network. Now, there are many, many kinds of physical networks based on different technologies and different communication protocols. A communication protocol would be the rules for how the computers communicate with each other using the physical network.

We can have wired and wireless physical networks. We can have connection-oriented networks. That would be like a telephone system, where you create a connection between the caller and the receiver. And then there's connectionless. This is sort of like our physical mail system: you drop a letter in the mailbox and it basically bounces around until it gets to where it's delivered. We can have local area networks, which are networks that work just for a building, or wide area networks, which work across, let's say, part of a province.

And then we have different topologies. The topology is simply how the network looks. A bus topology looks like this: we have a medium that we connect to, and it's basically just a line. A ring looks like this, where information goes around and around this way. In a star, we have some components in the middle, let's say switches, and on the outside we have our computers. And the simplest case is point-to-point: we have two computers and they're connected like this. You can do this at home. If you have two computers and they each have an Ethernet card, you can just put in, let's say, an Ethernet crossover cable, and that will create a point-to-point physical network.

Okay, so there are many different kinds of physical network technologies. Here are some examples: Ethernet; Wi-Fi; ATM, which, by the way, has nothing to do with the ATM machines you find in banks; and Fiber Distributed Data Interface, FDDI, which uses optical fiber. So these are physical networks.
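
The four topologies described earlier (bus, ring, star, and point-to-point) can be written down as sets of links. This is only a rough sketch: the host names and the helper nodes `"bus"` and `"switch"` are made up, and real networks involve much more than a list of links.

```python
def bus(hosts):
    """Every host attaches to one shared medium, modelled as a node named 'bus'."""
    return {(h, "bus") for h in hosts}


def ring(hosts):
    """Each host links to the next one, and the last links back to the first."""
    return {(hosts[i], hosts[(i + 1) % len(hosts)]) for i in range(len(hosts))}


def star(hosts):
    """Every host links to one central component, modelled as a node named 'switch'."""
    return {(h, "switch") for h in hosts}


def point_to_point(a, b):
    """Two computers joined directly by a single link."""
    return {(a, b)}


hosts = ["h1", "h2", "h3", "h4"]  # hypothetical host names
print(sorted(ring(hosts)))
# [('h1', 'h2'), ('h2', 'h3'), ('h3', 'h4'), ('h4', 'h1')]
print(len(bus(hosts)), len(star(hosts)), len(point_to_point("h1", "h2")))
# 4 4 1
```

In this model a bus and a star have the same shape. The difference is what sits in the middle: a passive shared line in the bus, and an active component such as a switch in the star.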

And so now we get to the subject of an internet. If we go back to about the 60s and 70s, a lot of these physical networks were created as people connected up computers. How do we get computers on different physical networks to communicate with each other? This is a problem. How do we perform communication across all these physical networks? And remember, these physical networks can have very different network technologies.

The solution is pretty simple. We build a universal virtual network on top of the physical networks. So we have these physical networks with different technologies, and we build on top of them a universal virtual network. This is what we want: a universal virtual network.

An internet is a virtual network that's based on two things: an internet architecture and a set of protocols for communicating over this architecture, which is called the TCP/IP internet protocol suite. In this suite there are many protocols, but the two most important are TCP and IP. What we call the Internet today, the global Internet, is an internet of this form, and it serves as a universal virtual network.

Okay, so what is this internet architecture?

The architecture is pretty simple. Remember, I said I like to draw physical networks like this, where the circle can be any communication medium. We have a bunch of these, thousands and thousands of them, and the question is how we build a virtual network. What's going to be our architecture? The idea is that we have this set of physical networks, and then we have routers that connect the networks to each other. So what is a router? A router is just another computer which happens to be on more than one of these. So this is a router, and this is a router. By this definition, all a router is is a computer that's on more than one physical network, and we can use that computer to connect these physical networks.

So now, if you think about what we have here, we can think of this as a graph, a bipartite graph, with two kinds of nodes: hosts and physical networks. These are going to be the hosts, and these are the physical networks.

And these are the edges. So this is a bipartite graph. All a bipartite graph is is a graph with two kinds of nodes, where every edge joins a node of one kind to a node of the other kind. If I drew it as a bipartite graph up here, we'd have two kinds of nodes, and we'd have edges like this. The circles are one kind of node; they represent the physical networks. The squares are another kind of node; they represent computers. And the edges are the network interfaces. So it's convenient to think of it as a bipartite graph. And so when information is going across the internet, it's going from host, physical...
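
The bipartite picture, with computers as one kind of node, physical networks as the other, and network interfaces as the edges, can be sketched in code. The host, router, and network names below are hypothetical, and the breadth-first search only shows that a path alternates between computers and networks. It is not a model of how real routing protocols choose routes.

```python
from collections import deque

# Hypothetical internet: each computer lists the physical networks it is on.
interfaces = {
    "host1": ["net1"],
    "router1": ["net1", "net2"],  # a router is a computer on more than one network
    "router2": ["net2", "net3"],
    "host2": ["net3"],
}


def build_graph(interfaces):
    """Build the bipartite graph: every edge joins a computer and a network."""
    graph = {}
    for computer, networks in interfaces.items():
        for net in networks:
            graph.setdefault(computer, set()).add(net)
            graph.setdefault(net, set()).add(computer)
    return graph


def find_path(graph, source, target):
    """Breadth-first search for a shortest path from source to target."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no path exists


graph = build_graph(interfaces)
path = find_path(graph, "host1", "host2")
print(" -> ".join(path))
# host1 -> net1 -> router1 -> net2 -> router2 -> net3 -> host2
```

The printed path alternates computer, network, computer, network, which is exactly what the bipartite structure forces.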

The internet has two layers. There are the physical networks, which are heterogeneous in the sense that they can work very differently: they use different communication protocols and they have different technology. And then there's a homogeneous virtual network that's implemented on top. It's homogeneous because the whole network is using the same communication protocols. These two layers have two different layers of addresses.

We have physical addresses. These are attached to the network technology. For instance, if we had an Ethernet physical network and this was a network interface right here, this network interface is physically going to be the Ethernet card in your computer. This network interface will have a physical address, and that's at the physical network layer. At the virtual network layer, it will have what's called an IP address, an Internet Protocol address. And so one of the problems with a network is that we have to match up the physical addresses with the virtual addresses.
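
The matching problem can be pictured as a lookup table on one physical network. The sketch below models only the table itself, not any protocol for filling it in, and the addresses are illustrative values taken from ranges reserved for documentation.

```python
# Hypothetical table for one Ethernet network: IP address -> physical address.
address_table = {
    "192.0.2.10": "00:00:5e:00:53:0a",
    "192.0.2.11": "00:00:5e:00:53:0b",
}


def physical_address_for(ip_address):
    """Return the physical address recorded for an IP address, or None if unknown."""
    return address_table.get(ip_address)


print(physical_address_for("192.0.2.10"))  # 00:00:5e:00:53:0a
print(physical_address_for("192.0.2.99"))  # None
```

A result of `None` stands for the case where the physical address for an IP address is not yet known and would have to be discovered somehow.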

Okay, so we're going to stop here and we're going to continue next time looking at the protocols that are used to make the Internet work.
