Transcript for:
Fundamentals of Operating Systems

In this lecture, I'm going to introduce the fundamentals of operating systems: what they are, what they do, and why they're important. What's an operating system? It's a layer of software that provides two important services to a computer system: abstraction and arbitration. Abstraction means hiding the details of different hardware configurations, so that each application doesn't have to be tailored for each possible device that might be present on a system. Arbitration means that the operating system manages access to shared hardware resources, so that multiple applications can run on the same hardware at the same time without interfering with one another. The hardware resources that need to be managed include the CPU, the hierarchy of memory, all the input/output devices on the system, and to some degree the power and system management features of the system. Many of these features are handled directly by the hardware, but the operating system is involved, particularly in energy conservation. The abstraction features of the operating system allow hardware devices made by different manufacturers to present the same interface in software for applications to use. These hardware devices all have different low-level instruction sets, and each has particular capabilities, features, and details unique to it. If we didn't have a common interface to these devices, first of all, our variety of hardware might be limited — but worse, every application on the system would have to be programmed to use every single device on the system. As an example, back in the 1990s, computer games often required internal programming for specific video cards and specific sound cards. It was often necessary to go into the settings for each game and tell it what type of video card and sound card you had, for the purpose of configuring the game to use that particular hardware.
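The common-interface idea described above can be sketched in a few lines of Python. This is my own simplified illustration, not real driver code: the application codes against one abstract interface, while hypothetical per-vendor "drivers" hide the device-specific details.

```python
from abc import ABC, abstractmethod

class SoundDevice(ABC):
    """The common interface the OS exposes; applications code against this."""
    @abstractmethod
    def play(self, samples: bytes) -> str: ...

class VendorACard(SoundDevice):
    def play(self, samples: bytes) -> str:
        # Vendor-specific details stay hidden inside the driver.
        return f"VendorA DSP played {len(samples)} bytes"

class VendorBCard(SoundDevice):
    def play(self, samples: bytes) -> str:
        return f"VendorB chip played {len(samples)} bytes"

def application(dev: SoundDevice) -> str:
    # The application never cares which card is installed.
    return dev.play(b"\x00\x01" * 4)
```

The 1990s games described above effectively had to implement a `VendorACard`-style class inside every game; the operating system moves that code into one shared driver layer.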
Imagine if that had to be done for every single application on the system, including, say, a calculator or a web browser. That would be a very untenable situation in terms of being able to use computers the way we use them today. Also, what if we could only run one program at a time on a computer system? Years and years ago, that was the case. A modern system, however, runs multiple applications simultaneously, and it's up to the operating system to ensure that all these applications can access resources: each CPU is divided among the different programs, and each program gets access to memory and input/output, as well as disk. In an ideal world, the operating system also enforces policies that isolate applications from each other, so that a crash in one application doesn't take down the entire system or other applications.

Now, in the following examples, do we have abstraction or arbitration? The first example, supporting both Intel and AMD processors, is an example of abstraction: we don't have to write separate software for an Intel processor versus an AMD processor — at least for 99.99% of applications, we simply write the application once and it will run on either processor. Switching between applications is an example of arbitrating hardware resources among the different applications on the system. Separating the memory allocated to different applications is also an arbitration activity: it keeps one application from overwriting the contents of memory being used by another application. Enabling video conferencing software to use different camera devices is an example of abstraction: the video conferencing program just has to know how to use the camera interface that the operating system provides, and then different cameras from different manufacturers can be used without the video conferencing program having to be written to talk to each individual camera. Similarly, when accessing two hard disks from two different manufacturers, any underlying differences in detail between those drives can be handled by using the operating system to abstract away the details. Sending and receiving messages over a network involves both abstraction and arbitration: on the one hand, we're abstracting away the details of each particular network card to be able to send and receive the message; on the other hand, we're sharing that network card among all the different applications on the system.

We can think of the system in terms of layers. At the bottom of the layer cake we have the hardware; this is what executes the software. In the top three layers, the operating system is the middleman, so to speak, between the underlying hardware and the applications, along with the libraries and utilities upon which the applications depend. Specifically, the core of the operating system, or the kernel — so named after, say, a kernel of corn — is the minimum piece of software that's needed to share the hardware between the different applications. Whatever software lives outside the kernel is said to be in user space: that means it's application code, or sometimes even parts of the operating system that are not strictly necessary to share the hardware or abstract away its details. Operating systems implement a common mechanism for allowing applications to access the hardware, which is abstraction, and applications make requests of the operating system — to access the hardware or other features — by making system calls down into the operating system. This is called entering the kernel from the top half. Operating systems are also able to alert applications that hardware state has changed: perhaps a network packet has come in, or perhaps the user has pressed a key on the keyboard. These alerts are delivered via interrupts, which enter the kernel from the bottom half — the hardware side — of the operating system. We should also note that operating systems can manage and terminate applications by sending signals to those applications.

Now, there is a wide variety of operating systems out there, and they basically fall into two categories: Microsoft Windows, which is not Unix, and everything else, which is Unix-like. Microsoft Windows systems — the non-Unix systems — are the most popular desktop operating systems at the moment; however, they're rapidly being eclipsed by tablet devices, most of which run Unix-style systems. Windows is typically pre-installed by PC manufacturers, which accounts for a good portion of its present popularity. Unix systems, on the other hand, are installed by the end user, except for the commercial Unix systems and Mac OS X. Mac OS X is probably the best-selling Unix system to date, and Linux, which includes the Android platform, is probably second. There are some mainframe systems out there also, some of which are Unix-like, while some use custom operating systems. There are a variety of players in the embedded systems market; however, it's presently dominated by Android, with Apple's iOS, which is based on Mac OS X, in close second. Others — Symbian, BlackBerry OS, TinyOS — are confined to smaller markets.

The idea behind the Unix systems began with a time-sharing system started in 1964 called Multics. The idea behind Multics was to make computing into a remote service that could be accessed by terminals over telephone lines. It wasn't terribly successful, because dial-up connections of the time, at about 9,600 baud, were rather slow. However, Canada actually did have a Multics system deployed, used as part of their national defense network until the year 2000.
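The signal mechanism mentioned a moment ago — the operating system (or another process) interrupting or terminating an application — can be sketched in Python, which exposes the POSIX signal API on Unix-like systems. This is a minimal illustration of my own: a process installs a handler for SIGTERM and then sends that signal to itself, so the handler runs instead of the default action (termination).

```python
import os
import signal

caught = []

def handle_sigterm(signum, frame):
    # Runs asynchronously when the kernel delivers SIGTERM,
    # instead of the default action (terminating the process).
    caught.append(signum)

# Register the handler with the kernel via the signal() interface.
signal.signal(signal.SIGTERM, handle_sigterm)

# Any process with permission can signal a process by PID;
# here we simply signal ourselves.
os.kill(os.getpid(), signal.SIGTERM)
```

After `os.kill` returns, the handler has appended `signal.SIGTERM` to `caught`. Without the registered handler, the same signal would have terminated the process.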
Multics is remembered, however, primarily for its influence as a multi-user shared system, and today, with cloud computing, there are some significant parallels between the cloud computing models we use and some of the ideas pioneered back in Multics. Unix was actually a play on the name Multics. It was developed at Bell Labs in 1969. Unix is actually a trademarked term, so some authors use *nix to refer to the family of systems and reserve the word Unix for the commercial distributions. The original Unix systems were commercial distributions, with the free variants — BSD and Linux — coming along in the late 80s and early 90s. Unix systems are time-sharing systems in that they support multiple users at the same time, but they're designed to run on local computer resources instead of remote resources. The Berkeley Software Distribution (BSD), which was based on AT&T's commercial distribution, was one of the first open-source Unix systems to emerge. Because it was based on the commercial distribution, AT&T sued UC Berkeley, but an eventual settlement allowed Berkeley to distribute BSD freely in both source and binary form. Today's descendants of BSD include FreeBSD, OpenBSD, and Mac OS X, which is based on the Unix BSD system. Linux, however, was an alternate approach to getting a Unix-like kernel running on a computer system. It was started by Linus Torvalds as an undergrad at the University of Helsinki, released in source code form in 1991, and often combined with a set of user-space utilities and libraries created by the GNU Project; the resulting combination is thus sometimes called GNU/Linux. It has been commercially successful for many device classes, including web servers, network and embedded devices, and mobile phones. Some people, including myself, use Linux as a desktop platform, and it does have the benefit of being the one kernel that's scalable from embedded devices all the way up to supercomputers.

It's convenient to think of a computer system as a set of layers. At the top layer we have a group of applications: these are the programs that we run, and with which we interact on a day-to-day basis to accomplish various tasks. These applications are built on top of libraries and utilities, some of which are included with the operating system and some of which are installed separately. The operating system acts as an intermediary layer between the application layer with its utilities and the underlying hardware. The underlying hardware consists of several important pieces; I'll talk about these in more detail in a moment. Two primary pieces that come to mind are the central processing unit, which actually executes the program instructions, and the memory hierarchy, which provides a place to store instructions and data for use in computation. The memory hierarchy is critical to understand, because the speed of each level of memory within the hierarchy varies: the fastest memory consists of the CPU registers and then the cache attached to the CPU itself, whereas the slowest memory consists of persistent storage, typically disk. The hardware also includes a bunch of input/output devices — keyboard, mouse, network interface card, screen, printer, webcam, and plenty of other types of devices. One of the other things the hardware does is supply power to the system via the power supply and remove heat generated by the system through cooling; this is typically done with fans built into the computer's case.

When we look at a computer system from the outside, we see something called a form factor. A form factor is simply a set of specifications for standard sizes of equipment, allowing computer components developed by different people to be interchanged with one another. Standard form factors include ATX, which is the typical workstation-size form factor; Micro-ATX and Mini-ITX, which are used in small form factor systems such as bookshelf systems and home theater PCs; and rack mount systems, which are the typical servers in the data center. Laptop systems are a special case: these systems are typically proprietary in nature, to the extent that it's not easy to take components out of one laptop and put them in another. Workstation systems, however, do allow easy component interchange, and these are the simplest of systems. They typically support multiple users, but normally only one user at a time — when a workstation is shared between different users, it often involves one user getting up from the keyboard and another sitting down. Of course, it's not always the case, but generally that's the case. Workstation systems are also frequently either ATX or Micro-ATX form factors, and each workstation typically has its own power supply, which can be an efficiency issue, considering that power supplies are only about 65 to 85 percent efficient. Workstations can be used to host server applications; this is particularly true in smaller settings, such as small businesses and non-profit organizations. Server applications in larger-scale environments, such as enterprise environments, are typically hosted in rack mount systems, which are simply a space-efficient way to put a whole lot of servers into one data center. Rack sizes are measured in terms of units, or U, and a typical rack is 42U tall. So with the most space-efficient of servers — 1U systems with 12 cores in each machine — up to 480 CPU cores could be supported by a single rack. However, there won't be much room for expansion: not much room for extra devices or extra hard drives. For that, we use 2U and 4U systems, which are less space-efficient but do allow for more expansion, more disks, and more add-in cards.

Now, if we take a computer system and open up the case, inside we're going to find a power supply; a motherboard, or what Apple calls a logic board; and, attached to that motherboard, the CPU, random access memory, expansion slots for add-in devices (some of which may be populated), and the motherboard's own on-board devices, such as network controllers or video controllers. Inside the case we're also going to find persistent storage devices, such as hard disks, solid state drives, and optical drives. Here's a conceptual view of what the inside of the computer system looks like. We have the motherboard, which has some expansion slots, normally used for add-in cards that are plugged in to add functionality to the system; on-board devices, which are soldered onto the motherboard and are not removable; and one or more CPUs, which are typically removable. These are typically separate chips, and inside each chip — you can't see them, but inside — are typically multiple CPU cores and some kind of cache memory. Random access memory is another type of memory attached directly to the motherboard, typically in the form of removable modules. Connected to the motherboard via cables, we find a hard disk, an optical drive, and perhaps a solid state drive.

The central processing unit is the part of the computer that actually executes instructions. It has one or more execution cores; most modern CPUs have at least two. CPU registers are used for direct computation — the operations that are occurring right now. The cache is used to store program instructions and data that will be used immediately: the next few instructions that are going to be executed. This cache is divided into levels of different speeds. It's also volatile — it loses its contents whenever power is disconnected — and this type of fast memory is quite expensive. Random access memory is separate from the CPU. It's used to store programs and data while a program is in a state of execution. This memory is several orders of magnitude faster than hard disks or even solid state drives, which is why programs are loaded into random access memory by the loader before they start executing. Most programs also buffer their data in RAM for speed. However, RAM is still volatile — its contents are lost whenever power is disconnected — and it is more expensive per unit of storage than disk space or even solid state drive space.

Persistent storage is the slowest of all types of memory in the system. This is non-volatile storage, for data to be saved when the system is powered off. It's the least expensive type of storage per unit of capacity, but it is slow — not fast enough to execute programs directly, largely because the CPU would constantly be waiting on the drive to deliver the next instruction, and that would slow down the CPU. Hard disk drives are what we typically think of when we think of slow persistent storage. These are mechanical devices that use magnetic storage: basically, a magnetic medium on top of either a glass or metal disc called a platter, with multiple platters stacked and spun at high speed while the heads move back and forth between tracks on the platters. Accessing data on a hard drive is thus subject to two types of delay: seek time, which is the time required to move the heads back and forth, and rotational delay, which is the time required for the platter to rotate around to the location where the desired data is stored. Hard drives are relatively inexpensive per unit of storage space, but they are mechanical, which means they're guaranteed to fail sooner or later. Here's a diagram of a hard disk. We can see the platters, which are stacked on top of each other and spun at high speed by a motor, while the heads move back and forth. There's typically one head per side of each platter, so the heads are actually stacked up, and a servo motor moves the heads back and forth in unison. The time it takes to move the head from the inside of the platter to the outside, or from the outside back to the inside, to reach the appropriate track
on the platter where the data is located is called the seek time. The time it takes for the spot where the data is located to rotate around under the head is called the rotational delay. So if I have a piece of data right here, about where the mouse pointer is, and my drive is rotating clockwise, the head has to seek to the track where that data is located, and then I have to wait for that data to come all the way around and wind up under the head so that it can be read. This adds delay when reading or writing data on the hard drive. (For a sense of scale: a 7,200 RPM drive completes one revolution in about 8.3 ms, so the average rotational delay — half a revolution — is roughly 4.2 ms.)

So, in performance-critical situations, we'll typically use solid state drives. These drives have no moving parts; they instead use solid-state chips to store data. Consequently, they have no seek time, no rotational delay, and faster burst transfers — much higher performance than hard drives. However, this performance comes at a cost, and solid state drives themselves will eventually fail: each block of an SSD can only be written a finite number of times before it wears out. The drive extends its own lifetime by reallocating blocks dynamically, so that each block is written an average number of times instead of a large number of times. This process, called wear leveling, extends the life of the device significantly.

Given that our persistent storage devices are guaranteed to fail in time, it is critical that system administrators keep backup copies of programs and data, so that the system can be restored after a failure. The backup media should be stored separately from the computer center itself — best practice is off-site, at least 10 miles away, in a location that can be protected from damage as well as from theft or loss of data. Backup media include optical discs, such as recordable CDs, DVDs, or Blu-ray discs. These optical discs are nice because they're inexpensive; however, they do degrade over time — scientists are still working out the details of exactly how long they last, but it is known that the data will become unreadable eventually. Flash drives, such as USB sticks, are also inexpensive, although a bit more expensive than DVD media; they're slow, sometimes slower than DVD media, and they themselves will lose data eventually, due to electrical properties inside the flash memory. Tape drives are considered the best enterprise-level solution, which of course means they have the highest costs, but they have the best reliability on average, despite being somewhat slow to access for reading and writing. Cloud storage is another option for many users. In this model, backups are simply uploaded to a remote server that someone else maintains, typically a backup service provider. Cloud storage is relatively low-cost and convenient; however, its speed depends on the internet connection you're using to send data to or retrieve data from the backup service, and theft of data can be a concern.

To recap: the operating system implements a common mechanism for allowing applications to access and share hardware, applications can make requests of the operating system via system calls, and the operating system can deliver hardware events to applications. This is how we allow our applications to make use of the underlying hardware of the computer system.

Next, I'll be discussing disk input/output, and many of the physical properties that come into play when attempting to schedule a hard disk drive for access by multiple programs. In particular, I'll talk about disk attachment, discuss some of the properties of magnetic disks, discuss disk addressing and partitioning, and introduce solid state drives. But let's begin with disk attachment. Disks are attached to the motherboard via some kind of cable, and the exact type of cable depends on the bus in use on the system. There are several different types of bus, implemented by different chips attached to the motherboard on different computer systems. One common bus that was widely in use until the early 2000s on consumer-grade hardware was the Integrated Drive Electronics, or IDE, bus; this has since been renamed Parallel ATA, or PATA. It consisted of a 40- or 80-conductor ribbon cable connecting to a 40-pin connector, providing 40 simultaneous parallel channels of communication between the motherboard and the hard drive. Enterprise-level systems of that time period typically used SCSI (Small Computer System Interface) buses, which used cabling with 50- to 80-pin connectors between the hard disk and the motherboard. SCSI also defined a standard set of commands — a standard protocol for interfacing with disks, CDs, and other types of storage devices. This protocol was useful for recordable CD media and DVD-ROM media, and the SCSI protocol was implemented on the ATA bus in a scheme known as ATAPI, the ATA Packet Interface. In more modern times, serial interfaces between the motherboard and the disk have, for the most part, replaced the parallel interfaces. For ATA-style disks we have Serial ATA, or SATA, which replaces the 40-pin connector with a 7-pin connector but still uses the same protocols, either ATA or ATAPI; and there is Serial Attached SCSI, or SAS, which uses the SCSI protocol over a narrower channel consisting of 26 to 32 pins. Both of these newer serial attachment mechanisms support higher bus transfer speeds, theoretically enabling faster devices to be attached to the motherboard. It does not necessarily mean, however, that the disks themselves have gotten that much faster.

We still store large amounts of information using magnetic disks. These are metallic or glass platters coated with a magnetic surface; a stack of these platters is rotated at high speed by an electric motor, while a stack of heads moves back and forth across the platters, altering the magnetic fields in order to read and write data. Moving the heads back and forth results in seek time, and waiting for the platter to rotate around to the correct position results in rotational delay. Historically, magnetic media were addressed by geometry. The smallest addressable unit of space on a hard disk is called a sector; this is typically 512 bytes, at least on older disks, though other sizes have been used, and newer drives go up to 4 kilobytes. Tracks are circular paths at a constant radius from the center of the disc, and each track is divided into sectors. One head reads from a single track on a single side of each platter; when you stack multiple heads up, using the matching tracks on the various sides of several different platters, the result is what's called a cylinder. Historically, accessing disks required addressing particular data locations using a cylinder-head-sector, or CHS, geometry address. As disks grew larger and faster, however, this type of addressing scheme became limiting, and so logical block addressing was put into use; it's now the standard today. Logical block addressing, or LBA, gives each block on a disk its own logical address and leaves it up to the disk firmware to convert the logical addresses into physical locations on the disk. Current ATA standards use 48-bit logical block addresses, which with 512-byte sectors allows enough space for disks up to 128 pebibytes (2^48 × 512 bytes). Operating systems normally implement these 48-bit addresses using 64-bit data structures; thus, operating systems that support 64-bit disk addressing can generally support hard disks up to 8 zebibytes (2^64 × 512 bytes), again assuming 512-byte sectors.

Now, regardless of the size of the disk, it is convenient to partition the disk into multiple sections so that we can isolate data: we can isolate the main partition of the operating system from a partition we would use for swapping out pages of virtual memory, for example, and we can isolate both of those from user data. Also, with early hard drives it was convenient to use partitions to minimize seek time, by arranging data so that the head didn't have to move as far to access it. There are two types of partition table — the data structure that resides on the disk to indicate where on the disk the different partitions lie. A common partition table type in widespread
use today is the Master Boot Record (MBR)–based partitioning scheme. The way this works is that the BIOS on the system — the Basic Input/Output System — loads the first 512-byte sector from the boot drive at boot time, and code stored in that 512-byte sector loads the rest of the system. Also within that sector is stored the partition table, which is called a DOS-style partition table. The DOS partition table still uses legacy cylinder-head-sector addressing, supports a maximum of four primary partitions (one of which can be an extended partition with logical drives in it), and limits both the maximum partition size and the maximum partition starting address to 2 tebibytes. This is the default partitioning scheme for Microsoft Windows and most Linux distributions. However, as hard drives become larger and grow past 2 TiB, the GUID Partition Table, or GPT, starts to be used. This is a larger partition table that can support disks and partitions up to 8 zebibytes in size. For compatibility with old partitioning tools — to prevent old tools from overwriting sections of the disk after seeing it as free space — a protective, or dummy, Master Boot Record is retained at the beginning of the disk. GPT is the default partitioning scheme in Mac OS X, is an optional partitioning scheme in Linux, and is supported in 64-bit versions of Windows 7, Windows Vista, and Windows Server 2008, provided that the system uses the Extensible Firmware Interface, or EFI, instead of the legacy BIOS interface. For Linux, the GRUB 2 bootloader can use the legacy BIOS interface, but it requires a small dedicated partition on the hard drive in which to store the rest of the bootloader.

Hard drives and magnetic media are used for large quantities of space because of their relatively low cost. When high performance is required, we prefer to use solid state drives, or SSDs. These drives have no moving parts, which makes them generally faster and less subject to physical damage than mechanical hard disks. Most SSDs use NAND flash memory to store data, a storage mechanism based on injecting an electron into, or removing it from, a flash cell. Injecting an electron into a flash cell changes its state from one to zero — this is backwards from what one would expect: an empty flash cell actually has a state of one instead of zero. The membrane through which the electron is injected and removed eventually wears out, typically after anywhere from a hundred thousand to a million cycles. Furthermore, the electrons tend to leak out over long time periods — periods of many years — causing flash to lose data, which makes flash-based memory systems unsuitable for long-term backups. In this diagram we can see how a solid state drive using flash memory works. Blank flash memory stores all ones: 1 1 1 1 1 1 1 1 (I'm pretending we have only a single byte here). If I want to write the value 1 0 0 1 0 1 0 0, I have to pop electrons into the second, third, fifth, seventh, and eighth bit locations. If I later wish to change that stored value to 1 0 1 0 0 0 1 1, I must first erase that block of flash memory before I can program the new data value. Generally, flash memory must be erased an entire erase block at a time, typically four kilobytes. Waiting for this block-erase procedure causes something called write amplification, where successive writes to an SSD become progressively slower. To avoid this problem, the SSD is typically erased ahead of time, whenever space has been freed on it, and there's an ATA command called TRIM that facilitates this process. One other issue that has to be dealt with on SSDs is the fact that each cell can generally only be written, erased, and rewritten a fixed number of times — typically between a hundred thousand and a million. So, to spread the writes across the entire SSD, the drive moves data around the drive as files are updated, and it also reserves a certain amount of free space, unused, so that this free space can be swapped in and out with space that's in use. This process, called wear leveling, has the advantage of dramatically increasing the useful life of the SSD, and of reducing write amplification whenever clean blocks are made available. However, in order for the write amplification reduction to be effective, the operating system and the underlying file system must support the ATA TRIM command. Furthermore, it's impossible to ensure that the disk is secure against forensic data recovery, because it may not be possible to overwrite — and thus properly erase — the reserved cells of memory that have been taken out of service. Thus a used solid state drive should be physically destroyed instead of resold, which can be an issue, because a solid state drive has a higher upfront cost.

So, in summary: disks are attached to the system via some kind of bus, the newer bus styles being SATA and SAS. Modern disks are addressed using logical block addressing. They're partitioned with either DOS partition tables or GUID partition tables; GPT use will increase as disks become larger, owing to the limits of the DOS partition table. And solid state drives offer higher performance at a higher initial cost, subject to the requirement of wear leveling, and subject to not being safely resellable, due to the inability to forensically secure the data on the drive.

In this lecture, I'll be discussing disk scheduling. I'll introduce the purpose of disk scheduling, talk about some classical and historical disk scheduling algorithms, talk about native command queuing and the disk schedulers currently in use in the Linux kernel, and then talk a little bit about how I/O requests can be efficiently scheduled on solid state drives. Disk scheduling serves two purposes. The first of these is to arbitrate disk access among
different programs this ensures that competing programs have access to disk resources and that a single program cannot monopolize the disk resources in such a way as to prevent other programs from accessing the disk with mechanical hard drives scheduling algorithms historically have also attempted to improve disk performance by reducing the number of seeks required moving reducing the number of times the drive head needs to be moved if the drive head has to be moved too many times a lot of throughput can be lost from the disk because we're waiting on all the seek times to occur the simplest scheduling algorithm would be the first come first serve algorithm which is implemented in Linux as a no op scheduler this algorithm is extremely straightforward it simply consists of a fifo cube into which new requests are added the requests are removed from the queue in order one by one these requests are sent to the disk for processing now there's no reordering of the queue this is a first come first serve ordering so back-to-back requests for different parts of the disk may cause the drive head to move back and forth across the platters wasting quite a bit of time the seeks historically several attempts have been made to try to improve this Behavior one example would be the scan algorithm or the elevator algorithm and in this algorithm the drive head only Moves In One Direction it serves all the requests in that direction before moving back in the other direction this is called the elevator algorithm because it's modeled after how an elevator Works in a building the elevator leaves the ground floor moves to the highest floor stopping along the way to add passengers traveling up remove passengers at whichever floor they wish to stop on and then once the algorithm reaches or once the elevator reaches the highest floor turns around and comes back down same process here in this example with the scan algorithm assuming that the head starts at sector one and this request for 
sector 50 comes in while sector 61 is being processed the algorithm is going to process the requests in order 3 12 32 40 42 61 84 97 and since that request for sector 50 came in while 61 was being processed that request for sector 50 is going to have to wait until the head changes direction and returns to sector 50 now there are a few optimizations of the simple scan algorithm the original algorithm proposed that the head would move all the way from the beginning of the disk to the end of the disk and then all the way back to the beginning the look algorithm improves upon this behavior by moving the head only as far as the highest numbered pending request before changing directions and moving it back down the circular versions of the algorithm C-SCAN and C-LOOK only serve requests moving in one direction so for example with circular scan the head would start at the first sector move all the way to the end of the disk servicing requests and then come all the way back down to the first sector without servicing any requests and start the process over again these are historical algorithms in the sense that with modern drives we have LBA or logical block addressing and so we don't actually know where the disk is placing data physically this type of algorithm was used historically with cylinder head sector addressing where we knew the physical properties of the disk and the idea here was to reduce average seek time at no time did we know where the platter was that was up to the disk to figure out so there was no way to reduce rotational delay only minimize seek time another algorithm that attempts to minimize seek time is the shortest seek time first algorithm or SSTF this algorithm actually orders requests by sector location so when the request for sector 50 comes in it's kept in an ordered queue a priority queue and the next request to be served by the disk these requests are still sent out one at a time is chosen by looking at whatever request is closest to the current disk
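The elevator-style ordering just described can be sketched as follows, using the sector numbers from the example; `look_order` is an illustrative name, not a real kernel interface.

```python
# Minimal sketch of LOOK-style (elevator) request ordering.
# Pending requests at or above the head are served in ascending order,
# then the head reverses and serves the rest in descending order.

def look_order(requests, head):
    upward = sorted(r for r in requests if r >= head)
    downward = sorted((r for r in requests if r < head), reverse=True)
    return upward + downward

# Head starts at sector 1: all requests are served on the upward sweep
print(look_order([3, 12, 32, 40, 42, 61, 84, 97], head=1))
# -> [3, 12, 32, 40, 42, 61, 84, 97]

# A request for sector 50 arriving while 61 is in service must wait
# for the downward sweep, after 84 and 97 have been served
print(look_order([84, 97, 50], head=61))
# -> [84, 97, 50]
```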
position historically this could result in starvation because if a bunch of requests come in for one piece of the disk requests for the remainder of the disk might not be processed for lengthy periods of time furthermore once again with logical block addressing the operating system doesn't really know where the sectors are laid out on disk and so this algorithm doesn't really work with modern hard drives also with solid state drives this algorithm assumes there's a disk head which in the case of a non-mechanical drive there is not in the Linux kernel an approximation to shortest seek time first was implemented with the anticipatory scheduler this was the default scheduler from 2.6.0 to 2.6.17 it was removed in 2.6.33 because it's obsolete the idea behind the anticipatory scheduler was to approximate shortest seek time first by ordering only the read requests into an ordered queue into a priority queue if the next read request was close to the current head position that request would be dispatched immediately otherwise the scheduler would actually wait a few milliseconds to see if another request arrives for a nearby location and starvation was avoided in this algorithm by placing expiration times on each request and adding preemption so that if a request was waiting too long it would go ahead and be serviced regardless of its location the idea here was to reduce overall seeking there is a separate queue for write requests because write requests could be performed asynchronously we did not have to wait on those requests to be completed before our process could continue doing useful work this algorithm was shown with low performance drives to improve performance on web server applications however it was shown to have poor performance for database loads where there were a lot of random reads and writes on the disk and with high performance disks this algorithm actually breaks down modern hard disks generally do qualify as high performance disks the reason being that they
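Shortest seek time first as just described can be sketched as follows; the priority-queue ordering is approximated here with a simple minimum scan over pending requests, and the function name is mine.

```python
# Sketch of shortest-seek-time-first selection: the next request served
# is whichever pending sector is closest to the current head position.
# Note how a burst of nearby requests could starve a distant one.

def sstf_schedule(requests, head):
    order = []
    pending = list(requests)
    while pending:
        nxt = min(pending, key=lambda s: abs(s - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt        # the head is now at the sector just served
    return order

print(sstf_schedule([3, 97, 50, 61], head=55))
# -> [50, 61, 97, 3]  (sector 3 waits the longest despite arriving first)
```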
implement something called native command queuing this is a feature of newer SATA drives and basically native command queuing leaves scheduling of disk requests up to the disk itself the disk circuitry and firmware makes the decision about which request to handle next to do this the disk has a built-in priority queue of about 32 entries and the disk is able to schedule its requests automatically taking into account both the seek time and the rotational delay since the disk knows the location of the platter this makes modern disks much more efficient and this works with logical block addressing the operating system's role with this type of hard disk is really more arbitration of the disk resources among the different programs running on the system one modern Linux scheduler that can be used for such arbitration is called the deadline scheduler and in this scheduler the kernel maintains separate request queues for both read requests and write requests similar to the anticipatory scheduler reads are prioritized over writes because processes typically block or stop and wait while waiting to read something from the disk thus writes can be done later at some point when it's convenient for the operating system the waiting time in each queue with the deadline scheduler is used to determine which request will be scheduled next a 500 millisecond deadline is the goal for starting a read request with a five second goal for starting a write request this scheduler may improve system responsiveness during periods of heavy disk I/O at the expense of data throughput since each request has a deadline and the longer a request has been waiting the sooner it will be scheduled this is especially useful for database workloads because there are many requests for different parts of the disk however for web servers and other services that try to access large quantities of data located in the same location on the disk this particular scheduler can
actually reduce total throughput completely fair queuing is a somewhat different idea where instead of actually scheduling the I/O requests the completely fair queuing model schedules processes or running programs to have time slice access to each disk and essentially this CFQ scheduler gives each process an I/O time slice and that process can do as much I/O on the disk as it would like within that time slice when the time slice expires the CFQ scheduler moves on to the next process and gives it access time to the disk there is a little bit of similarity here to anticipatory scheduling since a process can send another request during its time slice and try to get that request in without having a lot of seek time however this algorithm can waste time if a process does not immediately turn around and send another request thus it can reduce the overall disk throughput since there could be idle times while waiting to see if another request will come in before the time slice expires this has been the default scheduler in Linux since 2.6.18 for solid state disks we have to choose a scheduler that accounts for the fact that there's no seek time to worry about on the disk the anticipatory scheduler the elevator algorithms shortest seek time first all of these algorithms can actually reduce performance on a solid state drive because they make assumptions about minimizing head seek time and we have no heads to move with a solid-state disk completely fair queuing can also reduce disk performance because of idling at the end of a time slice that's true for any disk so what do we do to maximize performance for a solid state drive there are really two good choices for a solid state drive there is the noop or FIFO scheduler this works well for general purpose systems however when there are heavy I/O workloads and we want to maintain system responsiveness the deadline algorithm is useful since other processes will have more opportunities to access the disk in summary operating systems
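The deadline idea discussed above can be sketched roughly like this; the 500 millisecond and five second figures come from the lecture, while the queue structure and function name are illustrative assumptions rather than the actual Linux implementation.

```python
# Toy sketch of deadline-style selection: reads and writes sit in
# separate queues, reads get a much shorter deadline, and an expired
# request is dispatched ahead of everything else.

READ_DEADLINE = 0.5    # seconds (reads block the process, so serve soon)
WRITE_DEADLINE = 5.0   # seconds (writes can be deferred)

def pick_next(read_queue, write_queue, now):
    # Each queue entry is (arrival_time, sector), oldest first
    for deadline, queue in ((READ_DEADLINE, read_queue),
                            (WRITE_DEADLINE, write_queue)):
        if queue and now - queue[0][0] > deadline:
            return queue.pop(0)          # expired: dispatch immediately
    if read_queue:                       # otherwise prefer reads over writes
        return read_queue.pop(0)
    if write_queue:
        return write_queue.pop(0)
    return None

reads = [(10.0, 42)]
writes = [(2.0, 84)]                     # this write has waited since t=2.0
print(pick_next(reads, writes, now=10.2))  # write expired -> (2.0, 84)
```

With no expired requests, the read queue always wins, which matches the rationale that a blocked reader is more urgent than a deferrable writer.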
are arbitrating disk access among different processes to prevent one process from monopolizing the disk and preventing other processes from having access on older disks with cylinder head sector addressing the operating system was also attempting to reduce seek time thus improving aggregate disk performance however newer SATA disks with native command queuing schedule themselves to reduce both seek time and rotational delay thus algorithms that attempt to minimize seek time are unnecessary furthermore with solid-state drives scheduling mechanisms that are based on old mechanical assumptions can actually reduce performance so for SSDs we're really only interested in arbitration in this lecture I'm going to discuss development cycles in particular we'll talk about the software life cycle steps in software development talk about some software development models and give a brief overview of formal methods the software life cycle is a way of expressing how software comes into being and evolves over a lengthy period of time there's more to software development than simply programming in particular we need to plan the piece of software that we're going to develop when we're talking about a new piece of software figure out what the requirements of that software are going to be design that software to satisfy those requirements and then perform several rounds of coding and testing to ensure that our finished application actually meets the requirements before we can put the application into production the life of a software application is not over however simply because it enters production over time invariably we will find the need to upgrade the software which means going through the same development process all over again and since any software application less trivial than hello world contains bugs we will have quite a bit of ongoing maintenance in order to find and fix errors that creep up in the software we'll also need to support users and go through several of these processes
before the software finally gets to the point where it has reached its end of life and is replaced by a new application software development activities comprise a number of steps we have to conceptualize the type of software that we're going to be creating perform requirements analysis to figure out what the software actually needs to do come up with a design for the particular application that we wish to implement and then finally implement and test the application during the maintenance phase of the software life cycle we need to check and fix bugs it's important that we be able to replicate bugs that are reported from the users so that we're able to verify the bugs do in fact exist and determine the actual location of the problem before trying to implement a fix once a fix is implemented it too needs to be tested before a patch can be released for the software now managing these development activities is an important step in and of itself each step of this process requires some kind of orchestration particularly as the size of the development team increases a disciplined process helps to maximize effectiveness when the work is being done by more than one person and the details of this process management are what we call a development model the simplest development model is the code and fix model this is really the simplest of models because it exhibits a complete lack of any kind of design discipline what happens is programmers simply sit down write some code maybe run a couple tests fix the code based upon the tests deliver the result to the customer if the customer likes the code then they're done if not then go back fix the code some more deliver it again this process continues until the customer is either satisfied or just completely gives up this type of model is inefficient the code is often hastily completed with quite a few bugs this type of model while fairly well suited to outsourced programming is not really suitable for long-term support of a
particular application because the project is treated as something that's to be coded and later forgotten the opposite extreme would be the waterfall model which is an attempt to follow a conceptual diagram of processes step by step and perform each step all the way to completion before moving on to the next step so one would start with a concept perform requirements analysis until all the requirements were teased out perform a design step until the complete design was finished and then start the coding this model doesn't work well in practice because often issues related to requirements do not emerge until after a few versions of the software have been completed and the user comes back and fills in the development team on details that weren't previously disclosed during the original requirements analysis phase similarly the design often needs to be changed as the source code evolves there is a related model called the V model that tries to establish relationships between these steps however these models are generally not considered effective iterative and incremental models are the most popular software development models used today and instead of performing each task to completion like one would try to do in the waterfall model they break the project into pieces and implement one piece at a time test and deliver multiple prototypes during the project development phase and solicit feedback from the customer for each given prototype this process begins with a concept but then goes through a requirements analysis a partial design a prototype and testing in this feedback process multiple times and over time the software application from its requirements through its design through its code is progressively revised until the application actually satisfies the customer's requirements once this occurs some final testing and integration steps can be performed before the software is finally released and put into production now there are multiple iterative and incremental
development models rapid application development is a model that features a short planning phase with a large number of prototypes agile development puts a strong emphasis on people and interactions over particular requirements documents and other formalities and focuses on responding to customer feedback and developing in a more feedback-centric way extreme programming is a form of agile development with extremely short prototype release cycles and very little documentation other than the application itself for larger teams a more formal approach is available with the spiral model which emphasizes risk analysis at each step in order to evaluate the progress of the project and attempt to prevent cost overruns there is also a branch of development called formal methods these are used for life and safety critical software systems and are in fact required by the European Union for avionics software and other safety critical applications cleanroom software engineering is a process in which software is formally verified to ensure that it meets all specifications and is statistically tested and statistically quality controlled to ensure the application is actually performing the tasks that it's supposed to perform this is also an iterative and incremental approach however it does require much greater documentation of the testing procedures so how much formalism is actually required in a software development process well Dr Alistair Cockburn came up with a scale and determined that the need for formal development processes increases with the size of the development team the financial risk of the project or the safety critical nature of the final system or any combination of those factors furthermore depending on the regulatory requirements under which the software is being developed there may be a greater need for documentation and process verification than in regulatory domains where there is little or no such regulation so in summary software development encompasses
much more than simply writing program code most of the popular development processes that are in use today are iterative and incremental approaches that emphasize building small prototypes and adding on to those prototypes as the project progresses however safety critical systems and projects with extremely large development teams or high risk often will need greater formal procedures in order to coordinate the development process and may require formal verification in this lecture I'm going to discuss file systems I'll be providing an overview of the purpose of file systems discussing metadata that they store explaining how we create a file system through a process known as formatting talk about some issues with file systems including fragmentation and journaling briefly discuss some internal layouts used by different file systems and finally talk about mounting and unmounting file systems to make them available to users the file system is responsible for laying out data on a persistent storage device and ensuring that data can be retrieved reliably the file system is an abstraction of disk space it provides routines for querying opening and closing files and providing human readable names for files and some kind of organizational structure for files typically through directories or folders without a file system we would have to access the disk by physical location or by address and each program would have to reserve certain address limits for its exclusive use file systems in addition to performing this abstraction also arbitrate disk space among different programs and different users of a computer system file permissions allow users to have a certain degree of privacy while file quotas ensure that one user does not monopolize the entire system by utilizing all the disk space file systems are also responsible for storing metadata or information about each file this includes a file name file size who owns the file what group that owner belongs to what permissions exist for
each different category of users on the system to access that file as well as to provide certain time stamps on Unix we have the inode creation time the file modification time and optionally the last file access time metadata records also include internal information that is important for the file system itself such as pointers to the actual data on disk and a reference count for how many different names refer to the same file these multiple names are called hard links we create a file system by taking an empty partition or a partition that we're ready to reuse and formatting it formatting quite simply is the process of making a new file system on an existing partition formatting typically destroys the structure of any file system that was previously installed on the partition thus when you format a partition you lose access to its contents at least through standard tools however unless that partition is securely erased the contents that were formerly on the partition can still be recovered using forensic tools the only safe way to switch between file systems on a single partition is to back the data that's on that partition up to another disk format the partition using whatever the new file system would be and then restore the data from the backup there is no safe way to change a file system type in place without losing data file systems do suffer from a few issues over time sections of a file in a file system can become non-contiguous in other words a file gets split over different parts of the disk and in the process that also splits up the free space so that when new files need to be allocated they have to be split up to take advantage of smaller blocks of free space this is a situation called fragmentation it's worse in some file systems than others fragmented file systems were a big problem with mechanical hard drives simply because a fragmented file requires a seek to move from the location of the first fragment to the location of the next fragment
and possibly further seeks if there are more fragments this is not so much a problem on solid state drives however since there's no seek time and file systems can get around this problem either with offline defragmentation tools that the system administrator can run manually or they can use fragmentation avoidance strategies or automatic on-the-fly defragmentation another issue that can occur with file systems is that a single high-level file system operation typically requires several low-level steps in order to complete if the computer should crash or power be lost in the middle of those steps being performed the file system could be left in an inconsistent state a solution to this problem is called journaling and the way this works is by recording all the steps that are to be taken in something called the journal a special part of disk space reserved for this particular information the journal records all the steps that are going to be taken prior to taking the steps thus if the system should crash in the middle of a high level file system operation the journal simply needs to be replayed the next time the file system is mounted and the steps can be carried out again and leave the file system in a consistent state internally file systems may use one of several different approaches for storing data on the disk a simple layout is called the file allocation table which simply has a single table on each partition to store metadata and the addresses of data segments for each file file allocation table type storage methods are limited only in terms of how big the file allocation table can be and these limitations specify among other things the maximum size a file can be and how many files can be on the system another approach is to use something called inodes which are data structures on Unix systems that contain the metadata for a file including pointers to the actual data inodes do not store the file names however these are stored in a separate structure called a
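The journaling mechanism described above can be sketched as a toy write-ahead log; the class layout and key names are mine and do not correspond to any particular file system's on-disk format.

```python
# Toy sketch of write-ahead journaling: the steps of a high-level
# operation are recorded in the journal before being applied, so an
# interrupted operation can be replayed at the next mount.

class JournaledFS:
    def __init__(self):
        self.journal = []      # steps recorded but possibly not yet applied
        self.state = {}        # the "on-disk" file system state

    def begin(self, steps):
        self.journal = list(steps)    # 1. write intended steps to journal

    def apply(self, crash_after=None):
        for i, (key, value) in enumerate(self.journal):
            if crash_after is not None and i >= crash_after:
                return                # simulated crash mid-operation
            self.state[key] = value
        self.journal = []             # 2. commit: clear the journal

    def replay(self):
        # On mount after a crash, re-apply everything left in the journal
        self.apply()

fs = JournaledFS()
fs.begin([("inode7.size", 4096), ("bitmap.block12", "used")])
fs.apply(crash_after=1)   # crash after the first low-level step
fs.replay()               # next mount: journal replay restores consistency
print(fs.state)
```

The key property is that replaying already-applied steps is harmless, so the file system always ends up in the consistent post-operation state.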
directory that maps file names to inodes the maximum number of files a single file system can hold is limited by the number of inodes created when the file system is formatted this can be a particular problem for file systems that wind up storing a really large number of very small files there could be plenty of space left on the partition however if the number of inodes is exhausted then no more files will be able to be created another approach to storing files and laying out data on the file system is through the use of extents extents support larger maximum file sizes because they're designed to allow files to be composed of several non-contiguous blocks of space much like the fragmentation that could occur with a file system that doesn't use extents extent information is stored with the metadata in the file's inode or some other type of data structure for file systems that support extents now regardless of how the file system lays out its data internally we need to make the file system available to users and programs on the computer to do this we perform a process called mounting mounting is the act of making a file system available to the users of the system the opposite process is called unmounting which is disconnecting a previously mounted file system Unix systems mount file systems at mount points which are simply directories somewhere within the overall directory structure of the system so if I plug in a flash drive on a Linux machine for example that flash drive may be mounted at /media/mydrive I can mount a large number of different file systems this way at the same time and I can make all of the different file systems appear to be part of one directory structure on the other hand Windows systems use drive letters where the letter C is reserved for the system partition the one on which Windows is installed and A and B are reserved for floppy drives Windows supports a maximum of 26 file systems to be mounted at once simply because
that's the number of letters that are available to assign to partitions so in summary file systems organize data store metadata provide an abstraction of the underlying storage medium and arbitrate access to the storage space we create file systems through a process known as formatting we can make file systems more robust against data loss during a power failure through the use of journaling internally file systems use various different mechanisms for laying the data out on disk but no matter how they work internally we can make them available to a running system by mounting them in this lecture I'm going to discuss requirements analysis in particular I'll give an overview of different types of projects different types of resources that can be applied to projects talk about the purpose of performing requirements analysis discuss stakeholders interests and actors and talk about the functional and non-functional requirements of development projects software development projects can be described as falling into a range in between two extremes at one extreme we have something called a greenfield project and it's called a greenfield project because it's analogous to building a building on an empty field you start with a grassy field and build everything from scratch it's a completely new system you get to make all the design decisions from the ground up there's no existing code that has to be merged into the system and this type of project has the potential to be extremely rewarding to the people who work on it at the opposite extreme we have a re-engineering project or a project that needs to be re-engineered this type of project begins with a pile of existing code that lacks any type of documentation whatsoever the design of the system the architecture of the system is an unknown and in fact it's necessary to perform code analysis to determine how the application was designed and how it was architected this type of project has the greatest potential for
frustration since many older programming paradigms did not emphasize making code that was easy to untangle most projects fall in between these two extremes there are some projects that fall into the greenfield category there are some re-engineering projects however many projects are going to land in the middle there's going to be some amount of code that has to be reused yet there'll be some amount of documentation at least hopefully now regardless of the type of project we can utilize several different types of resources in the software development activities of course we begin with people we have a number of people who work with us on a software development project we have software architects we have designers we have developers we have programmers we have others one other type of person that I didn't put in the slides but is important is someone called a domain expert that's the type of person who knows what types of problems the software actually needs to solve and what the business rules are for that class of problem of course we also have money and infrastructure infrastructure includes things such as computer equipment software applications desktop environments and other computational resources that we can utilize in order to develop a working solution of course given enough money we can always purchase additional resources in terms of additional infrastructure or hire additional programmers consultants and others to help with the process now the purpose of performing a requirements analysis is to make effective use of these resources we have available to us ideally we'd like to keep our development staff happy we would like to spend our money wisely and we'd like to optimize use of the infrastructure we have with the aim of trying to determine what exactly needs to be developed so that we can build the right product so that we can actually make something that satisfies the business needs of whoever has commissioned the product speaking of folks who have an interest in
the product we can define a few roles and relationships for specific individuals within a software development setting one important class of individual is the stakeholder stakeholders are people or other entities such as companies or groups that have a vested interest in the system under discussion which is often called the SUD stakeholders include the customer the development company itself users of the system and other direct participants in the system however stakeholders may also include indirect participants in the system perhaps another agency or another unit of the company is actually funding the development work even though they may not directly use the resulting products these entities have a vested interest in the outcome of the system because they have an investment in the system now the interests that one could associate with stakeholders are simply the expectations that a stakeholder has of the system under discussion these interests may be of a technical nature they could also be of a financial nature or even a political nature perhaps some aspect of the system has some political significance to someone so interests fall into a variety of categories actors in a system are anything that has behavior so this includes both human users and other computer systems the primary actor in a system is a stakeholder that is using the system directly to carry out some tasks in other words this is the user who sits down at the keyboard and actually runs the application in order to perform some business function supporting actors on the other hand are other entities upon which the system relies to complete a task this could include something such as an external database system an external printing system or some other functionality that can be described as having behaviors which the system requires in order to perform its business functions this could even be a person who has to go and manually input some type of data into the system human actors typically require a
user interface or UI in order to interact with the system computerized actors on the other hand typically require something called an application programming interface or API in order to interact with the system now when we're specifying requirements and considering what type of actions an actor can take with the system we can divide the requirements into two sets the first set of requirements is called functional requirements these requirements specify what a software application does in other words given some set of inputs this set of requirements should be able to tell you what the outputs and side effects of the system will be for that set of inputs this is the design of the software this is the low level detail about how the software responds to particular different types of input non-functional requirements on the other hand specify what the software is these are often called qualities of the software an example of this is the so-called URPS list which includes usability reliability performance and supportability ISO/IEC 9126 also defines some other qualities that a software system may have including portability and efficiency non-functional requirements are embodied in the architecture of the software how is the software put together to meet these quality requirements so in summary the types of development projects that we could be considering vary from the extremes of greenfield to re-engineering projects these projects are supported by resources including people money and infrastructure each project has stakeholders and each stakeholder has interests in the system stakeholders who use the system directly become primary actors on the system and software engineering requirements can be classified as functional in other words describing how the system works and what the system does or non-functional describing qualities of the system or what the system is in this lecture I'm going to discuss features of the central processing unit or CPU that are useful for supporting
multiple applications sharing a computer system simultaneously in particular I'll introduce multi-programming and discuss the hardware requirements to support multi-programming I'll discuss CPU privilege modes x86 protection rings mode switches and briefly introduce interrupts the first concept to introduce is multi-programming multi-programming is simply the idea that we can run multiple processes or multiple instances of potentially several programs at the same time and we can do this by having the CPU switch quickly among the different processes enabling all of them to make forward progress per unit of human perceivable time the CPU will switch quickly enough to provide the illusion that all the processes are running at the same time even if we only have one processor core the vast majority of modern computing systems with the exception of some special purpose systems are multi-programming systems some of the old computer systems of the day were batch systems that only ran one application at a time now in order to support multi-programming we have to have certain features in our computer hardware the first of these is an interrupt mechanism for enabling preemption of running processes whenever some kind of event occurs we have to have a way of stopping a process handling an event and then restarting the process we need to have a clock so that we know how long the process has been running and we need to have CPU protection levels to restrict access to certain instructions to prevent processes from hijacking the system or just trying to bypass the operating system altogether we also need to restrict access to memory in order to prevent reading and writing to memory that a particular process does not own and that belongs to somebody else two CPU protection levels are sufficient a protected mode and a privileged mode these modes are also called user mode and supervisor mode or kernel mode respectively in kernel mode all instructions on the CPU are enabled and the kernel
In user mode, the CPU disables all the privileged instructions and restricts most direct memory operations. Thus a user program must make a system call to the operating system to request memory or perform other resource allocation tasks. In this way, the user processes are effectively sandboxed, both from the system and from each other. On Intel-based systems, x86 and x86-64 systems, there are actually four modes available. These are implemented by what are known as x86 protection rings, which consist of four privilege levels numbered zero through three. Ring 0 has the greatest number of privileges: code executing in ring 0 can execute any instruction the CPU provides and can access all memory. Ring 3 has the fewest privileges: all the instructions are restricted to the set of instructions that are relatively safe. In practice, most operating systems actually only use rings 0 and 3. OS/2 and Xen are the notable exceptions that make use of ring 1. Newer systems with virtualization extensions, either the Intel VT-x extensions or the AMD-V extensions, add an extra privilege level below ring zero. This is sometimes colloquially referred to as ring negative one, and this mode enables instructions that allow multiple operating systems to share the same processor. These instructions help the system support hosting multiple virtual machines at the same time. Now, regardless of the number of modes available, whenever we wish to change modes for whatever reason, we have to perform something called a mode switch, and that occurs whenever the CPU switches into user mode, kernel mode, or hypervisor mode, or whenever an x86 CPU changes which protection ring is presently effective. Mode switches have the potential to be slow operations compared to other machine instructions, depending upon the hardware. A notable example was the first generation of Intel Core 2 series processors, in which the mode switches into and out of hypervisor mode were quite slow.
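The user-mode/kernel-mode split can be modeled conceptually as follows. This is a minimal sketch, not real hardware behavior; the instruction names are invented.

```python
# Toy model of CPU privilege modes: privileged instructions trap unless
# the CPU is in kernel mode, and a system call is the controlled way to
# enter kernel mode and leave it again.

class CPU:
    def __init__(self):
        self.mode = "user"                 # user programs start sandboxed

    def execute(self, instruction, privileged=False):
        if privileged and self.mode != "kernel":
            raise PermissionError(f"{instruction}: privileged instruction in user mode")
        return f"executed {instruction}"

    def syscall(self, instruction):
        self.mode = "kernel"               # mode switch: escalate privilege
        try:
            return self.execute(instruction, privileged=True)
        finally:
            self.mode = "user"             # mode switch back on return

cpu = CPU()
result = cpu.syscall("allocate_memory")    # succeeds, via the kernel
```

Attempting `cpu.execute("allocate_memory", privileged=True)` directly from user mode would raise `PermissionError`, which is the sandboxing the lecture describes.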
One situation in which a mode switch might occur is when something called an interrupt happens. An interrupt is simply a situation in which the currently executing code is interrupted so that an event can be handled by the operating system. Interrupts fall into two categories. We can have involuntary interrupts, which are external to running processes. These consist of things such as I/O interrupts, which are generated every time you press a key on the keyboard or perform any other I/O task; clock interrupts, which are timer mechanisms that can be scheduled to go off at a particular time; and page faults, which have to do with the virtual memory subsystem. Interrupts can also be voluntary, in other words, created by a process that's running: system calls and exceptions such as segfault or divide-by-zero actions can result in interrupts as well. And the CPU provides hardware mechanisms for detecting when an interrupt is occurring and handling the interrupt. So in summary, multi-programming systems allow multiple applications to run simultaneously. Implementing multi-programming requires support from the hardware: in particular, we need CPU privilege levels, we need a clock, and we need some kind of interrupt handling mechanism. CPUs used in multi-programming systems need to have at least two privilege modes; Intel x86 systems support four or five modes, depending on the processor. Mode switches can be expensive in terms of performance, so we don't want to do them more than necessary. And interrupts enable hardware events to be delivered to applications, and they allow applications to yield control of the system while waiting for events or waiting for service. In this lecture I'm going to discuss kernel architectures. I'll begin by introducing the functions of the kernel, explain the separation between mechanism and policy, talk about some seminal early kernels in the history of computing, and then introduce the differences between monolithic kernels and microkernels. The kernel provides two functions, the
same two functions as the operating system: it provides abstraction and arbitration. The kernel provides abstraction in the sense that it provides a mechanism for programs to access hardware, a way to schedule multiple programs on the system, and some method for inter-process communication, or IPC: a way for programs to send messages to each other, or send messages to hardware devices or out to the network. Kernels also provide arbitration mechanisms: they ensure that a single process, or running program, can't take over the entire system; they enforce any security requirements, such as access privileges, that might be in place on the system; and they minimize the risk of a total system crash from a buggy application or device driver. It's important to distinguish between mechanism and policy when discussing the internal components of an operating system. The mechanism, put simply, is the set of software methods that enable operations to be carried out. An example of a mechanism would be code, implemented inside a device driver, that sends a message to a device causing that device to blink a light, enable a camera, or perform some other hardware-level operation. Policy, on the other hand, is a set of software methods that enforce permissions, access rules, or other limits against applications. So a policy, for example, would be something that said that only users who met certain criteria could send a message out to a hardware device to blink a light, or enable a camera, or perform some other hardware function. It's a generally accepted principle of good design that mechanism and policy should be separated as much as possible. An early kernel that separated mechanism and policy quite well was the Regnecentralen RC 4000 Monitor kernel. This kernel was developed in 1969, primarily by Per Brinch Hansen, for the Regnecentralen RC 4000 computer system. This was a computer system that was developed in Denmark, and the central component of the system was a small nucleus, as Brinch Hansen called it, named Monitor.
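The mechanism/policy separation can be sketched as two functions with distinct responsibilities. This is an illustrative sketch only; the device, user names, and permission set are invented.

```python
# Mechanism vs. policy: the mechanism knows HOW to perform the operation
# and contains no permission logic; the policy decides WHO may request it,
# and can be changed without touching the mechanism.

def blink_light(device):
    """Mechanism: the low-level operation itself."""
    return f"sent blink command to {device}"

ALLOWED_USERS = {"alice"}              # policy data, changeable independently

def request_blink(user, device):
    """Policy: enforce permissions, then invoke the mechanism."""
    if user not in ALLOWED_USERS:
        raise PermissionError(f"{user} may not control {device}")
    return blink_light(device)

ok = request_blink("alice", "camera-led")
```

Because the two concerns are separated, the permission rules can be swapped out while `blink_light` stays exactly as it is, which is the design benefit the lecture describes.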
Monitor allowed programs to send messages to each other, and allowed programs to send and receive buffers, which were essentially types of messages for hardware, to and from different hardware devices. In particular, at that time, they had a card reader, a tape reader, and a printing-style output device. Other kernels, with different scheduling mechanisms and other capabilities, could be run under Monitor. In those days it was not clear that multi-programming was really a desirable feature for computing; thus someone could write a multi-programming-capable kernel and actually run that as a sub-kernel under the Monitor system. Importantly, this was also the first system that allowed sub-kernels and systems-level software to be written in a high-level language, in this case Pascal. The system performance was actually quite awful: Brinch Hansen stated that the operating system itself was so slow at performing its IPC tasks that there were a number of issues with the system completing tasks on time. However, the system was stable and reliable, making it successful in computer science history, even if it was not a successful product commercially. On the other hand, the opposite extreme would be the Unix kernel. This was developed at Bell Labs by a team headed by Ken Thompson and Dennis Ritchie, also starting in the late 1960s. The difference between the Unix kernel and the RC 4000 Monitor was that the Unix kernel's design prioritized performance. Thus, instead of having a very small kernel that simply provided an IPC mechanism and some basic resource collision avoidance, this kernel actually provided all the device drivers, all the scheduling, and all the memory management, including support for multi-programming, directly inside the kernel. This kernel was an early example of what would later be called a monolithic kernel. A monolithic kernel is a kernel that contains the entire operating system in kernel space, runs all of the operating system code in privileged mode, or ring 0 on an x86
system, and divides the different functions of the operating system into subsystems of the kernel. All of these subsystems, however, run in the same memory space. This has the advantage of higher performance, but the disadvantages are that the kernel becomes less modular and more difficult to maintain, and the components are not separated very well, so a crash in one component could in fact bring down the entire system. The opposite of this, the RC 4000-style kernel, is what we now call a microkernel. A microkernel basically contains the bare minimum of code that is necessary in order to implement basic addressing, inter-process communications, and scheduling. This basic amount of code runs in kernel space, and everything else runs in user space, often with lower privileges. As a general rule of thumb, microkernels contain less than 10,000 lines of code. Microkernel-based operating systems tend to be quite modular, because they divide the operating system functions between the kernel and a set of servers that run in user space. However, because many of the core functions of the operating system are performed by user-space components, which have to communicate with each other via the kernel, performance does suffer. Thus most kernels that are in use today are a hybrid of these two designs. I'm going to introduce Murphy's Law of Reality, sort of an extension of the Murphy's laws with which you may be familiar. My definition of Murphy's Law of Reality is simply that reality is the hazy space between the extremes of competing academic theories, in which everything is wrong in some way, at least according to the theories. This idea of a hybrid kernel architecture is a controversial one. Some people do not like to use this terminology at all; many people prefer to keep the binary classification of monolithic kernel and microkernel. However, if we look at modern kernels, typically the monolithic versions of modern kernels are broken into modules that can be loaded and unloaded at runtime.
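The performance trade-off between the two designs can be sketched conceptually: in a monolithic kernel a service is a direct in-kernel function call, while in a microkernel the same request becomes an IPC round trip to a user-space server. The message format and server here are invented for illustration.

```python
# Conceptual sketch: same result, different call path. The microkernel
# path marshals the request into a message and routes it through a
# user-space server, which is where the extra overhead comes from.

def fs_read(path):
    """Monolithic style: a direct function call inside kernel space."""
    return f"data from {path}"

def file_server_handle(message):
    """Microkernel style: a user-space file server handling an IPC message."""
    if message["op"] == "read":
        return {"data": f"data from {message['path']}"}

def microkernel_read(path):
    message = {"op": "read", "path": path}   # build the IPC message
    reply = file_server_handle(message)      # IPC round trip via the kernel
    return reply["data"]

same = fs_read("/etc/motd") == microkernel_read("/etc/motd")
```

Both paths produce the same answer; the difference is purely in how many boundaries the request crosses, which is the performance argument made in the lecture.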
This helps to increase maintainability of the kernel. And true microkernels today would have unacceptable performance; thus microkernel-based systems typically have some of the features of monolithic kernels, such as more device drivers and other code that runs inside the kernel's memory space. Some examples of different types of kernels: for monolithic kernels, in addition to the System V Unix kernel, which is a descendant of the original Unix kernel, we have the Linux kernel, BSD, MS-DOS, and the Windows 9x kernels. Windows NT, XP, Vista, and 7, if you don't prefer to use the hybrid terminology, would also qualify as monolithic kernels, and the Mac OS X kernel falls into the same category. In terms of microkernels, the RC 4000 Monitor kernel would have been the earliest; however, there have been plenty of other examples, including Mach, L4, the MIT exokernel project, and, the idea at least, behind the Windows NT kernel, which was based upon a microkernel design. The same is true of the Mac OS X kernel, since that was originally based on the Mach microkernel; however, those have been heavily modified and now have many properties of monolithic kernels also. So in summary, the kernel is the minimum layer of software inside the operating system that provides the basic foundations for abstracting away details of the hardware and arbitrating between multiple applications. When the absolute bare minimum implementations are used, we call the result a microkernel. Monolithic kernels, on the other hand, have all their major OS components contained within them, running everything inside kernel space to improve performance. Two early influential kernels were the RC 4000 Monitor, an example of a microkernel, and the original Unix kernel, which was an example of a monolithic kernel. In practice, however, most modern operating system kernels are hybrids of the two designs and have features of both kernel types. In this lecture I'm going to introduce the Unified Modeling Language, UML. I'll be discussing software modeling,
introduce UML, talk about some of its limitations, discuss a few of its diagram types, discuss the role of UML in software engineering, and mention a few CASE tools. In software systems, especially complex software systems, it's useful to be able to present a design to other developers for discussion during the development process. Complex software systems need to be designed before they can be constructed or modified anyway, and it's helpful to keep design documentation on hand so that the system can be modified more easily later. In order to communicate with other developers and other stakeholders in the system, it's helpful to have some analog to a contractor's blueprint. Consider, for example, in the real world, building a large building such as a hospital. First, an architect will come up with a high-level design for the project. Engineers will design the individual components and ensure that the hospital will meet all codes and remain standing and operational, and then, during construction, the designs get updated slightly to account for changes and any problems that are encountered during the construction phase. After construction, the blueprints are retained in case an addition is needed later, or in case modifications are needed to the building. Unified Modeling Language provides an option for software engineers to have similar design documentation for software projects. In the early days of software engineering, diagrams were drawn using a wide variety of different formats, with different symbols, different connectors, different annotations, and so forth, and developers moving between companies, or trying to collaborate across company or even divisional boundaries within a company, had to be trained on each specific format that was being used for the communication. The idea behind Unified Modeling Language was to provide a single standardized format and a standardized set of symbols in order to create a common language, and the group of folks that did this were Grady Booch, Ivar
Jacobson, and Jim Rumbaugh at Rational Software Corporation, which was later purchased by IBM. UML was standardized by the Object Management Group, but the standards are not always followed by UML tools, and, in particular with the latest version of UML, there is no set of tests in order to verify tool compliance with the standards. However, UML has become popular for drawing certain types of diagrams, either manually by hand or using electronic tools. Individuals drawing UML diagrams by hand typically do not remain strictly adherent to the standards; there can be slight variations in the diagrams as drawn. The UML provides diagram types to model system structure, behavior, and interaction. There are some limitations to UML. First and foremost, UML does not replace other types of design documents. A good example of this is use cases: UML use case diagrams can provide useful indexes into use cases; however, they do not replace the use cases themselves. Many of the diagram types that UML provides are not actually used widely, simply because they are specialty diagrams that many developers don't choose to utilize. UML diagramming tools often use incompatible formats, making it difficult to interchange diagrams between one tool and another. And standardization of UML was a controversial, and in some circles remains a controversial, topic today. Other software modeling approaches do exist but were not included in UML, because they were not considered by the UML authors. Nevertheless, UML does provide a rich variety of diagram types. These can be classified into two to three categories, depending on how you prefer to draw the classification. We have structural diagrams, in particular class diagrams, that are widely used to show how object-oriented systems are designed. We have behavior diagrams, in particular activity diagrams, which can show how a development process can be coordinated, and state machine diagrams, which are useful for representing finite state machines that exist in many types of
software. There are also interaction diagrams, which can be thought of as a subset of behavior diagrams, and these can show how different software components interact with each other. UML, as a tool in software engineering, is simply one of a number of tools that can be used to plan, architect, and design software applications. Diagrams using UML are convenient for communication between different members of a development team, and they help communicate the architecture and design of an implemented software system to future developers who might need to maintain that system. There are some tools available, called CASE tools, or computer-aided software engineering tools, that can take UML diagrams a bit further and actually generate partial program code from the UML diagram. So if you have a UML diagram laying out a few classes, the CASE tools can often fill in templates of the classes, and programmers would only need to fill in the code for each individual method in the class. Other tools exist that can actually take existing code and extract a UML diagram from that code, showing how the code is structured. So in summary, UML provides us a common set of communication tools for presenting and discussing aspects of the architecture and design of software systems. UML is one of many tools used by software engineers, and software applications exist for drawing UML diagrams and potentially generating code from those diagrams; however, these tools are often not interoperable with each other. In this lecture I'm going to discuss UML activity diagrams. UML activity diagrams are used for modeling processes. These processes can include different operations within software; they can include the software development process itself; they can also include business processes and other types of process. In this example graphic, I have a UML activity diagram illustrating the development process for a simple client-server project, in which the client and the server are implemented by different members of the development team. The
basic parts of the UML activity diagram include the initial state, the final state, and a set of transitions, other states, and controls in between the initial state and the final state. The initial state is represented by a filled circle and represents the point at which we enter the process. The final state is displayed as a filled circle within another circle, and this is the state in which we exit the process. In between these two states, we have action states and activity states. Action states represent single operations in the process being modeled, and these operations are not broken down into another activity diagram. Activity states represent longer-running processes that occur as steps in our outer process, and these inner processes have their own activity diagrams to explain how they function. Within an activity diagram, we can make a decision to perform one step or another, or even repeat a prior step, based upon some condition. In a UML activity diagram, this decision is called a sequential branch, and it's represented by a small diamond borrowed from flowcharting symbols. A decision can be taken at the point of the diamond, and then transitions leading from the diamond will indicate the next state. Conditions associated with those transitions are included in square brackets, as annotations next to the transition arrow. The special annotation else indicates the path to take if no other condition is satisfied. In modeling processes, it's useful to show where certain steps can be carried out by different people, or different components of the system, in parallel. We can represent this in a UML activity diagram by means of something called a concurrent fork. A concurrent fork is simply a thick line to which a transition connects and from which several transitions originate. From the originating transitions, we have states that are reached in parallel, presumably by different entities within the system. Thus the concurrent fork allows our process to be split into multiple concurrent paths, things that
can be done at the same time, and then we can get back together later. The way we come back together from a concurrent fork is with the opposite, a concurrent join. This is the same kind of heavy line; however, transition arrows come into the line, and a single transition comes out of the line to the next state in the sequential process. So we have multiple concurrent parallel paths meeting to form a single sequential path. It's useful with concurrent forks and concurrent joins to have something called activity partitions, or swim lanes. These are ways of annotating the diagram to show which component, individual, or entity is responsible for each parallel path in the project. In this example, we can see that the entire team performs research and then performs a design step. Then we have a concurrent fork, after which Bob implements the client and Jill implements the server. After those implementations are done, we have a concurrent join and come back to the team. Separation of responsibility is indicated by drawing a box, or activity partition, around those parts of the path that are the responsibility of one single component, one single team member, or one single entity. I'll take just a moment to talk about the types of tools we can use to draw UML activity diagrams. A simple way is to draw the diagrams by hand on paper; if we need a digital copy, we can simply use a camera, such as the type of camera in your cell phone, to make a digital image. Some more professional-looking results can be obtained by using a drawing program. I've used the Google Docs drawing tool to make the examples in these lecture slides; however, you could use OpenOffice Draw, Visio, or another type of drawing program in order to make these diagrams. There are also some open-source tools that have UML-specific additions that can be useful for making UML diagrams. These include Dia; certain plugins to the Eclipse framework, however these are complicated to set up and sometimes difficult to use; and Umbrello, which is a general-purpose computer-aided software engineering tool.
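The concurrent fork and join in the example diagram map naturally onto threads in code. This is an illustrative sketch of the diagram's flow, with the work reduced to log entries; the names come from the lecture's example.

```python
# Fork/join in code: after the shared research-and-design step, Bob and
# Jill work in parallel (the concurrent fork), and a pair of join() calls
# forms the concurrent join before integration and testing.

import threading

log = []
lock = threading.Lock()

def implement(component, person):
    with lock:                             # keep the shared log consistent
        log.append(f"{person} implemented the {component}")

log.append("team performed research and design")

bob = threading.Thread(target=implement, args=("client", "Bob"))
jill = threading.Thread(target=implement, args=("server", "Jill"))
bob.start(); jill.start()                  # concurrent fork
bob.join(); jill.join()                    # concurrent join

log.append("team integrated and tested")
```

Note that the order of the two middle log entries is not guaranteed, which mirrors the diagram: within a fork, the parallel paths have no ordering relative to each other.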
So in summary, UML activity diagrams allow us to model a process. This process can fall into any one of several domains: this could be a process inside software, this could be a business process, this could even be the software development process itself, how the individual team members are going to come together in the project. In this example, I am modeling a software development process, a very simple one. You can see the initial state, the filled circle at the top, at which the development process begins. This process begins with the entire team performing research. Research is an activity state, so it would be described by a separate activity diagram, which I have not shown. After the research phase, we would move to the design phase, in which the entire team would be working on the design for the software. After the design state, which is an action state, so it's not further illustrated anywhere else, we have a concurrent fork, at which point Bob goes and implements the client part of the project, and Jill goes and implements the server part of the project. When both Bob and Jill are finished with their implementations, we have a concurrent join, at which point the entire team reassembles to integrate the project and test the project. If all the tests pass, we move on to the ship state; however, if the tests are not passing, if we take the else branch of that sequential branch, then we go back to the test step. Finally, once we've moved to the ship state, we can move to the final state of the diagram, which indicates that this process is complete. In this lecture I'm going to discuss interrupts and device input/output. When hardware devices on a computer produce events, we need some way of being able to handle those events within the operating system and deliver them to applications. Hardware devices are going to produce events at times and in patterns that we don't know about in advance. For example, we don't know which keys the user is going to press on
the keyboard until the user actually presses those keys. If a cat walks across the keyboard, we're going to see a completely different pattern of key presses from what we would expect to see with a human user. Similarly, if we have incoming network packets, if we're running a server application, or even just a workstation, and we have messages coming in from the network, we don't know the order and timing of those messages. We also don't know when the mouse is going to be moved, or when any other of a whole bunch of hardware events is going to occur. So how can we get the information generated by these events and make it available to our applications for use? Well, we have two options. The first option is that we can poll each device: we can ask each device if it has any new information and retrieve that information. Or we can let the devices send a signal whenever they have information, and have the operating system stop whatever it's doing and pick up that information; this is called an interrupt. The polling model of input involves the OS periodically polling each device for information. So, every so often, the CPU is going to send a message to each hardware device in the system and say, hey, do you have any data for me? And most of the time, the device is going to send back, no, I don't really have any data for you. Other times, the device is going to send back some data. It's a really simple design, extremely simple to implement, but there are a number of problems with polling. The first problem is that most of the time when you're polling, the devices are not going to have any input data to deliver; thus polling is going to waste a whole lot of CPU time. The second issue that occurs is high latency. If I press a key on the keyboard, that keystroke is not going to get transmitted to the computer until the next time the CPU polls the keyboard to ask which keys have been pressed. If that time is set to be really short, I'll have good responsiveness, but the CPU is not going to get any useful work done.
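The wastefulness of polling can be made concrete with a toy loop. The devices and their input pattern here are invented; the point is the ratio of empty polls to useful ones.

```python
# A toy polling loop: every tick, the OS asks every device for data.
# Almost every poll comes back empty, which is the wasted CPU time the
# lecture describes.

def poll_devices(devices, ticks):
    wasted, received = 0, []
    for tick in range(ticks):
        for name, has_data in devices.items():
            if has_data(tick):
                received.append((tick, name))   # rare: real input arrived
            else:
                wasted += 1                     # common: nothing to report
    return wasted, received

devices = {
    "keyboard": lambda t: t == 7,      # a single keypress, at tick 7
    "mouse":    lambda t: False,       # the mouse never moves
}
wasted, received = poll_devices(devices, ticks=10)
```

Out of 20 polls here, only one returns data; shrinking the tick length to cut latency only multiplies the empty polls.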
On the other hand, if we set that time length to be long enough for the CPU to get some work done, there's going to be a noticeable lag between the time I press a key and the time a character appears on the screen. Since the device must wait for a polling interval before it can transmit input, we're going to have a high-latency situation, and again, shortening that polling interval to try to reduce the latency simply wastes a whole lot of CPU time checking devices that have no input. So a better mechanism is to use a system called interrupts. With interrupts, the hardware devices actually signal the operating system whenever events occur, or, more precisely, they signal the CPU, and then it's up to the operating system to receive and handle that signal. What the operating system will do is it will preempt any running process; in other words, it will switch what we call context away from that running process to handle the event. Basically, it will move the program counter of the CPU to the code to handle that particular interrupt. This allows for a more responsive system than we could ever achieve through polling, without having to waste a whole bunch of time asking idle devices for data. However, this does require a more complex implementation, and that implementation complexity begins at the hardware level, specifically within the CPU. We need to have a mechanism for checking and responding to interrupts, and this mechanism is implemented as part of the CPU's fetch-execute cycle. In the process of fetch-execute, the CPU is going to fetch an instruction from memory, increment the program counter, and execute that instruction. But instead of simply going back to the next fetch, the CPU actually has to have additional hardware to check to see if an interrupt event is pending. If there is an interrupt pending, then the CPU has to be switched to kernel mode if it's running in user mode, so the privilege level needs to be escalated; save the program counter by pushing it onto the stack; and load a new program counter from a fixed memory location.
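The fetch-execute cycle with an interrupt check can be sketched as a simulation. This is a conceptual model, not real CPU behavior; the instruction names and handlers are invented, and the vector table is modeled as a plain list of functions.

```python
# Toy fetch-execute loop with an interrupt check. The interrupt vector
# table is a list of handler functions indexed by interrupt number
# (element 0 is the clock handler, by convention).

trace = []

ivt = [
    lambda: trace.append("clock handler ran"),      # vector 0: clock
    lambda: trace.append("keyboard handler ran"),   # vector 1: a device
]

def fetch_execute(program, pending_interrupts):
    pc = 0
    while pc < len(program):
        trace.append(f"executed {program[pc]}")     # fetch and execute
        pc += 1                                     # increment program counter
        if pending_interrupts:                      # extra check each cycle
            vector = pending_interrupts.pop(0)
            ivt[vector]()                           # branch via the IVT entry
            # handler returns; the loop resumes the program where it
            # left off, which plays the role of the return-from-interrupt

fetch_execute(["inst0", "inst1", "inst2"], pending_interrupts=[0])
```

The trace shows the key property: the clock handler runs between two program instructions, and the program then resumes exactly where it was preempted.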
That fixed memory location is called the interrupt vector table, or IVT. So we load the program counter from the IVT, and then the CPU goes and executes that new instruction when the fetch-execute cycle resumes. So the CPU actually moves from executing program code to executing code from the interrupt handler for the particular event. If no interrupt is pending at the end of an execute, then we simply go back to the next instruction fetch. The interrupt vector table consists of an array of addresses of handlers. Each element in this array essentially gives the program counter location for the handler for a particular interrupt. This handler is going to be in a subsystem of the kernel, for a monolithic kernel, or this handler might invoke a call to an external server, for a microkernel. In any case, however, the first handler, by convention element 0 of the array, is always the handler for the clock; then handlers for different devices are in the array after the clock handler. So the interrupt vector table is always mapped into the kernel part of memory, and it's always available at all times, so that the kernel can go and look up interrupt information whenever necessary. An interrupt is processed by branching the program counter to the interrupt handler, executing the interrupt handling code, and then, at the end of the interrupt handling code, there'll be an instruction to return from the interrupt. In the Intel assembly language, this is the IRET instruction, which pops the stack to get the original program counter back, loads it into the program counter, and changes the CPU back to user mode, so it removes the privilege escalation. The interrupt handling mechanism is thus able to handle events from hardware devices without having to poll each device individually to get data. In this lecture I'll be discussing interrupt controllers. In particular, I'll introduce the old and new mechanisms for delivering interrupts from hardware
devices to the CPU. These methods include the original programmable interrupt controllers and the newer advanced programmable interrupt controller with message signaled interrupts. Interrupt controllers provide an interface for hardware to signal the CPU whenever a device needs attention. It's important to note that this signal only includes a message that essentially says, hey, I'm a device, I need attention. Historically, the CPU then actually does have to go and poll the device to get any data that the device may have. The older mechanism for performing this operation was called a programmable interrupt controller, or PIC, and it actually required dedicated lines to be added to the motherboard. The ISA, or Industry Standard Architecture, bus, which dates back all the way to the first PC in 1981, and older versions of the PCI, or Peripheral Component Interconnect, bus utilized this mechanism. The newer mechanism, the advanced programmable interrupt controller, is used on PCI Express devices and some newer PCI devices. Now, the old controller, the programmable interrupt controller, actually consisted of two programmable interrupt controller chips that were attached to each other, with one of the chips being attached to the CPU. The so-called master chip was the one attached to the CPU, and pin 2 of that master chip was attached to a slave chip. The eight pins on each of the two chips allow for 16 interrupt numbers to be created: interrupts 0 through 7 correspond to the pins of the master chip, and interrupts 8 through 15 correspond to the pins of the slave chip. Now, it should be noted that, since pin 2 of the master chip handles the slave chip, the master programmable interrupt controller only supports an effective seven interrupts. So there are only 15 usable interrupt hardware lines for devices, and these are numbered 0 through 15, but we have to skip the number two. Now, historically, pin number zero, which corresponds in software terms to what we call interrupt request line 0, or IRQ0, was
connected to the timer. Interrupt request line 1 was connected to the keyboard. Different ISA and PCI devices could then use the remainder of the master chip by connecting to IRQ lines three through seven. On the slave chip, pin 0, which corresponds to IRQ8, was connected to the real-time clock. Pin 4, corresponding to IRQ12, was connected to a PS/2 mouse. Pin 5, or IRQ13, connected to the math coprocessor, which was a separate component from the main CPU in earlier PCs. And then pins 6 and 7, corresponding to IRQ lines 14 and 15, connected to the IDE controllers; these were used for disk and eventually for optical devices. This left pins one through three on the slave controller, or IRQs 9 through 11, available for hardware devices. Now, these interrupt lines on the motherboard were actually circuit traces: conductive paths etched into the motherboard that allowed interrupts to be received from devices. There were 15 lines available of the 16 that could be used by devices, with lines 0 and 1 reserved for the timer and the PS/2 keyboard respectively (actually, even before the PS/2 reservation, the original AT keyboard). ISA and PCI add-in devices actually had to share interrupt request lines, and this sharing could lead to hardware conflicts that could lock up the system. It was thus up to the system owner to manage the sharing, by setting little jumpers on the add-in cards so that the cards were using different IRQ lines. There were also performance issues when IRQ lines were shared, because the operating system actually had to poll each device sharing an IRQ to determine which device it was that raised the interrupt. Polling was still necessary in order to receive any kind of data from the device, regardless of whether it was sharing an interrupt line or not. On modern systems, a completely different interrupt mechanism is used, and this mechanism has a set of memory registers on what's called an advanced programmable interrupt controller, or APIC.
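The legacy PIC wiring just described can be summarized as a lookup table. This is only a study aid reflecting the assignments listed in the lecture, not an exhaustive or authoritative IRQ map.

```python
# The legacy PIC IRQ assignments from the lecture, as a lookup table.
# IRQ 2 is absent because that line cascades to the slave chip, leaving
# 15 of the 16 numbers usable by devices.

LEGACY_IRQS = {
    0: "system timer",
    1: "keyboard",
    8: "real-time clock",
    12: "PS/2 mouse",
    13: "math coprocessor",
    14: "primary IDE controller",
    15: "secondary IDE controller",
}

def irq_description(line):
    if line == 2:
        return "cascade to slave PIC (unusable by devices)"
    return LEGACY_IRQS.get(line, "available for add-in devices")

usable = [n for n in range(16) if n != 2]   # 15 usable lines, skipping 2
```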
shared bus that each device on the system can use to raise an interrupt message by writing that message into one of the memory registers. These are called message signaled interrupts, using the MSI and MSI-X specifications. Essentially, each device (here I have a timer, RTC, USB host controller, and SATA controller) is attached to the bus and indicates its interest in raising an interrupt to the APIC by sending a message over that bus. Now, this message does not contain any data; it's only a request for attention. If the CPU has to be involved in the operation of sending or receiving information, then the CPU actually has to contact the device, in other words, poll it directly. There is a way around this, called direct memory access, or DMA, transfers, which are used extensively on PCI Express devices. The register on the APIC stores the request for attention until such time as the operating system handles the interrupt request, and then that message is cleared from the APIC. This is the only interrupt mechanism that's available on PCI Express buses; there are no hardware interrupt lines. However, a number of motherboards still have physical interrupt lines and physical PIC pins so that they can support legacy devices; there are a number of specialty legacy devices still in use that need to be supported. Message signaled interrupts do solve a number of problems with interrupt request sharing. The original specification allows each device to use any one of 32 IRQ lines; the MSI-X specification allows each device to use up to 2048 virtual lines, virtual interrupt request numbers essentially. This allows for less contention and reduces the need to share interrupt request numbers among devices, which thus reduces the amount of time necessary for the CPU to determine which device wanted attention. So the main thing to take away from this is that the interrupt controller and the interrupt request mechanism only allow the device to raise a signal that says it wants attention. It's up to the CPU, or on
certain buses, up to the device and the memory controller, to get the information out of that device and into memory. In this lecture I will introduce use cases. Use cases are a type of requirements document that provides narrative descriptions of scenarios that will be handled by the system. I will be discussing the elements that make up a use case. These elements include the trigger, the actors in the use case, preconditions, the steps in the process described by the use case, guarantees made to the stakeholders, and any quality requirements applicable to the use case. Use cases are one type of requirements document. They provide a written narrative that describes how the system under discussion, or SUD, will behave in given scenarios. After the system is implemented, the SUD can be tested by following the use cases and checking that the behavior of the system matches the requirements for the given scenario. It is important to remember that a use case is only one type of requirements document; other types of requirements documents, such as the system conceptualization or user interface specification, do exist. The primary function of a use case is to describe how the SUD responds to interaction from a stakeholder in the system. In other words, a use case specifies what the SUD is to do in different scenarios. For example, a use case for a database system might specify what steps are to be taken whenever the database administrator asks to update a record in a table. On the other hand, a use case might specify that certain information needs to be updated whenever a customer makes a new order from a company. These two scenarios illustrate the different levels at which use cases can be written. At the high level, we have business use cases that describe the business process that must be carried out, without specifying any details of the system. These use cases are also sometimes called essential use cases. Lower-level use cases, or system use cases, describe how the SUD will behave in specific situations. These use
cases incorporate details of the technology being used. Use cases may have either black-box or white-box scope. All business use cases and many system use cases use black-box scope. This means they treat the system as a mystery device: they specify how the system will behave under given conditions, but they leave out all the details of how the behaviors will be implemented. Some system use cases, on the other hand, may be written from a white-box perspective. These use cases detail some of the internal operations of the SUD, such as specific protocols or algorithms that will be used. White-box use cases are especially useful for describing how different components of the SUD will interoperate with each other. When writing use case narratives, it is customary to include a number of standard pieces of information. The specific entries and the order of the entries will vary by organization; in most cases, you will write use cases by following a template. A sample use case template might contain items such as the trigger for the use case, a list of actors involved in the use case, and a list of preconditions that must be true before the use case steps can begin. The use case then specifies the system behavior as a series of steps illustrating how the primary actor interacts with the SUD. At the end of the scenario described by the use case, certain guarantees are made to the stakeholders about the state of the system. The use case might also specify quality requirements, such as the maximum amount of time allowed to complete the scenario. In general, a use case will have a triggering condition. This trigger corresponds to the user wishing to perform some action using the SUD; alternatively, this trigger could be the result of an event that occurs somewhere in the system or outside the system. Preconditions are statements that are assumed to be true when executing the use case. These conditions are not checked anywhere in the use case. Use cases guarantee that certain conditions will be true at the
end of the process. These guarantees are promises made to the stakeholders as to how the system will behave. Ideally, the process will be successful and the SUD will deliver all guarantees presented by the use case. However, if an error or unexpected condition occurs somewhere during the process, some minimal guarantees about the state of the system should be given. Actors in a use case are external persons, systems, or other entities that interact with the SUD in some way. These actors include the user or users of the system, any entities upon which the system depends in order to carry out a business function, and any other systems with which the SUD interfaces. The actor that performs the actions in the use case is called the primary actor. In a general sense, the primary actor is the user of the system that will be created; thus, by definition, the primary actor is also a stakeholder in the system. Supporting actors have roles in carrying out some operation of the system, but they do not utilize the system directly. In a business-level use case, the supporting actors have important roles in the business process, but they do not carry out the process directly. Now, use case documents are narrative texts: they tell a story about how the system is going to act under given scenarios. In order to communicate the story effectively, there are certain rules of thumb that should be followed when writing use cases. First, use simple language with active voice and simple sentence constructs. For example, say "the user sits down at the keyboard" instead of "the keyboard is presented to the user." Second, do not include user interface elements in the use case. Say "the user enters data into the system" instead of "the user types characters into a text box in the middle of the screen." Leave the user interface design to separate specifications or prototypes. Finally, every use case should end with at least one guarantee about the state of the system. If the process is successful, how does the system react? If the process fails, what
is the condition of the system after the failure? Remember that computer software can and does periodically fail. Also remember that human users are guaranteed to fail, and that the system should be made as robust, or as idiot-proof, as possible. Now, you may be aware that the Unified Modeling Language provides a graphical use case diagram, and you may also believe that these diagrams will save time compared to writing use case narratives. Unfortunately, graphical use case diagrams do not provide much information about the process being described. They illustrate a stick figure, representing a primary actor, executing a use case, represented by a box containing the use case title. However, the graphical diagrams do not provide any detail about the process. As such, the diagrams are useful for providing indexes into the use case narratives, but they do not add any real value to the use cases themselves; a simple textual index with numbered use cases would accomplish the same purpose with less effort. Agile methods, particularly Extreme Programming, have extremely short iteration times that are not conducive to formal use cases. One lightweight alternative to use cases is the user story, which is a short and simple way of expressing a particular requirement. User stories are written for each scenario, similar to the approach used in use cases. However, instead of a formal document, the user story simply expresses what a primary actor would like to achieve in the system. For example, a user story for a database administrator might read: as a database administrator, I want to be able to perform an operation that updates the name, address, and age fields in a single transaction. User stories typically omit some design details, but these details can be discovered later as the software is developed. Also, these stories can be either black-box or white-box in perspective, depending upon the needs of the development team. In summary, use cases describe a scenario in which a primary actor interacts with the system under
discussion. Each use case is named according to the process it describes. Use cases generally have triggers indicating the conditions under which the process is applicable. Use cases may specify preconditions, which are assumed to be true prior to execution of the use case. Use cases provide an outline of the steps required in the process, along with a set of guarantees about the state of the SUD after the scenario is finished. Quality requirements, such as execution time limits, may also be specified. Agile development methods, especially Extreme Programming, disfavor use cases due to the time involved in creating the documentation; these short-iteration development techniques use user stories and other lightweight methods instead. In this lecture I'm going to discuss interrupt handling. I'm going to begin by discussing interrupt handling at the hardware level, then move on to features provided by the CPU, and finally features of the operating system for handling interrupts. At the hardware level, devices are connected, either via traces on the motherboard or via a shared messaging bus, to the advanced programmable interrupt controller. The CPU checks for hardware interrupt signals from this controller after each user-mode instruction is processed. So after each instruction is executed while running some particular program on the system, the CPU actually checks to see if there are any interrupts that need to be processed. If an interrupt signal is present, then a kernel routine is called by the CPU in order to handle this interrupt. The interrupt dispatch routine, if it's not implemented directly in hardware, is a compact and fast routine implemented in the kernel, often coded in assembly language; it has to be that fast. The specific interrupt handler is always a kernel routine, or, in the case of a microkernel, an external server routine, and the specific handler depends upon the type of interrupt received. These are typically coded in C. So once again, the fetch-execute cycle:
we check for an interrupt pending after each instruction is executed. If there is an interrupt pending, we escalate privilege to kernel mode, push the program counter onto the stack (in other words, save it so we can resume from that point in whatever program we're interrupting), and then go and handle the interrupt. We do this by loading the program counter, the new program counter that is, from a fixed memory location provided to us by the interrupt vector table. The interrupt vector table gives us the addresses of all the different interrupt handlers. We need a separate handler for each type of interrupt: the clock requires different logic from the keyboard, for example, and other devices, such as, say, a webcam attached to your computer, need different logic in order to process messages from them. So we have different interrupt handlers for each of these different devices. The table simply stores the addresses of each handler, and, in our monolithic kernel case, the handlers are actually part of the kernel and are mapped into kernel memory space. The interrupt vector table consists of a list of each of these addresses for the kernel handlers, and conceptually it is mapped into both kernel memory and user memory so that it can be accessed quickly. Historically, these tables started at address 0.
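The lookup the CPU performs can be sketched in Python as a list of handler functions indexed by interrupt number. This is only a conceptual miniature: the handler names and their return values are invented for illustration, and a real vector table holds machine addresses, not callables.

```python
# Conceptual sketch of an interrupt vector table: a list indexed by
# interrupt number, where each entry plays the role of the handler
# address for that interrupt type.

def timer_handler():
    return "tick"            # e.g., update the system clock

def keyboard_handler():
    return "scancode read"   # e.g., read a scancode from the device

def default_handler():
    return "unhandled"       # no device-specific handler registered

# Build a 16-entry vector table; IRQ 0 is the timer, IRQ 1 the keyboard,
# mirroring the historical PIC assignments from earlier in the lecture.
IVT = [default_handler] * 16
IVT[0] = timer_handler
IVT[1] = keyboard_handler

def dispatch(irq):
    """Mimic the CPU: look up the handler by IRQ number and jump to it."""
    return IVT[irq]()
```

A call such as `dispatch(0)` simply indexes the table and invokes the timer handler, which is exactly why dispatch can be so fast: it is one table lookup and one jump.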
However, the mappings are different depending on the architecture. On Intel-based systems we have something called the interrupt descriptor table, and the IDT comes with special instructions and data structures that are actually managed by the CPU itself, so that interrupt handling can be as fast as possible and protection rings can be changed automatically. The IDT is simply a reserved block of RAM used by the CPU to jump quickly to a specific interrupt handler. This IDT is mapped into kernel space, which originally began at address zero, but this mapping is actually flexible with modern CPUs, and the table can be mapped into other parts of the memory space. The first 32 entries of the IDT are actually not used for interrupts per se; they're used for CPU fault handlers, and the interrupt vector table part of the data structure begins after the CPU fault handler table. When we actually go to handle interrupts, the handling occurs in the kernel, and this is done with two levels of interrupt handling: the fast interrupt handler and the slow interrupt handler. The fast interrupt handler is the piece of code that's invoked directly from the interrupt vector table whenever an interrupt occurs; this is the piece of code that the CPU is going to jump to when an interrupt occurs. Fast handlers execute in real time, and they're called fast interrupt handlers because they need to be fast: the execution of one of these handlers needs to be short. If any large-scale data transfer needs to occur, say we need to get a lot of data from the device all at once, this operation is handled by having the fast interrupt handler enqueue something called a task into the operating system's task queue. Whenever all the fast interrupt handlers for all the different interrupts are done executing, the CPU will go and check the task queue and execute any tasks that are present there. The part of interrupt handling that goes into the task queue is called slow interrupt handling, and it's called this
because it's not executed immediately and it can be interrupted by other devices. So what happens if an interrupt request is received while we're still processing an interrupt from a previous request? Well, there's no problem if we're in the slow interrupt handler, because this processing is done in such a way that we can stop it and handle the new interrupt if necessary. What happens if we are still running a fast interrupt handler? Well, the new interrupt handler could be executed before the first interrupt handler is finished, and this could cause some major problems, especially if we get a new interrupt from a device that's sharing the same interrupt line as the one we're handling. So what we do is make fast interrupt handling atomic; that is, we make it uninterruptible. On a single-core system, this is as simple as disabling interrupts as long as we're running a fast interrupt handler. On multi-core systems, there are special machine instructions to facilitate atomic operations that are pegged to one CPU core. These atomic interrupt handling operations will run to completion without interruption by any other interrupt or any other request on the system; thus, the longer an interrupt handler takes to run, the longer the system will be unresponsive to any new interrupts. So what happens if a fast interrupt handler is coded in such a way that it takes too long? Other devices might be requesting attention at the same time that this long-running interrupt handler is executing. Worse, the same device that generated the original interrupt might now have more data to deliver to the OS before all the previous data is completely received. This could cause hardware failures, buffer overflows, dropped data, dropped messages, all kinds of issues at the hardware level. Inside the operating system, however, this could lead to something called an interrupt storm, which is really bad. An interrupt storm occurs whenever another interrupt is always waiting to be processed whenever a fast
interrupt handler finishes its execution. That can occur because a fast interrupt handler is too long and needs to be split into a fast and a slow handler; it can also occur if hardware has certain bugs that cause it to raise spurious interrupts. If the operating system is perpetually handling interrupts, it never runs any application code; thus, it never appears to respond to any user inputs. The result of this situation is something called livelock: the system is still running, and it's still processing all these interrupts, but it's not doing any useful work. Thus, to the user, the system appears to be frozen. When an interrupt storm occurs and this livelock situation results, the typical way out of the problem involves judicious use of the power button. So, interrupt handling is an important concept needed to support multiprogramming systems, and interrupt handling, when these interrupt messages come through from hardware, is divided into two types of handlers so that we don't get the interrupt storm. The fast interrupt handler executes atomically, without being interrupted by anything else, but it must be fast: we must enter that interrupt handler, do some very short operations, and immediately exit that handler. If any long-running operations need to occur as a result of the device interrupt request, we need to handle those operations inside the slow interrupt handler. In this lecture I will discuss finite state machines and UML state diagrams. UML state diagrams provide a way to model the behavior of a system by analyzing how the state of the system changes in response to input data. In this diagram, we have a model of a simple system that captures the user entering a number consisting only of the digits 0 through 9 and at most one decimal point. The system finishes, and presumably returns the entered number, whenever the user presses the Enter key; invalid keystrokes are simply ignored. With a simple state machine such as the one presented here, we can verify that the user has entered
only numeric data. However, understanding how the state machine operates, and how UML provides us with a convenient way to draw the state machine, requires some explanation. First, we begin by observing that every software system has something that we call state. In simple terms, software state consists of the current values of all the variables in a program at any given moment in time. In the case of object-oriented software systems, the state of the system is stored within the objects in the system, and each object has its own state, which is stored in the object's variables or fields. The essence of running a computer program is changing the software from one state to another. A computer, at least as we know it today, is a machine capable of representing an extremely large but finite number of different states; computer programs simply move the computer system from one state to another state. A conceptual way to model a simple computer, or a single object in a software system, is to use a finite state machine. Finite state machines have a fixed number of states, and the system is always in one of these states when it is running. As computation progresses, the finite state machine transitions from one state to the next. Each state machine can receive input, and a state machine possibly produces output: during a state transition, upon entering a state, upon leaving a state, or even within a state. For simplicity, the conceptual model used for input and output data is a tape; the state machine can read characters from or write characters to one of these tapes. The set of characters that a machine knows how to read or write is called an alphabet. Words are formed from this alphabet, creating a language that the state machine recognizes. Nearly all state machines have an input tape. Technically speaking, a finite state machine with only an input tape and no output tape is called a finite state automaton. This type of machine simply accepts or rejects the input, validating whether or not it is a recognized
word in the language. A finite state transducer has two tapes: one for reading input and one for writing output. State machines can be either deterministic or non-deterministic. A deterministic finite state machine transitions from its present state to another state based solely upon the present state and the most recent input character read from the input tape. In contrast, a non-deterministic finite state machine can transition from one state to another on its own, without consuming any input data; these types of transitions are called lambda transitions or epsilon transitions in programming language theory. A state machine that produces output can be limited to producing output only upon entering a new state. This type of state machine is known as a Moore machine and is theoretically easier to model. If a state machine can produce output at any time within a state or during a state transition, it is called a Mealy machine. By now, you might be thinking that state machines sound rather limited in their capabilities, and this is a correct observation for the simple finite state machine. Both deterministic and non-deterministic finite state machines are equally powerful, but their power is limited to recognizing inputs that can be described using a regular expression. To obtain the power to process computer programming languages, which typically utilize what are known as context-free grammars, we must add a stack to our finite state machine to create a pushdown automaton. Alternatively, if we modify a finite state machine to be able to move back and forth along a tape used for both input and output, we create a linear bounded automaton that can handle context-sensitive languages as input. The linear bounded automaton is the closest theoretical state machine to the present-day computer. The final type of state machine is the Turing machine, which further modifies the linear bounded automaton to make the tape infinitely long, giving it infinite memory. Such a machine, if it existed, could
handle recursively enumerable languages. The Unified Modeling Language allows us to diagram deterministic finite state machines, which lets us model many of the operations that individual objects in a large-scale system perform. In the practice of writing software, we try to minimize non-determinism, since non-deterministic programs could produce different results each time they are run with the same inputs; therefore, we would like our enterprise applications to be deterministic. UML state diagrams are used to model the behavior of individual objects within the system under discussion. One diagram is generally used for each object being modeled, so a normal set of design documents would contain multiple state diagrams. State machines expressed by UML either can produce output at any time, or the designers can restrict the output to the entry condition for each state; thus, these state machines can use either the Mealy or the Moore model. I'm now going to illustrate the components of the UML state diagram. First, we have the initial state and the final state of the system. These states represent the entry and exit points for the diagram, respectively. The initial state, or diagram entry point, is represented by a filled circle; the final state, or diagram exit point, is represented by a filled circle within another circle. Internal states within the UML state diagram are given by a rounded rectangle divided into two sections. In the top section, a human-readable name for the state is generally provided; this name should reflect the purpose of the state. Below the name, the bottom section of the state rectangle contains the set of activities that the system performs while in the listed state. These activities are specified by providing a condition under which the activity is performed, followed by a slash, followed by the name of the action that is to be executed. Three special condition names are reserved by the UML specification to handle specific activities. These are entry, which specifies an action
to take upon entering the state; exit, which specifies an action to take upon exiting the state; and do, which specifies an action to take while in the state. Note that a do action may be a continuous, ongoing action that occurs until the system leaves the state. Transitions between states are drawn as arrows originating from one state and pointing to the next state. The system follows the arrows from state to state until it reaches the final state. A transition is taken only if the input to the machine matches a trigger condition specified next to the transition arrow. This trigger condition can be accompanied by an optional guard expression enclosed in square brackets, in which case the transition will be taken only if both the trigger matches and the guard expression is true. After the trigger and the optional guard expression is placed a slash, followed by any action to take during the transition; the slash is always present in the UML state diagram, even if there is no action given. UML does allow a state within any UML state diagram to contain another state diagram instead of a list of activities. These so-called nested state diagrams are permitted, but they will quickly make a UML state diagram difficult to draw and read. Instead of nesting state diagrams, I suggest using a special symbol, consisting of two empty states joined together, as a way of indicating that a state within the state diagram is itself modeled by another state diagram; then you can draw a second state diagram for the complex state. I will now summarize what we have learned by discussing the meaning of this simple UML state diagram. First, the state machine starts in the initial state. It transitions to the no-decimal state only upon receiving a digit in the range 0 through 9 as input. Upon arriving in the no-decimal state, the machine appends the received input digit to the output. The machine stays in this state until one of three types of input is observed: another digit, a period for a decimal point, or the Enter key press.
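The machine from this diagram can be sketched as a small Python function. This is a hedged illustration, not UML output: the state names follow the lecture, and the newline character stands in for the Enter key press.

```python
def read_number(keystrokes):
    """Deterministic FSM for the lecture's number-entry diagram.

    Accepts digits 0-9 and at most one decimal point, appending each
    accepted character to the output. A newline (standing in for the
    Enter key) moves to the final state; any keystroke that matches no
    trigger is simply ignored, leaving the current state unchanged.
    """
    state = "initial"
    output = ""
    for key in keystrokes:
        if key == "\n" and state in ("no_decimal", "have_decimal"):
            return output                 # Enter: move to the final state
        elif key in "0123456789":
            if state == "initial":
                state = "no_decimal"      # first digit enters no-decimal
            output += key                 # append digit on (re)entry
        elif key == "." and state == "no_decimal":
            state = "have_decimal"        # only one decimal point allowed
            output += "."                 # append it during the transition
        # anything else: no trigger matched, so the input is ignored
    return output
```

For example, the keystroke sequence `"1a2..5\n"` yields `"12.5"`: the letter and the second period match no trigger and are dropped, exactly as the diagram specifies.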
If another digit is received, the machine exits and then re-enters the no-decimal state; upon re-entry, the new digit is appended to the output. If the period key is pressed, the machine exits the no-decimal state and enters the have-decimal state, appending the decimal point to the output during the transition. From the have-decimal state, the system will exit and re-enter the have-decimal state whenever the digits 0 through 9 are received, appending the received digit to the output upon the transition. From either state, pressing the Enter key ends the machine by moving to the final state. Also, any invalid data to the machine is ignored in all states, since no trigger is matched. In short, this finite state machine implements logic that reads in and stores either an integer or a simple floating-point number. In this lecture I will introduce dynamic memory allocation at the system level. This method of memory allocation improves the degree of multiprogramming that a system can provide by allocating memory to processes as needed, instead of ahead of time in fixed-size chunks. Fixed memory partitioning is fast and efficient in terms of overhead, but it wastes space and limits the number of concurrent processes to the number of partitions that will fit into RAM. Dynamic memory allocation resolves this issue by allocating memory to processes as it is needed. This mechanism increases the degree of multiprogramming that the OS can support, but this improvement comes at the cost of greater complexity. In addition, there are trade-offs present in dynamic memory allocation that directly affect performance of the system. These issues include efficiency, methods for tracking free space, algorithms for determining where to make the next allocation, and memory fragmentation. The primary issue with dynamic memory allocation is fragmentation. During execution, processes allocate and free memory in chunks of varying sizes. Over time, regions of free space within memory become non-contiguous, with sections
of allocated memory in between sections of available memory. This fragmentation can lead to serious performance issues, since program execution speed will drop dramatically if every data structure must be implemented as a linked list of small pieces. Furthermore, algorithms that try to perform on-the-fly memory defragmentation are extremely complex to implement and would also severely impact system performance. Since it is impractical to avoid or fix fragmentation, memory fragments are a significant concern when space is dynamically allocated. Small fragments of memory are useless to programs, particularly C and C++ programs, which require structures and objects to be allocated in contiguous memory regions. These structures are likely to be of various sizes that are not efficient to utilize for memory management purposes, so systems will allocate a few more bytes than the structures actually require in order to improve the efficiency of the memory tracking systems. Furthermore, structures and objects will be allocated and freed numerous times during program execution, further fragmenting the RAM. Over time, fragmentation can waste large portions of system memory, limiting the degree of multiprogramming by making it impossible for new processes to start. There are two types of fragmentation: external and internal. External fragmentation occurs when free space in memory is broken into small pieces. Over time, as blocks of memory are allocated and deallocated, this type of fragmentation tends to become worse. External fragmentation is so bad with some allocation algorithms that for every n blocks of memory allocated to a process, the system wastes another n/2 blocks in fragments; with these algorithms, up to a third of system memory becomes unusable. The other type of fragmentation is called internal fragmentation. Internal fragmentation occurs because memory is normally allocated to processes in some fixed block size. The block size is normally a power of two in order to make
the kernel memory tracking structures more efficient. Processes tend to request pieces of memory in sizes that do not fit neatly into these blocks; thus, it is often the case that parts of a block are wasted as a result. For small memory requests, large portions of the block may be wasted. When a process requests heap memory, the kernel must find a chunk of free space large enough to accommodate the request. This chunk cannot be smaller than the requested size, since the process expects to use all the space it is requesting. Since memory is divided into blocks for efficiency, the chunk of memory returned to the process is normally larger than the amount requested, unless the process happens to request a whole-number multiple of the block size. I'm now going to introduce four classical algorithms for dynamic memory allocation: best fit, worst fit, first fit, and next fit. The first classic algorithm for dynamic memory allocation is the best fit algorithm. In this algorithm, the kernel searches for the smallest chunk of free space that is big enough to accommodate the memory request. Although best fit minimizes internal fragmentation by avoiding overallocation, external fragmentation is a major problem. An attempt to reduce the external fragmentation in the best fit algorithm is observed in the somewhat counterintuitive worst fit algorithm. Using the worst fit algorithm, the kernel finds and allocates the largest available chunk of free space, provided it is large enough to accommodate the request. In theory, this allocation strategy leaves larger, and thus potentially more usable, chunks of free space available. In practice, however, this algorithm still fragments badly, both internally and externally. Aside from the fragmentation, both of these algorithms are impractical in actual kernel implementations, because they must perform a search of the entire free list to find the smallest or largest chunk of memory. The need to search the free list is eliminated using the first fit or next fit algorithms. In
the first fit algorithm the kernel simply finds and allocates the first chunk of memory that is large enough to satisfy the requests this approach does result in internal fragmentation and it also tends to create small fragments free space that accumulate at the start of the free list reducing performance over time the next fit algorithm avoids the external fragment accumulation by starting the search for the next chunk of memory from the most recent allocation in practice only the first fit algorithm is used in the Linux kernel and then only for embedded devices this algorithm is called the slob allocator which stands for simple list of blocks for non-embedded systems that have the computational power for a more complex algorithm slab allocation is used instead of any of these simple algorithms in this lecture I continued the discussion of dynamic memory allocation I will introduce the power of two methods with the buddy system and coalescence for allocating memory to processes then I will introduce slab allocation which is used within the kernel to allocate kernel data structures efficiently in the previous lecture I introduced the classic algorithms for memory allocation best fit worst fit first fit and next fit now I want to introduce algorithms that are actually used within OS kernels to perform memory allocations efficiently these algorithms are called power of two methods and they work by maintaining information about allocated and free blocks in a binary tree instead of a list at the top level memory is divided into large blocks called super blocks as processes request memory these super blocks are divided into smaller sub blocks from which the memory is allocated sub blocks can be further divided creating a hierarchy of block sizes algorithms based on this method are relatively fast and scaled to multiple parallel CPU cores these algorithms also reduce external fragmentation by coalescing free blocks as I will discuss in a few moments some internal 
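Before moving on to the details of the power-of-two methods, the classical algorithms from the previous lecture can be sketched in a few lines of Python. This is a toy model of my own devising: the free list is just a list of (start, size) tuples, and the functions only choose a chunk rather than actually splitting it.

```python
# Toy model of the classical allocation strategies over a free list.
# Each free chunk is a (start, size) tuple; we return the chosen chunk.

def best_fit(free_list, request):
    """Smallest chunk that is large enough (minimizes leftover space)."""
    candidates = [c for c in free_list if c[1] >= request]
    return min(candidates, key=lambda c: c[1]) if candidates else None

def worst_fit(free_list, request):
    """Largest chunk, provided it is large enough."""
    candidates = [c for c in free_list if c[1] >= request]
    return max(candidates, key=lambda c: c[1]) if candidates else None

def first_fit(free_list, request):
    """First chunk in list order that is large enough (stops early)."""
    for c in free_list:
        if c[1] >= request:
            return c
    return None

free = [(0, 100), (200, 30), (300, 500), (900, 60)]
print(best_fit(free, 50))    # (900, 60): smallest chunk that fits
print(worst_fit(free, 50))   # (300, 500): largest chunk
print(first_fit(free, 50))   # (0, 100): first chunk that fits
```

Note that even in this sketch, best fit and worst fit must examine the whole free list, which is exactly why they are impractical in real kernels; next fit would simply remember the index of the last allocation and resume the first-fit scan from there.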
In this diagram, we can see three superblocks, two of which are partially in use. Each of the first two superblocks has been divided into three sub-blocks, and the third sub-block of the first superblock has been further divided. When a process requests memory, the kernel must perform a search of the tree to find an appropriately sized block to handle the request. In performing the search, the kernel might subdivide an existing block into a smaller block to reduce the amount of internal fragmentation. Since this is a search process, a reasonably powerful CPU is required to make this operation efficient. Another improvement to memory management is to utilize a buddy system in the power-of-two strategy. In this allocation system, two limits are chosen to be powers of two: the upper limit U and the lower limit L. The superblocks are the blocks of size U, and these blocks can be subdivided into blocks as small as L bytes. There are trade-offs in picking the size to use for L. A smaller size produces less internal fragmentation, since the block size more closely matches the smallest request sizes from the processes; however, a smaller size for L means that there are more total blocks to be tracked, which increases the size of the binary tree, using more RAM to store the tree and increasing the search time. On the other hand, a larger size for L reduces the search time and makes the tree smaller, but the amount of internal fragmentation increases. In addition to the block size limits, the buddy system also uses a technique called coalescence. Whenever a process frees a block, the kernel checks to see if either neighboring block is also free. If one or more neighbors, or buddy blocks, are free, the block is coalesced into a larger block, reducing external fragmentation. The coalescence algorithm is efficient, since the maximum number of coalescence operations that must be performed is equal to the base-2 logarithm of U divided by L; by properties of logarithms, this value is equivalent to the base-2 log of U minus the base-2 log of L. Thus, for example, a system with a maximum block size U of 4096 bytes (2^12) and a minimum block size L of 512 bytes (2^9) will require at most three coalescence operations to recreate the superblock: returning a single 512-byte block whose neighboring 512-byte buddy is free will cause the two blocks to be coalesced into a single 1024-byte block; if the neighboring 1024-byte block is free, the 1024-byte blocks will be coalesced into a 2048-byte block; then, if a buddy 2048-byte block is free, the third coalescence will produce the 4096-byte superblock. The power-of-two methods are useful for allocating memory to processes, where some internal fragmentation is acceptable. However, within the kernel, it is preferable to minimize both internal and external fragmentation to avoid wasting space; this conservative approach is needed since the kernel is always mapped into main memory. An efficient solution for allocating kernel memory is to use a slab allocation algorithm, in which kernel memory is arranged into fixed-size slabs. Each slab is divided into regions sized for specific types of kernel objects, including file descriptors, semaphores, process control structures, and other internal data structures. The initial layout of these slabs is performed at compile time; at runtime, several of each of the different slab layouts are pre-allocated into caches. Whenever the kernel requires a new data structure, space for the data structure is simply taken from the slab cache; if the slab cache starts to run out of a certain slab layout, it automatically provisions extras. Graphically, slabs can be represented in the manner shown here. In this example, we have two pre-allocated copies of the same slab layout, in which each slab can hold a single instance of each of five different kernel objects. Some wasted memory does occur with this arrangement, since there might be a larger number of one type of object than of another type of object. However, this approach is generally more efficient in terms of kernel space utilization.
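The coalescence bound discussed a moment ago, at most log2(U) − log2(L) operations, can be checked with a short sketch. The function name here is my own, and the second example (a 1 MiB superblock with 4 KiB minimum blocks) is an illustrative assumption, not from the lecture.

```python
import math

def max_coalesce_ops(U: int, L: int) -> int:
    """Maximum coalescence operations to rebuild a superblock of size U
    from minimum-size-L blocks: log2(U) - log2(L) = log2(U / L)."""
    return int(math.log2(U) - math.log2(L))

# U = 4096 (2**12), L = 512 (2**9): at most 12 - 9 = 3 coalescence steps
# (512 -> 1024 -> 2048 -> 4096), matching the worked example above.
print(max_coalesce_ops(4096, 512))       # 3

# A hypothetical 1 MiB superblock with 4 KiB minimum blocks: 20 - 12 = 8.
print(max_coalesce_ops(2**20, 2**12))    # 8
```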
Slab allocation does require more CPU power than does a classical method such as first fit; thus, in some embedded environments, the simpler SLOB allocator might be preferable. If slab allocation is chosen over the SLOB allocator, the Linux kernel has two choices of slab allocator. The first choice is the original SLAB allocator, which was the default allocator until kernel version 2.6.23. This allocator performed well on shared-memory systems with few CPU cores, but wasted considerable memory space when used on extremely large shared-memory systems, such as those found in graphics rendering farms. To reduce the space waste on large-scale SMP systems, Christoph Lameter at Silicon Graphics developed a new allocator called SLUB, which reduces the size of the data structures needed to track allocated and free objects. The initial implementation of the SLUB allocator contained a performance bug that affected the results of certain memory benchmarking tools. Initially, Christoph believed the bug was of little importance, since the conditions required to trigger it were fairly uncommon in practice; however, Linus informed Christoph that either the problem would be fixed or SLUB would be dropped entirely from the kernel. In the end, it was determined that the bug was caused by adding partially used slabs at the beginning of a linked list instead of at the end of that list. The fix was a change to one line of code, and the SLUB allocator has been the default Linux allocator since kernel version 2.6.23. In this lecture I will introduce memory resources and how the system and its processes view random access memory. In order to run any process, or instance of a program, on a computer system, we need to provide two critical resources: access to the CPU, and access to random access memory, or RAM, for storing and manipulating data. RAM is a type of dedicated hardware memory that is attached to the motherboard; it is separate from persistent storage, or disk space. RAM is also volatile, which means that it loses its contents whenever power is interrupted to the RAM modules, including whenever the computer is turned off. At a low level, the computer hardware presents memory to the operating system as one large block of space. This space is divided into bytes, and each byte in memory has a unique address that can be used to access it. A 32-bit architecture provides enough of these byte addresses to utilize up to four gigabytes of RAM; above that amount, there is not enough space in a 32-bit number to store the address of any memory location in excess of four gigabytes. Thus, a 64-bit architecture is required for systems with more than four gigabytes of RAM. Each process, or running instance of a program, on a system uses a different view of memory. Process memory is divided into several pieces: the stack, the heap, global variables, and the text segment. The stack is used to store automatic variables, or variables that are local to functions in the C programming language. Space on the heap is manually allocated and deallocated by the programmer when writing C code, using the functions malloc and free. Free space for extra data is located between the stack and the heap, and the stack and the heap grow toward one another. Global variables are provided with their own section of memory, which is allocated between the heap and the text segment. The text segment is used to store program code; this segment of memory is read-only and cannot be changed. As I mentioned in the previous slide, automatic variables are placed onto the stack; this placement is performed by the compiler when the program is built. Placement of data onto the heap is historically performed by the programmer, although many of these operations are now automated in modern dynamic languages. In the C and C++ languages, the programmer must explicitly request and release, or allocate and deallocate, heap space. In C, these operations are performed using the malloc and free functions, while the new and delete operators are used in C++. In Java, the programmer must explicitly allocate space on the heap using the new keyword; however, the Java runtime automatically determines which heap allocations are no longer in use and frees those locations automatically. This process is called garbage collection. Python provides both automatic allocation and garbage collection: whenever a data structure requires heap space, the Python interpreter allocates it automatically, and once the data structure is no longer used by the program, the garbage collector deallocates it without programmer intervention. Use of heap memory in processes presents a challenge to the operating system, because these allocations and deallocations are not known in advance. The compiler is able to track and report the amount of stack space needed, since the number of local variables in a function never changes in a language like C; however, the number of heap allocations may vary from program execution to program execution, making it impossible to know exactly how much space should be allocated in advance. Worse, heap memory allocations tend to be small and frequent, so the process of allocating memory from this section needs to be fast. This speed and dynamic availability need to be maintained while the operating system shares the computer's RAM among multiple processes at the same time. In addition to sharing memory among processes, the kernel has memory requirements of its own. Aside from the kernel code and its own automatic variables, the kernel is filled with data structures that dynamically grow and shrink during the course of system operation. These data structures are numerous, and include process control blocks, the ready list, scheduling queues, device access tables, and many other types of structure. Unlike in user processes, however, all memory allocation and deallocation in the kernel is performed manually by the kernel programmers. In the Linux kernel, programmers can use one of a number of functions, including kmalloc, kzalloc, vmalloc, kfree, and vfree.
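Since I mentioned a moment ago that Python frees heap objects automatically, here is a tiny demonstration using the standard weakref module. One caveat, which is my own addition: the immediate reclamation shown here is a CPython implementation detail (reference counting), not a guarantee of the Python language itself.

```python
import weakref

class Node:
    """A throwaway heap-allocated object."""
    pass

n = Node()              # allocation happens automatically
alive = weakref.ref(n)  # a weak reference does not keep n alive
print(alive() is n)     # True: the object is still on the heap

del n                   # drop the last strong reference
print(alive())          # None: the object has been reclaimed (CPython)
```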
The first issue that must be solved by the kernel is sharing memory between itself and a user-space process. This sharing is accomplished first by dividing the memory into two regions: kernel memory and user memory. The kernel keeps its data structures, program code, global variables, and automatic variables in kernel memory, isolating these items from the running process. Process memory is placed in a separate region from the kernel, but the kernel is always available in memory. This mapping is necessary to maintain performance whenever an interrupt occurs, whenever the process makes a system call to the kernel, or whenever the process experiences a fault that must be handled by the kernel. So how do we go about running multiple processes at the same time? Well, the first way we can consider, which is used in some types of embedded systems, is to divide the memory into fixed-size chunks called partitions. A memory partition is a single region of RAM that is provided to a single process; both the program code and data associated with each process must fit within this pre-allocated space. If a process grows too large, it will run out of memory and crash. Furthermore, the maximum number of concurrent processes, called the degree of multiprogramming, is limited by the number of partitions available: once all memory partitions are used, the system cannot start any new processes. This situation is exacerbated by the fact that small processes may not be using their entire partitions, resulting in wasted memory. In this lecture I will introduce paging and related topics, including logical addressing, address translation, and the translation lookaside buffer. Paging provides a mechanism for sharing memory among multiple user-space processes at the same time. This mechanism improves upon simpler algorithms, such as static partitioning and direct power-of-two methods, by allocating fixed-size pages of memory to processes. The key to effective memory utilization with paging is that each process is given its own logical memory space.
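Before turning to the details of paging, the limitations of fixed partitioning just described can be illustrated with a toy model. The partition size, partition count, and process names here are all hypothetical choices of mine for the sake of the example.

```python
PARTITION_SIZE = 64 * 1024            # fixed 64 KiB partitions
partitions = [None] * 4               # 4 partitions -> at most 4 processes

def start_process(name: str, size: int) -> bool:
    """Place a process in the first free partition, if it fits."""
    if size > PARTITION_SIZE:
        return False                  # too large: the process cannot run
    for i, p in enumerate(partitions):
        if p is None:
            partitions[i] = (name, size)
            return True
    return False                      # all partitions in use

assert start_process("editor", 10 * 1024)       # fits, but wastes 54 KiB
assert start_process("shell", 4 * 1024)
assert start_process("daemon", 60 * 1024)
assert start_process("browser", 64 * 1024)
assert not start_process("compiler", 8 * 1024)  # degree of multiprogramming hit

wasted = sum(PARTITION_SIZE - s for _, s in partitions)
print(wasted // 1024)   # 118 KiB of memory wasted inside the partitions
```

Even though 118 KiB sits unused inside the four partitions, the fifth process cannot start, which is exactly the combination of waste and inflexibility that paging addresses.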
In other words, each process has its own view of memory, with its own address space. The addresses that the process sees are called logical addresses. These logical addresses are divided into fixed-size pages, and each process in the system receives its own private set of pages with private memory addresses. When a process accesses memory using one of its logical addresses, the CPU translates the logical address into a physical address; physical addresses refer to locations in system memory. For performance reasons, translation is done in terms of memory frames, or fixed-size regions of RAM. The base frame size is normally four kibibytes, although this can vary by hardware device, and most hardware can support multiple different frame sizes. Operating systems normally use logical page sizes that correspond to supported hardware frame sizes; again, four-kibibyte pages are a typical base size. On x86 and x86-64 systems, the Linux kernel can support so-called huge pages, which can be as large as one gibibyte when using the newest AMD and Intel CPUs. The key advantage of paging is that it eliminates the issue of external fragmentation: since the CPU is translating logical page-based addresses into physical frame-based addresses anyway, there is no need for the physical frames to be contiguous. As a result, we can store a data structure in process memory using pages that are logically contiguous; however, when these logical pages are mapped to physical frames, the frames may be scattered throughout RAM. Notice that the distinction between a page and a frame is a matter of terminology: a page refers to a block of logical memory, while a frame refers to a block of physical memory. For now, pretend that there is a one-to-one correspondence between logical pages and physical frames; we'll make things more complicated later. The key to making page translation efficient is that the CPU contains special hardware, called the memory management unit or MMU, which performs the translation operations. In this diagram, the process accesses memory using logical addresses, which are divided into pages. When requests are made using these addresses, the memory management unit on the CPU translates the logical address into a corresponding physical address; the resulting physical address will point to a location within a physical memory frame. Note that individual memory addresses within pages or frames still remain contiguous, which is important: the MMU translates page numbers to frame numbers, leaving the offset to the memory location within the page unchanged. As shown in this diagram, we can divide a logical address from a process into two components, the page number p and the offset d. When the MMU is asked to perform a translation, it consults a data structure called the page table, which provides a mapping between page numbers and frame numbers. Using this information, the MMU constructs the physical address by using the corresponding frame number, represented here by the letter f, in place of the page number. Once again, the offset to the particular byte within the page or frame is left unchanged. A particular byte in RAM is actually addressed using the frame number and offset into the frame; however, to the process, this memory access appears to occur using a logical memory address, which is conceptually divided into a page number and offset. The offset component is not changed by the MMU, but the page number is replaced by the physical frame number. In order to perform the translation from page numbers to frame numbers, the MMU must consult the page table. The page table is a data structure that is itself stored in RAM, in the kernel memory space. Storing the page table in RAM leads to a major problem: since every MMU translation would require a lookup, and since the lookup requires a memory access, each process memory request would actually require two physical memory accesses. This situation is especially troublesome because a memory access occurs both upon accessing data and upon reading the next instruction to be executed.
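The translation step just described can be sketched in software, with a dictionary standing in for the page table. The page-to-frame mapping here is hypothetical; with 4 KiB pages, the low 12 bits of an address are the offset d and the remaining high bits are the page number p.

```python
PAGE_SIZE = 4096                 # 4 KiB pages -> 12-bit offset
OFFSET_BITS = 12

# Hypothetical page table: page number p -> frame number f.
page_table = {0: 5, 1: 2, 2: 7}

def translate(logical: int) -> int:
    """Split a logical address into (p, d), replace p with f, keep d."""
    p = logical >> OFFSET_BITS            # high bits: page number
    d = logical & (PAGE_SIZE - 1)         # low 12 bits: offset, unchanged
    f = page_table[p]                     # the MMU's page-table lookup
    return (f << OFFSET_BITS) | d

# Page 1, offset 0x02A maps to frame 2, same offset:
print(hex(translate(0x102A)))   # 0x202a
```

Note how the offset bits pass through untouched; only the page number is replaced by the frame number, exactly as in the diagram.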
Without some additional hardware to resolve this problem, system memory performance would effectively be cut in half, greatly reducing the overall performance of the system. The solution for eliminating the double-memory-access issue is to add a component to the CPU called the translation lookaside buffer, or TLB, which stores some page-to-frame mappings. Some TLBs also provide room for address space identifiers, which aid in implementing memory protection. The TLB is a piece of associative memory, meaning that it can perform rapid parallel searches, resulting in constant-time lookups for page translation. This memory is exceptionally fast, which means that it is also quite expensive; as a result, TLB sizes are typically limited to somewhere between 8 and 4096 entries. The addition of the TLB provides a potential shortcut for performing address translation: instead of immediately searching the page table, the MMU first searches the TLB. If the page-to-frame mapping can be found in the TLB, then it is used to perform the translation from page number p to frame number f; this situation is called a TLB hit. A TLB miss occurs whenever the page number is not present in the TLB; in this case, the MMU must search the page table to locate the appropriate frame number. The CPU and operating system employ various policies to determine when to store a page-to-frame mapping in the TLB. A simple policy would be a first-in, first-out (FIFO) policy that replaces the earliest entry in the TLB with the newest entry upon a TLB miss; other, more complex, and potentially better policies also exist. In this lecture I will discuss memory protection, including segmentation and permission bits on page table entries. Remember that operating systems perform two functions: abstraction and arbitration. Mechanisms for accessing memory provide abstractions of the underlying memory hardware; however, operating systems must also arbitrate access to RAM, by ensuring that one process cannot access memory that does not belong to it.
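The TLB hit/miss logic with the simple FIFO replacement policy described a moment ago can be modeled in software. This is of course only a model of what the hardware does in parallel; the page table contents, TLB size, and access sequence are all illustrative assumptions.

```python
from collections import OrderedDict

PAGE_TABLE = {p: p + 100 for p in range(32)}   # hypothetical p -> f mapping
TLB_SIZE = 4

tlb = OrderedDict()        # small, fast page -> frame cache
hits = misses = 0

def lookup(page: int) -> int:
    """Return the frame for a page, consulting the TLB first."""
    global hits, misses
    if page in tlb:
        hits += 1                      # TLB hit: no page-table access needed
        return tlb[page]
    misses += 1                        # TLB miss: walk the page table
    frame = PAGE_TABLE[page]
    if len(tlb) >= TLB_SIZE:
        tlb.popitem(last=False)        # FIFO: evict the oldest entry
    tlb[page] = frame                  # cache the new mapping
    return frame

# Page 0 is evicted by the FIFO policy before its final access:
for page in [0, 1, 0, 2, 3, 4, 0]:
    lookup(page)
print(hits, misses)   # 1 6
```

Only the second access to page 0 hits; by the time page 0 is accessed again, the FIFO policy has already evicted it, which hints at why smarter replacement policies can do better.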
Without this arbitration, a process could change memory belonging to another process, or worse, it could crash the system by changing memory that belongs to the kernel. On systems that utilize simple memory management, such as power-of-two methods, memory access protections are provided by a mechanism called segmentation. On the majority of modern systems, which employ paging, memory protection is implemented as part of the paging system, and memory access permissions are stored in the page table. On any system, process memory is divided into logical pieces, or segments, at compile time. These segments include the text segment, a region for global variables, a stack region for automatic variables, and a heap for dynamically allocated data structures. Access permissions apply to each segment; in particular, the text segment is set to be read-only. Outside a single process, there must be a mechanism to track which segments of memory belong to which processes. When a process is executing, its segments are marked valid, so that it can access the corresponding memory locations. Segments of memory belonging to other processes are marked invalid, and any attempt to access those segments results in a fault, or interrupt, called a segmentation fault. Typically, a segmentation fault causes the process to be terminated. Segment memory permissions are implemented on non-paging systems using a segment table. The segment table has permission bits that can be applied to each region of memory; when memory is accessed using segmentation, the segment table must be consulted to determine whether or not the access is legal. In this example, a process requests access to memory using a logical address. Since we do not have paging on this system, the logical address is not translated by a page table mechanism; however, the logical address is divided into a segment address and an offset into the segment. In a manner similar to page translation, the MMU checks the segment table to determine if a particular memory access is valid. In this example, the segment table stores permissions using up to three bits: a valid/invalid bit, a read/write bit, and an execute bit. In practice, most systems that support segmentation without paging normally use only two bits: valid/invalid and read/write. If the process tries to read from a segment that is marked valid, the memory access is permitted and occurs normally; the same thing happens if a process tries to write to a memory location that is marked both valid and writable. However, if a process tries to write to a segment marked read-only, or if a process tries to access an invalid segment, the CPU triggers a segmentation fault, and the process is terminated. For some invalid accesses on a Unix-like system, this segmentation fault may be reported as a bus error. With paging systems, which comprise the majority of modern systems, including mobile devices, memory protection is accomplished by adding permission bits to the page table entries. In general, page table entries will have a valid/invalid bit and a read/write bit. The valid/invalid bit is used in the same way as it is for segmentation: pages that a process is allowed to access are marked valid; other pages, and any nonexistent pages, are marked invalid. If a process attempts to access an invalid page, a CPU fault is raised, which functions like an interrupt to trap into the kernel. The Linux kernel will send a SIGSEGV or SIGBUS signal to the process, depending on the location in memory the process tried to access; in practice, the signal is normally not caught, and the process terminates. For historical reasons, this event is called a segmentation fault, or segfault for short. The read/write bit used to mark the text segment of a process can be used to allow pages of memory to be shared between processes. Pages that are re-entrant, or read-only, can be accessed by multiple instances of multiple programs simultaneously. This capability is useful on modern systems, since multiple instances of programs are typically run at the same time.
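The valid/invalid and read/write checks on page table entries can be modeled as follows. The SegmentationFault exception here is my stand-in for the CPU fault and the SIGSEGV signal; the page table contents are hypothetical.

```python
# Each page-table entry carries permission bits alongside the frame number.
# SegmentationFault stands in for the CPU fault / SIGSEGV delivery.

class SegmentationFault(Exception):
    pass

# page -> (frame, valid, writable); pages absent from the table are invalid
page_table = {
    0: (7, True, False),   # text segment: valid, read-only
    1: (3, True, True),    # data: valid, read-write
}

def access(page: int, write: bool) -> int:
    """Check permissions and return the frame number, or fault."""
    entry = page_table.get(page)
    if entry is None or not entry[1]:
        raise SegmentationFault(f"invalid page {page}")
    frame, _valid, writable = entry
    if write and not writable:
        raise SegmentationFault(f"write to read-only page {page}")
    return frame

print(access(0, write=False))   # 7: reading the text segment is fine
try:
    access(0, write=True)       # writing to read-only text faults
except SegmentationFault as e:
    print("fault:", e)
```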
In the case of a web browser, for example, it is only necessary to load one copy of the browser program code into memory; several copies of the browser can be run as several different processes, sharing the program code and thus saving memory. The open-source Chromium browser and its Google Chrome derivative allow each tab to run in a separate process; shared memory pages allow the code for the browser, any extensions, and any plugins to be loaded only once, saving memory. This diagram illustrates how two processes can share a single page in RAM. Each process sees a handful of valid frames, one of which is marked read-only. If this memory frame contains code or other information that can be shared between the processes, then the two frame numbers will be identical within the separate processes. Each process may use a different page number to represent this memory location, however, since each process has its own independent logical view of memory. Incidentally, this diagram is a conceptual diagram only; it does not directly map to any particular data structure in the operating system. Instead, the two tables illustrate how each process might see page 2. Newer AMD and Intel CPUs support an additional permission bit for setting execute permissions. This bit, called the no-execute or NX bit, is actually an inverted permission: it is set to 1 whenever execution of data found on a memory page is forbidden. Originally, the NX bit was implemented by AMD on its 64-bit-capable processors under the marketing name of Enhanced Virus Protection; Intel followed suit and added this mechanism as the execute-disable or XD bit. The concept behind the bit was to provide a mechanism that could prevent execution of native machine instructions from memory space used for regular data. Although the primary beneficiary of this feature was a certain virus-prone system that is not a Unix variant, the Linux kernel does support the NX bit as a guard against buffer overflow and similar exploits. In the hypothetical page table presented here, only the page with hex number 0x04A4 allows code execution. In the event of an exploit attempt, a malicious application could try to load code into another page, perhaps 0x04A1; however, since the NX bit is set on that page, any attempt to execute the code loaded by the exploit will trigger a CPU fault, and the process will be terminated. This mechanism increases the security of the system against certain types of attacks. In this lecture I will introduce test-driven design. I will begin by discussing the importance of testing; then I will introduce unit testing, followed by an overview of the test-driven process. Testing computer software serves two purposes. First, it provides a way to detect and isolate bugs in the application; any application less trivial than Hello World generally contains multiple bugs. As software matures through its life cycle, we expect the number of bugs in an application to decrease as issues are found and fixed. Although the number of bugs in an application may decline over time, it is not good practice to release an initial version of a program with obvious and/or major defects; thus, testing should be a major component of the initial software design process, as well as an ongoing activity throughout the software life cycle. Although we often associate testing with finding and eliminating bugs in the code, testing serves a second purpose that is at least as important: testing verifies that an application actually satisfies its requirements and carries out the operations it is designed to perform. Testing demonstrates that our software is ultimately correct. In fact, one principle of software development is that a working program, as evidenced by successful testing, is a proof of a solution to some kind of problem. Just as mathematicians use proofs to verify that the steps in solving a problem are correct, we can treat each program as a proof that the steps in an algorithm are correct for solving a problem. I should emphasize here that, despite the recent marketing of mathematical methods for software analysis in a number of professional publications in our field, testing remains the one and only way to verify that a program is ultimately correct. While mathematical methods such as formal verification and design by contract do have useful properties, such as ensuring that our specifications are not internally inconsistent, they do not replace testing. In fact, showing that a mathematical model correctly represents a solution to a problem is mathematically equivalent to writing the correct program; furthermore, we have already stated that a correct program is a proof. So how can these tools truly revolutionize the way we program, to the point where programming as we know it becomes extinct? In other words, can we use formal methods effectively and efficiently to replace good programming techniques? Well, the first problem is that it takes roughly three times as many symbols to model a program as it does to implement the program. As Per Brinch Hansen once noted, how can you expect to get the formal approach working correctly with 3n symbols if you can't write a correct program with n symbols? The larger issue, however, is that all the formal approaches are effectively applying various algorithms to the process. Perhaps these algorithms could save us some time by figuring out whether or not our program design will be correct in advance; or, to put it mathematically, perhaps these methods could determine in advance if our proof will work, before we actually do the proof by writing the program. It turns out that David Hilbert had the same question about mathematical proofs back in 1928. The question, which he called the Entscheidungsproblem, asked if it is possible for an algorithm to determine whether or not a proof would be successful before actually doing the proof. Within a decade, Alan Turing proved that the halting problem was reducible to the Entscheidungsproblem, and he also proved that the halting problem is not solvable algorithmically. Thus, the answer is no; or, to put it succinctly, there is no substitute for testing. There are a number of different ways that we could test a program, and these methods correspond to different types of testing. For test-driven design, we are primarily concerned with the type of testing called unit testing. Unit testing involves testing each component of the application in isolation, ensuring that every component works by itself before trying to integrate it with other components. In an object-oriented programming paradigm, a component of the program is typically a single class. We write unit tests as assertions, or conditions that will be either true or false. We say that a unit test passes whenever the result of a test matches its expected result; when a test result does not match its expected result, the test fails. In performing unit testing, we normally use a combination of positive and negative test cases: positive test cases expect the result to have a value of true, while negative test cases expect a false answer. We normally automate unit testing by using some kind of unit testing tool that runs the test cases for us and produces a report. A large number of automated unit testing tools exist; one popular class of unit testing tools is called xUnit. These are technically completely different tools, implemented for testing components written in completely different languages, but they tend to operate in a similar way. Among the different xUnit tools, we have JUnit for Java applications, PyUnit for Python applications, PHPUnit for PHP code, NUnit for C# applications, and CppUnit for C++ code. The process of using test-driven design with unit testing is illustrated in this UML activity diagram, which shows a single iteration of a test-driven process. Test-driven techniques are examples of iterative and incremental development models. We begin by designing a test case, which must initially fail, as I will discuss in a moment.
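Since PyUnit ships with Python as the unittest module, a minimal example of the positive and negative test cases just described might look like this. The is_even function is a deliberately trivial stand-in for a real component under test.

```python
import unittest

def is_even(n: int) -> bool:
    """The component under test: a deliberately tiny example."""
    return n % 2 == 0

class IsEvenTest(unittest.TestCase):
    def test_even_number(self):          # positive test case: expects True
        self.assertTrue(is_even(4))

    def test_odd_number(self):           # negative test case: expects False
        self.assertFalse(is_even(7))

if __name__ == "__main__":
    unittest.main(exit=False)            # the runner reports pass/fail
```

Each test method reduces to a true/false assertion, and the runner collects the results into a report, which is exactly the shape that every xUnit tool shares.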
initially fail as I will discuss in a moment if the test case does not initially fail we continue the design step until we develop a test case that does fail once the test case fails we write code for the application until the test case passes when we run the application against the test case we run the test as many times as necessary until the test passes after the test passes we refactor the code to make it readable and clean we then run all the test cases we've ever written for the system and verify that all the test cases pass if not all the test cases pass we still have work to do on our new code once all the test cases do pass our iteration is complete as I mentioned a moment ago the first step in an iteration using a test driven process is to create a test case that fails at first this may seem counter-intuitive why do we want a test case that fails well remember that we haven't yet written any code from our test driven design if we design a test case that immediately passes we've made one of two mistakes either the test case is wrong or the program already does whatever the test is checking if the test is wrong we need to fix it so that we don't go to an incorrect test on the other hand if the test is correct and the software already passes the test case then we have no new code to write since the feature is already present in the application once we have a test case that fails we write code that extends the application to pass the new test case this code should be as simple as possible and it should be written to the test it should make the application pass the test nothing more also this initial code does not need to be pretty have correct styling except in python or be especially readable at first the goal here is to pass the test once the test passes we refactor the code to meet Style Guidelines and make it readable refactoring means that we change the code and perhaps the design to make the implementation cleaner without changing the functional 
behavior of the program we need to ensure that our refactoring process has not broken the code by ensuring that the test case still passes once we finish the refactoring process we know that the application passes the new test case however our new code may have introduced one or more regressions in the application meaning that our new code may have broken something that used to work to check for these regressions we must rerun every single test case that we have ever written for the application it is for this reason that it is highly useful to be able to automate the testing if our testing finds no regressions then our iteration is complete and we can move on to the next iteration in the process by designing a new failing test case for a new feature to be implemented however if regressions are present we must fix the regressions if the code cannot be fixed in a reasonable time period we revert the code to the previous version and try the process again in other words we throw out the code we just wrote and try again while throwing out the code may seem like a waste it is sometimes faster to try the implementation again instead of becoming bogged down in debugging this economy of speed is especially valued when using agile methods such as extreme programming one final thing to consider about test driven design is that like any other development technique it is not the perfect solution to every problem on the planet always be wary of claims that any process or technique including test driven design can somehow revolutionize software development like every other technique this approach has its limitations one limitation is that test driven design depends upon unit testing and unit testing has the same power as using assertions in code everything we test with a unit test needs to be reducible to a true or false decision which means that we cannot unit test some aspects of the system such as the graphical user interface another limitation of test driven design is the same
limitation that we have with software engineering in general if the specification or high level design is wrong test driven design cannot fix it if you implement a test case using an incorrect specification that test case will itself be wrong writing code that passes an incorrect test case will implement a non-feature and the program will be wrong finally the test cases must be maintained over the life of the software since we need to be able to regression test after every change if the test cases are not well written so that they are themselves maintainable the test cases may become useless as the software evolves should this happen we will lose the ability to test for regressions and future changes could easily break the application in this lecture I will discuss page tables page tables are data structures that store mappings between logical pages in process memory and physical frames in RAM these structures are used and managed in different ways on different systems often with assistance from the hardware at the end of this lecture I will discuss extended page tables which are useful for allowing hardware to support multiple simultaneous operating systems at once recall from the previous lecture that the page table stores mappings between page numbers and frame numbers whenever a TLB miss occurs the page table must be searched to find the appropriate mapping CPUs have differing levels of support for managing and searching the page tables automatically on most modern systems including x86-64 and ARM CPUs page tables are managed and searched by the CPU automatically upon a TLB miss this search increases the memory access time but no fault or interrupt is generated as a result the CPU does not have to perform a context switch on a few other CPUs the page table must instead be handled by software for example the MIPS architecture requires the operating system to manage and search the page table whenever a TLB miss occurs the CPU triggers a fault which is a type of interrupt and the CPU must make a context switch away from whatever
task is currently being executed in order to execute the interrupt handler for the fault software managed page tables are becoming increasingly uncommon even on embedded systems as the popular ARM CPU supports hardware management the MIPS CPU is typically used in lower end consumer devices such as inexpensive e-readers and the least expensive tablets one approach to reducing the page table search time whenever a TLB miss occurs is to store the page table as a tree instead of a list this technique of hierarchical page tables divides the page table into pages each logical address in process memory is divided into an outer page table entry number a set of offsets into various levels of subtables and a final offset to the specific byte of memory to be accessed this technique can be generalized to any number of levels in the hierarchy but I will present here a simple system that uses only two levels as illustrated in this diagram the data structure is arranged so that an outer page table provides a mapping between outer page numbers and page table pages once the proper page table page is located the translation from page number to frame number can be completed quickly since each page of the inner page table is relatively small the address translation mechanism used with hierarchical page tables is more complex than that used with a simple linear page table the logical address is divided into additional components in this example with two levels in the page table the logical address is divided into an outer page table entry number t which specifies the location in the outer page table in which to find the proper inner page table the next component of the address p is the offset into the inner page table at which the mapping can be found in this example we have a single page to frame mapping in each inner page table entry so we can obtain the frame number from that entry a real system will be more complex and likely will require a short linear search at some level of the hierarchy once the frame
number is determined RAM is accessed in exactly the same way as it is in simpler designs with the final address joining a frame number and offset into the physical address storing the page table in a tree improves access performance however as the total amount of system RAM continues to increase with newer and newer generations of computers the size of the page table also increases moreover the address spaces on 64-bit architectures are much larger than the amount of memory that the MMU actually supports current 64-bit systems have true hardware address sizes in the range of 34 to 48 bits if we were to store a mapping to handle every logical page number in such a system the mapping would be large and inefficient since the address space is sparse that is not every 64-bit logical address can map to a physical location in RAM since the physical addresses are at most 48 bits as a result many of the possible 64-bit addresses are unused a solution to this problem which both reduces page table storage size and increases search speed is to use a hash table or dictionary structure to store the outer page table since several addresses may hash to the same value each entry in the hash table is an inner linear page table which allows the hash collisions to be resolved through chaining translating a logical address to a physical address with a hashed page table begins by hashing the page number p hashing is accomplished using a hash function which may be implemented in hardware for high performance the hash value returned by the function gives the location in the hash table where the inner page table may be found a linear search of the inner page table is performed to locate the frame number once the frame number f is obtained it is joined with the offset d to give the hardware address some architectures notably PowerPC and Intel Itanium store their page tables backwards that is the system stores a data structure with one entry per frame and the entry stores the corresponding page
number along with the process ID of the process owning the page this approach called an inverted page table is efficient in terms of page table storage size however inverted page tables are inefficient in terms of performance and these structures are not used on a majority of systems on newer x86-64 systems with virtualization extensions hardware support exists for extended page tables or EPT AMD and Intel each brand this technique with a different name AMD uses the term rapid virtualization indexing or RVI on newer CPUs they used to call this technique nested page tables or NPT Intel uses the extended page tables terminology EPT adds a level of paging to the system at the outer level each virtual machine or guest running on the CPU sees its own set of memory frames isolated from and independent of the actual hardware the CPU translates page numbers to frame numbers first by translating the page number to a guest frame number the guest frame number is then translated to a host frame number which is the physical frame number EPT technology is important for virtual machine performance since it allows each guest to manage its own memory efficiently moreover guests can access memory without having to switch the CPU into hypervisor mode which can be an expensive operation the downside to logical address translation with extended page tables is that it becomes conceptually more difficult to understand as illustrated by the complexity of this diagram a process running in a guest operating system makes a memory request just as it would if no virtualization were present this memory request is divided into a page number and an offset as usual the CPU then performs translation of this page number p to a frame number f also as usual furthermore as far as the guest operating system is concerned the translation is complete the guest OS sees the memory region provided by the host system as if that memory were physical memory in other words the guest OS has no idea that it is running in a
virtual machine to the guest OS the virtual machine looks just like a physical system however in reality the guest's physical memory is actually an illusion provided by the host to make the solution work the host must translate the frame number f in guest memory to an actual physical frame number g this translation is performed by the CPU without any context switches using the EPT table once this translation is performed the physical memory address is generated by combining g and d where d is the original unchanged offset into the page it is important to note that this entire process is performed by the CPU without switching to the host OS or hypervisor in this lecture I will introduce basic UML class diagrams this introduction will be language independent and will focus on the diagrams themselves I will cover language specific uses of class diagrams in later lectures UML class diagrams provide a graphical way to illustrate relationships between classes in an object-oriented system they also provide a way to draw an abstract diagram of a class so that we can inspect features visually without having to resort to looking at code although class diagrams are a UML standard in practice there are differences between the diagrams produced by different tools not all tools support all UML features some tools add extra features and the final appearance of diagrams drawn by different tools varies certain UML tools are capable of round-trip engineering which means they can draw UML class diagrams from existing code and generate program code from UML models generated code provides template classes that programmers can finish the ability to perform round-trip engineering depends upon having a programming language the tool can parse or upon having a UML class diagram model in a format the tool can use the basic entity in the UML class diagram is the class classes are represented in the simplest case by boxes containing the class name here we have a sample class named HelloWorld
represented by a rather underwhelming box in object-oriented designs the data members of a class are often called fields class fields are listed in a UML diagram between two horizontal lines under the class name inside the box representing the class the first character in the field name gives the field scope a plus sign for public scope and a minus sign for private scope there are other scopes which are represented by other symbols but remember that different implementation languages support different scopes different programming languages also use different type systems in the pure case UML class diagrams can be drawn to be language independent in practice however most UML class diagrams have language dependent semantics which include the type system used in the target language in this example the HelloWorld class has one field a private member named message which is of type Message with a capital M the capitalized Message refers to another class named Message of which message with the lowercase m is an instance methods or member functions in C++ speak go below the class fields in the bottom section of the class box these methods are functions that are members of the class which can be called to operate on instances of objects static methods or methods that can stand alone without having to be called from an instance of a class are underlined the same scoping and typing conventions are used for methods as are used for fields as with fields the exact semantics tend to vary from tool to tool with each tool using a slightly different format in this example we have added five methods to our class all of them public we have setter and getter methods for the underlying message setMessage and getMessage respectively the class also has a mechanism to display the message using the speak method two of the methods are special the first of these is the HelloWorld method which represents the class constructor the second is the underlined main method which is a static
method that implements a main driver program for the object-oriented system relationships between classes in UML class diagrams are represented by lines connecting the classes together these relationships fall into two general categories generalization relationships which show inheritance between classes and association relationships which show how instances of classes relate to each other each relationship in a class diagram can have multiplicity numbers assigned to one or both ends of the line connecting the classes multiplicities can consist of a single number such as the number one or a range of numbers separated by two dots if the range is unbounded the second value after the dots is a star instead of a number multiplicity values specify how many instances of a class are involved in a relationship as an example if a relationship between classes Foo and Bar has a multiplicity of one on the Foo side and zero dot dot star or just a single star by itself on the Bar side that relationship specifies that for every single instance of Foo there can be zero or more instances of Bar in the relationship as I mentioned before one of the two categories of relationship in a UML class diagram is the generalization which expresses the is-a relationship between classes this relationship corresponds to inheritance in an object-oriented programming language in this example the SimpleMessage is a type of Message in other words the SimpleMessage class is a child class of the Message parent class this relationship is drawn by connecting a line from the SimpleMessage class to the Message class with a hollow arrow at the Message end pointing to the parent class we leave off multiplicity on this type of relationship since generalizations are between classes themselves and not between instances of classes the first type of association relationship that I will describe is the aggregation relationship an aggregation is a weak type of has-a relationship between class instances in which an
instance of one class provides a collection of instances of other classes it is important to note that aggregation relationships are considered weak has-a relationships because the instances that are collected by the collecting class can exist independently from the collecting class in practice this means that we create instances of various classes and then add those instances to the collection typically with a dedicated method in the collector for example here we have a MultiMessage class that acts as a collection of Message instances as indicated by the relationship with the hollow diamond at the MultiMessage end MultiMessage provides a method addMessage to add an instance of a Message to the collection this instance is already created elsewhere in the code before the addMessage method is called to add it to the collection if we want a class to contain instances of other classes without having to manage collections we can use a composition relationship a composition is a tighter association that expresses a strong has-a relationship between classes in a composition a class instance contains one or more instances of another class and those instances are created and destroyed by the containing class it is important to understand this key distinction between an aggregation and a composition with an aggregation we create instances then add them to the collection with a composition the instances are normally created in the class constructor for the composing class and destroyed by the destructor of the composing class there are no methods to manage the collection of composed instances graphically the composition relationship is represented by a filled diamond at the composing instance end of the relationship as shown in this example here we have a DoubleMessage composing instance that contains two instances of SimpleMessage as shown by the multiplicity of the relationship each instance of DoubleMessage has exactly two SimpleMessage
instances contained within it these SimpleMessage instances will be created whenever an instance of the DoubleMessage class is created and both SimpleMessage instances will be destroyed with the DoubleMessage in this lecture I will begin discussing virtual memory due to the complexity of the virtual memory subsystem the second part of this introduction will be given as a second lecture we have previously seen that each process in the system can be given its own logical memory space this arrangement allows logical pages to be mapped to physical frames without the need for physical frames to be contiguous eliminating external fragmentation and increasing the degree of multiprogramming that a system can support we can further increase the degree of multiprogramming in a system by recognizing that processes do not actually use all the memory in their logical address spaces at any given time parts of the program code including error handlers and infrequently called functions are not utilized often furthermore arrays and other data structures are often oversized and used in sections instead of all at once if we can swap unused pages out of memory and onto a backing store such as a hard disk we can fit more processes into memory at once increasing our degree of multiprogramming furthermore we can give each process its own large logical memory space which can in fact be larger than the amount of physical RAM on the system when we add a backing store the general address translation process remains the same processes access memory using logical addresses which are translated into physical addresses by the MMU the page table is still utilized to store the page-to-frame mappings however we do add some complexity in that a frame could be swapped out to disk at the time when it is needed so the CPU must provide a mechanism to detect this situation and generate a fault that the operating system can handle to bring the required page back into physical memory the process of
moving pages or frames of memory back and forth between RAM and the backing store is known either as swapping or as paging historically the term swapping referred to the movement of entire logical address spaces or entire processes between RAM and disk while moving single pages or frames of data between RAM and the disk was called paging in modern practice both terms are used interchangeably and the Linux kernel component that performs page movements is called the swapper a single movement of a page frame into or out of physical memory is called a page swap historically Linux machines used a dedicated hard disk partition to store the pages that were swapped out to disk modern versions of Linux are just as efficient using a swap file which is a regular file stored alongside other data in the file system it should be noted that swapping is an optional feature and it is possible and even quite common to run systems without any backing store or swapping capability most embedded Linux systems such as Android devices do not use a backing store if memory cannot be allocated to a process on such a system the process typically crashes while page swaps are implemented by the operating system some assistance from hardware is required to determine when a page swap needs to be performed when translating a page number to a frame number the MMU checks to see if the corresponding frame is resident or loaded in RAM if the frame is present the memory access proceeds as normal if the frame is not present in RAM however the MMU generates a page fault which is a CPU exception that is similar in concept to an interrupt a specific page fault handling routine is registered with the system either as part of the interrupt vector table or using a separate structure for fault handlers a page fault causes this routine known as the swapper in Linux to be invoked it is then the responsibility of the swapper to locate the missing page on the backing store and load it into RAM possibly moving some other
page frame to the backing store in the process the address translation process gains a few steps when paging is utilized a process makes a memory request using a logical address in its private address space as usual the MMU first checks the translation lookaside buffer to determine if the page to frame mapping is present in the case of a TLB miss the MMU must consult the page table to find the mapping once the mapping from page number to frame number is known the MMU must next verify that the page is actually loaded in physical RAM if the corresponding frame is available in RAM the memory access proceeds as normal however if the corresponding frame is not in memory the MMU generates a page fault which is essentially a type of interrupt if generated the page fault causes the operating system to switch context to the page fault handling routine which retrieves the corresponding memory contents from the backing store once this process is complete the OS changes the CPU context back to the original process and the memory access proceeds as normal in order for the MMU to be able to detect situations in which a requested memory frame is not physically present in RAM an extra bit must be added to the page table this present bit is set to 1 whenever the contents of a logical page are present in a memory frame if the present bit is zero the page has been swapped out to the backing store for efficiency reasons the TLB entry corresponding to a row in the page table must also store the present bit you might have noticed the terminology between page and frame is starting to become a bit blurry here in general we refer to pages of memory being swapped out to disk even though the swap operation is actually moving physical memory frame contents this fuzzy terminology is a result of the historical evolution of the virtual memory subsystem now I'd like to take a moment to discuss the nature of backing stores as technology is changing in this area historically the backing store was a
mechanical hard disk drive and a number of design decisions in the virtual memory subsystem still use this assumption however many systems now especially embedded systems have only solid-state storage since each block on a solid-state drive can be erased and written only a finite number of times there is some question as to whether it is a good idea to use an SSD as a backing store for virtual memory many embedded devices do not use paging for this reason another issue with the backing store is that it is subject to attack via forensic disk analysis methods in the event the device is lost or stolen sensitive information such as cached passwords and other credentials might have been swapped out to the backing store and these pieces of information could be recovered one solution to this problem which is available as an easy to enable option in Mac OS X is to encrypt the contents of virtual memory the downside to this approach is the addition of CPU overhead on top of the generally slow nature of the backing store hardware another approach to avoiding the issues of write limits and post-mortem forensic recovery of sensitive memory data is to use the Linux compressed caching or compcache mechanism as a backing store with this approach a section of RAM is reserved ahead of time to create a compressed RAM disk or zram disk when a page is swapped out to the zram disk it is compressed on the fly to fit into a smaller amount of memory whenever a page needs to be swapped in from the backing store the page is read from the zram disk and decompressed although the compression and decompression steps do result in CPU overhead the compcache system is still generally faster than using a disk or SSD as a backing store furthermore compcache is as secure as RAM against forensic analysis particularly against recovering sensitive information from a system that has been powered off for a period of time in this lecture I'm going to discuss object-oriented design the general purpose
of object-oriented design is to increase the modularity of software by breaking large software systems into smaller pieces this approach to software engineering improves code maintainability by separating application components from each other these components may also be reused in other software applications objects in software are modeled after real-world processes for example consider the process of playing a song in the earliest history of music playing a song required assembling instruments and musicians for a live performance fixed recordings on wire vinyl magnetic tape and optical disc made songs reproducible by a physical machine with the advent of digital music this process has moved to software however the basic operations of playing pausing stopping and moving around the song remain the same in an object-oriented design both the data required to produce the sounds that comprise the song and the behaviors required to implement the song listening process are grouped together in a logical unit this logical unit is what we call an object in a software system in this case we might call this object a song player a common misconception widely held among students software developers and even computer science professors is that an object-oriented language such as C++ or Java is required to implement an object-oriented design this is not the case object-oriented designs are theoretically independent of the implementation language a traditional procedural language such as C can be used to implement an object-oriented program as long as the programmers are disciplined of course object-oriented languages do try to make the implementations of object-oriented designs easier and less ugly that said it is possible to implement a regular procedural program in an object-oriented language just implement everything in one object the key concept behind an object is that it combines a representation of data in a system along with algorithms to access manipulate and perform
operations on that data the data are encoded in an object using properties or fields in the object the combination of the values of these properties is collectively called the state of the object an object's behaviors are implemented as functions that are bound to the object itself in object-oriented terminology these functions are called methods although C++ retains the older terminology of member functions methods perform operations on objects objects are created in software systems by instantiating classes thus another definition of an object is that an object is an instance of a class classes are programming language structures that provide templates that specify how objects are to be constructed these classes define what properties and behaviors an object will have once it is instantiated an issue with dividing a system into objects is that the developers must decide how data and behaviors are divided among objects the level at which this division occurs is called granularity and there is a direct relationship between higher granularity and the total number of objects in a system achieving the correct amount of granularity in an object-oriented design is a balancing act at the extreme of having too much granularity the system is composed of many small objects this type of software project becomes quite complex to develop and rather difficult to maintain furthermore due to the nature of the underlying operations needed to create manage and destroy objects the execution performance of an object-oriented application will decrease as granularity increases at the opposite extreme is an object-oriented system that is essentially a procedural application since all its data and behavior are implemented in a small number of huge objects perhaps even a single ginormous object this extreme has all the drawbacks of a non-object-oriented system in that the code is not modular is more difficult to debug and is more difficult to maintain notice that maintenance tends to
become a nightmare at either extreme a good object-oriented design is somewhere in the middle of these two extremes the benefits of object-oriented design are widely celebrated among different object-oriented developers however no one can ever seem to agree on a concise set of objectives for an object-oriented system I have followed the approach of Cay Horstmann and Gary Cornell in providing the objectives of encapsulation information hiding polymorphism and reuse encapsulation simply means that the data and the methods that operate upon the data are stored together in an object these elements are accessed via well-defined interfaces to the object information hiding is another desirable property of an object-oriented system with information hiding code that uses an object knows nothing about how the object works internally more importantly other programmers can be given an interface to an object without revealing any of the implementation code preventing them from using shortcuts or making assumptions that may later break when the object is updated in a sense the object is a black box it has behaviors but its internal workings are a mystery to outsiders object-oriented designs provide the opportunity for polymorphic interfaces this means that different objects can expose compatible interfaces allowing them to be used interchangeably finally well-designed tightly encapsulated objects can be reused later either in the same program or in another application in theory over time a collection of well-tested objects will provide a development team with a library of useful components that can speed the development of new applications in practice of course there is a downside to any development approach and object-oriented development is no exception developers working on object-oriented systems have a tendency to run amok with the design creating a terribly over-engineered mess of code that is needlessly complex and overly granular I have been guilty of this design mistake myself
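the four objectives just described can be sketched concretely in a few lines of Python note that the Speaker and Shouter class names and this tiny interface are my own illustration for this transcript not examples taken from the lecture

```python
class Speaker:
    """Encapsulation: the data (a message) and the behavior that
    operates on it (speak) are stored together in one object."""

    def __init__(self, message):
        # Leading underscore marks the field as internal by convention,
        # which is Python's lightweight form of information hiding.
        self._message = message

    def speak(self):
        return self._message


class Shouter(Speaker):
    """Polymorphism: exposes the same speak() interface as Speaker
    but with different behavior."""

    def speak(self):
        return self._message.upper() + "!"


def announce(speaker):
    # Information hiding in action: this caller depends only on the
    # speak() interface, never on how the message is stored internally,
    # so any object with a compatible interface can be reused here.
    return speaker.speak()


print(announce(Speaker("hello world")))   # hello world
print(announce(Shouter("hello world")))   # HELLO WORLD!
```

because announce only relies on the shared interface both classes can be used interchangeably which is exactly the polymorphic reuse described above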
feature creep and the realization that more features can be supported with just a few tweaks are tempting forces another issue with the theory of object-oriented design is that objects may be difficult to reuse in practice object designs often wind up being tied to the underlying business process making reuse difficult in new problem domains of course one could argue that if the objects are designed to be sufficiently generic they should be more easily reused however making the objects too generic tends to result in the first problem with the design running amok finally one major disadvantage of object-oriented designs is that they are slower to execute than comparable procedural solutions a major cause of this performance degradation is that objects are dynamically allocated and deallocated using heap memory in a process which is a relatively slow operation I once saw a software engineer complain about the fact that the Linux kernel uses a procedural design with goto statements in the code such a design is a necessity in a performance critical program like an operating system kernel an object-oriented design would simply be too slow to execute in this lecture I will discuss the implementations of object-oriented designs by way of example I will present the same object-oriented design in four different languages Java C++ C and Python I will begin with a reminder about the distinction between an object-oriented design and an object-oriented language it is a common misconception that an object-oriented language such as C++ or Java is required to implement an object-oriented design this is not the case object-oriented designs are theoretically independent of the implementation language as long as the programmers are disciplined object-oriented designs can be successfully implemented in procedural languages such as C the primary benefit to an object-oriented language is that it sometimes makes object-oriented implementations easier some languages are
better suited to ease of writing than others. My design for this example is utterly simple. I have a single class, which for whatever reason I named Howler. This class has a single private field, which is a string named message. The design also specifies a single public method named scream. When the scream method is invoked, the message stored in the class field will be printed to standard output. For simplicity, this message will be set to "hello world" in the class constructor. Note that I have deliberately omitted constructor and destructor methods from the design, since the syntax and semantics of these methods are tightly bound to the implementation language. Let's begin with the Java language, which is a canonical example of an object-oriented language. Here I have a class named Howler, which has a single private field of type String named message. A public method named scream returns no value, since it is of the void type; however, it does have the side effect of printing the message stored in the object to standard output. The constructor is the final method in the class, with the signature public Howler(). This is a special method that runs whenever new objects are instantiated using this Howler class as a template. In this example, the constructor sets the message for each object created from this class to a value of "hello world". I can use Howler objects in another class. Here I have a public class named Driver, which has a single method named main. This main method takes an array of strings corresponding to any command-line arguments to the program. This main method returns no value, as noted by its void type. Furthermore, main is a static method in Java, meaning that it can be invoked directly from the class code without having to create an instance of the Driver class. In other words, main is a procedural function in a language with mandatory objects. When main executes, it instantiates a new instance of the Howler class, creating a new Howler object.
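The slide itself is not reproduced in this transcript, so here is a sketch of the Java version as described; the class and method names are reconstructed from the lecture audio, so treat this as an approximation:

```java
// Sketch of the Howler class described in the lecture.
public class Howler {
    private String message;            // single private field

    public Howler() {                  // constructor: runs at instantiation
        message = "hello world";
    }

    public void scream() {             // void method; prints as a side effect
        System.out.println(message);
    }
}

// A driver class that uses Howler objects.
class Driver {
    public static void main(String[] args) {
        Howler h = new Howler();       // instantiate a new Howler object
        h.scream();                    // prints "hello world"
    }
}
```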
After this object is created, main invokes the scream method on the Howler object h, which will cause the message to be printed to standard output. Another example of an object-oriented language is C++. In this first part of my C++ implementation of the Howler class design, I define the class. Since I must manage the allocation and deallocation of memory for the message string manually in C++, I have added both constructor and destructor member functions to the class. C++ has a few major problems in terms of object support. First, although data and member functions are encapsulated together in runtime object instances, the implementation of the member functions is not encapsulated inside the class definition as it is with Java; thus, C++ requires the use of the scope operator, or the double colons, which I find less easy to read. C++ also introduces pointer notation that is not present in Java. Worse, C++ class definitions break information hiding by requiring private members to be disclosed in the class interface. Avoiding this issue requires the inconvenient use of a void-typed pointer to a private implementation class. Unlike Java, which is garbage collected, C++ requires the programmer to return dynamically allocated heap memory to the system manually; thus, a destructor is needed in the class. For this particular example, whenever a new instance of Howler is created, the constructor will allocate a new string on the heap containing the message "hello world". If I omit the destructor, the memory containing the string will be leaked away whenever the Howler instance is deleted. To avoid this memory leak, I need to add a destructor that deletes the dynamically allocated string before the Howler instance is deleted. In a more complex object, which might dynamically allocate memory in several different member functions, the destructor would require careful coding to ensure that all utilized memory is returned to the system upon object deletion.
A simple C++ driver program demonstrating the use of Howler objects is presented here. First, I allocate a new Howler instance on the heap. Then I invoke the scream method to print the message to standard output. Finally, before I return from the main function, I must remember to delete the instance I created. To be technical, if I failed to call delete, the operating system would reclaim the lost memory for me whenever the program exits; however, reliance on the operating system to perform such a memory cleanup is considered bad coding practice. Okay, so we've seen that we can implement an object-oriented program in an object-oriented language like Java or C++, but remember that I previously said that we can implement an object-oriented design using a procedural language. In this example, I implement the exact same object-oriented design using C. It isn't pretty, but it is an object-oriented design. The first step to implementing an object-oriented design in C is to find a way to encapsulate the properties of, or data associated with, an object along with the methods that operate on the object. In C, we can use a struct to accomplish this action. Since C does not have true strings, my message will be a pointer to an array of characters. Associating the scream method with Howler objects requires a bit of trickery: here, I am using a C function pointer, or a pointer to a function in the text segment of the program, to allow object instances to be associated with a particular C function that will serve as my object method. This function will not return any data, so the function pointer is given a void type. Since the C function cannot be defined right here inside the struct, I must pass the instance of the object to the function as its argument; thus, my function will have one parameter, which will be a pointer to an instance of struct howler. Since C is not an object-oriented language, it does not have a notion of a constructor; however, I can use a factory function instead. A factory function creates a new
instance of an object and returns that instance. Here I have the new_howler factory function, which first allocates a struct howler data structure on the heap using the malloc function. Next, I allocate space for the message, using a predefined constant to determine how long the message can be. I then use strcpy to copy the literal string "hello world" into the message field of the new object instance. Finally, I set the scream function pointer in the new object instance to the address of the howler_scream function. This step binds howler_scream as the new object's scream method. To facilitate reclaiming the heap memory, which I must do manually since C has no garbage collection, I have added a second function named delete_howler. This function first frees the memory used for the message; then it frees the memory used for the Howler instance itself. Note here that order is important: I must free the message memory first, before I destroy the instance. To test my C-based Howler code, I have a simple driver implemented as a main function. My first task inside the driver is to call my factory function to make a new instance of Howler. Then I invoke the scream method. Notice that the scream method invocation looks a bit strange: h->scream(h). In order for scream to have a reference to the object on which to work, I must pass a pointer to that object as the first argument to scream. Since C is not an object-oriented language, it does not have the built-in syntax features for automatically mapping the object on the left side of the arrow into the method on the right side of the arrow; the programmer must perform this step manually, with an extra function parameter. As with C++, I must reclaim the heap memory manually using my delete_howler function. If you compare the C code to the previous C++ code, you will notice quite a few similarities in the steps required to utilize the Howler design. You might also notice that the implementations are approximately the same length.
If I had to pick a favorite among object-oriented languages, Python would be the winner, at least for single-threaded code. As you can see from this complete example, which fits on one slide, the Python code is exceptionally compact and relatively easy to read. The Howler class explicitly subclasses object to create a new-style class instead of a classic class, which is a Python-specific issue. Class constructors in Python are always named __init__. Notice that the first formal parameter to any method in a class is always self, which will be a reference to the current class instance. Internally, Python is operating much like C, passing the instance as the first parameter to the method; however, unlike C, Python has syntactic sugar for accomplishing this task. When I call h.scream() in the driver code, I do not need to pass the instance to the method explicitly. The other item to note in this implementation is the double underscore before the message field name. These underscores are Python's approximate implementation of private scope, which uses a technique known as name mangling. Another feature of Python is that it is duck typed, which means that any two objects that have the same interface can be used interchangeably. Furthermore, since Python is byte-compiled on the fly as it is executed, the interfaces are not checked until runtime. Thus, if I were to reuse my Howler class from the previous slide and add a Happy class with an identical interface but a different message, I could use Howler and Happy interchangeably. In my driver code, I instantiate an instance of Howler named h and an instance of Happy named i. I then loop over both objects, collected in a tuple, calling the scream method on each object. This is an ideal example of polymorphism: the same code can be written to work with two different types of objects. The duck typing used in Python means that I can distribute my class in compiled bytecode form along with some documentation explaining the interface.
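Since the slide is not reproduced here, this is a sketch of the Python implementation as described, including the Happy class; the Happy message text is my own invention, as the transcript does not give it:

```python
class Howler(object):                    # subclass object: a new-style class
    def __init__(self):                  # the constructor is always __init__
        self.__message = "hello world"   # name-mangled "private" field

    def scream(self):                    # self is the instance, passed implicitly
        print(self.__message)


class Happy(object):                     # identical interface, different message
    def __init__(self):
        self.__message = "happy happy"   # hypothetical message text

    def scream(self):
        print(self.__message)


# Driver: duck typing lets Howler and Happy be used interchangeably.
h = Howler()
i = Happy()
for obj in (h, i):                       # loop over both objects in a tuple
    obj.scream()                         # polymorphic call on each object
```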
I thus maintain information hiding, since both the implementation and the instances are tightly encapsulated. Python classes can be easy to reuse. Of course, nothing is free, so there is a downside, which is that Python execution is slower than C in almost all cases; comparison to C++ and Java is more difficult. Finally, I'd like to demonstrate procedural programming in a supposedly object-only language. Java is sometimes criticized for making it impossible to design non-object-oriented systems, and while this might technically be true, it is not a practical issue. By implementing an application using a single class in which each method is static, I can write a procedural Java program. Here I have a static add procedure that adds two integers together and returns an integer. In my main method, which incidentally is always static in Java, I initialize two primitive integer variables x and y. I then call the add function and print the result to standard output. Nowhere in this code do I ever instantiate any objects with the new keyword; the System.out object already exists and is provided by the Java virtual machine automatically. I can make Java code even more procedural by removing the main method and placing my code in a static initializer block. The only catch to this implementation is that I must call System.exit at the end of the block; otherwise, the Java program would print my hello world message, then immediately crash with a NoSuchMethodError informing me that I have no main. So, to summarize, I would like you to come away from this comparative languages discussion with an understanding that object-oriented designs are independent of the implementation language. You can implement an object-oriented design in a procedural language, though it does require a little bit of work and a lot of discipline. You can also implement a procedural program in a mandatory object-oriented language, though you might need to violate the spirit of the language to do so.
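A sketch of the procedural Java program just described; the class name and the particular integer values are my own choices, since the transcript does not give them:

```java
// Procedural Java: a single class, all methods static, and no objects
// ever instantiated with the new keyword.
public class ProceduralDemo {
    static int add(int x, int y) {     // a static "procedure"
        return x + y;
    }

    public static void main(String[] args) {
        int x = 2;                     // primitive variables, not objects
        int y = 3;
        System.out.println(add(x, y)); // prints 5
    }
}
```

The even-more-procedural variant described in the lecture would move the body of main into a static initializer block (static { ... }) ending with a call to System.exit(0), to avoid the NoSuchMethodError that would otherwise follow.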
In this lecture, I will discuss page replacement as it is used in the virtual memory subsystem. I will discuss global and local page replacement, page table entries that support page replacement, and a number of classical page replacement algorithms. Whenever there is a demand for memory that is greater than the actual amount of physical RAM installed on the system, the operating system must determine which pages will be kept in memory and which pages will be swapped out to disk. When memory frame contents are swapped out to disk, the operating system needs to find a page of memory that is not currently in use. Furthermore, in an ideal case, the operating system should also pick a page that will not be used for some time, so as to reduce the total number of page swaps. In order for the page swapper to operate, some additional data must be kept about each page, including whether or not the page has been referenced and whether or not the page has been altered since it was last swapped in. Decisions regarding page swaps can be made globally or locally. With global replacement, any page in the system is a potential candidate to be swapped out in favor of another page. While simpler to implement, global page replacement does allow processes to steal memory frames from each other. On the other hand, local replacement policies allocate a limited number of frames to each process; when a process exceeds its frame allocation, only frames belonging to that process are selected for replacement. Implementing a local page replacement algorithm is more complex than it seems on the surface, largely due to Belady's anomaly: allocating more frames to a process does not necessarily reduce the number of page faults that occur as a result of that process. In fact, with some replacement algorithms, increasing the number of available frames actually increases the number of page faults that occur. At the opposite extreme, allocation of too few memory frames to a process also increases the number of page
faults. Without enough physical memory, processes will spend more time page faulting, or swapping, than they will spend executing. The goal with a local replacement algorithm is to find an optimal working set for each process. This working set is the minimum number of frames that a process actually requires in order to execute at some desired level of efficiency. Regardless of whether global or local replacement policies are chosen, the operating system needs a few pieces of information to implement page swapping correctly. First, the operating system needs to know whether or not a page that is currently in memory has been referenced by a process: pages that are loaded but unused might be better candidates to be swapped out to the backing store. The second piece of information that needs to be stored in the page table is the dirty bit. This bit is set to one whenever a process writes to a page, which lets the operating system know that the copy of the memory frame contents on the backing store needs to be updated whenever the page is swapped out. Keep in mind that on systems with hardware-managed page tables, such as the x86 and x86-64 platforms, the MMU updates these bits automatically. Whenever a page fault occurs, the operating system must locate the desired page on the backing store. Then it must find a free frame in RAM into which the page can be loaded. If no memory frames are free, the operating system must select a victim frame to be swapped out to the backing store. The algorithm that is run to determine which frame will be the victim is called the page replacement algorithm. Page replacement algorithms ideally should minimize the total number of page faults in the running system, in order to maximize system performance. Let's take a look at several classic page replacement algorithms. The first classical page replacement algorithm we will consider is the random algorithm: whenever a page swap is required, this algorithm simply picks a victim frame at random. In practice, this
random selection often picks a page that will be needed in the near future, leading to another page fault in a short time period; as such, it is not effective for minimizing page faults. Another ineffective algorithm is to select the oldest page, or the page that has been in memory for the longest period of time. Unfortunately, this page could be frequently accessed, so if it is swapped out, another page fault could be triggered in a short period of time to bring it back into memory. Somewhat counter-intuitively, selecting the frame that has been accessed the least frequently is also ineffective: a page that is used relatively infrequently might still be used regularly, which would lead to another page fault to bring this frame back into RAM. The most frequently used algorithm picks whichever frame is being used the most and selects that frame to be swapped out to the backing store. This is a completely stupid idea, since this page is likely to be accessed again shortly after it is swapped out. A good algorithm for choosing victim frames is the least recently used, or LRU, algorithm. This algorithm selects the victim frame that has not been accessed for the longest period of time. Unfortunately, with current hardware, there is no good way to track the last memory access time. Tracking every access in software would be a terrible idea, since such a scheme would require an interrupt on every memory access; thus, it is impractical to implement LRU directly. Most implemented page replacement algorithms are approximations of LRU. However, theoretically, the optimal algorithm, or OPT, is the best page replacement algorithm to use. With this algorithm, the operating system picks as the victim a frame that will not be accessed for the longest period of time, delaying a future page fault related to the corresponding page for as long as possible. A mathematical proof exists showing that OPT is the best possible page replacement algorithm. Unfortunately, OPT is also impossible to implement, since it must be able to predict all memory accesses ahead of time.
As such, we are left with LRU approximation algorithms, such as the not used recently, or NUR, algorithm. NUR tracks frame accesses using a combination of the reference bit, the dirty bit, and/or an age counter. This algorithm produces reasonable performance in actual implementations. In this lecture, which is presented in two parts, I will begin discussing processes. I will introduce the process model, discuss the type of information associated with a process, give an overview of process state, and introduce the concept of process forking. As is usually the case, my presentation is focused on Unix-like systems. Let's begin by defining what a process is. A process is an instance of a computer program in execution. When we ask a computer system to run a program, the code for that program is loaded from disk into memory and executed as a process in the system. On some platforms, processes might be called jobs or tasks. On a modern system, a process consists of one or more threads of execution; in other words, a process can execute one instruction at a time, or it can execute several instructions at the same time on the CPU. Each process on the system receives its own private allocation of resources. Each process also has access to its own data, and the operating system maintains statistics about each process in order to make effective scheduling decisions. A process is divided into segments. Program code and other read-only data are placed into the text segment. Global variables in a program have their own data segment, which allows both reading and writing. Automatic variables, or local variables in functions, are placed on the stack. Data structures explicitly allocated at runtime are placed on the heap. As memory is used by a process, the stack and the heap grow toward each other. If a process makes use of shared libraries, these libraries are mapped into process memory between the stack and the heap. In order to track processes
correctly, and allow multiple processes to share the same system, the operating system must track some information that is associated with each process. This information includes the memory that the process is using, as well as the current location in the process code that is executing, known as the process program counter. The operating system must also track other resources in use by a process, including which files are currently open and any network connections the process is using. In addition to the information generated by the process itself, the operating system must keep scheduling information and statistics about each process. This information includes a unique identifier, or process ID, that can be used to distinguish processes from each other. In order to arbitrate access to system resources, the operating system must also store information about the owner of a process, so that permissions can be enforced correctly. To facilitate scheduling decisions, the operating system collects various statistics about process execution, such as the amount of CPU time consumed and the amount of memory used. During the lifetime of a process, the process moves between several states. When a process is first created, it is initially in the new state. Once creation is complete and the process is ready to run, it transitions to the ready state, where it waits to be assigned to a CPU core. When the scheduler selects a ready process to run, that process is moved to the running state and is given CPU resources. During execution, a process might request external resources, such as disk I/O; since these resources take time to provide, the process is moved out of the running state and into the waiting state, so that the CPU core can be given to another process. Finally, when a process is finished, it is placed in the terminated state, so that the operating system can perform cleanup tasks before destroying the process instance completely. In this diagram, we can see how processes may transition between states. At creation
time, a process is placed into the new state while the operating system allocates initial memory and other resources. Once creation is complete, the process is submitted to the system and placed in the ready state. Whenever a CPU core is available to execute a process, the process is dispatched to the running state, where it executes. Execution of a process can be interrupted for a variety of reasons. If a hardware interrupt occurs, the operating system might have to move the process off the CPU core in order to service the interrupt, returning the process to the ready state. Or the process might make an I/O request, in which case the process is moved to the waiting state while the system waits on the relatively slow I/O device to provide the requested data. Once I/O is complete, the process is moved back to the ready state, so that it can be scheduled to run again whenever a CPU core becomes free. Upon exiting, the process is moved to the terminated state for cleanup. The mechanism for process creation is platform dependent. I will be introducing process creation on a Unix-like platform, such as Linux or Mac OS X. On these platforms, all processes descend from a single parent process that is created by the kernel at boot time. On Linux, this first process is called init, which is the common Unix name for the first created process; Apple decided to call this process launchd on Mac OS X. By convention, the initial process always has a process ID of 1.
The initial process must also remain alive for the entire time the system is up and running; otherwise, the whole computer crashes with a kernel panic. The init or launchd process is started by the kernel at boot time, and every other process on the system is a child of this special process. Child processes on Unix are created by forking a parent process: the parent process makes a system call named fork, which makes a copy of the parent process. This copy, which is initially a clone of the parent, is called the child process. It is up to the parent process to determine what, if any, resources it will share with the child process. By default, the parent process shares any open file descriptors, network connections, and other resources, apart from the CPU and memory, with the child; however, the program code can close or reassign resources in the child, making the child completely independent of the parent. Once a child process is forked, the child becomes an independent instance of the program, which can be scheduled to run in parallel with the parent. However, the parent process can be coded to wait on the child process to finish executing before the parent proceeds. Furthermore, the parent process is able to terminate the child process at any time. On some systems, termination of the parent process will terminate all child processes automatically; on other systems, child processes become orphan processes whenever the parent terminates. Unix-like systems, including Linux, have the ability to support both models. Any process, including a child process, has the ability to load a different program into its memory space. This loading is accomplished via the exec system call, which replaces the entire program code of the process with the program code from a different program. New programs on Unix-like systems are started by forking an existing program, then executing the new program in the child process. When multiple processes are executing on the same system, they have the ability to execute independently or share information between themselves.
An independent process is completely separate from other processes in the system: its execution is not affected by other processes, and it cannot affect other processes, as long as the operating system is designed and implemented correctly. Alternatively, processes could share information between themselves and thereby affect each other. When this occurs, we say that the processes are cooperating. Cooperating processes may be used for a variety of reasons, including information sharing, implementing high-performance parallel computation, increasing the modularity of a program implementation, or simply for convenience when implementing certain designs. In part two of this lecture, I will provide additional detail about process forking and executing new programs. In this lecture, I will discuss process management. I will discuss process contexts, context switches, and process scheduling. Each process running on a system has a certain amount of information associated with it. A minimal set of state information that allows a process to be stopped and later restarted is called the process context. Process context includes the current contents of the CPU registers, the current program counter value for the process, and the contents of the RAM the process is using. Switching between processes on a system is often called a context switch, although process switch is a more precise term: the operating system can perform a context switch from a process into a section of the kernel, and then back to the same process, without actually performing a process switch. Context switches require at least one mode switch to perform, since the CPU must enter supervisor mode to enter the kernel. Relatively speaking, context switches are a fairly expensive operation, and frequent context switching will reduce system performance. Whenever the operating system needs to switch from one process to another, it must first make a context switch into the kernel. The kernel then saves the state of the
previously running process and restores the state of the process to which it is switching. A second context switch is then required to start the newly restored process. On Unix systems, processes are created via the fork system call. Following creation, the CPU scheduler determines when and where the process will be run; once the CPU core is selected, the dispatcher starts the process. Whenever a process makes an I/O request, a system call into the kernel is made, which removes the process from execution. While waiting for the I/O to complete, the process yields any CPU cores it is currently using, so that another process can use those resources while the first process waits on the relatively slow I/O device. Operations that cause the process to yield the CPU cores and wait for some external event to occur are called blocking operations. Reading data from a file is an example of such an operation: the process calls the read function, and the read function does not return until it has read some data. The process may have been moved off the CPU, and then restored and returned to the CPU, while waiting for the read function to return. Some processes are CPU bound, which means they perform large computations with few I/O requests or other blocking operations. In order to enable the system to service other processes and effectively implement multiprogramming, the CPU scheduler must preempt these types of processes. Preemption involuntarily saves the process state, removes the process from execution, and allows another process to run. Interrupts from hardware devices may also preempt running processes in favor of kernel interrupt handlers; without this type of preemption, the system would appear to be unresponsive to user input. Now, in order to store process state information, the operating system must maintain data structures about each process. These structures are called process control blocks, or PCBs. PCBs contain fields where information about a process can be saved whenever a process is moved off the
CPU. Once again, this information includes the contents of the CPU registers and the current program counter value. The operating system stores process control blocks in linked list structures within kernel memory space. Here is a process control block in greater detail. Some of the information fields include the process state, the unique ID for the process, the process program counter, CPU register contents, memory limits for regions of memory used by the process, open file descriptors, and other data. When performing a process switch, it is critical that the operating system saves the process's CPU state. At a theoretical minimum, this state information includes the program counter and CPU register contents; in practice, more information will be saved during each process switch. This diagram presents a simplified view of process switching, where only the program counter and register contents are saved and restored. Here, the operating system switches from the process with ID 1 to the process with ID 2. The first step in the process switch is to perform a mode switch into kernel mode, along with a context switch to the section of the kernel that handles process switching. That component of the kernel saves the program counter and CPU registers into the process control block for process 1. Also, the process state for process 1 is set to ready, indicating that the process is ready to run again once a CPU core becomes available. The process switching code then restores the CPU register contents and program counter value from the process control block for process 2. The state of process 2 is changed from ready to running, the CPU privilege level is returned to user mode, and the context is switched to process 2.
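The PCB fields listed above can be sketched as a C structure. This is a hypothetical, greatly simplified layout of my own; real kernels (for example, Linux with its struct task_struct) store far more information, and the sizes here are arbitrary:

```c
#include <stdint.h>

#define NREGS     16   /* number of saved general-purpose registers (assumed) */
#define MAX_FILES 16   /* per-process open file table size (assumed) */

enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

/* Hypothetical simplified process control block. */
struct pcb {
    enum proc_state state;            /* current scheduling state */
    int pid;                          /* unique process identifier */
    uintptr_t program_counter;        /* saved instruction address */
    uintptr_t registers[NREGS];       /* saved CPU register contents */
    uintptr_t mem_base, mem_limit;    /* bounds of a memory region in use */
    int open_files[MAX_FILES];        /* open file descriptors */
    struct pcb *next;                 /* link for membership in a kernel queue */
};
```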
Process 2 now begins executing. During the lifetime of a process, the corresponding process control block is moved between various queues in the operating system. Each queue is a linked list of process control blocks, and multiple linked lists overlap. All PCBs are at all times in the job queue, which is a linked list of all process control blocks in the system. Process control blocks corresponding to processes in the ready state are linked into the ready list. Processes waiting for device I/O have their PCBs linked into various different device queues. In this diagram, we can see the PCBs for five different processes. All PCBs are members of the job list, with list links depicted by the green arrows. Three jobs are in the ready list, and two jobs are in a device queue waiting on I/O; the linked lists within each queue are represented by the dark blue arrows. Notice that the two linked lists for the queues overlap the linked list for the job list. Careful management of linked lists is a major requirement of operating system kernel code. Now I'd like to shift focus for a moment to mention scheduling, since it is closely related to this linked list management. In operating systems theory, we typically divide scheduling into three types: job scheduling, medium-term scheduling, and CPU scheduling. Job, or long-term, scheduling refers to the selection of processes to place into the ready state. This type of scheduler has a long interval, typically on the order of seconds to minutes. One of the clearest examples of job scheduling on modern systems occurs on high-performance computational clusters, on which users submit jobs that may require hours to be scheduled and executed. Medium-term scheduling refers to the swapping of inactive processes out to disk and the restoring of swapped processes from disk; many of these tasks are now part of the virtual memory subsystem. CPU scheduling, or short-term scheduling, refers to the selection of processes from the ready list to run on CPU cores. This type of scheduling will be the subject of future lectures.
One important operating system component related to short-term scheduling is the dispatcher, which receives a process control block from the short-term scheduler and restores the context of the process. The dispatcher also completes the context switch to the process by performing a mode switch into user mode and jumping to the instruction address specified by the newly restored program counter. Thanks for watching. Please subscribe, and don't miss out on new videos and lectures.