Dr. Charles Severance is one of the world's most popular programming instructors. In this course he'll teach you C programming and object orientation, with a little help from the classic C book by Kernighan and Ritchie. This is definitely the place to start if you want to learn C.

Welcome to C Programming for Everybody. My name is Charles Severance and I'm your instructor for this course. This course and website are dedicated to learning the classic version of the C programming language from the 1978 book written by Brian W. Kernighan and Dennis M. Ritchie. This book places the reader in the middle of the 1970s transition from a hardware-centered computer science to a focus on writing portable and efficient software.

C was used to develop operating systems like Unix, Minix, and Linux. Programming languages like Perl, Python, Java, JavaScript, and Ruby are all written in C. Software like the early TCP/IP networking stack implementations that made the internet possible was written in C, and the first web browsers and web servers were written in C. Writing software in C enabled major advances in computer architecture and performance: operating systems, compilers, and utilities could be recompiled to work on a new hardware platform once we had a C compiler for the new hardware.

So much software has been written in C over the past 40 years that there's a very good chance that much of the software you use every day was either written in C or written in a programming language that was written in C. So we study C less as a programming language to use on a daily basis and more as the foundation of modern software and computing. In many ways C is the technology equivalent of the Rosetta Stone, in that it provides a connection between the programming languages of the past and the programming languages of the present.

The name cc4e in www.cc4e.com refers to the original Unix command cc, which was the command that you used to compile your C program. cc stood for "C compiler", and it is featured on the first page of
the first chapter of the K&R C book. Programmers like me from the 1970s and 1980s typed cc on Unix systems like the AT&T 3B2 to compile and run their first "hello, world" program in C.

This material is being presented under fair use, as we are making use of material from a copyrighted work that is out of print and not broadly available in any format. The book is also not available in any accessible format. We are making use of this material in a teaching and research context, with a focus on studying its contribution to computing history. The material is available for free and online to anyone who wants to learn about the history of the C language, computing, and computer architecture. Welcome to the course.

[Music]

Hello, and welcome to C Programming for Everybody. This lecture is putting C in a bit of a historical context. Now, if you're watching this lecture you're probably familiar with some of my other classes. I just want to let you know that I've been building a lot of classes. Most of you probably took Python for Everybody, available on Coursera, edX, and many other platforms, but I have a whole series of classes designed around what I call the path of the master programmer: no matter where you start, I want you to be able to learn to be a really good programmer and follow along as far as you want. These are all my materials. They're all 100% free, open, and online, and they're really aimed at teaching everybody how to program. I started doing this back in 2012 with Coursera, and I have dedicated myself to
making all my materials free and to creating a path that anyone can take, anywhere in the world, regardless of economic challenges or other things in your life. I want everybody to have an opportunity to be a professional developer. And frankly, if you haven't taken Python yet, like my Python for Everybody class, this course is going to be a little bit difficult.

So let's start with a history of C. The book we're looking at, The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie, was published in 1978, and the key thing is that it marks a moment in history where everything changed. So we're looking at this textbook, the text in it, and the language itself in the context of how it has impacted history.

The C programming language itself has a long history. There was a language called B, and they were using it at AT&T Bell Labs to build utilities and operating system code, but it was a little too word oriented. New computer hardware was coming out that supported byte addressing, the ability to load and store a string of bytes rather than a set of words. Words were larger than a byte, and in the 1960s and early 1970s more than one character was packed into a word. C wanted to make a character a core, low-level kind of data that the language would handle.

From the early to mid 1970s, C and Unix co-evolved. They wanted to build something that would make Unix work well on a PDP-11/20 and at the same time make it so they could port Unix to other systems. But really it was about the PDP-11/20's cool memory architecture, having to do with byte addressability. What happened was that they were carefully rewriting Unix in C, but then fixing C, laying the groundwork for Unix portability. And so by 1978 the K&R C book was published, and at that point you could
think of it as a summary of over a decade of research into how to build a portable programming language and then use that portable programming language to build a portable operating system: C and Unix.

By 1989 C had become popular and there was a need to standardize it. So there's a variant of C called C89, which is the ANSI standard, and that same version was called C90 by ISO, the International Organization for Standardization, which also standardized it. That was our first version of C that we could all agree on. ANSI C did not intend to go too far away from what we call K&R C; instead it just nailed down a few things that by then were important to nail down.

C has continued to evolve from 1990 to the present, and there have been a number of major revisions. But the key thing these revisions don't do, even in the modern version of C, is attempt to make C an easy-to-use language like, say, Python or JavaScript. C knows its place in the panoply of languages and does a good job of that.

So if we look at the future: C is a difficult language to use as a general-purpose language. Python is a great general-purpose language, but it's not a great systems programming language. The two things that are missing from C are really solid dynamic memory support in the core types and libraries, and a safe string type. There's no string in C. It's character arrays, and arrays have sizes, and if you start putting stuff beyond the boundary of that array, things just blow up.

C++ is, to me, not the future version of C. It's really a more powerful and flexible version of C for programmers who are doing really professional, intricate systems applications. Writing good C++ in some ways is more difficult than writing good C. The languages that take on C's mantle in the general-purpose space are things like Java, JavaScript, C#, or
Python. The key thing with these languages is that they don't give you strings as just raw byte arrays, and they give us a simple object-oriented layer that keeps us away from the metal. The goal of C is to get close to the hardware, close to the metal. So Java, JavaScript, C#, and Python are all great languages, and they're great for what we use them for. They're just not well suited for writing an operating system kernel.

The most likely language to be the next C is probably Rust. The idea of Rust is that it stays close to the metal but gives us some simple and safe core data types. Recently Linux has started to accept some Rust, and that means Rust has to be mature. It means Rust can't be evolving rapidly. I've seen situations where operating systems like macOS decide to depend on Python, so there are parts of macOS that depend on Python 2, but then they can't really upgrade to Python 3 because of their operating system, and so on. So for an operating system to depend on a programming language like Rust, the language really has to be mature and, even more importantly, stable. You can't have clever innovations in the programming language causing regressions in an operating system like Linux. So look for Rust.

Now, C has been around a long time. We call C starting in 1972, and the book was published in 1978. Before C, most of us would write assembly language or Fortran. Some people wrote PL/I; that's not on here. Fortran is not really a general-purpose programming language. You wouldn't write a command like cat in Fortran. Fortran was really for scientific computation. The earliest computers in the 1950s and 1960s were either specialized toward things like payroll and HR systems or specialized toward doing computations, and the science ones used Fortran because it was just the right language for those computers that were aimed at doing
scientific calculations. C as a language was kind of none of the above, in that it was aimed at writing system code: a kernel, an operating system, and the utilities around it, including other languages. So C is kind of the mother tongue of all kinds of derivative languages. Things like the Bash shell, Perl, Python, PHP, C++, JavaScript, Java, C#, and Objective-C are derivatives of this beginning of C, and that's why you see a lot of patterns in these other languages that are similar. JavaScript and Java, for example, both inherited their for loop syntax from C.

I've got a couple of videos in this section. One is Brian Kernighan talking about the C programming language. It's a short video; I didn't produce it, but it's a great little video. Another video is from the creator of C++, Bjarne Stroustrup. This was an interview that I did for IEEE Computer magazine.

On top of this history of the C language, we can look at a brief history of computers. I have a whole course called Internet History, Technology, and Security that really starts in the 1940s, with a focus more on communication than computation, even though communication and computation have been very much connected from the 1940s through even today.

In the early 1950s, you'd best think of computers as multi-million dollar strategic assets, every single one, and a lot of them were custom built. The first computer at Michigan State University, where I went to undergrad, was built by the electrical engineering students of that university, based on some designs that they'd borrowed, I think, from Iowa. So with things like the programming language and the operating system, you didn't have a lot of generalization and you didn't have a lot of sharing. You tended to write code, put it on a paper tape or later a magnetic tape, load it, and run it. You were just pretty happy if the code worked. You didn't need an operating system. These weren't
multiprocessing computers, and so the software environment was very minimalist.

But in the late 1950s and early 1960s, companies like IBM and Digital Equipment Corporation began selling general-purpose computers. They could make them, and they started selling them. They still were expensive, and a business would have only a couple of computers to help with payroll, perhaps, or something else that was really, really important, because the computers were expensive.

In the 1960s we really got to the point where the computer componentry, the chips and so on, were becoming commodities. You could just go to a place and buy chips, and then you could make a computer by buying a bunch of chips and putting them together. Because you weren't building everything from scratch, the cost got a lot lower. The other thing about these less expensive computers is that they were a little slower.

By the end of the 1960s there were a lot of computers. There were some super expensive, weird, one-off, small-production computers. There was the previous generation of minicomputers, where there were lots of them lying around old computer science departments or businesses that weren't sure what to do with them because they wanted to buy a new one. And then there were innovative new low-cost computers coming out. In the 1970s, in this milieu of lots of new and old computer hardware, the question was: is there a way we can do things with all of this old hardware, and is there one solution? That's where Unix and C came in.

After the 1970s we look at the 1980s, and that's where microprocessors and personal computers came in. We went from computers that were the size of refrigerators or desks to a computer that could be on a single chip. In the beginning those personal computers, like the IBM PC or Commodore PET, had
really bad performance, but once you could get everything on a single chip, that performance could get faster very quickly. And because personal computers became a mass-market item, a lot of money could be invested in them.

By the 1990s personal computers continued to grow, but the ability to communicate, talk, and exchange information became important. So in the 1990s we saw an increasing focus on connecting computers with the internet and other kinds of networks. The price of these computers kept going down and the performance kept going up. Then by the time we get to the 2000s, Amazon's AWS was founded in 2002, and it used personal computer microprocessors, like those from Intel, and produced computing as a commodity. You don't even buy computers anymore. You just go to Amazon and say, "I'll rent a computer for $7 a month."

So 1978 is this moment where computers were becoming more common, going down in price, and there were getting to be more and more of them, and there was a diversity of computers. These days there's actually less diversity.

If we go back in history and you take a look at my internet history class, I go to Bletchley Park and show you some of the earliest computations, from the 1940s in World War II. We go to the Computer History Museum in California and visit with Gordon Bell and talk about the PDP-1, and he talks a lot about buying the components and putting them together, and how the mass availability of relatively low-cost components really allowed for rapid innovation in computer architecture. A computer that I used in my computer science degree was the Control Data CDC 6500, and I have a video where we visit the Living Computer Museum in Seattle, Washington, and here the card readers work, and the card punches. If we ever get to the point where everything opens back up again,
it's a tremendous visit to go see all the technology through the years, except that at the Living Computer Museum they like to have everything running. Then you can take a look at a more modern, smaller computer, the Raspberry Pi. The Raspberry Pi is actually not an Intel-based system. It's based on ARM, the processor that really became popular as a result of the cell phone revolution. The Raspberry Pi took the low-power, high-performance technology that was advanced because of mobile phone innovations and turned it into a good single-board computer.

So let's take a look at the operating system. Unix is the operating system that is connected with C. In the 1960s there was a multi-user operating system called Multics. Then in the 1970s they wanted to come up with yet another operating system, and they eventually called it Unix. It ran on the DEC PDP-11/20, which was one of these new commodity-part-based computers coming into the marketplace. In 1973 Unix was rewritten in C, but it only ran on the PDP-11. They had laid the groundwork for portability from the beginning. They knew they wanted everything to be portable; they just couldn't make it all portable in the first version. They just had to make it run on the PDP-11.

Then by 1978 the second computer that Unix ran on was an Interdata 8/32, and that was quite a different computer. That was good, because they really learned a lot about making Unix a portable piece of software. From the early 1970s, C was evolving in a way so that Unix could be ported. So it's like: we've got this problem between, let's just say, the PDP-11 and the Interdata, and how can we fix this? We can change how the operating system works, we can change the operating system code, and we can change the C compiler, and then we can rewrite the operating system
code to get less and less assembly language and more and more C. The idea was to get to the point where there was a very, very small amount of assembly language in Unix, and over the years that amount has gotten lower and lower.

Unix was rewritten, and a number of versions came out in the 1970s having to do with portability. By 1978, Unix Version 7 could also run on a whole new architecture from DEC called the VAX systems. The University of California, Berkeley had its own distribution of Unix called BSD, the Berkeley Software Distribution, and that was really cool because universities often pushed things like networking, TCP/IP, and ARPANET. BSD Unix was the first place some of us saw TCP/IP.

In 1982 a company based solely on Unix, called Sun Microsystems, was founded. Sun came out of some work at Stanford and some work at Berkeley based on Unix, and they created what in effect was the Unix workstation marketplace. At this point you could imagine that the world was about to just adopt Unix. Unix was the greatest thing ever. Computer science departments were teaching Unix in their operating systems classes.

In the mid 1980s the problem became that AT&T had never really come up with a business plan for what the purpose of Unix was. There were some fits and starts as to how they could monetize this extremely popular thing, and they didn't do a great job. It took them a long time to figure out what was going to be successful, and by the time AT&T figured things out, the market had moved on.

Minix is an operating system that was developed in the Netherlands by Andrew Tanenbaum. He built a completely free and open-source operating system that was used for education. He built a textbook around it and it was very popular, but he didn't want commercialization, at least not at that point in time, so he held on to it too tightly. Again, kind of an intellectual property mistake. And in 1991 a programmer named Linus Torvalds
decided he was going to build a fresh, ground-up implementation of the Unix kernel that was 100% free. He wasn't going to use Unix and he wasn't going to use Minix. He wanted to create another thing, and originally it was just a hobby: "I'm going to try to see how far I can go." By 1992 Linux started to work, and it adopted a license called the GPL, the GNU General Public License, which is a strongly open-source license in that it's difficult to take Linux out of open source. That meant people could invest in Linux. And so Linux has become Unix. Linux in the modern world is the Unix-like system. Unix tried to hold on for a while, but it really couldn't, and so the remaining Unix distributions are a tiny, tiny fraction of the marketplace, and Linux is the marketplace. We can see some of this in the videos that I've got: Andrew Tanenbaum will tell us the story of how Minix was created and how Minix kind of begat Linux. There's some interesting stuff here.

OK, so I'll lay my own history on top of all this. Remember that 1978 is this moment in history where I claim everything changed. I've been a computer scientist for a very long time. I started in 1975, and in 1975 I learned things like Fortran on that CDC 6500 computer. It ran a special operating system called SCOPE/Hustler. I knew how to do assembly, Fortran, and even Pascal. Pascal in the 1970s was one of these languages (and I've got a video for Pascal as well) that was sort of saying, "Look, here's the future." Pascal was really aimed at teaching, so I used Pascal in an educational context.

As I became a professional programmer in the 1980s I was using a bunch of stuff: assembly language, COBOL, a little bit of Unix and C. It wasn't as if by the early 1980s Unix was everywhere, at least not in the professional world, because the PC revolution was happening. So we had DOS, we
had early Windows versions, and I used things like dBase and Turbo Pascal. I taught classes on the IBM 360 and I taught assembly language. I used the DEC VAX with the VMS operating system, not Unix, and Fortran. Then in the mid 1980s I taught on an AT&T machine, I think it's called an AT&T 3B2. I taught Unix, I taught C, and I taught Fortran. So even though we can trace back to 1978 as the moment that everything changed, it's not as if the market changed right then.

The thing that happened in the 1990s, though, is that alongside all these older computers and the older computer vendors like Burroughs, Unisys, IBM, Cray, and Control Data, as microprocessor speeds increased, all these little companies would create Unix workstations that were faster and faster and faster. Because they could build new hardware that was fast and then grab the Unix operating system and make it work on their new hardware, it allowed for some amazing innovation. I was using Sun, Ardent, and Stellar (these are all gone now), the IBM RS/6000, and the Convex C-240, and I had a NeXT on my desk. I used C, TCP/IP, HTTP, and the web. Windows and Mac OS were all kind of in the mix, but the innovation was really happening most rapidly in the Unix space.

By the 2000s everything that I touched had some aspect of Unix in it. I'm not a Windows person. Mac OS has Unix in it, there was Linux, and my languages were narrowing back down to Java, PHP, and JavaScript. And then in the 2010s: Linux, Mac OS, Python, PHP, JavaScript. So things have really settled into a world, in my own software development, where I'm pretty much using Unix-like systems all the time and C-based or C-derived languages, because Python, Java, PHP, and JavaScript are all based on C. And that's why it's so important to understand C. I'm not writing C as a professional in any way, but I am using all my C knowledge every single day. And so that's a picture of me in my office;
that's an IBM PS/2 in my office in 1989. When I first started, I was registering with punch cards, and I was using a line printer and punch cards. So there's my history.

Now, I think it's important to acknowledge that, as I've shown you all these history documents and the videos and the oral histories that I've done, there is a preponderance of old white males in here. I think it is important to talk a bit about that and why that is, perhaps. So I would encourage you to take a look at an article (it's actually something you can listen to) about what happened to computer science. The title is "When Women Stopped Coding." All the way up to 1980, computer science was just another field, and the share of women kept going up to about 1985, but then it went down. All these other fields have a pretty good gender mix, and computer science does not. So I would encourage you to listen to this article.

Another place you can learn something about diversity in computer science is the book by Jane Margolis called Unlocking the Clubhouse. The essence of the book is how social pressure and advertising pressure in the early 1980s made it seem as though computing was a guy thing. In particular, young men, even in high school, would go fiddle around with computers as a hobby, and by the time they got to college they were pretty skilled. The college classes were designed for people who already knew what programming was, so if you were a woman and you hadn't played with computers in 1980 when you were 14 years old, you came to college at a distinct disadvantage. This is what led to the falloff: colleges tended to teach to students who already knew how to program, and that really meant that the hobbyists, people who in their youth before they went to college had played
with programming as a hobby, had a tremendous advantage.

Now, Jane did her work in the late 1980s and 1990s. I myself am older than that, and I think that to some degree this notion of social pressure that said computing was really for young men and not young women is actually as much a symptom as it is the cause, because I grew up in a time in the mid 1970s when there were women everywhere. This is a person who had a really large impact on my life. Her name is Helen Spence. She was a professor at Michigan State University, and I show you a URL to look at her oral history. She taught me operating systems. In 1977 or 1978, I don't remember which, Helen was my operating systems teacher.

The interesting thing about how Helen taught operating systems was that she made it fun. It was really fun. I encountered my first autograder in Helen's class. She had made a piece of software that she called the HMS 5050, a name that is itself both Helen Spence and, I think, Her Majesty's Ship at the same time. She built an operating system simulator in Fortran, and our job as students taking the class was to build functions and subroutines that would implement the tasks and algorithms that operating systems need to do. You'd run your program over and over and over again until you had written a successful operating system. She wrote an autograder that would do things to your program and then measure whether or not you knew how to build an operating system.

It was just a natural thing to have women as faculty members, and there was a lot more fun in the field. I have my own theories that I think complement Jane's theories as well, but I'll leave it here; that's a TED talk for another day. As to gender and computing, right now I'm just acknowledging it and giving you some resources to read so you can reach your own conclusions.

If you look at everything I've done
since 2012 in the free online education and MOOC space, you'll notice that every course I've ever created has the words "for Everybody." That's not just a marketing thing. It's my philosophy. If you've taken some of my classes, like Python for Everybody, Django for Everybody, Web Applications for Everybody, PostgreSQL for Everybody, and now C Programming for Everybody, I try to build a course where I spend time thinking about how not to create the clubhouse that Jane talks about. I want you in the club. I want everybody in the club. I don't want it to be a club just for the super geniuses. Part of why I'm building an entire curriculum is perhaps just to make a new club, the programming-for-everybody club. I think once you become a great programmer, that's a good time to study computer science. So perhaps after we teach everybody programming, then we can teach everybody computer science. But again, that's a TED talk for another day.

So, on to the course, C Programming for Everybody. I'll just say that C is the most important programming language you're ever going to learn. It should never be the first programming language taught to any student. You will likely never write a line of C in a professional context in your career. I'm not teaching you C so that you can go be a professional C programmer. I'm teaching you C so you can be a professional Java programmer, really. If you learn C at the right time in your learning journey, it's a necessary step on the path to becoming a master programmer. It's an important step because I can't teach you Java if I haven't already taught you C. I have to explain how Java works in terms of what you're going to learn in this class.

So this class is not just "get a C certificate and go make lots of money," because I don't think this certificate is going to make you a lot of money. But I do think this certificate is going to unlock your future as a programmer. So please be patient with this
material. Do not rush. It's an online class. I guarantee that you can search for solutions to every programming exercise that I create. I don't change them that much, so the solutions are out there. If your goal is to game it and just search for solutions, go ahead, finish the class, congratulations. But you have just taken the opportunity to learn away from yourself. You didn't trick me. I didn't lose anything by you searching for and pasting in solutions to programs. Each exercise, from beginning to end, is trying to take you a tiny step further in your understanding. The easy exercises in the beginning, even if I have a bunch of them, are there to prepare and strengthen you for something much more challenging later in the course. If you start pasting in solutions at the beginning of the course, you have no chance at the end of the course, and you're just going to be pasting in solutions. When you do that you'll have only wasted your time.

So I look forward to you taking the rest of this course. I look forward to you telling me whether I'm right in my instinct about how important this is for you, especially after you take a few more classes and then find your way to being a professional.

[Music]

Bell Labs: a bunch of smart people. Call it 1,200 PhD-level people in Murray Hill, in basically one giant building. That's a lot of people rubbing up against each other, and they're all doing technical kinds of things. The environment did not tell people what to do. It was: go do your thing, and once a year, on one side of one piece of paper, tell us what you did, and that'll determine how much money you get next year. But it was a very long cycle. The one extra thing is that it was a problem-rich environment, so there were things that you could work on, and there was, I think, a very gentle gravitational drift towards doing something that somebody else might care about. Now, lots of people say, you know, "Who cares what
other people think, I'm going to do my stuff. But I think most people got some reward, you know, psychic reward, from: I do something, I give it to you, and you say "that was great," and then you say "but," and you tell me all the things that aren't yet great. That kind of thing was very common at the Labs, because AT&T at that time basically supplied telephone service to most of the United States, millions of people, with all kinds of problems in and around communications. So no matter what you were interested in, there'd be some part of AT&T that could probably make use of it. So that was part of it.

But then there was the external world as well, the research community. Bell Labs was just part of the academic research community. So there were both outlets, and the Labs was perfectly happy to have people do either. All of that, I think, worked out really very well. And it helped to have stable funding, because basically at that point, if you made a long-distance phone call in the United States (remember the concept of long distance, right?), a tiny slice of the revenue from that call financed Bell Labs, with the charter of: make the service better, and we won't worry about the details of how you do that.

At the time there was a lot of interest in programming languages. This all came out of the Multics experience, where people at Bell Labs, and of course the folks at MIT, had realized that writing things in a high-level language made sense. Then the question is, what's the high-level language? They started with PL/I, which in the abstract sounded like a good idea and in reality was a horrible idea, because it was a horrible language. And so Martin Richards, from the University of Cambridge, had this language called BCPL, and he had spent, if I understand it correctly, a sabbatical year at MIT and planted the language there, in some sense. It was much simpler, much cleaner, and much better suited to system programming kinds of things than any
version of PL/I would have been. And so the people at Bell Labs, Ken Thompson, Dennis Ritchie and so on, had gotten some experience with high-level languages as suitable for writing lots of different things. BCPL wasn't a suitable thing for modern machines because it was typeless, and newer machines were clearly coming on stream that would have types like bytes and integers and maybe bigger things. So at some point Ken (this is Ken Thompson) did a lot of experimenting with even simpler versions of BCPL, in particular one called B, which was an interpreted language, no compiler. That was again expressive enough that people started to like it, and that's sort of where I started in on this. I mean, I'd written bits of PL/I; it was awful. I'd written Fortran; better. But B was sort of nicer to use. It was still typeless, though, and an interpreter as well, so it wouldn't be terribly efficient.

But with the PDP-11 in the offing (and I don't remember the exact timing here), it was clear that a version of something that felt sort of like B, but which had some mechanism to include types so that you could talk about characters or integers, was going to be the way to go. That's where Dennis picked up and started developing the C language, the compiler to go with it, and so on.

Portability was very much on people's minds at that time, because although the core Unix work was done on the PDP-11, there were other machines at the time that were, you know, in the same equivalence class. Interdata had a couple, the 7/32 and 8/32, numbers like that, and I think there were probably HP machines as well, et cetera. The other thing, which was in some ways harder, was that there were the big mainframe kind of computers that were used by the local computer center, and these were fundamentally stripped-down versions of the Multics machines. They were GE 635 kind of things, big clunky word-oriented machines. They were, you know, in effect IBM 7094s cleaned up a bit. And getting
something that would compile sensibly for those machines, which really didn't have characters, in a language which had become what it was so it could manipulate characters: I think there was a bit of a strain there. But that's portability: how do you get the same language to work on different computers? Dennis's original compiler really was targeted at the PDP-11, and Steve Johnson came along with the Portable C Compiler, which basically separated out the front end: okay, let's recognize the language, let's build some intermediate structure, and then let's generate code for different kinds of machines.

I had written a tutorial on B because, you know, I thought it was interesting; maybe I can tell other people how to use it. So when C started to be used and I became somewhat better at using it, I basically repurposed the B tutorial, brought it forward, and made the C tutorial out of it. And so I used C for essentially all the software that I was writing at that time and, you know, kind of liked it. It was good. It was a nice match for the way people think about computing, I think, but also a very nice match for the actual hardware of the time. You could imagine what the compiler was doing; all of it was clear. So: efficient, expressive, and nicely matched to everything around it.

Then somewhere in probably 1977, earlyish, I coerced Dennis into writing a book about it. The first edition came out in '78, and at that point the language was pretty reasonable. The book, I think, was right on the cusp of whether structures were fully part of the language or not. There was a bit of overhang there, and I don't remember, but I think probably they were not quite, but awfully close. Since Dennis was doing both compiler and book, it was at least a consistent viewpoint. The question was: can you pick up a structure as a unit and pass it around, or do you have to do something special? So that's a milestone, and then the next one is probably the 1988 book and the development of the ANSI C standard,
the first standard, which is essentially again about the same time. So I think those are the ways I measure it. And at that point, call it 1988, give or take, just before you encountered it, C was probably just fine for anything you might reasonably want to do. This is probably heresy or something, but I don't think that the changes and evolution since then have bought enough, in some sense.

[Music]

Welcome to C Programming for Everybody. My name is Charles Severance, and this is my reading of the 1978 C programming book written by Brian Kernighan and Dennis Ritchie. At times I add my own interpretation of the material from a historical perspective.

Chapter Zero: Introduction

C is a general-purpose programming language. It has been closely associated with the Unix system, since it was developed on that system and since Unix and its software are written in C. The language, however, is not tied to any one operating system or machine, and although it has been called a system programming language because it is useful for writing operating systems, it has been used equally well to write major numerical, text-processing, and database programs.

C is a relatively low-level language. This characterization is not pejorative; it simply means that C deals with the same sort of objects that most computers do, namely characters, numbers, and addresses. These may be combined and moved about with the usual arithmetic and logical operators implemented by actual machines.

C provides no operations to deal directly with composite objects such as character strings, sets, lists, or arrays considered as a whole. There is no analog, for example, of the PL/I operations which manipulate an entire array or string. The language does not define any storage allocation facility other than static definition and the stack discipline provided by the local variables of functions; there is no heap or garbage collection like that provided by Algol 68. Finally, C itself provides no input/output facilities: there are no READ or WRITE
statements, and no wired-in file access methods. All of these higher-level mechanisms must be provided by explicitly called functions.

I would note that the lack of a heap or garbage collection feature in C is both one of the great strengths of the language and, at the same time, likely the reason that the average programmer will never develop or maintain a major C application during their career. C provides a simple feature, using the malloc and free functions, that allows a programmer to request that a certain amount of memory be allocated dynamically, use the memory, and then return the memory to the C runtime library for reuse.

For example, to convert a JPEG image to a PNG image, our application will read the JPEG data into memory, convert the image into a PNG image in memory, and then write the PNG data out to a file. We don't know how large the images will be in advance, so we request whatever size we need from C and then give it back when we're done. The term heap refers to the memory that C manages on our behalf when we need to borrow a bit of memory and give it back later.

There are a couple of issues with a simple heap implementation. First, if we forget to call free when we are done with memory, we have created a memory leak, and our program will eventually run out of memory and abort. C places the onus of giving back any dynamically allocated memory on the programmer. Modern languages like Java, JavaScript, and Python keep track of when we stop using dynamic memory, using a dynamic memory layer that can automatically reclaim the memory.

The more difficult problem is that after a series of calls to malloc and free, the heap space becomes fragmented and some cleanup is needed. This cleanup is called garbage collection. Efficient memory allocation and garbage collection have been the subject of decades of computer science research. The Java language has built a number of increasingly effective garbage collection approaches over the years. Kernighan and Ritchie, in one simple paragraph, define most
of the problem as out of scope for the C language, which makes it a bit challenging for us to make good use of dynamic memory allocation in C. But when we do it properly, it performs very well. If you are currently using a language like Java, Python, or PHP, then every time you create a new string through concatenation without thinking about memory allocation, remember to appreciate the decades of work by computer scientists that made it easy for you. Kernighan and Ritchie knew that garbage collection was difficult, so they left it out of the C language and put it into a runtime library.

Back to Chapter Zero.

Similarly, C offers only straightforward, single-thread control flow constructions: tests, loops, grouping, and subprograms, but not multiprogramming, parallel operations, synchronization, or coroutines.

Though the absence of some of these features may seem like a grave deficiency ("You mean I have to call a function to compare two character strings?"), keeping the language down to modest dimensions has brought real benefits. Since C is relatively small, it can be described in a small space and learned quickly. A compiler for C can be simple and compact. Compilers are also easily written; using current technology, one can expect to prepare a compiler for a new machine in a couple of months, and to find that 80 percent of the code of a new compiler is common with existing ones. This provides a high degree of language mobility.

Because the data types and control structures provided by C are supported directly by most existing computers, the runtime library required to implement self-contained programs is tiny. On the PDP-11, for example, it contains only the routines to do 32-bit multiplication and division and to perform subroutine entry and exit sequences. Of course, each implementation provides a comprehensive, compatible library of functions to carry out input/output, string handling, and storage allocation operations, but since they are only called explicitly, they can be avoided if required, and they can
also be written portably in C itself.

Again, because the language reflects the capabilities of current computers, C programs tend to be efficient enough that there is no compulsion to write assembly language instead. The most obvious example of this is the Unix operating system itself, which is written almost entirely in C. Of 13,000 lines of system code, only about 800 lines at the very lowest level are in assembler. In addition, essentially all of the Unix application software is written in C; the vast majority of Unix users, including one of the authors of this book, do not even know the PDP-11 assembly language.

I would note that in this preface the authors are carefully explaining the fact that many of the well-established programming languages of the 1960s and 1970s, like Fortran, COBOL, Pascal, ALGOL, and PL/I, were solving many of the use cases that programmers needed by adding syntax to the languages. The creators of C and Unix were advocating for a more minimal set of programming language constructs and more reliance on calling functions in provided runtime libraries to meet programmer use cases. It may have seemed a strange approach to experienced programmers in the 1980s, but over time it has allowed C to expand to meet a very wide range of programmer needs without requiring major revisions to the core language or compiler.

Back to Chapter Zero.

Although C matches the capabilities of many computers, it is independent of any particular machine architecture, and so with a little care it is easy to write portable programs, that is, programs which can be run without change on a variety of hardware. It is now routine in our environment that software developed on Unix is transported to the local Honeywell, IBM, and Interdata systems. In fact, the C compilers and runtime support on these four machines are much more compatible than the supposedly ANSI standard versions of Fortran. The Unix operating system itself now runs on both the PDP-11 and the Interdata 8/32. Outside of programs which are necessarily
somewhat machine dependent, like the compiler, assembler, and debugger, the software written in C is identical on both machines. Within the operating system itself, the 7,000 lines of code outside of the assembly language support and the I/O device handlers is about 95 percent identical.

As a note: before Unix and C, if you were running the vendor operating system and writing in the best language for systems like the PDP-11 and the Interdata 7/32, the user experience was completely different. Today we take for granted that we can download the same application for Windows, macOS, or a Linux system. Even in the 1970s, those who were using Unix and C could write code once, move it between two hardware platforms, and expect that it would work with no or relatively few changes.

Back to Chapter Zero.

For programmers familiar with other languages, it may prove helpful to mention a few historical, technical, and philosophical aspects of C for contrast and comparison.

Many of the most important ideas of C stem from the considerably older, but still quite vital, language BCPL, developed by Martin Richards. The influence of BCPL on C proceeded indirectly through the language B, which was written by Ken Thompson in 1970 for the first Unix system on the PDP-7.

Although it shares several characteristic features with BCPL, C is in no sense a dialect of it. BCPL and B are typeless languages: the only data type is the machine word, and access to other kinds of objects is by special operators or function calls. In C, the fundamental data objects are characters, integers of several sizes, and floating-point numbers. In addition, there is a hierarchy of derived data types created with pointers, arrays, structures, unions, and functions.

C provides the fundamental control constructions required for well-structured programs: statement grouping; decision making with if; looping with the termination test at the top, using for and while, or at the bottom, using do; and selecting one of a set of possible cases with switch. All of these were
provided in BCPL as well, though with somewhat different syntax; that language anticipated the vogue for structured programming by several years.

C provides pointers and the ability to do address arithmetic. The arguments to functions are passed by copying the value of the argument, and it is impossible for the called function to change the actual argument in the caller. When it is desired to achieve call by reference, a pointer may be passed explicitly, and the function may change the object to which the pointer points. Array names are passed as the location of the array origin, so array arguments are effectively call by reference.

Any function can be called recursively, and its local variables are typically automatic, or created anew with each invocation. Function definitions may not be nested, but variables may be declared in a block-structured fashion. The functions of a C program may be compiled separately. Variables may be internal to a function, external but known only within a single source file, or completely global. Internal variables may be automatic or static. Automatic variables may be placed in registers for increased efficiency, but the register declaration is only a hint to the compiler and does not refer to specific machine registers.

C is not a strongly typed language in the sense of Pascal or Algol 68. It is relatively permissive about data conversion, although it will not automatically convert data types with the wild abandon of PL/I. Existing compilers provide no runtime checking of array subscripts, argument types, etc.

For those situations where strong type checking is desirable, a separate version of the compiler is used. This program is called lint, apparently because it picks up bits of fluff from one's programs. Lint does not generate code, but instead applies a very strict check of as many aspects of a program as can be verified at compile and load time. It detects type mismatches, inconsistent argument use, unused or apparently uninitialized variables, potential
portability difficulties, and the like. Programs which pass unscathed through lint enjoy, with few exceptions, freedom from type errors about as complete as do, for example, Algol 68 programs. We will mention other lint capabilities as the occasion arises.

I would note that separating the checking for things that might be wrong into the lint program keeps the C compiler simple and easy to port to a new computer. The lint program was naturally a very portable text-processing application. While there's some overlap between a lint program and a compiler, over time there has come to be quite distinct research and expertise in how to lint versus how to compile. Modern lint programs look at programs in far more detail than most compilers. Separating the concerns of lint and the C compiler also allows lint programs to use more memory and take more time to execute than compilers. Since the typical developer might use the compiler many times per day and run lint less often, it was nice for the compiler to run quickly and make light use of computer resources.

We call this idea of building two smaller, complementary programs that each specialize in one task separation of concerns, and it is an important principle in computer science. By keeping each component simple and focused, we can more easily build, test, and verify each component. Unix and C showed the benefits of taking a many-small-components approach to solving an overall set of problems.

Back to Chapter Zero.

Finally, C, like any other language, has its blemishes. Some of the operators have the wrong precedence; some parts of the syntax could be better; there are several versions of the language extant, differing in minor ways. Nonetheless, C has proven to be an extremely effective and expressive language for a wide variety of programming applications.

The rest of this book is organized as follows. Chapter 1 is a tutorial introduction to the central part of C. The purpose is to get the reader started as quickly as possible, since we believe strongly that the only way to learn a new
language is to write programs in it. This tutorial does assume a working knowledge of the basic elements of programming; there is no explanation of computers, of compilation, nor of the meaning of an expression like n = n + 1. Although we have tried where possible to show useful programming techniques, the book is not intended to be a reference work on data structures and algorithms; when forced to make a choice, we have concentrated on the language.

Chapters 2 through 6 discuss the various aspects of C in more detail, and rather more formally, than does Chapter 1, although the emphasis is still on working examples of complete, useful programs rather than isolated fragments. Chapter 2 deals with the basic data types, operators, and expressions, and Chapter 3 treats control flow: if-else, while, for, etc. Chapter 4 covers functions and program structure: external variables, scope rules, and so on. Chapter 5 discusses pointers and address arithmetic, and Chapter 6 contains the details of structures and unions.

Chapter 7 describes the standard C I/O library, which provides a common interface to the operating system. This I/O library is supported on all machines that support C, so programs which use it for input, output, and other system functions can be moved from one system to another essentially without change.

Chapter 8 describes the interface between C programs and the Unix operating system, concentrating on input/output, the file system, and portability. Although some of this chapter is Unix-specific, programmers who are not using a Unix system should still find useful material here, including some insight into how one version of the standard library is implemented, and suggestions on achieving portable code.

Appendix A contains the C reference manual. This is the official statement of the syntax and semantics of C and, except for one's own compiler, the final arbiter of any ambiguities and omissions from the earlier chapters.

Since C is an evolving language that exists on a variety of systems, some of the
material in this book may not correspond to the current state of development for a particular system. We have tried to steer clear of such problems and to warn of potential difficulties. When in doubt, however, we have generally chosen to describe the PDP-11 Unix system, since that is the environment of the majority of C programmers. Appendix A also describes implementation differences on the major C systems.

This recording of Chapter Zero of the 1978 C programming book written by Brian Kernighan and Dennis Ritchie is part of my C Programming for Everybody course, where I teach C from a historical perspective. My name is Charles Severance and I'm the teacher of the course.

[Music]

Hello, and welcome to C Programming for Everybody. I'm Charles Severance and I'm your professor for this course. In this lecture we are going to do a very rapid translation from Python to C.

As I've shown you in our earlier lecture, C is kind of like the mother tongue of advanced programming languages. Python itself was written, and still is written, in C, and Python is deeply influenced by C even though the syntax looks very different. If you've taken all my other classes, you will have seen PHP and JavaScript, and to some degree even CSS takes some of its inspiration from the syntax of C.

So I'm not intending for this to be your first programming class. I intend for you to be an expert in Python. Well, not expert, but certainly I'm not going to tell you what an if statement is, and I'm not going to tell you what a variable is. I'm just going to tell you how to use variables in C and how to use if statements in C. That's why a solid foundation in Python (not wizard, but a solid foundation) is essential. And frankly, I would rather that you learned a bit of PHP, some JavaScript, and all this other stuff before you come to C. I think C programming is not the first class that you should take; instead, it is your gateway to the advanced work that
you're going to do. So I think C is very, very important; I just don't think it's your first programming class.

So Python and C are really very different, although Python is written in C. Python has whitespace that is part of its syntax; in C, whitespace is ignored. Python is very object-oriented. If you read an article I wrote on Quora, you'll see that I rank all my languages, and I put JavaScript and Python as the most object-oriented languages; Java is a little less object-oriented; and C is the least. C is not object-oriented at all.

Python has wonderfully convenient data structures in the form of lists and dictionaries. PHP has arrays and JavaScript has objects. They're all beautiful, beautiful stuff, very object-oriented structures. C does not have them. It's fast, it's efficient, it's powerful. It's got structs and pointers, and by golly you will use them. They're not convenient, but they are scorchingly fast. By the time we're done with all this, we will see how to use structs and pointers to build lists and dictionaries, and really, we will follow down the path of building Python. So you'll see a common theme throughout this class of how Python achieves what it's trying to achieve by writing C.

Python has automatic memory management, to the point where if you've been a Python programmer, a PHP programmer, a Java programmer, or a JavaScript programmer, you probably don't even know what memory management is. Well, you're going to by the end of this class, and by the end of this class you're going to be able to see how Python automates memory management for you.

Python was written in the 80s and C was written in the 70s. In many ways I see Python as a convenience layer that was built on top of C. C programmers look at C and say: it's great, it's great, it's great; if we just had this layer of easiness on top of it, then things would be better. And so that's what Python is. Now, Python also introduced things like quite
different syntax, making indentation, you know, required, because they thought it was a good idea. We could argue one way or the other. I'll tell you, when I'm writing a million lines of code, whitespace is not, to me, a good way to have syntax.

So we're going to look at C through a Python lens, and we're going to learn by example. Now, most of the time I tell you: don't copy and paste, don't cheat, don't look for solutions. This lecture is the exception to that rule. I've written this lecture as a Rosetta Stone. It's just a little tiny bit of connection from what you already know in Python to what you're going to do in C. So I'm not intending at this point for you to build your own stuff based on reading a book. If I give you assignments to do these particular things, I really do want you to just watch this lecture, grab the PowerPoint, and feel free to cut and paste from my PowerPoint into my assignments.

I don't know if you've ever seen it, where the mama tiger is teaching a baby tiger how to hunt, and the mama tiger goes out and gets a something-or-other and brings it back and puts it near the baby tiger and lets the baby tiger chase it. Well, that's kind of what I'm doing. I'm the mama tiger, and I'm giving you some C code and putting it right in front of you, and then I want you to take that C code and run it and play with it and understand it. So I'm not expecting in this lecture that you're going to derive it, that you're going to somehow read the textbook, look at a problem, and solve the problem. That's later. That's absolutely later.

So this is the beginning. This is trying to make conceptual connections from the complex knowledge you have about Python to little places where you can hook things on in C. And the idea is to go through it quickly. I do assign some of these as programming exercises; my intent is that you'll watch the lectures
and just work on the code at the same time. I'm not trying to test what you learned. I really want you to watch and listen and type. That's how we learn, right? You could cut and paste it, or you could type it, and you could type it one piece at a time. The mere act of you typing, even though you're just looking at a slide and typing it in: at this point in the course, that is the learning objective of this lecture.

Now, that whole rule of just typing in code that you're looking at from someone else: don't do that forever. Later I want you to synthesize what you learn in the book, struggle through it, figure things out, and do the assignment yourself. So don't go searching if you want to gain maximum benefit. If you're just in the biggest hurry of all, go ahead and search, but please, if you want benefit from this class, don't cheat yourself.

There are a lot of similarities that I'm not going to cover; you can go read the textbook. Like plus, minus, asterisk, slash, and percent. Probably when you were learning Python you were like, "Whoa, what's this percent thing, and why did they choose percent?" The answer is: that's what C chose. Modulo is percent in all these other languages because they flipped a coin in C and decided percent was modulo.

The comparison operators are the same. The assignment operator is the equals sign, which means that the equality operator has got to be double equals; then exclamation-equals, less than, greater than, less than or equal, all that stuff's the same. The variable naming rules are the same: you start with a letter or underscore, and then numbers, letters, and underscores, and case matters in both languages.

While loops, and the concept of break and continue, which, you know, some people get all worried about. I love break and continue, if you've taken my other courses, and you'll see when we talk about it in C that I love break and continue in C too, maybe because I learned C first, and I just love break and continue.
Enough about that. Constants: about the only thing that's really different in constants is strings and characters and booleans, and strings and characters are the biggest thing in the beginning. Both have int and float, and char and byte. Now again, byte and string and char are not the same thing. C has no str class (which is the string class), list, or dictionary, and Python has no concept of struct or double. In a sense, you could think of Python's float as really C's double. By the time Python was written, the notion of shorter floating-point numbers was less critical.

There are some differences. A lot of this, I think, was in the design of Python trying to be a little less obtuse and a little more convenient. For me it's annoying; I write the C versions of the operators. And is double ampersand (&&), not is exclamation point (!), which we call bang, and or is double vertical bar (||). In Python they're all convenient; we use the words and, or, and not. But okay, whatever.

In C we have a for loop, but it's an indeterminate loop. If you remember the definition from Python for Everybody, an indeterminate loop is one that you have to examine to see whether it's an infinite loop. Whereas in Python, if you say for x in some list, you're going to go through the whole list. It's a determinate loop; it only runs until that list is exhausted. C does not have such a thing. Every loop has a condition to finish it. Now, we write loops like for i equals 1 to 100, or 0 to 99, and you can look at them and say, "Yeah, that's not an infinite loop." It's just that technically you have to look at the loop to make sure that you haven't inadvertently made it an infinite loop.

There's no predefined true and false. I find this really, like, wow, couldn't they? They've got EOF, capital EOF, for end of file. And None and NULL are similar concepts but quite different. None in Python is its own type. NULL is the number zero that's cast to be a pointer to nothing. And so None is like specially marked
empty; NULL is a zero. We'll get there, we'll get there.

Strings and character arrays: for a while you can kind of pretend that character arrays in C are mostly like strings. When you write a constant and pass it to a function, they kind of look the same. But once we start working with them, you'll see they're very, very different, and that's kind of the first fun part of the first part of this class: strings are now your responsibility. There's no help. And C of course has no list or dictionary, and Python has no concept of tightly packed data, which is what structs are, and doubles and floats.

So here we go, let's get started. Let's see if my pen is working here. Yeah. So what we're going to see is some Python on one side and some C on the other side. This is just talking about output. This is Python 3, of course, and so we have a print function, and it takes any number of parameters. One of the things you'll notice about the print function is that in "hello world", the space is part of the constant, but "answer", comma, 42 puts a little space in between "answer" and 42 in the output. So if you want to suppress that kind of automatic addition of spacing, you have to maybe concatenate things together, or use some other trick. The print function automatically knows if it's got a string or a float or an integer, and it just does things kind of all automatically. So if you want to see something, usually you just print it.

Okay, so let's compare and contrast. First off, you're pretty much going to have to start every one of your C programs with #include <stdio.h>. Comments are different. Python comments run from a pound sign to the end of the line. C comments start with slash-star (/*), can run across multiple lines, and end with star-slash (*/), so everything in between is a comment, and that can be multiple lines. Later versions of C also add what's called the C++ style of comments, and JavaScript uses those as comments, which
is the double slash (//). So when you're writing, you probably can use double slash in the C that you're using, but I'm being kind of strict and pretending I'm in 1978, so I'm not using that C++ style comment. Again, that came from C++; it didn't come from JavaScript.

Some of you have taken Python classes where there's this underscore underscore main (__main__) thing, and it makes a function and calls it and indents everything one tab. They're really imitating C in that respect, and I don't like that style. Those people who do that in Python, I'm sure they have a good reason, but I think they're just wishing it was C. Because the definition of C is: when a program starts running, it searches for a function named main. Later we'll see that this function can actually have arguments, and it returns an integer as to the success or failure. So really main is a function, and that first line there, int main, open paren, close paren, open curly brace, is the definition of a function that happens to be named main.

And then we have printf. Now again, I don't know if you learned Python 2, but in Python 2 there was a print statement, and in Python 3 there is a print function, so here we're using a Python 3 print function. C never did the statement. C decided, as we'll talk about later, to not have any input/output, any reading or writing, in the language itself, but instead to put them in standard libraries. That's what this #include <stdio.h> is saying: okay, I'm going to do some I/O here, input/output, so include the C input/output library.

So printf takes as its first parameter a string. The other thing you see in C is you can't use single quotes for long strings. Later we'll see single quotes, but in C there's a major difference between single quotes and double quotes. Single quotes are single characters, and double quotes are a character array. Not a string, a character array. The other thing is things
like the end of line. In Python the newline is added implicitly; in C you have to add it explicitly. So that's basically saying print Hello world and then go to the beginning of the next line.

Now, this first parameter is actually not just a string; it's a string with embedded format codes that start with percent. %d says there is a corresponding integer number, and I want you to convert that into a string and print it out. I guess I should probably just erase some of this. So it says Answer and then a number. You can have more than one of these things, and then they match. So that says there's an integer as another parameter. You can have one parameter, two parameters, et cetera. So beyond one parameter, like in this one, %.1f corresponds to this first floating point number and this %d corresponds to the integer one. Okay, so you have these percent things. We will learn in chapter 8 that these percent things are a language unto themselves. This one is basically saying, please print me a floating point number with only one digit of precision. So %.1f says print a floating point number, and if we want a string it's %s. But this Sarah here, double quote Sarah double quote, is a character array, and it's actually not five characters but six, because there is always a terminating zero character at the end. So %s says the parameter needs to be a character array, properly terminated by an end of string indicator, which is a zero character.

So that's this, right? It's pretty simple, but we've got a lot of stuff to cover here, and this is the Rosetta Stone. It's more complex in C. You have more control, you're doing things more explicitly, and it's not doing it for you automatically.

So let's take a look at a simple number input. You'll see that some of these things come from my Python for Everybody class. This is the famous US floor to European floor elevator converter. We're going to print something out. Now, one thing about C is that you've got to declare all your variables. Python is sort of a typeless language. It's increasingly getting more typey, but it's a typeless language. So in C we have to declare that we're going to have two variables, usf and euf, and they're going to be integers. We print the prompt, and about the only difference there is that we have to put the \n in, otherwise it won't automatically do that. And then we have this I/O routine, again coming from stdio.h, called scanf. Its first parameter, much like printf, is actually a formatting string, and what this says is: read for a little ways, find me a number, keep reading as long as it seems like a number, and turn that into an integer and give it back to me. So it has actually got some scanning built into it, and it reads until it finds a non-digit and then stops and says that's the number. It turns out you could type a lot of different things here. We won't go into that in too much detail, we'll hold that until chapter 8, but the idea is it doesn't work
exactly the same, although this input in Python reads a line. Now again, I've got this little note here. If you recall, in Python 2 there was an input and a raw_input, and raw_input was what read a line, which I tended to use when I was teaching Python 2. input was a weird thing that had some kind of scanning going on; it scanned and threw stuff away and grabbed something, it might go on to multiple lines, and it was totally inconsistent. It was worse than scanf, so I was really glad when they just got rid of it in Python 3 and changed the name: what used to be Python 2's raw_input became input in Python 3. So the old input from Python 2 is kind of like an homage to scanf in C, but it's not exactly the same. The reason it's not the same is that input in Python 2 was deriving the type of the data from what it encountered, so it might give you back a string, it might give you back a floating point, and you might go, oh, that's dangerous, right? That's because the type of usf in Python here is determined by the data; it's not predeclared. And by the way, with both input and scanf we can type stuff that confuses it badly and causes it to blow up, but we're not worrying too much about that right now. We're just getting the basics done. So we read an integer, we subtract one, we print it out.

Oh, I forgot to say this: ampersand, call by reference and call by value. In Python the value is coming back as a function return, so it's really easy to assign it into usf. Whereas in C we put these parameters on the scanf call, and we have to say, oh, and by the way, I want you to change it. Because ultimately, if we don't put the ampersand on, it's what's called call by value, not call by reference. &usf is a way to tell C to actually give scanf the address of the usf variable rather than the value in the usf variable. That is a whole chapter that we will cover; I think four and five will be all
about the ampersand and call by reference. We're way ahead, because I think all that chapters one through four ever do is mention call by reference and then say, oh, that's in the future. So I'm going to just say, oh, that's in the future. I will tell you that the ampersand is really important and the code doesn't work without it, because it is the way that C does call by reference for simple variables like integers and floats. As you're going to see on the next slide, there's always an exception.

String input. Okay, so here we're going to do my Hello Sarah thing. We say Enter your name, and then we say name = input(). Now the beauty here in Python is that input gives us a whole line. Then we just print Hello, name, and you'll notice that there's this little space that comes out automatically.

Switching over to C, we have our include of the library, we have the int main, and then we have to predeclare a character array. There is no "make it a string". If this were Python we could say, hey, let's make a string, but you can't. And what's even more important is you've got to tell it how long, which means that we could type too much stuff in here and blow our program up. That's one of the difficulties of C: arrays, including character arrays, have a fixed length and they don't auto-extend. There is no .append in C. You can't say name.append; you can't do it. In Python it's an object, it's not an actual array. In Python name is a string object; here name is a character array with 100 elements. If you put 20 in, you'll be fine. If you put 80 in, you'll be fine. If you put 101 in, it's going to blow up. Ah well, that's okay, that's why C is fast. We'll get to all that.

So we print out a prompt and we call scanf, and in this case we say %s, give me a string. You can put a limit on it, so we're saying, look, only read up to 100 characters. And you'll notice there is no ampersand
on name, and that is because name is an array. When you put name in with no square brackets, no index operator, you're passing the address of the beginning of the array. So that is, in a sense, an ampersand: it's the location of the beginning of a 100-character array. We're going to scan up to 100 characters into it, and so it really is roughly equivalent to the input. Then we just again say Hello %s, and name is the corresponding thing, and so it says Hello Sarah.

Now, a lot of what we did in Python for Everybody was read whole lines of input, and we tended to use string parsing on those lines: we would trim the stuff off the end and then we'd split it, and all these things. There's no good split in C, so we won't be doing too much of that, but it does help to understand how to read a whole line of input. So now we're going to read something that has lots of spaces. We're going to read the whole line and just echo the whole line: enter a line, read the line, print the line. Again we have to declare how big of a string we're willing to take: char line with a thousand characters in it. The prompt by now should be pretty easy. And then we have a really weird looking %[^\n]1000s. Well, if you took Python for Everybody and you remember chapter 11, regular expressions, that should look familiar to you. Open bracket, caret, backslash n, close bracket says match any character that's not a newline. So that says scan up to the end of a line, or until you hit a thousand characters. That's what %[^\n]1000s means as the first parameter to scanf: read a whole line, but stop at a thousand characters. And then of course line is just the parameter, and then we print that thing out. A lot of C programmers have probably never written this particular line of code.
There's a lot of programmability in there, and things like regular expressions that Python had, well, those are kind of an old concept. Those are 1970s concepts, and the C language had that concept in it in the 1970s.

There's another way that's a little safer to do this, and these do the exact same thing: this function fgets. fgets says put up to a thousand characters into line, looking for a newline, and read from what's called standard input. In C there are three basic files. One is standard input, which usually is read through up to EOF; then standard output, which is where printf is going; and then there's a thing called standard error, which is where you send error messages that you don't want to just go to the output. If you're going to make a program to do uppercase, you would read your input, uppercase it, and send it out. But if, for example, you encountered a character that you didn't want to copy and you wanted to send an error that says I'm not going to copy this, you wouldn't send it to standard output; you'd send it to standard error. When you're just running on a terminal, like in your command line, standard input is your keyboard, standard output is your screen, and standard error is your screen, so you can see both the error messages and the output of the program. But if you're running with redirected input and output, you still tend to see the error messages on your screen, and they don't end up hidden in standard output.

In this case we're using fgets, which is part of the standard library, and we are saying read this from standard input. You'll see in a second, when we read a file, that fgets can read a file, and that third parameter is the file handle. There are three predefined file handles in C programs, standard in, standard out, and standard error, named stdin, stdout, and stderr. They're predefined constants in the C standard stdio.h library.
We're going to read a file. We do this a lot in Python. We go get a file handle; this might fail, of course, if the file doesn't exist. Then we've got a nice determinate loop. Remember we talked about indeterminate loops. This for ... in, it's so Python, and it's so awesome, and it's so expressive. I love it. I miss it. Okay, and then line.strip, which takes the newline off. So that just reads the little file and writes it out.

In C we have to create a variable, line, with a thousand characters. In Python we could have any length of line in our file and it would work, but in C we're going to have to actually say we can only handle up to a thousand characters, because we've declared that the line variable we're going to read into has a thousand. There is an equivalent to the handle. FILE is a type that's defined in stdio.h, and star hand (*hand) means it's a pointer to a FILE object. And hand equals fopen of romeo.txt and r, so that's two character arrays, romeo.txt and r. Actually, the open in Python is inspired by the fopen in C, and that's because, again, when they were writing Python they were writing it in C: why don't we take fopen? All they did was make the open in Python a little easier.

We don't have any kind of an I/O for ... in, so we have to write our own while loop here. We're going to call fgets: line, give me up to a thousand characters, from the file handle named hand. fgets returns NULL, which is a constant that's defined in stdio.h, if it reaches end of file. So this basically is a loop that says read everything up until end of file, very similar to this for line in hand. And then we're printing it. Now, I don't have to strip it, because fgets keeps the newline that is at the end of each line and printf doesn't add another one. In Python you would get double spacing if you didn't strip the newline at the end of each line, whereas here the newline that fgets hands us is the only one we print. So there we go.

So, a counted loop. Now this honestly is not one of my favorite things in Python, but this range is a generator that's going to generate the numbers zero through four. for i in range(5): this is effectively kind of a dynamically generated set, and then we're going to print it, so we're going 0, 1, 2, 3, 4. In C we of course have to declare that i is going to be an int, and the for loop has three pieces separated by semicolons. PHP and JavaScript are the exact same thing, so if this looks familiar to you, that's because you took those classes. Congratulations. So i = 0 is the initialization piece, which says before the loop starts, set i to zero. The middle part is the test of whether or not the loop should run or continue to run. It's a top-tested loop, so i < 5 must be true or the loop won't run at all. But given that i is zero at the beginning, it's less than five, so it's going to run at least once. And then each time through the loop, at the bottom, after the loop is
run, we're going to add one to i with i++, a post-increment operator. Again, that line of code looks the same in PHP, JavaScript, and Java, except PHP has dollar signs for variable prefixes, which, yeah, bothers me, but it is what it is. And of course we have a block: open curly brace and close curly brace denote the block. Then we simply print the variable out, and both bits of code produce the exact same output.

If we get a little bit trickier, we're going to take an example from my Python for Everybody class and look at the max and min. Because we need to prime the loop, we're going to set our maxval to None and the minval to None, and we're going to do a middle-tested infinite loop, while True. We're going to read the input line by line, like 5, 2, 9. We're going to strip it, just because we're going to check to see if it's the string done; if it is, we're going to break out of the loop. Then we're going to convert it to an integer, and we're going to check to see if maxval is None or the value we read is greater than maxval, and if so we're going to remember it. And if minval is None or the value we just read is less than minval, we're going to remember it. When the loop finally reads all the way through, we're going to print out the maximum and the minimum.

The C is pretty much the same code, except we're using scanf with a %d input format, scanning into the integer variable, and using ampersand to indicate that it's call by reference and to replace the current value. The rest of it is the same: if it's the first one or we've got a larger one, we keep it; if it's the first one or we've got a smaller one, we keep that one as well. We loop through and it all goes. Now, one thing when we're using scanf, as I mentioned before: scanf doesn't stop at the end of lines, it keeps on going. So I can have 5, 2, and 9 on separate lines, and again we have to use Control-D, or EOF, here to finish this, or we have 5 space 2
space 9, and then EOF, and it does the same, because scanf is just looking for an integer. Its algorithm, which we'll see in chapter 8, is like get me an integer, which means skip over white space such as spaces and newlines on the way to it. So away you go. That's a slightly more C-thonic version of this min/max using scanf, and it doesn't suffer from the problem of using fgets and having to worry about the size of the character arrays.

Here's a guessing game; it's one of my favorite applications. We have an infinite loop, the ultimate indeterminate loop, a loop that you've got to examine to know that it's going to finish. In this particular one we're just looping to EOF. We're using try and except. Why? Because input doesn't give you any return indication that it's hit end of file, so we just have to have it blow up and then do an except and jump out. Oh well. So we throw away the newline, and then we convert the line to an integer, and we say if guess is 42, Nice work, and then break, which gets out of the loop. The break affects the loop, not the if. And then elif guess less than 42, Too low, else print Too high. So this is a classic multi-way if, where you can have an if, as many elifs as you want, dot dot dot, and then an else.

We do the same thing in C. We're going to use the scanf pattern, waiting until we see EOF. If the guess is 42, we print Nice work and then break. Now, we have to have curly braces here, because that is a two-statement block, and if you have more than one statement you've got to use curly braces. And then else: this else matches up with that if. else if guess less than 42, printf. Modern programmers would tend to put curly braces in even though this is only one line, but this printf is the statement connected to the if, and it does not need curly braces, because what comes after an if is a statement or a block of statements with curly braces. And the same is true of its else.
The printf is the single statement, so you're not seeing curly braces here. I would write this with curly braces, but because the authors of the book are really very succinct, they tend to not put curly braces in, so I'm calling your attention to that.

Now, a really important thing to call your attention to is the difference between else space if and elif. At a high level, what we're doing in C is not really a multi-way if; what we're doing in Python is truly a multi-way if. In Python, this if and elif and else are really part of the same block of code. But in C, this else if is two keywords. If you look at the first if, it has one block of code, which is the printf Nice work and the break, and then the else clause of the first if is this entire block of code here, which is if guess less than 42, printf, yada yada, and then another else. So this is a block if, and in its else clause there is another block if. Really, the indentation of this stuff, the if part in the else, should be further in. By convention we don't add that indentation, even though it's technically correct, because this is an else, and then there's one statement, and that statement is the if. We use this idiomatically all the time. It looks like a multi-way if, else if, else, but it's not. It's actually a further and further deeply nested else if: an else with an if inside the else, and then another else with an if inside that else. We just don't indent it; we indent it by convention as if it were a multi-way if. You don't need to know this precisely when you are writing code, but I just want to point it out in case, in the back of your mind, you're wondering why Python has elif, which is one reserved word, and why C does not have an elif but instead has an else if. I think when Guido invented Python he said, look, that's a cool convention, let's make it actually part of the
language rather than an idiomatic use of the language. Okay, enough of that.

Calling by value, functions. This is pretty easy, right? There's no def keyword. You have the return type, the name of the function, and the parameters, and then of course before the curly braces you have to have the type of the parameters. Those are not the types of the variables in the function; those are the types of the parameters. In Python you don't need to tell it what type they are. Python is kind of a flexible, typeless language; the type of a variable goes right along with it. You could be inside mymult and ask what kind of a thing is a, and a could be an integer, it could be a float, it could be a string, whatever, because it's an object, and an object can have a type. Whereas in C, a is just a number, and you have to tell C that the number that's coming in is an integer. If someone miscalls it in C, like puts 6.0 here, it just blows up. It doesn't work, right? I mean, it might do something; it's just unexpected. So there's no cleverness. Now, there might be some checking; you might get a compiler warning that says how come it's an integer here and a floating point there, that will be dangerous. But it won't fix it for you, and it doesn't automatically convert it, whereas if you did this in Python it would automatically convert. So you have a far greater responsibility to match your types up in C.

The return statement is pretty much the same; the Python return statement is an imitation of the C return statement. You do have to declare the types of the variables that are going to be used temporarily inside the function. Scope-wise, this c is not visible outside. We will see later, when we get to the functions chapter, about external values and static values, et cetera, but the default scoping is that any variable that you declare inside of a function only lives inside the function. There is no a, b, or c in the main code, and that's the same as
how Python works.

That's a lot of Rosetta Stone. We talked about input and output, we talked about looping, we talked about reading a file, we've talked about strings, which are really character arrays, and we've talked about floats. Later we're going to learn a lot, and chapters five and six are the crazy chapters, but we're going to play with how we would implement some of the things that Python strings, lists, and dictionaries handle. Before this course is over, we're going to come back and get inside the mind of what it would take to build Python using the C language. We'll see how malloc, structures, pointers, character arrays, et cetera, can be used to build a string object, a list object, and a dict object. That, to me, is the learning objective of this course: not so much how to code C because it's your job, but what in C is necessary to make a higher-level language like Python or JavaScript or Java or C work. We'll get to that before the end of the course. It is a long course, and again, this was a long lecture. This takes some time to absorb, and just zooming through it, you achieve nothing if you just do the homework without understanding. So take your time. The lines of code in this lecture are there very much on purpose; every single one is trying to teach something. I hope you'll take the time to learn all this material.

[Music]

Cheers. Hello and welcome to chapter one of Kernighan and Ritchie. My name is Charles Severance, and of course I'm your professor for this course that's about history. So welcome to this course. It's really part of a learning path. I don't think that C should be your first programming language, and I don't think it should be your last programming language. I have a whole series of courses that are all free and available online, both just on the web and on places like freeCodeCamp and Coursera and edX, and the place that you're at right now in my learning path is C programming. And we're
not learning C programming to learn C programming. We're learning C programming to take a historical look at how computers work and to lead into computer architecture. I'm not trying to teach you coding in C, but I am going to explain how computers work, and things like how Java works, using C; it just gives me a way to explain Java to you.

The outline of the textbook is that of a pretty typical computer science textbook, where it starts off easy and then, whoo, everything goes pretty crazy. Chapters 1 through 4, and we're on chapter one right now, are mostly syntax, and it's just another programming language. Especially if you know a little bit of Java or a little bit of PHP or a little bit of JavaScript, some of that syntax is going to be like, whoa, of course this is familiar, and the answer is, well, that's because all those languages came from C. So it kind of feels like just another programming language, except that arrays are not lists and character arrays are not strings. Character arrays kind of look like strings, but they don't work like strings, and you can get in all kinds of trouble. But other than that, once you stop worrying about how long things are and pretend it's okay, which is dangerous of course when you write code, chapters 1 through 4 feel a lot like just any other programming language. But then chapters 5 and 6 are the valuable chapters of this book, and they also become a lot more difficult. So don't give chapters 1 through 4 short shrift, because five and six are going to just go, whoo. And then seven and eight are just sort of filling in detail. Seven and eight are not so critical; they just kind of fill in all the gaps. So that's the outline of the book. Just expect that one through four is going to be smooth, and then five and six are going to be like, now we're really getting somewhere.

Okay, so looking at chapter one: again, sections 1.1 through 1.5 look not that different than
any other programming language that you've learned. Section 1.6 is arrays, statically allocated arrays. You have to know how big they are when you declare them, and you can't resize them until chapter 5, at which point we'll start talking about dynamic memory and pointers and resizing. Sections 1.7 and 1.8 are functions and parameters, and it's all call by value in this early phase. Call by reference is in chapter 5, because we need to know about pointers before we talk about that, even though they use a little pointer syntax here and there in chapter one. Section 1.9 is character arrays. Read this one closely, because there is no string object in C; there are no objects at all in C. And in section 1.10 they talk about variable scoping between functions, and that feels kind of similar to other languages, and part of that is because other languages took their inspiration from C.

So if we just take a quick look at C character arrays, we must understand that the size of the character array is set at allocation time, and there is nothing auto-extending. If you write a for loop that goes off the end of the array, like where I have a character array that's 10 long and I write a loop that goes up to a thousand storing data in it, eventually the program will blow up. In Python you just add characters, whereas in C, if you add characters beyond what is allocated, the system blows up. You've probably heard me say more than once that the C language is probably responsible for 90% of the significant security holes in all of computing. This kind of code, where you allocate an array and then you wildly go beyond it, ends up making it so that people can inject things into operating systems and routers and all kinds of things. So this is why we don't use C to write programs. I mean, here we are in the first page; example one of chapter one is why we don't write C very often, or if we do, we
have to be really careful at reviewing it and making sure that it's right. It's really fast, but it's also dangerous.

String constants and character constants. In most languages (Java is a little different), like PHP, Python, and JavaScript, single and double quotes are treated roughly the same: they create string constants, and a string is a multi-character thing that has a length. C doesn't have a multi-character thing that has a length; it has an array of characters that has a zero character at the end of it. In C, single quotes are a single character and double quotes are a character array. So a double-quoted string with one character in it is actually two bytes, because it's the character and the string ending. Whereas in Python a string has a length; it doesn't really have an ending character. In C there's a special character that we use for an ending. A character is a byte, which is a short integer, usually eight bits on most computers. So you've got to be real careful. You've got double-quote things and single-quote things, and single-quote things in C are far more like integers and far less like strings, whereas in Python you just use single quotes and double quotes interchangeably.

Character sets. The char in C is like a number. It's a tiny number; it's eight bits long, so you can go from 0 to 255. The character representations depend on the character set, but quite often they're ASCII, so you can just go look at an ASCII chart and figure out what the numeric representation of the letter A is. In Python we can actually see the ordinal position of A by using the ord function. That's the ord function of a single-character string, which pulls the ordinal of the very first character, and we find it's 65, and if you look it up in the ASCII chart, it's 65. But in Python 3, strings are made of multi-byte characters that represent Unicode, and Unicode is much larger than 8 bits. I think Unicode is 32 bits. UTF-8 is a way to represent Unicode, and Unicode is a 32-bit
character set (strictly, code points stop at 1,114,111, which needs only 21 bits, but they are commonly stored in 32-bit integers). So if you ask what the integer equivalent of the smiley face character is, you see that it's 128,512. That's the code point, within that 32-bit integer space, that represents the smiley face. In C there is no smiley face. Well, unless you put a bunch of libraries into it, but normal out-of-the-box C can't represent a smiley face. It can represent an uppercase A, and you can ask what the A is. You'll notice we're printing it out with a percent c and a percent d, and it's the same thing: if you print a character out as a character it's an A, and if you print it out as an integer it's 65. We don't even need an ord function, because character constants are really integer constants in the ASCII character set. Okay, just understand that every time you see single quote A single quote, think of it as an integer, as a number that happens to be conveniently looked up for you by the C compiler. You can take a look at the ASCII character set and go look at uppercase A, and you see that its decimal equivalent is 65. You also see in this table that its hexadecimal is 41 and its octal is 101, and its binary, its actual bits, are a one, a bunch of zeros, and a one. Now the reason we like octal and hex as programmers is that it's easier to convert them directly to binary. Converting from decimal requires divisions and modulo and stuff like that, but converting from octal or hex to binary is direct on a digit-by-digit basis: I can convert an octal digit to a set of binary digits just by looking at each digit in succession. So when we're printing out and we want to be able to understand what the raw bit pattern of some data is, we tend to print it out in hex or in octal so that we can quickly figure out what bits are set inside that value.

Strings in C are not strings. They are arrays of characters, and there is no length. So you can ask Python
what the length of a string is, and the string knows its length. In C you can ask what the length of a string is, but it turns into a for loop that scans until it finds the end of the string, and the end of the string is a special character, which is quote backslash zero quote, which is zero. I mean, it's literally the integer zero. So you have characters that are nonzero and then you have a zero character, and the length is how many characters are in this array up to the end. Now that is different from the allocation. In this case I have an example of a six-character array, and I put six things in it, so it's all full. I could have terminated it. Notice I say x sub 3 equals 0. It's still got six characters in the array, but now the character sequence in that array has a zero at position 3, and of course arrays start at zero, so you see the first three characters, and the one at position 3 is the end, and that stops the printout. So you've got to allocate space for the end of the character string, and you've got to have it there. Just because the array goes up to six, if you don't have the end of the string, it's going to go off and wander through memory until it blows up, probably. So strings must be terminated. If you append something to a character array, first you have to have enough space, and then if you overwrite the end of the string you've got to add another little mark to say that the end of the string has been moved. So terminating a string is a thing that you've always got to think about, both when you're scanning through a string and when you're creating a new string. Like I said, the C string length is only computable by a loop that scans for a zero character. There's a strlen function in string.h that computes the string length, but it's very, very different from the len function in Python. With the len
function in Python, x is an object and length is an attribute of that object, whereas in C there is an array, and it has a length and it has a zero position, but to ask how long it is you've got to actually loop through all the characters looking for the zero marker. So you can find the length of a quote-unquote string in C, but you've got to write a for loop to do it. In Python you don't have to write a for loop, because Python just knows the length. Later we'll bring all these things together, much later.

So one of your assignments, exercise 1-17, is reversing a string in C without using an extra string. You have a string, it's got a certain amount of space, and you've got to swap the characters. You're probably going to have to draw a picture to do that. It is exercise 1-17, and I'm going to tell you: do not cheat. There are probably a million solutions out there on the internet, and ChatGPT will tell you how to do it. Don't be tempted. As you do this, you will get there. I show you a blurred-out version of it; it's not all that much code. So don't shortcut this. Getting the solution to this assignment without actually doing it is the meanest thing you'll ever do to yourself. You have to do the reversal in place. It's a classic interview question, and at the interview you don't get to go to ChatGPT. You've got to think about even-length strings, odd-length strings, empty strings, and single-character strings. You're going to have to draw some pictures. Take your time and enjoy this assignment. Seriously, it's not that big, and when you get it done you can be very, very proud of yourself that you really thought through the low-level storage of what an array of characters with an ending marker is. That's why it's such a good interview question.

So there we go. Those are my callouts from chapter one: giving you an overall sense of the book, C character arrays, and encouraging
you to actually do your homework, even though there are a million ways to get it done for you. Cheers.

[Music]

Welcome to C Programming for Everybody. My name is Charles Severance, and this is my reading of the 1978 C programming book written by Brian Kernighan and Dennis Ritchie. At times I add my own interpretation of the material from a historical perspective.

Chapter 1: A Tutorial Introduction. Let us begin with a quick introduction to C. Our aim is to show the essential elements of the language in real programs, but without getting bogged down in details, formal rules, and exceptions. At this point we are not trying to be complete or even precise. We want you to get as quickly as possible to the point where you can write useful programs, and to do that we have to concentrate on the basics: variables and constants, arithmetic, control flow, functions, and the rudiments of input and output. We are quite intentionally leaving out of this chapter features of C which are of vital importance for writing bigger programs. These include pointers, structures, most of C's rich set of operators, several control flow statements, and a myriad of details.

This approach has its drawbacks, of course. Most notable is that the complete story on any particular language feature is not found in a single place. The tutorial, by being brief, may also mislead. And because they cannot use the full power of C, the examples are not as concise and elegant as they might be. We have tried to minimize these effects, but be warned. Another drawback is that later chapters will necessarily repeat some of this chapter. In any case, experienced programmers should be able to extrapolate from the material in this chapter to their own programming needs. Beginners should supplement it by writing small, similar programs of their own. Both groups can use it as a framework on which to hang the more detailed descriptions that begin in chapter 2.

1.1 Getting Started. The only way to learn a new programming language is by writing programs in it. The first
program to write is the same for all languages: print the words hello, world. This is the basic hurdle. To leap over it you have to be able to create the program text somewhere, compile it successfully, load it, run it, and find out where your output went. With these mechanical details mastered, everything else is comparatively easy. In traditional C, the program to print hello, world is: main open parentheses close parentheses open curly brace printf open parentheses double quote hello comma space world backslash n double quote close parentheses semicolon close curly brace.

The modern minimal version of this program needs a bit more syntax. We add a single line at the beginning: pound include space left angle bracket stdio.h right angle bracket. We have to add that line for the modern program. Back to the book.

Just how to run this program depends on the system that you're using. As a specific example, on the Unix operating system you must create the source program in a file whose name ends in dot c, such as hello.c, and then you compile it with the command cc space hello.c. If you haven't botched anything, such as omitting a character or misspelling something, the compilation will proceed silently and make an executable file called a.out. Running that by the command a.out will produce hello comma world as its output. On other systems the rules will be different; check with a local expert.

On modern systems we use the gcc compiler with the dash ansi option to accept the legacy syntax of C, so we use gcc space dash ansi space hello.c. To run the resulting a.out file you usually need to prepend the local directory, because most shell configurations do not include the current directory in the paths to search for applications, so you need to write dot slash a.out.

Now for some explanations about the program itself. A C program, whatever its size, consists of one or more functions which specify the actual computing operations that are to be done. C functions are similar to the functions and subroutines
of a Fortran program or the procedures of PL/I, Pascal, etc. In our example, main is such a function. Normally you are at liberty to give functions whatever names you like, but main is a special name: your program begins executing at the beginning of main. This means every program must have a main somewhere. Main will usually invoke other functions to perform its job, some coming from the same program and others from libraries of previously written functions.

One method of communicating data between functions is by arguments. The parentheses following the function name surround the argument list. Here main is a function of no arguments, indicated by open parentheses close parentheses. The curly braces enclose the statements that make up the function; they're analogous to the DO-END of PL/I or the begin-end of ALGOL or Pascal and so on. A function is invoked by naming it, followed by a parenthesized list of arguments. There is no call statement as there is in Fortran or PL/I. The parentheses must be present even if there are no arguments.

In the above text the authors were making connections to the popular general-purpose programming languages of the time. When the book was written, it was not at all assured that C and C-like languages would ever evolve past writing high-performance applications like operating system kernels and device drivers. By comparing C to these more general-purpose languages, the authors are trying to plant the seed that C could have value as a general-purpose language. Back to the text.

The line that says printf open parentheses double quote hello comma space world backslash n double quote close parentheses semicolon is a function call, which calls a function named printf with the argument hello comma world backslash n. Printf is a library function which prints output on the terminal unless some other destination is specified. In this case it prints the string of characters that make up its argument. Any sequence of any number of characters enclosed in double quotes is called a character
string or string constant. For the moment our only use of character strings will be as arguments to printf and other functions.

The sequence backslash n in the string is C notation for the newline character, which when printed advances the terminal to the left margin on the next line. If you leave out the backslash n (a worthwhile experiment, by the way), you will find that your output is not terminated by a line feed. The only way to get a newline character into the printf argument is with backslash n. If you try to break it into two lines, like printf quote hello comma world, and then just hit return and put double quote close parentheses semicolon on a new line, the C compiler will print out unfriendly diagnostics about missing quotes.

Printf never supplies a newline automatically, so multiple calls can be used to build up an output line in stages. Our first program could just as well have been written as main open parentheses close parentheses open curly brace, printf quote hello comma space quote semicolon, printf quote world quote semicolon, printf quote backslash n quote semicolon, and then on a sixth line close curly brace, and it would have produced identical output.

Note that backslash n represents only a single character. An escape sequence like backslash n provides a general and extensible mechanism for representing hard-to-get or invisible characters. Among the others that C provides are backslash t for tab, backslash b for backspace, backslash double quote for the double quote, and backslash backslash for the backslash itself.

1.2 Variables and Arithmetic, page 8. The next program prints the following table of Fahrenheit temperatures and their centigrade or Celsius equivalents, using the formula C equals open parentheses 5 slash 9 close parentheses open parentheses F minus 32 close parentheses. The table contains Fahrenheit of 0, Celsius of -17.8; Fahrenheit of 20, Celsius of -6.7; Fahrenheit of 40, Celsius of 4.4; and so forth. Here is the program itself. For reference, this program is on page 29 of the textbook. So it starts with pound include stdio.h to include the
standard library. Then it has a comment; it says print the Fahrenheit-Celsius table for f equals 0 comma 20 comma dot dot dot comma 300, close comment. Main open parentheses close parentheses open curly brace, int lower comma upper comma step semicolon, float fahr comma celsius semicolon, lower equals 0 semicolon followed by a comment, upper equals 300 semicolon followed by a comment, step equals 20 semicolon followed by a comment, fahr equals lower semicolon, then while open parentheses fahr less than or equal upper close parentheses open curly brace, celsius equals open parentheses 5.0 slash 9.0 close parentheses asterisk open parentheses fahr minus 32.0 close parentheses semicolon, then a printf statement: printf open parentheses double quote percent 4.0 f space percent 6.1 f backslash n close quote comma fahr comma celsius close parentheses semicolon, fahr equals fahr plus step semicolon, and then a closing curly brace to finish the while statement, and then a closing curly brace to finish main.

The first two lines, slash star print Fahrenheit-Celsius table for f equals 0 comma 20 dot dot dot 300 star slash, are a comment, which in this case explains briefly what the program does. Any characters between slash star and star slash are ignored by the compiler; they may be used freely to make the program easier to understand. Comments may appear anywhere a blank or newline can.

In C, all variables must be declared before use, usually at the beginning of a function before any executable statements. If you forget a declaration, you will get a diagnostic from the compiler. A declaration consists of a type and a list of variables that have that type, as in int lower comma upper comma step semicolon, float fahr comma celsius semicolon. The type int implies that the variables listed are integers. Float stands for floating point, i.e., numbers which may have a fractional part. The precision of both int and float depends on the particular machine that you are using. On the PDP-11, for instance, an int is a 16-bit signed number, that is, one
that lies between negative 32,768 and positive 32,767. A float number is a 32-bit quantity, which amounts to about seven significant digits, with magnitude between about 10 to the minus 38 and 10 to the plus 38. Chapter 2 lists the sizes for other machines.

I would note that the 1970s was a time of transition in the amount of memory installed in computers. The C language int type was 16 bits on the older but more generally available computers like the PDP-11, so C could be used to write programs like the Unix operating system that made efficient use of available memory. In particular, the 1978 version of C did not require that computers support 32-bit integers. But 32,768 is a pretty small number, and the size of an integer affected the maximum size of arrays and strings. A lot of early C programs used the long type to get at least a 32-bit integer capable of representing numbers up to about 2 billion. In modern computers and databases we tend to choose between 32-bit and 64-bit integers. Back to the text.

C provides several other basic data types besides int and float. Char is a character, a single byte. Short is a short integer. Long is a long integer. And double is a double-precision floating point number. The sizes of these objects are also machine dependent, and details are in chapter 2. There are also arrays, structures, and unions of these basic types, pointers to them, and functions that return them, all of which we will meet in due course.

The actual computation in our temperature conversion program begins with the assignments lower equals 0, upper equals 300, step equals 20, fahr equals lower, all ending with a semicolon. These set the variables to their starting values. Individual statements are terminated by semicolons.

Each line of the table is computed in the same way, so we use a loop which repeats once per line. This is the purpose of the while statement: while open parentheses fahr less than or equal upper close parentheses open curly brace, then the body of the loop, and then close curly brace. The condition in parentheses
is tested. If it is true, i.e., fahr is less than or equal to upper, the body of the loop (all of the statements enclosed between the open curly brace and the close curly brace) is executed. Then the condition is retested, and if true, the body is executed again. When the test becomes false, i.e., fahr exceeds upper, the loop ends and execution continues at the statement that follows the loop. There are no further statements in the program, so it terminates.

The body of a while loop can be one or more statements enclosed in braces, as in the temperature converter, or a single statement without braces, as in while open parentheses i less than j close parentheses, i equals 2 asterisk i semicolon. In either case, the statements controlled by the while are indented by one tab stop so you can see at a glance what statements are inside the loop. The indentation emphasizes the logical structure of the program. Although C is quite permissive about statement positioning, proper indentation and the use of white space are critical in making programs easy for people to read. We recommend writing only one statement per line, and usually leaving blanks around operators. The position of the braces is less important; we have chosen one of the several popular styles. Pick a style that suits you and then use it consistently.

I would add that with these words the authors triggered a great debate about how best to indent code and use curly braces that continues to this day. The indentation style used in this book is often referred to as the K&R style. It tends to put open braces at the end of statements like if and while to keep code more compact in terms of the number of lines of code. The best advice is not to debate at all: when you modify someone else's code, just imitate the style that they used when they wrote their code. Back to the text.

Most of the work gets done in the body of the loop. The Celsius temperature is computed and assigned to the celsius variable by the statement celsius equals open parentheses 5.0 slash 9.0 close
parentheses asterisk open parentheses fahr minus 32.0 close parentheses semicolon. The reason for using 5.0 slash 9.0 instead of the simpler looking 5 slash 9 is that in C, as in many other languages, integer division truncates, so that any fractional part is discarded. Thus 5 slash 9 is zero, and of course so would all the temperatures be zero. A decimal point in a constant indicates that it is floating point, so that 5.0 over 9.0 is 0.5555 repeating, which is what we want. We also wrote 32.0 instead of 32, even though, since fahr is a float, 32 would automatically be converted to float before the subtraction. As a matter of style, it's wise to write floating point constants with explicit decimal points even when they have integral values. It emphasizes their floating point nature for human readers and ensures the compiler will see things the way you do as well.

I would note, for those of you familiar with Python, that before Python 3 integer division truncated and returned an integer, just like C. In Python 3, one of the major improvements was that the division of two integers performs the division operation in floating point and returns a floating point result. C and Python 2 made their choice because of efficiency. Integer division with truncation, especially for 16-bit numbers, was quite fast on 1970s computers compared to floating point division that kept the fractional part intact. Early PDP-11 computers did integer division in hardware, while all floating point was done with loops and functions, so it was far slower. If you wanted to write fast code in the 1970s, you avoided floating point numbers except for special situations. Modern computers usually do 64-bit floating point operations at almost the same speed as integer division, so we don't need to allow programmers to avoid using floating point computations in their code.

The detailed rules for when integers are converted to floating point are in chapter 2. For now, notice that the assignment fahr equals lower semicolon and the test while fahr less
than or equal to upper both work as expected: the int is converted to a float before the operation is done.

This example also shows a bit more of how printf works. Printf is actually a general-purpose format conversion function, which we will describe completely in chapter 7. Its first argument is a string of characters to be printed, with each percent sign indicating where one of the other (second, third, etc.) arguments is to be substituted and what form it is to be printed in. For instance, in the statement printf open parentheses double quote percent 4.0 f space percent 6.1 f backslash n double quote comma fahr comma celsius close parentheses, the conversion specification percent 4.0 f says that a floating point number is to be printed in a space at least four characters wide with no digits after the decimal point. Percent 6.1 f describes another number to occupy at least six spaces with one digit after the decimal point, analogous to the F6.1 of Fortran or the F open parentheses 6 comma 1 close parentheses of PL/I. Parts of a specification may be omitted: percent 6 f says that the number is to be at least six characters wide; percent dot 2 f requests two places after the decimal point, but the width is not constrained; and merely percent f says to print the number as floating point. Printf also recognizes percent d for decimal integers, percent o for octal, percent x for hexadecimal, percent c for characters, percent s for a character string, and percent percent for the percent sign itself.

Each percent construction in the first argument of printf is paired with its corresponding second, third, etc. argument. They must line up properly by number and type, or else you'll get meaningless answers.

By the way, printf is not part of the C language. There is no input or output defined in C itself. There is nothing magic about printf; it's just a useful function which is part of the standard library of routines that are normally accessible to C programs. In order to concentrate on C itself, we won't talk much about I/O until chapter 7. In
particular, we will defer formatted input until then. If you have to input numbers, read the discussion of the function scanf in chapter 7, section 7.4. Scanf is much like printf, except that it reads input instead of writing output.

The balance between building a feature into the language itself and providing it as a function in a library is something that computer language designers still struggled with many years later. For example, in Python 2 print was a language element. In Python 3, one of the non-upwards-compatible and somewhat unpopular changes was changing print to be a function. Many programmers feel that a print statement is a more elegant way to express printing, but from a compiler and language design perspective, a function call with a variable number of parameters is seen as technically more elegant and flexible. With Kernighan and Ritchie focused on keeping everything small and portable, they opted to keep all input/output functionality in libraries. The syntax is a little more complex, but given how computing has changed in the past 30 years, it was the right choice.

Section 1.3, the for statement. As you might expect, there are plenty of different ways to write a program. Let's try a variation on the temperature converter. This sample code is on page 11 of the textbook: pound sign include less than stdio.h greater than, main open parentheses close parentheses open curly brace, int fahr (that is, f-a-h-r) semicolon, for open parentheses fahr equals 0 semicolon fahr less than or equal to 300 semicolon fahr equals fahr plus 20 close parentheses, printf open parentheses double quote percent 4 d space percent 6.1 f backslash n close quote comma fahr comma open parentheses 5.0 slash 9.0 close parentheses asterisk open parentheses fahr minus 32 close parentheses close parentheses semicolon, close curly brace.

This code produces the same answers as the one before, but it certainly looks different. One major change is the elimination of most of the variables. Only fahr remains, as an int, to show the percent d conversion in
printf. The lower and upper limits and the step size appear only as constants in the for statement, itself a new construction, and the expression that computes the Celsius temperature now appears as the third argument of printf instead of in a separate assignment statement.

This last change is an instance of a quite general rule in C: in any context where it is permissible to use the value of a variable of some type, you can use an expression of that type. Since the third argument of printf has to be a floating point value to match the percent 6.1 f, any floating point expression can occur there.

The for itself is a loop, a generalization of the while. If you compare it to the earlier while, its operation should be clear. It contains three parts separated by semicolons. The first part, fahr equals 0, is done once before the loop proper is entered. The second part is the test or condition that controls the loop: fahr less than or equal to 300. This condition is evaluated; if it is true, the body of the loop, in this case a single printf, is executed. Then the reinitialization step, fahr equals fahr plus 20, is done, and the condition is re-evaluated. The loop terminates when the condition becomes false. As with the while, the body of the loop can be a single statement or a group of statements enclosed in braces. The initialization and reinitialization parts can be any single expression.

The choice between while and for is arbitrary and should be based on what seems clearer. The for is usually appropriate for loops in which the initialization and reinitialization are single statements and logically related, since it is more compact than while and keeps the loop control statements together in one place.

I would note that the syntax of the for and while loops is a feature of C and C-derived languages. In modern languages we tend to have two kinds of loop structures: determinate and indeterminate. The for and while loop structures in C are both indeterminate, because you must read them
closely to make sure they are properly constructed and, for example, are not unintentionally infinite loops. An example of a determinate loop is the foreach loop in PHP or the for loop in Python. The semantics of both of these loops is to iterate over all the elements in a collection, and since collections are never infinite, you can be assured that these determinate loops will not run forever.

Section 1.4, symbolic constants. A final observation before we leave temperature conversion: it's bad practice to bury magic numbers or magic constants like 300 and 20 in a program. They convey little information to someone who might read the program later, and they're hard to change in a systematic way. Fortunately, C provides a way to avoid such magic numbers. With the pound define construction at the beginning of a program, we can define a symbolic name or symbolic constant to be a particular string of characters. Thereafter the compiler will replace all unquoted occurrences of the name by the corresponding string. The replacement for the name can actually be any text at all; it is not limited to numbers.

So this is the sample code on page 13 of the text: pound sign include less than stdio.h greater than, next line pound define space LOWER space 0, next line pound define space UPPER space 300, next line pound define space STEP space 20. For these pound sign statements, I would note that they have to start in the first column. The rest of this sample code is the code itself: main open parentheses close parentheses open curly brace, int fahr (f-a-h-r) semicolon, for open parentheses fahr equals uppercase LOWER semicolon fahr less than or equal to uppercase UPPER semicolon fahr equals fahr plus uppercase STEP close parentheses, and then the same print statement: printf open parentheses double quote percent 4 d space percent 6.1 f backslash n quote comma fahr comma open parentheses 5.0 slash 9.0 close parentheses asterisk open parentheses fahr minus 32 close parentheses close parentheses semicolon, and then to end the program, close curly brace.

The quantities uppercase LOWER,
uppercase UPPER, and uppercase STEP are constants, so they do not appear in declarations. Symbolic names are commonly written in uppercase so they can be readily distinguished from lowercase variable names. Notice that there is no semicolon at the end of a pound define statement. Since the whole line after the defined name is substituted, there would be too many semicolons in the for.

Section 1.5, a collection of useful programs. We are now going to consider a family of related programs for doing simple operations on character data. You will find that many programs are just expanded versions of the prototypes we discuss here.

Character input and output. The standard library provides functions for reading and writing a character at a time. Getchar fetches the next input character each time it is called and returns that character as its value. That is, after c equals getchar open parentheses close parentheses, the variable c contains the next character of input. These characters normally come from the terminal or keyboard, but that need not concern us until chapter 7.

The function putchar open parentheses c close parentheses is the complement of getchar: putchar open parentheses c close parentheses prints the contents of the variable c on some output medium, again usually the terminal or screen. Calls to putchar and printf may be interleaved; the output will appear in the order in which the calls are made.

As with printf, there is nothing special about getchar and putchar. They are not part of the C language, but they are universally available.

Once again, I would note that the authors are making the case that the syntax of the language should not include syntax for input/output operations, but should instead call library functions. Keeping the compiler small and easy to port to new systems was important to the creators of C, and even if something like putchar were part of the language syntax, it would be translated at runtime into a function call. Programming languages from the 1960s
tended to have a small set of use cases: read some input, run some calculation, and then write some output. So it seemed like a few language elements would be sufficient to describe all programs. But as programs started to make network connections, draw buttons on a screen, or respond to API calls over the network, it would have been difficult to keep expanding the core language syntax for each new use case. It was extremely natural, though, to add new libraries to languages like C, with functions to call to accomplish these new use cases.

File copying. Given getchar and putchar, you can write a surprising amount of useful code without knowing anything more about input/output. The simplest example is a program which copies its input to its output one character at a time. In outline, here's what we do: get a character; while the character is not the end-of-file signal, output the character we just read and then get a new character.

Converting this into C gives us the sample code on page 14 of the textbook: pound include stdio.h, main open parentheses close parentheses open curly brace, int c semicolon, c equals getchar open parentheses close parentheses semicolon, while open parentheses c not equal uppercase EOF close parentheses open curly brace, putchar open parentheses c close parentheses semicolon, c equals getchar open parentheses close parentheses semicolon, close curly brace, close curly brace. The relational operator exclamation equals means not equal to.

The main problem is detecting the end of the input. By convention, getchar returns a value which is not a valid character when it encounters the end of input. In this way, programs can detect when they did not get a character and have simply run out of input. The only complication, which is a serious nuisance, is that there are two conventions in common use about what that end-of-file value really is. We have deferred this issue by using the symbolic name EOF, capital E-O-F, for the value, whatever it might be. In practice, EOF will be either negative one or zero, so the program must be preceded by the appropriate pound
#define EOF -1 or #define EOF 0 to work properly. By using the symbolic constant EOF to represent the value that getchar returns when end of file occurs, we are assured that only one thing in the program depends on the specific numeric value of EOF.

I would note that most of that is no longer correct. Modern C compilers define EOF in the stdio.h include file, so you never define EOF in your own code. In modern C, EOF is a negative value, typically -1. You should just include stdio.h and use the predefined EOF constant to check for end of file. The nuisance of different values for EOF was resolved shortly after 1978.

Continuing with the text: we also declare c to be an int, not a char, so that it can hold the value which getchar returns. As we will see in Chapter 2, this value is actually an int, because it must be capable of representing end of file in addition to all possible characters.

The program for copying would actually be written more concisely by experienced C programmers. In C, any assignment, such as c = getchar(), can be used in an expression; its value is simply the value being assigned to the left-hand side. If the assignment of a character to the variable c is put inside the test part of a while statement, the file copy program can be written as shown in the example code on page 15 of the textbook: #include <stdio.h> main() { int c; while ((c = getchar()) != EOF) putchar(c); } The program gets a character, assigns it to c, and then tests whether the character was the end-of-file signal. If it was not, the body of the while is executed, printing the character. The while then repeats. When the end of input is finally reached, the while terminates, and so does main.
This version centralizes the input (there is now only one call to getchar) and shrinks the program. Nesting an assignment in a test is one of the places where C permits a valuable conciseness. It is possible to get carried away and create impenetrable code, though, and that is a tendency we will try to curb.

It is important to recognize that the parentheses around the assignment within the conditional are really necessary. The precedence of != is higher than that of =, which means that in the absence of parentheses the relational test != would be done before the assignment =. So the statement c = getchar() != EOF is equivalent to c = (getchar() != EOF). This has the undesired effect of setting c to 0 or 1, depending on whether or not the call of getchar encountered the end of file. More on this in Chapter 2.

The next program counts characters; it is a small elaboration of the copy program. This sample code is on page 16 of the textbook: #include <stdio.h> main() { long nc; nc = 0; while (getchar() != EOF) ++nc; printf("%ld\n", nc
); } The statement ++nc; shows a new operator, ++, which means increment by one. You could write nc = nc + 1, but ++nc is more concise and often more efficient. There is a corresponding operator -- to decrement by one. The operators ++ and -- can be either prefix operators (++nc) or postfix (nc++). These two forms have different values in expressions, as will be shown in Chapter 2, but ++nc and nc++ both increment nc. For the moment, we will stick to the prefix form.

The character counting program accumulates its count in a long variable instead of an int. On a PDP-11, the maximum value of an int is 32,767, and it would take relatively little input to overflow the counter if it were declared as an int. In Honeywell and IBM C, long and int are synonymous and much larger. The conversion specification %ld signals to printf that the corresponding argument is a long integer.

As a note, we again see a reference to the fact that the number of bits in the int type was in transition in 1978. The older PDP-11 used a 16-bit integer to save limited memory on a small, almost obsolete computer, while later computers from IBM and Honeywell had already switched their int type to 32 bits. This allowed code originally written for the PDP-11, like Unix or even the C compiler, to be recompiled on the IBM or Honeywell with very few changes.

To cope with even bigger numbers, you can use a double, which is a double-length float. We will also use a for statement instead of a while, to illustrate an alternative way to write the loop. This code is the second sample on page 16 of the textbook: #include <stdio.h> main() { double nc; for (nc = 0; getchar() != EOF; ++nc) ; The semicolon in this case is an empty statement, because there's
nothing in the body of the for loop. At the end we say printf("%.0f\n", nc); } printf uses %f for both float and double; %.0f suppresses printing of the nonexistent fraction part.

The body of the for loop here is empty, because all the work is done in the test and re-initialization parts. But the grammatical rules of C require that a for statement have a body. The isolated semicolon, technically a null statement, is there to satisfy that requirement. We put it on a separate line to make it more visible.

Before we leave the character counting program, observe that if the input contains no characters, the while or for test fails on the very first call to getchar, and so the program produces zero, the right answer. This is an important observation. One of the nice things about while and for is that they are tested at the top of the loop, before proceeding with the body. If there is nothing to do, nothing is done, even if that means never going through the loop body. Programs should act intelligently when handed input like "no characters". The while and for statements help ensure that they do reasonable things with boundary conditions.

Line counting. The next program counts lines in its input. Input lines are assumed to be terminated by the newline character \n that has been carefully appended to every line written out. This sample code is on page 17 of the textbook: #include <stdio.h> main() { int c, nl; nl = 0; while ((c = getchar()) != EOF) if (c == '\n') ++nl; printf("%d\n", nl);
} The body of the while loop now consists of an if, which in turn controls the increment ++nl. The if statement tests the parenthesized condition and, if it is true, does the statement (or group of statements inside braces) that follows. We have again indented to show what is controlled by what.

The double equals sign == is the C notation for "is equal to" (like Fortran's .EQ.). This symbol is used to distinguish the equality test, a question being asked, from the single = used for assignment. Since assignment is about twice as frequent as equality testing in typical C programs, it is appropriate that the operator be half as long.

A single character can be written between single quotes to produce a value equal to the numerical value of the character in the machine's character set. This is called a character constant. So, for example, 'A' is a character constant; in the ASCII character set its value is 65, the internal representation of the character A. Of course, 'A' is to be preferred over 65: its meaning is obvious, and it is independent of a particular character set.

The escape sequences used in character strings are also legal in character constants, so in tests and arithmetic expressions '\n' stands for the value of the newline character. You should note carefully that '\n' is a single character, and in expressions is equivalent to a single integer; on the other hand, "\n" is a character string which happens to contain only one character. The topic of strings versus characters is discussed further in Chapter 2.

The numeric values shown for characters use the ASCII character set. The character sets of the 1970s were quite intricate. Most were eight bits long to conserve computer memory and only supported a hundred or so Latin-like characters. This is why early programming languages used special characters like
asterisk and curly brace in their syntax very carefully. They needed to choose characters that were commonly available on computer keyboards from different manufacturers. Modern programming languages like Python 3 and Ruby store internal string values using the Unicode character set, so they are able to represent all the characters in all languages around the world. Modern languages tend to represent 8-bit values in the range from 0 to 255 using a byte or similar type. Python 2 strings were stored as 8-bit bytes, and Python 3 strings are stored as 32-bit Unicode characters. Moving to Unicode was a major effort in the Python 2 to Python 3 transition.

Word counting. The fourth in our series of useful programs counts lines, words, and characters, with the loose definition that a word is any sequence of characters that does not contain a blank, a tab, or a newline. This is a very bare-bones version of the Unix utility wc. This example is in the textbook: #include <stdio.h> #define YES 1 #define NO 0 main() { int c, nl, nw, nc, inword; inword = NO; nl = nw = nc = 0; while ((c = getchar()) != EOF) { ++nc; if (c == '\n') ++nl; if (c == ' ' || c == '\n' || c == '\t') inword = NO; else if (inword == NO) { inword = YES; ++nw; } } printf("%d %d %d\n", nl, nw, nc); } Every time the program encounters the first character of a word, it counts it. The variable inword records whether the
program is currently in a word or not. Initially it is not in a word, so inword is assigned the value NO. We prefer the symbolic constants YES and NO to the literal values 1 and 0 because they make the program more readable. Of course, in a program as tiny as this it makes little difference, but in larger programs the increase in clarity is well worth the modest extra effort to write it this way. You will also find that it is easier to make changes to programs where numbers appear only as symbolic constants.

The line nl = nw = nc = 0; sets all three variables to zero. This is not a special case, but a consequence of the fact that an assignment has a value and assignments associate right to left. It is really as if we had written nl = (nw = (nc = 0));

The operator || means OR, so the line if (c == ' ' || c == '\n' || c == '\t') says "if c is a blank or c is a newline or c is a tab". (The escape sequence \t is a visible representation of the tab character.) There is a corresponding operator && for AND. Expressions connected by && or || are evaluated left to right, and it is guaranteed that evaluation will stop as soon as the truth or falsehood of the overall expression is known. Thus, if c contains a blank, there is no need to test whether it contains a newline or a tab, so these tests are not made. This isn't particularly important here, but it is very significant in more complicated situations, as we will soon see.

I would note that || and && are the norm for Boolean operators in C-like languages. When a new language was being designed, it was really easy to
just adopt the C convention for logical operators, because while they may seem cryptic, millions of software developers were already familiar with them. In this way, the relationship between C and C-like languages is like the relationship between Latin and Romance languages, including English.

Back to the text. The example also shows the C else statement, which specifies an alternative action to be done if the condition part of an if statement is false. The general form is if (expression) statement-1 else statement-2. One and only one of the two statements associated with an if-else is done. If the expression is true, statement-1 is executed; if not, statement-2 is executed. Each statement can in fact be quite complicated. In the word count program, the one after the else is an if that controls two statements in braces.

Section 1.6: Arrays. Understanding the capabilities and limitations of C arrays is one of the most important topics in our historical look at the C programming language. Most importantly, the number of elements in an array declaration must be a constant at compile time, and the size of an array cannot be adjusted using an array declaration while the program is running. This inability to automatically resize C arrays as data is added leads to a class of security flaws generally referred to as buffer overflow, where a program reads more data than can fit into an array and is tricked into overwriting other data or code, compromising the application. Later in this book, we will create dynamic array-like structures in C using pointers and the standard library calloc function.

Python has support for non-dynamic arrays, called buffers. Python buffers are generally not used except by programmers writing library code that talks to low-level code written in a language other than Python, or that talks to operating system facilities in systems like Linux. The more commonly used Python list and dictionary structures can change their sizes
automatically as elements are added and deleted at runtime. Java has support for non-dynamic arrays, like C, which are given a length at the moment they are created; the array length cannot be increased or decreased without making a new array and copying all the elements from the first array to the second. Java does provide list and map structures that automatically adjust their length as data is added or removed. Java has a class called ArrayList which can be dynamically extended but provides array-like linear access. It is a list internally, but it can be used like an array externally.

The underlying technique that is used to implement language structures like Python's list is dynamic memory allocation in a linked list structure. Linked lists are one of the most important data structures in all of computer science. We will cover dynamic allocation and implementing data structures in C in Chapter 6. For now, we will merely examine the syntax of C arrays, but keep in mind that allocating an array in C is very different from creating a list in Python.

Back to the text. Let us write a program to count the number of occurrences of each digit, of white space characters (blank, tab, and newline), and of all other characters. This is an artificial problem, but it permits us to illustrate several aspects of C in one program. There are 12 categories of input, so it is convenient to use an array to hold the number of occurrences of each digit rather than 10 individual variables (actually, 12 individual variables). Here is one version of the program, on page 21 in the textbook. I would note that as these programs get larger and larger, it is harder and harder for you to just listen to me read them, and you have to go look at them in the textbook. So I recommend that you check out the textbook on page 20 and find the actual code: #include <stdio.h> main() { int c, i, nwhite, nother; int ndigit[10
]; nwhite = nother = 0; for (i = 0; i < 10; ++i) ndigit[i] = 0; Now we have a loop to read all of our input: while ((c = getchar()) != EOF) if (c >= '0' && c <= '9') ++ndigit[c - '0']; else if (c == ' ' || c == '\n' || c == '\t') ++nwhite; else ++nother; That if statement was a sort of three-branch if, checking to see whether we had a digit, a white space character, or some other character. Now we are at the end of the while loop, and so we say printf("digits ="); Note that there is no newline in this, so these printfs can kind of concatenate their output without going to a separate line: for (i = 0; i < 10; ++i) printf(" %d", ndigit[i]); printf("\nwhite space = %d, other = %d\n", nwhite, nother); }

Let's go through the code. The declaration int ndigit[10]; declares ndigit to be an array
of 10 integers. Array subscripts always start at zero in C, rather than one as in Fortran or PL/1, so the elements are ndigit[0], ndigit[1], ..., ndigit[9]. This is reflected in the for loops which initialize and print the array. A subscript can be any integer expression, which of course includes integer variables like i and integer constants.

This particular program relies heavily on the properties of the character representation of the digits. For example, the test if (c >= '0' && c <= '9') determines whether the character in c is a digit. If it is, the numeric value of that digit is c - '0'. This only works if '0', '1', etc., are positive and in increasing order, and if there is nothing but digits between '0' and '9'. Fortunately, this is true for all conventional character sets.

By definition, arithmetic involving chars and ints converts everything to int before proceeding, so char variables and constants are essentially identical to ints in arithmetic contexts. This is quite natural and convenient. For example, c - '0' is an integer expression with a value between 0 and 9, corresponding to the character '0' to '9' stored in c, and is thus a valid subscript for the 10-element array ndigit.

The decision as to whether a character is a digit, a white space, or something else is made by the sequence if (c >= '0' && c <= '9') ++ndigit[c - '0']; else if (c == ' ' || c == '\n' || c == '\t') ++nwhite; else ++nother; The pattern
if (condition) statement else if (condition) statement else statement occurs frequently in programs as a way to express a multi-way decision. The code is simply read from the top until some condition is satisfied; at that point the corresponding statement part is executed, and the entire construction is finished. (Of course, statement can be several statements enclosed in braces.) If none of the conditions is satisfied, the statement after the final else is executed, if it is present. If the final else and statement are omitted, as in the word count program, no action takes place. There can be an arbitrary number of else if (condition) statement groups between the initial if and the final else. As a matter of style, it is advisable to format this construction as we have shown, with proper indentation, so that long decisions do not march off the right side of the page.

The switch statement, to be discussed in Chapter 3, provides another way to write a multi-way branch that is particularly suitable when the condition being tested is simply whether some integer or character expression matches one of a set of constants. For contrast, we will present a switch version of this program in Chapter 3.

Functions. In C, a function is equivalent to a subroutine or function in Fortran, or a procedure in PL/1, Pascal, etc. A function provides a convenient way to encapsulate some computation in a black box, which can then be used without worrying about its innards. Functions are really the only way to cope with the potential complexity of large programs. With properly designed functions, it is possible to ignore how a job gets done; knowing what is done is sufficient. C is designed to make the use of functions easy, convenient, and efficient. You will often see a function only a few lines long called only once, just because it clarifies some piece of code.

So far, we have used functions like printf, getchar, and putchar that have been provided for us. Now it's time to write a
few of our own. Since C has no exponentiation operator like the ** of Fortran or PL/1, let us illustrate the mechanics of function definition by writing a function power(m, n) to raise an integer m to a positive integer power n. That is, the value of power(2, 5) is 32. This function certainly doesn't do the whole job of exponentiation, since it only handles positive powers of small integers, but it is best to confuse only one issue at a time.

Here is the function power and a main program to exercise it, so you can see the whole structure at once. This sample code is on page 23 of the textbook: #include <stdio.h> main() { int i; for (i = 0; i < 10; ++i) printf("%d %d %d\n", i, power(2, i), power(-3, i)); } That is the end of main. Now we begin the function: power(x, n) int x, n; { int i, p; p = 1; for (i = 1; i <= n; ++i) p = p * x; return (p); }

Each function has the same form: the function name, an argument list in parentheses (if any), argument declarations (if any), and then the body of the function in braces, which includes declarations and statements. The functions can appear in either order, and in one source file or in two. Of course, if the source appears in two files, you will have to say more to compile and load it than if it all appears in one, but that is an operating system matter, not a language attribute. For the moment, we will just assume that both functions are in the same file, so whatever you have learned about running C programs
will not change.

The function power is called twice in the line printf("%d %d %d\n", i, power(2, i), power(-3, i)); Each call passes two arguments to power, which each time returns an integer to be formatted and printed. In an expression, power(2, i) is an integer just as 2 and i are. Not all functions produce an integer value; we will take this up in more detail in Chapter 4.

In power, the arguments have to be declared appropriately so their types are known before the beginning of the body of the function. This is done by the line int x, n; that follows the function name. The argument declarations go between the argument list and the opening left brace; each declaration is terminated by a semicolon. The names used by power for its arguments are purely local to power and not accessible to any other function: other routines can use the same names for their variables without conflict. This is also true of the variables i and p within the function: the i in power is unrelated to the i in main.

The value that power computes is returned to main by the return statement, which is just as in PL/1. Any expression may occur within the parentheses. A function need not return a value: a return statement with no expression causes control, but no useful value, to be returned to the caller, as does falling off the end of a function by reaching the terminating right curly brace.

Section 1.8: Arguments, Call by Value. One aspect of C functions may be unfamiliar to programmers who are used to other languages, particularly Fortran and PL/1. In C, all function arguments are passed by value. This means that the called function is given the values of its arguments in temporary variables (actually on a stack) rather than their addresses. This leads to some
different properties than are seen with call-by-reference languages like Fortran and PL/1, in which the called routine is handed the address of the argument, not its value.

It may seem strange that the authors are calling so much attention, in the very first chapter, to the fact that function arguments are passed by value. Most modern programming languages like Python, PHP, or Java pass single-value arguments by value by default, and to pass an argument by reference you need to do something special, like adding the ampersand in the function declaration in PHP. Passing by reference was the norm before C, and passing by value was the norm after C. Since modern languages were deeply influenced by C, and often written in C, passing by value is the norm for modern languages. It is nice because it isolates the data in the calling code from the called code, so the called code can't easily mess with its arguments, either intentionally or by mistake, and create an unexpected side effect and possibly a bug or security flaw in the calling code.

It was a bit of work to make pass by value work in C. C implements a call stack, where a bit of memory is automatically allocated at each function call, and C makes a copy of the values in the calling code to pass them into the called code, in such a way that the called code can see the values and change its local copies without affecting the values in the calling code. The same call stack that made it possible for C function arguments to be passed by value also made it possible for a function to call itself recursively. Fortran functions could not be called recursively until the 1990 version of Fortran.

If you know your Python, you know that simple variables like integers and strings are passed by value, while structured data like dictionaries and lists are passed by reference, i.e., the called function can modify its arguments. We will later see this in C as well. Talking about call stacks, recursive functions, and the fact that arrays and structures are
called by reference is jumping ahead somewhat, so for now let's just remember the authors' point that normal values like integers and floats are passed by value in C.

Back to the text. The main distinction is that in C the called function cannot alter a variable in the calling function; it can only alter its private, temporary copy. Call by value is an asset, however, not a liability. It usually leads to more compact programs with fewer extraneous variables, because arguments can be treated as conveniently initialized local variables in the called routine. For example, here is a version of power which makes use of this fact. This code is on page 24 of the text: power(x, n) int x, n; { int i, p; for (p = 1; n > 0; --n) p = p * x; return (p); } The argument n is used as a temporary variable and is counted down until it becomes zero; there is no longer a need for the variable i as in the previous example. Whatever is done to n inside power has no effect on the argument that power was originally called with.

When necessary, it is possible to arrange for a function to modify a variable in the calling routine. The caller must provide the address of the variable to be set (technically, a pointer to the variable), and the called function must declare the argument to be a pointer and reference the actual variable indirectly through it. We will cover this in detail in Chapter 5.

When the name of an array is used as an argument, the value passed to the function is actually the location or address of the beginning of the array. There is no copying of the array elements. By subscripting this value, the function can access and alter any element of the array in the calling code. This is the topic of the next section.

Now, I would recommend that you be careful
looking at the code samples in the rest of this chapter. Recall that in C, array sizes do not grow and shrink dynamically after they are allocated. The authors statically allocate character arrays capable of handling lines up to 1,000 characters long. Their code works, but it is somewhat brittle. So look at the next two sections as examples of syntax, with many important concepts about character strings stored as arrays and about calling patterns when passing arrays to functions as parameters, but not exactly as best practice when handling dynamically sized data.

Back to the text. Probably the most common type of array in C is an array of characters. To illustrate the use of character arrays and functions to manipulate them, let's write a program that reads a set of lines and prints the longest. The basic outline is simple enough: while there is another line, if it is longer than the previous longest, save it and its length; then, at the very end, print the longest line. The outline makes it clear that the program divides naturally into pieces. One piece gets a new line, another checks it, another saves it, and the rest controls the process.

Since things divide so nicely, it would be well to write them that way too. Accordingly, let's first write a separate function called getline to fetch the next line of input; this is a generalization of getchar. To make the function useful in other contexts, we will try to make it as flexible as possible. At the minimum, getline has to return a signal about possible end of file; a more generally useful design would be to return the length of the line, or zero if end of file is encountered. Zero is never a valid line length, since every line has at least one character; even a line containing only a newline has length 1.

I would note that here in Chapter 1 we have changed the book's original function name getline to get_line in the code examples, because it conflicts with the stdio.h that defines getline as a library
function. In this chapter the authors are providing examples around function naming and linking. In later chapters, code samples will simply use the built-in getline, without an underscore, to read input.

When we find a line that is longer than the previous longest, it must be saved somewhere. This suggests a second function, copy, to copy the new line to a safe place. Finally, we need a main program to control getline and copy. Here is the result. The sample code for this is on page 26, and it's a bit long, so you might want to take a look at the sample code in a browser.

Pound include stdio.h. Pound define MAXLINE 1000. main, open paren, close paren, open curly brace. int len semicolon, which is the current line length. int max semicolon, which is the maximum length we've seen so far. char line, open square bracket, MAXLINE, close square bracket, semicolon, a character array that's the current input line. And then char save, open square bracket, MAXLINE, close square bracket, semicolon, which is a character array where we're going to save the longest line. On to the code: max equals zero semicolon. while, open parentheses, open parentheses, len equals getline, open parentheses, line comma MAXLINE, close parentheses, close parentheses, greater than zero, close parentheses. if, open parentheses, len greater than max, close parentheses, open curly brace, max equals len semicolon, to save it, and then copy, open parentheses, line comma save, close parentheses, semicolon, close curly brace. if, open parentheses, max greater than zero, close parentheses, i.e. there was a line, printf, open parentheses, double quote percent s double quote, comma, save, close parentheses, semicolon, and close curly brace to end the main program.

Now we're in the first function. getline, open parentheses, s comma lim, close parentheses. char s, open bracket, close bracket, semicolon; since it's being passed in as an argument, we don't need to know its length. The next argument is int lim semicolon. So getline takes a character array of unknown length and a limit that tells us the length of the character array. Open curly brace. int c comma i semicolon. for, open parentheses, i equals zero semicolon, i less than
lim minus one, double ampersand, open parentheses, c equals getchar, open parentheses, close parentheses, close parentheses, not equal EOF, double ampersand, c not equal single quote backslash n single quote, semicolon, plus plus i, close parentheses. In the body of the loop it's s, open square bracket, i, close square bracket, equals c semicolon; from now on I'll read that as s sub i equals c. At the end of the loop we say if, open parentheses, c double equals single quote backslash n single quote, close parentheses, open curly brace, s sub i equals c semicolon, plus plus i semicolon, close curly brace. s sub i equals single quote backslash zero single quote, semicolon. return, open parentheses, i, close parentheses, semicolon, close curly brace. And that's the end of the getline function.

Now we're on to the copy function. copy, open parentheses, s1 comma s2, close parentheses. The purpose of this function is to copy s1 to s2, assuming that s2 is big enough. The declaration is char s1, open square bracket, close square bracket, comma, s2, open square bracket, close square bracket, semicolon. As a note, these arrays have a size; we just don't know what it is, and we hope that they're large enough. The body of the copy function starts with open curly brace, int i semicolon, i equals zero semicolon, while, open parentheses, open parentheses, s2 sub i equals s1 sub i, close parentheses, not equal single quote backslash zero single quote, close parentheses, plus plus i semicolon, and close curly brace to end the copy function.

main and getline communicate both through a pair of arguments and a returned value. In getline, the arguments are declared by the lines char s, open square bracket, close square bracket, semicolon, int lim semicolon, which specify that the first argument is an array of unknown length and the second is an integer. The length of the array s is not specified in getline, since it is determined in main. getline uses return to send a value back to the caller, just as the function power did. Some functions return a useful value; others, like copy, are only used for their effect and return no value. getline puts the character backslash zero, the null character, whose integer value
is zero, at the end of the array it is creating, to mark the end of the string of characters. This convention is also used by the C compiler: when a string constant like double quote hello backslash n double quote is written in a C program, the compiler creates an array of characters containing the characters of the string and adds a backslash zero at the end to terminate it, so that functions such as printf can detect the end. So that would lead to an array that has h, e, l, l, o, backslash n, backslash zero: five letters, then the newline, which is a sixth character, and then backslash zero, which is an actual character too. Again, arrays don't know their own length, so you use the backslash zero as the indicator of the end of a string. The percent s format specification in printf expects a string represented in exactly this form. If you examine copy, you will discover that it too relies on the fact that its input argument s1 is terminated by backslash zero, and it copies this character, backslash zero, into the output argument s2. All of this implies that backslash zero is not part of normal text; it's merely a marker.

It is worth mentioning in passing that even a program as small as this one presents some sticky design problems. For example, what should main do if it encounters a line which is bigger than its limit? getline works properly in that it stops collecting when the array is full, even if no newline has been seen. By testing the length and the last character returned, main can determine whether the line was too long and then cope with it as it wishes. In the interest of brevity, we have ignored this issue. There is also no way for a user of the getline function to know in advance how long an input line might be, so getline checks for overflow. On the other hand, a user of the copy function already knows, or should be able to find out, how big the strings are, so we have chosen not to add error checking to it.

Section 1.10, scope; external variables. The variables in main (line, save, etc.) are private
or local to main. Because they are declared within main, no other function can have direct access to them. The same is true of the variables in the other functions; for example, the variable i in getline is unrelated to the i in copy. Each local variable in a routine comes into existence only when the function is called and disappears when the function is exited. It is for this reason that such variables are usually known as automatic variables, following terminology in other languages. We'll use the term automatic henceforth to refer to these dynamic local variables. Chapter 4 discusses the static storage class, in which local variables do retain their values between function invocations.

Because automatic variables come and go with function invocation, they do not retain their values from one call to the next and must be explicitly set upon each entry. If they are not set, they will contain garbage.

As an alternative to automatic variables, it is possible to define variables which are external to all functions, that is, global variables which can be accessed by name by any function that cares to. This mechanism is rather like Fortran COMMON or PL/I EXTERNAL. Because external variables are globally accessible, they can be used instead of arguments to communicate data between functions. Furthermore, because external variables remain in existence permanently, rather than appearing and disappearing as functions are called and exited, they retain their values even after the functions that set them are done.

An external variable has to be defined outside of any function; this allocates actual storage for it. The variable must also be declared in each function that wants to access it; this may be done either by an explicit extern declaration or implicitly by context. To make the discussion concrete, let's rewrite the longest-line program with line, save, and max as external variables. This requires changing the calls, declarations, and bodies of all three functions. This sample code is on page 29 of the
textbook, and it's pretty long, but I'll read it for you. Pound include stdio.h. Pound define MAXLINE 1000. And we're still outside of main: char line, open bracket, MAXLINE, close bracket, semicolon. char save, open bracket, MAXLINE, close bracket, semicolon. int max semicolon. Those are our three global variables.

Starting the main: main, open paren, close paren, open curly brace. int len semicolon. extern int max semicolon, so we're saying that this is an integer, but it's not to be allocated inside of main. extern char save, open square bracket, close square bracket, semicolon; the length of the save array is defined above, so we don't need to define it here. max equals zero semicolon. while, open parentheses, open parentheses, len equals getline, open parentheses, close parentheses, close parentheses, greater than zero, close parentheses. if, open parentheses, len greater than max, close parentheses, open curly brace, max equals len semicolon, copy, open parentheses, close parentheses, semicolon; no parameters to copy, because it's going to simply talk to the global variables. Close curly brace. After the loop finishes, we say if, open parentheses, max greater than zero, close parentheses, then printf, open parentheses, double quote percent s double quote, comma, save, close parentheses, semicolon, close curly brace, and that is the end of main.

So now we have the getline function, which is specialized to deal with external variables. We start with getline, open parentheses, close parentheses, no parameters, open curly brace. int c comma i semicolon; these are local variables. extern char line, open square bracket, close square bracket, semicolon; this is our reference inside of getline to the global variable line. for, open parentheses, i equals zero semicolon, i less than MAXLINE minus one (MAXLINE is a constant defined at compile time), double ampersand, open parentheses, c equals getchar, open parentheses, close parentheses, close parentheses, not equal EOF, double ampersand, c not equal single quote backslash n single quote, semicolon, plus plus i, close parentheses, line sub i equals c semicolon. That's the
for loop that, in effect, reads characters one at a time and puts them in line. After the for loop we say if, open parentheses, c double equals single quote backslash n single quote, close parentheses, open curly brace, line sub i equals c semicolon, plus plus i semicolon, close curly brace; this ensures that we append the newline to the line. line sub i equals single quote backslash zero single quote, semicolon; that's the string termination character. return, open parentheses, i, close parentheses, semicolon; this is the length that getline is returning. And then close curly brace to end the getline function.

And then we have the copy function, and again it takes no parameters. copy, open parentheses, close parentheses, open curly brace. int i semicolon. extern char line, open bracket, close bracket, comma, save, open bracket, close bracket, semicolon. i equals zero semicolon. while, open parentheses, open parentheses, save sub i equals line sub i, close parentheses, not equal single quote backslash zero single quote, close parentheses, plus plus i semicolon, and close curly brace for copy.

So the external variables in main, getline, and copy are defined by the very first lines in the example above, outside of main, which state their type and cause storage to be allocated for them. Syntactically, external definitions are just like the declarations we used previously, but because they occur outside of any function, including outside the main function, the variables are external. Before a function can use an external variable, the name of the variable must be made known to the function. One way to do this is to write an extern declaration in the function; the declaration is the same as before except for the added keyword extern.

In certain circumstances, the extern declaration can be omitted. If the external definition of the variable occurs in the same source file before it is used in a particular function, then there is no need for an extern declaration in the function. The extern declarations in main, getline, and copy are thus redundant. In fact, common practice is to place all
of the external variable definitions at the beginning of the source file and then omit all extern declarations.

If the program is in several source files, and a variable is defined in, say, file one and used in file two, then an extern declaration is needed in file two to connect the two occurrences of the variable. This topic is discussed at length in chapter 4.

You should note that we are using the words declaration and definition very carefully when we refer to external variables in this section. Definition refers to the place where the variable is actually created or assigned storage; declaration refers to places where the nature of the variable is stated but no storage is allocated.

By the way, there is a tendency to make everything in sight an extern variable, because it appears to simplify things: argument lists are short, and variables are always there when you want them. But external variables are always there even when you don't want them. This style of coding is fraught with peril, since it leads to programs whose data connections are not at all obvious. Variables can be changed in unexpected and even inadvertent ways, and the program is hard to modify when it becomes necessary. The second version of the longest-line program is inferior to the first, partly for these reasons and partly because it destroys the generality of two quite useful functions by hardwiring into them the names of the variables they will manipulate.

Section 1.11, summary. At this point we have covered what might be called the conventional core of C. With this handful of building blocks, it's possible to write useful programs of considerable size, and it would probably be a good idea if you paused long enough to do so. The exercises that follow are intended to give you suggestions for programs of somewhat greater complexity than the ones presented in this chapter. After you have this much C under control, it will be well worth your effort to read on, for the features covered in the next few chapters are
where the power and expressiveness of the language begin to become apparent.

This work is based on the 1978 C programming book written by Brian W. Kernighan and Dennis M. Ritchie. Their book is copyright, all rights reserved, by AT&T, but is used in this work under fair use because of the book's historical and scholarly significance, its lack of availability, and the lack of an accessible version of the book. The book is augmented in places to help readers understand its right place in a historical context amidst the major changes of the 1970s and 1980s, as computer science evolved from a hardware-first, vendor-centered approach to a software-centered approach where portable operating systems and applications written in C could run on any hardware. This is not the ideal book for learning C programming, because the 1978 edition does not reflect the modern C language. Using an obsolete book gives us an opportunity to take students back in time and understand how the C language was evolving as it laid the groundwork for a future with portable applications.

Welcome to chapter 2, types, operators, and expressions. Again, I'm not going to tell you everything in the book. I want you to read the book; the book does a great job. I'm just going to call your attention to some things that might seem a little bit weird if you're coming from a language like Python or JavaScript or even PHP, where things are objects and you don't even notice it. You've been using objects your whole career and you didn't even realize it.

So we're going to talk about data types and storage allocation. Part of what I love about teaching this historical view of C is that we have to talk about storage allocation. Float and double worked out pretty well, partly because in the early days of C they did them all in software, so they just made them easy and made them work well. They weren't expected to be fast. The things that they wanted to be super fast were like the
integer and byte data, character data. Then type conversion: there's a story that connects integer division in Python 2, and all that pain of Python 3, to how division changed, why it was the way it was, and how that worked. Again, it has to do with performance and simple decisions that got made, sadly.

Bitwise logical operations: we'll talk about them. You're probably not going to use them, but it's really important in a historical context to understand why they were so thorough. It really had to do with the fact that, with word-oriented computers switching to character-oriented computers, all of us programmers were thinking in words, and if we didn't see shifting and masking and bitwise stuff, we'd be like, I can't program in this thing, because a lot of our work on those word-oriented computers was masking and shifting. So we had to have it. We didn't use it as much as we might have thought.

So let's start with division. In the good old days, we were not worried too much about doing division. If you were doing division and you cared about the result, you were probably doing it in floating point, because you were doing scientific computing, and you did that on supercomputers; you didn't do that on general-purpose computers. Unix was really designed for general-purpose computers, and on general-purpose computers you sort of thought to yourself, why is division that important? I'm sure they made some decision somewhere, and I do not know why. It probably had to do with one of the computers they were working with having truncating division in hardware and non-truncating division in software, or rounding division, or whatever. A lot of those computers didn't even have fast floating point, so some of the computers they were working on did all the floating point in software, and maybe they even did integer division in software with loops and stuff. I don't exactly know, but they made this decision
to do truncating integer division, and this was one of the biggest and most painful things about going from Python 2 to Python 3. Python 2 was like over 25 years old, and it wasn't that long after C that Python 2 was written. Python 2 was written in C, and the Python 2 to Python 3 transition was a big deal. It took a long time; it took 12 years to get there. Python 2 was like the greatest thing ever except for a few things. Because Python 2 was so related to C, the strings in Python 2 were natively ASCII, not Unicode, which meant it couldn't even do Spanish characters, let alone Asian characters. And print was part of the language.

The programmers got a lot of help. They got automated code converters and syntax checkers, and they did all kinds of things where they would take certain features, like the print function, put them in Python 3, and then backport them to Python 2, so you could switch from using the print statement to the print function. There were lots of things that made this transition as easy as possible.

The one thing that they really never could crack, where we just had to bite the bullet and get used to it, was that Python 2 division of integers returns an integer and the division is truncated. So three divided by four is zero in Python 2, and in Python 3 integer 3 divided by integer 4 becomes floating point 0.75, because that's what calculators do. Python 2 division truncated because it didn't seem like it mattered much back in the 80s, and C integer division truncated. Integer 3 divided by integer 4 was zero in C, and so it was in Python, and 20-plus years later that was the one thing we couldn't auto-convert. Python 3 does it the way calculators do.

It's less of a problem in C because C is actually a strongly typed language, which means if I wanted to say 3 over 4, I knew whether it was integers or floats, and I could force that. But in Python you just assign to the variable, and so it infers the
variable type from the result of that expression, whereas in C you've got to declare x as a float or a double or an int. So when a C programmer writes a division, they already know that they have to cast the values or use float constants to trigger type conversion in expressions. So as you're looking through the chapter and seeing these type conversions and casts, that's the kind of problem that was being solved. Then Python simplified it and made it really kind of yucky, and then they had to fix it. It's better in Python 3, and most of you have just learned Python 3, so consider yourself lucky.

Another thing they talk about a lot has to do with storage allocation on a bit-by-bit basis: we tend to need to know how to represent and print things out not just in base 10. It really has to do with the fact that base 8 and base 16 are better at printing out binary data, raw data. Base 10 is for numbers like how many pizzas do you want: I want 16 pizzas, or 22 pizzas. It's the natural way humans think.

To talk about bases, let's start by just reviewing what base 10 means. There is the ones place and the tens place, and of course later there are hundreds and thousands. The 4 in 42 represents four tens, so you can think of it as 4 times 10, which is 40, and then the ones place is two more, so 42 is 40 plus 2. That is kind of redundant; we do that instinctively.

Now let's take a look at base 8. 42 in base 10 is 52 in base 8, and what base 8 really means is that the digits in the two places mean something different: the 5 in 52 represents eights, so there are five eights in this number that we're dealing with, and two ones. 5 times 8 is 40, and 2 times 1 is 2, so converting 52 in base 8 to base 10, we get 42. Base 8 lines up perfectly with base 2, because three base-2 digits equal one base-8 digit, and I did a
lot of base 8 in the early days, but base 16 is really the way we tend to do it now, because it's a little more dense. The rightmost place is still the ones, and the next place is the sixteens, so a 2 in the sixteens place represents 2 times 16, or 32. What's left over from 42 is 10, which we represent with an A, and 10 plus 32 is 42. The problem is that we only have the digits 0 through 9, so by convention 10 is A, 11 is B, 12 is C, 13 is D, 14 is E, and 15 is F. F is all ones; it's four ones, and I just know that. So I can convert from hex to base 2 very, very quickly, and if I need to look at a dump of some memory, I can dump it in hex and then convert it to base 2 when I need to. Converting back and forth between base 16 and base 2 is a bit of a trick, and I don't really care if you do much of this.

You can grab this sample code and play with it. This is a conversion of a base-10 number like 1234 to base 8 and then to base 16. The way this one does it is that it converts the number from the rightmost, low digit up to the high digit. What you do is use the modulo operation. You take your number, 1234, and you take it modulo 8, and what's left over as the remainder is 2, so that's the far-right digit in the new number. Then you chop it off with integer division by 8 and see what's left, and that's 154, and you're accumulating the 2 in the converted number. Then you take 154 modulo 8 and you get 2, and then you chop off the next digit with integer divide by 8 and you get 19. So now your bottom two digits are 2 and 2. Then you do 19 modulo 8 and you get 3, so the next digit from the right is a 3. Then you integer divide by 8 to get what's left over, and that's 2, which is below 8, so the fourth digit from the right is a 2. So 1234 in base 8 is 2322. You can do the exact same thing for base 16; the difference is you
have to look up the digits, because base 16 needs A, B, C, D, E, and F, and so I make a little string to look them up in. We're talking Python here. We repeatedly do modulo 16, integer divide by 16, modulo 16, integer divide by 16. We take 1234, and when we convert it to base 16 it kind of comes up from the bottom as 2, D, 4, which we read as 4D2. So that's just an algorithm that converts these; you tend to use modulo, and that's how you can convert from one base to another.

It's not critical in this class, and we're not going to spend a lot of time converting bases, but we need to be aware that because there was so much awareness of how bits were stored, we tended to print a lot of stuff out in hexadecimal or in base 8, and I just want you to know what those things are. If you look, for example, at the ASCII chart that we've already seen, it shows us that the letter A is 65, and in hex it's 41, and in octal it's 101, and in base 2, or binary, it's a one and a bunch of zeros and a one. In the old days you had to be much more aware of the real bits inside the computer, and hex and octal were better ways to know what the bits were. So there you go.

Another area where C was really one of the early innovative languages was byte-addressable computers. We don't think about it much now; everything's a string, and we can look at the characters in the string. But in the old days, before C and the generation of computers that kind of triggered C, we didn't have characters. You couldn't load a character in the hardware; you could only load a word, and then you had to find the character within the word. The language that really was the immediate precursor to C was the B language, and the difference was that B was a really cool low-level word-oriented language, and then the C language came from B and became a byte
oriented language. C sort of said, we're going to do byte addressing.

So take a look at the way I had to do character support on a CDC 6500, which is the computer I was using in about 1975 or 1976. It was a scientific computer; it barely cared about printing characters, and it didn't even have lowercase. It had 60-bit words and packed six-bit uppercase characters into those words, and it used a series of zeros to fill them up. If I put the words hello world into the CDC 6500, it took two words: H, E, L, L, O, space, W, O, R, L was packed into the first word, and D was in the second one. Then we did what was called zero filling: the rest of those characters were all zeros, the integer zero.

If I wanted to know what the fifth character of this two-word string was, the O for example, I would create a mask, and in that mask I would have zeros where I wanted to get rid of stuff and ones where I wanted to copy stuff. I would take hello world and run it through this mask with the bits in the right position, and then I would get the O and all zeros in the rest of the word. Then I would have to shift it half the way down: because there were 10 characters, I had to shift it five characters to the right, and then I would have the letter O in the bottom six bits of that word. Then I could write an if statement. That's how I would say, if the fifth character is the letter O: I had to extract the fifth character by hand.

So you can imagine how happy I was when I began to see programming languages that allowed me to use more of an array syntax and say string sub five, or in this case, counting 0, 1, 2, 3, 4, string sub four. I could treat characters as an array. The notion of a character array, for me in 1977, was like, what? You couldn't do that. And so a whole generation of programmers can go through their entire careers without having to do any masking and shifting. So this
chapter is going to talk to you about it, and you might say, well, if C was so good at doing it for you, why did they show it? The answer is that people like me would not have had respect for this language if it weren't for the fact that it had good masking and shifting, because we were doing that all the time on these word-oriented computers and in word-oriented languages.

Just as C and Unix were making the world safe for characters, we had this other problem, and I'm only going to talk a little bit about it, so don't worry about it: the concept of endianness. If you're loading words, do you load them with the least significant bytes first or the most significant bytes first? Most computers were big endian, and big endian made the most sense to us software developers, because that's how we thought memory would lay out. But it turns out that a few processors, if they were going to do an add, wanted to load the low end of the integer first, so they could start the addition while they were bringing in the high end, and they could overlap the load and the add. Intel, which in those days wasn't all that popular, was so interested in microprocessor performance that they went little endian so that loading and addition were fast. So we've been stuck with a lot of little-endian microprocessors since then. Big endian versus little endian is one of the harder things to deal with; it really is.

So I'm going to show you some code. Really, all I want you to do is feel sorry for those of us who had to figure out little endian and big endian. Let me just give you an outline of what this code is doing. I'm not going to walk through it in detail; it is kind of scary, and you're not even going to understand most of this code until chapter 5. Let's just talk a little bit about the bits and how masking and shifting would have worked if we didn't have character arrays. So what I'm doing in
this program is creating a character array. The length of this character array is hello world plus one for the terminator: h, e, l, l, o, space, w, o, r, l, d, so 11 plus 1, which means 12 characters are allocated. Then what I'm doing in the next line, the one that says int star sii, is saying I want to take the same storage and pretend it's an integer array. That line takes the address of the first character and converts it from a pointer to a character, which is what char s is, into a pointer to an integer. Again, I'm jumping ahead to chapter 5, so I'm not expecting you to understand all this; I'm just making you aware of it.

So in those first two lines I've got a character array and an integer array. This is a 32-bit integer, and that means that the characters are stuck into 32-bit integers in a little-endian way. If you look at the first four characters, which is 32 bits, printed as one integer from left to right, the first character that comes out is the l. You can see the little-endian order, which in your mind looks like it has been shifted around, but that's because this is running on a little-endian computer. Different computers will give you different results, and this is a little-endian example.

You can see the masking and shifting where I'm going to try to get the e out, which would normally be the second character, but here it's the second from the bottom of the first integer. What I do is make a mask, and I print this out: I take FF, which is eight bits of ones, and I shift it up eight bits to the left, and you can see that in the printout. Then I mask out that character, which is the e, but then it's in the wrong position, so I have to take that masked result and shift it back down eight bits so that it's in the bottom part. So now I can check to see what
that letter is. This is how I would pull out the second character of a string so I could check whether it's an e, because I can't directly compare the second character of the string. In Python you're like, why are we doing this? That's why you build a string class instead of using a character array for this. And I made it even worse by taking a character array, viewing it as an integer array, and then playing with the integers. So you don't have to understand this; just be thankful that you use Python, and if you don't use Python, use C. Whether it's a big-endian or a little-endian machine, you can treat an array of characters as an array of characters, and you can get the third one or the fifth one with square bracket notation.

Okay: storage allocation, storage allocation, storage allocation. As a summary of this lecture, we talked about number base conversion. We talked a little bit about division and why the Python 2 integer division happened; I don't really have a really good answer for that. And we talked about the concept of integers and words and bytes, and masking and shifting, and characters, because these topics are covered in this chapter and they will feel very foreign and unnatural to you. Just give them a shot: read through them and understand them, and they'll make sense later. Later we're going to learn about structures and pointers and addresses, advancing, and stuff like that, and it'll all make a lot more sense. Coming up.

Welcome to C Programming for Everybody. My name is Charles Severance, and this is my reading of the 1978 C programming book written by Brian Kernighan and Dennis Ritchie. At times I add my own interpretation of the material from a historical perspective.

Chapter 2, types, operators, and expressions. Variables and constants are the basic data objects manipulated in a program. Declarations list the variables to be used and state what type they have and perhaps what their initial values are. Operators specify what is to be done to them.
Expressions combine variables and constants to produce new values. These are the topics of this chapter.

Section 2.1: Variable Names. Although we didn't come out and say so, there are some restrictions on variable and symbolic constant names. Names are made up of letters and digits; the first character must be a letter. The underscore counts as a letter; it is useful for improving the readability of long variable names. Upper and lower case are different; traditional C practice is to use lower case for variable names and all upper case for symbolic constants. Only the first eight characters of an internal name are significant, although more may be used. For external names such as function names and external variables, the number may be less than eight, because external names are used by various assemblers and loaders. Appendix A lists the details. Furthermore, keywords like if, else, int, float, etc., are reserved: you can't use them as variable names, and they must be in lower case.

I would note that in modern C languages, the limitation of the first eight characters of a variable name being unique has been extended. In most C variants, at least 30 characters of a variable name are treated as unique. The character limitation reflected the typical limitation on identifier length in assembly language programming and the runtime linkers of the time.

Naturally, it's wise to choose variable names that mean something, that are related to the purpose of the variable, and that are unlikely to get mixed up typographically.

Section 2.2: Data Types and Sizes. There are only a few basic data types in C: char, which is a single byte capable of holding one character in the local character set; int, an integer, typically reflecting the natural size of integers on the host machine; float, a single-precision floating point; and double, a double-precision floating point. In addition, there are a number of qualifiers which can be applied to int: short, long, and unsigned. Short and long refer to different sizes of integers. Unsigned numbers
obey the laws of arithmetic modulo 2 to the n, where n is the number of bits in an int; unsigned numbers are always positive. The declarations for the qualifiers look like short int x; long int y; unsigned int z; The word int can be omitted in such situations, and typically is. The precision of these objects depends on the machine at hand; the table below shows some representative values. On the DEC PDP-11, a char is 8 bits, an int is 16, a short is 16, a long is 32, a float is 32, and a double is 64. On the Honeywell 6000, which uses the ASCII character set, a char is 9 bits, an int is 36 bits, a short is 36 bits, a long is 36 bits, a float is 36 bits, and a double is 72 bits. On the IBM 370, which uses the EBCDIC character set, a char is 8 bits, an int is 32 bits, a short is 16 bits, a long is 32 bits, a float is 32 bits, and a double is 64 bits. And so on. The intent is that short and long should provide different lengths of integers where practical; int will normally reflect the most natural size for a particular machine. As you can see, each compiler is free to interpret short and long as appropriate for its own hardware. About all you should count on is that short is no longer than long.

In this table we see that in the mid-1970s C was designed to support a range of computer generations. The PDP-11 was a common previous-generation computer that had less memory, so variable sizes were kept small. The more modern computers in the chart had a bit more memory and could afford slightly larger sizes. The idea of a natural size is the size that could be loaded, computed, and stored, usually in a single machine language instruction. You knew as a programmer that when you used int, the machine code that was generated would not need to include extra instructions for a simple line of code like x = x + 1; On most modern systems, int values in C are 32 bits long and long values are 64 bits long. Even though modern computers can do 64-bit computations in a single instruction, using the shorter
int type when appropriate can save on memory storage and memory bandwidth. Interestingly, the length of a 32-bit int leads to a Unix and C problem with dates that is called the year 2038 problem. A common way to represent time in Unix and C programs was as a 32-bit integer holding the number of seconds since January 1st, 1970. It was quick and easy to compare, add, or subtract these second-counter dates in code and even in databases. But the number of seconds since January 1st, 1970 will overflow a signed 32-bit number on the 19th of January in 2038. By now, in order to avoid problems, most systems have converted to storing these number-of-seconds values in long or 64-bit values, which gives us almost 300 billion years until we need to worry about overflowing second-counter times again.

Back when C was developed, we had two different character sets and two different character variable lengths. The world has generally standardized on the ASCII character set for the core Western characters and on Unicode UTF-8 to represent all characters in all languages worldwide, but that is a story for another time. For now, just think of the char type as also a byte type: it is 8 bits in length and can store ASCII. Modern languages like Python or Java have excellent support for wide character sets. In our historical look at C, we will not cover wide or multi-byte characters.

Also, if you look at the float and double types, you will see different bit sizes. Even worse, each of these computers in the 1970s did floating point computation using slightly different hardware implementations, and the same code run on different computers would give slightly different results and have unpredictable behavior on overflow, underflow, and other extraordinary floating point operations. This was solved by the introduction of the IEEE 754 standard in 1985, which standardized floating point formats. This standard fixed the length of float and double, and also ensured that the same set of floating point
calculations would produce the exact same result on different processors.

Section 2.3: Constants. int and float constants have already been disposed of, except to note that the usual 123.456e-7 or 0.12E3 scientific notation for floats is also legal. Every floating point constant is taken to be double, so the e notation serves for both float and double. Long constants are written in the style 123L. An ordinary integer constant that is too long to fit in an int is also taken to be a long. There is a notation for octal and hexadecimal constants: a leading 0 (zero) on an int constant implies octal, and a leading 0x or 0X indicates hexadecimal. For example, decimal 31 can be written as 037 in octal and 0x1f or 0X1F in hex. Hexadecimal and octal constants may also be followed by the letter L to make them long.

A character constant is a single character written within single quotes, as in 'x'. The value of a character constant is the numeric value of the character in the machine's character set. For example, in the ASCII character set the character zero, or '0', is 48, and in EBCDIC '0' is 240, both quite different from the numeric value 0. Writing '0' instead of a numeric value like 48 or 240 makes the program independent of the particular value. Character constants participate in numeric operations just like any other numbers, although they are most often used in comparisons with other characters. A later section treats conversion rules.

Certain non-graphic characters can be represented in character constants by escape sequences like \n for newline, \t for tab, \0 for null, \\ for backslash itself, \' for single quote, etc. These look like two characters but are actually only one. In addition, an arbitrary byte-sized bit pattern can be generated by writing '\ddd', where ddd is one to three octal digits, as in #define
FORMFEED '\014', which is ASCII for a form feed. We mention form feed here because in the 1970s we sent much of our output to printers, physical printers. A form feed was the character we would send to the printer to advance to the top of a new page.

The character constant '\0' represents the character with the value zero. '\0' is often written instead of 0 to emphasize the character nature of some expression.

A constant expression is an expression that involves only constants. Such expressions are evaluated at compile time rather than run time, and accordingly may be used in any place that a constant may be, as in #define MAXLINE 1000 char line[MAXLINE+1]; or seconds = 60 * 60 * hours;

A string constant is a sequence of zero or more characters surrounded by double quotes, as in "I am a string" or "", which is a way to show an empty string. The quotes are not part of the string, but serve only to delimit it. The same escape sequences used for character constants apply in strings; \" represents the double quote character.

Technically, a string is an array whose elements are single characters. The compiler automatically places the null character \0 at the end of each such string, so programs can conveniently find the end. This representation means that there is no real limit to how long a string can be, but programs have to scan one completely to determine its length. The physical storage required is one more location than the number of characters written between the quotes. The following function strlen(s) returns the length of the character string s, excluding the terminal \0: strlen(s) char s[]; { int i; i = 0; while
(s[i] != '\0') ++i; return(i); }

Be careful to distinguish between a character constant and a string that contains a single character: 'x' is not the same as "x". The former is a single character, used to produce the numeric value of the letter x in the machine's character set. The latter is a character string that contains one character (the letter x) and a \0.

Section 2.4: Declarations. All variables must be declared before use, although certain declarations can be made implicitly by context. A declaration specifies a type and is followed by a list of one or more variables of that type, as in int lower, upper, step; char c, line[1000]; Variables can be distributed among declarations in any fashion; the list above could equally well be written as int lower; int upper; int step; char c; char line[1000]; The latter form takes more room, but is convenient for adding a comment to each declaration or for subsequent modifications.

Variables may also be initialized in their declaration, although there are some restrictions. If the name is followed by an equals sign and a constant, that serves as an initializer, as in char backslash = '\\'; int i = 0; float eps = 1.0e-5; If the variable in question is external or static, the initialization is done once only, conceptually before the program starts executing. Explicitly initialized automatic variables are initialized each time the function they are in is called. Automatic variables for which there is no explicit initializer have undefined (that is, garbage) values. External and static variables are initialized to zero by default, but it is good style to
state the initialization anyway. We will discuss initialization further as new data types are introduced.

Section 2.5: Arithmetic Operators. The binary arithmetic operators are +, -, *, and /, and the modulus operator %. There is a unary -, but no unary +. Integer division truncates any fractional part. The expression x % y produces the remainder when x is divided by y, and thus is zero when y divides x exactly. For example, a year is a leap year if it is divisible by 4 but not by 100, except that years divisible by 400 are leap years. Therefore if (year % 4 == 0 && year % 100 != 0 || year % 400 == 0) it's a leap year else it's not. The % operator cannot be applied to float or double.

The + and - operators have the same precedence, which is lower than the identical precedence of *, /, and %, which are in turn lower than unary minus. Arithmetic operators group left to right. A table at the end of this chapter summarizes precedence and associativity for all operators. The order of evaluation is not specified for associative and commutative operators like * and +; the compiler may rearrange a parenthesized computation involving one of these. Thus a+(b+c) can be evaluated as (a+b)+c. This rarely makes any difference, but if a particular order is required, explicit temporary variables must be used. The action taken on overflow or underflow depends on the machine at hand.

I would note that the above paragraph, allowing the compiler to reorder computations even in the presence of parentheses, is known as the K&R C rearrangement license. As the authors state, it almost never makes a difference unless an expression contains a value computed in a function call, or there is a pointer lookup to find a value for the computation that might
fail. The rule was subtly adjusted in the ISO version of C, but ISO C still does not strictly specify the order in which the operands of otherwise commutative operations are evaluated, even in the presence of parentheses. The good news is that as long as you keep your expressions simple, you don't have to worry about this rule. Sometimes the real value of parentheses is to communicate your intentions to the human readers of your code. If you are writing code that depends on the order of overflow, function calls, and pointer dereferences in a single mathematical expression, perhaps you should break your expression into multiple statements.

Section 2.6: Relational and Logical Operators. The relational operators are >, >=, <, and <=. They all have the same precedence. Just below them in precedence are the equality operators == and !=, which have the same precedence. Relationals have lower precedence than arithmetic operators, so expressions like i < lim-1 are taken as i < (lim-1), as would be expected.

More interesting are the logical connectives && and || (and and or). Expressions connected by && or || are evaluated left to right, and evaluation stops as soon as the truth or falsehood of the result is known. These properties are critical to writing programs that work. For example, here is a loop from the input function getline which we wrote in Chapter 1: for (i=0; i<lim-1 && (c=getchar()) != '\n' && c != EOF; ++i) s[i] = c; Clearly, before reading a new character it is necessary to check that there is room to store it in the array s, so the test i<lim-1 must be made first. Not only that,
but if this test fails, we must not go on and read another character. Similarly, it would be unfortunate if c were tested against EOF before getchar was called: the call must occur before the character in c is checked against EOF.

The precedence of && is greater than that of ||, and both are lower than the relational and equality operators, so expressions like i<lim-1 && (c = getchar()) != '\n' && c != EOF need no extra parentheses. But since the precedence of != is higher than assignment, parentheses are needed in (c = getchar()) != '\n' to achieve the desired result.

Let's take a brief digression. One of the great debates of the 1970s was how to use structured programming to avoid any use of goto statements that lead to completely unreadable spaghetti code. Structured code was easier to read, debug, and validate. Structured code advocated for if-then-else, else-if, while-do loops, and do-while loops, where the loop exit test was at the top or the bottom of the loop respectively. There was a move from flowcharts with lines, boxes, and arrows to structured programming techniques like Nassi-Shneiderman diagrams that used nested boxes to emphasize the structured nature of the code.

The proponents of each approach tended to approach the problem based on the language they used. ALGOL and Pascal programmers were strong advocates of structured programming, and those languages had syntax that encouraged the approach. Fortran programmers had decades of flowchart use and flowchart-style thinking and tended to avoid full adoption of structured programming. Kernighan and Ritchie chose a middle path and made it so that C could support both approaches, to avoid angering either side of the structured programming debate.

One area where the structured code movement kept hitting
a snag was implementing a loop that reads a file and processes data until it reaches the end of the file. The loop must be able to handle an empty file, or no data at all. There are several ways to construct a read-and-process-until-EOF loop, and none of the approaches is ideal. The loop constructions that you can do are: a top-tested loop with a priming read before the loop; a bottom-tested loop with a read as the first statement in the loop and an if-then-else as the rest of the body of the loop; a top-tested infinite loop with a priming read and a middle test and exit; and a top-tested loop with a side-effect read in the test of the loop, which is the way that Kernighan and Ritchie chose to document in this chapter.

All of this serves to explain the syntax while ((c = getchar()) != EOF) { body of the loop } This construct is a top-tested loop, which most programmers prefer, and it folds in the priming read and puts its value in the variable c. But since getchar might also return EOF, we need to check whether we actually received no data at all, and either avoid executing the body of the loop or exit the loop.

If EOF were defined as zero instead of -1, the loop could have been written while (c = getchar()) { body of the loop } Now the getchar function returns a character or zero, and the test itself is looking at the side effect, or residual value, of the assignment statement to decide whether to start or continue the loop body. The problem with using zero as end of file shows up if you are reading binary data like a JPEG: a zero character might make perfect sense, and we would not want to incorrectly end the loop because of a zero character in input data that does not end the file. So we get the double-parentheses syntax, the side-effect call to getchar, and the test of the return
value within the while test. I'm quite confident that this is far more detail than you wanted here in Chapter 2, but it is as good a time as any to understand how much thought goes into a programming language and how it is designed and documented. By the time we finish Chapter 3 and look at the break and continue statements, which are in languages like Python and Java, you will see that this 50-year-old structured programming debate is still unresolved in the minds of many software developers.

Back to the book. The unary negation operator !, the logical negation operator, converts a non-zero or true operand into 0, and a zero or false operand into 1. A common use of !, which we often call bang, is in constructions like if (!inword) rather than if (inword == 0). It is hard to generalize about which of these two forms is better. Constructions like !inword read quite nicely as "if not in word", but more complicated ones can be hard to understand.

Section 2.7: Type Conversions. When operands of different types appear in expressions, they are converted to a common type according to a small number of rules. In general, the only conversions that happen automatically are those that make sense, such as converting an integer to floating point in an expression like f + i. Expressions that don't make sense, like using a float as a subscript, are disallowed.

First, chars and ints may be freely intermixed in arithmetic expressions: every char in an expression is automatically converted to an int. This permits considerable flexibility in certain kinds of character transformations. One is exemplified by the function atoi, which converts a string of digits into its numeric equivalent: atoi(s) char s[]; { int i, n; n = 0; for (i = 0; s[i] >= '0' && s[i] <= '9'; ++i) n = 10 * n + s[i] - '0'; return(n); } As we discussed in Chapter 1, the expression s[i] - '0' gives the numeric value of the character stored in s[i], because the values of '0', '1', etc., form a contiguous increasing positive sequence.

Another example of char to int conversion is the function lower, which maps a single character to lower case for the ASCII character set only. If the character is not an upper case letter, lower returns it unchanged. Here's the function: lower(c) int c; { if (c >= 'A' && c <= 'Z') return(c + 'a' - 'A'); else return(c); } This works for ASCII because corresponding upper case and lower case letters are a fixed distance apart as numeric values, and each alphabet is contiguous: there is nothing but letters between A and Z. This latter observation is not true of the EBCDIC character set on IBM 360/370 architectures, so this code fails on such systems; it converts more than letters.

There is one subtle point about the conversion of characters to integers. The language does not specify whether variables of type char are signed or unsigned quantities. When a char is converted to an int, can it ever produce a negative integer? Unfortunately, this varies from machine to machine, reflecting differences in architecture. On some machines (the PDP-11, for instance), a char whose leftmost bit is 1 will be converted to a negative integer using sign extension. On others, a char is promoted to an int by adding zeros at the left end, and is thus always positive.

The definition of C
guarantees that any character in the machine's standard character set will never appear to be negative, so these characters may be used freely in expressions as positive quantities. But arbitrary bit patterns stored in character variables may appear to be negative on some machines, yet positive on others.

The most common occurrence of this situation is when the value -1 is used for EOF. Consider the code char c; c = getchar(); if (c == EOF) ... On a machine which does not do sign extension, c is always positive because it is a char, yet EOF is negative. As a result, this test always fails. To avoid this, we have been careful to use int instead of char for any variable which holds a value returned by the function getchar.

The real reason for using int instead of char is not related to any questions of possible sign extension. It is simply that getchar must return all possible characters, so that it can be used to read arbitrary input, and in addition a distinct EOF value. Thus its value cannot be represented as a char, but must instead be stored as an int.

As an aside, since the book was written before the getchar function was standardized, the text is somewhat vague in this section. Shortly after the book was published, getchar was put into the stdio.h library and declared to return an integer, so as to accommodate all possible characters and the integer -1 value to indicate the end of file. The above code would be better written with c declared as an integer: int c; c = getchar(); if (c == EOF) ... While the conversion from char to int may or may not have sign extension (and yes, it still depends on the implementation 50 years later), the conversion from int to char is predictable, with the top bits simply being discarded. If you are using the library function gets to read a file line by line, you don't need to worry about this conversion. Since gets
returns a pointer to a character array (i.e., a string), it indicates that it has reached end of file by returning the null pointer (i.e., there is no more data to give).

Back to the textbook. Another useful form of automatic type conversion is that relational expressions like i > j and logical expressions connected by && and || (and and or, respectively) are defined to have the value 1 if true and 0 if false. Thus the assignment isdigit = c >= '0' && c <= '9'; sets the variable isdigit to 1 if c is a digit, and to 0 if it is not. In the test part of an if, while, or for, true just means non-zero.

Implicit arithmetic conversions work much as expected. In general, if an operator like + or *, which takes two operands (i.e., a binary operator), has operands of different types, the lower type is promoted to the higher type before the operation proceeds, and the result is the higher type. More precisely, for each arithmetic operator, the following sequence of conversion rules is applied. char and short are converted to int, and float is converted to double. Then if either operand is double, the other is converted to double, and the result is double. Otherwise, if either operand is long, the other is converted to long, and the result is long. Otherwise, if either operand is unsigned, the other is converted to unsigned, and the result is unsigned. Otherwise, the operands must be int, and the result is int. Note that all float values in an expression are converted to double; all floating point arithmetic in C is done in double precision.

Conversions take place across assignments: the value of the right side is converted to the type of the left, which is the type of the result. A character is converted to an integer, either by sign extension or not, as described above. The reverse operation, int to char, is well behaved: excess high
order bits are simply discarded. Thus in int i; char c; i = c; c = i; the value of c is unchanged. This is true whether or not sign extension is involved.

If x is float and i is int, then x = i and i = x both cause conversions; float to int causes truncation of any fractional part. double is converted to float by rounding. Longer ints are converted to shorter ones or to chars by dropping the excess high-order bits.

Since a function argument is an expression, type conversions also take place when arguments are passed to functions: in particular, char and short become int, and float becomes double. This is why we have declared function arguments to be int and double even when the function is called with char and float.

Finally, explicit type conversions can be forced (we also say coerced) in any expression with a construct called a cast. In the construction (type-name) expression, the expression is converted to the named type by the conversion rules above. The precise meaning of a cast is in fact as if the expression were assigned to a variable of the specified type, which is then used in place of the whole construction. For example, the library routine sqrt expects a double argument, and will produce nonsense if inadvertently handed something else. So if n is an integer, sqrt((double) n) converts n to double before passing it to sqrt. Note that the cast produces the value of n in the proper type; the actual content of n is not altered. The cast operator has the same precedence as other unary operators, as summarized in the table at the end of this chapter.

Section 2.8: Increment and Decrement Operators. C provides two unusual operators for incrementing and decrementing variables. The increment operator ++ adds 1 to its operand; the decrement operator -- subtracts 1. We have frequently used ++ to increment variables, as in if (c == '\n') ++nl; The unusual aspect is that ++ and -- may be used either as prefix operators (before the variable, as in ++n) or postfix (after the variable: n++). In both cases, the effect is to increment n. But in terms of the residual value of the expression, ++n increments n before using its value, while n++ increments n after its value has been used. This means that in a context where the value is being used, not just the effect, ++n and n++ are different. If n is 5, then x = n++; sets x to 5, the old value, but x = ++n; sets x to 6, the new value. In both cases, n becomes 6. The increment and decrement operators can only be applied to variables; an expression like x = (i+j)++ is illegal.

In a context where no value is wanted, just the incrementing effect, as in if (c == '\n') nl++; choose prefix or postfix according to taste. But there are situations where one or the other is specifically called for. For example, consider the function squeeze(s, c), which removes all occurrences of the character c from the string s: squeeze(s, c) char s[]; int c; { int i, j; for (i = j = 0; s[i] != '\0'; i++) if (s[i] != c) s[j++] = s[i]; s[j] = '\0'; /* outside the for loop */ } Each time a non-c character (a character other than what's in the variable c) occurs, it is copied into the current j position, and only then is j incremented to be ready for the next character. This is exactly equivalent to if
(s[i] != c) { s[j] = s[i]; j++; }

Another example of a similar construction comes from the getline function which we wrote in Chapter 1, where we can replace if (c == '\n') { s[i] = c; ++i; } by the far more compact if (c == '\n') s[i++] = c;

As a third example, the function strcat(s, t) concatenates the string t to the end of the string s. strcat assumes that there is enough space in s to hold the combination. Here's the code: strcat(s, t) char s[], t[]; { int i, j; i = j = 0; while (s[i] != '\0') /* find the end of s */ i++; while ((s[i++] = t[j++]) != '\0') /* copy t into s */ ; } As each character is copied from t to s, the postfix ++ is applied to both i and j to make sure they are in position for the next pass through the loop.

Section 2.9: Bitwise Logical Operators. C provides a number of operators for bit manipulation; these may not be applied to float or double. & is bitwise AND, | is bitwise inclusive OR, ^ is bitwise exclusive OR, << is left shift, >> is right shift, and ~ is one's complement, a unary operator.

The bitwise AND operator & is often used to mask off some set of bits; for example, c = n & 0177; sets to zero all but the low-order 7 bits of n. The bitwise OR operator | is used to turn bits on: x = x | MASK; sets to one in x the bits that are set to one in MASK. You
should carefully distinguish the bitwise operators & and | from the logical connectives && and ||, which imply left-to-right evaluation of a truth value. For example, if x is 1 and y is 2, then x & y is zero while x && y is one. Think about that for a moment.

The shift operators << and >> perform left and right shifts, respectively, of their left operand by the number of bit positions given by the right operand. Thus x << 2 shifts x left by two positions, filling the vacated bits with zero; this is equivalent to multiplication by 4. Right shifting an unsigned quantity fills the vacated bits with zero. Right shifting a signed quantity will fill with sign bits (arithmetic shift) on some machines, such as the PDP-11, and with zero bits (logical shift) on others.

The unary operator ~ yields the one's complement of an integer; that is, it converts each 1-bit into a 0-bit and vice versa. This operator typically finds use in expressions like x & ~077, which masks the last six bits of x to zero. Note that x & ~077 is independent of word length, and is thus preferable to, for example, x & 0177700, which assumes that x is a 16-bit quantity. The portable form involves no extra cost, since ~077 is a constant expression and thus evaluated at compile time.

To illustrate the use of some of the bit operators, consider the function getbits(x, p, n), which returns the right-adjusted n-bit field of x that begins at position p. We assume that bit position 0 is at the right end and that n and p are sensible positive values. For example, getbits(x, 4, 3) returns the three bits in bit positions 4, 3 and 2, right adjusted. Here we go with the code:

    getbits(x, p, n)
    unsigned x, p, n;
    {
        return ((x >> (p+1-n)) & ~(~0 << n));
    }

x >> (p+1-n) moves the desired field to the right end of the word. Declaring the argument x to be unsigned ensures that when it is right shifted, vacated bits will be filled with zeros, not sign bits, regardless of the machine the program is run on. ~0 is all 1-bits; shifting it left n bit positions with ~0 << n creates a mask with zeros in the rightmost n bits and ones everywhere else; complementing that with ~ makes a mask with ones in the rightmost n bits.

Bitwise operators may seem unnecessary for modern computers, but if you look at the internal structure of a TCP/IP packet, the values are packed very tightly into the headers in order to save space. C made it possible to write portable TCP/IP implementations on a wide range of hardware architectures. Bitwise operators also play an important role in encryption, decryption, and checksum calculations. Modern languages like Java and Python support bitwise operators following the same patterns that were established in C, so that things like TCP/IP and encryption algorithms can also be implemented in these languages. By defining these operators, C kept software developers from needing to write non-portable assembly language to implement these low-level features in operating systems and libraries.

Section 2.10: Assignment Operators and Expressions

Expressions such as i = i + 2, in which the left-hand side is repeated on the right, can be written in the compressed form i += 2 using an assignment operator like +=. Most binary operators (operators like + which have a left and right operand) have a corresponding assignment operator op=,
where op is one of + (plus), - (minus), * (multiplication), / (division), % (modulo), << (left shift), >> (right shift), & (bitwise AND), ^ (exclusive OR), or | (bitwise OR). If e1 and e2 are expressions, then e1 op= e2 is equivalent to e1 = (e1) op (e2), except that e1 is computed only once. Note the parentheses around e2: x *= y + 1 is actually x = x * (y + 1) rather than x = x * y + 1.

As an example, the function bitcount counts the number of 1-bits in its integer argument. Here's the code:

    bitcount(n)
    unsigned n;
    {
        int b;

        for (b = 0; n != 0; n >>= 1)
            if (n & 01)
                b++;
        return (b);
    }

Quite apart from conciseness, assignment operators have the advantage that they correspond better to the way people think. We say "add 2 to i" or "increment i by 2", not "take i, add 2, then put the result back in i". Thus i += 2. In addition, for a complicated expression like

    yyval[yypv[p3+p4] + yypv[p1+p2]] += 2

the assignment operator makes the code easier to understand, since the reader doesn't have to check painstakingly that two long expressions are indeed the same, or wonder why they're not. An assignment operator may even help the compiler to produce more efficient code.

We have already used the fact that the assignment statement has a value and can occur in expressions; the most common example is

    while ((c = getchar()) != EOF)

followed by the rest of the loop. Assignments using other
assignment operators (+=, -=, etc.) can also occur in expressions, although it is a less frequent occurrence. The type of an assignment expression is the type of its left operand.

Section 2.11: Conditional Expressions

The statements

    if (a > b)
        z = a;
    else
        z = b;

of course compute in z the maximum of a and b. The conditional expression, written with the ternary operator ?:, provides an alternate way to write this and similar constructions. In the expression e1 ? e2 : e3, the expression e1 is evaluated first. If it is non-zero (true), then the expression e2 is evaluated, and that is the value of the conditional expression. Otherwise e3 is evaluated, and that is the value. Only one of e2 and e3 is evaluated. Thus to set z to the maximum of a and b, we say

    z = (a > b) ? a : b;    /* z = max(a, b) */

It should be noted that the conditional expression is indeed an expression, and can be used just as any other expression. If e2 and e3 are of different types, the type of the result is determined by the conversion rules described earlier in this chapter. For example, if f is a float and n is an int, then the expression (n > 0) ? f : n is of type double regardless of whether n is positive or not.

Parentheses are not necessary around the first expression of a conditional expression, since the precedence of ?: is very low, just above assignment. They are advisable anyway, however, since they make the condition part of the expression easier to see.

The conditional expression often leads to succinct code. For example, this loop prints n elements of an array, 10 per line, with each column separated by one blank, and with each line (including the last) terminated by exactly one newline. Here's the code:

    for (i = 0; i < n; i++)
        printf("%6d%c", a[i], (i%10==9 || i==n-1) ? '\n' : ' ');

A newline is printed after every tenth element and after the n-th. All other elements are followed by one blank. Although this might look tricky, it's instructive to try to write it without the conditional expression.

Section 2.12: Precedence and Order of Evaluation

The table below summarizes the rules for precedence and associativity of all operators, including those which we have not yet discussed. Operators on the same line have the same precedence; rows are in order of decreasing precedence, so, for example, *, / and % all have the same precedence, which is higher than that of + and -.

    Operators                          Associativity
    () [] -> .                         left to right
    ! ~ ++ -- - (type) * & sizeof      right to left
    * / %                              left to right
    + -                                left to right
    << >>                              left to right
    < <= > >=                          left to right
    == !=                              left to right
    &                                  left to right
    ^                                  left to right
    |                                  left to right
    &&                                 left to right
    ||                                 left to right
    ?:                                 right to left
    = += -= etc.                       right to left
    ,                                  left to right

The first row holds parentheses, square brackets, the arrow operator and the dot operator. In the second row, - is unary minus, (type) is the cast, * is indirection, and & is address of. The third row holds multiplication, division and modulo, followed by plus and minus, then the left and right shifts, then the relational operators, then the equality operators. The single &, ^ and | are bitwise AND, bitwise exclusive OR and bitwise OR; && and || are logical AND and logical OR; ?: is the ternary operator; then come the assignments; and last is the comma, which we'll cover in the next chapter.

The operators -> and . are used to access members of structures; they will be covered in Chapter 6, along with sizeof. In Chapter 5 we will discuss * (indirection) and & (address of).

Note that the precedence of the bitwise logical operators &, ^ and | falls below == and !=. This implies that bit-testing expressions like

    if ((x & MASK) == 0) ...

must be fully parenthesized to get proper results.

As mentioned before, expressions involving one or more of the associative and commutative operators (*, +, &, ^, |) can be rearranged even when parenthesized. In most cases this makes no difference whatsoever; in situations where it might, explicit temporary variables can be used to force a particular order of evaluation.

C, like most languages, does not specify in what order the operands of an operator are evaluated. For example, in a statement like

    x = f() + g();

f may be evaluated before g or vice versa; thus if either f or g alters an external variable that the other depends on, x can depend on the order of evaluation. Again, intermediate results can be stored in temporary variables to ensure a particular sequence.

Similarly, the order in which function arguments are evaluated is not specified, so the statement

    printf("%d %d\n", ++n, power(2, n));

is wrong. It can (and often does) produce different results on different machines, depending on whether or not n is incremented before power is called. The solution, of course, is to write

    ++n;
    printf("%d %d\n", n, power(2, n));

Function calls, nested assignment statements, and increment and decrement operators cause side effects: some variable is changed as a byproduct of the evaluation of an expression. In any expression involving side effects, there can be subtle dependencies on the order in which variables taking part in the expression are stored. One unhappy situation is typified by the statement

    a[i] = i++;

The question is whether the subscript is the old value of i or the new value. The compiler can do this in different ways, and generate different answers depending on its interpretation. When side effects (assignment to actual variables) take place is left to the discretion of the compiler, since the best order strongly depends on machine architecture.

The moral of this discussion is that writing code which depends on order of evaluation is a bad programming practice in any language. Naturally, it is necessary to know what things to avoid, but if you don't know how they are done on various machines, that innocence may help to protect you. The C verifier lint will detect most dependencies on order of evaluation.

I would add that the real moral of the story is to use side-effect operators very carefully. They are generally only used in idiomatic situations, and then written using simple code. The authors are happy to tell you everything that you can do in C in great detail, and they are also suggesting that just because you can do something does not mean that you should do it. Remember that a key aspect of writing programs is
to communicate with future human readers of your code, including you reading your own code in the future. With modern-day compilers and optimizers, you gain little performance by writing dense or obtuse code. Write the code, describe what you want done, and let the compiler find the best way to do it. One of the reasons that a common senior project in many computer science degrees was to write a compiler is to make sure that all computer scientists understand that they can trust the compiler to generate great code.

This work is based on the 1978 C programming book written by Brian W. Kernighan and Dennis M. Ritchie. Their book is copyright, all rights reserved, by AT&T, but is used in this work under fair use because of the book's historical and scholarly significance, its lack of availability, and the lack of an accessible version of the book. The book is augmented in places to help understand its right place in a historical context amidst the major changes of the 1970s and 1980s, as computer science evolved from a hardware-first, vendor-centered approach to a software-centered approach where portable operating systems and applications written in C could run on any hardware. This is not the ideal book to learn C programming, because the 1978 edition does not reflect the modern C language. Using an obsolete book gives us an opportunity to take students back in time and understand how the C language was evolving as it laid the groundwork for a future with portable applications.

Hello, welcome to Chapter 3. I'm Charles Severance and I'm your instructor. Here in Chapter 3, again, I want you to read the book; I'm just going to call your attention to a few of the unique things that might help you make more sense of the book. So we're going to talk about semicolon use, how it started in C and is used across multiple languages; how else if is a little different across languages; the switch statement, with a bit of motivation for why the switch statement is even included; the comma, comma, I
don't know, operator or separator; and then this tendency towards excessive succinctness or brevity that is pretty common in C programming. There's such a value placed on making things really, really short, and that makes it kind of different.

So I love semicolon-based languages, and we have a whole bunch of semicolon-based languages that we've learned and are going to learn. Certainly there's the 1978 C programming language, with its spacing that is not syntactically important. The key to C is that the semicolon is a terminator, and every statement must be terminated by a semicolon. So we say x = x + 1; and x = x / 2; and that's the idea. The printf ends in a semicolon too.

You may or may not know that in Python you're allowed to have semicolons. They're pretty much optional: on the print(x); that semicolon does not need to be there. But it is a separator, not a terminator, so you can think of print(x); as one statement followed by a separator followed by an empty statement, which does nothing. The interesting thing is that you can put more than one statement on one line by using a separator. So there I say x = x + 1; x = x / 2; and I don't have to indent that. It's two statements in the same block of code, and that's legal. Most of the time people choose not to use the semicolon. The other thing about that is that shell scripting, which is sort of Linux automation, treats it as a separator, so it looks a bit like shell scripting to have multiple statements on the same line separated by semicolons.

Java tends to follow the C pattern, where it's a terminator. I tend to like it as a terminator; I don't like the idea that you can leave it off the way JavaScript does. And so you see it on the two assignment statements and the System.out.println in Java. PHP follows C very
closely, and so it is a terminator there as well, and I think that's natural. It's a separator in JavaScript, as in the next example, where it's separating the statements. In this case the x = x / 2 does not need to be terminated, because the closing curly brace is going to terminate that. Even on the console.log(); that semicolon is optional. When I write JavaScript I tend to put semicolons everywhere; when I write Python I put semicolons nowhere; and in PHP, C, and Java I tend to put semicolons everywhere, even though sometimes there are things you can leave out.

Another thing that is very, very subtle is the notion of else if. C predates Python, and C in this book shows else if as two separate words: there's an else keyword and an if keyword. So you say else, space, if, and then you have the expression and another statement; else, space, if, expression and another statement; and then else for the case where none of those expressions are true. If you look at Python, it looks almost identical: it says if expression, then elif expression, elif expression, and else. The key is that elif is a separate language construct in Python, and I think it's actually really beautiful and elegant. The else if in C, on the other hand, I can think of as being indented incorrectly. You can look at it this way: the very first if has an if and an else, and everything from the second if on down is really part of that else. If you look to the right, you see the curly braces with the indentation made explicit. It's exactly the same thing, but you can see that if you were going to truly correctly indent an if, else if, else if, else in C, you would indent it the way it's shown on the right side. It's neither here nor there; it is very rare that you would see any C programmer do all the indentation the technically right way, but
I just want to call your attention to the fact that it's different from elif. elif is its own language element that does not create deeper and deeper nesting. If you were to nest it, you see on the right-hand side that I've got three closing curly braces: curly brace, curly brace, curly brace. So the elif, I think, is a really elegant addition that Python has added.

The switch statement. I think the reason that the authors put the switch statement into C is that there was a time when we would write code in assembly language using what we call a jump table, where we take a number, maybe mask it so it's only from 0 through 16, and then look up a series of addresses and jump through a jump table. The computed goto was the way in Fortran of expressing a jump table, but in Fortran it was just a mess. You've got these labels in columns 1 through 6, the continue statement doesn't work like the continue in C and C-like languages, and you had to have these gotos to get out of the switch statement. If you think about it from an assembly language perspective, it's not that hard to build the computed goto with a little tiny jump table. To some degree, it's really, really rare that you have to use a jump table in modern C. We would just do a few repeated else ifs and it's just fine. Back then, a few extra statements might have mattered if you were going to do it a million times a minute or something.

The switch statement is much prettier. You do have to put the break statements in there, you have the stacked cases, and then there's a default case. So if I compare the C switch statement with the Fortran computed goto, I want to say that the C switch statement was a lot more elegant, a lot easier to use, and a lot easier to understand. And because assembly language programmers of
the time did think in terms of jump tables, if a high-level language didn't have a way to express a jump table, then we would kind of think of it as missing. But frankly, and Java has a switch statement too, I probably haven't written a switch statement in over 20 years, and maybe more. So I like the fact that it improved on Fortran, but that doesn't mean that you should use it.

The comma operator, or comma separator: I like to think of it as a light version of the semicolon. Most people almost never use it, and the only place we use it is where it is idiomatic, in a for statement. We're already using the semicolon to separate the start before the loop, the loop test, and the loop increment per iteration. So if we want to do two statements, we write i = 0, j = strlen(s) - 1 with a comma in between, to say do these two things before the loop starts, and then at the end you say i++, j-- which says do these two things at the end of each loop. So I only see it in idiomatic situations. Just think of it as: we couldn't use a semicolon here, so it functions exactly like a semicolon, although the syntax already has a semicolon in it. I think it's actually a pretty clever way to say I want to put two statements in here. Maybe you could put curly braces in there or something, but I thought the comma was a pretty cool thing.

Another problem is that there was just this notion that, as assembly language programmers, we could be smart and leave some value in a register and then check the register a couple of different ways, and that would lead to really succinct, fast, hand-tuned code, where you might have to look at it to figure out what it's doing, but then you realize, well, I did it in six statements rather than 12 statements. The 12 statements might have
made more sense, but the six statements were really fast. In the early days, in the early 70s, they were changing their compilers so fast and changing their hardware so fast that they really didn't build super great optimizing compilers. So they would look at the code that came out of the compiler and say, "I could do better than that." There was a lot of comparing of what the C compiler generated. Over time they found tricks that told the compiler to take this c = getchar(), leave the c in a register, and compare it to a space, then compare it again to a newline, and compare it again to a tab. We would think, "I can see how that would run in assembly language," and hope that the compiler would generate assembly language that would make me happy.

Another pattern you see in this is the number four thing, where all the work has been done in the loop test. It's a while loop, and that whole big expression is just a test to know when it's done, but it's actually reading the data, comparing it three times, and storing it in a variable, and when that's all done there's nothing to do in the loop. That's why you say close paren, semicolon. You'll see a lot of those things, especially when you're doing string stuff, where you're zooming through an array and you did it all in the for loop, so you don't really have anything to do in the body of the for loop. Again, we're thinking in the early days of how this is going to translate into assembly language, so you're trying to make that loop really, really small. It's amazing how often they looked at the resulting assembly language, in a non-optimizing compiler situation, and then wondered if the compiler could have done better.

So that gets us going in this chapter. We talked about the semicolon, we talked about the switch
statement, the subtle if syntax difference between Python and C, and the comma. And just get used to the notion that there is obtuse code. Please don't write obtuse code; these days the optimizers are so great. But don't be too upset as you read the textbook and see obtuse code.

Welcome to C Programming for Everybody. My name is Charles Severance and this is my reading of the 1978 C programming book written by Brian Kernighan and Dennis Ritchie. At times I add my own interpretation of the material from a historical perspective.

Chapter 3: Control Flow

The control flow statements of a language specify the order in which computations are done. We have already met the most common control flow constructions of C in earlier examples; here we will complete the set, and be more precise about the ones discussed above.

3.1 Statements and Blocks

An expression such as x = 0 or i++ or printf(...) becomes a statement when it is followed by a semicolon, as in

    x = 0;
    i++;
    printf(...);

In C, the semicolon is a statement terminator, rather than a separator as it is in Algol-like languages.

The braces { and } are used to group declarations and statements together into a compound statement or block, so that they are syntactically equivalent to a single statement. The braces that surround the statements of a function are one obvious example; braces around multiple statements after an if, else, while or for are another. (Variables can actually be declared inside any block; we will talk about this in Chapter 4.) There is never a semicolon after the right brace that ends a block.

"Ah C, how do I love thee? Let me count the ways." (A quote by Dr. Chuck, with homage to Elizabeth Barrett Browning.) The humble semicolon is why spacing and line ends do not matter in C and C-like languages. It means that we as programmers can focus all of
our white space and line ends on communicating our intent to humans. This freedom is not an excuse to write obtuse or dense code (for example, see the Obfuscated Perl Contest) but instead freedom to describe what we mean, or to use spacing to help us understand or maintain our code.

We can take a quick look at how a few other C-like languages that came after C treat the semicolon. Java is just like C in that the semicolon terminates statements. Python treats the semicolon as a separator, like Algol, allowing more than one statement on a single line. But since Python treats the end of a line as a statement separator, you generally never use a semicolon in Python. For people like me who automatically add a semicolon when typing code too fast, at least Python ignores the few semicolons I mistakenly add to my code out of habit. JavaScript treats the semicolon as a separator, but since JavaScript ignores the end of the line (it is treated as white space), semicolons are required when a block of code consists of more than one statement. When I write JavaScript, I meticulously include semicolons at the end of all statements, because any good programmer can write C in any language.

Back to the text.

3.2 If-Else

The if-else statement is used to make decisions. Formally, the syntax is

    if (expression)
        statement-1
    else
        statement-2

where the else part is optional. The expression in parentheses is evaluated; if it is true (that is, if the expression has a non-zero value), statement-1 is done. If it is false (the expression is zero) and if there is an else part, statement-2 is executed instead.

Since an if simply tests the numeric value of an expression, certain coding shortcuts are possible. The most obvious is writing if (expression) instead of if (expression != 0). Sometimes this is natural and clear; at other times it is cryptic.

Because the else part of an if-else is optional, there is an ambiguity when an else is omitted from a
nested if sequence. This is resolved in the usual way: the else is associated with the closest previous else-less if. For example, in

    if (n > 0)
        if (a > b)
            z = a;
        else
            z = b;

the else with the z = b goes with the inner if, as we have shown by the indentation. If that's not what you want, braces must be used to force the proper association:

    if (n > 0) {
        if (a > b)
            z = a;
    }
    else
        z = b;

This ambiguity is especially pernicious in situations like

    if (n > 0)
        for (i = 0; i < n; i++)
            if (s[i] > 0) {
                printf("...");
                return(i);
            }
    else    /* this is where it's wrong */
        printf("error - n is zero\n");

The indentation shows unequivocally what you want, but the compiler does not get the message, and associates the else with the inner if. This bug can be very hard to find.

By the way, notice that there is a semicolon after z = a in

    if (a > b)
        z = a;
    else
        z = b;

That is because grammatically a statement follows the if, and an expression statement like z = a is always terminated by a semicolon.

Section 3.3: Else-If

The construction

    if (expression)
        statement
    else if (expression)
        statement
    else if (expression)
        statement
    else
        statement

occurs so often that it is worth a brief separate discussion. This sequence of ifs is the most general way of writing a multi-way decision. The expressions are evaluated in order; if any expression is true, the statement
associated with it is executed, and this terminates the whole chain. The code for each statement is either a single statement or a group of statements in braces.

The last else handles the "none of the above" or default case, where none of the other conditions was satisfied. Sometimes there is no explicit action for the default; in that case the trailing else statement can be omitted, or it may be used for error checking to catch an "impossible" condition.

To illustrate a three-way decision, here is a binary search function that decides if a particular value x occurs in the sorted array v. The elements of v must be in increasing order. The function returns the position (a number between 0 and n-1) if x occurs in v, and -1 if not. This sample code is the first example on page 54 in the book.

    binary(x, v, n)
    int x, v[], n;
    {
        int low, high, mid;

        low = 0;
        high = n - 1;
        while (low <= high) {
            mid = (low + high) / 2;
            if (x < v[mid])
                high = mid - 1;
            else if (x > v[mid])
                low = mid + 1;
            else
                return(mid);
        }
        return(-1);
    }

The fundamental decision in this code is whether x is less than, greater than, or equal to the middle element v[mid] at each step; this is a natural for else-if.

I would note that in the above examples, the else and the if in C are two language constructs that are just being used idiomatically to construct a multi-way branch or else-if pattern, with indentation that captures the idiom. If we are pedantic
about the indentation of the above sequence, we would be separating the else and the if and indenting each succeeding block further, as follows, with brackets added for clarity: if open parenthesis expression close parenthesis open curly brace, statement, close curly brace, else open curly brace, if open parenthesis expression close parenthesis open curly brace, statement, close curly brace, else open curly brace, and now we're quite indented at this point, if open parenthesis expression close parenthesis open curly brace, statement, close curly brace, else open curly brace, statement, close curly brace, close curly brace, close curly brace. Java and JavaScript keep the else and if as separate language elements and document their idiomatic usage and indentation just like C. But in Python, elif is a single keyword and a new language construct that achieves the same idiom, as shown below: if open parenthesis expression close parenthesis colon, block, elif open parenthesis expression close parenthesis colon, block, elif open parenthesis expression close parenthesis colon, block, else colon, block. The C, Java, JavaScript, and Python idioms thankfully look the same when the idiomatic indentation is used. Even Fortran 77 supports the else if construct to implement multi-way logic.

Section 3.4, switch. The switch statement is a special multi-way decision maker that tests whether an expression matches one of a number of constant values and branches accordingly. In chapter 1 we wrote a program to count the occurrences of each digit, white space, and all other characters, using a sequence of if, else if, else. Here's the same program with a switch. This is the first example program on page 55: pound include less than stdio.h greater than, main open parenthesis close parenthesis open curly brace, int c comma i comma nwhite comma nother comma ndigit open square bracket 10 close square bracket semicolon, a ten-element array, nwhite equals nother equals zero semicolon, for open parenthesis i equals zero semicolon i less than 10 semicolon i plus plus close parenthesis, ndigit sub i equals zero semicolon, while open parenthesis open parenthesis c
equals getchar open parenthesis close parenthesis close parenthesis not equal EOF close parenthesis, switch open parenthesis c close parenthesis open curly brace, case quote zero quote colon, case quote one quote colon, case quote two quote colon, case quote three quote colon, case quote four quote colon, case quote five quote colon, case quote six quote colon, case quote seven quote colon, case quote eight quote colon, case quote nine quote colon, ndigit open bracket c minus quote zero quote close bracket plus plus semicolon, break semicolon. Now, that bit of code right there was to take all of zero through nine and guide it to the line of code that incremented the particular element of the ndigit array by one. Continuing after the break semicolon: case quote space quote colon, case quote backslash n quote colon, case quote backslash t quote colon, nwhite plus plus semicolon, break semicolon, default colon, nother plus plus semicolon, break semicolon, close curly brace. printf open parenthesis double quote digits equals double quote close parenthesis semicolon, for open parenthesis i equals zero semicolon i less than 10 semicolon i plus plus close parenthesis, printf open parenthesis double quote space percent d double quote comma ndigit sub i close parenthesis semicolon, printf open parenthesis double quote backslash n white space equals percent d comma other equals percent d backslash n double quote comma nwhite comma nother close parenthesis semicolon, close curly brace.

The switch statement evaluates the integer expression in parentheses, in this case the character c, and compares its value simultaneously to all the cases. Each case must be labeled by an integer or character constant or constant expression. If a case matches the expression value, execution starts at that case. The case labeled default is executed if none of the other cases is satisfied. Default is optional; if it isn't there and none of the cases matches, no action at all takes place. Cases and default can occur in any order. Cases must all be different. The break statement causes an immediate exit from the switch. Because the cases serve as labels, after the code for one case is done, execution falls
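As an aside on the counting program read out above: the switch at its heart might be written as in the sketch below. To keep it easy to try, this version works on a string rather than on standard input; the helper name count_chars and its parameters are assumptions made for this illustration and are not in the book.

```c
/* Hypothetical helper: the same switch as the book's counting program,
   applied to a string instead of to characters read with getchar(). */
void count_chars(const char *text, int ndigit[10], int *nwhite, int *nother)
{
    int i;

    *nwhite = *nother = 0;
    for (i = 0; i < 10; i++)
        ndigit[i] = 0;

    for (; *text != '\0'; text++) {
        switch (*text) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            ndigit[*text - '0']++;  /* all ten labels share this one action */
            break;
        case ' ':
        case '\n':
        case '\t':
            (*nwhite)++;
            break;
        default:
            (*nother)++;
            break;                  /* logically unnecessary, kept as good form */
        }
    }
}
```

The ten digit labels are the benign kind of fall-through: several labels guiding execution to one action.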
through to the next unless you take explicit action to escape. Break and return are the most common ways to leave a switch. A break statement can be used to force an immediate exit from while, for, and do loops as well, as will be discussed later in this chapter. Falling through the cases is a mixed blessing. On the positive side, it allows multiple cases for a single action, as with blank, tab, or newline in this example. But it also implies that normally each case must end with a break to prevent falling through to the next. Falling through from one case to another is not robust, being prone to disintegration when the program is modified. With the exception of multiple labels for a single computation, fall-throughs should be used sparingly. As a matter of good form, put a break after the last case, in this case default, although it's logically unnecessary. Some day when another case gets added at the end, this bit of defensive programming will save you.

The switch statement. What is there to say? I think that the switch statement was added to C to compete with the earlier Fortran computed goto statement, or just to keep low-level assembly language programmers from switching into assembly language to implement the concept of a branch table. The authors spend most of the previous section apologizing for the switch statement, so you should perhaps take this as a hint and never use it. There are very few situations where a branch table outperforms a series of if-then-else checks, and those are likely deep in a library or operating system code. Programmers should only use switch if they understand what a branch table is and why a branch table is more efficient for the particular bit of a program they're writing. Otherwise, just use else if and do the readers of your code a favor.

Section 3.5, loops, while and for. We have already encountered the while and for loops. In while open parenthesis expression close parenthesis statement, the expression is evaluated. If it is nonzero, the statement is executed and
the expression is re-evaluated. This cycle continues until the expression becomes zero, at which point execution resumes after the statement. The for statement, for open parenthesis expression one semicolon expression two semicolon expression three close parenthesis statement, is equivalent to expression one semicolon, while open parenthesis expression two close parenthesis open curly brace, statement, expression three semicolon, close curly brace. Grammatically, the three components of the for are expressions. Most commonly, expression one and expression three are assignments or function calls, and expression two is a relational expression. Any of the three parts can be omitted, although the semicolons must remain. If expression one or expression three is left out, it is simply dropped from the expansion. If the test, expression two, is not present, it is taken as permanently true, so the code for open parenthesis semicolon semicolon close parenthesis open curly brace dot dot dot close curly brace is an infinite loop, presumably to be broken by some other means, such as a break or a return.

Whether to use while or for is largely a matter of taste. For example, in the code while open parenthesis open parenthesis c equals getchar open parenthesis close parenthesis close parenthesis equal equal quote space quote, or c equal equal quote backslash n quote, or c equal equal quote backslash t quote, close parenthesis semicolon, skipping white space characters, there is no initialization or re-initialization, so a while seems more natural. The for is clearly superior when there is simple initialization and re-initialization, since it keeps the loop control statements close together and visible at the top of the loop. This is most obvious in for open parenthesis i equals zero semicolon i less than n semicolon i plus plus close parenthesis, which is the C idiom for processing the first n elements of an array, the analog of a Fortran or PL/I DO loop. The analogy is not perfect, however, since the limits of a for loop can be altered within the loop and
the controlling variable i retains its value when the loop terminates for any reason. Because the components of the for are arbitrary expressions, for loops are not restricted to arithmetic progressions. Nonetheless, it is bad style to force unrelated computations into a for; it is best reserved for loop control operations.

As a larger example, here is another version of the atoi function for converting a string to its numeric equivalent. This one is more general: it copes with optional leading white space and an optional plus or minus sign. Chapter four shows atof, which does the same conversion for floating point numbers. The basic structure of the program reflects the form of the input: skip white space, if any; get the sign, if any; get the integer part and convert it. Each step does its part and leaves a clean slate for the next. The whole process terminates on the first character that would not be part of a number. This is the first example on page 58 of the textbook: atoi open parenthesis s close parenthesis, which is going to convert s to an integer, char s open square bracket close square bracket semicolon, open curly brace, int i comma n comma sign semicolon, for open parenthesis i equals zero semicolon, s sub i equal equal quote space quote, or s sub i equal equal quote backslash n quote, or s sub i equal equal quote backslash t quote, semicolon i plus plus close parenthesis semicolon, skip white space; this is a for loop with an empty loop body. sign equals one semicolon, if open parenthesis s sub i equal equal quote plus quote, or s sub i equal equal quote minus quote, close parenthesis, sign equals, and here we use a ternary operator, open parenthesis s sub i plus plus equal equal quote plus quote close parenthesis question mark one colon minus one semicolon, for open parenthesis n equals zero semicolon, s sub i greater than or equal to quote zero quote and s sub i less than or equal to quote nine quote, semicolon i plus plus close parenthesis, n equals 10 times n plus s sub i minus quote zero quote semicolon, return open parenthesis sign times n close parenthesis semicolon, close curly brace to end the function. The advantages of
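As an aside on the atoi function read out above: written down, its three steps might look like the sketch below. The name atoi_sketch is an assumption chosen so the code does not clash with the standard library's atoi, and the modern prototype replaces the book's old-style declaration.

```c
/* A sketch of the book's atoi: skip white space, get the sign,
   then convert the digits. Stops at the first non-digit character. */
int atoi_sketch(const char s[])
{
    int i, n, sign;

    for (i = 0; s[i] == ' ' || s[i] == '\n' || s[i] == '\t'; i++)
        ;                               /* skip white space (empty loop body) */
    sign = 1;
    if (s[i] == '+' || s[i] == '-')     /* optional sign */
        sign = (s[i++] == '+') ? 1 : -1;
    for (n = 0; s[i] >= '0' && s[i] <= '9'; i++)
        n = 10 * n + s[i] - '0';        /* accumulate the integer part */
    return sign * n;
}
```

Because the index i is shared by all three steps, each one leaves it pointing at the first character it did not consume.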
keeping loop controls centralized are even more obvious when there are several nested loops. The following function is a Shell sort for sorting an array of integers. The basic idea of the Shell sort is that in early stages, far-apart elements are compared, rather than adjacent ones as in simple interchange sorts. This tends to eliminate large amounts of disorder quickly, so later stages have less work to do. The interval between the compared elements is gradually decreased to one, at which point the sort effectively becomes an adjacent interchange method. This sample code is the second example on page 58 of the textbook: shell open parenthesis v comma n close parenthesis, int v open square bracket close square bracket comma n semicolon, open curly brace, int gap comma i comma j comma temp semicolon, for open parenthesis gap equals n slash two semicolon gap greater than zero semicolon gap slash equals two close parenthesis, for open parenthesis i equals gap semicolon i less than n semicolon i plus plus close parenthesis, and now we're three deep, nested in the for loops, for open parenthesis j equals i minus gap semicolon, j greater than or equal to zero and v sub j greater than v sub j plus gap, semicolon j minus equals gap close parenthesis open curly brace, temp equals v sub j semicolon, v sub j equals v sub j plus gap semicolon, v sub j plus gap equals temp semicolon, just a swap of v sub j and v sub j plus gap, close curly brace, and that closes the innermost for loop, and then the next close curly brace closes the function.

There are three nested loops. The outermost loop controls the gap between the compared elements, shrinking it from n over two by a factor of two each pass until it becomes zero. The middle loop compares each pair of elements that is separated by gap. The innermost loop reverses any elements that are out of order. Since gap is eventually reduced to one, all elements are eventually ordered correctly. Note that the generality of the for makes the outer loop fit the same form as the others, even though it is not an arithmetic
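As an aside on the Shell sort read out above: the three nested for loops might be written as in this sketch. The name shell_sort and the modern prototype are assumptions for this illustration; the book calls the function shell.

```c
/* A sketch of the book's Shell sort: sorts v[0..n-1] into increasing order. */
void shell_sort(int v[], int n)
{
    int gap, i, j, temp;

    for (gap = n / 2; gap > 0; gap /= 2)        /* shrink the gap each pass */
        for (i = gap; i < n; i++)               /* visit each element past the gap */
            for (j = i - gap; j >= 0 && v[j] > v[j + gap]; j -= gap) {
                temp = v[j];                    /* swap the out-of-order pair */
                v[j] = v[j + gap];
                v[j + gap] = temp;
            }
}
```

All of the loop control lives in the three for headers, so the body is nothing but the swap.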
progression.

One final C operator is the comma, which most often finds use in the for statement. A pair of expressions separated by a comma is evaluated left to right, and the type and value of the result are the type and value of the right operand. Thus in a for statement it is possible to place multiple expressions in the various parts, for example to process two parallel indices. This is illustrated in the function reverse, which reverses the string s in place. This code is the first example on page 59 of the textbook: pound include less than string.h greater than, reverse open parenthesis s close parenthesis, char s open square bracket close square bracket semicolon, open curly brace, int c comma i comma j semicolon, for open parenthesis i equals zero comma j equals strlen of s minus one semicolon, i less than j semicolon, i plus plus comma j minus minus close parenthesis open curly brace, c equals s sub i semicolon, s sub i equals s sub j semicolon, s sub j equals c semicolon, close curly brace, close curly brace to end the function. The commas that separate function arguments, variables in declarations, et cetera, are not comma operators and do not guarantee left to right evaluation.

Section 3.6, loops, do while. The while and for loops share the desirable attribute of testing the termination condition at the top rather than at the bottom, as we discussed in chapter one. The third loop in C, the do while, tests at the bottom, after making each pass through the loop body; the body is always executed at least once. The syntax is as follows: do statement while open parenthesis expression close parenthesis semicolon. The statement is executed, then the expression is evaluated. If it is true, the statement is evaluated again, and so on. If the expression becomes false, the loop terminates. As might be expected, do while is much less used than while and for, accounting for perhaps 5% of all loops. Nonetheless, it is from time to time valuable, as in the following function itoa, which converts a number to a character string, the inverse of atoi. The
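As an aside on the reverse function read out above: the comma operator lets one for statement drive two indices at once, as in this sketch. The cast on strlen and the modern prototype are additions made for this illustration.

```c
#include <string.h>

/* A sketch of the book's reverse: reverses the string s in place,
   with i walking up from the front and j walking down from the back. */
void reverse(char s[])
{
    int c, i, j;

    for (i = 0, j = (int) strlen(s) - 1; i < j; i++, j--) {
        c = s[i];       /* swap the characters at the two indices */
        s[i] = s[j];
        s[j] = c;
    }
}
```

For an empty string, j starts at minus one, the test i less than j fails immediately, and nothing is touched.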
job is slightly more complicated than might be thought at first, because the easy methods of generating the digits generate them in the wrong order. We've chosen to generate the string backwards and then reverse it. This is the first sample code on page 60 of the textbook: itoa open parenthesis n comma s close parenthesis, char s open square bracket close square bracket semicolon, int n semicolon, open curly brace, int i comma sign semicolon, if open parenthesis open parenthesis sign equals n close parenthesis less than zero close parenthesis, n equals minus n semicolon. That's a bit of a complex if, I would say. It has an assignment that copies n into sign and then evaluates whether or not it is less than zero, because it is a side-effect assignment statement with a residual value. The net result is that sign contains n, and then, if it was negative, n is made positive. So continuing: i equals zero semicolon, do open curly brace, s sub i plus plus equals n modulo 10 plus quote zero quote semicolon, close curly brace, while open parenthesis open parenthesis n slash equals 10 close parenthesis greater than zero close parenthesis semicolon, if open parenthesis sign less than zero close parenthesis, s sub i plus plus equals quote minus sign quote semicolon, s sub i equals quote backslash zero quote semicolon, in essence to terminate the string, and then we call the function reverse, reverse open parenthesis s close parenthesis semicolon, close curly brace.

The do while is necessary, or at least convenient, since at least one character must be installed in the array s regardless of the value of n. We have also used braces around the single statement that makes up the body of the do while, even though they are unnecessary, so the hasty reader will not mistake the while part for the beginning of a while loop.

I would note that it's important for any language to provide top-tested loops and bottom-tested loops, but don't feel bad if you write code for years and never feel like a bottom
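As an aside on the itoa function read out above: written down, with the reverse helper it depends on, it might look like this sketch. The name itoa_sketch is an assumption for this illustration. One known limitation of this approach is that on two's complement machines the most negative int cannot be negated, so that single value is not handled.

```c
#include <string.h>

/* Helper from the previous section: reverse s in place. */
static void reverse(char s[])
{
    int c, i, j;

    for (i = 0, j = (int) strlen(s) - 1; i < j; i++, j--) {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}

/* A sketch of the book's itoa: convert n to characters in s.
   The caller must supply an array large enough for the digits,
   an optional minus sign, and the terminating '\0'. */
void itoa_sketch(int n, char s[])
{
    int i, sign;

    if ((sign = n) < 0)         /* record the sign */
        n = -n;                 /* make n positive */
    i = 0;
    do {                        /* generate digits in reverse order */
        s[i++] = n % 10 + '0';
    } while ((n /= 10) > 0);
    if (sign < 0)
        s[i++] = '-';
    s[i] = '\0';
    reverse(s);
}
```

The do while guarantees that zero still produces the one-character string "0".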
tested loop is the right way to solve a problem you're facing. It is usually rare to write a loop that you insist will run once regardless of its input data.

Section 3.7, break. It is sometimes convenient to be able to control loop exits other than by testing at the top or the bottom. The break statement provides an early exit from a for, while, or do, just as it does from the switch statement. A break statement causes the innermost enclosing loop or switch to be exited immediately. The following program removes trailing blanks and tabs from the end of each line of input, using a break to exit from a loop when the rightmost character that is non-blank and non-tab is found. This example code is the first example on page 61 of the textbook: pound include less than stdio.h greater than, pound define MAXLINE 1000, main open parenthesis close parenthesis open curly brace, int n semicolon, char line open bracket MAXLINE close bracket semicolon, while open parenthesis open parenthesis n equals getline open parenthesis line comma MAXLINE close parenthesis close parenthesis greater than zero close parenthesis open curly brace, while open parenthesis minus minus n greater than or equal to zero close parenthesis, if open parenthesis line sub n not equal quote space quote, and line sub n not equal quote backslash t quote, and line sub n not equal quote backslash n quote, close parenthesis, break semicolon, line sub n plus one equals quote backslash zero quote semicolon, printf open parenthesis double quote percent s backslash n double quote comma line close parenthesis semicolon, close curly brace to end the while, and then close curly brace to end the main.

getline returns the length of the line. The inner while loop starts at the last character of line (recall that minus minus n decrements n before using the value) and scans backwards looking for the first character that is not a blank, tab, or newline. The loop is broken when one is found, or when n becomes negative, that is, when the entire line has been scanned. You should verify that this is the correct behavior even when the line contains only
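As an aside on the trailing-blank program read out above: its inner loop, the part that uses break, might be isolated as in the sketch below. The helper name trim_trailing and the use of strlen in place of the book's getline are assumptions made so the loop can be tried on a single string.

```c
#include <string.h>

/* Hypothetical helper: remove trailing blanks, tabs, and newlines from
   line in place, and return the new length. */
int trim_trailing(char line[])
{
    int n = (int) strlen(line);

    while (--n >= 0)            /* scan backwards from the last character */
        if (line[n] != ' ' && line[n] != '\t' && line[n] != '\n')
            break;              /* found the rightmost character to keep */
    line[n + 1] = '\0';         /* n is -1 if the whole line was white space */
    return n + 1;
}
```

When the line holds only white space, the loop runs off the front with n equal to minus one, so the terminator lands at position zero and the result is the empty string.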
white space characters. An alternative to break is to put the testing in the loop itself: while open parenthesis open parenthesis n equals getline open parenthesis line comma MAXLINE close parenthesis close parenthesis greater than zero close parenthesis open curly brace, while open parenthesis minus minus n greater than or equal to zero, and open parenthesis line sub n equal equal quote space quote, or line sub n equal equal quote backslash t quote, or line sub n equal equal quote backslash n quote, close parenthesis close parenthesis semicolon, dot dot dot, close curly brace. This is inferior to the previous version because the test is harder to understand. Tests which require a mixture of and, or, not, and parentheses should generally be avoided.

Section 3.8, continue. The continue statement is related to break but less often used. It causes the next iteration of the enclosing loop, for, while, or do, to begin. In the while and do, this means that the test part is executed immediately; in the for, control passes to the re-initialization step. By the way, continue applies only to loops, not to switch. A continue inside a switch statement inside of a loop causes the next loop iteration. As an example, this fragment processes only positive elements in the array a; negative values are skipped: for open parenthesis i equals zero semicolon i less than n semicolon i plus plus close parenthesis open curly brace, if open parenthesis a sub i less than zero close parenthesis, continue semicolon, this line skips the negative elements, and then the rest of the body of the loop, dot dot dot, will run only for the positive elements, and the loop finishes with a close curly brace. The continue statement is often used when the part of the loop that follows is complicated, so that reversing a test and indenting another level would nest the program too deeply.

It's time for a bit of an aside. Now that we have seen the break and continue language structures in C, which have also made it into C-like languages, and learned about middle-tested loops, it is time to revisit the structured programming debate and the need for
priming operations when a program must process all data until it finishes and still handle the there-is-no-data-at-all situation. In the previous chapter, the authors somewhat skirted the issue by using a top-tested while loop and a side-effect statement with a residual value that was compared to EOF to decide when to exit the loop: int c semicolon, while open parenthesis open parenthesis c equals getchar open parenthesis close parenthesis close parenthesis not equal EOF close parenthesis open curly brace, process your data, close curly brace.

And just for fun, now that we do know about the for loop, let's rewrite this loop as a for loop to make sure we really understand how for loops work: int c semicolon, for open parenthesis c equals getchar open parenthesis close parenthesis semicolon, c not equal EOF semicolon, c equals getchar open parenthesis close parenthesis close parenthesis open curly brace, process your data, close curly brace for the for loop. Now, you will almost never see a read-all-characters-until-EOF loop written this way, because it is not the way K&R told us; K&R told us to use a while loop for this. But the for loop formulation is probably clearer to many than the while formulation, especially to a reader who's not familiar with the assignment side-effect idiom commonly used in C. In particular, the for formulation does not require that the assignment statement has a residual value of the value that was assigned. The first part of the for is a priming read. The second part of the for is a top-tested exit criterion that works both for no data and after all data has been read and processed. And the third part of the for is done at the bottom of the loop to advance to the next character, or encounter EOF, before going back to the top and doing the loop test. The call to getchar is done twice in the for formulation of the read-all-available-data loop, and while we don't like to repeat ourselves in code, if it is a small and obvious bit of code, perhaps this code is more clear with a bit of repetition. So with all this as
background, you can take this page of the document, sit down with a friend at a coffee shop, and debate as long as you like about which is the better formulation for the read-all-available-data loop. But if at that coffee shop you ask Dr. Chuck's opinion, neither of these is ideal, because in the real world we build data-oriented loops that usually do a lot more than get one character from standard input. My formulation of a data loop will upset the structured programming purists, and probably upset Kernighan and Ritchie as well, but I write code in the real world, so here is my version: int c semicolon, while open parenthesis one close parenthesis open curly brace, c equals getchar open parenthesis close parenthesis semicolon, if open parenthesis c equal equal EOF close parenthesis break semicolon, process your data, and then close curly brace to end the loop. And if I wanted to skip blanks and newlines, I could use both break and continue, further angering the structured programming purists: int c semicolon, while open parenthesis one close parenthesis open curly brace, c equals getchar open parenthesis close parenthesis semicolon, if open parenthesis c equal equal EOF close parenthesis break semicolon, if open parenthesis c equal equal quote space quote, or c equal equal quote backslash n quote, close parenthesis continue semicolon, then process your data, close curly brace.

I use this middle-tested approach because usually the data I am processing is coming from a more complex source than the keyboard, and I don't want a function with two to three parameters stuck inside of an assignment side-effect statement in a while test. Also, sometimes you want to exit the loop not just based on the return value from the function, but instead based on the complex structure that came back from the function itself. As these data processing loops get more complex, the middle-tested loop is a tried and true pattern. Even Kernighan and Ritchie point out its benefits above. And with that, I have now triggered endless coffee shop conversations about the best way to write a data handling loop.

Section 3.9, gotos and labels. C
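As an aside on the middle-tested loop described above: the sketch below shows its shape with both break and continue. To make it easy to try without a keyboard, the data comes from a string through next_char, a hypothetical stand-in for getchar; both function names are assumptions made for this illustration.

```c
#include <stdio.h>      /* for EOF */

/* Hypothetical data source standing in for getchar(): returns the next
   character of the string, or EOF once the end has been reached. */
static int next_char(const char **p)
{
    if (**p == '\0')
        return EOF;
    return (unsigned char) *(*p)++;
}

/* Middle-tested loop: read, test for end of data, skip what we do not
   want, then process. Here "process" just counts the characters kept. */
int count_data_chars(const char *text)
{
    int c;
    int count = 0;

    while (1) {
        c = next_char(&text);
        if (c == EOF)
            break;                      /* no more data: leave the loop */
        if (c == ' ' || c == '\n')
            continue;                   /* skip blanks and newlines */
        count++;                        /* process your data */
    }
    return count;
}
```

The read happens exactly once per iteration and on its own line, which is the property that matters when the read is a larger call than getchar.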
provides the infinitely abusable goto statement, and labels to branch to. Formally, the goto is never necessary, and in practice it is almost always easy to write code without it. We have not used goto in this book. Nevertheless, we will suggest a few situations where gotos might find a place. The most common use is to abandon processing in some deeply nested structure, such as breaking out of two loops at once. The break statement cannot be used directly, since it leaves only the innermost loop. Thus: for open parenthesis dot dot dot close parenthesis, for open parenthesis dot dot dot close parenthesis open curly brace, do some stuff, if open parenthesis disaster close parenthesis goto error semicolon, close curly brace, dot dot dot, and then error colon, and then clean up the mess. This organization is handy if the error handling code is non-trivial and if errors can occur in several places. A label has the same form as a variable name and is followed by a colon. It can be attached to any statement in the same function as the goto.

As another example, consider the problem of finding the first negative element in a two-dimensional array. Multi-dimensional arrays are discussed in chapter 5. One possibility is: for open parenthesis i equals zero semicolon i less than n semicolon i plus plus close parenthesis, for open parenthesis j equals zero semicolon j less than m semicolon j plus plus close parenthesis, if open parenthesis v sub i sub j less than zero close parenthesis goto found semicolon, otherwise you handle the not-found case and keep going, and then found colon is where it jumps to. Code involving a goto can always be written without one, though perhaps at the price of some repeated tests or an extra variable. For example, the array search becomes: found equals zero semicolon, for open parenthesis i equals zero semicolon, i less than n ampersand ampersand exclamation found, semicolon i plus plus close parenthesis, for open parenthesis j equals zero semicolon, j less than m ampersand ampersand exclamation found, semicolon j plus plus close parenthesis, found
equals v sub i sub j less than zero semicolon, if found, carry on with the element that was found, else handle the not-found case. Although we are not dogmatic about the matter, it does seem that goto statements should be used sparingly, if at all.

I would add, before we leave control flow, that I agree with structured programming experts as well as Kernighan and Ritchie in that the goto is universally a bad idea. There are a lot of little details that make them a real problem, things like how the stack works in function calls, how code blocks happen, and patching the stack up correctly when a goto happens in the middle of a deeply nested mess. You might be tempted to use a goto when you want to exit multiple nested loops in a single statement; break and continue only act on the innermost loop. The authors use this example above but are quite lukewarm when describing it as a use of goto. Usually, if your problem is that complex, putting things in a function and using return, or adding a few if statements, is a better choice. The Dr. Chuck middle-tested data processing loop solves this because the loop is always the innermost loop. Also, as new languages were built, the concept of exceptions became part of language design and was by far a more elegant solution to a patch of deeply nested code that just needs to get out. So most of the time you think the goto might be a good idea, you should lean towards a throw-catch pattern to make your intention clear. It is one of the reasons why we prefer languages like Java or Python over C when writing general purpose code.

This work is based on the 1978 C Programming book written by Brian W. Kernighan and Dennis M. Ritchie. Their book is copyright, all rights reserved, by AT&T, but is used in this work under fair use because of the book's historical and scholarly significance, its lack of availability, and the lack of an accessible version of the book. The book is augmented in places to help understand its right place in a historical context amidst the major changes of
the 1970s and 1980s, as computer science evolved from a hardware-first, vendor-centered approach to a software-centered approach where portable operating systems and applications written in C could run on any hardware. This is not the ideal book to learn C programming, because the 1978 edition does not reflect the modern C language. Using an obsolete book gives us an opportunity to take students back in time and understand how the C language was evolving as it laid the groundwork for a future with portable applications.

[Music] Welcome to chapter four, Functions and Program Structure. In this chapter we're going to start digging a little deeper. Part of the goal of this course is to get you to the point where we can talk about how things really work. Eventually, in the next course, we'll even go down to hardware and architecture and gates, so it's time to start opening things up and looking at how things work, and this is a good time to do so. The big thing we're going to learn, among other things, is the concept of a stack, how pass by value and pass by reference work, and a little bit about recursion. Recursion is a thing that I worry about a lot; well, we'll get there. And the preprocessor. These are all things where it's not so much about just how functions work, but how functions are implemented and how that affects how they work.

So the first thing I want to talk to you about is a really nifty computer science concept called a stack. A stack is a data structure that we use, and it has a couple of attributes. The idea of a stack is we start with an empty stack, we put things on the stack, and then we take things off. The last thing we put on is the first thing we take off. They go up and they go down; you can push things onto them and take things off of them. We can approximate this with a Python list. So we start with an empty list, we append the string one to it
and the stack has a one on it. It's kind of growing up from the bottom, and it's going to shrink from the top. Then we append a two to it, and our stack is now one and two, so the bottom thing in the stack is one and the top one is two. Then we append a three to it, and we have one, two, three on the stack. At that point we pop. Pop says give me the most recently pushed thing and then take it off, so we pop off three and the stack remains with one and two again. This is also known as a last-in, first-out, or LIFO, queue. A queue is like a line of things, and here the last thing in is the first thing you get out. So that's a stack, and we're going to use stacks in function calls.

Historically, when we talk about call by value and call by reference, we basically say that call by value means that somehow this value, like in the main program with a variable ma with a value of 42, ends up being copied into the function, and the parameter op has got a copy of the 42. It's not the original ma, it's a copy of 42. So 42 gets passed into the function one, and op is a copy of 42. Then inside the function we can subtract 10 from it, and we can see that, but when we get back in the main function we see that ma is unchanged. It's like we built a little wall around the function, and whatever happens inside the function, the outside world is unaffected by it. That's a great oversimplification of call by value. Of course, call by reference means the stuff in the function can affect things outside the function. But let's talk a little bit about how a stack is used to accomplish this.

Just to use some terminology, C calls the variables that are allocated inside the function before the function starts the automatic variables. And frankly, int ma equals 42 in main is an automatic variable inside main, because main is a function inside a C program that happens to be the one that starts everything out. So if we get to the point where it says int ma equals 42 and then it prints ma being 42, at
that point on the stack the C runtime has allocated one integer and we've assigned 42 to it. So that's what the stack looks like at that first print statement in main. Then we call the function one and pass ma in, and this is where the C runtime library, before everything starts out in the function one, allocates what's called a stack frame. A stack frame includes the parameters, op, and the automatic variables that are inside of that function. So in this case we're going to get two variables: we're going to get op as an integer and tn as an integer. Before the function starts up, the value 42 is copied from ma into op, and so the stack frame is the context in which that function operates. When it first starts, you see that op has 42. You also see we have two copies of the number 42, and we have a parameter op and then the automatic variable tn. Then the next line runs, and at that point op is changed: op equals op minus tn, and so op becomes 32. But you'll notice that on the stack, out beyond the stack frame (the stack frame is our current execution of the function one), the 42 is still there. We can't see it. We're in the function right now and we're only seeing the top part of the stack; we're not seeing the part of the stack that belongs to the main program. So that's where it prints out 32; in the function, op is 32, and that's fine. Then we return, and that's when the C runtime removes the stack frame and pops those things off the stack. It remembers how much it put on, and it pops all the stuff off the stack that it put on. Then it comes back into main, and the stack frame for the main program has one variable in it, ma, and it has 42. That's because one operated in its own stack frame, and now the main program is back to operating in the same stack frame as before. You can almost think of this as if one never happened. From main's perspective, it had some variables, one ran and a stack frame was
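As an aside on the call-by-value walk-through above: a sketch of the two functions might look like this. The names ma, op, tn, and one follow the lecture as transcribed; the return value from one, the wrapper call_by_value_demo that stands in for main, and the exact printf text are assumptions added so the effect can be checked.

```c
#include <stdio.h>

/* op is a copy of the caller's value, living in one's own stack frame
   together with the automatic variable tn. */
int one(int op)
{
    int tn = 10;                    /* automatic variable */

    op = op - tn;                   /* changes only the copy */
    printf("one: op is %d\n", op);
    return op;                      /* assumption: returned so it can be inspected */
}

/* Stands in for main in the lecture: returns ma after the call. */
int call_by_value_demo(void)
{
    int ma = 42;                    /* automatic variable in the caller's frame */

    printf("before: ma is %d\n", ma);
    one(ma);                        /* 42 is copied into op */
    printf("after: ma is %d\n", ma);
    return ma;                      /* still 42: the callee's frame is gone */
}
```

Inside one the copy becomes 32, while the caller's ma keeps its 42 throughout.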
created, some of main's data was copied into one's stack frame, one operated in its stack frame, and then the stack frame went away right at the moment that one returned. The return value ends up in the stack frame too; I just haven't shown you that, and these functions don't send back a return value. The return value comes back from the stack frame. But you can see how main started with a stack with one variable on it, then one ran and all that stuff happened, and then it was undone, and that's where the changed variable just went away. It's as if nothing ever happened to the stack, except it went up and then it went back down.

Now, one thing we notice, and in Python we see this too, is that you say everything is call by value, which implies a copy, except for things like certain objects and calling methods on objects. But look at, say, this function zap. We pass in x, and x starts out as 'original'. Then it calls the zap function and passes it in, and inside it's got 'original'. Then it gets changed inside the zap function, and that change prints out, but when it comes back it is back to 'original'. So x is back to 'original' in the main code. You might say that looks right, and it's actually quite intuitive: Python has made it so that the call to the zap function behaved as call by value, meaning that nothing changed outside of the zap function, and it was a call by value, not a call by reference. Now, I'm not going to go into why that really worked, at least not right now. It's less about call by value and call by reference and more about the fact that y is really a pointer to an object, and when y = 'changed' executes inside of zap, y points to a different object, but x never changed. So Python has a slightly different runtime, but it leads to this notion that it seems like a string
variable in Python is call by value. Now if we look at the similar but quite different code in C, we see that main has a character array x of unspecified length which is 'original'. Unspecified length just means that it's eight characters plus a backslash-zero, which is nine characters, and it prints out like a string; it's a character array with a terminator. Then we call zap and pass x in. zap takes a character array as its parameter, and it can print out the word 'original' when it starts. Then it copies 'changed' into it, and at the end y is 'changed'. But then we come back, and back in the main program it got changed. So does that mean it was call by reference, or what? The answer is: sort of. It turns out that when you are passing an array into a function in C, you're not actually passing the contents of the array. Most of the time we think of that 42 being copied; if it's a scalar thing like a float or an int or a char, that's being copied. But an array could be gigantic, so it doesn't actually copy in the whole array. When x is being passed into zap and received as y, we're not actually passing the string, because that could be a million characters for all we know, and it's not going to make an extra copy of a million characters. What it's doing is passing in the address of the start of the string. The string lives somewhere else; it actually could be on the stack somewhere, but it's not in zap's stack frame, and the word 'original' is not copied into the stack frame. The stack frame only includes a pointer, the address. x is the address of this letter 'o', and y is also the address of the letter 'o', which means that when we're calling strcpy we are overwriting those characters. Oh, and by the way, I carefully made sure that the string 'changed' was shorter than the string 'original', or my
program could have blown up, because this is C and arrays don't get bigger. Python strings get bigger, but arrays don't. In a couple more lectures we will build a data structure that is like a Python string that we can add to, and you'll see that the code is very complex; a character array is very simple. So it's not exactly a pass by reference, it is a pass by location. And if you happen to misuse that location, meaning you write to it when it might have been in, say, read-only memory, your program might blow up. So you'd better be sure what you're doing when you start messing, inside of a function, with an array that's been passed to you. Now, sometimes you're supposed to; sometimes we tell you to write to it.

Another thing that you're going to see in this chapter is the reference to register variables. This is another rather historical notion, and in my opinion it really had to do with convincing really skilled assembly language programmers that they could get the same performance out of C that they were used to getting in assembly language. So what are registers? When you have a central processing unit and you have memory, the data lives in the memory and the registers live in the central processing unit, and depending on the speed of things, a register might be, you know, 40 times faster than regular memory. So if you have a variable like i in a loop and you can keep it in a register, that's faster. What we do by saying register int x is say: hey, by the way, for the next few lines x is a really important variable and I expect to use it a lot, so if you can possibly avoid storing this in memory, please do so. That's why the only weird thing about register is that you can't get the memory address of a register variable. Now, in modern compilers we have optimizers that are miraculous; I mean, they border on miraculous. Even the simplest optimizers that speed the code up at run time are miraculous.
And saying this is a register or that's a register might actually confuse things. So all register does is say: hey, I am never going to ask you for the address of this variable, so don't bother putting it in memory if you don't feel like it. But I think it's fascinating and fun to think about how early C developers were so deeply connected to their assembly language, that is, to what happens at run time.

Recursion. When a function calls itself, it's called recursion. It's a powerful, beautiful concept. There are places, like when you're given a tree-like structure (say you parsed some XML and you're reading through it), where recursion is such a pretty way to write code. I'm about to show you a very simple recursion example, and simple examples like this have two problems: first, they're really inefficient and silly uses of recursion, and second, they mislead you as to why you should use recursion; they kind of tell you that recursion is great, so let's use it for something it is not well suited for. So in this section I'm not trying to show you what recursion is used for. I'm more interested in giving you a really artificial, synthetic example to show how the call stack works and how recursion works with the call stack.

So here we have some tortured code; it is not pretty code. I'm writing code that, if given a number like three, adds up 1 plus 2 plus 3 and gives me six. There are so many ways to do this. There's even a closed-form solution that doesn't even require a loop; that's called algebra. But we're going to use recursion. If you look at int main, I'm going to say sumup of three, which means sum up the things one, two, three. So sumup is being called from main, and then I've got the return value and I print sum, and you'll notice in the printout that sum is the very last thing that comes out. If you walk through sumup, you see that there is a parameter called
above. I call it above because it's coming from whoever's calling us. There's a variable below, which is a value we're going to compute and send down to the next copy of ourselves. Then sum is what comes back from the call to ourselves, the sum of the below values, and rval is adding those two things together, above and sum. I do this in exceedingly slow motion, with print statements everywhere that just make this look ugly, because really the only thing that matters is where it says sum = sumup(below): what we're doing is calling the same function again.

If you look at main and you see the sumup call, that is going to create a stack frame. In that stack frame, with three as the argument, we're going to make an above variable, copy three into it, and allocate below, sum, and rval. So our stack frame is going to have four integers in it, and then the function starts working. The way recursion works is that there's always got to be a point at which it stops. It's like going down, down, down, and it has to work its way back up. If it goes down forever, that's called a stack overflow: your computer runs out of memory and your application blows up. So you have to have a point at which the algorithm stops. What we're saying is, if we're being told to sum up one or less, we define that sum as one. So we just return the sum of everything up to one as one, and that's our way of stopping the recursion at the bottom. Otherwise we compute below by subtracting one. So if we're being called with three, below becomes two, and you see over in the lower stack frame that below is two. We're about to go down deeper into the call tree: we're going to call sum = sumup(below). So what happens now is we're passing two in to another
stack frame. There's not really another copy of the code, but there's another stack frame. So now we're calling sumup with two as the parameter, which was our below, and in this new stack frame, the one on the left-hand side there, above is two. Then execution begins. Above is not less than or equal to one, so we subtract one and below becomes one, because it's 2 minus 1, and we're going to go down. It says 'down 1', which means it's going to again say sum = sumup(below), and below in this case is one, so it makes another stack frame. There's a maximum of three calls here, and then it's going to work its way back out. So it makes another stack frame, which is not shown on the right-hand side, and it runs with one as above. Above is less than or equal to one, so it returns one. That's why it says 'in 1' and then doesn't say anything else, because it has returned, and it returned the value one. So now the third stack frame comes off and we're back in the previous stack frame, and sum is the return value of the sumup call. So it says 'back 1'. Now we're in the stack frame that's on the right-hand side: sum is one, below is one, above is two. We compute rval, which is 1 + 2, and that's three, and we print that out, and then we return rval, the three. The first call runs some more: it gets three back, adds its own three to it, and returns one more time. The stack all pops back, and eventually you get six. You can look at this as long as you like. This code is a foolish way to make this calculation, like most artificial recursion examples. The key thing here is just to think about the stack frame. Every time you call in, another stack frame gets created.
And then, when the return happens, execution goes back to the previous stack frame. So the stack frame is a way of pausing execution at that moment, creating a new stack frame, executing in that stack frame, and, if you need to, creating yet another stack frame. So there's this idea of creating a stack frame with the parameters and automatic variables each time you call a function, copying the parameters into that stack frame, and then executing the function in that stack frame. We're not making extra copies of the function; we are just creating a new stack frame. That's the essence of recursion: you have a stack, each call makes a stack frame, and if you recursively call again you just make another stack frame. So it's almost easier, in my mind, to think through how stack frames work than it is to think through how recursive code works.

Now I want to talk a little bit about the C preprocessor. It's the last thing in this chapter, and it's in some ways orthogonal to functions and program structure, although it is part of program structure. I've talked a lot about how wonderfully the C compiler, and eventually Unix, solved so many problems of software source code portability: things like endianness and character arrays, and masking and shifting not being necessary. Those were awesome. But the problem is that C has always operated in an environment where the language has changed. In the early days it wasn't standardized. By 1989 it got standardized and ANSI C came out. A lot of people used it outside its original environment, so a lot of things got fixed in the first decade of C's use. The language evolved a lot, and it kept evolving. A lot of the things that made it evolve are things like integers going from 32 bits in some situations to 64 bits, and then you have to ask: what is a long? Is long 64 bits or 32? It started with int being 16 bits and long being 32. Then for a while ints were 32 and longs were 32, and
then ints were 32 and longs were 64. And what would happen if ints were 64? It had to do with computer architecture going to 64 bits. So sometimes you would have a bit of code where you knew you wanted a 32-bit thing, and you weren't sure if an int or a long or a short was going to give you 32 bits. So you had to say: I really need different source code. If I'm working on a PDP-11 I want one thing, and if I'm working on an Interdata 7/32 I want another thing, because really I want a 32-bit integer. Now there's actually an int32 type in some of these environments, because sometimes you do need to know you're using a 32-bit integer.

Then libraries changed. Calling sequences changed because, as computers got bigger and memory got bigger and disk drives got bigger, you would be on a certain version of an operating system and the calls for reading files might be slightly different. So it's not really about whether the source code itself was portable; it's that the calling sequences to libraries started to change. Hardware evolved, operating systems evolved. C started running on non-Unix systems: C originally started on Unix, but it quickly went to other operating systems because it was such a powerful concept, and in these other operating systems things were just kind of different. So the preprocessor really was an effective way to patch your source code, so that you could say: look, I wrote this source code 10 years ago and it worked on a PDP-11, and now I'm going to run it on an IBM 360 architecture. I don't want to change it much; there are a few changes I need to make that have nothing to do with what a for loop looks like, but have to do with which library I'm calling or what the return type for that library might be. So the preprocessor allowed us to put variations in the source code. The preprocessor feels weird because its syntax is very
different: the preprocessor is a line-oriented processor, and it has these pound things, like #ifdef, #define, #else, #endif, and #include; that's the preprocessor too. So the preprocessor is not a compiler at all. It is a source-code-to-source-code translator. It expands the include files and then it makes changes. In the top example, where you see #include <stdio.h>, you can actually run gcc -E, which says just run the preprocessor and show me what comes out. You take 10 or 12 lines of code on the left and it puts out hundreds and hundreds of lines of code on the right. I'm only showing you a subset of it, but the biggest part is the fact that the include of stdio.h is literally expanded, and the result is C code without the #include. So that's the preprocessor.

Another example here is that I'm using this USE_LONG. This is not really a variable; it is a compile-time variable. And I'm going to create a new string called INT_32: if USE_LONG has been defined, I'm going to make INT_32 be long, else I'm going to make INT_32 be int. This could be a case where I'm compiling for different architectures, I want this IP address variable to be a 32-bit integer, and I need it to work on different operating systems. In this case, because USE_LONG is not defined, INT_32 is turned into int by string substitution, like a macro string substitution, before the compiler even does anything. That's what the five lines of #ifdef and all that stuff do: they say change this INT_32 string in my source code into int. So what we see on the right is really C code, and what we see on the left is C preprocessor plus C code. The preprocessor transforms source code to source code.

I was looking around at some old code that I happened to have grabbed and put into GitHub, which was some code from 1994, from X Mosaic 1.2. For those of you who took my
Internet History, Technology, and Security course, you know that X Mosaic was the first web browser that was portable across multiple operating systems. Mosaic ended up on Unix systems with X Windows, which is why it's called X Mosaic, and it went to the Mac and to the PC, so it really ran on many Unixes, Mac, and PC. What we're seeing here on the right-hand side is actual source code from that, which was written in 1993 or 1994. What you see is a bunch of #ifdefs and #ifndefs and some comments; there's an #ifdef about something being broken on Solaris. It has to do with where we find the error messages across all these weird operating systems. Sometimes they would use externs, which are global variables defined inside the runtime: we would make a function call, it would write into these global variables, and we would just look at those variables. But that global variable might be different from system to system. This is actually from some HTTP code in C that made some early network connections. These days we just do this stuff with import requests in Python, but in those days the C libraries for network connections were really different. Here comes the network, and here's this language C; it had been around a while by '89, '91. The network was there, so we were building libraries, but how each library worked on each operating system was a little wonky. So they had to write different C code to compensate for the different operating systems that this C program, a web browser, would be running on. All these #ifdefs mean that one source file, with a few predefined compile-time constants, could work on a wide range of operating systems. So yes, the C language itself is portable, but we also want to be portable over time. Libraries change, operating systems change, and we want to
be able to compensate for that. So this is an important part of C. These days it's less important, because a lot of the libraries have stabilized and they don't change quite so much. Today this code would probably just be a bit of socket code, and the errors would come back the same way no matter what system you're on. VMS is an operating system that is essentially gone, THINK C doesn't exist anymore, NeXT doesn't exist anymore, and Solaris is rarely seen anymore. So these are systems that have mostly disappeared, but this code was portable across all of them. Actually, I compiled all this, and you can take a look at it: I made a video where I resurrected this code, oh, it's got to be eight or nine years ago, on a Macintosh, whose operating system evolved from NeXT. I don't know if I could get it to work again, but back then I got it to work on a Macintosh by telling it to define itself as NeXT. There is an X Windows for Macintosh, so I got the X Windows library, I got all this stuff working, I told it 'you're a NeXT', I recompiled the C code, and eventually something came up. I made a video about it because I knew that it would be very difficult to keep this thing working over time. But to go from 1994 to 2014 and recompile something 20 years later, that's still pretty impressive, that the NeXT code would still work. As for things like the VMS code, there are no VMS computers that I know of these days. It just shows that some of the idea of portability is simple and elegant and was laid down in 1978, but there are things outside the programming language that were evolving, and are still evolving to this day. If you are doing C coding or C++ coding today, you may be using things that start with a pound sign, which are compiler directives rather than C code. So with that, dive into chapter four and learn about
functions.

Welcome to C Programming for Everybody. My name is Charles Severance, and this is my reading of the 1978 C programming book written by Brian Kernighan and Dennis Ritchie. At times I add my own interpretation of the material from a historical perspective.

Chapter 4: Functions and Program Structure.

Functions break large computing tasks into smaller ones, and enable people to build on what others have done instead of starting over from scratch. Appropriate functions can often hide details of operation from parts of the program that don't need to know about them, thus clarifying the whole and easing the pain of making changes. C has been designed to make functions efficient and easy to use. C programs generally consist of numerous small functions rather than a few big ones. A program may reside on one or more source files in any convenient way; the source files may be compiled separately and loaded together, along with previously compiled functions from libraries. We will not go into that process here, since the details vary according to the local system. Most programmers are familiar with library functions for input and output, like getchar and putchar, and numerical computations, like sin, cos, and sqrt. In this chapter we will show more about writing new functions.

4.1 Basics.

To begin, let us design and write a program to print each line of its input that contains a particular pattern or string of characters. This is a special function of the Unix utility program grep. For example, searching for the pattern 'the' in the set of lines

    Now is the time
    for all good
    men to come to the aid
    of their party

will produce the output

    Now is the time
    men to come to the aid
    of their party

The basic structure of the task falls neatly into three pieces: while there's another line, if that line contains the pattern, print it. Although it's certainly possible to put the code for all of this in one main routine, a better way is to use the natural structure to advantage by making each part a separate
function. Three small pieces are easier to deal with than one big one, because irrelevant details can be buried in the functions and the chance of unwanted interactions minimized. And the pieces might even be useful later in their own right.

'While there is another line' is get_line, a function we wrote in chapter one, and 'print it' is printf, which someone has already provided for us. This means that we need only write a routine which decides if the line contains an occurrence of the pattern. We can solve that problem by stealing a design from PL/1: the function index(s, t) returns the position or index in the string s where the string t begins, or -1 if s doesn't contain t. We use zero rather than one as the starting position in s because C arrays always begin at position zero. When we later need more sophisticated pattern matching, we only have to replace index; the rest of the code can remain the same.

Recall that because the modern stdio.h defines a getline function, whenever the book writes this function to teach a feature of functions, we rename it to get_line.

Given this much design, filling in the details of the program is straightforward. Here's the whole thing, so you can see how the pieces fit together. For now, the pattern to be searched for is a literal string in the argument of index, which is not the most general of mechanisms. We will return shortly to a discussion of how to initialize character arrays, and in chapter five we will show how to make the pattern a parameter that is set when the program is run. This is also a new version of get_line; you might find it instructive to compare it to the one in chapter one.

    #include <stdio.h>
    #define MAXLINE 1000

    main()
    {
        char line[MAXLINE];

        while (get_line(line, MAXLINE) > 0)
            if (index(line, "the") >= 0)
                printf("%s", line);
    }

That close curly brace finishes the main function. The next function is get_line:

    get_line(s, lim)
    char s[];
    int lim;
    {
        int c, i;

        for (i = 0; i < lim-1 && (c = getchar()) != EOF && c != '\n'; ++i)
            s[i] = c;
        if (c == '\n') {
            s[i] = c;
            ++i;
        }
        s[i] = '\0';
        return(i);
    }

The for loop basically reads until it finds a newline or end of file. The assignment of '\0' terminates the string properly, and return(i) returns the length of the line.

And now the new code, index. I would note that when we are declaring a character array and we don't give the length of the array, that means that we're inside of a function. The length of the array exists, but it was set by the calling code. So char s[], t[]; means the parameters s and t are strings of some length, and we will use '\0' to know where that length ends. Inside index we have two nested for loops:

    index(s, t)
    char s[], t[];
    {
        int i, j, k;

        for (i = 0; s[i] != '\0'; i++) {
            for (j = i, k = 0; t[k] != '\0' && s[j] == t[k]; j++, k++)
                ;
            if (t[k] == '\0')
                return(i);
        }
        return(-1);
    }

Each function has the form

    name(argument list, if any)
    argument declarations, if any
    {
        declarations and statements, if any
    }

As suggested, various parts may be absent; a minimal function is

    dummy() {}

which does nothing. A do-nothing function is sometimes useful as a placeholder during program development. The function name may also be preceded by a type if the function returns something other than an integer value; this is the topic of the next section.

A program is just a set of individual function definitions. Communication between the functions is, in this case, by arguments and values returned by the functions; it can also be via external variables. The functions can occur in any order in the source file, and the source program can be split into multiple files, so long as no function is split.

The return statement is the mechanism for returning a value from the called function to its caller. Any expression can follow return:

    return(expression)

The calling function is free to ignore the returned value if it wishes. Furthermore, there is no need to have an expression after return; in that case, no value is returned to the caller. Control also returns to the caller with no value when execution falls off the end of the function by reaching the closing right brace. It is not illegal, but probably a sign of trouble, if a function returns a value from one place and no value from another. In any case, the value of a function which does not return one is certain to be garbage. The C verifier lint checks for such errors.

The mechanics of how to compile and load a C program which resides on multiple
source files vary from one system to the next. On the Unix system, for example, the cc command mentioned in chapter one does the job. Suppose that the three functions are in three files called main.c, getline.c, and index.c. Then the command

    cc main.c getline.c index.c

compiles the three files, places the resulting relocatable object code in files main.o, getline.o, and index.o, and loads them all into an executable file named a.out. If there is an error, say in main.c, that file can be recompiled by itself and the result loaded with the previous object files, with the command

    cc main.c getline.o index.o

The cc command uses the .c suffix versus the .o suffix naming convention to distinguish source files from object files.

I would note that this cc example, exactly as the authors wrote it, does not quite work as described with modern C compilers. If you want to compile your source code and leave the compiled object code around after the compile, you add the -c option to the compiler call. Modern C compilers generally do accept multiple files with either .c or
.o suffixes and combine them into a runnable application.

Section 4.2: Functions Returning Non-Integers.

So far, none of our programs has contained any declaration of the type of a function. This is because, by default, a function is implicitly declared by its appearance in an expression or statement, such as

    while (get_line(line, MAXLINE) > 0)

If a name which has not been previously declared occurs in an expression and is followed by a left parenthesis, it is declared by context to be a function name. Furthermore, by default the function is assumed to return an int. Since char promotes to int in expressions, there is no need to declare functions that return char. These assumptions cover the majority of cases, including all of our examples so far.

I would add that this is not quite true anymore. In modern C you are required to provide a type for each function. If you leave off the type in a function declaration, at a minimum you will get a stern warning message. But sometimes functions do not intend to return anything at all, and so the void type was invented to indicate that a function returns nothing. The rule requiring a type on a modern function definition in C, even if it's void, allows the compiler to check that all of the return values in a function match the expected return type.

Back to the text. But what happens if a function must return some other type? Many numerical functions, like sqrt, sin, and cos, return double; other specialized functions return other types. To illustrate how to deal with this, let us write and use the function atof(s), which converts its argument string s to its double-precision floating-point equivalent. atof is an extension of atoi, which we wrote versions of in chapters 2 and 3. It handles an optional sign and decimal point, and the presence or absence of either the integer or fractional part. We would note that this is not a high-quality input
conversion routine; doing everything takes a bit more space than we care to use here in this book.

First, atof must declare the type of the value it returns, since it's not int. Because float is converted to double in expressions, there is no point in saying that atof returns float; we might as well make use of the extra precision and thus declare it to return double. The type name precedes the function name, like this:

    double atof(s)
    char s[];
    {
        double val, power;
        int i, sign;

        for (i = 0; s[i] == ' ' || s[i] == '\n' || s[i] == '\t'; i++)
            ;   /* that skips the white space */
        sign = 1;
        if (s[i] == '+' || s[i] == '-')
            sign = (s[i++] == '+') ? 1 : -1;   /* ternary operator */
        for (val = 0; s[i] >= '0' && s[i] <= '9'; i++)
            val = 10 * val + s[i] - '0';
        if (s[i] == '.')
            i++;
        for (power = 1; s[i] >= '0' && s[i] <= '9'; i++) {
            val = 10 * val + s[i] - '0';
            power *= 10;
        }
        return(sign * val / power);
    }

The ternary operator makes sign 1 or -1, depending on whether a plus or a minus is present. In the digit loops, val = 10 * val + s[i] - '0' multiplies the current value by 10, in effect shifting it left, and then adds into that new, empty low-order spot the digit we're encountering, which is somewhere between the character '0' and the character '9'. The second loop shifts the number to the left in the same way as it encounters the fraction digits.

Second, and just as important, the calling routine must state that atof returns a non-int value. The declaration is shown in the following primitive desk calculator (barely adequate for checkbook balancing), which reads one number per line, optionally preceded by a sign, and adds them all up, printing the sum after each input. This example is from page 70 of the textbook.

    #include <stdio.h>
    #define MAXLINE 100

    main()
    {
        double sum, atof();
        char line[MAXLINE];

        sum = 0;
        while (getline(line, MAXLINE) > 0)
            printf("\t%.2f\n", sum += atof(line));
    }

That code uses the += side-effect operator and merges the call to atof right into the second parameter of printf.

The declaration

    double sum, atof();

says that sum is a double variable and that atof is a function that returns a double value. As a mnemonic, it suggests that sum and atof(...) are both double-precision floating-point values.

Unless atof is explicitly declared in both places, C assumes it returns an integer, and you will get nonsense answers. If atof itself and the call to it in main are typed inconsistently within the same source file, it will be detected by the compiler. But if, as is more likely, atof were compiled separately, the mismatch would not be detected; atof would return a double, which main would treat as an int, and meaningless answers would result. lint catches this error.

Given atof, we could in principle write atoi, to convert a string to an integer, in terms of it:

    int atoi(s)
    char s[];
    {
        double atof();

        return(atof(s));
    }

Notice the structure of the declarations and the return statement. The value of the expression in

    return(expression)

is always converted to the type of the function before the return is taken. Therefore the value of atof, a double, is converted automatically to int when it appears in a return, since the function atoi returns an int. The conversion of a floating-point value to int truncates any fractional part, as we discussed in Chapter 2.

Section 4.3: More on Function Arguments.

In Chapter 1 we discussed the fact that function arguments are passed by value; that is, the called function
receives a private, temporary copy of each argument, not its address. This means that the function cannot affect the original argument in the calling function. Within a function, each argument is in effect a local variable initialized to the value with which the function was called.

When an array name appears as an argument to a function, the location of the beginning of the array is passed; elements are not copied. The function can alter elements of the array by subscripting from this location. The effect is that arrays are always passed by reference. In Chapter 5 we will discuss the use of pointers to permit functions to affect non-arrays in calling functions.

A bit of a digression. Since including an array as an argument passes the location, or memory address, of the array into the function, the function can change the items in the array using array subscripts. In particular, the array contents are not copied when an array is passed into a C function. When we get to structs in a future chapter, we will find that the contents of structs are also passed using the address of the entire struct, so structs are passed by reference as well.

When thinking about pass by reference or pass by value, remember that a char variable is a single item, similar to int, and passed by value, i.e., it is copied. In C, strings are arrays of characters, so they are passed by reference.

Python follows this design for the same efficiency reason as C. Normal single variables like int or float are copied before being passed into a function and are therefore passed by value. Collections like list or dict are passed into functions by reference, so the contents can be changed within a function. Python strings are not technically copied when being passed into a function, but the way assignments happen in Python makes it seem like strings are passed by value, since they can never be modified. You can learn more with a bit of web research, but the easy way is to imagine that in Python strings are passed by value, with a clever trick to avoid
requiring a copy for every function call. PHP follows the same pattern of passing numbers and strings by value and passing arrays by reference. PHP passes strings by value without requiring a copy, again using clever runtime code. Because in Java, JavaScript, and PHP strings are objects (which, of course, we haven't discussed much yet), those languages can make sure that strings act as if they were passed by value, and not passed by reference the way they always are in C.

C made decisions about its runtime based on getting the maximum performance out of the hardware of the 1970s, at the expense of making it too easy to write code that overwrites memory and leads to corrupted programs with dangerous and undefined behavior. Languages like PHP, Java, and JavaScript add a small amount of runtime overhead to do things like store the length of an array and make sure we programmers don't over-reference the array and overwrite random bits of our program's code or data. The creators of C placed more priority on speed and efficient use of memory than on safety. It is like driving an automobile in the rain without ABS (an anti-lock braking system): it is fast but dangerous, and should be reserved for highly skilled and very careful programmers and drivers. And those drivers should probably be on a race course, by the way. Back to the text.

By the way, there is no entirely satisfactory way to write a portable function that accepts a variable number of arguments, because there is no portable way for the called function to determine how many arguments were actually passed in a given call. Thus you can't write a truly portable function that will compute the maximum of an arbitrary number of arguments, as will the MAX functions that are built in to Fortran and PL/I.

It is generally safe to deal with a variable number of arguments if the called function doesn't use an argument that was not actually supplied, and if the types are consistent. printf, the most common C function with a variable
number of arguments, uses information from the first argument, which is the formatting string, to determine how many other arguments are present and what their types are. It fails badly if the caller does not supply enough arguments, or if the types are not what the first argument says. It is also non-portable and must be modified for different programming environments. Alternatively, if the arguments are of known types, it is possible to mark the end of the argument list in some agreed-upon way, such as a special argument value (often zero) that stands for the end of the arguments.

Interestingly, modern languages like Python, PHP, and Java go to great lengths to make variable-length argument lists work predictably and portably. The syntax for variable-length argument lists in these languages can be a bit obtuse at times, but at least it's allowed, documented, reliable, and portable.

Section 4.4: External Variables.

A C program consists of a set of external objects, which are either variables or functions. The adjective external is used primarily in contrast to internal, which describes the arguments and automatic variables defined inside functions. External variables are defined outside any function and are thus potentially available to many functions. Functions themselves are always external, because C does not allow functions to be defined inside other functions. By default, external variables are also global, so that all references to such a variable by the same name, even from functions that are compiled separately, are references to the same thing. In this sense, external variables are analogous to Fortran COMMON or PL/I EXTERNAL. We will see later how to define external variables and functions that are not globally available, but are instead visible only within a single source file.

Because external variables are globally accessible, they provide an alternative to function arguments and return values for communicating data between functions. Any function may access an external variable by referring to it by name, if the name has been declared somehow.

If a large number of variables must be shared among functions, external variables are more convenient and efficient than long argument lists. As pointed out in Chapter 1, however, this reasoning should be applied with some caution, for it can have a bad effect on program structure and lead to programs with many data connections between functions.

A second reason for using external variables concerns initialization. In particular, external arrays may be initialized, but automatic (i.e., internal) arrays may not. We will treat initialization near the end of this chapter.

The third reason for using external variables is their scope and lifetime. Automatic variables are internal to a function; they come into existence when the routine is entered and disappear when it is left. External variables, on the other hand, are permanent. They do not come and go, so they retain values from one function invocation to the next. Thus if two functions must share some data, yet neither calls the other, it is often most convenient if the shared data is kept in external variables rather than passed in and out via arguments.

Let's examine this issue further with a larger example. The problem is to write another calculator program, better than the previous one. This one permits +, -, *, and /, and = prints the answer. Because it is somewhat easier to implement, the calculator will use reverse Polish notation instead of infix notation. Reverse Polish notation is the scheme used by, for example, Hewlett-Packard pocket calculators. In reverse Polish notation, each operator follows its operands; an infix expression like

    (1 - 2) * (4 + 5) =

is entered as

    1 2 - 4 5 + * =

Parentheses are not needed in reverse Polish notation.

The implementation is quite simple. Each operand is pushed onto a stack. When an operator arrives, the proper number of
operands (two for binary operators) are popped, the operator is applied to them, and the result is pushed back onto the stack. In the example above, 1 and 2 are pushed, then replaced by their difference, -1. Next, 4 and 5 are pushed and then replaced by their sum, 9. The product of -1 and 9, which is -9, replaces them on the stack. The = operator then prints the top element without removing it, so intermediate steps in a calculation can be checked.

The operations of pushing and popping a stack are trivial, but by the time error detection and recovery are added, they are long enough that it is better to put each in a separate function than to repeat the code throughout the whole program. And there should be a separate function for fetching the next input operator or operand. Thus the overall structure of the program is:

    while (next operator or operand is not end of file)
        if (number)
            push it
        else if (operator)
            pop operands
            do operation
            push result
        else
            error

The main design decision that has not yet been discussed is where the stack is, that is, what routines access it directly. One possibility is to keep it in main and pass the stack and the current stack position to the routines that push and pop it. But main doesn't need to know about the variables that control the stack; it should only think about pushing and popping. So we have decided to make the stack and its associated information external variables, accessible to push and pop but not to main.

Translating this outline to code is easy enough. The main program is primarily a big switch on the type of operator or operand; this is a more typical use of switch than the one shown in Chapter 3. This sample code is from page 74 of the textbook.

    #include <stdio.h>
    #define MAXOP 20
    #define NUMBER '0'    /* signal that we found a number */
    #define TOOBIG '9'    /* signal that the string is too big */

    main()
    {
        int type;
        char s[MAXOP];
        double op2, atof(), pop(), push();

        while ((type = getop(s, MAXOP)) != EOF)
            switch (type) {
            case NUMBER:    /* NUMBER is a predefined constant above */
                push(atof(s));
                break;
            case '+':
                push(pop() + pop());
                break;
            case '*':       /* multiplication */
                push(pop() * pop());
                break;
            case '-':       /* subtraction */
                op2 = pop();
                push(pop() - op2);
                break;
            case '/':
                op2 = pop();
                if (op2 != 0.0)
                    push(pop() / op2);
                else
                    printf("zero divisor popped\n");
                break;
            case '=':       /* pop it, push it, print the value push returns */
                printf("\t%f\n", push(pop()));
                break;
            case 'c':
                clear();
                break;
            case TOOBIG:
                printf("%.20s ... is too long\n", s);
                break;
            default:
                printf("unknown command %c\n", type);
                break;
            }
    }

So now we're going to have a separate file that has some of these functions defined. This file is on page 75 of the textbook. It will be compiled separately and later linked together with the main program. We're going to define push, pop, and clear in this file.

    #include <stdio.h>
    #define MAXVAL 100    /* maximum depth of our stack */

    /* We are now declaring variables outside of any function. These are
       the external variables: we can use them in any function, and there
       is just one copy no matter which function we're in. */
    int sp = 0;
    double val[MAXVAL];

    double push(f)
    double f;
    {
        if (sp < MAXVAL)
            return(val[sp++] = f);
        else {
            printf("error: stack full\n");
            clear();
            return(0);
        }
    }

    double pop()
    {
        if (sp > 0)
            return(val[--sp]);
        else {
            printf("error: stack empty\n");
            clear();
            return(0);
        }
    }

    clear()
    {
        sp = 0;
    }

I would note: just read this one carefully. They're very good at using side-effect operators and side-effect assignments to keep this code very simple and succinct, and you really have to understand a lot of the other material you've covered in the book up to this point. Back to the text.

The command c clears the stack, with a function clear which can also be used by push and pop in case of error. We'll return to getop in a moment.

As discussed in Chapter 1, a variable is external if it is defined outside the body of any function. Thus the stack and stack pointer, which must be shared by push, pop, and clear, are defined outside the three functions. But main itself does not refer to the stack or stack pointer; their representation is carefully hidden. Thus the code for the = operator must use

    push(pop());

to examine the top of the stack without disturbing it.

Notice also that because + and * are commutative operators, the order in which the popped operands are combined is irrelevant, but for the - and / operators, the left and right operands must be distinguished.

The example code above shows why it's important to remember the K&R C rearrangement license as it applies to operators that are associative and commutative. If the code for the minus operator were written

    push(pop() - pop());

there is no guarantee that the left pop will run before the right pop. And since these functions access global variables and have
side effects, it is important to force the compiler not to rearrange the order of the function calls. To force the evaluation order, the code is broken into two statements:

    op2 = pop();
    push(pop() - op2);

Now, you might think that the lesson here is that the K&R C rearrangement license, which was done to allow optimization and performance, is a bad idea. But the more important lesson is that writing low-level utility functions like push and pop that use global variables and have side effects is a dangerous pattern in any programming language.

Section 4.5: Scope Rules.

The functions and external variables that make up a C program need not all be compiled at the same time. The source text of the program may be kept in several files, and previously compiled routines may be loaded from libraries. The two questions of interest are: How are declarations written so that variables are properly declared during compilation? And how are declarations set up so that all the pieces will be properly linked, or connected, when the program is loaded?

The scope of a name is the part of the program over which the name is defined. For an automatic variable declared at the beginning of a function, the scope is the function in which the name is declared, and variables of the same name in different functions are unrelated. The same is true of the arguments of the function.

The scope of an external variable lasts from the point at which it is declared in a source file to the end of that file. For example, if val, sp, push, pop, and clear are defined in one file, in the order shown above, that is,

    int sp = 0;
    double val[MAXVAL];

    double push(f) { ... }

    double pop() { ... }

    clear() { ... }

then the variables val and sp
may be used in push, pop, and clear simply by naming them; no further declarations are needed.

On the other hand, if an external variable is to be referenced before it is defined, or if it is defined in a different source file from the one in which it's being used, then an extern declaration is mandatory.

It is very important to distinguish between the declaration of an external variable and its definition. A declaration announces the properties of the variable (its type, its size, etc.); a definition also causes storage to be allocated. If the lines

    int sp;
    double val[MAXVAL];

appear outside any function, they define the external variables sp and val, cause storage to be allocated, and also serve as the declaration for the rest of that source file. On the other hand, the lines

    extern int sp;
    extern double val[];

declare for the rest of the source file that sp is an int and val is a double array (whose size is determined and allocated elsewhere), but they do not create the variables or allocate storage for them.

There must be only one definition of an external variable among all the files that make up the source program; other files may contain extern declarations to access it. There may also be an extern declaration in the file containing the definition. Any initialization of an external variable goes only with the definition. Array sizes must be specified with the definition, but are optional with the extern declaration.

Although it is not a likely organization for this program, val and sp could be defined and initialized in one file, and the functions push, pop, and clear defined in another. Then these definitions and declarations would be necessary to tie them together. In file 1 we would see:

    int sp = 0;
    double val[MAXVAL];

And then in file 2:

    extern int sp;
    extern double val[];
    double push(f) { ... }

    double pop() { ... }

    clear() { ... }

Because the extern declarations in file 2 lie ahead of and outside the three functions, they apply to all; one set of declarations suffices for all of file 2.

For larger programs, the #include file inclusion facility, discussed later in this chapter, allows one to keep only a single copy of the extern declarations for the program and have that inserted in each source file as it's being compiled.

Let us now turn to the implementation of getop, the function that fetches the next operator or operand. The basic task is easy: skip blanks, tabs, and newlines. If the character is not a digit or a decimal point, return it. Otherwise, collect a string of digits (that might include a decimal point) and return NUMBER, a signal that a number has been collected.

The routine is substantially complicated by an attempt to handle the situation properly when an input number is too long. getop reads digits, perhaps with an intervening decimal point, until it doesn't see any more, but only stores the ones that fit. If there was no overflow, it returns NUMBER and the string of digits. If the number was too long, however, getop discards the rest of the input line, so the user can simply retype the line from the point of error; it returns TOOBIG as the overflow signal.

This example code is from page 78 of the textbook, and you can view it at www.cc.com code page 78.

    getop(s, lim)
    char s[];
    int lim;
    {
        int i, c;

        while ((c = getch()) == ' ' || c == '\t' || c == '\n')
            ;   /* skip all the blanks */
        if (c != '.' && (c < '0' || c > '9'))
            return(c);
        s[0] = c;
        for (i = 1; (c = getchar()) >= '0' && c <= '9'; i++)
            if (i < lim)
                s[i] = c;
        if (c == '.') {    /* we begin to collect the fraction */
            if (i < lim)
                s[i] = c;
            for (i++; (c = getchar()) >= '0' && c <= '9'; i++)
                if (i < lim)
                    s[i] = c;
        }
        if (i < lim) {     /* this means the number is good */
            ungetch(c);
            s[i] = '\0';
            return(NUMBER);   /* NUMBER is a predefined constant */
        } else {           /* too big: skip to the end of the line */
            while (c != '\n' && c != EOF)
                c = getchar();
            s[lim-1] = '\0';
            return(TOOBIG);
        }
    }

Recall that TOOBIG is a constant that indicates that we've read too much. Back to the text.

What are getch and ungetch? Well, it is often the case that a program reading input cannot determine that it has read enough until it has read too much. One instance is collecting the characters that make up a number: until the first non-digit is seen, the number is not complete. But then the program has read one character too far, a character that it is not prepared for.

The problem would be solved if it
were possible to unread the unwanted character. Then, every time the program reads one character too many, it could push it back on the input, so that the rest of the code would behave as if it had never been read. Fortunately, it is easy to simulate ungetting a character by writing a pair of cooperating functions. getch delivers the next input character to be considered; ungetch puts a character back on the input, so the next call to getch will return it again.

How they work together is simple. ungetch puts the pushed-back characters into a shared buffer, a character array. getch reads from the buffer if there's anything there; it calls getchar if the buffer is empty. There must also be an index variable which records the position of the current character in the buffer.

Since the buffer and the index are shared by getch and ungetch and must retain their values between calls, they must be external to both routines. Thus we can write getch and ungetch and their shared variables as follows. This is on page 79 of the textbook; we can see the code at www.cc.com code and go to page 79.

    #include <stdio.h>
    #define BUFSIZE 100

    /* those are the external variables, outside any function */
    char buf[BUFSIZE];
    int bufp = 0;

    getch()
    {
        return((bufp > 0) ? buf[--bufp] : getchar());
    }

    ungetch(c)    /* pushes a character back on input */
    int c;
    {
        if (bufp > BUFSIZE)
            printf("ungetch: too many characters\n");
        else
            buf[bufp++] = c;
    }

We have used an array for the pushback, rather than a single character, since the generality may come in handy later.

Section 4.6: Static Variables.

Static variables are a third class of storage, in addition to the extern and automatic that we have already met.

Static variables may be either internal or external. Internal static variables are local to a particular function just as automatic variables are, but unlike automatics, they remain in existence rather than coming and going each time the function is activated. This means that internal static variables provide private, permanent storage in a function. Character strings that appear within a function, such as the arguments of printf, are internal static.

An external static variable is known within the remainder of the source file in which it is declared, but not in any other file. External static thus provides a way to hide names like buf and bufp in the getch-ungetch combination, which must be external so they can be shared, yet which should not be visible to users of getch and ungetch, so there is no possibility of conflict. If the two routines are compiled in one file, as in

    static char buf[BUFSIZE];
    static int bufp = 0;

    getch() { ... }

    ungetch(c) { ... }

then no other routine will be able to access buf and bufp; in fact, they will not conflict with the same names in other files of the same program.

Static storage, whether internal or external, is specified by prefixing the normal declaration with the word static. The variable is external if it is defined outside of any function, and internal if defined inside a function.

Normally, functions are external objects; their names are known globally. It is possible, however, for a function to be declared static; this makes its name unknown outside the file in
which it is declared. In C, static connotes not only permanence but also a degree of what might be called privacy. Internal static objects are known only inside one function; external static objects (variables or functions) are known only within the source file in which they appear, and their names do not interfere with variables or functions of the same name in other files. External static variables and functions provide a way to conceal data objects and any internal routines that manipulate them, so that other routines and data cannot conflict, even inadvertently. For example, getch and ungetch form a module for character input and pushback; buf and bufp should be static so they are inaccessible from the outside. In the same way, push, pop, and clear form a module for stack manipulation; val and sp should also be external static.

Section 4.7, register variables. The fourth and final storage class is called register. A register declaration advises the compiler that the variable in question will be heavily used. When possible, register variables are placed in machine registers, which may result in smaller and faster programs. The register declaration looks like

    register int x;
    register char c;

and so on; the int part may be omitted. Register can only be applied to automatic variables and to the formal parameters of a function. In this latter case, the function declaration looks like

    f(c, n)
    register int c, n;
    {
        register int i;
        ...
    }

In practice, there are some restrictions on register variables, reflecting the realities of the underlying hardware. Only a few variables in each function may be kept in registers, and only certain types are allowed. The word register is ignored for excess or disallowed declarations. And it is not possible to take the address of a register variable, a topic that will be covered in chapter 5. The specific restrictions vary from machine to machine. As an example, on the PDP-11
only the first three register declarations in a function are effective, and the types must be int, char, or pointer.

As a quick aside, the description of the details of the implementation of the register modifier on the PDP-11 is a delightful peek into how the C compiler generated runtime code on that particular system in the 1970s. As compilers became more sophisticated, the compiler could decide which variables to keep in registers far better than the programmer could. And since how variables are allocated to registers might be different on different hardware architectures, the register indication is generally ignored by modern C compilers, so you should probably never use it in your code.

As a matter of fact, I wrote the following sample C program and compiled it with the -S option so I could see the generated assembly language with and without the register declaration. With optimization, there was no difference between the code generated with or without the register declaration. The reason the generated assembly code was identical, regardless of the use of the register keyword, was that the C optimizer on my ARM-based computer in 2022 realized the best way to implement the code was to keep both of the variables in registers, because the loop code was so simple and the CPU in my computer has plenty of registers, and it optimized any loading and storing of the data for these variables right out of the program.

In 1978, the authors likely included register as a feature to convince experienced assembly language programmers that they should write all but the lowest-level code in C: write a tiny bit in assembly language and write everything else in C.

So here is an example that is not in the textbook; it would be on page 81 if it were in the textbook. You can see this code at www.cc4e.com, code, page 81. This is code that I wrote to play with the register keyword, to
mostly convince myself it was pointless to use it. But here we go:

    #include <stdio.h>

    int main() {
        int compute;
        register int iter;

        scanf("%d", &compute);
        printf("compute %d\n", compute);
        for (iter = 0; iter < 1000; iter++) {
            compute = (compute * 22) * 7;
            if (compute > 1000) compute = compute % 1000;
        }
        printf("compute %d\n", compute);
    }

I wrote this code in a way that tries to convince the optimizer that I am actually going to use these values. That is why I read the value from input rather than using a constant: the optimizer is so smart that it would otherwise just eliminate all the constant calculations. So that is my sample for playing with register.

Section 4.8, block structure. C is not a block-structured language in the sense of PL/1 or Algol, in that functions may not be defined within other functions. On the other hand, variables can be defined in a block-structured fashion. Declarations of variables, including initializations, may follow the left brace that introduces any compound statement, not just the one that begins a function. Variables declared this way supersede any identically named variables in outer blocks, and remain in existence until the matching right brace. For example,

    if (n > 0) {
        int i;  /* declare a new i */
        for (i = 0; i < n; i++)
            ...
    }

the scope of the
variable i is the true branch of the if; this i is unrelated to any other i in the program. Block structure also applies to external variables. Given the declarations

    int x;

    f()
    {
        double x;
        ...
    }

then within the function f, occurrences of x refer to the internal double variable; outside of f, they refer to the external integer. The same is true of the names of formal parameters. For example,

    int z;

    f(z)
    double z;
    {
        ...
    }

Within the function f, z refers to the formal parameter, not the external variable.

Section 4.9, initialization. Initialization has been mentioned in passing many times so far, but always peripherally to some other topic. This section summarizes some of the rules, now that we have discussed the various storage classes.

In the absence of explicit initialization, external and static variables are guaranteed to be initialized to zero; automatic and register variables have undefined (that is, garbage) values. Simple variables (not arrays or structures) may be initialized when they are declared, by following the name with an equals sign and a constant expression:

    int x = 1;
    char squote = '\'';  /* a character constant that is a single quote itself */
    long day = 60 * 24;  /* minutes in a day */

For external and static variables, the initialization is done once, conceptually at compile time. For automatic and register variables, it is done each time the function or block is entered.

For automatic and register variables, the initializer is not restricted to being a constant: it may in fact be any valid expression involving previously defined values, even function calls. For example, the initializations of the binary search program that we wrote in chapter 3 could be written as

    binary(x, v, n)
    int x, v[], n;
    {
        int low = 0;
        int high = n - 1;
        int mid;
        ...
    }

instead of initializing these as the first executable statements, as we would do with

    binary(x, v, n)
    int x, v[], n;
    {
        int low, mid, high;

        low = 0;
        high = n - 1;
        ...
    }

In effect, initializations of automatic variables are just shorthand for assignment statements. Which form to prefer is largely a matter of taste. We have generally used explicit assignments, because initializers in declarations are harder to see.

Automatic arrays may not be initialized. External and static arrays may be initialized by following the declaration with a list of initializers enclosed in braces and separated by commas. For example, the character counting program of chapter 1, which originally was

    main()
    {
        int c, i, nwhite, nother;
        int ndigit[10];

        nwhite = nother = 0;
        for (i = 0; i < 10; i++)
            ndigit[i] = 0;
        ...
    }

could be written instead using initializers as follows:

    int nwhite = 0;
    int nother = 0;
    int ndigit[10] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };

    main()
    {
        int c, i;
        ...
    }

That is ten zeros in a row, separated by commas and enclosed in braces. The idea is that, with the initializers on the external
variables outside of the main function, you do not need to initialize them with a for loop at the beginning of the main program. These initializations are actually all unnecessary, since they are all zero anyway, but it is good form to make them explicit. If there are fewer initializers than the specified size, the others will be zero. It is an error to have too many initializers. Regrettably, there is no way to specify the repetition of an initializer, nor to initialize an element in the middle of an array without supplying all the intervening values as well.

Character arrays are a special case of initialization; a string may be used instead of the braces and commas notation, as in

    char pattern[] = "the";

This is a shorthand for the longer but equivalent

    char pattern[] = { 't', 'h', 'e', '\0' };

When the size of an array of any type is omitted, the compiler will compute the length of the array by counting the initializers. In this specific case, the size of pattern is four: three actual characters plus the terminating '\0'.

I would note that the primary difference between C and C-influenced languages like Java, PHP, and JavaScript is that C strings are character arrays, while in the other languages strings are objects. These string objects do have inside themselves an array of bytes or characters, but they also keep track of things like the length of the string, and they provide functionality like extracting a substring through methods on those objects. In C, there is a set of library functions that perform string operations like comparing two strings, while string comparison is built into the string objects in each of the other languages. Strings as character arrays
allow programmers to build very fast low-level code in libraries and operating systems, but to write the code well, you need to understand what is really going on at the low level.

Section 4.10, recursion. C functions may be used recursively; that is, a function may call itself either directly or indirectly. One traditional example involves printing a number as a character string. As we mentioned before, the digits are generated in the wrong order: low-order digits are available before high-order digits, but they have to be printed the other way around.

There are two solutions to this problem. One is to store the digits in an array as they are generated, and then print them in reverse order, as we did in chapter 3 with itoa. The first version of printd follows this pattern. This is sample code on page 85 of the textbook; you can view the sample code at www.cc4e.com, code, page 85, example one.

    #include <stdio.h>

    printd(n)
    int n;
    {
        char s[10];
        int i;

        if (n < 0) {
            putchar('-');
            n = -n;
        }
        i = 0;
        do {
            s[i++] = n % 10 + '0';  /* get the next character */
        } while ((n /= 10) > 0);
        while (--i >= 0)            /* print the digits in reverse */
            putchar(s[i]);
    }

The alternative is a recursive solution, in which each call of printd first calls itself to cope with any leading digits, and then prints the trailing digit after that call returns. This is the example on page 85 of the textbook, example two.

    #include <stdio.h>

    printd(n)
    int n;
    {
        int i;

        if (n < 0) {
            putchar('-');
            n = -n;
        }
        if ((i = n / 10) != 0)
            printd(i);           /* the recursive call */
        putchar(n % 10 + '0');   /* runs after the recursive call comes back */
    }

When a function calls itself recursively, each invocation gets a fresh set of all the automatic variables, quite independent of the previous set. Thus in printd(123), the first printd has n equal to 123. It passes 12 to a second printd, then prints 3 when that one returns. In the same way, the second printd passes 1 to a third, which prints it, and then prints 2.

Recursion generally provides no saving in storage, since somewhere a stack of the values being processed has to be maintained. Nor will it be faster. But recursive code is more compact, and often much easier to write and understand. Recursion is especially convenient for recursively defined data structures like trees; we will see a nice example in chapter 6.

As an aside, recursion is a beloved concept in computer science. It is often taught early in most programming courses because it is just so cool. Most examples, sadly, are like computing factorial or the example above, converting an integer to a string, and they are actually not good uses of recursion. But when you do finally find yourself in need of traversing a tree-based structure like an XML document, or parsing a mathematical expression with parentheses, recursion is the ideal solution. So the problem, in a sense, is not recursion, but when it is taught and what examples are used. Interestingly, Kernighan and Ritchie include the correct warning about using recursion when it is not the best solution in the above text, and it bears another read. Back to the book: recursion
generally provides no saving in storage, since somewhere a stack of the values being processed has to be maintained. Nor will it be faster. But recursive code is more compact, and often much easier to write and understand. Recursion is especially convenient for recursively defined data structures like trees; we will see a nice example in chapter 6. I couldn't have said it better.

Section 4.11, the C preprocessor. C provides certain language extensions by means of a simple macro preprocessor. The #define capability which we have used is the most common of these extensions; another is the ability to include the entire contents of other files during compilation.

File inclusion. To facilitate handling of #defines and declarations, among other things, C provides a file inclusion feature. Any line that looks like

    #include "filename"

is replaced by the contents of the file filename. The quotes are indeed mandatory. Often a line or two of this form appears at the very beginning of each source file, to include common #define statements and extern declarations for global variables. #includes may be nested.

#include is the preferred way to tie declarations together for a large program. It guarantees that all the source files will be supplied with the same definitions and variable declarations, and thus eliminates a particularly nasty kind of bug. Of course, when an include file is changed, all the files that depend on it must be recompiled.

Macro substitution. A definition of the form

    #define YES 1

calls for a macro substitution of the simplest kind: replacing a logical name by a string of characters. Names in a #define have the same form as C identifiers; the replacement text is arbitrary. Normally the replacement text is the rest of the line; a long definition may be continued by placing a backslash at the end of the line to be continued. The scope of a name defined with #define is from its
point of definition to the end of the source file. Names may be redefined, and a definition may use previous definitions. Substitutions do not take place within quoted strings, so, for example, if YES is a defined name, there would be no substitution in printf("YES").

Since the implementation of #define is a macro prepass, not part of the compiler proper, there are very few grammatical restrictions on what can be defined. For example, Algol fans can say

    #define then
    #define begin {
    #define end ;}

and then write approximate Algol:

    if (i > 0) then
        begin
            a = 1;
            b = 2
        end

It is also possible to define macros with arguments, so the replacement text depends on the way the macro is called. As an example, define a macro called max like this:

    #define max(A, B) ((A) > (B) ? (A) : (B))

Now the line

    x = max(p+q, r+s);

will be replaced in the preprocessor by the line

    x = ((p+q) > (r+s) ? (p+q) : (r+s));

This provides a maximum function that expands into inline code rather than a function call. So long as the arguments are treated consistently, this macro will serve for any data type; there is no need for different kinds of max for different data types, as there would be with functions.

Of course, if you examine the expansion of max above, you will notice some pitfalls. The expressions are evaluated twice; this is bad if they involve side effects like function calls, increment operators, or perhaps the push and pop we have used before. Some care has to be taken with parentheses to make sure the order of evaluation is preserved. Consider the
macro

    #define square(x) x * x

when it is invoked as square(z+1). There are even some purely lexical problems: there can be no space between the macro name and the left parenthesis that introduces its argument list.

Nevertheless, macros are quite valuable. One practical example is the standard I/O library to be described in chapter 7, in which getchar and putchar are defined as macros (obviously putchar needs an argument), thus avoiding the overhead of a function call per character processed. Other capabilities of the macro processor are described in Appendix A.

As a bit of a long aside, since in this section we are talking about the preprocessor, it is probably a good time to talk a bit about why we use this terminology. For those of you with a computer science degree from back in the day, many of you wrote a compiler as a senior project, just like I did. Building a compiler was a great project because part of the goal of computer science is to understand the technologies that make programming possible, from the language syntax down to the hardware. The compiler that translates our source code into machine code is an essential part of the technology stack that we use.

Early compilers, for languages like the early Fortran variants, tended to be translators: they just translated code one line at a time from a high-level language to assembly language. You could think of early Fortran programs in the 1950s and 1960s as just a more convenient way to write assembly language for programmers who knew assembly language. You always needed to be aware of assembly language, and of the translation that was going on, to write fast Fortran. Programs were small, and optimization was done at the Fortran level, often leading to some hard-to-understand code.

By the mid-1970s, programming languages were based on parsing theory, and we used what is called a grammar to define the language. Kernighan and Ritchie kept I/O statements out of the C language
to keep its formal definition, that is, its grammar, as simple as possible. As these new languages emerged, they allowed for a more theoretical and powerful approach to converting source code to machine language.

The theoretical advances in compiler and language design meant that parts of the compiler might be reusable across multiple programming languages. Each language could have its own syntax and grammar rules, and they could be plugged into the compiler, and poof, you would have a new programming language. It got to the point where Unix systems had a tool called yacc, which stood for "yet another compiler compiler". You would give it a grammar for your new language and it would make a compiler for you. As a matter of fact, the JavaScript language, which was created in 10 days back in 1995, was possible because Brendan Eich had a lot of experience with compiler generators. He defined a grammar for JavaScript and generated his first compiler.

Part of what made a compiler generator possible is the idea of a multi-step compiler, where the tasks of a compiler were broken down into a series of simpler and more well-defined steps. Here are the steps of a typical C compiler in the 1970s.

First, a preprocessor step that takes code with syntax like #define and #include as its input and produces raw C code as output, with those instructions processed and expanded. The preprocessor was a C-to-C transformation.

Next, a parser step that took the raw C code, applied the grammar of the language, and created what is called a parse tree. Think of the tree as a hierarchy of statements, grouped into blocks, grouped into functions, and so on. Things like a loop were just one node in the parse tree.

After that, a code generation step that would turn the parse tree into some kind of simplistic, portable internal code that expanded things like loops and if and else statements into code.

After that, a code optimization step that looked at the internal code and moved things around, eliminating any redundant computations, say,
don't compute the same thing twice. This step is why the authors make such a big fuss about how there are times where C might do things in a slightly different order in an expression, even in the presence of parentheses. Remember the K&R C rearrangement license back in chapter 2? That rule removes constraints on the compiler's optimization step so it can generate the most efficient code.

I would note that all the steps up to this point did not depend in any way on the actual machine language of the system they were running on. This meant a preprocessor, parser, code generator, and code optimizer could literally be written in C and used on any architecture.

The final step is a code generator that takes the optimized intermediate code and generates the actual assembly and machine language for the processor. For fun, you can add the -S parameter to your C compiler and see the resulting assembly language output for your system. If you look at the machine language generated on an Intel or AMD processor and compare it to the machine language on an ARM processor, it will look very different.

Because all but the final compiler step did not depend on the computer where the program is being run, you could actually create a C compiler on a new computer architecture by writing a code generator for the new computer, then running all but the last step of the compiler on one computer, then copying the internal code generated by the compiler to the new computer, and running the code generation step on the new computer. Then you actually have a working C compiler on the new computer, and the first step is usually to recompile the C compiler itself from source code to produce a fully native C compiler on the new computer that can compile all the rest of the C code you have, including possibly the mostly portable elements of the Unix operating system, on the new computer.

Yes, describing how to cross-compile and bootstrap a C compiler onto a new computer hardware architecture can give you a
headache if you think about it too much. But this notion of bootstrapping a C compiler onto a new architecture was an important technique for moving C and Unix to a wide range of very different computer architectures. We see this in action as the Unix-like macOS operating system, over the past 20 years, was delivered initially on Motorola 68000 family processors, then on PowerPC processors, then on Intel processors, and most recently on ARM-based processors built by Apple. Using the software portability patterns that come from C and Unix and are described by Kernighan and Ritchie in this book, Apple now makes their own hardware that can be tuned and evolved over time as their operating system and application requirements dictate.

The use of a grammar to define a programming language, by the way, is one of the reasons that syntax errors are so obtuse. The compiler is not looking at your code like a human. It is following a very simple set of rules to parse your code, and when it is stuck with something illogical it gives you a message like "unexpected statement, block, or constant on line 17", and the error is nowhere near line 17.

Modern compilers are more sophisticated, of course, than the steps above, but these steps give you a sense that the compiler does many things so that your code can actually run very efficiently. And given that Kernighan and Ritchie were building a programming language, C, a mostly portable operating system written in C, Unix, and a mostly portable C compiler written in C, some of their innovative work and research into compiler design finds its way into this book. So we have a section in this chapter called the C preprocessor.

So here we are at the end of chapter 4, and it is a good time to talk about the word "address". Up to this point in the book, if you count them, the word address has been used 10 times without a precise definition, beyond the notion that data is stored in memory and the address of the data is where the data is
stored in memory. In the next chapter, this notion of the address where the data is stored becomes very real and tangible as we explore pointers, as well as the ampersand (&) and asterisk (*) operators.

Up to now, an experienced JavaScript, PHP, or Java programmer can view C as just another set of similar syntax rules with a few quirky runtime bits. But in the next chapter, we will deeply explore the concept of data allocation and location. It turns out that every programming language pays a lot of attention to data allocation and location, but the runtime environments of modern languages work very hard not to expose you to those details. Just because modern languages hide the difficult bits from us, it does not mean that those languages solve the problem using magic. Eventually the problem needs to be solved, and that is why the compiler and low-level runtime elements of languages like PHP, JavaScript, and Java are usually written in C, so the builders of those languages can solve the difficult data storage and allocation problems for you.

This work is based on the 1978 C programming book written by Brian W. Kernighan and Dennis M. Ritchie. Their book is copyright, all rights reserved, by AT&T, but is used in this work under fair use because of the book's historical and scholarly significance, its lack of availability, and the lack of an accessible version of the book. The book is augmented in places to help understand its right place in a historical context, amidst the major changes of the 1970s and 1980s, as computer science evolved from a hardware-first, vendor-centered approach to a software-centered approach where portable operating systems and applications written in C could run on any hardware. This is not the ideal book to learn C programming, because the 1978 edition does not reflect the modern C language. Using an obsolete book gives us an opportunity to take students back in time and understand how the C language was evolving as it laid the groundwork for a future with portable applications.
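Before moving on to chapter 5, here is a small sketch, not from the book, that pulls together three points from chapter 4: an internal static variable keeps its value between calls, the unparenthesized square macro from section 4.11 expands in a surprising way, and the max macro can evaluate an argument twice. The names next_id, bump, calls, and SQUARE_SAFE are made up for illustration, and the code is written in modern C with prototypes rather than the 1978 style used in the book.

```c
/* The unparenthesized macro from section 4.11:
   square(z + 1) expands to z + 1 * z + 1. */
#define square(x) x * x

/* A parenthesized variant (hypothetical name, not from the book). */
#define SQUARE_SAFE(x) ((x) * (x))

/* max as defined in section 4.11; the larger argument is evaluated twice. */
#define max(A, B) ((A) > (B) ? (A) : (B))

/* Hypothetical helper: counter is an internal static variable, so it is
   initialized once and keeps its value from one call to the next. */
int next_id(void)
{
    static int counter = 0;
    return ++counter;
}

/* External static: known only within this source file. */
static int calls = 0;

/* Hypothetical helper with a side effect: counts how often it is called. */
int bump(int v)
{
    calls++;
    return v;
}
```

Under these assumptions, calling next_id() twice gives 1 and then 2. With z equal to 3, square(z + 1) gives 7 rather than 16, while SQUARE_SAFE(z + 1) gives 16. And max(bump(5), bump(2)) returns 5 but leaves calls at 3, because bump(5) runs once for the comparison and once more to produce the result.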
[Music]

Hello, and welcome to our lecture on Kernighan and Ritchie chapter 5, putting some context around it. Chapter 5 is pointers and arrays. The first thing I want to call your attention to is section 5.1. I actually think that section 5.1 is the most poignant and beautiful section in the book. Everything you have learned up till now, everything we have talked about, the size of data and so on, has led to the point where you can read 5.1 and understand every word of it. You should enjoy reading it. I think of it as a love letter from the creators of C to future computer scientists. So 5.1 is important.

We will talk a little bit about pointer arithmetic. In 5.6 we will look at the duality between pointers and integers. Then we will hit call by reference and call by value, which are enabled in C by pointers, and then look at the biggest security hole that C has caused over the past 40-plus years: buffer overflow. The chapter gets a little dense in some of the sections, so I will just have you skim some of those sections.

This is the essential example of pointers. We have two variables, int x and y. We have a variable px, which is of type pointer; it points to an integer, and that is what int * means. We store 42 in x, and we store the address of x into px using the ampersand operator. Then we use the address of x, which is in px, with a lookup operator, or dereference operator: *px says go to the memory location pointed to by px, load me an integer, and put that into y. So when we print them out, we can see that x is 42, y is 42, and px is a long hexadecimal number that is some memory location inside the actual computer. So ampersand, asterisk, and int *, the star as a sort of modifier for a type, are the important things.

One of the things that you have probably never seen in Python is the id function. We have used functions like type and dir, and there are ways for us to inquire about variables and constants. id is a way to ask for the ID
of something. Now, just to be clear, there are multiple versions of Python. CPython is the classic one; it is the implementation of Python that happens to be written in C. There are other implementations of Python, so what I am telling you about this id function is something that will work for the moment in CPython, but not necessarily in every other one. If you print it out and ask what is x and what is the ID of x, it is kind of like the address. If you look at the documentation, it says don't think of this as the address: the Python id function is not intended to be dereferenceable, meaning we are not supposed to look up memory from it. The fact that it is based on the memory address is a CPython implementation detail that other Python implementations do not follow.

Now, if you download the source code k501 py, I actually have a completely unauthorized implementation of a lookup, a dereference, and it has to know the type of the thing that it is dereferencing: y equals DF of px, and it can then give me back the integer pointed to by the address. But this is not guaranteed to work, and it is not supposed to be how it works. It is just that things have addresses, and in CPython, at least for this particular version of Python that I am using, you can use that.

Pointers give us the ability to do call by reference. If you have done Python, you may know that I had a slide in an early version of my Python class that said, sorry, Python doesn't do call by reference, it only does call by value. That means that within a function, you change the parameters and nothing happens in the caller. But some languages do have call by reference, which means the parameters that come into a function are somehow handles that allow us to actually change the values in the main program, or wherever we have been called from. So the languages Pascal, C, C++, PHP, and C# have this formal notion of call by reference, and languages that don't
have it are like Python and Java and JavaScript. Now, there is a nuance here: what I said is for simple types like integers. Objects are passed in, but then if you call methods on those objects you can actually change the data that the object has. But it's not like you're changing the object; you're changing the object's data. So let's take a look at a bit of code. The first example is actually Pascal. Pascal is a programming language that was created by Niklaus Wirth in Switzerland in 1970, and it had call by reference through this notion of var. So you're creating a function named func that takes two parameters: one is call by value, which is a, and the other one is call by reference, which is b. We set a and b to two new numbers, and then in the main program we set x = 42 and y = 43, and then we call the function, and you'll notice there's no extra syntax in the function. Then we come back and you will see that the y variable is changed and the x variable is not. Then in the C version of this, we have x = 42 and y = 43, and when we call func we say we're going to pass in x and then &y, which is the address of y. If you go back to the very first example in section 5.1, we're passing in a number, which we're actually passing in by value, but the value is the address. Then inside the function we take a and a pointer to b, pb, and we say that a is just an integer and pb is the address of an integer, by adding the little asterisk there. So the address where it's at has been passed by value, but using that value we can dereference it and get to the thing. So we say a = 1 and we say *pb = 2; that says store 2 as an integer into the location pointed to by pb. And then when you come back, the second parameter will have been changed: y will have been changed and x will not be changed. Let's take a look at a few other languages. Here we have the C code again. Python, from 1989, doesn't have the notion of pass by
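The C version of the call described above can be sketched as follows. This is a minimal illustration, not the slide's exact code: the names func, a, pb, x, and y follow the lecture, while the assert guard and the helper show_call are hypothetical additions.

```c
#include <assert.h>
#include <stdio.h>

/* a arrives by value; pb is an address, and that address also arrives by value. */
void func(int a, int *pb)
{
    assert(pb != NULL);   /* added guard, not part of the lecture example */
    a = 1;                /* changes only func's private copy of a */
    *pb = 2;              /* stores 2 into the int that pb points to */
    (void)a;              /* silences "set but not used" warnings */
}

/* Hypothetical helper: runs the call from the lecture and prints the result. */
void show_call(void)
{
    int x = 42, y = 43;

    func(x, &y);
    printf("x = %d, y = %d\n", x, y);
}
```

Calling show_call() prints x = 42, y = 2: the caller's x is untouched, while y is changed through the address that was passed in.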
reference, and so one of the things that I think is an excellent compromise in Python is the notion of returning a tuple: not just a single value, but a tuple return. That way, if we really wanted to get back more than one value, we could return a tuple, and then in the main program we assign the tuple. So if we really wanted x and y to change from inside the function, we could do so by just explicitly saying the function is going to return two values and we're going to change them both. And if we look at PHP, which is 1994, we see a very elegant approach, I think. Now, whenever you look at PHP you've got to realize the dollar sign is just part of the variable name; that's just the first character of all variables in PHP. So what we do in the function's parameter list is we say &$b, which is the second parameter, $b, and we're expecting to change it. You'll note that we don't change the syntax inside the function, $b = 2, $a = 1; the syntax doesn't change. And when we make the call, func($x, $y), we don't change that either, and yet call by reference works. So if you look at all these examples, other than the weird dollar sign convention, I would say that the simplest and most elegant is probably the PHP implementation, because we don't have to do anything in the function except say I'm planning on changing this. Now C#, which is much later, 2000, has this notion of ref, which is somewhat of a throwback to Pascal, but it's also like the ampersand thing. The one thing I like about it is that with the C# func you have to kind of agree: the calling code, by saying ref x, is in a sense agreeing that it is aware that x is likely to be changed by that function. And so that's call by reference. Now, we're in a C class, and so ampersand and asterisk are how we do it. So again, it's really quite straightforward inside C code, as long as you are very good at
understanding what the asterisk and ampersand do in C.

Another important thing that's easily understood with a very simple bit of code is pointer arithmetic. The key to pointer arithmetic is that a pointer to an integer is different than a pointer to a character. Now, both of these pointers are the same size, because they are addresses and addresses are all the same size. But if you add one to a character pointer, that actually adds one to the address, and if you add one to an integer pointer, then it adds four, and that's because on this machine each integer takes four characters. So when you're doing increments and subtracts, etc., on pointers, it increments based on the type of the thing that's pointed to. So a pointer is not just a pointer; it's a pointer to a thing with a type, and when you're incrementing and decrementing, the type that's being pointed to is more important than the fact that it's a pointer. It goes up and goes down, but it doesn't always just go up and down by one.

Pointers are not integers. If you go back to chapter 2, there was a table from the book of the sizes of things. If you look, on the PDP-11 integers are 16 bits, on the Honeywell 6000 they're 36 bits, on the IBM 370 they're 32 bits, and on the Interdata 8/32 they're 32 bits. Now, I've added a line to this that tells the number of bits in addresses on these systems, and you can see, if you compare the int numbers to the address numbers, that in all the cases except the PDP-11 the integer is larger than the address, which means that there is extra space in the integer, and we can almost treat addresses as unsigned integers. Now, the PDP-11 is a little weird in that 16 to 32 is a range across the computers delivered over the years, and not all computers had full memory, and not all applications used the entire memory of the entire computer. So most of the time you could conveniently put an address into an integer and then get that address back out and not have truncated that address or messed it up. So treating
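The pointer-arithmetic point made above (adding one moves a char pointer by one byte and an int pointer by the size of an int) can be checked with a small sketch. The function names here are hypothetical, and the step for int is whatever sizeof(int) is on the machine at hand; four is typical today but is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes the address moves when 1 is added to a char pointer. */
size_t char_pointer_step(void)
{
    char chars[2];
    char *pc = chars;

    return (size_t)((char *)(pc + 1) - (char *)pc);
}

/* Number of bytes the address moves when 1 is added to an int pointer. */
size_t int_pointer_step(void)
{
    int ints[2];
    int *pi = ints;

    /* Casting to char * measures the distance in bytes, not in ints. */
    return (size_t)((char *)(pi + 1) - (char *)pi);
}

/* Sanity check: each step matches sizeof for the pointed-to type. */
void check_pointer_steps(void)
{
    assert(char_pointer_step() == sizeof(char));
    assert(int_pointer_step() == sizeof(int));
}
```

The same + 1 in the source produces different address changes because the compiler scales the increment by the size of the type the pointer points to.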
pointers as integers almost works, and the longer ago in history it was, the more likely it did work. Addresses are generally positive numbers that often start from zero. Sometimes heap numbers come down and sometimes stack numbers go up, or whatever, but most computers did not come with the maximum memory installed, and if you had a multi-user computer you didn't give all the system's memory to every application. We tended to use very little memory in applications; we were very careful about it. So we just never ran into the problem of our memory addresses not fitting into integers. So in the early '70s, applications could get away with having a function that returned an address return it as an integer and then copy it into a pointer without conversion. By the early '80s, the notion of a void pointer gave us a way to have a generic address, that is, a pointer to something where we don't know what type it is, because all addresses are addresses but what they point to is different. So if you take a look at the alloc function, which we'll play with a lot more in the next chapter, alloc says, give me 42 bytes and give it back to me as a pointer: give me a pointer to 42 new bytes that you just allocated. In the early '70s, alloc returned an int, but then we would cast it to whatever type we wanted. So alloc(42) would give us an address that would be an integer, but then we would cast it to an int *, which is a no-loss cast, and then we would store it. By the time of the 1978 K&R book, we tended to call it a char *, because the 42 is how many characters we're going to allocate, and then you would take the pointer to a character and cast it to a pointer to an integer. And so alloc of 42 would give us room for 10 four-byte integers, I think, if I've got my division right. But in modern C we have this void pointer, which basically says, look, alloc is going to return an address and you have to cast it to something. So alloc(42) returns a void *, which we cast to an
int *, which is a lossless cast and not something that's going to confuse the compiler, and then we store it in our int * variable. So everything you'll ever touch will be using void, but I just wanted to give you a little bit of the history of it and why void is not mentioned in this 1978 book.

Every time in a class I'm like, hey, it's time to learn about security, and everybody kind of groans, like, oh no. I've taught HTML injection and SQL injection and cross-site scripting in all my previous classes, and here's the classic XKCD where the mom has named their child with a bit of SQL and some single quotes and some comments, and that's all fun. It's important that we as software developers are aware of how the things that we build could be corrupted by those with evil intent. So it has come time to talk about that for C. Probably the single worst security hole in all of computing history, from 1950 to today, even before C was a thing, is what's called buffer overflow. It has to do with the fact that there is no sense that a string of characters has a length. It has an allocated length, but it doesn't have a runtime length, and so when we put more data into a string than the string can hold, it just keeps on storing beyond the end of the string; it doesn't push out and make a little more space. So this is from the Wikipedia page, where you have an eight-character string followed by a two-byte integer or something, and we copy in the string "excessive", which is nine characters plus the \0, the zero, and just by trying to write into the A string it overwrites the B variable as well. And so that's buffer overflow. It's like somehow we're going to push too much into this variable so that it extends beyond where it's been allocated, and that is never detected, and then it keeps going on. It means that you can do all kinds of things with buffer overflow: you
can change variables, you can turn on superuser permission, who knows. You've got to look at the source code and you've got to carefully construct a sort of nasty attack, but the attack vector is the fact that the bounds of string arrays are not checked when we're copying stuff in, and if you write bad code, or if the system writes bad code, it's just going to go wiping out memory. It turns out that probably the worst offender here is the gets function, and this was part of standard C for a long time. So here what I'm doing is I'm creating a 15-character string array, a character array which is 15 elements, and I'm calling gets. The problem with gets is that somebody else is going to give us that data, it's not us, and then I print it out. The first thing you see is that when I compile a bit of code that has gets, the compiler is upset. I have greatly simplified the errors; it comes up with three errors, and this is a subset of one of them. The compiler is telling you don't use gets, and if you didn't hear what I said the first time, don't use gets. So the compiler is not happy, but people write that, so we're going to run it. OK, a.out, which starts the code. As soon as that line runs, the runtime behind the C standard stdio.h, before it prompts us for the data, actually adds a print statement. It's not our print statement; it's the library saying you really, really should not be using gets, and if you think this program is trustworthy, you're probably wrong. So I type hello world, which is 11 characters: hello, space, world. Yeah, it's 11 characters. I type 11 characters in, plus the 12th character, which is the \0, and that fits into s[15], a 15-element character array in the variable s. So the program works just fine. Then I type a.out again, and it once again tells me please don't use gets, you're going to be in so much trouble. And now I type in hello, a bunch of spaces, and then world, and it prints out hello, a bunch of spaces, and world, but it has overwritten all kinds of unknown data after the s[15]. That's like 25 or 30 characters: the first 15 are in s, but then the next 15 are somewhere else. s is on the stack, because it's an automatic variable in main, and it goes wiping out the rest of the stack. Now, it turns out that the C runtime puts things on the stack to kind of mark, or to catch, this overflow, and so what happens is as soon as that code finishes it says abort trap 6, which is basically the C runtime saying, you know what, I'm not going to let this program proceed any further, because there has been an array that got messed up. It's not that it caught the array messing up; it didn't know how long it was, it just put characters in. What it did is it put something after the array and then checked for it later, and that got wiped out, and it's like, okay, you wiped out my magic little secret, and so I'm not going to let you continue. And so we don't want you to use gets. This is a buffer overflow, and eventually maybe we will look at some much more complex examples of this, where we try to use something like gets to manipulate what the program does rather than just blow the program up. But this is a very simple example of buffer overflow.

So in summary, pointers are the most beautiful part of C. They're complex, but basically pointers make it so that a high-level language can function like a low-level language. If we have pointers, and I mean not the kind of crappy Python ones, I mean pointers that we can look up and then dereference officially and formally, and not have it be a sneaky way that we're doing it, that means that you can do the things that operating systems need to
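For contrast with the gets demonstration above, here is a hedged sketch of a bounded read using the standard fgets function, which takes the buffer size and never stores past it. The helper name read_line is hypothetical and is not from the lecture; gets itself was removed from the language in the C11 standard.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Reads one line from in into buf, storing at most size - 1 characters
 * plus the terminating '\0'. A trailing newline, if one was read, is
 * removed. Returns 1 on success and 0 on end of file or error.
 */
int read_line(char *buf, size_t size, FILE *in)
{
    assert(buf != NULL && size > 0 && in != NULL);

    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if any */
    return 1;
}
```

With char s[15]; a call such as read_line(s, sizeof s, stdin) keeps at most 14 characters of a long input line; the rest of the line stays in the input stream instead of overwriting the stack.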
do, the kinds of things that we used to write assembly language for, meaning: here's a buffer of memory, we're going to copy this buffer, we're going to do another thing, and there's another buffer, and there's a linked list of all the different buffers. So understanding pointers leads you down the path to assembly language, machine language, and then ultimately hardware. So you should not rush through this material. Pointers are really, really important. Everything we're going to do from now on, I'm just going to say pointer, pointer, pointer. Just like I say object-oriented all the time, I'm going to say pointer all the time. Sections 5.7 and 5.10 through 5.12 are a little dense, so what I really want you to do is understand the stuff I just talked about and the corresponding sections. And chapter six will be more fun, because we'll be doing much more with pointers rather than just what is a pointer. [Music]

Welcome to C Programming for Everybody. My name is Charles Severance and this is my reading of the 1978 C programming book written by Brian Kernighan and Dennis Ritchie. At times I add my own interpretation of the material from a historical perspective.

Chapter five, Pointers and Arrays. Before we start chapter 5, a quick note from your narrator. From time to time I have been adding some of my interpretation to this book, but I won't be adding anything to this chapter. I think that sections 5.1 through 5.6 contain some of the most elegantly written text in the book. Concepts are clearly stated and the example code is short, direct, and easy to understand. Pointers are the essential difference between C and any other modern programming language, so pay close attention to this chapter and make sure that you understand it before continuing. This chapter is as strong now as it was in 1978, and so without further ado, we read and listen as Kernighan and Ritchie teach us about pointers and arrays.

A pointer is a variable that contains the address of another variable. Pointers are
very much used in C, partly because they are sometimes the only way to express a computation, and partly because they usually lead to more compact and efficient code than can be obtained in other ways. Pointers have been lumped with the goto statement as a marvelous way to create impossible-to-understand programs. This is certainly true when they are used carelessly, and it is easy to create pointers that point somewhere unexpected. With discipline, however, pointers can also be used to achieve clarity and simplicity. This is the aspect that we will try to illustrate.

Section 5.1, Pointers and Addresses. Since a pointer contains the address of an object, it is possible to access the object indirectly through the pointer. Suppose that x is a variable, say an int, and that px is a pointer, created in some as yet unspecified way. The unary operator & gives the address of an object, so the statement px = &x; assigns the address of x to the variable px; px is now said to point to x. The & operator can be applied only to variables and array elements; constructs like &(x+1) and &3 are illegal. It is also illegal to take the address of a register variable. The unary operator * treats its operand as the address of the ultimate target, and accesses that address to fetch the contents. Thus if y is also an int, y = *px; assigns to y the contents of whatever px points to. So the sequence px = &x; y = *px; assigns the same value to y as does y = x; It is also necessary to declare the variables that participate in all of this: int x, y; int *px; The declaration of x and y is what we have seen all along. The declaration of the pointer px is new. int *px; is intended as a mnemonic; it says that the combination *px is an int, that is, if px occurs in the context *px, it is equivalent to a variable of type int. In effect, the syntax of the
declaration for a variable mimics the syntax of expressions in which the variable might appear. This reasoning is useful in all cases involving complicated declarations. For example, double atof(), *dp; says that in an expression atof() and *dp have values of type double. You should also note the implication in the declaration that a pointer is constrained to point to a particular kind of object. Pointers can occur in expressions. For example, if px points to the integer x, then *px can occur in any context where x could. y = *px + 1 sets y to one more than x; printf("%d\n", *px) prints the current value of x; and d = sqrt((double) *px) produces in d the square root of x, which is coerced into a double before being passed to sqrt. In expressions like y = *px + 1, the unary operators * and & bind more tightly than arithmetic operators, so this expression takes whatever px points at, adds one, and assigns it to y. We will return shortly to what y = *(px + 1) might mean. Pointer references can also occur on the left side of assignments. If px points to x, then *px = 0 sets x to zero, and *px += 1 increments it, as does (*px)++. The parentheses are necessary in this last example; without them, the expression would increment px instead of what it points to, because unary operators like * and ++ are evaluated right to left. Finally, since pointers are variables, they can be manipulated as other variables can. If py is another pointer to int, then py = px copies the contents of px into py, thus making py point to whatever px points to.

Section 5.2, Pointers and Function Arguments. Since C passes arguments to functions by call by value, there is no direct
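The operations from Section 5.1 can be strung together in one short sketch. The function name pointer_basics and the checks inside it are hypothetical additions for illustration; the individual statements are the ones the text walks through.

```c
#include <assert.h>

/* Applies the Section 5.1 operations to x and returns its final value. */
int pointer_basics(int start)
{
    int x = start, y;
    int *px, *py;

    px = &x;            /* px now points to x */
    y = *px;            /* same effect as y = x */
    assert(y == x);

    y = *px + 1;        /* * binds tighter than +, so y is x + 1 */
    assert(y == x + 1);

    *px = 0;            /* sets x to 0 through the pointer */
    *px += 1;           /* x is now 1 */
    (*px)++;            /* x is now 2; the parentheses are required */

    py = px;            /* py points to whatever px points to, namely x */
    *py += 10;          /* x is now 12 */

    (void)y;            /* y is only inspected by the asserts above */
    return x;
}
```

Whatever value is passed in, the function returns 12, because *px = 0 resets x before the increments are applied.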
way for the called function to alter a variable in the calling function. What do you do if you really have to change an ordinary argument? For example, a sorting routine might exchange two out-of-order elements with a function called swap. It is not enough to write swap(a, b); where the swap function is defined as follows. This is sample source code on page 91 of the textbook, and you can see it at www.cc4e.com/code. This is a wrong swap, by the way; this is showing you the code that you're not supposed to write.

    swap(x, y)
    int x, y;
    {
        int temp;

        temp = x;
        x = y;
        y = temp;
    }

Because of call by value, swap can't affect the arguments a and b in the routine that called it. Fortunately, there is a way to obtain the desired effect. The calling program passes pointers to the values to be changed. The call is swap(&a, &b); Since the operator & gives the address of a variable, &a is a pointer to a. In swap itself, the arguments are declared to be pointers, and the actual operands are accessed through them. So the correct code is on page 92:

    swap(px, py)
    int *px, *py;
    {
        int temp;

        temp = *px;
        *px = *py;
        *py = temp;
    }

One common use of pointer arguments is in functions that must return more than a single value. You might say that swap actually returns two values, the new values of its arguments. As an example, consider a function getint which performs free-format input conversion by breaking a stream of characters into integer values, one integer per call. getint has to return the value that it found, or an end of file signal when there is no more input. These values have to be returned as separate objects, for no matter what
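For readers compiling with a current compiler, the two swap functions above can be sketched with modern prototypes. This is an adaptation, not the book's 1978 text: the name swap_wrong and the assert guard are hypothetical additions.

```c
#include <assert.h>
#include <stddef.h>

/* The wrong version: it only exchanges its own private copies. */
void swap_wrong(int x, int y)
{
    int temp;

    temp = x;
    x = y;
    y = temp;
    (void)x;    /* silence "set but not used" warnings */
    (void)y;
}

/* The correct version: it exchanges the ints that px and py point to. */
void swap(int *px, int *py)
{
    int temp;

    assert(px != NULL && py != NULL);   /* added guard */
    temp = *px;
    *px = *py;
    *py = temp;
}
```

After int a = 1, b = 2; a call to swap_wrong(a, b) leaves a and b unchanged, while swap(&a, &b) leaves a equal to 2 and b equal to 1.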
value is used for EOF, that could also be the value of an input integer. One solution, which is based on the input function scanf that we will describe in chapter 7, is to have getint return EOF as its function value if it finds end of file; any other returned value signals a normal integer. The numeric value of the integer it found is returned through an argument, which must be a pointer to an integer. This organization separates the end of file status from the returned numeric value. The following loop fills an array with integers by calls to getint:

    int n, v, array[SIZE];

    for (n = 0; n < SIZE && getint(&v) != EOF; n++)
        array[n] = v;

Each call sets v to the next integer found in the input. Notice it is essential to write &v instead of v as the argument to getint. Using plain v is likely to cause an addressing error, since getint believes that it's been handed a valid pointer. getint is an obvious modification of atoi, which we wrote earlier. The sample code is on page 93 of the textbook, and you can see this sample code at www.cc4e.com/code.

    #include <stdio.h>

    getint(pn)
    int *pn;
    {
        int c, sign;

        while ((c = getch()) == ' ' || c == '\n' || c == '\t')
            ;   /* this loop we've done before; it skips the white space */
        sign = 1;
        if (c == '+' || c == '-') {   /* these four lines record the sign */
            sign = (c == '+') ? 1 : -1;
            c = getch();              /* advance the character */
        }
        for (*pn = 0; c >= '0' && c <= '9'; c = getch())
            *pn = 10 * *pn + c - '0';
        *pn *= sign;
        if (c != EOF)
            ungetch(c);
        return(c);
    }

Throughout getint, *pn is used as an ordinary int variable. We have also used getch and ungetch, as described in chapter 4, so the one extra character that must be read can be pushed back onto the input.

Section 5.3, Pointers and Arrays. In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously. Any operation which can be achieved by array subscripting can also be done with pointers. The pointer version will in general be faster but, at least to the uninitiated, somewhat harder to grasp immediately. The declaration int a[10] defines an array a of size 10, that is, a block of 10 consecutive objects named a[0], a[1], ..., a[9]. The notation a[i] means the element of the array i positions from the beginning. If pa is a pointer to an integer, declared as int *pa, then the assignment pa = &a[0] sets pa to point to the zeroth element of a; that is, pa contains the address of a[0]. Now the assignment x = *pa will copy the contents of a[0] into x. If pa points to a particular element of an array a, then by definition pa+1 points to the next element, and in general pa-i points i elements before pa, and pa+i points i elements after. Thus, if pa points to a[0], *(pa+1) refers to the contents of a[1], pa+i is the address of a[i], and *(pa+i) is the contents of a[i]. These remarks are true regardless of the type of the variables in the array a. The definition of adding one to a pointer, and by extension all pointer
arithmetic, is that the increment is scaled by the size in storage of the object that is pointed to. Thus in pa+i, i is multiplied by the size of the objects that pa points to before being added to pa. The correspondence between indexing and pointer arithmetic is evidently very close. In fact, a reference to an array is converted by the compiler to a pointer to the beginning of the array. The effect is that an array name is a pointer expression. This has quite a few useful implications. Since the name of an array is a synonym for the location of the zeroth element, the assignment pa = &a[0] can also be written as pa = a. Rather more surprising, at least at first sight, is the fact that a reference to a[i] can also be written as *(a+i). In evaluating a[i], C converts it to *(a+i) immediately; the two forms are completely equivalent. Applying the operator & to both parts of this equivalence, it follows that &a[i] and a+i are also identical: a+i is the address of the i-th element beyond a. As the other side of this coin, if pa is a pointer, expressions may use it with a subscript: pa[i] is identical to *(pa+i). In short, any array and index expression can be written as a pointer and offset, and vice versa, even in the same statement. There is one difference between an array name and a pointer that must be kept in mind. A pointer is a variable, so pa = a and pa++ are sensible operations. But an array name is a constant, not a variable: constructions like a = pa or a++ or p = &a are illegal. When an array name is passed to a function, what is passed is the location of the beginning of the array. Within the called function, this argument is a variable, just like any other variable, and so an array name argument is truly a pointer, that is, a variable containing an address. We can use this fact to write a new version of strlen, which computes the
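The equivalence between subscripts and pointer offsets described above can be sketched with three ways of summing the same array. The function names are hypothetical; each body uses one of the forms the text says is interchangeable.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n elements using array subscripts: a[i]. */
int sum_by_index(int a[], int n)
{
    int i, total = 0;

    assert(a != NULL);
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Sums the same elements using a pointer and an offset: *(pa + i). */
int sum_by_offset(int *pa, int n)
{
    int i, total = 0;

    assert(pa != NULL);
    for (i = 0; i < n; i++)
        total += *(pa + i);
    return total;
}

/* Sums by advancing the pointer itself; legal because the parameter
   pa is a variable holding a private copy of the address. */
int sum_by_walking(int *pa, int n)
{
    int total = 0;

    assert(pa != NULL);
    while (n-- > 0)
        total += *pa++;
    return total;
}
```

Passing the same array name to all three gives the same total, since in each case the function receives the address of the zeroth element.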
length of a string. The sample code is on page 95 of the book, and you can see it at www.cc4e.com/code. Page 95:

    int strlen(s)
    char *s;
    {
        int n;

        for (n = 0; *s != '\0'; s++)
            n++;
        return(n);
    }

Incrementing s is perfectly legal, since it is a pointer variable; s++ has no effect on the character string in the function that called strlen, but merely increments strlen's private copy of the address. As formal parameters in a function definition, char s[]; and char *s; are exactly equivalent; which one should be written is determined largely by how expressions will be written in the function. When an array name is passed to a function, the function can at its convenience believe that it has been handed either an array or a pointer, and manipulate it accordingly. It can even use both kinds of operations if it seems appropriate and clear. It is possible to pass part of an array to a function by passing a pointer to the beginning of the subarray. For example, if a is an array, f(&a[2]) and f(a+2) both pass to the function f the address of element a[2], because &a[2] and a+2 are both pointer expressions that refer to the third element of a. Within f, the argument declaration can read f(array) int array[]; ... or f(array) int *array; ... So as far as f is concerned, the fact that the argument really refers to part of a larger array is of no consequence.

Section 5.4, Address Arithmetic. If p is a pointer, then p++ increments p to point to the next element of whatever kind of object p points to, and p += i increments p to point i elements beyond where it currently does. These and
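A modern-prototype sketch of the pointer version of strlen follows. It is renamed my_strlen (a hypothetical name) so that it does not clash with the standard library function, and it returns size_t rather than int; both changes are adaptations, not the book's code.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the string s, not counting the final '\0'. */
size_t my_strlen(const char *s)
{
    size_t n;

    assert(s != NULL);              /* added guard */
    for (n = 0; *s != '\0'; s++)    /* s++ moves only this function's copy */
        n++;
    return n;
}
```

Because only an address is passed, part of a string can be measured as well: with const char *t = "hello world"; both my_strlen(t + 6) and my_strlen(&t[6]) return 5, the length of "world".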
similar constructions are the simplest and most common forms of pointer or address arithmetic. C is consistent and regular in its approach to address arithmetic; its integration of pointers, arrays, and address arithmetic is one of the major strengths of the language. Let us illustrate some of its properties by writing a rudimentary storage allocator (but useful in spite of its simplicity). There are two routines: alloc(n) returns a pointer p to n consecutive character positions, which can be used by the caller of alloc for storing characters; free(p) releases the storage thus acquired so it can later be reused. The routines are rudimentary because the calls to free must be made in the opposite order to the calls made on alloc. That is, the storage managed by alloc and free is a stack, or last-in, first-out list. The standard C library provides analogous functions which have no such restrictions, and in chapter 8 we will show improved versions as well. In the meantime, however, many applications really only need a trivial alloc to dispense little pieces of storage of unpredictable sizes at unpredictable times. The simplest implementation is to have alloc hand out pieces of a large character array which we will call allocbuf. This array is private to alloc and free. Since they deal in pointers, not array indices, no other routine need know the name of the array, which can be declared external static, that is, local to the source file containing alloc and free, and invisible outside it. In practical implementations, the array may well not even have a name; it might be obtained by asking the operating system for a pointer to some unnamed block of storage. The other information needed is how much of allocbuf has been used. We use a pointer to the next free element, called allocp. When alloc is asked for n characters, it checks to see if there is enough room left in allocbuf. If so, alloc returns the current value of allocp (i.e., the beginning of the free block), and then
increments it by n to point to the next free area. free(p) merely sets allocp to p if p is inside allocbuf. This next code example is on page 97 of the textbook; you can see the code at www.cc4e.com/code.

    #include <stdio.h>

    #define NULL 0          /* pointer value for error report */
    #define ALLOCSIZE 1000  /* size of available space */

    static char allocbuf[ALLOCSIZE];
    static char *allocp = allocbuf;     /* next free position, initialized to the start of the array */

    char *alloc(n)  /* return a pointer to n characters */
    int n;
    {
        if (allocp + n <= allocbuf + ALLOCSIZE) {   /* meaning we have space */
            allocp += n;
            return(allocp - n);
        } else      /* if there's not enough room */
            return(NULL);
    }

    free(p)     /* this function will free the storage pointed to by p */
    char *p;
    {
        if (p >= allocbuf && p < allocbuf + ALLOCSIZE)
            allocp = p;
    }

Some explanations. In general, a pointer can be initialized just as any other variable can, though normally the only meaningful values are NULL (discussed below) or an expression involving addresses of previously defined data of appropriate type. The declaration static char *allocp = allocbuf; defines allocp to be a character pointer and initializes it to point to allocbuf, which is the next free position when the program starts. This could also have been written static char *allocp = &allocbuf[0]; since the array name is the address of the zeroth element; use whichever is more natural. The test if (allocp + n <= allocbuf + ALLOCSIZE) checks if there's enough room to satisfy a request for n characters. If there is, the new value of allocp would be at
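The stack-style allocator above can be sketched with modern prototypes as follows. This is an adaptation under stated assumptions: the release routine is renamed afree (a hypothetical choice here) to avoid clashing with the standard library free, and the room test is rearranged to compare n against the space remaining, so that no pointer far beyond the array is ever computed.

```c
#include <assert.h>
#include <stddef.h>

#define ALLOCSIZE 1000                  /* size of available space */

static char allocbuf[ALLOCSIZE];        /* storage handed out by alloc */
static char *allocp = allocbuf;         /* next free position */

/* Returns a pointer to n characters, or NULL if there is not enough room. */
char *alloc(int n)
{
    /* Invariant: allocp stays inside allocbuf or one past its end. */
    assert(allocp >= allocbuf && allocp <= allocbuf + ALLOCSIZE);

    if (n >= 0 && n <= (allocbuf + ALLOCSIZE) - allocp) {
        allocp += n;
        return allocp - n;              /* start of the block just handed out */
    }
    return NULL;
}

/* Releases storage back to p; calls must mirror alloc in reverse order. */
void afree(char *p)
{
    if (p >= allocbuf && p < allocbuf + ALLOCSIZE)
        allocp = p;
}
```

Two calls such as p = alloc(400) and q = alloc(600) use up the whole buffer, so a further alloc(1) returns NULL until afree(q) and afree(p) are called, in that order.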
most one beyond the end of allocbuf. If the request can be satisfied, alloc returns a normal pointer (notice the declaration of the function itself). If not, alloc must return some kind of signal that there's no space left. C guarantees that no pointer that validly points to data will ever contain a zero, so a return value of zero can be used to signal an abnormal event, in this case, no space. We write NULL instead of zero, however, to indicate more clearly that this is a special value for a pointer. In general, integers cannot be meaningfully assigned to pointers, but zero is a special case.

Tests like if open paren allocp plus n less than or equal to allocbuf plus ALLOCSIZE close paren and if open paren p greater than or equal to allocbuf and p less than allocbuf plus ALLOCSIZE close paren show several important facets of pointer arithmetic. First, pointers may be compared under certain circumstances. If p and q point to members of the same array, then relations like less than, greater than or equal, etc. work properly. p less than q is true, for example, if p points to an earlier member of the array than q does. The relations double equals and exclamation equals also work. Any pointer can be meaningfully compared for equality or inequality with NULL. But all bets are off if you do arithmetic or comparisons with pointers that point to different arrays. If you're lucky, you'll get obvious nonsense on all machines. If you're unlucky, your code will work on one machine but collapse mysteriously on another.

Second, we've already observed that a pointer and an integer can be added or subtracted. The construction p plus n means the nth object beyond the one p currently points to. This is true regardless of the kind of object p is declared to point at; the compiler scales n according to the size of the objects p points to, which is determined by the declaration of p. For example, on the PDP-11, the scale factors are one for char, two for int and short, four for long and float, and eight for double.

Pointer subtraction is also
valid: if p and q point to members of the same array, p minus q is the number of elements between p and q. This fact can be used to write yet another version of strlen.

Strlen open paren s close paren, char star s semicolon, open curly brace. Char star p equals s semicolon. While open paren star p not equal single quote backslash zero single quote close paren, p plus plus semicolon. Return open paren p minus s close paren semicolon, close curly brace.

In its declaration, p is initialized to s, that is, to point to the first character. In the while loop, each character in turn is examined until the backslash zero at the end is seen. Since backslash zero is zero, and since while tests only whether the expression is zero, it is possible to omit the explicit test, and such loops are often written as while open paren star p close paren p plus plus semicolon. Because p points to characters, p plus plus advances p to the next character each time, and p minus s gives the number of characters advanced over, that is, the string length.

Pointer arithmetic is consistent: if we had been dealing with floats, which occupy more storage than chars, and if p were a pointer to a float, p plus plus would advance to the next float. Thus we could write another version of alloc which maintains, say, floats instead of chars, merely by changing char to float throughout alloc and free. All the pointer manipulations automatically take into account the size of the object pointed to, so nothing else has to be altered.

Other than the operations mentioned here (adding or subtracting a pointer and an integer, subtracting or comparing two pointers), all other pointer arithmetic is illegal. It is not permitted to add two pointers, or to multiply or divide or shift or mask them, or to add float or double to them.

Section 5.5: Character Pointers and Functions.

A string constant, written as double quote I am a string double quote, is an array of characters. In the internal representation, the compiler terminates the array with the character backslash zero so that programs can find the end. The length in storage is thus one
more than the number of characters between the double quotes. Perhaps the most common occurrence of a string constant is as an argument to a function, as in printf open paren double quote hello comma world backslash n double quote close paren semicolon. When a character string like this appears in a program, access to it is through a character pointer; what printf receives is a pointer to the character array.

Character arrays, of course, need not be function arguments. If message is declared as char star message semicolon, then the statement message equals double quote now is the time double quote semicolon assigns to message a pointer to the actual characters. This is not a string copy; only pointers are involved. C does not provide any operators for processing an entire string of characters as a unit.

We will illustrate more aspects of pointers and arrays by studying two useful functions from the standard I/O library, to be discussed in chapter 7. The first function is strcpy open paren s comma t close paren, which copies the string t to the string s. The arguments are written in this order by analogy to assignment, where one would say s equals t to assign t to s. The array version is first.

Strcpy open paren s comma t close paren, char s open square bracket close square bracket comma t open square bracket close square bracket semicolon, open curly brace. Int i semicolon. i equals zero semicolon. While open paren open paren s sub i equals t sub i close paren not equal single quote backslash zero single quote close paren, i plus plus semicolon, close curly brace.

You'll note in that while statement that the copying of the actual character is done as an assignment, and then the result of that assignment is compared to the end-of-string character, which terminates the while loop.

For contrast, here is a version of strcpy with pointers. This is on page 100 of the textbook, and you can see all the code in the textbook at www.cc4e.com/code. Again, this is example number two on page 100.

Strcpy open paren s comma t close paren, char
star s comma star t semicolon, open curly brace. While open paren open paren star s equals star t close paren not equal single quote backslash zero single quote close paren open curly brace, s plus plus semicolon, t plus plus semicolon, close curly brace, and close curly brace to end the function.

Because the arguments are passed by value, strcpy can use s and t in any way it pleases. Here they are conveniently initialized pointers, which are marched along the arrays a character at a time until the backslash zero which terminates t has been copied to s.

In practice, strcpy would not be written as we showed above. A second possibility might be the following, and this is the third example on page 100 of the textbook.

Strcpy open paren s comma t close paren, char star s comma star t semicolon, open curly brace. While open paren open paren star s plus plus equals star t plus plus close paren not equal single quote backslash zero single quote close paren semicolon, close curly brace.

This moves the increment of s and t into the test part. The value of star t plus plus is the character that t pointed to before t was incremented; the postfix plus plus doesn't change t until after this character has been fetched. In the same way, the character is stored into the old s position before s is incremented. This character is also the value that is compared against backslash zero to control the loop. The net effect is that characters are copied from t to s up to and including the terminating backslash zero.

As the final abbreviation in solving this problem, we can observe that the comparison against backslash zero is redundant, so the function is often written as follows. This is the first sample code on page 101 of the textbook.

Strcpy open paren s comma t close paren, char star s comma star t semicolon, open curly brace. While open paren star s plus plus equals star t plus plus close paren semicolon, close curly brace.

Although this may seem cryptic at first sight, the notational convenience is considerable, and the idiom should be mastered, if for no other reason than that you will see it frequently in C programs.

The second routine is strcmp open paren s comma t close paren,
which compares the character strings s and t and returns negative, zero, or positive according to whether s is lexicographically less than, equal to, or greater than t. The value returned is obtained by subtracting the characters at the first position where s and t disagree. This is the second example on page 101 of the textbook, which you can see at www.cc4e.com/code.

Strcmp open paren s comma t close paren, char s open square bracket close square bracket comma t open square bracket close square bracket semicolon, open curly brace. Int i semicolon. i equals zero semicolon. While open paren s sub i double equals t sub i close paren, if open paren s sub i plus plus double equals single quote backslash zero single quote close paren, return open paren zero close paren semicolon. Return open paren s sub i minus t sub i close paren semicolon, close curly brace.

The pointer version of strcmp is the first example on page 102 of the textbook.

Strcmp open paren s comma t close paren, char star s comma star t semicolon, open curly brace. For open paren semicolon star s double equals star t semicolon s plus plus comma t plus plus close paren, if open paren star s double equals single quote backslash zero single quote close paren, return open paren zero close paren semicolon. Return open paren star s minus star t close paren semicolon, close curly brace.

Since plus plus and minus minus are either prefix or postfix operators, other combinations of star and plus plus and minus minus occur, although less frequently. For example, star plus plus p increments p before fetching the character that p points to. Star minus minus p decrements p first.

Section 5.6: Pointers Are Not Integers.

You may notice in older C programs a rather cavalier attitude toward copying pointers. It has generally been true that on most machines a pointer may be assigned to an integer and back again without changing it; no scaling or conversion takes place, and no bits are lost. Regrettably, this has led to the taking of liberties with routines that return pointers which are then merely passed to other routines; the requisite pointer declarations are often left out. For example, consider the
function strsave open paren s close paren, which copies the string s into a safe place, obtained by a call to alloc, and returns a pointer to it. Properly, this should be written as follows. This is the first example on page 103 of the textbook. You can see the sample code at www.cc4e.com/code.

Pound include stdlib.h. Char star strsave open paren s close paren, save a string somewhere, char star s semicolon, open curly brace. Char star p comma star alloc open paren close paren semicolon. If open paren open paren p equals alloc open paren strlen open paren s close paren plus one close paren close paren not equal NULL close paren, strcpy open paren p comma s close paren semicolon. Return open paren p close paren semicolon, close curly brace.

In practice, there would be a strong tendency, a mistaken tendency that is, to omit declarations. This is example two on page 103.

Pound include stdlib.h. Strsave open paren s close paren, open curly brace. Char star p semicolon. If open paren open paren p equals alloc open paren strlen open paren s close paren plus one close paren close paren not equal NULL close paren, strcpy open paren p comma s close paren semicolon. Return open paren p close paren semicolon, close curly brace.

This will work on many machines, since the default type for functions and arguments is int, and int and pointer can usually be safely assigned back and forth. Nonetheless, this kind of code is inherently risky, for it depends on details of implementation and machine architecture which may not hold for the particular compiler you use. It is wiser to be complete in all declarations. The program lint will warn of such constructions in case they creep in inadvertently.

Section 5.7: Multi-Dimensional Arrays.

In general, rectangular multi-dimensional arrays are used in computational programs like a weather simulation, and back in the day they were a way to write C code that could accept Fortran multi-dimensional arrays as parameters, so that computational or statistical libraries could be written in C. Arrays of pointers are a mapping to the typical operating system and string
manipulation use cases that are more the core of C applications. We also call these ragged arrays, because each row can be a different length. This also works well when data is dynamically allocated in C, as compared to the more static allocation approach that's typical of Fortran multi-dimensional arrays. Now, to the textbook.

C provides for rectangular multi-dimensional arrays, although in practice they tend to be much less used than arrays of pointers. In this section, we will show some of their properties.

Consider the problem of date conversion, from day of the month to day of the year and vice versa. For example, March 1st is the 60th day of a non-leap year and the 61st day of a leap year. Let us define two functions to do the conversions: day_of_year converts the month and day into the day of the year, and month_day converts the day of the year into the month and day. Since this latter function returns two values, the month and day arguments will be pointers. Month_day open paren 1977 comma 60 comma ampersand m comma ampersand d close paren sets m to 3 and d to 1, which is March 1st.

These functions both need the same information, a table of the number of days in each month (thirty days hath September, etc.). Since the number of days per month differs for leap years and non-leap years, it's easier to separate them into two rows of a two-dimensional array than to try to keep track of what happens to February during computation. The array and the functions for performing the transformations are as follows. This is example number one on page 104 of the textbook, and you can see the code at www.cc4e.com/code.

Static int day_tab open square bracket 2 close square bracket open square bracket 13 close square bracket equals open curly brace, open curly brace 0 comma 31 and then a number of numbers close curly brace comma, open curly brace 0 comma 31 comma 29 and then a bunch of numbers close curly brace, close curly brace semicolon.

Day_of_year open paren year comma month comma day close paren, int year comma month comma day semicolon, open curly brace. Int i
comma leap semicolon. Leap equals year modulo 4 double equals zero and year modulo 100 not equals zero, or year modulo 400 double equals zero semicolon. For open paren i equals 1 semicolon i less than month semicolon i plus plus close paren, day plus equals day_tab open square bracket leap close square bracket open square bracket i close square bracket semicolon. Return open paren day close paren semicolon, close curly brace.

Then the month_day function. Month_day open paren year comma yearday comma pmonth comma pday close paren, int year comma yearday comma star pmonth comma star pday semicolon, open curly brace. Int i comma leap semicolon. Leap equals year percent 4 double equals zero and year percent 100 not equals zero, or year percent 400 double equals zero semicolon. For open paren i equals 1 semicolon yearday greater than day_tab open square bracket leap close square bracket open square bracket i close square bracket semicolon i plus plus close paren, yearday minus equals day_tab open square bracket leap close square bracket open square bracket i close square bracket semicolon. Star pmonth equals i semicolon. Star pday equals yearday semicolon, close curly brace.

The array day_tab has to be external to both day_of_year and month_day so they can both use it.

Day_tab is the first two-dimensional array we have dealt with. In C, by definition, a two-dimensional array is really a one-dimensional array, each of whose elements is an array. Hence subscripts are written as day_tab open square bracket i close square bracket open square bracket j close square bracket, rather than day_tab open square bracket i comma j close square bracket, as in most languages. Other than this, a two-dimensional array can be treated in much the same way as in other languages. Elements are stored by rows, that is, the rightmost subscript varies fastest as elements are accessed in storage order.

An array is initialized by a list of initializers in braces; each row of a two-dimensional array is initialized by a corresponding sublist. We started the array day_tab with a column of zero so that
month numbers can run from the natural 1 to 12 instead of 0 to 11. Since space is not at a premium here, this is easier than adjusting indices.

If a two-dimensional array is to be passed to a function, the argument declaration in the function must include the column dimension. The row dimension is irrelevant, since what is passed is, as before, a pointer. In this particular case, it's a pointer to objects which are arrays of 13 ints. Thus if the array day_tab is to be passed to a function f, the declaration of f would be: f open paren day_tab close paren, int day_tab open square bracket 2 close square bracket open square bracket 13 close square bracket semicolon, open curly brace, dot dot dot, close curly brace. The argument declaration in f could also be int day_tab open square bracket close square bracket open square bracket 13 close square bracket semicolon, since the number of rows is actually irrelevant. It could also be int open paren star day_tab close paren open bracket 13 close bracket semicolon, which says that the argument is a pointer to an array of 13 integers. The parentheses are necessary since the brackets have higher precedence than the asterisk. Without parentheses, the declaration int star day_tab open square bracket 13 close square bracket semicolon is an array of 13 pointers to integers, as we shall see in the next section.

Section 5.8: Pointer Arrays; Pointers to Pointers.

Since pointers are variables themselves, you might expect that there would be uses for arrays of pointers. This is indeed the case. Let us illustrate by writing a program that will sort a set of text lines into alphabetic order, a stripped-down version of the Unix utility sort.

In chapter 3, we presented a Shell sort function that would sort an array of integers. This same algorithm will work, except that now we have to deal with lines of text, which are of different lengths and which, unlike integers, can't be compared or moved in a single operation. We need a data representation, a data structure, that will cope
efficiently and conveniently with variable-length text lines.

This is where the array of pointers enters. If the lines to be sorted are stored end to end in one long character array (maintained by alloc, perhaps), then each line can be accessed by a pointer to its first character. The pointers themselves can be stored in an array. Two lines can be compared by passing their pointers to strcmp. When two out-of-order lines have to be swapped or exchanged, the pointers in the pointer array are exchanged, not the text lines themselves. This eliminates the twin problems of complicated storage management and high overhead that would go with moving the text of the actual lines.

The sorting process involves three steps: read all the lines of input, sort them, and then print them in order. As usual, it's best to divide the program into functions that match this natural division, with the main routine controlling things.

Let us defer the sorting step for a moment and concentrate on the data structures and the input and output. The input routine has to collect and save the characters of each line and build an array of pointers to the lines. It will also have to count the number of input lines, since that information is needed for sorting and printing. Since the input function can only cope with a finite number of input lines, it can return some illegal line count, like negative one, if too much input is presented. The output routine only has to print the lines in the order in which they appear in the array of pointers.

This next code segment is actually a combination of three successive sample code segments, starting on page 106 of the textbook, and it's pretty complex, so it might be best for you to see them at www.cc4e.com/code, page 106, example one.

Okay. Pound include stdio.h.
Pound include string.h. Pound define LINES 100, the maximum number of lines to be sorted. Main open paren close paren, open curly brace. Char star lineptr open square bracket LINES close square bracket semicolon. Int nlines semicolon, which is the number of lines read. If open paren open paren nlines equals readlines open paren lineptr comma LINES close paren close paren greater than or equal to zero close paren open curly brace, sort open paren lineptr comma nlines close paren semicolon, writelines open paren lineptr comma nlines close paren semicolon, close curly brace. Else printf open paren double quote input too big to sort backslash n double quote close paren semicolon, and close curly brace to end the main program.

This next routine is actually from page 107 of the textbook, but we combined them into one. Pound define MAXLEN 1000. Readlines open paren lineptr comma maxlines close paren, char star lineptr open square bracket close square bracket semicolon. This is an array of pointers to characters, pointers being, you know, something like four bytes long, and characters being generally one byte. Int maxlines semicolon, open curly brace. Int len comma nlines semicolon. Char star p comma star alloc open paren close paren comma line open square bracket MAXLEN close square bracket semicolon. So just to recall, alloc is a function we did earlier that allows us to allocate some text of a varying length, and line, of size MAXLEN, is a place that we're going to read each line into.

Beginning the code of readlines: nlines equals zero semicolon. While open paren open paren len equals getline open paren line comma MAXLEN close paren close paren greater than zero close paren, if open paren nlines greater than or equal to maxlines close paren return open paren minus one close paren semicolon, else if open paren open paren p equals alloc open paren len close paren close paren double equals NULL close paren return open paren minus one close paren semicolon. So those two tests basically make sure that we don't get too many lines and that we
have enough space in the dynamic data area that alloc is managing for us. So, continuing with the if, we're at the else: open curly brace, line sub len minus one equals single quote backslash zero single quote semicolon, strcpy open paren p comma line close paren semicolon, lineptr sub nlines plus plus equals p semicolon, close curly brace. That finishes the else segment. Return nlines semicolon, and close curly brace to finish the readlines function.

Now, at a high level, we're reading a line into an automatic variable, line, then we are calling alloc to get a place to copy that line, then we're making a copy of that line, and then we are remembering the pointer to the beginning of that line in lineptr. That's the essence of it.

Okay. Writelines open paren lineptr comma nlines close paren, char star lineptr open square bracket close square bracket semicolon, again an array of pointers to characters, int nlines semicolon, which is the number of character pointers in lineptr, open curly brace. Int i semicolon. For open paren i equals 0 semicolon i less than nlines semicolon i plus plus close paren, printf open paren double quote percent s backslash n double quote comma lineptr sub i close paren semicolon, close curly brace. It's a simple loop that goes through the array of character pointers and prints each one out using printf.

The main new thing is the declaration for lineptr. Char star lineptr open square bracket LINES close square bracket semicolon says that lineptr is an array of LINES elements, each element of which is a pointer to a char. That is, lineptr sub i is a character pointer, and star lineptr sub i accesses a character.

Since lineptr is itself an array that is passed to writelines, it can be treated as a pointer in exactly the same manner as our earlier examples, and the function can be written instead as writelines open paren lineptr comma nlines close paren, char star lineptr open square bracket close square bracket semicolon, int nlines semicolon, open curly brace, while open paren minus
minus nlines greater than or equal to zero close paren, printf open paren double quote percent s backslash n double quote comma star lineptr plus plus close paren semicolon, close curly brace. That code, by the way, was on page 108, example one, of the textbook.

Star lineptr points initially to the first line; each increment advances it to the next line while nlines is counted down.

With input and output under control, we can proceed to sorting. The Shell sort from chapter 3 needs minor changes: the declarations have to be modified, and the comparison operation must be moved into a separate function. The basic algorithm remains the same, which gives us some confidence that it will still work. This is the second example on page 108 of the textbook, and you can see this example at www.cc4e.com/code.

Sort open paren v comma n close paren, char star v open square bracket close square bracket semicolon, int n semicolon. So we're getting an array of pointers to the beginnings of lines, and how many of those pointers matter. The rest of it is Shell sort, with strcmp being used to do the string comparison. It's a triply nested for loop with a simple if test in it. So here we go.

Open curly brace for the sort function. Int gap comma i comma j semicolon. Char star temp semicolon, and that's a pointer to a character. For open paren gap equals n slash 2 semicolon gap greater than zero semicolon gap slash equals 2 close paren, for open paren i equals gap semicolon i less than n semicolon i plus plus close paren, for open paren j equals i minus gap semicolon j greater than or equal to zero semicolon j minus equals gap close paren open curly brace. So that's sort of the Shell part of the Shell sort, and now we have to do our comparison. If open paren strcmp open paren v sub j comma v sub j plus gap close paren less than or equal to zero close paren break semicolon. Note that that's only breaking out of the innermost for loop, after which it just goes on and runs the next iteration of the second for loop. Now we do the swapping: temp equals
v sub j semicolon, v sub j equals v sub j plus gap semicolon, v sub j plus gap equals temp semicolon, close curly brace to end the innermost for loop, and close curly brace to end the function. Now, that's just swapping pointer values. If the strings that are pointed to by these two pointers, v sub j and v sub j plus gap, are out of order, we swap the pointers in the array, so that if you then go through the array like we did in writelines earlier, they come out in order. We literally read the data once and copy it once into its final destination using alloc and strcpy, but once we sort it, which is the most complex part of the calculation, we're only moving the pointers back and forth. So this sort is very efficient and requires no more memory than what we had before the sort. That's really nice, and it sorts in place. So, back to the text.

Since any individual element of v, which is an alias for lineptr, is a character pointer, temp also should be one, so one can be copied to the other.

We wrote the program about as straightforwardly as possible, so as to get it working quickly. It might be faster, for instance, to copy the incoming lines directly into an array maintained by readlines, rather than copying them into line and then to a hidden place maintained by alloc. But it's wiser to make the first draft of something easy to understand, and worry about efficiency later. The way to make this program significantly faster is probably not by avoiding an unnecessary copy of the input lines. Instead, replacing the Shell sort by something quicker and better, like quicksort, is much more likely to make a real difference.

In chapter 1, we pointed out that because while and for loops test the termination condition before executing the loop body even once, they help to ensure that programs will work at their boundaries, in particular with no input. It's illuminating to walk through the functions of the sorting program, checking what happens if there is no input text at all.

Section 5.9: Initialization of Pointer Arrays.
Consider the problem of writing a function month_name open paren n close paren, which returns a pointer to a character string containing the name of the nth month. This is an ideal application for an internal static array. Month_name contains a private array of character strings and returns a pointer to the proper one when called. The topic of this section is how that array of names is initialized.

The syntax is quite similar to previous initializations. This is sample code from page 109 of the textbook, which you can see at www.cc4e.com/code.

Char star month_name open paren n close paren, so the return value of this function is a character pointer, int n semicolon, open curly brace. Static char star name open square bracket close square bracket equals open curly brace, double quote illegal month double quote comma, double quote January double quote comma, double quote February double quote comma, and so forth down to double quote December double quote, close curly brace semicolon. The body of the function is one line: return open paren open paren n less than 1 or n greater than 12 close paren question mark name sub zero colon name sub n close paren semicolon, close curly brace.

The declaration of name, which is an array of character pointers, is the same as lineptr in the sorting example. The initializer is simply a list of character strings; each is assigned to the corresponding position in the array. More precisely, the characters of the i-th string are placed somewhere else, and a pointer to them is stored in name sub i. Since the size of the array name is not specified, the compiler itself counts the initializers and fills in the correct number.

Section 5.10: Pointers Versus Multi-Dimensional Arrays.

Newcomers to C are sometimes confused about the difference between a two-dimensional array and an array of pointers, such as name in the example above. Given the declarations int a open square bracket 10 close square bracket open square bracket 10 close square bracket semicolon and int star b open square bracket
10 close square bracket semicolon, the usage of a and b may be similar, in that a sub 5 sub 5 and b sub 5 sub 5 are both legal references to a single int. But a is a true array: all 100 storage cells have been allocated, and the conventional rectangular subscript calculation is done to find any given element. For b, however, the declaration only allocates 10 pointers; each must be set to point to an array of integers. Assuming that each does point to a 10-element array, then there will be 100 storage cells set aside, plus the 10 cells for the pointers. Thus the array of pointers uses slightly more space and may require an explicit initialization step. But it has two advantages: accessing an element is done by indirection through a pointer rather than by a multiplication and an addition, and the rows of the array may be of different lengths. That is, each element of b need not point to a 10-element vector; some may point to two elements, others may point to 20, and some to none at all.

Although we have phrased this discussion in terms of integers, by far the most frequent use of arrays of pointers is like that shown in month_name: to store character strings of diverse lengths.

Section 5.11: Command-Line Arguments.

In environments that support C, there is a way to pass command-line arguments or parameters to a program when it begins executing. When main is called to begin execution, it is called with two arguments. The first, conventionally called argc, is the number of command-line arguments the program was invoked with; the second, argv, is a pointer to an array of character strings that contain the arguments, one per string. Manipulating these character strings is a common use of multiple levels of pointers.

I would note that back in 1978, the two largest bodies of code were likely the AT&T Unix kernel itself and Unix utilities like grep, ls, or the login shell. So writing an operating system was fresh in the minds of the authors while writing this book, and these topics find their way into the text of this book. In a
sense, a likely second-order goal of the book was to train programmers who might learn C and then might help build and maintain Unix. The 1978 edition of this textbook fits nicely into a series of AT&T Bell Labs technical reports, like Portability of C Programs and the UNIX System, written by Stephen C. Johnson and Dennis M. Ritchie, published in the Bell System Technical Journal, volume 57, number 6, part 2, July through August 1978, pages 2021 through 2048. You can see this one online if you search for it. Back to the textbook.

The simplest illustration of the necessary declarations and use is the program echo, which simply echoes its command-line arguments on a single line, separated by blanks. That is, if the command echo hello comma world is given, the output is hello comma world. By convention, argv sub 0 is the name by which the program was invoked, so argc is at least one. In the above example, argc is three, and argv sub 0, argv sub 1, and argv sub 2 are echo, hello comma, and world, respectively. The first real argument is argv sub 1 and the last is argv sub argc minus one. If argc is one, there are no command-line arguments after the program name. This is shown in the source code to echo, which is on page 111 of the textbook, and you can see this source code at www.cc4e.com/code.

Pound include stdio.h. Pound include string.h. Main open paren argc comma argv close paren, int argc semicolon, char star argv open square bracket close square bracket semicolon, open curly brace. Int i semicolon. For open paren i equals 1 semicolon i less than argc semicolon i plus plus close paren, printf open paren double quote percent s percent c double quote comma argv sub i comma open paren i less than argc minus one close paren question mark single quote space single quote colon single quote backslash n single quote close paren semicolon, close curly brace to end it.

Since argv is a pointer to an array of pointers, there are several ways to write this program that involve manipulating the pointer rather than
indexing an array. Let us show two variations. This is example number two on page 111 of the textbook.

pound include stdio.h, pound include string.h, main open paren argc comma argv close paren, int argc semicolon, char star argv open square bracket close square bracket semicolon, open curly brace, while open paren minus minus argc greater than zero close paren, printf open paren double quote percent s percent c double quote comma star plus plus argv comma open paren argc greater than one close paren question mark single quote space single quote colon single quote backslash n single quote close paren semicolon, close curly brace.

Since argv is a pointer to the beginning of an array of argument strings, incrementing it by one, plus plus argv, makes it point at the original argv sub 1 instead of argv sub 0. Each successive increment moves it along to the next argument; star argv is then the pointer to that argument. At the same time, argc is decremented, and when it becomes zero there are no arguments left to print.

Another version, the third version on page 111 of the textbook: pound include stdio.h, pound include string.h, main open paren argc comma argv close paren, int argc semicolon, char star argv open square bracket close square bracket semicolon, open curly brace, while open paren minus minus argc greater than zero close paren, printf open paren open paren argc greater than one close paren question mark double quote percent s blank double quote colon double quote percent s backslash n double quote comma star plus plus argv close paren semicolon, close curly brace.

This version shows that the format argument of printf can be an expression just like any of the others. This usage is not very frequent, but worth remembering.

As a second example, let's make some enhancements to the pattern-finding program from chapter 4. If you recall, we wired the search pattern deep into the program, and this is an obviously unsatisfactory arrangement for flexible code. Following the lead of the Unix utility grep, which stands for the
global regular expression print, let us change the program so that the pattern to be matched is specified by the first argument on the command line. This is example one on page 112 of the book, which you can see at www.cc4e.com/code.

pound include stdio.h, pound include string.h, pound define MAXLINE 1000, main open paren argc comma argv close paren, int argc semicolon, char star argv open square bracket close square bracket semicolon, open curly brace, char line open square bracket MAXLINE close square bracket semicolon, if open paren argc not equal to 2 close paren, printf open paren double quote usage colon find pattern backslash n double quote close paren semicolon, else while open paren getline open paren line comma MAXLINE close paren greater than zero close paren, if open paren index open paren line comma argv sub 1 close paren greater than or equal to zero close paren, printf open paren double quote percent s double quote comma line close paren semicolon, close curly brace.

The basic model can now be elaborated to illustrate further pointer constructions. Suppose we want to allow two optional arguments. One says print all the lines except those that match the pattern; the second says precede each printed line with its line number. A common convention for C programs is that an argument beginning with a minus sign introduces an optional flag or parameter. If we choose minus x, for except, to signal the inversion, and minus n, for number, to request line numbering, then the command find -x -n the, with the four-line input "now is the time / for all good men / to come to the aid / of their party", should produce "2: for all good men".

Optional arguments should be permitted in any order, and the rest of the program should be insensitive to the number of arguments which were actually present. In particular, the call to index should not refer to argv sub 2 when there was a single flag argument and to argv sub 1 when there wasn't. Furthermore, it's convenient for users if option arguments can be concatenated, as in find -nx the.
Here is the program. This program is on page 113 of the textbook, and it is complex enough that I suggest you take a look at it at www.cc4e.com/code. It's about 35 lines long.

Now the commentary on the program; hopefully you're looking at it now. argv is incremented before each optional argument, and argc is decremented. If there are no errors, at the end of the loop argc should be one and star argv should point to the pattern. Note that star plus plus argv is a pointer to an argument string; open paren star plus plus argv close paren open square bracket zero close square bracket is its first character. The parentheses are necessary, for without them the expression would be star plus plus open paren argv sub 0 close paren, which is quite different and wrong. An alternate valid form would be star star plus plus argv.

Section 5.12, Pointers to Functions. In C, a function itself is not a variable, but it is possible to define a pointer to a function, which can be manipulated, passed to functions, placed in arrays, and so on. We will illustrate this by modifying the sorting procedure written earlier in this chapter so that if the optional argument minus n is given, it will sort the input lines numerically instead of lexicographically.

A sort often consists of three parts: a comparison, which determines the ordering of any pair of objects; an exchange, which reverses their order; and a sorting algorithm, which makes comparisons and exchanges until the objects are in order. The sorting algorithm is independent of the comparison and exchange operations, so by passing different comparison and exchange functions to it, we can arrange to sort by different criteria. This approach is taken in our new sort.

The lexicographic comparison of two lines is done by strcmp and swapping by swap, as before. We'll also need a routine numcmp, which compares two lines on the basis of numeric value and returns the same kind of condition indication as strcmp does. These three functions are declared in main
and pointers to them are passed to sort. sort in turn calls the functions via the pointers. We have skimped on error processing for arguments so as to concentrate on the main issues. This sample code is from page 115 of the textbook, which you can view at www.cc4e.com/code.

pound include stdio.h, pound include string.h, pound define LINES 100, main open paren argc comma argv close paren, int argc semicolon, char star argv open square bracket close square bracket semicolon, open curly brace, char star lineptr open square bracket LINES close square bracket semicolon. These are the pointers to the text lines, so we're going to be reading in the lines, saving them, and keeping an array of the pointers, and then we're going to sort that way. int nlines semicolon, int strcmp open paren close paren comma numcmp open paren close paren semicolon, which are comparison functions, int swap open paren close paren semicolon, int numeric equals zero semicolon, and this is going to be one if it's a numeric sort.

First we parse the arguments. if open paren argc greater than one ampersand ampersand argv sub 1 sub 0 equal equal single quote minus single quote ampersand ampersand argv open square bracket one close square bracket open square bracket one close square bracket equal equal single quote n single quote close paren, numeric equals 1 semicolon. if open paren open paren nlines equals readlines open paren lineptr comma LINES close paren close paren greater than or equal to zero close paren open curly brace, if open paren numeric close paren, sort open paren lineptr comma nlines comma numcmp comma swap close paren semicolon, else sort open paren lineptr comma nlines comma strcmp comma swap close paren semicolon, writelines open paren lineptr comma nlines close paren semicolon, close curly brace, else printf open paren double quote input too big to sort backslash n double quote close paren semicolon.

strcmp, numcmp, and swap are addresses of functions. Since they're known to be functions, the ampersand operator is not necessary, in the same way that it is not needed before an array name. The compiler arranges for the address of the function to be passed. The second step is to
modify our sort function, and this is the first example on page 116 of the textbook.

sort open paren v comma n comma comp comma exch close paren, char star v open square bracket close square bracket semicolon, that's our array of pointers, int n semicolon, int open paren star comp close paren open paren close paren comma open paren star exch close paren open paren close paren semicolon. That declared the type and the fact that these are pointers to functions; it's a little more complex here in the called code. open curly brace, int gap comma i comma j semicolon. And now we're going to do the three nested for loops for the Shell sort, and the only real change is in the code that checks whether a pair of items is out of order and what we do then. So: for open paren gap equals n slash 2 semicolon gap greater than zero semicolon gap slash equals 2 close paren, for open paren i equals gap semicolon i less than n semicolon i plus plus close paren, for open paren j equals i minus gap semicolon j greater than or equal to zero semicolon j minus equals gap close paren open curly brace. And now here starts the different code: if open paren open paren star comp close paren open paren v sub j comma v sub j plus gap close paren less than or equal to zero close paren, break semicolon, open paren star exch close paren open paren ampersand v sub j comma ampersand v sub j plus gap close paren semicolon, close curly brace for the for loop, and then close curly brace for the sort function.

So really all we're doing is checking the order of the two items v sub j and v sub j plus gap. If they're in order, that is, the comparison returns less than or equal to zero, we break out of the inner loop; if they're out of order, that is, it returns greater than zero, then we just exchange them with the provided exchange function. The key thing here is that it looks exactly like the previous time we wrote this code, except we're calling through the pointer to the comparison function and the pointer to the exchange function, which makes this flexible so it can handle different
kinds of data.

Back to the textbook. The declaration should be studied with some care. int open paren star comp close paren open paren close paren says that comp is a pointer to a function that returns an int. The first set of parentheses is necessary; without them, int star comp open paren close paren would say that comp is a function returning a pointer to an int, which is quite a different thing. The use of comp in the line if open paren open paren star comp close paren open paren v sub j comma v sub j plus gap close paren less than or equal to zero close paren is consistent with the declaration: comp is a pointer to the function, star comp is the function, and open paren star comp close paren open paren v sub j comma v sub j plus gap close paren is the call to it. The parentheses are needed so the components are correctly associated.

We've already shown strcmp, which compares two strings. Here is numcmp, which compares two strings on a leading numeric value. This is sample code from page 117 of the textbook, which you can see at www.cc4e.com/code.

numcmp open paren s1 comma s2 close paren, char star s1 comma star s2 semicolon, open curly brace, double atof open paren close paren comma v1 comma v2 semicolon, v1 equals atof open paren s1 close paren semicolon, v2 equals atof open paren s2 close paren semicolon, if open paren v1 less than v2 close paren return open paren minus one close paren semicolon, else if open paren v1 greater than v2 close paren return open paren one close paren semicolon, else return open paren zero close paren semicolon, close curly brace.

The final step is to add the function swap, which exchanges two pointers. This is adapted directly from what we presented earlier in the chapter. swap open paren px comma py close paren, char star px open square bracket close square bracket comma star py open square bracket close square bracket semicolon, open curly brace, char star temp semicolon, temp equals star px semicolon, star px equals star py semicolon, star py equals temp semicolon, close curly brace.

There are a variety of
other options that can be added to the sorting program; some make challenging exercises.

[Music]

This work is based on the 1978 C programming book written by Brian W. Kernighan and Dennis M. Ritchie. Their book is copyright, all rights reserved, by AT&T, but is used in this work under fair use because of the book's historical and scholarly significance, its lack of availability, and the lack of an accessible version of the book. The book is augmented in places to help understand its rightful place in a historical context amidst the major changes of the 1970s and 1980s, as computer science evolved from a hardware-first, vendor-centered approach to a software-centered approach where portable operating systems and applications written in C could run on any hardware. This is not the ideal book to learn C programming, because the 1978 edition does not reflect the modern C language. Using an obsolete book gives us an opportunity to take students back in time and understand how the C language was evolving as it laid the groundwork for a future with portable applications.

[Music]

So hello, welcome to chapter 6. In this chapter we talk about structures, but so much more. There is a mid-chapter surprise in this book, and I'm sure it's so surprising that it has caused a few too many people to drop out of a first computer science course. That's because in sections 6.1 through 6.4 we're learning the C language, and we're learning just what a structure is. It's a simple, beautiful, elegant concept. It's sort of a wrapper for a whole bunch of types; it groups them together so that you can create a new type, and at this point it really is the last foundational component of the core C language. Then in section 6.5 the authors pivot to talking about data structures, that is, the applications of C structures, and that's a foundational notion in computer science. It's the kind of thing where you ask, how do you build a Python dictionary in C? And so this is a
pattern we call the knee of the curve, where things are going along just fine up to 6.4. It's like, oh, here's a for loop, and here's a string, and here's an array, and here's even a pointer; that's not that hard, and structures are not that hard. But then we start talking about applications of structures, what we call data structures. Structures were named that because data structures was already a concept, but how we use structures is quite a next-level thing. We're kind of leveling up. So starting from here: chapter 6 is the last real chapter that I'm going to cover, but it is expanding, because chapter 6 is the beginning of a whole additional course, a course on data structures. So if you're rushing, I want you to slow down. I want you to take your time and understand, because if you understand this, you literally have a doorway into a lot of computer science. We'll even talk about recursion by the end. So just don't rush; work on mastery. These are complex concepts, and they are not natural to understand.

And so before we start, I want to do something different. I want to read you a poem, one of my favorite poems, from Robert Frost. I was fortunate enough to be a good friend of Bob Frost, who was a grandson of Robert Frost. Bob Frost had a connection to the University of Michigan, and Robert Frost has a Michigan and University of Michigan connection. I loved this poem that I'm about to read you long before I met Bob Frost, the grandson of Robert Frost. The poem that I talk a lot about, and really it's one line, "miles to go before I sleep", is called "Stopping by Woods on a Snowy Evening". To me it speaks of the notion that journeys are not short nor easy, and it's okay to accept that. So here we go, Dr. Chuck in poetry.

Whose woods these are I think I know. His house is in the village though; he will not see me stopping here to watch his woods fill up with snow. My little horse must think it queer to stop without a farmhouse near, between
the woods and frozen lake, the darkest evening of the year. He gives his harness bells a shake to ask if there is some mistake. The only other sound's the sweep of easy wind and downy flake. The woods are lovely, dark and deep, but I have promises to keep, and miles to go before I sleep, and miles to go before I sleep.

So the essence of this is that you have come a long way to get to 6.4 in this book, and it may feel like you've gone far enough and you should just pat yourself on the back. But after 6.4 there are miles to go before I sleep. The good news is that when you're done, you can relax. So what I want you to do is take your time. Things get much more complex really fast going forward, and I don't want to lose you.

[Music]

Structures. Structures are one of the most beautiful parts of C, like pointers. A structure is a user-defined type that contains one or more types within it. We call things like x and y, in this case, members of the structure: x is a member of struct point and y is a member of struct point. The dot operator allows us to take a structure variable, which has all these members within it, and access the members of the structure.

So an example, kr601.c: we have struct point open curly brace double x semicolon double y semicolon close curly brace semicolon, and this defines a new type; it doesn't allocate any data. Then we say struct point p1 comma p2 semicolon, and that actually allocates two points, which is four doubles, two in each of the points p1 and p2. Then I say p1.x = 3.0 and p1.y = 4.0, and then I say p2 = p1, which copies all the fields into the corresponding locations in p2. Then I print them out and I get three and four. So that's basic structures: simple, clean, elegant. Understand every line of this; memorize it.

When you do, in a sense, a call by value with a structure, the entire structure is placed on the stack, so you don't want to make structures too gigantically big. You know, if they're like 10 to 40
bytes, they're not all that bad. But when we make struct point pm, set x and y to three and four, and then call func with func open paren pm close paren, pm is a structure that is in the scope of the main program. We're calling func and passing it in, and then we're accepting it as a structure inside of func. The key thing there is that this is a copy, call by value, where the entire structure is allocated on the stack and then passed into func. So if we change it inside func, pf.x = 9.0 and pf.y = 8.0, it changes locally and we can print it out, and then when func returns, the pm that's in the main program is unchanged. That's because the entire structure pm is duplicated into the stack frame for function func. Operating inside function func, we're only messing with the copy that's in our stack frame, and when func returns that stack frame is discarded, so pm is just the way it was.

So those are just plain structures, but pointers to structures are where we get a lot more powerful. Here we have the same struct with a double x and a double y, and now we say struct point pt, which is a regular old variable, and then star pp, which is a pointer to a structure, an address of a structure. You can take the address of a structure just like you can take the address of an integer; a structure is a very fundamental type. So pp = &pt copies the address of the actual structure pt into the variable pp. Then we can say pt.x = 3.0, or open paren star pp close paren dot y equals 4.0. If we have a pointer to a structure, we have to use the asterisk to look up the actual structure, and then we can do structure things with it, like set the y value. There's a shortcut: taking an address, dereferencing it with the asterisk, and then using the dot are combined into what we call the arrow operator, which is pp->y, the same as open paren star pp close paren dot y. So we call it the arrow operator; it's kind of a shorthand, and
it's used all the time. You'll see when we look later at things like PHP that they adopted this arrow operator; most other object-oriented languages tend to use the dot operator, but some use the arrow operator.

So if we're going to pass a structure by reference, you use the ampersand in the call and the asterisk inside the function. We create struct point pm and set x and y within it to three and four respectively. Then we print it out, and then we call func, but now we call it with &pm, which says pass in the address of pm. In the function we take in pp, and we declare its type as struct point star pp, which means we are getting as a parameter an address, not the value. Then we use the arrow operator: pp->x = 9, pp->y = 8. We print that in the function, but because pp just points to pm, it's also changed in the copy that's in main. So pm is being changed at the moment that pp->x is being changed, and when it comes back you see that the values are 9 and 8 outside of the function. So this is simply passing by reference; it might almost be better to call it pass by address.

If you take a look at the stack frame, what you see is that when we're calling func open paren ampersand pm close paren, we are making a copy of the address, putting that into the stack frame, and passing that stack frame into func. So now func has the address of something that happens to not be in its domain, meaning that pm is still in main, but we can work with the arrow operator and actually make changes in the underlying object. The key there is that you could change pp itself and only affect the local copy of the address, but with pp->x you're actually changing the single copy that is in pm.

Storage allocation. Now pretty soon we're going to be dynamically allocating things. We've always said that, oh, you create an array of char with open bracket 10 and there are 10 things in it. What if we want more? We've said you can't reallocate this stuff, but now we're going to start allocating. So it turns out
when you're allocating things, you have to know how big they are. And so there's this sizeof operator. If we take struct point pt and star pp, which means we have a structure with two doubles in it and then a pointer to a structure with two doubles in it, and we simply ask what is the size of the structure, it's 16, because each double is eight. And then what is the size of a pointer to a structure? That is eight, because eight is how big addresses are. The fact that doubles and addresses happen to be the same size is not relevant here; addresses are eight characters on most modern 64-bit C systems. Because sizeof is an operator, you can give it not just a variable but also a type. So we can say sizeof open paren struct point close paren, and that also will be 16, because the size of a struct point, if we were going to allocate it, is 16 characters. So sizeof returns the size of something in characters.

And so we have this function called malloc, and you have to include stdlib.h to use it. Now what we're going to do is create the structure point again, and we're going to create a pointer, but we're not going to actually allocate the structure: struct point star pp allocates an eight-character address, not the actual two doubles. Now what we do is say pp equals, and first we're casting the returned address from malloc: open paren struct point star close paren, which casts it to a pointer to a struct point. And then we say malloc open paren sizeof open paren struct point close paren close paren, so that says malloc 16, because a struct point is 16 characters. malloc goes and finds some free memory for us in its pool of free memory and gives us back an address, which we then convert to a pointer to a point and store in pp. At that point we have data; we have a working structure, and we can set the x and the y values in the normal way. Whether we use the arrow operator or the star and dot operators, we can access that
information.

The next thing we're going to talk about is combining all these things, dynamic memory and structures, to create lists. This is a simple Python program: lines = list(), hand = open('romeo.txt'), then loop through and append each line after stripping the newline, and then print them out. So lines is a list object. Underneath it there is a data structure, which by the end of this course you're going to get to know really well. This will print out the four lines from romeo.txt, and this is kr67.py.

Now we're going to build this list structure, and then we can store some data in it. The entries of the list are going to be stored in dynamically allocated memory; each entry contains some data and then a link to the next entry. So we're going to create a thing called struct lnode open curly brace, char star text semicolon, which is a pointer to an array of characters of unknown length, struct lnode star next semicolon, close curly brace semicolon. If you construct a real live linked list, you also need to have two variables, struct lnode star head and struct lnode star tail. If we end up with a linked list of three things, head points to the first item in the list. Within that there is the text, and text points to some string, in this case the letter C, and then next is the address of the second thing in the list. In that second thing, text points to "is" and next points to the third thing in the list. Then next in the third thing in the list holds a value called NULL, which is our indication that it's an address to nowhere. NULL and zero are pretty much the same thing; all valid addresses are nonzero, so we look for an address of zero, and that tells us that we've got to the end. In order to append to this list, we have another pointer called tail, which points to the last element in the list. When we start the list, head and tail are both NULL. And so ultimately what we've done is create a dynamically
allocable structure where we can put any number of lines, so it's kind of like a Python list.

The first line of this, the while statement, reads a string value into line, the next line from the file. So we have a character buffer that we pre-allocated above this, and it now holds the string "fun", three characters plus an end-of-string. The next thing we're going to do is allocate a new string that is the same length plus one, so we're going to allocate four characters using malloc open paren strlen open paren line close paren plus one close paren, which is going to give us four. Then we get that address back, cast it to a char star, and assign it to save. So that's the place where we're going to save a copy of this new item.

The next thing we're going to do is allocate a brand new lnode, a brand new node in our list. We malloc sizeof open paren struct lnode close paren, cast it to a struct lnode star, and assign that into a struct lnode star variable named new. So we've saved our string, and we've got a pointer to that saved string and a pointer to an lnode that is empty at this point. Then we connect it: if tail is not equal to NULL, tail->next = new. So we take the about-to-be second-to-last item and connect it to the new last item. Then we take the text pointer inside of our newly allocated list node and point it at the saved copy of our string in dynamic memory, and we set next to be NULL. Then we simply advance tail, so that next time we do this, tail is pointing to the new end of the list. Now we've got one more thing to do, and that is, if head is NULL, we have to set head to new. That's only going to happen on the very first one. At this point our list has three entries; it went from two entries to three entries. So we can go back up and read the next line, and this sequence of statements
will figure it out. So I would just say take your time on this. One thing that we learn when we're working on linked data structures in C is that you've got to draw a picture. I literally can't do this from memory; I just draw the picture, and then it's really easy to do.

Sometimes you want to walk through a linked list. We tend to make a variable called cur, for current, which is a struct lnode star, a pointer to an lnode. This is the same as looping through a Python list. We set cur to head, because that's where we're going to start going through the list, and then we print out cur->text, which prints "C" out. Then we go to the bottom of the loop and advance cur to cur->next. It's kind of like adding one, but we just went from the first item in the list to the second item in the list. Then we do it again, and we are printing out the third item of the list, and then cur = cur->next gets to NULL, so the list is over and we have printed all of them out.

In addition to creating linked lists, we're going to do many different things with linked lists. We're going to delete things, sort things, find things, look things up, and change things. One of the things that will save you a lot of craziness when working with linked lists is to always draw pictures and arrows. It's just necessary. And frankly, in those programming exams, all they ever ask you to do is draw these pictures; drawing a picture of a hash map is a good example. So I'm just going to show you some pictures rather than showing you a bunch of code, and I think you can produce the code later.

I want to show you how you delete an item from a singly linked list. First you've got to find it. You've got to do the walking of the linked list, which I just showed you; you walk through and find the item that you want to delete. Now if you walk through and you don't find it,
there's nothing to delete. So our goal is to delete the line "is". But you've got to handle three cases. There's the easy one, which is where the thing you found is in the middle of the list; or the thing that you found is at the start of the list; or the thing you found is the last item in the list. And again, you've got to draw pictures, and you may have to draw them separately, because we're going to be adjusting head and tail in addition to the next values.

Now it turns out that when you're going to do a delete, you want to track not only the current item in the list but also the previous item in the list. So as you're walking down the list, cur moves ahead and prev moves right behind it; prev trails cur by one item. You can see that cur, the "is" line, is the one we're going to delete, and it's pretty simple. The only real thing that you've got to change is that you take prev->next and point it at cur->next. You can see on the right-hand side that the little link from "C" to "fun" has just bypassed "is". Now you do have to deallocate the struct lnode and the string, but that's pretty much all you need to do. So the middle is really easy: you find it, you keep track of prev, you have cur, and then you just bypass it by moving next. So that's the easy one.

The next thing that you've got to handle is, what if it's the first node? In this case prev will be NULL, because we have not really seen more than one; prev only goes non-NULL when we've passed the first one. So cur is pointing at the head of the list. Let's say we want to delete the line that says "C". We notice that prev is NULL, and so what we do is just move head to cur->next. You can see on the right-hand side that head has now bypassed that first line. That's all we've got to do, except clean up the memory of the first entry, both the string and the struct lnode. So that's pretty easy. You detect it by noticing that prev
is NULL, because you're catching it the first time through the loop, and prev trails cur by one; if you have not seen the second one, prev is still NULL. So that's a pretty easy one, but you've got to check it; you've got to check them all. So for this one, if prev is NULL, you're sitting at the front, so you mess with head, and that's all you've got to do other than get rid of the data and free it up.

The last node is perhaps a little bit trickier. How you know you're on the last node is that cur->next is NULL, because if you're pointing at the very last node in the list, then its next is NULL. So if cur->next is NULL, you know you're deleting the last one. You first set prev->next to be NULL, because that's the new last one, and then you have to update tail to point to prev. So tail was pointing at cur, and at the end tail points to prev. Then you've got to clean up the data of that formerly last item in the list.

So now we're going to talk about doubly linked lists, and the main purpose of having doubly linked lists is to be able to reverse a list. In Python it's easy: there's a method in the list object called reverse. What we're about to see is one way a reverse like that can be made to work. So you just read the lines in and you say lines.
reverse and then you print them out and they come out and backwards order now somebody probably G van rossom who wrote this in the first place in 1991 he is writing a doubly link list to make reverse easy so if you remember how we did the um deleting where we kept track of prieve well in a doubly link list we actually just have preve in the L node and so in addition to the text we're going to store we're going to have a pre pointer and a next pointer and um at the beginning of the list the pre will be null and at the end of the list next will be null it's called a doubly link list because it has both reverse and forward chains of pointers and we keep them working all the time so it turns out that making a doubly link list in terms of code is not that different we have another thing in the L node we now have three things Str struct L node star preve and as we're linking a new thing onto the end of the list we basically take new preve the new new item we're adding and we copy tail into that because tail points to the last one we're adding a new one to the last one and the previous one is tail and then we do that before we set tail to new so we add a new one and then we set tail to new so it's not that hard so this is an example of the three item list with all of the previous and nexts properly uh properly shown and so you it it just links together once you draw these things in a picture and you get the understanding of what they're talking about they're not too hard and section 6.5.1 in the book which I added actually um walks through this W link list in some details so I won't replicate that here now let's just say that you have a doubly link list and you want to now reverse it well it's pretty simple you set current to tail same as long as current stays n not null and then say current equals current prieve so you're you're sort of popping back up one at a time so the second time through the loop current has followed the preve from the last item and now it's on the 
second-to-last item, and then it does it again and goes to the top. The last time, it goes up to the prev of the top one and finds NULL, and then the loop finishes because current has become NULL, and you've printed out the three lines, cool is C, backwards. So this is just an example in k69 Doc, and I won't walk through it in this lecture; I'll let you take a look at that.

It's quite common to encode all this stuff in some functions rather than just in straight-line code. It's not all that hard. You just want to pass in the list structure and the line. A list is passed in by reference, so you have to say ampersand my list on this list add function; inside the function, you have the list be a pointer to that structure, and then you have to use the arrow syntax inside the function.

The next thing I want to talk about is unions. A union is like a structure, but it reuses the same memory over and over, whereas a structure allocates more memory for each member. A union says: I'm going to take this same piece of memory and I'm going to view it different ways. So it's the same area, and you can assign multiple types to it. This is useful in things like network protocols and pulling bits out of memory. In my current sample I've got union sample { int i; char ca[4]; float f; }; What I have done here is allocated, in a sense, four bytes. Integers in this case are 32 bits; ca[4] is four bytes, and 4 * 8 is 32; and floats are 32-bit floating-point numbers. So that's the same amount of memory. Instead of being 4 * 3, or 12 bytes, this is actually four bytes that I can view either as an integer, or as a character array of four characters, or as a float, and I've carefully lined them up so that they're the same width. I allocate union sample u, and then I set the integer version of those four bytes to 42, and then I print it out, and you can see the hex;
as floating point, that's a complete failure: it's 0.0, which is not a very good floating-point number. And then as a character string, it is an asterisk. Then I take u.ca, which is viewing that same area of memory as a character array, and I copy "Abc" into it, and you'll see that it prints out as a string as Abc. The floating-point number is still zeros; it's still kind of a bad floating-point number. But then I see the hex as 00636241. Now this is a little-endian computer that I'm running on, so the A is the 41, the b is the 62, the c is the 63, and the zero is the end-of-string indicator. That's why I picked very carefully, only copying a three-character string into a four-character array, so I could copy that end-of-string indicator.

Then I take u.f, which is taking that exact same memory and perceiving it as a floating-point number, and I put 1/3 into it. So now when I print that out it's 0.33. Lovely. When I print it out as a string, it is pretty bad. It turns out that the first three characters are the string, but there's no zero at the end of the string, so that greater-than is there, and it just keeps on going into memory. All that stuff after it, the greater-than, at sign, question mark, HK, that's just random garbage on the stack somewhere, because the %s is wandering randomly through memory at this point. Then I print that out as hex. If we wanted to, we could learn something about the IEEE floating-point format, but 3EAAAAAB is 1/3, and that is a base-two repeating 1/3 with an exponent. Floating-point internal formats are not the subject of this course.

I've accomplished everything I wanted you to know about C, so the next topic is going to be object-oriented programming. But not just how to use object-oriented programming in Python or whatever; C doesn't have it. What I want to do is look at, if we were writing Python itself in C, which is what CPython is, how would we have to build
things like a list structure, a list object, a string object, and a dictionary object? How would we build them? And we'll take a look at other programming languages that have object-oriented features, like C++ and Java. So really the next bit is going to be about the implementation details of object-oriented programming. [Music]

Welcome to C Programming for Everybody. My name is Charles Severance, and this is my reading of the 1978 C programming book written by Brian Kernighan and Dennis Ritchie. At times I add my own interpretation of the material from a historical perspective.

Chapter 6: Structures. A structure is a collection of one or more variables, possibly of different types, grouped together under a single name for convenient handling.

While we talk about data structures and how to use them in every language, this section is about understanding how software developers carefully control the low-level shape of their data items to solve their problems. When you first learn about the C struct keyword, you might think it's equivalent to a Python dict, a dynamic key-value store like a PHP array, Java Map, or JavaScript object, but nothing is further from the truth. These other languages provide us with easy-to-use data structures where all the challenging problems are solved. This chapter told, or foretold, the creators of Python, PHP, Java, and JavaScript how to solve the complex problems and build convenient and flexible data structures, which we now all use in those object-oriented languages. One way to look at the code in this chapter is to think of it as a lesson on how one might build Python's list and dict data structures. If the code in the chapter takes you a little while to figure out, mentally make a note of thanks for all the hard work that modern languages invest to make their high-level data structures flexible and easy to use.

Back to the textbook. The traditional example of a structure is the payroll record: an employee is described by a set of attributes such as name, address,
social security number, salary, etc. Some of these in turn could be structures: a name has several components, as does an address, and even a salary. Structures help organize complicated data, particularly in large programs, because in many situations they permit a group of related variables to be treated as a unit instead of as separate entities. In this chapter we'll try to illustrate how structures are used. The programs that we will use are bigger than many others in the book, but are still of modest size.

Section 6.1: Basics. Let us revisit the date conversion routines of Chapter 5. A date consists of several parts, such as the day, month, and year, and perhaps the day of the year and the month name. These five variables can all be placed in a single structure like this: struct date { int day; int month; int year; int yearday; char mon_name[4]; };

The keyword struct introduces a structure declaration, which is a list of declarations enclosed in braces. An optional name called the structure tag may follow the word struct, as with date here. The tag names this kind of structure, and can subsequently be used as shorthand for the detailed declaration. The elements or variables mentioned in a structure are called its members. A structure member or tag and an ordinary (i.e., non-member) variable can have the same name without conflict, since they are always distinguished by context. Of course, as a matter of style one would normally use the same names only for closely related objects.

The right brace that terminates the list of members may be followed by a list of variables, just as for any basic type. That is, struct { ... } x, y, z; is syntactically analogous to int x, y, z; in the sense that each statement declares x, y, and z to be variables of the named type and causes space to be allocated for them. A structure
declaration that is not followed by a list of variables allocates no storage; it merely describes a template or the shape of a structure. If the declaration is tagged, however, the tag can be used later in definitions of actual instances of the structure. For example, given the declaration of date above, struct date d; defines a variable d which is a structure of type date. An external or static structure can be initialized by following its definition with a list of initializers for the components: struct date d = { 4, 7, 1776, 186, "Jul" };

A member of a particular structure is referred to in an expression by a construction of the form structure-name.member. The structure member operator . connects the structure name and the member name. To set leap from the date in structure d, for example: leap = d.year % 4 == 0 && d.year % 100 != 0 || d.year % 400 == 0; or to check the month name: if (strcmp(d.mon_name, "Aug") == 0) ... or to convert the first character of the month name to lowercase: d.mon_name[0] = lower(d.mon_name[0]);

Structures may be nested. A payroll record might actually look like: struct person { char name[NAMESIZE]; char address[ADRSIZE]; long zipcode; long ss_number; double salary; struct date birthdate; struct date hiredate; }; The person structure contains two dates. If we declare emp as struct person emp; then emp.birthdate.month refers to the month of birth. The structure member operator . associates left to right.

Section 6.2: Structures and Functions. There are a number of restrictions on C structures. The essential rules are that the only operations you can perform on a structure are to take its address with & and to access one of its members. This implies that structures may not be assigned to or copied as a unit, and that they cannot be passed to or returned from functions. (These restrictions will be removed in forthcoming versions.) Pointers to structures do not suffer these limitations, however, so structures and functions do work together comfortably. Finally, automatic structures, like automatic arrays, cannot be initialized; only external or static structures can.

This prediction was indeed accurate. Modern C compilers do support the copying of a structure with a single assignment statement. Given that a C structure is just a fixed-length block of memory, it's easy to generate machine code to copy it. A key thing to remember is that when a C structure is copied, it is done as a shallow copy. A shallow copy copies the values of the variables and the pointers in the structure, but does not make copies of any data that the pointers point to. If a structure contains other structures (i.e., not pointers to structures), then those structures are shallow copied as well.

Back to the text. Let us investigate some of these points by rewriting the date conversion functions of the last chapter to use structures. Since the rules prohibit passing a structure to a function directly, we must either pass the components separately or pass a pointer to the whole thing. The first alternative uses day_of_year as we wrote it in Chapter 5: d.yearday = day_of_year(d.year, d.month, d.day); The other way is to pass a pointer. If we've declared hiredate as struct date hiredate; and rewritten day_of_year, we can then say hiredate.yearday = day_of_year(&hiredate); to pass a pointer to hiredate to day_of_year. The function has to be modified because its argument is now a pointer rather than a list of variables. This example code is on page 122 of the textbook, and you can see it at www.cc4e.com/code.

struct date { int day; int month; int year; int yearday; char mon_name[4]; }; static int day_tab[2][13] = { {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}, and then another list just as long as that, with a closing curly brace and a semicolon. That just initializes the lookup table for the days in each month. Now on to the function:

    day_of_year(pd)
    struct date *pd;
    {
        int i, day, leap;

        day = pd->day;
        leap = pd->year % 4 == 0 && pd->year % 100 != 0 || pd->year % 400 == 0;
        for (i = 1; i < pd->month; i++)
            day += day_tab[leap][i];
        return(day);
    }

The declaration struct date *pd; says that pd is a pointer to a structure of type date. The notation exemplified by pd->year (I think I'll call that "pd right arrow" at this point, because that's really what it is: the minus sign and greater-than look like an arrow to the right, so I'm going to call it right arrow, but it's really two characters) is new. If p is a
pointer to a structure, then p->member-of-structure refers to the particular member. (The operator -> is a minus sign followed by a greater-than.) Since pd points to the structure, the year member could also be referenced as (*pd).year, but pointers to structures are so frequently used that the -> notation is provided as a convenient shorthand. The parentheses are necessary in (*pd).year because the precedence of the structure member operator . is higher than that of the pointer lookup operator *. Both -> and . associate from left to right, so p->q->memb and emp.birthdate.month are (p->q)->memb and (emp.birthdate).month.

For completeness, here is the other function, month_day, rewritten to use the structure. This is the first example on page 123 of the text; you can look at the source code at www.cc4e.com/code. I won't read the struct date definition and the static int day_tab definition; we'll just go to month_day:

    month_day(pd)
    struct date *pd;
    {
        int i, leap;

        leap = pd->year % 4 == 0 && pd->year % 100 != 0 || pd->year % 400 == 0;
        pd->day = pd->yearday;
        for (i = 1; pd->day > day_tab[leap][i]; i++)
            pd->day -= day_tab[leap][i];
        pd->month = i;
    }

The structure operators -> and ., together with () for argument lists and [] for subscripts, are at the top of the precedence hierarchy and thus bind very tightly. For example, given the declaration struct { int x; int *y; } *p; then ++p->x increments x, not p, because the implied parenthesization is ++(p->x). Parentheses can be used to alter the binding: (++p)->x increments p before accessing x, and (p++)->x increments p afterward. (This last set of parentheses is unnecessary.) In the same way, *p->y fetches whatever y points to; *p->y++ increments y after accessing whatever it points to (just like *s++); (*p->y)++ increments whatever y points to; and *p++->y increments p after accessing whatever y points to.

Section 6.3: Arrays of Structures. Structures are especially suitable for managing arrays of related variables. For instance, consider a program to count the occurrences of each C keyword. We need an array of character strings to hold the names, and an array of integers to hold the counts. One possibility is to use two parallel arrays, keyword and keycount, as in char *keyword[NKEYS]; int keycount[NKEYS]; But the very fact that the arrays are parallel indicates that a different organization is possible. Each keyword entry is really a pair: char *keyword; int keycount; and there is an array of pairs. The structure declaration struct key { char *keyword; int keycount; } keytab[NKEYS]; defines an array keytab of structures of this type and allocates storage for them. Each element of the array is a structure. This could also be written struct key { char *keyword; int keycount; }; struct key keytab[NKEYS];

Since the structure keytab actually contains a constant set of names, it's easiest to
initialize it once and for all when it's defined. The structure initialization is quite analogous to the earlier ones: the definition is followed by a list of initializers enclosed in braces: struct key { char *keyword; int keycount; } keytab[] = { "break", 0, "case", 0, "char", 0, and so forth, down to "unsigned", 0, "while", 0 }; These initializers are listed in pairs corresponding to the structure members. It would be more precise to enclose the initializers for each row or structure in braces, as in { "break", 0 }, { "case", 0 }, and so forth, but the inner braces are not necessary when the initializers are simple variables or character strings and when all are present. As usual, the compiler will compute the number of entries in the array keytab if initializers are present and the [] is left empty.

The keyword counting program begins with the definition of keytab. The main routine reads the input by repeatedly calling a function getword that fetches the input one word at a time. Each word is looked up in keytab with a version of the binary search function we wrote in Chapter 3. (Of course, the list of keywords has to be given in increasing order for this to work.) Here is the first example on page 125 of the textbook; you can see it at www.cc4e.com/code.

    #include <stdio.h>
    #define MAXWORD 20
    #define LETTER 'a'

    main()
    {
        int n, t;
        char word[MAXWORD];

        while ((t = getword(word, MAXWORD)) != EOF)
            if (t == LETTER)
                if ((n = binary(word, keytab, NKEYS)) >= 0)
                    keytab[n].keycount++;
        for (n = 0; n < NKEYS; n++)
            if (keytab[n].keycount > 0)
                printf("%4d %s\n", keytab[n].keycount, keytab[n].keyword);
    }

And binary, to find the word in the table:

    binary(word, tab, n)
    char *word;
    struct key tab[];
    int n;
    {
        int low, high, mid, cond;

        low = 0;
        high = n - 1;
        while (low <= high) {
            mid = (low+high) / 2;
            if ((cond = strcmp(word, tab[mid].keyword)) < 0)
                high = mid - 1;
            else if (cond > 0)
                low = mid + 1;
            else
                return(mid);
        }
        return(-1);
    }

So that's really a rewrite of the binary function from the earlier chapter, where we're taking the keyword and the count and looking up in the array, but then using .keyword to find the actual keyword.

Back to the text. We'll show the function getword in a moment; for now it suffices to say that it returns LETTER each time it finds a word, and copies the word into its first argument.

The quantity NKEYS is the number of keywords in keytab. Although we could count this by hand, it's a lot easier and safer to do it by machine, especially if the list is subject to change. One possibility would be to terminate the list of initializers with a null pointer, and then loop along keytab until the end is found. But this is more than is needed, since the size of the array is completely determined at compile time. The number of entries is just the size of keytab divided by the size of struct key. C provides a compile-time unary operator called sizeof which can be used to compute the size of any object. The expression sizeof(object) yields an integer equal to the size of the specified object. (The size is given in unspecified units called "bytes", which are the same size as a char.) The object can be an actual variable or array or structure, or the name of a basic type like int or double, or the name of a derived type like a structure. In our case, the number of keywords is the array size divided by the size of one array element. This computation is used in a #define statement to set the value of NKEYS: #define NKEYS (sizeof(keytab) / sizeof(struct key))

Now for the function getword. We have actually written a more general getword than is necessary for this program, but it is not really much more complicated. getword returns the next "word" from the input, where a word is either a string of letters and digits beginning with a letter, or a single character. The type of the object is returned as the function value; it is LETTER if the token is a word, EOF for end of file, or the character itself if it is non-alphabetic. This sample code is on page 127 of the textbook, which you can see at www.cc4e.com/code.

    #define LETTER 'a'
    #define DIGIT '0'

    getword(w, lim)
    char *w;
    int lim;
    {
        int c, t;

        if (type(c = *w++ = getch()) != LETTER) {
            *w = '\0';
            return(c);
        }

That if statement has got some stuff going on in it; you might want to look at it very closely.

        while (--lim > 0) {
            t = type(c = *w++ = getch());
            if (t != LETTER && t != DIGIT) {
                ungetch(c);
                break;
            }
        }
        *(w-1) = '\0';
        return(LETTER);
    }

That example code has a lot of stuff about pointers and incrementing pointers and dereferencing pointers, etc., so take a good look at that code.

Back to the text. getword uses the routines getch and ungetch, which we wrote in Chapter 4. When the collection of an alphabetic token stops, getword has gone one character too far. The call
to ungetch pushes that character back on the input for the next call. getword calls another function, type, to determine the type of each individual character of input. Here is a version that's only for ASCII. This code is the second example on page 127 of the textbook, at www.cc4e.com/code.

    type(c)
    int c;
    {
        if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z')
            return(LETTER);
        else if (c >= '0' && c <= '9')
            return(DIGIT);
        else
            return(c);
    }

The symbolic constants LETTER and DIGIT can have any values that do not conflict with non-alphanumeric characters and EOF; the obvious choices are 'a' and '0'. getword can be faster if calls to the function type are replaced by references to an appropriate array type[]. The standard C library provides macros like isalpha and isdigit which operate in this manner.

Section 6.4: Pointers to Structures. To illustrate some of the considerations involved with pointers and arrays of structures, let us write the keyword counting program again, this time using pointers instead of array indices.

As an aside, I would note that it's a classic early assignment in any programming language to write a word frequency program. Here is a Python program from my Python for Everybody course to count words from an input stream:

    handle = open('romeo.txt', 'r')
    words = handle.read().split()
    counts = dict()
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    print(counts)

This section of the book implements a less general word counting program in C. The code depends on several functions from earlier in the book, and the code below is pretty complex, because the programmer only has access to a low-level language without powerful and easy-to-use data types like Python's list or dict. It is likely that Guido van Rossum read this book, took a long look at this code, and designed the dict data structure in Python so the rest of us could write a data parsing and word frequency program in the above six lines of code without worrying about dynamic memory allocation, pointer management, string length, and a myriad of other details that must be handled when you're solving this problem in C. Since Python is open source, you can actually look at the C code that implements the dict object in a file called dictobject.c. It is almost 6,000 lines of code and includes other files of utility code. Thankfully, we only have to write one line of Python to use it: counts = dict(). We will leave the complex bits to the C programmers who build and maintain it. This section is not showing us how to use the Python dict object; rather, it is showing how one would build a dict-like structure using C.

So, continuing with Section 6.4, Pointers to Structures. The external definition of keytab does not need to change, but main and binary do need modification. This is the example on page 129 of the textbook, and it is available at www.cc4e.com/code. This is the pointer version of counting C keywords:

    main()
    {
        int t;
        char word[MAXWORD];
        struct key *binary(), *p;

        while ((t = getword(word, MAXWORD)) != EOF)
            if (t == LETTER)
                if ((p = binary(word, keytab, NKEYS)) != NULL)
                    p->keycount++;
        for (p = keytab; p < keytab + NKEYS; p++)
            if (p->keycount > 0)
                printf("%4d %s\n", p->keycount, p->keyword);
    }

And now we look at the binary search:

    struct key *binary(word, tab, n)
    char *word;
    struct key tab[];
    int n;
    {
        int cond;
        struct key *low = &tab[0];
        struct key *high = &tab[n-1];
        struct key *mid;

        while (low <= high) {
            mid = low + (high-low) / 2;
            if ((cond = strcmp(word, mid->keyword)) < 0)
                high = mid - 1;
            else if (cond > 0)
                low = mid + 1;
            else
                return(mid);
        }
        return(NULL);
    }

There are several things worthy of note here. First, the declaration of binary must indicate that it returns a pointer to the structure type key instead of an integer. This is declared both in main and in binary. If binary finds the word, it returns a pointer to it; if it fails, it
returns NULL. Second, all the accessing of elements of keytab is done by pointers. This causes one very significant change in binary: the computation of the middle element can no longer be simply mid equals open paren low plus high close paren divided by two, because the addition of two pointers will not produce any kind of useful answer, even when divided by two, and is in fact illegal. This must be changed to mid equals low plus open paren high minus low close paren divided by two, which sets mid to point to the element halfway between low and high. You should also study the initializers for low and high. It is possible to initialize a pointer to the address of a previously defined object, and that is precisely what we have done here. In main we wrote for open paren p equals keytab semicolon p less than keytab plus NKEYS semicolon p plus plus close paren. If p is a pointer to a structure, any arithmetic on p takes into account the actual size of the structure, so p plus plus increments p by the correct amount to get to the next element in the array of structures. But don't assume that the size of a structure is the sum of the sizes of its members, because alignment requirements for different objects may cause holes in the structure. Finally, an aside on program format. When a function returns a complicated type, as in struct key star binary open paren word comma tab comma n close paren, the function name can be hard to see and to find with a text editor. Accordingly, an alternate style is sometimes used, on two lines: struct key star on the first line, and binary open paren word comma tab comma n close paren on the second. This is mostly a matter of personal taste; pick the form you like and hold to it.

Section 6.5, Self-referential Structures. Before we start this section, a slightly longer aside from your narrator. Up to now I've resisted the temptation to augment the book with my own bits of code, but we have reached the single point in the book where I feel that there is too big of a conceptual leap between
two sections, so I'm going to add some of my own narrative between sections 6.4 and 6.5. The rest of this chapter talks very nicely about binary trees and hash tables, both essential low-level data structures in computer science and both excellent ways to understand pointers and how C can be used to build data structures like the Python dictionary. However, the authors skipped separately describing the structure of a dynamically constructed linked list, which is the first and foundational collection data structure in computer science and should be understood before moving to tree and hash map structures. Linked lists form the foundation of the Python list object, Java Array object, PHP numeric key arrays, and JavaScript arrays. The linked list can be dynamically extended, and items can be added in the middle efficiently, as well as being pushed or popped on or off the front or back of the list. Linked lists are also used to implement queues as well as other aspects of operating systems. I will attempt to mimic the authors' writing style in this new section of the book. I'll write some sample code using a more modern dialect of C so it's easier to run on a modern compiler.

Section 6.5.1, Linked Lists, bonus content. Suppose we want to read a file and print the file in reverse order. We don't know how many lines will be in the file before we read the file, so we can't simply use an array of pointers to strings and character arrays like lines. In a sense we need a dynamic array that grows as we encounter new lines. When we reach the end of the file, we then just loop through our stored lines from the end to the beginning so we can print them out in reverse order. One solution is to make a data structure called a doubly linked list of character strings. In addition to each line of data, we will store a pointer to the previous line and the next line, as well as a pointer to the first item we add to the list, which we will call the head of the list, and the most recent item
we've added to the list, which we will call the tail of the list. We will see a singly linked list as part of the hash map data structure in a following section. A singly linked list can only be traversed in a forward direction; a doubly linked list can be traversed either forwards or backwards. Given that our linked list of strings will keep expanding as we get new lines, we avoid hard-coding array sizes like the pound define MAXLEN 1000 in the previous chapter, where we were building a program to sort a file. Going back to the description of a line in our doubly linked list, it is clearly a structure with three components: struct lnode open curly brace char star text semicolon struct lnode star prev semicolon struct lnode star next semicolon close curly brace semicolon. This recursive definition of lnode might look chancy, but it's actually quite correct. It is illegal for a structure to contain an instance of itself, but struct lnode star prev declares prev to be a pointer to an lnode, not an lnode itself. We'll write this code in a more modern C dialect, using modern memory allocation and I/O routines provided by the standard C library. This code is on page 130 of the textbook, available at www.cc4e.com/code: pound include stdio.h, pound include stdlib.h,
pound include string.h, pound define MAXLINE 1000; this is the length of a line, not the number of lines. struct lnode open curly brace char star text semicolon struct lnode star prev semicolon struct lnode star next semicolon close curly brace semicolon. Now we have our main program to print the lines in reverse, and we will use int main open paren close paren because we're coding in a modern dialect of C: open curly brace struct lnode star head equals NULL semicolon struct lnode star tail equals NULL semicolon char line open square bracket MAXLINE close square bracket semicolon while open paren fgets open paren line comma MAXLINE comma stdin close paren not equal NULL close paren open curly brace char star save equals open paren char star close paren malloc open paren strlen open paren line close paren plus one close paren semicolon strcpy open paren save comma line close paren semicolon struct lnode star new equals open paren struct lnode star close paren malloc open paren sizeof open paren struct lnode close paren close paren semicolon new right arrow text equals save semicolon new right arrow next equals NULL semicolon new right arrow prev equals tail semicolon if open paren tail not equal NULL close paren tail right arrow next equals new semicolon tail equals new semicolon if open paren head equal equal NULL close paren head equals new semicolon close curly brace to finish the while. Now we'll print it all out: for open paren struct lnode star current equals tail semicolon current not equal to NULL semicolon current equals current right arrow prev close paren open curly brace printf open paren double quote percent s double quote comma current right arrow text close paren semicolon close curly brace to finish the for, and then close curly brace to finish the main. Interestingly, if we wanted to print the list in forward order, or if we had only a singly linked list, our loop would look as follows: for open paren struct lnode star current equals head semicolon current not equal to NULL semicolon current equals current right arrow next close paren open curly brace printf open paren double quote percent s double quote comma current right arrow text close paren semicolon close curly
brace. In general, we use the variable names head, tail, and current, as well as next and prev or similar names, when writing code that builds or uses a linked list, so other programmers will quickly understand what we are talking about. After a while, reading a for loop to traverse a linked list becomes as natural as reading a for loop that progresses through a sequence of numbers.

Section 6.5.2, Binary Trees. Suppose we want to handle the more general problem of counting the occurrences of all the words in some input. Since the list of words isn't known in advance, we can't conveniently sort it and use a binary search. Yet we can't do a linear search for each word as it arrives to see if it's already been seen; the program would take forever. More precisely, its expected running time would grow quadratically with the number of input words. How can we organize the data to cope efficiently with a list of arbitrary words? One solution is to keep the set of words seen so far sorted at all times, by placing each word into its proper position in the order as it arrives. This shouldn't be done by shifting the words in a linear array, though; that also takes too long. Instead we will use a data structure called a binary tree. The tree contains one node per distinct word. Each node contains a pointer to the text of the word, a count of the number of occurrences of the word, a pointer to the left child node, and a pointer to the right child node. No node may have more than two children; it might have only zero or one. The nodes are maintained so that at any node the left subtree contains only words that are less than the word at the node, and the right subtree contains only words that are greater. To find out whether a new word is already in the tree, one starts at the root and compares the new word to the word stored at that node. If they match, we've found it. If the new word is less than the tree word, the search continues down the left child; otherwise the right child is searched. If there's
no child in the required direction, then the new word is not in the tree, and in fact the proper place for it to be is the missing child. This search process is inherently recursive, since the search from any node uses a search from one of its children. Accordingly, recursive routines for inserting and printing will be the most natural. Going back to the description of a node, it is clearly a structure with four components: struct tnode open curly brace char star word semicolon int count semicolon struct tnode star left semicolon struct tnode star right semicolon close curly brace semicolon. This recursive declaration of a node might look chancy, but it's actually quite correct. It is illegal for a structure to contain an instance of itself, but struct tnode star left semicolon declares left to be a pointer to a node, not a node itself. The code for the whole program is surprisingly small, given a handful of supporting routines that we've already written. These are getword, to fetch each input word, and alloc, to provide space for squirreling the words away. The main routine simply reads words with getword and installs them in the tree with tree. This is the first example on page 131 of the textbook, which you can see at www.cc4e.com/code: pound include stdio.h, pound define MAXWORD 20, pound define LETTER single quote a single quote, main open paren close paren open curly brace struct tnode star root comma star tree open paren close paren semicolon char word open square bracket MAXWORD close square bracket semicolon int t semicolon root equals NULL semicolon while open paren open paren t equals getword open paren word comma MAXWORD close paren close paren not equal EOF close paren if open paren t equal equal LETTER close paren root equals tree open paren root comma word close paren semicolon treeprint open paren root close paren semicolon close curly brace. The routine tree itself is straightforward. A word is presented by main to the top level, the root, of the tree. At each stage, that word is compared to the word already stored at the node, and is percolated down either to
the left or right subtree by a recursive call to tree. Eventually the word either matches something already in the tree, in which case the count is incremented, or a null pointer is encountered, indicating that a node must be created and added to the tree. If a new node is created, tree returns a pointer to it, which is installed in the parent node. This is the example on page 132 of the textbook at www.cc4e.com/code: pound include string.h, struct tnode open curly brace char star word semicolon int count semicolon struct tnode star left semicolon struct tnode star right semicolon close curly brace semicolon. struct tnode star tree open paren p comma w close paren, we're going to install w at or below p, struct tnode star p semicolon char star w semicolon open curly brace struct tnode star talloc open paren close paren semicolon char star strsave open paren close paren semicolon int cond semicolon if open paren p equal equal NULL close paren open curly brace, we've got a new word, p equals talloc open paren close paren semicolon to make a new node, p right arrow word equals strsave open paren w close paren semicolon p right arrow count equals 1 semicolon p right arrow left equals p right arrow right equals NULL semicolon close curly brace else if open paren open paren cond equals strcmp open paren w comma p right arrow word close paren close paren equal equal zero close paren p right arrow count plus plus semicolon, to indicate that we've seen the word one more time, else if open paren cond less than zero close paren, lower will go into the left part of the tree, p right arrow left equals tree open paren p right arrow left comma w close paren semicolon else p right arrow right equals tree open paren p right arrow right comma w close paren semicolon return open paren p close paren semicolon close curly brace. And that's a bit of code. It's not much, and it's beautiful. Recursion is happening. The return p at the very end is really, really important: each call assigns the returned pointer back into the parent's left or right field, so we're always overwriting it, but that's okay, because the overwriting works its way back up through the
recursion, because it's recursive and it's using pointers. Well, we go back to the text. Storage for the new node is fetched by a routine called talloc, which is an adaptation of the alloc we wrote earlier. It returns a pointer to a free space suitable for holding a tree node. We'll discuss this more in a moment. The new word is copied to a hidden place by strsave, the count is initialized, and the two children are made null. This part of the code is executed only at the edge of the tree, when a new node is being added. We have unwisely, for a production program, omitted error checking on the values returned by strsave and talloc. treeprint prints the tree in left subtree order: at each node, it prints the left subtree, all the words less than this word, then the word itself, then the right subtree, all the words greater. If you feel shaky about recursion, draw yourself a tree and print it with treeprint; it's one of the cleanest recursive routines you can find. This example code is on page 133 of the textbook, which you can see at www.cc4e.com/code. I won't read the struct tnode code, just treeprint: treeprint open paren p close paren struct tnode star p semicolon open curly brace if open paren p not equal to NULL close paren open curly brace treeprint open paren p right arrow left close paren semicolon printf open paren double quote percent 4d space percent s backslash n double quote comma p right arrow count comma p right arrow word close paren semicolon treeprint open paren p right arrow right close paren semicolon close curly brace for the if, and then close curly brace for the treeprint function. Again, I agree with the authors that this is one of the cleanest, most beautiful, and most applicable uses of recursion that you will probably ever see. I'm not a fan of recursion in all use cases, but you really can't do this any other way. Well, back to the text. As a practical note, if the tree becomes unbalanced because the words don't arrive in random order, the running time of the program can
grow too fast. As a worst case, if the words are already in order, this program does an expensive simulation of linear search. There are generalizations of the binary tree, notably 2-3 trees and AVL trees, and I would add balanced binary trees, which do not suffer from this worst-case behavior, but we will not describe them here. Before we leave this example, it's also worth a brief digression on a problem related to storage allocators. Clearly it's desirable that there be only one storage allocator in a program, even though it allocates different kinds of objects. But if one allocator is to process requests for, say, pointers to chars and pointers to struct tnodes, two questions arise. First, how does it meet the requirement of most real machines that objects of certain types must satisfy alignment restrictions? For example, integers often must be located on even addresses. Second, what declarations can cope with the fact that alloc necessarily returns different kinds of pointers? Alignment requirements can generally be satisfied easily, at the cost of some wasted space, merely by ensuring that the allocator always returns a pointer that meets all alignment restrictions. For example, on the PDP-11 it is sufficient that alloc always returns an even pointer, since any type of object may be stored at an even address. The only cost is a wasted character on odd-length requests. Similar actions are taken on other machines. Thus the implementation of alloc may not be portable, but its usage is. The alloc of chapter 5 does not guarantee any particular alignment; in chapter 8 we'll show how to do the job right. As an aside, by now you know that when the authors mention the PDP-11, they are sharing some aspect of the challenge of making C work on previous-generation computers with short memory words and small amounts of memory, and at the same time making it work well on the incoming generation of computers with larger words and more memory. The research, thought, and care that went into making
sure the C code was portable across multiple generations of computer hardware is on display in the previous paragraph. The question of the type declaration for alloc is a vexing one for any language that takes its type checking seriously. In C, the best procedure is to declare that alloc returns a pointer to char, and then explicitly coerce the pointer into the desired type with a cast. That is, if p is declared as char star p semicolon, then open paren struct tnode star close paren p converts it into a tnode pointer in an expression. Thus talloc is written as struct tnode star talloc open paren close paren open curly brace char star alloc open paren close paren semicolon return open paren open paren struct tnode star close paren alloc open paren sizeof open paren struct tnode close paren close paren close paren semicolon close curly brace. This is more than is needed for current compilers, but represents the safest course for the future. I would add that these concerns that the authors mention in this section are also nicely resolved in modern C compilers. In the ANSI version of C, they introduced the notion of the void type. The void type indicates the lack of a type, much like NULL is used to indicate not a valid pointer. In 1978, because the char type was generally the most native type on any system, it was often used as the generic pointer needed to return memory from an allocation function. In modern C we use pointers to void, and then cast the returned pointer to be a pointer to whatever struct or other data we just allocated. If we were writing the alloc routine in this book using modern C, it would return a pointer to void. The 1978 version is char star alloc open paren close paren, and the modern version is void star alloc open paren close paren. We've left the book alone; we haven't used void throughout the book. But it is a testament to the foresight of the authors that all the pointer casting code in this book still works today the same, regardless of whether the memory allocation functions return char or void pointers to the
allocated data.

Section 6.6, Table Lookup. As an aside, in this section we finish our quick tour of the implementation of the three core data structures in computer science: one, the linked list; two, the tree; and three, the hash map, as described in this section. A singly linked list is also part of a hash map implementation, so you can compare it to the doubly linked list code introduced in the earlier bonus section 6.5.1. This section is worth understanding well, not only because it is an excellent review of pointers and structures, but also because one of the most common questions in a face-to-face programming interview is: draw a hash map on the whiteboard and explain how it works. This is an easy question if you study and understand this section of the book, and almost impossible if you have not. In some ways this section describes the most intricate data structure in the book, which is why it is so popular in coding interviews. Chapters 7 and 8 talk about lots of practical things like input/output and the Unix operating system. Elegant data structures and their use are core concepts in computer science. Understanding them highlights the difference between a good programmer and a computer scientist. In a sense, understanding how a hash map works is the secret handshake of computer science, and it is the secret handshake because of this book and this section of this book, written back in 1978 and used in a course that the person interviewing you may have taken when they were in college. Hash maps were difficult for them to understand back then, and so if you understand the concept, then you must be solid. So I hope you pay close attention to this section and remember the handshake. Back to the text. In this section we will write the innards of a table lookup package as an illustration of more aspects of structures. This code is typical of what might be found in the symbol table management routines of a macro processor or a compiler. For example, consider the C pound define statement. When a
line like pound define YES 1 is encountered, the name YES and the replacement text 1 are stored in a table. Later, when the name YES appears in a statement like inword equals YES semicolon, it must be replaced by 1. There are two major routines that manipulate the names and replacement texts. install open paren s comma t close paren records the name s and the replacement text t in the table; s and t are just character strings. lookup open paren s close paren searches for s in the table and returns a pointer to the place where it was found, or NULL if it wasn't there. The algorithm used is a hash search: the incoming name is converted into a small positive integer, which is then used to index into an array of pointers. An array element points to the beginning of a chain of blocks describing the names that have that hash value; it is NULL if no names have hashed to that value. A block in the chain is a structure containing pointers to the name, the replacement text, and the next block in the chain. A null next pointer marks the end of the chain. struct nlist open curly brace char star name semicolon char star def semicolon struct nlist star next semicolon close curly brace semicolon. The pointer array is just pound define HASHSIZE 100, static struct nlist star hashtab open square bracket HASHSIZE close square bracket semicolon. The hashing function, which is used by both lookup and install, simply adds up the character values in the string and forms the remainder modulo the array size. This is not the best possible algorithm, but it has the merit of extreme simplicity. hash open paren s close paren char star s semicolon open curly brace int hashval semicolon for open paren hashval equals zero semicolon star s not equal single quote backslash 0 single quote semicolon close paren hashval plus equals star s plus plus semicolon return open paren hashval percent HASHSIZE close paren semicolon close curly brace. As an aside, hashing functions are one of the
foundational notions in computer science. Hashing functions are used for everything from high-performance in-memory structures to organizing databases, digital signing, network packet checksums, security algorithms, and much more. The above text is a really great example of a really simple hashing function. You should understand this simple presentation well, so that when you encounter a more complex implementation or use of hashing, you can fall back on this text to understand that at its core hashing is a very simple concept. So much of this chapter is a succinct example of some of the most powerful concepts in computer science. Please don't look at the eight lines of code above and think, I got that, and just jump to the next bit. This chapter is showing you the way of the master programmer. Wax on, wax off. Be patient, slow down, and enjoy your time here. Back to the text. The hashing process produces a starting index in the array hashtab; if the string is to be found anywhere, it will be in the chain of blocks beginning there. The search is performed by lookup. If lookup finds the entry already present, it returns a pointer to it; if not, it returns NULL. Here's the code: struct nlist star lookup open paren s close paren char star s semicolon open curly brace struct nlist star np semicolon for open paren np equals hashtab open square bracket hash open paren s close paren close square bracket semicolon np not equal NULL semicolon np equals np right arrow next close paren if open paren strcmp open paren s comma np right arrow name close paren equal equal 0 close paren return open paren np close paren semicolon return open paren NULL close paren semicolon close curly brace. install uses lookup to determine whether the name being installed is already present; if so, the new definition must supersede the old one. Otherwise, a completely new entry is created. install returns NULL if for any reason there is no room for a new entry. struct nlist star install open paren name comma def close paren char star name comma star def semicolon
open curly brace struct nlist star np comma star lookup open paren close paren semicolon char star strsave open paren close paren comma star alloc open paren close paren semicolon int hashval semicolon if open paren open paren np equals lookup open paren name close paren close paren equal equal NULL close paren open curly brace, that is, it's not found, np equals open paren struct nlist star close paren alloc open paren sizeof open paren star np close paren close paren semicolon if open paren np equal equal NULL close paren return open paren NULL close paren semicolon, that means the allocation failed, if open paren open paren np right arrow name equals strsave open paren name close paren close paren equal equal NULL close paren return open paren NULL close paren semicolon hashval equals hash open paren np right arrow name close paren semicolon np right arrow next equals hashtab open square bracket hashval close square bracket semicolon hashtab open square bracket hashval close square bracket equals np semicolon. We're actually pushing these new entries onto the head of this singly linked list, so those last two statements point the new entry at the current head of the chain and then make the new entry the head. The list does not stay in any order. So we have a close curly brace to end the if for the not-found code, close curly brace else, this is the already-there code, free open paren np right arrow def close paren semicolon, free the previous definition, that's the string part of the pound define, if open paren open paren np right arrow def equals strsave open paren def close paren close paren equal equal NULL close paren return open paren NULL close paren semicolon return open paren np close paren semicolon close curly brace. So that last bit handles the case where you have a pound define with the same keyword and a different, later definition: you can replace the definition, and that last bit was replacing the definition. Again, this code is pretty intricate. It's really both the hash table and a singly linked list going on at the same time, so take a close
look at this on page 136 of the book. strsave merely copies the string given by its argument into a safe place, obtained by a call to alloc. We showed this code in chapter 5. Since calls to alloc and free may occur in any order, and since alignment matters, the simple version of alloc is just not adequate here; see more in chapters 7 and 8. As an aside, one of the reasons that the authors make vague forward-looking statements when they talk about dynamic memory is that large-scale memory management in a programming language is still a subject of active research 40 years later. Back in 1978 it was absolutely not a settled topic. You can see this when the authors build a simple, non-production memory allocation scheme with their own alloc and free routines, backed by a fixed-length static extern array of characters. Dynamic allocation is essential to writing competent C programs, but it is likely that production-grade dynamic memory support was still somewhat non-portable when the book was written, so they use simple self-contained implementations in this book. Modern dynamic memory support is through the malloc, calloc, and free functions in the standard library. These functions request dynamic memory blocks from the operating system and manage those areas on behalf of your C code. On Unix and Unix-like systems, the memory allocation layer asks the underlying operating system for blocks of memory through the sbrk interface. Even with virtual memory, programmers must carefully manage their use of dynamically allocated memory, because memory is never unlimited.

Section 6.7, Fields. When storage space is at a premium, it may be necessary to pack several objects into a single machine word. One especially common use is a set of single-bit flags in applications like compiler symbol tables. Externally imposed data formats, such as interfaces to hardware devices, also often require the ability to get at pieces of a word. As an aside, we are now going to go from low-level programming to even lower level
programming. The Unix operating system is written in C, and Unix needs to have, for example, an implementation of the Internet protocols so it can be connected to the Internet. One of the most important Internet protocols is the Transmission Control Protocol, TCP. In order to implement TCP, you need to send very precisely formatted data across the network. The data is very tightly packed in order to save precious network bandwidth. The exact format of a TCP header is described in the TCP Wikipedia page. If you look at the header, you will find that in bits 96 through 99 TCP expects a 4-bit integer that defines the data offset. Exactly what this data means is less relevant unless you're actually writing the TCP implementation, but it does demonstrate that we need to control our data layout at times on a bit-by-bit basis. This section covers how we can use a struct to build up a TCP header in C, which can be parsed and set without using masking and shifting operations with hard-coded numbers. The section below is simpler than constructing a valid TCP header using a carefully packed struct, but it does lay the groundwork for these more complex situations. Now back to the text. Imagine a fragment of a compiler that manipulates a symbol table. Each identifier in a program has certain information associated with it, for example whether or not it's a keyword, whether or not it's external and/or static, and so on. The most compact way to encode such information is a set of one-bit flags in a single char or int. The usual way this is done is to define a set of masks corresponding to the relevant bit positions, as in pound define KEYWORD 01, pound define EXTERNAL 02, and pound define STATIC 04. The numbers, of course, must be powers of two, so that each mask selects a single distinct bit. Then accessing the bits becomes a matter of bit-fiddling with the shifting, masking, and complementing operators which were described in chapter 2. Certain idioms appear frequently: flags vertical bar equals EXTERNAL vertical bar STATIC semicolon turns on the
EXTERNAL and STATIC bits in flags, while flags ampersand equals tilde open paren EXTERNAL vertical bar STATIC close paren semicolon turns them off, and if open paren open paren flags ampersand open paren EXTERNAL vertical bar STATIC close paren close paren equal equal zero close paren dot dot dot is true if both bits are off. Although these idioms are readily mastered, as an alternative C offers the capability of defining and accessing fields within a word directly rather than by bitwise logical operators. A field is a set of adjacent bits within a single int. The syntax of field definition and access is based on structures. For example, the symbol table pound defines above could be replaced by the definition of three fields: struct open curly brace unsigned is_keyword colon 1 semicolon unsigned is_extern colon 1 semicolon unsigned is_static colon 1 semicolon close curly brace flags semicolon. This defines a variable called flags that contains three one-bit fields. The number following the colon represents the field width in bits. The fields are declared unsigned to emphasize that they really are unsigned quantities. Individual fields are referenced as flags dot is_keyword, flags dot is_extern, etc., just like other structure members. Fields behave like small unsigned integers and may participate in arithmetic expressions just like other integers. Thus the previous examples may be written more naturally as flags dot is_extern equals flags dot is_static equals 1 semicolon to turn the bits on; flags dot is_extern equals flags dot is_static equals 0 semicolon to turn them off; and if open paren flags dot is_extern equal equal 0 ampersand ampersand flags dot
is_static == 0) ... to test them. A field may not overlap an int boundary; if the width would cause this to happen, the field is aligned at the next int boundary. Fields need not be named; unnamed fields (a colon and width only) are used for padding. The special width 0 may be used to force alignment at the next int boundary. There are a number of caveats that apply to fields. Perhaps most significant, fields are assigned left to right on some machines and right to left on others, reflecting the nature of different hardware. This means that although fields are quite useful for maintaining internally defined data structures, the question of which end comes first has to be carefully considered when picking apart externally defined data. Other restrictions to bear in mind: fields are unsigned; they may be stored only in ints (or, equivalently, unsigned); they are not arrays; and they do not have addresses, so the & operator cannot be applied to them.

Section 6.8: Unions. A union is a variable which may hold, at different times, objects of different types and sizes, with the compiler keeping track of the size and alignment requirements. Unions provide a way to manipulate different kinds of data in a single area of storage without embedding any machine-dependent information in the program. For example, again from a compiler symbol table, suppose that constants may be ints, floats, or character pointers. The value of a particular constant must be stored in a variable of the proper type, yet it is most convenient for table management if the value occupies the same amount of storage and is stored in the same place regardless of its type. This is the purpose of a union: to provide a single variable which can legitimately hold any one of several types. As with fields, the syntax is based on structures: union u_tag { int ival; float fval; char *pval; } uval; The variable uval will be large
enough to hold the largest of these three types. Regardless of the machine it is compiled on, the code is independent of hardware characteristics. Any one of these types may be assigned to uval and then used in expressions, so long as the usage is consistent: the type retrieved must be the type most recently stored. It is the responsibility of the programmer to keep track of what type is currently stored in a union; the results are machine dependent if something is stored as one type and extracted as another. Syntactically, members of a union are accessed as union-name.member or union-pointer->member, just as for structures. If the variable utype is used to keep track of the current type stored in uval, then one might see code such as if (utype == INT) printf("%d\n", uval.ival); else if (utype == FLOAT) printf("%f\n", uval.fval); else if (utype == STRING) printf("%s\n", uval.pval); else printf("bad type %d in utype\n", utype); Unions may occur within structures and arrays, and vice versa. The notation for accessing a member of a union in a structure (or vice versa) is identical to that for nested structures. For example, in the structure array defined by struct { char *name; int flags; int utype; union { int ival; float fval; char *pval; } uval; } symtab[NSYM]; the variable ival is referred to as symtab[i].uval.ival and the first character of the string pval by *symtab[i].
uval.pval. In effect, a union is a structure in which all the members have offset zero, the structure is big enough to hold the widest member, and the alignment is appropriate for all of the types in the union. As with structures, the only operations currently permitted on unions are accessing a member and taking the address; unions may not be assigned to, passed to functions, or returned by functions. Pointers to unions can be used in a manner identical to pointers to structures. As an aside, the above limitations on unions are no longer accurate. As with structures, modern C compilers can assign the contents of a union to another union variable. You can also pass unions into functions by value and receive a union as the return type of a function. The storage allocator in Chapter 8 shows how a union can be used to force a variable to be aligned on a particular kind of storage boundary.

Section 6.9: Typedef. C provides a facility called typedef for creating new data type names. For example, the declaration typedef int LENGTH; makes the name LENGTH a synonym for int. The type LENGTH can be used in declarations, casts, etc., in exactly the same ways that int can be: LENGTH len, maxlen; LENGTH *lengths[]; Similarly, the declaration typedef char *STRING; makes STRING a synonym for char *, or character pointer, which may then be used in declarations like STRING p, lineptr[LINES], alloc(); Note that the type being declared in a typedef appears in the position of a variable name, not right after the word typedef. Syntactically, typedef is like the storage classes extern, static, etc. We have used uppercase letters in these examples to emphasize the names. As a more complicated example, we could make typedefs for the tree nodes shown earlier in this chapter: typedef struct tnode { char *word; int count;
struct tnode *left; struct tnode *right; } TREENODE, *TREEPTR; This creates two new type keywords called TREENODE (a structure) and TREEPTR (a pointer to the structure). Then the routine talloc could become TREEPTR talloc() { char *alloc(); return ((TREEPTR) alloc(sizeof(TREENODE))); } It should be emphasized that a typedef declaration does not create a new type in any sense; it merely adds a new name for some existing type. Nor are there any new semantics: variables declared this way have exactly the same properties as variables whose declarations are spelled out explicitly. In effect, typedef is like #define, except that since it is interpreted by the compiler, it can cope with textual substitutions that are beyond the capabilities of the C macro preprocessor. For example, typedef int (*PFI)(); creates the type PFI, meaning pointer to a function returning int, which can be used in contexts like PFI strcmp, numcmp, swap; in the sort program in Chapter 5.

There are two main reasons for using typedef declarations. The first is to parameterize a program against portability problems. If typedefs are used for the data types which may be machine dependent, only the typedefs need to change when the program is moved. One common situation is to use typedef names for various integer quantities and then make an appropriate set of choices of short, int, and long for each host machine. The second purpose of typedefs is to provide better documentation for a program: a type called TREEPTR may be easier to understand than one declared only as a pointer to a complicated structure. Finally, there is always the possibility that in the future a compiler or some other
program such as lint may make use of the information contained in typedef declarations to perform some extra checking of a program.

This work is based on the 1978 C Programming Language book written by Brian W. Kernighan and Dennis M. Ritchie. Their book is copyright, all rights reserved, by AT&T but is used in this work under fair use because of the book's historical and scholarly significance, its lack of availability, and the lack of an accessible version of the book. The book is augmented in places to help understand its rightful place in a historical context amidst the major changes of the 1970s and 1980s, as computer science evolved from a hardware-first, vendor-centered approach to a software-centered approach where portable operating systems and applications written in C could run on any hardware. This is not the ideal book for learning C programming, because the 1978 edition does not reflect the modern C language. Using an obsolete book gives us an opportunity to take students back in time and understand how the C language was evolving as it laid the groundwork for a future with portable applications.

Hello, and welcome to Object-Oriented Patterns: A Historical Perspective. We're going to cover a number of different things in this lecture. First, we're going to do a bit of review of object orientation from previous courses. Then we're going to take a historical perspective across time and across a bunch of different programming languages. The important part about that is that object orientation is a concept, not just a syntax, and so by looking at different syntaxes we'll have a better understanding of the underlying concept. Then we'll look at how one might have used C to build object-oriented support into a language like, say, Python or C++. And then we're going to look at building a Python string class in C, a Python list class in C, and then a Python dictionary class in C, showing how C is also the foundation of
most modern object-oriented languages.

If you've taken my other courses, I have been teaching you object orientation for a long time. One of the things you'll notice is that I just keep coming back to it. If you look at Python for Everybody, Chapter 14 is about object orientation. I claim that Django for Everybody is really a class in object orientation, because Django itself is a collection of cooperating objects. You create Django applications by creating objects that are being sent messages, which is very much a purely object-oriented concept. I cover object-oriented JavaScript, particularly because JavaScript is a little bit different; I cover that in both Django for Everybody and Web Applications for Everybody. So we're going to take a look at all these different syntaxes of object orientation, and then we're going to try to build our own objects, but in a non-object-oriented language, i.e. C.

I am not going to teach you object orientation in this lecture. I am going to do a very, very brief review of my other lectures. Mostly what I want you to know is terminology. The most important term is class. A class is not an object; it's a template to make objects. It's like the cookie cutter in the cookie analogy. An attribute is some data that is contained in each instance of the class, and a method is some code, like a function, that operates within the context of an instance of the class. The object is a particular instance of the class, stamped out from the class when a new instance is requested. So you have one class and then many, many instances. If you write a user-defined class in Python, you see that we have a special keyword, class, which is different from def, and within it we have attributes and we have methods. The methods look like functions, but in particular the methods inside of a class always have this first parameter, which by convention we
call self. You could call it literally anything else, but if you did, people would be confused, because the convention is so strong that self is always the first parameter of every method inside a Python class. So in __init__ there is the instance and then two parameters, and then we can look at attributes like self.x and self.y and assign them, in this case, to the x and y from the call. We have a couple of methods, like def dump. dump itself is a zero-parameter method, but it always has the instance, and within it we can say self.x and self.y and look at all the attributes. We can also use the id function, which we learned about in the last chapter, and which is kind of like an address lookup function in Python. We also have a little method called origin, which computes the distance from the origin to our point: it takes the square root of the sum of the squares of the two sides of the triangle.

If we then get out of the class definition and move into the actual code, we call pt = Point(4.0, 5.0). That is how we call the constructor: Point is the class name, and the parameters to the constructor are four and five in that example. Then Point.dump(pt): that's the dump method inside Point, and you pass the parameter pt, which is the instance. So pt ends up being the instance of this Point object. But the more common syntax is to not use the class name and the method name and explicitly pass self in. Instead, in the next line we see a print of pt.origin(). Well, pt.origin is expanded to Point.
origin, and then the first parameter is added on, which is pt, the instance. Now I've got del(pt), and that runs the destructor. In Python the destructor would get run automatically at the end of the program, so in this case that del is a little bit redundant.

Now I want to take a bit of a historical look at object orientation. If you've seen any of my other classes, I try to put the programming languages that we're learning in context. These are the common programming languages like Python, PHP, and JavaScript, and now we're teaching C. If you look at this, you can see that the language C has been the inspiration for virtually all of the modern languages that we use today and that I've taught you. While the syntax of Python doesn't look like C, Python has been inspired by C: we see things like formatted printing that talks directly to the underlying C implementation, because CPython (not every Python, but the CPython that we normally use) is written in C. So C, as we've said many times in this class already, is profoundly influential in the syntactical evolution of procedural programming languages.

We've seen this before, but the inspiration and evolution of object orientation really took a different path, through a bunch of languages that you may or may not have seen or heard of. The oldest language on this chart is Fortran in 1955, and then ALGOL 60. ALGOL 60 is shown as inspired by Fortran, but probably the better thing to say is that ALGOL 60 was created in spite of Fortran. Fortran was loved by some and not loved by others, and ALGOL 60 is more what the computer scientists of the day, in 1960, decided on. There's a series of languages that were popular in computer science but not necessarily popular in general-purpose programming, like Simula 67, which
took a lot of the object-oriented ideas from ALGOL 60, and Pascal, which is a procedural language, another language that I learned in the '70s when I was going to school. So ALGOL and Simula were languages that were mostly procedural but had object-oriented concepts in them. I always think of the most object-oriented language as Smalltalk. It's not necessarily the first one, but it feels like the one that developed the notion of object orientation the most. It really took inspiration from an earlier language, which was Lisp. Lisp was an early, circa 1960, interpreted language, and was often thought of as the foundation of artificial intelligence. Scheme in 1975 was a direct derivative of Lisp, kind of the next generation of Lisp 15 years later, and it had a bunch of object-oriented concepts in it as well. So you can see where the object-oriented notions were evolving, kind of independently from the preferred syntax.

In the '70s, when C came along, it really changed the way we think of syntax, and so it inspired C++, Java, JavaScript, C#, and PHP. But then each of these took different inspiration for its object orientation. So there's really almost independent inheritance: where the inspiration for the language syntax came from versus where the inspiration for the object-oriented pattern came from. Probably the biggest example is C++, which in the early '80s took its inspiration both from Simula and from C. It was a hybrid language that tried to take the object-oriented notions from Simula and layer them on top of the procedural syntax of C. Another interesting thing here is that Python in 1991 was quite aware of C++. If you look at PHP, which in 1995 was not at all object oriented (it wasn't object oriented until 2000), Python was object oriented in almost
1991 or '92, because the computer scientists of the day were quite aware of these kinds of things, and so Python implemented object orientation almost as soon as Python was implemented. Like I said, Python in 1991 implemented object orientation, and I really like the object orientation in Python. PHP started out non-object-oriented in '95 but then added objects in 2000. Java took a lot of inspiration from C++. Java was trying to be the next C, and so it's got a lot from C and a lot from C++. Then C# was inspired by C++ and Java. But JavaScript is kind of the outlier. Even though JavaScript came out in 1995 and was very much informed by Java, it took its object-oriented pattern more from Scheme, which is a more pure object orientation rather than an object-oriented layer. So among all of these languages that we'll look at, JavaScript is the outlier in terms of its object-oriented approach. If you've taken classes from me in the past, I've mentioned that before, because JavaScript has first-class functions, and because of the way you create things, and so on.

So we're going to take a look at some of these object-oriented implementations over time. I've already talked about the Python inspiration. I flip Python and C++ in the early days, because at least Python we kind of know. C++ is 1980, Java in '95, JavaScript in '95, PHP in 2000, and then C# in 2001.

It was highly collaborative, in the sense that this was a group of (you make up a number) 30-ish people who were all interested in much the same kinds of things, although with tentacles off into theoretical computer science, math, and so on, the physical sciences kind of thing. But mostly, a lot of us at least, were basically software people. Bjarne came to Bell Labs, I think in 1979, after getting his PhD at Cambridge, and he was interested in simulation. He had known Simula in particular, which was probably one of the very first object-oriented languages, and he wanted to do
simulation, but C was kind of the language that people used at Bell Labs. So what he did was to try to take some of the good ideas from Simula, in particular the class ideas, and put them on top of C. For a long time the implementation of C++ was basically to translate C++ into C, and then you could run it anywhere. It's one of many pragmatic engineering decisions that Bjarne made: it's hard to get people to buy into a new language if they have to carry an enormous amount of infrastructure and support and other baggage with it, whereas if it's one more program that fits perfectly into your existing environment, as a language, as libraries, and all the rest of it, it's much easier. And so C++ went through a period of evolution (well, it's still evolving), but starting there in the very early days, 1980, '81, something like that, the two languages were very much together, because we were all in this one group, and perhaps would fit comfortably in this corridor, this building. Bjarne, who certainly knew C inside out, was developing this new language that ran on top of it, and that stressed C compilers. That was useful, because the code that his preprocessor generated was astonishing. I think some of the ideas in C++ were then retrofitted back into C, in particular the obvious one of how you declare the arguments for a function. That was just better, along with a handful of other things. So for a while you could say that C was a pretty close to perfect subset of C++. I think that's evolved in both directions, so it's less true now than it was, but for a long time you could take a C program and just run it through a C++ compiler and it would work.

There's a general observation that people write code differently than computers write code, and so machine-generated code tends to stress, in particular, the compiler or the language for which you're generating. In the C++-to-C example, there were incredibly deeply nested constructs of one sort or
another, parentheses that made Lisp look tame, and then very convoluted pointer kinds of computations as well, function pointers, all kinds of things. So it was definitely a stress test, and it also generated things that had odd sizes and so on that were not expected, or at least exercised paths that had maybe not been thoroughly tested in a given compiler.

I think a lot of people did not think that C++ was right in some sense. It had various warts, blemishes, and so on. Many of those were, again, a direct result of Bjarne's engineering judgment: if you want this thing to take off, the more culturally compatible it is, the more likely it will do that. If you make something that's wildly different, people are going to kind of ignore you. So some of the syntactic problems of C++ that are still with us are the same syntactic problems that you can see in C.

For quite a while, when I was trying to teach C++ to people, I would show them the translation that goes from a C++ object into C. It's basically just pointers into structures, with the compiler keeping the names apart so you don't have to think about them. Seeing that translation, you could see how object-oriented programming could be done at essentially no overhead, because it's just structure pointers and funny function names, and you can pass function pointers around. So it was all pretty well behaved, and it helped me understand what was going on in C++ and object-oriented programming. I think in modern languages, and Python is a fine example of that, there's an incredible amount of magic going on, and I don't quite know how it's done as well as it is. I can sort of imagine, but the mechanisms to make some of those things work, list comprehensions with lambdas in them and so on: how the heck does that work?

I was trying to characterize what C++ is particularly useful for, what I actually built it for, and what its strength is today
after almost 30 years of evolution, and also where the limits are. The way I see it, there's a core domain for C++, and that's what you would call traditional systems programming. But that's not the right term, because it's just a style of language and a style of programming. So I go a little bit further. I asked: what is it that requires the kind of services that C++ has? What is the sum of all these applications I've been dealing with? Where did it work, and where was it essential? I came up with the phrase "infrastructure", and I roughly define it as: if it breaks, somebody gets hurt or somebody gets ruined. These are the kind of foundational things in our systems that must work for the system, for the society, to work.

The question I'm trying to phrase is what matters in those areas. I've come up with some notions of compact data structures, very strongly typed interfaces for maintainability and for minimizing errors, and a heavier emphasis on algorithms over random code, because we need reliability. We need the stuff to be comprehensible and analyzable. We want to make sure that it's actually correct. So the paper I was writing comes from that kind of thinking: what is the right style, and what are the supports for that style that we need for infrastructure software, for software that must be dependable? And we can get real examples: the core of some of the modern operating systems, the basics of our phone system, the brakes in my car. How do we make that dependable? How do we make sure that the space probes don't get the logical equivalent of the blue screen of death halfway to Mars, where we can't send a repairman? How do we make sure that they actually go into the right orbits? JPL lost a Mars probe because two groups had communicated nicely, they thought, but in fact one of them spoke imperial measures and the other one spoke the SI system, the MKS
system, and the result was a misnavigation that sent more than $500 million worth of equipment into the wrong orbit. Not a good idea. It was the work of 200 good engineers, a lifetime's work down the drain, and that could have been avoided by an ever so slight improvement in the interfaces between the parts of that program. Things like that I'm interested in.

So there's a core area where I think the facilities of C++ (they're not perfect), the kind of things I would like, the kind of things I work on, are essential. Then there's a huge gray area where you have choices: they can help, in my opinion, but they're not essential. Then you have areas where it is probably unsuitable to apply that kind of stringency and technique. I want to get essentially 100% reliability. If I'm putting up a small application for my own use, or if somebody is trying to push out a cute little web app, they don't need that kind of reliability, and maybe what I'm talking about in terms of programming is not for them. But I think the really important thing here is to realize that there are different techniques and different languages that apply to different areas, and we have to recognize this. We can't have a single language for everybody, or a single technique of using that language for everybody. We can't have a single tool chain or a single kind of system. And from there we can go a little bit further: we probably don't need the same kind of training or education for everybody. The engineer, software developer, or whatever, who is building infrastructure, say the mechanism that automatically updates the software on a cell phone, has to have a different education, different knowledge, and different training from the one who makes a little game, because the one can actually destroy a whole day or a whole week for millions of people by a little slip. Maybe somebody even gets hurt if the 911 calls don't get through. Bad things
happen. And it's not just the software that runs on it; it's the update software, it's anything in the chain of safety-critical issues that has to be dealt with. Somebody doing that has to think differently from somebody who writes a little application, maybe to make a couple of bucks quickly. There's nothing wrong with that, but they have to think differently. As a matter of fact, if you apply this very stringent, engineering-oriented thinking to little commercial apps, you'll probably be a year late to market and it will be irrelevant. On the other hand, if you took the attitude that first to market is the only thing that matters and applied it to the steering wheel of my car, no, that's not a good idea. These people have to think differently, and the way you get people to think differently is to give them different educations. We don't have an ANSI standard programmer, and we shouldn't have one. If we did, we should have several. And I think the field has to, in some sense, clean up its act before somebody else comes in and tries to clean it up for us.

This notion that you can make your own constants and have the suffixes after the constants: is that a feature that you added at some intermediate stage, or is that simply a sort of operator overloading?

We were observing that we were getting a whole zoo of little suffixes out of the various fundamental types. The U suffix is for unsigned, and L is for long, I think, and I can't remember them all anymore. And we thought: you can do anything in C++, but you can't make your own literals. I had also observed separately that there were techniques that were effective, except that people would not use them because the notation was too ugly. The units example is one of those. You've been able to do everything in the units example I showed, not quite as elegantly, for the last 10 years. Libraries have been available. There was
a nice one from Fermilab (back to nice places and nice people again), but it wasn't used as much as it should have been, because the users didn't believe in the notation. They didn't like it. So we were looking at how you can basically clean up people's source code. How can we make code look the way it would look in an ideal language? How can we make code look much as it does in the textbooks? The units example is simply a way of getting your code to look the way the equations look in your physics textbook. We know how to avoid that Mars Climate Orbiter problem. Everybody was taught it in physics in high school: first make sure your units match, then do the detailed calculation. So why didn't people do it? It was too cumbersome and too ugly when they did it, or too costly if they used run-time techniques. So, together with friends, we thought this problem was worth addressing and tried to figure out what solutions we had. It went through several evolutions. I think the last finishing touches were done by David, and it's now standard. This is one of the features that is not shipping widely today, but wait a year and that example will probably run on your computer too.

Are there other kinds of examples where you've let people, from within a class, create their own literals, beyond suffixes?

When I started out with C++, I provided constructors, which allow you to construct objects of a certain type from arguments. That has been very effective, and people have used constructors as if they were literals. But they weren't: there was a run-time cost. So the first thing we did with C++0x, C++11, was to introduce constant expressions as a more fundamental unit. This was work between me and my colleague Gabriel Dos Reis. We have constexpr functions that can be evaluated at compile time, and we have constexpr for types, so that you can use type-rich programming at compile time. This is very important in high-performance
computing and in the embedded systems world. That was what we did to address that. Constant expression evaluation is much more general, much easier to use, and, yes, prettier in C++11 than in earlier versions. So that just strengthens the direction we've gone before. You could write complex(1, 2) to make a complex number in C++ in 1984. Today you can write the same thing and have the complex number created at compile time, and therefore, say, put in ROM. I don't know why you would want a complex number in ROM, but you might want a point, which is the same thing: Point of 1, 2. Now, you still have to write Point or complex, and the other thread of thinking in the C++ work was to generalize and make safer the initialization. So it happens that if you know the type you need, like when you have a function returning a complex number, you can simply write {1, 2}, and it says, "Oh, 1, 2 is supposed to make a complex number," and it will make a complex number and return it. And if you happen to be at compile time, it'll do that at compile time. So things work together: you get better notation and you get better performance. And anything you can do at compile time works even better in a concurrent system, because you can't get a race condition on a constant. If it's been calculated before the program starts, you can't get the threading problems.

So the open curly brace means "find a constructor for the thing that I'm about to put this in"?

Yes. It looks at where the destination is, and it asks: is there a two-parameter constructor, or a three-parameter constructor, or whatever? Or if it's just a struct, it'll take the first value and put it in the first element, and the second in the second.

It does it for a struct as well?

Oh, yeah.

See, I haven't played in C++ in a while. That's a great idea.

It's referred to as uniform initialization. It'll initialize if it can, and if there's any ambiguity, of course, it finds the ambiguity. If you're in a context where
the, like, you're calling a function and the target could be a point or a complex number, it's tough luck: go and tell me which. So the error checking has actually been improved. C++11 is slightly better at finding bugs than C++98 was. I mean, I come from the school of philosophy that says the compiler is your best friend when you build programs, and to make it your best friend you actually have to have more types. If everything is an integer, well, what can the type system help you with there? Nothing. If everything is a floating-point number, it can't tell you whether it's imperial or SI units, and you get bugs. So you need type-rich interfaces, and for that you need to be able to build cheap types, basically, and flexible types that are easy to use and simple. And so we went from complex(1, 2), or something like that, to {1, 2}, where the type is optional, only needed when it's needed. And then finally we can now write, if we want to, 1 + 2i. I define i as the unit for the imaginary part, and you get complex arithmetic without ever saying complex. It's down in the definition of the i suffix.

[Music]

So we're going to take a look at some of these object-oriented implementations over time. I've already talked about the Python inspiration. I flip Python and C++ in the early days, because at least Python we kind of know. C++ is 1980, Java in '95, JavaScript in '95, PHP in 2000, and then C# in 2001. So here's the example that I talked about before. You have a constructor, double underscore init double underscore (__init__), with self as the first parameter, and self.x. Again, the key thing is that self is by convention. self is not a language construct. It just so happens that the first parameter of method calls is the instance, and we almost exclusively use the word self for that. That is a very early, 1991 implementation of an object-oriented syntax on top of a mostly procedural language.
Now if we take a look at C++: C++ was initially implemented as a preprocessing pass, and it sort of did one-for-one textual transformations. So class was not a keyword in the C language, and public is not a keyword in the C language. There was a syntactic transformation that kind of transformed it into C and then ran it through the C compiler. And this is pretty elegant, and you can absolutely see how Python was inspired by C++. So: class Point, open curly brace, and double x, y are the attributes. The constructor, by convention, is a function that has no type, and the constructor has the exact same name as the class. So in this case I've got Point with an uppercase P, and then the constructor parameters come in. Now the interesting thing is that you have to declare double x, y as the attribute variables, and then within the functions you can think of the double x, y as almost like extern scope, meaning they're global across all functions. So you have this weird thing where you can't have a parameter, like you could in Python, with the same name as an instance variable, or you'll be confused. And so you'll notice I called my parameter variables xc and yc, so that I know that those are parameters in the constructor, and so I can copy xc into x and yc into y. It also means that if you look at the dump code, you don't have to say self.x. You just say x and y, because x and y are doubles; they're instance-wide extern variables, as it were. And the same is true in the origin function that's returning the square root of x squared plus y squared. You don't have to have self or this or anything else. So if we take a look at the main program, on that first line, Point pt(4.0, 5.0), we are both allocating a Point instance and calling the constructor and setting it up. Then we use the dot syntax, pt.dump(); to dump it out, and then we can call pt.origin() and that will return us a double. And so you'll see this sort of dot syntax become pretty common in every language that comes afterwards.

So let's take a look at Java. Now recall that Java was inspired by both C and C++ and really wanted to be, like, the super language. You see that it looks a lot like C++, but it does introduce the concept of this, so that it scopes the external variables that are the instance variables. You access an instance variable by saying this.x and this.y rather than just x and y, and I think it's actually more elegant this way. So we say public class Point, and then double x, y, which are the instance variables. Then the constructor uses the C++ convention of Point with no type; double x and double y are the two parameters; this.x = x and this.y = y; close curly brace. Now, I like this better, because using this keeps me from having to make weird function parameter variable names. I can make them what I want them to be, and using this is the way that we contextualize that. So if we look at the dump method, we see the use of this, and this is not just by convention; it is a language element. You can see from the output that it actually prints out the class that it is and some kind of reference-like, ID-like thing. So in Java, if you start printing objects out, it tells you what the type is and what instance it is. It's not necessarily an address, but it probably is related to the address somehow. Then we print out this.x and this.y. And if we look at the double origin method, we see it's this.x times this.x plus this.y times this.y, so you see the use of this throughout. Then if we look at the main program, we see Point pt = new Point(4.0, 5.0). This is where we see the use of the new operator, where you're saying: look, call the constructor. And I like this. I like the sense that
you're calling the constructor on purpose; you're not calling it implicitly. And then you end up with pt as the instance of this object. So for me this feels pretty good. pt.dump() is a good example of calling a method within the instance, and pt.origin() is similar.

If we take a look at JavaScript, recall that JavaScript is the weird one. JavaScript did not take its inspiration from C++. C++'s syntax was kind of influenced by the fact that it was initially a preprocessor to the C language, and some computer scientists would think that's a rather impure way to think about object orientation. So when JavaScript was created, the idea was to be more pure in object orientation. And so there is the concept of first-class functions, and because it has first-class functions, there is no class keyword. The class keyword was really useful when it was a preprocessor. The thing that JavaScript does take is the concept of this. Now the interesting thing is, if you look at everything on the screen from function Point to the closing curly brace, this is the constructor. The constructor constructs everything: it constructs the attribute variables, and it actually constructs all the methods as well. So if we look at the line this.party = function() {, we are setting, in a sense, an attribute variable to source code. This is an anonymous function; there's no name for the function. Most functions are named, but this function has no name. That's a JavaScript thing; it's a first-class function thing. We are not running that function with the two lines, this.x = this.x + 1 and console.log blah blah blah. Those lines aren't running. They're being compiled, and then the code to execute them is being assigned into the attribute variable party. So it doesn't run it; it reads it and stores it in party. You can literally later print out this.party and you will see the source code of that function. The same is true for this.dump, which is an assignment statement, and similarly this.origin. And within those methods you see this.x and this.y, which is taken directly from Java. We also see the concept of new in pt = new Point(4.0, 5.0). We're explicit, and I like this. I like the idea of saying: please call the constructor from the Point class and pass these two values in. And then we see the C++-inspired syntax of pt.dump() to call a method in the pt instance, and pt.origin() to call a method in the instance.

If we look at PHP: now, the key to PHP is that PHP was a procedural language when it was created in '94, and then it became an object-oriented language in PHP 4 and then 5. It was late to the party, so it could be inspired by everything: by JavaScript, by Java, by C++, by Simula, by Scheme, all of those things. And so the PHP object orientation is kind of pretty as a result of that. Now, one of the things that happens in PHP is that it's got some weird language syntax things, in that variables have to start with a dollar sign (thank you, Perl, for that), and the dot operator is used for concatenation in PHP. So we couldn't use the dot operator to look up an instance variable or a method inside of an instance. PHP borrowed the C arrow operator, which is minus sign, greater than. Now the interesting thing is, if you go back to pointers, the arrow operator is what you use when you have a pointer to a structure. And so in some ways that is a throwback to C in a beautiful way, because, kind of under the covers, and as we shall see when we start to implement object orientation in C ourselves, pretty much what we get is pointers: an instance is a pointer to something rather than a thing. And so we see $this->x = $x in the constructor. We see the constructor is double underscore construct (__construct), so PHP kind of used the single underscore and double
underscore as sort of metadata about the meaning of things, and double-underscore things are things you're not supposed to call; they're supposed to be private. Then we have a function dump, and you see the concept of $this inside of dump and origin. In the main code you see $pt = new Point(4.0, 5.0). Well, it's following the new from Java and others. And then we call $pt->dump() and then $pt->origin(). It pretty much works like most of the other ones, except that you never use dot, because dot is concatenation in PHP. And I'll be honest, I love dot for concatenation, except for the fact that PHP is different from every other language that I use. Every other language uses plus for concatenation of strings.

Now, C# is 2001, so C# was inspired kind of by everything, and you see that C# is clearly very C++ oriented, but with some Java-ness to it. So you see the double x, y in the class, which are the two instance variables. You also see that there's no use of this, and so we have to name the parameters differently, because x and y are in effect global across the entire class. Then we have void dump there, and we see that it's just using x and y; no need to use this. It's tough for me to decide which of these two ways I like better. I guess that this feels more explicit to me, so I like Java and JavaScript in that respect. I like self also. If we look at the origin function, we see x * x plus y * y, and so we don't need to use this. We again see a very Java-oriented Point pt = new Point(4.0, 5.0), and then pt.dump(). We're using the dot operator to look at instance variables and methods within the class, and then pt.origin().

So now that we've looked at a sort of survey of the different kinds of object-oriented languages that exist today, and we can see how they derived ideas from one another, we're going to actually try to build an object in a non-object-oriented language. We're going to build something like a Python object in C.

[Music]

So now what we're going to do is try to build objects in C. C doesn't have object-oriented support, so in a sense we're going to do it by writing functions and using structures and pointers, etc. We're kind of answering the question of how Python's object-oriented layer was layered on top of C structures. We can put ourselves in the position of Guido van Rossum as he was building Python in 1991 and ask: how are we going to make this syntax work? In C, which is underneath all of this, how can we make this syntax work? So this is just review. We've got class Point, we've got a constructor that takes two parameters, self is our instance pointer, and we've got dump and we've got origin. If we look at the main program, we create a new Point and call the constructor. We can see Point.dump(pt): that is the function named dump inside Point, but then we've also got to pass the instance in. Or there's the shortcut syntax, pt.origin(), which is kind of paying more homage to the way C++ would have called methods. And then of course there's the del operator at the end.

So let's build ourselves some code in C. We are building, in effect, a point object in C. We're going to just start with a structure, and the structure is going to be Point, and there are some instance variables: we're going to allocate a double x and a double y inside of it. But then the methods are kind of weird. We are going to take the del method, the dump method, and the origin method, and we're going to define them as pointers to functions. So: void (*del)(const struct Point* self); The void is the return type of this function. The *del means a function named del whose code is not here; this points to a function somewhere else. And then const struct Point* self is the first parameter, right? So const struct Point* self says we're going to have one parameter, it's going to be named self, and it is a pointer to a structure. We've got something similar for dump. The origin is pretty much the same except it's got a return value. And so that now is a structure. This is C, so the structure is going to allocate two doubles, which should be eight bytes each, and then three pointers, which are eight bytes each on a typical 64-bit system. So we've got eight times five; that's going to be 40 bytes. It's not a dynamic structure. It is exactly 40 bytes of allocation, because C structures are just memory, and it's not like you can sort of throw more stuff in there. You have to define it, you have to define what type it is, and it's going to allocate space. We are going to use a naming convention for now. We're going to create the dump function, the dump function is going to take a self parameter, and we're going to name it point_dump. That's just a naming convention. And we're going to name the first parameter that comes into our function self, just like Python.
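The structure just described can be sketched as follows. This is a minimal sketch, not the lecture's exact listing: the member layout follows the spoken description, the printf format is a guess at the output shown on screen, the assert is a defensive addition, and point_struct_size is a hypothetical helper added only to make the size arithmetic checkable.

```c
#include <assert.h>
#include <stdio.h>

/* Two doubles for the data, three pointers to functions for the "methods". */
struct Point {
    double x;
    double y;
    void (*del)(const struct Point *self);
    void (*dump)(const struct Point *self);
    double (*origin)(const struct Point *self);
};

/* Naming convention: a "point_" prefix, with the instance pointer passed
   as the first parameter and called self, as in Python. */
void point_dump(const struct Point *self)
{
    assert(self != NULL);  /* defensive check, added for this sketch */
    printf("Object point@%p x=%f y=%f\n", (void *)self, self->x, self->y);
}

/* Hypothetical helper (not in the lecture): reports how many bytes the
   compiler actually reserves for one struct Point. */
size_t point_struct_size(void)
{
    return sizeof(struct Point);
}
```

On a typical 64-bit platform point_struct_size() comes out to 40, matching the eight-times-five arithmetic, but the C standard does not guarantee the sizes of double or of function pointers, so the code asks sizeof rather than hard-coding 40.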
We're going to print out "Object point@" and then %p, which is the way we print a pointer out (self is a pointer), then x=%f y=%f, and then we're going to pass self and self->x and self->y. Now remember, that kind of looks like PHP, because self is a pointer to a structure, not a structure itself, so we use the arrow operator to both dereference self and look up the attribute variable x. If you look at the output, you see it's "Object point@", a big long hex number for the address, and x=4, y=5. Then we have void point_del, which is very similar: const struct Point* self. So the first parameter to del is self, and the first parameter is always going to be self when we create these functions that we're going to treat as methods. All that's going to do is call free on the pointer that we originally allocated. Then we're going to create the origin method, which again takes a single parameter, self, and we're going to return the square root of self->x * self->x plus self->y * self->y, and that's going to have a return value. Then we're going to go and do the constructor. The constructor is going to return a pointer to a Point, and it's called point_new; we're going to sort of follow the new convention. It's going to take two parameters, an x and a y. So the first thing we've got to do is allocate the 40 bytes, sizeof(*p), which is two doubles and three pointers to functions, which, I think I've got it right, is 40 characters. We'll get the address of those 40 characters back, and we're going to set the x value to be x from the constructor call and the y value to y from the constructor call. Then we're going to set the dump pointer to &point_dump. Now, this is done on purpose where point_dump is defined earlier in the file. And then p->origin is the same thing: &point_origin. So in each object
that we're creating, we are going to record the address of three, in effect, global functions, right? They're named point underscore something, but these are just regular old functions in the global function namespace. We don't have namespaces; we're in C, folks. We can't do that fancy stuff, so we just use a naming convention to accomplish it. And then when we're done with the constructor, we do return p, so that whatever is calling us gets its instance back. So p is the instance, but in the constructor we are allocating and filling the instance up with data. It's just a struct; it's just 40 bytes of memory with some labels. So in the main code we say struct Point * pt = point_new(4.0, 5.0), and this looks a lot like OO code, except it's not. We're using a pointer to a structure, and we're calling a global function called point_new. We just happen to have named it in a way that looks a lot like object orientation. And so now what we can do is say pt->dump, which means go look up the dump variable inside the Point object that's pointed to by pt, and then call it. But we still have to pass in pt as that first parameter, because that is self; that is the instance. So all these functions, dump, del, and origin, need to have self as their first parameter. So pt->dump looks up dump, but then we still have to put pt in as a parameter in that syntax. We're going to do the same thing for pt->origin(pt). And then, to clean things up (well, in this case the program is done, but you know, you need to free up allocated memory), the memory is allocated in the constructor, 40 bytes, and then those same 40 bytes are deallocated by calling free in the destructor. And so, for all intents and purposes, other than conceptually, there are no objects involved in this. There are structs, there are pointers, and there are functions. The fact that you can get a pointer to a
function means that we've kind of imitated it. And again, I look at this as how Guido van Rossum was actually facing this and thinking to himself: how am I going to figure this out? How am I going to make it look like this is object orientation? So probably some of the code looked a lot like this in the early days of Python, and then there was kind of a simple syntactic transformation layer in the Python parsing to call these things with naming conventions. So you can do a lot of object orientation with naming conventions. And if you recall, C++ started as a language preprocessor, so again, you could almost look at this as how C++ got built, right? C++ had some OO syntax and then transformed that OO syntax into C code that looks a lot like this: oh, we've got some functions, the functions have naming conventions, and we create a struct, and that struct has data in it, but it also has pointers to functions in it. We'll call the data the attributes, and we'll call the pointers to the functions the methods, and voila, we have object-oriented programming. So up next, we're going to actually implement the Python string class, or at least a little bit of the Python string class.

[Music]

So now we're going to switch from my little point class, which is just two doubles, to an actual string class. What's interesting about the Python string class is that you can extend it. We've been talking a lot about pointers and arrays, and even when you call malloc you can't just keep extending things, whereas in Python, thankfully, we can just extend things. We create a string, we can append h to it and print it, append "ello world" and print it, and then assign it to some other string and print that and get its length, and we never had to allocate or deallocate any memory during this time. When you get done looking at the code for what we're going to have to do to allocate and deallocate memory, what you should be thinking is: wow, I'm glad I'm
programming in Python. I'm glad that Guido van Rossum gave me a string class, an expandable string class, rather than a character array of fixed length. So we're going to create a string class in C using our little naming convention. If we look at the code, what we're going to try to do here is basically emulate the Python syntax, but in C. So we're going to start by making a struct pystr structure. We're going to get a pointer back, and we're going to name that x. We're going to call the constructor, pystr_new. We're going to dump it (we're going to have a little dumper). We're going to append an h to it and dump it again. Then we're going to append a whole string. Now, h in C is a character and "ello world" is a multi-character string, so we're appending many characters. We're going to dump that, then we're going to assign it to a completely new string, and then we're going to print it out, like in Python: give me the string version of this object, or the length of it. And then we're going to delete it, throw it away. So you can see all of the Python operations are sort of mimicked, but with naming conventions in C. Now, the one thing you'll notice here is that in this main code we never allocated any memory and we never deallocated any memory. That is within the object. Within the object, we have a responsibility to properly allocate and deallocate, but one of the interesting things here is that I haven't shown you the code to do any of that, and so you don't know how it's done. But that's cool, because as long as we do a new, play with it, and then do a del, we can do stuff with it. Underneath, pystr does all of that memory management for us, and that's one of the beautiful things about an object-oriented approach. Again, the syntax on one side, in C, is pretty heavy, and the syntax on the other side, which is the Python, is pretty light, but the idea is that in Python we never had to worry about
making a string too long or too short, or having a buffer overrun, or anything like that. So as we dive in, we have to realize that part of the job of this pystr object is to handle all memory allocation on our behalf, so we as programmers can write much simpler code.

Okay, so now we are going to build the pystr class. We're going to create a structure called pystr, and in that we're going to have three things: length is the length of the string, alloc is how much data we've got allocated for the string, and char *data is the actual character array. So we have to have a character array inside of it. We're not going to let the outside code touch this character array directly. We're going to completely manage it inside this object. We draw a little bubble around it, and it's like: you can do stuff, you can use my object, but I'm going to deal with everything for you, so don't mess around with my internal stuff. So we would think of everything in struct pystr as sort of private. In C we don't have a good way to force it to be private, but in the concept of object orientation, length, alloc, and data would be something we'd think of as private. In our constructor, we are being asked to create a new Python string, and we're going to return a pointer to that structure when it's done. So the first thing we do is allocate it. Now, int is usually 32 bits, so length is four, alloc is four, and the data pointer is eight, so there are probably 16 characters in pystr. When we do malloc(sizeof(*p)), that's the number 16, and so it's going to give me 16 characters. Now, the key thing is that this is not allocating the actual string data. It has just allocated eight bytes for a pointer to the string. *data is a pointer, and that first malloc is only giving us the pointer, not the actual data. So then we just set it up. We say our length of the string is zero; there's nothing in it. Our allocated length of the underlying data string is 10, and then we immediately call malloc to get 10 characters. So now data is a
10-character character array, and alloc tells us how much we've allocated, because it's our job inside this thing to keep track of that stuff. And then, just to be good, we put a '\0' at the zero position in that allocated character array. We don't know what the rest of the characters are; we just know that the first one is zero. And then we return the pointer to the structure; not the pointer to the data, the pointer to the structure. This gets called inside main as struct pystr * x = pystr_new(), and when we're done we get back these cool little two pieces of data that have been dynamically allocated, all ready for us to do cool stuff with. We've got the struct, we've got the constructor, and then we've got the destructor, which is pystr_del, and that again passes in self. Now we're calling free. If you recall, there are two allocated things. One is the data, which is the character array; we've got to get rid of that, and then we've got to get rid of the object itself. So at the end of del, we have given back all of the data that we've allocated. Now, one important thing here is that the order of these two statements matters a lot. When we free self, we're not supposed to access self anymore after that point. I'm sure there could be some data just lying in there that's not been ruined, but you just don't know, and so that's why we have to free self->data before we free self. It's just wrong to do that in the other order. Then we do a pystr_dump, and in that we dump out the length, how much we've allocated so far, and what the data is so far. And then pystr_len: pystr_len returns an integer and takes self as a parameter. The key to this is that it returns self->length, and you might ask why we don't just let our calling code access self->length. This again is encapsulation. We don't want to reveal the fact that we're keeping track of the length in this variable, because we don't want the calling code to be messing with it.
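The constructor, destructor, dump, and length accessor described so far might look like the sketch below. It follows the spoken description (length, alloc, and data fields, an initial allocation of 10, data freed before self), but the malloc failure checks, the asserts, and the exact dump format are assumptions added for this sketch, and the append functions are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct pystr {
    int length;   /* number of characters currently in the string */
    int alloc;    /* number of characters allocated for data */
    char *data;   /* heap-allocated array, kept '\0'-terminated */
};

/* Constructor: one malloc for the struct, a second for the characters. */
struct pystr *pystr_new(void)
{
    struct pystr *p = malloc(sizeof(*p));
    if (p == NULL) return NULL;            /* failure check: an addition */
    p->length = 0;
    p->alloc = 10;
    p->data = malloc(p->alloc);
    if (p->data == NULL) { free(p); return NULL; }
    p->data[0] = '\0';                     /* start as a valid empty string */
    return p;
}

/* Destructor: free the character array first, then the struct itself,
   because self must not be touched after free(self). */
void pystr_del(struct pystr *self)
{
    if (self == NULL) return;
    free(self->data);
    free(self);
}

/* Dump: the output format here is an assumption, not the lecture's. */
void pystr_dump(const struct pystr *self)
{
    assert(self != NULL);
    printf("Pystr length=%d alloc=%d data=%s\n",
           self->length, self->alloc, self->data);
}

/* Accessor: callers ask for the length instead of reading the field. */
int pystr_len(const struct pystr *self)
{
    assert(self != NULL);
    return self->length;
}
```

A caller would write struct pystr *x = pystr_new(); then pystr_dump(x); and finally pystr_del(x); without ever calling malloc or free directly, which is the encapsulation point being made.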
Remember that length, data, and alloc are kind of private. So instead of saying, just go look at self->length: no, I would like you to call my function, and I will give you the thing you want. So you just call the len function and pass in the instance. That allows me to change the name of length, it allows me to interpret length differently, it allows me to do all kinds of things, but at least the object writer is in control of the contract with the outside world. So hiding all the data and giving methods (we call these accessors) to access this data is a good idea. Now, the underscore str: if you think of Python, you can say str(), with anything inside the parentheses, and it converts it to a string. Well, it just so happens that we're going to maintain self->data as a valid string, so when you say, take this string object and convert it to a string ready for printing, I'm just going to return the pointer to the string we've been maintaining all along internally. We have some other methods that we've got to add. We've got to add an append to add a single character; you can see that it's got two parameters, self and a single character ch. You've got appends, which has two parameters: self, the instance, and a whole character string, which is a pointer to a character. Then we have assign, which has two parameters: one is self, and one is a pointer to a character string. Now, I'm not going to give you these lines of code. I'm going to give you an assignment to write these lines of code. I'm going to show you how they're supposed to work, but I'm not going to give you the code. So I'm telling you that pystr_append is about 10 lines of code, pystr_appends is just one line of code (it's a for loop), and pystr_assign is about three lines of code. pystr_appends calls pystr_append, and pystr_assign calls pystr_appends, so we do a lot of reuse here. So let's take a look at how these are going to be used in our main program. We say struct pystr * x = pystr_new(),
which is: give me a new string object. Then we're going to append a single character, h, to it, and then we're going to appends a multi-character string, and we're going to dump it each time. Then we're going to overwrite our object with a completely new string. So the key thing is that you have got to build this. This is what you're going to build, okay? But I'll talk a little bit about how to do it. So let's walk through what you might need to do in pystr_append. Now recall that when we set this thing up, we created length, we allocated a 10-character array and had data point to that 10-character array, and we remembered that we had 10 characters. So the first thing that append does is check whether the length is running past what we've allocated. If we're going to put in character zero, like the letter h, we can just append it and then update length. We still have 10 characters allocated, and we've used one of them, so we can just start appending into data, right? And we have to put a zero after it so that the data is a valid string all the time. So if you imagine that we create the new object, we have a new object that has a length of zero, it has a 10-character array, and it has a string-end character in the first position. We're good. We have 10 allocated, and we know we have 10 allocated. Then if we add a single character h, all we have to do is put h into that array, data[0] in that case, then update length to be one, and then set data[1] to '\0' so that we terminate it correctly. So after that first line, the data is h. It's a valid string "h". We've appended a single character, we've updated the length, and then we have terminated the string. Then we go to the next line in C, where in this case we're going to append the letter e. We look at the length, because it tells us where to put it. The length is one, so we put it in data[1] and add '\0', and
then we check to make sure that we have space for it, because we've got 10 but we've only used two. We've really used three, because it's h, e, and the end-of-string character, so we've really used three, but the length of the string we've got is two. So as long as no one asks us to append more characters than fit in the 10 we allocated, append is a pretty simple operation. You just add to the character array that we've already got allocated. Okay, but of course it gets interesting. You can append h, e, l, l, o, space, w, o, r, and at that point the length of the string that's in data is nine. We've got it properly terminated, so we have used the 10th character to terminate the string. So we're really good; things are great. But now the problem is that we have got to append the l after the r. So what we have to do is call a function. We called malloc in the constructor, and now in append we're going to have to call realloc to say: ooh, I asked for 10 characters, but now I want to extend that from 10 to 20 characters. And realloc does that. You say to realloc: here's a pointer (and it knows how many characters it is), please reallocate this pointer. Take the data at this pointer and make it 20 long instead of 10 long. It might have to copy it. So let's take a look at what realloc does. We can extend the size of a dynamically allocated area by calling realloc with the current pointer to the area and the new size. So in the constructor you see that we malloc 10, and then we're in append and we say: if the length is greater than self->alloc minus 2, then we don't have space for two characters left and we're going to have to realloc. So what we're going to do is change this from 10 to 20 characters. We're going to take self->alloc, which is 10, and add 10 to it, so now self->alloc is 20, and then we're going to set self->data to the result of realloc on the old self->data with 20 characters. So realloc takes a pointer and a new size and gives us back a new pointer. Now, it actually may have to
move it in memory, so you can't assume that self->data is the same before and after. But you can assume that if it had to move the data to find you a 20-character slot in its free space, all of the first 10 characters will have been copied, and you'll get a new pointer. That's why you see self->data on both sides: both in the call to realloc and on the left of the assignment statement. So we go back here and we can see that, oh yeah, now we have 20, and it's got plenty of space for the 'l' and the 'd' and the '\0'.

So now we're going to show the code that's going to test our class. We're going to create a new one and dump it. We're going to append a single character 'H' and dump it. We're going to append a string; one way to make this simple is to have pystr_appends call pystr_append repeatedly for the nine characters, because appending a nine-character string is the same as appending one character at a time nine times. Then we're assigning a completely new string, which means you've got to reset length, you've got to set some things, you've got to check the size, and do a whole bunch of stuff. Then we're going to ask pystr_str to give us back a printable string, and then we're going to ask pystr_len to tell us how long this thing is. So you get to write some code. Not too much code, probably 15 lines, but it is code that you will need to think deeply about. You're going to need to understand the structures, you're going to need to understand the pointers, et cetera. Up next, we are going to make a list class.

[Music]

So the next class that we're going to build in C is an emulation of what you would do if you were building the Python list class in C. Let's start by taking a look at a Python and a C version of this thing. In Python we create a new list, then we append a whole string, then we print it, then we have another string, then we print it, we have another
string, we print that, we ask how long the list is, and we do an index, which is a positional lookup, saying where is the string 'Brian'. Then we say: if 'Bob' is in the list, where is it? Otherwise we say we can't find Bob. We have to do an if/else and use in, because otherwise we'd have to use a try/except: if you do an index with a string that's not there, Python is going to blow up. So we can either do an if/else or a try/except; it's six of one, half a dozen of the other.

But in C we're going to call pylist_new to create a new list, and we're going to call pylist_append. And again, remember, all the time we're calling these things that are like methods, we're always putting the instance as the first parameter; in this case lst is the instance. So we append 'Hello world' and print it, we append 'Catchphrase' and print it, we append 'Brian' and print it. Then we look at the length of the list, then we look up Brian and we look up Bob. In this case I made it so that index just gives us back negative one to say "I didn't find Bob," so that I didn't have to do a try/except, because it's a little bit more C-like. And then we do a pylist_del to clean up the memory.

We are about to switch from being the consumer of the list object to the builder of the list object, and our job as the builder of the list object is to dynamically allocate all the data that we need to make this thing work. As the consumer we don't get to see the details of that. All we know is there are these functions that we can call and this structure that we can use, and if we call the functions right, somebody else is going to deal with all of the dynamic memory that makes this work. You've done linked lists in previous assignments, so linked lists should not be completely foreign to you, but now we're taking an object-oriented approach to implementing a linked list and hiding the implementation detail
within the object, which is an important part of object-oriented programming. So here's some basic stuff, and some of this should start looking pretty familiar. We've got an lnode, which is short for list node. We've got a pointer to a character string, and then we have a pointer to the next one, and we call that one next by convention. next is not a keyword; it's just a really common convention when we're making linked lists. And that's just the node. A linked list is a list of nodes, but then there's the list itself, and that's what struct pylist is: it's got a pointer to the head, a pointer to the tail, and a counter.

So when we create the new list with pylist_new, we allocate the pylist object: a pointer is eight bytes, two pointers is 16, plus four for the counter should be 20 bytes (a compiler will typically pad that to 24 on a 64-bit system), and that's what *p is going to be. Then we set the head to null and the tail to null to indicate that we have an empty list, since we're not creating a list with things in it, and set the count to zero, and we're done. So it's pretty straightforward. In some ways this list is easier than the Python string was.

Now the destructor is a little trickier, because we actually have to go through the list and free up all of the text areas: not just the struct lnodes, but also the char *text that we've got to get rid of. So in pylist_del we carefully start at the head and then loop through. And remember that free(cur) has got to be the last thing we do with cur. Once we say free(cur), that's the lnode, we're not supposed to touch cur ever again. So you'll see I say free(cur->text), which is the string that's pointed to in the current node. Then I look up next, and I'm looking up next before I call free(cur), because I'm not supposed to use cur afterwards. So I say next = cur->next: give
me the next pointer before I wipe out cur. I wipe out cur, and then I say cur = next. I created that next variable inside the function just to get past the free(cur), so I didn't have to say cur = cur->next after I called free(cur). And then the loop goes through and slowly but surely cleans up all of the lnodes. There might be zero; there might be no lnodes, and head will be null at that point and the while won't even run. But you've got to free the text, and you've got to know where the next pointer is; then you free the current one, advance to the next pointer, jump up to the while loop, and do the rest. And then, and only then, do you free self, which is the actual pylist object.

These structures tend to point to structures that tend to point to structures, and when you're freeing them you've got to free them from the outside. Think of it as a tree: you've got the leaves and the branches and then the trunk and then the roots, and you've got to free them from the leaves inwards. So just be real careful about this. That's part of the reason that I give you so much sample code where I do the del for you, because I just don't want it to mess up.

If we take a look at the steps of freeing the dynamic memory, where we have a head and a tail here, the first thing it's going to do is go to the lnode that is the head, and the first thing that's actually freed is the text, 'C'. Then it frees that lnode. Then it advances to the next lnode, which has 'is': it frees the 'is', and then it frees the fourth thing, which is the second lnode. Then it advances to the third lnode, frees the 'fun', which is the fifth thing freed, and then frees the last lnode. It'll notice that next is null, and so we're done; that was a three-node list.
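The freeing order described above can be sketched as follows. The structure and function names (lnode, pylist, pylist_new, pylist_del) follow the lecture; the bare-bones pylist_append is an assumption included only so there is something to free, and having pylist_del return the number of free calls is a hypothetical addition for checking the walk-through, since the lecture's destructor presumably returns nothing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct lnode {
    char *text;           /* malloc'ed copy of the string */
    struct lnode *next;   /* next node, or NULL at the end */
};

struct pylist {
    struct lnode *head;
    struct lnode *tail;
    int count;
};

/* Constructor: an empty list has NULL head and tail and a zero count. */
struct pylist *pylist_new(void) {
    struct pylist *p = malloc(sizeof(*p));
    if (p == NULL) return NULL;
    p->head = NULL;
    p->tail = NULL;
    p->count = 0;
    return p;
}

/* Assumed minimal append: copy the string and link a node at the tail.
   Returns 1 on success, 0 if an allocation fails. */
int pylist_append(struct pylist *self, const char *str) {
    struct lnode *node = malloc(sizeof(*node));
    if (node == NULL) return 0;
    node->text = malloc(strlen(str) + 1);
    if (node->text == NULL) { free(node); return 0; }
    strcpy(node->text, str);
    node->next = NULL;
    if (self->tail != NULL) self->tail->next = node;
    self->tail = node;
    if (self->head == NULL) self->head = node;
    self->count++;
    return 1;
}

/* Destructor: free leaves first (text), then the node, then the list.
   The return value (number of free calls) is only for illustration. */
int pylist_del(struct pylist *self) {
    int freed = 0;
    struct lnode *cur;
    assert(self != NULL);
    cur = self->head;
    while (cur != NULL) {
        struct lnode *next;
        free(cur->text);     /* 1: the string the node points to */
        freed++;
        next = cur->next;    /* 2: save next before cur is gone */
        free(cur);           /* 3: cur must not be touched after this */
        freed++;
        cur = next;
    }
    free(self);              /* last: the pylist object itself */
    freed++;
    return freed;
}
```

For the three-node list holding 'C', 'is', and 'fun', that is seven frees in total: three strings, three lnodes, and finally the pylist.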
then the last thing we do is free the pylist itself. So the order in which we free these things is really, really important, and you think of it as the leaves first, right? 'C', 'is', 'fun': those are three strings that have been allocated, and each has to be freed before we free the lnode that happens to point to it. So every little bit of order matters.

The one thing I want you to do in this one is the list output. You'll notice I called this one print, not dump, and I want it to look exactly like Python's list output, which means it's got an open square bracket, it's got the strings in single quotes, comma-space in between them, et cetera. And don't try to use string concatenation to do this, because you're in C, you're not in Python; you don't even know how long these strings are going to be. What you need to do is cleverly write a loop that uses printf. So think of this as: you can only use printf. Don't use a string, because you don't have strings. And remember that printf doesn't add a newline unless you actually put the newline in, so it's pretty easy to do printf open bracket, then printf single quote, printf the string, printf single quote, printf comma, et cetera. It's about 10 lines of code, and, you know, enjoy yourself. I think you'll do a pretty good job of this, and you'll be impressed when you're all done. And then you'll think, "Oh, I'm walking down the path of Guido van Rossum," because Guido van Rossum had to write exactly these lines of code. Now, he actually was probably using a string class, which I just told you not to use, because he didn't want to call printf directly; he wanted to make it so you can convert to a string. But whatever. You're walking the path that Guido van Rossum walked while he was building the list object. That's what I want you to do.

Here are some more methods; some are easy, some are hard. len is really easy. index is not too bad; it's a for
loop: you loop through looking for a value, and you return negative one if you don't find it. If you do find it, you return the position, so you've got to count 0, 1, 2, 3, 4, 5, 6, 7 as you go and return the seven if that's where you find it. append is a bit tricky, but hey, that was chapter six; you should know how to do that. By this point in chapter six you will have written one of these things, so go consult your own code.

And so here is the ultimate test case of our list class. We're just going to mimic that Python code: we append a 'Hello world' string, append a 'Catchphrase' string, append a 'Brian' string, printing the list each time. We print the length, we do an index lookup for Brian and Bob, and then we delete it. We always delete it, because we're not in Python, so we're carefully deleting it. And other than the negative one for Bob being 404 in the Python, because that was kind of a joke, the output is identical. Right? We're really starting to build what looks like a Python list. So up next, you pretty much guessed it: we did a string, we did a list, and yep, it's a dictionary. We're going to build a dictionary in our next bit.

[Music]

So now we're going to build a Python dictionary class, and here is the code that we would put in our dictionary. I kept the strings really short because I want the examples to be short and easy. So what do we do? We create a dictionary, and we use the square bracket operator to create a key/value pair: the key 'z' maps to 'Catchphrase'. We print it out. Then 'z' goes to 'W', which replaces 'Catchphrase' with 'W', because if you write to the same key you have to replace the value. And then we throw three more things in: 'y' maps to 'B', 'c' maps to capital 'C', and 'a' maps to 'D'; that's just so that it doesn't come out sorted. Then I print it out, print the length of it, and do a get with a
default value of 404. So for 'z' I get the 'W', and 'x' is not there, so I get the 404, again kind of an homage to the HTTP error code 404 Not Found. And then I write a little for loop, for key in the dict, et cetera, and I can print the key/value pairs out.

We'll do the same thing in C, and again this is almost a perfect transformation. We first create the dictionary by calling pydict_new. Then we put 'Catchphrase' in the 'z' key and print it. Then we put a 'W' in the 'z' key, which should overwrite it, and print that. Then we set the 'y' key to be 'B', the 'c' key to be capital 'C', and the 'a' key to be capital 'D', and print that. Then we ask how long it is, and then we do a get to look up the 'z' key and the 'x' key, one of which is there and one of which is not, and I get a null back in that situation; I guess I was a little C-like in my get code. And then I dump it out with a struct dnode loop that goes from the head until it's null, printing the key and the value from each of those dictionary nodes. And then I delete it at the end.

So this is the code. Again, notice we don't know much about how things are looked up, we don't know how the length is maintained, and we don't know how the static and dynamic allocation is going to happen. We now have a contract with a bunch of library code that is going to implement this dictionary object for us and do all of the memory manipulation on our behalf.

Again we start with the basic stuff. The big thing is that we're not just going to have a value. It looks a lot like a linked list, but each node has a key/value pair. The pydict has a head and a tail and a count, just like the pylist, and if you look at the constructor, it's pretty much like the constructor for the list: we allocate the pydict structure, we set the head and tail to null to indicate empty, we set the count to zero, and we're done. And the same with the del. The del is very much like the linked list del. We have
to free cur->key along with cur->value, because the key is also a dynamically allocated pointer to a character array. But everything else is the same: we preload the next value, then we free cur, then we move to the next value, and when it's all said and done we free self, which is the pydict itself.

So we can call new, and then we can set a key to something like 'Catchphrase', and the key thing there is that the key and the value are both malloc'ed bits of memory. Before, we had the text, which was a malloc'ed bit of memory that we copied into and had to free; now we just have two such things. The key and the value are both going to be malloc'ed and then copied into the malloc'ed area.

So, some methods for you to build. len should be pretty easy. Similarly, we have a print that's going to be pretty, and I want you to match exactly the output of the Python. And it turns out that we can make a method called find, which returns a dnode: get returns a string, and pydict_find returns a dnode. Then we can use find both in get and in put. In get we use it to go find the node by key and then return the value, so get is pretty easy to do once you have find. find is a for loop where you go looking for the key, and if you find it you send the node back, okay? And if you don't, you send a null back. Now, you'd better check if it's null, including in get. And then in put, what you do is look up the old one with pydict_find, and if old is not equal to null, then you're updating the value for the key, and if not, you're adding it. Now the thing about the else clause here is that it looks a lot like a linked list, because really, if you look at
this thing, it is a linked list; there are just two values in each node. We're not doing anything magical. Now, more advanced dictionary implementations might use hash maps or binary trees or other things like that, like the ones in chapter six that we didn't talk too much about, but for now we're just going to make our dictionary be a linked list where, instead of just a value, each node holds a key and a value, so we can look it up by key. So we're not doing too much tricky stuff; we make our dictionary really by just adding a bit to a list.

So let's take a look at how this is going to work in the real world as it runs. Remember, we have the dictionary itself, which is a head and a tail and a count, and then we have the dictionary nodes, each of which is a key, a value, and the next pointer. Now the key and the value are not the actual strings; they're just pointers to strings, which means that when we get a key and a value, we're going to have to malloc and copy both of those things.

So if we start with pydict_new, we get a dictionary with head and tail that point to null. Then if we add 'Catchphrase' under 'z', we allocate a dnode, we allocate and copy the key 'z', we allocate and copy the value, we point key and value at those, next is null, and head and tail both point to this node. So we've allocated three things with malloc: a dnode and two character arrays. Okay?

Then let's say we run the next line of code, which sets the key 'z' to 'W'. When you're in the put code, you call find and you see that there is already a 'z' in there. So what you've got to do is replace 'Catchphrase'. Before you go and make a new value and copy 'W' into it, you want to free the old stuff. So you free the value that was in there before, and then you malloc and copy in the new value. So at
the end of this you will have 'Catchphrase' somewhere in magic free space. We don't know how C manages its magic free space, but it does. So at the end of the second put you still have one entry, but the value has been changed from pointing at 'Catchphrase' to pointing at 'W'.

Then we add 'y' equals 'B'. You do a find and there is no 'y' key, so now it's more like a linked list: you create a new dnode and append it to the end, just like in a linked list. Then you save the key into new malloc'ed space and point key at it, and you save the value into new malloc'ed space and point value at that. Then we go to the next one, 'c'. We don't find 'c' in there, so we create a new dnode, we malloc the key and malloc the value, copy the data into those two malloc'ed areas, and point key and value at them. And you can see that at this point, unless we find the key in there already, it's just a linked list that happens to have two character arrays that are dynamically allocated and copied, one for the key and one for the value.

That was a bunch of object orientation. It was kind of a walk down the path that Guido van Rossum took, probably in the first few weeks of building the string class, list class, and dictionary class. Chances are good he built something very, very similar, and then he was like, "Okay, now I've got to make this better." But, you know, if he was just writing this thing, he'd probably just type this out. It's kind of pretty; for computer scientists who've been doing algorithms and data structures their whole lives, it's like, "Well, why don't I just make a class that does this, now that I've got an object-oriented universe? Let me hide all of the dynamic memory." And that's really what we're doing: we're hiding the dynamic memory and the implementation details and all the for loops and while loops. They're being hidden. They're important, and if you were to look at the source code to
str, list, and dict in Python, you'd see they're allocating and reallocating. They're doing it a lot more cleverly than what we did; you don't want to call realloc too many times. But for now it works; we're doing small stuff. There is an infinite array of optimizations to make all of this way faster and more impressive, but that's really for another time. So we've got the idea of the baby steps from a procedural language with pointers, structures, and dynamic memory allocation: how you would use those underlying things in a procedural language to build basic objects and support those objects, perhaps as you're building a new language like Python.

[Music]

Hello, and welcome to our continuing series of lectures on improving our implementation of a Python object. What we've been doing is building a series of approximate implementations of some of the things that we find in Python, like the Python dictionary. The last thing we worked on was the Python dictionary class, and the previous implementation was just a linked list with a key. Now we're eventually going to have to build all kinds of different implementations, so part of what we're doing is working toward abstraction, where we separate what the object is that we're interacting with from how we build it underneath. So we're going to do things like move our methods into the structure instead of just using prefix-style naming conventions, and reduce the need for our calling code to look inside the structure that is holding our class. We don't want to have to look at the values inside the class.

So this is just continuing along, understanding object-oriented principles. The four principles of object orientation are encapsulation, abstraction, inheritance, and polymorphism. And so for now we're bundling more things together.
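One way to picture moving the methods into the structure is with function pointers. The sketch below is an assumption about how that could look, not the course's actual code: it reuses the linked-list dictionary from the previous section and the names pydict, dnode, put, get, len, and del from the lecture, while copy_string and the error handling are hypothetical additions so the example stands alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dnode {
    char *key;
    char *value;
    struct dnode *next;
};

struct pydict {
    struct dnode *head;
    struct dnode *tail;
    int count;
    /* Methods live in the structure as pointers to ordinary functions. */
    void (*put)(struct pydict *self, const char *key, const char *value);
    char *(*get)(struct pydict *self, const char *key);
    int (*len)(struct pydict *self);
    void (*del)(struct pydict *self);
};

/* Hypothetical helper: malloc a copy of a string (NULL on failure). */
static char *copy_string(const char *s) {
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL) strcpy(copy, s);
    return copy;
}

/* Linear search for a key; returns the node or NULL. */
static struct dnode *pydict_find(struct pydict *self, const char *key) {
    struct dnode *cur;
    for (cur = self->head; cur != NULL; cur = cur->next)
        if (strcmp(cur->key, key) == 0) return cur;
    return NULL;
}

static void pydict_put(struct pydict *self, const char *key, const char *value) {
    assert(key != NULL && value != NULL);
    struct dnode *old = pydict_find(self, key);
    char *newvalue = copy_string(value);
    if (newvalue == NULL) return;
    if (old != NULL) {              /* existing key: free old value, swap in new */
        free(old->value);
        old->value = newvalue;
        return;
    }
    struct dnode *node = malloc(sizeof(*node));
    char *newkey = copy_string(key);
    if (node == NULL || newkey == NULL) {
        free(node); free(newkey); free(newvalue);
        return;
    }
    node->key = newkey;
    node->value = newvalue;
    node->next = NULL;
    if (self->tail != NULL) self->tail->next = node;
    self->tail = node;
    if (self->head == NULL) self->head = node;
    self->count++;
}

static char *pydict_get(struct pydict *self, const char *key) {
    struct dnode *found = pydict_find(self, key);
    return found != NULL ? found->value : NULL;
}

static int pydict_len(struct pydict *self) { return self->count; }

static void pydict_del(struct pydict *self) {
    struct dnode *cur = self->head;
    while (cur != NULL) {
        struct dnode *next = cur->next;  /* save before freeing cur */
        free(cur->key);
        free(cur->value);
        free(cur);
        cur = next;
    }
    free(self);
}

/* Only the constructor is reached by its global name. */
struct pydict *pydict_new(void) {
    struct pydict *p = malloc(sizeof(*p));
    if (p == NULL) return NULL;
    p->head = NULL; p->tail = NULL; p->count = 0;
    p->put = pydict_put; p->get = pydict_get;
    p->len = pydict_len; p->del = pydict_del;
    return p;
}
```

Calling code then writes `dct->put(dct, "z", "Catchphrase")` and `dct->get(dct, "z")`, passing the instance as the first argument, and never names a pydict_ function other than pydict_new.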
that's encapsulation. And we're working on abstraction, and that is thinking separately about how we are going to use this object and how we're going to build the object. We're going to reveal less and less of our implementation details to the caller. Okay?

So for a while we just said, "Well, we'll just take the class name, add an underscore and the method name," et cetera. It seems simple enough, and in some ways you see that C++ does exactly that when it's compiling C++ to C code. It seems like you would keep it straight, but it turns out to be a bad idea in practice. Python strings, for which I can write Python code and not have to look up documentation, are real objects that follow the principle of encapsulation: everything that you touch is inside of the object, like uppercasing or searching for something. PHP strings are more archaic, in that PHP has more of a C-like way of thinking about things: they're a type, and then there's a bunch of library functions that know how to use this type. So I'm going to show you some ickiness in PHP. But I do love PHP, so I'm not just criticizing PHP; PHP has a lot going for it. But in the language and the library there are some annoyances.

So let's take a look at a little bit of Python and some equivalent PHP, so you see this notion that a naming convention seems tempting but is not necessarily a great idea. In Python we say x equals a string, we call x.find with the first parameter being the thing we're looking for, and then we have y = x.
replace(old, new), where the first parameter is the old string and the second parameter is the new string. And then we ask how long this thing is. So everything's very consistent. But if we look at PHP, it's almost identical except it's calling library functions, right? So $x equals a string with 'old' in it, and then we use strpos, that is, the position in a string: the first parameter is what they call the haystack, and the second parameter is what they call the needle. But then we look at the replacement, which is the equivalent of x.replace in Python, and it's not strreplace, it's str_replace. So do we use underscores or do we not use underscores? And the thing that just drives me crazy is: what would you expect the first, second, and third parameters of PHP's str_replace to be? Well, if I was writing it, it would be the source string, the old search string, and the new thing to replace it with. But that's not what it is. str_replace takes the old string, the new string, and then the string we're doing the searching in. You can go look this up, but PHP talks about how there were generations of these things. strpos, I think, is one of the earlier ones. strlen is one of the older ones too, so it's consistent with strpos but not consistent with str_replace. So the naming conventions are just less than ideal, and you can see from a programmer-understanding point of view it's really simple: if we look at the Python, it says y = x.
replace(old, new). I can remember that calling sequence, and I can never remember str_replace in PHP. So the thing we're going to do here is put the methods in the structure, and we see some of our C code before and after. We create a pydict, and you'll notice that, just like what PHP did, I called those pydict_put, pydict_len, and pydict_get. I was consistent, because I always used underscores and pydict is the name of the structure; pydict_del too. So I was pretty consistent with that.

But now what we're going to do in the name of encapsulation is take all those methods and make them be part of the structure. We're going to find that they're just pointers to the methods; we are still going to have global functions, we're just not going to access them through their global names. So we create a new pydict by saying pydict_new, but then to put something in, we call dct->put. And remember, we have to make the first parameter, like the self parameter, be dct. That's always going to be the first parameter, because we're doing it Python-style. And then you know that 'z' is the key and 'Catchphrase' is the value, so that's the calling sequence. Again, that "dct," is just because C is not an object-oriented language, so we put self in there ourselves, an homage back to exactly the way that Python did it and why they did it the way they did: they were creating an object-oriented framework on top of a non-object-oriented environment, just like we are. Then you look at dct->len, and of course putting dct in as the first parameter is redundant but necessary. We can do a dct->get, where, other than the first parameter, which is dct, we are putting the key in, and away it goes. And then we can call the del method. But now you'll notice that every single method that is associated with a pydict is in the pydict structure. So let's talk about how we're going to do
that, and why. This all falls under leaky abstractions, meaning that when we are in the main calling code and we sneak in and peek at the data attributes inside the class, we call this leaking. Later we'll talk about iterators, and why iterators seem inconvenient and clunky, but their whole job really is to hide implementation details to make a cleaner abstraction. When the calling code depends on the internal implementation names and approaches, that's what it is to be leaky. So we need to define a contract between the class and its calling code that we won't change, and we're going to call this contract an interface; another word for that is abstraction.

If we look at all the code in our earlier implementation, the whole thing is leaky, right? Especially if you look at that for loop where it says for (struct dnode *cur = dct->head; cur != NULL; cur = cur->next), printing cur->key and cur->value. That should trigger a little alarm that says, "Don't look inside these things." Later we'll talk about iterators and how far we go. This code that we're looking at right here, I think it's pretty, but it violates the abstraction boundary, because we're looking too deeply into the fact that this is even a linked list. We shouldn't know there's a linked list. It may not be a linked list; it may be some other kind of structure, a tree or whatever, and we will later get to the point where we make different implementations of these things, not just a linked list. So that's this idea that you should be like, "Oh no, there's this little wall, but now I'm looking inside." When you're looking inside, that's when you're violating the abstraction boundary, or what we call a leaky abstraction.

What this leads to is the notion that not all object attributes are the same. When we're going to build an object,
we are going to decide what parts of these things are the contract and what parts are "leave us alone, we're going to hide this stuff." The concepts in object-oriented programming are that things the calling code is allowed to see, whether they be data or methods, are called public; things that are reserved for class use only are private; and then, when we start talking about inheritance, which we won't talk about too much, there is this middle category called protected, and that is stuff that classes and derived classes can look at but not the calling code. So protected is more like private from the point of view of the calling code.

If you look at the abstraction boundary that we have here, we see that the place where the abstraction boundary is failing is that head and tail. We want to make an abstraction boundary and say, look, the notion of the head and the tail, that's going to be all ours; that's inside the abstraction boundary and you're not supposed to mess with it.

If we look at how we do this in Java, there is a keyword called private. In Java we're making a Point class with two double values, and we're saying x and y are private, which means you can't access them outside of this class. The constructor is public and dump is public. So that just means that you can't access x and y outside of the class, but you can access dump and the constructor. In C++ you see a private: and a public:. private says these doubles x and y are things that can only be used inside the class, and public says the constructor and dump can be used outside the class. This is just syntax that they put in. Now, interestingly, access control in Python is a little wonkier, because Python doesn't really have keywords like public and private. So here's what Python is doing, and you've seen these across all the Python you've done,
where you see these double underscores, dunders as it were. Like the constructor: you're not supposed to call __init__ yourself; that's just what happens when you create an object. So __init__ is an internal method, a private method. __x and __y are totally valid variable names, except we're marking to the outside world, "Hey, you're not supposed to access these." And then def dump: the fact that we didn't put a double underscore in front of it means that it's public. So the double underscore is the signal inside of Python for access control, and if we look at some of the stuff that C++ really does, this was borrowed in many ways from how C++ does things internally. So up next we're going to talk about this map abstraction and the kinds of things that we do under the covers, in the implementation details of the abstraction.

[Music]

So now we're going to dive into the notion of abstractions. We're going to take an interface and compare it across a number of different languages. We're going to call this abstraction a map. A map is a common term that we use abstractly to describe key/value collections, and each language tends to have a different name for it. C++ actually calls it a map, Python calls it a dictionary, Java also calls it a Map but with an uppercase M, in PHP we call them arrays, and in JavaScript they're actually objects. And then we're going to look at the iterator pattern as an abstraction for looping across multiple implementations.

So let's take a look at some sample Python code that's playing with a dictionary class. We create a dictionary at the very beginning, then we fill it up with some key/value pairs. You'll notice d['z'] = 8 and then d['z'] = 1; that's got to be a replacement, so there's no 8 in there after that second assignment. We then print it, then we do a get of 'z' to see if it's there, and then
we do a get of X and it's not there so we see x equals 42 when it executes then we say give me an iterator of the items in this dictionary and so what that basically is going to do is an is the iterator itself is not a list in earlier version like python 2 when you ask for the items you tended to get a fully filled out list but that's a waste of memory so the iterator is simply a data structure that is keeping track of where in the list we are and then we call it next over and over and over to advance through the iterator so we don't have to make complete copy of all the data we just have a little pointer that advances through so items is a relatively small data structure meaning it doesn't include all the data in the dictionary it just is itself a pointer to something it's all internal remember abstraction is like hey I can give you the next thing internally there's pointers and all kinds of crazy things inside these iterators which we shall soon see so if you print out items you will see that it's like an item iterator for dictionaries that's what that class dict item iterator is telling us but then we can call the next function which is built into Python and say hey iterator do your job and hand me back the next thing or if we've let reach the end of of the dictionary Falls now come in any order these have any order dictionaries of course um but we get back the entry or we get back false so we say while entry then we print the entry and then we say hey give me the next one and then Loop up to the top and when it becomes false we're all done and so what you see because this is an order dictionary is you see Z1 X9 B3 A4 and then it finishes so this we've not we don't know about next Arrow next we don't know even the in this case we're just getting a tuple back so we do know that but if we take a look at this same kind of concept in PHP uh we make an array and we fill it up Z gets to be eight Z gets to be one and that's an overwrite and then we put three more 
things in, and we can print them out, and we see that it's kind of an ordered dictionary as it were: z, y, b, a. Then we do a get, and we're using the null coalescing operator, which is the double question mark. We say give me $a['z'], and if that doesn't exist, then give me back 42. So it's kind of like a get, but that's PHP 7 and later. We look up x and we don't get it, so we see x = 42. Then we run through an iterator. Again, there are structures inside of arrays, but we know nothing about how PHP implemented the arrays; we just know that if we say foreach ($a as $k => $v), we can print out $k and $v. This is a very abstract way of saying I want to go through all of them, I want the keys and values, give those back to me, but I don't care how you do it, whether you make extra copies of data, etc. So that's another iterator pattern.

Now, in C++ the data structure is called a map, and if you read this, you'll see that it talks about how the implementations work, etc. But the C++ equivalent of a dictionary is, in effect, a map. So this is some C++ code. The first thing we see is that we're going to create a map, and in this less-than, greater-than syntax you're seeing that the map is mapping a string to an integer: the key is a string and the value is an integer. The previous two languages didn't care so much about types, but now we're in C++, which cares greatly about types. So now we say mp["z"] = 8, then mp["z"] = 1, which again is a replace, and then y, b, and a are set to 2, 3, and 4 respectively. Then we do something like a get operation, and this one is a little funky. Why they didn't give us a get operation, I do not know. What this is using is a ternary operation: it's saying mp.count("z"), how many z keys are inside this thing, and if it's greater than zero we print out mp["z"], and if it's zero then we print out 42, which functions like a Python get on a dictionary. This syntax is funky. You can Google it; there are two ways you can do it, and neither of them makes me particularly happy, because I think that for a map-like object a get with a default is pretty valuable. The notion of running through and counting means you found it or didn't find it, and if you found it, why don't you give it back to me? But they just don't have a get.

But now we see an iteration. It says for auto. auto is a type, but it's an automatic type, and it knows that mp is a map<string, int>, so it creates this cur pointer, which is a pointer to, not exactly a map<string, int>, but a map entry. We don't have to care about that. The variable cur has a type: whatever mp.begin() is going to give us back. The compiler knows that based on the map<string, int> and makes cur the right type. So this is like "whatever type you want", but it is not "any type"; it's a very precise type, and that's a hallmark of C++: all the types are very, very precise. It's a for loop, so you see the three parts separated by semicolons. The initialization clause, auto cur = mp.begin(), says, hey, we've got our iterator, get me started: go to the beginning and give me the first one. We keep going as long as cur is not equal to mp.end(), the end where there are no more; that's kind of like their null. And then ++cur, so we're incrementing cur. Then there's a key and a value, but they don't call them key and value; they call them first and second. The thing coming back from mp.begin() has an attribute first and an attribute second, and we call c_str() to convert the key to a C string so I can use printf and don't have to use cout, just because, I don't know why, I didn't want to use cout in this one. You see an abstraction where the first and the second are known, but because this is a key and a value, that's not such a big deal. Okay, and so that's doing
the same thing in C++.

In Java they have an interface; you see the word interface here, an interface named Map<K, V>. Just like in C++, this is saying a map is a key and a value, but what we're putting in here is the type of the key and the type of the value. So we're going to make a map that has a String key and an Integer value. You might ask why I didn't do String to String, and that's because it's just easier when I'm writing so much C code. It will also be fun when we actually count things, if you remember from a long time ago when we did counting. Map is the type, and then there's that <String, Integer>. This is a kind of polymorphism, where a Map can map strings to integers, or integers to strings, or strings to strings, or who knows what to who knows what else, meaning this Map is exceedingly flexible and doesn't care what kind of type it's using as long as the type meets some basic criteria.

So here's a bit of Java code that does the same thing that we've been doing. We're going to make this variable map, lowercase, of type Map, a Map of strings to integers, and we're going to create a new TreeMap of strings to integers; new creates a new object. The difference between a Map and a TreeMap is that Map is an interface and TreeMap is an implementation. TreeMap says we're going to build this key-value store, but we're going to store our data in a tree, and that says to a computer scientist that it's going to have a certain performance and memory footprint. Trees are a great way to store key-value data, but they take a little bit more memory than a linked list, as we will later see. So we're choosing an implementation. The other thing you might use where it says TreeMap is what's called a HashMap, which is a simpler map implementation but doesn't keep things in order. So you can choose: the Map doesn't change, but you can say I'd like this to be a TreeMap or a HashMap.
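To make the interface-versus-implementation choice concrete, here is a minimal sketch. It is not the code from the lecture slides: the class name MapChoice and the helper fill are made up for illustration, and only Map, TreeMap, and HashMap are the standard java.util types. The helper is written against the Map interface, so the same code runs with either implementation, and only the iteration order is allowed to differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; only the java.util types are standard.
public class MapChoice {

    // Written against the Map interface, so any implementation works.
    static Map<String, Integer> fill(Map<String, Integer> map) {
        map.put("z", 8);
        map.put("z", 1);   // same key again: replaces the 8
        map.put("y", 2);
        map.put("b", 3);
        map.put("a", 4);
        return map;
    }

    public static void main(String[] args) {
        // TreeMap keeps its keys in sorted order.
        Map<String, Integer> tree = fill(new TreeMap<>());
        System.out.println(tree);  // {a=4, b=3, y=2, z=1}

        // HashMap holds the same pairs but promises no particular order.
        Map<String, Integer> hash = fill(new HashMap<>());
        System.out.println(hash);

        // The contents are equal regardless of the implementation chosen.
        System.out.println(tree.equals(hash));  // true
    }
}
```

Swapping new TreeMap<>() for new HashMap<>() is the only change needed at the call site; everything written against Map stays the same.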
They're both key-value stores: a TreeMap is an ordered key-value store and a HashMap is an unordered key-value store. They have different performance behaviors and internal implementation details, but it doesn't matter, because they're both Maps. In this code that we write, we could literally change TreeMap to HashMap and the code would work exactly the same, but the order of the key-value pairs might be a little bit different.

Now, you'll notice that when we're putting stuff in, we call a method, map.put. Everything we've seen so far says something like map["z"] = 8. Java chose not to do what's called operator overloading, so it really does everything in a method. The kinds of things that you think are going to be done with an assignment statement or some other syntax tend to be done with methods and parameters. map is the object instance that's being worked on, and map.put("z", 8) is basically saying map sub z equals 8. Then we do a put of z and 1, which is going to overwrite; you'll see I'm doing the same thing in each one of these languages. Then we put in y, b, and a with 2, 3, and 4 respectively. I can print it out, and if you look at the printout, it looks a lot like what it looks like in Python. There's this thing called getOrDefault, map.getOrDefault, which says, if the key z is in there, give me the value, or else just give me 42 as a default. In the first case z is there; in the second case x is not there, so you see x is 42. That's not a bad name for it. It's a little more verbose than get, but it's pretty much the same as what we do in Python.

Then we have an iterator. In this for loop you see that the iteration variable has a type, so we don't have this auto. Later versions of Java may have an auto, but here I'm explicitly showing you: it's not a Map<String, Integer>, it's a Map.Entry, which is an entry inside of a map. It's an abstract interface to the entry inside of a map. Each entry has got to match the <String, Integer> that's in the map. So there's a Map<String, Integer>, which is the whole map, and then there's a Map.Entry, which is one of the entries. But this Map.Entry is also kind of an iterator, right? We're going to iterate and move forward, so it's not just the key and the value; it's really the key, the value, and the position. But we don't see the position. All we know is we use this for syntax, which is kind of like a for-in loop in Python, and we call map.entrySet(), which says I want a set of all the entries. That entrySet does not construct a giant in-memory list and then go through it. It gives you an entry with the key and the value of the first one, then you hit it again and it gives you the second one, you hit it again and it gives you the third one, and pretty soon there are no more, which means the loop is going to stop. The entry itself does have a key and a value. Key and value are known in the Map.Entry interface, so you say entry.getKey() and entry.getValue(). Note that they're using methods to give us back the key and the value, whereas in the previous languages you saw attributes being used in the iterators. That's because Java is obsessed with accessor methods like getters and setters versus just grabbing attributes. The key thing is that they can add a little bit of business logic if they want, rather than having the key and the value already completely computed and sitting in an attribute for you to use. entry.getKey() sometimes just grabs something that's already computed, or it might actually go do some work. So they put these things in what Java calls getters and setters; in this case we're not seeing a setter so much. But making it so that instead of it being entry.key it's entry.getKey(), open paren, close paren: that's a very Java way of thinking about this.

So we started by talking about a simple Python dictionary where we filled it up, we used get, then we created an iterator, and then we abstractly looped through that iterator. That's what we wanted to accomplish in this section, just to see how that is done in a wide range of different languages, because the map abstraction is this thing that we use as software developers, and it's kind of a sealed thing, and underneath it all the magic happens. [Music]

So I want to talk about the C++ programming language for just a bit, because C++ plays a really, really important role in the development of object-oriented programming. C of course came out around 1972 through 1978, and then C++ came out in 1980, and then both C++ and C co-evolved through the early 1980s, and then you see things like C and Java and Python and PHP all informed heavily by C++. If we look at how object orientation was happening before C++, it wasn't really appreciated by the typical mainstream procedural programmer of the day. With things like Algol and Simula, it was kind of like there were tribes that liked procedural and tribes that liked object-oriented, but then C++ came along and pretty much unified them. That means you probably learned Python as your first programming language, and you were using object-oriented programming from the time that you started, and C++ is what unified that. It was C, quickly the number one procedural programming language, and then C++ as the number one object-oriented programming language, that brought order to the notion of procedural, object-oriented, hybrid, etc., and everything that came after 1980 was really strongly influenced and informed by C and C++.

So let's take a look at how this changed over time by looking at some syntactical influence. So C++, which was the
earliest, which was a preprocessor plus a compiler (it turned into a compiler on its own eventually), has this concept of a map with a separately selectable type of the key and type of the value, and it uses the square brackets: map["z"] = 8. That is in effect a put, right? It's an insert into the map or an update of the key, and it's a pretty succinct syntax. Python really didn't want to make a more complex syntax than C++, so Python worked on its language to make it so that you could just say d = dict(). It's a dynamically typed language, so we don't care that it's going to be strings mapping to integers, because you can map lots of things to lots of things in Python. But it did follow the d['z'] = 8 syntax. You've used this from the beginning, when you first started programming in Python, and you thought that was just natural, but what's really going on is more like what Java did in 1995: we're not going to use the square brackets to put stuff into a map; we're going to be more pure here, and we're going to create methods. If you look under the covers in Python, you see that there really is a method that does that insert of the key z and the value 8. But if you look at Java, it says Map<String, Integer>, so we again have this notion of a generic class, capital-M Map. Map<String, Integer> map, which is our object variable, equals new TreeMap<String, Integer>, which of course is choosing the underlying implementation, and then they have the syntax of a setter-style map.put, giving the key and the value as two different parameters. There are choices that each of these languages is making, and I'll try to highlight them as we go through.

I want to talk a little bit about how C++ and its object-oriented approach and design made it so that a class like the map works almost the same as a low-level type like a float or an int, and in particular has access to the special characters or operators like square brackets or plus or minus. How that happens is that you can create a specially named method inside of a C++ class that the compiler will consult and call when it encounters certain things that you think of as language syntax. As it's parsing language syntax like square brackets, it's like, oh, I've got some code to do some work here. This concept is called operator overloading, meaning that the behavior of the operator is controlled by the writer of the class.

In this bit of code I've created, for no particular reason, a class that I call TenInt, ten integers in a row. I have an array of 10 integer values, but that's private, so that's something that code outside the class can't talk to. In main you can't say ten.values; it's like, no, you're not allowed to touch that, because it's private. What I do instead is create a public method, which is the square brackets operator method, the method to be called when the compiler encounters my object name followed by square brackets: call me. The first thing we see is that the return type of this operator[] code is an integer reference, an integer that can either be read or written, which means it can be on the left- or right-hand side of an assignment statement. The parameter that it's being given is an integer index, which is the thing inside of the square brackets. It is a reference, because of the ampersand, but it is a constant
reference, meaning we are not allowed to change it inside of this function. So const means we're leaving it alone, which means we're not allowed to say index = 42 inside the operator[] method. What we're returning is that private variable, values[index], but we are returning a reference to it, so wherever it appears in the original C++ code in main, that reference can be, like I said, on the left or right side of an assignment statement.

So let's take a look at how this works in the main code. I create a variable called ten of type TenInt, and then I say ten[1] = 40, which means I'm storing 40 in position one. When the compiler sees that ten[1], it says, oh, this little class has an operator[], so I'd better call that little method and pass the 1 in. That returns a reference to values[1], and then the 40 is assigned into that reference. Now, Python doesn't have return by reference, and it doesn't have call by reference. C kind of does, but those are pointers, and references and pointers are different. You'll notice there's no special syntax to dereference a reference, whereas when you get a pointer you've got to have special syntax to dereference it. So this notion of call by reference and return by reference is impressive in C++. It allows it to do a lot of things, and it allows us to have this seemingly native line of code, ten[1] = 40, which is really just a bunch of method calls. Amazing. Then we immediately print "ten[1] contains" and then ten[1]. Again, this is like the right-hand side of an assignment statement: it calls the operator[] method, passes 1 into it, we return values[1], and that's what gets printed out. Then we say ten[5] = ten[1] + 2, and now we see ten[1] on the right-hand side of an assignment statement, which calls that same code in the operator[] method within the class, passes in the 1, and returns the reference to values[1]. That reference is just 40, and then we add 2 to it and assign it into ten[5], which is again calling the operator method to get a reference to values[5], and that assignment happens. All I really did was kind of fake it, but I used it to show you this lovely ability to do operator overloading.

When I first found my way into Java, this was my greatest disappointment. I had taught a C++ class and then I learned Java. I wasn't a whiz at C++, but I really thought it was pretty elegant. I didn't learn Python first; I learned C++ first, and I'm like, hey, that's what object-oriented ought to be. Then I go to Java, and Java didn't do that. Java, as a choice, did not want to take values by reference in method calls, and even more, it did not want to return references from method calls. Those two things, the ampersand in the return type and the ampersand in the call parameter, are essential for C++ to accomplish this. Java did not want to return references in particular because of garbage collection, variables going out of scope, etc.; if you return a reference, you don't know when it's out of scope. These are powerful, complex, and potentially quite dangerous things, right? But the C++ design was, hey, you are a samurai warrior, and you are going to use these very wisely; we don't want to take power away from you, we want to give you all the power that you might want and just trust that you're not going to make mistakes. The choice that Java and Python make is, no, no, no, we don't want you to make mistakes, so we're not even going to give you this kind of thing. There are other ways to do everything; things like tuple returns are a good example of, kind of, an homage to this notion of
returning things that are not always just a single thing. But C++ is kind of unique. Now again, Python emulated the C++ syntax, which was quite beautiful and was a result of C++'s support for operator overloading. And here's where it all comes together. Python saw the beautiful syntax in C++: when you did the right things, the compiler would let you use this beautiful syntax and still call your methods inside the object. But they didn't want to do the call by reference and return by reference that C++ did, so what they did is basically a syntax transformation. Python is in a sense very Java-like, in that everything has to happen through methods, but then there are these hidden methods.

Okay, let's take a look at the code. I'll create a dictionary named x, and then we say x[1] = 40. We know what this means: somewhere under the key 1 there's 40. Now I can print this out by saying x.__getitem__(1). That's taking the index inside of the square brackets and passing it along according to a predefined rule that Python understands: the square brackets turn into __getitem__, where x is the object and the parameter is the thing inside the brackets. If we think about it, on the right-hand side of an assignment statement, where we're just reading it, it's just doing a get, right? It's doing a getter-like thing: give me item 1, and out comes the 40. That's how the print of x[1] is 40. What really happens under the covers is that there is a class which has __getitem__ defined in it as a method, and that's how it loads x[1]. If we go down another line, we say x[5] = x[1] + 2. Seems simple enough, and literally for years Python software developers don't even need to know that this is miraculously and beautifully complex. What this translates into at run time, by compiler syntax transformation, is that the x[1] on the right-hand side turns into x.__getitem__(1), which pulls up the 40, and then 2 gets added to it, and then that result is passed into x.__setitem__ at position 5. The left-hand side of the assignment statement is the x[5] = part, and that's calling __setitem__. So if it's doing a square bracket lookup on the right-hand side, it's calling __getitem__, and if it's doing a square bracket on the left-hand side of an assignment statement, it's calling __setitem__. This means there was no need to return references, no need to process references, none of the hoops that C++ went through. So you see that Python did not choose to implement it the way C++ did, but they supported the very elegant syntax.

And then you'll see that Java, in 1995, or '94, takes it one step further, in that they're not even going to give you that cool syntax. They're like, no, just do x.put and x.get and call it good. If you know that x is an object and you need to do a get and a put, do the get and the put. We're not going to do this little syntax transformation that makes it pretty, and we're also not going to give you operator overloading, because operator overloading requires references, since it allows the class to return a thing that can be used on either the left- or right-hand side. Again, Java did that because they did not want to make their memory management more complex, so it's hard to argue.

What this shows you is the amazing interplay between these languages. Bjarne Stroustrup went to school in Denmark and started working on C++ in Denmark, but then was hired to go to Bell Labs in New Jersey, where he met and worked for a number of years with Brian Kernighan and Dennis Ritchie and all the folks at Bell Labs that gave us Unix and C over the decades. So C++ came to the world from Bell Labs, from Murray Hill. And Guido van Rossum, who was in the Netherlands at the time, was looking at all this stuff and using all this stuff, and was an expert in C and C++. Back in those days we tended to look a lot at the code that C++
generated, and Guido's like, I'm going to borrow these; these are really good patterns. That's how we see so much influence of not just the syntax but the actual run-time conventions. If you look at some of the generated C++ code, the concept of private is often done with underscores. They use underscores a lot, and Python's like, yeah, I'll just borrow that; I'll just use double underscore as my signal of private, and away you go.

To show the influence that C++ and C had over Python, both in the syntax and in the run time, we can take a quick look at some internal details of how Python works. Python turns out to have implemented operator overloading almost identically to C++, but we don't see it; it's all internal, and you have to kind of look. On the left-hand side is the code that I just got done going through, the C++ code that has the private values and then the public operator overload. On the right-hand side we see a class TenInt, and I'm creating __values, which is a private values attribute, as a dictionary. Then I define __setitem__ and __getitem__, which are like private methods. Python handles bracket lookups on the left side and right side of an assignment differently: __setitem__ is the left-hand side and __getitem__ is the right-hand side. You'll see that in __setitem__ I'm just doing self.__values[index] = value, and in the getter I'm returning self.__values[index]; that's the right-hand side.

So if we look at the code, let's make a TenInt in the variable ten. ten[1] = 40: Python transforms that ten[1] syntax into a __setitem__ call with ten, the number 1, and 40, and you can see of course it worked. The three values are self, index, and value. self is ten, which is the object instance, the index is the thing inside the square brackets, and the value is the result of the expression on the right-hand side; it's not just 40, it's the expression on the right-hand side. So that goes in. Then we see the print of ten[1]. That is a right-hand-side reference to ten[1], so that's going to call __getitem__: self is ten, index is the 1, and we just return it. That prints out a nice, happy little integer, which is exactly the value 40, so it says ten[1] contains 40. At this point it should be obvious what's going on when I say ten[5] = ten[1] + 2. The ten[1] on the right-hand side turns into a __getitem__ that gives us back the value 40; then the 40 and the 2 are added together to finish the right-hand side of the expression. Then we assign that into ten[5], which turns into a __setitem__ of ten, 5, and 42, and that stores 42 in position 5 in our private values variable. Then I print it out, which is a right-hand-side lookup of ten[5], which calls __getitem__ again with self as ten and index as 5, and so we get the 42.

So you see how similar they are inside. Again, if you look at generated C++ code from early C++ compilers, you'll see these double underscores used in various places, which means that Python in its internal implementation used the same patterns as C++ did in its internal implementation. Python chose not to do call by reference and return by reference. Java chose not to do call by reference and return by reference, and Java also chose not to do the fancy syntax transformation. But who knows, maybe one of these days Java could do that syntax transformation and be like, whoa, Java has everything. To some degree Python has shown the way on how you do this without doing call by reference. And again, that ampersand on operator and ampersand on index on the left-hand side, that's the scary part where language designers are like, I'm not sure I want
to go do that, because C++ is not a garbage-collected language, but Python and Java are garbage-collected languages. That's not the only reason that Python and Java didn't want to do call by reference, but it is one of the reasons. It simplifies things to know that when a function is done, it's done, and there aren't sneaky little pointers inside that function that need to stay alive, so you can throw stuff away when functions are finished. Okay, enough of that. Again, I'm just trying to show you, in the simplest possible examples, the kinds of design decisions that all these language and library designers were making as they built the languages that we know, love, and use today. [Music]

So it's time to stop deep diving into object-oriented theory and get to writing some code. We're going to start with something simple: encapsulation. The second thing we're going to do is iteration, but for now we're just going to do encapsulation, and in the next section we'll do iteration. Really, most of this code you've already done; we're just refactoring it, moving things around, and taking these functions that we named by convention and allowed the calling code to use, and moving them into the class using some pointers. The real accomplishment here is map->put, map->get, and map->del. These things are now named and accessed in such a way that the functions we're calling are attributes in the class itself. Other than that it's not that different, so it's not that big of a deal. The other thing we're going to do is be a little more explicit about which things in the class that we're building are public and which are private.

We'll start with a MapEntry. This is the structure that makes up the nodes of the linked list. The key is a character string, and the value is an integer; we're just going to keep it simple. We've got to dynamically allocate the key, like we've been doing. Then we have a prev and a next. The prev and the next have double underscores, so that means they're private, but we decide that key and value are public, and we indicate that much like Python would, by not putting double underscores in front of them and remembering in our minds that they're allowed to be used in calling code.

Most of the Map structure looks pretty simple. We have a head, a tail, and a count; you've been maintaining those for some time now. Those are private attributes, so we've renamed them to have double underscores in front of them. Then we have a series of public methods, five of them. The key thing is that these are pointers to functions, and that's what void (*put) means: we're allocating a variable in the structure named put, and it is a pointer to a function that returns void. So not only are we defining the attribute that we're going to use to access the function, we're also defining the calling sequence: it returns void and it takes three parameters, a struct Map pointer self, a char pointer key, and an int value. When it's all said and done, this is not putting the code in here, as it might in JavaScript, for example. What it is is a single 64-bit number, which is a pointer to the beginning of a function. The function's signature has to match, so we're defining the method signature, but in terms of allocation, we're really allocating one pointer for put, one pointer for get, one pointer for size, one pointer for dump, and one pointer for del. And again, look at get: get takes as its first parameter a pointer to the map, which is self, then a key that we're going to use to do the lookup, and then a default value to return, and get returns an int. So that's pretty straightforward. It took me a
little while to get the pattern right, because the parentheses here are really, really important: we're defining the attribute name, the rules of its use, and the method signature of the function that we're eventually going to point to. Okay, but that's pretty much it, right? We're just going to put these things in, and so the constructor is pretty straightforward. It's not that different from the constructor that you did. We've got to build these functions, double-underscore Map put, double-underscore Map get, Map size; they're outside of this, they're above us in the source code somewhere. And we're just saying p->put, which is a public attribute, is equal to ampersand, the address of the double-underscore Map put function. Super simple. Ampersand is address-of: put is the address of that function, get is the address of that function, size is the address of that function, dump is the address of that function, and we're done. And this is kind of showing you, let's see: head is a 64-bit pointer, tail is a 64-bit pointer, count is probably a 64-bit integer or a 32-bit integer, and put, get, size, dump, and del are all 64-bit. So the size of the map itself, the Map structure, is about ten words or less, and that again has to do with efficiency, right? But you probably have most of the code you need for Map put, Map get, Map size, Map dump, and Map del. So Map dump is pretty simple. If we look at this, self is the pointer to the map, so it has a head, and each entry has a next, and we're just going to go through it until cur is equal to null. We've got a MapEntry, which is the type. Now, we don't double-underscore cur, because that's really just an automatic variable inside this function that has nothing to do with the outside world. And you'll notice that we're accessing double-underscore head and double-underscore next, because we're in the class, right? Those are private, but it's totally legit to access them when we're building a dump tool
inside the class. Private things are accessed in the methods of the class; that's normal, right? We don't have to hide those. I'll tell you, when I'm building something like this, the first thing I want to get working is some kind of a dumper. When I write this code, before I hand parts of it over to you, I have Map dump, Map dump, Map dump; every line, I put a Map dump, and eventually, when stuff starts working, I start taking the Map dumps out. So just debug, debug, debug, always. I couldn't write this code if I didn't have a Map dump, and so I'm going to make you do it as well. So the destructor: like most destructors, the key thing is to draw the picture, figure out which parts were dynamically allocated, which parts came from malloc, and then make sure you free them. So we're just going to loop through, and again, we're in the class, so we're happily using double-underscore attributes. The order is always important, but by now it should make sense. We're going to free the key, because remember, that's a string pointer that we malloc'd. We do not need to free the value; that's actually part of the MapEntry struct, and we're going to get rid of that in a second. We're going to grab the pointer to the next one first, then we're going to free the current MapEntry, and then we say cur equals next and we loop back up. So eventually we just go through the linked list, free each key, and then free the entry itself, and we've given back all of our data. And when we're all done with that, we actually free the ten words or so that is the Map structure. Again, this should start to look familiar to you. So get is pretty simple, as long as you have some code like Map find. Map find is going to do all the hard work, but Map find can look at double-underscore head and next and look at all that,
write some for loops; it should not be too hard. And again, double-underscore Map find is private, but we're in the class, so just have fun talking to the private stuff. Map put is something you're going to have to write, but if you think about it, you call Map find, and you've done this before: you've used a find-like method to find the thing in the linked list. If you found it, it's really simple: you just change the value and return. If not, you construct a new MapEntry and add it to the end of the list. So again, I just hope by now you can knock these things out. And that's basically it. If you really think about it, this was a very simple section where all we're doing is changing from globally named functions: we're enforcing the rules of private with double underscores, we declared pointers to functions in our map, our constructor sets them up, and the rest is really just refactoring code that you pretty much already have. [Music] So now we come to the last section of this module, and that is iterators. It's all been building up to iterators. And this is a situation where you might say, wow, I don't like iterators; iterators seem like a more complex way to write loops than just looking at head and next and sneaking in and violating the abstraction boundary. But as you'll see in the next module, you're going to have very different underlying data structures, and we want to be able to write the same code over and over again. So at some level, what we're doing here is building a map implementation that can be a list-based map, a hash-based map, or a tree-based map, and what we want is this code right here; this code should not change. We should say, hey, give me a map. We've got a MapEntry, we've got a MapIter; those are all part of the contract that we have with the object, be it a linked list, tree, or
map, or hash. We're going to do a put, put, put, put, dump, get, get. Now we're going to iterate. A hash doesn't have a head and a next, so that's not going to work, right? So we're going to have to say, hey, there's this abstraction: give me an iterator for your map. We don't know what's in the iterator, and we don't need to know what's in the iterator. The only thing we need to know is that it has a method called next. That's it. So we're basically saying, let's get started: give me an iterator from the map. We call the iter method, passing the map instance as a parameter, and it gives me back an iterator. Then we write a while loop and we say, hey iterator, give me the next thing. It is up to the iterator to start at the beginning and then advance and move down. When we get null, we break; if not, we print key and value from cur. Now, cur is of type MapEntry, iter is a MapIter, and cur is what we get back from the iterator: from iter->next we get back a MapEntry. And if you recall, key and value are public in the MapEntry. I could have had you hide those behind getters, have a get key and so on, and name those with underscores, but we're just going to leave them as public attributes for now. Right now we're kind of hard-coding this string key and integer value throughout, so it's going to be okay. And then of course we call the destructor on the iterator once we're done with that loop, and then we call the destructor on the overall map. This code should be roughly the same when we go from linked lists to hashes to trees. This is the moment, and it's this iteration pattern. So I'm going to do a bunch of pictures. I've been drawing some pretty complex pictures of these things, but by now the whole pattern should be familiar: what next means, and prev, and these things being null, a doubly linked list, and that key pointing to another little, you know, a char * key, which
points to another little statically allocated thing, and the head and the tail and all that stuff. For this section I'm going to really simplify these pictures and say, look, there's a variable called head somewhere, and it points to z equals 22, which points to w equals 42, and that's the last one, and its next points to null. So I'm going to use a succinct representation of linked lists going forward. If we review what we don't want to do: we do not want our calling code to know about count, we do not want it to know about head, and we don't even want it to know about next within the entries, right? We do want it to know about key and value. So calling code with map arrow double-underscore head, cur arrow double-underscore next: no, no, no, that's not allowed, right? Those are private. In our calling code, if we're looking at things that have underscores, technically we could do it, because there's nothing in C that's stopping us, right? We created those things. But we don't want to touch head or count, because when we're doing a hash, head's not there anymore and next doesn't work. We've got to hide that, we've got to wrap it. We've got to create a strong abstraction around this notion of starting a loop, doing one iteration of a loop, and then ending the loop. We have to abstract that away. This is the concept of separation of concerns: our calling code does not need to be concerned about how the object can be looped through. So we need a generic notion of looping. You can think of the iterator object itself as a thing you create, and it sort of starts at the beginning, and then you hit it, boop: next, give me another one, give me another one, give me another one. Inside the iterator, the state is changing; it's advancing, and it just gives them to you one at a time. You can't ask it for the same one again; once it's been
given to you, it's sort of ratcheted down to the next one. So if we look at this Python code, we start with a dictionary: a maps to 1, b to 2, c to 3. We print it, and there's the dictionary. Then we say, oh, let's convert that to a list, and that list is the keys, which is a, b, c. Then we say, give me an iterator from that dictionary. We print that out and we print the type of it: it is of type dict_keyiterator object. So the iterator itself is not the entire dictionary, and it is not a list of all the keys. It is an internal structure that Python is going to maintain, and then we're going to poke it by calling next, next, next. Now, the whole next thing is probably calling an internal method like double-underscore next double-underscore, so next is part of the Python language. If you look at the while loop, it's a while True loop. We say item equals next, which means give me the next available item in the iterator and then advance it, and if we're past the end, return me False. Then I say, if item is False, break; otherwise I print the item, and so I'm getting the items a, b, and c. The key thing here, and we're using Python just to keep it as simple as possible, is that the iterator is something that's created. The iterator doesn't contain all the data; it contains pointers inside of it so that it knows what to do next. And we repeatedly probe the iterator with the next call to get the next thing, and that both advances, returns, and indicates when we have run out of things. It's weird, because we're so used to saying for blah in blah, but that's not how iterators work. Iterators want this next thing to happen. The C code for the iterator: you create it, you loop through with next, and the C one is going to look pretty much the same as the Python one. So if we look at the MapIter structure, it is going to have the kind of things that we've needed; we're just going to pull them in. So the concept of current, like
we've used the variable current in the past for these loops, is in the MapIter structure, and it's private. So we moved that from a variable that was in the main scope to inside this, and made it private. The only public things we have are a next method and a del method. So now what we have is a simple contract. You can see our outside contract for this class: it's created by the Map class, but once it's constructed, next and del are the only things that you can do with it, and that's that. And then we get to decide what's inside this class. So when we construct it, we're basically going to start it: we allocate the right size, we take current and point it at the first item, pulling from the head of the linked list, and then we set the two methods, next and del, to the addresses of the functions that implement them, and then we're done, right? We're inside of the Map class, and this makes a MapIter, so this is the code you get when you call map->iter. And we're totally allowed to do everything private with map because, again, the developer of the Map class is the same person or team that's developing MapIter. If we wanted to change head to, you know, x, we could, because we would just go inside all our code and change it. But head is not exposed to the calling code, so they wouldn't notice that change. Again, that's the key. At the moment the constructor has been called, before the first call to next, this is what the MapIter looks like: current is pointing to the first item in our linked list. Then the while loop starts and it calls next. Now, you'll notice that it's got this kind of weird thing where it grabs the current. We could have implemented this differently, but the way I did it, current starts out pointing at the head, so at the first call to next I grab current for the return value and then I advance current, so
that at the moment that it returns, the return value is b equals 14 and current now points at d equals 21, preparing for the next call to the next function. Okay, so then it comes in and retval grabs 21, and that's what we return, and then we advance to 19. You can kind of see retval and current chase each other down this linked list. So we return f equals 19, and then current points to null. On the following call we notice that current is null, and we return null to tell our calling code that we are finished. So to start the iteration, to prime the iteration, we call the map object and say, hey, give me an iterator for the map. We get that back and store it in our variable iter. Then we start an infinite loop that says while one, or while true: cur equals iter->next, give me the next one, which the first time through is going to give me the first one. Then if I got a null, I'm done with the loop; otherwise I print the key and the value of the one I got. Then I go up and iterate to the next one: print, print, print, oop, I got a null, and then I delete the iterator. And this is super equivalent to what we do in Python, where we say, give me the iterator for the dictionary x in the variable it; then while True, we take next of it, which advances, or otherwise gives me False; if we got a False, we're done, and otherwise we print it. So these two are very, very parallel. You'll notice that Java and C++ don't do iterators the same way, but I wrote this C code to mimic the Python way of doing this. So it's been quite a long journey. We really focused on abstraction and encapsulation, and we've done it with iterators, and what we've done now is lay the groundwork for multiple implementations of the map. We shouldn't have to change our main code anymore. We built a list map, and then we're going to build a hash map, and we're going to build a tree map. Those have increasing complexity and improved
performance characteristics, and now you really are going to start seeing why we have abstraction: so that we can fool around underneath the abstraction and accomplish really cool things, and get closer to what Python really does underneath in the dictionary implementation. [Music] Hello, and welcome to the last lecture in this course. We're going to talk about tree maps and hash maps. Up till now, we've built a map abstraction, we've looked at how iterators work, and we've created a linked list implementation of the map abstraction. Now we're going to build a hash version and a tree version of that same thing. If you recall, some lectures ago I read a Robert Frost poem, you know, miles to go before I sleep. Well, that's where we're at now: we're coming to the end of these miles to go, although, as you'll see, the end is really just the beginning of the next phase. And with a little foreshadowing: if you have been with me for a very long time, all the way since Python for Everybody, which for me was recorded a number of years ago already, this is the first complete piece of code that I showed you. That was the code to count the words in a file by splitting them, creating a dictionary, and then counting them, and we're going to finish this lecture by implementing this in C. But I'm getting ahead of myself. So the idea here is we're exploring different key-value implementation alternatives. We're going to build an unordered, Java-style hash map, which is like a Python 2 dictionary. If you recall, Python 2 had unordered hash maps, which meant your stuff came out in a seemingly random order; it was the same order each time, but every time you inserted something, the order might change. Now, in later versions of Python 3, dictionaries tend to be ordered, which is more like the list map that we did. We're also going to have a map that is sorted; that's more like Java's TreeMap with an iterator, and it is section 6.5 of the book, and it is a
combination of a tree map and a linked list map. Java doesn't have such a thing, which really kind of surprises me: they've got a TreeMap and they've got a HashMap, but they don't have a linked tree map or a linked map. So here we go. Now, the two implementations we're going to build, you kind of see them in Python, you see them in C++, and you pretty much see them in Java as well, but we're going to do our own thing. So I would say to you, when you're writing this code, I don't want you to think that when I wrote the sample code that I gave you, or when I wrote these slides, that it was easy for me. The concepts of trees and hashes are pretty straightforward, but then you've got to solve the little problems of how to take the previous and hook it to the current, and hook the current's next to the next from the prev, and so you've got to draw pictures. This is an actual picture that I drew. I really wrote all this code from scratch. I mean, I didn't come up with the idea of a tree from scratch, but I wrote this code from scratch. And you can see that when I was building the tree map, my goal was to find the right place in the tree to insert the next item. So you see I've got this 1, 3, 5, 7, 9, 11, 13; I constructed this tree so that it was in order, and then I was trying to figure out where I might put 4, and where I might put 8, and where I might put 14. And when I was drawing the picture, you'll see I had the words lowest node greater than, and I crossed them all out, because that wasn't enough. You'll see when we get there that I have the lowest node greater than and the greatest node less than, and I've got to get both of those things, so as we work our way down the tree, we've got to keep track of them. I'm getting way ahead of myself. Like many data structure programming tasks, you can draw the picture and
it makes a lot of sense. You see the hash map, which is the first one we're going to do; that's really nothing more than a bunch of linked lists with a hash function picking the head. Instead of one head, it has, in this case, four heads. So that one turns out to be easy, and that's the first one that we're going to do. But when I wrote this, I mean, I knew what I was doing, I knew what a tree was, I knew what a hash was; that's the easy part. The hard part is writing the code. Now, taking the code from someone else, like if you're taking it from Python or C++ or Java, that's easy. Thank heaven they wrote it and they tested it, and we have nice, tested, working implementations, so you shouldn't have to write this stuff in most languages, and we're just understanding how to write it. But if you write it yourself, you're going to make mistakes. You're going to be 80% right, but then it's going to be real hard to debug this stuff. So you need to accept the fact that you're not likely to write it perfectly the first time, and debugging is difficult. You're going to print out a bunch of percent-p's and hex values and stuff, and you're just going to go through it slowly, asking, what did I do wrong? The main programs for the programming assignments that I give you are really kind of like unit tests; they're pushing your implementation to see if it's capable of handling all of the common situations. So don't stress if it doesn't work right away. My implementations didn't work right away; they failed. You can go to some website and get the solution, but if you're going to do that, just go to Python and make a dictionary, if that's your goal. Your goal is to struggle with doing something that you understand. You know how to do it, you know what a tree is, you kind of know that you've got prevs and nexts and lefts and rights, but you've still got to write the code, and making one or two
mistakes and then fixing those one or two mistakes is essential to understanding. So with that, up next we're going to talk about the hash map. [Music] So now we're going to talk about a hash-based implementation of our map, and this is the answer to the world's most common programming interview question. We're going to do hash maps and then tree maps; tree maps are harder. Hash maps turn out to be beautifully simple, and that's kind of the reason that everyone likes these as interview questions: the interviewer can remember the answer. A tree map they might have trouble remembering, so it's the perfect thing in an interview to say, draw me a hash map, because they still remember the answer from when they went to school. They'd have to do a little review to get the tree map right. So here we go. Let's talk a little bit about our hash map implementation. It's got a weird order, and once you see the data structure internally, it'll be clear why there's a weird order. It is like a Python 2 dictionary and it's like a Java HashMap; it's very, very similar to both of these things. I'm guessing the code we're going to write is very similar to when Guido made his first dictionary. It's going to have extremely fast insert and lookup, just like Python 2 dictionaries and Java HashMaps. It's going to be iterable, like Python 2's dictionaries and Java's HashMaps, and it builds on linked lists. Surprisingly, it's really easy to build a hash map if you understand linked lists. It's covered in sections 6.5.1 and 6.6 in Kernighan and Ritchie. 6.5.2 is literally the hardest part of the book, and that's why we start with 6.6 and then go back to 6.5. Okay, so let's take a look at our data structures and how we're going to go from the list map to the hash map. Our list map is pretty simple. We've got the entries in the map, which have key and value, which we've decided are going to stay public. We've got the prev and the next for the map entry, which
is just, you know, how we're going to link these things together. And then the list map itself has got a head and a tail and maybe a count and a few other things, and then the methods, et cetera, because we've done encapsulation. The hash map entry, if you look at it, is pretty much identical, and that's because the entries in a hash map are just part of a linked list. The key to the hash map is that there are multiple linked lists, and we see that in struct HashMap: double-underscore buckets is how many buckets we have. In a more sophisticated hash map implementation, we would have the number of buckets grow as the size grew and the lists got too long. That's called rehashing, and we're going to keep that out of our conversation. But we're going to have a number of buckets, and in this case it's going to be eight; those are called hash buckets. Then we're going to have heads, plural, eight of them, and tails, eight of them, but within a particular head and tail it really is a list map. So as you're writing the code for the hash map, go back to the list map. I mean, literally copy the list map code and then change the singular to plural, and you'll see some of the things I show you in the actual code. If we look at how a list map looks, it's got a head and it's got a bunch of entries that have prevs and nexts. I'm not even showing the prevs in the arrows, I'm just showing the nexts, but assume there are always prevs there, because it's a way for us to link things in. Now look at the hash map. You take the actual key and you run it through a hash function, which creates some big number, but it is just a number, no matter how long the key is. It can be one character or 2,000 characters; eventually the hash runs a calculation that gives us back a number, sort of a pseudo-random number where each value is about equally likely, and there's a whole science of hashing. Then we take a modulo, and in this case we have four buckets, so we take this hash
calculation modulo 4, and that gives us a number from zero through three. With that number we pick the linked list, and then we add the entry to it just as if we were doing this with a single linked list. Once we've done the hash and picked a bucket, it really is exactly the same as a linked list. So a hash map with four buckets is the same as four linked lists, and you pick the linked list by the hash computation. The hash computation is deterministic and predictable, so wherever we put d, it's going to be in bucket one: we can look it up in bucket one, we can store it in bucket one, et cetera. And so for inserting m equals 90, that's going to hash into bucket two, and we're going to put it in that linked list. Okay, so it is beautifully simple. Now, what is a hash calculation? This is actually from my PostgreSQL for Everybody course. Basically, the hash maps large data items to a single number, and these are called hash values. The whole concept of a hash function, when used with a modulo, and in this case I've got modulo 16 in this picture, is that it maps a big string into some fixed number of buckets. Often the number of buckets is a power of two, but it doesn't have to be; it's really a modulo operation. There's a whole science of hashing and hash functions, and it turns out that hashing and hash functions are a big part of security and digital signatures and all that stuff, so there are people who spend their whole lives researching how to build good hash functions. There's this SHA-256 compression function; you can go look it up. You can see what's going on here: the arrows are shifting, and the plus with a circle is exclusive OR, so the diagram shows you the shifting and the exclusive OR, shift and exclusive OR, yada yada. They're taking the pieces of a value; it's computed in a loop and updated, and what they're showing you is what happens each
iteration through the loop. And so the idea is we are going to take a string, str, and we're going to take a number of buckets. The idea of a hash is that it is just some integer number. We're going to go through each of the characters in the string; that's the for loop with star str and str plus plus. We take the current value of hash, in this case we shift it three to the left, and then we exclusive-OR it with the character we're looking at. So you can say shift three, exclusive OR, shift three, exclusive OR. You could think of it as an accumulation, but the exclusive OR is a nice form of accumulation in that it increases the pseudo-randomness of this thing, so exclusive OR just turns out to be a super valuable calculation. This loop is going to run once per character, and we're going to print it out so you can see the hex. If we're just taking the letters Hi, you can see the internal hash value growing and changing, and you can see it going from right to left as it grows and there's new data being put in. It's a bitwise exclusive OR. But at the very end, it says return hash percent buckets, which takes the hash modulo the number of buckets, and in this case I'm using eight buckets, just to run the hash function: give me the bucket for this string. You can see me running different strings on the right-hand side and getting back the final bucket. So Hi goes in bucket one, Hello goes in bucket seven, and World goes into bucket four. This is really inspired by the shifting and the masking, but I've simplified it so you can see what's going on. Our particular hash is good enough for our purposes, but it's probably going to have collisions; when tested against a whole series of random data, it's not going to be as good, and that's where a fancy hash like SHA-256 would be
helpful. So now that we understand the basic data structures and how hashing functions work, up next we're going to take a look at actually building a hash map, or at least adapting our list map and turning it into a [Music] hash map. So now that we understand what a hash function is, we're going to actually build our hash map implementation. But what we're really going to do is make a copy of our list map code and change as few things as possible. You'll be impressed with how easy this really is if you have a bit of working list map code. This is the new constructor, and I can call your attention to the changes. Remember, we have a number of buckets, and however many buckets we have, we have that many heads and tails: eight heads and eight tails. So we're going to look in the constructor. We allocate a HashMap; it's not that big, right? It's still got the function pointers for our encapsulation: put, get, size, dump, and iter. That's not changed at all, except it's called HashMap instead of list map. Buckets is set to eight, and we initialize all eight buckets to have a head of null and a tail of null, because remember, this is just eight linked lists, and count is set to zero. Pretty straightforward, especially if you understand the list map. And if you don't, go back and watch that lecture. Don't just go, oh, I didn't understand what the list map was, I'll just keep on not understanding and use ChatGPT. I don't know what to tell you if that's how you're going to go through this assignment. But if you have, and understand, a working list map, this is easy, easy, easy. We've been using list map find before, and here all it does is find a hash map entry if it's already there. We send in the whole hash map, self, which is a very Python object-oriented pattern where the first parameter is always self. We have a key we're looking up, and then we're telling it to start in a particular bucket, and that's the real change. If you have list map find, it
doesn't have a bucket; hash map find has a bucket. So this code is exactly the same as list map find, except instead of starting at head, we have an array of heads and we use bucket to figure out which one, and then we loop through it. We're in the right bucket; something above us figured out what bucket to go through. So if we look at hash map get, which takes a self, a key, and a default, we say, hey, compute the bucket from the key and however many buckets we have, which in this case might be eight. Then we do a struct HashMapEntry star retval equals go find it, passing in the bucket. If the return from find is null, we return the default; otherwise we return the value. So again, there's one line changed between list map get and hash map get. So let's do a quick review of what we do in Map put. Now, this is not the hash map put, this is Map put, our list map put. We call find; if we find it, then we just update the value and we're done. If we don't find it, we allocate the new entry, we set its next to null, and we link it into the list. This is the place where you should be drawing a picture. If the head is null, that means we have an empty list, so self head is this new thing. If self tail is not equal to null, we link it at the end, and then we update the tail, et cetera. So draw these pictures. These are the parts where you'll mess it up; you will get these wrong, and it's okay to get these wrong. Put a print statement in every line here if you're having trouble. You've got all the cases, and these nice little four lines of code capture the cases. Okay, so remember, we're inserting at the end. So what do we do for hash map put? We have a bucket: we run a hash computation to figure out which bucket it is, and we call hash map find and tell it to look in that bucket, so in linked list sub three or sub four or whatever. If we found it, we update the value and return; otherwise
it's time to insert. So we allocate a new one, we set its next to null, and then we know which bucket it is. The previous code was for one linked list, and this current code has eight linked lists, but we already know which linked list we're dealing with. So literally you can take the word head and change it to head sub bucket: everywhere you see head you change it to head sub bucket, and tail to tail sub bucket. And literally, when I wrote this code, that is exactly what I did. I did it slowly so as not to mess up, and of course the compiler helped me if I forgot something. But that's as simple as it is to transform the put from a list map to a hashmap.

Of course the dump we have to do a little bit differently. We want to show all the buckets, so we show which bucket it is along with the key-value pair, but other than that the hashmap dump is pretty straightforward. Remember I told you that writing a debug tool is essential, so you can change your main code to dump, dump, dump, dump, dump, because you'll mess it up. You will not write this code perfectly, and you need to be able to debug it.

So now let's take a bit of a review of the list map iterator. Recall that the iterator is its own object: we've got the entry, we've got the iterator, and we've got the map itself. The map iterator is allowed to touch all the internal stuff because whoever is writing these things is writing them as a group, so we can think of it as like a protected value. Really the essence of the map iterator is a current. The public part of it is next and del, and the private part is what the current thing is. We don't get to see current, but we can use next to get the current back. So the idea is we call next, next, next, next rather than looking at current. Recall that the way the iterator works is you ask for the iterator, and then you hit the iterator with next, and you go until
you're done, and when you find something you print it out. We can do the same thing in C as well as Python.

If we think about how to do the iterator for a hashmap, it's a little different. We start at the first bucket and we must move through all the buckets eventually. Because we've got, you know, eight or four linked lists, we've got to go through all of them and go down each of them, and if we're looking for the next item we've got to skip empty buckets. So we've got a bit of a complex while operation. One of the things we're going to do when we create the map iter is store a reference to the hashmap. So when we construct the hashmap iter, bucket is an internal value, current is an internal value, and map is an internal value which we use to remember the map that we're an iterator for. And then we have a next and a del.

This is the constructor, and we're saying: make an iterator given a hashmap. We make the iter structure, we remember a pointer to the map in case we need it later, the current bucket we're going to look at is zero, and the current map entry is the head of the zero bucket. Next and del are just encapsulated methods, basically, and then we return it. Now that first bucket may or may not be an empty list. There might be just one bucket with anything in it: the third bucket might have a list in it, and zero, one, and two won't have anything in them. So remember, we're starting at the top bucket, and our current is pointing to the head of the first bucket, which may be null.

The tricky bit is the iter next. Remember, we're given the map iter; we don't get the map, but we've stashed the map in underscore map. If self current equals null, in the old days in a list map we knew we were done and we could just return null. But now we have to go down a bucket. So self bucket, this is the iterator's bucket, goes from
like zero to one, so we increment it. If the current bucket we're looking at in the iterator is greater than or equal to the number of buckets, we have now gone past the last bucket and we return null. Otherwise that must mean we have more buckets, so we say self current equals self map (that's our little stashed version of the map), and since we've already incremented self bucket, we grab the next head. Then we loop up to the top. Now at this point, if that bucket is empty, we do it again. We go through this while loop enough times until we have either exhausted the buckets or found a bucket that has an entry in it.

Then we've got to do a little trick: grab the current. The only way we come out of this loop is if self current is not null, because if self current was null we would have whiled our way through and eventually returned null after we exhausted the buckets in the while loop above. So retval is self current, and if self current is not equal to null we advance to the next, and then we return retval. I've given you the code. Go through it carefully. It's tricky to write. What I would do is print this code out and draw the picture. Okay, draw the picture.

So let's take a look at a hashmap iterator in action. This is what it looks like when we've just been constructed and we're in the first call to next. Current is pointing at the first item in the first linked list, head sub zero. We look at this self current value and it's not equal to null, which is great, which means we have something to return, so we don't have to go through the scanning across null entries. We skip the while loop and we simply set retval to self current, and if it's not equal to null we advance it, and then we return retval. At the end it looks like this: current has been advanced to the next thing we're
going to return on the second call. So the first call returns F=19, that's the retval, and the next thing is going to be H=17. But now we give it back to the calling code and away we go.

So now we come in on the second call, and it's going to do kind of the same thing. Current is pointing to H=17; it's not null, and so we simply set retval to self current and then advance it as long as self current is not equal to null. Because it was pointing to H=17, we advance, but now as we exit this second call current is going to be null. We're going to take care of that on the third call, so don't worry about it. Retval is H=17 and current equals null, and now we're done with the second call.

So now we come into the third call, and in this situation we are pointing at the zero bucket and current is null, so the while loop is going to take over. While self current equals null, which is true right now, we run the code: self bucket plus plus, which is the bucket number in the iterator, and we ask, is the bucket number in the iterator greater than or equal to the number of buckets in the map? If it is, we're done; we're past the last one. But it's not, so we're not going to return null. We say self current equals self map head sub self bucket. So we go down to bucket one now, and we make current point at whatever the head of bucket one is. And that's okay, we found an empty bucket, because remember I said you've got to skip empty buckets. We're still in the while loop. The while goes back up and says self current is still null; I just moved to the next bucket, but self current is still null. So I add one to the bucket and the bucket becomes two. We check to see if we're done: if it's greater than or equal to map buckets, return null, which it's not, and self current is set to self map head sub self bucket. Remember, map is our remembered version of the whole
map, so we can see all the heads, because we've got to work through the heads. So now self current points to B=14, and the while loop goes back up, and now self current is not null because it's pointing to the B=14 item. So it pops out of that while loop, drops down, and says retval equals self current, which was B=14, and then it advances self current, and self current becomes null again. But that's okay, because we return B=14 on the third call. So just to review: we returned F=19 on the first call, H=17 on the second call, and B=14 on the third call.

Now it's going to loop back up and we're going to see the fourth call. The fourth call comes in and current was D=21, so it's pretty simple. We're not going to run the while loop at this point. We return D=21 and we advance current, so current is now pointing to null. Now in the fifth call current is null, and that triggers the while loop. We add one to the bucket, the bucket becomes four, and we say: if this bucket inside this iterator is greater than or equal to the total buckets in the map, return null. And we're now done. The fifth call returns null, and that tells the calling code that we are at the end of the list. So if you keep yourself straight, and you draw pictures like this, and you think it through, this is a surprisingly small amount of code to build a complete iterator for a hashmap.

I kind of mentioned this in passing, but we still have more work to do: a thing called rehashing. It's not that hard, and feel free to try it. At some point, if these linked lists get too long, our performance starts to suffer. So one of the things that hashes do is, in the middle of an insert, they check something we call a load factor. Each bucket would have a length, and we might check all the bucket lengths, and if it got to
be like over 10 or 15 or something, we would have to rehash these things. You don't have to reallocate; you just have to go from four buckets to eight buckets, and then you recalculate the hash modulo eight, figure out which bucket each entry belongs in, and reconstruct all these lists. So it's not impossible to do a rehash, doubling the buckets and reducing the average chain length, but we are not going to do that here because we're going to keep it simple.

So the hashmap iterator, while complex, is surprisingly simple. It's really very similar to the list iterator. The key thing is that we've got to have that while loop that skips: if we're at the end of one list, it's got to get to the beginning of the next list, and it's got to skip empty buckets. Now you can see why the things come out in somewhat random order: the mapping of any key to any bucket is effectively random. And this is why, when you think of Python 2, if you iterate through a dictionary there is no predictable order, but if you do it twice you're going to get the same order. The fact that the order might change if you do inserts or deletes has to do with the rehashing.

So we're at the point where we have built the two foundational types of Python 2.0: we've built a list and we've built a dictionary. Next we're going to move to, like, Python 3.0 and start creating a linked list that maintains sorted order and can be iterated in key order, and so that's going to be our tree.

So up to now we've done all the easy stuff, so it's time to do the linked tree map, which really is kind of a modern, flexible key-value store. This is a nice key-value store that you would want to use if you were a software developer. Our linked tree map is ordered like Python ordered dictionaries. It stays sorted, meaning that not only does
it stay in order, it stays in sorted order. We insert things and they go in in order, like a Java TreeMap. It can be iterated like a C++ map or an ordered dictionary, but not a Java TreeMap. This just boggles my mind, why you can't. Well, we'll talk about why they didn't do it, but it's not that hard. And we're going to have fast lookup. A list map has a lot of nice features, and we can make a sorted list map, and we're going to in this section, but the problem with the list map is that lookup is slow. So we're going to pretty much use the tree part to do fast lookup, and you can see this in section 6.5.2 of the textbook.

So we are going to do something that's pretty common in data structures: the entries are going to simultaneously maintain a sorted linked list through the entries and a binary tree. We're going to look at them separately and then ultimately put them together.

So let's talk about what a tree is. A tree is a structure where the tree map entry has a left and a right. The things to the left are things where the key is lower than the current key. So with H=42, the question is where would A go? Well, A goes to the left. Where would T go? Well, T goes to the right. So the idea of the tree is that whatever entry you're in, there's a key in the entry, and you either go left or right based on the comparison of the key. And instead of having a head and a tail, there is just a root. The root is the top entry of this tree, and then there's a series of left and right choices that you make. Each entry has a left and a right, and the tree map entry has a key-value pair. We're going to keep key and value public because it's just an entry, but now there is no next. This is why we are using abstraction: we're not even going to give them a next. We're going to have a left
and a right, and we're not going to let people see that. We're just going to give them a set of methods to mess with our tree map, and we deal with all this stuff, and we need a left and a right to do it.

So let's take a look at how it works. In this current tree that I've built (trees are not always balanced; I happened to balance it just because it looks better on PowerPoint slides) let's say we're going to insert G=25, so it's got a key of G and a value of 25. You start at the root and you compare, and you say, oh, G is less than H, so we go down the left-hand side. Then you encounter D=8, and you're like, okay, G is greater than D, so it's like driving a car: turn right at intersection D. And so we do, and we work our way down, and now we're looking at intersection F and we've got to go either left or right. G is greater than F, so we go to the right, not the left. That is the path that we took. So to insert G into the tree we find the greatest key less than it, which is F, and you're inserting it, if you think about it, between F and H. The next higher one in the tree is H and the next lower one in the tree is F, and that was the trip we did.

I like to think of this trickling down the tree and making these decisions as almost like a pachinko machine, where you hit the balls and they go ding, ding, ding. It's not random, of course, it's very precise, but you start at the top, you make a bunch of binary left-right decisions, and eventually you find yourself somewhere at the bottom.

And so if we look at tree map put, we are going to start at the root and then we're
going to do a comparison. If we find it, the comparison is going to be zero and we update the value. If we don't find it and it's less than, we go left, and if it's greater than, we go right. This while loop will trickle down the tree going left, going right, and it will either find the key if there's a match (if we're looking for F we'd have found it, stopped, and returned) or, if we're looking for G, we won't find it, but we will find where we're supposed to insert it. G will find its way down to the right of F. So as long as the tree is correctly maintained, you will either find a match or you will find the right place to insert. The tree is not guaranteed to be balanced (there are further algorithms that can make the tree balanced), but the key thing is that the order will be right. By following this algorithm the order will be right, and you will always find the right place to put it or you will find a match. And think about how dictionaries work: you say x sub hello equals something. There's either going to be a hello key in there or not. If there's not, we put it in; if there is, we update it. And that's what this code does.

We create a new tree map, we put H=22, and then we put H=42, which replaces H. Then we do D=8, then B=123, and then F=6. It turns out I'm kind of doing this in order so it doesn't get too long on the page. Then I do a dump. And remember how important a debug tool is. When I first wrote this code, you can darn well bet that I had map dump everywhere: it was map put, map dump, map put, map dump, map put, map dump, so I could see what it does. And so, if
you look at the dump: then we put in K and M and J, and then we dump it again. If you look at the map output, you see that it has H=42, and then it's trying to give you some sense of the treeness of it, meaning B=123, F=6, and D=8. The number of vertical bars tells you how deep in the tree you are. You can see that the immediate child nodes of H in the second dump are D and K, and the child nodes of K are J and M. So you can draw this all up. The idea of my dump code is that I'm trying to draw you a tree.

So here is the dump code. Now this is very, very different. I've talked about recursion in functions, stack frames and stuff like that, but if you go all the way back to Python for Everybody, I delay talking about recursion until there's a real value for it. And it turns out this is a beautiful use of recursion. If you didn't write this recursively you'd probably have to write your own stack, and that would be a bummer. We're going to recursively go down the tree and keep track of the depth, and the depth tells me how many vertical bars to print. So we come in pointed at a particular place in the tree, maybe the root, and depth is going to be zero. If cur is null, we're done. A key thing with recursion is you've got to have a way to get out. So if we get to the end of some subtree, and we go left or right and it's a null, don't print anything out. You're done; you've gone one beyond the leaf of the tree and cur is null, so just return. Then we have a for loop depending on depth that prints vertical bar, space, which spaces it over. And we go down the left tree, and then we come back and go down the right tree. This is what's called a depth-first search, for those computer science
nerds. So we go down the left, and you'll notice that when we go down the left, if the left is not equal to null we dump the tree on the left with depth equal to depth plus one. So if we start with a depth of zero that becomes one, and then if we recursively go down further it becomes two. When that recursion comes back, if cur right is not equal to null we dump the tree on the right. So what you see is: dump the tree on the left recursively, which means go all the way down the left and come back up, then dump the right node, then come back up, go up again, and dump the right node. And so you see the order. This is a depth-first search from H: we go down past D, then even further past B, then we come up from B and go back down from D to F. Coming up from F, we come up from D, then we go across H and down to K. Then we go to the left of K, which is J, go back up, then go to the right of K, which is M, and then we go back up, go back up, and we're done. The calling sequence for this is tree map dump tree with self root and a depth of zero.

I will say this: when I wrote this code the first time I had printfs all over the place. Now, I pretty much know how to do a depth-first search of a tree, but still, sometimes you build your tree incorrectly and while debugging you're like, huh, that doesn't look like what I thought it was going to be. So don't be ashamed if you have to put print statements in all over the place. First get your dump working. Just make sure your dump works, because then you can debug everything else with the dump.

The get is pretty simple. We've got the default, we've got the key, and we've got the tree. We start a while loop where we go down left, right, left, right. So we start at the root and we compare the
key we're searching for to the current key. If they're the same, then return the value. We could do this recursively, but you don't do recursion if you don't need to. If the key we're looking for is less than the key we found, we go down the left; otherwise we go down the right. So you can see this thing is just going to pachinko its way down, tink, tink, tink, tink, to the right spot, and if it gets to null then we return the default. Remember, this is like a dictionary: if the key is not there on a get, we return the default. You can see why Guido van Rossum, in Christmas 1987, created a function called get which looked for a key and took a default value. Because with this code, what would you write? I'm going to return null and then have an if statement? Heck no, just pass in a default. If you get to the bottom of the tree and you haven't found it, return the default. If I want it to be null I can make the default null, and away we go. So this is beautiful. Both the dump and the get are beautiful.

The iterator is a pain. You just can't easily build an iterator for a pure tree if you have nothing more than that tree. A list map can support an ordered iterator, as we saw before. A hashmap can support an unordered iterator. But a tree map cannot support an iterator without building some kind of a stack. The problem is that when we're doing recursion, the concept of current is implicit: it's actually in the call stack. When you're doing the dump you have a call stack of currents, and when you go back up the call stack you get a different current, so you're switching back. So you'd have to make your own stack of currents. You could build an iterator for a tree, but you'd have to build a stack, so a lot of folks sort of
don't want to do that. We could build a stack, but we're not going to. What we're going to do for our iterator is a common technique: when you have a data structure that almost does what you want, and another data structure that does the rest of what you want, you combine them. If you look at this tree, there are a lot of nice things for searching, replacing, and inserting, and it does all that very fast. It's actually log n, because the height of the tree is log base two-ish of the number of items in the tree. So it's super fast. All my trees are small, but if these trees get big they're still super fast: it pachinkos its way down to the bottom really fast. But I can't easily build an iterator.

So what I'm going to do is add to this a linked list, and have the linked list work simultaneously with the tree. Each of these items is going to have a next and a prev and a left and a right, and we're going to almost write the tree code and the linked list code mentally independently. We're ultimately going to combine these things together: there's not a separate linked list and a separate tree. It looks like this in a linked tree map. Think of each one of these things as having a next and a prev and a left and a right, and we maintain them in such a way when we're doing inserts that everything works perfectly. So we're going to simultaneously maintain, with the same entries, a tree and a list, but we're only going to use the list to build the iterator. And we're going to make this a sorted list, because these things are in order. The tree is helping us quickly find the place to put it, but also where to put it in order, so let's take advantage of that. So this is a sorted ordered dictionary in Python lingo. Just remember that these tree map entries are simultaneously participating in a tree and at the same time, in another layer as it
were, participating in a linked list. The tree map structure has a head and a root, because the tree map is both a tree and a linked list at the same time, and the map entry has a next and a left and a right. Again, you almost keep these things separate in your mind. When we're doing tree things we use left and right and root, and when we're doing list things we use head and next, and by now those should be familiar to you. So this is the structure that we're going to build and maintain, and up next we're going to build the put code for this combined two-layer data structure that has a tree and a sorted linked list all at the same time.

So this is an entire lecture on the put method of our tree map, and the essence of this is that we're going to be updating two data structures at the same time. Just before we start: this is not easy. I think it's pretty much impossible to do exactly what I'm asking you to do just using a bunch of Google searches or asking your AI bot. Maybe you can, but you need to really understand what you're doing, and this is where a picture is so valuable. Once you understand it, the code should look very clean and very simple to you. My put code, as I was writing it, was broken, and I tried to fix it and then it was broken again, and I threw it away and wrote it again, and I drew a new picture and I wrote it and it was broken and I threw it away again, and then, poof, it was perfect. So as you're writing data structure code, just accept that it's broken. It's going to be broken; you're going to throw it away. That's the point, because you know that this algorithm is eventually going to work. The algorithm is not the problem; it's your implementation that's the problem.

Okay, so if we take a quick look at the performance of put, and I've mentioned this: the binary search is log
n, while a sorted list search is order n, meaning that if you have a list of a million entries it takes on average 500,000 lookups to find something, whereas for a million-entry tree search you take the log base two of a million and you get about 20. So the key thing is we use the tree any time we're searching and we only use the list when we're iterating, but while we're doing put we've got to maintain both the tree and the list.

So we have to be real careful to keep the cases in mind, and this is where I drew all those pictures. We've got to be able to insert into an empty list, which is easy because root is null and you just put the thing in. Then you've got to find a right gap, a gap to the right of something, and a left gap, and insert at the beginning after you go down a bunch of lefts, and at the end after you go down a bunch of rights. Replacing is the easy part, as we've seen in put: if the key is equal, change the value.

So let's take a look at our data structures; we talked about these before. In the entry we have a left and a right, and we have a next, because we are simultaneously maintaining a sorted linked list and a sorted tree. The tree map has a head and a root. That's pretty much it. In our constructor we set the head to null and the root to null. We're empty: we don't have anything in the tree and we don't have anything in the linked list.

So the empty list is easy. Before we scan to see if the key is in the list, we can just say: if self head is null (it would be the same as saying if self root is null), we just point head and root at the new item. We're inserting H=22. I'm going to insert these in an order such that I don't run out of space on my PowerPoint slides. When we're done with this, root points to H=22, the left points to null, the right is going
to point to null, the next points to null, and head points to the item. So we have a valid linked list and a valid tree at the same time. Again, this is not the whole put code, this is just the first part. There are some dot-dot-dots in there where you're putting all the data in: putting the key in, setting next to null, left to null, etc. But the first one is pretty easy with the head and the root.

So we keep doing that for a while. We put some things in there and we've got our linked list going correctly. You can verify it: run through, and they're all in order. Take your lefts and your rights: all the things that are less than H are to the left of H, and all the things that are to the right of H are greater than H. Away we go.

So we're going to write some code, and what we're really looking for, and this is the tricky part, is to find the item or the gap where the item belongs. The problem is that we have to link these things back up. When we were just doing a tree it was easier, because you would either find the item or find the place to link it. So trees are easy to maintain. The linked list is harder to maintain, because you've got to keep track of the largest item less than the key and the smallest item greater than it, and that's what I call the gap. So as we're walking down the tree, this left and this right keep track of the smallest greater item and the largest lesser item. So we've got cur, we've got left, and we've got right, and you can think of left and right as breadcrumbs. When we turn to the left we remember the right; when we turn to the right we remember the left. You'll see it in action.

So here we go. We've got this tree, and G=29 is what we're going to insert.
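The breadcrumb idea can be sketched in C roughly as follows. This is a minimal sketch under assumptions, not the course's actual code: the names `TreeMapEntry`, `TreeMap`, `treemap_put`, and `copy_key` are hypothetical, values are plain ints, and only `next` is kept on the list layer. Using the sign of the last comparison to decide which breadcrumb becomes the tree parent is also my own shortcut.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One entry lives in two structures at once (names are hypothetical). */
struct TreeMapEntry {
    char *key;
    int value;
    struct TreeMapEntry *left, *right;  /* tree layer */
    struct TreeMapEntry *next;          /* sorted-list layer */
};

struct TreeMap {
    struct TreeMapEntry *head;  /* entry with the smallest key */
    struct TreeMapEntry *root;  /* top of the tree */
};

/* Private copy of the key so the map owns its own storage. */
static char *copy_key(const char *key) {
    char *k = malloc(strlen(key) + 1);
    assert(k != NULL);
    strcpy(k, key);
    return k;
}

void treemap_put(struct TreeMap *self, const char *key, int value) {
    struct TreeMapEntry *cur = self->root;
    struct TreeMapEntry *left = NULL;   /* largest key less than `key` so far */
    struct TreeMapEntry *right = NULL;  /* smallest key greater than `key` so far */
    int cmp = 0;

    /* Trickle down the tree, dropping a breadcrumb at every turn. */
    while (cur != NULL) {
        cmp = strcmp(key, cur->key);
        if (cmp == 0) {          /* already there: the easy case */
            cur->value = value;
            return;
        }
        if (cmp < 0) {           /* turning left: remember the right */
            right = cur;
            cur = cur->left;
        } else {                 /* turning right: remember the left */
            left = cur;
            cur = cur->right;
        }
    }

    struct TreeMapEntry *entry = malloc(sizeof(*entry));
    assert(entry != NULL);
    entry->key = copy_key(key);
    entry->value = value;
    entry->left = entry->right = NULL;

    /* List layer: splice the new entry into the gap between left and right. */
    entry->next = right;
    if (left != NULL) left->next = entry;
    else self->head = entry;     /* no smaller key: new entry is the new head */

    /* Tree layer: the parent is the node where we made the last turn. */
    if (self->root == NULL) self->root = entry;
    else if (cmp < 0) right->left = entry;
    else left->right = entry;
}
```

Starting from a `struct TreeMap` with both pointers set to NULL, a sequence of `treemap_put` calls should leave the list in key order from `head` while the tree hangs off `root`. No balancing is attempted, and nothing here frees memory.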
So then what we do is compare it to H=42, and we say, oh, that's a turn to the left. So now we know, at least at this point, that the smallest greater key is H, and so we point right at H as we move down the tree. You see that you do the strcmp, and you see what it says: if cmp is less than zero we turn left, and right points at cur, where we were. The next time, when we turn right, which is what we're going to do next, it remembers left. We're now comparing G to D, and G is greater than D, so we turn right and update left. At this point in our search D is the largest key less than G and H is the smallest key greater than G. So you see how left and right are like breadcrumbs as we're pachinkoing our way down this tree. Then we compare G to F, and G is greater than F, so we take a right turn, and whenever we take a right turn we update left. So now we have actually found the place that G belongs. And if you look, left and right are perfect for the linked list, because now we know that left's next is going to point to G and G's next is going to point to H. So left's next won't point at H anymore; it's going to point at G, and G's new next is going to point at H. Because we've got left and we've got right, we just link it in, and then we insert it into the tree and away we go.

Left and right were just temporary variables that we had during this tree map put code, but if you look at this carefully, we have a correctly formed linked list that's sorted in order and we have a correctly formed tree. We used the tree to get to F fast, and we used left and right so that once we got to the right place, which was to the right of F, we could just hook it into that linked list with no additional cost. See how pretty it is?
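The payoff of keeping the list layer correct is that iteration never has to touch the tree. Here is a minimal sketch of an iterator over that sorted list, assuming hypothetical names (`TreeMapIter`, `treemap_iter`, `treemap_iter_next`) and showing only the list-layer fields of the entry:

```c
#include <assert.h>
#include <stddef.h>

/* Only the list layer is shown; the left/right tree pointers are
   omitted here because iteration does not use them. */
struct TreeMapEntry {
    const char *key;
    int value;
    struct TreeMapEntry *next;
};

/* Hypothetical iterator: its only state is a private current pointer. */
struct TreeMapIter {
    struct TreeMapEntry *current;
};

/* Start at head, which is the entry with the smallest key. */
struct TreeMapIter treemap_iter(struct TreeMapEntry *head) {
    struct TreeMapIter iter = { head };
    return iter;
}

/* Return the current entry and advance; NULL means we are at the end. */
struct TreeMapEntry *treemap_iter_next(struct TreeMapIter *self) {
    assert(self != NULL);
    struct TreeMapEntry *retval = self->current;
    if (retval != NULL) self->current = retval->next;
    return retval;
}
```

Because put keeps the list sorted, calling `treemap_iter_next` repeatedly until it returns NULL should visit the entries in key order, with no stack and no recursion.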
Now let's take a look at some of the other cases. Let's take a look at inserting I. It's going to go right when it sees H, because it's greater; it's going to go left when it sees K, because it's less (I mean, I'm inserting I, not J); and it's going to go left when it sees J. When it's done, left is H and right is J. I probably shouldn't even call these left and right; I should call them the largest value less than and the smallest value greater than. That would probably be a more mnemonic name. I'm thinking of it like a bracket: you've got a gap, and what's on your immediate left and what's on your immediate right? So now we know exactly where this belongs and we know how to update both the linked list and the tree. We update the linked list using left and right, and then update the tree using cur. So away we go; we've got ourselves in that gap, and we can insert to the left or insert to the right. Now remember, remember, remember: if our key was J, we'd have found it, and all we'd have done is update the value. My brain does this a lot when I'm looking at this code. I'm like, but will I find it, and will I find the right one? What if it's already there? Well, already there is the easy part.

So let's take a look at inserting A=17 and how this works. Remember the use cases: you've got to do the beginning, the end, the left gap, the right gap, and the empty list, and then pretty much we'll have it. A=17 is going to end up all the way down, so we compare it and we turn left, to the left, to the left (I think there's a song and a dance about that). We remember right, which is the smallest key greater than, and right is not going to stay H=42, because we're going to
compare A and D, and then we're going to go left again, and now right is going to follow us. Then we compare A and B and go left again, and right is going to follow us. Now the interesting thing is that left is null. We have found the place: we're going to insert to the left of B, but we're also going to insert through the head, because left being null means we just found something lower than anything in the linked list. So we hook A in before B, right after head, and then we hook it into the tree to the left of B.

Going with a key larger than anything else that's in there, we're going to end up with it at the end of the linked list. We compare X to H and go right, and remember left, the largest value less than, for now. Then we look at K; we go right again, and left is updated to be K, the largest value less than. Then we compare X to M, and X is still greater than M, so we go right again and left is going to be M=67. You'll note that the thing we detect here is that right equals zero, which means we have no value greater than X in this list. So we just say the X entry's next is null, and the next of the entry before it points to the X entry, and then we hook it in on the right side of M=67, and then we are done.

So, what if we're going to do a replacement? Remember, I told you this was the easy one. Again, in my brain when I was writing this code I'm like, but what if it's already there? Calm down, that's the easy one. So we're comparing F to H: it's to the left. We keep track of left and right, but we're not going to use them, because we're going to find it. Then we have a match, and we just see it and go, oh fine, F is 16, we're done. Life is simple: if they're equal, we found it. We don't have to keep looking; we just change it. And again, think of how fast
this works when the key is already there: there's nothing to allocate and no links to make. You just change the value and you're done. So it's important to test all of these cases, and I just went through every single one of them and showed you what they're supposed to look like. I will tell you that you will make mistakes, and I will tell you that asking Google for help, as long as you read what it says, will inform you. But I doubt that Google is going to just give you the whole code for something as intricate as this: a simultaneous sorted linked list and a tree at the same time. Maybe it can, because that's what you're really doing. So up next we are going to go back to the beginning, back to Python for Everybody, to wrap things up.

[Music]

Well, it's been quite a journey. We have built a whole object-oriented pattern in C, reviewed all of object-oriented programming, and implemented a number of different Python objects in C as a way to understand how C++ works, how Java works, how Python works, how they all work under the covers. So we come to the end of this walk through all these amazing data structures, and I hope you've had fun. One of the things I like to do at the end is go back to the beginning. Some of you have been with me from the very beginning; Python for Everybody may be the first programming class that you ever took. I want to finish by reviewing the very first program that I ever showed you in Python for Everybody. It is from chapter one, I love this example, and it counts the most common word in a file. In Python, we read a file name, we create a dictionary, we read all the lines, and we split each one (I think we don't do conversion to lowercase). Then we go through all the words, and for each word we're going to set the count to counts.
get(word, 0). Remember, when you first saw that, zero is the default; then we add one, and that's how, when we see a word for the first time, we bootstrap its count. Then we have a max loop: we iterate with items, looking at each word and count pair, and we do a simple max loop. When it's all done, we print out the most common word and the number of times that word was seen.

So fast forward, here we go. By now you have built a tree map, hopefully, so what we're going to do is use your tree map code to implement this count. We have a tree map and call the constructor for it; that's our dictionary. We have the tree map entry that we need in order to go through the iterator, and we have a tree map iterator. Because we don't have strings, we create 100-item char arrays, name and word. Yes, it's dangerous; we're just not going to be too mean to our code and blow it up, but we could. Then there are variables like i and j, count, max value, and a char pointer max key. That's all of our setup.

Then we read the file name using scanf. Now we're in C, it's not Python anymore, but you can see the similarity. Then we do an fopen of the file for reading, and again you see the similarity. We read through the file with fscanf, with a file pointer and a %s, which gives us a word. And word there is passed by reference because, remember, word is an array. As long as we don't get an end of file, we write a for loop to go through word and call tolower, which is in ctype.h, and then we carefully put a new line at the end of word, and then we get the current count with map get: we pass map, which is like self, word is the key, and zero is the default. And then we're
going to do a map put into the word position with count plus one. Then we fclose the file and dump the map. Then we write a max loop: max key equals null and max value equals negative one. It's a count, so I guess we can assume that negative one works here, because there are only positive integers in our dictionary/tree map. We ask for an iterator, we create an infinite loop, and we ask for the next item from the iterator. If it's null, we're done. If max key is null or the cur value is greater than or equal to max value, that is, the one we're looking at beats our current max, we retain max key and max value. When we're done, we give back the iterator, we print out the max key and the max value, and then we delete the map.

And so that's the "miles to go before I sleep." It's been a long time, but the end is really the beginning. These are the most basic data structures. These are the classic data structures. These are the data structures from chapter six of Kernighan and Ritchie, the data structures that people have been learning about for 40-plus years. I hope you have taken the time to get really good at these data structures, because they're like the omelet of cooking: they're easy, and it seems like everyone knows how to do them, but until you know how to do the easy stuff, you can't understand the large, fancy stuff in a recipe. You need foundational notions, and then you can create something amazing. If you have done all the work in this course, and you've done it well, your journey can continue with many great cookbooks. The one I'm showing you now is what we called CLR because of the authors; when I used it back in grad school there were only three authors, not four. This is a thick book, a very thick book, and what you're going to find is that it's very well written. And if you know everything in this
course, you should be able to open this book up to Warshall's algorithm and write an implementation of Warshall's algorithm, because you know how to allocate things, you know how to create structures with pointers in them, and you know how to deallocate them. If you learned every lesson in this course, you can open it almost anywhere, alpha-beta pruning, all kinds of things; you can just open it up, read four or five pages, look at how they describe the algorithm, and then implement it. So I'm not going to teach you every one of the algorithms in this book. What I've taught you is what an algorithm is and what the foundational pieces of all algorithms are. So I wish you luck, and I encourage you to keep going on your journey. Your journey is not ending. It's beginning.

[Music]

Hello, and welcome to the lecture in C Programming for Everybody that I call the epilogue, because this lecture happened after the course was completely finished. I have a saying in my life: when you think you are finished with a journey, often that's when you finally know where the journey actually begins. That applies to C Programming for Everybody, which for me was a four-and-a-half-year project to create the book, create the autograders, create the lectures, get it up on Coursera, get it out on the internet, and so on. As I was going through the class, at some point I ran into chapter 6 of Kernighan and Ritchie and I'm like, what are some good examples that I can use that will be relevant to students who perhaps have taken Python for Everybody? And I'm like, well, why don't we just implement some Python classes? We'll see how complex they get, and if it works out well, it brings in the concept of interfaces and implementations. I think it worked out really, really well. So as we were going through chapter six of Kernighan and Ritchie, I built us a Python
string class (you can go back and watch those lectures) as an extendable character array with some chunking: it would allocate some space and then fill that space up, and as that space filled up it would extend it. I made a list class, and I used the linked list from Kernighan and Ritchie chapter 6. If you recall, I made an extra little bonus section, 6.5.1, where I talked about linked lists explicitly, because in the original 1978 Kernighan and Ritchie book, and I believe it's the same in the '84 ... but I'm like, I'm going to show you linked lists first, and so I added this little piece to the Kernighan and Ritchie canon. And I implemented the Python dictionary using the technique of Kernighan and Ritchie 6.6, pretty much straight on.

When I built the Python string class, you'll notice that there's a structure that has a length and an alloc. The alloc is how many characters we've allocated, and the length is how many of those we've used, and we automatically put a zero byte at the end. As you add things together, eventually you get the length to be nine, with a zero byte, and the alloc is 10, which means we can't add the letter D. So we use realloc to extend it to 20, and then we have space to put both the letter D and the end of string.

We built a Python list class, and it was so natural to just make it a linked list. The linked list has two structs. One is the list itself, which has a head pointer, a tail pointer, and a number of items that lets us give back len when we need to. Then there is the node: we're just going to have a list of character strings, or character arrays, so each node has a pointer to a character array and a pointer to the next node. When we do our constructor, we allocate the object, set the head and the tail to null to indicate that the list is currently empty, set the count to zero, and we're done. And then as we add things to the list, we have
these pointers: head points to the beginning of the list and tail to the last item that we added. We store the strings using malloc, so that we get a pointer to a string that the list owns rather than the parameter, which doesn't belong to us, and then we hook the next up. There's a little tricky stuff. If the list is empty, which means head is null, then we just point head at the newly allocated node. If the tail's not null, we take the last thing and point it to the one we just made, and set tail to new. Then we allocate the string, copy the text from the parameter into text, store that as a pointer, and update our count. If we add another one, you have to graft the new one in beyond the tail: the tail's next, instead of being null, now points to the one we just created, then we update tail to point to that, and the next on that one is zero, because that's our way of ending the list. If we're going to iterate through this list, we start at the head and look at the item, then go to next and look at that item, then go to next and look at that item, and go to next and it's null and we're done. Again, we were able to build a quite competent Python list object from that.

As you might expect, when we switch to building the dictionary class, we just go into section 6.6 and make a hash table with buckets. The bucket-based hash map is probably the most common programming interview question; perhaps it's less common now, because everyone knows that it's a programming interview question. So I just figured, of course the dictionary is going to be a set of buckets that are a set of pointers to lists. Recall that a hash is some function that takes the key and creates a large pseudo-random number, which means it's deterministic but widely distributed, with the idea of limiting collisions. So John Smith and Joe Smith hopefully
will hash differently even though they're very close. The way we did this, again following Kernighan and Ritchie, is we took the key and computed a hash, which is generally a large integer, and then took it modulo the number of buckets. So the buckets are, in a sense, four linked lists. If we wanted to, we could have written this code using the list object and been a little less repetitive, but we just implemented the whole thing. So if you look, we start with a struct for the KR dict, which has a number of buckets, four heads and four tails, and a count. It's just heads and tails, the way you do linked lists. And if you look at the node, it has a key and a value, and I'm going to make it a string key and an integer value, again to simplify. The char star key is a pointer to a key that we're going to save.

If we look at the new operation, we allocate the actual dictionary object and decide how many buckets we're going to have; the way I define the dict struct, it's just a four-element array. Then we set them all to null so that we know they're empty, because it's important to know whether the heads and tails of each of the linked lists are empty. One thing: this doesn't have any expansion mechanism. I just wanted to keep it really simple to show you the data structure, so I punted on rehashing and expansion. Then we set the count to zero and return it. As we're inserting things, we use the hash to figure out the bucket, and then we simply have four linked lists. If you were to compare the KR list code to the KR dict code, you would see that a lot of it looks the same, except we're starting with a head that has been chosen by a hash computation along with a modulo based on the number of buckets.

As I finished all that up, and finished the class up, I began to really wonder. I started looking at it less from a C and Kernighan and Ritchie perspective and
more from a Python perspective, and I'm like, did I just inadvertently do exactly what Guido van Rossum did? Did Guido van Rossum read this book, like most of us did in the '70s and '80s, and did he just say, you know what, I'm going to make a list object and it's going to be a linked list, and I'm going to make a hash object and it's going to be a set of buckets? Linked lists and buckets, like everybody would do. So I decided I would ask Guido if I could come out to see him and talk to him about the influence that Kernighan and Ritchie chapter 6 had on his design of Python. In particular, it was my guess, because I didn't look at the Python code to do this; I was really teaching Kernighan and Ritchie chapter 6. So up next we have the first of two interviews with Guido, asking how he built his structures and whether or not my structures, which I had just guessed and assumed, were even close. We started the interview with me handing him a copy of the Kernighan and Ritchie C book, which was signed, and I left it with him, saying: skim through chapter 6 and tell me how chapter six affected how you built the original version of Python.

[Music]

Where in Python 0.01 did you start building the objects? Did the objects come first and then the syntax, or did you build a syntax and then the objects?

I think in my head I had both, because I was building a stripped-down version of ABC. I was very familiar with how ABC implemented its data structures, and I had pretty well-developed ideas about how I would do it instead of the ABC way, both for the syntax and for the data structures. For the syntax, actually, my main gripe about ABC was that it used uppercase letters for the keywords of the language. They had a reason for that, but I didn't think it was a good reason, and it just looked horrible to a Unix hacker like myself. So that's what I wanted to change for the syntax. But I knew that I wanted to do the indentation, and I had already
participated in the parser for ABC, so I knew how to do that stuff. I had some of my own ideas, but I knew what I wanted. I literally started with a lexer and a parser; those were actually the first bits of the language that I wrote. But before I started, I had very specific ideas on how the primitive data types would be implemented. I would use the same reference count mechanism that I knew well from ABC. I would implement integers in a similar way, because I wanted...

That would be an easy choice, not to put in the object...

Yeah, no, I wanted everything to be an object. That was also a thing I approved of about ABC. And I think I have to take it back about the arbitrary precision integers: those came quickly, but I don't think they came immediately. There was an integer type, which was 32 bits; there was a separate long type, which survived until the end of Python 2 and was arbitrary precision; and there was a float type. The numeric types were not all that different from ABC, or that interesting. But for the rest, the more complex data structures, I had seen what ABC did, which was that everything was implemented as a tree, even strings, and I did not like that, because I wanted to interface with system calls and C libraries. I said, I want strings to be arbitrary length, but I want them to be a linear buffer, and so too bad if a long string requires allocating a large buffer at once. Most strings aren't that long. I'll make sure that it works for any size, but I'll optimize for the short strings that are the bread and butter of so many programs I imagined would be written in Python.

That is a brilliant choice, but it's not automatic or intuitive that that would be the right answer.

Having written a lot of C code, and knowing that I
wanted Python to be extensible with C. That was also one of the very early choices. I wanted to link back to C code in a natural way, so the import system was part of that.

So when Python was a month old, or maybe two months old, if you were appending to a string in a loop, was it basically extending, reallocating, and copying?

No, strings were always immutable. So, sure, yes, it was calculating the size of the result, allocating a new string object, and then copying the two originals into that. There is a string resize internal operation that is intended to be used only when you're building up a string before you've shown it to anyone else. I needed that because I was envisioning an I/O system where you say, oh, I'm going to read a line and I don't know how long that line is going to be, or maybe I'm going to slurp an entire file into a single string and I don't know how long that file is. So I allocate a large enough buffer, I read into that buffer, and then if it turns out that I allocated 1,000 bytes but what I read was only 15 bytes, I realloc it to give back the remaining 985, or whatever, 900 bytes.

We're talking about your thinking before you released the very first version of Python. Meaning, at some point you came back from vacation and you handed it to somebody at work; this is your thinking when there's only one person, before even the 0.01.

Oh yeah, yeah. I wanted strings to be done that way, including the little detail that if you have a string of, say, 10 bytes, you allocate 11 bytes and you put a null byte at the end, just so that if you happen to want to pass that string to a C library function that expects zero-terminated strings, null-byte-terminated strings, you wouldn't have to copy it. There might be a null byte in the middle, so things might still go wrong, but if you knew or trusted that that wasn't the case, you wouldn't have to make a
copy with one extra byte just to make sure that that null byte was there. The null byte is part of the data structure, only visible on the C side, of course.

So for lists I had a similar idea. Again, lists in ABC were a tree structure that was super efficient even if you grew a very large list from small ones, and I thought the tree structure was way too complicated. So I said, okay, a list is a mutable data structure. That was a concept that didn't really exist in ABC; in ABC everything was immutable. I thought, well, pragmatically speaking, I prefer my larger data structures, meaning lists and dictionaries, to be mutable. And so the list implementation was always just a pointer to a buffer that could be reallocated. We call it list in Python; it really is an array, just an array of pointers, and each pointer points to an object. We know how long that array is; that's in the object header. So if there's no room, we reallocate it, and if we throw something away from the middle, then we shift everything over and we also reallocate. The only improvement that happened to that data structure in the last, well, let's say 34 years, is that the original implementation did not have over-allocation. I was relying on realloc doing some kind of chunking. So if you realloc something from 1,000 bytes to 1,004 bytes, I imagined, well, internally realloc probably aligns everything in chunks of 16 bytes or more, and so it's not going to move that memory. Eventually that was shown to be either false or just inefficient.

It would do it as well as you would have done it, but eventually it didn't.

It didn't, yeah. And so eventually, internally, there are two sizes that are held in the list object header. One tells you what the length of the list is to the Python user, and the other one tells you how much space there is in the array, and the second is always larger than the first.
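To make the two-sizes idea concrete, here is a minimal C sketch of a list kept as an over-allocated array of pointers. It is an illustration under stated assumptions, not CPython's code and not the course's sample code: the names pylist, pylist_new, pylist_append, pylist_get, pylist_remove_at and pylist_del are hypothetical, the items are plain strings rather than Python objects, the starting capacity of two and the doubling rule are assumptions chosen for simplicity, and this sketch never shrinks the buffer after a removal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: one struct, two sizes, one array of pointers. */
struct pylist {
    int length;    /* how many items the user of the list sees */
    int alloc;     /* how many slots the items array has room for */
    char **items;  /* array of pointers to strings owned by the list */
};

struct pylist *pylist_new(void) {
    struct pylist *p = malloc(sizeof(*p));
    assert(p != NULL);
    p->length = 0;
    p->alloc = 2;  /* assumption: small starting capacity */
    p->items = malloc(p->alloc * sizeof(char *));
    assert(p->items != NULL);
    return p;
}

void pylist_append(struct pylist *self, const char *str) {
    if (self->length >= self->alloc) {
        /* No room: over-allocate so the next few appends are cheap.
           Doubling is an assumption made for this sketch. */
        int newalloc = self->alloc * 2;
        char **bigger = realloc(self->items, newalloc * sizeof(char *));
        assert(bigger != NULL);
        self->items = bigger;
        self->alloc = newalloc;
    }
    char *copy = malloc(strlen(str) + 1);  /* the list owns its own copy */
    assert(copy != NULL);
    strcpy(copy, str);
    self->items[self->length++] = copy;
}

/* Lookup by position is a single array index; NULL if out of range. */
const char *pylist_get(const struct pylist *self, int index) {
    if (index < 0 || index >= self->length) return NULL;
    return self->items[index];
}

/* Remove the item at index and shift everything after it down by one.
   Returns 1 on success and 0 if index is out of range. */
int pylist_remove_at(struct pylist *self, int index) {
    if (index < 0 || index >= self->length) return 0;
    free(self->items[index]);
    memmove(&self->items[index], &self->items[index + 1],
            (self->length - index - 1) * sizeof(char *));
    self->length--;
    return 1;
}

void pylist_del(struct pylist *self) {
    for (int i = 0; i < self->length; i++) free(self->items[i]);
    free(self->items);
    free(self);
}
```

In this sketch alloc is never smaller than length, and the two are equal only at the moment the array is exactly full. Appending a third item to a fresh list is what triggers the first realloc, taking alloc from 2 to 4.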
And I'm shocked that it wasn't a linked list.

Oh, really?

Yeah, I'm shocked.

Oh, I'm sorry I named it list.

But okay, yeah, I get what you're doing. So then talk through, as you built the earliest dictionary structure, what's different about that?

So again, in ABC dictionaries were trees, and in the case of dictionaries they were kept in sorted order by the key. The key was always some orderable object; well, I think in ABC everything was comparable, at least two things of the same type. And again I thought that was too complex. I had skimmed at least Knuth volume 3, which explains the concept of hash tables, and I was familiar with hash tables in Perl, where I think they're called hashes. So I leafed through the table of contents of Knuth volume 3, and I picked a hashing algorithm and a hash table organization that felt right. I chose open hashing instead of having separate linked lists for buckets. And the original hash algorithm, for strings at least, was something ... I don't know if I picked the hash function out of Knuth also, but I probably did.

Between Python 3.7 and Python 3.8, dictionaries kept their order. What happened? Was it the revenge of ABC, meaning the trees?

So it's a different kind of order. In ABC the keys were sorted. If you have numeric keys, say the keys 1, 12, and 500 in your dictionary, in ABC the keys are ordered 1, 12, 500, or 1, 2, 3, or whatever, and if you insert 11 it gets inserted between 1 and 12. On the other hand, in the newer Python dictionaries that preserve order, it is insertion order. Python dictionaries don't require that the key type is sortable or comparable; it only needs to be hashable. And of course you need to have an equality comparison (is this string equal to that string?), but you
never need to look at whether this string is less than that string.

So what did you do to make it keep insertion order?

So that was at a time when I had long relinquished or delegated development of most of the basic data types. I think we had a developer in Japan who for years had been improving the efficiency of the dictionary type. One of the problems of the original design, with open hashing, that I picked from Knuth is that it's pretty space inefficient. Let's see if I can reconstruct it. For each key-value pair you have to have a pointer to the key; let's be old-fashioned and say that's four bytes. You have a pointer to the value, that's another four bytes. Then you have the hash, which is another four bytes. So now the hash table is an array of structs that are each 12 bytes long. And for the lookup, insertion, and deletion algorithm to work at all, you can't have the table be more than two-thirds full. That means that if you have an array of a thousand entries, you can store at most six or seven hundred key-value pairs, and so you have three or four hundred times 24 bytes of wasted space. So our Japanese core dev figured out a way to have separate tables, where the hash table only contained one thing, and the actual key-value pairs and hashes were kept in a table that had no holes in it.

So they were basically growing, filling, remembering where things are laid out.

And so first, I think he refined the algorithm a few times, having these separate arrays, and then he stumbled upon the property that, oh hey, it happens to preserve insertion order.

In the second table, for sure.

Right, exactly, in the second table, because the table in which you jump around based on the hash value now just has an index into the other table. And there is an additional space saving, because if your hash
table has less than 256 elements, that array only needs to have one byte for the index. So there's all kinds of cleverness there.

It comes as a surprise to me that you don't do linked lists.

Really? I could have told you 10 years ago that it doesn't do linked lists.

I guess that probably has to do with your desire to interoperate with C, which kind of percolates throughout: blocks of things that can be extended and then filled in seem to be better than generic nodes.

What I didn't know at the time is that that's also a good architecture for modern hardware, because you have better cache locality.

Exactly, I would have thought...

Which is not a concept that I think I even knew existed in '89. So I think I just avoided linked lists because I didn't like them for some other reason.

That's cool. It's not exactly what I hoped you'd say. I hoped you'd say linked list over and over and over again, because I have just assumed all my life it was linked lists and hash maps, and linked lists on top of hash maps, linked list, linked list, linked list, because computer science thinks about linked lists all the time.

There are plenty of pointers in Python, yeah, but the classic linked list is not used much.

[Music]

So I hope you watched that interview carefully. One of the things that happens when I edit the interviews that I have with luminaries is that it's not uncommon that the questions I ask are not perfect questions, and at that point I'm like, whoops, my assumptions were incorrect. So one of the things I did in the editing of the interview that you just watched, and you'll see it in the next interview, is that I didn't cut out all of my confusion. The reason is that I wanted you to see the moments where I had an assumption that turned out to be wrong, and then I'm mentally scrambling to ask a good question, and
I'm asking for clarifications. So during that video you can see me learning from Guido. The summary of this is that Guido really didn't use the linked list much at all. He didn't use a linked list for the list object, didn't use a linked list for the string object, and didn't use a linked list for the dictionary object. Surprise, surprise: I was completely wrong. The Python 1.0 list and dictionary objects were extendable arrays of pointers and not linked lists at all. While Guido was an expert in K&R chapter 6, like most of us were, his much more recent work was in ABC and C++, so he really wasn't looking at K&R for his data structure implementation. More importantly, he was looking at ABC, or more specifically, he was looking at ABC and saying, I don't like the way ABC did its data structure implementation. But he didn't then go back to chapter 6 of Kernighan and Ritchie and say, well, I'll just do it this way. Again, I'm a computer scientist and my instinct is that chapter six of Kernighan and Ritchie is just the ground truth; why wouldn't you do that?

So I think he started with lists as simple extendable arrays, which makes a lot of sense, because you're either looking things up linearly, which is not the fastest way, or you're looking them up by position, like sub five, so why not use an array? Once you talk to him and he walks you through it, you're like, oh yeah, I get it. But then the other thing is he didn't even use linked lists in dictionaries, and you can see me as I'm asking that question, incredulously saying, please tell me that you use linked lists in dictionaries, and buckets, like all the interview questions for the last 35 years. And the answer is no. He looked at an earlier document, one that was the truth of algorithms for all of us in the '70s, and that is Donald Knuth, volumes 1, 2, and 3. And right here is Donald Knuth volume 3, the one that I scanned
to get what's in there. And here, where are we, yeah: collision resolution by open addressing. So what he was doing was what good computer scientists of the day did, and that is read through this kind of book and find inspiration for how to build a hash map, because he knew he wanted to do hashing. We did hashing in K&R chapter 6, and he wanted to do hashing, but he did it a very different way, and it has to do with the collision resolution and the linear probing. So again, there were no real linked lists in the core data structures. It turned out, and we talk about this a little bit in the video, that there are performance advantages to not using linked lists. The interesting thing is that when Guido was actually building Python, we were all using computers that didn't depend heavily on cached memory architectures, meaning that the CPUs and the memory we were using were much better matched in speed. That's because all of it was in refrigerator-sized computers, and things were just slow enough that the CPU was not that much faster than the memory. But when the CPU became a single chip, in the late 1980s and early 1990s, when even fast floating point ended up on a single very large, very hot chip, the memory could never keep up. What happened inside the chip, which is maybe three quarters of an inch to an inch, or more likely half an inch, was so fast that the memory just couldn't keep up. And so they put caches inside the CPUs that could keep up with the CPUs. But linked lists caused this bounce, bounce, bouncing through memory that blew the cache. So if you were to try to run a pure linked-list-based operation with, say, a 10,000-long list, it would perform terribly on a 1992, '93, '94 computer. But Guido wrote this thing in, like, 1989 and '90, so he wasn't thinking, I've got to make a
cache-efficient data structure. He's just like, I like arrays. But they turn out to be really good for cache architectures. So to some degree, if you were to go back and look at it and say, well, let's go back and add linked lists, you'd say no, because linked lists would have a really bad performance impact if we did them the way I did them when I was teaching you chapter 6 of Kernighan and Ritchie. And so that's why you see the delight. I mean, I'm wrong when I'm talking to Guido, I'm wrong the whole time, but I'm learning, and I'm like, oh, that's so cool.

So let's do a little bit of a review. This sample code is available to you, so let's take a look at my re-implementation of a Python 1.0 list, not the Kernighan and Ritchie way but the Guido way. I'll do some code walkthroughs and you can look at those later, but if you really spend some time and compare the linked list implementation to the extendable array implementation, you'll realize it's simpler. For one thing, we only have one structure: it's the list. We have, again, how many allocated spaces are in the list, much like the string that we did. I did the string pretty close to how Guido did the string, but the list I got wrong. So you have an alloc and a length, which is very much like the string that I did, and then an array of pointers. That's what the char ** says: the first star says it's an array, and the second star says what it's an array of, pointers to items. So that is an array of pointers to characters. If you look at what we do, we allocate the thing, we set alloc to two and length to zero, and then we allocate a two-item array of pointers. We know that length is zero, so we know that none of them are used. So that's the data structure, and in a sense it's already simpler than a linked list. And if you append, first you've got to see if you've got space to
append, right? And if you've got space to append, well, that's okay: you just allocate the new string, you copy the parameter into that string, and wherever the end is, and length tells you where the end is, you put it in that position and add one to the length. So at the end of the first one you've got a half-full list, with the zeroth item pointing at the character array that you just saved. Then if you put the second one in, you do the same thing, and we don't have to do anything else right now. Now the length is 2 and alloc is 2; we've got a completely full array, because our list has two items in it. But then the next time you come in, self->length is greater than or equal to self->alloc, so we're just going to extend it. I just added two entries for simplicity, and then I do a realloc. What realloc does is that the things that were in it before get copied if we get a new pointer back. Sometimes you get the same pointer back with a little more space allowed at the end; sometimes you get a new pointer. You can't tell with realloc. Computer scientists like myself who were trained on linked lists really tried to avoid realloc, and maybe that was a good idea, and maybe that was because realloc wasn't such a great implementation. Guido and I talk a little bit about how realloc let Guido down as Python progressed and became more and more significant, so he tended to start doing his own memory management and not depending on realloc, which is a combination of the C runtime and potentially the operating system. But the key to realloc is that if you've got two items in there, you might get a new pointer back, and that's why I've got to reassign it. It'll copy the ones that are there, but you're responsible for setting up the ones that are new. Now for us, because length is all we need, we don't even have to set two and three to zero. So we just now have four spaces, and we malloc again and we
copy the saved thing in, and we put it in at the end. Now we have space, and then we add one to the length. I have a code walkthrough that goes into this in more detail, but let's just take a quick look at the shapes of these two approaches. Again, I just assumed linked lists, and I'll tell you that I apologized to you: I'm like, well, some of these for loops in linked lists are not the greatest thing, blah blah blah. And if you just look, you can kind of see how the Python version in the lower right, which just has an array of pointers, is simpler than the linked list on the left. We computer scientists have always taken pride in the fact that we understand linked lists, but just because it's something we know how to use doesn't mean that it's the right thing in all situations, and Guido chose to go elsewhere. And then if we look at the code in the upper left, we're dynamically extending the pointers. There was no reallocation in my krlist_append, because I didn't need it: it would always alloc a new node. So there are two mallocs in the Kernighan and Ritchie one, you malloc the node and then you malloc the string, whereas in the Guido one you just malloc the string and every once in a while you realloc the items. The part of linked lists that always gets me, and I just have to draw a picture every time I do it, is that part in the middle of krlist_append: if self->head == NULL, self->head = new; if self->tail != NULL, self->tail->next = new; and then self->tail = new. I get it every time, right? But those don't roll off the tongue nearly as easily as saying self->items[self->length] = saved; self->length++. For all of the years since 1972 we've used linked lists, in some ways almost as a badge of honor, and Guido felt no real urge to do that. And inadvertently his approach of extendable arrays is great for caching, and it's great for fast lookup, because you can never cheaply look up a linked list by sub 27.
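Here is a minimal sketch of the extendable-array list from the review above, written under some assumptions: the names p1list, p1list_new, and p1list_append and the field names are reconstructed from how the code is read aloud, the growth step of two slots follows the slide version, and the error handling is added for illustration. It is not the course's actual sample file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: names are reconstructed from the spoken walkthrough. */
struct p1list {
    int alloc;     /* number of pointer slots available in items */
    int length;    /* number of slots in use (what len() reports) */
    char **items;  /* array of pointers to saved strings */
};

/* Start with room for two items and none in use. */
struct p1list *p1list_new(void)
{
    struct p1list *p = malloc(sizeof(*p));
    if (p == NULL) return NULL;
    p->alloc = 2;
    p->length = 0;
    p->items = malloc(p->alloc * sizeof(char *));
    if (p->items == NULL) { free(p); return NULL; }
    return p;
}

/* Returns 0 on success, -1 if memory ran out (the list is left unchanged). */
int p1list_append(struct p1list *self, const char *str)
{
    /* Extend the pointer array only when it is full. */
    if (self->length >= self->alloc) {
        int new_alloc = self->alloc + 2;
        char **grown = realloc(self->items, new_alloc * sizeof(char *));
        if (grown == NULL) return -1;
        self->items = grown;   /* realloc may or may not move the block */
        self->alloc = new_alloc;
    }

    /* One malloc per append: the saved copy of the string. */
    char *saved = malloc(strlen(str) + 1);
    if (saved == NULL) return -1;
    strcpy(saved, str);

    self->items[self->length] = saved;
    self->length++;
    return 0;
}
```

Appending three strings to a fresh list triggers exactly one realloc, on the third append, and lookup by position is just self->items[i].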
Guido's way makes sub 27 a very cheap operation. So up next, we're going to dive into what Guido did as he implemented the Python 1.0 dictionary. [Music]

Hello and welcome to a code walkthrough for C Programming for Everybody. The code we're walking through is some of the epilogue code, where we're comparing what I did in my chapter 6 Kernighan and Ritchie material to what Guido tells us was the Python 1, and then later the Python 3.7, approach to dictionaries, lists, and strings. What I'm going to go through in this one is the string, so let's take a look at that code. This is basically the string code. The pattern that I'm using here is a chunked array of characters. If you look at it, the string has some data, but what we're adding for this particular one is something that Guido was very obsessed with in the early Python version, again from ABC. The idea is that we use this thing called reference counting, and it means that if you assign something, you don't always have to copy all the data; you can copy a pointer instead. But then you have to be careful with that reference count, because the del operation has to know when the reference count goes to zero. So this pretty much looks like the code that I wrote based on the K&R book, with some reference counting added. The easiest thing to do is look at the main code here. We create a new string, we dump it, we add an H character, we dump it, we append "ello world" as a string, and we dump it, and then we're going to set it to a new value. But then here's the new part, right here: we're going to create this assignment. We have a variable called x, which is a pointer to a p1str, and we have a variable called y, which is also a pointer to a p1str. So this is p1str_assign, and we're passing in a pointer. What's going to happen here is this is
going to increment the reference count, and you're going to see this, because now we're going to have two variables, x and y, that are literally pointing to the same string. So let's run this code. Okay, I've got it run here. All of the top bit here is on the string x, but the interesting part is where we say string x equals "A completely new string", and we're printing out the location in memory where that is. Then after the assignment statement we see string y equals "A completely new string", and it's at the same location, but what we've done is increment the reference count. Then if you look at the main code, we del x, which was the original one, and all we do is decrement the reference count; we don't actually deallocate the data. We still have the string y; we shouldn't use the string x. But then what happens when we delete y at the very end here, p1str_del(y)? Then it actually frees the data. So the idea is we can copy a reference without copying all the data, have x pointing to it and y pointing to it with a reference count of two, and then we can free either x or y, and instead of throwing away the data, that decrements the reference count. So let's take a bit of a look. Most of this is the same as what we covered. If we look at the constructor, p1str_new, we allocate the object, and then we allocate a buffer of 10 bytes, and we tell it that it's 10 long, and we put an end-of-string terminator in there, and we set the reference count to one. So as soon as we create it, we assume that this new string is going to be assigned into a variable, and so we make the reference count be one. If you look at the len and the dump, et cetera, and you look at the append, we see the append is pretty much a clone
of what I did, where if we don't have enough space we allocate another block of 10. Guido calls this chunking in the video. Then we reallocate it, and we've got 10 more, so we can stick our character onto the end of it, add one to the length, and then null-terminate the string. So that code is identical to what I did for the Kernighan and Ritchie book. So let's look at the assign code. This is the interesting thing: in p1str_assign we have one pointer, and we're going to return this pointer, so we have two variables pointing to the same block of dynamically allocated memory. When we're doing this assignment statement, in effect y = x, inside the object we don't need to worry too much about y or x, but we do need to know that we are now referenced in two places. So every time we add a second or third reference, we just add one to the reference count, self->refs++, and then we return it. Then if we look at the code in the main program, where we're saying struct p1str *y = p1str_assign(x), we could have said y = x, but we wanted to record the fact that we've added a reference, so that we know it has a reference count of two and we don't inadvertently free the wrong thing. The only other place that this gets interesting is in the del method. What's cool about this is that in our main code we just delete y with the _del method, we delete x, we can do all that stuff, and it's inside the object where these reference counts are being resolved. So we're saying, okay, we're going to del x, which was the original thing that we assigned it into, and if the references are greater than one, we don't actually free any data; we just decrement the reference count and we're done. That's where, in the output, we see "decrementing reference", and you see all these addresses are the same, 0x60 blah
blah blah 91c0. Okay, so they're being decremented. The first del decrements it, and that goes from 2 to 1 in this case, because the _assign incremented it and then the _del decremented it. But then when we get to a ref count of one, that means we're in effect freeing the last reference, so it prints out "freeing reference", and you can see we're actually freeing the data. That's where we do the free of self->data, and then we free self to get rid of it, which is the code we did before. So the real essence of this code is the obsession with reference counting, and that has to do with the fact that you want to be able to point to the same string from multiple places without wasting extra memory just to make a bunch of copies for no real purpose. So when you're making a copy that points to the original, you have to increment the reference count and later decrement it. In the rest of this sample code I will not add reference counting, because we're just going to look at the underlying data structures. But it's really important to understand that reference counting was essential to the ABC implementation, and Guido's C++ implementation, and the Python 1.0 implementation. It was all about reference counting to save very scarce memory, so that you could point to the same string many times, and the reference counts could get very high, especially for strings that were constants. So you can take a look at this code and compare it to the K&R code that I built. Reference counting is an important part of Python. [Music]

Welcome to another code walkthrough for C Programming for Everybody. In this code walkthrough we are going to compare how Guido implemented list in the earliest versions of Python versus how I implemented list while teaching chapter 6 of Kernighan and Ritchie. And so this
was, if you watch the video of my interview with Guido van Rossum, my greatest revelation, like, what? The big revelation is that Python's list object in, you know, Python 0.01 was an array of pointers. If we look at the Kernighan and Ritchie list, this was a linked list; I did a linked list. The actual krlist struct has a pointer to the head and the tail of its linked list, and a count. Again, this is the classic linked list, and to some degree, while I was teaching this in Kernighan and Ritchie chapter 6, I was really teaching you linked lists and using the Python list abstraction to teach it. So we have two data structures: we have this node, which just has a pointer to the saved text and a pointer to the next one. I'm not going to go through that again; just go back and watch the chapter 6 material, where I talk about linked lists all the time. But that's not how Python did it or does it. It's not clear to me exactly why, but I think he was just trying to build the simplest possible data structure, and we'll look at some code and you'll see that there is a certain simplicity. Already in just the struct definition, we see the struct p1list, and there's only one of them; with a linked list you have a struct for the node and a struct for the list itself. In the constructor, p1list_new, we're just going to malloc the object, and then we're going to allocate an array of pointers to characters. If we look at the struct p1list and we see char **items, that is syntax for an array of pointers. Think of each pointer as either 32 bits in the old days or 64 bits in the modern days. So what I'm doing in this p->items = malloc(p->alloc * sizeof(char *)) is allocating two elements that are pointers, which means 2 times 64 bits in the modern world, and
noting that I have two in there. And length, which is the Python view of the number of items, is zero. So we've got space for two and we have zero in use. alloc tells us how long the array is, and length tells us how much of the array we've used. I'll try not to compare and contrast too much, but just think about the complexity of linked lists the way I did them in my Kernighan and Ritchie chapter 6: you have this thing called head and this thing called tail, which are null, and count is zero. Again, for those of us who know linked lists this is obvious, it's what you do, but an array is simpler than a linked list. So there we go. That's what we've got when we're done with our constructor: an array of two pointers to characters, alloc of two, and length zero. So let's take a look at the main code right now. The key to this main code is that, because this is an interface and an abstraction, the main code should be pretty much the same as the Kernighan and Ritchie main code, and it pretty much is: we create a list, we append some stuff to it, we print the list, we check the length, we look something up, and then we delete it. We can do the same things with both because, below the abstraction, below the interface, both the krlist and the p1list are supposed to provide us, the caller, the same abstraction. We can append, we can print, we can check the length, we can check the index, and we can delete it, and it does not matter what the implementation is. So the builders of the Python runtime get to decide how to do this, because we've got a contract with them, again, an interface. So let's just take a look at the code. We're going to add "Hello world" to our list and print it, then we're going to add "Catch phrase" and print it, then we're going to add "Brian" and print it, and then
we're going to ask how big it is, and then we're going to ask where Brian is in there and where Bob is in there, and then we're going to delete it. If you look at the run, the list starts out as Hello world, then the list is Hello world, Catch phrase. Then we see this "extending" message, because we started with two slots in our array, and for the first two we didn't have to get bigger, but then we're running out of space, so we've got to extend this array. We'll show that code in a bit. Then we end up with three things in the list; again, that's not our job as the caller. Brian is in position two: hello world is zero, catch phrase is one, and Brian is two. And Bob is not there, so we get back negative one. So let's take a look at the append code, because this is where the fun happens. First let's go back to how I taught you append a month or so ago; the lecture has pictures of all this. If you're appending and the list is empty, self->head is new; if self->tail is not equal to null, then self->tail->next = new; and then self->tail = new. You've got to draw the picture and add the little arrows and away you go, and then you alloc and save the string itself. But now we look at the p1list, and we have to extend it if necessary: if self->length is greater than or equal to self->alloc. We allocated two in the constructor, and if the length is two we don't have enough space, because our next one would be sub 2, and you can only have sub 0 and sub 1 in a two-long array. So all we're going to do then is chunking, and Guido mentioned chunking in the video: we're going to chunk it by adding 10, so we're going to extend from 2 to 12. We're going to increase the alloc size, and then
we're going to call realloc, and realloc is going to take the array of items and, however big that was, free it, move it, extend it, whatever, depending on how realloc is working. We say we want to have 12 of these things now, and that becomes our new items. There were two things in it and we extended to 12; realloc will copy the two things, so we don't have to do any copying. realloc knows how big the old items block was, two pointers long, so it copies the first two things and then gives us 10 more. So there's no copy code here. I think Guido really liked realloc, and a lot of C programmers don't. He's like, look, realloc says it's going to do this, and I want to do this, so realloc, do your job. I think back to my own time as a software developer: because we were taught linked list, linked list, linked list, I just didn't think about realloc as a useful thing, and Guido clearly felt that realloc is the answer, and it lets him have this simple array mentality. So you just reallocate and say, look, here's an array that's 2, I want it to be 12, help me, and we're done. That's really simple code, I think, very easy to understand. Then we make a saved string, and we just set the end of the array, self->items[self->length], which in this case is sub 2, to that string, and then we add one to the length. So this code is really simple, and if you were doing debug prints you don't really need any addresses. If you recall, in all my linked list code, so that you can debug it and redraw all your lines and figure everything out, I'm printing addresses out all the time. But here it's just a position: the zero, the one, and now the two in this case. So when it says "extending from 2 to 12", that's a side effect of adding the third item to a list that was pre-allocated with two slots. Okay, and that's
it. But then let's take a look at the print code for both of them. This is krlist_print; now let's take a look at the print code in Guido van Rossum's version. The key to this is the for loop in p1list_print. The for loop is for (i = 0; i < self->length; i++). That is really basic chapter 4, chapter 5 stuff in Kernighan and Ritchie. It's just an array, so you write a simple incrementing for loop. It's fast, cache-efficient, and beautifully simple. When I showed you the same thing in Kernighan and Ritchie chapter 6, the for loop in krlist_print said for (cur = self->head; cur != NULL; cur = cur->next), and I apologized for this line. I said you will eventually write this, because it's an idiom; you will write this quite naturally and it'll make a lot of sense to you. But in Python 1 we didn't do that. It was an array, and the only place that we have to worry about its dynamic nature is in the append, where we reallocate it. Everything else we're doing here is a simple for loop. Even the del basically says, let's free all those character strings with a for loop, for (i = 0; i < self->length; i++). Again, a beginning C programmer can understand this code. If we look at the code in krlist_del, we see a while loop, and remember you had to do the frees in a certain order, and I talked about all of that. Well, this is pretty simple. In p1list_del you free each of the character strings that we point to, then we free the array that's got those pointers, which are now invalid because we got rid of them.
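To make the "everything is a simple for loop" point concrete, here is a hedged sketch of print, index, and delete on the array-backed list. The struct layout and the names p1list_print, p1list_index, and p1list_del are assumptions reconstructed from the spoken walkthrough, and the output format is made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout, repeated here so the sketch stands alone. */
struct p1list {
    int alloc;     /* slots available */
    int length;    /* slots in use */
    char **items;  /* array of pointers to saved strings */
};

/* Walk positions 0..length-1; no node pointers to chase. */
void p1list_print(const struct p1list *self)
{
    printf("[");
    for (int i = 0; i < self->length; i++) {
        if (i > 0) printf(", ");
        printf("'%s'", self->items[i]);
    }
    printf("]\n");
}

/* Position of the first matching string, or -1 if it is not in the list. */
int p1list_index(const struct p1list *self, const char *str)
{
    for (int i = 0; i < self->length; i++) {
        if (strcmp(self->items[i], str) == 0) return i;
    }
    return -1;
}

/* Free in three steps: each saved string, the pointer array, the object. */
void p1list_del(struct p1list *self)
{
    for (int i = 0; i < self->length; i++) {
        free(self->items[i]);
    }
    free(self->items);
    free(self);
}
```

The linked-list version needs a cursor and a careful ordering of frees inside the loop; here the only ordering rule is strings first, then the array, then the object.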
I free the object itself last. And so to some degree one can appreciate the simplicity of what Guido did by going with arrays. Again, the key thing that misled me, or where Guido just took a different approach, really came down to realloc. I was trained not to think about realloc as plan A, so I thought linked lists were plan A, because then you don't have to do so many reallocs. And Guido's like, I want an array, and realloc says it's going to do this for me, and away we go. So I encourage you to take a look at p1list and krlist, put them in two windows next to each other, and compare and contrast. What I really want you to think about as you compare is the complexity of writing, debugging, and then later understanding, and how much knowledge a programmer has to have to be able to make sense of these two bits of code. For those of us computer scientists for whom linked lists are very natural, we just write this stuff, and I can write it pretty fast, but that doesn't mean that it's the easiest to learn. So away we go. I hope you found this comparison interesting. [Music] Cheers.

So now I want to talk to you about the Python 1.0 dictionary as built by Guido back in 1989 to 1991. This sample code is available under /code; it's the epilogue code, and it's p1dict.c. The key thing is that instead of reading chapter 6 of the K&R C programming book, Guido van Rossum was reading page 518 of a much earlier document, which is more about pure data structures and algorithms. So this was kind of like our bible on how to write good, fast code, and this was our bible on how to write sophisticated algorithms. Guido found this, and he decided he didn't want to make linked lists, partly because of his experience in ABC. So this is open addressing using an array; it's an array-based hash concept. In the bucket style, there's an array of
hash linked lists, whereas here everything is actually stored in the array rather than the array holding pointers to things that are outside it. The key in open addressing is how you probe and find open slots when your initial hash leads to a collision. We try to make hashes not collide, but they can collide. So basically it uses a circular iteration, and if you look there at L3, it's subtracting one, and if i is less than zero, set i to i + M, then go back to step L2. It's probably just easier to show you a picture of what's going on. So let's imagine that we've got an array of eight key-value pairs, and this is literally an array. In our case these will become pointers: key will be a pointer and value will be a pointer. But Knuth is not thinking about that; as far as Knuth is concerned, everything is just a variable. So it's an array of key-value pairs, and it's the same hash computation and the same modulo operation that looks at the number of buckets. But when it picks a slot in the array, that slot is just where we hope to store it. If that's already occupied, we've got to find another place. We presuppose that there's always space, and we'll talk about how that ends up getting solved later with rehashing. The key is that we've got to figure out where we want to put it, assuming that there's space, and the linear probing algorithm is that you start going backwards. So you go from three to two; is that available? Great, use it. If two is not available, go to one, go to zero, and if you get to zero you've got to go all the way around to the end. This is kind of a circular list; eventually it's going to visit all eight entries, but it's going to start at whatever entry the hash indicated. The whole purpose of the hash is to get to the entry that has the key and value in question more rapidly, or to know that it's not there. The way you know that it's not there is you run this loop, and when you find a key and a
value of zero, which means it's empty, then you know that it's not there, and you also know where you're supposed to put it. So if we look at the data structures that I wrote to implement this Python 1 dictionary, we have a dnode, which is just a pointer to a key and a pointer to a value, because we're going to do string key-value pairs to simplify this. The dictionary itself has the size of the array, the alloc, which we've been using all along; the count or length, I switch back and forth in some of this code; and then an array of dnodes. Now remember that that's a struct. So we look at the constructor: we allocate the dictionary, we set its length, which is the number of things in it, to zero, and we set the alloc, which is the size of the space we can store things in, to two. Then we allocate our two-item array of struct dnodes, and then we have to mark them. We can't just assume they're zeros when we get the memory back from malloc; some mallocs give us zeros and some don't. In this case we need to ensure, no matter what, that we have them marked as null, because null is our indication of emptiness, so that later when we're looking around we can find which slots are empty. If we take a look at the put, the hash tells us where to look, and we put this all in p1dict_find, which does the hash computation, does the modulo of the number of buckets, and gives us back a pointer to the dnode in question. This pointer is either where that key belongs and already exists, or where that key belongs and it's empty. If we found an empty slot, we don't have to allocate the dnode, because it's already in the array. We just have to set up the key with a malloc and a copy, set up the value with a malloc and a copy, and then record the fact that we now have one key-value pair in our array. We're going to use this length not only to return the len but also to know when we've filled it up, because when the length gets to be the size of alloc, it's full, and we'll talk more about that in a second. So that's what it looks like.
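A minimal sketch of that dictionary, under stated assumptions: the names dnode, p1dict, p1dict_new, p1dict_find, and p1dict_put are reconstructed from the spoken walkthrough; hash_key and save_string are hypothetical helpers, since the actual hash function is not spelled out here; and this version simply reports failure when the table is full instead of growing it. The backward step with wraparound follows the linear probing described above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: a slot is empty when its key is NULL. */
struct dnode {
    char *key;
    char *value;
};

struct p1dict {
    int alloc;            /* number of slots in items */
    int length;           /* number of key-value pairs stored */
    struct dnode *items;  /* the slots live in the array itself */
};

/* Hypothetical hash; any reasonable string hash would do here. */
static unsigned int hash_key(const char *s)
{
    unsigned int h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h;
}

/* Hypothetical helper: malloc a copy of a string. */
static char *save_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL) strcpy(copy, s);
    return copy;
}

struct p1dict *p1dict_new(void)
{
    struct p1dict *p = malloc(sizeof(*p));
    if (p == NULL) return NULL;
    p->length = 0;
    p->alloc = 2;
    p->items = malloc(p->alloc * sizeof(struct dnode));
    if (p->items == NULL) { free(p); return NULL; }
    /* malloc does not promise zeroed memory, so mark every slot empty. */
    for (int i = 0; i < p->alloc; i++) {
        p->items[i].key = NULL;
        p->items[i].value = NULL;
    }
    return p;
}

/* Returns the slot holding key, or the empty slot where key belongs,
   or NULL when every slot is occupied by some other key. */
struct dnode *p1dict_find(struct p1dict *self, const char *key)
{
    int i = (int)(hash_key(key) % (unsigned int)self->alloc);
    for (int probes = 0; probes < self->alloc; probes++) {
        struct dnode *slot = &self->items[i];
        if (slot->key == NULL) return slot;
        if (strcmp(slot->key, key) == 0) return slot;
        i--;                            /* probe backwards ... */
        if (i < 0) i += self->alloc;    /* ... and wrap around to the end */
    }
    return NULL;
}

/* Returns 0 on success, -1 if the table is full or memory ran out. */
int p1dict_put(struct p1dict *self, const char *key, const char *value)
{
    struct dnode *slot = p1dict_find(self, key);
    if (slot == NULL) return -1;        /* full: growing is not shown here */

    char *new_value = save_string(value);
    if (new_value == NULL) return -1;

    if (slot->key != NULL) {            /* key already present: replace value */
        free(slot->value);
        slot->value = new_value;
        return 0;
    }

    char *new_key = save_string(key);   /* empty slot: fill it in place */
    if (new_key == NULL) { free(new_value); return -1; }
    slot->key = new_key;
    slot->value = new_value;
    self->length++;
    return 0;
}
```

Bounding the probe loop by alloc is a choice made for this sketch so that a full table returns NULL rather than looping forever; the walkthrough instead presupposes that a free slot always exists.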
After we've inserted one key-value pair, now let's say we're getting a put request for a key that's already in there. So we're going to say z = "W" instead of "Catch phrase". We run p1dict_find and it comes back and says, here's your slot, but the difference is that the one it gave us back already has a key. Now you might think that's bad news. No, it's great news: it means z already has a slot, and all we have to do is update the value, because dictionaries function like assignment statements. If you have z that has "Catch phrase" and then we get z = "W", you're supposed to store "W" and throw "Catch phrase" away. And you see it do that: it frees the old value, which was "Catch phrase", then mallocs the new one and strcpy's into it. At the end of this one we still have a length of one, and z maps to "W" in our hash map. So now let's do another put; let's say y = "B". We're going to call p1dict_find, and it's going to hash the key y and give us back position one. Now that could be because it hashed to position zero and linear probing found its way to position one, or maybe it hashed to position one; it just doesn't matter. p1dict_find says, look, this is the best possible place to store y in this particular array. So at the end of that we will have z and y, and we have an alloc of two and a length of two. But now let's say we want to insert c. The problem is that we're full, which means that p1dict_find is not going to find a slot: whatever it hashes to, it's going to look through all the rest of them, but there's no space in the array. So if the key is not there and we're out of space, we have to expand items. That is the code we call rehashing, so let's take a look at the reallocation code. We'll look at this in some detail. At the high level, we store the size and the array of items in old_alloc and old_items respectively. Then we
double the size; that's just how we do it. Then we allocate a brand new array of dnodes, in this case four dnodes, and that's in items. Then we do some code that pretty much looks like it came from the constructor: we take those four dnodes and set their key and value to NULL. At this point we're kind of halfway through reorganizing this thing. Our old items are available to us, but the new items are empty, which means that we can use find and insert into the new array. That's what we're going to do before we throw old_items away. We just write a simple for loop to go through them. Now we have to check whether old_items[i] is empty, because when we're done with this we're not going to wait until it's 100% full; we're actually going to reallocate when it's 70% full. If the key in old_items[i] is NULL, that means the slot is empty and we don't have to reinsert it. That's all we're saying. But if we find one, then we ask p1dict_find where to put it, and in this case we're always going to get a new, empty slot, because keys are unique, which means we can go through all the old item keys and never hit the same one twice. And there will always be space, because we just allocated an array that's twice as big as the one it's coming from. There were two, and we've allocated four. We will always find a place to put it because the keys are unique; think about it for just a minute. Then we just copy the key and the value. We don't have to reallocate them or anything; they're just pointers to the saved strings that are the key and the value. The only thing we throw away is old_items, and so it's kind of pretty. Now, the last thing, where we say old equals p1dict_find of self and key: we used old before to figure out that, oh, wait a sec, we need to make some more space. So we then have to find where the incoming key goes, because we're in the
middle of an insert right now, so we have to know where in the new items that key belongs, and that's why we say that. So if we take a look at the moment where we're inserting the C key, we're full up: we've got two allocated and we're asking, where should we put C? And the answer is, you can't. So now we drop into the reallocation code. The first thing the reallocation code does is make a copy of old_items and old_alloc, so the two items Z and Y are there. Then we double the size, make an empty array of four dnodes, and set their key and value to NULL. So we have the old array and the new array sitting around right now. Then we start looping through the old array, and we see, oh, Z equals W. We just run a simple hash calculation to say where Z belongs in the new items, and then we put it in there. Now we don't have to deallocate or reallocate the actual Z or the actual W; we just have to change the pointers. You'll notice at this moment we've got a pointer in the top one and the bottom one pointing to the same allocated memory. That's a reference count problem, but we're going to throw old_items away in just a bit. So we loop through: we find the Z and we put the Z in the right spot. Remember, we're still in the middle of trying to insert C; we've temporarily paused and are cleaning up our hash map to give us space to insert. Then we go through the loop again, and now we find that Y belongs in position three, so Y equals B goes into position three. At this point we're still trying to insert C, but we first had to make space. Now we're done with old_items. We've got pointers in the new items that point to the key-value pairs, and we can just throw the old array away. Throwing that away not only frees the array itself, it also resolves the problem we had of two pointers pointing to
the same thing, which potentially leads us to things like memory leaks. But now we've cleaned our mess up: we've freed the old items, and we're still in the middle of inserting C. This is one of the downsides of hashing: this cleanup phase can take a little while. I just moved everything around so it looks like what we had before. Now that we've got this thing rehashed, and we've got our Z equals W and Y equals B in an array that is four long rather than two long, we are ready to continue with the insertion of C. So we say, okay, where does C belong? We run the hash again with p1dict_find of the C key, and it says that belongs in two. Again, it might be in two because that slot is empty, or it might be in two because something else was not empty and we did collision resolution, but it doesn't matter. When it comes back from find, remember, the array is bigger, so there's always going to be space, and we don't have to worry too much about there not being space. Now we've got an empty slot, and we can put the key and the value in and update the length, which becomes three. So that's how Guido did it in Python 1.0. We already talked with Guido about how things changed between Python 3.6 and Python 3.7. The same implementation was in use from Python 0.1 through Python 3.6, roughly the same shape I just covered, but things changed in Python 3.7.

So welcome to another code walkthrough for C Programming for Everybody. This is another in our epilogue code walkthroughs, where we're comparing and contrasting the way I taught dictionaries, lists, and strings in Kernighan and Ritchie chapter 6 with how Guido van Rossum actually implemented dictionaries, lists, and strings, and we're going to look at dictionaries here. I've got two tabs open in my text editor. One is p1dict.c, which is the Python 1 implementation, my approximation and simplification of Guido's approach, and then I
have krdict, which is the version that I wrote, cleaned up and adapted for this, as we covered it in Kernighan and Ritchie chapter 6. If we look at the structures at the top, we have a dnode, a dictionary node, and the dictionary node in my implementation has a next pointer, a key, and a value. We're going to have string keys and integer values, just to keep our mallocs down. But the difference with the dictionary node in my code is that every one is going to be part of a linked list, because if you recall, in my code, as it's described in Kernighan and Ritchie, the dictionary is an array of linked lists. In this case I just have four heads and four tails that point to the head and tail of each linked list, and that means that every dnode has to be either the beginning, middle, or end of a linked list. So the dnodes have to have a star next in them. Let's compare and contrast that with how Guido did it. Guido still has a dnode, because we're in dictionaries and you need a key and a value. In my krdict I made the values be integers just to simplify it, but I'm going to make the values be strings in my Python code to be a little closer to what Guido did. But then the change happens when we start allocating the actual dictionary object itself. If we look at what I allocated, I have a number of buckets and a count for my struct krdict, but then I have an array of four heads and an array of four tails, which is basically a way to make four linked lists that I select among based on the hash function. But that's not what Guido did. In p1dict we've got an alloc and a length, but then we also just have an array, seriously, an array of dnodes that we'll call items. Now the fun thing is, if you look at this, let me go ahead and show you this a little bit
differently: if I look at the p1list and compare and contrast, a p1list has an alloc, a length, and a pointer to an array of pointers to strings. In my p1dict it's an array of key-value pairs, but the alloc, the length, and then an array of something is a very similar approach. You'll see when we get to the Python 3 stuff that there's almost a duality in Guido's mind between the dictionary and the list. The dictionary is like a slightly improved list, having to do with indexing. Again, it was a surprise to me, but when we get to the Python 3.7 version of the dictionary, the similarity is going to be like, oh, I see what's going on here. So just for now, remember the approximate duality between dictionaries and lists in Python 1.0. Okay, so again we have our dnodes, which have a pointer to a key and a pointer to a value, each a pointer to a dynamically allocated saved string, and then we have a dictionary, which holds a one-dimensional array of those dnodes. The key thing here is that we need to know which entries in this array of dnodes are empty and available, and which ones are used. So the constructor is a little more complex. If we look at p1dict_new, of course we allocate the object itself, we set the length to zero because there's nothing in it, and we allocate two slots, just like we did in the p1list. That forces us to reallocate early, so that we don't have to write too much code to cause reallocation and we can debug our reallocation. Then we basically create an array of dnodes, so we malloc two times the size of struct dnode, and the size of struct dnode is two 64-bit pointers. Then we're going to mark them as empty. We need to know that these items are empty, and so we're going to set the key, which is a pointer,
and the value for each of them to NULL. So you've created a two-long array of dnodes with keys and values of NULL, and again, we need to remember which ones are empty and which are not. So those are the data structures and the constructor. Now let's look at the main code. Oh man, I thought I deleted that line. Yeah, let's delete this line, because we've added some stuff in the print that makes it a little simpler. Okay, let's hope it still runs; that would be cool. Let's run it. I changed the code, and yay, it works. Okay, it's simpler; I like simple. So we create a dictionary by calling p1dict_new and we print it; it's going to be empty. Then we put the string catchphrase under the key Z and print it. Then we put W in under Z, which should be a replacement. Then we put in Sakai equals B, basically, Sally equals C, and then A equals D. Then we ask how many things we have in there, and then we do a get, like a .get on a Python dictionary. We're looking for the key Z and we're looking for the key X. Well, the key X isn't going to be in there. And then we delete it. So let's go ahead and run this code, which we just did. I've added stuff to what this print does, so you see the first print is just an open curly brace and a close curly brace, empty, and it's also printing out the length and alloc. Basically what it's saying is there is a length of zero and we have two spaces allocated. Then we put Z equals catchphrase in, and that ends up in position zero in the array, and we have a length of one, because we've got one thing in there, and an alloc of two. Then, if you recall, we replace with Z equals W. Because Z hashes to the same spot (we'll talk about how that happens in a second), that hashes to position zero in the array, and then it just replaces
it. It just replaced the value in that case, so we still have only one item in there and an alloc of two. Then what we're doing is inserting a new one, Sakai equals B. I should probably put some more print statements in there. Let's do that: let's go to the underscore put and put a print statement in here, printf("Insert %s=%s\n", key, value). It'll just be easier for us to debug this. Okay, so we insert Z equals catchphrase, that goes in hashed slot zero, and we have one item and an alloc of two. We insert Z equals... no, I want to call that put, because it's not insert. So then we put Z equals W, and it uses the hashing and the lookup and all that stuff to find that Z is in position zero, so we simply replace the value. We didn't extend it at all. Now it's trying to put Sakai equals B in there, and the hashing of Sakai finds its way to position one in our two-item array. When we're done, our two-item array happens to be in insertion order, but that doesn't mean anything; it's just that my hash function is terrible. So we have two items in there and two items allocated. Now we're trying to insert Sally equals C, and the hashing algorithm, which I'll show you in a bit, looks through and says, wait a sec, there's no space here. Usually the algorithm says that if the array is above 70% full, we declare it to have no space, and so we do what's called rehashing. We reallocate and then we re-add the items. We're making space, so it doubles in size. When it's all said and done, Sakai ends up in position one, Z ends up in position two, Sally ends up in position three, and we have three items in a four-long array. Now here's an interesting thing you will see when we go through the rehashing: in the two-long array Z was in position zero, but in the
four-long array Z is in position two, because this is rehashing. That's because whatever the hash value for Z was, modulo 2 it's zero but modulo 4 it's two. It was never going to end up in position one, because 2 and 4 are multiples of each other, but you see in this case, Sakai, Z, Sally, that the position of Z moved. So again, if you're writing an iterator that's going through this and you just inserted an item and it reallocated, the order changes. That's true of Python 1 dictionaries and literally any hash-based dictionary that's truly using hashing (we'll see how it worked in Python 3 in a second): when we reallocate, stuff gets shuffled around, so the order is different. So we made space for Sally, and then we put A equals D in. The thing we're triggering on is whether it's more than 70% full. When you have one item and you've got two slots, that's 50% full, and that's why Sakai equals B did not trigger reallocation. But if we have three items in four slots, that's above 70%, and so we're going to reallocate. Even though we could have snuck it into that last spot, that's not good for hashing, so we say, you know what, it's time to make this bigger again. So for A equals D we make space for A and double the size of the array to eight. Let's see: we've got Sakai equals B in position one, Z equals W in position two, Sally equals C in position three, and A equals D in position four, and we're half full at this point, so we're done. Now we have a length of four, and we say, hey, let's look up Z, and yes, we get W. Let's look up X, and the answer is there is no X. So our code is working; our little unit test is doing dictionary things. Let's take a look at put. Okay, let's take a look at print first, because print is easier and it teaches us a little bit about the data
structure. Okay, come back. So in p1dict_print, the whole little first equals one business is just there so that we put the commas out. But the interesting thing is the for loop: for i equals zero, i less than self->alloc. If we go look at the p1list, its loop is i equals zero, i less than self->length, and that's because in the list we just append starting at the beginning of the array: sub zero, sub one, sub two, sub three. That's how that works. But because we're using hashing, our array is sparse, in that it starts out empty and we start using slots, but we don't use every slot in order. So we've got to go through all the slots to iterate through a dictionary array. Remember, if self->items[i].key is equal to NULL, we continue, which means skip empty slots. This array could have a hundred slots in it with only one used, and where that item is inserted, which we're going to see in a second with put, depends on the hash function, because we're not inserting items linearly; we're inserting them based on the hash function, using open hashing. So we have to skip the empty items, but after that it's okay, we just print them. We're iterating through this array, skipping empty entries and printing out the entries that exist, every time we do an underscore print, so we're seeing them in the order that they got spread out by the hashing function. You can see that it prints the key, it prints the value, and then it prints i, the position, and that leads to very pretty output. Then we print the length and the alloc, and this is great for debugging. So that reviews that items is a sparse array, with NULLs being our way of marking emptiness. And again, if you go back to the underscore new, we allocated it and set everything to NULL.
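As a minimal sketch of the data structures, constructor, and print loop described so far, the sparse array with NULL-marked empty slots might look like this in C. This is not the course's actual p1dict source: the exact field names, the output format, and the fact that the print function returns the number of entries it printed are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One slot in the table: pointers to a saved key string and value string.
   A NULL key marks the slot as empty. */
struct dnode {
    char *key;
    char *value;
};

struct p1dict {
    int alloc;            /* number of slots in items */
    int length;           /* number of slots currently in use */
    struct dnode *items;  /* array of alloc slots */
};

/* Constructor: start with two slots so reallocation is easy to trigger. */
struct p1dict *p1dict_new(void) {
    struct p1dict *self = malloc(sizeof(*self));
    assert(self != NULL);
    self->length = 0;
    self->alloc = 2;
    self->items = malloc(self->alloc * sizeof(struct dnode));
    assert(self->items != NULL);
    /* malloc does not promise zeroed memory, so mark every slot empty. */
    for (int i = 0; i < self->alloc; i++) {
        self->items[i].key = NULL;
        self->items[i].value = NULL;
    }
    return self;
}

/* Walk all alloc slots (not just length), skipping the empty ones.
   Returns how many entries were printed (an assumption, to make the
   sketch easy to check). */
int p1dict_print(const struct p1dict *self) {
    int printed = 0;
    printf("{");
    for (int i = 0; i < self->alloc; i++) {
        if (self->items[i].key == NULL) continue;  /* skip empty slots */
        if (printed > 0) printf(", ");
        printf("'%s': '%s' [%d]",
               self->items[i].key, self->items[i].value, i);
        printed++;
    }
    printf("} length=%d alloc=%d\n", self->length, self->alloc);
    return printed;
}
```

Calling `p1dict_print` on a freshly constructed dictionary prints an empty pair of braces followed by the length and alloc, and any slot whose key is still NULL is skipped no matter where it sits in the array.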
Let's go look at the underscore put, because that's where all the good stuff happens. It's a bit of work. Can I get it all on one screen? No, I can't, so we'll just work through it. The first thing we have to do is figure out which of the slots in this array the key belongs in. We're going to use a utility function we wrote called p1dict_find to say, find me the entry in the array that is the right entry for this particular key. So now I've got to look at p1dict_find. This is pretty straightforward. This is open hashing, and the way open hashing works is that it starts by doing a hash computation to figure out a position in this array of items. That's get bucket. You've seen this in other code that I wrote: get bucket is just a crappy little hash function that does a shift and an exclusive or repeatedly, going through the entire string, with the idea of creating a pseudo-random number that is deterministic based on the string, which I can then take modulo the number of buckets. This ends up with a relatively large integer that for long strings might even overflow. Again, computing hash functions is a research area unto itself. This is a terrible hashing function, but I've used it over and over again because it's short and gives me some pseudo-randomness, though it's probably not very collision resistant. The whole idea is that if I have two buckets, this gives me a deterministic number from zero up to but not including two. Okay, so let's go back to the find operation. We get the bucket, and let's say it's bigger: let's say we've got 16 slots and they're all empty. The way it works is that bucket will say, okay, hopefully you're slot five. The problem is that then there's what's called collision resolution, and if slot five is already
filled, you've got to find another slot, and you've got to find it in a way that later, after the key is in there, you can find the key again. So we do linear collision resolution, which means that if we find ourselves hashing to position five and position five is full, we say, oh well, let's just go forward linearly: let's look at six, seven, eight, nine, whatever, and when we get to the end we wrap around to 0, 1, 2, 3, 4. If we get to the point where we have checked all the slots and they're all full, then we kind of have to blow up, and that's where it says printf, could not find slot for key. That would be like throwing an exception; we'll just print it out here. It means something went wrong, because you're never supposed to fill a hash table 100%. About 60 or 70% is when you're supposed to quit and double it or expand it in some other way. So let's look at the code that, starting at position five, hunts for a free position. We say offset equals zero, offset less than self->alloc, offset plus plus. If we have eight entries, that's going to go from zero to seven, but we really want to start that iteration at five and then wrap around when we get to eight. So I call it offset, but then I calculate the position in the array as offset plus the bucket, which is five, modulo self->alloc. If we've got eight slots, offset goes from zero through seven, and if bucket is five, i goes from five through seven and then zero through four. So i is the circular lookup: we're doing a circular lookup in an array starting at five, and if necessary every single position in that array is going to be checked. We start at five, where the hash function told us to go, and if self->items[5].
key is NULL, that's a great spot. That means the hash function pointed us to an available entry and we're done, so we return the address, &self->items[i]. Otherwise we've found a slot that's full, and we don't know yet whether we're going to replace this value or not. But if the key matches what we're looking for, we actually found the right one, and so we return the address of self->items[i]. If we didn't find an empty one, and we didn't find one that was full with a matching key, then we go back up into the for loop and advance by one. So if we were at five, and it was full but the key didn't match, we would go to six, and if six was full and the key didn't match, we'd go to seven. If seven was empty, then we're done, and we say, oh, seven is where we're going to go. If you think about this for a little while, and you can go read the lecture slide on open hashing, you can see that as long as there's space in this array and it's not completely full, eventually we're going to find a place. We're either going to find a place that matches the key or we're going to find an empty place, and that's find's job. Again, as long as resize is working and we never let the array get above 70% full, we can always find a slot for the key. So when we say printf, could not find slot for key, that's really traceback time, because that means that the thing above us, which we're going to look at now, is going to blow up. Okay, so let's look at underscore put again. A lot of work gets done in underscore find, where we hash the key and then do the linear lookup for collision resolution, and we get back either an empty entry or the actual entry. So the first if statement we have afterwards is: if old is not NULL and the key is not NULL, that means we just found it, which means all we've got
to do is replace the value. We don't have to add any entries; the index is great, the array is great. So we just free the old value, malloc the new value, string copy it in, and away we go. That's the code that ran when we did Z equals W, to basically say, oh well, we found Z, so we just have to free the original string. The original string is catchphrase and the second string is W, so we're replacing catchphrase with W, and this little bit of code right after the underscore find is the thing that handles the case where we found it. But that's the second put. Let's go back to Z equals catchphrase, because Z is not found there; we're starting with an empty array. That means we're going to get an old back, because the hash will find a slot, and that slot will be empty and find will give it back to us, but old key is going to be NULL, which means this one is available, which is cool. Now, actually, this to-do is not a to-do anymore. Okay, I'll come back to this; this is the tricky bit. I never did this in Kernighan and Ritchie, I don't think; I'd have to check. This is called rehashing, and this is when our length is greater than or equal to 70% of our allocation, which means this array is at least 70% full. That's when we're going to do this reallocation. But I'm going to ignore that whole if statement for now; I'll come back to it. So we're processing Z equals catchphrase here, and Z ends up, we can even look, in sub zero of the array. If it's time to insert, this part is really easy: we malloc the key and malloc the value and string copy them in. The old already exists; old is a pointer to an entry in an array, and that entry has a key and a value. So there's an array of key-value pairs, and we don't have to allocate the old itself
because the old is already in the array, but we do have to allocate the strings that we've been passed as parameters. Then we add one to the length. So that's basically how it ran when we were doing Z equals catchphrase: at the end we had Z in position zero, a length of one, and an alloc of two. Okay, so now you'll notice my 0.7. Let's go back to that now, because this is kind of tricky. This whole check, self length greater than or equal to alloc times 0.7, allowed me to fill the array up completely, because with only two slots, one item is only 0.5. So when I put Sakai equals B in, I didn't trigger the reallocation, and it just went into position one. But now we're doing Sally equals C, and it says, could not find slot for key Sally. Let's go back to find. I guess this is not really a traceback; this is just a fact. It hashed the key, Sally, and found a starting point with bucket. Where did Sally want to go? Sally ends up in three, but it would probably have been bucket one before the array got expanded. Then it looked through the whole thing, which in this case was only two slots, and it couldn't find either an empty one or one that matched Sally, and so we say, ah, can't find it, return NULL. Now we get back to put. We're doing Sally, and we find old is actually NULL, so this code doesn't run. We didn't find an entry; we're not even allowed to say old->key, because old is NULL. Find didn't find a slot, which could be bad, but we're going to fix it. So we're coming through here and we're saying, if self length is too full,
greater than 70% full, then we're going to make space. You can see here in the output it says we are making space for Sally. We're still in the middle of the put of Sally equals C. So we grab a copy of the old allocation number and the old items; those are just an integer and a pointer. Then we make a brand new empty items. We double the size of our array and set the new items, which is why I had to store old_items here, to be an array of four dnodes. Now we've got to be really careful; this is kind of like a constructor for items. We set the key and the value of the newly allocated four items in that array to NULL, because they start empty. You'll see in a sec that we've got to re-add the old ones. So we're creating an empty array that's twice as big, and now, and this is why we call it rehashing, we've got to go through the old array, find all those items, and add them in the right spot. This is where you'll notice that Z was in position zero and ends up in position two in the reallocated, rehashed array. You really have to rehash it, because all the rules have to be followed in the new one: the initial hash, the linear resolution of collisions, et cetera. So you really think of this items as a brand new one. We've got the old items sitting there, not for much longer, but we're going to go through them all and just add them again. So we're going through all of the old items. If the key is NULL, that's one of the empty slots in old_items, so continue. Then we call find again, and the key thing there is that this is why we made self->items be the new array, so we could call find. The first time it's called in this loop, the new items is completely empty, and old_items[i].key is the key. And so let's see what we've got here. Yeah, so new
item: if new item is NULL, that means find didn't find space. Now, we just doubled the space, so that shouldn't happen. And given that the keys in old_items are already unique, because this is a dictionary, we should never find the same key twice in the new, empty array. A new item key that's not equal to NULL basically says, oh, it's already in there, and the answer is, wait, wait, wait, we started with an empty one. So new item key not equal to NULL is a bad thing, but it should never happen, because each key is unique; there are no duplicates. As we're inserting them again into a new hash map array, we should never hit that. That's why it says very bad news: it means that we either couldn't find space, and I don't know why, or the key is already in there, and I don't know why either. This is traceback time; it means the runtime library is not well formed and we made a bug in our runtime library. So new item should be non-NULL and it should be empty, which means we can just copy. Now, this is just a pointer copy, not a copy of the actual strings, because the key is the pointer to the key string and the value is the pointer to the value string. So we say new item key equals old_items[i].
key, and the same thing for value. Now what we've done is copy all of the old items and position them correctly in the new items, so at this point we're done. The free here is simpler, because we're not freeing the strings, since we just copied those pointers; we are freeing the array. So free of old_items frees the old array. Now what we need to do is search for the position again, because we're going back to the put of Sally equals C, and Sally is the key that we're putting in. We just made space for Sally. Again, if Sally was already in there, this earlier code would have run and we'd be done, and we'd never be down here. So we have to look again in the newly expanded array to find the right spot, and say, okay, where should Sally belong, because we just made more space. Old equal to NULL would mean we still don't have space for Sally even though we doubled the darn thing. And old key not equal to NULL means that Sally's already in there. Well, wait a second, if Sally was in there we'd not even have come down here. So again, on seeing old key not equal to NULL, you put out this kind of traceback-y very bad news message, which says that our code, and we're the library writer, is messing up. Old is really supposed to find a slot no matter what. We're adding Sally to the dictionary, and find has got to give us a slot, or our code is broken. So this is me debugging and leaving this in, just to make sure. It's impossible and should never happen, but I'm going to say very bad news. So it finds a slot, and then we just copy the key and the value and increment the length. This whole realloc thing is kind of new. I don't think I did this when I did the Kernighan and Ritchie version, because the way Kernighan and Ritchie
does it, you can just keep extending those linked lists. If we look at underscore put in the Kernighan and Ritchie code, you'll see I do a get bucket and I find a bucket. There's always a bucket in this one, because the buckets are pointers to linked lists, so there's no collision resolution by linear probing. If we find the key, we just copy the value (here I had integer values), and otherwise we just append to the tail of the bucket's list, using self head sub bucket equals new. So in the K&R dict we don't have to reallocate.

Now, you would want to reallocate at some point, because otherwise these chains get too long, so it's not that you never have to reallocate with these chains. I just didn't write it. The reallocation would be in the middle of put, it would be very complex code, and it would probably be about the same complexity, maybe a little more complex, than what the Python 1 implementation was.

With that, I think I've pretty much covered the essential differences between the Kernighan and Ritchie dictionary code that I wrote and the basic approach using an array of dnode items. If we look at find, it does linear collision resolution. Let's do this: I'll put a comment in underscore find that says "linear collision resolution." I've got to figure out how to spell "collision"; the version you see will have it fixed. Linear collision resolution is tricky stuff, so you want to take a look at it and understand it very carefully.

I hope that you found this lecture useful. I think the next thing we're going to talk about is how dictionaries changed from Python 1 through 3.6 up to Python 3.7, because the dictionaries did change and got a lot
better, so we'll talk about that soon. Cheers.

[Music]

Hello and welcome to another programming walkthrough for C Programming for Everybody. This is one of the last walkthroughs for the epilogue code, and in this walkthrough we are going to look at how the internal data structures for dictionaries in Python changed between the first version and the 3.7 version. We talked previously, in the P1 dict code, about what is in effect Python 0.01 through Python 3.6. Python 3.7 and later has a new dictionary that maintains insert order, but it also saves a lot in terms of efficiency. So here is our approximation and simplification of the Python 3.7 dictionary.

Let's talk a little bit about the problem that the Python 3.7 dictionary solved by looking at the Python 0.01 dictionary. If you listen to the Guido van Rossum video that I've got, for these key-value pairs the real version of the dictionary stores the pointer to the key, the pointer to the value, and the hash value, so it doesn't have to recompute the hash. I kept it simple, and all my stuff was going to be small, so I didn't do this optimization of storing the hash. I could have, but I wanted to keep the code as small as possible, so I just recompute the hash in the few places I need it. In the real version, this means that on a 64-bit system each entry is 3 times 8, or 24 bytes.

And the problem (let me delete this line so I don't break my code) is the load factor. We never let the load factor get above 0.7, which means that by definition 30% of the entries in the key-value pair array have to be empty, and the larger the structure gets, the larger the wastage is. That's 30% of the size of the array times 24 bytes that's always wasted. You cannot not waste it.

So what happened in the Python 3.7 version is that, in effect, items started to be treated more like a simple linear array of pointers to key-value pairs. So now you've got your key-value pair, and in Python it is key,
value, hash, and a few other things, so dnode is a larger data structure. Then we have a separate, simple integer array that is the index: items is an array, and each index entry is the offset into that array, but that's just one integer. So we're going to do all the hash indexing and collision resolution in this index array, whose entries are much smaller.

I'm going to just make index twice as long as items (I don't know if Python does this), which means that from an indexing perspective we never get a load factor above 50%. We're only going to extend the arrays when we run out of space in items, we're going to store the key-value pairs linearly, 0 1 2 3 4 5, in items, and then we're going to just reallocate. Let me open a new tab here with the P1 list code. If we go back to the "extend if necessary" part of P1 list append, you're going to see that the P3 code looks a lot like the realloc pattern in the Python list, because items is an array. The actual key-value pairs are stored linearly in an array, and that's also how you end up maintaining insert order. It wasn't that they tried to maintain insert order; at some point, if you're just using a linear list, 0 1 2 3 4, insert order is going to be maintained, and if you delete one you shift them all up. So I would guess that ultimately, inside Python itself after 3.7, there might be some overlap in the code between dictionary and list, because they pulled the hashing lookup out into this separate data structure, which is an array of integers.

So then, get bucket is the same. It's a hash function, and I made it as small as I could just so it didn't take up a lot of screen space. If we look at the constructor, P3 dict new, we see we're allocating the actual dictionary
object, then we're allocating two key-value pairs, two struct dnodes, and then we're allocating four integers, two times the alloc size. So our index has four entries and our dnode array has two. We don't have to initialize the key-value pair array, the items array, because we know from length which entries are valid and which aren't. But we do have to take this new index and put -1 in every slot. We did this a little differently in the Python 1 dictionary, where we used null, but here it's an array of integers, and -1 is going to be my marker that says this is an available index slot. So that's what the data structures look like.

Let's go ahead and run the code. You'll note that this main code is in effect exactly the same as the P1 dict code that we went through before, so when we run it, it will look stunningly familiar. We have an empty dictionary, we're putting Z equals catchphrase, then Z equals W, Sakai equals B, Sally equals C, and A equals D, and we're printing it out as we go, using what is like an iterator. So we see Z, catchphrase, and then zero, which is the position in the index, not the position in the items. Then we see Z maps to W and Sakai maps to B, and they're in index positions zero and one. We didn't have to rehash, because we don't have to rehash until items is completely full, and alloc is two, so we can put Z equals W and Sakai equals B. Remember that index has four integers when alloc is two, so we still have a load factor of 50%.

But now what happens? We're going to put Sally equals C in, and now we have to extend the items, and we automatically extend the index. So we have a 50% load factor before the extend, and then a 50% load factor after the extend. Well, actually less than a 50%
load factor, but the point is that because I, by definition, made my hash index double the length of items, I never get a load factor above 0.5. Again, you could make this more complex and let the load factor get to 0.7, but I'm keeping it really simple.

The other thing that you'll notice is in the rehash. Thinking back to Python 1, you'll notice that the hash position of Z, just like in the previous one, went from position zero to position two after it went from two to four. I talked about how it's not going to move completely randomly, but it hopped in this case because, whatever the hash value was, modulo 2 it was zero and modulo 4 it was two, and that fits, because two is zero modulo 2. So these things kind of go by powers of two. But B, in position one, stayed in the same place, et cetera.

But here's the thing: notice the order of the things printing out. These zeros and ones are the hash positions, not the positions in items. Z equals catchphrase is in items sub zero, Z equals W is in items sub zero, and Sakai equals B is in items sub one. So Z equals W in this output is in hash position two but items position zero. And so you'll notice that insert order is maintained in this one, unlike in P1. Again, this wasn't done to make insert order work; it was a side effect of changing the data structures to make items just an indexed array starting at zero. Items is an array that's linear, and index is an array that is looked up by hash.

OK, so let's take a look at put, and of course the first thing we see is P3 dict find, or just underscore find. Recall from the previous one that the goal of find is to find a bucket position that's free and available. And remember that get bucket is just the hash computation for the particular key we're looking for, and we have the same kind of circular scan. This will give us a number between, you know, zero and eight,
like 0 through 7 if there are eight entries, 0 through 3 if there are four, or 0 or 1 if there are two, and then we have to do the circular look. But you'll notice that now, when I say for offset equals 0, offset less than self alloc times 2, I am looping through the index array, not the items array. And I do the same little trick with the modulo so that i goes like 5 6 7 0 1 2 3 4, which is exactly what you need to do collision resolution using linear probing, which is the way we do it, and it's very simple.

We look for an empty spot in the index. Now, the data is not in index; the data is still in items. So if we find an empty spot in index, meaning it's a -1, that's a free one, and that's what we're going to return. Notice that in this one I'm returning an integer rather than a pointer. I probably should go back and change all those others to return integers instead of pointers, because I think it makes the code easier: it looks a little more complex in some places but is easier to understand in others.

If it's -1, we're done. Otherwise we look at the key we find in items. Now, this one's a little trickier: we're comparing the key to self items, which holds the key-value pairs, but we go through self index sub i. Self index sub i is the position within items of a particular key-value pair, so we have to look up using self index to find the right spot, because self items is like a linear list. It's not really a hash map; the only thing that's hashing here is the index. Then we go grab the key out of that entry, and if it matches, this is the position, and by the way, I found it. In our calling code we'll check to see if it's empty or not, but right here, if it's empty we return i, and if it matches we return i; otherwise we keep incrementing.

Now, the key thing is that because the load factor is only 0.5, this is always going to succeed. That's another nice thing about this.
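The two-array layout and the find just described can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the course's actual code: the names `struct p3dict`, `struct dnode`, `p3dict_new`, `p3dict_find`, and `get_bucket`, the toy hash function, and the use of `assert` for allocation failures are all illustrative choices.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names modeled on the walkthrough; the real code may differ. */
struct dnode {
    char *key;
    char *value;
};

struct p3dict {
    int alloc;            /* capacity of items; index holds 2 * alloc slots */
    int length;           /* number of valid entries in items */
    struct dnode *items;  /* key/value pairs stored in insert order */
    int *index;           /* hash slots: offset into items, or -1 if free */
};

/* Toy hash (an assumption): any deterministic string hash would do. */
static int get_bucket(const char *key, int buckets)
{
    unsigned int hash = 123456;
    for (; *key != '\0'; key++)
        hash = (hash << 3) ^ (unsigned char)*key;
    return (int)(hash % (unsigned int)buckets);
}

/* Create an empty dictionary: 2 item slots, 4 index slots, all marked free. */
struct p3dict *p3dict_new(void)
{
    struct p3dict *self = malloc(sizeof(*self));
    assert(self != NULL);
    self->alloc = 2;
    self->length = 0;
    self->items = malloc(self->alloc * sizeof(struct dnode));
    self->index = malloc(self->alloc * 2 * sizeof(int));
    assert(self->items != NULL && self->index != NULL);
    for (int i = 0; i < self->alloc * 2; i++)
        self->index[i] = -1;   /* items needs no init: length says what is valid */
    return self;
}

/* Return the index slot that holds key, or the first free slot on its
 * probe path; -1 means the whole index was scanned without success. */
int p3dict_find(const struct p3dict *self, const char *key)
{
    int slots = self->alloc * 2;
    int bucket = get_bucket(key, slots);
    for (int offset = 0; offset < slots; offset++) {
        int i = (bucket + offset) % slots;   /* wraps around, e.g. 5 6 7 0 1 2 3 4 */
        if (self->index[i] == -1)
            return i;                        /* free slot */
        if (strcmp(self->items[self->index[i]].key, key) == 0)
            return i;                        /* key already stored */
    }
    return -1;
}
```

A caller tells the two successful outcomes apart by looking at `self->index[i]` for the returned slot: -1 means "free, insert here," anything else is the position of the existing pair in `items`.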
It is always going to succeed, because when we increase the size of items we also increase the size of index, so index is always two times the size of items and we never get a load factor above 0.5. This feels to me like much more reliable code. In the other one, not finding an empty slot was a trigger for reallocation, but in this one we reallocate differently, so returning -1 is just not a good sign.

OK, so let's take a look again at put. When we do the find, position equals P3 dict find of the inserted key, we basically say: if the entry at that position is not equal to -1, this is a valid entry, which means we're replacing the value. That's the scenario where we have Z equals catchphrase and then we say Z equals W. We're just going to replace the value. We're not going to change the key: the key is already in the right position, the key is already hashed properly, the index is correct, the items array is correct, and all we need to do is grab the old value and get rid of it, then allocate the new value and strcpy it in, and we're done. So if we find it, that's great.

If we don't find it, we have to insert it, and we know where to insert it. But first we simply say: if self length is greater than or equal to self alloc, that's when we're going to expand. Now, the alloc here is the number of entries in items, which we're using linearly, which means we can fill up to 100% before we have to extend. So here we're looking to see if the length is greater than or equal to self alloc. In Python 1 we were asking whether the length was greater than or equal to self alloc times the load factor, 0.7; that's when we triggered the extend in Python 1 dictionaries. But if we look at append in P1 list, it says if self length is greater than or equal to self alloc. This is again that duality between Python 1 lists and Python 3.7
dictionaries, because in a sense the items array is just a list. So if self length is greater than or equal to self alloc, it's 100% full, it's 100% utilized, there are no spaces left in items, and we're going to extend it. That means this code is really quite simple: we just realloc the items to be twice as big, and that is copied exactly from the list.

But now we've got a little bit more work to do in Python 3. The items array was easy: we just realloc'd it, exactly what a list would do in this situation. But now we've got to fix the index. Self items at this point is right, because realloc did all the work for us: we doubled its size and realloc copied all the existing data. The other thing we don't have to do here, which we did have to do in Python 1, is initialize the newly allocated items, because items is like a list: alloc and length are all that matter, and we haven't changed length yet, though we have changed alloc. So the fact that there might be garbage in all that new space (it might be zeros, it might be garbage) is OK.

Now what we're going to do is rebuild the index. Items hasn't moved, so we don't need the old index; we just free index right now. Remember, index is that integer array that's twice alloc. So now we just allocate another integer array that's twice the size of whatever we've got allocated in items: self arrow alloc times 2 times size of int, boom. And then we just have a little for loop, for i equals 0, i less than self alloc times 2, i plus plus, that sets each entry to -1, because what we're doing is creating a new, completely empty index. The key-value pairs are just sitting there, happy as a clam, in items, but now we've got to make a new index. So we're going to refill this index, but it's really simple. So now we're
going not to alloc but to length: for i equals 0, i less than self length. We're going to go through the items, and we're going to call the find operation based on the key of each item. That's going to give us a position, and it's going to do collision resolution using linear probing. At that point we set the index at that position to i. Position is where in the index array the entry should go, which means that if it's in the third position in the hash map and we're looking at the zeroth position in items, i is zero and position is three.

It's better memory-wise, too. And if you compare this realloc code between the Python 1 code and the Python 3 code (and I don't even do it in K&R, because it was going to be so ugly I didn't want to show it to you), it's ugly in Python 1 and it is pretty in Python 3. Seriously, look at this code in underscore put for a while and you'll be asking why they didn't think of that back in 1989. Why did it take them until whatever, 2018 or something, to see this? It is a beautiful data structure.

OK, so here we are in this code where we have refilled the index. Again, self index is changing, because that's how the collision resolution works, and positions in the index might be different before and after. After that, we're done extending, and we have to do the actual insert, because the reason we extended is that we were going to insert a new thing. In this case, we're trying to put Sally equals C in, we extend from two to four, and now we're done extending and we've got to put Sally in. This is the insert code that we've already looked at, but we do have to go find the position of Sally again, because we found it before in the pre-expanded index
and items, and now we're finding it in the post-expanded index and items. Then we just allocate the key and the value, copy them in, and add the key-value pair at the end of items, at length, which is not alloc: length is the next position in that linear array of items. So for the new item we set the new key and the new value, and then we set the index to point at this entry in self items, because index is an integer. If we're expanding from two to four, we're now going to use position two, actually. The position in self index is computed by hashing and linear collision resolution, then we set that index entry to length, and we add one to the length.

If you watch all this, of these four lines of code only one has to do with hashing, and that is self index sub position equals self length. It's kind of like leaving a breadcrumb in Hansel and Gretel: we're just remembering in the index where that entry is, so we can jump to the right position quickly using hashing rather than doing a linear scan of the key-value pairs. But literally, if you took this index away, you'd have just a list.

So again, let's look at the Python list. This was a list of strings, so it's actually a little simpler, but all it does is extend if necessary, copy the string, put it at the end, and add one to the length, and it's done. The Python 3 dictionary does the same thing, which is kind of cool. So as we put Sally equals C, it extends, it recomputes the index, and that's why the hash position of Z moved but not its position in items. Z maps to W in position two; that's the index position, not the items position. The items position is just like a list: Z is in sub zero, Sakai is in sub one, and Sally is in sub two. And so that's insert order, and everything
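The put logic walked through above (replace the value if the key exists, otherwise extend items like a list, rebuild the index from scratch, and append) can be sketched in C as follows. This is a sketch under stated assumptions rather than the course's actual code: every identifier (`p3dict_put`, `p3dict_find`, `new_index`, `copy_string`, and so on) and the toy hash are illustrative, and `assert` stands in for the "very bad news" checks and for allocation-failure handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All names below are illustrative assumptions, not the course's code. */
struct dnode { char *key; char *value; };

struct p3dict {
    int alloc;             /* capacity of items */
    int length;            /* entries in use */
    struct dnode *items;   /* key/value pairs in insert order */
    int *index;            /* 2 * alloc hash slots; -1 marks a free slot */
};

static int get_bucket(const char *key, int buckets)
{
    unsigned int hash = 123456;
    for (; *key != '\0'; key++)
        hash = (hash << 3) ^ (unsigned char)*key;
    return (int)(hash % (unsigned int)buckets);
}

/* Linear probing over index: slot holding key, or first free slot. */
static int p3dict_find(const struct p3dict *self, const char *key)
{
    int slots = self->alloc * 2;
    int bucket = get_bucket(key, slots);
    for (int offset = 0; offset < slots; offset++) {
        int i = (bucket + offset) % slots;
        if (self->index[i] == -1) return i;
        if (strcmp(self->items[self->index[i]].key, key) == 0) return i;
    }
    return -1;
}

static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    assert(copy != NULL);
    return strcpy(copy, s);
}

/* Allocate a fresh index with every slot free. */
static int *new_index(int slots)
{
    int *index = malloc(slots * sizeof(int));
    assert(index != NULL);
    for (int i = 0; i < slots; i++) index[i] = -1;
    return index;
}

struct p3dict *p3dict_new(void)
{
    struct p3dict *self = malloc(sizeof(*self));
    assert(self != NULL);
    self->alloc = 2;
    self->length = 0;
    self->items = malloc(self->alloc * sizeof(struct dnode));
    assert(self->items != NULL);
    self->index = new_index(self->alloc * 2);
    return self;
}

void p3dict_put(struct p3dict *self, const char *key, const char *value)
{
    int position = p3dict_find(self, key);
    assert(position != -1);              /* index is never more than half full */

    if (self->index[position] != -1) {   /* key present: replace the value only */
        struct dnode *old = &self->items[self->index[position]];
        free(old->value);
        old->value = copy_string(value);
        return;
    }

    if (self->length >= self->alloc) {   /* items is 100% full: extend like a list */
        self->alloc *= 2;
        self->items = realloc(self->items, self->alloc * sizeof(struct dnode));
        assert(self->items != NULL);

        free(self->index);               /* rebuild the index from scratch */
        self->index = new_index(self->alloc * 2);
        for (int i = 0; i < self->length; i++) {
            int slot = p3dict_find(self, self->items[i].key);
            assert(slot != -1 && self->index[slot] == -1);  /* else "very bad news" */
            self->index[slot] = i;
        }
        position = p3dict_find(self, key);   /* slot may differ after the rebuild */
        assert(position != -1 && self->index[position] == -1);
    }

    self->items[self->length].key = copy_string(key);
    self->items[self->length].value = copy_string(value);
    self->index[position] = self->length;    /* the only hashing-related line */
    self->length++;
}
```

Putting z, z again, sakai, sally, and a into a fresh dictionary with this sketch leaves four entries in `items` in exactly that insert order, with one extension from two to four slots along the way.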
works pretty well. So for me, as I think about the Python 3 dictionary and compare it to the classic Kernighan and Ritchie dictionary, the classic Kernighan and Ritchie dictionary was so complex on rehashing that I didn't even write code for it. I said, let's pretend that's not a problem. And now, in the Python 3 code, I can go all the way through the rehashing and tell you it's some of the more beautiful code that I've seen. So I hope you found this valuable. Cheers, and hope to see you online.

[Music]

So now let's talk about what happened between Python 3.6 and Python 3.7, where dictionaries began to maintain insert order: not key order, but insert order. The basic idea here is that hashing is a quick way to find a starting position in an array, but it doesn't necessarily mean that everything has to be stored in hash order. You saw this in the code we wrote with trees, in the tree map exercises: you can have more than one data structure that you're maintaining at the same time, and that's what's happening in Python 3.7.

Now, with Python 3.6 we made a big fuss about how there was no order, and when you go through the code walkthroughs you'll see that sometimes the order changes at the moment of rehashing. So the order is pseudo-random, as it were, because of hashing, and it's not even consistent from insert to insert, because it can change at any moment with rehashing and hash map reorganization. Like I said, Python 3.7 separated the notion of the hash index from the key-value store. This leads to the fact that iterators that go through Python 3.7 dictionaries function much more like iterators over Python lists, which are iterated in index order. Now, you don't have a sub zero or sub one, because dictionaries don't want to give you that semantic, but it
basically is a Python list plus a hash index for quick lookup, for inserts and gets. Iterating through the dictionary is just like the Python 1.0 list, and lookup by key and insert by key are still very quick.

Now, I have a whole long walkthrough that goes through all this, so I'll just give you the high-level picture, and that will help you when you go through the walkthrough. If you look at the dictionary struct, it's got an allocation, it's got a length, and it's got a struct dnode pointer that is the array of items. But now we also have an integer array that's the index. In the code that I wrote, I just made index twice as big as the number of items, and that basically meant that I always had space, so I end up with a load factor of at most 0.5.

If you look at items, it's a list, meaning that we insert Z equals catchphrase, Z equals W, Y equals B, and C equals 42, and they maintain insert order, and a new insert is just done at the end. What we're seeing here is a list that's three-quarters full, but we don't care about that in the same way as we did in Python 1, because the load factor of items is irrelevant. It's the load factor of the index that matters, and because I'm making the index twice as large as items, it never reaches a load factor above 50%. We reallocate when we need to make the list bigger, but then we also make index bigger too, so we never exceed a load factor of 50%, which makes things really smooth and really easy.

Now, the key thing is that the index is an array of integers, and what's stored in each integer, as you can see with these arrows, is simply the position in items where the key-value pair lives. So again, this is two simultaneous data structures: the index is a hash, items is a list, and index points to the offset within the list. Now, I didn't do it, but Guido could easily have shared some of the code and some of the optimizations between list and
the items in a dict. I'm not going to go into it here; I do have a code walkthrough that takes a good bit of time and goes through all this code. Just remember this picture: it's really about extracting the hash index into its own table.

So, in summary, we learned from Guido, surprisingly, that he loved realloc and expandable arrays of structs with pointers to his objects. Linked lists are hardly used in Python's core data structures, and that turns out to be a really, really good choice in retrospect. The code is surprisingly simple once you start taking a look at it, and I was kind of glad to leave linked lists behind, even though I'm pretty good at linked lists. Moving memory management from realloc into Python was something that happened about ten years later, as Guido mentioned, because realloc wasn't as predictable as he wanted it to be. Eventually there's this concept of garbage collection, which is layered on the memory realloc provides, and it was too difficult to hope that realloc was going to do the things that Python wanted done. So the places in the code that you've looked at here that use realloc now use a Python allocator. What happens is that realloc gives us bigger chunks of memory, and then Python manages those, garbage collects them, cleans them up, and so on. So the modern era of Python really depended a lot less on the cleverness of realloc, because it just turned out to not always be as clever as we thought.

[Music]

But there's more. Guido and I didn't stop talking after we talked about the shape of the data structures and the whole surprise about linked lists. If you recall, I had this picture, which was not just about how things worked: in this picture I'm placing C in the context of all the languages that influenced C and all the languages that C influenced. You've seen this picture before, but what I wanted to talk to Guido about was what
were the influences on Python: C, C++, ABC, and, as you will see in the video, Modula-3. I wasn't expecting that. I do know what I was expecting: I was expecting that he didn't like C++ and that he loves C, and I didn't know how strongly ABC had influenced him. In ABC there was much to like and, for him, much to dislike.

So, C++: Guido used it, he wrote code in it, and it sounds like he did a series of experiments. To some degree I guess his question was: is C++ so awesome that I can get done what I want to get done for Python in C++? He did some C++ experiments and found some of them disappointing, and that's why he used C instead of C++, because he could get done what he wanted to get done. So Guido chose C as the language to build Python in, but he learned a lot about how to layer object-oriented concepts on top of an otherwise procedural programming language. C++ had a big influence, and I bet at some point he thought C++ was the answer and then said, no, I've got to use C.

And of course ABC: if you look at the Wikipedia entries, they say ABC influenced Python, and the answer is yes, but it influenced it more than you might think. There were things that Guido liked and things that Guido didn't like about ABC, places he wanted to improve. ABC had a lot of cool types, and ABC handled allocation and deallocation using reference counting. Guido liked all that stuff. But it used B-trees internally, and B-trees are not binary trees; B-trees are a thing that's most commonly used in databases. The other thing is that it had no mechanism for user-defined classes. All of the concept of object orientation was in the language itself and in the language implementation itself, and there was no chance for users to define their own objects in ABC. ABC did what it did well, and Guido knew ABC well and worked on ABC and
knew what he wanted to take from ABC and what he wanted to build beyond ABC. I think that in some ways the ABC language is kind of pretty. I can read that, and you can see things like split and in and other concepts. The for loop, the implicit iteration in "for line in document," you can see that it came straight to Python, except he made it all lowercase. Of course, ABC was kind of COBOL-like: you wrote "HOW TO RETURN words document," where document is the parameter, so that first line is a little bit tricky. He also wanted real object orientation, and he wanted to stay much closer to the C libraries, because ABC didn't really care about being able to call C string libraries or C socket libraries or anything like that. And he wanted lowercase keywords.

Another thing that surprised me was that Modula-3 was a significant influence. Modula-3 was a rather European-centered language; it kind of came from Pascal, which came from ETH Zurich. So this was the kind of thing where folks like myself in the United States really didn't think too much about Modula, but Guido was clearly investigating how to do things, and there were some really good ideas in Modula-3. He went and talked to the Modula-3 folks, and the concept of self as the first parameter is a way to layer an object-oriented mechanism on top of a procedural language. The concept of self was inspired by Guido's interaction with Modula-3.

I'm going to give Guido the last word here. I'm going to put in the second half of my interview and just let Guido talk about what inspired him as he was designing object orientation in Python.

[Music]

Can you walk us through the inspirations and the history of how ABC got so good at OO?

So, ABC was actually not object-oriented. ABC had a fixed set of data types. The data types were composable, so you could have, say, a list of integers or a list of strings, and those
would share the operations on lists, but there was no concept of class, there was no concept of users defining classes, and there was no concept of subclassing, either in the implementation or for the built-in types.

So ABC had a bunch of really convenient-to-use primitives that did a lot of work for you, but they weren't really OO?

That's correct. And they even insisted that there was only a single numeric type, in part so that they could avoid dealing with the complicated hierarchy of integers and floats and rationals.

So then, when you started Python, you had object orientation very front of mind: not just a convenient set of primitives, but a convenient set of object-based primitives. Where did you get that? Did you read about it somewhere?

I was familiar with C++, and I think that might have been the only object-oriented language that I knew at the time; at least I can't think of any other. Well, I own a big book about Simula, which is sort of the granddaddy of all object-oriented languages. I don't think I ever managed to get my hands on a Simula compiler, and I have to admit I only skimmed the book.

Had you written C++ before? How much C++ had you written?

Yes, enough to invent automatic, I forget what they're called, the pointers that are automatically ref counted.

So you layered reference counting on top of C++ so that you would not lose your sanity while working in C++?

No, because I was very familiar with ref counting already: ABC's implementation is written in C, and everything uses reference counting. It works out better in ABC because there are no cycles possible in the data types, because there are no mutable data types. You can't have an object that contains itself or references itself, directly or indirectly, in ABC. I don't think that I realized
all the details of why that is important, and I didn't care, because pragmatically those things aren't always as important as they seem to be theoretically. But anyway, I was very familiar with reference counting in C from how it was used for ABC. I think that's where I'd learned about it, and I didn't hear much about it elsewhere.

Then at some point, when I was teaching myself C++ (this must have been in the mid-80s), one of the things I tried came out of my experience with ABC. Reference counting in that implementation was very error-prone. We regularly had to deal with bugs in the implementation where we either leaked memory or crashed because we had freed things early, and it was always "oh, there's a missing incref or decref." And of course the missing decref is much harder to debug, because you leak a byte here and you leak a byte there and nobody notices, because people didn't write large applications in ABC.

So anyway, someone probably put me onto the idea that in C++ you can override primitive operations to the extent that you can build automatic reference counting. I built that for some toy application, played with it, and realized that it didn't work right. I probably didn't know enough C++ at the time, or possibly C++ hadn't yet developed certain subtle mechanisms like move operations. It was very crude: overloading assignment was basically what I did. And so I found that with handwritten reference count operations, I would know: OK, this object is owned here, so when we pass it to some function, that function doesn't have to increment the reference count just because it's using the object, because the object is not going anywhere as long as the caller of that function has a reference to it. On the other hand, the automatic reference counting as I had implemented it in C++ would say: oh, we're passing this thing as an argument
to a function, so increment the reference count; oh, that function returns, decrement the reference count. So there was much, much, much more reference counting activity, and that part I didn't like. And so Python still does it manually and is written in C, not in C++. But somehow I picked up the idea of vtables, or at least arrays of pointers to functions, as a handy way to implement objects. And actually, initially, for the first probably six months, Python was not object oriented. Very few people saw that; a handful of my co-workers saw that. The implementation had this notion that you can implement an object type by putting a bunch of function pointers and a bit of other information in a standard structure that describes the type. I called it a type, not an object. So in a way you were emulating C++'s way of creating a perception of object-oriented programming without... Yeah. Oh, and wait a second, there is another step that I just missed. Around '88 I spent a summer as an intern at DEC SRC, and I talked to the designers and inventors of a language named Modula-3. They were putting the last proofreading efforts into the Modula-3 report, but they had built a Modula two-and-a-half compiler; they called it Modula-2+. And in their documentation, either the draft of the Modula-3 manual or the internal Modula-2+ manual, I learned this concept of how Modula-2+ and Modula-3 do kind of not-quite object orientation. I think they meant it as a reaction to how it was done in C++. They said Modula-3 is not actually an object-oriented language, but it has the key part of object-oriented use, where you define a class through a bunch of functions, and the notation where you say object.
method and then the arguments, instead of function and then the object and then the rest of the arguments. They said if you use that notation, object dot, or thing dot method, paren, and then arguments, thing had to be something that had a type, I think it was a structure type, that had a bunch of function pointers in it. And so the method names were simply fields or members of that structure in Modula-3. So to create the equivalent of a class, you defined a structure that had a bunch of function pointers. They were all typed, so you could say this is a function that takes one argument of this type and it returns blah blah, and then a bunch of those with names. And the trick was, if the compiler noticed that you were using that and then you were calling it, it would say, oh, we're going to insert the thing whose method you just used as the very first parameter to the function. This is where Python's explicit self comes from? Exactly. And so that is all Modula-3's design that I copied. Originally in Python I didn't have a user-level notation to even do that; the type system was only extensible by writing a C extension. Oh, and so the class keyword, what we think of as the class keyword, was not there? In the first probably five or six months it wasn't there. I think we had an intern who knew C++ better, or who somehow liked it better, who knows; he was younger than me. And he said, hey, if you want to give users the ability to define their own classes, here's a hack for how you could do it: you add this little bit of syntax, and then you map it to this structure at implementation time, and it all works. And that's five months in? Yeah. And so everything else was working. We had a working interpreter with a REPL, and integers and floats and strings and tuples and lists and dictionaries and functions and
types that were sort of internal things. I think even then you could ask for the type of an object. At the time, in these first six months of Python, did you feel like you were doing core research on what object orientation was going to mean in the future? No. Were you just... No, I was just hacking together an implementation of a language that I didn't know where it was going. I obviously spent enough time on it that I was hoping it would go somewhere. But you didn't think that you were going to someday write a paper about how object orientation should be done? You weren't thinking like a researcher would think? Definitely not. You were just solving a problem? I was not a researcher, I was a programmer. I do not have a PhD. I was employed by CWI as a programmer. Do you now even think maybe that you made a profound contribution? You're the first person to tell me that, and I'm not sure that I totally believe it. I would say that the Modula-3 inventors made that contribution, because they were much more researchy types, theory types. They thought long and deep about all the theoretical eventual repercussions of designs like that, and I was happy to implement something even if there were edge cases where it would just not do the right thing at all. Like, until we finally, after about a decade or so, added a cyclic garbage collector to the language, Python would leak memory irretrievably in many situations where you had created a cycle and then lost the last pointer into the cycle from outside it. And it took a very long time before our users convinced us that there were some edge cases where that was actually a real problem and there was no good existing solution.
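The two implementation ideas described in this conversation can be sketched in C roughly as follows: a type as a standard structure of function pointers, with the object passed as an explicit first argument, and hand-written reference counting that cannot reclaim a cycle. This is only a minimal illustration under stated assumptions, not CPython's actual source. Every name in it (Type, Object, incref, decref, node_new, node_link, call_value, and the demo_ functions) is hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* All names below are hypothetical, for illustration only;
   this is not CPython's real object layout or API. */

typedef struct Object Object;

/* A "type": a standard structure holding function pointers (the methods). */
typedef struct Type {
    const char *name;
    long (*value)(Object *self);  /* every method takes self explicitly */
    void (*clear)(Object *self);  /* releases references that self owns */
} Type;

struct Object {
    int refcount;      /* number of owners of this object */
    const Type *type;  /* points at the shared method table */
    long payload;
    Object *other;     /* owned reference to another object, or NULL */
};

static int live_objects = 0;  /* allocated but not yet freed */

static void incref(Object *o) { if (o) o->refcount++; }

static void decref(Object *o) {
    if (o && --o->refcount == 0) {
        o->type->clear(o);  /* drop what this object owns first */
        free(o);
        live_objects--;
    }
}

static long node_value(Object *self) { return self->payload; }

static void node_clear(Object *self) {
    decref(self->other);
    self->other = NULL;
}

static const Type NodeType = { "node", node_value, node_clear };

static Object *node_new(long payload) {
    Object *o = malloc(sizeof *o);
    assert(o != NULL);  /* a sketch: real code would handle failure */
    o->refcount = 1;    /* the caller owns the first reference */
    o->type = &NodeType;
    o->payload = payload;
    o->other = NULL;
    live_objects++;
    return o;
}

static void node_link(Object *o, Object *target) {
    incref(target);    /* o becomes an additional owner of target */
    decref(o->other);  /* release whatever o pointed at before */
    o->other = target;
}

/* What "obj.value()" amounts to: look the method up in the type,
   then pass obj itself as the first argument. */
static long call_value(Object *obj) { return obj->type->value(obj); }

long demo_method_call(void) {
    Object *o = node_new(42);
    long v = call_value(o);
    decref(o);
    return v;
}

/* Returns how many objects are still allocated after all
   outside references are dropped. */
int demo_no_cycle(void) {
    int before = live_objects;
    Object *a = node_new(1), *b = node_new(2);
    node_link(a, b);  /* a -> b */
    decref(b);
    decref(a);        /* frees a, which in turn frees b */
    return live_objects - before;
}

int demo_cycle(void) {
    int before = live_objects;
    Object *a = node_new(1), *b = node_new(2);
    node_link(a, b);  /* a -> b */
    node_link(b, a);  /* b -> a, closing the cycle */
    decref(a);
    decref(b);        /* both counts stay at 1: unreachable but never freed */
    return live_objects - before;
}
```

In this sketch demo_no_cycle ends with nothing left allocated, while demo_cycle leaves two objects behind that each hold the other's last reference. Plain reference counting never sees either count reach zero, which is the kind of leak a cyclic garbage collector exists to find.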
