Transcript for:
CS50 Week 2 Lecture on Memory and Cryptography

[MUSIC PLAYING] DAVID MALAN: All right. This is CS50. This is week 2 wherein we will ultimately learn how to use memory, but we thought we'd first begin with a bit of story time. And in fact, allow me to walk over to our brave volunteers who have joined us already. First here on my left, we have who? AKSHAYA: Hi, I'm Akshaya. I'm a first year in Matthews, and I'm planning on concentrating in chemical and physical biology and CS. DAVID MALAN: Wonderful, welcome. And let me have you hang on to the microphone first because we've asked Akshaya to tell us a short story. So in your envelope, you have the beginnings of a story. If you wouldn't mind reading it aloud. And as she reads this, allow us to give some thought as to what level Akshaya reads at, so to speak. AKSHAYA: All right, it's a long one, get ready. One fish, two fish, red fish, blue fish. DAVID MALAN: All right, very well done. What grade level would you say she reads at if you think back to your middle school, grade school, when maybe a teacher said you read at this level or maybe this level or this one here? So OK, no offense taken yet. AUDIENCE: 1st grade. DAVID MALAN: I'm sorry? AUDIENCE: 1st grade. DAVID MALAN: 1st grade. OK, so 1st grade is just about right. And in fact, according to one algorithm, this text here, one fish, two fish, red fish, blue fish, would indeed be considered to be 1st grade or just before 1st grade. So let's-- and why is that, though? Why did you say 1st grade? AUDIENCE: It's very basic. DAVID MALAN: It's very basic. But what is it about these words that is very basic? Do you want to identify yourself? AKSHAYA: Sure. They're all one syllable and they're very simple, like colors and stuff like that. DAVID MALAN: Spot-on. So, like, they're very short words, they're very short sentences. And you would expect that of a younger person. All right, let's go ahead and hand the mic to your next volunteer here if you'd like to introduce yourself. ETHAN: Yes. Hi, I'm Ethan.
I'm a first year in Canaday, and I'll be concentrating in economics. DAVID MALAN: Wonderful. And in your folder, we have another story to share. ETHAN: Congratulations. Today is your day. You're off to great places. You're off and away. DAVID MALAN: So this text might sound familiar, particularly on the heels of high school, perhaps. What grade level might he be reading at? So maybe 5th grade. And why 5th grade? AUDIENCE: [INAUDIBLE] DAVID MALAN: OK. Yeah. So a little more complicated. Like the words-- we've got some more punctuation, we have an apostrophe, we have longer sentences. And indeed, according to one algorithm, not quite 5th grade, but we would adjudicate your reading level to be 3rd. But let's see if we can't do one final flourish here if you'd like to introduce yourself and your story. MIKE: Hi, I'm Mike. I'm also a first year. I'm in Weld, and I'm planning on concentrating in biomedical engineering. DAVID MALAN: Welcome. And your tale? MIKE: It was a bright, cold day in April, and the clocks were striking 13. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him. DAVID MALAN: All right, so that escalated quickly. And someone's guess at this reading level? AUDIENCE: 1984. DAVID MALAN: What's that? Oh, OK, 1984 is indeed the text in question, and in what grade did you perhaps read that book? So I'm hearing 8th, I'm hearing 10th. So indeed, 10th grade is what a certain algorithm would actually adjudicate that reading level to be. And consider now the heuristics. So we started with very small words, very small sentences, very easy words, and then things sort of escalated into more interesting, more sophisticated English, more interesting sentence construction and the like.
So I bet if we could somehow capture those characteristics of text, the length of the words and the lengths of the sentences and the position of the punctuation, I daresay, even using week 1 material and, today, week 2 material, we'll be able to actually write code and implement an algorithm that can take these spoken words, put them to paper, and actually analyze roughly what that reading level might be. So that's just a teaser of what lies ahead. For now, allow us to thank our volunteers, each of whom gets a wonderful parting gift here to read at home. [APPLAUSE] All right. And thank you all so much. So with that said, there's another domain that we'll explore this week, and indeed, what you'll find in the coming weeks is that beyond just focusing on some of the fundamentals and the basics like we've really done in the past couple of weeks talking about loops and conditionals and Boolean expressions, really building blocks or puzzle pieces that we can assemble together, we're going to increasingly start talking about applications of these ideas, which, after all, are why any field is perhaps important and applicable. So here, for instance, we'll consider not only reading levels today, and in turn, in problem set 2 this week, but also the world of cryptography, which is the art, the science of scrambling, encrypting information, ciphering it in such a way that you can send a message securely through the internet, through the air, through any medium, even though someone might intercept it. Ideally, thanks to cryptography, they shouldn't be able to decrypt it or actually determine what it says. So for instance, if you were to receive a message like this, at first glance, it's indeed a bit cryptic. Three words maybe, but by day's end, we'll have decrypted even this message for you. So up until now, though, we've had some sort of conceptual training wheels on.
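Returning to the reading-level teaser: the heuristics the volunteers' texts illustrate (word length, sentence length, punctuation) can be measured with nothing more than week 1 loops and conditionals. What follows is a minimal sketch in C, not the lecture's algorithm. It assumes that words are separated by exactly one space and that a sentence ends at '.', '!', or '?'. The function names are hypothetical.

```c
#include <ctype.h>

// Counts alphabetic characters (letters) in text.
int count_letters(const char *text)
{
    int letters = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        // isalpha expects a value representable as unsigned char
        if (isalpha((unsigned char) text[i]))
        {
            letters++;
        }
    }
    return letters;
}

// Counts words, assuming words are separated by exactly one space.
int count_words(const char *text)
{
    if (text[0] == '\0')
    {
        return 0;
    }
    int words = 1;
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (text[i] == ' ')
        {
            words++;
        }
    }
    return words;
}

// Counts sentences, assuming each one ends in '.', '!', or '?'.
int count_sentences(const char *text)
{
    int sentences = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (text[i] == '.' || text[i] == '!' || text[i] == '?')
        {
            sentences++;
        }
    }
    return sentences;
}
```

For the first volunteer's text, "One fish, two fish, red fish, blue fish.", these return 29 letters, 8 words, and 1 sentence. That means short words in a single short sentence. A readability formula would then combine averages such as letters per word and words per sentence into a grade. The specific formula behind the grades quoted above isn't given here, so the sketch stops at the counts.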
And I gave us this picture last week when we introduced the tool make, via which you can make programs out of your source code, because you need to turn that source code into machine code, the 0's and 1's. And in the middle here was this thing called a compiler. But it really has been kind of an abstraction for us, and we've sort of had these metaphorical and physical training wheels here in the sense that we haven't really needed to care what the compiler is doing, how it works, and so forth. But today, what we thought we'd do is peel back a bit of that layer so that even though after today you'll continue to be able to use commands like make and sort of return to the beautiful abstraction that is not caring about some of these lower-level details, we'll offer you a glimpse of how some of these things work. That way, inevitably, when something goes wrong, you've got some bug, you're having some problem, you'll have a bottom-up understanding of what it could actually be. And indeed, these basics, you'll find, will very often help you troubleshoot problems and really solve problems more generally. So here, for instance, is the code that we keep coming back to. And this code here is the simplest of C programs that just says "hello, world." This is the source code. This, we claimed, was the corresponding machine code. And it was that program called a compiler that converted one into the other. But let's dive a little more deeply this week into what we mean by compiling code. Like, what is happening, so that by day's end, nothing really feels like magic anymore? It's not just that it goes from source code to machine code and that's that; you understand what's actually being done for you, and frankly, what other humans have done over the decades to make make as beautifully abstract and as simple as it now might seem to be. So here are a couple of commands that you've been in the habit of running when you want to first compile your code and then execute your code.
But it turns out that make is actually running another command for you. The first of several white lies we'll tell in the course is that make itself is not a compiler, per se. It's actually a program that automatically runs a compiler for you. And by that, I mean this. Let me go over to VS Code here and let me create our familiar hello.c program. And I'm going to go ahead and do include stdio.h, int main void, and inside of the curly braces, printf "hello," comma, "world," backslash n semicolon. So that's the code that we keep writing again and again. And up until now, if I wanted to compile that, I would do make hello, then dot slash hello, and voila, now my program is made and it actually executes. But what's actually going on underneath the hood there is that make is running an actual compiler for you, and the reveal today is that the compiler we have been using is something called Clang, for C language. And this is just another program whose purpose in life is actually to do the conversion of source code to machine code. But it turns out that Clang by itself can be used very simply, like you see here, clang hello.c, but it isn't nearly as user-friendly as you might like. So in particular, let me go ahead and do this. I'm going to go ahead and remove my compiled program by running rm for remove, which I alluded to briefly last time. And then I'm going to say y for yes, remove that regular file. And if I go ahead now and run just clang of hello.c and hit Enter, it seems to be successful, at least insofar as there are no error messages. But if I try to do dot slash hello, Enter, there is no such file or directory called hello. That is because by default, Clang somewhat goofily just outputs a file called a dot out. Like, why a? Well, it's sort of a simple name. a dot out is technically short for assembler output, but this just means this is the default file name that Clang is going to give us.
So OK, it turns out I can do dot slash a dot out Enter, and voila, that now is my program, but that's just a stupid name for a program. It's not very user-friendly. It's certainly not an icon you would want to put on people's desktops or phones. So how can we do better? Well, it turns out, with Clang, we can configure it using what we'll call command line arguments. And command line arguments are actually something we've been using thus far, we just didn't slap this word on it, but command line arguments are additional words or shorthand notation that you type at your command prompt that somehow modify the behavior of a program. And you can perhaps guess where this is going. It turns out that if I actually want to create a program called hello-- not a.out, which is the default-- I can actually do this: clang, space, dash lowercase o, space, hello, or whatever I want to call the thing, space, hello.c. And now if I hit Enter, nothing seems to happen, but now if I do ./hello and Enter, now I've actually got that program. So why is make useful? Well, it just saves us the trouble of having to type out this longer command any time we actually want to compile the code. But in fact, it gets even worse than that with commands like clang or compilers in general, because consider this code here. Not just the version of "hello, world," but maybe the second version wherein last week, I started to get user input by adding the CS50 Library, using get_string and then saying, "hello," comma, "David." Well, if I go back to VS Code and I modify this program to be that same one-- so let me go ahead and include cs50.h at the top. Let me get rid of this simple print line and instead give myself a string called name equals get_string, "What's your name?" Question mark, just like we did in Scratch. Then I can do printf, quote-unquote, "hello," comma. And previously I typed "world." I obviously don't want to type "David" because I want it to be dynamic.
What did I type last week as a placeholder? So yeah, just-- not Command-S, but %s. So %s in this case, which is a placeholder for any such string. Then I can still do my new line, close quote, comma, and then I can substitute in something like the value of the name variable. All right, so if I go ahead now and compile this, now last week, I could just do make hello and I'm on my way, it worked just fine. But if I instead do clang manually, it turns out that this is not going to be sufficient now. clang -o hello, space, hello.c. Exact same thing I typed a moment ago, but I think I'm going to see some errors. So what's this error hinting at here? Well, at the very bottom, it's a bit arcane with its output, and much of this you can ignore, but there are certain key words. What's the first maybe keyword you recognize in these three lines of erroneous output? So it mentions main. That's not that much of a clue because that's the only thing I wrote so far. Second line, though, get_string. There's some issue with an undefined reference to get_string. Now why might that be? I did include cs50.h, but apparently that's not enough: the header tells the compiler that get_string exists, but not where its actual code lives. Well, it turns out that if you're using a third-party library, one that doesn't necessarily come with C the language, something like CS50's, you additionally have to tell the compiler that you want to use that library. And not just by including the header file, but by an additional command as well. So when you run Clang, you want to provide an additional command line argument. Literally -l for library, which is a term I used last week, followed by cs50, so -lcs50. A library is just code that someone else wrote that you want to use in your project. So if I really want to compile this version that uses the CS50 Library, I can still do clang -o hello hello.c, but before I finish my thought, I need to tell the compiler to link, so to speak, in the library CS50.
And now I hit Enter, the error message goes away, I can do ./hello, I can type in my name, and voila, we're back to week 1. And this is why, suffice it to say, we introduced make, which is not a CS50 thing. This is a popular tool that real people in the real world use to automate these kinds of processes. So unbeknownst to you, make has been using the -o for you. make, unbeknownst to you, has been using -lcs50 for you just because it makes our lives easier. But today, we thought we would deliberately peel back this layer so we at least understand what's going on behind this abstraction that is make itself and compiling more generally. So let me propose that compiling itself is not quite what we've described it to be. Compiling is this catch-all phrase that, I've claimed, goes from source code to machine code. But if we really want to get pedantic, which we'll do briefly-- and this is not a sign of things to come, because this, too, will be abstracted away-- compiling is just one of four steps that are involved in turning source code that you and I write into those 0's and 1's. But through an understanding of these four steps today, you'll hopefully better understand how to troubleshoot issues like that and just know what's happening, because it's not, in fact, magic. It's just the result of years of humans developing these four steps here. So when you run make, what's happening? Or in turn, when you run clang, four different things are happening. And the first one is called preprocessing. So what is this all about? Well, let's consider this code here. And this code is a little bit interesting insofar as it's one of the more complicated examples from last week. And you'll notice, for instance, that I had include stdio at the top so I could use printf. I had main down here, whose purpose in life was just to meow three times. And then recall we made our own meow function just like we did in week 0 with Scratch that just printed out, quote-unquote, "meow."
But I also included this line here, which we called what? This was a prototype. And why did I have to include it there? Or equivalently, what would happen if I didn't include a prototype up at the top there? Yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: Exactly. If I didn't include it up here, the compiler, when trying to compile main, would not know what meow is because it's not defined until later. So this is kind of like a little hint of what is to come. Alternatively, we could just move this whole thing up at the top of the file, but I claim that just devolves into a big mess eventually once you have many different functions. Like, you can't realistically put them all at the top to solve this problem. So these prototypes solve that problem. So nothing new here. Just a reminder of what motivated this one line of prototype. Now let's consider this simpler program, which is just the one we wrote most recently in VS Code. This program prompts the human for their name and then says hello to that person. But it has two includes at the top of the file. And in fact, any line of C that starts with this hash symbol is what we'll now call a preprocessor directive. It's not really a word you need to remember in your vocabulary, but it is a little bit different from most every other line because it starts with that hash. That's a special symbol in C. And what this means is the following. This very first line, cs50.h, is indeed a file that the CS50 staff and I wrote and installed somewhere in VS Code for you, somewhere in the cloud. And I've claimed you need to use this header file in order to use get_string. So just logically, what is probably inside of cs50.h? Yeah? AUDIENCE: Function [INAUDIBLE]. DAVID MALAN: Super close. So the function called get_string that does the getting of a string, but it's not quite as much as the function itself. It's actually a little bit less than that, but you're on the right track. What is inside of cs50.h, presumably? Just a what?
Just a prototype for? Which function? get_string. So admittedly, there's some other stuff in there, too, but the important line for today's discussion is that inside of cs50.h is indeed one line of code that declares what the return value is, what the name is, and what the arguments, if any, are to get_string, and some other stuff. And so what happens effectively when you compile your code? Step 1 is this preprocessing step. And essentially, there is some code that someone else wrote inside of the clang compiler that looks for a line that starts with hash include, and when it sees that, it goes and finds this file and effectively copies and pastes the contents of that file right there into your code so that you don't have to go find the file, copy and paste it, and make a mess of your own code. So in particular, it's effectively as though you're copying and pasting the prototype of get_string to the very top of your file, thereby teaching the compiler that it exists. By that same logic, what is probably in stdio.h? The prototype for? For printf. And indeed, exactly that. So this line effectively gets replaced with the equivalent of the prototype for printf, which, for today's purposes, is a bit more complicated, so let me wave my hand at the dot-dot-dot, which is there because printf takes a variable number of arguments depending on how many placeholders or format codes you have. But effectively, that, too, is what's happening. So the preprocessor step, step 1 of 4, just does that find and replace, if you will. Now there's some-- again, some other stuff in that file, and this, too, is kind of a white lie. printf probably has its own file because that's a really big library, but the essence of it is exactly this. So preprocessing converts all of those hash include lines to whatever the underlying prototypes are within the file, plus some other stuff.
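To make the "copy and paste the prototype" idea concrete, here is a sketch that skips #include <stdio.h> entirely and writes printf's prototype by hand, roughly what the preprocessor would otherwise paste in. It also reuses last week's meow prototype so that meow can be called before it is defined. The real stdio.h declares much more, and its exact text varies by system, so treat this as an illustration of the idea rather than a recommended habit. meow_times is a hypothetical stand-in for main.

```c
// No #include <stdio.h> here: this hand-written line stands in for what the
// preprocessor would paste in from stdio.h. The ... means printf accepts a
// variable number of arguments after the format string.
int printf(const char *format, ...);

// Our own prototype, as in last week's meow program: it lets code that
// appears above meow's definition call it.
void meow(void);

// Hypothetical stand-in for main: meows n times and returns how many times.
int meow_times(int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
    {
        meow(); // allowed because the prototype above came first
        count++;
    }
    return count;
}

// meow's definition comes last, after it has already been called above.
void meow(void)
{
    printf("meow\n");
}
```

If you'd like to watch these steps yourself, Clang (and GCC) can stop partway. clang -E hello.c prints the preprocessed source, with every #include already expanded. -S stops after compiling and leaves assembly in a .s file, and -c stops after assembling and leaves an object file in a .o file.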
Now, we use compiling as this catch-all phrase, but it turns out it has a very specific meaning that's worth knowing about, even though after today, you can go back to using compiling as the sort of catch-all phrase. So suppose you've got this same code here after the preprocessing step has happened. This is essentially happening in the computer's memory. It's not changing your hello.c file permanently or anything like that. This code gets, quote-unquote, "compiled" into something that looks more like this. And this is a scarier language that we won't spend time on in this particular class. This is what's known as assembly language. And back in the day, before there was C, humans wrote this to program their computers. Similarly, before there was assembly code back in the day, humans very initially used what instead? AUDIENCE: 0's and 1's. DAVID MALAN: So 0's and 1's-- like they actually wrote the machine code painfully, be it in code or be it in punch cards, like physical objects, or the like. So again, these are sort of abstractions, but we're rewinding in time for today. But what this compiler for C is doing is converting C into this other language called assembly language. And even though this looks very esoteric, there are at least some juicy things in here. If I highlight get_string, it's mentioned in this code. printf is mentioned in this code. And even some of these keywords here that are spelled a bit weirdly, this relates to subtracting and moving something in memory and calling a function, calling a function. So there are some semantics that are probably somewhat familiar even though this is not code we ourselves will write. But unfortunately, this is not yet machine code, and that's where step 3 comes in. So step 3 of this four-step process is technically called assembling. And assembling just takes that assembly code and converts it, thankfully, to the thing we do care about, the 0's and 1's. So assembling takes assembly code and converts it to 0's and 1's.
As an aside, and I alluded to this earlier, the reason that Clang names its files a.out by default, assembler output, is a side effect of that being one of the steps in this process, dealing with assembly language and its subsequent output. All right, so here are some 0's and 1's, but unfortunately, there's still that fourth and final step, which is a word that I also used earlier, namely linking. So let me take a step back and look at this code here. And even though this code is exactly as I wrote it in VS Code in hello.c-- so no copying and pasting, no prototypes have been plugged in here, this is my code-- technically, there are three different files involved in compiling even something relatively simple like this. There's obviously this thing itself, hello.c, which I wrote. There's apparently cs50.h, and there's apparently stdio.h. But technically-- and you don't have to know this file name, per se-- somewhere else on the computer's hard drive, so to speak, is a cs50.c file, which actually contains the staff's implementation of get_string and get_int and get_float and all of those other functions. Somewhere on the server's hard drive is stdio.c that implements printf and all of these other functions as well. So the dot c is, loosely speaking, the counterpart of the dot h here. You don't ever mention the dot c file, but someone else wrote those files, someone else stored them in the server for you-- CS50 staff in this case. So technically, even when compiling a relatively short program like this, you're really combining three files, at least, at the end of the day. And I'll write them from left to right. hello.c, which I wrote, cs50.c, which the staff wrote, and then stdio.c as well. So somewhere there are these three files. And Clang, our compiler, needs to compile each of these into the corresponding 0's and 1's. Lastly, this is not yet sufficient because these 0's and 1's haven't been linked together.
I mean, I deliberately left a gap here to imply that these are three separately compiled files. So that fourth and final step, called linking, takes all of these 0's and 1's and in an intelligent way combines them into just one final file named hello, named a.out, whatever the file name of choice is. So what you and I for the past week have just been calling compiling-- and that's what a normal person will use henceforth to describe this whole process-- technically, there are these four different steps underneath the hood, each of which sort of represents an evolution of technology over the years. And nowadays, if we fast forward a few weeks in class, when we start talking about Python, which is another more modern language, that, too, is going to be conceptually even higher level, even though underneath the hood, there are going to be some lower-level principles at work. So any questions on just terminology or these processes known as compiling? Yeah? AUDIENCE: I didn't really understand what compiling means. [INAUDIBLE] DAVID MALAN: Sure. Compiling, if I rewind, is the process of taking your source code, which looks like this, recall, and converting it into assembly code. So preprocessing just converts all of those hash include lines and a few others to their equivalents. So that's step 1. Compiling converts the C code into the underlying assembly code. The assembling step, step 3, converts the assembly code to 0's and 1's. And then the fourth step, linking, combines all of the 0's and 1's from the one, the two, the three or more files that are involved in your project and links them all together for you magically. But at the end of the day, all of this is happening automatically for you. If I jump now to the end here, just by running make, which in turn runs clang for you, all of this is abstracted away.
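Linking is easiest to picture with a tiny "library" of your own. This sketch shows three would-be files in one block so it compiles on its own. The file names square.h and square.c, and the functions square and sum_of_squares, are all hypothetical. The middle part plays hello.c's role: it compiles knowing only the prototype, and the definition's machine code arrives later.

```c
// ---- square.h (hypothetical header): just the prototype ----
int square(int n);

// ---- hello.c's role: compiles knowing only the prototype above ----
int sum_of_squares(int a, int b)
{
    // The compiler only checks that these calls match the prototype;
    // finding square's actual 0's and 1's is the linker's job.
    return square(a) + square(b);
}

// ---- square.c (hypothetical library file): the implementation ----
int square(int n)
{
    return n * n;
}
```

Split into real files, with hello.c doing #include "square.h" and having its own main, the pieces could be compiled separately. clang -c hello.c and clang -c square.c would produce hello.o and square.o, and clang -o hello hello.o square.o would then link them. Leaving square.o out of that last command should produce a linker error, much like the undefined reference to get_string before -lcs50 was added.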
But the key here is that even with these commands that we've been running, be it the make command or the clang command, everything you are typing at the prompt should ultimately be explainable. Each of those things has a purpose. So any questions, then, on what we've just now called compiling, even though it's only when you take another CS course that you might spend more time on assembly language or these lower-level details? Yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: A good question. Are there other types of compilers? Yes. Back when I took CS50, I used a popular compiler called GCC, the GNU Compiler Collection, which actually still exists in the codespace that you're using for CS50. Clang is somewhat more recent. It's gaining popularity. And frankly, we use it in large part because its error messages are slightly more user-friendly. You might not believe us, because if you encountered some errors with your code this past week, they were probably just as arcane as the error messages I saw, but it's better than it was some years ago. And there are alternatives to compiling, too, but more on that when we get to Python as well. Other questions? No? All right. Well, what are the implications of the fact that we're going from source code to machine code? Well, it stands to reason that if you can compile code, maybe you can decompile it-- that is, go in the reverse direction. Go from 0's and 1's to actual source code. Now that would be handy if you want to go in as a programmer and change something in a program that you or someone else already wrote. It's maybe not ideal for your intellectual property, though, if you are the person who wrote that program in the first place.
If you are Microsoft and you wrote Microsoft Word or Excel that people with Macs and PCs and phones have installed on their devices, it doesn't actually sound very appealing if any old customer can take those 0's and 1's and reverse them, reverse engineer them, so to speak, into the original source code, because then they can have their own version of Microsoft Word and make changes to it without really having put in all of the R&D that it might have taken to build the first version thereof. But it turns out that reverse engineering-- so doing things in the opposite direction-- is easier said than done because there are multiple ways, as you've seen already, to implement programs. Like loops alone: you can use for loops, while loops, even do-while loops. And so there are multiple ways to solve the same problem. So even if you try to reverse engineer a program and convert machine code back to source code, there's not necessarily going to be an obvious way to do so. And the reality is that it ends up being such a mess, because you lose the variable names typically, you lose the function names typically, that what you end up looking at might very well be C code, but it's extremely difficult for you, even as a good programmer, to read. And generally, the mindset is, if you're really good enough to decompile code in that way and read it subsequently even without good variable names, good function names, good documentation and the like, you could probably have just implemented the program in the first place yourself without jumping through those hoops. So there's some practicality pushing back on what are otherwise potential threats to, say, your intellectual property. But that's not going to be the case later on in the term when we do get to languages like Python, to some extent, and other languages like JavaScript. Some of those are actually going to be readable by anyone.
Any of your customers, any of your friends, and your family that actually use your programs. So with that said, let's introduce now another tool to our toolkit that will hopefully make some of the pain from this past week, when you did encounter bugs, a little more manageable. And indeed, part of the process of writing code to this day is debugging it. And it is a rare thing to write a program, be it in C or any other language, and get it 100% right the first time. I mean, to this day, 20-plus years later, I still write buggy code. Hopefully a little bit less of it, but any time you're adding a new feature, any time you're doing something for the first time, you're not necessarily going to see all of the possible mistakes. So even in industry, bugs are omnipresent, which is really to say, having techniques to debug code-- that is, eliminate bugs-- is super compelling. Now just for a bit of history, here is Admiral Grace Hopper, who was actually not only in the military, but also on the faculty of Harvard years ago, and who worked on a Harvard computer called the Harvard Mark I, which is actually on display at the School of Engineering and Applied Sciences if you take a tour over there sometime. But also, when working on the Harvard Mark II, she is known for having at least popularized the term "bug" to mean a mistake in a computer's program-- a mistake in a computer's code. And the etymology of this supposedly is this here logbook, wherein she and her colleagues were documenting processes being computed on computers, recording that a moth actually got stuck in one of the relays, one of the mechanical-- the electric relays inside of the now very old computer, and someone very cleverly wrote, "First actual case of bug being found." So it wasn't she who actually discovered it, but this was a story she was fond of telling thereafter as a famed computer scientist.
We now know bugs to be all too familiar when it comes to writing our own code, and I thought I would deliberately write some buggy code based on some of the programs with which we experimented last week. So let me go back over to VS Code here and let me propose that I do something somewhat simplistic just like this to print out a column of bricks of height 3. So I'm going into VS Code and I'm going to deliberately call this program buggy.c because I intend to do this poorly. I'm going to include stdio.h as before, int main void as before. And in here, if I want to print a column of height 3, I'm going to do for int i gets-- all right, I'm still new to programming in my mind here, so I know I'm supposed to start counting at 0, OK. And I want to do this until I count up to 3, so I'm going to do i is less than or equal to 3. And then i++, as I remember from class. And now I might go ahead and print out just a hash mark, backslash n, which I do want because I want to move this cursor to the next line to make this vertical. But of course, if you've noticed with your eye already, when I do make buggy, it compiles OK. So no typos, no syntactical errors. But when I run this, I'm going to see how many bricks? Four, in this case. Now this is meant to be a simplistic example so that we don't spend time trying to figure out what the bug is, but rather focus on techniques for actually identifying the bug. So-- finding, rather, the bug. So what's one of the first tools in your toolkit? Literally one you have already. printf is your friend. And it is a very quick and dirty tool for just seeing what's going on inside of the computer when you don't have more sophisticated tools or even the time to use them. And so in this case, for instance, what I'd propose is that-- all right, I'm obviously seeing four hashes. And let me play a little slow here.
It'd be helpful for me to understand why logically I'm ending up with four, even though I'm starting at 0 like I remember from class and I'm going up to 3 as we did in class; I'm just not seeing it in this particular story. So what I would commonly do is go into my code and help myself see what's going on, and I might literally write a printf line like "i is %i, backslash n," comma, and then just print out the value of i. I just want to see on every iteration, what is i, what is i, what is i, just to help me see what the computer already knows. So let me go ahead and recompile buggy, let me rerun buggy, and then let me make my terminal window bigger just to make clear what's going on. And now it's a little more pedantic. Now i is 0, I get a hash. i is 1, I get a hash. i is 2, I get a hash. Wait a minute. i is 3, I get a hash. So now it should be more obvious, especially if the syntax itself is unfamiliar: I certainly don't want this last one printing, or maybe equivalently, I don't want the first one printing. So I can fix this in a couple of ways, but the most canonical solution is probably to do what with my code? To change what to what? Yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah. So change the less than or equal sign to just a less than sign. So even though this is counting from 0 up to, but not including, 3, instead of 1 through 3, it's the more typical programmatic way to write code like this. And now, of course, if I do make buggy-- and I'll increase my terminal window again-- ./buggy, now I see what's going on inside of the code. Now it matches my expectations, and so now the bug is gone. Now of course, if I'm submitting this or shipping it, I should delete the temporary printf.
And let me disclaim that using printf in this way just to help you see what's going on is generally a good thing, but adding a printf and a printf and a printf and a printf starts to devolve into just trial and error: you have no idea what's going on, so you're just printing out everything. Let me propose that if you ever find yourself slipping down that hill into just trying this, trying this, trying this, you need a better tool than printf. And frankly, it's annoying to use printf because every time you add a printf, you have to recompile the code and rerun the code. It's just adding to the number of steps. So let me propose instead that we do this. I'm going to go back into VS Code here and I'm going to write a different program that actually has a helper function, so to speak: a second function whose purpose in life is maybe just to print that column for me. So I'm going to say this-- void print_column, though I could call it anything I want, and this function is going to take an argument or a parameter called height, which will tell it how many bricks to print, how many vertical bricks. I'm going to do the same kind of logic. for int i equals 0; i is less than-- I'm going to make the same mistake again-- less than or equal to height; i++. And then inside of this for loop, let me go ahead and print out the hash mark. So I've made the same mistake, but I've made it now in the context of a helper function, only because in main, what I'd like to do now, just to be a little more sophisticated, is get an int from the user for the height. And when I do get that int, I want to store it in a variable called n, but I do need to give that variable a type like last week. So I'll say that it's an integer. And now, lastly, I can print_column, passing in-- actually, I'll call it h just because height is h. print_column of h, semicolon. OK, so it's the exact same program except I'm getting user input now.
So it's not just going to be 3, it's going to be a variable height, but I've done something stupid. AUDIENCE: [INAUDIBLE] DAVID MALAN: I've done two stupid things. So this, of course, is not supposed to be there, so I'll fix that. And someone else. What else have I done? AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah. I'm missing the prototype. And this is, let me reiterate, probably the only time where copy-paste is OK. Once you've implemented the function, you can copy-paste its first line and add a semicolon so that it teaches the compiler that this function will exist. AUDIENCE: [INAUDIBLE] DAVID MALAN: Three stupid things. OK. Thank you. So, good. Include cs50.h. And now, anyone want to go for four? No? All right. Slightly unintended here. So let's see. make buggy. OK, no syntax errors, thanks to you all. So the code compiles, but of course, when I run buggy and I type in something like 3 manually, I'm still going to get 1, 2, 3, 4 bricks out. So let me now introduce a more powerful tool that's generally known as a debugger. And within the VS Code environment that you're using, we actually have a command that makes it a little easier to use this tool, but we didn't write the tool itself. You are about to see a very graphical, very popular, industry-standard tool called a debugger, but we'll start the debugger using a CS50-specific command called debug50, which just makes it easier with a single command to start the debugger without having to configure a text file with all of your preferred settings and all of that. It's just an annoying hoop otherwise to jump through. So what I'm going to do is go back to my code here. I have already compiled it, but just for good measure, I'll make buggy again because the debugger needs your code to be compiled. It's not going to help with syntax errors like the stupid mistakes I just made unintentionally; it will help you, though, with programmatic errors, logical errors in your code once your code is running.
So to run debug50, I'm going to do this: debug50, space, and then the exact same command I would normally run to just run the program itself. So ./buggy. So exact same thing, ./buggy, but I prefix it now with debug50. When I hit Enter, another error is going to pop up on the screen, which is a good reminder because this will happen to you, too, invariably. It's reminding me that I have to set what's called a breakpoint. And as that word suggests, it is the point at which you want your code to break. Not break in the make-the-situation-worse sense, but rather, where do you want to pause execution? Break execution, like hitting the brakes on a car, so the program doesn't run all at once. And you can put this any number of places, and you might have done this accidentally if you've ever hovered over the gutter of VS Code, the left-hand side next to your line numbers. See the little red dot that appears? If I click on any of these lines, that's going to set a breakpoint, so to speak. And I want to break execution at main. So I'm just going to click to the left of line 6 in this case. That makes it a darker red circle, a stop sign of sorts that tells the debugger to pause execution on that line, though I could put it elsewhere if I so choose. Let me go ahead and rerun debug50 ./buggy, Enter, and now a bunch of things are going to happen on the screen. It's going to look a little overwhelming perhaps at first glance, but there's some useful stuff that just happened. So one, my code is still here, but the line that I set the breakpoint on-- rather, the first line of actual executable code at or below the breakpoint I set-- is highlighted in this yellowish green here, which says this line of code has not yet been executed. We broke at this point, but if I click a button, this line of code will be executed. Because up until now, every C program you write runs as fast as that. I want to pump the brakes and pause here.
But notice a few other aspects of the window here. Notice up here some weirdness. There's mention of variables, and we're familiar with these. Local is a term we'll use this week. But there's this variable h, and where, weirdly, did the value 21912 come from? So it turns out, in C, before you initialize a variable with a value, by literally typing the number 3 or by using a function like get_int, it often contains what's called a garbage value. More on those in a couple of weeks. But a garbage value, you can think of it as like remnants of whatever was in the computer's memory before you ran your program. And that's a bit of an oversimplification, but you cannot trust that a variable will have a certain value in this case if you did not put one there yourself. So for now, h is nonsensical. It's a garbage value; it means nothing. But once I execute this line, it should contain whatever the human types in. All right. Down here, there's a watch section, which is a more sophisticated feature. Down here is what's called the call stack. More on that in the future. But what this means for now is that I'm executing the main function, not, for instance, print_column. So notice up here, these are the most useful controls within the interface. If I hit this Play button, it's just going to actually run my program to the end without bothering me further. However, I can actually step over this line of code and execute it, or I can step into this line of code and actually poke around the contents of get_int if it's available on the system. So conceptually, you can either execute this line or you can dive down conceptually deeper and see what's inside of that function. Lastly, this will let you step out, this will allow you to restart the whole process, and this will just stop the debugger. So these buttons are going to be our friends. And the one I'll click first is the first one I described, which is step over.
So step over doesn't mean, skip this step, it just means execute it, but don't bother me by going into the weeds of what is on the specific line, namely get_int. So when I click this button in a moment, you'll see that my terminal, which is still at the bottom, prompts me for a height. I'm going to go ahead and type 3. As soon as I hit Enter, what part of the screen probably will change based on what I've said? So h, the variable h should hopefully take on the number 3. And I'll probably see a different line of code highlighted, probably line 9 next once I'm done executing line 8. So let me go ahead and hit Enter and watch the top-left of the screen. And voila, h now has the value 3, and execution has now paused on line 9 because the debugger is allowing me to step through my code line by line. Now let me go ahead and print out-- let me go ahead and just say, all right, I'm done with this. Let's go ahead and run the rest of the program. It clearly got the value 3. But wait a minute-- oh, and at this point, it closed the window in which I would have seen the output, I would have still seen four hashes. So let me actually do this again. Let me go back into debug50 by running the exact same command again. It's going to think for a moment, it's going to reconfigure the screen. I'm going to do the exact same thing. I'm going to step over this line, but I'd like to actually see what's going on inside of my print_column function. So this time, instead of just saying run to the end and close all the windows on me, let me go ahead and step into my print_column function. So don't step over, step into. Because if I step over-- and now this is what I meant to show earlier, you can see that it's still printing out 4. So in fact, let me undo this, let me just stop the whole thing. Let me rerun the command a final time. So it goes back to where we began before. It's going to prompt me again once I step over line 8 for a number like 3. 
But this time, instead of stepping over line 9, let's poke around. I wrote print_column, so let's look at print_column step by step: step into it, and watch what happens to the yellow highlight. It now jumps logically to the inside of print_column, thereby letting me walk through this code. And now I can just step over each of these lines one at a time. So stepping over. OK, so what did it do? It did that whole narrative that I did verbally last week, where it compared i against height. It then went inside of the loop. When I click Step Over, watch what happens in my terminal-- one hash prints out. Now line 14 is highlighted again. It's comparing, per the Boolean expression, i: is it less than or equal to height? If so, it's going to go ahead and print out the hash. It's going to do this again, print out the hash. But notice at the top-left of the screen, height is still the same, it's still 3, but what has been changing, apparently? i, on each iteration. So the debugger is letting me see what's going on slowly inside of this loop because i keeps getting incremented. So if I step over this line now, notice that I've now printed 3. So ideally I want this loop to end, but if I click Step Over once more, notice that the value of i at top-left is 3, but 3 is less than or equal to height-- oh, now I get it, if I play along here. Now I see why less than or equal to is, mathematically, clearly incorrect. And as soon as that light bulb goes off, you can just sort of bail out: click the red Stop button to turn the debugger off, go back in, fix your code, and voila, recompile, run it, and you're back in business. So the takeaways here really are: what tools now exist? printf is your friend, but only for quick-and-dirty debugging techniques. Get into the habit now of using debug50, and in turn, VS Code's debugger.
You will invariably not take this advice, say, for problem set 2 as you first begin, because it's going to feel easier and quicker just to use printf, just to use printf, just to use printf. And the problem with that logic is that you begin to build up technical debt, so to speak, where you really should have learned it earlier, you really should have learned it earlier, you really should have learned it earlier, at which point you end up wasting more time using printf and doing things manually than if you had just spent 10 minutes, 30 minutes learning the user interface and the buttons of a proper debugger. So please take that advice because it will save you significant amounts of time over time. Questions on printf or debugging in this way? Any questions on this? No? OK. So let me give you a third and final technique for debugging, which has been looming over us here for some time. There is actually this technique known as rubber duck debugging. In the absence of a roommate who is taking CS50 or who has taken CS50 or knows how to program, in the absence of having a TF or TA or CA sitting next to you, in the absence of having a family member available to ask questions of, if you have simply an inanimate object on your desk, goes the tradition, just talk to that inanimate object. Better yet if it's an adorable rubber duck like this one. And the idea of rubber duck debugging is that simply by verbalizing, literally out loud, to this inanimate object-- probably with the door closed and no one knowing that you're talking to this rubber duck-- you invariably end up hearing any illogic in your own thoughts, at which point the proverbial light bulb tends to go off and you're like, oh, I'm an idiot. It's supposed to be less than, not less than or equal to. So literally just explaining to a duck or any inanimate object what's going on in your code will quite frequently help you see in your mind's eye what it is you've been doing wrong.
So rubber duck debugging is indeed a very effective technique even if you don't happen to have a small or large rubber duck. Of course, you're also welcome to use the CS50 Duck, who lives at cs50.ai and also within a pane in VS Code at cs50.dev. You can ask the CS50 Duck about concepts you don't understand, or you can even copy-paste certain lines of code with which you might be having trouble and ask the duck for its own advice. All right. So, with those tools in our toolkit, let me propose now that we introduce a few lower-level features of C itself and better understand how we can start solving some of those problems like the readability of text or the encryption of data. These were our so-called types last week, when we introduced at least a subset of them, or used them just to store data in a certain format, so to speak. Like in week 0, we said that everything at the end of the day is just 0's and 1's, binary. And I claimed conceptually that how a computer knows if a set of bits is a number versus a letter versus a color or a sound or an image or a video is just context-dependent, like whether you're using Photoshop or Microsoft Word or something else. But last week, we saw a little more precisely that it's not quite as broad-strokes as that. It's more about what the programmer has told the software is being stored in a given variable. Is it an integer? Is it a char, a character? Is it a whole string? Is it a longer integer or the like? So you now have this control. The catch, though, recall, is that each of these types has only a finite amount of space allocated to it. So for instance, an integer is typically 4 bytes, and 4 bytes is 32 bits because it's 8 times 4. 32 bits, we claimed, can represent roughly 4 billion values, but if you want to represent negative and positive numbers, the biggest integer you can store is like 2 billion.
Now that's really big for a lot of applications, but years ago, Facebook, for instance, was rumored to be using integers when they had fewer users. But now that they have billions of users-- 3-plus billion users-- an integer is no longer big enough for the Facebooks, the Googles, the Microsofts, and so forth of the world. So we also have longs, which use twice as many bytes but have an exponentially bigger range of values. Meanwhile, a bool, interestingly, is a byte, which is kind of bad design in what sense? Why might that be bad design? It's only-- it should only be 2-- 1 bit, rather, because a 0 or 1 should suffice. Turns out, it's just easier to use a whole byte even though we're wasting seven of those bits, but bools are represented nonetheless with 1 byte. Chars are going to be 1 byte. Floats tend to be 4 bytes. Doubles tend to be 8 bytes. Some of this is system-dependent, but nowadays on modern computers, this tends to be a useful rule of thumb. The only one I can't commit to here is a string because a string, recall, is a sequence of text. And maybe it has no characters, one character, two, 10, 100. So it's a variable number of bytes, presumably, where each byte represents a given character. So with that said, how do we get from an actual computer to information being represented therein? Well, let me remind us that this is what's inside of our Macs, PCs, phones. Even though this isn't to scale and it might not be the same shape, this is memory, random access memory. And on these black chips, on the circuit board here, are the bytes that we keep talking about. In fact, let's go ahead and zoom in on one of these chips to fill the screen here. And just for an artist's depiction's sake, let me propose that if you've got, I don't know, a megabyte, a gigabyte-- a lot of bytes packed into this chip nowadays-- it stands to reason that no matter how many of them you have, we could just number them from top to bottom, and we could say that this is byte 1, or you know what?
This is byte 0, 1, 2, 3, and this is maybe byte 1 billion or whatever it is. So you can think of memory as having addresses, or just locations, numeric indices that identify each of those bytes individually. Why a byte? Individual bits are not that useful, so 8 bits, again, 1 byte, tends to be the de facto standard. So, for instance, if you're storing just a single character, a char, it might be stored literally in this top-left corner, so to speak, of the chip of memory. If you're storing maybe an integer, 4 bytes, it might take up that many bytes. If you're storing a long, it might take up that many bytes instead. Now we don't have to dwell on the particulars of the circuit board and these traces and all the connections, so let me just abstract this away and claim that what your computer's memory really is is just kind of this canvas, kind of in the Photoshop sense. If you've ever made pictures, it's just a grid of pixels, up, down, left, right; that's really all your memory is. It's this canvas on which you can manipulate the bits to store numbers anywhere you want in the computer's memory. So in fact, let's zoom in here and let's consider how your computer is actually storing information using just these bytes. At the end of the day, no matter how sophisticated your Mac, your PC, your phone is, this is all it has access to for storing information. It's a canvas of bytes, and what you do with this now really invites design decisions. So let's consider this. Here is an excerpt from a program wherein maybe I'm prompting the user for three scores, like three test scores, exam scores, something like that. And the purpose in life of this program is maybe to average those three scores together if you want to get a sense of where you stand in some class. So we can certainly whip up some code like this. And in just a moment, let me go ahead and flip over to VS Code here. And I'll write up a new program called scores.c.
And in this, let me go ahead and first include stdio.h, int main void at the top. And in here, let me go ahead and assume that, eh, it's not been the greatest semester. So my first score, which I'll call score1, was a 72, my second score was a 73, but my third score, score3, was like a 33. Now you might remember these numbers in another context-- they might spell a message-- but in this case, they're just integers. They're just numbers because I'm telling the computer to treat these as ints. Now if I want to figure out what my average is, I can do a bit of math. So let me just print out that my average is-- and I don't want to shortchange myself. I'm not going to use %i because I don't want to lose anything after the decimal point. So we're going to use %f, for a float, instead. And my average, I claim, will be score1 plus score2 plus score3 divided by 3, semicolon. With parentheses, because just like grade school math, like order of operations, I parenthesize the numerator so I can divide the whole thing by 3. But I have screwed up already. I am going to shortchange myself and not give myself as high a grade as I deserve, but this one's subtle. What have I done wrong? Yeah, I might want to cast these scores to floats, because if you do integral math, dividing an integer, or the sum of some integers, by an integer, you get an integer as the result, so it's going to throw away anything after the decimal point. Even if it's something-point-1, something-point-5, something-point-9, that fraction is going to be thrown away. There's a bunch of ways to fix this. I could just use floats or doubles for all of these. I could cast score1, score2, or score3, as you propose. Frankly, the simplest way is just to change the denominator to 3.0, because so long as I've got one float involved in the math, this will promote the whole arithmetic expression to being floating-point math instead of integer math. So let me go ahead now and do make scores, Enter. So far, so good.
./scores, and my average seems to be not great, but 59.33333, so 59 and a third. But I would have lost that third if I hadn't used a float in this particular way. Well, let's consider now what's actually going on inside of the computer when I store these three variables. So, back to the grid here, just my canvas of memory. It doesn't really matter where things end up. I might put it here, I might put it there; the computer makes these decisions. But for the artist's sake, I'm going to put it at the top left-hand corner here. So, score1 contains the integer 72. Why is it taking up four squares, though? Because? It's an integer. And on this system, an integer is 4 bytes. So I've drawn it to scale, if you will. score2 is the number 73; it also takes 4 bytes. By coincidence, but also by convention, it will likely end up next to the first integer in memory because I've only got three variables going on anyway, so the computer quite likely will store them back to back to back. And indeed, by that logic, score3, containing the number 33, is going to fill in this space here. We'll consider down the road what happens if things get fragmented-- something's here, something's here, something's here-- but for now, we can assume that this is probably contiguous, though not necessarily so. All right, so that's pretty straightforward, but what's really going on? Well, these are just bytes of memory-- that is, bits of memory times 8. And so what's really going on is this pattern of 0's and 1's is being stored to represent 72, this pattern of 0's and 1's is being stored to represent 73, and similarly, 33. But that's a very low-level detail that we don't really care about, so we'll generally just think about these as numbers like 72, 73, 33. All right. So if we go back to the actual code, though, I wonder if this is the best idea. These three lines of code are correct.
I got my 59 and 1/3 for my average, which I claim is correct, but code-wise, this should maybe rub you the wrong way. Even if you hadn't programmed before CS50, why might this not be the best approach to storing things like scores in a program? How might this get us in trouble? Yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah. It's not the best because you have to use a whole bunch of different variables, one for each score. They're almost identically named, though, but just imagine, as in almost any question involving the design of your code, what happens as n, the number of things involved, gets larger? Am I really going to start writing code that has score4, score5, score6, score10, score20? I mean, your code is just going to look like this mess of mostly copy-paste, except that the number at the end of the variable is changing. That should make you cringe a little bit because it's not going to end well eventually. And typographical errors are most likely going to get in the way because we'll make mistakes. So how can we do a little bit better than that? Well, let me propose that we introduce what we're going to now call an array. An array is a sequence of values back to back to back in memory. So an array is just a chunk of memory storing values back to back to back. So no gaps, no fragmentation. From left to right, top to bottom, just as I already drew. But these arrays in C, at least, are going to give us a slightly new syntax that addresses exactly your concern. So here instead is how I would propose you define one variable-- not three, one variable-- called scores, plural, each of whose values is going to be an int, and you want three integers tucked away in that variable. So now I can pluralize the name of my variable because by using square brackets and the number 3, I'm telling the compiler, give me enough room for not one, not two, but three integers in total. And the computer is going to do me a favor by storing them back to back to back in the computer's memory.
Now assigning values to these variables is almost the same, but the syntax looks like this. To assign the first value, I do scores, bracket, 0 equals whatever, 72; scores, bracket, 1 equals 73; scores, bracket, 2 equals 33. And it's square brackets consistently. And notice, this is a feature-- or a downside-- of C. We very frequently use the same syntax for slightly different ideas. This first line tells the computer, give me an array of size 3. These next three lines mean, go into this array at location 0 and put this value there; location 1, put this value there; location 2, put this value there. So same syntax, but different meaning depending on the context here. But the equal sign indeed means that this is assignment from right to left, just like last week. So what does this mean in the computer's memory? Well, in this case here, we now have a slightly different way of doing this. And actually, let me do it first in code. Let me go back to VS Code here, and let me propose that instead of having these three separate variables, let me give myself an int scores variable of size 3, and then do scores, bracket, 0 equals 72; scores, bracket, 1 equals 73; scores, bracket, 2 equals 33. And now I have to change this syntax slightly, but same idea: scores, bracket, 0; scores, bracket, 1; and lastly, scores, bracket, 2. So a couple of key details. I started counting at 0. Why? That's just the way it is with arrays. You must start counting at 0 unless you want to waste one of those spaces. And what you definitely don't want to do is go into scores, bracket, 3, because I only asked the computer for three integers. If I blindly do something like this, you're going too far. You're going beyond the end of the chunk of memory and bad things will often happen. So we won't do that just yet. But for now, 0, 1, and 2 are the first, second, and third locations. So if I recompile this code-- make scores seems OK. ./scores, and I get the exact same answer there.
But let me make it more dynamic, because it's a little stupid that I'm compiling a program with my scores hardcoded. What if I have a fourth exam tomorrow or something like that? So let's make it more dynamic, and I think the syntax will start to make a little more sense. Let's go ahead and use get_int and ask the user for a score. Let's go ahead and get_int and ask the user for another score. Let's go ahead and get_int and ask the user for a third score, now storing the return values in each of those variables. If I now do make scores-- oh, darn it. A mistake. Similar to one I've made before, but we didn't see the error message last time. What'd I do wrong? Yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: OK. What did I do wrong-- how about over here? AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah. So I'm missing the CS50 header file. So how do you know that? Well, implicit declaration of function get_int. So it just doesn't know what get_int is. Well, who does know what get_int is? The CS50 Library; that should be your first instinct. All right. Let me go to the top here and let me go ahead and squeeze in the CS50 Library like this. Now let me clear my terminal. make scores again. We're back in business. And notice, I don't need to do -l cs50. make is doing that for me when it runs clang; we don't even see clang being executed, but it is being executed underneath the hood, so to speak. All right, so ./scores, here we go. 72, 73, 33. Math is still the same, but now the program is more interactive. Now this, too, hopefully should rub you the wrong way. This is correct, I would claim, but it's still bad design. It reeks of week 0 inefficiencies. Yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: OK. So I could ask the human how many scores do you want to input? Let's come back to that. But I think even in this construct, what better could I do? Use a loop, right? Because I'm literally doing the same thing again and again. And notice, this number is just changing slightly.
I would think that a little plus-plus could help there. get_int Score, get_int Score, get_int Score-- that's the exact same thing. So a loop is a perfect solution here. So let me go over into this code here, and I can still for now declare it to be of size 3, but I think I could do something like this: for int i gets 0; i is less than 3, so I'm not going to make the same buggy mistake as I made earlier; i++. Inside of the loop now, I can do scores, bracket, i, and now arrays are getting really interesting because you can use and reuse them, but dynamically go to a specific location. Equals get_int, quote-unquote, "Score." Now I can type that phrase just once and this loop ultimately will do the same thing, but it's getting better. The code is getting better designed because it's more compact and I'm not repeating myself. 72, 73, 33. Still works the same, but we're iteratively improving the code here. Now, there's one design flaw here that I still don't love; it's a little more subtle. Any observations? AUDIENCE: [INAUDIBLE] DAVID MALAN: Ah, interesting. So instead of dividing by 3.0, maybe I should divide by the array size, which at the moment is technically still 3, but I do concur that that is worrisome because they could get out of sync. But there's something else that still isn't quite right. Yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: I'm OK moving to this zero-indexed model. So this is a new term of art. To index into an array means to go to a specific location. So here, I'm indexing into location i, but i is going to start at 0 and then 1 and then 2. I'm actually OK with that. Even though in everyday life we would say score1, score2, score3, as a programmer, I just have to get into the habit of saying score0, score1, score2 now. But something else. Yeah? AUDIENCE: I could compute the average. DAVID MALAN: I could also compute the average in a loop, because indeed, this is only solving the problem halfway.
I'm gathering the information in the loop, but then I'm manually writing it all out. So it does feel like there should be a better solution here. But let me also identify one other issue I really don't like, and this is, indeed, subtle. I've got 3 here, I've got 3 here, and I essentially have 3 here, albeit a floating point version. This is just ripe for me eventually making a mistake and changing one of those values, but not the other two. So how might I fix this? I might at least do something like this. I could declare an integer, maybe n for scores, and set it equal to 3. I could then use n here, I could use n here. I could use n here, but that's a step backwards because I don't want an int because I'm going to run into the same math issue as before, but I could convert it-- that is, cast it to a float, and we did that briefly last week. This is better because I don't have a magic number floating around in multiple places. But there's one other thing I could do here that we introduced last week. Yeah, if I really want to be proper, I should probably say this should be a constant integer. Why? Because I don't want to accidentally change it myself. I don't want to be collaborating with a colleague and have them foolishly change it on me. This just sends a stronger signal to the compiler: do not let the humans change this value. And now just to point out one other feature of C, if you have a constant like this one, set to the number 3, notice I've deliberately capitalized its name, N, really for the first time. Any time you have a constant, it tends to be a convention to capitalize it just to draw your attention to it. It doesn't mean anything technically. Capitalizing a variable does nothing to it, but it draws attention visually to it to the human. So if you declare something as a constant, it's commonplace to capitalize it just because.
Moreover, if you have a constant that you might want to occasionally modify-- maybe next semester when there's four exams or five exams instead of three, it actually is OK sometimes to define what might be called a global variable, a variable that is not inside of curly braces. It's literally at the top of the file, outside of main, and despite what I said about scope last week, a global variable like this on line 4 will be in scope to every function in this file. So it's actually a way of sharing a variable across multiple functions, which is generally fine if you're using a constant. If you intend to change it, there's probably a better way than actually using a global variable, but this is just in contrast to what I previously did, which I would call, by contrast, a local variable. But again, I'm just trying to reduce the probability of making mistakes somewhere in the code. And I do agree. I don't like that I'm still adding all of these scores manually even though clearly I had a loop a moment ago. But for now, let's at least consider what's been going on inside of the computer's memory. So with this array, I now have not three variables, score1, score2, score3. I have one variable, an array variable, called scores, plural. And if I want to access the first element, it's scores, bracket, 0. If I want to access the second element, it's scores, bracket, 1. If I want to access the third element, it's scores, bracket, 2. If I were to make a mistake and do scores, bracket, 3, which would be the fourth element, I'd end up in no man's land here, and worst case, your program could crash or something weird will happen, spinning beach balls, those kinds of things. Just don't make those mistakes. And C makes it easy to make those mistakes, so the onus is really on you, the programmer. Questions on this use of arrays? Yeah, in back. AUDIENCE: Is there any way [INAUDIBLE]? DAVID MALAN: A really good question.
Is there any way to create an array just by using syntax alone without prompting the human for it? Short answer, yes. If you want to have an array of integers called, for instance, array, you could actually use curly braces, like 13, 42, 50, something like this. This would give you an array of size 3 where the three values by default are 13, 42, and 50. It's not syntax we'll use for now, but there is syntax like that. It's not quite as user-friendly, though, as in other languages if you've indeed programmed before. Other questions on this use of arrays? Yeah, in front. AUDIENCE: [INAUDIBLE] DAVID MALAN: Is there a way to copy what? AUDIENCE: [INAUDIBLE] DAVID MALAN: Oh, is there a way to calculate the length of an array? Short answer, no, and I'm about to show you one demonstration of this. Those of you who have programmed before in Java, in JavaScript, in certain other languages, it's very easy to get the length of an array. You essentially just ask the array, what's its length? C does not give you that capability. The onus is entirely on you and me to remember, as with another variable like N, how long the array is. And so in fact, let me go ahead and do this. I'm going to go ahead and open up, baking-show style, a program that I wrote in advance here which kind of escalates quickly, but there aren't really too many new ideas here except for the array specifics. So this is scores.c, premade this time. And notice what I have. One, I've included cs50.h and stdio.h at the top, so that's the same. I have declared a constant called N, set equal to 3. That is now the same as my most recent change. I did introduce an average function, which addresses one of the remaining concerns, that I could compute the average with some kind of loop, too. That average function is going to return a float, which is what I want: my average to be a float with the fraction. But notice this.
In answer to your question, if I want a function called average to iterate over an array step by step by step, add up all the numbers, and divide by the total number of numbers, I need to give it the array of numbers, and I need to tell it how many of those numbers there are. So I literally have to pass in two values. Meanwhile, this code is the same as before inside of main. I'm declaring a variable called scores of size N. I'm iterating with i from 0 up to N. And actually-- yep. And then in this loop, I'm assigning each of the scores a return value of get_int. The last line of main is this: print out the average with %f, but don't just do it manually by adding and dividing with parentheses. Call the average function, pass in the length of the array and the array itself, and hope that it returns a float that then gets plugged into %f. So I would claim that pretty much all of this, even though it's a lot, should be familiar. There are no real new ideas except for this use of the global variable now and this average function. So let me scroll down to the average function because this is the takeaway from this final example. In this example here, the very first line of the average function is the same as the prototype I copy-pasted up top. And here's how I'm computing the average. There are different ways of doing this, but here's an accumulator way. On line 28, I'm declaring a variable inside of the average function called sum, and I'm just initializing it to 0. Why? Mentally I want to add up all of the person's scores and then I want to divide by the total, and that's my mathematical average. So here's my loop where I'm iterating from 0 up to, but not through, the length-- so that should be three times. I am adding to the sum variable whatever is at the i-th location, so to speak, of the array. So this is array, bracket, 0; array, bracket, 1; array, bracket, 2 on each iteration. And then the last thing I'm doing is a nice one-liner.
I'm dividing the sum, which is an int, which is the sum of 72, 73, 33, by the length, which is 3, but 3 is not a float, so I cast it to a float so that the end value, hopefully, is going to be 59.33333 and so forth. So the only thing that's weird syntactically is this, though. When you define a function in C that takes an argument that isn't just a simple char, isn't just a simple integer, but is actually an array, you don't have to know the array's length in advance. You can just put square brackets after the name you give it. And I don't have to call it array. I could call it x or y or z or anything else. I called it array just to make clear that it's an array, but you do need to know the length somehow. OK. Questions on combining those ideas in that there way? Any questions? No? All right. Well, we've only dealt with numbers thus far. It would be nice to actually deal with letters and words and paragraphs and the like, much like our readability example, but I think first, some snacks and some fruit are served in the transept. So we'll see you in 10. All right. So we're back. And up until now, we've been representing just numbers underneath the hood, but we've introduced arrays, which gave us this ability, recall, to store numbers back to back to back. So it turns out, you actually had this capability for the past week even though you might not have realized it. And let me propose that we first consider a very simple example of three chars instead of three integers. And for simplicity, I'm going to call them c1, c2, and c3 just for the sake of discussion. But I'm going to put our familiar characters, H, I, and exclamation point, in those variables using single quotes because, again, that's what you do when using individual chars, to make the point that I can store three chars in three separate variables. So let me go ahead and go over to VS Code here and let me create something called hi.c. And in this program, I'll first include stdio.h, int main void as before.
And then inside of main, let's just do exactly that. char c1 equals, in single quotes, capital H. char c2 equals, in single quotes, capital I. char c3 equals, in single quotes, exclamation point. So clearly not the best approach, but just for demonstration's sake. And now that you understand, hopefully from week 1, and really from week 0, that numbers are just letters and letters are just numbers, depending on context, we can really just use our basic understanding of C to tinker with these ideas now and see that there is indeed going to be no magic happening for us ultimately. So let me go ahead and print out three characters-- %c, %c, %c, backslash n. And then print out c1, c2, c3. So I've got three separate placeholders. And we haven't really had occasion to use %c, but it means put a char here, unlike %s, which is put a whole string here, or %i, put an integer. Let me go ahead and make hi, no syntax errors, ./hi, and it should print out "HI!" with the exclamation point because I'm printing out just three simple characters. But per our discussion as far back as week 0, letters are just numbers and numbers are just letters, it just depends on the context in which we use them. So let me change this %c to an %i. And I'm going to add a space just so that you can obviously separate one number from another. Change this to %i, change this to %i, but still print out c1, c2, c3. So no integers, per se. Let me just print out those chars. Let me do make hi, no errors, ./hi, and now I see 72, 73, 33. So in the case of chars and ints, you can actually treat one as the other so long as you have enough bits to fit one in the other. You don't even have to cast or do anything explicitly. An explicit cast matters more when, say, converting an integer to a float, to make clear to the compiler that you really intend to do this, because that could be destructive if it can't quite represent the number as you intend.
But in this case here, I think we're OK just poking around and seeing what's going on underneath the hood. Well, what is going on underneath the hood memory-wise? Well, something very similar. Here's that canvas of memory. And maybe we got lucky and it's in the top left-hand corner like this-- c1, c2, c3. These are just three individual characters, but we're getting awfully close to what we last week called a string, which is just a sequence of characters from left to right. And in fact, I think if we combine this revelation that these are just numbers underneath the hood, back to back to back, with the idea of an array from earlier, we can start to see what's really going on. Because indeed, underneath the hood, this is just a number, 72, 73, 33. And really, if we go lower level than that, it's these three patterns of 0's and 1's. That's all that's going on inside of the computer, but it's our use of int that shows it to us as an integer. It's our use of char that makes it clear that it's a char, or equivalently, %i and %c respectively. But what exactly is a string? Well, it's really just a sequence of characters, and so why don't we go there? Let me propose that we actually give ourselves an actual string, call it s-- we'll use double quotes this time. So if I go back to VS Code here, let me shorten this program and just give myself a single string s, set it equal to "HI!" in double quotes. And then below that, let's go ahead and print out %s, backslash n, and then s itself. And then, it turns out, for reasons we'll soon see, I do need to include the CS50 Library so as to use the type called string here even though I'm not using get_string, but more on that another time. But if I now do make hi, it does compile, ./hi, and it still prints out the exact same thing.
But what's going on inside of the computer's memory when I use a string called s instead of three chars? Well, you can think of the string as taking up at least three bytes, H, I, exclamation point. But it's not three separate variables, it's one variable. But what does this really look like now, especially if I add back the yellow lines? s is really just an array of characters. So we called it a string last week, and I claim today that this is an abstraction in the CS50 library that's giving us this string, but it's really just an array of size at least 3 here where s, bracket, 0 presumably gives me the H, s, bracket, 1 is the I, s, bracket, 2 is the exclamation point. But just by saying string, all of that happens automatically. I don't even need to tell the computer how many chars are going to be in this string all at once. So in fact, let me go over to maybe a variant of this program and we can see this syntactically. So instead of printing out the whole string with %s, let me actually be a little curious and print out %c, %c, %c, and then change s to s, bracket, 0, s, bracket, 1, s, bracket, 2. Which is not better in any sense. This is way more tedious now, but it does demonstrate that I can treat a string here in week 2 as though it's an array, which means even in week 1 it was an array, we just didn't know it. We didn't have the syntax with which to express that. So if I now do make hi, it still compiles, ./hi. Same exact output, but I'm now just kind of manipulating the string in these different ways because a string is just an array of characters, so I can treat it with the square bracket notation. But how does the computer know where "HI!" ends? And this is where strings get a little dangerous. A char is 1 byte no matter what. 1 char, 1 character, that's it. But a string, recall my question mark from earlier, you would think could be 0 bytes if you have nothing inside the quotes.
It could be one character, two, 10, 100 like I claimed, but how does the computer know where strings end? How does the computer know that the string isn't the whole row of memory here? How does it know that it ends here? Well, it turns out, all this time, when we've been using, quote-unquote, string and using get_string from the CS50 library, there's actually a special sentinel value at the end of every string in a computer's memory that tells the computer the string stops here. And the sentinel value-- and by sentinel, I just mean a special value that the world decided on decades ago-- is all 0 bits. If you have a byte with all 0 bits in it, that means the string ends here. So the implication is that the computer now, using a loop or something, can print out char, char, char-- oh, done, because it sees this special value. If it didn't have that, it might blindly go char, char, char, char, char-- printing out values of memory that don't belong to that given string. So I was correcting myself verbally a moment ago because I said that this string is of length 3, it's 3 bytes, but it's not. Every string in the world, both last week and now, is actually n plus 1 bytes where n is the actual human length that you care about, H-I, exclamation point, or 3, but it's always going to use one extra byte for this so-called zero value at the end. And it's very tedious to write this 0 value as 8 0 bits. So we would actually typically just write it as a 0. But you don't want to confuse it with the number 0 that you might type on the keyboard and see on the screen. And so we would actually typically write this symbol as backslash 0. So this is the char-based representation of 0. It means the exact same thing. This is just C notation that indicates that this is 8 0 bits, but just makes clear that it's not literally the number 0 that you want to see on the screen, it's a sentinel value that is terminating this here string. So now what can I do once I know this information?
Well, I can actually even see this. Let me go back to this code here in VS Code. Let me change these %c's to %i's just like before. And now, we'll see again those same numbers, make hi, ./hi, there are the three. I can technically poke around a little bit further, %i one more, and let's look at s, bracket, 3. I was not exaggerating earlier when I said, in general, if you go past the end of an array, bad things can happen. But in this case, I know that there is one more thing at the end of this array because this is how strings are built. This is not a CS50 thing, this is a thing in C. Every string in the world in double quotes ends with a backslash 0-- that is, 8 0 bits. So if I really want, I can see this by printing out s, bracket, 3, which is the fourth and final location. If I recompile my code now, make hi, ./hi, I should see 72, 73, 33, and 0. That's always been there. So this string is always using 4 bytes, somewhat wastefully, but somewhat necessarily, so that the computer actually knows where that string ends. So if we go back to the memory representation of this here, it's just as though you have an array of integers being stored contiguously back to back to back, the last one of which means this is the end of the array of characters. But because I'm using, quote-unquote, "string," because I'm using %s and %c, I'm not seeing these numbers by default, I'm seeing H-I, exclamation point unless I explicitly tell printf, no, no, no, no, show me with %i these actual integers. This, then, is how you can think about the string. You don't really need to think about it as being individual characters. This is just s, and it has some length here, but it's not an array that you yourself have to create. You get it automatically just by using a string. Now there's just-- not to add on to the jargon-- this backslash 0, these 8 0 bits, there's actually a technical term for them. You can call them NUL. It's typically written in all caps like this, confusingly.
In a couple of weeks, we're going to see another word pronounced null, but spelled N-U-L-L. Left hand wasn't talking to right hand years ago, but N-U-L means this is the 0 byte that terminates strings, that indicates the end of a string. And fun fact, you've actually seen this before even though we glossed over it. Here's that ASCII chart from last time. If I focus on the leftmost column, guess what is the 0 ASCII character? NUL. You never see NUL on the screen, it's just how you pronounce 8 0 bits. Whew! Questions on this representation of strings? Yeah? AUDIENCE: Are strings [INAUDIBLE]? DAVID MALAN: Are strings structured differently in other languages? Yes. They are more powerful in other languages. In C, you have to build them yourself in this way. More on that when we get to Python. Other questions? Yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: A really good question. Does that mean we don't have a function to get the length of a string? Do we have to create it? Short answer, there is a function, but you have to-- someone had to write code for it. You can't just ask the string itself like you can in JavaScript or Java. What is the-- AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, you can. It's actually more similar to Python than it is to JavaScript or Java, but we'll see that in just a few minutes, in fact. So let's introduce maybe a couple of strings. So here are two strings in the abstract called s and t, and I've initialized them arbitrarily to "HI!" and "BYE!" just so we can explore what's going to actually happen underneath the hood. So let me go back to VS Code. Let me just completely change this program to be that instead. So string s equals, quote-unquote, "HI!" string t equals, quote-unquote, "BYE!" in all caps. And then let's print them both out very simply. %s backslash n, s. Print out %s backslash n, t just so we can see what's going on. If I do make hi, ./hi, I should, of course, see these two strings. But what's going on inside of the computer's memory?
Well, in this computer's memory, assuming these are the only two variables involved and assuming the computer is just doing things top to bottom, "HI!" is probably going to be stored somewhere like this on my canvas of memory, and "BYE!" is probably going to be stored there. And it's wrapping around, but that's just an artist's representation. But notice that it is now really important that there is this NUL byte at the end of each string because that's how the computer is going to know where "HI!" ends and where "BYE!" begins. Otherwise you might see "HI!" "BYE!" all on the screen at once if there weren't the sentinel value indicating to printf, stop at this character. But that's all that's going on in your program when you have two variables in this way. And in fact, here's where things get a little more interesting: if I were to want two of these things, notice that I could refer to them, too, as arrays. So s, bracket, 0, 1, 2, and even 3. t, bracket, 0, 1, 2, and even 3 and 4. But if I want to actually really blend some ideas, just playing around with these basic principles now, notice what I can do in this version. If I know I've got two strings in VS Code, I don't strictly need to do string s and t and u and v. That's devolving back into the score1, score2, score3 mantra where I had multiple variables with almost the same name even though I'm using different letters of the alphabet. What if I want-- what if I do this? string words, bracket, 2. If I want to store two words in the computer's memory, fine. Create an array of two strings. But what is a string? A string is an array of characters, so it's getting a little bit trippy here, but the ideas are still going to be the same. words, bracket, 0 could certainly equal "HI!" words, bracket, 1 can certainly equal "BYE!" just like the scores example. And then if I want to print these things with %s, I can print out words, bracket, 0. And then I can print out %s backslash n, words, bracket, 1.
And the example is not going to be any different in terms of its output, but I've now avoided s and t. I now just have one variable called words containing both of these here things. And if I really want to poke around, here's where things get even more visually overwhelming, but it's just the logical extension of these same ideas. Here is the previous version where I had two variables, s and t. If I now use this new version where I have one variable called words, just like this here, the picture should follow logically like this. words, bracket, 0 is this string; words, bracket, 1 is this string; but what is each string? It's an array of characters. And so you can also think of it like this, where this H is words, bracket, 0, bracket, 0. So the 0-th character of the 0-th word. And this is words, bracket, 0, bracket, 1; words, bracket, 0, bracket, 2; words, bracket, 0, bracket, 3. And then words, bracket, 1, bracket, 0. So it's kind of like a two-dimensional array, almost. And you can think about it that way if helpful. But for now, it's just applying the same principles to the code. So if I go to my code here and I've got my "HI!" and my "BYE!"-- this is going to look a little stupid, but let me change this %s to %c, %c, %c, and print out words, bracket, 0, bracket, 0; words, bracket, 0, bracket, 1; words, bracket, 0, bracket, 2 to print out that three-letter word. And now down here, let me print out %c, %c, %c, %c because it's four characters in BYE, exclamation point. This is words, bracket, 1, the first character; words, bracket, 1, the second character; words, bracket, 1, the third character; and words, bracket, 1, the fourth character. It's hard to say when you're typing a different number, but that's what we get by using zero indexing, so to speak. make hi. Whew! No mistakes. And it says the same thing, "HI!" and "BYE!". So again, there's no magic. You are fully in control over what's going on inside of the computer's memory.
And now that we have this array syntax with square brackets, you can both create these things and then manipulate them or access them however you so choose. Whew! Questions on arrays or strings in this way? Yeah, over here. AUDIENCE: Can you have an array that has multiple data types in it? DAVID MALAN: Good question. Can you have an array with multiple different data types? Short answer, no; longer answer, sort of, but not in nearly the same user-friendly way as with languages like Python or JavaScript or others. So assume for now that arrays in C should contain elements of the same type. Other questions? Yeah, over here. AUDIENCE: When you talk about [INAUDIBLE]? DAVID MALAN: Oh, a really good question. It will-- so for those who couldn't hear, if you were to look past the end of one array, would you start to see the beginning of the second? In this case, maybe the word "BYE!" It could depend on the particulars of your code and the computer. Let's try this. So let's get a little greedy here and go one past H-I, exclamation point, to the null character by looking at words, bracket, 0, bracket, 3, which should actually be our null character, so that's going to be there. And actually, let's see. Let's go ahead and do this. make hi, ./hi. Still works as expected, but let me change this to integer, integer so we can actually see what's going on. Integer. And now, if I recompile with make hi, I should see the same thing, but numerically. And now what I think you're proposing is let's get a little crazy and go even past that to what could be location 4, which we know semantically doesn't exist, but maybe is bumping up against "BYE!" So make hi, ./hi. And guess what 66 is. Well, just the B, but yes. 66, recall, is capital B because in week 0, capital A was 65. So indeed, now we're really poking around. And you can get crazy. Like, what's 400 characters away? You could see what's going on there. Eventually your program will probably crash, and so don't poke around too much, but more on that in the coming days, too.
All right, well how about some other revelations and problem-solving? Now coming back to the question about string length earlier, we'll see if we can then tie this all together into something like cryptography in the end, manipulating strings for the purpose of sending them securely. So let me propose that we go into VS Code here again in a moment. And I'm going to create a program called length. Let's actually figure out ourselves the length of a string initially. So I'm going to go ahead and code length.c. I'm going to go ahead and include cs50.h. I'm going to include stdio.h, int main void. And then inside of main, I'm going to prompt the user for their name with get_string, quote-unquote, "Name," storing the result in a string called name. And then I'm going to go ahead and I want to count the length of this string. But I know what a string is now. It's char, char, char, char, and then eventually the null character. So I can look for that. And I can write this in a few different ways. I know a bunch of different types of loops now, but I'm going to go with a while loop by first declaring a variable n, for number of characters, set equal to 0. It's like starting to count with your fingers all down, and I want to do the equivalent of this, counting each of the letters that I type in. So I can do that as follows. While the name variable at location n does not equal, in single quotes, backslash 0, which looks weird, but it's just asking the question, is the character at that location the so-called null character? Which is written with single quotes and backslash 0 by convention. And what I want to do, while that is not the case, is just add 1 to n. And then at the very bottom here, let's just go ahead and print out with %i the value of n because presumably if I type in HI, exclamation point, I'm starting at 0 and I'm going to have H, I, exclamation point, null character, so I don't increment n a fourth time. So let's go ahead and run this down here. make length, ./length, Enter.
Well, I guess I'm asking for name, so I'll do my name for real. David, five characters, and I indeed get 5. If I used a for loop, I could do something similar, but I think this while loop approach, much like our counter from the past, is fairly straightforward. But what if I want to do this? What if I want to make another function for this? Well, I could do that. Let me-- All right, let's do this. Let's write a quick function called string_length. It's going to take a string called s or whatever as input. And then you know what? Let's just do this in that function. I'm going to borrow my code from a moment ago. I'm going to paste it into this function. But I'm not going to print out the length, I'm going to return the length n. So I have a helper function of sorts that's going to hand me back the length of the string, and that's why this returns an int, but takes a string as its argument. How do I use this? Well, first, I do need to copy the prototype so I don't get into trouble as before. Semicolon. And then in my main function, what I think I can do now is something like this. I can do int length equals string_length of the name variable that was just typed in. And now using printf %i, print out length, semicolon. So exact same logic. The only thing I've done that's different this time is I've added a helper function just to demonstrate how I can take some pretty basic functionality, find the length of a string, and modularize it into a function, abstracting it away so I never again have to copy-paste that while loop. I now have a function called string_length that will solve this problem for me. Whoops, wrong program. make length. Huh. Use of undeclared identifier 'name.' What did I do wrong? Apparently on line 16 of length.c, what did I do wrong here? Yeah, in front. AUDIENCE: [INAUDIBLE] DAVID MALAN: Good. AUDIENCE: [INAUDIBLE] DAVID MALAN: Good. Perfect terminology. So name is local to main. The scope of name is main. They sound similar, but they're different words.
And so I actually should be using s here because s is the name of the local variable being passed in, even though it happens to be one and the same as name because on line 9, I'm indeed passing in name as the argument. All right. So this is where, again, copy-paste can sometimes get you into trouble. Let's try to make length again. Now it works. ./length, D-A-V-I-D, and now we have a function that seems to be working. But this is such commodity functionality. Like my God, surely someone before us has written a function to get the length of a string, and indeed, other people have. So it turns out that in C, just as you have the stdio library, you also have a string library whose header file is called, appropriately, string.h. In fact, CS50 has documentation for it in its own manual pages, so to speak, along with some sample usage thereof. But it turns out, in the string library, there is a very popular function, analogous to the Python one that you asked about earlier, called strlen, where strlen, one word, no underscores, just figures out the length of a string. And honestly, I've never looked at its source code, but it probably uses a while loop, maybe it uses a for loop, but it certainly uses the same idea of just iterating-- that is, walking from left to right over a variable in order to figure out what the length of a given string is. So how do we use this? Well, if I go back to VS Code here, I can throw away the entirety of my string_length function, I can throw away the prototype, therefore, and I can include a third header file, string.h, inside of which I claim now is this function called strlen that I can just now use out of the box for free because someone else wrote this function for me. And string.h will teach the compiler that it exists. So if I now do make length and ./length, now I have a similarly working program that doesn't bother having me write unnecessary code. So this is another example of a library.
The string library is just going to make our lives easier by saving us from reinventing some wheel. All right, well, where else does this get interesting? How about something like this? Let me go back into VS Code here. Let's create a program called string.c (we'll play around with our own strings) that's going to start similarly. So let's include cs50.h, let's include stdio.h, and let's include string.h so we can use that same strlen function. int main void. And inside of this, let's do this. Let's get a string s and prompt the user for any old string as input. All right. And then let's go ahead and print out, quote-unquote, "Output." And I'm just going to line up my spaces just right because these words are slightly different lengths, but we'll see why I'm doing this. It's just for aesthetics' sake in a moment. And let's go ahead now and do this. If I want to print out every character in a string, how can I now do this? Well, this is actually a pretty common task, even though this version thereof will seem pointless. for int i gets 0, i is less than the length of s, i++: that's just the conventional way to start a loop that iterates from left to right over a string of that length. And then let's go ahead and print out each character with %c, printing out the string at location i using our fancy new array syntax. And at the very end of this program, let's just print out a newline character just to move the cursor to the next line, like we've done in the past. So this is kind of a stupid program; I am reinventing the wheel that is the %s format code. I already know that printf can print out a whole string. But suppose it couldn't. Suppose I forgot about %s and only knew about %c: these lines of code here collectively will print out the entirety of a string, character by character, based on its length.
So if I compile this program with make string, run ./string, and type in my name, for instance, David, the output is D-A-V-I-D. And here's why I hit the spacebar an extra time: I wanted input and output to line up nicely so we could see that they're, in fact, the same length. So let me just stipulate: this code is correct, but there is an inefficiency in this line of code. Let's talk about design instinctively. What is maybe bad about line 9 that I've highlighted? This one is subtle. Let's go over here. AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah. I'm calling strlen inside of the loop again and again and again. Why? Well, recall how for loops work. When we walked through it last week, that middle part of the for loop, in between the semicolons, keeps getting checked, keeps getting checked, keeps getting checked. And so if you put a function call there, which is totally fine syntactically, you're asking the same damn question again and again and again. And the length of David, D-A-V-I-D, is never changing. So strlen, implemented decades ago by some other human, has some kind of loop in it, and you're literally making that code run again and again and again just to get the same answer, 5, again and again. So I think your instinct is right. I could come up with another variable outside of the loop. I could do something like this: int length equals strlen of s, and then I could just plug that in. But there's a slightly more elegant way, if you like doing things with slightly less code. This is correct as I've now written it, and it's more efficient because I'm only calling strlen once now, on this new line 9, but a more common way to write this would typically be to do something like this. After initializing i, you can also initialize something else, like length. And you can set length equal to strlen of s, then your semicolon, and now your condition can check whether i is less than that length. Or I can tighten this up further.
If it's just a number and it's a super short loop, might as well just call it n. So this now would be a canonical way of implementing the exact same idea, but without the inefficiency, because now you're calling strlen in the initialization part of the for loop, not inside of the Boolean expression that gets checked and executed again and again. Yeah? AUDIENCE: [INAUDIBLE] DAVID MALAN: Correct. Well, I'm declaring i as an int, but by way of the comma, I am also declaring n as an int. So they've got to be the same type for this trick to work. Good observation. Other questions on this one here? No? All right. Well, let's play around further here. Let me propose that there are other libraries and header files as well that you might find useful. There's also something called ctype, which relates to character types, that's got a bunch of useful functions that we can actually see if we visit the documentation here. But before we get there, let me actually whip up a program that maybe does something a little bit fun, albeit low level, like forcing some string to uppercase if the human types it in lowercase. So let me go ahead and write a program called uppercase.c. Let me go ahead and give myself the same header files: include cs50.h, include stdio.h. And for now, let's include string.h for the length. And let's go ahead and have int main void as before. And inside of main, let's give myself a string s equaling get_string "Before," just so I know what the string is initially. Now I'm going to print out proactively "After" with two spaces, just so that things line up aesthetically on the screen, because "After" is one character shorter. And now I'm going to do the same technique as before. for int i equals 0, n equals the string length of s, i is less than n, i++. And then inside of this loop, what do I want to do logically? I want to force these characters to uppercase if they are, in fact, lowercase. And so how might I do this?
Well, there's a bunch of ways to express this, but I'm going to do it maybe the most straightforward way, even if you've not seen this before. If the current letter in the string at location i (because I'm in a loop starting from 0 all the way up to, but not through, the string length) is greater than or equal to a lowercase a, in single quotes, and that letter is less than or equal to a lowercase z. What does this mean in English? Well, logically, if it's greater than or equal to little a and less than or equal to little z, it's somewhere between a and z in lowercase. What do I want to do? Well, I want to force it to uppercase. So I want to print out a character, without a newline yet, that is the current character forced to uppercase. Well, how can I do this? Well, this is where this gets into some low-level hacking, but notice the same ASCII chart. Here are our uppercase letters from last time. Here are our lowercase characters, and let me highlight those. Does anyone notice a relationship between capital A and lowercase a that happens to be the same for capital B and lowercase b? AUDIENCE: Capital A [INAUDIBLE]. DAVID MALAN: Yeah. Like, this pattern is true. So 97 minus 65 is 32, and that's true for every lowercase and uppercase letter, respectively. So I can leverage that. And this is not a CS50 thing. This is ASCII, and ASCII, in turn, is a subset of Unicode. This is how modern computers work. So if I go back to VS Code here, you know what I could do? Let's just literally subtract 32. And because I'm displaying this as a char, not as an int, I'm going to see the lowercase letter seemingly become an uppercase one instead. Else, if it's not lowercase (maybe it's already uppercase, maybe it is punctuation), let's just go ahead and print out with %c the original character unaltered. And then at the very end of this program, let's print a newline just to move the cursor to the next line. All right, so let's do make uppercase.
And let me type ./uppercase. And I'll type in D-A-V-I-D, all lowercase, and now, you'll see it's in all caps. If, though, I type in maybe my last name, but with a capitalized M, that's OK; the rest of it will still be capitalized for me. Now, I don't love this technique. It's a little bit fragile because I had to do some math. I had to check my reference sheet and then incorporate it into my program. Even though it will be correct, I could be a little more clever. I could actually do something like this: subtract whatever the value of lowercase a minus capital A is, doing it arithmetically, even though that, too, is somewhat inefficient in that it's asking the same question again and again, but the compiler is probably smart enough to optimize that. And frankly, for those more comfortable, a good compiler will also notice, no, no, no, no, you don't want to call strlen again and again. The compiler can do some of these optimizations for you, but it's still a good practice to get into yourself. But there's probably a better way. Instead of rolling this solution ourselves and subtracting 32 or doing any arithmetic, let's use that ctype library. Let me go back up to my header files. Let's additionally include ctype.h. Let's pretend like I read the documentation in advance, which I did, in fact. And instead of doing any math here, let's use a function that exists in that library called toupper and pass to it whatever the current character is in s at location i. Otherwise, I still print out the unchanged character. And let me go ahead and do make uppercase and ./uppercase. And now, without any math, no subtracting 32, that, too, works. But it gets better. If you read the documentation for toupper, it turns out its documentation tells you that if c is already uppercase, it just passes it through for you. So you don't even need to ask this conditional question.
I can actually cut this to my clipboard, get rid of all of this, and replace it with that one line only, and just let toupper handle the situation for me, because again, its documentation has assured me that if it's already uppercase, it's just going to return the original value. So if I make uppercase this time and run ./uppercase, now it works, and now things are getting kind of fun. I mean, these are mundane tasks, admittedly, but at least I'm standing on the shoulders of smart people who came before me, who implemented the string library, the ctype library, heck, even the CS50 Library, so I don't need to reinvent any of those wheels. Questions on any of these library techniques? It's all still arrays, it's all still strings and chars, but now we're leveraging libraries to solve some of our problems for us. All right. So let's come full circle to where we began, where I mentioned that some programs support command line arguments. Clang, for instance, takes command line arguments: words after the word clang. cd, which you've used in Linux, takes command line arguments, as when you type cd, space, pset1 or cd, space, mario in order to change directories into another folder. If you do rm like I did earlier, you can remove a file by using a command line argument, a second word that tells the computer what to remove. Well, it turns out that you, too, can write code that takes words at the command prompt and uses them as input. Up until now, you and I have only gotten user input via get_string, get_int, get_float, and functions like that. You, too, can write code that takes command line arguments, which, frankly, just saves the human time. They can type their entire thought at the command line, hit Enter, and boom, the program can complete without prompting and re-prompting them. So here's where we can now start to take off some more training wheels. Up until now, we've just put void inside of the parentheses here any time we implement main.
It turns out that you can put something else in those parentheses when using C. It's a mouthful, but you can replace void with this bigger expression. But it's just two things: an int, called argc by convention, and a string, but not just a string, actually an array of strings, called argv. And these terms are a little arcane, but argc means argument count: how many words did the human type at the prompt? Argv stands for argument vector, and vector is generally another term for an array; you've heard it perhaps from mathematics. It's like a list of values, or in this case, a list of command line arguments. So C is special. If you declare main as taking not void inside of the parentheses, but rather an int and an array of strings, C will figure out whatever the human typed at the prompt and hand it to you as an array, along with the length thereof. So if I want to leverage this, I can start to implement some programs of my own that actually incorporate command line arguments. For instance, let me go back in a moment here to VS Code. Let me create a program, for instance, called greet.c that's just going to greet the user in a few different ways. So let me first do it the old way. Include cs50.h. Let me include stdio.h. Let me do int main void still. So, the old way. And if I want to greet myself or Carter or Yulie or anyone else, I could, old-fashioned now, get the answer from the user with get_string. Let's prompt for "What's your name?" question mark, just like we did in Scratch. And then do printf, "Hello," comma, %s backslash n, answer. So we've done this many times now, this week and last. This is now the old-school way of getting user input: by prompting them for it. So if I do make greet and ./greet, there are no command line arguments at the prompt; I'm literally just running the program's name. If I hit Enter, though, now get_string kicks in, asks me for my name, and the program then greets me. But otherwise, I could do something like this instead.
First, answer's a little generic, so let's first change this back to name and back to name, but that's just a minor stylistic improvement. Let's, though, introduce now a command line argument so that I can just greet myself by running the program, hitting Enter, and being done, no more get_string. So I'm going to go ahead and change void to int argc, string argv with square brackets. The square brackets mean it's an array, so string argv with square brackets means an array of strings; and argc, again, is just an integer, the number of words typed. Now I'm going to, somewhat dangerously, do this. I'm going to get rid of my use of get_string altogether, and I'm going to change this line to use not name, which no longer exists; instead, I'm going to go into this array called argv, and I'm going to go into location 1. So I'm doing this on faith. I haven't explained what I'm doing yet, but I'm going to do make greet and ./greet, and now I'm going to type my name at the command line, just like with rm, with clang, with cd, with any of the commands you've typed with multiple words. I'm going to greet literally David. So I hit Enter, and voila, I've somehow gotten access to what I typed at the prompt by accessing this special parameter called argv. Technically you could call it anything you want, but the convention is argc and argv, from left to right here. Just a guess, then: what if I change this to print out bracket 0, recompile the code, and run ./greet David? What might it say instinctively? Any hunches? Yeah. So it's going to say hello, ./greet. So it turns out, you get one for free. Whatever the name of your program is, it's always accessible in argv at location 0. That's just because. It's a handy feature. In case there's an error or you need to tell the user how to use the program, you know what command they ran. But at location 1, maybe 2, maybe 3, are the additional words that the human might have typed in.
Well, let's do something a little smarter than this. Let me go back to version 1. Let me recompile it, make greet. Let me rerun ./greet David, and this seems to work fine. What if I get a little curious and print out location 2? Let me recompile the code, make greet, ./greet David, Enter. OK, there's (null). I mentioned we'd see N-U-L-L, and here's one incarnation thereof, but this is clearly wrong. So I probably don't want to even let the user do this, because I don't want them to see bogus output. Like, this is arguably a bug in the code, that it even bothered to show this by default. So what could I do instead? Well, what if I do this? If argc equals equals 2, then go ahead and comfortably say printf "hello," argv, bracket, 1. Else, if the human did not give exactly two words at the prompt, let's just print out some default value, like "hello, world" from last week. In other words, now I'm doing this error checking with a conditional, making sure with this Boolean expression that you proceed only if argc equals equals 2, and therefore there are two words in argv. And so now if I do make greet again and ./greet David, this now works. But if I don't cooperate and I just run greet, what should it say? Just hello, world. If I run it with David Malan as two words, what should it say? hello, world, because argc is not exactly equal to 2. Again, the first word in argv is always the program's name. The second word is whatever the human then typed. Now, if we don't even know in advance how many words there are going to be, we can combine today's ideas. This is going to look a little weird, but it's the same thing as before. for int i gets 0, i is less than argc, i++. And then inside of this loop, I can print out %s, maybe backslash n, comma, and then argv, bracket, i. So I can have a loop that iterates argc number of times, once for every word at the prompt. I can print out argv, bracket, i, which is the i-th word in that array, from left to right.
And so if I now run make greet and I do ./greet alone, I just see the program's name. If I do ./greet David, I see those two, one after the other. If I do David Malan, I get those three words. If I keep going, I'll get more and more words. So using just the length of the array and the name of the array, I can actually do quite a bit there. Now, there are actually some fun things you can do with this, and this is sort of beside the point, but there's this thing in the world called ASCII art, which is making pictures and beautiful things just using ASCII, or maybe nowadays Unicode, characters, but without using emoji. Emoji kind of make this a little too easy. But if all you have are traditional, largely English, letters and punctuation, you can actually do some interesting things, on Linux systems, for instance. If I go back to VS Code here, let me increase the size of my terminal window. And it turns out that we've pre-installed, really for no compelling reason but just for fun, a program called cowsay, which has a cow say something. So if I want to have a cow say "moo" in ASCII art, I can do this, and you get an adorable cow saying something like "moo" on the screen. But moo is a command line argument that is clearly modifying the output of this program, because I could also change it to say hello, comma, world, and now the cow is going to say that instead. So it takes multiple command line arguments, if you will. But it also takes what are called flags or switches, whereby any command line argument that starts with a dash is usually a special configuration option that you would only know exists by reading the documentation or seeing a demonstration. And if I have my syntax right, I can do cowsay -f, and maybe I'll do, let's see. Instead of this cow, how about I do -f for file, and I'm going to change it into duck mode. And I'm going to have this version of the ASCII art say quack. So it's a tiny little duck there, but it's saying quack.
And you can kind of waste a lot of time doing this. I can do cowsay -f dragon and say something like RAWR, and this is just amazing. Again, not really academically compelling, but it does demonstrate, again, command line arguments, which are everywhere, and which you've indeed been using already. But there's one other feature we wanted to introduce you to today, which will be a useful building block and which will also reveal one other thing about the code that we've been writing. It turns out that all of the programs we've been writing thus far obviously exit eventually, because you see your prompt again, unless you have an infinite loop such that it never ends. But eventually they exit. And secretly, every program we've written thus far actually has what's called an exit status. It's like a special return value from the program itself that, by default, is always 0. 0 as a number in the world generally means everything's OK. The flip side is that, because the world tends to use integers and you've got some four billion possibilities, every other number, when it comes to a program's exit status, is bad. If it's 1, it's probably bad. If it's negative 1, it's bad. And in fact, you've probably seen this in the real world. You've maybe had some random error message on the screen; here's a screenshot of Zoom, for instance. And that screenshot, somewhat confusingly, has an error code, 1132, which probably means that the Zoom software that some other humans wrote somehow had an error and did not exit with status 0; it exited with status 1132. And somewhere at Zoom, there's probably a file or a book that tells the programmers what this error code actually means. This is not useful for you and me, but there's some programmer at Zoom who would probably be like, oh, I know what I did or my colleague did wrong in this case.
You've seen this elsewhere too, even though this is not quite the same thing, but we'll talk about this in a few weeks. Numbers are everywhere: if you've ever seen 404 on the web, it means file not found. It means you made a typo, the web server deleted a file, or something like that. But this is just to say that numbers are so often used to signify or represent errors, even though that's not an exit status, per se; that's an HTTP status code, which we'll soon see. But you have access to exit statuses as they relate to command line software already. Up until now, this is how we've been writing main, now with command line arguments, but we've also been writing main with an int return value. And you've never used this; we didn't talk about this last week. I just asked that you trust me and keep copying and pasting this. But that int means that even your programs can return values, which can be useful even if you don't use command line arguments and we just go back to the original version with void. So for instance, if I go ahead and open up VS Code again, I'll get rid of the dragon. And let's do one other program here called status, just to play around with the idea of these so-called exit statuses. Let me just demonstrate the idea with an include cs50.h, include stdio.h, int main, and here I'll do int argc, string argv. And then inside of main, let's do a similar program to before, like the hello, world. So printf "hello," comma, %s backslash n. Then let's print out argv 1. But I only want to execute that line if the human gave me a command line argument. Otherwise, I don't want to even say some default like hello, world. I just want to abort early and just exit the program, no output whatsoever. So I could do this: if argc does not equal 2. It's written with a single equals sign preceded by a bang, an exclamation point, which means not equal. So this is the opposite of equals equals.
Then previously I would have just printed hello, world, but now I want to print out an error message like "Missing command-line argument," just to explain to the user why the program is about to terminate, and then I can return 1. It's kind of arbitrary. I could also return 1132, but why start there? This is the only possible error that could go wrong in my program, so I'm going to start at 1. Zoom clearly has 1,000-plus possible things that can go wrong in their source code, which is why the number got as big as 1132, but I'm just going to arbitrarily, but conventionally, return 1. But if everything is OK, that is, it is not the case that argc does not equal 2 and I actually get to line 11, I'm going to return 0, because 0, again, I claim, signifies success. And all of this time, every program you've written has secretly exited with 0 by default. But now that our programs are getting more sophisticated, when something goes wrong, it turns out it's useful to have the power to return some other value, even though the user is not going to see it. Even though the Zoom user shouldn't see it, it's still there. It's diagnostically useful to you, or in the case of a class, to your TF or TA or CA. So if I do make status now to compile this program, run ./status, and type my first name, I think this is a success. It should say hello, David and secretly exit with 0. If you really want to see the 0, there's this arcane command you can type. You can literally type at your prompt echo $?. It's weird symbology, but it's what the humans chose decades ago. This will just show you what the most recently run program secretly exited with. So if I do this in VS Code, I can do echo $?, Enter, and there's that secret 0. I could have been doing this week and last week; it's just not that interesting. But it is interesting, or at least marginally so, if I rerun status and maybe I don't provide a command line argument, or I provide too many. So argc does not equal 2.
And I hit Enter, I get yelled at with the error message, but I can see the secret exit status, which is, indeed, 1. And so now, if you're ever in the habit, in either a class like this or in the real world, of automatically testing your code, be it with check50 or, in the real world, with things called unit tests and other third-party software, those tests can actually detect these exit statuses and know whether your code succeeded or failed, 0 or 1. And if there are different types of failures, they can detect those too: status 2, status 3, status 1132. It's just one other tool in your toolkit. But all of that is terribly low level, and really, the goal of this week, and really, today, and really, code more generally, is to solve problems. So let's consider an increasingly important one, which is the ability to send information securely, whether it is in file format, wirelessly, or any other. Cryptography is the art and the science of encrypting, or scrambling, information, so that even if I write a secret message to you and I send it through this open audience with so many nosy eyes who could look at the message, if I've encrypted this message, none of them should be able to read it; only you, whoever you are, to whom I intended that message. In the world of cryptography, then, encryption means scrambling the information so that only you and the recipient can read it. So if we consider our black box, like in weeks 0 and 1, here is the problem to be solved. And let me propose a couple of pieces of vocabulary. Plaintext is any message written in English or any human language that you write yourself and want to send. Ciphertext is what you want to convert it to before you just hand it off to a bunch of random strangers in the audience or a bunch of servers on the internet, any one of whom could look at your message. So in the black box is what we're going to call a cipher, an algorithm for encrypting or scrambling information in a reversible way.
It doesn't suffice to just scramble the information randomly; otherwise, the recipient can't do anything with it. It's an algorithm, a cipher, that encrypts it in such a way that someone else can decrypt it. And here's a common way. Most ciphers take as input not only the plaintext message, in English or whatever else, but also a key. And it's metaphorically like a key to open a lock, but technically it's generally a number, like a really big number made up of lots of bits. And not even 32, not even 64, but sometimes 1,024 bits, which is crazily, unpronounceably large, but the probability that someone is going to guess your key is just so, so small that, for all intents and purposes, you are, in fact, secure. So what's an example of this, for instance? Suppose the secret message I want to send is innocuously just "HI!" Well, it'd be pretty stupid to write "HI!" on a piece of paper, hand it to someone in the audience, and expect it to get all the way to the back without someone glancing at it and obviously seeing and reading the plaintext. So what if I, though, agree with someone in back, for instance, that our secret is going to be 1? And we have to agree upon that secret in advance, but 1 just means that is my key. And let me propose that, according to one popular cipher, if I want to send "HI!", I change the H to an I and the I to a J; that is, I effectively increment every letter of the alphabet by one, and if I get to a Z, I wrap back around to A, for instance. So shift the alphabet by one place in this case and send this message now instead. So is that secure? Well, if one of you kind of nosily looks at this sheet of paper, you won't see "HI!" You will see some information leak in this algorithm: you'll see an exclamation point, so I'm enthusiastically saying something, but you won't know what the message is unless you decrypt it. Now that said, is this very secure, really, in practice? I mean, not really.
Like, if you know I'm just using a key and I'm using the English alphabet, you could probably brute force your way to a solution by just trying 1, trying 2, trying 3, all the way up to 25, going through all the possibilities tediously, but eventually it's probably going to pop out. This is actually known, though, as the Caesar cipher. And back in the day, before anyone else knew about or had invented encryption, Caesar, Julius Caesar, was known to use a cipher like this, using a key of three, literally. And I guess it works OK if you're literally the first human in the world, by lore, to have thought of this idea, but of course, anyone who intercepts it could attack it nonetheless and figure things out a bit mathematically. 13 is more common. This is called ROT13 on the internet, for rotating the letters of the alphabet by 13. That changes "HI!" to "UV!" You might think, what's better than 13? Well, let's double the security: ROT26. Why is this stupid? I mean, there are 26 letters in the alphabet, so A becomes A. So that doesn't really help. Oh, wait. Oh, I'm pointing at something that's not on the screen, dammit. Suppose the message is, more lovingly, "I LOVE YOU," instead of just "HI!" Same exact approach, whether or not there's punctuation: "I LOVE YOU," with an input of 13, might now become this, "V YBIR LBH." And now it's getting a little less obvious what the ciphertext actually represents. And now, what's twice as secure as 13? Well, 26 is surely better, but of course, if you rotate 26 places, that just gives you the same thing. So there's a limit to this, but again, that just speaks to the cipher being used, which is very simple. There are much, much better, more sophisticated mathematical ciphers in use; we're just starting with something simple here. As for decryption, if I'm using a key of 1, how do I reverse the process? Yeah, so I just subtract 1. So B becomes A, C becomes B, A becomes Z.
And if it's 13, I subtract 13 instead, or whatever the key is, so long as sender and receiver actually know it. So in this case here, this is actually the message with which we began class. If we have this message here and I used a key of 1 to encrypt it, well, decrypting it might involve doing something like this. Here are those same letters on the screen, and I'll mention too, before we adjourn, that we might have encrypted a message in eight characters this whole day. So if any of you took the time, procrastinated, and figured out what the light bulbs spelled, and they didn't seem to spell anything in English, well, here now is the solution for cracking it. This, if I subtract 1, becomes what? U becomes T. And this is obviously-- see where we're going with this? And if we keep going, subtracting 1, so indeed, we're at the end of class now because this was CS50. And the last thing we have to say is we have hundreds of ducks waiting for you outside. So on the way out, grab your own rubber duck. [APPLAUSE] [MUSIC PLAYING]