Transcript for:
Understanding Computer Fundamentals and C Programming

[MUSIC PLAYING] SPEAKER 1: All right. So this is CS50, and this is week four. And this is actually one of the weeks that really makes CS50, CS50, insofar as we'll take an even lower-level look at how computers work and, in turn, what it is you're doing when you write code, toward an end of really giving you a complete mental model of what's going on inside, so that when you want to solve some problem, fix some bug, or write some code, you actually know what the building blocks inside of the computer itself are. We'll ultimately, too, take off some of the training wheels that we've had on for the past few weeks, particularly in C, and we'll also introduce more familiar media types. Files, like images, are sort of everywhere. And we'll introduce you to exactly what's going on when you just look at a photograph, or a GIF, or a PNG, or any kind of image on your screen like this one here. And it will become clear that, unlike in Hollywood TV shows and movies, if you try to enhance a picture like this by looking closer, and closer, and closer, as in the movies when they're trying to figure out who the bad guy is, eventually you run out of information, because there's only a finite number of bits or bytes that compose these files. So any time you've seen computers where you just hit a button, and boom, it's enhanced, and all of a sudden the suspect is clear, that's a lot more Hollywood than it is computer science. But with that said, later in the term, we will talk about artificial intelligence: even though there might not be that information there, can computers, through statistical reasoning, and modeling, and predictions, increasingly create information where perhaps there was none, just based on what's most likely to be there? So more on that before long, too.
But you'll see that all of these dots on the screen, all of these pixels, so to speak, are just a grid, up, down, left, right, that composes these pictures. And we're fortunate to have three volunteers on stage who kindly, just before lecture began, created their own pixel artwork, so to speak, on this here easel. If you'd like to spin this around, let's see what it is you've been working on. And if you'd like to introduce yourselves first as our three artists today. SPEAKER 2: Yes, I'm Talia. I'm a junior at the college studying economics with a possible computer science secondary. SPEAKER 3: Hi. My name is Bulut. I'm from BU. SPEAKER 1: Welcome. SPEAKER 4: I'm Assalo Caesar, a self-taught computer science student. I've been working as a software engineer since age 16. SPEAKER 1: Nice. Well, welcome to you all. And if you'd like to give us a description of what it is that you built out of pixels here. SPEAKER 2: So we built a firework. SPEAKER 1: OK, nice. And it's very blocky because what we've given them is Post-it notes, each of which represents one of these pixels or dots. Now, typically, a pixel might be black or white, but the Post-it notes we have here are pink or blue. So each of these represents a dot on the screen. And I gather you did one other that conveys maybe a bit more information, if you want to reveal version two. And thus we have yet more pixel art. So maybe a round of applause for what our volunteers were able to do using pixels alone. Thank you. We have, as always, a limited supply of delicious Super Mario Brothers Oreos for each of you. Thank you so much for coming up. But the point here, really, is that there's only so much you can do when you just have dots on the screen. Now, of course, the image that we saw a moment ago of these here stress balls is much higher quality. It's much higher fidelity, or more specifically, much higher resolution.
And resolution just refers to how many dots or pixels are on the screen. And the smaller they are, and the more you cram onto the screen, the clearer and clearer the images are. But at the end of the day, even this here pixel art represents what's going on on your phone, your laptop, your desktop, your TV nowadays, because all it is is this grid of pixels. Now, before we can write code that manipulates these kinds of images, we need to understand, and we need some new syntax for, navigating files. So not just text, but files stored somewhere on the computer, somewhere on the server. But let's consider how we might store even information like this. But we'll make it simpler. Here is a grid of zeros and ones, clearly. But I would argue that each of these might as well represent a pixel, an individual dot. And if that dot is a zero, it's representing the color black. If that dot is a one, it's representing the color white. Given that, can anyone see what this grid is a picture of, even though it's using zeros and ones and not Post-it notes like this here? Yeah, in the back? It's a smiley face. How do you see that? Well, in a moment it's going to be super obvious. If I get rid of the ones, leaving just the zeros, there you have the zeros that were there just a moment ago. So what this translates to, typically, on a screen is not a pattern of zeros and ones literally on the screen, but a pattern of dots. So again, one might be white, and zero might be black. But we picture it, of course, on our screens as this actual grid. And that's really all we need. Inside of a file, to store something like an image, we just need a pattern of zeros and ones. But of course, having more colors would be more interesting. And if you have a larger grid, you can do even more with pixel art.
And in fact, for fun, at the beginning of the semester, we have a staff training with all of the teaching fellows, course assistants, and teaching assistants, and we gave them all this Google spreadsheet. We resized all of the rows and columns to be squares instead of the default rectangles, and then we encouraged them to create something out of it. And in fact, just a few weeks ago, here are some of this year's creations: images made in Google Sheets by treating each of the cells as just a dot on the screen. So here, we have a team who in a few minutes made a Super Mario World, a bigger canvas, of course, than this here easel. Here we have a pixel-based version of Scratch. Here, we had an homage to the Harvard-Yale football game. And then here, we had a character of some sort. So this is what the team here did. And actually, if you'd like to play along at home, at the risk of distracting you for the entirety of lecture, if you go to this URL here, it'll give you a copy of that same blank spreadsheet. But let's talk about representing not just zeros and ones, and black and white, but actual colors. Recall from week zero, when we talked about how to represent information, colors among it, that we introduced RGB, which stands for red, green, blue. It's just a convention of mixing some amount of red, some amount of green, and some amount of blue together to give you the actual color that you want. Well, it turns out in the world of computers, there's a standard way of describing those amounts of red, green, and blue. At the end of the day, it's of course just bits. And equivalently, it's just numbers, like 72, 73, 33, which was the arbitrary example we used in week zero for the color yellow. But there tends to be a different notation, by convention, for representing colors, one that we'll see today, too, as we explore the world of memory. So here's a screenshot of Photoshop.
If you've never used it before, this is the color picker that you can pull up to pick any of millions of colors by clicking and dragging, or by typing in numbers. But notice down here. We've picked, at the moment, the color black by moving the slider all the way down here to the bottom left-hand corner. And what this user interface is telling us is that there's zero red, zero green, zero blue. And a conventional way of writing this on a screen would be, literally, a hash symbol and then three pairs of digits: 00 for red, 00 for green, 00 for blue, so #000000. If, by contrast, you were to pick the color white in Photoshop, it gets a little weird. Now it's a lot of red, a lot of green, a lot of blue, as you might expect, cranking all of those values up. But the way you write it conventionally is not using decimal but, it would seem, using letters of the alphabet: FF for red, FF for green, FF for blue. More on that in a moment. When it comes to representing red, here's a lot of red, 255, zero green, zero blue. And so the pattern is now FF0000. Before I reveal what green is, what should it probably be? What pattern? Yeah. Close. Not 0000FF, but 00FF00, because it seems to be following this pattern, indeed, from left to right, of red, green, blue. So zero red, 255 green, zero blue, and thus 00FF00. And then lastly, if we do solid blue, it's zero red, zero green, a lot of blue, and thus 0000FF. So somehow or other, FF is apparently representing the number 255. And we'll see why in just a moment. But recall that computers just speak zeros and ones. And we've seen that already in black-and-white form. We, of course, in the real world, tend to use decimal instead of binary. So we have 10 digits at our disposal, zero through nine.
But it turns out that in the world of graphics and colors, and in the world of computer memory, it tends to be convenient to use neither binary nor decimal, per se, but something called hexadecimal, where, as soon as you need more than 10 digits total, you start stealing from the English alphabet. So the next few digits are A, B, C, D, E, F. And there are other systems that use even more letters of the alphabet, but this is probably the last we'll discuss in any detail. So in this case, we have a total of 10 plus one, two, three, four, five, six, so 16 total, a.k.a. hexadecimal, or what we might call base 16. And the capitalization doesn't matter. You can use uppercase or lowercase, so long as you're generally consistent. So hexa, implying six, plus decimal, implying ten, for 16 in all: hexadecimal notation, otherwise known as base 16, for mathematical reasons that go back to our discussion in week zero. So here's some of that same reasoning from week zero. How might we go about representing, using two digits in hexadecimal, different numbers that you and I know in decimal? Well, if we consider the right-hand column as the 16-to-the-zero place and the left-hand column as the 16-to-the-one place, and we do out that math, that gives us the ones place and the sixteens place. So we've only changed the base, not the story from week zero. So if we were to start representing actual values in hexadecimal, here are two zeros. That's 16 times 0 plus 1 times 0, which, of course, gives us the number you and I know as zero. So in hexadecimal, and in binary, and in decimal, it's the same way to represent the number you and I know as zero. But here now is the number one in hexadecimal. Here's the number two. Here's the number three, four, five, six, seven, eight, nine. So it's identical up until this point to our world of decimal. But how do I count up to what you and I would call 10 in decimal, according to what we're seeing here thus far? Yeah.
So now it goes up to A, because A would, apparently, represent what you and I know as 10. B represents 11, C represents 12, and D, E, and F represent 13, 14, and 15. How, though, do I count up to 16? Yeah. Exactly. So not 10, quote unquote, but one, zero, because the one in the second column here, to the left, represents the sixteens place. So 16 times 1 gives you 16, plus 1 times 0 gives you 0, so 16 in total. So this now is the way to write the numbers you and I would think of as 17, 18, 19, 20, 21, dot, dot, dot. And if we go all the way up, as high as we can count with two digits, well, what's the largest digit, apparently, in hexadecimal? The smallest is clearly zero, and the biggest, I said, was F. So once you get to FF, the math gets a little annoying. But this is now 16 times 15 plus 1 times 15, that is, 240 plus 15. And what that gives us is the number you and I know as 255. So we saw it in Photoshop, and we've seen it now in hexadecimal. This is not math that you would ever do frequently, but indeed, it's the exact same system as week zero, just with a different base. But why all of this additional complexity? Why are we jumping through these hoops, introducing yet another base just to give us some pattern like FF? Well, it turns out that hexadecimal is just convenient. Why? Well, if you have 16 digits in your alphabet, zero through F, how many bits, how many zeros and ones, do you need to represent 16 different values? It's four, right? Because if you've got four bits, that's two possibilities for the first, times 2, times 2, times 2. So that's 16 possibilities, 2 to the fourth power. And if you've got four bits represented by a single digit, that's just convenient in practice for computer scientists and programmers. So F might indeed represent 1111. But that's not a full byte, which is eight bits. And no one counts in units of four in computing. It's always in units of, like, eight, or 16, or 32, or 64, or the like.
So it turns out, though, that because hexadecimal lends itself to representing four bits at a time, if you just use two hexadecimal digits, you can represent eight bits at a time. And eight bits is a byte, which is a common unit of measure. And this is why Photoshop, other color programs, and web development more generally all use this convention of two hexadecimal digits to represent single bytes: the digit on the left represents the first four bits, and the digit on the right represents the second four bits. So it's not a big deal, per se. It's just convenient, even though this might feel like a lot all at once. Any questions, then, on hexadecimal? Yeah, in the middle. No? OK, no questions on hexadecimal. All right. So with this system in mind, let's consider where else we might see this in the computing world. I would propose that we consider, as we've done in the past, that our computer's memory is really just this grid, where each of these squares represents a single byte. And I proposed a couple of times already that, when we talk about a computer's memory, we can think of each of these squares as having an individual location. Like, I spitballed back in the day that maybe this is the first byte, the second byte, the third byte. Maybe this is the billionth byte, so we can number the bytes inside of a computer. Well, it turns out, as we'll see today in code, computers indeed use numbers to identify all of the bytes in their memory, and by convention those numbers are typically written in hexadecimal. So what do I mean by that? Technically, if we were to start numbering these and count from zero, as most programmers would, this is byte zero, one, two, three, dot, dot, dot. This is byte 15. And if I wanted to keep going, it would then be 16, 17, 18, but that's not how you'd write them in hexadecimal.
So instead, in hexadecimal, once you hit nine, you'd use A through F, just as I've proposed. Meanwhile, if you kept going thereafter, you would have one, zero. But as you noted, this is not 10. This is 16 here, then 17, 18, 19. And so here's where things get a little weird. I'm saying 16, I'm saying 17, and you're obviously seeing what any reasonable person would read as 10 and 11. So there's this dichotomy, and we need some convention for making clear to the reader that these are hexadecimal numbers, not decimal. Otherwise, it's completely ambiguous. And the convention, which you might have seen in the real world, even though it's a bit weird, is to prefix hexadecimal numbers with 0x. It's not doing anything mathematically. It's not multiplication or anything like that. 0x just means "here comes a hexadecimal number," to distinguish it from decimal. And even though we don't have room on this screen for 255 bytes, you can start to see patterns, because we're just using those two columns as the ones place, the sixteens place, and so forth. Lowercase or uppercase is fine. All right. So with that said, let's do something more technically interesting, like looking back at some code that we've already seen and seeing what we can glean from this newfound representation of memory locations. So I'm going to go over to VS Code here, where I've opened my terminal window, but no code file yet. And I'm going to go ahead and create a file called addresses.c, because I want to start playing around now with the addresses of information in my computer. And to do this, let me do something super simple first. Let me include stdio.h. Let me do an int main(void), no command-line arguments. And then in here, let me write exactly the line of code we just saw: declare an int called n and set it equal to a default value of 50.
And just so that the program does something noteworthy, let's have it print out %i backslash n, and plug in that value of n. So this is, like, week one stuff, just creating a variable and printing out its value, to make sure that we're on the same page. So let me do make addresses in my terminal window, Enter. And when I do dot slash addresses, no surprise, I indeed see the number 50. But let's consider what that does inside of the computer now by flipping over, for instance, to this same line of code and translating it into this same grid. So here's a grid of memory, and I don't necessarily know where in the computer's memory n is going to end up, so I'm picking spots arbitrarily. But I know that an int is typically four bytes on most systems. And so I've used one, two, three, four squares. And the first four that I assume are available are down here, and I'm calling this n, and I'm putting the value 50 in it. So literally, when you write that line of code, int n equals 50 semicolon, the computer's doing something like this underneath the hood. It might be over here, might be over there, but I've drawn it simply down there. But that means that that 50, and that variable n in particular, live somewhere in the computer's memory. And where might it live? Well, I don't really know. And frankly, I'm not going to care, ultimately, after today. But let me propose that, if all of these bytes are numbered from zero on down, maybe this is address 0x123, for the sake of discussion. So it's a hexadecimal number, one, two, three. It's not one hundred twenty-three. It's one, two, three, but in hexadecimal, just because it's a little easier to say. But that variable n clearly must live at some address. So can we see it? Well, it turns out that in C, there is a bit more syntax we can introduce today that gives you access to the locations of variables inside of the computer's memory.
The first of these is literally an ampersand, and you might pronounce that the address-of operator. Using a single ampersand, you can ask the computer at what address a variable is. And then the asterisk here might be known as the dereference operator, which allows you to take an address and go to it, kind of like following a map: X marks the spot. The star will take you to that location in memory, so you can see what's actually there. So what do I mean by that? Well, let me go back over to VS Code here, and let me change my program to be ever so slightly different, as follows. I'm going to still declare n, just as before, to have the value of 50. But instead of printing out an integer, per se, I'm going to print out an address. And it turns out the format code for that, using printf, is %p. And if I want to print out now the address of n, recall that I have these two new capabilities, the first of which is germane: the ampersand will get me the address of n. So let me go back now to VS Code and change the %i to %p, which is going to show me an address, as opposed to an integer. But I need to tell printf what address to show, so I don't want to print out n, because that's literally the number 50. I want to print out the address of n, like, where is it in memory. So here I prefix it with an ampersand. And now if I go back into my terminal window, make addresses again, dot slash addresses. I'm probably not going to get as lucky as seeing 0x123, because I've got a lot more memory than that in this computer. But when I hit Enter, I do indeed see 0x something. And if I zoom in here, enhance, if you will, it happens to be, at this moment in time on this server, 0x7FFC3A7CFFBC. So it's a big address. That's a really big number if we actually did all of the math. But who really cares? Just the fact that it exists somewhere is the only point for now.
So this %p format code that we're passing into printf is leveraging the fact that C supports what are known as pointers. A pointer is really just an address, the address of some variable, and you can even store it in another variable, itself called a pointer. So what do I mean by this? Well, if a pointer is an address, we can start to tinker with this same idea as follows. Let me go back to VS Code once more and play around with syntax like this. So let me still declare a variable called n and set it equal to 50. But let's also create an actual pointer, a variable whose purpose in life is not to store a boring number like 50, but the address of some value. And the syntax for that is admittedly weird. If you want p to be a pointer, a variable that stores an address, you literally say int star, for reasons we'll sort of see. And this is different from the star I mentioned earlier, for reasons we'll also see soon. But int star p means, hey compiler, give me a variable called p, inside of which I can store the address of an integer. What address do you want to put in there? Well, now I can borrow that same syntax from a moment ago. I can use ampersand n, which is going to say, hey compiler, give me-- or hey computer, give me the address of n itself. Previously, I didn't bother with a variable. I just passed the address of n into printf directly. But I can now play with it as follows. Let me go back to VS Code here. I'll clear my terminal window. And let's play around with two variables. So int star p (it's an asterisk, but most people would say star) equals the address of n. And now I can tweak line seven ever so slightly. Instead of passing in ampersand n directly, I can literally just pass in p, for pointer. So I've not done anything really that interesting, other than add a variable, but this shows you the syntax via which you can create a variable whose purpose in life is to store one of these addresses.
So let me go ahead now and do make addresses once more. Dot slash addresses. And we should see, indeed, pretty much the same idea: the address at which n happens to be, now that I've recompiled and run my code. But it gets a little more interesting than that. I can do one more thing when it comes to my computer's memory. In VS Code here, let me clear my terminal again, and let me see if I can reverse this process. If n is 50, and p is storing the address of n, wouldn't it be interesting if I could somehow say, go to the address of n and tell me what is there? So to do that, I'm kind of undoing all of the intellectual work I've done here. If I want to print out an integer at some location, I can go back to %i, to print an integer as always. But p now is storing the address of someplace. It is the treasure map, so to speak. So if I want to go where X marks the spot, the syntax for that, I claimed a moment ago, is star p. So star p means go to that address. Don't print the address; go to that address, and show me what's inside of the computer's memory there. So now, if I go into my terminal and do make addresses, and do dot slash addresses, what should I see on the screen when I hit Enter? 50. So I indeed now see 50. Now, here's where there's an unfortunate choice of syntax from the authors of C decades ago. Clearly, I'm using star in two different locations. And suffice it to say, it doesn't represent multiplication in either of them. It's being used to represent addresses somehow. When, on line six, I specify a data type like int, and then I have a star, and then the name of a variable, that is the syntax for declaring a pointer, for declaring a variable that will store an address. What address? Well, ampersand n, whatever that is, 0x something. When you write a star and then the name of a pointer without specifying a type, that just means, go there. So the star clearly is related to addresses in both cases. It's unfortunate that it's the same symbol.
It would have been nice if they had picked a different symbol of punctuation. But they mean slightly different things in each context. On line six, we're declaring the pointer, declaring a variable called p that's going to point to an integer's location. But when I say star p, that means go to that actual location. So just try to keep that in mind, even though the difference is subtle. So what's going on, then, inside of the computer's actual memory? Well, let's consider that in pictorial form again. I've written the pointer in this way, int, then a space, then star p equals ampersand n semicolon, and that is the conventional way. That's how you'll see it on most websites and in most textbooks. Technically speaking, I will admit that it might be easier to understand if you move the asterisk a little to the left, because that makes it, visually, I think, even more clear that int star is the type of the variable p, as opposed to the star being somehow attached to the variable name itself. You might also see it written with a space on either side, which I don't think really helps anyone. But the point is that white space does not matter in this context. The conventional way is to prefix the variable's name with the star, and that avoids getting into trouble when you declare multiple variables at a time. But if it helps you, you can think of int star as being the type. It's not just an int, per se. So with that said, let's consider now the canvas of the computer's memory, inside of which we're storing n and, now, p. So previously, I proposed that n is maybe down in the bottom right-hand corner of the screen. So n is storing the number 50 here. But technically, n lives somewhere. And for simplicity, I'm going to claim it's at 0x123, rather than the bigger actual address we just saw. But what about p?
Well, p itself is another variable that I declared separately, so it's got to live somewhere in the computer's memory. And it turns out that, by convention, pointers take up more space: they typically use eight bytes nowadays, rather than just four. Why is that? Well, if you've got eight bytes, you can count even higher. You can have even more addresses, so you can have more memory in your Mac, your PC, and your phone. That's a good thing. So pointers tend to be eight bytes, which is why I've used eight squares on the screen here. But what is p actually storing? Well, it's just storing a number. Yes, it's technically an integer, but that integer should be thought of as the address of some other value. So n is down here at 0x123. p is up here at who knows what address. It doesn't matter for the sake of discussion, but its value, what it's storing with its pattern of 64 bits, is apparently 0x123. So how does this help us? Well, if you think about this a little more abstractly, who cares what else is going on in the computer's memory? It actually tends to be helpful to think about this pictorially as something like this. At the end of the day, even when you and I start writing code in C that uses pointers, we're generally never going to care about the actual addresses. Even though I showed you 0x7 something, that's not generally useful information. It suffices to know that the value exists somewhere, and to let the computer figure out how to get there. And so very often when talking about pointers, and addresses more generally, people abstract them away, so to speak. So instead of literally writing on the screen or the whiteboard what the actual address is, 0x123, who cares what it is? It suffices that it's a value that leads me to the other value that I care about, sort of the treasure map, as I described it earlier. So let's now connect this maybe a little more metaphorically.
So, Carter, here, you might have noticed that we've had for a while now these two mailboxes on the stage. This white one here is labeled p, to represent our pointer variable. Carter's is labeled n, representing our actual integer. And what's really going on here is that, if I were to access the value inside of p, much like we saw up here, that's like opening this up and figuring out what the actual value is. Now, this itself is a little arcane: 0x123. And so if we do this a little more metaphorically, we can maybe point our way, if you don't mind. So here we have a big pointer. Oh, forgive me. I guess we'll use this one here. OK. So we have this big pointer that's essentially pointing at the location in memory that we care about, be it 0x123 or something else. And then if we dereference this, that is, use the star notation, star p, that's like asking Carter to go to that location, open up the mailbox, and voila. What value do you have there? Voila. Maybe a big round of applause for Carter for having practiced this beforehand with me. All right. That was mostly just an excuse to use the foam fingers today. But with that said, that's hopefully a helpful metaphor, honestly, because pointers, these addresses, tend to be among the more arcane topics in C: even if things are kind of clicking right now, as soon as you start writing code involving addresses, it's easy to get lost in some of the details. But metaphorically, these mailboxes are meant to represent what's really going on. Mailboxes in the physical human world have addresses. I can go to that address, open it up, and then I can go to another address by following that treasure map, if you will, or pictorially here, the arrow that's pointing from one location to another. So even though it's very weird syntax, with ampersands, and asterisks, and the like, it's just addresses in memory, much like mailboxes in the real world.
So with that said, let's begin to take off certain training wheels by revisiting what strings are, as we've been using them thus far. So here's a line of code in C that we've been using since week one, really, where I declare a string variable called s and set it equal to, quote unquote, "hi!". Now, technically "hi!" is three characters, two letters and a punctuation symbol. But how many bytes is that string taking up? Is it one, two, three, or was it-- I'm seeing it here. It's four. Why? Yeah, there's always a null character that, even though you don't see it on the screen, terminates every string, we claimed a while back. So if I were to draw this, maybe "hi!" ends up in the computer's memory down here, in the bottom right-hand corner. But it is indeed four bytes, not just three, because, secretly, there's always been that null character, even though we as programmers don't often have to type it explicitly ourselves. That's what the double quotes do for us: they terminate the string with that null character. Now, recall from week two, when we talked about arrays, that we started playing around with strings as really just being arrays of characters. So we call them strings, but we can treat them as arrays of chars, so to speak. So if the string is called s, s bracket zero gives us the first char, s bracket one the second, s bracket two the third. And if you're really curious, s bracket three gives you the hidden null character at the end, which we saw on the screen as just a zero when we printed it out while tinkering with some actual code. But logically, today, it would seem that it's also true that H, I, exclamation point, and the null character must clearly live at some address. They must live in their own mailboxes, so to speak. So maybe, for the sake of discussion, this H today is at 0x123. But recall that arrays are characterized by contiguousness, from left to right.
So if H is at 0x123, it must be the case that I is at 0x124, the exclamation point is at 0x125, and the null character is at 0x126, because those are one byte apart. And I deliberately chose numbers here where, whether it's decimal or hexadecimal, it doesn't matter: these differ by just one byte. That's what implies that they're indeed adjacent, or contiguous, in memory. But what is s, then? When I declared s to be a string, what is it that's been going in s all of this time, if, clearly, the string is actually this thing here? Well, strings have kind of been a white lie for a few weeks, because s itself, technically, is a pointer. s is the address of this string. So the string is somewhere in memory, but s itself is a separate variable that gives you a clue as to how to find all of those characters in memory. So if you had to guess intuitively now, if this is the string actually in memory, that is, this is the array of chars in memory, what would logically make sense to put as the value of s? A pointer. Specifically? A pointer to H. And how would I express that? What's the actual value? 0x123 might very well suffice as the value here of s. Now, why might that be? Well, that essentially gives you enough information to find the beginning of the string, "HI!" in this case. Now, you might think, well, wait a minute. How does it know about the second character and the third character? But if you kind of rewind in time, maybe now the null character makes even more sense from week two. Why? Because if s is technically storing the location of the beginning of the string, someone's got to keep track of where the string ends, presumably. And that's effectively done by the string itself, because humans decided decades ago to null-terminate every string with a special character, zero: all zero bits, eight zero bits, specifically. And that's enough information.
The treasure map, so to speak, leads you to the beginning of the string, and then you can use a for loop, a while loop, or whatever to walk through the string, and that's essentially what printf does. You just stop as soon as you see that null character. So this, then, is what a string actually is. s is, and has always been since week one, a pointer, so to speak, that refers to the start of that array of characters. And frankly, again, who cares about the 0x123 specifics? We can abstract that away and just treat s as, literally, an arrow that points to the beginning of that string, because it will be rare that we actually care about where this thing physically is in the computer's memory. Now, before we see this in code, any questions on this revelation? Yeah. AUDIENCE: Have pointers gotten larger as computer memories have increased over the decades? SPEAKER 1: Yes. Have pointers gotten larger as computers' memory has increased over the decades? Short answer, yes. Back in my day, we were limited to, like, two gigabytes of memory total. Well, why two? Well, if you used 32 bits to represent addresses, a.k.a. four bytes, as was conventional, you can count, recall, as high as roughly 4 billion values. But generally, numbers are both negative and positive, so that halves it. So the reason decades ago that computers, PCs and Macs, could have no more than two gigabytes of memory was that the integers being used, the pointers being used, were only four bytes, that is, 32 bits, long. And so you could buy more memory. You could buy a third gigabyte, a fourth gigabyte, but you had no way mathematically to express all of those bigger locations, so it was effectively useless. In more modern times, computers tend to use 64 bits, which allows you to count crazy high, and that's more than enough to address bigger chunks of memory. Really good question. Other questions on memory thus far?
No? All right. Well, let's translate this a bit to code by going back over to VS Code here. And let me propose now that we revisit a simpler string example, as opposed to these integers. So let me go ahead and throw away all of this integer-related code. Let me, for the moment, include cs50.h so that we have access to string and other things as in week one. And let me do string s equals, quote unquote, "HI!" in all caps. And let me do a simple sanity check, printf with %s backslash n and s, just to make sure everything works as it did in week one. So make addresses, dot slash addresses, and I should indeed see "HI!" on the screen. Well, let's now tinker with what's going on underneath the hood. And now, things can get a little more memory specific. So I'm still going to declare s as a string up here. But you know what? Instead of printing out the string itself, let me treat s as the pointer I claim it is. I claim a string is just an address, so I have this new syntax today, %p, to print out pointers, that is, addresses. Let's see what s actually is. Let me do make addresses again, dot slash addresses, and there it is. It's not as simple as 0x123, but it is at location 0x55C670878004. All right. Who really cares, specifically? But if we poke around a bit more, things might make a bit more sense. Let's also print out, using %p, the address of the very first character of s. The very first character of s is known as s bracket zero. We did that in week two, treating a string as an array. But how do I get the address of a character? Well, I have our new symbol today, ampersand. So even though this looks like a mouthful, ampersand, s, square bracket, zero, square bracket, it's just two ideas combined. s bracket zero gives you the first character in the string s. And adding an ampersand at the beginning says, tell me what that address is.
So if I recompile this code, make addresses, dot slash addresses, even if you don't remember the value 0x-whatever, what are we going to see on the screen at a higher level? Perhaps the same exact thing. Why? Well, s is just an address. But what does that mean? Well, it's just the address of its first character. And we saw that per our picture a moment ago. So can I see the contiguousness of this? Well, I'm going to resort to some copy-paste just for time's sake, even though this is going to look a little silly, and I could certainly use a loop instead. But let me print out the second location, the third location, and heck, even the fourth location, whoops, the fourth location of that null character. If I now do make addresses again and dot slash addresses, and zoom in, I don't really care about what these are, specifically. But notice the first two are indeed the same, because the first represents s and the second represents the address of the first character of s, which I now reveal are exactly the same idea. And the next ones are literally just one byte away, ending in five, six, and seven, respectively. So again, the numbers in and of themselves are not useful, actionable information, but they do let us see what's going on underneath the hood. So just to rewind for a moment, let me go back to the original version, where I'm printing out the string itself using %s. Let me remake addresses to make sure that, OK, it still prints out "HI!". But what has been going on all this time? Well, let me go back to our simple line of code that we've been using since week one, which gave us a string called s, setting it equal to the value "HI!". Let me propose now that strings were indeed this white lie. And if I can unnecessarily dramatically say, here we take the training wheels off and reveal that, all this time, string is actually what, technically? Yeah. A char star. That was amazing. Thank you for that. So yeah.
So it's a char star, which, admittedly, at first glance just makes a simple idea look unnecessarily complicated. And that's why in week one we introduced these training wheels, whereby we, CS50, invented the data type called string, just to hide this lower-level detail. If you will, string for us is an abstraction. Now, that's not to say that string is a CS50-specific word. Every programmer in the world knows what a string is: a sequence of characters, an array of characters. But in C, technically, decades ago when it was invented, they didn't decide to create an actual data type called string, because, especially for those more comfortable, char star is equivalent and achieves the exact same thing, even though we didn't want to start week one with that lower-level detail. Question here in front. Sure. Can I clarify how the star makes it a string? So up until now, we've been just calling it a string, so s is a string. And that's a sufficient mental model. But technically, what is a string? I claimed pictorially with my grid of memory that a string is really just an address. It's really just the address of its first character. I then tried to demonstrate as much in code by using %p and showing you, literally, that s is a value like 0x-something, and its first character is at that same address, 0x-something. So here, when I claim that string has never really existed, except within the confines of CS50, technically, the data type of a string is best expressed as char star. Why? Well, a string clearly can't just be a char, because a char, by definition, is a single character. A string, we already know, is a sequence of characters. But how can you represent a sequence of characters? You can use a char star, which is a different data type that we're introducing today for the first time. And the star just means that s itself is not a char.
The star means that s is the address of a char. And by convention, it's the address of the first char in a string. So with that said, if I go back to my actual VS Code over here, I can change, literally, string s to char star s. I can get rid of the CS50 library, our so-called training wheels, which has been the goal for the past few weeks: to put them on initially and then take them off quite quickly. So now this is the same program, and %s is still the same. s is still the same. Everything else is still the same. All I've done is change, quote unquote, string to, quote unquote, char star, which obviates the need for the CS50 library. And if I now do make addresses and dot slash addresses, "HI!" behaves exactly as it did before. So this is now raw, native C code without any training wheels, without any CS50 scaffolding, that just uses these basic building blocks and primitives. Other questions on this? AUDIENCE: Could you please clarify why we don't use the ampersand symbol for that s, as opposed to the other ones? SPEAKER 1: Correct. Why don't we use the ampersand symbol for this, though we did earlier? So in this case, there's no need for an ampersand, because the ampersand tells you what the address of a variable is. I'll concede that it probably would be a little more consistent for us to do this, which is maybe where your mind is going. Now, never mind the fact that that looks even worse, I think, syntactically. It's a reasonable instinct, but it turns out that that, too, is what the double quotes are doing for you. The C compiler, Clang in our case, is smart enough to realize that when it sees double quotes around a sequence of characters, it should put the address of that first char in the variable for you. But when we had a variable like n, which we created, you had to distinguish n from its address. So that's why we prefixed n with an ampersand. But the double quotes take care of it for you. Other questions on these here addresses? No? All right.
Well, beyond that, let me propose that we tinker with one other idea to see how we actually invented this thing called a string. Well, I claim that string is just char star. You've actually seen this technique before. It was just a week ago that we tinkered with structures, custom data types, to represent a person. And recall that we had a structure of a name and a number representing a person. But more importantly, we had this keyword typedef, which defines your own type to be whatever you want. Now, we used it a little more powerfully last time to represent a whole structure, a person having a name and a number. But at the end of the day, we really just invented our own data type that we called, obviously, person, and that represented, indeed, this structure. But typedef was really the enabling element there. And so it turns out with typedef, you can create any number of data types of your own. For instance, if you just really can't get the hang of calling an integer an int, you can create your own data type called integer that itself is a synonym for int. Even though this one's even simpler than the struct, the way typedef works is that you can read it from right to left. This means give me a data type called integer that is actually an int. And that's the same thing that happened a moment ago: give me a data type called person that is actually this whole structure. But an integer is even simpler. Now, most people wouldn't do this. It doesn't add any intellectual enhancement to the data types, but you could do it if you really wanted. More commonly, as you'll see in code in the future, you'd typedef something more useful than an integer. It turns out, curiously, that C has no data type for a byte. There's no built-in, obvious way to represent eight bits and store whatever you want in them. However, you can use what's called a uint8_t, which is a data type that comes with C.
And frankly, those more comfortable might simply use this data type once they sort of commit to memory that it exists. But honestly, for most of us, it's a lot more convenient to think of a byte as being its own data type. When you want to write code that manipulates one or two or more bytes, wouldn't it be nice to have a data type called byte? So it turns out that you can represent a byte, which is eight bits, using an unsigned integer with 8 bits. And this is just a data type that's declared in another C header file, stdint.h. But long story short, you'll see and use this before long. It's just a synonym to make things a little more user-friendly, like person, like string, like byte. So what is in the CS50 header file, among other things? Literally, this line of code. This is the single line of code that we deploy from week one onward that teaches Clang to think of the word string as being synonymous with char star, so that you all never have to type, or know, or think about char star until, wonderfully, today in week four, a few weeks later. So that's all we've been doing. That is the technical implementation of the training wheels. It's just using a custom data type in this way. So how about one or two more examples here with our addresses, so that we can tinker a little bit further? It turns out that, once everything in the world is addressable using these pointers, that is, using numeric addresses to represent where things are in memory, you can do something called pointer arithmetic. And here, too, we the programmers generally don't care what the specific values are, but we care that they do exist. And if they do exist, we can do some arithmetic on them and add one to go to the next byte, add two to go to the next, next byte, add three to go to the next, next, next byte, and so forth. So pointer arithmetic literally refers to doing math on addresses. So how do we translate this into something actionable?
Let me go back to VS Code here, and let me propose that we do something like the following. I'm going to throw away my first printf here. And I'm instead going to print out this string character by character, just like we did in week two. Let me go ahead and call printf, pass in %c for a single char, backslash n, comma, and now I want to print out the first character in s. Using array notation, what do I type to print the first character in s? Yep, over here. s bracket zero. So s bracket zero gives me the first character in s. And let me copy-paste just for demonstration's sake here, inside of my same curly braces, and print out the second char and the third. I don't care about the null character; I just want to print the string itself for now. So even though this is jumping through way more hoops than just using %s and printing the whole thing at once, it's again just demonstrating how we can, at a lower level, manipulate these strings. So let me do make addresses, dot slash addresses. And yet again, we see, somewhat stupidly, one per line, H, I, exclamation point. I can, of course, fix that by getting rid of this backslash n and this one, and leaving the last backslash n. So let's just make it look a little prettier. Make addresses, dot slash addresses, Enter, and now it prints out all on one line. But now, using pointer notation, it turns out we can do one other thing, which, admittedly, for now is going to feel like unnecessary complexity. But it's actually a really helpful tool to add to our toolkit, so to speak, whereby I could instead do this. To print out the first character in s, yes, I can treat it as an array and get the zeroth index. However, what is s? s is just the address of a string. What does that mean? s is the address of the first char in the string. So if I do star s, what's that going to print? Presumably H, right? Because if the first character in s is H, then star s will go to that address and show me what's actually there.
And let me go ahead and do this again. Let me copy-paste twice and then tweak this a little bit. I want to go to the next byte over. Well, I could do s bracket one. But I could instead do star of, in parentheses, s plus one. And I could instead do star of, in parentheses, s plus two, thereby doing what we're calling pointer arithmetic, math on addresses. And now, if I go ahead and rerun make addresses, dot slash addresses, voila. Whoops, I forgot my backslash n. Let's fix that just to be tidy. Dot slash addresses, voila. There is our "HI!". Now, this is not how a normal person would print out a string, but it does go to show you that there's not really been any magic. These characters are just where we predicted they would be. And now that you have this star notation, the dereference operator, which means go there, you have the ability to access individual values. You even have the ability to ask where those things are by using ampersands, as well. But it turns out that the reason we introduced the array syntax first is that the array syntax is what the world would call syntactic sugar for exactly this. When you say s bracket zero, the compiler is essentially doing star s and saving you the trouble. When you do s bracket one, the compiler is essentially saving you the trouble of doing star, in parentheses, s plus one, and the same for the third char, as well. So all this time, pointers have been there underneath the hood. They are what allow us to go to very specific memory locations. They are going to be what allow us soon to start manipulating files, whether it's photographs of stress balls or CSI-style content. But for now, I think we should take our 10-minute break, where whoopie pies will now be served in the transept. See you in 10. All right. So we are back, and we've clearly drawn too much attention to the stress balls today, because now we're all out of these and whoopie pies. But more next week.
In the meantime, though, we thought we'd now use some of these new building blocks, this idea of being able to manipulate underlying addresses, to revisit a couple of problems that we kind of swept under the rug previously by avoiding them altogether. By that, I mean this. Let me go over to VS Code. And let me create another example called compare.c, whose purpose in life, in a moment, is going to be to compare values in a very week-one way, too. So let me go ahead and include cs50.h. Let me go ahead and include stdio.h. Let me do int main void, no command-line arguments. And in here, let me just get two integers using get_int as follows. So get_int, and we'll ask for i. Let's go ahead and get_int and ask for j, just so that we have two things to compare. And then I'm going to do something super simple. So if i equals equals j, then let's print out, as we actually did in the past, same backslash n. Else, if they're not the same, let's, of course, print out, for instance, different. So this is a super simple program that we used the first time around really just to demonstrate conditionals. But now we'll use it to tease apart some subtleties. So let me go ahead and compile this with make compare. Dot slash compare. And we'll compare one and one for i and j, respectively. Those are, of course, the same. Let's compare one and two. Those are, of course, different. So long story short, this program seems to work, and we won't dwell much further on it. But let's consider for a moment what's going on inside of the computer's memory when that code is executed. So here's my canvas of memory. Maybe i ends up over here. Maybe j ends up over here. Each of them I've drawn as four squares because integers are typically four bytes, or 32 bits. So i has the value 50 here, and j has the value 50. I happened to type one and one a moment ago, but assume that I had typed 50 in both cases. They both live at these two separate locations. All right. So that's all fine and good.
And when we compare them, of course, 50 and 50, or one and one, are in fact the exact same. But what if we compare different types of values? Let me go back into VS Code here. And instead of integers, still using the CS50 library, let's use some strings instead. So let me go ahead and change my i and j to s and t, respectively. So string s equals get_string, and I'll prompt for s, quote unquote. And then string t equals get_string, and I'll prompt for t, quote unquote. And then down here, I'll compare s equals equals t. So here's the code, almost the same logically. I'm just getting different data types instead, still using the CS50 library. So let's do make compare again, dot slash compare. And let's type in something like "HI!" and "HI!". And that's interesting: different. All right, let's try it again. So maybe lowercase "hi", "hi". No, those are different. Let's do it one more time, like "hi", "bye". OK, so it half works. But it seems to be saying different no matter what. Well, why might that be? Well, let me first just peel back a layer. We already know that strings don't technically exist. They're really char stars. And string here is char star. So does this reveal, perhaps implicitly, why s and t are being thought to be different, even though I literally typed "hi" twice? Yeah, on line nine here, I'm really just comparing the addresses that are in s and t. And that's why I changed it to char star: not to change anything, but to make it even clearer that s and t are, in fact, addresses. They're not strings, per se. They're the addresses of the first characters in those strings. And even though I happened to type the same words, it would seem that they're ending up in different places. So here's another canvas of memory for this program. And here, for instance, might be s, with enough room for eight bytes up here as a pointer. Here maybe is where "hi" ended up for this particular story.
Well, what's actually going in s? Well, if h is at 0x123, i is at 0x124, and so forth, what's going in s is 0x123. But when I use get_string a second time and type in "hi!" again, even the exact same way, uppercase or lowercase, t is ending up, presumably, somewhere else in memory. So it's maybe using these eight bytes over here. The same letters, coincidentally, by nature of how get_string works, are ending up in the computer's memory maybe down there, bottom right-hand corner. Those are presumably different addresses: 0x456, 0x457, 0x458, 0x459. So what's going to go in t as its value? 0x456, according to this example. And so when you literally compare s equals equals t, no, they're not the same. They are, in fact, different, even if what they're pointing at happens to be the same. So the computer's taking us literally. If you compare s and t, it's going to compare what their values actually are. And their values are the addresses of the first letter of this string and the first letter of this string, respectively. And if those addresses differ, which they clearly do, they're going to be deemed different. Now, you might wonder, well, this just seems stupidly inefficient. Why put the same string in two different places? Well, maybe the string needs to be changed later on, and we might want to have two different versions thereof. And frankly, the first time you call get_string, it does its thing. The second time you call get_string, it does its own thing. It doesn't necessarily know how many times it's been called in the past, so maybe there's no communication between those calls. And so surely, it's going to do the simple thing and just create more memory for each of those strings, duplicates though they may seem to be. So what does this imply? Well, you might recall that we avoided this problem altogether just a week ago by using what solution on line nine? I did not compare two strings using equals equals last time.
Exactly. We used the strcmp function, which is declared in string.h, very deliberately at the time, because I didn't want to trip over this mistake until we were sort of ready and had the vocabulary to discuss it. I did not do s equals equals t, even though, logically, that's what you're trying to do: compare for equality. But now you know what a string is: an array of characters starting at some address. You really need something to do the heavy lifting of comparing every one of those chars from left to right. We did that ourselves by implementing it in code two weeks ago, but strcmp does it for us. So strcmp of s comma t actually, somewhat weirdly, returns three possible kinds of values: zero if they're the same, a negative number if s comes before t, or a positive number if s comes after t. So strcmp, remember, can be used for alphabetizing things, or ASCII-betizing things, based on those ASCII values. So with this version, if I open my terminal window now and do make compare, dot slash compare, and type in "hi" and "hi", now, in fact, they're the same, because strcmp is doing the work of comparing them char by char. And if I do "hi" and "bye", those are now, in fact, different. So we avoided the problem last time for this very reason: simply using equals equals would not have worked. Yes. AUDIENCE: About those values: is it like one and negative one, or one, two, three, depending on how different they are? SPEAKER 1: Oh, good question. So when using strcmp, the documentation says that it will return zero, a positive number, or a negative number. It doesn't promise a specific number. So the magnitude of the integer that comes back has no meaning. It might very well be one, zero, and negative one, but there's no guarantee. And so you can check for equality with equals equals zero, and otherwise you should check for greater than or less than zero, not for a specific number. So it just gives you relative ordering.
It doesn't give you any more detail than that. All right. So if we were to take this lesson a step further, just to hammer home the point that these strings s and t must clearly live at different addresses, let's actually try to see this in code. So let me go back to VS Code here. Let me go ahead and just remove all of the conditional code, and instead do something old school, like print out %s backslash n and s. Then let's go ahead and print out %s again, but with t, just to see the two strings as being duplicative. So here I go. Make compare, dot slash compare, "hi!", "hi!". And of course, they're actually the same. But if I want to see where s and t are, I can change the %s to what? %p and %p here. And I don't need to use an ampersand before the s or the t, because they are already addresses. That was today's big reveal. And it turns out that printf is smart enough, when you use %s and give it the address in s or the address in t, to just go there for you. So printf has been doing all of that for us with %s. But %p is going to print out those raw addresses. So let me do make compare, dot slash compare, "hi!" once, "hi!" twice. And here, now, we should see the addresses at which "hi!" lives. It's not going to be as simplistic as 0x123 and 0x456. But if I go back to my terminal and hit Enter, indeed, I get two different hexadecimal values, which make clear that, if I were to naively compare them with equals equals, they're always going to be different, even if I typed in the same words. So there are implications of this, especially if we want to start changing things in memory. So for instance, let me create a new program called copy.c. And in here, we'll start somewhat similarly with cs50.h. We'll include stdio.h. And preemptively, I'm going to go ahead and include string.h, as well. I'm going to declare main as not taking any command-line arguments.
And this time, I'm just going to get one string, s, with get_string, and I'll prompt the user for s. And now, let me go ahead and naively do this. Let me give myself a new string called t and just set it equal to s, my instinct being that this is how I've copied integers before. This is how I've copied floating-point values before. This surely is how I copy strings, using the assignment operator as usual. Let me now, for the sake of discussion, propose that I want to capitalize the first letter in t, but not the first letter in s. So logically, based on week two syntax, I'm going to go into the t string, go to location zero, and set it equal to toupper of t bracket zero. So recall, we introduced toupper. It's just a handy function for capitalizing things. There's tolower, and there's a bunch of others, as well. I didn't include the header file yet, though, so I'm going to go back and include-- anyone remember where these are? Yeah, ctype.h. And it's fine to look that up in the manual if you ever need it. So here I am, a little naively, capitalizing the first letter in t. Technically speaking, I should check what the length of t is first, because if there are no characters there, if it has zero chars, there's nothing to uppercase. But for now, I'm going to keep it simple and just blindly do that there. Now, let me go ahead and print out, with %s, the value of s. Then let me print out, with %s, the value of t. And I should see s still in lowercase and t with a capitalized first letter. All right, here we go. Make copy, dot slash copy. And I'm going to deliberately type it in lowercase: "hi!". And we see that now they're both capitalized, it would seem. Intuitively, why might that be? Exactly, the addresses are the same. So if I use the assignment operator and just do t equals s semicolon, it's going to take me literally and copy the address in s over to t, so that, effectively, they're pointing at the same thing.
So if we draw another picture here, for instance, here maybe is s, and here maybe is the lowercase "hi!" that I first typed in, down here in memory. Maybe that's at 0x123 again, and therefore that's what's in s. When I then create the variable t by declaring it to be a string, as well, that gives me another variable here called t. But I'm just setting it equal to s. I'm not calling get_string again in this version of copy. That was in compare. In copy, I'm literally just copying s into t. So that just changes t's value to 0x123, also. And if we abstract away all of these addresses, that's essentially like s and t both pointing to the same place. So if I use s bracket zero or t bracket zero, they are one and the same. So when I use t bracket zero to uppercase, it's changing that lowercase h to capital H. But again, both strings, both pointers, are pointing at the same value. And this should be even clearer as of today if I go back into VS Code and, indeed, take these training wheels off and treat string as what it is, char star. That indicates that both s and t are just addresses, which makes even clearer, syntactically, that this is probably the picture of what's going on underneath the hood. Now, just to make the code a little more robust, let me at least be a little careful here. If the string length of t is greater than zero, then, and only then, should I index into the string and go to location zero. That doesn't solve the fundamental problem, but it at least avoids a situation where maybe the user just hits Enter, gives me no characters, and I try to blindly uppercase something that's not there. But there's still a bug. So how do I actually solve this? Well, it turns out we need two other functions that we haven't had occasion to use.
But these are perhaps the most powerful, and they're going to allow us to solve even grander problems next week when we discuss things called data structures. But for now, let's very simply solve this idea of copying a string. Let me go back into VS Code here, and let me give myself one more header file that's called stdlib.h, for standard library. So include stdlib.h, in which both of these functions, malloc and free, are declared for me. And now, in my code, I'm going to behave a little bit differently here. Clearly, I got into trouble by just blindly copying the addresses. What I really want to do when I copy strings, presumably, and then uppercase one of them, is create a duplicate string, a second array that is identical, but is elsewhere in memory. So the way to do this might be as follows. Instead of just setting t equal to s, I should really call this brand new function called malloc, which stands for memory allocate, and it takes a single argument, which is just the number of bytes you would like the operating system to allocate for you. So whether you're using this on Windows, macOS, or Linux in our case, this is a way I can literally ask the operating system, please find for me some number of bytes in the computer's memory that I can now use for my own purposes. So for malloc here, I technically need three bytes for H-I exclamation point, but that's not going to be enough, because I need a fourth for the null character. So I could put four here. But that's stupid. I shouldn't just hardcode a number like this, as we've seen. So I could probably do strlen of s to dynamically figure out how many bytes I want for the copy. But that, too, is not enough, because strlen returns the human-readable length, so just H-I exclamation point. So I think I want a plus one in there, too. So that just means get the length of whatever the human typed in, and add one for the null character to make sure that we're not undercounting. Now, what can I then do?
Unfortunately, I need to do a bit of work here. So let me actually go ahead now and do something like this. For int i equals zero, i is less than the string length of s, i plus plus. And then inside of this loop, I could copy into the ith location of t whatever is in the ith location of s. Now, this is a little buggy. One, it's inefficient to keep asking this question. We talked about this in the context of design. I should probably improve this by giving myself a variable like n, setting that equal to the string length once, and then checking i is less than n again and again, just so I'm not stupidly calling string length four different times, or three different times. But this, too, is slightly buggy, and this one's very subtle. This does not fully copy s into t. Does anyone see the very subtle bug that I've introduced? Sorry? Yeah, I'm forgetting the backslash zero. So even though I'm copying H-I exclamation point, or whatever the human typed in, I need to go one step further deliberately to make sure I also copy the backslash zero, or at least manually put it in myself. So I could solve this either by going up to and through n, i is less than or equal to n, or I could do i is less than n plus one. That, too, would be fine. Or if I really want, I could do this, like t bracket three equals, in single quotes, backslash zero. But again, I shouldn't get into the habit of hard coding things. I could do t bracket string length of s, and that would give me the location of the null character, which would also work. But that, too, is stupid. I might as well-- or rather, it's just unnecessarily complex. Let's just do this, change one symbol, and boom. Now we're copying all three characters, and the fourth, as well. All right, so with this said, let's go ahead now and make sure that t is indeed of length greater than zero. Then let's go ahead and capitalize t as before and print out the results.
So let me go ahead and open my terminal window, make copy, dot slash copy, and I'm going to deliberately type "hi" in lowercase. And now we should see distinct s and t: s is still lowercase, and t is now capitalized. But why is that exactly? Well, let me go back to my computer's memory again and propose that, if what I had before was this situation, where s is pointing at this chunk of memory, and t was accidentally pointing at that same chunk of memory, what we really want to do is have t point at a new chunk of memory. And malloc is what gives us this chunk of memory. And then, using that for loop, I can copy the H, the I, the exclamation point, and even the backslash zero. So now, this is a little subtle, but malloc is what gives me access to this new chunk of memory. malloc takes one argument, the number of bytes that you want it to find for you. Take a guess. What value is malloc returning? Conceptually, it's returning a chunk of memory, but that's kind of hand-wavy. What might malloc actually be returning? AUDIENCE: Maybe the pointer to the first character? SPEAKER 1: Perfect. malloc is returning the address of that chunk of memory, not the last address, but the first address. And here's a difference with strings. This chunk of memory is not magically terminated with null for you. I had to do that with the for loop. malloc, and in turn your operating system, does keep track of how big these chunks of memory are. So even though it's only returning the address of the first byte of that memory, the operating system is going to know that it used up four bytes here, four bytes here. And it will keep track of that so that it doesn't give you an overlapping chunk of memory in the future, because that would be bad. Your data would get corrupted. But you, similarly, have to remember or figure out how many bytes are available thereafter. It's up to you to manage it, as by putting a null character there yourself.
So if I go back to my code now, let me harden this code just a little bit more, as follows. If I go back to VS Code here, it turns out something can go wrong and I can be out of memory. Maybe I've got an old computer, or maybe I'm typing something way bigger than three characters, like three billion characters, and the computer might genuinely run out of memory. So I should be in the habit of doing this: if t equals equals a special symbol called NULL, with two Ls, and I promised this would eventually exist, I should just return one now, or return two, return negative one, return any value other than zero, and just abort the program early. That means, if malloc returns NULL, there's not enough memory available. And it turns out I'm going to do one other crazy thing, even though we've not expected you to do this thus far. Technically, get_string, if you read the documentation, the manual, can return NULL, too. Because if you type in a crazy long string, and the computer can't fit it in its memory, get_string needs to signal that to you somehow. And the documentation says that, if get_string returns NULL, you should not trust what's in it. You should just exit the program immediately, in this case. But there's one other improvement we can make here. And even though this is making the code seem way longer, most of what I've just added is just error checking, just mindless error checking to make sure that I don't treat s as being valid, or t as being valid, when it isn't. It turns out this loop is stupid. I don't need to reinvent this wheel. For decades, people have been copying strings, even in C. So it turns out there's another fun function called strcpy, wonderfully enough, that takes the destination as its first argument and the source as its second argument. And that will copy s into t for me.
So that does the equivalent of that for loop, including the backslash zero. However, there's one other function, recall, that was on our cheat sheet a moment ago, whereby malloc is accompanied by one other function called free. So free is the opposite of malloc. When you're done with your computer's memory, you're supposed to give it back to Windows, to macOS, to Linux so it can reuse it for something else. And frankly, if you've ever been using your computer for hours on end, days on end, and maybe it's getting slower and slower, maybe it's Photoshop, maybe it's a really big document, generally, really big files consume lots and lots of memory. If the humans who wrote that software, be it Photoshop or something else, wrote buggy code and kept calling malloc, malloc, malloc, malloc, asking for more and more memory, but never called the opposite function, free, your computer might actually run out of memory. And typically, the symptom is that it gets so darn slow it becomes annoying to use. The mouse starts moving very slowly, maybe the thing freezes altogether, or the computer crashes. Bad things happen when you run out of memory. So in my case here, if I go back to VS Code, it's on me in this language called C to manage the memory myself so that, when I have called malloc, thereafter, I had better free that same memory. Now, I don't want to free it right away. I want to free it when I'm done with it. So the very last thing I'm going to do in my program here is call free on t, because t is what I malloced up here. So at the very bottom of my program, I should free t. And then, just to be super nitpicky, let me return zero to signify success at this point. Now, there's a slight asymmetry here, which is a little inconsistent. Even though get_string, I'm going to imply, is still allocating memory for me, and it does use malloc, get_string and CS50's other functions are special.
They manage memory for you, so you do not and should not free memory that get_string returns to you. We handle all of that for you. But that's a training wheel that's going to be taken off as of this week anyway, so it's kind of moot. So not to worry. But I'm only freeing memory that I malloced. All right. NULL, then, means the-- well, what is NULL? It's just an address, and it's literally the address zero. So there's this theme. NUL, with one L, recall, was the terminating character, backslash zero, which just means the string ends here. NULL, with two Ls, which is not greatly named, but it's what humans went with years ago, just means the address zero. And even though I've been playfully saying that, oh, in the top left is address zero, and then one, and then two, and then three, the address zero is hands off. It's kind of a wasted byte that your computer should never use, because the computer uses zero as a special sentinel value, NULL, to signify error. So we're spending one byte out of billions nowadays just to make sure that there's a special value that can come back to indicate when something has gone wrong. All right. That was a mouthful. Any questions on this copying of strings, this malloc-ing, or this freeing? Oh, all right. So let me give you a tool with which to make some of this stuff easier, so that when you make mistakes or have bugs, as you invariably will, you can chase them down without having to raise your hand, without having to ask the duck. You actually have more technical tools with which to diagnose the problem yourself. And there's this new tool that we'll introduce today called valgrind. And valgrind's purpose in life is to check your usage of memory for you. Admittedly, it's an older program. It's pretty arcane in terms of its interface, and there's going to be a mess of output on the screen.
But there are going to be certain patterns of mistakes that you'll notice, and I'll demonstrate a couple of them now so you can see where and how you might go wrong. So I'm going to go over to VS Code here. I'm going to create a program called memory.c that is deliberately buggy, but it's not going to be obviously buggy at first. So by that I mean this. Let me include stdio.h. Let me also proactively include stdlib.h so I can use malloc. Let me declare main with no command-line arguments, and let me do something very simple. Instead of just declaring an int called x, let me be a little crazy and manually allocate this memory myself. So int x just gives me an integer, and it has since week one. But now that I have malloc, I can kind of take control over this process. So let me declare, not an int, but an int star called x. So give me the address of an integer, and let me store there the return value of malloc by asking malloc for, let's say, four bytes. So I know that ints are typically four bytes. If I want four bytes, I just tell malloc, give me four bytes. Now, frankly, this is a little stupid. I shouldn't just assume that an int is always going to be four bytes on everyone's computer. So there's this function you can start using called sizeof, or this operator, technically, where you can say sizeof int. And even if you're on an older computer, for instance, really old at this point, sizeof int will return the correct value, no matter what. You don't have to assume that it's, in fact, four. But you know what? I'd actually like more than one int. Let me actually treat x as an array of integers. So if I want an array of integers, I could do this: give me three integers. But no, no. Let me not do week two syntax. Let me do this myself as follows. Let me ask for three times the size of an int. So that's typically going to give me 12 bytes. But this makes x effectively an array.
And this is kind of deliberate now, because if an array is just contiguous memory, and malloc returns to you a chunk of contiguous memory, you can treat what comes back from malloc as an array. And indeed, that's what we're doing with strings. We're treating chunks of memory as arrays of chars. So let me do something arbitrary here. Let me go to x bracket one and set it equal to 72. x bracket two, set it equal to 73. x bracket three, set it equal to 33. And we did this a couple of weeks ago. That's "HI!" but in ASCII. Let me go ahead and make memory, and it seems to work fine. Let me do dot slash memory, and no problem. There are no error messages from the compiler. There are no runtime errors when I actually run the code. But does anyone see any of the bugs thus far? What did I do wrong? Let me look a little in the back. Yeah. AUDIENCE: Does it not know when the array ends? SPEAKER 1: It doesn't seem to know when the array ends. Or more specifically, I'm not respecting when the array ends, because I'm sort of stupidly starting at one, then two, then three. But technically, if I asked for three of these things, I should have done bracket zero, bracket one, bracket two. And there's a second, more subtle bug that you would only know from today. Yeah. OK, so the suggestion is that I don't necessarily know when one integer ends and the next one begins. That's actually not a problem, because on a given system, integers are always the same size. So the computer can be smart enough to go from here, four bytes this way, four bytes this way, four bytes this way. That's OK. Strings are problematic because who knows how big the sentence was that the human typed in. But there's a more subtle bug. What have I not done? I didn't call free. So I didn't practice what I just preached. Anytime I malloc, I should call free. But again, per my terminal window, neither of these bugs seems obvious. You might submit this code, or deploy it to your software, and be none the wiser.
But a tool like valgrind can actually help you find these things. So let me increase the size of my terminal window. Let me run this command, valgrind, on my program. So dot slash memory is how I ran it a moment ago. Just like debug50, which you type before the name of your program, valgrind, you type before the name of your program. And the output is going to look crazy, but this is useful. Why? So notice at the very top of this, we're just seeing what version of valgrind we're using and what command we ran. But this starts to get juicy, and I'll highlight this here: "Invalid write of size 4." Invalid write. So writing means changing information, like setting a value or assigning it a value. And this is useful here. The problem is in memory.c at line nine. So colon nine means line nine. All right, so let me go back to my code, look at line nine, and oh, interesting. So invalid write of size four. So it's cryptic, but size four I know is the size of an integer. So I'm probably doing something stupid on line nine involving changing an integer. And sure enough, even though it's not super obvious, x bracket three, oh, obviously, this doesn't exist. So I have to change the indices to zero, one, and two. One and two were OK in terms of memory, even though starting at one was logically the wrong thing. Now I think this will get rid of this error. So let me clear my terminal window and make it bigger again. Let me recompile my code because I made a change. Let me rerun valgrind of dot slash memory. And now, that error went away. There's a mess of output here, but that error went away. But this is interesting here now: "12 bytes in 1 blocks are definitely lost in loss record 1 of 1." So unnecessarily verbose, but the hint here is that I somehow lost some bytes, otherwise known as a memory leak. So earlier, when I described an imaginary bad programmer who kept calling malloc, malloc, malloc, and never freeing, that's what's called a memory leak, where you're sort of losing track of your memory and never freeing it again.
So I've definitely lost 12 bytes in one block, whatever a block is, in this case. This is a little less obvious. It's up to us to notice that, OK, wait a minute, memory.c line six is somehow germane. Let me go back to-- oh, this is where I called malloc. And valgrind doesn't necessarily know when I should free the memory. That's up to me, but I should probably free it at the end of my function when I'm definitely done with it, because once you free your memory, you should not touch it again through that variable, unless you actually change what its value is. So now I've done this, and to be clear, this program does nothing useful. This is just an intellectual exercise, not anything productive. Let me do make memory one last time. Let's do valgrind, dot slash memory. And let me grow my terminal window again and hit Enter. And even though it's still a lot of output, and still kind of cryptic, at least it says no leaks are possible. So now this is my own sort of teaching assistant telling me, before I submit the code or deploy it to production in real software, that at least there seem to be no memory-related errors. So valgrind is not for logical bugs. It's not for syntax errors. It's for memory-related bugs, as of today. Questions on any of that? No? OK, so what else can go wrong? We mentioned these in the past. It turns out that garbage values are a thing. And recall that, if you declare a variable but don't give it a value with an equal sign, and you just blindly start using it, like printing it out or doing math on it, you might be manipulating a garbage value, which is some number that's essentially a remnant of your computer having been on for a while. Because if you're using this canvas and reusing it again and again, surely there are going to be patterns of zeros and ones there that you didn't put there yourself, at least in the moment. They might be remnants of the past.
So garbage values are values of variables that you did not proactively set yourself as intended. So we can actually see this. Let me go ahead and whip up a really quick program here after shrinking my terminal window. Let me close memory.c. Let me go ahead and open garbage.c. And in here, I'll include, how about, stdio.h? Let's include stdlib.h. Actually, we don't even need stdlib.h. Let's go ahead and include stdio.h and then int main void. And then inside of the curly braces, let's give me a really big array of scores, like 1,024 scores, as if it's a really busy semester. And then let me go ahead and just blindly iterate from i equals zero on up to i is less than 1,024. And I'm not going to bother with constants. I'm just going to play around with these numbers for a moment. And, oh, thank you. Oh, cookies for you. OK. OK, here we go. OK, come on up. Thank you very much. Fair is fair. OK. Thank you. OK. OK, now everyone's really paying attention. All right. So in my loop here, I'm just going to do something stupid, like print out all of the values in the scores array using %i, even though I did not put anything in this array. So on line five, I'm obviously declaring an array of size 1,024 for that many ints, but I'm never actually putting values in there myself, with get_int, or any other function. So there are garbage values there. There are presumably 1,024 garbage values there, and we can now actually see them. Let me make my terminal window bigger. Let me make garbage, no pun intended, and then run dot slash garbage. And there's going to be way more than even fits on the screen. But who cares? We just need to see a few. There are some of the garbage values in the array. So to make super clear: when you create variables of your own and do not give them values of your own, who knows what may be there? In some cases, memory gets automatically initialized for you to all zeros, but that is not always the case.
And in general, distrust the variable unless you yourself have put a value there. So how now might we think about potential problems? Well, consider this code here, which, like this program, is more for discussion than actual utility, where at the top of it, I declare a variable called x and a variable called y, both of type int star. So x and y are supposed to be the addresses of two integers. I then call malloc for the size of an int and store the result in x. So I'm giving myself space for an int at x, even though, obviously, I could have done this weeks ago by just not using the star and saying give me an int x. Now I'm doing it the low-level way, malloc-ing the memory for x myself. I'm then saying go to x, go to that address in memory, and put the number 42 there. I'm then saying go to y and put the unlucky number 13 there. But what's worrisome about this line here? After this line, this line, this line, something's bad, I think. Yeah, I never allocated memory for y. So specifically, I never assigned y a value, which means it's a garbage value, which is still a number. Maybe it's zero. Maybe it's a big number. Maybe it's a negative number. And if it's a positive number, it could be an actual address somewhere in the computer's memory. But star y means go there. Who knows what memory I'm touching? That's how programs crash if you touch memory that you're not supposed to. So let me pretend that I didn't do that, and instead just forge ahead and set y equal to x so they're the same. And I think what that would mean is now, if I do star y and go to that address, that's the same thing as going to the address in x. And I think this will have the effect of changing the 42 to 13. So this code is correct, so long as I don't dereference y with star y notation before giving y an address. So this gets a little abstract, even though this is just an exercise here.
And our friend Nick Parlante, a professor at Stanford, wonderfully put together a little claymation that's fun to take a look at, whereby if I go ahead and open up this file, we'll be introduced to someone who's a little famous in the world of computing named Binky, if we could dim the lights and take a look at what bad things can happen if you don't manage your memory properly. SPEAKER 5: Hey, Binky. Wake up. It's time for pointer fun. SPEAKER 6: What's that? Learn about pointers? Oh, goody. SPEAKER 5: Well, to get started, I guess we're going to need a couple pointers. SPEAKER 6: OK. This code allocates two pointers, which can point to integers. SPEAKER 5: OK. Well, I see the two pointers, but they don't seem to be pointing to anything. SPEAKER 6: That's right. Initially, pointers don't point to anything. The things they point to are called pointees, and setting them up is a separate step. SPEAKER 5: Oh, right, right. I knew that. The pointees are separate. So how do you allocate a pointee? SPEAKER 6: OK. Well, this code allocates a new integer pointee, and this part sets x to point to it. SPEAKER 5: Hey, that looks better. So make it do something. SPEAKER 6: OK, I'll dereference the pointer x to store the number 42 into its pointee. For this trick, I'll need my magic wand of dereferencing. SPEAKER 5: Your magic wand of dereferencing? That's great. SPEAKER 6: This is what the code looks like. I'll just set up the number and-- SPEAKER 5: Hey, look. There it goes. So doing a dereference on x follows the arrow to access its pointee, in this case, to store 42 in there. Hey, try using it to store the number 13 through the other pointer, y. SPEAKER 6: OK. I'll just go over here to y and get the number 13 set up, and then take the wand of dereferencing, and just-- whoa. SPEAKER 5: Oh, hey, that didn't work. Say, Binky, I don't think dereferencing y is a good idea because setting up the pointee is a separate step, and I don't think we ever did it. SPEAKER 6: Good point. 
SPEAKER 5: Yeah, we allocated the pointer y, but we never set it to point to a pointee. SPEAKER 6: Very observant. SPEAKER 5: Hey, you're looking good there, Binky. Can you fix it so that y points to the same pointee as x? SPEAKER 6: Sure. I'll use my magic wand of pointer assignment. SPEAKER 5: Is that going to be a problem like before? SPEAKER 6: No, this doesn't touch the pointees. It just changes one pointer to point to the same thing as another. SPEAKER 5: Oh, I see. Now y points to the same place as x. So wait. Now y is fixed. It has a pointee, so you can try the wand of dereferencing again to send the 13 over. SPEAKER 6: OK. Here it goes. SPEAKER 5: Hey, look at that. Now dereferencing works on y. And because the pointers are sharing that one pointee, they both see the 13. SPEAKER 6: Yeah, sharing. Whatever. So are we going to switch places now? SPEAKER 5: Oh, look. We're out of time. SPEAKER 6: But-- SPEAKER 1: All right. So our thanks to Nick. I can only imagine how many hours he spent making that happen. But hopefully, it gives you more of a visual as to what's happening when we're dereferencing these addresses, and going to them, and assigning values, and as per Binky's explosion there, what happens when you dereference values you shouldn't. So related thereto, let me do this. Let me go over to VS Code and open up now a program I wrote in advance called swap.c. And the purpose of this program is just to swap the value of two variables. So let me walk over to the code here and point out that, in main, I've got two variables, x and y. No pointers, no magic there. Just x and y are one and two respectively. I've got a couple of printfs here saying x is %i, y is %i, passing in x and y, just so we can see that x and y are indeed one and two. I'm then calling a function called swap, which presumably, should swap the two values. 
And then I'm just printing the exact same thing again, hoping that it's first going to say one, two, and then it's going to say two, one, thus achieving the idea of swapping here. And here's swap. Swap takes in two integers, a and b, though I could have called them whatever I want. It temporarily puts a in temp. It then changes a to b. It then changes b to temp, and then that's it. It's a void function, so it doesn't return anything, but it does all of the mathematical work in here. So this is curious, though. Let me open up my terminal window here and run it. make swap, dot slash swap. I should see one, two, and then two, one. But no, it's one, two both times, even though I do think this is logically correct. And actually, we're almost out of stock, but we do have another box of cookies here. Can we get one volunteer to come on up here maybe? OK, how about you? Yes, in the pink, come on up. A round of applause, though, really, it's about the cookies now, I know. OK. And what is your name? SPEAKER 7: My name is Caleb, and I'm a first year concentrating in computer science. SPEAKER 1: All right, welcome. Please stand behind the desk here. No, you can stand. It's fine. SPEAKER 7: OK. SPEAKER 1: We have two glasses of water, colored blue and orange, respectively. And I would like you to swap the values of these two variables so that the orange liquid goes in the blue glass, and the blue liquid goes in the orange glass. SPEAKER 7: Seems like a bad idea. SPEAKER 1: Why is that? SPEAKER 7: Because I can't get one out to put the other one in, because there's no third glass. SPEAKER 1: OK, correct. That's why we have what we generally call a temporary variable. So here, let me give you a variable called temp. And if I give you this, how does that change things? SPEAKER 7: Well, now, I can take one. SPEAKER 1: OK. SPEAKER 7: Very carefully. SPEAKER 1: Nice. SPEAKER 7: I'm trying. SPEAKER 1: OK. SPEAKER 7: There we go. SPEAKER 1: And now you can put b into a, if you will. Nice.
And now temp goes back into that one. All right. That was very well done. Maybe a round of applause. Thank you. So this was just a cookie-based way of making clear that the code on the screen seems to work. If I scroll back down to the swap function, it seems to do exactly what you just did there, whereby the temporary glass is where we put a, then we changed a to contain b, then we changed b to contain what was originally in a, but is now in the temporary glass. And now we're done. So it did achieve the stated goal, and yet when I ran this code a moment ago, it was one, two, and then one, two again. So why might that actually be? Well, here we can go back to some of today's fundamentals to consider what it is that's going wrong. And in this case, it's actually related to a concept we introduced some time ago, whereby there seems to be an issue of scope: sometimes when you're manipulating variables inside of curly braces, which define their scope, it has no effect on values elsewhere. The variables might not even exist elsewhere, as we saw in the past. So what do I mean by this? Well, with matters of scope, it turns out that in this case, the way I've implemented the swap function, I'm doing something a programmer would call passing by value. I'm literally passing in x and y by their values, one and two. Another way of putting this is passing by copy. So when I pass x and y into the swap function, it turns out swap is actually getting copies thereof. Now, what do I mean by this? Well, let's go back again to this picture of memory representative of what's in your Mac, your PC, or your phone. And if we zoom in on this chip and treat it more abstractly as this canvas, getting rid of the actual hardware, and consider what's going on inside of the computer, it turns out that there are conventions for how computers use this memory. And it's worth having a general sense of what goes where.
So generally speaking, if this is a big rectangular region of memory, even though this is just an artist's depiction thereof, it turns out that the top of your memory, so to speak, is where machine code goes. The zeros and ones that you compile get loaded into here. So when you do dot slash something, or on a Mac or PC, when you double click, or on a phone, when you single tap, that loads your program's machine code, the app's machine code to the top of your computer's memory. Strictly speaking, it doesn't have to be the top. But for our sake, it's in this region here. That's how the computer can access all of those zeros and ones quickly. Below that, so to speak, are where global variables go. We haven't had many occasions to use these. But if you define a variable outside of main, and outside of every other function in C, it's what's called a global variable. So those get tucked especially up at the top so that they're accessible everywhere else in your program. Then there's something called the heap. More on that in a moment. And it grows downward, so you have a lot of memory available to you here in the heap, and you can keep getting more, and more, and more available to you. But at the bottom of this memory is what's called the stack. And the stack actually grows, curiously, in the other direction, up, and up, and up. And it turns out, when you use malloc and ask the computer for memory, it comes from this heap region, specifically. When you use functions with variables and arguments, you're using stack memory. Now, the astute viewer will notice that this does not seem like a good thing if they're about to collide eventually. And bad things can and will happen when one overflows the other, but more on that, too, in a moment. But let's focus for the moment on a stack when we do something like this swap function. So for instance, when we had code like this, which was bad, it did not allow us to permanently change the values of x and y. Why? No pun intended. 
Here on the stack is where the very first function goes in your computer's memory. So main, and any variables it has, go at the bottom of the computer's memory once you've loaded that program. So what do I mean by that? Well, if you think back to the code a moment ago, it was things like x and y, and so forth. When main calls swap, swap goes above it on the stack, so to speak. And each of these rectangles, the technical term is frame. So this is a stack frame. This is a stack frame. And if swap called another function, another frame would go on the stack this way. And then as soon as swap returns, though, that memory essentially goes away, or the computer forgets about it, even though the bits are obviously still there. You still have the hardware, but it's forgotten. And main remains until main finishes and exits your program. But let's consider what's going on inside of these stack frames. So here's main at the bottom, and it had two variables, x and y. Those variables were one and two, respectively. Main called swap, which had two parameters, a and b, also integers, which are effectively local variables, too, even though you're declaring them in the signature, the prototype, of the function. So when swap is called, swap is using its frame of memory as follows. Room for a, room for b, room for temp. Not necessarily to scale. I just wanted everything to be a pretty rectangle. What's going where? Well, because functions in C pass by value, that is, copy, a is a copy of x, and b is a copy of y. But they're separate bytes. This is a different memory location than this. This is a different memory location than this. So we're just copying the patterns of bits from one to the other. This is passing by value, a.k.a. passing by copy. So what then happens? Just like our demonstration, we used temp cleverly, whereby with this code here, we copied the value of a into temp. So that puts the number one here, too. We then changed a to equal b. So that's what happened here.
We then changed b to equal temp, so that changed the value there. But then swap returned. You went back to your seat, leaving a and b swapped, yes. But what was not swapped was x and y. You did all of this work correctly, but in the wrong scope. You operated on copies thereof. So this swap function, while logically correct, will never solve this problem correctly as written because we've been passing by value. So today, we introduce a technique whereby you can pass by reference instead, pass by pointer instead, because instead of just passing in copies, what if we actually tell swap where x is and where y is, not what each is, but where each is? Then swap can follow the proverbial treasure map, go to those locations, and change them permanently. So this was the bad code in red, and this is going to escalate quickly visually. But it's just an application of today's ideas. This is the correct solution now. So let me do before. In red is bad. Green, after, is correct. Why? The way you specify pass by reference or pointer instead is you change swap to take, not two integers, per se, but two addresses of integers. And the syntax for that today is just to add the star. So int star, and int star. Meanwhile, the code down here has to change. Temp does not have to change. Temp is still just a variable that's ready for some value. But each use of a and b needs to be rewritten as follows. So star a means go to the address in a, and get its value, and put it in temp, just like you reached for one of the glasses and poured it in. Star b means go to the address in b and grab the value there. And then go to the address in a and change the value there to be that one. And then lastly, this is not now sweat. This is now colored liquid. This last line is go to the address in b, and put temp there instead. So the picture now is fundamentally different. Main looks the same still. But when swap is called, effectively, and we won't bother with 0x123 or 0x456.
Let's just do it with arrows pointing at things. a is a pointer to x, b is a pointer to y. So what do those lines of code tell us to do? Go to a. So that means this, kind of like the old Chutes and Ladders game, if you ever played. Follow the arrow, and that leads you to the number one, and store it in temp. So that one was straightforward. Go to the address in b. So follow the arrow. That gives us two, and put it at the location in a, which means put the two there. The very last line of code now means get temp, which is obviously there, and go to the address in b, and change the value there to one. So now, even though we've not changed a and b at all, per se, we've used them as little breadcrumbs to lead us to the right location in the computer's memory. So when swap returns this time, even though it's a void function, it has made a difference. And it's had this effect of swapping the actual original values of x and y. The code, admittedly, is cryptic looking. It's not the most user-friendly syntax, but this ability now to go to locations in memory and change what is actually there is what we've been given today with this new syntax of the star operator, and occasionally as needed, the ampersand one, as well. Questions on this technique, which is, admittedly, the most sophisticated of the examples thus far, and will probably take time to get used to. Yeah. Say that again. Will this work if-- AUDIENCE: Will this work if you're swapping the value of two strings instead of ints? SPEAKER 1: Ah, good question. Will this work if you're swapping the values of two strings instead of two ints? Yes, if you go to the address that the string represents and change, maybe with a loop, all of the characters one at a time. So it's going to be more complicated than this in green because you're going to have to change all of the individual characters, probably reusing a temporary char, instead of a temporary integer. But you could. Yeah.
AUDIENCE: [INAUDIBLE] SPEAKER 1: Since integers have a fixed number of bits, can you ever run into a situation where you run out of memory? Absolutely. Your phone, your laptop, your desktop can only do so much, can only count so high because of these physical limitations. And hopefully, you just never reach that limit. But we'll talk in a couple of weeks' time, when we transition to web programming and databases, about the Metas, the Microsofts, the Googles of the world that have crazy large amounts of data. The number of bits we use in those contexts is actually going to matter for exactly that reason. If business is booming, and if you've got lots and lots of data, lots and lots of users, you need to be able to count higher. Just so that you've seen the code actually in operation, here is, of course, my swap function down below. And if I go ahead and change its prototype to take in pointers to a and b, and similarly change the prototype up here. And if I go in and change a here to be star a to dereference it, star b to dereference it, star a to dereference it, and star b to dereference it, I claim that this version of the code should now work. In fact, let me go ahead and do make swap. It didn't compile. So why might that be? Well, let me scroll up to see what the message is. Incompatible integer to pointer conversion passing 'int' to parameter of type 'int *'. That's a lot to absorb. But clearly, the issue is with how I'm calling swap. So why is this? Well, notice that now that my swap function expects as arguments pointers to integers, I can't just blindly pass in x and y, which are integers. Instead, I do need to use our friend, the new ampersand operator, to pass in the address of x and the address of y. So now, if I reopen my terminal window, run make swap, and do dot slash swap, now I see one, two, and then two, one. So the code changes in this way.
And maybe this example more so than others makes clear why the star operator lets us go somewhere, but the ampersand effectively does the opposite and figures out what the address of something is. All right. So what about these other locations in memory? Well, it turns out that, indeed, the stack, as we've described it, grows up, and up, and up. And recall that stack here in this sense is kind of like the stack of trays in the cafeteria or any of the dining halls. There's one tray, another tray, another tray, another tray. But then you start removing them from the top down. So there's an ordering to them that we'll actually revisit next week. But this is not a good design, in general. You shouldn't be doing things like two trains on the tracks barreling together toward each other in this way. But honestly, it's kind of the only way, because if you've only got a finite amount of memory, OK, sure, you can have them both grow in the same direction. But they're still going to hit some impasse eventually. You're still going to run out of space. So the way computers were designed years ago is they use memory in this way, even though bad things can happen if you use too much stack space or too much heap space. So what do I mean by that? Our example a moment ago just had us call main and then swap, and that was it. So it's just two frames, no big deal. But if you call many functions again, and again, and again, if you do something recursively, where you call yourself, you're going to pile, pile, pile stack frames potentially. So you could start to hit the so-called heap area. Meanwhile, if you call malloc too many times, you might be growing down, down, down, down, down, and then overrun some of the stack memory, as well. So bad things can happen when you overrun either of these. And those of you maybe with prior programming experience might have heard at least one of these terms, heap overflow, or more popularly, stack overflow.
Super popular website for questions and answers about programming. The etymology thereof is exactly this idea of overflowing the stack and touching memory that you should not, whether it's memory down here, or even worse, memory over here, as by something called heap overflow. And these are specific examples of what we'll start calling buffer overflows. A buffer is just a chunk of memory. And a buffer overflow means overflowing, using too much of that memory. And buffers are everywhere. In fact, if you've used YouTube recently, and maybe it's just kind of paused and spinning, and spinning, and spinning, maybe you're on a really bad connection. There are no more bytes in your buffer. There's no more video footage in the buffer because maybe you have such a bad connection. But if Google were to make mistakes and try to download too many bytes at a time, they too could overflow a buffer. And if YouTube or similar apps have ever crashed, it could be because they're trying to use more memory than they actually should be. So these things are sort of everywhere. Now, as for these training wheels, we sort of took away the mystery of what a string is. But what about all of these other functions we've been taking for granted now for a few weeks? You can and should still use them to solve some problems because, frankly, C does not make it easy to get user input safely, like, period, full stop. It is very non-trivial to get user input without running the risk of overflowing a buffer. Why? Well, you're the programmer. How do you possibly know in advance how big of a string a human might type in tomorrow, or the next week, or the next day? You could try to be safe and allocate a million bytes all at once. But what if they type in 1,000,001 characters, or use copy-paste so much that they similarly overflow? So getting user input is a hard problem.
So let's introduce you to what the alternative would be and give you an appreciation for what libraries like CS50's and others like it are actually doing for you. Let me go ahead and create our own version of getint and getstring without using the CS50 library, but using a standard C function called scanf. And to do that, let me go over to VS Code. Let me create a new file called, for instance, get.c. And then in get.c, let's make a very simple program first that just gets an integer, but again, without using the CS50 library. So let me go ahead and include the standard I/O library. Let me go ahead and declare main as int main void. And then inside of main, let me go ahead and declare an integer n so that we have some place to put the integer that we're getting from the user. Then let me go ahead and just prompt the user for a value for n, so n colon space, for instance. Because again, I'm not using getint, so I can't just call it to present the user with a prompt. So I'm going to use printf to create my own prompt. And now, let me use this function scanf as follows. I'm going to call scanf, and then I'm going to pass to scanf, similar in spirit to printf, a format code, like %i, effectively telling scanf that what I want it to scan, so to speak, from the user's keyboard is in fact a single integer. Now, I'm going to close quotes, and I don't need a new line because I'm not trying to print anything. I'm trying to get something from the user using scanf. But I do need to tell scanf where to put this integer. Now, if I want to put this integer in the variable n, it's not quite as simple as just passing n in because recall how variables are passed. This variable n is going to be passed by value. Effectively, a copy is going to go into scanf, and so scanf is not going to have the ability to change that value.
But if you think back to how we swapped two values and passed two values into that swap function in C, well, if we pass those two values in by their addresses, so passing by reference, so to speak, then the function, swap in that case, scanf in this case, can actually go to that address and change the value. So to summarize, I'm going to pass scanf one argument, which is a format code, and a second argument, which is the address of an integer into which to put the user's value. After that, I'm just going to go ahead and print out what's happened. So I'm going to go ahead and print out n followed by a colon, followed by an actual placeholder, %i backslash n. And I'm going to pass in now to printf the value n. So to be clear, I'm still passing n into printf, just like we've been doing since week one, but I'm passing to scanf the address of n, so that scanf can actually go to that address and change the value of n. So I think this is actually going to work, even though I've not used getint or any of the CS50 library. Let me go into my terminal, run make get. Seems to compile OK. Let me do dot slash get, and let me type in a value like 50 for n. And indeed, I should see spit back at me that the value I got was 50. So it turns out that getting an integer from users is relatively straightforward just using scanf. But of course, to use scanf, you need to know a little something about pointers or addresses, more generally. That was not knowledge we had in week one. And so we did, indeed, use those training wheels of the CS50 library for the past few weeks so that we could get integers more easily. And it turns out, if the user types more than a simple integer, or doesn't even type in an integer, scanf isn't necessarily going to be as user-friendly as getint. So in the CS50 library, we do a bit more error handling for you, as well. But let's consider now an implementation, not of getting an integer, but getting a string instead.
Let me clear my terminal window, and let me go ahead and erase all of this code, and instead focus this time on getting a string. Well, we know we can't use string anymore, at least if we're not using the CS50 library. But not a problem because we know that strings are now char stars. So if I want to get a string from the user, that's like getting, I think, a char star. So let me just call this string s by default. Let me go ahead therefore and declare a variable s that's going to store my string. Let me go ahead next, as before, and prompt the user for the value of that variable, just by prompting them with printf. So nothing fancy there. And let me try again using scanf to scan this time a string from the user's keyboard. I'm going to type scanf. I'm going to do %s instead of %i because I, indeed, want to scan a string in this case. And then I'm going to go ahead and pass in just s. And here, at first glance, there seems to be an inconsistency because, previously, I did ampersand n. But that's because n was an integer, not the address thereof. But in the world of strings as we now know, a string is just the address of its first byte. And so if we declare s to be a char star, a.k.a. string, well, s is already an address. So I can just pass in s in this case to scanf without actually using an ampersand. After that, let's go ahead and print out the result. So let's just use printf. Let's print out a prefix, like s colon again, %s as my placeholder, and now backslash n because I'm formatting it on the screen. And then let's go ahead and pass in s as always to printf. So s is just a string, so I just pass it into printf like that. Well, let me go ahead now, and I'm going to go ahead and compile this the old-fashioned way, because make, as we've set it up for you, actually protects you from doing something like this. But I'm going to go ahead and ignore the warnings you would otherwise see. And I'm going to go ahead and compile this with clang directly.
So clang -o get, because that's the name of the program I want to output. But I'm also going to specify -Wno-uninitialized, that is, dash capital W, no-uninitialized, which is simply another command line argument that's going to tell clang not to warn us about variables that are not initialized. Because case in point on line five, as some of you might have noticed, I didn't actually initialize s to anything, not even NULL. But that's OK because I want to forge ahead blindly just to make a point as to what's going on here. And in fact, let's go ahead and compile this code as follows. It does seem to compile, even though make would have warned us that something's awry. Let me go ahead now and run dot slash get, and this time, not type in 50. But let me type in something like our familiar "hi" exclamation point, and hit enter. I immediately get a segmentation fault, which means something has gone wrong related to memory. A segment of memory has been touched that I shouldn't have touched. Well, why, in fact, is this? Well, let's consider what it is we've been doing. If this here is my computer's memory, and in the first case I was just trying to get an integer, that was actually pretty straightforward, because even if this memory is filled with a whole bunch of garbage values, as personified here by Oscar the Grouch, when I declared n to be an integer before, I just needed, on this machine, four bytes, which is the typical size for an int. And I put the number 50 there. So it doesn't matter that there were these garbage values. I just went to those four bytes after declaring a variable called n and overwrote those bits with some pattern of bits representing the number 50. But strings we now know are sort of fundamentally different. If I go back to that same memory space, and I declare s to be a pointer, that is, a char star, well, recall that pointers are generally eight bytes on modern systems. And so that's like taking eight of these bytes from memory and calling it s.
But the catch is, if I haven't initialized s to actually be a valid location, as via calling malloc, there are still garbage values there. That is to say, patterns of bits that maybe have been there always, or that are left over from some previous function that got called, or some other lines of code if this program were actually bigger. So it's just some garbage value filling that variable s. The problem, though, is that in my code now, when I call scanf and I tell scanf to scan a string from the user and to put it at that location s, well, what is that location s? It's literally a garbage value. It's the equivalent of a foam finger pointing there, there, there. We just don't know because it's not a valid address. And so I get that segmentation fault here in my terminal window because I've not initialized s to be some known value. I get a segmentation fault because, effectively, I've accidentally touched memory that I should not have. So how do we fix this? Well, clearly, I need s to point at some valid chunk of memory, and I could do that using malloc. But frankly, in this case, I could do it even more simply by just declaring s to be an array of characters, as we might have in week two. So let me go ahead and clear my terminal window here. Let me go into get.c, and let's simply change what s is. Instead of a char star, which we know is what a string technically is, we can still implement strings as arrays of characters. That's certainly still true. So let me go ahead and do that, declare s to be an array of, say, four characters. And in this case, I should have enough room for the H, the I, the exclamation point, and even that null character, the trailing backslash zero. So now, let me go ahead and build this. Make get. And because I'm not leaving anything uninitialized this time, I can use make as usual without getting yelled at, because I'm not yet doing anything wrong. Now let me go ahead and do dot slash get enter.
And in this case, it's ready to receive my H-I exclamation point enter, and all actually seems well. Why? Because in this case, I actually had enough space for s, because if I go back to my memory here, because I've now redeclared s as an actual array of four characters, that's like asking the operating system, for instance, for these four chars here. And certainly, I can fit H-I exclamation point and the null character into those four bytes. So there's not a problem. But there might be a problem if I, or the user, more generally, types in too many characters. So let me go ahead and run dot slash get again. Let me type H-I exclamation point. But just to get a little aggressive, let me highlight that and paste it again, again, again, and again, and really type, very excitedly, a pretty long string that is surely longer than four bytes. Well, unfortunately, I've only asked the operating system for an array of four bytes. So what's going to happen with all of those extra hi's, hi's, hi's? They're just going to, by default, remain contiguous from left to right, top to bottom in the computer's memory. But some of those characters are going to end up at locations I didn't ask the operating system for in this array. So if I go back to VS Code here, I've typed in a very long string, certainly longer than four bytes in total. Let me hit enter. And darn it. There is another segmentation fault. So in short, you're liable to see these segmentation faults any time you touch segments of memory, so to speak, that do not belong to you, that you didn't allocate space for, as via an array, or even via malloc. And this is going to be a fundamental problem with getting strings because I don't know in advance how long the string is going to be that the human's going to type in. Maybe it's four. Maybe it's fewer characters. Maybe it's even more. So what's the alternative? Well, I could go in here maybe and allocate, I don't know, 4,000 characters for s.
But what if you type in an even longer string that's 4,001 characters or more? I might still have these memory-related errors, these segmentation faults. So one of the reasons, then, that we provide you with the CS50 library, and in turn, functions like getstring, is that getstring very, very conservatively walks through the user's input byte, by byte, by byte, one character at a time. And what the CS50 library is doing underneath the hood is, as soon as it realizes, oh, the user gave us another byte, another byte, we in the CS50 library are constantly allocating and reallocating more and more memory using malloc for you, and effectively managing the memory required for that string. So even though scanf exists, it's dangerous to use with strings. And even with integers, it turns out it lacks some of the error handling that the CS50 library has thus far provided. How do we actually go about solving this? The way getstring actually works in the CS50 library is it kind of tiptoes. It waits. It gets one character from you and then checks if there's another one coming. Then it allocates more space for a second. If there's still a third, it allocates more space, more space, more space. So essentially, what getstring does is it uses malloc again, and again, and again, and it kind of lays the tracks down as you're typing in the keystrokes and hitting enter, so that we never assume how many characters you're going to type in. We dynamically allocate just enough bytes for you, plus one extra, for the null character. And this is sort of a hoop that's just not fun to jump through when, at the end of the day, all you want to do is get input from the user. So even with the training wheels officially off, it's going to be annoying to get strings from users in C. But it is easy with ints, with floats, with other data types. And frankly, we'll soon, in two weeks, pivot to Python, which takes care of all of these problems for us and manages our memory.
But for now, we have one final to-do beyond scanf, which is file I/O, which is a fancy way of saying file input and output. Because now that we know a little bit about hexadecimal, now that we know a little bit about pointers, we actually have some more functions available to us that will let us actually manipulate files on a computer's hard drive, like image files, or text files, or anything else we might want. Among the most common functions that are related to files are these here. fopen is going to be a function that lets you open a file, doing in code what you might otherwise do by going to File, Open in a graphical program. Nothing's going to happen visually, but it's how you give a program access to the contents of a file. fclose does the opposite. It's the way you, in code, click on an X and close a file. fprintf allows you to print, not to the screen, but to a file. fscanf lets you read data, not from the keyboard, but from a file. fread and fwrite are similarly used to read and write data from and to a file, but generally binary data, like images, or something that's not ASCII or Unicode text. fseek is a function that lets you move around in a file left to right, kind of like fast forwarding or rewinding through Netflix or similar when you want to jump to a different location in a video, or in this case, a file. And there are bunches of others, as well. So to give you a sense of what you can do when it comes to manipulating files, let's write just a couple of final programs, for instance, that let us manipulate some of this code for us. In fact, let me go ahead and open up here in VS Code a new file called, say, phonebook.c. And in phonebook.c, we're going to implement now a version of the phonebook like we did in the past. But in this case, we don't actually have a forgetful program that prompts the user with getstring for a couple of names and numbers, and then just forgets about them altogether.
This version of the phonebook is actually going to go ahead and save them persistently to a file for us. And for this, let me go ahead and open up just on my other screen here, without flipping over just yet, let me go ahead and open up-- give me just one moment. So we have this ready to go. Let me go ahead and create the file, this program as follows. I'm going to cheat and save time by using the CS50 library because I do not want to get into the nuances of getting strings character by character, which itself will escalate too quickly. But let me go ahead and include the CS50 library, the standard I/O library, and lastly, the string library for this particular case. In my main function now, I'm going to go ahead and open up a file called maybe phonebook.csv. If you've ever used a CSV file, it's like a lightweight spreadsheet that you can open in Apple Numbers, Google Spreadsheets, Microsoft Excel. CSV stands for comma-separated values, meaning that we're going to separate all of the values by commas. So anywhere we want a new column, we actually use a comma, as we'll soon see. So how do I actually do this? I can open a file called phonebook.csv by literally using fopen phonebook.csv. And I have to tell fopen how I want to open it. Do I want to open it for reading, with r? Do I want to open it for writing, with w? Or do I want to open it for appending, with a? And for something like a phone book, if I run this program again and again, I'm going to actually do append so that new contacts get added to the file, and we don't overwrite it with w. Now, what does fopen return? It technically returns a pointer to a file. But this one's a little weird. It's all capitalized, but it is a thing in C. FILE, in all caps, star file is going to be a pointer to that file in memory. So think of fopen as opening the file and returning the address thereof in the computer's memory. All right. What do I want to do next?
I want to go ahead and get two strings from the user, like maybe someone's name, using getstring, again, to keep things simple for now. Let me then go ahead and get another one. How about their number? Using getstring, again, prompting for a number. And I don't strictly need these training wheels. So even though it doesn't really make a difference, I'm going to at least change string to char star, even though I do want to keep using getstring conveniently. And now I want to save this person's name and number to that CSV file. So I'm going to use, not printf, but fprintf, printing to that file variable, which is open in the computer's memory. Now I'm going to go ahead and print out two strings, %s comma %s. Then I want to go ahead and print out the name for the first placeholder, and the number for the second placeholder. And for good measure, I want to move the cursor to the next line in the file, so I am going to include a backslash n. Then I'm going to go ahead and close that same file with fclose. And that's it. No more printing to the user. But I claim that I'm going to be changing the file again, and again, and again. So let me try this. Make phonebook, OK, dot slash phonebook, enter. And let's type in David. And how about +1 617-495-1000? Enter. OK, hopefully, it worked. Let's do it again. Dot slash phonebook. Carter, we'll give him the same number as last time. 495-1000. And let's do, how about just those two? So let me go ahead now and reveal that we do have a file in here called phonebook.csv. So that does exist. Let me go ahead and do this. Let me open up my file browser over here. I've got a lot of files I've created. Here's phonebook.csv. And if I click on it, there is the file that I just created, separated by commas. But even more interestingly, let me actually right click or control click on this, download it to my Mac's downloads folder. Let me go into my downloads folder just for fun, and I've installed in advance Microsoft Excel.
If I go into my downloads folder and open up phonebook.csv, we're going to see, oh, Apple Numbers, not Excel, opening up. View my spreadsheets. All right, Numbers is kind of stupid. So there we go. No, this isn't a Mac versus PC thing. So now we have phonebook.csv rendered in this format here. Numbers presumes that the top row should be gray and not white, as well. So the formatting looks a bit off. Anyhow, clearly, you could open this same file in a spreadsheet program like Microsoft Office, or Apple Numbers, or of course, something like Google Spreadsheets. But before we move on to copying files, notice that besides making a phone book, I clearly have the ability now to save strings in files. And actually, just for good measure, let me hammer home the point that anytime we're dealing with pointers now, something could go wrong. And if you read the documentation for fopen, you'll see that we should also check whether file is NULL. Maybe the file is not found, or something's not working on the server. And so just to be safe, we should return one there. So not just malloc, not just getstring: any time a function returns a pointer, you should check if it's NULL, because if it is, per the documentation, it almost always means something has gone wrong. So you should get out, lest you trust the return value therein. So let me go ahead and do one other program here. Let me create my own copy program. So up until now, we've used commands like rm, and ls, and cp for copy. I can actually create my own version of Linux's copy program, perhaps as follows. Let me actually go into cp.c, in this case. Let me include some familiar file, stdio.h. Let me include, how about one other? stdint.h, for reasons we'll see now, because in stdint.h is that uint8_t type that I mentioned earlier, which just means, give me an eight-bit value that's unsigned, which means no negative numbers. It's just raw data.
It's not an integer in the positive-or-negative sense. And let me just nickname that BYTE, using typedef, to make clear that I want to manipulate files a byte at a time. Let me now declare, for the first time today, a version of main that takes in argc and argv, which are for command-line arguments. Technically, though, I'm not using the CS50 library in this version, so even argv's type can now change from string to char star. And this is the canonical way in C to declare main when you want command-line arguments, using char star instead of string. So now, I'm going to do two things. Remember how copy works. You specify two files, the file you want to copy and the new name that you want to give to the copy. So it would be like cp, space, old name, space, new name at the command line. So accordingly, I'm going to do this. I'm going to open one file called source, or src for short, and set that equal to whatever file is named in argv[1], in read mode. But just to be super specific, I'm going to use read binary mode, "rb". I'm not just copying text files; I want binary data, zeros and ones, like images. So I'm going to tell fopen to expect binary data. I'm then going to go ahead and create a second variable called destination, dst for short. And I'm going to open up whatever is in argv[2], the second file name at the command line. But I don't want to read this file. I want to write to it in binary, "wb", using zeros and ones. Now, let me do the copying one byte at a time. It's a little inefficient; I should really do bunches of bytes at a time for speed. But let me just give myself one byte in a variable called b. BYTE is not a built-in type in C. It's literally a synonym I created just for the sake of discussion, because we'll do this in the future, as well. Now, let me go ahead and do this. How do you copy a file from old to new?
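The setup described so far, before any copying happens, might look something like the following, factored into a hypothetical helper, open_for_copy, so it can be tested on its own. As assumptions of this sketch, it adds two things the spoken version skips: a check that exactly two file names were given, and NULL checks on both fopen calls, closing the source if the destination can't be opened.

```c
#include <stdint.h>
#include <stdio.h>

// BYTE is not a built-in C type; this typedef just gives
// uint8_t (8 bits, unsigned, values 0 through 255) a descriptive name.
typedef uint8_t BYTE;

// Hypothetical helper mirroring the start of cp's main:
// argv[1] is the file to copy, argv[2] is the copy's new name.
// Returns 0 and sets *src and *dst on success, 1 on failure.
int open_for_copy(int argc, char *argv[], FILE **src, FILE **dst)
{
    // cp needs exactly two command-line arguments after the program name
    if (argc != 3)
    {
        fprintf(stderr, "Usage: ./cp SOURCE DESTINATION\n");
        return 1;
    }

    // "rb": read, in binary mode, so bytes come back exactly as stored
    *src = fopen(argv[1], "rb");
    if (*src == NULL)
    {
        return 1;
    }

    // "wb": write, in binary mode; creates or truncates the destination
    *dst = fopen(argv[2], "wb");
    if (*dst == NULL)
    {
        // Don't leave the source open if the destination can't be opened
        fclose(*src);
        return 1;
    }
    return 0;
}
```

In cp.c itself, main(int argc, char *argv[]) could call open_for_copy(argc, argv, &src, &dst) and return 1 if it fails.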
Well, I think it would suffice to use a loop: start at the beginning of the file, loop all the way to the end of the file, and within the loop, copy one byte from old to new. So how do I do that? I used fprintf last time to write text. This time, I'm going to use a different function, as follows. While there are bytes to read from the file, and this one's going to be a mouthful, so let me just type it out and then I'll explain it. While that line is true, go ahead and write this line, which is similarly a mouthful, so I'll type it first and then explain what it does. Then I'm going to close destination. Whoops. Then I'm going to close source, and I claim, if I haven't messed anything up, this will now copy files for me. How? This is indeed a mouthful, but there's a function called fread, whose purpose in life is to read one or more bytes for you. How does it work? Well, just like swap, just like scanf, you have to tell it where to load those bytes in memory. So if I want to put them in the byte called b, I can't just say b, because that's passed by value. I need to pass by reference. So I say the address of b, &b, is where I want you to put one byte from the file at a time. How big is a byte? Technically, I could just say 1, because we all know how big a byte is. But I'm going to be super proper and generalize this as sizeof(b), so it just figures it out for me, just in case we ever do more than one byte at a time. How many of those do I want to read at a time? One, just to keep it simple. And where do I want to read those bytes from? The source file. fread, if you read the documentation, tells you how many items it successfully read, which here means bytes, since each item is one byte. Logically, it should either be one was read or zero were read, based on what I'm asking it to do. I'm asking it to read one at a time, so it's either going to succeed or fail.
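To see fread's return value in isolation, here's a sketch that runs the same kind of loop but only counts bytes instead of copying them. count_bytes is a hypothetical helper, and it assumes the file has already been opened in "rb" mode.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;

// Hypothetical helper: counts the bytes in an already-open file using the
// same fread pattern as the copy loop, without writing anything anywhere.
long count_bytes(FILE *f)
{
    BYTE b;
    long count = 0;

    // fread(where to put the data, size of each item, how many items, which file)
    // It returns how many items it read: 1 while bytes remain,
    // 0 at the end of the file (or on a read error).
    while (fread(&b, sizeof(b), 1, f) != 0)
    {
        count++;
    }
    return count;
}
```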
So I want to do this for as long as it succeeds, because it's going to succeed until it gets to the end of the file, and then there are no more bytes to read, at which point it will return 0. So now, I do the opposite with fwrite, and it's almost the same line. Instead of asking where to put the byte, fwrite first asks where to find it: go to the address of b and get the byte that was just read. It's this size, which is going to be 1, but I wrote it generally. One byte at a time, please. And write it to the destination file. So if I now open up my terminal window, let me first run make cp to create my own copy program. Let me actually open an image I came with today. Here's a happy cat from the internet, and that's going to be my original image. Let me now go ahead and run this: ./cp. I have to run ./cp because I want my version of cp, not the one that comes with Linux. So ./cp cat.jpeg, and let's call the copy maybe backup.jpeg, my backup cat, just in case I ever mess up the original. Enter. Seems to work OK. When I now run code backup.jpeg to open the copy, there is that same happy cat. So it's very low-level manipulation, but it all results from my now having the power to express myself in terms of locations in memory using pointers, understanding that strings, and now files, are really just abstractions on top of these lower-level details. And from all of that is going to come some pretty powerful functionality. In fact, among the things that you can now do, as you'll soon see, is manipulate at least simple image files, known as bitmap files. BMP stands for bitmap, and a bitmap essentially implements an image exactly as we began today, as just a map of bits: a grid of x-y coordinates, each of which represents a pixel. A bitmap is a type of file, with a .bmp file extension, that stores images just like that.
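Putting the fread and fwrite lines together, here's a sketch of the whole copy written as a function rather than inside main, so it can be reused and tested. copy_file and its -1 error value are assumptions of this sketch; like the version in lecture, it copies one byte at a time.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;

// Hypothetical helper: copies src_path to dst_path one byte at a time.
// Returns the number of bytes copied, or -1 if either file can't be opened.
long copy_file(const char *src_path, const char *dst_path)
{
    FILE *src = fopen(src_path, "rb");
    if (src == NULL)
    {
        return -1;
    }
    FILE *dst = fopen(dst_path, "wb");
    if (dst == NULL)
    {
        fclose(src);
        return -1;
    }

    BYTE b;
    long copied = 0;

    // Read one byte into b; as long as that worked, write that byte out
    while (fread(&b, sizeof(b), 1, src) != 0)
    {
        fwrite(&b, sizeof(b), 1, dst);
        copied++;
    }

    fclose(dst);
    fclose(src);
    return copied;
}
```

After checking argc, main(int argc, char *argv[]) could simply return copy_file(argv[1], argv[2]) < 0, which is 1 on failure and 0 on success. For speed, a common variation reads a chunk at a time into an array, for instance size_t n = fread(buffer, 1, sizeof(buffer), src); followed by fwrite(buffer, 1, n, dst);, looping until n is 0. This sketch also ignores fwrite's return value, which a more careful version would check.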
And now that you have the ability not only to think about images in this way, but to write code that manipulates them, you can do the kinds of powerful things you see on Instagram, and TikTok, and Snapchat nowadays, like filters. So for instance, here is an image of the bridge, the Weeks Bridge across the river. Here is a black and white filter that we've applied by writing some C code, as you soon will, to change it from colorful to black and white. Here's the original that you might see every day. Here, meanwhile, is a reflection thereof. If you've ever flipped an image horizontally, swapping left and right, that's all this is. It might look like the other side of the bridge over there, but it's the same photo, mirrored. Meanwhile, here is a blurred version. If it looks a little blurry, that's deliberate, because we've essentially smudged all of the values by looking at every pixel's neighbors, up, down, left, and right, and averaging them together to give it this effect here. Here is what's called edge detection, whereby, if you're feeling more comfortable, you can write code that looks at these individual pixels, tries to figure out where the edges are, just like a fancy computer might, and then colorizes it in this way, as well. And you'll be able to do all of that because images like these are just grids of coordinates with lots and lots of pixels. So what started quite simply is now going to be something you have complete control over, now that we've taken off these training wheels. And it's part of the culture of computer science to understand geek humor like this. And so the last thing we'll do today is give you this joke to end on, which, for better or for worse, should now make sense. And those chuckles will suffice. This was CS50.
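As a rough sketch of two of these filters, here's C code for black and white and for reflection over a grid of pixels stored as image[row][column], the grid of coordinates described above. The pixel struct is an assumption made for illustration, one unsigned byte each for red, green, and blue, not the exact layout of a BMP file. Averaging the three channels is one simple way to compute a shade of gray, not the only one.

```c
#include <stdint.h>

// Assumption: a minimal pixel type, one byte (0 to 255) per color channel
typedef struct
{
    uint8_t red;
    uint8_t green;
    uint8_t blue;
} pixel;

// Black and white: set all three channels to their average,
// rounded to the nearest integer.
void grayscale(int height, int width, pixel image[height][width])
{
    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width; j++)
        {
            int sum = image[i][j].red + image[i][j].green + image[i][j].blue;
            // sum / 3 can only end in .0, .33, or .67, so (sum + 1) / 3 rounds to nearest
            uint8_t gray = (sum + 1) / 3;
            image[i][j].red = gray;
            image[i][j].green = gray;
            image[i][j].blue = gray;
        }
    }
}

// Reflection: swap each pixel in the left half of a row with its
// mirror-image pixel in the right half, so left and right trade places.
void reflect(int height, int width, pixel image[height][width])
{
    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width / 2; j++)
        {
            pixel tmp = image[i][j];
            image[i][j] = image[i][width - 1 - j];
            image[i][width - 1 - j] = tmp;
        }
    }
}
```

A blur could follow the same pattern, replacing each pixel with the average of itself and its neighbors. It would need to read from a copy of the image, though, so that pixels already blurred don't feed into their neighbors' averages.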