The following content is
provided under a Creative Commons license. Your support will help
MIT OpenCourseWare continue to offer high quality
educational resources for free. To make a donation or to
view additional materials from hundreds of MIT courses,
visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: In this
class, this semester, the other co-lecturer
is going to be James Mickens, who is a visiting
professor from Microsoft Research. He'll lecture on some other
topics like web security later on. But we'll decide later what's
going on exactly, in terms of the lecture split up. We also have four TAs this year,
Stephen, Webb, [INAUDIBLE], and James. And hopefully you'll meet them
in office hours over the year if you need help. So the plan for this
class is to understand how to build secure systems,
why computer systems sometimes are insecure and how we
can make them better, and what goes wrong. And in order to do this, there's
not really a great textbook about this topic. So instead, what
we're going to do is, each lecture
other than this one is going to be focused around some
research, typically a paper, that we'll assign on
the website and you guys should read ahead of time. And there are some
questions that you should answer in the submission
system about the paper. And submit your own question
by 10:00 PM the evening before lecture. And then when you
come to lecture, we'll actually discuss
the paper, figure out, what is the system? What problem does it solve? When does it work? When does it not work? Are these ideas any
good in other cases? Et cetera. And hopefully, through
these case studies, we'll get some appreciation
of how do we actually build systems that are secure. And we have some preliminary
schedule up on the website. If there's other topics you guys
are particularly interested in, or if there's particular
papers you're excited about, just send us email and
we'll see if we can fit them in or do something. We're pretty flexible. So if there's
anything that you'd like to hear more
about, just let us know. And in a similar vein, if
you ever have a question or if there's some mistake,
just interrupt and ask us what's going on
in lecture, anytime. Security is, in many ways,
all about the details and getting everything right. And I will inevitably
make mistakes. So if something
doesn't seem right, there's a good chance it's not. And you should just
interrupt and ask. And we'll figure
out what's going on and what's the right
way to do things. And I guess in terms of
the class organization, the other large
part of the class, in addition to
lectures, is going to be a series of
lab assignments. The first one is already
posted on the website. And these lab
assignments will help you go through understanding
a range of different security problems and how to prevent
them in a simple web server. So in lab one, which
is out right now, you'll actually
take a web server that we give you and find ways
to exploit buffer overflow vulnerabilities in it and
take control of this website by just sending it
carefully-crafted requests and packets. And in other labs,
you'll look at ways to defend the web server,
to find bugs in the code, to write worms that
run in the user's browser, and other kinds of
interesting security problems. One thing that
surprises many students is that every lab uses
a different language. So lab one is all
about C and Assembly. Lab two involves a
lot of Python coding. Lab three will be
something else. Lab five will be JavaScript. And so on. This is sort of inevitable. And I sort of
apologize ahead of time that you're going
to have to learn all these languages if you
haven't seen them already. In some ways it's useful,
because the real world is like this. All the systems are
complicated and composed of different parts. And in the long run,
it'll be useful for you, for your moral
character or something like that, to learn this stuff. But it will take
some preparation, especially if you haven't
seen these languages before. It might be helpful
to start early. In particular, lab
one is going to rely on a lot of subtle
details of C and Assembly code that we don't really
teach in other classes here in as much detail. So it's probably a good
idea to start early. And we'll try to get the TAs
to hold office hours next week where we'll do some sort
of a tutorial session where we can help you get
started with understanding what a binary program looks
like, how to disassemble it, how to figure out what's
on the stack, and so on. All right. And I guess the one other
thing, we're actually videotaping lectures this year. So you might be able
to watch these online. We'll post them as soon
as we get them ourselves from the video people. And the last bit
of administrivia is you should, if you
have questions online, we're using Piazza,
so I'm sure you've used this in other classes. All right. So before we dive into security,
I need to tell you one thing. There are some rules
that MIT has for accessing MIT's network, especially
when you're doing security research or playing
with security problems. You should be aware that not
everything you can technically do is legal. And there are many things that you
will learn in this class that are technically possible. We'll understand how systems
can be broken or compromised. Doesn't mean you should go
out and do this everywhere. And there's this link
in the lecture notes we'll post that has some rules
that are good guidelines. But in general, if
you're in doubt, ask one of the lecturers or a
TA as to what you should do. And hopefully it's not too
puzzling, what's going on. All right. So any questions about
all this administrivia before we dive in? Feel free to ask questions. OK. So what is security? So we'll start with
some basic stuff today. And we'll look at just
some general examples of why security is hard
and what it means to try to build a secure system. Because there's
not really a paper, this will not have sort of deep
intellectual content, maybe, but it'll give you some
background and context for how to think about secure systems. So security, in
general, is all about achieving some goal when
there is an adversary present. So think of it as there's some
bad guy out there that wants to make sure you don't succeed. They want to steal your files. They want to delete your
entire hard drive contents. They want to make
sure nothing works and your phone doesn't connect,
all these things, right? And a secure system is
one that can actually do something, regardless
of what the bad guy is trying to do to you. So it's kind of cool that we
can actually potentially build systems that are
resilient to a whole range of bad guys,
adversaries, attackers, whatever you want to call them. And we can still build
computer systems that allow us to get our work done. And the general way to
think about security is sort of break it
up into three parts. One part is roughly
the policy that you want your system to enforce. This is roughly the goal
that you want to achieve. Like well, maybe,
only I should be able to read the
grades file for 6.858. Or maybe the TAs as well,
and all the co-lecturers, et cetera. But there is some statement
about what I want my system to be able to do. And then, if you
want to sort of think about what kinds of
policies you might write, typical ones have to do with
either confidentiality of data, so the grades file is only
accessible to the 6.858 course staff. Another example of
a security policy has something to
do with integrity. For example, only
the course staff can also modify the grades file. Or only the course staff
can upload the final grades to the registrar's office. That'll be great. Then you can also think about
things like availability. So for example, a website
should be available, even if the bad guys try
to take it down and mount some sort of a DOS-- Denial
of Service-- attack on it. So this is all well and good. So these are the policies
that we might actually care about from a system. But because it's security,
there's a bad guy involved. We need to understand,
what are we thinking the bad guy is going to do? And this is typically what
we call a threat model. And this is basically
just a set of assumptions about the bad guy or adversary. And it's important to have
some sort of assumptions about the bad guy because,
if the bad guy is omnipresent and is everywhere at once and
can do anything they want, it's going to be hard to achieve
some semblance of security. So for example,
you probably want to assume the bad guy doesn't
exactly know your password, or they don't actually have
physical access to your phone and your keys and your laptop. Otherwise, it's going to be hard
to make some sort of progress in this game. And turns out that while
this is actually quite tricky to come up with, but I guess
one general rule is it's much better err on
the side of caution and being conservative in
picking your threat model, because bad guy might
always surprise you in terms of what they might
be able to do in practice. And finally, in order to
achieve security, in order to achieve our goal under
the set of assumptions, we're going to look
at some mechanism. And this is the, basically,
software or hardware or whatever part
of system design, implementation,
et cetera, that's going to try to make sure
our policy is followed as long as the bad guy
follows the threat model. So the end result
is that, as long as our threat model was
correct, hopefully we'll satisfy our policy. And it has to be the case that
the mechanism doesn't screw up. Make sense? Fairly high level
story about how to think about
this kind of stuff. So why is this so hard, right? It seems like a simple plan. You write down
these three things, and you're off and running. But in practice, as you, I'm
sure, have seen in the world, computer systems are
almost always compromised in some way or another. And break ins are
pretty commonplace. And the big reason
why security tends to be a difficult problem
is because what we have here is sort of, this will be
familiar to those of you who took 6.033, this
is a negative goal, meaning that we have to make
sure our security policy is followed regardless of
what the attacker can do. So just by contrast, if you
want to build a file system, and you want to make sure that
my TAs can access the grades file, that's pretty easy. I just ask them, hey, can
you guys test and see? Can you access the grades file? And if they all can
access it, done. The system works. But if I want to say that
no one other than the TAs can access the grades file,
this is a much harder problem to solve, because now
I have to figure out what all these non-TA
people in the world could do to try to get my grades file, right?
open it and read it. Maybe my file system
will disallow it. But they might try all
kinds of other attacks, like guessing the
password for the TAs or stealing the TAs' laptops
or breaking into the room or who knows, right? This is all stuff that
we have to really put into our threat model. Probably for this class,
I'm not concerned enough about the grades file to worry about
these guys' laptops being stolen from their dorm room. Although maybe I should be. I don't know. It's hard to tell, right? And as a result,
this security game is often not so
clear cut as to what the right set of
assumptions to make is. And it's only after the
fact that you often realize, well, I should have
thought of that. All right. And sort of, as a
result, this is very much an iterative process. And the thing you end up
realizing at every iteration is, well, here's the
weakest link into my system. Maybe I got the
threat model wrong. Maybe my mechanism had some bugs
in it, because it's software and these are going to
be large systems. They'll have lots of bugs. And you sort of fix them up. You change your
threat model a bit. And you iterate and try
to design a new system, and hopefully,
make things better. So one possible interpretation
of this class-- well, one danger-- is that you come
away thinking, man, everything is just broken. Nothing works. We should just give up
and stop using computers. And this is one
possible interpretation. But it's probably not
quite the right one. The reason this is
going to come up or you're going
to think this way is because,
throughout this class, we're going to look at all
these different systems, and we're going to sort
of push them to the edge. We're going to see, OK,
well, what if we do this? Is it going to break? What if we do that? Is it going to break then? And inevitably,
every system is going to have some sort
of a breaking point. And we'll figure out, oh hey. This system, we can break
it in if we push this way. And this system doesn't work
under these set of assumptions. And it's inevitable
that every system will have a breaking point. But that doesn't mean that
every system is worthless. It just means you
have to know when to use every system design. And it's sort of useful to
do this pushing exercise to find the
weaknesses so that you know when certain ideas work,
when certain ideas are not applicable. And in reality, this is a little
more of a fuzzy boundary, right? The more secure you make
your system, the less likely you'll have some embarrassing
story on the front page of the New York Times saying
your startup company leaked a million people's
Social Security numbers. And then you pay less money
to recover from that disaster. And I guess one sort of actually
positive note on security is that, in many ways, security
enables cool things that you couldn't do before, because
security mechanisms, especially ones that
allow us to protect against certain classes of
attacks, are pretty powerful. As one example, the browser used
to be fairly boring in terms of what you could do with it. You could just view
web pages, maybe run some JavaScript code in it. But now there's all
these cool mechanisms we'll learn about
in a couple of weeks that allow you to run arbitrary
x86 native code in the web browser and make
sure it doesn't do anything funny to your machine. And
there's a technique or system called Native Client
from Google that actually allows us to do this securely. And before, in order to run some
native game on your machine, you'd have to download and install
it, click on lots of dialog boxes, say yes, I allow this.
run it in a browser, no clicking required. It just runs. And the reason it's
so easy and powerful is that our security mechanism
can sandbox this program and not have to assume anything
about the user choosing the right game and not
some malicious game to play in their computer, or
some other program to run. So in many ways, good
security mechanisms are going to enable constructing
cool new systems that weren't possible to construct before. All right. Make sense? Any questions about this story? All right. So I guess in the
rest of the lecture, I want to go through a bunch
of different examples of how security goes wrong. So, so far, we've seen
how you can think of it. But inevitably, it's
useful to see examples of what not to do so that you
can have a better mindset when you're approaching
security problems. And in this sort of breakdown
of a security system, pretty much every one of
these three things goes wrong. In practice, people
get the policy wrong, people get the
threat model wrong, and people get the
mechanism wrong. And let's, I guess, start
with policies and examples of how you can screw
up a system's policy. Maybe the cleanest or sort
of simplest example of this are account recovery questions. So typically, when you
sign into a website, you provide a password. But what happens if
you lose your password? Some sites will send
you email if you lose your password with a
link to reset your password. So it's easy enough, if you
have another email address. But what if this is
your email provider? So at least, several
years ago, Yahoo hosted email, webmail, for
anyone on the internet. And when you forgot
your Yahoo password, they couldn't really
send you email because you couldn't get it. So instead, they
had you register a couple of questions with them
that hopefully only you know. And if you forget your password,
you can click on a link and say, well, here's the
answers to my questions. Let me have my password again. And what turns out to
be the case is-- well, what some people failed to realize is
that this changes your policy, because before, the
policy of the system is people that can log
in are the people that know the password. And when you introduce
these recovery questions, the policy becomes,
well, you can log in if you know either the password
or those security questions. So it strictly weakens the
security of your system. And many people have actually
taken advantage of this. One sort of well known example
is, I think a couple years ago, Sarah Palin had an
email account at Yahoo. And her recovery questions
were things like, well, where'd you go to school? What was your friend's name? What's your birthday? Et cetera. These were all things written
on her Wikipedia page. And as a result, someone
can quite easily, and someone did, actually, get
into her Yahoo email account just by looking up on Wikipedia
what her high school was and what her birthday was. So you really have
to think carefully about the implications
of different security policies you're making here. Perhaps a more intricate and,
maybe, interesting example, is what happens when you have
multiple systems that start interacting with one another. So there's this nice story
about a guy called Mat Honan. Maybe you read this
story a year or two ago. He's an editor at
Wired magazine. And he had a bit of a problem. Someone basically got
into his Gmail account and did lots of bad things. But how did they do it, right? So it's kind of interesting. So all parties in
this story seem to be doing reasonable things. But we'll see how they add
up to something unfortunate. So we have Gmail. And Gmail lets you
reset your password if you forget, as do pretty
much every other system. And the way you do
a reset at Gmail is you send them
a reset request. And what they say
is, well, they weren't going to do these recovery
questions, at least not for this guy. What they do is they send you a
recovery link to a backup email address, or some other
email address that you have. And helpfully, they actually
print the email address for you. So for this guy's
account, someone went and asked Gmail
to reset the password. And they said, well, yeah. Sure. We just sent the recovery
link to this email address at me.com, which was
some Apple email service. OK, but the bad guy doesn't
have access to me.com, either. But they want to get
this password reset link to get access to Gmail. Well, the way things
worked was that, in Apple's case,
this me.com site, allowed you to actually reset
your password if you know your billing address and the
last four digits of your credit card number. So it's still not clear how
you're going to get this guy's-- well, home address,
maybe you could look it up somewhere. This guy was a well
known person at the time. But where do you get the last
four digits of his credit card number? Well, not clear, but
let's keep going further. So you need to send these
things to me.com to get access to his email account there. Well, it turns out this guy
had an account at Amazon, which is another party in this story. Amazon really wants
you to buy things. And as a result, they actually
have a fairly elaborate account management system. And in particular, because they
really want you to buy stuff, they don't require
you to sign in in order to purchase some
item with a credit card. So I can actually go on Amazon,
or at least at the time, I was able to go on Amazon
and say, well, I'm this user. And I want to buy this
pack of toothbrushes. And if I wanted to use
the saved credit card number in the guy's account, I
shouldn't be able to do this. But if I was just providing a
new credit card, what Amazon would do is
actually add that new credit card to the guy's account.
too bad, right? I'm basically
ordering toothbrushes through one of your
Amazon accounts. But it's not your
credit card anyway. It's just my credit
card number being used. So it's not clear how
things go wrong yet. But Amazon had
another interface. All these are
complicated systems. And Amazon had an interface
for password reset. And in order to reset
a password in Amazon, what you had to provide is just
one of the user's credit card numbers. So I can order stuff and
add a credit card number to your account. And then I can say, hey, I
want to reset my password. This is one of my
credit card numbers. And this, in fact, worked. So this is where the bad guy
got a hold of this guy's, Mat's, Amazon account. But OK. How do you fish out
the credit card number for resetting Apple's site? Well, Amazon was
actually very careful. Even if you break into
someone's Amazon account, it will not print you
the saved credit card numbers from that person. But it will show the
last four digits. Just so you know which credit
card you're talking about. So you can list all the credit
cards, other than the one you added. And with those last four digits and the billing address, you can then go and
break into me.com. You can click on this
link and get access to the guy's Gmail account. This is all very subtle stuff. And in isolation,
each system seems to be doing somewhat
sensible things. But it's actually
quite hard to reason about these vulnerabilities
and weaknesses unless you have this whole
picture explained to you and you've sort of put
all the pieces together. So this is actually
fairly tricky stuff. And unfortunately, well,
much like for every one of these three categories, the
answer for how to avoid this is often think hard
and be careful. I guess the one general plan
is, be conservative in terms of what you set
your policy to be, to maybe not depend on things
other sites might reveal. So well, I'm not sure if any
really great advice would have prevented this problem. But now you know. And now you'll make
other mistakes. There's many other
examples of policies going wrong and allowing a
system to be compromised. That's interesting enough. But let's look at how people
might screw up threat models. So let me turn off
this blue square. OK. So what are examples of
threat models that go wrong? Well, probably a big one in
practice is human factors. So we often make
assumptions about what people will do in
a system, like they will pick a good,
strong password, or they will not click
on random websites that they get through email
and enter their password there. So these are-- well, as
you probably suspect, and in practice,
happens to be the case, these are not good
assumptions in all cases. And people pick bad passwords. And people will click
on random links. And people will
enter their password on sites that are actually
not the right site at all. And they will not be
paying a lot of attention. So you probably don't want
to have threat models that make very strong
assumptions about what humans will do because
inevitably, something will go wrong. Make sense? Any questions? All right. Another sort of good thing
to watch out in threat models is that they sometimes
change over time. Or whether something is
a good assumption or not changes over time. One example of this is actually
at MIT in the mid '90s-- mid '80s, actually--
Project Athena developed this system called Kerberos. And we'll read about this in a
couple of weeks in this class. And at the time, they were sort
of figuring out, well, Kerberos is going to be based
on cryptography. So we need to pick
some size keys to make sure they're
not going to be guessed by arbitrary people. And they said, OK. Well you know, 56-bit
keys, at the time, for this cypher called DES,
seemed like a plausible size. Maybe not great, but certainly
not entirely unreasonable. And this was in the mid '80s. But then you know, this system
got popular and got used a lot. MIT still uses it. And they never really went
back to seriously revisit this assumption. And then, a couple years ago,
a group of 6.858 students figured out that actually, yeah,
you can just break this, right? It's easy enough to enumerate
all 2^56 keys these days. Computers are so fast,
you can just do it. And as a result,
they were able to, with the help of some
hardware from a particular web service-- we'll have some links
in the lecture notes-- they were able to get, basically, anyone's
Kerberos account key in roughly a day. And so this assumption
was good in the mid 1980s. No longer a good
assumption today. So you really have to
make sure your assumptions sort of keep up with the times. Maybe a more timely example
is, if your adversary-- or if you're worried
about government attacks, you might realize that you
shouldn't trust hardware even these days, right? There was all these
revelations about what the NSA is capable of doing. And they have
hardware back doors that they can insert
into computers. And maybe up until a couple
years ago, well, who knows? I guess we didn't
know about this stuff. So maybe it was a
reasonable assumption to assume your
laptop is not going to be compromised physically,
the hardware itself. But now you know. Actually, if you're worried
about the government being after you, you probably
have a much harder problem to deal with because
your laptop might be compromised
physically, regardless of what you install in it. So you really have to be
careful with your threat model and really sort of
balance it against who you think is out to get you. I think it's going to be a very
expensive proposition if you're going to try to protect
yourself from the NSA, really. On the other hand, if you're
just protecting yourself from random other
students that are, I don't know, snooping around
in your Athena home directory or whatnot, maybe you
don't have to worry about this stuff as much. So it's really a balancing game
and picking the right threat model. Another example of a bad threat
model shows up in the way secure websites these days
check certificates of a website that you're connecting to. So in this SSL protocol or TLS,
when you connect to a website and it says HTTPS-- we'll
talk much more about this in later lectures--
but what happens is that the site you're
connecting to presents you a certificate signed by one
of the certificate authorities out there that attests
that, yep, this key belongs to Amazon.com. And architecturally,
the sort of mistake or the bad threat model
that these guys assumed is that all these CAs are
going to be trustworthy. They will never make a mistake. And in fact, the
way the system works is that there's hundreds
of these CAs out there. The Indian postal authority,
I think, has a CA. The Chinese government has a CA. Lots of entities are certificate
authorities in this design. And any of them can
make a certificate for any host name
or a domain name. And as a result, what
happens if you're a bad guy, if you want to compromise Gmail
or if you want to impersonate Gmail's website, you
just have to compromise one of these
certificate authorities. And it turns out the
weakest link is probably some poorly run authority
somewhere in some, you know, not particularly
up to date country. Who knows, right? And as a result, it's
probably a bad assumption to build a system--
or it's a bad idea to build a system
around the assumption that you'll manage to
keep all 300 certificate authorities spread out around
the globe perfectly secure. But yet, that's the
assumption underpinning the security mechanism of
today's SSL protocol used by web browsers. And there's sort of many
other, I guess, examples that are things you might
not have thought of. Another sort of amusing example
from the 1980s was DARPA. This defense
agency, at the time, really wanted to build
secure operating systems. And they actually
went so far as to get a bunch of universities
and researchers to build secure OS prototypes. And then they actually
got a red team, like a team of bad guys
pretending to be the attackers, and told them, well, go break
into these secure operating systems any way you can. We actually want to
know, is it secure? And it's kind of amusing,
some of the surprising ways they compromised the systems. One was that there
was this OS research team that seemed to have
a perfectly secure OS, but it got compromised. And the way it happened is that
the server in which the source code of the operating
system was stored was some development
machine in someone's office that wasn't secured at all. But that had all
the source code. So the bad guys broke
into that server. It was not protected very well. Changed the source code
of the operating system to introduce a back door. And then, when the researchers
built their operating systems, well, it had this back door. And the bad guys were
able to break in. So you really have to think
about all the possible sort of assumptions
you're making about where your software
is coming from, about how the bad
guy can get in, in order to make sure your
system is really secure. And there's many other examples
in the lecture notes, if you want-- I'm just using anecdotes here. You can page through those. Probably the most pervasive
problem that shows up, of course, is in
mechanisms, though. And in part, it's
because mechanisms are the most complicated
part of the story. It's the entirety of all
the software and hardware and all that sort
of system components that make up what is trying to
enforce your security policy. And there's no end of ways
in which mechanisms can fail. And, partly as a result,
much of this class will focus pretty
heavily on mechanisms and how do you make
mechanisms that are secure, that provide correct enforcement
of security policies. And we'll talk about threat
models and policies as well. But it turns out it's much
easier to make clean, sort of crisp statements
about mechanisms and ways they work and don't work, as
opposed to policies and threat models which, really,
you have to figure out how to fit them into
a particular context where you're using a system. So let's look at some examples
of, I guess, mechanism bugs. One that you might have heard
in the last couple of days was a problem in the security
mechanism in Apple's cloud infrastructure called iCloud. Well actually, any one
of you that has an iPhone might be using this
iCloud service. They basically provide
storage for files and let you find your iPhone
if you lose it, and probably lots of other useful features. And I think it's some relative
of this me.com service that was implicated in this
scheme a couple years back. And the problem that
someone discovered in this iCloud
service is that they didn't enforce the same sort
of mechanism at all interfaces. OK, so what does
iCloud look like? Well, it basically provides lots
of services for the same sort of set of accounts. So maybe you have your
file storage on iCloud. Maybe you have
your photo sharing. Maybe you have other interfaces. And one of the
interfaces into iCloud-- these are all sort
of at different APIs that they provide-- was this
feature to find my iPhone, I think. And all these interfaces
want to make sure that you are the right user,
you're authenticated correctly. And unfortunately,
the developers of this iCloud system-- you know,
it's a giant piece of software. I'm sure lots of
developers worked on this. But on this
particular interface, the find my iPhone
interface, when you tried to log in with
a username and password, they didn't keep track of how
many times you tried to log in. And the reason this is important is
that, as I mentioned earlier, humans are not that great
at picking good passwords. So actually building a system
that authenticates users with passwords is pretty tricky. We'll actually read a whole
paper about this later on. But one good strategy
is, there's probably a million passwords out
there that will account for 50 percent of accounts. So if you can
make a million attempts at someone's
account, then there's a good chance you'll get
their password because people actually pick
predictable passwords. And one way to
try to defeat this is to make sure that
your system doesn't allow an arbitrary
number of attempts to log in to an account. Maybe after three
or 10 tries, you should say, well,
you've had enough tries. Time out. You can try again in 10
minutes or in an hour. And this way you really
slow down the attacker. So they can only make a
handful of guesses a day, instead of millions of guesses. And as a result, even if
you have not the greatest of passwords, it's going to
be pretty hard for someone to guess it.
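(As a sketch of what that kind of back-off check can look like in code-- this is illustrative C with made-up names and numbers, not Apple's actual implementation:)

#include <time.h>

#define MAX_FAILURES    10
#define LOCKOUT_SECONDS 600   /* make the attacker wait 10 minutes */

struct account {
    int failed_attempts;
    time_t locked_until;
};

/* Before even checking the password, see if this account is allowed
   another attempt yet. */
int attempt_allowed(struct account *a) {
    return time(NULL) >= a->locked_until;
}

/* Call this on every failed password check. */
void record_failure(struct account *a) {
    a->failed_attempts++;
    if (a->failed_attempts >= MAX_FAILURES) {
        a->locked_until = time(NULL) + LOCKOUT_SECONDS;
        a->failed_attempts = 0;
    }
}

The important part is that every interface that accepts a password has to call checks like these; forgetting the call on a single API is enough to reopen the door. What would happen is that iCloud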
had this password-guessing prevention or, basically,
back-off, on some interfaces-- like if you tried to log
in through other interfaces and you failed 10 times,
it would say, well, sorry, you have to wait
until you can try again. But on this Find My
iPhone interface, they forgot this check. That's probably, you
know, some guy just forgot to call this
function on this API. But the result is that, for
the same set of accounts, a bad guy would be able
to now guess your password through this interface at
millions of attempts per day easily, because this is just
limited only by how fast they can send packets to
this iCloud thing. And they can probably
guess your password with pretty good accuracy, or
with pretty good success rate, after making many guesses. And this led to some
unfortunate break ins. And people's confidential
data got stolen from this iCloud service. So this is sort of an example
of where you had the right policy: only the user
with the right password would get you
access to the files. You even had the
right threat model that, well, the bad guy might
be able to guess the password. So we'll have to rate limit
the number of guess attempts. But they just screwed up--
the mechanism had a bug in it. They just forgot to enforce the
right policy in the mechanism at some interface. And this shows up again
and again in systems, where someone just made a mistake and
it has pretty drastic effects on the security of
the overall system. This make sense? Any questions so far? All right. OK. So another example-- this
is sort of an example of you forget to check for
password guessing attempts. There's many other
things you can forget. You could forget to check for
access control altogether. So one example is, Citibank
had a website-- actually, still has a website that allows you
to look at your credit card account information. So if you have a credit
card with Citibank, you go to this
website, it tells you, yeah, you have this credit card. Here's all the charges,
all this great stuff. And the workflow a couple
of years ago was that you go to some site, you provide a
log in username and password, and you get redirected
to another URL, which is something like, I
don't know, I'm guessing, but basically like
citi.com/account?id=1234, you know, whatever. And it turns out that some
guy figured out, well, if you change this
number, you just get someone else's account. And it's not clear quite
how to think of this. One possibility is that these
guys were just thinking right, but they, again, forgot to
check a function in this account page that, not only do I
have a valid ID number, but it's also the ID
number of the guy that's currently logged in. It's an important check to make. But it's easy to forget.
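(Here's a sketch of that missing check in C-- purely illustrative, with made-up names, not Citibank's actual code:)

struct account { int id; int owner_user_id; };

/* Given the id= value from the URL and the user from the login session:
   it's not enough that the account exists; it also has to belong to
   the logged-in user. */
struct account *account_for_request(struct account *accounts, int n,
                                    int requested_id, int session_user_id) {
    for (int i = 0; i < n; i++) {
        if (accounts[i].id != requested_id)
            continue;
        if (accounts[i].owner_user_id != session_user_id)
            return 0;              /* valid account, but not yours: deny */
        return &accounts[i];
    }
    return 0;                      /* no such account */
}

Another thing is, maybe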
these guys were thinking, no, no one could hit URLs. Maybe they had a bad
threat model, right? Maybe they're
thinking, the URL-- if I don't print this URL,
no one can click on it. It's like a bad threat model. So maybe that's-- well, it's
hard to tell exactly what went wrong. But anyway, these
mistakes do happen. And they show up a lot. So it's easy to have
small, seemingly innocuous bugs in your mechanism lead to
pretty unfortunate consequences. Another example that's not
so much in missing checks is a problem that
showed up on Android phones a couple of months ago. Maybe I'll use this
board over here. So the problem was related to
Bitcoin, which is this-- well, I'm sure you've heard--
this electronic currency system that's pretty
popular these days. And the way that Bitcoin
works, at a very high level, is that your balance
of Bitcoins is associated with a private key. And if you have
someone's private key you can, of course,
spend their Bitcoins. So the security of Bitcoin
relies quite heavily on no one else knowing
your private key. It's kind of like a password,
except it's even more important, because people can
probably make lots of guesses at your private key. And there's no real server
that's checking your key. It's just cryptography. So any machine can try
to make lots of guesses at your private key. And if they guess it, then
they can transfer your Bitcoins to someone else. And as a result, it's
critically important that you generate
good, random keys that no one else can guess. And there are people
using Bitcoin on Android. And the Android applications
for Bitcoin were getting random values for these keys using this
Java API called SecureRandom(), which sounds great, but as
people figured out, well, it doesn't
really get truly random numbers. Inside of it, there's
this construction called Pseudorandom
Number Generator, or PRNG that, given
a particular seed value, like you get
maybe a couple of hundred bits of randomness and you
shove it into this PRNG, you can keep asking it for more
randomness and sort of stretch these random bits into as
many random bits as you want. So you seed it
initially, and then you can generate as many
random bits as you want. And for various cryptographic
reasons I won't go into here, it actually works. If you give it a couple of
hundred really good random bits initially, it's going to
be very hard for anyone to predict what the pseudorandom
values it's generating are. But the problem is
that this Java library had a small bug in it. In some set of
circumstances, it forgot to initialize the
PRNG with a seed, so it was just all zeros, which
means that everyone could just figure out what your
random numbers were. If they start with
zeros, they'll produce the same
random numbers as you, which means they'll produce
the same private key as you. So they can just generate
the same private key and transfer your Bitcoins.
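(The real bug was in Java's SecureRandom on Android, but the shape of the problem shows up in any language. Here's a little C analogy, not the actual Android code, of what an unseeded generator gives you:)

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Note there's no srand() call: the generator starts from its
       default fixed state, so every run -- and every attacker who
       runs the same code -- sees exactly this sequence of "random"
       key material. */
    for (int i = 0; i < 4; i++)
        printf("key byte %d: %d\n", i, rand() & 0xff);
    return 0;
}

So this is, again, a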
small or not small bug, depending on, I
guess, who is asking. But nonetheless, right? Another example of small
programming mistakes leading to pretty
catastrophic results. Lots of people got their
Bitcoin balances stolen because of this weakness. Of course, the fix is
pretty simple at some level. You change the
Java implementation of SecureRandom() to always
seed this PRNG with random input bits. And then, hopefully,
you're in good shape. But still, that's yet another
example of mechanism failure. Yeah? AUDIENCE: Just to be clear,
is this a different attack from the DSA
signature randomness? PROFESSOR: Well yeah. So the actual problem
is a little bit more complicated, as
you're hinting at. The problem is, even
if you didn't generate your key on the Android
device in the first place, the particular signature
scheme used by Bitcoin assumes that every time you
generate a new signature with that key, you
use a fresh, what's called a nonce, for
generating that signature. And if you ever generate two
signatures with the same nonce, then someone can figure
out what your key is. The story is pretty similar. But the details are
a little different. So yeah, even if you actually
generated your key somewhere else and your key was great,
it's just that every time you generate a signature,
you use this nonce-- and if you generated two signatures
with exactly the same nonce, or random value, someone
could apply some clever math to your signatures and sort
of extract your public key out of it. Or private key,
more importantly.
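(For the curious, here's roughly how that clever math works-- this is the standard textbook attack on nonce reuse in ECDSA-style signatures, not anything specific to this incident. A signature is a pair (r, s) with s = k^-1 * (h + r*d) mod n, where h is the message hash, d is the private key, and k is the nonce. Two signatures made with the same k share the same r, and s1 - s2 = k^-1 * (h1 - h2), so k = (h1 - h2) / (s1 - s2) mod n. Once you have k, the private key falls out as d = (s1*k - h1) / r mod n.)
All right. Other questions about these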
problems, examples, et cetera? All right. So I guess, one thing I wanted
to point out is that actually, well, as you're
starting to appreciate, in computer security,
almost every detail has a chance of really mattering. If you screw up something
seemingly inconsequential, like forgetting to check
something, or this, or forgetting to
initialize the random seed, it can have pretty
dramatic consequences for the overall system. And you really have to
be very clear about, what is the specification
of your system? What is it doing? Exactly what are all
the corner cases? And a good way to sort of
think of breaking a system or, conversely, figure out
if your system is secure, is to really push
all the edge cases, like what happens if my
input is just large enough? Or what is the biggest
or the smallest input? What is the sort
of strangest set of inputs I could
provide to my program and push it in all
these corner cases? One example of this ambiguity,
sort of a good example to keep in mind, is how
SSL certificates, again, encode names into the
certificate itself. So this is a different
problem than the problem about the certificate
authorities being trusted. So these SSL certificates
are just sequences of bytes that a web server sends to you. And inside of this
SSL certificate is the name of the server
you're connecting to, so something like Amazon.com. You know, you can't just
put down those bytes. You have to encode it
somehow and specify, well, it's Amazon.com. And that's the
end of the string. So in SSL certificates, they
use a particular encoding scheme that writes down Amazon.com
by first writing down the number of bytes
in the string. So you first write down, OK. Well, I'm going to have a 10
byte string called Amazon.com. That's actually 10 bytes. Great. OK. So this is like-- in the
SSL certificate, somewhere in there, there is this byte
10 followed by 10 bytes saying what the host name is. And there's other stuff
afterwards, right, and before. And when a browser takes
it, well, the browser is written in C. And the
way C represents strings is by null terminating them. So in C, a string doesn't
have a length count. Instead, it has all the bytes. And the end of the string
is just the byte zero. And in C, you write it with
a backslash zero character. So this is in memory
in your browser. Somewhere in memory
there's this string of 11 bytes, now, with
an extra zero at the end. And when a browser
interprets this string, it just keeps going until
it sees an end of string marker, which is a zero byte. OK. So, what could go wrong? Any guesses? Yeah? AUDIENCE: You have a zero
in the middle [INAUDIBLE]? PROFESSOR: Yes. This is great. All right. So, this is actually a
bit of a discontinuity in terms of how
this guy represents strings and this guy. So suppose that I own
the domain foo.com. So I can get certificates
for anything dot foo dot com. So what I could do is ask for
a certificate for the name amazon.com\0x.foo.com-- that is, amazon.com, then a zero byte, then x.foo.com. That's a perfectly valid string. It has a bunch of bytes. I guess it's 10, 11
12 13, 14, 15, 16, there's another four, 20, right? So this is 20 byte name
with these 20 bytes. So it used to be that if you
go to a certificate authority, in many cases, you could
say, hey, I own foo.com. Give me a certificate
for this thing. And they'd be perfectly
willing to do it because it's a subdomain of foo.com. It's all yours. But then, when a browser
takes this string and loads it in memory, well,
what it does is the same thing it did here. It copies the string, amazon.com\0x.foo.com. It'll dutifully add the
terminating zero at the end. But then, when the rest
of the browser software goes and tries to interpret the
string at this memory location, it'll keep going up until it
gets to the zero and say, OK well, that's the end of the string. So this is Amazon.com. That's it.
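(You can see the mismatch in a few lines of C-- just an illustration of the two views of the same bytes, not actual certificate-handling code:)

#include <stdio.h>
#include <string.h>

int main(void) {
    char name[] = "amazon.com\0x.foo.com";   /* zero byte in the middle */
    size_t encoded_len = sizeof(name) - 1;   /* 20 bytes, which is what the
                                                certificate's length field says */

    printf("length-prefixed view: %zu bytes\n", encoded_len);
    printf("what C string code sees: \"%s\" (%zu bytes)\n", name, strlen(name));
    return 0;
}

So this sort of disconnect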
between how C software and how SSL certificates
represent names led to some unfortunate
security problems. This was actually
discovered a number of years ago now by this guy,
Moxie Marlinspike. But it's a fairly
clever observation. And these kinds of encoding
bugs are actually also pretty common in
lots of software because, unless you're very
diligent about exactly how you encode things, there might be
different ways of encoding. And whenever there's
disagreement, there's a chance the bad guy
can take advantage of this. One system thinks
that's a fine name. Another thinks it's
something else. So these are good places
to sort of push a system to see how it might break. That make sense? All right. So maybe the last example
of mechanism failure I'm going to talk about today
is a reasonably popular one. It's this problem
of buffer overflows. So some of you have seen this
at least at some level, in 6.033, if you did
the undergrad course. But for those of you that have
forgotten or haven't taken oh three three, we'll sort
of go over buffer overflows in more detail. And this will be, actually,
quite critical for you guys, because lab one is all
about buffer overflows. And you're going
to be exploiting these vulnerabilities in a
somewhat real web server. So let's figure out,
what is the setting? What are we talking about here? So the setting we're
going to be considering is a system which has,
let's say, a web server. So what we have is, we have
some computer out there that has a web server on it. And the web server
is a program that is going to accept connections
from the outside world, take requests-- which are
basically just packets-- and somehow process them, and
do some checking, probably. If it's an illegal
URL or if they're trying to access a file they
are not authorized to access, the web server is going
to return an error. But otherwise, it's going
to access some files, maybe on disk, and
send them back out in some sort of a reply. So this is a hugely common
picture, almost any system you look at. What's the policy? Or what's the threat model? So this is a bit of a problem
in many real world systems, namely that it's
actually pretty hard to pin down what is the
exact policy or threat model that we're talking about. And this sort of imprecision
or ambiguity about policies, threat models, et
cetera, is what sometimes leads to security problems. Not in this particular
case, but we'll see. But maybe just to give
you a sense of how to think of a typical web server
in the context of this policy, threat model kind of stuff, is
that well, probably the policy is, the web server should do
what the programmer intended it to do. It's a little vague. But that's probably what's
going on because anything more specific, as well,
the web server should do exactly what the
code does, is going to be a bit of an [INAUDIBLE].
And if your code has a bug, well, your policy
says, well, that's exactly what I should do. I should follow the bug. So it's a little hard to
state a policy precisely, but in this case, let's
go with some intuitive version of, well, the
web server should do what the programmer wanted it to do. And the threat
model is probably, the attacker doesn't have
access to this machine, can't log in to it remotely,
doesn't have physical access to it, but can send
any packet they want. So they're not restricted
to certain kinds of packets. Anything you can
shape and sort of deliver to this web
server, that's fair game. Seems like a reasonable
threat model, in practice, to have in mind. And I guess the goal is that
this web server shouldn't allow arbitrary stuff
to go wrong here. I guess that sort of
goes along with what the programmer intended. The programmer probably
didn't intend any request to be able to access
anything on the server. And yet, it turns out if you
make certain kinds of mistakes in writing the web server
software, which is basically the mechanism here, right? The web server software is
the thing that takes a request and looks at it and
makes sure that it's not going to do something bad, sends
a response back if everything's OK. The web server in
this mechanism. It's enforcing your policy. And as a result, if the web
server software is buggy, then you're in trouble. And one sort of common
problem, if you're writing software in
C which, you know, many things are
still written in C and probably will continue to
be written in C for a while, you can mismanage your
memory allocations. And as we saw in this SSL
certificate naming example, even sort of a single
byte can really make a huge difference,
in terms of what goes on. And I guess for
this example, we'll look at a small piece of code
that's not quite a real web server. In the lab, you'll have this
whole picture to play with. But for lecture, I
just want to give you a simplified example
so we can talk about what's sort of at the
core of what's going wrong. And, in particular, if
this system wakes up, I will show you sort of
a very small C function. And we can sort of
see what goes wrong if you provide different
inputs to that piece of code. All right. So the C function that I
have in mind is this guy. Somewhere here. Oh, yeah. It's coming on. All right. So here's the sort of
program I'm talking about, or I want to use
as an example here. So this program is just
going to read a request. And you can sort of
imagine it's going to read a request from the network. But for the purposes
of this example, it's just going to read
a request from whatever I'm typing in on the keyboard. And it's going to store
it in a buffer here. And then it's going to
parse it as an integer and return the integer. And the program will then print
whatever integer I get back. It's far from a web server. But we'll at least
see some basics of how buffer overflows
work and what goes wrong.
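(Roughly, the code on the screen looks something like this-- a sketch reconstructed from the description, not the exact demo file, and the function name here is a guess:)

#include <stdio.h>
#include <stdlib.h>

/* Read a "request" from stdin into a fixed-size buffer and parse it
   as an integer. gets() does no length checking at all -- that's the
   bug, and it's why the compiler complains below. */
int read_req(void) {
    char buf[128];
    int i;

    gets(buf);
    i = atoi(buf);
    return i;
}

int main(void) {
    printf("x=%d\n", read_req());
    return 0;
}

So let's see actually what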
happens if we run this program. So I can compile it here. And actually, you
can sort of see the-- it's already telling me
what I'm screwing up, right? The gets() function is dangerous
and should not be used. And we'll see in a second
why the compiler is so intent on telling me this. And it actually is true. But for now, suppose
we're a happy go lucky developer that is willing
to ignore this warning. So OK. I run this redirect function,
I provide some input, and it works. Let's see if I
provide large inputs. If I type in some
large number, well, at least it gives me
some large number. It basically maxes out to
two to the 31 and prints that and doesn't go any higher. So that's maybe not
disastrous, right? Whatever. You provided this
ridiculously large number. You got something
that didn't quite work. It's not quite a problem yet. But if we provide some
really large input, we might get some
other problem, right? So suppose I provide
something else. If I just provide things
that are not numbers, it prints zero. That's not so bad. But suppose I'm going to
paste in a huge number of As. OK, so now the program crashes. Maybe not too surprising. So if it were the case that, when
I send a bad request to the web server, it just doesn't get back
to me or doesn't send a reply, that would be fine. But we'll sort of
look inside and see what happens, and try to
figure out how we can actually take advantage of this crash
to maybe do something much more interesting, or, well, much more
along the lines of what a hacker might be interested in doing. So to do this, we're
going to run this program under a debugger. You'll get super familiar
with this in lab one. But for now, what
we're going to do is set a breakpoint in
that redirect function. And we're going to sort of run
along and see what happens. So when I run the
program, it's going to start executing
in the main function. And pretty quickly,
it calls redirect. And the debugger is now stopped
at the beginning of redirect. And we can actually see what's
going on here by, for example, we can ask it to print
the current CPU registers. We're going to look at
really low level stuff here, as opposed to at the
level of C source code. We're going to look at
the actual instructions that my machine is
executing because that's what really is going on. The C is actually maybe
hiding some things from us. So you can actually
print all the registers. So on x86, as you
might remember. Well, on [INAUDIBLE]
architecture, there's a stack pointer. So let me start maybe drawing
this diagram on the board so we can try to reconstruct
what's happening. So what's going on is that
my program, not surprisingly, has a stack. On x86, the stack grows down. So it sort of is
this stack like this. And we can keep
pushing stuff onto it. So right now, the
stack pointer points at this particular
memory location FFD010. So some value. So you can try to figure
out, how did it get there? One way to do it is to
disassemble the code of this redirect function. Is this going to work better? Really? Convenience variable
must have integer value. Man. What is going on
with my debugger? All right. Well, we can disassemble
the function by name. So this is what the
function is doing. So first off, it starts
by manipulating something with this EBP register. That's not super interesting. But the first thing
it does after that is subtract a certain value
from the stack pointer. This is, basically, it's making
space for all those variables, like the buffer and the integer,
i, we saw in the C source code. So we're actually,
now, four instructions into the function, here. So that stack
pointer value that we saw before is actually already
in the middle, so to say, of the stack. And currently,
there's stuff above it that is going to be the
buffer, that integer value, and actually,
also the return address into the main function
goes on the stack, as well. So somewhere here, we'll
have the return address. And we actually
try to figure out, where are things on the stack? So we can print the address
of that buffer variable. So the buffer variable
is at address D02C. We can also print the
value of that integer, i. That guy is at D0AC. So the i is way up on the stack. But the buffer is a bit lower. So what's going on is that
we have our buffer here on the stack, and then
followed above by i and maybe some other stuff, and
then finally, the return address into the main
function that called redirect. And the buffer
is-- this is going, the stack is growing down. So these are higher addresses. So what this means is that
the buffer-- we actually have to decide, where is the
zeroth element of the buffer, and where is the 128th
element of this buffer? So where does the zeroth
element of the buffer go? Yeah? Should be at the bottom,
right, because yeah, higher elements
just keep going up. So buff of zero is down here. It just keeps going on. And buff of 127 is
going to be up there. And then we'll have
i and other stuff. OK.
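(If you want to poke at this layout yourself, a few lines of C will show it-- this is just an illustration; the exact addresses and even the ordering depend on the compiler and its flags, but on the build used in lecture the buffer sits below i:)

#include <stdio.h>

void layout(void) {
    char buf[128];
    int i = 0;

    /* On the lecture's build, buf starts lower on the stack than i,
       so writing past the end of buf walks upward toward i, the
       saved EBP, and the return address. */
    printf("buf starts at %p\n", (void *)buf);
    printf("i   lives at  %p\n", (void *)&i);
}

int main(void) { layout(); return 0; }

Well, let's see what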
happens now if we provide that input that seemed
to be crashing it before. So I guess one thing
we can actually do before this is to see
whether we can actually find this return address. Where it actually happens to
live is at the EBP pointer. This is just a convenient thing
in the x86 calling convention, that the EBP pointer,
or register, actually happens to point to something
on the stack which is going to be called the saved EBP. It's a separate location, sort
of after all the variables but before the return address. And this is the
thing that's being saved by those first couple
of instructions at the top. And you actually
sort of examine it. In GDB you can say, examine x,
some value, so the EBP pointer value. So that's the location
of the stack, D0B8. Indeed, it's actually
above even the i variable. So it's great. And it has some other
value that happens to be the EBP before
this function was called. But then, sort of one
more memory location up is going to be
the return address. So if we print EBP plus four,
there's something else there, this 0x08048E5F. And let's actually see
where that's pointing. So this is something you're
going to do a lot in the lab. So you can take this address. And you can try
to disassemble it. So what is this guy? Where did we end up? So GDB actually helpfully
figures out which function contains that address. So 5F. This is the guy that our
return address is pointing to. And as you can see, this
is the instruction right after the call to redirect. So when we return
from redirect, this is exactly where we're going
to jump and continue execution. This is, hopefully,
fairly straightforward stuff from double oh four,
some standard OS class. OK. So where are we now? Just to recap, we can try to
disassemble our instruction pointer. So we're at the beginning
of redirect right now. And we can run for a bit, and
maybe run that gets() function. So OK, we run next. What this does is it runs gets()
and it's waiting for gets() to return. We can provide our bad input
to gets() and try to get it to crash again and see what's
going on, really, there, right? So we can paste a
bunch of As again. OK. So we got out of gets() and
things are actually still OK, right? The program is still running. But we can try to figure out,
what is in memory right now and why are things
going to go wrong? Actually, what do
you guys think? What happened, right? So I printed out a bunch of As. What did getS()
do to the memory? Yeah, yeah. So it just keeps
writing As here, right? All we actually passed to
getS() was a single pointer, the address of the start of this buffer, right? So the argument to getS() is a pointer to this memory
location on the stack. So it just kept writing As. And it doesn't actually
know what the length is, so it just keeps going, right? It's going to write As all the way up the stack, past the return address,
probably, and into whatever was up the stack above us. So we can check whether
that's the case. So we can actually
print the buffer. And in fact, it
tells us, yeah, we have 180 As there,
even though the buffer should be 128 elements large. So this is not so great. And we can actually,
again, examine what's going on in that EBP pointer. Dollar sign, EBP. So in fact, yeah. It's all 0x41, which is the
ASCII encoding of the letter A. And in fact, the return
address is probably going to be the same way, right? If we print the return
address, it's also all As. That's not so great. In fact, what's going to
happen if we return now is the program will jump to
that address, 41414141. And there's nothing there. And it'll crash. That's the segmentation
fault you're getting. So let's just step up to
it and see what happens. So let's run next. So we keep stepping
through the program. And we can see where we are. OK. We're getting close to
the end of the function. So we can step over
two more instructions. nexti. And now we can
disassemble again. OK. We're now just at the return
instruction from this function. And we can actually figure out. So as you can see, at
the end of the function, it runs this leave
x86 instruction, which basically restores the stack back to where it was -- it moves EBP back into ESP and then pops the saved EBP. That's what it's basically for. And now, the stack is
pointing at the return address that we're going to use. And in fact, it's all A's. And if we run one
more instruction, the CPU is going to jump to
that exact memory address and start executing
code there and crash, because it's not a valid address
that's in the page table. So let's actually see, just to
double check, what's going on. Let's print our buffer again. Our buffer-- well, that's
actually kind of interesting, right? So now, buffer,
for some reason it only says A repeats 128 times. Whereas if you remember before,
it said A repeated 180 times in our buffer. So what happened? Yeah? AUDIENCE: [INAUDIBLE]. PROFESSOR: Yeah, yeah. Exactly. So there's actually
something going on after the buffer
overflow happens that changes what's going on. So actually, if
you remember, we do this atoi conversion of the string to an integer. And if you give atoi all As, it returns zero, and storing that zero into i writes a zero byte into the memory just past the end of the buffer. And a zero byte, if you remember, terminates strings in C. So GDB now thinks, yep, we have a perfectly well-terminated, 128-byte string of all As. But you know, it
doesn't really matter, because we still have
those As up top that already corrupted our stack. OK. That was actually kind
of an important lesson that-- it's actually a
little bit tricky, sometimes, to explore these buffer
overflows because, even though you've already changed
lots of stuff on the stack, you still have to
get to the point where you use the value
that you have somehow placed on the stack. So there's other
code that's going to run after you've
managed to overflow some buffer and corrupt memory. You have to make sure that
code doesn't do something silly. If atoi, for instance, had just exited right away as soon as it saw a non-integer value, we might never get to jump to this 41414141 address. So you have to massage your input in some cases. Maybe not so much in this case. But in other situations, you'll have to be careful in constructing this input.
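As a tiny aside -- this example is mine, not the lecture's -- atoi returns 0 whenever its argument has no leading digits, which is exactly why the all-As input quietly stores a zero into i:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        printf("%d\n", atoi("AAAA"));     /* prints 0: no leading digits */
        printf("%d\n", atoi("123AAAA"));  /* prints 123: digits, then it stops */
        return 0;
    }

OK, so just to see what happens,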
we can jump one more time. Well, let's look
at our register. So right now, our EIP, the
sort of instruction pointer, is pointing at the
last thing in redirect. And if we step one more
time, hopefully we'll jump to, finally, that
unfortunate 4141 address. Over here. And in fact, yep. The program now seems
to be executing there. If we ask GDB to print the
current set of registers, yep, the current instruction
pointer is this strange value. And if we execute
one more instruction, it's going to crash
because that's finally trying to execute an instruction
pointer that doesn't correspond to a valid page in the
operating system's page table for this process. Make sense? Any questions? All right. Well, I've got a question
for you guys, actually. So what happens-- you know,
it seems to be exploitable. Or well, OK. Maybe let's first figure out
why this is particularly bad, right? So why is it a problem? So not only does
our program crash, but presumably we're
going to take it over. So I guess, first
simple question is, OK, so what's the problem? What can you do? Yeah? AUDIENCE: You can do
whatever you want. PROFESSOR: Yeah. So I was actually pretty silly
and just put in lots of As. But if you were
careful about knowing where to put what
values, you might be able to put in
a different value and get it to jump
somewhere else. So let's see if we can
actually do this, right? We can retrace this whole thing. OK. Re-run the program again. And I guess I have to
reset the breakpoint. So I can break and
redirect again. And run. And this time,
I'll, again, next, supply lots of As
and overflow things. But I'm not going to try to carefully construct the input -- you know, figure out which point in these As corresponds to which location on the stack. That's something you guys are going to have to do for lab one.
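When you do get to that point in the lab, the input you build will probably look something like the sketch below. Every offset and address in it is a placeholder -- finding the real ones with GDB is the actual work -- but it just follows the stack picture from earlier: junk up to the return address, then your chosen target address on top of it.

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* hypothetical offsets: in the lecture's run buf was at ...d02c and the
           saved EBP at ...d0b8, which puts the return address 144 bytes past the
           start of buf; on your machine the offsets will differ */
        unsigned char payload[144 + 4];
        memset(payload, 'A', 144);                   /* junk up to the return address */

        unsigned int target = 0x08048e5f;            /* placeholder target address */
        memcpy(payload + 144, &target, 4);           /* lands on the return address */

        fwrite(payload, 1, sizeof(payload), stdout); /* feed this to the program */
        return 0;
    }

But for now, suppose that I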
overflow the stack here. And then I'm going
to manually try to change things on the stack to
get it to jump to some point I want to jump to. And in this program, OK,
so let's again-- nexti. Where are we? We're at, again, at the
very end of redirect. And let's actually look
at the stack, right? So if we examine esp here,
we see our corrupted pointer. OK. Where could we jump to? What interesting
things could we do? Unfortunately, this
program is pretty limited. There's almost nothing
in the program's code where you could jump and
do anything interesting. But maybe we can do a little
bit of something interesting. Maybe we'll find
the printf in main and jump directly there, and
get it to print the x value, or x equals something. So we can do this. We can actually disassemble
the main function. And main does a
bunch of stuff, you know, initializes, calls
redirect, does some more stuff, and then calls printf. So how about we jump to
this point, which is, it sets up the
argument to printf, which is x equals percent d,
and then actually calls printf. So we can actually
take this value and try to stick
it in the stack. And should be able to do
this with the debugger pretty easily, at least. So in GDB you can say set {int}$esp = this value. So we can examine esp again
and, indeed, it actually has this value. So if we continue now,
well, it printed out x equals some garbage,
which I guess happens to be just whatever
is on the stack that was passed to printf. We didn't correctly set
up all the arguments because we jumped in the middle
of this calling sequence. But yeah, we printed this value. And then it crashed. Why did it crash? Why do you think? What actually happened, right? So we jump to printf. And then, something went wrong. Yeah? Well, we changed
the return address so that when we
return from redirect, we now jump to this new address,
which is that point up there, right after printf. So where's this
crash coming from? Yeah? AUDIENCE: Is it
restricted because your i is supposed to be some
sort of integer, but-- PROFESSOR: No, actually,
well the i is like, well it's a 32-bit register. So whatever's in the
register, it'll print. In fact, that's the thing
that's in the register. So that's OK. Yeah? AUDIENCE: [INAUDIBLE]
main returns. PROFESSOR: Yes. Actually, yeah. What's going on is, you
have to sort of-- OK, so this is the point
where we jumped. It's set up some arguments. It actually calls printf. printf seems to work.
printf is going to return. Now actually, that's fine,
because this call instruction put a return address on the
stack for printf to use. That's fine. Then main is going
to continue running. It's going to run the leave instruction, which doesn't do anything interesting here. And then it does another return. But the thing that's up there on the stack isn't actually
a valid return address. So presumably, we
return to some other who knows what memory location
that's up on the stack and jump somewhere else. So unfortunately,
here, our pseudoattack didn't really work. It ran some code. But then it crashed. That's probably not
something you want to do. So if you really
wanted to be careful, you would carefully plant not
just this return address up on the stack, but
maybe you'd figure out, where is this second ret going
to get its return address from, and try to carefully
place something else on the stack
there that will ensure that your program cleanly
exits after it gets exploited so that no one notices. So this is all
stuff you'll sort of try to do in lab one in
a little bit more detail. But I guess one thing we
can try to think about now is, we sort of understand
why it's bad to jump to the-- or to have these
buffer overflows. One problem, or one sort
of way to think of this is that, the problem is just
because the return address is up there, right? So the buffer keeps
growing and eventually runs over the return address. What if we flip
the stack around? You know, some machines actually do have stacks that grow up.
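If you're curious which way the stack grows on your own machine, here's a quick, admittedly informal check (comparing pointers into different objects isn't strictly portable C, but it gets the idea across):

    #include <stdio.h>

    /* compares the address of a local in the callee with one in the caller;
       on x86 you should see "down" */
    void callee(char *caller_local) {
        char callee_local;
        printf("stack grows %s\n",
               &callee_local < caller_local ? "down" : "up");
    }

    int main(void) {
        char caller_local;
        callee(&caller_local);
        return 0;
    }

So an alternative design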
we could sort of imagine is one where the stack
starts at the bottom and keeps going up
instead of going down. So then, if you
overflow this buffer, you'll just keep
going up on the stack, and maybe there's nothing
bad that will happen. Yeah? AUDIENCE: [INAUDIBLE]. PROFESSOR: So you're right. It might be that,
if you have-- well, so let me draw this
new stack diagram. And we'll sort of try to figure
out what it applies to and not. But OK. So we'll basically just
invert the picture. So when you call redirect on
this alternative architecture, what's going to happen
is the return address is going to go
here on the stack. Then we'll have our i variable,
or maybe the saved EBP. Then we'll have our i variable. And then we'll have buff. So we'll have buff of zero,
buff 127, and so on, right? So then when we do the overflow,
it overflows up there and maybe doesn't hit anything bad. I guess what you're
saying is that, well, maybe we had
a buffer down there. And if we had a buffer
down there, then yeah, that seems kind of unfortunate. It could overrun
this return address. So you're right. So you could still
run into problems on this stack growing up. But what about
this exact program? Is this particular
program safe on machines where the stack grows up? So just to recap, the program we're looking at is this guy. Yeah? AUDIENCE: Still
going to overwrite [INAUDIBLE] as a return value. PROFESSOR: Yeah. So that's actually
clever, right? So this is the stack
frame for redirect. I guess it actually spans
all the way up here. But what actually happens
when you call getS() is that redirect makes a function call. It actually saves its return
address up here on the stack. And then getS() starts running. And getS() puts its
own saved EBP up here. And getS() is going to put
its own variables higher up. And then getS() is going
to fill in the buffer. So this is still problematic. Basically, the buffer is surrounded by return addresses on both sides. Either way, you're going to
be able to overflow something. So at what point-- suppose we
had a stack growing up machine. At what point would
you be able to take control of the program's
execution then? Yes -- as soon as getS() itself returns. And that is actually even easier in some ways. You don't have to wait until redirect returns, where there's stuff like that atoi call that could mess you up. No. It's actually easier, because
getS() is going to overflow the buffer. It's going to change
the return address, and then getS() immediately returns and jumps to wherever you pointed it. Makes sense? So what happens if we
have a program like this that's pretty boring? There's like no real
interesting code to jump to. All you can do is get it to
print different x values here. What if you want to do
something interesting that you didn't-- yeah? AUDIENCE: I mean, if you have an executable stack, you could put
arbitrary code that, for example, executes a shell? PROFESSOR: Yeah yeah yeah. So that's kind of clever,
right, because you actually can supply other inputs, right? So at least, well-- there's
some defenses against this. And we'll go over these
in subsequent lectures. But in principle, you could
have the return address here that you override on either the
stack up or stack down machine. And instead of pointing
it to some existing code, like the printf
inside of main, we can actually have the return
address point into the buffer. So it's previously just
some location on the stack. But you could jump there
and treat it as executable. So as part of your
request, you'll actually send some bytes of
data to the server, and then have the return address
or the thing you overwrite here point to the base of the
buffer, and you'll just keep going from there. So then you'll be able
to sort of provide the code you want
to run, jump to it, and get the server to run it. And in fact, traditionally,
in Unix systems, what adversaries would often
do is just ask the operating system to execute /bin/sh, which lets you type in arbitrary shell commands after that. So as a result, this thing, this piece of code you inject into the buffer, was often called, sort of for historical reasons, shell code.
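Written as ordinary C -- and this is just an illustration, not the lab's actual shell code -- what that injected code boils down to is one system call:

    #include <unistd.h>

    int main(void) {
        char *argv[] = { "/bin/sh", NULL };
        execve("/bin/sh", argv, NULL);   /* replace this process with a shell */
        return 1;                        /* only reached if execve fails */
    }

The real thing is the same idea hand-written as position-independent machine code, small enough to fit in the buffer. And you'll try to construct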
some in this lab one as well. All right. Make sense, what
you can do here? Any questions? Yeah? AUDIENCE: Is there a separation
between code and data? PROFESSOR: Right. So is there a separation
between code and data here? At least, well,
historically, many machines didn't enforce any
separation of code and data. You'd just have a flat
memory address space. The stack pointer
points somewhere. The code pointer
points somewhere else. And you just execute wherever
the code pointer, instruction pointer is pointing. Modern machines try to
provide some defenses for these kinds of attacks. And what modern
machines often do is, they actually
associate permissions with various memory regions. And one of the
permissions is execute. So the part of your
32-bit or 64-bit address space that contains code
has the execute permission. So if your instruction
pointer points there, the CPU will actually
run those things. And the stack and other data
portions of your address space typically don't have
the execute permission. So if you happen to somehow
set your instruction pointer to some non-code memory
location, you can set it, but the CPU will
refuse to execute it. So this is a reasonably nice way to defend against these kinds of attacks. But it doesn't prevent quite everything.
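On Linux you can actually see these permissions: every mapping in /proc/<pid>/maps is marked with some subset of rwx, and memory you get for data normally comes back without the x bit. A small sketch of that, using the usual mmap flags:

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        /* a page we can read and write but not execute (no PROT_EXEC) */
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;
        printf("rw- page at %p; pointing EIP here would fault\n", p);
        /* ((void (*)(void))p)();   <- uncommenting this jump would SIGSEGV */
        return 0;
    }

So just a question. OK. So how would you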
bypass this if you had this non-executable stack? You actually saw this
example earlier, right, when I actually jumped
to the middle of main. So that was a way of sort
of exploiting this buffer overflow without having to
inject new code of my own. So even if the stack
was non-executable, I would still be able to
jump in the middle of main. In this particular case,
it's kind of boring. It just prints x and crashes. But in other
situations, you might have other pieces of
code in your program that are doing interesting
stuff that you really do want to execute. And that's sort of called return
to libc attacks for, again, somewhat historical reasons. But it is a way to bypass
the security measures. So in the context
of buffer overflows, there's not really
a clear cut solution that provides perfect protection
against these mistakes because, at the end of the
day, the programmer did make some mistake in
writing this source code. And the best way to fix it
is probably just to change the source code and make sure you don't call getS() at all, like the compiler warned you.
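For instance -- a minimal sketch, not the lab's code -- the usual replacement is fgets, which takes the buffer size and never writes past it; long input just gets truncated instead of spilling onto the stack:

    #include <stdio.h>
    #include <stdlib.h>

    int redirect_fixed(void) {            /* hypothetical fixed version */
        char buf[128];
        /* fgets writes at most sizeof(buf)-1 bytes plus a terminating NUL */
        if (fgets(buf, sizeof(buf), stdin) == NULL)
            return 0;                      /* EOF or read error */
        return atoi(buf);
    }

And there's more subtle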
things that the compiler doesn't warn you about. And you still have to
avoid making those calls. But because it's
hard, in practice, to change all the
software out there, many people try to
devise techniques that make it more difficult
to exploit these bugs. For example, making the
stack non-executable, so you can't inject the
shell code onto the stack, and you have to do something
slightly more elaborate. And next couple of
lectures, next two lectures, actually, we'll look at
these defense techniques. They're not all perfect. But they do, in
practice, make it much more difficult for that
hacker to exploit things. Question? AUDIENCE: I just have a general
administrative question. PROFESSOR: Yeah? AUDIENCE: I was wondering
if there was a final? And also if there are
quizzes, and what dates-- PROFESSOR: Oh yeah. Yeah, I think if you go
to the schedule page, there's two quizzes. And there's no final
during the final week, but there's a quiz
right before it. So you're free for
the final week, but there's still something
at the end of the class. Yeah. All right. OK. So I think that's probably
it for buffer overflows. I guess the one
question is, so what do you do about
mechanism problems? And the general answer is to
probably have fewer mechanisms. So as we saw here,
if you're relying on every piece of software to
enforce your security policy, you'll inevitably
have mistakes that allow an adversary to bypass
your mechanism to exploit some bug in the web server. And a much better design, and one that you will explore in lab two, is one where you structure
your whole system so the security of
the system doesn't depend on all the
pieces of software enforcing your security policy. The security policy
is going to be enforced by a small
number of components. And the rest of
the stuff actually doesn't matter, for security purposes, whether it's right or wrong. It's not going to be able to violate your security policy at all.
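The shape of that design -- and this is only a sketch of the idea, not lab two's actual architecture -- is to keep the privileged part tiny and push the bug-prone work into a process that has already given up its privileges, so a buffer overflow there buys the attacker very little. The uid 65534 below is the conventional nobody user, used purely as an example, and this assumes the parent starts with enough privilege to drop it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void handle_untrusted_request(void) {
        /* imagine the big, bug-prone parsing code living here */
    }

    int main(void) {
        pid_t pid = fork();
        if (pid == 0) {
            /* child: drop privileges before touching any untrusted input */
            if (setgid(65534) != 0 || setuid(65534) != 0) {
                perror("drop privileges");
                exit(1);
            }
            handle_untrusted_request();
            exit(0);
        }
        /* parent: the small piece you actually have to trust */
        waitpid(pid, NULL, 0);
        return 0;
    }

So this kind of minimizing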
your trusted computing base is a pretty powerful technique
to get around these mechanism bugs and problems that we've
looked at today, at least in a little bit of detail. All right. So read the paper for Monday. And come to Monday's lecture. And submit the questions
on the website. See you guys then.