[MUSIC PLAYING] SPEAKER: So today we're going to
have our first of a few discussions about cybersecurity,
and later on we're going to talk a little bit about cybersecurity
in the context of the internet and some of the challenges
that it brings up there. But today we're going to focus mostly
on cybersecurity issues related to your machine, your
computer without necessarily being connected to the internet. Before we do, we need to
understand a little bit more about our machine's
infrastructure, its hardware. And the biggest question
to ask at the outset is, when we talk about the system's
memory, what do we mean by that? That term kind of gets thrown around and
it means a couple of different things, potentially. It might mean your system's
RAM or random access memory, which is a rough indication
of how much working space it has, how many things it can juggle at once. And we can also talk
about hard drive space as another example of system memory. Hard drive space is usually
just free storage, basically. How much room do we have to
literally store files on our machine? How much memory does your computer have? Maybe you know or maybe you don't. If you take a look at
your system information or look up the computer that
you bought on the internet, you might find that if we're
quoting memory in terms of RAM, that your device might have as
low as 512 megabytes of RAM, which is about half of a gigabyte. And that's not very much; most
machines have much more than that now unless you have a low
powered Chromebook, for example, that you use for travel. Memory on the RAM scale might go
as high as 32 gigabytes of RAM, which is quite a bit more than that. That's generally for
really high end computers. Computers, in particular, that
process a lot of graphics. So sometimes computers that are
specifically dedicated for gaming might have that much RAM. But typically the range is somewhere
between four and 16 nowadays. When we're talking about hard drive
space, that number is quite a bit bigger. So the typical hard drive nowadays
might be as low as 128 gigabytes, if the drive is a solid state
drive, versus a hard disk drive. We won't go into too much detail about
the distinction between those two things, other than right now
to say those are just two different ways to store data long term. So that might be the low end. The high end is probably somewhere
around two terabytes of information. One terabyte is 1000
gigabytes, give or take. So two terabytes would be about
2000, give or take, gigabytes. So quite a bit. Maybe even as high as four terabytes. That's quite a bit of
storage space. That's enough to store several
hundred HD, high quality films. But there's much more to memory
than just RAM and hard disk space. There's actually kind of a hierarchy of
memory that exists within your machine. Most of these numbers,
though, aren't usually quoted in the specs of a device. So there's RAM, random
access memory, and then there's a series of caches, each
of which gets successively smaller. So they're going to be quite a
bit smaller than the four gigs, say, of RAM that your device has. But they're also a little bit faster,
and the reason these things get faster, these caches get faster, is they
are getting closer and closer to the computer's processor, which is
really the only part of the device that is able to manipulate information. It's the only part that
can process information. So the memory that we're
feeding to that processor needs to get faster and
faster, such that it can continue to swap things in and out. So we have the RAM, maybe an L3 cache,
a Level 3 cache, Level 2, Level 1, and then finally CPU memory, which
is the processor memory itself. Plus some small bits of
memory called registers, which are used as the final sort
of handoff of information from RAM or this hierarchy of
memory into the CPU. But again, every file on your
machine lives somewhere permanently on a disk drive. And there are, again, two
different kinds of disk drives. We have solid state drives
and hard disk drives. We should treat them as
effectively identical for purposes of our discussion today. Solid state drives tend to
behave a bit differently than hard disk drives; they tend to be a bit less
exposed to some of the vulnerabilities that hard disk drives
present, which we're going to talk about a little
bit later in today's lecture. But in general, when we talk
about hard disks or storage space for the rest of today's
lecture, we're going to be mostly focusing
on hard disk drives. They're also just much
more prevalent still. Solid state drives are coming into their
own and becoming more and more frequent as they appear in devices,
but hard disk drives are still far and away
more prevalent within devices that exist now. They are just storage space,
though; we can't do anything with data that is stored on disk. We have to first move
it to RAM and then have it sort of go up and down that chain of
RAM, the different caches to the CPU, in order to actually
manipulate the data. Once we're done manipulating
it, and maybe we're turning our computer
off for the evening, then all of the data that is in RAM will
be stored back into the hard disk space so that we're able to
access it at another time. One thing to keep in mind as we
begin this discussion of memory, though, is that memory
is really just an array. And we've talked about arrays
already, where each cell of that array basically is one byte wide. And recall that one byte is eight bits. We may have anywhere from
512 megabytes of memory, so about 512 million of
those little one byte sized cells, to maybe as high as
4, 8, 16, and so on gigabytes. And we have quite a few of
those items in our array. But it really is just
an array, which means we can jump to different addresses. It has the same properties
as any other random access array that we've already discussed. Different types of data take
up different amounts of memory on our systems. So let's think about a very low
level programming language like C, which is just an example. Different programming languages
may store different types of data using different amounts of space. But if we look to just the
most base level of data and think about the smallest individual
pieces into which we can break it, we may be able to store an
integer, for example, in four bytes, which means we have exactly 32 bits
worth of space to store an integer. Characters will take up one byte, so
we have only eight bits worth of memory required to store a single character. So capital or lowercase letters,
digits, punctuation marks, and so on. Not a huge variety of options there. Floats, you may
recall, are real numbers, numbers that have
decimal points in them. Doubles are, as well. They're double precision
floating point values. Floats take up four bytes and doubles take up eight. So basically the idea here
is different types of data will take up different
amounts of space, and then we eventually can construct these things
into pixels, and images, and films, each of which will also take up
different amounts of space and memory if we are manipulating or
working with that data. So again, let's think of memory as a big
array of individual byte-sized cells. Because it is an array, that means
we have random access. We can say, I want to go to memory
address x and see what is there. I want to go to memory address
y and change what is there. We have the ability to do that. We don't have to iterate through step by
step by step in order to make changes. If we did, the processor would be quite
a bit slower having to perform what we might term a linear search as
we try to iterate through memory to find the one byte we're looking for. It's very helpful to be able
to jump to a particular byte. And that means that every location
in memory must have an address. We must have a way to refer
to that individual byte in order to randomly access it. We can't just look at this grid of
cells and say, I want to go to this one and sort of, you know,
imagine a particular spot. We need to say, I want to go
to exactly this memory address. OK? So the fact that memory
cells have an address is what comes into play when
you think about this idea of a 32-bit system or a
64-bit system, and this may be a term that you've heard before. It refers to the ability
to process an address. So for example, a 32-bit
computer, a 32-bit system, can process memory addresses
up to 32 bits in length. Which means it understands memory
address zero through memory address right up to four billion,
a little over four billion. But it doesn't understand
memory past that. Now interestingly, this doesn't
mean that a 32-bit system is limited to four gigabytes of RAM. There are some software tricks that we
can pull using something called virtual memory, which we're not going to get
into in any more depth than to refer to it as virtual memory today, that
allow you to use more than four gigabytes of RAM on a 32-bit system
by, sort of, you know, pretending that things live
somewhere where they don't. But when you talk about
a 64-bit system, that means we have many more
memory cells that we can refer to without running into our
sort of artificial limit of how high we can count. Now granted, there are
no memory banks out there that have all of the memory addresses
from zero up to the top of a 64-bit range. That's two to the 64th power,
about 18 quintillion addresses. It's a very, very large
number and we don't yet have the storage capacity to store
that much data on our machines. But theoretically, it is possible
that with a 64-bit system we could have very, very large amounts
of RAM and again, the more RAM we have, generally the more
quickly our computer is going to operate because there's more
space for it to store information. It doesn't have to keep sending
stuff back to the hard drive when the RAM is full because
there's so much information being processed at once. More of it is available in that
quicker, more accessible bit of memory. So recall that a
bit can only take on one of two states. Zero or one, off or on. Or you can think about it in terms of
electricity, which is how RAM actually works, as being unpowered or powered. That again means that with 32 bits we have two to the 32nd power
possible memory addresses. So about four billion memory addresses. Now it is sometimes the case that
programmers, and subsequently, those who may need to read
their code, may need a way to refer to specific memory addresses. But consider a memory address like this
one, written out in binary. There are zeros and
ones in this address. This is exactly how we would
refer to an address in memory. This is rather cumbersome. No programmer wants to talk to another
programmer and no programmer wants to talk to an advisor by saying the
code that lives at 00101 and so on. That's just not-- that
doesn't make any sense. That's just not how we would talk
and it would take forever just to say the name of the
memory before you even get to the point of what is in that memory. And so rather than using binary
notation to refer to a memory address, computer scientists will oftentimes use
something called hexadecimal notation. Hexadecimal means 16: hexa and decimal, 6 and 10. And so this is the
base 16 number system. It's a different number system
than the decimal system, base 10, that we have used since
childhood to count and understand place values of numbers and so on. What's convenient
about hexadecimal being base 16 versus binary being base two
is that four binary digits or four bits can be represented using a single
what is often called hex digit. So for every group of four
binary digits that we have, we can represent that more succinctly
using just one hexadecimal digit. And because there are
four bits, that means we have two to the fourth,
or 16 different combinations. So we can account for every
single possible on off combination of all of the four bits in
that cluster using a single hex digit. So we might instead refer to this
memory address looking like this. And there are some letter
characters in there, and that's because in order to
represent a single digit in hexadecimal, we need to be able to
count up to 15 without using two digits, as we
are confined to in decimal. In decimal, in order to represent
the number 10, we need a one and a zero, a one being in the tens
place and a zero in the ones place. But in hexadecimal, we
need 16 possible digits to represent all of the 16 possible
values at any given place value. So here's an example of something
that a programmer might see. This is using a tool
called GDB, which is a debugging tool that is used to debug
or root out problems in some low level code. And all we're seeing here is
a bunch of memory addresses. So I've highlighted them here in yellow. We don't need to worry too much about
the context around this, what these all refer to. But basically, these things on the
left, EAX, ECX and so on are registers. Those are things that are
very close to the processor. And they are storing the memory
address of something else. And so all these things on the left
here are just memory addresses, and the things on the right are
translations of those memory addresses in some cases into
decimal numbers that make more sense to us having used the base 10
or decimal system for quite some time. So we can map all of the different
possible values in hexadecimal to their binary equivalents
as well as to decimal numbers that we're familiar with. So again, here we have all of the
possible combinations of four bits or zeros and ones showing you
what they translate to in decimal, recalling that for every set of four
bits here we see, the one on the right is the ones place, the one to
its left is the twos place. Then we have the fours
place and the eights place. Because again, our base is two. Every place value is a power of
two as opposed to a power of 10 like we would have in decimal. And then there's its hexadecimal equivalent. So again, for every single
one of those combinations, we have one distinct way to represent
it using a single hex digit. And sometimes you'll see
the hex digits for 10 through 15, which are a through
f, presented in capital letters. I like to present them
in capital letters, but sometimes you see them
in lowercase letters as well. That is immaterial to it. And this zero x at the beginning of
it, I should mention that as well. Zero x has no numeric value of its own. It is purely a note
for us as human beings when we are seeing something like
this that we should interpret it as hexadecimal numbers as opposed
to as decimal, for example. Because we could have a valid
hexadecimal string that is-- I'm going to use the zero
x here just for a second-- 0x, five, zero. If we saw that and
we didn't have a 0x in front of it, we might read that as 50, which would
not actually be accurate, because 0x, five, zero is actually
80 in decimal notation. So that 0x is really just a
guide for us as human beings to say, OK, what I'm about to
read here is a hexadecimal number. Let's just do a quick
exercise where we translate some binary into hexadecimal and then
subsequently into decimal as well. And so here, we have eight bits, each
of which again is a zero or a one, and our goal is to translate
this into ultimately decimal, but let's start by translating
it into hexadecimal. The first approach is
counting from right to left, we want to split these
into groups of four. It so happens that we
have eight bits here, and so this splits pretty
cleanly into two groups of four. But if we, for example, had seven
bits, like if this wasn't here, we would start by having
one zero one zero, and then whatever we had
left over, we would just pad with extra zeros at the front so
we always had a cluster of four bits at a time to work with. Each of these maps directly
to a single hexadecimal digit. And sometimes you may be able to
just quickly do this in your head, or you can jump back to
the table that we had here to see when I see this
particular pattern, I want to plug in this
hexadecimal digit. And so if we do that here, we see
that the one on the left, 0010, this is in binary again. A zero in the ones place,
a one in the twos place, and nothing else, which
means we have one times two. And so this would be a two. And 1010, well, that's a one
in the eights place and a one in the twos place, which is 10. But in hexadecimal, we would
represent that as a, because again, we need to confine this idea
of 10 to a single place value. We can't have two digits to represent
it using hexadecimal notation. And so this binary
value, 00101010, is 0x-- again, human convention to
prepend a 0x in front of anything that is a hexadecimal number-- 0x2a. Now, how do we translate
this to decimal? Well, it may help to think about
how we translate this or understand this number, 123. When we see it, one two
three just written out, we are really doing something like
this in our head where we're saying, there's a one in the one hundreds
place, there's a two in the tens place, and there's a three in the ones place. And we've just over
time internalized that and have been able to
very quickly understand that the number I'm
talking about here is 123. Well, another way to think
about these labels here, one hundreds place, tens place,
and ones place, might be to say, we have the 10 squareds place or
the 10 to the second powers place, the 10 to the first powers place,
and the 10 to the zero powers place. Any nonzero number to the zero
power is always one, and so this is really the hundreds place,
the tens place, and the ones place. With hexadecimal, we don't have 10
as the base of the exponent here. Instead, we have 16 as
the base of the exponent. But the rules are the same. We have a 16 to the
zero place, which is the ones place. We have 16 to the first
power or 16s place, and we have a 16 squared or 256s place. In our example number here,
we didn't go that high. We had 0x2a. We only had two digits,
which means we really only needed these two place values,
the 16 to the zero power and the 16 to the one power. Now, we just translate this in exactly
the same way that we would intuitively do it when we're counting in
decimal or reading a decimal number. This is zero times 16
squared plus two times 16 to the first power plus a times
one, or 16 to the zero power. Two times 16 is 32, and a, which again
is hexadecimal's way of representing 10, 10 times one is ten,
so what we're really saying is that we have 32 plus 10. And so to translate this hexadecimal
number, 0x2a, into decimal, we end up with 42,
because 42 is 32 plus 10. So hopefully, that gives you a
bit of a better understanding of what these cryptic number strings
that you might have seen before mean. And if you're working with programmers
or you're ever analyzing source code and you see references
like this, hopefully this gives you a better
understanding of what they mean and what they likely
refer to on the system and how that might affect things. Let's talk a little bit more about
the function, how memory actually works now that we know how to
access individual parts of it. With the exception of hard
disk space-- so again, the permanent storage
space on your device-- memory on your computer
is termed volatile, which means two different things. One, that the memory
is constantly changing. Things are cycling in and out of it. It's very dynamic in terms of the
values that are being stored there, again because the RAM is sort of this
holding ground for everything that's going to eventually need
to go to the processor, and things are getting swapped
in and out pretty frequently. But the other really key
detail about volatile memory is that it requires power. If it is unpowered, if there
is not electricity literally flowing to the RAM at any
given time, that is a problem and that memory will no longer work. In fact, after some amount of
time, a pretty small amount of time like 30 seconds to a minute perhaps,
without power, the electrical charge which is used to maintain each of
those individual cells of memory-- remember, a little bit
of electricity being one, and the absence of
electricity being zero is how the computer can store
this idea of zeros and ones in a physical form. Without power, that electrical
charge eventually dissipates. It does not just stay. It goes away. And the state is eventually lost such
that, after being unpowered for about a minute or so, all the data in RAM has
effectively turned into zeros. It has become
completely unpowered. Now obviously, that would be
very bad if our entire system relied on this technology. But it's only RAM and the caches from
RAM going forward that rely on this. Processing can only
happen in the processor. This probably makes a
little bit of sense. And again, recall that
a 32-bit processor can understand 32-bit addresses. That also means that it only has 32
bits of space in which to do anything. So it only can work with four
bytes of information at a time. And maybe if you have a computer
that has multiple cores, maybe you've heard that term
before, multicore processors, you might have a few of these processors
that can do four bytes at a time. But either way, we're still talking
about a very, very small amount of information, maybe
four to 16 or 32 bytes. That's not very much at
all when you consider that a basic document
perhaps using Microsoft Word will contain enough metadata to be
about 15,000 bytes before you even type a single character into it. So a lot of metadata there,
and even empty files get pretty big pretty quickly. Because the processor can only
process 32 bits worth of information at a time, any given processor, we
need to move data to it frequently. And that's what the
caches are for, and that's why each one needs to be faster
and be able to get information to the processor pretty quickly. Because even though
the processor can only process four bytes or 32 bits worth
of information at any given time, it can do two to three
billion operations per second, and that's what gigahertz measures: one gigahertz is a billion cycles per second. And when a
processor's speed is quoted, it's sometimes said it's like 2.4
gigahertz or 2.6 gigahertz or so on. That means that the computer can do roughly
2.4 to 2.6 billion things per second. So again, 32 bits, not a lot
of information at any instant, but there's a lot of those
instants within a second. It can do two to three billion things
per second, each one of those things operating on exactly four bytes
at a time, 32 bits at a time, on a 32-bit processor, as opposed
to a 64-bit processor which can process a little bit more data. Let's take a look now
at what we term on your computer the
motherboard, or sort of the control center for everything
that your computer does, and highlight some of the
different pieces of where things live on your physical device. So right here are some
slots for RAM, so these are basically sticks that get plugged in. A RAM stick is just a green chip. It looks similar to the motherboard. They're usually green. They have some gold connector
pins at the bottom of them, and they plug into the motherboard. And information can then be stored
there and flow back and forth when needed by the processor and so on. So that's where these go. This particular motherboard
is from a computer that's about 15 years old. For example, I don't
think most of us have floppy drive connectors on our computers
anymore, but this one still does. Here is where the CPU
would live, so this is where the actual processor goes. And that processor again can only handle
32 or 64 bits worth of information at any given time. And on top of the CPU, it's not pictured
here, but typically on top of the CPU there's a giant fan, literally like
mounted or screwed right above it. And again, that's because the computer
is doing two to three billion things a second, so it gets quite hot. And to prevent a CPU
meltdown or a core meltdown, you want to make sure to
have air constantly flowing across the top of the
device as well as a heat sink to pull all the heat away from
the CPU such that it doesn't overheat, which would create quite a big
problem and eventually might result in computer breakage
if left to overheat for a prolonged period of time. Over here is a graphics processor. Graphics processors are
really just CPUs that are specialized to do certain operations
that make interpreting graphics on your monitor much easier. The math for those is usually
a bit more complicated, and so modern devices may have both a
CPU and a GPU, a Graphics Processing Unit, as opposed to relying
on just the CPU to handle all of those different things. And it similarly would have a heat
sink and a fan mounted with it as well. And then over here at the
top, it's pretty small. There are things called SATA connectors. SATA connectors are what you might use
to connect hard drives to your machine so that you can extend the
storage capacity of the device. But all of these things
might live on your computer, and also all of these things in shrunk
down form will live on your laptop and even in your mobile phone. This basic idea exists just
in smaller and smaller scales with all of the parts being
similarly scaled down. So again, CPU memory, what
actually lives in the CPU as well as the registers, those really fast
things right around the CPU memory, is the fastest memory on your machine. But there's the least of it. And the reason for this is
that it's very, very expensive. It is the most expensive
stuff in your computer. That is basically the
price that you are paying when you buy the computer
is for that processor and the materials that are
used to allow electricity to conduct through it
very quickly really determines the cost of the device. So there's the least amount of it,
but it is the most important memory on your machine. The caches, one, two, and three, are each
successively slower than CPU memory but also successively cheaper. So your L1 cache is going to be a
little bit slower than your CPU memory, but there will be a
little bit more of it. And your L1 cache will be a little
bit larger than the CPU space that you have, and it'll
be a little bit cheaper. The L2 cache may be a little
bit larger than the L1 cache and a little bit cheaper. Again, this is really just
referring to the materials that are used to make the memory operational. RAM is slower but cheaper. RAM typically used to
be the most expensive or be considered the driving cost. If you had more RAM in your
computer, that made it more powerful. That was the cost driver. This is becoming less and less the case. It's still more expensive
than hard disk space, which is effectively free at this point. It's really just how much
stuff we can literally fit into the container for the hard
disk itself, which is just pure storage. RAM is slower memory
than any of the caches, but you're able to have more of
it because it is less expensive. So that's memory. But in terms of hard disk space,
that does not work in the same way that RAM and the other
volatile memories work; hard disk space is non-volatile. Information in the hard disk
is not changed terribly often, only when we're certain that
we're done working with it in RAM. And the data there is also
persistent, and that's because it does not rely on
electricity to store state. And we're talking again
specifically now about hard disk drives; solid state drives behave
a little bit differently. They use microchips that
do some different things. But we're talking about hard disk
space, HDDs, traditional hard disks. Each cell of a hard disk is
instead controlled by magnetism, so data is stored magnetically. We'll just say for
purposes of this discussion here that if the magnetism is in a
down position, so south for example, it's oriented south, that would be zero. That's a way to represent zero. And any magnet that
is in the up position is one, so we can have these flipped states,
with the polarity pointing up or north and the polarity pointing down or
south, to represent one and zero, as opposed to using powered versus
unpowered to represent one and zero, respectively, in a RAM or
volatile memory situation. Because these magnets,
though, don't require power in order to work long term, that
means that when the computer shuts off and they become unpowered,
the data remains. And this is a really good thing, right? Because if every time
we shut off our computer we lost literally all of the
files we'd ever saved on it, that would not be very effective. We would lose a lot of the utility
that we rely on computers for. And so the way that hard disks work is
specifically designed such that memory can persist after the
computer is shut off. But again, that memory cannot be
processed directly in the hard disk. We have to move it to
the processor eventually. So if our system detects that
we need a chunk of memory from the hard disk, that's all
going to be moved from the hard disk to RAM using something called a bus. Much like a bus is used to move
human beings from one place to another in large
quantities, a bus is used to move data from one part of your
machine to another in large quantities. And in fact, if you ever see a SATA
connection from a hard drive to the motherboard using one of the SATA connectors
we saw a moment ago on the slide, there's usually a long, thin
strip that connects them together. That strip also forms
part of the bus that is used to transfer
data from the hard drive to the RAM in fairly large quantities. In general, when we're
working on a program, the data for that program including
the code that actually is running is moved from hard disk to RAM. And it stays in RAM, assuming
there's no space constraint that forces it to have to leave which
sometimes can happen if you're running a lot of programs at once. You may notice your computer
slows down quite a lot. That's because the
computer is going to have to keep swapping things
in and out of RAM in order to process multiple things. That's why you don't want to
leave several hundred tabs open, for example in your browser, or
have 20 or 30 programs running at once on your computer
if you can avoid it, because it's going to slow
down and require things to be swapped in and out
of RAM such that it can be moved to the processor quite a bit. That's really going to slow things down. While the program is running
or being used by the computer, everything will stay in RAM. All the data will keep
being manipulated there, and then ultimately when
we close the program or once we otherwise indicate
we haven't used it for some time and the computer realizes it needs
that space for something else, all of those bits and bytes
that have been manipulated in RAM will be sort of picked up and moved
back on the bus back to a hard disk where they will be resaved with the new
state, such that any changes that you make in a program will ultimately
be saved back to hard disk, but only once the program is completely
done being used by the computer and it realizes it can
free up that information and save it for long term storage. Hard drives, though,
are not unbreakable. They have a lot of moving pieces. A typical hard disk drive
consists of several platters, some thin metal circles spinning
around a central axis very rapidly, typically 5,400 or 7,200
revolutions per minute. So very, very quickly,
with a magnetic read write arm that extends across
the diameter of the disk, basically. And each one of the
little rings that gets formed as you do this, as the
read write arm moves in and out, it can access different
sectors on the disk, and those different
sectors are the things that get zeroed and oned over time. So it is possible for
hard drives to fail. There's usually a couple
ways that this happens. If the read write arm jams, because
it is on some sort of track that moves in and out, if it
jams without collapsing, your hard drive will just
stop working, basically, because you can't read or write
information anymore using that arm. But it is also possible for the
hard disk arm to break and fall. That arm hovers just above the top of
these disks, and if it crashes into them, you'll hear that sound. That'll be a very unique and
interesting sound to hear. Suffice it to say, your
hard drive at that point is destroyed, because the
collapse will crash everything, and these things are
spinning very, very quickly, and so they're going to shred
themselves from the inside. And you will no longer be able to
get any data off of that drive. But if it's just the arm that gets stuck
moving in and out but it doesn't fall down, you will still be able to
recover data from that hard drive, and we'll talk about that shortly. Because a hard drive failure does not
mean that the data is unrecoverable if the hard drive hasn't literally
suffered this catastrophic shredding sort of thing that happens. That's going to render it unusable. But if it's just the arm that
gets stuck, it is still usable. So what happens when we actually
delete something on our machine? It turns out that
overwriting hard disk space is actually a very, very
time consuming and what we might consider computationally
expensive operation for the machine. You could think about it as it has to
pull all of the data from the hard disk into RAM, change all of those bytes
to delete what was there before, and then put all of that data back. The computer for some
large files, say you want to delete a video
file like a movie, that might be several gigabytes, so
several billion bytes worth of data that we have to delete. The computer does not want
to incur that sort of cost. Deleting a file if it actually had to
do it that way would be very, very slow. It would compromise any other program
that you had running on your machine. And so that's not how computers
actually delete information. Rather, they just forget
where the data live. It turns out there's also
something called a file table, kept by the file system on your machine,
that is basically the home address of the first
byte of every single file that you have on your machine. And when you delete a file
typically in your computer, it just forgets where it lives. The bytes that made
it up are still there. The zeros and ones that comprise
that file don't go anywhere. They may eventually be overwritten
by some other file that happens to be stored in that same
spot, because the computer now thinks it's open because it
forgot that the file lived there. And even then, this only happens when
you empty your recycle bin or trash if you're using a Mac. If you just put something
in the recycle bin, that's not actually deleting it
in any meaningful way at all. It hides the icon. You can't really click
on that icon anymore, but you haven't deleted
that file, and you probably know this because you can restore
things from the recycle bin. But even when you empty the recycle
bin or empty the trash on your machine, you're still not actually
deleting anything in the sense that you might be thinking
is how we delete things. Instead, your computer's just
forgetting what was there before. But those bits and bytes that comprise
those files that you have deleted are still there, and that creates a
couple of really interesting security implications. So files that get deleted
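The idea of deleting by forgetting can be sketched with a toy model. This is only an illustration under simplifying assumptions, not how any real file system is built; the `ToyDisk` class, its `index` dictionary, and the file name are hypothetical names made up for this example.

```python
class ToyDisk:
    """A toy disk: a flat run of bytes plus an index of where each file lives."""

    def __init__(self, size):
        self.blocks = bytearray(size)  # the raw bytes on the "platter"
        self.index = {}                # file name -> (start offset, length)
        self.next_free = 0

    def write(self, name, data):
        start = self.next_free
        self.blocks[start:start + len(data)] = data
        self.index[name] = (start, len(data))
        self.next_free = start + len(data)

    def delete(self, name):
        # Only the index entry goes away; the bytes are left where they were.
        del self.index[name]


disk = ToyDisk(64)
disk.write("memo.txt", b"privileged client memo")
disk.delete("memo.txt")

file_is_listed = "memo.txt" in disk.index
bytes_still_on_disk = b"privileged client memo" in bytes(disk.blocks)
print(file_is_listed, bytes_still_on_disk)
```

After the delete, the name is gone from the index, but a scan of the raw bytes still finds the memo.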
aren't really deleted, which means that we can recover the
information from them if we need to. How exactly might we do that? Well, there's definitely some tools
out there that can be used to do this. And again, this requires
that the hard drive was not physically destroyed in some way by
the collapse of the read write arm. But we can literally just connect
the hard drive to something and have a specialized tool that reads over
all of those individual sectors on the disk-- and this is a
very slow operation for sure-- read over all of the individual
sectors on that disk and just say, well, this is a zero and
this is a one and this is a zero and this is a one until we end
up with this huge file that is all the zeros and ones that
comprised what was originally the state of that hard drive. And we usually refer to
this file that gets created, this clone of the hard drive,
as a forensic image. It's really just a huge file
that is a complete replication of the bit by bit content
as well as any metadata that might be associated with
it that can be then created and read on a different computer so
that even though the hard drive this was plugged into, maybe the
computer got destroyed, where we can make a copy of it and
read it on a different machine instead. So we go from this to how do people
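A rough sketch of what making such an image involves is below. It copies a source byte for byte into an image file. The function name `make_image` and the file names are hypothetical, an ordinary temporary file stands in for the drive, and the SHA-256 check is an extra verification step added here for illustration rather than something described in the lecture.

```python
import hashlib
import os
import tempfile


def make_image(source_path, image_path, chunk_size=4096):
    """Copy source to image chunk by chunk; return the SHA-256 of the bytes read."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


# Demo: a temporary file of random bytes plays the role of the drive.
with tempfile.TemporaryDirectory() as tmp:
    drive = os.path.join(tmp, "drive.bin")
    image = os.path.join(tmp, "drive.img")
    with open(drive, "wb") as f:
        f.write(os.urandom(10000))
    image_hash = make_image(drive, image)
    with open(image, "rb") as f:
        copy_hash = hashlib.sha256(f.read()).hexdigest()
    hashes_match = image_hash == copy_hash
```

Comparing the two hashes is one way to confirm that the copy matches what was read from the source.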
pick out what those files were? Again, computers only
understand zeros and ones and at the end of the
day, all of the stuff that is stored in your hard
drive, all those files, anything that was stored
in RAM when it was powered, is still just zeros and ones. They don't have icons like
we see on our desktop. They don't mean anything intuitively. So how do we figure out
what those files are? Well, it turns out that many of
them have what is called a signature or a magic number associated with them. A magic number is just a way to refer
to the first few bytes of a file where many file types, for example,
PDFs, most image files, most music file types and so on, happen to
start in a particular way. This isn't a way that we ever see
when we open one of these files. But in the metadata at the
beginning of those files, there's usually a sequence
of bytes that represent a signature in effect of saying, the
file that I'm about to open is a PDF, and you can generally rely on that
because these first four bytes or whatever are these values. Now again, it's four
to eight bytes, which means there are two to the 32 to two
to the 64 possibilities for what these first bits are. That's a lot of different combinations. And so if we see a magic number
randomly appear in some forensic image or on some hard drive,
the odds are pretty good that if we see that pattern,
we know that that pattern generally refers to a file of that
type, that what we have found is the beginning of a
file of exactly that type. And we can start to
interpret it in that way maybe and maybe be able to
reconstruct something from it. So for example, it turns out that
most PDFs have in their metadata-- and we never really see this-- the characters percent PDF
at the beginning of them. And that translates into this
sequence of bits using the ASCII table that we've talked
about before, and we don't need to get into a lot
of detail, and it translates into these hexadecimal values. And so generally, if we happen to
encounter exactly this pattern of 32 bits, which we should only expect
to see at the beginning of a PDF or otherwise, purely by chance, about one time
in two to the 32nd-- like it's pretty uncommon
to see exactly this pattern and we're looking for
exactly that pattern. If we see those bits,
generally what we can do is start to interpret the
rest of this file as a PDF until we encounter some signature
that we've reached the end of that. Whether that's a whole bunch of
zeros or whether that's a signature that is again perhaps
the start of another PDF. Now, of course it's possible that
you'll end up with a false positive. For example, anybody who's
examining these slides at some point in the future--
say that my hard drive crashed and I happen to literally
have the characters percent PDF typed on to this slide. If you were to forensically recover
my hard drive and analyze it and you found this PowerPoint file that
is where I'm presenting the slides from and you saw literally the characters
percent PDF in it as zeros and ones, you might mistakenly think,
this happens to be a PDF and start to interpret
from this point forward, this yellow point forward as a PDF. But it wouldn't work. And that's OK. You might get a false
positive sometimes, and then you just kind of
disregard it and you keep looking. You look for a different type of file. You look for a different
file signature and so on. But it can happen that you
have a false positive like this in situations where you're
trying to sort it out, because you have no other context clues. All you have are the
bits and the information that you know about file signatures. OK, so we have this empty trash
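The signature search described above can be sketched in a few lines. This is a minimal illustration, not a real recovery tool: the `image` bytes are made up for the example, and `find_signatures` is a hypothetical helper name. It also reproduces the false positive from the slide example.

```python
PDF_MAGIC = b"%PDF"  # the four ASCII characters percent, P, D, F


def find_signatures(image, magic=PDF_MAGIC):
    """Return every byte offset in image where the magic number appears."""
    offsets = []
    position = image.find(magic)
    while position != -1:
        offsets.append(position)
        position = image.find(magic, position + 1)
    return offsets


# A made-up image: filler, a PDF header, more filler,
# then slide text that happens to contain the same characters.
image = b"\x00" * 16 + b"%PDF-1.4 ..." + b"\x00" * 8 + b"slide text: %PDF"

hits = find_signatures(image)
print(PDF_MAGIC.hex(), hits)  # hex form of the signature, then the offsets found
```

Both offsets match the signature, but only the first one is the start of a PDF. The second is the kind of false positive you find and then disregard.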
or empty recycle bin icon or menu option on our computers. But now we know it doesn't
actually empty the trash at all. So how do we actually delete
files from our hard drives as opposed to just having
our hard drives forget or our systems forget where on
the hard drive that file lived? We probably want to do that at
some point, get rid of the data on our machines. How exactly can we go about that? Well, there's actually relatively few
ways to actually delete this data. The first of which we've
already kind of discussed, which is physically
destroying the hard drive. There are services out there that
will shred your hard drives for you. If your read write arm
breaks in a catastrophic way, your read write arm will shred
the device for you itself. That's one way to ensure that
your data is protected or deleted is to make it absolutely
impossible to recover information from it by physical destruction. You can use a tool called a
degausser. A degausser is really just a very strong magnet that you hold
over the device for a period of time. It will also usually cause
some sort of physical damage, because it's also going to
mess up some of the metal that is inside the machine
that is not storing data but is just structural metal. So usually a degausser will
not only wipe out information by setting all of the bits, flipping
the polarity of all the bits from south to north or something like
that, but it will also usually cause some sort of
mechanical wear just based on the strength of that magnet. But then we have this
thing Secure Empty Trash. We saw this in the menu a second ago. What do you think Secure
Empty Trash might do? Well, one thing that you
might think is that it would overwrite the data with random
bits, and you would be correct. That's what Secure Empty Trash does. So instead of just deleting
information from the hard drive by forgetting where it lives,
instead we actually go to that spot. And instead of writing
all zeros or all ones, we just write random bits over it. But it turns out that this
is actually not good enough to delete information on a single pass. But a single pass is actually
what Secure Empty Trash does. It only makes one pass through,
randomly setting each bit of that file to a one or a zero. But it turns out, and the physics
of this is a little bit beyond me, but it turns out that when the
polarity of a magnet on a hard drive is flipped from zero to one, there's
actually sort of this lingering halo effect that it leaves behind so that
you can tell that this bit is a one now, but it used to be a zero. And that effect lingers
for a little while. But if you keep changing it
multiple times over and over, eventually that effect gets lost. So you can tell what bits-- imagine every bit was a one after
you make one pass through it. All of those things that were ones
before, their polarity didn't flip. There's no halo effect. But everything that used
to be zero and is now a one has this slight signature left behind
that says, this used to be a zero. And a good forensic analyst is
able to take a look at that. As it reads, it can read
the polarity of the magnet and see that it's slightly not exactly
zero and not exactly one and say, OK. Well this bit probably
used to be the opposite. And so even making one random
pass across a hard drive is not enough to definitely
securely erase the data on it. You actually have to
make several passes, and seven passes is considered
the industry standard to make sure that enough randomness has
affected each of the individual magnets such that you can't tell
what was there before. So to truly securely erase the hard
drive and preserve it in a state where you can actually use
it, you need to use-- and there are software
tools that do this-- a tool that will overwrite
the drive randomly multiple times to eliminate any
of that lingering halo effect. But Secure Empty Trash does not do that. It only makes a single
pass over the drive. So enough to cover it
up for undiscerning eyes,
and work with this kind of data regularly might still be able to
figure out what the original data was if just a single pass is made. So why is this important? Well, there's two reasons. One, as attorneys, we want to make
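A multiple-pass overwrite of a single file might look something like the sketch below. The function name `overwrite_and_delete` is hypothetical, and the default of seven passes simply follows the figure mentioned above. One assumption matters a lot: this only illustrates the idea for a magnetic disk where writes land on the same physical spot. On solid state drives and on some file systems, an in-place overwrite is not guaranteed to touch the original bytes.

```python
import os
import tempfile


def overwrite_and_delete(path, passes=7, chunk_size=65536):
    """Overwrite a file in place with random bytes several times, then remove it."""
    length = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = length
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))  # random bits, not all zeros or all ones
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass out to the device
    os.remove(path)


# Demo on a throwaway temporary file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"confidential" * 1000)
overwrite_and_delete(demo_path)
still_exists = os.path.exists(demo_path)
```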
sure that we are doing everything we can to protect our clients' data. And also as we're working with those who
may be less technically inclined, it's important for us as part of our
competent representation of clients to inform them about what we can about
the technology implications of some of the things they do
from a legal perspective. And so if you're working in a large firm
environment or as an in-house counsel, it's probably not going to
fall to you as an attorney to develop some sort of protocol
for establishing the best practices for working with client data. But it is really useful to
understand what these protocols are and how you might be able to
contribute to a conversation about making these
protocols more robust. Here are some basic strategies
that you can use as an attorney to protect your own client
data but also to advise clients so that they can protect their
data for their clients and so on. So the first one is quite easy, and
that is to encrypt your hard drive. So we talked about
encryption previously, but you can also encrypt
your own hard drive such that when your computer turns
on, you need to enter a password. It's again similar to this
public private key idea that we've previously discussed. You need to type in this password
in order for your entire hard drive to be unencrypted such that you
can then read the data on it. Most operating systems
now provide tools that are built into the operating system
itself so that you can do this. So there's really no
excuse not to do it. It is a very easy,
straightforward and simple way to take a pretty strong step at
protecting the data on your machine easily. Again, this usually requires a password. Typically it'll be after you turn
your computer on before the operating system itself loads, the operating
system being one of the few things that is not encrypted such that
it can then open the files and unencrypt everything and so on. But it will not proceed past
the operating system load point until that password is provided. But do be careful, because
some of these systems, particularly the more advanced
ones, after a certain number of incorrect guesses will begin to
securely wipe your hard drive using multiple passes of zeros and ones. And so if you think there's a danger
that you might forget your master password so to speak for
this hard drive encryption, you might want to keep something
somewhere to remind you. I wouldn't recommend like sticking
a sticky note on the monitor or anything like that,
but have some sort of way to remember that password in the
event that you might forget it, because you might lose data if you
guess wrong too many times depending on which hard drive
encryption tool you are using. Another relatively easy
thing to do is to avoid using insecure wireless networks. These are generally
not as common anymore. Most people have wireless
networks that require a password, and usually wireless networks
that require a password will then have encryption for that
individual making the connection on the system on the network. But unsecured networks
do provide opportunities for those listening using tools that
are called packet sniffers, which are literally just
listening and gathering data on all of the
packets of information that are being transmitted over
the internet in the vicinity of the unsecured wireless network. And so you might see-- this is a
screenshot of a tool called Wireshark, and it's a little blurry. There's not a lot of
relevant information here. But on an unsecured network,
it is possible to read all of the bytes and bits
that are flowing through, translate them into
their ASCII equivalents,
is providing a username and password and an action logging in. And so anybody who is able to then
take this information and see what IP address it came from-- and we'll talk
about IP addresses shortly as well-- or where it was going
to might be able to use that data to log in as that
person, which would definitely not be a good thing at all. One way to get around this if
you find yourself in a situation where you need to connect
to the internet to do work or for whatever reason you need
to be connected to the internet even if you're not sure about
the quality of the network is to rely on private or
work provided VPN services. VPN is a virtual private
network, and it provides a way to connect to a trusted encrypted
network, have that network act as you, effectively providing encryption
services for your web traffic even if you're not sure that your
traffic itself is encrypted. So VPNs are available at most
businesses or also available online. Relatively inexpensively,
you can buy tools that would allow you to make use
of a virtual private network. Password managers. Password managers are great. Honestly, I can tell you that I
don't know most of the passwords that I use on a daily basis because
I rely on a password manager. There are several services out there-- LastPass, 1Password, and others. Basically, the idea is the tool
will generate passwords for you. You only have to remember
the master password, the one password that you can
use to unlock everything to open the password manager itself. And then once you're logged
into the password manager, you just direct it to log in on
your behalf to different services. You usually tell it this is
the URL I'd like you to go to, this is the username to use, and
then the secretly generated password that you don't generally know is
stored in the password manager itself. Some of these tools are
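To give a sense of what a generated password is, here is a minimal sketch using Python's standard `secrets` module. It is not how any particular password manager works; the `generate_password` name, the character set, and the default length of 20 are all assumptions chosen for illustration.

```python
import secrets
import string

# Letters, digits, and punctuation: an assumed character set for this example.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Pick each character independently from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_password()
print(len(password))
```

The result is a string you would never want to memorize, which is the point of letting the tool store it for you.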
local to your machine. More often than not, they
are starting to migrate to be cloud based services, which does
introduce another interesting question of do you trust your data to be
stored on the cloud as opposed to being stored on your device? And that's really a
question that you should consider when you're thinking
about using one of these tools. Most of these tools also have
an excellent secondary effect, which is that they often provide
two factor authentication support. And two factor
authentication is something that we will talk about shortly as
well, but it is usually something that you know, like a
password or something that the password manager knows, and
something you have like your cell phone, for example, that might be
getting a text message with a code that you're
supposed to enter as well. And the idea is that an adversary who
is trying to hack into your account probably may know your password
but won't have your phone, or may have your phone because they
took it but won't know your password. And so these two factors are designed
to preempt basic hacking attempts. But as I mentioned,
these tools are great, but you should be skeptical
of them, particularly if they are cloud based, because it
is possible for bad things to happen. So for example, not too long
ago, a few million users of the password manager Blur had
information that was leaked online. None of this information was
actually their passwords. It was more customer related
information, sort of ancillary this is their email address
and some other stuff. But it hits a little close to home. And so again, always be
skeptical when thinking about your own data
and your clients' data. But these tools are
generally more good than bad. But again, the decision of
whether to use these tools really does ultimately fall to you
having done research into them, seeing whether or not
they make sense for you, whether you want to take advantage
of the advantages that they offer. If you're not going to
use a password manager, you should at least be sure
to use complex passwords and certainly make sure to avoid using
the same password for multiple services unless it's like a throw away
password that you use on things that you don't care about. But you want to definitely
avoid using the same password on important services. So like your Gmail account or any
client log in related information that you have, or anything banking. You want to use different
passwords for all of those things. Passwords that have less
than eight characters or less than or equal
to eight characters, you should effectively consider
have been broken and hacked already. Those are not secure. Computers are definitely powerful enough
nowadays that such a password can be brute forced in a relatively short amount of time. We're still talking maybe days here
for an eight character password, but that is not that much of an effort. Passwords should be at least
12 characters now for sure. You should definitely have a mix of
uppercase, lowercase letters, numbers, symbols, anything like that. But anything that is less than
or equal to eight characters should definitely be considered
to be effectively hacked already. And if it hasn't been
hacked already, certainly it is capable of being hacked
very easily by anybody who wants to put in the effort to do so. You should also change your
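The arithmetic behind that advice can be sketched as follows. Two numbers here are assumptions rather than anything from the lecture: an alphabet of 94 printable characters (26 uppercase, 26 lowercase, 10 digits, 32 symbols) and an attacker who can try ten billion guesses per second. Real guessing rates vary enormously with hardware and with how the passwords are stored.

```python
ALPHABET_SIZE = 94             # assumed: upper, lower, digits, symbols
GUESSES_PER_SECOND = 10 ** 10  # assumed attacker speed, for illustration only


def seconds_to_exhaust(length):
    """Time to try every possible password of the given length."""
    return ALPHABET_SIZE ** length / GUESSES_PER_SECOND


days_for_8 = seconds_to_exhaust(8) / 86_400
years_for_12 = seconds_to_exhaust(12) / (86_400 * 365)
print(f"8 characters: about {days_for_8:.0f} days")
print(f"12 characters: about {years_for_12:.2e} years")
```

Under these assumptions an eight character password falls in roughly a week, while twelve characters pushes the same exhaustive search past a million years.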
passwords as frequently as you can. For example, I have a bank that requires
me to change my password every 90 days in order to continue to use
their online banking services. And on the one hand, yes, you
may find that kind of annoying. But on the other hand, it's good to keep
things changing so that you're never having a password get stale and
potentially then leaving it vulnerable, especially if it's the
password that you may have used on multiple services in the past. It's a good thing to
keep in mind, especially if you don't have that
many passwords that you need to maintain to change them
as frequently as you're able to. Creating backups. Creating backups of
information is really important, because sometimes things
will go wrong that you don't expect, like maybe your hard drive
will suffer some sort of catastrophic mechanical failure
and you wouldn't otherwise have a way to get that information back. So periodically backing
your data up protects you in the event of hardware
failure or in the event of some sort of ransomware
attack where an adversary breaks into your network, your
office's network for example, and doesn't take any data away but
encrypts it using their own public and private key such that
there's no way for you to read that information
until you usually pay them some ransom, which is usually
money or something like that or bitcoin or the like. So you should back your
data up pretty regularly. You can back it up in the cloud using
cloud based document storage services. You can also just back it up on
paper in certain situations as well. But definitely back it up to
non network connected machines, so a computer that you have that
is never connected to the internet and is primarily used just for
its hard drive space, basically. Or to flash drives or CD-ROMs if
you're still using that technology. Just have some offline way to
access important data in the event that something goes
really, really wrong. Also, have an archival plan for data. You don't need to keep
data around forever. We oftentimes think that because
we're living in this digital age that everything we do persists
forever and needs to persist forever and is tracked. But that's not entirely
true, particularly if we are proactive in doing our
part to archive or delete data when we no longer need it. Particularly when you're
considering client data, it is important to develop
a consistent plan for when you are done working with that data. So for example, it may be the case that
in your firm after three years of no longer having any matters
related to that client, it is just your office's policy
to delete that client's data. And that might mean
transferring other data that might be on a shared
disk with them off of it and literally going through the
process of either destroying the drive or doing the multiple passes over the
drive using zeros and ones randomly just to obscure that data, because
having that policy of not keeping things forever generally protects you,
and protects your clients if that data is no longer needed. Also, make talking about
data security a priority. I know it's not exactly
the buzziest conversation to have around the water
cooler, but a lot of people are not as thoughtful about technology
as you may be taking this course. And it may be a shock to
them to realize that when they delete a file on their machine,
it doesn't actually do anything, basically. It just forgets that information,
but that information still lives on. You don't have to be a tech
expert to educate others. Particularly as someone who's coming
into it with maybe a bit more of a leg up in understanding technology,
speaking to individuals who may not know anything about
what this technology is, you can really do yourself and your
colleagues and your clients a service by making this part of
a typical conversation. Share your knowledge with others
in your office and in your field. And finally, think about
establishing a compliance protocol. A lot of these things
that I've just described are very, very easy to
set up at the outset. It is not difficult to say, I'm
going to change all my passwords, and I'm going to use
this password manager, and I'm going to write this policy
for deleting information and archiving information periodically. The problem is that it becomes over
time something that we forget to do. And having regular
periods of having someone designated to make sure that
these policies are being followed is really important, as we'll see
shortly when we talk about some of the ABA ethical requirements for
lawyers dealing with technology. You want to make sure that if you
establish some of these ground rules for working with data, that you continue
to follow these rules as you work with this data for the months
and years and so on going forward as opposed to just doing it
once and forgetting about it. Because technology is not static. It's going to continue
to advance, and we need to stay ahead of that as attorneys. It's part of our obligation to
really understand this technology, stay current with any changes, and adapt
and change our policies accordingly so that we're always staying as close
to the cutting edge as we possibly can. I really encourage you to
volunteer with the compliance team. You may have a compliance
team, particularly if you are at a large
office or in-house counsel setting, who is tasked with developing
these technological policies. And even if you don't feel like you
want to advise on new avenues to pursue or new policies to initiate, you still
should be part of that conversation. You do bring something valuable to the
conversation just having the knowledge that you have from a course like this
and should be part of this conversation so that you can contribute to
it more in the future as well. I'd like to conclude our
discussion today about security by drawing your attention to two
really important ABA ethical decisions that relate to lawyers
and technology and what lawyers should do in the event
of a data breach at their office. And let's start by taking a look
at formal opinion 477R which was released by the ABA in May of 2017. This opinion deals with attorneys'
obligations with respect to technical know how. So it is now considered part
of competent representation for an attorney to be considerate of
the technological implications of what they do in their office. What does it mean to store documents? What does it mean to secure
communications with clients? It is incumbent upon us as lawyers
to stay abreast of these developments and really be informed
about them and inform our clients about the
ramifications of some of these new technological advancements. It also formalizes the
requirement of offices and firms to have a compliance protocol. What do you do when you
receive client data? Now, this opinion came out in 2017. It replaced an earlier ABA opinion from
1999, which stated
that all communications, including unsecured unencrypted
email, were generally considered quote unquote secured. Obviously, I think we can agree
that is not the case anymore and certainly the ABA agrees
that is not the case anymore. That's because we've transitioned from
a time when a lot of lawyerly work was done not using the
internet, not using emails. It was done using fax
and paper and so on. And now we've transitioned
to a mostly electronic way of providing legal
services to our clients, and so our technological rules
of our self-governing ethics need to evolve to account for that. It also brings up a very interesting
question which is something just to think about going forward or discuss
with others in your group of how do you reconcile a situation where
you have a client who doesn't want to use secured communications
or doesn't want to secure their data
in working with you? How does that square with
your job or your requirement as an attorney to ethically
abide by this opinion and be mindful and guard clients
against technological mistakes? Is it possible to provide competent
representation to a client if they are unwilling to adhere to
your firm's compliance protocol? It's a really interesting question
that I don't have an answer to but provokes an interesting
discussion about what does it mean for us to have client
intake and work with clients, and what happens when
the client's wishes run against our ethical obligations? That's not a novel question to lawyers. That presents itself in different
ways, but via technology, do we have yet another way we might
have to consider this dilemma? Subsequent to 477R, a year and
a half later in October of 2018, the ABA issued formal
opinion 483, which kind of is the natural follow on to
477R, which deals with what happens if a lawyer's
information is breached? If there is a data breach at the
firm and client data is compromised, what do you have to do? One important thing to think about
here is that this opinion formalizes the notion that has sort of long
been held in technological circles that there are two kinds
of businesses that exist-- ones that have been hacked,
and ones that will be. Not ones that might be or
not ones that could be. And perhaps even these are ones that
have been and they don't know it yet. But it's just such a
part of life nowadays that businesses either have
been hacked or will be hacked, and that is the mindset
that you should have when you are thinking about protecting
client data, bringing in consultants, and hiring people to do their best
work to defend your clients' data. Now, it turns out that law firms tend
to be excellent targets for hackers, and the reason for that is that they
have a lot of very valuable data. And unfortunately, the history is such
that it is not always as well protected by law firms as it might have
been by the clients themselves, because we as lawyers
have not been as equipped to have a conversation about technology
and how that technology might affect our representation of clients. The opinion describes a bunch of
different cyber episodes, so to speak, that might comprise a data breach,
which would rise to the level of needing to report to a client. These include things such
as ransomware attacks, as we've discussed a
little bit earlier today, systems attacks that might
break or somehow damage the infrastructure of
the firm or workplace, as well as exfiltrations,
which are probably the worst kind of breach, which is
someone hacks into your system and is able to remove data such that you
may not even have a copy of that data anymore, and that's why having
backups is so important, but removes that data from your servers, for
example, to the adversary's servers. There is no ethical
violation in being hacked. It's really important
to make that very clear. The ethical violation occurs when
reasonable efforts are not made
to protect that data. If we as attorneys are making reasonable
efforts to protect our clients' data and we still get hacked, we have
not necessarily done anything wrong, as long as we were doing our
best to prevent that from happening in the
first place and, once we detect that it has happened, we
make every reasonable effort to stop an ongoing attack
from continuing. This also introduces a
very interesting question of what to do with former client
data that has been hacked, and that's why it's really
important to establish some sort of archival or deletion
plan for working with that data. The ABA proposes a
couple of different ways to handle informing a
former client about a hack. But one of the most important things
to draw from this opinion, I would say, is that discussion about data
retention needs to be part of your firm's intake
process or your intake process for dealing with new clients. Who owns what has always sort of
been part of the conversation. Generally, as we know, we
return client data to them when we are done working with it. How does this work in a digital context? It is really important
for your intake plan at your firm to handle what happens
to digital versions of client data when the representation has concluded
because the matter has concluded. Speaking of concluded, that is going
to wrap up our discussion today on security. This will be the first
of our two discussions generally at length about
security in the legal context. But hopefully you've
come away from today with a better understanding of how
your system works, what memory is, and why, when we delete
things on our hard drives, they don't actually get deleted,
and what some of the ramifications of that might be. And hopefully you also
have come away from this with an understanding of what
to do going forward in establishing best practices for
working with client data to stay within the ethical
guidelines proposed by the ABA, and just to generally have a more
technical conversation with clients about your representation of them
and what happens to their data when that representation has concluded.