Transcript for:
Probability and Counting Lecture Overview

OK, so as far as clarifications and hints and comments and so on, on the homework, these are kind of general comments. One is don't lose your common sense. That doesn't mean you can rely only on common sense, because we're going to see over and over again in this course a lot of counterintuitive results that may seem to defy common sense.

But just because we're doing some counterintuitive stuff sometimes doesn't mean you should abandon common sense. And I mean this not only in terms of whether your answers make sense, but also just in terms of being reasonable. For example, a couple of you asked about calculators and things like that. On the homework, you can use calculators if you want, but for the most part it shouldn't be necessary.

Once in a while, it'll be obvious that you should use a calculator for some part. On the exams, no calculators are allowed, but also no calculators are needed. So whether on the homework or on the exam, you should use some common sense.

For example, if you have 52 choose 5, it's perfectly fine to leave it as 52 choose 5. Certainly on the exam, that's common sense: I don't expect you to compute 52 choose 5 by hand. On the homework, if you're curious what the number is, you can compute it very easily using a calculator or a computer. But you can also leave it as 52 choose 5, which has the virtue of being what I call self-annotating.

And 52 choose 5 is some big number, right? If I just gave you that big number, it would be hard to know what it was. But 52 choose 5 immediately makes you think, oh, this has something to do with choosing 5 out of 52. So it's self-annotating, which is good, and you could leave it that way.

But common sense also says that if you have 4 divided by (2 times 1), I would prefer, either on the homework or on the exam, that you simplify it to 2. Occasionally, you may need to add fractions. I'm assuming you can do 1 half plus 1 third without a calculator. But you're never going to have ugly, tedious calculations on the exam, certainly.

And if it's tedious on the homework, then you could use a calculator for that. And that's also a hint. If on the midterm or final you find yourself doing all these massive calculations with huge fractions and things like that, then there's probably something wrong.
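If you are curious what 52 choose 5 actually is, a computer is the easy route. Here is a minimal sketch in Python using only the standard library's math.comb and fractions.Fraction; the variable names are just for illustration, and nothing in the course requires this.

```python
from fractions import Fraction
from math import comb

# 52 choose 5: the number of 5-card hands from a 52-card deck.
# Writing comb(52, 5) keeps it self-annotating; evaluating it is optional.
hands = comb(52, 5)
print(hands)  # 2598960

# Small fractions such as 1/2 + 1/3 are easy by hand; Fraction keeps them exact.
total = Fraction(1, 2) + Fraction(1, 3)
print(total)  # 5/6
```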

OK. So that's just a clarification about that. Second, and this is useful throughout the course: do check answers. I'll talk more about checking answers at different points. But just briefly, checking answers does not mean going back through what you did and saying that looks okay.

That's just doing the same thing twice, right? If you made a mistake the first time, you're probably not going to detect it when you look through it again. Checking answers means trying special cases, thinking of another approach, things like that. A lot of problems can be solved by more than one method, and if you get the same answer, you're happy.

If you get a different answer, then you have to think harder, and you'll learn something from that. The best advice here is to try simple and extreme cases. You might have an answer in terms of n and k and q and w, and if you plug in n equals 1 and it's completely obviously wrong, then that's useful information, right?

But most students don't bother to just plug in n equals 1 and see if it makes sense. So: simple and extreme cases. The third thing, which I've never seen given a name or any emphasis at all, is very, very useful: label.

Let me tell you what I mean by labeling people, objects, whatever. A lot of the homework problems, and a lot of problems in this course, will say, well, you have a group of n people, or a group of n elk or gummy bears or something, right?

And the problem asks about this collection of objects or people or animals. All I mean by labeling is this: if the problem just says we have n people, blah, blah, blah, then assume that they're labeled with the numbers 1 up to n. Very, very simple, okay? But it's extremely useful.

So basically, you have this collection of people. Now, in the problem, I'm not going to give names to all of these n people, and n could be a billion or anything, right? But assume that each of them has an ID number, like a social security number or whatever. So you definitely want to do that with the people.

It's going to be helpful with the elk, too. And you have a homework problem about balls in a jar, right? There's a certain number of red balls and a certain number of green balls. You can choose your own notation.

But it's very useful, for example, to say: let's number or label the red balls 1 through R and the green balls R plus 1 through R plus G. You could do something like that. Then you're referring to specific balls. Now, this might seem very obvious, but I'm stressing it because I've very often seen students get confused because they're not thinking in terms of this labeling.

And for people, it's pretty clear, right? Even if you had identical twins, you can still tell them apart in some way, right? OK, so perfect sense with people.

People start to get more confused when we talk about balls, and the question comes up of whether they're indistinguishable or distinguishable. So suppose you had 10 green balls in a jar and they all look completely identical to you. I'm not saying that numbers are actually written on them; they may look completely identical to you.

But the point is that as far as probability is concerned, as far as nature is concerned, it behaves as if they are distinguishable and labeled. And that's going to give you the correct answers. Whereas if you just say, well, they're completely identical, so they're indistinguishable, you'll run into trouble. So it helps to think about the labeling. Okay, and I think that is also relevant for the robberies and districts.

There's six districts, six robberies. I think the best way to think about it is to imagine numbering the districts, one to six, number the robberies, one to six, and think about it that way. So someone asked me whether robberies are distinguishable or indistinguishable. Well, the problem doesn't designate this robbery was this type and this was this type.

They may look very similar, but they're different robberies, right? They didn't occur at exactly the same time, exactly the same circumstances. There's something that distinguishes them.

You may as well give each one an ID number, as if you filed a report on each one. Call them robbery one through six and think about it that way; that will work better. We'll come back to this issue of indistinguishability later, cuz it's actually quite subtle.

And just a quick clarification about the homework problem where you split people into teams. I think it's clearly worded, but just to make sure everyone understands: suppose you have ten people and you want to split them into a team of four and a team of six.

Let me just do that as a quick example, cuz there are actually some interesting things here. It's a bit of a hint, but it's also useful. So suppose we have 10 people, and I want to split them into a team of six and a team of four. How many ways are there to do that? Well, it's just 10 choose 4, because once I pick the team of four, whoever's left is the team of six.

That's it, right? Of course, I could have said pick the team of six and whoever's left is the team of four. So I could have said 10 choose 6. We actually just proved that 10 choose 4 equals 10 choose 6, right? Because I just counted the same thing in two different ways.

They must be the same. You can check this with factorials: both are 10 factorial divided by (6 factorial times 4 factorial).

So it's true, but we proved it just by thinking about it. On the other hand, suppose we wanted two teams of five. Now if we just do 10 choose 5, we're gonna be off, okay? Because I said pick 5 for one team and the remaining 5 for the other team.

But if I had picked the remaining 5 first, that would have given the same split. It's not like I said that there's a team A and a team B with some difference between them. It's just two teams.

So again, it helps to label them. Assume the people are numbered 1 through 10. If the teams are 1 through 5 and 6 through 10, then choosing people 1 through 5 and choosing people 6 through 10 both give that same split, so it counts as one way, not two. I'm not designating any distinction between the two teams. Therefore, in this case, it would be 10 choose 5 divided by 2, because we've double counted, right?

Does everyone understand the difference between these two things? Here you're dividing by 2, and there you're not, because there is a clear difference between a team of 4 and a team of 6, right? With two teams of 5, unless you said, well, one team is supposed to wear this jersey and the other one wears that jersey, it's equivalent. OK, so that's a key, sometimes subtle distinction that often gets missed. So I want to emphasize that a little bit.
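Because the people are labeled, both counts can be checked by brute force. This is a sketch in Python, assuming we label the people 1 through 10. It lists every split and lets a set discard duplicates; for two equal-size, interchangeable teams, each split is stored as an unordered pair of teams, so choosing {1..5} and choosing {6..10} collapse into one entry.

```python
from itertools import combinations
from math import comb

people = range(1, 11)  # label the 10 people 1 through 10
everyone = frozenset(people)

# Team of 4 vs. team of 6: the teams differ in size, so choosing the
# team of 4 determines the whole split.
splits_4_6 = {frozenset(team4) for team4 in combinations(people, 4)}
print(len(splits_4_6), comb(10, 4), comb(10, 6))  # 210 210 210

# Two teams of 5 with no distinction between them: store each split as an
# unordered pair {team, other team}, so each split is counted once.
splits_5_5 = set()
for team in combinations(people, 5):
    first = frozenset(team)
    splits_5_5.add(frozenset([first, everyone - first]))
print(len(splits_5_5), comb(10, 5) // 2)  # 126 126
```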

All right, so there's a difference there. So you should think carefully about issues like that. It's a little too simplistic to say order matters or order doesn't matter. Well, what matters is thinking in a way that makes sense.

And for this homework, the naive definition of probability is enough for all the probability questions, since that's all the probability we've done so far. But that doesn't mean you can just naively apply the naive definition of probability, right? The naive definition assumes you've broken up your problem into equally likely outcomes. If you break up the problem in a way where the outcomes are clearly not equally likely, then the naive definition will not work. So what I'm saying is that for every probability question on the homework, if you frame it in the right way, you can apply the naive definition.

But you have to think hard to make sure it makes sense to assume equally likely outcomes. Does that make sense? Okay, so, coming back to the sampling table: I drew this two by two table last time. I won't draw the whole table again, but you can look it up in your notes to refresh your memory.

The table had sampling with and without replacement, and order matters or doesn't matter. Last time we talked about the fact that three of the four entries of the table are obvious just from the multiplication rule, so you can fill them in right away once you've mastered it. The fourth entry is the tricky one, so that's the one I wanna talk about now.

So for the fourth one, I stated the result, I wrote it down, but it's mysterious where it comes from. So I wanna show you where does that come from, okay? So the problem is, we wanna pick k times, from a set of n objects, could be objects, people, whatever. And we're looking at the case where order doesn't matter and it's with replacement.

That's the case we're interested in. So we pick one of the n objects, put it back, pick another one, put it back, do that k times. But we also don't care if we got that same list in a different order, that counts as the same. So we want to count how many ways are there to do this.

And I stated last time that the answer is n plus k minus 1, choose k ways. So I want to prove this result and also give you a little more intuition for it. The other three results should be easy, but this one is tricky. But first, I should follow my own advice. Let's see, is that plausible?

Let's check a couple of simple and extreme cases. The most extreme example I can think of is k equals 0. k equals 0 means you don't pick anything.

It's an extreme case, but let's just see if it makes sense. Assuming the formula is correct, for k equals 0 we get n minus 1 choose 0, and n minus 1 choose 0 is 1, for the same reason that 0 factorial is 1. Why is 0 factorial 1? Well, I always think of factorial in terms of: if you have n people and they're lined up for ice cream, how many ways can you order them?

Well, if there's no one there, there's one way. It's the way when there's no one there. So 0 factorial is 1. And n choose 0 is 1 for any n, because if you have a group of n people and you choose none of them, there's one way to do that. You just don't choose anyone.

So the key is that it's 1, not 0. Suppose you had memorized this formula and then got confused about whether the bottom was k or n. If you put n there, you'd have n plus k minus 1 choose n, and letting k be 0 gives n minus 1 choose n, which is 0, cuz you can't pick n people out of n minus 1.

There aren't enough people, okay? The answer should be 1, not 0, so the correct formula passes this check and the misremembered one fails it.

Now let's do a slightly less extreme case than that. Well, what if k equals 1? If k equals 1, then this would just be n choose 1, which equals n.

Well, that makes sense. You're just picking once. Notice that if you only pick once, it makes no difference whether it's with replacement or without replacement.

It makes no difference if it's ordered or unordered. It has to be n. There would be something really wrong if we didn't get that. All right, and let's do one other case that is pretty simple.

Let's do n equals 2. That's an interesting case for us. You can try n equals 1, which is another easy case, but let's do n equals 2. n equals 2 is what I would call the simplest non-trivial example, and one of the best general pieces of research advice is to look at the simplest non-trivial example. So this one is special in some sense.

Simplest non-trivial. The k equals 0 and k equals 1 cases are pretty trivial, but they're still worth checking, because if they were wrong, we'd know there's something wrong. This one is the simplest non-trivial example. If n equals 2, according to this formula, we get k plus 1 choose k. Notice that k plus 1 choose k is the same thing as k plus 1 choose 1. It's the same argument as why 10 choose 4 is the same as 10 choose 6:

you could choose the 4 or you could choose the 6, it's the same thing. This is exactly the same reason. And k plus 1 choose 1 is obviously k plus 1, cuz you're choosing one thing out of k plus 1. All right, now let's see if that's correct. We have n equals 2, so we have two objects, okay? And we are picking k times.

So let's just draw two buckets to represent the two objects. We could put a check mark every time we pick, but for simplicity, I'll put a dot. This is object 1. This is object 2. OK, every time I select object 1, I'm going to put a dot here.

Every time I select object 2, I'm going to put a dot here. OK, so you can make up some numbers if you want. Maybe it looks like that: I chose object 1 three times and object 2 four times.

Since it's with replacement, there's no restriction on how many, as long as the total equals whatever the total is supposed to be. And since order doesn't matter, I don't actually care which dot came before which dot. All I care about is that there are three dots here and four dots here. Okay, now it follows that in order to specify this result, all we need to do is say how many dots there are in the first box, right?

Okay, because if I know how many dots are in the first box, then however many are left are in the second one. So the number of dots in the first box is in the set {0, 1, ..., k}, right?

Because I have k dots. In my example, k is 7, but k could be whatever. k dots. So however many are in here, it could be that they're all in here, or none of them, or anything in between.

So there are k plus 1 possibilities. So just by thinking about these two boxes, we get a direct proof that the formula is correct in this case. So that gives some comfort.
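The checks above (k equals 0, k equals 1, and n equals 2) can also be automated. A minimal Python sketch, assuming small n and k so that brute force is feasible: itertools.combinations_with_replacement lists exactly the unordered, with-replacement selections, and the helper names brute_force_count and formula are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def brute_force_count(n: int, k: int) -> int:
    """Count unordered selections of size k, with replacement, from n labeled objects."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))

def formula(n: int, k: int) -> int:
    """The claimed answer: n + k - 1 choose k."""
    return comb(n + k - 1, k)

# Compare on a small grid that includes the k = 0, k = 1, and n = 2 cases.
for n in range(1, 6):
    for k in range(0, 6):
        assert brute_force_count(n, k) == formula(n, k), (n, k)

print(formula(5, 0), formula(5, 1), formula(2, 7))  # 1 5 8

# The misremembered version, n + k - 1 choose n, fails the k = 0 check
# (math.comb returns 0 when asked to choose more items than are available).
print(comb(5 + 0 - 1, 5))  # 0
```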

And I'll come back a little bit to this case, n equals 2. But now let's prove that this formula is correct in general. All right, well, this dot picture is already giving us a hint about the general problem. One of the most difficult and most important things in this course is to get in the habit of recognizing pattern and structure, that is, recognizing when two problems are equivalent even if they sound different. Okay, so that's not a calculus thing, that's a thinking thing.

You have to think hard. The best way to think about this problem is not actually to think of it this way. It's to notice that it's equivalent to putting indistinguishable particles into distinguishable boxes. So I'll just say this is equivalent, and you can think through for yourself why it's the same thing.

This is already giving you a hint as to why this is equivalent, but you should think about why this is true in general. So equivalently, how many ways are there to put k indistinguishable particles into n distinguishable boxes? So I'm thinking of these dots as indistinguishable particles.

The boxes are distinguishable because this is box one and this is box two. Okay, indistinguishable particles into n distinguishable boxes. Okay, and the answer is n plus k minus 1 choose k, but let's see why.

So another piece of advice that I could have added there is: draw a diagram and try some simple examples. I mentioned the simple examples, but I didn't mention drawing a diagram. I'm just gonna draw a picture, and that should make everything nice and clear. The two-box case was a really easy one.

Okay, let's boldly do four boxes. I can do as many boxes as I have space for; it's not any harder. Once you see the idea, it could be any number of boxes.

Okay, but I'll draw a picture with four boxes. Suppose that this box has three particles. This box is empty. This box has two particles, and this box has one particle. So for this picture, n equals 4, the number of boxes, and k equals 6, the number of particles.

Okay, and you could try to draw out all the possibilities, but I wouldn't suggest doing it for this case, since there are already too many. It would be very tedious to list them all out. But you could do an example somewhere in between the two-box case and this one and just try listing out the cases.

Do an example that's simple enough not to be too tedious, as a check. But here, it's already getting complicated, because I can have empty boxes, I could have all the particles in one box, anything. And now this idea of indistinguishability is in contrast to the labeling advice from before. Most real-life marbles and things like that behave more like the labeled case, where you can label them.

But for certain counting problems and for certain problems in physics, the particles are so completely indistinguishable that you can't even think of them as labeled. In other words, if I swapped this one and this one when you weren't looking, you would never know the difference. But not only would you never know the difference, God could not tell the difference.

That's this problem here, okay? Now usually that's not the case, right? So usually we can think of labeling where even if you can't tell a difference, God can tell a difference.

That's the most common scenario. For some physics problems it behaves more like this, and also for some counting problems. I'm just emphasizing that this is important in physics, but most of you are not physicists.

I'm not a physicist. The reason I'm talking about it is not because of the physics applications, but because it is important for counting, okay?

But for probability, usually with the naive definition, you're not gonna be able to apply this because it's gonna function more like the labeled case, not like this case. All right, so now it doesn't look like we've done anything except draw a picture, but actually we're basically done deriving the result, almost done. All we need to do is convert this. This picture will be kind of hard to type up, you need to draw this.

Let's convert this into something simpler that you could actually write down easier. So I'm just gonna make up a little code. This is a very simple code.

I'm gonna represent these dots by dots. So dot, dot, dot. And then I'm not gonna draw these rectangles.

I'm just gonna draw separators, okay? So I'm gonna draw a vertical line segment to denote each separator between boxes, like that.

Okay, so the second box was empty, so I'm drawing two vertical lines without any dots in between. The third box had two dots, and then there's a separator. And then there's one more dot, okay? So the whole picture becomes: dot dot dot, bar, bar, dot dot, bar, dot.

So does everyone see what I did? It's a very simple encoding. I just did this as an example, but obviously no matter what configuration we had drawn, it can be encoded in this way, and you can go back and forth between the box picture and the code.

It's just a different way to represent the same situation. Once we have this, we're actually done, because in this code there must be k dots, since there were k dots in the boxes. And there must be n-1 separators,

because if I have n boxes, then there are n-1 separators in between them, okay? Now, how many ways are there to specify this code?

Well, you could think of it as the factorial of the total number of symbols, (n+k-1) factorial, except that that's overcounting. It's similar to the problem on the strategic practice asking how many ways there are to rearrange the letters of the word pepper. It's not just the factorial of the number of letters in pepper, because there are multiple P's and multiple E's, and you have to adjust for that overcounting.

Same as that. But an even easier way to think of it is this: the code has n plus k minus 1 positions, and in order to specify it, all we need to do is specify where the dots are.

Choose the positions for the dots. The remaining positions are the positions for the separators. So n plus k minus 1, choose k. There isn't even anything else I can write on this. This is self-annotating again.

I have n plus k minus 1 positions here, and I'm gonna pick k of them to put the dots. If you want, you could also say this is n plus k minus 1, choose n minus 1. Choose where the dots are and the separators are determined, or choose where the separators are and then the dots are determined, same thing. Okay, so that completes the proof of that result.
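The encoding is easy to play with in code. Below is a sketch in Python where the characters "." and "|" stand in for the dots and separators drawn on the board (that choice of characters is ours), and encode, decode, and all_codes are hypothetical helper names. It builds every code by choosing the k dot positions among n + k - 1 slots, and repeats the pepper-style overcounting correction as a cross-check.

```python
from itertools import combinations, permutations
from math import comb, factorial

def encode(counts: list[int]) -> str:
    """Turn box counts into the dots-and-separators code."""
    return "|".join("." * c for c in counts)

def decode(code: str) -> list[int]:
    """Inverse of encode: count the dots between consecutive separators."""
    return [len(chunk) for chunk in code.split("|")]

def all_codes(n: int, k: int) -> list[str]:
    """Every code with k dots and n - 1 separators: choose the dot positions
    among the n + k - 1 slots; the remaining slots get separators."""
    length = n + k - 1
    codes = []
    for dot_positions in combinations(range(length), k):
        chosen = set(dot_positions)
        codes.append("".join("." if i in chosen else "|" for i in range(length)))
    return codes

print(encode([3, 0, 2, 1]))  # ...||..|.
print(decode("...||..|."))   # [3, 0, 2, 1]

n, k = 4, 6  # the four-box, six-particle picture
codes = all_codes(n, k)
# Every code decodes to a valid placement of k particles into n boxes.
assert all(len(decode(c)) == n and sum(decode(c)) == k for c in codes)
print(len(codes), comb(n + k - 1, k), comb(n + k - 1, n - 1))  # 84 84 84

# Pepper-style count: all (n+k-1)! orderings, divided by orderings of identical symbols.
print(factorial(n + k - 1) // (factorial(k) * factorial(n - 1)))  # 84
# The same correction for the letters of "pepper": 6! / (3! 2! 1!).
print(len(set(permutations("pepper"))), factorial(6) // (factorial(3) * factorial(2)))  # 60 60
```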

So just coming back very briefly to this n equals 2 case: one situation you could think of is flipping two coins, okay? Assume they're fair coins, so heads and tails are equally likely. Usually we think of four equally likely outcomes, right? Heads heads, heads tails, tails heads, tails tails.

So there's four of them, right? Now suppose the coins look completely indistinguishable to you. Still, we could imagine that there's coin number one and coin number two.

Or rather than thinking of flipping two coins, we could think of flipping the same coin twice. So we have the first toss and the second toss. So even if the coins look the same to you, we would think of four outcomes, right?

Does that make sense? Heads, heads, heads, tails, tails, heads, tails, tails. Okay.
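The labeled view of two fair coin tosses can be spelled out with the naive definition. A short Python sketch, using itertools.product to list the ordered outcomes; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Labeled model: toss 1 and toss 2 are different, giving 4 equally likely outcomes.
labeled = list(product("HT", repeat=2))
print(labeled)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# Forgetting the order leaves only 3 distinct outcomes.
unlabeled = {tuple(sorted(outcome)) for outcome in labeled}
print(len(unlabeled))  # 3

# Naive definition under the labeled model: P(one head, one tail) = 2/4.
one_each = [o for o in labeled if sorted(o) == ["H", "T"]]
print(Fraction(len(one_each), len(labeled)))  # 1/2
# Treating the 3 unordered outcomes as equally likely would give 1/3 instead,
# which is not how fair coins behave.
```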

A massive controversy in physics arose in the 1920s when a young Indian physicist named Bose proposed a model that, translated into coins, would say there are only three outcomes, not four, and all three are equally likely. He wasn't talking about coins; he was doing a particle physics problem. In terms of coins, it would be saying that it's either two heads, two tails, or one head and one tail, right? If the coins are completely indistinguishable, you can't distinguish between heads tails and tails heads. So he was proposing a model in physics, not for coins, where there were three equally likely outcomes, not four.

And he basically got laughed at for that. But he wrote a letter to a guy called Einstein. And Einstein really liked the idea.

And then they were finally able to start convincing people. That was in the 1920s. This just looks like a simple little counting thing, but they used ideas along these lines in physics to predict a new state of matter called the Bose-Einstein condensate. They predicted it theoretically, and it was only empirically observed 70 years later. So they predicted it 70 years in advance. This is not a physics class, but you can look that up if you're curious.

It has all kinds of bizarre properties. The point of this, though, is that for coins, thinking of them as labeled, whether or not you can tell them apart, is normally the right way to go. So the indistinguishable-particles view is useful for counting and for physics, but you have to be careful about using it with the naive definition of probability. Okay, so let's talk a little more about counting and what I call story proofs.

So let's start with a simple example. So a story proof is still a proof, otherwise I wouldn't call it a proof. Someone asked whether that means an example. What it means is an application or an interpretation, so proof by interpretation I would say, rather than proof by algebra or calculus. So there are some examples on the strategic practice, and I just wanted to do a couple more quick examples, okay?

And then we're ready to go beyond the naive definition of probability. So proof by interpretation, what do I mean by that? Well, we already saw an example today. We proved that 10 choose 4 equals 10 choose 6. So in general, n choose k equals n choose n minus k. That's a very useful fact.

And I'm not gonna write the proof again. This one is, so this is example one. This one is easy to do using algebra as well. But it's even easier to just say, well, if I pick k out of n, or I could pick the other n minus k, and you could write one sentence explaining that. But I already talked about that, so I'm not gonna write that again.

It's the same idea, right? Okay, so this is obvious by the story. The story is just the interpretation that we are picking k people out of n. Rather than just thinking of this as a formal symbol involving factorials that you manipulate. That's actually thinking about what this means rather than manipulating factorials.
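The story behind n choose k equals n choose n minus k is a bijection: each choice of k people corresponds to exactly one choice of the n minus k people left out. A small Python sketch that checks this by brute force for small n, assuming people are labeled 1 through n; complement_bijection_check is a hypothetical helper name.

```python
from itertools import combinations
from math import comb

def complement_bijection_check(n: int, k: int) -> bool:
    """Map each k-subset of {1..n} to its complement and confirm this
    produces every (n - k)-subset exactly once."""
    everyone = frozenset(range(1, n + 1))
    k_subsets = [frozenset(c) for c in combinations(everyone, k)]
    complements = {everyone - s for s in k_subsets}
    n_minus_k_subsets = {frozenset(c) for c in combinations(everyone, n - k)}
    # One-to-one (no collisions) and onto (every (n - k)-subset is hit).
    return len(complements) == len(k_subsets) and complements == n_minus_k_subsets

print(complement_bijection_check(10, 4), comb(10, 4) == comb(10, 6))  # True True
```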

That's what I mean by a story proof. Okay, let's do a slightly harder one, and then one that would be a nightmare using algebra. So here's kind of an intermediate example.

This is a very handy identity: n times (n-1 choose k-1) equals k times (n choose k). I've used this identity many, many times in my life, okay? But I don't like memorizing things, so I wrote this kind of slowly, because I didn't remember it; I just derived it, okay?

Again, you can check this by algebra pretty easily, okay? But that's not gonna help you remember it or understand it. That's just like, you could write out the algebra and it'll just look like a curiosity.

It cancels out and it doesn't give you any intuition for why that's true, okay? So the story proof for this would be to imagine that we're gonna pick k people out of n. Of course, it doesn't have to be people, whatever example you want. But the point is you're not losing any generality.

I'm not saying three people or something. It's still a general interpretation. Doesn't matter that I said it's people.

Pick k people out of n with one of them designated as the president, say, or whatever you want to call the title. That is, you have a committee and there's a chair of a committee or a president or whatever, okay? President of the club.

So I want to know how many ways there are to do that. Well, there are two different approaches I could take. Either I could first select who's in the club, right? There are n people in the population, and there are k people in my club. So I could pick who's in the club: n choose k ways. And then one of those k must be elected president, so we multiply by k, by the multiplication rule, right? Choose who's in the club, then choose the president.

That's the right-hand side, k times n choose k. But I could also first choose the president: n choices, okay? Once I have the president, I need k-1 more people in my club, and those could be any of the remaining n-1, so that's n times (n-1 choose k-1), again by the multiplication rule. Both count the same thing.

That's a proof. That's a completely rigorous mathematical proof. But it also gives you some interpretation, okay?

So that's the kind of thing that I mean: we're counting the same thing in two different ways. If both ways are correct, they must agree, right? So that's the idea.
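The two ways of counting (club first, or president first) can be carried out literally on a small example. A Python sketch assuming people are labeled 1 through n and k is at least 1; club_first and president_first are hypothetical helper names. Each returns the set of (club, president) pairs, so we can check that the two procedures produce the very same pairs, not just the same number.

```python
from itertools import combinations
from math import comb

def club_first(n: int, k: int) -> set:
    """Choose the k club members, then one of them as president."""
    people = range(1, n + 1)
    return {(frozenset(club), president)
            for club in combinations(people, k)
            for president in club}

def president_first(n: int, k: int) -> set:
    """Choose the president, then k - 1 more members from the other n - 1 people.
    Assumes k >= 1, since the club needs someone to be president."""
    people = range(1, n + 1)
    pairs = set()
    for president in people:
        others = [p for p in people if p != president]
        for rest in combinations(others, k - 1):
            pairs.add((frozenset(rest) | {president}, president))
    return pairs

n, k = 7, 3
a, b = club_first(n, k), president_first(n, k)
print(a == b)  # True
print(len(a), k * comb(n, k), n * comb(n - 1, k - 1))  # 105 105 105
```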

So count the same thing in two ways. Okay, and one more example. That will be useful several times in this course, it's a handy identity.

Suppose we had m plus n choose k, okay? And I wanna write this as a sum; usually it goes the other way around. We're gonna see a sum that looks like this: the sum from j equals 0 to k of (m choose j) times (n choose k-j). So suppose we had this sum, and we wanna prove that it collapses to just the one binomial coefficient, m plus n choose k.

This is a famous identity in math called Vandermonde's identity. It actually comes up a lot in different areas of math, especially in probability, but it also comes up outside of probability. So if you try to derive this one using algebra, it's pretty horrible, right?

I mean, you write all this in terms of factorials, you can try to cancel stuff, but you still have to deal with this. You can try to apply the binomial theorem, you can do a lot of stuff, but it's not easy at all, okay? So let's prove this using a story. Well, again, it's very helpful that this is self annotating. You don't have to be that smart to think of, well, this says m plus n choose k.

So I'm gonna think about picking k people out of m plus n; that's what it says to do, right? So that's what I'm gonna do. I'm gonna pick k out of m plus n. That's the story, okay? Clearly that's the left-hand side, but how does the sum relate to it?

How does the sum relate to picking k out of m plus n? Well, m plus n is kind of self-annotating also. It means I have an m and I have an n and I added them together. So I'm imagining two groups, a group of size m and a group of size n, and together it's m plus n.

So that seems like a pretty natural thing to do. I mean, that's what m plus n means. So we have a group of m.

Here's M people. I'm not gonna try to draw stick figures or anything. But these dots represent people, not indistinguishable particles.

So you should think of these people as labeled. If you want, you can call this person 1, 2, 3. So 1 through m are in this group. And then n.

Maybe m is 3 and n is 5. Here's n people, okay? So maybe I'll number these 1 through 3, and these ones are 4, 5, 6, 7, 8, or however you wanna label them, okay? Now I need to select k people total from these two groups. So I'll just circle the ones I picked. Maybe I picked this one, this one, this one, that one, and there.

I picked four of them, for example. Well, let's pick five of them, there, okay. So how many ways are there to do that?

Well, obviously, I need to pick some number of people from the first group and some number from the second group, such that the total is k, right? So suppose that I picked j from the first group. In this example, j is 2: I picked 2 from this group.

Now if I need 5 total and I picked 2 from here, there must be 3 from the other group. Obviously, right? So in general it's k minus j from over there.

If there's j here, there must be k minus j there. How many ways are there to pick j here and k minus j there? Well, it's just (m choose j) times (n choose k-j), right? That's the multiplication rule.

Then we add those up over all possible values of j. We're not double counting anything: each selection has exactly one value of j, so we're just adding up disjoint cases. That sum counts every way to pick k out of m plus n.

And that's the proof. That's it. Writing up a few sentences about that in words is a complete proof, and it's a lot better to do it that way than to try to do it using algebra, okay?
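The identity can also be spot-checked numerically. This doesn't replace the story proof; it's a sketch assuming Python 3.8+ (for math.comb), and vandermonde_sum is a hypothetical helper name. It adds up the disjoint cases over j and compares the total with m plus n choose k, first for the board example and then on a grid of small values.

```python
from math import comb


def vandermonde_sum(m, n, k):
    """Sum over j of C(m, j) * C(n, k - j): split each k-subset by how many come from the first group."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible splits (j > m or k - j > n) contribute nothing.
    return sum(comb(m, j) * comb(n, k - j) for j in range(k + 1))


print(vandermonde_sum(3, 5, 5), comb(8, 5))  # 56 56

# Check the identity on a grid of small cases.
for m in range(8):
    for n in range(8):
        for k in range(m + n + 1):
            assert vandermonde_sum(m, n, k) == comb(m + n, k)
```

The zero-when-too-large behavior of math.comb matches the counting: you can't pick more than m people from a group of m.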

All right, the last thing for today is to start, very briefly, on the non-naive definition of probability. We'll be continuing with it for the rest of the semester, but I just wanna get the basic structure in place.

So here's the general definition of probability. Up until now we were assuming equally likely outcomes, right? And we wanna go beyond that. We don't wanna assume everything's equally likely.

And we don't wanna have to assume that there are only finitely many possible outcomes. We wanna go beyond that, okay? So this is the non-naive definition.

For the non-naive definition, we need the notion of a probability space. And I already introduced the concept of a sample space. So a probability space consists of two ingredients, which I'll call S and P. And I just have to tell you what S and P are and what rules they obey, okay? So S, same notation we used before, S is a sample space.

Remember that's just the set of all possible outcomes of some experiment, and we'll interpret the word experiment very broadly. Up until now, we had to assume that S was finite and that all of the outcomes were equally likely. Now we're gonna go way, way beyond that and not have to assume that anymore. And P is a function, okay?

But it's not the kind of function that you usually see, like f(x) equals x squared. The domain of P is the set of all subsets of S.

It's a mapping that takes an event as its input. Remember we talked about events before, and there's also a handout that you should look at if you haven't already: an event is a subset of S. So P takes any event A, where A is a subset of S (written A ⊆ S, with the subset symbol), as input, and gives P(A) as output, which is a number between 0 and 1. That's just the standard convention: we want probabilities to be numbers between 0 and 1. So the input is an event, and the output is a number between 0 and 1.
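Here's a minimal sketch of a probability space in the special case of a finite sample space whose outcomes need not be equally likely. Everything in it is an assumption for illustration: the loaded-die weights and the names S, weight, P, and all_events are hypothetical, and the general definition also covers infinite sample spaces, which this sketch does not. P takes an event (a subset of S) as input and returns a number between 0 and 1, and its domain is every subset of S.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical loaded die: outcomes 1..6, not equally likely.
S = frozenset(range(1, 7))
weight = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 8),
          4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 8)}


def P(A):
    """Probability of an event A (a subset of S): add up the weights of its outcomes."""
    if not set(A) <= S:
        raise ValueError("an event must be a subset of the sample space")
    return sum((weight[s] for s in A), Fraction(0))


def all_events(S):
    """Every subset of S, i.e. the domain of P in this finite sketch."""
    items = sorted(S)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


events = all_events(S)
print(len(events))   # 64 (that's 2**6 events)
print(P({2, 4, 6}))  # 1/2
assert all(0 <= P(A) <= 1 for A in events)
```

Exact fractions are used instead of floats so that comparisons like P(S) == 1 aren't thrown off by rounding.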

Now the only thing left is to tell you what axioms P has to satisfy. In other words, what properties do we need P to have, aside from the fact that it's between 0 and 1? OK, so you might think that probability is a very complicated thing.

Understanding uncertainty sounds like a very complicated thing. But actually, we only need two axioms, two rules. Rule number one:

The probability of the empty set equals 0, and the probability of the full space equals 1; that is, P(∅) = 0 and P(S) = 1. You might say I cheated by putting two things into one rule. Actually, you could try to simplify the axioms and derive part of this from the rest, but I'm not trying to make this completely minimal. I like to write it this way because these are the two extremes.
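Rule one can be seen as a constraint on how probabilities get assigned. In this minimal sketch (assumed setup: a finite sample space where P(A) is the sum of hypothetical per-outcome weights), P(∅) = 0 holds automatically because an empty sum is 0, so the real content is P(S) = 1. The helper names P and satisfies_rule_one, and both weight tables, are made up for illustration.

```python
from fractions import Fraction


def P(weight, A):
    """P(A) on a finite sample space: add up the weights of the outcomes in A."""
    return sum((weight[s] for s in A), Fraction(0))


def satisfies_rule_one(weight):
    """Rule one: P(empty set) = 0 and P(S) = 1, where S is the set of all outcomes."""
    S = set(weight)
    # With this sum-of-weights construction the first check always passes
    # (an empty sum is 0); the second is a genuine constraint on the weights.
    return P(weight, set()) == 0 and P(weight, S) == 1


# Hypothetical weights for a biased coin flipped twice (not equally likely).
good = {"HH": Fraction(4, 9), "HT": Fraction(2, 9), "TH": Fraction(2, 9), "TT": Fraction(1, 9)}
# Giving every outcome weight 1/2 makes P(S) = 2, which can't be a probability.
bad = {"HH": Fraction(1, 2), "HT": Fraction(1, 2), "TH": Fraction(1, 2), "TT": Fraction(1, 2)}

print(satisfies_rule_one(good))  # True
print(satisfies_rule_one(bad))   # False
```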

The probability of the empty set equals 0; that makes sense intuitively. What would it mean for the empty set to occur? Here's our setup for saying when an event occurs, and Venn diagrams are useful here. S is like the universe of all possible outcomes, drawn as this rectangle, and an event, let's say A, is this oval inside it.

Let s0 be an example of an outcome (that's a lowercase s; I drew it big so that it's visible). So our picture is: before we do the experiment, we just have this general sample space. After we do the experiment, we get to observe the outcome. Suppose that the actual outcome of the experiment was s0. If that actual outcome is an element of A, then we say that A occurred.

And if s0 is outside of A, we say that A didn't occur. All right, now what would it mean for the empty set to occur? It would mean that s0 is in the empty set, but nothing is in the empty set; it's empty. So that's why I want its probability to be 0, okay?

What that's saying is that we want impossible events to have probability 0, and the empty set is impossible. S itself, on the other hand, happens with certainty. If somehow s0 fell outside of that rectangle, that would mean you drew the wrong rectangle. This is the whole universe here, so the outcome can't be outside it.
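The idea that A occurs exactly when the realized outcome s0 lands in A can be simulated directly. This is a sketch under assumed inputs: the outcomes, their probabilities, and the helper occurs are hypothetical, and Python's random.choices stands in for running the experiment. Whatever s0 comes out, S occurs and the empty set doesn't.

```python
import random

# Hypothetical sample space with non-uniform probabilities (they sum to 1).
outcomes = ["HH", "HT", "TH", "TT"]
probs = [4/9, 2/9, 2/9, 1/9]


def occurs(A, s0):
    """Event A occurs exactly when the realized outcome s0 is an element of A."""
    return s0 in A


A = {"HH", "HT"}  # the event "first flip is heads"
S = set(outcomes)
empty = set()

rng = random.Random(110)  # fixed seed so the run is reproducible
s0 = rng.choices(outcomes, weights=probs, k=1)[0]  # run the experiment once
print(s0, occurs(A, s0), occurs(S, s0), occurs(empty, s0))

# Repeat the experiment: S occurs every time, the empty set never does.
runs = rng.choices(outcomes, weights=probs, k=1000)
print(all(occurs(S, s) for s in runs), any(occurs(empty, s) for s in runs))  # True False
```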

So that's rule one; that's basically a convention. The most important axiom is the second one. And luckily there are only two, okay? So you're not gonna have to memorize ten axioms; it's just two simple rules.

So just very quickly, the second one is about unions, and I'm assuming that you know unions by now. It says that

P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ...

This is called a countable union: I'm taking the union of infinitely many events, but countably infinitely many, A1, A2, A3, and so on. So the probability of the union equals the sum of the probabilities, and there's an important condition there.

That's true if A1, A2, and so on are disjoint, which just means that no two of them overlap. So I'll just say non-overlapping. Those are the two axioms of probability.
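Countable additivity can't be verified by a finite program, but here is a sketch under an assumed example (not from the lecture): the sample space is the positive integers, and outcome i has probability 1/2^i. The events {1}, {2}, {3}, ... are disjoint and their union is S, so rule two says P(S) is the infinite sum 1/2 + 1/4 + 1/8 + ..., which equals 1, just as rule one requires. The code shows the partial sums closing in on 1; p_outcome and P are hypothetical names, and P here only handles finite events. The last line shows why the disjointness condition matters.

```python
from fractions import Fraction


def p_outcome(i):
    """Hypothetical probability of outcome i in S = {1, 2, 3, ...}: P({i}) = 1/2**i."""
    return Fraction(1, 2**i)


def P(A):
    """Probability of a finite event A (a finite subset of S): add up its outcomes' probabilities."""
    return sum((p_outcome(i) for i in A), Fraction(0))


# Disjoint events A_i = {i}: partial sums of P(A_1) + P(A_2) + ... creep up toward P(S) = 1.
for N in (1, 2, 5, 10, 20):
    partial = sum(p_outcome(i) for i in range(1, N + 1))
    print(N, partial, 1 - partial)  # the gap to 1 is exactly 1/2**N

# Overlapping events break additivity: outcome 2 gets counted twice.
A, B = {1, 2}, {2, 3}
print(P(A | B), P(A) + P(B))  # 7/8 9/8
```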

And I find it pretty amazing that every single theorem and result in probability eventually follows from these two simple rules. OK.