Transcript for:
Introduction to Hidden Markov Models

Now, hidden Markov models are quite different from what we've seen so far. Up to this point, we've been using algorithms that rely on data. With k-means clustering, for example, we gave the algorithm a lot of data points, clustered them, found the centroids, and used those centroids to decide where new data points belong. Same thing with linear regression and classification. With hidden Markov models, on the other hand, we deal with probability distributions. The example we're going to work through here, and I'll have to use a lot of examples because this is a very abstract concept, is a basic weather model. What we want to do is predict the weather on any given day, given the probability of different events occurring. So let's say we know some specific things about our environment, maybe in a simulated environment where this might be an application. We know that if it's sunny, there's an 80% chance the next day will be sunny again and a 20% chance it will rain. Maybe we know some information about sunny days and about cold days, and we also know the average temperature on those days. Using this information, we can create a hidden Markov model that allows us to make predictions for the weather on future days from those known probabilities. Now, you might ask: how do we know these probabilities? A lot of the time you actually do know the probability of certain events occurring, which makes these models really useful. But there are times when what you actually do is take a huge data set and estimate the probabilities of things occurring from that data set. We're not going to do that part, because that's going a little too far, and the whole point of this is just to introduce some different models.
But in this example, what we will do is use some predefined probability distributions. So let me read out the exact definition of a hidden Markov model and then go more in depth. A hidden Markov model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. So in a hidden Markov model, we have a bunch of states. In the weather example I was talking about, the states would be "hot day" and "cold day". Now, these are what we call hidden, because we never actually access or look at these states while we interact with the model. What we will be looking at instead is something called observations. At each state, we have an observation. Let me give you an example of an observation: if it is hot outside, Tim has an 80% chance of being happy; if it is cold outside, Tim has a 20% chance of being happy. That is an observation. At a given state, we can observe that the probability of something happening is x rather than y. So we don't actually care about the states in particular; we care about the observations we get from each state. In our example, what we're actually going to do is use the weather itself as the observation for each state. For example, on a sunny day the temperature has some probability of being between 5 and 15 degrees Celsius with an average temperature of 11 degrees; that's a distribution we can use. Now, I know this is slightly abstract, but I just want to talk about the data we're going to work with here. I'm going to draw out a little example, go through it, and then we'll actually get into the code. So let's start by discussing the type of data we're going to use.
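To make the "observations, not states" idea concrete, here is a tiny plain-Python sketch of the Tim example above. The 80%/20% happiness values come from the transcript; the 60%/40% belief about today's hidden state is a made-up number just for illustration. It shows that even without ever looking at the state directly, a probability over states plus per-state observation probabilities gives us a probability for the observation (the law of total probability):

```python
# Probability that Tim is happy in each hidden state
# (values from the example above).
p_happy_given_state = {"hot": 0.8, "cold": 0.2}

# Suppose we believe today is 60% likely hot and 40% likely cold.
# (These belief values are invented for this illustration.)
p_state = {"hot": 0.6, "cold": 0.4}

# Law of total probability:
# P(happy) = sum over states of P(happy | state) * P(state)
p_happy = sum(p_happy_given_state[s] * p_state[s] for s in p_state)
print(p_happy)  # 0.8*0.6 + 0.2*0.4 = 0.56
```

Notice the state itself never appears in the answer; only the observation probability does.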
So typically, in the previous modules, we used hundreds, if not thousands, of entries, rows, or data points for our models to train on. For this, we don't need any of that. In fact, all we need are constant values for the probabilities: the transition distributions and the observation distributions. Now, what I'm going to do is talk about states, observations, and transitions. We have a certain number of states. We define how many states we have, but we don't really care what each state is. We could have states like warm, cold, high, low, red, green, blue; we can have as many states as we want. We could even have just one state, to be honest, although that would be kind of a strange model. And these are called hidden because we don't directly observe them. Now, observations: each state has a particular outcome or observation associated with it, based on a probability distribution. It could be the case that during a hot day, it is 100% true that Tim is happy. Or, on a hot day, we could observe that 80% of the time Tim is happy and 20% of the time he is sad. Those are observations we make about each state, and each state will have its own observations and its own probabilities of those observations occurring. If a state always produced the same result, there would be no randomness to it, and in that case it's just called an outcome, because the probability of the event occurring is 100%. Okay, then we have transitions. Each state has a probability defining the likelihood of transitioning to each other state. So, for example, if we have a hot day, there will be a percentage chance that the next day is a cold day, and if we have a cold day, there will be a percentage chance that the next day is either a hot day or a cold day. We'll go through the exact values we have for our specific model in a moment.
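Those three pieces, states, transitions, and observations, can be written down directly as constants; no training data is needed. Here is a minimal plain-Python sketch using the numbers from this example (the transition values appear in the drawing below; the standard deviations are my own assumption, chosen so the quoted min/max temperatures fall in the tails of each bell curve):

```python
# The three components of the weather HMM, as plain constants.
# State names are arbitrary labels; the model never "looks at"
# them directly, which is why they are called hidden.
states = ["hot", "cold"]

# Transition probabilities: for each state, the chance of moving
# to each state on the next day (each row sums to 1).
transitions = {
    "hot":  {"hot": 0.8, "cold": 0.2},
    "cold": {"hot": 0.3, "cold": 0.7},
}

# Observation distributions: each state emits a temperature from
# a normal distribution. The std values are an assumption made
# here to roughly match the quoted min/max ranges.
observations = {
    "hot":  {"mean": 20.0, "std": 5.0},
    "cold": {"mean": 5.0,  "std": 10.0},
}

# Sanity check: every row of the transition table sums to 1.
for s in states:
    assert abs(sum(transitions[s].values()) - 1.0) < 1e-9
print("model components defined")
```

That's the entire "data set" for this model: a handful of constants instead of thousands of rows.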
But just understand that there is a probability of transitioning into a different state, and from each state we can transition into every other state, or a defined set of states, with a certain probability. I know it's a mouthful, I know it's a lot, but let's go into a basic drawing example, because I want to illustrate graphically how this works in case these ideas are a little too abstract for any of you. Okay, I'm just pulling out the drawing tablet, one second here, and let's do this basic weather model. What I'm going to do is simply draw two states, and let's do it with some colors, because why not. In yellow, I'll draw a sun; this is going to be our hot day. Then I'll make a gray cloud, and we'll say it's raining over there. So those are my two states. Now, from each state, there's a probability of transitioning to the other state. For example, on a hot day, let's say we have a 20% chance of transitioning to a cold day the next day, and an 80% chance of transitioning to another hot day. On a cold day, let's say we have a 30% chance of transitioning to a hot day, which leaves, in this case, a 70% chance of transitioning to another cold day. Now, for each of these days we have a list of observations. These are what we call states; we could label them S1 and S2, but it doesn't really matter whether we name them. We have two states, that's what we know, and we know the transition probabilities, which we've just defined. Now we want the observation probability, or distribution, for each state. Essentially, on a hot day, our observation is going to be that the temperature is between 15 and 25 degrees Celsius, with an average temperature of, let's say, 20.
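The two-state drawing above can be simulated directly: start in one state, and each "day" roll a random number against that state's transition row to pick the next state. This is a hedged sketch using only the standard library; the 80/20 and 30/70 numbers come from the drawing:

```python
import random

# Transition table from the drawing: 80/20 from a hot day,
# 30/70 from a cold day.
transitions = {
    "hot":  {"hot": 0.8, "cold": 0.2},
    "cold": {"hot": 0.3, "cold": 0.7},
}

def next_state(state, rng):
    """Pick tomorrow's state according to today's transition row."""
    r = rng.random()
    cumulative = 0.0
    for s, p in transitions[state].items():
        cumulative += p
        if r < cumulative:
            return s
    return s  # guard against floating-point round-off

# Simulate one week of (hidden) states, starting from a hot day.
rng = random.Random(0)  # seeded so the run is repeatable
state = "hot"
week = [state]
for _ in range(6):
    state = next_state(state, rng)
    week.append(state)
print(week)
```

Running this a few times with different seeds shows the behavior the diagram encodes: hot days tend to be followed by more hot days, and cold days tend to persist too.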
So we write out the observation. We'll say the mean, that is, the average temperature, is 20, and for the distribution the minimum value is 15 and the maximum is 25. This is related to what's called the standard deviation. I'm not really going to explain exactly what standard deviation is, but you can picture it as a bell curve: there's a mean, which is the middle point and the most likely value, and as we move to the left and right of that value, the probability of hitting different temperatures falls off. Somewhere on the left of this curve we have 15, and somewhere on the right we have 25. We're just defining that this is where we cut our curve off, so the temperature will be between 15 and 25 with an average of 20, and our model will work with that. That's as far as I really want to go into standard deviation, since it's getting into statistics and I'm definitely not an expert; I'm sure that's a pretty rough explanation, but it's the best I'm going to give for right now. Okay, so that's our observation for the hot state. Our observation for the cold state is going to be similar: we'll say the mean temperature on a cold day is 5 degrees, the minimum temperature is going to be something like negative 5, and the maximum could be something like 15. So each state has some distribution. And note that this is one kind of distribution, using a mean and standard deviation; we can also deal with straight percentage observations.
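We can also sample from those observation distributions to see what the model's "weather readings" look like. The means (20 and 5) come from the example above; the standard deviations are an assumption I've made so that the quoted min/max values sit out in the tails of each bell curve:

```python
import random

# Observation distributions from the drawing: each hidden state
# emits a temperature from a bell curve. The std values are an
# assumption chosen to roughly match the quoted min/max ranges.
obs = {
    "hot":  {"mean": 20.0, "std": 5.0},
    "cold": {"mean": 5.0,  "std": 10.0},
}

rng = random.Random(1)  # seeded so the run is repeatable

def observe(state):
    """Draw one temperature reading for the given hidden state."""
    d = obs[state]
    return rng.gauss(d["mean"], d["std"])

# A few sampled temperatures for each state: hot days cluster
# around 20 and cold days around 5, but the ranges overlap.
print([round(observe("hot"), 1) for _ in range(3)])
print([round(observe("cold"), 1) for _ in range(3)])
```

The overlap is the important part: a single temperature reading doesn't tell you the hidden state for certain, which is exactly why the states are "hidden".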
So, for example, there's a 20% chance that Tim is happy and an 80% chance that he is sad. Those are probabilities we can use as our observation probabilities in the model. Okay, so there's a lot of lingo and a lot going on; we're going to get into a concrete example now, so hopefully this will make more sense. But again, just understand states, transitions, and observations: we never actually look at the states themselves; we just have to know how many we have, plus the transition probability and observation probability for each of them. Okay. Now, what do we even do with this model? Once I make this hidden Markov model, what's the point of it? Well, the point is to predict future events based on past events. We know the probability distributions, and say I want to predict the weather for the next week; I can use the model to do that, because I can ask: if today is warm, what is the likelihood that tomorrow is going to be cold? That's what we're doing with this model: we're making predictions for the future based on the probabilities of past events. Okay.
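Here is a sketch of that "predict next week" idea, assuming the numbers from the drawing above. We keep a probability vector over the hidden states, push it through the transition table once per day, and at each day report the expected temperature (each state's mean weighted by how likely that state is). This mirrors the kind of computation an HMM library does for you under the hood:

```python
# Transition table and per-state mean temperatures from the example.
transition = {
    "hot":  {"hot": 0.8, "cold": 0.2},
    "cold": {"hot": 0.3, "cold": 0.7},
}
mean_temp = {"hot": 20.0, "cold": 5.0}

# Start fully confident that today is a hot day.
belief = {"hot": 1.0, "cold": 0.0}

expected_temps = []
for day in range(7):
    # Expected temperature = sum over states of P(state) * mean.
    expected = sum(belief[s] * mean_temp[s] for s in belief)
    expected_temps.append(expected)
    print(f"day {day}: expected temperature {expected:.2f}")
    # Tomorrow's belief: P(s') = sum over s of P(s) * P(s -> s').
    belief = {
        s2: sum(belief[s1] * transition[s1][s2] for s1 in belief)
        for s2 in transition
    }
```

Day 0 is 20.0 (we're sure it's hot), day 1 drops to 17.0 (80% hot, 20% cold), and the values keep sliding toward the long-run average of the model as our certainty about the state fades.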