Transcript for:
Master Equation for Markov Processes

So last time, you recall, we were talking about the master equation for Markov processes. Just to refresh your memory: we said, consider the conditional probability P(k, t | j) that the system is in some state k at time t, given that it was in state j at time 0. This set of quantities obeys a set of coupled differential equations of the form

d/dt P(k, t | j) = Σ_{l=1, l≠k}^{n} [ w_{kl} P(l, t | j) − w_{lk} P(k, t | j) ],

where the summation runs over all possible intermediate states l ≠ k, and w_{kl} is the transition rate to go from state l to state k. This is what I call the master equation. It is a coupled set of first-order differential equations for this quantity, and of course the initial condition could be anything you care to specify; since we are talking about conditional probabilities, the initial condition is P(k, 0 | j) = δ_{kj}. So the task is to solve this set of equations given that initial condition, and then you have a whole host of probabilities. Now, notice that I rewrote this in vector form: let P(t) be a column vector with elements P(1, t | j), ..., P(n, t | j). Then the equation becomes dP(t)/dt = W P(t), where the matrix W, as you can easily verify, has the elements W_{kj} = w_{kj} for all k ≠ j — all off-diagonal elements are of this form — while the diagonal element W_{kk} equals minus the sum of all the other elements in that column:

W_{kk} = − Σ_{l=1, l≠k}^{n} w_{lk}.

For k = 1, for example, this is −(w_{21} + w_{31} + ... + w_{n1}). So we have the picture of a matrix in which each column adds up to zero, and therefore the determinant of W is 0.
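As a small numerical sketch of this construction (not part of the lecture; the three-state rates below are made-up examples), one can build W from a table of rates w_{kl} and confirm that every column sums to zero, so that det W = 0:

```python
import numpy as np

# Hypothetical 3-state example: w[k, l] is the rate to go from state l to state k.
w = np.array([[0.0, 1.2, 0.5],
              [0.7, 0.0, 2.0],
              [0.3, 0.8, 0.0]])

n = w.shape[0]
W = w.copy()
# Diagonal element: minus the total rate out of each state (the column sum).
for k in range(n):
    W[k, k] = -sum(w[l, k] for l in range(n) if l != k)

assert np.allclose(W.sum(axis=0), 0.0)   # each column of W sums to zero
assert abs(np.linalg.det(W)) < 1e-9      # hence det W = 0
```

Any set of non-negative off-diagonal rates gives the same structure; only the numbers above are invented.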
Now, W has real elements: all the off-diagonal elements are either 0 or positive, and the diagonal elements are all negative. Go back to the master equation and write it, for instance, for the case k = 1, suppressing the index j, which is not needed here. You get

d/dt P(1, t) = w_{12} P(2, t) + ... + w_{1n} P(n, t) − (w_{21} + w_{31} + ... + w_{n1}) P(1, t),

and the coefficient −(w_{21} + w_{31} + ... + w_{n1}) is what I call the capital W_{11}. After all, what does the vector equation dP/dt = W P imply? The first row says dP(1, t)/dt = W_{11} P(1, t) + W_{12} P(2, t) + ..., and similarly for the other rows; comparing, the off-diagonal elements of the big matrix W are precisely w_{12}, w_{13}, and so on, multiplying P(2, t), P(3, t), etc., while W_{11} is minus the sum of all the rates out of state 1 — just writing it out explicitly. So the important thing to remember is that the off-diagonal elements of W are the transition rates, and they are non-negative: either zero or positive. If two states are not directly connected, then that particular w happens to be zero. I will be making some technical assumptions about the nature of this W matrix; we will talk about them later on when I give you special cases. Here we are talking about the generic case, in other words a Markov process in which you can reach any state from any state
no matter where you start; there are no subsets of states which are closed among themselves, and so on. So we are taking the general case where the w's exist and transitions are possible. Now, you must remember that it is entirely possible that you cannot go from, say, state 6 to state 8 directly — there may be a zero transition rate — but given enough time you may go from 6 to some other state 5, and then from 5 to 8, and so on. That possibility exists, of course, and that is really what happens most of the time, because we will look at cases where you may be able to jump only from a given state to the neighbouring states on either side, and yet over a period of time the system, wherever you start, will reach any other state. So we have this W matrix, and then we wrote down a formal solution, which is just the exponential of the matrix: from dP(t)/dt = W P(t) it follows that

P(t) = e^{Wt} P(0),

where in the case we are talking about P(0) is a column vector whose elements are all zero except for the jth row, which is 1, because you start in state j. So the task reduces to finding e^{Wt}. The statement made was: what sort of time dependence can we generically expect? The answer is that you expect things to decay in time to some equilibrium distribution, and we already have some idea of what that distribution is, because det W = 0 implies that λ = 0 is an eigenvalue of W. You can actually prove more than this: you can show that 0 is generically a simple eigenvalue, not a repeated one, and moreover that it has a right eigenvector which is the stationary probability distribution; in the simplest cases we are looking at, it is a unique vector. What is important is that when we are going to identify the
elements of this eigenvector as probabilities, we had better make sure that the elements cannot be negative. So we need a guarantee that this eigenvector has elements which are not only real but also non-negative. There are theorems which assure us of that. One of them was the statement I made that the eigenvalues of W can never have a positive real part, and the reason, I said, was the famous Gershgorin disk (or circle) theorem. What it says is: give me an n × n matrix M; take its diagonal elements m_{kk} and mark those points in the complex plane; take the sum of the moduli of all the off-diagonal elements in that row, and draw a circle about that centre point with that radius. The eigenvalues are guaranteed to lie inside or on the union of these disks. This is not a hard theorem to prove; here is a simple way. Suppose λ is an eigenvalue with corresponding eigenvector u, so M u = λ u. Suppose u has elements u_1, u_2, ..., u_n, and let the kth element be the largest in magnitude of all these elements. Then writing out the kth component of M u = λ u,

Σ_{j=1}^{n} m_{kj} u_j = λ u_k.

Take out the j = k term and put it on the right-hand side; then

Σ_{j=1, j≠k}^{n} m_{kj} u_j = (λ − m_{kk}) u_k.

Now take the modulus on both sides, and it immediately tells you
that

|λ − m_{kk}| = | Σ_{j≠k} m_{kj} (u_j / u_k) | ≤ Σ_{j≠k} |m_{kj}| |u_j / u_k| ≤ Σ_{j≠k} |m_{kj}|,

because taking the modulus inside the summation can only increase the sum, and each |u_j / u_k| ≤ 1, since we took u_k to be the largest in magnitude. That is the disk theorem: the eigenvalue λ lies inside or on the disk centred at m_{kk} whose radius is Σ_{j≠k} |m_{kj}|. So all the eigenvalues of M are in or on the union of these Gershgorin disks. There are further refinements: for example, if one of the disks is disjoint from the others, you can show that it definitely contains an eigenvalue, and so on. Now, what we need is just one part of this theorem applied to the matrix W. Remember that the off-diagonal elements W_{jk} = w_{jk} are all non-negative numbers, and the diagonal element is W_{kk} = −Σ_{j≠k} w_{jk}. Since the eigenvalues of W and of its transpose coincide, we may apply the theorem to the columns of W: the centre of the kth disk is W_{kk} = −|W_{kk}|, a point on the negative real axis, and its radius is Σ_{j≠k} w_{jk} = |W_{kk}|. So each disk sits on the negative real axis and just touches zero. Therefore Re λ ≤ 0 for every eigenvalue, and each factor e^{λt} must decay to zero — except for the zero eigenvalue, for which there is no t-dependence at all. Therefore W is a relaxation matrix: things relax
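This conclusion is easy to check numerically (a sketch, not from the lecture; the rates are drawn at random just for illustration): build a generic relaxation matrix and confirm that every eigenvalue has non-positive real part and that 0 is among the eigenvalues, as the Gershgorin argument predicts.

```python
import numpy as np

# Random non-negative off-diagonal rates (made-up), column sums put on the diagonal.
rng = np.random.default_rng(0)
w = rng.uniform(0.1, 2.0, size=(4, 4))
np.fill_diagonal(w, 0.0)
W = w - np.diag(w.sum(axis=0))   # W_kk = -sum of the other entries in column k

eigvals = np.linalg.eigvals(W)
# Gershgorin applied to the columns of W: each disk has centre W[k,k] < 0
# and radius |W[k,k]|, so it touches the origin and Re(lambda) <= 0.
assert np.all(eigvals.real <= 1e-10)
# lambda = 0 is always an eigenvalue, since det W = 0.
assert np.min(np.abs(eigvals)) < 1e-10
```

Repeating this with any other seed gives the same qualitative picture.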
to the equilibrium state. And what is that given by? The stationary distribution corresponding to λ = 0 is not difficult to write down: it satisfies dP_st/dt = 0 = W P_st, so it is the right eigenvector corresponding to the zero eigenvalue of W. Since each column of W adds up to zero, it is immediately clear that the uniform row vector (1, 1, ..., 1) is a left eigenvector with eigenvalue zero; but W is not a symmetric matrix, so the left and right eigenvectors need not be the same in general, and it is the right eigenvector that is the stationary distribution. Is that clear? Now, how do we find it? Write P_st = (p_1, p_2, ..., p_n); it is stationary, so there is no time dependence and I won't put in a t argument at all — it is the single-time probability, independent of t. The condition W P_st = 0 reads

Σ_{l=1, l≠k}^{n} [ w_{kl} p_l − w_{lk} p_k ] = 0   for each k.

Again it is a coupled set of simultaneous equations, linear and homogeneous in the p's. Is there a guarantee that this has a non-trivial solution? A set of simultaneous homogeneous equations has a non-trivial solution when the determinant is zero — which indeed it is, since we know det W = 0. So it has a solution, generically a unique one, and you need to find it by solving this set of equations, whatever it may be. So this problem reduces to an algebraic problem, nothing more than that: you can find the stationary distribution explicitly. Now there are some important cases
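In practice this algebraic problem is just a null-space computation. Here is a minimal sketch (not from the lecture; the rates are made up) that extracts the right null vector of W via the singular value decomposition and normalizes it into a probability vector:

```python
import numpy as np

def stationary(W):
    """Right null vector of W, normalised to a probability vector."""
    _, _, vh = np.linalg.svd(W)
    p = vh[-1]            # right singular vector for the smallest singular value
    return p / p.sum()    # fixes both the scale and the overall sign

# Hypothetical 3-state rates: w[k, l] is the rate l -> k.
w = np.array([[0.0, 1.0, 0.4],
              [0.5, 0.0, 1.5],
              [0.2, 0.6, 0.0]])
W = w - np.diag(w.sum(axis=0))

p = stationary(W)
assert np.allclose(W @ p, 0.0)                      # W p = 0
assert np.all(p > 0) and np.isclose(p.sum(), 1.0)   # a genuine distribution
```

The division by p.sum() works because, for the generic case discussed here, the null vector has all components of one sign.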
where you can write the solution down explicitly. Remember, it is the sum over l that is zero; but you could ask what happens if each term inside the sum is zero separately. Then you get a very special kind of distribution, corresponding to saying that the sum vanishes term by term, and that is called detailed balance. Since it is so important in physical applications, let me write it down. Detailed balance means

w_{kl} p_l = w_{lk} p_k   for every pair k, l with k ≠ l.

This holds pairwise, and of course if this condition is true, then it immediately gives you the stationary distribution without much solving at all. Now, what is this actually saying? It says: wait a long time, get to the stationary distribution, and let p_k be the probability that you are in state k. Then p_k multiplied by the transition rate to go from k to l must equal the same thing in the reverse order: here is the initial state, here is the final state, and if you interchange the two you get the other side of the equation. So the weighted flow of probability both ways, pairwise, is exactly the same. Now, this would be just a curiosity, except for the fact that physical systems in thermodynamic equilibrium satisfy detailed balance, provided the underlying dynamics has a property called time-reversal invariance. I won't get further into the details at the moment, but it is a very, very important special case: there are lots of physical systems, under very general conditions, which satisfy detailed balance, in which case the relation above is immediately true. And what does it tell us? It gives us the ratios of all these probabilities. So you choose any one of them, say p_1; then you can find p_2, p_3, and so on, in terms of p_1. And how would you determine p_1 itself? You normalize the probability — remember, you definitely need
the condition Σ_{k=1}^{n} p_k = 1. So we know all the ratios p_2/p_1, p_3/p_1, up to p_n/p_1, and we know p_1 itself from that set of equations provided we have the normalization condition — there has got to be one inhomogeneous condition, which is what this is — and with this normalization you can find all of these probabilities. I leave it to you as an exercise to show that in the general case, for arbitrary n, when the detailed balance condition is valid, the solution looks like this — it is a well-known form, so it is worth memorizing:

p_k = [ 1 + Σ_{l=1, l≠k}^{n} (w_{lk} / w_{kl}) ]^{-1},

and I leave it to you to verify it. So there is a very simple, essentially algebraic, formula for the stationary probability distribution of this Markov process in the case when detailed balance is valid. What happens if the rates are all the same — the rate from l to k equals the rate from k to l? What do you think would happen? Now you are saying not only detailed balance, but that each forward rate equals the corresponding backward rate, and that tells us the distribution must be uniform: there are n states, you are in the steady state, and each of them is equally probable, so the probability of any one of them is 1/n. And indeed that is what the formula gives: if the rates are all equal, each ratio is 1, the sum over l ≠ k has n − 1 terms, you add 1, and you get 1/n. But when you do not have that extra condition that the transition rates are also equal, then of course you have a non-trivial solution in terms of all the transition rates. Now let us look at the simplest case, the simplest possible case: when you
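The exercise can be checked numerically. In the sketch below (not from the lecture), rates satisfying detailed balance are manufactured by construction — a target distribution p and symmetric "attempt" rates s are chosen, and w_{kl} = s_{kl} p_k, so that w_{kl} p_l = w_{lk} p_k automatically — and the closed-form expression is then verified:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
p = rng.uniform(0.5, 2.0, n); p /= p.sum()     # made-up target distribution
s = rng.uniform(0.5, 2.0, (n, n)); s = (s + s.T) / 2   # symmetric base rates
w = s * p[:, None]                              # w[k, l] = s[k, l] * p[k]
np.fill_diagonal(w, 0.0)

# Closed-form stationary distribution under detailed balance:
# p_k = 1 / (1 + sum_{l != k} w_lk / w_kl)
for k in range(n):
    ratio_sum = sum(w[l, k] / w[k, l] for l in range(n) if l != k)
    assert np.isclose(1.0 / (1.0 + ratio_sum), p[k])
```

Each ratio w_{lk}/w_{kl} reduces to p_l/p_k here, which is exactly what the pairwise balance condition says.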
have just two states. This is such a famous case, and it occurs in so many places, that it has a special name to itself: the stationary dichotomous Markov process. It corresponds to n = 2, just two possible states, and we can write down the solution because the stationary distribution is utterly trivial in this case. The sum over l, for any given k, runs over only one other value — k and l take the values 1 and 2 — and for the stationary distribution you have

w_{12} p_2 − w_{21} p_1 = 0,

because there is nothing else to sum over: if I set k = 1 and sum over l, there is only the one other value, 2, and you have essentially this equation and nothing more. So w_{12} p_2 = w_{21} p_1, and since p_1 + p_2 = 1, we have p_2 = 1 − p_1. And what does that lead to? Explicit solutions:

p_1 = w_{12} / (w_{12} + w_{21}),   p_2 = w_{21} / (w_{12} + w_{21}),

where w_{21} is the transition rate to go from 1 to 2, and w_{12} the rate from 2 to 1, by symmetry. Those are exact solutions for the stationary distribution. Of course, we still have to write the time-dependent solution P(t); so far we have just solved W P_st = 0. But think about it a little and you see immediately how physical this solution is, because what we have is a process which, as a function of time, takes on two possible states. We need a symbol for the values of this random variable, which can take two possible values; let us call these values two constants, c_1 and c_2. So let us
say c_1 is sitting somewhere here and c_2 somewhere there. What the process does is this: it starts in, say, state 1, stays there for a while — you can put the origin of time at any arbitrary instant — then abruptly jumps to c_2, stays there for a while, jumps back to c_1, and so on. It is a two-state process, the value of the variable in each state being either c_1 or c_2, and it keeps switching back and forth randomly between these two states. You can easily imagine how many situations this will model: anything with two possible states — on and off, a passive state and an active state, you name it — a huge number of applications where this simplest possible model will work. Now, what is the average rate at which it switches from one state to the other? We have already said that w_{12} is the rate at which the switching from 2 to 1 happens; let us call this λ_2, the rate at which the system, if it is in state 2, switches to state 1. Similarly, w_{21} = λ_1 is the rate at which it switches from 1 to 2. Then, according to what we have,

p_1 = λ_2 / (λ_1 + λ_2),   p_2 = λ_1 / (λ_1 + λ_2).

Now, what is the physical meaning of λ_1 and λ_2? If you look at the picture, you see that the process stays in state 1 for a random time interval, then another random time interval on the next visit, and so on, and the average over all these intervals is the mean residence time in state 1. So if you ask: each time it
reaches state 1, what is the average time it spends there — let us call it τ_1, the mean residence time in state 1; and similarly let τ_2 be the average duration of a stay in state 2. Then clearly the switching rate is just the reciprocal of the residence time:

τ_1 = 1/λ_1,   τ_2 = 1/λ_2.

So the stationary probabilities become

p_1 = τ_1 / (τ_1 + τ_2),   p_2 = τ_2 / (τ_1 + τ_2),

and that is exactly what you would expect physically, because if I put my finger on one particular instant of time and ask for the stationary probability that the process is in state 1 or state 2, then the probability of state 1 is the fraction of time it spends in state 1 over a long period, and similarly for state 2, and these fractions are precisely the mean residence time in each state divided by the sum of the residence times. So we have a simple physical interpretation of what is meant by the equilibrium distribution for a dichotomous Markov process: it is just the ratio of the mean residence time in one of the states to the sum of the residence times in both states. Physically, it is very clear what is happening. We still have to solve the time-dependent problem — we have not dealt with that yet — but the stationary distribution is completely clear in this particular case. Now, even for a three-state process the formulas you write down in general, without detailed balance, are not so trivial; they get algebraically more and more complicated, though when you have something like detailed balance it simplifies enormously. But the most popular model is the dichotomous Markov process. What remains is to ask what happens as a
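This occupation-fraction interpretation is easy to see in a Monte Carlo sketch (not from the lecture; the rates are made-up examples). Residence times in a Markov two-state process are exponentially distributed with means τ_1 and τ_2, so drawing alternate stays and comparing the time fractions should reproduce p_1 = τ_1/(τ_1 + τ_2):

```python
import numpy as np

rng = np.random.default_rng(2)
lam1, lam2 = 0.8, 2.0            # switching rates 1 -> 2 and 2 -> 1 (invented)
tau1, tau2 = 1 / lam1, 1 / lam2  # mean residence times

n_switches = 200_000
stay1 = rng.exponential(tau1, n_switches)   # successive stays in state 1
stay2 = rng.exponential(tau2, n_switches)   # successive stays in state 2

frac1 = stay1.sum() / (stay1.sum() + stay2.sum())
# fraction of time in state 1 -> p1 = tau1/(tau1+tau2) = lam2/(lam1+lam2)
assert abs(frac1 - tau1 / (tau1 + tau2)) < 0.02
```

With 2 × 10^5 switches the statistical error is well below the tolerance used in the assertion.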
function of time, when you put in the time dependences everywhere. We go back and write the solution P(t) = e^{Wt} P(0), and we need to know W. Now, W_{12} is the rate to go from 2 to 1, which is λ_2, and similarly W_{21} is the rate to go from 1 to 2, which is λ_1, and the diagonal elements are minus those rates:

W = [ −λ_1   λ_2 ]
    [  λ_1  −λ_2 ].

That is the transition matrix, and we need to find its exponential; in general you square it, cube it, and so on, and sum the series. But matters are made a little simpler by noticing that

W² = [  λ_1² + λ_1 λ_2   −λ_1 λ_2 − λ_2² ]
     [ −λ_1² − λ_1 λ_2    λ_1 λ_2 + λ_2² ]
   = (λ_1 + λ_2) [  λ_1   −λ_2 ]
                 [ −λ_1    λ_2 ]
   = −(λ_1 + λ_2) W,

just minus (λ_1 + λ_2) times W itself. Now, the rate to go from 1 to 2 is λ_1, and from 2 to 1 it is λ_2, and the average rate is half the sum, so let us define the mean transition rate λ = (λ_1 + λ_2)/2. Then

W² = −2λ W,

and that immediately makes the problem of finding the exponential trivial, because W³ is proportional to W once again, and so on: W^m = (−2λ)^{m−1} W. So

e^{Wt} = I + Wt + (t²/2!) W² + (t³/3!) W³ + ...
       = I + (W / (−2λ)) [ (−2λt) + (−2λt)²/2! + (−2λt)³/3! + ... ]
       = I + (W / (−2λ)) (e^{−2λt} − 1)
       = I + W (1 − e^{−2λt}) / (2λ).

Now we can write the full probability distribution down completely; I leave that to you as an exercise. Putting in the values we have for this process, you should be able to find P(c_1, t | c_1) and P(c_2, t | c_1). How would you find them?
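The closed form above can be verified against an independent computation of the matrix exponential (a sketch, not from the lecture; the rates are made-up, and e^{Wt} is recomputed by diagonalizing W):

```python
import numpy as np

lam1, lam2 = 0.8, 2.0                 # invented switching rates
lam = (lam1 + lam2) / 2               # mean transition rate
W = np.array([[-lam1,  lam2],
              [ lam1, -lam2]])

assert np.allclose(W @ W, -2 * lam * W)     # W^2 = -2*lambda*W

for t in (0.0, 0.3, 1.0, 5.0):
    closed = np.eye(2) + W * (1 - np.exp(-2 * lam * t)) / (2 * lam)
    # independent e^{Wt}: diagonalize W (eigenvalues 0 and -2*lambda)
    evals, V = np.linalg.eig(W)
    brute = (V * np.exp(evals * t)) @ np.linalg.inv(V)
    assert np.allclose(closed, brute)
```

The diagonalization route works here because the two eigenvalues, 0 and −2λ, are distinct.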
The column vector (P(c_1, t | c_1), P(c_2, t | c_1)) is e^{Wt} acting on the initial vector, and if you start in c_1 the initial vector is (1, 0). So all you have to do is apply this 2 × 2 matrix — we know both W and I — to (1, 0) and read off the two numbers. Similarly, (P(c_1, t | c_2), P(c_2, t | c_2)) = e^{Wt} (0, 1), and you can write the full matrix. By the way, just to check: if you let t → ∞, we had better recover the values we already had for the equilibrium distribution. The propagator applied to the initial state becomes the stationary distribution, so (p_1, p_2) should be the limit of e^{Wt} on the initial state, whatever it is — whether it starts in 1 or 2, we do not care. What is the limit of e^{Wt}? You can read it off from the formula:

lim_{t→∞} e^{Wt} = I + W / (2λ),

because the exponential term goes away. Now put in W and 2λ = λ_1 + λ_2:

I + W/(2λ) = (1/(λ_1 + λ_2)) [ λ_2   λ_2 ]
                              [ λ_1   λ_1 ],

acting on whatever initial state you care to put down. Now, one of our articles of faith is that as time increases the initial value should not matter: whatever the initial distribution, I should still get the same equilibrium or stationary distribution. So whether I apply this to (0, 1) or (1, 0), it shouldn't matter; it
shouldn't matter at all, otherwise I am in trouble. In fact, I could have had a mixture: a of one and 1 − a of the other. So let us do that and see what happens. Suppose the initial state had been (a, b), non-negative numbers with a + b = 1. Apply the limit matrix to (a, b): in the first component you get λ_2 a + λ_2 b, so λ_2 comes out and a + b = 1 goes away, and this indeed gives

( λ_2 / (λ_1 + λ_2),  λ_1 / (λ_1 + λ_2) ),

independent of a and b — well, b is 1 − a in any case. That corroborates the requirement that the stationary distribution should not depend on the initial distribution at all. It does cancel out; that is why the limit matrix has the same element in both places in each row, and this is exactly the equilibrium distribution we discovered earlier. You can also write the time-dependent solution out explicitly — I leave that little bit of algebra to you — and ask: what does the decay to equilibrium look like? There is a constant part in these probabilities, which gives you the stationary part, and then there is a part which decays. There is only one other eigenvalue, and it is −2λ. So you expect the correlation time of this process to be (2λ)^{-1}; I am going to define it precisely in a minute, and again this corroborates what we are going to find — we will see that the autocorrelation dies down with exactly this exponent. But what is this time equal to? It is

(2λ)^{-1} = 1/(λ_1 + λ_2) = τ_1 τ_2 / (τ_1 + τ_2),

because λ_1 = 1/τ_1 and λ_2 is
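The independence from the initial distribution can be confirmed directly (a small check, not from the lecture; the rates are made-up examples):

```python
import numpy as np

lam1, lam2 = 0.8, 2.0                 # invented switching rates
lam = (lam1 + lam2) / 2
W = np.array([[-lam1,  lam2],
              [ lam1, -lam2]])

limit = np.eye(2) + W / (2 * lam)     # e^{Wt} as t -> infinity
p_st = np.array([lam2, lam1]) / (lam1 + lam2)

# Every normalized initial distribution (a, 1-a) is mapped to p_st.
for a in (1.0, 0.0, 0.3):
    init = np.array([a, 1.0 - a])
    assert np.allclose(limit @ init, p_st)
```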
1/τ_2. And what sort of mean is τ_1 τ_2/(τ_1 + τ_2)? It is essentially the harmonic mean of the individual residence times (half the harmonic mean, to be precise). So we have this very simple relationship: in a dichotomous Markov process with mean residence times τ_1 and τ_2 in the two states, the correlation time of the process — the time on which it loses memory, in some sense — is given by the harmonic mean of these two individual times. Now, some things can be written down immediately, and if time permits we will do the correlation properly, or I will come back to it a little later; at least in equilibrium we can write the answers down. Call the process X. Its sample space comprises the two values c_1 and c_2, and the system switches randomly back and forth between these two values with mean residence times τ_1 and τ_2 respectively. So what is the mean value of X? The stationary average — the average with respect to the stationary distribution, which by definition is the t → ∞ average — is the weighted average

⟨X⟩_st = c_1 p_1 + c_2 p_2 = (c_1 τ_1 + c_2 τ_2) / (τ_1 + τ_2) = (c_1 λ_2 + c_2 λ_1) / (λ_1 + λ_2),

since p_1, p_2 are the stationary probabilities, c_1, c_2 the corresponding values, and the probability is normalized. What is the mean square going to be? The same thing, but with squares:

⟨X²⟩_st = (c_1² τ_1 + c_2² τ_2) / (τ_1 + τ_2).

And what is the variance, ⟨(δX)²⟩_st, the deviation from the mean squared? That is going to be
equal to the mean square minus the square of the mean. Let us quickly do that:

⟨(δX)²⟩_st = [ (c_1² τ_1 + c_2² τ_2)(τ_1 + τ_2) − (c_1 τ_1 + c_2 τ_2)² ] / (τ_1 + τ_2)²
           = [ c_1² τ_1 τ_2 + c_2² τ_1 τ_2 − 2 c_1 c_2 τ_1 τ_2 ] / (τ_1 + τ_2)²
           = (c_1 − c_2)² τ_1 τ_2 / (τ_1 + τ_2)²,

because the c_1² τ_1² term cancels and so does the c_2² τ_2² term. So some terms definitely cancel out and we get a very simple formula. And notice that τ_1 τ_2 / (τ_1 + τ_2) is precisely the correlation time, so there is a direct connection. What happens if c_2 = −c_1? Then the variance becomes the square of that value; so if the process switches between +1 and −1, or something like that, the formulas simplify. That is called the symmetric dichotomous process; we will talk about it a little later. But what happens when there is time dependence? The next question to ask is about the autocorrelation, the generalization of the variance, and I will do that tomorrow. We need to generalize this to ⟨δX(t_1) δX(t_2)⟩ for any two instants t_1 and t_2. But this is a stationary process, so this quantity must be a function of t_2 − t_1 only; so I might as well look at ⟨δX(0) δX(t)⟩. What would this be? If t = 0 it should reduce to the variance, so it is the variance times something or other, and you expect the memory to drop off exponentially as a function of time with the
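The cancellation is simple enough to confirm by direct arithmetic (a small check with made-up levels and residence times, not from the lecture):

```python
import numpy as np

# Check Var(X) = (c1 - c2)^2 * tau1 * tau2 / (tau1 + tau2)^2
c1, c2 = 3.0, -1.0               # invented levels of the two states
tau1, tau2 = 1.25, 0.5           # invented mean residence times
p1, p2 = tau1 / (tau1 + tau2), tau2 / (tau1 + tau2)

mean = c1 * p1 + c2 * p2
mean_sq = c1**2 * p1 + c2**2 * p2
var = mean_sq - mean**2

assert np.isclose(var, (c1 - c2)**2 * tau1 * tau2 / (tau1 + tau2)**2)
```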
correlation time, which is (2λ)^{-1}. So what do you expect the formula to be in general? It is

⟨δX(0) δX(t)⟩ = ⟨(δX)²⟩_st e^{−2λt},

so the dichotomous Markov process is exponentially correlated. I will prove this; we have to define the correlation time and the autocorrelation more precisely, and we will do that, and take it from this point, tomorrow.
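Ahead of that proof, the claimed formula can be checked numerically from the propagator already derived, since ⟨δX(0) δX(t)⟩ = Σ_{k,j} δc_k P(k, t | j) p_j δc_j in the stationary state (a sketch with invented rates and levels, not from the lecture):

```python
import numpy as np

lam1, lam2 = 0.8, 2.0                        # invented switching rates
c = np.array([3.0, -1.0])                    # invented levels c1, c2
lam = (lam1 + lam2) / 2
W = np.array([[-lam1,  lam2],
              [ lam1, -lam2]])
p = np.array([lam2, lam1]) / (lam1 + lam2)   # stationary distribution

dc = c - c @ p                               # deviations from the mean
var = (dc**2) @ p                            # stationary variance

for t in (0.0, 0.5, 2.0):
    prop = np.eye(2) + W * (1 - np.exp(-2 * lam * t)) / (2 * lam)
    corr = dc @ prop @ (dc * p)              # sum_{k,j} dc_k P(k,t|j) p_j dc_j
    assert np.isclose(corr, var * np.exp(-2 * lam * t))
```

At t = 0 the propagator is the identity and the sum collapses to the variance, as it should.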