Markov chains
We refer to a Markov model in which all states are observable and in which time occurs in discrete steps as a Markov chain.
Extending forward in time or in some sequence
Interesting questions we can answer with Markov chains are of the form: given that we are in state X at time t, what are the probabilities of being in various states at time t + n? For example, if Egbert is listening to avant-garde music now, what predictions can we make for the next album he plays, and the next, and the next?
We represent probabilities of being in any given state as a state vector, and then multiply the state vector by the stochastic matrix to get a new vector of probabilities.
If we know Egbert is currently listening to avant-garde music (state C), we can represent this as the state vector
\begin{bmatrix} 0.00 & 0.00 & 1.00 \end{bmatrix}
To get a new state vector, indicating the probabilities of what Egbert will be listening to at time t + 1, we multiply the state vector by the stochastic matrix.
\begin{bmatrix} 0.00 & 0.00 & 1.00 \end{bmatrix} \begin{bmatrix} 0.30 & 0.22 & 0.48 \\ 0.14 & 0.71 & 0.15 \\ 0.05 & 0.64 & 0.31 \end{bmatrix} = \begin{bmatrix} 0.05 & 0.64 & 0.31 \end{bmatrix}
In this example, the result is exactly the third row of the matrix, because the state vector puts all of its probability on the third state (C). The resulting vector on the right gives the probabilities that at time t+1 Egbert will be listening to jazz, indie, or avant-garde (given that he’s listening to avant-garde at time t). But what if we wanted the probabilities of what he’d be listening to at time t+2? We’d multiply the state vector for time t+1 by the stochastic matrix to get a new vector for time t+2.
\begin{bmatrix} 0.05 & 0.64 & 0.31 \end{bmatrix} \begin{bmatrix} 0.30 & 0.22 & 0.48 \\ 0.14 & 0.71 & 0.15 \\ 0.05 & 0.64 & 0.31 \end{bmatrix} = \begin{bmatrix} \text{?} & \text{?} & \text{?} \end{bmatrix}
Now the problem isn’t as easy as picking out a single row, so let’s take a step back and see how we can multiply a vector by a matrix.
Vector by matrix multiplication
We’ll start with a small example. Let’s say we have a 2 \times 2 matrix. If we are to multiply a row vector on the left by a 2 \times 2 matrix on the right, our row vector must have exactly two entries. We multiply each element in the row vector by the corresponding entry in the first column of the matrix and take the sum; this gives the first entry of the result. We then do the same with the second column to get the second entry.
\begin{bmatrix} a & b \end{bmatrix} \begin{bmatrix} c & d\\ e & f \end{bmatrix} = \begin{bmatrix} ac + be & ad + bf \end{bmatrix}
Here’s another example where we multiply a vector of three elements by a 3 \times 2 matrix.
\begin{bmatrix} x & y & z\end{bmatrix} \begin{bmatrix} a & b\\ c & d\\ e & f \end{bmatrix} = \begin{bmatrix} xa + yc + ze & xb + yd + zf \end{bmatrix}
Notice that in this form, the number of entries in the row vector must agree with the number of rows in the matrix, but it needn’t agree with the number of columns in the matrix. If we think of a vector as a 1 \times n matrix (we always write this as rows \times columns), then we can multiply it by any n \times m matrix, and the result is a 1 \times m vector. We say the inner dimensions must agree: the n in 1 \times n matches the n in n \times m.
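The rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original material: the function name vec_mat_mul is a made-up helper, vectors are plain lists, and matrices are lists of rows.

```python
def vec_mat_mul(v, m):
    """Multiply a row vector v (1 x n) by a matrix m (n x k).

    Hypothetical helper for illustration; returns a list of k entries.
    """
    if len(v) != len(m):
        # The inner dimensions must agree.
        raise ValueError(f"inner dimensions disagree: {len(v)} != {len(m)}")
    n_cols = len(m[0])
    # Entry j is the sum of v[i] * m[i][j], taken down column j.
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(n_cols)]


# A 1 x 3 vector times a 3 x 2 matrix gives a 1 x 2 vector.
result = vec_mat_mul([1, 1, 2], [[1, 0], [2, 1], [0, 3]])
print(result)  # [3, 7]
```

Raising an error when the inner dimensions disagree mirrors the requirement stated above; a 1 \times 2 vector cannot be multiplied by a 3 \times 2 matrix.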
Exercises
1. Perform the following vector by matrix multiplications:
   a. \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 2 & 3\\ 0 & 5 \end{bmatrix}
   b. \begin{bmatrix} 1 & 0 & 3\end{bmatrix} \begin{bmatrix} 2 & 4\\ 1 & 5\\ 0 & 2 \end{bmatrix}
2. Which of the following are valid dimensions for vector by matrix multiplication?
   a. 1 \times 7 by 3 \times 7
   b. 1 \times 7 by 7 \times 3
   c. 1 \times 3 by 3 \times 7
Solutions to exercises
1a. \begin{bmatrix} 2 & 13\end{bmatrix}
1b. \begin{bmatrix} 2 & 10\end{bmatrix}
2a. Invalid. 7 \neq 3
2b. Valid. 7 = 7
2c. Valid. 3 = 3
Returning to our example
Now that we know how to multiply a vector by a matrix, we can return to the problem of predicting what Egbert will be listening to at time t+2, given that he is listening to avant-garde at time t. We’d been stuck at this point:
\begin{bmatrix} 0.05 & 0.64 & 0.31 \end{bmatrix} \begin{bmatrix} 0.30 & 0.22 & 0.48 \\ 0.14 & 0.71 & 0.15 \\ 0.05 & 0.64 & 0.31 \end{bmatrix} = \begin{bmatrix} \text{?} & \text{?} & \text{?} \end{bmatrix}
Here’s how it works out. The first entry in our new state vector is
0.05 \times 0.30 + 0.64 \times 0.14 + 0.31 \times 0.05 = 0.0150 + 0.0896 + 0.0155 = 0.1201.
The second entry is
0.05 \times 0.22 + 0.64 \times 0.71 + 0.31 \times 0.64 = 0.0110 + 0.4544 + 0.1984 = 0.6638.
Our third entry is
0.05 \times 0.48 + 0.64 \times 0.15 + 0.31 \times 0.31 = 0.0240 + 0.0960 + 0.0961 = 0.2161.
That is,
\begin{bmatrix} 0.05 & 0.64 & 0.31 \end{bmatrix} \begin{bmatrix} 0.30 & 0.22 & 0.48 \\ 0.14 & 0.71 & 0.15 \\ 0.05 & 0.64 & 0.31 \end{bmatrix} = \begin{bmatrix} 0.1201 & 0.6638 & 0.2161 \end{bmatrix}
Important observations:
- If we multiply a 1 \times n state vector by an n \times n stochastic matrix, we get another 1 \times n state vector—the dimensions do not change.
- The entries in our state vector must sum to one.
- The entries of each row in our stochastic matrix must sum to one.
- The entries in the resulting state vector also sum to one. In the example above, 0.1201 + 0.6638 + 0.2161 = 1.0000.
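These observations can be checked mechanically. The following is a minimal sketch, assuming vectors are Python lists and matrices are lists of rows; the names is_distribution and is_stochastic are hypothetical helpers, not part of the original material. It uses math.isclose because sums of floating-point numbers are rarely exactly 1.0.

```python
import math


def is_distribution(v):
    """True if every entry is a probability and the entries sum to one."""
    return all(0.0 <= x <= 1.0 for x in v) and math.isclose(sum(v), 1.0)


def is_stochastic(m):
    """True if m is square and every row is a probability distribution."""
    return all(len(row) == len(m) and is_distribution(row) for row in m)


# Egbert's stochastic matrix from the example above.
P = [[0.30, 0.22, 0.48],
     [0.14, 0.71, 0.15],
     [0.05, 0.64, 0.31]]

print(is_stochastic(P))                           # True
print(is_distribution([0.1201, 0.6638, 0.2161]))  # True
print(is_distribution([0.5, 0.6]))                # False
```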
We can repeat this calculation as many times as we like to compute the probabilities at time t+n for any positive integer n.
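As a rough sketch of that repetition, the loop below applies the multiplication n times. The helper names vec_mat_mul and step_n are assumptions made for illustration, and the matrix is the one from Egbert's example.

```python
def vec_mat_mul(v, m):
    """Multiply a row vector v by a matrix m given as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def step_n(v, p, n):
    """Return the state vector after n steps, starting from state vector v."""
    for _ in range(n):
        v = vec_mat_mul(v, p)
    return v


P = [[0.30, 0.22, 0.48],
     [0.14, 0.71, 0.15],
     [0.05, 0.64, 0.31]]

start = [0.0, 0.0, 1.0]  # Egbert is listening to avant-garde at time t
two_steps = step_n(start, P, 2)
print([round(x, 4) for x in two_steps])  # [0.1201, 0.6638, 0.2161]
```

Rounding is applied only for display; the unrounded values carry the usual small floating-point error.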
Long-run behavior of Markov chains
We can repeat the vector-by-matrix multiplication as many times as we like, computing the probabilities at time t+1, t+2, t+3, and so on. What happens as we keep going? It depends on the chain.
Stationary distributions
For many Markov chains, including Egbert’s listening habits above, repeated multiplication settles down: the state vector approaches a vector that multiplying by the transition matrix (the stochastic matrix we have been using) no longer changes. In practice, after enough steps the entries stop changing beyond rounding. We call this limiting vector a stationary distribution, also known as a steady state. If v is a stationary state vector and P is the transition matrix, then
vP = v.
What’s remarkable is that, for a chain like Egbert’s—where every transition probability is strictly positive—the chain converges to the same stationary distribution no matter which state we start from. The stationary distribution describes the long-run behavior of the chain, regardless of how it began.
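One way to see this convergence is to keep multiplying until the state vector stops changing, and to try it from each possible starting state. The sketch below does that for Egbert's matrix; the function name iterate_to_steady_state, the tolerance, and the step limit are illustrative assumptions. Repeated multiplication is only one way to find a stationary distribution, chosen here because it follows the text directly.

```python
def vec_mat_mul(v, m):
    """Multiply a row vector v by a matrix m given as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def iterate_to_steady_state(v, p, tol=1e-12, max_steps=10_000):
    """Multiply v by p repeatedly until no entry changes by more than tol."""
    for _ in range(max_steps):
        nxt = vec_mat_mul(v, p)
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    raise RuntimeError("state vector did not settle within max_steps")


P = [[0.30, 0.22, 0.48],
     [0.14, 0.71, 0.15],
     [0.05, 0.64, 0.31]]

# Start from certainty in each of the three states in turn.
starts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
limits = [iterate_to_steady_state(s, P) for s in starts]
for lim in limits:
    print([round(x, 4) for x in lim])
```

If the chain behaves as described above, the three printed vectors should agree, and multiplying any of them by P once more should leave it unchanged up to rounding.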
Periodic Markov chains
Not every Markov chain converges to a stationary distribution. Consider a two-state chain with states “day” and “night,” in which each state transitions to the other with probability 1.0 (and never to itself). Starting in “day,” the chain visits day, night, day, night, \ldots forever, so the state vector alternates between \begin{bmatrix} 1 & 0 \end{bmatrix} and \begin{bmatrix} 0 & 1 \end{bmatrix} and never settles down. (The vector \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} does satisfy vP = v, but a chain started in a single state never approaches it.) We say a state is periodic if a return to that state is only possible after a number of steps that is a multiple of some integer greater than one. In the day/night example, a return to “day” always takes an even number of steps.
Notice that a state with a self-loop (a nonzero probability of transitioning to itself) cannot be periodic, since it can always return to itself in a single step.
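The day/night chain is small enough to trace directly. In this sketch the state order (day, night) and the helper name vec_mat_mul are assumptions made for illustration.

```python
def vec_mat_mul(v, m):
    """Multiply a row vector v by a matrix m given as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


# Rows and columns are ordered (day, night); each state always moves to the other.
P = [[0.0, 1.0],
     [1.0, 0.0]]

v = [1.0, 0.0]  # start in "day"
history = [v]
for _ in range(4):
    v = vec_mat_mul(v, P)
    history.append(v)

print(history)
# [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

# An even split is left unchanged by P, but the chain above never reaches it.
print(vec_mat_mul([0.5, 0.5], P))  # [0.5, 0.5]
```

The state vector flips back and forth with period two, which is why repeated multiplication never settles for this chain.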
Absorbing Markov chains
Some Markov chains have one or more states that, once entered, can never be left: a state with a self-loop of probability 1.0 and no other outgoing transitions. We call such a state an absorbing state, and a chain containing one or more absorbing states an absorbing Markov chain. Once an absorbing chain enters an absorbing state, it stays there forever. So, provided an absorbing state can be reached from every other state, the long-run behavior of the chain is simply to end up in one of its absorbing states.
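To illustrate, here is a sketch using a made-up three-state chain in which the third state is absorbing. The matrix values and the helper names vec_mat_mul and absorbing_states are hypothetical and chosen only for this example.

```python
def vec_mat_mul(v, m):
    """Multiply a row vector v by a matrix m given as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def absorbing_states(p):
    """Indices of states whose self-loop has probability 1.0."""
    return [i for i, row in enumerate(p) if row[i] == 1.0]


# Hypothetical chain: states 0 and 1 can move around, state 2 is never left.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.0, 0.0, 1.0]]

print(absorbing_states(P))  # [2]

v = [1.0, 0.0, 0.0]  # start in state 0
for _ in range(200):
    v = vec_mat_mul(v, P)

print([round(x, 6) for x in v])  # [0.0, 0.0, 1.0]
```

After enough steps, essentially all of the probability sits on the absorbing state, because each step moves some probability from states 0 and 1 into state 2 and none ever comes back.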
Copyright © 2023–2026 Clayton Cafiero
No generative AI was used in producing drafts of this material. This was written the old-fashioned way. AI was used to rewrite existing pseudocode in LaTeX to produce standalone *.tex files for rendering, and for revisions toward satisfying WCAG 2.1 AA-level accessibility standards as required by UVM policy. AI may also have been used to proofread selected human-written prose. Claude 2.1 with model Sonnet 4.6. Revisions, if any, were performed by the author. AI was not used in generating any code whatsoever. All code samples and starter code are by the author only.