Understanding Markov Chains#
Markov chains were introduced by Andrey Markov (1856 - 1922) during the early 20th century. These mathematical models have been extensively used in diverse fields such as finance, physics, biology, engineering, and computer science. Markov chains are stochastic processes that model the probability of moving from one state to another in discrete time steps. The model assumes that the probability of transitioning from one state to another solely depends on the current state and not on any past events. Markov chains are especially valuable for modeling complex systems that involve uncertainty and randomness.
Discrete-time Markov chains#
A discrete-time Markov chain is a sequence of random variables \(X_{1}\), \(X_{2}\), \(X_{3}\), …, \(X_{n}\) that have the Markov property, i.e., the probability of moving to the next state depends only on the present state and not on the previous states:

$$P\left(X_{n+1}=x~|~X_{1}=x_{1},X_{2}=x_{2},\dots,X_{n}=x_{n}\right) = P\left(X_{n+1}=x~|~X_{n}=x_{n}\right)$$
where states refer to a finite set of discrete values in which the system can exist. If the state space is finite, the transition probability distribution, i.e., the probability of moving from state \(i\rightarrow{j}\), can be encoded in the transition matrix \(\mathbf{P}\). The elements of \(\mathbf{P}\), denoted \(p_{ij}\), encode the probability of moving from state \(i\rightarrow{j}\) during the next time step:

$$p_{ij} = P\left(X_{n+1}=j~|~X_{n}=i\right)$$
The transition matrix \(\mathbf{P}\) has interesting properties:
First, the rows of the transition matrix \(\mathbf{P}\) must sum to unity: each row encodes the probability of every possible outcome from that state, so those probabilities must sum to one.
Second, if the transition matrix \(\mathbf{P}\) is time-invariant, then \(\mathbf{P}\) is the same at each step, i.e., \(p_{ij}\) doesn’t change as \(n\rightarrow{n+1}~\forall{n}\).
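To make these two properties concrete, here is a minimal sketch that checks them numerically. The three-state transition matrix below is a hypothetical example; its values are illustrative and are not taken from the lecture:

# hypothetical three-state transition matrix (illustrative values only) -
P = [
    0.7 0.2 0.1;
    0.3 0.5 0.2;
    0.0 0.4 0.6;
];

# each row is a probability distribution over the next state, so it must sum to unity -
row_sums = sum(P, dims = 2)
@assert all(isapprox.(row_sums, 1.0; atol = 1e-10)) "rows of P must sum to one"

# all entries are probabilities, so they must be non-negative -
@assert all(P .>= 0.0) "entries of P must be non-negative"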
Putting these ideas together gives (Definition 13):
(Time-invariant state transition)
A Markov chain has a finite state set \(\mathcal{S}\) and a time-invariant state transition matrix \(\mathbf{P}\). Let the state vector at time \(j\) be given by \(\mathbf{\pi}_{j}\), where \(\pi_{s,j}\geq{0},\forall{s}\in\mathcal{S}\) and:

$$\sum_{s\in\mathcal{S}}\pi_{s,j} = 1$$
Then, the state of the Markov chain at time step \(n+1\) is given by:

$$\mathbf{\pi}_{n+1} = \mathbf{\pi}_{n}\cdot\mathbf{P}$$

where \(\mathbf{\pi}_{n}\) denotes the system state (row) vector at time step \(n\).
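As a minimal sketch of the update rule (the initial state vector below is an assumption chosen for illustration; the transition matrix is the one used in Example 1), the state vector can be propagated forward by repeated right-multiplication with \(\mathbf{P}\):

# two-state transition matrix (same values as Example 1 below) -
P = [
    0.9 0.1;
    0.6 0.4;
];

# hypothetical initial condition: start in state 1 (row vector) -
π₀ = [1.0 0.0];

# apply the update rule: after n steps, πₙ = π₀⋅Pⁿ -
π₁ = π₀ * P        # state distribution after one step
π₁₀ = π₀ * P^10    # state distribution after ten steps
println("After 1 step: $(π₁). After 10 steps: $(π₁₀)")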
Finally, suppose that a Markov chain is time-invariant, non-periodic, and irreducible (every state can eventually be reached from every other state). Then, there exists a unique stationary distribution \(\pi\) such that \(\mathbf{P}^{k}\) converges to a rank-one matrix in which each row is the stationary distribution \(\pi\):

$$\lim_{k\rightarrow\infty}\mathbf{P}^{k} = \mathbf{1}\pi$$

where \(\mathbf{1}\) is a column vector of all 1's and \(\pi\) is a row vector. Let's consider an example to make these ideas less abstract (Example 1):
(Discrete Markov chain stationary distribution)
Consider the time-invariant two-state discrete Markov chain with state transition matrix \(\mathbf{P}\):

$$\mathbf{P} = \begin{bmatrix} 0.9 & 0.1 \\ 0.6 & 0.4 \end{bmatrix}$$
which is shown in (Fig. 11). This transition matrix admits a stationary (non-periodic) solution: as the number of iterations \(n\) becomes large, the system state converges to the stationary distribution \(\pi\). Thus, regardless of the starting state of this Markov chain, its long-term behavior is governed by the stationary distribution \(\pi\).
Develop a script to compute the stationary distribution \(\pi\):
# load package -
using LinearAlgebra
"""
iterate(P::Array{Float64,2}, counter::Int;
maxcount::Int = 100, ϵ::Float64 = 0.1) -> Array{Float64,2}
Recursively computes a stationary distribution.
Computation stops if ||P_new - P|| < ϵ or the max number of iterations is hit.
"""
function iterate(P::Array{Float64,2}, counter::Int; maxcount::Int = 100, ϵ::Float64 = 0.1)::Array{Float64,2}
# base case -
if (counter == maxcount)
return P
else
# generate a new P -
P_new = P^(counter+1)
err = P_new - P;
if (norm(err)<=ϵ)
return P_new
else
# we have NOT hit the error target, or the max iterations
iterate(P_new, (counter+1), maxcount=maxcount, ϵ = ϵ)
end
end
end
# Setup the transition probability matrix -
P = [
0.9 0.1;
0.6 0.4;
];
# compute -
counter = 1
P_stationary = iterate(P, counter; maxcount = 10, ϵ = 0.001)
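As a cross-check (this eigenvector calculation is not part of the lecture script above; it is an alternative sketch), the stationary distribution can also be recovered as the normalized left eigenvector of \(\mathbf{P}\) associated with the eigenvalue \(\lambda = 1\), i.e., the solution of \(\pi\cdot\mathbf{P} = \pi\):

# the left eigenvectors of P are the right eigenvectors of transpose(P) -
F = eigen(Matrix(transpose(P)));

# grab the eigenvector for the largest eigenvalue (λ = 1 for a stochastic matrix) -
idx = argmax(real.(F.values));
v = real.(F.vectors[:, idx]);

# normalize so the entries sum to one -
π_stationary = v ./ sum(v);
println("Stationary distribution: $(π_stationary)")   # ≈ [0.857, 0.143]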
Sampling the stationary distribution#
Once the stationary distribution \(\pi\) of the Markov chain has been estimated, we can use it to generate samples from that chain. For example, the stationary distribution for the two-state Markov chain in Example 1 is given by:

$$\pi = \begin{bmatrix} 0.857 & 0.143 \end{bmatrix}$$
We model this system as a categorical distribution. A categorical distribution is a discrete probability distribution that models the probability of a random variable taking on one of a finite set of \(K\) possible outcomes, where outcome \(i\) occurs with probability \(p_{i}\):

$$P(X = i) = p_{i},\quad i = 1,\dots,K,\qquad\sum_{i=1}^{K}p_{i} = 1$$
Sampling a categorical distribution constructed from the stationary distribution \(\pi\) produces samples whose empirical frequencies recover the stationary distribution (Example 2):
(Categorical distribution)
Sample a categorical distribution constructed using the stationary distribution:
Generate \(N = 1000\) samples and compute the fraction of state 1 and state 2.
# include -
include("Include.jl")
# setup -
π = [ 0.857, 0.143];
number_of_samples = 1000;
# build a categorical distribution
d = Categorical(π);
# sample -
samples = rand(d, number_of_samples)
# how many 1's -
number_of_1 = findall(x-> x == 1, samples) |> length;
number_of_2 = findall(x-> x == 2, samples) |> length;
# println -
println("Fraction of state 1: $(number_of_1/number_of_samples) and state 2: $(number_of_2/number_of_samples)")
Summary#
In this lecture, we discussed Markov models:
Discrete-time Markov chains are mathematical models that describe the behavior of a system that transitions between a finite set of states at discrete time steps. The evolution of the system depends only on its current state and not on its previous history.
Hidden Markov Models (HMMs) are probabilistic models that describe a system in which the observations are generated by an underlying Markov process whose states are not directly observable. The goal is to infer the hidden states given only the observations.