Understanding Markov Chains#

Markov chains were introduced by Andrey Markov (1856–1922) during the early 20th century. These mathematical models have been used extensively in diverse fields such as finance, physics, biology, engineering, and computer science. Markov chains are stochastic processes that model the probability of moving from one state to another in discrete time steps. The model assumes that the probability of transitioning from one state to another depends solely on the current state and not on any past events. Markov chains are especially valuable for modeling complex systems that involve uncertainty and randomness.


Discrete-time Markov chains#


Fig. 11 Schematic of a discrete two-state time-invariant Markov model; \(p_{ij}\) denotes the time-invariant transition probability from state \(i\) to state \(j\).#

A discrete-time Markov chain is a sequence of random variables \(X_{1}\), \(X_{2}\), \(X_{3}\), …, \(X_{n}\) that have the Markov property, i.e., the probability of moving to the next state depends only on the present state and not on the previous states:

(21)#\[P(X_{n+1} = x | X_{1}=x_{1}, \dots, X_{n}=x_{n}) = P(X_{n+1} = x | X_{n}=x_{n})\]

where states refer to a finite set of discrete values in which the system can exist. If the state space is finite, the transition probability distribution, i.e., the probability of moving from state \(i\rightarrow{j}\), can be encoded in the transition matrix \(\mathbf{P}\). Elements of \(\mathbf{P}\), denoted \(p_{ij}\), encode the probability of moving from state \(i\rightarrow{j}\) during the next time step:

(22)#\[p_{ij} = P(X_{n+1}~=~j~|~X_{n}~=~i)\]

The transition matrix \(\mathbf{P}\) has interesting properties:

  • First, the rows of the transition matrix \(\mathbf{P}\) must sum to unity, i.e., each row encodes the probability distribution over all possible next states, so its entries must sum to one.

  • Second, if the transition matrix \(\mathbf{P}\) is time-invariant, then \(\mathbf{P}\) is the same at each step, i.e., \(p_{ij}\) doesn’t change as \(n\rightarrow{n+1}~\forall{n}\).
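
To make the row-sum property concrete, here is a minimal sketch in Python using NumPy. The two-state matrix \(\mathbf{P}\) below is hypothetical, with illustrative values only:

```python
import numpy as np

# Hypothetical two-state transition matrix (illustrative values only)
P = np.array([
    [0.9, 0.1],   # from state 1: stay with prob 0.9, move to state 2 with prob 0.1
    [0.6, 0.4],   # from state 2: move to state 1 with prob 0.6, stay with prob 0.4
])

# Each row is a probability distribution over the next states, so it sums to one
assert np.allclose(P.sum(axis=1), 1.0)
```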

Putting these ideas together gives (Definition 13):

Definition 13 (Time-invariant state transition)

A Markov chain has a finite state set \(\mathcal{S}\) and a time-invariant state transition matrix \(\mathbf{P}\). Let the state vector at time \(j\) be given by \(\mathbf{\pi}_{j}\), where \(\pi_{s,j}\geq{0},\forall{s}\in\mathcal{S}\) and:

\[\sum_{s\in\mathcal{S}}\pi_{s,j} = 1\qquad\forall{j}\]

Then, the state of the Markov chain at time step \(n+1\) is given by:

\[\mathbf{\pi}_{n+1} = \mathbf{\pi}_{n}\mathbf{P}\]

where \(\mathbf{\pi}_{n}\) denotes the system state vector at time step \(n\).
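
As a minimal sketch of this update rule, reusing the hypothetical two-state matrix from above:

```python
import numpy as np

# Same hypothetical two-state transition matrix as above
P = np.array([[0.9, 0.1],
              [0.6, 0.4]])

pi_n = np.array([1.0, 0.0])   # all probability mass initially in state 1

# One step of the update rule: pi_{n+1} = pi_n P (row vector times matrix)
pi_next = pi_n @ P
print(pi_next)                # -> [0.9 0.1]
```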

Finally, suppose that a Markov chain is time-invariant, irreducible, and aperiodic. Then, there exists a unique stationary distribution \(\pi\) such that \(\mathbf{P}^{k}\) converges to a rank-one matrix in which each row is the stationary distribution \(\pi\):

\[\lim_{k\rightarrow\infty} \mathbf{P}^{k} = \mathbf{1}\pi\]

where \(\mathbf{1}\) is a column vector of all 1’s. Let’s consider an example to make these ideas less abstract (Example 1):
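
As a quick numerical check of this limit (separate from Example 1), we can raise the hypothetical two-state matrix from above to a large power and watch every row converge to the same stationary distribution:

```python
import numpy as np

# Same hypothetical two-state transition matrix as above
P = np.array([[0.9, 0.1],
              [0.6, 0.4]])

# Raise P to a large power to approximate the limit numerically
Pk = np.linalg.matrix_power(P, 50)
print(Pk)
# Every row converges to the same stationary distribution,
# approximately [0.857, 0.143] for this choice of P
```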

Sampling the stationary distribution#

Once the stationary distribution \(\pi\) of the Markov chain has been estimated, we can use it to generate samples from that chain. For example, the stationary distribution for the two-state Markov chain in Example 1 is given by:

\[\pi = (0.857,0.143)\]

We model this system as a Categorical distribution. A categorical distribution is a discrete probability distribution that models the probability of a random variable taking on one of a finite set of possible outcomes:

(23)#\[P(X = k) = \pi\left[k\right]\]

Sampling a categorical distribution constructed from the stationary distribution \(\pi\) returns samples whose empirical frequencies converge to the stationary distribution (Example 2):
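
As a minimal sketch of this idea (using NumPy's random generator to draw categorical samples; the seed and sample size are arbitrary):

```python
import numpy as np

pi = np.array([0.857, 0.143])        # stationary distribution from above

rng = np.random.default_rng(42)      # seeded generator for reproducibility
samples = rng.choice(len(pi), size=100_000, p=pi)   # categorical draws

# Empirical frequencies of the sampled states converge to pi
freqs = np.bincount(samples, minlength=len(pi)) / samples.size
print(freqs)                         # -> approximately [0.857 0.143]
```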

Hidden Markov Models (HMMs)#

Hidden Markov models (HMMs) are statistical models in which the system being modeled is assumed to be a Markov process with unobservable states \(s\in\mathcal{S}\) but observable outcomes \(o\in\mathcal{O}\). HMMs have the same structural components as a standard Markov chain model. However, each hidden state can be thought of as emitting an observable signal according to an emission probability.


Fig. 12 Schematic of a discrete two-state time-invariant hidden Markov model (HMM); \(p_{ij}\) denotes the time-invariant transition probability from state \(i\) to state \(j\), while \(t_{ij}\) denotes the emission probability for state \(i\) and observation \(j\).#

The emission probability refers to the likelihood of observing a particular output \(Y = o_{t}\), given the current state of the Markov chain \(X = s_{t}\):

(24)#\[P(Y = o_{t} | X = s_{t})\]

Similar to the transition probabilities, the emission probabilities for each state must sum to unity:

\[\sum_{o\in\mathcal{O}} P(Y = o | X = s) = 1\qquad\forall{s\in\mathcal{S}}\]

The emission probability plays a crucial role in HMMs, as it is used to calculate the likelihood of a sequence of observed symbols given a sequence of hidden states. This likelihood is used in various applications, including speech recognition, natural language processing, and bioinformatics. The emission probabilities can be estimated using different methods, including maximum likelihood estimation or Bayesian inference.
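
To make these pieces concrete, here is a minimal forward-simulation sketch of an HMM. The transition matrix \(\mathbf{P}\) and emission matrix \(\mathbf{E}\) below are hypothetical, not the parameters of Example 3:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical two-state, two-symbol HMM (all values illustrative only)
P = np.array([[0.9, 0.1],    # hidden-state transition probabilities p_ij
              [0.6, 0.4]])
E = np.array([[0.8, 0.2],    # emission probabilities t_ij; each row sums to one
              [0.3, 0.7]])

def simulate_hmm(n_steps, s=0):
    """Sample a hidden-state path and the observable symbols it emits."""
    states, observations = [], []
    for _ in range(n_steps):
        o = rng.choice(E.shape[1], p=E[s])   # emit a symbol from hidden state s
        states.append(s)
        observations.append(o)
        s = rng.choice(P.shape[1], p=P[s])   # move to the next hidden state
    return states, observations

states, obs = simulate_hmm(10)
print(states)   # hidden path (unobservable in practice)
print(obs)      # observed symbols
```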

Let’s build upon Example 1 and construct an HMM that mimics a trinomial lattice model of Boyle (Example 3):


Summary#

In this lecture, we discussed Markov models:

  • Discrete-time Markov chains are mathematical models that describe the behavior of a system that transitions between a finite set of states at discrete time steps. The evolution of the system depends only on its current state and not on its previous history.

  • Hidden Markov Models (HMMs) are probabilistic models that describe a system in which the observations are generated by an underlying Markov process whose states are not directly observable. The goal is to infer the hidden states given only the observations.