Understanding Markov Chains#

Markov chains were introduced by Andrey Markov (1856–1922) during the early 20th century. These mathematical models have been used extensively in diverse fields such as finance, physics, biology, engineering, and computer science. Markov chains are stochastic processes that model the probability of moving from one state to another in discrete time steps. The model assumes that the probability of transitioning from one state to another depends solely on the current state and not on any past events. Markov chains are especially valuable for modeling complex systems that involve uncertainty and randomness.


Discrete-time Markov chains#


Fig. 11 Schematic of a discrete two-state time-invariant Markov model; \(p_{ij}\) denotes the time-invariant transition probability from state \(i\) to state \(j\).#

A discrete-time Markov chain is a sequence of random variables \(X_{1}\), \(X_{2}\), \(X_{3}\), …, \(X_{n}\) that have the Markov property, i.e., the probability of moving to the next state depends only on the present state and not on the previous states:

(21)#\[P(X_{n+1} = x | X_{1}=x_{1}, \dots, X_{n}=x_{n}) = P(X_{n+1} = x | X_{n}=x_{n})\]

where states refer to a finite set of discrete values in which the system can exist. If the state space is finite, the transition probability distribution, i.e., the probability of moving from state \(i\rightarrow{j}\), can be encoded in the transition matrix \(\mathbf{P}\). Elements of \(\mathbf{P}\), denoted \(p_{ij}\), encode the probability of moving from state \(i\rightarrow{j}\) during the next time step:

(22)#\[p_{ij} = P(X_{n+1}~=~j~|~X_{n}~=~i)\]

The transition matrix \(\mathbf{P}\) has interesting properties:

  • First, the rows of the transition matrix \(\mathbf{P}\) must sum to unity, i.e., each row encodes the probability distribution over all possible next states, so its entries must sum to one.

  • Second, if the transition matrix \(\mathbf{P}\) is time-invariant, then \(\mathbf{P}\) is the same at each step, i.e., \(p_{ij}\) doesn’t change as \(n\rightarrow{n+1}~\forall{n}\).
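
To make the row-sum property concrete, here is a minimal sketch in Python using NumPy. The two-state matrix \(\mathbf{P}\) below is hypothetical, with illustrative values only:

```python
import numpy as np

# Hypothetical two-state transition matrix (illustrative values only)
P = np.array([
    [0.9, 0.1],   # from state 1: stay with prob 0.9, move to state 2 with prob 0.1
    [0.6, 0.4],   # from state 2: move to state 1 with prob 0.6, stay with prob 0.4
])

# Each row is a probability distribution over the next states, so it sums to one
assert np.allclose(P.sum(axis=1), 1.0)
```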

Putting these ideas together gives (Definition 13):

Definition 13 (Time-invariant state transition)

A Markov chain has a finite state set \(\mathcal{S}\) and a time-invariant state transition matrix \(\mathbf{P}\). Let the state vector at time \(j\) be given by \(\mathbf{\pi}_{j}\), where \(\pi_{s,j}\geq{0},\forall{s}\in\mathcal{S}\) and:

\[\sum_{s\in\mathcal{S}}\pi_{s,j} = 1\qquad\forall{j}\]

Then, the state of the Markov chain at time step \(n+1\) is given by:

\[\mathbf{\pi}_{n+1} = \mathbf{\pi}_{n}\mathbf{P}\]

where \(\mathbf{\pi}_{n}\) denotes the system state vector at time step \(n\).
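
As a minimal sketch of this update rule, reusing the hypothetical two-state matrix from above:

```python
import numpy as np

# Same hypothetical two-state transition matrix as above
P = np.array([[0.9, 0.1],
              [0.6, 0.4]])

pi_n = np.array([1.0, 0.0])   # all probability mass initially in state 1

# One step of the update rule: pi_{n+1} = pi_n P (row vector times matrix)
pi_next = pi_n @ P
print(pi_next)                # -> [0.9 0.1]
```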

Finally, suppose that a Markov chain is time-invariant, irreducible, and aperiodic. Then, there exists a unique stationary distribution \(\pi\) such that \(\mathbf{P}^{k}\) converges to a rank-one matrix in which each row is the stationary distribution \(\pi\):

\[\lim_{k\rightarrow\infty} \mathbf{P}^{k} = \mathbf{1}\pi\]

where \(\mathbf{1}\) is a column vector of all 1’s. Let’s consider an example to make these ideas less abstract (Example 1):
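
As a quick numerical check of this limit (separate from Example 1), we can raise the hypothetical two-state matrix from above to a large power and watch every row converge to the same stationary distribution:

```python
import numpy as np

# Same hypothetical two-state transition matrix as above
P = np.array([[0.9, 0.1],
              [0.6, 0.4]])

# Raise P to a large power to approximate the limit numerically
Pk = np.linalg.matrix_power(P, 50)
print(Pk)
# Every row converges to the same stationary distribution,
# approximately [0.857, 0.143] for this choice of P
```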

Sampling the stationary distribution#

Once the stationary distribution \(\pi\) of the Markov chain has been estimated, we can use it to generate samples from that chain. For example, the stationary distribution for the two-state Markov chain in Example 1 is given by:

\[\pi = (0.857,0.143)\]

We model this system as a Categorical distribution. A categorical distribution is a discrete probability distribution that models the probability of a random variable taking on one of a finite set of possible outcomes:

(23)#\[P(X = k) = \pi\left[k\right]\]

Sampling a categorical distribution constructed from the stationary distribution \(\pi\) returns samples whose empirical frequencies converge to the stationary distribution (Example 2):
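
As a minimal sketch of this idea (using NumPy's random generator to draw categorical samples; the seed and sample size are arbitrary):

```python
import numpy as np

pi = np.array([0.857, 0.143])        # stationary distribution from above

rng = np.random.default_rng(42)      # seeded generator for reproducibility
samples = rng.choice(len(pi), size=100_000, p=pi)   # categorical draws

# Empirical frequencies of the sampled states converge to pi
freqs = np.bincount(samples, minlength=len(pi)) / samples.size
print(freqs)                         # -> approximately [0.857 0.143]
```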

Hidden Markov Models (HMMs)#

Hidden Markov models (HMMs) are statistical models in which the system being modeled is assumed to be a Markov process with unobservable states \(s\in\mathcal{S}\) but observable outcomes \(o\in\mathcal{O}\). HMMs have the same structural components as a standard Markov chain model. However, each hidden state can be thought of as emitting an observable signal according to an emission probability.


Fig. 12 Schematic of a discrete two-state time-invariant hidden Markov model (HMM); \(p_{ij}\) denotes the time-invariant transition probability from state \(i\) to state \(j\), while \(t_{ij}\) denotes the emission probability for state \(i\) and observation \(j\).#

The emission probability refers to the likelihood of observing a particular output \(Y = o_{t}\), given the current state of the Markov chain \(X = s_{t}\):

(24)#\[P(Y = o_{t} | X = s_{t})\]

Similar to the transition probabilities, the emission probabilities for each state must sum to unity:

\[\sum_{o\in\mathcal{O}} P(Y = o | X = s) = 1\qquad\forall{s\in\mathcal{S}}\]

The emission probability plays a crucial role in HMMs, as it is used to calculate the likelihood of a sequence of observed symbols given a sequence of hidden states. This likelihood is used in various applications, including speech recognition, natural language processing, and bioinformatics. The emission probabilities can be estimated using different methods, including maximum likelihood estimation or Bayesian inference.
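
To make these pieces concrete, here is a minimal forward-simulation sketch of an HMM. The transition matrix \(\mathbf{P}\) and emission matrix \(\mathbf{E}\) below are hypothetical, not the parameters of Example 3:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical two-state, two-symbol HMM (all values illustrative only)
P = np.array([[0.9, 0.1],    # hidden-state transition probabilities p_ij
              [0.6, 0.4]])
E = np.array([[0.8, 0.2],    # emission probabilities t_ij; each row sums to one
              [0.3, 0.7]])

def simulate_hmm(n_steps, s=0):
    """Sample a hidden-state path and the observable symbols it emits."""
    states, observations = [], []
    for _ in range(n_steps):
        o = rng.choice(E.shape[1], p=E[s])   # emit a symbol from hidden state s
        states.append(s)
        observations.append(o)
        s = rng.choice(P.shape[1], p=P[s])   # move to the next hidden state
    return states, observations

states, obs = simulate_hmm(10)
print(states)   # hidden path (unobservable in practice)
print(obs)      # observed symbols
```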

Let’s build upon Example 1 and construct an HMM that mimics a trinomial lattice model of Boyle (Example 3):


Summary#

In this lecture, we discussed Markov models:

  • Discrete-time Markov chains are mathematical models that describe the behavior of a system that transitions between a finite set of states at discrete time steps. The evolution of the system depends only on its current state and not on its previous history.

  • Hidden Markov Models (HMMs) are probabilistic models that describe a system in which the observations are generated by an underlying Markov process whose states are not directly observable. The goal is to infer the hidden states given only the observations.