Random variables and probability#
Random variables and probability are central concepts in understanding uncertainty. Random variables represent quantities that can take on different values with a particular probability distribution. They are used to model uncertainty and randomness in various fields, including finance, physics, and engineering. Probability theory provides a framework for understanding and quantifying the likelihood of different events occurring based on the underlying distribution of the random variables involved.
A random variable is a variable \(X\) that takes on different values \(x\) according to the outcome of a random event or process. There are two types of random variables: discrete random variables and continuous random variables. Discrete random variables can take on a countable number of distinct values, while continuous random variables can take on any value in a continuous range.
Probability measures the likelihood that a particular event or outcome will occur and is commonly used to quantify uncertainty in various fields, such as science, engineering, economics, and finance. For a discrete random variable, the likelihood that \(X=x\) is described by a Probability Mass Function (PMF); for a continuous random variable, it is described by a Probability Density Function (PDF).
Probability spaces#
Frequentists argue that probability is the relative frequency or propensity of a particular outcome in the set of all possible outcomes. On the other hand, Bayesians argue that probability is a subjective belief. The context of your problem will typically suggest which perspective to use. For example, when you have a shortage of data, a Bayesian approach allows you to use prior knowledge or belief. On the other hand, when in a data-rich environment, a frequentist approach calculates the probability of an event directly from the data, along with confidence intervals on your estimate.
Whether you prefer the frequentist or the Bayesian view, there is a more fundamental notion of probability, thanks to Andrey Kolmogorov: probability is a measure of the size of a set (Definition 19):
(Probability space)
A probability space is described by the tuple of objects \((\Omega,\mathcal{F},P)\):
The sample space \(\Omega\) holds the set of all possible outcomes from an experiment.
The event space \(\mathcal{F}\) is the collection of all possible events. An event \(E\) is a subset in \(\Omega\) that defines an outcome or a combination of outcomes.
The probability law \(P\) is a mapping from an event \(E\) to a number \(P(E)\) which measures the size of the event \(E\).
The sample space and the event space are based on statements, for example, getting a head when flipping a coin, winning the game, or drawing a particular card. These statements are not numbers, so how do we convert a statement into a number? The answer is a random variable: a random variable maps outcomes in the sample space to numbers, and the probability law then assigns a probability to each event.
Sample space \(\Omega\)#
Given an experiment, the sample space \(\Omega\) is the set containing all possible outcomes of that experiment. These outcomes can be numbers, letters, vectors, or functions, as well as images, videos, EEG signals, audio recordings, etc.
Let’s consider the example of a six-sided die (Example 4):
(Sample space \(\Omega\) of a six-sided die)
Suppose we were interested in the outcome of an experiment where a six-sided die was rolled one time. Then the sample space \(\Omega\) for this experiment is given by:
\[\Omega = \left\{1, 2, 3, 4, 5, 6\right\}\]
The cardinality of this sample space is \(\dim\left(\Omega\right) = 6\).
Event space \(\mathcal{F}\)#
The sample space \(\Omega\) contains all the possible outcomes of an experiment. However, we may not be interested in an individual outcome. Rather, we may be interested in combinations of individual outcomes whose elements share some common trait, e.g., the even integers or the collection of face cards. These subsets are called events \(E\subseteq\Omega\), and the set of all possible events, denoted \(\mathcal{F}\), is called the event space. Thus, the event space \(\mathcal{F}\) is a particular set of sets: it is the set of all possible subsets of \(\Omega\).
Let’s enumerate the event space for the four suits of a typical deck of cards (Example 5):
(Enumerate the event space \(\mathcal{F}\))
Construct the event space \(\mathcal{F}\) for the sample space \(\Omega=\left\{\clubsuit, \diamondsuit, \heartsuit, \spadesuit\right\}\). The cardinality of \(\Omega\) is \(\dim\left(\Omega\right) = 4\).
Solution: The event space has \(\dim(\mathcal{F}) = 2^{n}\) elements, where \(n = \dim(\Omega)\). Each element of \(\mathcal{F}\) corresponds to the first \(n\) digits of the binary representation of an integer \(i=0,\dots,\dim(\mathcal{F})-1\), where each bit indicates whether a particular suit is included. For example, the bitstring 0000, which corresponds to \(i=0\), represents the empty set \(\emptyset\), while the bitstring 1111, which corresponds to \(i=15\), corresponds to \(\left\{\clubsuit, \diamondsuit, \heartsuit, \spadesuit\right\}\).
| index | bitstring | \(x\in\mathcal{F}\) |
|---|---|---|
| 0 | 0000 | \(\emptyset\) |
| 1 | 0001 | \(\left\{\spadesuit\right\}\) |
| 2 | 0010 | \(\left\{\heartsuit\right\}\) |
| 3 | 0011 | \(\left\{\heartsuit, \spadesuit\right\}\) |
| … | … | … |
| 9 | 1001 | \(\left\{\clubsuit, \spadesuit\right\}\) |
| … | … | … |
| 14 | 1110 | \(\left\{\clubsuit, \diamondsuit, \heartsuit\right\}\) |
| 15 | 1111 | \(\left\{\clubsuit, \diamondsuit, \heartsuit, \spadesuit\right\}\) |
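To make this construction concrete, here is a minimal sketch in Julia that enumerates the event space using the bitstring scheme described above; the variable names are illustrative.

# enumerate the event space F for the four-suit sample space using bitstrings
Ω = ["♣", "♦", "♥", "♠"];                     # sample space (the four suits)
n = length(Ω);                                 # cardinality of the sample space
event_space = Vector{Vector{String}}();        # will hold all 2^n events
for i in 0:(2^n - 1)
    b = string(i, base = 2, pad = n);          # n-digit bitstring for index i
    E = [Ω[j] for j in 1:n if b[j] == '1'];    # include Ω[j] when the j-th bit is 1
    push!(event_space, E);
    println("index = $(i), bitstring = $(b), event = $(E)");
end
println("dim(F) = $(length(event_space))");    # should equal 2^n = 16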
Probability law \(P\)#
A probability law \(P\) is a function \(P:\mathcal{F}\rightarrow\left[0, 1\right]\) that maps an event (set) \(E\subseteq\Omega\) to a real number in \(\left[0, 1\right]\). The definition above does not specify how an event \(E\subseteq\Omega\) is mapped to a number. However, since probability is a measure of the size of a set, a meaningful probability law \(P\) should be consistent for all \(E\subseteq\Omega\).
This requires rules, known as the axioms of probability (Axiom 1):
(Axioms of Probability)
A probability law \(P\) is a function \(P:\mathcal{F}\rightarrow\left[0, 1\right]\) that maps an event \(E\subseteq\Omega\) to a real number on the interval \(\left[0, 1\right]\). The function \(P\) must satisfy the three axioms of probability:
Non-negativity: \(P(E)\geq{0}\), for any \(E\in\mathcal{F}\)
Normalization: \(P(\Omega)=1\)
Additivity: for any collection of disjoint events \(\left\{E_{1}, E_{2}, \dots, E_{n}\right\}\), we have \(P\left(\bigcup_{i=1}^{n}E_{i}\right) = \sum_{i=1}^{n}P(E_{i})\)
Conditional Probability#
Conditional probability restricts attention to the part of the sample space in which a given event has occurred. If \(B\) has happened, the probability that \(A\) also happens is \(P(A\vert{B}) = P\left(A\cap{B}\right)/P(B)\), provided \(P(B)>0\). If two events do not influence each other, we say that \(A\) and \(B\) are independent.
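As a quick illustration, the sketch below computes a conditional probability for a hypothetical fair six-sided die, with \(A\) the event that the roll is even and \(B\) the event that the roll is at least 4:

# conditional probability for a fair six-sided die (equally likely outcomes)
Ω = 1:6;                           # sample space of the die
A = Set([2, 4, 6]);                # event: the roll is even
B = Set([4, 5, 6]);                # event: the roll is at least 4
P(E) = length(E) / length(Ω);      # probability law for equally likely outcomes
P_A_given_B = P(intersect(A, B)) / P(B);
println("P(A|B) = $(P_A_given_B)");   # (2/6)/(3/6) = 2/3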
Independence versus Disjoint#
Conditional probability deals with situations where two events, \(A\) and \(B\), are related. However, what if the two events are unrelated, i.e., information about one event says nothing about the second event? In this case, the events \(A\) and \(B\) are independent (Definition 20):
(Statistical independence of events)
Two events \(A\) and \(B\) are statistically independent if:
\[P\left(A\cap{B}\right) = P(A)\,P(B)\]
However, independence says nothing about whether two events are disjoint. Suppose events \(A\) and \(B\) were disjoint; then we know that \(A\cap{B} = \emptyset\) and:
\[P\left(A\cap{B}\right) = P\left(\emptyset\right) = 0\]
However, this says nothing about whether \(P(A\cap{B})\) can be factorized into the product \(P(A)P(B)\). The only case where disjoint implies independence is when either \(P(A) = 0\) or \(P(B) = 0\).
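The sketch below illustrates the distinction for a hypothetical pair of fair dice: the two events are independent, yet they are not disjoint.

# independent versus disjoint events for two fair dice
Ω = vec([(i, j) for i in 1:6, j in 1:6]);      # 36 equally likely outcomes (two dice)
P(E) = length(E) / length(Ω);                  # probability law for equally likely outcomes
A = filter(ω -> iseven(ω[1]), Ω);              # event: the first die is even
B = filter(ω -> ω[2] > 4, Ω);                  # event: the second die shows 5 or 6
println("P(A ∩ B) = $(P(intersect(A, B)))");   # 6/36 ≈ 0.1667
println("P(A)P(B) = $(P(A) * P(B))");          # (1/2)(1/3) ≈ 0.1667 -> independent
# A and B are independent but not disjoint: they share outcomes such as (2, 5)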
Bayes’ theorem#
Bayes’ theorem, named after Thomas Bayes, describes the likelihood of an event based on prior knowledge of conditions related to the event (Theorem 1):
(Bayes’ theorem)
For any two events \(A\) and \(B\) where \(P(A) > 0\) and \(P(B) > 0\), the conditional probability \(P(A\vert{B})\) is given by:
\[P(A\vert{B}) = \frac{P(B\vert{A})\,P(A)}{P(B)}\]
Bayes’ theorem provides two views of the intersection \(P\left(A\cap{B}\right)\) using two different conditional probabilities. To see this, we use the fact that the order of the events \(A\) and \(B\) is arbitrary:
\[P\left(A\cap{B}\right) = P(A\vert{B})\,P(B) = P(B\vert{A})\,P(A)\]
Thus, Bayes’ theorem offers a mechanism to interconvert \(P(A\vert{B})\) and \(P(B\vert{A})\).
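As a quick numerical illustration with hypothetical (made-up) probabilities, the sketch below interconverts the two conditional probabilities and checks that both factorizations give the same intersection:

# Bayes' theorem with hypothetical probabilities
P_A = 0.30;           # assumed probability of event A
P_B = 0.20;           # assumed probability of event B
P_B_given_A = 0.50;   # assumed conditional probability P(B|A)
P_A_given_B = P_B_given_A * P_A / P_B;            # Bayes' theorem
println("P(A|B) = $(P_A_given_B)");               # 0.75
# both factorizations give the same intersection probability P(A ∩ B)
println(P_A_given_B * P_B ≈ P_B_given_A * P_A);   # true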
Law of Total Probability#
The law of total probability is a fundamental concept in probability theory that allows us to compute the probability of an event by considering all the possible ways in which it can occur (Theorem 2):
(Law of Total Probability)
Let \(\left\{A_{1},\dots,A_{n}\right\}\) be a partition of the sample space \(\Omega\), i.e., the sets \(A_{i}\) are disjoint and \(\Omega=A_{1}\cup{A_{2}}\cup\dots\cup{A_{n}}\). Then for any \(B\subseteq\Omega\):
\[P(B) = \sum_{i=1}^{n}P(B\vert{A_{i}})\,P(A_{i})\]
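The sketch below applies the law of total probability to a hypothetical three-part partition; all numbers are made up for illustration.

# law of total probability with a hypothetical three-part partition
P_A = [0.5, 0.3, 0.2];            # P(A₁), P(A₂), P(A₃): partition probabilities (sum to 1)
P_B_given_A = [0.9, 0.5, 0.1];    # assumed conditional probabilities P(B|Aᵢ)
P_B = sum(P_B_given_A .* P_A);    # P(B) = Σᵢ P(B|Aᵢ)P(Aᵢ)
println("P(B) = $(P_B)");         # 0.45 + 0.15 + 0.02 = 0.62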
Probability mass functions#
The probability mass function (PMF) of a discrete random variable \(X\) is a function that specifies the probability of obtaining \(X = x\), where \(x\) is a particular event in the set of possible events we’re interested in \(\mathcal{F}\subseteq{X\left(\Omega\right)}\):
\[p_{X}(x) = P\left(X = x\right)\]
where \(\mathcal{F}\) is the event space, and \(\Omega\) is the sample space. A probability mass function must satisfy the condition:
\[\sum_{x\in{X(\Omega)}}p_{X}(x) = 1\]
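As a quick sanity check, the sketch below builds a hand-made PMF for a fair six-sided die and verifies the normalization condition:

# a hand-built pmf for a fair six-sided die
p_X = Dict(x => 1/6 for x in 1:6);       # p_X(x) = 1/6 for x = 1,...,6
println(sum(values(p_X)) ≈ 1.0);         # normalization: Σ p_X(x) = 1 -> true
println("P(X = 3) = $(p_X[3])");         # ≈ 0.1667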
In Julia, probability mass (or density) functions can be constructed and sampled using the Distributions.jl package. Let’s look at a few common probability mass functions.
Bernoulli distribution#
Bernoulli random variables, named after the Swiss mathematician Jacob Bernoulli, have two states: either 1 or 0, and model binary events such as coin flips, binary bits, true or false, yes or no, present or absent, etc. (Definition 21):
(Bernoulli Random Variable)
A Bernoulli random variable \(X\) models a binary outcome: either 1 or 0, where 1 occurs with probability \(p\) and 0 occurs with probability \(1-p\).
The probability mass function (pmf) of the Bernoulli random variable \(X\) is:
\[p_{X}(x) = \begin{cases} p & x = 1 \\ 1-p & x = 0 \end{cases}\]
where \(0<p<1\) is called the Bernoulli parameter. The expectation of a Bernoulli random variable equals:
\[\mathbb{E}(X) = p\]
while the variance \(\text{Var}(X)\) equals:
\[\text{Var}(X) = p(1-p)\]
Example code for a Bernoulli random variable:
# load the distributions package, and some other stuff
using Distributions
using Statistics
using PrettyTables
# Details of Bernoulli distribution:
# https://juliastats.org/Distributions.jl/stable/univariate/#Discrete-Distributions
# setup constants -
p = 0.64;
number_of_samples = 100;
# build a Bernoulli distribution
d = Bernoulli(p)
# sample (check expectation, and variance)
samples = rand(d,number_of_samples);
# build a table -
data_for_table = Array{Any,2}(undef, 2, 3)
table_header = ["", "E(X)", "Var(X)"]
# row 1: model
data_for_table[1,1] = "model"
data_for_table[1,2] = mean(d);
data_for_table[1,3] = var(d);
# row 2: samples
data_for_table[2,1] = "samples"
data_for_table[2,2] = mean(samples);
data_for_table[2,3] = var(samples);
pretty_table(data_for_table, header=table_header);
Geometric distribution#
Geometric random variables model the number of trials required to obtain the first success in a sequence of independent Bernoulli trials (Definition 22):
(Geometric Random Variable)
Geometric random variables model the number of trials required to obtain the first success in a sequence of independent Bernoulli trials. The probability mass function for a geometric random variable is given by:
\[p_{X}(k) = \left(1-p\right)^{k-1}p,\quad k = 1,2,\dots\]
where \(p\) denotes the geometric parameter \(0<p<1\). The expectation of a geometric random variable \(X\) is given by:
\[\mathbb{E}(X) = \frac{1}{p}\]
while the variance \(\text{Var}(X)\) is given by:
\[\text{Var}(X) = \frac{1-p}{p^{2}}\]
Example code for a Geometric random variable:
# load the distributions package, and some other stuff
using Distributions
using Statistics
using PrettyTables
# Details of Geometric distribution:
# https://juliastats.org/Distributions.jl/stable/univariate/#Distributions.Geometric
# setup constants -
p = 0.64;
number_of_samples = 100;
# build a Geometric distribution
d = Geometric(p)
# sample (check expectation, and variance)
samples = rand(d, number_of_samples);
# build a table -
data_for_table = Array{Any,2}(undef, 2, 3)
table_header = ["", "E(X)", "Var(X)"]
# row 1: model
# note: Distributions.jl's Geometric counts the number of *failures* before the first success,
# so mean(d) = (1-p)/p; add 1 to convert to the number of trials used in the definition above
data_for_table[1,1] = "model"
data_for_table[1,2] = mean(d);
data_for_table[1,3] = var(d);
# row 2: samples
data_for_table[2,1] = "samples"
data_for_table[2,2] = mean(samples);
data_for_table[2,3] = var(samples);
pretty_table(data_for_table, header=table_header);
Binomial distribution#
The binomial distribution is the probability of getting \(k\) successes in \(n\) independent Bernoulli trials, e.g., the chance of getting four heads in six coin tosses (Definition 23):
(Binomial Random Variable)
The binomial distribution, the probability of \(k\) successes in \(n\) independent Bernoulli trials, has the probability mass function:
\[p_{X}(k) = \binom{n}{k}p^{k}\left(1-p\right)^{n-k},\quad k = 0,1,\dots,n\]
where \(k\) denotes the number of successes in \(n\) independent experiments, the binomial parameter \(0<p<1\) is the probability of a successful trial, and:
\[\binom{n}{k} = \frac{n!}{k!\left(n-k\right)!}\]
is the binomial coefficient. The expectation and variance of a binomial random variable are given by:
\[\mathbb{E}(X) = np\quad\text{and}\quad\text{Var}(X) = np(1-p)\]
Example code for a binomial random variable:
# load the distributions package, and some other stuff
using Distributions
using Statistics
using PrettyTables
# Details of Binomial distribution:
# https://juliastats.org/Distributions.jl/stable/univariate/#Distributions.Binomial
# setup constants -
number_of_trials = 100;
p = 0.64;
number_of_samples = 100;
# build a Binomial distribution
d = Binomial(number_of_trials,p)
# sample (check expectation, and variance)
samples = rand(d,number_of_samples);
# build a table -
data_for_table = Array{Any,2}(undef, 2, 3)
table_header = ["", "E(X)", "Var(X)"]
# row 1: model
data_for_table[1,1] = "model"
data_for_table[1,2] = mean(d);
data_for_table[1,3] = var(d);
# row 2: samples
data_for_table[2,1] = "samples"
data_for_table[2,2] = mean(samples);
data_for_table[2,3] = var(samples);
pretty_table(data_for_table, header=table_header);
Poisson distribution#
Poisson random variables are a type of discrete probability distribution that models the number of occurrences of an event in a fixed interval of time or space (Definition 24):
(Poisson Random Variable)
Poisson random variables model the number of occurrences of an event in a fixed interval of time or space. The probability mass function for a Poisson random variable is given by:
\[p_{X}(x) = \frac{\lambda^{x}e^{-\lambda}}{x!},\quad x = 0,1,2,\dots\]
where \(\lambda>0\) denotes the Poisson parameter, and \(!\) denotes the factorial function. The expectation of a Poisson random variable \(X\) is given by:
\[\mathbb{E}(X) = \lambda\]
while the variance \(\text{Var}(X)\) is given by:
\[\text{Var}(X) = \lambda\]
Example code for a Poisson random variable:
# load the distributions package, and some other stuff
using Distributions
using Statistics
using PrettyTables
# Details of Poisson distribution:
# https://juliastats.org/Distributions.jl/stable/univariate/#Distributions.Poisson
# setup constants -
λ = 5.0;  # assumed value for the Poisson parameter (not specified in the original example)
number_of_samples = 100;

# build a Poisson distribution
d = Poisson(λ)
# sample (check expectation, and variance)
samples = rand(d, number_of_samples);
# build a table -
data_for_table = Array{Any,2}(undef, 2, 3)
table_header = ["", "E(X)", "Var(X)"]
# row 1: model
data_for_table[1,1] = "model"
data_for_table[1,2] = mean(d);
data_for_table[1,3] = var(d);
# row 2: samples
data_for_table[2,1] = "samples"
data_for_table[2,2] = mean(samples);
data_for_table[2,3] = var(samples);
pretty_table(data_for_table, header=table_header);
Poisson random variables describe how likely it is that something will happen \(x\) times in a fixed interval, e.g., the number of car crashes in a city of a given size or the number of cheeseburgers sold at a fast-food chain on a Friday night.
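For instance, the sketch below evaluates the Poisson PMF (via pdf) and CDF for an assumed, hypothetical rate of \(\lambda = 3\) events per interval:

# probability of exactly x events in one interval, for an assumed rate λ = 3
using Distributions

λ = 3.0;                 # hypothetical average number of events per interval
d = Poisson(λ);
for x in 0:5
    println("P(X = $(x)) = $(round(pdf(d, x), digits = 4))");
end
println("P(X ≤ 5) = $(round(cdf(d, 5), digits = 4))");   # cumulative probability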
Moments of a random variable#
Moments of a random variable are a way to summarize its distribution and provide important information about its properties. Specifically, moments are mathematical quantities that describe the shape, center, and spread of a distribution, and they can be used to calculate other statistical measures such as variance and skewness. Let’s look at the first and second moments of a random variable, namely the expectation and the variance.
Expectation#
The expectation of a discrete random variable \(X\) measures the central tendency of the values of that random variable (Definition 25):
(Expectation discrete random variable)
Let \(X\) denote a discrete random variable with the probability space \(\left(\Omega,\mathcal{F}, P\right)\), where \(\Omega\) denotes the sample space, \(\mathcal{F}\) denotes the event space, and \(P\) denotes the probability measure. Then, the expected value of the random variable \(X\) is given by:
\[\mathbb{E}(X) = \sum_{x\in{X(\Omega)}}x\,p_{X}(x)\]
where \(x\) denotes a value for the discrete random variable \(X\), and \(p_{X}(x)\) denotes the probability of \(X=x\). The value of \(p_{X}(x)\) is governed by a Probability Mass Function (PMF).
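As a quick check of the definition, the sketch below computes the expectation by summing \(x\,p_{X}(x)\) over the support of a hypothetical binomial random variable and compares it with mean from Distributions.jl:

# compute E(X) from the definition and compare with Distributions.jl
using Distributions
using Statistics

d = Binomial(10, 0.25);                             # a hypothetical discrete random variable X
expectation = sum(x * pdf(d, x) for x in 0:10);     # E(X) = Σ x⋅p_X(x) over the support
println("definition: $(expectation), mean(d): $(mean(d))");   # both equal 2.5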
The expectation of a discrete random variable has a few interesting properties (Observation 1):
(Properties of expectation)
The expectation of a random variable \(X\) has several useful (and important) properties (a numerical check of two of them is sketched after this list):
\(\mathbb{E}\left(c\right) = c\) for any constant \(c\)
\(\mathbb{E}\left(cX\right) = c\times\mathbb{E}\left(X\right)\) for any constant \(c\)
\(\mathbb{E}\left(g(X)\right) = \sum_{x\in{X(\Omega)}}g(x)p_{X}(x)\)
\(\mathbb{E}\left(g(X)+h(X)\right) = \mathbb{E}(g(X)) + \mathbb{E}(h(X))\)
\(\mathbb{E}\left(X+c\right) = \mathbb{E}(X) + c\) for any constant \(c\)
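The sketch below numerically checks the scaling and shift properties for a hypothetical Poisson random variable; the parameter values are arbitrary.

# numerical check of E(cX) = cE(X) and E(X + c) = E(X) + c
using Distributions
using Statistics

d = Poisson(4.0);                   # a hypothetical discrete random variable X
c = 3.0;                            # an arbitrary constant
x = 0:50;                           # enough of the support to capture essentially all the mass
p = pdf.(d, x);                     # p_X(x) evaluated on that range
println(sum((c .* x) .* p) ≈ c * mean(d));     # E(cX) = c*E(X)   -> true
println(sum((x .+ c) .* p) ≈ mean(d) + c);     # E(X + c) = E(X) + c -> true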
Variance#
The variance measures the expected dispersion for individual values of a random variable \(X\), i.e., the average distance that values of \(X\) are spread out from their expected value (Definition 26):
(Variance discrete random variable)
Let \(X\) denote a discrete random variable with the probability space \(\left(\Omega,\mathcal{F},P\right)\), where \(\Omega\) denotes the sample space, \(\mathcal{F}\) denotes the event space, and \(P\) denotes the probability measure. Then, the variance of the random variable \(X\) is given by:
\[\text{Var}(X) = \mathbb{E}\left[\left(X-\mu\right)^{2}\right] = \sum_{x\in{X(\Omega)}}\left(x-\mu\right)^{2}p_{X}(x)\]
where \(\mu = \mathbb{E}(X)\) denotes the expected value of the random variable \(X\).
The variance of a discrete random variable has a few interesting properties (Observation 2):
(Properties of variance)
The variance of a random variable \(X\) has a few interesting (and important) properties:
\(\text{Var}(X) = \mathbb{E}\left(X^{2}\right) - \mathbb{E}\left(X\right)^2\)
\(\text{Var}(cX) = {c^2}\text{Var}(X)\) for any constant \(c\)
\(\text{Var}(X+c) = \text{Var}(X)\) for any constant \(c\)
The more common quantity that is used to measure dispersion, the standard deviation \(\sigma\), is related to the variance: \(\sigma_{X} = \sqrt{\text{Var}(X)}\).
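The sketch below checks the first variance property and the standard deviation relationship for a hypothetical binomial random variable:

# check Var(X) = E(X^2) - E(X)^2 and the standard deviation relationship
using Distributions
using Statistics

d = Binomial(20, 0.3);                           # a hypothetical discrete random variable X
EX  = sum(x * pdf(d, x) for x in 0:20);          # E(X)
EX2 = sum(x^2 * pdf(d, x) for x in 0:20);        # E(X^2)
println(EX2 - EX^2 ≈ var(d));                    # Var(X) = E(X^2) - E(X)^2 -> true
println(sqrt(var(d)) ≈ std(d));                  # σ_X = sqrt(Var(X))       -> true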