
1 Applied Bayesian Modeling, Simulation and Inference

1.1 Background

After Thomas Bayes died in 1761, his unpublished work on the posterior distribution of a binomial proportion under a flat prior was rediscovered and published in the Philosophical Transactions of the Royal Society of London in 1764.

The Monte Carlo (MC) method is a simple technique for estimating analytic quantities that may or may not have closed-form expressions but are difficult to compute exactly, e.g., integration, differentiation, or optimization of non-trivial functions. MC simulation may be used to approximate probability distributions, estimate likelihoods, or calculate expectations. The intuition behind MC sampling is rooted in the fact that we can estimate any desired (posterior) expectation using ergodic averages. In other words, we can compute any population characteristic, associated with a corresponding sample-driven statistic, of a (posterior) distribution, provided we can generate \(N\) simulated samples from that distribution:

\[E(f(s))_P = \int {f(s)p(s)ds} \approx \frac{1}{N}\sum_{i=1}^N {f(s^{(i)})},\] where \(P\) and \(p\) are the probability (posterior) distribution and density functions of interest, \(E\) represents the expectation, \(f(s)\) is the population characteristic of interest (corresponding to a statistic), and \(s^{(i)}\) is the \(i^{th}\) simulated sample from the distribution \(P\). For instance, we can estimate the population mean (linked to the sample arithmetic average statistic) by:

\[E(x)_P =\int{xp(x)dx} \approx \frac{1}{N}\sum_{i=1}^N {x^{(i)}},\] where \(p(x)\) is the density of the distribution function \(P\). Similarly, assuming a zero mean, we can estimate the population variance (corresponding to the sample-variance statistic) by:

\[E(x^2)_P =\int{x^2p(x)dx} \approx \frac{1}{N-1}\sum_{i=1}^N {\left ( x^{(i)}\right ) ^2}.\]

Many integrals, differentials, or sums that cannot be exactly computed in closed analytic form, can still be expressed as expectations of some random variables with respect to some probability distribution, which are estimable by MC methods.
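
As a minimal concrete sketch (assuming a standard normal target distribution, chosen purely for illustration because its moments are known analytically), the following R snippet approximates \(E(x)\) and \(E(x^2)\) by ergodic averages over \(N\) simulated samples and compares them to the exact values \(0\) and \(1\).

set.seed(1234)
N <- 100000
x <- rnorm(N)    # N simulated samples x^(i) from the target distribution P = N(0,1)

mean(x)          # MC estimate of E(x);   analytic value is 0
mean(x^2)        # MC estimate of E(x^2); analytic value is 1

By the law of large numbers, both ergodic averages converge to the corresponding analytic expectations as \(N\) grows.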

Markov chains were introduced by Andrey Markov in the early \(20^{th}\) century; the Markov Chain Monte Carlo (MCMC) method, which uses such chains to sample from a target distribution, was developed later, notably by Metropolis and colleagues in the 1950s. Markov chains are one-step dependent sequences of random variables obeying the Markov property (past and future are conditionally independent given the present). For example:

  • Sampling with replacement vs. without replacement from a space (e.g., an urn) containing a set of identifiable objects represents a process that obeys vs. does not obey the Markov property, respectively.

  • A simple random walk on an integer lattice \(Z^d \subset R^d\), a discrete analogue of Brownian motion (the Wiener process), is a Markov chain whose transition probabilities are defined by \(p(x,x\pm e_i)=\frac{1}{2d}\), for each \(x\in Z^d\), where \(e_i\) are the standard basis (unit) vectors of \(Z^d\). At each step, this random walk, currently at position \(x\), moves to a randomly chosen nearest neighbor, \(x\pm e_i\); see the simulation sketch after this list.

  • Consider cancer staging and the corresponding transitions between stages. The Cancer Tumor-Node-Metastasis (TNM) staging system describes the size of the primary tumor (T), whether the cancer has spread to the lymph nodes (N), and whether it has spread to a different part of the body (M). \(T=1\ (small), 2, 3, 4\ (large)\) refers to the size of the cancer and how far it has spread into nearby tissue; \(N=0\) (no lymph nodes containing cancer cells), \(1, 2, 3\) (many lymph nodes containing cancer cells); and \(M=0\) (no spread), \(1\) (cancer has spread) refers to whether the cancer has spread to another part of the body. Most types of cancer have four stages, \(CS=1\ (early), 2, 3, 4\ (advanced)\). Stage 1 implies that a cancer is relatively small and contained within the organ it started in. Stage 2 means the cancer has not started to spread into surrounding tissue, but the tumor is larger than in stage 1, or that cancer cells have spread into lymph nodes close to the tumor. Stage 3 usually means the cancer is larger and may have started to spread into surrounding tissues, possibly with cancer cells in the lymph nodes in the area. And stage 4 means the cancer has spread from its point of origin to another body organ, indicating metastatic progression.
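
Here is the simulation sketch referenced above, a minimal illustration of the lattice random walk (assuming \(d=2\) and a hypothetical number of steps, n.steps): at each step, the walker moves to one of its \(2d\) nearest neighbors with equal probability \(\frac{1}{2d}\), so the next position depends only on the current one.

set.seed(1234)
d <- 2; n.steps <- 1000
moves <- rbind(diag(d), -diag(d))   # the 2d possible moves: +/- each basis vector e_i
steps <- moves[sample(1:(2*d), n.steps, replace=TRUE), ]  # each move chosen w.p. 1/(2d)
path <- apply(steps, 2, cumsum)     # cumulative sums give the walker's positions on Z^2
plot(path, type='l', xlab='x', ylab='y', main='Simple Random Walk on Z^2')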

Let’s use cancer staging to illustrate the fundamentals of MC representation, its properties, and computational characteristics. Let \(X_t\) denote the cancer stage of a patient at time \(t\). Cancer onset, progression, prognosis, and outcomes have stochastic behavior. We have 5 states, \(X_t=0\ (no\ cancer), 1, 2, 3, 4\), and initially \(X_0=0\) (no cancer), with \(P(X_0=0)=1\). Next we can define the marginal probabilities for transitioning from one cancer stage to another (these depend upon the type of cancer), say \(P(X_1=0)=0.1\), \(P(X_1=1)=0.6\), \(P(X_1=2)=0.2\), \(P(X_1=3)=0.07\), and \(P(X_1=4)=0.03\), which sum to 1. Computing the distribution of \(X_t\) for \(t\geq 2\) requires conditional probabilities.

Suppose that at time \(t\) the cancer patient is at stage \(s_0\). Then, we can compute the conditional probabilities \(P(X_{t+1}=s_k|X_t=s_0)\) of the patient being in a specific stage \(s_k\) at the next time point, \(t+1\). For instance, if at time \(t\) the patient is in stage \(1\), we may have \(P(X_{t+1}=0|X_t=1)=0.15\), \(P(X_{t+1}=1|X_t=1)=0.5\), \(P(X_{t+1}=2|X_t=1)=0.2\), \(P(X_{t+1}=3|X_t=1)=0.1\), and \(P(X_{t+1}=4|X_t=1)=0.05\).
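
These one-step conditional probabilities can be used directly for simulation. A minimal sketch in R, using the hypothetical stage-1 transition probabilities above to draw the next stage for several patients currently in stage 1:

set.seed(1234)
stages <- 0:4
p_from_stage1 <- c(0.15, 0.5, 0.2, 0.1, 0.05)   # P(X_{t+1} = s_k | X_t = 1), as above
sample(stages, size=10, replace=TRUE, prob=p_from_stage1)  # next stage for 10 stage-1 patients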

Would these probabilities change if we condition further on the oncological history of the same patient up to time \(t\)? Because of the stochastic, coin-toss-like mechanism of many biomedical, physiologic, and natural processes, these conditional likelihoods may not be affected by knowing a longer staging history of the disease. To predict the likelihood of being in a certain state at the next time period, only the current state of the disease matters. Thus, for any cancer history (\(s_0, s_1, \cdots, s_{t-1}\)), we may have \[P(X_{t+1}=s_k|X_0=s_0, X_1=s_1, \cdots, X_{t-1}=s_{t-1}, X_t=1)=P(X_{t+1}=s_k|X_t=1),\] for every stage \(s_k \in \{0,1,2,3,4\}\). In particular, conditioning on the full history leaves the stage-1 transition probabilities above unchanged, e.g., \(P(X_{t+1}=0|X_0=s_0, \cdots, X_t=1)=P(X_{t+1}=0|X_t=1)=0.15\).

This is because the coin-toss-like cancer staging outcome (not necessarily a fair coin) at time \(t+1\) is independent of the prior flips and therefore does not depend on \(X_0, X_1, \cdots, X_{t-1}\). It only depends on the current cancer stage, \(X_t\), at time \(t\). This rule, that the conditional distribution of \(X_{t+1}\) depends only on the state at the prior time, \(X_t\), is referred to as the Markov property, or memoryless property.
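
The memoryless property makes trajectory simulation straightforward: to generate an entire staging history, we repeatedly sample the next stage given only the current one. A minimal sketch, assuming a hypothetical 5x5 transition matrix P (rows index the current stage 0-4; the second row matches the stage-1 probabilities above):

set.seed(1234)
stages <- 0:4
# Hypothetical transition matrix; row i gives P(X_{t+1} = . | X_t = stages[i])
P <- matrix(c(0.90, 0.06, 0.02, 0.01, 0.01,
              0.15, 0.50, 0.20, 0.10, 0.05,
              0.05, 0.15, 0.50, 0.20, 0.10,
              0.02, 0.08, 0.15, 0.50, 0.25,
              0.01, 0.04, 0.10, 0.15, 0.70), nrow=5, byrow=TRUE)

simulate_chain <- function(P, x0, n.steps) {
    x <- numeric(n.steps + 1); x[1] <- x0
    for (t in 1:n.steps) {
        # Markov property: the next stage depends only on the current stage x[t]
        x[t + 1] <- sample(stages, 1, prob=P[x[t] + 1, ])
    }
    x
}
simulate_chain(P, x0=0, n.steps=20)   # one simulated 20-step staging trajectory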

Another characteristic of Markov processes, time homogeneity, is that the conditional distribution of \(X_{t+1}\) given (say) \(X_t=s_0\) is the same at all time points, i.e., \(P(X_{t'+1}=s_k | X_{t'}=s_0) = P(X_{t''+1}=s_k | X_{t''}=s_0)\), for all \(t'\), \(t''\), and stages \(s_k\). This reflects the simplifying assumption that, albeit dependent on human intervention, the biophysiological mechanism of cancer stage transitioning obeys the same rules independently of time.

Collectively, the transition probabilities comprise the Markov transition matrix, \(P\), where each element \(P_{i,j}\) represents the conditional probability of observing cancer stage \(s_j\) at time \(t+1\) given that the current, time \(t\), stage is \(s_i\), i.e., \(P_{i,j} = P(X_{t+1} =s_j| X_{t}=s_i)\).

Thus, a Markov chain model for cancer progression, using the above 5 cancer stages (model states), may be represented mathematically as a transition matrix or a graph (here \(k=5\)): \[P= \begin{pmatrix} P_{1,1} & P_{1,2} & \cdots & P_{1,k} \\ P_{2,1} & P_{2,2} & \cdots & P_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ P_{k,1} & P_{k,2} & \cdots & P_{k,k} \end{pmatrix} .\] The axioms of probability theory dictate that:

  • the conditional probabilities are non-negative, \(P_{i,j} = P(X_{t+1} =s_j| X_{t}=s_i) \geq 0\), for all \(i,j \in \{1, 2, 3, \cdots, k\}\),
  • the law of total probability is satisfied, \(\sum_{j=1}^k{P_{i,j}} = 1\), for all \(i \in \{1, 2, 3, \cdots, k\}\), and
  • the initial distribution of the Markov chain \(\{X_0, X_1, X_2, \cdots \}\), specifying the initial conditions (\(t=0\)) of starting the chain over its states \(\{s_1, s_2, s_3, \cdots, s_k\}\), can be expressed as \(\mu^{(0)}=(\mu_1^{(0)}=P(X_0=s_1), \mu_2^{(0)} =P(X_0=s_2), \cdots, \mu_k^{(0)} =P(X_0=s_k))\), where \(\sum_{i=1}^k{\mu_i^{(0)}}=1\). (These axioms are verified numerically in the sketch below.)
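
A minimal numerical sketch of these axioms and of the chain's marginal evolution, reusing the hypothetical 5x5 staging matrix from the trajectory simulation above: each row of \(P\) must sum to 1, and the marginal distribution after \(t\) steps is \(\mu^{(t)} = \mu^{(0)} P^t\).

# Hypothetical staging transition matrix (same as in the trajectory sketch above)
P <- matrix(c(0.90, 0.06, 0.02, 0.01, 0.01,
              0.15, 0.50, 0.20, 0.10, 0.05,
              0.05, 0.15, 0.50, 0.20, 0.10,
              0.02, 0.08, 0.15, 0.50, 0.25,
              0.01, 0.04, 0.10, 0.15, 0.70), nrow=5, byrow=TRUE)

all(P >= 0)     # non-negativity of the conditional probabilities
rowSums(P)      # law of total probability: each row sums to 1

mu0 <- c(1, 0, 0, 0, 0)   # initial distribution: X_0 = 0 (no cancer) with probability 1

t.steps <- 10
Pt <- diag(5)             # compute P^t by repeated multiplication
for (s in 1:t.steps) Pt <- Pt %*% P
mu0 %*% Pt                # mu^(10): the distribution of X_10 over the 5 stages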

Below we render the matrix representation of the MC process above as a (directed, cyclic, complete) graph, illustrating all states and their corresponding transition probabilities.

# install.packages("igraph")
library(igraph)

# The 5 model states (cancer stages 0-4)
node_names <- c('Stage_0', 'Stage_1', 'Stage_2', 'Stage_3', 'Stage_4')
nodes <- data.frame(name=node_names)

# All 25 directed edges: every stage may transition to every stage (including itself)
edges <- data.frame(from=rep(node_names, each=5), to=rep(node_names, times=5))

# Edge labels P{i,j} and vertex sizes decreasing with stage number
edge_names <- rep('P{i,j}', 25)
vertex_size <- rep(20, 5)
for (i in 1:5) {
    vertex_size[i] <- 70 - 5*(i-1)
    for (j in 1:5) {
        # edge j + (i-1)*5 goes from Stage_{i-1} to Stage_{j-1}
        edge_names[j + (i-1)*5] <- paste0('P{', i-1, ',', j-1, '}')
    }
}

g <- graph_from_data_frame(edges, directed=TRUE, vertices=nodes)
plot(g, layout=layout_with_fr, vertex.size=vertex_size, vertex.label=node_names,
     edge.label=edge_names, edge.label.cex=0.7, edge.curved=0.3,
     margin=c(0.1, 0.1, 0.1, 0.1), main='Cancer Staging Transition Graph',
     ylim=c(-1, 1), xlim=c(-1, 1), asp=1)