The Monte Carlo Principle
At its core, Monte Carlo integration approximates expectations by averaging function evaluations at randomly sampled points. For Bayesian inference, combined with samplers such as MCMC, this lets us compute posterior expectations without ever evaluating the normalizing constant.
The basic Monte Carlo estimator:

$$\mathbb{E}_{p(x)}[f(x)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)}), \qquad \text{where } x^{(i)} \sim p(x)$$
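As a minimal illustration of the estimator above, the sketch below estimates $\mathbb{E}[x^2]$ under a standard normal (true value 1) by averaging over draws; the function and sample size are illustrative choices, not from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E[x^2] for x ~ N(0, 1) by averaging f(x) = x^2 over samples.
# The exact answer is Var(x) = 1.
xs = rng.normal(0.0, 1.0, size=100_000)
estimate = np.mean(xs**2)
print(f"Monte Carlo estimate: {estimate:.3f}")  # close to 1.0
```

The error of this estimator shrinks like $O(1/\sqrt{N})$ regardless of dimension, which is why the same averaging idea survives in high-dimensional Bayesian problems.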
Markov Chain Monte Carlo (MCMC)
While basic Monte Carlo requires independent samples from the target distribution, MCMC methods generate correlated samples whose distribution converges to the target as the chain runs longer.
Metropolis-Hastings Algorithm
The Metropolis-Hastings algorithm is one of the most fundamental MCMC methods:
```python
import numpy as np
from scipy.stats import norm


def metropolis_hastings(target_logpdf, proposal_sampler, x0, n_samples, burnin=1000):
    """
    Random-walk Metropolis-Hastings MCMC sampler.

    Assumes a symmetric proposal (e.g. a Gaussian random walk), so the
    Hastings correction q(x | x') / q(x' | x) cancels from the acceptance ratio.

    Parameters:
    - target_logpdf: log probability density of the target distribution
                     (an unnormalized log density is sufficient)
    - proposal_sampler: function mapping the current state to a proposed state
    - x0: initial state
    - n_samples: number of post-burn-in samples to generate
    - burnin: number of initial samples to discard
    """
    samples = []
    x = x0
    logp_x = target_logpdf(x)  # cache so we evaluate the target once per move
    accepted = 0
    for i in range(n_samples + burnin):
        # Propose new state
        x_prop = proposal_sampler(x)
        logp_prop = target_logpdf(x_prop)
        # Accept with probability min(1, p(x') / p(x)), computed in log space
        if np.log(np.random.random()) < logp_prop - logp_x:
            x = x_prop
            logp_x = logp_prop
            if i >= burnin:
                accepted += 1
        if i >= burnin:
            samples.append(x)
    acceptance_rate = accepted / n_samples
    print(f"Acceptance rate: {acceptance_rate:.3f}")
    return np.array(samples)


# Example: sampling from a mixture of Gaussians
def mixture_logpdf(x):
    return np.log(0.3 * norm.pdf(x, -2, 0.5) + 0.7 * norm.pdf(x, 3, 1.5))


def proposal_sampler(x):
    return x + np.random.normal(0, 1)  # random-walk proposal


# Generate samples
samples = metropolis_hastings(mixture_logpdf, proposal_sampler, 0, 5000)
print(f"Sample mean: {np.mean(samples):.3f}")
print(f"Sample std: {np.std(samples):.3f}")
```
Hamiltonian Monte Carlo (HMC)
HMC uses gradient information to propose more efficient moves, especially effective for high-dimensional problems. The algorithm simulates a particle moving on a potential energy surface defined by the negative log posterior.
Hamiltonian dynamics:

$$H(q, p) = U(q) + K(p), \qquad \frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q}$$

where $U(q) = -\log \pi(q)$ is the potential energy (the negative log posterior) and $K(p) = \tfrac{1}{2} p^\top M^{-1} p$ is the kinetic energy of an auxiliary momentum variable $p$.
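A minimal sketch of a single HMC transition, using the standard leapfrog integrator and an identity mass matrix; the step size, trajectory length, and the 2-D Gaussian target are illustrative assumptions, not tuned values.

```python
import numpy as np

np.random.seed(0)


def hmc_step(q, U, grad_U, step_size=0.1, n_leapfrog=20):
    """One HMC transition: resample momentum, simulate leapfrog, accept/reject."""
    p = np.random.normal(size=np.shape(q))  # fresh momentum, K(p) = p.p / 2
    q_new, p_new = np.copy(q), np.copy(p)

    # Leapfrog integration of Hamiltonian dynamics
    p_new -= 0.5 * step_size * grad_U(q_new)      # initial half step for momentum
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new                # full step for position
        p_new -= step_size * grad_U(q_new)        # full step for momentum
    q_new += step_size * p_new
    p_new -= 0.5 * step_size * grad_U(q_new)      # final half step for momentum

    # Metropolis correction for leapfrog discretization error
    h_old = U(q) + 0.5 * np.dot(p, p)
    h_new = U(q_new) + 0.5 * np.dot(p_new, p_new)
    if np.log(np.random.random()) < h_old - h_new:
        return q_new
    return q


# Example: sample a 2-D standard Gaussian, so U(q) = q.q / 2 and grad U = q
U = lambda q: 0.5 * np.dot(q, q)
grad_U = lambda q: q

q = np.zeros(2)
samples = []
for _ in range(2000):
    q = hmc_step(q, U, grad_U)
    samples.append(q)
samples = np.array(samples)
print(samples.mean(axis=0), samples.std(axis=0))
```

Because the leapfrog integrator is volume-preserving and time-reversible, the accept/reject step only has to correct for the (small) numerical error in the Hamiltonian, which is why HMC can take long, informed jumps with high acceptance rates.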
Practical Considerations
- Convergence diagnostics: Always check for convergence using multiple chains and diagnostic tests
- Effective sample size: High autocorrelation reduces the effective number of independent samples
- Tuning: Proposal distributions and step sizes significantly impact performance
- Initialization: Poor starting points can lead to slow mixing or getting stuck in local modes
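The multiple-chains diagnostic from the list above can be sketched with a basic Gelman-Rubin $\hat{R}$ statistic; the two synthetic "chains" are illustrative data, not output from the samplers above.

```python
import numpy as np


def gelman_rubin(chains):
    """Basic Gelman-Rubin R-hat for a (n_chains, n_samples) array of draws.

    Values close to 1 suggest the chains agree; noticeably larger values
    signal non-convergence.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_est = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_est / W)


rng = np.random.default_rng(0)

# Two well-mixed chains targeting the same distribution
good = rng.normal(0, 1, size=(2, 5000))
print(f"R-hat (mixed): {gelman_rubin(good):.3f}")   # close to 1

# Two chains stuck in different modes: R-hat flags the disagreement
stuck = np.stack([rng.normal(-3, 1, 5000), rng.normal(3, 1, 5000)])
print(f"R-hat (stuck): {gelman_rubin(stuck):.3f}")  # well above 1
```

Production libraries use refined variants (e.g. split-$\hat{R}$ with rank normalization), but the idea is the same: compare variance between chains to variance within them.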
💡 Key Takeaway
MCMC methods transform the problem of computing complex integrals into the problem of simulating correlated random walks. While they require careful implementation and monitoring, they enable Bayesian inference in problems where analytical solutions are impossible.
Modern Applications
In modern probabilistic programming languages like PyMC3, Stan, and Pyro, these algorithms are implemented with sophisticated tuning procedures, making Bayesian inference accessible to practitioners across disciplines.
The combination of MCMC methods with deep learning architectures has led to Bayesian neural networks, enabling uncertainty quantification in complex prediction tasks. As computational power continues to grow, we can expect MCMC methods to play an increasingly important role in machine learning and scientific computing.