Metropolis-Hastings algorithm
Introduction
Metropolis-Hastings is a Markov chain Monte Carlo method for sampling from a target distribution when direct sampling is hard. The target can be known only up to a normalizing constant:
The strength of this algorithm is that we do not need to calculate the density's normalization constant, which is often difficult in practice.
The algorithm builds a Markov chain whose stationary distribution is . After burn-in, the visited states are treated as dependent samples from the target distribution.
There are some variants within the Metropolis-Hastings algorithm, which has specific assumptions and has its own name:
| Name | Proposal | Acceptance correction |
|---|---|---|
| Metropolis-Hastings (most general) | Any proposal | Uses the target ratio and the proposal ratio |
| Metropolis algorithm | Symmetric proposal, | Proposal ratio cancels |
| Random-walk Metropolis | Add symmetric noise, | A common Metropolis special case |
Acceptance ratio
At the current state , propose a candidate from any proposal distribution . The Metropolis-Hastings acceptance probability is
Because , the unknown normalizing constant cancels:
In this notebook we use one common special case: a Gaussian random-walk proposal. It is symmetric, so and the proposal terms also cancel:
Log-space implementation
In code we usually avoid computing the ratio itself. If , the proposal is accepted when
Taking logs gives the equivalent test
This is more stable numerically, and it is the same reason the graph-ensemble sampler compares math.log(rng.random()) with a change in log weight.
Target distribution
Let's use a bimodal distribution as a target in this notebook. This is a mixture of two normal densities,
Equivalently,
For this example the density is normalized, but the sampler only needs a log density up to an additive constant.
def normal_pdf(x, mean, sd):
return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
def target_pdf(x):
return 0.35 * normal_pdf(x, -2.0, 0.55) + 0.65 * normal_pdf(x, 1.3, 0.9)
def log_target_unnormalized(x):
"""Log unnormalized density; the missing normalizing constant is irrelevant."""
a = np.log(0.35) - 0.5 * ((x + 2.0) / 0.55) ** 2 - np.log(0.55)
b = np.log(0.65) - 0.5 * ((x - 1.3) / 0.9) ** 2 - np.log(0.9)
m = np.maximum(a, b)
return m + np.log(np.exp(a - m) + np.exp(b - m))
x_grid = np.linspace(-5.5, 5.5, 600)
fig, ax = plt.subplots(figsize=(6.4, 3.8))
ax.plot(x_grid, target_pdf(x_grid), color=COL_TARGET, linewidth=2.2, label='target density')
ax.fill_between(x_grid, target_pdf(x_grid), color=COL_TARGET, alpha=0.14)
ax.set_title('Target distribution', loc='left', fontsize=12, color=COL_TEXT)
ax.set_xlabel('$x$', fontsize=10)
ax.set_ylabel('density', fontsize=10)
style_axis(ax)
ax.legend(frameon=False, fontsize=9)
fig.tight_layout(pad=1.2)
plt.close()
show_fig_svg(fig)
Random-walk Metropolis sampler
The implementation below is the random-walk Metropolis, which is a special case of Metropolis-Hastings. The proposal is
This proposal is symmetric, so the acceptance decision only needs the log target difference. For asymmetric proposals, the log proposal ratio must also be included.
def metropolis_hastings(log_target, x0, proposal_sd, n_steps, rng):
"""Run a one-dimensional random-walk Metropolis sampler.
Parameters
----------
log_target : callable
Function returning the log target density up to an additive constant.
The normalizing constant is not needed because the sampler uses
log-density differences.
x0 : float
Initial state of the Markov chain.
proposal_sd : float
Standard deviation of the Gaussian random-walk proposal.
n_steps : int
Number of MCMC iterations to run.
rng : numpy.random.Generator
Random generator used for reproducible proposals and accept/reject
draws.
Returns
-------
samples : numpy.ndarray
Array of chain states after each iteration.
accepted : numpy.ndarray
Boolean array indicating whether each proposal was accepted.
"""
# Allocate storage for the chain and acceptance indicators.
samples = np.empty(n_steps)
accepted = np.zeros(n_steps, dtype=bool)
# Initialize the chain and cache the current log target value.
x = float(x0)
log_px = float(log_target(x))
for t in range(n_steps):
# Propose a local move from a symmetric Gaussian random walk.
x_proposed = x + rng.normal(0, proposal_sd)
log_p_proposed = float(log_target(x_proposed))
# Symmetry makes the proposal ratio cancel, leaving a log target ratio.
log_acceptance_ratio = log_p_proposed - log_px
# Accept if log(U) is below the log acceptance ratio; otherwise stay put.
if np.log(rng.random()) < log_acceptance_ratio:
x = x_proposed
log_px = log_p_proposed
accepted[t] = True
# Record the current state after the accept/reject decision.
samples[t] = x
return samples, accepted
The animation shows the local Metropolis step. The chain starts at the current state , proposes , then either moves to or stays at based on the log acceptance test. Rejected proposals are shown separately because the chain remains at the previous state.
You can see the Metropolis-Hastings step visually in 3D here:
MCMC Visualizationmcmc-visualization.vercel.appRun the chain from a low-density starting point and discard an initial burn-in period.
samples, accepted = metropolis_hastings(
log_target=log_target_unnormalized,
x0=-5.5,
proposal_sd=0.85,
n_steps=30_000,
rng=rng,
)
burn_in = 3_000
kept = samples[burn_in:]
acceptance_rate = accepted.mean()
print(f"Steps: {len(samples):,}")
print(f"Burn-in discarded: {burn_in:,}")
print(f"Proposal standard deviation: 0.85")
print(f"Acceptance rate: {acceptance_rate:.1%}")
print(f"Posterior/sample mean after burn-in: {kept.mean():.3f}")
print(f"Posterior/sample standard deviation after burn-in: {kept.std(ddof=1):.3f}")
Steps: 30,000
Burn-in discarded: 3,000
Proposal standard deviation: 0.85
Acceptance rate: 70.8%
Posterior/sample mean after burn-in: 0.091
Posterior/sample standard deviation after burn-in: 1.785fig, axes = plt.subplots(1, 2, figsize=(8, 3.5))
for ax in axes:
style_axis(ax)
axes[0].plot(samples[:2500], color=COL_SAMPLE, linewidth=0.9)
axes[0].axvline(burn_in, color=COL_REJECT, linewidth=1.1, linestyle='--', label='burn-in cutoff')
axes[0].set_title('Trace of the chain', loc='left', fontsize=11, color=COL_TEXT)
axes[0].set_xlabel('iteration', fontsize=9)
axes[0].set_ylabel('$x$', fontsize=10)
axes[0].legend(frameon=False, fontsize=8)
axes[1].hist(kept, bins=65, density=True, color=COL_SAMPLE, alpha=0.86, edgecolor='white', linewidth=0.7, label='MH samples')
axes[1].plot(x_grid, target_pdf(x_grid), color=COL_TARGET, linewidth=2.0, label='target density')
axes[1].set_title('Samples approximate the target', loc='left', fontsize=11, color=COL_TEXT)
axes[1].set_xlabel('$x$', fontsize=9)
axes[1].set_ylabel('density', fontsize=9)
axes[1].legend(frameon=False, fontsize=8)
fig.tight_layout(pad=1.3)
plt.close()
show_fig_svg(fig)
The histogram is not produced by independent draws. Consecutive points are correlated because each point is generated by modifying the previous state. The Markov chain is useful because its long-run distribution is the target distribution.
Proposal scale and mixing
The proposal standard deviation controls a tradeoff. If proposals are too small, most proposals are accepted but the chain moves slowly. If proposals are too large, many proposals jump into low-density regions and get rejected.
def autocorrelation(x, max_lag):
centered = x - np.mean(x)
denom = np.dot(centered, centered)
values = [1.0]
for lag in range(1, max_lag + 1):
values.append(float(np.dot(centered[:-lag], centered[lag:]) / denom))
return np.array(values)
rng_small = np.random.default_rng(10)
small_samples, small_accept = metropolis_hastings(
log_target_unnormalized, x0=-5.5, proposal_sd=0.12, n_steps=12_000, rng=rng_small
)
rng_large = np.random.default_rng(10)
large_samples, large_accept = metropolis_hastings(
log_target_unnormalized, x0=-5.5, proposal_sd=4.5, n_steps=12_000, rng=rng_large
)
small_kept = small_samples[2_000:]
large_kept = large_samples[2_000:]
print(f"Small proposal sd=0.12: acceptance={small_accept.mean():.1%}, sample sd={small_kept.std(ddof=1):.3f}")
print(f"Large proposal sd=4.50: acceptance={large_accept.mean():.1%}, sample sd={large_kept.std(ddof=1):.3f}")
Small proposal sd=0.12: acceptance=94.7%, sample sd=1.748
Large proposal sd=4.50: acceptance=34.2%, sample sd=1.765acf_main = autocorrelation(kept, 50)
acf_small = autocorrelation(small_kept, 50)
acf_large = autocorrelation(large_kept, 50)
lags = np.arange(len(acf_main))
fig, axes = plt.subplots(2, 2, figsize=(8.3, 5.4))
for ax in axes.flat:
style_axis(ax)
axes[0, 0].plot(small_samples[:2500], color=COL_PROPOSAL, linewidth=0.9)
axes[0, 0].set_title('Too small: slow random walk', loc='left', fontsize=10.5, color=COL_TEXT)
axes[0, 0].set_xlabel('iteration', fontsize=8.5)
axes[0, 0].set_ylabel('$x$', fontsize=9)
axes[0, 1].plot(large_samples[:2500], color=COL_REJECT, linewidth=0.9)
axes[0, 1].set_title('Too large: many rejections', loc='left', fontsize=10.5, color=COL_TEXT)
axes[0, 1].set_xlabel('iteration', fontsize=8.5)
axes[0, 1].set_ylabel('$x$', fontsize=9)
axes[1, 0].plot(lags, acf_small, color=COL_PROPOSAL, linewidth=1.7, label='sd=0.12')
axes[1, 0].plot(lags, acf_main, color=COL_SAMPLE, linewidth=1.7, label='sd=0.85')
axes[1, 0].set_title('Small steps stay correlated', loc='left', fontsize=10.5, color=COL_TEXT)
axes[1, 0].set_xlabel('lag', fontsize=8.5)
axes[1, 0].set_ylabel('autocorrelation', fontsize=9)
axes[1, 0].legend(frameon=False, fontsize=8)
axes[1, 1].plot(lags, acf_large, color=COL_REJECT, linewidth=1.7, label='sd=4.50')
axes[1, 1].plot(lags, acf_main, color=COL_SAMPLE, linewidth=1.7, label='sd=0.85')
axes[1, 1].set_title('Large steps reject often', loc='left', fontsize=10.5, color=COL_TEXT)
axes[1, 1].set_xlabel('lag', fontsize=8.5)
axes[1, 1].set_ylabel('autocorrelation', fontsize=9)
axes[1, 1].legend(frameon=False, fontsize=8)
fig.tight_layout(pad=1.2)
plt.close()
show_fig_svg(fig)
Summary
Metropolis-Hastings is useful when the target density is available up to proportionality:
The algorithm does not require direct samples from and does not require the normalizing constant . It only needs ratios of target density values, often implemented as differences in log density.
For symmetric random-walk proposals, the acceptance test is
For non-symmetric proposals, the proposal-density ratio must be included as well.