Introduction
In this sixth edition, we take a deeper look into the structure of Bayesian models by introducing the concept of conjugate prior distributions. These are prior distributions that lead to posterior distributions in the same family, offering mathematical convenience and analytical tractability.
We’ll begin with a general definition, then explore key examples of naturally conjugate models used throughout Bayesian inference. Understanding these models is essential for applying Bayesian methods efficiently in real-world problems.
What Are Conjugate Priors?
Definition
Let:
- \(\mathcal{F}\) be a class of likelihood functions \(p(y \mid \theta)\),
- \(\mathcal{P}\) be a class of prior distributions \(p(\theta)\).
Then \(\mathcal{P}\) is said to be conjugate to \(\mathcal{F}\) if for all \(p(\cdot \mid \theta) \in \mathcal{F}\) and all \(p(\cdot) \in \mathcal{P}\), the posterior \(p(\theta \mid y)\) also belongs to \(\mathcal{P}\).
Why Are Conjugate Priors So Important?
- Conjugate priors simplify computation: posterior distributions can be computed analytically.
- They provide closed-form posterior expressions for many standard models.
- The functional form of the prior mirrors that of the likelihood, making parameter updates both interpretable and mathematically efficient.
Common Conjugate Models
Single-Parameter Models
| Model | Likelihood | Prior | Posterior |
|---|---|---|---|
| Binomial-Beta | \(Y \sim \text{Bin}(n,\theta)\) | \(\theta \sim \text{Beta}(\alpha, \beta)\) | \(\theta \mid y \sim \text{Beta}(\alpha + y, \beta + n - y)\) |
| NegativeBinomial-Beta | \(Y \sim \text{NegBin}(r, \theta)\) (failures before the \(r\)-th success) | \(\theta \sim \text{Beta}(\alpha, \beta)\) | \(\theta \mid y \sim \text{Beta}(\alpha + r, \beta + y)\) |
| Poisson-Gamma | \(Y \sim \text{Poisson}(\theta)\) | \(\theta \sim \text{Gamma}(\alpha, \beta)\) | \(\theta \mid y \sim \text{Gamma}(\alpha + y, \beta + 1)\) |
| Exponential-Gamma | \(Y \sim \text{Exp}(\theta)\) | \(\theta \sim \text{Gamma}(\alpha, \beta)\) | \(\theta \mid y \sim \text{Gamma}(\alpha + 1, \beta + y)\) |
| Normal-Normal | \(Y \sim \mathcal{N}(\theta, \sigma^2)\), \(\sigma^2\) known | \(\theta \sim \mathcal{N}(\mu, \tau^2)\) | \(\theta \mid y \sim \mathcal{N}(\mu_*, \tau_*^2)\), derived below |
| Normal-Inverse-Gamma | \(Y \sim \mathcal{N}(\theta, \sigma^2)\) with unknown \(\sigma^2\) | Hierarchical prior: \(\theta \mid \sigma^2 \sim \mathcal{N}(\mu, v\sigma^2)\), \(\sigma^2 \sim \text{InvGamma}(\alpha, \beta)\) | Full posterior is Normal-Inverse-Gamma |
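To see the table in action, here is a minimal base-R sketch for the Poisson-Gamma row; it compares a brute-force grid approximation of the posterior with the analytic \(\text{Gamma}(\alpha + y, \beta + 1)\) result (the values \(\alpha = 3\), \(\beta = 1\), \(y = 7\) are arbitrary illustrations):
# Prior: theta ~ Gamma(alpha, beta); data: a single count y ~ Poisson(theta)
alpha <- 3; beta <- 1; y <- 7
# Grid approximation: prior x likelihood, normalized to integrate to 1
theta <- seq(0.01, 20, length.out = 2000)
h <- diff(theta)[1]                                  # grid spacing
unnorm <- dgamma(theta, alpha, rate = beta) * dpois(y, theta)
grid_post <- unnorm / sum(unnorm * h)
# Analytic conjugate posterior: Gamma(alpha + y, beta + 1)
analytic <- dgamma(theta, alpha + y, rate = beta + 1)
max(abs(grid_post - analytic))                       # numerically negligible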
Multivariate Models
Multinomial-Dirichlet:
$$\mathbf{Y} \sim \text{Multinomial}(n, \boldsymbol{\theta}), \quad \boldsymbol{\theta} \sim \text{Dirichlet}(\boldsymbol{\alpha})$$
$$\Rightarrow \boldsymbol{\theta} \mid \mathbf{Y} \sim \text{Dirichlet}(\boldsymbol{\alpha} + \mathbf{Y})$$
Multivariate Normal:
$$\mathbf{Y} \sim \mathcal{N}_k(\boldsymbol{\theta}, \Sigma), \quad \boldsymbol{\theta} \sim \mathcal{N}_k(\boldsymbol{\mu}, V)$$
The posterior is again multivariate normal; for a single observation \(\mathbf{y}\), the posterior precision is the sum of the data and prior precisions:
$$\boldsymbol{\theta} \mid \mathbf{y} \sim \mathcal{N}_k\!\left( \left(\Sigma^{-1} + V^{-1}\right)^{-1}\!\left(\Sigma^{-1} \mathbf{y} + V^{-1} \boldsymbol{\mu}\right),\ \left(\Sigma^{-1} + V^{-1}\right)^{-1} \right)$$
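The following lines sketch this update in R for a single observed vector \(\mathbf{y}\); the covariance matrices and prior mean are illustrative choices, not tied to any particular application:
# Known data covariance, prior covariance, prior mean, observation (illustrative)
Sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)
V     <- diag(2)
mu    <- c(0, 0)
y     <- c(1.5, -0.5)
# Precision-weighted conjugate update
P_post  <- solve(Sigma) + solve(V)                            # posterior precision
V_post  <- solve(P_post)                                      # posterior covariance
mu_post <- V_post %*% (solve(Sigma) %*% y + solve(V) %*% mu)  # posterior mean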
Example: Normal-Normal Conjugate Model
Suppose:
- \(Y \mid \theta \sim \mathcal{N}(\theta, \sigma^2)\), with \(\sigma^2\) known.
- Prior: \(\theta \sim \mathcal{N}(\mu, \tau^2)\)
Then:
$$\theta \mid y \sim \mathcal{N}(\mu_*, \tau_*^2)$$
with:
$$\mu_* = \frac{\tau^2 y + \sigma^2 \mu}{\tau^2 + \sigma^2}, \quad \tau_*^2 = \frac{\tau^2 \sigma^2}{\tau^2 + \sigma^2}$$
The posterior mean \(\mu_*\) is a weighted average of the prior mean \(\mu\) and the observation \(y\), with weights determined by the relative precisions of the prior and the data.
🔍 Insight: The more precise the data (i.e., smaller \(\sigma^2\)), the more influence it has on the posterior mean.
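A few lines of R make the update concrete; the numbers are purely illustrative:
# Prior mean/variance, known sampling variance, single observation (illustrative)
mu <- 0; tau2 <- 4
sigma2 <- 1
y <- 2.5
mu_star   <- (tau2 * y + sigma2 * mu) / (tau2 + sigma2)   # posterior mean: 2.0
tau2_star <- (tau2 * sigma2) / (tau2 + sigma2)            # posterior variance: 0.8
c(mu_star, tau2_star)
# Because sigma2 < tau2, the posterior mean sits much closer to y than to mu.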
Example: Multinomial-Dirichlet Model
Generalizing the Binomial-Beta model to \(k\) categories:
Likelihood:
$$p(\mathbf{Y} \mid \boldsymbol{\theta}) = {n \choose y_1, \ldots, y_k} \theta_1^{y_1} \cdots \theta_k^{y_k}$$
Prior:
$$p(\boldsymbol{\theta}) = \frac{\Gamma(\sum_j \alpha_j)}{\prod_j \Gamma(\alpha_j)} \prod_j \theta_j^{\alpha_j - 1}$$
Posterior:
$$\boldsymbol{\theta} \mid \mathbf{Y} \sim \text{Dirichlet}(\alpha_1 + y_1, \ldots, \alpha_k + y_k)$$
💡 This conjugate structure makes updating beliefs over multinomial outcomes straightforward.
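As a sketch, posterior draws of \(\boldsymbol{\theta}\) can be generated in base R via the standard Gamma-normalization construction of the Dirichlet (the prior and counts below are illustrative):
# Dirichlet prior, multinomial counts (n = 20), conjugate update
alpha <- c(1, 1, 1)
y     <- c(12, 5, 3)
alpha_post <- alpha + y
# Normalized independent Gamma(a_j, 1) draws are Dirichlet(a)
rdirichlet_one <- function(a) {
  g <- rgamma(length(a), shape = a, rate = 1)
  g / sum(g)
}
draws <- t(replicate(4000, rdirichlet_one(alpha_post)))
colMeans(draws)               # close to the exact posterior means:
alpha_post / sum(alpha_post)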
Why Conjugate Priors Matter
Conjugate priors provide several key advantages in Bayesian analysis. They yield analytical posteriors without numerical integration, and the resulting updates are easy to interpret: the hyperparameters act like prior "pseudo-observations" that the data simply augment. Conjugate families also provide a foundation for hierarchical modeling and make Bayesian predictive distributions tractable. This combination of interpretability and mathematical convenience makes them invaluable for both theoretical development and practical implementation.
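To make the predictive point concrete, here is a short simulation sketch of the posterior predictive distribution under the Binomial-Beta model, using the \(\text{Beta}(16, 8)\) posterior that arises in the next section's example; the choice of \(m = 10\) future trials is arbitrary:
# Posterior predictive for m = 10 future trials, posterior theta ~ Beta(16, 8)
set.seed(42)
theta_draws <- rbeta(10000, 16, 8)                     # posterior draws of theta
y_new <- rbinom(10000, size = 10, prob = theta_draws)  # predictive draws
table(y_new) / 10000                                   # approximate predictive pmf
# Analytically, y_new follows a Beta-Binomial(10, 16, 8) distribution.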
Simulation and Visualization in R: Plotting Prior and Posterior Distributions
# Load required packages: tibble, tidyr (pivot_longer), dplyr (%>%), ggplot2
library(tidyverse)
# Parameters
alpha_prior <- 2
beta_prior <- 2
n <- 20   # number of trials
y <- 14   # number of successes
# Posterior parameters from the conjugate Beta update
alpha_post <- alpha_prior + y
beta_post <- beta_prior + n - y
# Create theta grid
theta <- seq(0, 1, length.out = 1000)
# Prior and posterior densities on the grid
prior <- dbeta(theta, alpha_prior, beta_prior)
posterior <- dbeta(theta, alpha_post, beta_post)
# Combine into a long-format data frame for plotting
df <- tibble(
  theta = theta,
  Prior = prior,
  Posterior = posterior
) %>%
  pivot_longer(-theta, names_to = "Distribution", values_to = "Density")
# Create visualization
ggplot(df, aes(x = theta, y = Density, color = Distribution)) +
  geom_line(linewidth = 1.2) +
  labs(
    title = "Binomial-Beta Conjugate Prior Updating",
    subtitle = "Prior: Beta(2,2), Data: 14 successes in 20 trials",
    x = expression(theta),
    y = "Density"
  ) +
  theme_minimal() +
  scale_color_manual(values = c("steelblue", "firebrick"))
Conclusion
This edition formalized the theory and practice of conjugate priors in Bayesian inference. From the intuitive Binomial-Beta to the flexible Normal-Inverse-Gamma, conjugate models allow us to update beliefs coherently and efficiently.
Understanding conjugate priors provides the mathematical foundation necessary for more advanced Bayesian modeling techniques. These distributions serve as building blocks for complex hierarchical models and offer computational advantages that make Bayesian analysis tractable in many real-world applications.
In the next edition, we will take a closer look at other commonly used priors.
Keep exploring with 3 D Statistical Learning.
We thank Dr. Dany Djeudeu for his dedication to making complex statistical ideas accessible, rigorous, and inspiring.
We help businesses and researchers solve complex challenges by providing expert guidance in statistics, machine learning, and tailored education.
Our core services include:
– Statistical Consulting:
Comprehensive consulting tailored to your data-driven needs.
– Training and Coaching:
In-depth instruction in statistics, machine learning, and the use of statistical software such as SAS, R, and Python.
– Reproducible Data Analysis Pipelines:
Development of documented, reproducible workflows using SAS macros and customized R and Python code.
– Interactive Data Visualization and Web Applications:
Creation of dynamic visualizations and web apps with R (Shiny, Plotly), Python (Streamlit, Dash by Plotly), and SAS (SAS Viya, SAS Web Report Studio).
– Automated Reporting and Presentation:
Generation of automated reports and presentations using Markdown and Quarto.
– Scientific Data Analysis:
Advanced analytical support for scientific research projects.