Introduction

Statisticians draw data from a subset of the population, known as a sample, which should be representative of the whole. Analyses are then conducted on the sample with the intention of making reliable inferences about the entire population.

This raises a fundamental question: how do we make such inferences? This is where two major schools of thought in statistics diverge: the frequentist and the Bayesian approaches. Each provides a distinct perspective on how to draw conclusions from data.

In this article, we take a journey from the classical frequentist approach, where all inference is derived solely from the data, to the more modern and flexible Bayesian framework, which integrates prior knowledge with data to inform decision-making. Understanding this transition is crucial to appreciating how Bayesian statistics extends and enriches classical methods.

1. A Motivating Example

Consider the task of estimating the average age of residents in Düsseldorf, Germany. Gathering age data from every resident would be infeasible. Instead, a representative sample of ages, say \(X_1, X_2, \ldots, X_n\), is collected. Based on this sample, we aim to estimate the average age \(\theta\) of the population.

This brings us back to our fundamental question: how should we estimate \(\theta\)? The answer depends on the statistical philosophy adopted.

2. The Frequentist Perspective

The frequentist approach assumes that all the information required to estimate the parameter \(\theta\) is contained in the observed sample data. Under this framework, \(\theta\) is treated as a fixed but unknown quantity, and the randomness lies solely in the data.

A natural estimate for the population mean \(\theta\) is the sample mean:

\(\hat{\theta}_{\text{freq}} = \displaystyle \frac{1}{n} \sum_{i=1}^n X_i\)

This estimate is intuitive and widely used because, under standard conditions, it is unbiased and consistent. The key point is that the frequentist method relies exclusively on the observed data to make inferences.
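As a minimal sketch of this estimator (using NumPy and synthetic ages invented purely for illustration):

```python
import numpy as np

# Synthetic sample of ages (hypothetical data, for illustration only)
rng = np.random.default_rng(42)
ages = rng.normal(loc=44.0, scale=12.0, size=200)  # n = 200 sampled residents

# Frequentist point estimate: the sample mean
theta_freq = ages.mean()
print(f"Frequentist estimate of the average age: {theta_freq:.2f}")
```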

3. The Bayesian Perspective

The Bayesian approach, in contrast, posits that the data alone do not capture all the information about the unknown parameter \(\theta\). Instead, it assumes that we also possess some prior knowledge or belief about \(\theta\), which may come from previous studies, expert opinion, or historical data. This belief is formalized as a prior distribution \(p(\theta)\).

Once data \(X = (X_1, X_2, \ldots, X_n)\) are observed, the Bayesian approach combines this new information with the prior knowledge using Bayes’ Theorem to form the posterior distribution:

\(p(\theta | X) = \displaystyle \frac{p(X | \theta) p(\theta)}{p(X)}\)

Here:

  • \(p(\theta)\) is the prior distribution,

  • \(p(X | \theta)\) is the likelihood of the observed data given the parameter,

  • \(p(X)\) is the marginal likelihood of the data, which acts as a normalizing constant, and

  • \(p(\theta | X)\) is the posterior distribution, which reflects our updated belief about \(\theta\) after observing the data.

Stated more formally, Bayesian statistics begins with prior information about the unknown parameter \(\theta\). After data are collected from a sample, this information is updated through the likelihood of the observed data, resulting in the posterior distribution. The posterior thus synthesizes prior belief and new evidence into a single probabilistic statement about the parameter.

This approach acknowledges that our knowledge about \(\theta\) evolves as new data become available.
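To make this update concrete, here is a small sketch that applies Bayes' Theorem numerically over a grid of candidate values for \(\theta\). The prior, the data, and all numbers are illustrative assumptions, not part of any real survey:

```python
import numpy as np
from scipy.stats import norm

# Grid of candidate values for theta (the unknown average age)
theta_grid = np.linspace(20.0, 70.0, 1001)
dtheta = theta_grid[1] - theta_grid[0]

# Prior: belief that the average age is around 40 (an illustrative assumption)
prior = norm.pdf(theta_grid, loc=40.0, scale=5.0)

# A small hypothetical sample, assumed N(theta, sigma^2) with known sigma
sigma = 12.0
ages = np.array([38.0, 51.0, 45.0, 42.0, 56.0, 47.0])

# Likelihood p(X | theta), evaluated at every grid point
likelihood = norm.pdf(ages[:, None], loc=theta_grid, scale=sigma).prod(axis=0)

# Bayes' Theorem: posterior is proportional to likelihood x prior;
# normalizing over the grid removes the constant p(X)
unnormalized = likelihood * prior
posterior = unnormalized / (unnormalized.sum() * dtheta)

posterior_mean = (theta_grid * posterior).sum() * dtheta
print(f"Posterior mean of theta: {posterior_mean:.2f}")
```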

A Simple Bayesian Example

Assume that the ages \(X_1, X_2, \ldots, X_n\) are independent and normally distributed with mean \(\theta\) and known variance \(\sigma^2\):

\(X_i \sim \mathcal{N}(\theta, \sigma^2)\)

Suppose our prior belief about \(\theta\) is also normally distributed:

\(\theta \sim \mathcal{N}(\mu_0, \tau^2)\)

Then, because the normal prior is conjugate to the normal likelihood, the posterior distribution \(p(\theta | X)\) is also normal, with parameters:

$$\mu_n = \frac{\dfrac{n}{\sigma^2}\,\bar{X} + \dfrac{1}{\tau^2}\,\mu_0}{\dfrac{n}{\sigma^2} + \dfrac{1}{\tau^2}}, \qquad \tau_n^2 = \left(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1}$$

Here, \(\bar{X}\) is the sample mean. The posterior mean \(\mu_n\) represents a weighted average of the prior mean \(\mu_0\) and the sample mean \(\bar{X}\), with weights determined by the relative precision (inverse variance) of the prior and the data.
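These closed-form expressions translate directly into code. Below is a minimal sketch; the function name normal_posterior is our own choice, introduced only for illustration:

```python
import numpy as np

def normal_posterior(x, sigma, mu0, tau):
    """Posterior N(mu_n, tau_n^2) for a normal mean with known variance
    sigma^2 and a normal prior N(mu0, tau^2)."""
    n = len(x)
    precision_data = n / sigma**2   # precision contributed by the sample
    precision_prior = 1 / tau**2    # precision contributed by the prior
    tau_n_sq = 1 / (precision_data + precision_prior)
    mu_n = tau_n_sq * (precision_data * np.mean(x) + precision_prior * mu0)
    return mu_n, tau_n_sq
```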

Interpretation

  • When the prior variance \(\tau^2\) is large (i.e., the prior is vague or non-informative), the posterior mean approaches the sample mean, as demonstrated in the sketch below.

  • When the sample size \(n\) is small, the prior has more influence, helping to stabilize the estimate.

  • The posterior distribution provides a full probabilistic description of uncertainty about \(\theta\).
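The first two points can be checked numerically by reusing the normal_posterior sketch defined above (all numbers hypothetical):

```python
ages = np.array([38.0, 51.0, 45.0, 42.0, 56.0, 47.0])  # sample mean = 46.5

# Vague prior (large tau): posterior mean is pulled toward the sample mean
print(normal_posterior(ages, sigma=12.0, mu0=40.0, tau=100.0))

# Informative prior with a small sample: posterior mean stays near mu0 = 40
print(normal_posterior(ages[:2], sigma=12.0, mu0=40.0, tau=2.0))
```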

Conclusion

Both frequentist and Bayesian approaches offer valuable frameworks for statistical inference. The frequentist approach relies solely on the data, while the Bayesian approach enriches data analysis by incorporating prior beliefs. In practice, Bayesian methods offer greater flexibility and interpretability, particularly when prior information is meaningful and well-justified.

This article provides a conceptual bridge between the two paradigms and lays the groundwork for deeper explorations into Bayesian statistics.

In the next edition, we will explore additional motivating examples.

Stay connected with us at 3 D Statistical Learning as we continue our journey into the world of Bayesian Statistics.

We would like to thank Dr. Dany Djeudeu once again for his expert contributions to this educational series.