Introduction

In the first edition of this series, we introduced the fundamental distinction between the frequentist and Bayesian paradigms, using the estimation of a population mean as a guiding example. We explored how the Bayesian framework integrates prior information with observed data, forming a posterior distribution that reflects an updated belief about the parameter of interest.

In this second edition, we extend those foundational ideas by working through realistic motivating examples. These examples are designed not only to reinforce key concepts but also to demonstrate the practical power and flexibility of Bayesian thinking in diverse contexts.

Our goal remains the same: to make Bayesian statistics accessible, intuitive, and relevant.

1. Example 1: Confidence Intervals vs. Credible Intervals

Let us now revisit the estimation of a population mean, but this time focusing on how uncertainty is quantified.

Suppose we collect a sample of size \(n = 36\) from a normal distribution with unknown mean \(\theta\) and known variance \(\sigma^2 = 1\). The sample mean is \(\bar{X} = 2\).

1.1 Frequentist Confidence Interval

A 95% confidence interval for \(\theta\) is constructed as:

\(\bar{X} \pm z_{0.975} \cdot \displaystyle \frac{\sigma}{\sqrt{n}} = 2 \pm 1.96 \cdot \displaystyle \frac{1}{6} = (1.673, 2.327)\)

Interpretation (Frequentist):

This interval does not mean that the probability that \(\theta\) lies in (1.673, 2.327) is 95%. Rather, if we were to repeat the experiment many times and construct a 95% interval each time, about 95% of those intervals would contain the true \(\theta\).

This distinction is often misunderstood: frequentist confidence intervals refer to long-run coverage, not to the probability that any particular interval contains the parameter.
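As a sanity check, the interval above is easy to reproduce numerically. Here is a minimal Python sketch using the numbers from this example (variable names are illustrative):

```python
import math

# Values from the example: n = 36 draws from N(theta, 1) with sample mean 2.
n, sigma, xbar = 36, 1.0, 2.0
z = 1.96  # z_{0.975}: the 97.5th percentile of the standard normal

# 95% confidence interval: xbar +/- z * sigma / sqrt(n)
half_width = z * sigma / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
print(tuple(round(v, 3) for v in ci))  # (1.673, 2.327)
```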

1.2 Bayesian Credible Interval

Now consider a Bayesian approach. Assume the prior for \(\theta\) is normal: \(\theta \sim \mathcal{N}(0, 1)\). The likelihood is \(\bar{X} \sim \mathcal{N}(\theta, 1/36)\).

The posterior distribution is:

\(\theta | \bar{X} \sim \mathcal{N}(\mu_n, \tau_n^2), \quad \text{with}\)

\(\mu_n = \left(\displaystyle \frac{36}{1} \cdot \bar{X} + \displaystyle\frac{1}{1} \cdot 0\right) \Big/ \left(36 + 1\right) = \displaystyle \frac{72}{37} \approx 1.946\)

\(\tau_n^2 = \left(36 + 1\right)^{-1} = \frac{1}{37}\)

So the posterior distribution is \(\mathcal{N}(1.946, 1/37)\), and a 95% credible interval is:

\(\mu_n \pm 1.96 \cdot \displaystyle \sqrt{\tau_n^2} = 1.946 \pm 1.96 \cdot \displaystyle \sqrt{1/37} \approx (1.62, 2.27)\)

Interpretation (Bayesian):
There is a 95% probability that \(\theta\) lies in the interval (1.62, 2.27), given the observed data and the prior belief.
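The conjugate update above can be verified in a few lines of Python; this is a sketch assuming the same prior \(\mathcal{N}(0, 1)\) and likelihood as in the text:

```python
import math

xbar = 2.0
prior_mean, prior_prec = 0.0, 1.0   # prior N(0, 1); precision = 1 / variance
data_prec = 36.0                    # n / sigma^2 = 36 / 1

# Precision-weighted conjugate update for a normal mean with known variance
post_prec = prior_prec + data_prec                                    # 37
post_mean = (data_prec * xbar + prior_prec * prior_mean) / post_prec  # 72/37
post_sd = math.sqrt(1.0 / post_prec)

# 95% credible interval from the normal posterior
ci = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)
print(round(post_mean, 3), tuple(round(v, 2) for v in ci))
```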

1.3 Summary

Framework      Interval Type          Interpretation
Frequentist    Confidence Interval    95% of such intervals (over repeated sampling) will contain the true \(\theta\).
Bayesian       Credible Interval      There is a 95% probability that \(\theta\) lies in this specific interval.

This example shows how Bayesian reasoning offers more intuitive probabilistic interpretations of uncertainty.

2. Example 2: Estimating the Probability of Success with a Beta-Binomial Model

Suppose you are monitoring the reliability of a newly manufactured device. You test it across 20 trials and observe 16 successes. What can you say about the probability of success \(\theta\) of the device?

2.1 Frequentist Estimation

A frequentist would likely use the observed proportion of successes as the point estimate:

\(\hat{\theta}_{\text{freq}} = \frac{16}{20} = 0.8\)

This is intuitive and straightforward. However, the point estimate alone carries no measure of uncertainty (a confidence interval can be attached, but it inherits the long-run interpretation discussed in Example 1) and offers no way to incorporate prior knowledge.

2.2 Bayesian Estimation

Suppose you have prior experience with similar devices and believe the success rate \(\theta\) should be around 0.7. You express this belief using a Beta distribution:

\(\theta \sim \text{Beta}(\alpha = 7, \beta = 3)\)

This prior reflects a belief centered at 0.7 with moderate certainty.

Now, the likelihood of observing 16 successes in 20 trials is given by the binomial distribution:

\(X | \theta \sim \text{Binomial}(n = 20, \theta)\)

The conjugate prior for the binomial likelihood is the beta distribution, which leads to a Beta posterior:

\(\theta | X \sim \text{Beta}(\alpha + x, \beta + n - x) = \text{Beta}(23, 7)\)

Posterior Interpretation

  • Posterior mean: \(\displaystyle \frac{23}{23 + 7} \approx 0.767\)

  • The posterior reflects a compromise between the prior belief (centered at 0.7) and the observed data (0.8).

  • The more data we observe, the less influence the prior has; this diminishing influence of the prior is a key property of Bayesian inference.
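The update above is simple enough to check in a few lines of Python; this sketch uses the prior and data from this example (the full posterior density could also be summarized with `scipy.stats.beta`):

```python
# Beta-Binomial conjugate update with the prior and data from the example.
alpha0, beta0 = 7, 3   # Beta(7, 3) prior, centered at 0.7
x, n = 16, 20          # 16 successes observed in 20 trials

# Posterior: Beta(alpha0 + x, beta0 + n - x)
alpha_post = alpha0 + x         # 23
beta_post = beta0 + (n - x)     # 7
post_mean = alpha_post / (alpha_post + beta_post)
print(alpha_post, beta_post, round(post_mean, 3))  # 23 7 0.767
```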

3. Example 3: Updating Beliefs in Clinical Trials

Imagine you’re analyzing the effectiveness of a new drug. Previous studies suggest a small but positive treatment effect. You model this prior belief using a normal distribution:

\(\theta \sim \mathcal{N}(0.5, 0.25^2)\)

You conduct a trial and observe a treatment effect of 0.7 with a standard error of 0.2. The likelihood of the observed data can be modeled as:

\(\bar{X} \sim \mathcal{N}(\theta, 0.2^2)\)

Applying Bayesian updating, the posterior distribution is:

\(\theta | \bar{X} \sim \mathcal{N}(\mu_n, \tau_n^2)\)

with

$$\mu_n = \left(\displaystyle \frac{\bar{X}}{\sigma^2} + \displaystyle \frac{\mu_0}{\tau^2} \right) \Big/ \left(\frac{1}{\sigma^2} + \displaystyle \frac{1}{\tau^2} \right), \quad \tau_n^2 = \left( \displaystyle \frac{1}{\sigma^2} + \frac{1}{\tau^2} \right)^{-1}$$

Substituting values:

\(\mu_n = \left(\displaystyle \frac{0.7}{0.04} + \displaystyle \frac{0.5}{0.0625} \right) \Big/ \left(\displaystyle \frac{1}{0.04} + \displaystyle \frac{1}{0.0625} \right) = \displaystyle \frac{17.5 + 8}{25 + 16} = \displaystyle \frac{25.5}{41} \approx 0.622\)

  • The posterior mean is between the prior and the observed effect.

  • The posterior variance \(\tau_n^2\) is smaller than either the prior or data variance, indicating reduced uncertainty.
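The same precision-weighted update used in Example 1 applies here; a minimal Python sketch with this example's numbers:

```python
# Normal-Normal update for the trial example: prior N(0.5, 0.25^2),
# observed effect 0.7 with standard error 0.2.
xbar, se = 0.7, 0.2
mu0, tau = 0.5, 0.25

prec_data = 1 / se**2    # 1 / 0.04   = 25
prec_prior = 1 / tau**2  # 1 / 0.0625 = 16

# Posterior variance and precision-weighted posterior mean
post_var = 1 / (prec_data + prec_prior)                       # 1/41
post_mean = (xbar * prec_data + mu0 * prec_prior) * post_var  # 25.5/41
print(round(post_mean, 3), round(post_var, 4))  # 0.622 0.0244
```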

This example illustrates how Bayesian methods naturally account for prior beliefs, handle uncertainty, and provide updated estimates in a transparent way.

4. Why These Examples Are Important

Each of these motivating examples highlights a unique strength of the Bayesian framework:

  • The interval comparison shows that credible intervals support a direct probability statement about \(\theta\), which confidence intervals do not.

  • The Beta-Binomial model shows how prior and observed proportions combine intuitively.

  • The Normal-Normal model for clinical trials illustrates how Bayesian updating reduces uncertainty.

  • In each case, the Bayesian approach delivers a posterior distribution that describes not just a point estimate but the entire distribution of plausible parameter values.

These examples demonstrate that Bayesian inference is not only a theoretical framework but a practical tool for reasoning under uncertainty in real-world settings.

Conclusion

This second edition reinforces the core idea that Bayesian methods are natural, interpretable, and adaptive. By incorporating prior knowledge and rigorously updating beliefs with data, Bayesian statistics empowers analysts to make informed decisions even in the face of limited information.

In the next edition, we will explore model comparison and selection using Bayesian tools such as Bayes Factors and posterior predictive checks.


Stay connected with us at 3 D Statistical Learning as we continue our journey into the world of Bayesian Statistics.

We thank Dr. Dany Djeudeu for his continued guidance and inspiration in making Bayesian statistics accessible to everyone.