Bayesian Classification with Finite Hypotheses

In this lesson, we study how to make good classification decisions using Bayesian Decision Theory. The goal is to decide which group or class an observation belongs to, based on how likely each possibility is.

The Problem Setup

Suppose we observe a value \(X\) and we want to classify it into one of \(k\) different groups. If \(X\) comes from group \(i\), it has a known probability density \(f(x \,|\, \theta_i)\).

Here’s the setup:

  • The parameter space (possible group labels): \(\Theta = \{\theta_1, \ldots, \theta_k\}\)
  • The decision space (your options): \(D = \{d_1, \ldots, d_k\}\)

Your job: choose the best decision \(d_i\) based on the observed \(X\).


What Is the Loss Function?

The loss function tells us how bad a wrong decision is.

$$L(\theta_i, d_j) = \begin{cases} 1 & \text{if } i \ne j \\ 0 & \text{if } i = j \end{cases}$$

This means:

  • If you guess the right group (i.e., \(i = j\)), you lose nothing.

  • If you’re wrong, you lose 1 point.
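For \(k\) groups, this loss is just a \(k \times k\) table with zeros on the diagonal and ones everywhere else. A minimal Python sketch, only to make the bookkeeping concrete (the value of \(k\) below is a placeholder):

```python
import numpy as np

k = 3  # number of groups (placeholder value)

# 0-1 loss matrix: L[i, j] = loss when the true group is i and we decide d_j.
# Zeros on the diagonal (correct decision), ones off the diagonal (any error).
L = 1 - np.eye(k)

print(L[0, 0])  # 0.0 -- a correct classification costs nothing
print(L[0, 2])  # 1.0 -- any misclassification costs 1
```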


a) Risk Function of a Decision Rule \(\delta(X)\)

A decision rule \(\delta(X)\) tells you what decision to make when you observe \(X\).

The risk function is the average loss if the true parameter is \(\theta_i\):

\(R(\theta_i, \delta) = \mathbb{E}_{\theta_i}[L(\theta_i, \delta(X))] = P_{\theta_i}(\delta(X) \ne d_i) = 1 - P_{\theta_i}(\delta(X) = d_i)\)

So, the risk is just the probability of misclassifying the observation when the true group is \(\theta_i\).
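Because the risk under 0-1 loss is a misclassification probability, it can be estimated by simulation. Here is a minimal Python sketch; the sampling model \(X \sim \mathcal{N}(\theta_i, 1)\) is only an illustrative assumption at this point (the general setup allows any density \(f(x \,|\, \theta_i)\)), and `rule` stands for an arbitrary decision rule \(\delta\):

```python
import numpy as np

def estimate_risk(rule, theta_i, i, n=100_000, seed=0):
    """Monte Carlo estimate of R(theta_i, delta) = P_{theta_i}(delta(X) != d_i).

    `rule` maps an observation x to a decision index j; sampling from
    N(theta_i, 1) is an illustrative assumption, not part of the general setup.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta_i, scale=1.0, size=n)
    decisions = np.array([rule(xi) for xi in x])
    return np.mean(decisions != i)

# Example: a crude rule that decides d_0 for negative x and d_1 otherwise.
print(estimate_risk(lambda xi: 0 if xi < 0 else 1, theta_i=0.0, i=0))
```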


b) The Bayesian Decision Rule \(\delta_B(X)\)

Now let’s use Bayesian thinking. We combine prior knowledge (what we believe before seeing data) with what we observe.

We want to minimize the expected loss using the posterior probabilities.

For each possible decision \(d_i\), the posterior expected loss is:

\(\mathbb{E}_{\pi(\theta \,|\, X=x)}[L(\theta, d_i)] = \sum_{j} \pi(\theta_j \,|\, X = x)\, L(\theta_j, d_i) = \sum_{j \ne i} \pi(\theta_j \,|\, X = x) = 1 - \pi(\theta_i \,|\, X = x)\)

So, we should choose the \(d_i\) that maximizes the posterior probability \(\pi(\theta_i \,|\, X = x)\).

This leads to the Bayesian decision rule:

$$\delta_B(x) = d_i \text{ such that } \pi(\theta_i \,|\, x) = \max_j \pi(\theta_j \,|\, x)$$
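In code, applying \(\delta_B\) amounts to evaluating \(f(x \,|\, \theta_j)\,\pi(\theta_j)\) for each \(j\) and taking the argmax, since the posterior's normalizing constant is the same for every \(j\). A small Python sketch, where the list of densities and the prior are placeholders you would supply:

```python
import numpy as np

def bayes_rule(x, densities, prior):
    """Return the index j that maximizes the posterior pi(theta_j | x).

    `densities` is a list of callables f_j(x) and `prior` a vector of prior
    probabilities -- both are placeholders for whatever model you are using.
    """
    unnormalized = np.array([f(x) for f in densities]) * np.asarray(prior)
    return int(np.argmax(unnormalized))
```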


c) Example: Classifying Based on Normal Distributions

Imagine you’re given a value \(X\) that comes from a normal distribution depending on which group it belongs to:

  • \(X \sim \mathcal{N}(\theta_i, 1)\)
  • \(\Theta = \{-1, 0, 1\}\)
  • Prior probabilities: \(\pi = (0.3, 0.4, 0.3)\)

We want to decide which \(\theta\) value is most likely, based on observing \(x\).

We compare:

$$\pi(\theta_i \,|\, x) \propto f(x \,|\, \theta_i) \cdot \pi(\theta_i)$$

Let’s compare each pair of possible \(\theta\) values.

Comparing \(\theta = -1\) and \(\theta = 0\):

We prefer \(\theta = -1\) over \(\theta = 0\) when \(f(x \,|\, -1)\,\pi(-1) > f(x \,|\, 0)\,\pi(0)\), that is, when:

$$\frac{f(x \,|\, -1)}{f(x \,|\, 0)} > \frac{\pi(0)}{\pi(-1)} = \frac{0.4}{0.3} = \frac{4}{3}$$

Solving, where we set \(c := \ln\left(\frac{4}{3}\right) + \tfrac{1}{2} \approx 0.788\):

$$\frac{f(x \,|\, -1)}{f(x \,|\, 0)} = e^{-\frac{1}{2}\left[(x+1)^2 - x^2\right]} = e^{-x - \frac{1}{2}} > \frac{4}{3} \;\Longleftrightarrow\; x < -\ln\left(\frac{4}{3}\right) - \tfrac{1}{2} = -c$$

So, if \(x < -c\), choose \(\theta = -1\) over \(\theta = 0\).

Similarly, for \(\theta = 1\) vs. \(\theta = 0\), the likelihood ratio is \(e^{x - \frac{1}{2}}\), and the mirror-image calculation gives:

$$x > \ln\left(\frac{4}{3}\right) + \tfrac{1}{2} = c$$

Finally, since \(\theta = -1\) and \(\theta = 1\) have equal prior probability, \(\theta = -1\) beats \(\theta = 1\) exactly when \(x < 0\), so these two cutoffs determine the whole rule.

Final Classification Rule:

$$\delta_B(x) = \begin{cases} -1 & x < -c \\ 0 & -c \le x \le c \\ 1 & x > c \end{cases}$$
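As a sanity check, here is a short Python sketch of this closed-form rule together with a brute-force comparison against the general posterior-argmax rule from part (b); the normal densities are written out by hand so that only NumPy is needed:

```python
import numpy as np

thetas = np.array([-1.0, 0.0, 1.0])
prior = np.array([0.3, 0.4, 0.3])
c = np.log(4 / 3) + 0.5          # cutoff, approximately 0.788

def delta_B(x):
    """Closed-form Bayesian rule from the derivation above."""
    if x < -c:
        return -1.0
    elif x > c:
        return 1.0
    return 0.0

def delta_argmax(x):
    """Generic rule: maximize f(x | theta_j) * pi(theta_j)."""
    densities = np.exp(-0.5 * (x - thetas) ** 2)   # N(theta_j, 1), constant dropped
    return thetas[np.argmax(densities * prior)]

# The two rules should agree everywhere (up to ties exactly at +/- c).
grid = np.linspace(-3, 3, 601)
assert all(delta_B(x) == delta_argmax(x) for x in grid)
```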


Visualizing the Decision Rule

[Figure: the three decision regions of \(\delta_B(x)\) along the real line, separated at \(x = -c\) and \(x = c\).]

This plot shows the three decision regions. Based on the observed \(x\) value, we choose the group that is most likely according to the Bayesian rule.
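
If you want to reproduce a figure like this, a matplotlib sketch along the following lines (my own reconstruction, not the original plotting code) draws the unnormalized posteriors \(f(x \,|\, \theta_j)\,\pi(\theta_j)\) and marks the cutoffs at \(\pm c\); whichever curve is highest at a given \(x\) is the decision there:

```python
import numpy as np
import matplotlib.pyplot as plt

thetas, prior = np.array([-1.0, 0.0, 1.0]), np.array([0.3, 0.4, 0.3])
c = np.log(4 / 3) + 0.5
x = np.linspace(-4, 4, 801)

# Unnormalized posteriors f(x | theta_j) * pi(theta_j); the largest curve at
# each x determines the Bayesian decision.
for theta, p in zip(thetas, prior):
    plt.plot(x, p * np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2 * np.pi),
             label=rf"$\theta = {theta:g}$")

for cut in (-c, c):
    plt.axvline(cut, linestyle="--", color="gray")  # decision boundaries

plt.xlabel("x")
plt.ylabel(r"$f(x\,|\,\theta)\,\pi(\theta)$")
plt.legend()
plt.show()
```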