Bayesian Classification with Finite Hypotheses

In this lesson, we study how to make good classification decisions using Bayesian Decision Theory. The goal is to decide which group or class an observation belongs to, based on how likely each possibility is.

The Problem Setup

Suppose we observe a value \(X\) and we want to classify it into one of \(k\) different groups. If \(X\) comes from group \(i\), it has a known probability density \(f(x \,|\, \theta_i)\).

Here’s the setup:

  • The parameter space (possible group labels): \(\Theta = \{\theta_1, \ldots, \theta_k\}\)
  • The decision space (your options): \(D = \{d_1, \ldots, d_k\}\)

Your job: choose the best decision \(d_i\) based on the observed \(X\).


What Is the Loss Function?

The loss function tells us how bad a wrong decision is.

$$L(\theta_i, d_j) = \begin{cases} 1 & \text{if } i \ne j \\ 0 & \text{if } i = j \end{cases}$$

This means:

  • If you guess the right group (i.e., \(i = j\)), you lose nothing.

  • If you’re wrong, you lose 1 point.
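For \(k\) groups, this loss is just a \(k \times k\) table with zeros on the diagonal and ones everywhere else. A minimal Python sketch, only to make the bookkeeping concrete (the value of \(k\) below is a placeholder):

```python
import numpy as np

k = 3  # number of groups (placeholder value)

# 0-1 loss matrix: L[i, j] = loss when the true group is i and we decide d_j.
# Zeros on the diagonal (correct decision), ones off the diagonal (any error).
L = 1 - np.eye(k)

print(L[0, 0])  # 0.0 -- a correct classification costs nothing
print(L[0, 2])  # 1.0 -- any misclassification costs 1
```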


a) Risk Function of a Decision Rule \(\delta(X)\)

A decision rule \(\delta(X)\) tells you what decision to make when you observe \(X\).

The risk function is the average loss if the true parameter is \(\theta_i\):

\(R(\theta_i, \delta) = \mathbb{E}_{\theta_i}[L(\theta_i, \delta(X))] = P_{\theta_i}(\delta(X) \ne d_i) = 1 - P_{\theta_i}(\delta(X) = d_i)\)

So, the risk is just the probability of misclassifying the observation when the true group is \(\theta_i\).
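Because the risk under 0-1 loss is a misclassification probability, it can be estimated by simulation. Here is a minimal Python sketch; the sampling model \(X \sim \mathcal{N}(\theta_i, 1)\) is only an illustrative assumption at this point (the general setup allows any density \(f(x \,|\, \theta_i)\)), and `rule` stands for an arbitrary decision rule \(\delta\):

```python
import numpy as np

def estimate_risk(rule, theta_i, i, n=100_000, seed=0):
    """Monte Carlo estimate of R(theta_i, delta) = P_{theta_i}(delta(X) != d_i).

    `rule` maps an observation x to a decision index j; sampling from
    N(theta_i, 1) is an illustrative assumption, not part of the general setup.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta_i, scale=1.0, size=n)
    decisions = np.array([rule(xi) for xi in x])
    return np.mean(decisions != i)

# Example: a crude rule that decides d_0 for negative x and d_1 otherwise.
print(estimate_risk(lambda xi: 0 if xi < 0 else 1, theta_i=0.0, i=0))
```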


b) The Bayesian Decision Rule \(\delta_B(X)\)

Now let’s use Bayesian thinking. We combine prior knowledge (what we believe before seeing data) with what we observe.

We want to minimize the expected loss using the posterior probabilities.

For each possible decision \(d_i\), the posterior expected loss is:

\(\mathbb{E}_{\pi(\theta \,|\, X=x)}[L(\theta, d_i)] = \sum_{j} \pi(\theta_j \,|\, X = x)\, L(\theta_j, d_i) = \sum_{j \ne i} \pi(\theta_j \,|\, X = x) = 1 - \pi(\theta_i \,|\, X = x)\)

So, we should choose the \(d_i\) that maximizes the posterior probability \(\pi(\theta_i \,|\, X = x)\).

This leads to the Bayesian decision rule:

$$\delta_B(x) = d_i \text{ such that } \pi(\theta_i \,|\, x) = \max_j \pi(\theta_j \,|\, x)$$
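In code, applying \(\delta_B\) amounts to evaluating \(f(x \,|\, \theta_j)\,\pi(\theta_j)\) for each \(j\) and taking the argmax, since the posterior's normalizing constant is the same for every \(j\). A small Python sketch, where the list of densities and the prior are placeholders you would supply:

```python
import numpy as np

def bayes_rule(x, densities, prior):
    """Return the index j that maximizes the posterior pi(theta_j | x).

    `densities` is a list of callables f_j(x) and `prior` a vector of prior
    probabilities -- both are placeholders for whatever model you are using.
    """
    unnormalized = np.array([f(x) for f in densities]) * np.asarray(prior)
    return int(np.argmax(unnormalized))
```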


c) Example: Classifying Based on Normal Distributions

Imagine you’re given a value \(X\) that comes from a normal distribution depending on which group it belongs to:

  • \(X \sim \mathcal{N}(\theta_i, 1)\)
  • \(\Theta = \{-1, 0, 1\}\)
  • Prior probabilities: \(\pi = (0.3, 0.4, 0.3)\)

We want to decide which \(\theta\) value is most likely, based on observing \(x\).

We compare:

$$\pi(\theta_i \,|\, x) \propto f(x \,|\, \theta_i) \cdot \pi(\theta_i)$$

Let’s compare each pair of possible \(\theta\) values.

Comparing \(\theta = -1\) and \(\theta = 0\):

We prefer \(\theta = -1\) over \(\theta = 0\) when \(f(x \,|\, -1)\,\pi(-1) > f(x \,|\, 0)\,\pi(0)\), that is, when:

$$\frac{f(x \,|\, -1)}{f(x \,|\, 0)} > \frac{\pi(0)}{\pi(-1)} = \frac{0.4}{0.3} = \frac{4}{3}$$

Solving, where we set \(c := \ln\left(\frac{4}{3}\right) + \tfrac{1}{2} \approx 0.788\):

$$\frac{f(x \,|\, -1)}{f(x \,|\, 0)} = e^{-\frac{1}{2}\left[(x+1)^2 - x^2\right]} = e^{-x - \frac{1}{2}} > \frac{4}{3} \;\Longleftrightarrow\; x < -\ln\left(\frac{4}{3}\right) - \tfrac{1}{2} = -c$$

So, if \(x < -c\), choose \(\theta = -1\) over \(\theta = 0\).

Similarly, for \(\theta = 1\) vs. \(\theta = 0\), the likelihood ratio is \(e^{x - \frac{1}{2}}\), and the mirror-image calculation gives:

$$x > \ln\left(\frac{4}{3}\right) + \tfrac{1}{2} = c$$

Finally, since \(\theta = -1\) and \(\theta = 1\) have equal prior probability, \(\theta = -1\) beats \(\theta = 1\) exactly when \(x < 0\), so these two cutoffs determine the whole rule.

Final Classification Rule:

$$\delta_B(x) = \begin{cases} -1 & x < -c \\ 0 & -c \le x \le c \\ 1 & x > c \end{cases}$$
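As a sanity check, here is a short Python sketch of this closed-form rule together with a brute-force comparison against the general posterior-argmax rule from part (b); the normal densities are written out by hand so that only NumPy is needed:

```python
import numpy as np

thetas = np.array([-1.0, 0.0, 1.0])
prior = np.array([0.3, 0.4, 0.3])
c = np.log(4 / 3) + 0.5          # cutoff, approximately 0.788

def delta_B(x):
    """Closed-form Bayesian rule from the derivation above."""
    if x < -c:
        return -1.0
    elif x > c:
        return 1.0
    return 0.0

def delta_argmax(x):
    """Generic rule: maximize f(x | theta_j) * pi(theta_j)."""
    densities = np.exp(-0.5 * (x - thetas) ** 2)   # N(theta_j, 1), constant dropped
    return thetas[np.argmax(densities * prior)]

# The two rules should agree everywhere (up to ties exactly at +/- c).
grid = np.linspace(-3, 3, 601)
assert all(delta_B(x) == delta_argmax(x) for x in grid)
```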


Visualizing the Decision Rule

[Figure: the three decision regions of \(\delta_B(x)\) along the real line, separated at \(x = -c\) and \(x = c\).]

This plot shows the three decision regions. Based on the observed \(x\) value, we choose the group that is most likely according to the Bayesian rule.
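
If you want to reproduce a figure like this, a matplotlib sketch along the following lines (my own reconstruction, not the original plotting code) draws the unnormalized posteriors \(f(x \,|\, \theta_j)\,\pi(\theta_j)\) and marks the cutoffs at \(\pm c\); whichever curve is highest at a given \(x\) is the decision there:

```python
import numpy as np
import matplotlib.pyplot as plt

thetas, prior = np.array([-1.0, 0.0, 1.0]), np.array([0.3, 0.4, 0.3])
c = np.log(4 / 3) + 0.5
x = np.linspace(-4, 4, 801)

# Unnormalized posteriors f(x | theta_j) * pi(theta_j); the largest curve at
# each x determines the Bayesian decision.
for theta, p in zip(thetas, prior):
    plt.plot(x, p * np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2 * np.pi),
             label=rf"$\theta = {theta:g}$")

for cut in (-c, c):
    plt.axvline(cut, linestyle="--", color="gray")  # decision boundaries

plt.xlabel("x")
plt.ylabel(r"$f(x\,|\,\theta)\,\pi(\theta)$")
plt.legend()
plt.show()
```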