1. Basic Idea

Statistics is the science of learning from data. It has two main purposes:

  • To describe what has been observed.
  • To make inferences about the wider world beyond the observed data.

Often, we are interested in understanding a population, the entire group we want to study. Because collecting data from the whole population is usually impractical (too costly, too time-consuming, or simply impossible), we work with a sample. From this sample, we estimate population characteristics or test assumptions about them.

Traditional statistics therefore focuses on:

  • Estimating parameters (e.g., the population mean or proportion).
  • Testing hypotheses (e.g., whether a parameter takes a certain value).

New Focus:

Decision theory builds on this foundation but takes a broader view.
Here, the central goal is not only to estimate or test, but to make decisions under uncertainty, based on the available data and prior knowledge.


Example: Bringing a New Painkiller to Market

A pharmaceutical company is considering launching a new painkiller.

Parameters:

  • \(θ_1\): proportion of the population for which the painkiller is effective
  • \(θ_2\): expected market share of the painkiller

In classical statistics, one would estimate these parameters or test hypotheses.

In decision theory, the company must make practical decisions:

  • Should we launch the product?
  • At what price?
  • How much should we invest in marketing?

2. Two New Key Ingredients

  1. Consequences of decisions → We need to assess gain or loss associated with actions.
  2. Prior knowledge about parameters (from databases, expert opinion, or prior studies) → Leads to Bayesian thinking.

3. Foundations of Statistical Decision Theory

Traditional statistics rests on three pillars:

  • Estimation
  • Testing
  • Confidence intervals

Decision theory offers a common framework for all of them.

In this course, we will mainly focus on estimation and testing.


4. The Statistical Decision Problem

Start: We observe a random variable \(X\) with unknown distribution \(F\).

Goal: Learn about \(F\) (parameters, probabilities, etc.).

More formally:

  • \(F \in \mathfrak{P}\), where \(\mathfrak{P}\) is a family of probability distributions

Examples:

  • Nonparametric: \(F\) is arbitrary
  • Parametric: \(F(x, \theta)\) with \(\theta \in \Theta\)
Example families:

  1. \(\mathfrak{P}\) = the set of all continuous distributions (nonparametric)
  2. \(\mathfrak{P} = \{N(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 > 0\}\) (parametric)
  3. \(\mathfrak{P} = \{Bin(n, p) : p \in (0, 1)\}\) (parametric)

Adding Sample Information

We often observe data: realizations \(x_1, …, x_n\) of random variables \(X_1, …, X_n\).

  • Assume: \(X_1, …, X_n\) are i.i.d.
  • Let \(\vec{X} = (X_1, …, X_n)'\), and \(\vec{x}\) its observed values.

Based on \(\vec{x}\), we make a decision \(d\) (also called an action).


Decision Examples

Example: Hypothesis Testing

  • \(d_0\): accept \(H_0\): \(\theta \in \Theta_0\)
  • \(d_1\): accept \(H_1\): \(\theta \in \Theta_1\)

The decision set \(D\) contains all possible decisions.

Other Examples

  • Estimation: \(D = \Theta\)

Note:

  • A decision can be correct or incorrect
  • Mistakes can have different “costs”

5. Loss Functions

What is a Loss Function?

A loss function \(L(\theta, d)\) measures how bad a decision \(d\) is, when the true parameter is \(\theta\).

Key idea: minimize expected loss (called risk).

Examples:

  • Absolute error: \(L(\theta, d) = |\theta - d|\)
  • Squared error: \(L(\theta, d) = (\theta - d)^2\) → common in practice
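
Both are easy to write down in R; a minimal sketch (the helper names L_abs and L_sq are ours):

L_abs <- function(theta, d) abs(theta - d)   # absolute error loss
L_sq  <- function(theta, d) (theta - d)^2    # squared error loss
L_abs(0.5, 0.7)   # |0.5 - 0.7| = 0.2
L_sq(0.5, 0.7)    # 0.2^2 = 0.04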

Asymmetric Loss Function Example

In our painkiller example:

  • Overestimating the market share is twice as costly as underestimating it
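
A minimal sketch of such a loss (the functional form and the evaluation point \(\theta = 0.3\), \(d = 0.5\) are illustrative assumptions):

L_asym <- function(theta, d) {
  # overestimation (d > theta) is penalized twice as heavily
  ifelse(d > theta, 2 * (d - theta), theta - d)
}
L_asym(theta = 0.3, d = 0.5)   # overestimates by 0.2, so the loss is 0.4
## [1] 0.4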


6. Decision Rules

A decision rule \(\delta\) maps data \(\vec{x}\) to a decision \(d\):

$$\delta: \mathfrak{X} \rightarrow D, \quad \delta(\vec{x}) = d$$

To evaluate a decision rule, we use its expected loss, called the risk:

$$R(\theta, \delta) = E_\theta[L(\theta, \delta(\vec{X}))]$$
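
The risk can be approximated by simulation; a small sketch, assuming Bernoulli data, squared error loss, and \(\delta = \bar{X}\):

# Monte Carlo approximation of R(theta, delta) for delta = sample mean
set.seed(1)
theta <- 0.3
n <- 10
losses <- replicate(10000, {
  x <- rbinom(n, size = 1, prob = theta)   # draw a sample under theta
  (mean(x) - theta)^2                      # squared error loss
})
mean(losses)   # close to the exact risk theta * (1 - theta) / n = 0.021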


Randomized Decisions

Sometimes, decisions are randomized: instead of selecting a fixed \(d\), we place a probability distribution over \(D\) and draw the decision from it.

This allows more flexibility, especially in hypothesis testing.
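
A hypothetical sketch: away from a boundary value of the test statistic the rule is deterministic, and exactly at the boundary it chooses \(d_1\) with probability \(\gamma\) (the cutoff c and \(\gamma\) below are made up for illustration):

# Randomized decision rule for the testing problem above
delta_rand <- function(x, c = 7, gamma = 0.5) {
  s <- sum(x)
  if (s > c) "d1"                 # accept H1
  else if (s < c) "d0"            # accept H0
  else sample(c("d0", "d1"), 1, prob = c(1 - gamma, gamma))
}
delta_rand(rbinom(10, 1, 0.5))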


7. Comparing Decision Rules

How do we decide between two decision rules \(\delta_1\) and \(\delta_2\)?

Use their risk functions:

  • If \(R(\theta, \delta_1) \leq R(\theta, \delta_2)\) for all \(\theta\) and
  • Strict inequality for some \(\theta\)

Then \(\delta_1\) is better than \(\delta_2\) (we say \(\delta_1\) dominates \(\delta_2\)).

In most cases, however, neither rule dominates the other: the risk functions cross, and each rule is better in some region of the parameter space. For example, \(\delta_1\) may be better than \(\delta_2\) only on the open interval \((0, \theta_x)\).


Formal Definition

A statistical decision problem is the quadruple:

\((\Theta, \mathfrak{X}, D, L)\)

Where:

  • \(\Theta\): parameter space
  • \(\mathfrak{X}\): sample space
  • \(D\): decision set
  • \(L\): loss function

8. Example: Point Estimation

Let \(\theta\) be the success probability in a \(Bin(n, \theta)\) distribution.

Estimate:

  • \(D = [0,1]\)
  • Sample mean: \(T(\vec{x}) = \frac{1}{n} \sum x_i\)

Note: \(T=0\) or \(T=1\) are possible, even though \(\theta \in (0,1)\); this is why we take \(D = [0,1]\) rather than \((0,1)\).
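
For concreteness, with made-up observations:

x <- c(1, 0, 1, 1)   # hypothetical Bernoulli sample
mean(x)              # T(x) = 0.75, a point in D = [0, 1]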


9. Quadratic Loss & MSE

With squared error loss:

\(R(\theta, \delta) = E_\theta[(\delta(\vec{X}) - \theta)^2]\)

This is the mean squared error (MSE):

\(MSE = Var_\theta(\delta(\vec{X})) + \bigl(Bias_\theta(\delta(\vec{X}))\bigr)^2\), where \(Bias_\theta(\delta(\vec{X})) = E_\theta[\delta(\vec{X})] - \theta\).

Ideal: an unbiased estimator with small variance → UMVU (uniformly minimum variance unbiased) estimator
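
The decomposition is easy to verify by simulation; a sketch with Bernoulli data and a deliberately biased estimator \(\delta(\vec{X}) = \sum X_i / (n + 1)\) (our illustrative choice):

# Check MSE = Var + Bias^2 numerically
set.seed(42)
theta <- 0.4
n <- 20
est <- replicate(20000, sum(rbinom(n, 1, theta)) / (n + 1))
mse <- mean((est - theta)^2)
var_plus_bias2 <- var(est) + (mean(est) - theta)^2
c(mse, var_plus_bias2)   # the two values agree up to simulation error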


Example: Bernoulli Case

Let \(X_i \sim Bin(1, \theta)\), \(\theta \in (0,1)\).

  • \(\delta_1 = \bar{X}\): sample mean → UMVU estimator
  • \(\delta_2 = \frac{\sum X_i + \sqrt{n}/2}{n + \sqrt{n}}\): Bayes estimator with prior \(Beta(\alpha, \beta)\), where \(\alpha = \beta = \sqrt{n}/2\)
n <- 4                 # sample size; also try n <- 400
alpha <- sqrt(n / 4)   # prior parameters alpha = beta = sqrt(n)/2
beta <- alpha
# Constant risk of the Bayes estimator
MSE2 <- n / (4 * (n + sqrt(n))^2)
MSE2
## [1] 0.02777778

\(\delta_2\) has constant MSE, independent of \(\theta\).
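
A quick comparison of the two risk functions, using the known risk \(MSE(\delta_1) = \theta(1-\theta)/n\) of the sample mean:

# Where does the Bayes estimator beat the sample mean? (n = 4)
n <- 4
theta <- seq(0, 1, by = 0.01)
MSE1 <- theta * (1 - theta) / n       # risk of delta_1 = sample mean
MSE2 <- n / (4 * (n + sqrt(n))^2)     # constant risk of delta_2
range(theta[MSE1 > MSE2])             # delta_2 wins roughly on (0.13, 0.87)

This is exactly the crossing behavior from Section 7: neither rule dominates the other.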


Summary

  • Decision Theory helps us make practical decisions with uncertainty.
  • We model decisions, outcomes, and preferences through loss functions.
  • We compare decision rules via their risk (expected loss).
  • Classic estimation and testing are special cases.

Stay tuned for the next part!

We gratefully acknowledge Dr. Dany Djeudeu for preparing this course.