1. Basic Idea
Statistics is the science of learning from data. It has two main purposes:
- To describe what has been observed.
- To make inferences about the wider world beyond the observed data.
Often, we are interested in understanding a population, the entire group we want to study. Because collecting data from the whole population is usually impractical (too costly, too time-consuming, or simply impossible), we work with a sample. From this sample, we estimate population characteristics or test assumptions about them.
Traditional statistics therefore focuses on:
- Estimating parameters (e.g., the population mean or proportion).
- Testing hypotheses (e.g., whether a parameter takes a certain value).
New Focus:
Decision theory builds on this foundation but takes a broader view.
Here, the central goal is not only to estimate or test, but to make decisions under uncertainty, based on the available data and prior knowledge.
Example: Bringing a New Painkiller to Market
A pharmaceutical company is considering launching a new painkiller.
Parameters:
- \(θ_1\): proportion of the population for which the painkiller is effective
- \(θ_2\): expected market share of the painkiller
In classical statistics, one would estimate these parameters or test hypotheses.
In decision theory, the company must make practical decisions:
- Should we launch the product?
- At what price?
- How much should we invest in marketing?
2. Two New Key Ingredients
- Consequences of decisions → we need to assess the gain or loss associated with each action.
- Prior knowledge about parameters (from databases, expert opinion, or prior studies) → Leads to Bayesian thinking.
3. Foundations of Statistical Decision Theory
Traditional statistics rests on three pillars:
- Estimation
- Testing
- Confidence intervals
Decision theory offers a common framework for all of them.
In this course, we will mainly focus on estimation and testing.
4. The Statistical Decision Problem
Start: We observe a random variable \(X\) with unknown distribution \(F\).
Goal: Learn about \(F\) (parameters, probabilities, etc.).
More formally:
- \(F \in \mathfrak{P}\), where \(\mathfrak{P}\) is a family of probability distributions
Examples:
- Nonparametric: \(F\) is arbitrary
- Parametric: \(F(x, \theta)\) with \(\theta \in \Theta\)
Example families:
1. \(\mathfrak{P}\) = the set of all continuous distributions
2. \(\mathfrak{P} = \{N(\mu, \sigma^2) \mid \mu \in \mathbb{R},\ \sigma^2 > 0\}\)
3. \(\mathfrak{P} = \{Bin(n, p) \mid p \in (0, 1)\}\)
Adding Sample Information
We often observe data: realizations \(x_1, \ldots, x_n\) of random variables \(X_1, \ldots, X_n\).
- Assume: \(X_1, …, X_n\) are i.i.d.
- Let \(\vec{X} = (X_1, \ldots, X_n)'\), and \(\vec{x}\) its observed values.
Based on \(\vec{x}\), we make a decision \(d\) (also called an action).
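A minimal R sketch of this pipeline, simulating from the Bernoulli family used later in this course (the true value \(\theta = 0.3\) and \(n = 20\) are illustrative assumptions):

set.seed(1)

theta_true <- 0.3   # unknown in practice; fixed here only for simulation
n <- 20
x <- rbinom(n, size = 1, prob = theta_true)   # observed i.i.d. sample x_1, ..., x_n

d <- mean(x)   # a decision (action): estimate theta by the sample mean
d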
Decision Examples
Example: Hypothesis Testing
- \(d_0\): accept \(H_0\): \(\theta \in \Theta_0\)
- \(d_1\): accept \(H_1\): \(\theta \in \Theta_1\)
The decision set \(D\) contains all possible decisions.
Other Examples
- Estimation: \(D = \Theta\)
Note:
- A decision can be correct or incorrect
- Mistakes can have different “costs”
5. Loss Functions
What is a Loss Function?
A loss function \(L(\theta, d)\) measures how bad a decision \(d\) is, when the true parameter is \(\theta\).
Key idea: minimize expected loss (called risk).
Examples (both sketched in R below):
- Absolute error: \(L(\theta, d) = |\theta - d|\)
- Squared error: \(L(\theta, d) = (\theta - d)^2\) → common in practice
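A quick R sketch of these two losses (the evaluation points \(\theta = 0.3\) and \(d = 0.5\) are illustrative):

abs_loss <- function(theta, d) abs(theta - d)    # absolute error loss
sq_loss  <- function(theta, d) (theta - d)^2     # squared error loss

abs_loss(0.3, 0.5)
## [1] 0.2
sq_loss(0.3, 0.5)
## [1] 0.04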
Asymmetric Loss Function Example
In our painkiller example:
- Overestimating the market share is twice as costly as underestimating it (sketched below)
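A minimal R sketch of such an asymmetric loss (the factor 2 encodes the 2:1 cost ratio; the evaluation points are illustrative assumptions):

asym_loss <- function(theta, d) {
  ifelse(d > theta, 2 * abs(d - theta), abs(d - theta))  # overestimation costs double
}

asym_loss(theta = 0.3, d = 0.5)   # overestimate by 0.2
## [1] 0.4
asym_loss(theta = 0.3, d = 0.1)   # underestimate by 0.2
## [1] 0.2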
6. Decision Rules
A decision rule \(\delta\) maps data \(\vec{x}\) to a decision \(d\):
$$\delta: \mathfrak{X} \rightarrow D, \quad \delta(\vec{x}) = d$$
Often, we care about the expected loss (called risk):
$$R(\theta, \delta) = E[L(\theta, \delta(\vec{X}))]$$
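As a quick sanity check, this risk can be approximated by Monte Carlo for the sample-mean rule in the Bernoulli model (\(\theta = 0.3\) and \(n = 20\) are illustrative choices):

set.seed(1)
theta <- 0.3
n     <- 20

# Monte Carlo approximation of R(theta, delta) = E[(mean(X) - theta)^2]
delta_hat  <- replicate(10000, mean(rbinom(n, size = 1, prob = theta)))
risk_mc    <- mean((delta_hat - theta)^2)   # simulated risk, close to risk_exact
risk_exact <- theta * (1 - theta) / n       # exact: Var(X_bar) = theta(1 - theta)/n
risk_exact
## [1] 0.0105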
Randomized Decisions
Sometimes, decisions are randomized: instead of a fixed \(d\), we assign probabilities over \(D\).
This allows more flexibility, especially in hypothesis testing.
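A toy sketch of a randomized rule (the constants \(c = 7\), \(\gamma = 0.2\), and \(n = 10\) are illustrative assumptions): for discrete data, randomizing at the boundary of the rejection region is what lets a test attain an exact level.

set.seed(1)

# Randomized test based on S = sum(x) for x_i ~ Bin(1, theta):
# accept H1 if S > c; if S == c, accept H1 only with probability gamma
randomized_test <- function(x, c = 7, gamma = 0.2) {
  s <- sum(x)
  if (s > c || (s == c && runif(1) < gamma)) "d1" else "d0"
}

x <- rbinom(10, size = 1, prob = 0.5)
randomized_test(x)   # the same data can yield different decisions when S == c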
7. Comparing Decision Rules
How do we decide between two decision rules \(\delta_1\) and \(\delta_2\)?
Use their risk functions:
- If \(R(\theta, \delta_1) \leq R(\theta, \delta_2)\) for all \(\theta\) and
- Strict inequality for some \(\theta\)
Then \(\delta_1\) is better than \(\delta_2\) (we say \(\delta_1\) dominates \(\delta_2\)).
In most cases, however, neither rule dominates the other: each risk function is smaller in some region, e.g., \(\delta_1\) is better than \(\delta_2\) on an interval \((0, \theta_x)\) (with \(\theta_x\) excluded) and worse beyond it, as in the sketch below.
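A classic illustration, assuming \(X_1, \ldots, X_n \sim N(\theta, \sigma^2)\) with \(\sigma\) known (this setting is not from the painkiller example): the sample mean has flat risk \(\sigma^2/n\), the constant rule \(\delta_2 \equiv 0\) has risk \(\theta^2\), and the two curves cross.

sigma <- 1
n     <- 10
theta <- seq(-1, 1, by = 0.01)

# Squared error risks: R(theta, X_bar) = sigma^2 / n (flat), R(theta, 0) = theta^2
plot(theta, theta^2, type = "l", xlab = expression(theta), ylab = "risk")
abline(h = sigma^2 / n, lty = 2)
# The curves cross at theta_x = sigma / sqrt(n) (about 0.316 here):
# delta2 = 0 is better on (-theta_x, theta_x), the sample mean elsewhere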
Formal Definition
A statistical decision problem is the quadruple:
\((\Theta, \mathfrak{X}, D, L)\)
Where:
- \(\Theta\): parameter space
- \(\mathfrak{X}\): sample space
- \(D\): decision set
- \(L\): loss function
8. Example: Point Estimation
Let \(\theta\) be the success probability in a \(Bin(1, \theta)\) (Bernoulli) distribution, observed via an i.i.d. sample.
Estimate:
- \(D = [0,1]\)
- Sample mean: \(T(\vec{x}) = \frac{1}{n} \sum x_i\)
Note: \(T = 0\) or \(T = 1\) can occur even though \(\theta \in (0,1)\), as quantified below.
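How likely these boundary cases are is easy to compute (the values \(\theta = 0.1\) and \(n = 5\) are illustrative):

theta <- 0.1
n     <- 5

(1 - theta)^n   # P(T = 0): all n trials fail
## [1] 0.59049
theta^n         # P(T = 1): all n trials succeed
## [1] 1e-05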
9. Quadratic Loss & MSE
With squared error loss:
\(R(\theta, \delta) = E[(\delta(\vec{X}) - \theta)^2]\)
This is the mean squared error (MSE):
\(MSE = Var(\delta(\vec{X})) + Bias(\delta(\vec{X}), \theta)^2\), where \(Bias(\delta(\vec{X}), \theta) = E[\delta(\vec{X})] - \theta\)
Ideal: an unbiased estimator with small variance → UMVU (uniformly minimum variance unbiased) estimator
Example: Bernoulli Case
Let \(X_i \sim Bin(1, \theta)\), \(\theta \in (0,1)\).
- \(\delta_1 = \bar{X}\): sample mean → UMVU estimator
- \(\delta_2\): Bayes estimator with prior \(Beta(\alpha, \beta)\)
n <- 4   # sample size; or try n <- 400

# Beta(alpha, beta) prior with alpha = beta = sqrt(n) / 2; for this choice
# the Bayes estimator delta2 = (sum(x) + alpha) / (n + alpha + beta)
# turns out to have constant MSE
alpha <- sqrt(n / 4)
beta  <- alpha

MSE2 <- n / (4 * (n + sqrt(n))^2)   # MSE of delta2, free of theta
MSE2
## [1] 0.02777778
\(\delta_2\) has constant MSE, independent of \(\theta\)
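Continuing the chunk above, a short sketch comparing this constant MSE with the MSE \(\theta(1-\theta)/n\) of \(\bar{X}\) (plotted for \(n = 4\)):

n     <- 4
theta <- seq(0, 1, by = 0.01)

MSE1 <- theta * (1 - theta) / n     # MSE of delta1 = X_bar (unbiased, so MSE = variance)
MSE2 <- n / (4 * (n + sqrt(n))^2)   # constant MSE of the Bayes rule delta2

plot(theta, MSE1, type = "l", xlab = expression(theta), ylab = "MSE")
abline(h = MSE2, lty = 2)
# delta2 beats delta1 for theta near 1/2, delta1 wins near 0 and 1:
# neither estimator dominates, exactly the situation from Section 7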
Summary
- Decision theory helps us make practical decisions under uncertainty.
- We model decisions, outcomes, and preferences through loss functions.
- We compare decision rules via their risk (expected loss).
- Classic estimation and testing are special cases.
Stay tuned for the next part!
We gratefully acknowledge Dr. Dany Djeudeu for preparing this course.