1. Basic Idea
Statistics is the science of learning from data. It has two main purposes:
- To describe what has been observed.
- To make inferences about the wider world beyond the observed data.
Often, we are interested in understanding a population, the entire group we want to study. Because collecting data from the whole population is usually impractical (too costly, too time-consuming, or simply impossible), we work with a sample. From this sample, we estimate population characteristics or test assumptions about them.
Traditional statistics therefore focuses on:
- Estimating parameters (e.g., the population mean or proportion).
- Testing hypotheses (e.g., whether a parameter takes a certain value).
New Focus:
Decision theory builds on this foundation but takes a broader view.
Here, the central goal is not only to estimate or test, but to make decisions under uncertainty, based on the available data and prior knowledge.
Example: Bringing a New Painkiller to Market
A pharmaceutical company is considering launching a new painkiller.
Parameters:
- \(θ_1\): proportion of the population for which the painkiller is effective
- \(θ_2\): expected market share of the painkiller
In classical statistics, one would estimate these parameters or test hypotheses.
In decision theory, the company must make practical decisions:
- Should we launch the product?
- At what price?
- How much should we invest in marketing?
2. Two New Key Ingredients
- Consequences of decisions → we need to assess the gain or loss associated with each action.
- Prior knowledge about parameters (from databases, expert opinion, or prior studies) → Leads to Bayesian thinking.
3. Foundations of Statistical Decision Theory
Traditional statistics rests on three pillars:
- Estimation
- Testing
- Confidence intervals
Decision theory offers a common framework for all of them.
In this course, we will mainly focus on estimation and testing.
4. The Statistical Decision Problem
Start: We observe a random variable \(X\) with unknown distribution \(F\).
Goal: Learn about \(F\) (parameters, probabilities, etc.).
More formally:
- \(F \in \mathfrak{P}\), where \(\mathfrak{P}\) is a family of probability distributions
Examples:
- Nonparametric: \(F\) is arbitrary
- Parametric: \(F(x, \theta)\) with \(\theta \in \Theta\)
Example families:
1. \(\mathfrak{P}\) = the set of all continuous distributions
2. \(\mathfrak{P} = \{N(\mu, \sigma^2) \mid \mu \in \mathbb{R},\ \sigma^2 > 0\}\)
3. \(\mathfrak{P} = \{Bin(n, p) \mid p \in (0, 1)\}\)
Adding Sample Information
We often observe data: realizations \(x_1, \ldots, x_n\) of random variables \(X_1, \ldots, X_n\).
- Assume: \(X_1, …, X_n\) are i.i.d.
- Let \(\vec{X} = (X_1, \ldots, X_n)'\), and \(\vec{x}\) its observed values.
Based on \(\vec{x}\), we make a decision \(d\) (also called an action).
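A minimal R sketch of this pipeline, simulating from the Bernoulli family used later in this course (the true value \(\theta = 0.3\) and \(n = 20\) are illustrative assumptions):

set.seed(1)

theta_true <- 0.3   # unknown in practice; fixed here only for simulation
n <- 20
x <- rbinom(n, size = 1, prob = theta_true)   # observed i.i.d. sample x_1, ..., x_n

d <- mean(x)   # a decision (action): estimate theta by the sample mean
d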
Decision Examples
Example: Hypothesis Testing
- \(d_0\): accept \(H_0\): \(\theta \in \Theta_0\)
- \(d_1\): accept \(H_1\): \(\theta \in \Theta_1\)
The decision set \(D\) contains all possible decisions.
Other Examples
- Estimation: \(D = \Theta\)
Note:
- A decision can be correct or incorrect
- Mistakes can have different “costs”
5. Loss Functions
What is a Loss Function?
A loss function \(L(\theta, d)\) measures how bad a decision \(d\) is, when the true parameter is \(\theta\).
Key idea: minimize expected loss (called risk).
Examples (both sketched in R below):
- Absolute error: \(L(\theta, d) = |\theta - d|\)
- Squared error: \(L(\theta, d) = (\theta - d)^2\) → common in practice
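A quick R sketch of these two losses (the evaluation points \(\theta = 0.3\) and \(d = 0.5\) are illustrative):

abs_loss <- function(theta, d) abs(theta - d)    # absolute error loss
sq_loss  <- function(theta, d) (theta - d)^2     # squared error loss

abs_loss(0.3, 0.5)
## [1] 0.2
sq_loss(0.3, 0.5)
## [1] 0.04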
Asymmetric Loss Function Example
In our painkiller example:
- Overestimating the market share is twice as costly as underestimating it (sketched below)
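A minimal R sketch of such an asymmetric loss (the factor 2 encodes the 2:1 cost ratio; the evaluation points are illustrative assumptions):

asym_loss <- function(theta, d) {
  ifelse(d > theta, 2 * abs(d - theta), abs(d - theta))  # overestimation costs double
}

asym_loss(theta = 0.3, d = 0.5)   # overestimate by 0.2
## [1] 0.4
asym_loss(theta = 0.3, d = 0.1)   # underestimate by 0.2
## [1] 0.2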
6. Decision Rules
A decision rule \(\delta\) maps data \(\vec{x}\) to a decision \(d\):
$$\delta: \mathfrak{X} \rightarrow D, \quad \delta(\vec{x}) = d$$
Often, we care about the expected loss (called risk):
$$R(\theta, \delta) = E[L(\theta, \delta(\vec{X}))]$$
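As a quick sanity check, this risk can be approximated by Monte Carlo for the sample-mean rule in the Bernoulli model (\(\theta = 0.3\) and \(n = 20\) are illustrative choices):

set.seed(1)
theta <- 0.3
n     <- 20

# Monte Carlo approximation of R(theta, delta) = E[(mean(X) - theta)^2]
delta_hat  <- replicate(10000, mean(rbinom(n, size = 1, prob = theta)))
risk_mc    <- mean((delta_hat - theta)^2)   # simulated risk, close to risk_exact
risk_exact <- theta * (1 - theta) / n       # exact: Var(X_bar) = theta(1 - theta)/n
risk_exact
## [1] 0.0105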
Randomized Decisions
Sometimes, decisions are randomized: instead of a fixed \(d\), we assign probabilities over \(D\).
This allows more flexibility, especially in hypothesis testing.
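A toy sketch of a randomized rule (the constants \(c = 7\), \(\gamma = 0.2\), and \(n = 10\) are illustrative assumptions): for discrete data, randomizing at the boundary of the rejection region is what lets a test attain an exact level.

set.seed(1)

# Randomized test based on S = sum(x) for x_i ~ Bin(1, theta):
# accept H1 if S > c; if S == c, accept H1 only with probability gamma
randomized_test <- function(x, c = 7, gamma = 0.2) {
  s <- sum(x)
  if (s > c || (s == c && runif(1) < gamma)) "d1" else "d0"
}

x <- rbinom(10, size = 1, prob = 0.5)
randomized_test(x)   # the same data can yield different decisions when S == c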
7. Comparing Decision Rules
How do we decide between two decision rules \(\delta_1\) and \(\delta_2\)?
Use their risk functions:
- If \(R(\theta, \delta_1) \leq R(\theta, \delta_2)\) for all \(\theta\) and
- Strict inequality for some \(\theta\)
Then \(\delta_1\) is better than \(\delta_2\) (we say \(\delta_1\) dominates \(\delta_2\)).
In most cases, however, neither rule dominates the other: each risk function is smaller in some region, e.g., \(\delta_1\) is better than \(\delta_2\) on an interval \((0, \theta_x)\) (with \(\theta_x\) excluded) and worse beyond it, as in the sketch below.
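A classic illustration, assuming \(X_1, \ldots, X_n \sim N(\theta, \sigma^2)\) with \(\sigma\) known (this setting is not from the painkiller example): the sample mean has flat risk \(\sigma^2/n\), the constant rule \(\delta_2 \equiv 0\) has risk \(\theta^2\), and the two curves cross.

sigma <- 1
n     <- 10
theta <- seq(-1, 1, by = 0.01)

# Squared error risks: R(theta, X_bar) = sigma^2 / n (flat), R(theta, 0) = theta^2
plot(theta, theta^2, type = "l", xlab = expression(theta), ylab = "risk")
abline(h = sigma^2 / n, lty = 2)
# The curves cross at theta_x = sigma / sqrt(n) (about 0.316 here):
# delta2 = 0 is better on (-theta_x, theta_x), the sample mean elsewhere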
Formal Definition
A statistical decision problem is the quadruple:
\((\Theta, \mathfrak{X}, D, L)\)
Where:
- \(\Theta\): parameter space
- \(\mathfrak{X}\): sample space
- \(D\): decision set
- \(L\): loss function
8. Example: Point Estimation
Let \(\theta\) be the success probability in a \(Bin(1, \theta)\) (Bernoulli) distribution, observed via an i.i.d. sample.
Estimate:
- \(D = [0,1]\)
- Sample mean: \(T(\vec{x}) = \frac{1}{n} \sum x_i\)
Note: \(T = 0\) or \(T = 1\) can occur even though \(\theta \in (0,1)\), as quantified below.
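How likely these boundary cases are is easy to compute (the values \(\theta = 0.1\) and \(n = 5\) are illustrative):

theta <- 0.1
n     <- 5

(1 - theta)^n   # P(T = 0): all n trials fail
## [1] 0.59049
theta^n         # P(T = 1): all n trials succeed
## [1] 1e-05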
9. Quadratic Loss & MSE
With squared error loss:
\(R(\theta, \delta) = E[(\delta(\vec{X}) - \theta)^2]\)
This is the mean squared error (MSE):
\(MSE = Var(\delta(\vec{X})) + Bias(\delta(\vec{X}), \theta)^2\), where \(Bias(\delta(\vec{X}), \theta) = E[\delta(\vec{X})] - \theta\)
Ideal: an unbiased estimator with small variance → UMVU (uniformly minimum variance unbiased) estimator
Example: Bernoulli Case
Let \(X_i \sim Bin(1, \theta)\), \(\theta \in (0,1)\).
- \(\delta_1 = \bar{X}\): sample mean → UMVU estimator
- \(\delta_2\): Bayes estimator with prior \(Beta(\alpha, \beta)\)
n <- 4   # sample size; or try n <- 400

# Beta(alpha, beta) prior with alpha = beta = sqrt(n) / 2; for this choice
# the Bayes estimator delta2 = (sum(x) + alpha) / (n + alpha + beta)
# turns out to have constant MSE
alpha <- sqrt(n / 4)
beta  <- alpha

MSE2 <- n / (4 * (n + sqrt(n))^2)   # MSE of delta2, free of theta
MSE2
## [1] 0.02777778
\(\delta_2\) has constant MSE, independent of \(\theta\)
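Continuing the chunk above, a short sketch comparing this constant MSE with the MSE \(\theta(1-\theta)/n\) of \(\bar{X}\) (plotted for \(n = 4\)):

n     <- 4
theta <- seq(0, 1, by = 0.01)

MSE1 <- theta * (1 - theta) / n     # MSE of delta1 = X_bar (unbiased, so MSE = variance)
MSE2 <- n / (4 * (n + sqrt(n))^2)   # constant MSE of the Bayes rule delta2

plot(theta, MSE1, type = "l", xlab = expression(theta), ylab = "MSE")
abline(h = MSE2, lty = 2)
# delta2 beats delta1 for theta near 1/2, delta1 wins near 0 and 1:
# neither estimator dominates, exactly the situation from Section 7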
Summary
- Decision theory helps us make practical decisions under uncertainty.
- We model decisions, outcomes, and preferences through loss functions.
- We compare decision rules via their risk (expected loss).
- Classic estimation and testing are special cases.
Stay tuned for the next part!
We gratefully acknowledge Dr. Dany Djeudeu for preparing this course.