1. Minimax Rules: Example

Consider nonrandomized decision rules only. Randomized minimax rules will be discussed later.

$\begin{array}{c|cccccc}
& d_1 & d_2 & d_3 & d_4 & d_5 & d_6 \\
\hline
R(\theta_1, d_i) & 17 & 19 & 14 & 10 & 9 & 9 \\
R(\theta_2, d_i) & 14 & 4 & 4 & 6 & 8 & 16 \\
\hline
\sup_{j = 1,2} R(\theta_j, d_i) & 17 & 19 & 14 & 10 & 9 & 16
\end{array}$

Nonrandomized minimax rule:

\(\sup_{\theta \in \{\theta_1, \theta_2\}}R(\theta, d_M) = \inf_{d_i} \sup_{\theta} R(\theta, d_i) = 9\)

Hence, \(d_M = d_5\) yields the minimum maximal (minimax) risk.

Notes

  • The minimax rule \(d_5\) coincides with the Bayes rule only for prior weights \(w = \pi(\theta_1) \in [\frac{2}{3}, 1]\).

  • Randomized minimax rules require least favorable distributions (to be discussed).
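As a sanity check on the computation above, here is a minimal Python sketch using the risk values from the table:

```python
# Risk table: (R(theta_1, d), R(theta_2, d)) for the rules d_1, ..., d_6.
risks = {"d1": (17, 14), "d2": (19, 4), "d3": (14, 4),
         "d4": (10, 6), "d5": (9, 8), "d6": (9, 16)}

# Maximal risk of each rule over the two states.
max_risk = {d: max(r) for d, r in risks.items()}

# The minimax rule minimizes the maximal risk.
d_M = min(max_risk, key=max_risk.get)
print(d_M, max_risk[d_M])  # -> d5 9
```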

2. Convexity of Risk Set

Theorem: For finite \(\Theta = \{\theta_1, \ldots, \theta_k\}\), the set of risk points \(\mathcal{R}\) is convex in \(\mathbb{R}^k\).

Idea of proof: Show that for any two risk points \(u, v \in \mathcal{R}\) and any \(\lambda \in (0, 1)\), \(\lambda u + (1 - \lambda)v \in \mathcal{R}\).
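Spelled out (with \(\mathcal{R}\) the risk set of randomized rules): let \(u, v\) be the risk points of rules \(\delta_u, \delta_v\), and let \(\delta_{\lambda}\) be the randomized rule that uses \(\delta_u\) with probability \(\lambda\) and \(\delta_v\) with probability \(1 - \lambda\). Then

\(R(\theta_j, \delta_{\lambda}) = \lambda R(\theta_j, \delta_u) + (1 - \lambda) R(\theta_j, \delta_v), \quad j = 1, \ldots, k,\)

so the risk point of \(\delta_{\lambda}\) is exactly \(\lambda u + (1 - \lambda) v \in \mathcal{R}\).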

3. Admissibility and Bayes Rules

Theorem: For finite \(\Theta\), every admissible decision rule is a Bayes rule with respect to some prior \(\pi\).

Definition: A positive Bayes rule uses a prior \(\pi\) such that \(\pi(\theta_j) > 0 \; \forall j\).

Theorem: Every positive Bayes rule is admissible.
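Proof sketch: if some \(\delta'\) dominated a positive Bayes rule \(\delta\), then

\(B(\pi, \delta') = \sum_{j=1}^{k} \pi(\theta_j) R(\theta_j, \delta') < \sum_{j=1}^{k} \pi(\theta_j) R(\theta_j, \delta) = B(\pi, \delta),\)

since every \(\pi(\theta_j) > 0\) and \(R(\theta_j, \delta') < R(\theta_j, \delta)\) holds strictly for at least one \(j\); this contradicts the Bayes optimality of \(\delta\).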

4. Generalized Minimax Rules

Let:

\(\Theta_{\ast} = \{ \pi(\cdot) \mid \pi \text{ is a prior on } \Theta \}\)

Definition: A rule \(\delta_M \in \mathcal{D}\) is minimax if:

\(\sup_{\pi \in \Theta_{\ast}} B(\pi, \delta_M) = \inf_{\delta \in \mathcal{D}} \sup_{\pi} B(\pi, \delta),\)

where \(B(\pi, \delta) = \int_{\Theta} R(\theta, \delta)\, \pi(d\theta)\) denotes the Bayes risk of \(\delta\) under \(\pi\).

  • \(\overline{V} = \inf_{\delta} \sup_{\pi} B(\pi, \delta)\): upper value

  • \(\underline{V} = \sup_{\pi} \inf_{\delta} B(\pi, \delta)\): lower value

5. Remarks

  • \(\delta_M\) is minimax if \(\sup_{\pi} B(\pi, \delta_M) = \overline{V}\)

  • \(\underline{V} \leq \overline{V}\)

  • Equivalent characterization:

\(\sup_{\theta} R(\theta, \delta_M) = \inf_{\delta} \sup_{\theta} R(\theta, \delta)\)
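This equivalence holds because the Bayes risk is an average of the frequentist risk: for every prior \(\pi\),

\(B(\pi, \delta) = \int_{\Theta} R(\theta, \delta)\, \pi(d\theta) \leq \sup_{\theta} R(\theta, \delta),\)

with equality attained (or approached) by point masses at values of \(\theta\) that (nearly) maximize the risk; hence \(\sup_{\pi} B(\pi, \delta) = \sup_{\theta} R(\theta, \delta)\).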

6. Least Favorable Prior

If \(\pi_0 \in \Theta_{\ast}\) satisfies:

\(\inf_{\delta} B(\pi_0, \delta) = \underline{V},\)

then \(\pi_0\) is called a least favorable prior.

7. Questions

  • When is \(\underline{V} = \overline{V}\)?

  • When do minimax rules and least favorable priors exist?

8. Game Theory Application

  • Two-person zero-sum games: loss of one = gain of other.

  • Statistical decision problems = statistician vs. nature.

  • Nature chooses \(\theta\), statistician chooses \(\delta\).

9. Finding a Minimax Rule

  1. Guess a least favorable prior \(\tau_0\)

  2. Compute corresponding Bayes rule \(\delta_0\)

  3. Check if \(\delta_0\) satisfies:

\(R(\theta, \delta_0) \leq B(\tau_0, \delta_0) ~ \forall \theta\)

If so: \(\delta_0\) is minimax and \(\tau_0\) is least favorable.
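Applied to the two-state example of Section 1, the recipe can be run numerically; a minimal Python sketch, where the grid search over \(w = \pi(\theta_1)\) stands in for the guess in Step 1:

```python
# Risk table from Section 1: (R(theta_1, d), R(theta_2, d)).
risks = {"d1": (17, 14), "d2": (19, 4), "d3": (14, 4),
         "d4": (10, 6), "d5": (9, 8), "d6": (9, 16)}

def bayes_risk(w, r):
    """Bayes risk of a rule with risk point r under the prior (w, 1 - w)."""
    return w * r[0] + (1 - w) * r[1]

# Step 1: guess a least favorable prior by maximizing inf_d B(w, d) over a grid.
grid = [i / 1000 for i in range(1001)]
w0 = max(grid, key=lambda w: min(bayes_risk(w, r) for r in risks.values()))

# Step 2: a Bayes rule with respect to w0.
d0 = min(risks, key=lambda d: bayes_risk(w0, risks[d]))

# Step 3: check R(theta, d0) <= B(w0, d0) for all theta.
b0 = bayes_risk(w0, risks[d0])
print(w0, d0, b0, all(r <= b0 for r in risks[d0]))  # -> 1.0 d5 9.0 True
```

For this table the search returns \(w_0 = 1\) (point mass on \(\theta_1\)) and \(d_0 = d_5\) with \(B = 9\); the check passes, confirming that \(d_5\) is minimax, in line with Section 1.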

10. Improper Priors and Limits

Even if \(\tau_0\) is improper, one can use a sequence of proper priors \(\pi_m\) and normalizing constants \(c_m\) such that:

\(c_m \pi_m(\theta) \to \tau_0(\theta)\)

A limiting argument then yields minimaxity: if \(\sup_{\theta} R(\theta, \delta_0) \leq \lim_{m} \inf_{\delta} B(\pi_m, \delta)\), then \(\delta_0\) is minimax.
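A classical illustration of this device (a standard textbook example, not tied to the table above): for \(X \sim N(\theta, 1)\) with squared-error loss, \(\delta_0(x) = x\) is generalized Bayes under the improper flat prior \(\tau_0(\theta) \equiv 1\). Taking the proper priors \(\pi_m = N(0, m)\),

\(\inf_{\delta} B(\pi_m, \delta) = \frac{m}{m+1} \longrightarrow 1 = \sup_{\theta} R(\theta, \delta_0) \quad (m \to \infty),\)

so \(\delta_0(x) = x\) is minimax.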

11. Equalizer Rules

Definition: A rule \(\delta\) is an equalizer rule if:

\(R(\theta, \delta) = c ~ \forall \theta\)

Theorem: If \(\delta\) is an equalizer rule and Bayes w.r.t. a proper prior, then \(\delta\) is minimax.
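Proof sketch: if \(R(\theta, \delta) \equiv c\) and \(\delta\) is Bayes w.r.t. a proper prior \(\pi\), then for any competing rule \(\delta'\),

\(\sup_{\theta} R(\theta, \delta) = c = B(\pi, \delta) \leq B(\pi, \delta') \leq \sup_{\theta} R(\theta, \delta'),\)

so no rule achieves a smaller maximal risk.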

Example: Binomial Case

\(X_1, \ldots, X_n \overset{iid}{\sim} Bi(1, p), \quad Y = \sum_{i=1}^{n} X_i, \quad \hat{p} = Y/n\)

Loss: \(L(p, d) = \frac{(p-d)^2}{p(1-p)}\)

Then \(\hat{p}\) is, as verified below:

  • an equalizer rule

  • Bayes under a proper prior
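Both claims can be verified directly. Equalizer: \(\mathrm{E}(\hat{p} - p)^2 = \mathrm{Var}(\hat{p}) = p(1-p)/n\), so

\(R(p, \hat{p}) = \frac{\mathrm{E}(\hat{p} - p)^2}{p(1-p)} = \frac{1}{n} \quad \forall p \in (0, 1).\)

Bayes: under the uniform prior on \((0, 1)\), the Bayes rule is the weighted posterior mean \(\mathrm{E}[p\, w(p) \mid Y] / \mathrm{E}[w(p) \mid Y]\) with weight \(w(p) = 1/(p(1-p))\); the weight turns the Beta\((y+1, n-y+1)\) posterior into a Beta\((y, n-y)\) density, whose mean is \(y/n = \hat{p}\) (for \(0 < y < n\)). By the theorem above, \(\hat{p}\) is therefore minimax under this loss.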

Counterexample

Casella & Strawderman (1981), estimating a normal mean \(\theta\) restricted to \([-m, m]\): \(\delta_m(X) = m \cdot \tanh(mX)\)

  • Bayes under the 2-point prior on \(\{-m, m\}\)

  • minimax (for sufficiently small \(m\)) but not an equalizer rule

12. Admissible + Equalizer = Minimax

Theorem: If \(\delta\) is admissible and an equalizer rule, then \(\delta\) is minimax.
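Proof sketch: if \(\delta\) with \(R(\theta, \delta) \equiv c\) were not minimax, some \(\delta'\) would satisfy \(\sup_{\theta} R(\theta, \delta') < c\), so \(R(\theta, \delta') < R(\theta, \delta)\) for every \(\theta\); then \(\delta'\) dominates \(\delta\), contradicting admissibility.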

13. Stein Phenomenon

In \(\mathbb{R}^p\) with \(p \geq 3\), for \(X \sim N(\theta, I_p)\) under squared-error loss, the standard estimator \(\delta(x) = x\) is inadmissible.

James-Stein Estimator:

\(\delta^{JS}(x) = x\left(1 - \frac{p-2}{\|x\|^2}\right)\)

  • dominates \(\delta(x) = x\)

  • shrinkage estimator

14. Risks of James-Stein

  • \(R(\theta, x) = p\)

  • \(R(\theta, \delta^{JS}) < p\)

For \(p = 1\): shrinkage yields lower risk only for small \(\theta\); \(\delta(x) = x\) is not dominated (it is admissible).

For \(p = 2\): the shrinkage factor \(1 - (p-2)/\|x\|^2\) equals 1, so \(\delta^{JS} = \delta\) and the risks coincide (R-equivalent).

For \(p \geq 3\): \(\delta^{JS}\) dominates \(\delta(x) = x\).
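These comparisons are easy to check by simulation; a minimal Monte Carlo sketch in Python/NumPy, assuming \(X \sim N(\theta, I_p)\) and squared-error loss as above (the simulation size and the test point \(\theta = 0\) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def risks(theta, n_sim=100_000):
    """Monte Carlo risks of delta(x) = x and the James-Stein estimator at theta."""
    p = len(theta)
    x = rng.normal(loc=theta, scale=1.0, size=(n_sim, p))  # X ~ N(theta, I_p)
    shrink = 1 - (p - 2) / np.sum(x**2, axis=1)            # JS shrinkage factor
    js = shrink[:, None] * x
    r_mle = np.mean(np.sum((x - theta) ** 2, axis=1))      # constant risk p
    r_js = np.mean(np.sum((js - theta) ** 2, axis=1))      # < p when p >= 3
    return r_mle, r_js

# At theta = 0 with p = 5 the gain from shrinkage is largest:
print(risks(np.zeros(5)))  # roughly (5.0, 2.0)
```

Far from the origin the shrinkage factor approaches 1 and both risks approach \(p\), consistent with the risk plot below.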

Presented below is a graphical depiction of the James-Stein risk function.

[Plot: James-Stein risk function]

Further readings

  • Berger (1985)

  • Casella & Berger (1990)

  • Casella & Strawderman (1981)

  • Stein (1955)

Stay tuned for the next part!

We gratefully acknowledge Dr. Dany Djeudeu for preparing this course.