Preface  1

Variability, Information, and Prediction  16
  The Curse of Dimensionality  18
  The Two Extremes  19
  Perspectives on the Curse  20
  Sparsity  21
  Exploding Numbers of Models  23
  Multicollinearity and Concurvity  24
  The Effect of Noise  25
  Coping with the Curse  26
  Selecting Design Points  26
  Local Dimension  27
  Parsimony  32
  Two Techniques  33
  The Bootstrap  33
  Cross-Validation  42
  Optimization and Search  47
  Univariate Search  47
  Multivariate Search  48
  General Searches  49
  Constraint Satisfaction and Combinatorial Search  50
  Notes  53
  Hammersley Points  53
  Edgeworth Expansions for the Mean  54
  Bootstrap Asymptotics for the Studentized Mean  56
  Exercises  58

Local Smoothers  68
  Early Smoothers  70
  Transition to Classical Smoothers  74
  Global Versus Local Approximations  75
  LOESS  79
  Kernel Smoothers  82
  Statistical Function Approximation  83
  The Concept of Kernel Methods and the Discrete Case  88
  Kernels and Stochastic Designs: Density Estimation  93
  Stochastic Designs: Asymptotics for Kernel Smoothers  96
  Convergence Theorems and Rates for Kernel Smoothers  101
  Kernel and Bandwidth Selection  105
  Linear Smoothers  110
  Nearest Neighbors  111
  Applications of Kernel Regression  115
  A Simulated Example  115
  Ethanol Data  117
  Exercises  122

Spline Smoothing  132
  Interpolating Splines  132
  Natural Cubic Splines  138
  Smoothing Splines for Regression  141
  Model Selection for Spline Smoothing  144
  Spline Smoothing Meets Kernel Smoothing  145
  Asymptotic Bias, Variance, and MISE for Spline Smoothers  146
  Ethanol Data Example -- Continued  148
  Splines Redux: Hilbert Space Formulation  151
  Reproducing Kernels  153
  Constructing an RKHS  156
  Direct Sum Construction for Splines  161
  Explicit Forms  164
  Nonparametrics in Data Mining and Machine Learning  167
  Simulated Comparisons  169
  What Happens with Dependent Noise Models?  172
  Higher Dimensions and the Curse of Dimensionality  174
  Notes  178
  Sobolev Spaces: Definition  178
  Exercises  179

New Wave Nonparametrics  186
  Additive Models  187
  The Backfitting Algorithm  188
  Concurvity and Inference  192
  Nonparametric Optimality  195
  Generalized Additive Models  196
  Projection Pursuit Regression  199
  Neural Networks  204
  Backpropagation and Inference  207
  Barron's Result and the Curse  212
  Approximation Properties  213
  Barron's Theorem: Formal Statement  215
  Recursive Partitioning Regression  217
  Growing Trees  219
  Pruning and Selection  222
  Regression  223
  Bayesian Additive Regression Trees: BART  225
  MARS  225
  Sliced Inverse Regression  230
  ACE and AVAS  233
  Notes  235
  Proof of Barron's Theorem  235
  Exercises  239

Supervised Learning: Partition Methods  246
  Multiclass Learning  248
  Discriminant Analysis  250
  Distance-Based Discriminant Analysis  251
  Bayes Rules  256
  Probability-Based Discriminant Analysis  260
  Tree-Based Classifiers  264
  Splitting Rules  264
  Logic Trees  268
  Random Forests  269
  Support Vector Machines  277
  Margins and Distances  277
  Binary Classification and Risk  280
  Prediction Bounds for Function Classes  283
  Constructing SVM Classifiers  286
  SVM Classification for Nonlinearly Separable Populations  294
  SVMs in the General Nonlinear Case  297
  Some Kernels Used in SVM Classification  303
  Kernel Choice, SVMs and Model Selection  304
  Support Vector Regression  305
  Multiclass Support Vector Machines  308
  Neural Networks  309
  Notes  311
  Hoeffding's Inequality  311
  VC Dimension  312
  Exercises  315

Alternative Nonparametrics  322
  Ensemble Methods  323
  Bayes Model Averaging  325
  Bagging  327
  Stacking  331
  Boosting  333
  Other Averaging Methods  341
  Oracle Inequalities  343
  Bayes Nonparametrics  349
  Dirichlet Process Priors  349
  Polya Tree Priors  351
  Gaussian Process Priors  353
  The Relevance Vector Machine  359
  RVM Regression: Formal Description  360
  RVM Classification  364
  Hidden Markov Models -- Sequential Classification  367
  Notes  369
  Proof of Yang's Oracle Inequality  369
  Proof of Lecue's Oracle Inequality  372
  Exercises  374

Computational Comparisons  379
  Computational Results: Classification  380
  Comparison on Fisher's Iris Data  380
  Comparison on Ripley's Data  383
  Computational Results: Regression  390
  Vapnik's sinc Function  391
  Friedman's Function  403
  Conclusions  406
  Systematic Simulation Study  411
  No Free Lunch  414
  Exercises  416

Unsupervised Learning: Clustering  419
  Centroid-Based Clustering  422
  K-Means Clustering  423
  Variants  426
  Hierarchical Clustering  427
  Agglomerative Hierarchical Clustering  428
  Divisive Hierarchical Clustering  436
  Theory for Hierarchical Clustering  440
  Partitional Clustering  444
  Model-Based Clustering  446
  Graph-Theoretic Clustering  461
  Spectral Clustering  466
  Bayesian Clustering  472
  Probabilistic Clustering  472
  Hypothesis Testing  475
  Computed Examples  477
  Ripley's Data  479
  Iris Data  489
  Cluster Validation  494
  Notes  498
  Derivatives of Functions of a Matrix:  498
  Kruskal's Algorithm: Proof  498
  Prim's Algorithm: Proof  499
  Exercises  499

Learning in High Dimensions  506
  Principal Components  508
  Main Theorem  509
  Key Properties  511
  Extensions  513
  Factor Analysis  515
  Finding Λ and Ψ  517
  Finding K  519
  Estimating Factor Scores  520
  Projection Pursuit  521
  Independent Components Analysis  524
  Main Definitions  524
  Key Results  526
  Computational Approach  528
  Nonlinear PCs and ICA  529
  Nonlinear PCs  530
  Nonlinear ICA  531
  Geometric Summarization  531
  Measuring Distances to an Algebraic Shape  532
  Principal Curves and Surfaces  533
  Supervised Dimension Reduction: Partial Least Squares  536
  Simple PLS  536
  PLS Procedures  537
  Properties of PLS  539
  Supervised Dimension Reduction: Sufficient Dimensions in Regression  540
  Visualization I: Basic Plots  544
  Elementary Visualization  547
  Projections  554
  Time Dependence  556
  Visualization II: Transformations  559
  Chernoff Faces  559
  Multidimensional Scaling  560
  Self-Organizing Maps  566
  Exercises  573

Variable Selection  582
  Concepts from Linear Regression  583
  Subset Selection  585
  Variable Ranking  588
  Overview  590
  Traditional Criteria  591
  Akaike Information Criterion (AIC)  593
  Bayesian Information Criterion (BIC)  596
  Choices of Information Criteria  598
  Cross-Validation  600
  Shrinkage Methods  612
  Shrinkage Methods for Linear Models  614
  Grouping in Variable Selection  628
  Least Angle Regression  630
  Shrinkage Methods for Model Classes  633
  Cautionary Notes  644
  Bayes Variable Selection  645
  Prior Specification  648
  Posterior Calculation and Exploration  656
  Evaluating Evidence  660
  Connections Between Bayesian and Frequentist Methods  663
  Computational Comparisons  666
  The n > p Case  666
  When p > n  678
  Notes  680
  Code for Generating Data in Section 10.5  680
  Exercises  684

Multiple Testing  692
  Analyzing the Hypothesis Testing Problem  694
  A Paradigmatic Setting  694
  Counts for Multiple Tests  697
  Measures of Error in Multiple Testing  698
  Aspects of Error Control  700
  Controlling the Familywise Error Rate  703
  One-Step Adjustments  703
  Stepwise p-Value Adjustments  706
  PCER and PFER  708
  Null Domination  709
  Two Procedures  710
  Controlling the Type I Error Rate  715
  Adjusted p-Values for PFER/PCER  719
  Controlling the False Discovery Rate  720
  FDR and Other Measures of Error  722
  The Benjamini-Hochberg Procedure  723
  A BH Theorem for a Dependent Setting  724
  Variations on BH  726
  Controlling the Positive False Discovery Rate  732
  Bayesian Interpretations  732
  Aspects of Implementation  736
  Bayesian Multiple Testing  740
  Fully Bayes: Hierarchical  741
  Fully Bayes: Decision Theory  744
  Notes  749
  Proof of the Benjamini-Hochberg Theorem  749
  Proof of the Benjamini-Yekutieli Theorem  752

References  756

Index  785