Statistical regression and classification: from linear models to machine learning
Main author: Matloff, Norman S. 1948-
Format: Book
Language: English
Published: Boca Raton ; London ; New York : CRC Press, Taylor & Francis Group, [2017]
Series: Texts in statistical science
Subjects: Regression analysis; Vector analysis; Lineare Regression (GND); Automatische Klassifikation (GND)
Online access: Table of contents
Notes: Includes bibliographical references. In the book, the ISBN of the hardback edition is erroneously given as 978-1-138-06656-5.
Physical description: xxxviii, 489 pages : illustrations, diagrams
ISBN: 9781498710916 (pbk); 9781138066465 (hbk)
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
001 | BV044901487 | ||
003 | DE-604 | ||
005 | 20180726 | ||
007 | t | ||
008 | 180412s2017 a||| |||| 00||| eng d | ||
010 | |a 017011270 | ||
020 | |a 9781498710916 |c pbk |9 978-1-4987-1091-6 | ||
020 | |a 9781138066465 |c hbk |9 978-1-138-06646-5 | ||
035 | |a (OCoLC)988749360 | ||
035 | |a (DE-599)BVBBV044901487 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-739 |a DE-384 |a DE-521 | ||
050 | 0 | |a QA278.2 | |
082 | 0 | |a 519.5/36 |2 23 | |
084 | |a QH 234 |0 (DE-625)141549: |2 rvk | ||
084 | |a SK 840 |0 (DE-625)143261: |2 rvk | ||
100 | 1 | |a Matloff, Norman S. |d 1948- |e Verfasser |0 (DE-588)1018956115 |4 aut | |
245 | 1 | 0 | |a Statistical regression and classification |b from linear models to machine learning |c Norman Matloff, University of California, Davis, USA |
264 | 1 | |a Boca Raton London ; New York |b CRC Press, Taylor & Francis Group |c [2017] | |
264 | 4 | |c © 2017 | |
300 | |a xxxviii, 489 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Texts in statistical science | |
500 | |a Includes bibliographical references | ||
500 | |a Im Buch ist die ISBN der Hardbackausgabe fälschlich als: 978-1-138-06656-5 angegeben | ||
650 | 4 | |a Regression analysis | |
650 | 4 | |a Vector analysis | |
650 | 0 | 7 | |a Lineare Regression |0 (DE-588)4167709-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Automatische Klassifikation |0 (DE-588)4120957-6 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Lineare Regression |0 (DE-588)4167709-2 |D s |
689 | 0 | 1 | |a Automatische Klassifikation |0 (DE-588)4120957-6 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030295275&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030295275&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-030295275 |
Record in search index
_version_: 1804178457277497344
Contents
Preface xxix
List of Symbols xxxvii
1 Setting the Stage 1
1.1 Example: Predicting Bike-Sharing
Activity................................................... 1
1.2 Example of the Prediction Goal: Body Fat .................. 2
1.3 Example: Who Clicks Web Ads?............................... 3
1.4 Approach to Prediction..................................... 4
1.5 A Note about E(), Samples and Populations.................. 5
1.6 Example of the Description Goal: Do
Baseball Players Gain Weight As They Age? ................. 6
1.6.1 Prediction vs. Description.......................... 7
1.6.2 A First Estimator................................... 9
1.6.3 A Possibly Better Estimator, Using a Linear Model 10
1.7 Parametric vs. Nonparametric Models....................... 15
1.8 Example: Click-Through Rate............................... 15
1.9 Several Predictor Variables............................... 17
1.9.1 Multipredictor Linear Models....................... 18
1.9.1.1 Estimation of Coefficients................. 18
1.9.1.2 The Description Goal...................... 19
1.9.2 Nonparametric Regression Estimation: k-NN .... 19
1.9.2.1 Looking at Nearby Points................... 20
1.9.2.2 Measures of Nearness....................... 20
1.9.2.3 The k-NN Method, and Tuning Parameters 21
1.9.2.4 Nearest-Neighbor Analysis in the regtools
Package..................................... 21
1.9.2.5 Example: Baseball Player Data.............. 22
1.10 After Fitting a Model, How Do We Use It for Prediction? . 22
1.10.1 Parametric Settings................................. 22
1.10.2 Nonparametric Settings.............................. 23
1.10.3 The Generic predict() Function...................... 23
1.11 Overfitting, and the Variance-Bias
Tradeoff................................................... 24
1.11.1 Intuition........................................... 24
1.11.2 Example: Student Evaluations of Instructors .... 26
1.12 Cross-Validation........................................... 26
1.12.1 Linear Model Case................................... 28
1.12.1.1 The Code................................... 28
1.12.1.2 Applying the Code.......................... 29
1.12.2 k-NN Case........................................... 29
1.12.3 Choosing the Partition Sizes ....................... 30
1.13 Important Note on Tuning Parameters........................ 31
1.14 Rough Rule of Thumb........................................ 32
1.15 Example: Bike-Sharing Data................................. 32
1.15.1 Linear Modeling of μ(t)............................. 33
1.15.2 Nonparametric Analysis.............................. 38
1.16 Interaction Terms, Including Quadratics................... 38
1.16.1 Example: Salaries of Female Programmers
and Engineers........................................ 39
1.16.2 Fitting Separate Models.............................. 42
1.16.3 Saving Your Work..................................... 43
1.16.4 Higher-Order Polynomial Models....................... 43
1.17 Classification Techniques.................................. 44
1.17.1 It’s a Regression Problem!........................... 44
1.17.2 Example: Bike-Sharing Data........................... 45
1.18 Crucial Advice: Don’t Automate, Participate!............... 47
1.19 Mathematical Complements................................... 48
1.19.1 Indicator Random Variables........................... 48
1.19.2 Mean Squared Error of an Estimator................. 48
1.19.3 μ(t) Minimizes Mean Squared Prediction Error ... 49
1.19.4 μ(t) Minimizes the Misclassification Rate.......... 50
1.19.5 Some Properties of Conditional Expectation......... 52
1.19.5.1 Conditional Expectation As a Random
Variable.................................... 52
1.19.5.2 The Law of Total Expectation.............. 53
1.19.5.3 Law of Total Variance....................... 54
1.19.5.4 Tower Property ............................. 54
1.19.5.5 Geometric View.............................. 54
1.20 Computational Complements.................................. 55
1.20.1 CRAN Packages........................................ 55
1.20.2 The Function tapply() and Its Cousins................ 56
1.20.3 The Innards of the k-NN Code......................... 58
1.20.4 Function Dispatch.................................... 59
1.21 Centering and Scaling...................................... 60
1.22 Exercises: Data, Code and Math Problems ................. 61
2 Linear Regression Models 65
2.1 Notation.................................................... 65
2.2 The “Error Term” ........................................... 67
2.3 Random- vs. Fixed-X Cases................................... 67
2.4 Least-Squares Estimation.................................... 68
2.4.1 Motivation........................................... 68
2.4.2 Matrix Formulations.................................. 70
2.4.3 (2.18) in Matrix Terms............................... 71
2.4.4 Using Matrix Operations to Minimize (2.18)........... 71
2.4.5 Models without an Intercept Term..................... 72
2.5 A Closer Look at lm() Output ............................... 73
2.5.1 Statistical Inference................................ 74
2.6 Assumptions ................................................ 75
2.6.1 Classical............................................ 75
2.6.2 Motivation: the Multivariate Normal
Distribution Family ................................. 76
2.7 Unbiasedness and Consistency................................ 79
2.7.1 β̂ Is Unbiased....................................... 79
2.7.2 Bias As an Issue/Nonissue............................ 80
2.7.3 β̂ Is Statistically Consistent....................... 80
2.8 Inference under Homoscedasticity............................ 81
2.8.1 Review: Classical Inference on a Single Mean .... 81
2.8.2 Back to Reality...................................... 82
2.8.3 The Concept of a Standard Error...................... 83
2.8.4 Extension to the Regression Case..................... 83
2.8.5 Example: Bike-Sharing Data........................... 86
2.9 Collective Predictive Strength of the X^(j)............ 88
2.9.1 Basic Properties..................................... 88
2.9.2 Definition of R2 .................................... 90
2.9.3 Bias Issues.......................................... 91
2.9.4 Adjusted-R2 ........................................ 92
2.9.5 The “Leaving-One-Out Method”......................... 94
2.9.6 Extensions of LOOM................................... 95
2.9.7 LOOM for k-NN........................................ 95
2.9.8 Other Measures....................................... 96
2.10 The Practical Value of p-Values — Small OR Large .... 96
2.10.1 Misleadingly Small p-Values.......................... 97
2.10.1.1 Example: Forest Cover Data.................. 97
2.10.1.2 Example: Click Through Data................. 98
2.10.2 Misleadingly LARGE p-Values.......................... 99
2.10.3 The Verdict......................................... 100
2.11 Missing Values............................................. 100
2.12 Mathematical Complements................................... 101
2.12.1 Covariance Matrices................................. 101
2.12.2 The Multivariate Normal Distribution Family .... 103
2.12.3 The Central Limit Theorem........................... 104
2.12.4 Details on Models Without a Constant Term .... 104
2.12.5 Unbiasedness of the Least-Squares Estimator .... 105
2.12.6 Consistency of the Least-Squares Estimator.......... 106
2.12.7 Biased Nature of S ................................. 108
2.12.8 The Geometry of Conditional Expectation............. 108
2.12.8.1 Random Variables As Inner Product Spaces 108
2.12.8.2 Projections................................ 109
2.12.8.3 Conditional Expectations As Projections . 110
2.12.9 Predicted Values and Error Terms Are Uncorrelated 111
2.12.10 Classical “Exact” Inference...................... 112
2.12.11 Asymptotic (p + 1)-Variate Normality of β̂ ... 113
2.13 Computational Complements.............................. 115
2.13.1 Details of the Computation of (2.28).............. 115
2.13.2 R Functions for the Multivariate Normal
Distribution Family .............................. 116
2.13.2.1 Example: Simulation Computation of a
Bivariate Normal Quantity................. 116
2.13.3 More Details of ‘lm’ Objects ..................... 118
2.14 Exercises: Data, Code and Math Problems ................. 120
3 Homoscedasticity and Other Assumptions in Practice 123
3.1 Normality Assumption..................................... 124
3.2 Independence Assumption — Don’t
Overlook It.............................................. 125
3.2.1 Estimation of a Single Mean ...................... 125
3.2.2 Inference on Linear Regression Coefficients....... 126
3.2.3 What Can Be Done?................................. 126
3.2.4 Example: MovieLens Data ........................ 127
3.3 Dropping the Homoscedasticity
Assumption............................................... 130
3.3.1 Robustness of the Homoscedasticity Assumption . . 131
3.3.2 Weighted Least Squares.......................... 133
3.3.3 A Procedure for Valid Inference................... 135
3.3.4 The Methodology................................... 135
3.3.5 Example: Female Wages............................. 136
3.3.6 Simulation Test................................... 137
3.3.7 Variance-Stabilizing Transformations............... 137
3.3.8 The Verdict........................................ 139
3.4 Further Reading........................................... 139
3.5 Computational Complements................................. 140
3.5.1 The R merge() Function.............................. 140
3.6 Mathematical Complements.................................. 141
3.6.1 The Delta Method................................... 141
3.6.2 Distortion Due to Transformation .................. 142
3.7 Exercises: Data, Code and Math Problems .................. 143
4 Generalized Linear and Nonlinear Models 147
4.1 Example: Enzyme Kinetics Model............................ 148
4.2 The Generalized Linear Model (GLM)........................ 150
4.2.1 Definition......................................... 150
4.2.2 Poisson Regression................................. 151
4.2.3 Exponential Families............................... 152
4.2.4 R’s glm() Function................................. 153
4.3 GLM: the Logistic Model................................... 154
4.3.1 Motivation......................................... 155
4.3.2 Example: Pima Diabetes Data........................ 158
4.3.3 Interpretation of Coefficients..................... 159
4.3.4 The predict() Function Again....................... 161
4.3.5 Overall Prediction Accuracy........................ 162
4.3.6 Example: Predicting Spam E-mail.................... 163
4.3.7 Linear Boundary.................................... 164
4.4 GLM: the Poisson Regression Model......................... 165
4.5 Least-Squares Computation................................. 166
4.5.1 The Gauss-Newton Method........................... 166
4.5.2 Eicker-White Asymptotic Standard Errors........... 168
4.5.3 Example: Bike Sharing Data........................ 171
4.5.4 The “Elephant in the Room”: Convergence
Issues............................................ 172
4.6 Further Reading.......................................... 173
4.7 Computational Complements................................ 173
4.7.1 GLM Computation................................... 173
4.7.2 R Factors ........................................ 174
4.8 Mathematical Complements................................. 175
4.8.1 Maximum Likelihood Estimation..................... 175
4.9 Exercises: Data, Code and Math Problems ................. 176
5 Multiclass Classification Problems 179
5.1 Key Notation............................................. 179
5.2 Key Equations............................................ 180
5.3 Estimating the Functions p*(t)........................... 182
5.4 How Do We Use Models for Prediction?..................... 182
5.5 One vs. All or All vs. All?.............................. 183
5.5.1 Which Is Better?.................................. 184
5.5.2 Example: Vertebrae Data........................... 184
5.5.3 Intuition......................................... 185
5.5.4 Example: Letter Recognition Data.................. 186
5.5.5 Example: k-NN on the Letter Recognition Data . . 187
5.5.6 The Verdict....................................... 188
5.6 Fisher Linear Discriminant Analysis...................... 188
5.6.1 Background........................................ 189
5.6.2 Derivation........................................ 189
5.6.3 Example: Vertebrae Data........................... 190
5.6.3.1 LDA Code and Results..................... 190
5.7 Multinomial Logistic Model............................... 191
5.7.1 Model............................................. 191
5.7.2 Software.......................................... 192
5.7.3 Example: Vertebrae Data........................... 192
5.8 The Issue of “Unbalanced” (and Balanced) Data............ 193
5.8.1 Why the Concern Regarding Balance?................ 194
5.8.2 A Crucial Sampling Issue.......................... 195
5.8.2.1 It All Depends on How We Sample .... 195
5.8.2.2 Remedies................................. 197
5.8.3 Example: Letter Recognition....................... 198
5.9 Going Beyond Using the 0.5 Threshold.................... 200
5.9.1 Unequal Misclassification Costs................... 200
5.9.2 Revisiting the Problem of Unbalanced Data......... 201
5.9.3 The Confusion Matrix and the ROC Curve............ 202
5.9.3.1 Code..................................... 203
5.9.3.2 Example: Spam Data....................... 203
5.10 Mathematical Complements................................. 203
5.10.1 Classification via Density Estimation............. 203
5.10.1.1 Methods for Density Estimation........... 204
5.10.2 Time Complexity Comparison, OVA vs. AVA .... 205
5.10.3 Optimal Classification Rule for
Unequal Error Costs............................... 206
5.11 Computational Complements................................ 207
5.11.1 R Code for OVA and AVA Logit Analysis............. 207
5.11.2 ROC Code.......................................... 210
5.12 Exercises: Data, Code and Math Problems ............... 211
6 Model Fit Assessment and Improvement 215
6.1 Aims of This Chapter..................................... 215
6.2 Methods.................................................. 216
6.3 Notation................................................. 216
6.4 Goals of Model Fit-Checking.............................. 217
6.4.1 Prediction Context................................ 217
6.4.2 Description Context............................... 218
6.4.3 Center vs. Fringes of the Data Set................ 218
6.5 Example: Currency Data.................................. 219
6.6 Overall Measures of Model Fit ........................... 220
6.6.1 R-Squared, Revisited.............................. 221
6.6.2 Cross-Validation, Revisited....................... 222
6.6.3 Plotting Parametric Fit Against a
Nonparametric One ................................ 222
6.6.4 Residuals vs. Smoothing........................... 223
6.7 Diagnostics Related to Individual
Predictors............................................... 224
6.7.1 Partial Residual Plots............................ 225
6.7.2 Plotting Nonparametric Fit Against
Each Predictor.................................... 227
6.7.3 The freqparcoord Package.......................... 229
6.7.3.1 Parallel Coordinates..................... 229
6.7.3.2 The freqparcoord Package................. 229
6.7.3.3 The regdiagQ Function.................... 230
6.8 Effects of Unusual Observations on Model Fit............. 232
6.8.1 The influenceQ Function........................... 232
6.8.1.1 Example: Currency Data.................... 233
6.8.2 Use of freqparcoord for Outlier Detection......... 235
6.9 Automated Outlier Resistance............................. 236
6.9.1 Median Regression.................................. 236
6.9.2 Example: Currency Data............................. 238
6.10 Example: Vocabulary Acquisition.......................... 238
6.11 Classification Settings.................................. 241
6.11.1 Example: Pima Diabetes Study....................... 242
6.12 Improving Fit............................................ 245
6.12.1 Deleting Terms from the Model..................... 245
6.12.2 Adding Polynomial Terms............................ 247
6.12.2.1 Example: Currency Data.................... 247
6.12.2.2 Example: Census Data...................... 248
6.12.3 Boosting........................................... 251
6.12.3.1 View from the 30,000-Foot Level.......... 251
6.12.3.2 Performance............................... 253
6.13 A Tool to Aid Model Selection............................ 254
6.14 Special Note on the Description Goal..................... 255
6.15 Computational Complements................................ 255
6.15.1 Data Wrangling for the Currency Dataset.............255
6.15.2 Data Wrangling for the Word Bank Dataset........... 256
6.16 Mathematical Complements................................. 257
6.16.1 The Hat Matrix .................................... 257
6.16.2 Matrix Inverse Update.............................. 259
6.16.3 The Median Minimizes Mean Absolute
Deviation.......................................... 260
6.16.4 The Gauss-Markov Theorem........................... 261
6.16.4.1 Lagrange Multipliers..................... 261
6.16.4.2 Proof of Gauss-Markov.................... 262
6.17 Exercises: Data, Code and Math Problems ................ 264
7 Disaggregating Regressor Effects 267
7.1 A Small Analytical Example............................... 268
7.2 Example: Baseball Player Data............................ 270
7.3 Simpson’s Paradox........................................ 274
7.3.1 Example: UCB Admissions Data (Logit)........... 274
7.3.2 The Verdict....................................... 278
7.4 Unobserved Predictor Variables........................... 278
7.4.1 Instrumental Variables (IVs) ..................... 279
7.4.1.1 The IV Method .......................... 281
7.4.1.2 Two-Stage Least Squares................. 283
7.4.1.3 Example: Years of Schooling............. 284
7.4.1.4 The Verdict.............................. 286
7.4.2 Random Effects Models............................. 286
7.4.2.1 Example: Movie Ratings Data.............. 287
7.4.3 Multiple Random Effects .......................... 288
7.4.4 Why Use Random/Mixed Effects Models?........... 288
7.5 Regression Function Averaging............................ 289
7.5.1 Estimating the Counterfactual..................... 290
7.5.1.1 Example: Job Training.................... 290
7.5.2 Small Area Estimation: “Borrowing from
Neighbors” ....................................... 291
7.5.3 The Verdict....................................... 295
7.6 Multiple Inference....................................... 295
7.6.1 The Frequent Occurrence of Extreme Events...........295
7.6.2 Relation to Statistical Inference.................. 296
7.6.3 The Bonferroni Inequality.......................... 297
7.6.4 Scheffé’s Method................................... 298
7.6.5 Example: MovieLens Data ........................... 300
7.6.6 The Verdict........................................ 303
7.7 Computational Complements................................. 303
7.7.1 MovieLens Data Wrangling........................... 303
7.7.2 More Data Wrangling in the MovieLens Example . . 303
7.8 Mathematical Complements.................................. 306
7.8.1 Iterated Projections............................... 306
7.8.2 Standard Errors for RFA............................ 307
7.8.3 Asymptotic Chi-Square Distributions................ 308
7.9 Exercises: Data, Code and Math Problems .................. 309
8 Shrinkage Estimators 311
8.1 Relevance of James-Stein to Regression Estimation..........312
8.2 Multicollinearity......................................... 313
8.2.1 What’s All the Fuss About?......................... 313
8.2.2 A Simple Guiding Model ............................ 313
8.2.3 Checking for Multicollinearity..................... 314
8.2.3.1 The Variance Inflation Factor.............. 314
8.2.3.2 Example: Currency Data..................... 315
8.2.4 What Can/Should One Do?............................ 315
8.2.4.1 Do Nothing................................. 315
8.2.4.2 Eliminate Some Predictors.................. 316
8.2.4.3 Employ a Shrinkage Method.................. 316
8.3 Ridge Regression.......................................... 316
8.3.1 Alternate Definitions............................... 317
8.3.2 Choosing the Value of λ............................. 318
8.3.3 Example: Currency Data.............................. 319
8.4 The LASSO.................................................. 320
8.4.1 Definition.......................................... 321
8.4.2 The lars Package.................................... 322
8.4.3 Example: Currency Data...............................322
8.4.4 The Elastic Net..................................... 324
8.5 Cases of Exact Multicollinearity,
Including p > n............................................ 324
8.5.1 Why It May Work..................................... 324
8.5.2 Example: R mtcars Data.............................. 325
8.5.2.1 Additional Motivation for the Elastic Net . 326
8.6 Bias, Standard Errors and Significance
Tests...................................................... 327
8.7 Principal Components Analysis.............................. 327
8.8 Generalized Linear Models.................................. 329
8.8.1 Example: Vertebrae Data............................. 329
8.9 Other Terminology.......................................... 330
8.10 Further Reading............................................ 330
8.11 Mathematical Complements................................... 331
8.11.1 James-Stein Theory.................................. 331
8.11.1.1 Definition................................. 331
8.11.1.2 Theoretical Properties..................... 331
8.11.1.3 When Might Shrunken Estimators Be
Helpful?................................... 332
8.11.2 Yes, It Is Smaller.................................. 332
8.11.3 Ridge Action Increases Eigenvalues...................333
8.12 Computational Complements................................. 334
8.12.1 Code for ridgelm().................................. 334
8.13 Exercises: Data, Code and Math Problems .................. 336
9 Variable Selection and Dimension Reduction 339
9.1 A Closer Look at Under/Overfitting........................ 341
9.1.1 A Simple Guiding Example........................... 342
9.2 How Many Is Too Many?..................................... 344
9.3 Fit Criteria.............................................. 344
9.3.1 Some Common Measures .............................. 345
9.3.2 There Is No Panacea! .............................. 348
9.4 Variable Selection Methods................................ 348
9.5 Simple Use of p-Values: Pitfalls.......................... 349
9.6 Asking “What If” Questions................................ 349
9.7 Stepwise Selection........................................ 351
9.7.1 Basic Notion....................................... 351
9.7.2 Forward vs. Backward Selection..................... 352
9.7.3 R Functions for Stepwise Regression................ 352
9.7.4 Example: Bodyfat Data............................. 352
9.7.5 Classification Settings............................ 357
9.7.5.1 Example: Bank Marketing Data............. 357
9.7.5.2 Example: Vertebrae Data.................. 361
9.7.6 Nonparametric Settings............................. 362
9.7.6.1 Is Dimension Reduction Important in
the Nonparametric Setting?................. 362
9.7.7 The LASSO.......................................... 364
9.7.7.1 Why the LASSO Often Performs Subsetting 364
9.7.7.2 Example: Bodyfat Data.................... 366
9.8 Post-Selection Inference................................. 367
9.9 Direct Methods for Dimension Reduction................... 369
9.9.1 Informal Nature................................... 369
9.9.2 Role in Regression Analysis....................... 370
9.9.3 PCA............................................... 370
9.9.3.1 Issues.................................... 371
9.9.3.2 Example: Bodyfat Data..................... 371
9.9.3.3 Example: Instructor Evaluations........... 375
9.9.4 Nonnegative Matrix Factorization (NMF)............ 377
9.9.4.1 Overview.................................. 377
9.9.4.2 Interpretation ........................... 377
9.9.4.3 Sum-of-Parts Property .....................378
9.9.4.4 Example: Spam Detection .................. 378
9.9.5 Use of freqparcoord for Dimension Reduction .... 380
9.9.5.1 Example: Student Evaluations of Instructors 380
9.9.5.2 Dimension Reduction for Dummy/R Factor
Variables................................. 381
9.10 The Verdict.............................................. 382
9.11 Further Reading.......................................... 383
9.12 Computational Complements................................ 383
9.12.1 Computation for NMF............................... 383
9.13 Mathematical Complements................................. 386
9.13.1 MSEs for the Simple Example ...................... 386
9.14 Exercises: Data, Code and Math Problems ................. 387
10 Partition-Based Methods 391
10.1 CART..................................................... 392
10.2 Example: Vertebral Column Data .......................... 394
10.3 Technical Details......................................... 398
10.3.1 Split Criterion.................................... 398
10.3.2 Predictor Reuse and Statistical Consistency........ 398
10.4 Tuning Parameters......................................... 399
10.5 Random Forests ........................................... 399
10.5.1 Bagging............................................ 400
10.5.2 Example: Vertebrae Data............................ 400
10.5.3 Example: Letter Recognition........................ 401
10.6 Other Implementations of CART............................. 402
10.7 Exercises: Data, Code and Math Problems .................. 403
11 Semi-Linear Methods 405
11.1 k-NN with Linear Smoothing................................ 407
11.1.1 Extrapolation Via lm() ............................ 407
11.1.2 Multicollinearity Issues .......................... 409
11.1.3 Example: Bodyfat Data.............................. 409
11.1.4 Tuning Parameter................................... 409
11.2 Linear Approximation of Class
Boundaries................................................ 410
11.2.1 SVMs............................................... 410
11.2.1.1 Geometric Motivation ..................... 411
11.2.1.2 Reduced convex hulls...................... 413
11.2.1.3 Tuning Parameter.......................... 415
11.2.1.4 Nonlinear Boundaries...................... 416
11.2.1.5 Statistical Consistency .................. 417
11.2.1.6 Example: Letter Recognition Data......... 417
11.2.2 Neural Networks.................................... 418
11.2.2.1 Example: Vertebrae Data.................. 418
11.2.2.2 Tuning Parameters and Other Technical Details . . 420
11.2.2.3 Dimension Reduction....................... 421
11.2.2.4 Why Does It Work (If It Does)?............ 421
11.3 The Verdict............................................... 423
11.4 Mathematical Complements...................................424
11.4.1 Edge Bias in Nonparametric Regression...............424
11.4.2 Dual Formulation for SVM........................... 425
11.4.3 The Kernel Trick................................... 428
11.5 Further Reading........................................... 429
11.6 Exercises: Data, Code and Math Problems ................. 429
12 Regression and Classification in Big Data 431
12.1 Solving the Big-n Problem................................. 432
12.1.1 Software Alchemy................................... 432
12.1.2 Example: Flight Delay Data .........................433
12.1.3 More on the Insufficient Memory Issue.............. 436
12.1.4 Deceivingly “Big” n.................................437
12.1.5 The Independence Assumption in Big-n Data .... 437
12.2 Addressing Big-p.......................................... 438
12.2.1 How Many Is Too Many?.............................. 438
12.2.1.1 Toy Model..................................439
12.2.1.2 Results from the Research Literature . . . 440
12.2.1.3 A Much Simpler and More Direct Approach 441
12.2.1.4 Nonparametric Case........................ 441
12.2.1.5 The Curse of Dimensionality............... 443
12.2.2 Example: Currency Data...........................443
12.2.3 Example: Quiz Documents........................... 444
12.2.4 The Verdict........................................ 446
12.3 Mathematical Complements................................. 447
12.3.1 Speedup from Software Alchemy.................... 447
12.4 Computational Complements................................ 448
12.4.1 The partools Package .............................. 448
12.4.2 Use of the tm Package............................ 449
12.5 Exercises: Data, Code and Math Problems ............... 450
A Matrix Algebra 451
A.1 Terminology and Notation.................................. 451
A.2 Matrix Addition and Multiplication ....................... 452
A.3 Matrix Transpose.......................................... 453
A.4 Linear Independence....................................... 454
A.5 Matrix Inverse ........................................... 454
A.6 Eigenvalues and Eigenvectors.............................. 455
A.7 Rank of a Matrix.......................................... 456
A.8 Matrices of the Form B’B.................................. 456
A.9 Partitioned Matrices...................................... 457
A.10 Matrix Derivatives...................................... 458
A.11 Matrix Algebra in R..................................... 459
A.12 Further Reading......................................... 462
Index 475
Contents
Preface xxix
List of Symbols xxxvii
1 Setting the Stage 1
1.1 Example: Predicting Bike-Sharing
Activity................................................... 1
1.2 Example of the Prediction Goal: Body Fat .................. 2
1.3 Example: Who Clicks Web Ads?............................... 3
1.4 Approach to Prediction..................................... 4
1.5 A Note about E(), Samples and Populations.................. 5
1.6 Example of the Description Goal: Do
Baseball Players Gain Weight As They Age? ................. 6
1.6.1 Prediction vs. Description.......................... 7
1.6.2 A First Estimator................................... 9
1.6.3 A Possibly Better Estimator, Using a. Linear Model 10
1.7 Parametric vs. Nonparametric Models....................... 15
1.8 Example: Click-Through Rate............................... 15
1.9 Several Predictor Variables............................... 17
1.9.1 Multipredictor Linear Models....................... 18
ix
X
CONTENTS
1.9.1.1 Estimation of Coefficients................. 18
1.9.1.2 The Description Goal...................... 19
1.9.2 Nonparametric Regression Estimation: k-NN .... 19
1.9.2.1 Looking at Nearby Points................... 20
1.9.2.2 Measures of Nearness....................... 20
1.9.2.3 The k-NN Method, and Tuning Parameters 21
1.9.2.4 Nearest-Neighbor Analysis in the regtools
Package..................................... 21
1.9.2.5 Example: Baseball Player Data.............. 22
1.10 After Fitting a Model, How Do We Use It for Prediction? . 22
1.10.1 Parametric Settings................................. 22
1.10.2 Nonparametric Settings.............................. 23
1.10.3 The Generic predict() Function...................... 23
1.11 Overfitting, and the Variance-Bias
Tradeoff................................................... 24
1.11.1 Intuition........................................... 24
1.11.2 Example: Student Evaluations of Instructors .... 26
1.12 Cross-Validation........................................... 26
1.12.1 Linear Model Case................................... 28
1.12.1.1 The Code................................... 28
1.12.1.2 Applying the Code.......................... 29
1.12.2 k-NN Case........................................... 29
1.12.3 Choosing the Partition Sizes ....................... 30
1.13 Important Note on Tuning Parameters........................ 31
1.14 Rough Rule of Thumb........................................ 32
1.15 Example: Bike-Sharing Data................................. 32
1.15.1 Linear Modeling of ................................. 33
1.15.2 Nonparametric Analysis.............................. 38
CONTENTS
xi
1.16 Interaction Terms, Including Quadratics................... 38
1.16.1 Example: Salaries of Female Programmers
and Engineers........................................ 39
1.16.2 Fitting Separate Models.............................. 42
1.16.3 Saving Your Work..................................... 43
1.16.4 Higher-Order Polynomial Models....................... 43
1.17 Classification Techniques.................................. 44
1.17.1 It’s a Regression Problem!........................... 44
1.17.2 Example: Bike-Sharing Data........................... 45
1.18 Crucial Advice: Don’t Automate, Participate!............... 47
1.19 Mathematical Complements................................... 48
1.19.1 Indicator Random Variables........................... 48
1.19.2 Mean Squared Error of an Estimator................. 48
1.19.3 fi(t) Minimizes Mean Squared Prediction Error ... 49
1.19.4 fj,(t) Minimizes the Misclassification Rate........ 50
1.19.5 Some Properties of Conditional Expectation......... 52
1.19.5.1 Conditional Expectation As a Random
Variable.................................... 52
1.19.5.2 The Law of Total Expectation.............. 53
1.19.5.3 Law of Total Variance....................... 54
1.19.5.4 Tower Property ............................. 54
1.19.5.5 Geometric View.............................. 54
1.20 Computational Complements.................................. 55
1.20.1 CRAN Packages........................................ 55
1.20.2 The Function tapply() and Its Cousins................ 56
1.20.3 The Innards of the k-NN Code......................... 58
1.20.4 Function Dispatch.................................... 59
1.21 Centering and Scaling...................................... 60
CONTENTS
xii
1.22 Exercises: Data, Code and Math Problems ................. 61
2 Linear Regression Models 65
2.1 Notation.................................................... 65
2.2 The “Error Term” ........................................... 67
2.3 Random- vs. Fixed-X Cases................................... 67
2.4 Least-Squares Estimation.................................... 68
2.4.1 Motivation........................................... 68
2.4.2 Matrix Formulations.................................. 70
2.4.3 (2.18) in Matrix Terms............................... 71
2.4.4 Using Matrix Operations to Minimize (2.18)........... 71
2.4.5 Models without an Intercept Term..................... 72
2.5 A Closer Look at lm() Output ............................... 73
2.5.1 Statistical Inference................................ 74
2.6 Assumptions ................................................ 75
2.6.1 Classical............................................ 75
2.6.2 Motivation: the Multivariate Normal
Distribution Family ................................. 76
2.7 Unbiasedness and Consistency................................ 79
2.7.1 /3 Is Unbiased....................................... 79
2.7.2 Bias As an Issue/Nonissue............................ 80
2.7.3 /3 Is Statistically Consistent....................... 80
2.8 Inference under Homoscedasticity............................ 81
2.8.1 Review: Classical Inference on a Single Mean .... 81
2.8.2 Back to Reality...................................... 82
2.8.3 The Concept of a Standard Error...................... 83
2.8.4 Extension to the Regression Case..................... 83
2.8.5 Example: Bike-Sharing Data........................... 86
CONTENTS xiii
2.9 Collective Predictive Strength of the 88
2.9.1 Basic Properties..................................... 88
2.9.2 Definition of R2 .................................... 90
2.9.3 Bias Issues.......................................... 91
2.9.4 Adjusted-i?2 ........................................ 92
2.9.5 The “Leaving-One-Out Method”......................... 94
2.9.6 Extensions of LOOM................................... 95
2.9.7 LOOM for k-NN........................................ 95
2.9.8 Other Measures....................................... 96
2.10 The Practical Value of p-Values — Small OR Large .... 96
2.10.1 Misleadingly Small p-Values.......................... 97
2.10.1.1 Example: Forest Cover Data.................. 97
2.10.1.2 Example: Click Through Data................. 98
2.10.2 Misleadingly LARGE p-Values.......................... 99
2.10.3 The Verdict......................................... 100
2.11 Missing Values............................................. 100
2.12 Mathematical Complements................................... 101
2.12.1 Covariance Matrices................................. 101
2.12.2 The Multivariate Normal Distribution Family .... 103
2.12.3 The Central Limit Theorem........................... 104
2.12.4 Details on Models Without a Constant Term .... 104
2.12.5 Unbiasedness of the Least-Squares Estimator .... 105
2.12.6 Consistency of the Least-Squares Estimator.......... 106
2.12.7 Biased Nature of S ................................. 108
2.12.8 The Geometry of Conditional Expectation............. 108
2.12.8.1 Random Variables As Inner Product Spaces 108
2.12.8.2 Projections................................ 109
XIV
CONTENTS
2.12.8.3 Conditional Expectations As Projections . 110
2.12.9 Predicted Values and Error Terms Are Uncorrelated 111
2.12.10 Classical “Exact” Inference...................... 112
2.12.11 Asymptotic (p H- 1)-Variate Normality of ¡3 ... 113
2.13 Computational Complements.............................. 115
2.13.1 Details of the Computation of (2.28).............. 115
2.13.2 R Functions for the Multivariate Normal
Distribution Family .............................. 116
2.13.2.1 Example: Simulation Computation of a
Bivariate Normal Quantity................. 116
2.13.3 More Details of 5lm’ Objects ..................... 118
2.14 Exercises: Data, Code and Math Problems ................. 120
3 Homoscedasticity and Other Assumptions in Practice 123
3.1 Normality Assumption..................................... 124
3.2 Independence Assumption — Don’t
Overlook It.............................................. 125
3.2.1 Estimation of a Single Mean ...................... 125
3.2.2 Inference on Linear Regression Coefficients....... 126
3.2.3 What Can Be Done?................................. 126
3.2.4 Example: MovieLens Data ........................ 127
3.3 Dropping the Homoscedasticity
Assumption............................................... 130
3.3.1 Robustness of the Homoscedasticity Assumption . . 131
3.3.2 Weighted Least Squares.......................... 133
3.3.3 A Procedure for Valid Inference................... 135
3.3.4 The Methodology................................... 135
3.3.5 Example: Female Wages............................. 136
3.3.6 Simulation Test................................... 137
CONTENTS
xv
3.3.7 Variance-Stabilizing Transformations............... 137
3.3.8 The Verdict........................................ 139
3.4 Further Reading........................................... 139
3.5 Computational Complements................................. 140
3.5.1 The R mergeQ Function.............................. 140
3.6 Mathematical Complements.................................. 141
3.6.1 The Delta Method................................... 141
3.6.2 Distortion Due to Transformation .................. 142
3.7 Exercises: Data, Code and Math Problems .................. 143
4 Generalized Linear and Nonlinear Models 147
4.1 Example: Enzyme Kinetics Model............................ 148
4.2 The Generalized Linear Model (GLM)........................ 150
4.2.1 Definition......................................... 150
4.2.2 Poisson Regression................................. 151
4.2.3 Exponential Families............................... 152
4.2.4 R*s glm() Function................................. 153
4.3 GLM: the Logistic Model................................... 154
4.3.1 Motivation......................................... 155
4.3.2 Example: Pima Diabetes Data........................ 158
4.3.3 Interpretation of Coefficients..................... 159
4.3.4 The predict() Function Again....................... 161
4.3.5 Overall Prediction Accuracy........................ 162
4.3.6 Example: Predicting Spam E-mail.................... 163
4.3.7 Linear Boundary.................................... 164
4.4 GLM: the Poisson Regression Model......................... 165
4.5 Least-Squares Computation................................. 166
CONTENTS
xvi
4.5.1 The Gauss-Newton Method........................... 166
4.5.2 Eicker-White Asymptotic Standard Errors........... 168
4.5.3 Example: Bike Sharing Data........................ 171
4.5.4 The “Elephant in the Room’ : Convergence
Issues............................................ 172
4.6 Further Reading.......................................... 173
4.7 Computational Complements................................ 173
4.7.1 GLM Computation................................... 173
4.7.2 R Factors ........................................ 174
4.8 Mathematical Complements................................. 175
4.8.1 Maximum Likelihood Estimation..................... 175
4.9 Exercises: Data, Code and Math Problems ................. 176
5 Multiclass Classification Problems 179
5.1 Key Notation............................................. 179
5.2 Key Equations............................................ 180
5.3 Estimating the Functions p*(t)........................... 182
5.4 How Do We Use Models for Prediction?..................... 182
5.5 One vs. All or All vs. All?.............................. 183
5.5.1 Which Is Better?.................................. 184
5.5.2 Example: Vertebrae Data........................... 184
5.5.3 Intuition......................................... 185
5.5.4 Example: Letter Recognition Data.................. 186
5.5.5 Example: k-NN on the Letter Recognition Data . . 187
5.5.6 The Verdict....................................... 188
5.6 Fisher Linear Discriminant Analysis...................... 188
5.6.1 Background........................................ 189
5.6.2 Derivation........................................ 189
CONTENTS xvii
5.6.3 Example: Vertebrae Data........................... 190
5.6.3.1 LDA Code and Results..................... 190
5.7 Multinomial Logistic Model............................... 191
5.7.1 Model............................................. 191
5.7.2 Software.......................................... 192
5.7.3 Example: Vertebrae Data........................... 192
5.8 The Issue of “Unbalanced” (and Balanced) Data............ 193
5.8.1 Why the Concern Regarding Balance?................ 194
5.8.2 A Crucial Sampling Issue.......................... 195
5.8.2.1 It All Depends on How We Sample .... 195
5.8.2.2 Remedies................................. 197
5.8.3 Example: Letter Recognition....................... 198
5.9 Going Beyond Using the 0.5 Threshold.................... 200
5.9.1 Unequal Misclassification Costs................... 200
5.9.2 Revisiting the Problem of Unbalanced Data......... 201
5.9.3 The Confusion Matrix and the ROC Curve............ 202
5.9.3.1 Code..................................... 203
5.9.3.2 Example: Spam Data....................... 203
5.10 Mathematical Complements................................. 203
5.10.1 Classification via Density Estimation............. 203
5.10.1.1 Methods for Density Estimation........... 204
5.10.2 Time Complexity Comparison, OVA vs. AVA .... 205
5.10.3 Optimal Classification Rule for
Unequal Error Costs............................... 206
5.11 Computational Complements................................ 207
5.11.1 R Code for OVA and AVA Logit Analysis............. 207
5.11.2 ROC Code.......................................... 210
xviii CONTENTS
5.12 Exercises: Data, Code and Math Problems ............... 211
6 Model Fit Assessment and Improvement 215
6.1 Aims of This Chapter..................................... 215
6.2 Methods.................................................. 216
6.3 Notation................................................. 216
6.4 Goals of Model Fit-Checking.............................. 217
6.4.1 Prediction Context................................ 217
6.4.2 Description Context............................... 218
6.4.3 Center vs. Fringes of the Data Set................ 218
6.5 Example: Currency Data.................................. 219
6.6 Overall Measures of Model Fit ........................... 220
6.6.1 R-Squared, Revisited.............................. 221
6.6.2 Cross-Validation, Revisited....................... 222
6.6.3 Plotting Parametric Fit Against a
Nonparametric One ................................ 222
6.6.4 Residuals vs. Smoothing........................... 223
6.7 Diagnostics Related to Individual
Predictors............................................... 224
6.7.1 Partial Residual Plots............................ 225
6.7.2 Plotting Nonpar ametric Fit Against
Each Predictor.................................... 227
6.7.3 The freqparcoord Package.......................... 229
6.7.3.1 Parallel Coordinates..................... 229
6.7.3.2 The freqparcoord Package................. 229
6.7.3.3 The regdiagQ Function.................... 230
6.8 Effects of Unusual Observations on Model Fit............. 232
6.8.1 The influenceQ Function........................... 232
CONTENTS
xix
6.8.1.1 Example: Currency Data.................... 233
6.8.2 Use of freqparcoord for Outlier Detection......... 235
6.9 Automated Outlier Resistance............................. 236
6.9.1 Median Regression.................................. 236
6.9.2 Example: Currency Data............................. 238
6.10 Example: Vocabulary Acquisition.......................... 238
6.11 Classification Settings.................................. 241
6.11.1 Example: Pima Diabetes Study....................... 242
6.12 Improving Fit............................................ 245
6.12.1 Deleting Terms from the Model..................... 245
6.12.2 Adding Polynomial Terms............................ 247
6.12.2.1 Example: Currency Data.................... 247
6.12.2.2 Example: Census Data...................... 248
6.12.3 Boosting........................................... 251
6.12.3.1 View from the 30,000-Foot Level.......... 251
6.12.3.2 Performance............................... 253
6.13 A Tool to Aid Model Selection............................ 254
6.14 Special Note on the Description Goal..................... 255
6.15 Computational Complements................................ 255
6.15.1 Data Wrangling for the Currency Dataset.............255
6.15.2 Data Wrangling for the Word Bank Dataset........... 256
6.16 Mathematical Complements................................. 257
6.16.1 The Hat Matrix .................................... 257
6.16.2 Matrix Inverse Update.............................. 259
6.16.3 The Median Minimizes Mean Absolute
Deviation.......................................... 260
6.16.4 The Gauss-Markov Theorem........................... 261
XX
CONTENTS
6.16.4.1 Lagrange Multipliers..................... 261
6.16.4.2 Proof of Gauss-Markov.................... 262
6.17 Exercises: Data, Code and Math Problems ................ 264
7 Disaggregating Regressor Effects 267
7.1 A Small Analytical Example............................... 268
7.2 Example: Baseball Player Data............................ 270
7.3 Simpson’s Paradox........................................ 274
7.3.1 Example: UCB Admissions Data (Logit)........... 274
7.3.2 The Verdict....................................... 278
7.4 Unobserved Predictor Variables........................... 278
7.4.1 Instrumental Variables (IVs) ..................... 279
7.4.1.1 The IV Method .......................... 281
7.4.1.2 Two-Stage Least Squares:................. 283
7.4.1.3 Example: Years of Schooling............. 284
7.4.1.4 The Verdict.............................. 286
7.4.2 Random Effects Models............................. 286
7.4.2.1 Example: Movie Ratings Data.............. 287
7.4.3 Multiple Random Effects .......................... 288
7.4.4 Why Use Random/Mixed Effects Models?........... 288
7.5 Regression Function Averaging............................ 289
7.5.1 Estimating the Counterfactual..................... 290
7.5.1.1 Example: Job Training.................... 290
7.5.2 Small Area Estimation: ‘‘Borrowing from
Neighbors” ....................................... 291
7.5.3 The Verdict....................................... 295
7.6 Multiple Inference....................................... 295
7.6.1 The Frequent Occurence of Extreme Events...........295
CONTENTS xxi
7.6.2 Relation to Statistical Inference.................. 296
7.6.3 The Ronferroni Inequality.......................... 297
7.6.4 Scheffe’s Method................................... 298
7.6.5 Example: MovieLens Data ........................... 300
7.6.6 The Verdict........................................ 303
7.7 Computational Complements................................. 303
7.7.1 MovieLens Data Wrangling........................... 303
7.7.2 More Data Wrangling in the MovieLens Example . . 303
7.8 Mathematical Complements.................................. 306
7.8.1 Iterated Projections............................... 306
7.8.2 Standard Errors for RFA............................ 307
7.8.3 Asymptotic Chi-Square Distributions................ 308
7.9 Exercises: Data, Code and Math Problems .................. 309
8 Shrinkage Estimators 311
8.1 Relevance of James-Stein to Regression Estimation..........312
8.2 Multicollinearity......................................... 313
8.2.1 What’s All the Puss About?......................... 313
8.2.2 A Simple Guiding Model ............................ 313
8.2.3 Checking for Multicollinearity..................... 314
8.2.3.1 The Variance Inflation Factor.............. 314
8.2.3.2 Example: Currency Data..................... 315
8.2.4 What Can/Should One Do?............................ 315
8.2.4.1 Do Nothing................................. 315
8.2.4.2 Eliminate Some Predictors.................. 316
8.2.4.3 Employ a Shrinkage Method.................. 316
8.3 Ridge Regression.......................................... 316
CONTENTS
xxii
8.3.1 Alternate Definitions............................... 317
8.3.2 Choosing the Value of A............................. 318
8.3.3 Example: Currency Data.............................. 319
8.4 The LASSO.................................................. 320
8.4.1 Definition.......................................... 321
8.4.2 The lars Package.................................... 322
8.4.3 Example: Currency Data...............................322
8.4.4 The Elastic Net..................................... 324
8.5 Cases of Exact Multicollinearity,
Including p n............................................ 324
8.5.1 Why It May Work..................................... 324
8.5.2 Example: R mtcars Data.............................. 325
8.5.2.1 Additional Motivation for the Elastic Net . 326
8.6 Bias, Standard Errors and Signficance
Tests...................................................... 327
8.7 Principal Components Analysis.............................. 327
8.8 Generalized Linear Models.................................. 329
8.8.1 Example: Vertebrae Data............................. 329
8.9 Other Terminology.......................................... 330
8.10 Further Reading............................................ 330
8.11 Mathematical Complements................................... 331
8.11.1 James-Stein Theory.................................. 331
8.11.1.1 Definition................................. 331
8.11.1.2 Theoretical Properties..................... 331
8.11.1.3 When Might Shrunken Estimators Be
Helpful?................................... 332
8.11.2 Yes, It Is Smaller.................................. 332
8.11.3 Ridge Action Increases Eigenvalues...................333
CONTENTS
xxiii
8.12 Computational Complements................................. 334
8.12.1 Code for ridgelmQ.................................. 334
8.13 Exercises: Data, Code and Math Problems .................. 336
9 Variable Selection and Dimension Reduction 339
9.1 A Closer Look at Under/Overfitting........................ 341
9.1.1 A Simple Guiding Example........................... 342
9.2 How Many Is Too Many?..................................... 344
9.3 Fit Criteria.............................................. 344
9.3.1 Some Common Measures .............................. 345
9.3.2 There Is No Panacea! .............................. 348
9.4 Variable Selection Methods................................ 348
9.5 Simple Use of p-Values: Pitfalls.......................... 349
9.6 Asking “What If” Questions................................ 349
9.7 Stepwise Selection........................................ 351
9.7.1 Basic Notion....................................... 351
9.7.2 Forward vs. Backward Selection..................... 352
9.7.3 R Functions for Stepwise Regression................ 352
9.7.4 Example: Bodyfat Data.............................. 352
9.7.5 Classification Settings............................ 357
9.7.5.1 Example: Bank Marketing Data............. 357
9.7.5.2 Example: Vertebrae Data.................. 361
9.7.6 Nonparametric Settings............................. 362
9.7.6.1 Is Dimension Reduction Important in the Nonparametric Setting?...... 362
9.7.7 The LASSO.......................................... 364
9.7.7.1 Why the LASSO Often Performs Subsetting 364
9.7.7.2 Example: Bodyfat Data.................... 366
9.8 Post-Selection Inference................................. 367
9.9 Direct Methods for Dimension Reduction................... 369
9.9.1 Informal Nature................................... 369
9.9.2 Role in Regression Analysis....................... 370
9.9.3 PCA............................................... 370
9.9.3.1 Issues.................................... 371
9.9.3.2 Example: Bodyfat Data..................... 371
9.9.3.3 Example: Instructor Evaluations........... 375
9.9.4 Nonnegative Matrix Factorization (NMF)............ 377
9.9.4.1 Overview.................................. 377
9.9.4.2 Interpretation ........................... 377
9.9.4.3 Sum-of-Parts Property .....................378
9.9.4.4 Example: Spam Detection .................. 378
9.9.5 Use of freqparcoord for Dimension Reduction .... 380
9.9.5.1 Example: Student Evaluations of Instructors 380
9.9.5.2 Dimension Reduction for Dummy/R Factor Variables...... 381
9.10 The Verdict.............................................. 382
9.11 Further Reading.......................................... 383
9.12 Computational Complements................................ 383
9.12.1 Computation for NMF............................... 383
9.13 Mathematical Complements................................. 386
9.13.1 MSEs for the Simple Example ...................... 386
9.14 Exercises: Data, Code and Math Problems ................. 387
10 Partition-Based Methods 391
10.1 CART..................................................... 392
10.2 Example: Vertebral Column Data .......................... 394
10.3 Technical Details......................................... 398
10.3.1 Split Criterion.................................... 398
10.3.2 Predictor Reuse and Statistical Consistency........ 398
10.4 Tuning Parameters......................................... 399
10.5 Random Forests ........................................... 399
10.5.1 Bagging............................................ 400
10.5.2 Example: Vertebrae Data............................ 400
10.5.3 Example: Letter Recognition........................ 401
10.6 Other Implementations of CART............................. 402
10.7 Exercises: Data, Code and Math Problems .................. 403
11 Semi-Linear Methods 405
11.1 k-NN with Linear Smoothing................................ 407
11.1.1 Extrapolation Via lm() ............................ 407
11.1.2 Multicollinearity Issues .......................... 409
11.1.3 Example: Bodyfat Data.............................. 409
11.1.4 Tuning Parameter................................... 409
11.2 Linear Approximation of Class Boundaries.................. 410
11.2.1 SVMs............................................... 410
11.2.1.1 Geometric Motivation ..................... 411
11.2.1.2 Reduced Convex Hulls...................... 413
11.2.1.3 Tuning Parameter.......................... 415
11.2.1.4 Nonlinear Boundaries...................... 416
11.2.1.5 Statistical Consistency .................. 417
11.2.1.6 Example: Letter Recognition Data......... 417
11.2.2 Neural Networks.................................... 418
11.2.2.1 Example: Vertebrae Data.................. 418
11.2.2.2 Tuning Parameters and Other Technical Details...... 420
11.2.2.3 Dimension Reduction....................... 421
11.2.2.4 Why Does It Work (If It Does)?............ 421
11.3 The Verdict............................................... 423
11.4 Mathematical Complements...................................424
11.4.1 Edge Bias in Nonparametric Regression...............424
11.4.2 Dual Formulation for SVM........................... 425
11.4.3 The Kernel Trick................................... 428
11.5 Further Reading........................................... 429
11.6 Exercises: Data, Code and Math Problems ................. 429
12 Regression and Classification in Big Data 431
12.1 Solving the Big-n Problem................................. 432
12.1.1 Software Alchemy................................... 432
12.1.2 Example: Flight Delay Data .........................433
12.1.3 More on the Insufficient Memory Issue.............. 436
12.1.4 Deceivingly “Big” n.................................437
12.1.5 The Independence Assumption in Big-n Data .... 437
12.2 Addressing Big-p.......................................... 438
12.2.1 How Many Is Too Many?.............................. 438
12.2.1.1 Toy Model..................................439
12.2.1.2 Results from the Research Literature . . . 440
12.2.1.3 A Much Simpler and More Direct Approach 441
12.2.1.4 Nonparametric Case........................ 441
12.2.1.5 The Curse of Dimensionality............... 443
12.2.2 Example: Currency Data............................ 443
12.2.3 Example: Quiz Documents........................... 444
12.2.4 The Verdict........................................ 446
12.3 Mathematical Complements................................. 447
12.3.1 Speedup from Software Alchemy.................... 447
12.4 Computational Complements................................ 448
12.4.1 The partools Package .............................. 448
12.4.2 Use of the tm Package............................ 449
12.5 Exercises: Data, Code and Math Problems ............... 450
A Matrix Algebra 451
A.1 Terminology and Notation.................................. 451
A.2 Matrix Addition and Multiplication ....................... 452
A.3 Matrix Transpose.......................................... 453
A.4 Linear Independence....................................... 454
A.5 Matrix Inverse ........................................... 454
A.6 Eigenvalues and Eigenvectors.............................. 455
A.7 Rank of a Matrix.......................................... 456
A.8 Matrices of the Form B’B.................................. 456
A.9 Partitioned Matrices...................................... 457
A.10 Matrix Derivatives....................................... 458
A.11 Matrix Algebra in R...................................... 459
A.12 Further Reading.......................................... 462
Index 475
any_adam_object | 1 |
author | Matloff, Norman S. 1948- |
author_GND | (DE-588)1018956115 |
author_facet | Matloff, Norman S. 1948- |
author_role | aut |
author_sort | Matloff, Norman S. 1948- |
author_variant | n s m ns nsm |
building | Verbundindex |
bvnumber | BV044901487 |
callnumber-first | Q - Science |
callnumber-label | QA278 |
callnumber-raw | QA278.2 |
callnumber-search | QA278.2 |
callnumber-sort | QA 3278.2 |
callnumber-subject | QA - Mathematics |
classification_rvk | QH 234 SK 840 |
ctrlnum | (OCoLC)988749360 (DE-599)BVBBV044901487 |
dewey-full | 519.5/36 |
dewey-hundreds | 500 - Natural sciences and mathematics |
dewey-ones | 519 - Probabilities and applied mathematics |
dewey-raw | 519.5/36 |
dewey-search | 519.5/36 |
dewey-sort | 3519.5 236 |
dewey-tens | 510 - Mathematics |
discipline | Mathematik Wirtschaftswissenschaften |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02276nam a2200481 c 4500</leader><controlfield tag="001">BV044901487</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20180726 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">180412s2017 a||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">017011270</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781498710916</subfield><subfield code="c">pbk</subfield><subfield code="9">978-1-4987-1091-6</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781138066465</subfield><subfield code="c">hbk</subfield><subfield code="9">978-1-138-06646-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)988749360</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV044901487</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield><subfield code="a">DE-384</subfield><subfield code="a">DE-521</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA278.2</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">519.5/36</subfield><subfield code="2">23</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">QH 234</subfield><subfield code="0">(DE-625)141549:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SK 840</subfield><subfield code="0">(DE-625)143261:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Matloff, Norman S.</subfield><subfield code="d">1948-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1018956115</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Statistical regression and classification</subfield><subfield code="b">from linear models to machine learning</subfield><subfield code="c">Norman Matloff, University of California, Davis, USA</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton London ; New York</subfield><subfield code="b">CRC Press, Taylor & Francis Group</subfield><subfield code="c">[2017]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2017</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxxviii, 489 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Texts in statistical science</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Im Buch ist die ISBN der Hardbackausgabe fälschlich als: 978-1-138-06656-5 angegeben</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Regression analysis</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Vector analysis</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Lineare Regression</subfield><subfield code="0">(DE-588)4167709-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Klassifikation</subfield><subfield code="0">(DE-588)4120957-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Lineare Regression</subfield><subfield code="0">(DE-588)4167709-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Automatische Klassifikation</subfield><subfield code="0">(DE-588)4120957-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030295275&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030295275&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-030295275</subfield></datafield></record></collection> |
id | DE-604.BV044901487 |
illustrated | Illustrated |
indexdate | 2024-07-10T08:04:16Z |
institution | BVB |
isbn | 9781498710916 9781138066465 |
language | English |
lccn | 017011270 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-030295275 |
oclc_num | 988749360 |
open_access_boolean | |
owner | DE-739 DE-384 DE-521 |
owner_facet | DE-739 DE-384 DE-521 |
physical | xxxviii, 489 Seiten Illustrationen, Diagramme |
publishDate | 2017 |
publishDateSearch | 2017 |
publishDateSort | 2017 |
publisher | CRC Press, Taylor & Francis Group |
record_format | marc |
series2 | Texts in statistical science |
spelling | Matloff, Norman S. 1948- Verfasser (DE-588)1018956115 aut Statistical regression and classification from linear models to machine learning Norman Matloff, University of California, Davis, USA Boca Raton London ; New York CRC Press, Taylor & Francis Group [2017] © 2017 xxxviii, 489 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Texts in statistical science Includes bibliographical references Im Buch ist die ISBN der Hardbackausgabe fälschlich als: 978-1-138-06656-5 angegeben Regression analysis Vector analysis Lineare Regression (DE-588)4167709-2 gnd rswk-swf Automatische Klassifikation (DE-588)4120957-6 gnd rswk-swf Lineare Regression (DE-588)4167709-2 s Automatische Klassifikation (DE-588)4120957-6 s DE-604 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030295275&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030295275&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Matloff, Norman S. 1948- Statistical regression and classification from linear models to machine learning Regression analysis Vector analysis Lineare Regression (DE-588)4167709-2 gnd Automatische Klassifikation (DE-588)4120957-6 gnd |
subject_GND | (DE-588)4167709-2 (DE-588)4120957-6 |
title | Statistical regression and classification from linear models to machine learning |
title_auth | Statistical regression and classification from linear models to machine learning |
title_exact_search | Statistical regression and classification from linear models to machine learning |
title_full | Statistical regression and classification from linear models to machine learning Norman Matloff, University of California, Davis, USA |
title_fullStr | Statistical regression and classification from linear models to machine learning Norman Matloff, University of California, Davis, USA |
title_full_unstemmed | Statistical regression and classification from linear models to machine learning Norman Matloff, University of California, Davis, USA |
title_short | Statistical regression and classification |
title_sort | statistical regression and classification from linear models to machine learning |
title_sub | from linear models to machine learning |
topic | Regression analysis Vector analysis Lineare Regression (DE-588)4167709-2 gnd Automatische Klassifikation (DE-588)4120957-6 gnd |
topic_facet | Regression analysis Vector analysis Lineare Regression Automatische Klassifikation |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030295275&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030295275&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT matloffnormans statisticalregressionandclassificationfromlinearmodelstomachinelearning |
No print copy is available.