An introduction to statistical learning: with applications in Python
Saved in:
Main authors: | James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert; Taylor, Jonathan E. |
---|---|
Format: | Book |
Language: | English |
Published: | Cham : Springer, [2023] |
Series: | Springer texts in statistics |
Subjects: | Python; Machine learning; Data analysis; Statistics |
Online access: | Full text; Table of contents |
Summary: | An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance, marketing, and astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, deep learning, survival analysis, multiple testing, and more. Color graphics and real-world examples are used to illustrate the methods presented. This book is targeted at statisticians and non-statisticians alike, who wish to use cutting-edge statistical learning techniques to analyze their data. |
Physical description: | xv, 607 pages : illustrations, diagrams |
ISBN: | 9783031387463; 9783031391897 |
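The labs accompanying each chapter are written in Python. As a rough flavor of the workflow they cover (a minimal sketch, not code from the book: the synthetic data and the scikit-learn calls below are illustrative assumptions), fitting and evaluating a linear regression might look like this:

```python
# Minimal sketch of a statistical-learning workflow in Python:
# fit a linear regression on a training split and report test error.
# The data below is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                         # three predictors
beta = np.array([1.5, -0.5, 0.0])                     # true coefficients
y = 2.0 + X @ beta + rng.normal(scale=0.5, size=200)  # linear signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

print("estimated coefficients:", model.coef_)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```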
Internal format
MARC
LEADER 00000nam a2200000 c 4500
001 BV049047206
003 DE-604
005 20241128
007 t|
008 230712s2023 xx a||| |||| 00||| eng d
020 __ |a 9783031387463 |q hbk |9 978-3-031-38746-3
020 __ |a 9783031391897 |q pbk : ca. EUR 79.95 |9 978-3-031-39189-7
035 __ |a (OCoLC)1390747667
035 __ |a (DE-599)BVBBV049047206
040 __ |a DE-604 |b ger |e rda
041 0_ |a eng
049 __ |a DE-473 |a DE-11 |a DE-860 |a DE-384 |a DE-355 |a DE-863 |a DE-188 |a DE-29T |a DE-83 |a DE-521 |a DE-573 |a DE-739 |a DE-703
082 0_ |a 519.5 |2 23
084 __ |a ST 250 |0 (DE-625)143626: |2 rvk
084 __ |a SK 830 |0 (DE-625)143259: |2 rvk
084 __ |a XF 3400 |0 (DE-625)152765: |2 rvk
084 __ |a SK 840 |0 (DE-625)143261: |2 rvk
084 __ |a 62-04 |2 msc/2020
084 __ |a 62H30 |2 msc/2020
084 __ |a 68T05 |2 msc/2020
100 1_ |a James, Gareth |e Verfasser |0 (DE-588)1038457327 |4 aut
245 10 |a An introduction to statistical learning |b with applications in Python |c Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor
264 _1 |a Cham |b Springer |c [2023]
264 _4 |c © 2023
300 __ |a xv, 607 Seiten |b Illustrationen, Diagramme
336 __ |b txt |2 rdacontent
337 __ |b n |2 rdamedia
338 __ |b nc |2 rdacarrier
490 0_ |a Springer texts in statistics
520 3_ |a An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance, marketing, and astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, deep learning, survival analysis, multiple testing, and more. Color graphics and real-world examples are used to illustrate the methods presented. This book is targeted at statisticians and non-statisticians alike, who wish to use cutting-edge statistical learning techniques to analyze their data.
600 07 |a Python |0 (DE-588)118793772 |2 gnd |9 rswk-swf
650 _4 |a Statistical Theory and Methods
650 _4 |a Statistics and Computing
650 _4 |a Applied Statistics
650 _4 |a Statistics
650 07 |a Datenanalyse |0 (DE-588)4123037-1 |2 gnd |9 rswk-swf
650 07 |a Statistik |0 (DE-588)4056995-0 |2 gnd |9 rswk-swf
650 07 |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf
689 00 |a Python |0 (DE-588)118793772 |D p
689 01 |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s
689 02 |a Datenanalyse |0 (DE-588)4123037-1 |D s
689 03 |a Statistik |0 (DE-588)4056995-0 |D s
689 0_ |5 DE-604
700 1_ |a Witten, Daniela |e Verfasser |0 (DE-588)108120849X |4 aut
700 1_ |a Hastie, Trevor |d 1953- |e Verfasser |0 (DE-588)172128242 |4 aut
700 1_ |a Tibshirani, Robert |d 1956- |e Verfasser |0 (DE-588)172417740 |4 aut
700 1_ |a Taylor, Jonathan E. |e Verfasser |0 (DE-588)102963100X |4 aut
776 08 |i Erscheint auch als |n Online-Ausgabe |z 978-3-031-38747-0 |w (DE-604)BV049032803
856 41 |u https://www.statlearning.com/ |x Verlag |z kostenfrei |3 Volltext
856 42 |m Digitalisierung UB Bamberg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034309627&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
912 __ |a ebook
940 1_ |q gbd_0
943 1_ |a oai:aleph.bib-bvb.de:BVB01-034309627
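A MARC record like the one above can also be processed programmatically. A minimal sketch, assuming the pymarc library is installed and that the record has been exported to a hypothetical local file record.mrc in binary MARC:

```python
# Sketch: parse a binary MARC export with pymarc and print a few
# fields of interest. "record.mrc" is a hypothetical local file name.
from pymarc import MARCReader

with open("record.mrc", "rb") as fh:
    for record in MARCReader(fh):
        title = record["245"]                      # title statement
        print(" ".join(title.get_subfields("a", "b")))
        for field in record.get_fields("020"):     # ISBNs
            for isbn in field.get_subfields("a"):
                print("ISBN:", isbn)
        for field in record.get_fields("856"):     # online access links
            for url in field.get_subfields("u"):
                print("URL:", url)
```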
Record in the search index
DE-BY-863_location | 1000 |
---|---|
DE-BY-FWS_call_number | 1000/ST 250 J27 |
DE-BY-FWS_katkey | 1069087 |
DE-BY-FWS_media_number | 083101208092 |
_version_ | 1819742520647614464 |
adam_text |

Contents

Preface  vii

1 Introduction  1

2 Statistical Learning  15
  2.1 What Is Statistical Learning?  15
    2.1.1 Why Estimate f?  17
    2.1.2 How Do We Estimate f?  20
    2.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability  23
    2.1.4 Supervised Versus Unsupervised Learning  25
    2.1.5 Regression Versus Classification Problems  27
  2.2 Assessing Model Accuracy  27
    2.2.1 Measuring the Quality of Fit  28
    2.2.2 The Bias-Variance Trade-Off  31
    2.2.3 The Classification Setting  34
  2.3 Lab: Introduction to Python  40
    2.3.1 Getting Started  40
    2.3.2 Basic Commands  40
    2.3.3 Introduction to Numerical Python  42
    2.3.4 Graphics  48
    2.3.5 Sequences and Slice Notation  51
    2.3.6 Indexing Data  51
    2.3.7 Loading Data  55
    2.3.8 For Loops  59
    2.3.9 Additional Graphical and Numerical Summaries  61
  2.4 Exercises  63

3 Linear Regression  69
  3.1 Simple Linear Regression  70
    3.1.1 Estimating the Coefficients  71
    3.1.2 Assessing the Accuracy of the Coefficient Estimates  72
    3.1.3 Assessing the Accuracy of the Model  77
  3.2 Multiple Linear Regression  80
    3.2.1 Estimating the Regression Coefficients  81
    3.2.2 Some Important Questions  83
  3.3 Other Considerations in the Regression Model  91
    3.3.1 Qualitative Predictors  91
    3.3.2 Extensions of the Linear Model  94
    3.3.3 Potential Problems  100
  3.4 The Marketing Plan  109
  3.5 Comparison of Linear Regression with K-Nearest Neighbors  111
  3.6 Lab: Linear Regression  116
    3.6.1 Importing packages  116
    3.6.2 Simple Linear Regression  117
    3.6.3 Multiple Linear Regression  122
    3.6.4 Multivariate Goodness of Fit  123
    3.6.5 Interaction Terms  124
    3.6.6 Non-linear Transformations of the Predictors  125
    3.6.7 Qualitative Predictors  126
  3.7 Exercises  127

4 Classification  135
  4.1 An Overview of Classification  135
  4.2 Why Not Linear Regression?  136
  4.3 Logistic Regression  138
    4.3.1 The Logistic Model  139
    4.3.2 Estimating the Regression Coefficients  140
    4.3.3 Making Predictions  141
    4.3.4 Multiple Logistic Regression  142
    4.3.5 Multinomial Logistic Regression  144
  4.4 Generative Models for Classification  146
    4.4.1 Linear Discriminant Analysis for p = 1  147
    4.4.2 Linear Discriminant Analysis for p > 1  150
    4.4.3 Quadratic Discriminant Analysis  156
    4.4.4 Naive Bayes  158
  4.5 A Comparison of Classification Methods  161
    4.5.1 An Analytical Comparison  161
    4.5.2 An Empirical Comparison  164
  4.6 Generalized Linear Models  167
    4.6.1 Linear Regression on the Bikeshare Data  167
    4.6.2 Poisson Regression on the Bikeshare Data  169
    4.6.3 Generalized Linear Models in Greater Generality  172
  4.7 Lab: Logistic Regression, LDA, QDA, and KNN  173
    4.7.1 The Stock Market Data  173
    4.7.2 Logistic Regression  174
    4.7.3 Linear Discriminant Analysis  179
    4.7.4 Quadratic Discriminant Analysis  181
    4.7.5 Naive Bayes  182
    4.7.6 K-Nearest Neighbors  183
    4.7.7 Linear and Poisson Regression on the Bikeshare Data  188
  4.8 Exercises  193

5 Resampling Methods  201
  5.1 Cross-Validation  202
    5.1.1 The Validation Set Approach  202
    5.1.2 Leave-One-Out Cross-Validation  204
    5.1.3 k-Fold Cross-Validation  206
    5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation  208
    5.1.5 Cross-Validation on Classification Problems  209
  5.2 The Bootstrap  212
  5.3 Lab: Cross-Validation and the Bootstrap  215
    5.3.1 The Validation Set Approach  216
    5.3.2 Cross-Validation  217
    5.3.3 The Bootstrap  220
  5.4 Exercises  224

6 Linear Model Selection and Regularization  229
  6.1 Subset Selection  231
    6.1.1 Best Subset Selection  231
    6.1.2 Stepwise Selection  233
    6.1.3 Choosing the Optimal Model  235
  6.2 Shrinkage Methods  240
    6.2.1 Ridge Regression  240
    6.2.2 The Lasso  244
    6.2.3 Selecting the Tuning Parameter  252
  6.3 Dimension Reduction Methods  253
    6.3.1 Principal Components Regression  254
    6.3.2 Partial Least Squares  260
  6.4 Considerations in High Dimensions  262
    6.4.1 High-Dimensional Data  262
    6.4.2 What Goes Wrong in High Dimensions?  263
    6.4.3 Regression in High Dimensions  265
    6.4.4 Interpreting Results in High Dimensions  266
  6.5 Lab: Linear Models and Regularization Methods  267
    6.5.1 Subset Selection Methods  268
    6.5.2 Ridge Regression and the Lasso  273
    6.5.3 PCR and PLS Regression  280
  6.6 Exercises  283

7 Moving Beyond Linearity  289
  7.1 Polynomial Regression  290
  7.2 Step Functions  292
  7.3 Basis Functions  293
  7.4 Regression Splines  294
    7.4.1 Piecewise Polynomials  294
    7.4.2 Constraints and Splines  296
    7.4.3 The Spline Basis Representation  296
    7.4.4 Choosing the Number and Locations of the Knots  297
    7.4.5 Comparison to Polynomial Regression  299
  7.5 Smoothing Splines  300
    7.5.1 An Overview of Smoothing Splines  300
    7.5.2 Choosing the Smoothing Parameter λ  301
  7.6 Local Regression  303
  7.7 Generalized Additive Models  305
    7.7.1 GAMs for Regression Problems  306
    7.7.2 GAMs for Classification Problems  308
  7.8 Lab: Non-Linear Modeling  309
    7.8.1 Polynomial Regression and Step Functions  310
    7.8.2 Splines  315
    7.8.3 Smoothing Splines and GAMs  317
    7.8.4 Local Regression  324
  7.9 Exercises  325

8 Tree-Based Methods  331
  8.1 The Basics of Decision Trees  331
    8.1.1 Regression Trees  331
    8.1.2 Classification Trees  337
    8.1.3 Trees Versus Linear Models  341
    8.1.4 Advantages and Disadvantages of Trees  341
  8.2 Bagging, Random Forests, Boosting, and Bayesian Additive Regression Trees  343
    8.2.1 Bagging  343
    8.2.2 Random Forests  346
    8.2.3 Boosting  347
    8.2.4 Bayesian Additive Regression Trees  350
    8.2.5 Summary of Tree Ensemble Methods  353
  8.3 Lab: Tree-Based Methods  354
    8.3.1 Fitting Classification Trees  355
    8.3.2 Fitting Regression Trees  358
    8.3.3 Bagging and Random Forests  360
    8.3.4 Boosting  361
    8.3.5 Bayesian Additive Regression Trees  362
  8.4 Exercises  363

9 Support Vector Machines  367
  9.1 Maximal Margin Classifier  367
    9.1.1 What Is a Hyperplane?  368
    9.1.2 Classification Using a Separating Hyperplane  368
    9.1.3 The Maximal Margin Classifier  370
    9.1.4 Construction of the Maximal Margin Classifier  372
    9.1.5 The Non-separable Case  372
  9.2 Support Vector Classifiers  373
    9.2.1 Overview of the Support Vector Classifier  373
    9.2.2 Details of the Support Vector Classifier  374
  9.3 Support Vector Machines  377
    9.3.1 Classification with Non-Linear Decision Boundaries  378
    9.3.2 The Support Vector Machine  379
    9.3.3 An Application to the Heart Disease Data  382
  9.4 SVMs with More than Two Classes  383
    9.4.1 One-Versus-One Classification  384
    9.4.2 One-Versus-All Classification  384
  9.5 Relationship to Logistic Regression  384
  9.6 Lab: Support Vector Machines  387
    9.6.1 Support Vector Classifier  387
    9.6.2 Support Vector Machine  390
    9.6.3 ROC Curves  392
    9.6.4 SVM with Multiple Classes  393
    9.6.5 Application to Gene Expression Data  394
  9.7 Exercises  395

10 Deep Learning  399
  10.1 Single Layer Neural Networks  400
  10.2 Multilayer Neural Networks  402
  10.3 Convolutional Neural Networks  406
    10.3.1 Convolution Layers  407
    10.3.2 Pooling Layers  410
    10.3.3 Architecture of a Convolutional Neural Network  410
    10.3.4 Data Augmentation  411
    10.3.5 Results Using a Pretrained Classifier  412
  10.4 Document Classification  413
  10.5 Recurrent Neural Networks  416
    10.5.1 Sequential Models for Document Classification  418
    10.5.2 Time Series Forecasting  420
    10.5.3 Summary of RNNs  424
  10.6 When to Use Deep Learning  425
  10.7 Fitting a Neural Network  427
    10.7.1 Backpropagation  428
    10.7.2 Regularization and Stochastic Gradient Descent  429
    10.7.3 Dropout Learning  431
    10.7.4 Network Tuning  431
  10.8 Interpolation and Double Descent  432
  10.9 Lab: Deep Learning  435
    10.9.1 Single Layer Network on Hitters Data  437
    10.9.2 Multilayer Network on the MNIST Digit Data  444
    10.9.3 Convolutional Neural Networks  448
    10.9.4 Using Pretrained CNN Models  452
    10.9.5 IMDB Document Classification  454
    10.9.6 Recurrent Neural Networks  458
  10.10 Exercises  465

11 Survival Analysis and Censored Data  469
  11.1 Survival and Censoring Times  470
  11.2 A Closer Look at Censoring  470
  11.3 The Kaplan-Meier Survival Curve  472
  11.4 The Log-Rank Test  474
  11.5 Regression Models With a Survival Response  476
    11.5.1 The Hazard Function  476
    11.5.2 Proportional Hazards  478
    11.5.3 Example: Brain Cancer Data  482
    11.5.4 Example: Publication Data  482
  11.6 Shrinkage for the Cox Model  484
  11.7 Additional Topics  486
    11.7.1 Area Under the Curve for Survival Analysis  486
    11.7.2 Choice of Time Scale  487
    11.7.3 Time-Dependent Covariates  488
    11.7.4 Checking the Proportional Hazards Assumption  488
    11.7.5 Survival Trees  488
  11.8 Lab: Survival Analysis  489
    11.8.1 Brain Cancer Data  489
    11.8.2 Publication Data  493
    11.8.3 Call Center Data  494
  11.9 Exercises  498

12 Unsupervised Learning  503
  12.1 The Challenge of Unsupervised Learning  503
  12.2 Principal Components Analysis  504
    12.2.1 What Are Principal Components?  505
    12.2.2 Another Interpretation of Principal Components  508
    12.2.3 The Proportion of Variance Explained  510
    12.2.4 More on PCA  512
    12.2.5 Other Uses for Principal Components  515
  12.3 Missing Values and Matrix Completion  515
  12.4 Clustering Methods  520
    12.4.1 K-Means Clustering  521
    12.4.2 Hierarchical Clustering  525
    12.4.3 Practical Issues in Clustering  532
  12.5 Lab: Unsupervised Learning  535
    12.5.1 Principal Components Analysis  535
    12.5.2 Matrix Completion  539
    12.5.3 Clustering  542
    12.5.4 NCI60 Data Example  546
  12.6 Exercises  552

13 Multiple Testing  557
  13.1 A Quick Review of Hypothesis Testing  558
    13.1.1 Testing a Hypothesis  558
    13.1.2 Type I and Type II Errors  562
  13.2 The Challenge of Multiple Testing  563
  13.3 The Family-Wise Error Rate  565
    13.3.1 What is the Family-Wise Error Rate?  565
    13.3.2 Approaches to Control the Family-Wise Error Rate  567
    13.3.3 Trade-Off Between the FWER and Power  572
  13.4 The False Discovery Rate  573
    13.4.1 Intuition for the False Discovery Rate  573
    13.4.2 The Benjamini-Hochberg Procedure  575
  13.5 A Re-Sampling Approach to p-Values and False Discovery Rates  577
    13.5.1 A Re-Sampling Approach to the p-Value  578
    13.5.2 A Re-Sampling Approach to the False Discovery Rate  579
    13.5.3 When Are Re-Sampling Approaches Useful?  581
  13.6 Lab: Multiple Testing  583
    13.6.1 Review of Hypothesis Tests  583
    13.6.2 Family-Wise Error Rate  585
    13.6.3 False Discovery Rate  588
    13.6.4 A Re-Sampling Approach  590
  13.7 Exercises  593

Index  597
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | James, Gareth Witten, Daniela Hastie, Trevor 1953- Tibshirani, Robert 1956- Taylor, Jonathan E. |
author_GND | (DE-588)1038457327 (DE-588)108120849X (DE-588)172128242 (DE-588)172417740 (DE-588)102963100X |
author_facet | James, Gareth Witten, Daniela Hastie, Trevor 1953- Tibshirani, Robert 1956- Taylor, Jonathan E. |
author_role | aut aut aut aut aut |
author_sort | James, Gareth |
author_variant | g j gj d w dw t h th r t rt j e t je jet |
building | Verbundindex |
bvnumber | BV049047206 |
classification_rvk | ST 250 SK 830 XF 3400 SK 840 |
collection | ebook |
ctrlnum | (OCoLC)1390747667 (DE-599)BVBBV049047206 |
dewey-full | 519.5 |
dewey-hundreds | 500 - Natural sciences and mathematics |
dewey-ones | 519 - Probabilities and applied mathematics |
dewey-raw | 519.5 |
dewey-search | 519.5 |
dewey-sort | 3519.5 |
dewey-tens | 510 - Mathematics |
discipline | Informatik Mathematik Medizin |
discipline_str_mv | Informatik Mathematik |
format | Book |
fullrecord | (MARCXML serialization of the MARC record shown above) |
id | DE-604.BV049047206 |
illustrated | Illustrated |
index_date | 2024-07-03T22:20:35Z |
indexdate | 2024-12-29T04:08:24Z |
institution | BVB |
isbn | 9783031387463 9783031391897 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-034309627 |
oclc_num | 1390747667 |
open_access_boolean | 1 |
owner | DE-473 DE-BY-UBG DE-11 DE-860 DE-384 DE-355 DE-BY-UBR DE-863 DE-BY-FWS DE-188 DE-29T DE-83 DE-521 DE-573 DE-739 DE-703 |
owner_facet | DE-473 DE-BY-UBG DE-11 DE-860 DE-384 DE-355 DE-BY-UBR DE-863 DE-BY-FWS DE-188 DE-29T DE-83 DE-521 DE-573 DE-739 DE-703 |
physical | xv, 607 Seiten Illustrationen, Diagramme |
psigel | ebook gbd_0 |
publishDate | 2023 |
publishDateSearch | 2023 |
publishDateSort | 2023 |
publisher | Springer |
record_format | marc |
series2 | Springer texts in statistics |
spellingShingle | James, Gareth Witten, Daniela Hastie, Trevor 1953- Tibshirani, Robert 1956- Taylor, Jonathan E. An introduction to statistical learning with applications in Python Python (DE-588)118793772 gnd Statistical Theory and Methods Statistics and Computing Applied Statistics Statistics Datenanalyse (DE-588)4123037-1 gnd Statistik (DE-588)4056995-0 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
subject_GND | (DE-588)118793772 (DE-588)4123037-1 (DE-588)4056995-0 (DE-588)4193754-5 |
title | An introduction to statistical learning with applications in Python |
title_auth | An introduction to statistical learning with applications in Python |
title_exact_search | An introduction to statistical learning with applications in Python |
title_exact_search_txtP | An Introduction to Statistical Learning with Applications in Python |
title_full | An introduction to statistical learning with applications in Python Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor |
title_fullStr | An introduction to statistical learning with applications in Python Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor |
title_full_unstemmed | An introduction to statistical learning with applications in Python Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor |
title_short | An introduction to statistical learning |
title_sort | an introduction to statistical learning with applications in python |
title_sub | with applications in Python |
topic | Python (DE-588)118793772 gnd Statistical Theory and Methods Statistics and Computing Applied Statistics Statistics Datenanalyse (DE-588)4123037-1 gnd Statistik (DE-588)4056995-0 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
topic_facet | Python Statistical Theory and Methods Statistics and Computing Applied Statistics Statistics Datenanalyse Statistik Maschinelles Lernen |
url | https://www.statlearning.com/ http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034309627&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT jamesgareth anintroductiontostatisticallearningwithapplicationsinpython AT wittendaniela anintroductiontostatisticallearningwithapplicationsinpython AT hastietrevor anintroductiontostatisticallearningwithapplicationsinpython AT tibshiranirobert anintroductiontostatisticallearningwithapplicationsinpython AT taylorjonathane anintroductiontostatisticallearningwithapplicationsinpython |
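The key-value pairs above are fields of a VuFind-style Solr search index. A minimal sketch of retrieving this record by its id field over Solr's standard select API; the endpoint URL and core name ("biblio") are hypothetical assumptions, while the field names match the index fields listed above:

```python
# Sketch: query a VuFind-style Solr index for this record by id.
# The Solr URL and core name are hypothetical placeholders.
import json
import urllib.parse
import urllib.request

SOLR_SELECT = "http://localhost:8983/solr/biblio/select"  # hypothetical endpoint
params = urllib.parse.urlencode({
    "q": 'id:"DE-604.BV049047206"',
    "fl": "title,author,isbn,publishDate,url",
    "wt": "json",
})
with urllib.request.urlopen(f"{SOLR_SELECT}?{params}") as resp:
    for doc in json.load(resp)["response"]["docs"]:
        print(doc)
```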
Open full text
THWS Würzburg Zentralbibliothek Lesesaal
Call number: | 1000 ST 250 J27 |
---|---|
Copy 1 | Loanable – checked out, return due: 23.06.2025 – Place hold |