Probability and statistics for data science: math + R + data
Main author: Matloff, Norman S. 1948-
Format: Book
Language: English
Published: Boca Raton : CRC Press, Taylor & Francis Group, [2020]
Series: Data science series; A Chapman & Hall book
Subjects: Statistik; Wahrscheinlichkeit; Data Science
Online access: Table of contents
Description: xxxii, 412 pages; diagrams
ISBN: 9781138393295
Internal format
MARC
LEADER  00000nam a2200000 c 4500
001     BV046110457
003     DE-604
005     20211025
007     t
008     190820s2020 |||| |||| 00||| eng d
015 __  |a GBB9B5674 |2 dnb
020 __  |a 9781138393295 |c pbk |9 978-1-138-39329-5
020 __  |z 9780367260934 |c hbk |9 978-0-367-26093-4
035 __  |a (OCoLC)1117771216
035 __  |a (DE-599)BVBBV046110457
040 __  |a DE-604 |b ger |e rda
041 0_  |a eng
049 __  |a DE-29T |a DE-739 |a DE-2070s |a DE-898 |a DE-521
084 __  |a SK 850 |0 (DE-625)143263: |2 rvk
084 __  |a ST 250 |0 (DE-625)143626: |2 rvk
100 1_  |a Matloff, Norman S. |d 1948- |e Verfasser |0 (DE-588)1018956115 |4 aut
245 10  |a Probability and statistics for data science |b math + R + data |c Norman Matloff
264 _1  |a Boca Raton |b CRC Press, Taylor & Francis Group |c [2020]
300 __  |a xxxii, 412 Seiten |b Digramme
336 __  |b txt |2 rdacontent
337 __  |b n |2 rdamedia
338 __  |b nc |2 rdacarrier
490 0_  |a Data science series
490 0_  |a A Chapman & Hall book
650 _4  |a Probabilities / Textbooks
650 _4  |a Mathematical statistics / Textbooks
650 _4  |a Probabilities / Data processing
650 _4  |a Mathematical statistics / Data processing
650 _7  |a Mathematical statistics |2 fast
650 _7  |a Mathematical statistics / Data processing |2 fast
650 _7  |a Probabilities |2 fast
650 _7  |a Probabilities / Data processing |2 fast
650 07  |a Statistik |0 (DE-588)4056995-0 |2 gnd |9 rswk-swf
650 07  |a Wahrscheinlichkeit |0 (DE-588)4137007-7 |2 gnd |9 rswk-swf
650 07  |a Data Science |0 (DE-588)1140936166 |2 gnd |9 rswk-swf
689 00  |a Wahrscheinlichkeit |0 (DE-588)4137007-7 |D s
689 01  |a Statistik |0 (DE-588)4056995-0 |D s
689 02  |a Data Science |0 (DE-588)1140936166 |D s
689 0_  |5 DE-604
776 08  |i Erscheint auch als |n Online-Ausgabe |z 978-0-429-68712-9 |w (DE-604)BV046249663
856 42  |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=031491058&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999 __  |a oai:aleph.bib-bvb.de:BVB01-031491058
Record in the search index
_version_ | 1804180424785657856 |
adam_text |
Contents

About the Author xxiii
To the Instructor xxv
To the Reader xxxi

I Fundamentals of Probability 1

1 Basic Probability Models 3
1.1 Example: Bus Ridership 3
1.2 A "Notebook" View: the Notion of a Repeatable Experiment 4
1.2.1 Theoretical Approaches 5
1.2.2 A More Intuitive Approach 5
1.3 Our Definitions 7
1.4 "Mailing Tubes" 11
1.5 Example: Bus Ridership Model (cont'd.) 11
1.6 Example: ALOHA Network 14
1.6.1 ALOHA Network Model Summary 16
1.6.2 ALOHA Network Computations 16
1.7 ALOHA in the Notebook Context 19
1.8 Example: A Simple Board Game 20
1.9 Bayes' Rule 23
1.9.1 General Principle 23
1.9.2 Example: Document Classification 23
1.10 Random Graph Models 24
1.10.1 Example: Preferential Attachment Model 25
1.11 Combinatorics-Based Computation 26
1.11.1 Which Is More Likely in Five Cards, One King or Two Hearts? 26
1.11.2 Example: Random Groups of Students 27
1.11.3 Example: Lottery Tickets 27
1.11.4 Example: Gaps between Numbers 28
1.11.5 Multinomial Coefficients 29
1.11.6 Example: Probability of Getting Four Aces in a Bridge Hand 30
1.12 Exercises 31

2 Monte Carlo Simulation 35
2.1 Example: Rolling Dice 35
2.1.1 First Improvement 36
2.1.2 Second Improvement 37
2.1.3 Third Improvement 38
2.2 Example: Dice Problem 39
2.3 Use of runif() for Simulating Events 39
2.4 Example: Bus Ridership (cont'd.) 40
2.5 Example: Board Game (cont'd.) 40
2.6 Example: Broken Rod 41
2.7 How Long Should We Run the Simulation? 42
2.8 Computational Complements 42
2.8.1 More on the replicate() Function 42
2.9 Exercises 43

3 Discrete Random Variables: Expected Value 45
3.1 Random Variables 45
3.2 Discrete Random Variables 46
3.3 Independent Random Variables 46
3.4 Example: The Monty Hall Problem 47
3.5 Expected Value 50
3.5.1 Generality — Not Just for Discrete Random Variables 50
3.5.2 Misnomer 50
3.5.3 Definition and Notebook View 50
3.6 Properties of Expected Value 51
3.6.1 Computational Formula 51
3.6.2 Further Properties of Expected Value 54
3.7 Example: Bus Ridership 58
3.8 Example: Predicting Product Demand 58
3.9 Expected Values via Simulation 59
3.10 Casinos, Insurance Companies and "Sum Users," Compared to Others 60
3.11 Mathematical Complements 61
3.11.1 Proof of Property E 61
3.12 Exercises 62

4 Discrete Random Variables: Variance 65
4.1 Variance 65
4.1.1 Definition 65
4.1.2 Central Importance of the Concept of Variance 69
4.1.3 Intuition Regarding the Size of Var(X) 69
4.1.3.1 Chebychev's Inequality 69
4.1.3.2 The Coefficient of Variation 70
4.2 A Useful Fact 71
4.3 Covariance 72
4.4 Indicator Random Variables, and Their Means and Variances 74
4.4.1 Example: Return Time for Library Books, Version I 75
4.4.2 Example: Return Time for Library Books, Version II 76
4.4.3 Example: Indicator Variables in a Committee Problem 77
4.5 Skewness 79
4.6 Mathematical Complements 79
4.6.1 Proof of Chebychev's Inequality 79
4.7 Exercises 81

5 Discrete Parametric Distribution Families 83
5.1 Distributions 83
5.1.1 Example: Toss Coin Until First Head 84
5.1.2 Example: Sum of Two Dice 85
5.1.3 Example: Watts-Strogatz Random Graph Model 85
5.1.3.1 The Model 85
5.2 Parametric Families of Distributions 86
5.3 The Case of Importance to Us: Parametric Families of pmfs 86
5.4 Distributions Based on Bernoulli Trials 88
5.4.1 The Geometric Family of Distributions 88
5.4.1.1 R Functions 91
5.4.1.2 Example: A Parking Space Problem 92
5.4.2 The Binomial Family of Distributions 94
5.4.2.1 R Functions 95
5.4.2.2 Example: Parking Space Model 96
5.4.3 The Negative Binomial Family of Distributions 96
5.4.3.1 R Functions 97
5.4.3.2 Example: Backup Batteries 98
5.5 Two Major Non-Bernoulli Models 98
5.5.1 The Poisson Family of Distributions 99
5.5.1.1 R Functions 99
5.5.1.2 Example: Broken Rod 100
5.5.2 The Power Law Family of Distributions 100
5.5.2.1 The Model 100
5.5.3 Fitting the Poisson and Power Law Models to Data 102
5.5.3.1 Poisson Model 102
5.5.3.2 Straight-Line Graphical Test for the Power Law 103
5.5.3.3 Example: DNC E-mail Data 103
5.6 Further Examples 106
5.6.1 Example: The Bus Ridership Problem 106
5.6.2 Example: Analysis of Social Networks 107
5.7 Computational Complements 108
5.7.1 Graphics and Visualization in R 108
5.8 Exercises 109

6 Continuous Probability Models 113
6.1 A Random Dart 113
6.2 Individual Values Now Have Probability Zero 114
6.3 But Now We Have a Problem 115
6.4 Our Way Out of the Problem: Cumulative Distribution Functions 115
6.4.1 CDFs 115
6.4.2 Non-Discrete, Non-Continuous Distributions 119
6.5 Density Functions 119
6.5.1 Properties of Densities 120
6.5.2 Intuitive Meaning of Densities 122
6.5.3 Expected Values 122
6.6 A First Example 123
6.7 Famous Parametric Families of Continuous Distributions 124
6.7.1 The Uniform Distributions 125
6.7.1.1 Density and Properties 125
6.7.1.2 R Functions 125
6.7.1.3 Example: Modeling of Disk Performance 126
6.7.1.4 Example: Modeling of Denial-of-Service Attack 126
6.7.2 The Normal (Gaussian) Family of Continuous Distributions 127
6.7.2.1 Density and Properties 127
6.7.2.2 R Functions 127
6.7.2.3 Importance in Modeling 128
6.7.3 The Exponential Family of Distributions 128
6.7.3.1 Density and Properties 128
6.7.3.2 R Functions 128
6.7.3.3 Example: Garage Parking Fees 129
6.7.3.4 Memoryless Property of Exponential Distributions 130
6.7.3.5 Importance in Modeling 131
6.7.4 The Gamma Family of Distributions 131
6.7.4.1 Density and Properties 132
6.7.4.2 Example: Network Buffer 133
6.7.4.3 Importance in Modeling 133
6.7.5 The Beta Family of Distributions 134
6.7.5.1 Density Etc. 134
6.7.5.2 Importance in Modeling 138
6.8 Mathematical Complements 138
6.8.1 Hazard Functions 138
6.8.2 Duality of the Exponential Family with the Poisson Family 139
6.9 Computational Complements 141
6.9.1 R's integrate() Function 141
6.9.2 Inverse Method for Sampling from a Density 141
6.9.3 Sampling from a Poisson Distribution 142
6.10 Exercises 143

II Fundamentals of Statistics 147

7 Statistics: Prologue 149
7.1 Importance of This Chapter 150
7.2 Sampling Distributions 150
7.2.1 Random Samples 150
7.3 The Sample Mean — a Random Variable 152
7.3.1 Toy Population Example 152
7.3.2 Expected Value and Variance of X̄ 153
7.3.3 Toy Population Example Again 154
7.3.4 Interpretation 155
7.3.5 Notebook View 155
7.4 Simple Random Sample Case 156
7.5 The Sample Variance 157
7.5.1 Intuitive Estimation of σ² 157
7.5.2 Easier Computation 158
7.5.3 Special Case: X Is an Indicator Variable 158
7.6 To Divide by n or n-1? 159
7.6.1 Statistical Bias 159
7.7 The Concept of a "Standard Error" 161
7.8 Example: Pima Diabetes Study 162
7.9 Don't Forget: Sample ≠ Population! 164
7.10 Simulation Issues 164
7.10.1 Sample Estimates 164
7.10.2 Infinite Populations? 164
7.11 Observational Studies 165
7.12 Computational Complements 165
7.12.1 The *apply() Functions 165
7.12.1.1 R's apply() Function 166
7.12.1.2 The lapply() and sapply() Functions 166
7.12.1.3 The split() and tapply() Functions 167
7.12.2 Outliers/Errors in the Data 168
7.13 Exercises 170

8 Fitting Continuous Models 171
8.1 Why Fit a Parametric Model? 171
8.2 Model-Free Estimation of a Density from Sample Data 172
8.2.1 A Closer Look 172
8.2.2 Example: BMI Data 173
8.2.3 The Number of Bins 174
8.2.3.1 The Bias-Variance Tradeoff 175
8.2.3.2 The Bias-Variance Tradeoff in the Histogram Case 176
8.2.3.3 A General Issue: Choosing the Degree of Smoothing 178
8.3 Advanced Methods for Model-Free Density Estimation 180
8.4 Parameter Estimation 181
8.4.1 Method of Moments 181
8.4.2 Example: BMI Data 182
8.4.3 The Method of Maximum Likelihood 183
8.4.4 Example: Humidity Data 185
8.5 MM vs. MLE 187
8.6 Assessment of Goodness of Fit 187
8.7 The Bayesian Philosophy 189
8.7.1 How Does It Work? 190
8.7.2 Arguments For and Against 190
8.8 Mathematical Complements 191
8.8.1 Details of Kernel Density Estimators 191
8.9 Computational Complements 192
8.9.1 Generic Functions 192
8.9.2 The gmm Package 193
8.9.2.1 The gmm() Function 193
8.9.2.2 Example: Bodyfat Data 193
8.10 Exercises 194

9 The Family of Normal Distributions 197
9.1 Density and Properties 197
9.1.1 Closure under Affine Transformation 198
9.1.2 Closure under Independent Summation 199
9.1.3 A Mystery 200
9.2 R Functions 200
9.3 The Standard Normal Distribution 200
9.4 Evaluating Normal cdfs 201
9.5 Example: Network Intrusion 202
9.6 Example: Class Enrollment Size 203
9.7 The Central Limit Theorem 204
9.7.1 Example: Cumulative Roundoff Error 205
9.7.2 Example: Coin Tosses 205
9.7.3 Example: Museum Demonstration 206
9.7.4 A Bit of Insight into the Mystery 207
9.8 X̄ Is Approximately Normal 207
9.8.1 Approximate Distribution of X̄ 207
9.8.2 Improved Assessment of Accuracy of X̄ 208
9.9 Importance in Modeling 209
9.10 The Chi-Squared Family of Distributions 210
9.10.1 Density and Properties 210
9.10.2 Example: Error in Pin Placement 211
9.10.3 Importance in Modeling 211
9.10.4 Relation to Gamma Family 212
9.11 Mathematical Complements 212
9.11.1 Convergence in Distribution, and the Precisely-Stated CLT 212
9.12 Computational Complements 213
9.12.1 Example: Generating Normal Random Numbers 213
9.13 Exercises 214

10 Introduction to Statistical Inference 217
10.1 The Role of Normal Distributions 217
10.2 Confidence Intervals for Means 218
10.2.1 Basic Formulation 218
10.3 Example: Pima Diabetes Study 220
10.4 Example: Humidity Data 221
10.5 Meaning of Confidence Intervals 221
10.5.1 A Weight Survey in Davis 221
10.6 Confidence Intervals for Proportions 223
10.6.1 Example: Machine Classification of Forest Covers 224
10.7 The Student-t Distribution 226
10.8 Introduction to Significance Tests 227
10.9 The Proverbial Fair Coin 228
10.10 The Basics 229
10.11 General Normal Testing 231
10.12 The Notion of "p-Values" 231
10.13 What's Random and What Is Not 232
10.14 Example: The Forest Cover Data 232
10.15 Problems with Significance Testing 234
10.15.1 History of Significance Testing 234
10.15.2 The Basic Issues 235
10.15.3 Alternative Approach 236
10.16 The Problem of "P-hacking" 237
10.16.1 A Thought Experiment 238
10.16.2 Multiple Inference Methods 238
10.17 Philosophy of Statistics 239
10.17.1 More about Interpretation of CIs 239
10.17.1.1 The Bayesian View of Confidence Intervals 241
10.18 Exercises 241

III Multivariate Analysis 243

11 Multivariate Distributions 245
11.1 Multivariate Distributions: Discrete 245
11.1.1 Example: Marbles in a Bag 245
11.2 Multivariate Distributions: Continuous 246
11.2.1 Motivation and Definition 246
11.2.2 Use of Multivariate Densities in Finding Probabilities and Expected Values 247
11.2.3 Example: Train Rendezvous 247
11.3 Measuring Co-variation 248
11.3.1 Covariance 248
11.3.2 Example: The Committee Example Again 250
11.4 Correlation 251
11.4.1 Sample Estimates 252
11.5 Sets of Independent Random Variables 252
11.5.1 Mailing Tubes 252
11.5.1.1 Expected Values Factor 253
11.5.1.2 Covariance Is 0 253
11.5.1.3 Variances Add 253
11.6 Matrix Formulations 254
11.6.1 Mailing Tubes: Mean Vectors 254
11.6.2 Covariance Matrices 254
11.6.3 Mailing Tubes: Covariance Matrices 255
11.7 Sample Estimate of Covariance Matrix 256
11.7.1 Example: Pima Data 257
11.8 Mathematical Complements 257
11.8.1 Convolution 257
11.8.1.1 Example: Backup Battery 258
11.8.2 Transform Methods 259
11.8.2.1 Generating Functions 259
11.8.2.2 Sums of Independent Poisson Random Variables Are Poisson Distributed 261
11.9 Exercises 262

12 The Multivariate Normal Family of Distributions 265
12.1 Densities 265
12.2 Geometric Interpretation 266
12.3 R Functions 269
12.4 Special Case: New Variable Is a Single Linear Combination of a Random Vector 270
12.5 Properties of Multivariate Normal Distributions 270
12.6 The Multivariate Central Limit Theorem 272
12.7 Exercises 273

13 Mixture Distributions 275
13.1 Iterated Expectations 276
13.1.1 Conditional Distributions 277
13.1.2 The Theorem 277
13.1.3 Example: Flipping Coins with Bonuses 279
13.1.4 Conditional Expectation as a Random Variable 280
13.1.5 What about Variance? 280
13.2 A Closer Look at Mixture Distributions 281
13.2.1 Derivation of Mean and Variance 281
13.2.2 Estimation of Parameters 283
13.2.2.1 Example: Old Faithful Estimation 283
13.3 Clustering 284
13.4 Exercises 285

14 Multivariate Description and Dimension Reduction 287
14.1 What Is Overfitting Anyway? 288
14.1.1 "Desperate for Data" 288
14.1.2 Known Distribution 289
14.1.3 Estimated Mean 289
14.1.4 The Bias/Variance Tradeoff: Concrete Illustration 290
14.1.5 Implications 292
14.2 Principal Components Analysis 293
14.2.1 Intuition 293
14.2.2 Properties of PCA 295
14.2.3 Example: Turkish Teaching Evaluations 296
14.3 The Log-Linear Model 297
14.3.1 Example: Hair Color, Eye Color and Gender 297
14.3.2 Dimension of Our Data 299
14.3.3 Estimating the Parameters 299
14.4 Mathematical Complements 300
14.4.1 Statistical Derivation of PCA 300
14.5 Computational Complements 302
14.5.1 R Tables 302
14.5.2 Some Details on Log-Linear Models 302
14.5.2.1 Parameter Estimation 303
14.5.2.2 The loglin() Function 304
14.5.2.3 Informal Assessment of Fit 305
14.6 Exercises 306

15 Predictive Modeling 309
15.1 Example: Heritage Health Prize 309
15.2 The Goals: Prediction and Description 310
15.2.1 Terminology 310
15.3 What Does "Relationship" Mean? 311
15.3.1 Precise Definition 311
15.3.2 Parametric Models for the Regression Function m() 313
15.4 Estimation in Linear Parametric Regression Models 314
15.5 Example: Baseball Data 315
15.5.1 R Code 316
15.6 Multiple Regression 319
15.7 Example: Baseball Data (cont'd.) 320
15.8 Interaction Terms 321
15.9 Parametric Estimation 322
15.9.1 Meaning of "Linear" 322
15.9.2 Random-X and Fixed-X Regression 322
15.9.3 Point Estimates and Matrix Formulation 323
15.9.4 Approximate Confidence Intervals 326
15.10 Example: Baseball Data (cont'd.) 328
15.11 Dummy Variables 329
15.12 Classification 330
15.12.1 Classification — Regression 331
15.12.2 Logistic Regression 332
15.12.2.1 The Logistic Model: Motivations 332
15.12.2.2 Estimation and Inference for Logit 334
15.12.3 Example: Forest Cover Data 334
15.12.4 R Code 334
15.12.5 Analysis of the Results 335
15.12.5.1 Multiclass Case 336
15.13 Machine Learning: Neural Networks 336
15.13.1 Example: Predicting Vertebral Abnormalities 336
15.13.2 But What Is Really Going On? 339
15.13.3 R Packages 339
15.14 Computational Complements 340
15.14.1 Computational Details in Section 15.5.1 340
15.14.2 More Regarding glm() 341
15.15 Exercises 342

16 Model Parsimony and Overfitting 343
16.1 What Is Overfitting? 343
16.1.1 Example: Histograms 343
16.1.2 Example: Polynomial Regression 344
16.2 Can Anything Be Done about It? 345
16.2.1 Cross-Validation 345
16.3 Predictor Subset Selection 346
16.4 Exercises 347

17 Introduction to Discrete Time Markov Chains 349
17.1 Matrix Formulation 350
17.2 Example: Die Game 351
17.3 Long-Run State Probabilities 352
17.3.1 Stationary Distribution 353
17.3.2 Calculation of π 354
17.3.3 Simulation Calculation of π 355
17.4 Example: 3-Heads-in-a-Row Game 356
17.5 Example: Bus Ridership Problem 358
17.6 Hidden Markov Models 359
17.6.1 Example: Bus Ridership 360
17.6.2 Computation 361
17.7 Google PageRank 361
17.8 Computational Complements 361
17.8.1 Initializing a Matrix to All 0s 361
17.9 Exercises 362

IV Appendices 365

A R Quick Start 367
A.1 Starting R 367
A.2 Correspondences 368
A.3 First Sample Programming Session 369
A.4 Vectorization 372
A.5 Second Sample Programming Session 372
A.6 Recycling 374
A.7 More on Vectorization 374
A.8 Default Argument Values 375
A.9 The R List Type 376
A.9.1 The Basics 376
A.9.2 S3 Classes 377
A.10 Data Frames 378
A.11 Online Help 380
A.12 Debugging in R 380

B Matrix Algebra 383
B.1 Terminology and Notation 383
B.1.1 Matrix Addition and Multiplication 383
B.2 Matrix Transpose 385
B.3 Matrix Inverse 385
B.4 Eigenvalues and Eigenvectors 385
B.5 Mathematical Complements 386
B.5.1 Matrix Derivatives 386

Bibliography 391

Index 395
|
any_adam_object | 1 |
author | Matloff, Norman S. 1948- |
author_GND | (DE-588)1018956115 |
author_facet | Matloff, Norman S. 1948- |
author_role | aut |
author_sort | Matloff, Norman S. 1948- |
author_variant | n s m ns nsm |
building | Verbundindex |
bvnumber | BV046110457 |
classification_rvk | SK 850 ST 250 |
ctrlnum | (OCoLC)1117771216 (DE-599)BVBBV046110457 |
discipline | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02224nam a2200529 c 4500</leader><controlfield tag="001">BV046110457</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20211025 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">190820s2020 |||| |||| 00||| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">GBB9B5674</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781138393295</subfield><subfield code="c">pbk</subfield><subfield code="9">978-1-138-39329-5</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="z">9780367260934</subfield><subfield code="c">hbk</subfield><subfield code="9">978-0-367-26093-4</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1117771216</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV046110457</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-2070s</subfield><subfield code="a">DE-898</subfield><subfield code="a">DE-521</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SK 850</subfield><subfield code="0">(DE-625)143263:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 250</subfield><subfield code="0">(DE-625)143626:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Matloff, Norman S.</subfield><subfield code="d">1948-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1018956115</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Probability and statistics for data science</subfield><subfield code="b">math + R + data</subfield><subfield code="c">Norman Matloff</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton</subfield><subfield code="b">CRC Press, Taylor & Francis Group</subfield><subfield code="c">[2020]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxxii, 412 Seiten</subfield><subfield code="b">Digramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Data science series</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">A Chapman & Hall book</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Probabilities / Textbooks</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Mathematical statistics / Textbooks</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Probabilities / Data 
processing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Mathematical statistics / Data processing</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Mathematical statistics</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Mathematical statistics / Data processing</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Probabilities</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Probabilities / Data processing</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Statistik</subfield><subfield code="0">(DE-588)4056995-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Wahrscheinlichkeit</subfield><subfield code="0">(DE-588)4137007-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Wahrscheinlichkeit</subfield><subfield code="0">(DE-588)4137007-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Statistik</subfield><subfield code="0">(DE-588)4056995-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-0-429-68712-9</subfield><subfield code="w">(DE-604)BV046249663</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=031491058&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-031491058</subfield></datafield></record></collection> |
id | DE-604.BV046110457 |
illustrated | Not Illustrated |
indexdate | 2024-07-10T08:35:33Z |
institution | BVB |
isbn | 9781138393295 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-031491058 |
oclc_num | 1117771216 |
open_access_boolean | |
owner | DE-29T DE-739 DE-2070s DE-898 DE-BY-UBR DE-521 |
owner_facet | DE-29T DE-739 DE-2070s DE-898 DE-BY-UBR DE-521 |
physical | xxxii, 412 Seiten Digramme |
publishDate | 2020 |
publishDateSearch | 2020 |
publishDateSort | 2020 |
publisher | CRC Press, Taylor & Francis Group |
record_format | marc |
series2 | Data science series A Chapman & Hall book |
spelling | Matloff, Norman S. 1948- Verfasser (DE-588)1018956115 aut Probability and statistics for data science math + R + data Norman Matloff Boca Raton CRC Press, Taylor & Francis Group [2020] xxxii, 412 Seiten Digramme txt rdacontent n rdamedia nc rdacarrier Data science series A Chapman & Hall book Probabilities / Textbooks Mathematical statistics / Textbooks Probabilities / Data processing Mathematical statistics / Data processing Mathematical statistics fast Mathematical statistics / Data processing fast Probabilities fast Probabilities / Data processing fast Statistik (DE-588)4056995-0 gnd rswk-swf Wahrscheinlichkeit (DE-588)4137007-7 gnd rswk-swf Data Science (DE-588)1140936166 gnd rswk-swf Wahrscheinlichkeit (DE-588)4137007-7 s Statistik (DE-588)4056995-0 s Data Science (DE-588)1140936166 s DE-604 Erscheint auch als Online-Ausgabe 978-0-429-68712-9 (DE-604)BV046249663 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=031491058&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Matloff, Norman S. 1948- Probability and statistics for data science math + R + data Probabilities / Textbooks Mathematical statistics / Textbooks Probabilities / Data processing Mathematical statistics / Data processing Mathematical statistics fast Mathematical statistics / Data processing fast Probabilities fast Probabilities / Data processing fast Statistik (DE-588)4056995-0 gnd Wahrscheinlichkeit (DE-588)4137007-7 gnd Data Science (DE-588)1140936166 gnd |
subject_GND | (DE-588)4056995-0 (DE-588)4137007-7 (DE-588)1140936166 |
title | Probability and statistics for data science math + R + data |
title_auth | Probability and statistics for data science math + R + data |
title_exact_search | Probability and statistics for data science math + R + data |
title_full | Probability and statistics for data science math + R + data Norman Matloff |
title_fullStr | Probability and statistics for data science math + R + data Norman Matloff |
title_full_unstemmed | Probability and statistics for data science math + R + data Norman Matloff |
title_short | Probability and statistics for data science |
title_sort | probability and statistics for data science math r data |
title_sub | math + R + data |
topic | Probabilities / Textbooks Mathematical statistics / Textbooks Probabilities / Data processing Mathematical statistics / Data processing Mathematical statistics fast Mathematical statistics / Data processing fast Probabilities fast Probabilities / Data processing fast Statistik (DE-588)4056995-0 gnd Wahrscheinlichkeit (DE-588)4137007-7 gnd Data Science (DE-588)1140936166 gnd |
topic_facet | Probabilities / Textbooks Mathematical statistics / Textbooks Probabilities / Data processing Mathematical statistics / Data processing Mathematical statistics Probabilities Statistik Wahrscheinlichkeit Data Science |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=031491058&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT matloffnormans probabilityandstatisticsfordatasciencemathrdata |