Practical data science with R
Main authors: | Zumel, Nina; Mount, John |
---|---|
Format: | Book |
Language: | English |
Published: |
Shelter Island, NY
Manning
2014
|
Subjects: | R (program); Data analysis; Statistics |
Online access: | Table of contents |
Description: | XXV, 389 p., ill., graphs |
ISBN: | 9781617291562 1617291560 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV041840045 | ||
003 | DE-604 | ||
005 | 20200110 | ||
007 | t | ||
008 | 140512s2014 ad|| |||| 00||| eng d | ||
020 | |a 9781617291562 |9 978-1-61729-156-2 | ||
020 | |a 1617291560 |9 1-61729-156-0 | ||
035 | |a (OCoLC)879595796 | ||
035 | |a (DE-599)GBV782270417 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-Aug4 |a DE-83 |a DE-2070s |a DE-945 |a DE-384 |a DE-739 | ||
082 | 0 | |a 006.312 | |
084 | |a ST 250 |0 (DE-625)143626: |2 rvk | ||
084 | |a ST 601 |0 (DE-625)143682: |2 rvk | ||
100 | 1 | |a Zumel, Nina |e Verfasser |0 (DE-588)1055925899 |4 aut | |
245 | 1 | 0 | |a Practical data science with R |c Nina Zumel ; John Mount |
264 | 1 | |a Shelter Island, NY |b Manning |c 2014 | |
300 | |a XXV, 389 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a R |g Programm |0 (DE-588)4705956-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenanalyse |0 (DE-588)4123037-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Statistik |0 (DE-588)4056995-0 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a R |g Programm |0 (DE-588)4705956-4 |D s |
689 | 0 | 1 | |a Datenanalyse |0 (DE-588)4123037-1 |D s |
689 | 0 | 2 | |a Statistik |0 (DE-588)4056995-0 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Mount, John |e Verfasser |0 (DE-588)1202632769 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027284747&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-027284747 |
Record in search index
_version_ | 1804152181393195008 |
---|---|
adam_text | Contents
foreword
preface
acknowledgments
about this book
about the cover illustration
Part 1 Introduction to data science
1 The data science process
  1.1 The roles in a data science project
    Project roles
  1.2 Stages of a data science project
    Defining the goal ■ Data collection and management ■ Modeling ■ Model evaluation and critique ■ Presentation and documentation ■ Model deployment and maintenance
  1.3 Setting expectations
    Determining lower and upper bounds on model performance
  1.4 Summary
2 Loading data into R
  2.1 Working with data from files
    Working with well-structured data from files or URLs ■ Using R on less-structured data
  2.2 Working with relational databases
    A production-size example ■ Loading data from a database into R ■ Working with the PUMS data
  2.3 Summary
3 Exploring data
  3.1 Using summary statistics to spot problems
    Typical problems revealed by data summaries
  3.2 Spotting problems using graphics and visualization
    Visually checking distributions for a single variable ■ Visually checking relationships between two variables
  3.3 Summary
4 Managing data
  4.1 Cleaning data
    Treating missing values (NAs) ■ Data transformations
  4.2 Sampling for modeling and validation
    Test and training splits ■ Creating a sample group column ■ Record grouping ■ Data provenance
  4.3 Summary
Part 2 Modeling methods
5 Choosing and evaluating models
  5.1 Mapping problems to machine learning tasks
    Solving classification problems ■ Solving scoring problems ■ Working without known targets ■ Problem-to-method mapping
  5.2 Evaluating models
    Evaluating classification models ■ Evaluating scoring models ■ Evaluating probability models ■ Evaluating ranking models ■ Evaluating clustering models
  5.3 Validating models
    Identifying common model problems ■ Quantifying model soundness ■ Ensuring model quality
  5.4 Summary
6 Memorization methods
  6.1 KDD and KDD Cup 2009
    Getting started with KDD Cup 2009 data
  6.2 Building single-variable models
    Using categorical features ■ Using numeric features ■ Using cross-validation to estimate effects of overfitting
  6.3 Building models using many variables
    Variable selection ■ Using decision trees ■ Using nearest neighbor methods ■ Using Naive Bayes
  6.4 Summary
7 Linear and logistic regression
  7.1 Using linear regression
    Understanding linear regression ■ Building a linear regression model ■ Making predictions ■ Finding relations and extracting advice ■ Reading the model summary and characterizing coefficient quality ■ Linear regression takeaways
  7.2 Using logistic regression
    Understanding logistic regression ■ Building a logistic regression model ■ Making predictions ■ Finding relations and extracting advice from logistic models ■ Reading the model summary and characterizing coefficients ■ Logistic regression takeaways
  7.3 Summary
8 Unsupervised methods
  8.1 Cluster analysis
    Distances ■ Preparing the data ■ Hierarchical clustering with hclust() ■ The k-means algorithm ■ Assigning new points to clusters ■ Clustering takeaways
  8.2 Association rules
    Overview of association rules ■ The example problem ■ Mining association rules with the arules package ■ Association rule takeaways
  8.3 Summary
9 Exploring advanced methods
  9.1 Using bagging and random forests to reduce training variance
    Using bagging to improve prediction ■ Using random forests to further improve prediction ■ Bagging and random forest takeaways
  9.2 Using generalized additive models (GAMs) to learn non-monotone relationships
    Understanding GAMs ■ A one-dimensional regression example ■ Extracting the nonlinear relationships ■ Using GAM on actual data ■ Using GAM for logistic regression ■ GAM takeaways
  9.3 Using kernel methods to increase data separation
    Understanding kernel functions ■ Using an explicit kernel on a problem ■ Kernel takeaways
  9.4 Using SVMs to model complicated decision boundaries
    Understanding support vector machines ■ Trying an SVM on artificial example data ■ Using SVMs on real data ■ Support vector machine takeaways
  9.5 Summary
Part 3 Delivering results
10 Documentation and deployment
  10.1 The buzz dataset
  10.2 Using knitr to produce milestone documentation
    What is knitr? ■ knitr technical details ■ Using knitr to document the buzz data
  10.3 Using comments and version control for running documentation
    Writing effective comments ■ Using version control to record history ■ Using version control to explore your project ■ Using version control to share work
  10.4 Deploying models
    Deploying models as R HTTP services ■ Deploying models by export ■ What to take away
  10.5 Summary
11 Producing effective presentations
  11.1 Presenting your results to the project sponsor
    Summarizing the project’s goals ■ Stating the project’s results ■ Filling in the details ■ Making recommendations and discussing future work ■ Project sponsor presentation takeaways
  11.2 Presenting your model to end users
    Summarizing the project’s goals ■ Showing how the model fits the users’ workflow ■ Showing how to use the model ■ End user presentation takeaways
  11.3 Presenting your work to other data scientists
    Introducing the problem ■ Discussing related work ■ Discussing your approach ■ Discussing results and future work ■ Peer presentation takeaways
  11.4 Summary
appendix A Working with R and other tools
appendix B Important statistical concepts
appendix C More tools and ideas worth exploring
bibliography
index
|
any_adam_object | 1 |
author | Zumel, Nina Mount, John |
author_GND | (DE-588)1055925899 (DE-588)1202632769 |
author_facet | Zumel, Nina Mount, John |
author_role | aut aut |
author_sort | Zumel, Nina |
author_variant | n z nz j m jm |
building | Verbundindex |
bvnumber | BV041840045 |
classification_rvk | ST 250 ST 601 |
ctrlnum | (OCoLC)879595796 (DE-599)GBV782270417 |
dewey-full | 006.312 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.312 |
dewey-search | 006.312 |
dewey-sort | 16.312 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Computer science |
format | Book |
id | DE-604.BV041840045 |
illustrated | Illustrated |
indexdate | 2024-07-10T01:06:38Z |
institution | BVB |
isbn | 9781617291562 1617291560 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-027284747 |
oclc_num | 879595796 |
open_access_boolean | |
owner | DE-Aug4 DE-83 DE-2070s DE-945 DE-384 DE-739 |
owner_facet | DE-Aug4 DE-83 DE-2070s DE-945 DE-384 DE-739 |
physical | XXV, 389 p., ill., graphs |
publishDate | 2014 |
publishDateSearch | 2014 |
publishDateSort | 2014 |
publisher | Manning |
record_format | marc |
spelling | Zumel, Nina Verfasser (DE-588)1055925899 aut Practical data science with R Nina Zumel ; John Mount Shelter Island, NY Manning 2014 XXV, 389 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier R Programm (DE-588)4705956-4 gnd rswk-swf Datenanalyse (DE-588)4123037-1 gnd rswk-swf Statistik (DE-588)4056995-0 gnd rswk-swf R Programm (DE-588)4705956-4 s Datenanalyse (DE-588)4123037-1 s Statistik (DE-588)4056995-0 s DE-604 Mount, John Verfasser (DE-588)1202632769 aut Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027284747&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Zumel, Nina Mount, John Practical data science with R R Programm (DE-588)4705956-4 gnd Datenanalyse (DE-588)4123037-1 gnd Statistik (DE-588)4056995-0 gnd |
subject_GND | (DE-588)4705956-4 (DE-588)4123037-1 (DE-588)4056995-0 |
title | Practical data science with R |
title_auth | Practical data science with R |
title_exact_search | Practical data science with R |
title_full | Practical data science with R Nina Zumel ; John Mount |
title_fullStr | Practical data science with R Nina Zumel ; John Mount |
title_full_unstemmed | Practical data science with R Nina Zumel ; John Mount |
title_short | Practical data science with R |
title_sort | practical data science with r |
topic | R Programm (DE-588)4705956-4 gnd Datenanalyse (DE-588)4123037-1 gnd Statistik (DE-588)4056995-0 gnd |
topic_facet | R Programm Datenanalyse Statistik |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027284747&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT zumelnina practicaldatasciencewithr AT mountjohn practicaldatasciencewithr |