The essentials of data science: knowledge discovery using R
Main author: | Williams, Graham J. |
---|---|
Format: | Book |
Language: | English |
Published: | Boca Raton ; London: CRC Press, [2017] |
Series: | The R series |
Subjects: | Data Science; R (Programm) |
Online access: | Table of contents, Blurb |
Description: | xix, 322 pages, illustrations, diagrams |
ISBN: | 9781138088634 9781498740005 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV044412182 | ||
003 | DE-604 | ||
005 | 20220119 | ||
007 | t | ||
008 | 170717s2017 a||| |||| 00||| eng d | ||
020 | |a 9781138088634 |c paperback |9 978-1-138-08863-4 | ||
020 | |a 9781498740005 |c hardback |9 978-1-4987-4000-5 | ||
035 | |a (OCoLC)1053905707 | ||
035 | |a (DE-599)BVBBV044412182 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-945 |a DE-355 |a DE-739 | ||
084 | |a ST 601 |0 (DE-625)143682: |2 rvk | ||
084 | |a ST 250 |0 (DE-625)143626: |2 rvk | ||
100 | 1 | |a Williams, Graham J. |e Verfasser |0 (DE-588)131380346 |4 aut | |
245 | 1 | 0 | |a The essentials of data science |b knowledge discovery using R |c Graham J. Williams |
264 | 1 | |a Boca Raton ; London |b CRC Press |c [2017] | |
264 | 4 | |c © 2017 | |
300 | |a xix, 322 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a The R series | |
650 | 0 | 7 | |a Data Science |0 (DE-588)1140936166 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a R |g Programm |0 (DE-588)4705956-4 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Science |0 (DE-588)1140936166 |D s |
689 | 0 | 1 | |a R |g Programm |0 (DE-588)4705956-4 |D s |
689 | 0 | |5 DE-604 | |
776 | 0 | 8 | |i Erscheint auch als |n Ebook-Ausgabe |z 978-1-351-64749-6 |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029813951&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029813951&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
999 | |a oai:aleph.bib-bvb.de:BVB01-029813951 |
Record in the search index
_version_ | 1804177701802606592 |
---|---|
adam_text | Contents
Preface ix
List of Figures xvii
List of Tables xix
1 Data Science 1
1.1 Exercises ....................................... 12
2 Introducing R 13
2.1 Tooling For R Programming........................ 16
2.2 Packages and Libraries .......................... 22
2.3 Functions, Commands and Operators ............... 27
2.4 Pipes ........................................... 31
2.5 Getting Help .................................... 40
2.6 Exercises ....................................... 41
3 Data Wrangling 43
3.1 Data Ingestion .................................. 44
3.2 Data Review ..................................... 51
3.3 Data Cleaning ................................... 54
3.4 Variable Roles .................................. 63
3.5 Feature Selection................................ 66
3.6 Missing Data..................................... 77
3.7 Feature Creation................................. 80
3.8 Preparing the Metadata........................... 85
3.9 Preparing for Model Building..................... 88
3.10 Save the Dataset................................ 92
3.11 A Template for Data Preparation................. 94
3.12 Exercises ...................................... 95
4 Visualising Data 97
4.1 Preparing the Dataset .......................... 98
4.2 Scatter Plot................................... 100
4.3 Bar Chart...................................... 102
4.4 Saving Plots to File .......................... 103
4.5 Adding Spice to the Bar Chart ............... 103
4.6 Alternative Bar Charts ........................ 107
4.7 Box Plots...................................... 111
4.8 Exercises ..................................... 118
5 Case Study: Australian Ports 119
5.1 Data Ingestion ................................ 120
5.2 Bar Chart: Value/Weight of Sea Trade...... 123
5.3 Scatter Plot: Throughput versus Annual Growth 130
5.4 Combined Plots: Port Calls .................... 138
5.5 Further Plots.................................. 141
5.6 Exercises ..................................... 147
6 Case Study: Web Analytics 149
6.1 Sourcing Data from CKAN ....................... 150
6.2 Browser Data................................... 155
6.3 Entry Pages ................................... 166
6.4 Exercises ..................................... 174
7 A Pattern for Predictive Modelling 175
7.1 Loading the Dataset............................ 177
7.2 Building a Decision Tree Model................. 180
7.3 Model Performance ............................. 185
7.4 Evaluating Model Generality ................... 193
7.5 Model Tuning .................................. 201
7.6 Comparison of Performance Measures ............ 209
7.7 Save the Model to File......................... 210
7.8 A Template for Predictive Modelling............ 212
7.9 Exercises ..................................... 212
8 Ensemble of Predictive Models 215
8.1 Loading the Dataset............................ 216
8.2 Random Forest.................................. 217
8.3 Extreme Gradient Boosting....................... 227
8.4 Exercises ...................................... 239
9 Writing Functions in R 241
9.1 Model Evaluation ............................... 242
9.2 Creating a Function ............................ 243
9.3 Function for ROC Curves ........................ 254
9.4 Exercises ...................................... 256
10 Literate Data Science 257
10.1 Basic LaTeX Template ........................... 259
10.2 A Template for our Narrative.................... 260
10.3 Including R Commands ........................... 263
10.4 Inline R Code.................................. 265
10.5 Formatting Tables Using Kable................... 266
10.6 Formatting Tables Using XTable ................. 270
10.7 Including Figures............................... 276
10.8 Add a Caption and Label ........................ 281
10.9 Knitr Options .................................. 282
10.10 Exercises ..................................... 283
11 R with Style 285
11.1 Why We Should Care ............................. 285
11.2 Naming.......................................... 287
11.3 Comments ....................................... 291
11.4 Layout ......................................... 292
11.5 Functions....................................... 298
11.6 Assignment...................................... 302
11.7 Miscellaneous................................... 304
11.8 Exercises ...................................... 305
Bibliography 307
Index 313
Statistics
The Essentials of Data Science: Knowledge Discovery Using R presents the
concepts of data science through a hands-on approach using free and open
source software. It systematically drives an accessible journey through data analysis
and machine learning to discover and share knowledge from data.
Building on over thirty years’ experience in teaching and practising data science,
the author encourages a programming-by-example approach to ensure students
and practitioners attune to the practice of data science while building their data
skills. Proven frameworks are provided as reusable templates. Real-world case
studies then provide insight for the data scientist to swiftly adapt the templates to
new tasks and datasets.
The book begins by introducing data science. It then reviews R’s capabilities for
analysing data by writing computer programs. These programs are developed and
explained step by step. From analysing and visualising data, the framework moves
on to tried and tested machine learning techniques for predictive modelling and
knowledge discovery. Literate programming and a consistent style are a focus
throughout the book.
Graham J. Williams is Director of Data Science with Microsoft and Honorary Associate
Professor with the Australian National University. He is also Adjunct Professor
with the University of Canberra. He was previously Senior Director of Analytics
with the Australian Taxation Office, Lead Data Scientist with the Australian
Government’s Centre of Excellence in Data Analytics, and International Visiting
Professor of the Chinese Academy of Sciences. Over three decades he has been
an active machine learning researcher and author of many publications and software
including Rattle. As a practitioner of data science he has deployed solutions
in areas including finance, banking, insurance, health, education, and government.
He is also chair and steering committee member of international conferences in
knowledge discovery, artificial intelligence, machine learning, and data mining.
|
any_adam_object | 1 |
author | Williams, Graham J. |
author_GND | (DE-588)131380346 |
author_facet | Williams, Graham J. |
author_role | aut |
author_sort | Williams, Graham J. |
author_variant | g j w gj gjw |
building | Verbundindex |
bvnumber | BV044412182 |
classification_rvk | ST 601 ST 250 |
ctrlnum | (OCoLC)1053905707 (DE-599)BVBBV044412182 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01906nam a2200409 c 4500</leader><controlfield tag="001">BV044412182</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220119 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">170717s2017 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781138088634</subfield><subfield code="c">paperback</subfield><subfield code="9">978-1-138-08863-4</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781498740005</subfield><subfield code="c">hardback</subfield><subfield code="9">978-1-4987-4000-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1053905707</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV044412182</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-945</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-739</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 601</subfield><subfield code="0">(DE-625)143682:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 250</subfield><subfield code="0">(DE-625)143626:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Williams, Graham J.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)131380346</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">The essentials of data 
science</subfield><subfield code="b">knowledge discovery using R</subfield><subfield code="c">Graham J. Williams</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton ; London</subfield><subfield code="b">CRC Press</subfield><subfield code="c">[2017]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2017</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xix, 322 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">The R series</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">R</subfield><subfield code="g">Programm</subfield><subfield code="0">(DE-588)4705956-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">R</subfield><subfield code="g">Programm</subfield><subfield code="0">(DE-588)4705956-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch 
als</subfield><subfield code="n">Ebook-Ausgabe</subfield><subfield code="z">978-1-351-64749-6</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029813951&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029813951&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-029813951</subfield></datafield></record></collection> |
id | DE-604.BV044412182 |
illustrated | Illustrated |
indexdate | 2024-07-10T07:52:16Z |
institution | BVB |
isbn | 9781138088634 9781498740005 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-029813951 |
oclc_num | 1053905707 |
open_access_boolean | |
owner | DE-945 DE-355 DE-BY-UBR DE-739 |
owner_facet | DE-945 DE-355 DE-BY-UBR DE-739 |
physical | xix, 322 Seiten Illustrationen, Diagramme |
publishDate | 2017 |
publishDateSearch | 2017 |
publishDateSort | 2017 |
publisher | CRC Press |
record_format | marc |
series2 | The R series |
spelling | Williams, Graham J. Verfasser (DE-588)131380346 aut The essentials of data science knowledge discovery using R Graham J. Williams Boca Raton ; London CRC Press [2017] © 2017 xix, 322 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier The R series Data Science (DE-588)1140936166 gnd rswk-swf R Programm (DE-588)4705956-4 gnd rswk-swf Data Science (DE-588)1140936166 s R Programm (DE-588)4705956-4 s DE-604 Erscheint auch als Ebook-Ausgabe 978-1-351-64749-6 Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029813951&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029813951&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext |
spellingShingle | Williams, Graham J. The essentials of data science knowledge discovery using R Data Science (DE-588)1140936166 gnd R Programm (DE-588)4705956-4 gnd |
subject_GND | (DE-588)1140936166 (DE-588)4705956-4 |
title | The essentials of data science knowledge discovery using R |
title_auth | The essentials of data science knowledge discovery using R |
title_exact_search | The essentials of data science knowledge discovery using R |
title_full | The essentials of data science knowledge discovery using R Graham J. Williams |
title_fullStr | The essentials of data science knowledge discovery using R Graham J. Williams |
title_full_unstemmed | The essentials of data science knowledge discovery using R Graham J. Williams |
title_short | The essentials of data science |
title_sort | the essentials of data science knowledge discovery using r |
title_sub | knowledge discovery using R |
topic | Data Science (DE-588)1140936166 gnd R Programm (DE-588)4705956-4 gnd |
topic_facet | Data Science R Programm |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029813951&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029813951&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT williamsgrahamj theessentialsofdatascienceknowledgediscoveryusingr |