Introducing data science: big data, machine learning, and more, using Python tools
Main authors: | Cielen, Davy; Meysman, Arno; Ali, Mohamed |
---|---|
Format: | Book |
Language: | English |
Published: | Shelter Island, NY: Manning, [2016] |
Subjects: | Big Data; Daten; Datenmanagement |
Online access: | Table of contents |
Description: | xx, 300 pages, illustrations, diagrams |
ISBN: | 9781633430037 1633430030 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV043552396 | ||
003 | DE-604 | ||
005 | 20161020 | ||
007 | t | ||
008 | 160512s2016 a||| |||| 00||| eng d | ||
020 | |a 9781633430037 |c pbk. |9 978-1-63343-003-7 | ||
020 | |a 1633430030 |c pbk. |9 1-63343-003-0 | ||
035 | |a (OCoLC)951534451 | ||
035 | |a (DE-599)BVBBV043552396 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-526 |a DE-739 |a DE-11 |a DE-20 |a DE-83 |a DE-573 |a DE-945 |a DE-1049 |a DE-863 |a DE-384 | ||
084 | |a ST 265 |0 (DE-625)143634: |2 rvk | ||
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Cielen, Davy |e Verfasser |0 (DE-588)1107223571 |4 aut | |
245 | 1 | 0 | |a Introducing data science |b big data, machine learning, and more, using Python tools |c Davy Cielen, Arno D. B. Meysman, Mohamed Ali |
264 | 1 | |a Shelter Island, NY |b Manning |c [2016] | |
264 | 4 | |c © 2016 | |
300 | |a xx, 300 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Big Data |0 (DE-588)4802620-7 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Daten |0 (DE-588)4135391-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenmanagement |0 (DE-588)4213132-7 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Daten |0 (DE-588)4135391-2 |D s |
689 | 0 | 1 | |a Datenmanagement |0 (DE-588)4213132-7 |D s |
689 | 0 | 2 | |a Big Data |0 (DE-588)4802620-7 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Meysman, Arno |e Verfasser |0 (DE-588)1107224225 |4 aut | |
700 | 1 | |a Ali, Mohamed |d 1981- |e Verfasser |0 (DE-588)1102266310 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028967629&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-028967629 |
Record in the search index
DE-BY-863_location | 1000 |
---|---|
DE-BY-FWS_call_number | 1000/ST 265 C569 |
DE-BY-FWS_katkey | 640520 |
DE-BY-FWS_media_number | 083101350302 |
_version_ | 1806782920186134528 |
adam_text |
contents
preface xiii
acknowledgments xiv
about this book xvi
about the authors xviii
about the cover illustration xx
1 Data science in a big data world 1
1.1 Benefits and uses of data science and big data 2
1.2 Facets of data 4
Structured data 4 ■ Unstructured data 5
Natural language 5 ■ Machine-generated data 6
Graph-based or network data 7 ■ Audio, image, and video 8
Streaming data 8
1.3 The data science process 8
Setting the research goal 8 ■ Retrieving data 9
Data preparation 9 ■ Data exploration 9
Data modeling or model building 9 ■ Presentation
and automation 9
1.4 The big data ecosystem and data science 10
Distributed file systems 10 ■ Distributed programming
framework 12 ■ Data integration framework 12
Machine learning frameworks ■ 13
Scheduling tools 14 ■ Benchmarking tools 14
System deployment 14 ■ Service programming 14
Security 14
1.5 An introductory working example of Hadoop 15
1.6 Summary 20
2 The data science process 22
2.1 Overview of the data science process 22
Don’t be a slave to the process 25
2.2 Step 1: Defining research goals and creating
a project charter 25
Spend time understanding the goals and context of your research 26
Create a project charter 26
2.3 Step 2: Retrieving data 27
Start with data stored within the company 28 ■ Don't be afraid
to shop around 28 ■ Do data quality checks now to prevent
problems later 29
2.4 Step 3: Cleansing, integrating, and transforming data 29
Cleansing data 30 ■ Correct errors as early as possible 36
Combining data from different data sources 37
Transforming data 40
2.5 Step 4: Exploratory data analysis 43
2.6 Step 5: Build the models 48
Model and variable selection 48 ■ Model execution 49
Model diagnostics and model comparison 54
2.7 Step 6: Presenting findings and building applications on
top of them 55
2.8 Summary 56
3 Machine learning 57
3.1 What is machine learning and why should you care
about it? 58
Applications for machine learning in data science 58
Where machine learning is used in the data science process 59
Python tools used in machine learning 60
3.2 The modeling process 62
Engineering features and selecting a model 62 ■ Training
your model 64 ■ Validating a model 64 ■ Predicting
new observations 65
3.3 Types of machine learning 65
Supervised learning 66 ■ Unsupervised learning 72
3.4 Semi-supervised learning 82
3.5 Summary 83
4 Handling large data on a single computer 85
4.1 The problems you face when handling large data 86
4.2 General techniques for handling large volumes of data 87
Choosing the right algorithm 88 ■ Choosing the right data
structure 96 ■ Selecting the right tools 99
4.3 General programming tips for dealing with
large data sets 101
Don't reinvent the wheel 101 ■ Get the most out of your
hardware 102 ■ Reduce your computing needs 102
4.4 Case study 1: Predicting malicious URLs 103
Step 1: Defining the research goal 104 ■ Step 2: Acquiring
the URL data 104 ■ Step 4: Data exploration 105
Step 5: Model building 106
4.5 Case study 2: Building a recommender system inside
a database 108
Tools and techniques needed 108 ■ Step 1: Research
question 111 ■ Step 3: Data preparation 111
Step 5: Model building 115 ■ Step 6: Presentation
and automation 116
4.6 Summary 118
5 First steps in big data 119
5.1 Distributing data storage and processing with
frameworks 120
Hadoop: a framework for storing and processing large data sets 121
Spark: replacing MapReduce for better performance 123
5.2 Case study: Assessing risk when loaning money 125
Step 1: The research goal 126 ■ Step 2: Data retrieval 127
Step 3: Data preparation 131 ■ Step 4: Data exploration
Step 6: Report building 135
5.3 Summary 149
6 Join the NoSQL movement 150
6.1 Introduction to NoSQL 153
ACID: the core principle of relational databases 153
CAP Theorem: the problem with DBs on many nodes 154
The BASE principles of NoSQL databases 156
NoSQL database types 158
6.2 Case study: What disease is that? 164
Step 1: Setting the research goal 166 ■ Steps 2 and 3: Data
retrieval and preparation 167 ■ Step 4: Data exploration 175
Step 3 revisited: Data preparation for disease profiling 183
Step 4 revisited: Data exploration for disease profiling 187
Step 6: Presentation and automation 188
6.3 Summary 189
7 The rise of graph databases 190
7.1 Introducing connected data and graph databases 191
Why and when should I use a graph database? 193
7.2 Introducing Neo4j: a graph database 196
Cypher: a graph query language 198
7.3 Connected data example: a recipe recommendation
engine 204
Step 1: Setting the research goal ■ Step 2: Data retrieval 206
Step 3: Data preparation 210
Step 5: Data modeling 212 • 216
7.4 Summary 216
8 Text mining and text analytics 218
8.1 Text mining in the real world 220
8.2 Text mining techniques 225
Bag of words 225 ■ Stemming 227
Decision tree classifier 228
8.3 Case study: Classifying Reddit posts 230
Meet the Natural Language Toolkit 231 ■ Data science process
overview and step 1: The research goal 233 ■ Step 2: Data
retrieval 234 ■ Step 3: Data preparation 237 ■ Step 4:
Data exploration 240 ■ Step 3 revisited: Data preparation
adapted 242 ■ Step 5: Data analysis 246 ■ Step 6:
Presentation and automation 250
8.4 Summary 252
9 Data visualization to the end user 253
9.1 Data visualization options 254
9.2 Crossfilter, the JavaScript MapReduce library 257
Setting up everything 258 ■ Unleashing Crossfilter to filter the
medicine data set 262
9.3 Creating an interactive dashboard with dc.js 267
9.4 Dashboard development tools 272
9.5 Summary 273
appendix A Setting up Elasticsearch 275
appendix B Setting up Neo4j 281
appendix C Installing MySQL server 284
appendix D Setting up Anaconda with a virtual environment 288
index 291 |
any_adam_object | 1 |
author | Cielen, Davy Meysman, Arno Ali, Mohamed 1981- |
author_GND | (DE-588)1107223571 (DE-588)1107224225 (DE-588)1102266310 |
author_facet | Cielen, Davy Meysman, Arno Ali, Mohamed 1981- |
author_role | aut aut aut |
author_sort | Cielen, Davy |
author_variant | d c dc a m am m a ma |
building | Verbundindex |
bvnumber | BV043552396 |
classification_rvk | ST 265 ST 300 ST 530 |
ctrlnum | (OCoLC)951534451 (DE-599)BVBBV043552396 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV043552396</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20161020</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">160512s2016 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781633430037</subfield><subfield code="c">pbk.</subfield><subfield code="9">978-1-63343-003-7</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1633430030</subfield><subfield code="c">pbk.</subfield><subfield code="9">1-63343-003-0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)951534451</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV043552396</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-526</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-83</subfield><subfield code="a">DE-573</subfield><subfield code="a">DE-945</subfield><subfield code="a">DE-1049</subfield><subfield code="a">DE-863</subfield><subfield code="a">DE-384</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 265</subfield><subfield code="0">(DE-625)143634:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 300</subfield><subfield code="0">(DE-625)143650:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 
530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Cielen, Davy</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1107223571</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Introducing data science</subfield><subfield code="b">big data, machine learning, and more, using Python tools</subfield><subfield code="c">Davy Cielen, Arno D. B. Meysman, Mohamed Ali</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Shelter Island, NY</subfield><subfield code="b">Manning</subfield><subfield code="c">[2016]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2016</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xx, 300 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Daten</subfield><subfield code="0">(DE-588)4135391-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenmanagement</subfield><subfield code="0">(DE-588)4213132-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield 
tag="689" ind1="0" ind2="0"><subfield code="a">Daten</subfield><subfield code="0">(DE-588)4135391-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Datenmanagement</subfield><subfield code="0">(DE-588)4213132-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Meysman, Arno</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1107224225</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ali, Mohamed</subfield><subfield code="d">1981-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1102266310</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028967629&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-028967629</subfield></datafield></record></collection> |
id | DE-604.BV043552396 |
illustrated | Illustrated |
indexdate | 2024-08-08T04:01:06Z |
institution | BVB |
isbn | 9781633430037 1633430030 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-028967629 |
oclc_num | 951534451 |
open_access_boolean | |
owner | DE-526 DE-739 DE-11 DE-20 DE-83 DE-573 DE-945 DE-1049 DE-863 DE-BY-FWS DE-384 |
owner_facet | DE-526 DE-739 DE-11 DE-20 DE-83 DE-573 DE-945 DE-1049 DE-863 DE-BY-FWS DE-384 |
physical | xx, 300 Seiten Illustrationen, Diagramme |
publishDate | 2016 |
publishDateSearch | 2016 |
publishDateSort | 2016 |
publisher | Manning |
record_format | marc |
spellingShingle | Cielen, Davy Meysman, Arno Ali, Mohamed 1981- Introducing data science big data, machine learning, and more, using Python tools Big Data (DE-588)4802620-7 gnd Daten (DE-588)4135391-2 gnd Datenmanagement (DE-588)4213132-7 gnd |
subject_GND | (DE-588)4802620-7 (DE-588)4135391-2 (DE-588)4213132-7 |
title | Introducing data science big data, machine learning, and more, using Python tools |
title_auth | Introducing data science big data, machine learning, and more, using Python tools |
title_exact_search | Introducing data science big data, machine learning, and more, using Python tools |
title_full | Introducing data science big data, machine learning, and more, using Python tools Davy Cielen, Arno D. B. Meysman, Mohamed Ali |
title_fullStr | Introducing data science big data, machine learning, and more, using Python tools Davy Cielen, Arno D. B. Meysman, Mohamed Ali |
title_full_unstemmed | Introducing data science big data, machine learning, and more, using Python tools Davy Cielen, Arno D. B. Meysman, Mohamed Ali |
title_short | Introducing data science |
title_sort | introducing data science big data machine learning and more using python tools |
title_sub | big data, machine learning, and more, using Python tools |
topic | Big Data (DE-588)4802620-7 gnd Daten (DE-588)4135391-2 gnd Datenmanagement (DE-588)4213132-7 gnd |
topic_facet | Big Data Daten Datenmanagement |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028967629&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT cielendavy introducingdatasciencebigdatamachinelearningandmoreusingpythontools AT meysmanarno introducingdatasciencebigdatamachinelearningandmoreusingpythontools AT alimohamed introducingdatasciencebigdatamachinelearningandmoreusingpythontools |
Table of contents
THWS Würzburg Zentralbibliothek, reading room
Call number: | 1000 ST 265 C569 |
---|---|
Copy 1 | Loanable, available |