Data analytics with Hadoop: an introduction for data scientists
Main authors: | Bengfort, Benjamin; Kim, Jenny |
---|---|
Format: | Book |
Language: | English |
Published: | Sebastopol, CA: O'Reilly, June 2016 |
Edition: | First edition |
Subjects: | Data Mining; Hadoop |
Online access: | Table of contents |
Description: | xvi, 268 pages, illustrations, diagrams |
ISBN: | 9781491913703 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV043642086 | ||
003 | DE-604 | ||
005 | 20160718 | ||
007 | t | ||
008 | 160627s2016 a||| |||| 00||| eng d | ||
020 | |a 9781491913703 |c pbk. |9 978-1-4919-1370-3 | ||
035 | |a (OCoLC)953127910 | ||
035 | |a (DE-599)BVBBV043642086 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-83 |a DE-11 |a DE-898 | ||
084 | |a ST 201 |0 (DE-625)143612: |2 rvk | ||
084 | |a ST 230 |0 (DE-625)143617: |2 rvk | ||
100 | 1 | |a Bengfort, Benjamin |e Verfasser |0 (DE-588)110686901X |4 aut | |
245 | 1 | 0 | |a Data analytics with Hadoop |b an introduction for data scientists |c Benjamin Bengfort, Jenny Kim |
250 | |a First edition | ||
264 | 1 | |a Sebastopol, CA |b O'Reilly |c June 2016 | |
300 | |a xvi, 268 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Hadoop |0 (DE-588)1022420135 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Hadoop |0 (DE-588)1022420135 |D s |
689 | 0 | 1 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Kim, Jenny |e Verfasser |0 (DE-588)1106869648 |4 aut | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029055885&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-029055885 |
Record in the search index
_version_ | 1804176383578996736 |
---|---|
adam_text | Title: Data analytics with Hadoop
Author: Bengfort, Benjamin
Year: 2016
Table of Contents

Preface

Part I. Introduction to Distributed Computing

1. The Age of the Data Product
  What Is a Data Product?
  Building Data Products at Scale with Hadoop
  Leveraging Large Datasets
  Hadoop for Data Products
  The Data Science Pipeline and the Hadoop Ecosystem
  Big Data Workflows
  Conclusion

2. An Operating System for Big Data
  Basic Concepts
  Hadoop Architecture
  A Hadoop Cluster
  HDFS
  YARN
  Working with a Distributed File System
  Basic File System Operations
  File Permissions in HDFS
  Other HDFS Interfaces
  Working with Distributed Computation
  MapReduce: A Functional Programming Model
  MapReduce: Implemented on a Cluster
  Beyond a Map and Reduce: Job Chaining
  Submitting a MapReduce Job to YARN
  Conclusion

3. A Framework for Python and Hadoop Streaming
  Hadoop Streaming
  Computing on CSV Data with Streaming
  Executing Streaming Jobs
  A Framework for MapReduce with Python
  Counting Bigrams
  Other Frameworks
  Advanced MapReduce
  Combiners
  Partitioners
  Job Chaining
  Conclusion

4. In-Memory Computing with Spark
  Spark Basics
  The Spark Stack
  Resilient Distributed Datasets
  Programming with RDDs
  Interactive Spark Using PySpark
  Writing Spark Applications
  Visualizing Airline Delays with Spark
  Conclusion

5. Distributed Analysis and Patterns
  Computing with Keys
  Compound Keys
  Keyspace Patterns
  Pairs versus Stripes
  Design Patterns
  Summarization
  Indexing
  Filtering
  Toward Last-Mile Analytics
  Fitting a Model
  Validating Models
  Conclusion

Part II. Workflows and Tools for Big Data Science

6. Data Mining and Warehousing
  Structured Data Queries with Hive
  The Hive Command-Line Interface (CLI)
  Hive Query Language (HQL)
  Data Analysis with Hive
  HBase
  NoSQL and Column-Oriented Databases
  Real-Time Analytics with HBase
  Conclusion

7. Data Ingestion
  Importing Relational Data with Sqoop
  Importing from MySQL to HDFS
  Importing from MySQL to Hive
  Importing from MySQL to HBase
  Ingesting Streaming Data with Flume
  Flume Data Flows
  Ingesting Product Impression Data with Flume
  Conclusion

8. Analytics with Higher-Level APIs
  Pig
  Pig Latin
  Data Types
  Relational Operators
  User-Defined Functions
  Wrapping Up
  Spark’s Higher-Level APIs
  Spark SQL
  DataFrames
  Conclusion

9. Machine Learning
  Scalable Machine Learning with Spark
  Collaborative Filtering
  Classification
  Clustering
  Conclusion

10. Summary: Doing Distributed Data Science
  Data Product Lifecycle
  Data Lakes
  Data Ingestion
  Computational Data Stores
  Machine Learning Lifecycle
  Conclusion

A. Creating a Hadoop Pseudo-Distributed Development Environment
B. Installing Hadoop Ecosystem Products
Glossary
Index
|
any_adam_object | 1 |
author | Bengfort, Benjamin Kim, Jenny |
author_GND | (DE-588)110686901X (DE-588)1106869648 |
author_facet | Bengfort, Benjamin Kim, Jenny |
author_role | aut aut |
author_sort | Bengfort, Benjamin |
author_variant | b b bb j k jk |
building | Verbundindex |
bvnumber | BV043642086 |
classification_rvk | ST 201 ST 230 |
ctrlnum | (OCoLC)953127910 (DE-599)BVBBV043642086 |
discipline | Informatik |
edition | First edition |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01485nam a2200373 c 4500</leader><controlfield tag="001">BV043642086</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20160718 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">160627s2016 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781491913703</subfield><subfield code="c">pbk.</subfield><subfield code="9">978-1-4919-1370-3</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)953127910</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV043642086</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-83</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-898</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 201</subfield><subfield code="0">(DE-625)143612:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 230</subfield><subfield code="0">(DE-625)143617:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Bengfort, Benjamin</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)110686901X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data analytics with Hadoop</subfield><subfield code="b">an introduction for data scientists</subfield><subfield code="c">Benjamin Bengfort, Jenny Kim</subfield></datafield><datafield tag="250" ind1=" " ind2=" 
"><subfield code="a">First edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Sebastopol, CA</subfield><subfield code="b">O'Reilly</subfield><subfield code="c">June 2016</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xvi, 268 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Hadoop</subfield><subfield code="0">(DE-588)1022420135</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Hadoop</subfield><subfield code="0">(DE-588)1022420135</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Kim, Jenny</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1106869648</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield 
code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=029055885&amp;sequence=000001&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-029055885</subfield></datafield></record></collection> |
id | DE-604.BV043642086 |
illustrated | Illustrated |
indexdate | 2024-07-10T07:31:19Z |
institution | BVB |
isbn | 9781491913703 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-029055885 |
oclc_num | 953127910 |
open_access_boolean | |
owner | DE-83 DE-11 DE-898 DE-BY-UBR |
owner_facet | DE-83 DE-11 DE-898 DE-BY-UBR |
physical | xvi, 268 Seiten Illustrationen, Diagramme |
publishDate | 2016 |
publishDateSearch | 2016 |
publishDateSort | 2016 |
publisher | O'Reilly |
record_format | marc |
spelling | Bengfort, Benjamin Verfasser (DE-588)110686901X aut Data analytics with Hadoop an introduction for data scientists Benjamin Bengfort, Jenny Kim First edition Sebastopol, CA O'Reilly June 2016 xvi, 268 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Data Mining (DE-588)4428654-5 gnd rswk-swf Hadoop (DE-588)1022420135 gnd rswk-swf Hadoop (DE-588)1022420135 s Data Mining (DE-588)4428654-5 s DE-604 Kim, Jenny Verfasser (DE-588)1106869648 aut HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029055885&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Bengfort, Benjamin Kim, Jenny Data analytics with Hadoop an introduction for data scientists Data Mining (DE-588)4428654-5 gnd Hadoop (DE-588)1022420135 gnd |
subject_GND | (DE-588)4428654-5 (DE-588)1022420135 |
title | Data analytics with Hadoop an introduction for data scientists |
title_auth | Data analytics with Hadoop an introduction for data scientists |
title_exact_search | Data analytics with Hadoop an introduction for data scientists |
title_full | Data analytics with Hadoop an introduction for data scientists Benjamin Bengfort, Jenny Kim |
title_fullStr | Data analytics with Hadoop an introduction for data scientists Benjamin Bengfort, Jenny Kim |
title_full_unstemmed | Data analytics with Hadoop an introduction for data scientists Benjamin Bengfort, Jenny Kim |
title_short | Data analytics with Hadoop |
title_sort | data analytics with hadoop an introduction for data scientists |
title_sub | an introduction for data scientists |
topic | Data Mining (DE-588)4428654-5 gnd Hadoop (DE-588)1022420135 gnd |
topic_facet | Data Mining Hadoop |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029055885&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT bengfortbenjamin dataanalyticswithhadoopanintroductionfordatascientists AT kimjenny dataanalyticswithhadoopanintroductionfordatascientists |