Principles of data wrangling: practical techniques for data preparation
Main authors: | Rattenbury, Tye, Hellerstein, Joseph M., Heer, Jeffrey, Kandel, Sean, Carreras, Connor |
---|---|
Format: | Electronic E-Book |
Language: | English |
Published: | Beijing: O'Reilly, May 2017 |
Edition: | First edition |
Subjects: | Big Data, Data Mining, Datenanalyse |
Online access: | FUBA1 Table of contents |
Description: | 1 online resource (viii, 82 pages), diagrams |
ISBN: | 9781491938898 |
Internal format
MARC
LEADER | 00000nmm a2200000 c 4500 | ||
---|---|---|---|
001 | BV044512053 | ||
003 | DE-604 | ||
005 | 20170926 | ||
007 | cr|uuu---uuuuu | ||
008 | 170925s2017 |||| o||u| ||||||eng d | ||
020 | |a 9781491938898 |c Online |9 978-1-491-93889-8 | ||
035 | |a (OCoLC)1005927160 | ||
035 | |a (DE-599)BVBBV044512053 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-188 | ||
084 | |a QH 232 |0 (DE-625)141547: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
084 | |a ST 265 |0 (DE-625)143634: |2 rvk | ||
100 | 1 | |a Rattenbury, Tye |e Verfasser |0 (DE-588)1140505211 |4 aut | |
245 | 1 | 0 | |a Principles of data wrangling |b practical techniques for data preparation |c Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel, and Connor Carreras |
250 | |a First edition | ||
264 | 1 | |a Beijing |b O'Reilly |c May 2017 | |
300 | |a 1 Online-Ressource (viii, 82 Seiten) |b Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b c |2 rdamedia | ||
338 | |b cr |2 rdacarrier | ||
650 | 0 | 7 | |a Big Data |0 (DE-588)4802620-7 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenanalyse |0 (DE-588)4123037-1 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Datenanalyse |0 (DE-588)4123037-1 |D s |
689 | 1 | 1 | |a Big Data |0 (DE-588)4802620-7 |D s |
689 | 1 | |5 DE-604 | |
700 | 1 | |a Hellerstein, Joseph M. |e Verfasser |0 (DE-588)114050536X |4 aut | |
700 | 1 | |a Heer, Jeffrey |e Verfasser |0 (DE-588)1140505416 |4 aut | |
700 | 1 | |a Kandel, Sean |e Verfasser |0 (DE-588)1140505645 |4 aut | |
700 | 1 | |a Carreras, Connor |e Verfasser |4 aut | |
776 | 0 | 8 | |i Erscheint auch als |n Druck-Ausgabe |z 978-1-491-93892-8 |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029911813&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
912 | |a ZDB-30-PQE | ||
999 | |a oai:aleph.bib-bvb.de:BVB01-029911813 | ||
966 | e | |u https://ebookcentral.proquest.com/lib/fuberlin-ebooks/detail.action?docID=4891366 |l FUBA1 |p ZDB-30-PQE |x Aggregator |3 Volltext |
Record in the search index
_version_ | 1804177849683279872 |
---|---|
adam_text | Title: Principles of data wrangling
Author: Rattenbury, Tye
Year: 2017
Table of Contents
Foreword vii
1. Introduction 1
Magic Thresholds, PYMK, and User Growth at Facebook 3
2. A Data Workflow Framework 7
How Data Flows During and Across Projects 8
Connecting Analytic Actions to Data Movement: A Holistic Workflow Framework for Data Projects 11
Raw Data Stage Actions: Ingest Data and Create Metadata 12
Ingesting Known and Unknown Data 12
Creating Metadata 14
Refined Data Stage Actions: Create Canonical Data and Conduct Ad Hoc Analyses 23
Designing Refined Data 24
Refined Stage Analytical Actions 26
Production Data Stage Actions: Create Production Data and Build Automated Systems 28
Creating Optimized Data 29
Designing Regular Reports and Automated Products/Services 29
Data Wrangling within the Workflow Framework 30
3. The Dynamics of Data Wrangling 31
Data Wrangling Dynamics 31
Additional Aspects: Subsetting and Sampling 32
Core Transformation and Profiling Actions 34
Data Wrangling in the Workflow Framework 36
Ingesting Data 36
Describing Data 37
Assessing Data Utility 37
Designing and Building Refined Data 37
Ad Hoc Reporting 38
Exploratory Modeling and Forecasting 39
Building an Optimized Dataset 39
Regular Reporting and Building Data-Driven Products and Services 40
4. Profiling 43
Overview of Profiling 43
Individual Value Profiling: Syntactic Profiling 44
Individual Value Profiling: Semantic Profiling 44
Set-Based Profiling 45
Profiling Individual Values in the Candidate Master File 46
Syntactic Profiling in the Candidate Master File 47
Set-Based Profiling in the Candidate Master File 48
5. Transformation: Structuring 51
Overview of Structuring 51
Intrarecord Structuring: Extracting Values 52
Positional Extraction 52
Pattern Extraction 54
Complex Structure Extraction 55
Intrarecord Structuring: Combining Multiple Record Fields 56
Interrecord Structuring: Filtering Records and Fields 57
Interrecord Structuring: Aggregations and Pivots 57
Simple Aggregations 58
Column-to-Row Pivots 59
Row-to-Column Pivots 59
6. Transformation: Enriching 61
Unions 61
Joins 62
Inserting Metadata 63
Derivation of Values 63
Generic 63
Proprietary 64
7. Using Transformation to Clean Data 67
Addressing Missing/NULL Values 67
Addressing Invalid Values 67
8. Roles and Responsibilities 69
Skills and Responsibilities 69
Data Engineer 70
Data Architect 71
Data Scientist 71
Analyst 72
Roles Across the Data Workflow Framework 73
Organizational Best Practices 74
9. Data Wrangling Tools 77
Data Size and Infrastructure 78
Data Structures 78
Excel 79
SQL 79
Trifacta Wrangler 79
Transformation Paradigms 79
Excel 80
SQL 80
Trifacta Wrangler 81
Choosing a Data Wrangling Tool 82
|
any_adam_object | 1 |
author | Rattenbury, Tye Hellerstein, Joseph M. Heer, Jeffrey Kandel, Sean Carreras, Connor |
author_GND | (DE-588)1140505211 (DE-588)114050536X (DE-588)1140505416 (DE-588)1140505645 |
author_facet | Rattenbury, Tye Hellerstein, Joseph M. Heer, Jeffrey Kandel, Sean Carreras, Connor |
author_role | aut aut aut aut aut |
author_sort | Rattenbury, Tye |
author_variant | t r tr j m h jm jmh j h jh s k sk c c cc |
building | Verbundindex |
bvnumber | BV044512053 |
classification_rvk | QH 232 ST 530 ST 265 |
collection | ZDB-30-PQE |
ctrlnum | (OCoLC)1005927160 (DE-599)BVBBV044512053 |
discipline | Informatik Wirtschaftswissenschaften |
edition | First edition |
format | Electronic eBook |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02157nmm a2200493 c 4500</leader><controlfield tag="001">BV044512053</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20170926 </controlfield><controlfield tag="007">cr|uuu---uuuuu</controlfield><controlfield tag="008">170925s2017 |||| o||u| ||||||eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781491938898</subfield><subfield code="c">Online</subfield><subfield code="9">978-1-491-93889-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1005927160</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV044512053</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-188</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">QH 232</subfield><subfield code="0">(DE-625)141547:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 265</subfield><subfield code="0">(DE-625)143634:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Rattenbury, Tye</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1140505211</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Principles of data wrangling</subfield><subfield code="b">practical techniques for data 
preparation</subfield><subfield code="c">Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel, and Connor Carreras</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">First edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Beijing</subfield><subfield code="b">O'Reilly</subfield><subfield code="c">May 2017</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 Online-Ressource (viii, 82 Seiten)</subfield><subfield code="b">Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenanalyse</subfield><subfield code="0">(DE-588)4123037-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="1" ind2="0"><subfield code="a">Datenanalyse</subfield><subfield code="0">(DE-588)4123037-1</subfield><subfield code="D">s</subfield></datafield><datafield 
tag="689" ind1="1" ind2="1"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Hellerstein, Joseph M.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)114050536X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Heer, Jeffrey</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1140505416</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Kandel, Sean</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1140505645</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Carreras, Connor</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Druck-Ausgabe</subfield><subfield code="z">978-1-491-93892-8</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029911813&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-30-PQE</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-029911813</subfield></datafield><datafield tag="966" ind1="e" ind2=" "><subfield 
code="u">https://ebookcentral.proquest.com/lib/fuberlin-ebooks/detail.action?docID=4891366</subfield><subfield code="l">FUBA1</subfield><subfield code="p">ZDB-30-PQE</subfield><subfield code="x">Aggregator</subfield><subfield code="3">Volltext</subfield></datafield></record></collection> |
id | DE-604.BV044512053 |
illustrated | Not Illustrated |
indexdate | 2024-07-10T07:54:37Z |
institution | BVB |
isbn | 9781491938898 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-029911813 |
oclc_num | 1005927160 |
open_access_boolean | |
owner | DE-188 |
owner_facet | DE-188 |
physical | 1 Online-Ressource (viii, 82 Seiten) Diagramme |
psigel | ZDB-30-PQE |
publishDate | 2017 |
publishDateSearch | 2017 |
publishDateSort | 2017 |
publisher | O'Reilly |
record_format | marc |
spelling | Rattenbury, Tye Verfasser (DE-588)1140505211 aut Principles of data wrangling practical techniques for data preparation Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel, and Connor Carreras First edition Beijing O'Reilly May 2017 1 Online-Ressource (viii, 82 Seiten) Diagramme txt rdacontent c rdamedia cr rdacarrier Big Data (DE-588)4802620-7 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf Datenanalyse (DE-588)4123037-1 gnd rswk-swf Data Mining (DE-588)4428654-5 s DE-604 Datenanalyse (DE-588)4123037-1 s Big Data (DE-588)4802620-7 s Hellerstein, Joseph M. Verfasser (DE-588)114050536X aut Heer, Jeffrey Verfasser (DE-588)1140505416 aut Kandel, Sean Verfasser (DE-588)1140505645 aut Carreras, Connor Verfasser aut Erscheint auch als Druck-Ausgabe 978-1-491-93892-8 HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029911813&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Rattenbury, Tye Hellerstein, Joseph M. Heer, Jeffrey Kandel, Sean Carreras, Connor Principles of data wrangling practical techniques for data preparation Big Data (DE-588)4802620-7 gnd Data Mining (DE-588)4428654-5 gnd Datenanalyse (DE-588)4123037-1 gnd |
subject_GND | (DE-588)4802620-7 (DE-588)4428654-5 (DE-588)4123037-1 |
title | Principles of data wrangling practical techniques for data preparation |
title_auth | Principles of data wrangling practical techniques for data preparation |
title_exact_search | Principles of data wrangling practical techniques for data preparation |
title_full | Principles of data wrangling practical techniques for data preparation Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel, and Connor Carreras |
title_fullStr | Principles of data wrangling practical techniques for data preparation Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel, and Connor Carreras |
title_full_unstemmed | Principles of data wrangling practical techniques for data preparation Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel, and Connor Carreras |
title_short | Principles of data wrangling |
title_sort | principles of data wrangling practical techniques for data preparation |
title_sub | practical techniques for data preparation |
topic | Big Data (DE-588)4802620-7 gnd Data Mining (DE-588)4428654-5 gnd Datenanalyse (DE-588)4123037-1 gnd |
topic_facet | Big Data Data Mining Datenanalyse |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029911813&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT rattenburytye principlesofdatawranglingpracticaltechniquesfordatapreparation AT hellersteinjosephm principlesofdatawranglingpracticaltechniquesfordatapreparation AT heerjeffrey principlesofdatawranglingpracticaltechniquesfordatapreparation AT kandelsean principlesofdatawranglingpracticaltechniquesfordatapreparation AT carrerasconnor principlesofdatawranglingpracticaltechniquesfordatapreparation |