Data profiling:
Main authors: | Abedjan, Ziawasch; Golab, Lukasz; Naumann, Felix; Papenbrock, Thorsten |
---|---|
Format: | Book |
Language: | English |
Published: |
[San Rafael, California]
Morgan & Claypool Publishers
[2019]
|
Series: | Synthesis lectures on data management
#52 |
Subjects: | Metadaten; Data Mining; Data-Profiling |
Online access: | Table of contents |
Description: | xviii, 136 pages, illustrations, diagrams |
ISBN: | 9781681734460 9781681734484 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV045352039 | ||
003 | DE-604 | ||
005 | 20230804 | ||
007 | t | ||
008 | 181210s2019 a||| |||| 00||| eng d | ||
020 | |a 9781681734460 |c paperback |9 978-1-68173-446-0 | ||
020 | |a 9781681734484 |c hardcover |9 978-1-68173-448-4 | ||
035 | |a (OCoLC)1081346491 | ||
035 | |a (DE-599)BVBBV045352039 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-29T |a DE-898 |a DE-12 |a DE-11 |a DE-739 | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Abedjan, Ziawasch |e Verfasser |0 (DE-588)1081174676 |4 aut | |
245 | 1 | 0 | |a Data profiling |c Ziawasch Abedjan (Technische Universität Berlin), Lukasz Golab (University of Waterloo), Felix Naumann (Hasso Plattner Institute, University of Potsdam), Thorsten Papenbrock (Hasso Plattner Institute, University of Potsdam) |
264 | 1 | |a [San Rafael, California] |b Morgan & Claypool Publishers |c [2019] | |
300 | |a xviii, 136 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Synthesis lectures on data management |v #52 | |
650 | 0 | 7 | |a Metadaten |0 (DE-588)4410512-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data-Profiling |0 (DE-588)7670125-6 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data-Profiling |0 (DE-588)7670125-6 |D s |
689 | 0 | 1 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | 2 | |a Metadaten |0 (DE-588)4410512-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Golab, Lukasz |e Verfasser |0 (DE-588)1207414689 |4 aut | |
700 | 1 | |a Naumann, Felix |d 1971- |e Verfasser |0 (DE-588)129576379 |4 aut | |
700 | 1 | |a Papenbrock, Thorsten |e Verfasser |0 (DE-588)1153740621 |4 aut | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-68173-447-7 |
830 | 0 | |a Synthesis lectures on data management |v #52 |w (DE-604)BV036766043 |9 52 | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030738692&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-030738692 |
Record in the search index
_version_ | 1804179178001530880 |
---|---|
adam_text | Contents
Preface; Acknowledgments
1 Discovering Metadata: 1.1 Motivation and Overview; 1.2 Data Profiling and Data Mining; 1.3 Use Cases; 1.4 Organization of This Book
2 Data Profiling Tasks: 2.1 Single-Column Analysis; 2.2 Dependency Discovery; 2.3 Relaxed Dependencies
3 Single-Column Analysis: 3.1 Cardinalities; 3.2 Value Distributions; 3.3 Data Types, Patterns, and Domains; 3.4 Data Completeness; 3.5 Approximate Statistics; 3.6 Summary and Discussion
4 Dependency Discovery
4.1 Dependency Definitions: 4.1.1 Functional Dependencies; 4.1.2 Unique Column Combinations; 4.1.3 Inclusion Dependencies
4.2 Search Space and Data Structures: 4.2.1 Lattices and Search Space Sizes; 4.2.2 Position List Indexes and Search Space Validation; 4.2.3 Search Complexity; 4.2.4 Null Semantics
4.3 Discovering Unique Column Combinations: 4.3.1 Gordian; 4.3.2 HCA; 4.3.3 Ducc; 4.3.4 HyUCC; 4.3.5 Swan
4.4 Discovering Functional Dependencies: 4.4.1 Tane; 4.4.2 Fun; 4.4.3 FD_Mine; 4.4.4 Dfd; 4.4.5 Dep-Miner; 4.4.6 FastFDs; 4.4.7 Fdep; 4.4.8 HyFD
4.5 Discovering Inclusion Dependencies: 4.5.1 SQL-Based IND Validation; 4.5.2 B&B; 4.5.3 DeMarchi; 4.5.4 Binder; 4.5.5 Spider; 4.5.6 S-IndD; 4.5.7 Sindy; 4.5.8 Mind; 4.5.9 Find2; 4.5.10 ZigZag; 4.5.11 Mind2
5 Relaxed and Other Dependencies
5.1 Relaxing the Extent of a Dependency: 5.1.1 Partial Dependencies; 5.1.2 Conditional Dependencies
5.2 Relaxing Attribute Comparisons: 5.2.1 Metric and Matching Dependencies; 5.2.2 Order and Sequential Dependencies
5.3 Approximating the Dependency Discovery
5.4 Generalizing Functional Dependencies: 5.4.1 Denial Constraints; 5.4.2 Multivalued Dependencies
6 Use Cases: 6.1 Data Exploration; 6.2 Schema Engineering; 6.3 Data Cleaning; 6.4 Query Optimization; 6.5 Data Integration
7 Profiling Non-Relational Data: 7.1 XML; 7.2 RDF; 7.3 Time Series; 7.4 Graphs; 7.5 Text
8 Data Profiling Tools: 8.1 Research Prototypes; 8.2 Commercial Tools
9 Data Profiling Challenges
9.1 Functional Challenges: 9.1.1 Profiling Dynamic Data; 9.1.2 Interactive Profiling; 9.1.3 Profiling for Integration; 9.1.4 Interpreting Profiling Results
9.2 Non-Functional Challenges: 9.2.1 Efficiency and Scalability; 9.2.2 Profiling on New Architectures; 9.2.3 Benchmarking Profiling Methods
10 Conclusions
Bibliography
Authors' Biographies
|
any_adam_object | 1 |
author | Abedjan, Ziawasch Golab, Lukasz Naumann, Felix 1971- Papenbrock, Thorsten |
author_GND | (DE-588)1081174676 (DE-588)1207414689 (DE-588)129576379 (DE-588)1153740621 |
author_facet | Abedjan, Ziawasch Golab, Lukasz Naumann, Felix 1971- Papenbrock, Thorsten |
author_role | aut aut aut aut |
author_sort | Abedjan, Ziawasch |
author_variant | z a za l g lg f n fn t p tp |
building | Verbundindex |
bvnumber | BV045352039 |
classification_rvk | ST 530 |
ctrlnum | (OCoLC)1081346491 (DE-599)BVBBV045352039 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02186nam a2200445 cb4500</leader><controlfield tag="001">BV045352039</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20230804 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">181210s2019 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781681734460</subfield><subfield code="c">paperback</subfield><subfield code="9">978-1-68173-446-0</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781681734484</subfield><subfield code="c">hardcover</subfield><subfield code="9">978-1-68173-448-4</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1081346491</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV045352039</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield><subfield code="a">DE-898</subfield><subfield code="a">DE-12</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-739</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Abedjan, Ziawasch</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1081174676</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data profiling</subfield><subfield code="c">Ziawasch Abedjan (Technische Universität Berlin), Lukasz Golab (University of 
Waterloo), Felix Naumann (Hasso Plattner Institute, University of Potsdam), Thorsten Papenbrock (Hasso Plattner Institute, University of Potsdam)</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">[San Rafael, California]</subfield><subfield code="b">Morgan & Claypool Publishers</subfield><subfield code="c">[2019]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xviii, 136 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Synthesis lectures on data management</subfield><subfield code="v">#52</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Metadaten</subfield><subfield code="0">(DE-588)4410512-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data-Profiling</subfield><subfield code="0">(DE-588)7670125-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data-Profiling</subfield><subfield code="0">(DE-588)7670125-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Metadaten</subfield><subfield code="0">(DE-588)4410512-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Golab, Lukasz</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1207414689</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Naumann, Felix</subfield><subfield code="d">1971-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)129576379</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Papenbrock, Thorsten</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1153740621</subfield><subfield code="4">aut</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-68173-447-7</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Synthesis lectures on data management</subfield><subfield code="v">#52</subfield><subfield code="w">(DE-604)BV036766043</subfield><subfield code="9">52</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030738692&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-030738692</subfield></datafield></record></collection> |
id | DE-604.BV045352039 |
illustrated | Illustrated |
indexdate | 2024-07-10T08:15:44Z |
institution | BVB |
isbn | 9781681734460 9781681734484 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-030738692 |
oclc_num | 1081346491 |
open_access_boolean | |
owner | DE-29T DE-898 DE-BY-UBR DE-12 DE-11 DE-739 |
owner_facet | DE-29T DE-898 DE-BY-UBR DE-12 DE-11 DE-739 |
physical | xviii, 136 Seiten Illustrationen, Diagramme |
publishDate | 2019 |
publishDateSearch | 2019 |
publishDateSort | 2019 |
publisher | Morgan & Claypool Publishers |
record_format | marc |
series | Synthesis lectures on data management |
series2 | Synthesis lectures on data management |
spelling | Abedjan, Ziawasch Verfasser (DE-588)1081174676 aut Data profiling Ziawasch Abedjan (Technische Universität Berlin), Lukasz Golab (University of Waterloo), Felix Naumann (Hasso Plattner Institute, University of Potsdam), Thorsten Papenbrock (Hasso Plattner Institute, University of Potsdam) [San Rafael, California] Morgan & Claypool Publishers [2019] xviii, 136 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Synthesis lectures on data management #52 Metadaten (DE-588)4410512-5 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf Data-Profiling (DE-588)7670125-6 gnd rswk-swf Data-Profiling (DE-588)7670125-6 s Data Mining (DE-588)4428654-5 s Metadaten (DE-588)4410512-5 s DE-604 Golab, Lukasz Verfasser (DE-588)1207414689 aut Naumann, Felix 1971- Verfasser (DE-588)129576379 aut Papenbrock, Thorsten Verfasser (DE-588)1153740621 aut Erscheint auch als Online-Ausgabe 978-1-68173-447-7 Synthesis lectures on data management #52 (DE-604)BV036766043 52 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030738692&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Abedjan, Ziawasch Golab, Lukasz Naumann, Felix 1971- Papenbrock, Thorsten Data profiling Synthesis lectures on data management Metadaten (DE-588)4410512-5 gnd Data Mining (DE-588)4428654-5 gnd Data-Profiling (DE-588)7670125-6 gnd |
subject_GND | (DE-588)4410512-5 (DE-588)4428654-5 (DE-588)7670125-6 |
title | Data profiling |
title_auth | Data profiling |
title_exact_search | Data profiling |
title_full | Data profiling Ziawasch Abedjan (Technische Universität Berlin), Lukasz Golab (University of Waterloo), Felix Naumann (Hasso Plattner Institute, University of Potsdam), Thorsten Papenbrock (Hasso Plattner Institute, University of Potsdam) |
title_fullStr | Data profiling Ziawasch Abedjan (Technische Universität Berlin), Lukasz Golab (University of Waterloo), Felix Naumann (Hasso Plattner Institute, University of Potsdam), Thorsten Papenbrock (Hasso Plattner Institute, University of Potsdam) |
title_full_unstemmed | Data profiling Ziawasch Abedjan (Technische Universität Berlin), Lukasz Golab (University of Waterloo), Felix Naumann (Hasso Plattner Institute, University of Potsdam), Thorsten Papenbrock (Hasso Plattner Institute, University of Potsdam) |
title_short | Data profiling |
title_sort | data profiling |
topic | Metadaten (DE-588)4410512-5 gnd Data Mining (DE-588)4428654-5 gnd Data-Profiling (DE-588)7670125-6 gnd |
topic_facet | Metadaten Data Mining Data-Profiling |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030738692&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV036766043 |
work_keys_str_mv | AT abedjanziawasch dataprofiling AT golablukasz dataprofiling AT naumannfelix dataprofiling AT papenbrockthorsten dataprofiling |