Data profiling:
Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for...
Main authors: | Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock |
---|---|
Format: | Electronic e-book |
Language: | English |
Published: | [San Rafael, California] Morgan & Claypool Publishers [2019] |
Series: | Synthesis lectures on data management #52 |
Subjects: | Data profiling; Data mining; Metadata |
Online access: | URL of the original publisher: https://doi.org/10.2200/S00878ED1V01Y201810DTM052 |
Summary: | Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More complex types of metadata are statements about multiple columns and their correlation, such as candidate keys, functional dependencies, and other types of dependencies. This book provides a classification of the various types of profilable metadata, discusses popular data profiling tasks, and surveys state-of-the-art profiling algorithms. While most of the book focuses on tasks and algorithms for relational data profiling, we also briefly discuss systems and techniques for profiling non-relational data such as graphs and text. We conclude with a discussion of data profiling challenges and directions for future work in this area. |
Description: | Part of: Synthesis digital library of engineering and computer science. Title from PDF title page (viewed on November 28, 2018). |
Physical description: | 1 online resource (xviii, 136 pages), illustrations |
ISBN: | 9781681734477 |
DOI: | 10.2200/S00878ED1V01Y201810DTM052 |
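The summary above distinguishes simple per-column statistics from multi-column metadata such as candidate keys and functional dependencies. The following is a minimal sketch of both kinds of check, not an algorithm taken from the book: the toy relation `ROWS` and the helper names `column_profile`, `is_unique`, and `holds_fd` are hypothetical and chosen only for illustration.

```python
# Toy relation (hypothetical data): each row is a dict of column -> value.
ROWS = [
    {"id": 1, "zip": "10115", "city": "Berlin", "phone": None},
    {"id": 2, "zip": "10115", "city": "Berlin", "phone": "030-1"},
    {"id": 3, "zip": "14482", "city": "Potsdam", "phone": ""},
    {"id": 4, "zip": "14482", "city": "Potsdam", "phone": "0331-2"},
]


def column_profile(rows, column):
    """Simple metadata for one column: row count, distinct values, null/empty count."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None and v != ""]
    return {
        "rows": len(values),
        "distinct": len(set(present)),
        "null_or_empty": len(values) - len(present),
    }


def is_unique(rows, columns):
    """True if no two rows share the same values in `columns`.

    Uniqueness is the basic test behind candidate-key discovery; this
    sketch does not check minimality of the column combination.
    """
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def holds_fd(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds on `rows`,
    i.e. rows that agree on all `lhs` columns also agree on `rhs`."""
    mapping = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if mapping.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True
```

On this toy relation, `column_profile(ROWS, "phone")` counts two missing values, `is_unique(ROWS, ["id"])` succeeds while `is_unique(ROWS, ["zip"])` fails, and `holds_fd(ROWS, ["zip"], "city")` holds. Checking every column combination this way grows exponentially with the number of columns, which is why dedicated profiling algorithms matter.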
Internal format
MARC
LEADER | 00000nmm a2200000zcb4500 | ||
---|---|---|---|
001 | BV046427634 | ||
003 | DE-604 | ||
005 | 20211124 | ||
007 | cr|uuu---uuuuu | ||
008 | 200217s2019 |||| o||u| ||||||eng d | ||
020 | |a 9781681734477 |c ebook |9 978-1-68173-447-7 | ||
024 | 7 | |a 10.2200/S00878ED1V01Y201810DTM052 |2 doi | |
035 | |a (ZDB-105-MCS)8540360 | ||
035 | |a (OCoLC)1141156976 | ||
035 | |a (DE-599)BVBBV046427634 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-83 | ||
082 | 0 | |a 025.3 |2 23 | |
084 | |a ST 520 |0 (DE-625)143678: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Abedjan, Ziawasch |e Verfasser |0 (DE-588)1081174676 |4 aut | |
245 | 1 | 0 | |a Data profiling |c Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock |
264 | 1 | |a [San Rafael, California] |b Morgan & Claypool Publishers |c [2019] | |
264 | 4 | |c © 2019 | |
300 | |a 1 Online-Resource (xviii, 136 Seiten) |b Illustrationen | ||
336 | |b txt |2 rdacontent | ||
337 | |b c |2 rdamedia | ||
338 | |b cr |2 rdacarrier | ||
490 | 1 | |a Synthesis lectures on data management |v #52 | |
500 | |a Part of: Synthesis digital library of engineering and computer science | ||
500 | |a Title from PDF title page (viewed on November 28, 2018) | ||
520 | |a Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More complex types of metadata are statements about multiple columns and their correlation, such as candidate keys, functional dependencies, and other types of dependencies. This book provides a classification of the various types of profilable metadata, discusses popular data profiling tasks, and surveys state-of-the-art profiling algorithms. While most of the book focuses on tasks and algorithms for relational data profiling, we also briefly discuss systems and techniques for profiling non-relational data such as graphs and text. We conclude with a discussion of data profiling challenges and directions for future work in this area | ||
650 | 4 | |a Metadata | |
650 | 4 | |a Data mining | |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Metadaten |0 (DE-588)4410512-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data-Profiling |0 (DE-588)7670125-6 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data-Profiling |0 (DE-588)7670125-6 |D s |
689 | 0 | 1 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | 2 | |a Metadaten |0 (DE-588)4410512-5 |D s |
689 | 0 | |8 1\p |5 DE-604 | |
700 | 1 | |a Golab, Lukasz |e Verfasser |0 (DE-588)1207414689 |4 aut | |
700 | 1 | |a Naumann, Felix |d 1971- |e Verfasser |0 (DE-588)129576379 |4 aut | |
700 | 1 | |a Papenbrock, Thorsten |e Verfasser |0 (DE-588)1153740621 |4 aut | |
776 | 0 | 8 | |i Erscheint auch als |n Druck-Ausgabe, paperback |z 978-1-68173-446-0 |
776 | 0 | 8 | |i Erscheint auch als |n Druck-Ausgabe, hardcover |z 978-1-68173-448-4 |
830 | 0 | |a Synthesis lectures on data management |v #52 |w (DE-604)BV036731811 |9 52 | |
856 | 4 | 0 | |u https://doi.org/10.2200/S00878ED1V01Y201810DTM052 |x Verlag |z URL des Erstveröffentlichers |3 Volltext |
912 | |a ZDB-105-MCS |a ZDB-105-MCDM | ||
999 | |a oai:aleph.bib-bvb.de:BVB01-031839937 | ||
883 | 1 | |8 1\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk |
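In the table above, each data field lists its subfields as a `|` followed by a one-character code and the value. The sketch below shows one way such a display string could be split into code/value pairs. It assumes this page's display convention only (it is not a parser for binary MARC or MARCXML), and the function name `parse_subfields` is hypothetical. Control fields such as 007 have no subfields and are out of scope.

```python
import re


def parse_subfields(field_text):
    """Split a display string like '|a value |c value' into (code, value) pairs.

    Assumes values themselves contain no '|' character.
    """
    pairs = []
    # A subfield starts with '|', one code character, and a space;
    # its value runs until the next '|' or the end of the string.
    for match in re.finditer(r"\|(\w) ([^|]*)", field_text):
        pairs.append((match.group(1), match.group(2).strip()))
    return pairs


# Field 020 from the record above.
isbn_field = parse_subfields("|a 9781681734477 |c ebook |9 978-1-68173-447-7")
```

Here `isbn_field` holds the three pairs for codes `a`, `c`, and `9`, in the order they appear in the field.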
Record in the search index
_version_ | 1804180976795910144 |
---|---|
any_adam_object | |
author | Abedjan, Ziawasch Golab, Lukasz Naumann, Felix 1971- Papenbrock, Thorsten |
author_GND | (DE-588)1081174676 (DE-588)1207414689 (DE-588)129576379 (DE-588)1153740621 |
author_facet | Abedjan, Ziawasch Golab, Lukasz Naumann, Felix 1971- Papenbrock, Thorsten |
author_role | aut aut aut aut |
author_sort | Abedjan, Ziawasch |
author_variant | z a za l g lg f n fn t p tp |
building | Verbundindex |
bvnumber | BV046427634 |
classification_rvk | ST 520 ST 530 |
collection | ZDB-105-MCS ZDB-105-MCDM |
ctrlnum | (ZDB-105-MCS)8540360 (OCoLC)1141156976 (DE-599)BVBBV046427634 |
dewey-full | 025.3 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 025 - Operations of libraries and archives |
dewey-raw | 025.3 |
dewey-search | 025.3 |
dewey-sort | 225.3 |
dewey-tens | 020 - Library and information sciences |
discipline | Allgemeines Informatik |
doi_str_mv | 10.2200/S00878ED1V01Y201810DTM052 |
format | Electronic eBook |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>03763nmm a2200589zcb4500</leader><controlfield tag="001">BV046427634</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20211124 </controlfield><controlfield tag="007">cr|uuu---uuuuu</controlfield><controlfield tag="008">200217s2019 |||| o||u| ||||||eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781681734477</subfield><subfield code="c">ebook</subfield><subfield code="9">978-1-68173-447-7</subfield></datafield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.2200/S00878ED1V01Y201810DTM052</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ZDB-105-MCS)8540360</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1141156976</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV046427634</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-83</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">025.3</subfield><subfield code="2">23</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 520</subfield><subfield code="0">(DE-625)143678:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Abedjan, Ziawasch</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1081174676</subfield><subfield 
code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data profiling</subfield><subfield code="c">Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">[San Rafael, California]</subfield><subfield code="b">Morgan & Claypool Publishers</subfield><subfield code="c">[2019]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2019</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 Online-Resource (xviii, 136 Seiten)</subfield><subfield code="b">Illustrationen</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Synthesis lectures on data management</subfield><subfield code="v">#52</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Part of: Synthesis digital library of engineering and computer science</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Title from PDF title page (viewed on November 28, 2018)</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. 
Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More complex types of metadata are statements about multiple columns and their correlation, such as candidate keys, functional dependencies, and other types of dependencies. This book provides a classification of the various types of profilable metadata, discusses popular data profiling tasks, and surveys state-of-the-art profiling algorithms. While most of the book focuses on tasks and algorithms for relational data profiling, we also briefly discuss systems and techniques for profiling non-relational data such as graphs and text. We conclude with a discussion of data profiling challenges and directions for future work in this area</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Metadata</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Metadaten</subfield><subfield code="0">(DE-588)4410512-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data-Profiling</subfield><subfield code="0">(DE-588)7670125-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data-Profiling</subfield><subfield code="0">(DE-588)7670125-6</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Metadaten</subfield><subfield code="0">(DE-588)4410512-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="8">1\p</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Golab, Lukasz</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1207414689</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Naumann, Felix</subfield><subfield code="d">1971-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)129576379</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Papenbrock, Thorsten</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1153740621</subfield><subfield code="4">aut</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Druck-Ausgabe, paperback</subfield><subfield code="z">978-1-68173-446-0</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Druck-Ausgabe, hardcover</subfield><subfield code="z">978-1-68173-448-4</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Synthesis lectures on data management</subfield><subfield code="v">#52</subfield><subfield code="w">(DE-604)BV036731811</subfield><subfield code="9">52</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.2200/S00878ED1V01Y201810DTM052</subfield><subfield code="x">Verlag</subfield><subfield code="z">URL des 
Erstveröffentlichers</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-105-MCS</subfield><subfield code="a">ZDB-105-MCDM</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-031839937</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield></record></collection> |
id | DE-604.BV046427634 |
illustrated | Not Illustrated |
indexdate | 2024-07-10T08:44:19Z |
institution | BVB |
isbn | 9781681734477 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-031839937 |
oclc_num | 1141156976 |
open_access_boolean | |
owner | DE-83 |
owner_facet | DE-83 |
physical | 1 Online-Resource (xviii, 136 Seiten) Illustrationen |
psigel | ZDB-105-MCS ZDB-105-MCDM |
publishDate | 2019 |
publishDateSearch | 2019 |
publishDateSort | 2019 |
publisher | Morgan & Claypool Publishers |
record_format | marc |
series | Synthesis lectures on data management |
series2 | Synthesis lectures on data management |
spelling | Abedjan, Ziawasch Verfasser (DE-588)1081174676 aut Data profiling Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock [San Rafael, California] Morgan & Claypool Publishers [2019] © 2019 1 Online-Resource (xviii, 136 Seiten) Illustrationen txt rdacontent c rdamedia cr rdacarrier Synthesis lectures on data management #52 Part of: Synthesis digital library of engineering and computer science Title from PDF title page (viewed on November 28, 2018) Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More complex types of metadata are statements about multiple columns and their correlation, such as candidate keys, functional dependencies, and other types of dependencies. This book provides a classification of the various types of profilable metadata, discusses popular data profiling tasks, and surveys state-of-the-art profiling algorithms. While most of the book focuses on tasks and algorithms for relational data profiling, we also briefly discuss systems and techniques for profiling non-relational data such as graphs and text. 
We conclude with a discussion of data profiling challenges and directions for future work in this area Metadata Data mining Data Mining (DE-588)4428654-5 gnd rswk-swf Metadaten (DE-588)4410512-5 gnd rswk-swf Data-Profiling (DE-588)7670125-6 gnd rswk-swf Data-Profiling (DE-588)7670125-6 s Data Mining (DE-588)4428654-5 s Metadaten (DE-588)4410512-5 s 1\p DE-604 Golab, Lukasz Verfasser (DE-588)1207414689 aut Naumann, Felix 1971- Verfasser (DE-588)129576379 aut Papenbrock, Thorsten Verfasser (DE-588)1153740621 aut Erscheint auch als Druck-Ausgabe, paperback 978-1-68173-446-0 Erscheint auch als Druck-Ausgabe, hardcover 978-1-68173-448-4 Synthesis lectures on data management #52 (DE-604)BV036731811 52 https://doi.org/10.2200/S00878ED1V01Y201810DTM052 Verlag URL des Erstveröffentlichers Volltext 1\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk |
spellingShingle | Abedjan, Ziawasch Golab, Lukasz Naumann, Felix 1971- Papenbrock, Thorsten Data profiling Synthesis lectures on data management Metadata Data mining Data Mining (DE-588)4428654-5 gnd Metadaten (DE-588)4410512-5 gnd Data-Profiling (DE-588)7670125-6 gnd |
subject_GND | (DE-588)4428654-5 (DE-588)4410512-5 (DE-588)7670125-6 |
title | Data profiling |
title_auth | Data profiling |
title_exact_search | Data profiling |
title_full | Data profiling Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock |
title_fullStr | Data profiling Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock |
title_full_unstemmed | Data profiling Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock |
title_short | Data profiling |
title_sort | data profiling |
topic | Metadata Data mining Data Mining (DE-588)4428654-5 gnd Metadaten (DE-588)4410512-5 gnd Data-Profiling (DE-588)7670125-6 gnd |
topic_facet | Metadata Data mining Data Mining Metadaten Data-Profiling |
url | https://doi.org/10.2200/S00878ED1V01Y201810DTM052 |
volume_link | (DE-604)BV036731811 |
work_keys_str_mv | AT abedjanziawasch dataprofiling AT golablukasz dataprofiling AT naumannfelix dataprofiling AT papenbrockthorsten dataprofiling |