Building the data lakehouse
Main authors: | Inmon, William H.; Levins, Mary; Srivastava, Ranjeet |
---|---|
Format: | Book |
Language: | English |
Published: | Basking Ridge, NJ: Technics Publications, 2021 |
Edition: | First printing |
Subjects: | Data Science; Data-Warehouse-Konzept; Maschinelles Lernen |
Online access: | Table of contents |
Description: | iv, 246 pages, illustrations, diagrams |
ISBN: | 9781634629669 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV047658694 | ||
003 | DE-604 | ||
005 | 20220307 | ||
007 | t | ||
008 | 220104s2021 a||| |||| 00||| eng d | ||
020 | |a 9781634629669 |9 978-1-63462-966-9 | ||
035 | |a (OCoLC)1302313902 | ||
035 | |a (DE-599)KXP178054250X | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-739 | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Inmon, William H. |d 1945- |e Verfasser |0 (DE-588)113317662 |4 aut | |
245 | 1 | 0 | |a Building the data lakehouse |c Bill Inmon, Mary Levins, Ranjeet Srivastava |
250 | |a First printing | ||
264 | 1 | |a Basking Ridge, NJ |b Technics Publications |c 2021 | |
300 | |a iv, 246 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Data Science |0 (DE-588)1140936166 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data-Warehouse-Konzept |0 (DE-588)4406462-7 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data-Warehouse-Konzept |0 (DE-588)4406462-7 |D s |
689 | 0 | 1 | |a Data Science |0 (DE-588)1140936166 |D s |
689 | 0 | 2 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Levins, Mary |d ca. 20./21. Jh. |e Verfasser |0 (DE-588)1253028931 |4 aut | |
700 | 1 | |a Srivastava, Ranjeet |d ca. 20./21. Jh. |e Verfasser |0 (DE-588)1253029954 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033043563&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-033043563 |
Record in the search index
_version_ | 1804183128284069888 |
---|---|
adam_text | Contents
Introduction 1
Chapter 1: Evolution to the Data Lakehouse 5
  The evolution of technology 5
  All the data in the organization 12
  Where is business value? 18
  The data lake 19
  Current data architecture challenges 22
  Emergence of the data lakehouse 23
  Comparing data warehouse and data lake with data lakehouse 28
Chapter 2: Data Scientists and End Users 31
  The data lake 31
  The analytical infrastructure 32
  Different audiences 32
  The tools of analysis 33
  What is being analyzed? 34
  The analytical approaches 35
  Types of data 36
Chapter 3: Different Types of Data in the Data Lakehouse 39
  Types of data 40
  Different volumes of data 45
  Relating data across the diverse types of data 45
  Segmenting data based on probability of access 46
  Relating data in the IoT and the analog environment 47
  The analytical infrastructure 50
Chapter 4: The Open Environment 53
  The evolution of open systems 54
  Innovation today 55
  The unstructured portion builds on open standards 56
  Open source lakehouse software 57
  Lakehouse provides open APIs beyond SQL 60
  Lakehouse enables open data sharing 61
  Lakehouse supports open data exploration 63
  Lakehouse simplifies discovery with open data catalogs 65
  Lakehouse leverages cloud architecture 66
  An evolution to the open data lakehouse 69
Chapter 5: Machine Learning and the Data Lakehouse 71
  Machine learning 71
  What machine learning needs from a lakehouse 72
  New value from data 73
  Resolving the dilemma 73
  The problem of unstructured data 75
  The importance of open source 77
  Taking advantage of cloud elasticity 78
  Designing "MLOps" for a data platform 80
  Example: Learning to classify chest x-rays 81
  An evolution of the unstructured component 85
Chapter 6: The Analytical Infrastructure for the Data Lakehouse 87
  Metadata 89
  The data model 90
  Data quality 92
  ETL 93
  Textual ETL 94
  Taxonomies 95
  Volume of data 96
  Lineage of data 97
  KPIs 98
  Granularity 99
  Transactions 100
  Keys 101
  Schedule of processing 102
  Summarization 102
  Minimum requirements 104
Chapter 7: Blending Data in the Data Lakehouse 105
  The lakehouse and the data lakehouse 105
  The origins of data 106
  Different types of analysis 107
  Common identifiers 109
  Structured identifiers 110
  Repetitive data 111
  Identifiers from the textual environment 112
  Combining text and structured data 114
  The importance of matching 121
Chapter 8: Types of Analysis Across the Data Lakehouse Architecture 125
  Known queries 125
  Heuristic analysis 128
Chapter 9: Data Lakehouse Housekeeping™ 135
  Data integration and interoperability 138
  Master references for the data lakehouse 142
  Data lakehouse privacy, confidentiality, and data protection 145
  "Data future-proofing™" in a data lakehouse 148
  Five phases of "Data Future-proofing" 154
  Data lakehouse routine maintenance 165
Chapter 10: Visualization 167
  Turning data into information 169
  What is data visualization and why is it important? 172
  Data visualization, data analysis, and data interpretation 174
  Advantage of data visualization 177
Chapter 11: Data Lineage in the Data Lakehouse Architecture 191
  The chain of calculations 192
  Selection of data 194
  Algorithmic differences 194
  Lineage for the textual environment 196
  Lineage for the other unstructured environment 197
  Data lineage 198
Chapter 12: Probability of Access in the Data Lakehouse Architecture 201
  Efficient arrangement of data 202
  Probability of access 203
  Different types of data in the data lakehouse 204
  Relative data volume differences 205
  Advantages of segmentation 206
  Using bulk storage 207
  Incidental indexes 207
Chapter 13: Crossing the Chasm 209
  Merging data 209
  Different kinds of data 210
  Different business needs 211
  Crossing the chasm 211
Chapter 14: Managing Volumes of Data in the Data Lakehouse 219
  Distribution of the volumes of data 220
  High performance/bulk storage of data 221
  Incidental indexes and summarization 222
  Periodic filtering 224
  Tokenization of data 225
  Separating text and databases 225
  Archival storage 226
  Monitoring activity 227
  Parallel processing 227
Chapter 15: Data Governance and the Lakehouse 229
  Purpose of data governance 229
  Data lifecycle management 232
  Data quality management 234
  Importance of metadata management 236
  Data governance over time 237
  Types of governance 238
  Data governance across the lakehouse 239
  Data governance considerations 241
Index 243
|
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Inmon, William H. 1945- Levins, Mary ca. 20./21. Jh Srivastava, Ranjeet ca. 20./21. Jh |
author_GND | (DE-588)113317662 (DE-588)1253028931 (DE-588)1253029954 |
author_facet | Inmon, William H. 1945- Levins, Mary ca. 20./21. Jh Srivastava, Ranjeet ca. 20./21. Jh |
author_role | aut aut aut |
author_sort | Inmon, William H. 1945- |
author_variant | w h i wh whi m l ml r s rs |
building | Verbundindex |
bvnumber | BV047658694 |
classification_rvk | ST 530 |
ctrlnum | (OCoLC)1302313902 (DE-599)KXP178054250X |
discipline | Informatik |
discipline_str_mv | Informatik |
edition | First printing |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01720nam a2200397 c 4500</leader><controlfield tag="001">BV047658694</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220307 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">220104s2021 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781634629669</subfield><subfield code="9">978-1-63462-966-9</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1302313902</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)KXP178054250X</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Inmon, William H.</subfield><subfield code="d">1945-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)113317662</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Building the data lakehouse</subfield><subfield code="c">Bill Inmon, Mary Levins, Ranjeet Srivastava</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">First printing</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Basking Ridge, NJ</subfield><subfield code="b">Technics Publications</subfield><subfield code="c">2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield 
code="a">iv, 246 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data-Warehouse-Konzept</subfield><subfield code="0">(DE-588)4406462-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data-Warehouse-Konzept</subfield><subfield code="0">(DE-588)4406462-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Levins, Mary</subfield><subfield code="d">ca. 20./21. 
Jh.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1253028931</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Srivastava, Ranjeet</subfield><subfield code="d">ca. 20./21. Jh.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1253029954</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=033043563&amp;sequence=000001&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-033043563</subfield></datafield></record></collection> |
id | DE-604.BV047658694 |
illustrated | Illustrated |
index_date | 2024-07-03T18:51:53Z |
indexdate | 2024-07-10T09:18:31Z |
institution | BVB |
isbn | 9781634629669 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-033043563 |
oclc_num | 1302313902 |
open_access_boolean | |
owner | DE-739 |
owner_facet | DE-739 |
physical | iv, 246 Seiten Illustrationen, Diagramme |
publishDate | 2021 |
publishDateSearch | 2021 |
publishDateSort | 2021 |
publisher | Technics Publications |
record_format | marc |
spelling | Inmon, William H. 1945- Verfasser (DE-588)113317662 aut Building the data lakehouse Bill Inmon, Mary Levins, Ranjeet Srivastava First printing Basking Ridge, NJ Technics Publications 2021 iv, 246 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Data Science (DE-588)1140936166 gnd rswk-swf Data-Warehouse-Konzept (DE-588)4406462-7 gnd rswk-swf Maschinelles Lernen (DE-588)4193754-5 gnd rswk-swf Data-Warehouse-Konzept (DE-588)4406462-7 s Data Science (DE-588)1140936166 s Maschinelles Lernen (DE-588)4193754-5 s DE-604 Levins, Mary ca. 20./21. Jh. Verfasser (DE-588)1253028931 aut Srivastava, Ranjeet ca. 20./21. Jh. Verfasser (DE-588)1253029954 aut Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033043563&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Inmon, William H. 1945- Levins, Mary ca. 20./21. Jh Srivastava, Ranjeet ca. 20./21. Jh Building the data lakehouse Data Science (DE-588)1140936166 gnd Data-Warehouse-Konzept (DE-588)4406462-7 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
subject_GND | (DE-588)1140936166 (DE-588)4406462-7 (DE-588)4193754-5 |
title | Building the data lakehouse |
title_auth | Building the data lakehouse |
title_exact_search | Building the data lakehouse |
title_exact_search_txtP | Building the data lakehouse |
title_full | Building the data lakehouse Bill Inmon, Mary Levins, Ranjeet Srivastava |
title_fullStr | Building the data lakehouse Bill Inmon, Mary Levins, Ranjeet Srivastava |
title_full_unstemmed | Building the data lakehouse Bill Inmon, Mary Levins, Ranjeet Srivastava |
title_short | Building the data lakehouse |
title_sort | building the data lakehouse |
topic | Data Science (DE-588)1140936166 gnd Data-Warehouse-Konzept (DE-588)4406462-7 gnd Maschinelles Lernen (DE-588)4193754-5 gnd |
topic_facet | Data Science Data-Warehouse-Konzept Maschinelles Lernen |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033043563&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT inmonwilliamh buildingthedatalakehouse AT levinsmary buildingthedatalakehouse AT srivastavaranjeet buildingthedatalakehouse |