Data science on the Google Cloud Platform: implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0
Main author: | Lakshmanan, Valliappa |
---|---|
Format: | Book |
Language: | English |
Published: | Sebastopol, CA: O'Reilly Media, January 2018 |
Edition: | First edition |
Subjects: | Data Science; Google Cloud Platform |
Online access: | Table of contents |
Description: | Also covers later, unchanged reprints |
Description: | xiv, 387 pages, illustrations |
ISBN: | 9781491974568 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV044881440 | ||
003 | DE-604 | ||
005 | 20200429 | ||
007 | t | ||
008 | 180326s2018 a||| |||| 00||| eng d | ||
020 | |a 9781491974568 |c pbk. |9 978-1-491-97456-8 | ||
035 | |a (OCoLC)1023573183 | ||
035 | |a (DE-599)BVBBV044881440 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-11 |a DE-91G |a DE-898 |a DE-739 | ||
084 | |a ST 510 |0 (DE-625)143676: |2 rvk | ||
084 | |a DAT 708f |2 stub | ||
084 | |a DAT 620f |2 stub | ||
100 | 1 | |a Lakshmanan, Valliappa |e Verfasser |4 aut | |
245 | 1 | 0 | |a Data science on the Google Cloud Platform |b implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 |c Valliappa Lakshmanan |
250 | |a First edition | ||
264 | 1 | |a Sebastopol, CA |b O'Reilly Media |c January 2018 | |
264 | 4 | |c © 2018 | |
300 | |a xiv, 387 Seiten |b Illustrationen | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Hier auch später erschienene, unveränderte Nachdrucke | ||
650 | 0 | 7 | |a Data Science |0 (DE-588)1140936166 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Google Cloud Platform |0 (DE-588)1163407496 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Google Cloud Platform |0 (DE-588)1163407496 |D s |
689 | 0 | 1 | |a Data Science |0 (DE-588)1140936166 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030275648&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-030275648 |
Record in the search index
_version_ | 1804178419889471488 |
---|---|
adam_text | Table of Contents
Preface
1. Making Better Decisions Based on Data
Many Similar Decisions; The Role of Data Engineers; The Cloud Makes Data Engineers Possible; The Cloud Turbocharges Data Science; Case Studies Get at the Stubborn Facts; A Probabilistic Decision; Data and Tools; Getting Started with the Code; Summary
2. Ingesting Data into the Cloud
Airline On-Time Performance Data; Knowability; Training-Serving Skew; Download Procedure; Dataset Fields; Why Not Store the Data in Situ?; Scaling Up; Scaling Out; Data in Situ with Colossus and Jupiter; Ingesting Data; Reverse Engineering a Web Form; Dataset Download; Exploration and Cleanup; Uploading Data to Google Cloud Storage; Scheduling Monthly Downloads; Ingesting in Python; Cloud Functions; Securing the URL; Scheduling the Cloud Function; Improving the Cloud Function Design; Summary; Code Break
3. Creating Compelling Dashboards
Explain Your Model with Dashboards; Why Build a Dashboard First?; Accuracy, Honesty, and Good Design; Loading Data into Google Cloud SQL; Create a Google Cloud SQL Instance; Interacting with Google Cloud Platform; Controlling Access to MySQL; Create Tables; Populating Tables; Building Our First Model; Contingency Table; Threshold Optimization; Machine Learning; Building a Dashboard; Getting Started with Data Studio; Creating Charts; Adding End-User Controls; Showing Proportions with a Pie Chart; Explaining a Contingency Table; Summary
4. Streaming Data: Publication and Ingest
Designing the Event Feed; Time Correction; Apache Beam/Cloud Dataflow; Parsing Airports Data; Adding Time Zone Information; Converting Times to UTC; Correcting Dates; Creating Events; Running the Pipeline in the Cloud; Publishing an Event Stream to Cloud Pub/Sub; Get Records to Publish; Paging Through Records; Building a Batch of Events; Publishing a Batch of Events; Real-Time Stream Processing; Streaming in Java Dataflow; Executing the Stream Processing; Analyzing Streaming Data in BigQuery; Real-Time Dashboard; Summary
5. Interactive Data Exploration
Exploratory Data Analysis; Loading Flights Data into BigQuery; Advantages of a Serverless Columnar Database; Staging on Cloud Storage; Access Control; Federated Queries; Ingesting CSV Files; Exploratory Data Analysis in Cloud AI Platform Notebooks; Jupyter Notebooks; Cloud AI Platform Notebooks; Installing Packages in Cloud AI Platform Notebooks; Jupyter Magic for Google Cloud Platform; Quality Control; Oddball Values; Outlier Removal: Big Data Is Different; Filtering Data on Occurrence Frequency; Arrival Delay Conditioned on Departure Delay; Applying Probabilistic Decision Threshold; Empirical Probability Distribution Function; The Answer Is...; Evaluating the Model; Random Shuffling; Splitting by Date; Training and Testing; Summary
6. Bayes Classifier on Cloud Dataproc
MapReduce and the Hadoop Ecosystem; How MapReduce Works; Apache Hadoop; Google Cloud Dataproc; Need for Higher-Level Tools; Jobs, Not Clusters; Initialization Actions; Quantization Using Spark SQL; JupyterLab on Cloud Dataproc; Independence Check Using BigQuery; Spark SQL in JupyterLab; Histogram Equalization; Dynamically Resizing Clusters; Bayes Classification Using Pig; Running a Pig Job on Cloud Dataproc; Automating Cloud Dataproc with Workflow Templates; Limiting to Training Days; The Decision Criteria; Evaluating the Bayesian Model; Summary
7. Machine Learning: Logistic Regression in Spark and BigQuery
Logistic Regression; Spark ML Library; Getting Started with Spark Machine Learning; Spark Logistic Regression; Creating a Training Dataset; Dealing with Corner Cases; Creating Training Examples; Training; Predicting by Using a Model; Evaluating a Model; Feature Engineering; Experimental Framework; Creating the Held-Out Dataset; Feature Selection; Scaling and Clipping Features; Feature Transforms; Categorical Variables; Scalable Machine Learning Models in BigQuery; Repeatable, Real Time; Summary
8. Time-Windowed Aggregate Features
The Need for Time Averages; Dataflow in Java; Setting Up Development Environment; Filtering with Beam; Pipeline Options and Text I/O; Run on Cloud; Parsing into Objects; Computing Time Averages; Grouping and Combining; Parallel Do with Side Input; Debugging; BigQueryIO; Mutating the Flight Object; Sliding Window Computation in Batch Mode; Running in the Cloud; Monitoring, Troubleshooting, and Performance Tuning; Troubleshooting Pipeline; Side Input Limitations; Redesigning the Pipeline; Removing Duplicates; Summary
9. Machine Learning Classifier Using TensorFlow
Toward More Complex Models; Reading Data into TensorFlow; Training and Evaluation in Keras; Model Function; Input and Features; Training and Evaluating Input Functions; Saving and Exporting; Performing a Training Run; Training in the Cloud; Wide-and-Deep Model; Hyperparameter Tuning; Deploying the Model; Predicting with the Model; Explaining the Model; Summary
10. Real-Time Machine Learning
Invoking Prediction Service; Java Classes for Request and Response; Post Request and Parse Response; Client of Prediction Service; Adding Predictions to Flight Information; Batch Input and Output; Data Processing Pipeline; Identifying Inefficiency; Batching Requests; Streaming Pipeline; Flattening PCollections; Executing Streaming Pipeline; Late and Out-of-Order Records; Watermarks and Triggers; Transactions, Throughput, and Latency; Possible Streaming Sinks; Cloud Bigtable; Designing Tables; Designing the Row Key; Streaming into Cloud Bigtable; Querying from Cloud Bigtable; Evaluating Model Performance; The Need for Continuous Training; Evaluation Pipeline; Evaluating Performance; Marginal Distributions; Checking Model Behavior; Identifying Behavioral Change; Summary; Book Summary
A. Considerations for Sensitive Data within Machine Learning Datasets
Index
|
any_adam_object | 1 |
author | Lakshmanan, Valliappa |
author_facet | Lakshmanan, Valliappa |
author_role | aut |
author_sort | Lakshmanan, Valliappa |
author_variant | v l vl |
building | Verbundindex |
bvnumber | BV044881440 |
classification_rvk | ST 510 |
classification_tum | DAT 708f DAT 620f |
ctrlnum | (OCoLC)1023573183 (DE-599)BVBBV044881440 |
discipline | Informatik |
edition | First edition |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01678nam a2200397 c 4500</leader><controlfield tag="001">BV044881440</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20200429 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">180326s2018 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781491974568</subfield><subfield code="c">pbk.</subfield><subfield code="9">978-1-491-97456-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1023573183</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV044881440</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-11</subfield><subfield code="a">DE-91G</subfield><subfield code="a">DE-898</subfield><subfield code="a">DE-739</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 510</subfield><subfield code="0">(DE-625)143676:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 708f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 620f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Lakshmanan, Valliappa</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data science on the Google Cloud Platform</subfield><subfield code="b">implementing end-to-end real-time data pipelines: from ingest to machine 
learning ; updated for TensorFlow 2.0</subfield><subfield code="c">Valliappa Lakshmanan</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">First edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Sebastopol, CA</subfield><subfield code="b">O'Reilly Media</subfield><subfield code="c">January 2018</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2018</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xiv, 387 Seiten</subfield><subfield code="b">Illustrationen</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Hier auch später erschienene, unveränderte Nachdrucke</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Google Cloud Platform</subfield><subfield code="0">(DE-588)1163407496</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Google Cloud Platform</subfield><subfield code="0">(DE-588)1163407496</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield 
tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=030275648&amp;sequence=000001&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-030275648</subfield></datafield></record></collection> |
id | DE-604.BV044881440 |
illustrated | Illustrated |
indexdate | 2024-07-10T08:03:41Z |
institution | BVB |
isbn | 9781491974568 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-030275648 |
oclc_num | 1023573183 |
open_access_boolean | |
owner | DE-11 DE-91G DE-BY-TUM DE-898 DE-BY-UBR DE-739 |
owner_facet | DE-11 DE-91G DE-BY-TUM DE-898 DE-BY-UBR DE-739 |
physical | xiv, 387 Seiten Illustrationen |
publishDate | 2018 |
publishDateSearch | 2018 |
publishDateSort | 2018 |
publisher | O'Reilly Media |
record_format | marc |
spelling | Lakshmanan, Valliappa Verfasser aut Data science on the Google Cloud Platform implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 Valliappa Lakshmanan First edition Sebastopol, CA O'Reilly Media January 2018 © 2018 xiv, 387 Seiten Illustrationen txt rdacontent n rdamedia nc rdacarrier Hier auch später erschienene, unveränderte Nachdrucke Data Science (DE-588)1140936166 gnd rswk-swf Google Cloud Platform (DE-588)1163407496 gnd rswk-swf Google Cloud Platform (DE-588)1163407496 s Data Science (DE-588)1140936166 s DE-604 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030275648&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Lakshmanan, Valliappa Data science on the Google Cloud Platform implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 Data Science (DE-588)1140936166 gnd Google Cloud Platform (DE-588)1163407496 gnd |
subject_GND | (DE-588)1140936166 (DE-588)1163407496 |
title | Data science on the Google Cloud Platform implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 |
title_auth | Data science on the Google Cloud Platform implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 |
title_exact_search | Data science on the Google Cloud Platform implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 |
title_full | Data science on the Google Cloud Platform implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 Valliappa Lakshmanan |
title_fullStr | Data science on the Google Cloud Platform implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 Valliappa Lakshmanan |
title_full_unstemmed | Data science on the Google Cloud Platform implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 Valliappa Lakshmanan |
title_short | Data science on the Google Cloud Platform |
title_sort | data science on the google cloud platform implementing end to end real time data pipelines from ingest to machine learning updated for tensorflow 2 0 |
title_sub | implementing end-to-end real-time data pipelines: from ingest to machine learning ; updated for TensorFlow 2.0 |
topic | Data Science (DE-588)1140936166 gnd Google Cloud Platform (DE-588)1163407496 gnd |
topic_facet | Data Science Google Cloud Platform |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030275648&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT lakshmananvalliappa datascienceonthegooglecloudplatformimplementingendtoendrealtimedatapipelinesfromingesttomachinelearningupdatedfortensorflow20 |