Learning PySpark: build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0
Annotation
Main authors: | Drabas, Tomasz ; Lee, Denny |
---|---|
Format: | Electronic E-Book |
Language: | English |
Published: | Birmingham, UK : Packt Publishing, 2017. |
Subjects: | |
Online access: | Full text |
Summary: | Annotation |
Description: | Includes index. |
Description: | 1 online resource (1 volume) : illustrations, maps |
ISBN: | 9781786466259 1786466252 |
Internal format
MARC
LEADER | 00000cam a2200000 i 4500 | ||
---|---|---|---|
001 | ZDB-4-EBA-ocn976408019 | ||
003 | OCoLC | ||
005 | 20241004212047.0 | ||
006 | m o d | ||
007 | cr unu|||||||| | ||
008 | 170317s2017 enkab o 001 0 eng d | ||
040 | |a UMI |b eng |e rda |e pn |c UMI |d TEFOD |d OCLCF |d IDEBK |d STF |d TOH |d OCLCQ |d N$T |d COO |d UOK |d CEF |d KSU |d DEBBG |d UAB |d YDX |d MOF |d AU@ |d OCLCO |d OCLCQ |d OCLCO |d OCLCL |d OCLCQ | ||
019 | |a 1081417339 | ||
020 | |a 9781786466259 |q (electronic bk.) | ||
020 | |a 1786466252 |q (electronic bk.) | ||
020 | |z 9781786463708 | ||
020 | |z 1786463709 | ||
035 | |a (OCoLC)976408019 |z (OCoLC)1081417339 | ||
037 | |a CL0500000840 |b Safari Books Online | ||
037 | |a 978A042E-251E-4460-88A6-41FFF582EF91 |b OverDrive, Inc. |n http://www.overdrive.com | ||
050 | 4 | |a QA76.76.A65 | |
072 | 7 | |a COM |x 021030 |2 bisacsh | |
082 | 7 | |a 005.7 |2 23 | |
049 | |a MAIN | ||
100 | 1 | |a Drabas, Tomasz, |e author. | |
245 | 1 | 0 | |a Learning PySpark : |b build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 / |c Tomasz Drabas, Denny Lee ; foreword by Holden Karau. |
264 | 1 | |a Birmingham, UK : |b Packt Publishing, |c 2017. | |
300 | |a 1 online resource (1 volume) : |b illustrations, maps | ||
336 | |a text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
588 | |a Description based on online resource; title from title page (viewed March 17, 2017). | ||
500 | |a Includes index. | ||
505 | 0 | |a Cover -- Copyright -- Credits -- Foreword -- About the Authors -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Understanding Spark -- What is Apache Spark? -- Spark Jobs and APIs -- Execution process -- Resilient Distributed Dataset -- DataFrames -- Datasets -- Catalyst Optimizer -- Project Tungsten -- Spark 2.0 architecture -- Unifying Datasets and DataFrames -- Introducing SparkSession -- Tungsten phase 2 -- Structured streaming -- Continuous applications -- Summary -- Chapter 2: Resilient Distributed Datasets -- Internal workings of an RDD -- Creating RDDs -- Schema -- Reading from files -- Lambda expressions -- Global versus local scope -- Transformations -- The .map(...) transformation -- The .filter(...) transformation -- The .flatMap(...) transformation -- The .distinct(...) transformation -- The .sample(...) transformation -- The .leftOuterJoin(...) transformation -- The .repartition(...) transformation -- Actions -- The .take(...) method -- The .collect(...) method -- The .reduce(...) method -- The .count(...) method -- The .saveAsTextFile(...) method -- The .foreach(...) method -- Summary -- Chapter 3: DataFrames -- Python to RDD communications -- Catalyst Optimizer refresh -- Speeding up PySpark with DataFrames -- Creating DataFrames -- Generating our own JSON data -- Creating a DataFrame -- Creating a temporary table -- Simple DataFrame queries -- DataFrame API query -- SQL query -- Interoperating with RDDs -- Inferring the schema using reflection -- Programmatically specifying the schema -- Querying with the DataFrame API -- Number of rows -- Running filter statements -- Querying with SQL -- Number of rows -- Running filter statements using the where Clauses -- DataFrame scenario -- on-time flight performance -- Preparing the source datasets. | |
505 | 8 | |a Joining flight performance and airports -- Visualizing our flight-performance data -- Spark Dataset API -- Summary -- Chapter 4: Prepare Data for Modeling -- Checking for duplicates, missing observations, and outliers -- Duplicates -- Missing observations -- Outliers -- Getting familiar with your data -- Descriptive statistics -- Correlations -- Visualization -- Histograms -- Interactions between features -- Summary -- Chapter 5: Introducing MLlib -- Overview of the package -- Loading and transforming the data -- Getting to know your data -- Descriptive statistics -- Correlations -- Statistical testing -- Creating the final dataset -- Creating an RDD of LabeledPoints -- Splitting into training and testing -- Predicting infant survival -- Logistic regression in MLlib -- Selecting only the most predictable features -- Random forest in MLlib -- Summary -- Chapter 6: Introducing the ML Package -- Overview of the package -- Transformer -- Estimators -- Classification -- Regression -- Clustering -- Pipeline -- Predicting the chances of infant survival with ML -- Loading the data -- Creating transformers -- Creating an estimator -- Creating a pipeline -- Fitting the model -- Evaluating the performance of the model -- Saving the model -- Parameter hyper-tuning -- Grid search -- Train-validation splitting -- Other features of PySpark ML in action -- Feature extraction -- NLP -- related feature extractors -- Discretizing continuous variables -- Standardizing continuous variables -- Classification -- Clustering -- Finding clusters in the births dataset -- Topic mining -- Regression -- Summary -- Chapter 7: GraphFrames -- Introducing GraphFrames -- Installing GraphFrames -- Creating a library -- Preparing your flights dataset -- Building the graph -- Executing simple queries -- Determining the number of airports and trips. | |
505 | 8 | |a Determining the longest delay in this dataset -- Determining the number of delayed versus on-time/early flights -- What flights departing Seattle are most likely to have significant delays? -- What states tend to have significant delays departing from Seattle? -- Understanding vertex degrees -- Determining the top transfer airports -- Understanding motifs -- Determining airport ranking using PageRank -- Determining the most popular non-stop flights -- Using Breadth-First Search -- Visualizing flights using D3 -- Summary -- Chapter 8: TensorFrames -- What is Deep Learning? -- The need for neural networks and Deep Learning -- What is feature engineering? -- Bridging the data and algorithm -- What is TensorFlow? -- Installing Pip -- Installing TensorFlow -- Matrix multiplication using constants -- Matrix multiplication using placeholders -- Running the model -- Running another model -- Discussion -- Introducing TensorFrames -- TensorFrames -- quick start -- Configuration and setup -- Launching a Spark cluster -- Creating a TensorFrames library -- Installing TensorFlow on your cluster -- Using TensorFlow to add a constant to an existing column -- Executing the Tensor graph -- Blockwise reducing operations example -- Building a DataFrame of vectors -- Analysing the DataFrame -- Computing elementwise sum and min of all vectors -- Summary -- Chapter 9: Polyglot Persistence with Blaze -- Installing Blaze -- Polyglot persistence -- Abstracting data -- Working with NumPy arrays -- Working with pandas' DataFrame -- Working with files -- Working with databases -- Interacting with relational databases -- Interacting with the MongoDB database -- Data operations -- Accessing columns -- Symbolic transformations -- Operations on columns -- Reducing data -- Joins -- Summary -- Chapter 10: Structured Streaming -- What is Spark Streaming?. | |
505 | 8 | |a Why do we need Spark Streaming? -- What is the Spark Streaming application data flow? -- Simple streaming application using DStreams -- A quick primer on global aggregations -- Introducing Structured Streaming -- Summary -- Chapter 11: Packaging Spark Applications -- The spark-submit command -- Command line parameters -- Deploying the app programmatically -- Configuring your SparkSession -- Creating SparkSession -- Modularizing code -- Structure of the module -- Calculating the distance between two points -- Converting distance units -- Building an egg -- User defined functions in Spark -- Submitting a job -- Monitoring execution -- Databricks Jobs -- Summary -- Index. | |
520 | 8 | |a Annotation |b Build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 About This Book - Learn why and how you can efficiently use Python to process data and build machine learning models in Apache Spark 2.0 - Develop and deploy efficient, scalable real-time Spark solutions - Take your understanding of using Spark with Python to the next level with this jump start guide Who This Book Is For If you are a Python developer who wants to learn about the Apache Spark 2.0 ecosystem, this book is for you. A firm understanding of Python is expected to get the best out of the book. Familiarity with Spark would be useful, but is not mandatory. What You Will Learn - Learn about Apache Spark and the Spark 2.0 architecture - Build and interact with Spark DataFrames using Spark SQL - Learn how to solve graph and deep learning problems using GraphFrames and TensorFrames respectively - Read, transform, and understand data and use it to train machine learning models - Build machine learning models with MLlib and ML - Learn how to submit your applications programmatically using spark-submit - Deploy locally built applications to a cluster In Detail Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. This book will show you how to leverage the power of Python and put it to use in the Spark ecosystem. You will start by getting a firm understanding of the Spark 2.0 architecture and how to set up a Python environment for Spark. You will get familiar with the modules available in PySpark. You will learn how to abstract data with RDDs and DataFrames and understand the streaming capabilities of PySpark. Also, you will get a thorough overview of machine learning capabilities of PySpark using ML and MLlib, graph processing using GraphFrames, and polyglot persistence using Blaze. 
Finally, you will learn how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will have established a firm understanding of the Spark Python API and how it can be used to build data-intensive applications. Style and approach This book takes a very comprehensive, step-by-step approach so you understand how the Spark ecosystem can be used with Python to develop efficient, scalable solutions. Every chapter is standalone and written in a very easy-to-understand manner, with a focus on both the hows and the whys of each concept. | |
650 | 0 | |a Application software |x Development. |0 http://id.loc.gov/authorities/subjects/sh95009362 | |
650 | 0 | |a Python (Computer program language) |0 http://id.loc.gov/authorities/subjects/sh96008834 | |
650 | 0 | |a SPARK (Computer program language) |0 http://id.loc.gov/authorities/subjects/sh2015001170 | |
650 | 6 | |a Logiciels d'application |x Développement. | |
650 | 6 | |a Python (Langage de programmation) | |
650 | 7 | |a COMPUTERS |x Databases |x Data Mining. |2 bisacsh | |
650 | 7 | |a Application software |x Development |2 fast | |
650 | 7 | |a Python (Computer program language) |2 fast | |
650 | 7 | |a SPARK (Computer program language) |2 fast | |
700 | 1 | |a Lee, Denny, |e author. | |
700 | 1 | |a Karau, Holden, |e writer of foreword. |0 http://id.loc.gov/authorities/names/no2015027417 | |
758 | |i has work: |a Learning PySpark (Text) |1 https://id.oclc.org/worldcat/entity/E39PCFHvgRtHt4y7mDh6KMyVBX |4 https://id.oclc.org/worldcat/ontology/hasWork | ||
776 | 0 | 8 | |i Print version: |a Drabas, Tomasz. |t Learning PySpark. |d Birmingham : Packt Publishing, ©2017 |
856 | 4 | 0 | |l FWS01 |p ZDB-4-EBA |q FWS_PDA_EBA |u https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1477650 |3 Volltext |
938 | |a ProQuest MyiLibrary Digital eBook Collection |b IDEB |n cis35945158 | ||
938 | |a EBSCOhost |b EBSC |n 1477650 | ||
938 | |a YBP Library Services |b YANK |n 13522893 | ||
994 | |a 92 |b GEBAY | ||
912 | |a ZDB-4-EBA | ||
049 | |a DE-863 |
Record in the search index
DE-BY-FWS_katkey | ZDB-4-EBA-ocn976408019 |
---|---|
_version_ | 1816882382707359744 |
adam_text | |
any_adam_object | |
author | Drabas, Tomasz Lee, Denny |
author_GND | http://id.loc.gov/authorities/names/no2015027417 |
author_facet | Drabas, Tomasz Lee, Denny |
author_role | aut aut |
author_sort | Drabas, Tomasz |
author_variant | t d td d l dl |
building | Verbundindex |
bvnumber | localFWS |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.76.A65 |
callnumber-search | QA76.76.A65 |
callnumber-sort | QA 276.76 A65 |
callnumber-subject | QA - Mathematics |
collection | ZDB-4-EBA |
contents | Cover -- Copyright -- Credits -- Foreword -- About the Authors -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Understanding Spark -- What is Apache Spark? -- Spark Jobs and APIs -- Execution process -- Resilient Distributed Dataset -- DataFrames -- Datasets -- Catalyst Optimizer -- Project Tungsten -- Spark 2.0 architecture -- Unifying Datasets and DataFrames -- Introducing SparkSession -- Tungsten phase 2 -- Structured streaming -- Continuous applications -- Summary -- Chapter 2: Resilient Distributed Datasets -- Internal workings of an RDD -- Creating RDDs -- Schema -- Reading from files -- Lambda expressions -- Global versus local scope -- Transformations -- The .map(...) transformation -- The .filter(...) transformation -- The .flatMap(...) transformation -- The .distinct(...) transformation -- The .sample(...) transformation -- The .leftOuterJoin(...) transformation -- The .repartition(...) transformation -- Actions -- The .take(...) method -- The .collect(...) method -- The .reduce(...) method -- The .count(...) method -- The .saveAsTextFile(...) method -- The .foreach(...) method -- Summary -- Chapter 3: DataFrames -- Python to RDD communications -- Catalyst Optimizer refresh -- Speeding up PySpark with DataFrames -- Creating DataFrames -- Generating our own JSON data -- Creating a DataFrame -- Creating a temporary table -- Simple DataFrame queries -- DataFrame API query -- SQL query -- Interoperating with RDDs -- Inferring the schema using reflection -- Programmatically specifying the schema -- Querying with the DataFrame API -- Number of rows -- Running filter statements -- Querying with SQL -- Number of rows -- Running filter statements using the where Clauses -- DataFrame scenario -- on-time flight performance -- Preparing the source datasets. 
Joining flight performance and airports -- Visualizing our flight-performance data -- Spark Dataset API -- Summary -- Chapter 4: Prepare Data for Modeling -- Checking for duplicates, missing observations, and outliers -- Duplicates -- Missing observations -- Outliers -- Getting familiar with your data -- Descriptive statistics -- Correlations -- Visualization -- Histograms -- Interactions between features -- Summary -- Chapter 5: Introducing MLlib -- Overview of the package -- Loading and transforming the data -- Getting to know your data -- Descriptive statistics -- Correlations -- Statistical testing -- Creating the final dataset -- Creating an RDD of LabeledPoints -- Splitting into training and testing -- Predicting infant survival -- Logistic regression in MLlib -- Selecting only the most predictable features -- Random forest in MLlib -- Summary -- Chapter 6: Introducing the ML Package -- Overview of the package -- Transformer -- Estimators -- Classification -- Regression -- Clustering -- Pipeline -- Predicting the chances of infant survival with ML -- Loading the data -- Creating transformers -- Creating an estimator -- Creating a pipeline -- Fitting the model -- Evaluating the performance of the model -- Saving the model -- Parameter hyper-tuning -- Grid search -- Train-validation splitting -- Other features of PySpark ML in action -- Feature extraction -- NLP -- related feature extractors -- Discretizing continuous variables -- Standardizing continuous variables -- Classification -- Clustering -- Finding clusters in the births dataset -- Topic mining -- Regression -- Summary -- Chapter 7: GraphFrames -- Introducing GraphFrames -- Installing GraphFrames -- Creating a library -- Preparing your flights dataset -- Building the graph -- Executing simple queries -- Determining the number of airports and trips. 
Determining the longest delay in this dataset -- Determining the number of delayed versus on-time/early flights -- What flights departing Seattle are most likely to have significant delays? -- What states tend to have significant delays departing from Seattle? -- Understanding vertex degrees -- Determining the top transfer airports -- Understanding motifs -- Determining airport ranking using PageRank -- Determining the most popular non-stop flights -- Using Breadth-First Search -- Visualizing flights using D3 -- Summary -- Chapter 8: TensorFrames -- What is Deep Learning? -- The need for neural networks and Deep Learning -- What is feature engineering? -- Bridging the data and algorithm -- What is TensorFlow? -- Installing Pip -- Installing TensorFlow -- Matrix multiplication using constants -- Matrix multiplication using placeholders -- Running the model -- Running another model -- Discussion -- Introducing TensorFrames -- TensorFrames -- quick start -- Configuration and setup -- Launching a Spark cluster -- Creating a TensorFrames library -- Installing TensorFlow on your cluster -- Using TensorFlow to add a constant to an existing column -- Executing the Tensor graph -- Blockwise reducing operations example -- Building a DataFrame of vectors -- Analysing the DataFrame -- Computing elementwise sum and min of all vectors -- Summary -- Chapter 9: Polyglot Persistence with Blaze -- Installing Blaze -- Polyglot persistence -- Abstracting data -- Working with NumPy arrays -- Working with pandas' DataFrame -- Working with files -- Working with databases -- Interacting with relational databases -- Interacting with the MongoDB database -- Data operations -- Accessing columns -- Symbolic transformations -- Operations on columns -- Reducing data -- Joins -- Summary -- Chapter 10: Structured Streaming -- What is Spark Streaming?. Why do we need Spark Streaming? -- What is the Spark Streaming application data flow? 
-- Simple streaming application using DStreams -- A quick primer on global aggregations -- Introducing Structured Streaming -- Summary -- Chapter 11: Packaging Spark Applications -- The spark-submit command -- Command line parameters -- Deploying the app programmatically -- Configuring your SparkSession -- Creating SparkSession -- Modularizing code -- Structure of the module -- Calculating the distance between two points -- Converting distance units -- Building an egg -- User defined functions in Spark -- Submitting a job -- Monitoring execution -- Databricks Jobs -- Summary -- Index. |
ctrlnum | (OCoLC)976408019 |
dewey-full | 005.7 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security |
dewey-raw | 005.7 |
dewey-search | 005.7 |
dewey-sort | 15.7 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
format | Electronic eBook |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>11643cam a2200637 i 4500</leader><controlfield tag="001">ZDB-4-EBA-ocn976408019</controlfield><controlfield tag="003">OCoLC</controlfield><controlfield tag="005">20241004212047.0</controlfield><controlfield tag="006">m o d </controlfield><controlfield tag="007">cr unu||||||||</controlfield><controlfield tag="008">170317s2017 enkab o 001 0 eng d</controlfield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">UMI</subfield><subfield code="b">eng</subfield><subfield code="e">rda</subfield><subfield code="e">pn</subfield><subfield code="c">UMI</subfield><subfield code="d">TEFOD</subfield><subfield code="d">OCLCF</subfield><subfield code="d">IDEBK</subfield><subfield code="d">STF</subfield><subfield code="d">TOH</subfield><subfield code="d">OCLCQ</subfield><subfield code="d">N$T</subfield><subfield code="d">COO</subfield><subfield code="d">UOK</subfield><subfield code="d">CEF</subfield><subfield code="d">KSU</subfield><subfield code="d">DEBBG</subfield><subfield code="d">UAB</subfield><subfield code="d">YDX</subfield><subfield code="d">MOF</subfield><subfield code="d">AU@</subfield><subfield code="d">OCLCO</subfield><subfield code="d">OCLCQ</subfield><subfield code="d">OCLCO</subfield><subfield code="d">OCLCL</subfield><subfield code="d">OCLCQ</subfield></datafield><datafield tag="019" ind1=" " ind2=" "><subfield code="a">1081417339</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781786466259</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1786466252</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="z">9781786463708</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="z">1786463709</subfield></datafield><datafield tag="035" ind1=" " ind2=" 
"><subfield code="a">(OCoLC)976408019</subfield><subfield code="z">(OCoLC)1081417339</subfield></datafield><datafield tag="037" ind1=" " ind2=" "><subfield code="a">CL0500000840</subfield><subfield code="b">Safari Books Online</subfield></datafield><datafield tag="037" ind1=" " ind2=" "><subfield code="a">978A042E-251E-4460-88A6-41FFF582EF91</subfield><subfield code="b">OverDrive, Inc.</subfield><subfield code="n">http://www.overdrive.com</subfield></datafield><datafield tag="050" ind1=" " ind2="4"><subfield code="a">QA76.76.A65</subfield></datafield><datafield tag="072" ind1=" " ind2="7"><subfield code="a">COM</subfield><subfield code="x">021030</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="082" ind1="7" ind2=" "><subfield code="a">005.7</subfield><subfield code="2">23</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">MAIN</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Drabas, Tomasz,</subfield><subfield code="e">author.</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Learning PySpark :</subfield><subfield code="b">build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 /</subfield><subfield code="c">Tomasz Drabas, Denny Lee ; foreword by Holden Karau.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Birmingham, UK :</subfield><subfield code="b">Packt Publishing,</subfield><subfield code="c">2017.</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 online resource (1 volume) :</subfield><subfield code="b">illustrations, maps</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">computer</subfield><subfield code="b">c</subfield><subfield 
code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">online resource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="588" ind1=" " ind2=" "><subfield code="a">Description based on online resource; title from title page (viewed March 17, 2017).</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes index.</subfield></datafield><datafield tag="505" ind1="0" ind2=" "><subfield code="a">Cover -- Copyright -- Credits -- Foreword -- About the Authors -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Understanding Spark -- What is Apache Spark? -- Spark Jobs and APIs -- Execution process -- Resilient Distributed Dataset -- DataFrames -- Datasets -- Catalyst Optimizer -- Project Tungsten -- Spark 2.0 architecture -- Unifying Datasets and DataFrames -- Introducing SparkSession -- Tungsten phase 2 -- Structured streaming -- Continuous applications -- Summary -- Chapter 2: Resilient Distributed Datasets -- Internal workings of an RDD -- Creating RDDs -- Schema -- Reading from files -- Lambda expressions -- Global versus local scope -- Transformations -- The .map(...) transformation -- The .filter(...) transformation -- The .flatMap(...) transformation -- The .distinct(...) transformation -- The .sample(...) transformation -- The .leftOuterJoin(...) transformation -- The .repartition(...) transformation -- Actions -- The .take(...) method -- The .collect(...) method -- The .reduce(...) method -- The .count(...) method -- The .saveAsTextFile(...) method -- The .foreach(...) 
method -- Summary -- Chapter 3: DataFrames -- Python to RDD communications -- Catalyst Optimizer refresh -- Speeding up PySpark with DataFrames -- Creating DataFrames -- Generating our own JSON data -- Creating a DataFrame -- Creating a temporary table -- Simple DataFrame queries -- DataFrame API query -- SQL query -- Interoperating with RDDs -- Inferring the schema using reflection -- Programmatically specifying the schema -- Querying with the DataFrame API -- Number of rows -- Running filter statements -- Querying with SQL -- Number of rows -- Running filter statements using the where Clauses -- DataFrame scenario -- on-time flight performance -- Preparing the source datasets.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Joining flight performance and airports -- Visualizing our flight-performance data -- Spark Dataset API -- Summary -- Chapter 4: Prepare Data for Modeling -- Checking for duplicates, missing observations, and outliers -- Duplicates -- Missing observations -- Outliers -- Getting familiar with your data -- Descriptive statistics -- Correlations -- Visualization -- Histograms -- Interactions between features -- Summary -- Chapter 5: Introducing MLlib -- Overview of the package -- Loading and transforming the data -- Getting to know your data -- Descriptive statistics -- Correlations -- Statistical testing -- Creating the final dataset -- Creating an RDD of LabeledPoints -- Splitting into training and testing -- Predicting infant survival -- Logistic regression in MLlib -- Selecting only the most predictable features -- Random forest in MLlib -- Summary -- Chapter 6: Introducing the ML Package -- Overview of the package -- Transformer -- Estimators -- Classification -- Regression -- Clustering -- Pipeline -- Predicting the chances of infant survival with ML -- Loading the data -- Creating transformers -- Creating an estimator -- Creating a pipeline -- Fitting the model -- Evaluating the performance of the model -- 
Saving the model -- Parameter hyper-tuning -- Grid search -- Train-validation splitting -- Other features of PySpark ML in action -- Feature extraction -- NLP -- related feature extractors -- Discretizing continuous variables -- Standardizing continuous variables -- Classification -- Clustering -- Finding clusters in the births dataset -- Topic mining -- Regression -- Summary -- Chapter 7: GraphFrames -- Introducing GraphFrames -- Installing GraphFrames -- Creating a library -- Preparing your flights dataset -- Building the graph -- Executing simple queries -- Determining the number of airports and trips.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Determining the longest delay in this dataset -- Determining the number of delayed versus on-time/early flights -- What flights departing Seattle are most likely to have significant delays? -- What states tend to have significant delays departing from Seattle? -- Understanding vertex degrees -- Determining the top transfer airports -- Understanding motifs -- Determining airport ranking using PageRank -- Determining the most popular non-stop flights -- Using Breadth-First Search -- Visualizing flights using D3 -- Summary -- Chapter 8: TensorFrames -- What is Deep Learning? -- The need for neural networks and Deep Learning -- What is feature engineering? -- Bridging the data and algorithm -- What is TensorFlow? 
-- Installing Pip -- Installing TensorFlow -- Matrix multiplication using constants -- Matrix multiplication using placeholders -- Running the model -- Running another model -- Discussion -- Introducing TensorFrames -- TensorFrames -- quick start -- Configuration and setup -- Launching a Spark cluster -- Creating a TensorFrames library -- Installing TensorFlow on your cluster -- Using TensorFlow to add a constant to an existing column -- Executing the Tensor graph -- Blockwise reducing operations example -- Building a DataFrame of vectors -- Analysing the DataFrame -- Computing elementwise sum and min of all vectors -- Summary -- Chapter 9: Polyglot Persistence with Blaze -- Installing Blaze -- Polyglot persistence -- Abstracting data -- Working with NumPy arrays -- Working with pandas' DataFrame -- Working with files -- Working with databases -- Interacting with relational databases -- Interacting with the MongoDB database -- Data operations -- Accessing columns -- Symbolic transformations -- Operations on columns -- Reducing data -- Joins -- Summary -- Chapter 10: Structured Streaming -- What is Spark Streaming?.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Why do we need Spark Streaming? -- What is the Spark Streaming application data flow? 
-- Simple streaming application using DStreams -- A quick primer on global aggregations -- Introducing Structured Streaming -- Summary -- Chapter 11: Packaging Spark Applications -- The spark-submit command -- Command line parameters -- Deploying the app programmatically -- Configuring your SparkSession -- Creating SparkSession -- Modularizing code -- Structure of the module -- Calculating the distance between two points -- Converting distance units -- Building an egg -- User defined functions in Spark -- Submitting a job -- Monitoring execution -- Databricks Jobs -- Summary -- Index.</subfield></datafield><datafield tag="520" ind1="8" ind2=" "><subfield code="a">Annotation</subfield><subfield code="b">Build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 About This Book - Learn why and how you can efficiently use Python to process data and build machine learning models in Apache Spark 2.0 - Develop and deploy efficient, scalable real-time Spark solutions - Take your understanding of using Spark with Python to the next level with this jump start guide Who This Book Is For If you are a Python developer who wants to learn about the Apache Spark 2.0 ecosystem, this book is for you. A firm understanding of Python is expected to get the best out of the book. Familiarity with Spark would be useful, but is not mandatory. 
What You Will Learn - Learn about Apache Spark and the Spark 2.0 architecture - Build and interact with Spark DataFrames using Spark SQL - Learn how to solve graph and deep learning problems using GraphFrames and TensorFrames respectively - Read, transform, and understand data and use it to train machine learning models - Build machine learning models with MLlib and ML - Learn how to submit your applications programmatically using spark-submit - Deploy locally built applications to a cluster In Detail Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. This book will show you how to leverage the power of Python and put it to use in the Spark ecosystem. You will start by getting a firm understanding of the Spark 2.0 architecture and how to set up a Python environment for Spark. You will get familiar with the modules available in PySpark. You will learn how to abstract data with RDDs and DataFrames and understand the streaming capabilities of PySpark. Also, you will get a thorough overview of machine learning capabilities of PySpark using ML and MLlib, graph processing using GraphFrames, and polyglot persistence using Blaze. Finally, you will learn how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will have established a firm understanding of the Spark Python API and how it can be used to build data-intensive applications. Style and approach This book takes a very comprehensive, step-by-step approach so you understand how the Spark ecosystem can be used with Python to develop efficient, scalable solutions. 
Every chapter is standalone and written in a very easy-to-understand manner, with a focus on both the hows and the whys of each concept.</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Application software</subfield><subfield code="x">Development.</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh95009362</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Python (Computer program language)</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh96008834</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">SPARK (Computer program language)</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh2015001170</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Logiciels d'application</subfield><subfield code="x">Développement.</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Python (Langage de programmation)</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">COMPUTERS</subfield><subfield code="x">Databases</subfield><subfield code="x">Data Mining.</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Application software</subfield><subfield code="x">Development</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Python (Computer program language)</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">SPARK (Computer program language)</subfield><subfield code="2">fast</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Lee, Denny,</subfield><subfield code="e">author.</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Karau, Holden,</subfield><subfield code="e">writer of foreword.</subfield><subfield 
code="0">http://id.loc.gov/authorities/names/no2015027417</subfield></datafield><datafield tag="758" ind1=" " ind2=" "><subfield code="i">has work:</subfield><subfield code="a">Learning PySpark (Text)</subfield><subfield code="1">https://id.oclc.org/worldcat/entity/E39PCFHvgRtHt4y7mDh6KMyVBX</subfield><subfield code="4">https://id.oclc.org/worldcat/ontology/hasWork</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Print version:</subfield><subfield code="a">Drabas, Tomasz.</subfield><subfield code="t">Learning PySpark.</subfield><subfield code="d">Birmingham : Packt Publishing, ©2017</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="l">FWS01</subfield><subfield code="p">ZDB-4-EBA</subfield><subfield code="q">FWS_PDA_EBA</subfield><subfield code="u">https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1477650</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">ProQuest MyiLibrary Digital eBook Collection</subfield><subfield code="b">IDEB</subfield><subfield code="n">cis35945158</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">EBSCOhost</subfield><subfield code="b">EBSC</subfield><subfield code="n">1477650</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">YBP Library Services</subfield><subfield code="b">YANK</subfield><subfield code="n">13522893</subfield></datafield><datafield tag="994" ind1=" " ind2=" "><subfield code="a">92</subfield><subfield code="b">GEBAY</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-4-EBA</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-863</subfield></datafield></record></collection> |
id | ZDB-4-EBA-ocn976408019 |
illustrated | Illustrated |
indexdate | 2024-11-27T13:27:44Z |
institution | BVB |
isbn | 9781786466259 1786466252 |
language | English |
oclc_num | 976408019 |
open_access_boolean | |
owner | MAIN DE-863 DE-BY-FWS |
owner_facet | MAIN DE-863 DE-BY-FWS |
physical | 1 online resource (1 volume) : illustrations, maps |
psigel | ZDB-4-EBA |
publishDate | 2017 |
publishDateSearch | 2017 |
publishDateSort | 2017 |
publisher | Packt Publishing, |
record_format | marc |
subject_GND | http://id.loc.gov/authorities/subjects/sh95009362 http://id.loc.gov/authorities/subjects/sh96008834 http://id.loc.gov/authorities/subjects/sh2015001170 |
title | Learning PySpark : build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 / |
title_full | Learning PySpark : build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 / Tomasz Drabas, Denny Lee ; foreword by Holden Karau. |
title_short | Learning PySpark : |
title_sort | learning pyspark build data intensive applications locally and deploy at scale using the combined powers of python and spark 2 0 |
title_sub | build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 / |
topic | Application software Development. http://id.loc.gov/authorities/subjects/sh95009362 Python (Computer program language) http://id.loc.gov/authorities/subjects/sh96008834 SPARK (Computer program language) http://id.loc.gov/authorities/subjects/sh2015001170 Logiciels d'application Développement. Python (Langage de programmation) COMPUTERS Databases Data Mining. bisacsh Application software Development fast Python (Computer program language) fast SPARK (Computer program language) fast |
topic_facet | Application software Development. Python (Computer program language) SPARK (Computer program language) Logiciels d'application Développement. Python (Langage de programmation) COMPUTERS Databases Data Mining. Application software Development |
url | https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1477650 |