Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs


Bibliographic Details
Main Author:	Lai, Rudy (Author)
Format:	Electronic eBook
Language:	English
Published:	Birmingham: Packt Publishing Limited, 2019
Edition:	1
Subjects:	COMPUTERS / Intelligence (AI) & Semantics; COMPUTERS / Data Modeling & Design
Summary:	Use PySpark to easily crush messy data at scale and discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs.
Key Features:
- Work with large amounts of agile data using distributed datasets and in-memory caching
- Source data from popular data hosting platforms and formats, such as HDFS, Hive, JSON, and S3
- Employ the easy-to-use PySpark API to deploy big data analytics for production
Book Description:
Apache Spark is a mature, open source parallel-processing framework. One of its many uses is running data analytics applications across clustered computers. In this book, you will not only learn how to use Spark and its Python API to build high-performance analytics on big data, but also discover techniques for testing, immunizing, and parallelizing Spark jobs. You will learn how to source data from popular data hosting platforms and formats, including HDFS, Hive, JSON, and S3, and work with large datasets in PySpark to gain practical big data experience. This book will help you build prototypes on your local machine and then go on to handle messy data in production and at scale. It covers installing and setting up PySpark, RDD operations, big data cleaning and wrangling, and aggregating and summarizing data into useful reports. You will also learn practical, proven techniques for improving how you program and administer Apache Spark. By the end of the book, you will be able to build big data analytical solutions using the various PySpark offerings and optimize them effectively.
What you will learn:
- Get practical big data experience while working on messy datasets
- Analyze patterns with Spark SQL to improve your business intelligence
- Use PySpark's interactive shell to speed up development
- Create highly concurrent Spark programs by leveraging immutability
- Discover ways to avoid one of the most expensive operations in the Spark API: the shuffle
- Redesign your jobs to use reduceByKey instead of groupBy
- Create robust processing pipelines by testing Apache Spark jobs
Who this book is for:
This book is for developers, data scientists, business analysts, or anyone who needs to reliably analyze large amounts of real-world data. Whether you're tasked with building your company's business intelligence function, creating great data platforms for your machine learning models, or using code to magnify the impact of your business, this book is for you.
Physical Description:	1 online resource (182 pages)
ISBN:	9781838648831

Staff View

MARC


LEADER	00000nmm a2200000zc 4500
001	BV047069978
003	DE-604
005	00000000000000.0
007	cr|uuu---uuuuu
008	201218s2019 |||| o||u| ||||||eng d
020			|a 9781838648831 |9 978-1-83864-883-1
035			|a (ZDB-5-WPSE)9781838648831182
035			|a (OCoLC)1227478095
035			|a (DE-599)BVBBV047069978
040			|a DE-604 |b ger |e rda
041	0		|a eng
100	1		|a Lai, Rudy |e Verfasser |4 aut
245	1	0	|a Hands-On Big Data Analytics with PySpark |b Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs |c Lai, Rudy
250			|a 1
264		1	|a Birmingham |b Packt Publishing Limited |c 2019
300			|a 1 Online-Ressource (182 Seiten)
336			|b txt |2 rdacontent
337			|b c |2 rdamedia
338			|b cr |2 rdacarrier
520			|a Use PySpark to easily crush messy data at-scale and discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs. Key Features: Work with large amounts of agile data using distributed datasets and in-memory caching; Source data from all popular data hosting platforms, such as HDFS, Hive, JSON, and S3; Employ the easy-to-use PySpark API to deploy big data Analytics for production. Book Description: Apache Spark is an open source parallel-processing framework that has been around for quite some time now. One of the many uses of Apache Spark is for data analytics applications across clustered computers. In this book, you will not only learn how to use Spark and the Python API to create high-performance analytics with big data, but also discover techniques for testing, immunizing, and parallelizing Spark jobs.
520			|a You will learn how to source data from all popular data hosting platforms, including HDFS, Hive, JSON, and S3, and deal with large datasets with PySpark to gain practical big data experience. This book will help you work on prototypes on local machines and subsequently go on to handle messy data in production and at scale. This book covers installing and setting up PySpark, RDD operations, big data cleaning and wrangling, and aggregating and summarizing data into useful reports. You will also learn how to implement some practical and proven techniques to improve certain aspects of programming and administration in Apache Spark. By the end of the book, you will be able to build big data analytical solutions using the various PySpark offerings and also optimize them effectively.
520			|a What you will learn: Get practical big data experience while working on messy datasets; Analyze patterns with Spark SQL to improve your business intelligence; Use PySpark's interactive shell to speed up development time; Create highly concurrent Spark programs by leveraging immutability; Discover ways to avoid the most expensive operation in the Spark API: the shuffle operation; Re-design your jobs to use reduceByKey instead of groupBy; Create robust processing pipelines by testing Apache Spark jobs. Who this book is for: This book is for developers, data scientists, business analysts, or anyone who needs to reliably analyze large amounts of large-scale, real-world data. Whether you're tasked with creating your company's business intelligence function or creating great data platforms for your machine learning models, or are looking to use code to magnify the impact of your business, this book is for you.
650		4	|a COMPUTERS / Intelligence (AI) & Semantics
650		4	|a COMPUTERS / Data Modeling & Design
700	1		|a Potaczek, Bartlomiej |e Sonstige |4 oth
912			|a ZDB-5-WPSE
999			|a oai:aleph.bib-bvb.de:BVB01-032477004

Record in the Search Index

_version_	1804182072361746432
adam_txt
any_adam_object
any_adam_object_boolean
author	Lai, Rudy
author_facet	Lai, Rudy
author_role	aut
author_sort	Lai, Rudy
author_variant	r l rl
building	Verbundindex
bvnumber	BV047069978
collection	ZDB-5-WPSE
ctrlnum	(ZDB-5-WPSE)9781838648831182 (OCoLC)1227478095 (DE-599)BVBBV047069978
edition	1
format	Electronic eBook
id	DE-604.BV047069978
illustrated	Not Illustrated
index_date	2024-07-03T16:13:34Z
indexdate	2024-07-10T09:01:44Z
institution	BVB
isbn	9781838648831
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-032477004
oclc_num	1227478095
open_access_boolean
physical	1 Online-Ressource (182 Seiten)
psigel	ZDB-5-WPSE
publishDate	2019
publishDateSearch	2019
publishDateSort	2019
publisher	Packt Publishing Limited
record_format	marc
spellingShingle	Lai, Rudy Hands-On Big Data Analytics with PySpark Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs COMPUTERS / Intelligence (AI) & Semantics COMPUTERS / Data Modeling & Design
title	Hands-On Big Data Analytics with PySpark Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
title_auth	Hands-On Big Data Analytics with PySpark Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
title_exact_search	Hands-On Big Data Analytics with PySpark Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
title_exact_search_txtP	Hands-On Big Data Analytics with PySpark Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
title_full	Hands-On Big Data Analytics with PySpark Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs Lai, Rudy
title_fullStr	Hands-On Big Data Analytics with PySpark Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs Lai, Rudy
title_full_unstemmed	Hands-On Big Data Analytics with PySpark Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs Lai, Rudy
title_short	Hands-On Big Data Analytics with PySpark
title_sort	hands on big data analytics with pyspark analyze large datasets and discover techniques for testing immunizing and parallelizing spark jobs
title_sub	Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
topic	COMPUTERS / Intelligence (AI) & Semantics COMPUTERS / Data Modeling & Design
topic_facet	COMPUTERS / Intelligence (AI) & Semantics COMPUTERS / Data Modeling & Design
work_keys_str_mv	AT lairudy handsonbigdataanalyticswithpysparkanalyzelargedatasetsanddiscovertechniquesfortestingimmunizingandparallelizingsparkjobs AT potaczekbartlomiej handsonbigdataanalyticswithpysparkanalyzelargedatasetsanddiscovertechniquesfortestingimmunizingandparallelizingsparkjobs

Holdings

There is no print copy available.

Interlibrary loan (caution: not in the THWS collection)
