Hadoop Blueprints: Use Hadoop to solve business problems by learning from a rich set of real-life case studies

Detailed Description

Bibliographic Details
Main authors: Shrivastava, Anurag (author); Deshpande, Tanmay (author)
Format: Electronic e-book
Language: English
Published: Birmingham, England: Packt Publishing, 2016
Edition: 1st ed.
Subjects:
Online access: BTW01 (full text)
Summary: Cover -- Copyright -- Credits -- About the Authors -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface -- Chapter 1: Hadoop and Big Data -- The beginning of the big data problem -- Limitations of RDBMS systems -- Scaling out a database on Google -- Parallel processing of large datasets -- Building open source Hadoop -- Enterprise Hadoop -- Social media and mobile channels -- Data storage cost reduction -- Enterprise software vendors -- Pure Play Hadoop vendors -- Cloud Hadoop vendors -- The design of the Hadoop system -- The Hadoop Distributed File System (HDFS) -- Data organization in HDFS -- HDFS file management commands -- NameNode and DataNodes -- Metadata store in NameNode -- Preventing a single point of failure with Hadoop HA -- Checkpointing process -- Data Store on a DataNode -- Handshakes and heartbeats -- MapReduce -- The execution model of MapReduce Version 1 -- Apache YARN -- Building a MapReduce Version 2 program -- Problem statement -- Solution workflow -- Getting the dataset -- Studying the dataset -- Cleaning the dataset -- Loading the dataset on the HDFS -- Starting with a MapReduce program -- Installing Eclipse -- Creating a project in Eclipse -- Coding and building a MapReduce program -- Run the MapReduce program locally -- Examine the result -- Run the MapReduce program on Hadoop -- Further processing of results -- Hadoop platform tools -- Data ingestion tools -- Data access tools -- Monitoring tools -- Data governance tools -- Big data use cases -- Creating a 360 degree view of a customer -- Fraud detection systems for banks -- Marketing campaign planning -- Churn detection in telecom -- Analyzing sensor data -- Building a data lake -- The architecture of Hadoop-based systems -- Lambda architecture -- Summary -- Chapter 2: A 360-Degree View of the Customer -- Capturing business information
Collecting data from data sources -- Creating a data processing approach -- Presenting the results -- Setting up the technology stack -- Tools used -- Installing Hortonworks Sandbox -- Creating user accounts -- Exploring HUE -- Exploring MySQL and the Hive command line -- Exploring Sqoop at the command line -- Test driving Hive and Sqoop -- Querying data using Hive -- Importing data in Hive using Sqoop -- Engineering the solution -- Datasets -- Loading customer master data into Hadoop -- Loading web logs into Hadoop -- Loading tweets into Hadoop -- Creating the 360-degree view -- Exporting data from Hadoop -- Presenting the view -- Building a web application -- Installing Node.js -- Coding the web application in Node.js -- Summary -- Chapter 3: Building a Fraud Detection System -- Understanding the business problem -- Selecting and cleansing the dataset -- Finding relevant fields -- Machine learning for fraud detection -- Clustering as an unsupervised machine learning method -- Designing the high-level architecture -- Introducing Apache Spark -- Apache Spark architecture -- Resilient Distributed Datasets -- Transformation functions -- Actions -- Test driving Apache Spark -- Calculating the yearly average stock prices using Spark -- Apache Spark 2.X -- Understanding MLlib -- Test driving K-means using MLlib -- Creating our fraud detection model -- Building our K-means clustering model -- Processing the data -- Putting the fraud detection model to use -- Generating a data stream -- Processing the data stream using Spark streaming -- Putting the model to use -- Scaling the solution -- Summary -- Chapter 4: Marketing Campaign Planning -- Creating the solution outline -- Supervised learning -- Tree-structure models for classification -- Finding the right dataset -- Setting up the solution architecture -- Coupon scan at POS -- Join and transform
Train the classification model -- Scoring -- Mail merge -- Building the machine learning model -- Introducing BigML -- Model building steps -- Sign up as a user on BigML site -- Upload the data file -- Creating the dataset -- Building the classification model -- Downloading the classification model -- Running the model on Hadoop -- Creating the target list -- Post campaign activities -- Summary -- Chapter 5: Churn Detection -- A business case for churn detection -- Creating the solution outline -- Building a predictive model using Hadoop -- Bayes' Theorem -- Playing with the Bayesian predictor -- Running a Node.js-based Bayesian predictor -- Understanding the predictor code -- Limitations of our solution -- Building a churn predictor using Hadoop -- Synthetic data generation tools -- Preparing a synthetic historical churn dataset -- The processing approach -- Running the MapReduce program -- Understanding the frequency counter code -- Putting the model to use -- Integrating the churn predictor -- Summary -- Chapter 6: Analyze Sensor Data Using Hadoop -- A business case for sensor data analytics -- Creating the solution outline -- Technology stack -- Kafka -- Flume -- HDFS -- Hive -- OpenTSDB -- HBase -- Grafana -- Batch data analytics -- Loading streams of sensor data from Kafka topics to HDFS -- Using Hive to perform analytics on inserted data -- Data visualization in MS Excel -- Stream data analytics -- Loading streams of sensor data -- Data visualization using Grafana -- Summary -- Chapter 7: Building a Data Lake -- Data lake building blocks -- Ingestion tier -- Storage tier -- Insights tier -- Ops facilities -- Limitations of open source Hadoop ecosystem tools -- Hadoop security -- HDFS permissions model -- Fine-grained permissions with HDFS ACLs -- Apache Ranger -- Installing Apache Ranger -- Test driving Apache Ranger
Description: 1 online resource (312 pages)
ISBN: 9781783980314

No print copy is available.
