Hadoop blueprints: use Hadoop to solve business problems by learning from a rich set of real-life case studies
Main authors: Shrivastava, Anurag; Deshpande, Tanmay
Format: Electronic eBook
Language: English
Published: Birmingham, England : Packt Publishing, 2016
Edition: 1st ed
Subjects: Hadoop
Online access: BTW01 Full text
Summary: Cover -- Copyright -- Credits -- About the Authors -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface -- Chapter 1: Hadoop and Big Data -- The beginning of the big data problem -- Limitations of RDBMS systems -- Scaling out a database on Google -- Parallel processing of large datasets -- Building open source Hadoop -- Enterprise Hadoop -- Social media and mobile channels -- Data storage cost reduction -- Enterprise software vendors -- Pure Play Hadoop vendors -- Cloud Hadoop vendors -- The design of the Hadoop system -- The Hadoop Distributed File System (HDFS) -- Data organization in HDFS -- HDFS file management commands -- NameNode and DataNodes -- Metadata store in NameNode -- Preventing a single point of failure with Hadoop HA -- Checkpointing process -- Data Store on a DataNode -- Handshakes and heartbeats -- MapReduce -- The execution model of MapReduce Version 1 -- Apache YARN -- Building a MapReduce Version 2 program -- Problem statement -- Solution workflow -- Getting the dataset -- Studying the dataset -- Cleaning the dataset -- Loading the dataset on the HDFS -- Starting with a MapReduce program -- Installing Eclipse -- Creating a project in Eclipse -- Coding and building a MapReduce program -- Run the MapReduce program locally -- Examine the result -- Run the MapReduce program on Hadoop -- Further processing of results -- Hadoop platform tools -- Data ingestion tools -- Data access tools -- Monitoring tools -- Data governance tools -- Big data use cases -- Creating a 360 degree view of a customer -- Fraud detection systems for banks -- Marketing campaign planning -- Churn detection in telecom -- Analyzing sensor data -- Building a data lake -- The architecture of Hadoop-based systems -- Lambda architecture -- Summary -- Chapter 2: A 360-Degree View of the Customer -- Capturing business information -- Collecting data from data sources -- Creating a data processing approach -- Presenting the results -- Setting up the technology stack -- Tools used -- Installing Hortonworks Sandbox -- Creating user accounts -- Exploring HUE -- Exploring MYSQL and the HIVE command line -- Exploring Sqoop at the command line -- Test driving Hive and Sqoop -- Querying data using Hive -- Importing data in Hive using Sqoop -- Engineering the solution -- Datasets -- Loading customer master data into Hadoop -- Loading web logs into Hadoop -- Loading tweets into Hadoop -- Creating the 360-degree view -- Exporting data from Hadoop -- Presenting the view -- Building a web application -- Installing Node.js -- Coding the web application in Node.js -- Summary -- Chapter 3: Building a Fraud Detection System -- Understanding the business problem -- Selecting and cleansing the dataset -- Finding relevant fields -- Machine learning for fraud detection -- Clustering as an unsupervised machine learning method -- Designing the high-level architecture -- Introducing Apache Spark -- Apache Spark architecture -- Resilient Distributed Datasets -- Transformation functions -- Actions -- Test driving Apache Spark -- Calculating the yearly average stock prices using Spark -- Apache Spark 2.X -- Understanding MLlib -- Test driving K-means using MLlib -- Creating our fraud detection model -- Building our K-means clustering model -- Processing the data -- Putting the fraud detection model to use -- Generating a data stream -- Processing the data stream using Spark streaming -- Putting the model to use -- Scaling the solution -- Summary -- Chapter 4: Marketing Campaign Planning -- Creating the solution outline -- Supervised learning -- Tree-structure models for classification -- Finding the right dataset -- Setting up the solution architecture -- Coupon scan at POS -- Join and transform -- Train the classification model -- Scoring -- Mail merge -- Building the machine learning model -- Introducing BigML -- Model building steps -- Sign up as a user on BigML site -- Upload the data file -- Creating the dataset -- Building the classification model -- Downloading the classification model -- Running the Model on Hadoop -- Creating the target List -- Post campaign activities -- Summary -- Chapter 5: Churn Detection -- A business case for churn detection -- Creating the solution outline -- Building a predictive model using Hadoop -- Bayes' Theorem -- Playing with the Bayesian predictor -- Running a Node.js-based Bayesian predictor -- Understanding the predictor code -- Limitations of our solution -- Building a churn predictor using Hadoop -- Synthetic data generation tools -- Preparing a synthetic historical churn dataset -- The processing approach -- Running the MapReduce program -- Understanding the frequency counter code -- Putting the model to use -- Integrating the churn predictor -- Summary -- Chapter 6: Analyze Sensor Data Using Hadoop -- A business case for sensor data analytics -- Creating the solution outline -- Technology stack -- Kafka -- Flume -- HDFS -- Hive -- Open TSDB -- HBase -- Grafana -- Batch data analytics -- Loading streams of sensor data from Kafka topics to HDFS -- Using Hive to perform analytics on inserted data -- Data visualization in MS Excel -- Stream data analytics -- Loading streams of sensor data -- Data visualization using Grafana -- Summary -- Chapter 7: Building a Data Lake -- Data lake building blocks -- Ingestion tier -- Storage tier -- Insights tier -- Ops facilities -- Limitation of open source Hadoop ecosystem tools -- Hadoop security -- HDFS permissions model -- Fine-grained permissions with HDFS ACLs -- Apache Ranger -- Installing Apache Ranger -- Test driving Apache Ranger
Description: 1 online resource (312 pages)
ISBN: 9781783980314
Internal format
MARC
LEADER  00000nmm a22000001c 4500
001     BV049477400
003     DE-604
005     00000000000000.0
007     cr|uuu---uuuuu
008     231221s2016 |||| o||u| ||||||eng d
020 __  |a 9781783980314 |c : electronic bk. |9 978-1-78398-031-4
035 __  |a (OCoLC)1416409381
035 __  |a (DE-599)GBV881028223
040 __  |a DE-604 |b ger |e rda
041 0_  |a eng
049 __  |a DE-526
082 0_  |a 005.7585
084 __  |a ST 253 |0 (DE-625)143628: |2 rvk
100 1_  |a Shrivastava, Anurag |e Verfasser |4 aut
245 10  |a Hadoop blueprints |b use Hadoop to solve business problems by learning from a rich set of real-life case studies |c Anurag Shrivastava, Tanmay Deshpande.
250 __  |a 1st ed
264 _1  |a Birmingham, England |b Packt Publishing |c 2016
300 __  |a 1 online resource (312 pages)
336 __  |b txt |2 rdacontent
337 __  |b c |2 rdamedia
338 __  |b cr |2 rdacarrier
520 3_  |a Cover -- Copyright -- Credits -- About the Authors -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface -- Chapter 1: Hadoop and Big Data -- The beginning of the big data problem -- Limitations of RDBMS systems -- Scaling out a database on Google -- Parallel processing of large datasets -- Building open source Hadoop -- Enterprise Hadoop -- Social media and mobile channels -- Data storage cost reduction -- Enterprise software vendors -- Pure Play Hadoop vendors -- Cloud Hadoop vendors -- The design of the Hadoop system -- The Hadoop Distributed File System (HDFS) -- Data organization in HDFS -- HDFS file management commands -- NameNode and DataNodes -- Metadata store in NameNode -- Preventing a single point of failure with Hadoop HA -- Checkpointing process -- Data Store on a DataNode -- Handshakes and heartbeats -- MapReduce -- The execution model of MapReduce Version 1 -- Apache YARN -- Building a MapReduce Version 2 program -- Problem statement -- Solution workflow -- Getting the dataset -- Studying the dataset -- Cleaning the dataset -- Loading the dataset on the HDFS -- Starting with a MapReduce program -- Installing Eclipse -- Creating a project in Eclipse -- Coding and building a MapReduce program -- Run the MapReduce program locally -- Examine the result -- Run the MapReduce program on Hadoop -- Further processing of results -- Hadoop platform tools -- Data ingestion tools -- Data access tools -- Monitoring tools -- Data governance tools -- Big data use cases -- Creating a 360 degree view of a customer -- Fraud detection systems for banks -- Marketing campaign planning -- Churn detection in telecom -- Analyzing sensor data -- Building a data lake -- The architecture of Hadoop-based systems -- Lambda architecture -- Summary -- Chapter 2: A 360-Degree View of the Customer -- Capturing business information
520 3_  |a Collecting data from data sources -- Creating a data processing approach -- Presenting the results -- Setting up the technology stack -- Tools used -- Installing Hortonworks Sandbox -- Creating user accounts -- Exploring HUE -- Exploring MYSQL and the HIVE command line -- Exploring Sqoop at the command line -- Test driving Hive and Sqoop -- Querying data using Hive -- Importing data in Hive using Sqoop -- Engineering the solution -- Datasets -- Loading customer master data into Hadoop -- Loading web logs into Hadoop -- Loading tweets into Hadoop -- Creating the 360-degree view -- Exporting data from Hadoop -- Presenting the view -- Building a web application -- Installing Node.js -- Coding the web application in Node.js -- Summary -- Chapter 3: Building a Fraud Detection System -- Understanding the business problem -- Selecting and cleansing the dataset -- Finding relevant fields -- Machine learning for fraud detection -- Clustering as an unsupervised machine learning method -- Designing the high-level architecture -- Introducing Apache Spark -- Apache Spark architecture -- Resilient Distributed Datasets -- Transformation functions -- Actions -- Test driving Apache Spark -- Calculating the yearly average stock prices using Spark -- Apache Spark 2.X -- Understanding MLlib -- Test driving K-means using MLlib -- Creating our fraud detection model -- Building our K-means clustering model -- Processing the data -- Putting the fraud detection model to use -- Generating a data stream -- Processing the data stream using Spark streaming -- Putting the model to use -- Scaling the solution -- Summary -- Chapter 4: Marketing Campaign Planning -- Creating the solution outline -- Supervised learning -- Tree-structure models for classification -- Finding the right dataset -- Setting up the solution architecture -- Coupon scan at POS -- Join and transform
520 3_  |a Train the classification model -- Scoring -- Mail merge -- Building the machine learning model -- Introducing BigML -- Model building steps -- Sign up as a user on BigML site -- Upload the data file -- Creating the dataset -- Building the classification model -- Downloading the classification model -- Running the Model on Hadoop -- Creating the target List -- Post campaign activities -- Summary -- Chapter 5: Churn Detection -- A business case for churn detection -- Creating the solution outline -- Building a predictive model using Hadoop -- Bayes' Theorem -- Playing with the Bayesian predictor -- Running a Node.js-based Bayesian predictor -- Understanding the predictor code -- Limitations of our solution -- Building a churn predictor using Hadoop -- Synthetic data generation tools -- Preparing a synthetic historical churn dataset -- The processing approach -- Running the MapReduce program -- Understanding the frequency counter code -- Putting the model to use -- Integrating the churn predictor -- Summary -- Chapter 6: Analyze Sensor Data Using Hadoop -- A business case for sensor data analytics -- Creating the solution outline -- Technology stack -- Kafka -- Flume -- HDFS -- Hive -- Open TSDB -- HBase -- Grafana -- Batch data analytics -- Loading streams of sensor data from Kafka topics to HDFS -- Using Hive to perform analytics on inserted data -- Data visualization in MS Excel -- Stream data analytics -- Loading streams of sensor data -- Data visualization using Grafana -- Summary -- Chapter 7: Building a Data Lake -- Data lake building blocks -- Ingestion tier -- Storage tier -- Insights tier -- Ops facilities -- Limitation of open source Hadoop ecosystem tools -- Hadoop security -- HDFS permissions model -- Fine-grained permissions with HDFS ACLs -- Apache Ranger -- Installing Apache Ranger -- Test driving Apache Ranger
650 07  |a Hadoop |0 (DE-588)1022420135 |2 gnd |9 rswk-swf
653 _0  |a Electronic data processing--Distributed processing
653 _0  |a Apache Hadoop
653 _0  |a Electronic data processing / Distributed processing
653 _0  |a Big data
653 _0  |a Electronic books
689 00  |a Hadoop |0 (DE-588)1022420135 |D s
689 0_  |5 DE-604
700 1_  |a Deshpande, Tanmay |e Verfasser |4 aut
856 40  |m X:EBC |u https://ebookcentral.proquest.com/lib/kxp/detail.action?docID=4709437 |x Verlag |3 Volltext
912 __  |a ZDB-30-PQE
999 __  |a oai:aleph.bib-bvb.de:BVB01-034822901
966 e_  |u https://ebookcentral.proquest.com/lib/th-wildau/detail.action?docID=4709437 |l BTW01 |p ZDB-30-PQE |q BTW_PDA_PQE_KAUF |x Aggregator |3 Volltext