Hadoop Blueprints: Use Hadoop to solve business problems by learning from a rich set of real-life case studies

Detailed Description

Bibliographic Details
Main authors: Shrivastava, Anurag (author); Deshpande, Tanmay (author)
Format: Electronic e-book
Language: English
Published: Birmingham, England: Packt Publishing, 2016
Edition: 1st ed.
Subjects:
Online access: BTW01 (full text)
Summary: Cover -- Copyright -- Credits -- About the Authors -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface -- Chapter 1: Hadoop and Big Data -- The beginning of the big data problem -- Limitations of RDBMS systems -- Scaling out a database on Google -- Parallel processing of large datasets -- Building open source Hadoop -- Enterprise Hadoop -- Social media and mobile channels -- Data storage cost reduction -- Enterprise software vendors -- Pure Play Hadoop vendors -- Cloud Hadoop vendors -- The design of the Hadoop system -- The Hadoop Distributed File System (HDFS) -- Data organization in HDFS -- HDFS file management commands -- NameNode and DataNodes -- Metadata store in NameNode -- Preventing a single point of failure with Hadoop HA -- Checkpointing process -- Data Store on a DataNode -- Handshakes and heartbeats -- MapReduce -- The execution model of MapReduce Version 1 -- Apache YARN -- Building a MapReduce Version 2 program -- Problem statement -- Solution workflow -- Getting the dataset -- Studying the dataset -- Cleaning the dataset -- Loading the dataset on the HDFS -- Starting with a MapReduce program -- Installing Eclipse -- Creating a project in Eclipse -- Coding and building a MapReduce program -- Run the MapReduce program locally -- Examine the result -- Run the MapReduce program on Hadoop -- Further processing of results -- Hadoop platform tools -- Data ingestion tools -- Data access tools -- Monitoring tools -- Data governance tools -- Big data use cases -- Creating a 360 degree view of a customer -- Fraud detection systems for banks -- Marketing campaign planning -- Churn detection in telecom -- Analyzing sensor data -- Building a data lake -- The architecture of Hadoop-based systems -- Lambda architecture -- Summary -- Chapter 2: A 360-Degree View of the Customer -- Capturing business information
Collecting data from data sources -- Creating a data processing approach -- Presenting the results -- Setting up the technology stack -- Tools used -- Installing Hortonworks Sandbox -- Creating user accounts -- Exploring HUE -- Exploring MySQL and the Hive command line -- Exploring Sqoop at the command line -- Test driving Hive and Sqoop -- Querying data using Hive -- Importing data in Hive using Sqoop -- Engineering the solution -- Datasets -- Loading customer master data into Hadoop -- Loading web logs into Hadoop -- Loading tweets into Hadoop -- Creating the 360-degree view -- Exporting data from Hadoop -- Presenting the view -- Building a web application -- Installing Node.js -- Coding the web application in Node.js -- Summary -- Chapter 3: Building a Fraud Detection System -- Understanding the business problem -- Selecting and cleansing the dataset -- Finding relevant fields -- Machine learning for fraud detection -- Clustering as an unsupervised machine learning method -- Designing the high-level architecture -- Introducing Apache Spark -- Apache Spark architecture -- Resilient Distributed Datasets -- Transformation functions -- Actions -- Test driving Apache Spark -- Calculating the yearly average stock prices using Spark -- Apache Spark 2.X -- Understanding MLlib -- Test driving K-means using MLlib -- Creating our fraud detection model -- Building our K-means clustering model -- Processing the data -- Putting the fraud detection model to use -- Generating a data stream -- Processing the data stream using Spark streaming -- Putting the model to use -- Scaling the solution -- Summary -- Chapter 4: Marketing Campaign Planning -- Creating the solution outline -- Supervised learning -- Tree-structure models for classification -- Finding the right dataset -- Setting up the solution architecture -- Coupon scan at POS -- Join and transform
Train the classification model -- Scoring -- Mail merge -- Building the machine learning model -- Introducing BigML -- Model building steps -- Sign up as a user on BigML site -- Upload the data file -- Creating the dataset -- Building the classification model -- Downloading the classification model -- Running the model on Hadoop -- Creating the target list -- Post campaign activities -- Summary -- Chapter 5: Churn Detection -- A business case for churn detection -- Creating the solution outline -- Building a predictive model using Hadoop -- Bayes' Theorem -- Playing with the Bayesian predictor -- Running a Node.js-based Bayesian predictor -- Understanding the predictor code -- Limitations of our solution -- Building a churn predictor using Hadoop -- Synthetic data generation tools -- Preparing a synthetic historical churn dataset -- The processing approach -- Running the MapReduce program -- Understanding the frequency counter code -- Putting the model to use -- Integrating the churn predictor -- Summary -- Chapter 6: Analyze Sensor Data Using Hadoop -- A business case for sensor data analytics -- Creating the solution outline -- Technology stack -- Kafka -- Flume -- HDFS -- Hive -- OpenTSDB -- HBase -- Grafana -- Batch data analytics -- Loading streams of sensor data from Kafka topics to HDFS -- Using Hive to perform analytics on inserted data -- Data visualization in MS Excel -- Stream data analytics -- Loading streams of sensor data -- Data visualization using Grafana -- Summary -- Chapter 7: Building a Data Lake -- Data lake building blocks -- Ingestion tier -- Storage tier -- Insights tier -- Ops facilities -- Limitations of open source Hadoop ecosystem tools -- Hadoop security -- HDFS permissions model -- Fine-grained permissions with HDFS ACLs -- Apache Ranger -- Installing Apache Ranger -- Test driving Apache Ranger
Description: 1 online resource (312 pages)
ISBN: 9781783980314

No print copy is available.
