Mastering Spark for data science
"Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products."
Main author: | Morgan, Andrew |
---|---|
Other authors: | Amend, Antoine; George, David; Hallett, Matthew |
Format: | Electronic E-Book |
Language: | English |
Published: | Birmingham, UK : Packt Publishing Ltd., 2017. |
Subjects: | |
Online access: | Full text |
Summary: | "Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products." |
Description: | Includes index. |
Physical description: | 1 online resource |
ISBN: | 1785888285 9781785888281 |
Internal format
MARC
LEADER | 00000cam a2200000 i 4500 | ||
---|---|---|---|
001 | ZDB-4-EBA-ocn981985497 | ||
003 | OCoLC | ||
005 | 20240705115654.0 | ||
006 | m o d | ||
007 | cr |n||||||||| | ||
008 | 170407s2017 enk o 001 0 eng d | ||
040 | |a IDEBK |b eng |e pn |c IDEBK |d YDX |d MERUC |d N$T |d EBLCP |d OCLCF |d COO |d IDEBK |d OCLCQ |d OCLCO |d OCLCQ |d OCLCO |d LVT |d UKAHL |d OCLCQ |d OCLCO |d OCLCQ |d OCLCO |d OCLCL |d OCLCQ | ||
019 | |a 981591538 |a 981844508 |a 982010852 | ||
020 | |a 1785888285 |q (electronic bk.) | ||
020 | |a 9781785888281 |q (electronic bk.) | ||
020 | |z 1785882147 | ||
035 | |a (OCoLC)981985497 |z (OCoLC)981591538 |z (OCoLC)981844508 |z (OCoLC)982010852 | ||
037 | |a 1003903 |b MIL | ||
050 | 4 | |a QA76.9.D343 | |
072 | 7 | |a COM |x 021030 |2 bisacsh | |
082 | 7 | |a 005.75/85 |2 23 | |
049 | |a MAIN | ||
100 | 1 | |a Morgan, Andrew. | |
245 | 1 | 0 | |a Mastering Spark for data science / |c Andrew Morgan, Antoine Amend, Matthew Hallett, David George ; foreword by Harry Powell. |
260 | |a Birmingham, UK : |b Packt Publishing Ltd., |c 2017. | ||
300 | |a 1 online resource | ||
336 | |a text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
500 | |a Includes index. | ||
520 | |a "Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products." | ||
588 | 0 | |a Print version record. | |
505 | 0 | |a Cover; Copyright; Credits; Foreword; About the Authors; About the Reviewer; www.PacktPub.com; Customer Feedback; Table of Contents; Preface; Chapter 1: The Big Data Science Ecosystem; Introducing the Big Data ecosystem; Data management; Data management responsibilities; The right tool for the job; Overall architecture; Data Ingestion; Data Lake; Reliable storage; Scalable data processing capability; Data science platform; Data Access; Data technologies; The role of Apache Spark; Companion tools; Apache HDFS; Advantages; Disadvantages; Installation; Amazon S3; Advantages; Disadvantages. | |
505 | 8 | |a Installation; Apache Kafka; Advantages; Disadvantages; Installation; Apache Parquet; Advantages; Disadvantages; Installation; Apache Avro; Advantages; Disadvantages; Installation; Apache NiFi; Advantages; Disadvantages; Installation; Apache YARN; Advantages; Disadvantages; Installation; Apache Lucene; Advantages; Disadvantages; Installation; Kibana; Advantages; Disadvantages; Installation; Elasticsearch; Advantages; Disadvantages; Installation; Accumulo; Advantages; Disadvantages; Installation; Summary; Chapter 2: Data Acquisition; Data pipelines; Universal ingestion framework. | |
505 | 8 | |a Introducing the GDELT news stream; Discovering GDELT in real-time; Our first GDELT feed; Improving with publish and subscribe; Content registry; Choices and more choices; Going with the flow; Metadata model; Kibana dashboard; Quality assurance; [Example 1 -- Basic quality checking, no contending users]; Example 1 -- Basic quality checking, no contending users; Example 2 -- Advanced quality checking, no contending users; Example 3 -- Basic quality checking, 50% utility due to contending users; Summary; Chapter 3: Input Formats and Schema; A structured life is a good life; GDELT dimensional modeling. | |
505 | 8 | |a GDELT model; First look at the data; Core global knowledge graph model; Hidden complexity; Denormalized models; Challenges with flattened data; Issue 1 -- Loss of contextual information; Issue 2: Re-establishing dimensions; Issue 3: Including reference data; Loading your data; Schema agility; Reality check; GKG ELT; Position matters; Avro; Spark-Avro method; Pedagogical method; When to perform Avro transformation; Parquet; Summary; Chapter 4: Exploratory Data Analysis; The problem, principles and planning; Understanding the EDA problem; Design principles; General plan of exploration; Preparation. | |
505 | 8 | |a Introducing mask based data profiling; Introducing character class masks; Building a mask based profiler; Setting up Apache Zeppelin; Constructing a reusable notebook; Exploring GDELT; GDELT GKG datasets; The files; Special collections; Reference data; Exploring the GKG v2.1; The Translingual files; A configurable GCAM time series EDA; Plot.ly charting on Apache Zeppelin; Exploring translation sourced GCAM sentiment with plot.ly; Concluding remarks; A configurable GCAM Spatio-Temporal EDA; Introducing GeoGCAM; Does our spatial pivot work?; Summary; Chapter 5: Spark for Geographic Analysis. | |
630 | 0 | 0 | |a Spark (Electronic resource : Apache Software Foundation) |0 http://id.loc.gov/authorities/names/no2015027445 |
630 | 0 | 7 | |a Spark (Electronic resource : Apache Software Foundation) |2 fast |
650 | 0 | |a Data mining. |0 http://id.loc.gov/authorities/subjects/sh97002073 | |
650 | 0 | |a Machine learning. |0 http://id.loc.gov/authorities/subjects/sh85079324 | |
650 | 0 | |a Big data. |0 http://id.loc.gov/authorities/subjects/sh2012003227 | |
650 | 6 | |a Exploration de données (Informatique) | |
650 | 6 | |a Apprentissage automatique. | |
650 | 6 | |a Données volumineuses. | |
650 | 7 | |a COMPUTERS |x Databases |x Data Mining. |2 bisacsh | |
650 | 7 | |a Big data |2 fast | |
650 | 7 | |a Data mining |2 fast | |
650 | 7 | |a Machine learning |2 fast | |
700 | 1 | |a Amend, Antoine. | |
700 | 1 | |a George, David. | |
700 | 1 | |a Hallett, Matthew. | |
758 | |i has work: |a Mastering Spark for data science (Text) |1 https://id.oclc.org/worldcat/entity/E39PCGBKJPMByyyXcFYPVy4FDq |4 https://id.oclc.org/worldcat/ontology/hasWork | ||
776 | 0 | 8 | |i Print version: |a Morgan, Andrew. |t Mastering Spark for Data Science. |d Birmingham : Packt Publishing, ©2017 |
856 | 1 | |l FWS01 |p ZDB-4-EBA |q FWS_PDA_EBA |u https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1495812 |3 Volltext | |
856 | 1 | |l CBO01 |p ZDB-4-EBA |q FWS_PDA_EBA |u https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1495812 |3 Volltext | |
938 | |a Askews and Holts Library Services |b ASKH |n AH30656483 | ||
938 | |a EBL - Ebook Library |b EBLB |n EBL4833930 | ||
938 | |a EBSCOhost |b EBSC |n 1495812 | ||
938 | |a ProQuest MyiLibrary Digital eBook Collection |b IDEB |n cis34561627 | ||
938 | |a YBP Library Services |b YANK |n 13953597 | ||
994 | |a 92 |b GEBAY | ||
912 | |a ZDB-4-EBA |
Record in the search index
DE-BY-FWS_katkey | ZDB-4-EBA-ocn981985497 |
---|---|
_version_ | 1813903754957684736 |
adam_text | |
any_adam_object | |
author | Morgan, Andrew |
author2 | Amend, Antoine George, David Hallett, Matthew |
author2_role | |
author2_variant | a a aa d g dg m h mh |
author_facet | Morgan, Andrew Amend, Antoine George, David Hallett, Matthew |
author_role | |
author_sort | Morgan, Andrew |
author_variant | a m am |
building | Verbundindex |
bvnumber | localFWS |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.9.D343 |
callnumber-search | QA76.9.D343 |
callnumber-sort | QA 276.9 D343 |
callnumber-subject | QA - Mathematics |
collection | ZDB-4-EBA |
contents | Cover; Copyright; Credits; Foreword; About the Authors; About the Reviewer; www.PacktPub.com; Customer Feedback; Table of Contents; Preface; Chapter 1: The Big Data Science Ecosystem; Introducing the Big Data ecosystem; Data management; Data management responsibilities; The right tool for the job; Overall architecture; Data Ingestion; Data Lake; Reliable storage; Scalable data processing capability; Data science platform; Data Access; Data technologies; The role of Apache Spark; Companion tools; Apache HDFS; Advantages; Disadvantages; Installation; Amazon S3; Advantages; Disadvantages. Installation; Apache Kafka; Advantages; Disadvantages; Installation; Apache Parquet; Advantages; Disadvantages; Installation; Apache Avro; Advantages; Disadvantages; Installation; Apache NiFi; Advantages; Disadvantages; Installation; Apache YARN; Advantages; Disadvantages; Installation; Apache Lucene; Advantages; Disadvantages; Installation; Kibana; Advantages; Disadvantages; Installation; Elasticsearch; Advantages; Disadvantages; Installation; Accumulo; Advantages; Disadvantages; Installation; Summary; Chapter 2: Data Acquisition; Data pipelines; Universal ingestion framework. Introducing the GDELT news stream; Discovering GDELT in real-time; Our first GDELT feed; Improving with publish and subscribe; Content registry; Choices and more choices; Going with the flow; Metadata model; Kibana dashboard; Quality assurance; [Example 1 -- Basic quality checking, no contending users]; Example 1 -- Basic quality checking, no contending users; Example 2 -- Advanced quality checking, no contending users; Example 3 -- Basic quality checking, 50% utility due to contending users; Summary; Chapter 3: Input Formats and Schema; A structured life is a good life; GDELT dimensional modeling. 
GDELT model; First look at the data; Core global knowledge graph model; Hidden complexity; Denormalized models; Challenges with flattened data; Issue 1 -- Loss of contextual information; Issue 2: Re-establishing dimensions; Issue 3: Including reference data; Loading your data; Schema agility; Reality check; GKG ELT; Position matters; Avro; Spark-Avro method; Pedagogical method; When to perform Avro transformation; Parquet; Summary; Chapter 4: Exploratory Data Analysis; The problem, principles and planning; Understanding the EDA problem; Design principles; General plan of exploration; Preparation. Introducing mask based data profiling; Introducing character class masks; Building a mask based profiler; Setting up Apache Zeppelin; Constructing a reusable notebook; Exploring GDELT; GDELT GKG datasets; The files; Special collections; Reference data; Exploring the GKG v2.1; The Translingual files; A configurable GCAM time series EDA; Plot.ly charting on Apache Zeppelin; Exploring translation sourced GCAM sentiment with plot.ly; Concluding remarks; A configurable GCAM Spatio-Temporal EDA; Introducing GeoGCAM; Does our spatial pivot work?; Summary; Chapter 5: Spark for Geographic Analysis. |
ctrlnum | (OCoLC)981985497 |
dewey-full | 005.75/85 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security |
dewey-raw | 005.75/85 |
dewey-search | 005.75/85 |
dewey-sort | 15.75 285 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Computer science |
format | Electronic eBook |
id | ZDB-4-EBA-ocn981985497 |
illustrated | Not Illustrated |
indexdate | 2024-10-25T16:23:43Z |
institution | BVB |
isbn | 1785888285 9781785888281 |
language | English |
oclc_num | 981985497 |
open_access_boolean | |
owner | MAIN |
owner_facet | MAIN |
physical | 1 online resource |
psigel | ZDB-4-EBA |
publishDate | 2017 |
publishDateSearch | 2017 |
publishDateSort | 2017 |
publisher | Packt Publishing Ltd., |
record_format | marc |
subject_GND | http://id.loc.gov/authorities/names/no2015027445 http://id.loc.gov/authorities/subjects/sh97002073 http://id.loc.gov/authorities/subjects/sh85079324 http://id.loc.gov/authorities/subjects/sh2012003227 |
title | Mastering Spark for data science / |
title_auth | Mastering Spark for data science / |
title_exact_search | Mastering Spark for data science / |
title_full | Mastering Spark for data science / Andrew Morgan, Antoine Amend, Matthew Hallett, David George ; foreword by Harry Powell. |
title_fullStr | Mastering Spark for data science / Andrew Morgan, Antoine Amend, Matthew Hallett, David George ; foreword by Harry Powell. |
title_full_unstemmed | Mastering Spark for data science / Andrew Morgan, Antoine Amend, Matthew Hallett, David George ; foreword by Harry Powell. |
title_short | Mastering Spark for data science / |
title_sort | mastering spark for data science |
topic | Spark (Electronic resource : Apache Software Foundation) http://id.loc.gov/authorities/names/no2015027445 Spark (Electronic resource : Apache Software Foundation) fast Data mining. http://id.loc.gov/authorities/subjects/sh97002073 Machine learning. http://id.loc.gov/authorities/subjects/sh85079324 Big data. http://id.loc.gov/authorities/subjects/sh2012003227 Exploration de données (Informatique) Apprentissage automatique. Données volumineuses. COMPUTERS Databases Data Mining. bisacsh Big data fast Data mining fast Machine learning fast |
topic_facet | Spark (Electronic resource : Apache Software Foundation) Data mining. Machine learning. Big data. Exploration de données (Informatique) Apprentissage automatique. Données volumineuses. COMPUTERS Databases Data Mining. Big data Data mining Machine learning |
url | https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1495812 |
work_keys_str_mv | AT morganandrew masteringsparkfordatascience AT amendantoine masteringsparkfordatascience AT georgedavid masteringsparkfordatascience AT hallettmatthew masteringsparkfordatascience |