Hands-on data science and Python machine learning: perform data mining and machine learning efficiently using Python and Spark
Main Author: | Kane, Frank |
---|---|
Format: | Electronic E-Book |
Language: | English |
Published: | Birmingham, UK : Packt Publishing, 2017. |
Subjects: | |
Online Access: | Full text |
Summary: | This book covers the fundamentals of machine learning with Python in a concise and dynamic manner. It covers data mining and large-scale machine learning using Apache Spark. About This Book: Take your first steps in the world of data science by understanding the tools and techniques of data analysis; train efficient machine learning models in Python using supervised and unsupervised learning methods; learn how to use Apache Spark for processing Big Data efficiently. Who This Book Is For: If you are a budding data scientist or a data analyst who wants to analyze and gain actionable insights from data using Python, this book is for you. Programmers with some experience in Python who want to enter the lucrative world of data science will also find this book very useful, but you don't need to be an expert Python coder or mathematician to get the most from this book. What You Will Learn: Learn how to clean your data and ready it for analysis; implement the popular clustering and regression methods in Python; train efficient machine learning models using decision trees and random forests; visualize the results of your analysis using Python's Matplotlib library; use Apache Spark's MLlib package to perform machine learning on large datasets. In Detail: Join Frank Kane, who worked on Amazon and IMDb's machine learning algorithms, as he guides you on your first steps into the world of data science. Hands-On Data Science and Python Machine Learning gives you the tools that you need to understand and explore the core topics in the field, and the confidence and practice to build and analyze your own machine learning models. With the help of interesting and easy-to-follow practical examples, Frank Kane explains potentially complex topics such as Bayesian methods and K-means clustering in a way that anybody can understand. Based on Frank's successful data science course, Hands-On Data Science and Python Machine Learning empowers you to conduct data analysis and perform efficient machine learning using Python. Let Frank help you unearth the value in your data using the various data mining and data analysis techniques available in Python, and develop efficient predictive models to predict future results. You will also learn how to perform large-scale machine learning on Big Data using Apache Spark. The book covers preparing your data for analysis, training machine learning models, and visualizing the final data analysis. Style and approach: This comprehen ... |
Description: | 1 online resource (1 volume) : illustrations |
ISBN: | 9781787280229 1787280225 9781523112227 1523112220 |
Internal format
MARC
LEADER | 00000cam a2200000 i 4500 | ||
---|---|---|---|
001 | ZDB-4-EBA-on1001346998 | ||
003 | OCoLC | ||
005 | 20241004212047.0 | ||
006 | m o d | ||
007 | cr unu|||||||| | ||
008 | 170818s2017 enka o 000 0 eng d | ||
040 | |a UMI |b eng |e rda |e pn |c UMI |d IDEBK |d OCLCF |d TOH |d STF |d N$T |d YDX |d COO |d KNOVL |d UOK |d CEF |d KSU |d UAB |d AU@ |d ERF |d OCLCQ |d OCLCO |d OCLCQ |d OCLCO |d OCLCL |d DXU | ||
019 | |a 1097002980 | ||
020 | |a 9781787280229 |q (electronic bk.) | ||
020 | |a 1787280225 |q (electronic bk.) | ||
020 | |a 9781523112227 |q (electronic bk.) | ||
020 | |a 1523112220 |q (electronic bk.) | ||
020 | |z 9781787280748 | ||
035 | |a (OCoLC)1001346998 |z (OCoLC)1097002980 | ||
037 | |a CL0500000885 |b Safari Books Online | ||
050 | 4 | |a QA76.73.P98 | |
072 | 7 | |a COM |x 051000 |2 bisacsh | |
082 | 7 | |a 005.133 |2 23 | |
049 | |a MAIN | ||
100 | 1 | |a Kane, Frank, |e author. | |
245 | 1 | 0 | |a Hands-on data science and Python machine learning : |b perform data mining and machine learning efficiently using Python and Spark / |c Frank Kane. |
264 | 1 | |a Birmingham, UK : |b Packt Publishing, |c 2017. | |
300 | |a 1 online resource (1 volume) : |b illustrations | ||
336 | |a text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
588 | 0 | |a Online resource; title from title page (Safari, viewed August 18, 2017). | |
520 | |a This book covers the fundamentals of machine learning with Python in a concise and dynamic manner. It covers data mining and large-scale machine learning using Apache Spark. About This Book Take your first steps in the world of data science by understanding the tools and techniques of data analysis Train efficient Machine Learning models in Python using the supervised and unsupervised learning methods Learn how to use Apache Spark for processing Big Data efficiently Who This Book Is For If you are a budding data scientist or a data analyst who wants to analyze and gain actionable insights from data using Python, this book is for you. Programmers with some experience in Python who want to enter the lucrative world of Data Science will also find this book to be very useful, but you don't need to be an expert Python coder or mathematician to get the most from this book. What You Will Learn Learn how to clean your data and ready it for analysis Implement the popular clustering and regression methods in Python Train efficient machine learning models using decision trees and random forests Visualize the results of your analysis using Python's Matplotlib library Use Apache Spark's MLlib package to perform machine learning on large datasets In Detail Join Frank Kane, who worked on Amazon and IMDb's machine learning algorithms, as he guides you on your first steps into the world of data science. Hands-On Data Science and Python Machine Learning gives you the tools that you need to understand and explore the core topics in the field, and the confidence and practice to build and analyze your own machine learning models. With the help of interesting and easy-to-follow practical examples, Frank Kane explains potentially complex topics such as Bayesian methods and K-means clustering in a way that anybody can understand them. 
Based on Frank's successful data science course, Hands-On Data Science and Python Machine Learning empowers you to conduct data analysis and perform efficient machine learning using Python. Let Frank help you unearth the value in your data using the various data mining and data analysis techniques available in Python, and to develop efficient predictive models to predict future results. You will also learn how to perform large-scale machine learning on Big Data using Apache Spark. The book covers preparing your data for analysis, training machine learning models, and visualizing the final data analysis. Style and approach This comprehen ... | ||
505 | 0 | |a Intro -- Copyright -- Credits -- About the Author -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Getting Started -- Installing Enthought Canopy -- Giving the installation a test run -- If you occasionally get problems opening your IPNYB files -- Using and understanding IPython (Jupyter) Notebooks -- Python basics -- Part 1 -- Understanding Python code -- Importing modules -- Data structures -- Experimenting with lists -- Pre colon -- Post colon -- Negative syntax -- Adding list to list -- The append function -- Complex data structures -- Dereferencing a single element -- The sort function -- Reverse sort -- Tuples -- Dereferencing an element -- List of tuples -- Dictionaries -- Iterating through entries -- Python basics -- Part 2 -- Functions in Python -- Lambda functions -- functional programming -- Understanding boolean expressions -- The if statement -- The if-else loop -- Looping -- The while loop -- Exploring activity -- Running Python scripts -- More options than just the IPython/Jupyter Notebook -- Running Python scripts in command prompt -- Using the Canopy IDE -- Summary -- Chapter 2: Statistics and Probability Refresher, and Python Practice -- Types of data -- Numerical data -- Discrete data -- Continuous data -- Categorical data -- Ordinal data -- Mean, median, and mode -- Mean -- Median -- The factor of outliers -- Mode -- Using mean, median, and mode in Python -- Calculating mean using the NumPy package -- Visualizing data using matplotlib -- Calculating median using the NumPy package -- Analyzing the effect of outliers -- Calculating mode using the SciPy package -- Some exercises -- Standard deviation and variance -- Variance -- Measuring variance -- Standard deviation -- Identifying outliers with standard deviation -- Population variance versus sample variance -- The Mathematical explanation. | |
505 | 8 | |a Analyzing standard deviation and variance on a histogram -- Using Python to compute standard deviation and variance -- Try it yourself -- Probability density function and probability mass function -- The probability density function and probability mass functions -- Probability density functions -- Probability mass functions -- Types of data distributions -- Uniform distribution -- Normal or Gaussian distribution -- The exponential probability distribution or Power law -- Binomial probability mass function -- Poisson probability mass function -- Percentiles and moments -- Percentiles -- Quartiles -- Computing percentiles in Python -- Moments -- Computing moments in Python -- Summary -- Chapter 3: Matplotlib and Advanced Probability Concepts -- A crash course in Matplotlib -- Generating multiple plots on one graph -- Saving graphs as images -- Adjusting the axes -- Adding a grid -- Changing line types and colors -- Labeling axes and adding a legend -- A fun example -- Generating pie charts -- Generating bar charts -- Generating scatter plots -- Generating histograms -- Generating box-and-whisker plots -- Try it yourself -- Covariance and correlation -- Defining the concepts -- Measuring covariance -- Correlation -- Computing covariance and correlation in Python -- Computing correlation -- The hard way -- Computing correlation -- The NumPy way -- Correlation activity -- Conditional probability -- Conditional probability exercises in Python -- Conditional probability assignment -- My assignment solution -- Bayes' theorem -- Summary -- Chapter 4: Predictive Models -- Linear regression -- The ordinary least squares technique -- The gradient descent technique -- The co-efficient of determination or r-squared -- Computing r-squared -- Interpreting r-squared -- Computing linear regression and r-squared using Python -- Activity for linear regression. | |
505 | 8 | |a Polynomial regression -- Implementing polynomial regression using NumPy -- Computing the r-squared error -- Activity for polynomial regression -- Multivariate regression and predicting car prices -- Multivariate regression using Python -- Activity for multivariate regression -- Multi-level models -- Summary -- Chapter 5: Machine Learning with Python -- Machine learning and train/test -- Unsupervised learning -- Supervised learning -- Evaluating supervised learning -- K-fold cross validation -- Using train/test to prevent overfitting of a polynomial regression -- Activity -- Bayesian methods -- Concepts -- Implementing a spam classifier with Naïve Bayes -- Activity -- K-Means clustering -- Limitations to k-means clustering -- Clustering people based on income and age -- Activity -- Measuring entropy -- Decision trees -- Concepts -- Decision tree example -- Walking through a decision tree -- Random forests technique -- Decision trees -- Predicting hiring decisions using Python -- Ensemble learning -- Using a random forest -- Activity -- Ensemble learning -- Support vector machine overview -- Using SVM to cluster people by using scikit-learn -- Activity -- Summary -- Chapter 6: Recommender Systems -- What are recommender systems? -- User-based collaborative filtering -- Limitations of user-based collaborative filtering -- Item-based collaborative filtering -- Understanding item-based collaborative filtering -- How item-based collaborative filtering works? -- Collaborative filtering using Python -- Finding movie similarities -- Understanding the code -- The corrwith function -- Improving the results of movie similarities -- Making movie recommendations to people -- Understanding movie recommendations with an example -- Using the groupby command to combine rows -- Removing entries with the drop command -- Improving the recommendation results -- Summary. | |
505 | 8 | |a Chapter 7: More Data Mining and Machine Learning Techniques -- K-nearest neighbors -- concepts -- Using KNN to predict a rating for a movie -- Activity -- Dimensionality reduction and principal component analysis -- Dimensionality reduction -- Principal component analysis -- A PCA example with the Iris dataset -- Activity -- Data warehousing overview -- ETL versus ELT -- Reinforcement learning -- Q-learning -- The exploration problem -- The simple approach -- The better way -- Fancy words -- Markov decision process -- Dynamic programming -- Summary -- Chapter 8: Dealing with Real-World Data -- Bias/variance trade-off -- K-fold cross-validation to avoid overfitting -- Example of k-fold cross-validation using scikit-learn -- Data cleaning and normalisation -- Cleaning web log data -- Applying a regular expression on the web log -- Modification one -- filtering the request field -- Modification two -- filtering post requests -- Modification three -- checking the user agents -- Filtering the activity of spiders/robots -- Modification four -- applying website-specific filters -- Activity for web log data -- Normalizing numerical data -- Detecting outliers -- Dealing with outliers -- Activity for outliers -- Summary -- Chapter 9: Apache Spark -- Machine Learning on Big Data -- Installing Spark -- Installing Spark on Windows -- Installing Spark on other operating systems -- Installing the Java Development Kit -- Installing Spark -- Spark introduction -- It's scalable -- It's fast -- It's young -- It's not difficult -- Components of Spark -- Python versus Scala for Spark -- Spark and Resilient Distributed Datasets (RDD) -- The SparkContext object -- Creating RDDs -- Creating an RDD using a Python list -- Loading an RDD from a text file -- More ways to create RDDs -- RDD operations -- Transformations -- Using map() -- Actions -- Introducing MLlib. | |
505 | 8 | |a Some MLlib Capabilities -- Special MLlib data types -- The vector data type -- LabeledPoint data type -- Rating data type -- Decision Trees in Spark with MLlib -- Exploring decision trees code -- Creating the SparkContext -- Importing and cleaning our data -- Creating a test candidate and building our decision tree -- Running the script -- K-Means Clustering in Spark -- Within set sum of squared errors (WSSSE) -- Running the code -- TF-IDF -- TF-IDF in practice -- Using TF- IDF -- Searching wikipedia with Spark MLlib -- Import statements -- Creating the initial RDD -- Creating and transforming a HashingTF object -- Computing the TF-IDF score -- Using the Wikipedia search engine algorithm -- Running the algorithm -- Using the Spark 2.0 DataFrame API for MLlib -- How Spark 2.0 MLlib works -- Implementing linear regression -- Summary -- Chapter 10: Testing and Experimental Design -- A/B testing concepts -- A/B tests -- Measuring conversion for A/B testing -- How to attribute conversions -- Variance is your enemy -- T-test and p-value -- The t-statistic or t-test -- The p-value -- Measuring t-statistics and p-values using Python -- Running A/B test on some experimental data -- When there's no real difference between the two groups -- Does the sample size make a difference? -- Sample size increased to six-digits -- Sample size increased seven-digits -- A/A testing -- Determining how long to run an experiment for -- A/B test gotchas -- Novelty effects -- Seasonal effects -- Selection bias -- Auditing selection bias issues -- Data pollution -- Attribution errors -- Summary -- Index. | |
630 | 0 | 0 | |a Spark (Electronic resource : Apache Software Foundation) |0 http://id.loc.gov/authorities/names/no2015027445 |
630 | 0 | 7 | |a Spark (Electronic resource : Apache Software Foundation) |2 fast |
650 | 0 | |a Python (Computer program language) |0 http://id.loc.gov/authorities/subjects/sh96008834 | |
650 | 0 | |a Machine learning. |0 http://id.loc.gov/authorities/subjects/sh85079324 | |
650 | 0 | |a Data mining. |0 http://id.loc.gov/authorities/subjects/sh97002073 | |
650 | 0 | |a Artificial intelligence. |0 http://id.loc.gov/authorities/subjects/sh85008180 | |
650 | 2 | |a Data Mining |0 https://id.nlm.nih.gov/mesh/D057225 | |
650 | 2 | |a Artificial Intelligence |0 https://id.nlm.nih.gov/mesh/D001185 | |
650 | 2 | |a Machine Learning |0 https://id.nlm.nih.gov/mesh/D000069550 | |
650 | 6 | |a Python (Langage de programmation) | |
650 | 6 | |a Apprentissage automatique. | |
650 | 6 | |a Exploration de données (Informatique) | |
650 | 6 | |a Intelligence artificielle. | |
650 | 7 | |a artificial intelligence. |2 aat | |
650 | 7 | |a COMPUTERS |x Programming |x General. |2 bisacsh | |
650 | 7 | |a Artificial intelligence |2 fast | |
650 | 7 | |a Data mining |2 fast | |
650 | 7 | |a Machine learning |2 fast | |
650 | 7 | |a Python (Computer program language) |2 fast | |
758 | |i has work: |a Hands-on data science and Python machine learning (Text) |1 https://id.oclc.org/worldcat/entity/E39PCGmjhxVPVYxqbkrTxcyb8d |4 https://id.oclc.org/worldcat/ontology/hasWork | ||
776 | 0 | |z 1787280748 | |
856 | 4 | 0 | |l FWS01 |p ZDB-4-EBA |q FWS_PDA_EBA |u https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1566405 |3 Volltext |
938 | |a EBSCOhost |b EBSC |n 1566405 | ||
938 | |a ProQuest MyiLibrary Digital eBook Collection |b IDEB |n cis38588568 | ||
938 | |a YBP Library Services |b YANK |n 14736476 | ||
994 | |a 92 |b GEBAY | ||
912 | |a ZDB-4-EBA | ||
049 | |a DE-863 |
Record in the search index
DE-BY-FWS_katkey | ZDB-4-EBA-on1001346998 |
---|---|
_version_ | 1816882397998743553 |
adam_text | |
any_adam_object | |
author | Kane, Frank |
author_facet | Kane, Frank |
author_role | aut |
author_sort | Kane, Frank |
author_variant | f k fk |
building | Verbundindex |
bvnumber | localFWS |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.73.P98 |
callnumber-search | QA76.73.P98 |
callnumber-sort | QA 276.73 P98 |
callnumber-subject | QA - Mathematics |
collection | ZDB-4-EBA |
ctrlnum | (OCoLC)1001346998 |
dewey-full | 005.133 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security |
dewey-raw | 005.133 |
dewey-search | 005.133 |
dewey-sort | 15.133 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Computer science |
format | Electronic eBook |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>14823cam a2200733 i 4500</leader><controlfield tag="001">ZDB-4-EBA-on1001346998</controlfield><controlfield tag="003">OCoLC</controlfield><controlfield tag="005">20241004212047.0</controlfield><controlfield tag="006">m o d </controlfield><controlfield tag="007">cr unu||||||||</controlfield><controlfield tag="008">170818s2017 enka o 000 0 eng d</controlfield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">UMI</subfield><subfield code="b">eng</subfield><subfield code="e">rda</subfield><subfield code="e">pn</subfield><subfield code="c">UMI</subfield><subfield code="d">IDEBK</subfield><subfield code="d">OCLCF</subfield><subfield code="d">TOH</subfield><subfield code="d">STF</subfield><subfield code="d">N$T</subfield><subfield code="d">YDX</subfield><subfield code="d">COO</subfield><subfield code="d">KNOVL</subfield><subfield code="d">UOK</subfield><subfield code="d">CEF</subfield><subfield code="d">KSU</subfield><subfield code="d">UAB</subfield><subfield code="d">AU@</subfield><subfield code="d">ERF</subfield><subfield code="d">OCLCQ</subfield><subfield code="d">OCLCO</subfield><subfield code="d">OCLCQ</subfield><subfield code="d">OCLCO</subfield><subfield code="d">OCLCL</subfield><subfield code="d">DXU</subfield></datafield><datafield tag="019" ind1=" " ind2=" "><subfield code="a">1097002980</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781787280229</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1787280225</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781523112227</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1523112220</subfield><subfield code="q">(electronic 
bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="z">9781787280748</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1001346998</subfield><subfield code="z">(OCoLC)1097002980</subfield></datafield><datafield tag="037" ind1=" " ind2=" "><subfield code="a">CL0500000885</subfield><subfield code="b">Safari Books Online</subfield></datafield><datafield tag="050" ind1=" " ind2="4"><subfield code="a">QA76.73.P98</subfield></datafield><datafield tag="072" ind1=" " ind2="7"><subfield code="a">COM</subfield><subfield code="x">051000</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="082" ind1="7" ind2=" "><subfield code="a">005.133</subfield><subfield code="2">23</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">MAIN</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Kane, Frank,</subfield><subfield code="e">author.</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Hands-on data science and Python machine learning :</subfield><subfield code="b">perform data mining and machine learning efficiently using Python and Spark /</subfield><subfield code="c">Frank Kane.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Birmingham, UK :</subfield><subfield code="b">Packt Publishing,</subfield><subfield code="c">2017.</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 online resource (1 volume) :</subfield><subfield code="b">illustrations</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">computer</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">online 
resource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="588" ind1="0" ind2=" "><subfield code="a">Online resource; title from title page (Safari, viewed August 18, 2017).</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">This book covers the fundamentals of machine learning with Python in a concise and dynamic manner. It covers data mining and large-scale machine learning using Apache Spark. About This Book Take your first steps in the world of data science by understanding the tools and techniques of data analysis Train efficient Machine Learning models in Python using the supervised and unsupervised learning methods Learn how to use Apache Spark for processing Big Data efficiently Who This Book Is For If you are a budding data scientist or a data analyst who wants to analyze and gain actionable insights from data using Python, this book is for you. Programmers with some experience in Python who want to enter the lucrative world of Data Science will also find this book to be very useful, but you don't need to be an expert Python coder or mathematician to get the most from this book. What You Will Learn Learn how to clean your data and ready it for analysis Implement the popular clustering and regression methods in Python Train efficient machine learning models using decision trees and random forests Visualize the results of your analysis using Python's Matplotlib library Use Apache Spark's MLlib package to perform machine learning on large datasets In Detail Join Frank Kane, who worked on Amazon and IMDb's machine learning algorithms, as he guides you on your first steps into the world of data science. Hands-On Data Science and Python Machine Learning gives you the tools that you need to understand and explore the core topics in the field, and the confidence and practice to build and analyze your own machine learning models. 
With the help of interesting and easy-to-follow practical examples, Frank Kane explains potentially complex topics such as Bayesian methods and K-means clustering in a way that anybody can understand them. Based on Frank's successful data science course, Hands-On Data Science and Python Machine Learning empowers you to conduct data analysis and perform efficient machine learning using Python. Let Frank help you unearth the value in your data using the various data mining and data analysis techniques available in Python, and to develop efficient predictive models to predict future results. You will also learn how to perform large-scale machine learning on Big Data using Apache Spark. The book covers preparing your data for analysis, training machine learning models, and visualizing the final data analysis. Style and approach This comprehen ...</subfield></datafield><datafield tag="505" ind1="0" ind2=" "><subfield code="a">Intro -- Copyright -- Credits -- About the Author -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Getting Started -- Installing Enthought Canopy -- Giving the installation a test run -- If you occasionally get problems opening your IPNYB files -- Using and understanding IPython (Jupyter) Notebooks -- Python basics -- Part 1 -- Understanding Python code -- Importing modules -- Data structures -- Experimenting with lists -- Pre colon -- Post colon -- Negative syntax -- Adding list to list -- The append function -- Complex data structures -- Dereferencing a single element -- The sort function -- Reverse sort -- Tuples -- Dereferencing an element -- List of tuples -- Dictionaries -- Iterating through entries -- Python basics -- Part 2 -- Functions in Python -- Lambda functions -- functional programming -- Understanding boolean expressions -- The if statement -- The if-else loop -- Looping -- The while loop -- Exploring activity -- Running Python scripts -- More options than just the IPython/Jupyter Notebook -- 
Running Python scripts in command prompt -- Using the Canopy IDE -- Summary -- Chapter 2: Statistics and Probability Refresher, and Python Practice -- Types of data -- Numerical data -- Discrete data -- Continuous data -- Categorical data -- Ordinal data -- Mean, median, and mode -- Mean -- Median -- The factor of outliers -- Mode -- Using mean, median, and mode in Python -- Calculating mean using the NumPy package -- Visualizing data using matplotlib -- Calculating median using the NumPy package -- Analyzing the effect of outliers -- Calculating mode using the SciPy package -- Some exercises -- Standard deviation and variance -- Variance -- Measuring variance -- Standard deviation -- Identifying outliers with standard deviation -- Population variance versus sample variance -- The Mathematical explanation.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Analyzing standard deviation and variance on a histogram -- Using Python to compute standard deviation and variance -- Try it yourself -- Probability density function and probability mass function -- The probability density function and probability mass functions -- Probability density functions -- Probability mass functions -- Types of data distributions -- Uniform distribution -- Normal or Gaussian distribution -- The exponential probability distribution or Power law -- Binomial probability mass function -- Poisson probability mass function -- Percentiles and moments -- Percentiles -- Quartiles -- Computing percentiles in Python -- Moments -- Computing moments in Python -- Summary -- Chapter 3: Matplotlib and Advanced Probability Concepts -- A crash course in Matplotlib -- Generating multiple plots on one graph -- Saving graphs as images -- Adjusting the axes -- Adding a grid -- Changing line types and colors -- Labeling axes and adding a legend -- A fun example -- Generating pie charts -- Generating bar charts -- Generating scatter plots -- Generating histograms -- Generating 
box-and-whisker plots -- Try it yourself -- Covariance and correlation -- Defining the concepts -- Measuring covariance -- Correlation -- Computing covariance and correlation in Python -- Computing correlation -- The hard way -- Computing correlation -- The NumPy way -- Correlation activity -- Conditional probability -- Conditional probability exercises in Python -- Conditional probability assignment -- My assignment solution -- Bayes' theorem -- Summary -- Chapter 4: Predictive Models -- Linear regression -- The ordinary least squares technique -- The gradient descent technique -- The co-efficient of determination or r-squared -- Computing r-squared -- Interpreting r-squared -- Computing linear regression and r-squared using Python -- Activity for linear regression.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Polynomial regression -- Implementing polynomial regression using NumPy -- Computing the r-squared error -- Activity for polynomial regression -- Multivariate regression and predicting car prices -- Multivariate regression using Python -- Activity for multivariate regression -- Multi-level models -- Summary -- Chapter 5: Machine Learning with Python -- Machine learning and train/test -- Unsupervised learning -- Supervised learning -- Evaluating supervised learning -- K-fold cross validation -- Using train/test to prevent overfitting of a polynomial regression -- Activity -- Bayesian methods -- Concepts -- Implementing a spam classifier with Naïve Bayes -- Activity -- K-Means clustering -- Limitations to k-means clustering -- Clustering people based on income and age -- Activity -- Measuring entropy -- Decision trees -- Concepts -- Decision tree example -- Walking through a decision tree -- Random forests technique -- Decision trees -- Predicting hiring decisions using Python -- Ensemble learning -- Using a random forest -- Activity -- Ensemble learning -- Support vector machine overview -- Using SVM to cluster people by 
using scikit-learn -- Activity -- Summary -- Chapter 6: Recommender Systems -- What are recommender systems? -- User-based collaborative filtering -- Limitations of user-based collaborative filtering -- Item-based collaborative filtering -- Understanding item-based collaborative filtering -- How item-based collaborative filtering works? -- Collaborative filtering using Python -- Finding movie similarities -- Understanding the code -- The corrwith function -- Improving the results of movie similarities -- Making movie recommendations to people -- Understanding movie recommendations with an example -- Using the groupby command to combine rows -- Removing entries with the drop command -- Improving the recommendation results -- Summary.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Chapter 7: More Data Mining and Machine Learning Techniques -- K-nearest neighbors -- concepts -- Using KNN to predict a rating for a movie -- Activity -- Dimensionality reduction and principal component analysis -- Dimensionality reduction -- Principal component analysis -- A PCA example with the Iris dataset -- Activity -- Data warehousing overview -- ETL versus ELT -- Reinforcement learning -- Q-learning -- The exploration problem -- The simple approach -- The better way -- Fancy words -- Markov decision process -- Dynamic programming -- Summary -- Chapter 8: Dealing with Real-World Data -- Bias/variance trade-off -- K-fold cross-validation to avoid overfitting -- Example of k-fold cross-validation using scikit-learn -- Data cleaning and normalisation -- Cleaning web log data -- Applying a regular expression on the web log -- Modification one -- filtering the request field -- Modification two -- filtering post requests -- Modification three -- checking the user agents -- Filtering the activity of spiders/robots -- Modification four -- applying website-specific filters -- Activity for web log data -- Normalizing numerical data -- Detecting outliers -- 
Dealing with outliers -- Activity for outliers -- Summary -- Chapter 9: Apache Spark -- Machine Learning on Big Data -- Installing Spark -- Installing Spark on Windows -- Installing Spark on other operating systems -- Installing the Java Development Kit -- Installing Spark -- Spark introduction -- It's scalable -- It's fast -- It's young -- It's not difficult -- Components of Spark -- Python versus Scala for Spark -- Spark and Resilient Distributed Datasets (RDD) -- The SparkContext object -- Creating RDDs -- Creating an RDD using a Python list -- Loading an RDD from a text file -- More ways to create RDDs -- RDD operations -- Transformations -- Using map() -- Actions -- Introducing MLlib.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Some MLlib Capabilities -- Special MLlib data types -- The vector data type -- LabeledPoint data type -- Rating data type -- Decision Trees in Spark with MLlib -- Exploring decision trees code -- Creating the SparkContext -- Importing and cleaning our data -- Creating a test candidate and building our decision tree -- Running the script -- K-Means Clustering in Spark -- Within set sum of squared errors (WSSSE) -- Running the code -- TF-IDF -- TF-IDF in practice -- Using TF- IDF -- Searching wikipedia with Spark MLlib -- Import statements -- Creating the initial RDD -- Creating and transforming a HashingTF object -- Computing the TF-IDF score -- Using the Wikipedia search engine algorithm -- Running the algorithm -- Using the Spark 2.0 DataFrame API for MLlib -- How Spark 2.0 MLlib works -- Implementing linear regression -- Summary -- Chapter 10: Testing and Experimental Design -- A/B testing concepts -- A/B tests -- Measuring conversion for A/B testing -- How to attribute conversions -- Variance is your enemy -- T-test and p-value -- The t-statistic or t-test -- The p-value -- Measuring t-statistics and p-values using Python -- Running A/B test on some experimental data -- When there's no real 
difference between the two groups -- Does the sample size make a difference? -- Sample size increased to six-digits -- Sample size increased seven-digits -- A/A testing -- Determining how long to run an experiment for -- A/B test gotchas -- Novelty effects -- Seasonal effects -- Selection bias -- Auditing selection bias issues -- Data pollution -- Attribution errors -- Summary -- Index.</subfield></datafield><datafield tag="630" ind1="0" ind2="0"><subfield code="a">Spark (Electronic resource : Apache Software Foundation)</subfield><subfield code="0">http://id.loc.gov/authorities/names/no2015027445</subfield></datafield><datafield tag="630" ind1="0" ind2="7"><subfield code="a">Spark (Electronic resource : Apache Software Foundation)</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Python (Computer program language)</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh96008834</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Machine learning.</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh85079324</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Data mining.</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh97002073</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Artificial intelligence.</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh85008180</subfield></datafield><datafield tag="650" ind1=" " ind2="2"><subfield code="a">Data Mining</subfield><subfield code="0">https://id.nlm.nih.gov/mesh/D057225</subfield></datafield><datafield tag="650" ind1=" " ind2="2"><subfield code="a">Artificial Intelligence</subfield><subfield code="0">https://id.nlm.nih.gov/mesh/D001185</subfield></datafield><datafield tag="650" ind1=" " ind2="2"><subfield code="a">Machine Learning</subfield><subfield 
code="0">https://id.nlm.nih.gov/mesh/D000069550</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Python (Langage de programmation)</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Apprentissage automatique.</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Exploration de données (Informatique)</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Intelligence artificielle.</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">artificial intelligence.</subfield><subfield code="2">aat</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">COMPUTERS</subfield><subfield code="x">Programming</subfield><subfield code="x">General.</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Artificial intelligence</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Data mining</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Machine learning</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Python (Computer program language)</subfield><subfield code="2">fast</subfield></datafield><datafield tag="758" ind1=" " ind2=" "><subfield code="i">has work:</subfield><subfield code="a">Hands-on data science and Python machine learning (Text)</subfield><subfield code="1">https://id.oclc.org/worldcat/entity/E39PCGmjhxVPVYxqbkrTxcyb8d</subfield><subfield code="4">https://id.oclc.org/worldcat/ontology/hasWork</subfield></datafield><datafield tag="776" ind1="0" ind2=" "><subfield code="z">1787280748</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="l">FWS01</subfield><subfield code="p">ZDB-4-EBA</subfield><subfield 
code="q">FWS_PDA_EBA</subfield><subfield code="u">https://search.ebscohost.com/login.aspx?direct=true&amp;scope=site&amp;db=nlebk&amp;AN=1566405</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">EBSCOhost</subfield><subfield code="b">EBSC</subfield><subfield code="n">1566405</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">ProQuest MyiLibrary Digital eBook Collection</subfield><subfield code="b">IDEB</subfield><subfield code="n">cis38588568</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">YBP Library Services</subfield><subfield code="b">YANK</subfield><subfield code="n">14736476</subfield></datafield><datafield tag="994" ind1=" " ind2=" "><subfield code="a">92</subfield><subfield code="b">GEBAY</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-4-EBA</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-863</subfield></datafield></record></collection> |
id | ZDB-4-EBA-on1001346998 |
illustrated | Illustrated |
indexdate | 2024-11-27T13:27:58Z |
institution | BVB |
isbn | 9781787280229 1787280225 9781523112227 1523112220 |
language | English |
oclc_num | 1001346998 |
open_access_boolean | |
owner | MAIN DE-863 DE-BY-FWS |
owner_facet | MAIN DE-863 DE-BY-FWS |
physical | 1 online resource (1 volume) : illustrations |
psigel | ZDB-4-EBA |
publishDate | 2017 |
publishDateSearch | 2017 |
publishDateSort | 2017 |
publisher | Packt Publishing, |
record_format | marc |
spellingShingle | Kane, Frank Hands-on data science and Python machine learning : perform data mining and machine learning efficiently using Python and Spark / Intro -- Copyright -- Credits -- About the Author -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Getting Started -- Installing Enthought Canopy -- Giving the installation a test run -- If you occasionally get problems opening your IPNYB files -- Using and understanding IPython (Jupyter) Notebooks -- Python basics -- Part 1 -- Understanding Python code -- Importing modules -- Data structures -- Experimenting with lists -- Pre colon -- Post colon -- Negative syntax -- Adding list to list -- The append function -- Complex data structures -- Dereferencing a single element -- The sort function -- Reverse sort -- Tuples -- Dereferencing an element -- List of tuples -- Dictionaries -- Iterating through entries -- Python basics -- Part 2 -- Functions in Python -- Lambda functions -- functional programming -- Understanding boolean expressions -- The if statement -- The if-else loop -- Looping -- The while loop -- Exploring activity -- Running Python scripts -- More options than just the IPython/Jupyter Notebook -- Running Python scripts in command prompt -- Using the Canopy IDE -- Summary -- Chapter 2: Statistics and Probability Refresher, and Python Practice -- Types of data -- Numerical data -- Discrete data -- Continuous data -- Categorical data -- Ordinal data -- Mean, median, and mode -- Mean -- Median -- The factor of outliers -- Mode -- Using mean, median, and mode in Python -- Calculating mean using the NumPy package -- Visualizing data using matplotlib -- Calculating median using the NumPy package -- Analyzing the effect of outliers -- Calculating mode using the SciPy package -- Some exercises -- Standard deviation and variance -- Variance -- Measuring variance -- Standard deviation -- Identifying outliers with standard deviation -- Population variance versus sample 
variance -- The Mathematical explanation. Analyzing standard deviation and variance on a histogram -- Using Python to compute standard deviation and variance -- Try it yourself -- Probability density function and probability mass function -- The probability density function and probability mass functions -- Probability density functions -- Probability mass functions -- Types of data distributions -- Uniform distribution -- Normal or Gaussian distribution -- The exponential probability distribution or Power law -- Binomial probability mass function -- Poisson probability mass function -- Percentiles and moments -- Percentiles -- Quartiles -- Computing percentiles in Python -- Moments -- Computing moments in Python -- Summary -- Chapter 3: Matplotlib and Advanced Probability Concepts -- A crash course in Matplotlib -- Generating multiple plots on one graph -- Saving graphs as images -- Adjusting the axes -- Adding a grid -- Changing line types and colors -- Labeling axes and adding a legend -- A fun example -- Generating pie charts -- Generating bar charts -- Generating scatter plots -- Generating histograms -- Generating box-and-whisker plots -- Try it yourself -- Covariance and correlation -- Defining the concepts -- Measuring covariance -- Correlation -- Computing covariance and correlation in Python -- Computing correlation -- The hard way -- Computing correlation -- The NumPy way -- Correlation activity -- Conditional probability -- Conditional probability exercises in Python -- Conditional probability assignment -- My assignment solution -- Bayes' theorem -- Summary -- Chapter 4: Predictive Models -- Linear regression -- The ordinary least squares technique -- The gradient descent technique -- The co-efficient of determination or r-squared -- Computing r-squared -- Interpreting r-squared -- Computing linear regression and r-squared using Python -- Activity for linear regression. 
Polynomial regression -- Implementing polynomial regression using NumPy -- Computing the r-squared error -- Activity for polynomial regression -- Multivariate regression and predicting car prices -- Multivariate regression using Python -- Activity for multivariate regression -- Multi-level models -- Summary -- Chapter 5: Machine Learning with Python -- Machine learning and train/test -- Unsupervised learning -- Supervised learning -- Evaluating supervised learning -- K-fold cross validation -- Using train/test to prevent overfitting of a polynomial regression -- Activity -- Bayesian methods -- Concepts -- Implementing a spam classifier with Naïve Bayes -- Activity -- K-Means clustering -- Limitations to k-means clustering -- Clustering people based on income and age -- Activity -- Measuring entropy -- Decision trees -- Concepts -- Decision tree example -- Walking through a decision tree -- Random forests technique -- Decision trees -- Predicting hiring decisions using Python -- Ensemble learning -- Using a random forest -- Activity -- Ensemble learning -- Support vector machine overview -- Using SVM to cluster people by using scikit-learn -- Activity -- Summary -- Chapter 6: Recommender Systems -- What are recommender systems? -- User-based collaborative filtering -- Limitations of user-based collaborative filtering -- Item-based collaborative filtering -- Understanding item-based collaborative filtering -- How item-based collaborative filtering works? -- Collaborative filtering using Python -- Finding movie similarities -- Understanding the code -- The corrwith function -- Improving the results of movie similarities -- Making movie recommendations to people -- Understanding movie recommendations with an example -- Using the groupby command to combine rows -- Removing entries with the drop command -- Improving the recommendation results -- Summary. 
Chapter 7: More Data Mining and Machine Learning Techniques -- K-nearest neighbors -- concepts -- Using KNN to predict a rating for a movie -- Activity -- Dimensionality reduction and principal component analysis -- Dimensionality reduction -- Principal component analysis -- A PCA example with the Iris dataset -- Activity -- Data warehousing overview -- ETL versus ELT -- Reinforcement learning -- Q-learning -- The exploration problem -- The simple approach -- The better way -- Fancy words -- Markov decision process -- Dynamic programming -- Summary -- Chapter 8: Dealing with Real-World Data -- Bias/variance trade-off -- K-fold cross-validation to avoid overfitting -- Example of k-fold cross-validation using scikit-learn -- Data cleaning and normalisation -- Cleaning web log data -- Applying a regular expression on the web log -- Modification one -- filtering the request field -- Modification two -- filtering post requests -- Modification three -- checking the user agents -- Filtering the activity of spiders/robots -- Modification four -- applying website-specific filters -- Activity for web log data -- Normalizing numerical data -- Detecting outliers -- Dealing with outliers -- Activity for outliers -- Summary -- Chapter 9: Apache Spark -- Machine Learning on Big Data -- Installing Spark -- Installing Spark on Windows -- Installing Spark on other operating systems -- Installing the Java Development Kit -- Installing Spark -- Spark introduction -- It's scalable -- It's fast -- It's young -- It's not difficult -- Components of Spark -- Python versus Scala for Spark -- Spark and Resilient Distributed Datasets (RDD) -- The SparkContext object -- Creating RDDs -- Creating an RDD using a Python list -- Loading an RDD from a text file -- More ways to create RDDs -- RDD operations -- Transformations -- Using map() -- Actions -- Introducing MLlib. 
Some MLlib Capabilities -- Special MLlib data types -- The vector data type -- LabeledPoint data type -- Rating data type -- Decision Trees in Spark with MLlib -- Exploring decision trees code -- Creating the SparkContext -- Importing and cleaning our data -- Creating a test candidate and building our decision tree -- Running the script -- K-Means Clustering in Spark -- Within set sum of squared errors (WSSSE) -- Running the code -- TF-IDF -- TF-IDF in practice -- Using TF-IDF -- Searching Wikipedia with Spark MLlib -- Import statements -- Creating the initial RDD -- Creating and transforming a HashingTF object -- Computing the TF-IDF score -- Using the Wikipedia search engine algorithm -- Running the algorithm -- Using the Spark 2.0 DataFrame API for MLlib -- How Spark 2.0 MLlib works -- Implementing linear regression -- Summary -- Chapter 10: Testing and Experimental Design -- A/B testing concepts -- A/B tests -- Measuring conversion for A/B testing -- How to attribute conversions -- Variance is your enemy -- T-test and p-value -- The t-statistic or t-test -- The p-value -- Measuring t-statistics and p-values using Python -- Running A/B test on some experimental data -- When there's no real difference between the two groups -- Does the sample size make a difference? -- Sample size increased to six-digits -- Sample size increased to seven-digits -- A/A testing -- Determining how long to run an experiment for -- A/B test gotchas -- Novelty effects -- Seasonal effects -- Selection bias -- Auditing selection bias issues -- Data pollution -- Attribution errors -- Summary -- Index. Spark (Electronic resource : Apache Software Foundation) http://id.loc.gov/authorities/names/no2015027445 Spark (Electronic resource : Apache Software Foundation) fast Python (Computer program language) http://id.loc.gov/authorities/subjects/sh96008834 Machine learning. http://id.loc.gov/authorities/subjects/sh85079324 Data mining. 
http://id.loc.gov/authorities/subjects/sh97002073 Artificial intelligence. http://id.loc.gov/authorities/subjects/sh85008180 Data Mining https://id.nlm.nih.gov/mesh/D057225 Artificial Intelligence https://id.nlm.nih.gov/mesh/D001185 Machine Learning https://id.nlm.nih.gov/mesh/D000069550 Python (Langage de programmation) Apprentissage automatique. Exploration de données (Informatique) Intelligence artificielle. artificial intelligence. aat COMPUTERS Programming General. bisacsh Artificial intelligence fast Data mining fast Machine learning fast Python (Computer program language) fast |
subject_GND | http://id.loc.gov/authorities/names/no2015027445 http://id.loc.gov/authorities/subjects/sh96008834 http://id.loc.gov/authorities/subjects/sh85079324 http://id.loc.gov/authorities/subjects/sh97002073 http://id.loc.gov/authorities/subjects/sh85008180 https://id.nlm.nih.gov/mesh/D057225 https://id.nlm.nih.gov/mesh/D001185 https://id.nlm.nih.gov/mesh/D000069550 |
title | Hands-on data science and Python machine learning : perform data mining and machine learning efficiently using Python and Spark / |
title_full | Hands-on data science and Python machine learning : perform data mining and machine learning efficiently using Python and Spark / Frank Kane. |
title_short | Hands-on data science and Python machine learning : |
title_sort | hands on data science and python machine learning perform data mining and machine learning efficiently using python and spark |
title_sub | perform data mining and machine learning efficiently using Python and Spark / |
topic | Spark (Electronic resource : Apache Software Foundation) http://id.loc.gov/authorities/names/no2015027445 Spark (Electronic resource : Apache Software Foundation) fast Python (Computer program language) http://id.loc.gov/authorities/subjects/sh96008834 Machine learning. http://id.loc.gov/authorities/subjects/sh85079324 Data mining. http://id.loc.gov/authorities/subjects/sh97002073 Artificial intelligence. http://id.loc.gov/authorities/subjects/sh85008180 Data Mining https://id.nlm.nih.gov/mesh/D057225 Artificial Intelligence https://id.nlm.nih.gov/mesh/D001185 Machine Learning https://id.nlm.nih.gov/mesh/D000069550 Python (Langage de programmation) Apprentissage automatique. Exploration de données (Informatique) Intelligence artificielle. artificial intelligence. aat COMPUTERS Programming General. bisacsh Artificial intelligence fast Data mining fast Machine learning fast Python (Computer program language) fast |
topic_facet | Spark (Electronic resource : Apache Software Foundation) Python (Computer program language) Machine learning. Data mining. Artificial intelligence. Data Mining Artificial Intelligence Machine Learning Python (Langage de programmation) Apprentissage automatique. Exploration de données (Informatique) Intelligence artificielle. artificial intelligence. COMPUTERS Programming General. Artificial intelligence Data mining Machine learning |
url | https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1566405 |