Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / James D. Miller
Main author: | Miller, James D. |
---|---|
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham, UK : Packt Publishing, 2017. |
Subjects: | Statistics; Big data |
Online access: | Full text |
Summary: | Get your statistics basics right before diving into the world of data science. About This Book: No need to take a degree in statistics; read this book and get a strong statistics base for data science and real-world programs. Implement statistics in data science tasks such as data cleaning, mining, and analysis. Learn all about probability, statistics, numerical computations, and more with the help of R programs. Who This Book Is For: This book is intended for developers who want to enter the field of data science and are looking for concise information on statistics, supported by insightful programs and simple explanations. Some basic hands-on experience with R will be useful. What You Will Learn: Analyze the transition from a data developer to a data scientist mindset. Get acquainted with the R programs and the logic used for statistical computations. Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more. Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis. Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. Get comfortable with performing various statistical computations for data science programmatically. In Detail: Data science is an ever-evolving field that is growing rapidly in popularity. It includes techniques and theories drawn from statistics, computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable using various statistical methods for data science tasks. It starts off with simple statistics and then moves on to statistical methods that are used in data science algorithms. The R programs for statistical computation are clearly explained along with their logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. By the end of the book, you will be comfortab ... |
Description: | 1 online resource (1 volume) : illustrations |
ISBN: | 9781788295345 178829534X 1788290674 9781788290678 |
Internal format
MARC
LEADER | 00000cam a2200000 i 4500 | ||
---|---|---|---|
001 | ZDB-4-EBA-on1017754186 | ||
003 | OCoLC | ||
005 | 20241004212047.0 | ||
006 | m o d | ||
007 | cr unu|||||||| | ||
008 | 180104s2017 enka o 000 0 eng d | ||
040 | |a UMI |b eng |e rda |e pn |c UMI |d IDEBK |d TOH |d STF |d OCLCF |d N$T |d SNM |d CEF |d KSU |d DEBBG |d TEFOD |d G3B |d S9I |d UAB |d QGK |d OCLCO |d OCLCQ |d OCLCO |d K6U |d OCLCQ |d OCLCO |d OCLCL |d DXU | ||
020 | |a 9781788295345 |q (electronic bk.) | ||
020 | |a 178829534X |q (electronic bk.) | ||
020 | |a 1788290674 | ||
020 | |a 9781788290678 | ||
020 | |z 9781788290678 | ||
035 | |a (OCoLC)1017754186 | ||
037 | |a CL0500000922 |b Safari Books Online | ||
037 | |a 8A0E96E4-6AE2-4399-BE71-56ACDE2E5ED3 |b OverDrive, Inc. |n http://www.overdrive.com | ||
050 | 4 | |a QA276.4.M55 |b S73 2017eb | |
072 | 7 | |a COM |x 018000 |2 bisacsh | |
082 | 7 | |a 005.7565 |2 23 | |
049 | |a MAIN | ||
100 | 1 | |a Miller, James D. |c (Software consultant), |e author. |0 http://id.loc.gov/authorities/names/nb2016005442 | |
245 | 1 | 0 | |a Statistics for data science : |b leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / |c James D. Miller. |
264 | 1 | |a Birmingham, UK : |b Packt Publishing, |c 2017. | |
300 | |a 1 online resource (1 volume) : |b illustrations | ||
336 | |a text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
347 | |a data file | ||
588 | 0 | |a Online resource; title from title page (viewed January 2, 2018). | |
520 | |a Get your statistics basics right before diving into the world of data science About This Book No need to take a degree in statistics, read this book and get a strong statistics base for data science and real-world programs; Implement statistics in data science tasks such as data cleaning, mining, and analysis Learn all about probability, statistics, numerical computations, and more with the help of R programs Who This Book Is For This book is intended for those developers who are willing to enter the field of data science and are looking for concise information of statistics with the help of insightful programs and simple explanation. Some basic hands on R will be useful. What You Will Learn Analyze the transition from a data developer to a data scientist mindset Get acquainted with the R programs and the logic used for statistical computations Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks Get comfortable with performing various statistical computations for data science programmatically In Detail Data science is an ever-evolving field, which is growing in popularity at an exponential rate. Data science includes techniques and theories extracted from the fields of statistics; computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable in using various statistical methods for data science tasks. It starts off with simple statistics and then move on to statistical methods that are used in data science algorithms. 
The R programs for statistical computation are clearly explained along with logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. By the end of the book, you will be comfortab ... | ||
505 | 0 | |a Cover -- Copyright -- Credits -- About the Author -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Transitioning from Data Developer to Data Scientist -- Data developer thinking -- Objectives of a data developer -- Querying or mining -- Data quality or data cleansing -- Data modeling -- Issue or insights -- Thought process -- Developer versus scientist -- New data, new source -- Quality questions -- Querying and mining -- Performance -- Financial reporting -- Visualizing -- Tools of the trade -- Advantages of thinking like a data scientist -- Developing a better approach to understanding data -- Using statistical thinking during program or database designing -- Adding to your personal toolbox -- Increased marketability -- Perpetual learning -- Seeing the future -- Transitioning to a data scientist -- Let's move ahead -- Summary -- Chapter 2: Declaring the Objectives -- Key objectives of data science -- Collecting data -- Processing data -- Exploring and visualizing data -- Analyzing the data and/or applying machine learning to the data -- Deciding (or planning) based upon acquired insight -- Thinking like a data scientist -- Bringing statistics into data science -- Common terminology -- Statistical population -- Probability -- False positives -- Statistical inference -- Regression -- Fitting -- Categorical data -- Classification -- Clustering -- Statistical comparison -- Coding -- Distributions -- Data mining -- Decision trees -- Machine learning -- Munging and wrangling -- Visualization -- D3 -- Regularization -- Assessment -- Cross-validation -- Neural networks -- Boosting -- Lift -- Mode -- Outlier -- Predictive modeling -- Big Data -- Confidence interval -- Writing -- Summary -- Chapter 3: A Developer's Approach to Data Cleaning -- Understanding basic data cleaning. | |
505 | 8 | |a Common data issues -- Contextual data issues -- Cleaning techniques -- R and common data issues -- Outliers -- Step 1 -- Profiling the data -- Step 2 -- Addressing the outliers -- Domain expertise -- Validity checking -- Enhancing data -- Harmonization -- Standardization -- Transformations -- Deductive correction -- Deterministic imputation -- Summary -- Chapter 4: Data Mining and the Database Developer -- Data mining -- Common techniques -- Visualization -- Cluster analysis -- Correlation analysis -- Discriminant analysis -- Factor analysis -- Regression analysis -- Logistic analysis -- Purpose -- Mining versus querying -- Choosing R for data mining -- Visualizations -- Current smokers -- Missing values -- A cluster analysis -- Dimensional reduction -- Calculating statistical significance -- Frequent patterning -- Frequent item-setting -- Sequence mining -- Summary -- Chapter 5: Statistical Analysis for the Database Developer -- Data analysis -- Looking closer -- Statistical analysis -- Summarization -- Comparing groups -- Samples -- Group comparison conclusions -- Summarization modeling -- Establishing the nature of data -- Successful statistical analysis -- R and statistical analysis -- Summary -- Chapter 6: Database Progression to Database Regression -- Introducing statistical regression -- Techniques and approaches for regression -- Choosing your technique -- Does it fit? -- Identifying opportunities for statistical regression -- Summarizing data -- Exploring relationships -- Testing significance of differences -- Project profitability -- R and statistical regression -- A working example -- Establishing the data profile -- The graphical analysis -- Predicting with our linear model -- Step 1: Chunking the data -- Step 2: Creating the model on the training data -- Step 3: Predicting the projected profit on test data -- Step 4: Reviewing the model. | |
505 | 8 | |a Step 4: Accuracy and error -- Summary -- Chapter 7: Regularization for Database Improvement -- Statistical regularization -- Various statistical regularization methods -- Ridge -- Lasso -- Least angles -- Opportunities for regularization -- Collinearity -- Sparse solutions -- High-dimensional data -- Classification -- Using data to understand statistical regularization -- Improving data or a data model -- Simplification -- Relevance -- Speed -- Transformation -- Variation of coefficients -- Casual inference -- Back to regularization -- Reliability -- Using R for statistical regularization -- Parameter Setup -- Summary -- Chapter 8: Database Development and Assessment -- Assessment and statistical assessment -- Objectives -- Baselines -- Planning for assessment -- Evaluation -- Development versus assessment -- Planning -- Data assessment and data quality assurance -- Categorizing quality -- Relevance -- Cross-validation -- Preparing data -- R and statistical assessment -- Questions to ask -- Learning curves -- Example of a learning curve -- Summary -- Chapter 9: Databases and Neural Networks -- Ask any data scientist -- Defining neural network -- Nodes -- Layers -- Training -- Solution -- Understanding the concepts -- Neural network models and database models -- No single or main node -- Not serial -- No memory address to store results -- R-based neural networks -- References -- Data prep and preprocessing -- Data splitting -- Model parameters -- Cross-validation -- R packages for ANN development -- ANN -- ANN2 -- NNET -- Black boxes -- A use case -- Popular use cases -- Character recognition -- Image compression -- Stock market prediction -- Fraud detection -- Neuroscience -- Summary -- Chapter 10: Boosting your Database -- Definition and purpose -- Bias -- Categorizing bias -- Causes of bias -- Bias data collection -- Bias sample selection. | |
505 | 8 | |a Variance -- ANOVA -- Noise -- Noisy data -- Weak and strong learners -- Weak to strong -- Model bias -- Training and prediction time -- Complexity -- Which way? -- Back to boosting -- How it started -- AdaBoost -- What you can learn from boosting (to help) your database -- Using R to illustrate boosting methods -- Prepping the data -- Training -- Ready for boosting -- Example results -- Summary -- Chapter 11: Database Classification using Support Vector Machines -- Database classification -- Data classification in statistics -- Guidelines for classifying data -- Common guidelines -- Definitions -- Definition and purpose of an SVM -- The trick -- Feature space and cheap computations -- Drawing the line -- More than classification -- Downside -- Reference resources -- Predicting credit scores -- Using R and an SVM to classify data in a database -- Moving on -- Summary -- Chapter 12: Database Structures and Machine Learning -- Data structures and data models -- Data structures -- Data models -- What's the difference? -- Relationships -- Machine learning -- Overview of machine learning concepts -- Key elements of machine learning -- Representation -- Evaluation -- Optimization -- Types of machine learning -- Supervised learning -- Unsupervised learning -- Semi-supervised learning -- Reinforcement learning -- Most popular -- Applications of machine learning -- Machine learning in practice -- Understanding -- Preparation -- Learning -- Interpretation -- Deployment -- Iteration -- Using R to apply machine learning techniques to a database -- Understanding the data -- Preparing -- Data developer -- Understanding the challenge -- Cross-tabbing and plotting -- Summary -- Index. | |
650 | 0 | |a Statistics. |0 http://id.loc.gov/authorities/subjects/sh85127580 | |
650 | 0 | |a Big data. |0 http://id.loc.gov/authorities/subjects/sh2012003227 | |
650 | 6 | |a Données volumineuses. | |
650 | 6 | |a Statistique. | |
650 | 7 | |a statistics. |2 aat | |
650 | 7 | |a COMPUTERS |x Data Processing. |2 bisacsh | |
650 | 7 | |a Big data |2 fast | |
650 | 7 | |a Statistics |2 fast | |
758 | |i has work: |a Statistics for data science (Text) |1 https://id.oclc.org/worldcat/entity/E39PCFw4V4C6wbP7myyQPcFKr3 |4 https://id.oclc.org/worldcat/ontology/hasWork | ||
856 | 4 | 0 | |l FWS01 |p ZDB-4-EBA |q FWS_PDA_EBA |u https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1636280 |3 Volltext |
938 | |a EBSCOhost |b EBSC |n 1636280 | ||
938 | |a ProQuest MyiLibrary Digital eBook Collection |b IDEB |n cis38039490 | ||
994 | |a 92 |b GEBAY | ||
912 | |a ZDB-4-EBA | ||
049 | |a DE-863 |
Record in the search index
DE-BY-FWS_katkey | ZDB-4-EBA-on1017754186 |
---|---|
_version_ | 1816882409571876864 |
adam_text | |
any_adam_object | |
author | Miller, James D. (Software consultant) |
author_GND | http://id.loc.gov/authorities/names/nb2016005442 |
author_facet | Miller, James D. (Software consultant) |
author_role | aut |
author_sort | Miller, James D. (Software consultant) |
author_variant | j d m jd jdm |
building | Verbundindex |
bvnumber | localFWS |
callnumber-first | Q - Science |
callnumber-label | QA276 |
callnumber-raw | QA276.4.M55 S73 2017eb |
callnumber-search | QA276.4.M55 S73 2017eb |
callnumber-sort | QA 3276.4 M55 S73 42017EB |
callnumber-subject | QA - Mathematics |
collection | ZDB-4-EBA |
contents | Cover -- Copyright -- Credits -- About the Author -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Transitioning from Data Developer to Data Scientist -- Data developer thinking -- Objectives of a data developer -- Querying or mining -- Data quality or data cleansing -- Data modeling -- Issue or insights -- Thought process -- Developer versus scientist -- New data, new source -- Quality questions -- Querying and mining -- Performance -- Financial reporting -- Visualizing -- Tools of the trade -- Advantages of thinking like a data scientist -- Developing a better approach to understanding data -- Using statistical thinking during program or database designing -- Adding to your personal toolbox -- Increased marketability -- Perpetual learning -- Seeing the future -- Transitioning to a data scientist -- Let's move ahead -- Summary -- Chapter 2: Declaring the Objectives -- Key objectives of data science -- Collecting data -- Processing data -- Exploring and visualizing data -- Analyzing the data and/or applying machine learning to the data -- Deciding (or planning) based upon acquired insight -- Thinking like a data scientist -- Bringing statistics into data science -- Common terminology -- Statistical population -- Probability -- False positives -- Statistical inference -- Regression -- Fitting -- Categorical data -- Classification -- Clustering -- Statistical comparison -- Coding -- Distributions -- Data mining -- Decision trees -- Machine learning -- Munging and wrangling -- Visualization -- D3 -- Regularization -- Assessment -- Cross-validation -- Neural networks -- Boosting -- Lift -- Mode -- Outlier -- Predictive modeling -- Big Data -- Confidence interval -- Writing -- Summary -- Chapter 3: A Developer's Approach to Data Cleaning -- Understanding basic data cleaning. 
Common data issues -- Contextual data issues -- Cleaning techniques -- R and common data issues -- Outliers -- Step 1 -- Profiling the data -- Step 2 -- Addressing the outliers -- Domain expertise -- Validity checking -- Enhancing data -- Harmonization -- Standardization -- Transformations -- Deductive correction -- Deterministic imputation -- Summary -- Chapter 4: Data Mining and the Database Developer -- Data mining -- Common techniques -- Visualization -- Cluster analysis -- Correlation analysis -- Discriminant analysis -- Factor analysis -- Regression analysis -- Logistic analysis -- Purpose -- Mining versus querying -- Choosing R for data mining -- Visualizations -- Current smokers -- Missing values -- A cluster analysis -- Dimensional reduction -- Calculating statistical significance -- Frequent patterning -- Frequent item-setting -- Sequence mining -- Summary -- Chapter 5: Statistical Analysis for the Database Developer -- Data analysis -- Looking closer -- Statistical analysis -- Summarization -- Comparing groups -- Samples -- Group comparison conclusions -- Summarization modeling -- Establishing the nature of data -- Successful statistical analysis -- R and statistical analysis -- Summary -- Chapter 6: Database Progression to Database Regression -- Introducing statistical regression -- Techniques and approaches for regression -- Choosing your technique -- Does it fit? -- Identifying opportunities for statistical regression -- Summarizing data -- Exploring relationships -- Testing significance of differences -- Project profitability -- R and statistical regression -- A working example -- Establishing the data profile -- The graphical analysis -- Predicting with our linear model -- Step 1: Chunking the data -- Step 2: Creating the model on the training data -- Step 3: Predicting the projected profit on test data -- Step 4: Reviewing the model. 
Step 4: Accuracy and error -- Summary -- Chapter 7: Regularization for Database Improvement -- Statistical regularization -- Various statistical regularization methods -- Ridge -- Lasso -- Least angles -- Opportunities for regularization -- Collinearity -- Sparse solutions -- High-dimensional data -- Classification -- Using data to understand statistical regularization -- Improving data or a data model -- Simplification -- Relevance -- Speed -- Transformation -- Variation of coefficients -- Casual inference -- Back to regularization -- Reliability -- Using R for statistical regularization -- Parameter Setup -- Summary -- Chapter 8: Database Development and Assessment -- Assessment and statistical assessment -- Objectives -- Baselines -- Planning for assessment -- Evaluation -- Development versus assessment -- Planning -- Data assessment and data quality assurance -- Categorizing quality -- Relevance -- Cross-validation -- Preparing data -- R and statistical assessment -- Questions to ask -- Learning curves -- Example of a learning curve -- Summary -- Chapter 9: Databases and Neural Networks -- Ask any data scientist -- Defining neural network -- Nodes -- Layers -- Training -- Solution -- Understanding the concepts -- Neural network models and database models -- No single or main node -- Not serial -- No memory address to store results -- R-based neural networks -- References -- Data prep and preprocessing -- Data splitting -- Model parameters -- Cross-validation -- R packages for ANN development -- ANN -- ANN2 -- NNET -- Black boxes -- A use case -- Popular use cases -- Character recognition -- Image compression -- Stock market prediction -- Fraud detection -- Neuroscience -- Summary -- Chapter 10: Boosting your Database -- Definition and purpose -- Bias -- Categorizing bias -- Causes of bias -- Bias data collection -- Bias sample selection. 
Variance -- ANOVA -- Noise -- Noisy data -- Weak and strong learners -- Weak to strong -- Model bias -- Training and prediction time -- Complexity -- Which way? -- Back to boosting -- How it started -- AdaBoost -- What you can learn from boosting (to help) your database -- Using R to illustrate boosting methods -- Prepping the data -- Training -- Ready for boosting -- Example results -- Summary -- Chapter 11: Database Classification using Support Vector Machines -- Database classification -- Data classification in statistics -- Guidelines for classifying data -- Common guidelines -- Definitions -- Definition and purpose of an SVM -- The trick -- Feature space and cheap computations -- Drawing the line -- More than classification -- Downside -- Reference resources -- Predicting credit scores -- Using R and an SVM to classify data in a database -- Moving on -- Summary -- Chapter 12: Database Structures and Machine Learning -- Data structures and data models -- Data structures -- Data models -- What's the difference? -- Relationships -- Machine learning -- Overview of machine learning concepts -- Key elements of machine learning -- Representation -- Evaluation -- Optimization -- Types of machine learning -- Supervised learning -- Unsupervised learning -- Semi-supervised learning -- Reinforcement learning -- Most popular -- Applications of machine learning -- Machine learning in practice -- Understanding -- Preparation -- Learning -- Interpretation -- Deployment -- Iteration -- Using R to apply machine learning techniques to a database -- Understanding the data -- Preparing -- Data developer -- Understanding the challenge -- Cross-tabbing and plotting -- Summary -- Index. |
ctrlnum | (OCoLC)1017754186 |
dewey-full | 005.7565 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security |
dewey-raw | 005.7565 |
dewey-search | 005.7565 |
dewey-sort | 15.7565 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Computer science |
format | Electronic eBook |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>12201cam a2200577 i 4500</leader><controlfield tag="001">ZDB-4-EBA-on1017754186</controlfield><controlfield tag="003">OCoLC</controlfield><controlfield tag="005">20241004212047.0</controlfield><controlfield tag="006">m o d </controlfield><controlfield tag="007">cr unu||||||||</controlfield><controlfield tag="008">180104s2017 enka o 000 0 eng d</controlfield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">UMI</subfield><subfield code="b">eng</subfield><subfield code="e">rda</subfield><subfield code="e">pn</subfield><subfield code="c">UMI</subfield><subfield code="d">IDEBK</subfield><subfield code="d">TOH</subfield><subfield code="d">STF</subfield><subfield code="d">OCLCF</subfield><subfield code="d">N$T</subfield><subfield code="d">SNM</subfield><subfield code="d">CEF</subfield><subfield code="d">KSU</subfield><subfield code="d">DEBBG</subfield><subfield code="d">TEFOD</subfield><subfield code="d">G3B</subfield><subfield code="d">S9I</subfield><subfield code="d">UAB</subfield><subfield code="d">QGK</subfield><subfield code="d">OCLCO</subfield><subfield code="d">OCLCQ</subfield><subfield code="d">OCLCO</subfield><subfield code="d">K6U</subfield><subfield code="d">OCLCQ</subfield><subfield code="d">OCLCO</subfield><subfield code="d">OCLCL</subfield><subfield code="d">DXU</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781788295345</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">178829534X</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1788290674</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781788290678</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield 
code="z">9781788290678</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1017754186</subfield></datafield><datafield tag="037" ind1=" " ind2=" "><subfield code="a">CL0500000922</subfield><subfield code="b">Safari Books Online</subfield></datafield><datafield tag="037" ind1=" " ind2=" "><subfield code="a">8A0E96E4-6AE2-4399-BE71-56ACDE2E5ED3</subfield><subfield code="b">OverDrive, Inc.</subfield><subfield code="n">http://www.overdrive.com</subfield></datafield><datafield tag="050" ind1=" " ind2="4"><subfield code="a">QA276.4.M55</subfield><subfield code="b">S73 2017eb</subfield></datafield><datafield tag="072" ind1=" " ind2="7"><subfield code="a">COM</subfield><subfield code="x">018000</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="082" ind1="7" ind2=" "><subfield code="a">005.7565</subfield><subfield code="2">23</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">MAIN</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Miller, James D.</subfield><subfield code="c">(Software consultant),</subfield><subfield code="e">author.</subfield><subfield code="0">http://id.loc.gov/authorities/names/nb2016005442</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Statistics for data science :</subfield><subfield code="b">leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks /</subfield><subfield code="c">James D. 
Miller.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Birmingham, UK :</subfield><subfield code="b">Packt Publishing,</subfield><subfield code="c">2017.</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 online resource (1 volume) :</subfield><subfield code="b">illustrations</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">computer</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">online resource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="347" ind1=" " ind2=" "><subfield code="a">data file</subfield></datafield><datafield tag="588" ind1="0" ind2=" "><subfield code="a">Online resource; title from title page (viewed January 2, 2018).</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Get your statistics basics right before diving into the world of data science About This Book No need to take a degree in statistics, read this book and get a strong statistics base for data science and real-world programs; Implement statistics in data science tasks such as data cleaning, mining, and analysis Learn all about probability, statistics, numerical computations, and more with the help of R programs Who This Book Is For This book is intended for those developers who are willing to enter the field of data science and are looking for concise information of statistics with the help of insightful programs and simple explanation. Some basic hands on R will be useful. 
What You Will Learn Analyze the transition from a data developer to a data scientist mindset Get acquainted with the R programs and the logic used for statistical computations Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks Get comfortable with performing various statistical computations for data science programmatically In Detail Data science is an ever-evolving field, which is growing in popularity at an exponential rate. Data science includes techniques and theories extracted from the fields of statistics; computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable in using various statistical methods for data science tasks. It starts off with simple statistics and then move on to statistical methods that are used in data science algorithms. The R programs for statistical computation are clearly explained along with logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. 
By the end of the book, you will be comfortab ...</subfield></datafield><datafield tag="505" ind1="0" ind2=" "><subfield code="a">Cover -- Copyright -- Credits -- About the Author -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Transitioning from Data Developer to Data Scientist -- Data developer thinking -- Objectives of a data developer -- Querying or mining -- Data quality or data cleansing -- Data modeling -- Issue or insights -- Thought process -- Developer versus scientist -- New data, new source -- Quality questions -- Querying and mining -- Performance -- Financial reporting -- Visualizing -- Tools of the trade -- Advantages of thinking like a data scientist -- Developing a better approach to understanding data -- Using statistical thinking during program or database designing -- Adding to your personal toolbox -- Increased marketability -- Perpetual learning -- Seeing the future -- Transitioning to a data scientist -- Let's move ahead -- Summary -- Chapter 2: Declaring the Objectives -- Key objectives of data science -- Collecting data -- Processing data -- Exploring and visualizing data -- Analyzing the data and/or applying machine learning to the data -- Deciding (or planning) based upon acquired insight -- Thinking like a data scientist -- Bringing statistics into data science -- Common terminology -- Statistical population -- Probability -- False positives -- Statistical inference -- Regression -- Fitting -- Categorical data -- Classification -- Clustering -- Statistical comparison -- Coding -- Distributions -- Data mining -- Decision trees -- Machine learning -- Munging and wrangling -- Visualization -- D3 -- Regularization -- Assessment -- Cross-validation -- Neural networks -- Boosting -- Lift -- Mode -- Outlier -- Predictive modeling -- Big Data -- Confidence interval -- Writing -- Summary -- Chapter 3: A Developer's Approach to Data Cleaning -- Understanding basic data 
cleaning.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Common data issues -- Contextual data issues -- Cleaning techniques -- R and common data issues -- Outliers -- Step 1 -- Profiling the data -- Step 2 -- Addressing the outliers -- Domain expertise -- Validity checking -- Enhancing data -- Harmonization -- Standardization -- Transformations -- Deductive correction -- Deterministic imputation -- Summary -- Chapter 4: Data Mining and the Database Developer -- Data mining -- Common techniques -- Visualization -- Cluster analysis -- Correlation analysis -- Discriminant analysis -- Factor analysis -- Regression analysis -- Logistic analysis -- Purpose -- Mining versus querying -- Choosing R for data mining -- Visualizations -- Current smokers -- Missing values -- A cluster analysis -- Dimensional reduction -- Calculating statistical significance -- Frequent patterning -- Frequent item-setting -- Sequence mining -- Summary -- Chapter 5: Statistical Analysis for the Database Developer -- Data analysis -- Looking closer -- Statistical analysis -- Summarization -- Comparing groups -- Samples -- Group comparison conclusions -- Summarization modeling -- Establishing the nature of data -- Successful statistical analysis -- R and statistical analysis -- Summary -- Chapter 6: Database Progression to Database Regression -- Introducing statistical regression -- Techniques and approaches for regression -- Choosing your technique -- Does it fit? 
-- Identifying opportunities for statistical regression -- Summarizing data -- Exploring relationships -- Testing significance of differences -- Project profitability -- R and statistical regression -- A working example -- Establishing the data profile -- The graphical analysis -- Predicting with our linear model -- Step 1: Chunking the data -- Step 2: Creating the model on the training data -- Step 3: Predicting the projected profit on test data -- Step 4: Reviewing the model.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Step 4: Accuracy and error -- Summary -- Chapter 7: Regularization for Database Improvement -- Statistical regularization -- Various statistical regularization methods -- Ridge -- Lasso -- Least angles -- Opportunities for regularization -- Collinearity -- Sparse solutions -- High-dimensional data -- Classification -- Using data to understand statistical regularization -- Improving data or a data model -- Simplification -- Relevance -- Speed -- Transformation -- Variation of coefficients -- Casual inference -- Back to regularization -- Reliability -- Using R for statistical regularization -- Parameter Setup -- Summary -- Chapter 8: Database Development and Assessment -- Assessment and statistical assessment -- Objectives -- Baselines -- Planning for assessment -- Evaluation -- Development versus assessment -- Planning -- Data assessment and data quality assurance -- Categorizing quality -- Relevance -- Cross-validation -- Preparing data -- R and statistical assessment -- Questions to ask -- Learning curves -- Example of a learning curve -- Summary -- Chapter 9: Databases and Neural Networks -- Ask any data scientist -- Defining neural network -- Nodes -- Layers -- Training -- Solution -- Understanding the concepts -- Neural network models and database models -- No single or main node -- Not serial -- No memory address to store results -- R-based neural networks -- References -- Data prep and preprocessing -- Data 
splitting -- Model parameters -- Cross-validation -- R packages for ANN development -- ANN -- ANN2 -- NNET -- Black boxes -- A use case -- Popular use cases -- Character recognition -- Image compression -- Stock market prediction -- Fraud detection -- Neuroscience -- Summary -- Chapter 10: Boosting your Database -- Definition and purpose -- Bias -- Categorizing bias -- Causes of bias -- Bias data collection -- Bias sample selection.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Variance -- ANOVA -- Noise -- Noisy data -- Weak and strong learners -- Weak to strong -- Model bias -- Training and prediction time -- Complexity -- Which way? -- Back to boosting -- How it started -- AdaBoost -- What you can learn from boosting (to help) your database -- Using R to illustrate boosting methods -- Prepping the data -- Training -- Ready for boosting -- Example results -- Summary -- Chapter 11: Database Classification using Support Vector Machines -- Database classification -- Data classification in statistics -- Guidelines for classifying data -- Common guidelines -- Definitions -- Definition and purpose of an SVM -- The trick -- Feature space and cheap computations -- Drawing the line -- More than classification -- Downside -- Reference resources -- Predicting credit scores -- Using R and an SVM to classify data in a database -- Moving on -- Summary -- Chapter 12: Database Structures and Machine Learning -- Data structures and data models -- Data structures -- Data models -- What's the difference? 
-- Relationships -- Machine learning -- Overview of machine learning concepts -- Key elements of machine learning -- Representation -- Evaluation -- Optimization -- Types of machine learning -- Supervised learning -- Unsupervised learning -- Semi-supervised learning -- Reinforcement learning -- Most popular -- Applications of machine learning -- Machine learning in practice -- Understanding -- Preparation -- Learning -- Interpretation -- Deployment -- Iteration -- Using R to apply machine learning techniques to a database -- Understanding the data -- Preparing -- Data developer -- Understanding the challenge -- Cross-tabbing and plotting -- Summary -- Index.</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Statistics.</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh85127580</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Big data.</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh2012003227</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Données volumineuses.</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Statistique.</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">statistics.</subfield><subfield code="2">aat</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">COMPUTERS</subfield><subfield code="x">Data Processing.</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Big data</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Statistics</subfield><subfield code="2">fast</subfield></datafield><datafield tag="758" ind1=" " ind2=" "><subfield code="i">has work:</subfield><subfield code="a">Statistics for data science (Text)</subfield><subfield 
code="1">https://id.oclc.org/worldcat/entity/E39PCFw4V4C6wbP7myyQPcFKr3</subfield><subfield code="4">https://id.oclc.org/worldcat/ontology/hasWork</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="l">FWS01</subfield><subfield code="p">ZDB-4-EBA</subfield><subfield code="q">FWS_PDA_EBA</subfield><subfield code="u">https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1636280</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">EBSCOhost</subfield><subfield code="b">EBSC</subfield><subfield code="n">1636280</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">ProQuest MyiLibrary Digital eBook Collection</subfield><subfield code="b">IDEB</subfield><subfield code="n">cis38039490</subfield></datafield><datafield tag="994" ind1=" " ind2=" "><subfield code="a">92</subfield><subfield code="b">GEBAY</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-4-EBA</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-863</subfield></datafield></record></collection> |
id | ZDB-4-EBA-on1017754186 |
illustrated | Illustrated |
indexdate | 2024-11-27T13:28:09Z |
institution | BVB |
isbn | 9781788295345 178829534X 1788290674 9781788290678 |
language | English |
oclc_num | 1017754186 |
owner | MAIN DE-863 DE-BY-FWS |
physical | 1 online resource (1 volume) : illustrations |
psigel | ZDB-4-EBA |
publishDate | 2017 |
publisher | Packt Publishing, |
record_format | marc |
spelling | Miller, James D. (Software consultant), author. http://id.loc.gov/authorities/names/nb2016005442 Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / James D. Miller. Birmingham, UK : Packt Publishing, 2017. 1 online resource (1 volume) : illustrations text txt rdacontent computer c rdamedia online resource cr rdacarrier data file Online resource; title from title page (viewed January 2, 2018). Get your statistics basics right before diving into the world of data science. About This Book: No need to take a degree in statistics: read this book and get a strong statistics base for data science and real-world programs; Implement statistics in data science tasks such as data cleaning, mining, and analysis; Learn all about probability, statistics, numerical computations, and more with the help of R programs. Who This Book Is For: This book is intended for developers who want to enter the field of data science and are looking for concise information on statistics with the help of insightful programs and simple explanations. Some basic hands-on experience with R will be useful. What You Will Learn: Analyze the transition from a data developer to a data scientist mindset; Get acquainted with the R programs and the logic used for statistical computations; Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more; Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis; Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks; Get comfortable with performing various statistical computations for data science programmatically. In Detail: Data science is an ever-evolving field, which is growing in popularity at an exponential rate. 
Data science includes techniques and theories extracted from the fields of statistics, computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable in using various statistical methods for data science tasks. It starts off with simple statistics and then moves on to statistical methods that are used in data science algorithms. The R programs for statistical computation are clearly explained along with their logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. By the end of the book, you will be comfortab ... 
Cover -- Copyright -- Credits -- About the Author -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Transitioning from Data Developer to Data Scientist -- Data developer thinking -- Objectives of a data developer -- Querying or mining -- Data quality or data cleansing -- Data modeling -- Issue or insights -- Thought process -- Developer versus scientist -- New data, new source -- Quality questions -- Querying and mining -- Performance -- Financial reporting -- Visualizing -- Tools of the trade -- Advantages of thinking like a data scientist -- Developing a better approach to understanding data -- Using statistical thinking during program or database designing -- Adding to your personal toolbox -- Increased marketability -- Perpetual learning -- Seeing the future -- Transitioning to a data scientist -- Let's move ahead -- Summary -- Chapter 2: Declaring the Objectives -- Key objectives of data science -- Collecting data -- Processing data -- Exploring and visualizing data -- Analyzing the data and/or applying machine learning to the data -- Deciding (or planning) based upon acquired insight -- Thinking like a data scientist -- Bringing statistics into data science -- Common terminology -- Statistical population -- Probability -- False positives -- Statistical inference -- Regression -- Fitting -- Categorical data -- Classification -- Clustering -- Statistical comparison -- Coding -- Distributions -- Data mining -- Decision trees -- Machine learning -- Munging and wrangling -- Visualization -- D3 -- Regularization -- Assessment -- Cross-validation -- Neural networks -- Boosting -- Lift -- Mode -- Outlier -- Predictive modeling -- Big Data -- Confidence interval -- Writing -- Summary -- Chapter 3: A Developer's Approach to Data Cleaning -- Understanding basic data cleaning. 
Common data issues -- Contextual data issues -- Cleaning techniques -- R and common data issues -- Outliers -- Step 1 -- Profiling the data -- Step 2 -- Addressing the outliers -- Domain expertise -- Validity checking -- Enhancing data -- Harmonization -- Standardization -- Transformations -- Deductive correction -- Deterministic imputation -- Summary -- Chapter 4: Data Mining and the Database Developer -- Data mining -- Common techniques -- Visualization -- Cluster analysis -- Correlation analysis -- Discriminant analysis -- Factor analysis -- Regression analysis -- Logistic analysis -- Purpose -- Mining versus querying -- Choosing R for data mining -- Visualizations -- Current smokers -- Missing values -- A cluster analysis -- Dimensional reduction -- Calculating statistical significance -- Frequent patterning -- Frequent item-setting -- Sequence mining -- Summary -- Chapter 5: Statistical Analysis for the Database Developer -- Data analysis -- Looking closer -- Statistical analysis -- Summarization -- Comparing groups -- Samples -- Group comparison conclusions -- Summarization modeling -- Establishing the nature of data -- Successful statistical analysis -- R and statistical analysis -- Summary -- Chapter 6: Database Progression to Database Regression -- Introducing statistical regression -- Techniques and approaches for regression -- Choosing your technique -- Does it fit? -- Identifying opportunities for statistical regression -- Summarizing data -- Exploring relationships -- Testing significance of differences -- Project profitability -- R and statistical regression -- A working example -- Establishing the data profile -- The graphical analysis -- Predicting with our linear model -- Step 1: Chunking the data -- Step 2: Creating the model on the training data -- Step 3: Predicting the projected profit on test data -- Step 4: Reviewing the model. 
Step 4: Accuracy and error -- Summary -- Chapter 7: Regularization for Database Improvement -- Statistical regularization -- Various statistical regularization methods -- Ridge -- Lasso -- Least angles -- Opportunities for regularization -- Collinearity -- Sparse solutions -- High-dimensional data -- Classification -- Using data to understand statistical regularization -- Improving data or a data model -- Simplification -- Relevance -- Speed -- Transformation -- Variation of coefficients -- Casual inference -- Back to regularization -- Reliability -- Using R for statistical regularization -- Parameter Setup -- Summary -- Chapter 8: Database Development and Assessment -- Assessment and statistical assessment -- Objectives -- Baselines -- Planning for assessment -- Evaluation -- Development versus assessment -- Planning -- Data assessment and data quality assurance -- Categorizing quality -- Relevance -- Cross-validation -- Preparing data -- R and statistical assessment -- Questions to ask -- Learning curves -- Example of a learning curve -- Summary -- Chapter 9: Databases and Neural Networks -- Ask any data scientist -- Defining neural network -- Nodes -- Layers -- Training -- Solution -- Understanding the concepts -- Neural network models and database models -- No single or main node -- Not serial -- No memory address to store results -- R-based neural networks -- References -- Data prep and preprocessing -- Data splitting -- Model parameters -- Cross-validation -- R packages for ANN development -- ANN -- ANN2 -- NNET -- Black boxes -- A use case -- Popular use cases -- Character recognition -- Image compression -- Stock market prediction -- Fraud detection -- Neuroscience -- Summary -- Chapter 10: Boosting your Database -- Definition and purpose -- Bias -- Categorizing bias -- Causes of bias -- Bias data collection -- Bias sample selection. 
Variance -- ANOVA -- Noise -- Noisy data -- Weak and strong learners -- Weak to strong -- Model bias -- Training and prediction time -- Complexity -- Which way? -- Back to boosting -- How it started -- AdaBoost -- What you can learn from boosting (to help) your database -- Using R to illustrate boosting methods -- Prepping the data -- Training -- Ready for boosting -- Example results -- Summary -- Chapter 11: Database Classification using Support Vector Machines -- Database classification -- Data classification in statistics -- Guidelines for classifying data -- Common guidelines -- Definitions -- Definition and purpose of an SVM -- The trick -- Feature space and cheap computations -- Drawing the line -- More than classification -- Downside -- Reference resources -- Predicting credit scores -- Using R and an SVM to classify data in a database -- Moving on -- Summary -- Chapter 12: Database Structures and Machine Learning -- Data structures and data models -- Data structures -- Data models -- What's the difference? -- Relationships -- Machine learning -- Overview of machine learning concepts -- Key elements of machine learning -- Representation -- Evaluation -- Optimization -- Types of machine learning -- Supervised learning -- Unsupervised learning -- Semi-supervised learning -- Reinforcement learning -- Most popular -- Applications of machine learning -- Machine learning in practice -- Understanding -- Preparation -- Learning -- Interpretation -- Deployment -- Iteration -- Using R to apply machine learning techniques to a database -- Understanding the data -- Preparing -- Data developer -- Understanding the challenge -- Cross-tabbing and plotting -- Summary -- Index. Statistics. http://id.loc.gov/authorities/subjects/sh85127580 Big data. http://id.loc.gov/authorities/subjects/sh2012003227 Données volumineuses. Statistique. statistics. aat COMPUTERS Data Processing. 
bisacsh Big data fast Statistics fast has work: Statistics for data science (Text) https://id.oclc.org/worldcat/entity/E39PCFw4V4C6wbP7myyQPcFKr3 https://id.oclc.org/worldcat/ontology/hasWork FWS01 ZDB-4-EBA FWS_PDA_EBA https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1636280 Volltext |
subject_GND | http://id.loc.gov/authorities/subjects/sh85127580 http://id.loc.gov/authorities/subjects/sh2012003227 |
title | Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / |
title_full | Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / James D. Miller. |
title_short | Statistics for data science : |
title_sort | statistics for data science leverage the power of statistics for data analysis classification regression machine learning and neural networks |
title_sub | leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / |
topic | Statistics. http://id.loc.gov/authorities/subjects/sh85127580 Big data. http://id.loc.gov/authorities/subjects/sh2012003227 Données volumineuses. Statistique. statistics. aat COMPUTERS Data Processing. bisacsh Big data fast Statistics fast |
topic_facet | Statistics. Big data. Données volumineuses. Statistique. statistics. COMPUTERS Data Processing. Big data Statistics |
url | https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1636280 |