Internal format: Statistics for data science

Statistics for data science: leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks

Bibliographic details
Main author:	Miller, James D. (Software consultant) (author)
Format:	Electronic eBook
Language:	English
Published:	Birmingham, UK: Packt Publishing, 2017.
Subjects:	Statistics. Big data. Données volumineuses. Statistique. statistics. COMPUTERS > Data Processing. Big data. Statistics.
Online access:	Full text
Summary:	Get your statistics basics right before diving into the world of data science. About This Book: No need to take a degree in statistics; read this book and get a strong statistics base for data science and real-world programs. Implement statistics in data science tasks such as data cleaning, mining, and analysis. Learn all about probability, statistics, numerical computations, and more with the help of R programs. Who This Book Is For: This book is intended for developers who want to enter the field of data science and are looking for concise information on statistics, presented through insightful programs and simple explanations. Some basic hands-on experience with R will be useful. What You Will Learn: Analyze the transition from a data developer to a data scientist mindset. Get acquainted with the R programs and the logic used for statistical computations. Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more. Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis. Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. Get comfortable performing various statistical computations for data science programmatically. In Detail: Data science is an ever-evolving field that is growing in popularity at an exponential rate. It draws on techniques and theories from statistics, computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable using various statistical methods for data science tasks. It starts with simple statistics and then moves on to the statistical methods used in data science algorithms.
The R programs for statistical computation are clearly explained, along with their logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. By the end of the book, you will be comfortab ...
Description:	1 online resource (1 volume): illustrations
ISBN:	9781788295345 178829534X 1788290674 9781788290678

Internal format

MARC


LEADER	00000cam a2200000 i 4500
001	ZDB-4-EBA-on1017754186
003	OCoLC
005	20241004212047.0
006	m o d
007	cr unu\|\|\|\|\|\|\|\|
008	180104s2017 enka o 000 0 eng d
040			\|a UMI \|b eng \|e rda \|e pn \|c UMI \|d IDEBK \|d TOH \|d STF \|d OCLCF \|d N$T \|d SNM \|d CEF \|d KSU \|d DEBBG \|d TEFOD \|d G3B \|d S9I \|d UAB \|d QGK \|d OCLCO \|d OCLCQ \|d OCLCO \|d K6U \|d OCLCQ \|d OCLCO \|d OCLCL \|d DXU
020			\|a 9781788295345 \|q (electronic bk.)
020			\|a 178829534X \|q (electronic bk.)
020			\|a 1788290674
020			\|a 9781788290678
020			\|z 9781788290678
035			\|a (OCoLC)1017754186
037			\|a CL0500000922 \|b Safari Books Online
037			\|a 8A0E96E4-6AE2-4399-BE71-56ACDE2E5ED3 \|b OverDrive, Inc. \|n http://www.overdrive.com
050		4	\|a QA276.4.M55 \|b S73 2017eb
072		7	\|a COM \|x 018000 \|2 bisacsh
082	7		\|a 005.7565 \|2 23
049			\|a MAIN
100	1		\|a Miller, James D. \|c (Software consultant), \|e author. \|0 http://id.loc.gov/authorities/names/nb2016005442
245	1	0	\|a Statistics for data science : \|b leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / \|c James D. Miller.
264		1	\|a Birmingham, UK : \|b Packt Publishing, \|c 2017.
300			\|a 1 online resource (1 volume) : \|b illustrations
336			\|a text \|b txt \|2 rdacontent
337			\|a computer \|b c \|2 rdamedia
338			\|a online resource \|b cr \|2 rdacarrier
347			\|a data file
588	0		\|a Online resource; title from title page (viewed January 2, 2018).
520			\|a Get your statistics basics right before diving into the world of data science About This Book No need to take a degree in statistics, read this book and get a strong statistics base for data science and real-world programs; Implement statistics in data science tasks such as data cleaning, mining, and analysis Learn all about probability, statistics, numerical computations, and more with the help of R programs Who This Book Is For This book is intended for those developers who are willing to enter the field of data science and are looking for concise information of statistics with the help of insightful programs and simple explanation. Some basic hands on R will be useful. What You Will Learn Analyze the transition from a data developer to a data scientist mindset Get acquainted with the R programs and the logic used for statistical computations Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks Get comfortable with performing various statistical computations for data science programmatically In Detail Data science is an ever-evolving field, which is growing in popularity at an exponential rate. Data science includes techniques and theories extracted from the fields of statistics; computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable in using various statistical methods for data science tasks. It starts off with simple statistics and then move on to statistical methods that are used in data science algorithms. 
The R programs for statistical computation are clearly explained along with logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. By the end of the book, you will be comfortab ...
505	0		\|a Cover -- Copyright -- Credits -- About the Author -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Transitioning from Data Developer to Data Scientist -- Data developer thinking -- Objectives of a data developer -- Querying or mining -- Data quality or data cleansing -- Data modeling -- Issue or insights -- Thought process -- Developer versus scientist -- New data, new source -- Quality questions -- Querying and mining -- Performance -- Financial reporting -- Visualizing -- Tools of the trade -- Advantages of thinking like a data scientist -- Developing a better approach to understanding data -- Using statistical thinking during program or database designing -- Adding to your personal toolbox -- Increased marketability -- Perpetual learning -- Seeing the future -- Transitioning to a data scientist -- Let's move ahead -- Summary -- Chapter 2: Declaring the Objectives -- Key objectives of data science -- Collecting data -- Processing data -- Exploring and visualizing data -- Analyzing the data and/or applying machine learning to the data -- Deciding (or planning) based upon acquired insight -- Thinking like a data scientist -- Bringing statistics into data science -- Common terminology -- Statistical population -- Probability -- False positives -- Statistical inference -- Regression -- Fitting -- Categorical data -- Classification -- Clustering -- Statistical comparison -- Coding -- Distributions -- Data mining -- Decision trees -- Machine learning -- Munging and wrangling -- Visualization -- D3 -- Regularization -- Assessment -- Cross-validation -- Neural networks -- Boosting -- Lift -- Mode -- Outlier -- Predictive modeling -- Big Data -- Confidence interval -- Writing -- Summary -- Chapter 3: A Developer's Approach to Data Cleaning -- Understanding basic data cleaning.
505	8		\|a Common data issues -- Contextual data issues -- Cleaning techniques -- R and common data issues -- Outliers -- Step 1 -- Profiling the data -- Step 2 -- Addressing the outliers -- Domain expertise -- Validity checking -- Enhancing data -- Harmonization -- Standardization -- Transformations -- Deductive correction -- Deterministic imputation -- Summary -- Chapter 4: Data Mining and the Database Developer -- Data mining -- Common techniques -- Visualization -- Cluster analysis -- Correlation analysis -- Discriminant analysis -- Factor analysis -- Regression analysis -- Logistic analysis -- Purpose -- Mining versus querying -- Choosing R for data mining -- Visualizations -- Current smokers -- Missing values -- A cluster analysis -- Dimensional reduction -- Calculating statistical significance -- Frequent patterning -- Frequent item-setting -- Sequence mining -- Summary -- Chapter 5: Statistical Analysis for the Database Developer -- Data analysis -- Looking closer -- Statistical analysis -- Summarization -- Comparing groups -- Samples -- Group comparison conclusions -- Summarization modeling -- Establishing the nature of data -- Successful statistical analysis -- R and statistical analysis -- Summary -- Chapter 6: Database Progression to Database Regression -- Introducing statistical regression -- Techniques and approaches for regression -- Choosing your technique -- Does it fit? -- Identifying opportunities for statistical regression -- Summarizing data -- Exploring relationships -- Testing significance of differences -- Project profitability -- R and statistical regression -- A working example -- Establishing the data profile -- The graphical analysis -- Predicting with our linear model -- Step 1: Chunking the data -- Step 2: Creating the model on the training data -- Step 3: Predicting the projected profit on test data -- Step 4: Reviewing the model.
505	8		\|a Step 4: Accuracy and error -- Summary -- Chapter 7: Regularization for Database Improvement -- Statistical regularization -- Various statistical regularization methods -- Ridge -- Lasso -- Least angles -- Opportunities for regularization -- Collinearity -- Sparse solutions -- High-dimensional data -- Classification -- Using data to understand statistical regularization -- Improving data or a data model -- Simplification -- Relevance -- Speed -- Transformation -- Variation of coefficients -- Casual inference -- Back to regularization -- Reliability -- Using R for statistical regularization -- Parameter Setup -- Summary -- Chapter 8: Database Development and Assessment -- Assessment and statistical assessment -- Objectives -- Baselines -- Planning for assessment -- Evaluation -- Development versus assessment -- Planning -- Data assessment and data quality assurance -- Categorizing quality -- Relevance -- Cross-validation -- Preparing data -- R and statistical assessment -- Questions to ask -- Learning curves -- Example of a learning curve -- Summary -- Chapter 9: Databases and Neural Networks -- Ask any data scientist -- Defining neural network -- Nodes -- Layers -- Training -- Solution -- Understanding the concepts -- Neural network models and database models -- No single or main node -- Not serial -- No memory address to store results -- R-based neural networks -- References -- Data prep and preprocessing -- Data splitting -- Model parameters -- Cross-validation -- R packages for ANN development -- ANN -- ANN2 -- NNET -- Black boxes -- A use case -- Popular use cases -- Character recognition -- Image compression -- Stock market prediction -- Fraud detection -- Neuroscience -- Summary -- Chapter 10: Boosting your Database -- Definition and purpose -- Bias -- Categorizing bias -- Causes of bias -- Bias data collection -- Bias sample selection.
505	8		\|a Variance -- ANOVA -- Noise -- Noisy data -- Weak and strong learners -- Weak to strong -- Model bias -- Training and prediction time -- Complexity -- Which way? -- Back to boosting -- How it started -- AdaBoost -- What you can learn from boosting (to help) your database -- Using R to illustrate boosting methods -- Prepping the data -- Training -- Ready for boosting -- Example results -- Summary -- Chapter 11: Database Classification using Support Vector Machines -- Database classification -- Data classification in statistics -- Guidelines for classifying data -- Common guidelines -- Definitions -- Definition and purpose of an SVM -- The trick -- Feature space and cheap computations -- Drawing the line -- More than classification -- Downside -- Reference resources -- Predicting credit scores -- Using R and an SVM to classify data in a database -- Moving on -- Summary -- Chapter 12: Database Structures and Machine Learning -- Data structures and data models -- Data structures -- Data models -- What's the difference? -- Relationships -- Machine learning -- Overview of machine learning concepts -- Key elements of machine learning -- Representation -- Evaluation -- Optimization -- Types of machine learning -- Supervised learning -- Unsupervised learning -- Semi-supervised learning -- Reinforcement learning -- Most popular -- Applications of machine learning -- Machine learning in practice -- Understanding -- Preparation -- Learning -- Interpretation -- Deployment -- Iteration -- Using R to apply machine learning techniques to a database -- Understanding the data -- Preparing -- Data developer -- Understanding the challenge -- Cross-tabbing and plotting -- Summary -- Index.
650		0	\|a Statistics. \|0 http://id.loc.gov/authorities/subjects/sh85127580
650		0	\|a Big data. \|0 http://id.loc.gov/authorities/subjects/sh2012003227
650		6	\|a Données volumineuses.
650		6	\|a Statistique.
650		7	\|a statistics. \|2 aat
650		7	\|a COMPUTERS \|x Data Processing. \|2 bisacsh
650		7	\|a Big data \|2 fast
650		7	\|a Statistics \|2 fast
758			\|i has work: \|a Statistics for data science (Text) \|1 https://id.oclc.org/worldcat/entity/E39PCFw4V4C6wbP7myyQPcFKr3 \|4 https://id.oclc.org/worldcat/ontology/hasWork
856	4	0	\|l FWS01 \|p ZDB-4-EBA \|q FWS_PDA_EBA \|u https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1636280 \|3 Volltext
938			\|a EBSCOhost \|b EBSC \|n 1636280
938			\|a ProQuest MyiLibrary Digital eBook Collection \|b IDEB \|n cis38039490
994			\|a 92 \|b GEBAY
912			\|a ZDB-4-EBA
049			\|a DE-863

Record in search index

DE-BY-FWS_katkey	ZDB-4-EBA-on1017754186
_version_	1816882409571876864
adam_text
any_adam_object
author	Miller, James D. (Software consultant)
author_GND	http://id.loc.gov/authorities/names/nb2016005442
author_facet	Miller, James D. (Software consultant)
author_role	aut
author_sort	Miller, James D. (Software consultant)
author_variant	j d m jd jdm
building	Verbundindex
bvnumber	localFWS
callnumber-first	Q - Science
callnumber-label	QA276
callnumber-raw	QA276.4.M55 S73 2017eb
callnumber-search	QA276.4.M55 S73 2017eb
callnumber-sort	QA 3276.4 M55 S73 42017EB
callnumber-subject	QA - Mathematics
collection	ZDB-4-EBA
contents	Cover -- Copyright -- Credits -- About the Author -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Transitioning from Data Developer to Data Scientist -- Data developer thinking -- Objectives of a data developer -- Querying or mining -- Data quality or data cleansing -- Data modeling -- Issue or insights -- Thought process -- Developer versus scientist -- New data, new source -- Quality questions -- Querying and mining -- Performance -- Financial reporting -- Visualizing -- Tools of the trade -- Advantages of thinking like a data scientist -- Developing a better approach to understanding data -- Using statistical thinking during program or database designing -- Adding to your personal toolbox -- Increased marketability -- Perpetual learning -- Seeing the future -- Transitioning to a data scientist -- Let's move ahead -- Summary -- Chapter 2: Declaring the Objectives -- Key objectives of data science -- Collecting data -- Processing data -- Exploring and visualizing data -- Analyzing the data and/or applying machine learning to the data -- Deciding (or planning) based upon acquired insight -- Thinking like a data scientist -- Bringing statistics into data science -- Common terminology -- Statistical population -- Probability -- False positives -- Statistical inference -- Regression -- Fitting -- Categorical data -- Classification -- Clustering -- Statistical comparison -- Coding -- Distributions -- Data mining -- Decision trees -- Machine learning -- Munging and wrangling -- Visualization -- D3 -- Regularization -- Assessment -- Cross-validation -- Neural networks -- Boosting -- Lift -- Mode -- Outlier -- Predictive modeling -- Big Data -- Confidence interval -- Writing -- Summary -- Chapter 3: A Developer's Approach to Data Cleaning -- Understanding basic data cleaning. 
Common data issues -- Contextual data issues -- Cleaning techniques -- R and common data issues -- Outliers -- Step 1 -- Profiling the data -- Step 2 -- Addressing the outliers -- Domain expertise -- Validity checking -- Enhancing data -- Harmonization -- Standardization -- Transformations -- Deductive correction -- Deterministic imputation -- Summary -- Chapter 4: Data Mining and the Database Developer -- Data mining -- Common techniques -- Visualization -- Cluster analysis -- Correlation analysis -- Discriminant analysis -- Factor analysis -- Regression analysis -- Logistic analysis -- Purpose -- Mining versus querying -- Choosing R for data mining -- Visualizations -- Current smokers -- Missing values -- A cluster analysis -- Dimensional reduction -- Calculating statistical significance -- Frequent patterning -- Frequent item-setting -- Sequence mining -- Summary -- Chapter 5: Statistical Analysis for the Database Developer -- Data analysis -- Looking closer -- Statistical analysis -- Summarization -- Comparing groups -- Samples -- Group comparison conclusions -- Summarization modeling -- Establishing the nature of data -- Successful statistical analysis -- R and statistical analysis -- Summary -- Chapter 6: Database Progression to Database Regression -- Introducing statistical regression -- Techniques and approaches for regression -- Choosing your technique -- Does it fit? -- Identifying opportunities for statistical regression -- Summarizing data -- Exploring relationships -- Testing significance of differences -- Project profitability -- R and statistical regression -- A working example -- Establishing the data profile -- The graphical analysis -- Predicting with our linear model -- Step 1: Chunking the data -- Step 2: Creating the model on the training data -- Step 3: Predicting the projected profit on test data -- Step 4: Reviewing the model. 
Step 4: Accuracy and error -- Summary -- Chapter 7: Regularization for Database Improvement -- Statistical regularization -- Various statistical regularization methods -- Ridge -- Lasso -- Least angles -- Opportunities for regularization -- Collinearity -- Sparse solutions -- High-dimensional data -- Classification -- Using data to understand statistical regularization -- Improving data or a data model -- Simplification -- Relevance -- Speed -- Transformation -- Variation of coefficients -- Casual inference -- Back to regularization -- Reliability -- Using R for statistical regularization -- Parameter Setup -- Summary -- Chapter 8: Database Development and Assessment -- Assessment and statistical assessment -- Objectives -- Baselines -- Planning for assessment -- Evaluation -- Development versus assessment -- Planning -- Data assessment and data quality assurance -- Categorizing quality -- Relevance -- Cross-validation -- Preparing data -- R and statistical assessment -- Questions to ask -- Learning curves -- Example of a learning curve -- Summary -- Chapter 9: Databases and Neural Networks -- Ask any data scientist -- Defining neural network -- Nodes -- Layers -- Training -- Solution -- Understanding the concepts -- Neural network models and database models -- No single or main node -- Not serial -- No memory address to store results -- R-based neural networks -- References -- Data prep and preprocessing -- Data splitting -- Model parameters -- Cross-validation -- R packages for ANN development -- ANN -- ANN2 -- NNET -- Black boxes -- A use case -- Popular use cases -- Character recognition -- Image compression -- Stock market prediction -- Fraud detection -- Neuroscience -- Summary -- Chapter 10: Boosting your Database -- Definition and purpose -- Bias -- Categorizing bias -- Causes of bias -- Bias data collection -- Bias sample selection. 
Variance -- ANOVA -- Noise -- Noisy data -- Weak and strong learners -- Weak to strong -- Model bias -- Training and prediction time -- Complexity -- Which way? -- Back to boosting -- How it started -- AdaBoost -- What you can learn from boosting (to help) your database -- Using R to illustrate boosting methods -- Prepping the data -- Training -- Ready for boosting -- Example results -- Summary -- Chapter 11: Database Classification using Support Vector Machines -- Database classification -- Data classification in statistics -- Guidelines for classifying data -- Common guidelines -- Definitions -- Definition and purpose of an SVM -- The trick -- Feature space and cheap computations -- Drawing the line -- More than classification -- Downside -- Reference resources -- Predicting credit scores -- Using R and an SVM to classify data in a database -- Moving on -- Summary -- Chapter 12: Database Structures and Machine Learning -- Data structures and data models -- Data structures -- Data models -- What's the difference? -- Relationships -- Machine learning -- Overview of machine learning concepts -- Key elements of machine learning -- Representation -- Evaluation -- Optimization -- Types of machine learning -- Supervised learning -- Unsupervised learning -- Semi-supervised learning -- Reinforcement learning -- Most popular -- Applications of machine learning -- Machine learning in practice -- Understanding -- Preparation -- Learning -- Interpretation -- Deployment -- Iteration -- Using R to apply machine learning techniques to a database -- Understanding the data -- Preparing -- Data developer -- Understanding the challenge -- Cross-tabbing and plotting -- Summary -- Index.
ctrlnum	(OCoLC)1017754186
dewey-full	005.7565
dewey-hundreds	000 - Computer science, information, general works
dewey-ones	005 - Computer programming, programs, data, security
dewey-raw	005.7565
dewey-search	005.7565
dewey-sort	15.7565
dewey-tens	000 - Computer science, information, general works
discipline	Computer science
format	Electronic eBook
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>12201cam a2200577 i 4500</leader><controlfield tag="001">ZDB-4-EBA-on1017754186</controlfield><controlfield tag="003">OCoLC</controlfield><controlfield tag="005">20241004212047.0</controlfield><controlfield tag="006">m o d </controlfield><controlfield tag="007">cr unu\|\|\|\|\|\|\|\|</controlfield><controlfield tag="008">180104s2017 enka o 000 0 eng d</controlfield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">UMI</subfield><subfield code="b">eng</subfield><subfield code="e">rda</subfield><subfield code="e">pn</subfield><subfield code="c">UMI</subfield><subfield code="d">IDEBK</subfield><subfield code="d">TOH</subfield><subfield code="d">STF</subfield><subfield code="d">OCLCF</subfield><subfield code="d">N$T</subfield><subfield code="d">SNM</subfield><subfield code="d">CEF</subfield><subfield code="d">KSU</subfield><subfield code="d">DEBBG</subfield><subfield code="d">TEFOD</subfield><subfield code="d">G3B</subfield><subfield code="d">S9I</subfield><subfield code="d">UAB</subfield><subfield code="d">QGK</subfield><subfield code="d">OCLCO</subfield><subfield code="d">OCLCQ</subfield><subfield code="d">OCLCO</subfield><subfield code="d">K6U</subfield><subfield code="d">OCLCQ</subfield><subfield code="d">OCLCO</subfield><subfield code="d">OCLCL</subfield><subfield code="d">DXU</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781788295345</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">178829534X</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1788290674</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781788290678</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield 
code="z">9781788290678</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1017754186</subfield></datafield><datafield tag="037" ind1=" " ind2=" "><subfield code="a">CL0500000922</subfield><subfield code="b">Safari Books Online</subfield></datafield><datafield tag="037" ind1=" " ind2=" "><subfield code="a">8A0E96E4-6AE2-4399-BE71-56ACDE2E5ED3</subfield><subfield code="b">OverDrive, Inc.</subfield><subfield code="n">http://www.overdrive.com</subfield></datafield><datafield tag="050" ind1=" " ind2="4"><subfield code="a">QA276.4.M55</subfield><subfield code="b">S73 2017eb</subfield></datafield><datafield tag="072" ind1=" " ind2="7"><subfield code="a">COM</subfield><subfield code="x">018000</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="082" ind1="7" ind2=" "><subfield code="a">005.7565</subfield><subfield code="2">23</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">MAIN</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Miller, James D.</subfield><subfield code="c">(Software consultant),</subfield><subfield code="e">author.</subfield><subfield code="0">http://id.loc.gov/authorities/names/nb2016005442</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Statistics for data science :</subfield><subfield code="b">leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks /</subfield><subfield code="c">James D. 
Miller.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Birmingham, UK :</subfield><subfield code="b">Packt Publishing,</subfield><subfield code="c">2017.</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 online resource (1 volume) :</subfield><subfield code="b">illustrations</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">computer</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">online resource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="347" ind1=" " ind2=" "><subfield code="a">data file</subfield></datafield><datafield tag="588" ind1="0" ind2=" "><subfield code="a">Online resource; title from title page (viewed January 2, 2018).</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Get your statistics basics right before diving into the world of data science About This Book No need to take a degree in statistics, read this book and get a strong statistics base for data science and real-world programs; Implement statistics in data science tasks such as data cleaning, mining, and analysis Learn all about probability, statistics, numerical computations, and more with the help of R programs Who This Book Is For This book is intended for those developers who are willing to enter the field of data science and are looking for concise information of statistics with the help of insightful programs and simple explanation. Some basic hands on R will be useful. 
What You Will Learn Analyze the transition from a data developer to a data scientist mindset Get acquainted with the R programs and the logic used for statistical computations Understand mathematical concepts such as variance, standard deviation, probability, matrix calculations, and more Learn to implement statistics in data science tasks such as data cleaning, mining, and analysis Learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks Get comfortable with performing various statistical computations for data science programmatically In Detail Data science is an ever-evolving field, which is growing in popularity at an exponential rate. Data science includes techniques and theories extracted from the fields of statistics; computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable in using various statistical methods for data science tasks. It starts off with simple statistics and then move on to statistical methods that are used in data science algorithms. The R programs for statistical computation are clearly explained along with logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. 
By the end of the book, you will be comfortab ...</subfield></datafield><datafield tag="505" ind1="0" ind2=" "><subfield code="a">Cover -- Copyright -- Credits -- About the Author -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Transitioning from Data Developer to Data Scientist -- Data developer thinking -- Objectives of a data developer -- Querying or mining -- Data quality or data cleansing -- Data modeling -- Issue or insights -- Thought process -- Developer versus scientist -- New data, new source -- Quality questions -- Querying and mining -- Performance -- Financial reporting -- Visualizing -- Tools of the trade -- Advantages of thinking like a data scientist -- Developing a better approach to understanding data -- Using statistical thinking during program or database designing -- Adding to your personal toolbox -- Increased marketability -- Perpetual learning -- Seeing the future -- Transitioning to a data scientist -- Let's move ahead -- Summary -- Chapter 2: Declaring the Objectives -- Key objectives of data science -- Collecting data -- Processing data -- Exploring and visualizing data -- Analyzing the data and/or applying machine learning to the data -- Deciding (or planning) based upon acquired insight -- Thinking like a data scientist -- Bringing statistics into data science -- Common terminology -- Statistical population -- Probability -- False positives -- Statistical inference -- Regression -- Fitting -- Categorical data -- Classification -- Clustering -- Statistical comparison -- Coding -- Distributions -- Data mining -- Decision trees -- Machine learning -- Munging and wrangling -- Visualization -- D3 -- Regularization -- Assessment -- Cross-validation -- Neural networks -- Boosting -- Lift -- Mode -- Outlier -- Predictive modeling -- Big Data -- Confidence interval -- Writing -- Summary -- Chapter 3: A Developer's Approach to Data Cleaning -- Understanding basic data 
cleaning.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Common data issues -- Contextual data issues -- Cleaning techniques -- R and common data issues -- Outliers -- Step 1 -- Profiling the data -- Step 2 -- Addressing the outliers -- Domain expertise -- Validity checking -- Enhancing data -- Harmonization -- Standardization -- Transformations -- Deductive correction -- Deterministic imputation -- Summary -- Chapter 4: Data Mining and the Database Developer -- Data mining -- Common techniques -- Visualization -- Cluster analysis -- Correlation analysis -- Discriminant analysis -- Factor analysis -- Regression analysis -- Logistic analysis -- Purpose -- Mining versus querying -- Choosing R for data mining -- Visualizations -- Current smokers -- Missing values -- A cluster analysis -- Dimensional reduction -- Calculating statistical significance -- Frequent patterning -- Frequent item-setting -- Sequence mining -- Summary -- Chapter 5: Statistical Analysis for the Database Developer -- Data analysis -- Looking closer -- Statistical analysis -- Summarization -- Comparing groups -- Samples -- Group comparison conclusions -- Summarization modeling -- Establishing the nature of data -- Successful statistical analysis -- R and statistical analysis -- Summary -- Chapter 6: Database Progression to Database Regression -- Introducing statistical regression -- Techniques and approaches for regression -- Choosing your technique -- Does it fit? 
-- Identifying opportunities for statistical regression -- Summarizing data -- Exploring relationships -- Testing significance of differences -- Project profitability -- R and statistical regression -- A working example -- Establishing the data profile -- The graphical analysis -- Predicting with our linear model -- Step 1: Chunking the data -- Step 2: Creating the model on the training data -- Step 3: Predicting the projected profit on test data -- Step 4: Reviewing the model.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Step 4: Accuracy and error -- Summary -- Chapter 7: Regularization for Database Improvement -- Statistical regularization -- Various statistical regularization methods -- Ridge -- Lasso -- Least angles -- Opportunities for regularization -- Collinearity -- Sparse solutions -- High-dimensional data -- Classification -- Using data to understand statistical regularization -- Improving data or a data model -- Simplification -- Relevance -- Speed -- Transformation -- Variation of coefficients -- Causal inference -- Back to regularization -- Reliability -- Using R for statistical regularization -- Parameter Setup -- Summary -- Chapter 8: Database Development and Assessment -- Assessment and statistical assessment -- Objectives -- Baselines -- Planning for assessment -- Evaluation -- Development versus assessment -- Planning -- Data assessment and data quality assurance -- Categorizing quality -- Relevance -- Cross-validation -- Preparing data -- R and statistical assessment -- Questions to ask -- Learning curves -- Example of a learning curve -- Summary -- Chapter 9: Databases and Neural Networks -- Ask any data scientist -- Defining neural network -- Nodes -- Layers -- Training -- Solution -- Understanding the concepts -- Neural network models and database models -- No single or main node -- Not serial -- No memory address to store results -- R-based neural networks -- References -- Data prep and preprocessing -- Data 
splitting -- Model parameters -- Cross-validation -- R packages for ANN development -- ANN -- ANN2 -- NNET -- Black boxes -- A use case -- Popular use cases -- Character recognition -- Image compression -- Stock market prediction -- Fraud detection -- Neuroscience -- Summary -- Chapter 10: Boosting your Database -- Definition and purpose -- Bias -- Categorizing bias -- Causes of bias -- Bias data collection -- Bias sample selection.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">Variance -- ANOVA -- Noise -- Noisy data -- Weak and strong learners -- Weak to strong -- Model bias -- Training and prediction time -- Complexity -- Which way? -- Back to boosting -- How it started -- AdaBoost -- What you can learn from boosting (to help) your database -- Using R to illustrate boosting methods -- Prepping the data -- Training -- Ready for boosting -- Example results -- Summary -- Chapter 11: Database Classification using Support Vector Machines -- Database classification -- Data classification in statistics -- Guidelines for classifying data -- Common guidelines -- Definitions -- Definition and purpose of an SVM -- The trick -- Feature space and cheap computations -- Drawing the line -- More than classification -- Downside -- Reference resources -- Predicting credit scores -- Using R and an SVM to classify data in a database -- Moving on -- Summary -- Chapter 12: Database Structures and Machine Learning -- Data structures and data models -- Data structures -- Data models -- What's the difference? 
-- Relationships -- Machine learning -- Overview of machine learning concepts -- Key elements of machine learning -- Representation -- Evaluation -- Optimization -- Types of machine learning -- Supervised learning -- Unsupervised learning -- Semi-supervised learning -- Reinforcement learning -- Most popular -- Applications of machine learning -- Machine learning in practice -- Understanding -- Preparation -- Learning -- Interpretation -- Deployment -- Iteration -- Using R to apply machine learning techniques to a database -- Understanding the data -- Preparing -- Data developer -- Understanding the challenge -- Cross-tabbing and plotting -- Summary -- Index.</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Statistics.</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh85127580</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Big data.</subfield><subfield code="0">http://id.loc.gov/authorities/subjects/sh2012003227</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Données volumineuses.</subfield></datafield><datafield tag="650" ind1=" " ind2="6"><subfield code="a">Statistique.</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">statistics.</subfield><subfield code="2">aat</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">COMPUTERS</subfield><subfield code="x">Data Processing.</subfield><subfield code="2">bisacsh</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Big data</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Statistics</subfield><subfield code="2">fast</subfield></datafield><datafield tag="758" ind1=" " ind2=" "><subfield code="i">has work:</subfield><subfield code="a">Statistics for data science (Text)</subfield><subfield 
code="1">https://id.oclc.org/worldcat/entity/E39PCFw4V4C6wbP7myyQPcFKr3</subfield><subfield code="4">https://id.oclc.org/worldcat/ontology/hasWork</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="l">FWS01</subfield><subfield code="p">ZDB-4-EBA</subfield><subfield code="q">FWS_PDA_EBA</subfield><subfield code="u">https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1636280</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">EBSCOhost</subfield><subfield code="b">EBSC</subfield><subfield code="n">1636280</subfield></datafield><datafield tag="938" ind1=" " ind2=" "><subfield code="a">ProQuest MyiLibrary Digital eBook Collection</subfield><subfield code="b">IDEB</subfield><subfield code="n">cis38039490</subfield></datafield><datafield tag="994" ind1=" " ind2=" "><subfield code="a">92</subfield><subfield code="b">GEBAY</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-4-EBA</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-863</subfield></datafield></record></collection>
id	ZDB-4-EBA-on1017754186
illustrated	Illustrated
indexdate	2024-11-27T13:28:09Z
institution	BVB
isbn	9781788295345 178829534X 1788290674 9781788290678
language	English
oclc_num	1017754186
open_access_boolean
owner	MAIN DE-863 DE-BY-FWS
owner_facet	MAIN DE-863 DE-BY-FWS
physical	1 online resource (1 volume) : illustrations
psigel	ZDB-4-EBA
publishDate	2017
publishDateSearch	2017
publishDateSort	2017
publisher	Packt Publishing,
record_format	marc
spelling	Miller, James D. (Software consultant), author. http://id.loc.gov/authorities/names/nb2016005442 Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / James D. Miller. Birmingham, UK : Packt Publishing, 2017. 1 online resource (1 volume) : illustrations text txt rdacontent computer c rdamedia online resource cr rdacarrier data file Online resource; title from title page (viewed January 2, 2018).
subject_GND	http://id.loc.gov/authorities/subjects/sh85127580 http://id.loc.gov/authorities/subjects/sh2012003227
title	Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks /
title_auth	Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks /
title_exact_search	Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks /
title_full	Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / James D. Miller.
title_fullStr	Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / James D. Miller.
title_full_unstemmed	Statistics for data science : leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks / James D. Miller.
title_short	Statistics for data science :
title_sort	statistics for data science leverage the power of statistics for data analysis classification regression machine learning and neural networks
title_sub	leverage the power of statistics for data analysis, classification, regression, machine learning, and neural networks /
topic	Statistics. http://id.loc.gov/authorities/subjects/sh85127580 Big data. http://id.loc.gov/authorities/subjects/sh2012003227 Données volumineuses. Statistique. statistics. aat COMPUTERS Data Processing. bisacsh Big data fast Statistics fast
topic_facet	Statistics. Big data. Données volumineuses. Statistique. statistics. COMPUTERS Data Processing. Big data Statistics
url	https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1636280
work_keys_str_mv	AT millerjamesd statisticsfordatascienceleveragethepowerofstatisticsfordataanalysisclassificationregressionmachinelearningandneuralnetworks
