Machine learning for knowledge discovery with R: methodologies for modeling, inference and prediction
"Machine Learning for Knowledge Discovery with R contains methodologies and examples for statistical modelling, inference, and prediction of data analysis. It includes many recent supervised and unsupervised machine learning methodologies such as recursive partitioning modelling, regularized re...
Gespeichert in:
| | |
|---|---|
| Main Author: | Tsai, Kao-Tai |
| Format: | Book |
| Language: | English |
| Published: | Boca Raton ; London ; New York : CRC Press, 2022 |
| Edition: | First edition |
| Subjects: | Data mining / Methodology; Machine learning; R (Computer program language) |
| Online Access: | Table of Contents |
| Summary: | "Machine Learning for Knowledge Discovery with R contains methodologies and examples for statistical modelling, inference, and prediction of data analysis. It includes many recent supervised and unsupervised machine learning methodologies such as recursive partitioning modelling, regularized regression, support vector machine, neural network, clustering, and causal-effect inference. Additionally, it emphasizes statistical thinking of data analysis, use of statistical graphs for data structure exploration, and result presentations. The book includes many real-world data examples from life-science, finance, etc. to illustrate the applications of the methods described therein"-- |
| Physical Description: | XV, 244 pages : diagrams ; 24 cm |
| ISBN: | 9781032065366 (hbk); 9781032071596 (pbk) |
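The summary names regularized regression among the book's methodologies. As a hedged illustration only, the minimal sketch below shows a lasso fit in R with the CRAN package glmnet, a package commonly used for this (the data are simulated and the example is not taken from the book):

```r
# Minimal sketch of regularized (lasso) regression in R with glmnet.
# All data and parameter choices here are invented for illustration.
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100)   # 100 observations, 20 predictors
beta <- c(rep(1.5, 3), rep(0, 17))         # only the first 3 predictors are active
y <- as.vector(x %*% beta + rnorm(100))

# alpha = 1 selects the lasso penalty; values between 0 and 1 give the elastic net
cv_fit <- cv.glmnet(x, y, alpha = 1)
coef(cv_fit, s = "lambda.min")             # sparse coefficient estimates
```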
Internal Format
MARC
| Tag | Ind | Content |
|---|---|---|
| LEADER | | 00000nam a2200000 c 4500 |
| 001 | | BV047657454 |
| 003 | | DE-604 |
| 005 | | 20231201 |
| 007 | | t |
| 008 | | 220103s2022 \|\|\|\| \|\|\|\| 00\|\|\| eng d |
| 015 | | $a GBC1B4055 $2 dnb |
| 020 | | $a 9781032065366 $c hbk $9 978-1-032-06536-6 |
| 020 | | $a 9781032071596 $c pbk $9 978-1-032-07159-6 |
| 035 | | $a (OCoLC)1296280482 |
| 035 | | $a (DE-599)BVBBV047657454 |
| 040 | | $a DE-604 $b ger $e rda |
| 041 | 0 | $a eng |
| 049 | | $a DE-739 |
| 084 | | $a ST 530 $0 (DE-625)143679: $2 rvk |
| 100 | 1 | $a Tsai, Kao-Tai $e Verfasser $0 (DE-588)1245968041 $4 aut |
| 245 | 1 0 | $a Machine learning for knowledge discovery with R $b methodologies for modeling, inference and prediction $c Kao-Tai Tsai |
| 250 | | $a First edition |
| 264 | 1 | $a Boca Raton ; London ; New York $b CRC Press $c 2022 |
| 300 | | $a XV, 244 Seiten $b Diagramme $c 24 cm |
| 336 | | $b txt $2 rdacontent |
| 337 | | $b n $2 rdamedia |
| 338 | | $b nc $2 rdacarrier |
| 520 | | $a "Machine Learning for Knowledge Discovery with R contains methodologies and examples for statistical modelling, inference, and prediction of data analysis. It includes many recent supervised and unsupervised machine learning methodologies such as recursive partitioning modelling, regularized regression, support vector machine, neural network, clustering, and causal-effect inference. Additionally, it emphasizes statistical thinking of data analysis, use of statistical graphs for data structure exploration, and result presentations. The book includes many real-world data examples from life-science, finance, etc. to illustrate the applications of the methods described therein"-- |
| 650 | 4 | $a Data mining / Methodology |
| 650 | 4 | $a Machine learning |
| 650 | 4 | $a R (Computer program language) |
| 650 | 7 | $a Machine learning $2 fast |
| 650 | 7 | $a R (Computer program language) $2 fast |
| 650 | 0 7 | $a Maschinelles Lernen $0 (DE-588)4193754-5 $2 gnd $9 rswk-swf |
| 650 | 0 7 | $a R $g Programm $0 (DE-588)4705956-4 $2 gnd $9 rswk-swf |
| 689 | 0 0 | $a Maschinelles Lernen $0 (DE-588)4193754-5 $D s |
| 689 | 0 1 | $a R $g Programm $0 (DE-588)4705956-4 $D s |
| 689 | 0 | $5 DE-604 |
| 776 | 0 8 | $i Erscheint auch als $n Online-Ausgabe $z 978-1-003-20568-5 |
| 856 | 4 2 | $m Digitalisierung UB Passau - ADAM Catalogue Enrichment $q application/pdf $u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033042354&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA $3 Inhaltsverzeichnis |
| 999 | | $a oai:aleph.bib-bvb.de:BVB01-033042354 |
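The table above is the record's MARC view. As a hedged sketch (the local file name is hypothetical; the tag and subfield codes match the record above), a MARCXML export of such a record can be read in R with the CRAN package xml2:

```r
# Hedged sketch: reading a MARCXML export of this record with xml2.
# MARCXML uses the default namespace http://www.loc.gov/MARC21/slim,
# which xml2 binds to the prefix "d1".
library(xml2)

rec <- read_xml("BV047657454.marcxml")     # hypothetical local copy of the record
ns  <- xml_ns(rec)                         # namespace map; default namespace -> d1

title <- xml_find_first(rec, ".//d1:datafield[@tag='245']/d1:subfield[@code='a']", ns)
xml_text(title)    # "Machine learning for knowledge discovery with R"

isbns <- xml_find_all(rec, ".//d1:datafield[@tag='020']/d1:subfield[@code='a']", ns)
xml_text(isbns)    # "9781032065366" "9781032071596"
```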
Table of Contents

- Preface
- 1 Data Analysis
  - 1.1 Perspectives of Data Analysis
  - 1.2 Strategies and Stages of Data Analysis
  - 1.3 Data Quality
    - 1.3.1 Heterogeneity in Data Sources
      - 1.3.1.1 Heterogeneity in Study Subject Populations
      - 1.3.1.2 Heterogeneity in Data due to Timing of Generations
    - 1.3.2 Noise Accumulation
    - 1.3.3 Spurious Correlation
    - 1.3.4 Missing Data
  - 1.4 Data Sets Analyzed in This Book
    - 1.4.1 NCI-60
    - 1.4.2 Riboflavin Production with Bacillus Subtilis
    - 1.4.3 TCGA
    - 1.4.4 The Boston Housing Data Set
- 2 Examining Data Distribution
  - 2.1 One Dimension
    - 2.1.1 Histogram, Stem-and-Leaf, Density Plot
    - 2.1.2 Box Plot
    - 2.1.3 Quantile-Quantile (Q-Q) Plot, Normal Plot, Probability-Probability (P-P) Plot
  - 2.2 Two Dimension
    - 2.2.1 Scatter Plot
    - 2.2.2 Ellipse - Visualization of Covariance and Correlation
    - 2.2.3 Multivariate Normality Test
  - 2.3 More Than Two Dimension
    - 2.3.1 Scatter Plot Matrix
    - 2.3.2 Andrews's Plot
    - 2.3.3 Conditional Plot
  - 2.4 Visualization of Categorical Data
    - 2.4.1 Mosaic Plot
    - 2.4.2 Association Plot
- 3 Regressions
  - 3.1 Ridge Regression
  - 3.2 Lasso
    - 3.2.1 Example: Lasso on Continuous Data
    - 3.2.2 Example: Lasso on Binary Data
    - 3.2.3 Example: Lasso on Survival Data
  - 3.3 Group Lasso
    - 3.3.1 Example: Group Lasso on Gene Signatures
  - 3.4 Sparse Group Lasso
    - 3.4.1 Example: Lasso, Group Lasso, Sparse Group Lasso on Simulated Continuous Data
    - 3.4.2 Example: Lasso, Group Lasso, Sparse Group Lasso on Gene Signatures Continuous Data
  - 3.5 Adaptive Lasso
    - 3.5.1 Example: Adaptive Lasso on Continuous Data
    - 3.5.2 Example: Adaptive Lasso on Binary Data
  - 3.6 Elastic Net
    - 3.6.1 Example: Elastic Net on Continuous Data
    - 3.6.2 Example: Elastic Net on Binary Data
  - 3.7 The Sure Screening Method
    - 3.7.1 The Sure Screening Method
    - 3.7.2 Sure Independence Screening on Model Selection
    - 3.7.3 Example: SIS on Continuous Data
    - 3.7.4 Example: SIS on Survival Data
  - 3.8 Identify Minimal Class of Models
    - 3.8.1 Analysis Using Minimal Models
- 4 Recursive Partitioning Modeling
  - 4.1 Recursive Partitioning Modeling via Trees
    - 4.1.1 Elements of Growing a Tree
      - 4.1.1.1 Grow a Tree
    - 4.1.2 The Impurity Function
      - 4.1.2.1 Definition of Impurity Function
      - 4.1.2.2 Measure of Node Impurity - the Gini Index
    - 4.1.3 Misclassification Cost
    - 4.1.4 Size of Trees
    - 4.1.5 Example of Recursive Partitioning
      - 4.1.5.1 Recursive Partitioning with Binary Outcomes
      - 4.1.5.2 Recursive Partitioning with Continuous Outcomes
      - 4.1.5.3 Recursive Partitioning for Survival Outcomes
  - 4.2 Random Forest
    - 4.2.1 Mechanism of Action of Random Forests
    - 4.2.2 Variable Importance
    - 4.2.3 Random Forests for Regression
    - 4.2.4 Example of Random Forest Data Analysis
      - 4.2.4.1 randomForest for Binary Data
      - 4.2.4.2 randomForest for Continuous Data
  - 4.3 Random Survival Forest
    - 4.3.1 Algorithm to Construct RSF
    - 4.3.2 Individual and Ensemble Estimate at Terminal Nodes
    - 4.3.3 VIMP
    - 4.3.4 Example
  - 4.4 XGBoost: A Tree Boosting System
    - 4.4.1 Example Using xgboost for Data Analysis
      - 4.4.1.1 xgboost for Binary Data
      - 4.4.1.2 xgboost for Continuous Data
    - 4.4.2 Example - xgboost for Cox Regression
  - 4.5 Model-based Recursive Partitioning
    - 4.5.1 The Recursive Partitioning Algorithm
    - 4.5.2 Example
  - 4.6 Recursive Partition for Longitudinal Data
    - 4.6.1 Methodology
    - 4.6.2 Recursive Partition for Longitudinal Data Based on Baseline Covariates
      - 4.6.2.1 Methodology
    - 4.6.3 LongCART Algorithm
    - 4.6.4 Example of Recursive Partitioning of Longitudinal Data
  - 4.7 Analysis of Ordinal Data
  - 4.8 Examples - Analysis of Ordinal Data
    - 4.8.1 Analysis of Cleveland Clinic Heart Data (Ordinal)
    - 4.8.2 Analysis of Cleveland Clinic Heart Data (Twoing)
  - 4.9 Advantages and Disadvantages of Trees
- 5 Support Vector Machine
  - 5.1 General Theory of Classification and Regression in Hyperplane
    - 5.1.1 Separable Case
    - 5.1.2 Non-separable Case
      - 5.1.2.1 Method of Stochastic Approximation
      - 5.1.2.2 Method of Sigmoid Approximations
      - 5.1.2.3 Method of Radial Basis Functions
  - 5.2 SVM for Indicator Functions
    - 5.2.1 Optimal Hyperplane for Separable Data Sets
      - 5.2.1.1 Constructing the Optimal Hyperplane
    - 5.2.2 Optimal Hyperplane for Non-Separable Sets
      - 5.2.2.1 Generalization of the Optimal Hyperplane
    - 5.2.3 Support Vector Machine
    - 5.2.4 Constructing SVM
      - 5.2.4.1 Polynomial Kernel Functions
      - 5.2.4.2 Radial Basis Kernel Functions
    - 5.2.5 Example: Analysis of Binary Classification Using SVM
    - 5.2.6 Example: Effect of Kernel Selection
  - 5.3 SVM for Continuous Data
    - 5.3.1 Minimizing the Risk with ε-insensitive Loss Functions
    - 5.3.2 Example: Regression Analysis Using SVM
  - 5.4 SVM for Survival Data Analysis
    - 5.4.1 Example: Analysis of Survival Data Using SVM
  - 5.5 Feature Elimination for SVM
    - 5.5.1 Example: Gene Selection via SVM with Feature Elimination
  - 5.6 Sparse Bayesian Learning with Relevance Vector Machine (RVM)
    - 5.6.1 Example: Regression Analysis Using RVM
    - 5.6.2 Example: Curve Fitting for SVM and RVM
  - 5.7 SV Machines for Function Estimation
- 6 Cluster Analysis
  - 6.1 Measure of Distance/Dissimilarity
    - 6.1.1 Continuous Variables
    - 6.1.2 Binary and Categorical Variables
    - 6.1.3 Mixed Data Types
    - 6.1.4 Other Measure of Dissimilarity
  - 6.2 Hierarchical Clustering
    - 6.2.1 Options of Linkage
    - 6.2.2 Example of Hierarchical Clustering
  - 6.3 K-means Cluster
    - 6.3.1 General Description of K-means Clustering
    - 6.3.2 Estimating the Number of Clusters
  - 6.4 The PAM Clustering Algorithm
    - 6.4.1 Example of K-means with PAM Clustering Algorithm
  - 6.5 Bagged Clustering
    - 6.5.1 Example of Bagged Clustering
  - 6.6 RandomForest for Clustering
    - 6.6.1 Example: Random Forest for Clustering
  - 6.7 Mixture Models/Model-based Cluster Analysis
  - 6.8 Stability of Clusters
  - 6.9 Consensus Clustering
    - 6.9.1 Determination of Clusters
    - 6.9.2 Example of Consensus Clustering on RNA Sequence Data
  - 6.10 The Integrative Clustering Framework
    - 6.10.1 Example: Integrative Clustering
- 7 Neural Network
  - 7.1 General Theory of Neural Network
  - 7.2 Elemental Aspects and Structure of Artificial Neural Networks
  - 7.3 Multilayer Perceptrons
    - 7.3.1 The Simple (Single Unit) Perceptron
    - 7.3.2 Training Perceptron Learning
  - 7.4 Multilayer Perceptrons (MLP)
    - 7.4.1 Architectures of MLP
    - 7.4.2 Training MLP
  - 7.5 Deep Learning
    - 7.5.1 Model Parameterization
  - 7.6 Few Pros and Cons of Neural Networks
  - 7.7 Examples
- 8 Causal Inference and Matching
  - 8.1 Introduction
  - 8.2 Three Layer Causal Hierarchy
  - 8.3 Seven Tools of Causal Inference
  - 8.4 Statistical Framework of Causal Inferences
  - 8.5 Propensity Score
  - 8.6 Methodologies of Matching
    - 8.6.1 Nearest Neighbor (or greedy) Matching
      - 8.6.1.1 Example Using Nearest Neighbor Matching
    - 8.6.2 Exact Matching
      - 8.6.2.1 Example
    - 8.6.3 Mahalanobis Distance Matching
      - 8.6.3.1 Example
    - 8.6.4 Genetic Matching
      - 8.6.4.1 Example
  - 8.7 Optimal Matching
    - 8.7.0.1 Example
  - 8.8 Full Matching
    - 8.8.0.1 Example
    - 8.8.1 Analysis of Data After Matching
      - 8.8.1.1 Example
  - 8.9 Cluster Matching
    - 8.9.1 Example
- 9 Business
  - 9.1 Case Study One: Marketing Campaigns of a Portuguese Banking Institution
    - 9.1.1 Description of Data
    - 9.1.2 Data Analysis
      - 9.1.2.1 Analysis via Lasso
      - 9.1.2.2 Analysis via Elastic Net
      - 9.1.2.3 Analysis via SIS
      - 9.1.2.4 Analysis via rpart
      - 9.1.2.5 Analysis via randomForest
      - 9.1.2.6 Analysis via xgboost
  - 9.2 Summary
  - 9.3 Case Study Two: Polish Companies Bankruptcy Data
    - 9.3.1 Description of Data
    - 9.3.2 Data Analysis
      - 9.3.2.1 Analysis of Year-1 Data (univariate analysis)
      - 9.3.2.2 Analysis of Year-3 Data (univariate analysis)
      - 9.3.2.3 Analysis of Year-5 Data (univariate analysis)
      - 9.3.2.4 Analysis of Year-1 Data (composite analysis)
      - 9.3.2.5 Analysis of Year-3 Data (composite analysis)
      - 9.3.2.6 Analysis of Year-5 Data (composite analysis)
  - 9.4 Summary
- 10 Analysis of Response Profiles
  - 10.1 Introduction
  - 10.2 Data Example
  - 10.3 Transition of Response States
  - 10.4 Classification of Response Profiles
    - 10.4.1 Dissimilarities Between Response Profiles
    - 10.4.2 Visualizing Clusters via Multidimensional Scaling
    - 10.4.3 Response Profile Differences among Clusters
    - 10.4.4 Significant Clinical Variables for Each Cluster
  - 10.5 Modeling of Response Profiles via GEE
    - 10.5.1 Marginal Models
    - 10.5.2 Estimation of Marginal Regression Parameters
    - 10.5.3 Local Odds Ratio
    - 10.5.4 Results of Modeling
  - 10.6 Summary
- Bibliography
- Index
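The section titles in Chapters 4 and 9 name the R packages rpart, randomForest, and xgboost. As a hedged sketch of the first two on built-in data (the iris data set stands in here; these are not the book's own examples):

```r
# Hedged sketch of the tree-based methods named in the table of contents:
# a classification tree with rpart and a random forest with randomForest.
library(rpart)
library(randomForest)

# Recursive partitioning: grow a classification tree
# (rpart splits on the Gini index by default for classification)
tree <- rpart(Species ~ ., data = iris, method = "class")
print(tree)

# Random forest with variable importance, as in Section 4.2.2
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)
importance(rf)   # permutation (accuracy) and Gini importance measures
```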