Data mining with R: learning with case studies
Author: | Torgo, Luís |
---|---|
Format: | Book |
Language: | English |
Published: | Boca Raton [et al.] : CRC Press, 2011 |
Series: | Chapman & Hall/CRC data mining and knowledge discovery series; A Chapman & Hall book |
Subjects: | R (software); Data mining; Case studies |
Online access: | Table of contents |
Description: | XV, 289 pages, diagrams |
ISBN: | 9781439810187 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV037226564 | ||
003 | DE-604 | ||
005 | 20170622 | ||
007 | t | ||
008 | 110215s2011 d||| |||| 00||| eng d | ||
020 | |a 9781439810187 |9 978-1-439-81018-7 | ||
035 | |a (OCoLC)706877793 | ||
035 | |a (DE-599)GBV621536733 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-634 |a DE-824 |a DE-945 |a DE-859 |a DE-384 |a DE-20 |a DE-355 | ||
084 | |a SK 850 |0 (DE-625)143263: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
084 | |a ST 601 |0 (DE-625)143682: |2 rvk | ||
100 | 1 | |a Torgo, Luís |e Verfasser |4 aut | |
245 | 1 | 0 | |a Data mining with R |b learning with case studies |c Luís Torgo |
264 | 1 | |a Boca Raton [u.a.] |b CRC Press |c 2011 | |
300 | |a XV, 289 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Chapman & Hall/CRC data mining and knowledge discovery series | |
490 | 0 | |a A Chapman & Hall book | |
650 | 0 | 7 | |a R |g Programm |0 (DE-588)4705956-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
655 | 7 | |8 1\p |0 (DE-588)4522595-3 |a Fallstudiensammlung |2 gnd-content | |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | 1 | |a R |g Programm |0 (DE-588)4705956-4 |D s |
689 | 0 | |C b |5 DE-604 | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021140331&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-021140331 | ||
883 | 1 | |8 1\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk |
Search index record
_version_ | 1804143829868085248 |
---|---|
adam_text | Title: Data mining with R
Author: Torgo, Luís
Year: 2011
Contents
Preface ix
Acknowledgments xi
List of Figures xiii
List of Tables xv
1 Introduction 1
1.1 How to Read This Book? 2
1.2 A Short Introduction to R 3
1.2.1 Starting with R 3
1.2.2 R Objects 5
1.2.3 Vectors 7
1.2.4 Vectorization 10
1.2.5 Factors 11
1.2.6 Generating Sequences 14
1.2.7 Sub-Setting 16
1.2.8 Matrices and Arrays 19
1.2.9 Lists 23
1.2.10 Data Frames 26
1.2.11 Creating New Functions 30
1.2.12 Objects, Classes, and Methods 33
1.2.13 Managing Your Sessions 34
1.3 A Short Introduction to MySQL 35
2 Predicting Algae Blooms 39
2.1 Problem Description and Objectives 39
2.2 Data Description 40
2.3 Loading the Data into R 41
2.4 Data Visualization and Summarization 43
2.5 Unknown Values 52
2.5.1 Removing the Observations with Unknown Values 53
2.5.2 Filling in the Unknowns with the Most Frequent Values 55
2.5.3 Filling in the Unknown Values by Exploring Correlations 56
2.5.4 Filling in the Unknown Values by Exploring Similarities between Cases 60
2.6 Obtaining Prediction Models 63
2.6.1 Multiple Linear Regression 64
2.6.2 Regression Trees 71
2.7 Model Evaluation and Selection 77
2.8 Predictions for the Seven Algae 91
2.9 Summary 94
3 Predicting Stock Market Returns 95
3.1 Problem Description and Objectives 95
3.2 The Available Data 96
3.2.1 Handling Time-Dependent Data in R 97
3.2.2 Reading the Data from the CSV File 101
3.2.3 Getting the Data from the Web 102
3.2.4 Reading the Data from a MySQL Database 104
3.2.4.1 Loading the Data into R Running on Windows 105
3.2.4.2 Loading the Data into R Running on Linux 107
3.3 Defining the Prediction Tasks 108
3.3.1 What to Predict? 108
3.3.2 Which Predictors? 111
3.3.3 The Prediction Tasks 117
3.3.4 Evaluation Criteria 118
3.4 The Prediction Models 120
3.4.1 How Will the Training Data Be Used? 121
3.4.2 The Modeling Tools 123
3.4.2.1 Artificial Neural Networks 123
3.4.2.2 Support Vector Machines 126
3.4.2.3 Multivariate Adaptive Regression Splines 129
3.5 From Predictions into Actions 130
3.5.1 How Will the Predictions Be Used? 130
3.5.2 Trading-Related Evaluation Criteria 132
3.5.3 Putting Everything Together: A Simulated Trader 133
3.6 Model Evaluation and Selection 141
3.6.1 Monte Carlo Estimates 141
3.6.2 Experimental Comparisons 143
3.6.3 Results Analysis 148
3.7 The Trading System 156
3.7.1 Evaluation of the Final Test Data 156
3.7.2 An Online Trading System 162
3.8 Summary 163
4 Detecting Fraudulent Transactions 165
4.1 Problem Description and Objectives 165
4.2 The Available Data 166
4.2.1 Loading the Data into R 166
4.2.2 Exploring the Dataset 167
4.2.3 Data Problems 174
4.2.3.1 Unknown Values 175
4.2.3.2 Few Transactions of Some Products 179
4.3 Defining the Data Mining Tasks 183
4.3.1 Different Approaches to the Problem 183
4.3.1.1 Unsupervised Techniques 184
4.3.1.2 Supervised Techniques 185
4.3.1.3 Semi-Supervised Techniques 186
4.3.2 Evaluation Criteria 187
4.3.2.1 Precision and Recall 188
4.3.2.2 Lift Charts and Precision/Recall Curves 188
4.3.2.3 Normalized Distance to Typical Price 193
4.3.3 Experimental Methodology 194
4.4 Obtaining Outlier Rankings 195
4.4.1 Unsupervised Approaches 196
4.4.1.1 The Modified Box Plot Rule 196
4.4.1.2 Local Outlier Factors (LOF) 201
4.4.1.3 Clustering-Based Outlier Rankings (ORh) 205
4.4.2 Supervised Approaches 208
4.4.2.1 The Class Imbalance Problem 209
4.4.2.2 Naive Bayes 211
4.4.2.3 AdaBoost 217
4.4.3 Semi-Supervised Approaches 223
4.5 Summary 230
5 Classifying Microarray Samples 233
5.1 Problem Description and Objectives 233
5.1.1 Brief Background on Microarray Experiments 233
5.1.2 The ALL Dataset 234
5.2 The Available Data 235
5.2.1 Exploring the Dataset 238
5.3 Gene (Feature) Selection 241
5.3.1 Simple Filters Based on Distribution Properties 241
5.3.2 ANOVA Filters 244
5.3.3 Filtering Using Random Forests 246
5.3.4 Filtering Using Feature Clustering Ensembles 248
5.4 Predicting Cytogenetic Abnormalities 251
5.4.1 Defining the Prediction Task 251
5.4.2 The Evaluation Metric 252
5.4.3 The Experimental Procedure 253
5.4.4 The Modeling Techniques 254
5.4.4.1 Random Forests 254
5.4.4.2 k-Nearest Neighbors 255
5.4.5 Comparing the Models 258
5.5 Summary 267
Bibliography 269
Subject Index 279
Index of Data Mining Topics 285
Index of R Functions 287
|
any_adam_object | 1 |
author | Torgo, Luís |
author_facet | Torgo, Luís |
author_role | aut |
author_sort | Torgo, Luís |
author_variant | l t lt |
building | Verbundindex |
bvnumber | BV037226564 |
classification_rvk | SK 850 ST 530 ST 601 |
ctrlnum | (OCoLC)706877793 (DE-599)GBV621536733 |
discipline | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01665nam a2200409 c 4500</leader><controlfield tag="001">BV037226564</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20170622 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">110215s2011 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781439810187</subfield><subfield code="9">978-1-439-81018-7</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)706877793</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBV621536733</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-634</subfield><subfield code="a">DE-824</subfield><subfield code="a">DE-945</subfield><subfield code="a">DE-859</subfield><subfield code="a">DE-384</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-355</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SK 850</subfield><subfield code="0">(DE-625)143263:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 601</subfield><subfield code="0">(DE-625)143682:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Torgo, Luís</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data mining with 
R</subfield><subfield code="b">learning with case studies</subfield><subfield code="c">Luís Torgo</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton [u.a.]</subfield><subfield code="b">CRC Press</subfield><subfield code="c">2011</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XV, 289 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Chapman & Hall/CRC data mining and knowledge discovery series</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">A Chapman & Hall book</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">R</subfield><subfield code="g">Programm</subfield><subfield code="0">(DE-588)4705956-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="8">1\p</subfield><subfield code="0">(DE-588)4522595-3</subfield><subfield code="a">Fallstudiensammlung</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">R</subfield><subfield code="g">Programm</subfield><subfield 
code="0">(DE-588)4705956-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="C">b</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021140331&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-021140331</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield></record></collection> |
genre | 1\p (DE-588)4522595-3 Fallstudiensammlung gnd-content |
genre_facet | Fallstudiensammlung |
id | DE-604.BV037226564 |
illustrated | Illustrated |
indexdate | 2024-07-09T22:53:53Z |
institution | BVB |
isbn | 9781439810187 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-021140331 |
oclc_num | 706877793 |
open_access_boolean | |
owner | DE-634 DE-824 DE-945 DE-859 DE-384 DE-20 DE-355 DE-BY-UBR |
owner_facet | DE-634 DE-824 DE-945 DE-859 DE-384 DE-20 DE-355 DE-BY-UBR |
physical | XV, 289 S. graph. Darst. |
publishDate | 2011 |
publishDateSearch | 2011 |
publishDateSort | 2011 |
publisher | CRC Press |
record_format | marc |
series2 | Chapman & Hall/CRC data mining and knowledge discovery series A Chapman & Hall book |
spelling | Torgo, Luís Verfasser aut Data mining with R learning with case studies Luís Torgo Boca Raton [u.a.] CRC Press 2011 XV, 289 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Chapman & Hall/CRC data mining and knowledge discovery series A Chapman & Hall book R Programm (DE-588)4705956-4 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf 1\p (DE-588)4522595-3 Fallstudiensammlung gnd-content Data Mining (DE-588)4428654-5 s R Programm (DE-588)4705956-4 s b DE-604 HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021140331&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis 1\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk |
spellingShingle | Torgo, Luís Data mining with R learning with case studies R Programm (DE-588)4705956-4 gnd Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4705956-4 (DE-588)4428654-5 (DE-588)4522595-3 |
title | Data mining with R learning with case studies |
title_auth | Data mining with R learning with case studies |
title_exact_search | Data mining with R learning with case studies |
title_full | Data mining with R learning with case studies Luís Torgo |
title_fullStr | Data mining with R learning with case studies Luís Torgo |
title_full_unstemmed | Data mining with R learning with case studies Luís Torgo |
title_short | Data mining with R |
title_sort | data mining with r learning with case studies |
title_sub | learning with case studies |
topic | R Programm (DE-588)4705956-4 gnd Data Mining (DE-588)4428654-5 gnd |
topic_facet | R Programm Data Mining Fallstudiensammlung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021140331&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT torgoluis dataminingwithrlearningwithcasestudies |