Principles of data mining

Saved in:

Main author: Bramer, Max A. 1948-
Format: Book
Language: English
Published: London: Springer, [2020]
Edition: Fourth edition
Series: Undergraduate topics in computer science
Subjects: Data Mining
Online access: Table of contents
Description: Bibliographical references
Description: xvi, 571 pages, illustrations, diagrams
ISBN: 9781447174929

Internal format

MARC
LEADER 00000nam a2200000 c 4500
001    BV046913134
003    DE-604
005    20220121
007    t
008    200925s2020 xxka||| |||| 00||| eng d
020    |a 9781447174929 |c (pbk) |9 978-1-4471-7492-9
035    |a (OCoLC)1184757055
035    |a (DE-599)KXP172606784X
040    |a DE-604 |b ger |e rda
041 0  |a eng
044    |a xxk |c XA-GB
049    |a DE-83 |a DE-11 |a DE-355
082 0  |a 025.04
084    |a ST 530 |0 (DE-625)143679: |2 rvk
084    |a QH 500 |0 (DE-625)141607: |2 rvk
084    |a 54.72 |2 bkl
084    |a 54.62 |2 bkl
084    |a 54.64 |2 bkl
100 1  |a Bramer, Max A. |d 1948- |e Verfasser |0 (DE-588)121430855 |4 aut
245 10 |a Principles of data mining |c Max Bramer
250    |a Fourth edition
264  1 |a London |b Springer |c [2020]
300    |a xvi, 571 Seiten |b Illustrationen, Diagramme
336    |b txt |2 rdacontent
337    |b n |2 rdamedia
338    |b nc |2 rdacarrier
490 0  |a Undergraduate topics in computer science
500    |a Literaturangaben
650 07 |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf
653  0 |a Data mining
655  7 |0 (DE-588)4123623-3 |a Lehrbuch |2 gnd-content
689 00 |a Data Mining |0 (DE-588)4428654-5 |D s
689 0  |5 DE-604
776 08 |i Erscheint auch als |n Online-Ausgabe |z 978-1-4471-7493-6
856 42 |m HEBIS Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032322538&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
943 1  |a oai:aleph.bib-bvb.de:BVB01-032322538
Record in the search index

_version_ | 1805088178060656640 |
adam_text |
Max Bramer
Principles of Data Mining
Fourth Edition
Springer
Contents
1 Introduction to Data Mining ..... 1
1.1 The Data Explosion ..... 1
1.2 Knowledge Discovery ..... 2
1.3 Applications of Data Mining ..... 3
1.4 Labelled and Unlabelled Data ..... 4
1.5 Supervised Learning: Classification ..... 5
1.6 Supervised Learning: Numerical Prediction ..... 7
1.7 Unsupervised Learning: Association Rules ..... 7
1.8 Unsupervised Learning: Clustering ..... 8
2 Data for Data Mining ..... 9
2.1 Standard Formulation ..... 9
2.2 Types of Variable ..... 10
2.2.1 Categorical and Continuous Attributes ..... 12
2.3 Data Preparation ..... 12
2.3.1 Data Cleaning ..... 13
2.4 Missing Values ..... 15
2.4.1 Discard Instances ..... 15
2.4.2 Replace by Most Frequent/Average Value ..... 15
2.5 Reducing the Number of Attributes ..... 16
2.6 The UCI Repository of Datasets ..... 17
2.7 Chapter Summary ..... 18
2.8 Self-assessment Exercises for Chapter 2 ..... 18
Reference ..... 19
3 Introduction to Classification: Naive Bayes and Nearest Neighbour
3.1 What Is Classification?
3.2 Naive Bayes Classifiers
3.3 Nearest Neighbour Classification
3.3.1 Distance Measures
3.3.2 Normalisation ..... 35
3.3.3 Dealing with Categorical Attributes ..... 36
3.4 Eager and Lazy Learning ..... 36
3.5 Chapter Summary ..... 37
3.6 Self-assessment Exercises for Chapter 3 ..... 37
4 Using Decision Trees for Classification ..... 39
4.1 Decision Rules and Decision Trees ..... 39
4.1.1 Decision Trees: The Golf Example ..... 40
4.1.2 Terminology ..... 41
4.1.3 The degrees Dataset ..... 42
4.2 The TDIDT Algorithm ..... 45
4.3 Types of Reasoning ..... 47
4.4 Chapter Summary ..... 48
4.5 Self-assessment Exercises for Chapter 4 ..... 48
References ..... 48
5 Decision Tree Induction: Using Entropy for Attribute Selection ..... 49
5.1 Attribute Selection: An Experiment ..... 49
5.2 Alternative Decision Trees ..... 50
5.2.1 The Football/Netball Example ..... 51
5.2.2 The anonymous Dataset ..... 53
5.3 Choosing Attributes to Split On: Using Entropy ..... 54
5.3.1 The lens24 Dataset
5.3.2 Entropy ..... 57
5.3.3 Using Entropy for Attribute Selection ..... 58
5.3.4 Maximising Information Gain ..... 60
5.4 Chapter Summary ..... 61
5.5 Self-assessment Exercises for Chapter 5 ..... 61
6 Decision Tree Induction: Using Frequency Tables for Attribute Selection ..... 63
6.1 Calculating Entropy in Practice ..... 63
6.1.1 Proof of Equivalence ..... 64
6.1.2 A Note on Zeros ..... 66
6.2 Other Attribute Selection Criteria: Gini Index of Diversity ..... 66
6.3 The χ² Attribute Selection Criterion ..... 68
6.4 Inductive Bias ..... 71
6.5 Using Gain Ratio for Attribute Selection ..... 73
6.5.1 Properties of Split Information ..... 74
6.5.2 Summary ..... 75
6.6 Number of Rules Generated by Different Attribute Selection Criteria ..... 75
6.7 Missing Branches ..... 76
6.8 Chapter Summary ..... 77
6.9 Self-assessment Exercises for Chapter 6 ..... 77
References ..... 78
7 Estimating the Predictive Accuracy of a Classifier ..... 79
7.1 Introduction ..... 79
7.2 Method 1: Separate Training and Test Sets ..... 80
7.2.1 Standard Error ..... 81
7.2.2 Repeated Train and Test ..... 82
7.3 Method 2: k-fold Cross-validation ..... 82
7.4 Method 3: N-fold Cross-validation ..... 83
7.5 Experimental Results I ..... 84
7.6 Experimental Results II: Datasets with Missing Values ..... 86
7.6.1 Strategy 1: Discard Instances ..... 87
7.6.2 Strategy 2: Replace by Most Frequent/Average Value ..... 87
7.6.3 Missing Classifications ..... 89
7.7 Confusion Matrix ..... 89
7.7.1 True and False Positives ..... 90
7.8 Chapter Summary ..... 91
7.9 Self-assessment Exercises for Chapter 7 ..... 91
Reference ..... 92
8 Continuous Attributes ..... 93
8.1 Introduction ..... 93
8.2 Local versus Global Discretisation ..... 95
8.3 Adding Local Discretisation to TDIDT ..... 96
8.3.1 Calculating the Information Gain of a Set of Pseudo-attributes ..... 97
8.3.2 Computational Efficiency ..... 102
8.4 Using the ChiMerge Algorithm for Global Discretisation ..... 105
8.4.1 Calculating the Expected Values and χ² ..... 108
8.4.2 Finding the Threshold Value ..... 113
8.4.3 Setting minIntervals and maxIntervals ..... 113
8.4.4 The ChiMerge Algorithm: Summary ..... 115
8.4.5 The ChiMerge Algorithm: Comments ..... 115
8.5 Comparing Global and Local Discretisation for Tree Induction ..... 116
8.6 Chapter Summary ..... 118
8.7 Self-assessment Exercises for Chapter 8 ..... 118
Reference ..... 119
9 Avoiding Overfitting of Decision Trees ..... 121
9.1 Dealing with Clashes in a Training Set ..... 122
9.1.1 Adapting TDIDT to Deal with Clashes ..... 122
9.2 More About Overfitting Rules to Data ..... 127
9.3 Pre-pruning Decision Trees ..... 128
9.4 Post-pruning Decision Trees ..... 130
9.5 Chapter Summary ..... 135
9.6 Self-assessment Exercise for Chapter 9 ..... 136
References ..... 136
10 More About Entropy ..... 137
10.1 Introduction ..... 137
10.2 Coding Information Using Bits ..... 140
10.3 Discriminating Amongst M Values (M Not a Power of 2) ..... 142
10.4 Encoding Values That Are Not Equally Likely ..... 143
10.5 Entropy of a Training Set ..... 146
10.6 Information Gain Must Be Positive or Zero ..... 147
10.7 Using Information Gain for Feature Reduction for Classification Tasks ..... 149
10.7.1 Example 1: The genetics Dataset ..... 150
10.7.2 Example 2: The bcst96 Dataset ..... 154
10.8 Chapter Summary ..... 156
10.9 Self-assessment Exercises for Chapter 10 ..... 156
References ..... 156
11 Inducing Modular Rules for Classification ..... 157
11.1 Rule Post-pruning ..... 157
11.2 Conflict Resolution ..... 159
11.3 Problems with Decision Trees ..... 162
11.4 The Prism Algorithm ..... 164
11.4.1 Changes to the Basic Prism Algorithm ..... 171
11.4.2 Comparing Prism with TDIDT ..... 172
11.5 Chapter Summary ..... 173
11.6 Self-assessment Exercise for Chapter 11 ..... 173
References ..... 174
12 Measuring the Performance of a Classifier ..... 175
12.1 True and False Positives and Negatives ..... 176
12.2 Performance Measures ..... 178
12.3 True and False Positive Rates versus Predictive Accuracy ..... 181
12.4 ROC Graphs ..... 182
12.5 ROC Curves ..... 184
12.6 Finding the Best Classifier ..... 185
12.7 Chapter Summary ..... 186
12.8 Self-assessment Exercise for Chapter 12 ..... 187
13 Dealing with Large Volumes of Data ..... 189
13.1 Introduction ..... 189
13.2 Distributing Data onto Multiple Processors ..... 192
13.3 Case Study: PMCRI ..... 194
13.4 Evaluating the Effectiveness of a Distributed System: PMCRI ..... 197
13.5 Revising a Classifier Incrementally ..... 201
13.6 Chapter Summary ..... 207
13.7 Self-assessment Exercises for Chapter 13 ..... 207
References ..... 208
14 Ensemble Classification ..... 209
14.1 Introduction ..... 209
14.2 Estimating the Performance of a Classifier ..... 212
14.3 Selecting a Different Training Set for Each Classifier ..... 213
14.4 Selecting a Different Set of Attributes for Each Classifier ..... 214
14.5 Combining Classifications: Alternative Voting Systems ..... 215
14.6 Parallel Ensemble Classifiers ..... 219
14.7 Chapter Summary ..... 219
14.8 Self-assessment Exercises for Chapter 14 ..... 220
References ..... 220
15 Comparing Classifiers ..... 221
15.1 Introduction ..... 221
15.2 The Paired t-Test ..... 223
15.3 Choosing Datasets for Comparative Evaluation ..... 229
15.3.1 Confidence Intervals ..... 231
15.4 Sampling ..... 231
15.5 How Bad Is a ‘No Significant Difference’ Result? ..... 234
15.6 Chapter Summary ..... 235
15.7 Self-assessment Exercises for Chapter 15 ..... 235
References ..... 236
16 Association Rule Mining I ..... 237
16.1 Introduction ..... 237
16.2 Measures of Rule Interestingness ..... 239
16.2.1 The Piatetsky-Shapiro Criteria and the RI Measure ..... 241
16.2.2 Rule Interestingness Measures Applied to the chess Dataset ..... 243
16.2.3 Using Rule Interestingness Measures for Conflict Resolution ..... 245
16.3 Association Rule Mining Tasks ..... 245
16.4 Finding the Best N Rules ..... 246
16.4.1 The J-Measure: Measuring the Information Content of a Rule ..... 247
16.4.2 Search Strategy ..... 248
16.5 Chapter Summary ..... 251
16.6 Self-assessment Exercises for Chapter 16 ..... 251
References ..... 251
17 Association Rule Mining II ..... 253
17.1 Introduction ..... 253
17.2 Transactions and Itemsets ..... 254
17.3 Support for an Itemset ..... 255
17.4 Association Rules ..... 256
17.5 Generating Association Rules ..... 258
17.6 Apriori ..... 259
17.7 Generating Supported Itemsets: An Example ..... 262
17.8 Generating Rules for a Supported Itemset ..... 264
17.9 Rule Interestingness Measures: Lift and Leverage ..... 266
17.10 Chapter Summary ..... 268
17.11 Self-assessment Exercises for Chapter 17 ..... 269
Reference ..... 269
18 Association Rule Mining III: Frequent Pattern Trees ..... 271
18.1 Introduction: FP-Growth ..... 271
18.2 Constructing the FP-tree ..... 274
18.2.1 Pre-processing the Transaction Database ..... 274
18.2.2 Initialisation ..... 276
18.2.3 Processing Transaction 1: f, c, a, m, p ..... 277
18.2.4 Processing Transaction 2: f, c, a, b, m ..... 279
18.2.5 Processing Transaction 3: f, b ..... 283
18.2.6 Processing Transaction 4: c, b, p ..... 285
18.2.7 Processing Transaction 5: f, c, a, m, p ..... 287
18.3 Finding the Frequent Itemsets from the FP-tree ..... 288
18.3.1 Itemsets Ending with Item p ..... 291
18.3.2 Itemsets Ending with Item m ..... 301
18.4 Chapter Summary ..... 308
18.5 Self-assessment Exercises for Chapter 18 ..... 309
Reference ..... 309
19 Clustering ..... 311
19.1 Introduction ..... 311
19.2 k-Means Clustering ..... 314
19.2.1 Example ..... 315
19.2.2 Finding the Best Set of Clusters ..... 319
19.3 Agglomerative Hierarchical Clustering ..... 320
19.3.1 Recording the Distance Between Clusters ..... 323
19.3.2 Terminating the Clustering Process ..... 326
19.4 Chapter Summary ..... 327
19.5 Self-assessment Exercises for Chapter 19 ..... 327
20 Text Mining ..... 329
20.1 Multiple Classifications ..... 329
20.2 Representing Text Documents for Data Mining ..... 330
20.3 Stop Words and Stemming ..... 332
20.4 Using Information Gain for Feature Reduction ..... 333
20.5 Representing Text Documents: Constructing a Vector Space Model ..... 333
20.6 Normalising the Weights ..... 335
20.7 Measuring the Distance Between Two Vectors ..... 336
20.8 Measuring the Performance of a Text Classifier ..... 337
20.9 Hypertext Categorisation ..... 338
20.9.1 Classifying Web Pages ..... 338
20.9.2 Hypertext Classification versus Text Classification ..... 339
20.10 Chapter Summary ..... 343
20.11 Self-assessment Exercises for Chapter 20 ..... 343
21 Classifying Streaming Data ..... 345
21.1 Introduction ..... 345
21.1.1 Stationary v Time-dependent Data ..... 347
21.2 Building an H-Tree: Updating Arrays ..... 347
21.2.1 Array currentAtts ..... 348
21.2.2 Array splitAtt ..... 349
21.2.3 Sorting a record to the appropriate leaf node ..... 349
21.2.4 Array hitcount ..... 350
21.2.5 Array classtotals ..... 350
21.2.6 Array acvCounts ..... 350
21.2.7 Array branch ..... 352
21.3 Building an H-Tree: a Detailed Example ..... 352
21.3.1 Step (a): Initialise Root Node 0 ..... 352
21.3.2 Step (b): Begin Reading Records ..... 353
21.3.3 Step (c): Consider Splitting at Node 0 ..... 354
21.3.4 Step (d): Split on Root Node and Initialise New Leaf Nodes ..... 355
21.3.5 Step (e): Process the Next Set of Records ..... 357
21.3.6 Step (f): Consider Splitting at Node 2 ..... 358
21.3.7 Step (g): Process the Next Set of Records ..... 359
21.3.8 Outline of the H-Tree Algorithm ..... 360
21.4 Splitting on an Attribute: Using Information Gain ..... 363
21.5 Splitting on an Attribute: Using a Hoeffding Bound ..... 365
21.6 H-Tree Algorithm: Final Version ..... 370
21.7 Using an Evolving H-Tree to Make Predictions ..... 372
21.7.1 Evaluating the Performance of an H-Tree ..... 373
21.8 Experiments: H-Tree versus TDIDT ..... 374
21.8.1 The lens24 Dataset ..... 374
21.8.2 The vote Dataset ..... 376
21.9 Chapter Summary ..... 377
21.10 Self-assessment Exercises for Chapter 21 ..... 377
References ..... 378
22 Classifying Streaming Data II: Time-Dependent Data ..... 379
22.1 Stationary versus Time-dependent Data ..... 379
22.2 Summary of the H-Tree Algorithm ..... 381
22.2.1 Array currentAtts ..... 382
22.2.2 Array splitAtt ..... 383
22.2.3 Array hitcount ..... 383
22.2.4 Array classtotals ..... 383
22.2.5 Array acvCounts ..... 384
22.2.6 Array branch ..... 384
22.2.7 Pseudocode for the H-Tree Algorithm ..... 384
22.3 From H-Tree to CDH-Tree: Overview ..... 387
22.4 From H-Tree to CDH-Tree: Incrementing Counts ..... 387
22.5 The Sliding Window Method ..... 388
22.6 Resplitting at Nodes ..... 393
22.7 Identifying Suspect Nodes ..... 394
22.8 Creating Alternate Nodes ..... 396
22.9 Growing/Forgetting an Alternate Node and its Descendants ..... 400
22.10 Replacing an Internal Node by One of its Alternate Nodes
22.11 Experiment: Tracking Concept Drift
22.11.1 lens24 Data: Alternative Mode
22.11.2 Introducing Concept Drift
22.11.3 An Experiment with Alternating lens24 Data
22.11.4 Comments on Experiment
22.12 Chapter Summary
22.13 Self-assessment Exercises for Chapter 22
Reference
23 An Introduction to Neural Networks
23.1 Introduction
23.2 Neural Nets Example 1
23.3 Neural Nets Example 2
23.3.1 Forward Propagating the Values of the Input Nodes
23.3.2 Forward Propagation: Summary of Formulae
23.4 Backpropagation
23.4.1 Stochastic Gradient Descent
23.4.2 Finding the Gradients
23.4.3 Working backwards from the output layer to the hidden layer
23.4.4 Working backwards from the hidden layer to the input layer
23.4.5 Updating the Weights
23.5 Processing a Multi-instance Training Set
23.6 Using a Neural Net for Classification: the iris Dataset
23.7 Using a Neural Net for Classification: the seeds Dataset
23.8 Neural Nets: A Note of Caution
23.9 Chapter Summary
23.10 Self-assessment Exercises for Chapter 23
A Essential Mathematics
A.1 Subscript Notation
A.1.1 Sigma Notation for Summation
A.1.2 Double Subscript Notation
A.1.3 Other Uses of Subscripts
A.2 Trees
A.2.1 Terminology
A.2.2 Interpretation
A.2.3 Subtrees
A.3 The Logarithm Function log₂ X
A.3.1 The Function −X log₂ X
A.4 Introduction to Set Theory ..... 477
A.4.1 Subsets ..... 479
A.4.2 Summary of Set Notation ..... 481
B Datasets ..... 483
References ..... 504
C Sources of Further Information ..... 505
Websites ..... 505
Books ..... 505
Conferences ..... 506
Information About Association Rule Mining ..... 507
D Glossary and Notation ..... 509
E Solutions to Self-assessment Exercises ..... 535
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Bramer, Max A. 1948- |
author_GND | (DE-588)121430855 |
author_facet | Bramer, Max A. 1948- |
author_role | aut |
author_sort | Bramer, Max A. 1948- |
author_variant | m a b ma mab |
building | Verbundindex |
bvnumber | BV046913134 |
classification_rvk | ST 530 QH 500 |
ctrlnum | (OCoLC)1184757055 (DE-599)KXP172606784X |
dewey-full | 025.04 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 025 - Operations of libraries and archives |
dewey-raw | 025.04 |
dewey-search | 025.04 |
dewey-sort | 225.04 |
dewey-tens | 020 - Library and information sciences |
discipline | Allgemeines Informatik Wirtschaftswissenschaften |
discipline_str_mv | Allgemeines Informatik Wirtschaftswissenschaften |
edition | Fourth edition |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV046913134</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220121</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">200925s2020 xxka||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781447174929</subfield><subfield code="c">(pbk)</subfield><subfield code="9">978-1-4471-7492-9</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1184757055</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)KXP172606784X</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxk</subfield><subfield code="c">XA-GB</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-83</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-355</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">025.04</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">QH 500</subfield><subfield code="0">(DE-625)141607:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.72</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.62</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.64</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Bramer, Max A.</subfield><subfield code="d">1948-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)121430855</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Principles of data mining</subfield><subfield code="c">Max Bramer</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">Fourth edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">London</subfield><subfield code="b">Springer</subfield><subfield code="c">[2020]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xvi, 571 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Undergraduate topics in computer science</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturangaben</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Data mining</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4123623-3</subfield><subfield code="a">Lehrbuch</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-4471-7493-6</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HEBIS Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032322538&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-032322538</subfield></datafield></record></collection> |
genre | (DE-588)4123623-3 Lehrbuch gnd-content |
genre_facet | Lehrbuch |
id | DE-604.BV046913134 |
illustrated | Illustrated |
index_date | 2024-07-03T15:28:48Z |
indexdate | 2024-07-20T09:03:54Z |
institution | BVB |
isbn | 9781447174929 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-032322538 |
oclc_num | 1184757055 |
open_access_boolean | |
owner | DE-83 DE-11 DE-355 DE-BY-UBR |
owner_facet | DE-83 DE-11 DE-355 DE-BY-UBR |
physical | xvi, 571 Seiten Illustrationen, Diagramme |
publishDate | 2020 |
publishDateSearch | 2020 |
publishDateSort | 2020 |
publisher | Springer |
record_format | marc |
series2 | Undergraduate topics in computer science |
spelling | Bramer, Max A. 1948- Verfasser (DE-588)121430855 aut Principles of data mining Max Bramer Fourth edition London Springer [2020] xvi, 571 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Undergraduate topics in computer science Literaturangaben Data Mining (DE-588)4428654-5 gnd rswk-swf Data mining (DE-588)4123623-3 Lehrbuch gnd-content Data Mining (DE-588)4428654-5 s DE-604 Erscheint auch als Online-Ausgabe 978-1-4471-7493-6 HEBIS Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032322538&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Bramer, Max A. 1948- Principles of data mining Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4428654-5 (DE-588)4123623-3 |
title | Principles of data mining |
title_auth | Principles of data mining |
title_exact_search | Principles of data mining |
title_exact_search_txtP | Principles of data mining |
title_full | Principles of data mining Max Bramer |
title_fullStr | Principles of data mining Max Bramer |
title_full_unstemmed | Principles of data mining Max Bramer |
title_short | Principles of data mining |
title_sort | principles of data mining |
topic | Data Mining (DE-588)4428654-5 gnd |
topic_facet | Data Mining Lehrbuch |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032322538&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT bramermaxa principlesofdatamining |