Principles of data mining
Main author: | Bramer, Max A., 1948- |
---|---|
Format: | Book |
Language: | English |
Published: | London: Springer, 2007 |
Series: | Undergraduate topics in computer science |
Subjects: | Data mining |
Online access: | Description of contents, Table of contents |
Description: | X, 343 pp., diagrams |
ISBN: | 9781846287657 1846287650 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV022460249 | ||
003 | DE-604 | ||
005 | 20100601 | ||
007 | t | ||
008 | 070612s2007 d||| |||| 00||| eng d | ||
015 | |a 06,N50,0014 |2 dnb | ||
016 | 7 | |a 982004052 |2 DE-101 | |
020 | |a 9781846287657 |c Pb. : EUR 37.40 (freier Pr.), ca. sfr 54.50 (freier Pr.) |9 978-1-84628-765-7 | ||
020 | |a 1846287650 |9 1-84628-765-0 | ||
024 | 3 | |a 9781846287657 | |
028 | 5 | 2 | |a 11885863 |
035 | |a (OCoLC)255434561 | ||
035 | |a (DE-599)BVBBV022460249 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-1102 |a DE-20 |a DE-706 |a DE-573 |a DE-1051 |a DE-355 |a DE-824 |a DE-945 |a DE-91G |a DE-634 |a DE-11 |a DE-2070s | ||
050 | 0 | |a QA76.9.D343 | |
082 | 0 | |a 006.312 | |
084 | |a QH 500 |0 (DE-625)141607: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
084 | |a 004 |2 sdnb | ||
084 | |a DAT 703f |2 stub | ||
100 | 1 | |a Bramer, Max A. |d 1948- |e Verfasser |0 (DE-588)121430855 |4 aut | |
245 | 1 | 0 | |a Principles of data mining |c Max Bramer |
264 | 1 | |a London |b Springer |c 2007 | |
300 | |a X, 343 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Undergraduate topics in computer science | |
650 | 4 | |a Data mining | |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
655 | 7 | |8 1\p |0 (DE-588)4123623-3 |a Lehrbuch |2 gnd-content | |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |q text/html |u http://deposit.dnb.de/cgi-bin/dokserv?id=2876838&prov=M&dok_var=1&dok_ext=htm |3 Inhaltstext |
856 | 4 | 2 | |m SWBplus Fremddatenuebernahme |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015667942&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
883 | 1 | |8 1\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk | |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-015667942 |
Record in the search index
_version_ | 1807954016267665408 |
---|---|
adam_text |
CONTENTS
INTRODUCTION TO DATA MINING 1
1. DATA FOR DATA MINING 11
1.1 STANDARD FORMULATION 11
1.2 TYPES OF VARIABLE 12
1.2.1 CATEGORICAL AND CONTINUOUS ATTRIBUTES 14
1.3 DATA PREPARATION 14
1.3.1 DATA CLEANING 15
1.4 MISSING VALUES 17
1.4.1 DISCARD INSTANCES 17
1.4.2 REPLACE BY MOST FREQUENT/AVERAGE VALUE 17
1.5 REDUCING THE NUMBER OF ATTRIBUTES 18
1.6 THE UCI REPOSITORY OF DATASETS 19
CHAPTER SUMMARY 20
SELF-ASSESSMENT EXERCISES FOR CHAPTER 1 20
2. INTRODUCTION TO CLASSIFICATION: NAIVE BAYES AND NEAREST NEIGHBOUR 23
2.1 WHAT IS CLASSIFICATION? 23
2.2 NAIVE BAYES CLASSIFIERS 24
2.3 NEAREST NEIGHBOUR CLASSIFICATION 31
2.3.1 DISTANCE MEASURES 34
2.3.2 NORMALISATION 37
2.3.3 DEALING WITH CATEGORICAL ATTRIBUTES 38
2.4 EAGER AND LAZY LEARNING 38
CHAPTER SUMMARY 39
SELF-ASSESSMENT EXERCISES FOR CHAPTER 2 39
3. USING DECISION TREES FOR CLASSIFICATION 41
3.1 DECISION RULES AND DECISION TREES 41
3.1.1 DECISION TREES: THE GOLF EXAMPLE 42
3.1.2 TERMINOLOGY 43
3.1.3 THE DEGREES DATASET 44
3.2 THE TDIDT ALGORITHM 47
3.3 TYPES OF REASONING 49
CHAPTER SUMMARY 50
SELF-ASSESSMENT EXERCISES FOR CHAPTER 3 50
4. DECISION TREE INDUCTION: USING ENTROPY FOR ATTRIBUTE SELECTION 51
4.1 ATTRIBUTE SELECTION: AN EXPERIMENT 51
4.2 ALTERNATIVE DECISION TREES 52
4.2.1 THE FOOTBALL/NETBALL EXAMPLE 53
4.2.2 THE ANONYMOUS DATASET 55
4.3 CHOOSING ATTRIBUTES TO SPLIT ON: USING ENTROPY 56
4.3.1 THE LENS24 DATASET 57
4.3.2 ENTROPY 59
4.3.3 USING ENTROPY FOR ATTRIBUTE SELECTION 60
4.3.4 MAXIMISING INFORMATION GAIN 62
CHAPTER SUMMARY 63
SELF-ASSESSMENT EXERCISES FOR CHAPTER 4 63
5. DECISION TREE INDUCTION: USING FREQUENCY TABLES FOR ATTRIBUTE SELECTION 65
5.1 CALCULATING ENTROPY IN PRACTICE 65
5.1.1 PROOF OF EQUIVALENCE 66
5.1.2 A NOTE ON ZEROS 68
5.2 OTHER ATTRIBUTE SELECTION CRITERIA: GINI INDEX OF DIVERSITY 68
5.3 INDUCTIVE BIAS 70
5.4 USING GAIN RATIO FOR ATTRIBUTE SELECTION 72
5.4.1 PROPERTIES OF SPLIT INFORMATION 73
5.5 NUMBER OF RULES GENERATED BY DIFFERENT ATTRIBUTE SELECTION CRITERIA 74
5.6 MISSING BRANCHES 75
CHAPTER SUMMARY 76
SELF-ASSESSMENT EXERCISES FOR CHAPTER 5 77
6. ESTIMATING THE PREDICTIVE ACCURACY OF A CLASSIFIER 79
6.1 INTRODUCTION 79
6.2 METHOD 1: SEPARATE TRAINING AND TEST SETS 80
6.2.1 STANDARD ERROR 81
6.2.2 REPEATED TRAIN AND TEST 82
6.3 METHOD 2: k-FOLD CROSS-VALIDATION 82
6.4 METHOD 3: N-FOLD CROSS-VALIDATION 83
6.5 EXPERIMENTAL RESULTS I 84
6.6 EXPERIMENTAL RESULTS II: DATASETS WITH MISSING VALUES 86
6.6.1 STRATEGY 1: DISCARD INSTANCES 87
6.6.2 STRATEGY 2: REPLACE BY MOST FREQUENT/AVERAGE VALUE 87
6.6.3 MISSING CLASSIFICATIONS 89
6.7 CONFUSION MATRIX 89
6.7.1 TRUE AND FALSE POSITIVES 90
CHAPTER SUMMARY 91
SELF-ASSESSMENT EXERCISES FOR CHAPTER 6 91
7. CONTINUOUS ATTRIBUTES 93
7.1 INTRODUCTION 93
7.2 LOCAL VERSUS GLOBAL DISCRETISATION 95
7.3 ADDING LOCAL DISCRETISATION TO TDIDT 96
7.3.1 CALCULATING THE INFORMATION GAIN OF A SET OF PSEUDO-ATTRIBUTES 97
7.3.2 COMPUTATIONAL EFFICIENCY 102
7.4 USING THE CHIMERGE ALGORITHM FOR GLOBAL DISCRETISATION 105
7.4.1 CALCULATING THE EXPECTED VALUES AND $\chi^2$ 108
7.4.2 FINDING THE THRESHOLD VALUE 113
7.4.3 SETTING MININTERVALS AND MAXINTERVALS 113
7.4.4 THE CHIMERGE ALGORITHM: SUMMARY 115
7.4.5 THE CHIMERGE ALGORITHM: COMMENTS 115
7.5 COMPARING GLOBAL AND LOCAL DISCRETISATION FOR TREE INDUCTION 116
CHAPTER SUMMARY 118
SELF-ASSESSMENT EXERCISES FOR CHAPTER 7 118
8. AVOIDING OVERFITTING OF DECISION TREES 119
8.1 DEALING WITH CLASHES IN A TRAINING SET 120
8.1.1 ADAPTING TDIDT TO DEAL WITH CLASHES 120
8.2 MORE ABOUT OVERFITTING RULES TO DATA 125
8.3 PRE-PRUNING DECISION TREES 126
8.4 POST-PRUNING DECISION TREES 128
CHAPTER SUMMARY 133
SELF-ASSESSMENT EXERCISE FOR CHAPTER 8 134
9. MORE ABOUT ENTROPY 135
9.1 INTRODUCTION 135
9.2 CODING INFORMATION USING BITS 138
9.3 DISCRIMINATING AMONGST M VALUES (M NOT A POWER OF 2) 140
9.4 ENCODING VALUES THAT ARE NOT EQUALLY LIKELY 141
9.5 ENTROPY OF A TRAINING SET 144
9.6 INFORMATION GAIN MUST BE POSITIVE OR ZERO 145
9.7 USING INFORMATION GAIN FOR FEATURE REDUCTION FOR CLASSIFICATION TASKS 147
9.7.1 EXAMPLE 1: THE GENETICS DATASET 148
9.7.2 EXAMPLE 2: THE BCST96 DATASET 152
CHAPTER SUMMARY 154
SELF-ASSESSMENT EXERCISES FOR CHAPTER 9 154
10. INDUCING MODULAR R 155
10.1 RULE POST-PRUNING 155
10.2 CONFLICT RESOLUTION 157
10.3 PROBLEMS WITH DECISION TREES 160
10.4 THE PRISM ALGORITHM 162
10.4.1 CHANGES TO THE BASIC PRISM ALGORITHM 169
10.4.2 COMPARING PRISM WITH TDIDT 170
CHAPTER SUMMARY 171
SELF-ASSESSMENT EXERCISE FOR CHAPTER 10 171
11. MEASURING THE PERFORMANCE OF A CLASSIFIER 173
11.1 TRUE AND FALSE POSITIVES AND NEGATIVES 174
11.2 PERFORMANCE MEASURES 176
11.3 TRUE AND FALSE POSITIVE RATES VERSUS PREDICTIVE ACCURACY 179
11.4 ROC GRAPHS 180
11.5 ROC CURVES 182
11.6 FINDING THE BEST CLASSIFIER 183
CHAPTER SUMMARY 184
SELF-ASSESSMENT EXERCISE FOR CHAPTER 11 185
12. ASSOCIATION RULE MINING I 187
12.1 INTRODUCTION 187
12.2 MEASURES OF RULE INTERESTINGNESS 189
12.2.1 THE PIATETSKY-SHAPIRO CRITERIA AND THE RI MEASURE 191
12.2.2 RULE INTERESTINGNESS MEASURES APPLIED TO THE CHESS DATASET 193
12.2.3 USING RULE INTERESTINGNESS MEASURES FOR CONFLICT RESOLUTION 195
12.3 ASSOCIATION RULE MINING TASKS 195
12.4 FINDING THE BEST N RULES 196
12.4.1 THE J-MEASURE: MEASURING THE INFORMATION CONTENT OF A RULE 197
12.4.2 SEARCH STRATEGY 198
CHAPTER SUMMARY 201
SELF-ASSESSMENT EXERCISES FOR CHAPTER 12 201
13. ASSOCIATION RULE MINING II 203
13.1 INTRODUCTION 203
13.2 TRANSACTIONS AND ITEMSETS 204
13.3 SUPPORT FOR AN ITEMSET 205
13.4 ASSOCIATION RULES 206
13.5 GENERATING ASSOCIATION RULES 208
13.6 APRIORI 209
13.7 GENERATING SUPPORTED ITEMSETS: AN EXAMPLE 212
13.8 GENERATING RULES FOR A SUPPORTED ITEMSET 214
13.9 RULE INTERESTINGNESS MEASURES: LIFT AND LEVERAGE 216
CHAPTER SUMMARY 218
SELF-ASSESSMENT EXERCISES FOR CHAPTER 13 219
14. CLUSTERING 221
14.1 INTRODUCTION 221
14.2 k-MEANS CLUSTERING 224
14.2.1 EXAMPLE 225
14.2.2 FINDING THE BEST SET OF CLUSTERS 230
14.3 AGGLOMERATIVE HIERARCHICAL CLUSTERING 231
14.3.1 RECORDING THE DISTANCE BETWEEN CLUSTERS 233
14.3.2 TERMINATING THE CLUSTERING PROCESS 236
CHAPTER SUMMARY 237
SELF-ASSESSMENT EXERCISES FOR CHAPTER 14 238
15. TEXT MINING 239
15.1 MULTIPLE CLASSIFICATIONS 239
15.2 REPRESENTING TEXT DOCUMENTS FOR DATA MINING 240
15.3 STOP WORDS AND STEMMING 242
15.4 USING INFORMATION GAIN FOR FEATURE REDUCTION 243
15.5 REPRESENTING TEXT DOCUMENTS: CONSTRUCTING A VECTOR SPACE MODEL 243
15.6 NORMALISING THE WEIGHTS 245
15.7 MEASURING THE DISTANCE BETWEEN TWO VECTORS 246
15.8 MEASURING THE PERFORMANCE OF A TEXT CLASSIFIER 247
15.9 HYPERTEXT CATEGORISATION 248
15.9.1 CLASSIFYING WEB PAGES 248
15.9.2 HYPERTEXT CLASSIFICATION VERSUS TEXT CLASSIFICATION 249
CHAPTER SUMMARY 253
SELF-ASSESSMENT EXERCISES FOR CHAPTER 15 253
REFERENCES 255
A. ESSENTIAL MATHEMATICS 257
A.1 SUBSCRIPT NOTATION 257
A.1.1 SIGMA NOTATION FOR SUMMATION 258
A.1.2 DOUBLE SUBSCRIPT NOTATION 259
A.1.3 OTHER USES OF SUBSCRIPTS 260
A.2 TREES 260
A.2.1 TERMINOLOGY 261
A.2.2 INTERPRETATION 262
A.2.3 SUBTREES 263
A.3 THE LOGARITHM FUNCTION $\log_2 X$ 264
A.3.1 THE FUNCTION $-X \log_2 X$ 266
A.4 INTRODUCTION TO SET THEORY 267
A.4.1 SUBSETS 269
A.4.2 SUMMARY OF SET NOTATION 271
B. DATASETS 273
C. SOURCES OF FURTHER INFORMATION 293
D. GLOSSARY AND NOTATION 297
E. SOLUTIONS TO SELF-ASSESSMENT EXERCISES 315
INDEX 339
PPN: 259595950
TITLE: PRINCIPLES OF DATA MINING / MAX BRAMER. - LONDON : SPRINGER, 2007
ISBN: 978-1-84628-765-7 (PB., CA. EUR 32.05 (NON-FIXED PRICE), CA. SFR 54.50 (NON-FIXED PRICE)); 1-84628-765-0 (PB., CA. EUR 32.05 (NON-FIXED PRICE), CA. SFR 54.50 (NON-FIXED PRICE)); 1-84628-766-9 (E-ISBN); 978-1-84628-766-4 (E-ISBN)
BIBLIOGRAPHIC RECORD IN THE SWB UNION CATALOGUE |
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Bramer, Max A. 1948- |
author_GND | (DE-588)121430855 |
author_facet | Bramer, Max A. 1948- |
author_role | aut |
author_sort | Bramer, Max A. 1948- |
author_variant | m a b ma mab |
building | Verbundindex |
bvnumber | BV022460249 |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.9.D343 |
callnumber-search | QA76.9.D343 |
callnumber-sort | QA 276.9 D343 |
callnumber-subject | QA - Mathematics |
classification_rvk | QH 500 ST 530 |
classification_tum | DAT 703f |
ctrlnum | (OCoLC)255434561 (DE-599)BVBBV022460249 |
dewey-full | 006.312 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.312 |
dewey-search | 006.312 |
dewey-sort | 16.312 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Wirtschaftswissenschaften |
discipline_str_mv | Informatik Wirtschaftswissenschaften |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV022460249</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20100601</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">070612s2007 d||| |||| 00||| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">06,N50,0014</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">982004052</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781846287657</subfield><subfield code="c">Pb. : EUR 37.40 (freier Pr.), ca. sfr 54.50 (freier Pr.)</subfield><subfield code="9">978-1-84628-765-7</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1846287650</subfield><subfield code="9">1-84628-765-0</subfield></datafield><datafield tag="024" ind1="3" ind2=" "><subfield code="a">9781846287657</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">11885863</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)255434561</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV022460249</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-1102</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-706</subfield><subfield code="a">DE-573</subfield><subfield code="a">DE-1051</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-824</subfield><subfield code="a">DE-945</subfield><subfield 
code="a">DE-91G</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-2070s</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.9.D343</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.312</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">QH 500</subfield><subfield code="0">(DE-625)141607:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">004</subfield><subfield code="2">sdnb</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 703f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Bramer, Max A.</subfield><subfield code="d">1948-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)121430855</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Principles of data mining</subfield><subfield code="c">Max Bramer</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">London</subfield><subfield code="b">Springer</subfield><subfield code="c">2007</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">X, 343 S.</subfield><subfield code="b">graph. 
Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Undergraduate topics in computer science</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="8">1\p</subfield><subfield code="0">(DE-588)4123623-3</subfield><subfield code="a">Lehrbuch</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="q">text/html</subfield><subfield code="u">http://deposit.dnb.de/cgi-bin/dokserv?id=2876838&prov=M&dok_var=1&dok_ext=htm</subfield><subfield code="3">Inhaltstext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">SWBplus Fremddatenuebernahme</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015667942&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield 
code="8">1\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-015667942</subfield></datafield></record></collection> |
genre | 1\p (DE-588)4123623-3 Lehrbuch gnd-content |
genre_facet | Lehrbuch |
id | DE-604.BV022460249 |
illustrated | Illustrated |
index_date | 2024-07-02T17:39:53Z |
indexdate | 2024-08-21T00:15:10Z |
institution | BVB |
isbn | 9781846287657 1846287650 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-015667942 |
oclc_num | 255434561 |
open_access_boolean | |
owner | DE-1102 DE-20 DE-706 DE-573 DE-1051 DE-355 DE-BY-UBR DE-824 DE-945 DE-91G DE-BY-TUM DE-634 DE-11 DE-2070s |
owner_facet | DE-1102 DE-20 DE-706 DE-573 DE-1051 DE-355 DE-BY-UBR DE-824 DE-945 DE-91G DE-BY-TUM DE-634 DE-11 DE-2070s |
physical | X, 343 S. graph. Darst. |
publishDate | 2007 |
publishDateSearch | 2007 |
publishDateSort | 2007 |
publisher | Springer |
record_format | marc |
series2 | Undergraduate topics in computer science |
spelling | Bramer, Max A. 1948- Verfasser (DE-588)121430855 aut Principles of data mining Max Bramer London Springer 2007 X, 343 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Undergraduate topics in computer science Data mining Data Mining (DE-588)4428654-5 gnd rswk-swf 1\p (DE-588)4123623-3 Lehrbuch gnd-content Data Mining (DE-588)4428654-5 s DE-604 text/html http://deposit.dnb.de/cgi-bin/dokserv?id=2876838&prov=M&dok_var=1&dok_ext=htm Inhaltstext SWBplus Fremddatenuebernahme application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015667942&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis 1\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk |
spellingShingle | Bramer, Max A. 1948- Principles of data mining Data mining Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4428654-5 (DE-588)4123623-3 |
title | Principles of data mining |
title_auth | Principles of data mining |
title_exact_search | Principles of data mining |
title_exact_search_txtP | Principles of data mining |
title_full | Principles of data mining Max Bramer |
title_fullStr | Principles of data mining Max Bramer |
title_full_unstemmed | Principles of data mining Max Bramer |
title_short | Principles of data mining |
title_sort | principles of data mining |
topic | Data mining Data Mining (DE-588)4428654-5 gnd |
topic_facet | Data mining Data Mining Lehrbuch |
url | http://deposit.dnb.de/cgi-bin/dokserv?id=2876838&prov=M&dok_var=1&dok_ext=htm http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015667942&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT bramermaxa principlesofdatamining |