Practical data mining
Main author: | Hancock, Monte |
---|---|
Format: | Book |
Language: | English |
Published: |
Boca Raton, Fla. [u.a.]
CRC Press
2012
|
Series: | An Auerbach book
|
Subjects: | Data mining |
Online access: | Table of contents |
Description: | XXIII, 277 pages, illustrations, diagrams |
ISBN: | 9781439868362 1439868360 |
Internal format
MARC
LEADER | 00000nam a2200000zc 4500 | ||
---|---|---|---|
001 | BV039964070 | ||
003 | DE-604 | ||
005 | 20121019 | ||
007 | t | ||
008 | 120316s2012 xxuad|| |||| 00||| eng d | ||
010 | |a 2011040834 | ||
020 | |a 9781439868362 |9 978-1-439-86836-2 | ||
020 | |a 1439868360 |9 1-439-86836-0 | ||
035 | |a (OCoLC)779583658 | ||
035 | |a (DE-599)BVBBV039964070 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
044 | |a xxu |c US | ||
049 | |a DE-634 |a DE-824 |a DE-473 | ||
050 | 0 | |a QA76.9.D343 | |
082 | 0 | |a 006.3/12 | |
084 | |a SK 850 |0 (DE-625)143263: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Hancock, Monte |e Verfasser |4 aut | |
245 | 1 | 0 | |a Practical data mining |c Monte F. Hancock, Jr. |
264 | 1 | |a Boca Raton, Fla. [u.a.] |b CRC Press |c 2012 | |
300 | |a XXIII, 277 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a An Auerbach book | |
650 | 4 | |a Data mining | |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m Digitalisierung UB Bamberg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024821745&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-024821745 |
Record in the search index
_version_ | 1804148940042403840 |
---|---|
adam_text | Contents
Dedication v
Preface xv
About the Author xxi
Acknowledgments xxiii
Chapter 1 What Is Data Mining and What Can It Do? 1
Purpose 1
Goals 1
1.1 Introduction 1
1.2 A Brief Philosophical Discussion 2
1.3 The Most Important Attribute of the Successful Data Miner: Integrity 3
1.4 What Does Data Mining Do? 4
1.5 What Do We Mean By Data? 6
1.5.1 Nominal Data vs. Numeric Data 7
1.5.2 Discrete Data vs. Continuous Data 7
1.5.3 Coding and Quantization as Inverse Processes 8
1.5.4 A Crucial Distinction: Data and Information Are Not the Same Thing 9
1.5.5 The Parity Problem 11
1.5.6 Five Riddles about Information 11
1.5.7 Seven Riddles about Meaning 13
1.6 Data Complexity 14
1.7 Computational Complexity 15
1.7.1 Some NP-Hard Problems 17
1.7.2 Some Worst-Case Computational Complexities 17
1.8 Summary 17
Chapter 2 The Data Mining Process 19
Purpose 19
Goals 19
2.1 Introduction 19
2.2 Discovery and Exploitation 20
2.3 Eleven Key Principles of Information Driven Data Mining 23
2.4 Key Principles Expanded 24
2.5 Type of Models: Descriptive, Predictive, Forensic 30
2.5.1 Domain Ontologies as Models 30
2.5.2 Descriptive Models 32
2.5.3 Predictive Models 32
2.5.4 Forensic Models 32
2.6 Data Mining Methodologies 32
2.6.1 Conventional System Development: Waterfall Process 33
2.6.2 Data Mining as Rapid Prototyping 34
2.7 A Generic Data Mining Process 34
2.8 RAD Skill Set Designators 35
2.9 Summary 36
Chapter 3 Problem Definition (Step 1) 37
Purpose 37
Goals 37
3.1 Introduction 37
3.2 Problem Definition Task 1: Characterize Your Problem 38
3.3 Problem Definition Checklist 38
3.3.1 Identify Previous Work 43
3.3.2 Data Demographics 45
3.3.3 User Interface 47
3.3.4 Covering Blind Spots 50
3.3.5 Evaluating Domain Expertise 51
3.3.6 Tools 53
3.3.7 Methodology 54
3.3.8 Needs 54
3.4 Candidate Solution Checklist 56
3.4.1 What Type of Data Mining Must the System Perform? 56
3.4.2 Multifaceted Problems Demand Multifaceted Solutions 57
3.4.3 The Nature of the Data 58
3.5 Problem Definition Task 2: Characterizing Your Solution 62
3.5.1 Candidate Solution Checklist 62
3.6 Problem Definition Case Study 64
3.6.1 Predictive Attrition Model: Summary Description 64
3.6.2 Glossary 64
3.6.3 The ATM Concept 65
3.6.4 Operational Functions 65
3.6.5 Predictive Modeling and ATM 67
3.6.6 Cognitive Systems and Predictive Modeling 68
3.6.7 The ATM Hybrid Cognitive Engine 68
3.6.8 Testing and Validation of Cognitive Systems 69
3.6.9 Spiral Development Methodology 69
3.7 Summary 70
Chapter 4 Data Evaluation (Step 2) 71
Purpose 71
Goals 71
4.1 Introduction 71
4.2 Data Accessibility Checklist 72
4.3 How Much Data Do You Need? 74
4.4 Data Staging 75
4.5 Methods Used for Data Evaluation 76
4.6 Data Evaluation Case Study: Estimating the Information Content Features 77
4.7 Some Simple Data Evaluation Methods 81
4.8 Data Quality Checklist 85
4.9 Summary 87
Chapter 5 Feature Extraction and Enhancement (Step 3) 89
Purpose 89
Goals 89
5.1 Introduction: A Quick Tutorial on Feature Space 89
5.1.1 Data Preparation Guidelines 90
5.1.2 General Techniques for Feature Selection and Enhancement 91
5.2 Characterizing and Resolving Data Problems 93
5.2.1 Outlier Case Study 95
5.2.2 Winnowing Case Study: Principal Component Analysis for Feature Extraction 95
5.3 Principal Component Analysis 96
5.3.1 Feature Winnowing and Dimension Reduction Checklist 102
5.3.2 Checklist for Characterizing and Resolving Data Problems 107
5.4 Synthesis of Features 108
5.4.1 Feature Synthesis Case Study 108
5.4.2 Synthesis of Features Checklist 111
5.5 Degapping 112
5.5.1 Degapping Case Study 114
5.5.2 Feature Selection Checklist 117
5.6 Summary 119
Chapter 6 Prototyping Plan and Model Development (Step 4) 121
Purpose 121
Goals 121
6.1 Introduction 121
6.2 Step 4A: Prototyping Plan 122
6.2.1 Prototype Planning as Part of a Data Mining Project 122
6.3 Prototyping Plan Case Study 124
6.4 Step 4B: Prototyping/Model Development 133
6.5 Model Development Case Study 135
6.6 Summary 141
Chapter 7 Model Evaluation (Step 5) 143
Purpose 143
Goals 143
7.1 Introduction 143
7.2 Evaluation Goals and Methods 144
7.2.1 Performance Evaluation Components 144
7.2.2 Stability Evaluation Components 144
7.3 What Does Accuracy Mean? 146
7.3.1 Confusion Matrix Example 146
7.3.2 Other Metrics Derived from the Confusion Matrix 150
7.3.3 Model Evaluation Case Study: Addressing Queuing Problems by Simulation 150
7.3.4 Model Evaluation Checklist 152
7.4 Summary 155
Chapter 8 Implementation (Step 6) 157
Purpose 157
Goals 157
8.1 Introduction 157
8.1.1 Implementation Checklist 158
8.2 Quantifying the Benefits of Data Mining 160
8.2.1 ROI Case Study 160
8.2.2 ROI Checklist 162
8.3 Tutorial on Ensemble Methods 164
8.3.1 Many Predictive Modeling Paradigms Are Available 165
8.3.2 Adaptive Training 167
8.4 Getting It Wrong: Mistakes Every Data Miner Has Made 169
8.5 Summary 176
Chapter 9 Supervised Learning Genre Section 1—Detecting and Characterizing Known Patterns 179
Purpose 179
Goals 179
9.1 Introduction 180
9.2 Representative Example of Supervised Learning: Building a Classifier 180
9.2.1 Problem Description 180
9.2.2 Data Description: Background Research/Planning 181
9.2.3 Descriptive Modeling of Data: Preprocessing and Data Conditioning 182
9.2.4 Data Exploitation: Feature Extraction and Enhancement 185
9.2.5 Model Selection and Development 187
9.2.6 Model Training 189
9.2.7 Model Evaluation 189
9.3 Specific Challenges, Problems, and Pitfalls of Supervised Learning 190
9.3.1 High-Dimensional Feature Vectors (PCA, Winnowing) 190
9.3.2 Not Enough Data 191
9.3.3 Too Much Data 192
9.3.4 Unbalanced Data 192
9.3.5 Overtraining 193
9.3.6 Noncommensurable Data: Outliers 193
9.3.7 Missing Features 195
9.3.8 Missing Ground Truth 195
9.4 Recommended Data Mining Architectures for Supervised Learning 195
9.5 Descriptive Analysis 198
9.5.1 Technical Component: Problem Definition 198
9.5.2 Technical Component: Data Selection and Preparation 200
9.5.3 Technical Component: Data Representation 200
9.6 Predictive Modeling 201
9.6.1 Technical Component: Paradigm Selection 201
9.6.2 Technical Component: Model Construction and Validation 202
9.6.3 Technical Component: Model Evaluation (Functional and Performance Metrics) 202
9.6.4 Technical Component: Model Deployment 202
9.6.5 Technical Component: Model Maintenance 202
9.7 Summary 204
Chapter 10 Forensic Analysis Genre Section 2—Detecting, Characterizing, and Exploiting Hidden Patterns 205
Purpose 205
Goals 205
10.1 Introduction 206
10.2 Genre Overview 207
10.3 Recommended Data Mining Architectures for Unsupervised Learning 207
10.4 Examples and Case Studies for Unsupervised Learning 209
10.4.1 Case Study: Reducing Cost by Optimizing a System Configuration 212
10.4.2 Case Study: Stacking Multiple Pattern Processors for Broad Functionality 214
10.4.3 Multiparadigm Engine for Cognitive Intrusion Detection 215
10.5 Tutorial on Neural Networks 217
10.5.1 The Neural Analogy 217
10.5.2 Artificial Neurons: Their Form and Function 218
10.5.3 Using Neural Networks to Learn Complex Patterns 219
10.6 Making Syntactic Methods Smarter: The Search Engine Problem 222
10.6.1 A Submetric for Sensitivity 224
10.6.2 A Submetric for Specificity 224
10.6.3 Combining the Submetrics to Obtain a Single Score 225
10.6.4 Putting It All Together: Building a Simple Search Engine 226
10.6.5 The Objective Function for This Search Engine and How to Use It 231
10.7 Summary 231
Chapter 11 Genre Section 3—Knowledge: Its Acquisition, Representation, and Use 233
Purpose 233
Goals 233
11.1 Introduction to Knowledge Engineering 233
11.1.1 The Prototypical Example: Knowledge-Based Expert Systems (KBES) 234
11.1.2 Inference Engines Implement Inferencing Strategies 236
11.2 Computing with Knowledge 237
11.2.1 Graph Methods: Decision Trees, Forward/Backward Chaining, Belief Nets 238
11.2.2 Bayesian Belief Networks 243
11.2.3 Non-Graph Methods: Belief Accumulation 245
11.3 Inferring Knowledge from Data: Machine Learning 246
11.3.1 Learning Machines 247
11.3.2 Using Modeling Techniques to Infer Knowledge from History 248
11.3.3 Domain Knowledge the Learner Will Use 250
11.3.4 Inferring Domain Knowledge from Human Experts 251
11.3.5 Writing on a Blank Slate 255
11.3.6 Mathematizing Human Reasoning 256
11.3.7 Using Facts in Rules 256
11.3.8 Problems and Properties 258
11.4 Summary 259
References 261
Glossary 263
Index 269
|
any_adam_object | 1 |
author | Hancock, Monte |
author_facet | Hancock, Monte |
author_role | aut |
author_sort | Hancock, Monte |
author_variant | m h mh |
building | Verbundindex |
bvnumber | BV039964070 |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.9.D343 |
callnumber-search | QA76.9.D343 |
callnumber-sort | QA 276.9 D343 |
callnumber-subject | QA - Mathematics |
classification_rvk | SK 850 ST 530 |
ctrlnum | (OCoLC)779583658 (DE-599)BVBBV039964070 |
dewey-full | 006.3/12 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.3/12 |
dewey-search | 006.3/12 |
dewey-sort | 16.3 212 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01428nam a2200409zc 4500</leader><controlfield tag="001">BV039964070</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20121019 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">120316s2012 xxuad|| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2011040834</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781439868362</subfield><subfield code="9">978-1-439-86836-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1439868360</subfield><subfield code="9">1-439-86836-0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)779583658</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV039964070</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-634</subfield><subfield code="a">DE-824</subfield><subfield code="a">DE-473</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.9.D343</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.3/12</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SK 850</subfield><subfield code="0">(DE-625)143263:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield 
code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Hancock, Monte</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Practical data mining</subfield><subfield code="c">Monte F. Hancock, Jr.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton, Fla. [u.a.]</subfield><subfield code="b">CRC Press</subfield><subfield code="c">2012</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXIII, 277 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">An Auerbach book</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bamberg</subfield><subfield code="q">application/pdf</subfield><subfield 
code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024821745&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-024821745</subfield></datafield></record></collection> |
id | DE-604.BV039964070 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:15:07Z |
institution | BVB |
isbn | 9781439868362 1439868360 |
language | English |
lccn | 2011040834 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-024821745 |
oclc_num | 779583658 |
open_access_boolean | |
owner | DE-634 DE-824 DE-473 DE-BY-UBG |
owner_facet | DE-634 DE-824 DE-473 DE-BY-UBG |
physical | XXIII, 277 S. Ill., graph. Darst. |
publishDate | 2012 |
publishDateSearch | 2012 |
publishDateSort | 2012 |
publisher | CRC Press |
record_format | marc |
series2 | An Auerbach book |
spelling | Hancock, Monte Verfasser aut Practical data mining Monte F. Hancock, Jr. Boca Raton, Fla. [u.a.] CRC Press 2012 XXIII, 277 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier An Auerbach book Data mining Data Mining (DE-588)4428654-5 gnd rswk-swf Data Mining (DE-588)4428654-5 s DE-604 Digitalisierung UB Bamberg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024821745&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Hancock, Monte Practical data mining Data mining Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4428654-5 |
title | Practical data mining |
title_auth | Practical data mining |
title_exact_search | Practical data mining |
title_full | Practical data mining Monte F. Hancock, Jr. |
title_fullStr | Practical data mining Monte F. Hancock, Jr. |
title_full_unstemmed | Practical data mining Monte F. Hancock, Jr. |
title_short | Practical data mining |
title_sort | practical data mining |
topic | Data mining Data Mining (DE-588)4428654-5 gnd |
topic_facet | Data mining Data Mining |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024821745&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT hancockmonte practicaldatamining |