Predictive data mining: a practical guide
Main authors: | Weiss, Sholom M.; Indurkhya, Nitin |
---|---|
Format: | Book |
Language: | English |
Published: |
San Francisco, Calif.
Morgan Kaufmann
2008
|
Edition: | Transferred to digital printing |
Subjects: | Methode; Datenbankverwaltung; Data Mining |
Online access: | Table of contents |
Description: | XII, 228 pp., graphical illustrations |
ISBN: | 9781558604032 1558604030 |
Internal format
MARC
LEADER | 00000nam a2200000zc 4500 | ||
---|---|---|---|
001 | BV036699323 | ||
003 | DE-604 | ||
005 | 20101008 | ||
007 | t | ||
008 | 101004s2008 d||| |||| 00||| eng d | ||
020 | |a 9781558604032 |9 978-1-55860-403-2 | ||
020 | |a 1558604030 |9 1-55860-403-0 | ||
035 | |a (OCoLC)705877302 | ||
035 | |a (DE-599)BVBBV036699323 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-703 |a DE-11 | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Weiss, Sholom M. |e Verfasser |4 aut | |
245 | 1 | 0 | |a Predictive data mining |b a practical guide |c Sholom M. Weiss ; Nitin Indurkhya |
250 | |a Transferred to digital printing | ||
264 | 1 | |a San Francisco, Calif. |b Morgan Kaufmann |c 2008 | |
300 | |a XII, 228 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Methode |0 (DE-588)4038971-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenbankverwaltung |0 (DE-588)4389357-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | 1 | |a Datenbankverwaltung |0 (DE-588)4389357-0 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 1 | 1 | |a Methode |0 (DE-588)4038971-6 |D s |
689 | 1 | |8 1\p |5 DE-604 | |
700 | 1 | |a Indurkhya, Nitin |e Verfasser |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Bayreuth |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020617793&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-020617793 | ||
883 | 1 | |8 1\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk |
Record in the search index
_version_ | 1804143338197090304 |
---|---|
adam_text | Table of Contents
Preface
xi
1
What Is Data Mining?
1
1.1
Big Data
2
1.1.1
The Data Warehouse
3
1.1.2
Timelines
6
1.2
Types of Data-Mining Problems
7
1.3
The Pedigree of Data Mining
11
1.3.1
Databases
11
1.3.2
Statistics
12
1.3.3
Machine Learning
13
1.4
Is Big Better?
14
1.4.1
Strong Statistical Evaluation
14
1.4.2
More Intensive Search
14
1.4.3
More Controlled Experiments
15
1.4.4
Is Big Necessary?
15
1.5
The Tasks of Predictive Data Mining
16
1.5.1
Data Preparation
16
1.5.2
Data Reduction
18
1.5.3
Data Modeling and Prediction
19
1.5.4
Case and Solution Analyses
19
1.6
Data Mining: Art or Science?
21
1.7
An Overview of the Book
21
1.8
Bibliographic and Historical Remarks
22
2
Statistical Evaluation for Big Data
25
2.1
The Idealized Model
26
2.1.1
Classical Statistical Comparison and Evaluation
27
2.2
It's Big but Is It Biased?
30
2.2.1
Objective Versus Survey Data
30
2.2.2
Significance and Predictive Value
31
2.2.2.1
Too Many Comparisons?
32
2.3
Classical Types of Statistical Prediction
33
2.3.1
Predicting True-or-False: Classification
34
2.3.1.1
Error Rates
34
2.3.2
Forecasting Numbers: Regression
34
2.3.2.1
Distance Measures
35
2.4
Measuring Predictive Performance
36
2.4.1
Independent Testing
36
2.4.1.1
Random Training and Testing
36
2.4.1.2
How Accurate Is the Error Estimate?
38
2.4.1.3
Comparing Results for Error Measures
39
2.4.1.4
Ideal or Real-World Sampling?
41
2.4.1.5
Training and Testing from Different Time Periods
43
2.5
Too Much Searching and Testing?
45
2.6
Why Are Errors Made?
47
2.7
Bibliographic and Historical Remarks
49
3
Preparing the Data
51
3.1
A Standard Form
52
3.1.1
Standard Measurements
53
3.1.2
Goals
55
3.2
Data Transformations
55
3.2.1
Normalizations
57
3.2.2
Data Smoothing
58
3.2.3
Differences and Ratios
60
3.3
Missing Data
61
3.4
Time-Dependent Data
62
3.4.1
Time Series
63
3.4.2
Composing Features from Time Series
67
3.4.2.1
Current Values
68
3.4.2.2
Moving Averages
68
3.4.2.3
Trends
69
3.4.2.4
Seasonal Adjustments
70
3.5
Hybrid Time-Dependent Applications
71
3.5.1
Multivariate Time Series
72
3.5.2
Classification and Time Series
73
3.5.3
Standard Cases with Time-Series Attributes
73
3.6
Text Mining
74
3.7
Bibliographic and Historical Remarks
78
4
Data Reduction
81
4.1
Selecting the Best Features
84
4.2
Feature Selection from Means and Variances
86
4.2.1
Independent Features
87
4.2.2
Distance-Based Optimal Feature Selection
88
4.2.3
Heuristic Feature Selection
90
4.3
Principal Components
92
4.4
Feature Selection by Decision Trees
95
4.5
How Many Measured Values?
96
4.5.1
Reducing and Smoothing Values
98
4.5.1.1
Rounding
101
4.5.1.2
K-Means Clustering
102
4.5.1.3
Class Entropy
104
4.6
How Many Cases?
106
4.6.1
A Single Sample
109
4.6.2
Incremental Samples
111
4.6.3
Average Samples
113
4.6.4
Specialized Case-Reduction Techniques
115
4.6.4.1
Sequential Sampling over Time
115
4.6.4.2
Strategic Sampling of Key Events
116
4.6.4.3
Adjusting Prevalence
116
4.7
Bibliographic and Historical Remarks
117
5
Looking for Solutions
119
5.1
Overview
119
5.2
Math Solutions
120
5.2.1
Linear Scoring
120
5.2.2
Nonlinear Scoring: Neural Nets
123
5.2.3
Advanced Statistical Methods
128
5.3
Distance Solutions
132
5.4
Logic Solutions
5.4.1
Decision Trees
136
5.4.2
Decision Rules
138
5.5
What Do the Answers Mean?
142
5.5.1
Is It Safe to Edit Solutions?
144
5.6
Which Solution Is Preferable?
145
5.7
Combining Different Answers
146
5.7.1
Multiple Prediction Methods
147
5.7.2
Multiple Samples
148
5.8
Bibliographic and Historical Remarks
150
6
What's Best for Data Reduction and Mining?
153
6.1
Let's Analyze Some Real Data
154
6.2
The Experimental Methods
158
6.3
The Empirical Results
161
6.3.1
Significance Testing
162
6.4
So What Did We Learn?
162
6.4.1
Feature Selection
163
6.4.2
Value Reduction
167
6.4.3
Subsampling or All Cases?
170
6.5
Graphical Trend Analysis
172
6.5.1
Incremental Case Analysis
173
6.5.2
Incremental Complexity Analysis
176
6.6
Maximum Data Reduction
181
6.7
Are There Winners and Losers in Performance?
182
6.8
Getting the Best Results
184
6.9
Bibliographic and Historical Remarks
187
7
Art or Science? Case Studies in Data Mining
189
7.1
Why These Case Studies?
190
7.2
A Summary of Tasks for Predictive Data Mining
191
7.2.1
A Checklist for Data Preparation
192
7.2.2
A Checklist for Data Reduction
192
7.2.3
A Checklist for Data Modeling and Prediction
192
7.2.4
A Checklist for Case and Solution Analyses
193
7.3
The Case Studies
193
7.3.1
Transaction Processing
193
7.3.2
Text Mining
197
7.3.3
Outcomes Analysis
199
7.3.4
Process Control
202
7.3.5
Marketing and User Profiling
205
7.3.6
Exploratory Analysis
207
7.4
Looking Ahead
210
7.5
Bibliographic and Historical Remarks
211
Appendix: Data-Miner Software Kit
213
References
215
Author Index
223
Subject Index
225
|
any_adam_object | 1 |
author | Weiss, Sholom M. Indurkhya, Nitin |
author_facet | Weiss, Sholom M. Indurkhya, Nitin |
author_role | aut aut |
author_sort | Weiss, Sholom M. |
author_variant | s m w sm smw n i ni |
building | Verbundindex |
bvnumber | BV036699323 |
classification_rvk | ST 530 |
ctrlnum | (OCoLC)705877302 (DE-599)BVBBV036699323 |
discipline | Informatik |
edition | Transferred to digital printing |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01723nam a2200433zc 4500</leader><controlfield tag="001">BV036699323</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20101008 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">101004s2008 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781558604032</subfield><subfield code="9">978-1-55860-403-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1558604030</subfield><subfield code="9">1-55860-403-0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)705877302</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV036699323</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-703</subfield><subfield code="a">DE-11</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Weiss, Sholom M.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Predictive data mining</subfield><subfield code="b">a practical guide</subfield><subfield code="c">Sholom M. 
Weiss ; Nitin Indurkhya</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">Transferred to digital printing</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">San Francisco, Calif.</subfield><subfield code="b">Morgan Kaufmann</subfield><subfield code="c">2008</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XII, 228 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Methode</subfield><subfield code="0">(DE-588)4038971-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenbankverwaltung</subfield><subfield code="0">(DE-588)4389357-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Datenbankverwaltung</subfield><subfield code="0">(DE-588)4389357-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="1" ind2="0"><subfield code="a">Data Mining</subfield><subfield 
code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2="1"><subfield code="a">Methode</subfield><subfield code="0">(DE-588)4038971-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Indurkhya, Nitin</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020617793&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-020617793</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield></record></collection> |
id | DE-604.BV036699323 |
illustrated | Illustrated |
indexdate | 2024-07-09T22:46:04Z |
institution | BVB |
isbn | 9781558604032 1558604030 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-020617793 |
oclc_num | 705877302 |
open_access_boolean | |
owner | DE-703 DE-11 |
owner_facet | DE-703 DE-11 |
physical | XII, 228 S. graph. Darst. |
publishDate | 2008 |
publishDateSearch | 2008 |
publishDateSort | 2008 |
publisher | Morgan Kaufmann |
record_format | marc |
spelling | Weiss, Sholom M. Verfasser aut Predictive data mining a practical guide Sholom M. Weiss ; Nitin Indurkhya Transferred to digital printing San Francisco, Calif. Morgan Kaufmann 2008 XII, 228 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Methode (DE-588)4038971-6 gnd rswk-swf Datenbankverwaltung (DE-588)4389357-0 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf Data Mining (DE-588)4428654-5 s Datenbankverwaltung (DE-588)4389357-0 s DE-604 Methode (DE-588)4038971-6 s 1\p DE-604 Indurkhya, Nitin Verfasser aut Digitalisierung UB Bayreuth application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020617793&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis 1\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk |
spellingShingle | Weiss, Sholom M. Indurkhya, Nitin Predictive data mining a practical guide Methode (DE-588)4038971-6 gnd Datenbankverwaltung (DE-588)4389357-0 gnd Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4038971-6 (DE-588)4389357-0 (DE-588)4428654-5 |
title | Predictive data mining a practical guide |
title_auth | Predictive data mining a practical guide |
title_exact_search | Predictive data mining a practical guide |
title_full | Predictive data mining a practical guide Sholom M. Weiss ; Nitin Indurkhya |
title_fullStr | Predictive data mining a practical guide Sholom M. Weiss ; Nitin Indurkhya |
title_full_unstemmed | Predictive data mining a practical guide Sholom M. Weiss ; Nitin Indurkhya |
title_short | Predictive data mining |
title_sort | predictive data mining a practical guide |
title_sub | a practical guide |
topic | Methode (DE-588)4038971-6 gnd Datenbankverwaltung (DE-588)4389357-0 gnd Data Mining (DE-588)4428654-5 gnd |
topic_facet | Methode Datenbankverwaltung Data Mining |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020617793&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT weisssholomm predictivedataminingapracticalguide AT indurkhyanitin predictivedataminingapracticalguide |