Mining imperfect data: dealing with contamination and incomplete records
Main author: | Pearson, Ronald K. |
---|---|
Format: | Book |
Language: | English |
Published: | Philadelphia : Society for Industrial and Applied Mathematics, 2005 |
Subjects: | Data mining; Fehlende Daten (missing data) |
Online access: | Publisher description; Table of contents only; Table of contents |
Description: | Includes bibliographical references (p. 287-300) and index |
Description: | X, 305 pp., graphs, 25 cm |
ISBN: | 0898715822 |
Internal format
MARC
LEADER | 00000nam a2200000zc 4500 | ||
---|---|---|---|
001 | BV023008639 | ||
003 | DE-604 | ||
005 | 20080107 | ||
007 | t | ||
008 | 071119s2005 xxud||| |||| 00||| eng d | ||
010 | |a 2004065395 | ||
020 | |a 0898715822 |c pbk. |9 0-89871-582-2 | ||
035 | |a (OCoLC)57210946 | ||
035 | |a (DE-599)BVBBV023008639 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
044 | |a xxu |c US | ||
049 | |a DE-384 |a DE-N2 | ||
050 | 0 | |a QA76.9.D343 | |
082 | 0 | |a 006.3 | |
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
100 | 1 | |a Pearson, Ronald K. |d 1952- |e Verfasser |0 (DE-588)121582736 |4 aut | |
245 | 1 | 0 | |a Mining imperfect data |b dealing with contamination and incomplete records |c Ronald K. Pearson |
264 | 1 | |a Philadelphia |b Society for Industrial and Applied Mathematics |c 2005 | |
300 | |a X, 305 S. |b graph. Darst. |c 25 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Includes bibliographical references (p. 287-300) and index | ||
650 | 4 | |a Data mining | |
650 | 0 | 7 | |a Fehlende Daten |0 (DE-588)4264715-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Fehlende Daten |0 (DE-588)4264715-0 |D s |
689 | 0 | 1 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | |u http://www.loc.gov/catdir/enhancements/fy0665/2004065395-d.html |3 Publisher description | |
856 | 4 | |u http://www.loc.gov/catdir/enhancements/fy0665/2004065395-t.html |3 Table of contents only | |
856 | 4 | 2 | |m Digitalisierung UB Augsburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016212883&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-016212883 |
Record in the search index
_version_ | 1804137222120669184 |
---|---|
adam_text | Contents
Preface ix
1 Introduction 1
1.1 Data anomalies 2
1.1.1 Outliers 2
1.1.2 Boxplots: A useful comparison tool 4
1.1.3 Missing data 7
1.1.4 Misalignments 9
1.1.5 Unexpected structure 11
1.2 Data anomalies need not be bad 13
1.2.1 Materials with anomalous properties 13
1.2.2 Product design: Looking for "good" anomalies 14
1.2.3 "Niches" in business data records 16
1.3 Conversely, data anomalies can be very bad 16
1.3.1 The CAMDA'02 normal mouse dataset 16
1.3.2 The influence of outliers on kurtosis 18
1.4 Dealing with data anomalies 19
1.4.1 Outlier-resistant analysis procedures 20
1.4.2 Outlier detection procedures 23
1.4.3 Preprocessing for anomaly detection 24
1.5 GSA 25
1.6 Organization of this book 31
2 Imperfect Datasets: Character, Consequences, and Causes 33
2.1 Outliers 34
2.1.1 Univariate outliers 34
2.1.2 Multivariate outliers 37
2.1.3 Time-series outliers 40
2.2 Consequences of outliers 41
2.2.1 Moments versus order statistics 41
2.2.2 The effect of outliers on volcano plots 45
2.2.3 Product-moment correlations 47
2.2.4 Spearman rank correlations 50
2.3 Sources of data anomalies 52
2.3.1 Gross measurement errors and outliers 52
2.3.2 Misalignments and software errors 54
2.3.3 Constraints and hidden symmetries 58
2.4 Missing data 60
2.4.1 Nonignorable missing data and sampling bias 60
2.4.2 Special codes, nulls, and disguises 61
2.4.3 Idempotent data transformations 63
2.4.4 Missing data from file merging 66
3 Univariate Outlier Detection 69
3.1 Univariate outlier models 70
3.2 Three outlier detection procedures 73
3.2.1 The 3σ edit rule 74
3.2.2 The Hampel identifier 76
3.2.3 Quartile-based detection and boxplots 77
3.3 Performance comparison 78
3.3.1 Formulation of the case study 79
3.3.2 The uncontaminated reference case 79
3.3.3 Results for 1% contamination 80
3.3.4 Results for 5% contamination 82
3.3.5 Results for 15% contamination 84
3.3.6 Brief summary of the results 86
3.4 Application to real datasets 86
3.4.1 The catalyst dataset 87
3.4.2 The flow rate dataset 88
3.4.3 The industrial pressure dataset 90
4 Data Pretreatment 93
4.1 Noninformative variables 93
4.1.1 Classes of noninformative variables 94
4.1.2 A microarray dataset 95
4.1.3 Noise variables 96
4.1.4 Occam's hatchet and omission bias 98
4.2 Handling missing data 102
4.2.1 Omission of missing values 102
4.2.2 Single imputation strategies 103
4.2.3 Multiple imputation strategies 105
4.2.4 Unmeasured and unmeasurable variables 108
4.3 Cleaning time-series 110
4.3.1 The nature of the problem 110
4.3.2 Data-cleaning filters 115
4.3.3 The center-weighted median filter 118
4.3.4 The Hampel filter 122
4.4 Multivariate outlier detection 124
4.4.1 Visual inspection 125
4.4.2 Covariance-based detection 127
4.4.3 Regression-based detection 131
4.4.4 Depth-based detection 134
4.5 Preliminary analyses and auxiliary knowledge 138
5 What Is a "Good" Data Characterization? 141
5.1 A motivating example 142
5.2 Characterization via functional equations 143
5.2.1 A brief introduction to functional equations 144
5.2.2 Homogeneity and its extensions 147
5.2.3 Location-invariance and related conditions 149
5.2.4 Outlier detection procedures 152
5.2.5 Quasi-linear means 154
5.2.6 Results for positive-breakdown estimators 155
5.3 Characterization via inequalities 159
5.3.1 Inequalities as aids to interpretation 159
5.3.2 Relations between data characterizations 161
5.3.3 Bounds on means and standard deviations 163
5.3.4 Inequalities as uncertainty descriptions 166
5.4 Coda: What is a "good" data characterization? 172
6 GSA 177
6.1 The GSA metaheuristic 177
6.2 The notion of exchangeability 179
6.3 Choosing scenarios 180
6.3.1 Some general guidelines 181
6.3.2 Managing subscenarios 186
6.3.3 Experimental design and scenario selection 188
6.4 Sampling schemes 190
6.5 Selecting a descriptor d() 192
6.6 Displaying and interpreting the results 193
6.6.1 Normal Q-Q plots 194
6.6.2 Direct comparisons across scenarios 195
6.7 The model approximation case study 197
6.8 Extensions of the basic GSA framework 201
6.8.1 Iterative analysis procedures 201
6.8.2 Multivariable descriptors 204
7 Sampling Schemes for a Fixed Dataset 207
7.1 Four general strategies 207
7.1.1 Strategy 1: Random selection 208
7.1.2 Correlation and overlap 212
7.1.3 Strategy 2: Subset deletion 213
7.1.4 Strategy 3: Comparisons 217
7.1.5 Strategy 4: Partially systematic sampling 222
7.2 Random selection examples 231
7.2.1 Variability of kurtosis estimates 232
7.2.2 The industrial pressure datasets 235
7.3 Subset deletion examples 237
7.3.1 The storage tank dataset 237
7.3.2 Dynamic correlation analysis 240
7.4 Comparison-based examples 244
7.4.1 Correlation-destroying permutations 245
7.4.2 Rank-based dynamic analysis 248
7.5 Two systematic selection examples 251
7.5.1 Moving-window data characterizations 252
7.5.2 The Michigan lung cancer dataset 263
8 Concluding Remarks and Open Questions 269
8.1 Analyzing large datasets 269
8.2 Prior knowledge, auxiliary data, and assumptions 272
8.3 Some open questions 274
8.3.1 How prevalent are different types of data anomalies? 274
8.3.2 How should outliers be modelled? 276
8.3.3 How should asymmetry be handled? 277
8.3.4 How should misalignments be detected? 283
Bibliography 287
Index 301
|
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Pearson, Ronald K. 1952- |
author_GND | (DE-588)121582736 |
author_facet | Pearson, Ronald K. 1952- |
author_role | aut |
author_sort | Pearson, Ronald K. 1952- |
author_variant | r k p rk rkp |
building | Verbundindex |
bvnumber | BV023008639 |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.9.D343 |
callnumber-search | QA76.9.D343 |
callnumber-sort | QA 276.9 D343 |
callnumber-subject | QA - Mathematics |
classification_rvk | ST 530 |
ctrlnum | (OCoLC)57210946 (DE-599)BVBBV023008639 |
dewey-full | 006.3 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.3 |
dewey-search | 006.3 |
dewey-sort | 16.3 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
discipline_str_mv | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01791nam a2200433zc 4500</leader><controlfield tag="001">BV023008639</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20080107 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">071119s2005 xxud||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2004065395</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0898715822</subfield><subfield code="c">pbk.</subfield><subfield code="9">0-89871-582-2</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)57210946</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV023008639</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-384</subfield><subfield code="a">DE-N2</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.9.D343</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.3</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Pearson, Ronald K.</subfield><subfield code="d">1952-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)121582736</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield 
code="a">Mining imperfect data</subfield><subfield code="b">dealing with contamination and incomplete records</subfield><subfield code="c">Ronald K. Pearson</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Philadelphia</subfield><subfield code="b">Society for Industrial and Applied Mathematics</subfield><subfield code="c">2005</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">X, 305 S.</subfield><subfield code="b">graph. Darst.</subfield><subfield code="c">25 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references (p. 
287-300) and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Fehlende Daten</subfield><subfield code="0">(DE-588)4264715-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Fehlende Daten</subfield><subfield code="0">(DE-588)4264715-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/enhancements/fy0665/2004065395-d.html</subfield><subfield code="3">Publisher description</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/enhancements/fy0665/2004065395-t.html</subfield><subfield code="3">Table of contents only</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016212883&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-016212883</subfield></datafield></record></collection> |
id | DE-604.BV023008639 |
illustrated | Illustrated |
index_date | 2024-07-02T19:08:31Z |
indexdate | 2024-07-09T21:08:51Z |
institution | BVB |
isbn | 0898715822 |
language | English |
lccn | 2004065395 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-016212883 |
oclc_num | 57210946 |
open_access_boolean | |
owner | DE-384 DE-N2 |
owner_facet | DE-384 DE-N2 |
physical | X, 305 S. graph. Darst. 25 cm |
publishDate | 2005 |
publishDateSearch | 2005 |
publishDateSort | 2005 |
publisher | Society for Industrial and Applied Mathematics |
record_format | marc |
spelling | Pearson, Ronald K. 1952- Verfasser (DE-588)121582736 aut Mining imperfect data dealing with contamination and incomplete records Ronald K. Pearson Philadelphia Society for Industrial and Applied Mathematics 2005 X, 305 S. graph. Darst. 25 cm txt rdacontent n rdamedia nc rdacarrier Includes bibliographical references (p. 287-300) and index Data mining Fehlende Daten (DE-588)4264715-0 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf Fehlende Daten (DE-588)4264715-0 s Data Mining (DE-588)4428654-5 s DE-604 http://www.loc.gov/catdir/enhancements/fy0665/2004065395-d.html Publisher description http://www.loc.gov/catdir/enhancements/fy0665/2004065395-t.html Table of contents only Digitalisierung UB Augsburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016212883&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Pearson, Ronald K. 1952- Mining imperfect data dealing with contamination and incomplete records Data mining Fehlende Daten (DE-588)4264715-0 gnd Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4264715-0 (DE-588)4428654-5 |
title | Mining imperfect data dealing with contamination and incomplete records |
title_auth | Mining imperfect data dealing with contamination and incomplete records |
title_exact_search | Mining imperfect data dealing with contamination and incomplete records |
title_exact_search_txtP | Mining imperfect data dealing with contamination and incomplete records |
title_full | Mining imperfect data dealing with contamination and incomplete records Ronald K. Pearson |
title_fullStr | Mining imperfect data dealing with contamination and incomplete records Ronald K. Pearson |
title_full_unstemmed | Mining imperfect data dealing with contamination and incomplete records Ronald K. Pearson |
title_short | Mining imperfect data |
title_sort | mining imperfect data dealing with contamination and incomplete records |
title_sub | dealing with contamination and incomplete records |
topic | Data mining Fehlende Daten (DE-588)4264715-0 gnd Data Mining (DE-588)4428654-5 gnd |
topic_facet | Data mining Fehlende Daten Data Mining |
url | http://www.loc.gov/catdir/enhancements/fy0665/2004065395-d.html http://www.loc.gov/catdir/enhancements/fy0665/2004065395-t.html http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016212883&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT pearsonronaldk miningimperfectdatadealingwithcontaminationandincompleterecords |