Availability: Mining imperfect data

Mining imperfect data: dealing with contamination and incomplete records


Bibliographic details
Main author:	Pearson, Ronald K. 1952- (Author)
Format:	Book
Language:	English
Published:	Philadelphia: Society for Industrial and Applied Mathematics, 2005
Subjects:	Data mining; Fehlende Daten (missing data); Data Mining
Online access:	Publisher description; Table of contents only; Inhaltsverzeichnis (table of contents)
Description:	Includes bibliographical references (p. 287-300) and index
Physical description:	X, 305 p., graphs, 25 cm
ISBN:	0898715822

Internal format

MARC


LEADER	00000nam a2200000zc 4500
001	BV023008639
003	DE-604
005	20080107
007	t
008	071119s2005 xxud\|\|\| \|\|\|\| 00\|\|\| eng d
010			\|a 2004065395
020			\|a 0898715822 \|c pbk. \|9 0-89871-582-2
035			\|a (OCoLC)57210946
035			\|a (DE-599)BVBBV023008639
040			\|a DE-604 \|b ger \|e aacr
041	0		\|a eng
044			\|a xxu \|c US
049			\|a DE-384 \|a DE-N2
050		0	\|a QA76.9.D343
082	0		\|a 006.3
084			\|a ST 530 \|0 (DE-625)143679: \|2 rvk
100	1		\|a Pearson, Ronald K. \|d 1952- \|e Verfasser \|0 (DE-588)121582736 \|4 aut
245	1	0	\|a Mining imperfect data \|b dealing with contamination and incomplete records \|c Ronald K. Pearson
264		1	\|a Philadelphia \|b Society for Industrial and Applied Mathematics \|c 2005
300			\|a X, 305 S. \|b graph. Darst. \|c 25 cm
336			\|b txt \|2 rdacontent
337			\|b n \|2 rdamedia
338			\|b nc \|2 rdacarrier
500			\|a Includes bibliographical references (p. 287-300) and index
650		4	\|a Data mining
650	0	7	\|a Fehlende Daten \|0 (DE-588)4264715-0 \|2 gnd \|9 rswk-swf
650	0	7	\|a Data Mining \|0 (DE-588)4428654-5 \|2 gnd \|9 rswk-swf
689	0	0	\|a Fehlende Daten \|0 (DE-588)4264715-0 \|D s
689	0	1	\|a Data Mining \|0 (DE-588)4428654-5 \|D s
689	0		\|5 DE-604
856	4		\|u http://www.loc.gov/catdir/enhancements/fy0665/2004065395-d.html \|3 Publisher description
856	4		\|u http://www.loc.gov/catdir/enhancements/fy0665/2004065395-t.html \|3 Table of contents only
856	4	2	\|m Digitalisierung UB Augsburg \|q application/pdf \|u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016212883&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA \|3 Inhaltsverzeichnis
999			\|a oai:aleph.bib-bvb.de:BVB01-016212883

Record in the search index

_version_	1804137222120669184
adam_text	Contents
Preface ix
1 Introduction 1
1.1 Data anomalies 2
1.1.1 Outliers 2
1.1.2 Boxplots: A useful comparison tool 4
1.1.3 Missing data 7
1.1.4 Misalignments 9
1.1.5 Unexpected structure 11
1.2 Data anomalies need not be bad 13
1.2.1 Materials with anomalous properties 13
1.2.2 Product design: Looking for "good" anomalies 14
1.2.3 "Niches" in business data records 16
1.3 Conversely, data anomalies can be very bad 16
1.3.1 The CAMDA'02 normal mouse dataset 16
1.3.2 The influence of outliers on kurtosis 18
1.4 Dealing with data anomalies 19
1.4.1 Outlier-resistant analysis procedures 20
1.4.2 Outlier detection procedures 23
1.4.3 Preprocessing for anomaly detection 24
1.5 GSA 25
1.6 Organization of this book 31
2 Imperfect Datasets: Character, Consequences, and Causes 33
2.1 Outliers 34
2.1.1 Univariate outliers 34
2.1.2 Multivariate outliers 37
2.1.3 Time-series outliers 40
2.2 Consequences of outliers 41
2.2.1 Moments versus order statistics 41
2.2.2 The effect of outliers on volcano plots 45
2.2.3 Product-moment correlations 47
2.2.4 Spearman rank correlations 50
2.3 Sources of data anomalies 52
2.3.1 Gross measurement errors and outliers 52
2.3.2 Misalignments and software errors 54
2.3.3 Constraints and hidden symmetries 58
2.4 Missing data 60
2.4.1 Nonignorable missing data and sampling bias 60
2.4.2 Special codes, nulls, and disguises 61
2.4.3 Idempotent data transformations 63
2.4.4 Missing data from file merging 66
3 Univariate Outlier Detection 69
3.1 Univariate outlier models 70
3.2 Three outlier detection procedures 73
3.2.1 The 3σ edit rule 74
3.2.2 The Hampel identifier 76
3.2.3 Quartile-based detection and boxplots 77
3.3 Performance comparison 78
3.3.1 Formulation of the case study 79
3.3.2 The uncontaminated reference case 79
3.3.3 Results for 1% contamination 80
3.3.4 Results for 5% contamination 82
3.3.5 Results for 15% contamination 84
3.3.6 Brief summary of the results 86
3.4 Application to real datasets 86
3.4.1 The catalyst dataset 87
3.4.2 The flow rate dataset 88
3.4.3 The industrial pressure dataset 90
4 Data Pretreatment 93
4.1 Noninformative variables 93
4.1.1 Classes of noninformative variables 94
4.1.2 A microarray dataset 95
4.1.3 Noise variables 96
4.1.4 Occam's hatchet and omission bias 98
4.2 Handling missing data 102
4.2.1 Omission of missing values 102
4.2.2 Single imputation strategies 103
4.2.3 Multiple imputation strategies 105
4.2.4 Unmeasured and unmeasurable variables 108
4.3 Cleaning time-series 110
4.3.1 The nature of the problem 110
4.3.2 Data-cleaning filters 115
4.3.3 The center-weighted median filter 118
4.3.4 The Hampel filter 122
4.4 Multivariate outlier detection 124
4.4.1 Visual inspection 125
4.4.2 Covariance-based detection 127
4.4.3 Regression-based detection 131
4.4.4 Depth-based detection 134
4.5 Preliminary analyses and auxiliary knowledge 138
5 What Is a "Good" Data Characterization? 141
5.1 A motivating example 142
5.2 Characterization via functional equations 143
5.2.1 A brief introduction to functional equations 144
5.2.2 Homogeneity and its extensions 147
5.2.3 Location-invariance and related conditions 149
5.2.4 Outlier detection procedures 152
5.2.5 Quasi-linear means 154
5.2.6 Results for positive-breakdown estimators 155
5.3 Characterization via inequalities 159
5.3.1 Inequalities as aids to interpretation 159
5.3.2 Relations between data characterizations 161
5.3.3 Bounds on means and standard deviations 163
5.3.4 Inequalities as uncertainty descriptions 166
5.4 Coda: What is a "good" data characterization? 172
6 GSA 177
6.1 The GSA metaheuristic 177
6.2 The notion of exchangeability 179
6.3 Choosing scenarios 180
6.3.1 Some general guidelines 181
6.3.2 Managing subscenarios 186
6.3.3 Experimental design and scenario selection 188
6.4 Sampling schemes 190
6.5 Selecting a descriptor d() 192
6.6 Displaying and interpreting the results 193
6.6.1 Normal Q-Q plots 194
6.6.2 Direct comparisons across scenarios 195
6.7 The model approximation case study 197
6.8 Extensions of the basic GSA framework 201
6.8.1 Iterative analysis procedures 201
6.8.2 Multivariable descriptors 204
7 Sampling Schemes for a Fixed Dataset 207
7.1 Four general strategies 207
7.1.1 Strategy 1: Random selection 208
7.1.2 Correlation and overlap 212
7.1.3 Strategy 2: Subset deletion 213
7.1.4 Strategy 3: Comparisons 217
7.1.5 Strategy 4: Partially systematic sampling 222
7.2 Random selection examples 231
7.2.1 Variability of kurtosis estimates 232
7.2.2 The industrial pressure datasets 235
7.3 Subset deletion examples 237
7.3.1 The storage tank dataset 237
7.3.2 Dynamic correlation analysis 240
7.4 Comparison-based examples 244
7.4.1 Correlation-destroying permutations 245
7.4.2 Rank-based dynamic analysis 248
7.5 Two systematic selection examples 251
7.5.1 Moving-window data characterizations 252
7.5.2 The Michigan lung cancer dataset 263
8 Concluding Remarks and Open Questions 269
8.1 Analyzing large datasets 269
8.2 Prior knowledge, auxiliary data, and assumptions 272
8.3 Some open questions 274
8.3.1 How prevalent are different types of data anomalies? 274
8.3.2 How should outliers be modelled? 276
8.3.3 How should asymmetry be handled? 277
8.3.4 How should misalignments be detected? 283
Bibliography 287
Index 301
any_adam_object	1
any_adam_object_boolean	1
author	Pearson, Ronald K. 1952-
author_GND	(DE-588)121582736
author_facet	Pearson, Ronald K. 1952-
author_role	aut
author_sort	Pearson, Ronald K. 1952-
author_variant	r k p rk rkp
building	Verbundindex
bvnumber	BV023008639
callnumber-first	Q - Science
callnumber-label	QA76
callnumber-raw	QA76.9.D343
callnumber-search	QA76.9.D343
callnumber-sort	QA 276.9 D343
callnumber-subject	QA - Mathematics
classification_rvk	ST 530
ctrlnum	(OCoLC)57210946 (DE-599)BVBBV023008639
dewey-full	006.3
dewey-hundreds	000 - Computer science, information, general works
dewey-ones	006 - Special computer methods
dewey-raw	006.3
dewey-search	006.3
dewey-sort	16.3
dewey-tens	000 - Computer science, information, general works
discipline	Informatik
discipline_str_mv	Informatik
format	Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01791nam a2200433zc 4500</leader><controlfield tag="001">BV023008639</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20080107 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">071119s2005 xxud\|\|\| \|\|\|\| 00\|\|\| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2004065395</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0898715822</subfield><subfield code="c">pbk.</subfield><subfield code="9">0-89871-582-2</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)57210946</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV023008639</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-384</subfield><subfield code="a">DE-N2</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.9.D343</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.3</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Pearson, Ronald K.</subfield><subfield code="d">1952-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)121582736</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield 
code="a">Mining imperfect data</subfield><subfield code="b">dealing with contamination and incomplete records</subfield><subfield code="c">Ronald K. Pearson</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Philadelphia</subfield><subfield code="b">Society for Industrial and Applied Mathematics</subfield><subfield code="c">2005</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">X, 305 S.</subfield><subfield code="b">graph. Darst.</subfield><subfield code="c">25 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references (p. 
287-300) and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Fehlende Daten</subfield><subfield code="0">(DE-588)4264715-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Fehlende Daten</subfield><subfield code="0">(DE-588)4264715-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/enhancements/fy0665/2004065395-d.html</subfield><subfield code="3">Publisher description</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/enhancements/fy0665/2004065395-t.html</subfield><subfield code="3">Table of contents only</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016212883&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-016212883</subfield></datafield></record></collection>
id	DE-604.BV023008639
illustrated	Illustrated
index_date	2024-07-02T19:08:31Z
indexdate	2024-07-09T21:08:51Z
institution	BVB
isbn	0898715822
language	English
lccn	2004065395
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-016212883
oclc_num	57210946
open_access_boolean
owner	DE-384 DE-N2
owner_facet	DE-384 DE-N2
physical	X, 305 S. graph. Darst. 25 cm
publishDate	2005
publishDateSearch	2005
publishDateSort	2005
publisher	Society for Industrial and Applied Mathematics
record_format	marc
spelling	Pearson, Ronald K. 1952- Verfasser (DE-588)121582736 aut Mining imperfect data dealing with contamination and incomplete records Ronald K. Pearson Philadelphia Society for Industrial and Applied Mathematics 2005 X, 305 S. graph. Darst. 25 cm txt rdacontent n rdamedia nc rdacarrier Includes bibliographical references (p. 287-300) and index Data mining Fehlende Daten (DE-588)4264715-0 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf Fehlende Daten (DE-588)4264715-0 s Data Mining (DE-588)4428654-5 s DE-604 http://www.loc.gov/catdir/enhancements/fy0665/2004065395-d.html Publisher description http://www.loc.gov/catdir/enhancements/fy0665/2004065395-t.html Table of contents only Digitalisierung UB Augsburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016212883&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle	Pearson, Ronald K. 1952- Mining imperfect data dealing with contamination and incomplete records Data mining Fehlende Daten (DE-588)4264715-0 gnd Data Mining (DE-588)4428654-5 gnd
subject_GND	(DE-588)4264715-0 (DE-588)4428654-5
title	Mining imperfect data dealing with contamination and incomplete records
title_auth	Mining imperfect data dealing with contamination and incomplete records
title_exact_search	Mining imperfect data dealing with contamination and incomplete records
title_exact_search_txtP	Mining imperfect data dealing with contamination and incomplete records
title_full	Mining imperfect data dealing with contamination and incomplete records Ronald K. Pearson
title_fullStr	Mining imperfect data dealing with contamination and incomplete records Ronald K. Pearson
title_full_unstemmed	Mining imperfect data dealing with contamination and incomplete records Ronald K. Pearson
title_short	Mining imperfect data
title_sort	mining imperfect data dealing with contamination and incomplete records
title_sub	dealing with contamination and incomplete records
topic	Data mining Fehlende Daten (DE-588)4264715-0 gnd Data Mining (DE-588)4428654-5 gnd
topic_facet	Data mining Fehlende Daten Data Mining
url	http://www.loc.gov/catdir/enhancements/fy0665/2004065395-d.html http://www.loc.gov/catdir/enhancements/fy0665/2004065395-t.html http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016212883&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
work_keys_str_mv	AT pearsonronaldk miningimperfectdatadealingwithcontaminationandincompleterecords

Availability

No print copy is available.

Note: Not in the THWS collection. The title can be ordered via interlibrary loan.


Similar items