A practitioner's guide to resampling for data analysis, data mining, and modeling
Main author: | Good, Phillip I. |
---|---|
Format: | Book |
Language: | English |
Published: | Boca Raton [et al.]: CRC Pr., 2012 |
Subjects: | Data Mining; Datenanalyse; Resampling; Modellierung |
Online access: | Table of contents |
Description: | X, 214 pp., diagrams |
ISBN: | 9781439855508 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV042230830 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t | ||
008 | 141209s2012 d||| |||| 00||| eng d | ||
020 | |a 9781439855508 |9 978-1-4398-5550-8 | ||
035 | |a (OCoLC)762147365 | ||
035 | |a (DE-599)BVBBV042230830 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-83 | ||
082 | 0 | |a 519.54 | |
100 | 1 | |a Good, Phillip I. |d 1937- |e Verfasser |0 (DE-588)130601152 |4 aut | |
245 | 1 | 0 | |a A practitioner's guide to resampling for data analysis, data mining, and modeling |c Phillip I. Good |
264 | 1 | |a Boca Raton [u.a.] |b CRC Pr. |c 2012 | |
300 | |a X, 214 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenanalyse |0 (DE-588)4123037-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Resampling |0 (DE-588)4288033-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Modellierung |0 (DE-588)4170297-9 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Resampling |0 (DE-588)4288033-6 |D s |
689 | 0 | 1 | |a Datenanalyse |0 (DE-588)4123037-1 |D s |
689 | 0 | 2 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | 3 | |a Modellierung |0 (DE-588)4170297-9 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027669139&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-027669139 |
Record in the search index
_version_ | 1804152764667789312 |
---|---|
adam_text | Title: A practitioner's guide to resampling for data analysis, data mining, and modeling
Author: Good, Phillip I.
Year: 2012
Contents
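Preface...............................................................................................................IX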
1 Wide Range of Applications.........................................................................1
Resampling Methods......................................................................................1
Fields of Application........................................................................................2
2 Estimation and the Bootstrap.......................................................................7
Precision of an Estimate..................................................................................7
Stata.............................................................................................................10
Applying the Bootstrap............................................................................10
Which Statistic Should We Use?.............................................................10
Confidence Intervals......................................................................................12
When Variances Cannot Be Assumed to Be the Same........................12
R...................................................................................................................13
Stata.............................................................................................................13
Testing for Equivalence............................................................................14
Improved Confidence Intervals...................................................................15
Bias-Corrected Bootstrap Confidence Interval.....................................15
Computer Code: The Bias-Corrected and -Accelerated Bootstrap....16
R..............................................................................................................16
SAS..........................................................................................................17
S-Plus......................................................................................................17
Stata........................................................................................................17
Balanced Bootstrap...................................................................................17
Tilted Bootstrap.........................................................................................18
Block Bootstrap..........................................................................................18
Iterated Bootstrap......................................................................................19
When the Form of the Distribution Is Known......................................20
Estimating Bias...............................................................................................20
An Example................................................................................................21
Determining Sample Size.............................................................................22
Original Sample........................................................................................22
Bootstrap Sample......................................................................................23
Summary.........................................................................................................24
To Learn More................................................................................................24
Exercises..........................................................................................................25
3 Software for Use with the Bootstrap and Permutation Tests..............27
AFNI................................................................................................................27
Blossom Statistical Analysis Package.........................................................27
Eviews..............................................................................................................28
HaploView.......................................................................................................28
MATLAB®.......................................................................................................28
NCSS................................................................................................................28
PAUP................................................................................................................29
R........................................................................................................................29
SAS...................................................................................................................29
S-Plus...............................................................................................................30
SPSS Exact Tests.............................................................................................30
Stata..................................................................................................................30
Statistical Calculator......................................................................................30
StatXact............................................................................................................31
Testimate.........................................................................................................31
4 Comparing Two Populations......................................................................33
A Distribution-Free Test................................................................................33
A Little Math..............................................................................................35
Some Statistical Considerations...................................................................35
Framing the Hypothesis..........................................................................36
Hypothesis vs. Alternative......................................................................36
Assumptions..............................................................................................37
General Hypotheses.................................................................................38
Computing the p-Value.................................................................................39
Monte Carlo................................................................................................39
R..............................................................................................................40
SPLUS.....................................................................................................40
STATA....................................................................................................40
Other Two-Sample Comparisons................................................................41
Two-Sided Test...........................................................................................41
Rank Tests..................................................................................................42
Matched Pairs............................................................................................42
R Code...................................................................................................43
Stata........................................................................................................44
Test for Nonequivalence...........................................................................44
Underlying Assumptions.........................................................................45
Comparing Variances....................................................................................45
R Code for Aly's Test Statistic..................................................................47
Unequal Sample Sizes..............................................................................48
Preferred Method......................................................................................48
R Code...................................................................................................49
Testing in the Presence of Nonresponders............................................50
Summary.........................................................................................................51
To Learn More................................................................................................51
Exercises..........................................................................................................52
5 Multiple Variables........................................................................................55
Single-Valued Test Statistic...........................................................................55
Hotelling's T²............................................................................................55
Application to Repeated Measures....................................................57
The Generalized Quadratic Form...........................................................58
Application to Epidemiology..............................................................58
Further Generalization........................................................................59
The MRPP Statistic....................................................................................59
Analyzing Migration Data..................................................................60
Gene Set Enrichment Analysis...............................................................61
Combining Univariate Tests.........................................................................62
Pesarin's Nonparametric Combination..................................................64
Summary.........................................................................................................65
To Learn More................................................................................................65
Exercises..........................................................................................................66
6 Experimental Design and Analysis..........................................................69
Separating Signal from Noise......................................................................69
Blocking......................................................................................................70
Analyzing a Blocked Experiment...........................................................71
Combining Data to Obtain Improved Estimates.............................71
Comparing Samples from Two Populations....................................72
Randomization..........................................................................................73
k-Sample Comparison...................................................................................74
Testing for Any and All Differences among Means............................74
Testing for Any and All Differences among Variances.......................75
R..............................................................................................................76
Stata........................................................................................................77
Ordered Alternatives................................................................................77
Multiple Factors..............................................................................................78
Main Effects...............................................................................................79
Testing for Interactions.............................................................................81
Eliminating the Effects of Multiple Covariates.........................................82
Latin Squares.............................................................................................83
Crossover Designs.........................................................................................86
Analysis of a Complete Balanced Design..............................................87
Analysis of a Balanced Design When Not All Subjects Complete Treatment.................................................................................88
Which Sets of Labels Should We Rearrange?............................................88
Determining Sample Size.............................................................................89
Missing Combinations..................................................................................89
Summary.........................................................................................................91
To Learn More................................................................................................91
Exercises..........................................................................................................92
7 Categorical Data............................................................................................97
Fisher's Exact Test..........................................................................................97
Computing Fisher's Exact Test................................................................99
R............................................................................................................100
Two-Tailed Tests......................................................................................100
Borderline Significance..........................................................................102
Is the Sample Large Enough?................................................................103
Odds Ratio....................................................................................................104
Stratified 2 x 2s........................................................................................106
Controlling the False Discovery Rate...................................................107
Unordered r × c Contingency Tables.........................................................107
Test of Association..................................................................................109
Causation vs. Association......................................................................111
Ordered Statistical Tables...........................................................................112
Partial Dependence.................................................................................113
Correspondence Analysis.................................................................114
More than Two Rows and Two Columns............................................114
Singly Ordered Tables.......................................................................114
Doubly Ordered Tables......................................................................116
Multidimensional Arrays...........................................................................116
Summary.......................................................................................................117
To Learn More..............................................................................................118
Exercises........................................................................................................118
8 Multiple Hypotheses..................................................................................121
Controlling the Family-Wise Error Rate...................................................121
Microarray Analysis...............................................................................122
EEG Analysis...........................................................................................122
Controlling the False Discovery Rate.......................................................123
Software for Performing Multiple Simultaneous Tests..........................124
AFNI.........................................................................................................124
ExactFDR..................................................................................................124
NPCtest.....................................................................................................125
R.................................................................................................................125
SAS............................................................................................................125
Testing for Trend..........................................................................................125
Summary.......................................................................................................127
To Learn More..............................................................................................127
9 Model Building...........................................................................................129
Regression Models.......................................................................................129
Bivariate Dependence.............................................................................131
Applying the Permutation Test..................................................................131
Models with a Single Predictor.............................................................132
Comparing Two Regression Lines........................................................132
Multipredictor Regression.....................................................................134
Adaptive Regression...............................................................................136
Applying the Bootstrap...............................................................................137
Stata...........................................................................................................138
Building a Model.....................................................................................139
Limitations of the Bootstrap..................................................................140
Prediction Error............................................................................................140
Cross-Validation......................................................................................141
Double Bootstrap.....................................................................................141
Validation......................................................................................................141
Metrics......................................................................................................142
Nearest Neighbors..................................................................................142
Goodness of Fit........................................................................................143
Using the Bootstrap for Model Validation...........................................144
R Code..................................................................................................145
Cross-Validation......................................................................................145
Summary.......................................................................................................146
To Learn More..............................................................................................146
Exercises........................................................................................................147
10 Classification...............................................................................................149
Cluster Analysis...........................................................................................149
Classification.................................................................................................151
Decision Trees..............................................................................................154
Refining the Model.................................................................................155
Decision Trees vs. Regression....................................................................155
Which Predictors?...................................................................................158
Which Decision Tree Algorithm Is Best for Your Application?............159
Some Comparisons.................................................................................163
Reducing the Rate of Misclassification.....................................................163
Boosting....................................................................................................163
AdaBoost Algorithm...............................................................................167
Ensemble Methods..................................................................................167
Comparison of Classification Tree Algorithms.......................................168
Validation vs. Cross-Validation..................................................................170
Summary.......................................................................................................170
To Learn More..............................................................................................171
Exercises........................................................................................................172
11 Restricted Permutations............................................................................173
Quasi Independence....................................................................................173
Complete Factorials.....................................................................................174
Synchronized Permutations.......................................................................175
Generalizing These Results to Multiple Factors.................................177
Algorithms...............................................................................................179
Which Test Should We Use?..................................................................180
Model Validation..........................................................................................180
Exercises........................................................................................................181
Appendix A: Basic Concepts in Statistics.....................................................183
Additive vs. Multiplicative Models...........................................................183
Central Values..............................................................................................183
Combinations and Rearrangements.........................................................184
Dispersion.....................................................................................................184
Frequency Distribution and Percentiles...................................................185
Linear vs. Nonlinear Regression...............................................................185
Regression Methods....................................................................................186
Appendix B: Proof of Theorems.....................................................................187
References...........................................................................................................193
Index.....................................................................................................................211
|
any_adam_object | 1 |
author | Good, Phillip I. 1937- |
author_GND | (DE-588)130601152 |
author_facet | Good, Phillip I. 1937- |
author_role | aut |
author_sort | Good, Phillip I. 1937- |
author_variant | p i g pi pig |
building | Verbundindex |
bvnumber | BV042230830 |
ctrlnum | (OCoLC)762147365 (DE-599)BVBBV042230830 |
dewey-full | 519.54 |
dewey-hundreds | 500 - Natural sciences and mathematics |
dewey-ones | 519 - Probabilities and applied mathematics |
dewey-raw | 519.54 |
dewey-search | 519.54 |
dewey-sort | 3519.54 |
dewey-tens | 510 - Mathematics |
discipline | Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01532nam a2200385 c 4500</leader><controlfield tag="001">BV042230830</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">141209s2012 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781439855508</subfield><subfield code="9">978-1-4398-5550-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)762147365</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV042230830</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-83</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">519.54</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Good, Phillip I.</subfield><subfield code="d">1937-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)130601152</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">A practitioner's guide to resampling for data analysis, data mining, and modeling</subfield><subfield code="c">Phillip I. Good</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton [u.a.]</subfield><subfield code="b">CRC Pr.</subfield><subfield code="c">2012</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">X, 214 S.</subfield><subfield code="b">graph. 
Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenanalyse</subfield><subfield code="0">(DE-588)4123037-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Resampling</subfield><subfield code="0">(DE-588)4288033-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Modellierung</subfield><subfield code="0">(DE-588)4170297-9</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Resampling</subfield><subfield code="0">(DE-588)4288033-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Datenanalyse</subfield><subfield code="0">(DE-588)4123037-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">Modellierung</subfield><subfield code="0">(DE-588)4170297-9</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield 
code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027669139&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-027669139</subfield></datafield></record></collection> |
id | DE-604.BV042230830 |
illustrated | Illustrated |
indexdate | 2024-07-10T01:15:54Z |
institution | BVB |
isbn | 9781439855508 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-027669139 |
oclc_num | 762147365 |
open_access_boolean | |
owner | DE-83 |
owner_facet | DE-83 |
physical | X, 214 S. graph. Darst. |
publishDate | 2012 |
publishDateSearch | 2012 |
publishDateSort | 2012 |
publisher | CRC Pr. |
record_format | marc |
spelling | Good, Phillip I. 1937- Verfasser (DE-588)130601152 aut A practitioner's guide to resampling for data analysis, data mining, and modeling Phillip I. Good Boca Raton [u.a.] CRC Pr. 2012 X, 214 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Data Mining (DE-588)4428654-5 gnd rswk-swf Datenanalyse (DE-588)4123037-1 gnd rswk-swf Resampling (DE-588)4288033-6 gnd rswk-swf Modellierung (DE-588)4170297-9 gnd rswk-swf Resampling (DE-588)4288033-6 s Datenanalyse (DE-588)4123037-1 s Data Mining (DE-588)4428654-5 s Modellierung (DE-588)4170297-9 s DE-604 HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027669139&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Good, Phillip I. 1937- A practitioner's guide to resampling for data analysis, data mining, and modeling Data Mining (DE-588)4428654-5 gnd Datenanalyse (DE-588)4123037-1 gnd Resampling (DE-588)4288033-6 gnd Modellierung (DE-588)4170297-9 gnd |
subject_GND | (DE-588)4428654-5 (DE-588)4123037-1 (DE-588)4288033-6 (DE-588)4170297-9 |
title | A practitioner's guide to resampling for data analysis, data mining, and modeling |
title_auth | A practitioner's guide to resampling for data analysis, data mining, and modeling |
title_exact_search | A practitioner's guide to resampling for data analysis, data mining, and modeling |
title_full | A practitioner's guide to resampling for data analysis, data mining, and modeling Phillip I. Good |
title_fullStr | A practitioner's guide to resampling for data analysis, data mining, and modeling Phillip I. Good |
title_full_unstemmed | A practitioner's guide to resampling for data analysis, data mining, and modeling Phillip I. Good |
title_short | A practitioner's guide to resampling for data analysis, data mining, and modeling |
title_sort | a practitioner s guide to resampling for data analysis data mining and modeling |
topic | Data Mining (DE-588)4428654-5 gnd Datenanalyse (DE-588)4123037-1 gnd Resampling (DE-588)4288033-6 gnd Modellierung (DE-588)4170297-9 gnd |
topic_facet | Data Mining Datenanalyse Resampling Modellierung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027669139&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT goodphillipi apractitionersguidetoresamplingfordataanalysisdataminingandmodeling |