Multivariate statistical modeling
Saved in:
Main author: | |
---|---|
Format: | Book |
Language: | English |
Published: |
Lincoln, Mass.
Entropy
1983
|
Edition: | 1. ed. |
Series: | Entropy minimax sourcebook
5 |
Subjects: | |
Online access: | Table of contents |
Description: | XVII, 726 pp., graphs |
ISBN: | 0938876147 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV013095273 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t | ||
008 | 000405s1983 d||| |||| 00||| eng d | ||
020 | |a 0938876147 |9 0-938-87614-7 | ||
035 | |a (OCoLC)10713767 | ||
035 | |a (DE-599)BVBBV013095273 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-703 |a DE-19 | ||
050 | 0 | |a Q370 | |
084 | |a QH 233 |0 (DE-625)141548: |2 rvk | ||
100 | 1 | |a Christensen, Ronald |d 1951- |e Verfasser |0 (DE-588)111351820 |4 aut | |
245 | 1 | 0 | |a Multivariate statistical modeling |c Ronald Christensen |
250 | |a 1. ed. | ||
264 | 1 | |a Lincoln, Mass. |b Entropy |c 1983 | |
300 | |a XVII, 726 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Entropy minimax sourcebook |v 5 | |
650 | 4 | |a Multivariate analysis | |
830 | 0 | |a Entropy minimax sourcebook |v 5 |w (DE-604)BV002406540 |9 5 | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=008919297&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-008919297 |
Record in the search index
_version_ | 1804127790554939392 |
---|---|
adam_text | TABLE OF CONTENTS
PART I: Setting Up the Prediction Problem
1. Foundations of Statistical Modeling
A. Information Accounting 5
B. The Essential Pragmatism of Probability 12
C. The Rudimentary Logical Probabilities: Ratios of
Possibilities 12
D. Rudimentary Empirical Probability—Frequency Ratios 16
E. Generalized Logical Probabilities 19
F. Measures of Human Knowledge—Referenced Probabilities 23
G. Maximum Entropy Expectation 38
H. Quantum Uncertainties and Linguistic Invariance
Versus Linguistic Relativity 46
I. The Evolution of Language—Minimum Entropy Descriptions 49
J. Irrelevancy and Ambiguity 54
2. Defining the Dependent Variable (What You Want to Predict)
A. Historical Perspective 67
B. Selecting the Dependent Variable 69
C. Partitioning the Dependent Variable 71
D. Decision Theoretic Partitioning 76
E. Merging Outcome Classes 77
F. Potential Function Classification 78
3. Identifying the Independent Variables (Upon Which the
Dependent Variable May Depend)
A. Independent Variables 95
B. Guides to Identification of Independent Variables 96
C. Data Costs versus Expected Information Value 99
D. Possible Usefulness of Redundancy 101
PART II: Establishing the Data Base
4. Collecting the Data
A. Locating Sources of Data 109
B. Characterizing Error and Uncertainty 113
C. Identifying Irrelevancy and Ambiguity 119
D. Designing Data Collection Forms and Procedures 120
E. Designing Data Qualification Procedures 120
F. Use of Pilot Samples to Test the Procedures 121
G. Debugging Data Collection Forms and Procedures 121
H. Collecting the Data 122
I. Data Entry and Qualification 123
J. Data Cleanup 124
K. How Much Data? 125
5. Splitting the Data Into Training, Trial and Verification Portions
A. Training, Trial and Verification Portions 133
B. Methods of Data Splitting 135
C. Random Splitting 138
D. Accounting For Degrees of Freedom Used in Stratification 139
6. Surveying General Statistics of Training Data
A. Restrict Statistical Survey to Training Data Only 145
B. Outcome Class Frequencies 146
C. Outcome Class Distribution 148
D. Quantitative Independent Variables Distributions 168
E. Qualitative Independent Variables Histograms 170
F. Quantitative Individual Independent Variables Maxima, Minima,
Means, Standard Widths, Variances, Skewnesses, Kurtoses and
Warps 172
G. Inter Correlations Between Pairs of Quantitative Independent
Variables 173
H. Association Measurement for Pairs of Qualitative Independent
Variables 175
I. Association Measurement for Mixed Pairs of Qualitative and
Quantitative Independent Variables 177
J. Measuring Association Between Individual Independent
Variables and the Dependent Variable 178
K. Significance Against Null Hypothesis for Best Univariate
Predictor 179
L. Forming and Assessing Performance of Best Univariate
Predictor 183
PART III: Establishing a Mechanistic Algorithm Library
7. Constructing Mechanistic Algorithms
A. Assembling List of Algorithms 191
B. Acquiring Computer Algorithms 192
C. Preparing New Algorithm Codings 193
D. Testing Algorithms for Sensibility of Results Over Ranges
of Input Values Represented by the Training Data 193
E. Putting Parameters to be Tuned Into Input File For
Computational Efficiency 194
F. Censorship 194
8. Tuning Parameters in Mechanistic Algorithms
A. Restriction of Tuning to the Training Data 199
B. Weighting Data by Effective Event Count 200
C. Selecting the Measure of Error to be Minimized 202
D. Outlier Considerations 220
E. Error Minimization Techniques 222
F. Statistical Significance 224
G. Input Uncertainty and Sensitivity Considerations in
Parameter Setting 233
9. Forming a Mechanistic Predictor
A. The Role of the Mechanistic Predictor 241
B. Types of Mechanistic Predictors 241
C. What the Mechanistic Predictor Must Predict: A Probability
Distribution Over the Dependent Variable 245
D. Tuning Parameters in the Mechanistic Predictor 249
E. Computing Effective Number of Events Supporting Mechanistic
Predictor as Function of Predicted Value 250
PART IV: Discovering the Patterns
10. Assembling the Model Building Files
A. Model Building Files 261
B. File PDPIN 262
C. File EVN (Optional) 273
D. File IVN (Optional) 274
E. File DVN 274
F. Files IVS and NVS 275
G. File DVS 276
H. File PDPSC (Optional) 279
11. Forming Feature Extraction Algorithms
A. The Role and Importance of Feature Extraction 283
B. Raw Data Features 284
C. Algebraic Combination Features 284
D. Principal Components, Factor Analysis, etc. 285
E. Mechanistic Algorithm Output Features 287
F. Time Series Filter Features 288
G. Few or Many Features? 296
12. Determining Constraint Information Weights
A. Maximum Entropy Probabilities 303
B. Constraint Information Weights 306
C. Effects of Unit Measure Violations 310
13. Selecting the Basic Feature Set
A. Overview of Feature Selection 315
B. Computing Single Feature Coefficients of Correlation with
the Dependent Variable 316
C. Computing Conditional Entropies of Single Feature Thresholds
for Sorting On the Dependent Variable 320
D. Computing Entropy Exchange of Each Single Feature Threshold
for Sorting On the Dependent Variable 322
E. Combined Estimate of Feature Efficacy 324
F. Eliminating Highly Correlating Features 325
G. Inclusion of Alternative Representations 327
H. Eliminating Physically Unreasonable Features 327
14. Determining Weight Normalization
A. Training/Trial Crossvalidation 331
B. Plotting Error Versus Normalization 334
C. Selecting the Normalization 336
D. Normalization Splitting 337
E. Relationship of Weight Normalization to Bayesian Analysis,
Regression Analysis and Linear Shrinkage 339
15. Selecting Feature Subset for Rotations
A. Feature Subspace for Rotations 353
B. Conditional Entropy Minimization Preprocessing Rankings 353
C. Entropy Exchange Maximization Preprocessing Ranking 354
D. Linear Correlation Ranking 355
E. Making the Subspace Selection 355
16. Determining Thresholds and Rotations
A. Thresholds 361
B. Special Constraints on Thresholds 363
C. Rotations 364
D. Threshold Spreading for Continuous Features 366
17. Finding the Patterns
A. Pattern Discovery Procedures 371
B. SWAPDP 380
C. Selecting Processing Order 380
D. Selecting Preprocessing Order 381
E. Selecting Correlation Limit 384
F. Selecting Special Restrictions on Patterns 384
G. Selecting Pattern Search Mode 388
H. Path Dependency of Probabilities 391
I. Selecting Among Equal Entropy (or Entropy Exchange)
Patterns 393
J. Running the Program 394
K. Checking Output for Irregularities 396
L. Creating Condensed Pattern Files 397
18. Convoluting Utilities and Probabilities to Truncate Pattern Series
A. Introduction 403
B. Minimum Entropy, the Linear Information Utility
Truncation Criterion 405
C. Examples of Truncation Criteria for Which Utility is
Nonlinear in Information 407
D. Examples of Truncation Criteria which Explicitly Trade
Accuracy versus Definitiveness 409
E. General Utility—Probability Convolution for Pattern
Truncation 411
PART V: Building the Pattern Recognizer and Outcome Predictor
19. Identifying Pattern Matches
A. Pattern Matching Logic 419
B. Handling Unknowns 421
C. Handling Irrelevancies and Ambiguities 422
D. Relation of Irrelevancies and Ambiguities to Fuzzy Sets 428
E. The Security Hyperannulus 428
F. Checkout on the Model Building Data 429
20. Setting Data Range Factors
A. Feature Ranges Spanned by Model Building Data 433
B. Use of Thresholds to Determine Relevant Data Range
Distance Scales 434
C. Data Range Shape Factors 435
D. Data Density Dependent Range Factors 436
E. Interpolation Versus Extrapolation 437
21. Defining Probabilities and Uncertainties by Solonic Amalgamation
of Output From Patterns and Mechanistic Predictor
A. Effective Event Count 443
B. Amalgamating Probabilities and Uncertainties 445
C. Amalgamating a Pattern With the Mechanistic Predictor 449
D. Amalgamating Amalgamations 450
E. Essential Feature Completely Out of Range 452
F. No Pattern Matched 452
G. Amalgamations Involving Quantitative Dependent Variables 453
PART VI: Making Predictions
22. Assembling the Model Verification Files
A. First Look at the Verification Data 459
B. Independent Variables Data File 460
C. Dependent Variable Data File 460
23. Extracting Features from the Model Verification Data
A. Importance of Using the Same Operational Definition of Features
for Verification Events as for Building Events 465
B. Passing Model Verification Data Through the Mechanistic
Predictor 466
C. Feature Extraction and Assembly of Model Verification
Feature Files 468
24. Finding Pattern Matches and Computing Probabilities and Associated
Uncertainties for the Model Verification Events
A. Finding Pattern Matches 473
B. Comparing Percentage Matching Each Pattern in Verification
Data to Percentages for Model Building Data 474
C. Data Range Factors 475
D. Probabilities and Associated Uncertainties 475
E. Expected Values and Standard Deviations for Quantitative
Dependent Variables 478
F. Statistical Summary File 478
G. Issuing Predictions 479
PART VII: Assessing the Predictions
25. Plotting Observations (Frequencies) versus Predictions
(Probabilities)
A. Predictive Performance Plots 489
B. Sorting According to Prediction (Probability) 490
C. Grouping Into P-axis Lumps 491
D. Plotting Mean Probability Versus Mean Frequency for
Each Lump 493
E. Computing Uncertainty in Mean Frequency for Each Lump
(Vertical Bars) 494
F. Positioning the Vertical Error Bars 495
G. Computing Uncertainty in Mean Probability for Each Lump
(Horizontal Bars) 499
H. Positioning the Horizontal Error Bars 500
I. Comprehensiveness 502
26. Assessing Predictive Accuracy
A. Visual Comparison to Perfect and Random Predictors 507
B. Entropy Decrement Measure of Error in Probability 512
C. Other Measures of Error in Probability 515
D. Entropy Decrement Measure of Error in Asserted Uncertainty 527
E. Dispersion Ratio Measure of Error in Asserted Uncertainty 528
F. Assessment of Sensitivity of Predictions to Variations
in Input 529
27. Assessing Predictive Definitiveness
A. Visual Examination of Definitiveness 537
B. Entropy Decrement Measure of Predictive Definitiveness 538
C. Other Measures of Predictive Definitiveness 541
28. Detecting Information Leakage From Model Verification Data Into
the Model Building Process
A. Seriousness of Information Leakage 549
B. Predictability Degradation Indicator of Possible
Information Leakage 550
C. Overfitting Indication of Information Leakage 551
D. Questionable Procedures as Indicators of Information
Leakage 553
29. Assessing Predictions in Terms of Decision Making Objectives
A. Are the Dependent Variable and Its Partition Defined So That
the Predictions are Most Useful? 559
B. Are the Variables Employed as Features by the Patterns
Convenient to Obtain and Use for Decision Making? 559
C. Is the Level of Accuracy Adequate? 560
D. Is the Extent of Definitiveness Adequate? 561
E. Does the Intended Application Necessitate Utility Weighted
Assessment? 562
F. Are the Predictions Reproducible? 566
PART VIII: Improving the Model
30. Examining Reasons for Errors
A. Providing Active Support for Models 575
B. Identifying On Which Verification Events Errors Were Made 576
C. Comparing Building and Verification Data for Significant
Statistical Differences 578
D. Seeking Reasons for Differences Found Between Building
and Verification Data 581
E. Do the Building/Verification Data Differences Account for
the Major Predictive Errors? 582
F. Seeking Reasons for Residual Portions of Predictive Errors
on Verification Data 582
31. Improving the Mechanistic Predictor
A. Locating Consistent Errors and Revising the Model to
Account for Them 587
B. Identifying Portions of Model With Greatest Uncertainties 588
C. Estimating Sensitivity of Model Output to Input and
Parameters 589
D. Reducing Sensitivity to Parameters with Greatest
Uncertainties 589
E. Producing Monotonic Departure from Range of Reasonable
Results When Outside the Range of Reasonable Doubt 590
32. Improving the Patterns
A. Expanding Set of Independent Variables for Consideration 595
B. Trying New Nonlinear Combinations and Other Mechanistic
Algorithms 596
C. Revising Selection of Basic Feature Set for Pattern
Discovery 596
D. Trying New Rotations Subspaces 598
E. Specifying Special Thresholds on Quantitative Variables 598
F. Defining Special Restrictions on the Pattern Search 599
G. Conducting More Extensive Pattern Search 599
H. Putting a Human Into the Model Applications Loop 604
I. Putting a Human Into the Model Building Loop 606
33. Assessing Improvements With New Verification Data
A. Acquiring New Verification Data 611
B. Making Predictions for the New Verification Events 612
C. Assessing the Predictions 613
D. Assessing Adequacy of Idiot Proofing 614
E. Assessing User Cordiality 615
F. Recap 616
EPILOG: Self-Referential Predictors
A. Informational Feedback 621
B. Self-Referencing Conditions 622
C. Self-Referencing Outcomes 624
APPENDICES
1. Distribution Function Selection
A. Derived Selections 633
B. Sample Characteristics Inventory 635
C. Discrete or Continuous? 641
D. Skewed? 642
E. Truncated or Elongated? 643
F. Warped? 645
G. Single or Multiple Peaked? 646
2. Critical Value Tables of Mean Square Error, Average Absolute Error
and Maximum Absolute Error for Rectangular, Isosceles Triangular,
Normal, Logistic, Extreme Value, Laplace, Exponential, Rayleigh,
Maxwell-Boltzmann, Symmetric Quadratic, Ascending Wedge, Descending
Wedge and Student's t Distributions 651
3. Critical Value Tables of Average Relative Absolute Error, Total
Relative Square Error and Total Relative Log Ratio Error for
Log-Normal, Pareto, Gamma, Weibull, Chi, Chi-Squared, F and
Beta Distributions 665
4. Symbols 689
Index of Names 691
Index of Subjects 699
|
any_adam_object | 1 |
author | Christensen, Ronald 1951- |
author_GND | (DE-588)111351820 |
author_facet | Christensen, Ronald 1951- |
author_role | aut |
author_sort | Christensen, Ronald 1951- |
author_variant | r c rc |
building | Verbundindex |
bvnumber | BV013095273 |
callnumber-first | Q - Science |
callnumber-label | Q370 |
callnumber-raw | Q370 |
callnumber-search | Q370 |
callnumber-sort | Q 3370 |
callnumber-subject | Q - General Science |
classification_rvk | QH 233 |
ctrlnum | (OCoLC)10713767 (DE-599)BVBBV013095273 |
discipline | Economics |
edition | 1. ed. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01233nam a2200337 cb4500</leader><controlfield tag="001">BV013095273</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">000405s1983 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0938876147</subfield><subfield code="9">0-938-87614-7</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)10713767</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV013095273</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-703</subfield><subfield code="a">DE-19</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">Q370</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">QH 233</subfield><subfield code="0">(DE-625)141548:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Christensen, Ronald</subfield><subfield code="d">1951-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)111351820</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Multivariate statistical modeling</subfield><subfield code="c">Ronald Christensen</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. 
ed.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Lincoln, Mass.</subfield><subfield code="b">Entropy</subfield><subfield code="c">1983</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XVII, 726 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Entropy minimax sourcebook</subfield><subfield code="v">5</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Multivariate analysis</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Entropy minimax sourcebook</subfield><subfield code="v">5</subfield><subfield code="w">(DE-604)BV002406540</subfield><subfield code="9">5</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=008919297&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-008919297</subfield></datafield></record></collection> |
id | DE-604.BV013095273 |
illustrated | Illustrated |
indexdate | 2024-07-09T18:38:57Z |
institution | BVB |
isbn | 0938876147 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-008919297 |
oclc_num | 10713767 |
open_access_boolean | |
owner | DE-703 DE-19 DE-BY-UBM |
owner_facet | DE-703 DE-19 DE-BY-UBM |
physical | XVII, 726 S. graph. Darst. |
publishDate | 1983 |
publishDateSearch | 1983 |
publishDateSort | 1983 |
publisher | Entropy |
record_format | marc |
series | Entropy minimax sourcebook |
series2 | Entropy minimax sourcebook |
spelling | Christensen, Ronald 1951- Verfasser (DE-588)111351820 aut Multivariate statistical modeling Ronald Christensen 1. ed. Lincoln, Mass. Entropy 1983 XVII, 726 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Entropy minimax sourcebook 5 Multivariate analysis Entropy minimax sourcebook 5 (DE-604)BV002406540 5 HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=008919297&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Christensen, Ronald 1951- Multivariate statistical modeling Entropy minimax sourcebook Multivariate analysis |
title | Multivariate statistical modeling |
title_auth | Multivariate statistical modeling |
title_exact_search | Multivariate statistical modeling |
title_full | Multivariate statistical modeling Ronald Christensen |
title_fullStr | Multivariate statistical modeling Ronald Christensen |
title_full_unstemmed | Multivariate statistical modeling Ronald Christensen |
title_short | Multivariate statistical modeling |
title_sort | multivariate statistical modeling |
topic | Multivariate analysis |
topic_facet | Multivariate analysis |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=008919297&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV002406540 |
work_keys_str_mv | AT christensenronald multivariatestatisticalmodeling |