Computational methods of feature selection

Format: Book
Language: English
Published: Boca Raton [u.a.]: Chapman & Hall/CRC, 2008
Series: Chapman & Hall/CRC data mining and knowledge discovery series
Online access: Table of contents
Note: Includes bibliographical references ("Literaturangaben")
Physical description: 419 pp., graphs ("419 S. graph. Darst.")
ISBN: 9781584888789, 1584888784
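The record lists the same book under both its ISBN-13 and ISBN-10 forms. As a quick plausibility check (a minimal sketch; the function names are my own, not part of any catalog system), both check digits can be verified in a few lines:

```python
def isbn10_check_ok(isbn: str) -> bool:
    """ISBN-10: weights 10..1; valid when the weighted sum is divisible by 11.
    An 'X' check digit stands for the value 10."""
    digits = [10 if c.upper() == "X" else int(c) for c in isbn]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0

def isbn13_check_ok(isbn: str) -> bool:
    """ISBN-13: alternating weights 1 and 3; valid when the sum is divisible by 10."""
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
    return total % 10 == 0

# The two forms listed in the record above:
print(isbn10_check_ok("1584888784"))     # True
print(isbn13_check_ok("9781584888789"))  # True
```

Both forms encode the same registration: the ISBN-13 prefixes 978 to the first nine digits of the ISBN-10 and recomputes the final check digit.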
MARC record (internal format)
LEADER  00000nam a2200000zc 4500
001     BV035176645
003     DE-604
005     20100108
007     t
008     081124s2008 xxud||| |||| 00||| eng d
010     |a 2007027465
020     |a 9781584888789 |c alk. paper |9 978-1-58488-878-9
020     |a 1584888784 |c alk. paper |9 1-58488-878-4
035     |a (OCoLC)154309055
035     |a (DE-599)BVBBV035176645
040     |a DE-604 |b ger |e aacr
041 0_  |a eng
044     |a xxu |c US
049     |a DE-355 |a DE-634 |a DE-91
050 _0  |a QA76.9.D3
082 0_  |a 005.74
084     |a ST 270 |0 (DE-625)143638: |2 rvk
084     |a MAT 533f |2 stub
245 10  |a Computational methods of feature selection |c ed. by Huan Liu
264 _1  |a Boca Raton [u.a.] |b Chapman & Hall/CRC |c 2008
300     |a 419 S. |b graph. Darst.
336     |b txt |2 rdacontent
337     |b n |2 rdamedia
338     |b nc |2 rdacarrier
490 0_  |a Chapman & Hall/CRC data mining and knowledge discovery series
500     |a Literaturangaben
650 _7  |a Aprendizado computacional |2 larpcal
650 _7  |a Banco de dados (gerenciamento) |2 larpcal
650 _4  |a Bases de données - Gestion
650 _4  |a Exploration de données (Informatique)
650 _7  |a Mineração de dados |2 larpcal
650 _4  |a Recherche de l'information
650 _4  |a Systèmes d'information - Recherche
650 _4  |a Théorie de l'apprentissage informatique
650 _4  |a Database management
650 _4  |a Data mining
650 _4  |a Machine learning
650 07  |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf
650 07  |a Datenbankverwaltung |0 (DE-588)4389357-0 |2 gnd |9 rswk-swf
650 07  |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf
650 07  |a Merkmalsextraktion |0 (DE-588)4314440-8 |2 gnd |9 rswk-swf
689 00  |a Datenbankverwaltung |0 (DE-588)4389357-0 |D s
689 01  |a Merkmalsextraktion |0 (DE-588)4314440-8 |D s
689 02  |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s
689 03  |a Data Mining |0 (DE-588)4428654-5 |D s
689 0_  |C b |5 DE-604
700 1_  |a Liu, Huan |e Sonstige |4 oth
856 42  |m Digitalisierung UB Regensburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016983487&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999     |a oai:aleph.bib-bvb.de:BVB01-016983487
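In the display above, `|a`, `|b`, etc. mark MARC subfields (this catalog renders the usual `$a` delimiters as pipes). A minimal sketch of splitting one such field body into (code, value) pairs follows; the helper name is my own, and pipe-splitting only suits this display form, since real MARC uses a dedicated delimiter character and field data can itself contain `|` (as in field 008):

```python
def parse_subfields(field: str) -> list[tuple[str, str]]:
    """Split a field body like '|a 9781584888789 |c alk. paper'
    into (subfield code, value) pairs, keeping repeated codes."""
    parts = [p.strip() for p in field.split("|") if p.strip()]
    return [(p[0], p[1:].strip()) for p in parts]

# Field 245 from the record above:
print(parse_subfields("|a Computational methods of feature selection |c ed. by Huan Liu"))
# [('a', 'Computational methods of feature selection'), ('c', 'ed. by Huan Liu')]
```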
Contents

I  Introduction and Background .... 1

1  Less Is More .... 3
   Huan Liu and Hiroshi Motoda
   1.1  Background and Basics .... 4
   1.2  Supervised, Unsupervised, and Semi-Supervised Feature Selection .... 7
   1.3  Key Contributions and Organization of the Book .... 10
        1.3.1  Part I - Introduction and Background .... 10
        1.3.2  Part II - Extending Feature Selection .... 11
        1.3.3  Part III - Weighting and Local Methods .... 12
        1.3.4  Part IV - Text Classification and Clustering .... 13
        1.3.5  Part V - Feature Selection in Bioinformatics .... 14
   1.4  Looking Ahead .... 15

2  Unsupervised Feature Selection .... 19
   Jennifer G. Dy
   2.1  Introduction .... 19
   2.2  Clustering .... 21
        2.2.1  The k-Means Algorithm .... 21
        2.2.2  Finite Mixture Clustering .... 22
   2.3  Feature Selection .... 23
        2.3.1  Feature Search .... 23
        2.3.2  Feature Evaluation .... 24
   2.4  Feature Selection for Unlabeled Data .... 25
        2.4.1  Filter Methods .... 26
        2.4.2  Wrapper Methods .... 27
   2.5  Local Approaches .... 32
        2.5.1  Subspace Clustering .... 32
        2.5.2  Co-Clustering/Bi-Clustering .... 33
   2.6  Summary .... 34

3  Randomized Feature Selection .... 41
   David J. Stracuzzi
   3.1  Introduction .... 41
   3.2  Types of Randomizations .... 42
   3.3  Randomized Complexity Classes .... 43
   3.4  Applying Randomization to Feature Selection .... 45
   3.5  The Role of Heuristics .... 46
   3.6  Examples of Randomized Selection Algorithms .... 47
        3.6.1  A Simple Las Vegas Approach .... 47
        3.6.2  Two Simple Monte Carlo Approaches .... 49
        3.6.3  Random Mutation Hill Climbing .... 51
        3.6.4  Simulated Annealing .... 52
        3.6.5  Genetic Algorithms .... 54
        3.6.6  Randomized Variable Elimination .... 56
   3.7  Issues in Randomization .... 58
        3.7.1  Pseudorandom Number Generators .... 58
        3.7.2  Sampling from Specialized Data Structures .... 59
   3.8  Summary .... 59

4  Causal Feature Selection .... 63
   Isabelle Guyon, Constantin Aliferis, and André Elisseeff
   4.1  Introduction .... 63
   4.2  Classical "Non-Causal" Feature Selection .... 65
   4.3  The Concept of Causality .... 68
        4.3.1  Probabilistic Causality .... 69
        4.3.2  Causal Bayesian Networks .... 70
   4.4  Feature Relevance in Bayesian Networks .... 71
        4.4.1  Markov Blanket .... 72
        4.4.2  Characterizing Features Selected via Classical Methods .... 73
   4.5  Causal Discovery Algorithms .... 77
        4.5.1  A Prototypical Causal Discovery Algorithm .... 78
        4.5.2  Markov Blanket Induction Algorithms .... 79
   4.6  Examples of Applications .... 80
   4.7  Summary, Conclusions, and Open Problems .... 82

II  Extending Feature Selection .... 87

5  Active Learning of Feature Relevance .... 89
   Emanuele Olivetti, Sriharsha Veeramachaneni, and Paolo Avesani
   5.1  Introduction .... 89
   5.2  Active Sampling for Feature Relevance Estimation .... 92
   5.3  Derivation of the Sampling Benefit Function .... 93
   5.4  Implementation of the Active Sampling Algorithm .... 95
        5.4.1  Data Generation Model: Class-Conditional Mixture of Product Distributions .... 95
        5.4.2  Calculation of Feature Relevances .... 96
        5.4.3  Calculation of Conditional Probabilities .... 97
        5.4.4  Parameter Estimation .... 97
   5.5  Experiments .... 99
        5.5.1  Synthetic Data .... 99
        5.5.2  UCI Datasets .... 100
        5.5.3  Computational Complexity Issues .... 102
   5.6  Conclusions and Future Work .... 102

6  A Study of Feature Extraction Techniques Based on Decision Border Estimate .... 109
   Claudia Diamantini and Domenico Potena
   6.1  Introduction .... 109
        6.1.1  Background on Statistical Pattern Classification .... 111
   6.2  Feature Extraction Based on Decision Boundary .... 112
        6.2.1  MLP-Based Decision Boundary Feature Extraction .... 113
        6.2.2  SVM Decision Boundary Analysis .... 114
   6.3  Generalities About Labeled Vector Quantizers .... 115
   6.4  Feature Extraction Based on Vector Quantizers .... 116
        6.4.1  Weighting of Normal Vectors .... 119
   6.5  Experiments .... 122
        6.5.1  Experiment with Synthetic Data .... 122
        6.5.2  Experiment with Real Data .... 124
   6.6  Conclusions .... 127

7  Ensemble-Based Variable Selection Using Independent Probes .... 131
   Eugene Tuv, Alexander Borisov, and Kari Torkkola
   7.1  Introduction .... 131
   7.2  Tree Ensemble Methods in Feature Ranking .... 132
   7.3  The Algorithm: Ensemble-Based Ranking Against Independent Probes .... 134
   7.4  Experiments .... 137
        7.4.1  Benchmark Methods .... 138
        7.4.2  Data and Experiments .... 139
   7.5  Discussion .... 143

8  Efficient Incremental-Ranked Feature Selection in Massive Data .... 147
   Roberto Ruiz, Jesús S. Aguilar-Ruiz, and José C. Riquelme
   8.1  Introduction .... 147
   8.2  Related Work .... 148
   8.3  Preliminary Concepts .... 150
        8.3.1  Relevance .... 150
        8.3.2  Redundancy .... 151
   8.4  Incremental Performance over Ranking .... 152
        8.4.1  Incremental Ranked Usefulness .... 153
        8.4.2  Algorithm .... 155
   8.5  Experimental Results .... 156
   8.6  Conclusions .... 164

III  Weighting and Local Methods .... 167

9  Non-Myopic Feature Quality Evaluation with (R)ReliefF .... 169
   Igor Kononenko and Marko Robnik-Šikonja
   9.1  Introduction .... 169
   9.2  From Impurity to Relief .... 170
        9.2.1  Impurity Measures in Classification .... 171
        9.2.2  Relief for Classification .... 172
   9.3  ReliefF for Classification and RReliefF for Regression .... 175
   9.4  Extensions .... 178
        9.4.1  ReliefF for Inductive Logic Programming .... 178
        9.4.2  Cost-Sensitive ReliefF .... 180
        9.4.3  Evaluation of Ordered Features at Value Level .... 181
   9.5  Interpretation .... 182
        9.5.1  Difference of Probabilities .... 182
        9.5.2  Portion of the Explained Concept .... 183
   9.6  Implementation Issues .... 184
        9.6.1  Time Complexity .... 184
        9.6.2  Active Sampling .... 184
        9.6.3  Parallelization .... 185
   9.7  Applications .... 185
        9.7.1  Feature Subset Selection .... 185
        9.7.2  Feature Ranking .... 186
        9.7.3  Feature Weighting .... 186
        9.7.4  Building Tree-Based Models .... 187
        9.7.5  Feature Discretization .... 187
        9.7.6  Association Rules and Genetic Algorithms .... 187
        9.7.7  Constructive Induction .... 188
   9.8  Conclusion .... 188

10  Weighting Method for Feature Selection in k-Means .... 193
    Joshua Zhexue Huang, Jun Xu, Michael Ng, and Yunming Ye
    10.1  Introduction .... 193
    10.2  Feature Weighting in k-Means .... 194
    10.3  W-k-Means Clustering Algorithm .... 197
    10.4  Feature Selection .... 198
    10.5  Subspace Clustering with k-Means .... 200
    10.6  Text Clustering .... 201
         10.6.1  Text Data and Subspace Clustering .... 202
         10.6.2  Selection of Key Words .... 203
    10.7  Related Work .... 204
    10.8  Discussions .... 207

11  Local Feature Selection for Classification .... 211
    Carlotta Domeniconi and Dimitrios Gunopulos
    11.1  Introduction .... 211
    11.2  The Curse of Dimensionality .... 213
    11.3  Adaptive Metric Techniques .... 214
         11.3.1  Flexible Metric Nearest Neighbor Classification .... 215
         11.3.2  Discriminant Adaptive Nearest Neighbor Classification .... 216
         11.3.3  Adaptive Metric Nearest Neighbor Algorithm .... 217
    11.4  Large Margin Nearest Neighbor Classifiers .... 222
         11.4.1  Support Vector Machines .... 223
         11.4.2  Feature Weighting .... 224
         11.4.3  Large Margin Nearest Neighbor Classification .... 225
         11.4.4  Weighting Features Increases the Margin .... 227
    11.5  Experimental Comparisons .... 228
    11.6  Conclusions .... 231

12  Feature Weighting through Local Learning .... 233
    Yijun Sun
    12.1  Introduction .... 233
    12.2  Mathematical Interpretation of Relief .... 235
    12.3  Iterative Relief Algorithm .... 236
         12.3.1  Algorithm .... 236
         12.3.2  Convergence Analysis .... 238
    12.4  Extension to Multiclass Problems .... 240
    12.5  Online Learning .... 240
    12.6  Computational Complexity .... 242
    12.7  Experiments .... 242
         12.7.1  Experimental Setup .... 242
         12.7.2  Experiments on UCI Datasets .... 244
         12.7.3  Choice of Kernel Width .... 248
         12.7.4  Online Learning .... 248
         12.7.5  Experiments on Microarray Data .... 249
    12.8  Conclusion .... 251

IV  Text Classification and Clustering .... 255

13  Feature Selection for Text Classification .... 257
    George Forman
    13.1  Introduction .... 257
         13.1.1  Feature Selection Phyla .... 259
         13.1.2  Characteristic Difficulties of Text Classification Tasks .... 260
    13.2  Text Feature Generators .... 261
         13.2.1  Word Merging .... 261
         13.2.2  Word Phrases .... 262
         13.2.3  Character N-grams .... 263
         13.2.4  Multi-Field Records .... 264
         13.2.5  Other Properties .... 264
         13.2.6  Feature Values .... 265
    13.3  Feature Filtering for Classification .... 265
         13.3.1  Binary Classification .... 266
         13.3.2  Multi-Class Classification .... 269
         13.3.3  Hierarchical Classification .... 270
    13.4  Practical and Scalable Computation .... 271
    13.5  A Case Study .... 272
    13.6  Conclusion and Future Work .... 274

14  A Bayesian Feature Selection Score Based on Naïve Bayes Models .... 277
    Susana Eyheramendy and David Madigan
    14.1  Introduction .... 277
    14.2  Feature Selection Scores .... 279
         14.2.1  Posterior Inclusion Probability (PIP) .... 280
         14.2.2  Posterior Inclusion Probability (PIP) under a Bernoulli distribution .... 281
         14.2.3  Posterior Inclusion Probability (PIPp) under Poisson distributions .... 283
         14.2.4  Information Gain (IG) .... 284
         14.2.5  Bi-Normal Separation (BNS) .... 285
         14.2.6  Chi-Square .... 285
         14.2.7  Odds Ratio .... 286
         14.2.8  Word Frequency .... 286
    14.3  Classification Algorithms .... 286
    14.4  Experimental Settings and Results .... 287
         14.4.1  Datasets .... 287
         14.4.2  Experimental Results .... 288
    14.5  Conclusion .... 290

15  Pairwise Constraints-Guided Dimensionality Reduction .... 295
    Wei Tang and Shi Zhong
    15.1  Introduction .... 295
    15.2  Pairwise Constraints-Guided Feature Projection .... 297
         15.2.1  Feature Projection .... 298
         15.2.2  Projection-Based Semi-supervised Clustering .... 300
    15.3  Pairwise Constraints-Guided Co-clustering .... 301
    15.4  Experimental Studies .... 302
         15.4.1  Experimental Study - I .... 302
         15.4.2  Experimental Study - II .... 306
         15.4.3  Experimental Study - III .... 309
    15.5  Conclusion and Future Work .... 310

16  Aggressive Feature Selection by Feature Ranking .... 313
    Masoud Makrehchi and Mohamed S. Kamel
    16.1  Introduction .... 313
    16.2  Feature Selection by Feature Ranking .... 314
         16.2.1  Multivariate Characteristic of Text Classifiers .... 316
         16.2.2  Term Redundancy .... 316
    16.3  Proposed Approach to Reducing Term Redundancy .... 320
         16.3.1  Stemming, Stopwords, and Low-DF Terms Elimination .... 320
         16.3.2  Feature Ranking .... 320
         16.3.3  Redundancy Reduction .... 322
         16.3.4  Redundancy Removal Algorithm .... 325
         16.3.5  Term Redundancy Tree .... 326
    16.4  Experimental Results .... 326
    16.5  Summary .... 330

V  Feature Selection in Bioinformatics .... 335

17  Feature Selection for Genomic Data Analysis .... 337
    Lei Yu
    17.1  Introduction .... 337
         17.1.1  Microarray Data and Challenges .... 337
         17.1.2  Feature Selection for Microarray Data .... 338
    17.2  Redundancy-Based Feature Selection .... 340
         17.2.1  Feature Relevance and Redundancy .... 340
         17.2.2  An Efficient Framework for Redundancy Analysis .... 343
         17.2.3  RBF Algorithm .... 345
    17.3  Empirical Study .... 347
         17.3.1  Datasets .... 347
         17.3.2  Experimental Settings .... 349
         17.3.3  Results and Discussion .... 349
    17.4  Summary .... 351

18  A Feature Generation Algorithm with Applications to Biological Sequence Classification .... 355
    Rezarta Islamaj Dogan, Lise Getoor, and W. John Wilbur
    18.1  Introduction .... 355
    18.2  Splice-Site Prediction .... 356
         18.2.1  The Splice-Site Prediction Problem .... 356
         18.2.2  Current Approaches .... 357
         18.2.3  Our Approach .... 359
    18.3  Feature Generation Algorithm .... 359
         18.3.1  Feature Type Analysis .... 360
         18.3.2  Feature Selection .... 362
         18.3.3  Feature Generation Algorithm (FGA) .... 364
    18.4  Experiments and Discussion .... 366
         18.4.1  Data Description .... 366
         18.4.2  Feature Generation .... 367
         18.4.3  Prediction Results for Individual Feature Types .... 369
         18.4.4  Splice-Site Prediction with FGA Features .... 370
    18.5  Conclusions .... 372

19  An Ensemble Method for Identifying Robust Features for Biomarker Discovery .... 377
    Diana Chan, Susan M. Bridges, and Shane C. Burgess
    19.1  Introduction .... 377
    19.2  Biomarker Discovery from Proteome Profiles .... 378
    19.3  Challenges of Biomarker Identification .... 380
    19.4  Ensemble Method for Feature Selection .... 381
    19.5  Feature Selection Ensemble .... 383
    19.6  Results and Discussion .... 384
    19.7  Conclusion .... 389

20  Model Building and Feature Selection with Genomic Data .... 393
    Hui Zou and Trevor Hastie
    20.1  Introduction .... 393
    20.2  Ridge Regression, Lasso, and Bridge .... 394
    20.3  Drawbacks of the Lasso .... 396
    20.4  The Elastic Net .... 397
         20.4.1  Definition .... 397
         20.4.2  A Stylized Example .... 399
         20.4.3  Computation and Tuning .... 400
         20.4.4  Analyzing the Cardiomyopathy Data .... 402
    20.5  The Elastic-Net Penalized SVM .... 404
         20.5.1  Support Vector Machines .... 404
         20.5.2  A New SVM Classifier .... 405
    20.6  Sparse Eigen-Genes .... 407
         20.6.1  PCA and Eigen-Genes .... 408
         20.6.2  Sparse Principal Component Analysis .... 408
    20.7  Summary .... 409

Index .... 413
any_adam_object | 1 |
any_adam_object_boolean | 1 |
building | Verbundindex |
bvnumber | BV035176645 |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.9.D3 |
callnumber-search | QA76.9.D3 |
callnumber-sort | QA 276.9 D3 |
callnumber-subject | QA - Mathematics |
classification_rvk | ST 270 |
classification_tum | MAT 533f |
ctrlnum | (OCoLC)154309055 (DE-599)BVBBV035176645 |
dewey-full | 005.74 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security |
dewey-raw | 005.74 |
dewey-search | 005.74 |
dewey-sort | 15.74 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Mathematik |
discipline_str_mv | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02367nam a2200613zc 4500</leader><controlfield tag="001">BV035176645</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20100108 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">081124s2008 xxud||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2007027465</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781584888789</subfield><subfield code="c">alk. paper</subfield><subfield code="9">978-1-58488-878-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1584888784</subfield><subfield code="c">alk. paper</subfield><subfield code="9">1-58488-878-4</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)154309055</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV035176645</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-355</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-91</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.9.D3</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">005.74</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 270</subfield><subfield code="0">(DE-625)143638:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">MAT 533f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Computational methods of feature selection</subfield><subfield code="c">ed. by Huan Liu</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton [u.a.]</subfield><subfield code="b">Chapman & Hall/CRC</subfield><subfield code="c">2008</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">419 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Chapman & Hall/CRC data mining and knowledge discovery series</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturangaben</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Aprendizado computacional</subfield><subfield code="2">larpcal</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Banco de dados (gerenciamento)</subfield><subfield code="2">larpcal</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bases de données - Gestion</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Exploration de données (Informatique)</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Mineração de dados</subfield><subfield code="2">larpcal</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Recherche de l'information</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Systèmes d'information - Recherche</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Théorie de l'apprentissage informatique</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Database management</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Machine learning</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenbankverwaltung</subfield><subfield code="0">(DE-588)4389357-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Merkmalsextraktion</subfield><subfield code="0">(DE-588)4314440-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Datenbankverwaltung</subfield><subfield code="0">(DE-588)4389357-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Merkmalsextraktion</subfield><subfield code="0">(DE-588)4314440-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="C">b</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Liu, Huan</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016983487&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-016983487</subfield></datafield></record></collection> |
id | DE-604.BV035176645 |
illustrated | Illustrated |
index_date | 2024-07-02T22:56:20Z |
indexdate | 2024-07-09T21:26:45Z |
institution | BVB |
isbn | 9781584888789 1584888784 |
language | English |
lccn | 2007027465 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-016983487 |
oclc_num | 154309055 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR DE-634 DE-91 DE-BY-TUM |
owner_facet | DE-355 DE-BY-UBR DE-634 DE-91 DE-BY-TUM |
physical | 419 S. graph. Darst. |
publishDate | 2008 |
publishDateSearch | 2008 |
publishDateSort | 2008 |
publisher | Chapman & Hall/CRC |
record_format | marc |
series2 | Chapman & Hall/CRC data mining and knowledge discovery series |
spelling | Computational methods of feature selection ed. by Huan Liu Boca Raton [u.a.] Chapman & Hall/CRC 2008 419 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Chapman & Hall/CRC data mining and knowledge discovery series Literaturangaben Aprendizado computacional larpcal Banco de dados (gerenciamento) larpcal Bases de données - Gestion Exploration de données (Informatique) Mineração de dados larpcal Recherche de l'information Systèmes d'information - Recherche Théorie de l'apprentissage informatique Database management Data mining Machine learning Data Mining (DE-588)4428654-5 gnd rswk-swf Datenbankverwaltung (DE-588)4389357-0 gnd rswk-swf Maschinelles Lernen (DE-588)4193754-5 gnd rswk-swf Merkmalsextraktion (DE-588)4314440-8 gnd rswk-swf Datenbankverwaltung (DE-588)4389357-0 s Merkmalsextraktion (DE-588)4314440-8 s Maschinelles Lernen (DE-588)4193754-5 s Data Mining (DE-588)4428654-5 s b DE-604 Liu, Huan Sonstige oth Digitalisierung UB Regensburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016983487&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Computational methods of feature selection Aprendizado computacional larpcal Banco de dados (gerenciamento) larpcal Bases de données - Gestion Exploration de données (Informatique) Mineração de dados larpcal Recherche de l'information Systèmes d'information - Recherche Théorie de l'apprentissage informatique Database management Data mining Machine learning Data Mining (DE-588)4428654-5 gnd Datenbankverwaltung (DE-588)4389357-0 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Merkmalsextraktion (DE-588)4314440-8 gnd |
subject_GND | (DE-588)4428654-5 (DE-588)4389357-0 (DE-588)4193754-5 (DE-588)4314440-8 |
title | Computational methods of feature selection |
title_auth | Computational methods of feature selection |
title_exact_search | Computational methods of feature selection |
title_exact_search_txtP | Computational methods of feature selection |
title_full | Computational methods of feature selection ed. by Huan Liu |
title_fullStr | Computational methods of feature selection ed. by Huan Liu |
title_full_unstemmed | Computational methods of feature selection ed. by Huan Liu |
title_short | Computational methods of feature selection |
title_sort | computational methods of feature selection |
topic | Aprendizado computacional larpcal Banco de dados (gerenciamento) larpcal Bases de données - Gestion Exploration de données (Informatique) Mineração de dados larpcal Recherche de l'information Systèmes d'information - Recherche Théorie de l'apprentissage informatique Database management Data mining Machine learning Data Mining (DE-588)4428654-5 gnd Datenbankverwaltung (DE-588)4389357-0 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Merkmalsextraktion (DE-588)4314440-8 gnd |
topic_facet | Aprendizado computacional Banco de dados (gerenciamento) Bases de données - Gestion Exploration de données (Informatique) Mineração de dados Recherche de l'information Systèmes d'information - Recherche Théorie de l'apprentissage informatique Database management Data mining Machine learning Data Mining Datenbankverwaltung Maschinelles Lernen Merkmalsextraktion |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016983487&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT liuhuan computationalmethodsoffeatureselection |