Computational methods of feature selection

Format: Book
Language: English
Published: Boca Raton [u.a.]: Chapman & Hall/CRC, 2008
Series: Chapman & Hall/CRC data mining and knowledge discovery series
Online access: Table of contents
Note: Includes bibliographical references ("Literaturangaben")
Physical description: 419 pp., graphs ("419 S. graph. Darst.")
ISBN: 9781584888789, 1584888784
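The record lists the same book under both its ISBN-13 and ISBN-10 forms. As a quick plausibility check (a minimal sketch; the function names are my own, not part of any catalog system), both check digits can be verified in a few lines:

```python
def isbn10_check_ok(isbn: str) -> bool:
    """ISBN-10: weights 10..1; valid when the weighted sum is divisible by 11.
    An 'X' check digit stands for the value 10."""
    digits = [10 if c.upper() == "X" else int(c) for c in isbn]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0

def isbn13_check_ok(isbn: str) -> bool:
    """ISBN-13: alternating weights 1 and 3; valid when the sum is divisible by 10."""
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
    return total % 10 == 0

# The two forms listed in the record above:
print(isbn10_check_ok("1584888784"))     # True
print(isbn13_check_ok("9781584888789"))  # True
```

Both forms encode the same registration: the ISBN-13 prefixes 978 to the first nine digits of the ISBN-10 and recomputes the final check digit.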
MARC record (internal format)
LEADER  00000nam a2200000zc 4500
001     BV035176645
003     DE-604
005     20100108
007     t
008     081124s2008 xxud||| |||| 00||| eng d
010     |a 2007027465
020     |a 9781584888789 |c alk. paper |9 978-1-58488-878-9
020     |a 1584888784 |c alk. paper |9 1-58488-878-4
035     |a (OCoLC)154309055
035     |a (DE-599)BVBBV035176645
040     |a DE-604 |b ger |e aacr
041 0_  |a eng
044     |a xxu |c US
049     |a DE-355 |a DE-634 |a DE-91
050 _0  |a QA76.9.D3
082 0_  |a 005.74
084     |a ST 270 |0 (DE-625)143638: |2 rvk
084     |a MAT 533f |2 stub
245 10  |a Computational methods of feature selection |c ed. by Huan Liu
264 _1  |a Boca Raton [u.a.] |b Chapman & Hall/CRC |c 2008
300     |a 419 S. |b graph. Darst.
336     |b txt |2 rdacontent
337     |b n |2 rdamedia
338     |b nc |2 rdacarrier
490 0_  |a Chapman & Hall/CRC data mining and knowledge discovery series
500     |a Literaturangaben
650 _7  |a Aprendizado computacional |2 larpcal
650 _7  |a Banco de dados (gerenciamento) |2 larpcal
650 _4  |a Bases de données - Gestion
650 _4  |a Exploration de données (Informatique)
650 _7  |a Mineração de dados |2 larpcal
650 _4  |a Recherche de l'information
650 _4  |a Systèmes d'information - Recherche
650 _4  |a Théorie de l'apprentissage informatique
650 _4  |a Database management
650 _4  |a Data mining
650 _4  |a Machine learning
650 07  |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf
650 07  |a Datenbankverwaltung |0 (DE-588)4389357-0 |2 gnd |9 rswk-swf
650 07  |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf
650 07  |a Merkmalsextraktion |0 (DE-588)4314440-8 |2 gnd |9 rswk-swf
689 00  |a Datenbankverwaltung |0 (DE-588)4389357-0 |D s
689 01  |a Merkmalsextraktion |0 (DE-588)4314440-8 |D s
689 02  |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s
689 03  |a Data Mining |0 (DE-588)4428654-5 |D s
689 0_  |C b |5 DE-604
700 1_  |a Liu, Huan |e Sonstige |4 oth
856 42  |m Digitalisierung UB Regensburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016983487&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999     |a oai:aleph.bib-bvb.de:BVB01-016983487
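In the display above, `|a`, `|b`, etc. mark MARC subfields (this catalog renders the usual `$a` delimiters as pipes). A minimal sketch of splitting one such field body into (code, value) pairs follows; the helper name is my own, and pipe-splitting only suits this display form, since real MARC uses a dedicated delimiter character and field data can itself contain `|` (as in field 008):

```python
def parse_subfields(field: str) -> list[tuple[str, str]]:
    """Split a field body like '|a 9781584888789 |c alk. paper'
    into (subfield code, value) pairs, keeping repeated codes."""
    parts = [p.strip() for p in field.split("|") if p.strip()]
    return [(p[0], p[1:].strip()) for p in parts]

# Field 245 from the record above:
print(parse_subfields("|a Computational methods of feature selection |c ed. by Huan Liu"))
# [('a', 'Computational methods of feature selection'), ('c', 'ed. by Huan Liu')]
```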
Contents

I  Introduction and Background .... 1

1  Less Is More .... 3
   Huan Liu and Hiroshi Motoda
   1.1  Background and Basics .... 4
   1.2  Supervised, Unsupervised, and Semi-Supervised Feature Selection .... 7
   1.3  Key Contributions and Organization of the Book .... 10
        1.3.1  Part I - Introduction and Background .... 10
        1.3.2  Part II - Extending Feature Selection .... 11
        1.3.3  Part III - Weighting and Local Methods .... 12
        1.3.4  Part IV - Text Classification and Clustering .... 13
        1.3.5  Part V - Feature Selection in Bioinformatics .... 14
   1.4  Looking Ahead .... 15

2  Unsupervised Feature Selection .... 19
   Jennifer G. Dy
   2.1  Introduction .... 19
   2.2  Clustering .... 21
        2.2.1  The k-Means Algorithm .... 21
        2.2.2  Finite Mixture Clustering .... 22
   2.3  Feature Selection .... 23
        2.3.1  Feature Search .... 23
        2.3.2  Feature Evaluation .... 24
   2.4  Feature Selection for Unlabeled Data .... 25
        2.4.1  Filter Methods .... 26
        2.4.2  Wrapper Methods .... 27
   2.5  Local Approaches .... 32
        2.5.1  Subspace Clustering .... 32
        2.5.2  Co-Clustering/Bi-Clustering .... 33
   2.6  Summary .... 34

3  Randomized Feature Selection .... 41
   David J. Stracuzzi
   3.1  Introduction .... 41
   3.2  Types of Randomizations .... 42
   3.3  Randomized Complexity Classes .... 43
   3.4  Applying Randomization to Feature Selection .... 45
   3.5  The Role of Heuristics .... 46
   3.6  Examples of Randomized Selection Algorithms .... 47
        3.6.1  A Simple Las Vegas Approach .... 47
        3.6.2  Two Simple Monte Carlo Approaches .... 49
        3.6.3  Random Mutation Hill Climbing .... 51
        3.6.4  Simulated Annealing .... 52
        3.6.5  Genetic Algorithms .... 54
        3.6.6  Randomized Variable Elimination .... 56
   3.7  Issues in Randomization .... 58
        3.7.1  Pseudorandom Number Generators .... 58
        3.7.2  Sampling from Specialized Data Structures .... 59
   3.8  Summary .... 59

4  Causal Feature Selection .... 63
   Isabelle Guyon, Constantin Aliferis, and André Elisseeff
   4.1  Introduction .... 63
   4.2  Classical "Non-Causal" Feature Selection .... 65
   4.3  The Concept of Causality .... 68
        4.3.1  Probabilistic Causality .... 69
        4.3.2  Causal Bayesian Networks .... 70
   4.4  Feature Relevance in Bayesian Networks .... 71
        4.4.1  Markov Blanket .... 72
        4.4.2  Characterizing Features Selected via Classical Methods .... 73
   4.5  Causal Discovery Algorithms .... 77
        4.5.1  A Prototypical Causal Discovery Algorithm .... 78
        4.5.2  Markov Blanket Induction Algorithms .... 79
   4.6  Examples of Applications .... 80
   4.7  Summary, Conclusions, and Open Problems .... 82

II  Extending Feature Selection .... 87

5  Active Learning of Feature Relevance .... 89
   Emanuele Olivetti, Sriharsha Veeramachaneni, and Paolo Avesani
   5.1  Introduction .... 89
   5.2  Active Sampling for Feature Relevance Estimation .... 92
   5.3  Derivation of the Sampling Benefit Function .... 93
   5.4  Implementation of the Active Sampling Algorithm .... 95
        5.4.1  Data Generation Model: Class-Conditional Mixture of Product Distributions .... 95
        5.4.2  Calculation of Feature Relevances .... 96
        5.4.3  Calculation of Conditional Probabilities .... 97
        5.4.4  Parameter Estimation .... 97
   5.5  Experiments .... 99
        5.5.1  Synthetic Data .... 99
        5.5.2  UCI Datasets .... 100
        5.5.3  Computational Complexity Issues .... 102
   5.6  Conclusions and Future Work .... 102

6  A Study of Feature Extraction Techniques Based on Decision Border Estimate .... 109
   Claudia Diamantini and Domenico Potena
   6.1  Introduction .... 109
        6.1.1  Background on Statistical Pattern Classification .... 111
   6.2  Feature Extraction Based on Decision Boundary .... 112
        6.2.1  MLP-Based Decision Boundary Feature Extraction .... 113
        6.2.2  SVM Decision Boundary Analysis .... 114
   6.3  Generalities About Labeled Vector Quantizers .... 115
   6.4  Feature Extraction Based on Vector Quantizers .... 116
        6.4.1  Weighting of Normal Vectors .... 119
   6.5  Experiments .... 122
        6.5.1  Experiment with Synthetic Data .... 122
        6.5.2  Experiment with Real Data .... 124
   6.6  Conclusions .... 127

7  Ensemble-Based Variable Selection Using Independent Probes .... 131
   Eugene Tuv, Alexander Borisov, and Kari Torkkola
   7.1  Introduction .... 131
   7.2  Tree Ensemble Methods in Feature Ranking .... 132
   7.3  The Algorithm: Ensemble-Based Ranking Against Independent Probes .... 134
   7.4  Experiments .... 137
        7.4.1  Benchmark Methods .... 138
        7.4.2  Data and Experiments .... 139
   7.5  Discussion .... 143

8  Efficient Incremental-Ranked Feature Selection in Massive Data .... 147
   Roberto Ruiz, Jesús S. Aguilar-Ruiz, and José C. Riquelme
   8.1  Introduction .... 147
   8.2  Related Work .... 148
   8.3  Preliminary Concepts .... 150
        8.3.1  Relevance .... 150
        8.3.2  Redundancy .... 151
   8.4  Incremental Performance over Ranking .... 152
        8.4.1  Incremental Ranked Usefulness .... 153
        8.4.2  Algorithm .... 155
   8.5  Experimental Results .... 156
   8.6  Conclusions .... 164

III  Weighting and Local Methods .... 167

9  Non-Myopic Feature Quality Evaluation with (R)ReliefF .... 169
   Igor Kononenko and Marko Robnik-Šikonja
   9.1  Introduction .... 169
   9.2  From Impurity to Relief .... 170
        9.2.1  Impurity Measures in Classification .... 171
        9.2.2  Relief for Classification .... 172
   9.3  ReliefF for Classification and RReliefF for Regression .... 175
   9.4  Extensions .... 178
        9.4.1  ReliefF for Inductive Logic Programming .... 178
        9.4.2  Cost-Sensitive ReliefF .... 180
        9.4.3  Evaluation of Ordered Features at Value Level .... 181
   9.5  Interpretation .... 182
        9.5.1  Difference of Probabilities .... 182
        9.5.2  Portion of the Explained Concept .... 183
   9.6  Implementation Issues .... 184
        9.6.1  Time Complexity .... 184
        9.6.2  Active Sampling .... 184
        9.6.3  Parallelization .... 185
   9.7  Applications .... 185
        9.7.1  Feature Subset Selection .... 185
        9.7.2  Feature Ranking .... 186
        9.7.3  Feature Weighting .... 186
        9.7.4  Building Tree-Based Models .... 187
        9.7.5  Feature Discretization .... 187
        9.7.6  Association Rules and Genetic Algorithms .... 187
        9.7.7  Constructive Induction .... 188
   9.8  Conclusion .... 188

10  Weighting Method for Feature Selection in k-Means .... 193
    Joshua Zhexue Huang, Jun Xu, Michael Ng, and Yunming Ye
    10.1  Introduction .... 193
    10.2  Feature Weighting in k-Means .... 194
    10.3  W-k-Means Clustering Algorithm .... 197
    10.4  Feature Selection .... 198
    10.5  Subspace Clustering with k-Means .... 200
    10.6  Text Clustering .... 201
         10.6.1  Text Data and Subspace Clustering .... 202
         10.6.2  Selection of Key Words .... 203
    10.7  Related Work .... 204
    10.8  Discussions .... 207

11  Local Feature Selection for Classification .... 211
    Carlotta Domeniconi and Dimitrios Gunopulos
    11.1  Introduction .... 211
    11.2  The Curse of Dimensionality .... 213
    11.3  Adaptive Metric Techniques .... 214
         11.3.1  Flexible Metric Nearest Neighbor Classification .... 215
         11.3.2  Discriminant Adaptive Nearest Neighbor Classification .... 216
         11.3.3  Adaptive Metric Nearest Neighbor Algorithm .... 217
    11.4  Large Margin Nearest Neighbor Classifiers .... 222
         11.4.1  Support Vector Machines .... 223
         11.4.2  Feature Weighting .... 224
         11.4.3  Large Margin Nearest Neighbor Classification .... 225
         11.4.4  Weighting Features Increases the Margin .... 227
    11.5  Experimental Comparisons .... 228
    11.6  Conclusions .... 231

12  Feature Weighting through Local Learning .... 233
    Yijun Sun
    12.1  Introduction .... 233
    12.2  Mathematical Interpretation of Relief .... 235
    12.3  Iterative Relief Algorithm .... 236
         12.3.1  Algorithm .... 236
         12.3.2  Convergence Analysis .... 238
    12.4  Extension to Multiclass Problems .... 240
    12.5  Online Learning .... 240
    12.6  Computational Complexity .... 242
    12.7  Experiments .... 242
         12.7.1  Experimental Setup .... 242
         12.7.2  Experiments on UCI Datasets .... 244
         12.7.3  Choice of Kernel Width .... 248
         12.7.4  Online Learning .... 248
         12.7.5  Experiments on Microarray Data .... 249
    12.8  Conclusion .... 251

IV  Text Classification and Clustering .... 255

13  Feature Selection for Text Classification .... 257
    George Forman
    13.1  Introduction .... 257
         13.1.1  Feature Selection Phyla .... 259
         13.1.2  Characteristic Difficulties of Text Classification Tasks .... 260
    13.2  Text Feature Generators .... 261
         13.2.1  Word Merging .... 261
         13.2.2  Word Phrases .... 262
         13.2.3  Character N-grams .... 263
         13.2.4  Multi-Field Records .... 264
         13.2.5  Other Properties .... 264
         13.2.6  Feature Values .... 265
    13.3  Feature Filtering for Classification .... 265
         13.3.1  Binary Classification .... 266
         13.3.2  Multi-Class Classification .... 269
         13.3.3  Hierarchical Classification .... 270
    13.4  Practical and Scalable Computation .... 271
    13.5  A Case Study .... 272
    13.6  Conclusion and Future Work .... 274

14  A Bayesian Feature Selection Score Based on Naïve Bayes Models .... 277
    Susana Eyheramendy and David Madigan
    14.1  Introduction .... 277
    14.2  Feature Selection Scores .... 279
         14.2.1  Posterior Inclusion Probability (PIP) .... 280
         14.2.2  Posterior Inclusion Probability (PIP) under a Bernoulli distribution .... 281
         14.2.3  Posterior Inclusion Probability (PIPp) under Poisson distributions .... 283
         14.2.4  Information Gain (IG) .... 284
         14.2.5  Bi-Normal Separation (BNS) .... 285
         14.2.6  Chi-Square .... 285
         14.2.7  Odds Ratio .... 286
         14.2.8  Word Frequency .... 286
    14.3  Classification Algorithms .... 286
    14.4  Experimental Settings and Results .... 287
         14.4.1  Datasets .... 287
         14.4.2  Experimental Results .... 288
    14.5  Conclusion .... 290

15  Pairwise Constraints-Guided Dimensionality Reduction .... 295
    Wei Tang and Shi Zhong
    15.1  Introduction .... 295
    15.2  Pairwise Constraints-Guided Feature Projection .... 297
         15.2.1  Feature Projection .... 298
         15.2.2  Projection-Based Semi-supervised Clustering .... 300
    15.3  Pairwise Constraints-Guided Co-clustering .... 301
    15.4  Experimental Studies .... 302
         15.4.1  Experimental Study - I .... 302
         15.4.2  Experimental Study - II .... 306
         15.4.3  Experimental Study - III .... 309
    15.5  Conclusion and Future Work .... 310

16  Aggressive Feature Selection by Feature Ranking .... 313
    Masoud Makrehchi and Mohamed S. Kamel
    16.1  Introduction .... 313
    16.2  Feature Selection by Feature Ranking .... 314
         16.2.1  Multivariate Characteristic of Text Classifiers .... 316
         16.2.2  Term Redundancy .... 316
    16.3  Proposed Approach to Reducing Term Redundancy .... 320
         16.3.1  Stemming, Stopwords, and Low-DF Terms Elimination .... 320
         16.3.2  Feature Ranking .... 320
         16.3.3  Redundancy Reduction .... 322
         16.3.4  Redundancy Removal Algorithm .... 325
         16.3.5  Term Redundancy Tree .... 326
    16.4  Experimental Results .... 326
    16.5  Summary .... 330

V  Feature Selection in Bioinformatics .... 335

17  Feature Selection for Genomic Data Analysis .... 337
    Lei Yu
    17.1  Introduction .... 337
         17.1.1  Microarray Data and Challenges .... 337
         17.1.2  Feature Selection for Microarray Data .... 338
    17.2  Redundancy-Based Feature Selection .... 340
         17.2.1  Feature Relevance and Redundancy .... 340
         17.2.2  An Efficient Framework for Redundancy Analysis .... 343
         17.2.3  RBF Algorithm .... 345
    17.3  Empirical Study .... 347
         17.3.1  Datasets .... 347
         17.3.2  Experimental Settings .... 349
         17.3.3  Results and Discussion .... 349
    17.4  Summary .... 351

18  A Feature Generation Algorithm with Applications to Biological Sequence Classification .... 355
    Rezarta Islamaj Dogan, Lise Getoor, and W. John Wilbur
    18.1  Introduction .... 355
    18.2  Splice-Site Prediction .... 356
         18.2.1  The Splice-Site Prediction Problem .... 356
         18.2.2  Current Approaches .... 357
         18.2.3  Our Approach .... 359
    18.3  Feature Generation Algorithm .... 359
         18.3.1  Feature Type Analysis .... 360
         18.3.2  Feature Selection .... 362
         18.3.3  Feature Generation Algorithm (FGA) .... 364
    18.4  Experiments and Discussion .... 366
         18.4.1  Data Description .... 366
         18.4.2  Feature Generation .... 367
         18.4.3  Prediction Results for Individual Feature Types .... 369
         18.4.4  Splice-Site Prediction with FGA Features .... 370
    18.5  Conclusions .... 372

19  An Ensemble Method for Identifying Robust Features for Biomarker Discovery .... 377
    Diana Chan, Susan M. Bridges, and Shane C. Burgess
    19.1  Introduction .... 377
    19.2  Biomarker Discovery from Proteome Profiles .... 378
    19.3  Challenges of Biomarker Identification .... 380
    19.4  Ensemble Method for Feature Selection .... 381
    19.5  Feature Selection Ensemble .... 383
    19.6  Results and Discussion .... 384
    19.7  Conclusion .... 389

20  Model Building and Feature Selection with Genomic Data .... 393
    Hui Zou and Trevor Hastie
    20.1  Introduction .... 393
    20.2  Ridge Regression, Lasso, and Bridge .... 394
    20.3  Drawbacks of the Lasso .... 396
    20.4  The Elastic Net .... 397
         20.4.1  Definition .... 397
         20.4.2  A Stylized Example .... 399
         20.4.3  Computation and Tuning .... 400
         20.4.4  Analyzing the Cardiomyopathy Data .... 402
    20.5  The Elastic-Net Penalized SVM .... 404
         20.5.1  Support Vector Machines .... 404
         20.5.2  A New SVM Classifier .... 405
    20.6  Sparse Eigen-Genes .... 407
         20.6.1  PCA and Eigen-Genes .... 408
         20.6.2  Sparse Principal Component Analysis .... 408
    20.7  Summary .... 409

Index .... 413
any_adam_object | 1 |
any_adam_object_boolean | 1 |
building | Verbundindex |
bvnumber | BV035176645 |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.9.D3 |
callnumber-search | QA76.9.D3 |
callnumber-sort | QA 276.9 D3 |
callnumber-subject | QA - Mathematics |
classification_rvk | ST 270 |
classification_tum | MAT 533f |
ctrlnum | (OCoLC)154309055 (DE-599)BVBBV035176645 |
dewey-full | 005.74 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security |
dewey-raw | 005.74 |
dewey-search | 005.74 |
dewey-sort | 15.74 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Mathematik |
discipline_str_mv | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02367nam a2200613zc 4500</leader><controlfield tag="001">BV035176645</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20100108 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">081124s2008 xxud||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2007027465</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781584888789</subfield><subfield code="c">alk. paper</subfield><subfield code="9">978-1-58488-878-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1584888784</subfield><subfield code="c">alk. paper</subfield><subfield code="9">1-58488-878-4</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)154309055</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV035176645</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-355</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-91</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.9.D3</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">005.74</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 270</subfield><subfield code="0">(DE-625)143638:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">MAT 533f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Computational methods of feature selection</subfield><subfield code="c">ed. by Huan Liu</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton [u.a.]</subfield><subfield code="b">Chapman & Hall/CRC</subfield><subfield code="c">2008</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">419 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Chapman & Hall/CRC data mining and knowledge discovery series</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturangaben</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Aprendizado computacional</subfield><subfield code="2">larpcal</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Banco de dados (gerenciamento)</subfield><subfield code="2">larpcal</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bases de données - Gestion</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Exploration de données (Informatique)</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Mineração de dados</subfield><subfield code="2">larpcal</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Recherche de l'information</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Systèmes d'information - Recherche</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Théorie de l'apprentissage informatique</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Database management</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Machine learning</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenbankverwaltung</subfield><subfield code="0">(DE-588)4389357-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Merkmalsextraktion</subfield><subfield code="0">(DE-588)4314440-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Datenbankverwaltung</subfield><subfield code="0">(DE-588)4389357-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Merkmalsextraktion</subfield><subfield code="0">(DE-588)4314440-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="C">b</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Liu, Huan</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016983487&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-016983487</subfield></datafield></record></collection> |
id | DE-604.BV035176645 |
illustrated | Illustrated |
index_date | 2024-07-02T22:56:20Z |
indexdate | 2024-07-09T21:26:45Z |
institution | BVB |
isbn | 9781584888789 1584888784 |
language | English |
lccn | 2007027465 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-016983487 |
oclc_num | 154309055 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR DE-634 DE-91 DE-BY-TUM |
owner_facet | DE-355 DE-BY-UBR DE-634 DE-91 DE-BY-TUM |
physical | 419 S. graph. Darst. |
publishDate | 2008 |
publishDateSearch | 2008 |
publishDateSort | 2008 |
publisher | Chapman & Hall/CRC |
record_format | marc |
series2 | Chapman & Hall/CRC data mining and knowledge discovery series |
spelling | Computational methods of feature selection ed. by Huan Liu Boca Raton [u.a.] Chapman & Hall/CRC 2008 419 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Chapman & Hall/CRC data mining and knowledge discovery series Literaturangaben Aprendizado computacional larpcal Banco de dados (gerenciamento) larpcal Bases de données - Gestion Exploration de données (Informatique) Mineração de dados larpcal Recherche de l'information Systèmes d'information - Recherche Théorie de l'apprentissage informatique Database management Data mining Machine learning Data Mining (DE-588)4428654-5 gnd rswk-swf Datenbankverwaltung (DE-588)4389357-0 gnd rswk-swf Maschinelles Lernen (DE-588)4193754-5 gnd rswk-swf Merkmalsextraktion (DE-588)4314440-8 gnd rswk-swf Datenbankverwaltung (DE-588)4389357-0 s Merkmalsextraktion (DE-588)4314440-8 s Maschinelles Lernen (DE-588)4193754-5 s Data Mining (DE-588)4428654-5 s b DE-604 Liu, Huan Sonstige oth Digitalisierung UB Regensburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016983487&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Computational methods of feature selection Aprendizado computacional larpcal Banco de dados (gerenciamento) larpcal Bases de données - Gestion Exploration de données (Informatique) Mineração de dados larpcal Recherche de l'information Systèmes d'information - Recherche Théorie de l'apprentissage informatique Database management Data mining Machine learning Data Mining (DE-588)4428654-5 gnd Datenbankverwaltung (DE-588)4389357-0 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Merkmalsextraktion (DE-588)4314440-8 gnd |
subject_GND | (DE-588)4428654-5 (DE-588)4389357-0 (DE-588)4193754-5 (DE-588)4314440-8 |
title | Computational methods of feature selection |
title_auth | Computational methods of feature selection |
title_exact_search | Computational methods of feature selection |
title_exact_search_txtP | Computational methods of feature selection |
title_full | Computational methods of feature selection ed. by Huan Liu |
title_fullStr | Computational methods of feature selection ed. by Huan Liu |
title_full_unstemmed | Computational methods of feature selection ed. by Huan Liu |
title_short | Computational methods of feature selection |
title_sort | computational methods of feature selection |
topic | Aprendizado computacional larpcal Banco de dados (gerenciamento) larpcal Bases de données - Gestion Exploration de données (Informatique) Mineração de dados larpcal Recherche de l'information Systèmes d'information - Recherche Théorie de l'apprentissage informatique Database management Data mining Machine learning Data Mining (DE-588)4428654-5 gnd Datenbankverwaltung (DE-588)4389357-0 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Merkmalsextraktion (DE-588)4314440-8 gnd |
topic_facet | Aprendizado computacional Banco de dados (gerenciamento) Bases de données - Gestion Exploration de données (Informatique) Mineração de dados Recherche de l'information Systèmes d'information - Recherche Théorie de l'apprentissage informatique Database management Data mining Machine learning Data Mining Datenbankverwaltung Maschinelles Lernen Merkmalsextraktion |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016983487&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT liuhuan computationalmethodsoffeatureselection |