Availability: Data clustering

Data clustering: theory, algorithms, and applications

Bibliographic details
Main authors:	Gan, Guojun 1979- (author), Ma, Chaoqun (author), Wu, Jianhong (author)
Format:	Book
Language:	English
Published:	Philadelphia, Pa [etc.] SIAM [etc.] 2007
Series:	ASA-SIAM series on statistics and applied probability 20
Subjects:	Classification automatique (Statistique); Classification automatique (Statistique) - Informatique; Datenverarbeitung; Cluster analysis; Cluster analysis > Data processing; Cluster-Analyse
Online access:	Publisher description; Table of contents; Contributor biographical information; Table of contents
Description:	Bibliography: pp. 397-441
Physical description:	XXII, 466 pp., illustrations
ISBN:	0898716233 9780898716238

Internal format

MARC


LEADER	00000nam a2200000 cb4500
001	BV023397368
003	DE-604
005	20121011
007	t
008	080715s2007 d\|\|\| \|\|\|\| 00\|\|\| eng d
020			\|a 0898716233 \|9 0-89871-623-3
020			\|a 9780898716238 \|9 978-0-89871-623-8
035			\|a (OCoLC)77831225
035			\|a (DE-599)GBV522828817
040			\|a DE-604 \|b ger \|e aacr
041	0		\|a eng
049			\|a DE-29T \|a DE-91 \|a DE-634 \|a DE-11 \|a DE-384
050		0	\|a QA278
082	0		\|a 519.5/3 \|2 22
084			\|a SK 840 \|0 (DE-625)143261: \|2 rvk
084			\|a MAT 627f \|2 stub
100	1		\|a Gan, Guojun \|d 1979- \|e Verfasser \|0 (DE-588)142575968 \|4 aut
245	1	0	\|a Data clustering \|b theory, algorithms, and applications \|c Guojun Gan ; Chaoqun Ma ; Jianhong Wu
264		1	\|a Philadelphia, Pa [u.a.] \|b SIAM [u.a.] \|c 2007
300			\|a XXII, 466 S. \|b graph. Darst.
336			\|b txt \|2 rdacontent
337			\|b n \|2 rdamedia
338			\|b nc \|2 rdacarrier
490	1		\|a ASA-SIAM series on statistics and applied probability \|v 20
500			\|a Literaturverz. S. 397 - 441
650		4	\|a Classification automatique (Statistique)
650		4	\|a Classification automatique (Statistique) - Informatique
650		4	\|a Datenverarbeitung
650		4	\|a Cluster analysis
650		4	\|a Cluster analysis \|x Data processing
650	0	7	\|a Cluster-Analyse \|0 (DE-588)4070044-6 \|2 gnd \|9 rswk-swf
689	0	0	\|a Cluster-Analyse \|0 (DE-588)4070044-6 \|D s
689	0		\|5 DE-604
700	1		\|a Ma, Chaoqun \|e Verfasser \|4 aut
700	1		\|a Wu, Jianhong \|e Verfasser \|4 aut
830		0	\|a ASA-SIAM series on statistics and applied probability \|v 20 \|w (DE-604)BV021491710 \|9 20
856	4		\|u http://www.loc.gov/catdir/enhancements/fy0709/2007061713-d.html \|z Publisher description \|z lizenzfrei
856	4		\|u http://www.loc.gov/catdir/enhancements/fy0709/2007061713-t.html \|z lizenzfrei \|3 Inhaltsverzeichnis
856	4		\|u http://www.loc.gov/catdir/enhancements/fy0710/2007061713-b.html \|z Contributor biographical information \|z lizenzfrei
856	4	2	\|m HBZ Datenaustausch \|q application/pdf \|u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016580217&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA \|3 Inhaltsverzeichnis
999			\|a oai:aleph.bib-bvb.de:BVB01-016580217

Record in the search index

_version_	1804137778301108224
adam_txt	Contents List of Figures xiii List of Tables xv List of Algorithms xvii Preface xix I Clustering, Data, and Similarity Measures 1 1 Data Clustering 3 1.1 Definition of Data Clustering 3 1.2 The Vocabulary of Clustering 5 1.2.1 Records and Attributes 5 1.2.2 Distances and Similarities 5 1.2.3 Clusters, Centers, and Modes 6 1.2.4 Hard Clustering and Fuzzy Clustering 7 1.2.5 Validity Indices 8 1.3 Clustering Processes 8 1.4 Dealing with Missing Values 10 1.5 Resources for Clustering 12 1.5.1 Surveys and Reviews on Clustering 12 1.5.2 Books on Clustering 12 1.5.3 Journals 13 1.5.4 Conference Proceedings 15 1.5.5 Data Sets 17 1.6 Summary 17 2 Data Types 19 2.1 Categorical Data 19 2.2 Binary Data 21 2.3 Transaction Data 23 2.4 Symbolic Data 23 2.5 Time Series 24 2.6 Summary 24 3 Scale Conversion 25 3.1 Introduction 25 3.1.1 Interval to Ordinal 25 3.1.2 Interval to Nominal 27 3.1.3 Ordinal to Nominal 28 3.1.4 Nominal to Ordinal 28 3.1.5 Ordinal to Interval 29 3.1.6 Other Conversions 29 3.2 Categorization of Numerical Data 30 3.2.1 Direct Categorization 30 3.2.2 Cluster-based Categorization 31 3.2.3 Automatic Categorization 37 3.3 Summary 41 4 Data Standardization and Transformation 43 4.1 Data Standardization 43 4.2 Data Transformation 46 4.2.1 Principal Component Analysis 46 4.2.2 SVD 48 4.2.3 The Karhunen-Loeve Transformation 49 4.3 Summary 51 5 Data Visualization 53 5.1 Sammon's Mapping 53 5.2 MDS 54 5.3 SOM 56 5.4 Class-preserving Projections 59 5.5 Parallel Coordinates 60 5.6 Tree Maps 61 5.7 Categorical Data Visualization 62 5.8 Other Visualization Techniques 65 5.9 Summary 65 6 Similarity and Dissimilarity Measures 67 6.1 Preliminaries 67 6.1.1 Proximity Matrix 68 6.1.2 Proximity Graph 69 6.1.3 Scatter Matrix 69 6.1.4 Covariance Matrix 70 6.2 Measures for Numerical Data 71 6.2.1 Euclidean Distance 71 6.2.2 Manhattan Distance 71 6.2.3 Maximum Distance 72 6.2.4 Minkowski Distance 72 6.2.5 Mahalanobis Distance 72 6.2.6 Average Distance 73 6.2.7 Other Distances 74 6.3 Measures for Categorical Data 74 6.3.1 The Simple Matching Distance 76 6.3.2 Other Matching Coefficients 76 6.4 Measures for Binary Data 77 6.5 Measures for Mixed-type Data 79 6.5.1 A General Similarity Coefficient 79 6.5.2 A General Distance Coefficient 80 6.5.3 A Generalized Minkowski Distance 81 6.6 Measures for Time Series Data 83 6.6.1 The Minkowski Distance 84 6.6.2 Time Series Preprocessing 85 6.6.3 Dynamic Time Warping 87 6.6.4 Measures Based on Longest Common Subsequences 88 6.6.5 Measures Based on Probabilistic Models 90 6.6.6 Measures Based on Landmark Models 91 6.6.7 Evaluation 92 6.7 Other Measures 92 6.7.1 The Cosine Similarity Measure 93 6.7.2 A Link-based Similarity Measure 93 6.7.3 Support 94 6.8 Similarity and Dissimilarity Measures between Clusters 94 6.8.1 The Mean-based Distance 94 6.8.2 The Nearest Neighbor Distance 95 6.8.3 The Farthest Neighbor Distance 95 6.8.4 The Average Neighbor Distance 96 6.8.5 Lance-Williams Formula 96 6.9 Similarity and Dissimilarity between Variables 98 6.9.1 Pearson's Correlation Coefficients 98 6.9.2 Measures Based on the Chi-square Statistic 101 6.9.3 Measures Based on Optimal Class Prediction 103 6.9.4 Group-based Distance 105 6.10 Summary 106 II Clustering Algorithms 107 7 Hierarchical Clustering Techniques 109 7.1 Representations of Hierarchical Clusterings 109 7.1.1 H-tree 110 7.1.2 Dendrogram 110 7.1.3 Banner 112 7.1.4 Pointer Representation 112 7.1.5 Packed Representation 114 7.1.6 Icicle Plot 115 7.1.7 Other Representations 115 7.2 Agglomerative Hierarchical Methods 116 7.2.1 The Single-link Method 118 7.2.2 The Complete Link Method 120 7.2.3 The Group Average Method 122 7.2.4 The Weighted Group Average Method 125 7.2.5 The Centroid Method 126 7.2.6 The Median Method 130 7.2.7 Ward's Method 132 7.2.8 Other Agglomerative Methods 137 7.3 Divisive Hierarchical Methods 137 7.4 Several Hierarchical Algorithms 138 7.4.1 SLINK 138 7.4.2 Single-link Algorithms Based on Minimum Spanning Trees 140 7.4.3 CLINK 141 7.4.4 BIRCH 144 7.4.5 CURE 144 7.4.6 DIANA 145 7.4.7 DISMEA 147 7.4.8 Edwards and Cavalli-Sforza Method 147 7.5 Summary 149 8 Fuzzy Clustering Algorithms 151 8.1 Fuzzy Sets 151 8.2 Fuzzy Relations 153 8.3 Fuzzy k-means 154 8.4 Fuzzy k-modes 156 8.5 The c-means Method 158 8.6 Summary 159 9 Center-based Clustering Algorithms 161 9.1 The k-means Algorithm 161 9.2 Variations of the k-means Algorithm 164 9.2.1 The Continuous k-means Algorithm 165 9.2.2 The Compare-means Algorithm 165 9.2.3 The Sort-means Algorithm 166 9.2.4 Acceleration of the k-means Algorithm with the kd-tree 167 9.2.5 Other Acceleration Methods 168 9.3 The Trimmed k-means Algorithm 169 9.4 The x-means Algorithm 170 9.5 The k-harmonic Means Algorithm 171 9.6 The Mean Shift Algorithm 173 9.7 MEC 175 9.8 The k-modes Algorithm (Huang) 176 9.8.1 Initial Modes Selection 178 9.9 The k-modes Algorithm (Chaturvedi et al.) 178 9.10 The k-probabilities Algorithm 179 9.11 The k-prototypes Algorithm 181 9.12 Summary 182 10 Search-based Clustering Algorithms 183 10.1 Genetic Algorithms 184 10.2 The Tabu Search Method 185 10.3 Variable Neighborhood Search for Clustering 186 10.4 Al-Sultan's Method 187 10.5 Tabu Search-based Categorical Clustering Algorithm 189 10.6 J-means 190 10.7 GKA 192 10.8 The Global k-means Algorithm 195 10.9 The Genetic k-modes Algorithm 195 10.9.1 The Selection Operator 196 10.9.2 The Mutation Operator 196 10.9.3 The k-modes Operator 197 10.10 The Genetic Fuzzy k-modes Algorithm 197 10.10.1 String Representation 198 10.10.2 Initialization Process 198 10.10.3 Selection Process 199 10.10.4 Crossover Process 199 10.10.5 Mutation Process 200 10.10.6 Termination Criterion 200 10.11 SARS 200 10.12 Summary 202 11 Graph-based Clustering Algorithms 203 11.1 Chameleon 203 11.2 CACTUS 204 11.3 A Dynamic System-based Approach 205 11.4 ROCK 207 11.5 Summary 208 12 Grid-based Clustering Algorithms 209 12.1 STING 209 12.2 OptiGrid 210 12.3 GRIDCLUS 212 12.4 GDILC 214 12.5 WaveCluster 216 12.6 Summary 217 13 Density-based Clustering Algorithms 219 13.1 DBSCAN 219 13.2 BRIDGE 221 13.3 DBCLASD 222 13.4 DENCLUE 223 13.5 CUBN 225 13.6 Summary 226 14 Model-based Clustering Algorithms 227 14.1 Introduction 227 14.2 Gaussian Clustering Models 230 14.3 Model-based Agglomerative Hierarchical Clustering 232 14.4 The EM Algorithm 235 14.5 Model-based Clustering 237 14.6 COOLCAT 240 14.7 STUCCO 241 14.8 Summary 242 15 Subspace Clustering 243 15.1 CLIQUE 244 15.2 PROCLUS 246 15.3 ORCLUS 249 15.4 ENCLUS 253 15.5 FINDIT 255 15.6 MAFIA 258 15.7 DOC 259 15.8 CLTree 261 15.9 PART 262 15.10 SUBCAD 264 15.11 Fuzzy Subspace Clustering 270 15.12 Mean Shift for Subspace Clustering 275 15.13 Summary 285 16 Miscellaneous Algorithms 287 16.1 Time Series Clustering Algorithms 287 16.2 Streaming Algorithms 289 16.2.1 LSEARCH 290 16.2.2 Other Streaming Algorithms 293 16.3 Transaction Data Clustering Algorithms 293 16.3.1 LargeItem 294 16.3.2 CLOPE 295 16.3.3 OAK 296 16.4 Summary 297 17 Evaluation of Clustering Algorithms 299 17.1 Introduction 299 17.1.1 Hypothesis Testing 301 17.1.2 External Criteria 302 17.1.3 Internal Criteria 303 17.1.4 Relative Criteria 304 17.2 Evaluation of Partitional Clustering 305 17.2.1 Modified Hubert's Γ Statistic 305 17.2.2 The Davies-Bouldin Index 305 17.2.3 Dunn's Index 307 17.2.4 The SD Validity Index 307 17.2.5 The S_Dbw Validity Index 308 17.2.6 The RMSSTD Index 309 17.2.7 The RS Index 310 17.2.8 The Calinski-Harabasz Index 310 17.2.9 Rand's Index 311 17.2.10 Average of Compactness 312 17.2.11 Distances between Partitions 312 17.3 Evaluation of Hierarchical Clustering 314 17.3.1 Testing Absence of Structure 314 17.3.2 Testing Hierarchical Structures 315 17.4 Validity Indices for Fuzzy Clustering 315 17.4.1 The Partition Coefficient Index 315 17.4.2 The Partition Entropy Index 316 17.4.3 The Fukuyama-Sugeno Index 316 17.4.4 Validity Based on Fuzzy Similarity 317 17.4.5 A Compact and Separate Fuzzy Validity Criterion 318 17.4.6 A Partition Separation Index 319 17.4.7 An Index Based on the Mini-max Filter Concept and Fuzzy Theory 319 17.5 Summary 320 III Applications of Clustering 321 18 Clustering Gene Expression Data 323 18.1 Background 323 18.2 Applications of Gene Expression Data Clustering 324 18.3 Types of Gene Expression Data Clustering 325 18.4 Some Guidelines for Gene Expression Clustering 325 18.5 Similarity Measures for Gene Expression Data 326 18.5.1 Euclidean Distance 326 18.5.2 Pearson's Correlation Coefficient 326 18.6 A Case Study 328 18.6.1 C++ Code 328 18.6.2 Results 334 18.7 Summary 334 IV MATLAB and C++ for Clustering 341 19 Data Clustering in MATLAB 343 19.1 Read and Write Data Files 343 19.2 Handle Categorical Data 347 19.3 M-files, MEX-files, and MAT-files 349 19.3.1 M-files 349 19.3.2 MEX-files 351 19.3.3 MAT-files 354 19.4 Speed up MATLAB 354 19.5 Some Clustering Functions 355 19.5.1 Hierarchical Clustering 355 19.5.2 k-means Clustering 359 19.6 Summary 362 20 Clustering in C/C++ 363 20.1 The STL 363 20.1.1 The vector Class 363 20.1.2 The list Class 364 20.2 C/C++ Program Compilation 366 20.3 Data Structure and Implementation 367 20.3.1 Data Matrices and Centers 367 20.3.2 Clustering Results 368 20.3.3 The Quick Sort Algorithm 369 20.4 Summary 369 A Some Clustering Algorithms 371 B The kd-tree Data Structure 375 C MATLAB Codes 377 C.1 The MATLAB Code for Generating Subspace Clusters 377 C.2 The MATLAB Code for the k-modes Algorithm 379 C.3 The MATLAB Code for the MSSC Algorithm 381 D C++ Codes 385 D.1 The C++ Code for Converting Categorical Values to Integers 385 D.2 The C++ Code for the FSC Algorithm 388 Bibliography 397 Subject Index 443 Author Index 455
any_adam_object	1
any_adam_object_boolean	1
author	Gan, Guojun 1979- Ma, Chaoqun Wu, Jianhong
author_GND	(DE-588)142575968
author_facet	Gan, Guojun 1979- Ma, Chaoqun Wu, Jianhong
author_role	aut aut aut
author_sort	Gan, Guojun 1979-
author_variant	g g gg c m cm j w jw
building	Verbundindex
bvnumber	BV023397368
callnumber-first	Q - Science
callnumber-label	QA278
callnumber-raw	QA278
callnumber-search	QA278
callnumber-sort	QA 3278
callnumber-subject	QA - Mathematics
classification_rvk	SK 840
classification_tum	MAT 627f
ctrlnum	(OCoLC)77831225 (DE-599)GBV522828817
dewey-full	519.5/3
dewey-hundreds	500 - Natural sciences and mathematics
dewey-ones	519 - Probabilities and applied mathematics
dewey-raw	519.5/3
dewey-search	519.5/3
dewey-sort	3519.5 13
dewey-tens	510 - Mathematics
discipline	Mathematik
discipline_str_mv	Mathematik
format	Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02281nam a2200517 cb4500</leader><controlfield tag="001">BV023397368</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20121011 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">080715s2007 d\|\|\| \|\|\|\| 00\|\|\| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0898716233</subfield><subfield code="9">0-89871-623-3</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780898716238</subfield><subfield code="9">978-0-89871-623-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)77831225</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBV522828817</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield><subfield code="a">DE-91</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-384</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA278</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">519.5/3</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SK 840</subfield><subfield code="0">(DE-625)143261:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">MAT 627f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Gan, Guojun</subfield><subfield code="d">1979-</subfield><subfield 
code="e">Verfasser</subfield><subfield code="0">(DE-588)142575968</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data clustering</subfield><subfield code="b">theory, algorithms, and applications</subfield><subfield code="c">Guojun Gan ; Chaoqun Ma ; Jianhong Wu</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Philadelphia, Pa [u.a.]</subfield><subfield code="b">SIAM [u.a.]</subfield><subfield code="c">2007</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXII, 466 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">ASA-SIAM series on statistics and applied probability</subfield><subfield code="v">20</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturverz. S. 
397 - 441</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Classification automatique (Statistique)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Classification automatique (Statistique) - Informatique</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Datenverarbeitung</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cluster analysis</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cluster analysis</subfield><subfield code="x">Data processing</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Cluster-Analyse</subfield><subfield code="0">(DE-588)4070044-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Cluster-Analyse</subfield><subfield code="0">(DE-588)4070044-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ma, Chaoqun</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wu, Jianhong</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">ASA-SIAM series on statistics and applied probability</subfield><subfield code="v">20</subfield><subfield code="w">(DE-604)BV021491710</subfield><subfield code="9">20</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/enhancements/fy0709/2007061713-d.html</subfield><subfield code="z">Publisher description</subfield><subfield code="z">lizenzfrei</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield 
code="u">http://www.loc.gov/catdir/enhancements/fy0709/2007061713-t.html</subfield><subfield code="z">lizenzfrei</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/enhancements/fy0710/2007061713-b.html</subfield><subfield code="z">Contributor biographical information</subfield><subfield code="z">lizenzfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016580217&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-016580217</subfield></datafield></record></collection>
id	DE-604.BV023397368
illustrated	Illustrated
index_date	2024-07-02T21:22:25Z
indexdate	2024-07-09T21:17:42Z
institution	BVB
isbn	0898716233 9780898716238
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-016580217
oclc_num	77831225
open_access_boolean
owner	DE-29T DE-91 DE-BY-TUM DE-634 DE-11 DE-384
owner_facet	DE-29T DE-91 DE-BY-TUM DE-634 DE-11 DE-384
physical	XXII, 466 S. graph. Darst.
publishDate	2007
publishDateSearch	2007
publishDateSort	2007
publisher	SIAM [u.a.]
record_format	marc
series	ASA-SIAM series on statistics and applied probability
series2	ASA-SIAM series on statistics and applied probability
spelling	Gan, Guojun 1979- Verfasser (DE-588)142575968 aut Data clustering theory, algorithms, and applications Guojun Gan ; Chaoqun Ma ; Jianhong Wu Philadelphia, Pa [u.a.] SIAM [u.a.] 2007 XXII, 466 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier ASA-SIAM series on statistics and applied probability 20 Literaturverz. S. 397 - 441 Classification automatique (Statistique) Classification automatique (Statistique) - Informatique Datenverarbeitung Cluster analysis Cluster analysis Data processing Cluster-Analyse (DE-588)4070044-6 gnd rswk-swf Cluster-Analyse (DE-588)4070044-6 s DE-604 Ma, Chaoqun Verfasser aut Wu, Jianhong Verfasser aut ASA-SIAM series on statistics and applied probability 20 (DE-604)BV021491710 20 http://www.loc.gov/catdir/enhancements/fy0709/2007061713-d.html Publisher description lizenzfrei http://www.loc.gov/catdir/enhancements/fy0709/2007061713-t.html lizenzfrei Inhaltsverzeichnis http://www.loc.gov/catdir/enhancements/fy0710/2007061713-b.html Contributor biographical information lizenzfrei HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016580217&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle	Gan, Guojun 1979- Ma, Chaoqun Wu, Jianhong Data clustering theory, algorithms, and applications ASA-SIAM series on statistics and applied probability Classification automatique (Statistique) Classification automatique (Statistique) - Informatique Datenverarbeitung Cluster analysis Cluster analysis Data processing Cluster-Analyse (DE-588)4070044-6 gnd
subject_GND	(DE-588)4070044-6
title	Data clustering theory, algorithms, and applications
title_auth	Data clustering theory, algorithms, and applications
title_exact_search	Data clustering theory, algorithms, and applications
title_exact_search_txtP	Data clustering theory, algorithms, and applications
title_full	Data clustering theory, algorithms, and applications Guojun Gan ; Chaoqun Ma ; Jianhong Wu
title_fullStr	Data clustering theory, algorithms, and applications Guojun Gan ; Chaoqun Ma ; Jianhong Wu
title_full_unstemmed	Data clustering theory, algorithms, and applications Guojun Gan ; Chaoqun Ma ; Jianhong Wu
title_short	Data clustering
title_sort	data clustering theory algorithms and applications
title_sub	theory, algorithms, and applications
topic	Classification automatique (Statistique) Classification automatique (Statistique) - Informatique Datenverarbeitung Cluster analysis Cluster analysis Data processing Cluster-Analyse (DE-588)4070044-6 gnd
topic_facet	Classification automatique (Statistique) Classification automatique (Statistique) - Informatique Datenverarbeitung Cluster analysis Cluster analysis Data processing Cluster-Analyse
url	http://www.loc.gov/catdir/enhancements/fy0709/2007061713-d.html http://www.loc.gov/catdir/enhancements/fy0709/2007061713-t.html http://www.loc.gov/catdir/enhancements/fy0710/2007061713-b.html http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016580217&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
volume_link	(DE-604)BV021491710
work_keys_str_mv	AT ganguojun dataclusteringtheoryalgorithmsandapplications AT machaoqun dataclusteringtheoryalgorithmsandapplications AT wujianhong dataclusteringtheoryalgorithmsandapplications

Availability

No print copy is available.

Note: Not in the THWS holdings. Order via interlibrary loan.