Data clustering: theory, algorithms, and applications
Main authors: | Gan, Guojun ; Ma, Chaoqun ; Wu, Jianhong |
---|---|
Format: | Book |
Language: | English |
Published: | Philadelphia, Pa [et al.] : SIAM [et al.], 2007 |
Series: | ASA-SIAM series on statistics and applied probability ; 20 |
Subjects: | Cluster analysis |
Online access: | Publisher description; Table of contents; Contributor biographical information |
Description: | Bibliography: pp. 397-441 |
Description: | XXII, 466 pp., diagrams |
ISBN: | 0898716233 9780898716238 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV023397368 | ||
003 | DE-604 | ||
005 | 20121011 | ||
007 | t | ||
008 | 080715s2007 d||| |||| 00||| eng d | ||
020 | |a 0898716233 |9 0-89871-623-3 | ||
020 | |a 9780898716238 |9 978-0-89871-623-8 | ||
035 | |a (OCoLC)77831225 | ||
035 | |a (DE-599)GBV522828817 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
049 | |a DE-29T |a DE-91 |a DE-634 |a DE-11 |a DE-384 | ||
050 | 0 | |a QA278 | |
082 | 0 | |a 519.5/3 |2 22 | |
084 | |a SK 840 |0 (DE-625)143261: |2 rvk | ||
084 | |a MAT 627f |2 stub | ||
100 | 1 | |a Gan, Guojun |d 1979- |e Verfasser |0 (DE-588)142575968 |4 aut | |
245 | 1 | 0 | |a Data clustering |b theory, algorithms, and applications |c Guojun Gan ; Chaoqun Ma ; Jianhong Wu |
264 | 1 | |a Philadelphia, Pa [u.a.] |b SIAM [u.a.] |c 2007 | |
300 | |a XXII, 466 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a ASA-SIAM series on statistics and applied probability |v 20 | |
500 | |a Literaturverz. S. 397 - 441 | ||
650 | 4 | |a Classification automatique (Statistique) | |
650 | 4 | |a Classification automatique (Statistique) - Informatique | |
650 | 4 | |a Datenverarbeitung | |
650 | 4 | |a Cluster analysis | |
650 | 4 | |a Cluster analysis |x Data processing | |
650 | 0 | 7 | |a Cluster-Analyse |0 (DE-588)4070044-6 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Cluster-Analyse |0 (DE-588)4070044-6 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Ma, Chaoqun |e Verfasser |4 aut | |
700 | 1 | |a Wu, Jianhong |e Verfasser |4 aut | |
830 | 0 | |a ASA-SIAM series on statistics and applied probability |v 20 |w (DE-604)BV021491710 |9 20 | |
856 | 4 | |u http://www.loc.gov/catdir/enhancements/fy0709/2007061713-d.html |z Publisher description |z lizenzfrei | |
856 | 4 | |u http://www.loc.gov/catdir/enhancements/fy0709/2007061713-t.html |z lizenzfrei |3 Inhaltsverzeichnis | |
856 | 4 | |u http://www.loc.gov/catdir/enhancements/fy0710/2007061713-b.html |z Contributor biographical information |z lizenzfrei | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016580217&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-016580217 |
Record in the search index
_version_ | 1804137778301108224 |
---|---|
adam_text | Contents
List of Figures xiii
List of Tables xv
List of Algorithms xvii
Preface xix
I Clustering, Data, and Similarity Measures 1
1 Data Clustering 3
1.1 Definition of Data Clustering 3
1.2 The Vocabulary of Clustering 5
1.2.1 Records and Attributes 5
1.2.2 Distances and Similarities 5
1.2.3 Clusters, Centers, and Modes 6
1.2.4 Hard Clustering and Fuzzy Clustering 7
1.2.5 Validity Indices 8
1.3 Clustering Processes 8
1.4 Dealing with Missing Values 10
1.5 Resources for Clustering 12
1.5.1 Surveys and Reviews on Clustering 12
1.5.2 Books on Clustering 12
1.5.3 Journals 13
1.5.4 Conference Proceedings 15
1.5.5 Data Sets 17
1.6 Summary 17
2 Data Types 19
2.1 Categorical Data 19
2.2 Binary Data 21
2.3 Transaction Data 23
2.4 Symbolic Data 23
2.5 Time Series 24
2.6 Summary 24
3 Scale Conversion 25
3.1 Introduction 25
3.1.1 Interval to Ordinal 25
3.1.2 Interval to Nominal 27
3.1.3 Ordinal to Nominal 28
3.1.4 Nominal to Ordinal 28
3.1.5 Ordinal to Interval 29
3.1.6 Other Conversions 29
3.2 Categorization of Numerical Data 30
3.2.1 Direct Categorization 30
3.2.2 Cluster-based Categorization 31
3.2.3 Automatic Categorization 37
3.3 Summary 41
4 Data Standardization and Transformation 43
4.1 Data Standardization 43
4.2 Data Transformation 46
4.2.1 Principal Component Analysis 46
4.2.2 SVD 48
4.2.3 The Karhunen-Loeve Transformation 49
4.3 Summary 51
5 Data Visualization 53
5.1 Sammon's Mapping 53
5.2 MDS 54
5.3 SOM 56
5.4 Class-preserving Projections 59
5.5 Parallel Coordinates 60
5.6 Tree Maps 61
5.7 Categorical Data Visualization 62
5.8 Other Visualization Techniques 65
5.9 Summary 65
6 Similarity and Dissimilarity Measures 67
6.1 Preliminaries 67
6.1.1 Proximity Matrix 68
6.1.2 Proximity Graph 69
6.1.3 Scatter Matrix 69
6.1.4 Covariance Matrix 70
6.2 Measures for Numerical Data 71
6.2.1 Euclidean Distance 71
6.2.2 Manhattan Distance 71
6.2.3 Maximum Distance 72
6.2.4 Minkowski Distance 72
6.2.5 Mahalanobis Distance 72
6.2.6 Average Distance 73
6.2.7 Other Distances 74
6.3 Measures for Categorical Data 74
6.3.1 The Simple Matching Distance 76
6.3.2 Other Matching Coefficients 76
6.4 Measures for Binary Data 77
6.5 Measures for Mixed-type Data 79
6.5.1 A General Similarity Coefficient 79
6.5.2 A General Distance Coefficient 80
6.5.3 A Generalized Minkowski Distance 81
6.6 Measures for Time Series Data 83
6.6.1 The Minkowski Distance 84
6.6.2 Time Series Preprocessing 85
6.6.3 Dynamic Time Warping 87
6.6.4 Measures Based on Longest Common Subsequences 88
6.6.5 Measures Based on Probabilistic Models 90
6.6.6 Measures Based on Landmark Models 91
6.6.7 Evaluation 92
6.7 Other Measures 92
6.7.1 The Cosine Similarity Measure 93
6.7.2 A Link-based Similarity Measure 93
6.7.3 Support 94
6.8 Similarity and Dissimilarity Measures between Clusters 94
6.8.1 The Mean-based Distance 94
6.8.2 The Nearest Neighbor Distance 95
6.8.3 The Farthest Neighbor Distance 95
6.8.4 The Average Neighbor Distance 96
6.8.5 Lance-Williams Formula 96
6.9 Similarity and Dissimilarity between Variables 98
6.9.1 Pearson's Correlation Coefficients 98
6.9.2 Measures Based on the Chi-square Statistic 101
6.9.3 Measures Based on Optimal Class Prediction 103
6.9.4 Group-based Distance 105
6.10 Summary 106
II Clustering Algorithms 107
7 Hierarchical Clustering Techniques 109
7.1 Representations of Hierarchical Clusterings 109
7.1.1 H-tree 110
7.1.2 Dendrogram 110
7.1.3 Banner 112
7.1.4 Pointer Representation 112
7.1.5 Packed Representation 114
7.1.6 Icicle Plot 115
7.1.7 Other Representations 115
7.2 Agglomerative Hierarchical Methods 116
7.2.1 The Single-link Method 118
7.2.2 The Complete Link Method 120
7.2.3 The Group Average Method 122
7.2.4 The Weighted Group Average Method 125
7.2.5 The Centroid Method 126
7.2.6 The Median Method 130
7.2.7 Ward's Method 132
7.2.8 Other Agglomerative Methods 137
7.3 Divisive Hierarchical Methods 137
7.4 Several Hierarchical Algorithms 138
7.4.1 SLINK 138
7.4.2 Single-link Algorithms Based on Minimum Spanning Trees 140
7.4.3 CLINK 141
7.4.4 BIRCH 144
7.4.5 CURE 144
7.4.6 DIANA 145
7.4.7 DISMEA 147
7.4.8 Edwards and Cavalli-Sforza Method 147
7.5 Summary 149
8 Fuzzy Clustering Algorithms 151
8.1 Fuzzy Sets 151
8.2 Fuzzy Relations 153
8.3 Fuzzy k-means 154
8.4 Fuzzy k-modes 156
8.5 The c-means Method 158
8.6 Summary 159
9 Center-based Clustering Algorithms 161
9.1 The k-means Algorithm 161
9.2 Variations of the k-means Algorithm 164
9.2.1 The Continuous k-means Algorithm 165
9.2.2 The Compare-means Algorithm 165
9.2.3 The Sort-means Algorithm 166
9.2.4 Acceleration of the k-means Algorithm with the kd-tree 167
9.2.5 Other Acceleration Methods 168
9.3 The Trimmed k-means Algorithm 169
9.4 The x-means Algorithm 170
9.5 The k-harmonic Means Algorithm 171
9.6 The Mean Shift Algorithm 173
9.7 MEC 175
9.8 The k-modes Algorithm (Huang) 176
9.8.1 Initial Modes Selection 178
9.9 The k-modes Algorithm (Chaturvedi et al.) 178
9.10 The k-probabilities Algorithm 179
9.11 The k-prototypes Algorithm 181
9.12 Summary 182
10 Search-based Clustering Algorithms 183
10.1 Genetic Algorithms 184
10.2 The Tabu Search Method 185
10.3 Variable Neighborhood Search for Clustering 186
10.4 Al-Sultan's Method 187
10.5 Tabu Search-based Categorical Clustering Algorithm 189
10.6 J-means 190
10.7 GKA 192
10.8 The Global k-means Algorithm 195
10.9 The Genetic k-modes Algorithm 195
10.9.1 The Selection Operator 196
10.9.2 The Mutation Operator 196
10.9.3 The k-modes Operator 197
10.10 The Genetic Fuzzy k-modes Algorithm 197
10.10.1 String Representation 198
10.10.2 Initialization Process 198
10.10.3 Selection Process 199
10.10.4 Crossover Process 199
10.10.5 Mutation Process 200
10.10.6 Termination Criterion 200
10.11 SARS 200
10.12 Summary 202
11 Graph-based Clustering Algorithms 203
11.1 Chameleon 203
11.2 CACTUS 204
11.3 A Dynamic System-based Approach 205
11.4 ROCK 207
11.5 Summary 208
12 Grid-based Clustering Algorithms 209
12.1 STING 209
12.2 OptiGrid 210
12.3 GRIDCLUS 212
12.4 GDILC 214
12.5 WaveCluster 216
12.6 Summary 217
13 Density-based Clustering Algorithms 219
13.1 DBSCAN 219
13.2 BRIDGE 221
13.3 DBCLASD 222
13.4 DENCLUE 223
13.5 CUBN 225
13.6 Summary 226
14 Model-based Clustering Algorithms 227
14.1 Introduction 227
14.2 Gaussian Clustering Models 230
14.3 Model-based Agglomerative Hierarchical Clustering 232
14.4 The EM Algorithm 235
14.5 Model-based Clustering 237
14.6 COOLCAT 240
14.7 STUCCO 241
14.8 Summary 242
15 Subspace Clustering 243
15.1 CLIQUE 244
15.2 PROCLUS 246
15.3 ORCLUS 249
15.4 ENCLUS 253
15.5 FINDIT 255
15.6 MAFIA 258
15.7 DOC 259
15.8 CLTree 261
15.9 PART 262
15.10 SUBCAD 264
15.11 Fuzzy Subspace Clustering 270
15.12 Mean Shift for Subspace Clustering 275
15.13 Summary 285
16 Miscellaneous Algorithms 287
16.1 Time Series Clustering Algorithms 287
16.2 Streaming Algorithms 289
16.2.1 LSEARCH 290
16.2.2 Other Streaming Algorithms 293
16.3 Transaction Data Clustering Algorithms 293
16.3.1 LargeItem 294
16.3.2 CLOPE 295
16.3.3 OAK 296
16.4 Summary 297
17 Evaluation of Clustering Algorithms 299
17.1 Introduction 299
17.1.1 Hypothesis Testing 301
17.1.2 External Criteria 302
17.1.3 Internal Criteria 303
17.1.4 Relative Criteria 304
17.2 Evaluation of Partitional Clustering 305
17.2.1 Modified Hubert's Γ Statistic 305
17.2.2 The Davies-Bouldin Index 305
17.2.3 Dunn's Index 307
17.2.4 The SD Validity Index 307
17.2.5 The S_Dbw Validity Index 308
17.2.6 The RMSSTD Index 309
17.2.7 The RS Index 310
17.2.8 The Calinski-Harabasz Index 310
17.2.9 Rand's Index 311
17.2.10 Average of Compactness 312
17.2.11 Distances between Partitions 312
17.3 Evaluation of Hierarchical Clustering 314
17.3.1 Testing Absence of Structure 314
17.3.2 Testing Hierarchical Structures 315
17.4 Validity Indices for Fuzzy Clustering 315
17.4.1 The Partition Coefficient Index 315
17.4.2 The Partition Entropy Index 316
17.4.3 The Fukuyama-Sugeno Index 316
17.4.4 Validity Based on Fuzzy Similarity 317
17.4.5 A Compact and Separate Fuzzy Validity Criterion 318
17.4.6 A Partition Separation Index 319
17.4.7 An Index Based on the Mini-max Filter Concept and Fuzzy Theory 319
17.5 Summary 320
III Applications of Clustering 321
18 Clustering Gene Expression Data 323
18.1 Background 323
18.2 Applications of Gene Expression Data Clustering 324
18.3 Types of Gene Expression Data Clustering 325
18.4 Some Guidelines for Gene Expression Clustering 325
18.5 Similarity Measures for Gene Expression Data 326
18.5.1 Euclidean Distance 326
18.5.2 Pearson's Correlation Coefficient 326
18.6 A Case Study 328
18.6.1 C++ Code 328
18.6.2 Results 334
18.7 Summary 334
IV MATLAB and C++ for Clustering 341
19 Data Clustering in MATLAB 343
19.1 Read and Write Data Files 343
19.2 Handle Categorical Data 347
19.3 M-files, MEX-files, and MAT-files 349
19.3.1 M-files 349
19.3.2 MEX-files 351
19.3.3 MAT-files 354
19.4 Speed up MATLAB 354
19.5 Some Clustering Functions 355
19.5.1 Hierarchical Clustering 355
19.5.2 k-means Clustering 359
19.6 Summary 362
20 Clustering in C/C++ 363
20.1 The STL 363
20.1.1 The vector Class 363
20.1.2 The list Class 364
20.2 C/C++ Program Compilation 366
20.3 Data Structure and Implementation 367
20.3.1 Data Matrices and Centers 367
20.3.2 Clustering Results 368
20.3.3 The Quick Sort Algorithm 369
20.4 Summary 369
A Some Clustering Algorithms 371
B The kd-tree Data Structure 375
C MATLAB Codes 377
C.1 The MATLAB Code for Generating Subspace Clusters 377
C.2 The MATLAB Code for the k-modes Algorithm 379
C.3 The MATLAB Code for the MSSC Algorithm 381
D C++ Codes 385
D.1 The C++ Code for Converting Categorical Values to Integers 385
D.2 The C++ Code for the FSC Algorithm 388
Bibliography 397
Subject Index 443
Author Index 455
|
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Gan, Guojun 1979- Ma, Chaoqun Wu, Jianhong |
author_GND | (DE-588)142575968 |
author_facet | Gan, Guojun 1979- Ma, Chaoqun Wu, Jianhong |
author_role | aut aut aut |
author_sort | Gan, Guojun 1979- |
author_variant | g g gg c m cm j w jw |
building | Verbundindex |
bvnumber | BV023397368 |
callnumber-first | Q - Science |
callnumber-label | QA278 |
callnumber-raw | QA278 |
callnumber-search | QA278 |
callnumber-sort | QA 3278 |
callnumber-subject | QA - Mathematics |
classification_rvk | SK 840 |
classification_tum | MAT 627f |
ctrlnum | (OCoLC)77831225 (DE-599)GBV522828817 |
dewey-full | 519.5/3 |
dewey-hundreds | 500 - Natural sciences and mathematics |
dewey-ones | 519 - Probabilities and applied mathematics |
dewey-raw | 519.5/3 |
dewey-search | 519.5/3 |
dewey-sort | 3519.5 13 |
dewey-tens | 510 - Mathematics |
discipline | Mathematik |
discipline_str_mv | Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02281nam a2200517 cb4500</leader><controlfield tag="001">BV023397368</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20121011 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">080715s2007 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0898716233</subfield><subfield code="9">0-89871-623-3</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780898716238</subfield><subfield code="9">978-0-89871-623-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)77831225</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBV522828817</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield><subfield code="a">DE-91</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-384</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA278</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">519.5/3</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SK 840</subfield><subfield code="0">(DE-625)143261:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">MAT 627f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Gan, Guojun</subfield><subfield code="d">1979-</subfield><subfield 
code="e">Verfasser</subfield><subfield code="0">(DE-588)142575968</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data clustering</subfield><subfield code="b">theory, algorithms, and applications</subfield><subfield code="c">Guojun Gan ; Chaoqun Ma ; Jianhong Wu</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Philadelphia, Pa [u.a.]</subfield><subfield code="b">SIAM [u.a.]</subfield><subfield code="c">2007</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXII, 466 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">ASA-SIAM series on statistics and applied probability</subfield><subfield code="v">20</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturverz. S. 
397 - 441</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Classification automatique (Statistique)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Classification automatique (Statistique) - Informatique</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Datenverarbeitung</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cluster analysis</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cluster analysis</subfield><subfield code="x">Data processing</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Cluster-Analyse</subfield><subfield code="0">(DE-588)4070044-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Cluster-Analyse</subfield><subfield code="0">(DE-588)4070044-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ma, Chaoqun</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wu, Jianhong</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">ASA-SIAM series on statistics and applied probability</subfield><subfield code="v">20</subfield><subfield code="w">(DE-604)BV021491710</subfield><subfield code="9">20</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/enhancements/fy0709/2007061713-d.html</subfield><subfield code="z">Publisher description</subfield><subfield code="z">lizenzfrei</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield 
code="u">http://www.loc.gov/catdir/enhancements/fy0709/2007061713-t.html</subfield><subfield code="z">lizenzfrei</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/enhancements/fy0710/2007061713-b.html</subfield><subfield code="z">Contributor biographical information</subfield><subfield code="z">lizenzfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=016580217&amp;sequence=000002&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-016580217</subfield></datafield></record></collection> |
id | DE-604.BV023397368 |
illustrated | Illustrated |
index_date | 2024-07-02T21:22:25Z |
indexdate | 2024-07-09T21:17:42Z |
institution | BVB |
isbn | 0898716233 9780898716238 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-016580217 |
oclc_num | 77831225 |
open_access_boolean | |
owner | DE-29T DE-91 DE-BY-TUM DE-634 DE-11 DE-384 |
owner_facet | DE-29T DE-91 DE-BY-TUM DE-634 DE-11 DE-384 |
physical | XXII, 466 S. graph. Darst. |
publishDate | 2007 |
publishDateSearch | 2007 |
publishDateSort | 2007 |
publisher | SIAM [u.a.] |
record_format | marc |
series | ASA-SIAM series on statistics and applied probability |
series2 | ASA-SIAM series on statistics and applied probability |
spelling | Gan, Guojun 1979- Verfasser (DE-588)142575968 aut Data clustering theory, algorithms, and applications Guojun Gan ; Chaoqun Ma ; Jianhong Wu Philadelphia, Pa [u.a.] SIAM [u.a.] 2007 XXII, 466 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier ASA-SIAM series on statistics and applied probability 20 Literaturverz. S. 397 - 441 Classification automatique (Statistique) Classification automatique (Statistique) - Informatique Datenverarbeitung Cluster analysis Cluster analysis Data processing Cluster-Analyse (DE-588)4070044-6 gnd rswk-swf Cluster-Analyse (DE-588)4070044-6 s DE-604 Ma, Chaoqun Verfasser aut Wu, Jianhong Verfasser aut ASA-SIAM series on statistics and applied probability 20 (DE-604)BV021491710 20 http://www.loc.gov/catdir/enhancements/fy0709/2007061713-d.html Publisher description lizenzfrei http://www.loc.gov/catdir/enhancements/fy0709/2007061713-t.html lizenzfrei Inhaltsverzeichnis http://www.loc.gov/catdir/enhancements/fy0710/2007061713-b.html Contributor biographical information lizenzfrei HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016580217&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Gan, Guojun 1979- Ma, Chaoqun Wu, Jianhong Data clustering theory, algorithms, and applications ASA-SIAM series on statistics and applied probability Classification automatique (Statistique) Classification automatique (Statistique) - Informatique Datenverarbeitung Cluster analysis Cluster analysis Data processing Cluster-Analyse (DE-588)4070044-6 gnd |
subject_GND | (DE-588)4070044-6 |
title | Data clustering theory, algorithms, and applications |
title_auth | Data clustering theory, algorithms, and applications |
title_exact_search | Data clustering theory, algorithms, and applications |
title_exact_search_txtP | Data clustering theory, algorithms, and applications |
title_full | Data clustering theory, algorithms, and applications Guojun Gan ; Chaoqun Ma ; Jianhong Wu |
title_fullStr | Data clustering theory, algorithms, and applications Guojun Gan ; Chaoqun Ma ; Jianhong Wu |
title_full_unstemmed | Data clustering theory, algorithms, and applications Guojun Gan ; Chaoqun Ma ; Jianhong Wu |
title_short | Data clustering |
title_sort | data clustering theory algorithms and applications |
title_sub | theory, algorithms, and applications |
topic | Classification automatique (Statistique) Classification automatique (Statistique) - Informatique Datenverarbeitung Cluster analysis Cluster analysis Data processing Cluster-Analyse (DE-588)4070044-6 gnd |
topic_facet | Classification automatique (Statistique) Classification automatique (Statistique) - Informatique Datenverarbeitung Cluster analysis Cluster analysis Data processing Cluster-Analyse |
url | http://www.loc.gov/catdir/enhancements/fy0709/2007061713-d.html http://www.loc.gov/catdir/enhancements/fy0709/2007061713-t.html http://www.loc.gov/catdir/enhancements/fy0710/2007061713-b.html http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016580217&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV021491710 |
work_keys_str_mv | AT ganguojun dataclusteringtheoryalgorithmsandapplications AT machaoqun dataclusteringtheoryalgorithmsandapplications AT wujianhong dataclusteringtheoryalgorithmsandapplications |
No print copy is available.
Table of contents