Data clustering: algorithms and applications
Format: | Book |
---|---|
Language: | English |
Published: |
Boca Raton [et al.]
CRC Press
2014
|
Series: | Chapman & Hall, CRC data mining and knowledge discovery series
|
Subjects: | |
Online access: | Table of contents |
Description: | Includes bibliographical references and index |
Physical description: | XXVI, 622 pp., ill., diagrams, 26 cm |
ISBN: | 9781466558212 |
Internal format
MARC
LEADER | 00000nam a22000002c 4500 | ||
---|---|---|---|
001 | BV041209676 | ||
003 | DE-604 | ||
005 | 20140218 | ||
007 | t | ||
008 | 130808s2014 ad|| |||| 00||| eng d | ||
020 | |a 9781466558212 |c hbk |9 978-1-4665-5821-2 | ||
035 | |a (OCoLC)859376232 | ||
035 | |a (DE-599)BSZ391828436 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-473 |a DE-703 |a DE-20 |a DE-2070s |a DE-11 |a DE-523 |a DE-634 | ||
082 | 0 | |a 519.535 | |
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
245 | 1 | 0 | |a Data clustering |b algorithms and applications |c ed. by Charu C. Aggarwal ... |
264 | 1 | |a Boca Raton [u.a.] |b CRC Press |c 2014 | |
300 | |a XXVI, 622 S. |b Ill., graph. Darst. |c 26 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Chapman & Hall, CRC data mining and knowledge discovery series | |
500 | |a Includes bibliographical references and index | ||
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Cluster-Analyse |0 (DE-588)4070044-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Algorithmus |0 (DE-588)4001183-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | 1 | |a Cluster-Analyse |0 (DE-588)4070044-6 |D s |
689 | 0 | 2 | |a Algorithmus |0 (DE-588)4001183-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Aggarwal, Charu C. |d 1970- |e Sonstige |0 (DE-588)133500101 |4 oth | |
856 | 4 | 2 | |m Digitalisierung UB Bamberg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=026184401&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-026184401 |
Record in the search index
_version_ | 1804150640870424576 |
---|---|
adam_text | Contents
Preface xxi
Editor Biographies xxiii
Contributors xxv
1
An Introduction to Cluster Analysis
1
Charu C. Aggarwal
1.1 Introduction
..................................... 2
1.2
Common Techniques Used in Cluster Analysis
.................. 3
1.2.1
Feature Selection Methods
......................... 4
1.2.2
Probabilistic and Generative Models
................... 4
1.2.3
Distance Based Algorithms
........................ 5
1.2.4
Density- and Grid-Based Methods
..................... 7
1.2.5
Leveraging Dimensionality Reduction Methods
............. 8
1.2.5.1
Generative Models for Dimensionality Reduction
....... 8
1.2.5.2
Matrix Factorization and Co-Clustering
............ 8
1.2.5.3
Spectral Methods
........................ 10
1.2.6
The High Dimensional Scenario
...................... 11
1.2.7
Scalable Techniques for Cluster Analysis
................. 13
1.2.7.1
I/O Issues in Database Management
.............. 13
1.2.7.2
Streaming Algorithms
..................... 14
1.2.7.3
The Big Data Framework
.................... 14
1.3
Data Types Studied in Cluster Analysis
...................... 15
1.3.1
Clustering Categorical Data
........................ 15
1.3.2
Clustering Text Data
............................ 16
1.3.3
Clustering Multimedia Data
........................ 16
1.3.4
Clustering Time Series Data
........................ 17
1.3.5
Clustering Discrete Sequences
....................... 17
1.3.6
Clustering Network Data
......................... 18
1.3.7
Clustering Uncertain Data
......................... 19
1.4
Insights Gained from Different Variations of Cluster Analysis
........... 19
1.4.1
Visual Insights
............................... 20
1.4.2
Supervised Insights
............................ 20
1.4.3
Multiview and Ensemble-Based Insights
................. 21
1.4.4
Validation-Based Insights
......................... 21
1.5
Discussion and Conclusions
............................ 22
2
Feature Selection for Clustering: A Review
Salem Alelyani, Jiliang Tang, and Huan Liu
2.1
Introduction
2.1.1
Data Clustering
........
2.1.2
Feature Selection
........
2.1.3
Feature Selection for Clustering
2.1.3.1
Filter Model
.....
2.1.3.2
Wrapper Model
. . .
2.1.3.3
Hybrid Model
....
2.2
Feature Selection for Clustering
....
2.2.1
Algorithms for Generic Data
2.2.1.1
Spectral Feature Selection (SPEC)
...............
2.2.1.2
Laplacian Score (LS)
......................
2.2.1.3
Feature Selection for Sparse Clustering
............
2.2.1.4
Localized Feature Selection Based on Scatter Separability (LFSBSS)
............................
2.2.1.5
Multicluster Feature Selection (MCFS)
............
2.2.1.6
Feature Weighting k-Means
...................
2.2.2
Algorithms for Text Data
.........................
2.2.2.1
Term Frequency (TF)
......................
2.2.2.2
Inverse Document Frequency (IDF)
..............
2.2.2.3
Term Frequency-Inverse Document Frequency (TF-IDF)
. . .
2.2.2.4
Chi Square Statistic
.......................
2.2.2.5
Frequent Term-Based Text Clustering
.............
2.2.2.6
Frequent Term Sequence
....................
2.2.3
Algorithms for Streaming Data
......................
2.2.3.1
Text Stream Clustering Based on Adaptive Feature Selection
(TSC-AFS)
...........................
2.2.3.2
High-Dimensional Projected Stream Clustering (HPStream)
.
2.2.4
Algorithms for Linked Data
........................
2.2.4.1
Challenges and Opportunities
..................
2.2.4.2
LUFS: An Unsupervised Feature Selection Framework for
Linked Data
...........................
2.2.4.3
Conclusion and Future Work for Linked Data
.........
2.3
Discussions and Challenges
.............................
2.3.1
The Chicken or the Egg Dilemma
.....................
2.3.2
Model Selection: K and l
.........................
2.3.3
Scalability
.................................
2.3.4
Stability
..................................
3
Probabilistic Models for Clustering
Hongbo Deng and Jiawei Han
3.1
Introduction
.............
3.2
Mixture Models
...........
3.2.1
Overview
..........
3.2.2
Gaussian Mixture Model
.
3.2.3
Bernoulli Mixture Model
.
3.2.4
Model Selection Criteria
.
3.3
EM Algorithm and Its Variations
3.3.1
The General EM Algorithm
3.3.2
Mixture Models Revisited
.
29
30
32
32
. .
34
35
35
35
Mi
Mi
Њ
37
38
39
40
41
41
42
42
42
44
45
47
47
48
50
50
51
52
53
53
54
54
55
61
61
62
62
64
67
68
69
69
73
3.3.3
Limitations of the EM Algorithm
..................... 75
3.3.4
Applications of the EM Algorithm
.................... 76
3.4
Probabilistic Topic Models
............................. 76
3.4.1
Probabilistic Latent Semantic Analysis
.................. 77
3.4.2
Latent Dirichlet Allocation
........................ 79
3.4.3
Variations and Extensions
......................... 81
3.5
Conclusions and Summary
............................. 81
4
A Survey of Partitional and Hierarchical Clustering Algorithms
87
Chandan K. Reddy and Bhanukiran Vinzamuri
4.1
Introduction
..................................... 88
4.2
Partitional Clustering Algorithms
.......................... 89
4.2.1
K-Means Clustering
............................ 89
4.2.2
Minimization of Sum of Squared Errors
.................. 90
4.2.3
Factors Affecting K-Means
........................ 91
4.2.3.1
Popular Initialization Methods
................. 91
4.2.3.2
Estimating the Number of Clusters
............... 92
4.2.4
Variations of K-Means
........................... 93
4.2.4.1
K-Medoids Clustering
..................... 93
4.2.4.2
K-Medians Clustering
..................... 94
4.2.4.3
K-Modes Clustering
...................... 94
4.2.4.4
Fuzzy K-Means Clustering
................... 95
4.2.4.5
X-Means Clustering
....................... 95
4.2.4.6
Intelligent K-Means Clustering
................. 96
4.2.4.7
Bisecting K-Means Clustering
................. 97
4.2.4.8
Kernel K-Means Clustering
................... 97
4.2.4.9
Mean Shift Clustering
...................... 98
4.2.4.10
Weighted K-Means Clustering
................. 98
4.2.4.11
Genetic K-Means Clustering
.................. 99
4.2.5
Making K-Means Faster
.......................... 100
4.3
Hierarchical Clustering Algorithms
......................... 100
4.3.1
Agglomerative Clustering
......................... 101
4.3.1.1
Single and Complete Link
................... 101
4.3.1.2
Group Averaged and Centroid Agglomerative Clustering
... 102
4.3.1.3
Ward's Criterion
........................ 103
4.3.1.4
Agglomerative Hierarchical Clustering Algorithm
....... 103
4.3.1.5
Lance-Williams Dissimilarity Update Formula
........ 103
4.3.2
Divisive Clustering
............................. 104
4.3.2.1
Issues in Divisive Clustering
.................. 104
4.3.2.2
Divisive Hierarchical Clustering Algorithm
.......... 105
4.3.2.3
Minimum Spanning Tree-Based Clustering
.......... 105
4.3.3
Other Hierarchical Clustering Algorithms
................. 106
4.4
Discussion and Summary
..............................
106
5
Density-Based Clustering
Ш
Martin Ester
5.1
Introduction
.....................................
5.2
DBSCAN
......................................
m
5.3
DENCLUE
.....................................
115
5.4
OPTICS
.......................................
П6
5.5
Other Algorithms
..................................
и6
5.6
Subspace Clustering
5.7
Clustering Networks
5.8
Other Directions
......
ПК
.........................
...................................
5.9
Conclusion
.....................................
127
6
Grid-Based Clustering
Wei Cheng, Wei Wang, and Sandra Batista
6.1
Introduction
.....................................
6.2
The Classical Algorithms
..............................
6.2.1
Earliest Approaches: GRIDCLUS and BANG
..............
Ш
6.2.2
STING and STING+: The Statistical Information Grid Approach
....
U2
6.2.3
WaveCluster: Wavelets in Grid-Based Clustering
.............
LU
6.3
Adaptive Grid-Based Algorithms
..........................
^
6.3.1
AMR: Adaptive Mesh Refinement Clustering
............... 1·^
6.4
Axis-Shifting Grid-Based Algorithms
.......................
I Ui
6.4.1
NSGC: New Shifting Grid Clustering Algorithm
............. 1-М)
6.4.2
ADCC: Adaptable Deflect and Conquer Clustering
............
137
6.4.3
ASGC: Axis-Shifted Grid-Clustering
...................
137
6.4.4
GDILC: Grid-Based Density-IsoLine Clustering Algorithm
....... 138
6.5
High-Dimensional Algorithms
...........................
l-W
6.5.1
CLIQUE: The Classical High-Dimensional Algorithm
..........
LW
6.5.2
Variants of CLIQUE
............................ 140
6.5.2.1
ENCLUS: Entropy-Based Approach
.............. 140
6.5.2.2
MAFIA: Adaptive Grids in High Dimensions
......... 141
6.5.3
OptiGrid: Density-Based Optimal Grid Partitioning
........... 141
6.5.4
Variants of the OptiGrid Approach
.................... 143
6.5.4.1
O-Cluster: A Scalable Approach
................ 143
6.5.4.2
CBF: Cell-Based Filtering
................... 144
6.6
Conclusions and Summary
............................. 145
7
Nonnegative
Matrix Factorizations for Clustering: A Survey
149
Tao Li and Chris Ding
7.1
Introduction
..................................... 150
7.1.1
Background
................................ 150
7.1.2
NMF Formulations
............................. 151
7.2
NMF for Clustering: Theoretical Foundations
................... 151
7.2.1
NMF and K-Means Clustering
....................... 151
7.2.2
NMF and Probabilistic Latent Semantic Indexing
............. 152
7.2.3
NMF and Kernel K-Means and Spectral Clustering
............ 152
7.2.4
NMF Boundedness Theorem
....................... 153
7.3
NMF Clustering Capabilities
............................ 153
7.3.1
Examples
.......................
153
7.3.2
Analysis
.......... 15}
7.4
NMF Algorithms
................................. . 155
7.4.1
Introduction
.......................
155
7.4.2
Algorithm Development
..........................
155
7.4.3
Practical Issues in NMF Algorithms
.................... 156
7.4.3.1
Initialization
.................... 156
7.4.3.2
Stopping Criteria
........................ 156
7.4.3.3
Objective Function vs. Clustering Performance
.. 157
7.4.3.4
Scalability
.....................
157
7.5
NMF
Related Factorizations
........................... 158
7.6
NMF
for Clustering: Extensions
.......................... 161
7.6.1
Co-Clustering
............................... 161
7.6.2
Semisupervised Clustering
........................ 162
7.6.3
Semisupervised Co-Clustering
...................... 162
7.6.4
Consensus Clustering
........................... 163
7.6.5
Graph
Clustering
.............................. 164
7.6.6
Other Clustering Extensions
........................ 164
7.7
Conclusions
..................................... 165
8
Spectral Clustering
177
Jialu Liu and Jiawei Han
8.1
Introduction
..................................... 177
8.2
Similarity
Graph
.................................. 179
8.3
Unnormalized Spectral Clustering
......................... 180
8.3.1
Notation
.................................. 180
8.3.2
Unnormalized Graph Laplacian
...................... 180
8.3.3
Spectrum Analysis
............................. 181
8.3.4
Unnormalized Spectral Clustering Algorithm
............... 182
8.4
Normalized Spectral Clustering
........................... 182
8.4.1
Normalized Graph Laplacian
....................... 183
8.4.2
Spectrum Analysis
............................. 184
8.4.3
Normalized Spectral Clustering Algorithm
................ 184
8.5
Graph Cut View
................................... 185
8.5.1
Ratio Cut Relaxation
............................ 186
8.5.2
Normalized Cut Relaxation
........................ 187
8.6
Random Walks View
................................ 188
8.7
Connection to Laplacian
Eigenmap......................... 189
8.8
Connection to Kernel K-Means and
Nonnegative
Matrix Factorization
...... 191
8.9
Large Scale Spectral Clustering
........................... 192
8.10
Further Reading
................................... 194
9
Clustering High-Dimensional Data
201
Arthur Zimek
9.1
Introduction
..................................... 201
9.2
The Curse of Dimensionality
........................... 202
9.2.1
Different Aspects of the Curse
..................... 202
9.2.2
Consequences
............................... 206
9.3
Clustering Tasks in Subspaces of High-Dimensional Data
............. 206
9.3.1
Categories of Subspaces
.......................... 206
9.3.1.1
Axis-Parallel Subspaces
.................... 206
9.3.1.2
Arbitrarily Oriented Subspaces
................. 207
9.3.1.3
Special Cases
.......................... 207
9.3.2
Search Spaces for the Clustering Problem
................. 207
9.4
Fundamental Algorithmic Ideas
.......................... 208
9.4.1
Clustering in Axis-Parallel Subspaces
................... 208
9.4.1.1
Cluster Model
.......................... 208
9.4.1.2
Basic Techniques
........................ 208
9.4.1.3
Clustering Algorithms
..................... 210
9.4.2
Clustering in Arbitrarily Oriented Subspaces
............... 215
9.4.2.1
Cluster Model
.......................... 215
9.4.2.2
Basic Techniques and Example Algorithms
.......... 216
9.5
Open Questions and Current Research Directions
................. -
9.6
Conclusion
.....................................
10
A Survey of Stream Clustering Algorithms
Charu C. Aggarwal
10.1
Introduction
.....................................
~
10.2
Methods Based on Partitioning Representatives
.................. -■ ·
10.2.1
The STREAM Algorithm
......................... 233
10.2.2
CluStream: The Microclustering Framework
...............
23~>
10.2.2.1
Microcluster Definition
.....................
—Ъ
10.2.2.2
Pyramidal Time Frame
..................... 236
10.2.2.3
Online Clustering with CluStream
............... 237
10.3
Density-Based Stream Clustering
..........................
23(>
10.3.1
DenStream: Density-Based Microclustering
............... 240
10.3.2
Grid-Based Streaming Algorithms
.................... 241
10.3.2.1
D-Stream Algorithm
...................... 241
10.3.2.2
Other Grid-Based Algorithms
................. 242
10.4
Probabilistic Streaming Algorithms
........................ 243
10.5
Clustering High-Dimensional Streams
....................... 243
10.5.1
The HPSTREAM Method
......................... 244
10.5.2
Other High-Dimensional Streaming Algorithms
............. 244
10.6
Clustering Discrete and Categorical Streams
.................... 245
10.6.1
Clustering Binary Data Streams with k-Means
.............. 245
10.6.2
The StreamCluCD Algorithm
....................... 245
10.6.3
Massive-Domain Clustering
........................ 246
10.7
Text Stream Clustering
............................... 249
10.8
Other Scenarios for Stream Clustering
....................... 252
10.8.1
Clustering Uncertain Data Streams
.................... 253
10.8.2
Clustering Graph Streams
......................... 253
10.8.3
Distributed Clustering of Data Streams
.................. 254
10.9
Discussion and Conclusions
............................ 254
11
Big Data Clustering
259
Hanghang
Tong
and
U Kang
11.1
Introduction
........................ 259
11.2
One-Pass Clustering Algorithms
.......................... 260
11.2.1
CLARANS: Fighting with Exponential Search Space
.......... 260
11.2.2
BIRCH: Fighting with Limited Memory
................. 261
11.2.3
CURE: Fighting with the Irregular Clusters
................ 263
11.3
Randomized Techniques for Clustering Algorithms
................ 263
11.3.1
Locality-Preserving Projection
........... 264
11.3.2
Global Projection
................
-^
11.4
Parallel and Distributed Clustering Algorithms
........ . .......... 268
11.4.1
General Framework
...............
11.4.2
DBDC: Density-Based Clustering
. .
11.4.3
ParMETIS: Graph Partitioning
.......... . . . . . . . . . . . 269
11.4.4
PKMeans: K-Means with MapReduce
........... .. . 270
11.4.5
DisCo: Co-Clustering with MapReduce
............ 271
11.4.6
BoW: Subspace Clustering with MapReduce
.........
11.5
Conclusion
...........
12
Clustering
Categorical
Data
277
Bill Andreopoulos
12.1
Introduction
................... - 278
12.2
Goals of Categorical Clustering
........................... 279
12.2.1
Clustering
Road Map
........................... 280
12.3
Similarity Measures for Categorical Data
.....................
282
12.3.1
The Hamming Distance in Categorical and Binary Data
......... 282
12.3.2
Probabilistic Measures
........................... 283
12.3.3
Information-Theoretic Measures
..................... 283
12.3.4
Context-Based Similarity Measures
.................... 284
12.4
Descriptions of Algorithms
............................. 284
12.4.1
Partition-Based Clustering
......................... 284
12.4.1.1
K-Modes
.............................
284
12.4.1.2
K-Prototypes
(Mixed Categorical and Numerical)
....... 285
12.4.1.3
Fuzzy K-Modes
......................... 286
12.4.1.4
Squeezer
............................ 286
12.4.1.5
COOLCAT
........................... 286
12.4.2
Hierarchical Clustering
.......................... 287
12.4.2.1
ROCK
.............................. 287
12.4.2.2
COBWEB
............................ 288
12.4.2.3
LIMBO
............................. 289
12.4.3
Density-Based Clustering
......................... 289
12.4.3.1
Projected (Subspace) Clustering
................ 290
12.4.3.2
CACTUS
............................ 290
12.4.3.3
CLICKS
............................. 291
12.4.3.4
STIRR
.............................. 291
12.4.3.5
CLOPE
............................. 292
12.4.3.6
HIERDENC: Hierarchical Density-Based Clustering
..... 292
12.4.3.7
MULIC:
Multiple Layer Incremental Clustering
........ 293
12.4.4
Model-Based Clustering
.......................... 296
12.4.4.1
BILCOM Empirical Bayesian (Mixed Categorical and Numerical)
............................... 296
12.4.4.2
AutoClass (Mixed Categorical and Numerical)
........ 296
12.4.4.3
SVM
Clustering (Mixed Categorical and Numerical)
..... 297
12.5
Conclusion
..................................... 298
13
Document Clustering: The Next Frontier
305
David C. Anastasiu, Andrea Tagarelli, and George Karypis
13.1
Introduction
..................................... 306
13.2 Modeling a
Document
............................... 306
13.2.1
Preliminaries
................................ 306
13.2.2
The Vector Space Model
.......................... 307
13.2.3
Alternate Document Models
........................ 309
13.2.4
Dimensionality Reduction for Text
.................... 309
13.2.5
Characterizing Extremes
.......................... 310
13.3
General Purpose Document Clustering
....................... 311
13.3.1
Similarity/Dissimilarity-Based Algorithms
................ 311
13.3.2
Density-Based Algorithms
......................... 312
13.3.3
Adjacency-Based Algorithms
....................... 313
13.3.4
Generative Algorithms
........................... 313
13.4
Clustering Long Documents
............................ 315
λ 1
5
13.4.1
Document Segmentation
.......................
13.4.2
Clustering Segmented Documents
......................
13.4.3
Simultaneous Segment Identification and Clustering
............-
13.5
Clustering Short Documents
.............................
13.5.1
General Methods for Short Document Clustering
..............
ł-
13.5.2
Clustering with Knowledge Infusion
....................-_
13.5.3
Clustering Web Snippets
...........................
~*
13.5.4
Clustering Microblogs
............................-
13.6
Conclusion
......................................
~
11(1
14
Clustering Multimedia Data
Shen-Fu Tsai, Guo-Jun Qi, Shiyu Chang, Min-Hsuan Tsai, and Thomas S. Huang
14.1
Introduction
......................................
U0
14.2
Clustering with Image Data
..............................
i4()
14.2.1
Visual Words Learning
............................
^
14.2.2
Face Clustering and Annotation
.......................
u-
14.2.3
Photo Album Event Recognition
......................
^
14.2.4
Image Segmentation
.............................
w
14.2.5
Large-Scale Image Classification
..................... 345
14.3
Clustering with Video and Audio Data
....................... 347
14.3.1
Video Summarization
............................ 348
14.3.2
Video Event Detection
........................... 349
14.3.3
Video Story Clustering
........................... 350
14.3.4
Music Summarization
........................... 350
14.4
Clustering with
Multimodal Data.......................... 351
14.5
Summary and Future Directions
.......................... 353
15
Time-Series Data Clustering
357
Dimitrios Kotsakos,
Goce
Trajcevski, Dimitrios Gunopulos, and Charu
С.
Aggarwal
15.1
Introduction
..................................... 358
15.2
The Diverse Formulations for Time-Series Clustering
............... 359
15.3
Online Correlation-Based Clustering
........................ 360
15.3.1
Selective Muscles and Related Methods
.................. 361
15.3.2
Sensor Selection Algorithms for Correlation Clustering
......... 362
15.4
Similarity and Distance Measures
......................... 363
15.4.1
Univariate Distance Measures
....................... 363
15.4.1.1
Lp Distance
...........................
363
15.4.1.2
Dynamic Time Warping Distance
............... 364
15.4.1.3
EDIT Distance
.........................
M-,5
15.4.1.4
Longest Common Subsequence
................ 365
15.4.2
Multivariate Distance Measures
...............
366
15.4.2.1
Multidimensional Lp Distance
................. 366
15.4.2.2
Multidimensional DTW
............... 367
15.4.2.3
Multidimensional LCSS
..............
368
15.4.2.4
Multidimensional Edit Distance
................
368
15.4.2.5
Multidimensional Subsequence Matching
. 368
15.5
Shape-Based Time-Series Clustering Techniques
.......... ...... 369
15.5.1
k-Means Clustering
.............. ..........
15.5.2
Hierarchical Clustering
........... ............
^7]
15.5.3
Density-Based Clustering
...........].............
ηΊ
15.5.4
Trajectory Clustering
........................... 372
15.6
Time-Series Clustering Applications
........................ 374
15.7
Conclusions
..................... 375
16
Clustering Biological Data
381
Chandan K. Reddy, Mohammad Al Hasan, and Mohammed J. Zaki
16.1
Introduction
..................................... 382
16.2
Clustering Microarray Data
............................. 383
16.2.1
Proximity Measures
............................ 383
16.2.2
Categorization of Algorithms
....................... 384
16.2.3
Standard Clustering Algorithms
...................... 385
16.2.3.1
Hierarchical Clustering
..................... 385
16.2.3.2
Probabilistic Clustering
..................... 386
16.2.3.3
Graph-Theoretic Clustering
................... 386
16.2.3.4
Self-Organizing Maps
...................... 387
16.2.3.5
Other Clustering Methods
................... 387
16.2.4
Biclustering
................................ 388
16.2.4.1
Types and Structures of Biclusters
............... 389
16.2.4.2
Biclustering Algorithms
.................... 390
16.2.4.3
Recent Developments
...................... 391
16.2.5
Triclustering
................................ 391
16.2.6
Time-Series Gene Expression Data Clustering
.............. 392
16.2.7
Cluster Validation
............................. 393
16.3
Clustering Biological Networks
.......................... 394
16.3.1
Characteristics of
PPI
Network Data
................... 394
16.3.2
Network Clustering Algorithms
...................... 394
16.3.2.1
Molecular Complex Detection
................. 394
16.3.2.2
Markov Clustering
....................... 395
16.3.2.3
Neighborhood Search Methods
................. 395
16.3.2.4
Clique Percolation Method
................... 395
16.3.2.5
Ensemble
Clustering
...................... 396
16.3.2.6
Other Clustering Methods
................... 396
16.3.3
Cluster Validation and Challenges
..................... 397
16.4
Biological Sequence Clustering
........................... 397
16.4.1
Sequence Similarity Metrics
........................ 397
16.4.1.1
Alignment-Based Similarity
.................. 398
16.4.1.2
Keyword-Based Similarity
................... 398
16.4.1.3
Kernel-Based Similarity
.................... 399
16.4.1.4
Model-Based Similarity
..................... 399
16.4.2
Sequence Clustering Algorithms
..................... 399
16.4.2.1
Subsequence-Based Clustering
................. 399
16.4.2.2
Graph-Based Clustering
.................... 400
16.4.2.3
Probabilistic Models
...................... 402
16.4.2.4
Suffix Tree and Suffix Array-Based Method
.......... 403
16.5
Software Packages
.................................
403
16.6
Discussion and Summary
..............................
405
415
17
Network Clustering
Srinivasan Parthasarathy and
S M
Faisal
17.1
Introduction
.....................................
17.2
Background and Nomenclature
...........................
17.3
Problem Definition
.................................
17.4
Common Evaluation Criteria
............................
17.5
Partitioning with Geometric Information
......................
17.5.1
Coordinate Bisection
............................
4|l)
17.5.2
Inertial
Bisection
..............................
4^}
17.5.3
Geometric Partitioning
...........................
17.6
Graph Growing and Greedy Algorithms
......................
421
17.6.1
Kernighan-Lin Algorithm
......................... ■*--
17.7
Agglomerative and Divisive Clustering
.......................
4-
17.8
Spectral Clustering
.................................
424
17.8.1
Similarity Graphs
.............................
4-s
17.8.2
Types of Similarity Graphs
........................-*->
17.8.3
Graph Laplacians
.............................
42ft
17.8.3.1
Unnormalized Graph Laplacian
................ 426
17.8.3.2
Normalized Graph Laplacians
................. 427
17.8.4
Spectral Clustering Algorithms
...................... 427
17.9
Markov Clustering
................................. 428
17.9.1
Regularized MCL (RMCL): Improvement over MCL
........... 429
17.10
Multilevel Partitioning
............................... 430
17.11
Local Partitioning Algorithms
...........................432
17.12
Hypergraph Partitioning
..............................433
17.13
Emerging Methods for Partitioning Special Graphs
................435
17.13.1
Bipartite Graphs
.............................. 435
17.13.2
Dynamic Graphs
.............................. 436
17.13.3
Heterogeneous Networks
......................... 437
17.13.4
Directed Networks
............................. 438
17.13.5
Combining Content and Relationship Information
............ 439
17.13.6
Networks with Overlapping Communities
................ 440
17.13.7
Probabilistic Methods
................ 442
17.14
Conclusion
...................... 44 t
18
A Survey of Uncertain Data Clustering Algorithms
457
Charu C. Aggarwal
18.1
Introduction
...................
,^7
18.2
Mixture Model Clustering of Uncertain Data
........... 459
18.3
Density-Based Clustering Algorithms
. .............
46()
18.3.1
FDBSCAN Algorithm
. ....................
ш)
18.3.2
FOPTICS Algorithm
.........
.................
46,
18.4
Partitional Clustering Algorithms
. .................
Д
18.4.1
The UK-Means Algorithm
.......................
Jí
18.4.2
The CK-Means Algorithm
.... . .................... 4^
18.4.3
Clustering Uncertain Data with Voronoi Diagrams
............ 464
18.4.4
Approximation Algorithms for Clustering Uncertain Data
........ 464
18.4.5
Speeding Up Distance Computations
..... ,,
s
18.5
Clustering Uncertain Data Streams
................
18.5.1
The UMicro Algorithm
........................
18.5.2
The LuMicro Algorithm
....................... **
18.5.3
Enhancements to Stream Clustering
.................... 471
18.6
Clustering Uncertain Data in High Dimensionality
................. 472
18.6.1
Subspace Clustering of Uncertain Data
.................. 473
18.6.2
UPStream: Projected Clustering of Uncertain Data Streams
....... 474
18.7
Clustering with the Possible Worlds Model
.................... 477
18.8
Clustering Uncertain Graphs
...................... 47g
18.9
Conclusions and Summary
.......................... 47g
19
Concepts of Visual and Interactive Clustering
483
Alexander Hinneburg
19.1
Introduction
.....................................
483
19.2
Direct Visual and Interactive Clustering
......................
484
19.2.1
Scatterplots
.................................
485
19.2.2
Parallel Coordinates
............................ 488
19.2.3
Discussion
.................................
491
19.3
Visual Interactive Steering of Clustering
...................... 491
19.3.1
Visual Assessment of Convergence of Clustering Algorithm
....... 491
19.3.2
Interactive Hierarchical Clustering
.................... 492
19.3.3
Visual Clustering with SOMs
....................... 494
19.3.4
Discussion
................................. 494
19.4
Interactive Comparison and Combination of Clusterings
.............. 495
19.4.1
Space of Clusterings
............................ 495
19.4.2
Visualization
................................ 497
19.4.3
Discussion
................................. 497
19.5 Visualization of Clusters for Sense-Making
.................... 497
19.6
Summary
...................................... 500
20
Semisupervised Clustering
505
Amrudin Agovic and Arindam Banerjee
20.1
Introduction
..................................... 506
20.2
Clustering with Pointwise and Pairwise Semisupervision
............. 507
20.2.1
Semisupervised Clustering Based on Seeding
............... 507
20.2.2
Semisupervised Clustering Based on Pairwise Constraints
........ 508
20.2.3
Active Learning for Semisupervised Clustering
.............. 511
20.2.4
Semisupervised Clustering Based on User Feedback
........... 512
20.2.5
Semisupervised Clustering Based on
Nonnegative
Matrix Factorization
. 513
20.3
Semisupervised Graph Cuts
............................. 513
20.3.1
Semisupervised Unnormalized Cut
.................... 515
20.3.2
Semisupervised Ratio Cut
......................... 515
20.3.3
Semisupervised Normalized Cut
...................... 516
20.4
A Unified View of Label Propagation
....................... 517
20.4.1
Generalized Label Propagation
...................... 517
20.4.2
Gaussian Fields
.............................. 517
20.4.3
Tikhonov Regularization (TIKREG)
................... 518
20.4.4
Local and Global Consistency
....................... 518
20.4.5
Related Methods
..............................
519
20.4.5.1
Cluster Kernels
......................... 519
20.4.5.2
Gaussian Random Walks EM (GWEM)
............ 519
20.4.5.3
Linear Neighborhood Propagation
............... 520
20.4.6
Label Propagation and Green's Function
................. 521
20.4.7
Label Propagation and Semisupervised Graph Cuts
............ 521
20.5
Semisupervised Embedding
......................... 521
20.5.1
Nonlinear Manifold Embedding
......................
20.5.2
Semisupervised Embedding
........................
20.5.2.1
Unconstrained Semisupervised Embedding
..........
20.5.2.2
Constrained Semisupervised Embedding
............
20.6
Comparative Experimental Analysis
........................
20.6.1
Experimental Results
...........................
20.6.2
Semisupervised Embedding Methods
...................
20.7
Conclusions
.......................................
21
Alternative Clustering Analysis: A Review
535
James Bailey
21.1
Introduction
.....................................
21.2
Technical Preliminaries
...............................
537
21.3
Multiple Clustering Analysis Using Alternative Clusterings
............
21.3.1
Alternative Clustering Algorithms: A Taxonomy
.............
21.3.2
Unguided Generation
...........................
21.3.2.1
Naive
..............................
21.3.2.2
Meta Clustering
.........................
21.3.2.3
Eigenvectors of the Laplacian Matrix
.............. 540
21.3.2.4
Decorrelated k-Means and Convolutional EM
......... 540
21.3.2.5
CAMI
.............................. 540
21.3.3
Guided Generation with Constraints
.................... 541
21.3.3.1
COALA
............................. 541
21.3.3.2
Constrained Optimization Approach
.............. 541
21.3.3.3
MAXIMUS
........................... 542
21.3.4
Orthogonal Transformation Approaches
................. 543
21.3.4.1
Orthogonal Views
........................ 543
21.3.4.2
ADFT
.............................. 543
21.3.5
Information Theoretic
........................... 544
21.3.5.1
Conditional Information Bottleneck (CIB)
........... 544
21.3.5.2
Conditional Ensemble Clustering
................ 544
21.3.5.3
NACI
.............................. 544
21.3.5.4
mSC
............................... 545
21.4
Connections to Multiview Clustering and Subspace Clustering
.......... 545
21.5
Future Research Issues
.................. 547
21.6
Summary
...................... 547
22
Cluster Ensembles: Theory and Applications
551
Joydeep Ghosh and Ayan Acharya
22.1
Introduction
.............. 551
22.2
The Cluster Ensemble Problem
.................. 554
22.3
Measuring Similarity Between Clustering Solutions
......... 555
22.4
Cluster Ensemble Algorithms
...........................
558
22.4.1
Probabilistic Approaches to Cluster Ensembles
.............. 558
22.4.1.1
A Mixture Model for Cluster Ensembles (MMCE)
...... 558
22.4.1.2
Bayesian Cluster Ensembles (BCE)
.............. 558
22.4.1.3
Nonparametric Bayesian Cluster Ensembles (NPBCE)
.... 559
22.4.2
Pairwise Similarity-Based Approaches
.................. 560
22.4.2.1
Methods Based on Ensemble Co-Association Matrix
..... 560
22.4.2.2
Relating Consensus Clustering to Other Optimization Formulations
........................ 562
22.4.3
Direct Approaches Using Cluster Labels
................. 562
22.4.3.1
Graph Partitioning
....................... 562
22.4.3.2
Cumulative Voting
....................... 563
22.5
Applications of Consensus Clustering
....................... 564
22.5.1
Gene Expression Data Analysis
...................... 564
22.5.2
Image Segmentation
............................ 564
22.6
Concluding Remarks
................................ 566
23
Clustering Validation Measures
571
Hui Xiong and Zhongmou Li
23.1
Introduction
..................................... 572
23.2
External Clustering Validation Measures
...................... 573
23.2.1
An Overview of External Clustering Validation Measures
........ 574
23.2.2
Defective Validation Measures
...................... 575
23.2.2.1
K-Means: The Uniform Effect
................. 575
23.2.2.2
A Necessary Selection Criterion
................ 576
23.2.2.3
The Cluster Validation Results
................. 576
23.2.2.4
The Issues with the Defective Measures
............ 577
23.2.2.5
Improving the Defective Measures
............... 577
23.2.3
Measure Normalization
.......................... 577
23.2.3.1
Normalizing the Measures
................... 578
23.2.3.2
The DCV Criterion
....................... 581
23.2.3.3
The Effect of Normalization
.................. 583
23.2.4
Measure Properties
............................. 584
23.2.4.1
The Consistency Between Measures
.............. 584
23.2.4.2
Properties of Measures
..................... 586
23.2.4.3
Discussions
........................... 589
23.3
Internal Clustering Validation Measures
...................... 589
23.3.1
An Overview of Internal Clustering Validation Measures
......... 589
23.3.2
Understanding of Internal Clustering Validation Measures
........ 592
23.3.2.1
The Impact of Monotonicity
.................. 592
23.3.2.2
The Impact of Noise
...................... 593
23.3.2.3
The Impact of Density
..................... 594
23.3.2.4
The Impact of Subclusters
................... 595
23.3.2.5
The Impact of Skewed Distributions
.............. 596
23.3.2.6
The Impact of Arbitrary Shapes
................ 598
23.3.3
Properties of Measures
........................... 600
23.4
Summary
......................................
601
24
Educational and Software Resources for Data Clustering
607
Charu C. Aggarwal and Chandan K. Reddy
24.1
Introduction
.....................................
607
24.2
Educational Resources
...............................
24.2.1
Books on Data Clustering
.........................
608
24.2.2
Popular Survey Papers on Data Clustering
................ 608
24.3
Software for Data Clustering
............................
610
24.3.1
Free and Open-Source Software
...................... 610
24.3.1.1
General Clustering Software
.................. 610
24.3.1.2
Specialized Clustering Software
................ 610
24.3.2
Commercial Packages
........................... 611
24.3.3
Data Benchmarks for Software and Research
............... 611
24.4
Summary
...................................... 612
Index
617
|
any_adam_object | 1 |
author_GND | (DE-588)133500101 |
building | Verbundindex |
bvnumber | BV041209676 |
classification_rvk | ST 530 |
ctrlnum | (OCoLC)859376232 (DE-599)BSZ391828436 |
dewey-full | 519.535 |
dewey-hundreds | 500 - Natural sciences and mathematics |
dewey-ones | 519 - Probabilities and applied mathematics |
dewey-raw | 519.535 |
dewey-search | 519.535 |
dewey-sort | 3519.535 |
dewey-tens | 510 - Mathematics |
discipline | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01657nam a22003972c 4500</leader><controlfield tag="001">BV041209676</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20140218 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">130808s2014 ad|| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781466558212</subfield><subfield code="c">hbk</subfield><subfield code="9">978-1-4665-5821-2</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)859376232</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BSZ391828436</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-473</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-2070s</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-523</subfield><subfield code="a">DE-634</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">519.535</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data clustering</subfield><subfield code="b">algorithms and applications</subfield><subfield code="c">ed. by Charu C. 
Aggarwal ...</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton [u.a.]</subfield><subfield code="b">CRC Press</subfield><subfield code="c">2014</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXVI, 622 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield><subfield code="c">26 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Chapman & Hall, CRC data mining and knowledge discovery series</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Cluster-Analyse</subfield><subfield code="0">(DE-588)4070044-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Algorithmus</subfield><subfield code="0">(DE-588)4001183-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Cluster-Analyse</subfield><subfield code="0">(DE-588)4070044-6</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Algorithmus</subfield><subfield code="0">(DE-588)4001183-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Aggarwal, Charu C.</subfield><subfield code="d">1970-</subfield><subfield code="e">Sonstige</subfield><subfield code="0">(DE-588)133500101</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bamberg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=026184401&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-026184401</subfield></datafield></record></collection> |
id | DE-604.BV041209676 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:42:09Z |
institution | BVB |
isbn | 9781466558212 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-026184401 |
oclc_num | 859376232 |
open_access_boolean | |
owner | DE-473 DE-BY-UBG DE-703 DE-20 DE-2070s DE-11 DE-523 DE-634 |
owner_facet | DE-473 DE-BY-UBG DE-703 DE-20 DE-2070s DE-11 DE-523 DE-634 |
physical | XXVI, 622 S. Ill., graph. Darst. 26 cm |
publishDate | 2014 |
publishDateSearch | 2014 |
publishDateSort | 2014 |
publisher | CRC Press |
record_format | marc |
series2 | Chapman & Hall, CRC data mining and knowledge discovery series |
spelling | Data clustering algorithms and applications ed. by Charu C. Aggarwal ... Boca Raton [u.a.] CRC Press 2014 XXVI, 622 S. Ill., graph. Darst. 26 cm txt rdacontent n rdamedia nc rdacarrier Chapman & Hall, CRC data mining and knowledge discovery series Includes bibliographical references and index Data Mining (DE-588)4428654-5 gnd rswk-swf Cluster-Analyse (DE-588)4070044-6 gnd rswk-swf Algorithmus (DE-588)4001183-5 gnd rswk-swf Data Mining (DE-588)4428654-5 s Cluster-Analyse (DE-588)4070044-6 s Algorithmus (DE-588)4001183-5 s DE-604 Aggarwal, Charu C. 1970- Sonstige (DE-588)133500101 oth Digitalisierung UB Bamberg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=026184401&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Data clustering algorithms and applications Data Mining (DE-588)4428654-5 gnd Cluster-Analyse (DE-588)4070044-6 gnd Algorithmus (DE-588)4001183-5 gnd |
subject_GND | (DE-588)4428654-5 (DE-588)4070044-6 (DE-588)4001183-5 |
title | Data clustering algorithms and applications |
title_auth | Data clustering algorithms and applications |
title_exact_search | Data clustering algorithms and applications |
title_full | Data clustering algorithms and applications ed. by Charu C. Aggarwal ... |
title_fullStr | Data clustering algorithms and applications ed. by Charu C. Aggarwal ... |
title_full_unstemmed | Data clustering algorithms and applications ed. by Charu C. Aggarwal ... |
title_short | Data clustering |
title_sort | data clustering algorithms and applications |
title_sub | algorithms and applications |
topic | Data Mining (DE-588)4428654-5 gnd Cluster-Analyse (DE-588)4070044-6 gnd Algorithmus (DE-588)4001183-5 gnd |
topic_facet | Data Mining Cluster-Analyse Algorithmus |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=026184401&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT aggarwalcharuc dataclusteringalgorithmsandapplications |