Constrained clustering: advances in algorithms, theory, and applications
Format: | Book |
---|---|
Language: | English |
Published: | Boca Raton, FL [et al.]: Chapman & Hall/CRC, 2009 |
Series: | Chapman & Hall/CRC data mining and knowledge discovery series |
Subjects: | |
Online Access: | Table of contents |
Description: | Includes bibliographical references and index |
Physical Description: | 441 pages, graphical illustrations |
ISBN: | 9781584889960 |
Internal format
MARC
LEADER | 00000nam a2200000zc 4500 | ||
---|---|---|---|
001 | BV035037453 | ||
003 | DE-604 | ||
005 | 20090515 | ||
007 | t | ||
008 | 080904s2009 xxud||| |||| 00||| eng d | ||
010 | |a 2008014590 | ||
020 | |a 9781584889960 |c hardback : alk. paper |9 978-1-58488-996-0 | ||
035 | |a (OCoLC)1332003875 | ||
035 | |a (DE-599)BVBBV035037453 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
044 | |a xxu |c US | ||
049 | |a DE-91G | ||
050 | 0 | |a QA278 | |
082 | 0 | |a 519.5/3 | |
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
084 | |a MAT 627f |2 stub | ||
084 | |a DAT 700f |2 stub | ||
084 | |a DAT 777f |2 stub | ||
245 | 1 | 0 | |a Constrained clustering |b advances in algorithms, theory, and applications |c ed. by Sugato Basu ... |
264 | 1 | |a Boca Raton, FL [u.a.] |b Chapman & Hall/CRC |c 2009 | |
300 | |a 441 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Chapman & Hall/CRC data mining and knowledge discovery series | |
500 | |a Includes bibliographical references and index | ||
650 | 7 | |a Algorithmes |2 ram | |
650 | 7 | |a Classification automatique (statistique) - Informatique |2 ram | |
650 | 7 | |a Exploration de données |2 ram | |
650 | 4 | |a Datenverarbeitung | |
650 | 4 | |a Cluster analysis |x Data processing | |
650 | 4 | |a Data mining | |
650 | 4 | |a Computer algorithms | |
650 | 0 | 7 | |a Cluster-Analyse |0 (DE-588)4070044-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | 1 | |a Cluster-Analyse |0 (DE-588)4070044-6 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Basu, Sugato |e Sonstige |4 oth | |
856 | 4 | 2 | |m Digitalisierung UB Bayreuth |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016706328&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-016706328 |
Record in the search index
_version_ | 1809768531292585984 |
---|---|
adam_text |
Contents
1
Introduction
1
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
1.1
Background and Motivation
. 1
1.2
Initial Work: Instance-Level Constraints
. 2
1.2.1
Enforcing Pairwise Constraints
. 3
1.2.2
Learning a Distance Metric from Pairwise Constraints
5
1.3
Advances Contained in This Book
. 6
1.3.1
Constrained Partitional Clustering
. 7
1.3.2
Beyond Pairwise Constraints
. 8
1.3.3
Theory
. 9
1.3.4
Applications
. 9
1.4
Conclusion
. 10
1.5
Notation and Symbols
. 12
2
Semi-Supervised Clustering with User Feedback
17
David Cohn, Rich Caruana, and Andrew Kachites McCallum
2.1
Introduction
. 17
2.1.1
Relation to Active Learning
. 19
2.2
Clustering
. 20
2.3
Semi-Supervised Clustering
. 21
2.3.1
Implementing Pairwise Document Constraints
. 22
2.3.2
Other Constraints
. 23
2.4
Experiments
. 24
2.4.1
Clustering Performance
. 24
2.4.2
Learning Term Weightings
. 26
2.5
Discussion
. 27
2.5.1
Constraints vs. Labels
. 27
2.5.2
Types of User Feedback
. 27
2.5.3
Other Applications
. 28
2.5.4
Related Work
. 28
3
Gaussian Mixture Models with Equivalence Constraints
33
Noam Shental, Aharon Bar-Hillel, Tomer Hertz, and Daphna Weinshall
3.1
Introduction
. 34
3.2
Constrained EM: The Update Rules
. 36
3.2.1
Notations
. 37
3.2.2
Incorporating Must-Link Constraints
. 38
3.2.3
Incorporating Cannot-Link Constraints
. 41
3.2.4
Combining Must-Link and Cannot-Link Constraints
. 44
3.3
Experimental Results
. 45
3.3.1
UCI Data Sets
. 46
3.3.2
Facial Image Database
. 48
3.4
Obtaining Equivalence Constraints
. 49
3.5
Related Work
. 50
3.6
Summary and Discussion
. 51
3.7
Appendix: Calculating the Normalizing Factor Z and its Derivatives when Introducing Cannot-Link Constraints
. 52
3.7.1
Exact Calculation of Z and its Derivatives
. 53
3.7.2
Approximating Z Using the Pseudo-Likelihood Assumption
. 54
4
Pairwise Constraints as Priors in Probabilistic Clustering
59
Zhengdong Lu and Todd K. Leen
4.1
Introduction
. 60
4.2
Model
. 61
4.2.1
Prior Distribution on Cluster Assignments
. 61
4.2.2
Pairwise Relations
. 62
4.2.3
Model Fitting
. 63
4.2.4
Selecting the Constraint Weights
. 65
4.3
Computing the Cluster Posterior
. 68
4.3.1
Two Special Cases with Easy Inference
. 68
4.3.2
Estimation with Gibbs Sampling
. 69
4.3.3
Estimation with Mean Field Approximation
. 70
4.4
Related Models
. 70
4.5
Experiments
. 72
4.5.1
Artificial Constraints
. 73
4.5.2
Real-World Problems
. 75
4.6
Discussion
. 78
4.7
Appendix A
. 80
4.8
Appendix B
. 81
4.9
Appendix C
. 83
5
Clustering with Constraints: A Mean-Field Approximation Perspective
91
Tilman Lange, Martin H. Law, Anil K. Jain, and Joachim M. Buhmann
5.1
Introduction
. 92
5.1.1
Related Work
. 93
5.2
Model-Based Clustering
. 96
5.3
A Maximum Entropy Approach to Constraint Integration
. . 98
5.3.1
Integration of Partial Label Information
. 99
5.3.2
Maximum-Entropy Label Prior
. 100
5.3.3
Markov Random Fields and the Gibbs Distribution
. . 102
5.3.4
Parameter Estimation
. 104
5.3.5
Mean-Field Approximation for Posterior Inference
. . 105
5.3.6
A Detour: Pairwise Clustering, Constraints, and Mean Fields
. 107
5.3.7
The Need for Weighting
. 108
5.3.8
Selecting η
. 110
5.4
Experiments
. 112
5.5
Summary
. 116
6
Constraint-Driven Co-Clustering of 0/1 Data
123
Ruggero G. Pensa, Celine Robardet, and Jean-François Boulicaut
6.1
Introduction
. 124
6.2
Problem Setting
. 126
6.3
A Constrained Co-Clustering Algorithm Based on a Local-to-Global Approach
. 127
6.3.1
A Local-to-Global Approach
. 128
6.3.2
The CDK-Means Proposal
. 128
6.3.3
Constraint-Driven Co-Clustering
. 130
6.3.4
Discussion on Constraint Processing
. 132
6.4
Experimental Validation
. 134
6.4.1
Evaluation Method
. 135
6.4.2
Using Extended Must-Link and Cannot-Link Constraints
. 136
6.4.3
Time Interval Cluster Discovery
. 139
6.5
Conclusion
. 144
7
On Supervised Clustering for Creating Categorization Segmentations
149
Charu Aggarwal, Stephen C. Gates, and Philip Yu
7.1
Introduction
. 150
7.2
A Description of the Categorization System
. 151
7.2.1
Some Definitions and Notations
. 151
7.2.2
Feature Selection
. 152
7.2.3
Supervised Cluster Generation
. 153
7.2.4
Categorization Algorithm
. 157
7.3
Performance of the Categorization System
. 160
7.3.1
Categorization
. 164
7.3.2
An Empirical Survey of Categorization Effectiveness
. 166
7.4
Conclusions and Summary
. 168
8
Clustering with Balancing Constraints
171
Arindam Banerjee and Joydeep Ghosh
8.1
Introduction
. 171
8.2
A Scalable Framework for Balanced Clustering
. 174
8.2.1
Formulation and Analysis
. 174
8.2.2
Experimental Results
. 177
8.3
Frequency Sensitive Approaches for Balanced Clustering
. 182
8.3.1
Frequency Sensitive Competitive Learning
. 182
8.3.2
Case Study: Balanced Clustering of Directional Data
. 183
8.3.3
Experimental Results
. 186
8.4
Other Balanced Clustering Approaches
. 191
8.4.1
Balanced Clustering by Graph Partitioning
. 192
8.4.2
Model-Based Clustering with Soft Balancing
. 194
8.5
Concluding Remarks
. 195
9
Using Assignment Constraints to Avoid Empty Clusters in k-Means Clustering
201
Ayhan Demiriz, Kristin P. Bennett, and Paul S. Bradley
9.1
Introduction
. 202
9.2
Constrained Clustering Problem and Algorithm
. 203
9.3
Cluster Assignment Sub-Problem
. 206
9.4
Numerical Evaluation
. 208
9.5
Extensions
. 213
9.6
Conclusion
. 217
10
Collective Relational Clustering
221
Indrajit Bhattacharya and Lise Getoor
10.1
Introduction
. 222
10.2
Entity Resolution: Problem Formulation
. 223
10.2.1
Pairwise Resolution
. 224
10.2.2
Collective Resolution
. 225
10.2.3
Entity Resolution Using Relationships
. 226
10.2.4
Pairwise Decisions Using Relationships
. 226
10.2.5
Collective Relational Entity Resolution
. 227
10.3
An Algorithm for Collective Relational Clustering
. 230
10.4
Correctness of Collective Relational Clustering
. 233
10.5
Experimental Evaluation
. 235
10.5.1
Experiments on Synthetic Data
. 238
10.6
Conclusions
. 241
11
Non-Redundant Data Clustering
245
David Gondek
11.1
Introduction
. 245
11.2
Problem Setting
. 246
11.2.1
Background Concepts
. 247
11.2.2
Multiple Clusterings
. 249
11.2.3
Information Orthogonality
. 250
11.2.4
Non-Redundant Clustering
. 251
11.3
Conditional Ensembles
. 252
11.3.1
Complexity
. 254
11.3.2
Conditions for Correctness
. 254
11.4
Constrained Conditional Information Bottleneck
. 257
11.4.1
Coordinated Conditional Information Bottleneck
. . . 258
11.4.2
Derivation from Multivariate IB
. 258
11.4.3
Coordinated CIB
. 260
11.4.4
Update Equations
. 261
11.4.5
Algorithms
. 265
11.5
Experimental Evaluation
. 267
11.5.1
Image Data Set
. 269
11.5.2
Text Data Sets
. 269
11.5.3
Evaluation Using Synthetic Data
. 273
11.5.4
Summary of Experimental Results
. 279
11.6
Conclusion
. 280
12
Joint Cluster Analysis of Attribute Data and Relationship Data
285
Martin Ester, Rong Ge, Byron J. Gao, Zengjian Hu, and Boaz Ben-Moshe
12.1
Introduction
. 285
12.2
Related Work
. 287
12.3
Problem Definition and Complexity Analysis
. 291
12.3.1
Preliminaries and Problem Definition
. 291
12.3.2
Complexity Analysis
. 292
12.4
Approximation Algorithms
. 295
12.4.1
Inapproximability Results for CkC
. 295
12.4.2
Approximation Results for Metric CkC
. 296
12.5
Heuristic Algorithm
. 300
12.5.1
Overview of NetScan
. 300
12.5.2
More Details on NetScan
. 302
12.5.3
Adaptation of NetScan to the Connected k-Means Problem
. 305
12.6
Experimental Results
. 305
12.7
Discussion
. 307
13
Correlation Clustering
313
Nicole Immorlica and Anthony Wirth
13.1
Definition and Model
. 313
13.2
Motivation and Background
. 314
13.2.1
Maximizing Agreements
. 315
13.2.2
Minimizing Disagreements
. 316
13.2.3
Maximizing Correlation
. 317
13.3
Techniques
. 317
13.3.1
Region Growing
. 318
13.3.2
Combinatorial Approach
. 321
13.4
Applications
. 323
13.4.1
Location Area Planning
. 323
13.4.2
Co-Reference
. 323
13.4.3
Constrained Clustering
. 324
13.4.4
Cluster Editing
. 324
13.4.5
Consensus Clustering
. 324
14
Interactive Visual Clustering for Relational Data
329
Marie desJardins, James MacGlashan, and Julia Ferraioli
14.1
Introduction
. 329
14.2
Background
. 331
14.3
Approach
. 332
14.3.1
Interpreting User Actions
. 332
14.3.2
Constrained Clustering
. 332
14.3.3
Updating the Display
. 333
14.3.4
Simulating the User
. 334
14.4
System Operation
. 334
14.5
Methodology
. 337
14.5.1
Data Sets
. 339
14.5.2
Circles
. 340
14.5.3
Overlapping Circles
. 340
14.5.4
Iris
. 340
14.5.5
Internet Movie Data Base
. 341
14.5.6
Classical and Rock Music
. 342
14.5.7
Amino Acid Indices
. 342
14.5.8
Amino Acid
. 342
14.6
Results and Discussion
. 343
14.6.1
Circles
. 343
14.6.2
Overlapping Circles
. 345
14.6.3
Iris
. 346
14.6.4
IMDB
. 346
14.6.5
Classical and Rock Music
. 347
14.6.6
Amino Acid Indices
. 348
14.6.7
Amino Acid
. 349
14.7
Related Work
. 350
14.8
Future Work and Conclusions
. 351
15
Distance Metric Learning from Cannot-be-Linked Example Pairs, with Application to Name Disambiguation
357
Satoshi Oyama and Katsumi Tanaka
15.1
Background and Motivation
. 357
15.2
Preliminaries
. 359
15.3
Problem Formalization
. 361
15.4
Positive Semi-Definiteness of Learned Matrix
. 362
15.5
Relationship to Support Vector Machine Learning
. 363
15.6
Handling Noisy Data
. 364
15.7
Relationship to Single-Class Learning
. 365
15.8
Relationship to Online Learning
. 365
15.9
Application to Name Disambiguation
. 366
15.9.1
Name Disambiguation
. 366
15.9.2
Data Set and Software
. 367
15.9.3
Results
. 369
15.10
Conclusion
. 370
16
Privacy-Preserving Data Publishing: A Constraint-Based Clustering Approach
375
Anthony K. H. Tung, Jiawei Han, Laks V. S. Lakshmanan, and Raymond T. Ng
16.1
Introduction
. 375
16.2
The Constrained Clustering Problem
. 377
16.3
Clustering without the Nearest Representative Property
. . . 380
16.3.1
Cluster Refinement under Constraints
. 381
16.3.2
Handling Tight Existential Constraints
. 384
16.3.3
Local Optimality and Termination
. 385
16.4
Scaling the Algorithm for Large Databases
. 387
16.4.1
Micro-Clustering and Its Complication
. 387
16.4.2
Micro-Cluster Sharing
. 388
16.5
Privacy Preserving Data Publishing as a Constrained Clustering Problem
. 389
16.5.1
Determining C from V
. 390
16.5.2
Determining c
. 391
16.6
Conclusion
. 392
17
Learning with Pairwise Constraints for Video Object Classification
397
Rong Yan, Jian Zhang, Jie Yang, and Alexander G. Hauptmann
17.1
Introduction
. 398
17.2
Related Work
. 400
17.3
Discriminative Learning with Pairwise Constraints
. 401
17.3.1
Regularized Loss Function with Pairwise Information
. 402
17.3.2
Non-Convex Pairwise Loss Functions
. 404
17.3.3
Convex Pairwise Loss Functions
. 404
17.4
Algorithms
. 406
17.4.1
Convex Pairwise Kernel Logistic Regression
. 407
17.4.2
Convex Pairwise Support Vector Machines
. 408
17.4.3
Non-Convex Pairwise Kernel Logistic Regression
. . . 410
17.4.4
An Illustrative Example
. 414
17.5
Multi-Class Classification with Pairwise Constraints
. 414
17.6
Noisy Pairwise Constraints
. 415
17.7
Experiments
. 416
17.7.1
Data Collections and Preprocessing
. 417
17.7.2
Selecting Informative Pairwise Constraints from Video
417
17.7.3
Experimental Setting
. 420
17.7.4
Performance Evaluation
. 421
17.7.5
Results for Noisy Pairwise Constraints
. 424
17.8
Conclusion
. 426
Index
431 |
any_adam_object | 1 |
any_adam_object_boolean | 1 |
building | Verbundindex |
bvnumber | BV035037453 |
callnumber-first | Q - Science |
callnumber-label | QA278 |
callnumber-raw | QA278 |
callnumber-search | QA278 |
callnumber-sort | QA 3278 |
callnumber-subject | QA - Mathematics |
classification_rvk | ST 530 |
classification_tum | MAT 627f DAT 700f DAT 777f |
ctrlnum | (OCoLC)1332003875 (DE-599)BVBBV035037453 |
dewey-full | 519.5/3 |
dewey-hundreds | 500 - Natural sciences and mathematics |
dewey-ones | 519 - Probabilities and applied mathematics |
dewey-raw | 519.5/3 |
dewey-search | 519.5/3 |
dewey-sort | 3519.5 13 |
dewey-tens | 510 - Mathematics |
discipline | Informatik Mathematik |
discipline_str_mv | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000zc 4500</leader><controlfield tag="001">BV035037453</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20090515</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">080904s2009 xxud||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2008014590</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781584889960</subfield><subfield code="c">hardback : alk. paper</subfield><subfield code="9">978-1-58488-996-0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1332003875</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV035037453</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91G</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA278</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">519.5/3</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">MAT 627f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 700f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 
777f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Constrained clustering</subfield><subfield code="b">advances in algorithms, theory, and applications</subfield><subfield code="c">ed. by Sugato Basu ...</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton, FL [u.a.]</subfield><subfield code="b">Chapman & Hall/CRC</subfield><subfield code="c">2009</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">441 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Chapman & Hall/CRC data mining and knowledge discovery series</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Algorithmes</subfield><subfield code="2">ram</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Classification automatique (statistique) - Informatique</subfield><subfield code="2">ram</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Exploration de données</subfield><subfield code="2">ram</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Datenverarbeitung</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cluster analysis</subfield><subfield code="x">Data processing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data 
mining</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computer algorithms</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Cluster-Analyse</subfield><subfield code="0">(DE-588)4070044-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Cluster-Analyse</subfield><subfield code="0">(DE-588)4070044-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Basu, Sugato</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016706328&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-016706328</subfield></datafield></record></collection> |
id | DE-604.BV035037453 |
illustrated | Illustrated |
index_date | 2024-07-02T21:52:04Z |
indexdate | 2024-09-10T00:56:05Z |
institution | BVB |
isbn | 9781584889960 |
language | English |
lccn | 2008014590 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-016706328 |
oclc_num | 1332003875 |
open_access_boolean | |
owner | DE-91G DE-BY-TUM |
owner_facet | DE-91G DE-BY-TUM |
physical | 441 S. graph. Darst. |
publishDate | 2009 |
publishDateSearch | 2009 |
publishDateSort | 2009 |
publisher | Chapman & Hall/CRC |
record_format | marc |
series2 | Chapman & Hall/CRC data mining and knowledge discovery series |
spelling | Constrained clustering advances in algorithms, theory, and applications ed. by Sugato Basu ... Boca Raton, FL [u.a.] Chapman & Hall/CRC 2009 441 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Chapman & Hall/CRC data mining and knowledge discovery series Includes bibliographical references and index Algorithmes ram Classification automatique (statistique) - Informatique ram Exploration de données ram Datenverarbeitung Cluster analysis Data processing Data mining Computer algorithms Cluster-Analyse (DE-588)4070044-6 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf Data Mining (DE-588)4428654-5 s Cluster-Analyse (DE-588)4070044-6 s DE-604 Basu, Sugato Sonstige oth Digitalisierung UB Bayreuth application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016706328&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Constrained clustering advances in algorithms, theory, and applications Algorithmes ram Classification automatique (statistique) - Informatique ram Exploration de données ram Datenverarbeitung Cluster analysis Data processing Data mining Computer algorithms Cluster-Analyse (DE-588)4070044-6 gnd Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4070044-6 (DE-588)4428654-5 |
title | Constrained clustering advances in algorithms, theory, and applications |
title_auth | Constrained clustering advances in algorithms, theory, and applications |
title_exact_search | Constrained clustering advances in algorithms, theory, and applications |
title_exact_search_txtP | Constrained clustering advances in algorithms, theory, and applications |
title_full | Constrained clustering advances in algorithms, theory, and applications ed. by Sugato Basu ... |
title_fullStr | Constrained clustering advances in algorithms, theory, and applications ed. by Sugato Basu ... |
title_full_unstemmed | Constrained clustering advances in algorithms, theory, and applications ed. by Sugato Basu ... |
title_short | Constrained clustering |
title_sort | constrained clustering advances in algorithms theory and applications |
title_sub | advances in algorithms, theory, and applications |
topic | Algorithmes ram Classification automatique (statistique) - Informatique ram Exploration de données ram Datenverarbeitung Cluster analysis Data processing Data mining Computer algorithms Cluster-Analyse (DE-588)4070044-6 gnd Data Mining (DE-588)4428654-5 gnd |
topic_facet | Algorithmes Classification automatique (statistique) - Informatique Exploration de données Datenverarbeitung Cluster analysis Data processing Data mining Computer algorithms Cluster-Analyse Data Mining |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016706328&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT basusugato constrainedclusteringadvancesinalgorithmstheoryandapplications |