Survey of text mining: clustering, classification, and retrieval
Format: | Book |
---|---|
Language: | English |
Published: | New York, NY: Springer, 2004 |
Subjects: | |
Online access: | Table of contents |
Description: | Includes bibliographical references |
Description: | XVII, 244 pages, illustrations |
ISBN: | 0387955631 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV017671944 | ||
003 | DE-604 | ||
005 | 20081203 | ||
007 | t | ||
008 | 031118s2004 gw d||| |||| 00||| eng d | ||
016 | 7 | |a 96909101X |2 DE-101 | |
020 | |a 0387955631 |9 0-387-95563-1 | ||
035 | |a (OCoLC)300384988 | ||
035 | |a (DE-599)BVBBV017671944 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
044 | |a gw |c DE | ||
049 | |a DE-N2 |a DE-355 |a DE-20 |a DE-706 |a DE-634 |a DE-11 |a DE-525 | ||
050 | 0 | |a QA76.9.D343 | |
082 | 0 | |a 006.3 |2 21 | |
084 | |a SK 840 |0 (DE-625)143261: |2 rvk | ||
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
245 | 1 | 0 | |a Survey of text mining |b clustering, classification, and retrieval |c Michael W. Berry (ed.) |
264 | 1 | |a New York, NY |b Springer |c 2004 | |
300 | |a XVII, 244 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Literaturangaben | ||
650 | 4 | |a Analyse discriminante - Congrès | |
650 | 4 | |a Classification automatique (Statistique) - Congrès | |
650 | 4 | |a Exploration de données (Informatique) - Congrès | |
650 | 4 | |a Cluster analysis |v Congresses | |
650 | 4 | |a Data mining |v Congresses | |
650 | 4 | |a Discriminant analysis |v Congresses | |
650 | 0 | 7 | |a Text Mining |0 (DE-588)4728093-1 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4143413-4 |a Aufsatzsammlung |2 gnd-content | |
655 | 7 | |0 (DE-588)1071861417 |a Konferenzschrift |2 gnd-content | |
689 | 0 | 0 | |a Text Mining |0 (DE-588)4728093-1 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Berry, Michael W. |e Sonstige |0 (DE-588)128412976 |4 oth | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=010626622&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-010626622 |
Record in search index
_version_ | 1804130410903371776 |
---|---|
adam_text | Contents
Preface xi
Contributors xiii
I Clustering and Classification 1
1 Cluster-Preserving Dimension Reduction Methods for Efficient Classification of Text Data 3
Peg Howland and Haesun Park
1.1 Introduction ... 3
1.2 Dimension Reduction in the Vector Space Model ... 4
1.3 A Method Based on an Orthogonal Basis of Centroids ... 5
1.3.1 Relationship to a Method from Factor Analysis ... 7
1.4 Discriminant Analysis and Its Extension for Text Data ... 8
1.4.1 Generalized Singular Value Decomposition ... 9
1.4.2 Extension of Discriminant Analysis ... 11
1.4.3 Equivalence for Various S1 and S2 ... 14
1.5 Trace Optimization Using an Orthogonal Basis of Centroids ... 16
1.6 Document Classification Experiments ... 17
1.7 Conclusion ... 20
References ... 22
2 Automatic Discovery of Similar Words 25
Pierre P. Senellart and Vincent D. Blondel
2.1 Introduction ... 25
2.2 Discovery of Similar Words from a Large Corpus ... 26
2.2.1 A Document Vector Space Model ... 27
2.2.2 A Thesaurus of Infrequent Words ... 28
2.2.3 The SEXTANT System ... 29
2.2.4 How to Deal with the Web ... 32
2.3 Discovery of Similar Words in a Dictionary ... 33
2.3.1 Introduction ... 33
2.3.2 A Generalization of Kleinberg's Method ... 33
2.3.3 Other Methods ... 35
2.3.4 Dictionary Graph ... 36
2.3.5 Results ... 37
2.3.6 Future Perspectives ... 41
2.4 Conclusion ... 41
References ... 42
3 Simultaneous Clustering and Dynamic Keyword Weighting for Text Documents 45
Hichem Frigui and Olfa Nasraoui
3.1 Introduction ... 45
3.2 Simultaneous Clustering and Term Weighting of Text Documents ... 47
3.3 Simultaneous Soft Clustering and Term Weighting of Text Documents ... 52
3.4 Robustness in the Presence of Noise Documents ... 56
3.5 Experimental Results ... 57
3.5.1 Simulation Results on Four-Class Web Text Data ... 57
3.5.2 Simulation Results on 20 Newsgroups Data ... 59
3.6 Conclusion ... 69
References ... 70
4 Feature Selection and Document Clustering 73
Inderjit Dhillon, Jacob Kogan, and Charles Nicholas
4.1 Introduction ... 73
4.2 Clustering Algorithms ... 74
4.2.1 k-Means Clustering Algorithm ... 74
4.2.2 Principal Direction Divisive Partitioning ... 78
4.3 Data and Term Quality ... 80
4.4 Term Variance Quality ... 81
4.5 Same Context Terms ... 86
4.5.1 Term Profiles ... 87
4.5.2 Term Profile Quality ... 87
4.6 Spherical Principal Directions Divisive Partitioning ... 90
4.6.1 Two-Cluster Partition of Vectors on the Unit Circle ... 90
4.6.2 Clustering with sPDDP ... 96
4.7 Future Research ... 98
References ... 99
II Information Extraction and Retrieval 101
5 Vector Space Models for Search and Cluster Mining 103
Mei Kobayashi and Masaki Aono
5.1 Introduction ... 103
5.2 Vector Space Modeling (VSM) ... 105
5.2.1 The Basic VSM Model for IR ... 105
5.2.2 Latent Semantic Indexing (LSI) ... 107
5.2.3 Covariance Matrix Analysis (COV) ... 108
5.2.4 Comparison of LSI and COV ... 109
5.3 VSM for Major and Minor Cluster Discovery ... 111
5.3.1 Clustering ... 111
5.3.2 Iterative Rescaling: Ando's Algorithm ... 111
5.3.3 Dynamic Rescaling of LSI ... 113
5.3.4 Dynamic Rescaling of COV ... 114
5.4 Implementation Studies ... 115
5.4.1 Implementations with Artificially Generated Datasets ... 115
5.4.2 Implementations with L.A. Times News Articles ... 118
5.5 Conclusions and Future Work ... 120
References ... 120
6 HotMiner: Discovering Hot Topics from Dirty Text 123
Malú Castellanos
6.1 Introduction ... 124
6.2 Related Work ... 128
6.3 Technical Description ... 130
6.3.1 Preprocessing ... 130
6.3.2 Clustering ... 132
6.3.3 Postfiltering ... 133
6.3.4 Labeling ... 136
6.4 Experimental Results ... 137
6.5 Technical Description ... 143
6.5.1 Thesaurus Assistant ... 145
6.5.2 Sentence Identifier ... 147
6.5.3 Sentence Extractor ... 149
6.6 Experimental Results ... 151
6.7 Mining Case Excerpts for Hot Topics ... 153
6.8 Conclusions ... 154
References ... 155
7 Combining Families of Information Retrieval Algorithms Using Metalearning 159
Michael Cornelson, Ed Greengrass, Robert L. Grossman, Ron Karidi, and Daniel Shnidman
7.1 Introduction ... 159
7.2 Related Work ... 161
7.3 Information Retrieval ... 162
7.4 Metalearning ... 164
7.5 Implementation ... 166
7.6 Experimental Results ... 166
7.7 Further Work ... 167
7.8 Summary and Conclusion ... 168
References ... 168
III Trend Detection 171
8 Trend and Behavior Detection from Web Queries 173
Peiling Wang, Jennifer Bownas, and Michael W. Berry
8.1 Introduction ... 173
8.2 Query Data and Analysis ... 174
8.2.1 Descriptive Statistics of Web Queries ... 175
8.2.2 Trend Analysis of Web Searching ... 176
8.3 Zipf's Law ... 178
8.3.1 Natural Logarithm Transformations ... 178
8.3.2 Piecewise Trendlines ... 179
8.4 Vocabulary Growth ... 179
8.5 Conclusions and Further Studies ... 181
References ... 182
9 A Survey of Emerging Trend Detection in Textual Data Mining 185
April Kontostathis, Leon M. Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps
9.1 Introduction ... 186
9.2 ETD Systems ... 187
9.2.1 Technology Opportunities Analysis (TOA) ... 189
9.2.2 CIMEL: Constructive, Collaborative Inquiry-Based Multimedia E-Learning ... 191
9.2.3 TimeMines ... 195
9.2.4 New Event Detection ... 199
9.2.5 ThemeRiver™ ... 201
9.2.6 PatentMiner ... 204
9.2.7 HDDI™ ... 207
9.2.8 Other Related Work ... 211
9.3 Commercial Software Overview ... 211
9.3.1 Autonomy ... 212
9.3.2 SPSS LexiQuest ... 212
9.3.3 ClearForest ... 213
9.4 Conclusions and Future Work ... 213
9.5 Industrial Counterpoint: Is ETD Useful? Dr. Daniel J. Phelps, Leader, Information Mining Group, Eastman Kodak ... 215
References ... 219
Bibliography 225
Index 241
|
any_adam_object | 1 |
author_GND | (DE-588)128412976 |
building | Verbundindex |
bvnumber | BV017671944 |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.9.D343 |
callnumber-search | QA76.9.D343 |
callnumber-sort | QA 276.9 D343 |
callnumber-subject | QA - Mathematics |
classification_rvk | SK 840 ST 302 |
ctrlnum | (OCoLC)300384988 (DE-599)BVBBV017671944 |
dewey-full | 006.3 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.3 |
dewey-search | 006.3 |
dewey-sort | 16.3 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01882nam a2200481 c 4500</leader><controlfield tag="001">BV017671944</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20081203 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">031118s2004 gw d||| |||| 00||| eng d</controlfield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">96909101X</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0387955631</subfield><subfield code="9">0-387-95563-1</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)300384988</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV017671944</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">gw</subfield><subfield code="c">DE</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-N2</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-706</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-525</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.9.D343</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.3</subfield><subfield code="2">21</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">SK 840</subfield><subfield code="0">(DE-625)143261:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 302</subfield><subfield 
code="0">(DE-625)143652:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Survey of text mining</subfield><subfield code="b">clustering, classification, and retrieval</subfield><subfield code="c">Michael W. Berry (ed.)</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">New York, NY</subfield><subfield code="b">Springer</subfield><subfield code="c">2004</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XVII, 244 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturangaben</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Analyse discriminante - Congrès</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Classification automatique (Statistique) - Congrès</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Exploration de données (Informatique) - Congrès</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cluster analysis</subfield><subfield code="v">Congresses</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield><subfield code="v">Congresses</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Discriminant analysis</subfield><subfield code="v">Congresses</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield 
code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4143413-4</subfield><subfield code="a">Aufsatzsammlung</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)1071861417</subfield><subfield code="a">Konferenzschrift</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Berry, Michael W.</subfield><subfield code="e">Sonstige</subfield><subfield code="0">(DE-588)128412976</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=010626622&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-010626622</subfield></datafield></record></collection> |
genre | (DE-588)4143413-4 Aufsatzsammlung gnd-content (DE-588)1071861417 Konferenzschrift gnd-content |
genre_facet | Aufsatzsammlung Konferenzschrift |
id | DE-604.BV017671944 |
illustrated | Illustrated |
indexdate | 2024-07-09T19:20:36Z |
institution | BVB |
isbn | 0387955631 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-010626622 |
oclc_num | 300384988 |
open_access_boolean | |
owner | DE-N2 DE-355 DE-BY-UBR DE-20 DE-706 DE-634 DE-11 DE-525 |
owner_facet | DE-N2 DE-355 DE-BY-UBR DE-20 DE-706 DE-634 DE-11 DE-525 |
physical | XVII, 244 S. graph. Darst. |
publishDate | 2004 |
publishDateSearch | 2004 |
publishDateSort | 2004 |
publisher | Springer |
record_format | marc |
spelling | Survey of text mining clustering, classification, and retrieval Michael W. Berry (ed.) New York, NY Springer 2004 XVII, 244 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Literaturangaben Analyse discriminante - Congrès Classification automatique (Statistique) - Congrès Exploration de données (Informatique) - Congrès Cluster analysis Congresses Data mining Congresses Discriminant analysis Congresses Text Mining (DE-588)4728093-1 gnd rswk-swf (DE-588)4143413-4 Aufsatzsammlung gnd-content (DE-588)1071861417 Konferenzschrift gnd-content Text Mining (DE-588)4728093-1 s DE-604 Berry, Michael W. Sonstige (DE-588)128412976 oth Digitalisierung UB Regensburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=010626622&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Survey of text mining clustering, classification, and retrieval Analyse discriminante - Congrès Classification automatique (Statistique) - Congrès Exploration de données (Informatique) - Congrès Cluster analysis Congresses Data mining Congresses Discriminant analysis Congresses Text Mining (DE-588)4728093-1 gnd |
subject_GND | (DE-588)4728093-1 (DE-588)4143413-4 (DE-588)1071861417 |
title | Survey of text mining clustering, classification, and retrieval |
title_auth | Survey of text mining clustering, classification, and retrieval |
title_exact_search | Survey of text mining clustering, classification, and retrieval |
title_full | Survey of text mining clustering, classification, and retrieval Michael W. Berry (ed.) |
title_fullStr | Survey of text mining clustering, classification, and retrieval Michael W. Berry (ed.) |
title_full_unstemmed | Survey of text mining clustering, classification, and retrieval Michael W. Berry (ed.) |
title_short | Survey of text mining |
title_sort | survey of text mining clustering classification and retrieval |
title_sub | clustering, classification, and retrieval |
topic | Analyse discriminante - Congrès Classification automatique (Statistique) - Congrès Exploration de données (Informatique) - Congrès Cluster analysis Congresses Data mining Congresses Discriminant analysis Congresses Text Mining (DE-588)4728093-1 gnd |
topic_facet | Analyse discriminante - Congrès Classification automatique (Statistique) - Congrès Exploration de données (Informatique) - Congrès Cluster analysis Congresses Data mining Congresses Discriminant analysis Congresses Text Mining Aufsatzsammlung Konferenzschrift |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=010626622&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT berrymichaelw surveyoftextminingclusteringclassificationandretrieval |