Mining text data:
Other contributors: | Aggarwal, Charu C. 1970- (editor) |
---|---|
Format: | Book |
Language: | English |
Published: | New York [et al.] : Springer, 2012 |
Subjects: | Text Mining; Aufsatzsammlung (collection of essays) |
Online access: | Table of contents |
Description: | XI, 522 p., graphical illustrations |
ISBN: | 9781461432227 9781489989208 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV040039953 | ||
003 | DE-604 | ||
005 | 20190710 | ||
007 | t | ||
008 | 120411s2012 d||| |||| 00||| eng d | ||
010 | |a 2012930923 | ||
020 | |a 9781461432227 |9 978-1-4614-3222-7 | ||
020 | |a 9781489989208 |c pbk. |9 978-1-4899-8920-8 | ||
035 | |a (OCoLC)775736101 | ||
035 | |a (DE-599)BSZ356741958 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-11 |a DE-1051 |a DE-19 |a DE-188 |a DE-739 |a DE-N2 | ||
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
245 | 1 | 0 | |a Mining text data |c Charu C. Aggarwal ... ed. |
264 | 1 | |a New York [u.a.] |b Springer |c 2012 | |
300 | |a XI, 522 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Text Mining |0 (DE-588)4728093-1 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4143413-4 |a Aufsatzsammlung |2 gnd-content | |
689 | 0 | 0 | |a Text Mining |0 (DE-588)4728093-1 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Aggarwal, Charu C. |d 1970- |0 (DE-588)133500101 |4 edt | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |a Aggarwal, Charu C. |t Mining Text Data |z 978-1-4614-3223-4 |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024896637&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-024896637 |
Record in the search index
_version_ | 1804149040671096832 |
---|---|
adam_text | Contents
An Introduction to Text Mining 1
Charu C. Aggarwal and ChengXiang Zhai
1. Introduction 1
2. Algorithms for Text Mining 4
3. Future Directions 8
References 10
2
Information Extraction from Text 11
Jing Jiang
1. Introduction 11
2. Named Entity Recognition 15
2.1 Rule-based Approach 16
2.2 Statistical Learning Approach 17
3. Relation Extraction 22
3.1 Feature-based Classification 23
3.2 Kernel Methods 26
3.3 Weakly Supervised Learning Methods 29
4. Unsupervised Information Extraction 30
4.1 Relation Discovery and Template Induction 31
4.2 Open Information Extraction 32
5. Evaluation 33
6. Conclusions and Summary 34
References 35
3
A Survey of Text Summarization Techniques
Ani Nenkova and Kathleen McKeown
1. How do Extractive Summarizers Work? 44
2. Topic Representation Approaches 46
2.1 Topic Words 46
2.2 Frequency-driven Approaches 48
2.3 Latent Semantic Analysis 52
2.4 Bayesian Topic Models 53
2.5 Sentence Clustering and Domain-dependent Topics 55
3. Influence of Context 56
3.1 Web Summarization
3.2 Summarization of Scientific Articles
MINING TEXT DATA
3.3 Query-focused Summarization 58
3.4 Email Summarization 59
4. Indicator Representations and Machine Learning for Summarization 60
4.1 Graph Methods for Sentence Importance 60
4.2 Machine Learning for Summarization 62
5. Selecting Summary Sentences 64
5.1 Greedy Approaches: Maximal Marginal Relevance 64
5.2 Global Summary Selection 65
6. Conclusion 66
References 66
4
A Survey of Text Clustering Algorithms 77
Charu C. Aggarwal and ChengXiang Zhai
1. Introduction 77
2. Feature Selection and Transformation Methods for Text Clustering 81
2.1 Feature Selection Methods 81
2.2 LSI-based Methods 84
2.3 Non-negative Matrix Factorization 86
3. Distance-based Clustering Algorithms 89
3.1 Agglomerative and Hierarchical Clustering Algorithms 90
3.2 Distance-based Partitioning Algorithms 92
3.3 A Hybrid Approach: The Scatter-Gather Method 94
4. Word and Phrase-based Clustering 99
4.1 Clustering with Frequent Word Patterns 100
4.2 Leveraging Word Clusters for Document Clusters 102
4.3 Co-clustering Words and Documents 103
4.4 Clustering with Frequent Phrases 105
5. Probabilistic Document Clustering and Topic Models 107
6. Online Clustering with Text Streams 110
7. Clustering Text in Networks 115
8. Semi-Supervised Clustering 118
9. Conclusions and Summary 120
References 121
5
Dimensionality Reduction and Topic Modeling 129
Steven P. Crain, Ke Zhou, Shuang-Hong Yang and Hongyuan Zha
1. Introduction 130
1.1 The Relationship Between Clustering, Dimension Reduction and Topic Modeling 131
1.2 Notation and Concepts 132
2. Latent Semantic Indexing 133
2.1 The Procedure of Latent Semantic Indexing 134
2.2 Implementation Issues 135
2.3 Analysis 137
3. Topic Models and Dimension Reduction 139
3.1 Probabilistic Latent Semantic Indexing 140
3.2 Latent Dirichlet Allocation 142
4. Interpretation and Evaluation 148
4.1 Interpretation 148
4.2 Evaluation 149
4.3 Parameter Selection 150
4.4 Dimension Reduction 150
5. Beyond Latent Dirichlet Allocation 151
5.1 Scalability 151
5.2 Dynamic Data 151
5.3 Networked Data 152
5.4 Adapting Topic Models to Applications 154
6. Conclusion 155
References 156
6
A Survey of Text Classification Algorithms 163
Charu C. Aggarwal and ChengXiang Zhai
1. Introduction 163
2. Feature Selection for Text Classification 167
2.1 Gini Index 168
2.2 Information Gain 169
2.3 Mutual Information 169
2.4 χ²-Statistic 170
2.5 Feature Transformation Methods: Supervised LSI 171
2.6 Supervised Clustering for Dimensionality Reduction 172
2.7 Linear Discriminant Analysis 173
2.8 Generalized Singular Value Decomposition 175
2.9 Interaction of Feature Selection with Classification 175
3. Decision Tree Classifiers 176
4. Rule-based Classifiers 178
5. Probabilistic and Naive Bayes Classifiers 181
5.1 Bernoulli Multivariate Model 183
5.2 Multinomial Distribution 188
5.3 Mixture Modeling for Text Classification 190
6. Linear Classifiers 193
6.1 SVM Classifiers 194
6.2 Regression-Based Classifiers 196
6.3 Neural Network Classifiers 197
6.4 Some Observations about Linear Classifiers 199
7. Proximity-based Classifiers 200
8. Classification of Linked and Web Data 203
9. Meta-Algorithms for Text Classification 209
9.1 Classifier Ensemble Learning 209
9.2 Data Centered Methods: Boosting and Bagging 210
9.3 Optimizing Specific Measures of Accuracy 211
10. Conclusions and Summary 213
References 213
7
Transfer Learning for Text Mining 223
Weike Pan, Erheng Zhong and Qiang Yang
1. Introduction
2. Transfer Learning in Text Classification 225
2.1 Cross Domain Text Classification 225
2.2 Instance-based Transfer 231
2.3 Cross-Domain Ensemble Learning 232
2.4 Feature-based Transfer Learning for Document Classification 235
3. Heterogeneous Transfer Learning 239
3.1 Heterogeneous Feature Space 241
3.2 Heterogeneous Label Space 243
3.3 Summary 244
4. Discussion 245
5. Conclusions 246
References 247
8
Probabilistic Models for Text Mining 259
Yizhou Sun, Hongbo Deng and Jiawei Han
1. Introduction 260
2. Mixture Models 261
2.1 General Mixture Model Framework 262
2.2 Variations and Applications 263
2.3 The Learning Algorithms 266
3. Stochastic Processes in Bayesian Nonparametric Models 269
3.1 Chinese Restaurant Process 269
3.2 Dirichlet Process 270
3.3 Pitman-Yor Process 274
3.4 Others 275
4. Graphical Models 275
4.1 Bayesian Networks 276
4.2 Hidden Markov Models 278
4.3 Markov Random Fields 282
4.4 Conditional Random Fields 285
4.5 Other Models 286
5. Probabilistic Models with Constraints 287
6. Parallel Learning Algorithms 288
7. Conclusions 289
References 290
9
Mining Text Streams 297
Charu C. Aggarwal
1. Introduction 297
2. Clustering Text Streams 299
2.1 Topic Detection and Tracking in Text Streams 307
3. Classification of Text Streams 312
4. Evolution Analysis in Text Streams 316
5. Conclusions 317
References 318
10
Translingual Mining from Text Data 323
Jian-Yun Nie, Jianfeng Gao and Guihong Cao
1. Introduction 324
2. Traditional Translingual Text Mining - Machine Translation 325
2.1 SMT and Generative Translation Models 325
2.2 Word-Based Models 327
2.3 Phrase-Based Models 329
2.4 Syntax-Based Models 333
3. Automatic Mining of Parallel texts 336
3.1 Using Web structure 337
3.2 Matching parallel pages 339
4. Using Translation Models in CLIR 341
5. Collecting and Exploiting Comparable Texts 344
6. Selecting Parallel Sentences, Phrases and Translation Words 347
7. Mining Translingual Relations From Monolingual Texts 349
8. Mining using hyperlinks 351
9. Conclusions and Discussions 353
References 354
11
Text Mining in Multimedia 361
Zheng-Jun Zha, Meng Wang, Jialie Shen and Tat-Seng Chua
1. Introduction 362
2. Surrounding Text Mining 364
3. Tag Mining 366
3.1 Tag Ranking 366
3.2 Tag Refinement 367
3.3 Tag Information Enrichment 369
4. Joint Text and Visual Content Mining 370
4.1 Visual Re-ranking 371
5. Cross Text and Visual Content Mining 374
6. Summary and Open Issues 377
References 379
12
Text Analytics in Social Media 385
Xia Hu and Huan Liu
1. Introduction 385
2. Distinct Aspects of Text in Social Media 388
2.1 A General Framework for Text Analytics 388
2.2 Time Sensitivity 390
2.3 Short Length 391
2.4 Unstructured Phrases 392
2.5 Abundant Information 393
3. Applying Text Analytics to Social Media 393
3.1 Event Detection 393
3.2 Collaborative Question Answering 395
3.3 Social Tagging 397
3.4 Bridging the Semantic Gap 398
3.5 Exploiting the Power of Abundant Information 399
3.6 Related Efforts 401
4. An Illustrative Example 402
4.1 Seed Phrase Extraction 402
4.2 Semantic Feature Generation 404
4.3 Feature Space Construction 406
5. Conclusion and Future Work 407
References 408
13
A Survey of Opinion Mining and Sentiment Analysis 415
Bing Liu and Lei Zhang
1. The Problem of Opinion Mining 416
1.1 Opinion Definition 416
1.2 Aspect-Based Opinion Summary 420
2. Document Sentiment Classification 422
2.1 Classification based on Supervised Learning 422
2.2 Classification based on Unsupervised Learning 424
3. Sentence Subjectivity and Sentiment Classification 426
4. Opinion Lexicon Expansion 429
4.1 Dictionary based approach 429
4.2 Corpus-based approach and sentiment consistency 430
5. Aspect-Based Sentiment Analysis 432
5.1 Aspect Sentiment Classification 433
5.2 Basic Rules of Opinions 434
5.3 Aspect Extraction 438
5.4 Simultaneous Opinion Lexicon Expansion and Aspect Extraction 440
6. Mining Comparative Opinions 441
7. Some Other Problems 444
8. Opinion Spam Detection 447
8.1 Spam Detection Based on Supervised Learning 448
8.2 Spam Detection Based on Abnormal Behaviors 449
8.3 Group Spam Detection 450
9. Utility of Reviews 451
10. Conclusions 452
References 453
14
Biomedical Text Mining: A Survey of Recent Progress 465
Matthew S. Simpson and Dina Demner-Fushman
1. Introduction 466
2. Resources for Biomedical Text Mining 467
2.1 Corpora 467
2.2 Annotation 469
2.3 Knowledge Sources 470
2.4 Supporting Tools 471
3. Information Extraction 472
3.1 Named Entity Recognition 473
3.2 Relation Extraction 478
3.3 Event Extraction 482
4. Summarization 484
5. Question Answering 488
5.1 Medical Question Answering 489
5.2 Biological Question Answering 491
6. Literature-Based Discovery 492
7. Conclusion 495
References 496
Index 519
|
any_adam_object | 1 |
author2 | Aggarwal, Charu C. 1970- |
author2_role | edt |
author2_variant | c c a cc cca |
author_GND | (DE-588)133500101 |
author_facet | Aggarwal, Charu C. 1970- |
building | Verbundindex |
bvnumber | BV040039953 |
classification_rvk | ST 306 ST 530 |
ctrlnum | (OCoLC)775736101 (DE-599)BSZ356741958 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01512nam a2200373 c 4500</leader><controlfield tag="001">BV040039953</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20190710 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">120411s2012 d||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2012930923</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781461432227</subfield><subfield code="9">978-1-4614-3222-7</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781489989208</subfield><subfield code="c">pbk.</subfield><subfield code="9">978-1-4899-8920-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)775736101</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BSZ356741958</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-11</subfield><subfield code="a">DE-1051</subfield><subfield code="a">DE-19</subfield><subfield code="a">DE-188</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-N2</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Mining text data</subfield><subfield code="c">Charu C. Aggarwal ... 
ed.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">New York [u.a.]</subfield><subfield code="b">Springer</subfield><subfield code="c">2012</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XI, 522 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4143413-4</subfield><subfield code="a">Aufsatzsammlung</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Aggarwal, Charu C.</subfield><subfield code="d">1970-</subfield><subfield code="0">(DE-588)133500101</subfield><subfield code="4">edt</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="a">Aggarwal, Charu C.</subfield><subfield code="t">Mining Text Data</subfield><subfield code="z">978-1-4614-3223-4</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield 
code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024896637&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-024896637</subfield></datafield></record></collection> |
genre | (DE-588)4143413-4 Aufsatzsammlung gnd-content |
genre_facet | Aufsatzsammlung |
id | DE-604.BV040039953 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:16:42Z |
institution | BVB |
isbn | 9781461432227 9781489989208 |
language | English |
lccn | 2012930923 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-024896637 |
oclc_num | 775736101 |
open_access_boolean | |
owner | DE-11 DE-1051 DE-19 DE-BY-UBM DE-188 DE-739 DE-N2 |
owner_facet | DE-11 DE-1051 DE-19 DE-BY-UBM DE-188 DE-739 DE-N2 |
physical | XI, 522 S. graph. Darst. |
publishDate | 2012 |
publishDateSearch | 2012 |
publishDateSort | 2012 |
publisher | Springer |
record_format | marc |
spelling | Mining text data Charu C. Aggarwal ... ed. New York [u.a.] Springer 2012 XI, 522 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Text Mining (DE-588)4728093-1 gnd rswk-swf (DE-588)4143413-4 Aufsatzsammlung gnd-content Text Mining (DE-588)4728093-1 s DE-604 Aggarwal, Charu C. 1970- (DE-588)133500101 edt Erscheint auch als Online-Ausgabe Aggarwal, Charu C. Mining Text Data 978-1-4614-3223-4 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024896637&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Mining text data Text Mining (DE-588)4728093-1 gnd |
subject_GND | (DE-588)4728093-1 (DE-588)4143413-4 |
title | Mining text data |
title_auth | Mining text data |
title_exact_search | Mining text data |
title_full | Mining text data Charu C. Aggarwal ... ed. |
title_fullStr | Mining text data Charu C. Aggarwal ... ed. |
title_full_unstemmed | Mining text data Charu C. Aggarwal ... ed. |
title_short | Mining text data |
title_sort | mining text data |
topic | Text Mining (DE-588)4728093-1 gnd |
topic_facet | Text Mining Aufsatzsammlung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024896637&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT aggarwalcharuc miningtextdata |