Text mining: applications and theory
Format: | Book |
---|---|
Language: | English |
Published: | Chichester: Wiley, 2010 |
Edition: | 1. publ., reprint. |
Subjects: | |
Online access: | Table of contents |
Notes: | Bibliographical references |
Physical description: | XIV, 207 pp., ill., graphs |
ISBN: | 9780470749821 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV036661543 | ||
003 | DE-604 | ||
005 | 20120725 | ||
007 | t | ||
008 | 100909s2010 ad|| |||| 10||| eng d | ||
020 | |a 9780470749821 |9 978-0-470-74982-1 | ||
035 | |a (OCoLC)642944290 | ||
035 | |a (DE-599)BVBBV036661543 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-20 |a DE-1051 |a DE-706 |a DE-355 |a DE-739 |a DE-522 |a DE-N2 | ||
084 | |a QH 500 |0 (DE-625)141607: |2 rvk | ||
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
245 | 1 | 0 | |a Text mining |b applications and theory |c Michael W. Berry ; Jacob Kogan |
250 | |a 1. publ., reprint. | ||
264 | 1 | |a Chichester |b Wiley |c 2010 | |
300 | |a XIV, 207 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Literaturangaben | ||
650 | 0 | |a Data mining / Congresses | |
650 | 0 | |a Natural language processing (Computer science) / Congresses | |
650 | 0 | 7 | |a Text Mining |0 (DE-588)4728093-1 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)1071861417 |a Konferenzschrift |2 gnd-content | |
689 | 0 | 0 | |a Text Mining |0 (DE-588)4728093-1 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Berry, Michael W. |e Sonstige |0 (DE-588)128412976 |4 oth | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020580838&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-020580838 |
Record in the search index
_version_ | 1804143282079399936 |
---|---|
adam_text | Contents
List of Contributors xi
Preface xiii
PART I TEXT EXTRACTION, CLASSIFICATION, AND CLUSTERING
1 Automatic keyword extraction from individual documents 3
1.1 Introduction 3
1.1.1 Keyword extraction methods 4
1.2 Rapid automatic keyword extraction 5
1.2.1 Candidate keywords 6
1.2.2 Keyword scores 7
1.2.3 Adjoining keywords 8
1.2.4 Extracted keywords 8
1.3 Benchmark evaluation 9
1.3.1 Evaluating precision and recall 9
1.3.2 Evaluating efficiency 10
1.4 Stoplist generation 11
1.5 Evaluation on news articles 15
1.5.1 The MPQA Corpus 15
1.5.2 Extracting keywords from news articles 15
1.6 Summary 18
1.7 Acknowledgements 19
References 19
2 Algebraic techniques for multilingual document clustering 21
2.1 Introduction 21
2.2 Background 22
2.3 Experimental setup 23
2.4 Multilingual LSA 25
2.5 Tucker1 method 27
2.6 PARAFAC2 method 28
2.7 LSA with term alignments 29
2.8 Latent morpho-semantic analysis (LMSA) 32
2.9 LMSA with term alignments 33
2.10 Discussion of results and techniques 33
2.11 Acknowledgements 35
References 35
3 Content-based spam email classification using machine-learning algorithms 37
3.1 Introduction 37
3.2 Machine-learning algorithms 39
3.2.1 Naive Bayes 39
3.2.2 LogitBoost 40
3.2.3 Support vector machines 41
3.2.4 Augmented latent semantic indexing spaces 43
3.2.5 Radial basis function networks 44
3.3 Data preprocessing 45
3.3.1 Feature selection 45
3.3.2 Message representation 47
3.4 Evaluation of email classification 48
3.5 Experiments 49
3.5.1 Experiments with PU1 49
3.5.2 Experiments with ZH1 51
3.6 Characteristics of classifiers 53
3.7 Concluding remarks 54
3.8 Acknowledgements 55
References 55
4 Utilizing nonnegative matrix factorization for email classification problems 57
4.1 Introduction 57
4.1.1 Related work 59
4.1.2 Synopsis 60
4.2 Background 60
4.2.1 Nonnegative matrix factorization 60
4.2.2 Algorithms for computing NMF 61
4.2.3 Datasets 63
4.2.4 Interpretation 64
4.3 NMF initialization based on feature ranking 65
4.3.1 Feature subset selection 66
4.3.2 FS initialization 66
4.4 NMF-based classification methods 70
4.4.1 Classification using basis features 70
4.4.2 Generalizing LSI based on NMF 72
4.5 Conclusions 78
4.6 Acknowledgements 79
References 79
5 Constrained clustering with k-means type algorithms 81
5.1 Introduction 81
5.2 Notations and classical k-means 82
5.3 Constrained k-means with Bregman divergences 84
5.3.1 Quadratic k-means with cannot-link constraints 84
5.3.2 Elimination of must-link constraints 87
5.3.3 Clustering with Bregman divergences 89
5.4 Constrained smoka type clustering 92
5.5 Constrained spherical k-means 95
5.5.1 Spherical k-means with cannot-link constraints only 96
5.5.2 Spherical k-means with cannot-link and must-link constraints 98
5.6 Numerical experiments 99
5.6.1 Quadratic k-means 100
5.6.2 Spherical k-means 100
5.7 Conclusion 101
References 102
PART II ANOMALY AND TREND DETECTION 105
6 Survey of text visualization techniques 107
6.1 Visualization in text analysis 107
6.2 Tag clouds 108
6.3 Authorship and change tracking 110
6.4 Data exploration and the search for novel patterns 111
6.5 Sentiment tracking 111
6.6 Visual analytics and FutureLens 113
6.7 Scenario discovery 114
6.7.1 Scenarios 115
6.7.2 Evaluating solutions 115
6.8 Earlier prototype 116
6.9 Features of FutureLens 117
6.10 Scenario discovery example: bioterrorism 119
6.11 Scenario discovery example: drug trafficking 121
6.12 Future work 123
References 126
7 Adaptive threshold setting for novelty mining 129
7.1 Introduction 129
7.2 Adaptive threshold setting in novelty mining 131
7.2.1 Background 131
7.2.2 Motivation 132
7.2.3 Gaussian-based adaptive threshold setting 132
7.2.4 Implementation issues 137
7.3 Experimental study 138
7.3.1 Datasets 138
7.3.2 Working example 139
7.3.3 Experiments and results 142
7.4 Conclusion 146
References 147
8 Text mining and cybercrime 149
8.1 Introduction 149
8.2 Current research in Internet predation and cyberbullying 151
8.2.1 Capturing IM and IRC chat 151
8.2.2 Current collections for use in analysis 152
8.2.3 Analysis of IM and IRC chat 153
8.2.4 Internet predation detection 153
8.2.5 Cyberbullying detection 158
8.2.6 Legal issues 159
8.3 Commercial software for monitoring chat 159
8.4 Conclusions and future directions 161
8.5 Acknowledgements 162
References 162
PART III TEXT STREAMS 165
9 Events and trends in text streams 167
9.1 Introduction 167
9.2 Text streams 169
9.3 Feature extraction and data reduction 170
9.4 Event detection 171
9.5 Trend detection 174
9.6 Event and trend descriptions 176
9.7 Discussion 180
9.8 Summary 181
9.9 Acknowledgements 181
References
10 Embedding semantics in LDA topic models 183
10.1 Introduction 183
10.2 Background 184
10.2.1 Vector space modeling 184
10.2.2 Latent semantic analysis 185
10.2.3 Probabilistic latent semantic analysis 185
10.3 Latent Dirichlet allocation 186
10.3.1 Graphical model and generative process 187
10.3.2 Posterior inference 187
10.3.3 Online latent Dirichlet allocation (OLDA) 189
10.3.4 Illustrative example 191
10.4 Embedding external semantics from Wikipedia 193
10.4.1 Related Wikipedia articles 194
10.4.2 Wikipedia-influenced topic model 194
10.5 Data-driven semantic embedding 194
10.5.1 Generative process with data-driven semantic embedding 195
10.5.2 OLDA algorithm with data-driven semantic embedding 196
10.5.3 Experimental design 197
10.5.4 Experimental results 199
10.6 Related work 202
10.7 Conclusion and future work 202
References 203
Index 205
|
any_adam_object | 1 |
author_GND | (DE-588)128412976 |
building | Verbundindex |
bvnumber | BV036661543 |
classification_rvk | QH 500 ST 306 ST 530 |
ctrlnum | (OCoLC)642944290 (DE-599)BVBBV036661543 |
discipline | Informatik Wirtschaftswissenschaften |
edition | 1. publ., reprint. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01567nam a2200397 c 4500</leader><controlfield tag="001">BV036661543</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20120725 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">100909s2010 ad|| |||| 10||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780470749821</subfield><subfield code="9">978-0-470-74982-1</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)642944290</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV036661543</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-20</subfield><subfield code="a">DE-1051</subfield><subfield code="a">DE-706</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-522</subfield><subfield code="a">DE-N2</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">QH 500</subfield><subfield code="0">(DE-625)141607:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Text mining</subfield><subfield code="b">applications and theory</subfield><subfield code="c">Michael W. 
Berry ; Jacob Kogan</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. publ., reprint.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Chichester</subfield><subfield code="b">Wiley</subfield><subfield code="c">2010</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XIV, 207 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturangaben</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Data mining / Congresses</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Natural language processing (Computer science) / Congresses</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)1071861417</subfield><subfield code="a">Konferenzschrift</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Berry, Michael W.</subfield><subfield code="e">Sonstige</subfield><subfield code="0">(DE-588)128412976</subfield><subfield 
code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020580838&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-020580838</subfield></datafield></record></collection> |
genre | (DE-588)1071861417 Konferenzschrift gnd-content |
genre_facet | Konferenzschrift |
id | DE-604.BV036661543 |
illustrated | Illustrated |
indexdate | 2024-07-09T22:45:11Z |
institution | BVB |
isbn | 9780470749821 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-020580838 |
oclc_num | 642944290 |
open_access_boolean | |
owner | DE-20 DE-1051 DE-706 DE-355 DE-BY-UBR DE-739 DE-522 DE-N2 |
owner_facet | DE-20 DE-1051 DE-706 DE-355 DE-BY-UBR DE-739 DE-522 DE-N2 |
physical | XIV, 207 S. Ill., graph. Darst. |
publishDate | 2010 |
publishDateSearch | 2010 |
publishDateSort | 2010 |
publisher | Wiley |
record_format | marc |
spelling | Text mining applications and theory Michael W. Berry ; Jacob Kogan 1. publ., reprint. Chichester Wiley 2010 XIV, 207 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Literaturangaben Data mining / Congresses Natural language processing (Computer science) / Congresses Text Mining (DE-588)4728093-1 gnd rswk-swf (DE-588)1071861417 Konferenzschrift gnd-content Text Mining (DE-588)4728093-1 s DE-604 Berry, Michael W. Sonstige (DE-588)128412976 oth Digitalisierung UB Regensburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020580838&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Text mining applications and theory Data mining / Congresses Natural language processing (Computer science) / Congresses Text Mining (DE-588)4728093-1 gnd |
subject_GND | (DE-588)4728093-1 (DE-588)1071861417 |
title | Text mining applications and theory |
title_auth | Text mining applications and theory |
title_exact_search | Text mining applications and theory |
title_full | Text mining applications and theory Michael W. Berry ; Jacob Kogan |
title_fullStr | Text mining applications and theory Michael W. Berry ; Jacob Kogan |
title_full_unstemmed | Text mining applications and theory Michael W. Berry ; Jacob Kogan |
title_short | Text mining |
title_sort | text mining applications and theory |
title_sub | applications and theory |
topic | Data mining / Congresses Natural language processing (Computer science) / Congresses Text Mining (DE-588)4728093-1 gnd |
topic_facet | Data mining / Congresses Natural language processing (Computer science) / Congresses Text Mining Konferenzschrift |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020580838&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT berrymichaelw textminingapplicationsandtheory |