Fundamentals of Predictive Text Mining
Main authors: | Weiss, Sholom M.; Indurkhya, Nitin; Zhang, Tong |
---|---|
Format: | Book |
Language: | English |
Published: | London: Springer, [2015] |
Edition: | Second edition |
Series: | Texts in computer science |
Subjects: | Text Mining |
Online access: | Content description, Table of contents |
Description: | xiii, 239 pages, illustrations |
ISBN: | 9781447167495 |
Internal format: MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV043008405 | ||
003 | DE-604 | ||
005 | 20220912 | ||
007 | t | ||
008 | 151116s2015 xxka||| |||| 00||| eng d | ||
016 | 7 | |a 107329725X |2 DE-101 | |
020 | |a 9781447167495 |9 978-1-4471-6749-5 | ||
035 | |a (OCoLC)927919640 | ||
035 | |a (DE-599)DNB107329725X | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
044 | |a xxk |c XA-GB | ||
049 | |a DE-739 |a DE-N2 |a DE-11 | ||
082 | 0 | |a 004 |2 23 | |
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
084 | |a 004 |2 sdnb | ||
100 | 1 | |a Weiss, Sholom M. |e Verfasser |0 (DE-588)14257547X |4 aut | |
245 | 1 | 0 | |a Fundamentals of Predictive Text Mining |c Sholom M. Weiss, Nitin Indurkhya, Tong Zhang |
250 | |a Second edition | ||
264 | 1 | |a London |b Springer |c [2015] | |
300 | |a xiii, 239 Seiten |b Illustrationen | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Texts in computer science | |
650 | 0 | 7 | |a Text Mining |0 (DE-588)4728093-1 |2 gnd |9 rswk-swf |
653 | |a Upper undergraduate | ||
653 | |a Text Mining | ||
653 | |a Machine Learning | ||
653 | |a Information Extraction | ||
653 | |a Document Classification | ||
653 | |a Information Retrieval | ||
689 | 0 | 0 | |a Text Mining |0 (DE-588)4728093-1 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Indurkhya, Nitin |e Verfasser |0 (DE-588)142575666 |4 aut | |
700 | 1 | |a Zhang, Tong |d 1971- |e Verfasser |0 (DE-588)142575852 |4 aut | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-4471-6750-1 |
856 | 4 | 2 | |m X:MVB |q text/html |u http://deposit.dnb.de/cgi-bin/dokserv?id=5302335&prov=M&dok_var=1&dok_ext=htm |3 Inhaltstext |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028433419&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-028433419 |
Record in search index
_version_ | 1806332805499584512 |
---|---|
adam_text |
Contents
1 Overview of Text Mining. 1
1.1 What’s Special About Text Mining?. 1
1.1.1 Structured or Unstructured Data?. 2
1.1.2 Is Text Different from Numbers?. 3
1.2 What Types of Problems Can Be Solved?. 5
1.3 Document Classification. 6
1.4 Information Retrieval. 6
1.5 Clustering and Organizing Documents. 7
1.6 Information Extraction. 8
1.7 Prediction and Evaluation. 9
1.8 The Next Chapters. 10
1.9 Summary. 11
1.10 Historical and Bibliographical Remarks. 11
1.11 Questions and Exercises. 12
2 From Textual Information to Numerical Vectors. 13
2.1 Collecting Documents. 13
2.2 Document Standardization. 15
2.3 Tokenization. 17
2.4 Lemmatization. 19
2.4.1 Inflectional Stemming. 19
2.4.2 Stemming to a Root. 21
2.5 Vector Generation for Prediction. 21
2.5.1 Multiword Features. 26
2.5.2 Labels for the Right Answers. 29
2.5.3 Feature Selection by Attribute Ranking. 29
2.6 Sentence Boundary Determination. 30
2.7 Part-of-Speech Tagging. 30
2.8 Word Sense Disambiguation. 32
2.9 Phrase Recognition. 33
2.10 Named Entity Recognition. 33
2.11 Parsing. 34
2.12 Feature Generation. 35
2.13 Summary. 37
2.14 Historical and Bibliographical Remarks. 37
2.15 Questions and Exercises. 39
3 Using Text for Prediction. 41
3.1 Recognizing that Documents Fit a Pattern. 43
3.2 How Many Documents Are Enough?. 44
3.3 Document Classification. 45
3.4 Learning to Predict from Text. 46
3.4.1 Similarity and Nearest-Neighbor Methods. 47
3.4.2 Document Similarity. 48
3.4.3 Decision Rules. 50
3.4.4 Decision Trees. 56
3.4.5 Scoring by Probabilities. 57
3.4.6 Linear Scoring Methods. 60
3.5 Evaluation of Performance. 69
3.5.1 Estimating Current and Future Performance. 69
3.5.2 Getting the Most from a Learning Method. 71
3.5.3 Errors and Pitfalls in Big Data Evaluation. 72
3.6 Applications. 74
3.7 Graph Models for Social Networks. 74
3.8 Summary. 76
3.9 Historical and Bibliographical Remarks. 77
3.10 Questions and Exercises. 79
4 Information Retrieval and Text Mining. 81
4.1 Is Information Retrieval a Form of Text Mining?. 81
4.2 Key Word Search. 82
4.3 Nearest-Neighbor Methods. 83
4.4 Measuring Similarity. 84
4.4.1 Shared Word Count. 84
4.4.2 Word Count and Bonus. 85
4.4.3 Cosine Similarity. 86
4.5 Web-Based Document Search. 87
4.5.1 Link Analysis. 88
4.6 Document Matching. 91
4.7 Inverted Lists. 92
4.8 Evaluation of Performance. 93
4.9 Summary. 94
4.10 Historical and Bibliographical Remarks. 95
4.11 Questions and Exercises. 95
5 Finding Structure in a Document Collection. 97
5.1 Clustering Documents by Similarity. 99
5.2 Similarity of Composite Documents. 100
5.2.1 k-Means Clustering. 102
5.2.2 Hierarchical Clustering. 106
5.2.3 The EM Algorithm. 108
5.3 What Do a Cluster’s Labels Mean?. 112
5.4 Applications. 113
5.5 Evaluation of Performance. 114
5.6 Summary. 116
5.7 Historical and Bibliographical Remarks. 116
5.8 Questions and Exercises. 118
6 Looking for Information in Documents. 119
6.1 Goals of Information Extraction. 119
6.2 Finding Patterns and Entities from Text. 121
6.2.1 Entity Extraction as Sequential Tagging. 122
6.2.2 Tag Prediction as Classification. 123
6.2.3 The Maximum Entropy Method. 124
6.2.4 Linguistic Features and Encoding. 129
6.2.5 Local Sequence Prediction Models. 130
6.2.6 Global Sequence Prediction Models. 134
6.3 Coreference and Relationship Extraction. 135
6.3.1 Coreference Resolution. 135
6.3.2 Relationship Extraction. 138
6.4 Template Filling and Database Construction. 139
6.5 Applications. 140
6.5.1 Information Retrieval. 140
6.5.2 Commercial Extraction Systems. 140
6.5.3 Criminal Justice. 141
6.5.4 Intelligence. 142
6.6 Summary. 143
6.7 Historical and Bibliographical Remarks. 143
6.8 Questions and Exercises. 145
7 Data Sources for Prediction: Databases, Hybrid Data and the Web. 147
7.1 Ideal Models of Data. 147
7.1.1 Ideal Data for Prediction. 147
7.1.2 Ideal Data for Text and Unstructured Data. 148
7.1.3 Hybrid and Mixed Data. 148
7.2 Practical Data Sourcing. 150
7.3 Prototypical Examples. 151
7.3.1 Web-Based Spreadsheet Data. 152
7.3.2 Web-Based XML Data. 152
7.3.3 Opinion Data and Sentiment Analysis. 153
7.4 Hybrid Example: Independent Sources of Numerical and Text Data. 158
7.5 Mixed Data in Standard Table Format. 159
7.6 Summary. 160
7.7 Historical and Bibliographical Remarks. 162
7.8 Questions and Exercises. 162
8 Case Studies. 165
8.1 Market Intelligence from the Web. 165
8.1.1 The Problem. 165
8.1.2 Solution Overview. 166
8.1.3 Methods and Procedures. 167
8.1.4 System Deployment. 168
8.2 Lightweight Document Matching for Digital Libraries. 170
8.2.1 The Problem. 170
8.2.2 Solution Overview. 170
8.2.3 Methods and Procedures. 171
8.2.4 System Deployment. 173
8.3 Generating Model Cases for Help Desk Applications. 173
8.3.1 The Problem. 173
8.3.2 Solution Overview. 174
8.3.3 Methods and Procedures. 174
8.3.4 System Deployment. 176
8.4 Assigning Topics to News Articles. 177
8.4.1 The Problem. 177
8.4.2 Solution Overview. 177
8.4.3 Methods and Procedures. 178
8.4.4 System Deployment. 182
8.5 E-mail Filtering. 182
8.5.1 The Problem. 182
8.5.2 Solution Overview. 183
8.5.3 Methods and Procedures. 184
8.5.4 System Deployment. 185
8.6 Search Engines. 186
8.6.1 The Problem. 186
8.6.2 Solution Overview. 186
8.6.3 Methods and Procedures. 187
8.6.4 System Deployment. 188
8.7 Extracting Named Entities from Documents. 190
8.7.1 The Problem. 190
8.7.2 Solution Overview. 190
8.7.3 Methods and Procedures. 191
8.7.4 System Deployment. 193
8.8 Mining Social Media. 194
8.8.1 The Problem. 194
8.8.2 Solution Overview. 195
8.8.3 Methods and Procedures. 196
8.8.4 System Deployment. 197
8.9 Customized Newspapers. 197
8.9.1 The Problem. 197
8.9.2 Solution Overview. 198
8.9.3 Methods and Procedures. 198
8.9.4 System Deployment. 199
8.10 Summary. 200
8.11 Historical and Bibliographical Remarks. 200
8.12 Questions and Exercises. 201
9 Emerging Directions. 203
9.1 Summarization. 203
9.2 Active Learning. 206
9.3 Learning with Unlabeled Data. 207
9.4 Different Ways of Collecting Samples. 208
9.4.1 Ensembles and Voting Methods. 208
9.4.2 Online Learning. 210
9.4.3 Deep Learning. 211
9.4.4 Cost-Sensitive Learning. 214
9.4.5 Unbalanced Samples and Rare Events. 214
9.5 Distributed Text Mining. 215
9.6 Learning to Rank. 217
9.7 Question Answering. 218
9.8 Summary.
9.9 Historical and Bibliographical Remarks. 219
9.10 Questions and Exercises. 222
References.
Author Index. 251
Subject Index. |
any_adam_object | 1 |
author | Weiss, Sholom M. Indurkhya, Nitin Zhang, Tong 1971- |
author_GND | (DE-588)14257547X (DE-588)142575666 (DE-588)142575852 |
author_facet | Weiss, Sholom M. Indurkhya, Nitin Zhang, Tong 1971- |
author_role | aut aut aut |
author_sort | Weiss, Sholom M. |
author_variant | s m w sm smw n i ni t z tz |
building | Verbundindex |
bvnumber | BV043008405 |
classification_rvk | ST 302 ST 530 ST 306 |
ctrlnum | (OCoLC)927919640 (DE-599)DNB107329725X |
dewey-full | 004 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 004 - Computer science |
dewey-raw | 004 |
dewey-search | 004 |
dewey-sort | 14 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
edition | Second edition |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV043008405</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220912</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">151116s2015 xxka||| |||| 00||| eng d</controlfield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">107329725X</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781447167495</subfield><subfield code="9">978-1-4471-6749-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)927919640</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DNB107329725X</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxk</subfield><subfield code="c">XA-GB</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield><subfield code="a">DE-N2</subfield><subfield code="a">DE-11</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">004</subfield><subfield code="2">23</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 302</subfield><subfield code="0">(DE-625)143652:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield 
code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">004</subfield><subfield code="2">sdnb</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Weiss, Sholom M.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)14257547X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Fundamentals of Predictive Text Mining</subfield><subfield code="c">Sholom M. Weiss, Nitin Indurkhya, Tong Zhang</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">Second edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">London</subfield><subfield code="b">Springer</subfield><subfield code="c">[2015]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xiii, 239 Seiten</subfield><subfield code="b">Illustrationen</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Texts in computer science</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="653" ind1=" " ind2=" "><subfield code="a">Upper undergraduate</subfield></datafield><datafield tag="653" ind1=" " ind2=" "><subfield code="a">Text Mining</subfield></datafield><datafield tag="653" ind1=" " ind2=" "><subfield code="a">Machine Learning</subfield></datafield><datafield tag="653" ind1=" " ind2=" "><subfield 
code="a">Information Extraction</subfield></datafield><datafield tag="653" ind1=" " ind2=" "><subfield code="a">Document Classification</subfield></datafield><datafield tag="653" ind1=" " ind2=" "><subfield code="a">Information Retrieval</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Indurkhya, Nitin</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)142575666</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Zhang, Tong</subfield><subfield code="d">1971-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)142575852</subfield><subfield code="4">aut</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-4471-6750-1</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">X:MVB</subfield><subfield code="q">text/html</subfield><subfield code="u">http://deposit.dnb.de/cgi-bin/dokserv?id=5302335&prov=M&dok_var=1&dok_ext=htm</subfield><subfield code="3">Inhaltstext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028433419&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield 
code="a">oai:aleph.bib-bvb.de:BVB01-028433419</subfield></datafield></record></collection> |
id | DE-604.BV043008405 |
illustrated | Illustrated |
indexdate | 2024-08-03T02:46:43Z |
institution | BVB |
isbn | 9781447167495 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-028433419 |
oclc_num | 927919640 |
open_access_boolean | |
owner | DE-739 DE-N2 DE-11 |
owner_facet | DE-739 DE-N2 DE-11 |
physical | xiii, 239 Seiten Illustrationen |
publishDate | 2015 |
publishDateSearch | 2015 |
publishDateSort | 2015 |
publisher | Springer |
record_format | marc |
series2 | Texts in computer science |
spelling | Weiss, Sholom M. Verfasser (DE-588)14257547X aut Fundamentals of Predictive Text Mining Sholom M. Weiss, Nitin Indurkhya, Tong Zhang Second edition London Springer [2015] xiii, 239 Seiten Illustrationen txt rdacontent n rdamedia nc rdacarrier Texts in computer science Text Mining (DE-588)4728093-1 gnd rswk-swf Upper undergraduate Text Mining Machine Learning Information Extraction Document Classification Information Retrieval Text Mining (DE-588)4728093-1 s DE-604 Indurkhya, Nitin Verfasser (DE-588)142575666 aut Zhang, Tong 1971- Verfasser (DE-588)142575852 aut Erscheint auch als Online-Ausgabe 978-1-4471-6750-1 X:MVB text/html http://deposit.dnb.de/cgi-bin/dokserv?id=5302335&prov=M&dok_var=1&dok_ext=htm Inhaltstext Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028433419&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Weiss, Sholom M. Indurkhya, Nitin Zhang, Tong 1971- Fundamentals of Predictive Text Mining Text Mining (DE-588)4728093-1 gnd |
subject_GND | (DE-588)4728093-1 |
title | Fundamentals of Predictive Text Mining |
title_auth | Fundamentals of Predictive Text Mining |
title_exact_search | Fundamentals of Predictive Text Mining |
title_full | Fundamentals of Predictive Text Mining Sholom M. Weiss, Nitin Indurkhya, Tong Zhang |
title_fullStr | Fundamentals of Predictive Text Mining Sholom M. Weiss, Nitin Indurkhya, Tong Zhang |
title_full_unstemmed | Fundamentals of Predictive Text Mining Sholom M. Weiss, Nitin Indurkhya, Tong Zhang |
title_short | Fundamentals of Predictive Text Mining |
title_sort | fundamentals of predictive text mining |
topic | Text Mining (DE-588)4728093-1 gnd |
topic_facet | Text Mining |
url | http://deposit.dnb.de/cgi-bin/dokserv?id=5302335&prov=M&dok_var=1&dok_ext=htm http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=028433419&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT weisssholomm fundamentalsofpredictivetextmining AT indurkhyanitin fundamentalsofpredictivetextmining AT zhangtong fundamentalsofpredictivetextmining |