Fundamentals of predictive text mining
Main Authors: | Weiss, Sholom M. ; Indurkhya, Nitin ; Zhang, Tong |
---|---|
Format: | Book |
Language: | English |
Published: | London: Springer, 2010 |
Series: | Texts in computer science |
Subjects: | Text Mining |
Online Access: | Content description, Table of contents |
Description: | XIII, 226 pages, illustrations, diagrams |
ISBN: | 9781849962254 |
Internal format
MARC
Tag | Ind1 | Ind2 | Content |
---|---|---|---|
LEADER | | | 00000nam a2200000 c 4500 |
001 | | | BV036474443 |
003 | | | DE-604 |
005 | | | 20140908 |
007 | | | t |
008 | | | 100528s2010 ad|| |||| 00||| eng d |
015 | | | $a 10,N10 $2 dnb |
016 | 7 | | $a 1000569438 $2 DE-101 |
020 | | | $a 9781849962254 $c GB. : ca. EUR 58.80 (freier Pr.), ca. sfr 85.50 (freier Pr.) $9 978-1-84996-225-4 |
024 | 3 | | $a 9781849962254 |
028 | 5 | 2 | $a 12727506 |
035 | | | $a (OCoLC)699886961 |
035 | | | $a (DE-599)DNB1000569438 |
040 | | | $a DE-604 $b ger $e rakddb |
041 | 0 | | $a eng |
049 | | | $a DE-20 $a DE-11 $a DE-355 $a DE-739 $a DE-945 |
084 | | | $a ST 302 $0 (DE-625)143652: $2 rvk |
084 | | | $a ST 306 $0 (DE-625)143654: $2 rvk |
084 | | | $a 004 $2 sdnb |
100 | 1 | | $a Weiss, Sholom M. $e Verfasser $0 (DE-588)14257547X $4 aut |
245 | 1 | 0 | $a Fundamentals of predictive text mining $c Sholom M. Weiss ; Nitin Indurkhya ; Tong Zhang |
264 | | 1 | $a London $b Springer $c 2010 |
300 | | | $a XIII, 226 S. $b Ill., graph. Darst. |
336 | | | $b txt $2 rdacontent |
337 | | | $b n $2 rdamedia |
338 | | | $b nc $2 rdacarrier |
490 | 0 | | $a Texts in computer science |
650 | 0 | 7 | $a Text Mining $0 (DE-588)4728093-1 $2 gnd $9 rswk-swf |
689 | 0 | 0 | $a Text Mining $0 (DE-588)4728093-1 $D s |
689 | 0 | | $5 DE-604 |
700 | 1 | | $a Indurkhya, Nitin $e Verfasser $0 (DE-588)142575666 $4 aut |
700 | 1 | | $a Zhang, Tong $d 1971- $e Verfasser $0 (DE-588)142575852 $4 aut |
776 | 0 | 8 | $i Erscheint auch als $n Online-Ausgabe $z 978-1-84996-226-1 |
856 | 4 | 2 | $q text/html $u http://deposit.dnb.de/cgi-bin/dokserv?id=3435250&prov=M&dok_var=1&dok_ext=htm $3 Inhaltstext |
856 | 4 | 2 | $m Digitalisierung UB Passau - ADAM Catalogue Enrichment $q application/pdf $u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020346071&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA $3 Inhaltsverzeichnis |
943 | 1 | | $a oai:aleph.bib-bvb.de:BVB01-020346071 |
Record in the search index
_version_ | 1805094096822337536 |
---|---|
adam_text |
Contents
1 Overview of Text Mining . 1
1.1 What's Special About Text Mining? . 1
1.1.1 Structured or Unstructured Data? . 2
1.1.2 Is Text Different from Numbers? . 3
1.2 What Types of Problems Can Be Solved? . 5
1.3 Document Classification . 6
1.4 Information Retrieval . 6
1.5 Clustering and Organizing Documents . 7
1.6 Information Extraction . 8
1.7 Prediction and Evaluation . 9
1.8 The Next Chapters . 10
1.9 Summary . 10
1.10 Historical and Bibliographical Remarks . 11
1.11 Questions and Exercises . 12
2 From Textual Information to Numerical Vectors . 13
2.1 Collecting Documents . 13
2.2 Document Standardization . 15
2.3 Tokenization . 16
2.4 Lemmatization . 17
2.4.1 Inflectional Stemming . 19
2.4.2 Stemming to a Root . 19
2.5 Vector Generation for Prediction . 21
2.5.1 Multiword Features . 26
2.5.2 Labels for the Right Answers . 28
2.5.3 Feature Selection by Attribute Ranking . 29
2.6 Sentence Boundary Determination . 29
2.7 Part-of-Speech Tagging . 31
2.8 Word Sense Disambiguation . 32
2.9 Phrase Recognition . 32
2.10 Named Entity Recognition . 33
2.11 Parsing . 33
2.12 Feature Generation . 35
2.13 Summary . 36
2.14 Historical and Bibliographical Remarks . 36
2.15 Questions and Exercises . 38
3 Using Text for Prediction . 39
3.1 Recognizing that Documents Fit a Pattern . 41
3.2 How Many Documents Are Enough? . 42
3.3 Document Classification . 43
3.4 Learning to Predict from Text . 44
3.4.1 Similarity and Nearest-Neighbor Methods . 45
3.4.2 Document Similarity . 46
3.4.3 Decision Rules . 48
3.4.4 Decision Trees . 54
3.4.5 Scoring by Probabilities . 55
3.4.6 Linear Scoring Methods . 58
3.5 Evaluation of Performance . 66
3.5.1 Estimating Current and Future Performance . 66
3.5.2 Getting the Most from a Learning Method . 69
3.6 Applications . 69
3.7 Summary . 70
3.8 Historical and Bibliographical Remarks . 70
3.9 Questions and Exercises . 72
4 Information Retrieval and Text Mining . 75
4.1 Is Information Retrieval a Form of Text Mining? . 75
4.2 Key Word Search . 76
4.3 Nearest-Neighbor Methods . 77
4.4 Measuring Similarity . 78
4.4.1 Shared Word Count . 78
4.4.2 Word Count and Bonus . 78
4.4.3 Cosine Similarity . 79
4.5 Web-based Document Search . 80
4.5.1 Link Analysis . 81
4.6 Document Matching . 85
4.7 Inverted Lists . 85
4.8 Evaluation of Performance . 87
4.9 Summary . 88
4.10 Historical and Bibliographical Remarks . 88
4.11 Questions and Exercises . 89
5 Finding Structure in a Document Collection . 91
5.1 Clustering Documents by Similarity . 93
5.2 Similarity of Composite Documents . 94
5.2.1 k-Means Clustering . 96
5.2.2 Hierarchical Clustering . 99
5.2.3 The EM Algorithm . 102
5.3 What Do a Cluster's Labels Mean? . 105
5.4 Applications . 107
5.5 Evaluation of Performance . 108
5.6 Summary . 110
5.7 Historical and Bibliographical Remarks . 110
5.8 Questions and Exercises . 111
6 Looking for Information in Documents . 113
6.1 Goals of Information Extraction . 113
6.2 Finding Patterns and Entities from Text . 115
6.2.1 Entity Extraction as Sequential Tagging . 116
6.2.2 Tag Prediction as Classification . 117
6.2.3 The Maximum Entropy Method . 118
6.2.4 Linguistic Features and Encoding . 123
6.2.5 Local Sequence Prediction Models . 124
6.2.6 Global Sequence Prediction Models . 128
6.3 Coreference and Relationship Extraction . 129
6.3.1 Coreference Resolution . 129
6.3.2 Relationship Extraction . 131
6.4 Template Filling and Database Construction . 132
6.5 Applications . 133
6.5.1 Information Retrieval . 133
6.5.2 Commercial Extraction Systems . 134
6.5.3 Criminal Justice . 135
6.5.4 Intelligence . 135
6.6 Summary . 136
6.7 Historical and Bibliographical Remarks . 137
6.8 Questions and Exercises . 138
7 Data Sources for Prediction: Databases, Hybrid Data and the Web . 141
7.1 Ideal Models of Data . 141
7.1.1 Ideal Data for Prediction . 141
7.1.2 Ideal Data for Text and Unstructured Data . 142
7.1.3 Hybrid and Mixed Data . 142
7.2 Practical Data Sourcing . 144
7.3 Prototypical Examples . 145
7.3.1 Web-based Spreadsheet Data . 146
7.3.2 Web-based XML Data . 146
7.3.3 Opinion Data and Sentiment Analysis . 148
7.4 Hybrid Example: Independent Sources of Numerical and Text Data . 151
7.5 Mixed Data in Standard Table Format . 152
7.6 Summary . 153
7.7 Historical and Bibliographical Remarks . 154
7.8 Questions and Exercises . 154
8 Case Studies . 157
8.1 Market Intelligence from the Web . 157
8.1.1 The Problem . 157
8.1.2 Solution Overview . 158
8.1.3 Methods and Procedures . 159
8.1.4 System Deployment . 160
8.2 Lightweight Document Matching for Digital Libraries . 161
8.2.1 The Problem . 161
8.2.2 Solution Overview . 162
8.2.3 Methods and Procedures . 163
8.2.4 System Deployment . 164
8.3 Generating Model Cases for Help Desk Applications . 165
8.3.1 The Problem . 165
8.3.2 Solution Overview . 165
8.3.3 Methods and Procedures . 166
8.3.4 System Deployment . 168
8.4 Assigning Topics to News Articles . 169
8.4.1 The Problem . 169
8.4.2 Solution Overview . 169
8.4.3 Methods and Procedures . 169
8.4.4 System Deployment . 173
8.5 E-mail Filtering . 174
8.5.1 The Problem . 174
8.5.2 Solution Overview . 174
8.5.3 Methods and Procedures . 175
8.5.4 System Deployment . 177
8.6 Search Engines . 177
8.6.1 The Problem . 177
8.6.2 Solution Overview . 177
8.6.3 Methods and Procedures . 178
8.6.4 System Deployment . 179
8.7 Extracting Named Entities from Documents . 181
8.7.1 The Problem . 181
8.7.2 Solution Overview . 181
8.7.3 Methods and Procedures . 182
8.7.4 System Deployment . 184
8.8 Customized Newspapers . 184
8.8.1 The Problem . 184
8.8.2 Solution Overview . 185
8.8.3 Methods and Procedures . 186
8.8.4 System Deployment . 187
8.9 Summary . 187
8.10 Historical and Bibliographical Remarks . 188
8.11 Questions and Exercises . 188
9 Emerging Directions . 189
9.1 Summarization . 189
9.2 Active Learning . 192
9.3 Learning with Unlabeled Data . 193
9.4 Different Ways of Collecting Samples . 194
9.4.1 Ensembles and Voting Methods . 194
9.4.2 Online Learning . 196
9.4.3 Cost-Sensitive Learning . 197
9.4.4 Unbalanced Samples and Rare Events . 198
9.5 Distributed Text Mining . 198
9.6 Learning to Rank . 200
9.7 Question Answering . 201
9.8 Summary . 202
9.9 Historical and Bibliographical Remarks . 203
9.10 Questions and Exercises . 204
A Software Notes . 207
A.1 Summary of Software . 207
A.2 Requirements . 208
A.3 Download Instructions . 208
References . 211
Author Index . 219
Subject Index . 223 |
any_adam_object | 1 |
author | Weiss, Sholom M. Indurkhya, Nitin Zhang, Tong 1971- |
author_GND | (DE-588)14257547X (DE-588)142575666 (DE-588)142575852 |
author_facet | Weiss, Sholom M. Indurkhya, Nitin Zhang, Tong 1971- |
author_role | aut aut aut |
author_sort | Weiss, Sholom M. |
author_variant | s m w sm smw n i ni t z tz |
building | Verbundindex |
bvnumber | BV036474443 |
classification_rvk | ST 302 ST 306 |
ctrlnum | (OCoLC)699886961 (DE-599)DNB1000569438 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV036474443</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20140908</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">100528s2010 ad|| |||| 00||| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">10,N10</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">1000569438</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781849962254</subfield><subfield code="c">GB. : ca. EUR 58.80 (freier Pr.), ca. sfr 85.50 (freier Pr.)</subfield><subfield code="9">978-1-84996-225-4</subfield></datafield><datafield tag="024" ind1="3" ind2=" "><subfield code="a">9781849962254</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">12727506</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)699886961</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DNB1000569438</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-20</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-945</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 302</subfield><subfield code="0">(DE-625)143652:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield 
code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">004</subfield><subfield code="2">sdnb</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Weiss, Sholom M.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)14257547X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Fundamentals of predictive text mining</subfield><subfield code="c">Sholom M. Weiss ; Nitin Indurkhya ; Tong Zhang</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">London</subfield><subfield code="b">Springer</subfield><subfield code="c">2010</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XIII, 226 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Texts in computer science</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Indurkhya, Nitin</subfield><subfield code="e">Verfasser</subfield><subfield 
code="0">(DE-588)142575666</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Zhang, Tong</subfield><subfield code="d">1971-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)142575852</subfield><subfield code="4">aut</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-84996-226-1</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="q">text/html</subfield><subfield code="u">http://deposit.dnb.de/cgi-bin/dokserv?id=3435250&amp;prov=M&amp;dok_var=1&amp;dok_ext=htm</subfield><subfield code="3">Inhaltstext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=020346071&amp;sequence=000002&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-020346071</subfield></datafield></record></collection> |
id | DE-604.BV036474443 |
illustrated | Illustrated |
indexdate | 2024-07-20T10:37:58Z |
institution | BVB |
isbn | 9781849962254 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-020346071 |
oclc_num | 699886961 |
open_access_boolean | |
owner | DE-20 DE-11 DE-355 DE-BY-UBR DE-739 DE-945 |
owner_facet | DE-20 DE-11 DE-355 DE-BY-UBR DE-739 DE-945 |
physical | XIII, 226 S. Ill., graph. Darst. |
publishDate | 2010 |
publishDateSearch | 2010 |
publishDateSort | 2010 |
publisher | Springer |
record_format | marc |
series2 | Texts in computer science |
spelling | Weiss, Sholom M. Verfasser (DE-588)14257547X aut Fundamentals of predictive text mining Sholom M. Weiss ; Nitin Indurkhya ; Tong Zhang London Springer 2010 XIII, 226 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Texts in computer science Text Mining (DE-588)4728093-1 gnd rswk-swf Text Mining (DE-588)4728093-1 s DE-604 Indurkhya, Nitin Verfasser (DE-588)142575666 aut Zhang, Tong 1971- Verfasser (DE-588)142575852 aut Erscheint auch als Online-Ausgabe 978-1-84996-226-1 text/html http://deposit.dnb.de/cgi-bin/dokserv?id=3435250&prov=M&dok_var=1&dok_ext=htm Inhaltstext Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020346071&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Weiss, Sholom M. Indurkhya, Nitin Zhang, Tong 1971- Fundamentals of predictive text mining Text Mining (DE-588)4728093-1 gnd |
subject_GND | (DE-588)4728093-1 |
title | Fundamentals of predictive text mining |
title_auth | Fundamentals of predictive text mining |
title_exact_search | Fundamentals of predictive text mining |
title_full | Fundamentals of predictive text mining Sholom M. Weiss ; Nitin Indurkhya ; Tong Zhang |
title_fullStr | Fundamentals of predictive text mining Sholom M. Weiss ; Nitin Indurkhya ; Tong Zhang |
title_full_unstemmed | Fundamentals of predictive text mining Sholom M. Weiss ; Nitin Indurkhya ; Tong Zhang |
title_short | Fundamentals of predictive text mining |
title_sort | fundamentals of predictive text mining |
topic | Text Mining (DE-588)4728093-1 gnd |
topic_facet | Text Mining |
url | http://deposit.dnb.de/cgi-bin/dokserv?id=3435250&prov=M&dok_var=1&dok_ext=htm http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020346071&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT weisssholomm fundamentalsofpredictivetextmining AT indurkhyanitin fundamentalsofpredictivetextmining AT zhangtong fundamentalsofpredictivetextmining |