Availability: Taming text :: THWS Bibkatalog

Taming text: how to find, organize, and manipulate it

Bibliographic details
Main authors:	Ingersoll, Grant S. (author), Morton, Thomas S. (author), Farris, Andrew L. (author)
Format:	Book
Language:	English
Published:	Shelter Island, NY: Manning, 2013
Subjects:	Text Mining
Online access:	Table of contents
Description:	XXI, 298 pp.; ill., diagrams; 24 cm
ISBN:	9781933988382 193398838X

Internal format

MARC


LEADER	00000nam a2200000 c 4500
001	BV037187785
003	DE-604
005	20211027
007	t
008	110127s2013 ad|| |||| 00||| eng d
020			|a 9781933988382 |c pbk. |9 978-1-933988-38-2
020			|a 193398838X |9 1-933988-38-X
035			|a (OCoLC)706990107
035			|a (DE-599)BVBBV037187785
040			|a DE-604 |b ger |e rakwb
041	0		|a eng
049			|a DE-19 |a DE-523 |a DE-188 |a DE-739 |a DE-B768 |a DE-210
084			|a ST 306 |0 (DE-625)143654: |2 rvk
084			|a ST 350 |0 (DE-625)143667: |2 rvk
100	1		|a Ingersoll, Grant S. |e Verfasser |4 aut
245	1	0	|a Taming text |b how to find, organize, and manipulate it |c Grant S. Ingersoll ; Thomas S. Morton ; Andrew L. Farris
264		1	|a Shelter Island, NY |b Manning |c 2013
300			|a XXI, 298 S. |b Ill., graph. Darst. |c 24 cm
336			|b txt |2 rdacontent
337			|b n |2 rdamedia
338			|b nc |2 rdacarrier
650	0	7	|a Text Mining |0 (DE-588)4728093-1 |2 gnd |9 rswk-swf
689	0	0	|a Text Mining |0 (DE-588)4728093-1 |D s
689	0		|5 DE-604
700	1		|a Morton, Thomas S. |e Verfasser |4 aut
700	1		|a Farris, Andrew L. |e Verfasser |4 aut
856	4	2	|m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021102273&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999			|a oai:aleph.bib-bvb.de:BVB01-021102273

Record in search index

_version_	1804143772419751936
adam_text	Title: Taming text. Author: Ingersoll, Grant S. Year: 2013.
Contents: foreword; preface; acknowledgments; about this book; about the cover illustration
1 Getting started taming text: 1.1 Why taming text is important; 1.2 Preview: A fact-based, question answering system (Hello, Dr. Frankenstein); 1.3 Understanding text is hard; 1.4 Text, tamed; 1.5 Text and the intelligent app: search and beyond (Searching and matching * Extracting information * Grouping information * An intelligent application); 1.6 Summary; 1.7 Resources
2 Foundations of taming text: 2.1 Foundations of language (Words and their categories * Phrases and clauses * Morphology); 2.2 Common tools for text processing (String manipulation tools * Tokens and tokenization * Part of speech assignment * Stemming * Sentence detection * Parsing and grammar * Sequence modeling); 2.3 Preprocessing and extracting content from common file formats (The importance of preprocessing * Extracting content using Apache Tika); 2.4 Summary; 2.5 Resources
3 Searching: 3.1 Search and faceting example: Amazon.com; 3.2 Introduction to search concepts (Indexing content * User input * Ranking documents with the vector space model * Results display); 3.3 Introducing the Apache Solr search server (Running Solr for the first time * Understanding Solr concepts); 3.4 Indexing content with Apache Solr (Indexing using XML * Extracting and indexing content using Solr and Apache Tika); 3.5 Searching content with Apache Solr (Solr query input parameters * Faceting on extracted content); 3.6 Understanding search performance factors (Judging quality * Judging quantity); 3.7 Improving search performance (Hardware improvements * Analysis improvements * Query performance improvements * Alternative scoring models * Techniques for improving Solr performance); 3.8 Search alternatives; 3.9 Summary; 3.10 Resources
4 Fuzzy string matching: 4.1 Approaches to fuzzy string matching (Character overlap measures * Edit distance measures * N-gram edit distance); 4.2 Finding fuzzy string matches (Using prefixes for matching with Solr * Using a trie for prefix matching * Using n-grams for matching); 4.3 Building fuzzy string matching applications (Adding type-ahead to search * Query spell-checking for search * Record matching); 4.4 Summary; 4.5 Resources
5 Identifying people, places, and things: 5.1 Approaches to named-entity recognition (Using rules to identify names * Using statistical classifiers to identify names); 5.2 Basic entity identification with OpenNLP (Finding names with OpenNLP * Interpreting names identified by OpenNLP * Filtering names based on probability); 5.3 In-depth entity identification with OpenNLP (Identifying multiple entity types with OpenNLP * Under the hood: how OpenNLP identifies names); 5.4 Performance of OpenNLP (Quality of results * Runtime performance * Memory usage in OpenNLP); 5.5 Customizing OpenNLP entity identification for a new domain (The whys and hows of training a model * Training an OpenNLP model * Altering modeling inputs * A new way to model names); 5.6 Summary; 5.7 Further reading
6 Clustering text: 6.1 Google News document clustering; 6.2 Clustering foundations (Three types of text to cluster * Choosing a clustering algorithm * Determining similarity * Labeling the results * How to evaluate clustering results); 6.3 Setting up a simple clustering application; 6.4 Clustering search results using Carrot2 (Using the Carrot2 API * Clustering Solr search results using Carrot2); 6.5 Clustering document collections with Apache Mahout (Preparing the data for clustering * K-Means clustering); 6.6 Topic modeling using Apache Mahout; 6.7 Examining clustering performance (Feature selection and reduction * Carrot2 performance and quality * Mahout clustering benchmarks); 6.8 Acknowledgments; 6.9 Summary; 6.10 References
7 Classification, categorization, and tagging: 7.1 Introduction to classification and categorization; 7.2 The classification process (Choosing a classification scheme * Identifying features for text categorization * The importance of training data * Evaluating classifier performance * Deploying a classifier into production); 7.3 Building document categorizers using Apache Lucene (Categorizing text with Lucene * Preparing the training data for the MoreLikeThis categorizer * Training the MoreLikeThis categorizer * Categorizing documents with the MoreLikeThis categorizer * Testing the MoreLikeThis categorizer * MoreLikeThis in production); 7.4 Training a naive Bayes classifier using Apache Mahout (Categorizing text using naive Bayes classification * Preparing the training data * Withholding test data * Training the classifier * Testing the classifier * Improving the bootstrapping process * Integrating the Mahout Bayes classifier with Solr); 7.5 Categorizing documents with OpenNLP (Regression models and maximum entropy document categorization * Preparing training data for the maximum entropy document categorizer * Training the maximum entropy document categorizer * Testing the maximum entropy document classifier * Maximum entropy document categorization in production); 7.6 Building a tag recommender using Apache Solr (Collecting training data for tag recommendations * Preparing the training data * Training the Solr tag recommender * Creating tag recommendations * Evaluating the tag recommender); 7.7 Summary; 7.8 References
8 Building an example question answering system: 8.1 Basics of a question answering system; 8.2 Installing and running the QA code; 8.3 A sample question answering architecture; 8.4 Understanding questions and producing answers (Training the answer type classifier * Chunking the query * Computing the answer type * Generating the query * Ranking candidate passages); 8.5 Steps to improve the system; 8.6 Summary; 8.7 Resources
9 Untamed text: exploring the next frontier: 9.1 Semantics, discourse, and pragmatics: exploring higher levels of NLP (Semantics * Discourse * Pragmatics); 9.2 Document and collection summarization; 9.3 Relationship extraction (Overview of approaches * Evaluation * Tools for relationship extraction); 9.4 Identifying important content and people (Global importance and authoritativeness * Personal importance * Resources and pointers on importance); 9.5 Detecting emotions via sentiment analysis (History and review * Tools and data needs * A basic polarity algorithm * Advanced topics * Open source libraries for sentiment analysis); 9.6 Cross-language information retrieval; 9.7 Summary; 9.8 References
index
any_adam_object	1
author	Ingersoll, Grant S. Morton, Thomas S. Farris, Andrew L.
author_facet	Ingersoll, Grant S. Morton, Thomas S. Farris, Andrew L.
author_role	aut aut aut
author_sort	Ingersoll, Grant S.
author_variant	g s i gs gsi t s m ts tsm a l f al alf
building	Verbundindex
bvnumber	BV037187785
classification_rvk	ST 306 ST 350
ctrlnum	(OCoLC)706990107 (DE-599)BVBBV037187785
discipline	Informatik
format	Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01452nam a2200361 c 4500</leader><controlfield tag="001">BV037187785</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20211027 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">110127s2013 ad\|\| \|\|\|\| 00\|\|\| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781933988382</subfield><subfield code="c">pbk.</subfield><subfield code="9">978-1-933988-38-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">193398838X</subfield><subfield code="9">1-933988-38-X</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)706990107</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV037187785</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-19</subfield><subfield code="a">DE-523</subfield><subfield code="a">DE-188</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-B768</subfield><subfield code="a">DE-210</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 350</subfield><subfield code="0">(DE-625)143667:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Ingersoll, Grant S.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Taming 
text</subfield><subfield code="b">how to find, organize, and manipulate it</subfield><subfield code="c">Grant S. Ingersoll ; Thomas S. Morton ; Andrew L. Farris</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Shelter Island, NY</subfield><subfield code="b">Manning</subfield><subfield code="c">2013</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXI, 298 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield><subfield code="c">24 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Morton, Thomas S.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Farris, Andrew L.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield 
code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021102273&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-021102273</subfield></datafield></record></collection>
id	DE-604.BV037187785
illustrated	Illustrated
indexdate	2024-07-09T22:52:58Z
institution	BVB
isbn	9781933988382 193398838X
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-021102273
oclc_num	706990107
open_access_boolean
owner	DE-19 DE-BY-UBM DE-523 DE-188 DE-739 DE-B768 DE-210
owner_facet	DE-19 DE-BY-UBM DE-523 DE-188 DE-739 DE-B768 DE-210
physical	XXI, 298 S. Ill., graph. Darst. 24 cm
publishDate	2013
publishDateSearch	2013
publishDateSort	2013
publisher	Manning
record_format	marc
spelling	Ingersoll, Grant S. Verfasser aut Taming text how to find, organize, and manipulate it Grant S. Ingersoll ; Thomas S. Morton ; Andrew L. Farris Shelter Island, NY Manning 2013 XXI, 298 S. Ill., graph. Darst. 24 cm txt rdacontent n rdamedia nc rdacarrier Text Mining (DE-588)4728093-1 gnd rswk-swf Text Mining (DE-588)4728093-1 s DE-604 Morton, Thomas S. Verfasser aut Farris, Andrew L. Verfasser aut HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021102273&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle	Ingersoll, Grant S. Morton, Thomas S. Farris, Andrew L. Taming text how to find, organize, and manipulate it Text Mining (DE-588)4728093-1 gnd
subject_GND	(DE-588)4728093-1
title	Taming text how to find, organize, and manipulate it
title_auth	Taming text how to find, organize, and manipulate it
title_exact_search	Taming text how to find, organize, and manipulate it
title_full	Taming text how to find, organize, and manipulate it Grant S. Ingersoll ; Thomas S. Morton ; Andrew L. Farris
title_fullStr	Taming text how to find, organize, and manipulate it Grant S. Ingersoll ; Thomas S. Morton ; Andrew L. Farris
title_full_unstemmed	Taming text how to find, organize, and manipulate it Grant S. Ingersoll ; Thomas S. Morton ; Andrew L. Farris
title_short	Taming text
title_sort	taming text how to find organize and manipulate it
title_sub	how to find, organize, and manipulate it
topic	Text Mining (DE-588)4728093-1 gnd
topic_facet	Text Mining
url	http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=021102273&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
work_keys_str_mv	AT ingersollgrants tamingtexthowtofindorganizeandmanipulateit AT mortonthomass tamingtexthowtofindorganizeandmanipulateit AT farrisandrewl tamingtexthowtofindorganizeandmanipulateit

Availability

No print copy is available.

Interlibrary loan: order. Note: not in the THWS collection! Table of contents