Introduction to linguistic annotation and text analytics
Main author: | Wilcock, Graham |
---|---|
Format: | Book |
Language: | English |
Published: | San Rafael, CA: Morgan & Claypool, 2009 |
Series: | Synthesis lectures on human language technologies ; 3 |
Subjects: | |
Online access: | Table of contents, Blurb |
Description: | x, 149 pages, illustrations |
ISBN: | 9781598297386 9781598297393 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV035768010 | ||
003 | DE-604 | ||
005 | 20191011 | ||
007 | t | ||
008 | 091013s2009 a||| |||| 00||| eng d | ||
020 | |a 9781598297386 |c pbk |9 978-1-59829-738-6 | ||
020 | |a 9781598297393 |c ebook |9 978-1-59829-739-3 | ||
035 | |a (OCoLC)228425547 | ||
035 | |a (DE-599)BVBBV035768010 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-355 |a DE-384 |a DE-11 |a DE-20 | ||
050 | 0 | |a P98.3 | |
084 | |a ER 765 |0 (DE-625)27756: |2 rvk | ||
084 | |a ES 900 |0 (DE-625)27926: |2 rvk | ||
084 | |a ST 680 |0 (DE-625)143690: |2 rvk | ||
100 | 1 | |a Wilcock, Graham |e Verfasser |4 aut | |
245 | 1 | 0 | |a Introduction to linguistic annotation and text analytics |c Graham Wilcock |
264 | 1 | |a San Rafael, CA |b Morgan & Claypool |c 2009 | |
300 | |a x, 149 Seiten |b Illustrationen | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Synthesis lectures on human language technologies |v 3 | |
650 | 4 | |a Computational linguistics | |
650 | 4 | |a Corpora (Linguistics) | |
650 | 4 | |a Linguistic analysis (Linguistics) | |
650 | 4 | |a XML (Document markup language) | |
650 | 0 | 7 | |a Computerlinguistik |0 (DE-588)4035843-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Textanalyse |0 (DE-588)4194196-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Annotation |0 (DE-588)4560829-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Computerlinguistik |0 (DE-588)4035843-4 |D s |
689 | 0 | 1 | |a Annotation |0 (DE-588)4560829-5 |D s |
689 | 0 | 2 | |a Textanalyse |0 (DE-588)4194196-2 |D s |
689 | 0 | |5 DE-604 | |
830 | 0 | |a Synthesis lectures on human language technologies |v 3 |w (DE-604)BV035447238 |9 3 | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018627761&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Augsburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018627761&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
999 | |a oai:aleph.bib-bvb.de:BVB01-018627761 |
Record in the search index
_version_ | 1804140696006819840 |
---|---|
adam_text | Preface
1 Working with XML
1.1 Introduction
1.2 XML Basics
1.3 XML Parsing and Validation
1.4 XML Transformations
1.5 In-Line Annotations
1.6 Stand-Off Annotations
1.7 Annotation Standards
1.8 Further Reading
2 Linguistic Annotation
2.1 Levels of Linguistic Annotation
2.2 WordFreak Annotation Tool
2.3 Sentence Boundaries
2.4 Tokenization
2.5 Part-of-Speech Tagging
2.6 Syntactic Parsing
2.7 Semantics and Discourse
2.8 WordFreak with OpenNLP
2.9 Further Reading
3 Using Statistical NLP Tools
3.1 Statistical Models
3.2 OpenNLP and Stanford NLP Tools
3.3 Sentences and Tokenization
3.4 Statistical Tagging
3.5 Chunking and Parsing
3.6 Named Entity Recognition
3.7 Coreference Resolution
3.8 Further Reading
4 Annotation Interchange
4.1 XSLT Transformations
4.2 WordFreak-OpenNLP Transformation
4.3 GATE XML Format
4.4 GATE-WordFreak Transformation
4.5 XML Metadata Interchange: XMI
4.6 WordFreak-XMI Transformation
4.7 Towards Interoperability
4.8 Further Reading
5 Annotation Architectures
5.1 GATE
5.2 GATE Information Extraction Tools
5.3 Annotations with JAPE Rules
5.4 Customizing GATE Gazetteers
5.5 UIMA
5.6 UIMA Wrappers for OpenNLP Tools
5.7 Annotations with Regular Expressions
5.8 Customizing UIMA Dictionaries
5.9 Further Reading
6 Text Analytics
6.1 Text Analytics Tools
6.2 Named Entity Recognition
6.3 Training Statistical Models
6.4 Coreference Resolution
6.5 Information Extraction
6.6 Text Mining and Searching
6.7 New Directions
6.8 Further Reading
Bibliography
Introduction to Linguistic Annotation and Text Analytics
Graham Wilcock
Linguistic annotation and text analytics are active areas of research and development, with academic
conferences and industry events such as the Linguistic Annotation Workshops and the annual Text
Analytics Summits. This book provides a basic introduction to both fields, and aims to show that good
linguistic annotations are the essential foundation for good text analytics.
After briefly reviewing the basics of XML, with practical exercises illustrating in-line and stand-off
annotations, a chapter is devoted to explaining the different levels of linguistic annotations. The reader
is encouraged to create example annotations using the WordFreak linguistic annotation tool. The next
chapter shows how annotations can be created automatically using statistical NLP tools, and compares
two sets of tools, the OpenNLP and Stanford NLP tools.
The second half of the book describes different annotation formats and gives practical examples of
how to interchange annotations between different formats using XSLT transformations. The two main
text analytics architectures, GATE and UIMA, are then described and compared, with practical exercises
showing how to configure and customize them. The final chapter is an introduction to text analytics,
describing the main applications and functions including named entity recognition, coreference resolution
and information extraction, with practical examples using both open source and commercial tools.
Copies of the example files, scripts, and stylesheets used in the book are available from the companion
website, located at sites.morganclaypool.com/wilcock.
|
any_adam_object | 1 |
author | Wilcock, Graham |
author_facet | Wilcock, Graham |
author_role | aut |
author_sort | Wilcock, Graham |
author_variant | g w gw |
building | Verbundindex |
bvnumber | BV035768010 |
callnumber-first | P - Language and Literature |
callnumber-label | P98 |
callnumber-raw | P98.3 |
callnumber-search | P98.3 |
callnumber-sort | P 298.3 |
callnumber-subject | P - Philology and Linguistics |
classification_rvk | ER 765 ES 900 ST 680 |
ctrlnum | (OCoLC)228425547 (DE-599)BVBBV035768010 |
discipline | Sprachwissenschaft Informatik Literaturwissenschaft |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02198nam a2200493 cb4500</leader><controlfield tag="001">BV035768010</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20191011 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">091013s2009 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781598297386</subfield><subfield code="c">pbk</subfield><subfield code="9">978-1-59829-738-6</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781598297393</subfield><subfield code="c">ebook</subfield><subfield code="9">978-1-59829-739-3</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)228425547</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV035768010</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-355</subfield><subfield code="a">DE-384</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-20</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">P98.3</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ER 765</subfield><subfield code="0">(DE-625)27756:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 900</subfield><subfield code="0">(DE-625)27926:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 680</subfield><subfield code="0">(DE-625)143690:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" 
ind2=" "><subfield code="a">Wilcock, Graham</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Introduction to linguistic annotation and text analytics</subfield><subfield code="c">Graham Wilcock</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">San Rafael, CA</subfield><subfield code="b">Morgan & Claypool</subfield><subfield code="c">2009</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">x, 149 Seiten</subfield><subfield code="b">Illustrationen</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Synthesis lectures on human language technologies</subfield><subfield code="v">3</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computational linguistics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Corpora (Linguistics)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Linguistic analysis (Linguistics)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">XML (Document markup language)</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Textanalyse</subfield><subfield code="0">(DE-588)4194196-2</subfield><subfield code="2">gnd</subfield><subfield 
code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Annotation</subfield><subfield code="0">(DE-588)4560829-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Annotation</subfield><subfield code="0">(DE-588)4560829-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Textanalyse</subfield><subfield code="0">(DE-588)4194196-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Synthesis lectures on human language technologies</subfield><subfield code="v">3</subfield><subfield code="w">(DE-604)BV035447238</subfield><subfield code="9">3</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018627761&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018627761&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield 
code="a">oai:aleph.bib-bvb.de:BVB01-018627761</subfield></datafield></record></collection> |
id | DE-604.BV035768010 |
illustrated | Illustrated |
indexdate | 2024-07-09T22:04:04Z |
institution | BVB |
isbn | 9781598297386 9781598297393 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-018627761 |
oclc_num | 228425547 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR DE-384 DE-11 DE-20 |
owner_facet | DE-355 DE-BY-UBR DE-384 DE-11 DE-20 |
physical | x, 149 Seiten Illustrationen |
publishDate | 2009 |
publishDateSearch | 2009 |
publishDateSort | 2009 |
publisher | Morgan & Claypool |
record_format | marc |
series | Synthesis lectures on human language technologies |
series2 | Synthesis lectures on human language technologies |
spelling | Wilcock, Graham Verfasser aut Introduction to linguistic annotation and text analytics Graham Wilcock San Rafael, CA Morgan & Claypool 2009 x, 149 Seiten Illustrationen txt rdacontent n rdamedia nc rdacarrier Synthesis lectures on human language technologies 3 Computational linguistics Corpora (Linguistics) Linguistic analysis (Linguistics) XML (Document markup language) Computerlinguistik (DE-588)4035843-4 gnd rswk-swf Textanalyse (DE-588)4194196-2 gnd rswk-swf Annotation (DE-588)4560829-5 gnd rswk-swf Computerlinguistik (DE-588)4035843-4 s Annotation (DE-588)4560829-5 s Textanalyse (DE-588)4194196-2 s DE-604 Synthesis lectures on human language technologies 3 (DE-604)BV035447238 3 Digitalisierung UB Regensburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018627761&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Augsburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018627761&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext |
spellingShingle | Wilcock, Graham Introduction to linguistic annotation and text analytics Synthesis lectures on human language technologies Computational linguistics Corpora (Linguistics) Linguistic analysis (Linguistics) XML (Document markup language) Computerlinguistik (DE-588)4035843-4 gnd Textanalyse (DE-588)4194196-2 gnd Annotation (DE-588)4560829-5 gnd |
subject_GND | (DE-588)4035843-4 (DE-588)4194196-2 (DE-588)4560829-5 |
title | Introduction to linguistic annotation and text analytics |
title_auth | Introduction to linguistic annotation and text analytics |
title_exact_search | Introduction to linguistic annotation and text analytics |
title_full | Introduction to linguistic annotation and text analytics Graham Wilcock |
title_fullStr | Introduction to linguistic annotation and text analytics Graham Wilcock |
title_full_unstemmed | Introduction to linguistic annotation and text analytics Graham Wilcock |
title_short | Introduction to linguistic annotation and text analytics |
title_sort | introduction to linguistic annotation and text analytics |
topic | Computational linguistics Corpora (Linguistics) Linguistic analysis (Linguistics) XML (Document markup language) Computerlinguistik (DE-588)4035843-4 gnd Textanalyse (DE-588)4194196-2 gnd Annotation (DE-588)4560829-5 gnd |
topic_facet | Computational linguistics Corpora (Linguistics) Linguistic analysis (Linguistics) XML (Document markup language) Computerlinguistik Textanalyse Annotation |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018627761&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018627761&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV035447238 |
work_keys_str_mv | AT wilcockgraham introductiontolinguisticannotationandtextanalytics |