Integrating deep and shallow natural language processing components: representations and hybrid architectures
Main Author: | Schäfer, Ulrich |
---|---|
Format: | Thesis, Book |
Language: | English |
Published: | Saarbrücken: DFKI [u.a.], 2007 |
Series: | Saarbrücken dissertations in computational linguistics and language technology; 22 |
Subjects: | |
Online Access: | Table of contents |
Description: | 350 pages, illustrations, 21 cm |
ISBN: | 9783933218216 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV036612515 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t | ||
008 | 100811s2007 d||| m||| 00||| eng d | ||
010 | |a 2008384333 | ||
020 | |a 9783933218216 |9 978-3-933218-21-6 | ||
035 | |a (OCoLC)612398437 | ||
035 | |a (DE-599)GBV541656457 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-83 | ||
082 | 0 | |a 006.35 |2 22 | |
084 | |a ES 940 |0 (DE-625)27934: |2 rvk | ||
100 | 1 | |a Schäfer, Ulrich |e Verfasser |0 (DE-588)133633926 |4 aut | |
245 | 1 | 0 | |a Integrating deep and shallow natural language processing components |b representations and hybrid architectures |c Ulrich Schäfer |
264 | 1 | |a Saarbrücken |b DFKI [u.a.] |c 2007 | |
300 | |a 350 S. |b graph. Darst. |c 21 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Saarbrücken dissertations in computational linguistics and language technology |v 22 | |
502 | |a Zugl.: Saarbrücken, Univ., Diss., 2007 | ||
650 | 0 | |a Computational linguistics | |
650 | 0 | |a Natural language processing (Computer science) | |
650 | 0 | 7 | |a Computerlinguistik |0 (DE-588)4035843-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4113937-9 |a Hochschulschrift |2 gnd-content | |
689 | 0 | 0 | |a Computerlinguistik |0 (DE-588)4035843-4 |D s |
689 | 0 | 1 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |D s |
689 | 0 | |5 DE-604 | |
830 | 0 | |a Saarbrücken dissertations in computational linguistics and language technology |v 22 |w (DE-604)BV013075694 |9 22 | |
856 | 4 | 2 | |m HEBIS Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020532739&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-020532739 |
Record in the search index
_version_ | 1804143218745409536 |
---|---|
adam_text | Ulrich Schäfer
Integrating Deep and Shallow
Natural Language Processing
Components
Representations and Hybrid Architectures
Dissertation submitted for the degree of Doctor of
Engineering to the Faculties of Natural Sciences and Technology
of Saarland University
Saarbrücken, December 2006
SAARBRÜCKEN DISSERTATIONS IN COMPUTATIONAL LINGUISTICS AND LANGUAGE TECHNOLOGY
VOLUME 22
Contents
1 Introduction 15
2 Definitions and Motivation 17
2.1 Deep and Shallow Natural Language Processing 17
2.1.1 Deep Natural Language Processing 17
2.1.2 Shallow Natural Language Processing 19
2.2 Integration of the Paradigms 20
2.3 Benefits of Robust DNLP 22
3 Deep Linguistic Processing with HPSG 27
3.1 A Short Introduction to HPSG 27
3.1.1 Excursus: Typed Feature Structures 28
3.1.2 HPSG and HPSG Parsing 35
3.2 Performance Properties of HPSG 42
3.2.1 Parsing Complexity 42
3.2.2 Implementations and Efficiency 43
3.2.3 Robustness 43
4 Shallow Processing and Linguistic Markup 47
4.1 Shallow Natural Language Processing 47
4.1.1 Tokenization 48
4.1.2 Finite-State Morphology and Compound Recognition 48
4.1.3 Part-of-Speech Tagging 49
4.1.4 Chunking 50
4.1.5 Shallow Parsing 51
4.1.6 Named Entity Recognition 52
4.1.7 Summary 52
4.2 Shallow Processing and XML Markup 53
4.2.1 SGML 53
4.2.2 XML 55
4.2.3 Well-Formed and Valid Documents 55
4.2.4 Strictly Structured vs. Semi-Structured Documents 58
4.2.5 XML as Carrier Syntax for Computer Languages 59
4.2.6 XML as Open Data Structure 60
4.2.7 Linguistic Markup 60
4.2.8 Standards for Linguistic Markup 62
4.2.9 Further XML Standards Related to Linguistic Processing 66
4.3 XML-based Linguistic Annotation 68
4.3.1 Standoff Annotation 72
4.3.2 Related Annotation Standards 73
4.3.3 Summary 76
5 Deep-Shallow Integration by Transformation 79
5.1 The Deep-Shallow Mapping Problem 79
5.1.1 Summary 88
5.2 NLP Integration by Transformation 88
5.2.1 Querying Multi-level (Standoff) Annotation 89
5.2.2 Using Corpus Query Languages for NLP Component Integration? 93
5.3 Markup Transformation and Query with XSLT 94
5.3.1 Brief Introduction to XSLT 97
5.3.2 XQuery vs. XSLT 101
5.3.3 NLP Integration and Computation with XSLT 102
5.4 Transforming XML-encoded TFS 104
5.4.1 Accessing and Transforming Feature Structure XML 104
5.4.2 The Role of Feature Structure XML Transformation for the Integration of NLP Components 105
5.5 Summary 109
6 Hybrid Architectures 111
6.1 Motivation and Requirements 111
6.2 Related Work 112
6.3 General XML Processing Frameworks 116
6.4 The Deep-Shallow Architectures Trilogy 117
7 SProUT 119
7.1 Introduction 119
7.2 A Brief Introduction to SProUT 120
7.2.1 Motivation 120
7.2.2 Targeted Applications 120
7.2.3 Related Work 121
7.2.4 The SProUT Formalism 122
7.2.5 Architecture and Components 130
7.3 SProUTput DTD 133
7.4 Compile Time Type Check 135
7.5 Visualization 136
7.6 Applications 138
7.7 Evaluation 143
7.7.1 Evaluation Snapshot of the Multilingual NE Grammars 144
7.8 Building, Testing and Evaluation with SProUTomat 146
7.8.1 Motivation 146
7.8.2 SProUTomat 146
7.8.3 Building and Testing 147
7.8.4 Evaluation with JTaCo 149
7.8.5 Report 153
7.8.6 Summary and Outlook 154
7.9 SProUT Summary and Relation to Deep Processing 154
8 Whiteboard 157
8.1 Introduction and Motivation 157
8.2 The WHITEBOARD Architecture 158
8.3 The WHITEBOARD Annotation Machine (WHAM) 159
8.4 WHITEBOARD I 161
8.4.1 Components 161
8.4.2 Integration 165
8.5 First Evaluation 166
8.6 Applications on the Basis of WHAM 167
8.6.1 WAG - Mining Answers in German Web Pages 167
8.6.2 WHIES - Integrating Shallow and Deep NLP for IE 168
8.7 WHITEBOARD II 171
8.7.1 WHAT, the WHITEBOARD Annotation Transformer 172
8.7.2 WHAT Query Types 173
8.7.3 Topoparser Integration 177
8.7.4 Finding Appropriate Linguistic Structures 178
8.7.5 Architecture of the Hybrid Deep-Shallow System 179
8.7.6 Evaluation Results 190
8.7.7 Conclusion 191
8.7.8 Transformation for Visualization 192
8.8 Related Work 192
8.9 Summary 193
9 Heart of Gold 197
9.1 Introduction and Motivation 197
9.2 Project Context: DEEPTHOUGHT and QUETAL 197
9.3 Middleware Architecture 199
9.3.1 Overview 199
9.3.2 The Module Communication Manager (MoCoMan) 200
9.3.3 Modules and Components 202
9.3.4 NLP Analysis 203
9.3.5 Default Processing Strategy 204
9.3.6 Session and Annotation Management 205
9.3.7 Metadata 206
9.3.8 XML Annotation Database 206
9.3.9 Annotation Transformation Service 208
9.4 RMRS as Common Semantic Annotation Format 209
9.5 Integrated NLP Components 215
9.5.1 Tokenization, Word and Sentence Segmentation 216
9.5.2 Part-of-Speech Tagging 219
9.5.3 Chunking and Shallow Parsing 221
9.5.4 Named Entity Recognition and Information Extraction 222
9.5.5 Deep Parsing: The PetModule 232
9.5.6 Further Integrated NLP Components 239
9.5.7 Sub-Architectures with the Generic SdlModule 240
9.6 Deep-shallow integration scenarios 250
9.6.1 Sample Configuration for German 250
9.6.2 Sample Configuration for English 252
9.6.3 Sample Configuration for Japanese 253
9.7 Interfacing Ontologies 254
9.7.1 OntoNERdIE 256
9.8 Visualization 261
9.9 Evaluation 262
9.9.1 Hybrid Parsing Evaluation 262
9.9.2 Evaluation in Application Context 267
9.10 Further Applications Based on Heart of Gold 270
9.10.1 Creative Authoring Support 271
9.10.2 Question Answering from Struct Knowledge Sources 276
9.11 Further Applications 298
9.11.1 Learning Transfer Rules for Machine Translation 298
9.11.2 Parsing Japanese Dictionary Definition Sentences 298
9.11.3 Trailfinder 299
9.11.4 Soccer SmartWeb 299
9.11.5 RMRS Chatterbot 300
9.11.6 Training 301
9.11.7 Anaphora, Coreference Resolution in Discourse 301
9.11.8 Modern Greek Grammar 301
9.11.9 Spanish HPSG Grammar with Shallow Preprocessing 302
9.11.10 Parsing Debian Linux User Forum Discussions 302
9.11.11 SciBorg 303
9.12 Heart of Gold in International Collaboration 303
9.13 Related Work 304
9.14 Outlook and Future Work 304
10 Conclusion 307
A DTDs 309
A.1 ACE DTD Fragment 309
A.2 TFS DTD 310
A.3 XTDL 310
A.4 SProUTput 312
A.5 JTok 313
A.6 TnT 314
A.7 Chunkie 314
A.8 RMRS 315
A.9 PET Input Chart DTD 316
A.10 Simple Preprocessor Protocol (SPPP) DTD 317
B XSLT Stylesheets 319
B.1 Automatically Generated SProUT to RMRS Stylesheet 319
B.2 Combining Input Annotations 320
B.3 Removing Conflicting Items in the PET Input Chart 322
B.4 Sorting and Filtering Longest RMRS Fragments 323
B.5 Sorting and Merging RDF Descriptions 324
B.6 SPPPtoPIC 325
|
any_adam_object | 1 |
author | Schäfer, Ulrich |
author_GND | (DE-588)133633926 |
author_facet | Schäfer, Ulrich |
author_role | aut |
author_sort | Schäfer, Ulrich |
author_variant | u s us |
building | Verbundindex |
bvnumber | BV036612515 |
classification_rvk | ES 940 |
ctrlnum | (OCoLC)612398437 (DE-599)GBV541656457 |
dewey-full | 006.35 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.35 |
dewey-search | 006.35 |
dewey-sort | 16.35 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Sprachwissenschaft Literaturwissenschaft |
format | Thesis Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01873nam a2200433 cb4500</leader><controlfield tag="001">BV036612515</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">100811s2007 d||| m||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2008384333</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9783933218216</subfield><subfield code="9">978-3-933218-21-6</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)612398437</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBV541656457</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-83</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.35</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 940</subfield><subfield code="0">(DE-625)27934:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Schäfer, Ulrich</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)133633926</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Integrating deep and shallow natural language processing components</subfield><subfield code="b">representations and hybrid architectures</subfield><subfield code="c">Ulrich Schäfer</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Saarbrücken</subfield><subfield code="b">DFKI 
[u.a.]</subfield><subfield code="c">2007</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">350 S.</subfield><subfield code="b">graph. Darst.</subfield><subfield code="c">21 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Saarbrücken dissertations in computational linguistics and language technology</subfield><subfield code="v">22</subfield></datafield><datafield tag="502" ind1=" " ind2=" "><subfield code="a">Zugl.: Saarbrücken, Univ., Diss., 2007</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Computational linguistics</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Natural language processing (Computer science)</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4113937-9</subfield><subfield code="a">Hochschulschrift</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield 
code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Saarbrücken dissertations in computational linguistics and language technology</subfield><subfield code="v">22</subfield><subfield code="w">(DE-604)BV013075694</subfield><subfield code="9">22</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HEBIS Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020532739&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-020532739</subfield></datafield></record></collection> |
genre | (DE-588)4113937-9 Hochschulschrift gnd-content |
genre_facet | Hochschulschrift |
id | DE-604.BV036612515 |
illustrated | Illustrated |
indexdate | 2024-07-09T22:44:10Z |
institution | BVB |
isbn | 9783933218216 |
language | English |
lccn | 2008384333 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-020532739 |
oclc_num | 612398437 |
open_access_boolean | |
owner | DE-83 |
owner_facet | DE-83 |
physical | 350 S. graph. Darst. 21 cm |
publishDate | 2007 |
publishDateSearch | 2007 |
publishDateSort | 2007 |
publisher | DFKI [u.a.] |
record_format | marc |
series | Saarbrücken dissertations in computational linguistics and language technology |
series2 | Saarbrücken dissertations in computational linguistics and language technology |
spelling | Schäfer, Ulrich Verfasser (DE-588)133633926 aut Integrating deep and shallow natural language processing components representations and hybrid architectures Ulrich Schäfer Saarbrücken DFKI [u.a.] 2007 350 S. graph. Darst. 21 cm txt rdacontent n rdamedia nc rdacarrier Saarbrücken dissertations in computational linguistics and language technology 22 Zugl.: Saarbrücken, Univ., Diss., 2007 Computational linguistics Natural language processing (Computer science) Computerlinguistik (DE-588)4035843-4 gnd rswk-swf Sprachverarbeitung (DE-588)4116579-2 gnd rswk-swf (DE-588)4113937-9 Hochschulschrift gnd-content Computerlinguistik (DE-588)4035843-4 s Sprachverarbeitung (DE-588)4116579-2 s DE-604 Saarbrücken dissertations in computational linguistics and language technology 22 (DE-604)BV013075694 22 HEBIS Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020532739&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Schäfer, Ulrich Integrating deep and shallow natural language processing components representations and hybrid architectures Saarbrücken dissertations in computational linguistics and language technology Computational linguistics Natural language processing (Computer science) Computerlinguistik (DE-588)4035843-4 gnd Sprachverarbeitung (DE-588)4116579-2 gnd |
subject_GND | (DE-588)4035843-4 (DE-588)4116579-2 (DE-588)4113937-9 |
title | Integrating deep and shallow natural language processing components representations and hybrid architectures |
title_auth | Integrating deep and shallow natural language processing components representations and hybrid architectures |
title_exact_search | Integrating deep and shallow natural language processing components representations and hybrid architectures |
title_full | Integrating deep and shallow natural language processing components representations and hybrid architectures Ulrich Schäfer |
title_fullStr | Integrating deep and shallow natural language processing components representations and hybrid architectures Ulrich Schäfer |
title_full_unstemmed | Integrating deep and shallow natural language processing components representations and hybrid architectures Ulrich Schäfer |
title_short | Integrating deep and shallow natural language processing components |
title_sort | integrating deep and shallow natural language processing components representations and hybrid architectures |
title_sub | representations and hybrid architectures |
topic | Computational linguistics Natural language processing (Computer science) Computerlinguistik (DE-588)4035843-4 gnd Sprachverarbeitung (DE-588)4116579-2 gnd |
topic_facet | Computational linguistics Natural language processing (Computer science) Computerlinguistik Sprachverarbeitung Hochschulschrift |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020532739&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV013075694 |
work_keys_str_mv | AT schaferulrich integratingdeepandshallownaturallanguageprocessingcomponentsrepresentationsandhybridarchitectures |