Treebanks: building and using parsed corpora
Format: | Book |
---|---|
Language: | English |
Published: |
Dordrecht
Kluwer Academic Publishers
2003
|
Series: | Text, speech and language technology
20 |
Subjects: | |
Online access: | Table of contents |
Description: | Includes index |
Physical description: | XXVI, 405 pp., illustrations |
ISBN: | 1402013345 1402013353 |
Internal format
MARC
LEADER | 00000nam a2200000zcb4500 | ||
---|---|---|---|
001 | BV017120686 | ||
003 | DE-604 | ||
005 | 20161006 | ||
007 | t | ||
008 | 030506s2003 ne d||| |||| 00||| eng d | ||
010 | |a 2003046759 | ||
020 | |a 1402013345 |c hb |9 1-4020-1334-5 | ||
020 | |a 1402013353 |c pb |9 1-4020-1335-3 | ||
035 | |a (OCoLC)52127831 | ||
035 | |a (DE-599)BVBBV017120686 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
044 | |a ne |c NL | ||
049 | |a DE-19 |a DE-355 | ||
050 | 0 | |a P98.5.P38 | |
082 | 0 | |a 415 |2 21 | |
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
245 | 1 | 0 | |a Treebanks |b building and using parsed corpora |c ed. by Anne Abeillé |
264 | 1 | |a Dordrecht |b Kluwer Academic Publishers |c 2003 | |
300 | |a XXVI, 405 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Text, speech and language technology |v 20 | |
500 | |a Includes index | ||
650 | 4 | |a Analyse automatique (Linguistique) | |
650 | 7 | |a Corpora (taalkunde) |2 gtt | |
650 | 4 | |a Linguistique informatique | |
650 | 7 | |a Parsing |2 gtt | |
650 | 4 | |a Computational linguistics | |
650 | 4 | |a Parsing (Computer grammar) | |
650 | 0 | 7 | |a Syntaktische Analyse |0 (DE-588)4058778-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4143413-4 |a Aufsatzsammlung |2 gnd-content | |
689 | 0 | 0 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |D s |
689 | 0 | 1 | |a Syntaktische Analyse |0 (DE-588)4058778-2 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Abeillé, Anne |e Sonstige |4 oth | |
830 | 0 | |a Text, speech and language technology |v 20 |w (DE-604)BV011123931 |9 20 | |
856 | 4 | 2 | |m HEBIS Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=010321571&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-010321571 |
Record in the search index
_version_ | 1807956048255909888 |
---|---|
adam_text |
Treebanks
Building and Using Parsed Corpora
Edited by
Anne Abeillé
Université Paris 7, Paris, France
KLUWER ACADEMIC PUBLISHERS
DORDRECHT / BOSTON / LONDON
Contents
Preface xi
Introduction xiii
Anne Abeillé
1 Building Treebanks xv
2 Using treebanks xix
Part I Building treebanks
ENGLISH TREEBANKS
Chapter 1
THE PENN TREEBANK: AN OVERVIEW 5
Ann Taylor, Mitchell Marcus, Beatrice Santorini
1 The annotation schemes 6
2 Methodology 16
3 Conclusions 20
Chapter 2
THOUGHTS ON TWO DECADES OF DRAWING TREES
Geoffrey Sampson
Historical background
Building treebanks
Exploiting the SUSANNE Treebank
Small is beautiful
Annotating a spoken corpus
Using the CHRISTINE Corpus
Conclusion
Chapter 3
BANK OF ENGLISH AND BEYOND
Timo Järvinen
Introduction
Annotating 200 million words
ENGCG Syntax
FDG parser
Conclusion
Chapter 4
COMPLETING PARSED CORPORA: FROM CORRECTION TO EVOLUTION 61
Sean Wallis
Introduction
Conventional post-correction
A paradigm shift: transverse correction
Critique
GERMAN TREEBANKS
Chapter 5
SYNTACTIC ANNOTATION OF A GERMAN NEWSPAPER CORPUS
Thorsten Brants, Wojciech Skut, Hans Uszkoreit
Introduction
Treebank development
Corpus annotation
Applications
Conclusions
Appendix: Tagsets
Chapter 6
ANNOTATION OF ERROR TYPES FOR A GERMAN NEWSGROUP CORPUS 89
Markus Becker, Andrew Bredenkamp, Berthold Crysmann, Judith Klein
Introduction
Corpus Description
Annotation Strategy
Annotation Tools
Evaluation
First Results
Conclusion
SLAVIC TREEBANKS
Chapter 7
THE PDT: A 3-LEVEL ANNOTATION SCENARIO 103
Alena Böhmová, Jan Hajič, Eva Hajičová, Barbora Hladká
1 The Prague Dependency Treebank 103
2 Morphological Level 104
3 Analytical Level 106
4 Merging the Morphological and the Analytical Syntactic Level 114
5 Tectogrammatical Level 114
6 PDT versions 1.0 and 2.0 121
7 Conclusion 122
Appendix 126
Chapter 8
AN HPSG-ANNOTATED TEST SUITE FOR POLISH
Małgorzata Marciniak, Agnieszka Mykowiecka, Adam Przepiórkowski, Anna Kupść
Aims and design constraints
Correctness and complexity markers
Linguistic phenomena
Annotation schema
Implementation issues
Conclusion
TREEBANKS FOR ROMANCE LANGUAGES
Chapter 9
DEVELOPING A SPANISH TREEBANK
Antonio Moreno, Susana López, Fernando Sánchez, Ralph Grishman
Introduction
Data selection
Annotation scheme
Tools
Debugging and error statistics
Current state and future development
Appendix: Sample of trees
Chapter 10
BUILDING A TREEBANK FOR FRENCH 165
Anne Abeillé, Lionel Clément, François Toussenel
1 The tagging phase 166
2 The parsing phase 173
3 Current state and future work 180
4 Conclusion 181
Appendix 185
Chapter 11
BUILDING THE ITALIAN SYNTACTIC-SEMANTIC TREEBANK 189
Simonetta Montemagni, Francesco Barsotti, Marco Battista, Nicoletta Calzolari, Ornella Corazzari, Alessandro Lenci, Antonio Zampolli, Francesco Fanciulli, Maria Massetani, Remo Raffaelli, Roberto Basili, Maria Teresa Pazienza, Dario Saracino, Fabio Zanzotto, Nadia Mana, Fabio Pianesi, Rodolfo Delmonte
1 Introduction 190
2 ISST architecture 190
3 ISST corpus 191
4 ISST morpho-syntactic annotation 191
5 ISST syntactic annotation 192
6 ISST lexico-semantic annotation 196
7 The multi-level linguistic annotation tool 200
8 ISST evaluation 204
9 Conclusion 206
Appendix 209
Chapter 12
AUTOMATED CREATION OF A MEDIEVAL PORTUGUESE TREEBANK 211
Vitor Rocio, Mário Amado Alves, J. Gabriel Lopes, Maria Francisca Xavier, Graça Vicente
1 Introduction 211
2 The parsed corpus of medieval Portuguese texts 212
3 Tools and computational resources 215
4 Evaluation 222
5 Conclusion 224
TREEBANKS FOR OTHER LANGUAGES
Chapter 13
SINICA TREEBANK 231
Keh-Jiann Chen, Chi-Ching Luo, Ming-Chung Chang, Feng-Yi Chen, Chao-Jan Chen,
Chu-Ren Huang, Zhao-Ming Gao
1 Introduction 231
2 Design criteria 232
3 Representation of lexico-grammatical information: ICG 233
4 Annotation guideline 235
5 Implementation 239
6 Representational issues: problematic cases and how they are solved 241
7 Current status of the Sinica Treebank and future work 243
Appendix: Syntactic Categories 248
Chapter 14
BUILDING A JAPANESE PARSED CORPUS 249
Sadao Kurohashi, Makoto Nagao
1 Introduction 249
2 Overview of the project 250
3 Morphological analyzer JUMAN 253
4 Dependency structure analyzer KNP 255
5 Conclusion 259
Chapter 15
BUILDING A TURKISH TREEBANK 261
Kemal Oflazer, Bilge Say, Dilek Zeynep Hakkani-Tür, Gökhan Tür
1 Turkish: Morphology and syntax 262
2 What information needs to be represented? 263
3 The annotation tool 270
4 Some difficult issues 272
5 Conclusions and future work 273
Appendix: Turkish Morphological Features 276
Part II Using treebanks
Chapter 16
ENCODING SYNTACTIC ANNOTATION 281
Nancy Ide, Laurent Romary
1 Introduction 281
2 XCES 283
3 Syntactic annotation: current practice 284
4 A model for syntactic annotation 286
5 Using the XCES scheme 291
6 Conclusion 293
EVALUATION WITH TREEBANKS
Chapter 17
PARSER EVALUATION 299
John Carroll, Guido Minnen, Ted Briscoe
Introduction
Grammatical relation annotation
Corpus annotation
Parser evaluation
Discussion
Summary
Chapter 18
DEPENDENCY-BASED EVALUATION OF MINIPAR
Dekang Lin
Introduction
Dependency-based parser evaluation
Evaluation of MINIPAR with SUSANNE corpus
Selective evaluation
Related work
Conclusions
GRAMMAR INDUCTION WITH TREEBANKS
Chapter 19
EXTRACTING STOCHASTIC GRAMMARS FROM TREEBANKS 333
Rens Bod
1 Introduction 333
2 Summary of data-oriented parsing 335
3 Simulating stochastic grammars by constraining the subtree set 337
4 Discussion and conclusion 344
Chapter 20
STOCHASTIC LEXICALIZED TREE GRAMMARS 351
Günter Neumann
Introduction
Related work
Grammar extraction
SLTG from treebanks
SLTG from HPSG
Future steps: towards merging SLTGs
Chapter 21
FROM TREEBANK RESOURCES TO LFG F-STRUCTURES 367
Anette Frank, Louisa Sadler, Josef van Genabith, Andy Way
1 Introduction 368
2 Methods for automatic f-structure annotation 370
3 Two Experiments 380
4 Discussion and Current Research 383
5 Summary 385
Appendix: Example of an Automatically Generated F-Structure (Susanne
Corpus) 389
Contributing Authors 391
Index 398 |
any_adam_object | 1 |
building | Verbundindex |
bvnumber | BV017120686 |
callnumber-first | P - Language and Literature |
callnumber-label | P98 |
callnumber-raw | P98.5.P38 |
callnumber-search | P98.5.P38 |
callnumber-sort | P 298.5 P38 |
callnumber-subject | P - Philology and Linguistics |
classification_rvk | ST 306 |
ctrlnum | (OCoLC)52127831 (DE-599)BVBBV017120686 |
dewey-full | 415 |
dewey-hundreds | 400 - Language |
dewey-ones | 415 - Grammar |
dewey-raw | 415 |
dewey-search | 415 |
dewey-sort | 3415 |
dewey-tens | 410 - Linguistics |
discipline | Sprachwissenschaft Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000zcb4500</leader><controlfield tag="001">BV017120686</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20161006</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">030506s2003 ne d||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2003046759</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1402013345</subfield><subfield code="c">hb</subfield><subfield code="9">1-4020-1334-5</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1402013353</subfield><subfield code="c">pb</subfield><subfield code="9">1-4020-1335-3</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)52127831</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV017120686</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">ne</subfield><subfield code="c">NL</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-19</subfield><subfield code="a">DE-355</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">P98.5.P38</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">415</subfield><subfield code="2">21</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Treebanks</subfield><subfield code="b">building and using parsed 
corpora</subfield><subfield code="c">ed. by Anne Abeillé</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Dordrecht</subfield><subfield code="b">Kluwer Academic Publishers</subfield><subfield code="c">2003</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXVI, 405 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Text, speech and language technology</subfield><subfield code="v">20</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Analyse automatique (Linguistique)</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Corpora (taalkunde)</subfield><subfield code="2">gtt</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Linguistique informatique</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Parsing</subfield><subfield code="2">gtt</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computational linguistics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Parsing (Computer grammar)</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Syntaktische Analyse</subfield><subfield code="0">(DE-588)4058778-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Korpus</subfield><subfield 
code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4143413-4</subfield><subfield code="a">Aufsatzsammlung</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Syntaktische Analyse</subfield><subfield code="0">(DE-588)4058778-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Abeillé, Anne</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Text, speech and language technology</subfield><subfield code="v">20</subfield><subfield code="w">(DE-604)BV011123931</subfield><subfield code="9">20</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HEBIS Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=010321571&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-010321571</subfield></datafield></record></collection> |
genre | (DE-588)4143413-4 Aufsatzsammlung gnd-content |
genre_facet | Aufsatzsammlung |
id | DE-604.BV017120686 |
illustrated | Illustrated |
indexdate | 2024-08-21T00:47:28Z |
institution | BVB |
isbn | 1402013345 1402013353 |
language | English |
lccn | 2003046759 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-010321571 |
oclc_num | 52127831 |
open_access_boolean | |
owner | DE-19 DE-BY-UBM DE-355 DE-BY-UBR |
owner_facet | DE-19 DE-BY-UBM DE-355 DE-BY-UBR |
physical | XXVI, 405 S. graph. Darst. |
publishDate | 2003 |
publishDateSearch | 2003 |
publishDateSort | 2003 |
publisher | Kluwer Academic Publishers |
record_format | marc |
series | Text, speech and language technology |
series2 | Text, speech and language technology |
spelling | Treebanks building and using parsed corpora ed. by Anne Abeillé Dordrecht Kluwer Academic Publishers 2003 XXVI, 405 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Text, speech and language technology 20 Includes index Analyse automatique (Linguistique) Corpora (taalkunde) gtt Linguistique informatique Parsing gtt Computational linguistics Parsing (Computer grammar) Syntaktische Analyse (DE-588)4058778-2 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 gnd rswk-swf (DE-588)4143413-4 Aufsatzsammlung gnd-content Korpus Linguistik (DE-588)4165338-5 s Syntaktische Analyse (DE-588)4058778-2 s DE-604 Abeillé, Anne Sonstige oth Text, speech and language technology 20 (DE-604)BV011123931 20 HEBIS Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=010321571&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Treebanks building and using parsed corpora Text, speech and language technology Analyse automatique (Linguistique) Corpora (taalkunde) gtt Linguistique informatique Parsing gtt Computational linguistics Parsing (Computer grammar) Syntaktische Analyse (DE-588)4058778-2 gnd Korpus Linguistik (DE-588)4165338-5 gnd |
subject_GND | (DE-588)4058778-2 (DE-588)4165338-5 (DE-588)4143413-4 |
title | Treebanks building and using parsed corpora |
title_auth | Treebanks building and using parsed corpora |
title_exact_search | Treebanks building and using parsed corpora |
title_full | Treebanks building and using parsed corpora ed. by Anne Abeillé |
title_fullStr | Treebanks building and using parsed corpora ed. by Anne Abeillé |
title_full_unstemmed | Treebanks building and using parsed corpora ed. by Anne Abeillé |
title_short | Treebanks |
title_sort | treebanks building and using parsed corpora |
title_sub | building and using parsed corpora |
topic | Analyse automatique (Linguistique) Corpora (taalkunde) gtt Linguistique informatique Parsing gtt Computational linguistics Parsing (Computer grammar) Syntaktische Analyse (DE-588)4058778-2 gnd Korpus Linguistik (DE-588)4165338-5 gnd |
topic_facet | Analyse automatique (Linguistique) Corpora (taalkunde) Linguistique informatique Parsing Computational linguistics Parsing (Computer grammar) Syntaktische Analyse Korpus Linguistik Aufsatzsammlung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=010321571&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV011123931 |
work_keys_str_mv | AT abeilleanne treebanksbuildingandusingparsedcorpora |