History, features, and typology of language corpora:
Main authors: | Dash, Niladri Sekhar; Arulmozi, S. |
---|---|
Format: | Book |
Language: | English |
Published: |
Singapore
Springer
[2018]
|
Subjects: | |
Online access: | Table of contents, Blurb |
Description: | xxix, 293 pages, illustrations |
ISBN: | 9789811074578 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV044792370 | ||
003 | DE-604 | ||
005 | 20180914 | ||
007 | t | ||
008 | 180222s2018 si a||| |||| 00||| eng d | ||
016 | 7 | |a 1142550389 |2 DE-101 | |
020 | |a 9789811074578 |c hardback : 82.38 EUR |9 978-981-10-7457-8 | ||
035 | |a (OCoLC)1028925239 | ||
035 | |a (DE-599)DNB1142550389 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
044 | |a si |c XB-SG | ||
049 | |a DE-29 |a DE-11 |a DE-384 |a DE-19 | ||
084 | |a ES 900 |0 (DE-625)27926: |2 rvk | ||
084 | |a ER 765 |0 (DE-625)27756: |2 rvk | ||
084 | |a 400 |2 sdnb | ||
100 | 1 | |a Dash, Niladri Sekhar |d 1967- |e Verfasser |0 (DE-588)143093800 |4 aut | |
245 | 1 | 0 | |a History, features, and typology of language corpora |c Niladri Sekhar Dash, S. Arulmozi |
264 | 1 | |a Singapore |b Springer |c [2018] | |
264 | 4 | |c © 2018 | |
300 | |a xxix, 293 Seiten |b Illustrationen | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Arulmozi, S. |e Verfasser |4 aut | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-981-10-7458-5 |
856 | 4 | 2 | |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030187547&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030187547&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-030187547 |
Record in search index
_version_ | 1807956190299160576 |
---|---|
adam_text |
Contents
1 Definition of ‘Corpus’. 1
1.1 Introduction. 1
1.2 Some Popular Definitions of ‘Corpus’. 2
1.3 What Is a Corpus?. 3
1.4 The Acronym. 7
1.5 Corpus, Dataset and Database. 7
1.6 Formational Principles. 10
1.7 The Benefits of a Corpus. 11
1.8 Advantages of a Corpus. 13
1.9 Conclusion. 14
References. 15
2 Features of a Corpus. 17
2.1 Introduction. 17
2.2 Quantity. 18
2.3 Quality. 21
2.4 Representation. 22
2.5 Simplicity. 24
2.6 Equality. 26
2.7 Retrievability. 27
2.8 Verifiability. 28
2.9 Augmentation. 29
2.10 Documentation. 30
2.11 Management. 32
2.12 Conclusion. 33
References. 33
3 Genre of Text. 35
3.1 Introduction. 35
3.2 Why Classify Corpora?. 36
3.3 Genre of Text. 38
3.4 Text Corpus. 39
3.5 Speech Corpus. 41
3.6 Spoken Corpus. 46
3.7 Conclusion. 48
References. 49
4 Nature of Data. 51
4.1 Introduction. 51
4.2 General Corpus. 52
4.3 Special Corpus. 53
4.4 Sample Corpus. 55
4.5 Literary Corpus. 56
4.6 Monitor Corpus. 56
4.7 Multimodal Corpus. 58
4.8 Sublanguage Corpus. 60
4.9 Controlled Language Corpus. 62
4.10 Conclusion. 64
References. 64
5 Type and Purpose of Text. 67
5.1 Introduction. 67
5.2 Type of Text. 68
5.2.1 Monolingual Corpus. 68
5.2.2 Bilingual Corpus. 69
5.2.3 Multilingual Corpus. 71
5.3 Purpose of Design. 73
5.3.1 Unannotated Corpus. 73
5.3.2 Annotated Corpus. 74
5.4 Maxims of Corpus Annotation. 77
5.5 Issues Involved in Annotation. 79
5.6 The Challenges. 79
5.7 The State of the Art. 80
5.8 Conclusion. 81
References. 82
6 Nature of Text Application. 85
6.1 Introduction. 85
6.2 Parallel Corpus. 86
6.3 Translation Corpus. 89
6.4 Aligned Corpus. 90
6.5 Comparable Corpus. 93
6.6 Reference Corpus. 95
6.7 Learner Corpus. 96
6.8 Opportunistic Corpus. 97
6.9 Conclusion. 97
References. 98
7 Parallel Translation Corpus. 101
7.1 Introduction. 101
7.2 Definition of a Parallel Translation Corpus (PTC). 102
7.3 Construction of a PTC. 104
7.4 Features of a PTC. 105
7.4.1 Large Quantity of Data. 106
7.4.2 Quality of Text. 107
7.4.3 Text Representation. 107
7.4.4 Simplicity. 108
7.4.5 Equality. 108
7.4.6 Retrievability. 109
7.4.7 Verifiability. 109
7.4.8 Augmentation. 110
7.4.9 Documentation. 110
7.5 Alignment of Texts in PTC. 111
7.6 Analysis of Text in PTC. 114
7.7 Restructuring Translation Units in PTC. 115
7.8 Extraction of Translational Equivalent Units. 117
7.9 Bilingual Lexical Database. 118
7.10 Bilingual Terminology Databank. 119
7.11 Conclusion. 121
References. 122
8 Web Text Corpus. 125
8.1 Introduction. 125
8.2 Defining a Web Text Corpus. 126
8.3 Theoretical Frame. 127
8.4 Purpose Behind a Web Text Corpus. 129
8.5 Early Attempts for Web Text Corpus Generation. 131
8.6 Methodologies Applied. 133
8.6.1 Overall Design of the Web Text Corpus. 133
8.6.2 Domains and Sub-domains of Texts. 133
8.6.3 Data Collection. 134
8.7 Metadata Information. 136
8.7.1 Computerizing the Data. 137
8.7.2 Validation of Web Corpus. 140
8.8 Problems in Generation of Web Text Corpus. 140
8.8.1 Technical Problems. 141
8.8.2 Linguistic Problems. 141
8.9 Conclusion. 144
References. 145
9 Pre-digital Corpora (Part 1). 147
9.1 Introduction. 147
9.2 The Questions of Relevance. 148
9.3 Word Collection from Corpora for Dictionary Compilation. 150
9.3.1 Johnson’s Dictionary (1755). 151
9.3.2 The Oxford English Dictionary (1882). 153
9.3.3 Supplementary Volumes of the Oxford English Dictionary. 156
9.3.4 Dictionary of American English. 157
9.4 Collecting Quotations for Dictionary. 158
9.5 Corpora in Lexical Study. 160
9.6 Corpora for Writing Grammars. 162
9.7 Conclusion. 164
References. 164
10 Pre-digital Corpora (Part 2). 167
10.1 Introduction. 167
10.2 Corpora in Dialect Study. 168
10.3 Corpora in Speech Study. 176
10.4 Corpora in Language Pedagogy. 179
10.5 Corpora in Language Acquisition. 181
10.6 Corpora in Stylistic Studies. 182
10.7 Corpora in Other Fields. 183
10.8 Conclusion. 184
References. 185
11 Digital Text Corpora (Part 1). 187
11.1 Introduction. 187
11.2 The Brown Corpus. 188
11.3 The LOB Corpus. 191
11.4 The Australian Corpus of English. 194
11.5 The Corpus of New Zealand English. 195
11.6 The Freiburg-LOB Corpus. 197
11.7 The International Corpus of English. 198
11.8 Conclusion. 201
References. 201
12 Digital Text Corpora (Part 2). 203
12.1 Introduction. 203
12.2 British National Corpus. 204
12.3 BNC-Baby. 205
12.4 American National Corpus. 206
12.5 Bank of English. 208
12.6 Croatian National Corpus. 209
12.7 English-Norwegian Parallel Corpus. 210
12.8 Some Small-Sized Text Corpora. 212
12.9 Conclusion. 218
References. 219
13 Digital Speech Corpora. 221
13.1 Introduction. 221
13.2 The Hurdles. 222
13.3 Relevance of the Survey. 223
13.4 Speech Part of Survey of English Usage. 224
13.5 London-Lund Corpus of Spoken English. 226
13.6 Machine-Readable Corpus of Spoken English. 227
13.7 Corpus of Spoken New Zealand English. 228
13.8 Michigan Corpus of Academic Speech. 230
13.9 Corpus of London Teenage Language. 233
13.10 Some Small-Sized Speech Corpora. 235
13.11 Conclusion. 238
References. 238
14 Utilization of Language Corpora. 241
14.1 Introduction. 241
14.2 Utility of a Corpus. 242
14.3 The Revival Story. 244
14.4 Use of a Corpus. 245
14.5 Corpus Users. 248
14.5.1 Language Specialists. 248
14.5.2 Content Specialists. 249
14.5.3 Media Specialists. 249
14.6 Corpora in Language Technology. 250
14.7 Mutual Dependency Interface. 256
14.8 Conclusion. 257
References. 257
15 Limitations of Language Corpora. 259
15.1 Introduction. 259
15.2 Criticism from Generative Linguistics. 261
15.3 Paucity in Balanced Text Representation. 262
15.4 Limitations in Technical Efficiency. 263
15.5 Supremacy of Text Over Speech. 265
15.6 Scarcity of Dialogic Texts. 267
15.7 Lack of Pictorial Elements in Corpus. 268
15.8 Lack of Poetic Text. 269
15.9 Other Limitations. 270
15.10 Conclusion. 271
References. 271
Author Index. 273
Subject Index. 277
Niladri Sekhar Dash • S. Arulmozi
History, Features, and Typology of Language Corpora
This book discusses key issues of corpus linguistics like the definition of the corpus,
primary features of a corpus, and utilization and limitations of corpora. It presents a
unique classification scheme of language corpora to show how they can be studied from
the perspective of genre, nature, text type, purpose, and application. A reference to
parallel translation corpus is mandatory in the discussion of corpus generation, which
the authors thoroughly address here, with a focus on Indian language corpora and
English. Web-text corpus, a new development in corpus linguistics, is also discussed
with elaborate reference to Indian web text corpora. The book also presents a short
history of corpus generation and provides scenarios before and after the advent of
computer-generated digital corpora.
This book has several important features: it discusses many technical issues of the field
in a lucid manner; contains extensive new diagrams and charts for easy comprehension;
and presents discussions in simplified English to cater to the needs of non-native
English readers. This is an important resource authored by academics who have many
years of experience teaching and researching corpus linguistics. Its focus on Indian
languages and on English corpora makes it applicable to students of graduate and
postgraduate courses in applied linguistics, computational linguistics and language
processing in South Asia and across countries where English is spoken as a first or
second language. |
any_adam_object | 1 |
author | Dash, Niladri Sekhar 1967- Arulmozi, S. |
author_GND | (DE-588)143093800 |
author_facet | Dash, Niladri Sekhar 1967- Arulmozi, S. |
author_role | aut aut |
author_sort | Dash, Niladri Sekhar 1967- |
author_variant | n s d ns nsd s a sa |
building | Verbundindex |
bvnumber | BV044792370 |
classification_rvk | ES 900 ER 765 |
ctrlnum | (OCoLC)1028925239 (DE-599)DNB1142550389 |
discipline | Sprachwissenschaft Literaturwissenschaft |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV044792370</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20180914</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">180222s2018 si a||| |||| 00||| eng d</controlfield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">1142550389</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9789811074578</subfield><subfield code="c">hardback : 82.38 EUR</subfield><subfield code="9">978-981-10-7457-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1028925239</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DNB1142550389</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">si</subfield><subfield code="c">XB-SG</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-384</subfield><subfield code="a">DE-19</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 900</subfield><subfield code="0">(DE-625)27926:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ER 765</subfield><subfield code="0">(DE-625)27756:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">400</subfield><subfield code="2">sdnb</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Dash, Niladri 
Sekhar</subfield><subfield code="d">1967-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)143093800</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">History, features, and typology of language corpora</subfield><subfield code="c">Niladri Sekhar Dash, S. Arulmozi</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Singapore</subfield><subfield code="b">Springer</subfield><subfield code="c">[2018]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2018</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxix, 293 Seiten</subfield><subfield code="b">Illustrationen</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Arulmozi, S.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield 
code="z">978-981-10-7458-5</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030187547&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030187547&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-030187547</subfield></datafield></record></collection> |
id | DE-604.BV044792370 |
illustrated | Illustrated |
indexdate | 2024-08-21T00:49:43Z |
institution | BVB |
isbn | 9789811074578 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-030187547 |
oclc_num | 1028925239 |
open_access_boolean | |
owner | DE-29 DE-11 DE-384 DE-19 DE-BY-UBM |
owner_facet | DE-29 DE-11 DE-384 DE-19 DE-BY-UBM |
physical | xxix, 293 Seiten Illustrationen |
publishDate | 2018 |
publishDateSearch | 2018 |
publishDateSort | 2018 |
publisher | Springer |
record_format | marc |
spelling | Dash, Niladri Sekhar 1967- Verfasser (DE-588)143093800 aut History, features, and typology of language corpora Niladri Sekhar Dash, S. Arulmozi Singapore Springer [2018] © 2018 xxix, 293 Seiten Illustrationen txt rdacontent n rdamedia nc rdacarrier Korpus Linguistik (DE-588)4165338-5 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 s DE-604 Arulmozi, S. Verfasser aut Erscheint auch als Online-Ausgabe 978-981-10-7458-5 Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030187547&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030187547&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext |
spellingShingle | Dash, Niladri Sekhar 1967- Arulmozi, S. History, features, and typology of language corpora Korpus Linguistik (DE-588)4165338-5 gnd |
subject_GND | (DE-588)4165338-5 |
title | History, features, and typology of language corpora |
title_auth | History, features, and typology of language corpora |
title_exact_search | History, features, and typology of language corpora |
title_full | History, features, and typology of language corpora Niladri Sekhar Dash, S. Arulmozi |
title_fullStr | History, features, and typology of language corpora Niladri Sekhar Dash, S. Arulmozi |
title_full_unstemmed | History, features, and typology of language corpora Niladri Sekhar Dash, S. Arulmozi |
title_short | History, features, and typology of language corpora |
title_sort | history features and typology of language corpora |
topic | Korpus Linguistik (DE-588)4165338-5 gnd |
topic_facet | Korpus Linguistik |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030187547&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030187547&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT dashniladrisekhar historyfeaturesandtypologyoflanguagecorpora AT arulmozis historyfeaturesandtypologyoflanguagecorpora |