The web as corpus: theory and practice
Main author: | Gatto, Maristella |
---|---|
Format: | Book |
Language: | English |
Published: | London [et al.]: Bloomsbury, 2014 |
Edition: | 1. publ. |
Series: | Studies in corpus and discourse |
Subjects: | |
Online access: | Table of contents, Blurb |
Description: | Includes bibliographical references |
Physical description: | XXII, 232 pp., illustrations, diagrams |
ISBN: | 9781441161123 9781441150981 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV040796315 | ||
003 | DE-604 | ||
005 | 20210111 | ||
007 | t | ||
008 | 130305s2014 ad|| |||| 00||| eng d | ||
020 | |a 9781441161123 |c PB |9 978-1-4411-6112-3 | ||
020 | |a 9781441150981 |c HB |9 978-1-4411-5098-1 | ||
035 | |a (OCoLC)869847778 | ||
035 | |a (DE-599)BVBBV040796315 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-11 |a DE-19 |a DE-188 |a DE-12 |a DE-384 |a DE-20 |a DE-355 |a DE-83 |a DE-739 | ||
084 | |a ER 765 |0 (DE-625)27756: |2 rvk | ||
084 | |a ES 900 |0 (DE-625)27926: |2 rvk | ||
084 | |a ET 785 |0 (DE-625)28035: |2 rvk | ||
100 | 1 | |a Gatto, Maristella |e Verfasser |0 (DE-588)1048163504 |4 aut | |
245 | 1 | 0 | |a The web as corpus |b theory and practice |c Maristella Gatto |
250 | |a 1. publ. | ||
264 | 1 | |a London [u.a.] |b Bloomsbury |c 2014 | |
300 | |a XXII, 232 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Studies in corpus and discourse | |
500 | |a Literaturangaben | ||
650 | 0 | 7 | |a Information Retrieval |0 (DE-588)4072803-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a World Wide Web |0 (DE-588)4363898-3 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |D s |
689 | 0 | 1 | |a World Wide Web |0 (DE-588)4363898-3 |D s |
689 | 0 | 2 | |a Information Retrieval |0 (DE-588)4072803-1 |D s |
689 | 0 | |5 DE-604 | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-1-4725-4218-2 |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe, PDF |z 978-1-4411-3413-4 |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe, EPUB |z 978-1-4725-7153-3 |
856 | 4 | 2 | |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025776513&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Augsburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025776513&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-025776513 |
Record in the search index
_version_ | 1807956156617850880 |
---|---|
adam_text |
Is the internet a suitable linguistic corpus? How can we use it in corpus
techniques? What are the special properties that we need to be aware of?
This book answers those questions.
The Web is an exponentially increasing source of language and corpus linguistics
data. From gigantic static information resources to user-generated Web 2.0
content, the breadth and depth of information available is breathtaking - and
bewildering.
This book explores the theory and practice of the 'web as corpus'. It looks at the
most common tools and methods used and features a plethora of examples
based on the author's own teaching experience. This book also bridges the gap
between studies in computational linguistics, which emphasize technical aspects,
and studies in corpus linguistics, which focus on the implications for language
theory and use.
Contents
List of Figures xiii
List of Tables xvii
Preface xix
Acknowledgements xxi
Introduction 1
1 Corpus Linguistics: Basic Principles 5
Introduction 5
1. Theory, approach, methods: Corpus linguistics as a research field 5
2. Key issues in corpus linguistics 7
2.1 Authenticity 9
2.2 Representativeness 10
2.3 Balance and sampling 12
2.4 Size 14
2.5 Types of corpora 15
3. Corpus, concordance, collocation: Tools and analysis 16
3.1 Corpus creation 16
3.2 Corpus analysis 18
3.2.1 Word lists and keywords 19
3.2.2 Concordances 23
3.3 Collocation and basic statistics 25
3.4 Colligation and semantic associations 29
Conclusion 31
Study questions and activities 32
Suggestions for further reading 33
2 The Body and the Web: An Introduction to the Web as Corpus 35
Introduction 35
1. Corpus linguistics and the web 35
2. The web as corpus: A 'body' of texts? 39
3. The corpus and the web: Key issues 41
3.1 Authenticity 42
3.2 Representativeness 43
3.3 Size 45
3.4 Composition 49
3.4.1 Medium 51
3.4.2 Language 52
3.4.3 Topics 57
3.4.4 Registers, (web) genres, and text types 61
3.5 Copyright 63
4. From 'body' to 'web': New issues 65
4.1 Dynamism 66
4.2 Reproducibility 68
4.3 Relevance and reliability 69
Conclusion 70
Study questions and activities 71
Suggestions for further reading 71
3 Challenging Anarchy: Web Search from a Corpus Perspective 73
Introduction 73
1. The corpus and the search 73
2. Search engine basics: Crawling, indexing, searching, ranking 77
3. Google and the others: An overview of commercial search engines 79
4. Challenging anarchy: Mastering advanced web search 83
4.1 An overview of web search options 84
4.2 Limits and potentials of 'webidence' 87
4.3 Phrase search and collocation 88
4.4 Phraseology and patterns 91
4.5 Provenance, site, domain and more: Searching subsections of the web 93
4.6 Testing translation candidates 96
5. Query complexity: Web search from a corpus perspective 101
Conclusion 102
Study questions and activities 102
Suggestions for further reading 103
4 Beyond Ordinary Search Engines: Concordancing the Web 105
Introduction 105
1. Beyond ordinary search engines: Concordancing the web 105
2. WebCorp Live 106
3. Concordancing the web in the foreign language classroom 113
3.1 Exploring collocation: The case of scenery 113
3.2 Investigating neologisms and phrasal creativity 115
4. Beyond web concordancing tools: The web as/for corpus 119
5. Towards a linguist's search engine: The case of WebCorpLSE 121
5.1 The Web as/for Corpus: From WebCorp Live to WebCorpLSE 122
5.2 Using WebCorpLSE to explore contemporary English 125
5.2.1 Synchronic English Web Corpus and Diachronic English Web Corpus 126
5.2.2 Birmingham Blog Corpus 131
Conclusion 134
Study questions and activities 134
Suggestions for further reading 136
5 Building and Using Comparable Web Corpora: Tools and Methods 137
Introduction 137
1. Building DIY web corpora 137
2. From words to corpus: The 'bootstrap' process 140
2.1 Compiling a domain-specific corpus with BootCaT 140
2.2 Compiling specialized corpora with WebBootCaT 147
3. Building and using comparable web corpora for translation practice 154
Conclusion 158
Study questions and activities 159
Suggestions for further reading 161
6 Sketches of Language and Culture from Large Web Corpora 163
Introduction 163
1. From web as corpus to corpus as web: Introducing large general purpose web corpora 163
2. Mega-corpus, mini-Web: The case of ukWaC 167
2.1 Selecting 'seed URLs' and crawling 167
2.2 Post-crawl cleaning and annotation 169
3. Exploring large web corpora: Tools and resources 171
3.1 The Sketch Engine: From concordance lines to word sketches 172
3.2 The Sketch Difference function 179
4. Case study: Sketches of culture from the BNC to the web 182
4.1 A very difficult word. 183
4.2 Sketches of culture from the British National Corpus 184
4.3 Sketches of culture from ukWaC 187
4.3.1 Culture as object 188
4.3.2 Culture as subject 193
4.3.3 Modifiers of culture 195
4.3.4 Culture as modifier 196
4.3.5 The pattern culture and/or NOUN 198
4.4 A culture of: The changing face of culture in contemporary society 198
Conclusion 202
Study questions and activities 203
Suggestions for further reading 203
7 From Download to Upload: The Web as Corpus in the Web 2.0 Era 205
Introduction 205
1. From download to upload: Web users as prosumers 205
2. Web 2.0 as corpus. The case of Wikipedia as a multilingual corpus 207
3. The corpus in the cloud? The challenges that lie ahead 208
Suggestions for further reading 210
Conclusion 211
References 215
Index 229 |
any_adam_object | 1 |
author | Gatto, Maristella |
author_GND | (DE-588)1048163504 |
author_facet | Gatto, Maristella |
author_role | aut |
author_sort | Gatto, Maristella |
author_variant | m g mg |
building | Verbundindex |
bvnumber | BV040796315 |
classification_rvk | ER 765 ES 900 ET 785 |
ctrlnum | (OCoLC)869847778 (DE-599)BVBBV040796315 |
discipline | Sprachwissenschaft Literaturwissenschaft |
edition | 1. publ. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV040796315</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20210111</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">130305s2014 ad|| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781441161123</subfield><subfield code="c">PB</subfield><subfield code="9">978-1-4411-6112-3</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781441150981</subfield><subfield code="c">HB</subfield><subfield code="9">978-1-4411-5098-1</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)869847778</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV040796315</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-11</subfield><subfield code="a">DE-19</subfield><subfield code="a">DE-188</subfield><subfield code="a">DE-12</subfield><subfield code="a">DE-384</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-83</subfield><subfield code="a">DE-739</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ER 765</subfield><subfield code="0">(DE-625)27756:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 900</subfield><subfield code="0">(DE-625)27926:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ET 785</subfield><subfield 
code="0">(DE-625)28035:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Gatto, Maristella</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1048163504</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">The web as corpus</subfield><subfield code="b">theory and practice</subfield><subfield code="c">Maristella Gatto</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">London [u.a.]</subfield><subfield code="b">Bloomsbury</subfield><subfield code="c">2014</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXII, 232 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Studies in corpus and discourse</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturangaben</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">World Wide Web</subfield><subfield code="0">(DE-588)4363898-3</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Korpus</subfield><subfield 
code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">World Wide Web</subfield><subfield code="0">(DE-588)4363898-3</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-1-4725-4218-2</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe, PDF</subfield><subfield code="z">978-1-4411-3413-4</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe, EPUB</subfield><subfield code="z">978-1-4725-7153-3</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025776513&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Augsburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield 
code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025776513&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-025776513</subfield></datafield></record></collection> |
id | DE-604.BV040796315 |
illustrated | Illustrated |
indexdate | 2024-08-21T00:49:10Z |
institution | BVB |
isbn | 9781441161123 9781441150981 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-025776513 |
oclc_num | 869847778 |
open_access_boolean | |
owner | DE-11 DE-19 DE-BY-UBM DE-188 DE-12 DE-384 DE-20 DE-355 DE-BY-UBR DE-83 DE-739 |
owner_facet | DE-11 DE-19 DE-BY-UBM DE-188 DE-12 DE-384 DE-20 DE-355 DE-BY-UBR DE-83 DE-739 |
physical | XXII, 232 S. Ill., graph. Darst. |
publishDate | 2014 |
publishDateSearch | 2014 |
publishDateSort | 2014 |
publisher | Bloomsbury |
record_format | marc |
series2 | Studies in corpus and discourse |
spelling | Gatto, Maristella Verfasser (DE-588)1048163504 aut The web as corpus theory and practice Maristella Gatto 1. publ. London [u.a.] Bloomsbury 2014 XXII, 232 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Studies in corpus and discourse Literaturangaben Information Retrieval (DE-588)4072803-1 gnd rswk-swf World Wide Web (DE-588)4363898-3 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 s World Wide Web (DE-588)4363898-3 s Information Retrieval (DE-588)4072803-1 s DE-604 Erscheint auch als Online-Ausgabe 978-1-4725-4218-2 Erscheint auch als Online-Ausgabe, PDF 978-1-4411-3413-4 Erscheint auch als Online-Ausgabe, EPUB 978-1-4725-7153-3 Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025776513&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Augsburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025776513&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext |
spellingShingle | Gatto, Maristella The web as corpus theory and practice Information Retrieval (DE-588)4072803-1 gnd World Wide Web (DE-588)4363898-3 gnd Korpus Linguistik (DE-588)4165338-5 gnd |
subject_GND | (DE-588)4072803-1 (DE-588)4363898-3 (DE-588)4165338-5 |
title | The web as corpus theory and practice |
title_auth | The web as corpus theory and practice |
title_exact_search | The web as corpus theory and practice |
title_full | The web as corpus theory and practice Maristella Gatto |
title_fullStr | The web as corpus theory and practice Maristella Gatto |
title_full_unstemmed | The web as corpus theory and practice Maristella Gatto |
title_short | The web as corpus |
title_sort | the web as corpus theory and practice |
title_sub | theory and practice |
topic | Information Retrieval (DE-588)4072803-1 gnd World Wide Web (DE-588)4363898-3 gnd Korpus Linguistik (DE-588)4165338-5 gnd |
topic_facet | Information Retrieval World Wide Web Korpus Linguistik |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025776513&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025776513&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT gattomaristella thewebascorpustheoryandpractice |