Corpus linguistics and statistics with R: introduction to quantitative methods in linguistics
Main author: | Desagulier, Guillaume |
---|---|
Format: | Book |
Language: | English |
Published: | Cham: Springer, [2017] |
Series: | Quantitative methods in the humanities and social sciences |
Subjects: | R (Programm); Sprachstatistik; Korpus (Linguistik) |
Online access: | Content description; Table of contents |
Description: | xiii, 353 pages, illustrations, diagrams |
ISBN: | 9783319645704 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV044741606 | ||
003 | DE-604 | ||
005 | 20190910 | ||
007 | t | ||
008 | 180201s2017 a||| |||| 00||| eng d | ||
020 | |a 9783319645704 |c Festeinband : EUR 71.68 (DE), EUR 73.69 (AT) |9 978-3-319-64570-4 | ||
024 | 3 | |a 9783319645704 | |
035 | |a (OCoLC)1018234798 | ||
035 | |a (DE-599)BSZ496473611 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-355 |a DE-473 |a DE-521 |a DE-83 | ||
084 | |a HF 450 |0 (DE-625)48914: |2 rvk | ||
084 | |a ER 765 |0 (DE-625)27756: |2 rvk | ||
084 | |a ES 250 |0 (DE-625)27822: |2 rvk | ||
084 | |a ES 900 |0 (DE-625)27926: |2 rvk | ||
084 | |a ES 910 |0 (DE-625)27927: |2 rvk | ||
084 | |a ST 601 |0 (DE-625)143682: |2 rvk | ||
100 | 1 | |a Desagulier, Guillaume |e Verfasser |0 (DE-588)1148961305 |4 aut | |
245 | 1 | 0 | |a Corpus linguistics and statistics with R |b introduction to quantitative methods in linguistics |c Guillaume Desagulier |
264 | 1 | |a Cham |b Springer |c [2017] | |
264 | 4 | |c © 2017 | |
300 | |a xiii, 353 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Quantitative methods in the humanities and social sciences | |
650 | 0 | 7 | |a R |g Programm |0 (DE-588)4705956-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Sprachstatistik |0 (DE-588)4182534-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |D s |
689 | 0 | 1 | |a Sprachstatistik |0 (DE-588)4182534-2 |D s |
689 | 0 | 2 | |a R |g Programm |0 (DE-588)4705956-4 |D s |
689 | 0 | |5 DE-604 | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |a Desagulier, Guillaume |t Corpus Linguistics and Statistics with R |d Cham : Springer, 2017 |h Online-Ressource (XIII, 353 p. 98 illus., 55 illus. in color, online resource) |z 978-3-319-64572-8 |
776 | 0 | |z 9783319645728 | |
856 | 4 | 2 | |m X:MVB |q text/html |u http://deposit.dnb.de/cgi-bin/dokserv?id=918bdb1b6b844a94aafee81417ff2f44&prov=M&dok_var=1&dok_ext=htm |3 Inhaltstext |
856 | 4 | 2 | |m HEBIS Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030137436&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-030137436 |
Record in the search index
_version_ | 1807956189334470656 |
---|---|
adam_text |
Guillaume Desagulier
Corpus Linguistics and Statistics
with R
Introduction to Quantitative Methods in Linguistics
Springer
Contents
1 Introduction
1.1 From Introspective to Corpus-Informed Judgments
1.2 Looking for Corpus Linguistics
1.2.1 What Counts as a Corpus
1.2.2 What Linguists Do with the Corpus
1.2.3 How Central the Corpus Is to a Linguist’s Work
References
Part I Methods in Corpus Linguistics
2 R Fundamentals
2.1 Introduction
2.2 Downloads and Installs
2.2.1 Downloading and Installing R
2.2.2 Downloading and Installing RStudio
2.2.3 Downloading the Book Materials
2.3 Setting the Working Directory
2.4 R Scripts
2.5 Packages
2.5.1 Downloading Packages
2.5.2 Loading Packages
2.6 Simple Commands
2.7 Variables and Assignment
2.8 Functions and Arguments
2.8.1 Ready-Made Functions
2.8.2 User-Defined Functions
2.9 R Objects
2.9.1 Vectors
2.9.2 Lists
2.9.3 Matrices
2.9.4 Data Frames (and Factors)
2.10 for Loops
2.11 if and if else Statements
2.11.1 if Statements
2.11.2 if else Statements
2.12 Cleanup
2.13 Common Mistakes and How to Avoid Them
2.14 Further Reading
Exercises
References
3 Digital Corpora
3.1 A Short Typology
3.2 Corpus Compilation: Kennedy’s Five Steps
3.3 Unannotated Corpora
3.3.1 Collecting Textual Data
3.3.2 Character Encoding Issues
3.3.3 Creating an Unannotated Corpus
3.4 Annotated Corpora
3.4.1 Markup
3.4.2 POS-Tagging
3.4.3 POS-Tagging in R
3.4.4 Semantic Tagging
3.5 Obtaining Corpora
Exercise
References
4 Processing and Manipulating Character Strings
4.1 Introduction
4.2 Character Strings
4.2.1 Definition
4.2.2 Loading Several Text Files
4.3 First Forays into Character String Processing
4.3.1 Splitting
4.3.2 Matching
4.3.3 Replacing and Deleting
4.3.4 Limitations
4.4 Regular Expressions
4.4.1 Overview
4.4.2 Literals vs. Metacharacters
4.4.3 Line Anchors
4.4.4 Quantifiers
4.4.5 Alternations and Groupings
4.4.6 Character Classes
4.4.7 Lazy vs. Greedy Matching
4.4.8 Backreference
4.4.9 Exact Matching with strapply()
4.4.10 Lookaround
Exercises
5 Applied Character String Processing
5.1 Introduction
5.2 Concordances
5.2.1 A Concordance Based on an Unannotated Corpus
5.2.2 A Concordance Based on an Annotated Corpus
5.3 Making a Data Frame from an Annotated Corpus
5.3.1 Planning the Data Frame
5.3.2 Compiling the Data Frame
5.3.3 The Full Script
5.4 Frequency Lists
5.4.1 A Frequency List of a Raw Text File
5.4.2 A Frequency List of an Annotated File
Exercises
References
6 Summary Graphics for Frequency Data
6.1 Introduction
6.2 Plots, Barplots, and Histograms
6.3 Word Clouds
6.4 Dispersion Plots
6.5 Strip Charts
6.6 Reshaping Tabulated Data
6.7 Motion Charts
Exercises
References
Part II Statistics for Corpus Linguistics
7 Descriptive Statistics
7.1 Variables
7.2 Central Tendency
7.2.1 The Mean
7.2.2 The Median
7.2.3 The Mode
7.3 Dispersion
7.3.1 Quantiles
7.3.2 Boxplots
7.3.3 Variance and Standard Deviation
Exercises
8 Notions of Statistical Testing
8.1 Introduction
8.2 Probabilities
8.2.1 Definition
8.2.2 Simple Probabilities
8.2.3 Joint and Marginal Probabilities
8.2.4 Union vs. Intersection
8.2.5 Conditional Probabilities
8.2.6 Independence
8.3 Populations, Samples, and Individuals
8.4 Random Variables
8.5 Response/Dependent vs. Explanatory/Descriptive/Independent Variables
8.6 Hypotheses
8.7 Hypothesis Testing
8.8 Probability Distributions
8.8.1 Discrete Distributions
8.8.2 Continuous Distributions
8.9 The χ² Test
8.9.1 A Case Study: The Quotative System in British and Canadian Youth
8.10 The Fisher Exact Test of Independence
8.11 Correlation
8.11.1 Pearson’s r
8.11.2 Kendall’s τ
8.11.3 Spearman’s ρ
8.11.4 Correlation Is Not Causation
Exercises
References
9 Association and Productivity
9.1 Introduction
9.2 Cooccurrence Phenomena
9.2.1 Collocation
9.2.2 Colligation
9.2.3 Collostruction
9.3 Association Measures
9.3.1 Measuring Significant Co-occurrences
9.3.2 The Logic of Association Measures
9.3.3 A Quick Inventory of Association Measures
9.3.4 A Loop for Association Measures
9.3.5 There Is No Perfect Association Measure
9.3.6 Collostructions
9.3.7 Asymmetric Association Measures
9.4 Lexical Richness and Productivity
9.4.1 Hapax-Based Measures
9.4.2 Types, Tokens, and Type-Token Ratio
9.4.3 Vocabulary Growth Curves
Exercise
References
10 Clustering Methods
10.1 Introduction
10.1.1 Multidimensional Data
10.1.2 Visualization
10.2 Principal Component Analysis
10.2.1 Principles of Principal Component Analysis
10.2.2 A Case Study: Characterizing Genres with Prosody in Spoken French
10.2.3 How PCA Works
10.3 An Alternative to PCA: t-SNE
10.4 Correspondence Analysis
10.4.1 Principles of Correspondence Analysis
10.4.2 Case Study: General Extenders in the Speech of English Teenagers
10.4.3 How CA Works
10.4.4 Supplementary Variables
10.5 Multiple Correspondence Analysis
10.5.1 Principles of Multiple Correspondence Analysis
10.5.2 Case Study: Predeterminer vs. Preadjectival Uses of Quite and Rather
10.5.3 Confidence Ellipses
10.5.4 Beyond MCA
10.6 Hierarchical Cluster Analysis
10.6.1 The Principles of Hierarchical Cluster Analysis
10.6.2 Case Study: Clustering English Intensifiers
10.6.3 Cluster Classes
10.6.4 Standardizing Variables
10.7 Networks
10.7.1 What Is a Graph?
10.7.2 The Linguistic Relevance of Graphs
Exercises
References
A Appendix
A.1 Chapter 6
A.1.1 Dispersion Plots
A.2 Chapter 8
A.2.1 Contingency Table
A.2.2 Discrete Probability Distributions
A.2.3 A χ² Distribution Table
B Bibliography
Solutions
Index |
any_adam_object | 1 |
author | Desagulier, Guillaume |
author_GND | (DE-588)1148961305 |
author_facet | Desagulier, Guillaume |
author_role | aut |
author_sort | Desagulier, Guillaume |
author_variant | g d gd |
building | Verbundindex |
bvnumber | BV044741606 |
classification_rvk | HF 450 ER 765 ES 250 ES 900 ES 910 ST 601 |
ctrlnum | (OCoLC)1018234798 (DE-599)BSZ496473611 |
discipline | Linguistics; Computer Science; English / American Studies; Literary Studies |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV044741606</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20190910</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">180201s2017 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9783319645704</subfield><subfield code="c">Festeinband : EUR 71.68 (DE), EUR 73.69 (AT)</subfield><subfield code="9">978-3-319-64570-4</subfield></datafield><datafield tag="024" ind1="3" ind2=" "><subfield code="a">9783319645704</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1018234798</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BSZ496473611</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-355</subfield><subfield code="a">DE-473</subfield><subfield code="a">DE-521</subfield><subfield code="a">DE-83</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">HF 450</subfield><subfield code="0">(DE-625)48914:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ER 765</subfield><subfield code="0">(DE-625)27756:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 250</subfield><subfield code="0">(DE-625)27822:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 900</subfield><subfield code="0">(DE-625)27926:</subfield><subfield 
code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 910</subfield><subfield code="0">(DE-625)27927:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 601</subfield><subfield code="0">(DE-625)143682:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Desagulier, Guillaume</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1148961305</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Corpus linguistics and statistics with R</subfield><subfield code="b">introduction to quantitative methods in linguistics</subfield><subfield code="c">Guillaume Desagulier</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Cham</subfield><subfield code="b">Springer</subfield><subfield code="c">[2017]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2017</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xiii, 353 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Quantitative methods in the humanities and social sciences</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">R</subfield><subfield code="g">Programm</subfield><subfield code="0">(DE-588)4705956-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield 
tag="650" ind1="0" ind2="7"><subfield code="a">Sprachstatistik</subfield><subfield code="0">(DE-588)4182534-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Sprachstatistik</subfield><subfield code="0">(DE-588)4182534-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">R</subfield><subfield code="g">Programm</subfield><subfield code="0">(DE-588)4705956-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="a">Desagulier, Guillaume</subfield><subfield code="t">Corpus Linguistics and Statistics with R</subfield><subfield code="d">Cham : Springer, 2017</subfield><subfield code="h">Online-Ressource (XIII, 353 p. 98 illus., 55 illus. 
in color, online resource)</subfield><subfield code="z">978-3-319-64572-8</subfield></datafield><datafield tag="776" ind1="0" ind2=" "><subfield code="z">9783319645728</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">X:MVB</subfield><subfield code="q">text/html</subfield><subfield code="u">http://deposit.dnb.de/cgi-bin/dokserv?id=918bdb1b6b844a94aafee81417ff2f44&prov=M&dok_var=1&dok_ext=htm</subfield><subfield code="3">Inhaltstext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HEBIS Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030137436&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-030137436</subfield></datafield></record></collection> |
id | DE-604.BV044741606 |
illustrated | Illustrated |
indexdate | 2024-08-21T00:49:42Z |
institution | BVB |
isbn | 9783319645704 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-030137436 |
oclc_num | 1018234798 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR DE-473 DE-BY-UBG DE-521 DE-83 |
owner_facet | DE-355 DE-BY-UBR DE-473 DE-BY-UBG DE-521 DE-83 |
physical | xiii, 353 Seiten Illustrationen, Diagramme |
publishDate | 2017 |
publishDateSearch | 2017 |
publishDateSort | 2017 |
publisher | Springer |
record_format | marc |
series2 | Quantitative methods in the humanities and social sciences |
spelling | Desagulier, Guillaume Verfasser (DE-588)1148961305 aut Corpus linguistics and statistics with R introduction to quantitative methods in linguistics Guillaume Desagulier Cham Springer [2017] © 2017 xiii, 353 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Quantitative methods in the humanities and social sciences R Programm (DE-588)4705956-4 gnd rswk-swf Sprachstatistik (DE-588)4182534-2 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 s Sprachstatistik (DE-588)4182534-2 s R Programm (DE-588)4705956-4 s DE-604 Erscheint auch als Online-Ausgabe Desagulier, Guillaume Corpus Linguistics and Statistics with R Cham : Springer, 2017 Online-Ressource (XIII, 353 p. 98 illus., 55 illus. in color, online resource) 978-3-319-64572-8 9783319645728 X:MVB text/html http://deposit.dnb.de/cgi-bin/dokserv?id=918bdb1b6b844a94aafee81417ff2f44&prov=M&dok_var=1&dok_ext=htm Inhaltstext HEBIS Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030137436&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Desagulier, Guillaume Corpus linguistics and statistics with R introduction to quantitative methods in linguistics R Programm (DE-588)4705956-4 gnd Sprachstatistik (DE-588)4182534-2 gnd Korpus Linguistik (DE-588)4165338-5 gnd |
subject_GND | (DE-588)4705956-4 (DE-588)4182534-2 (DE-588)4165338-5 |
title | Corpus linguistics and statistics with R introduction to quantitative methods in linguistics |
title_auth | Corpus linguistics and statistics with R introduction to quantitative methods in linguistics |
title_exact_search | Corpus linguistics and statistics with R introduction to quantitative methods in linguistics |
title_full | Corpus linguistics and statistics with R introduction to quantitative methods in linguistics Guillaume Desagulier |
title_fullStr | Corpus linguistics and statistics with R introduction to quantitative methods in linguistics Guillaume Desagulier |
title_full_unstemmed | Corpus linguistics and statistics with R introduction to quantitative methods in linguistics Guillaume Desagulier |
title_short | Corpus linguistics and statistics with R |
title_sort | corpus linguistics and statistics with r introduction to quantitative methods in linguistics |
title_sub | introduction to quantitative methods in linguistics |
topic | R Programm (DE-588)4705956-4 gnd Sprachstatistik (DE-588)4182534-2 gnd Korpus Linguistik (DE-588)4165338-5 gnd |
topic_facet | R Programm Sprachstatistik Korpus Linguistik |
url | http://deposit.dnb.de/cgi-bin/dokserv?id=918bdb1b6b844a94aafee81417ff2f44&prov=M&dok_var=1&dok_ext=htm http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=030137436&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT desagulierguillaume corpuslinguisticsandstatisticswithrintroductiontoquantitativemethodsinlinguistics |