Text Analytics for corpus linguistics and digital humanities: simple R scripts and tools
Main Author: | Schneider, Gerold |
---|---|
Format: | Book |
Language: | English |
Published: |
London ; New York ; Oxford ; New Delhi ; Sydney
Bloomsbury Academic
2024
|
Series: | Language, data science and digital humanities
|
Subjects: | |
Online Access: | Cover, Table of contents |
Summary: | Do you want to gain a deeper understanding of how big tech analyzes and exploits our text data, or investigate how political parties differ by analyzing textual styles, associations and trends in documents? Or create a map of a text collection and write a simple QA system yourself? This open access book explores how to apply state-of-the-art text analytics methods to detect and visualize phenomena in text data. Solidly based on methods from corpus linguistics, natural language processing, text analytics and digital humanities, this book shows readers how to conduct experiments with their own corpora and research questions, underpin their theories, quantify the differences and pinpoint characteristics. Case studies and experiments are detailed in every chapter using real-world and open access corpora from politics, World English, history, and literature. The results are interpreted and put into perspective, pitfalls are pointed out, and necessary pre-processing steps are demonstrated. This book also demonstrates how to use the programming language R, as well as simple alternatives and additions to R, to conduct experiments and employ visualisations by example, with extensible R-code, recipes, links to corpora, and a wide range of methods. The methods introduced can be used across texts of all disciplines, from history or literature to party manifestos and patient reports. The ebook editions of this book are available open access under a CC BY-NC-ND 4.0 licence on bloomsburycollections.com |
Physical Description: | ix, 224 pages, illustrations, diagrams |
ISBN: | 9781350370821 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV049886779 | ||
003 | DE-604 | ||
005 | 20241206 | ||
007 | t| | ||
008 | 240926s2024 xx a||| |||| 00||| eng d | ||
020 | |a 9781350370821 |9 978-1-350-37082-1 | ||
035 | |a (OCoLC)1425548278 | ||
035 | |a (DE-599)KXP1859363423 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-739 |a DE-20 | ||
082 | 0 | |a 005.13/3 |2 23 | |
082 | 0 | |a 005.133 | |
084 | |a HF 450 |0 (DE-625)48914: |2 rvk | ||
100 | 1 | |a Schneider, Gerold |e Verfasser |0 (DE-588)140606904 |4 aut | |
245 | 1 | 0 | |a Text Analytics for corpus linguistics and digital humanities |b simple r scripts and tools |c Gerold Schneider |
264 | 1 | |a London ; New York ; Oxford ; New Delhi ; Sydney |b Bloomsbury Academic |c 2024 | |
300 | |a ix, 224 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Language, data science and digital humanities | |
520 | 3 | |a Do you want to gain a deeper understanding of how big tech analyzes and exploits our text data, or investigate how political parties differ by analyzing textual styles, associations and trends in documents? Or create a map of a text collection and write a simple QA system yourself? This open access book explores how to apply state-of-the-art text analytics methods to detect and visualize phenomena in text data. Solidly based on methods from corpus linguistics, natural language processing, text analytics and digital humanities, this book shows readers how to conduct experiments with their own corpora and research questions, underpin their theories, quantify the differences and pinpoint characteristics. Case studies and experiments are detailed in every chapter using real-world and open access corpora from politics, World English, history, and literature. The results are interpreted and put into perspective, pitfalls are pointed out, and necessary pre-processing steps are demonstrated. This book also demonstrates how to use the programming language R, as well as simple alternatives and additions to R, to conduct experiments and employ visualisations by example, with extensible R-code, recipes, links to corpora, and a wide range of methods. The methods introduced can be used across texts of all disciplines, from history or literature to party manifestos and patient reports. The ebook editions of this book are available open access under a CC BY-NC-ND 4.0 licence on bloomsburycollections.com | |
650 | 0 | 7 | |a Digital Humanities |0 (DE-588)1038714850 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Textanalyse |0 (DE-588)4194196-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Computerlinguistik |0 (DE-588)4035843-4 |2 gnd |9 rswk-swf |
653 | 0 | |a Text data mining | |
653 | 0 | |a R (Computer program language) | |
653 | 0 | |a Corpora (Linguistics) / Data processing | |
653 | 0 | |a Digital humanities / Research / Methodology | |
653 | 0 | |a COMPUTERS / Natural Language Processing | |
653 | 0 | |a COMPUTERS / Programming Languages / General | |
653 | 0 | |a Computational linguistics | |
653 | 0 | |a Computerlinguistik und Korpuslinguistik | |
653 | 0 | |a Data analysis: general | |
653 | 0 | |a Datenwissenschaft und -analyse: allgemein | |
653 | 0 | |a LANGUAGE ARTS & DISCIPLINES / Library & Information Science | |
653 | 0 | |a Programmier- und Skriptsprachen, allgemein | |
653 | 0 | |a Programming & scripting languages: general | |
689 | 0 | 0 | |a Digital Humanities |0 (DE-588)1038714850 |D s |
689 | 0 | 1 | |a Computerlinguistik |0 (DE-588)4035843-4 |D s |
689 | 0 | 2 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | 3 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |D s |
689 | 0 | 4 | |a Textanalyse |0 (DE-588)4194196-2 |D s |
689 | 0 | |5 DE-604 | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 9781350370852 |z 9781350370838 |z 9781350370845 |
856 | 4 | 2 | |u https://www.dietmardreier.de/annot/426F6F6B446174617C7C393738313335303337303832317C7C434F50.jpg?sq=2 |x Verlag |3 Cover |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035226020&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-035226020 |
Record in the search index
_version_ | 1822482871870291969 |
---|---|
adam_text |
Contents: List of Figures viii; List of Tables x; Acknowledgements xi; 1 Introduction 1; 2 Spikes of Frequencies 17; 3 Frequency Lists 39; 4 Overuse and Keywords 55; 5 Document Classification 75; 6 Topic Modelling 105; 7 Kernel Density Estimation for Conceptual Maps 141; 8 Distributional Semantics and Word Embeddings 165; 9 BERT and GPT-x Models 185; 10 Conclusion 203; Notes 207; References 211; Index 223 |
any_adam_object | 1 |
author | Schneider, Gerold |
author_GND | (DE-588)140606904 |
author_facet | Schneider, Gerold |
author_role | aut |
author_sort | Schneider, Gerold |
author_variant | g s gs |
building | Verbundindex |
bvnumber | BV049886779 |
classification_rvk | HF 450 |
ctrlnum | (OCoLC)1425548278 (DE-599)KXP1859363423 |
dewey-full | 005.13/3 005.133 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security |
dewey-raw | 005.13/3 005.133 |
dewey-search | 005.13/3 005.133 |
dewey-sort | 15.13 13 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Anglistik / Amerikanistik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV049886779</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20241206</controlfield><controlfield tag="007">t|</controlfield><controlfield tag="008">240926s2024 xx a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781350370821</subfield><subfield code="9">978-1-350-37082-1</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1425548278</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)KXP1859363423</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield><subfield code="a">DE-20</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">005.13/3</subfield><subfield code="2">23</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">005.133</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">HF 450</subfield><subfield code="0">(DE-625)48914:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Schneider, Gerold</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)140606904</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Text Analytics for corpus linguistics and digital humanities</subfield><subfield code="b">simple r scripts and tools</subfield><subfield code="c">Gerold Schneider</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield 
code="a">London ; New York ; Oxford ; New Dehli ; Sydney</subfield><subfield code="b">Bloomsbury Academic</subfield><subfield code="c">2024</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">ix, 224 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Language, data science and digital humanities</subfield></datafield><datafield tag="520" ind1="3" ind2=" "><subfield code="a">Do you want to gain a deeper understanding of how big tech analyzes and exploits our text data, or investigate how political parties differ by analyzing textual styles, associations and trends in documents? Or create a map of a text collection and write a simple QA system yourself? This open access book explores how to apply state-of-the-art text analytics methods to detect and visualize phenomena in text data. Solidly based on methods from corpus linguistics, natural language processing, text analytics and digital humanities, this book shows readers how to conduct experiments with their own corpora and research questions, underpin their theories, quantify the differences and pinpoint characteristics. Case studies and experiments are detailed in every chapter using real-world and open access corpora from politics, World English, history, and literature. The results are interpreted and put into perspective, pitfalls are pointed out, and necessary pre-processing steps are demonstrated. 
This book also demonstrates how to use the programming language R, as well as simple alternatives and additions to R, to conduct experiments and employ visualisations by example, with extensible R-code, recipes, links to corpora, and a wide range of methods. The methods introduced can be used across texts of all disciplines, from history or literature to party manifestos and patient reports.The ebook editions of this book are available open access under a CC BY-NC-ND 4.0 licence on bloomsburycollections.com</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Digital Humanities</subfield><subfield code="0">(DE-588)1038714850</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Textanalyse</subfield><subfield code="0">(DE-588)4194196-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Text data mining</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">R (Computer program language)</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Corpora (Linguistics) / Data processing</subfield></datafield><datafield tag="653" ind1=" " 
ind2="0"><subfield code="a">Digital humanities / Research / Methodology</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">COMPUTERS / Natural Language Processing</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">COMPUTERS / Programming Languages / General</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Computational linguistics</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Computerlinguistik und Korpuslinguistik</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Data analysis: general</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Datenwissenschaft und -analyse: allgemein</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">LANGUAGE ARTS & DISCIPLINES / Library & Information Science</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Programmier- und Skriptsprachen, allgemein</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Programming & scripting languages: general</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Digital Humanities</subfield><subfield code="0">(DE-588)1038714850</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="4"><subfield code="a">Textanalyse</subfield><subfield 
code="0">(DE-588)4194196-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">9781350370852</subfield><subfield code="z">9781350370838</subfield><subfield code="z">9781350370845</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://www.dietmardreier.de/annot/426F6F6B446174617C7C393738313335303337303832317C7C434F50.jpg?sq=2</subfield><subfield code="x">Verlag</subfield><subfield code="3">Cover</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035226020&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-035226020</subfield></datafield></record></collection> |
id | DE-604.BV049886779 |
illustrated | Illustrated |
indexdate | 2025-01-28T09:05:06Z |
institution | BVB |
isbn | 9781350370821 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-035226020 |
oclc_num | 1425548278 |
open_access_boolean | |
owner | DE-739 DE-20 |
owner_facet | DE-739 DE-20 |
physical | ix, 224 Seiten Illustrationen, Diagramme |
publishDate | 2024 |
publishDateSearch | 2024 |
publishDateSort | 2024 |
publisher | Bloomsbury Academic |
record_format | marc |
series2 | Language, data science and digital humanities |
spelling | Schneider, Gerold Verfasser (DE-588)140606904 aut Text Analytics for corpus linguistics and digital humanities simple r scripts and tools Gerold Schneider London ; New York ; Oxford ; New Dehli ; Sydney Bloomsbury Academic 2024 ix, 224 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Language, data science and digital humanities Do you want to gain a deeper understanding of how big tech analyzes and exploits our text data, or investigate how political parties differ by analyzing textual styles, associations and trends in documents? Or create a map of a text collection and write a simple QA system yourself? This open access book explores how to apply state-of-the-art text analytics methods to detect and visualize phenomena in text data. Solidly based on methods from corpus linguistics, natural language processing, text analytics and digital humanities, this book shows readers how to conduct experiments with their own corpora and research questions, underpin their theories, quantify the differences and pinpoint characteristics. Case studies and experiments are detailed in every chapter using real-world and open access corpora from politics, World English, history, and literature. The results are interpreted and put into perspective, pitfalls are pointed out, and necessary pre-processing steps are demonstrated. This book also demonstrates how to use the programming language R, as well as simple alternatives and additions to R, to conduct experiments and employ visualisations by example, with extensible R-code, recipes, links to corpora, and a wide range of methods. 
The methods introduced can be used across texts of all disciplines, from history or literature to party manifestos and patient reports.The ebook editions of this book are available open access under a CC BY-NC-ND 4.0 licence on bloomsburycollections.com Digital Humanities (DE-588)1038714850 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf Textanalyse (DE-588)4194196-2 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 gnd rswk-swf Computerlinguistik (DE-588)4035843-4 gnd rswk-swf Text data mining R (Computer program language) Corpora (Linguistics) / Data processing Digital humanities / Research / Methodology COMPUTERS / Natural Language Processing COMPUTERS / Programming Languages / General Computational linguistics Computerlinguistik und Korpuslinguistik Data analysis: general Datenwissenschaft und -analyse: allgemein LANGUAGE ARTS & DISCIPLINES / Library & Information Science Programmier- und Skriptsprachen, allgemein Programming & scripting languages: general Digital Humanities (DE-588)1038714850 s Computerlinguistik (DE-588)4035843-4 s Data Mining (DE-588)4428654-5 s Korpus Linguistik (DE-588)4165338-5 s Textanalyse (DE-588)4194196-2 s DE-604 Erscheint auch als Online-Ausgabe 9781350370852 9781350370838 9781350370845 https://www.dietmardreier.de/annot/426F6F6B446174617C7C393738313335303337303832317C7C434F50.jpg?sq=2 Verlag Cover Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035226020&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Schneider, Gerold Text Analytics for corpus linguistics and digital humanities simple r scripts and tools Digital Humanities (DE-588)1038714850 gnd Data Mining (DE-588)4428654-5 gnd Textanalyse (DE-588)4194196-2 gnd Korpus Linguistik (DE-588)4165338-5 gnd Computerlinguistik (DE-588)4035843-4 gnd |
subject_GND | (DE-588)1038714850 (DE-588)4428654-5 (DE-588)4194196-2 (DE-588)4165338-5 (DE-588)4035843-4 |
title | Text Analytics for corpus linguistics and digital humanities simple r scripts and tools |
title_auth | Text Analytics for corpus linguistics and digital humanities simple r scripts and tools |
title_exact_search | Text Analytics for corpus linguistics and digital humanities simple r scripts and tools |
title_full | Text Analytics for corpus linguistics and digital humanities simple r scripts and tools Gerold Schneider |
title_fullStr | Text Analytics for corpus linguistics and digital humanities simple r scripts and tools Gerold Schneider |
title_full_unstemmed | Text Analytics for corpus linguistics and digital humanities simple r scripts and tools Gerold Schneider |
title_short | Text Analytics for corpus linguistics and digital humanities |
title_sort | text analytics for corpus linguistics and digital humanities simple r scripts and tools |
title_sub | simple r scripts and tools |
topic | Digital Humanities (DE-588)1038714850 gnd Data Mining (DE-588)4428654-5 gnd Textanalyse (DE-588)4194196-2 gnd Korpus Linguistik (DE-588)4165338-5 gnd Computerlinguistik (DE-588)4035843-4 gnd |
topic_facet | Digital Humanities Data Mining Textanalyse Korpus Linguistik Computerlinguistik |
url | https://www.dietmardreier.de/annot/426F6F6B446174617C7C393738313335303337303832317C7C434F50.jpg?sq=2 http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=035226020&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT schneidergerold textanalyticsforcorpuslinguisticsanddigitalhumanitiessimplerscriptsandtools |