Availability: Exploring newspaper language

Exploring newspaper language: using the web to create and investigate a large corpus of modern Norwegian

Bibliographic Details
Format:	Electronic E-Book
Language:	English
Published:	Amsterdam ; Philadelphia John Benjamins Pub. Co. 2012
Subjects:	FOREIGN LANGUAGE STUDY / Norwegian FOREIGN LANGUAGE STUDY / Scandinavian Languages (Other) Massenmedien Norwegian language (Nynorsk) > Usage Norwegian language (Nynorsk) > Syntax Newspapers > Norway Mass media > Norway Information technology > Norway Norwegisch Zeitungssprache Korpus > Linguistik Norwegen Aufsatzsammlung
Online Access:	DE-1046 DE-1047 Full text
Item Description:	6. Data and experimental evaluation; Print version record
Physical Description:	1 online resource (362 pages)
ISBN:	1280497661 9027274991 9781280497667 9789027274991

Internal format

MARC


LEADER	00000nmm a2200000zc 4500
001	BV043033325
003	DE-604
007	cr\|uuu---uuuuu
008	151120s2012 \|\|\|\| o\|\|u\| \|\|\|\|\|\|eng d
020			\|a 1280497661 \|9 1-280-49766-1
020			\|a 9027274991 \|c electronic bk. \|9 90-272-7499-1
020			\|a 9781280497667 \|9 978-1-280-49766-7
020			\|a 9789027274991 \|c electronic bk. \|9 978-90-272-7499-1
035			\|a (OCoLC)779828976
035			\|a (DE-599)BVBBV043033325
040			\|a DE-604 \|b ger \|e rda
041	0		\|a eng
049			\|a DE-1046 \|a DE-1047
082	0		\|a 439.8/20188 \|2 23
245	1	0	\|a Exploring newspaper language \|b using the web to create and investigate a large corpus of modern Norwegian \|c edited by Gisle Andersen
264		1	\|a Amsterdam ; Philadelphia \|b John Benjamins Pub. Co. \|c 2012
300			\|a 1 online resource (362 pages)
336			\|b txt \|2 rdacontent
337			\|b c \|2 rdamedia
338			\|b cr \|2 rdacarrier
500			\|a 6. Data and experimental evaluation
500			\|a Print version record
505	8		\|a Exploring Newspaper Language; Editorial page; Title page; LCC data; Table of contents; Building a large corpus based on newspapers from the web; 1. Introduction; 2. An overview of the Norwegian Newspaper Corpus and its system architecture; 2.1 Text harvesting; 2.2 Boilerplate and duplicate removal; 2.3 Language classification; 2.4 Text annotation; 2.4.1 Annotation of source, date and author information; 2.4.2 Topic classification; 2.4.3 Part-of-speech tagging; 2.5 Search system and user interface; 2.5.1 Corpus WorkBench; 2.5.2 Corpuscle; 2.6 Extraction of new words
505	8		\|a 2.7 Classification of new words; 2.7.1 Anglicism detection; 2.8 Frequency profiling and lexical database entry; 2.9 Identification of multiword expressions; 3. The content of the research contributions to this book; 4. Concluding remarks; References; Part II. Exploiting the web as a corpus -- Methods and tools; Corpuscle -- a new corpus management platform for annotated corpora; 1. Introduction; 2. Design principles; 3. Querying the corpus; 4. API and Web interface; 4.1 The API; 4.2 The Web interface; 5. Editing and manual annotation; 6. Evaluation and concluding remarks; References; OBT+stat
505	8		\|a 1. Introduction; 2. Background; 2.1 The history of the Oslo-Bergen Tagger; 2.2 State of the art for Norwegian POS taggers; 3. The architecture of the Oslo-Bergen Constraint Grammar Tagger; 4. Methodology of improvements to the Oslo-Bergen Tagger; 5. Dealing with left-over ambiguities in the Oslo-Bergen Tagger; 5.1 Morphological ambiguities; 5.2 Lemma ambiguities; 6. Statistical disambiguation; 7. Modelling challenges and engineering concerns; 8. Evaluation of the statistical module; 8.1 How to evaluate; 8.2 Evaluation results; 9. Conclusion; References
505	8		\|a Exploring corpora through syntactic annotation; 1. Introduction; 2. Treebanking; 3. INESS -- the Norwegian treebanking infrastructure; 4. Searching for complex syntactic constructions in a treebank; 4.1 Passive constructions; 4.2 Relative clauses; 5. Conclusion; References; Collocations and statistical analysis of n-grams; 1. Introduction; 2. Background; 2.1 Multiword Expressions (MWEs); 2.2 Collocations; 3. Methodology; 3.1 Data and n-gram extraction; 3.2 Post-processing of n-gram lists; 3.3 Contingency tables; 3.3.1 Bigram Contingency Tables; 3.3.2 Trigram Contingency Tables
505	8		\|a 3.4 Bigram Association Measures; 3.5 Trigram Association Measures; 4. Results; 4.1 Bigrams; 4.2 Trigrams; 5. Conclusion and Future Work; References; Automatic topic classification of a large newspaper corpus; 1. Introduction; 2. Background and related work; 2.1 The rule-based approach; 2.2 The pattern-matching approach; 2.3 Promising results; 3. Material; 3.1 Manual annotation; 3.2 Feature extraction; 3.3 Cleaning the text; 3.4 The gold standard; 4. Overview of our final approach; 5. Our approach in detail; 5.1 Hypothesis; 5.2 Defining categories; 5.3 Tools; 5.4 Programming and experimenting
505	8		\|a This book describes new methodological and technological approaches to corpus building and presents recent research based on the Norwegian Newspaper Corpus. This is a large monitor corpus of contemporary Norwegian language, compiled through daily harvesting of web newspapers. The book gives an overview of the corpus and its system architecture, and presents tools used for tasks such as text harvesting, annotation, topic classification and extraction and frequency profiling of new words and phrases. Among the innovative technologies is Corpuscle, a corpus query engine and management system whic
650		7	\|a FOREIGN LANGUAGE STUDY / Norwegian \|2 bisacsh
650		7	\|a FOREIGN LANGUAGE STUDY / Scandinavian Languages (Other) \|2 bisacsh
650		4	\|a Massenmedien
650		4	\|a Norwegian language (Nynorsk) \|x Usage
650		4	\|a Norwegian language (Nynorsk) \|x Syntax
650		4	\|a Newspapers \|z Norway
650		4	\|a Mass media \|z Norway
650		4	\|a Information technology \|z Norway
650	0	7	\|a Norwegisch \|0 (DE-588)4120291-0 \|2 gnd \|9 rswk-swf
650	0	7	\|a Zeitungssprache \|0 (DE-588)4131821-3 \|2 gnd \|9 rswk-swf
650	0	7	\|a Korpus \|g Linguistik \|0 (DE-588)4165338-5 \|2 gnd \|9 rswk-swf
651		4	\|a Norwegen
655		7	\|8 1\p \|0 (DE-588)4143413-4 \|a Aufsatzsammlung \|2 gnd-content
689	0	0	\|a Norwegisch \|0 (DE-588)4120291-0 \|D s
689	0	1	\|a Zeitungssprache \|0 (DE-588)4131821-3 \|D s
689	0	2	\|a Korpus \|g Linguistik \|0 (DE-588)4165338-5 \|D s
689	0		\|8 2\p \|5 DE-604
700	1		\|a Andersen, Gisle \|e Sonstige \|4 oth
776	0	8	\|i Erscheint auch als \|n Druck-Ausgabe \|a Andersen, Gisle \|t Exploring Newspaper Language : Using the web to create and investigate a large corpus of modern Norwegian
856	4	0	\|u http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=439344 \|x Aggregator \|3 Volltext
883	1		\|8 1\p \|a cgwrk \|d 20201028 \|q DE-101 \|u https://d-nb.info/provenance/plan#cgwrk
883	1		\|8 2\p \|a cgwrk \|d 20201028 \|q DE-101 \|u https://d-nb.info/provenance/plan#cgwrk
912			\|a ZDB-4-EBA
943	1		\|a oai:aleph.bib-bvb.de:BVB01-028457975
966	e		\|u http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=439344 \|l DE-1046 \|p ZDB-4-EBA \|q FAW_PDA_EBA \|x Aggregator \|3 Volltext
966	e		\|u http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=439344 \|l DE-1047 \|p ZDB-4-EBA \|q FAW_PDA_EBA \|x Aggregator \|3 Volltext

Record in the search index

_version_	1807956176032235520
adam_text
any_adam_object
building	Verbundindex
bvnumber	BV043033325
collection	ZDB-4-EBA
ctrlnum	(OCoLC)779828976 (DE-599)BVBBV043033325
dewey-full	439.8/20188
dewey-hundreds	400 - Language
dewey-ones	439 - Other Germanic languages
dewey-raw	439.8/20188
dewey-search	439.8/20188
dewey-sort	3439.8 520188
dewey-tens	430 - German and related languages
discipline	German studies / Dutch studies / Scandinavian studies
format	Electronic eBook
genre	1\p (DE-588)4143413-4 Aufsatzsammlung gnd-content
genre_facet	Aufsatzsammlung
geographic	Norwegen
geographic_facet	Norwegen
id	DE-604.BV043033325
illustrated	Not Illustrated
indexdate	2024-08-21T00:49:30Z
institution	BVB
isbn	1280497661 9027274991 9781280497667 9789027274991
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-028457975
oclc_num	779828976
open_access_boolean
owner	DE-1046 DE-1047
owner_facet	DE-1046 DE-1047
physical	1 online resource (362 pages)
psigel	ZDB-4-EBA ZDB-4-EBA FAW_PDA_EBA
publishDate	2012
publishDateSearch	2012
publishDateSort	2012
publisher	John Benjamins Pub. Co.
record_format	marc
subject_GND	(DE-588)4120291-0 (DE-588)4131821-3 (DE-588)4165338-5 (DE-588)4143413-4
title	Exploring newspaper language using the web to create and investigate a large corpus of modern Norwegian
title_auth	Exploring newspaper language using the web to create and investigate a large corpus of modern Norwegian
title_exact_search	Exploring newspaper language using the web to create and investigate a large corpus of modern Norwegian
title_full	Exploring newspaper language using the web to create and investigate a large corpus of modern Norwegian edited by Gisle Andersen
title_fullStr	Exploring newspaper language using the web to create and investigate a large corpus of modern Norwegian edited by Gisle Andersen
title_full_unstemmed	Exploring newspaper language using the web to create and investigate a large corpus of modern Norwegian edited by Gisle Andersen
title_short	Exploring newspaper language
title_sort	exploring newspaper language using the web to create and investigate a large corpus of modern norwegian
title_sub	using the web to create and investigate a large corpus of modern Norwegian
topic	FOREIGN LANGUAGE STUDY / Norwegian bisacsh FOREIGN LANGUAGE STUDY / Scandinavian Languages (Other) bisacsh Massenmedien Norwegian language (Nynorsk) Usage Norwegian language (Nynorsk) Syntax Newspapers Norway Mass media Norway Information technology Norway Norwegisch (DE-588)4120291-0 gnd Zeitungssprache (DE-588)4131821-3 gnd Korpus Linguistik (DE-588)4165338-5 gnd
topic_facet	FOREIGN LANGUAGE STUDY / Norwegian FOREIGN LANGUAGE STUDY / Scandinavian Languages (Other) Massenmedien Norwegian language (Nynorsk) Usage Norwegian language (Nynorsk) Syntax Newspapers Norway Mass media Norway Information technology Norway Norwegisch Zeitungssprache Korpus Linguistik Norwegen Aufsatzsammlung
url	http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=439344
work_keys_str_mv	AT andersengisle exploringnewspaperlanguageusingthewebtocreateandinvestigatealargecorpusofmodernnorwegian

Availability

No print copy is available.

Note: not in the THWS holdings.
