Availability: Language and text

Language and text: data, models, information and applications

Bibliographic details
Corporate author:	QUALICO (Veranstaltung) Breslau (author)
Other authors:	Pawłowski, Adam (editor), Mačutek, Ján 1976- (editor), Embleton, Sheila M. (editor), Mikros, George K. (editor)
Format:	Conference proceedings, book
Language:	English
Published:	Amsterdam ; Philadelphia John Benjamins Publishing Company [2021]
Series:	Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory volume 356
Subjects:	Linguistics > Statistical methods > Congresses; Computational linguistics > Congresses; Empirische Linguistik; Datenverarbeitung; Linguistik; Korpus > Linguistik; Computerlinguistik; Essays; Conference papers and proceedings; Konferenzschrift > 05.07.2018-08.07.2018 > Breslau
Online access:	Table of contents; Index // Subject index
Description:	Arising from the 10th QUALICO conference, held 05.07.2018-08.07.2018 in Wrocław (Breslau), Poland - Introduction
Physical description:	vi, 280 pages
ISBN:	9789027210104

Internal format

MARC


LEADER	00000nam a2200000 cb4500
001	BV047816174
003	DE-604
007	t
008	220204s2021 \|\|\|\| 10\|\|\| eng d
020			\|a 9789027210104 \|9 978-90-272-1010-4
035			\|a (DE-599)BVBBV047816174
040			\|a DE-604 \|b ger \|e rda
041	0		\|a eng
049			\|a DE-12
050		0	\|a P138.5
111	2		\|a QUALICO (Veranstaltung) \|n 10. \|d 2018 \|c Breslau \|j Verfasser \|0 (DE-588)1249244242 \|4 aut
245	1	0	\|a Language and text \|b data, models, information and applications \|c edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University)
264		1	\|a Amsterdam ; Philadelphia \|b John Benjamins Publishing Company \|c [2021]
264		4	\|c © 2021
300			\|a vi, 280 Seiten
336			\|b txt \|2 rdacontent
337			\|b n \|2 rdamedia
338			\|b nc \|2 rdacarrier
490	1		\|a Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory \|v volume 356
500			\|a Hervorgegangen aus der 10. QUALICO Konferenz, die vom 05.07.2018-08.07.2018 in Breslau, Polen, stattgefunden hat - Einleitung
650		4	\|a Linguistics \|x Statistical methods \|v Congresses
650		4	\|a Computational linguistics \|v Congresses
650	0	7	\|a Empirische Linguistik \|0 (DE-588)4406207-2 \|2 gnd \|9 rswk-swf
650	0	7	\|a Datenverarbeitung \|0 (DE-588)4011152-0 \|2 gnd \|9 rswk-swf
650	0	7	\|a Linguistik \|0 (DE-588)4074250-7 \|2 gnd \|9 rswk-swf
650	0	7	\|a Korpus \|g Linguistik \|0 (DE-588)4165338-5 \|2 gnd \|9 rswk-swf
650	0	7	\|a Computerlinguistik \|0 (DE-588)4035843-4 \|2 gnd \|9 rswk-swf
653		6	\|a Essays
653		6	\|a Conference papers and proceedings
655		7	\|0 (DE-588)1071861417 \|a Konferenzschrift \|y 05.07.2018-08.07.2018 \|z Breslau \|2 gnd-content
689	0	0	\|a Computerlinguistik \|0 (DE-588)4035843-4 \|D s
689	0		\|5 DE-604
689	1	0	\|a Linguistik \|0 (DE-588)4074250-7 \|D s
689	1	1	\|a Datenverarbeitung \|0 (DE-588)4011152-0 \|D s
689	1		\|5 DE-604
689	2	0	\|a Korpus \|g Linguistik \|0 (DE-588)4165338-5 \|D s
689	2		\|5 DE-604
689	3	0	\|a Empirische Linguistik \|0 (DE-588)4406207-2 \|D s
689	3		\|5 DE-604
700	1		\|a Pawłowski, Adam \|4 edt
700	1		\|a Mačutek, Ján \|d 1976- \|0 (DE-588)1085147258 \|4 edt
700	1		\|a Embleton, Sheila M. \|4 edt
700	1		\|a Mikros, George K. \|4 edt
776	0	8	\|i Erscheint auch als \|n Online-Ausgabe \|z 978-90-272-5838-0
830		0	\|a Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory \|v volume 356 \|w (DE-604)BV000001437 \|9 356
856	4	2	\|m Digitalisierung BSB München 19 - ADAM Catalogue Enrichment \|q application/pdf \|u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA \|3 Inhaltsverzeichnis
856	4	2	\|m Digitalisierung BSB München 19 - ADAM Catalogue Enrichment \|q application/pdf \|u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA \|3 Register // Sachregister
940	1		\|n oe
943	1		\|a oai:aleph.bib-bvb.de:BVB01-033199570

Record in the search index

_version_	1807956262140248064
adam_text	Table of contents
Introduction / Adam Pawłowski, Sheila Embleton, Jan Mačutek and George Mikros 1
Part I. Theory and models
On the impact of the initial phrase length on the position of enclitics in Old Czech / Radek Čech, Pavel Kosek, Olga Navrátilová and Ján Mačutek 9
Term distance, frequency and collocations / Lars G. Johnsen 21
A method for the comparison of general sequences via type-token ratio / Vladimir Matlach, Diego Gabriel Krivochen and Jiří Milička 37
Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian / Biljana Rujević, Marija Kaplar, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović and Ján Mačutek 55
N-grams of grammatical functions and their significant order in the Japanese clause / Haruko Sanada 69
Linking the dependents: Quantitative-linguistic hypotheses on valency / Petra Steiner 93
Grammar efficiency and the One-Meaning-One-Form Principle / Reija Vulanović 109
Distribution and characteristics of commonly used words across different texts in Japanese / Makoto Yamazaki 121
Part II. Empirical studies
The perils of big data / Sheila Embleton, Dorin Uritescu and Eric S. Wheeler 137
From distinguishability to informativity: A quantitative text model for detecting random texts / Maxim Konca, Alexander Mehler, Daniel Baumartz and Wahed Hemati 145
A Modern Greek readability tool: Development of evaluation methods / George Mikros and Rania Voskaki 163
Phonological properties as predictors of text success / Jiří Milička and Alžběta Houzar Růžičková 177
Calculating the victory chances: A stylometric insight into the 2018 Czech presidential election / Michal Místecký 195
Topological mapping for visualisation of high-dimensional historical linguistic data / Hermann Moisl 209
Book genre and author’s gender recognition based on titles: The example of the bibliographic corpus of microtexts / Adam Pawłowski, Elżbieta Herden and Tomasz Walkowiak 225
Quantitative analysis of bibliographic corpora: Statistical features, semantic profiles, word spectra / Adam Pawłowski, Krzysztof Topolski and Elżbieta Herden 239
Analysis of English text genre classification based on dependency types / Yaqin Wang 257
In memory of Gabriel Altmann: Eminent linguist, a man with a brilliant mind, and friend 271
Index 277

Index
A
activity (coefficient of) 149, 195, 199, 202
adjusted modulus (coefficient) 148, 168
AI see artificial intelligence
alpha (writer’s view, coefficient) 148, 171, 152-153, 157-158
artificial intelligence 1, 2, 165, 235-236
ATL see average token length
autocorrelation 147-148, 151, 157
average token length (ATL) 195, 197, 148
B
Balanced Corpus of Contemporary Written Japanese see tools and corpora
Bayes classification 165
beauty-in-averageness effect 5, 177-178, 189-191
BERT see bidirectional encoder representations from transformers
Bessel function 248
bibliography (as a corpus) 6, 225, 227, 239-240
bidirectional encoder representations from transformers 145, 147, 151, 158
big data 2, 4, 137-143, 226
bigram 22, 230, 233-234
bijection 109, 111, 114, 116, 117
book genre see writing species
book titles (corpus) 6, 225, 234-235, 239, 254
British English see language
British National Corpus see tools and corpora
Busemann coefficient 199
C
cacophony 178, 189-190
case 4, 93-96, 98
Chinese see language
classification model see models
classification 2, 5-6, 11-12, 19, 115, 145-146, 151, 153-159, 164-173, 226-234, 257-266
clause 9-11, 14, 19, 69-75, 79-82, 87, 89-90, 97
clustering 3, 6, 24, 37-38, 42, 44, 50, 209, 220, 226, 257-258, 260-266
coefficient of correlation 14, 17, 61, 119
cohesion of text 147, 165
Coh-Metrix classifier see tools and corpora
collocation 3, 21-22, 24-35
Common European Framework of Languages 163-164, 166
commonality (degree of) 4, 121, 123-134
competence (linguistic) 170-173, 235
complexity (of language, syntax, vocabulary) 37, 39, 50, 109, 115, 165, 167, 197, 199, 209
cophenetic correlation coefficient 260, 264-265
Corpus of Middle English Prose and Verse see tools and corpora
corpus see tools and corpora
correlation coefficient see coefficient
correlation 9, 19, 61, 64, 109-110, 119-120, 146-148, 151, 153, 156, 179, 226, 260, 264-265
Crişana see dialect
Croatian see language
curve length R index 148, 168
Czech see language
D
Dacey-Poisson distribution see statistical distribution
dependency grammar 97, 257-258
dialect
  Crişana 137-140
  shibboleths 137, 139
dialectometrics 1, 4, 137-143
digital humanities 6, 225, 240
distance correlation 151, 153, 156
distribution see statistical distribution
doc2vec 228
E
enclitic 9-12, 14-19
encoding effort (minimization of) 94, 100, 106
English see language
entropy 39, 41, 50-51, 148, 167-168, 171-173
error based feature elimination 154, 156
Euclidean distance 44, 168, 211, 213, 215, 260
euphony 5, 177-178, 189, 190-191
F
fastText 6, 225, 228-236
Finnish see language
Flesch-Kincaid formula 165
Flesch readability formula 164-165
fog index 165
FrameNet 93-94, 98, 100-103
Fry readability score 165
function word 121, 133, 167, 243, 246
functional equivalent 93-96, 98
G
Gauss-Poisson distribution see statistical distribution
gender recognition 2, 225-236
generation of text see natural language generation
generative adversarial networks 2, 150
generative model see models
geography of language and texts 138-140, 209, 222, 240
German see language
Gini coefficient 39, 41, 148, 168, 171-171, 265
golden ratio 168
GPT-2 model see generative model
grammar efficiency 109-110, 115-117, 119-120
grammatical function 19, 69-71, 73-86, 88-90, 93-97
Greek see languages
Guiraud’s R 164
Gutenberg corpus see tools and corpora
Gutenberg Project 150-154, 157-158
H
hapax legomena 145, 148, 149, 171
heat map 151
hierarchical cluster analysis 6, 213, 257-258, 260, 265, 266
historical linguistics 5, 209-222
h-point 147-148, 149, 167-168, 198-199
I
inflected languages 94-95, 228, 232, 246
initial phrase (old Czech) 9-19
iterative feature elimination 153, 155-156
J
Japanese see language
K
keyword 5, 24, 195-196, 199, 203-205, 241
L
lambda coefficient 148, 156, 171
LancsBox software see tools and corpora
language
  British English 138, 259
  Chinese 122, 139, 258, 273
  Croatian 55-63, 65
  Czech 2, 5, 9-19, 42-43, 62, 177-191, 195-205
  English 6, 30, 40, 62, 72, 94-95, 138, 179, 181, 190-191, 216-217, 222, 257-266, 273
  Finnish 138, 142
  German 95
  Greek 5, 163-173
  Japanese 3-4, 69-90, 121-134, 274
  Mambila 139
  Polish 6, 225-236, 239-255
  Romanian 137-139
  Russian 3, 55-65
  Serbian 3, 55-65
  Turkish 110, 115-119
  Ukrainian 3, 55-65
law of brevity 55, 61, 63
lexical balance 121
likes per view (parameter) 5, 182-190
literary genre (recognizing) 63, 227-233, 239
long short-term memories 150
Louvain-clustering 24
M
machine learning 3, 5, 37-38, 50, 159, 163-166, 169, 173, 225, 227, 235
Mambila see language
MARC format 227, 239, 241-241
Markov model 38, 147
mathematical model see models
MATTR (moving average type token ratio) 195, 197, 200
maximum onset principle 56-57
MDS see multidimensional scaling
Menzerath-Altmann’s law 2-3, 55-56, 62-64, 246, 254
models
  classification model 164, 166, 170
  generative model 146, 150, 152, 154-155
  mathematical model 2, 17, 40, 56, 59-60, 62-64, 134, 220
  probability model 23, 46-50, 150, 158
  random text model 5, 145-147, 158-159
  Sichel model 248
  synergetic model of language 56, 95, 100
  text model 145-159
Modern Greek Corpus see tools and corpora
MOGRead (readability of Greek text) see tools and corpora
morpheme 19, 37, 61, 69, 71-73
multidimensional scaling 37, 44-50, 138-141, 244-245, 250
Multi-Layer Perceptron 228, 232
multivariate analysis 209, 212, 216
mutual information 22-23, 26
N
named entity 164
National Corpus of Polish see tools and corpora
natural language generation 145-146, 150, 159
natural language processing 90, 145, 153, 165-166, 225-226, 236, 239, 241, 242, 255
neural networks 145, 147, 150, 166, 218, 226
n-grams 3, 38-41, 46-47-50, 69-90, 225, 228-230, 232-234
NLP see natural language processing
nonlinearity 151, 209-214
noun phrase 30, 69, 96
O
one-meaning-one-form principle 4, 109-120
P
part of speech (POS) 94, 109-119, 133, 153, 165-166, 199, 242-243, 247, 257, 259, 261
PCA see principal component analysis
phonetics 2, 139-140, 177, 179-180, 217
phonology 5, 9, 56-57, 100, 138, 177-181, 191, 217
phrase length 9-11, 14-19
pleasantness of language (spoken) 179, 191
political discourse 2, 195-205
POS see part of speech
precision 231, 233-234
principal component analysis 6, 213, 257-258, 260, 262-264, 266
principle of least effort 2, 94, 100, 106, 177, 245, 254
probability model see models
probability 10, 21-23, 25, 39, 46, 55, 147, 150, 227, 243, 248
propositional function 109, 112-118
Q
Quantitative Index Text Analyzer see tools and corpora
quantitative text characteristics 5, 39, 145-149, 158-159, 167-168, 197-199
QUITA (Quantitative Index Text Analyzer) see tools and corpora
R
R1 vocabulary size 46-49, 147, 152-153, 157-158, 167-168, 171
Random Forest (algorithm, classifier) 5-6, 153-155, 163, 169, 172, 257-258, 261, 264, 266
random text models see models
random text 2, 4-5, 145-147, 150-151, 154-159
randomization 50, 145-147, 150-151, 155, 158
randomness 35, 37-38, 40-47, 49-51, 169
rank-frequency relation 55, 59, 147-148, 168, 198-199, 248
readability of text (tool, assessment) 5, 146, 163-173
recall 231, 233-234
relative chronology 216-217
repeat rate (relative, normalized) 149, 152-153, 156-158, 168, 171
RFC see random forest
RODA (Romanian Online Dialects) see tools and corpora
Romanian Online Dialects see tools and corpora
Romanian see language
R-package ‘compute.es’ see tools and corpora
RR see repeat rate
Russian see language
S
Sacred Text Archive see tools and corpora
self organizing map 218
semantic associations 21, 147
semantics 1, 21, 25, 93-94, 98, 100, 106, 210, 254
sensory input system 219-220
sentence length 164-168, 170-172, 243, 245-246
sequence analysis 37-39, 51
Serbian see language
shibboleths see dialect
Sichel model see models
SMOG readability formula 165
sonority sequencing principle 26-28
Spearman correlation coefficient 61
spoken language 6, 62, 179-180, 257, 259, 261-264, 266
statistical distribution
  Gauss-Poisson distribution 239, 248-249, 252, 254
  Zipf-Mandelbrot distribution 3, 55-56, 59, 239, 248-249, 251, 254
  Dacey-Poisson distribution 55-56, 60
stress 9-10, 96
stylometry 1, 5, 163, 172, 195-205
support vector machines 153-155, 166
SVM see support vector machines
syllable frequency 3, 55-56, 58-59, 61-62, 64
syllable length 3, 55-56, 60-65
synergetic model of language see models
T
taxonomy 225, 235
t-complexity 50
text classification see classification
text length 121-122, 124-128, 130-134
text-mining 6, 225-226, 239, 241, 255, 261
texts’ taxonomy see taxonomy
thematic concentration 149, 195, 198, 202
time series 147, 273
tools and corpora
  Altmann-Fitter 272
  Balanced Corpus of Contemporary Written Japanese 4, 121, 123
  British National Corpus 259
  Coh-Metrix classifier 165
  Corpus of Middle English Prose and Verse 217
  Gutenberg corpus 152-153, 156
  LancsBox software 199
  MeCab (morphological analyzer of Japanese) 71
  Modern Greek Corpus 62
  MOGRead (readability of Greek text) 163-164, 166, 170
  National Corpus of Polish (NKJP) 242-246, 249, 255
  NLREG 63
  QUITA (Quantitative Index Text Analyzer) 166, 199
  R-package ‘compute.es’ 75
  RODA (Romanian Online Dialect Atlas) 138-139
  Sacred Text Archive 217
  UniDic (electronic dictionary of Japanese) 71
  WCRF Tagger 242, 255
  ZipfR package 243, 255
topological mapping 5, 209-222
treebank 259
trigram 149, 166, 230, 233-234
TTR see type token ratio
type token ratio 3, 37-53, 149, 151-153, 156-158, 168, 171, 195, 197
Turkish see language
U
Ukrainian see language
unique trigrams 149, 152-153, 157-158
V
valency (semantic) 93-94, 98-101
valency (syntactic) 93, 95, 98-99
valency 2-4, 69-72, 76, 81, 93-100, 106
verb distances 149, 152-153, 157-158, 195, 199-201
visualization of data 2, 3, 5, 37-39, 41-45, 48, 50, 163, 172-173, 209-222
vocabulary richness 149, 168, 197
W
WCRF Tagger see tools and corpora
word length 6, 55-57, 60-64, 165-166, 168, 170-171, 245-246, 254
word order 2, 4, 9, 11-12, 14, 19, 69, 90, 116-117, 119, 128
word2vec 6, 225, 228
writer’s view 148, 168 (see also alpha)
writing species 225, 227, 235
Y
Yule’s characteristic K 168
Z
Zipf law 121, 126, 128, 131, 133-134
Zipf’s forces 245
Zipf-Mandelbrot distribution see statistical distribution
ZipfR package see tools and corpora
Δ
Δ-score 23-25, 31, 34-35
any_adam_object	1
any_adam_object_boolean	1
author2	Pawłowski, Adam Mačutek, Ján 1976- Embleton, Sheila M. Mikros, George K.
author2_role	edt edt edt edt
author2_variant	a p ap j m jm s m e sm sme g k m gk gkm
author_GND	(DE-588)1085147258
author_corporate	QUALICO (Veranstaltung) Breslau
author_corporate_role	aut
author_facet	Pawłowski, Adam Mačutek, Ján 1976- Embleton, Sheila M. Mikros, George K. QUALICO (Veranstaltung) Breslau
author_sort	QUALICO (Veranstaltung) Breslau
building	Verbundindex
bvnumber	BV047816174
callnumber-first	P - Language and Literature
callnumber-label	P138
callnumber-raw	P138.5
callnumber-search	P138.5
callnumber-sort	P 3138.5
callnumber-subject	P - Philology and Linguistics
ctrlnum	(DE-599)BVBBV047816174
format	Conference Proceeding Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 cb4500</leader><controlfield tag="001">BV047816174</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">220204s2021 \|\|\|\| 10\|\|\| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9789027210104</subfield><subfield code="9">978-90-272-1010-4</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV047816174</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-12</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">P138.5</subfield></datafield><datafield tag="111" ind1="2" ind2=" "><subfield code="a">QUALICO (Veranstaltung)</subfield><subfield code="n">10.</subfield><subfield code="d">2018</subfield><subfield code="c">Breslau</subfield><subfield code="j">Verfasser</subfield><subfield code="0">(DE-588)1249244242</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Language and text</subfield><subfield code="b">data, models, information and applications</subfield><subfield code="c">edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University)</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Amsterdam ; Philadelphia</subfield><subfield code="b">John Benjamins Publishing Company</subfield><subfield 
code="c">[2021]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">vi, 280 Seiten</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory</subfield><subfield code="v">volume 356</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Hervorgegangen aus der 10. QUALICO Konferenz, die vom 05.07.2018-08.07.2018 in Breslau, Polen, stattgefunden hat - Einleitung</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Linguistics</subfield><subfield code="x">Statistical methods</subfield><subfield code="v">Congresses</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computational linguistics</subfield><subfield code="v">Congresses</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Empirische Linguistik</subfield><subfield code="0">(DE-588)4406207-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenverarbeitung</subfield><subfield code="0">(DE-588)4011152-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Linguistik</subfield><subfield code="0">(DE-588)4074250-7</subfield><subfield code="2">gnd</subfield><subfield 
code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="653" ind1=" " ind2="6"><subfield code="a">Essays</subfield></datafield><datafield tag="653" ind1=" " ind2="6"><subfield code="a">Conference papers and proceedings</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)1071861417</subfield><subfield code="a">Konferenzschrift</subfield><subfield code="y">05.07.2018-08.07.2018</subfield><subfield code="z">Breslau</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="1" ind2="0"><subfield code="a">Linguistik</subfield><subfield code="0">(DE-588)4074250-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2="1"><subfield code="a">Datenverarbeitung</subfield><subfield code="0">(DE-588)4011152-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="2" ind2="0"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="2" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="3" 
ind2="0"><subfield code="a">Empirische Linguistik</subfield><subfield code="0">(DE-588)4406207-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="3" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Pawłowski, Adam</subfield><subfield code="4">edt</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Mačutek, Ján</subfield><subfield code="d">1976-</subfield><subfield code="0">(DE-588)1085147258</subfield><subfield code="4">edt</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Embleton, Sheila M.</subfield><subfield code="4">edt</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Mikros, George K.</subfield><subfield code="4">edt</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-90-272-5838-0</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Amsterdam studies in the theory and history of linguistic science. 
Series 4, Current issues in linguistic theory</subfield><subfield code="v">volume 356</subfield><subfield code="w">(DE-604)BV000001437</subfield><subfield code="9">356</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung BSB München 19 - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=033199570&amp;sequence=000001&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung BSB München 19 - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=033199570&amp;sequence=000003&amp;line_number=0002&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Register // Sachregister</subfield></datafield><datafield tag="940" ind1="1" ind2=" "><subfield code="n">oe</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-033199570</subfield></datafield></record></collection>
genre	(DE-588)1071861417 Konferenzschrift 05.07.2018-08.07.2018 Breslau gnd-content
genre_facet	Konferenzschrift 05.07.2018-08.07.2018 Breslau
id	DE-604.BV047816174
illustrated	Not Illustrated
index_date	2024-07-03T19:06:49Z
indexdate	2024-08-21T00:50:51Z
institution	BVB
institution_GND	(DE-588)1249244242
isbn	9789027210104
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-033199570
open_access_boolean
owner	DE-12
owner_facet	DE-12
physical	vi, 280 Seiten
publishDate	2021
publishDateSearch	2021
publishDateSort	2021
publisher	John Benjamins Publishing Company
record_format	marc
series	Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory
series2	Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory
spelling	QUALICO (Veranstaltung) 10. 2018 Breslau Verfasser (DE-588)1249244242 aut Language and text data, models, information and applications edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University) Amsterdam ; Philadelphia John Benjamins Publishing Company [2021] © 2021 vi, 280 Seiten txt rdacontent n rdamedia nc rdacarrier Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory volume 356 Hervorgegangen aus der 10. QUALICO Konferenz, die vom 05.07.2018-08.07.2018 in Breslau, Polen, stattgefunden hat - Einleitung Linguistics Statistical methods Congresses Computational linguistics Congresses Empirische Linguistik (DE-588)4406207-2 gnd rswk-swf Datenverarbeitung (DE-588)4011152-0 gnd rswk-swf Linguistik (DE-588)4074250-7 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 gnd rswk-swf Computerlinguistik (DE-588)4035843-4 gnd rswk-swf Essays Conference papers and proceedings (DE-588)1071861417 Konferenzschrift 05.07.2018-08.07.2018 Breslau gnd-content Computerlinguistik (DE-588)4035843-4 s DE-604 Linguistik (DE-588)4074250-7 s Datenverarbeitung (DE-588)4011152-0 s Korpus Linguistik (DE-588)4165338-5 s Empirische Linguistik (DE-588)4406207-2 s Pawłowski, Adam edt Mačutek, Ján 1976- (DE-588)1085147258 edt Embleton, Sheila M. edt Mikros, George K. edt Erscheint auch als Online-Ausgabe 978-90-272-5838-0 Amsterdam studies in the theory and history of linguistic science. 
Series 4, Current issues in linguistic theory volume 356 (DE-604)BV000001437 356 Digitalisierung BSB München 19 - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung BSB München 19 - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Register // Sachregister
spellingShingle	Language and text data, models, information and applications Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory Linguistics Statistical methods Congresses Computational linguistics Congresses Empirische Linguistik (DE-588)4406207-2 gnd Datenverarbeitung (DE-588)4011152-0 gnd Linguistik (DE-588)4074250-7 gnd Korpus Linguistik (DE-588)4165338-5 gnd Computerlinguistik (DE-588)4035843-4 gnd
subject_GND	(DE-588)4406207-2 (DE-588)4011152-0 (DE-588)4074250-7 (DE-588)4165338-5 (DE-588)4035843-4 (DE-588)1071861417
title	Language and text data, models, information and applications
title_auth	Language and text data, models, information and applications
title_exact_search	Language and text data, models, information and applications
title_exact_search_txtP	Language and text data, models, information and applications
title_full	Language and text data, models, information and applications edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University)
title_fullStr	Language and text data, models, information and applications edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University)
title_full_unstemmed	Language and text data, models, information and applications edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University)
title_short	Language and text
title_sort	language and text data models information and applications
title_sub	data, models, information and applications
topic	Linguistics Statistical methods Congresses Computational linguistics Congresses Empirische Linguistik (DE-588)4406207-2 gnd Datenverarbeitung (DE-588)4011152-0 gnd Linguistik (DE-588)4074250-7 gnd Korpus Linguistik (DE-588)4165338-5 gnd Computerlinguistik (DE-588)4035843-4 gnd
topic_facet	Linguistics Statistical methods Congresses Computational linguistics Congresses Empirische Linguistik Datenverarbeitung Linguistik Korpus Linguistik Computerlinguistik Konferenzschrift 05.07.2018-08.07.2018 Breslau
url	http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA
volume_link	(DE-604)BV000001437
work_keys_str_mv	AT qualicoveranstaltungbreslau languageandtextdatamodelsinformationandapplications AT pawłowskiadam languageandtextdatamodelsinformationandapplications AT macutekjan languageandtextdatamodelsinformationandapplications AT embletonsheilam languageandtextdatamodelsinformationandapplications AT mikrosgeorgek languageandtextdatamodelsinformationandapplications

Availability

No print copy is available.

Order via interlibrary loan. Note: not in the THWS holdings! Table of contents


