Language and text: data, models, information and applications
Corporate author: | QUALICO (Veranstaltung) |
---|---|
Other authors: | Pawłowski, Adam, Mačutek, Ján, Embleton, Sheila M., Mikros, George K. |
Format: | Conference proceedings, Book |
Language: | English |
Published: |
Amsterdam ; Philadelphia
John Benjamins Publishing Company
[2021]
|
Series: | Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory
volume 356 |
Subjects: | |
Online access: | Table of contents, Index // Subject index |
Description: | Arose from the 10th QUALICO conference, held 05.07.2018-08.07.2018 in Wrocław (Breslau), Poland - Introduction |
Description: | vi, 280 pages |
ISBN: | 9789027210104 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV047816174 | ||
003 | DE-604 | ||
007 | t | ||
008 | 220204s2021 |||| 10||| eng d | ||
020 | |a 9789027210104 |9 978-90-272-1010-4 | ||
035 | |a (DE-599)BVBBV047816174 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-12 | ||
050 | 0 | |a P138.5 | |
111 | 2 | |a QUALICO (Veranstaltung) |n 10. |d 2018 |c Breslau |j Verfasser |0 (DE-588)1249244242 |4 aut | |
245 | 1 | 0 | |a Language and text |b data, models, information and applications |c edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University) |
264 | 1 | |a Amsterdam ; Philadelphia |b John Benjamins Publishing Company |c [2021] | |
264 | 4 | |c © 2021 | |
300 | |a vi, 280 Seiten | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory |v volume 356 | |
500 | |a Hervorgegangen aus der 10. QUALICO Konferenz, die vom 05.07.2018-08.07.2018 in Breslau, Polen, stattgefunden hat - Einleitung | ||
650 | 4 | |a Linguistics |x Statistical methods |v Congresses | |
650 | 4 | |a Computational linguistics |v Congresses | |
650 | 0 | 7 | |a Empirische Linguistik |0 (DE-588)4406207-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenverarbeitung |0 (DE-588)4011152-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Linguistik |0 (DE-588)4074250-7 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Computerlinguistik |0 (DE-588)4035843-4 |2 gnd |9 rswk-swf |
653 | 6 | |a Essays | |
653 | 6 | |a Conference papers and proceedings | |
655 | 7 | |0 (DE-588)1071861417 |a Konferenzschrift |y 05.07.2018-08.07.2018 |z Breslau |2 gnd-content | |
689 | 0 | 0 | |a Computerlinguistik |0 (DE-588)4035843-4 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Linguistik |0 (DE-588)4074250-7 |D s |
689 | 1 | 1 | |a Datenverarbeitung |0 (DE-588)4011152-0 |D s |
689 | 1 | |5 DE-604 | |
689 | 2 | 0 | |a Korpus |g Linguistik |0 (DE-588)4165338-5 |D s |
689 | 2 | |5 DE-604 | |
689 | 3 | 0 | |a Empirische Linguistik |0 (DE-588)4406207-2 |D s |
689 | 3 | |5 DE-604 | |
700 | 1 | |a Pawłowski, Adam |4 edt | |
700 | 1 | |a Mačutek, Ján |d 1976- |0 (DE-588)1085147258 |4 edt | |
700 | 1 | |a Embleton, Sheila M. |4 edt | |
700 | 1 | |a Mikros, George K. |4 edt | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 978-90-272-5838-0 |
830 | 0 | |a Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory |v volume 356 |w (DE-604)BV000001437 |9 356 | |
856 | 4 | 2 | |m Digitalisierung BSB München 19 - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung BSB München 19 - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Register // Sachregister |
940 | 1 | |n oe | |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-033199570 |
Record in the search index
_version_ | 1807956262140248064 |
---|---|
adam_text |
Table of contents
Introduction / Adam Pawłowski, Sheila Embleton, Jan Mačutek and George Mikros  1
Part I. Theory and models
On the impact of the initial phrase length on the position of enclitics in Old Czech / Radek Čech, Pavel Kosek, Olga Navrátilová and Ján Mačutek  9
Term distance, frequency and collocations / Lars G. Johnsen  21
A method for the comparison of general sequences via type-token ratio / Vladimir Matlach, Diego Gabriel Krivochen and Jiří Milička  37
Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian / Biljana Rujević, Marija Kaplar, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović and Ján Mačutek  55
N-grams of grammatical functions and their significant order in the Japanese clause / Haruko Sanada  69
Linking the dependents: Quantitative-linguistic hypotheses on valency / Petra Steiner  93
Grammar efficiency and the One-Meaning-One-Form Principle / Reija Vulanović  109
Distribution and characteristics of commonly used words across different texts in Japanese / Makoto Yamazaki  121
Part II. Empirical studies
The perils of big data / Sheila Embleton, Dorin Uritescu and Eric S. Wheeler  137
From distinguishability to informativity: A quantitative text model for detecting random texts / Maxim Konca, Alexander Mehler, Daniel Baumartz and Wahed Hemati  145
A Modern Greek readability tool: Development of evaluation methods / George Mikros and Rania Voskaki  163
Phonological properties as predictors of text success / Jiří Milička and Alžběta Houzar Růžičková  177
Calculating the victory chances: A stylometric insight into the 2018 Czech presidential election / Michal Místecký  195
Topological mapping for visualisation of high-dimensional historical linguistic data / Hermann Moisl  209
Book genre and author’s gender recognition based on titles: The example of the bibliographic corpus of microtexts / Adam Pawłowski, Elżbieta Herden and Tomasz Walkowiak  225
Quantitative analysis of bibliographic corpora: Statistical features, semantic profiles, word spectra / Adam Pawłowski, Krzysztof Topolski and Elżbieta Herden  239
Analysis of English text genre classification based on dependency types / Yaqin Wang  257
In memory of Gabriel Altmann: Eminent linguist, a man with a brilliant mind, and friend  271
Index  277
Index
A
activity (coefficient of) 149, 195, 199, 202
adjusted modulus (coefficient) 148, 168
AI see artificial intelligence
alpha (writer’s view, coefficient) 148, 171, 152-153, 157-158
artificial intelligence 1, 2, 165, 235-236
ATL see average token length
autocorrelation 147-148, 151, 157
average token length (ATL) 195, 197, 148
B
Balanced Corpus of Contemporary Written Japanese see tools and corpora
Bayes classification 165
beauty-in-averageness effect 5, 177-178, 189-191
BERT see bidirectional encoder representations from transformers
Bessel function 248
bibliography (as a corpus) 6, 225, 227, 239-240
bidirectional encoder representations from transformers 145, 147, 151, 158
big data 2, 4, 137-143, 226
bigram 22, 230, 233-234
bijection 109, 111, 114, 116, 117
book genre see writing species
book titles (corpus) 6, 225, 234-235, 239, 254
British English see language
British National Corpus see tools and corpora
Busemann coefficient 199
C
cacophony 178, 189-190
case 4, 93-96, 98
Chinese see language
classification model see models
classification 2, 5-6, 11-12, 19, 115, 145-146, 151, 153-159, 164-173, 226-234, 257-266
clause 9-11, 14, 19, 69-75, 79-82, 87, 89-90, 97
clustering 3, 6, 24, 37-38, 42, 44, 50, 209, 220, 226, 257-258, 260-266
coefficient of correlation 14, 17, 61, 119
cohesion of text 147, 165
Coh-Metrix classifier see tools and corpora
collocation 3, 21-22, 24-35
Common European Framework of Languages 163-164, 166
commonality (degree of) 4, 121, 123-134
competence (linguistic) 170-173, 235
complexity (of language, syntax, vocabulary) 37, 39, 50, 109, 115, 165, 167, 197, 199, 209
cophenetic correlation coefficient 260, 264-265
Corpus of Middle English Prose and Verse see tools and corpora
corpus see tools
correlation coefficient see coefficient
correlation 9, 19, 61, 64, 109-110, 119-120, 146-148, 151, 153, 156, 179, 226, 260, 264-265
Crişana see dialect
Croatian see language
curve length R index 148, 168
Czech see language
D
Dacey-Poisson distribution see statistical distribution
dependency grammar 97, 257-258
dialect
  Crişana 137-140
  shibboleths 137, 139
dialectometrics 1, 4, 137-143
digital humanities 6, 225, 240
distance correlation 151, 153, 156
distribution see statistical distribution
doc2vec 228
E
enclitic 9-12, 14-19
encoding effort (minimization of) 94, 100, 106
English see language
entropy 39, 41, 50-51, 148, 167-168, 171-173
error based feature elimination 154, 156
Euclidean distance 44, 168, 211, 213, 215, 260
euphony 5, 177-178, 189, 190-191
F
fastText 6, 225, 228-236
Finnish see language
Flesch-Kincaid formula 165
Flesch readability formula 164-165
fog index 165
FrameNet 93-94, 98, 100-103
Fry readability score 165
function word 121, 133, 167, 243, 246
functional equivalent 93-96, 98
G
Gauss-Poisson distribution see statistical distribution
gender recognition 2, 225-236
generation of text see natural language generation
generative adversarial networks 2, 150
generative model see models
geography of language and texts 138-140, 209, 222, 240
German see language
Gini coefficient 39, 41, 148, 168, 171-171, 265
golden ratio 168
GPT-2 model see generative model
grammar efficiency 109-110, 115-117, 119-120
grammatical function 19, 69-71, 73-86, 88-90, 93-97
Greek see language
Guiraud’s R 164
Gutenberg corpus see tools and corpora
Gutenberg Project 150-154, 157-158
H
hapax legomena 145, 148, 149, 171
heat map 151
hierarchical cluster analysis 6, 213, 257-258, 260, 265, 266
historical linguistics 5, 209-222
h-point 147-148, 149, 167-168, 198-199
I
inflected languages 94-95, 228, 232, 246
initial phrase (Old Czech) 9-19
iterative feature elimination 153, 155-156
J
Japanese see language
K
keyword 5, 24, 195-196, 199, 203-205, 241
L
lambda coefficient 148, 156, 171
LancsBox software see tools and corpora
language
  British English 138, 259
  Chinese 122, 139, 258, 273
  Croatian 55-63, 65
  Czech 2, 5, 9-19, 42-43, 62, 177-191, 195-205
  English 6, 30, 40, 62, 72, 94-95, 138, 179, 181, 190-191, 216-217, 222, 257-266, 273
  Finnish 138, 142
  German 95
  Greek 5, 163-173
  Japanese 3-4, 69-90, 121-134, 274
  Mambila 139
  Polish 6, 225-236, 239-255
  Romanian 137-139
  Russian 3, 55-65
  Serbian 3, 55-65
  Turkish 110, 115-119
  Ukrainian 3, 55-65
law of brevity 55, 61, 63
lexical balance 121
likes per view (parameter) 5, 182-190
literary genre (recognizing) 63, 227-233, 239
long short-term memories 150
Louvain-clustering 24
M
machine learning 3, 5, 37-38, 50, 159, 163-166, 169, 173, 225, 227, 235
Mambila see language
MARC format 227, 239, 241-241
Markov model 38, 147
mathematical model see models
MATTR (moving average type token ratio) 195, 197, 200
maximum onset principle 56-57
MDS see multidimensional scaling
Menzerath-Altmann’s law 2-3, 55-56, 62-64, 246, 254
models
  classification model 164, 166, 170
  generative model 146, 150, 152, 154-155
  mathematical model 2, 17, 40, 56, 59-60, 62-64, 134, 220
  probability model 23, 46-50, 150, 158
  random text model 5, 145-147, 158-159
  Sichel model 248
  synergetic model of language 56, 95, 100
  text model 145-159
Modern Greek Corpus see tools and corpora
MOGRead (readability of Greek text) see tools and corpora
morpheme 19, 37, 61, 69, 71-73
multidimensional scaling 37, 44-50, 138-141, 244-245, 250
Multi-Layer Perceptron 228, 232
multivariate analysis 209, 212, 216
mutual information 22-23, 26
N
named entity 164
National Corpus of Polish see tools and corpora
natural language generation 145-146, 150, 159
natural language processing 90, 145, 153, 165-166, 225-226, 236, 239, 241, 242, 255
neural networks 145, 147, 150, 166, 218, 226
n-grams 3, 38-41, 46-47-50, 69-90, 225, 228-230, 232-234
NLP see natural language processing
nonlinearity 151, 209-214
noun phrase 30, 69, 96
O
one-meaning-one-form principle 4, 109-120
P
part of speech (POS) 94, 109-119, 133, 153, 165-166, 199, 242-243, 247, 257, 259, 261
PCA see principal component analysis
phonetics 2, 139-140, 177, 179-180, 217
phonology 5, 9, 56-57, 100, 138, 177-181, 191, 217
phrase length 9-11, 14-19
pleasantness of language (spoken) 179, 191
political discourse 2, 195-205
POS see part of speech
precision 231, 233-234
principal component analysis 6, 213, 257-258, 260, 262-264, 266
principle of least effort 2, 94, 100, 106, 177, 245, 254
probability model see models
probability 10, 21-23, 25, 39, 46, 55, 147, 150, 227, 243, 248
propositional function 109, 112-118
Q
Quantitative Index Text Analyzer see tools
quantitative text characteristics 5, 39, 145-149, 158-159, 167-168, 197-199
QUITA (Quantitative Index Text Analyzer) see tools and corpora
QUITA see tools
R
Ri vocabulary size 46-49, 147, 152-153, 157-158, 167-168, 171
Random Forest (algorithm, classifier) 5-6, 153-155, 163, 169, 172, 257-258, 261, 264, 266
random text models see models
random text 2, 4-5, 145-147, 150-151, 154-159
randomization 50, 145-147, 150-151, 155, 158
randomness 35, 37-38, 40-47, 49-51, 169
rank-frequency relation 55, 59, 147-148, 168, 198-199, 248
readability of text (tool, assessment) 5, 146, 163-173
recall 231, 233-234
relative chronology 216-217
repeat rate (relative, normalized) 149, 152-153, 156-158, 168, 171
RFC see random forest
RODA (Romanian Online Dialects) see tools and corpora
RODA see tools
Romanian Online Dialects see tools
Romanian see language
R-package ‘compute.es’ see tools and corpora
RR see repeat rate
Russian see language
S
Sacred Text Archive see tools and corpora
self organizing map 218
semantic associations 21, 147
semantics 1, 21, 25, 93-94, 98, 100, 106, 210, 254
sensory input system 219-220
sentence length 164-168, 170-172, 243, 245-246
sequence analysis 37-39, 51
Serbian see language
shibboleths see dialect
Sichel model see models
SMOG readability formula 165
sonority sequencing principle 26-28
Spearman correlation coefficient 61
spoken language 6, 62, 179-180, 257, 259, 261-264, 266
statistical distribution
  Gauss-Poisson distribution 239, 248-249, 252, 254
  Zipf-Mandelbrot distribution 3, 55-56, 59, 239, 248-249, 251, 254
  Dacey-Poisson distribution 55-56, 60
stress 9-10, 96
stylometry 1, 5, 163, 172, 195-205
support vector machines 153-155, 166
SVM see support vector machines
syllable frequency 3, 55-56, 58-59, 61-62, 64
syllable length 3, 55-56, 60-65
synergetic model of language see models
T
taxonomy 225, 235
t-complexity 50
text classification see classification
text length 121-122, 124-128, 130-134
text-mining 6, 225-226, 239, 241, 255, 261
texts’ taxonomy see taxonomy
thematic concentration 149, 195, 198, 202
time series 147, 273
tools and corpora
  Altmann-Fitter 272
  Balanced Corpus of Contemporary Written Japanese 4, 121, 123
  British National Corpus 259
  Coh-Metrix classifier 165
  Corpus of Middle English Prose and Verse 217
  Gutenberg corpus 152-153, 156
  LancsBox software 199
  MeCab (morphological analyzer of Japanese) 71
  Modern Greek Corpus 62
  MOGRead (readability of Greek text) 163-164, 166, 170
  National Corpus of Polish (NKJP) 242-246, 249, 255
  NLREG 63
  QUITA (Quantitative Index Text Analyzer) 166, 199
  R-package ‘compute.es’ 75
  RODA (Romanian Online Dialect Atlas) 138-139
  Sacred Text Archive 217
  UniDic (electronic dictionary of Japanese) 71
  WCRF Tagger 242, 255
  ZipfR package 243, 255
topological mapping 5, 209-222
treebank 259
trigram 149, 166, 230, 233-234
TTR see type token ratio
type token ratio 3, 37-53, 149, 151-153, 156-158, 168, 171, 195, 197
Turkish see language
U
Ukrainian see language
unique trigrams 149, 152-153, 157-158
V
valency (semantic) 93-94, 98-101
valency (syntactic) 93, 95, 98-99
valency 2-4, 69-72, 76, 81, 93-100, 106
verb distances 149, 152-153, 157-158, 195, 199-201
visualization of data 2, 3, 5, 37-39, 41-45, 48, 50, 163, 172-173, 209-222
vocabulary richness 149, 168, 197
W
WCRF Tagger see tools and corpora
word length 6, 55-57, 60-64, 165-166, 168, 170-171, 245-246, 254
word order 2, 4, 9, 11-12, 14, 19, 69, 90, 116-117, 119, 128
word2vec 6, 225, 228
writer’s view 148, 168 (see also alpha)
writing species 225, 227, 235
Y
Yule’s characteristic K 168
Z
Zipf law 121, 126, 128, 131, 133-134
Zipf’s forces 245
Zipf-Mandelbrot distribution see statistical distribution
ZipfR package see tools and corpora
Δ
Δ-score 23-25, 31, 34-35 |
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author2 | Pawłowski, Adam Mačutek, Ján 1976- Embleton, Sheila M. Mikros, George K. |
author2_role | edt edt edt edt |
author2_variant | a p ap j m jm s m e sm sme g k m gk gkm |
author_GND | (DE-588)1085147258 |
author_corporate | QUALICO (Veranstaltung) Breslau |
author_corporate_role | aut |
author_facet | Pawłowski, Adam Mačutek, Ján 1976- Embleton, Sheila M. Mikros, George K. QUALICO (Veranstaltung) Breslau |
author_sort | QUALICO (Veranstaltung) Breslau |
building | Verbundindex |
bvnumber | BV047816174 |
callnumber-first | P - Language and Literature |
callnumber-label | P138 |
callnumber-raw | P138.5 |
callnumber-search | P138.5 |
callnumber-sort | P 3138.5 |
callnumber-subject | P - Philology and Linguistics |
ctrlnum | (DE-599)BVBBV047816174 |
format | Conference Proceeding Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 cb4500</leader><controlfield tag="001">BV047816174</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">220204s2021 |||| 10||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9789027210104</subfield><subfield code="9">978-90-272-1010-4</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV047816174</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-12</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">P138.5</subfield></datafield><datafield tag="111" ind1="2" ind2=" "><subfield code="a">QUALICO (Veranstaltung)</subfield><subfield code="n">10.</subfield><subfield code="d">2018</subfield><subfield code="c">Breslau</subfield><subfield code="j">Verfasser</subfield><subfield code="0">(DE-588)1249244242</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Language and text</subfield><subfield code="b">data, models, information and applications</subfield><subfield code="c">edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University)</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Amsterdam ; Philadelphia</subfield><subfield code="b">John Benjamins Publishing Company</subfield><subfield code="c">[2021]</subfield></datafield><datafield 
tag="264" ind1=" " ind2="4"><subfield code="c">© 2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">vi, 280 Seiten</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory</subfield><subfield code="v">volume 356</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Hervorgegangen aus der 10. QUALICO Konferenz, die vom 05.07.2018-08.07.2018 in Breslau, Polen, stattgefunden hat - Einleitung</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Linguistics</subfield><subfield code="x">Statistical methods</subfield><subfield code="v">Congresses</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computational linguistics</subfield><subfield code="v">Congresses</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Empirische Linguistik</subfield><subfield code="0">(DE-588)4406207-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenverarbeitung</subfield><subfield code="0">(DE-588)4011152-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Linguistik</subfield><subfield code="0">(DE-588)4074250-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield 
code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="653" ind1=" " ind2="6"><subfield code="a">Essays</subfield></datafield><datafield tag="653" ind1=" " ind2="6"><subfield code="a">Conference papers and proceedings</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)1071861417</subfield><subfield code="a">Konferenzschrift</subfield><subfield code="y">05.07.2018-08.07.2018</subfield><subfield code="z">Breslau</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Computerlinguistik</subfield><subfield code="0">(DE-588)4035843-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="1" ind2="0"><subfield code="a">Linguistik</subfield><subfield code="0">(DE-588)4074250-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2="1"><subfield code="a">Datenverarbeitung</subfield><subfield code="0">(DE-588)4011152-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="2" ind2="0"><subfield code="a">Korpus</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4165338-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="2" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="3" ind2="0"><subfield code="a">Empirische Linguistik</subfield><subfield 
code="0">(DE-588)4406207-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="3" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Pawłowski, Adam</subfield><subfield code="4">edt</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Mačutek, Ján</subfield><subfield code="d">1976-</subfield><subfield code="0">(DE-588)1085147258</subfield><subfield code="4">edt</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Embleton, Sheila M.</subfield><subfield code="4">edt</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Mikros, George K.</subfield><subfield code="4">edt</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">978-90-272-5838-0</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Amsterdam studies in the theory and history of linguistic science. 
Series 4, Current issues in linguistic theory</subfield><subfield code="v">volume 356</subfield><subfield code="w">(DE-604)BV000001437</subfield><subfield code="9">356</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung BSB München 19 - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=033199570&amp;sequence=000001&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung BSB München 19 - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=033199570&amp;sequence=000003&amp;line_number=0002&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Register // Sachregister</subfield></datafield><datafield tag="940" ind1="1" ind2=" "><subfield code="n">oe</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-033199570</subfield></datafield></record></collection> |
genre | (DE-588)1071861417 Konferenzschrift 05.07.2018-08.07.2018 Breslau gnd-content |
genre_facet | Konferenzschrift 05.07.2018-08.07.2018 Breslau |
id | DE-604.BV047816174 |
illustrated | Not Illustrated |
index_date | 2024-07-03T19:06:49Z |
indexdate | 2024-08-21T00:50:51Z |
institution | BVB |
institution_GND | (DE-588)1249244242 |
isbn | 9789027210104 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-033199570 |
open_access_boolean | |
owner | DE-12 |
owner_facet | DE-12 |
physical | vi, 280 pages |
publishDate | 2021 |
publishDateSearch | 2021 |
publishDateSort | 2021 |
publisher | John Benjamins Publishing Company |
record_format | marc |
series | Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory |
series2 | Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory |
spelling | QUALICO (Veranstaltung) 10. 2018 Breslau Verfasser (DE-588)1249244242 aut Language and text data, models, information and applications edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University) Amsterdam ; Philadelphia John Benjamins Publishing Company [2021] © 2021 vi, 280 Seiten txt rdacontent n rdamedia nc rdacarrier Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory volume 356 Hervorgegangen aus der 10. QUALICO Konferenz, die vom 05.07.2018-08.07.2018 in Breslau, Polen, stattgefunden hat - Einleitung Linguistics Statistical methods Congresses Computational linguistics Congresses Empirische Linguistik (DE-588)4406207-2 gnd rswk-swf Datenverarbeitung (DE-588)4011152-0 gnd rswk-swf Linguistik (DE-588)4074250-7 gnd rswk-swf Korpus Linguistik (DE-588)4165338-5 gnd rswk-swf Computerlinguistik (DE-588)4035843-4 gnd rswk-swf Essays Conference papers and proceedings (DE-588)1071861417 Konferenzschrift 05.07.2018-08.07.2018 Breslau gnd-content Computerlinguistik (DE-588)4035843-4 s DE-604 Linguistik (DE-588)4074250-7 s Datenverarbeitung (DE-588)4011152-0 s Korpus Linguistik (DE-588)4165338-5 s Empirische Linguistik (DE-588)4406207-2 s Pawłowski, Adam edt Mačutek, Ján 1976- (DE-588)1085147258 edt Embleton, Sheila M. edt Mikros, George K. edt Erscheint auch als Online-Ausgabe 978-90-272-5838-0 Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory volume 356 (DE-604)BV000001437 356 Digitalisierung BSB München 19 - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung BSB München 19 - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Register // Sachregister |
spellingShingle | Language and text data, models, information and applications Amsterdam studies in the theory and history of linguistic science. Series 4, Current issues in linguistic theory Linguistics Statistical methods Congresses Computational linguistics Congresses Empirische Linguistik (DE-588)4406207-2 gnd Datenverarbeitung (DE-588)4011152-0 gnd Linguistik (DE-588)4074250-7 gnd Korpus Linguistik (DE-588)4165338-5 gnd Computerlinguistik (DE-588)4035843-4 gnd |
subject_GND | (DE-588)4406207-2 (DE-588)4011152-0 (DE-588)4074250-7 (DE-588)4165338-5 (DE-588)4035843-4 (DE-588)1071861417 |
title | Language and text data, models, information and applications |
title_auth | Language and text data, models, information and applications |
title_exact_search | Language and text data, models, information and applications |
title_exact_search_txtP | Language and text data, models, information and applications |
title_full | Language and text data, models, information and applications edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University) |
title_fullStr | Language and text data, models, information and applications edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University) |
title_full_unstemmed | Language and text data, models, information and applications edited by Adam Pawłowski (University of Wrocław), Jan Mačutek (Mathematical Institute of Slovac Academy of Sciences & Constantine the Philosopher University in Nitra), Sheila Embleton (York University), George Mikros (Hamad Bin Khalifa University) |
title_short | Language and text |
title_sort | language and text data models information and applications |
title_sub | data, models, information and applications |
topic | Linguistics Statistical methods Congresses Computational linguistics Congresses Empirische Linguistik (DE-588)4406207-2 gnd Datenverarbeitung (DE-588)4011152-0 gnd Linguistik (DE-588)4074250-7 gnd Korpus Linguistik (DE-588)4165338-5 gnd Computerlinguistik (DE-588)4035843-4 gnd |
topic_facet | Linguistics Statistical methods Congresses Computational linguistics Congresses Empirische Linguistik Datenverarbeitung Linguistik Korpus Linguistik Computerlinguistik Konferenzschrift 05.07.2018-08.07.2018 Breslau |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033199570&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV000001437 |
work_keys_str_mv | AT qualicoveranstaltungbreslau languageandtextdatamodelsinformationandapplications AT pawłowskiadam languageandtextdatamodelsinformationandapplications AT macutekjan languageandtextdatamodelsinformationandapplications AT embletonsheilam languageandtextdatamodelsinformationandapplications AT mikrosgeorgek languageandtextdatamodelsinformationandapplications |