Validity, reliability, and significance: empirical methods for NLP and data science
Main authors: | Riezler, Stefan; Hagmann, Michael |
---|---|
Format: | Book |
Language: | English |
Published: | [San Rafael] Morgan & Claypool Publishers [2022] |
Series: | Synthesis lectures on human language technologies #55 |
Subjects: | Linguistics; Automatische Sprachanalyse; Data Mining |
Online access: | Table of contents |
Description: | xvii, 147 pages, diagrams |
ISBN: | 9781636392738 9781636392714 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV047668413 | ||
003 | DE-604 | ||
005 | 20220707 | ||
007 | t | ||
008 | 220112s2022 |||| |||| 00||| eng d | ||
020 | |a 9781636392738 |c hardback |9 978-1-63639-273-8 | ||
020 | |a 9781636392714 |c paperback |9 978-1-63639-271-4 | ||
035 | |a (OCoLC)1314905274 | ||
035 | |a (DE-599)BVBBV047668413 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-29 |a DE-739 | ||
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
100 | 1 | |a Riezler, Stefan |e Verfasser |0 (DE-588)1033925454 |4 aut | |
245 | 1 | 0 | |a Validity, reliability, and significance |b empirical methods for NLP and data science |c Stefan Riezler, Michael Hagmann |
264 | 1 | |a [San Rafael] |b Morgan & Claypool Publishers |c [2022] | |
264 | 4 | |c © 2022 | |
300 | |a xvii, 147 Seiten |b Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Synthesis lectures on human language technologies |v #55 | |
650 | 4 | |a Linguistics | |
650 | 0 | 7 | |a Automatische Sprachanalyse |0 (DE-588)4129935-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Automatische Sprachanalyse |0 (DE-588)4129935-8 |D s |
689 | 0 | 1 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Hagmann, Michael |d 1981- |e Verfasser |0 (DE-588)1193940567 |4 aut | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe, PDF |z 978-1-63639-272-1 |
830 | 0 | |a Synthesis lectures on human language technologies |v #55 |w (DE-604)BV035447238 |9 55 | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033053114&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
Record in the search index
_version_ | 1805074595448881152 |
---|---|
adam_text | Contents
Preface
Acknowledgments
1 Introduction
1.1 Empirical Methods in Machine Learning
1.2 Scope and Outline of this Book
1.3 Intended Readership
2 Validity
2.1 Validity Problems in NLP and Data Science
2.1.1 Bias Features
2.1.2 Illegitimate Features
2.1.3 Circular Features
2.2 Theories of Measurement and Validity
2.2.1 The Concept of Validity in Psychometrics
2.2.2 The Theory of Scales of Measurement
2.2.3 Theories of Measurement in Philosophy of Science
2.3 Prediction as Measurement
2.3.1 Feature Representations
2.3.2 Measurement Data
2.4 Descriptive and Model-Based Validity Tests
2.4.1 Dataset Bias Test
2.4.2 Transformation Invariance Test
2.4.3 A Model-Based Test for Circularity
2.5 Notes on Practical Usage
3 Reliability
3.1 Untangling Terminology: Reliability, Agreement, and Others
3.2 Performance Evaluation as Measurement
3.3 Descriptive and Model-Based Reliability Tests
3.3.1 Agreement Coefficients for Data Annotation
3.3.2 Bootstrap Confidence Intervals for Model Evaluation
3.3.3 Model-Based Reliability Testing
3.4 Notes on Practical Usage
4 Significance
4.1 Parametric Significance Tests
4.2 Sampling-Based Significance Tests
4.2.1 Bootstrap Resampling
4.2.2 Permutation Tests
4.3 Model-Based Significance Testing
4.3.1 The Generalized Likelihood Ratio Test
4.3.2 Likelihood Ratio Tests using LMEMs
4.4 Notes on Practical Usage
A Mathematical Background
A.1 Generalized Additive Models
A.1.1 General Form of Model
A.1.2 Example
A.1.3 Parameter Estimation
A.2 Linear Mixed Effects Models
A.2.1 General Form of Model
A.2.2 Example
A.2.3 Parameter Optimization
A.3 The Distribution of the Likelihood Ratio Statistic
A.3.1 Score Function and Fisher Information
A.3.2 Taylor Expansion and Asymptotic Distribution
Bibliography
Authors’ Biographies |
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Riezler, Stefan Hagmann, Michael 1981- |
author_GND | (DE-588)1033925454 (DE-588)1193940567 |
author_facet | Riezler, Stefan Hagmann, Michael 1981- |
author_role | aut aut |
author_sort | Riezler, Stefan |
author_variant | s r sr m h mh |
building | Verbundindex |
bvnumber | BV047668413 |
classification_rvk | ST 306 |
ctrlnum | (OCoLC)1314905274 (DE-599)BVBBV047668413 |
discipline | Informatik |
discipline_str_mv | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 cb4500</leader><controlfield tag="001">BV047668413</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220707</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">220112s2022 |||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781636392738</subfield><subfield code="c">hardback</subfield><subfield code="9">978-1-63639-273-8</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781636392714</subfield><subfield code="c">paperback</subfield><subfield code="9">978-1-63639-271-4</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1314905274</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV047668413</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29</subfield><subfield code="a">DE-739</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Riezler, Stefan</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1033925454</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Validity, reliability, and significance</subfield><subfield code="b">empirical methods for NLP and data science</subfield><subfield code="c">Stefan Riezler, Michael Hagmann</subfield></datafield><datafield tag="264" ind1=" " 
ind2="1"><subfield code="a">[San Rafael]</subfield><subfield code="b">Morgan & Claypool Publishers</subfield><subfield code="c">[2022]</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2022</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xvii, 147 Seiten</subfield><subfield code="b">Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Synthesis lectures on human language technologies</subfield><subfield code="v">#55</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Linguistics</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Sprachanalyse</subfield><subfield code="0">(DE-588)4129935-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Automatische Sprachanalyse</subfield><subfield code="0">(DE-588)4129935-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Hagmann, Michael</subfield><subfield 
code="d">1981-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1193940567</subfield><subfield code="4">aut</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe, PDF</subfield><subfield code="z">978-1-63639-272-1</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Synthesis lectures on human language technologies</subfield><subfield code="v">#55</subfield><subfield code="w">(DE-604)BV035447238</subfield><subfield code="9">55</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033053114&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield></record></collection> |
id | DE-604.BV047668413 |
illustrated | Not Illustrated |
index_date | 2024-07-03T18:54:31Z |
indexdate | 2024-07-20T05:28:00Z |
institution | BVB |
isbn | 9781636392738 9781636392714 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-033053114 |
oclc_num | 1314905274 |
open_access_boolean | |
owner | DE-29 DE-739 |
owner_facet | DE-29 DE-739 |
physical | xvii, 147 Seiten Diagramme |
publishDate | 2022 |
publishDateSearch | 2022 |
publishDateSort | 2022 |
publisher | Morgan & Claypool Publishers |
record_format | marc |
series | Synthesis lectures on human language technologies |
series2 | Synthesis lectures on human language technologies |
spelling | Riezler, Stefan Verfasser (DE-588)1033925454 aut Validity, reliability, and significance empirical methods for NLP and data science Stefan Riezler, Michael Hagmann [San Rafael] Morgan & Claypool Publishers [2022] © 2022 xvii, 147 Seiten Diagramme txt rdacontent n rdamedia nc rdacarrier Synthesis lectures on human language technologies #55 Linguistics Automatische Sprachanalyse (DE-588)4129935-8 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf Automatische Sprachanalyse (DE-588)4129935-8 s Data Mining (DE-588)4428654-5 s DE-604 Hagmann, Michael 1981- Verfasser (DE-588)1193940567 aut Erscheint auch als Online-Ausgabe, PDF 978-1-63639-272-1 Synthesis lectures on human language technologies #55 (DE-604)BV035447238 55 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033053114&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Riezler, Stefan Hagmann, Michael 1981- Validity, reliability, and significance empirical methods for NLP and data science Synthesis lectures on human language technologies Linguistics Automatische Sprachanalyse (DE-588)4129935-8 gnd Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4129935-8 (DE-588)4428654-5 |
title | Validity, reliability, and significance empirical methods for NLP and data science |
title_auth | Validity, reliability, and significance empirical methods for NLP and data science |
title_exact_search | Validity, reliability, and significance empirical methods for NLP and data science |
title_exact_search_txtP | Validity, reliability, and significance empirical methods for NLP and data science |
title_full | Validity, reliability, and significance empirical methods for NLP and data science Stefan Riezler, Michael Hagmann |
title_fullStr | Validity, reliability, and significance empirical methods for NLP and data science Stefan Riezler, Michael Hagmann |
title_full_unstemmed | Validity, reliability, and significance empirical methods for NLP and data science Stefan Riezler, Michael Hagmann |
title_short | Validity, reliability, and significance |
title_sort | validity reliability and significance empirical methods for nlp and data science |
title_sub | empirical methods for NLP and data science |
topic | Linguistics Automatische Sprachanalyse (DE-588)4129935-8 gnd Data Mining (DE-588)4428654-5 gnd |
topic_facet | Linguistics Automatische Sprachanalyse Data Mining |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033053114&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV035447238 |
work_keys_str_mv | AT riezlerstefan validityreliabilityandsignificanceempiricalmethodsfornlpanddatascience AT hagmannmichael validityreliabilityandsignificanceempiricalmethodsfornlpanddatascience |