Collaborative annotation for reliable natural language processing: technical and sociological aspects
Main author: | Fort, Karën |
---|---|
Format: | Book |
Language: | English |
Published: |
London
ISTE
2016
|
Series: | Focus series in cognitive science
|
Subjects: | |
Online access: | Table of contents Blurb |
Description: | xxiv, 164 pages |
ISBN: | 9781848219045 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV043839655 | ||
003 | DE-604 | ||
005 | 20161216 | ||
007 | t | ||
008 | 161024s2016 |||| 00||| eng d | ||
020 | |a 9781848219045 |9 978-1-84821-904-5 | ||
035 | |a (OCoLC)960967190 | ||
035 | |a (DE-599)BVBBV043839655 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-355 | ||
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
100 | 1 | |a Fort, Karën |e Verfasser |0 (DE-588)1116453339 |4 aut | |
245 | 1 | 0 | |a Collaborative annotation for reliable natural language processing |b technical and sociological aspects |c Karën Fort |
264 | 1 | |a London |b ISTE |c 2016 | |
300 | |a xxiv, 164 Seiten | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Focus series in cognitive science | |
650 | 4 | |a Natural language processing (Computer science) | |
650 | 7 | |a Natural language processing (Computer science) |2 fast | |
650 | 0 | 7 | |a Annotation |0 (DE-588)4560829-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Natürliche Sprache |0 (DE-588)4041354-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Kollaboration |0 (DE-588)4031748-1 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Natürliche Sprache |0 (DE-588)4041354-8 |D s |
689 | 0 | 1 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |D s |
689 | 0 | 2 | |a Annotation |0 (DE-588)4560829-5 |D s |
689 | 0 | 3 | |a Kollaboration |0 (DE-588)4031748-1 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029250224&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029250224&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
999 | |a oai:aleph.bib-bvb.de:BVB01-029250224 |
Record in the search index
_version_ | 1804176708105928704 |
---|---|
adam_text | Contents
Preface
List of Acronyms
Introduction
Chapter 1. Annotating Collaboratively
1.1. The annotation process (re)visited
1.1.1. Building consensus
1.1.2. Existing methodologies
1.1.3. Preparatory work
1.1.4. Pre-campaign
1.1.5. Annotation
1.1.6. Finalization
1.2. Annotation complexity
1.2.1. Example overview
1.2.2. What to annotate?
1.2.3. How to annotate?
1.2.4. The weight of the context
1.2.5. Visualization
1.2.6. Elementary annotation tasks
1.3. Annotation tools
1.3.1. To be or not to be an annotation tool
1.3.2. Much more than prototypes
1.3.3. Addressing the new annotation challenges
1.3.4. The impossible dream tool
1.4. Evaluating the annotation quality
1.4.1. What is annotation quality?
1.4.2. Understanding the basics
1.4.3. Beyond kappas
1.4.4. Giving meaning to the metrics
1.5. Conclusion
Chapter 2. Crowdsourcing Annotation
2.1. What is crowdsourcing and why should we be interested in it?
2.1.1. A moving target
2.1.2. A massive success
2.2. Deconstructing the myths
2.2.1. Crowdsourcing is a recent phenomenon
2.2.2. Crowdsourcing involves a crowd (of non-experts)
2.2.3. “Crowdsourcing involves (a crowd of) non-experts”
2.3. Playing with a purpose
2.3.1. Using the players’ innate capabilities and world knowledge
2.3.2. Using the players’ school knowledge
2.3.3. Using the players’ learning capacities
2.4. Acknowledging crowdsourcing specifics
2.4.1. Motivating the participants
2.4.2. Producing quality data
2.5. Ethical issues
2.5.1. Game ethics
2.5.2. What’s wrong with Amazon Mechanical Turk?
2.5.3. A charter to rule them all
Conclusion
Appendix
Glossary
Bibliography
Index
FOCUS SERIES in COGNITIVE SCIENCE
This book presents a unique opportunity for constructing a consistent
image of collaborative manual annotation for Natural Language
Processing (NLP).
NLP has witnessed two major evolutions in the past 25 years: firstly, the
extraordinary success of machine learning, which is now, for better or
for worse, overwhelmingly dominant in the field, and secondly, the
multiplication of evaluation campaigns or shared tasks. Both involve
manually annotated corpora, for the training and evaluation of the
systems.
These corpora have progressively become the hidden pillars of our
domain, providing food for our hungry machine learning algorithms and
reference for evaluation. Annotation is now the place where linguistics
hides in NLP. However, manual annotation has largely been ignored for
some time, and it has taken a while even for annotation guidelines to be
recognized as essential.
Although some efforts have been made lately to address some of the
issues presented by manual annotation, there has still been little
research done on the subject. This book aims to provide some useful
insights into the subject.
Manual corpus annotation is now at the heart of NLP, and is still largely
unexplored. There is a blatant need for manual annotation engineering
(in the sense of a precisely formalized process), and this book aims to
provide a first step towards a holistic methodology, with a global view
on annotation.
Karën Fort is Associate Professor at University Paris-Sorbonne (Paris 4)
working on the STIH (meaning, text, computer science, history) team.
Her current research interests include collaborative manual annotation,
crowdsourcing and ethics.
|
any_adam_object | 1 |
author | Fort, Karën |
author_GND | (DE-588)1116453339 |
author_facet | Fort, Karën |
author_role | aut |
author_sort | Fort, Karën |
author_variant | k f kf |
building | Verbundindex |
bvnumber | BV043839655 |
classification_rvk | ST 306 |
ctrlnum | (OCoLC)960967190 (DE-599)BVBBV043839655 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02061nam a2200433 c 4500</leader><controlfield tag="001">BV043839655</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20161216 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">161024s2016 |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781848219045</subfield><subfield code="9">978-1-84821-904-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)960967190</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV043839655</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-355</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Fort, Karën</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1116453339</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Collaborative annotation for reliable natural language processing</subfield><subfield code="b">technical and sociological aspects</subfield><subfield code="c">Karën Fort</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">London</subfield><subfield code="b">ISTE</subfield><subfield code="c">2016</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxiv, 164 Seiten</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield 
code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Focus series in cognitive science</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Natural language processing (Computer science)</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Natural language processing (Computer science)</subfield><subfield code="2">fast</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Annotation</subfield><subfield code="0">(DE-588)4560829-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Natürliche Sprache</subfield><subfield code="0">(DE-588)4041354-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Kollaboration</subfield><subfield code="0">(DE-588)4031748-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Natürliche Sprache</subfield><subfield code="0">(DE-588)4041354-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield 
code="a">Annotation</subfield><subfield code="0">(DE-588)4560829-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">Kollaboration</subfield><subfield code="0">(DE-588)4031748-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029250224&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029250224&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-029250224</subfield></datafield></record></collection> |
id | DE-604.BV043839655 |
illustrated | Not Illustrated |
indexdate | 2024-07-10T07:36:28Z |
institution | BVB |
isbn | 9781848219045 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-029250224 |
oclc_num | 960967190 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR |
owner_facet | DE-355 DE-BY-UBR |
physical | xxiv, 164 Seiten |
publishDate | 2016 |
publishDateSearch | 2016 |
publishDateSort | 2016 |
publisher | ISTE |
record_format | marc |
series2 | Focus series in cognitive science |
spelling | Fort, Karën Verfasser (DE-588)1116453339 aut Collaborative annotation for reliable natural language processing technical and sociological aspects Karën Fort London ISTE 2016 xxiv, 164 Seiten txt rdacontent n rdamedia nc rdacarrier Focus series in cognitive science Natural language processing (Computer science) Natural language processing (Computer science) fast Annotation (DE-588)4560829-5 gnd rswk-swf Sprachverarbeitung (DE-588)4116579-2 gnd rswk-swf Natürliche Sprache (DE-588)4041354-8 gnd rswk-swf Kollaboration (DE-588)4031748-1 gnd rswk-swf Natürliche Sprache (DE-588)4041354-8 s Sprachverarbeitung (DE-588)4116579-2 s Annotation (DE-588)4560829-5 s Kollaboration (DE-588)4031748-1 s DE-604 Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029250224&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029250224&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext |
spellingShingle | Fort, Karën Collaborative annotation for reliable natural language processing technical and sociological aspects Natural language processing (Computer science) Natural language processing (Computer science) fast Annotation (DE-588)4560829-5 gnd Sprachverarbeitung (DE-588)4116579-2 gnd Natürliche Sprache (DE-588)4041354-8 gnd Kollaboration (DE-588)4031748-1 gnd |
subject_GND | (DE-588)4560829-5 (DE-588)4116579-2 (DE-588)4041354-8 (DE-588)4031748-1 |
title | Collaborative annotation for reliable natural language processing technical and sociological aspects |
title_auth | Collaborative annotation for reliable natural language processing technical and sociological aspects |
title_exact_search | Collaborative annotation for reliable natural language processing technical and sociological aspects |
title_full | Collaborative annotation for reliable natural language processing technical and sociological aspects Karën Fort |
title_fullStr | Collaborative annotation for reliable natural language processing technical and sociological aspects Karën Fort |
title_full_unstemmed | Collaborative annotation for reliable natural language processing technical and sociological aspects Karën Fort |
title_short | Collaborative annotation for reliable natural language processing |
title_sort | collaborative annotation for reliable natural language processing technical and sociological aspects |
title_sub | technical and sociological aspects |
topic | Natural language processing (Computer science) Natural language processing (Computer science) fast Annotation (DE-588)4560829-5 gnd Sprachverarbeitung (DE-588)4116579-2 gnd Natürliche Sprache (DE-588)4041354-8 gnd Kollaboration (DE-588)4031748-1 gnd |
topic_facet | Natural language processing (Computer science) Annotation Sprachverarbeitung Natürliche Sprache Kollaboration |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029250224&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029250224&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT fortkaren collaborativeannotationforreliablenaturallanguageprocessingtechnicalandsociologicalaspects |