Practical weak supervision: doing more with less data
Main authors: | Wee Hyong Tok, Amit Bahree, Senja Filipi |
---|---|
Format: | Book |
Language: | English |
Published: |
Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo
O'Reilly
October 2021
|
Edition: | First edition |
Subjects: | |
Online access: | Table of contents |
Description: | xvii, 169 pages, illustrations, 24 cm |
ISBN: | 9781492077060 1492077062 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV048521109 | ||
003 | DE-604 | ||
005 | 20221124 | ||
007 | t | ||
008 | 221019s2021 cc a||| |||| 00||| eng d | ||
020 | |a 9781492077060 |c Broschur : EUR 80.50 (DE) |9 978-1-4920-7706-0 | ||
020 | |a 1492077062 |9 1-4920-7706-2 | ||
035 | |a (OCoLC)1286696237 | ||
035 | |a (DE-599)KXP1779549458 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
044 | |a cc |c XB-CN |a xxu |c XD-US |a xxk |c XA-GB |a ja |c XB-JP | ||
049 | |a DE-739 | ||
082 | 0 | |a 006.31 | |
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
100 | 1 | |a Tok, Wee Hyong |e Verfasser |0 (DE-588)1188724363 |4 aut | |
245 | 1 | 0 | |a Practical weak supervision |b doing more with less data |c Wee Hyong Tok, Amit Bahree, and Senja Filipi |
250 | |a First edition | ||
264 | 1 | |a Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo |b O'Reilly |c October 2021 | |
300 | |a xvii, 169 Seiten |b Illustrationen |c 24 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Automatische Sprachanalyse |0 (DE-588)4129935-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Maschinelles Sehen |0 (DE-588)4129594-8 |2 gnd |9 rswk-swf |
653 | 0 | |a Supervised learning (Machine learning) | |
653 | 0 | |a Natural language processing (Computer science) | |
653 | 0 | |a Computer vision | |
689 | 0 | 0 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 0 | 1 | |a Maschinelles Sehen |0 (DE-588)4129594-8 |D s |
689 | 0 | 2 | |a Automatische Sprachanalyse |0 (DE-588)4129935-8 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Bahree, Amit |d ca. 20./21. Jh. |e Verfasser |0 (DE-588)1273677641 |4 aut | |
700 | 1 | |a Filipi, Senja |d ca. 20./21. Jh. |e Verfasser |0 (DE-588)1273677854 |4 aut | |
710 | 2 | |a O'Reilly Media, Inc. |0 (DE-588)1065501749 |4 pbl | |
776 | 0 | 8 | |i Erscheint auch als |n Online-Ausgabe |z 9781492077039 |h 1 Online-Ressource (193 Seiten) |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033898001&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-033898001 |
Record in the search index
_version_ | 1804184507632320512 |
---|---|
adam_text | Table of Contents
Foreword by Xuedong Huang
Foreword by Alex Ratner
Preface
1. Introduction to Weak Supervision
  What Is Weak Supervision?
  Real-World Weak Supervision with Snorkel
  Approaches to Weak Supervision
  Incomplete Supervision
  Inexact Supervision
  Inaccurate Supervision
  Data Programming
  Getting Training Data
  How Data Programming Is Helping Accelerate Software 2.0
  Summary
2. Diving into Data Programming with Snorkel
  Snorkel, a Data Programming Framework
  Getting Started with Labeling Functions
  Applying the Labels to the Datasets
  Analyzing the Labeling Performance
  Using a Validation Set
  Reaching Labeling Consensus with LabelModel
  Intuition Behind LabelModel
  LabelModel Parameter Estimation
  Strategies to Improve the Labeling Functions
  Data Augmentation with Snorkel Transformers
  Data Augmentation Through Word Removal
  Snorkel Preprocessors
  Data Augmentation Through GPT-2 Prediction
  Data Augmentation Through Translation
  Applying the Transformation Functions to the Dataset
  Summary
3. Labeling in Action
  Labeling a Text Dataset: Identifying Fake News
  Exploring the Fake News Detection (FakeNewsNet) Dataset
  Importing Snorkel and Setting Up Representative Constants
  Fact-Checking Sites
  Is the Speaker a “Liar”?
  Twitter Profile and Botometer Score
  Generating Agreements Between Weak Classifiers
  Labeling an Images Dataset: Determining Indoor Versus Outdoor Images
  Creating a Dataset of Images from Bing
  Defining and Training Weak Classifiers in TensorFlow
  Training the Various Classifiers
  Weak Classifiers out of Image Tags
  Deploying the Computer Vision Service
  Interacting with the Computer Vision Service
  Preparing the DataFrame
  Learning a LabelModel
  Summary
4. Using the Snorkel-Labeled Dataset for Text Classification
  Getting Started with Natural Language Processing (NLP)
  Transformers
  Hard Versus Probabilistic Labels
  Using ktrain for Performing Text Classification
  Data Preparation
  Dealing with an Imbalanced Dataset
  Training the Model
  Using the Text Classification Model for Prediction
  Finding a Good Learning Rate
  Using Hugging Face and Transformers
  Loading the Relevant Python Packages
  Dataset Preparation
  Checking Whether GPU Hardware Is Available
  Performing Tokenization
  Model Training
  Testing the Fine-Tuned Model
  Summary
5. Using the Snorkel-Labeled Dataset for Image Classification
  Visual Object Recognition Overview
  Representing Image Features
  Transfer Learning for Computer Vision
  Using PyTorch for Image Classification
  Loading the Indoor/Outdoor Dataset
  Utility Functions
  Visualizing the Training Data
  Fine-Tuning the Pretrained Model
  Summary
6. Scalability and Distributed Training
  The Need for Scalability
  Distributed Training
  Apache Spark: An Introduction
  Spark Application Design
  Using Azure Databricks to Scale
  Cluster Setup for Weak Supervision
  Fake News Detection Dataset on Databricks
  Labeling Functions for Snorkel
  Setting Up Dependencies
  Loading the Data
  Fact-Checking Sites
  Transfer Learning Using the LIAR Dataset
  Weak Classifiers: Generating Agreement
  Type Conversions Needed for Spark Runtime
  Summary
Index
|
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Tok, Wee Hyong Bahree, Amit ca. 20./21. Jh Filipi, Senja ca. 20./21. Jh |
author_GND | (DE-588)1188724363 (DE-588)1273677641 (DE-588)1273677854 |
author_facet | Tok, Wee Hyong Bahree, Amit ca. 20./21. Jh Filipi, Senja ca. 20./21. Jh |
author_role | aut aut aut |
author_sort | Tok, Wee Hyong |
author_variant | w h t wh wht a b ab s f sf |
building | Verbundindex |
bvnumber | BV048521109 |
classification_rvk | ST 300 |
ctrlnum | (OCoLC)1286696237 (DE-599)KXP1779549458 |
dewey-full | 006.31 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.31 |
dewey-search | 006.31 |
dewey-sort | 16.31 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
discipline_str_mv | Informatik |
edition | First edition |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02386nam a2200529 c 4500</leader><controlfield tag="001">BV048521109</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20221124 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">221019s2021 cc a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781492077060</subfield><subfield code="c">Broschur : EUR 80.50 (DE)</subfield><subfield code="9">978-1-4920-7706-0</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1492077062</subfield><subfield code="9">1-4920-7706-2</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1286696237</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)KXP1779549458</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">cc</subfield><subfield code="c">XB-CN</subfield><subfield code="a">xxu</subfield><subfield code="c">XD-US</subfield><subfield code="a">xxk</subfield><subfield code="c">XA-GB</subfield><subfield code="a">ja</subfield><subfield code="c">XB-JP</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.31</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 300</subfield><subfield code="0">(DE-625)143650:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Tok, Wee Hyong</subfield><subfield code="e">Verfasser</subfield><subfield 
code="0">(DE-588)1188724363</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Practical weak supervision</subfield><subfield code="b">doing more with less data</subfield><subfield code="c">Wee Hyong Tok, Amit Bahree, and Senja Filipi</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">First edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo</subfield><subfield code="b">O'Reilly</subfield><subfield code="c">October 2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xvii, 169 Seiten</subfield><subfield code="b">Illustrationen</subfield><subfield code="c">24 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Sprachanalyse</subfield><subfield code="0">(DE-588)4129935-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Maschinelles Sehen</subfield><subfield code="0">(DE-588)4129594-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Supervised learning (Machine learning)</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield 
code="a">Natural language processing (Computer science)</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Computer vision</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Computer vision</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Natural language processing (Computer science)</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Supervised learning (Machine learning)</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Maschinelles Sehen</subfield><subfield code="0">(DE-588)4129594-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Automatische Sprachanalyse</subfield><subfield code="0">(DE-588)4129935-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Bahree, Amit</subfield><subfield code="d">ca. 20./21. Jh.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1273677641</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Filipi, Senja</subfield><subfield code="d">ca. 20./21. 
Jh.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1273677854</subfield><subfield code="4">aut</subfield></datafield><datafield tag="710" ind1="2" ind2=" "><subfield code="a">O'Reilly Media, Inc.</subfield><subfield code="0">(DE-588)1065501749</subfield><subfield code="4">pbl</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">9781492077039</subfield><subfield code="h">1 Online-Ressource (193 Seiten)</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033898001&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-033898001</subfield></datafield></record></collection> |
id | DE-604.BV048521109 |
illustrated | Illustrated |
index_date | 2024-07-03T20:49:55Z |
indexdate | 2024-07-10T09:40:26Z |
institution | BVB |
institution_GND | (DE-588)1065501749 |
isbn | 9781492077060 1492077062 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-033898001 |
oclc_num | 1286696237 |
open_access_boolean | |
owner | DE-739 |
owner_facet | DE-739 |
physical | xvii, 169 Seiten Illustrationen 24 cm |
publishDate | 2021 |
publishDateSearch | 2021 |
publishDateSort | 2021 |
publisher | O'Reilly |
record_format | marc |
spelling | Tok, Wee Hyong Verfasser (DE-588)1188724363 aut Practical weak supervision doing more with less data Wee Hyong Tok, Amit Bahree, and Senja Filipi First edition Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo O'Reilly October 2021 xvii, 169 Seiten Illustrationen 24 cm txt rdacontent n rdamedia nc rdacarrier Automatische Sprachanalyse (DE-588)4129935-8 gnd rswk-swf Maschinelles Lernen (DE-588)4193754-5 gnd rswk-swf Maschinelles Sehen (DE-588)4129594-8 gnd rswk-swf Supervised learning (Machine learning) Natural language processing (Computer science) Computer vision Maschinelles Lernen (DE-588)4193754-5 s Maschinelles Sehen (DE-588)4129594-8 s Automatische Sprachanalyse (DE-588)4129935-8 s DE-604 Bahree, Amit ca. 20./21. Jh. Verfasser (DE-588)1273677641 aut Filipi, Senja ca. 20./21. Jh. Verfasser (DE-588)1273677854 aut O'Reilly Media, Inc. (DE-588)1065501749 pbl Erscheint auch als Online-Ausgabe 9781492077039 1 Online-Ressource (193 Seiten) Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033898001&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Tok, Wee Hyong Bahree, Amit ca. 20./21. Jh Filipi, Senja ca. 20./21. Jh Practical weak supervision doing more with less data Automatische Sprachanalyse (DE-588)4129935-8 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Maschinelles Sehen (DE-588)4129594-8 gnd |
subject_GND | (DE-588)4129935-8 (DE-588)4193754-5 (DE-588)4129594-8 |
title | Practical weak supervision doing more with less data |
title_auth | Practical weak supervision doing more with less data |
title_exact_search | Practical weak supervision doing more with less data |
title_exact_search_txtP | Practical weak supervision doing more with less data |
title_full | Practical weak supervision doing more with less data Wee Hyong Tok, Amit Bahree, and Senja Filipi |
title_fullStr | Practical weak supervision doing more with less data Wee Hyong Tok, Amit Bahree, and Senja Filipi |
title_full_unstemmed | Practical weak supervision doing more with less data Wee Hyong Tok, Amit Bahree, and Senja Filipi |
title_short | Practical weak supervision |
title_sort | practical weak supervision doing more with less data |
title_sub | doing more with less data |
topic | Automatische Sprachanalyse (DE-588)4129935-8 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Maschinelles Sehen (DE-588)4129594-8 gnd |
topic_facet | Automatische Sprachanalyse Maschinelles Lernen Maschinelles Sehen |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=033898001&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT tokweehyong practicalweaksupervisiondoingmorewithlessdata AT bahreeamit practicalweaksupervisiondoingmorewithlessdata AT filipisenja practicalweaksupervisiondoingmorewithlessdata AT oreillymediainc practicalweaksupervisiondoingmorewithlessdata |