Blueprints for text analytics using Python: machine learning-based solutions for common real world (NLP) applications
Main authors: | Albrecht, Jens ; Ramachandran, Sidharth ; Winkler, Christian |
---|---|
Format: | Book |
Language: | English |
Published: | Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo : O'Reilly, 2020 |
Edition: | First edition |
Subjects: | Python (Programmiersprache) ; Sprachverarbeitung ; Text Mining |
Online access: | Table of contents |
Description: | xx, 401 pages : illustrations, diagrams |
ISBN: | 9781492074083 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV047079317 | ||
003 | DE-604 | ||
005 | 20210722 | ||
007 | t | ||
008 | 210107s2020 a||| |||| 00||| eng d | ||
020 | |a 9781492074083 |c pbk. : EUR 69.40, US $ 69.99, CAN $ 92.99 |9 978-1-492-07408-3 | ||
035 | |a (OCoLC)1369619374 | ||
035 | |a (DE-599)BVBBV047079317 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-92 |a DE-824 |a DE-860 |a DE-703 |a DE-N2 |a DE-739 | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
084 | |a ST 250 |0 (DE-625)143626: |2 rvk | ||
100 | 1 | |a Albrecht, Jens |e Verfasser |0 (DE-588)1227722427 |4 aut | |
245 | 1 | 0 | |a Blueprints for text analytics using Python |b machine learning-based solutions for common real world (NLP) applications |c Jens Albrecht, Sidharth Ramachandran and Christian Winkler |
250 | |a First edition | ||
264 | 1 | |a Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo |b O'Reilly |c 2020 | |
264 | 4 | |c © 2021 | |
300 | |a xx, 401 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Python |g Programmiersprache |0 (DE-588)4434275-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Text Mining |0 (DE-588)4728093-1 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |D s |
689 | 0 | 1 | |a Text Mining |0 (DE-588)4728093-1 |D s |
689 | 0 | 2 | |a Python |g Programmiersprache |0 (DE-588)4434275-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Ramachandran, Sidharth |e Verfasser |4 aut | |
700 | 1 | |a Winkler, Christian |d 1982- |e Verfasser |0 (DE-588)1084637057 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032486170&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-032486170 |
Record in the search index
_version_ | 1804182088302198784 |
---|---|
adam_text | Table of Contents
Preface
1. Gaining Early Insights from Textual Data
  What You’ll Learn and What We’ll Build
  Exploratory Data Analysis
  Introducing the Dataset
  Blueprint: Getting an Overview of the Data with Pandas
  Calculating Summary Statistics for Columns
  Checking for Missing Data
  Plotting Value Distributions
  Comparing Value Distributions Across Categories
  Visualizing Developments Over Time
  Blueprint: Building a Simple Text Preprocessing Pipeline
  Performing Tokenization with Regular Expressions
  Treating Stop Words
  Processing a Pipeline with One Line of Code
  Blueprints for Word Frequency Analysis
  Blueprint: Counting Words with a Counter
  Blueprint: Creating a Frequency Diagram
  Blueprint: Creating Word Clouds
  Blueprint: Ranking with TF-IDF
  Blueprint: Finding a Keyword-in-Context
  Blueprint: Analyzing N-Grams
  Blueprint: Comparing Frequencies Across Time Intervals and Categories
  Creating Frequency Timelines
  Creating Frequency Heatmaps
  Closing Remarks
2. Extracting Textual Insights with APIs
  What You’ll Learn and What We’ll Build
  Application Programming Interfaces
  Blueprint: Extracting Data from an API Using the Requests Module
  Pagination
  Rate Limiting
  Blueprint: Extracting Twitter Data with Tweepy
  Obtaining Credentials
  Installing and Configuring Tweepy
  Extracting Data from the Search API
  Extracting Data from a User’s Timeline
  Extracting Data from the Streaming API
  Closing Remarks
3. Scraping Websites and Extracting Data
  What You’ll Learn and What We’ll Build
  Scraping and Data Extraction
  Introducing the Reuters News Archive
  URL Generation
  Blueprint: Downloading and Interpreting robots.txt
  Blueprint: Finding URLs from sitemap.xml
  Blueprint: Finding URLs from RSS
  Downloading Data
  Blueprint: Downloading HTML Pages with Python
  Blueprint: Downloading HTML Pages with wget
  Extracting Semistructured Data
  Blueprint: Extracting Data with Regular Expressions
  Blueprint: Using an HTML Parser for Extraction
  Blueprint: Spidering
  Introducing the Use Case
  Error Handling and Production-Quality Software
  Density-Based Text Extraction
  Extracting Reuters Content with Readability
  Summary Density-Based Text Extraction
  All-in-One Approach
  Blueprint: Scraping the Reuters Archive with Scrapy
  Possible Problems with Scraping
  Closing Remarks and Recommendation
4. Preparing Textual Data for Statistics and Machine Learning
  What You’ll Learn and What We’ll Build
  A Data Preprocessing Pipeline
  Introducing the Dataset: Reddit Self-Posts
  Loading Data Into Pandas
  Blueprint: Standardizing Attribute Names
  Saving and Loading a DataFrame
  Cleaning Text Data
  Blueprint: Identify Noise with Regular Expressions
  Blueprint: Removing Noise with Regular Expressions
  Blueprint: Character Normalization with textacy
  Blueprint: Pattern-Based Data Masking with textacy
  Tokenization
  Blueprint: Tokenization with Regular Expressions
  Tokenization with NLTK
  Recommendations for Tokenization
  Linguistic Processing with spaCy
  Instantiating a Pipeline
  Processing Text
  Blueprint: Customizing Tokenization
  Blueprint: Working with Stop Words
  Blueprint: Extracting Lemmas Based on Part of Speech
  Blueprint: Extracting Noun Phrases
  Blueprint: Extracting Named Entities
  Feature Extraction on a Large Dataset
  Blueprint: Creating One Function to Get It All
  Blueprint: Using spaCy on a Large Dataset
  Persisting the Result
  A Note on Execution Time
  There Is More
  Language Detection
  Spell-Checking
  Token Normalization
  Closing Remarks and Recommendations
5. Feature Engineering and Syntactic Similarity
  What You’ll Learn and What We’ll Build
  A Toy Dataset for Experimentation
  Blueprint: Building Your Own Vectorizer
  Enumerating the Vocabulary
  Vectorizing Documents
  The Document-Term Matrix
  The Similarity Matrix
  Bag-of-Words Models
  Blueprint: Using scikit-learn’s CountVectorizer
  Blueprint: Calculating Similarities
  TF-IDF Models
  Optimized Document Vectors with TfidfTransformer
  Introducing the ABC Dataset
  Blueprint: Reducing Feature Dimensions
  Blueprint: Improving Features by Making Them More Specific
  Blueprint: Using Lemmas Instead of Words for Vectorizing Documents
  Blueprint: Limit Word Types
  Blueprint: Remove Most Common Words
  Blueprint: Adding Context via N-Grams
  Syntactic Similarity in the ABC Dataset
  Blueprint: Finding Most Similar Headlines to a Made-up Headline
  Blueprint: Finding the Two Most Similar Documents in a Large Corpus (Much More Difficult)
  Blueprint: Finding Related Words
  Tips for Long-Running Programs like Syntactic Similarity
  Summary and Conclusion
6. Text Classification Algorithms
  What You’ll Learn and What We’ll Build
  Introducing the Java Development Tools Bug Dataset
  Blueprint: Building a Text Classification System
  Step 1: Data Preparation
  Step 2: Train-Test Split
  Step 3: Training the Machine Learning Model
  Step 4: Model Evaluation
  Final Blueprint for Text Classification
  Blueprint: Using Cross-Validation to Estimate Realistic Accuracy Metrics
  Blueprint: Performing Hyperparameter Tuning with Grid Search
  Blueprint Recap and Conclusion
  Closing Remarks
  Further Reading
7. How to Explain a Text Classifier
  What You’ll Learn and What We’ll Build
  Blueprint: Determining Classification Confidence Using Prediction Probability
  Blueprint: Measuring Feature Importance of Predictive Models
  Blueprint: Using LIME to Explain the Classification Results
  Blueprint: Using ELI5 to Explain the Classification Results
  Blueprint: Using Anchor to Explain the Classification Results
  Using the Distribution with Masked Words
  Working with Real Words
  Closing Remarks
8. Unsupervised Methods: Topic Modeling and Clustering
  What You’ll Learn and What We’ll Build
  Our Dataset: UN General Debates
  Checking Statistics of the Corpus
  Preparations
  Nonnegative Matrix Factorization (NMF)
  Blueprint: Creating a Topic Model Using NMF for Documents
  Blueprint: Creating a Topic Model for Paragraphs Using NMF
  Latent Semantic Analysis/Indexing
  Blueprint: Creating a Topic Model for Paragraphs with SVD
  Latent Dirichlet Allocation
  Blueprint: Creating a Topic Model for Paragraphs with LDA
  Blueprint: Visualizing LDA Results
  Blueprint: Using Word Clouds to Display and Compare Topic Models
  Blueprint: Calculating Topic Distribution of Documents and Time Evolution
  Using Gensim for Topic Modeling
  Blueprint: Preparing Data for Gensim
  Blueprint: Performing Nonnegative Matrix Factorization with Gensim
  Blueprint: Using LDA with Gensim
  Blueprint: Calculating Coherence Scores
  Blueprint: Finding the Optimal Number of Topics
  Blueprint: Creating a Hierarchical Dirichlet Process with Gensim
  Blueprint: Using Clustering to Uncover the Structure of Text Data
  Further Ideas
  Summary and Recommendation
  Conclusion
9. Text Summarization
  What You’ll Learn and What We’ll Build
  Text Summarization
  Extractive Methods
  Data Preprocessing
  Blueprint: Summarizing Text Using Topic Representation
  Identifying Important Words with TF-IDF Values
  LSA Algorithm
  Blueprint: Summarizing Text Using an Indicator Representation
  Measuring the Performance of Text Summarization Methods
  Blueprint: Summarizing Text Using Machine Learning
  Step 1: Creating Target Labels
  Step 2: Adding Features to Assist Model Prediction
  Step 3: Build a Machine Learning Model
  Closing Remarks
  Further Reading
10. Exploring Semantic Relationships with Word Embeddings
  What You’ll Learn and What We’ll Build
  The Case for Semantic Embeddings
  Word Embeddings
  Analogy Reasoning with Word Embeddings
  Types of Embeddings
  Blueprint: Using Similarity Queries on Pretrained Models
  Loading a Pretrained Model
  Similarity Queries
  Blueprints for Training and Evaluating Your Own Embeddings
  Data Preparation
  Blueprint: Training Models with Gensim
  Blueprint: Evaluating Different Models
  Blueprints for Visualizing Embeddings
  Blueprint: Applying Dimensionality Reduction
  Blueprint: Using the TensorFlow Embedding Projector
  Blueprint: Constructing a Similarity Tree
  Closing Remarks
  Further Reading
11. Performing Sentiment Analysis on Text Data
  What You’ll Learn and What We’ll Build
  Sentiment Analysis
  Introducing the Amazon Customer Reviews Dataset
  Blueprint: Performing Sentiment Analysis Using Lexicon-Based Approaches
  Bing Liu Lexicon
  Disadvantages of a Lexicon-Based Approach
  Supervised Learning Approaches
  Preparing Data for a Supervised Learning Approach
  Blueprint: Vectorizing Text Data and Applying a Supervised Machine Learning Algorithm
  Step 1: Data Preparation
  Step 2: Train-Test Split
  Step 3: Text Vectorization
  Step 4: Training the Machine Learning Model
  Pretrained Language Models Using Deep Learning
  Deep Learning and Transfer Learning
  Blueprint: Using the Transfer Learning Technique and a Pretrained Language Model
  Step 1: Loading Models and Tokenization
  Step 2: Model Training
  Step 3: Model Evaluation
  Closing Remarks
  Further Reading
12. Building a Knowledge Graph
  What You’ll Learn and What We’ll Build
  Knowledge Graphs
  Information Extraction
  Introducing the Dataset
  Named-Entity Recognition
  Blueprint: Using Rule-Based Named-Entity Recognition
  Blueprint: Normalizing Named Entities
  Merging Entity Tokens
  Coreference Resolution
  Blueprint: Using spaCy’s Token Extensions
  Blueprint: Performing Alias Resolution
  Blueprint: Resolving Name Variations
  Blueprint: Performing Anaphora Resolution with NeuralCoref
  Name Normalization
  Entity Linking
  Blueprint: Creating a Co-Occurrence Graph
  Extracting Co-Occurrences from a Document
  Visualizing the Graph with Gephi
  Relation Extraction
  Blueprint: Extracting Relations Using Phrase Matching
  Blueprint: Extracting Relations Using Dependency Trees
  Creating the Knowledge Graph
  Don’t Blindly Trust the Results
  Closing Remarks
  Further Reading
13. Using Text Analytics in Production
  What You’ll Learn and What We’ll Build
  Blueprint: Using Conda to Create Reproducible Python Environments
  Blueprint: Using Containers to Create Reproducible Environments
  Blueprint: Creating a REST API for Your Text Analytics Model
  Blueprint: Deploying and Scaling Your API Using a Cloud Provider
  Blueprint: Automatically Versioning and Deploying Builds
  Closing Remarks
  Further Reading
Index
|
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Albrecht, Jens Ramachandran, Sidharth Winkler, Christian 1982- |
author_GND | (DE-588)1227722427 (DE-588)1084637057 |
author_facet | Albrecht, Jens Ramachandran, Sidharth Winkler, Christian 1982- |
author_role | aut aut aut |
author_sort | Albrecht, Jens |
author_variant | j a ja s r sr c w cw |
building | Verbundindex |
bvnumber | BV047079317 |
classification_rvk | ST 530 ST 306 ST 250 |
ctrlnum | (OCoLC)1369619374 (DE-599)BVBBV047079317 |
discipline | Informatik |
discipline_str_mv | Informatik |
edition | First edition |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01992nam a2200433 c 4500</leader><controlfield tag="001">BV047079317</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20210722 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">210107s2020 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781492074083</subfield><subfield code="c">pbk. : EUR 69.40, US $ 69.99, CAN $ 92.99</subfield><subfield code="9">978-1-492-07408-3</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1369619374</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV047079317</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-92</subfield><subfield code="a">DE-824</subfield><subfield code="a">DE-860</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-N2</subfield><subfield code="a">DE-739</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 250</subfield><subfield code="0">(DE-625)143626:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Albrecht, Jens</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1227722427</subfield><subfield 
code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Blueprints for text analytics using Python</subfield><subfield code="b">machine learning-based solutions for common real world (NLP) applications</subfield><subfield code="c">Jens Albrecht, Sidharth Ramachandran and Christian Winkler</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">First edition</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo</subfield><subfield code="b">O'Reilly</subfield><subfield code="c">2020</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">© 2021</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xx, 401 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Python</subfield><subfield code="g">Programmiersprache</subfield><subfield code="0">(DE-588)4434275-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield 
code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Text Mining</subfield><subfield code="0">(DE-588)4728093-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Python</subfield><subfield code="g">Programmiersprache</subfield><subfield code="0">(DE-588)4434275-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ramachandran, Sidharth</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Winkler, Christian</subfield><subfield code="d">1982-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1084637057</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=032486170&amp;sequence=000001&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-032486170</subfield></datafield></record></collection> |
id | DE-604.BV047079317 |
illustrated | Illustrated |
index_date | 2024-07-03T16:15:46Z |
indexdate | 2024-07-10T09:01:59Z |
institution | BVB |
isbn | 9781492074083 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-032486170 |
oclc_num | 1369619374 |
open_access_boolean | |
owner | DE-92 DE-824 DE-860 DE-703 DE-N2 DE-739 |
owner_facet | DE-92 DE-824 DE-860 DE-703 DE-N2 DE-739 |
physical | xx, 401 Seiten Illustrationen, Diagramme |
publishDate | 2020 |
publishDateSearch | 2020 |
publishDateSort | 2020 |
publisher | O'Reilly |
record_format | marc |
spelling | Albrecht, Jens Verfasser (DE-588)1227722427 aut Blueprints for text analytics using Python machine learning-based solutions for common real world (NLP) applications Jens Albrecht, Sidharth Ramachandran and Christian Winkler First edition Beijing ; Boston ; Farnham ; Sebastopol ; Tokyo O'Reilly 2020 © 2021 xx, 401 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier Python Programmiersprache (DE-588)4434275-5 gnd rswk-swf Sprachverarbeitung (DE-588)4116579-2 gnd rswk-swf Text Mining (DE-588)4728093-1 gnd rswk-swf Sprachverarbeitung (DE-588)4116579-2 s Text Mining (DE-588)4728093-1 s Python Programmiersprache (DE-588)4434275-5 s DE-604 Ramachandran, Sidharth Verfasser aut Winkler, Christian 1982- Verfasser (DE-588)1084637057 aut Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032486170&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Albrecht, Jens Ramachandran, Sidharth Winkler, Christian 1982- Blueprints for text analytics using Python machine learning-based solutions for common real world (NLP) applications Python Programmiersprache (DE-588)4434275-5 gnd Sprachverarbeitung (DE-588)4116579-2 gnd Text Mining (DE-588)4728093-1 gnd |
subject_GND | (DE-588)4434275-5 (DE-588)4116579-2 (DE-588)4728093-1 |
title | Blueprints for text analytics using Python machine learning-based solutions for common real world (NLP) applications |
title_auth | Blueprints for text analytics using Python machine learning-based solutions for common real world (NLP) applications |
title_exact_search | Blueprints for text analytics using Python machine learning-based solutions for common real world (NLP) applications |
title_exact_search_txtP | Blueprints for text analytics using Python machine learning-based solutions for common real world (NLP) applications |
title_full | Blueprints for text analytics using Python machine learning-based solutions for common real world (NLP) applications Jens Albrecht, Sidharth Ramachandran and Christian Winkler |
title_fullStr | Blueprints for text analytics using Python machine learning-based solutions for common real world (NLP) applications Jens Albrecht, Sidharth Ramachandran and Christian Winkler |
title_full_unstemmed | Blueprints for text analytics using Python machine learning-based solutions for common real world (NLP) applications Jens Albrecht, Sidharth Ramachandran and Christian Winkler |
title_short | Blueprints for text analytics using Python |
title_sort | blueprints for text analytics using python machine learning based solutions for common real world nlp applications |
title_sub | machine learning-based solutions for common real world (NLP) applications |
topic | Python Programmiersprache (DE-588)4434275-5 gnd Sprachverarbeitung (DE-588)4116579-2 gnd Text Mining (DE-588)4728093-1 gnd |
topic_facet | Python Programmiersprache Sprachverarbeitung Text Mining |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=032486170&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT albrechtjens blueprintsfortextanalyticsusingpythonmachinelearningbasedsolutionsforcommonrealworldnlpapplications AT ramachandransidharth blueprintsfortextanalyticsusingpythonmachinelearningbasedsolutionsforcommonrealworldnlpapplications AT winklerchristian blueprintsfortextanalyticsusingpythonmachinelearningbasedsolutionsforcommonrealworldnlpapplications |