Availability: Data science bookcamp

Data science bookcamp: five Python projects


Detailed description


Bibliographic details
Main author:	Apeltsin, Leonard, ca. 20th/21st century (author)
Format:	Book
Language:	English
Published:	Shelter Island: Manning, [2021]
Subjects:	Data Science; Data Mining; Python (programming language); Data mining; Data sets; Python (Computer program language)
Online access:	Table of contents; Table of contents
Summary:	1. Computing probabilities using Python -- 2. Plotting probabilities using Matplotlib -- 3. Running random simulations in NumPy -- 4. Case study 1 solution -- 5. Basic probability and statistical analysis using SciPy -- 6. Making predictions using the central limit theorem and SciPy -- 7. Statistical hypothesis testing -- 8. Analyzing tables using Pandas -- 9. Case study 2 solution -- 10. Clustering data into groups -- 11. Geographic location visualization and analysis -- 12. Case study 3 solution -- 13. Measuring text similarities -- 14. Dimension reduction of matrix data -- 15. NLP analysis of large text datasets -- 16. Extracting text from web pages -- 17. Case study 4 solution -- 18. An introduction to graph theory and network analysis -- 19. Dynamic graph theory techniques for node ranking and social network analysis -- 20. Network-driven supervised machine learning -- 21. Training linear classifiers with logistic regression -- 22. Training nonlinear classifiers with decision tree techniques -- 23. Case study 5 solution.
Description:	Subtitle on cover: Five real-world Python projects. Includes bibliographical references.
Description:	xxvi, 676 pages, diagrams, illustrations, 24 cm
ISBN:	9781617296253

Internal format

MARC


LEADER	00000nam a2200000 c 4500
001	BV048630066
003	DE-604
005	20230213
007	t
008	230104s2021 xxua||| |||| 00||| eng d
020			|a 9781617296253 |c softcover |9 978-1-61729-625-3
035			|a (OCoLC)1294301294
035			|a (DE-599)KXP178714853X
040			|a DE-604 |b ger |e rda
041	0		|a eng
044			|a xxu |c XD-US
049			|a DE-739
082	0		|a 006.312
084			|a ST 300 |0 (DE-625)143650: |2 rvk
100	1		|a Apeltsin, Leonard |d ca. 20./21. Jh. |e Verfasser |0 (DE-588)1280840129 |4 aut
245	1	0	|a Data science bookcamp |b five Python projects |c Leonard Apeltsin
264		1	|a Shelter Island |b Manning |c [2021]
300			|a xxvi, 676 Seiten |b Diagramme, Illustrationen |c 24 cm
336			|b txt |2 rdacontent
337			|b n |2 rdamedia
338			|b nc |2 rdacarrier
500			|a Untertitel auf Cover: Five real-world Python projects
500			|a Literaturangaben
520	3		|a 1. Computing probabilities using Python -- 2. Plotting probabilities using Matplotlib -- 3. Running random simulations in NumPy -- 4. Case study 1 solution -- 5. Basic probability and statistical analysis using SciPy -- 6. Making predictions using the central limit theorem and SciPy -- 7. Statistical hypothesis testing -- 8. Analyzing tables using Pandas -- 9. Case study 2 solution -- 10. Clustering data into groups -- 11. Geographic location visualization and analysis -- 12. Case study 3 solution -- 13. Measuring text similarities -- 14. Dimension reduction of matrix data -- 15. NLP analysis of large text datasets -- 16. Extracting text from web pages -- 17. Case study 4 solution -- 18. An introduction to graph theory and network analysis -- 19. Dynamic graph theory techniques for node ranking and social network analysis -- 20. Network-driven supervised machine learning -- 21. Training linear classifiers with logistic regression -- 22. Training nonlinear classifiers with decision tree techniques -- 23. Case study 5 solution.
650	0	7	|a Data Science |0 (DE-588)1140936166 |2 gnd |9 rswk-swf
650	0	7	|a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf
650	0	7	|a Python |g Programmiersprache |0 (DE-588)4434275-5 |2 gnd |9 rswk-swf
653		0	|a Data mining
653		0	|a Data sets
653		0	|a Python (Computer program language)
689	0	0	|a Data Science |0 (DE-588)1140936166 |D s
689	0	1	|a Data Mining |0 (DE-588)4428654-5 |D s
689	0	2	|a Python |g Programmiersprache |0 (DE-588)4434275-5 |D s
689	0		|5 DE-604
856	4	2	|u https://www.gbv.de/dms/bowker/toc/9781617296253.pdf |v 2022-07-28 |x Aggregator |3 Inhaltsverzeichnis
856	4	2	|m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034005124&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999			|a oai:aleph.bib-bvb.de:BVB01-034005124

Record in the search index

_version_	1804184761679216640
adam_text	brief contents
  Case study 1: Finding the winning strategy in a card game
    1 Computing probabilities using Python
    2 Plotting probabilities using Matplotlib
    3 Running random simulations in NumPy
    4 Case study 1 solution
  Case study 2: Assessing online ad clicks for significance
    5 Basic probability and statistical analysis using SciPy
    6 Making predictions using the central limit theorem and SciPy
    7 Statistical hypothesis testing
    8 Analyzing tables using Pandas
    9 Case study 2 solution
  Case study 3: Tracking disease outbreaks using news headlines
    10 Clustering data into groups
    11 Geographic location visualization and analysis
    12 Case study 3 solution
  Case study 4: Using online job postings to improve your data science resume
    13 Measuring text similarities
    14 Dimension reduction of matrix data
    15 NLP analysis of large text datasets
    16 Extracting text from web pages
    17 Case study 4 solution
  Case study 5: Predicting future friendships from social network data
    18 An introduction to graph theory and network analysis
    19 Dynamic graph theory techniques for node ranking and social network analysis
    20 Network-driven supervised machine learning
    21 Training linear classifiers with logistic regression
    22 Training nonlinear classifiers with decision tree techniques
    23 Case study 5 solution
contents
  preface
  acknowledgments
  about this book
  about the author
  about the cover illustration
  Case study 1: Finding the winning strategy in a card game
    1 Computing probabilities using Python
      1.1 Sample space analysis: An equation-free approach for measuring uncertainty in outcomes
        Analyzing a biased coin
      1.2 Computing nontrivial probabilities
        Problem 1: Analyzing a family with four children
        Problem 2: Analyzing multiple die rolls
        Problem 3: Computing die-roll probabilities using weighted sample spaces
      1.3 Computing probabilities over interval ranges
        Evaluating extremes using interval analysis
    2 Plotting probabilities using Matplotlib
      2.1 Basic Matplotlib plots
      2.2 Plotting coin-flip probabilities
        Comparing multiple coin-flip probability distributions
    3 Running random simulations in NumPy
      3.1 Simulating random coin flips and die rolls using NumPy
        Analyzing biased coin flips
      3.2 Computing confidence intervals using histograms and NumPy arrays
        Binning similar points in histogram plots
        Deriving probabilities from histograms
        Shrinking the range of a high confidence interval
        Computing histograms in NumPy
      3.3 Using confidence intervals to analyze a biased deck of cards
      3.4 Using permutations to shuffle cards
    4 Case study 1 solution
      4.1 Predicting red cards in a shuffled deck
        Estimating the probability of strategy success
      4.2 Optimizing strategies using the sample space for a 10-card deck
  Case study 2: Assessing online ad clicks for significance
      4.3 Problem statement
      4.4 Dataset description
      4.5 Overview
    5 Basic probability and statistical analysis using SciPy
      5.1 Exploring the relationships between data and probability using SciPy
      5.2 Mean as a measure of centrality
        Finding the mean of a probability distribution
      5.3 Variance as a measure of dispersion
        Finding the variance of a probability distribution
    6 Making predictions using the central limit theorem and SciPy
      6.1 Manipulating the normal distribution using SciPy
        Comparing two sampled normal curves
      6.2 Determining the mean and variance of a population through random sampling
      6.3 Making predictions using the mean and variance
        Computing the area beneath a normal curve
        Interpreting the computed probability
    7 Statistical hypothesis testing
      7.1 Assessing the divergence between sample mean and population mean
      7.2 Data dredging: Coming to false conclusions through oversampling
      7.3 Bootstrapping with replacement: Testing a hypothesis when the population variance is unknown
      7.4 Permutation testing: Comparing means of samples when the population parameters are unknown
    8 Analyzing tables using Pandas
      8.1 Storing tables using basic Python
      8.2 Exploring tables using Pandas
      8.3 Retrieving table columns
      8.4 Retrieving table rows
      8.5 Modifying table rows and columns
      8.6 Saving and loading table data
      8.7 Visualizing tables using Seaborn
    9 Case study 2 solution
      9.1 Processing the ad-click table in Pandas
      9.2 Computing p-values from differences in means
      9.3 Determining statistical significance
      9.4 41 shades of blue: A real-life cautionary tale
  Case study 3: Tracking disease outbreaks using news headlines
      9.5 Problem statement
      Dataset description
      9.6 Overview
    10 Clustering data into groups
      10.1 Using centrality to discover clusters
      10.2 K-means: A clustering algorithm for grouping data into K central groups
        K-means clustering using scikit-learn
        Selecting the optimal K using the elbow method
      10.3 Using density to discover clusters
      10.4 DBSCAN: A clustering algorithm for grouping data based on spatial density
        Comparing DBSCAN and K-means
        Clustering based on non-Euclidean distance
      10.5 Analyzing clusters using Pandas
    11 Geographic location visualization and analysis
      11.1 The great-circle distance: A metric for computing the distance between two global points
      11.2 Plotting maps using Cartopy
        Manually installing GEOS and Cartopy
        Utilizing the Conda package manager
        Visualizing maps
      11.3 Location tracking using GeoNamesCache
        Accessing country information
        Accessing city information
        Limitations of the GeoNamesCache library
      11.4 Matching location names in text
    12 Case study 3 solution
      12.1 Extracting locations from headline data
      12.2 Visualizing and clustering the extracted location data
      12.3 Extracting insights from location clusters
  Case study 4: Using online job postings to improve your data science resume
      12.4 Problem statement
      Dataset description
      12.5 Overview
    13 Measuring text similarities
      13.1 Simple text comparison
        Exploring the Jaccard similarity
        Replacing words with numeric values
      13.2 Vectorizing texts using word counts
        Using normalization to improve TF vector similarity
        Using unit vector dot products to convert between relevance metrics
      13.3 Matrix multiplication for efficient similarity calculation
        Basic matrix operations
        Computing all-by-all matrix similarities
      13.4 Computational limits of matrix multiplication
    14 Dimension reduction of matrix data
      14.1 Clustering 2D data in one dimension
        Reducing dimensions using rotation
      14.2 Dimension reduction using PCA and scikit-learn
      14.3 Clustering 4D data in two dimensions
        Limitations of PCA
      14.4 Computing principal components without rotation
        Extracting eigenvectors using power iteration
      14.5 Efficient dimension reduction using SVD and scikit-learn
    15 NLP analysis of large text datasets
      15.1 Loading online forum discussions using scikit-learn
      15.2 Vectorizing documents using scikit-learn
      15.3 Ranking words by both post frequency and count
        Computing TFIDF vectors with scikit-learn
      15.4 Computing similarities across large document datasets
      15.5 Clustering texts by topic
        Exploring a single text cluster
      15.6 Visualizing text clusters
        Using subplots to display multiple word clouds
    16 Extracting text from web pages
      16.1 The structure of HTML documents
      16.2 Parsing HTML using Beautiful Soup
      16.3 Downloading and parsing online data
    17 Case study 4 solution
      17.1 Extracting skill requirements from job posting data
        Exploring the HTML for skill descriptions
      17.2 Filtering jobs by relevance
      17.3 Clustering skills in relevant job postings
        Grouping the job skills into 15 clusters
        Investigating the technical skill clusters
        Investigating the soft-skill clusters
        Exploring clusters at alternative values of K
        Analyzing the 700 most relevant postings
      17.4 Conclusion
  Case study 5: Predicting future friendships from social network data
      17.5 Problem statement
        Introducing the friend-of-a-friend recommendation algorithm
        Predicting user behavior
      17.6 Dataset description
        The Profiles table
        The Observations table
        The Friendships table
      17.7 Overview
    18 An introduction to graph theory and network analysis
      18.1 Using basic graph theory to rank websites by popularity
        Analyzing web networks using NetworkX
      18.2 Utilizing undirected graphs to optimize the travel time between towns
        Modeling a complex network of towns and counties
        Computing the fastest travel time between nodes
    19 Dynamic graph theory techniques for node ranking and social network analysis
      19.1 Uncovering central nodes based on expected traffic in a network
        Measuring centrality using traffic simulations
      19.2 Computing travel probabilities using matrix multiplication
        Deriving PageRank centrality from probability theory
        Computing PageRank centrality using NetworkX
      19.3 Community detection using Markov clustering
      19.4 Uncovering friend groups in social networks
    20 Network-driven supervised machine learning
      20.1 The basics of supervised machine learning
      20.2 Measuring predicted label accuracy
        Scikit-learn's prediction measurement functions
      20.3 Optimizing KNN performance
      20.4 Running a grid search using scikit-learn
      20.5 Limitations of the KNN algorithm
    21 Training linear classifiers with logistic regression
      21.1 Linearly separating customers by size
      21.2 Training a linear classifier
        Improving perceptron performance through standardization
      21.3 Improving linear classification with logistic regression
        Running logistic regression on more than two features
      21.4 Training linear classifiers using scikit-learn
        Training multiclass linear models
      21.5 Measuring feature importance with coefficients
      21.6 Linear classifier limitations
    22 Training nonlinear classifiers with decision tree techniques
      22.1 Automated learning of logical rules
        Training a nested if/else model using two features
        Deciding which feature to split on
        Training if/else models with more than two features
      22.2 Training decision tree classifiers using scikit-learn
        Studying cancerous cells using feature importance
      22.3 Decision tree classifier limitations
      22.4 Improving performance using random forest classification
      22.5 Training random forest classifiers using scikit-learn
    23 Case study 5 solution
      23.1 Exploring the data
        Examining the profiles
        Exploring the experimental observations
        Exploring the Friendships linkage table
      23.2 Training a predictive model using network features
      23.3 Adding profile features to the model
      23.4 Optimizing performance across a steady set of features
      23.5 Interpreting the trained model
        Why are generalizable models so important?
  index
any_adam_object	1
any_adam_object_boolean	1
author	Apeltsin, Leonard ca. 20./21. Jh
author_GND	(DE-588)1280840129
author_facet	Apeltsin, Leonard ca. 20./21. Jh
author_role	aut
author_sort	Apeltsin, Leonard ca. 20./21. Jh
author_variant	l a la
building	Verbundindex
bvnumber	BV048630066
classification_rvk	ST 300
ctrlnum	(OCoLC)1294301294 (DE-599)KXP178714853X
dewey-full	006.312
dewey-hundreds	000 - Computer science, information, general works
dewey-ones	006 - Special computer methods
dewey-raw	006.312
dewey-search	006.312
dewey-sort	16.312
dewey-tens	000 - Computer science, information, general works
discipline	Computer science
discipline_str_mv	Computer science
format	Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02944nam a2200469 c 4500</leader><controlfield tag="001">BV048630066</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20230213 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">230104s2021 xxua\|\|\| \|\|\|\| 00\|\|\| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781617296253</subfield><subfield code="c">softcover</subfield><subfield code="9">978-1-61729-625-3</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1294301294</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)KXP178714853X</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">XD-US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.312</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 300</subfield><subfield code="0">(DE-625)143650:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Apeltsin, Leonard</subfield><subfield code="d">ca. 20./21. 
Jh.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1280840129</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data science bookcamp</subfield><subfield code="b">five Python projects</subfield><subfield code="c">Leonard Apeltsin</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Shelter Island</subfield><subfield code="b">Manning</subfield><subfield code="c">[2021]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxvi, 676 Seiten</subfield><subfield code="b">Diagramme, Illustrationen</subfield><subfield code="c">24 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Untertitel auf Cover: Five real-world Python projects</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturangaben</subfield></datafield><datafield tag="520" ind1="3" ind2=" "><subfield code="a">1. Computing probabilities using Python -- 2. Plotting probabilities using Matplotlib -- 3. Running random simulations in NumPy -- 4. Case study 1 solution -- 5. Basic probability and statistical analysis using SciPy -- 6. Making predictions using the central limit theorem and SciPy -- 7. Statistical hypothesis testing -- 8. Analyzing tables using Pandas -- 9. Case study 2 solution -- 10. Clustering data into groups -- 11. Geographic location visualization and analysis -- 12. Case study 3 solution -- 13. Measuring text similarities -- 14. Dimension reduction of matrix data -- 15. NLP analysis of large text datasets -- 16. 
Extracting text from web pages -- 17. Case study 4 solution -- 18. An introduction to graph theory and network analysis -- 19. Dynamic graph theory techniques for node ranking and social network analysis -- 20. Network-driven supervised machine learning -- 21. Training linear classifiers with logistic regression -- 22. Training nonlinear classifiers with decision tree techniques -- 23. Case study 5 solution.</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Python</subfield><subfield code="g">Programmiersprache</subfield><subfield code="0">(DE-588)4434275-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Data mining</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Data sets</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Python (Computer program language)</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Science</subfield><subfield code="0">(DE-588)1140936166</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Python</subfield><subfield code="g">Programmiersprache</subfield><subfield code="0">(DE-588)4434275-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" 
"><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://www.gbv.de/dms/bowker/toc/9781617296253.pdf</subfield><subfield code="v">2022-07-28</subfield><subfield code="x">Aggregator</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034005124&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-034005124</subfield></datafield></record></collection>
id	DE-604.BV048630066
illustrated	Illustrated
index_date	2024-07-03T21:15:51Z
indexdate	2024-07-10T09:44:29Z
institution	BVB
isbn	9781617296253
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-034005124
oclc_num	1294301294
open_access_boolean
owner	DE-739
owner_facet	DE-739
physical	xxvi, 676 Seiten Diagramme, Illustrationen 24 cm
publishDate	2021
publishDateSearch	2021
publishDateSort	2021
publisher	Manning
record_format	marc
spelling	Apeltsin, Leonard ca. 20./21. Jh. Verfasser (DE-588)1280840129 aut Data science bookcamp five Python projects Leonard Apeltsin Shelter Island Manning [2021] xxvi, 676 Seiten Diagramme, Illustrationen 24 cm txt rdacontent n rdamedia nc rdacarrier Untertitel auf Cover: Five real-world Python projects Literaturangaben 1. Computing probabilities using Python -- 2. Plotting probabilities using Matplotlib -- 3. Running random simulations in NumPy -- 4. Case study 1 solution -- 5. Basic probability and statistical analysis using SciPy -- 6. Making predictions using the central limit theorem and SciPy -- 7. Statistical hypothesis testing -- 8. Analyzing tables using Pandas -- 9. Case study 2 solution -- 10. Clustering data into groups -- 11. Geographic location visualization and analysis -- 12. Case study 3 solution -- 13. Measuring text similarities -- 14. Dimension reduction of matrix data -- 15. NLP analysis of large text datasets -- 16. Extracting text from web pages -- 17. Case study 4 solution -- 18. An introduction to graph theory and network analysis -- 19. Dynamic graph theory techniques for node ranking and social network analysis -- 20. Network-driven supervised machine learning -- 21. Training linear classifiers with logistic regression -- 22. Training nonlinear classifiers with decision tree techniques -- 23. Case study 5 solution. 
Data Science (DE-588)1140936166 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf Python Programmiersprache (DE-588)4434275-5 gnd rswk-swf Data mining Data sets Python (Computer program language) Data Science (DE-588)1140936166 s Data Mining (DE-588)4428654-5 s Python Programmiersprache (DE-588)4434275-5 s DE-604 https://www.gbv.de/dms/bowker/toc/9781617296253.pdf 2022-07-28 Aggregator Inhaltsverzeichnis Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034005124&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle	Apeltsin, Leonard ca. 20./21. Jh Data science bookcamp five Python projects Data Science (DE-588)1140936166 gnd Data Mining (DE-588)4428654-5 gnd Python Programmiersprache (DE-588)4434275-5 gnd
subject_GND	(DE-588)1140936166 (DE-588)4428654-5 (DE-588)4434275-5
title	Data science bookcamp five Python projects
title_auth	Data science bookcamp five Python projects
title_exact_search	Data science bookcamp five Python projects
title_exact_search_txtP	Data science bookcamp five Python projects
title_full	Data science bookcamp five Python projects Leonard Apeltsin
title_fullStr	Data science bookcamp five Python projects Leonard Apeltsin
title_full_unstemmed	Data science bookcamp five Python projects Leonard Apeltsin
title_short	Data science bookcamp
title_sort	data science bookcamp five python projects
title_sub	five Python projects
topic	Data Science (DE-588)1140936166 gnd Data Mining (DE-588)4428654-5 gnd Python Programmiersprache (DE-588)4434275-5 gnd
topic_facet	Data Science Data Mining Python Programmiersprache
url	https://www.gbv.de/dms/bowker/toc/9781617296253.pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034005124&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
work_keys_str_mv	AT apeltsinleonard datasciencebookcampfivepythonprojects

Availability

No print copy is available.

Order via interlibrary loan. Note: not held by THWS.



Similar items