Data analysis with open source tools: [a hands-on guide for programmers and data scientists]
Main Author: | Janert, Philipp K. |
---|---|
Format: | Book |
Language: | English |
Published: | Beijing [et al.]: O'Reilly, 2011 |
Subjects: | Open Source, Datenanalyse (data analysis) |
Online Access: | Summary text, Table of contents |
Physical Description: | XVIII, 509 pages, illustrations, diagrams |
ISBN: | 9780596802356 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV036867489 | ||
003 | DE-604 | ||
005 | 20221013 | ||
007 | t | ||
008 | 101215s2011 ad|| |||| 00||| eng d | ||
015 | |a 10,N25 |2 dnb | ||
016 | 7 | |a 1003634710 |2 DE-101 | |
020 | |a 9780596802356 |c PB. : EUR 38.00 (freier Pr.) |9 978-0-596-80235-6 | ||
024 | 3 | |a 9780596802356 | |
035 | |a (OCoLC)845786048 | ||
035 | |a (DE-599)DNB1003634710 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-824 |a DE-11 |a DE-2070s |a DE-898 |a DE-188 |a DE-83 |a DE-473 |a DE-Aug4 |a DE-860 |a DE-20 |a DE-29T |a DE-945 | ||
084 | |a MR 2200 |0 (DE-625)123489: |2 rvk | ||
084 | |a ST 515 |0 (DE-625)143677: |2 rvk | ||
084 | |a ST 600 |0 (DE-625)143681: |2 rvk | ||
084 | |a 004 |2 sdnb | ||
100 | 1 | |a Janert, Philipp K. |e Verfasser |0 (DE-588)1028497148 |4 aut | |
245 | 1 | 0 | |a Data analysis with open source tools |b [a hands-on guide for programmers and data scientists] |c Philipp K. Janert |
264 | 1 | |a Beijing [u.a.] |b O'Reilly |c 2011 | |
300 | |a XVIII, 509 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Open Source |0 (DE-588)4548264-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenanalyse |0 (DE-588)4123037-1 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Datenanalyse |0 (DE-588)4123037-1 |D s |
689 | 0 | 1 | |a Open Source |0 (DE-588)4548264-0 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |q text/html |u http://deposit.dnb.de/cgi-bin/dokserv?id=3493910&prov=M&dok_var=1&dok_ext=htm |3 Inhaltstext |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020783117&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-020783117 |
Record in the search index
_version_ | 1805095107216539648 |
---|---|
adam_text |
Title: Data analysis with open source tools
Author: Janert, Philipp K.
Year: 2011
CONTENTS
PREFACE
1 INTRODUCTION
Data Analysis
What's in This Book
What's with the Workshops?
What's with the Math?
What You'll Need
What's Missing
PART I Graphics: Looking at Data
2 A SINGLE VARIABLE: SHAPE AND DISTRIBUTION
Dot and Jitter Plots
Histograms and Kernel Density Estimates
The Cumulative Distribution Function
Rank-Order Plots and Lift Charts
Only When Appropriate: Summary Statistics and Box Plots
Workshop: NumPy
Further Reading
3 TWO VARIABLES: ESTABLISHING RELATIONSHIPS
Scatter Plots
Conquering Noise: Smoothing
Logarithmic Plots
Banking
Linear Regression and All That
Showing What's Important
Graphical Analysis and Presentation Graphics
Workshop: matplotlib
Further Reading
4 TIME AS A VARIABLE: TIME-SERIES ANALYSIS
Examples
The Task
Smoothing
Don't Overlook the Obvious!
The Correlation Function
Optional: Filters and Convolutions
Workshop: scipy.signal
Further Reading
5 MORE THAN TWO VARIABLES: GRAPHICAL MULTIVARIATE ANALYSIS
False-Color Plots
A Lot at a Glance: Multiplots
Composition Problems
Novel Plot Types
Interactive Explorations
Workshop: Tools for Multivariate Graphics
Further Reading
6 INTERMEZZO: A DATA ANALYSIS SESSION
A Data Analysis Session
Workshop: gnuplot
Further Reading
PART II Analytics: Modeling Data
7 GUESSTIMATION AND THE BACK OF THE ENVELOPE
Principles of Guesstimation
How Good Are Those Numbers?
Optional: A Closer Look at Perturbation Theory and Error Propagation
Workshop: The Gnu Scientific Library (GSL)
Further Reading
8 MODELS FROM SCALING ARGUMENTS
Models
Arguments from Scale
Mean-Field Approximations
Common Time-Evolution Scenarios
Case Study: How Many Servers Are Best?
Why Modeling?
Workshop: Sage
Further Reading
9 ARGUMENTS FROM PROBABILITY MODELS
The Binomial Distribution and Bernoulli Trials
The Gaussian Distribution and the Central Limit Theorem
Power-Law Distributions and Non-Normal Statistics
Other Distributions
Optional: Case Study: Unique Visitors over Time
Workshop: Power-Law Distributions
Further Reading
10 WHAT YOU REALLY NEED TO KNOW ABOUT CLASSICAL STATISTICS
Genesis
Statistics Defined
Statistics Explained
Controlled Experiments Versus Observational Studies
Optional: Bayesian Statistics: The Other Point of View
Workshop: R
Further Reading
11 INTERMEZZO: MYTHBUSTING: BIGFOOT, LEAST SQUARES, AND ALL THAT
How to Average Averages
The Standard Deviation
Least Squares
Further Reading
PART III Computation: Mining Data
12 SIMULATIONS
A Warm-Up Question
Monte Carlo Simulations
Resampling Methods
Workshop: Discrete Event Simulations with SimPy
Further Reading
13 FINDING CLUSTERS
What Constitutes a Cluster?
Distance and Similarity Measures
Clustering Methods
Pre- and Postprocessing
Other Thoughts
A Special Case: Market Basket Analysis
A Word of Warning
Workshop: Pycluster and the C Clustering Library
Further Reading
14 SEEING THE FOREST FOR THE TREES: FINDING IMPORTANT ATTRIBUTES
Principal Component Analysis
Visual Techniques
Kohonen Maps
Workshop: PCA with R
Further Reading
15 INTERMEZZO: WHEN MORE IS DIFFERENT
A Horror Story
Some Suggestions
What About Map/Reduce?
Workshop: Generating Permutations
Further Reading
PART IV Applications: Using Data
16 REPORTING, BUSINESS INTELLIGENCE, AND DASHBOARDS
Business Intelligence
Corporate Metrics and Dashboards
Data Quality Issues
Workshop: Berkeley DB and SQLite
Further Reading
17 FINANCIAL CALCULATIONS AND MODELING
The Time Value of Money
Uncertainty in Planning and Opportunity Costs
Cost Concepts and Depreciation
Should You Care?
Is This All That Matters?
Workshop: The Newsvendor Problem
Further Reading
18 PREDICTIVE ANALYTICS
Introduction
Some Classification Terminology
Algorithms for Classification
The Process
The Secret Sauce
The Nature of Statistical Learning
Workshop: Two Do-It-Yourself Classifiers
Further Reading
19 EPILOGUE: FACTS ARE NOT REALITY
A PROGRAMMING ENVIRONMENTS FOR SCIENTIFIC COMPUTATION AND DATA ANALYSIS
Software Tools
A Catalog of Scientific Software
Writing Your Own
Further Reading
B RESULTS FROM CALCULUS
Common Functions
Calculus
Useful Tricks
Notation and Basic Math
Where to Go from Here
Further Reading
C WORKING WITH DATA
Sources for Data
Cleaning and Conditioning
Sampling
Data File Formats
The Care and Feeding of Your Data Zoo
Skills
Terminology
Further Reading
INDEX |
any_adam_object | 1 |
author | Janert, Philipp K. |
author_GND | (DE-588)1028497148 |
author_facet | Janert, Philipp K. |
author_role | aut |
author_sort | Janert, Philipp K. |
author_variant | p k j pk pkj |
building | Verbundindex |
bvnumber | BV036867489 |
classification_rvk | MR 2200 ST 515 ST 600 |
ctrlnum | (OCoLC)845786048 (DE-599)DNB1003634710 |
discipline | Informatik Soziologie |
format | Book |
id | DE-604.BV036867489 |
illustrated | Illustrated |
indexdate | 2024-07-20T10:54:00Z |
institution | BVB |
isbn | 9780596802356 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-020783117 |
oclc_num | 845786048 |
open_access_boolean | |
owner | DE-824 DE-11 DE-2070s DE-898 DE-BY-UBR DE-188 DE-83 DE-473 DE-BY-UBG DE-Aug4 DE-860 DE-20 DE-29T DE-945 |
owner_facet | DE-824 DE-11 DE-2070s DE-898 DE-BY-UBR DE-188 DE-83 DE-473 DE-BY-UBG DE-Aug4 DE-860 DE-20 DE-29T DE-945 |
physical | XVIII, 509 S. Ill., graph. Darst. |
publishDate | 2011 |
publishDateSearch | 2011 |
publishDateSort | 2011 |
publisher | O'Reilly |
record_format | marc |
spelling | Janert, Philipp K. Verfasser (DE-588)1028497148 aut Data analysis with open source tools [a hands-on guide for programmers and data scientists] Philipp K. Janert Beijing [u.a.] O'Reilly 2011 XVIII, 509 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Open Source (DE-588)4548264-0 gnd rswk-swf Datenanalyse (DE-588)4123037-1 gnd rswk-swf Datenanalyse (DE-588)4123037-1 s Open Source (DE-588)4548264-0 s DE-604 text/html http://deposit.dnb.de/cgi-bin/dokserv?id=3493910&prov=M&dok_var=1&dok_ext=htm Inhaltstext HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020783117&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Janert, Philipp K. Data analysis with open source tools [a hands-on guide for programmers and data scientists] Open Source (DE-588)4548264-0 gnd Datenanalyse (DE-588)4123037-1 gnd |
subject_GND | (DE-588)4548264-0 (DE-588)4123037-1 |
title | Data analysis with open source tools [a hands-on guide for programmers and data scientists] |
title_auth | Data analysis with open source tools [a hands-on guide for programmers and data scientists] |
title_exact_search | Data analysis with open source tools [a hands-on guide for programmers and data scientists] |
title_full | Data analysis with open source tools [a hands-on guide for programmers and data scientists] Philipp K. Janert |
title_fullStr | Data analysis with open source tools [a hands-on guide for programmers and data scientists] Philipp K. Janert |
title_full_unstemmed | Data analysis with open source tools [a hands-on guide for programmers and data scientists] Philipp K. Janert |
title_short | Data analysis with open source tools |
title_sort | data analysis with open source tools a hands on guide for programmers and data scientists |
title_sub | [a hands-on guide for programmers and data scientists] |
topic | Open Source (DE-588)4548264-0 gnd Datenanalyse (DE-588)4123037-1 gnd |
topic_facet | Open Source Datenanalyse |
url | http://deposit.dnb.de/cgi-bin/dokserv?id=3493910&prov=M&dok_var=1&dok_ext=htm http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020783117&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT janertphilippk dataanalysiswithopensourcetoolsahandsonguideforprogrammersanddatascientists |