Data analysis for the life sciences with R
Main authors: | Irizarry, Rafael A., Love, Michael I. |
---|---|
Format: | Book |
Language: | English |
Published: |
Boca Raton ; London ; New York
CRC Press, Taylor & Francis Group
[2017]
|
Series: | A Chapman & Hall book
|
Subjects: | Datenanalyse, Biowissenschaften, R (Programm) |
Online access: | Table of contents, Blurb |
Description: | xxi, 353 pages, illustrations, diagrams |
ISBN: | 9781138407206 9781498775670 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV043954947 | ||
003 | DE-604 | ||
005 | 20200708 | ||
007 | t | ||
008 | 161212s2017 a||| |||| 00||| eng d | ||
020 | |a 9781138407206 |c hbk |9 978-1-138-40720-6 | ||
020 | |a 9781498775670 |9 978-1-4987-7567-0 | ||
035 | |a (OCoLC)969648805 | ||
035 | |a (DE-599)OBVAC13295934 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
049 | |a DE-20 |a DE-355 |a DE-1043 | ||
084 | |a ST 250 |0 (DE-625)143626: |2 rvk | ||
084 | |a ST 601 |0 (DE-625)143682: |2 rvk | ||
084 | |a WC 7700 |0 (DE-625)148144: |2 rvk | ||
100 | 1 | |a Irizarry, Rafael A. |e Verfasser |4 aut | |
245 | 1 | 0 | |a Data analysis for the life sciences with R |c Rafael A. Irizarry, Michael I. Love |
264 | 1 | |a Boca Raton ; London ; New York |b CRC Press, Taylor & Francis Group |c [2017] | |
300 | |a xxi, 353 Seiten |b Illustrationen, Diagramme | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a A Chapman & Hall book | |
650 | 0 | 7 | |a Datenanalyse |0 (DE-588)4123037-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Biowissenschaften |0 (DE-588)4129772-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a R |g Programm |0 (DE-588)4705956-4 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Datenanalyse |0 (DE-588)4123037-1 |D s |
689 | 0 | 1 | |a Biowissenschaften |0 (DE-588)4129772-6 |D s |
689 | 0 | 2 | |a R |g Programm |0 (DE-588)4705956-4 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Love, Michael I. |e Verfasser |0 (DE-588)1043111328 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029363726&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029363726&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
999 | |a oai:aleph.bib-bvb.de:BVB01-029363726 |
Record in search index
_version_ | 1804176907378360320 |
---|---|
adam_text | Contents
List of Figures ix
Acknowledgments xvii
Introduction xix
1 Getting Started 1
1.1 Installing R.............................................................. 1
1.2 Installing RStudio ....................................................... 1
1.3 Learn R Basics ........................................................... 1
1.4 Installing Packages....................................................... 2
1.5 Importing Data into R..................................................... 3
1.6 Exercises ................................................................ 5
1.7 Brief Introduction to dplyr .............................................. 6
1.8 Exercises ................................................................ 8
1.9 Mathematical Notation ......................................... 8
2 Inference 13
2.1 Introduction............................................................. 13
2.2 Random Variables ........................................................ 14
2.3 The Null Hypothesis...................................................... 15
2.4 Distributions ........................................................... 16
2.5 Probability Distribution ................................................ 17
2.6 Normal Distribution ..................................................... 19
2.7 Exercises ............................................................... 21
2.8 Populations, Samples and Estimates ...................................... 22
2.9 Exercises ............................................................... 23
2.10 Central Limit Theorem and t-distribution .............................. 24
2.11 Exercises ............................................................. 28
2.12 Central Limit Theorem in Practice.............................. 30
2.13 Exercises ............................................................... 33
2.14 t-tests in Practice...................................................... 36
2.15 The t-distribution in Practice........................................... 37
2.16 Confidence Intervals .................................................... 39
2.17 Power Calculations....................................................... 45
2.18 Exercises .............................................................. 51
2.19 Monte Carlo Simulation ................................................ 54
2.20 Parametric Simulations for the Observations ............................. 58
2.21 Exercises ............................................................. 58
2.22 Permutation Tests ...................................................... 60
2.23 Exercises ............................................................... 62
2.24 Association Tests........................................................ 63
2.25 Exercises ............................................................ 67
3 Exploratory Data Analysis 69
3.1 Quantile Quantile Plots ................................................. 69
3.2 Boxplots................................................................. 72
3.3 Scatterplots and Correlation............................................. 74
3.4 Stratification ......................................................... 74
3.5 Bivariate Normal Distribution............................................ 75
3.6 Plots to Avoid ........................................................ 78
3.7 Misunderstanding Correlation (Advanced).................................. 91
3.8 Exercises ............................................................ 93
3.9 Robust Summaries......................................................... 94
3.10 Wilcoxon Rank Sum Test ................................................ 99
3.11 Exercises ........................................................... 100
4 Matrix Algebra 103
4.1 Motivating Examples..................................................... 103
4.2 Exercises .............................................................. 109
4.3 Matrix Notation ...................................................... 110
4.4 Solving Systems of Equations.......................................... 110
4.5 Vectors, Matrices, and Scalars.......................................... 111
4.6 Exercises .............................................................. 113
4.7 Matrix Operations .................................................... 113
4.8 Exercises .............................................................. 117
4.9 Examples ............................................................... 118
4.10 Exercises .............................................................. 122
5 Linear Models 125
5.1 Exercises .............................................................. 125
5.2 The Design Matrix..................................................... 127
5.3 Exercises .............................................................. 134
5.4 The Mathematics Behind lm() ........................................... 135
5.5 Exercises ............................................................ 137
5.6 Standard Errors ........................................................ 139
5.7 Exercises .............................................................. 145
5.8 Interactions and Contrasts.............................................. 146
5.9 Linear Model with Interactions ......................................... 156
5.10 Analysis of Variance ................................................. 160
5.11 Exercises .............................................................. 166
5.12 Collinearity............................................................ 168
5.13 Rank.................................................................... 169
5.14 Removing Confounding ............................................... 170
5.15 Exercises .............................................................. 170
5.16 The QR Factorization (Advanced) ........................................ 172
5.17 Going Further........................................................... 175
6 Inference for High Dimensional Data 177
6.1 Introduction............................................................ 177
6.2 Exercises .............................................................. 179
6.3 Inference in Practice................................................... 180
6.4 Exercises .............................................................. 183
6.5 Procedures ............................................................ 184
6.6 Error Rates............................................................ 184
6.7 The Bonferroni Correction.............................................. 187
6.8 False Discovery Rate................................................... 189
6.9 Direct Approach to FDR and q-values (Advanced) ........................ 195
6.10 Exercises ............................................................. 198
6.11 Basic Exploratory Data Analysis ....................................... 201
6.12 Exercises ............................................................. 206
7 Statistical Models 209
7.1 The Binomial Distribution............................................... 209
7.2 The Poisson Distribution ............................................... 210
7.3 Maximum Likelihood Estimation .......................................... 213
7.4 Distributions for Positive Continuous Values ........................... 215
7.5 Exercises .............................................................. 220
7.6 Bayesian Statistics .................................................... 224
7.7 Exercises .............................................................. 229
7.8 Hierarchical Models .................................................... 230
7.9 Exercises .............................................................. 234
8 Distance and Dimension Reduction 237
8.1 Introduction............................................................ 237
8.2 Euclidean Distance...................................................... 237
8.3 Distance in High Dimensions ............................................ 239
8.4 Exercises .............................................................. 241
8.5 Dimension Reduction Motivation.......................................... 241
8.6 Singular Value Decomposition............................................ 246
8.7 Exercises .............................................................. 252
8.8 Projections ............................................................ 254
8.9 Rotations .............................................................. 258
8.10 Multi-Dimensional Scaling Plots ....................................... 261
8.11 Exercises .............................................................. 266
8.12 Principal Component Analysis ......................................... 267
9 Basic Machine Learning 273
9.1 Clustering.............................................................. 273
9.2 Exercises .............................................................. 279
9.3 Conditional Probabilities and Expectations ............................. 281
9.4 Exercises .............................................................. 283
9.5 Smoothing............................................................... 285
9.6 Bin Smoothing .......................................................... 286
9.7 Loess ................................................................ 288
9.8 Exercises .............................................................. 290
9.9 Class Prediction ....................................................... 291
9.10 Cross-validation ....................................................... 297
9.11 Exercises .............................................................. 302
10 Batch Effects 305
10.1 Confounding ............................................................. 307
10.2 Confounding: High-Throughput Example .................................... 311
10.3 Exercises ............................................................... 312
10.4 Discovering Batch Effects with EDA ...................................... 313
10.5 Gene Expression Data..................................................... 314
10.6 Exercises ............................................................... 321
10.7 Motivation for Statistical Approaches ................................ 323
10.8 Adjusting for Batch Effects with Linear Models........................... 325
10.9 Exercises ............................................................... 329
10.10 Factor Analysis......................................................... 330
10.11 Exercises .............................................................. 333
10.12 Modeling Batch Effects with Factor Analysis ............................ 335
10.13 Exercises .............................................................. 342
Index 343
Genomics is being driven by new measurement technologies that permit us to
observe certain molecular entities for the first time. These observations are
leading to discoveries analogous to identifying microorganisms and other
breakthroughs permitted by the invention of the microscope. Choice examples of
these technologies are next-generation sequencing and microarrays. This book was
written for the many life science researchers who are becoming data analysts due
to the emergence of these new types of data.
Although the content of the book is mostly focused on advanced statistical
concepts, the basics are covered to make sure readers have a strong grounding in
the fundamental statistical concepts required for all data analysis. The book
begins with statistical inference and then proceeds to an introduction to linear
models and matrix algebra, high-dimensional data, distance and dimension
reduction, and batch effects and factor analysis.
The emphasis is on using a computer to perform data analysis. All sections of
this book are reproducible, as they were made with R Markdown documents that
include the code used to produce the book’s figures, tables and results.
Rafael A. Irizarry is Professor of Applied Statistics at the Dana-Farber Cancer
Center and Harvard School of Public Health. In 2009 he was awarded the
Presidents' Award by the Committee of Presidents of Statistical Societies (COPSS).
His work has been highly cited and his open source software tools widely
downloaded.
Michael I. Love is a Postdoctoral Fellow at Harvard School of Public Health. He
received his Ph.D. in computational biology in 2013 from the Freie Universität
Berlin.
Professors Irizarry and Love have taught seven computational biology courses on
edX to hundreds of thousands of students.
|
any_adam_object | 1 |
author | Irizarry, Rafael A. Love, Michael I. |
author_GND | (DE-588)1043111328 |
author_facet | Irizarry, Rafael A. Love, Michael I. |
author_role | aut aut |
author_sort | Irizarry, Rafael A. |
author_variant | r a i ra rai m i l mi mil |
building | Verbundindex |
bvnumber | BV043954947 |
classification_rvk | ST 250 ST 601 WC 7700 |
ctrlnum | (OCoLC)969648805 (DE-599)OBVAC13295934 |
discipline | Biologie Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02053nam a2200433 c 4500</leader><controlfield tag="001">BV043954947</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20200708 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">161212s2017 a||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781138407206</subfield><subfield code="c">hbk</subfield><subfield code="9">978-1-138-40720-6</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781498775670</subfield><subfield code="9">978-1-4987-7567-0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)969648805</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)OBVAC13295934</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-20</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-1043</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 250</subfield><subfield code="0">(DE-625)143626:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 601</subfield><subfield code="0">(DE-625)143682:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">WC 7700</subfield><subfield code="0">(DE-625)148144:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Irizarry, Rafael A.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield 
tag="245" ind1="1" ind2="0"><subfield code="a">Data analysis for the life sciences with R</subfield><subfield code="c">Rafael A. Irizarry, Michael I. Love</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton ; London ; New York</subfield><subfield code="b">CRC Press, Taylor & Francis Group</subfield><subfield code="c">[2017]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxi, 353 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">A Chapman & Hall book</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenanalyse</subfield><subfield code="0">(DE-588)4123037-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Biowissenschaften</subfield><subfield code="0">(DE-588)4129772-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">R</subfield><subfield code="g">Programm</subfield><subfield code="0">(DE-588)4705956-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Datenanalyse</subfield><subfield code="0">(DE-588)4123037-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Biowissenschaften</subfield><subfield code="0">(DE-588)4129772-6</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">R</subfield><subfield code="g">Programm</subfield><subfield code="0">(DE-588)4705956-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Love, Michael I.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1043111328</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029363726&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029363726&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-029363726</subfield></datafield></record></collection> |
id | DE-604.BV043954947 |
illustrated | Illustrated |
indexdate | 2024-07-10T07:39:38Z |
institution | BVB |
isbn | 9781138407206 9781498775670 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-029363726 |
oclc_num | 969648805 |
open_access_boolean | |
owner | DE-20 DE-355 DE-BY-UBR DE-1043 |
owner_facet | DE-20 DE-355 DE-BY-UBR DE-1043 |
physical | xxi, 353 Seiten Illustrationen, Diagramme |
publishDate | 2017 |
publishDateSearch | 2017 |
publishDateSort | 2017 |
publisher | CRC Press, Taylor & Francis Group |
record_format | marc |
series2 | A Chapman & Hall book |
spelling | Irizarry, Rafael A. Verfasser aut Data analysis for the life sciences with R Rafael A. Irizarry, Michael I. Love Boca Raton ; London ; New York CRC Press, Taylor & Francis Group [2017] xxi, 353 Seiten Illustrationen, Diagramme txt rdacontent n rdamedia nc rdacarrier A Chapman & Hall book Datenanalyse (DE-588)4123037-1 gnd rswk-swf Biowissenschaften (DE-588)4129772-6 gnd rswk-swf R Programm (DE-588)4705956-4 gnd rswk-swf Datenanalyse (DE-588)4123037-1 s Biowissenschaften (DE-588)4129772-6 s R Programm (DE-588)4705956-4 s DE-604 Love, Michael I. Verfasser (DE-588)1043111328 aut Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029363726&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029363726&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext |
spellingShingle | Irizarry, Rafael A. Love, Michael I. Data analysis for the life sciences with R Datenanalyse (DE-588)4123037-1 gnd Biowissenschaften (DE-588)4129772-6 gnd R Programm (DE-588)4705956-4 gnd |
subject_GND | (DE-588)4123037-1 (DE-588)4129772-6 (DE-588)4705956-4 |
title | Data analysis for the life sciences with R |
title_auth | Data analysis for the life sciences with R |
title_exact_search | Data analysis for the life sciences with R |
title_full | Data analysis for the life sciences with R Rafael A. Irizarry, Michael I. Love |
title_fullStr | Data analysis for the life sciences with R Rafael A. Irizarry, Michael I. Love |
title_full_unstemmed | Data analysis for the life sciences with R Rafael A. Irizarry, Michael I. Love |
title_short | Data analysis for the life sciences with R |
title_sort | data analysis for the life sciences with r |
topic | Datenanalyse (DE-588)4123037-1 gnd Biowissenschaften (DE-588)4129772-6 gnd R Programm (DE-588)4705956-4 gnd |
topic_facet | Datenanalyse Biowissenschaften R Programm |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029363726&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029363726&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT irizarryrafaela dataanalysisforthelifescienceswithr AT lovemichaeli dataanalysisforthelifescienceswithr |