Deep learning: foundations and concepts
Main authors: | Bishop, Christopher M. 1959-; Bishop, Hugh |
Format: | Book |
Language: | English |
Published: | Cham, Switzerland: Springer, [2024] |
Subjects: | Deep learning |
Online access: | Table of contents |
Description: | xx, 649 pages, illustrations, diagrams, 1444 g |
ISBN: | 9783031454677 |
Internal format
MARC
LEADER 00000nam a2200000 c 4500
001    BV049467217
003    DE-604
005    20240814
007    t
008    231214s2024 a||| |||| 00||| eng d
020 __ |a 9783031454677 |9 978-3-031-45467-7
024 3_ |a 9783031454677
035 __ |a (OCoLC)1414165990
035 __ |a (DE-599)BVBBV049467217
040 __ |a DE-604 |b ger |e rda
041 0_ |a eng
049 __ |a DE-473 |a DE-898 |a DE-1050 |a DE-29T |a DE-1051 |a DE-863 |a DE-Po75 |a DE-1102 |a DE-384 |a DE-523 |a DE-91G |a DE-2070s |a DE-188 |a DE-B768 |a DE-634 |a DE-703
084 __ |a ST 300 |0 (DE-625)143650: |2 rvk
084 __ |a ST 301 |0 (DE-625)143651: |2 rvk
084 __ |a DAT 708 |2 stub
100 1_ |a Bishop, Christopher M. |d 1959- |e Verfasser |0 (DE-588)120454165 |4 aut
245 10 |a Deep learning |b foundations and concepts |c Christopher M. Bishop, Hugh Bishop
264 _1 |a Cham, Switzerland |b Springer |c [2024]
264 _4 |c © 2024
300 __ |a xx, 649 Seiten |b Illustrationen, Diagramme |c 1444 gr
336 __ |b txt |2 rdacontent
337 __ |b n |2 rdamedia
338 __ |b nc |2 rdacarrier
650 _4 |a bicssc
650 _4 |a bicssc
650 _4 |a bisacsh
650 _4 |a bisacsh
650 _4 |a Machine learning
650 _4 |a Artificial intelligence—Data processing
650 _4 |a Artificial intelligence
650 07 |a Deep learning |0 (DE-588)1135597375 |2 gnd |9 rswk-swf
689 00 |a Deep learning |0 (DE-588)1135597375 |D s
689 0_ |5 DE-604
700 1_ |a Bishop, Hugh |e Verfasser |4 aut
776 08 |i Erscheint auch als |n Online-Ausgabe |z 978-3-031-45468-4
856 42 |m Digitalisierung UB Bamberg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034812867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
943 1_ |a oai:aleph.bib-bvb.de:BVB01-034812867
Record in the search index
DE-BY-863_location | 1000 |
DE-BY-FWS_call_number | 1000/ST 302 B622 |
DE-BY-FWS_katkey | 1069085 |
DE-BY-FWS_media_number | 083101208014 |
_version_ | 1819651568318808064 |
adam_text |
Contents

Preface  v
Contents  xi

1 The Deep Learning Revolution  1
  1.1 The Impact of Deep Learning  2
    1.1.1 Medical diagnosis  2
    1.1.2 Protein structure  3
    1.1.3 Image synthesis  4
    1.1.4 Large language models  5
  1.2 A Tutorial Example  6
    1.2.1 Synthetic data  6
    1.2.2 Linear models  8
    1.2.3 Error function  8
    1.2.4 Model complexity  9
    1.2.5 Regularization  12
    1.2.6 Model selection  14
  1.3 A Brief History of Machine Learning  16
    1.3.1 Single-layer networks  17
    1.3.2 Backpropagation  18
    1.3.3 Deep networks  20

2 Probabilities  23
  2.1 The Rules of Probability  25
    2.1.1 A medical screening example  25
    2.1.2 The sum and product rules  26
    2.1.3 Bayes' theorem  28
    2.1.4 Medical screening revisited  30
    2.1.5 Prior and posterior probabilities  31
    2.1.6 Independent variables  31
  2.2 Probability Densities  32
    2.2.1 Example distributions  33
    2.2.2 Expectations and covariances  34
  2.3 The Gaussian Distribution  36
    2.3.1 Mean and variance  37
    2.3.2 Likelihood function  37
    2.3.3 Bias of maximum likelihood  39
    2.3.4 Linear regression  40
  2.4 Transformation of Densities  42
    2.4.1 Multivariate distributions  44
  2.5 Information Theory  46
    2.5.1 Entropy  46
    2.5.2 Physics perspective  47
    2.5.3 Differential entropy  49
    2.5.4 Maximum entropy  50
    2.5.5 Kullback-Leibler divergence  51
    2.5.6 Conditional entropy  53
    2.5.7 Mutual information  54
  2.6 Bayesian Probabilities  54
    2.6.1 Model parameters  55
    2.6.2 Regularization  56
    2.6.3 Bayesian machine learning  57
  Exercises  58

3 Standard Distributions  65
  3.1 Discrete Variables  66
    3.1.1 Bernoulli distribution  66
    3.1.2 Binomial distribution  67
    3.1.3 Multinomial distribution  68
  3.2 The Multivariate Gaussian  70
    3.2.1 Geometry of the Gaussian  71
    3.2.2 Moments  74
    3.2.3 Limitations  75
    3.2.4 Conditional distribution  76
    3.2.5 Marginal distribution  79
    3.2.6 Bayes' theorem  81
    3.2.7 Maximum likelihood  84
    3.2.8 Sequential estimation  85
    3.2.9 Mixtures of Gaussians  86
  3.3 Periodic Variables  89
    3.3.1 Von Mises distribution  89
  3.4 The Exponential Family  94
    3.4.1 Sufficient statistics  97
  3.5 Nonparametric Methods  98
    3.5.1 Histograms  98
    3.5.2 Kernel densities  100
    3.5.3 Nearest-neighbours  103
  Exercises  105

4 Single-layer Networks: Regression  111
  4.1 Linear Regression  112
    4.1.1 Basis functions  112
    4.1.2 Likelihood function  114
    4.1.3 Maximum likelihood  115
    4.1.4 Geometry of least squares  117
    4.1.5 Sequential learning  117
    4.1.6 Regularized least squares  118
    4.1.7 Multiple outputs  119
  4.2 Decision Theory  120
  4.3 The Bias-Variance Trade-off  123
  Exercises  128

5 Single-layer Networks: Classification  131
  5.1 Discriminant Functions  132
    5.1.1 Two classes  132
    5.1.2 Multiple classes  134
    5.1.3 1-of-K coding  135
    5.1.4 Least squares for classification  136
  5.2 Decision Theory  138
    5.2.1 Misclassification rate  139
    5.2.2 Expected loss  140
    5.2.3 The reject option  142
    5.2.4 Inference and decision  143
    5.2.5 Classifier accuracy  147
    5.2.6 ROC curve  148
  5.3 Generative Classifiers  150
    5.3.1 Continuous inputs  152
    5.3.2 Maximum likelihood solution  153
    5.3.3 Discrete features  156
    5.3.4 Exponential family  156
  5.4 Discriminative Classifiers  157
    5.4.1 Activation functions  158
    5.4.2 Fixed basis functions  158
    5.4.3 Logistic regression  159
    5.4.4 Multi-class logistic regression  161
    5.4.5 Probit regression  163
    5.4.6 Canonical link functions  164
  Exercises  166

6 Deep Neural Networks  171
  6.1 Limitations of Fixed Basis Functions  172
    6.1.1 The curse of dimensionality  172
    6.1.2 High-dimensional spaces  175
    6.1.3 Data manifolds  176
    6.1.4 Data-dependent basis functions  178
  6.2 Multilayer Networks  180
    6.2.1 Parameter matrices  181
    6.2.2 Universal approximation  181
    6.2.3 Hidden unit activation functions  182
    6.2.4 Weight-space symmetries  185
  6.3 Deep Networks  186
    6.3.1 Hierarchical representations  187
    6.3.2 Distributed representations  187
    6.3.3 Representation learning  188
    6.3.4 Transfer learning  189
    6.3.5 Contrastive learning  191
    6.3.6 General network architectures  193
    6.3.7 Tensors  194
  6.4 Error Functions  194
    6.4.1 Regression  194
    6.4.2 Binary classification  196
    6.4.3 Multiclass classification  197
  6.5 Mixture Density Networks  198
    6.5.1 Robot kinematics example  198
    6.5.2 Conditional mixture distributions  199
    6.5.3 Gradient optimization  201
    6.5.4 Predictive distribution  202
  Exercises  204

7 Gradient Descent  209
  7.1 Error Surfaces  210
    7.1.1 Local quadratic approximation  211
  7.2 Gradient Descent Optimization  213
    7.2.1 Use of gradient information  214
    7.2.2 Batch gradient descent  214
    7.2.3 Stochastic gradient descent  214
    7.2.4 Mini-batches  216
    7.2.5 Parameter initialization  216
  7.3 Convergence  218
    7.3.1 Momentum  220
    7.3.2 Learning rate schedule  222
    7.3.3 RMSProp and Adam  223
  7.4 Normalization  224
    7.4.1 Data normalization  226
    7.4.2 Batch normalization  227
    7.4.3 Layer normalization  229
  Exercises  230

8 Backpropagation  233
  8.1 Evaluation of Gradients  234
    8.1.1 Single-layer networks  234
    8.1.2 General feed-forward networks  235
    8.1.3 A simple example  238
    8.1.4 Numerical differentiation  239
    8.1.5 The Jacobian matrix  240
    8.1.6 The Hessian matrix  242
  8.2 Automatic Differentiation  244
    8.2.1 Forward-mode automatic differentiation  246
    8.2.2 Reverse-mode automatic differentiation  249
  Exercises  250

9 Regularization  253
  9.1 Inductive Bias  254
    9.1.1 Inverse problems  254
    9.1.2 No free lunch theorem  255
    9.1.3 Symmetry and invariance  256
    9.1.4 Equivariance  259
  9.2 Weight Decay  260
    9.2.1 Consistent regularizers  262
    9.2.2 Generalized weight decay  264
  9.3 Learning Curves  266
    9.3.1 Early stopping  266
    9.3.2 Double descent  268
  9.4 Parameter Sharing  270
    9.4.1 Soft weight sharing  271
  9.5 Residual Connections  274
  9.6 Model Averaging  277
    9.6.1 Dropout  279
  Exercises  281

10 Convolutional Networks  287
  10.1 Computer Vision  288
    10.1.1 Image data  289
  10.2 Convolutional Filters  290
    10.2.1 Feature detectors  290
    10.2.2 Translation equivariance  291
    10.2.3 Padding  294
    10.2.4 Strided convolutions  294
    10.2.5 Multi-dimensional convolutions  295
    10.2.6 Pooling  296
    10.2.7 Multilayer convolutions  298
    10.2.8 Example network architectures  299
  10.3 Visualizing Trained CNNs  302
    10.3.1 Visual cortex  302
    10.3.2 Visualizing trained filters  303
    10.3.3 Saliency maps  305
    10.3.4 Adversarial attacks  306
    10.3.5 Synthetic images  308
  10.4 Object Detection  308
    10.4.1 Bounding boxes  309
    10.4.2 Intersection-over-union  310
    10.4.3 Sliding windows  311
    10.4.4 Detection across scales  313
    10.4.5 Non-max suppression  314
    10.4.6 Fast region CNNs  314
  10.5 Image Segmentation  315
    10.5.1 Convolutional segmentation  315
    10.5.2 Up-sampling  316
    10.5.3 Fully convolutional networks  318
    10.5.4 The U-net architecture  319
  10.6 Style Transfer  320
  Exercises  322

11 Structured Distributions  325
  11.1 Graphical Models  326
    11.1.1 Directed graphs  326
    11.1.2 Factorization  327
    11.1.3 Discrete variables  329
    11.1.4 Gaussian variables  332
    11.1.5 Binary classifier  334
    11.1.6 Parameters and observations  334
    11.1.7 Bayes' theorem  336
  11.2 Conditional Independence  337
    11.2.1 Three example graphs  338
    11.2.2 Explaining away  341
    11.2.3 D-separation  343
    11.2.4 Naive Bayes  344
    11.2.5 Generative models  346
    11.2.6 Markov blanket  347
    11.2.7 Graphs as filters  348
  11.3 Sequence Models  349
    11.3.1 Hidden variables  352
  Exercises  353

12 Transformers  357
  12.1 Attention  358
    12.1.1 Transformer processing  360
    12.1.2 Attention coefficients  361
    12.1.3 Self-attention  362
    12.1.4 Network parameters  363
    12.1.5 Scaled self-attention  366
    12.1.6 Multi-head attention  366
    12.1.7 Transformer layers  368
    12.1.8 Computational complexity  370
    12.1.9 Positional encoding  371
  12.2 Natural Language  374
    12.2.1 Word embedding  375
    12.2.2 Tokenization  377
    12.2.3 Bag of words  378
    12.2.4 Autoregressive models  379
    12.2.5 Recurrent neural networks  380
    12.2.6 Backpropagation through time  381
  12.3 Transformer Language Models  382
    12.3.1 Decoder transformers  383
    12.3.2 Sampling strategies  386
    12.3.3 Encoder transformers  388
    12.3.4 Sequence-to-sequence transformers  390
    12.3.5 Large language models  390
  12.4 Multimodal Transformers  394
    12.4.1 Vision transformers  395
    12.4.2 Generative image transformers  396
    12.4.3 Audio data  399
    12.4.4 Text-to-speech  400
    12.4.5 Vision and language transformers  402
  Exercises  403

13 Graph Neural Networks  407
  13.1 Machine Learning on Graphs  409
    13.1.1 Graph properties  410
    13.1.2 Adjacency matrix  410
    13.1.3 Permutation equivariance  411
  13.2 Neural Message-Passing  412
    13.2.1 Convolutional filters  413
    13.2.2 Graph convolutional networks  414
    13.2.3 Aggregation operators  416
    13.2.4 Update operators  418
    13.2.5 Node classification  419
    13.2.6 Edge classification  420
    13.2.7 Graph classification  420
  13.3 General Graph Networks  420
    13.3.1 Graph attention networks  421
    13.3.2 Edge embeddings  421
    13.3.3 Graph embeddings  422
    13.3.4 Over-smoothing  422
    13.3.5 Regularization  423
    13.3.6 Geometric deep learning  424
  Exercises  425

14 Sampling  429
  14.1 Basic Sampling Algorithms  430
    14.1.1 Expectations  430
    14.1.2 Standard distributions  431
    14.1.3 Rejection sampling  433
    14.1.4 Adaptive rejection sampling  435
    14.1.5 Importance sampling  437
    14.1.6 Sampling-importance-resampling  439
  14.2 Markov Chain Monte Carlo  440
    14.2.1 The Metropolis algorithm  441
    14.2.2 Markov chains  442
    14.2.3 The Metropolis-Hastings algorithm  445
    14.2.4 Gibbs sampling  446
    14.2.5 Ancestral sampling  450
  14.3 Langevin Sampling  451
    14.3.1 Energy-based models  452
    14.3.2 Maximizing the likelihood  453
    14.3.3 Langevin dynamics  454
  Exercises  456

15 Discrete Latent Variables  459
  15.1 K-means Clustering  460
    15.1.1 Image segmentation  464
  15.2 Mixtures of Gaussians  466
    15.2.1 Likelihood function  468
    15.2.2 Maximum likelihood  470
  15.3 Expectation-Maximization Algorithm  474
    15.3.1 Gaussian mixtures  478
    15.3.2 Relation to K-means  480
    15.3.3 Mixtures of Bernoulli distributions  481
  15.4 Evidence Lower Bound  485
    15.4.1 EM revisited  486
    15.4.2 Independent and identically distributed data  488
    15.4.3 Parameter priors  489
    15.4.4 Generalized EM  489
    15.4.5 Sequential EM  490
  Exercises  490

16 Continuous Latent Variables  495
  16.1 Principal Component Analysis  497
    16.1.1 Maximum variance formulation  497
    16.1.2 Minimum-error formulation  499
    16.1.3 Data compression  501
    16.1.4 Data whitening  502
    16.1.5 High-dimensional data  504
  16.2 Probabilistic Latent Variables  506
    16.2.1 Generative model  506
    16.2.2 Likelihood function  507
    16.2.3 Maximum likelihood  509
    16.2.4 Factor analysis  513
    16.2.5 Independent component analysis  514
    16.2.6 Kalman filters  515
  16.3 Evidence Lower Bound  516
    16.3.1 Expectation maximization  518
    16.3.2 EM for PCA  519
    16.3.3 EM for factor analysis  520
  16.4 Nonlinear Latent Variable Models  522
    16.4.1 Nonlinear manifolds  522
    16.4.2 Likelihood function  524
    16.4.3 Discrete data  526
    16.4.4 Four approaches to generative modelling  527
  Exercises  527

17 Generative Adversarial Networks  533
  17.1 Adversarial Training  534
    17.1.1 Loss function  535
    17.1.2 GAN training in practice  536
  17.2 Image GANs  539
    17.2.1 CycleGAN  539
  Exercises  544

18 Normalizing Flows  547
  18.1 Coupling Flows  549
  18.2 Autoregressive Flows  552
  18.3 Continuous Flows  554
    18.3.1 Neural differential equations  554
    18.3.2 Neural ODE backpropagation  555
    18.3.3 Neural ODE flows  557
  Exercises  559

19 Autoencoders  563
  19.1 Deterministic Autoencoders  564
    19.1.1 Linear autoencoders  564
    19.1.2 Deep autoencoders  565
    19.1.3 Sparse autoencoders  566
    19.1.4 Denoising autoencoders  567
    19.1.5 Masked autoencoders  567
  19.2 Variational Autoencoders  569
    19.2.1 Amortized inference  572
    19.2.2 The reparameterization trick  574
  Exercises  578

20 Diffusion Models  581
  20.1 Forward Encoder  582
    20.1.1 Diffusion kernel  583
    20.1.2 Conditional distribution  584
  20.2 Reverse Decoder  585
    20.2.1 Training the decoder  587
    20.2.2 Evidence lower bound  588
    20.2.3 Rewriting the ELBO  589
    20.2.4 Predicting the noise  591
    20.2.5 Generating new samples  592
  20.3 Score Matching  594
    20.3.1 Score loss function  595
    20.3.2 Modified score loss  596
    20.3.3 Noise variance  597
    20.3.4 Stochastic differential equations  598
  20.4 Guided Diffusion  599
    20.4.1 Classifier guidance  600
    20.4.2 Classifier-free guidance  600
  Exercises  603

Appendix A Linear Algebra  609
  A.1 Matrix Identities  609
  A.2 Traces and Determinants  610
  A.3 Matrix Derivatives  611
  A.4 Eigenvectors  612
Appendix B Calculus of Variations  617
Appendix C Lagrange Multipliers  621
Bibliography  625
Index  641
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Bishop, Christopher M. 1959- Bishop, Hugh |
author_GND | (DE-588)120454165 |
author_facet | Bishop, Christopher M. 1959- Bishop, Hugh |
author_role | aut aut |
author_sort | Bishop, Christopher M. 1959- |
author_variant | c m b cm cmb h b hb |
building | Verbundindex |
bvnumber | BV049467217 |
classification_rvk | ST 300 ST 301 |
classification_tum | DAT 708 |
ctrlnum | (OCoLC)1414165990 (DE-599)BVBBV049467217 |
discipline | Informatik |
discipline_str_mv | Informatik |
format | Book |
id | DE-604.BV049467217 |
illustrated | Illustrated |
index_date | 2024-07-03T23:16:05Z |
indexdate | 2024-12-28T04:02:45Z |
institution | BVB |
isbn | 9783031454677 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-034812867 |
oclc_num | 1414165990 |
open_access_boolean | |
owner | DE-473 DE-BY-UBG DE-898 DE-BY-UBR DE-1050 DE-29T DE-1051 DE-863 DE-BY-FWS DE-Po75 DE-1102 DE-384 DE-523 DE-91G DE-BY-TUM DE-2070s DE-188 DE-B768 DE-634 DE-703 |
owner_facet | DE-473 DE-BY-UBG DE-898 DE-BY-UBR DE-1050 DE-29T DE-1051 DE-863 DE-BY-FWS DE-Po75 DE-1102 DE-384 DE-523 DE-91G DE-BY-TUM DE-2070s DE-188 DE-B768 DE-634 DE-703 |
physical | xx, 649 Seiten Illustrationen, Diagramme 1444 gr |
publishDate | 2024 |
publishDateSearch | 2024 |
publishDateSort | 2024 |
publisher | Springer |
record_format | marc |
spellingShingle | Bishop, Christopher M. 1959- Bishop, Hugh Deep learning foundations and concepts bicssc bisacsh Machine learning Artificial intelligence—Data processing Artificial intelligence Deep learning (DE-588)1135597375 gnd |
subject_GND | (DE-588)1135597375 |
title | Deep learning foundations and concepts |
title_auth | Deep learning foundations and concepts |
title_exact_search | Deep learning foundations and concepts |
title_exact_search_txtP | Deep Learning foundations and concepts |
title_full | Deep learning foundations and concepts Christopher M. Bishop, Hugh Bishop |
title_fullStr | Deep learning foundations and concepts Christopher M. Bishop, Hugh Bishop |
title_full_unstemmed | Deep learning foundations and concepts Christopher M. Bishop, Hugh Bishop |
title_short | Deep learning |
title_sort | deep learning foundations and concepts |
title_sub | foundations and concepts |
topic | bicssc bisacsh Machine learning Artificial intelligence—Data processing Artificial intelligence Deep learning (DE-588)1135597375 gnd |
topic_facet | bicssc bisacsh Machine learning Artificial intelligence—Data processing Artificial intelligence Deep learning |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034812867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT bishopchristopherm deeplearningfoundationsandconcepts AT bishophugh deeplearningfoundationsandconcepts |
Table of contents
THWS Würzburg Central Library, Reading Room
Call number: | 1000 ST 302 B622 |
Copy 1 | loanable | checked out – due back: 03.06.2025 |