Deep learning: foundations and concepts
Main authors: | Bishop, Christopher M. 1959-; Bishop, Hugh |
Format: | Book |
Language: | English |
Published: | Cham, Switzerland: Springer, [2024] |
Subjects: | Deep learning |
Online access: | Table of contents |
Description: | xx, 649 pages, illustrations, diagrams, 1444 g |
ISBN: | 9783031454677 |
Internal format
MARC
LEADER 00000nam a2200000 c 4500
001    BV049467217
003    DE-604
005    20240814
007    t
008    231214s2024 a||| |||| 00||| eng d
020 __ |a 9783031454677 |9 978-3-031-45467-7
024 3_ |a 9783031454677
035 __ |a (OCoLC)1414165990
035 __ |a (DE-599)BVBBV049467217
040 __ |a DE-604 |b ger |e rda
041 0_ |a eng
049 __ |a DE-473 |a DE-898 |a DE-1050 |a DE-29T |a DE-1051 |a DE-863 |a DE-Po75 |a DE-1102 |a DE-384 |a DE-523 |a DE-91G |a DE-2070s |a DE-188 |a DE-B768 |a DE-634 |a DE-703
084 __ |a ST 300 |0 (DE-625)143650: |2 rvk
084 __ |a ST 301 |0 (DE-625)143651: |2 rvk
084 __ |a DAT 708 |2 stub
100 1_ |a Bishop, Christopher M. |d 1959- |e Verfasser |0 (DE-588)120454165 |4 aut
245 10 |a Deep learning |b foundations and concepts |c Christopher M. Bishop, Hugh Bishop
264 _1 |a Cham, Switzerland |b Springer |c [2024]
264 _4 |c © 2024
300 __ |a xx, 649 Seiten |b Illustrationen, Diagramme |c 1444 gr
336 __ |b txt |2 rdacontent
337 __ |b n |2 rdamedia
338 __ |b nc |2 rdacarrier
650 _4 |a bicssc
650 _4 |a bicssc
650 _4 |a bisacsh
650 _4 |a bisacsh
650 _4 |a Machine learning
650 _4 |a Artificial intelligence—Data processing
650 _4 |a Artificial intelligence
650 07 |a Deep learning |0 (DE-588)1135597375 |2 gnd |9 rswk-swf
689 00 |a Deep learning |0 (DE-588)1135597375 |D s
689 0_ |5 DE-604
700 1_ |a Bishop, Hugh |e Verfasser |4 aut
776 08 |i Erscheint auch als |n Online-Ausgabe |z 978-3-031-45468-4
856 42 |m Digitalisierung UB Bamberg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034812867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
943 1_ |a oai:aleph.bib-bvb.de:BVB01-034812867
Record in the search index
DE-BY-863_location | 1000 |
DE-BY-FWS_call_number | 1000/ST 302 B622 |
DE-BY-FWS_katkey | 1069085 |
DE-BY-FWS_media_number | 083101208014 |
_version_ | 1819651568318808064 |
adam_text |
Contents

Preface  v
Contents  xi

1 The Deep Learning Revolution  1
  1.1 The Impact of Deep Learning  2
    1.1.1 Medical diagnosis  2
    1.1.2 Protein structure  3
    1.1.3 Image synthesis  4
    1.1.4 Large language models  5
  1.2 A Tutorial Example  6
    1.2.1 Synthetic data  6
    1.2.2 Linear models  8
    1.2.3 Error function  8
    1.2.4 Model complexity  9
    1.2.5 Regularization  12
    1.2.6 Model selection  14
  1.3 A Brief History of Machine Learning  16
    1.3.1 Single-layer networks  17
    1.3.2 Backpropagation  18
    1.3.3 Deep networks  20

2 Probabilities  23
  2.1 The Rules of Probability  25
    2.1.1 A medical screening example  25
    2.1.2 The sum and product rules  26
    2.1.3 Bayes' theorem  28
    2.1.4 Medical screening revisited  30
    2.1.5 Prior and posterior probabilities  31
    2.1.6 Independent variables  31
  2.2 Probability Densities  32
    2.2.1 Example distributions  33
    2.2.2 Expectations and covariances  34
  2.3 The Gaussian Distribution  36
    2.3.1 Mean and variance  37
    2.3.2 Likelihood function  37
    2.3.3 Bias of maximum likelihood  39
    2.3.4 Linear regression  40
  2.4 Transformation of Densities  42
    2.4.1 Multivariate distributions  44
  2.5 Information Theory  46
    2.5.1 Entropy  46
    2.5.2 Physics perspective  47
    2.5.3 Differential entropy  49
    2.5.4 Maximum entropy  50
    2.5.5 Kullback-Leibler divergence  51
    2.5.6 Conditional entropy  53
    2.5.7 Mutual information  54
  2.6 Bayesian Probabilities  54
    2.6.1 Model parameters  55
    2.6.2 Regularization  56
    2.6.3 Bayesian machine learning  57
  Exercises  58

3 Standard Distributions  65
  3.1 Discrete Variables  66
    3.1.1 Bernoulli distribution  66
    3.1.2 Binomial distribution  67
    3.1.3 Multinomial distribution  68
  3.2 The Multivariate Gaussian  70
    3.2.1 Geometry of the Gaussian  71
    3.2.2 Moments  74
    3.2.3 Limitations  75
    3.2.4 Conditional distribution  76
    3.2.5 Marginal distribution  79
    3.2.6 Bayes' theorem  81
    3.2.7 Maximum likelihood  84
    3.2.8 Sequential estimation  85
    3.2.9 Mixtures of Gaussians  86
  3.3 Periodic Variables  89
    3.3.1 Von Mises distribution  89
  3.4 The Exponential Family  94
    3.4.1 Sufficient statistics  97
  3.5 Nonparametric Methods  98
    3.5.1 Histograms  98
    3.5.2 Kernel densities  100
    3.5.3 Nearest-neighbours  103
  Exercises  105

4 Single-layer Networks: Regression  111
  4.1 Linear Regression  112
    4.1.1 Basis functions  112
    4.1.2 Likelihood function  114
    4.1.3 Maximum likelihood  115
    4.1.4 Geometry of least squares  117
    4.1.5 Sequential learning  117
    4.1.6 Regularized least squares  118
    4.1.7 Multiple outputs  119
  4.2 Decision Theory  120
  4.3 The Bias-Variance Trade-off  123
  Exercises  128

5 Single-layer Networks: Classification  131
  5.1 Discriminant Functions  132
    5.1.1 Two classes  132
    5.1.2 Multiple classes  134
    5.1.3 1-of-K coding  135
    5.1.4 Least squares for classification  136
  5.2 Decision Theory  138
    5.2.1 Misclassification rate  139
    5.2.2 Expected loss  140
    5.2.3 The reject option  142
    5.2.4 Inference and decision  143
    5.2.5 Classifier accuracy  147
    5.2.6 ROC curve  148
  5.3 Generative Classifiers  150
    5.3.1 Continuous inputs  152
    5.3.2 Maximum likelihood solution  153
    5.3.3 Discrete features  156
    5.3.4 Exponential family  156
  5.4 Discriminative Classifiers  157
    5.4.1 Activation functions  158
    5.4.2 Fixed basis functions  158
    5.4.3 Logistic regression  159
    5.4.4 Multi-class logistic regression  161
    5.4.5 Probit regression  163
    5.4.6 Canonical link functions  164
  Exercises  166

6 Deep Neural Networks  171
  6.1 Limitations of Fixed Basis Functions  172
    6.1.1 The curse of dimensionality  172
    6.1.2 High-dimensional spaces  175
    6.1.3 Data manifolds  176
    6.1.4 Data-dependent basis functions  178
  6.2 Multilayer Networks  180
    6.2.1 Parameter matrices  181
    6.2.2 Universal approximation  181
    6.2.3 Hidden unit activation functions  182
    6.2.4 Weight-space symmetries  185
  6.3 Deep Networks  186
    6.3.1 Hierarchical representations  187
    6.3.2 Distributed representations  187
    6.3.3 Representation learning  188
    6.3.4 Transfer learning  189
    6.3.5 Contrastive learning  191
    6.3.6 General network architectures  193
    6.3.7 Tensors  194
  6.4 Error Functions  194
    6.4.1 Regression  194
    6.4.2 Binary classification  196
    6.4.3 Multiclass classification  197
  6.5 Mixture Density Networks  198
    6.5.1 Robot kinematics example  198
    6.5.2 Conditional mixture distributions  199
    6.5.3 Gradient optimization  201
    6.5.4 Predictive distribution  202
  Exercises  204

7 Gradient Descent  209
  7.1 Error Surfaces  210
    7.1.1 Local quadratic approximation  211
  7.2 Gradient Descent Optimization  213
    7.2.1 Use of gradient information  214
    7.2.2 Batch gradient descent  214
    7.2.3 Stochastic gradient descent  214
    7.2.4 Mini-batches  216
    7.2.5 Parameter initialization  216
  7.3 Convergence  218
    7.3.1 Momentum  220
    7.3.2 Learning rate schedule  222
    7.3.3 RMSProp and Adam  223
  7.4 Normalization  224
    7.4.1 Data normalization  226
    7.4.2 Batch normalization  227
    7.4.3 Layer normalization  229
  Exercises  230

8 Backpropagation  233
  8.1 Evaluation of Gradients  234
    8.1.1 Single-layer networks  234
    8.1.2 General feed-forward networks  235
    8.1.3 A simple example  238
    8.1.4 Numerical differentiation  239
    8.1.5 The Jacobian matrix  240
    8.1.6 The Hessian matrix  242
  8.2 Automatic Differentiation  244
    8.2.1 Forward-mode automatic differentiation  246
    8.2.2 Reverse-mode automatic differentiation  249
  Exercises  250

9 Regularization  253
  9.1 Inductive Bias  254
    9.1.1 Inverse problems  254
    9.1.2 No free lunch theorem  255
    9.1.3 Symmetry and invariance  256
    9.1.4 Equivariance  259
  9.2 Weight Decay  260
    9.2.1 Consistent regularizers  262
    9.2.2 Generalized weight decay  264
  9.3 Learning Curves  266
    9.3.1 Early stopping  266
    9.3.2 Double descent  268
  9.4 Parameter Sharing  270
    9.4.1 Soft weight sharing  271
  9.5 Residual Connections  274
  9.6 Model Averaging  277
    9.6.1 Dropout  279
  Exercises  281

10 Convolutional Networks  287
  10.1 Computer Vision  288
    10.1.1 Image data  289
  10.2 Convolutional Filters  290
    10.2.1 Feature detectors  290
    10.2.2 Translation equivariance  291
    10.2.3 Padding  294
    10.2.4 Strided convolutions  294
    10.2.5 Multi-dimensional convolutions  295
    10.2.6 Pooling  296
    10.2.7 Multilayer convolutions  298
    10.2.8 Example network architectures  299
  10.3 Visualizing Trained CNNs  302
    10.3.1 Visual cortex  302
    10.3.2 Visualizing trained filters  303
    10.3.3 Saliency maps  305
    10.3.4 Adversarial attacks  306
    10.3.5 Synthetic images  308
  10.4 Object Detection  308
    10.4.1 Bounding boxes  309
    10.4.2 Intersection-over-union  310
    10.4.3 Sliding windows  311
    10.4.4 Detection across scales  313
    10.4.5 Non-max suppression  314
    10.4.6 Fast region CNNs  314
  10.5 Image Segmentation  315
    10.5.1 Convolutional segmentation  315
    10.5.2 Up-sampling  316
    10.5.3 Fully convolutional networks  318
    10.5.4 The U-net architecture  319
  10.6 Style Transfer  320
  Exercises  322

11 Structured Distributions  325
  11.1 Graphical Models  326
    11.1.1 Directed graphs  326
    11.1.2 Factorization  327
    11.1.3 Discrete variables  329
    11.1.4 Gaussian variables  332
    11.1.5 Binary classifier  334
    11.1.6 Parameters and observations  334
    11.1.7 Bayes' theorem  336
  11.2 Conditional Independence  337
    11.2.1 Three example graphs  338
    11.2.2 Explaining away  341
    11.2.3 D-separation  343
    11.2.4 Naive Bayes  344
    11.2.5 Generative models  346
    11.2.6 Markov blanket  347
    11.2.7 Graphs as filters  348
  11.3 Sequence Models  349
    11.3.1 Hidden variables  352
  Exercises  353

12 Transformers  357
  12.1 Attention  358
    12.1.1 Transformer processing  360
    12.1.2 Attention coefficients  361
    12.1.3 Self-attention  362
    12.1.4 Network parameters  363
    12.1.5 Scaled self-attention  366
    12.1.6 Multi-head attention  366
    12.1.7 Transformer layers  368
    12.1.8 Computational complexity  370
    12.1.9 Positional encoding  371
  12.2 Natural Language  374
    12.2.1 Word embedding  375
    12.2.2 Tokenization  377
    12.2.3 Bag of words  378
    12.2.4 Autoregressive models  379
    12.2.5 Recurrent neural networks  380
    12.2.6 Backpropagation through time  381
  12.3 Transformer Language Models  382
    12.3.1 Decoder transformers  383
    12.3.2 Sampling strategies  386
    12.3.3 Encoder transformers  388
    12.3.4 Sequence-to-sequence transformers  390
    12.3.5 Large language models  390
  12.4 Multimodal Transformers  394
    12.4.1 Vision transformers  395
    12.4.2 Generative image transformers  396
    12.4.3 Audio data  399
    12.4.4 Text-to-speech  400
    12.4.5 Vision and language transformers  402
  Exercises  403

13 Graph Neural Networks  407
  13.1 Machine Learning on Graphs  409
    13.1.1 Graph properties  410
    13.1.2 Adjacency matrix  410
    13.1.3 Permutation equivariance  411
  13.2 Neural Message-Passing  412
    13.2.1 Convolutional filters  413
    13.2.2 Graph convolutional networks  414
    13.2.3 Aggregation operators  416
    13.2.4 Update operators  418
    13.2.5 Node classification  419
    13.2.6 Edge classification  420
    13.2.7 Graph classification  420
  13.3 General Graph Networks  420
    13.3.1 Graph attention networks  421
    13.3.2 Edge embeddings  421
    13.3.3 Graph embeddings  422
    13.3.4 Over-smoothing  422
    13.3.5 Regularization  423
    13.3.6 Geometric deep learning  424
  Exercises  425

14 Sampling  429
  14.1 Basic Sampling Algorithms  430
    14.1.1 Expectations  430
    14.1.2 Standard distributions  431
    14.1.3 Rejection sampling  433
    14.1.4 Adaptive rejection sampling  435
    14.1.5 Importance sampling  437
    14.1.6 Sampling-importance-resampling  439
  14.2 Markov Chain Monte Carlo  440
    14.2.1 The Metropolis algorithm  441
    14.2.2 Markov chains  442
    14.2.3 The Metropolis-Hastings algorithm  445
    14.2.4 Gibbs sampling  446
    14.2.5 Ancestral sampling  450
  14.3 Langevin Sampling  451
    14.3.1 Energy-based models  452
    14.3.2 Maximizing the likelihood  453
    14.3.3 Langevin dynamics  454
  Exercises  456

15 Discrete Latent Variables  459
  15.1 K-means Clustering  460
    15.1.1 Image segmentation  464
  15.2 Mixtures of Gaussians  466
    15.2.1 Likelihood function  468
    15.2.2 Maximum likelihood  470
  15.3 Expectation-Maximization Algorithm  474
    15.3.1 Gaussian mixtures  478
    15.3.2 Relation to K-means  480
    15.3.3 Mixtures of Bernoulli distributions  481
  15.4 Evidence Lower Bound  485
    15.4.1 EM revisited  486
    15.4.2 Independent and identically distributed data  488
    15.4.3 Parameter priors  489
    15.4.4 Generalized EM  489
    15.4.5 Sequential EM  490
  Exercises  490

16 Continuous Latent Variables  495
  16.1 Principal Component Analysis  497
    16.1.1 Maximum variance formulation  497
    16.1.2 Minimum-error formulation  499
    16.1.3 Data compression  501
    16.1.4 Data whitening  502
    16.1.5 High-dimensional data  504
  16.2 Probabilistic Latent Variables  506
    16.2.1 Generative model  506
    16.2.2 Likelihood function  507
    16.2.3 Maximum likelihood  509
    16.2.4 Factor analysis  513
    16.2.5 Independent component analysis  514
    16.2.6 Kalman filters  515
  16.3 Evidence Lower Bound  516
    16.3.1 Expectation maximization  518
    16.3.2 EM for PCA  519
    16.3.3 EM for factor analysis  520
  16.4 Nonlinear Latent Variable Models  522
    16.4.1 Nonlinear manifolds  522
    16.4.2 Likelihood function  524
    16.4.3 Discrete data  526
    16.4.4 Four approaches to generative modelling  527
  Exercises  527

17 Generative Adversarial Networks  533
  17.1 Adversarial Training  534
    17.1.1 Loss function  535
    17.1.2 GAN training in practice  536
  17.2 Image GANs  539
    17.2.1 CycleGAN  539
  Exercises  544

18 Normalizing Flows  547
  18.1 Coupling Flows  549
  18.2 Autoregressive Flows  552
  18.3 Continuous Flows  554
    18.3.1 Neural differential equations  554
    18.3.2 Neural ODE backpropagation  555
    18.3.3 Neural ODE flows  557
  Exercises  559

19 Autoencoders  563
  19.1 Deterministic Autoencoders  564
    19.1.1 Linear autoencoders  564
    19.1.2 Deep autoencoders  565
    19.1.3 Sparse autoencoders  566
    19.1.4 Denoising autoencoders  567
    19.1.5 Masked autoencoders  567
  19.2 Variational Autoencoders  569
    19.2.1 Amortized inference  572
    19.2.2 The reparameterization trick  574
  Exercises  578

20 Diffusion Models  581
  20.1 Forward Encoder  582
    20.1.1 Diffusion kernel  583
    20.1.2 Conditional distribution  584
  20.2 Reverse Decoder  585
    20.2.1 Training the decoder  587
    20.2.2 Evidence lower bound  588
    20.2.3 Rewriting the ELBO  589
    20.2.4 Predicting the noise  591
    20.2.5 Generating new samples  592
  20.3 Score Matching  594
    20.3.1 Score loss function  595
    20.3.2 Modified score loss  596
    20.3.3 Noise variance  597
    20.3.4 Stochastic differential equations  598
  20.4 Guided Diffusion  599
    20.4.1 Classifier guidance  600
    20.4.2 Classifier-free guidance  600
  Exercises  603

Appendix A Linear Algebra  609
  A.1 Matrix Identities  609
  A.2 Traces and Determinants  610
  A.3 Matrix Derivatives  611
  A.4 Eigenvectors  612
Appendix B Calculus of Variations  617
Appendix C Lagrange Multipliers  621
Bibliography  625
Index  641
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Bishop, Christopher M. 1959- Bishop, Hugh |
author_GND | (DE-588)120454165 |
author_facet | Bishop, Christopher M. 1959- Bishop, Hugh |
author_role | aut aut |
author_sort | Bishop, Christopher M. 1959- |
author_variant | c m b cm cmb h b hb |
building | Verbundindex |
bvnumber | BV049467217 |
classification_rvk | ST 300 ST 301 |
classification_tum | DAT 708 |
ctrlnum | (OCoLC)1414165990 (DE-599)BVBBV049467217 |
discipline | Informatik |
discipline_str_mv | Informatik |
format | Book |
id | DE-604.BV049467217 |
illustrated | Illustrated |
index_date | 2024-07-03T23:16:05Z |
indexdate | 2024-12-28T04:02:45Z |
institution | BVB |
isbn | 9783031454677 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-034812867 |
oclc_num | 1414165990 |
open_access_boolean | |
owner | DE-473 DE-BY-UBG DE-898 DE-BY-UBR DE-1050 DE-29T DE-1051 DE-863 DE-BY-FWS DE-Po75 DE-1102 DE-384 DE-523 DE-91G DE-BY-TUM DE-2070s DE-188 DE-B768 DE-634 DE-703 |
owner_facet | DE-473 DE-BY-UBG DE-898 DE-BY-UBR DE-1050 DE-29T DE-1051 DE-863 DE-BY-FWS DE-Po75 DE-1102 DE-384 DE-523 DE-91G DE-BY-TUM DE-2070s DE-188 DE-B768 DE-634 DE-703 |
physical | xx, 649 Seiten Illustrationen, Diagramme 1444 gr |
publishDate | 2024 |
publishDateSearch | 2024 |
publishDateSort | 2024 |
publisher | Springer |
record_format | marc |
spellingShingle | Bishop, Christopher M. 1959- Bishop, Hugh Deep learning foundations and concepts bicssc bisacsh Machine learning Artificial intelligence—Data processing Artificial intelligence Deep learning (DE-588)1135597375 gnd |
subject_GND | (DE-588)1135597375 |
title | Deep learning foundations and concepts |
title_auth | Deep learning foundations and concepts |
title_exact_search | Deep learning foundations and concepts |
title_exact_search_txtP | Deep Learning foundations and concepts |
title_full | Deep learning foundations and concepts Christopher M. Bishop, Hugh Bishop |
title_fullStr | Deep learning foundations and concepts Christopher M. Bishop, Hugh Bishop |
title_full_unstemmed | Deep learning foundations and concepts Christopher M. Bishop, Hugh Bishop |
title_short | Deep learning |
title_sort | deep learning foundations and concepts |
title_sub | foundations and concepts |
topic | bicssc bisacsh Machine learning Artificial intelligence—Data processing Artificial intelligence Deep learning (DE-588)1135597375 gnd |
topic_facet | bicssc bisacsh Machine learning Artificial intelligence—Data processing Artificial intelligence Deep learning |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034812867&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT bishopchristopherm deeplearningfoundationsandconcepts AT bishophugh deeplearningfoundationsandconcepts |
Table of contents
THWS Würzburg Central Library, Reading Room
Call number: | 1000 ST 302 B622 |
Copy 1 | loanable | checked out – due back: 03.06.2025 |