Inference and learning from data
1st author: | Sayed, Ali H. |
---|---|
Format: | Book |
Language: | English |
Published: |
Cambridge
Cambridge University Press
2023
|
Subjects: | Learning; Big Data; Reasoning |
Online access: | Table of contents |
Description: | 3 vol. |
ISBN: | 9781009218108 |
Internal format
MARC
LEADER | 00000nam a2200000 ca4500 | ||
---|---|---|---|
001 | BV048690777 | ||
003 | DE-604 | ||
005 | 20230825 | ||
007 | t | ||
008 | 230203s2023 xxk |||| 00||| eng d | ||
020 | |a 9781009218108 |c set hbk. £ 210.00 |9 978-1-009-21810-8 | ||
035 | |a (DE-599)BVBBV048690777 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
044 | |a xxk |c XA-GB | ||
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a DAT 708 |2 stub | ||
100 | 1 | |a Sayed, Ali H. |e Verfasser |0 (DE-588)128216810X |4 aut | |
245 | 1 | 0 | |a Inference and learning from data |c Ali H. Sayed (École Polytechnique Fédérale de Lausanne, University of California at Los Angeles) |
264 | 1 | |a Cambridge |b Cambridge University Press |c 2023 | |
300 | |a 3 vol. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Lernen |0 (DE-588)4035408-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Big Data |0 (DE-588)4802620-7 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Schlussfolgern |0 (DE-588)4251178-1 |2 gnd |9 rswk-swf |
653 | 0 | |a Inference | |
653 | 0 | |a Learning | |
653 | 0 | |a Big data / Mathematical models | |
653 | 0 | |a Big data / Statistical methods | |
689 | 0 | 0 | |a Schlussfolgern |0 (DE-588)4251178-1 |D s |
689 | 0 | 1 | |a Lernen |0 (DE-588)4035408-8 |D s |
689 | 0 | 2 | |a Big Data |0 (DE-588)4802620-7 |D s |
689 | 0 | |5 DE-604 | |
776 | 0 | 8 | |i Also published as |n Online edition |z 9781009218146 |
856 | 4 | 2 | |m Digitization UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034064987&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Table of contents |
999 | |a oai:aleph.bib-bvb.de:BVB01-034064987 |
Record in search index
_version_ | 1804184871878262784 |
---|---|
adam_text | Contents
VOLUME I FOUNDATIONS
Preface: P.1 Emphasis on Foundations; P.2 Glimpse of History; P.3 Organization of the Text; P.4 How to Use the Text; P.5 Simulation Datasets; P.6 Acknowledgments
Notation
1 Matrix Theory
  1.1 Symmetric Matrices; 1.2 Positive-Definite Matrices; 1.3 Range Spaces and Nullspaces; 1.4 Schur Complements; 1.5 Cholesky Factorization; 1.6 QR Decomposition; 1.7 Singular Value Decomposition; 1.8 Square-Root Matrices; 1.9 Kronecker Products; 1.10 Vector and Matrix Norms; 1.11 Perturbation Bounds on Eigenvalues; 1.12 Stochastic Matrices; 1.13 Complex-Valued Matrices; 1.14 Commentaries and Discussion; Problems; 1.A Proof of Spectral Theorem; 1.B Constructive Proof of SVD; References
2 Vector Differentiation
  2.1 Gradient Vectors; 2.2 Hessian Matrices; 2.3 Matrix Differentiation; 2.4 Commentaries and Discussion; Problems; References
3 Random Variables
  3.1 Probability Density Functions; 3.2 Mean and Variance; 3.3 Dependent Random Variables; 3.4 Random Vectors; 3.5 Properties of Covariance Matrices; 3.6 Illustrative Applications; 3.7 Complex-Valued Variables; 3.8 Commentaries and Discussion; Problems; 3.A Convergence of Random Variables; 3.B Concentration Inequalities; References
4 Gaussian Distribution
  4.1 Scalar Gaussian Variables; 4.2 Vector Gaussian Variables; 4.3 Useful Gaussian Manipulations; 4.4 Jointly Distributed Gaussian Variables; 4.5 Gaussian Processes; 4.6 Circular Gaussian Distribution; 4.7 Commentaries and Discussion; Problems; References
5 Exponential Distributions
  5.1 Definition; 5.2 Special Cases; 5.3 Useful Properties; 5.4 Conjugate Priors; 5.5 Commentaries and Discussion; Problems; 5.A Derivation of Properties; References
6 Entropy and Divergence
  6.1 Information and Entropy; 6.2 Kullback-Leibler Divergence; 6.3 Maximum Entropy Distribution; 6.4 Moment Matching; 6.5 Fisher Information Matrix; 6.6 Natural Gradients; 6.7 Evidence Lower Bound; 6.8 Commentaries and Discussion; Problems; References
7 Random Processes
  7.1 Stationary Processes; 7.2 Power Spectral Density; 7.3 Spectral Factorization; 7.4 Commentaries and Discussion; Problems; References
8 Convex Functions
  8.1 Convex Sets; 8.2 Convexity; 8.3 Strict Convexity; 8.4 Strong Convexity; 8.5 Hessian Matrix Conditions; 8.6 Subgradient Vectors; 8.7 Jensen Inequality; 8.8 Conjugate Functions; 8.9 Bregman Divergence; 8.10 Commentaries and Discussion; Problems; References
9 Convex Optimization
  9.1 Convex Optimization Problems; 9.2 Equality Constraints; 9.3 Motivating the KKT Conditions; 9.4 Projection onto Convex Sets; 9.5 Commentaries and Discussion; Problems; References
10 Lipschitz Conditions
  10.1 Mean-Value Theorem; 10.2 δ-Smooth Functions; 10.3 Commentaries and Discussion; Problems; References
11 Proximal Operator
  11.1 Definition and Properties; 11.2 Proximal Point Algorithm; 11.3 Proximal Gradient Algorithm; 11.4 Convergence Results; 11.5 Douglas-Rachford Algorithm; 11.6 Commentaries and Discussion; Problems; 11.A Convergence under Convexity; 11.B Convergence under Strong Convexity; References
12 Gradient-Descent Method
  12.1 Empirical and Stochastic Risks; 12.2 Conditions on Risk Function; 12.3 Constant Step Sizes; 12.4 Iteration-Dependent Step Sizes; 12.5 Coordinate-Descent Method; 12.6 Alternating Projection Algorithm; 12.7 Commentaries and Discussion; Problems; 12.A Zeroth-Order Optimization; References
13 Conjugate Gradient Method
  13.1 Linear Systems of Equations; 13.2 Nonlinear Optimization; 13.3 Convergence Analysis; 13.4 Commentaries and Discussion; Problems; References
14 Subgradient Method
  14.1 Subgradient Algorithm; 14.2 Conditions on Risk Function; 14.3 Convergence Behavior; 14.4 Pocket Variable; 14.5 Exponential Smoothing; 14.6 Iteration-Dependent Step Sizes; 14.7 Coordinate-Descent Algorithms; 14.8 Commentaries and Discussion; Problems; 14.A Deterministic Inequality Recursion; References
15 Proximal and Mirror-Descent Methods
  15.1 Proximal Gradient Method; 15.2 Projection Gradient Method; 15.3 Mirror-Descent Method; 15.4 Comparison of Convergence Rates; 15.5 Commentaries and Discussion; Problems; References
16 Stochastic Optimization
  16.1 Stochastic Gradient Algorithm; 16.2 Stochastic Subgradient Algorithm; 16.3 Stochastic Proximal Gradient Algorithm; 16.4 Gradient Noise; 16.5 Regret Analysis; 16.6 Commentaries and Discussion; Problems; 16.A Switching Expectation and Differentiation; References
17 Adaptive Gradient Methods
  17.1 Motivation; 17.2 AdaGrad Algorithm; 17.3 RMSprop Algorithm; 17.4 ADAM Algorithm; 17.5 Momentum Acceleration Methods; 17.6 Federated Learning; 17.7 Commentaries and Discussion; Problems; 17.A Regret Analysis for ADAM; References
18 Gradient Noise
  18.1 Motivation; 18.2 Smooth Risk Functions; 18.3 Gradient Noise for Smooth Risks; 18.4 Nonsmooth Risk Functions; 18.5 Gradient Noise for Nonsmooth Risks; 18.6 Commentaries and Discussion; Problems; 18.A Averaging over Mini-Batches; 18.B Auxiliary Variance Result; References
19 Convergence Analysis I: Stochastic Gradient Algorithms
  19.1 Problem Setting; 19.2 Convergence under Uniform Sampling; 19.3 Convergence of Mini-Batch Implementation; 19.4 Convergence under Vanishing Step Sizes; 19.5 Convergence under Random Reshuffling; 19.6 Convergence under Importance Sampling; 19.7 Convergence of Stochastic Conjugate Gradient; 19.8 Commentaries and Discussion; Problems; 19.A Stochastic Inequality Recursion; 19.B Proof of Theorem 19.5; References
20 Convergence Analysis II: Stochastic Subgradient Algorithms
  20.1 Problem Setting; 20.2 Convergence under Uniform Sampling; 20.3 Convergence with Pocket Variables; 20.4 Convergence with Exponential Smoothing; 20.5 Convergence of Mini-Batch Implementation; 20.6 Convergence under Vanishing Step Sizes; 20.7 Commentaries and Discussion; Problems; References
21 Convergence Analysis III: Stochastic Proximal Algorithms
  21.1 Problem Setting; 21.2 Convergence under Uniform Sampling; 21.3 Convergence of Mini-Batch Implementation; 21.4 Convergence under Vanishing Step Sizes; 21.5 Stochastic Projection Gradient; 21.6 Mirror-Descent Algorithm; 21.7 Commentaries and Discussion; Problems; References
22 Variance-Reduced Methods I: Uniform Sampling
  22.1 Problem Setting; 22.2 Naïve Stochastic Gradient Algorithm; 22.3 Stochastic Average-Gradient Algorithm (SAGA); 22.4 Stochastic Variance-Reduced Gradient Algorithm (SVRG); 22.5 Nonsmooth Risk Functions; 22.6 Commentaries and Discussion; Problems; 22.A Proof of Theorem 22.2; 22.B Proof of Theorem 22.3; References
23 Variance-Reduced Methods II: Random Reshuffling
  23.1 Amortized Variance-Reduced Gradient Algorithm (AVRG); 23.2 Evolution of Memory Variables; 23.3 Convergence of SAGA; 23.4 Convergence of AVRG; 23.5 Convergence of SVRG; 23.6 Nonsmooth Risk Functions; 23.7 Commentaries and Discussion; Problems; 23.A Proof of Lemma 23.3; 23.B Proof of Lemma 23.4; 23.C Proof of Theorem 23.1; 23.D Proof of Lemma 23.5; 23.E Proof of Theorem 23.2; References
24 Nonconvex Optimization
  24.1 First- and Second-Order Stationarity; 24.2 Stochastic Gradient Optimization; 24.3 Convergence Behavior; 24.4 Commentaries and Discussion; Problems; 24.A Descent in the Large Gradient Regime; 24.B Introducing a Short-Term Model; 24.C Descent Away from Strict Saddle Points; 24.D Second-Order Convergence Guarantee; References
25 Decentralized Optimization I: Primal Methods
  25.1 Graph Topology; 25.2 Weight Matrices; 25.3 Aggregate and Local Risks; 25.4 Incremental, Consensus, and Diffusion; 25.5 Formal Derivation as Primal Methods; 25.6 Commentaries and Discussion; Problems; 25.A Proof of Lemma 25.1; 25.B Proof of Property (25.71); 25.C Convergence of Primal Algorithms; References
26 Decentralized Optimization II: Primal-Dual Methods
  26.1 Motivation; 26.2 EXTRA Algorithm; 26.3 EXACT Diffusion Algorithm; 26.4 Distributed Inexact Gradient Algorithm; 26.5 Augmented Decentralized Gradient Method; 26.6 ATC Tracking Method; 26.7 Unified Decentralized Algorithm; 26.8 Convergence Performance; 26.9 Dual Method; 26.10 Decentralized Nonconvex Optimization; 26.11 Commentaries and Discussion; Problems; 26.A Convergence of Primal-Dual Algorithms; References
Author Index
Subject Index
VOLUME II INFERENCE
Preface: P.1 Emphasis on Foundations; P.2 Glimpse of History; P.3 Organization of the Text; P.4 How to Use the Text; P.5 Simulation Datasets; P.6 Acknowledgments
Notation
27 Mean-Square-Error Inference
  27.1 Inference without Observations; 27.2 Inference with Observations; 27.3 Gaussian Random Variables; 27.4 Bias-Variance Relation; 27.5 Commentaries and Discussion; Problems; 27.A Circular Gaussian Distribution; References
28 Bayesian Inference
  28.1 Bayesian Formulation; 28.2 Maximum A-Posteriori Inference; 28.3 Bayes Classifier; 28.4 Logistic Regression Inference; 28.5 Discriminative and Generative Models; 28.6 Commentaries and Discussion; Problems; References
29 Linear Regression
  29.1 Regression Model; 29.2 Centering and Augmentation; 29.3 Vector Estimation; 29.4 Linear Models; 29.5 Data Fusion; 29.6 Minimum-Variance Unbiased Estimation; 29.7 Commentaries and Discussion; Problems; 29.A Consistency of Normal Equations; References
30 Kalman Filter
  30.1 Uncorrelated Observations; 30.2 Innovations Process; 30.3 State-Space Model; 30.4 Measurement- and Time-Update Forms; 30.5 Steady-State Filter; 30.6 Smoothing Filters; 30.7 Ensemble Kalman Filter; 30.8 Nonlinear Filtering; 30.9 Commentaries and Discussion; Problems; References
31 Maximum Likelihood
  31.1 Problem Formulation; 31.2 Gaussian Distribution; 31.3 Multinomial Distribution; 31.4 Exponential Family of Distributions; 31.5 Cramer-Rao Lower Bound; 31.6 Model Selection; 31.7 Commentaries and Discussion; Problems; 31.A Derivation of the Cramer-Rao Bound; 31.B Derivation of the AIC Formulation; 31.C Derivation of the BIC Formulation; References
32 Expectation Maximization
  32.1 Motivation; 32.2 Derivation of the EM Algorithm; 32.3 Gaussian Mixture Models; 32.4 Bernoulli Mixture Models; 32.5 Commentaries and Discussion; Problems; 32.A Exponential Mixture Models; References
33 Predictive Modeling
  33.1 Posterior Distributions; 33.2 Laplace Method; 33.3 Markov Chain Monte Carlo Method; 33.4 Commentaries and Discussion; Problems; References
34 Expectation Propagation
  34.1 Factored Representation; 34.2 Gaussian Sites; 34.3 Exponential Sites; 34.4 Assumed Density Filtering; 34.5 Commentaries and Discussion; Problems; References
35 Particle Filters
  35.1 Data Model; 35.2 Importance Sampling; 35.3 Particle Filter Implementations; 35.4 Commentaries and Discussion; Problems; References
36 Variational Inference
  36.1 Evaluating Evidences; 36.2 Evaluating Posterior Distributions; 36.3 Mean-Field Approximation; 36.4 Exponential Conjugate Models; 36.5 Maximizing the ELBO; 36.6 Stochastic Gradient Solution; 36.7 Black Box Inference; 36.8 Commentaries and Discussion; Problems; References
37 Latent Dirichlet Allocation
  37.1 Generative Model; 37.2 Coordinate-Ascent Solution; 37.3 Maximizing the ELBO; 37.4 Estimating Model Parameters; 37.5 Commentaries and Discussion; Problems; References
38 Hidden Markov Models
  38.1 Gaussian Mixture Models; 38.2 Markov Chains; 38.3 Forward-Backward Recursions; 38.4 Validation and Prediction Tasks; 38.5 Commentaries and Discussion; Problems; References
39 Decoding Hidden Markov Models
  39.1 Decoding States; 39.2 Decoding Transition Probabilities; 39.3 Normalization and Scaling; 39.4 Viterbi Algorithm; 39.5 EM Algorithm for Dependent Observations; 39.6 Commentaries and Discussion; Problems; References
40 Independent Component Analysis
  40.1 Problem Formulation; 40.2 Maximum-Likelihood Formulation; 40.3 Mutual Information Formulation; 40.4 Maximum Kurtosis Formulation; 40.5 Projection Pursuit; 40.6 Commentaries and Discussion; Problems; References
41 Bayesian Networks
  41.1 Curse of Dimensionality; 41.2 Probabilistic Graphical Models; 41.3 Active and Blocked Pathways; 41.4 Conditional Independence Relations; 41.5 Commentaries and Discussion; Problems; References
42 Inference over Graphs
  42.1 Probabilistic Inference; 42.2 Inference by Enumeration; 42.3 Inference by Variable Elimination; 42.4 Chow-Liu Algorithm; 42.5 Graphical LASSO; 42.6 Learning Graph Parameters; 42.7 Commentaries and Discussion; Problems; References
43 Undirected Graphs
  43.1 Cliques and Potentials; 43.2 Representation Theorem; 43.3 Factor Graphs; 43.4 Message-Passing Algorithms; 43.5 Commentaries and Discussion; Problems; 43.A Proof of the Hammersley-Clifford Theorem; 43.B Equivalence of Markovian Properties; References
44 Markov Decision Processes
  44.1 MDP Model; 44.2 Discounted Rewards; 44.3 Policy Evaluation; 44.4 Linear Function Approximation; 44.5 Commentaries and Discussion; Problems; References
45 Value and Policy Iterations
  45.1 Value Iteration; 45.2 Policy Iteration; 45.3 Partially Observable MDP; 45.4 Commentaries and Discussion; Problems; 45.A Optimal Policy and State-Action Values; 45.B Convergence of Value Iteration; 45.C Proof of ε-Optimality; 45.D Convergence of Policy Iteration; 45.E Piecewise Linear Property; 45.F Bellman Principle of Optimality; References
46 Temporal Difference Learning
  46.1 Model-Based Learning; 46.2 Monte Carlo Policy Evaluation; 46.3 TD(0) Algorithm; 46.4 Look-Ahead TD Algorithm; 46.5 TD(λ) Algorithm; 46.6 True Online TD(λ) Algorithm; 46.7 Off-Policy Learning; 46.8 Commentaries and Discussion; Problems; 46.A Useful Convergence Result; 46.B Convergence of TD(0) Algorithm; 46.C Convergence of TD(λ) Algorithm; 46.D Equivalence of Offline Implementations; References
47 Q-Learning
  47.1 SARSA(0) Algorithm; 47.2 Look-Ahead SARSA Algorithm; 47.3 SARSA(λ) Algorithm; 47.4 Off-Policy Learning; 47.5 Optimal Policy Extraction; 47.6 Q-Learning Algorithm; 47.7 Exploration versus Exploitation; 47.8 Q-Learning with Replay Buffer; 47.9 Double Q-Learning; 47.10 Commentaries and Discussion; Problems; 47.A Convergence of SARSA(0) Algorithm; 47.B Convergence of Q-Learning Algorithm; References
48 Value Function Approximation
  48.1 Stochastic Gradient TD-Learning; 48.2 Least-Squares TD-Learning; 48.3 Projected Bellman Learning; 48.4 SARSA Methods; 48.5 Deep Q-Learning; 48.6 Commentaries and Discussion; Problems; References
49 Policy Gradient Methods
  49.1 Policy Model; 49.2 Finite-Difference Method; 49.3 Score Function; 49.4 Objective Functions; 49.5 Policy Gradient Theorem; 49.6 Actor-Critic Algorithms; 49.7 Natural Gradient Policy; 49.8 Trust Region Policy Optimization; 49.9 Deep Reinforcement Learning; 49.10 Soft Learning; 49.11 Commentaries and Discussion; Problems; 49.A Proof of Policy Gradient Theorem; 49.B Proof of Consistency Theorem; References
Author Index
Subject Index
VOLUME III LEARNING
Preface: P.1 Emphasis on Foundations; P.2 Glimpse of History; P.3 Organization of the Text; P.4 How to Use the Text; P.5 Simulation Datasets; P.6 Acknowledgments
Notation
50 Least-Squares Problems
  50.1 Motivation; 50.2 Normal Equations; 50.3 Recursive Least-Squares; 50.4 Implicit Bias; 50.5 Commentaries and Discussion; Problems; 50.A Minimum-Norm Solution; 50.B Equivalence in Linear Estimation; 50.C Extended Least-Squares; References
51 Regularization
  51.1 Three Challenges; 51.2 ℓ2-Regularization; 51.3 ℓ1-Regularization; 51.4 Soft Thresholding; 51.5 Commentaries and Discussion; Problems; 51.A Constrained Formulations for Regularization; 51.B Expression for LASSO Solution; References
52 Nearest-Neighbor Rule
  52.1 Bayes Classifier; 52.2 k-NN Classifier; 52.3 Performance Guarantee; 52.4 k-Means Algorithm; 52.5 Commentaries and Discussion; Problems; 52.A Performance of the NN Classifier; References
53 Self-Organizing Maps
  53.1 Grid Arrangements; 53.2 Training Algorithm; 53.3 Visualization; 53.4 Commentaries and Discussion; Problems; References
54 Decision Trees
  54.1 Trees and Attributes; 54.2 Selecting Attributes; 54.3 Constructing a Tree; 54.4 Commentaries and Discussion; Problems; References
55 Naïve Bayes Classifier
  55.1 Independence Condition; 55.2 Modeling the Conditional Distribution; 55.3 Estimating the Priors; 55.4 Gaussian Naïve Classifier; 55.5 Commentaries and Discussion; Problems; References
56 Linear Discriminant Analysis
  56.1 Discriminant Functions; 56.2 Linear Discriminant Algorithm; 56.3 Minimum Distance Classifier; 56.4 Fisher Discriminant Analysis; 56.5 Commentaries and Discussion; Problems; References
57 Principal Component Analysis
  57.1 Data Preprocessing; 57.2 Dimensionality Reduction; 57.3 Subspace Interpretations; 57.4 Sparse PCA; 57.5 Probabilistic PCA; 57.6 Commentaries and Discussion; Problems; 57.A Maximum Likelihood Solution; 57.B Alternative Optimization Problem; References
58 Dictionary Learning
  58.1 Learning Under Regularization; 58.2 Learning Under Constraints; 58.3 K-SVD Approach; 58.4 Nonnegative Matrix Factorization; 58.5 Commentaries and Discussion; Problems; 58.A Orthogonal Matching Pursuit; References
59 Logistic Regression
  59.1 Logistic Model; 59.2 Logistic Empirical Risk; 59.3 Multiclass Classification; 59.4 Active Learning; 59.5 Domain Adaptation; 59.6 Commentaries and Discussion; Problems; 59.A Generalized Linear Models; References
60 Perceptron
  60.1 Linear Separability; 60.2 Perceptron Empirical Risk; 60.3 Termination in Finite Steps; 60.4 Pocket Perceptron; 60.5 Commentaries and Discussion; Problems; 60.A Counting Theorem; 60.B Boolean Functions; References
61 Support Vector Machines
  61.1 SVM Empirical Risk; 61.2 Convex Quadratic Program; 61.3 Cross Validation; 61.4 Commentaries and Discussion; Problems; References
62 Bagging and Boosting
  62.1 Bagging Classifiers; 62.2 AdaBoost Classifier; 62.3 Gradient Boosting; 62.4 Commentaries and Discussion; Problems; References
63 Kernel Methods
  63.1 Motivation; 63.2 Nonlinear Mappings; 63.3 Polynomial and Gaussian Kernels; 63.4 Kernel-Based Perceptron; 63.5 Kernel-Based SVM; 63.6 Kernel-Based Ridge Regression; 63.7 Kernel-Based Learning; 63.8 Kernel PCA; 63.9 Inference under Gaussian Processes; 63.10 Commentaries and Discussion; Problems; References
64 Generalization Theory
  64.1 Curse of Dimensionality; 64.2 Empirical Risk Minimization; 64.3 Generalization Ability; 64.4 VC Dimension; 64.5 Bias-Variance Trade-off; 64.6 Surrogate Risk Functions; 64.7 Commentaries and Discussion; Problems; 64.A VC Dimension for Linear Classifiers; 64.B Sauer Lemma; 64.C Vapnik-Chervonenkis Bound; 64.D Rademacher Complexity; References
65 Feedforward Neural Networks
  65.1 Activation Functions; 65.2 Feedforward Networks; 65.3 Regression and Classification; 65.4 Calculation of Gradient Vectors; 65.5 Backpropagation Algorithm; 65.6 Dropout Strategy; 65.7 Regularized Cross-Entropy Risk; 65.8 Slowdown in Learning; 65.9 Batch Normalization; 65.10 Commentaries and Discussion; Problems; 65.A Derivation of Batch Normalization Algorithm; References
66 Deep Belief Networks
  66.1 Pre-Training Using Stacked Autoencoders; 66.2 Restricted Boltzmann Machines; 66.3 Contrastive Divergence; 66.4 Pre-Training Using Stacked RBMs; 66.5 Deep Generative Model; 66.6 Commentaries and Discussion; Problems; References
67 Convolutional Networks
  67.1 Correlation Layers; 67.2 Pooling; 67.3 Full Network; 67.4 Training Algorithm; 67.5 Commentaries and Discussion; Problems; 67.A Derivation of Training Algorithm; References
68 Generative Networks
  68.1 Variational Autoencoders; 68.2 Training Variational Autoencoders; 68.3 Conditional Variational Autoencoders; 68.4 Generative Adversarial Networks; 68.5 Training of GANs; 68.6 Conditional GANs; 68.7 Commentaries and Discussion; Problems; References
69 Recurrent Networks
  69.1 Recurrent Neural Networks; 69.2 Backpropagation Through Time; 69.3 Bidirectional Recurrent Networks; 69.4 Vanishing and Exploding Gradients; 69.5 Long Short-Term Memory Networks; 69.6 Bidirectional LSTMs; 69.7 Gated Recurrent Units; 69.8 Commentaries and Discussion; Problems; References
70 Explainable Learning
  70.1 Classifier Model; 70.2 Sensitivity Analysis; 70.3 Gradient × Input Analysis; 70.4 Relevance Analysis; 70.5 Commentaries and Discussion; Problems; References
71 Adversarial Attacks
  71.1 Types of Attacks; 71.2 Fast Gradient Sign Method; 71.3 Jacobian Saliency Map Approach; 71.4 DeepFool Technique; 71.5 Black-Box Attacks; 71.6 Defense Mechanisms; 71.7 Commentaries and Discussion; Problems; References
72 Meta Learning
  72.1 Network Model; 72.2 Siamese Networks; 72.3 Relation Networks; 72.4 Exploration Models; 72.5 Commentaries and Discussion; Problems; 72.A Matching Networks; 72.B Prototypical Networks; References
Author Index
Subject Index
|
adam_txt |
Contents VOLUME I FOUNDATIONS Preface P. 1 Emphasis on Foundations P.2 Glimpse of History P.3 Organization of the Text P.4 How to Use the Text P.5 Simulation Datasets P.6 Acknowledgments Notation 1 Matrix Theory 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.10 1.11 1.12 1.13 1.14 l.A l.B 2 Symmetric Matrices Positive-Definite Matrices Range Spaces and Nullspaces Schur Complements Cholesky Factorization QR Decomposition Singular Value Decomposition Square-Root Matrices Kronecker Products Vector and Matrix Norms Perturbation Bounds on Eigenvalues Stochastic Matrices Complex-Valued Matrices Commentaries and Discussion Problems Proof of Spectral Theorem Constructive Proof of SVD References Vector Differentiation 2.1 2.2 Gradient Vectors Hessian Matrices paye xxvii xxvii xxix xxxi xxxiv xxxvii xl xlv 1 1 5 7 11 11 18 20 22 24 30 37 38 39 41 47 50 52 53 59 59 62
viii Contents 2.3 2.4 3 Matrix Differentiation Commentaries and Discussion Problems References Random Variables Probability Density Functions Mean and Variance Dependent Random Variables Random Vectors Properties of Covaiiance Matrices Illustrative Applications Complex-Valued Variables Commentaries and Discussion Problems 3.A Convergence of Random Variables 3.B Concentration Inequalities References 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 4 Gaussian Distribution 4.1 4.2 4.3 4.4 4.5 4.6 4.7 5 Exponential Distributions 5.1 5.2 5.3 5.4 5.5 5.A 6 Scalar Gaussian Variables Vector Gaussian Variables Useful Gaussian Manipulations Jointly Distributed Gaussian Variables Gaussian Processes Circular Gaussian Distribution Commentaries and Discussion Problems References Definition Special Cases Useful Properties Conjugate Priors Commentaries and Discussion Problems Derivation of Properties References Entropy and Divergence 6.1 6.2 6.3 Information and Entropy Kullback Leibler Divergence Maximum Entropy Distribution 63 65 65 67 68 68 71 77 93 96 97 106 109 112 119 122 128 132 132 134 138 144 150 155 157 160 165 167 167 169 178 183 187 189 192 195 196 196 204 209
Contents 6.4 6.5 6.6 6.7 6.8 7 Random Processes 7.1 7.2 7.3 7.4 8 Convex Sets Convexity Strict Convexity Strong Convexity Hessian Matrix Conditions Subgradient Vectors Jensen Inequality Conjugate Functions Bregman Divergence Commentaries and Discussion Problems References Convex Optimization 9.1 9.2 9.3 9.4 9.5 10 Stationary Processes Power Spectral Density Spectral Factorization Commentaries and Discussion Problems References Convex Functions 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 8.10 9 Moment Matching Fisher Information Matrix Natural Gradients Evidence Lower Bound Commentaries and Discussion Problems References Convex Optimization Problems Equality Constraints Motivating the KKT Conditions Projection onto Convex Sets Commentaries and Discussion Problems References Lipschitz Conditions 10.1 Mean-Value Theorem 10.2 h-Smootli Functions 10.3 Commeiitaries and Discussion Problems References ¡X 211 213 217 227 231 234 237 240 240 245 252 255 257 259 261 261 263 265 266 268 272 279 281 285 290 293 299 302 302 310 312 315 322 323 328 330 330 332 337 338 340
X Contents 11 Proximal Operator 11.1 11.2 11.3 11.4 11.5 11.6 Definition and Properties Proximal Point Algorithm Proximal Gradient Algorithm Convergence Results Douglas-Racliford Algorithm Commentaries and Discussion Problems 11. A Convergence under Convexity 11. В Convergence under Strong Convexity References 12 Gradient-Descent Method Empirical and Stochastic Risks Conditions on Risk Function Constant Step Sizes Iteration-Dependent Step-Sizes Coordinate-Descent Method Alternating Projection Algorithm Commentaries and Discussion Problems 12.A Zeroth-Order Optimization References 12.1 12.2 12.3 12.4 12.5 12.6 12.7 13 Conjugate Gradient Method 13.1 13.2 13.3 13.4 14 Linear Systems of Equations Nonlinear Optimization Convergence Analysis Comnientaries and Discussion Problems References Subgradient Method Subgradient Algorithm Conditions on Risk Function Convergence Behavior Pocket Variable Exponential Smoothing Iteration-Dependent Step Sizes Coordinate-Descent Algorithms Comnientaries and Discussion Problems 14.A Deterministic Inequality Recursion References 14.1 14.2 14.3 14.4 14.5 14.6 14.7 14.8 341 341 347 349 354 356 358 362 366 369 372 375 375 379 381 392 402 413 418 425 433 436 441 441 454 459 465 466 469 471 471 475 479 483 486 489 493 496 498 501 505
Contents 15 Proximal and Mirror-Descent Methods 15.1 15.2 15.3 15.4 15.5 16 Proximal Gradient Method Projection Gradient Method Mirror-Descent Method Comparison of Convergence Rates Commentaries and Discussion Problems References Stochastic Optimization Stochastic Gradient Algorithm Stochastic Subgradient Algorithm Stochastic Proximal Gradient Algorithm Gradient Noise Regret Analysis Commentaries and Discussion Problems 16.A Switching Expectation and Differentiation References 16.1 16.2 16.3 16.4 16.5 16.6 17 Adaptive Gradient Methods 17.1 17.2 17.3 17.4 17.5 17.6 17.7 Motivation AdaGrad Algorithm RMSprop Algorithm ADAM Algorithm Momentum Acceleration Methods Federated Learning Commentaries and Discussion Problems 17.A Regret Analysis for ADAM References 18 Gradient Noise 18.1 18.2 18.3 18.4 18.5 18.6 Motivation Smooth Risk Functions Gradient Noise for Smooth Risks Nonsmooth Risk Functions Gradient Noise for Nonsmooth Risks Commentaries and Discussion Problem,s 18.A Averaging over Mini-Batches 18.В Auxiliary Variance Result References XI 507 507 515 519 537 539 541 544 547 548 565 569 574 576 582 586 590 595 599 599 603 608 610 614 619 626 630 632 640 642 642 645 648 660 665 673 675 677 679 681
xii Contents 19 Convergence Analysis 1: Stochastic Gradient Algorithms Problem Setting Convergence nudar Uniform Sampling Convergence of Mini-Batch Implementation Convergence under Vanishing Step Sizes Convergence under Random Reshuffling Convergence under Importance Sainpling Convergence of Stochastic Conjugate Gradient Commentaries and Discussion Problems 19.A Stochastic Inequality Recursion 19.В Proof of Theorem 19.5 References 19.1 19.2 19.3 19.4 19.5 19.6 19.7 19.8 20 Convergence Analysis II: Stochastic Subgradient Algorithms 20.1 20.2 20.3 20.4 20.5 20.6 20.7 21 Convergence Analysis III: Stochastic Proximal Algorithms 21.1 21.2 21.3 21.4 21.5 21.6 21.7 22 Problem Setting Convergence under Uniform Sampling Convergence with Pocket Variables Convergence with Exponential Smoothing Convergence of Mini-Batch Implementáljon Convergence under Vanishing Step Sizes Commentaries and Discussion Problems References Problem Setting Convergence under Uniform Sampling Convergence of Mini-Batch Implementation Convergence under Vanishing Step Sizes Stochastic Projection Gradient Mirror-Descent Algorithm Commentaries and Discussion Problem,s References Variance-Reduced Methods 1: Uniform Sampling 22.1 22.2 22.3 22.4 22.5 22.6 Problem Setting Naïve Stochastic Gradient Algorithm Stochastic Average-Gradient Algorithm (SAGA) Stochastic Variance-Reduced Gradient Algorithm (SVRG) Nonsmooth Risk Functions Commentaries and Discussion Problems 683 683 686 691 692 698 701 707 712 716 720 722 727 730 730 735 738 740 745 747 750 753 754 756 756 761 765 766 769 771 774 775 776 779 779 782 785 793
22.A Proof of Theorem 22.2; 22.B Proof of Theorem 22.3; References
23 Variance-Reduced Methods II: Random Reshuffling. 23.1 Amortized Variance-Reduced Gradient Algorithm (AVRG); 23.2 Evolution of Memory Variables; 23.3 Convergence of SAGA; 23.4 Convergence of AVRG; 23.5 Convergence of SVRG; 23.6 Nonsmooth Risk Functions; 23.7 Commentaries and Discussion; Problems; 23.A Proof of Lemma 23.3; 23.B Proof of Lemma 23.4; 23.C Proof of Theorem 23.1; 23.D Proof of Lemma 23.5; 23.E Proof of Theorem 23.2; References
24 Nonconvex Optimization. 24.1 First- and Second-Order Stationarity; 24.2 Stochastic Gradient Optimization; 24.3 Convergence Behavior; 24.4 Commentaries and Discussion; Problems; 24.A Descent in the Large Gradient Regime; 24.B Introducing a Short-Term Model; 24.C Descent Away from Strict Saddle Points; 24.D Second-Order Convergence Guarantee; References
25 Decentralized Optimization I: Primal Methods. 25.1 Graph Topology; 25.2 Weight Matrices; 25.3 Aggregate and Local Risks; 25.4 Incremental, Consensus, and Diffusion; 25.5 Formal Derivation as Primal Methods; 25.6 Commentaries and Discussion; Problems; 25.A Proof of Lemma 25.1; 25.B Proof of Property (25.71); 25.C Convergence of Primal Algorithms; References
26 Decentralized Optimization II: Primal-Dual Methods. 26.1 Motivation; 26.2 EXTRA Algorithm; 26.3 EXACT Diffusion Algorithm; 26.4 Distributed Inexact Gradient Algorithm; 26.5 Augmented Decentralized Gradient Method; 26.6 ATC Tracking Method; 26.7 Unified Decentralized Algorithm; 26.8 Convergence Performance; 26.9 Dual Method; 26.10 Decentralized Nonconvex Optimization; 26.11 Commentaries and Discussion; Problems; 26.A Convergence of Primal-Dual Algorithms; References
Author Index; Subject Index
VOLUME II INFERENCE
Preface. P.1 Emphasis on Foundations; P.2 Glimpse of History; P.3 Organization of the Text; P.4 How to Use the Text; P.5 Simulation Datasets; P.6 Acknowledgments
Notation
27 Mean-Square-Error Inference. 27.1 Inference without Observations; 27.2 Inference with Observations; 27.3 Gaussian Random Variables; 27.4 Bias-Variance Relation; 27.5 Commentaries and Discussion; Problems; 27.A Circular Gaussian Distribution; References
28 Bayesian Inference. 28.1 Bayesian Formulation; 28.2 Maximum A-Posteriori Inference; 28.3 Bayes Classifier; 28.4 Logistic Regression Inference
28.5 Discriminative and Generative Models; 28.6 Commentaries and Discussion; Problems; References
29 Linear Regression. 29.1 Regression Model; 29.2 Centering and Augmentation; 29.3 Vector Estimation; 29.4 Linear Models; 29.5 Data Fusion; 29.6 Minimum-Variance Unbiased Estimation; 29.7 Commentaries and Discussion; Problems; 29.A Consistency of Normal Equations; References
30 Kalman Filter. 30.1 Uncorrelated Observations; 30.2 Innovations Process; 30.3 State-Space Model; 30.4 Measurement- and Time-Update Forms; 30.5 Steady-State Filter; 30.6 Smoothing Filters; 30.7 Ensemble Kalman Filter; 30.8 Nonlinear Filtering; 30.9 Commentaries and Discussion; Problems; References
31 Maximum Likelihood. 31.1 Problem Formulation; 31.2 Gaussian Distribution; 31.3 Multinomial Distribution; 31.4 Exponential Family of Distributions; 31.5 Cramer-Rao Lower Bound; 31.6 Model Selection; 31.7 Commentaries and Discussion; Problems; 31.A Derivation of the Cramer-Rao Bound; 31.B Derivation of the AIC Formulation; 31.C Derivation of the BIC Formulation; References
32 Expectation Maximization. 32.1 Motivation; 32.2 Derivation of the EM Algorithm; 32.3 Gaussian Mixture Models; 32.4 Bernoulli Mixture Models; 32.5 Commentaries and Discussion; Problems; 32.A Exponential Mixture Models; References
33 Predictive Modeling. 33.1 Posterior Distributions; 33.2 Laplace Method; 33.3 Markov Chain Monte Carlo Method; 33.4 Commentaries and Discussion; Problems; References
34 Expectation Propagation. 34.1 Factored Representation; 34.2 Gaussian Sites; 34.3 Exponential Sites; 34.4 Assumed Density Filtering; 34.5 Commentaries and Discussion; Problems; References
35 Particle Filters. 35.1 Data Model; 35.2 Importance Sampling; 35.3 Particle Filter Implementations; 35.4 Commentaries and Discussion; Problems; References
36 Variational Inference. 36.1 Evaluating Evidences; 36.2 Evaluating Posterior Distributions; 36.3 Mean-Field Approximation; 36.4 Exponential Conjugate Models; 36.5 Maximizing the ELBO; 36.6 Stochastic Gradient Solution; 36.7 Black Box Inference; 36.8 Commentaries and Discussion
Problems; References
37 Latent Dirichlet Allocation. 37.1 Generative Model; 37.2 Coordinate-Ascent Solution; 37.3 Maximizing the ELBO; 37.4 Estimating Model Parameters; 37.5 Commentaries and Discussion; Problems; References
38 Hidden Markov Models. 38.1 Gaussian Mixture Models; 38.2 Markov Chains; 38.3 Forward-Backward Recursions; 38.4 Validation and Prediction Tasks; 38.5 Commentaries and Discussion; Problems; References
39 Decoding Hidden Markov Models. 39.1 Decoding States; 39.2 Decoding Transition Probabilities; 39.3 Normalization and Scaling; 39.4 Viterbi Algorithm; 39.5 EM Algorithm for Dependent Observations; 39.6 Commentaries and Discussion; Problems; References
40 Independent Component Analysis. 40.1 Problem Formulation; 40.2 Maximum-Likelihood Formulation; 40.3 Mutual Information Formulation; 40.4 Maximum Kurtosis Formulation; 40.5 Projection Pursuit; 40.6 Commentaries and Discussion; Problems; References
41 Bayesian Networks. 41.1 Curse of Dimensionality; 41.2 Probabilistic Graphical Models
41.3 Active and Blocked Pathways; 41.4 Conditional Independence Relations; 41.5 Commentaries and Discussion; Problems; References
42 Inference over Graphs. 42.1 Probabilistic Inference; 42.2 Inference by Enumeration; 42.3 Inference by Variable Elimination; 42.4 Chow-Liu Algorithm; 42.5 Graphical LASSO; 42.6 Learning Graph Parameters; 42.7 Commentaries and Discussion; Problems; References
43 Undirected Graphs. 43.1 Cliques and Potentials; 43.2 Representation Theorem; 43.3 Factor Graphs; 43.4 Message-Passing Algorithms; 43.5 Commentaries and Discussion; Problems; 43.A Proof of the Hammersley-Clifford Theorem; 43.B Equivalence of Markovian Properties; References
44 Markov Decision Processes. 44.1 MDP Model; 44.2 Discounted Rewards; 44.3 Policy Evaluation; 44.4 Linear Function Approximation; 44.5 Commentaries and Discussion; Problems; References
45 Value and Policy Iterations. 45.1 Value Iteration; 45.2 Policy Iteration; 45.3 Partially Observable MDP; 45.4 Commentaries and Discussion; Problems; 45.A Optimal Policy and State-Action Values
45.B Convergence of Value Iteration; 45.C Proof of ε-Optimality; 45.D Convergence of Policy Iteration; 45.E Piecewise Linear Property; 45.F Bellman Principle of Optimality; References
46 Temporal Difference Learning. 46.1 Model-Based Learning; 46.2 Monte Carlo Policy Evaluation; 46.3 TD(0) Algorithm; 46.4 Look-Ahead TD Algorithm; 46.5 TD(λ) Algorithm; 46.6 True Online TD(λ) Algorithm; 46.7 Off-Policy Learning; 46.8 Commentaries and Discussion; Problems; 46.A Useful Convergence Result; 46.B Convergence of TD(0) Algorithm; 46.C Convergence of TD(λ) Algorithm; 46.D Equivalence of Offline Implementations; References
47 Q-Learning. 47.1 SARSA(0) Algorithm; 47.2 Look-Ahead SARSA Algorithm; 47.3 SARSA(λ) Algorithm; 47.4 Off-Policy Learning; 47.5 Optimal Policy Extraction; 47.6 Q-Learning Algorithm; 47.7 Exploration versus Exploitation; 47.8 Q-Learning with Replay Buffer; 47.9 Double Q-Learning; 47.10 Commentaries and Discussion; Problems; 47.A Convergence of SARSA(0) Algorithm; 47.B Convergence of Q-Learning Algorithm; References
48 Value Function Approximation. 48.1 Stochastic Gradient TD-Learning; 48.2 Least-Squares TD-Learning; 48.3 Projected Bellman Learning; 48.4 SARSA Methods
48.5 Deep Q-Learning; 48.6 Commentaries and Discussion; Problems; References
49 Policy Gradient Methods. 49.1 Policy Model; 49.2 Finite-Difference Method; 49.3 Score Function; 49.4 Objective Functions; 49.5 Policy Gradient Theorem; 49.6 Actor-Critic Algorithms; 49.7 Natural Gradient Policy; 49.8 Trust Region Policy Optimization; 49.9 Deep Reinforcement Learning; 49.10 Soft Learning; 49.11 Commentaries and Discussion; Problems; 49.A Proof of Policy Gradient Theorem; 49.B Proof of Consistency Theorem; References
Author Index; Subject Index
VOLUME III LEARNING
Preface. P.1 Emphasis on Foundations; P.2 Glimpse of History; P.3 Organization of the Text; P.4 How to Use the Text; P.5 Simulation Datasets; P.6 Acknowledgments
Notation
50 Least-Squares Problems. 50.1 Motivation; 50.2 Normal Equations; 50.3 Recursive Least-Squares; 50.4 Implicit Bias; 50.5 Commentaries and Discussion; Problems; 50.A Minimum-Norm Solution; 50.B Equivalence in Linear Estimation
50.C Extended Least-Squares; References
51 Regularization. 51.1 Three Challenges; 51.2 ℓ2-Regularization; 51.3 ℓ1-Regularization; 51.4 Soft Thresholding; 51.5 Commentaries and Discussion; Problems; 51.A Constrained Formulations for Regularization; 51.B Expression for LASSO Solution; References
52 Nearest-Neighbor Rule. 52.1 Bayes Classifier; 52.2 k-NN Classifier; 52.3 Performance Guarantee; 52.4 K-Means Algorithm; 52.5 Commentaries and Discussion; Problems; 52.A Performance of the NN Classifier; References
53 Self-Organizing Maps. 53.1 Grid Arrangements; 53.2 Training Algorithm; 53.3 Visualization; 53.4 Commentaries and Discussion; Problems; References
54 Decision Trees. 54.1 Trees and Attributes; 54.2 Selecting Attributes; 54.3 Constructing a Tree; 54.4 Commentaries and Discussion; Problems; References
55 Naïve Bayes Classifier. 55.1 Independence Condition; 55.2 Modeling the Conditional Distribution; 55.3 Estimating the Priors
55.4 Gaussian Naïve Classifier; 55.5 Commentaries and Discussion; Problems; References
56 Linear Discriminant Analysis. 56.1 Discriminant Functions; 56.2 Linear Discriminant Algorithm; 56.3 Minimum Distance Classifier; 56.4 Fisher Discriminant Analysis; 56.5 Commentaries and Discussion; Problems; References
57 Principal Component Analysis. 57.1 Data Preprocessing; 57.2 Dimensionality Reduction; 57.3 Subspace Interpretations; 57.4 Sparse PCA; 57.5 Probabilistic PCA; 57.6 Commentaries and Discussion; Problems; 57.A Maximum Likelihood Solution; 57.B Alternative Optimization Problem; References
58 Dictionary Learning. 58.1 Learning Under Regularization; 58.2 Learning Under Constraints; 58.3 K-SVD Approach; 58.4 Nonnegative Matrix Factorization; 58.5 Commentaries and Discussion; Problems; 58.A Orthogonal Matching Pursuit; References
59 Logistic Regression. 59.1 Logistic Model; 59.2 Logistic Empirical Risk; 59.3 Multiclass Classification; 59.4 Active Learning; 59.5 Domain Adaptation; 59.6 Commentaries and Discussion; Problems
59.A Generalized Linear Models; References
60 Perceptron. 60.1 Linear Separability; 60.2 Perceptron Empirical Risk; 60.3 Termination in Finite Steps; 60.4 Pocket Perceptron; 60.5 Commentaries and Discussion; Problems; 60.A Counting Theorem; 60.B Boolean Functions; References
61 Support Vector Machines. 61.1 SVM Empirical Risk; 61.2 Convex Quadratic Program; 61.3 Cross Validation; 61.4 Commentaries and Discussion; Problems; References
62 Bagging and Boosting. 62.1 Bagging Classifiers; 62.2 AdaBoost Classifier; 62.3 Gradient Boosting; 62.4 Commentaries and Discussion; Problems; References
63 Kernel Methods. 63.1 Motivation; 63.2 Nonlinear Mappings; 63.3 Polynomial and Gaussian Kernels; 63.4 Kernel-Based Perceptron; 63.5 Kernel-Based SVM; 63.6 Kernel-Based Ridge Regression; 63.7 Kernel-Based Learning; 63.8 Kernel PCA; 63.9 Inference under Gaussian Processes; 63.10 Commentaries and Discussion; Problems; References
64 Generalization Theory. 64.1 Curse of Dimensionality; 64.2 Empirical Risk Minimization; 64.3 Generalization Ability; 64.4 VC Dimension; 64.5 Bias-Variance Trade-off; 64.6 Surrogate Risk Functions; 64.7 Commentaries and Discussion; Problems; 64.A VC Dimension for Linear Classifiers; 64.B Sauer Lemma; 64.C Vapnik-Chervonenkis Bound; 64.D Rademacher Complexity; References
65 Feedforward Neural Networks. 65.1 Activation Functions; 65.2 Feedforward Networks; 65.3 Regression and Classification; 65.4 Calculation of Gradient Vectors; 65.5 Backpropagation Algorithm; 65.6 Dropout Strategy; 65.7 Regularized Cross-Entropy Risk; 65.8 Slowdown in Learning; 65.9 Batch Normalization; 65.10 Commentaries and Discussion; Problems; 65.A Derivation of Batch Normalization Algorithm; References
66 Deep Belief Networks. 66.1 Pre-Training Using Stacked Autoencoders; 66.2 Restricted Boltzmann Machines; 66.3 Contrastive Divergence; 66.4 Pre-Training Using Stacked RBMs; 66.5 Deep Generative Model; 66.6 Commentaries and Discussion; Problems; References
67 Convolutional Networks. 67.1 Correlation Layers; 67.2 Pooling; 67.3 Full Network
67.4 Training Algorithm; 67.5 Commentaries and Discussion; Problems; 67.A Derivation of Training Algorithm; References
68 Generative Networks. 68.1 Variational Autoencoders; 68.2 Training Variational Autoencoders; 68.3 Conditional Variational Autoencoders; 68.4 Generative Adversarial Networks; 68.5 Training of GANs; 68.6 Conditional GANs; 68.7 Commentaries and Discussion; Problems; References
69 Recurrent Networks. 69.1 Recurrent Neural Networks; 69.2 Backpropagation Through Time; 69.3 Bidirectional Recurrent Networks; 69.4 Vanishing and Exploding Gradients; 69.5 Long Short-Term Memory Networks; 69.6 Bidirectional LSTMs; 69.7 Gated Recurrent Units; 69.8 Commentaries and Discussion; Problems; References
70 Explainable Learning. 70.1 Classifier Model; 70.2 Sensitivity Analysis; 70.3 Gradient × Input Analysis; 70.4 Relevance Analysis; 70.5 Commentaries and Discussion; Problems; References
71 Adversarial Attacks. 71.1 Types of Attacks; 71.2 Fast Gradient Sign Method; 71.3 Jacobian Saliency Map Approach; 71.4 DeepFool Technique; 71.5 Black-Box Attacks
71.6 Defense Mechanisms; 71.7 Commentaries and Discussion; Problems; References
72 Meta Learning. 72.1 Network Model; 72.2 Siamese Networks; 72.3 Relation Networks; 72.4 Exploration Models; 72.5 Commentaries and Discussion; Problems; 72.A Matching Networks; 72.B Prototypical Networks; References
Author Index; Subject Index |
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Sayed, Ali H. |
author_GND | (DE-588)128216810X |
author_facet | Sayed, Ali H. |
author_role | aut |
author_sort | Sayed, Ali H. |
author_variant | a h s ah ahs |
building | Verbundindex |
bvnumber | BV048690777 |
classification_rvk | ST 300 |
classification_tum | DAT 708 |
ctrlnum | (DE-599)BVBBV048690777 |
discipline | Informatik (computer science) |
discipline_str_mv | Informatik (computer science) |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01762nam a2200445 ca4500</leader><controlfield tag="001">BV048690777</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20230825 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">230203s2023 xxk |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781009218108</subfield><subfield code="c">set hbk. £ 210.00</subfield><subfield code="9">978-1-009-21810-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV048690777</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxk</subfield><subfield code="c">XA-GB</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 300</subfield><subfield code="0">(DE-625)143650:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 708</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Sayed, Ali H.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)128216810X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Inference and learning from data</subfield><subfield code="c">Ali H. 
Sayed (École Polytechnique Fédérale de Lausanne, University of California at Los Angeles)</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Cambridge</subfield><subfield code="b">Cambridge University Press</subfield><subfield code="c">2023</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">3 vol.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Lernen</subfield><subfield code="0">(DE-588)4035408-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Schlussfolgern</subfield><subfield code="0">(DE-588)4251178-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Inference</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Learning</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Big data / Mathematical models</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Big data / Statistical methods</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Inference</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Learning</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield 
code="a">Schlussfolgern</subfield><subfield code="0">(DE-588)4251178-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Lernen</subfield><subfield code="0">(DE-588)4035408-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Big Data</subfield><subfield code="0">(DE-588)4802620-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Online-Ausgabe</subfield><subfield code="z">9781009218146</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034064987&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-034064987</subfield></datafield></record></collection> |
id | DE-604.BV048690777 |
illustrated | Not Illustrated |
index_date | 2024-07-03T21:27:24Z |
indexdate | 2024-07-10T09:46:14Z |
institution | BVB |
isbn | 9781009218108 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-034064987 |
open_access_boolean | |
physical | 3 vol. |
publishDate | 2023 |
publishDateSearch | 2023 |
publishDateSort | 2023 |
publisher | Cambridge University Press |
record_format | marc |
spelling | Sayed, Ali H. Verfasser (DE-588)128216810X aut Inference and learning from data Ali H. Sayed (École Polytechnique Fédérale de Lausanne, University of California at Los Angeles) Cambridge Cambridge University Press 2023 3 vol. txt rdacontent n rdamedia nc rdacarrier Lernen (DE-588)4035408-8 gnd rswk-swf Big Data (DE-588)4802620-7 gnd rswk-swf Schlussfolgern (DE-588)4251178-1 gnd rswk-swf Inference Learning Big data / Mathematical models Big data / Statistical methods Schlussfolgern (DE-588)4251178-1 s Lernen (DE-588)4035408-8 s Big Data (DE-588)4802620-7 s DE-604 Erscheint auch als Online-Ausgabe 9781009218146 Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034064987&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Sayed, Ali H. Inference and learning from data Lernen (DE-588)4035408-8 gnd Big Data (DE-588)4802620-7 gnd Schlussfolgern (DE-588)4251178-1 gnd |
subject_GND | (DE-588)4035408-8 (DE-588)4802620-7 (DE-588)4251178-1 |
title | Inference and learning from data |
title_auth | Inference and learning from data |
title_exact_search | Inference and learning from data |
title_exact_search_txtP | Inference and learning from data |
title_full | Inference and learning from data Ali H. Sayed (École Polytechnique Fédérale de Lausanne, University of California at Los Angeles) |
title_fullStr | Inference and learning from data Ali H. Sayed (École Polytechnique Fédérale de Lausanne, University of California at Los Angeles) |
title_full_unstemmed | Inference and learning from data Ali H. Sayed (École Polytechnique Fédérale de Lausanne, University of California at Los Angeles) |
title_short | Inference and learning from data |
title_sort | inference and learning from data |
topic | Lernen (DE-588)4035408-8 gnd Big Data (DE-588)4802620-7 gnd Schlussfolgern (DE-588)4251178-1 gnd |
topic_facet | Lernen Big Data Schlussfolgern |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034064987&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT sayedalih inferenceandlearningfromdata |