Machine learning: a Bayesian and optimization perspective
Saved in:
Main Author: | Theodoridis, Sergios 1951- |
---|---|
Format: | Book |
Language: | English |
Published: | Amsterdam [u.a.]: Elsevier, Academic Press, 2015 |
Subjects: | Maschinelles Lernen; Optimierung; Bayes-Verfahren |
Online Access: | Table of Contents; Blurb |
Physical Description: | XXI, 1050 pages, illustrations, diagrams |
ISBN: | 9780128015223 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV042385052 | ||
003 | DE-604 | ||
005 | 20181127 | ||
007 | t | ||
008 | 150302s2015 ad|| |||| 00||| eng d | ||
020 | |a 9780128015223 |9 978-0-12-801522-3 | ||
020 | |z 0128015225 |9 0128015225 | ||
035 | |a (OCoLC)910913108 | ||
035 | |a (DE-599)BVBBV042385052 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-29T |a DE-739 |a DE-706 |a DE-573 |a DE-11 |a DE-523 |a DE-863 |a DE-20 | ||
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
100 | 1 | |a Theodoridis, Sergios |d 1951- |e Verfasser |0 (DE-588)12164135X |4 aut | |
245 | 1 | 0 | |a Machine learning |b a Bayesian and optimization perspective |c Sergios Theodoridis |
264 | 1 | |a Amsterdam [u.a.] |b Elsevier, Academic Press |c 2015 | |
300 | |a XXI, 1050 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Optimierung |0 (DE-588)4043664-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Bayes-Verfahren |0 (DE-588)4204326-8 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 0 | 1 | |a Bayes-Verfahren |0 (DE-588)4204326-8 |D s |
689 | 0 | 2 | |a Optimierung |0 (DE-588)4043664-0 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
999 | |a oai:aleph.bib-bvb.de:BVB01-027821053 |
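
For readers who want to work with this record programmatically, here is a minimal sketch (not part of the catalog data) of how the MARCXML serialization shown in the fullrecord index field further down can be read with Python's standard library alone. The file name record.xml is an assumption: save the MARCXML to that file first.

```python
# Minimal sketch: parse the MARCXML form of this record (assumed saved as
# record.xml, a hypothetical file name) using only the standard library.
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# The document root is <collection>; the record is its child element.
record = ET.parse("record.xml").getroot().find("marc:record", NS)

def subfields(tag):
    """Yield (code, text) pairs for all subfields of every field with this tag."""
    for field in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        for sf in field.findall("marc:subfield", NS):
            yield sf.get("code"), sf.text

# Field 245 (title statement): $a title, $b subtitle, $c statement of responsibility.
title = dict(subfields("245"))
print(f"{title['a']} : {title['b']} / {title['c']}")
# -> Machine learning : a Bayesian and optimization perspective / Sergios Theodoridis

# Field 650 (subject headings): $a carries the GND heading itself.
subjects = [text for code, text in subfields("650") if code == "a"]
print(subjects)  # -> ['Maschinelles Lernen', 'Optimierung', 'Bayes-Verfahren']
```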
Record in the search index
DE-BY-863_location | 1000 |
---|---|
DE-BY-FWS_call_number | 1000/ST 302 T388 |
DE-BY-FWS_katkey | 707142 |
DE-BY-FWS_media_number | 083101414536 |
_version_ | 1824553559145840640 |
adam_text | Contents

Preface
Acknowledgments ..... xix
Notation ..... xxi

CHAPTER 1 Introduction ..... 1
1.1 What Machine Learning is About ..... 1
1.1.1 Classification ..... 2
1.1.2 Regression ..... 3
1.2 Structure and a Road Map of the Book ..... 5
References ..... 8

CHAPTER 2 Probability and Stochastic Processes ..... 9
2.1 Introduction ..... 10
2.2 Probability and Random Variables ..... 10
2.2.1 Probability ..... 11
2.2.2 Discrete Random Variables ..... 12
2.2.3 Continuous Random Variables ..... 14
2.2.4 Mean and Variance ..... 15
2.2.5 Transformation of Random Variables ..... 17
2.3 Examples of Distributions ..... 18
2.3.1 Discrete Variables ..... 18
2.3.2 Continuous Variables ..... 20
2.4 Stochastic Processes ..... 29
2.4.1 First and Second Order Statistics ..... 30
2.4.2 Stationarity and Ergodicity ..... 30
2.4.3 Power Spectral Density ..... 33
2.4.4 Autoregressive Models ..... 38
2.5 Information Theory ..... 41
2.5.1 Discrete Random Variables ..... 42
2.5.2 Continuous Random Variables ..... 45
2.6 Stochastic Convergence ..... 48
Problems ..... 49
References ..... 51

CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions ..... 53
3.1 Introduction ..... 53
3.2 Parameter Estimation: The Deterministic Point of View ..... 54
3.3 Linear Regression ..... 57
3.4 Classification ..... 60
3.5 Biased Versus Unbiased Estimation ..... 64
3.5.1 Biased or Unbiased Estimation? ..... 65
3.6 The Cramér-Rao Lower Bound ..... 67
3.7 Sufficient Statistic ..... 70
3.8 Regularization ..... 72
3.9 The Bias-Variance Dilemma ..... 77
3.9.1 Mean-Square Error Estimation ..... 77
3.9.2 Bias-Variance Tradeoff ..... 78
3.10 Maximum Likelihood Method ..... 82
3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case ..... 84
3.11 Bayesian Inference ..... 84
3.11.1 The Maximum a Posteriori Probability Estimation Method ..... 88
3.12 Curse of Dimensionality ..... 89
3.13 Validation ..... 91
3.14 Expected and Empirical Loss Functions ..... 93
3.15 Nonparametric Modeling and Estimation ..... 95
Problems ..... 97
References ..... 102

CHAPTER 4 Mean-Square Error Linear Estimation ..... 105
4.1 Introduction ..... 105
4.2 Mean-Square Error Linear Estimation: The Normal Equations ..... 106
4.2.1 The Cost Function Surface ..... 107
4.3 A Geometric Viewpoint: Orthogonality Condition ..... 109
4.4 Extension to Complex-Valued Variables ..... 111
4.4.1 Widely Linear Complex-Valued Estimation ..... 113
4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus ..... 116
4.5 Linear Filtering ..... 118
4.6 MSE Linear Filtering: A Frequency Domain Point of View ..... 120
4.7 Some Typical Applications ..... 124
4.7.1 Interference Cancellation ..... 124
4.7.2 System Identification ..... 125
4.7.3 Deconvolution: Channel Equalization ..... 126
4.8 Algorithmic Aspects: The Levinson and the Lattice-Ladder Algorithms ..... 132
4.8.1 The Lattice-Ladder Scheme ..... 137
4.9 Mean-Square Error Estimation of Linear Models ..... 140
4.9.1 The Gauss-Markov Theorem ..... 143
4.9.2 Constrained Linear Estimation: The Beamforming Case ..... 145
4.10 Time-Varying Statistics: Kalman Filtering ..... 148
Problems ..... 154
References ..... 158

CHAPTER 5 Stochastic Gradient Descent: The LMS Algorithm and its Family ..... 161
5.1 Introduction ..... 162
5.2 The Steepest Descent Method ..... 163
5.3 Application to the Mean-Square Error Cost Function ..... 167
5.3.1 The Complex-Valued Case ..... 175
5.4 Stochastic Approximation ..... 177
5.5 The Least-Mean-Squares Adaptive Algorithm ..... 179
5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments ..... 181
5.5.2 Cumulative Loss Bounds ..... 186
5.6 The Affine Projection Algorithm ..... 188
5.6.1 The Normalized LMS ..... 193
5.7 The Complex-Valued Case ..... 194
5.8 Relatives of the LMS ..... 196
5.9 Simulation Examples ..... 199
5.10 Adaptive Decision Feedback Equalization ..... 202
5.11 The Linearly Constrained LMS ..... 204
5.12 Tracking Performance of the LMS in Nonstationary Environments ..... 206
5.13 Distributed Learning: The Distributed LMS ..... 208
5.13.1 Cooperation Strategies ..... 209
5.13.2 The Diffusion LMS ..... 211
5.13.3 Convergence and Steady-State Performance: Some Highlights ..... 218
5.13.4 Consensus-Based Distributed Schemes ..... 220
5.14 A Case Study: Target Localization ..... 222
5.15 Some Concluding Remarks: Consensus Matrix ..... 223
Problems ..... 224
References ..... 227

CHAPTER 6 The Least-Squares Family ..... 233
6.1 Introduction ..... 234
6.2 Least-Squares Linear Regression: A Geometric Perspective ..... 234
6.3 Statistical Properties of the LS Estimator ..... 236
6.4 Orthogonalizing the Column Space of X: The SVD Method ..... 239
6.5 Ridge Regression ..... 243
6.6 The Recursive Least-Squares Algorithm ..... 245
6.7 Newton's Iterative Minimization Method ..... 248
6.7.1 RLS and Newton's Method ..... 251
6.8 Steady-State Performance of the RLS ..... 252
6.9 Complex-Valued Data: The Widely Linear RLS ..... 254
6.10 Computational Aspects of the LS Solution ..... 255
6.11 The Coordinate and Cyclic Coordinate Descent Methods ..... 258
6.12 Simulation Examples ..... 259
6.13 Total-Least-Squares ..... 261
Problems ..... 268
References ..... 272

CHAPTER 7 Classification: A Tour of the Classics ..... 275
7.1 Introduction ..... 275
7.2 Bayesian Classification ..... 276
7.2.1 Average Risk ..... 278
7.3 Decision (Hyper)Surfaces ..... 280
7.3.1 The Gaussian Distribution Case ..... 282
7.4 The Naive Bayes Classifier ..... 287
7.5 The Nearest Neighbor Rule ..... 288
7.6 Logistic Regression ..... 290
7.7 Fisher's Linear Discriminant ..... 294
7.8 Classification Trees ..... 300
7.9 Combining Classifiers ..... 304
7.10 The Boosting Approach ..... 307
7.11 Boosting Trees ..... 313
7.12 A Case Study: Protein Folding Prediction ..... 314
Problems ..... 318
References ..... 323

CHAPTER 8 Parameter Learning: A Convex Analytic Path ..... 327
8.1 Introduction ..... 328
8.2 Convex Sets and Functions ..... 329
8.2.1 Convex Sets ..... 329
8.2.2 Convex Functions ..... 330
8.3 Projections onto Convex Sets ..... 333
8.3.1 Properties of Projections ..... 337
8.4 Fundamental Theorem of Projections onto Convex Sets ..... 341
8.5 A Parallel Version of POCS ..... 344
8.6 From Convex Sets to Parameter Estimation and Machine Learning ..... 345
8.6.1 Regression ..... 345
8.6.2 Classification ..... 347
8.7 Infinitely Many Closed Convex Sets: The Online Learning Case ..... 349
8.7.1 Convergence of APSM ..... 351
8.8 Constrained Learning ..... 356
8.9 The Distributed APSM ..... 357
8.10 Optimizing Nonsmooth Convex Cost Functions ..... 358
8.10.1 Subgradients and Subdifferentials ..... 359
8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case ..... 362
8.10.3 Online Learning for Convex Optimization ..... 367
8.11 Regret Analysis ..... 370
8.12 Online Learning and Big Data Applications: A Discussion ..... 374
8.13 Proximal Operators ..... 379
8.13.1 Properties of the Proximal Operator ..... 382
8.13.2 Proximal Minimization ..... 383
8.14 Proximal Splitting Methods for Optimization ..... 385
Problems ..... 389
8.15 Appendix to Chapter 8 ..... 393
References ..... 398

CHAPTER 9 Sparsity-Aware Learning: Concepts and Theoretical Foundations ..... 403
9.1 Introduction ..... 403
9.2 Searching for a Norm ..... 404
9.3 The Least Absolute Shrinkage and Selection Operator (LASSO) ..... 407
9.4 Sparse Signal Representation ..... 411
9.5 In Search of the Sparsest Solution ..... 415
9.6 Uniqueness of the ℓ0 Minimizer ..... 422
9.6.1 Mutual Coherence ..... 424
9.7 Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions ..... 426
9.7.1 Condition Implied by the Mutual Coherence Number ..... 426
9.7.2 The Restricted Isometry Property (RIP) ..... 427
9.8 Robust Sparse Signal Recovery from Noisy Measurements ..... 429
9.9 Compressed Sensing: The Glory of Randomness ..... 430
9.9.1 Dimensionality Reduction and Stable Embeddings ..... 433
9.9.2 Sub-Nyquist Sampling: Analog-to-Information Conversion ..... 434
9.10 A Case Study: Image De-Noising ..... 438
Problems ..... 440
References ..... 444

CHAPTER 10 Sparsity-Aware Learning: Algorithms and Applications ..... 449
10.1 Introduction ..... 450
10.2 Sparsity-Promoting Algorithms ..... 450
10.2.1 Greedy Algorithms ..... 451
10.2.2 Iterative Shrinkage/Thresholding (IST) Algorithms ..... 456
10.2.3 Which Algorithm?: Some Practical Hints ..... 462
10.3 Variations on the Sparsity-Aware Theme ..... 467
10.4 Online Sparsity-Promoting Algorithms ..... 475
10.4.1 LASSO: Asymptotic Performance ..... 475
10.4.2 The Adaptive Norm-Weighted LASSO ..... 477
10.4.3 Adaptive CoSaMP (AdCoSaMP) Algorithm ..... 479
10.4.4 Sparse Adaptive Projection Subgradient Method (SpAPSM) ..... 480
10.5 Learning Sparse Analysis Models ..... 485
10.5.1 Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries ..... 487
10.5.2 Cosparsity ..... 488
10.6 A Case Study: Time-Frequency Analysis ..... 490
10.7 Appendix to Chapter 10: Some Hints from the Theory of Frames ..... 497
Problems ..... 500
References ..... 502

CHAPTER 11 Learning in Reproducing Kernel Hilbert Spaces ..... 509
11.1 Introduction ..... 510
11.2 Generalized Linear Models ..... 510
11.3 Volterra, Wiener, and Hammerstein Models ..... 511
11.4 Cover's Theorem: Capacity of a Space in Linear Dichotomies ..... 514
11.5 Reproducing Kernel Hilbert Spaces ..... 517
11.5.1 Some Properties and Theoretical Highlights ..... 519
11.5.2 Examples of Kernel Functions ..... 520
11.6 Representer Theorem ..... 525
11.6.1 Semiparametric Representer Theorem ..... 527
11.6.2 Nonparametric Modeling: A Discussion ..... 528
11.7 Kernel Ridge Regression ..... 528
11.8 Support Vector Regression ..... 530
11.8.1 The Linear ε-Insensitive Optimal Regression ..... 531
11.9 Kernel Ridge Regression Revisited ..... 537
11.10 Optimal Margin Classification: Support Vector Machines ..... 538
11.10.1 Linearly Separable Classes: Maximum Margin Classifiers ..... 540
11.10.2 Nonseparable Classes ..... 545
11.10.3 Performance of SVMs and Applications ..... 550
11.10.4 Choice of Hyperparameters ..... 550
11.11 Computational Considerations ..... 551
11.11.1 Multiclass Generalizations ..... 552
11.12 Online Learning in RKHS ..... 553
11.12.1 The Kernel LMS (KLMS) ..... 553
11.12.2 The Naive Online Rreg Minimization Algorithm (NORMA) ..... 556
11.12.3 The Kernel APSM Algorithm ..... 560
11.13 Multiple Kernel Learning ..... 567
11.14 Nonparametric Sparsity-Aware Learning: Additive Models ..... 568
11.15 A Case Study: Authorship Identification ..... 570
Problems ..... 574
References ..... 578

CHAPTER 12 Bayesian Learning: Inference and the EM Algorithm ..... 585
12.1 Introduction ..... 586
12.2 Regression: A Bayesian Perspective ..... 586
12.2.1 The Maximum Likelihood Estimator ..... 587
12.2.2 The MAP Estimator ..... 588
12.2.3 The Bayesian Approach ..... 589
12.3 The Evidence Function and Occam's Razor Rule ..... 593
12.4 Exponential Family of Probability Distributions ..... 600
12.4.1 The Exponential Family and the Maximum Entropy Method ..... 605
12.5 Latent Variables and the EM Algorithm ..... 606
12.5.1 The Expectation-Maximization Algorithm ..... 606
12.5.2 The EM Algorithm: A Lower Bound Maximization View ..... 608
12.6 Linear Regression and the EM Algorithm ..... 610
12.7 Gaussian Mixture Models ..... 613
12.7.1 Gaussian Mixture Modeling and Clustering ..... 617
12.8 Combining Learning Models: A Probabilistic Point of View ..... 621
12.8.1 Mixing Linear Regression Models ..... 622
12.8.2 Mixing Logistic Regression Models ..... 625
Problems ..... 628
12.9 Appendix to Chapter 12 ..... 631
12.9.1 PDFs with Exponent of Quadratic Form ..... 631
12.9.2 The Conditional from the Joint Gaussian PDF ..... 632
12.9.3 The Marginal from the Joint Gaussian PDF ..... 633
12.9.4 The Posterior from Gaussian Prior and Conditional PDFs ..... 634
References ..... 637

CHAPTER 13 Bayesian Learning: Approximate Inference and Nonparametric Models ..... 639
13.1 Introduction ..... 640
13.2 Variational Approximation in Bayesian Learning ..... 640
13.2.1 The Case of the Exponential Family of Probability Distributions ..... 644
13.3 A Variational Bayesian Approach to Linear Regression ..... 645
13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling ..... 651
13.5 When Bayesian Inference Meets Sparsity ..... 655
13.6 Sparse Bayesian Learning (SBL) ..... 657
13.6.1 The Spike and Slab Method ..... 660
13.7 The Relevance Vector Machine Framework ..... 661
13.7.1 Adopting the Logistic Regression Model for Classification ..... 662
13.8 Convex Duality and Variational Bounds ..... 666
13.9 Sparsity-Aware Regression: A Variational Bound Bayesian Path ..... 671
13.10 Sparsity-Aware Learning: Some Concluding Remarks ..... 675
13.11 Expectation Propagation ..... 679
13.12 Nonparametric Bayesian Modeling ..... 683
13.12.1 The Chinese Restaurant Process ..... 684
13.12.2 Inference ..... 684
13.12.3 Dirichlet Processes ..... 684
13.12.4 The Stick-Breaking Construction of a DP ..... 685
13.13 Gaussian Processes ..... 687
13.13.1 Covariance Functions and Kernels ..... 688
13.13.2 Regression ..... 690
13.13.3 Classification ..... 692
13.14 A Case Study: Hyperspectral Image Unmixing ..... 693
13.14.1 Hierarchical Bayesian Modeling ..... 695
13.14.2 Experimental Results ..... 696
Problems ..... 699
References ..... 702

CHAPTER 14 Monte Carlo Methods ..... 707
14.1 Introduction ..... 707
14.2 Monte Carlo Methods: The Main Concept ..... 708
14.2.1 Random Number Generation ..... 709
14.3 Random Sampling Based on Function Transformation ..... 711
14.4 Rejection Sampling ..... 715
14.5 Importance Sampling ..... 718
14.6 Monte Carlo Methods and the EM Algorithm ..... 720
14.7 Markov Chain Monte Carlo Methods ..... 721
14.7.1 Ergodic Markov Chains ..... 723
14.8 The Metropolis Method ..... 728
14.8.1 Convergence Issues ..... 731
14.9 Gibbs Sampling ..... 733
14.10 In Search of More Efficient Methods: A Discussion ..... 735
14.11 A Case Study: Change-Point Detection ..... 737
Problems ..... 740
References ..... 742

CHAPTER 15 Probabilistic Graphical Models: Part I ..... 745
15.1 Introduction ..... 745
15.2 The Need for Graphical Models ..... 746
15.3 Bayesian Networks and the Markov Condition ..... 748
15.3.1 Graphs: Basic Definitions ..... 749
15.3.2 Some Hints on Causality ..... 753
15.3.3 D-Separation ..... 755
15.3.4 Sigmoidal Bayesian Networks ..... 758
15.3.5 Linear Gaussian Models ..... 759
15.3.6 Multiple-Cause Networks ..... 760
15.3.7 I-Maps, Soundness, Faithfulness, and Completeness ..... 761
15.4 Undirected Graphical Models ..... 762
15.4.1 Independencies and I-Maps in Markov Random Fields ..... 763
15.4.2 The Ising Model and Its Variants ..... 765
15.4.3 Conditional Random Fields (CRFs) ..... 767
15.5 Factor Graphs ..... 768
15.5.1 Graphical Models for Error-Correcting Codes ..... 770
15.6 Moralization of Directed Graphs ..... 772
15.7 Exact Inference Methods: Message-Passing Algorithms ..... 773
15.7.1 Exact Inference in Chains ..... 773
15.7.2 Exact Inference in Trees ..... 777
15.7.3 The Sum-Product Algorithm ..... 778
15.7.4 The Max-Product and Max-Sum Algorithms ..... 782
Problems ..... 789
References ..... 791

CHAPTER 16 Probabilistic Graphical Models: Part II ..... 795
16.1 Introduction ..... 795
16.2 Triangulated Graphs and Junction Trees ..... 796
16.2.1 Constructing a Join Tree ..... 799
16.2.2 Message-Passing in Junction Trees ..... 801
16.3 Approximate Inference Methods ..... 804
16.3.1 Variational Methods: Local Approximation ..... 804
16.3.2 Block Methods for Variational Approximation ..... 809
16.3.3 Loopy Belief Propagation ..... 813
16.4 Dynamic Graphical Models ..... 816
16.5 Hidden Markov Models ..... 818
16.5.1 Inference ..... 821
16.5.2 Learning the Parameters in an HMM ..... 825
16.5.3 Discriminative Learning ..... 828
16.6 Beyond HMMs: A Discussion ..... 829
16.6.1 Factorial Hidden Markov Models ..... 829
16.6.2 Time-Varying Dynamic Bayesian Networks ..... 832
16.7 Learning Graphical Models ..... 833
16.7.1 Parameter Estimation ..... 833
16.7.2 Learning the Structure ..... 837
Problems ..... 838
References ..... 840

CHAPTER 17 Particle Filtering ..... 845
17.1 Introduction ..... 845
17.2 Sequential Importance Sampling ..... 845
17.2.1 Importance Sampling Revisited ..... 846
17.2.2 Resampling ..... 847
17.2.3 Sequential Sampling ..... 849
17.3 Kalman and Particle Filtering ..... 851
17.3.1 Kalman Filtering: A Bayesian Point of View ..... 852
17.4 Particle Filtering ..... 854
17.4.1 Degeneracy ..... 858
17.4.2 Generic Particle Filtering ..... 860
17.4.3 Auxiliary Particle Filtering ..... 862
Problems ..... 868
References ..... 872

CHAPTER 18 Neural Networks and Deep Learning ..... 875
18.1 Introduction ..... 876
18.2 The Perceptron ..... 877
18.2.1 The Kernel Perceptron Algorithm ..... 881
18.3 Feed-Forward Multilayer Neural Networks ..... 882
18.4 The Backpropagation Algorithm ..... 886
18.4.1 The Gradient Descent Scheme ..... 887
18.4.2 Beyond the Gradient Descent Rationale ..... 895
18.4.3 Selecting a Cost Function ..... 896
18.5 Pruning the Network ..... 897
18.6 Universal Approximation Property of Feed-Forward Neural Networks ..... 899
18.7 Neural Networks: A Bayesian Flavor ..... 902
18.8 Learning Deep Networks ..... 903
18.8.1 The Need for Deep Architectures ..... 904
18.8.2 Training Deep Networks ..... 905
18.8.3 Training Restricted Boltzmann Machines ..... 908
18.8.4 Training Deep Feed-Forward Networks ..... 914
18.9 Deep Belief Networks ..... 916
18.10 Variations on the Deep Learning Theme ..... 918
18.10.1 Gaussian Units ..... 918
18.10.2 Stacked Autoencoders ..... 919
18.10.3 The Conditional RBM ..... 920
18.11 Case Study: A Deep Network for Optical Character Recognition ..... 923
18.12 Case Study: A Deep Autoencoder ..... 925
18.13 Example: Generating Data via a DBN ..... 928
Problems ..... 929
References ..... 932

CHAPTER 19 Dimensionality Reduction and Latent Variables Modeling ..... 937
19.1 Introduction ..... 938
19.2 Intrinsic Dimensionality ..... 939
19.3 Principal Component Analysis ..... 939
19.4 Canonical Correlation Analysis ..... 950
19.4.1 Relatives of CCA ..... 953
19.5 Independent Component Analysis ..... 955
19.5.1 ICA and Gaussianity ..... 956
19.5.2 ICA and Higher Order Cumulants ..... 957
19.5.3 Non-Gaussianity and Independent Components ..... 958
19.5.4 ICA Based on Mutual Information ..... 959
19.5.5 Alternative Paths to ICA ..... 962
19.6 Dictionary Learning: The K-SVD Algorithm ..... 966
19.7 Nonnegative Matrix Factorization ..... 971
19.8 Learning Low-Dimensional Models: A Probabilistic Perspective ..... 972
19.8.1 Factor Analysis ..... 972
19.8.2 Probabilistic PCA ..... 974
19.8.3 Mixture of Factors Analyzers: A Bayesian View to Compressed Sensing ..... 977
19.9 Nonlinear Dimensionality Reduction ..... 980
19.9.1 Kernel PCA ..... 980
19.9.2 Graph-Based Methods ..... 982
19.10 Low-Rank Matrix Factorization: A Sparse Modeling Path ..... 991
19.10.1 Matrix Completion ..... 991
19.10.2 Robust PCA ..... 995
19.10.3 Applications of Matrix Completion and Robust PCA ..... 996
19.11 A Case Study: fMRI Data Analysis ..... 998
Problems ..... 1002
References ..... 1003

APPENDIX A Linear Algebra ..... 1013
A.1 Properties of Matrices ..... 1013
A.2 Positive Definite and Symmetric Matrices ..... 1015
A.3 Wirtinger Calculus ..... 1016
References ..... 1017

APPENDIX B Probability Theory and Statistics ..... 1019
B.1 Cramér-Rao Bound ..... 1019
B.2 Characteristic Functions ..... 1020
B.3 Moments and Cumulants ..... 1020
B.4 Edgeworth Expansion of a PDF ..... 1021
Reference ..... 1022

APPENDIX C Hints on Constrained Optimization ..... 1023
C.1 Equality Constraints ..... 1023
C.2 Inequality Constraints ..... 1025
References ..... 1029

Index ..... 1031
Gain an in-depth understanding of all the main machine learning methods, including the very latest trends, such as sparse modeling, online convex optimization, nonparametric Bayesian modeling, deep learning, learning in RKH spaces, dimensionality reduction, and dictionary learning.

This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches (which are based on optimization techniques) together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models.

The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing, and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts.

The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models.
|
any_adam_object | 1 |
author | Theodoridis, Sergios 1951- |
author_GND | (DE-588)12164135X |
author_facet | Theodoridis, Sergios 1951- |
author_role | aut |
author_sort | Theodoridis, Sergios 1951- |
author_variant | s t st |
building | Verbundindex |
bvnumber | BV042385052 |
classification_rvk | ST 300 ST 302 |
ctrlnum | (OCoLC)910913108 (DE-599)BVBBV042385052 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01916nam a2200397 c 4500</leader><controlfield tag="001">BV042385052</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20181127 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">150302s2015 ad|| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780128015223</subfield><subfield code="9">978-0-12-801522-3</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="z">0128015225</subfield><subfield code="9">0128015225</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)910913108</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV042385052</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-706</subfield><subfield code="a">DE-573</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-523</subfield><subfield code="a">DE-863</subfield><subfield code="a">DE-20</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 300</subfield><subfield code="0">(DE-625)143650:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 302</subfield><subfield code="0">(DE-625)143652:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Theodoridis, Sergios</subfield><subfield code="d">1951-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)12164135X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Machine learning</subfield><subfield code="b">a Bayesian and optimization perspective</subfield><subfield code="c">Sergios Theodoridis</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Amsterdam [u.a.]</subfield><subfield code="b">Elsevier, Academic Press</subfield><subfield code="c">2015</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXI, 1050 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Optimierung</subfield><subfield code="0">(DE-588)4043664-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Bayes-Verfahren</subfield><subfield code="0">(DE-588)4204326-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Bayes-Verfahren</subfield><subfield code="0">(DE-588)4204326-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Optimierung</subfield><subfield code="0">(DE-588)4043664-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-027821053</subfield></datafield></record></collection> |
id | DE-604.BV042385052 |
illustrated | Illustrated |
indexdate | 2025-02-20T06:37:48Z |
institution | BVB |
isbn | 9780128015223 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-027821053 |
oclc_num | 910913108 |
open_access_boolean | |
owner | DE-29T DE-739 DE-706 DE-573 DE-11 DE-523 DE-863 DE-BY-FWS DE-20 |
owner_facet | DE-29T DE-739 DE-706 DE-573 DE-11 DE-523 DE-863 DE-BY-FWS DE-20 |
physical | XXI, 1050 S. Ill., graph. Darst. |
publishDate | 2015 |
publishDateSearch | 2015 |
publishDateSort | 2015 |
publisher | Elsevier, Academic Press |
record_format | marc |
spellingShingle | Theodoridis, Sergios 1951- Machine learning a Bayesian and optimization perspective Maschinelles Lernen (DE-588)4193754-5 gnd Optimierung (DE-588)4043664-0 gnd Bayes-Verfahren (DE-588)4204326-8 gnd |
subject_GND | (DE-588)4193754-5 (DE-588)4043664-0 (DE-588)4204326-8 |
title | Machine learning a Bayesian and optimization perspective |
title_auth | Machine learning a Bayesian and optimization perspective |
title_exact_search | Machine learning a Bayesian and optimization perspective |
title_full | Machine learning a Bayesian and optimization perspective Sergios Theodoridis |
title_fullStr | Machine learning a Bayesian and optimization perspective Sergios Theodoridis |
title_full_unstemmed | Machine learning a Bayesian and optimization perspective Sergios Theodoridis |
title_short | Machine learning |
title_sort | machine learning a bayesian and optimization perspective |
title_sub | a Bayesian and optimization perspective |
topic | Maschinelles Lernen (DE-588)4193754-5 gnd Optimierung (DE-588)4043664-0 gnd Bayes-Verfahren (DE-588)4204326-8 gnd |
topic_facet | Maschinelles Lernen Optimierung Bayes-Verfahren |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT theodoridissergios machinelearningabayesianandoptimizationperspective |
Table of Contents
THWS Würzburg Zentralbibliothek Lesesaal
Call Number: |
1000 ST 302 T388 |
---|---|
Copy 1 | loanable Available Order |