Machine learning: a Bayesian and optimization perspective
Saved in:
Main Author: | Theodoridis, Sergios 1951- |
---|---|
Format: | Book |
Language: | English |
Published: | Amsterdam [u.a.]: Elsevier, Academic Press, 2015 |
Subjects: | Maschinelles Lernen; Optimierung; Bayes-Verfahren |
Online Access: | Table of Contents; Blurb |
Physical Description: | XXI, 1050 pages, illustrations, diagrams |
ISBN: | 9780128015223 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV042385052 | ||
003 | DE-604 | ||
005 | 20181127 | ||
007 | t | ||
008 | 150302s2015 ad|| |||| 00||| eng d | ||
020 | |a 9780128015223 |9 978-0-12-801522-3 | ||
020 | |z 0128015225 |9 0128015225 | ||
035 | |a (OCoLC)910913108 | ||
035 | |a (DE-599)BVBBV042385052 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-29T |a DE-739 |a DE-706 |a DE-573 |a DE-11 |a DE-523 |a DE-863 |a DE-20 | ||
084 | |a ST 300 |0 (DE-625)143650: |2 rvk | ||
084 | |a ST 302 |0 (DE-625)143652: |2 rvk | ||
100 | 1 | |a Theodoridis, Sergios |d 1951- |e Verfasser |0 (DE-588)12164135X |4 aut | |
245 | 1 | 0 | |a Machine learning |b a Bayesian and optimization perspective |c Sergios Theodoridis |
264 | 1 | |a Amsterdam [u.a.] |b Elsevier, Academic Press |c 2015 | |
300 | |a XXI, 1050 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Optimierung |0 (DE-588)4043664-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Bayes-Verfahren |0 (DE-588)4204326-8 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 0 | 1 | |a Bayes-Verfahren |0 (DE-588)4204326-8 |D s |
689 | 0 | 2 | |a Optimierung |0 (DE-588)4043664-0 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
999 | |a oai:aleph.bib-bvb.de:BVB01-027821053 |
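
For readers who want to work with this record programmatically, here is a minimal sketch (not part of the catalog data) of how the MARCXML serialization shown in the fullrecord index field further down can be read with Python's standard library alone. The file name record.xml is an assumption: save the MARCXML to that file first.

```python
# Minimal sketch: parse the MARCXML form of this record (assumed saved as
# record.xml, a hypothetical file name) using only the standard library.
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# The document root is <collection>; the record is its child element.
record = ET.parse("record.xml").getroot().find("marc:record", NS)

def subfields(tag):
    """Yield (code, text) pairs for all subfields of every field with this tag."""
    for field in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        for sf in field.findall("marc:subfield", NS):
            yield sf.get("code"), sf.text

# Field 245 (title statement): $a title, $b subtitle, $c statement of responsibility.
title = dict(subfields("245"))
print(f"{title['a']} : {title['b']} / {title['c']}")
# -> Machine learning : a Bayesian and optimization perspective / Sergios Theodoridis

# Field 650 (subject headings): $a carries the GND heading itself.
subjects = [text for code, text in subfields("650") if code == "a"]
print(subjects)  # -> ['Maschinelles Lernen', 'Optimierung', 'Bayes-Verfahren']
```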
Record in the search index
DE-BY-863_location | 1000 |
---|---|
DE-BY-FWS_call_number | 1000/ST 302 T388 |
DE-BY-FWS_katkey | 707142 |
DE-BY-FWS_media_number | 083101414536 |
_version_ | 1824553559145840640 |
adam_text | Contents

Preface
Acknowledgments ..... xix
Notation ..... xxi

CHAPTER 1 Introduction ..... 1
1.1 What Machine Learning is About ..... 1
1.1.1 Classification ..... 2
1.1.2 Regression ..... 3
1.2 Structure and a Road Map of the Book ..... 5
References ..... 8

CHAPTER 2 Probability and Stochastic Processes ..... 9
2.1 Introduction ..... 10
2.2 Probability and Random Variables ..... 10
2.2.1 Probability ..... 11
2.2.2 Discrete Random Variables ..... 12
2.2.3 Continuous Random Variables ..... 14
2.2.4 Mean and Variance ..... 15
2.2.5 Transformation of Random Variables ..... 17
2.3 Examples of Distributions ..... 18
2.3.1 Discrete Variables ..... 18
2.3.2 Continuous Variables ..... 20
2.4 Stochastic Processes ..... 29
2.4.1 First and Second Order Statistics ..... 30
2.4.2 Stationarity and Ergodicity ..... 30
2.4.3 Power Spectral Density ..... 33
2.4.4 Autoregressive Models ..... 38
2.5 Information Theory ..... 41
2.5.1 Discrete Random Variables ..... 42
2.5.2 Continuous Random Variables ..... 45
2.6 Stochastic Convergence ..... 48
Problems ..... 49
References ..... 51

CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions ..... 53
3.1 Introduction ..... 53
3.2 Parameter Estimation: The Deterministic Point of View ..... 54
3.3 Linear Regression ..... 57
3.4 Classification ..... 60
3.5 Biased Versus Unbiased Estimation ..... 64
3.5.1 Biased or Unbiased Estimation? ..... 65
3.6 The Cramér-Rao Lower Bound ..... 67
3.7 Sufficient Statistic ..... 70
3.8 Regularization ..... 72
3.9 The Bias-Variance Dilemma ..... 77
3.9.1 Mean-Square Error Estimation ..... 77
3.9.2 Bias-Variance Tradeoff ..... 78
3.10 Maximum Likelihood Method ..... 82
3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case ..... 84
3.11 Bayesian Inference ..... 84
3.11.1 The Maximum a Posteriori Probability Estimation Method ..... 88
3.12 Curse of Dimensionality ..... 89
3.13 Validation ..... 91
3.14 Expected and Empirical Loss Functions ..... 93
3.15 Nonparametric Modeling and Estimation ..... 95
Problems ..... 97
References ..... 102

CHAPTER 4 Mean-Square Error Linear Estimation ..... 105
4.1 Introduction ..... 105
4.2 Mean-Square Error Linear Estimation: The Normal Equations ..... 106
4.2.1 The Cost Function Surface ..... 107
4.3 A Geometric Viewpoint: Orthogonality Condition ..... 109
4.4 Extension to Complex-Valued Variables ..... 111
4.4.1 Widely Linear Complex-Valued Estimation ..... 113
4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus ..... 116
4.5 Linear Filtering ..... 118
4.6 MSE Linear Filtering: A Frequency Domain Point of View ..... 120
4.7 Some Typical Applications ..... 124
4.7.1 Interference Cancellation ..... 124
4.7.2 System Identification ..... 125
4.7.3 Deconvolution: Channel Equalization ..... 126
4.8 Algorithmic Aspects: The Levinson and the Lattice-Ladder Algorithms ..... 132
4.8.1 The Lattice-Ladder Scheme ..... 137
4.9 Mean-Square Error Estimation of Linear Models ..... 140
4.9.1 The Gauss-Markov Theorem ..... 143
4.9.2 Constrained Linear Estimation: The Beamforming Case ..... 145
4.10 Time-Varying Statistics: Kalman Filtering ..... 148
Problems ..... 154
References ..... 158

CHAPTER 5 Stochastic Gradient Descent: The LMS Algorithm and its Family ..... 161
5.1 Introduction ..... 162
5.2 The Steepest Descent Method ..... 163
5.3 Application to the Mean-Square Error Cost Function ..... 167
5.3.1 The Complex-Valued Case ..... 175
5.4 Stochastic Approximation ..... 177
5.5 The Least-Mean-Squares Adaptive Algorithm ..... 179
5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments ..... 181
5.5.2 Cumulative Loss Bounds ..... 186
5.6 The Affine Projection Algorithm ..... 188
5.6.1 The Normalized LMS ..... 193
5.7 The Complex-Valued Case ..... 194
5.8 Relatives of the LMS ..... 196
5.9 Simulation Examples ..... 199
5.10 Adaptive Decision Feedback Equalization ..... 202
5.11 The Linearly Constrained LMS ..... 204
5.12 Tracking Performance of the LMS in Nonstationary Environments ..... 206
5.13 Distributed Learning: The Distributed LMS ..... 208
5.13.1 Cooperation Strategies ..... 209
5.13.2 The Diffusion LMS ..... 211
5.13.3 Convergence and Steady-State Performance: Some Highlights ..... 218
5.13.4 Consensus-Based Distributed Schemes ..... 220
5.14 A Case Study: Target Localization ..... 222
5.15 Some Concluding Remarks: Consensus Matrix ..... 223
Problems ..... 224
References ..... 227

CHAPTER 6 The Least-Squares Family ..... 233
6.1 Introduction ..... 234
6.2 Least-Squares Linear Regression: A Geometric Perspective ..... 234
6.3 Statistical Properties of the LS Estimator ..... 236
6.4 Orthogonalizing the Column Space of X: The SVD Method ..... 239
6.5 Ridge Regression ..... 243
6.6 The Recursive Least-Squares Algorithm ..... 245
6.7 Newton's Iterative Minimization Method ..... 248
6.7.1 RLS and Newton's Method ..... 251
6.8 Steady-State Performance of the RLS ..... 252
6.9 Complex-Valued Data: The Widely Linear RLS ..... 254
6.10 Computational Aspects of the LS Solution ..... 255
6.11 The Coordinate and Cyclic Coordinate Descent Methods ..... 258
6.12 Simulation Examples ..... 259
6.13 Total-Least-Squares ..... 261
Problems ..... 268
References ..... 272

CHAPTER 7 Classification: A Tour of the Classics ..... 275
7.1 Introduction ..... 275
7.2 Bayesian Classification ..... 276
7.2.1 Average Risk ..... 278
7.3 Decision (Hyper)Surfaces ..... 280
7.3.1 The Gaussian Distribution Case ..... 282
7.4 The Naive Bayes Classifier ..... 287
7.5 The Nearest Neighbor Rule ..... 288
7.6 Logistic Regression ..... 290
7.7 Fisher's Linear Discriminant ..... 294
7.8 Classification Trees ..... 300
7.9 Combining Classifiers ..... 304
7.10 The Boosting Approach ..... 307
7.11 Boosting Trees ..... 313
7.12 A Case Study: Protein Folding Prediction ..... 314
Problems ..... 318
References ..... 323

CHAPTER 8 Parameter Learning: A Convex Analytic Path ..... 327
8.1 Introduction ..... 328
8.2 Convex Sets and Functions ..... 329
8.2.1 Convex Sets ..... 329
8.2.2 Convex Functions ..... 330
8.3 Projections onto Convex Sets ..... 333
8.3.1 Properties of Projections ..... 337
8.4 Fundamental Theorem of Projections onto Convex Sets ..... 341
8.5 A Parallel Version of POCS ..... 344
8.6 From Convex Sets to Parameter Estimation and Machine Learning ..... 345
8.6.1 Regression ..... 345
8.6.2 Classification ..... 347
8.7 Infinitely Many Closed Convex Sets: The Online Learning Case ..... 349
8.7.1 Convergence of APSM ..... 351
8.8 Constrained Learning ..... 356
8.9 The Distributed APSM ..... 357
8.10 Optimizing Nonsmooth Convex Cost Functions ..... 358
8.10.1 Subgradients and Subdifferentials ..... 359
8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case ..... 362
8.10.3 Online Learning for Convex Optimization ..... 367
8.11 Regret Analysis ..... 370
8.12 Online Learning and Big Data Applications: A Discussion ..... 374
8.13 Proximal Operators ..... 379
8.13.1 Properties of the Proximal Operator ..... 382
8.13.2 Proximal Minimization ..... 383
8.14 Proximal Splitting Methods for Optimization ..... 385
Problems ..... 389
8.15 Appendix to Chapter 8 ..... 393
References ..... 398

CHAPTER 9 Sparsity-Aware Learning: Concepts and Theoretical Foundations ..... 403
9.1 Introduction ..... 403
9.2 Searching for a Norm ..... 404
9.3 The Least Absolute Shrinkage and Selection Operator (LASSO) ..... 407
9.4 Sparse Signal Representation ..... 411
9.5 In Search of the Sparsest Solution ..... 415
9.6 Uniqueness of the ℓ0 Minimizer ..... 422
9.6.1 Mutual Coherence ..... 424
9.7 Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions ..... 426
9.7.1 Condition Implied by the Mutual Coherence Number ..... 426
9.7.2 The Restricted Isometry Property (RIP) ..... 427
9.8 Robust Sparse Signal Recovery from Noisy Measurements ..... 429
9.9 Compressed Sensing: The Glory of Randomness ..... 430
9.9.1 Dimensionality Reduction and Stable Embeddings ..... 433
9.9.2 Sub-Nyquist Sampling: Analog-to-Information Conversion ..... 434
9.10 A Case Study: Image De-Noising ..... 438
Problems ..... 440
References ..... 444

CHAPTER 10 Sparsity-Aware Learning: Algorithms and Applications ..... 449
10.1 Introduction ..... 450
10.2 Sparsity-Promoting Algorithms ..... 450
10.2.1 Greedy Algorithms ..... 451
10.2.2 Iterative Shrinkage/Thresholding (IST) Algorithms ..... 456
10.2.3 Which Algorithm?: Some Practical Hints ..... 462
10.3 Variations on the Sparsity-Aware Theme ..... 467
10.4 Online Sparsity-Promoting Algorithms ..... 475
10.4.1 LASSO: Asymptotic Performance ..... 475
10.4.2 The Adaptive Norm-Weighted LASSO ..... 477
10.4.3 Adaptive CoSaMP (AdCoSaMP) Algorithm ..... 479
10.4.4 Sparse Adaptive Projection Subgradient Method (SpAPSM) ..... 480
10.5 Learning Sparse Analysis Models ..... 485
10.5.1 Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries ..... 487
10.5.2 Cosparsity ..... 488
10.6 A Case Study: Time-Frequency Analysis ..... 490
10.7 Appendix to Chapter 10: Some Hints from the Theory of Frames ..... 497
Problems ..... 500
References ..... 502

CHAPTER 11 Learning in Reproducing Kernel Hilbert Spaces ..... 509
11.1 Introduction ..... 510
11.2 Generalized Linear Models ..... 510
11.3 Volterra, Wiener, and Hammerstein Models ..... 511
11.4 Cover's Theorem: Capacity of a Space in Linear Dichotomies ..... 514
11.5 Reproducing Kernel Hilbert Spaces ..... 517
11.5.1 Some Properties and Theoretical Highlights ..... 519
11.5.2 Examples of Kernel Functions ..... 520
11.6 Representer Theorem ..... 525
11.6.1 Semiparametric Representer Theorem ..... 527
11.6.2 Nonparametric Modeling: A Discussion ..... 528
11.7 Kernel Ridge Regression ..... 528
11.8 Support Vector Regression ..... 530
11.8.1 The Linear ε-Insensitive Optimal Regression ..... 531
11.9 Kernel Ridge Regression Revisited ..... 537
11.10 Optimal Margin Classification: Support Vector Machines ..... 538
11.10.1 Linearly Separable Classes: Maximum Margin Classifiers ..... 540
11.10.2 Nonseparable Classes ..... 545
11.10.3 Performance of SVMs and Applications ..... 550
11.10.4 Choice of Hyperparameters ..... 550
11.11 Computational Considerations ..... 551
11.11.1 Multiclass Generalizations ..... 552
11.12 Online Learning in RKHS ..... 553
11.12.1 The Kernel LMS (KLMS) ..... 553
11.12.2 The Naive Online Rreg Minimization Algorithm (NORMA) ..... 556
11.12.3 The Kernel APSM Algorithm ..... 560
11.13 Multiple Kernel Learning ..... 567
11.14 Nonparametric Sparsity-Aware Learning: Additive Models ..... 568
11.15 A Case Study: Authorship Identification ..... 570
Problems ..... 574
References ..... 578

CHAPTER 12 Bayesian Learning: Inference and the EM Algorithm ..... 585
12.1 Introduction ..... 586
12.2 Regression: A Bayesian Perspective ..... 586
12.2.1 The Maximum Likelihood Estimator ..... 587
12.2.2 The MAP Estimator ..... 588
12.2.3 The Bayesian Approach ..... 589
12.3 The Evidence Function and Occam's Razor Rule ..... 593
12.4 Exponential Family of Probability Distributions ..... 600
12.4.1 The Exponential Family and the Maximum Entropy Method ..... 605
12.5 Latent Variables and the EM Algorithm ..... 606
12.5.1 The Expectation-Maximization Algorithm ..... 606
12.5.2 The EM Algorithm: A Lower Bound Maximization View ..... 608
12.6 Linear Regression and the EM Algorithm ..... 610
12.7 Gaussian Mixture Models ..... 613
12.7.1 Gaussian Mixture Modeling and Clustering ..... 617
12.8 Combining Learning Models: A Probabilistic Point of View ..... 621
12.8.1 Mixing Linear Regression Models ..... 622
12.8.2 Mixing Logistic Regression Models ..... 625
Problems ..... 628
12.9 Appendix to Chapter 12 ..... 631
12.9.1 PDFs with Exponent of Quadratic Form ..... 631
12.9.2 The Conditional from the Joint Gaussian PDF ..... 632
12.9.3 The Marginal from the Joint Gaussian PDF ..... 633
12.9.4 The Posterior from Gaussian Prior and Conditional PDFs ..... 634
References ..... 637

CHAPTER 13 Bayesian Learning: Approximate Inference and Nonparametric Models ..... 639
13.1 Introduction ..... 640
13.2 Variational Approximation in Bayesian Learning ..... 640
13.2.1 The Case of the Exponential Family of Probability Distributions ..... 644
13.3 A Variational Bayesian Approach to Linear Regression ..... 645
13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling ..... 651
13.5 When Bayesian Inference Meets Sparsity ..... 655
13.6 Sparse Bayesian Learning (SBL) ..... 657
13.6.1 The Spike and Slab Method ..... 660
13.7 The Relevance Vector Machine Framework ..... 661
13.7.1 Adopting the Logistic Regression Model for Classification ..... 662
13.8 Convex Duality and Variational Bounds ..... 666
13.9 Sparsity-Aware Regression: A Variational Bound Bayesian Path ..... 671
13.10 Sparsity-Aware Learning: Some Concluding Remarks ..... 675
13.11 Expectation Propagation ..... 679
13.12 Nonparametric Bayesian Modeling ..... 683
13.12.1 The Chinese Restaurant Process ..... 684
13.12.2 Inference ..... 684
13.12.3 Dirichlet Processes ..... 684
13.12.4 The Stick-Breaking Construction of a DP ..... 685
13.13 Gaussian Processes ..... 687
13.13.1 Covariance Functions and Kernels ..... 688
13.13.2 Regression ..... 690
13.13.3 Classification ..... 692
13.14 A Case Study: Hyperspectral Image Unmixing ..... 693
13.14.1 Hierarchical Bayesian Modeling ..... 695
13.14.2 Experimental Results ..... 696
Problems ..... 699
References ..... 702

CHAPTER 14 Monte Carlo Methods ..... 707
14.1 Introduction ..... 707
14.2 Monte Carlo Methods: The Main Concept ..... 708
14.2.1 Random Number Generation ..... 709
14.3 Random Sampling Based on Function Transformation ..... 711
14.4 Rejection Sampling ..... 715
14.5 Importance Sampling ..... 718
14.6 Monte Carlo Methods and the EM Algorithm ..... 720
14.7 Markov Chain Monte Carlo Methods ..... 721
14.7.1 Ergodic Markov Chains ..... 723
14.8 The Metropolis Method ..... 728
14.8.1 Convergence Issues ..... 731
14.9 Gibbs Sampling ..... 733
14.10 In Search of More Efficient Methods: A Discussion ..... 735
14.11 A Case Study: Change-Point Detection ..... 737
Problems ..... 740
References ..... 742

CHAPTER 15 Probabilistic Graphical Models: Part I ..... 745
15.1 Introduction ..... 745
15.2 The Need for Graphical Models ..... 746
15.3 Bayesian Networks and the Markov Condition ..... 748
15.3.1 Graphs: Basic Definitions ..... 749
15.3.2 Some Hints on Causality ..... 753
15.3.3 D-Separation ..... 755
15.3.4 Sigmoidal Bayesian Networks ..... 758
15.3.5 Linear Gaussian Models ..... 759
15.3.6 Multiple-Cause Networks ..... 760
15.3.7 I-Maps, Soundness, Faithfulness, and Completeness ..... 761
15.4 Undirected Graphical Models ..... 762
15.4.1 Independencies and I-Maps in Markov Random Fields ..... 763
15.4.2 The Ising Model and Its Variants ..... 765
15.4.3 Conditional Random Fields (CRFs) ..... 767
15.5 Factor Graphs ..... 768
15.5.1 Graphical Models for Error-Correcting Codes ..... 770
15.6 Moralization of Directed Graphs ..... 772
15.7 Exact Inference Methods: Message-Passing Algorithms ..... 773
15.7.1 Exact Inference in Chains ..... 773
15.7.2 Exact Inference in Trees ..... 777
15.7.3 The Sum-Product Algorithm ..... 778
15.7.4 The Max-Product and Max-Sum Algorithms ..... 782
Problems ..... 789
References ..... 791

CHAPTER 16 Probabilistic Graphical Models: Part II ..... 795
16.1 Introduction ..... 795
16.2 Triangulated Graphs and Junction Trees ..... 796
16.2.1 Constructing a Join Tree ..... 799
16.2.2 Message-Passing in Junction Trees ..... 801
16.3 Approximate Inference Methods ..... 804
16.3.1 Variational Methods: Local Approximation ..... 804
16.3.2 Block Methods for Variational Approximation ..... 809
16.3.3 Loopy Belief Propagation ..... 813
16.4 Dynamic Graphical Models ..... 816
16.5 Hidden Markov Models ..... 818
16.5.1 Inference ..... 821
16.5.2 Learning the Parameters in an HMM ..... 825
16.5.3 Discriminative Learning ..... 828
16.6 Beyond HMMs: A Discussion ..... 829
16.6.1 Factorial Hidden Markov Models ..... 829
16.6.2 Time-Varying Dynamic Bayesian Networks ..... 832
16.7 Learning Graphical Models ..... 833
16.7.1 Parameter Estimation ..... 833
16.7.2 Learning the Structure ..... 837
Problems ..... 838
References ..... 840

CHAPTER 17 Particle Filtering ..... 845
17.1 Introduction ..... 845
17.2 Sequential Importance Sampling ..... 845
17.2.1 Importance Sampling Revisited ..... 846
17.2.2 Resampling ..... 847
17.2.3 Sequential Sampling ..... 849
17.3 Kalman and Particle Filtering ..... 851
17.3.1 Kalman Filtering: A Bayesian Point of View ..... 852
17.4 Particle Filtering ..... 854
17.4.1 Degeneracy ..... 858
17.4.2 Generic Particle Filtering ..... 860
17.4.3 Auxiliary Particle Filtering ..... 862
Problems ..... 868
References ..... 872

CHAPTER 18 Neural Networks and Deep Learning ..... 875
18.1 Introduction ..... 876
18.2 The Perceptron ..... 877
18.2.1 The Kernel Perceptron Algorithm ..... 881
18.3 Feed-Forward Multilayer Neural Networks ..... 882
18.4 The Backpropagation Algorithm ..... 886
18.4.1 The Gradient Descent Scheme ..... 887
18.4.2 Beyond the Gradient Descent Rationale ..... 895
18.4.3 Selecting a Cost Function ..... 896
18.5 Pruning the Network ..... 897
18.6 Universal Approximation Property of Feed-Forward Neural Networks ..... 899
18.7 Neural Networks: A Bayesian Flavor ..... 902
18.8 Learning Deep Networks ..... 903
18.8.1 The Need for Deep Architectures ..... 904
18.8.2 Training Deep Networks ..... 905
18.8.3 Training Restricted Boltzmann Machines ..... 908
18.8.4 Training Deep Feed-Forward Networks ..... 914
18.9 Deep Belief Networks ..... 916
18.10 Variations on the Deep Learning Theme ..... 918
18.10.1 Gaussian Units ..... 918
18.10.2 Stacked Autoencoders ..... 919
18.10.3 The Conditional RBM ..... 920
18.11 Case Study: A Deep Network for Optical Character Recognition ..... 923
18.12 Case Study: A Deep Autoencoder ..... 925
18.13 Example: Generating Data via a DBN ..... 928
Problems ..... 929
References ..... 932

CHAPTER 19 Dimensionality Reduction and Latent Variables Modeling ..... 937
19.1 Introduction ..... 938
19.2 Intrinsic Dimensionality ..... 939
19.3 Principal Component Analysis ..... 939
19.4 Canonical Correlation Analysis ..... 950
19.4.1 Relatives of CCA ..... 953
19.5 Independent Component Analysis ..... 955
19.5.1 ICA and Gaussianity ..... 956
19.5.2 ICA and Higher Order Cumulants ..... 957
19.5.3 Non-Gaussianity and Independent Components ..... 958
19.5.4 ICA Based on Mutual Information ..... 959
19.5.5 Alternative Paths to ICA ..... 962
19.6 Dictionary Learning: The K-SVD Algorithm ..... 966
19.7 Nonnegative Matrix Factorization ..... 971
19.8 Learning Low-Dimensional Models: A Probabilistic Perspective ..... 972
19.8.1 Factor Analysis ..... 972
19.8.2 Probabilistic PCA ..... 974
19.8.3 Mixture of Factors Analyzers: A Bayesian View to Compressed Sensing ..... 977
19.9 Nonlinear Dimensionality Reduction ..... 980
19.9.1 Kernel PCA ..... 980
19.9.2 Graph-Based Methods ..... 982
19.10 Low-Rank Matrix Factorization: A Sparse Modeling Path ..... 991
19.10.1 Matrix Completion ..... 991
19.10.2 Robust PCA ..... 995
19.10.3 Applications of Matrix Completion and Robust PCA ..... 996
19.11 A Case Study: fMRI Data Analysis ..... 998
Problems ..... 1002
References ..... 1003

APPENDIX A Linear Algebra ..... 1013
A.1 Properties of Matrices ..... 1013
A.2 Positive Definite and Symmetric Matrices ..... 1015
A.3 Wirtinger Calculus ..... 1016
References ..... 1017

APPENDIX B Probability Theory and Statistics ..... 1019
B.1 Cramér-Rao Bound ..... 1019
B.2 Characteristic Functions ..... 1020
B.3 Moments and Cumulants ..... 1020
B.4 Edgeworth Expansion of a PDF ..... 1021
Reference ..... 1022

APPENDIX C Hints on Constrained Optimization ..... 1023
C.1 Equality Constraints ..... 1023
C.2 Inequality Constraints ..... 1025
References ..... 1029

Index ..... 1031
Gain an in-depth understanding of all the main machine learning methods, including the very latest trends, such as sparse modeling, online convex optimization, nonparametric Bayesian modeling, deep learning, learning in RKH spaces, dimensionality reduction, and dictionary learning.

This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches (which are based on optimization techniques) together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models.

The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing, and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts.

The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models.
|
any_adam_object | 1 |
author | Theodoridis, Sergios 1951- |
author_GND | (DE-588)12164135X |
author_facet | Theodoridis, Sergios 1951- |
author_role | aut |
author_sort | Theodoridis, Sergios 1951- |
author_variant | s t st |
building | Verbundindex |
bvnumber | BV042385052 |
classification_rvk | ST 300 ST 302 |
ctrlnum | (OCoLC)910913108 (DE-599)BVBBV042385052 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01916nam a2200397 c 4500</leader><controlfield tag="001">BV042385052</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20181127 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">150302s2015 ad|| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780128015223</subfield><subfield code="9">978-0-12-801522-3</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="z">0128015225</subfield><subfield code="9">0128015225</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)910913108</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV042385052</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-706</subfield><subfield code="a">DE-573</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-523</subfield><subfield code="a">DE-863</subfield><subfield code="a">DE-20</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 300</subfield><subfield code="0">(DE-625)143650:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 302</subfield><subfield code="0">(DE-625)143652:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Theodoridis, Sergios</subfield><subfield code="d">1951-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)12164135X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Machine learning</subfield><subfield code="b">a Bayesian and optimization perspective</subfield><subfield code="c">Sergios Theodoridis</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Amsterdam [u.a.]</subfield><subfield code="b">Elsevier, Academic Press</subfield><subfield code="c">2015</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXI, 1050 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Optimierung</subfield><subfield code="0">(DE-588)4043664-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Bayes-Verfahren</subfield><subfield code="0">(DE-588)4204326-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Bayes-Verfahren</subfield><subfield code="0">(DE-588)4204326-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Optimierung</subfield><subfield code="0">(DE-588)4043664-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-027821053</subfield></datafield></record></collection> |
id | DE-604.BV042385052 |
illustrated | Illustrated |
indexdate | 2025-02-20T06:37:48Z |
institution | BVB |
isbn | 9780128015223 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-027821053 |
oclc_num | 910913108 |
open_access_boolean | |
owner | DE-29T DE-739 DE-706 DE-573 DE-11 DE-523 DE-863 DE-BY-FWS DE-20 |
owner_facet | DE-29T DE-739 DE-706 DE-573 DE-11 DE-523 DE-863 DE-BY-FWS DE-20 |
physical | XXI, 1050 S. Ill., graph. Darst. |
publishDate | 2015 |
publishDateSearch | 2015 |
publishDateSort | 2015 |
publisher | Elsevier, Academic Press |
record_format | marc |
spellingShingle | Theodoridis, Sergios 1951- Machine learning a Bayesian and optimization perspective Maschinelles Lernen (DE-588)4193754-5 gnd Optimierung (DE-588)4043664-0 gnd Bayes-Verfahren (DE-588)4204326-8 gnd |
subject_GND | (DE-588)4193754-5 (DE-588)4043664-0 (DE-588)4204326-8 |
title | Machine learning a Bayesian and optimization perspective |
title_auth | Machine learning a Bayesian and optimization perspective |
title_exact_search | Machine learning a Bayesian and optimization perspective |
title_full | Machine learning a Bayesian and optimization perspective Sergios Theodoridis |
title_fullStr | Machine learning a Bayesian and optimization perspective Sergios Theodoridis |
title_full_unstemmed | Machine learning a Bayesian and optimization perspective Sergios Theodoridis |
title_short | Machine learning |
title_sort | machine learning a bayesian and optimization perspective |
title_sub | a Bayesian and optimization perspective |
topic | Maschinelles Lernen (DE-588)4193754-5 gnd Optimierung (DE-588)4043664-0 gnd Bayes-Verfahren (DE-588)4204326-8 gnd |
topic_facet | Maschinelles Lernen Optimierung Bayes-Verfahren |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000003&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027821053&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT theodoridissergios machinelearningabayesianandoptimizationperspective |
Table of Contents
THWS Würzburg Zentralbibliothek Lesesaal
Call Number: |
1000 ST 302 T388 |
---|---|
Copy 1 | loanable Available Order |