Speech and audio signal processing: processing and perception of speech and music
Main author: | |
---|---|
Format: | Book |
Language: | English |
Published: | Hoboken, NJ: Wiley, 2011 |
Edition: | 2nd ed. |
Subjects: | |
Online access: | Table of contents |
Description: | XXII, 661 pp., ill., diagrams |
ISBN: | 9780470195369 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV039575118 | ||
003 | DE-604 | ||
005 | 20150626 | ||
007 | t | ||
008 | 110909s2011 ad|| |||| 00||| eng d | ||
015 | |a GBB0D6078 |2 dnb | ||
020 | |a 9780470195369 |9 978-0-470-19536-9 | ||
020 | |z 9780470195369 |9 9780470195369 | ||
035 | |a (OCoLC)697485510 | ||
035 | |a (DE-599)BVBBV039575118 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-83 |a DE-1102 |a DE-92 |a DE-739 | ||
082 | 0 | |a 621.3822 |2 22 | |
084 | |a ZN 6040 |0 (DE-625)157496: |2 rvk | ||
084 | |a ZN 6060 |0 (DE-625)157500: |2 rvk | ||
100 | 1 | |a Gold, Bernard |e Verfasser |4 aut | |
245 | 1 | 0 | |a Speech and audio signal processing |b processing and perception of speech and music |c Ben Gold ; Nelson Morgan ; Ellis Dan |
250 | |a 2. ed. | ||
264 | 1 | |a Hoboken, NJ |b Wiley |c 2011 | |
300 | |a XXII, 661 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 4 | |a Speech processing systems | |
650 | 4 | |a Signal processing / Digital techniques | |
650 | 4 | |a Electronic music | |
650 | 0 | 7 | |a Automatische Sprachproduktion |0 (DE-588)4143703-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Digitale Sprachverarbeitung |0 (DE-588)4233857-8 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Digitale Sprachverarbeitung |0 (DE-588)4233857-8 |D s |
689 | 0 | 1 | |a Automatische Sprachproduktion |0 (DE-588)4143703-2 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Morgan, Nelson |e Sonstige |4 oth | |
700 | 1 | |a Ellis, Dan |e Sonstige |0 (DE-588)138989729 |4 oth | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024426552&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-024426552 |
Record in the search index
_version_ | 1804148403922272256 |
---|---|
adam_text |
PREFACE TO THE 2011 EDITION xxi
0.1 Why We Created a New Edition xxi
0.2 What is New xxi
0.3 A Final Thought xxii
CHAPTER 1 INTRODUCTION 1
1.1 Why We Wrote This Book 1
1.2 How to Use This Book 2
1.3 A Confession 4
1.4 Acknowledgments 8
PART I
HISTORICAL BACKGROUND
CHAPTER 2 SYNTHETIC AUDIO: A BRIEF HISTORY 9
2.1 Von Kempelen 9
2.2 The Voder 9
2.3 Teaching the Operator to Make the Voder “Talk” 11
2.4 Speech Synthesis After the Voder 13
2.5 Music Machines 13
2.6 Exercises 17
CHAPTER 3 SPEECH ANALYSIS AND SYNTHESIS OVERVIEW 21
3.1 Background 21
3.1.1 Transmission of Acoustic Signals 21
3.1.2 Acoustical Telegraphy before Morse Code 22
3.1.3 The Telephone 23
3.1.4 The Channel Vocoder and Bandwidth Compression 23
3.2 Voice-coding concepts 28
3.3 Homer Dudley (1898-1981) 30
3.4 Exercises 38
3.5 Appendix: Hearing of the Fall of Troy 37
CHAPTER 4 BRIEF HISTORY OF AUTOMATIC SPEECH RECOGNITION 40
4.1 Radio Rex 40
4.2 Digit Recognition 42
4.3 Speech Recognition in the 1950s 43
4.4 The 1960s 43
4.4.1 Short-Term Spectral Analysis 45
4.4.2 Pattern Matching 45
4.5 1971-1976 ARPA Project 46
4.6 Achieved by 1976 46
4.7 The 1980s in Automatic Speech Recognition 47
4.7.1 Large Corpora Collection 47
4.7.2 Front Ends 48
4.7.3 Hidden Markov Models 48
4.7.4 The Second (D)ARPA Speech-Recognition Program 49
4.7.5 The Return of Neural Nets 50
4.7.6 Knowledge-Based Approaches 50
4.8 More Recent Work 51
4.9 Some Lessons 53
4.10 Exercises 54
CHAPTER 5 SPEECH-RECOGNITION OVERVIEW 59
5.1 Why Study Automatic Speech Recognition? 59
5.2 Why is Automatic Speech Recognition Hard? 60
5.3 Automatic Speech Recognition Dimensions 62
5.3.1 Task Parameters 62
5.3.2 Sample Domain: Letters of the Alphabet 63
5.4 Components of Automatic Speech Recognition 64
5.5 Final Comments 67
5.6 Exercises 69
PART II
MATHEMATICAL BACKGROUND
CHAPTER 6 DIGITAL SIGNAL PROCESSING 73
6.1 Introduction 73
6.2 The z Transform 73
6.3 Inverse z Transform 74
6.4 Convolution 75
6.5 Sampling 76
6.6 Linear Difference Equations 77
6.7 First-Order Linear Difference Equations 78
6.8 Resonance 79
6.9 Concluding Comments 83
6.10 Exercises 84
CHAPTER 7 DIGITAL FILTERS AND DISCRETE FOURIER TRANSFORM 87
7.1 Introduction 87
7.2 Filtering Concepts 88
7.3 Transformations for Digital Filter Design 92
7.4 Digital Filter Design with Bilinear Transformation 93
7.5 The Discrete Fourier Transform 94
7.6 Fast Fourier Transform Methods 98
7.7 Relation Between the DFT and Digital Filters 100
7.8 Exercises 101
CHAPTER 8 PATTERN CLASSIFICATION 105
8.1 Introduction 105
8.2 Feature Extraction 107
8.2.1 Some Opinions 108
8.3 Pattern-Classification Methods 109
8.3.1 Minimum Distance Classifiers 109
8.3.2 Discriminant Functions 111
8.3.3 Generalized Discriminators 112
8.4 Support Vector Machines 115
8.5 Unsupervised Clustering 117
8.6 Conclusions 118
8.7 Exercises 118
8.8 Appendix: Multilayer Perceptron Training 119
8.8.1 Definitions 119
8.8.2 Derivation 120
CHAPTER 9 STATISTICAL PATTERN CLASSIFICATION 124
9.1 Introduction 124
9.2 A Few Definitions 124
9.3 Class-Related Probability Functions 125
9.4 Minimum Error Classification 126
9.5 Likelihood-Based MAP Classification 127
9.6 Approximating a Bayes Classifier 128
9.7 Statistically Based Linear Discriminants 130
9.7.1 Discussion 131
9.8 Iterative Training: The EM Algorithm 131
9.8.1 Discussion 136
9.9 Exercises 137
PART III
ACOUSTICS
CHAPTER 10 WAVE BASICS 141
10.1 Introduction 141
10.2 The Wave Equation for the Vibrating String 142
10.3 Discrete-Time Traveling Waves 143
10.4 Boundary Conditions and Discrete Traveling Waves 144
10.5 Standing Waves 144
10.6 Discrete-Time Models of Acoustic Tubes 146
10.7 Acoustic Tube Resonances 147
10.8 Relation of Tube Resonances to Formant Frequencies 148
10.9 Exercises 150
CHAPTER 11 ACOUSTIC TUBE MODELING OF SPEECH PRODUCTION 152
11.1 Introduction 152
11.2 Acoustic Tube Models of English Phonemes 152
11.3 Excitation Mechanisms in Speech Production 156
11.4 Exercises 157
CHAPTER 12 MUSICAL INSTRUMENT ACOUSTICS 158
12.1 Introduction 158
12.2 Sequence of Steps in a Plucked or Bowed String Instrument 159
12.3 Vibrations of the Bowed String 159
12.4 Frequency-Response Measurements of the Bridge of a Violin 160
12.5 Vibrations of the Body of String Instruments 163
12.6 Radiation Pattern of Bowed String Instruments 167
12.7 Some Considerations in Piano Design 169
12.8 The Trumpet, Trombone, French Horn, and Tuba 175
12.9 Exercises 177
CHAPTER 13 ROOM ACOUSTICS 179
13.1 Introduction 179
13.2 Sound Waves 179
13.2.1 One-Dimensional Wave Equation 180
13.2.2 Spherical Wave Equation 180
13.2.3 Intensity 181
13.2.4 Decibel Sound Levels 182
13.2.5 Typical Power Sources 182
13.3 Sound Waves in Rooms 183
13.3.1 Acoustic Reverberation 184
13.3.2 Early Reflections 187
13.4 Room Acoustics as a Component in Speech Systems 188
13.5 Exercises 189
PART IV
AUDITORY PERCEPTION
CHAPTER 14 EAR PHYSIOLOGY 193
14.1 Introduction 193
14.2 Anatomical Pathways From the Ear to the Perception of Sound 193
14.3 The Peripheral Auditory System 195
14.4 Hair Cell and Auditory Nerve Functions 196
14.5 Properties of the Auditory Nerve 190
14.6 Summary and Block Diagram of the Peripheral Auditory System 205
14.7 Exercises 207
CHAPTER 15 PSYCHOACOUSTICS 209
15.1 Introduction 209
15.2 Sound-Pressure Level and Loudness 210
15.3 Frequency Analysis and Critical Bands 212
15.4 Masking 214
15.5 Summary 216
15.6 Exercises 217
CHAPTER 16 MODELS OF PITCH PERCEPTION 218
16.1 Introduction 218
16.2 Historical Review of Pitch-Perception Models 218
16.3 Physiological Exploration of Place Versus Periodicity 223
16.4 Results from Psychoacoustic Testing and Models
16.5 Summary 228
16.6 Exercises 230
CHAPTER 17 SPEECH PERCEPTION 232
17.1 Introduction 232
17.2 Vowel Perception: Psychoacoustics and Physiology 23?,
17.3 The Confusion Matrix 23b
17.4 Perceptual Cues for Plosives 238
17.5 Physiological Studies of Two Voiced Plosives 239
17.6 Motor Theories of Speech Perception 241
17.7 Neural Firing Patterns for Connected Speech Stimuli 243
17.8 Concluding Thoughts 244
17.9 Exercises 247
CHAPTER 18 HUMAN SPEECH RECOGNITION 250
18.1 Introduction 250
18.2 The Articulation Index and Human Recognition 250
18.2.1 The Big Idea 250
18.2.2 The Experiments 251
18.2.3 Discussion 252
18.3 Comparisons Between Human and Machine Speech Recognizers 253
18.4 Concluding Thoughts 256
18.5 Exercises 258
PART V
SPEECH FEATURES
CHAPTER 19 THE AUDITORY SYSTEM AS A FILTER BANK 263
19.1 Introduction 263
19.2 Review of Fletcher’s Critical Band Experiments 263
19.3 Threshold Measurements and Filter Shapes 265
19.4 Gamma-Tone Filters, Roex Filters, and Auditory Models 270
19.5 Other Considerations in Filter-Bank Design 272
19.6 Speech Spectrum Analysis Using the FFT 274
19.7 Conclusions 275
19.8 Exercises 275
CHAPTER 20 THE CEPSTRUM AS A SPECTRAL ANALYZER 277
20.1 Introduction 277
20.2 A Historical Note 277
20.3 The Real Cepstrum 278
20.4 The Complex Cepstrum 279
20.5 Application of Cepstral Analysis to Speech Signals 281
20.6 Concluding Thoughts 283
20.7 Exercises 284
CHAPTER 21 LINEAR PREDICTION 286
21.1 Introduction 286
21.2 The Predictive Model 286
21.3 Properties of the Representation 290
21.4 Getting the Coefficients 292
21.5 Related Representations 294
21.6 Concluding Discussion 295
21.7 Exercises 297
PART VI
AUTOMATIC SPEECH RECOGNITION
CHAPTER 22 FEATURE EXTRACTION FOR ASR 301
22.1 Introduction 301
22.2 Common Feature Vectors 301
22.3 Dynamic Features 306
22.4 Strategies for Robustness 307
22.4.1 Robustness to Convolutional Error 307
22.4.2 Robustness to Room Reverberation 309
22.4.3 Robustness to Additive Noise 311
22.4.4 Caveats 313
22.5 Auditory Models 313
22.6 Multichannel Input 314
22.7 Discriminant Features 315
22.8 Discussion 315
22.9 Exercises 316
CHAPTER 23 LINGUISTIC CATEGORIES FOR SPEECH RECOGNITION 319
23.1 Introduction 319
23.2 Phones and Phonemes 319
23.2.1 Overview 319
23.2.2 What Makes a Phone? 320
23.2.3 What Makes a Phoneme? 321
23.3 Phonetic and Phonemic Alphabets 321
23.4 Articulatory Features 322
23.4.1 Consonants 322
23.4.2 Vowels 326
23.4.3 Why Use Features? 327
23.5 Subword Units as Categories for ASR 327
23.6 Phonological Models for ASR 329
23.6.1 Phonological rules 329
23.6.2 Pronunciation rule induction 329
23.7 Context-Dependent Phones 330
23.8 Other Subword Units 331
23.8.1 Properties in Fluent Speech 332
23.9 Phrases 332
23.10 Some Issues in Phonological Modeling 332
23.11 Exercises 334
CHAPTER 24 DETERMINISTIC SEQUENCE RECOGNITION FOR ASR 337
24.1 Introduction 337
24.2 Isolated Word Recognition 338
24.2.1 Linear Time Warp 339
24.2.2 Dynamic Time Warp 340
24.2.3 Distances 344
24.2.4 End-Point Detection 344
24.3 Connected Word Recognition 346
24.4 Segmental Approaches 347
24.5 Discussion 348
24.6 Exercises 349
CHAPTER 25 STATISTICAL SEQUENCE RECOGNITION 350
25.1 Introduction 350
25.2 Stating the Problem 351
25.3 Parameterization and Probability Estimation 353
25.3.1 Markov Models 354
25.3.2 Hidden Markov Model 356
25.3.3 HMMs for Speech Recognition 357
25.3.4 Estimation of P (A M ) 358
25.4 Conclusion 362
25.5 Exercises 363
CHAPTER 26 STATISTICAL MODEL TRAINING 364
26.1 Introduction 364
26.2 HMM Training 365
26.3 Forward-Backward Training 368
26.4 Optimal Parameters for Emission Probability Estimators 371
26.4.1 Gaussian Density Functions 371
26.4.2 Example: Training with Discrete Densities 372
26.5 Viterbi Training 373
26.5.1 Example: Training with Gaussian Density Functions 375
26.5.2 Example: Training with Discrete Densities 375
26.6 Local Acoustic Probability Estimators for ASR 376
26.6.1 Discrete Probabilities 376
26.6.2 Gaussian Densities 377
26.6.3 Tied Mixtures of Gaussians 377
26.6.4 Independent Mixtures of Gaussians 377
26.6.5 Neural Networks 377
26.7 Initialization 378
26.8 Smoothing 378
26.9 Conclusions 379
26.10 Exercises 379
CHAPTER 27 DISCRIMINANT ACOUSTIC PROBABILITY ESTIMATION 381
27.1 Introduction 381
27.2 Discriminant Training 382
27.2.1 Maximum Mutual Information 383
27.2.2 Corrective Training 383
27.2.3 Generalized Probabilistic Descent 384
27.2.4 Direct Estimation of Posteriors 300
27.3 HMM-ANN Based ASR 388
27.3.1 MLP Architecture 388
27.3.2 MLP Training 300
27.3.3 Embedded Training 389
27.4 Other Applications of ANNs to ASR 390
27.5 Exercises 391
27.6 Appendix: Posterior Probability Proof 391
CHAPTER 28 ACOUSTIC MODEL TRAINING: FURTHER TOPICS 394
28.1 Introduction 394
28.2 Adaptation 394
28.2.1 MAP and MLLR 394
28.2.2 Speaker Adaptive Training 399
28.2.3 Vocal tract length normalization 401
28.3 Lattice-Based MMI and MPE 402
28.3.1 Details of mean estimation using lattice-based MMI and MPE 405
28.4 Conclusion 412
28.5 Exercises 413
CHAPTER 29 SPEECH RECOGNITION AND UNDERSTANDING 416
29.1 Introduction 416
29.2 Phonological Models 417
29.3 Language Models 419
29.3.1 n-Gram Statistics 421
29.3.2 Smoothing 422
29.4 Decoding With Acoustic and Language Models 423
29.5 A Complete System 424
29.6 Accepting Realistic Input 426
29.7 Concluding Comments 427
PART VII
SYNTHESIS AND CODING
CHAPTER 30 SPEECH SYNTHESIS 431
30.1 Introduction 431
30.2 Concatenative Methods 433
30.2.1 Database 433
30.2.2 Unit selection 434
30.2.3 Concatenation and optional modification 435
30.3 Statistical Parametric Methods 436
30.3.1 Vocoding: from waveforms to features and back 436
30.3.2 Statistical modeling for speech generation 438
30.3.3 Advanced techniques 440
30.4 A Historical Perspective 441
30.5 Speculation 443
30.5.1 Physical models 444
30.5.2 Sub-word units and the role of linguistic knowledge 445
30.5.3 Prosody matters 445
30.6 Tools and Evaluation 446
30.6.1 Further reading 447
30.7 Exercises 447
30.8 Appendix: Synthesizer Examples 448
30.8.1 The Klatt Recordings 448
30.8.2 Development of Speech Synthesizers 448
30.8.3 Segmental Synthesis by Rule 449
30.8.4 Synthesis By Rule of Segments and Sentence Prosody 449
30.8.5 Fully Automatic Text-To-Speech Conversion: Formants and diphones 450
30.8.6 The van Santen Recordings 451
30.8.7 Fully Automatic Text-To-Speech Conversion: Unit selection and HMMs 451
CHAPTER 31 PITCH DETECTION 455
31.1 Introduction 455
31.2 A Note on Nomenclature 455
31.3 Pitch Detection, Perception and Articulation 456
31.4 The Voicing Decision 457
31.5 Some Difficulties in Pitch Detection 458
31.6 Signal Processing to Improve Pitch Detection 458
31.7 Pattern-Recognition Methods for Pitch Detection 462
31.8 Smoothing to Fix Errors in Pitch Estimation 467
31.9 Normalizing the Autocorrelation Function 469
31.10 Exercises 471
CHAPTER 32 VOCODERS 473
32.1 Introduction 473
32.2 Standards for Digital Speech Coding 473
32.3 Design Considerations in Channel Vocoder Filter Banks 473
32.4 Energy Measurements in a Channel Vocoder 476
32.5 A Vocoder Design for Spectral Envelope Estimation 478
32.6 Bit Saving in Channel Vocoders 478
32.7 Design of the Excitation Parameters for a Channel Vocoder 482
32.8 LPC Vocoders 484
32.9 Cepstral Vocoders 484
32.10 Design Comparisons 485
32.11 Vocoder Standardization 489
32.12 Exercises 480
CHAPTER 33 LOW-RATE VOCODERS 493
33.1 Introduction 493
33.2 The Frame-Fill Concept 494
33.3 Pattern Matching or Vector Quantization 496
33.4 The Kang-Coulter 600-bps Vocoder 497
33.5 Segmentation Methods for Bandwidth Reduction 498
33.6 Exercises 503
CHAPTER 34 MEDIUM-RATE AND HIGH-RATE VOCODERS 505
34.1 Introduction 505
34.2 Voice Excitation and Spectral Flattening 505
34.3 Voice-Excited Channel Vocoder 508
34.4 Voice-Excited and Error-Signal-Excited LPC Vocoders 508
34.5 Waveform Coding with Predictive Methods 510
34.6 Adaptive Predictive Coding of Speech 512
34.7 Subband Coding 513
34.8 Multipulse LPC Vocoders 514
34.9 Code-Excited Linear Predictive Coding 818
34.9.1 Basic CELP 516
34.9.2 Modifications to CELP 518
34.9.3 Non-Gaussian Codebook Sequences 518
34.9.4 Low-Delay CELP 519
34.10 Reducing Codebook Search Time in CELP 520
34.10.1 Filter Simplification 520
34.10.2 Speeding Up the Search 522
34.10.3 Multiresolution Codebook Search 523
34.10.4 Partial Sequence Elimination 524
34.10.5 Tree-Structured Delta Codebooks 524
34.10.6 Adaptive Codebooks 525
34.10.7 Linear Combination Codebooks 526
34.10.8 Vector Sum Excited Linear Prediction 527
34.11 Conclusions 527
34.12 Exercises 528
CHAPTER 35 PERCEPTUAL AUDIO CODING 531
35.1 Transparent Audio Coding 531
35.2 Perceptual Masking 533
35.2.1 Psychoacoustic phenomena 533
35.2.2 Computational models 535
35.3 Noise Shaping 538
35.3.1 Subband analysis 539
35.3.2 Temporal noise shaping 542
35.4 Some Example Coding Schemes 546
35.4.1 MPEG-1 Audio layers I and II 546
35.4.2 MPEG-1 Audio Layer III (MP3) 546
35.4.3 MPEG-2 Advanced Audio Codec (AAC) 547
35.5 Summary 548
35.6 Exercises 549
PART VIII
OTHER APPLICATIONS
CHAPTER 36 SOME ASPECTS OF COMPUTER MUSIC SYNTHESIS 553
36.1 Introduction 553
36.2 Some Examples of Acoustically Generated Musical Sounds 553
36.3 Music Synthesis Concepts 555
36.4 Analysis-Based Synthesis 557
36.5 Other Techniques for Music Synthesis 560
36.6 Reverberation 562
36.7 Several Examples of Synthesis 563
36.8 Exercises 565
CHAPTER 37 MUSIC SIGNAL ANALYSIS 567
37.1 The Information in Music Audio 567
37.2 Music Transcription 568
37.3 Note Transcription 569
37.4 Score Alignment 571
37.5 Chord Transcription 574
37.6 Structure Detection 576
37.7 Conclusion 577
37.8 Exercises 578
CHAPTER 38 MUSIC RETRIEVAL 581
38.1 The Music Retrieval Problem 581
38.2 Music Fingerprinting 582
38.3 Query by Humming 584
38.4 Cover Song Matching 587
38.5 Music Classification and Autotagging 589
38.6 Music Similarity 591
38.7 Conclusions 592
38.8 Exercises 592
CHAPTER 39 SOURCE SEPARATION 595
39.1 Sources and Mixtures 595
39.2 Evaluating Source Separation 596
39.3 Multi-Channel Approaches 598
39.4 Beamforming with Microphone Arrays 599
39.4.1 A multi-channel signal model 601
39.4.2 Time-invariant Beamformers 602
39.4.3 Adaptive beamformers 604
39.4.4 Alternative Objective Criteria 605
39.5 Independent Component Analysis 605
39.6 Computational Auditory Scene Analysis 607
39.7 Model-Based Separation 610
39.8 Conclusions 613
39.9 Exercises 614
CHAPTER 40 SPEECH TRANSFORMATIONS 617
40.1 Introduction 617
40.2 Time-Scale Modification 617
40.3 Transformation Without Explicit Pitch Detection 620
40.4 Transformations in Analysis-Synthesis Systems 621
40.5 Speech Modifications in the Phase Vocoder 623
40.6 Speech Transformations Without Pitch Extraction 625
40.7 The Sine Transform Coder as a Transformation Algorithm 628
40.8 Voice Modification to Emulate a Target Voice 629
40.9 Exercises 630
CHAPTER 41 SPEAKER VERIFICATION 633
41.1 Introduction 633
41.2 General Design of a Speaker Recognition System 634
41.3 Example System Components 635
41.3.1 Features 635
41.3.2 Models 635
41.3.3 Score normalization 637
41.3.4 Fusion and calibration 638
41.4 Evaluation 638
41.5 Modern Research Challenges 641
41.6 Exercises 641
CHAPTER 42 SPEAKER DIARIZATION 644
42.1 Introduction 644
42.2 General Design of a Speaker Diarization System 645
42.3 Example System Components 647
42.3.1 Features 647
42.3.2 Segmentation and clustering 647
42.3.3 Acoustic beamforming 649
42.3.4 Speech activity detection 650
42.4 Research Challenges 651
42.4.1 Overlap resolution 651
42.4.2 Multimodal diarization 651
42.4.3 Further challenges 652
42.5 Exercises 652
|
any_adam_object | 1 |
author | Gold, Bernard |
author_GND | (DE-588)138989729 |
author_facet | Gold, Bernard |
author_role | aut |
author_sort | Gold, Bernard |
author_variant | b g bg |
building | Verbundindex |
bvnumber | BV039575118 |
classification_rvk | ZN 6040 ZN 6060 |
ctrlnum | (OCoLC)697485510 (DE-599)BVBBV039575118 |
dewey-full | 621.3822 |
dewey-hundreds | 600 - Technology (Applied sciences) |
dewey-ones | 621 - Applied physics |
dewey-raw | 621.3822 |
dewey-search | 621.3822 |
dewey-sort | 3621.3822 |
dewey-tens | 620 - Engineering and allied operations |
discipline | Elektrotechnik / Elektronik / Nachrichtentechnik |
edition | 2. ed. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01856nam a2200457 c 4500</leader><controlfield tag="001">BV039575118</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20150626 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">110909s2011 ad|| |||| 00||| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">GBB0D6078</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780470195369</subfield><subfield code="9">978-0-470-19536-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="z">9780470195369</subfield><subfield code="9">9780470195369</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)697485510</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV039575118</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-83</subfield><subfield code="a">DE-1102</subfield><subfield code="a">DE-92</subfield><subfield code="a">DE-739</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">621.3822</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ZN 6040</subfield><subfield code="0">(DE-625)157496:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ZN 6060</subfield><subfield code="0">(DE-625)157500:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Gold, Bernard</subfield><subfield 
code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Speech and audio signal processing</subfield><subfield code="b">processing and perception of speech and music</subfield><subfield code="c">Ben Gold ; Nelson Morgan ; Ellis Dan</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">2. ed.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Hoboken, NJ</subfield><subfield code="b">Wiley</subfield><subfield code="c">2011</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXII, 661 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Speech processing systems</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Signal processing / Digital techniques</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Electronic music</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Sprachproduktion</subfield><subfield code="0">(DE-588)4143703-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Digitale Sprachverarbeitung</subfield><subfield code="0">(DE-588)4233857-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Digitale Sprachverarbeitung</subfield><subfield 
code="0">(DE-588)4233857-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Automatische Sprachproduktion</subfield><subfield code="0">(DE-588)4143703-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Morgan, Nelson</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ellis, Dan</subfield><subfield code="e">Sonstige</subfield><subfield code="0">(DE-588)138989729</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024426552&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-024426552</subfield></datafield></record></collection> |
id | DE-604.BV039575118 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:06:35Z |
institution | BVB |
isbn | 9780470195369 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-024426552 |
oclc_num | 697485510 |
open_access_boolean | |
owner | DE-83 DE-1102 DE-92 DE-739 |
owner_facet | DE-83 DE-1102 DE-92 DE-739 |
physical | XXII, 661 S. Ill., graph. Darst. |
publishDate | 2011 |
publishDateSearch | 2011 |
publishDateSort | 2011 |
publisher | Wiley |
record_format | marc |
spelling | Gold, Bernard Verfasser aut Speech and audio signal processing processing and perception of speech and music Ben Gold ; Nelson Morgan ; Ellis Dan 2. ed. Hoboken, NJ Wiley 2011 XXII, 661 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Speech processing systems Signal processing / Digital techniques Electronic music Automatische Sprachproduktion (DE-588)4143703-2 gnd rswk-swf Digitale Sprachverarbeitung (DE-588)4233857-8 gnd rswk-swf Digitale Sprachverarbeitung (DE-588)4233857-8 s Automatische Sprachproduktion (DE-588)4143703-2 s DE-604 Morgan, Nelson Sonstige oth Ellis, Dan Sonstige (DE-588)138989729 oth Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024426552&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Gold, Bernard Speech and audio signal processing processing and perception of speech and music Speech processing systems Signal processing / Digital techniques Electronic music Automatische Sprachproduktion (DE-588)4143703-2 gnd Digitale Sprachverarbeitung (DE-588)4233857-8 gnd |
subject_GND | (DE-588)4143703-2 (DE-588)4233857-8 |
title | Speech and audio signal processing processing and perception of speech and music |
title_auth | Speech and audio signal processing processing and perception of speech and music |
title_exact_search | Speech and audio signal processing processing and perception of speech and music |
title_full | Speech and audio signal processing processing and perception of speech and music Ben Gold ; Nelson Morgan ; Ellis Dan |
title_fullStr | Speech and audio signal processing processing and perception of speech and music Ben Gold ; Nelson Morgan ; Ellis Dan |
title_full_unstemmed | Speech and audio signal processing processing and perception of speech and music Ben Gold ; Nelson Morgan ; Ellis Dan |
title_short | Speech and audio signal processing |
title_sort | speech and audio signal processing processing and perception of speech and music |
title_sub | processing and perception of speech and music |
topic | Speech processing systems Signal processing / Digital techniques Electronic music Automatische Sprachproduktion (DE-588)4143703-2 gnd Digitale Sprachverarbeitung (DE-588)4233857-8 gnd |
topic_facet | Speech processing systems Signal processing / Digital techniques Electronic music Automatische Sprachproduktion Digitale Sprachverarbeitung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024426552&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT goldbernard speechandaudiosignalprocessingprocessingandperceptionofspeechandmusic AT morgannelson speechandaudiosignalprocessingprocessingandperceptionofspeechandmusic AT ellisdan speechandaudiosignalprocessingprocessingandperceptionofspeechandmusic |