Availability: Speech and audio signal processing

Speech and audio signal processing: processing and perception of speech and music

Bibliographic details
Main author:	Gold, Bernard (author)
Format:	Book
Language:	English
Published:	Hoboken, NJ: Wiley, 2011
Edition:	2nd ed.
Subjects:	Speech processing systems; Signal processing / Digital techniques; Electronic music; Automatische Sprachproduktion (automatic speech production); Digitale Sprachverarbeitung (digital speech processing)
Online access:	Table of contents
Description:	XXII, 661 pp., ill., graphs
ISBN:	9780470195369

Internal format

MARC


LEADER	00000nam a2200000 c 4500
001	BV039575118
003	DE-604
005	20150626
007	t
008	110909s2011 ad|| |||| 00||| eng d
015			|a GBB0D6078 |2 dnb
020			|a 9780470195369 |9 978-0-470-19536-9
020			|z 9780470195369 |9 9780470195369
035			|a (OCoLC)697485510
035			|a (DE-599)BVBBV039575118
040			|a DE-604 |b ger |e rakwb
041	0		|a eng
049			|a DE-83 |a DE-1102 |a DE-92 |a DE-739
082	0		|a 621.3822 |2 22
084			|a ZN 6040 |0 (DE-625)157496: |2 rvk
084			|a ZN 6060 |0 (DE-625)157500: |2 rvk
100	1		|a Gold, Bernard |e Verfasser |4 aut
245	1	0	|a Speech and audio signal processing |b processing and perception of speech and music |c Ben Gold ; Nelson Morgan ; Ellis Dan
250			|a 2. ed.
264		1	|a Hoboken, NJ |b Wiley |c 2011
300			|a XXII, 661 S. |b Ill., graph. Darst.
336			|b txt |2 rdacontent
337			|b n |2 rdamedia
338			|b nc |2 rdacarrier
650		4	|a Speech processing systems
650		4	|a Signal processing / Digital techniques
650		4	|a Electronic music
650	0	7	|a Automatische Sprachproduktion |0 (DE-588)4143703-2 |2 gnd |9 rswk-swf
650	0	7	|a Digitale Sprachverarbeitung |0 (DE-588)4233857-8 |2 gnd |9 rswk-swf
689	0	0	|a Digitale Sprachverarbeitung |0 (DE-588)4233857-8 |D s
689	0	1	|a Automatische Sprachproduktion |0 (DE-588)4143703-2 |D s
689	0		|5 DE-604
700	1		|a Morgan, Nelson |e Sonstige |4 oth
700	1		|a Ellis, Dan |e Sonstige |0 (DE-588)138989729 |4 oth
856	4	2	|m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024426552&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999			|a oai:aleph.bib-bvb.de:BVB01-024426552

Search index record

_version_	1804148403922272256
adam_text	PREFACE TO THE 2011 EDITION xxi 0.1 Why We Created a New Edition xxi 0.2 What is New xxi 0.3 A Final Thought xxii CHAPTER 1 INTRODUCTION 1 1.1 Why We Wrote This Book 1 1.2 How to Use This Book 2 1.3 A Confession 4 1.4 Acknowledgments 8 PART I HISTORICAL BACKGROUND CHAPTER 2 SYNTHETIC AUDIO: A BRIEF HISTORY 9 2.1 Von Kempelen 9 2.2 The Voder 9 2.3 Teaching the Operator to Make the Voder “Talk” 11 2.4 Speech Synthesis After the Voder 13 2.5 Music Machines 13 2.6 Exercises 17 CHAPTER 3 SPEECH ANALYSIS AND SYNTHESIS OVERVIEW 21 3.1 Background 21 3.1.1 Transmission of Acoustic Signals 21 3.1.2 Acoustical Telegraphy before Morse Code 22 3.1.3 The Telephone 23 3.1.4 The Channel Vocoder and Bandwidth Compression 23 3.2 Voice-coding concepts 28 3.3 Homer Dudley (1898-1981) 30 3.4 Exercises 38 3.5 Appendix: Hearing of the Fall of Troy 37 CHAPTER 4 BRIEF HISTORY OF AUTOMATIC SPEECH RECOGNITION 40 4.1 Radio Rex 40 4.2 Digit Recognition 42 4.3 Speech Recognition in the 1950s 43 4.4 The 1960s 43 4.4.1 Short-Term Spectral Analysis 45 4.4.2 Pattern Matching 45 4.5 1971-1976 ARPA Project 46 4.6 Achieved by 1976 46 4.7 The 1980s in Automatic Speech Recognition 47 4.7.1 Large Corpora Collection 47 4.7.2 Front Ends 48 4.7.3 Hidden Markov Models 48 4.7.4 The Second (D)ARPA Speech-Recognition Program 49 4.7.5 The Return of Neural Nets 50 4.7.6 Knowledge-Based Approaches 50 4.8 More Recent Work 51 4.9 Some Lessons 53 4.10 Exercises 54 CHAPTER 5 SPEECH-RECOGNITION OVERVIEW 59 5.1 Why Study Automatic Speech Recognition? 59 5.2 Why is Automatic Speech Recognition Hard? 
60 5.3 Automatic Speech Recognition Dimensions 62 5.3.1 Task Parameters 62 5.3.2 Sample Domain: Letters of the Alphabet 63 5.4 Components of Automatic Speech Recognition 64 5.5 Final Comments 67 5.6 Exercises 69 PART II MATHEMATICAL BACKGROUND CHAPTER 6 DIGITAL SIGNAL PROCESSING 73 6.1 Introduction 73 6.2 The z Transform 73 6.3 Inverse z Transform 74 6.4 Convolution 75 6.5 Sampling 76 6.6 Linear Difference Equations 77 6.7 First-Order Linear Difference Equations 78 6.8 Resonance 79 6.9 Concluding Comments 83 6.10 Exercises 84 CHAPTER 7 DIGITAL FILTERS AND DISCRETE FOURIER TRANSFORM 87 7.1 Introduction 87 7.2 Filtering Concepts 88 7.3 Transformations for Digital Filter Design 92 7.4 Digital Filter Design with Bilinear Transformation 93 7.5 The Discrete Fourier Transform 94 7.6 Fast Fourier Transform Methods 98 7.7 Relation Between the DFT and Digital Filters 100 7.8 Exercises 101 CHAPTER 8 PATTERN CLASSIFICATION 105 8.1 Introduction 105 8.2 Feature Extraction 107 8.2.1 Some Opinions 108 8.3 Pattern-Classification Methods 109 8.3.1 Minimum Distance Classifiers 109 8.3.2 Discriminant Functions 111 8.3.3 Generalized Discriminators 112 8.4 Support Vector Machines 115 8.5 Unsupervised Clustering 117 8.6 Conclusions 118 8.7 Exercises 118 8.8 Appendix: Multilayer Perceptron Training 119 8.8.1 Definitions 119 8.8.2 Derivation 120 CHAPTER 9 STATISTICAL PATTERN CLASSIFICATION 124 9.1 Introduction 124 9.2 A Few Definitions 124 9.3 Class-Related Probability Functions 125 9.4 Minimum Error Classification 126 9.5 Likelihood-Based MAP Classification 127 9.6 Approximating a Bayes Classifier 128 9.7 Statistically Based Linear Discriminants 130 9.7.1 Discussion 131 9.8 Iterative Training: The EM Algorithm 131 9.8.1 Discussion 136 9.9 Exercises 137 PART III ACOUSTICS CHAPTER 10 WAVE BASICS 141 10.1 Introduction 141 10.2 The Wave Equation for the Vibrating String 142 10.3 Discrete-Time Traveling Waves 143 10.4 Boundary Conditions and Discrete Traveling Waves 144 10.5 Standing Waves 
144 10.6 Discrete-Time Models of Acoustic Tubes 146 10.7 Acoustic Tube Resonances 147 10.8 Relation of Tube Resonances to Formant Frequencies 148 10.9 Exercises 150 CHAPTER 11 ACOUSTIC TUBE MODELING OF SPEECH PRODUCTION 152 11.1 Introduction 152 11.2 Acoustic Tube Models of English Phonemes 152 11.3 Excitation Mechanisms in Speech Production 156 11.4 Exercises 157 CHAPTER 12 MUSICAL INSTRUMENT ACOUSTICS 158 12.1 Introduction 158 12.2 Sequence of Steps in a Plucked or Bowed String Instrument 159 12.3 Vibrations of the Bowed String 159 12.4 Frequency-Response Measurements of the Bridge of a Violin 160 12.5 Vibrations of the Body of String Instruments 163 12.6 Radiation Pattern of Bowed String Instruments 167 12.7 Some Considerations in Piano Design 169 12.8 The Trumpet, Trombone, French Horn, and Tuba 175 12.9 Exercises 177 CHAPTER 13 ROOM ACOUSTICS 179 13.1 Introduction 179 13.2 Sound Waves 179 13.2.1 One-Dimensional Wave Equation 180 13.2.2 Spherical Wave Equation 180 13.2.3 Intensity 181 13.2.4 Decibel Sound Levels 182 13.2.5 Typical Power Sources 182 13.3 Sound Waves in Rooms 183 13.3.1 Acoustic Reverberation 184 13.3.2 Early Reflections 187 13.4 Room Acoustics as a Component in Speech Systems 188 13.5 Exercises 189 PART IV AUDITORY PERCEPTION CHAPTER 14 EAR PHYSIOLOGY 193 14.1 Introduction 193 14.2 Anatomical Pathways From the Ear to the Perception of Sound 193 14.3 The Peripheral Auditory System 195 14.4 Hair Cell and Auditory Nerve Functions 196 14.5 Properties of the Auditory Nerve 190 14.6 Summary and Block Diagram of the Peripheral Auditory System 205 14.7 Exercises 207 CHAPTER 15 PSYCHOACOUSTICS 209 15.1 Introduction 209 15.2 Sound-Pressure Level and Loudness 210 15.3 Frequency Analysis and Critical Bands 212 15.4 Masking 214 15.5 Summary 216 15.6 Exercises 217 CHAPTER 16 MODELS OF PITCH PERCEPTION 218 16.1 Introduction 218 16.2 Historical Review of Pitch-Perception Models 218 16.3 Physiological Exploration of Place Versus Periodicity 223 16.4 
Results from Psychoacoustic Testing and Models 16.5 Summary 228 16.6 Exercises 230 CHAPTER 17 SPEECH PERCEPTION 232 17.1 Introduction 232 17.2 Vowel Perception: Psychoacoustics and Physiology 233 17.3 The Confusion Matrix 235 17.4 Perceptual Cues for Plosives 238 17.5 Physiological Studies of Two Voiced Plosives 239 17.6 Motor Theories of Speech Perception 241 17.7 Neural Firing Patterns for Connected Speech Stimuli 243 17.8 Concluding Thoughts 244 17.9 Exercises 247 CHAPTER 18 HUMAN SPEECH RECOGNITION 250 18.1 Introduction 250 18.2 The Articulation Index and Human Recognition 250 18.2.1 The Big Idea 250 18.2.2 The Experiments 251 18.2.3 Discussion 252 18.3 Comparisons Between Human and Machine Speech Recognizers 253 18.4 Concluding Thoughts 256 18.5 Exercises 258 PART V SPEECH FEATURES CHAPTER 19 THE AUDITORY SYSTEM AS A FILTER BANK 263 19.1 Introduction 263 19.2 Review of Fletcher’s Critical Band Experiments 263 19.3 Threshold Measurements and Filter Shapes 265 19.4 Gamma-Tone Filters, Roex Filters, and Auditory Models 270 19.5 Other Considerations in Filter-Bank Design 272 19.6 Speech Spectrum Analysis Using the FFT 274 19.7 Conclusions 275 19.8 Exercises 275 CHAPTER 20 THE CEPSTRUM AS A SPECTRAL ANALYZER 277 20.1 Introduction 277 20.2 A Historical Note 277 20.3 The Real Cepstrum 278 20.4 The Complex Cepstrum 279 20.5 Application of Cepstral Analysis to Speech Signals 281 20.6 Concluding Thoughts 283 20.7 Exercises 284 CHAPTER 21 LINEAR PREDICTION 286 21.1 Introduction 286 21.2 The Predictive Model 286 21.3 Properties of the Representation 290 21.4 Getting the Coefficients 292 21.5 Related Representations 294 21.6 Concluding Discussion 295 21.7 Exercises 297 PART VI AUTOMATIC SPEECH RECOGNITION CHAPTER 22 FEATURE EXTRACTION FOR ASR 301 22.1 Introduction 301 22.2 Common Feature Vectors 301 22.3 Dynamic Features 306 22.4 Strategies for Robustness 307 22.4.1 Robustness to Convolutional Error 307 22.4.2 Robustness to Room Reverberation 309 22.4.3 
Robustness to Additive Noise 311 22.4.4 Caveats 313 22.5 Auditory Models 313 22.6 Multichannel Input 314 22.7 Discriminant Features 315 22.8 Discussion 315 22.9 Exercises 316 CHAPTER 23 LINGUISTIC CATEGORIES FOR SPEECH RECOGNITION 319 23.1 Introduction 319 23.2 Phones and Phonemes 319 23.2.1 Overview 319 23.2.2 What Makes a Phone? 320 23.2.3 What Makes a Phoneme? 321 23.3 Phonetic and Phonemic Alphabets 321 23.4 Articulatory Features 322 23.4.1 Consonants 322 23.4.2 Vowels 326 23.4.3 Why Use Features? 327 23.5 Subword Units as Categories for ASR 327 23.6 Phonological Models for ASR 329 23.6.1 Phonological rules 329 23.6.2 Pronunciation rule induction 329 23.7 Context-Dependent Phones 330 23.8 Other Subword Units 331 23.8.1 Properties in Fluent Speech 332 23.9 Phrases 332 23.10 Some Issues in Phonological Modeling 332 23.11 Exercises 334 CHAPTER 24 DETERMINISTIC SEQUENCE RECOGNITION FOR ASR 337 24.1 Introduction 337 24.2 Isolated Word Recognition 338 24.2.1 Linear Time Warp 339 24.2.2 Dynamic Time Warp 340 24.2.3 Distances 344 24.2.4 End-Point Detection 344 24.3 Connected Word Recognition 346 24.4 Segmental Approaches 347 24.5 Discussion 348 24.6 Exercises 349 CHAPTER 25 STATISTICAL SEQUENCE RECOGNITION 350 25.1 Introduction 350 25.2 Stating the Problem 351 25.3 Parameterization and Probability Estimation 353 25.3.1 Markov Models 354 25.3.2 Hidden Markov Model 356 25.3.3 HMMs for Speech Recognition 357 25.3.4 Estimation of P (A M ) 358 25.4 Conclusion 362 25.5 Exercises 363 CHAPTER 26 STATISTICAL MODEL TRAINING 364 26.1 Introduction 364 26.2 HMM Training 365 26.3 Forward-Backward Training 368 26.4 Optimal Parameters for Emission Probability Estimators 371 26.4.1 Gaussian Density Functions 371 26.4.2 Example: Training with Discrete Densities 372 26.5 Viterbi Training 373 26.5.1 Example: Training with Gaussian Density Functions 375 26.5.2 Example: Training with Discrete Densities 375 26.6 Local Acoustic Probability Estimators for ASR 376 26.6.1 Discrete 
Probabilities 376 26.6.2 Gaussian Densities 377 26.6.3 Tied Mixtures of Gaussians 377 26.6.4 Independent Mixtures of Gaussians 377 26.6.5 Neural Networks 377 26.7 Initialization 378 26.8 Smoothing 378 26.9 Conclusions 379 26.10 Exercises 379 CHAPTER 27 DISCRIMINANT ACOUSTIC PROBABILITY ESTIMATION 381 27.1 Introduction 381 27.2 Discriminant Training 382 27.2.1 Maximum Mutual Information 383 27.2.2 Corrective Training 383 27.2.3 Generalized Probabilistic Descent 384 27.2.4 Direct Estimation of Posteriors 300 27.3 HMM-ANN Based ASR 388 27.3.1 MLP Architecture 388 27.3.2 MLP Training 300 27.3.3 Embedded Training 389 27.4 Other Applications of ANNs to ASR 390 27.5 Exercises 391 27.6 Appendix: Posterior Probability Proof 391 CHAPTER 28 ACOUSTIC MODEL TRAINING: FURTHER TOPICS 394 28.1 Introduction 394 28.2 Adaptation 394 28.2.1 MAP and MLLR 394 28.2.2 Speaker Adaptive Training 399 28.2.3 Vocal tract length normalization 401 28.3 Lattice-Based MMI and MPE 402 28.3.1 Details of mean estimation using lattice-based MMI and MPE 405 28.4 Conclusion 412 28.5 Exercises 413 CHAPTER 29 SPEECH RECOGNITION AND UNDERSTANDING 416 29.1 Introduction 416 29.2 Phonological Models 417 29.3 Language Models 419 29.3.1 n-Gram Statistics 421 29.3.2 Smoothing 422 29.4 Decoding With Acoustic and Language Models 423 29.5 A Complete System 424 29.6 Accepting Realistic Input 426 29.7 Concluding Comments 427 PART VII SYNTHESIS AND CODING CHAPTER 30 SPEECH SYNTHESIS 431 30.1 Introduction 431 30.2 Concatenative Methods 433 30.2.1 Database 433 30.2.2 Unit selection 434 30.2.3 Concatenation and optional modification 435 30.3 Statistical Parametric Methods 436 30.3.1 Vocoding: from waveforms to features and back 436 30.3.2 Statistical modeling for speech generation 438 30.3.3 Advanced techniques 440 30.4 A Historical Perspective 441 30.5 Speculation 443 30.5.1 Physical models 444 30.5.2 Sub-word units and the role 
of linguistic knowledge 445 30.5.3 Prosody matters 445 30.6 Tools and Evaluation 446 30.6.1 Further reading 447 30.7 Exercises 447 30.8 Appendix: Synthesizer Examples 448 30.8.1 The Klatt Recordings 448 30.8.2 Development of Speech Synthesizers 448 30.8.3 Segmental Synthesis by Rule 449 30.8.4 Synthesis By Rule of Segments and Sentence Prosody 449 30.8.5 Fully Automatic Text-To-Speech Conversion: Formants and diphones 450 30.8.6 The van Santen Recordings 451 30.8.7 Fully Automatic Text-To-Speech Conversion: Unit selection and HMMs 451 CHAPTER 31 PITCH DETECTION 455 31.1 Introduction 455 31.2 A Note on Nomenclature 455 31.3 Pitch Detection, Perception and Articulation 456 31.4 The Voicing Decision 457 31.5 Some Difficulties in Pitch Detection 458 31.6 Signal Processing to Improve Pitch Detection 458 31.7 Pattern-Recognition Methods for Pitch Detection 462 31.8 Smoothing to Fix Errors in Pitch Estimation 467 31.9 Normalizing the Autocorrelation Function 469 31.10 Exercises 471 CHAPTER 32 VOCODERS 473 32.1 Introduction 473 32.2 Standards for Digital Speech Coding 473 32.3 Design Considerations in Channel Vocoder Filter Banks 473 32.4 Energy Measurements in a Channel Vocoder 476 32.5 A Vocoder Design for Spectral Envelope Estimation 478 32.6 Bit Saving in Channel Vocoders 478 32.7 Design of the Excitation Parameters for a Channel Vocoder 482 32.8 LPC Vocoders 484 32.9 Cepstral Vocoders 484 32.10 Design Comparisons 485 32.11 Vocoder Standardization 489 32.12 Exercises 480 CHAPTER 33 LOW-RATE VOCODERS 493 33.1 Introduction 493 33.2 The Frame-Fill Concept 494 33.3 Pattern Matching or Vector Quantization 496 33.4 The Kang-Coulter 600-bps Vocoder 497 33.5 Segmentation Methods for Bandwidth Reduction 498 33.6 Exercises 503 CHAPTER 34 MEDIUM-RATE AND HIGH-RATE VOCODERS 505 34.1 Introduction 505 34.2 Voice Excitation and Spectral Flattening 505 34.3 Voice-Excited Channel Vocoder 508 34.4 Voice-Excited and Error-Signal-Excited LPC Vocoders 508 34.5 Waveform Coding with 
Predictive Methods 510 34.6 Adaptive Predictive Coding of Speech 512 34.7 Subband Coding 513 34.8 Multipulse LPC Vocoders 514 34.9 Code-Excited Linear Predictive Coding 818 34.9.1 Basic CELP 516 34.9.2 Modifications to CELP 518 34.9.3 Non-Gaussian Codebook Sequences 518 34.9.4 Low-Delay CELP 519 34.10 Reducing Codebook Search Time in CELP 520 34.10.1 Filter Simplification 520 34.10.2 Speeding Up the Search 522 34.10.3 Multiresolution Codebook Search 523 34.10.4 Partial Sequence Elimination 524 34.10.5 Tree-Structured Delta Codebooks 524 34.10.6 Adaptive Codebooks 525 34.10.7 Linear Combination Codebooks 526 34.10.8 Vector Sum Excited Linear Prediction 527 34.11 Conclusions 527 34.12 Exercises 528 CHAPTER 35 PERCEPTUAL AUDIO CODING 531 35.1 Transparent Audio Coding 531 35.2 Perceptual Masking 533 35.2.1 Psychoacoustic phenomena 533 35.2.2 Computational models 535 35.3 Noise Shaping 538 35.3.1 Subband analysis 539 35.3.2 Temporal noise shaping 542 35.4 Some Example Coding Schemes 546 35.4.1 MPEG-1 Audio layers I and II 546 35.4.2 MPEG-1 Audio Layer III (MP3) 546 35.4.3 MPEG-2 Advanced Audio Codec (AAC) 547 35.5 Summary 548 35.6 Exercises 549 PART VIII OTHER APPLICATIONS CHAPTER 36 SOME ASPECTS OF COMPUTER MUSIC SYNTHESIS 553 36.1 Introduction 553 36.2 Some Examples of Acoustically Generated Musical Sounds 553 36.3 Music Synthesis Concepts 555 36.4 Analysis-Based Synthesis 557 36.5 Other Techniques for Music Synthesis 560 36.6 Reverberation 562 36.7 Several Examples of Synthesis 563 36.8 Exercises 565 CHAPTER 37 MUSIC SIGNAL ANALYSIS 567 37.1 The Information in Music Audio 567 37.2 Music Transcription 568 37.3 Note Transcription 569 37.4 Score Alignment 571 37.5 Chord Transcription 574 37.6 Structure Detection 576 37.7 Conclusion 577 37.8 Exercises 578 CHAPTER 38 MUSIC RETRIEVAL 581 38.1 The Music Retrieval Problem 581 38.2 Music Fingerprinting 582 38.3 Query by Humming 584 38.4 Cover Song Matching 587 38.5 Music Classification and 
Autotagging 589 38.6 Music Similarity 591 38.7 Conclusions 592 38.8 Exercises 592 CHAPTER 39 SOURCE SEPARATION 595 39.1 Sources and Mixtures 595 39.2 Evaluating Source Separation 596 39.3 Multi-Channel Approaches 598 39.4 Beamforming with Microphone Arrays 599 39.4.1 A multi-channel signal model 601 39.4.2 Time-invariant Beamformers 602 39.4.3 Adaptive beamformers 604 39.4.4 Alternative Objective Criteria 605 39.5 Independent Component Analysis 605 39.6 Computational Auditory Scene Analysis 607 39.7 Model-Based Separation 610 39.8 Conclusions 613 39.9 Exercises 614 CHAPTER 40 SPEECH TRANSFORMATIONS 617 40.1 Introduction 617 40.2 Time-Scale Modification 617 40.3 Transformation Without Explicit Pitch Detection 620 40.4 Transformations in Analysis-Synthesis Systems 621 40.5 Speech Modifications in the Phase Vocoder 623 40.6 Speech Transformations Without Pitch Extraction 625 40.7 The Sine Transform Coder as a Transformation Algorithm 628 40.8 Voice Modification to Emulate a Target Voice 629 40.9 Exercises 630 CHAPTER 41 SPEAKER VERIFICATION 633 41.1 Introduction 633 41.2 General Design of a Speaker Recognition System 634 41.3 Example System Components 635 41.3.1 Features 635 41.3.2 Models 635 41.3.3 Score normalization 637 41.3.4 Fusion and calibration 638 41.4 Evaluation 638 41.5 Modern Research Challenges 641 41.6 Exercises 641 CHAPTER 42 SPEAKER DIARIZATION 644 42.1 Introduction 644 42.2 General Design of a Speaker Diarization System 645 42.3 Example System Components 647 42.3.1 Features 647 42.3.2 Segmentation and clustering 647 42.3.3 Acoustic beamforming 649 42.3.4 Speech activity detection 650 42.4 Research Challenges 651 42.4.1 Overlap resolution 651 42.4.2 Multimodal diarization 651 42.4.3 Further challenges 652 42.5 Exercises 652
any_adam_object	1
author	Gold, Bernard
author_GND	(DE-588)138989729
author_facet	Gold, Bernard
author_role	aut
author_sort	Gold, Bernard
author_variant	b g bg
building	Verbundindex
bvnumber	BV039575118
classification_rvk	ZN 6040 ZN 6060
ctrlnum	(OCoLC)697485510 (DE-599)BVBBV039575118
dewey-full	621.3822
dewey-hundreds	600 - Technology (Applied sciences)
dewey-ones	621 - Applied physics
dewey-raw	621.3822
dewey-search	621.3822
dewey-sort	3621.3822
dewey-tens	620 - Engineering and allied operations
discipline	Electrical Engineering / Electronics / Communications Engineering
edition	2. ed.
format	Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01856nam a2200457 c 4500</leader><controlfield tag="001">BV039575118</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20150626 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">110909s2011 ad\|\| \|\|\|\| 00\|\|\| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">GBB0D6078</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780470195369</subfield><subfield code="9">978-0-470-19536-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="z">9780470195369</subfield><subfield code="9">9780470195369</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)697485510</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV039575118</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-83</subfield><subfield code="a">DE-1102</subfield><subfield code="a">DE-92</subfield><subfield code="a">DE-739</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">621.3822</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ZN 6040</subfield><subfield code="0">(DE-625)157496:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ZN 6060</subfield><subfield code="0">(DE-625)157500:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Gold, Bernard</subfield><subfield 
code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Speech and audio signal processing</subfield><subfield code="b">processing and perception of speech and music</subfield><subfield code="c">Ben Gold ; Nelson Morgan ; Ellis Dan</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">2. ed.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Hoboken, NJ</subfield><subfield code="b">Wiley</subfield><subfield code="c">2011</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXII, 661 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Speech processing systems</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Signal processing / Digital techniques</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Electronic music</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Sprachproduktion</subfield><subfield code="0">(DE-588)4143703-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Digitale Sprachverarbeitung</subfield><subfield code="0">(DE-588)4233857-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Digitale Sprachverarbeitung</subfield><subfield 
code="0">(DE-588)4233857-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Automatische Sprachproduktion</subfield><subfield code="0">(DE-588)4143703-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Morgan, Nelson</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ellis, Dan</subfield><subfield code="e">Sonstige</subfield><subfield code="0">(DE-588)138989729</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024426552&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-024426552</subfield></datafield></record></collection>
id	DE-604.BV039575118
illustrated	Illustrated
indexdate	2024-07-10T00:06:35Z
institution	BVB
isbn	9780470195369
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-024426552
oclc_num	697485510
open_access_boolean
owner	DE-83 DE-1102 DE-92 DE-739
owner_facet	DE-83 DE-1102 DE-92 DE-739
physical	XXII, 661 S. Ill., graph. Darst.
publishDate	2011
publishDateSearch	2011
publishDateSort	2011
publisher	Wiley
record_format	marc
spelling	Gold, Bernard Verfasser aut Speech and audio signal processing processing and perception of speech and music Ben Gold ; Nelson Morgan ; Ellis Dan 2. ed. Hoboken, NJ Wiley 2011 XXII, 661 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Speech processing systems Signal processing / Digital techniques Electronic music Automatische Sprachproduktion (DE-588)4143703-2 gnd rswk-swf Digitale Sprachverarbeitung (DE-588)4233857-8 gnd rswk-swf Digitale Sprachverarbeitung (DE-588)4233857-8 s Automatische Sprachproduktion (DE-588)4143703-2 s DE-604 Morgan, Nelson Sonstige oth Ellis, Dan Sonstige (DE-588)138989729 oth Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024426552&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle	Gold, Bernard Speech and audio signal processing processing and perception of speech and music Speech processing systems Signal processing / Digital techniques Electronic music Automatische Sprachproduktion (DE-588)4143703-2 gnd Digitale Sprachverarbeitung (DE-588)4233857-8 gnd
subject_GND	(DE-588)4143703-2 (DE-588)4233857-8
title	Speech and audio signal processing processing and perception of speech and music
title_auth	Speech and audio signal processing processing and perception of speech and music
title_exact_search	Speech and audio signal processing processing and perception of speech and music
title_full	Speech and audio signal processing processing and perception of speech and music Ben Gold ; Nelson Morgan ; Ellis Dan
title_fullStr	Speech and audio signal processing processing and perception of speech and music Ben Gold ; Nelson Morgan ; Ellis Dan
title_full_unstemmed	Speech and audio signal processing processing and perception of speech and music Ben Gold ; Nelson Morgan ; Ellis Dan
title_short	Speech and audio signal processing
title_sort	speech and audio signal processing processing and perception of speech and music
title_sub	processing and perception of speech and music
topic	Speech processing systems Signal processing / Digital techniques Electronic music Automatische Sprachproduktion (DE-588)4143703-2 gnd Digitale Sprachverarbeitung (DE-588)4233857-8 gnd
topic_facet	Speech processing systems Signal processing / Digital techniques Electronic music Automatische Sprachproduktion Digitale Sprachverarbeitung
url	http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024426552&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
work_keys_str_mv	AT goldbernard speechandaudiosignalprocessingprocessingandperceptionofspeechandmusic AT morgannelson speechandaudiosignalprocessingprocessingandperceptionofspeechandmusic AT ellisdan speechandaudiosignalprocessingprocessingandperceptionofspeechandmusic

Availability

No print copy is available.

Order via interlibrary loan. Note: not in THWS holdings. Table of contents
