Speech recognition over digital channels: robustness and standards
Main authors: | Peinado, Antonio M., Segura, José C. |
---|---|
Format: | Book |
Language: | English |
Published: | Chichester [et al.]: Wiley, 2006 |
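Subjects: | Digitale Sprachverarbeitung (digital speech processing), Automatische Spracherkennung (automatic speech recognition) |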
Online access: | Table of contents, Description for readers, Table of contents |
Description: | XVI, 257 pp., diagrams |
ISBN: | 0470024003 |
Internal format
MARC
LEADER | 00000nam a2200000zc 4500 | ||
---|---|---|---|
001 | BV023801519 | ||
003 | DE-604 | ||
005 | 20120622 | ||
007 | t | ||
008 | 070724s2006 d||| |||| 00||| eng d | ||
020 | |a 0470024003 |9 0-470-02400-3 | ||
035 | |a (OCoLC)254783235 | ||
035 | |a (DE-599)BVBBV023801519 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-634 |a DE-188 |a DE-355 | ||
082 | 0 | |a 621.384 | |
084 | |a ES 945 |0 (DE-625)27935: |2 rvk | ||
100 | 1 | |a Peinado, Antonio M. |e Verfasser |4 aut | |
245 | 1 | 0 | |a Speech recognition over digital channels |b robustness and standards |c Antonio M. Peinado ; José C. Segura |
264 | 1 | |a Chichester [u.a.] |b Wiley |c 2006 | |
300 | |a XVI, 257 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Digitale Sprachverarbeitung |0 (DE-588)4233857-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |D s |
689 | 1 | 1 | |a Digitale Sprachverarbeitung |0 (DE-588)4233857-8 |D s |
689 | 1 | |5 DE-188 | |
700 | 1 | |a Segura, José C. |e Verfasser |4 aut | |
856 | 4 | |u http://www.loc.gov/catdir/toc/ecip0614/2006014790.html |3 Inhaltsverzeichnis | |
856 | 4 | |u http://www.loc.gov/catdir/enhancements/fy0645/2006014790-d.html |3 Beschreibung für Leser | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017443718&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-017443718 |
Record in the search index
_version_ | 1804138999636295680 |
---|---|
adam_text | Contents
Foreword xi
Preface xv
1 Introduction 1
1.1 Introduction 1
1.2 RSR over Digital Channels 4
1.3 Organization of the Book 5
2 Speech Recognition with HMMs 7
2.1 Introduction 7
2.2 Some General Issues 8
2.3 Analysis of Speech Signals 10
2.3.1 Preprocessing of the Speech Signal 10
2.3.2 Linear Prediction Analysis 11
2.3.3 Mel-Frequency Filterbanks 13
2.3.4 Cepstral Coefficients 13
2.3.5 Other Features 15
2.4 Vector Quantization 16
2.5 Approaches to ASR 18
2.5.1 Pattern Matching 18
2.5.2 The Statistical Approach 19
2.6 Hidden Markov Models 20
2.6.1 Markov Processes and Hidden Markov Processes 20
2.6.2 Definition of a Discrete HMM 22
2.6.3 The Three Basic Problems 23
2.6.4 Generalization and Types of HMM Modeling 28
2.6.5 Simplifications to Continuous HMM Modeling 28
2.7 Application of HMMs to Speech Recognition 29
2.8 Model Adaptation 32
2.8.1 Maximum Likelihood Linear Regression (MLLR) 33
2.8.2 Maximum a Posteriori Linear Regression (MAPLR) 35
2.9 Dealing with Uncertainty 36
2.9.1 Exponential Weighting 36
2.9.2 Bayesian Optimal Classification with Uncertain Data 37
3 Networks and Degradation 41
3.1 Introduction 41
3.2 Mobile and Wireless Networks 41
3.2.1 Cellular Structure of a Mobile Network 43
3.2.2 Example of a Mobile Network Architecture: GSM/GPRS 43
3.2.3 Degradation in Wireless Networks 46
3.2.4 Wireless Channel Models for RSR 50
3.2.5 Implementation of RSR Systems over Mobile Networks 54
3.3 IP Networks 54
3.3.1 The TCP/IP Protocol Suite 55
3.3.2 Degradation in IP Networks 58
3.3.3 Lossy Packet Channel Models 60
3.3.4 Implementation of RSR Systems over Packet Networks 65
3.4 The Acoustic Environment 66
3.4.1 Additive Noise 67
3.4.2 Channel Distortion 72
3.4.3 A Model of Environment 74
3.4.4 Probability Distributions of Noisy Speech Features 77
4 Speech Compression and Architectures for RSR 85
4.1 Introduction 85
4.2 Speech Coding 86
4.2.1 Waveform Coders 86
4.2.2 Parametric Coders 91
4.2.3 Pitch Estimation 92
4.2.4 Hybrid Coders 93
4.3 Recognition from Decoded Speech 100
4.3.1 Effect of Coding 100
4.3.2 Effect of Tandeming 102
4.3.3 Treatment of the Coding Degradation 103
4.4 Recognition from Codec Parameters 105
4.4.1 Robustness of Bitstream-based NSR 106
4.4.2 Log-Energy Computation in B-NSR Systems 111
4.4.3 Alternative Parameter Conversion 111
4.5 Distributed Speech Recognition 112
4.5.1 Scalar Quantizers 113
4.5.2 Vector Quantizers and Product Codes 113
4.5.3 Very Low Bitrate PLP-based Compression for DSR 120
4.5.4 Transform Coders 121
4.5.5 Scalability in DSR Systems 126
4.6 Comparison between NSR and DSR 127
5 Robustness Against Transmission Channel Errors 131
5.1 Introduction 131
5.2 Channel Coding Techniques 133
5.2.1 Error Detection 133
5.2.2 Error Correction and Unequal Error Protection 135
5.2.3 Example of Channel Coding for NSR: GSM-EFR 138
5.2.4 Frame-Level Interleaving 139
5.2.5 Media-specific FEC 140
5.3 Error Concealment (EC) 141
5.3.1 Interpolation 142
5.3.2 Estimation 145
5.3.3 Recognizer-based EC Techniques 151
5.3.4 Error Concealment for NSR 156
6 Front-end Processing for Robust Feature Extraction 159
6.1 Introduction 159
6.2 Noise Reduction Techniques 160
6.2.1 Wiener Filters 161
6.2.2 Short-time Spectral Attenuation 167
6.3 Voice Activity Detection 174
6.3.1 Full-band-energy-based VAD 175
6.3.2 Statistical VAD 177
6.3.3 Using Long-term Information 178
6.4 Feature Normalization 182
6.4.1 Cepstral Mean Normalization 182
6.4.2 Frequency Analysis of Time-filtered Features 185
6.4.3 RASTA Processing 187
6.4.4 Cepstral Mean and Variance Normalization 187
6.4.5 Nonlinear Feature Normalization 189
7 Standards for Distributed Speech Recognition 197
7.1 Introduction 197
7.2 Signal Preprocessing 199
7.2.1 Two-stage Mel-warped Wiener Filtering 199
7.2.2 Offset Compensation and Waveform Processing 206
7.3 Feature Extraction 207
7.3.1 Computation of the Basic Features 207
7.3.2 AFE/XAFE Sampling Frequency Extension to 16 kHz 208
7.3.3 Blind Equalization 211
7.3.4 Voice Activity Detection 212
7.3.5 Pitch and Class Estimation 214
7.4 Feature Compression and Encoding 217
7.4.1 Basic Feature Vector Quantization 217
7.4.2 Extension Features Quantization and Encoding 218
7.4.3 Bitstream/Payload Format and Error Protection 219
7.5 Feature Decoding and Postprocessing 221
7.5.1 Bitstream Decoding and Decompression 221
7.5.2 Error Detection and Concealment 222
7.5.3 Server Feature Processing 223
7.5.4 Pitch Tracking and Smoothing 223
A Alternative Representations of the LPC Coefficients 225
B Basic Digital Modulation Concepts 227
C Review of Channel Coding Techniques 229
C.1 Media-independent FEC 229
C.1.1 Linear Block Codes 229
C.1.2 Cyclic Codes 231
C.1.3 Convolutional Codes 232
C.2 Interleaving 234
Bibliography 237
List of Acronyms 249
Index 253
|
any_adam_object | 1 |
author | Peinado, Antonio M. Segura, José C. |
author_facet | Peinado, Antonio M. Segura, José C. |
author_role | aut aut |
author_sort | Peinado, Antonio M. |
author_variant | a m p am amp j c s jc jcs |
building | Verbundindex |
bvnumber | BV023801519 |
classification_rvk | ES 945 |
ctrlnum | (OCoLC)254783235 (DE-599)BVBBV023801519 |
dewey-full | 621.384 |
dewey-hundreds | 600 - Technology (Applied sciences) |
dewey-ones | 621 - Applied physics |
dewey-raw | 621.384 |
dewey-search | 621.384 |
dewey-sort | 3621.384 |
dewey-tens | 620 - Engineering and allied operations |
discipline | Sprachwissenschaft Elektrotechnik / Elektronik / Nachrichtentechnik Literaturwissenschaft |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01734nam a2200409zc 4500</leader><controlfield tag="001">BV023801519</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20120622 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">070724s2006 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0470024003</subfield><subfield code="9">0-470-02400-3</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)254783235</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV023801519</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-634</subfield><subfield code="a">DE-188</subfield><subfield code="a">DE-355</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">621.384</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 945</subfield><subfield code="0">(DE-625)27935:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Peinado, Antonio M.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Speech recognition over digital channels</subfield><subfield code="b">robustness and standards</subfield><subfield code="c">Antonio M. Peinado ; José C. 
Segura</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Chichester [u.a.]</subfield><subfield code="b">Wiley</subfield><subfield code="c">2006</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XVI, 257 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Digitale Sprachverarbeitung</subfield><subfield code="0">(DE-588)4233857-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="1" ind2="0"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2="1"><subfield code="a">Digitale Sprachverarbeitung</subfield><subfield code="0">(DE-588)4233857-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2=" "><subfield code="5">DE-188</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Segura, José C.</subfield><subfield 
code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/toc/ecip0614/2006014790.html</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2=" "><subfield code="u">http://www.loc.gov/catdir/enhancements/fy0645/2006014790-d.html</subfield><subfield code="3">Beschreibung für Leser</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017443718&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-017443718</subfield></datafield></record></collection> |
id | DE-604.BV023801519 |
illustrated | Illustrated |
indexdate | 2024-07-09T21:37:07Z |
institution | BVB |
isbn | 0470024003 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-017443718 |
oclc_num | 254783235 |
open_access_boolean | |
owner | DE-634 DE-188 DE-355 DE-BY-UBR |
owner_facet | DE-634 DE-188 DE-355 DE-BY-UBR |
physical | XVI, 257 S. graph. Darst. |
publishDate | 2006 |
publishDateSearch | 2006 |
publishDateSort | 2006 |
publisher | Wiley |
record_format | marc |
spelling | Peinado, Antonio M. Verfasser aut Speech recognition over digital channels robustness and standards Antonio M. Peinado ; José C. Segura Chichester [u.a.] Wiley 2006 XVI, 257 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Digitale Sprachverarbeitung (DE-588)4233857-8 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 s DE-604 Digitale Sprachverarbeitung (DE-588)4233857-8 s DE-188 Segura, José C. Verfasser aut http://www.loc.gov/catdir/toc/ecip0614/2006014790.html Inhaltsverzeichnis http://www.loc.gov/catdir/enhancements/fy0645/2006014790-d.html Beschreibung für Leser HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017443718&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Peinado, Antonio M. Segura, José C. Speech recognition over digital channels robustness and standards Digitale Sprachverarbeitung (DE-588)4233857-8 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd |
subject_GND | (DE-588)4233857-8 (DE-588)4003961-4 |
title | Speech recognition over digital channels robustness and standards |
title_auth | Speech recognition over digital channels robustness and standards |
title_exact_search | Speech recognition over digital channels robustness and standards |
title_full | Speech recognition over digital channels robustness and standards Antonio M. Peinado ; José C. Segura |
title_fullStr | Speech recognition over digital channels robustness and standards Antonio M. Peinado ; José C. Segura |
title_full_unstemmed | Speech recognition over digital channels robustness and standards Antonio M. Peinado ; José C. Segura |
title_short | Speech recognition over digital channels |
title_sort | speech recognition over digital channels robustness and standards |
title_sub | robustness and standards |
topic | Digitale Sprachverarbeitung (DE-588)4233857-8 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd |
topic_facet | Digitale Sprachverarbeitung Automatische Spracherkennung |
url | http://www.loc.gov/catdir/toc/ecip0614/2006014790.html http://www.loc.gov/catdir/enhancements/fy0645/2006014790-d.html http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017443718&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT peinadoantoniom speechrecognitionoverdigitalchannelsrobustnessandstandards AT segurajosec speechrecognitionoverdigitalchannelsrobustnessandstandards |
No print copy is available.