Automatic speech recognition: the development of the SPHINX system
Main author: | |
---|---|
Format: | Book |
Language: | English |
Published: | Boston [etc.]: Kluwer Acad. Publ., 1989 |
Series: | The Kluwer international series in engineering and computer science 62 |
Subjects: | |
Online access: | Table of contents |
Description: | Bibliography: pp. [187]-203 |
Description: | XIV, 207 pp., illustrations |
ISBN: | 0898382963 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV004160172 | ||
003 | DE-604 | ||
005 | 20190228 | ||
007 | t| | ||
008 | 901119s1989 xx d||| |||| 00||| eng d | ||
020 | |a 0898382963 |9 0-89838-296-3 | ||
035 | |a (OCoLC)632713873 | ||
035 | |a (DE-599)BVBBV004160172 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-91 |a DE-739 |a DE-19 |a DE-355 |a DE-83 | ||
050 | 0 | |a TK7882.S65 | |
082 | 0 | |a 006.4/54 |2 19 | |
084 | |a ES 945 |0 (DE-625)27935: |2 rvk | ||
084 | |a ST 306 |0 (DE-625)143654: |2 rvk | ||
084 | |a ELT 532f |2 stub | ||
100 | 1 | |a Li, Kaifu |d 1961- |e Verfasser |0 (DE-588)1176434330 |4 aut | |
245 | 1 | 0 | |a Automatic speech recognition |b the development of the SPHINX system |c by Kai-Fu Lee |
264 | 1 | |a Boston [u.a.] |b Kluwer Acad. Publ. |c 1989 | |
300 | |a XIV, 207 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a The Kluwer international series in engineering and computer science |v 62 | |
500 | |a Literaturverz. S. [187] - 203 | ||
650 | 4 | |a Automatic speech recognition | |
650 | 0 | 7 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |D s |
689 | 0 | |5 DE-604 | |
830 | 0 | |a The Kluwer international series in engineering and computer science |v 62 |w (DE-604)BV023545171 |9 62 | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=002594336&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-002594336 |
Record in the search index
_version_ | 1817704477548871680 |
---|---|
adam_text |
Table of Contents
Table of Contents v
List of Figures ix
List of Tables xi
Foreword by Raj Reddy xiii
Acknowledgements xv
1. Introduction 1
1.1. Constrained Speech Recognition: Achievements and Limitations 2
1.1.1. Speaker Independence 3
1.1.2. Continuous Speech 6
1.1.3. Large Vocabulary 8
1.1.4. Natural Task 8
1.2. Relaxing the Constraints: The SPHINX System 9
1.2.1. Hidden Markov Models: A Representation of Speech 10
1.2.2. Adding Human Knowledge 11
1.2.3. Finding a Good Unit of Speech 12
1.2.4. Speaker Learning and Adaptation 14
1.3. Summary and Monograph Outline 15
2. Hidden Markov Modeling of Speech 17
2.1. Definition of a Hidden Markov Model 17
2.2. Three HMM Problems 19
2.2.1. The Evaluation Problem: The Forward Algorithm 20
2.2.2. The Decoding Problem: The Viterbi Algorithm 22
2.2.3. The Learning Problem: The Forward-Backward Algorithm 23
2.3. Implementational Issues 26
2.3.1. Tied Transition 26
2.3.2. Null Transitions 27
2.3.3. Initialization 27
2.3.4. Scaling or Log Compression 28
2.3.5. Multiple Independent Observations 30
2.3.6. Smoothing 30
2.4. Using HMMs for Speech Recognition 32
2.4.1. Representation 32
2.4.1.1. Continuous vs. Discrete Model 32
2.4.1.2. HMM Representation of Speech Units 34
2.4.1.3. HMM Representation of Other Knowledge Sources 36
2.4.2. Using HMM for Isolated Word Tasks 36
2.4.2.1. Training 36
2.4.2.2. Recognition 37
2.4.3. Using HMM for Continuous Speech Tasks 38
2.4.3.1. Training 38
2.4.3.2. Recognition 39
3. Task and Databases 45
3.1. The Resource Management Task and Database 45
3.1.1. The Vocabulary 45
3.1.2. The Grammar 46
3.1.3. The TIRM Database 47
3.2. The TIMIT Database 48
4. The Baseline SPHINX System 51
4.1. Signal Processing 51
4.2. Vector Quantization 52
4.2.1. The Distortion Measure 52
4.2.2. A Hierarchical VQ Algorithm 53
4.3. The Phone Model 54
4.4. The Pronunciation Dictionary 55
4.5. HMM Training 56
4.6. HMM Recognition 59
4.7. Results and Discussion 60
4.8. Summary 62
5. Adding Knowledge 63
5.1. Fixed-Width Speech Parameters 64
5.1.1. Bilinear Transform on the Cepstrum Coefficients 64
5.1.2. Differenced Cepstrum Coefficients 65
5.1.3. Power and Differenced Power 66
5.1.4. Integrating Frame-Based Parameters 67
5.1.4.1. Stack and Reduce 67
5.1.4.2. Composite Distance Metric 68
5.1.4.3. Multiple Codebooks 69
5.2. Variable-Width Speech Parameters 72
5.2.1. Duration 72
5.2.2. Knowledge-based Parameters 75
5.3. Lexical/Phonological Improvements 75
5.3.1. Insertion/Deletion Modeling 77
5.3.2. Multiple Pronunciations 79
5.3.3. Other Dictionary/Phone-Set Improvements 81
5.3.3.1. Phonological Rules 81
5.3.3.2. Non-Phonemic Affricates 81
5.3.3.3. Tailoring HMM Topology 82
5.3.3.4. Final Phone Set and Dictionary 83
5.4. Results and Discussion 84
5.5. Summary 88
6. Finding a Good Unit of Speech 91
6.1. Previously Proposed Units of Speech 91
6.1.1. Words 91
6.1.2. Phones 92
6.1.3. Multi-Phone Units 93
6.1.4. Explicit Transition Modeling 94
6.1.5. Word-Dependent Phones 95
6.1.6. Triphones (Context-Dependent Phones) 95
6.1.7. Summary of Previous Units 97
6.2. Deleted Interpolation of Contextual Models 97
6.3. Function-Word-Dependent Phones 100
6.4. Generalized Triphones 103
6.5. Summary of SPHINX Training Procedure 106
6.6. Results and Discussion 107
6.7. Summary 111
7. Learning and Adaptation 115
7.1. Speaker Adaptation through Speaker Cluster Selection 116
7.1.1. Speaker Clustering 117
7.1.2. Speaker Cluster Identification 118
7.2. Interpolated Re-estimation of HMM Parameters 118
7.2.1. Different Speaker-Adaptive Estimates 119
7.2.2. Interpolated Re-estimation 122
7.3. Results and Discussion 124
7.4. Summary 126
8. Summary of Results 129
8.1. SPHINX Results 129
8.2. Comparison with Other Systems 131
8.3. Error Analysis 133
9. Conclusion 137
9.1. Trainability vs. Specificity: A Unified View 137
9.2. Contributions 138
9.3. Future Work 141
9.4. Final Remarks 143
Appendix I. Evaluating Speech Recognizers 145
I.1. Perplexity 145
I.2. Computing Error Rate 146
Appendix II. The Resource Management Task 149
II.1. The Vocabulary and the SPHINX Pronunciation Dictionary 149
II.2. The Grammar 170
II.3. Training and Test Speakers 170
Appendix III. Examples of SPHINX Recognition 173
References 187
Index 205 |
any_adam_object | 1 |
author | Li, Kaifu 1961- |
author_GND | (DE-588)1176434330 |
author_facet | Li, Kaifu 1961- |
author_role | aut |
author_sort | Li, Kaifu 1961- |
author_variant | k l kl |
building | Verbundindex |
bvnumber | BV004160172 |
callnumber-first | T - Technology |
callnumber-label | TK7882 |
callnumber-raw | TK7882.S65 |
callnumber-search | TK7882.S65 |
callnumber-sort | TK 47882 S65 |
callnumber-subject | TK - Electrical and Nuclear Engineering |
classification_rvk | ES 945 ST 306 |
classification_tum | ELT 532f |
ctrlnum | (OCoLC)632713873 (DE-599)BVBBV004160172 |
dewey-full | 006.4/54 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.4/54 |
dewey-search | 006.4/54 |
dewey-sort | 16.4 254 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Computer Science Linguistics Electrical Engineering Literary Studies |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 cb4500</leader><controlfield tag="001">BV004160172</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20190228</controlfield><controlfield tag="007">t|</controlfield><controlfield tag="008">901119s1989 xx d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0898382963</subfield><subfield code="9">0-89838-296-3</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)632713873</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV004160172</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-19</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-83</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK7882.S65</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.4/54</subfield><subfield code="2">19</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 945</subfield><subfield code="0">(DE-625)27935:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 306</subfield><subfield code="0">(DE-625)143654:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ELT 532f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Li, Kaifu</subfield><subfield code="d">1961-</subfield><subfield 
code="e">Verfasser</subfield><subfield code="0">(DE-588)1176434330</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Automatic speech recognition</subfield><subfield code="b">the development of the SPHINX system</subfield><subfield code="c">by Kai-Fu Lee</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boston [u.a.]</subfield><subfield code="b">Kluwer Acad. Publ.</subfield><subfield code="c">1989</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XIV, 207 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">The Kluwer international series in engineering and computer science</subfield><subfield code="v">62</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturverz. S. 
[187] - 203</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Automatic speech recognition</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">The Kluwer international series in engineering and computer science</subfield><subfield code="v">62</subfield><subfield code="w">(DE-604)BV023545171</subfield><subfield code="9">62</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=002594336&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-002594336</subfield></datafield></record></collection> |
id | DE-604.BV004160172 |
illustrated | Illustrated |
indexdate | 2024-12-06T15:14:35Z |
institution | BVB |
isbn | 0898382963 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-002594336 |
oclc_num | 632713873 |
open_access_boolean | |
owner | DE-91 DE-BY-TUM DE-739 DE-19 DE-BY-UBM DE-355 DE-BY-UBR DE-83 |
owner_facet | DE-91 DE-BY-TUM DE-739 DE-19 DE-BY-UBM DE-355 DE-BY-UBR DE-83 |
physical | XIV, 207 S. graph. Darst. |
publishDate | 1989 |
publishDateSearch | 1989 |
publishDateSort | 1989 |
publisher | Kluwer Acad. Publ. |
record_format | marc |
series | The Kluwer international series in engineering and computer science |
series2 | The Kluwer international series in engineering and computer science |
spelling | Li, Kaifu 1961- Verfasser (DE-588)1176434330 aut Automatic speech recognition the development of the SPHINX system by Kai-Fu Lee Boston [u.a.] Kluwer Acad. Publ. 1989 XIV, 207 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier The Kluwer international series in engineering and computer science 62 Literaturverz. S. [187] - 203 Automatic speech recognition Automatische Spracherkennung (DE-588)4003961-4 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 s DE-604 The Kluwer international series in engineering and computer science 62 (DE-604)BV023545171 62 Digitalisierung UB Regensburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=002594336&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Li, Kaifu 1961- Automatic speech recognition the development of the SPHINX system The Kluwer international series in engineering and computer science Automatic speech recognition Automatische Spracherkennung (DE-588)4003961-4 gnd |
subject_GND | (DE-588)4003961-4 |
title | Automatic speech recognition the development of the SPHINX system |
title_auth | Automatic speech recognition the development of the SPHINX system |
title_exact_search | Automatic speech recognition the development of the SPHINX system |
title_full | Automatic speech recognition the development of the SPHINX system by Kai-Fu Lee |
title_fullStr | Automatic speech recognition the development of the SPHINX system by Kai-Fu Lee |
title_full_unstemmed | Automatic speech recognition the development of the SPHINX system by Kai-Fu Lee |
title_short | Automatic speech recognition |
title_sort | automatic speech recognition the development of the sphinx system |
title_sub | the development of the SPHINX system |
topic | Automatic speech recognition Automatische Spracherkennung (DE-588)4003961-4 gnd |
topic_facet | Automatic speech recognition Automatische Spracherkennung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=002594336&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV023545171 |
work_keys_str_mv | AT likaifu automaticspeechrecognitionthedevelopmentofthesphinxsystem |