Text to speech synthesis: new paradigms and advances
Format: | Book |
---|---|
Language: | English |
Published: | Upper Saddle River, N.J. : Prentice Hall Professional Technical Reference, 2004 |
Edition: | 1st printing |
Subjects: | |
Online access: | Table of contents |
Description: | Includes bibliographical references and index |
Physical description: | XXIII, 257 pp., illustrations |
ISBN: | 013145661X |
Internal format
MARC
LEADER | 00000nam a2200000zc 4500 | ||
---|---|---|---|
001 | BV019539601 | ||
003 | DE-604 | ||
005 | 20041103 | ||
007 | t | ||
008 | 041102s2004 xxud||| |||| 00||| eng d | ||
010 | |a 2004010674 | ||
020 | |a 013145661X |c hardcover : alk. paper |9 0-13-145661-X | ||
035 | |a (OCoLC)249193154 | ||
035 | |a (DE-599)BVBBV019539601 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
044 | |a xxu |c US | ||
049 | |a DE-29T | ||
050 | 0 | |a TK7882.S65 | |
082 | 0 | |a 621.399 | |
245 | 1 | 0 | |a Text to speech synthesis |b new paradigms and advances |c [edited by] Shrikanth Narayanan, Abeer Alwan |
250 | |a 1. print. | ||
264 | 1 | |a Upper Saddle River, N.J. |b Prentice Hall Professional Technical Reference |c 2004 | |
300 | |a XXIII, 257 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Includes bibliographical references and index | ||
650 | 4 | |a aSpeech synthesis | |
650 | 0 | 7 | |a Sprachsynthese |0 (DE-588)4056501-4 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4143413-4 |a Aufsatzsammlung |2 gnd-content | |
689 | 0 | 0 | |a Sprachsynthese |0 (DE-588)4056501-4 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Narayanan, Shrikanth |e Sonstige |4 oth | |
700 | 1 | |a Alwan, Abeer |e Sonstige |4 oth | |
856 | 4 | 2 | |m HEBIS Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=012907966&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-012907966 |
Record in the search index
_version_ | 1804132923545223168 |
---|---|
adam_text | TEXT TO SPEECH SYNTHESIS
New Paradigms and Advances
Shrikanth Narayanan
Abeer Alwan
PRENTICE HALL PTR
Prentice Hall Professional Technical Reference
Upper Saddle River, New Jersey 07458
www.phptr.com
Contents
PREFACE xiii
FOREWORD xvi
1 REDUCING DISCONTINUITIES AT SYNTHESIS TIME FOR CORPUS-BASED SPEECH SYNTHESIS
Baris Bozkurt, Thierry Dutoit, Romain Prudon, Christophe d'Alessandro, and Vincent Pagel 1
1.1 Introduction 1
1.2 Shift-Only F0 Smoothing 2
1.2.1 Where to Apply Pitch Shifts 3
1.2.2 Calculating Shifts to Be Introduced 4
1.2.3 Preliminary Listening Tests for Shift-Only F0 Smoothing 6
1.3 Improving Quality of MBROLA Synthesis 8
1.3.1 Background 8
1.3.2 TP-MBROLA Synthesis 9
1.4 Evaluation 12
1.4.1 Sample Preparation 12
1.4.2 Test Procedure 13
1.4.3 Test Results 13
1.5 Discussions and Conclusion 14
1.6 Bibliography 15
2 VOICE QUALITY VARIATION IN A LONG-TERM RECORDING OF A SINGLE SPEAKER SPEECH CORPUS
Hisashi Kawai and Minoru Tsuzaki 19
2.1 Introduction 19
2.2 Perceptual Experiment 20
2.2.1 Speech material 20
2.2.2 Stimuli 21
2.2.3 Procedure 22
2.2.4 Results 22
2.3 Factors of Voice Quality Variation 23
2.4 Candidates of Acoustic Correlates 25
2.4.1 Spectral Tilt in 0-4 kHz Band 25
2.4.2 MFCC Distance in 0-4 kHz Band 26
2.4.3 Higher Frequency Power 27
2.4.4 Peak Amplitude of Autocorrelation Coefficients 27
2.4.5 Fundamental Frequency and Speech Rate 27
2.4.6 Time Intervals 28
2.5 Prediction of Voice Quality Difference Scores 28
2.6 Summary 32
2.7 Bibliography 32
3 JOIN COST FOR UNIT SELECTION SPEECH SYNTHESIS
Jithendra Vepa and Simon King 35
3.1 Introduction 35
3.2 Previous Work 37
3.2.1 Join Cost Functions Based on Spectral Measures 37
3.2.2 Combined Join Cost and Target Cost Functions 39
3.3 Spectral Distances 42
3.3.1 Parameterizations 43
3.3.2 Simple Distance Measures 44
3.3.3 Statistically Motivated Distance Measures 44
3.3.4 Weighted Distances 46
3.4 Perceptual Listening Tests 46
3.4.1 Test Stimuli 47
3.4.2 Test Design 48
3.4.3 Test Procedure 49
3.5 Results and Discussion 49
3.5.1 Listener Ratings 49
3.5.2 Correlations with Statistical Distances 50
3.5.3 Correlations with Weighted Distances 53
3.6 Conclusions 56
3.6.1 Weighted Sums of Join Costs 57
3.6.2 The Listening Test 58
3.6.3 Correlation as an Evaluation Tool 58
3.6.4 Future Work 59
3.7 Bibliography 59
4 ARTICULATORY MODELING: A ROLE IN CONCATENATIVE TEXT TO SPEECH SYNTHESIS
M. Mohan Sondhi and Daniel J. Sinder 63
4.1 Introduction 63
4.2 Articulatory Modeling 65
4.2.1 Vocal Tract Acoustics 65
4.2.2 Articulatory Parameters 66
4.2.3 Acoustic Source Models 68
4.2.4 Synthesis from the Parameters 69
4.3 Rule-Based Control of the Parameters 74
4.4 Concatenative Articulatory Synthesis 75
4.4.1 Motivation 75
4.4.2 Terminology 76
4.4.3 Articulatory Units from Natural Speech 78
4.4.4 The Speech Mimic 79
4.4.5 A Prototype TTS System 83
4.5 Concluding Remarks 84
4.6 Bibliography 85
5 MINIMIZING THE AMOUNT OF PITCH MODIFICATION IN SPEECH SYNTHESIS
Esther Klabbers, Jan van Santen and Johan Wouters 89
5.1 Introduction 89
5.2 Speech Corpus Analysis 92
5.2.1 Prosodic Factors 92
5.2.2 Material 95
5.2.3 Distance Measures 96
5.2.4 Results 98
5.3 Text Corpus Analysis 100
5.3.1 Material 100
5.3.2 Results 101
5.4 Perceptual Experiment 102
5.4.1 Material 102
5.4.2 Method 103
5.4.3 Results 103
5.5 Conclusion 105
5.6 Bibliography 106
6 THE USE OF SPEECH RECOGNITION TECHNOLOGY IN SPEECH SYNTHESIS
Mari Ostendorf and Ivan Bulyko 109
6.1 Introduction 109
6.2 Speech Recognition 110
6.3 ASR in Synthesis 114
6.3.1 Speech Synthesis as a Search Problem 114
6.3.2 ASR Tools for Annotation 116
6.3.3 Speech Models 118
6.3.4 Adaptation for Voice Transformation 119
6.3.5 N-grams for Text Processing and Language Generation 119
6.3.6 Statistical Models for Prosody Prediction 120
6.4 Limitations 121
6.5 Speculations 123
6.5.1 ASR and Parametric Synthesis 124
6.5.2 Can Synthesis Impact Recognition? 124
6.6 Bibliography 126
7 AN HMM-BASED APPROACH TO MULTILINGUAL SPEECH SYNTHESIS
Keiichi Tokuda, Heiga Zen and Alan W. Black 135
7.1 Introduction 135
7.2 HMM-Based Speech Synthesis System 137
7.2.1 Training 137
7.2.2 Synthesis 139
7.3 F0 Pattern Modeling by HMM 140
7.3.1 HMM Based on Multispace Probability Distribution 140
7.3.2 Application to F0 Pattern Modeling 142
7.4 Speech-Parameter Generation from an HMM 143
7.4.1 Speech-Parameter-Generation Algorithm 143
7.4.2 Determination of State Durations 146
7.4.3 Example of Parameter Generation 146
7.5 Implementation on Festival Architecture 146
7.6 Discussion 149
7.7 Conclusion 150
7.8 Bibliography 151
8 PROSODY CONTROL FOR HMM-BASED JAPANESE TTS
Koji Iwano, Masahiro Yamada, Taro Togawa and Sadaoki Furui 155
8.1 Introduction 155
8.2 Outline of HMM-Based TTS System 156
8.3 Prosody Generation Using the Quantification Theory (Type 1) 158
8.3.1 Quantification Theory (Type 1) 158
8.3.2 F0 Contour Control Model 158
8.3.3 Phoneme-Duration-Control Model 162
8.4 Speech-Rate-Variable Synthesis Method 168
8.4.1 Database 168
8.4.2 Phoneme-Duration Model Generated by Interpolation 168
8.4.3 Experiments 169
8.4.4 Experimental Results 170
8.5 Conclusions 170
8.6 Bibliography 171
9 SYNTHESIZING EXPRESSIVE SPEECH: OVERVIEW, CHALLENGES, AND OPEN QUESTIONS
Murtaza Bulut, Shrikanth Narayanan and Lewis Johnson 175
9.1 Introduction 175
9.2 Theories of Emotion 177
9.3 Dimensions of Emotional Space 178
9.4 Speech Synthesis Methods 180
9.4.1 Formant Synthesis (Rule-Driven Synthesis) 181
9.4.2 Concatenative Synthesis (Data-Driven Synthesis) 182
9.4.3 Articulatory Synthesis (Model-Driven Synthesis) 186
9.5 Emotional Speech Data Collection 187
9.5.1 Data Collection for Concatenative Speech Synthesis 187
9.6 Experimental Evaluation of Expressive Speech 190
9.7 Presentation of Results From Case Studies 191
9.8 Conclusion 196
9.9 Open Questions and Future Directions 196
9.10 Bibliography 197
10 UNIT SELECTION SYNTHESIS OF PROSODY: EVALUATION USING DIPHONE TRANSPLANTATION
Romain Prudon, Christophe d'Alessandro and Philippe Boula de Mareuil 203
10.1 Introduction 203
10.2 Computing Prosody by Selection 204
10.2.1 Databases 204
10.2.2 Selection System Architecture 205
10.2.3 Tuning Selection for Prosody Synthesis 209
10.3 Comparative Evaluation 209
10.3.1 Prosody Generation Modules 209
10.3.2 Test Methodology 211
10.4 Results 212
10.4.1 Presentation of the Results 212
10.4.2 Overall Analysis of the Results 213
10.4.3 Analysis of the Results by Sentence Length 214
10.5 Conclusion 215
10.6 Bibliography 215
11 TOWARD EXPRESSIVE SYNTHETIC SPEECH
Ellen Eide, Raimo Bakis, Wael Hamza and John F. Pitrelli 219
11.1 Introduction 219
11.2 A Pilot Study For Generating Expressive Speech 222
11.2.1 Baseline System 222
11.2.2 Data 223
11.2.3 Experiment 1: Liveliness 224
11.2.4 Experiment 2: Sadness 225
11.2.5 Experiment 3: Anger 226
11.2.6 Experiment 4: Expression Detection 227
11.3 Generating Expressive Speech with Limited Resources 228
11.3.1 Expressive Data Collection 228
11.3.2 F0 Target Estimation 229
11.3.3 Target-Duration Estimation 230
11.3.4 Results 231
11.3.5 Adaptation via Sinusoidal Modeling 232
11.4 Rule-Based Methods for Generating Expressive Speech 233
11.5 Use of an Expressive TTS System 235
11.5.1 Proposed Extensions to SSML 237
11.6 Assessing Performance 243
11.7 Conclusions 245
11.8 Bibliography 247
11.9 FOOTNOTES 249
11.10 COPYRIGHT FORMS 249
11.11 REFERENCES 249
INDEX 251
|
any_adam_object | 1 |
building | Verbundindex |
bvnumber | BV019539601 |
callnumber-first | T - Technology |
callnumber-label | TK7882 |
callnumber-raw | TK7882.S65 |
callnumber-search | TK7882.S65 |
callnumber-sort | TK 47882 S65 |
callnumber-subject | TK - Electrical and Nuclear Engineering |
ctrlnum | (OCoLC)249193154 (DE-599)BVBBV019539601 |
dewey-full | 621.399 |
dewey-hundreds | 600 - Technology (Applied sciences) |
dewey-ones | 621 - Applied physics |
dewey-raw | 621.399 |
dewey-search | 621.399 |
dewey-sort | 3621.399 |
dewey-tens | 620 - Engineering and allied operations |
discipline | Electrical engineering / Electronics / Communications engineering |
edition | 1. print. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01551nam a2200409zc 4500</leader><controlfield tag="001">BV019539601</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20041103 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">041102s2004 xxud||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2004010674</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">013145661X</subfield><subfield code="c">hardcover : alk. paper</subfield><subfield code="9">0-13-145661-X</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)249193154</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV019539601</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK7882.S65</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">621.399</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Text to speech synthesis</subfield><subfield code="b">new paradigms and advances</subfield><subfield code="c">[edited by] Shrikanth Narayanan, Abeer Alwan</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. 
print.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Upper Saddle River, N.J.</subfield><subfield code="b">Prentice Hall Professional Technical Reference</subfield><subfield code="c">2004</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXIII, 257 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">aSpeech synthesis</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Sprachsynthese</subfield><subfield code="0">(DE-588)4056501-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4143413-4</subfield><subfield code="a">Aufsatzsammlung</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Sprachsynthese</subfield><subfield code="0">(DE-588)4056501-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Narayanan, Shrikanth</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Alwan, Abeer</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="856" ind1="4" 
ind2="2"><subfield code="m">HEBIS Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=012907966&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-012907966</subfield></datafield></record></collection> |
genre | (DE-588)4143413-4 Aufsatzsammlung gnd-content |
genre_facet | Aufsatzsammlung |
id | DE-604.BV019539601 |
illustrated | Illustrated |
indexdate | 2024-07-09T20:00:32Z |
institution | BVB |
isbn | 013145661X |
language | English |
lccn | 2004010674 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-012907966 |
oclc_num | 249193154 |
open_access_boolean | |
owner | DE-29T |
owner_facet | DE-29T |
physical | XXIII, 257 S. graph. Darst. |
publishDate | 2004 |
publishDateSearch | 2004 |
publishDateSort | 2004 |
publisher | Prentice Hall Professional Technical Reference |
record_format | marc |
spelling | Text to speech synthesis new paradigms and advances [edited by] Shrikanth Narayanan, Abeer Alwan 1. print. Upper Saddle River, N.J. Prentice Hall Professional Technical Reference 2004 XXIII, 257 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Includes bibliographical references and index aSpeech synthesis Sprachsynthese (DE-588)4056501-4 gnd rswk-swf (DE-588)4143413-4 Aufsatzsammlung gnd-content Sprachsynthese (DE-588)4056501-4 s DE-604 Narayanan, Shrikanth Sonstige oth Alwan, Abeer Sonstige oth HEBIS Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=012907966&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Text to speech synthesis new paradigms and advances aSpeech synthesis Sprachsynthese (DE-588)4056501-4 gnd |
subject_GND | (DE-588)4056501-4 (DE-588)4143413-4 |
title | Text to speech synthesis new paradigms and advances |
title_auth | Text to speech synthesis new paradigms and advances |
title_exact_search | Text to speech synthesis new paradigms and advances |
title_full | Text to speech synthesis new paradigms and advances [edited by] Shrikanth Narayanan, Abeer Alwan |
title_fullStr | Text to speech synthesis new paradigms and advances [edited by] Shrikanth Narayanan, Abeer Alwan |
title_full_unstemmed | Text to speech synthesis new paradigms and advances [edited by] Shrikanth Narayanan, Abeer Alwan |
title_short | Text to speech synthesis |
title_sort | text to speech synthesis new paradigms and advances |
title_sub | new paradigms and advances |
topic | aSpeech synthesis Sprachsynthese (DE-588)4056501-4 gnd |
topic_facet | aSpeech synthesis Sprachsynthese Aufsatzsammlung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=012907966&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT narayananshrikanth texttospeechsynthesisnewparadigmsandadvances AT alwanabeer texttospeechsynthesisnewparadigmsandadvances |