Text-to-speech synthesis
Main author: | Taylor, Paul |
---|---|
Format: | Book |
Language: | German |
Published: | Cambridge [et al.]: Cambridge Univ. Pr., 2009 |
Edition: | 1st publ. |
Subjects: | |
Online access: | Table of contents |
Description: | Bibliography: pp. [556]-582 |
Description: | XXVIII, 597 pp., diagrams |
ISBN: | 9780521899277 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV035039831 | ||
003 | DE-604 | ||
005 | 20090212 | ||
007 | t | ||
008 | 080905s2009 d||| |||| 00||| ger d | ||
020 | |a 9780521899277 |9 978-0-521-89927-7 | ||
035 | |a (OCoLC)221147648 | ||
035 | |a (DE-599)BVBBV035039831 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a ger | |
049 | |a DE-19 |a DE-11 | ||
050 | 0 | |a TK7882.S65 | |
082 | 0 | |a 006.54 |2 22 | |
084 | |a ES 950 |0 (DE-625)27936: |2 rvk | ||
100 | 1 | |a Taylor, Paul |e Verfasser |4 aut | |
245 | 1 | 0 | |a Text-to-speech synthesis |c Paul Taylor |
250 | |a 1. publ. | ||
264 | 1 | |a Cambridge [u.a.] |b Cambridge Univ. Pr. |c 2009 | |
300 | |a XXVIII, 597 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Literaturverz. S. [556] - 582 | ||
650 | 7 | |a Sprachcodierung |2 swd | |
650 | 7 | |a Synthèse automatique de la parole |2 ram | |
650 | 4 | |a Speech synthesis | |
856 | 4 | 2 | |m HEBIS Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016708674&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-016708674 |
Record in the search index
_version_ | 1804137974089121792 |
---|---|
adam_text | TEXT-TO-SPEECH SYNTHESIS PAUL TAYLOR UNIVERSITY OF CAMBRIDGE CAMBRIDGE
UNIVERSITY PRESS CONTENTS FOREWORD PAGE XXIII PREFACE XXVII 1 INTRODUCTION
1 1.1 WHAT ARE TEXT-TO-SPEECH SYSTEMS FOR? 2 1.2 WHAT SHOULD THE GOALS
OF TEXT-TO-SPEECH SYSTEM DEVELOPMENT BE? 3 1.3 THE ENGINEERING APPROACH
4 1.4 OVERVIEW OF THE BOOK 5 1.4.1 VIEWPOINTS WITHIN THE BOOK 5 1.4.2
READERS' BACKGROUNDS 6 1.4.3 BACKGROUND AND SPECIALIST SECTIONS 7
2 COMMUNICATION AND LANGUAGE 8 2.1 TYPES OF COMMUNICATION 2.1.1 AFFECTIVE
COMMUNICATION 2.1.2 ICONIC COMMUNICATION 2.1.3 SYMBOLIC COMMUNICATION
2.1.4 COMBINATIONS OF SYMBOLS 2.1.5 MEANING, FORM AND SIGNAL 2.2 HUMAN
COMMUNICATION 2.2.1 VERBAL COMMUNICATION 2.2.2 LINGUISTIC LEVELS 2.2.3
AFFECTIVE PROSODY 2.2.4 AUGMENTATIVE PROSODY 2.3 COMMUNICATION PROCESSES
2.3.1 COMMUNICATION FACTORS 2.3.2 GENERATION 2.3.3 ENCODING 2.3.4
DECODING 2.3.5 UNDERSTANDING 2.4 DISCUSSION 2.5 SUMMARY 3 THE TEXT-TO-SPEECH
PROBLEM 26 3.1 SPEECH AND WRITING 26 3.1.1 PHYSICAL NATURE 27 3.1.2
SPOKEN FORM AND WRITTEN FORM 28 3.1.3 USE 29 3.1.4 PROSODIC AND VERBAL
CONTENT 30 3.1.5 COMPONENT BALANCE 31 3.1.6 NON-LINGUISTIC CONTENT 32
3.1.7 SEMIOTIC SYSTEMS 33 3.1.8 WRITING SYSTEMS 34 3.2 READING ALOUD 35
3.2.1 READING SILENTLY AND READING ALOUD 35 3.2.2 PROSODY IN READING
ALOUD 36 3.2.3 VERBAL CONTENT AND STYLE IN READING ALOUD 37 3.3
TEXT-TO-SPEECH SYSTEM ORGANISATION 37 3.3.1 THE COMMON-FORM MODEL 38
3.3.2 OTHER MODELS 39 3.3.3 COMPARISON 40 3.4 SYSTEMS 41 3.4.1 A SIMPLE
TEXT-TO-SPEECH SYSTEM 41 3.4.2 CONCEPT TO SPEECH 42 3.4.3 CANNED
SPEECH AND LIMITED-DOMAIN SYNTHESIS 43 3.5 KEY PROBLEMS IN
TEXT-TO-SPEECH 44 3.5.1 TEXT CLASSIFICATION WITH RESPECT TO SEMIOTIC
SYSTEMS 44 3.5.2 DECODING NATURAL-LANGUAGE TEXT 46 3.5.3 NATURALNESS 47
3.5.4 INTELLIGIBILITY: ENCODING THE MESSAGE IN SIGNAL 48 3.5.5 AUXILIARY
GENERATION FOR PROSODY 49 3.5.6 ADAPTING THE SYSTEM TO THE SITUATION 50
3.6 SUMMARY 50 4 TEXT SEGMENTATION AND ORGANISATION 52 4.1 OVERVIEW OF THE
PROBLEM 52 4.2 WORDS AND SENTENCES 53 4.2.1 WHAT IS A WORD? 53 4.2.2
DEFINING WORDS IN TEXT-TO-SPEECH 55 4.2.3 SCOPE AND MORPHOLOGY 59 4.2.4
CONTRACTIONS AND CLITICS 60 4.2.5 SLANG FORMS 61 4.2.6 HYPHENATED FORMS 61
4.2.7 WHAT IS A SENTENCE? 62 4.2.8 THE LEXICON 63 4.3 TEXT SEGMENTATION 63
4.3.1 TOKENISATION 64 4.3.2 TOKENISATION AND
PUNCTUATION 65 4.3.3 TOKENISATION ALGORITHMS 66 4.3.4 SENTENCE SPLITTING
67 4.4 PROCESSING DOCUMENTS 68 4.4.1 MARKUP LANGUAGES 68 4.4.2
INTERPRETING CHARACTERS 70 4.5 TEXT-TO-SPEECH ARCHITECTURES 71 4.6
DISCUSSION 75 4.6.1 FURTHER READING 75 4.6.2 SUMMARY 76 5 TEXT DECODING:
FINDING THE WORDS FROM THE TEXT 78 5.1 OVERVIEW OF TEXT DECODING 78 5.2
TEXT-CLASSIFICATION ALGORITHMS 79 5.2.1 FEATURES AND ALGORITHMS 79 5.2.2
TAGGING AND WORD-SENSE DISAMBIGUATION 82 5.2.3 AD-HOC APPROACHES 83
5.2.4 DETERMINISTIC RULE APPROACHES 83 5.2.5 DECISION LISTS 85 5.2.6
NAIVE BAYES CLASSIFIER 86 5.2.7 DECISION TREES 87 5.2.8 PART-OF-SPEECH
TAGGING 88 5.3 NON-NATURAL-LANGUAGE TEXT 92 5.3.1 SEMIOTIC
CLASSIFICATION 92 5.3.2 SEMIOTIC DECODING 95 5.3.3 VERBALISATION 95 5.4
NATURAL-LANGUAGE TEXT 97 5.4.1 ACRONYMS AND LETTER SEQUENCES 99 5.4.2
HOMOGRAPH DISAMBIGUATION 99 5.4.3 NON-HOMOGRAPHS 101 5.5
NATURAL-LANGUAGE PARSING 102 5.5.1 CONTEXT-FREE GRAMMARS 102 5.5.2
STATISTICAL PARSING 105 5.6 DISCUSSION 105 5.6.1 FURTHER READING 108
5.6.2 SUMMARY 109 6 PROSODY PREDICTION FROM TEXT 111 6.1 PROSODIC FORM 111
6.2 PHRASING 112 6.2.1 PHRASING PHENOMENA 112 6.2.2 MODELS OF PHRASING
113 6.3 PROMINENCE 115 6.3.1 SYNTACTIC PROMINENCE PATTERNS
116 6.3.2 DISCOURSE PROMINENCE PATTERNS 118 6.3.3 PROMINENCE SYSTEMS,
DATA AND LABELLING 119 6.4 INTONATION AND TUNE 121 6.5 PROSODIC MEANING
AND FUNCTION 122 6.5.1 AFFECTIVE PROSODY 123 6.5.2 SUPRASEGMENTALITY 124
6.5.3 AUGMENTATIVE PROSODY 125 6.5.4 SYMBOLIC COMMUNICATION AND PROSODIC
STYLE 126 6.6 DETERMINING PROSODY FROM THE TEXT 127 6.6.1 PROSODY AND
HUMAN READING 127 6.6.2 CONTROLLING THE DEGREE OF AUGMENTATIVE PROSODY
128 6.6.3 PROSODY AND SYNTHESIS TECHNIQUES 128 6.7 PHRASING PREDICTION
129 6.7.1 EXPERIMENTAL FORMULATION 129 6.7.2 DETERMINISTIC APPROACHES
130 6.7.3 CLASSIFIER APPROACHES 132 6.7.4 HMM APPROACHES 133 6.7.5
HYBRID APPROACHES 135 6.8 PROMINENCE PREDICTION 136 6.8.1
COMPOUND-NOUN PHRASES 136 6.8.2 FUNCTION-WORD PROMINENCE 138 6.8.3
DATA-DRIVEN APPROACHES 138 6.9 INTONATIONAL-TUNE PREDICTION 139 6.10
DISCUSSION 139 6.10.1 LABELLING SCHEMES AND LABELLING ACCURACY 139
6.10.2 LINGUISTIC THEORIES AND PROSODY 141 6.10.3 SYNTHESISING
SUPRASEGMENTAL AND TRUE PROSODY 142 6.10.4 PROSODY IN REAL DIALOGUES 143
6.10.5 CONCLUSION 144 6.10.6 SUMMARY 144 7 PHONETICS AND PHONOLOGY 146 7.1
ARTICULATORY PHONETICS AND SPEECH PRODUCTION 146 7.1.1 THE VOCAL ORGANS
147 7.1.2 SOUND SOURCES 147 7.1.3 SOUND OUTPUT 150 7.1.4 THE VOCAL-TRACT
FILTER 150 7.1.5 VOWELS 151 7.1.6 CONSONANTS 153 7.1.7 EXAMINING SPEECH
PRODUCTION 155 7.2 ACOUSTICS, PHONETICS AND SPEECH PERCEPTION 156
7.2.1 ACOUSTIC REPRESENTATIONS 156 7.2.2 ACOUSTIC CHARACTERISTICS 159 7.3
THE COMMUNICATIVE USE OF SPEECH 160 7.3.1 COMMUNICATING DISCRETE
INFORMATION WITH A CONTINUOUS CHANNEL 161 7.3.2 PHONEMES, PHONES AND
ALLOPHONES 162 7.3.3 ALLOPHONIC VARIATION AND PHONETIC CONTEXT 166 7.3.4
COARTICULATION, TARGETS AND TRANSIENTS 168 7.3.5 THE CONTINUOUS NATURE OF
SPEECH 169 7.3.6 TRANSCRIPTION 170 7.3.7 THE DISTINCTIVENESS OF SPEECH IN
COMMUNICATION 171 7.4 PHONOLOGY: THE LINGUISTIC ORGANISATION OF SPEECH 172
7.4.1 PHONOTACTICS 172 7.4.2 WORD FORMATION 179 7.4.3 DISTINCTIVE FEATURES
AND PHONOLOGICAL THEORIES 181 7.4.4 SYLLABLES 184 7.4.5 LEXICAL STRESS 186
7.5 DISCUSSION 189 7.5.1 FURTHER READING 189 7.5.2 SUMMARY 190 8
PRONUNCIATION 192 8.1 PRONUNCIATION REPRESENTATIONS 192 8.1.1 WHY BOTHER?
192 8.1.2 PHONEMIC AND PHONETIC INPUT 193 8.1.3 DIFFICULTIES IN DERIVING
PHONETIC INPUT 194 8.1.4 A STRUCTURED APPROACH TO PRONUNCIATION 195 8.1.5
ABSTRACT PHONOLOGICAL REPRESENTATIONS 196 8.2 FORMULATING A PHONOLOGICAL
REPRESENTATION SYSTEM 197 8.2.1 SIMPLE CONSONANTS AND VOWELS 197 8.2.2
DIFFICULT CONSONANTS 199 8.2.3 DIPHTHONGS AND AFFRICATES 201 8.2.4
APPROXIMANT-VOWEL COMBINATIONS 201 8.2.5 DEFINING THE FULL INVENTORY 203
8.2.6 PHONEME NAMES 204 8.2.7 SYLLABIC ISSUES 206 8.3 THE LEXICON 207 8.3.1
LEXICON AND RULES 208 8.3.2 LEXICON FORMATS 210 8.3.3 THE OFFLINE LEXICON
213 8.3.4 THE SYSTEM LEXICON 214 8.3.5 LEXICON QUALITY 215 8.3.6
DETERMINING THE PRONUNCIATIONS OF UNKNOWN WORDS 216 8.4
GRAPHEME-TO-PHONEME CONVERSION 218 8.4.1 RULE-BASED TECHNIQUES 218 8.4.2
GRAPHEME-TO-PHONEME ALIGNMENT 219 8.4.3 NEURAL NETWORKS 219 8.4.4
PRONUNCIATION BY ANALOGY 220 8.4.5 OTHER DATA-DRIVEN TECHNIQUES 221 8.4.6
STATISTICAL TECHNIQUES 221 8.5 FURTHER ISSUES 222 8.5.1 MORPHOLOGY 222
8.5.2 LANGUAGE ORIGIN AND NAMES 223 8.5.3 POST-LEXICAL PROCESSING 223 8.6
SUMMARY 224 9 SYNTHESIS OF PROSODY 225 9.1 INTONATION OVERVIEW
225 9.1.1 F0 AND PITCH 226 9.1.2 INTONATIONAL FORM 226 9.1.3 MODELS OF
F0 CONTOURS 227 9.1.4 MICRO-PROSODY 229 9.2 INTONATIONAL BEHAVIOUR 229
9.2.1 INTONATIONAL TUNE 229 9.2.2 DOWNDRIFT 230 9.2.3 PITCH RANGE 233
9.2.4 PITCH ACCENTS AND BOUNDARY TONES 234 9.3 INTONATION THEORIES AND
MODELS 236 9.3.1 TRADITIONAL MODELS AND THE BRITISH SCHOOL 236 9.3.2 THE
DUTCH SCHOOL 237 9.3.3 AUTOSEGMENTAL-METRICAL AND TOBI MODELS 237 9.3.4
THE INTSINT MODEL 239 9.3.5 THE FUJISAKI MODEL AND SUPERIMPOSITIONAL
MODELS 239 9.3.6 THE TILT MODEL 242 9.3.7 COMPARISON 244 9.4 INTONATION
SYNTHESIS WITH AM MODELS 245 9.4.1 PREDICTION OF AM LABELS FROM TEXT 246
9.4.2 DETERMINISTIC SYNTHESIS METHODS 246 9.4.3 DATA-DRIVEN SYNTHESIS
METHODS 247 9.4.4 ANALYSIS WITH AUTOSEGMENTAL MODELS 248 9.5 INTONATION
SYNTHESIS WITH DETERMINISTIC ACOUSTIC MODELS 248 9.5.1 SYNTHESIS WITH
SUPERIMPOSITIONAL MODELS 249 9.5.2 SYNTHESIS WITH THE TILT MODEL 249
9.5.3 ANALYSIS WITH FUJISAKI AND TILT MODELS 250 9.6 DATA-DRIVEN
INTONATION MODELS 250 9.6.1 UNIT-SELECTION-STYLE APPROACHES 251 9.6.2
DYNAMIC-SYSTEM MODELS 252 9.6.3 HIDDEN MARKOV MODELS 253 9.6.4 FUNCTIONAL
MODELS 254 9.7 TIMING 254 9.7.1 FORMULATION OF THE TIMING PROBLEM 255 9.7.2
THE NATURE OF TIMING 255 9.7.3 KLATT RULES 256 9.7.4 THE SUMS-OF-PRODUCTS
MODEL 257 9.7.5 THE CAMPBELL MODEL 258 9.7.6 OTHER REGRESSION TECHNIQUES
259 9.8 DISCUSSION 259 9.8.1 FURTHER READING 260 9.8.2 SUMMARY 260 10
SIGNALS AND FILTERS 262
10.1 ANALOGUE SIGNALS 262 10.1.1 SIMPLE PERIODIC SIGNALS: SINUSOIDS 263
10.1.2 GENERAL PERIODIC SIGNALS 265 10.1.3 SINUSOIDS AS COMPLEX
EXPONENTIALS 266 10.1.4 FOURIER ANALYSIS 269 10.1.5 THE FREQUENCY DOMAIN
270 10.1.6 THE FOURIER TRANSFORM 275 10.2 DIGITAL SIGNALS 278 10.2.1
DIGITAL WAVEFORMS 279 10.2.2 DIGITAL REPRESENTATIONS 280 10.2.3 THE
DISCRETE-TIME FOURIER TRANSFORM 280 10.2.4 THE DISCRETE FOURIER
TRANSFORM 281 10.2.5 THE Z-TRANSFORM 282 10.2.6 THE FREQUENCY DOMAIN FOR
DIGITAL SIGNALS 283 10.3 PROPERTIES OF TRANSFORMS 284 10.3.1 LINEARITY
284 10.3.2 TIME AND FREQUENCY DUALITY 284 10.3.3 SCALING 285 10.3.4
IMPULSE PROPERTIES 285 10.3.5 TIME DELAY 286 10.3.6 FREQUENCY SHIFT 286
10.3.7 CONVOLUTION 287 10.3.8 ANALYTICAL AND NUMERICAL ANALYSIS 287
10.3.9 STOCHASTIC SIGNALS 288 10.4 DIGITAL FILTERS 288 10.4.1 DIFFERENCE
EQUATIONS 289 10.4.2 THE IMPULSE RESPONSE 289 10.4.3 THE FILTER
CONVOLUTION SUM 292 10.4.4 THE FILTER TRANSFER FUNCTION 293
10.4.5 THE TRANSFER FUNCTION AND THE IMPULSE RESPONSE 293 10.5 DIGITAL
FILTER ANALYSIS AND DESIGN 294 10.5.1 POLYNOMIAL ANALYSIS: POLES AND
ZEROS 294 10.5.2 FREQUENCY INTERPRETATION OF THE Z-DOMAIN TRANSFER
FUNCTION 297 10.5.3 FILTER CHARACTERISTICS 298 10.5.4 PUTTING IT ALL
TOGETHER 304 10.6 SUMMARY 305 11 ACOUSTIC MODELS OF SPEECH PRODUCTION
309 11.1 THE ACOUSTIC THEORY OF SPEECH PRODUCTION 309 11.1.1 COMPONENTS
IN THE MODEL 309 11.2 THE PHYSICS OF SOUND 311 11.2.1 RESONANT SYSTEMS
311 11.2.2 TRAVELLING WAVES 313 11.2.3 ACOUSTIC WAVES 315 11.2.4
ACOUSTIC REFLECTION 317 11.3 THE VOWEL-TUBE MODEL 318 11.3.1 DISCRETE
TIME AND DISTANCE 319 11.3.2 JUNCTION OF TWO TUBES 320 11.3.3 SPECIAL
CASES OF JUNCTIONS 322 11.3.4 THE TWO-TUBE VOCAL-TRACT MODEL 323
11.3.5 THE SINGLE-TUBE MODEL 325 11.3.6 THE MULTI-TUBE VOCAL-TRACT MODEL
327 11.3.7 THE ALL-POLE RESONATOR MODEL 329 11.4 SOURCE AND RADIATION
MODELS 330 11.4.1 RADIATION 330 11.4.2 THE GLOTTAL SOURCE 330 11.5 MODEL
REFINEMENTS 333 11.5.1 MODELLING THE NASAL CAVITY 333 11.5.2 SOURCE
POSITIONS IN THE ORAL CAVITY 334 11.5.3 MODELS WITH VOCAL-TRACT LOSSES
335 11.5.4 SOURCE AND RADIATION EFFECTS 336 11.6 DISCUSSION 336 11.6.1
FURTHER READING 339 11.6.2 SUMMARY 339 12 ANALYSIS OF SPEECH SIGNALS 341
12.1 SHORT-TERM SPEECH ANALYSIS 341 12.1.1 WINDOWING 342 12.1.2
SHORT-TERM SPECTRAL REPRESENTATIONS 343 12.1.3 FRAME LENGTHS AND SHIFTS
345 12.1.4 THE SPECTROGRAM 348 12.1.5 AUDITORY SCALES 351
12.2 FILTER-BANK ANALYSIS 352 12.3 THE CEPSTRUM 353 12.3.1 CEPSTRUM
DEFINITION 353 12.3.2 TREATING THE MAGNITUDE SPECTRUM AS A SIGNAL 353
12.3.3 CEPSTRAL ANALYSIS AS DECONVOLUTION 355 12.3.4 CEPSTRAL ANALYSIS
DISCUSSION 356 12.4 LINEAR-PREDICTION ANALYSIS 357 12.4.1 FINDING THE
COEFFICIENTS: THE COVARIANCE METHOD 358 12.4.2 THE AUTOCORRELATION
METHOD 360 12.4.3 LEVINSON-DURBIN RECURSION 361 12.5 SPECTRAL-ENVELOPE
AND VOCAL-TRACT REPRESENTATIONS 362 12.5.1 LINEAR-PREDICTION SPECTRA 362
12.5.2 TRANSFER-FUNCTION POLES 364 12.5.3 REFLECTION COEFFICIENTS 366
12.5.4 LOG AREA RATIOS 367 12.5.5 LINE-SPECTRUM FREQUENCIES 367 12.5.6
LINEAR-PREDICTION CEPSTRA 369 12.5.7 MEL-SCALED CEPSTRA 370 12.5.8
PERCEPTUAL LINEAR PREDICTION 370 12.5.9 FORMANT TRACKING 370 12.6 SOURCE
REPRESENTATIONS 372 12.6.1 RESIDUAL SIGNALS 372 12.6.2 CLOSED-PHASE
ANALYSIS 374 12.6.3 OPEN-PHASE ANALYSIS 377 12.6.4 IMPULSE/NOISE MODELS
378 12.6.5 PARAMETERISATION OF GLOTTAL-FLOW SIGNALS 379 12.7 PITCH AND
EPOCH DETECTION 379 12.7.1 PITCH DETECTION 379 12.7.2 EPOCH DETECTION:
FINDING THE INSTANT OF GLOTTAL CLOSURE 381 12.8 DISCUSSION 384 12.8.1
FURTHER READING 385 12.8.2 SUMMARY 386 13 SYNTHESIS TECHNIQUES BASED ON
VOCAL-TRACT MODELS 387 13.1 SYNTHESIS SPECIFICATION: THE INPUT TO THE
SYNTHESISER 387 13.2 FORMANT SYNTHESIS 388 13.2.1 SOUND SOURCES 389
13.2.2 SYNTHESISING A SINGLE FORMANT 390 13.2.3 RESONATORS IN SERIES AND
PARALLEL 391 13.2.4 SYNTHESISING CONSONANTS 392 13.2.5 A COMPLETE
SYNTHESISER 394 13.2.6 THE PHONETIC INPUT TO THE SYNTHESISER 394 13.2.7
FORMANT-SYNTHESIS QUALITY 397 13.3 CLASSICAL LINEAR-PREDICTION SYNTHESIS
399 13.3.1 COMPARISON WITH FORMANT SYNTHESIS 399 13.3.2 THE IMPULSE/NOISE
SOURCE MODEL 400 13.3.3 LINEAR-PREDICTION DIPHONE-CONCATENATIVE SYNTHESIS
401 13.3.4 A COMPLETE SYNTHESISER 403 13.3.5 PROBLEMS WITH THE SOURCE 404
13.4 ARTICULATORY SYNTHESIS 405 13.5 DISCUSSION 407 13.5.1 FURTHER READING
409 13.5.2 SUMMARY 410 14 SYNTHESIS BY CONCATENATION AND
SIGNAL-PROCESSING MODIFICATION 412 14.1 SPEECH UNITS IN SECOND-GENERATION
SYSTEMS 413 14.1.1 CREATING A DIPHONE INVENTORY 414 14.1.2 OBTAINING
DIPHONES FROM SPEECH 414 14.2 PITCH-SYNCHRONOUS OVERLAP AND ADD (PSOLA) 415
14.2.1 TIME-DOMAIN PSOLA 416 14.2.2 EPOCH MANIPULATION 417 14.2.3 HOW DOES
PSOLA WORK? 421 14.3 RESIDUAL-EXCITED LINEAR PREDICTION (RELP) 421 14.3.1
RESIDUAL MANIPULATION 423 14.3.2 LINEAR-PREDICTION PSOLA 423 14.4
SINUSOIDAL MODELS 424 14.4.1 PURE SINUSOIDAL MODELS 425 14.4.2
HARMONIC/NOISE MODELS 426 14.5 MBROLA 429 14.6 SYNTHESIS FROM CEPSTRAL
COEFFICIENTS 429 14.7 CONCATENATION ISSUES 431 14.8 DISCUSSION 433 14.8.1
FURTHER READING 433 14.8.2 SUMMARY 433 15 HIDDEN-MARKOV-MODEL SYNTHESIS 435
15.1 THE HMM FORMALISM 435 15.1.1 OBSERVATION PROBABILITIES 436 15.1.2
DELTA COEFFICIENTS 438 15.1.3 ACOUSTIC REPRESENTATIONS AND COVARIANCE 439
15.1.4 STATES AND TRANSITIONS 440 15.1.5 RECOGNISING WITH HMMS 440 15.1.6
LANGUAGE MODELS 443 15.1.7 THE VITERBI ALGORITHM 444 15.1.8 TRAINING HMMS
447 15.1.9 CONTEXT-SENSITIVE MODELLING 451 15.1.10 ARE
HMMS A GOOD MODEL OF SPEECH? 454 15.2 SYNTHESIS FROM HIDDEN MARKOV
MODELS 456 15.2.1 FINDING THE LIKELIEST OBSERVATIONS GIVEN THE STATE
SEQUENCE 457 15.2.2 FINDING THE LIKELIEST OBSERVATIONS AND STATE
SEQUENCE 459 15.2.3 ACOUSTIC REPRESENTATIONS 459 15.2.4
CONTEXT-SENSITIVE SYNTHESIS MODELS 463 15.2.5 DURATION MODELLING 464
15.2.6 SIGNAL PROCESSING IN HMM SYNTHESIS 464 15.2.7 HMM SYNTHESIS
SYSTEMS 465 15.3 LABELLING DATABASES WITH HMMS 467 15.3.1 DETERMINING
THE WORD SEQUENCE 467 15.3.2 DETERMINING THE PHONE SEQUENCE 468 15.3.3
DETERMINING THE PHONE BOUNDARIES 468 15.3.4 MEASURING THE QUALITY OF THE
ALIGNMENTS 470 15.4 OTHER DATA-DRIVEN SYNTHESIS TECHNIQUES 471 15.5
DISCUSSION 471 15.5.1 FURTHER READING 471 15.5.2 SUMMARY 472 16
UNIT-SELECTION SYNTHESIS 474 16.1 FROM CONCATENATIVE SYNTHESIS TO UNIT
SELECTION 474 16.1.1 EXTENDING CONCATENATIVE SYNTHESIS 475 16.1.2 THE
HUNT AND BLACK ALGORITHM 477 16.2 FEATURES 479 16.2.1 BASE TYPES 479
16.2.2 LINGUISTIC AND ACOUSTIC FEATURES 480 16.2.3 CHOICE OF FEATURES
481 16.2.4 TYPES OF FEATURES 482 16.3 THE INDEPENDENT-FEATURE
TARGET-FUNCTION FORMULATION 484 16.3.1 THE PURPOSE OF THE TARGET
FUNCTION 484 16.3.2 DEFINING A PERCEPTUAL SPACE 485 16.3.3 PERCEPTUAL
SPACES DEFINED BY INDEPENDENT FEATURES 486 16.3.4 SETTING THE TARGET
WEIGHTS USING ACOUSTIC DISTANCES 488 16.3.5 LIMITATIONS OF THE
INDEPENDENT-FEATURE FORMULATION 491 16.4 THE ACOUSTIC-SPACE
TARGET-FUNCTION FORMULATION 493 16.4.1 DECISION-TREE CLUSTERING 494
16.4.2 GENERAL PARTIAL-SYNTHESIS FUNCTIONS 496 16.5 JOIN FUNCTIONS 497
16.5.1 BASIC ISSUES IN JOINING UNITS 497 16.5.2 PHONE-CLASS JOIN COSTS
498 16.5.3 ACOUSTIC-DISTANCE JOIN COSTS 499 16.5.4 COMBINING CATEGORICAL
AND ACOUSTIC JOIN COSTS 500 16.5.5 PROBABILISTIC AND SEQUENCE JOIN
COSTS 501 16.5.6 JOIN CLASSIFIERS 502 16.6 SEARCHING 504
16.6.1 BASE TYPES AND SEARCHING 505 16.6.2 PRUNING 508 16.6.3
PRE-SELECTION 508 16.6.4 BEAM PRUNING 509 16.6.5 MULTI-PASS SEARCHING
509 16.7 DISCUSSION 510 16.7.1 UNIT SELECTION AND SIGNAL PROCESSING 511
16.7.2 FEATURES, COSTS AND PERCEPTION 511 16.7.3 EXAMPLE UNIT-SELECTION
SYSTEMS 512 16.7.4 FURTHER READING 514 16.7.5 SUMMARY 515 17 FURTHER
ISSUES 517 17.1 DATABASES 517 17.1.1 UNIT-SELECTION DATABASES 517 17.1.2
TEXT MATERIALS 518 17.1.3 PROSODY DATABASES 518 17.1.4 LABELLING 519
17.1.5 WHAT EXACTLY IS HAND LABELLING? 519 17.1.6 AUTOMATIC LABELLING
521 17.1.7 AVOIDING EXPLICIT LABELS 521 17.2 EVALUATION 522 17.2.1
SYSTEM TESTING: INTELLIGIBILITY AND NATURALNESS 523 17.2.2
WORD-RECOGNITION TESTS 523 17.2.3 NATURALNESS TESTS 524 17.2.4 TEST DATA
525 17.2.5 UNIT OR COMPONENT TESTING 525 17.2.6 COMPETITIVE EVALUATIONS
526 17.3 AUDIO-VISUAL SPEECH SYNTHESIS 527 17.3.1 SPEECH CONTROL 528
17.4 SYNTHESIS OF EMOTIONAL AND EXPRESSIVE SPEECH 529 17.4.1 DESCRIBING
EMOTION 529 17.4.2 SYNTHESISING EMOTION WITH PROSODY CONTROL 529 17.4.3
SYNTHESISING EMOTION WITH VOICE TRANSFORMATION 530 17.4.4 UNIT SELECTION
AND HMM TECHNIQUES 531 17.5 SUMMARY 531 18 CONCLUSION 533 18.1 SPEECH
TECHNOLOGY AND LINGUISTICS 533 18.2 FUTURE DIRECTIONS 536 18.3
CONCLUSION 539 APPENDIX A PROBABILITY 540 A.1 DISCRETE
PROBABILITIES 540 A.1.1 DISCRETE RANDOM VARIABLES 540 A.1.2
PROBABILITY MASS FUNCTIONS 541 A.1.3 EXPECTED VALUES 541 A.1.4 MOMENTS
OF A PMF 542 A.2 PAIRS OF DISCRETE RANDOM VARIABLES 542 A.2.1 MARGINAL
DISTRIBUTIONS 543 A.2.2 INDEPENDENCE 543 A.2.3 EXPECTED VALUES 543 A.2.4
MOMENTS OF A JOINT DISTRIBUTION 544 A.2.5 HIGHER-ORDER MOMENTS AND
COVARIANCE 544 A.2.6 CORRELATION 544 A.2.7 CONDITIONAL PROBABILITY 545
A.2.8 BAYES RULE 545 A.2.9 SUM OF RANDOM VARIABLES 545 A.2.10 THE CHAIN
RULE 546 A.2.11 ENTROPY 546 A.3 CONTINUOUS RANDOM VARIABLES 547 A.3.1
CONTINUOUS RANDOM VARIABLES 548 A.3.2 EXPECTED VALUES 548 A.3.3 THE
GAUSSIAN DISTRIBUTION 549 A.3.4 THE UNIFORM DISTRIBUTION 549 A.3.5
CUMULATIVE DENSITY FUNCTIONS 549 A.4 PAIRS OF CONTINUOUS RANDOM
VARIABLES 550 A.4.1 INDEPENDENT VERSUS UNCORRELATED 551 A.4.2 THE SUM OF
TWO RANDOM VARIABLES 551 A.4.3 ENTROPY 552 A.4.4 KULLBACK-LEIBLER
DISTANCE 552 APPENDIX B PHONE DEFINITIONS 553 REFERENCES 556 INDEX 583
|
adam_txt |
TEXT-TO-SPEECH SYNTHESIS PAUL TAYLOR UNIVERSITY OF CAMBRIDGE CAMBRIDGE
UNIVERSITY PRESS CONTENTS FOREWORD PAGE XXIII PREFACE XXVII INTRODUCTION
1 1.1 WHAT ARE TEXT-TO-SPEECH SYSTEMS FOR? 2 1.2 WHAT SHOULD THE GOALS
OF TEXT-TO-SPEECH SYSTEM DEVELOPMENT BE? 3 1.3 THE ENGINEERING APPROACH
4 1.4 OVERVIEW OF THE BOOK 5 1.4.1 VIEWPOINTS WITHIN THE BOOK 5 1.4.2
READERS'BACKGROUNDS 6 1.4.3 BACKGROUND AND SPECIALIST SECTIONS ' 7
COMMUNICATION AND LANGUAGE 8 9 10 11 12 13 14 16 17 18 18 19 20 21 22 22
23 24 2.1 2.2 2.3 2.4 2.5 TYPES 2.1.1 2.1.2 2.1.3 2.1.4 2.1.5 OF
COMMUNICATION AFFECTIVE COMMUNICATION ICONIC COMMUNICATION SYMBOLIC
COMMUNICATION COMBINATIONS OF SYMBOLS MEANING, FORM AND SIGNAL HUMAN
COMMUNICATION 2.2.1 2.2.2 2.2.3 2.2.4 VERBAL COMMUNICATION LINGUISTIC
LEVELS AFFECTIVE PROSODY AUGMENTATIVE PROSODY COMMUNICATION PROCESSES
2.3.1 2.3.2 2.3.3 2.3.4 2.3.5 COMMUNICATION FACTORS GENERATION ENCODING
DECODING UNDERSTANDING DISCUSSION SUMMARY CONTENTS THE TEXT-TO-SPEECH
PROBLEM 26 3.1 SPEECH AND WRITING 26 3.1.1 PHYSICAL NATURE 27 3.1.2
SPOKEN FORM AND WRITTEN FORM 28 3.1.3 USE 29 3.1.4 PROSODIC AND VERBAL
CONTENT 30 3.1.5 COMPONENT BALANCE 31 3.1.6 NON-LINGUISTIC CONTENT 32
3.1.7 SEMIOTIC SYSTEMS 33 3.1.8 WRITING SYSTEMS 34 3.2 READING ALOUD 35
3.2.1 READING SILENTLY AND READING ALOUD 35 3.2.2 PROSODY IN READING
ALOUD 36 3.2.3 VERBAL CONTENT AND STYLE IN READING ALOUD 37 3.3
TEXT-TO-SPEECH SYSTEM ORGANISATION 37 3.3.1 THE COMMON-FORM MODEL 38
3.3.2 OTHER MODELS 39 3.3.3 COMPARISON 40 3.4 SYSTEMS 41 3.4.1 A SIMPLE
TEXT-TO-SPEECH SYSTEM 41 3.4.2 CONCEPT TO SPEECH ' 42 3.4.3 CANNED
SPEECH AND LIMITED-DOMAIN SYNTHESIS 43 3.5 KEY PROBLEMS IN
TEXT-TO-SPEECH 44 3.5.1 TEXT CLASSIFICATION WITH RESPECT TO SEMIOTIC
SYSTEMS 44 3.5.2 DECODING NATURAL-LANGUAGE TEXT 46 3.5.3 NATURALNESS 47
3.5.4 INTELLIGIBILITY: ENCODING THE MESSAGE IN SIGNAL 48 3.5.5 AUXILIARY
GENERATION FOR PROSODY 49 3.5.6 ADAPTING THE SYSTEM TO THE SITUATION 50
3.6 SUMMARY 50 TEXT SEGMENTATION AND ORGANISATION 52 52 53 53 55 59 60
61 61 62 63 63 4.1 OVERVIEW OF THE PROBLEM 4.2 WORDS 4.2 4.2 4.2 4.2 4.2
4.2 4.2 4.2 .1 .2 .3 .4 .5 .6 .7 .8 4.3 TEXT SE AND SENTENCES WHAT IS A
WORD? DEFINING WORDS IN TEXT-TO-SPEECH SCOPE AND MORPHOLOGY CONTRACTIONS
AND CLITICS SLANG FORMS HYPHENATED FORMS WHAT IS A SENTENCE? THE LEXICON
GMENTATION CONTENTS 4.3.1 TOKENISATION 64 4.3.2 TOKENISATION AND
PUNCTUATION 65 4.3.3 TOKENISATION ALGORITHMS 66 4.3.4 SENTENCE SPLITTING
67 4.4 PROCESSING DOCUMENTS 68 4.4.1 MARKUP LANGUAGES 68 4.4.2
INTERPRETING CHARACTERS 70 4.5 TEXT-TO-SPEECH ARCHITECTURES 71 4.6
DISCUSSION 75 4.6.1 FURTHER READING 75 4.6.2 SUMMARY 76 TEXT DECODING:
FINDING THE WORDS FROM THE TEXT 78 5.1 OVERVIEW OF TEXT DECODING 78 5.2
TEXT-CLASSIFICATION ALGORITHMS 79 5.2.1 FEATURES AND ALGORITHMS 79 5.2.2
TAGGING AND WORD-SENSE DISAMBIGUATION 82 5.2.3 AD-HOC APPROACHES 83
5.2.4 DETERMINISTIC RULE APPROACHES 83 5.2.5 DECISION LISTS , 85 5.2.6
NAIVE BAYES CLASSIFIER ' 86 5.2.7 DECISION TREES 87 5.2.8 PART-OF-SPEECH
TAGGING 88 5.3 NON-NATURAL-LANGUAGE TEXT 92 5.3.1 SEMIOTIC
CLASSIFICATION 92 5.3.2 SEMIOTIC DECODING 95 5.3.3 VERBALISATION 95 5.4
NATURAL-LANGUAGE TEXT 97 5.4.1 ACRONYMS AND LETTER SEQUENCES 99 5.4.2
HOMOGRAPH DISAMBIGUATION 99 5.4.3 NON-HOMOGRAPHS 101 5.5
NATURAL-LANGUAGE PARSING 102 5.5.1 CONTEXT-FREE GRAMMARS 102 5.5.2
STATISTICAL PARSING 105 5.6 DISCUSSION 105 5.6.1 FURTHER READING 108
5.6.2 SUMMARY 109 PROSODY PREDICTION FROM TEXT 111 6.1 PROSODIC FORM 111
6.2 PHRASING 112 6.2.1 PHRASING PHENOMENA 112 6.2.2 MODELS OF PHRASING
113 XII CONTENTS 6.3 PROMINENCE 115 6.3.1 SYNTACTIC PROMINENCE PATTERNS
116 6.3.2 DISCOURSE PROMINENCE PATTERNS 118 6.3.3 PROMINENCE SYSTEMS,
DATA AND LABELLING 119 6.4 INTONATION AND TUNE 121 6.5 PROSODIC MEANING
AND FUNCTION 122 6.5.1 AFFECTIVE PROSODY 123 6.5.2 SUPRASEGMENTALITY 124
6.5.3 AUGMENTATIVE PROSODY 125 6.5.4 SYMBOLIC COMMUNICATION AND PROSODIC
STYLE 126 6.6 DETERMINING PROSODY FROM THE TEXT 127 6.6.1 PROSODY AND
HUMAN READING 127 6.6.2 CONTROLLING THE-DEGREE OF AUGMENTATIVE PROSODY
128 6.6.3 PROSODY AND SYNTHESIS TECHNIQUES 128 6.7 PHRASING PREDICTION
129 6.7.1 EXPERIMENTAL FORMULATION 129 6.7.2 DETERMINISTIC APPROACHES
130 6.7.3 CLASSIFIER APPROACHES 132 6.7.4 HMM APPROACHES 133 6.7.5
HYBRID APPROACHES 135 6.8 PROMINENCE PREDICTION . 136 6.8.1
COMPOUND-NOUN PHRASES 136 6.8.2 FUNCTION-WORD PROMINENCE 138 6.8.3
DATA-DRIVEN APPROACHES 138 6.9 INTONATIONAL-TUNE PREDICTION 139 6.10
DISCUSSION 139 6.10.1 LABELLING SCHEMES AND LABELLING ACCURACY 139
6.10.2 LINGUISTIC THEORIES AND PROSODY 141 6.10.3 SYNTHESISING
SUPRASEGMENTAL AND TRUE PROSODY 142 6.10.4 PROSODY IN REAL DIALOGUES 143
6.10.5 CONCLUSION 144 6.10.6 SUMMARY 144 PHONETICS AND PHONOLOGY 146 7.1
ARTICULATORY PHONETICS AND SPEECH PRODUCTION 146 7.1.1 THE VOCAL ORGANS
147 7.1.2 SOUND SOURCES 147 7.1.3 SOUND OUTPUT 150 7.1.4 THE VOCAL-TRACT
FILTER 150 7.1.5 VOWELS 151 7.1.6 CONSONANTS 153 7.1.7 EXAMINING SPEECH
PRODUCTION 155 7.2 ACOUSTICS, PHONETICS AND SPEECH PERCEPTION 156
CONTENTS XIII 7.3 7.4 7.5 7.2.1 ACOUSTIC REPRESENTATIONS 7.2.2 ACOUSTIC
CHARACTERISTICS THE COMMUNICATIVE USE OF SPEECH 7.3.1 COMMUNICATING
DISCRETE INFORMATION WITH A CONTINUOUS CHANNEL 7.3.2 PHONEMES, PHONES
AND ALLOPHONES 7.3.3 ALLOPHONIC VARIATION AND PHONETIC CONTEXT 7.3.4
COARTICULATION, TARGETS AND TRANSIENTS 7.3.5 THE CONTINUOUS NATURE OF
SPEECH 7.3.6 TRANSCRIPTION 7.3.7 THE DISTINCTIVENESS OF SPEECH IN
COMMUNICATION PHONOLOGY: THE LINGUISTIC ORGANISATION OF SPEECH 7.4.1
PHONOTACTICS 7.4.2 WORD FORMATION 7.4.3 DISTINCTIVE FEATURES AND
PHONOLOGICAL THEORIES 7.4.4 SYLLABLES 7.4.5 LEXICAL STRESS DISCUSSION
7.5.1 FURTHER READING 7.5.2 SUMMARY 8 PRONUNCIATION 8.1 8.2 8.3 8.4
PRONUNCIATION REPRESENTATIONS 8.1.1 WHY BOTHER? 8.1.2 PHONEMIC AND
PHONETIC INPUT 8.1.3 DIFFICULTIES IN DERIVING PHONETIC INPUT 8.1.4 A
STRUCTURED APPROACH TO PRONUNCIATION 8.1.5 ABSTRACT PHONOLOGICAL
REPRESENTATIONS FORMULATING A PHONOLOGICAL REPRESENTATION SYSTEM 8.2.1
SIMPLE CONSONANTS AND VOWELS 8.2.2 DIFFICULT CONSONANTS 8.2.3 DIPHTHONGS
AND AFFRICATES 8.2.4 APPROXIMANT-VOWEL COMBINATIONS 8.2.5 DEFINING THE
FULL INVENTORY 8.2.6 PHONEME NAMES 8.2.7 SYLLABIC ISSUES THE LEXICON
8.3.1 LEXICON AND RULES 8.3.2 LEXICON FORMATS 8.3.3 THE OFFLINE LEXICON
8.3.4 THE SYSTEM LEXICON 8.3.5 LEXICON QUALITY 8.3.6 DETERMINING THE
PRONUNCIATIONS OF UNKNOWN WORDS GRANHEME-TO-RJHONEME CONVERSION 156 159
160 161 162 166 168 169 170 171 172 172 179 181 184 186 189 189 190 192
192 192 193 194 195 196 197 197 199 201 201 203 204 206 207 208 210 213
214 215 216 218 XIV CONTENTS 8.4.1 8.4.2 8.4.3 8.4.4 8.4.5 8.4.6
RULE-BASED TECHNIQUES GRAPHEME-TO-PHONEME ALIGNMENT NEURAL NETWORKS
PRONUNCIATION BY ANALOGY OTHER DATA-DRIVEN TECHNIQUES STATISTICAL
TECHNIQUES 8.5 FURTHER ISSUES 8.5.1 8.5.2 8.5.3 MORPHOLOGY LANGUAGE
ORIGIN AND NAMES POST-LEXICAL PROCESSING 8.6 SUMMARY 218 219 219 220 221
221 222 222 223 223 224 SYNTHESIS OF PROSODY 225 9.1 INTONATION OVERVIEW
225 9.1.1 F0 AND PITCH 226 9.1.2 INTONATIONAL FORM 226 9.1.3 MODELS OF
F0 CONTOURS 227 9.1.4 MICRO-PROSODY 229 9.2 INTONATIONAL BEHAVIOUR 229
9.2.1 INTONATIONAL TUNE 229 9.2.2 DOWNDRIFT ' 230 9.2.3 PITCH RANGE 233
9.2.4 PITCH ACCENTS AND BOUNDARY TONES 234 9.3 INTONATION THEORIES AND
MODELS 236 9.3.1 TRADITIONAL MODELS AND THE BRITISH SCHOOL 236 9.3.2 THE
DUTCH SCHOOL 237 9.3.3 AUTOSEGMENTAL-METRICAL AND TOBI MODELS 237 9.3.4
THE INTSINT MODEL 239 9.3.5 THE FUJISAKI MODEL AND SUPERIMPOSITIONAL
MODELS 239 9.3.6 THE TILT MODEL 242 9.3.7 COMPARISON 244 9.4 INTONATION
SYNTHESIS WITH AM MODELS 245 9.4.1 PREDICTION OF AM LABELS FROM TEXT 246
9.4.2 DETERMINISTIC SYNTHESIS METHODS 246 9.4.3 DATA-DRIVEN SYNTHESIS
METHODS 247 9.4.4 ANALYSIS WITH AUTOSEGMENTAL MODELS 248 9.5 INTONATION
SYNTHESIS WITH DETERMINISTIC ACOUSTIC MODELS 248 9.5.1 SYNTHESIS WITH
SUPERIMPOSITIONAL MODELS 249 9.5.2 SYNTHESIS WITH THE TILT MODEL 249
9.5.3 ANALYSIS WITH FUJISAKI AND TILT MODELS 250 9.6 DATA-DRIVEN
INTONATION MODELS 250 9.6.1 UNIT-SELECTION-STYLE APPROACHES 251 9.6.2
DYNAMIC-SYSTEM MODELS 252 CONTENTS XV 9.7 9.8 9.6.3 9.6.4 HIDDEN MARKOV
MODELS FUNCTIONAL MODELS TIMING 9.7.1 9.7.2 9.7.3 9.7.4 9.7.5 9.7.6
FORMULATION OF THE TIMING PROBLEM THE NATURE OF TIMING KLATT RULES THE
SUMS-OF-PRODUCTS MODEL THE CAMPBELL MODEL OTHER REGRESSION TECHNIQUES
DISCUSSION 9.8.1 9.8.2 SIGNALS AND FURTHER READING SUMMARY FILTERS 253
254 254 255 255 256 257 258 259 259 260 260 10 SIGNALS AND FILTERS 262
10.1 ANALOGUE SIGNALS 262 10.1.1 SIMPLE PERIODIC SIGNALS: SINUSOIDS 263
10.1.2 GENERAL PERIODIC SIGNALS 265 10.1.3 SINUSOIDS AS COMPLEX
EXPONENTIALS 266 10.1.4 FOURIER ANALYSIS 269 10.1.5 THE FREQUENCY DOMAIN
270 10.1.6 THE FOURIER TRANSFORM ' 275 10.2 DIGITAL SIGNALS 278 10.2.1
DIGITAL WAVEFORMS 279 10.2.2 DIGITAL REPRESENTATIONS 280 10.2.3 THE
DISCRETE-TIME FOURIER TRANSFORM 280 10.2.4 THE DISCRETE FOURIER
TRANSFORM 281 10.2.5 THE Z-TRANSFORM 282 10.2.6 THE FREQUENCY DOMAIN FOR
DIGITAL SIGNALS 283 10.3 PROPERTIES OF TRANSFORMS 284 10.3.1 LINEARITY
284 10.3.2 TIME AND FREQUENCY DUALITY 284 10.3.3 SCALING 285 10.3.4
IMPULSE PROPERTIES 285 10.3.5 TIME DELAY 286 10.3.6 FREQUENCY SHIFT 286
10.3.7 CONVOLUTION 287 10.3.8 ANALYTICAL AND NUMERICAL ANALYSIS 287
10.3.9 STOCHASTIC SIGNALS 288 10.4 DIGITAL FILTERS 288 10.4.1 DIFFERENCE
EQUATIONS 289 10.4.2 THE IMPULSE RESPONSE 289 10.4.3 THE FILTER
CONVOLUTION SUM 292 10.4.4 THE FILTER TRANSFER FUNCTION 293 XVI CONTENTS
10.4.5 THE TRANSFER FUNCTION AND THE IMPULSE RESPONSE 293 10.5 DIGITAL
FILTER ANALYSIS AND DESIGN 294 10.5.1 POLYNOMIAL ANALYSIS: POLES AND
ZEROS 294 10.5.2 FREQUENCY INTERPRETATION OF THE Z-DOMAIN TRANSFER
FUNCTION 297 10.5.3 FILTER CHARACTERISTICS 298 10.5.4 PUTTING IT ALL
TOGETHER 304 10.6 SUMMARY 305 11 ACOUSTIC MODELS OF SPEECH PRODUCTION
309 11.1 THE ACOUSTIC THEORY OF SPEECH PRODUCTION 309 11.1.1 COMPONENTS
IN THE MODEL 309 11.2 THE PHYSICS OF SOUND 311 11.2.1 RESONANT SYSTEMS
311 11.2.2 TRAVELLING WAVES 313 11.2.3 ACOUSTIC WAVES 315 11.2.4
ACOUSTIC REFLECTION 317 11.3 THE VOWEL-TUBE MODEL 318 11.3.1 DISCRETE
TIME AND DISTANCE 319 11.3.2 JUNCTION OF TWO TUBES 320 11.3.3 SPECIAL
CASES OF JUNCTIONS 322 11.3.4 THE TWO-TUBE VOCAL-TRACT MODEL ' 323
11.3.5 THE SINGLE-TUBE MODEL 325 11.3.6 THE MULTI-TUBE VOCAL-TRACT MODEL
327 11.3.7 THE ALL-POLE RESONATOR MODEL 329 11.4 SOURCE AND RADIATION
MODELS 330 11.4.1 RADIATION 330 11.4.2 THE GLOTTAL SOURCE 330 11.5 MODEL
REFINEMENTS 333 11.5.1 MODELLING THE NASAL CAVITY 333 11.5.2 SOURCE
POSITIONS IN THE ORAL CAVITY 334 11.5.3 MODELS WITH VOCAL-TRACT LOSSES
335 11.5.4 SOURCE AND RADIATION EFFECTS 336 11.6 DISCUSSION 336 11.6.1
FURTHER READING 339 11.6.2 SUMMARY 339 12 ANALYSIS OF SPEECH SIGNALS 341
12.1 SHORT-TERM SPEECH ANALYSIS 341 12.1.1 WINDOWING 342 12.1.2
SHORT-TERM SPECTRAL REPRESENTATIONS 343 12.1.3 FRAME LENGTHS AND SHIFTS
345 12.1.4 THE SPECTROGRAM 348 12.1.5 AUDITORY SCALES 351 CONTENTS XVII
12.2 FILTER-BANK ANALYSIS 352 12.3 THE CEPSTRUM 353 12.3.1 CEPSTRUM
DEFINITION 353 12.3.2 TREATING THE MAGNITUDE SPECTRUM AS A SIGNAL 353
12.3.3 CEPSTRAL ANALYSIS AS DECONVOLUTION 355 12.3.4 CEPSTRAL ANALYSIS
DISCUSSION 356 12.4 LINEAR-PREDICTION ANALYSIS 357 12.4.1 FINDING THE
COEFFICIENTS: THE COVARIANCE METHOD 358 12.4.2 THE AUTOCORRELATION
METHOD 360 12.4.3 LEVINSON-DURBIN RECURSION 361 12.5 SPECTRAL-ENVELOPE
AND VOCAL-TRACT REPRESENTATIONS 362 12.5.1 LINEAR-PREDICTION SPECTRA 362
12.5.2 TRANSFER-FUNCTION POLES 364 12.5.3 REFLECTION COEFFICIENTS 366
12.5.4 LOG AREA RATIOS 367 12.5.5 LINE-SPECTRUM FREQUENCIES 367 12.5.6
LINEAR-PREDICTION CEPSTRA 369 12.5.7 MEL-SCALED CEPSTRA 370 12.5.8
PERCEPTUAL LINEAR PREDICTION 370 12.5.9 FORMANT TRACKING 370 12.6 SOURCE
REPRESENTATIONS , 372 12.6.1 RESIDUAL SIGNALS 372 12.6.2 CLOSED-PHASE
ANALYSIS 374 12.6.3 OPEN-PHASE ANALYSIS 377 12.6.4 IMPULSE/NOISE MODELS
378 12.6.5 PARAMETERISATION OF GLOTTAL-FLOW SIGNALS 379 12.7 PITCH AND
EPOCH DETECTION 379 12.7.1 PITCH DETECTION 379 12.7.2 EPOCH DETECTION:
FINDING THE INSTANT OF GLOTTAL CLOSURE 381 12.8 DISCUSSION 384 12.8.1
FURTHER READING 385 12.8.2 SUMMARY 386 13 SYNTHESIS TECHNIQUES BASED ON
VOCAL-TRACT MODELS 387 13.1 SYNTHESIS SPECIFICATION: THE INPUT TO THE
SYNTHESISER 387 13.2 FORMANT SYNTHESIS 388 13.2.1 SOUND SOURCES 389
13.2.2 SYNTHESISING A SINGLE FORMANT 390 13.2.3 RESONATORS IN SERIES AND
PARALLEL 391 13.2.4 SYNTHESISING CONSONANTS 392 13.2.5 A COMPLETE
SYNTHESISER 394 13.2.6 THE PHONETIC INPUT TO THE SYNTHESISER 394 13.2.7
FORMANT-SYNTHESIS QUALITY 397 13.3 CLASSICAL LINEAR-PREDICTION SYNTHESIS 399
13.3.1 COMPARISON WITH FORMANT SYNTHESIS 399 13.3.2 THE IMPULSE/NOISE SOURCE
MODEL 400 13.3.3 LINEAR-PREDICTION DIPHONE-CONCATENATIVE SYNTHESIS 401 13.3.4
A COMPLETE SYNTHESISER 403 13.3.5 PROBLEMS WITH THE SOURCE 404 13.4
ARTICULATORY SYNTHESIS 405 13.5 DISCUSSION 407 13.5.1 FURTHER READING 409
13.5.2 SUMMARY 410 14 SYNTHESIS BY CONCATENATION AND SIGNAL-PROCESSING
MODIFICATION 412 14.1 SPEECH UNITS IN SECOND-GENERATION SYSTEMS 413 14.1.1
CREATING A DIPHONE INVENTORY 414 14.1.2 OBTAINING DIPHONES FROM SPEECH 414
14.2 PITCH-SYNCHRONOUS OVERLAP AND ADD (PSOLA) 415 14.2.1 TIME-DOMAIN PSOLA
416 14.2.2 EPOCH MANIPULATION 417 14.2.3 HOW DOES PSOLA WORK? 421 14.3
RESIDUAL-EXCITED LINEAR PREDICTION (RELP) 421 14.3.1 RESIDUAL MANIPULATION 423
14.3.2 LINEAR-PREDICTION PSOLA 423 14.4 SINUSOIDAL MODELS 424 14.4.1 PURE
SINUSOIDAL MODELS 425 14.4.2 HARMONIC/NOISE MODELS 426 14.5 MBROLA 429 14.6
SYNTHESIS FROM CEPSTRAL COEFFICIENTS 429 14.7 CONCATENATION ISSUES 431 14.8
DISCUSSION 433 14.8.1 FURTHER READING 433 14.8.2 SUMMARY 433 15
HIDDEN-MARKOV-MODEL SYNTHESIS 435 15.1 THE HMM FORMALISM 435 15.1.1
OBSERVATION PROBABILITIES 436 15.1.2 DELTA COEFFICIENTS 438 15.1.3 ACOUSTIC
REPRESENTATIONS AND COVARIANCE 439 15.1.4 STATES AND TRANSITIONS 440 15.1.5
RECOGNISING WITH HMMS 440 15.1.6 LANGUAGE MODELS 443 15.1.7 THE VITERBI
ALGORITHM 444 15.1.8 TRAINING HMMS 447 15.1.9 CONTEXT-SENSITIVE MODELLING 451
15.1.10 ARE
HMMS A GOOD MODEL OF SPEECH? 454 15.2 SYNTHESIS FROM HIDDEN MARKOV
MODELS 456 15.2.1 FINDING THE LIKELIEST OBSERVATIONS GIVEN THE STATE
SEQUENCE 457 15.2.2 FINDING THE LIKELIEST OBSERVATIONS AND STATE
SEQUENCE 459 15.2.3 ACOUSTIC REPRESENTATIONS 459 15.2.4
CONTEXT-SENSITIVE SYNTHESIS MODELS 463 15.2.5 DURATION MODELLING 464
15.2.6 SIGNAL PROCESSING IN HMM SYNTHESIS 464 15.2.7 HMM SYNTHESIS
SYSTEMS 465 15.3 LABELLING DATABASES WITH HMMS 467 15.3.1 DETERMINING
THE WORD SEQUENCE 467 15.3.2 DETERMINING THE PHONE SEQUENCE 468 15.3.3
DETERMINING THE PHONE BOUNDARIES 468 15.3.4 MEASURING THE QUALITY OF THE
ALIGNMENTS 470 15.4 OTHER DATA-DRIVEN SYNTHESIS TECHNIQUES 471 15.5
DISCUSSION 471 15.5.1 FURTHER READING 471 15.5.2 SUMMARY 472 16
UNIT-SELECTION SYNTHESIS 474 16.1 FROM CONCATENATIVE SYNTHESIS TO UNIT
SELECTION 474 16.1.1 EXTENDING CONCATENATIVE SYNTHESIS 475 16.1.2 THE
HUNT AND BLACK ALGORITHM 477 16.2 FEATURES 479 16.2.1 BASE TYPES 479
16.2.2 LINGUISTIC AND ACOUSTIC FEATURES 480 16.2.3 CHOICE OF FEATURES
481 16.2.4 TYPES OF FEATURES 482 16.3 THE INDEPENDENT-FEATURE
TARGET-FUNCTION FORMULATION 484 16.3.1 THE PURPOSE OF THE TARGET
FUNCTION 484 16.3.2 DEFINING A PERCEPTUAL SPACE 485 16.3.3 PERCEPTUAL
SPACES DEFINED BY INDEPENDENT FEATURES 486 16.3.4 SETTING THE TARGET
WEIGHTS USING ACOUSTIC DISTANCES 488 16.3.5 LIMITATIONS OF THE
INDEPENDENT-FEATURE FORMULATION 491 16.4 THE ACOUSTIC-SPACE
TARGET-FUNCTION FORMULATION 493 16.4.1 DECISION-TREE CLUSTERING 494
16.4.2 GENERAL PARTIAL-SYNTHESIS FUNCTIONS 496 16.5 JOIN FUNCTIONS 497
16.5.1 BASIC ISSUES IN JOINING UNITS 497 16.5.2 PHONE-CLASS JOIN COSTS
498 16.5.3 ACOUSTIC-DISTANCE JOIN COSTS 499 16.5.4 COMBINING CATEGORICAL
AND ACOUSTIC JOIN COSTS 500 16.5.5 PROBABILISTIC AND SEQUENCE JOIN
COSTS 501 16.5.6 JOIN CLASSIFIERS 502 16.6 SEARCHING 504
16.6.1 BASE TYPES AND SEARCHING 505 16.6.2 PRUNING 508 16.6.3
PRE-SELECTION 508 16.6.4 BEAM PRUNING 509 16.6.5 MULTI-PASS SEARCHING
509 16.7 DISCUSSION 510 16.7.1 UNIT SELECTION AND SIGNAL PROCESSING 511
16.7.2 FEATURES, COSTS AND PERCEPTION 511 16.7.3 EXAMPLE UNIT-SELECTION
SYSTEMS 512 16.7.4 FURTHER READING 514 16.7.5 SUMMARY 515 17 FURTHER
ISSUES 517 17.1 DATABASES 517 17.1.1 UNIT-SELECTION DATABASES 517 17.1.2
TEXT MATERIALS 518 17.1.3 PROSODY DATABASES 518 17.1.4 LABELLING 519
17.1.5 WHAT EXACTLY IS HAND LABELLING? 519 17.1.6 AUTOMATIC LABELLING
521 17.1.7 AVOIDING EXPLICIT LABELS 521 17.2 EVALUATION 522 17.2.1
SYSTEM TESTING: INTELLIGIBILITY AND NATURALNESS 523 17.2.2
WORD-RECOGNITION TESTS 523 17.2.3 NATURALNESS TESTS 524 17.2.4 TEST DATA
525 17.2.5 UNIT OR COMPONENT TESTING 525 17.2.6 COMPETITIVE EVALUATIONS
526 17.3 AUDIO-VISUAL SPEECH SYNTHESIS 527 17.3.1 SPEECH CONTROL 528
17.4 SYNTHESIS OF EMOTIONAL AND EXPRESSIVE SPEECH 529 17.4.1 DESCRIBING
EMOTION 529 17.4.2 SYNTHESISING EMOTION WITH PROSODY CONTROL 529 17.4.3
SYNTHESISING EMOTION WITH VOICE TRANSFORMATION 530 17.4.4 UNIT SELECTION
AND HMM TECHNIQUES 531 17.5 SUMMARY 531 18 CONCLUSION 533 18.1 SPEECH
TECHNOLOGY AND LINGUISTICS 533 18.2 FUTURE DIRECTIONS 536 18.3
CONCLUSION 539 APPENDIX A PROBABILITY 540 A.1 DISCRETE
PROBABILITIES 540 A.1.1 DISCRETE RANDOM VARIABLES 540 A.1.2
PROBABILITY MASS FUNCTIONS 541 A.1.3 EXPECTED VALUES 541 A.1.4 MOMENTS
OF A PMF 542 A.2 PAIRS OF DISCRETE RANDOM VARIABLES 542 A.2.1 MARGINAL
DISTRIBUTIONS 543 A.2.2 INDEPENDENCE 543 A.2.3 EXPECTED VALUES 543 A.2.4
MOMENTS OF A JOINT DISTRIBUTION 544 A.2.5 HIGHER-ORDER MOMENTS AND
COVARIANCE 544 A.2.6 CORRELATION 544 A.2.7 CONDITIONAL PROBABILITY 545
A.2.8 BAYES' RULE 545 A.2.9 SUM OF RANDOM VARIABLES 545 A.2.10 THE CHAIN
RULE 546 A.2.11 ENTROPY 546 A.3 CONTINUOUS RANDOM VARIABLES 547 A.3.1
CONTINUOUS RANDOM VARIABLES 548 A.3.2 EXPECTED VALUES 548 A.3.3 THE
GAUSSIAN DISTRIBUTION 549 A.3.4 THE UNIFORM DISTRIBUTION 549 A.3.5
CUMULATIVE DENSITY FUNCTIONS 549 A.4 PAIRS OF CONTINUOUS RANDOM
VARIABLES 550 A.4.1 INDEPENDENT VERSUS UNCORRELATED 551 A.4.2 THE SUM OF
TWO RANDOM VARIABLES 551 A.4.3 ENTROPY 552 A.4.4 KULLBACK-LEIBLER
DISTANCE 552 APPENDIX B PHONE DEFINITIONS 553 REFERENCES 556 INDEX 583 |
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Taylor, Paul |
author_facet | Taylor, Paul |
author_role | aut |
author_sort | Taylor, Paul |
author_variant | p t pt |
building | Verbundindex |
bvnumber | BV035039831 |
callnumber-first | T - Technology |
callnumber-label | TK7882 |
callnumber-raw | TK7882.S65 |
callnumber-search | TK7882.S65 |
callnumber-sort | TK 47882 S65 |
callnumber-subject | TK - Electrical and Nuclear Engineering |
classification_rvk | ES 950 |
ctrlnum | (OCoLC)221147648 (DE-599)BVBBV035039831 |
dewey-full | 006.54 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.54 |
dewey-search | 006.54 |
dewey-sort | 16.54 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik Sprachwissenschaft Literaturwissenschaft |
discipline_str_mv | Informatik Sprachwissenschaft Literaturwissenschaft |
edition | 1. publ. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01261nam a2200361 c 4500</leader><controlfield tag="001">BV035039831</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20090212 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">080905s2009 d||| |||| 00||| ger d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780521899277</subfield><subfield code="9">978-0-521-89927-7</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)221147648</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV035039831</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">ger</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-19</subfield><subfield code="a">DE-11</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK7882.S65</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.54</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 950</subfield><subfield code="0">(DE-625)27936:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Taylor, Paul</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Text-to-speech synthesis</subfield><subfield code="c">Paul Taylor</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. 
publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Cambridge [u.a.]</subfield><subfield code="b">Cambridge Univ. Pr.</subfield><subfield code="c">2009</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXVIII, 597 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturverz. S. [556] - 582</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Sprachcodierung</subfield><subfield code="2">swd</subfield></datafield><datafield tag="650" ind1=" " ind2="7"><subfield code="a">Synthèse automatique de la parole</subfield><subfield code="2">ram</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Speech synthesis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HEBIS Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016708674&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-016708674</subfield></datafield></record></collection> |
id | DE-604.BV035039831 |
illustrated | Illustrated |
index_date | 2024-07-02T21:52:49Z |
indexdate | 2024-07-09T21:20:49Z |
institution | BVB |
isbn | 9780521899277 |
language | German |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-016708674 |
oclc_num | 221147648 |
open_access_boolean | |
owner | DE-19 DE-BY-UBM DE-11 |
owner_facet | DE-19 DE-BY-UBM DE-11 |
physical | XXVIII, 597 S. graph. Darst. |
publishDate | 2009 |
publishDateSearch | 2009 |
publishDateSort | 2009 |
publisher | Cambridge Univ. Pr. |
record_format | marc |
spelling | Taylor, Paul Verfasser aut Text-to-speech synthesis Paul Taylor 1. publ. Cambridge [u.a.] Cambridge Univ. Pr. 2009 XXVIII, 597 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Literaturverz. S. [556] - 582 Sprachcodierung swd Synthèse automatique de la parole ram Speech synthesis HEBIS Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016708674&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Taylor, Paul Text-to-speech synthesis Sprachcodierung swd Synthèse automatique de la parole ram Speech synthesis |
title | Text-to-speech synthesis |
title_auth | Text-to-speech synthesis |
title_exact_search | Text-to-speech synthesis |
title_exact_search_txtP | Text-to-speech synthesis |
title_full | Text-to-speech synthesis Paul Taylor |
title_fullStr | Text-to-speech synthesis Paul Taylor |
title_full_unstemmed | Text-to-speech synthesis Paul Taylor |
title_short | Text-to-speech synthesis |
title_sort | text to speech synthesis |
topic | Sprachcodierung swd Synthèse automatique de la parole ram Speech synthesis |
topic_facet | Sprachcodierung Synthèse automatique de la parole Speech synthesis |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=016708674&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT taylorpaul texttospeechsynthesis |