Audiovisual speech processing
Other authors: | Bailly, Gérard (editor) |
---|---|
Format: | Book |
Language: | English |
Published: | Cambridge [etc.]: Cambridge Univ. Press, 2012 |
Edition: | 1. publ. |
Subjects: | Speech Perception; Lipreading; Phonetics; Speech / physiology; Visual Perception; Sprachverarbeitung |
Online access: | Table of contents; Blurb |
Summary: | "When we speak, we configure the vocal tract which shapes the visible motions of the face and the patterning of the audible speech acoustics. Similarly, we use these visible and audible behaviors to perceive speech. This book showcases a broad range of research investigating how these two types of signals are used in spoken communication, how they interact, and how they can be used to enhance the realistic synthesis and recognition of audible and visible speech. The volume begins by addressing two important questions about human audiovisual performance: how auditory and visual signals combine to access the mental lexicon and where in the brain this and related processes take place. It then turns to the production and perception of multimodal speech and how structures are coordinated within and across the two modalities. Finally, the book presents overviews and recent developments in machine-based speech recognition and synthesis of AV speech"--Provided by publisher |
Notes: | Bibliography: pp. 403-468. Later unchanged reprints are also recorded here. |
Physical description: | XXXVI, 470 pp.: ill., diagrams |
ISBN: | 9781107006829 9781107499324 1107006821 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV039972387 | ||
003 | DE-604 | ||
005 | 20151123 | ||
007 | t | ||
008 | 120321s2012 ad|| |||| 00||| eng d | ||
020 | |a 9781107006829 |c hbk. |9 978-1-107-00682-9 | ||
020 | |a 9781107499324 |c pbk. |9 978-1-107-49932-4 | ||
020 | |a 1107006821 |9 1-107-00682-1 | ||
035 | |a (OCoLC)785855660 | ||
035 | |a (DE-599)BVBBV039972387 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-12 |a DE-739 |a DE-188 |a DE-11 | ||
084 | |a ES 950 |0 (DE-625)27936: |2 rvk | ||
084 | |a ET 215 |0 (DE-625)27955: |2 rvk | ||
084 | |a ET 220 |0 (DE-625)27956: |2 rvk | ||
245 | 1 | 0 | |a Audiovisual speech processing |c ed. by Gérard Bailly ... |
250 | |a 1. publ. | ||
264 | 1 | |a Cambridge [u.a.] |b Cambridge Univ. Press |c 2012 | |
300 | |a XXXVI, 470 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Literaturverz. S. 403 - 468 | ||
500 | |a Hier auch später erschienene, unveränderte Nachdrucke | ||
520 | |a "When we speak, we configure the vocal tract which shapes the visible motions of the face and the patterning of the audible speech acoustics. Similarly, we use these visible and audible behaviors to perceive speech. This book showcases a broad range of research investigating how these two types of signals are used in spoken communication, how they interact, and how they can be used to enhance the realistic synthesis and recognition of audible and visible speech. The volume begins by addressing two important questions about human audiovisual performance: how auditory and visual signals combine to access the mental lexicon and where in the brain this and related processes take place. It then turns to the production and perception of multimodal speech and how structures are coordinated within and across the two modalities. Finally, the book presents overviews and recent developments in machine-based speech recognition and synthesis of AV speech"--Provided by publisher | ||
650 | 4 | |a Speech Perception | |
650 | 4 | |a Lipreading | |
650 | 4 | |a Phonetics | |
650 | 4 | |a Speech / physiology | |
650 | 4 | |a Visual Perception | |
650 | 0 | 7 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |D s |
689 | 0 | |8 1\p |5 DE-604 | |
700 | 1 | |a Bailly, Gérard |4 edt | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024829896&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Passau |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024829896&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
999 | |a oai:aleph.bib-bvb.de:BVB01-024829896 | ||
883 | 1 | |8 1\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk |
Record in the search index
_version_ | 1804148951535845376 |
---|---|
adam_text | Title: Audiovisual speech processing
Author: Bailly, Gérard
Year: 2012
Contents
List of figures page xi
List of tables xvii
List of contributors xviii
Preface xxxiii
Acknowledgments xxxvi
Introduction 1
1 Three puzzles of multimodal speech perception 4
R. E. REMEZ
1.1 Introduction 4
1.2 Organization 5
1.3 Event perception and speech perception 10
1.4 Experience 15
1.5 A conclusion 20
1.6 Acknowledgments 20
2 Visual speech perception 21
L. E. BERNSTEIN
2.1 Introduction 21
2.2 Evaluation of visemes and word homopheny 27
2.3 Phonetic distinctiveness of English words 32
2.4 Research strategies 36
2.5 General conclusions 39
2.6 Acknowledgments 39
3 Dynamic information for face perception 40
K. LANDER AND V. BRUCE
3.1 Introduction 40
3.2 Motion information for expression perception 42
3.3 Motion information for visual speech perception 44
3.4 Dynamic information for familiar face recognition 47
3.5 Dynamic information for unfamiliar face learning 51
3.6 Practical considerations 54
3.7 Theoretical interpretations 55
3.8 Future research and conclusions 60
4 Investigating auditory-visual speech perception development 62
D. BURNHAM AND K. SEKIYAMA
4.1 Speech perception is auditory-visual 62
4.2 Auditory-visual speech perception 63
4.3 Methods for investigating development 64
4.4 The ontogenetic development method 65
4.5 The cross-language development method 69
4.6 Combined methods 71
4.7 Conclusions and an application: automatic speech recognition 73
4.8 Acknowledgments 75
5 Brain bases for seeing speech: fMRI studies of speechreading 76
R. CAMPBELL AND M. MACSWEENEY
5.1 Introduction 76
5.2 Route maps and guidelines 77
5.3 Silent speechreading and auditory cortex 83
5.4 Audiovisual integration: timing 92
5.5 Speechreading: other cortical regions 94
5.6 Speechreading in people born deaf 95
5.7 Conclusions, directions 98
5.8 Acknowledgments 99
5.9 Appendix: glossary of acronyms and terms 100
6 Temporal organization of Cued Speech production 104
D. BEAUTEMPS, M.-A. CATHIARD, V. ATTINA, AND C. SAVARIAUX
6.1 Introduction 104
6.2 Overview on manual cueing 105
6.3 First results on Cued Speech production 110
6.4 General discussion 118
6.5 Acknowledgments 120
7 Bimodal perception within the natural time-course of speech production 121
M.-A. CATHIARD, A. VILAIN, R. LABOISSIERE, H. LOEVENBRUCK, C. SAVARIAUX, AND J.-L. SCHWARTZ
7.1 Introduction 121
7.2 The 2-Component-Vowel model 123
7.3 The 2-Comp-Vowel model and visible speech 135
7.4 The perceptual benefit of the model 146
7.5 Conclusion and perspectives 155
7.6 Post-scriptum 158
7.7 Acknowledgments 158
8 Visual and audiovisual synthesis and recognition of speech by computers 159
N. M. BROOKE AND S. D. SCOTT
8.1 Overview 159
8.2 The historical perspective 161
8.3 Heads, faces, and visible speech signals 168
8.4 Automatic audiovisual speech processing 175
8.5 Assessing and perceiving audiovisual speech 184
8.6 Current prospects 189
9 Audiovisual automatic speech recognition 193
G. POTAMIANOS, C. NETI, J. LUETTIN, AND I. MATTHEWS
9.1 Introduction 193
9.2 Visual front ends 197
9.3 Audiovisual integration 213
9.4 Audiovisual databases 229
9.5 Audiovisual ASR experiments 234
9.6 Summary and discussion 244
9.7 Acknowledgments 247
10 Image-based facial synthesis 248
M. SLANEY AND C. BREGLER
10.1 Facial synthesis approaches 248
10.2 Image-based facial synthesis 250
10.3 Analyses and normalization 253
10.4 Synthesis 259
10.5 Alternative approaches 265
10.6 Conclusions 270
10.7 Acknowledgments 270
11 A trainable videorealistic speech animation system 271
T. EZZAT, G. GEIGER, AND T. POGGIO
11.1 Overview 271
11.2 Background 272
11.3 System overview 275
11.4 Corpus 276
11.5 Pre-processing 277
11.6 Multidimensional morphable models 277
11.7 Trajectory synthesis 287
11.8 Post-processing 291
11.9 Computational issues 292
11.10 Evaluation 293
11.11 Further work 305
11.12 Acknowledgments 305
11.13 Appendix 306
12 Animated speech: research progress and applications 309
D. W. MASSARO, M. M. COHEN, M. TABAIN, J. BESKOW, AND R. CLARK
12.1 Background 309
12.2 Visible speech synthesis 311
12.3 Illustrative experiment of evaluation testing 314
12.4 The use of synthetic speech and facial animation 317
12.5 New structures and their control 319
12.6 Reshaping the canonical head 328
12.7 Training speech articulation using dynamic 3D measurements 330
12.8 Some applications of electropalatography to speech therapy 333
12.9 Development of a speech tutor 336
12.10 Empirical studies 341
12.11 Additional potential applications 344
12.12 Acknowledgments 345
13 Empirical perceptual-motor linkage of multimodal speech 346
E. VATIKIOTIS-BATESON AND K. G. MUNHALL
13.1 Introduction 346
13.2 The perception of audiovisual speech 347
13.3 Bringing speech production to the face 349
13.4 Auditory-visual speech production 349
13.5 Correspondences of multimodal speech 350
13.6 Talking head animation 355
13.7 The importance of physical structure 356
13.8 Communicative versus cosmetic realism 364
13.9 Summary 366
13.10 Acknowledgments 367
14 Sensorimotor characteristics of speech production 368
G. BAILLY, P. BADIN, L. REVERET, AND A. BEN YOUSSEF
14.1 Introduction 368
14.2 Speech maps 368
14.3 Degrees-of-freedom in a speech task 369
14.4 Models of the underlying speech organs 372
14.5 Models of facial deformation 377
14.6 Linking articulatory degrees-of-freedom 384
14.7 Discussion 393
14.8 Conclusions 395
14.9 Acknowledgments 396
Notes 397
References 403
Index 469
Audiovisual Speech Processing
When we speak, we configure the vocal tract which shapes the visible motions
of the face and the patterning of the audible speech acoustics. Similarly, we use
these visible and audible behaviors to perceive speech. This book showcases a
broad range of research investigating how these two types of signals are used in
spoken communication, how they interact, and how they can be used to enhance the
realistic synthesis and recognition of audible and visible speech. The volume begins
by addressing two important questions about human audiovisual performance:
how auditory and visual signals combine to access the mental lexicon, and where
in the brain this and related processes take place. It then turns to the production
and perception of multimodal speech, and how structures are coordinated within
and across the two modalities. Finally, the book presents overviews and recent
developments in machine-based speech recognition and synthesis of AV speech.
|
any_adam_object | 1 |
author2 | Bailly, Gérard |
author2_role | edt |
author2_variant | g b gb |
author_facet | Bailly, Gérard |
building | Verbundindex |
bvnumber | BV039972387 |
classification_rvk | ES 950 ET 215 ET 220 |
ctrlnum | (OCoLC)785855660 (DE-599)BVBBV039972387 |
discipline | Linguistics; Literary studies |
edition | 1. publ. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>03012nam a2200493 c 4500</leader><controlfield tag="001">BV039972387</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20151123 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">120321s2012 ad|| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781107006829</subfield><subfield code="c">hbk.</subfield><subfield code="9">978-1-107-00682-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781107499324</subfield><subfield code="c">pbk.</subfield><subfield code="9">978-1-107-49932-4</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1107006821</subfield><subfield code="9">1-107-00682-1</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)785855660</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV039972387</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-12</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-188</subfield><subfield code="a">DE-11</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ES 950</subfield><subfield code="0">(DE-625)27936:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ET 215</subfield><subfield code="0">(DE-625)27955:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ET 220</subfield><subfield code="0">(DE-625)27956:</subfield><subfield 
code="2">rvk</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Audiovisual speech processing</subfield><subfield code="c">ed. by Gérard Bailly ...</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. publ.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Cambridge [u.a.]</subfield><subfield code="b">Cambridge Univ. Press</subfield><subfield code="c">2012</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXXVI, 470 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Literaturverz. S. 403 - 468</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Hier auch später erschienene, unveränderte Nachdrucke</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">"When we speak, we configure the vocal tract which shapes the visible motions of the face and the patterning of the audible speech acoustics. Similarly, we use these visible and audible behaviors to perceive speech. This book showcases a broad range of research investigating how these two types of signals are used in spoken communication, how they interact, and how they can be used to enhance the realistic synthesis and recognition of audible and visible speech. The volume begins by addressing two important questions about human audiovisual performance: how auditory and visual signals combine to access the mental lexicon and where in the brain this and related processes take place. 
It then turns to the production and perception of multimodal speech and how structures are coordinated within and across the two modalities. Finally, the book presents overviews and recent developments in machine-based speech recognition and synthesis of AV speech"--Provided by publisher</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Speech Perception</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Lipreading</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Phonetics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Speech / physiology</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Visual Perception</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="8">1\p</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Bailly, Gérard</subfield><subfield code="4">edt</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024829896&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau</subfield><subfield code="q">application/pdf</subfield><subfield 
code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024829896&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-024829896</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield></record></collection> |
id | DE-604.BV039972387 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:15:18Z |
institution | BVB |
isbn | 9781107006829 9781107499324 1107006821 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-024829896 |
oclc_num | 785855660 |
open_access_boolean | |
owner | DE-12 DE-739 DE-188 DE-11 |
owner_facet | DE-12 DE-739 DE-188 DE-11 |
physical | XXXVI, 470 S. Ill., graph. Darst. |
publishDate | 2012 |
publishDateSearch | 2012 |
publishDateSort | 2012 |
publisher | Cambridge Univ. Press |
record_format | marc |
spelling | Audiovisual speech processing ed. by Gérard Bailly ... 1. publ. Cambridge [u.a.] Cambridge Univ. Press 2012 XXXVI, 470 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Literaturverz. S. 403 - 468 Hier auch später erschienene, unveränderte Nachdrucke "When we speak, we configure the vocal tract which shapes the visible motions of the face and the patterning of the audible speech acoustics. Similarly, we use these visible and audible behaviors to perceive speech. This book showcases a broad range of research investigating how these two types of signals are used in spoken communication, how they interact, and how they can be used to enhance the realistic synthesis and recognition of audible and visible speech. The volume begins by addressing two important questions about human audiovisual performance: how auditory and visual signals combine to access the mental lexicon and where in the brain this and related processes take place. It then turns to the production and perception of multimodal speech and how structures are coordinated within and across the two modalities. Finally, the book presents overviews and recent developments in machine-based speech recognition and synthesis of AV speech"--Provided by publisher Speech Perception Lipreading Phonetics Speech / physiology Visual Perception Sprachverarbeitung (DE-588)4116579-2 gnd rswk-swf Sprachverarbeitung (DE-588)4116579-2 s 1\p DE-604 Bailly, Gérard edt HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024829896&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Passau application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024829896&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Klappentext 1\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk |
spellingShingle | Audiovisual speech processing Speech Perception Lipreading Phonetics Speech / physiology Visual Perception Sprachverarbeitung (DE-588)4116579-2 gnd |
subject_GND | (DE-588)4116579-2 |
title | Audiovisual speech processing |
title_auth | Audiovisual speech processing |
title_exact_search | Audiovisual speech processing |
title_full | Audiovisual speech processing ed. by Gérard Bailly ... |
title_fullStr | Audiovisual speech processing ed. by Gérard Bailly ... |
title_full_unstemmed | Audiovisual speech processing ed. by Gérard Bailly ... |
title_short | Audiovisual speech processing |
title_sort | audiovisual speech processing |
topic | Speech Perception Lipreading Phonetics Speech / physiology Visual Perception Sprachverarbeitung (DE-588)4116579-2 gnd |
topic_facet | Speech Perception Lipreading Phonetics Speech / physiology Visual Perception Sprachverarbeitung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024829896&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024829896&sequence=000004&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT baillygerard audiovisualspeechprocessing |