Prosody recognition in speech dialogue systems: [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics]
Saved in:
Main author: | Quast, Holger |
---|---|
Format: | Book |
Language: | English |
Published: |
Göttingen
Sierke
2006
|
Edition: | 1st ed. |
Subjects: | |
Online access: | Table of contents |
Description: | Also issued as: Göttingen, Univ., doctoral dissertation, 2005 |
Description: | 213 pp., illustrations, diagrams |
ISBN: | 9783933893482 3933893488 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV039814424 | ||
003 | DE-604 | ||
005 | 20120209 | ||
007 | t | ||
008 | 120118s2006 gw ad|| m||| 00||| eng d | ||
015 | |a 06,N28,1122 |2 dnb | ||
016 | 7 | |a 980101182 |2 DE-101 | |
020 | |a 9783933893482 |9 978-3-933893-48-2 | ||
020 | |a 3933893488 |9 3-933893-48-8 | ||
024 | 3 | |a 9783933893482 | |
035 | |a (OCoLC)775095853 | ||
035 | |a (DE-599)BVBBV039814424 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
044 | |a gw |c XA-DE-NI | ||
049 | |a DE-355 | ||
084 | |a ST 278 |0 (DE-625)143644: |2 rvk | ||
084 | |a 530 |2 sdnb | ||
084 | |a 520 |2 sdnb | ||
100 | 1 | |a Quast, Holger |e Verfasser |4 aut | |
245 | 1 | 0 | |a Prosody recognition in speech dialogue systems |b [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] |c Holger Quast |
250 | |a 1. Aufl. | ||
264 | 1 | |a Göttingen |b Sierke |c 2006 | |
300 | |a 213 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Zug.: Göttingen, Univ., Diss., 2005 | ||
650 | 0 | 7 | |a Digitale Signalverarbeitung |0 (DE-588)4113314-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Neuronales Netz |0 (DE-588)4226127-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Prosodie |0 (DE-588)4047500-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Mensch-Maschine-Schnittstelle |0 (DE-588)4720440-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Suprasegmentales Merkmal |0 (DE-588)4383439-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4113937-9 |a Hochschulschrift |2 gnd-content | |
689 | 0 | 0 | |a Automatische Spracherkennung |0 (DE-588)4003961-4 |D s |
689 | 0 | 1 | |a Mensch-Maschine-Schnittstelle |0 (DE-588)4720440-0 |D s |
689 | 0 | 2 | |a Prosodie |0 (DE-588)4047500-1 |D s |
689 | 0 | 3 | |a Suprasegmentales Merkmal |0 (DE-588)4383439-5 |D s |
689 | 0 | 4 | |a Neuronales Netz |0 (DE-588)4226127-2 |D s |
689 | 0 | 5 | |a Digitale Signalverarbeitung |0 (DE-588)4113314-6 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024674677&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-024674677 |
Record in the search index
_version_ | 1804148749565427713 |
---|---|
adam_text | Contents
Preface .... vii
Frequently Used Abbreviations .... xv
1 Introduction .... 1
1.1 Motivation .... 1
1.2 Previous Work .... 3
1.3 Objectives .... 4
1.4 Overview of this Work .... 5
2 The Dialogue System .... 7
2.1 Some Fundamental Linguistic Concepts and Techniques .... 8
2.1.1 Levels of Speech and Language Analysis .... 8
2.1.2 Symbolic Representations of Structure and Meaning .... 11
2.1.3 Symbolic vs. Statistical Language Processing .... 15
2.2 Basic Principles of Dialogue System Design .... 19
2.2.1 Challenges .... 19
2.2.2 Architecture and Components .... 21
2.3 The RoBoDiMa Framework .... 33
2.3.1 Speech Recognition .... 33
2.3.2 Natural Language Processing .... 34
2.3.3 Dialogue Management .... 38
2.3.4 Response Generation .... 43
2.3.5 Speech Synthesis .... 45
3 Prosody and Its Measurement .... 47
3.1 What Is Prosody? .... 47
3.1.1 Verbal and Nonverbal Speech .... 47
3.1.2 Stress and Accentuation .... 49
3.2 Fundamentals of Speech Production .... 55
3.3 Pitch Tracking .... 58
3.3.1 Autocorrelation .... 59
3.3.2 Cepstrum .... 60
3.3.3 Relationship between the Autocorrelation and the Cepstrum .... 62
3.3.4 Harmonic Product Spectrum .... 63
3.3.5 Modulation Spectrum Analysis .... 63
3.3.6 Pre- and Postprocessing .... 64
3.4 Speech Loudness Perception .... 69
3.4.1 Some Properties of Human Hearing .... 70
3.4.2 Vocal Effort .... 72
3.4.3 The Absolute Loudness of Speech Model .... 74
3.5 Vowel Quality .... 79
3.6 Extracted Parameters .... 81
3.6.1 Utterance Segmentation - Loudness Maximum Based Data Analysis .... 81
3.6.2 Prosody Points and Prosody Contours .... 83
3.6.3 Parameters for Pattern Recognition .... 84
4 Pattern Recognition .... 85
4.1 Neural Networks .... 85
4.1.1 Definition .... 86
4.1.2 Decision Making .... 86
4.1.3 Learning .... 88
4.2 Multilayer Perceptrons .... 89
4.3 Backpropagation .... 90
4.3.1 Derivation of the Learning Law .... 91
4.3.2 Further Training Considerations .... 92
4.4 Data Distribution and Partitioning .... 93
4.4.1 Data Sets .... 94
4.4.2 Leave-One-Contour-Out Training .... 94
4.4.3 Data Distribution .... 95
4.5 Identifying Important Parameters .... 95
4.5.1 What the Weights Give Away .... 96
4.5.2 Incremental Parameter Selection .... 97
4.5.3 Network Output Maximization with Simulated Annealing .... 97
5 The Wizard of Oz Experiment .... 101
5.1 Motivation .... 101
5.2 Experiment Setup .... 102
5.3 Labelling .... 105
6 Results .... 107
6.1 General Considerations .... 107
6.2 Pitch Tracking .... 108
6.3 The Wizard of Oz Data .... 110
6.3.1 User Evaluation .... 111
6.3.2 Prosodic Data and Accentuation .... 113
6.4 Pattern Recognition Results .... 117
6.4.1 Leave-One-Contour-Out Training .... 117
6.4.2 Identification of Principal Parameters .... 119
6.4.3 Classification Results .... 123
6.5 Validation on Speech Recognition Data .... 126
7 Conclusions .... 135
7.1 Summary .... 135
7.2 Main Achievements .... 136
7.3 Outlook .... 138
A Dialogue Script for the Wizard of Oz Experiment .... 143
B Pattern Recognition Parameters .... 149
B.1 Input Parameters .... 149
B.2 Output Parameters .... 155
C Speaker Statistics .... 157
Bibliography .... 191
Index .... 203
About the Author .... 212
|
any_adam_object | 1 |
author | Quast, Holger |
author_facet | Quast, Holger |
author_role | aut |
author_sort | Quast, Holger |
author_variant | h q hq |
building | Verbundindex |
bvnumber | BV039814424 |
classification_rvk | ST 278 |
ctrlnum | (OCoLC)775095853 (DE-599)BVBBV039814424 |
discipline | Physics Computer Science Geography |
edition | 1. Aufl. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02352nam a2200553 c 4500</leader><controlfield tag="001">BV039814424</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20120209 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">120118s2006 gw ad|| m||| 00||| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">06,N28,1122</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">980101182</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9783933893482</subfield><subfield code="9">978-3-933893-48-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">3933893488</subfield><subfield code="9">3-933893-48-8</subfield></datafield><datafield tag="024" ind1="3" ind2=" "><subfield code="a">9783933893482</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)775095853</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV039814424</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">gw</subfield><subfield code="c">XA-DE-NI</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-355</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 278</subfield><subfield code="0">(DE-625)143644:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">530</subfield><subfield code="2">sdnb</subfield></datafield><datafield tag="084" ind1=" " ind2=" 
"><subfield code="a">520</subfield><subfield code="2">sdnb</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Quast, Holger</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Prosody recognition in speech dialogue systems</subfield><subfield code="b">[robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics]</subfield><subfield code="c">Holger Quast</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1. Aufl.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Göttingen</subfield><subfield code="b">Sierke</subfield><subfield code="c">2006</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">213 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Zug.: Göttingen, Univ., Diss., 2005</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Digitale Signalverarbeitung</subfield><subfield code="0">(DE-588)4113314-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Neuronales Netz</subfield><subfield code="0">(DE-588)4226127-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Prosodie</subfield><subfield 
code="0">(DE-588)4047500-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Mensch-Maschine-Schnittstelle</subfield><subfield code="0">(DE-588)4720440-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Suprasegmentales Merkmal</subfield><subfield code="0">(DE-588)4383439-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4113937-9</subfield><subfield code="a">Hochschulschrift</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Automatische Spracherkennung</subfield><subfield code="0">(DE-588)4003961-4</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Mensch-Maschine-Schnittstelle</subfield><subfield code="0">(DE-588)4720440-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Prosodie</subfield><subfield code="0">(DE-588)4047500-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">Suprasegmentales Merkmal</subfield><subfield code="0">(DE-588)4383439-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="4"><subfield code="a">Neuronales Netz</subfield><subfield code="0">(DE-588)4226127-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="5"><subfield code="a">Digitale Signalverarbeitung</subfield><subfield 
code="0">(DE-588)4113314-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024674677&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-024674677</subfield></datafield></record></collection> |
genre | (DE-588)4113937-9 Hochschulschrift gnd-content |
genre_facet | Hochschulschrift |
id | DE-604.BV039814424 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:12:02Z |
institution | BVB |
isbn | 9783933893482 3933893488 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-024674677 |
oclc_num | 775095853 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR |
owner_facet | DE-355 DE-BY-UBR |
physical | 213 S. Ill., graph. Darst. |
publishDate | 2006 |
publishDateSearch | 2006 |
publishDateSort | 2006 |
publisher | Sierke |
record_format | marc |
spelling | Quast, Holger Verfasser aut Prosody recognition in speech dialogue systems [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] Holger Quast 1. Aufl. Göttingen Sierke 2006 213 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Zug.: Göttingen, Univ., Diss., 2005 Digitale Signalverarbeitung (DE-588)4113314-6 gnd rswk-swf Neuronales Netz (DE-588)4226127-2 gnd rswk-swf Prosodie (DE-588)4047500-1 gnd rswk-swf Mensch-Maschine-Schnittstelle (DE-588)4720440-0 gnd rswk-swf Suprasegmentales Merkmal (DE-588)4383439-5 gnd rswk-swf Automatische Spracherkennung (DE-588)4003961-4 gnd rswk-swf (DE-588)4113937-9 Hochschulschrift gnd-content Automatische Spracherkennung (DE-588)4003961-4 s Mensch-Maschine-Schnittstelle (DE-588)4720440-0 s Prosodie (DE-588)4047500-1 s Suprasegmentales Merkmal (DE-588)4383439-5 s Neuronales Netz (DE-588)4226127-2 s Digitale Signalverarbeitung (DE-588)4113314-6 s DE-604 Digitalisierung UB Regensburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024674677&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Quast, Holger Prosody recognition in speech dialogue systems [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] Digitale Signalverarbeitung (DE-588)4113314-6 gnd Neuronales Netz (DE-588)4226127-2 gnd Prosodie (DE-588)4047500-1 gnd Mensch-Maschine-Schnittstelle (DE-588)4720440-0 gnd Suprasegmentales Merkmal (DE-588)4383439-5 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd |
subject_GND | (DE-588)4113314-6 (DE-588)4226127-2 (DE-588)4047500-1 (DE-588)4720440-0 (DE-588)4383439-5 (DE-588)4003961-4 (DE-588)4113937-9 |
title | Prosody recognition in speech dialogue systems [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] |
title_auth | Prosody recognition in speech dialogue systems [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] |
title_exact_search | Prosody recognition in speech dialogue systems [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] |
title_full | Prosody recognition in speech dialogue systems [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] Holger Quast |
title_fullStr | Prosody recognition in speech dialogue systems [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] Holger Quast |
title_full_unstemmed | Prosody recognition in speech dialogue systems [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] Holger Quast |
title_short | Prosody recognition in speech dialogue systems |
title_sort | prosody recognition in speech dialogue systems robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics |
title_sub | [robust natural language understanding through prediction of semantic items by pattern recognition on nonverbal acoustic speech characteristics] |
topic | Digitale Signalverarbeitung (DE-588)4113314-6 gnd Neuronales Netz (DE-588)4226127-2 gnd Prosodie (DE-588)4047500-1 gnd Mensch-Maschine-Schnittstelle (DE-588)4720440-0 gnd Suprasegmentales Merkmal (DE-588)4383439-5 gnd Automatische Spracherkennung (DE-588)4003961-4 gnd |
topic_facet | Digitale Signalverarbeitung Neuronales Netz Prosodie Mensch-Maschine-Schnittstelle Suprasegmentales Merkmal Automatische Spracherkennung Hochschulschrift |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=024674677&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT quastholger prosodyrecognitioninspeechdialoguesystemsrobustnaturallanguageunderstandingthroughpredictionofsemanticitemsbypatternrecognitiononnonverbalacousticspeechcharacteristics |