Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs):
Main authors: | Hataoka, Nobuo; Waibel, Alex H. |
---|---|
Format: | Book |
Language: | English |
Published: | Pittsburgh, Pa., 1989 |
Series: | Carnegie-Mellon University <Pittsburgh, Pa.> / Computer Science Department: CMU-CS 89,190 |
Subjects: | Automatic speech recognition; Speech perception |
Summary: | Abstract: "This paper describes a new structure of Neural Networks (NNs) for speaker-independent and context-independent phoneme recognition. This structure is based on the integration of Time-Delay Neural Networks (TDNN, Waibel et al. [1988]) which have several TDNNs separated according to the duration of phonemes. As a result, the proposed structure has the advantage that it deals with phonemes of varying duration more efficiently. In the experimental evaluation of the proposed new structure, 16-English vowel recognition was performed using 5268 vowel tokens picked from 480 sentences spoken by 140 speakers (98 males and 42 females) on the TIMIT (TI-MIT) database. The number of training tokens and testing tokens was 4326 from 100 speakers (69 males and 31 females) and 942 from 40 speakers (29 males and 11 females), respectively. The result was a 60.5% recognition rate (around 70% for a collapsed 13-vowel case), which was improved from 56% in the single TDNN structure, showing the effectiveness of the proposed new structure to use temporal information." |
Description: | 21 pp. |
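The token and speaker counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the dictionary layout and the helper names `split_is_consistent` and `absolute_gain` are illustrative assumptions and do not come from the report itself. Only the numbers are taken from the abstract.

```python
# Figures quoted in the abstract; the data layout is an assumption made
# for illustration only.
train = {"tokens": 4326, "speakers": 100, "male": 69, "female": 31}
test = {"tokens": 942, "speakers": 40, "male": 29, "female": 11}
total = {"tokens": 5268, "speakers": 140, "male": 98, "female": 42}


def split_is_consistent(train, test, total):
    """Return True if train + test equals the stated total for every key
    and the male/female counts add up to the speaker count in each set."""
    sums_match = all(train[k] + test[k] == total[k] for k in total)
    genders_match = all(
        d["male"] + d["female"] == d["speakers"] for d in (train, test, total)
    )
    return sums_match and genders_match


def absolute_gain(new_rate, old_rate):
    """Improvement in percentage points between two recognition rates."""
    return round(new_rate - old_rate, 1)


consistent = split_is_consistent(train, test, total)
gain = absolute_gain(60.5, 56.0)  # integrated TDNNs vs. single TDNN
test_share = test["tokens"] / total["tokens"]  # fraction held out for testing

print(consistent, gain, f"{test_share:.1%}")
```

Under these assumptions the check confirms that the training and test figures add up to the stated totals, and that the reported improvement over the single TDNN is 4.5 percentage points.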
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV008948779 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t | ||
008 | 940206s1989 |||| 00||| eng d | ||
035 | |a (OCoLC)21054886 | ||
035 | |a (DE-599)BVBBV008948779 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-29T | ||
082 | 0 | |a 510.7808 |b C28r 89-190 | |
100 | 1 | |a Hataoka, Nobuo |e Verfasser |4 aut | |
245 | 1 | 0 | |a Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs) |c Nobuo Hataoka and Alex H. Waibel |
264 | 1 | |a Pittsburgh, Pa. |c 1989 | |
300 | |a 21 S. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Carnegie-Mellon University <Pittsburgh, Pa.> / Computer Science Department: CMU-CS |v 89,190 | |
520 | 3 | |a Abstract: "This paper describes a new structure of Neural Networks (NNs) for speaker-independent and context-independent phoneme recognition. This structure is based on the integration of Time-Delay Neural Networks (TDNN, Waibel et al. [1988]) which have several TDNNs separated according to the duration of phonemes. As a result, the proposed structure has the advantage that it deals with phonemes of varying duration more efficiently. In the experimental evaluation of the proposed new structure, 16-English vowel recognition was performed using 5268 vowel tokens picked from 480 sentences spoken by 140 speakers (98 males and 42 females) on the TIMIT (TI-MIT) database. | |
520 | 3 | |a The number of training tokens and testing tokens was 4326 from 100 speakers (69 males and 31 females) and 942 from 40 speakers (29 males and 11 females), respectively. The result was a 60.5% recognition rate (around 70% for a collapsed 13-vowel case), which was improved from 56% in the single TDNN structure, showing the effectiveness of the proposed new structure to use temporal information." | |
650 | 4 | |a Automatic speech recognition | |
650 | 4 | |a Speech perception | |
700 | 1 | |a Waibel, Alex H. |e Verfasser |4 aut | |
810 | 2 | |a Computer Science Department: CMU-CS |t Carnegie-Mellon University <Pittsburgh, Pa.> |v 89,190 |w (DE-604)BV006187264 |9 89,190 | |
999 | |a oai:aleph.bib-bvb.de:BVB01-005904507 |
Record in search index
_version_ | 1804123281644584960 |
---|---|
any_adam_object | |
author | Hataoka, Nobuo Waibel, Alex H. |
author_facet | Hataoka, Nobuo Waibel, Alex H. |
author_role | aut aut |
author_sort | Hataoka, Nobuo |
author_variant | n h nh a h w ah ahw |
building | Verbundindex |
bvnumber | BV008948779 |
ctrlnum | (OCoLC)21054886 (DE-599)BVBBV008948779 |
dewey-full | 510.7808 |
dewey-hundreds | 500 - Natural sciences and mathematics |
dewey-ones | 510 - Mathematics |
dewey-raw | 510.7808 |
dewey-search | 510.7808 |
dewey-sort | 3510.7808 |
dewey-tens | 510 - Mathematics |
discipline | Mathematics |
format | Book |
id | DE-604.BV008948779 |
illustrated | Not Illustrated |
indexdate | 2024-07-09T17:27:17Z |
institution | BVB |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-005904507 |
oclc_num | 21054886 |
open_access_boolean | |
owner | DE-29T |
owner_facet | DE-29T |
physical | 21 S. |
publishDate | 1989 |
publishDateSearch | 1989 |
publishDateSort | 1989 |
record_format | marc |
series2 | Carnegie-Mellon University <Pittsburgh, Pa.> / Computer Science Department: CMU-CS |
spellingShingle | Hataoka, Nobuo Waibel, Alex H. Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs) Automatic speech recognition Speech perception |
title | Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs) |
title_auth | Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs) |
title_exact_search | Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs) |
title_full | Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs) Nobuo Hataoka and Alex H. Waibel |
title_fullStr | Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs) Nobuo Hataoka and Alex H. Waibel |
title_full_unstemmed | Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs) Nobuo Hataoka and Alex H. Waibel |
title_short | Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs) |
title_sort | speaker independent phoneme recognition on timit database using integrated time delay neural networks tdnns |
topic | Automatic speech recognition Speech perception |
topic_facet | Automatic speech recognition Speech perception |
volume_link | (DE-604)BV006187264 |
work_keys_str_mv | AT hataokanobuo speakerindependentphonemerecognitionontimitdatabaseusingintegratedtimedelayneuralnetworkstdnns AT waibelalexh speakerindependentphonemerecognitionontimitdatabaseusingintegratedtimedelayneuralnetworkstdnns |