Explorations in Automatic Thesaurus Discovery:
Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes natural processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is cal...
Gespeichert in:
1. Verfasser: | |
---|---|
Format: | Elektronisch E-Book |
Sprache: | English |
Veröffentlicht: |
Boston, MA
Springer US
1994
|
Schriftenreihe: | The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation
278 |
Schlagworte: | |
Online-Zugang: | BTU01 URL des Erstveröffentlichers |
Zusammenfassung: | Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes natural processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb--noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, abstracts on AIDS, to encyclopedia articles on animals, even on the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standards evaluation show that techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to the fields of information retrieval using established testbeds, existing thesaural enrichment, semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus |
Beschreibung: | 1 Online-Ressource (XIII, 305 p) |
ISBN: | 9781461527107 |
DOI: | 10.1007/978-1-4615-2710-7 |
Internformat
MARC
LEADER | 00000nmm a2200000zcb4500 | ||
---|---|---|---|
001 | BV045187895 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | cr|uuu---uuuuu | ||
008 | 180912s1994 |||| o||u| ||||||eng d | ||
020 | |a 9781461527107 |9 978-1-4615-2710-7 | ||
024 | 7 | |a 10.1007/978-1-4615-2710-7 |2 doi | |
035 | |a (ZDB-2-ENG)978-1-4615-2710-7 | ||
035 | |a (OCoLC)1053794362 | ||
035 | |a (DE-599)BVBBV045187895 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
049 | |a DE-634 | ||
082 | 0 | |a 006.3 |2 23 | |
100 | 1 | |a Grefenstette, Gregory |e Verfasser |4 aut | |
245 | 1 | 0 | |a Explorations in Automatic Thesaurus Discovery |c by Gregory Grefenstette |
264 | 1 | |a Boston, MA |b Springer US |c 1994 | |
300 | |a 1 Online-Ressource (XIII, 305 p) | ||
336 | |b txt |2 rdacontent | ||
337 | |b c |2 rdamedia | ||
338 | |b cr |2 rdacarrier | ||
490 | 0 | |a The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation |v 278 | |
520 | |a Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes natural processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb--noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, abstracts on AIDS, to encyclopedia articles on animals, even on the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standards evaluation show that techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to the fields of information retrieval using established testbeds, existing thesaural enrichment, semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus | ||
650 | 4 | |a Computer Science | |
650 | 4 | |a Artificial Intelligence (incl. Robotics) | |
650 | 4 | |a Language Translation and Linguistics | |
650 | 4 | |a Computer science | |
650 | 4 | |a Artificial intelligence | |
650 | 4 | |a Computational linguistics | |
650 | 0 | 7 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Thesaurus |0 (DE-588)4185172-9 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Natürliche Sprache |0 (DE-588)4041354-8 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Natürliche Sprache |0 (DE-588)4041354-8 |D s |
689 | 0 | 1 | |a Sprachverarbeitung |0 (DE-588)4116579-2 |D s |
689 | 0 | 2 | |a Thesaurus |0 (DE-588)4185172-9 |D s |
689 | 0 | |8 1\p |5 DE-604 | |
776 | 0 | 8 | |i Erscheint auch als |n Druck-Ausgabe |z 9781461361671 |
856 | 4 | 0 | |u https://doi.org/10.1007/978-1-4615-2710-7 |x Verlag |z URL des Erstveröffentlichers |3 Volltext |
912 | |a ZDB-2-ENG | ||
940 | 1 | |q ZDB-2-ENG_Archiv | |
999 | |a oai:aleph.bib-bvb.de:BVB01-030577072 | ||
883 | 1 | |8 1\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk | |
966 | e | |u https://doi.org/10.1007/978-1-4615-2710-7 |l BTU01 |p ZDB-2-ENG |q ZDB-2-ENG_Archiv |x Verlag |3 Volltext |
Datensatz im Suchindex
_version_ | 1804178880548831232 |
---|---|
any_adam_object | |
author | Grefenstette, Gregory |
author_facet | Grefenstette, Gregory |
author_role | aut |
author_sort | Grefenstette, Gregory |
author_variant | g g gg |
building | Verbundindex |
bvnumber | BV045187895 |
collection | ZDB-2-ENG |
ctrlnum | (ZDB-2-ENG)978-1-4615-2710-7 (OCoLC)1053794362 (DE-599)BVBBV045187895 |
dewey-full | 006.3 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 006 - Special computer methods |
dewey-raw | 006.3 |
dewey-search | 006.3 |
dewey-sort | 16.3 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
doi_str_mv | 10.1007/978-1-4615-2710-7 |
format | Electronic eBook |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>03701nmm a2200541zcb4500</leader><controlfield tag="001">BV045187895</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">cr|uuu---uuuuu</controlfield><controlfield tag="008">180912s1994 |||| o||u| ||||||eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781461527107</subfield><subfield code="9">978-1-4615-2710-7</subfield></datafield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/978-1-4615-2710-7</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ZDB-2-ENG)978-1-4615-2710-7</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1053794362</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV045187895</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-634</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.3</subfield><subfield code="2">23</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Grefenstette, Gregory</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Explorations in Automatic Thesaurus Discovery</subfield><subfield code="c">by Gregory Grefenstette</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boston, MA</subfield><subfield code="b">Springer US</subfield><subfield code="c">1994</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 Online-Ressource (XIII, 305 p)</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation</subfield><subfield code="v">278</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes natural processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb--noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, abstracts on AIDS, to encyclopedia articles on animals, even on the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standards evaluation show that techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to the fields of information retrieval using established testbeds, existing thesaural enrichment, semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computer Science</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Artificial Intelligence (incl. Robotics)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Language Translation and Linguistics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computer science</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Artificial intelligence</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computational linguistics</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Thesaurus</subfield><subfield code="0">(DE-588)4185172-9</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Natürliche Sprache</subfield><subfield code="0">(DE-588)4041354-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Natürliche Sprache</subfield><subfield code="0">(DE-588)4041354-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Thesaurus</subfield><subfield code="0">(DE-588)4185172-9</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="8">1\p</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Druck-Ausgabe</subfield><subfield code="z">9781461361671</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.1007/978-1-4615-2710-7</subfield><subfield code="x">Verlag</subfield><subfield code="z">URL des Erstveröffentlichers</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-2-ENG</subfield></datafield><datafield tag="940" ind1="1" ind2=" "><subfield code="q">ZDB-2-ENG_Archiv</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-030577072</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield><datafield tag="966" ind1="e" ind2=" "><subfield code="u">https://doi.org/10.1007/978-1-4615-2710-7</subfield><subfield code="l">BTU01</subfield><subfield code="p">ZDB-2-ENG</subfield><subfield code="q">ZDB-2-ENG_Archiv</subfield><subfield code="x">Verlag</subfield><subfield code="3">Volltext</subfield></datafield></record></collection> |
id | DE-604.BV045187895 |
illustrated | Not Illustrated |
indexdate | 2024-07-10T08:11:00Z |
institution | BVB |
isbn | 9781461527107 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-030577072 |
oclc_num | 1053794362 |
open_access_boolean | |
owner | DE-634 |
owner_facet | DE-634 |
physical | 1 Online-Ressource (XIII, 305 p) |
psigel | ZDB-2-ENG ZDB-2-ENG_Archiv ZDB-2-ENG ZDB-2-ENG_Archiv |
publishDate | 1994 |
publishDateSearch | 1994 |
publishDateSort | 1994 |
publisher | Springer US |
record_format | marc |
series2 | The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation |
spelling | Grefenstette, Gregory Verfasser aut Explorations in Automatic Thesaurus Discovery by Gregory Grefenstette Boston, MA Springer US 1994 1 Online-Ressource (XIII, 305 p) txt rdacontent c rdamedia cr rdacarrier The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation 278 Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes natural processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb--noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, abstracts on AIDS, to encyclopedia articles on animals, even on the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standards evaluation show that techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to the fields of information retrieval using established testbeds, existing thesaural enrichment, semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus Computer Science Artificial Intelligence (incl. Robotics) Language Translation and Linguistics Computer science Artificial intelligence Computational linguistics Sprachverarbeitung (DE-588)4116579-2 gnd rswk-swf Thesaurus (DE-588)4185172-9 gnd rswk-swf Natürliche Sprache (DE-588)4041354-8 gnd rswk-swf Natürliche Sprache (DE-588)4041354-8 s Sprachverarbeitung (DE-588)4116579-2 s Thesaurus (DE-588)4185172-9 s 1\p DE-604 Erscheint auch als Druck-Ausgabe 9781461361671 https://doi.org/10.1007/978-1-4615-2710-7 Verlag URL des Erstveröffentlichers Volltext 1\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk |
spellingShingle | Grefenstette, Gregory Explorations in Automatic Thesaurus Discovery Computer Science Artificial Intelligence (incl. Robotics) Language Translation and Linguistics Computer science Artificial intelligence Computational linguistics Sprachverarbeitung (DE-588)4116579-2 gnd Thesaurus (DE-588)4185172-9 gnd Natürliche Sprache (DE-588)4041354-8 gnd |
subject_GND | (DE-588)4116579-2 (DE-588)4185172-9 (DE-588)4041354-8 |
title | Explorations in Automatic Thesaurus Discovery |
title_auth | Explorations in Automatic Thesaurus Discovery |
title_exact_search | Explorations in Automatic Thesaurus Discovery |
title_full | Explorations in Automatic Thesaurus Discovery by Gregory Grefenstette |
title_fullStr | Explorations in Automatic Thesaurus Discovery by Gregory Grefenstette |
title_full_unstemmed | Explorations in Automatic Thesaurus Discovery by Gregory Grefenstette |
title_short | Explorations in Automatic Thesaurus Discovery |
title_sort | explorations in automatic thesaurus discovery |
topic | Computer Science Artificial Intelligence (incl. Robotics) Language Translation and Linguistics Computer science Artificial intelligence Computational linguistics Sprachverarbeitung (DE-588)4116579-2 gnd Thesaurus (DE-588)4185172-9 gnd Natürliche Sprache (DE-588)4041354-8 gnd |
topic_facet | Computer Science Artificial Intelligence (incl. Robotics) Language Translation and Linguistics Computer science Artificial intelligence Computational linguistics Sprachverarbeitung Thesaurus Natürliche Sprache |
url | https://doi.org/10.1007/978-1-4615-2710-7 |
work_keys_str_mv | AT grefenstettegregory explorationsinautomaticthesaurusdiscovery |