Staff view: Explorations in Automatic Thesaurus Discovery

Explorations in Automatic Thesaurus Discovery

Bibliographic details
Main author:	Grefenstette, Gregory (Author)
Format:	Electronic e-book
Language:	English
Published:	Boston, MA Springer US 1994
Series:	The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation 278
Subjects:	Computer Science Artificial Intelligence (incl. Robotics) Language Translation and Linguistics Computer science Artificial intelligence Computational linguistics Sprachverarbeitung Thesaurus Natürliche Sprache
Online access:	BTU01 URL of the original publisher
Summary:	Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes the natural language processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated, and a thesaurus is created showing important common terms and their relations to each other, common verb-noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora, ranging from baseball newsgroups, assassination archives, medical X-ray reports, and abstracts on AIDS to encyclopedia articles on animals, and even the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standards evaluations show that the techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to information retrieval using established testbeds, to the enrichment of existing thesauri, and to semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus.
Description:	1 online resource (XIII, 305 p.)
ISBN:	9781461527107
DOI:	10.1007/978-1-4615-2710-7

Staff view

MARC


LEADER	00000nmm a2200000zcb4500
001	BV045187895
003	DE-604
005	00000000000000.0
007	cr\|uuu---uuuuu
008	180912s1994 \|\|\|\| o\|\|u\| \|\|\|\|\|\|eng d
020			\|a 9781461527107 \|9 978-1-4615-2710-7
024	7		\|a 10.1007/978-1-4615-2710-7 \|2 doi
035			\|a (ZDB-2-ENG)978-1-4615-2710-7
035			\|a (OCoLC)1053794362
035			\|a (DE-599)BVBBV045187895
040			\|a DE-604 \|b ger \|e aacr
041	0		\|a eng
049			\|a DE-634
082	0		\|a 006.3 \|2 23
100	1		\|a Grefenstette, Gregory \|e Verfasser \|4 aut
245	1	0	\|a Explorations in Automatic Thesaurus Discovery \|c by Gregory Grefenstette
264		1	\|a Boston, MA \|b Springer US \|c 1994
300			\|a 1 Online-Ressource (XIII, 305 p)
336			\|b txt \|2 rdacontent
337			\|b c \|2 rdamedia
338			\|b cr \|2 rdacarrier
490	0		\|a The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation \|v 278
520			\|a Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes natural processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb--noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, abstracts on AIDS, to encyclopedia articles on animals, even on the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standards evaluation show that techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to the fields of information retrieval using established testbeds, existing thesaural enrichment, semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus
650		4	\|a Computer Science
650		4	\|a Artificial Intelligence (incl. Robotics)
650		4	\|a Language Translation and Linguistics
650		4	\|a Computer science
650		4	\|a Artificial intelligence
650		4	\|a Computational linguistics
650	0	7	\|a Sprachverarbeitung \|0 (DE-588)4116579-2 \|2 gnd \|9 rswk-swf
650	0	7	\|a Thesaurus \|0 (DE-588)4185172-9 \|2 gnd \|9 rswk-swf
650	0	7	\|a Natürliche Sprache \|0 (DE-588)4041354-8 \|2 gnd \|9 rswk-swf
689	0	0	\|a Natürliche Sprache \|0 (DE-588)4041354-8 \|D s
689	0	1	\|a Sprachverarbeitung \|0 (DE-588)4116579-2 \|D s
689	0	2	\|a Thesaurus \|0 (DE-588)4185172-9 \|D s
689	0		\|8 1\p \|5 DE-604
776	0	8	\|i Erscheint auch als \|n Druck-Ausgabe \|z 9781461361671
856	4	0	\|u https://doi.org/10.1007/978-1-4615-2710-7 \|x Verlag \|z URL des Erstveröffentlichers \|3 Volltext
912			\|a ZDB-2-ENG
940	1		\|q ZDB-2-ENG_Archiv
999			\|a oai:aleph.bib-bvb.de:BVB01-030577072
883	1		\|8 1\p \|a cgwrk \|d 20201028 \|q DE-101 \|u https://d-nb.info/provenance/plan#cgwrk
966	e		\|u https://doi.org/10.1007/978-1-4615-2710-7 \|l BTU01 \|p ZDB-2-ENG \|q ZDB-2-ENG_Archiv \|x Verlag \|3 Volltext

Record in the search index

_version_	1804178880548831232
any_adam_object
author	Grefenstette, Gregory
author_facet	Grefenstette, Gregory
author_role	aut
author_sort	Grefenstette, Gregory
author_variant	g g gg
building	Verbundindex
bvnumber	BV045187895
collection	ZDB-2-ENG
ctrlnum	(ZDB-2-ENG)978-1-4615-2710-7 (OCoLC)1053794362 (DE-599)BVBBV045187895
dewey-full	006.3
dewey-hundreds	000 - Computer science, information, general works
dewey-ones	006 - Special computer methods
dewey-raw	006.3
dewey-search	006.3
dewey-sort	16.3
dewey-tens	000 - Computer science, information, general works
discipline	Informatik
doi_str_mv	10.1007/978-1-4615-2710-7
format	Electronic eBook
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>03701nmm a2200541zcb4500</leader><controlfield tag="001">BV045187895</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">cr\|uuu---uuuuu</controlfield><controlfield tag="008">180912s1994 \|\|\|\| o\|\|u\| \|\|\|\|\|\|eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781461527107</subfield><subfield code="9">978-1-4615-2710-7</subfield></datafield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/978-1-4615-2710-7</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ZDB-2-ENG)978-1-4615-2710-7</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1053794362</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV045187895</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-634</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">006.3</subfield><subfield code="2">23</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Grefenstette, Gregory</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Explorations in Automatic Thesaurus Discovery</subfield><subfield code="c">by Gregory Grefenstette</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boston, MA</subfield><subfield code="b">Springer US</subfield><subfield 
code="c">1994</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 Online-Ressource (XIII, 305 p)</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation</subfield><subfield code="v">278</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes natural processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb--noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, abstracts on AIDS, to encyclopedia articles on animals, even on the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. 
Gold Standards evaluation show that techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to the fields of information retrieval using established testbeds, existing thesaural enrichment, semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computer Science</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Artificial Intelligence (incl. Robotics)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Language Translation and Linguistics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computer science</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Artificial intelligence</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computational linguistics</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Thesaurus</subfield><subfield code="0">(DE-588)4185172-9</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Natürliche Sprache</subfield><subfield code="0">(DE-588)4041354-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Natürliche Sprache</subfield><subfield code="0">(DE-588)4041354-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield 
code="a">Sprachverarbeitung</subfield><subfield code="0">(DE-588)4116579-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Thesaurus</subfield><subfield code="0">(DE-588)4185172-9</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="8">1\p</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Erscheint auch als</subfield><subfield code="n">Druck-Ausgabe</subfield><subfield code="z">9781461361671</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.1007/978-1-4615-2710-7</subfield><subfield code="x">Verlag</subfield><subfield code="z">URL des Erstveröffentlichers</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">ZDB-2-ENG</subfield></datafield><datafield tag="940" ind1="1" ind2=" "><subfield code="q">ZDB-2-ENG_Archiv</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-030577072</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield><datafield tag="966" ind1="e" ind2=" "><subfield code="u">https://doi.org/10.1007/978-1-4615-2710-7</subfield><subfield code="l">BTU01</subfield><subfield code="p">ZDB-2-ENG</subfield><subfield code="q">ZDB-2-ENG_Archiv</subfield><subfield code="x">Verlag</subfield><subfield code="3">Volltext</subfield></datafield></record></collection>
id	DE-604.BV045187895
illustrated	Not Illustrated
indexdate	2024-07-10T08:11:00Z
institution	BVB
isbn	9781461527107
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-030577072
oclc_num	1053794362
open_access_boolean
owner	DE-634
owner_facet	DE-634
physical	1 Online-Ressource (XIII, 305 p)
psigel	ZDB-2-ENG ZDB-2-ENG_Archiv ZDB-2-ENG ZDB-2-ENG_Archiv
publishDate	1994
publishDateSearch	1994
publishDateSort	1994
publisher	Springer US
record_format	marc
series2	The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation
spelling	Grefenstette, Gregory Verfasser aut Explorations in Automatic Thesaurus Discovery by Gregory Grefenstette Boston, MA Springer US 1994 1 Online-Ressource (XIII, 305 p) txt rdacontent c rdamedia cr rdacarrier The Springer International Series in Engineering and Computer Science, Natural Language Processing and Machine Translation 278 Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes natural processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb--noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, abstracts on AIDS, to encyclopedia articles on animals, even on the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standards evaluation show that techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to the fields of information retrieval using established testbeds, existing thesaural enrichment, semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus Computer Science Artificial Intelligence (incl. 
Robotics) Language Translation and Linguistics Computer science Artificial intelligence Computational linguistics Sprachverarbeitung (DE-588)4116579-2 gnd rswk-swf Thesaurus (DE-588)4185172-9 gnd rswk-swf Natürliche Sprache (DE-588)4041354-8 gnd rswk-swf Natürliche Sprache (DE-588)4041354-8 s Sprachverarbeitung (DE-588)4116579-2 s Thesaurus (DE-588)4185172-9 s 1\p DE-604 Erscheint auch als Druck-Ausgabe 9781461361671 https://doi.org/10.1007/978-1-4615-2710-7 Verlag URL des Erstveröffentlichers Volltext 1\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk
spellingShingle	Grefenstette, Gregory Explorations in Automatic Thesaurus Discovery Computer Science Artificial Intelligence (incl. Robotics) Language Translation and Linguistics Computer science Artificial intelligence Computational linguistics Sprachverarbeitung (DE-588)4116579-2 gnd Thesaurus (DE-588)4185172-9 gnd Natürliche Sprache (DE-588)4041354-8 gnd
subject_GND	(DE-588)4116579-2 (DE-588)4185172-9 (DE-588)4041354-8
title	Explorations in Automatic Thesaurus Discovery
title_auth	Explorations in Automatic Thesaurus Discovery
title_exact_search	Explorations in Automatic Thesaurus Discovery
title_full	Explorations in Automatic Thesaurus Discovery by Gregory Grefenstette
title_fullStr	Explorations in Automatic Thesaurus Discovery by Gregory Grefenstette
title_full_unstemmed	Explorations in Automatic Thesaurus Discovery by Gregory Grefenstette
title_short	Explorations in Automatic Thesaurus Discovery
title_sort	explorations in automatic thesaurus discovery
topic	Computer Science Artificial Intelligence (incl. Robotics) Language Translation and Linguistics Computer science Artificial intelligence Computational linguistics Sprachverarbeitung (DE-588)4116579-2 gnd Thesaurus (DE-588)4185172-9 gnd Natürliche Sprache (DE-588)4041354-8 gnd
topic_facet	Computer Science Artificial Intelligence (incl. Robotics) Language Translation and Linguistics Computer science Artificial intelligence Computational linguistics Sprachverarbeitung Thesaurus Natürliche Sprache
url	https://doi.org/10.1007/978-1-4615-2710-7
work_keys_str_mv	AT grefenstettegregory explorationsinautomaticthesaurusdiscovery
