Availability: Mining the Web

Mining the Web: discovering knowledge from hypertext data

Bibliographic Details
Author:	Chakrabarti, Soumen (author)
Format:	Book
Language:	English
Published:	Amsterdam [et al.]: Morgan Kaufmann Publ., 2007
Edition:	[Reprint]
Series:	The Morgan Kaufmann series in data management systems
Subjects:	Data mining; Hypertext systems; Web databases; Automatic data collection; Data Mining - World Wide Web; World Wide Web; Data Mining
Online access:	Publisher description; Table of contents; Table of contents (PDF scan, UB Bamberg)
Description:	Includes bibliographical references (p. 307-326) and index
Description:	XVIII, 345 p., ill., diagrams
ISBN:	1558607544 9781558607545

Internal format

MARC


LEADER	00000nam a2200000zc 4500
001	BV036132174
003	DE-604
005	20110616
007	t
008	100422s2007 xxuad|| |||| 00||| eng d
016	7		|a ocn263706453 |2 DE-101
020			|a 1558607544 |9 1-55860-754-4
020			|a 9781558607545 |9 978-1-55860-754-5
035			|a (OCoLC)263706453
035			|a (DE-599)BVBBV036132174
040			|a DE-604 |b ger |e aacr
041	0		|a eng
044			|a xxu |c US
049			|a DE-473 |a DE-945
050		0	|a QA76.9.D343 C45 2007
082	0		|a 005.72
082	0		|a 005.78/8 22
084			|a ST 270 |0 (DE-625)143638: |2 rvk
084			|a ST 530 |0 (DE-625)143679: |2 rvk
100	1		|a Chakrabarti, Soumen |e Verfasser |4 aut
245	1	0	|a Mining the Web |b discovering knowledge from hypertext data |c Soumen Chakrabarti
250			|a [Nachdr.]
264		1	|a Amsterdam [u.a.] |b Morgan Kaufmann Publ. |c 2007
300			|a XVIII, 345 S. |b Ill., graph. Darst.
336			|b txt |2 rdacontent
337			|b n |2 rdamedia
338			|b nc |2 rdacarrier
490	0		|a The Morgan Kaufmann series in data management systems
500			|a Includes bibliographical references (p. 307-326) and index
650		4	|a Data mining
650		4	|a Hypertext systems
650		4	|a Web databases
650		4	|a Automatic data collection
650		4	|a Data Mining - World Wide Web
650		4	|a Automatic data collection
650		4	|a Data mining
650		4	|a Hypertext systems
650		4	|a Web databases
650	0	7	|a World Wide Web |0 (DE-588)4363898-3 |2 gnd |9 rswk-swf
650	0	7	|a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf
689	0	0	|a World Wide Web |0 (DE-588)4363898-3 |D s
689	0	1	|a Data Mining |0 (DE-588)4428654-5 |D s
689	0		|C b |5 DE-604
856	4	2	|q text/html |u http://www.loc.gov/catdir/description/els031/2002107241.html |3 Publisher description
856	4	2	|q text/html |u http://www.loc.gov/catdir/toc/els031/2002107241.html |3 Table of contents
856	4	2	|m Digitalisierung UB Bamberg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020214775&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999			|a oai:aleph.bib-bvb.de:BVB01-020214775

Search index record

_version_	1804142806267068416
adam_text	CONTENTS Foreword (Jiawei Han) vii Preface xv 1 INTRODUCTION 1.1 Crawling and Indexing 6 1.2 Topic Directories 7 1.3 Clustering and Classification 8 1.4 Hyperlink Analysis 9 1.5 Resource Discovery and Vertical Portals 11 1.6 Structured vs. Unstructured Data Mining 11 1.7 Bibliographic Notes 13 PART I INFRASTRUCTURE 2 CRAWLING THE WEB 2.1 HTML and HTTP Basics 18 2.2 Crawling Basics 19 2.3 Engineering Large-Scale Crawlers 21 2.3.1 DNS Caching, Prefetching, and Resolution 22 2.3.2 Multiple Concurrent Fetches 23 2.3.3 Link Extraction and Normalization 25 2.3.4 Robot Exclusion 26 2.3.5 Eliminating Already-Visited URLs 26 2.3.6 Spider Traps 28 2.3.7 Avoiding Repeated Expansion of Links on Duplicate Pages 29 2.3.8 Load Monitor and Manager 29 2.3.9 Per-Server Work-Queues 30 2.3.10 Text Repository 31 2.3.11 Refreshing Crawled Pages 33 2.4 Putting Together a Crawler 35 2.4.1 Design of the Core Components 35 2.4.2 Case Study: Using w3c-libwww 40 2.5 Bibliographic Notes 40 3 WEB SEARCH AND INFORMATION RETRIEVAL 3.1 Boolean Queries and the Inverted Index 45 3.1.1 Stopwords and Stemming 48 3.1.2 Batch Indexing and Updates 49 3.1.3 Index Compression Techniques 51 3.2 Relevance Ranking 53 3.2.1 Recall and Precision 53 3.2.2 The Vector-Space Model 56 3.2.3 Relevance Feedback and Rocchio's Method 57 3.2.4 Probabilistic Relevance Feedback Models 58 3.2.5 Advanced Issues 61 3.3 Similarity Search 67 3.3.1 Handling Find-Similar Queries 68 3.3.2 Eliminating Near Duplicates via Shingling 71 3.3.3 Detecting Locally Similar Subgraphs of the Web 73 3.4 Bibliographic Notes 75
PART II LEARNING 4 SIMILARITY AND CLUSTERING 4.1 Formulations and Approaches 81 4.1.1 Partitioning Approaches 81 4.1.2 Geometric Embedding Approaches 82 4.1.3 Generative Models and Probabilistic Approaches 83 4.2 Bottom-Up and Top-Down Partitioning Paradigms 84 4.2.1 Agglomerative Clustering 84 4.2.2 The k-Means Algorithm 87 4.3 Clustering and Visualization via Embeddings 89 4.3.1 Self-Organizing Maps (SOMs) 90 4.3.2 Multidimensional Scaling (MDS) and FastMap 91 4.3.3 Projections and Subspaces 94 4.3.4 Latent Semantic Indexing (LSI) 96 4.4 Probabilistic Approaches to Clustering 99 4.4.1 Generative Distributions for Documents 101 4.4.2 Mixture Models and Expectation Maximization (EM) 103 4.4.3 Multiple Cause Mixture Model (MCMM) 108 4.4.4 Aspect Models and Probabilistic LSI 109 4.4.5 Model and Feature Selection 112 4.5 Collaborative Filtering 115 4.5.1 Probabilistic Models 115 4.5.2 Combining Content-Based and Collaborative Features 117 4.6 Bibliographic Notes 121 5 SUPERVISED LEARNING 5.1 The Supervised Learning Scenario 126 5.2 Overview of Classification Strategies 128 5.3 Evaluating Text Classifiers 129 5.3.1 Benchmarks 130 5.3.2 Measures of Accuracy 131 5.4 Nearest Neighbor Learners 133 5.4.1 Pros and Cons 134 5.4.2 Is TFIDF Appropriate? 135 5.5 Feature Selection 136 5.5.1 Greedy Inclusion Algorithms 137 5.5.2 Truncation Algorithms 144 5.5.3 Comparison and Discussion 145 5.6 Bayesian Learners 147 5.6.1 Naive Bayes Learners 148 5.6.2 Small-Degree Bayesian Networks 152 5.7 Exploiting Hierarchy among Topics 155 5.7.1 Feature Selection 155 5.7.2 Enhanced Parameter Estimation 155 5.7.3 Training and Search Strategies 157 5.8 Maximum Entropy Learners 160 5.9 Discriminative Classification 163 5.9.1 Linear Least-Square Regression 163 5.9.2 Support Vector Machines 164 5.10 Hypertext Classification 169 5.10.1 Representing Hypertext for Supervised Learning 169 5.10.2 Rule Induction 171 5.11 Bibliographic Notes 173 6 SEMISUPERVISED LEARNING 6.1 Expectation Maximization 178 6.1.1 Experimental Results 179 6.1.2 Reducing the Belief in Unlabeled Documents 181 6.1.3 Modeling Labels Using Many Mixture Components 183 6.2 Labeling Hypertext Graphs 184 6.2.1 Absorbing Features from Neighboring Pages 185 6.2.2 A Relaxation Labeling Algorithm 188 6.2.3 A Metric Graph-Labeling Problem 193 6.3 Co-training 195 6.4 Bibliographic Notes 198
PART III APPLICATIONS 7 SOCIAL NETWORK ANALYSIS 7.1 Social Sciences and Bibliometry 205 7.1.1 Prestige 205 7.1.2 Centrality 206 7.1.3 Co-citation 207 7.2 PageRank and HITS 209 7.2.1 PageRank 209 7.2.2 HITS 212 7.2.3 Stochastic HITS and Other Variants 216 7.3 Shortcomings of the Coarse-Grained Graph Model 219 7.3.1 Artifacts of Web Authorship 219 7.3.2 Topic Contamination and Drift 223 7.4 Enhanced Models and Techniques 225 7.4.1 Avoiding Two-Party Nepotism 225 7.4.2 Outlier Elimination 226 7.4.3 Exploiting Anchor Text 227 7.4.4 Exploiting Document Markup Structure 228 7.5 Evaluation of Topic Distillation 235 7.5.1 HITS and Related Algorithms 235 7.5.2 Effect of Exploiting Other Hypertext Features 238 7.6 Measuring and Modeling the Web 243 7.6.1 Power-Law Degree Distributions 243 7.6.2 The Bow Tie Structure and Bipartite Cores 246 7.6.3 Sampling Web Pages at Random 246 7.7 Bibliographic Notes 254 8 RESOURCE DISCOVERY 8.1 Collecting Important Pages Preferentially 257 8.1.1 Crawling as Guided Search in a Graph 257 8.1.2 Keyword-Based Graph Search 259 8.2 Similarity Search Using Link Topology 264 8.3 Topical Locality and Focused Crawling 268 8.3.1 Focused Crawling 270 8.3.2 Identifying and Exploiting Hubs 277 8.3.3 Learning Context Graphs 279 8.3.4 Reinforcement Learning 280 8.4 Discovering Communities 284 8.4.1 Bipartite Cores as Communities 284 8.4.2 Network Flow/Cut-Based Notions of Communities 285 8.5 Bibliographic Notes 288 9 THE FUTURE OF WEB MINING 9.1 Information Extraction 290 9.2 Natural Language Processing 295 9.2.1 Lexical Networks and Ontologies 296 9.2.2 Part-of-Speech and Sense Tagging 297 9.2.3 Parsing and Knowledge Representation 299 9.3 Question Answering 302 9.4 Profiles, Personalization, and Collaboration 305 References 307 Index 327 About the Author 345
any_adam_object	1
author	Chakrabarti, Soumen
author_facet	Chakrabarti, Soumen
author_role	aut
author_sort	Chakrabarti, Soumen
author_variant	s c sc
building	Verbundindex
bvnumber	BV036132174
callnumber-first	Q - Science
callnumber-label	QA76
callnumber-raw	QA76.9.D343 C45 2007
callnumber-search	QA76.9.D343 C45 2007
callnumber-sort	QA 276.9 D343 C45 42007
callnumber-subject	QA - Mathematics
classification_rvk	ST 270 ST 530
ctrlnum	(OCoLC)263706453 (DE-599)BVBBV036132174
dewey-full	005.72 005.78/822
dewey-hundreds	000 - Computer science, information, general works
dewey-ones	005 - Computer programming, programs, data, security
dewey-raw	005.72 005.78/8 22
dewey-search	005.72 005.78/8 22
dewey-sort	15.72
dewey-tens	000 - Computer science, information, general works
discipline	Informatik
edition	[Nachdr.]
format	Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02262nam a2200589zc 4500</leader><controlfield tag="001">BV036132174</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20110616 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">100422s2007 xxuad\|\| \|\|\|\| 00\|\|\| eng d</controlfield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">ocn263706453</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">1558607544</subfield><subfield code="9">1-55860-754-4</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781558607545</subfield><subfield code="9">978-1-55860-754-5</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)263706453</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV036132174</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-473</subfield><subfield code="a">DE-945</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.9.D343 C45 2007</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">005.72</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">005.78/8 22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 270</subfield><subfield code="0">(DE-625)143638:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield 
code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Chakrabarti, Soumen</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Mining the Web</subfield><subfield code="b">discovering knowledge from hypertext data</subfield><subfield code="c">Soumen Chakrabarti</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">[Nachdr.]</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Amsterdam [u.a.]</subfield><subfield code="b">Morgan Kaufmann Publ.</subfield><subfield code="c">2007</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XVIII, 345 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">The Morgan Kaufmann series in data management systems</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references (p. 
307-326) and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Hypertext systems</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Web databases</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Automatic data collection</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data Mining - World Wide Web</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Automatic data collection</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Hypertext systems</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Web databases</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">World Wide Web</subfield><subfield code="0">(DE-588)4363898-3</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">World Wide Web</subfield><subfield code="0">(DE-588)4363898-3</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="C">b</subfield><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="q">text/html</subfield><subfield code="u">http://www.loc.gov/catdir/description/els031/2002107241.html</subfield><subfield 
code="3">Publisher description</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="q">text/html</subfield><subfield code="u">http://www.loc.gov/catdir/toc/els031/2002107241.html</subfield><subfield code="3">Table of contents</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bamberg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020214775&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-020214775</subfield></datafield></record></collection>
id	DE-604.BV036132174
illustrated	Illustrated
indexdate	2024-07-09T22:37:37Z
institution	BVB
isbn	1558607544 9781558607545
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-020214775
oclc_num	263706453
open_access_boolean
owner	DE-473 DE-BY-UBG DE-945
owner_facet	DE-473 DE-BY-UBG DE-945
physical	XVIII, 345 S. Ill., graph. Darst.
publishDate	2007
publishDateSearch	2007
publishDateSort	2007
publisher	Morgan Kaufmann Publ.
record_format	marc
series2	The Morgan Kaufmann series in data management systems
spelling	Chakrabarti, Soumen Verfasser aut Mining the Web discovering knowledge from hypertext data Soumen Chakrabarti [Nachdr.] Amsterdam [u.a.] Morgan Kaufmann Publ. 2007 XVIII, 345 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier The Morgan Kaufmann series in data management systems Includes bibliographical references (p. 307-326) and index Data mining Hypertext systems Web databases Automatic data collection Data Mining - World Wide Web World Wide Web (DE-588)4363898-3 gnd rswk-swf Data Mining (DE-588)4428654-5 gnd rswk-swf World Wide Web (DE-588)4363898-3 s Data Mining (DE-588)4428654-5 s b DE-604 text/html http://www.loc.gov/catdir/description/els031/2002107241.html Publisher description text/html http://www.loc.gov/catdir/toc/els031/2002107241.html Table of contents Digitalisierung UB Bamberg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020214775&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis
spellingShingle	Chakrabarti, Soumen Mining the Web discovering knowledge from hypertext data Data mining Hypertext systems Web databases Automatic data collection Data Mining - World Wide Web World Wide Web (DE-588)4363898-3 gnd Data Mining (DE-588)4428654-5 gnd
subject_GND	(DE-588)4363898-3 (DE-588)4428654-5
title	Mining the Web discovering knowledge from hypertext data
title_auth	Mining the Web discovering knowledge from hypertext data
title_exact_search	Mining the Web discovering knowledge from hypertext data
title_full	Mining the Web discovering knowledge from hypertext data Soumen Chakrabarti
title_fullStr	Mining the Web discovering knowledge from hypertext data Soumen Chakrabarti
title_full_unstemmed	Mining the Web discovering knowledge from hypertext data Soumen Chakrabarti
title_short	Mining the Web
title_sort	mining the web discovering knowledge from hypertext data
title_sub	discovering knowledge from hypertext data
topic	Data mining Hypertext systems Web databases Automatic data collection Data Mining - World Wide Web World Wide Web (DE-588)4363898-3 gnd Data Mining (DE-588)4428654-5 gnd
topic_facet	Data mining Hypertext systems Web databases Automatic data collection Data Mining - World Wide Web World Wide Web Data Mining
url	http://www.loc.gov/catdir/description/els031/2002107241.html http://www.loc.gov/catdir/toc/els031/2002107241.html http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020214775&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA
work_keys_str_mv	AT chakrabartisoumen miningthewebdiscoveringknowledgefromhypertextdata

Availability

No print copy is available.

Order via interlibrary loan. Note: this title is not held by THWS. Table of contents
