Search engines: information retrieval in practice
Main authors: | Croft, W. Bruce ; Metzler, Donald ; Strohman, Trevor |
---|---|
Format: | Book |
Language: | English |
Published: | Boston : Addison-Wesley, c2010 |
Subjects: | Search engines ; Information retrieval |
Online access: | Table of contents |
Description: | Includes bibliographical references (p. [487]-511) and index |
Physical description: | xxv, 520 pages. ill. 24 cm |
ISBN: | 9780136072249 0136072240 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV044233086 | ||
003 | DE-604 | ||
005 | 20170412 | ||
007 | t | ||
008 | 170320s2010 xxua||| |||| 00||| eng d | ||
010 | |a 009002640 | ||
020 | |a 9780136072249 |9 978-0-13-607224-9 | ||
020 | |a 0136072240 |9 0-13-607224-0 | ||
035 | |a (OCoLC)836855288 | ||
035 | |a (DE-599)BVBBV044233086 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng | |
044 | |a xxu |c US | ||
049 | |a DE-739 | ||
050 | 0 | |a TK5105.884 | |
082 | 0 | |a 005.75/8 |2 22 | |
084 | |a ST 270 |0 (DE-625)143638: |2 rvk | ||
100 | 1 | |a Croft, W. Bruce |d 1952- |e Verfasser |0 (DE-588)137756658 |4 aut | |
245 | 1 | 0 | |a Search engines |b information retrieval in practice |c W. Bruce Croft ; Donald Metzler ; Trevor Strohman |
264 | 1 | |a Boston |b Addison-Wesley |c c2010 | |
300 | |a xxv, 520 Seiten. |b ill. |c 24 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Includes bibliographical references (p. [487]-511) and index | ||
650 | 4 | |a Search engines |x Programming | |
650 | 4 | |a Information retrieval | |
650 | 4 | |a Information Storage and Retrieval | |
650 | 4 | |a Knowledge Bases | |
650 | 0 | 7 | |a Suchmaschine |0 (DE-588)4423007-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Information Retrieval |0 (DE-588)4072803-1 |2 gnd |9 rswk-swf |
655 | 7 | |8 1\p |0 (DE-588)4123623-3 |a Lehrbuch |2 gnd-content | |
689 | 0 | 0 | |a Suchmaschine |0 (DE-588)4423007-2 |D s |
689 | 0 | 1 | |a Information Retrieval |0 (DE-588)4072803-1 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Metzler, Donald |e Verfasser |4 aut | |
700 | 1 | |a Strohman, Trevor |e Verfasser |0 (DE-588)138148090 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Passau - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029638660&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-029638660 | ||
883 | 1 | |8 1\p |a cgwrk |d 20201028 |q DE-101 |u https://d-nb.info/provenance/plan#cgwrk |
Record in the search index
_version_ | 1804177388769116161 |
---|---|
adam_text | Contents
1 Search Engines and Information Retrieval........................ 1
1.1 What Is Information Retrieval?................................ 1
1.2 The Big Issues................................................ 4
1.3 Search Engines................................................ 6
1.4 Search Engineers.............................................. 9
2 Architecture of a Search Engine................................... 13
2.1 What Is an Architecture?..................................... 13
2.2 Basic Building Blocks........................................ 14
2.3 Breaking It Down............................................. 17
2.3.1 Text Acquisition....................................... 17
2.3.2 Text Transformation.................................... 19
2.3.3 Index Creation......................................... 22
2.3.4 User Interaction....................................... 23
2.3.5 Ranking................................................ 25
2.3.6 Evaluation............................................. 27
2.4 How Does It Really Work?.................................... 28
3 Crawls and Feeds................................................ 31
3.1 Deciding What to Search...................................... 31
3.2 Crawling the Web............................................. 32
3.2.1 Retrieving Web Pages................................... 33
3.2.2 The Web Crawler........................................ 35
3.2.3 Freshness.............................................. 37
3.2.4 Focused Crawling....................................... 41
3.2.5 Deep Web............................................... 41
3.2.6 Sitemaps.......................................... 43
3.2.7 Distributed Crawling.............................. 44
3.3 Crawling Documents and Email............................ 46
3.4 Document Feeds.......................................... 47
3.5 The Conversion Problem................................ 49
3.5.1 Character Encodings............................... 50
3.6 Storing the Documents................................... 52
3.6.1 Using a Database System........................... 53
3.6.2 Random Access..................................... 53
3.6.3 Compression and Large Files....................... 54
3.6.4 Update............................................ 56
3.6.5 BigTable.......................................... 57
3.7 Detecting Duplicates ................................... 60
3.8 Removing Noise.......................................... 63
4 Processing Text................................................. 73
4.1 From Words to Terms.......................................... 73
4.2 Text Statistics.............................................. 75
4.2.1 Vocabulary Growth...................................... 80
4.2.2 Estimating Collection and Result Set Sizes............. 83
4.3 Document Parsing............................................. 86
4.3.1 Overview............................................... 86
4.3.2 Tokenizing............................................. 87
4.3.3 Stopping............................................... 90
4.3.4 Stemming............................................... 91
4.3.5 Phrases and N-grams.................................... 97
4.4 Document Structure and Markup............................... 101
4.5 Link Analysis............................................... 104
4.5.1 Anchor Text........................................... 105
4.5.2 PageRank.............................................. 105
4.5.3 Link Quality.......................................... 111
4.6 Information Extraction...................................... 113
4.6.1 Hidden Markov Models for Extraction................... 115
4.7 Internationalization........................................ 118
5 Ranking with Indexes............................................ 125
5.1 Overview.................................................. 125
5.2 Abstract Model of Ranking................................. 126
5.3 Inverted Indexes.......................................... 129
5.3.1 Documents........................................... 131
5.3.2 Counts.............................................. 133
5.3.3 Positions........................................... 134
5.3.4 Fields and Extents.................................. 136
5.3.5 Scores.............................................. 138
5.3.6 Ordering............................................ 139
5.4 Compression............................................... 140
5.4.1 Entropy and Ambiguity............................... 142
5.4.2 Delta Encoding...................................... 144
5.4.3 Bit-Aligned Codes................................... 145
5.4.4 Byte-Aligned Codes.................................. 148
5.4.5 Compression in Practice............................. 149
5.4.6 Looking Ahead....................................... 151
5.4.7 Skipping and Skip Pointers ......................... 151
5.5 Auxiliary Structures...................................... 154
5.6 Index Construction........................................ 156
5.6.1 Simple Construction................................. 156
5.6.2 Merging............................................. 157
5.6.3 Parallelism and Distribution........................ 158
5.6.4 Update.............................................. 164
5.7 Query Processing.......................................... 165
5.7.1 Document-at-a-time Evaluation....................... 166
5.7.2 Term-at-a-time Evaluation........................... 168
5.7.3 Optimization Techniques............................. 170
5.7.4 Structured Queries ................................. 178
5.7.5 Distributed Evaluation.............................. 180
5.7.6 Caching............................................. 181
6 Queries and Interfaces........................................ 187
6.1 Information Needs and Queries............................. 187
6.2 Query Transformation and Refinement....................... 190
6.2.1 Stopping and Stemming Revisited..................... 190
6.2.2 Spell Checking and Suggestions...................... 193
6.2.3 Query Expansion........................................ 199
6.2.4 Relevance Feedback......................................208
6.2.5 Context and Personalization.............................211
6.3 Showing the Results...........................................215
6.3.1 Result Pages and Snippets ..............................215
6.3.2 Advertising and Search................................. 218
6.3.3 Clustering the Results..................................221
6.4 Cross-Language Search........................................ 226
7 Retrieval Models.................................................233
7.1 Overview of Retrieval Models..................................233
7.1.1 Boolean Retrieval.......................................235
7.1.2 The Vector Space Model..................................237
7.2 Probabilistic Models..........................................243
7.2.1 Information Retrieval as Classification ................244
7.2.2 The BM25 Ranking Algorithm..............................250
7.3 Ranking Based on Language Models..............................252
7.3.1 Query Likelihood Ranking................................254
7.3.2 Relevance Models and Pseudo-Relevance Feedback........261
7.4 Complex Queries and Combining Evidence......................267
7.4.1 The Inference Network Model ............................268
7.4.2 The Galago Query Language...............................273
7.5 Web Search....................................................279
7.6 Machine Learning and Information Retrieval.................. 283
7.6.1 Learning to Rank........................................284
7.6.2 Topic Models and Vocabulary Mismatch....................288
7.7 Application-Based Models..................................... 291
8 Evaluating Search Engines........................................297
8.1 Why Evaluate?.................................................297
8.2 The Evaluation Corpus.........................................299
8.3 Logging..................................................... 305
8.4 Effectiveness Metrics.........................................308
8.4.1 Recall and Precision ...................................308
8.4.2 Averaging and Interpolation ........................... 313
8.4.3 Focusing on the Top Documents ..........................318
8.4.4 Using Preferences.......................................321
8.5 Efficiency Metrics............................................322
8.6 Training, Testing, and Statistics.............................325
8.6.1 Significance Tests......................................325
8.6.2 Setting Parameter Values ...............................330
8.6.3 Online Testing..........................................332
8.7 The Bottom Line.............................................. 333
9 Classification and Clustering..................................... 339
9.1 Classification and Categorization.............................340
9.1.1 Naïve Bayes.............................................342
9.1.2 Support Vector Machines.................................351
9.1.3 Evaluation..............................................359
9.1.4 Classifier and Feature Selection........................359
9.1.5 Spam, Sentiment, and Online Advertising.................364
9.2 Clustering....................................................373
9.2.1 Hierarchical and K-Means Clustering.....................375
9.2.2 K Nearest Neighbor Clustering...........................384
9.2.3 Evaluation..............................................386
9.2.4 How to Choose K.........................................387
9.2.5 Clustering and Search...................................389
10 Social Search .....................................................397
10.1 What Is Social Search?........................................397
10.2 User Tags and Manual Indexing................................400
10.2.1 Searching Tags.........................................402
10.2.2 Inferring Missing Tags.................................404
10.2.3 Browsing and Tag Clouds................................406
10.3 Searching with Communities...................................408
10.3.1 What Is a Community? ..................................408
10.3.2 Finding Communities....................................409
10.3.3 Community-Based Question Answering.....................415
10.3.4 Collaborative Searching................................420
10.4 Filtering and Recommending...................................423
10.4.1 Document Filtering.....................................423
10.4.2 Collaborative Filtering................................432
10.5 Peer-to-Peer and Metasearch..................................438
10.5.1 Distributed Search.....................................438
10.5.2 P2P Networks.........................................442
11 Beyond Bag of Words............................................451
11.1 Overview...................................................451
11.2 Feature-Based Retrieval Models.............................452
11.3 Term Dependence Models.....................................454
11.4 Structure Revisited........................................459
11.4.1 XML Retrieval........................................461
11.4.2 Entity Search........................................464
11.5 Longer Questions, Better Answers...........................466
11.6 Words, Pictures, and Music.................................470
11.7 One Search Fits All?.......................................479
References.........................................................487
Index..............................................................513
|
any_adam_object | 1 |
author | Croft, W. Bruce 1952- Metzler, Donald Strohman, Trevor |
author_GND | (DE-588)137756658 (DE-588)138148090 |
author_facet | Croft, W. Bruce 1952- Metzler, Donald Strohman, Trevor |
author_role | aut aut aut |
author_sort | Croft, W. Bruce 1952- |
author_variant | w b c wb wbc d m dm t s ts |
building | Verbundindex |
bvnumber | BV044233086 |
callnumber-first | T - Technology |
callnumber-label | TK5105 |
callnumber-raw | TK5105.884 |
callnumber-search | TK5105.884 |
callnumber-sort | TK 45105.884 |
callnumber-subject | TK - Electrical and Nuclear Engineering |
classification_rvk | ST 270 |
ctrlnum | (OCoLC)836855288 (DE-599)BVBBV044233086 |
dewey-full | 005.75/8 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security |
dewey-raw | 005.75/8 |
dewey-search | 005.75/8 |
dewey-sort | 15.75 18 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02036nam a2200505 c 4500</leader><controlfield tag="001">BV044233086</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20170412 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">170320s2010 xxua||| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">009002640</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780136072249</subfield><subfield code="9">978-0-13-607224-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0136072240</subfield><subfield code="9">0-13-607224-0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)836855288</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV044233086</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield><subfield code="c">US</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-739</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK5105.884</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">005.75/8</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 270</subfield><subfield code="0">(DE-625)143638:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Croft, W. 
Bruce</subfield><subfield code="d">1952-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)137756658</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Search engines</subfield><subfield code="b">information retrieval in practice</subfield><subfield code="c">W. Bruce Croft ; Donald Metzler ; Trevor Strohman</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boston</subfield><subfield code="b">Addison-Wesley</subfield><subfield code="c">c2010</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xxv, 520 Seiten.</subfield><subfield code="b">ill.</subfield><subfield code="c">24 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references (p. 
[487]-511) and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Search engines</subfield><subfield code="x">Programming</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information retrieval</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information Storage and Retrieval</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Knowledge Bases</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Suchmaschine</subfield><subfield code="0">(DE-588)4423007-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="8">1\p</subfield><subfield code="0">(DE-588)4123623-3</subfield><subfield code="a">Lehrbuch</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Suchmaschine</subfield><subfield code="0">(DE-588)4423007-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Metzler, Donald</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Strohman, Trevor</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)138148090</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" 
ind2="2"><subfield code="m">Digitalisierung UB Passau - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029638660&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-029638660</subfield></datafield><datafield tag="883" ind1="1" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">cgwrk</subfield><subfield code="d">20201028</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#cgwrk</subfield></datafield></record></collection> |
genre | 1\p (DE-588)4123623-3 Lehrbuch gnd-content |
genre_facet | Lehrbuch |
id | DE-604.BV044233086 |
illustrated | Illustrated |
indexdate | 2024-07-10T07:47:17Z |
institution | BVB |
isbn | 9780136072249 0136072240 |
language | English |
lccn | 009002640 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-029638660 |
oclc_num | 836855288 |
open_access_boolean | |
owner | DE-739 |
owner_facet | DE-739 |
physical | xxv, 520 Seiten. ill. 24 cm |
publishDate | 2010 |
publishDateSearch | 2010 |
publishDateSort | 2010 |
publisher | Addison-Wesley |
record_format | marc |
spelling | Croft, W. Bruce 1952- Verfasser (DE-588)137756658 aut Search engines information retrieval in practice W. Bruce Croft ; Donald Metzler ; Trevor Strohman Boston Addison-Wesley c2010 xxv, 520 Seiten. ill. 24 cm txt rdacontent n rdamedia nc rdacarrier Includes bibliographical references (p. [487]-511) and index Search engines Programming Information retrieval Information Storage and Retrieval Knowledge Bases Suchmaschine (DE-588)4423007-2 gnd rswk-swf Information Retrieval (DE-588)4072803-1 gnd rswk-swf 1\p (DE-588)4123623-3 Lehrbuch gnd-content Suchmaschine (DE-588)4423007-2 s Information Retrieval (DE-588)4072803-1 s DE-604 Metzler, Donald Verfasser aut Strohman, Trevor Verfasser (DE-588)138148090 aut Digitalisierung UB Passau - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029638660&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis 1\p cgwrk 20201028 DE-101 https://d-nb.info/provenance/plan#cgwrk |
spellingShingle | Croft, W. Bruce 1952- Metzler, Donald Strohman, Trevor Search engines information retrieval in practice Search engines Programming Information retrieval Information Storage and Retrieval Knowledge Bases Suchmaschine (DE-588)4423007-2 gnd Information Retrieval (DE-588)4072803-1 gnd |
subject_GND | (DE-588)4423007-2 (DE-588)4072803-1 (DE-588)4123623-3 |
title | Search engines information retrieval in practice |
title_auth | Search engines information retrieval in practice |
title_exact_search | Search engines information retrieval in practice |
title_full | Search engines information retrieval in practice W. Bruce Croft ; Donald Metzler ; Trevor Strohman |
title_fullStr | Search engines information retrieval in practice W. Bruce Croft ; Donald Metzler ; Trevor Strohman |
title_full_unstemmed | Search engines information retrieval in practice W. Bruce Croft ; Donald Metzler ; Trevor Strohman |
title_short | Search engines |
title_sort | search engines information retrieval in practice |
title_sub | information retrieval in practice |
topic | Search engines Programming Information retrieval Information Storage and Retrieval Knowledge Bases Suchmaschine (DE-588)4423007-2 gnd Information Retrieval (DE-588)4072803-1 gnd |
topic_facet | Search engines Programming Information retrieval Information Storage and Retrieval Knowledge Bases Suchmaschine Information Retrieval Lehrbuch |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=029638660&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT croftwbruce searchenginesinformationretrievalinpractice AT metzlerdonald searchenginesinformationretrievalinpractice AT strohmantrevor searchenginesinformationretrievalinpractice |