Information retrieval in practice:
Main authors: | Croft, W. Bruce; Metzler, Donald; Strohman, Trevor |
---|---|
Format: | Book |
Language: | English |
Published: | Upper Saddle River, NJ [etc.]: Pearson Education, 2009 |
Edition: | International ed. |
Subjects: | |
Online access: | Table of contents |
Description: | XXV, 524 pages, diagrams |
ISBN: | 9780131364899 0131364898 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV035437138 | ||
003 | DE-604 | ||
005 | 20120320 | ||
007 | t | ||
008 | 090416s2009 d||| |||| 00||| eng d | ||
015 | |a GBA914654 |2 dnb | ||
020 | |a 9780131364899 |9 978-0-13-136489-9 | ||
020 | |a 0131364898 |9 0-13-136489-8 | ||
035 | |a (OCoLC)305146830 | ||
035 | |a (DE-599)BVBBV035437138 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-12 |a DE-355 | ||
082 | 0 | |a 025.04 |2 22 | |
084 | |a ST 270 |0 (DE-625)143638: |2 rvk | ||
084 | |a 24,1 |2 ssgn | ||
100 | 1 | |a Croft, W. Bruce |d 1952- |e Verfasser |0 (DE-588)137756658 |4 aut | |
245 | 1 | 0 | |a Information retrieval in practice |c by Bruce Croft, Donald Metzler, Trevor Strohman |
250 | |a International ed. | ||
264 | 1 | |a Upper Saddle River, NJ [u.a.] |b Pearson Education |c 2009 | |
300 | |a XXV, 524 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 4 | |a Search engines | |
650 | 4 | |a Information storage and retrieval systems | |
650 | 4 | |a Information retrieval | |
650 | 4 | |a Information retrieval | |
650 | 4 | |a Information storage and retrieval systems | |
650 | 4 | |a Search engines | |
650 | 0 | 7 | |a Suchmaschine |0 (DE-588)4423007-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Information Retrieval |0 (DE-588)4072803-1 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Suchmaschine |0 (DE-588)4423007-2 |D s |
689 | 0 | 1 | |a Information Retrieval |0 (DE-588)4072803-1 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Metzler, Donald |e Verfasser |4 aut | |
700 | 1 | |a Strohman, Trevor |e Verfasser |0 (DE-588)138148090 |4 aut | |
856 | 4 | 2 | |m Digitalisierung BSBMuenchen |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017357446&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-017357446 |
Record in the search index
_version_ | 1804138882099314688 |
---|---|
adam_text | Contents
1 Search Engines and Information Retrieval ... 1
1.1 What Is Information Retrieval? ... 1
1.2 The Big Issues ... 4
1.3 Search Engines ... 6
1.4 Search Engineers ... 9
2 Architecture of a Search Engine ... 13
2.1 What Is an Architecture? ... 13
2.2 Basic Building Blocks ... 14
2.3 Breaking It Down ... 17
2.3.1 Text Acquisition ... 17
2.3.2 Text Transformation ... 19
2.3.3 Index Creation ... 22
2.3.4 User Interaction ... 23
2.3.5 Ranking ... 25
2.3.6 Evaluation ... 27
2.4 How Does It Really Work? ... 28
3 Crawls and Feeds ... 31
3.1 Deciding What to Search ... 31
3.2 Crawling the Web ... 32
3.2.1 Retrieving Web Pages ... 33
3.2.2 The Web Crawler ... 35
3.2.3 Freshness ... 37
3.2.4 Focused Crawling ... 41
3.2.5 Deep Web ... 41
3.2.6 Sitemaps ... 43
3.2.7 Distributed Crawling ... 44
3.3 Crawling Documents and Email ... 46
3.4 Document Feeds ... 47
3.5 The Conversion Problem ... 49
3.5.1 Character Encodings ... 50
3.6 Storing the Documents ... 52
3.6.1 Using a Database System ... 53
3.6.2 Random Access ... 53
3.6.3 Compression and Large Files ... 54
3.6.4 Update ... 56
3.6.5 BigTable ... 57
3.7 Detecting Duplicates ... 60
3.8 Removing Noise ... 63
4 Processing Text ... 75
4.1 From Words to Terms ... 75
4.2 Text Statistics ... 77
4.2.1 Vocabulary Growth ... 82
4.2.2 Estimating Collection and Result Set Sizes ... 85
4.3 Document Parsing ... 88
4.3.1 Overview ... 88
4.3.2 Tokenizing ... 89
4.3.3 Stopping ... 92
4.3.4 Stemming ... 93
4.3.5 Phrases and N-grams
4.4 Document Structure and Markup
4.5 Link Analysis ... 106
4.5.1 Anchor Text ... 107
4.5.2 PageRank ... 107
4.5.3 Link Quality ... 113
4.6 Information Extraction ... 117
4.6.1 Hidden Markov Models for Extraction
4.7 Internationalization
5 Ranking with Indexes ... 127
5.1 Overview ... 127
5.2 Abstract Model of Ranking ... 128
5.3 Inverted Indexes ... 131
5.3.1 Documents ... 133
5.3.2 Counts ... 135
5.3.3 Positions ... 136
5.3.4 Fields and Extents ... 138
5.3.5 Scores ... 140
5.3.6 Ordering ... 141
5.4 Compression ... 142
5.4.1 Entropy and Ambiguity ... 144
5.4.2 Delta Encoding ... 146
5.4.3 Bit-Aligned Codes ... 147
5.4.4 Byte-Aligned Codes ... 150
5.4.5 Compression in Practice ... 151
5.4.6 Looking Ahead ... 153
5.4.7 Skipping and Skip Pointers ... 153
5.5 Auxiliary Structures ... 156
5.6 Index Construction ... 158
5.6.1 Simple Construction ... 158
5.6.2 Merging ... 159
5.6.3 Parallelism and Distribution ... 160
5.6.4 Update ... 166
5.7 Query Processing ... 167
5.7.1 Document-at-a-time Evaluation ... 168
5.7.2 Term-at-a-time Evaluation ... 170
5.7.3 Optimization Techniques ... 172
5.7.4 Structured Queries ... 180
5.7.5 Distributed Evaluation ... 182
5.7.6 Caching ... 183
6 Queries and Interfaces ... 191
6.1 Information Needs and Queries ... 191
6.2 Query Transformation and Refinement ... 194
6.2.1 Stopping and Stemming Revisited ... 194
6.2.2 Spell Checking and Suggestions ... 197
6.2.3 Query Expansion ... 203
6.2.4 Relevance Feedback ... 212
6.2.5 Context and Personalization ... 215
6.3 Showing the Results ... 219
6.3.1 Result Pages and Snippets ... 219
6.3.2 Advertising and Search ... 222
6.3.3 Clustering the Results ... 225
6.4 Cross-Language Search ... 230
7 Retrieval Models ... 237
7.1 Overview of Retrieval Models ... 237
7.1.1 Boolean Retrieval ... 239
7.1.2 The Vector Space Model ... 241
7.2 Probabilistic Models ... 247
7.2.1 Information Retrieval as Classification ... 248
7.2.2 The BM25 Ranking Algorithm ... 254
7.3 Ranking Based on Language Models ... 256
7.3.1 Query Likelihood Ranking ... 258
7.3.2 Relevance Models and Pseudo-Relevance Feedback ... 265
7.4 Complex Queries and Combining Evidence ... 271
7.4.1 The Inference Network Model ... 272
7.4.2 The Galago Query Language ... 277
7.5 Web Search ... 283
7.6 Machine Learning and Information Retrieval ... 287
7.6.1 Learning to Rank ... 288
7.6.2 Topic Models and Vocabulary Mismatch ... 292
7.7 Application-Based Models ... 295
8 Evaluating Search Engines
8.1 Why Evaluate? ... 301
8.2 The Evaluation Corpus ... 303
8.3 Logging ... 309
8.4 Effectiveness Metrics
8.4.1 Recall and Precision ... 312
8.4.2 Averaging and Interpolation
8.4.3 Focusing on the Top Documents
8.4.4 Using Preferences ... 325
8.5 Efficiency Metrics ... 326
8.6 Training, Testing, and Statistics ... 329
8.6.1 Significance Tests ... 329
8.6.2 Setting Parameter Values ... 334
8.6.3 Online Testing ... 336
8.7 The Bottom Line ... 337
9 Classification and Clustering ... 343
9.1 Classification and Categorization ... 344
9.1.1 Naïve Bayes ... 346
9.1.2 Support Vector Machines ... 355
9.1.3 Evaluation ... 363
9.1.4 Classifier and Feature Selection ... 363
9.1.5 Spam, Sentiment, and Online Advertising ... 368
9.2 Clustering ... 377
9.2.1 Hierarchical and K-Means Clustering ... 379
9.2.2 K Nearest Neighbor Clustering ... 388
9.2.3 Evaluation ... 390
9.2.4 How to Choose K ... 391
9.2.5 Clustering and Search ... 393
10 Social Search ... 401
10.1 What Is Social Search? ... 401
10.2 User Tags and Manual Indexing ... 404
10.2.1 Searching Tags ... 406
10.2.2 Inferring Missing Tags ... 408
10.2.3 Browsing and Tag Clouds ... 410
10.3 Searching with Communities ... 412
10.3.1 What Is a Community? ... 412
10.3.2 Finding Communities ... 413
10.3.3 Community-Based Question Answering ... 419
10.3.4 Collaborative Searching ... 424
10.4 Filtering and Recommending ... 427
10.4.1 Document Filtering ... 427
10.4.2 Collaborative Filtering ... 436
10.5 Peer-to-Peer and Metasearch ... 442
10.5.1 Distributed Search ... 442
10.5.2 P2P Networks ... 446
11 Beyond Bag of Words ... 455
11.1 Overview ... 455
11.2 Feature-Based Retrieval Models ... 456
11.3 Term Dependence Models ... 458
11.4 Structure Revisited ... 463
11.4.1 XML Retrieval ... 465
11.4.2 Entity Search ... 468
11.5 Longer Questions, Better Answers ... 470
11.6 Words, Pictures, and Music ... 474
11.7 One Search Fits All? ... 483
References ... 491
Index ... 517
|
any_adam_object | 1 |
author | Croft, W. Bruce 1952- Metzler, Donald Strohman, Trevor |
author_GND | (DE-588)137756658 (DE-588)138148090 |
author_facet | Croft, W. Bruce 1952- Metzler, Donald Strohman, Trevor |
author_role | aut aut aut |
author_sort | Croft, W. Bruce 1952- |
author_variant | w b c wb wbc d m dm t s ts |
building | Verbundindex |
bvnumber | BV035437138 |
classification_rvk | ST 270 |
ctrlnum | (OCoLC)305146830 (DE-599)BVBBV035437138 |
dewey-full | 025.04 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 025 - Operations of libraries and archives |
dewey-raw | 025.04 |
dewey-search | 025.04 |
dewey-sort | 225.04 |
dewey-tens | 020 - Library and information sciences |
discipline | General, Computer Science |
edition | International ed. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01901nam a2200493 c 4500</leader><controlfield tag="001">BV035437138</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20120320 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">090416s2009 d||| |||| 00||| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">GBA914654</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780131364899</subfield><subfield code="9">978-0-13-136489-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0131364898</subfield><subfield code="9">0-13-136489-8</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)305146830</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV035437138</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-12</subfield><subfield code="a">DE-355</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">025.04</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 270</subfield><subfield code="0">(DE-625)143638:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">24,1</subfield><subfield code="2">ssgn</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Croft, W. Bruce</subfield><subfield code="d">1952-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)137756658</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Information retrieval in practice</subfield><subfield code="c">by Bruce Croft, Donald Metzler, Trevor Strohman</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">International ed.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Upper Saddle River, NJ [u.a.]</subfield><subfield code="b">Pearson Education</subfield><subfield code="c">2009</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXV, 524 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Search engines</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information storage and retrieval systems</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information retrieval</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information retrieval</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information storage and retrieval systems</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Search engines</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Suchmaschine</subfield><subfield code="0">(DE-588)4423007-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Suchmaschine</subfield><subfield code="0">(DE-588)4423007-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Metzler, Donald</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Strohman, Trevor</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)138148090</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung BSBMuenchen</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&amp;doc_library=BVB01&amp;local_base=BVB01&amp;doc_number=017357446&amp;sequence=000002&amp;line_number=0001&amp;func_code=DB_RECORDS&amp;service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-017357446</subfield></datafield></record></collection> |
id | DE-604.BV035437138 |
illustrated | Illustrated |
indexdate | 2024-07-09T21:35:15Z |
institution | BVB |
isbn | 9780131364899 0131364898 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-017357446 |
oclc_num | 305146830 |
open_access_boolean | |
owner | DE-12 DE-355 DE-BY-UBR |
owner_facet | DE-12 DE-355 DE-BY-UBR |
physical | XXV, 524 S. graph. Darst. |
publishDate | 2009 |
publishDateSearch | 2009 |
publishDateSort | 2009 |
publisher | Pearson Education |
record_format | marc |
spelling | Croft, W. Bruce 1952- Verfasser (DE-588)137756658 aut Information retrieval in practice by Bruce Croft, Donald Metzler, Trevor Strohman International ed. Upper Saddle River, NJ [u.a.] Pearson Education 2009 XXV, 524 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Search engines Information storage and retrieval systems Information retrieval Suchmaschine (DE-588)4423007-2 gnd rswk-swf Information Retrieval (DE-588)4072803-1 gnd rswk-swf Suchmaschine (DE-588)4423007-2 s Information Retrieval (DE-588)4072803-1 s DE-604 Metzler, Donald Verfasser aut Strohman, Trevor Verfasser (DE-588)138148090 aut Digitalisierung BSBMuenchen application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017357446&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Croft, W. Bruce 1952- Metzler, Donald Strohman, Trevor Information retrieval in practice Search engines Information storage and retrieval systems Information retrieval Suchmaschine (DE-588)4423007-2 gnd Information Retrieval (DE-588)4072803-1 gnd |
subject_GND | (DE-588)4423007-2 (DE-588)4072803-1 |
title | Information retrieval in practice |
title_auth | Information retrieval in practice |
title_exact_search | Information retrieval in practice |
title_full | Information retrieval in practice by Bruce Croft, Donald Metzler, Trevor Strohman |
title_fullStr | Information retrieval in practice by Bruce Croft, Donald Metzler, Trevor Strohman |
title_full_unstemmed | Information retrieval in practice by Bruce Croft, Donald Metzler, Trevor Strohman |
title_short | Information retrieval in practice |
title_sort | information retrieval in practice |
topic | Search engines Information storage and retrieval systems Information retrieval Suchmaschine (DE-588)4423007-2 gnd Information Retrieval (DE-588)4072803-1 gnd |
topic_facet | Search engines Information storage and retrieval systems Information retrieval Suchmaschine Information Retrieval |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=017357446&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT croftwbruce informationretrievalinpractice AT metzlerdonald informationretrievalinpractice AT strohmantrevor informationretrievalinpractice |