Modern information retrieval: the concepts and technology behind search
Previous title: | Baeza-Yates, Ricardo Modern information retrieval |
---|---|
Main authors: | Ricardo Baeza-Yates ; Berthier Ribeiro-Neto |
Format: | Book |
Language: | English |
Published: | Harlow ; Munich [etc.] : Pearson, 2011 |
Edition: | 2nd ed. |
Subjects: | Information retrieval |
Online access: | Table of contents |
Description: | Also covers later unchanged reprints |
Description: | XXX, 913 p., ill., diagrams |
ISBN: | 9780321416919 0321416910 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV022216400 | ||
003 | DE-604 | ||
005 | 20220322 | ||
007 | t | ||
008 | 070108s2011 ad|| |||| 00||| eng d | ||
010 | |a 2010045454 | ||
020 | |a 9780321416919 |9 978-0-321-41691-9 | ||
020 | |a 0321416910 |9 0-321-41691-0 | ||
035 | |a (OCoLC)634765757 | ||
035 | |a (DE-599)BVBBV022216400 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-473 |a DE-92 |a DE-20 |a DE-11 |a DE-29T |a DE-859 |a DE-355 |a DE-634 |a DE-91G |a DE-Aug4 |a DE-739 |a DE-2070s |a DE-706 |a DE-526 |a DE-522 |a DE-858 |a DE-573 | ||
050 | 0 | |a ZA3075 .B34 2011 | |
082 | 0 | |a 005.7 | |
082 | 0 | |a 025.04 | |
084 | |a ST 270 |0 (DE-625)143638: |2 rvk | ||
084 | |a ST 515 |0 (DE-625)143677: |2 rvk | ||
084 | |a ST 270 |0 (DE-625)143638: |2 rvk | ||
084 | |a ST 205 |0 (DE-625)143613: |2 rvk | ||
084 | |a DAT 825f |2 stub | ||
100 | 1 | |a Baeza-Yates, Ricardo |e Verfasser |4 aut | |
245 | 1 | 0 | |a Modern information retrieval |b the concepts and technology behind search |c Ricardo Baeza-Yates ; Berthier Ribeiro-Neto |
250 | |a 2. ed. | ||
264 | 1 | |a Harlow ; Munich [u.a.] |b Pearson |c 2011 | |
300 | |a XXX, 913 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Hier auch später erschienene, unveränderte Nachdrucke | ||
650 | 4 | |a Information retrieval | |
650 | 0 | 7 | |a Information Retrieval |0 (DE-588)4072803-1 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Information Retrieval |0 (DE-588)4072803-1 |D s |
689 | 0 | |5 DE-188 | |
700 | 1 | |a Ribeiro, Berthier de Araújo Neto |d 1960- |e Verfasser |0 (DE-588)129860751 |4 aut | |
780 | 0 | 0 | |i Früher u.d.T. |a Baeza-Yates, Ricardo |t Modern information retrieval |w (DE-604)BV012361456 |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015427689&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-015427689 |
Record in the search index
_version_ | 1804136197928255488 |
---|---|
adam_text | Title: Modern information retrieval
Author: Baeza-Yates, Ricardo
Year: 2011
Contents
Preface to the Second Edition xix
Preface to the First Edition xxi
Authors' Acknowledgements to the Second Edition xxiii
Authors' Acknowledgements to the First Edition xxv
Publishers' Acknowledgements xxvii
1 Introduction 1
1.1 Information Retrieval ........................... 1
1.1.1 Early Developments........................ 1
1.1.2 Information Retrieval in Libraries and Digital Libraries .... 3
1.1.3 IR at the Center of the Stage................... 3
1.2 The IR Problem .............................. 3
1.2.1 The User's Task .......................... 4
1.2.2 Information versus Data Retrieval................ 5
1.3 The IR System............................... 5
1.3.1 Software Architecture of the IR System............. 5
1.3.2 The Retrieval and Ranking Processes .............. 7
1.4 The Web .................................. 8
1.4.1 A Brief History........................... 8
1.4.2 The e-Publishing Era ....................... 9
1.4.3 How the Web Changed Search .................. 10
1.4.4 Practical Issues on the Web.................... 12
1.5 Organization of the Book......................... 12
1.5.1 Focus of the Book......................... 12
1.5.2 Book Contents........................... 13
1.6 The Book Web Site: A Teaching Resource................ 16
1.7 Bibliographic Discussion.......................... 17
2 User Interfaces for Search 21
by Marti Hearst
2.1 Introduction................................. 21
2.2 How People Search............................. 21
2.2.1 Information Lookup versus Exploratory Search......... 22
2.2.2 Classic versus Dynamic Model of Information Seeking..... 23
2.2.3 Navigation versus Search ..................... 24
2.2.4 Observations of the Search Process................ 24
2.3 Search Interfaces Today.......................... 25
2.3.1 Getting Started........................... 25
2.3.2 Query Specification ........................ 26
2.3.3 Query Specification Interfaces................... 27
2.3.4 Retrieval Results Display..................... 29
2.3.5 Query Reformulation........................ 32
2.3.6 Organizing Search Results..................... 35
2.4 Visualization in Search Interfaces..................... 40
2.4.1 Visualizing Boolean Syntax.................... 42
2.4.2 Visualizing Query Terms within Retrieval Results ....... 43
2.4.3 Visualizing Relationships Among Words and Documents .... 47
2.4.4 Visualization for Text Mining................... 49
2.5 Design and Evaluation of Search Interfaces ............... 50
2.6 Trends and Research Issues........................ 54
2.7 Bibliographic Discussion.......................... 54
3 Modeling 57
3.1 IR Models.................................. 57
3.1.1 Modeling and Ranking....................... 57
3.1.2 Characterization of an IR Model................. 58
3.1.3 A Taxonomy of IR Models..................... 59
3.2 Classic Information Retrieval....................... 61
3.2.1 Basic Concepts........................... 61
3.2.2 The Boolean Model ........................ 64
3.2.3 Term Weighting .......................... 66
3.2.4 TF-IDF Weights.......................... 68
3.2.5 Document Length Normalization................. 75
3.2.6 The Vector Model......................... 77
3.2.7 The Probabilistic Model...................... 79
3.2.8 Brief Comparison of Classic Models ............... 86
3.3 Alternative Set Theoretic Models..................... 87
3.3.1 Set-Based Model.......................... 87
3.3.2 Extended Boolean Model..................... 92
3.3.3 Fuzzy Set Model.......................... 95
3.4 Alternative Algebraic Models....................... 98
3.4.1 Generalized Vector Space Model................. 98
3.4.2 Latent Semantic Indexing Model................. 101
3.4.3 Neural Network Model....................... 102
3.5 Alternative Probabilistic Models..................... 104
3.5.1 BM25................................ 104
3.5.2 Language Models.......................... 107
3.5.3 Divergence from Randomness................... 113
3.5.4 Bayesian Network Models..................... 116
3.6 Other Models................................ 124
3.6.1 The Hypertext Model....................... 124
3.6.2 Web based Models......................... 125
3.6.3 Structured Text Retrieval..................... 126
3.6.4 Multimedia Retrieval........................ 126
3.6.5 Enterprise and Vertical Search.................. 126
3.7 Trends and Research Issues........................ 127
3.8 Bibliographic Discussion.......................... 128
4 Retrieval Evaluation 131
4.1 Introduction................................. 131
4.2 The Cranfield Paradigm.......................... 132
4.2.1 A Brief History........................... 132
4.2.2 Reference Collections........................ 134
4.3 Retrieval Metrics.............................. 134
4.3.1 Precision and Recall........................ 135
4.3.2 Single Value Summaries: P@n, MAP, MRR, F......... 139
4.3.3 User-Oriented Measures...................... 144
4.3.4 DCG: Discounted Cumulated Gain................ 145
4.3.5 BPREF: Binary Preferences.................... 150
4.3.6 Rank Correlation Metrics..................... 153
4.4 Reference Collections............................ 158
4.4.1 The TREC Collections....................... 159
4.4.2 Other Reference Collections.................... 166
4.4.3 Other Small Test Collections................... 167
4.5 User-Based Evaluation........................... 168
4.5.1 Human Experimentation in the Lab............... 168
4.5.2 Side-by-Side Panels......................... 168
4.5.3 A/B Testing ............................ 169
4.5.4 Crowdsourcing........................... 170
4.5.5 Evaluation using Clickthrough Data............... 171
4.6 Practical Caveats.............................. 173
4.7 Trends and Research Issues........................ 174
4.8 Bibliographic Discussion.......................... 174
5 Relevance Feedback and Query Expansion 177
5.1 Introduction................................. 177
5.2 A Framework for Feedback Methods................... 178
5.3 Explicit Relevance Feedback........................ 180
5.3.1 Relevance Feedback for the Vector Model: Rocchio Method . . 181
5.3.2 Relevance Feedback for the Probabilistic Model......... 183
5.3.3 Evaluation of Relevance Feedback ................ 184
5.4 Explicit Feedback Through Clicks..................... 185
5.4.1 Eye Tracking and Relevance Judgements............. 185
5.4.2 User Behavior............................ 186
5.4.3 Clicks as a Metric of User Preferences.............. 187
5.5 Implicit Feedback Through Local Analysis................ 190
5.5.1 Implicit Feedback Through Local Clustering........... 190
5.5.2 Implicit Feedback through Local Context Analysis....... 193
5.6 Implicit Feedback Through Global Analysis............... 195
5.6.1 Query Expansion based on a Similarity Thesaurus....... 195
5.6.2 Query Expansion based on a Statistical Thesaurus....... 198
5.7 Trends and Research Issues........................ 200
5.8 Bibliographic Discussion.......................... 200
6 Documents: Languages & Properties 203
with Gonzalo Navarro and Nivio Ziviani
6.1 Introduction................................. 203
6.2 Metadata.................................. 205
6.3 Document Formats............................. 206
6.3.1 Text................................. 206
6.3.2 Multimedia............................. 207
6.3.3 Graphics and Virtual Reality................... 208
6.4 Markup Languages............................. 208
6.4.1 SGML................................ 209
6.4.2 HTML................................ 211
6.4.3 XML................................. 214
6.4.4 RDF: Resource Description Framework ............. 216
6.4.5 HyTime............................... 217
6.5 Text Properties............................... 218
6.5.1 Information Theory ........................ 218
6.5.2 Modeling Natural Language.................... 219
6.5.3 Text Similarity........................... 222
6.6 Document Preprocessing.......................... 223
6.6.1 Lexical Analysis of the Text.................... 224
6.6.2 Elimination of Stopwords..................... 226
6.6.3 Stemming.............................. 226
6.6.4 Keyword Selection......................... 227
6.6.5 Thesauri............................... 228
6.7 Organizing Documents........................... 231
6.7.1 Taxonomies............................. 231
6.7.2 Folksonomies............................ 232
6.8 Text Compression ............................. 233
6.8.1 Basic Concepts........................... 234
6.8.2 Statistical Methods......................... 234
6.8.3 Statistical Methods: Modeling .................. 235
6.8.4 Statistical Methods: Coding.................... 238
6.8.5 Dictionary Methods........................ 245
6.8.6 Preprocessing for Compression.................. 246
6.8.7 Comparing Text Compression Techniques............ 248
6.8.8 Structured Text Compression................... 249
6.9 Trends and Research Issues........................ 250
6.10 Bibliographical Discussion......................... 253
7 Queries: Languages & Properties 255
with Gonzalo Navarro
7.1 Query Languages.............................. 255
7.1.1 Keyword-based Querying..................... 256
7.1.2 Beyond Keywords......................... 259
7.1.3 Structural Queries......................... 262
7.1.4 Query Protocols.......................... 265
7.2 Query Properties.............................. 267
7.2.1 Characterizing Web Queries.................... 267
7.2.2 User Search Behavior ....................... 269
7.2.3 Query Intent............................ 270
7.2.4 Query Topic ............................ 272
7.2.5 Query Sessions and Missions ................... 273
7.2.6 Query Difficulty .......................... 274
7.3 Trends and Research Issues........................ 278
7.4 Bibliographical Discussion......................... 279
8 Text Classification 281
with Marcos Gonçalves
8.1 Introduction................................. 281
8.2 A Characterization of Text Classification................. 282
8.2.1 Machine Learning ......................... 282
8.2.2 The Text Classification Problem................. 283
8.2.3 Text Classification Algorithms .................. 284
8.3 Unsupervised Algorithms ......................... 286
8.3.1 Clustering.............................. 286
8.3.2 Naive Text Classification ..................... 290
8.4 Supervised Algorithms........................... 291
8.4.1 Decision Trees ........................... 294
8.4.2 The k-NN Classifier........................ 299
8.4.3 The Rocchio Classifier....................... 300
8.4.4 Probabilistic Naive Bayes Document Classification....... 303
8.4.5 The SVM Classifier ........................ 306
8.4.6 Ensemble Classifiers........................ 316
8.4.7 Final Remarks on Supervised Algorithms............ 319
8.5 Feature Selection or Dimensionality Reduction ............. 320
8.5.1 Term-Class Incidence Table.................... 321
8.5.2 Term Document Frequency.................... 322
8.5.3 TF-IDF Weights.......................... 322
8.5.4 Mutual Information........................ 323
8.5.5 Information Gain.......................... 323
8.5.6 Chi Square............................. 324
8.5.7 Impact of Feature Selection.................... 325
8.6 Evaluation Metrics............................. 325
8.6.1 Contingency Table......................... 325
8.6.2 Accuracy and Error........................ 326
8.6.3 Precision and Recall........................ 327
8.6.4 F-measure and F1 ......................... 327
8.6.5 Cross-Validation.......................... 329
8.6.6 Standard Collections........................ 329
8.7 Organizing the Classes - Building Taxonomies ............. 330
8.8 Trends and Research Issues........................ 333
8.9 Bibliographic Discussion.......................... 334
9 Indexing and Searching 337
with Gonzalo Navarro
9.1 Introduction................................. 337
9.2 Inverted Indexes.............................. 340
9.2.1 Basic Concepts........................... 340
9.2.2 Full Inverted Indexes........................ 341
9.2.3 Searching.............................. 345
9.2.4 Ranking............................... 348
9.2.5 Construction............................ 351
9.2.6 Compressed Inverted Indexes................... 354
9.2.7 Structural Queries......................... 357
9.3 Signature Files............................... 357
9.4 Suffix Trees and Suffix Arrays....................... 360
9.4.1 Structure: Tries and Suffix Trees................. 361
9.4.2 Searching for Simple Strings.................... 362
9.4.3 Searching for Complex Patterns.................. 363
9.4.4 Construction............................ 365
9.4.5 Compressed Suffix Arrays..................... 367
9.5 Sequential Searching............................ 372
9.5.1 Simple Strings: Horspool ..................... 373
9.5.2 Complex Patterns: Automata and Bit-Parallelism....... 375
9.5.3 Faster Bit-Parallel Algorithms .................. 379
9.5.4 Regular Expressions........................ 382
9.5.5 Multiple Patterns.......................... 384
9.5.6 Approximate Searching...................... 385
9.5.7 Searching Compressed Text.................... 389
9.6 Multi-dimensional Indexing........................ 391
9.7 Trends and Research Issues........................ 393
9.8 Bibliographic Discussion.......................... 394
10 Parallel and Distributed IR 399
with Eric Brown
10.1 Introduction................................. 399
10.2 A Taxonomy of Distributed IR Systems................. 402
10.3 Data Partitioning.............................. 404
10.3.1 Collection Partitioning....................... 405
10.3.2 Collection Selection ........................ 407
10.3.3 Inverted Index Partitioning.................... 409
10.3.4 Partitioning other Indexes..................... 413
10.4 Parallel IR.................................. 414
10.4.1 Introduction ............................ 414
10.4.2 Parallel IR on MIMD Architectures ............... 416
10.4.3 Parallel IR on SIMD Architectures................ 418
10.5 Cluster-based IR.............................. 423
10.6 Distributed IR ............................... 424
10.6.1 Introduction ............................ 424
10.6.2 Indexing............................... 428
10.6.3 Query Processing.......................... 431
10.6.4 Web Issues............................. 437
10.7 Federated Search.............................. 438
10.8 Retrieval in Peer-to-Peer Networks.................... 440
10.9 Trends and Research Issues........................ 444
10.10 Bibliographic Discussion.......................... 445
11 Web Retrieval 447
with Yoelle Maarek
11.1 Introduction................................. 447
11.2 A Challenging Problem .......................... 449
11.3 The Web .................................. 451
11.3.1 Characteristics........................... 451
11.3.2 Structure of the Web Graph.................... 452
11.3.3 Modeling the Web......................... 454
11.3.4 Link Analysis............................ 456
11.4 Search Engine Architectures........................ 458
11.4.1 Basic Architecture......................... 458
11.4.2 Cluster-based Architecture .................... 459
11.4.3 Caching............................... 462
11.4.4 Multiple Indexes.......................... 464
11.4.5 Distributed Architectures..................... 466
11.5 Search Engine Ranking........................... 468
11.5.1 Ranking Signals .......................... 469
11.5.2 Link-based Ranking........................ 470
11.5.3 Simple Ranking Functions..................... 473
11.5.4 Learning to Rank.......................... 473
11.5.5 Learning the Ranking Function.................. 474
11.5.6 Quality Evaluation......................... 475
11.5.7 Web Spam ............................. 476
11.6 Managing Web Data............................ 477
11.6.1 Assigning Identifiers to Documents................ 477
11.6.2 Metadata.............................. 478
11.6.3 Compressing the Web Graph................... 478
11.6.4 Handling Duplicated Data..................... 479
11.7 Search Engine User Interaction...................... 480
11.7.1 The Search Rectangle Paradigm ................. 481
11.7.2 The Search Engine Result Page.................. 488
11.7.3 Educating the User......................... 497
11.8 Browsing .................................. 498
11.8.1 Flat Browsing............................ 499
11.8.2 Structure Guided Browsing and Web Directories........ 499
11.9 Beyond Browsing.............................. 501
11.9.1 Hypertext and the Web...................... 501
11.9.2 Combining Searching with Browsing............... 501
11.9.3 Web Query Languages....................... 503
11.9.4 Dynamic Search .......................... 503
11.10 Related Problems.............................. 504
11.10.1 Computational Advertising.................... 504
11.10.2 Web Mining............................. 506
11.10.3 Metasearch............................. 508
11.11 Trends and Research Issues........................ 509
11.11.1 Beyond Static Text Data ..................... 509
11.11.2 Current Challenges......................... 511
11.12 Bibliographical Discussion......................... 513
12 Web Crawling 515
with Carlos Castillo
12.1 Introduction................................. 515
12.2 Applications of a Web Crawler...................... 517
12.2.1 General Web Search........................ 517
12.2.2 Topical Crawling.......................... 518
12.2.3 Web Characterization....................... 518
12.2.4 Mirroring.............................. 518
12.2.5 Web Site Analysis......................... 519
12.3 A Taxonomy of Crawlers.......................... 519
12.3.1 Types of Web Pages........................ 520
12.4 Architecture and Implementation..................... 521
12.4.1 Crawler Architecture........................ 521
12.4.2 Practical Issues........................... 523
12.4.3 Parallel Crawling.......................... 526
12.5 Scheduling Algorithms........................... 527
12.5.1 Selection Policy........................... 528
12.5.2 Revisit Policy............................ 530
12.5.3 Politeness Policy.......................... 535
12.5.4 Combining Policies......................... 538
12.6 Evaluation.................................. 539
12.6.1 Evaluating Network Usage..................... 539
12.6.2 Evaluating Long-term Scheduling................. 540
12.7 Trends and Research Issues........................ 541
12.7.1 Crawling the Hidden Web.................... 541
12.7.2 Crawling with the Help of Web Sites............... 542
12.7.3 Distributed Crawling........................ 543
12.8 Bibliographic Discussion.......................... 543
13 Structured Text Retrieval 545
with Mounia Lalmas
13.1 Introduction................................. 545
13.2 Structuring Power............................. 546
13.2.1 Explicit vs. Implicit Structure................... 546
13.2.2 Static vs. Dynamic Structure................... 547
13.2.3 Single Hierarchy vs. Multiple Hierarchies ............ 548
13.3 Early Text Retrieval Models........................ 549
13.3.1 Model Based on Non-Overlapping Lists............. 549
13.3.2 Model Based on Proximal Nodes................. 550
13.3.3 Ranking Structured Text Results................. 551
13.4 XML Retrieval............................... 551
13.4.1 Challenges in XML Retrieval................... 551
13.4.2 Indexing Strategies......................... 553
13.4.3 Ranking Strategies......................... 554
13.4.4 Removing Overlaps......................... 565
13.5 XML Retrieval Evaluation......................... 566
13.5.1 Document Collections....................... 566
13.5.2 Topics................................ 567
13.5.3 Retrieval Tasks........................... 568
13.5.4 Relevance.............................. 569
13.5.5 Measures.............................. 571
13.6 Query Languages.............................. 573
13.6.1 Characteristics........................... 574
13.6.2 Classification of XML Query Languages............. 575
13.6.3 Examples of XML Query Languages............... 577
13.7 Trends and Research Issues......................... 582
13.8 Bibliographic Discussion.......................... 585
14 Multimedia Information Retrieval 587
by Dulce Ponceleon and Malcolm Slaney
14.1 Introduction................................. 587
14.1.1 What is Multimedia?........................ 587
14.1.2 Multimedia IR........................... 588
14.1.3 Text IR versus Multimedia IR .................. 589
14.2 The Challenges............................... 589
14.2.1 The Semantic Gap......................... 589
14.2.2 Feature Ambiguity......................... 591
14.2.3 Machine-generated Data...................... 591
14.3 Content-based Image Retrieval...................... 592
14.3.1 Color-Based Retrieval....................... 593
14.3.2 Texture............................... 593
14.3.3 Salient Points............................ 596
14.4 Audio and Music Retrieval ........................ 597
14.4.1 Fingerprinting ........................... 598
14.4.2 Speech Recognition......................... 599
14.4.3 Speaker Identification....................... 601
14.4.4 Spoken Document Retrieval.................... 602
14.4.5 Audio Basics............................ 602
14.5 Retrieving and Browsing Video...................... 606
14.5.1 Video Abstracts .......................... 606
14.5.2 Static Summaries.......................... 607
14.5.3 Mosaics and Salient Stills..................... 608
14.5.4 Dynamic Summaries........................ 609
14.5.5 Interactive Summaries....................... 611
14.5.6 Visual vs. Audio Browsing .................... 612
14.5.7 Evaluating Summaries....................... 613
14.6 Fusion Models: Combining it All..................... 614
14.6.1 Naming Faces............................ 614
14.6.2 Naming Images........................... 615
14.6.3 Naming Audio........................... 616
14.6.4 Combining Audio and Video for AVSR.............. 617
14.6.5 Combining Audio and Video for Multimedia........... 620
14.7 Segmentation................................ 620
14.7.1 A Video Segmentation Example ................. 620
14.7.2 Segmentation Schemes for Video................. 622
14.7.3 Video Segmentation with Edges ................. 623
14.7.4 Speech Segmentation........................ 624
14.7.5 Segmentation Evaluation ..................... 625
14.8 Compression and MPEG Standards.................... 625
14.8.1 Intensity and Sampling ...................... 626
14.8.2 Color ................................ 626
14.8.3 Lossy Compression......................... 628
14.8.4 Lossless Compression........................ 628
14.8.5 Temporal Redundancy....................... 630
14.8.6 Motion Prediction......................... 631
14.8.7 MPEG Standards ......................... 633
14.9 Trends and Research Issues........................ 636
14.10 Bibliographic Discussion.......................... 637
15 Enterprise Search 641
by David Hawking
15.1 Introduction................................. 641
15.1.1 Characteristics and Applications of Enterprise Search..... 642
15.1.2 Enterprise Search Software.................... 643
15.1.3 Workplace Search ......................... 644
15.2 Enterprise Search Tasks.......................... 644
15.2.1 Examples of Search-Supported Tasks............... 644
15.2.2 Search Types............................ 647
15.2.3 Studying Enterprise Search.................... 647
15.3 Architecture of Enterprise Search Systems................ 648
15.3.1 Gathering.............................. 648
15.3.2 Extracting.............................. 651
15.3.3 Indexing............................... 652
15.3.4 Indexing Textual Annotations................... 653
15.3.5 Query Processing.......................... 654
15.3.6 Presentation of Search Results.................. 655
15.3.7 Security Models .......................... 657
15.3.8 Federation/Metasearch....................... 659
15.4 Enterprise Search Evaluation....................... 662
15.4.1 Published Test Collections for Enterprise Search........ 662
15.4.2 Internal Enterprise Search Evaluations.............. 663
15.4.3 Enterprise Search Tuning..................... 665
15.4.4 What is it Reasonable to Expect? ................ 666
15.5 Potential Reasons for Dissatisfaction................... 667
15.6 Context and Personalization........................ 668
15.6.1 Controls and Levers for Contextualization............ 671
15.6.2 Contextualization: Local, Enterprise or Global?......... 675
15.6.3 Privacy of Profiles......................... 676
15.6.4 Defining, Creating and Maintaining a Profile.......... 677
15.6.5 User Modeling........................... 677
15.6.6 Implicit Measures ......................... 679
15.6.7 Information Filtering........................ 679
15.6.8 Social Recommender Systems................... 680
15.7 Trends and Research Issues........................ 681
15.8 Bibliographic Discussion.......................... 681
16 Library Systems 685
by Edie Rasmussen
16.1 The Information Environment in the Library.............. 685
16.2 Online Public Access Catalogues..................... 687
16.2.1 OPACs and Bibliographic Records................ 689
16.2.2 Information Retrieval from the ILS................ 691
16.2.3 Integrating the Hybrid Library.................. 693
16.2.4 OPACs and End Users....................... 694
16.2.5 ILS: Vendors and Products.................... 695
16.3 IR Systems and Document Databases .................. 697
16.3.1 Bibliographic and Full-text Databases.............. 698
16.3.2 Content of Database Records................... 698
16.3.3 The Online Industry: Database Vendors............. 701
16.3.4 Information Retrieval from Document Databases........ 702
16.4 Information Retrieval in Organizations.................. 706
16.5 Trends and Research Issues........................ 708
16.6 Bibliographic Discussion.......................... 709
17 Digital Libraries 711
by Marcos Gonçalves
17.1 Introduction................................. 711
17.2 Defining Digital Libraries......................... 712
17.3 A General Architecture .......................... 713
17.4 Fundamentals................................ 714
17.4.1 Digital Objects and Collections.................. 714
17.4.2 Metadata and Catalogs...................... 716
17.4.3 Repositories/Archives....................... 719
17.4.4 Services............................... 723
17.5 Social-Economical Issues.......................... 725
17.5.1 Social Issues ............................ 725
17.5.2 Economical Issues ......................... 726
17.6 Software Systems.............................. 727
17.6.1 Greenstone............................. 728
17.6.2 Eprints ............................... 728
17.6.3 DSpace............................... 728
17.6.4 Fedora................................ 729
17.6.5 Open Digital Libraries....................... 729
17.6.6 The 5S Suite............................ 730
17.7 DL Case Studies.............................. 731
17.7.1 The Networked DL of Theses and Dissertations......... 731
17.7.2 The National Science Digital Library............... 732
17.7.3 The ETANA-DL Archaeological Digital Library......... 732
17.8 Trends and Research Issues........................ 733
17.8.1 Evaluation ............................. 733
17.8.2 Integration............................. 733
17.8.3 Other Research Challenges .................... 734
17.9 Bibliographic Discussion.......................... 735
A Open Source Search Engines 737
with Christian Middleton
A.1 Introduction................................. 737
A.2 Search Engines............................... 738
A.2.1 Preliminary Selection of Search Engines............. 738
A.2.2 Features............................... 741
A.2.3 Evaluation ............................. 742
A.3 Methodology................................ 743
A.3.1 Document Collections....................... 743
A.3.2 Evaluation Tests.......................... 744
A.3.3 Experimental Setup........................ 744
A.4 Experimental Results............................ 745
A.4.1 Test A - Indexing......................... 745
A.4.2 Test B - Incremental Indexing.................. 749
A.4.3 Test C - Search Performance................... 749
A.4.4 Global Evaluation......................... 752
A.5 Conclusions................................. 753
B Biographies 755
References 761
Index 893
|
adam_txt |
Title: Modern information retrieval
Author: Baeza-Yates, Ricardo
Year: 2011
Contents
Preface to the Second Edition xix
Preface to the First Edition xxi
Authors' Acknowledgements to the Second Edition xxiii
Authors' Acknowledgements to the First Edition xxv
Publishers' Acknowledgements xxvii
1 Introduction 1
1.1 Information Retrieval . 1
1.1.1 Early Developments. 1
1.1.2 Information Retrieval in Libraries and Digital Libraries . 3
1.1.3 IR at the Center of the Stage. 3
1.2 The IR Problem . 3
1.2.1 The User's Task . 4
1.2.2 Information versus Data Retrieval. 5
1.3 The IR System. 5
1.3.1 Software Architecture of the IR System. 5
1.3.2 The Retrieval and Ranking Processes . 7
1.4 The Web . 8
1.4.1 A Brief History. 8
1.4.2 The e-Publishing Era . 9
1.4.3 How the Web Changed Search . 10
1.4.4 Practical Issues on the Web. 12
1.5 Organization of the Book. 12
1.5.1 Focus of the Book. 12
1.5.2 Book Contents. 13
1.6 The Book Web Site: A Teaching Resource. 16
1.7 Bibliographic Discussion. 17
2 User Interfaces for Search 21
by Marti Hearst
2.1 Introduction. 21
2.2 How People Search. 21
2.2.1 Information Lookup versus Exploratory Search. 22
2.2.2 Classic versus Dynamic Model of Information Seeking. 23
2.2.3 Navigation versus Search . 24
2.2.4 Observations of the Search Process. 24
2.3 Search Interfaces Today. 25
2.3.1 Getting Started. 25
2.3.2 Query Specification . 26
2.3.3 Query Specification Interfaces. 27
2.3.4 Retrieval Results Display. 29
2.3.5 Query Reformulation. 32
2.3.6 Organizing Search Results. 35
2.4 Visualization in Search Interfaces. 40
2.4.1 Visualizing Boolean Syntax. 42
2.4.2 Visualizing Query Terms within Retrieval Results . 43
2.4.3 Visualizing Relationships Among Words and Documents . 47
2.4.4 Visualization for Text Mining. 49
2.5 Design and Evaluation of Search Interfaces . 50
2.6 Trends and Research Issues. 54
2.7 Bibliographic Discussion. 54
3 Modeling 57
3.1 IR Models. 57
3.1.1 Modeling and Ranking. 57
3.1.2 Characterization of an IR Model. 58
3.1.3 A Taxonomy of IR Models. 59
3.2 Classic Information Retrieval. 61
3.2.1 Basic Concepts. 61
3.2.2 The Boolean Model . 64
3.2.3 Term Weighting . 66
3.2.4 TF-IDF Weights. 68
3.2.5 Document Length Normalization. 75
3.2.6 The Vector Model. 77
3.2.7 The Probabilistic Model. 79
3.2.8 Brief Comparison of Classic Models . 86
3.3 Alternative Set Theoretic Models. 87
3.3.1 Set-Based Model. 87
3.3.2 Extended Boolean Model. 92
3.3.3 Fuzzy Set Model. 95
3.4 Alternative Algebraic Models. 98
3.4.1 Generalized Vector Space Model. 98
3.4.2 Latent Semantic Indexing Model. 101
3.4.3 Neural Network Model. 102
3.5 Alternative Probabilistic Models. 104
3.5.1 BM25. 104
3.5.2 Language Models. 107
3.5.3 Divergence from Randomness. 113
3.5.4 Bayesian Network Models. 116
3.6 Other Models. 124
3.6.1 The Hypertext Model. 124
3.6.2 Web based Models. 125
3.6.3 Structured Text Retrieval. 126
3.6.4 Multimedia Retrieval. 126
3.6.5 Enterprise and Vertical Search. 126
3.7 Trends and Research Issues. 127
3.8 Bibliographic Discussion. 128
4 Retrieval Evaluation 131
4.1 Introduction. 131
4.2 The Cranfield Paradigm. 132
4.2.1 A Brief History. 132
4.2.2 Reference Collections. 134
4.3 Retrieval Metrics. 134
4.3.1 Precision and Recall. 135
4.3.2 Single Value Summaries: P@n, MAP, MRR, F. 139
4.3.3 User-Oriented Measures. 144
4.3.4 DCG: Discounted Cumulated Gain. 145
4.3.5 BPREF: Binary Preferences. 150
4.3.6 Rank Correlation Metrics. 153
4.4 Reference Collections. 158
4.4.1 The TREC Collections. 159
4.4.2 Other Reference Collections. 166
4.4.3 Other Small Test Collections. 167
4.5 User-Based Evaluation. 168
4.5.1 Human Experimentation in the Lab. 168
4.5.2 Side-by-Side Panels. 168
4.5.3 A/B Testing . 169
4.5.4 Crowdsourcing. 170
4.5.5 Evaluation using Clickthrough Data. 171
4.6 Practical Caveats. 173
4.7 Trends and Research Issues. 174
4.8 Bibliographic Discussion. 174
5 Relevance Feedback and Query Expansion 177
5.1 Introduction. 177
5.2 A Framework for Feedback Methods. 178
5.3 Explicit Relevance Feedback. 180
5.3.1 Relevance Feedback for the Vector Model: Rocchio Method . . 181
5.3.2 Relevance Feedback for the Probabilistic Model. 183
5.3.3 Evaluation of Relevance Feedback . 184
5.4 Explicit Feedback Through Clicks. 185
5.4.1 Eye Tracking and Relevance Judgements. 185
5.4.2 User Behavior. 186
5.4.3 Clicks as a Metric of User Preferences. 187
5.5 Implicit Feedback Through Local Analysis. 190
5.5.1 Implicit Feedback Through Local Clustering. 190
5.5.2 Implicit Feedback through Local Context Analysis. 193
5.6 Implicit Feedback Through Global Analysis. 195
5.6.1 Query Expansion based on a Similarity Thesaurus. 195
5.6.2 Query Expansion based on a Statistical Thesaurus. 198
5.7 Trends and Research Issues. 200
5.8 Bibliographic Discussion. 200
6 Documents: Languages & Properties 203
with Gonzalo Navarro and Nivio Ziviani
6.1 Introduction. 203
6.2 Metadata. 205
6.3 Document Formats. 206
6.3.1 Text. 206
6.3.2 Multimedia. 207
6.3.3 Graphics and Virtual Reality. 208
6.4 Markup Languages. 208
6.4.1 SGML. 209
6.4.2 HTML. 211
6.4.3 XML. 214
6.4.4 RDF: Resource Description Framework . 216
6.4.5 HyTime. 217
6.5 Text Properties. 218
6.5.1 Information Theory . 218
6.5.2 Modeling Natural Language. 219
6.5.3 Text Similarity. 222
6.6 Document Preprocessing. 223
6.6.1 Lexical Analysis of the Text. 224
6.6.2 Elimination of Stopwords. 226
6.6.3 Stemming. 226
6.6.4 Keyword Selection. 227
6.6.5 Thesauri. 228
6.7 Organizing Documents. 231
6.7.1 Taxonomies. 231
6.7.2 Folksonomies. 232
6.8 Text Compression . 233
6.8.1 Basic Concepts. 234
6.8.2 Statistical Methods. 234
6.8.3 Statistical Methods: Modeling . 235
6.8.4 Statistical Methods: Coding. 238
6.8.5 Dictionary Methods. 245
6.8.6 Preprocessing for Compression. 246
6.8.7 Comparing Text Compression Techniques. 248
6.8.8 Structured Text Compression. 249
6.9 Trends and Research Issues. 250
6.10 Bibliographical Discussion. 253
7 Queries: Languages & Properties 255
with Gonzalo Navarro
7.1 Query Languages. 255
7.1.1 Keyword-based Querying. 256
7.1.2 Beyond Keywords. 259
7.1.3 Structural Queries. 262
7.1.4 Query Protocols. 265
7.2 Query Properties. 267
7.2.1 Characterizing Web Queries. 267
7.2.2 User Search Behavior . 269
7.2.3 Query Intent. 270
7.2.4 Query Topic . 272
7.2.5 Query Sessions and Missions . 273
7.2.6 Query Difficulty . 274
7.3 Trends and Research Issues. 278
7.4 Bibliographical Discussion. 279
8 Text Classification 281
with Marcos Gonçalves
8.1 Introduction. 281
8.2 A Characterization of Text Classification. 282
8.2.1 Machine Learning . 282
8.2.2 The Text Classification Problem. 283
8.2.3 Text Classification Algorithms . 284
8.3 Unsupervised Algorithms . 286
8.3.1 Clustering. 286
8.3.2 Naive Text Classification . 290
8.4 Supervised Algorithms. 291
8.4.1 Decision Trees . 294
8.4.2 The k-NN Classifier. 299
8.4.3 The Rocchio Classifier. 300
8.4.4 Probabilistic Naive Bayes Document Classification. 303
8.4.5 The SVM Classifier . 306
8.4.6 Ensemble Classifiers. 316
8.4.7 Final Remarks on Supervised Algorithms. 319
8.5 Feature Selection or Dimensionality Reduction . 320
8.5.1 Term-Class Incidence Table. 321
8.5.2 Term Document Frequency. 322
8.5.3 TF-IDF Weights. 322
8.5.4 Mutual Information. 323
8.5.5 Information Gain. 323
8.5.6 Chi Square. 324
8.5.7 Impact of Feature Selection. 325
8.6 Evaluation Metrics. 325
8.6.1 Contingency Table. 325
8.6.2 Accuracy and Error. 326
8.6.3 Precision and Recall. 327
8.6.4 F-measure and F1 . 327
8.6.5 Cross-Validation. 329
8.6.6 Standard Collections. 329
8.7 Organizing the Classes - Building Taxonomies . 330
8.8 Trends and Research Issues. 333
8.9 Bibliographic Discussion. 334
9 Indexing and Searching 337
with Gonzalo Navarro
9.1 Introduction. 337
9.2 Inverted Indexes. 340
9.2.1 Basic Concepts. 340
9.2.2 Full Inverted Indexes. 341
9.2.3 Searching. 345
9.2.4 Ranking. 348
9.2.5 Construction. 351
9.2.6 Compressed Inverted Indexes. 354
9.2.7 Structural Queries. 357
9.3 Signature Files. 357
9.4 Suffix Trees and Suffix Arrays. 360
9.4.1 Structure: Tries and Suffix Trees. 361
9.4.2 Searching for Simple Strings. 362
9.4.3 Searching for Complex Patterns. 363
9.4.4 Construction. 365
9.4.5 Compressed Suffix Arrays. 367
9.5 Sequential Searching. 372
9.5.1 Simple Strings: Horspool . 373
9.5.2 Complex Patterns: Automata and Bit-Parallelism. 375
9.5.3 Faster Bit-Parallel Algorithms . 379
9.5.4 Regular Expressions. 382
9.5.5 Multiple Patterns. 384
9.5.6 Approximate Searching. 385
9.5.7 Searching Compressed Text. 389
9.6 Multi-dimensional Indexing. 391
9.7 Trends and Research Issues. 393
9.8 Bibliographic Discussion. 394
10 Parallel and Distributed IR 399
with Eric Brown
10.1 Introduction. 399
10.2 A Taxonomy of Distributed IR Systems. 402
10.3 Data Partitioning. 404
10.3.1 Collection Partitioning. 405
10.3.2 Collection Selection . 407
10.3.3 Inverted Index Partitioning. 409
10.3.4 Partitioning other Indexes. 413
10.4 Parallel IR. 414
10.4.1 Introduction . 414
10.4.2 Parallel IR on MIMD Architectures . 416
10.4.3 Parallel IR on SIMD Architectures. 418
10.5 Cluster-based IR. 423
10.6 Distributed IR . 424
10.6.1 Introduction . 424
10.6.2 Indexing. 428
10.6.3 Query Processing. 431
10.6.4 Web Issues. 437
10.7 Federated Search. 438
10.8 Retrieval in Peer-to-Peer Networks. 440
10.9 Trends and Research Issues. 444
10.10 Bibliographic Discussion. 445
11 Web Retrieval 447
with Yoelle Maarek
11.1 Introduction. 447
11.2 A Challenging Problem . 449
11.3 The Web . 451
11.3.1 Characteristics. 451
11.3.2 Structure of the Web Graph. 452
11.3.3 Modeling the Web. 454
11.3.4 Link Analysis. 456
11.4 Search Engine Architectures. 458
11.4.1 Basic Architecture. 458
11.4.2 Cluster-based Architecture . 459
11.4.3 Caching. 462
11.4.4 Multiple Indexes. 464
11.4.5 Distributed Architectures. 466
11.5 Search Engine Ranking. 468
11.5.1 Ranking Signals . 469
11.5.2 Link-based Ranking. 470
11.5.3 Simple Ranking Functions. 473
11.5.4 Learning to Rank. 473
11.5.5 Learning the Ranking Function. 474
11.5.6 Quality Evaluation. 475
11.5.7 Web Spam . 476
11.6 Managing Web Data. 477
11.6.1 Assigning Identifiers to Documents. 477
11.6.2 Metadata. 478
11.6.3 Compressing the Web Graph. 478
11.6.4 Handling Duplicated Data. 479
11.7 Search Engine User Interaction. 480
11.7.1 The Search Rectangle Paradigm . 481
11.7.2 The Search Engine Result Page. 488
11.7.3 Educating the User. 497
11.8 Browsing . 498
11.8.1 Flat Browsing. 499
11.8.2 Structure Guided Browsing and Web Directories. 499
11.9 Beyond Browsing. 501
11.9.1 Hypertext and the Web. 501
11.9.2 Combining Searching with Browsing. 501
11.9.3 Web Query Languages. 503
11.9.4 Dynamic Search . 503
11.10 Related Problems. 504
11.10.1 Computational Advertising. 504
11.10.2 Web Mining. 506
11.10.3 Metasearch. 508
11.11 Trends and Research Issues. 509
11.11.1 Beyond Static Text Data . 509
11.11.2 Current Challenges. 511
11.12 Bibliographical Discussion. 513
12 Web Crawling 515
with Carlos Castillo
12.1 Introduction. 515
12.2 Applications of a Web Crawler. 517
12.2.1 General Web Search. 517
12.2.2 Topical Crawling. 518
12.2.3 Web Characterization. 518
12.2.4 Mirroring. 518
12.2.5 Web Site Analysis. 519
12.3 A Taxonomy of Crawlers. 519
12.3.1 Types of Web Pages. 520
12.4 Architecture and Implementation. 521
12.4.1 Crawler Architecture. 521
12.4.2 Practical Issues. 523
12.4.3 Parallel Crawling. 526
12.5 Scheduling Algorithms. 527
12.5.1 Selection Policy. 528
12.5.2 Revisit Policy. 530
12.5.3 Politeness Policy. 535
12.5.4 Combining Policies. 538
12.6 Evaluation. 539
12.6.1 Evaluating Network Usage. 539
12.6.2 Evaluating Long-term Scheduling. 540
12.7 Trends and Research Issues. 541
12.7.1 Crawling the "Hidden" Web. 541
12.7.2 Crawling with the Help of Web Sites. 542
12.7.3 Distributed Crawling. 543
12.8 Bibliographic Discussion. 543
13 Structured Text Retrieval 545
with Mounia Lalmas
13.1 Introduction. 545
13.2 Structuring Power. 546
13.2.1 Explicit vs. Implicit Structure. 546
13.2.2 Static vs. Dynamic Structure. 547
13.2.3 Single Hierarchy vs. Multiple Hierarchies . 548
13.3 Early Text Retrieval Models. 549
13.3.1 Model Based on Non-Overlapping Lists. 549
13.3.2 Model Based on Proximal Nodes. 550
13.3.3 Ranking Structured Text Results. 551
13.4 XML Retrieval. 551
13.4.1 Challenges in XML Retrieval. 551
13.4.2 Indexing Strategies. 553
13.4.3 Ranking Strategies. 554
13.4.4 Removing Overlaps. 565
13.5 XML Retrieval Evaluation. 566
13.5.1 Document Collections. 566
13.5.2 Topics. 567
13.5.3 Retrieval Tasks. 568
13.5.4 Relevance. 569
13.5.5 Measures. 571
13.6 Query Languages. 573
13.6.1 Characteristics. 574
13.6.2 Classification of XML Query Languages. 575
13.6.3 Examples of XML Query Languages. 577
13.7 Trends and Research Issues. 582
13.8 Bibliographic Discussion. 585
14 Multimedia Information Retrieval 587
by Dulce Ponceleon and Malcolm Slaney
14.1 Introduction. 587
14.1.1 What is Multimedia?. 587
14.1.2 Multimedia IR. 588
14.1.3 Text IR versus Multimedia IR . 589
14.2 The Challenges. 589
14.2.1 The Semantic Gap. 589
14.2.2 Feature Ambiguity. 591
14.2.3 Machine-generated Data. 591
14.3 Content-based Image Retrieval. 592
14.3.1 Color-Based Retrieval. 593
14.3.2 Texture. 593
14.3.3 Salient Points. 596
14.4 Audio and Music Retrieval . 597
14.4.1 Fingerprinting . 598
14.4.2 Speech Recognition. 599
14.4.3 Speaker Identification. 601
14.4.4 Spoken Document Retrieval. 602
14.4.5 Audio Basics. 602
14.5 Retrieving and Browsing Video. 606
14.5.1 Video Abstracts . 606
14.5.2 Static Summaries. 607
14.5.3 Mosaics and Salient Stills. 608
14.5.4 Dynamic Summaries. 609
14.5.5 Interactive Summaries. 611
14.5.6 Visual vs. Audio Browsing . 612
14.5.7 Evaluating Summaries. 613
14.6 Fusion Models: Combining it All. 614
14.6.1 Naming Faces. 614
14.6.2 Naming Images. 615
14.6.3 Naming Audio. 616
14.6.4 Combining Audio and Video for AVSR. 617
14.6.5 Combining Audio and Video for Multimedia. 620
14.7 Segmentation. 620
14.7.1 A Video Segmentation Example . 620
14.7.2 Segmentation Schemes for Video. 622
14.7.3 Video Segmentation with Edges . 623
14.7.4 Speech Segmentation. 624
14.7.5 Segmentation Evaluation . 625
14.8 Compression and MPEG Standards. 625
14.8.1 Intensity and Sampling . 626
14.8.2 Color . 626
14.8.3 Lossy Compression. 628
14.8.4 Lossless Compression. 628
14.8.5 Temporal Redundancy. 630
14.8.6 Motion Prediction. 631
14.8.7 MPEG Standards . 633
14.9 Trends and Research Issues. 636
14.10 Bibliographic Discussion. 637
15 Enterprise Search 641
by David Hawking
15.1 Introduction. 641
15.1.1 Characteristics and Applications of Enterprise Search. 642
15.1.2 Enterprise Search Software. 643
15.1.3 Workplace Search . 644
15.2 Enterprise Search Tasks. 644
15.2.1 Examples of Search-Supported Tasks. 644
15.2.2 Search Types. 647
15.2.3 Studying Enterprise Search. 647
15.3 Architecture of Enterprise Search Systems. 648
15.3.1 Gathering. 648
15.3.2 Extracting. 651
15.3.3 Indexing. 652
15.3.4 Indexing Textual Annotations. 653
15.3.5 Query Processing. 654
15.3.6 Presentation of Search Results. 655
15.3.7 Security Models . 657
15.3.8 Federation/Metasearch. 659
15.4 Enterprise Search Evaluation. 662
15.4.1 Published Test Collections for Enterprise Search. 662
15.4.2 Internal Enterprise Search Evaluations. 663
15.4.3 Enterprise Search Tuning. 665
15.4.4 What is it Reasonable to Expect? . 666
15.5 Potential Reasons for Dissatisfaction. 667
15.6 Context and Personalization. 668
15.6.1 Controls and Levers for Contextualization. 671
15.6.2 Contextualization: Local, Enterprise or Global?. 675
15.6.3 Privacy of Profiles. 676
15.6.4 Defining, Creating and Maintaining a Profile. 677
15.6.5 User Modeling. 677
15.6.6 Implicit Measures . 679
15.6.7 Information Filtering. 679
15.6.8 Social Recommender Systems. 680
15.7 Trends and Research Issues. 681
15.8 Bibliographic Discussion. 681
16 Library Systems 685
by Edie Rasmussen
16.1 The Information Environment in the Library. 685
16.2 Online Public Access Catalogues. 687
16.2.1 OPACs and Bibliographic Records. 689
16.2.2 Information Retrieval from the ILS. 691
16.2.3 Integrating the Hybrid Library. 693
16.2.4 OPACs and End Users. 694
16.2.5 ILS: Vendors and Products. 695
16.3 IR Systems and Document Databases . 697
16.3.1 Bibliographic and Full-text Databases. 698
16.3.2 Content of Database Records. 698
16.3.3 The Online Industry: Database Vendors. 701
16.3.4 Information Retrieval from Document Databases. 702
16.4 Information Retrieval in Organizations. 706
16.5 Trends and Research Issues. 708
16.6 Bibliographic Discussion. 709
17 Digital Libraries 711
by Marcos Gonçalves
17.1 Introduction. 711
17.2 Defining Digital Libraries. 712
17.3 A General Architecture . 713
17.4 Fundamentals. 714
17.4.1 Digital Objects and Collections. 714
17.4.2 Metadata and Catalogs. 716
17.4.3 Repositories/Archives. 719
17.4.4 Services. 723
17.5 Social-Economical Issues. 725
17.5.1 Social Issues . 725
17.5.2 Economical Issues . 726
17.6 Software Systems. 727
17.6.1 Greenstone. 728
17.6.2 Eprints . 728
17.6.3 DSpace. 728
17.6.4 Fedora. 729
17.6.5 Open Digital Libraries. 729
17.6.6 The 5S Suite. 730
17.7 DL Case Studies. 731
17.7.1 The Networked DL of Theses and Dissertations. 731
17.7.2 The National Science Digital Library. 732
17.7.3 The ETANA-DL Archaeological Digital Library. 732
17.8 Trends and Research Issues. 733
17.8.1 Evaluation . 733
17.8.2 Integration. 733
17.8.3 Other Research Challenges . 734
17.9 Bibliographic Discussion. 735
A Open Source Search Engines 737
with Christian Middleton
A.1 Introduction. 737
A.2 Search Engines. 738
A.2.1 Preliminary Selection of Search Engines. 738
A.2.2 Features. 741
A.2.3 Evaluation . 742
A.3 Methodology. 743
A.3.1 Document Collections. 743
A.3.2 Evaluation Tests. 744
A.3.3 Experimental Setup. 744
A.4 Experimental Results. 745
A.4.1 Test A - Indexing. 745
A.4.2 Test B - Incremental Indexing. 749
A.4.3 Test C - Search Performance. 749
A.4.4 Global Evaluation. 752
A.5 Conclusions. 753
B Biographies 755
References 761
Index 893 |
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Baeza-Yates, Ricardo Ribeiro, Berthier de Araújo Neto 1960- |
author_GND | (DE-588)129860751 |
author_facet | Baeza-Yates, Ricardo Ribeiro, Berthier de Araújo Neto 1960- |
author_role | aut aut |
author_sort | Baeza-Yates, Ricardo |
author_variant | r b y rby b d a n r bdan bdanr |
building | Verbundindex |
bvnumber | BV022216400 |
callnumber-first | Z - Library Science |
callnumber-label | ZA3075 |
callnumber-raw | ZA3075 .B34 2011 |
callnumber-search | ZA3075 .B34 2011 |
callnumber-sort | ZA 43075 B34 42011 |
callnumber-subject | ZA - Information Resources |
classification_rvk | ST 270 ST 515 ST 205 |
classification_tum | DAT 825f |
ctrlnum | (OCoLC)634765757 (DE-599)BVBBV022216400 |
dewey-full | 005.7 025.04 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security 025 - Operations of libraries and archives |
dewey-raw | 005.7 025.04 |
dewey-search | 005.7 025.04 |
dewey-sort | 15.7 |
dewey-tens | 000 - Computer science, information, general works 020 - Library and information sciences |
discipline | Allgemeines Informatik |
discipline_str_mv | Allgemeines Informatik |
edition | 2. ed. |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02012nam a2200481 c 4500</leader><controlfield tag="001">BV022216400</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220322 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">070108s2011 ad|| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2010045454</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780321416919</subfield><subfield code="9">978-0-321-41691-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0321416910</subfield><subfield code="9">0-321-41691-0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)634765757</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV022216400</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-473</subfield><subfield code="a">DE-92</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-29T</subfield><subfield code="a">DE-859</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-91G</subfield><subfield code="a">DE-Aug4</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-2070s</subfield><subfield code="a">DE-706</subfield><subfield code="a">DE-526</subfield><subfield code="a">DE-522</subfield><subfield code="a">DE-858</subfield><subfield code="a">DE-573</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">ZA3075 .B34 2011</subfield></datafield><datafield tag="082" 
ind1="0" ind2=" "><subfield code="a">005.7</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">025.04</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 270</subfield><subfield code="0">(DE-625)143638:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 515</subfield><subfield code="0">(DE-625)143677:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 270</subfield><subfield code="0">(DE-625)143638:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 205</subfield><subfield code="0">(DE-625)143613:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 825f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Baeza-Yates, Ricardo</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Modern information retrieval</subfield><subfield code="b">the concepts and technology behind search</subfield><subfield code="c">Ricardo Baeza-Yates ; Berthier Ribeiro-Neto</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">2. ed.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Harlow ; Munich [u.a.]</subfield><subfield code="b">Pearson</subfield><subfield code="c">2011</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXX, 913 S.</subfield><subfield code="b">Ill., graph. 
Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Hier auch später erschienene, unveränderte Nachdrucke</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Information retrieval</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-188</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ribeiro, Berthier de Araújo Neto</subfield><subfield code="d">1960-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)129860751</subfield><subfield code="4">aut</subfield></datafield><datafield tag="780" ind1="0" ind2="0"><subfield code="i">Früher u.d.T.</subfield><subfield code="a">Baeza-Yates, Ricardo</subfield><subfield code="t">Modern information retrieval</subfield><subfield code="w">(DE-604)BV012361456</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015427689&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield 
code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-015427689</subfield></datafield></record></collection> |
id | DE-604.BV022216400 |
illustrated | Illustrated |
index_date | 2024-07-02T16:27:31Z |
indexdate | 2024-07-09T20:52:35Z |
institution | BVB |
isbn | 9780321416919 0321416910 |
language | English |
lccn | 2010045454 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-015427689 |
oclc_num | 634765757 |
open_access_boolean | |
owner | DE-473 DE-BY-UBG DE-92 DE-20 DE-11 DE-29T DE-859 DE-355 DE-BY-UBR DE-634 DE-91G DE-BY-TUM DE-Aug4 DE-739 DE-2070s DE-706 DE-526 DE-522 DE-858 DE-573 |
owner_facet | DE-473 DE-BY-UBG DE-92 DE-20 DE-11 DE-29T DE-859 DE-355 DE-BY-UBR DE-634 DE-91G DE-BY-TUM DE-Aug4 DE-739 DE-2070s DE-706 DE-526 DE-522 DE-858 DE-573 |
physical | XXX, 913 S. Ill., graph. Darst. |
publishDate | 2011 |
publishDateSearch | 2011 |
publishDateSort | 2011 |
publisher | Pearson |
record_format | marc |
spelling | Baeza-Yates, Ricardo Verfasser aut Modern information retrieval the concepts and technology behind search Ricardo Baeza-Yates ; Berthier Ribeiro-Neto 2. ed. Harlow ; Munich [u.a.] Pearson 2011 XXX, 913 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Hier auch später erschienene, unveränderte Nachdrucke Information retrieval Information Retrieval (DE-588)4072803-1 gnd rswk-swf Information Retrieval (DE-588)4072803-1 s DE-188 Ribeiro, Berthier de Araújo Neto 1960- Verfasser (DE-588)129860751 aut Früher u.d.T. Baeza-Yates, Ricardo Modern information retrieval (DE-604)BV012361456 HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015427689&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Baeza-Yates, Ricardo Ribeiro, Berthier de Araújo Neto 1960- Modern information retrieval the concepts and technology behind search Information retrieval Information Retrieval (DE-588)4072803-1 gnd |
subject_GND | (DE-588)4072803-1 |
title | Modern information retrieval the concepts and technology behind search |
title_auth | Modern information retrieval the concepts and technology behind search |
title_exact_search | Modern information retrieval the concepts and technology behind search |
title_exact_search_txtP | Modern information retrieval the concepts and technology behind search |
title_full | Modern information retrieval the concepts and technology behind search Ricardo Baeza-Yates ; Berthier Ribeiro-Neto |
title_fullStr | Modern information retrieval the concepts and technology behind search Ricardo Baeza-Yates ; Berthier Ribeiro-Neto |
title_full_unstemmed | Modern information retrieval the concepts and technology behind search Ricardo Baeza-Yates ; Berthier Ribeiro-Neto |
title_old | Baeza-Yates, Ricardo Modern information retrieval |
title_short | Modern information retrieval |
title_sort | modern information retrieval the concepts and technology behind search |
title_sub | the concepts and technology behind search |
topic | Information retrieval Information Retrieval (DE-588)4072803-1 gnd |
topic_facet | Information retrieval Information Retrieval |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015427689&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT baezayatesricardo moderninformationretrievaltheconceptsandtechnologybehindsearch AT ribeiroberthierdearaujoneto moderninformationretrievaltheconceptsandtechnologybehindsearch |