Data-intensive text processing with MapReduce
First author: | Lin, Jimmy |
---|---|
Format: | Book |
Language: | English |
Published: | [San Rafael, Calif.]: Morgan & Claypool, 2010 |
Series: | Synthesis lectures on human language technologies 7 |
Subjects: | |
Online access: | Table of contents |
Notes: | Includes bibliographical references |
Physical description: | IX, 165 pp., diagrams |
ISBN: | 9781608453429 9781608453436 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV036593470 | ||
003 | DE-604 | ||
005 | 20110812 | ||
007 | t | ||
008 | 100730s2010 d||| |||| 00||| eng d | ||
020 | |a 9781608453429 |9 978-1-60845-342-9 | ||
020 | |a 9781608453436 |9 978-1-60845-343-6 | ||
035 | |a (OCoLC)705706098 | ||
035 | |a (DE-599)BVBBV036593470 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-19 |a DE-859 |a DE-739 |a DE-2070s | ||
084 | |a ST 270 |0 (DE-625)143638: |2 rvk | ||
100 | 1 | |a Lin, Jimmy |e Verfasser |4 aut | |
245 | 1 | 0 | |a Data-intensive text processing with MapReduce |c Jimmy Lin and Chris Dyer |
264 | 1 | |a [San Rafael, Calif.] |b Morgan & Claypool |c 2010 | |
300 | |a IX, 165 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Synthesis lectures on human language technologies |v 7 | |
500 | |a Includes bibliographical references | ||
650 | 4 | |a Database management | |
650 | 4 | |a Cloud computing / Programming | |
650 | 4 | |a Parallel processing (Electronic computers) / Programming | |
650 | 4 | |a Electronic data processing / Distributed processing / Programming | |
650 | 4 | |a Automatic Data Processing | |
650 | 4 | |a Datenverarbeitung | |
650 | 0 | 7 | |a Natürlichsprachiges System |0 (DE-588)4284757-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Parallelverarbeitung |0 (DE-588)4075860-6 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Hadoop |0 (DE-588)1022420135 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Verteiltes System |0 (DE-588)4238872-7 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Information Retrieval |0 (DE-588)4072803-1 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Parallelverarbeitung |0 (DE-588)4075860-6 |D s |
689 | 0 | 1 | |a Hadoop |0 (DE-588)1022420135 |D s |
689 | 0 | 2 | |a Verteiltes System |0 (DE-588)4238872-7 |D s |
689 | 0 | 3 | |a Natürlichsprachiges System |0 (DE-588)4284757-6 |D s |
689 | 0 | 4 | |a Information Retrieval |0 (DE-588)4072803-1 |D s |
689 | 0 | 5 | |a Maschinelles Lernen |0 (DE-588)4193754-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Dyer, Chris |e Sonstige |4 oth | |
830 | 0 | |a Synthesis lectures on human language technologies |v 7 |w (DE-604)BV035447238 |9 7 | |
856 | 4 | 2 | |m Digitalisierung UB Passau |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020514131&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-020514131 |
Record in the search index
_version_ | 1804143191617699840 |
---|---|
adam_text | Contents
Acknowledgments ... xi
1 Introduction ... 1
1.1 Computing in the Clouds ... 6
1.2 Big Ideas ... 8
1.3 Why Is This Different? ... 13
1.4 What This Book Is Not ... 15
2 MapReduce Basics ... 17
2.1 Functional Programming Roots ... 18
2.2 Mappers and Reducers ... 20
2.3 The Execution Framework ... 24
2.4 Partitioners and Combiners ... 26
2.5 The Distributed File System ... 28
2.6 Hadoop Cluster Architecture ... 33
2.7 Summary ... 34
3 MapReduce Algorithm Design ... 37
3.1 Local Aggregation ... 39
3.1.1 Combiners and In-Mapper Combining ... 39
3.1.2 Algorithmic Correctness with Local Aggregation ... 43
3.2 Pairs and Stripes ... 47
3.3 Computing Relative Frequencies ... 52
3.4 Secondary Sorting ... 57
3.5 Relational Joins ... 58
3.5.1 Reduce-Side Join ... 60
3.5.2 Map-Side Join ... 62
3.5.3 Memory-Backed Join ... 63
3.6 Summary ... 63
4 Inverted Indexing for Text Retrieval ... 65
4.1 Web Crawling ... 66
4.2 Inverted Indexes ... 68
4.3 Inverted Indexing: Baseline Implementation ... 69
4.4 Inverted Indexing: Revised Implementation ... 72
4.5 Index Compression ... 74
4.5.1 Byte-Aligned and Word-Aligned Codes ... 75
4.5.2 Bit-Aligned Codes ... 76
4.5.3 Postings Compression ... 78
4.6 What About Retrieval? ... 80
4.7 Summary and Additional Readings ... 83
5 Graph Algorithms ... 85
5.1 Graph Representations ... 87
5.2 Parallel Breadth-First Search ... 88
5.3 PageRank ... 95
5.4 Issues with Graph Processing ... 100
5.5 Summary and Additional Readings ... 102
6 EM Algorithms for Text Processing ... 105
6.1 Expectation Maximization ... 108
6.1.1 Maximum Likelihood Estimation ... 108
6.1.2 A Latent Variable Marble Game ... 110
6.1.3 MLE with Latent Variables ... 111
6.1.4 Expectation Maximization ... 112
6.1.5 An EM Example ... 113
6.2 Hidden Markov Models ... 114
6.2.1 Three Questions for Hidden Markov Models ... 115
6.2.2 The Forward Algorithm ... 117
6.2.3 The Viterbi Algorithm ... 118
6.2.4 Parameter Estimation for HMMs ... 120
6.2.5 Forward-Backward Training: Summary ... 125
6.3 EM in MapReduce ... 125
6.3.1 HMM Training in MapReduce ... 126
6.4 Case Study: Word Alignment for Statistical Machine Translation ... 130
6.4.1 Statistical Phrase-Based Translation ... 131
6.4.2 Brief Digression: Language Modeling with MapReduce ... 133
6.4.3 Word Alignment ... 134
6.4.4 Experiments ... 135
6.5 EM-Like Algorithms ... 138
6.5.1 Gradient-Based Optimization and Log-Linear Models ... 138
6.6 Summary and Additional Readings ... 141
7 Closing Remarks ... 143
7.1 Limitations of MapReduce ... 143
7.2 Alternative Computing Paradigms ... 145
7.3 MapReduce and Beyond ... 146
Bibliography ... 149
Authors' Biographies ... 165
|
any_adam_object | 1 |
author | Lin, Jimmy |
author_facet | Lin, Jimmy |
author_role | aut |
author_sort | Lin, Jimmy |
author_variant | j l jl |
building | Verbundindex |
bvnumber | BV036593470 |
classification_rvk | ST 270 |
ctrlnum | (OCoLC)705706098 (DE-599)BVBBV036593470 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02469nam a2200565 cb4500</leader><controlfield tag="001">BV036593470</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20110812 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">100730s2010 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781608453429</subfield><subfield code="9">978-1-60845-342-9</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781608453436</subfield><subfield code="9">978-1-60845-343-6</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)705706098</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV036593470</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-19</subfield><subfield code="a">DE-859</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-2070s</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 270</subfield><subfield code="0">(DE-625)143638:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Lin, Jimmy</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Data-intensive text processing with MapReduce</subfield><subfield code="c">Jimmy Lin and Chris Dyer</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">[San Rafael, Calif.]</subfield><subfield code="b">Morgan & Claypool</subfield><subfield 
code="c">2010</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">IX, 165 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Synthesis lectures on human language technologies</subfield><subfield code="v">7</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Database management</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cloud computing / Programming</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Parallel processing (Electronic computers) / Programming</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Electronic data processing / Distributed processing / Programming</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Automatic Data Processing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Datenverarbeitung</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Natürlichsprachiges System</subfield><subfield code="0">(DE-588)4284757-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Parallelverarbeitung</subfield><subfield code="0">(DE-588)4075860-6</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" 
ind2="7"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Hadoop</subfield><subfield code="0">(DE-588)1022420135</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Verteiltes System</subfield><subfield code="0">(DE-588)4238872-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Parallelverarbeitung</subfield><subfield code="0">(DE-588)4075860-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Hadoop</subfield><subfield code="0">(DE-588)1022420135</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Verteiltes System</subfield><subfield code="0">(DE-588)4238872-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">Natürlichsprachiges System</subfield><subfield code="0">(DE-588)4284757-6</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="4"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="5"><subfield code="a">Maschinelles Lernen</subfield><subfield code="0">(DE-588)4193754-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield 
tag="700" ind1="1" ind2=" "><subfield code="a">Dyer, Chris</subfield><subfield code="e">Sonstige</subfield><subfield code="4">oth</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Synthesis lectures on human language technologies</subfield><subfield code="v">7</subfield><subfield code="w">(DE-604)BV035447238</subfield><subfield code="9">7</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Passau</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020514131&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-020514131</subfield></datafield></record></collection> |
id | DE-604.BV036593470 |
illustrated | Illustrated |
indexdate | 2024-07-09T22:43:44Z |
institution | BVB |
isbn | 9781608453429 9781608453436 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-020514131 |
oclc_num | 705706098 |
open_access_boolean | |
owner | DE-19 DE-BY-UBM DE-859 DE-739 DE-2070s |
owner_facet | DE-19 DE-BY-UBM DE-859 DE-739 DE-2070s |
physical | IX, 165 S. graph. Darst. |
publishDate | 2010 |
publishDateSearch | 2010 |
publishDateSort | 2010 |
publisher | Morgan & Claypool |
record_format | marc |
series | Synthesis lectures on human language technologies |
series2 | Synthesis lectures on human language technologies |
spelling | Lin, Jimmy Verfasser aut Data-intensive text processing with MapReduce Jimmy Lin and Chris Dyer [San Rafael, Calif.] Morgan & Claypool 2010 IX, 165 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Synthesis lectures on human language technologies 7 Includes bibliographical references Database management Cloud computing / Programming Parallel processing (Electronic computers) / Programming Electronic data processing / Distributed processing / Programming Automatic Data Processing Datenverarbeitung Natürlichsprachiges System (DE-588)4284757-6 gnd rswk-swf Parallelverarbeitung (DE-588)4075860-6 gnd rswk-swf Maschinelles Lernen (DE-588)4193754-5 gnd rswk-swf Hadoop (DE-588)1022420135 gnd rswk-swf Verteiltes System (DE-588)4238872-7 gnd rswk-swf Information Retrieval (DE-588)4072803-1 gnd rswk-swf Parallelverarbeitung (DE-588)4075860-6 s Hadoop (DE-588)1022420135 s Verteiltes System (DE-588)4238872-7 s Natürlichsprachiges System (DE-588)4284757-6 s Information Retrieval (DE-588)4072803-1 s Maschinelles Lernen (DE-588)4193754-5 s DE-604 Dyer, Chris Sonstige oth Synthesis lectures on human language technologies 7 (DE-604)BV035447238 7 Digitalisierung UB Passau application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020514131&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Lin, Jimmy Data-intensive text processing with MapReduce Synthesis lectures on human language technologies Database management Cloud computing / Programming Parallel processing (Electronic computers) / Programming Electronic data processing / Distributed processing / Programming Automatic Data Processing Datenverarbeitung Natürlichsprachiges System (DE-588)4284757-6 gnd Parallelverarbeitung (DE-588)4075860-6 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Hadoop (DE-588)1022420135 gnd Verteiltes System (DE-588)4238872-7 gnd Information Retrieval (DE-588)4072803-1 gnd |
subject_GND | (DE-588)4284757-6 (DE-588)4075860-6 (DE-588)4193754-5 (DE-588)1022420135 (DE-588)4238872-7 (DE-588)4072803-1 |
title | Data-intensive text processing with MapReduce |
title_auth | Data-intensive text processing with MapReduce |
title_exact_search | Data-intensive text processing with MapReduce |
title_full | Data-intensive text processing with MapReduce Jimmy Lin and Chris Dyer |
title_fullStr | Data-intensive text processing with MapReduce Jimmy Lin and Chris Dyer |
title_full_unstemmed | Data-intensive text processing with MapReduce Jimmy Lin and Chris Dyer |
title_short | Data-intensive text processing with MapReduce |
title_sort | data intensive text processing with mapreduce |
topic | Database management Cloud computing / Programming Parallel processing (Electronic computers) / Programming Electronic data processing / Distributed processing / Programming Automatic Data Processing Datenverarbeitung Natürlichsprachiges System (DE-588)4284757-6 gnd Parallelverarbeitung (DE-588)4075860-6 gnd Maschinelles Lernen (DE-588)4193754-5 gnd Hadoop (DE-588)1022420135 gnd Verteiltes System (DE-588)4238872-7 gnd Information Retrieval (DE-588)4072803-1 gnd |
topic_facet | Database management Cloud computing / Programming Parallel processing (Electronic computers) / Programming Electronic data processing / Distributed processing / Programming Automatic Data Processing Datenverarbeitung Natürlichsprachiges System Parallelverarbeitung Maschinelles Lernen Hadoop Verteiltes System Information Retrieval |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020514131&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV035447238 |
work_keys_str_mv | AT linjimmy dataintensivetextprocessingwithmapreduce AT dyerchris dataintensivetextprocessingwithmapreduce |