Text compression
Main authors: | Bell, Timothy C. ; Cleary, John G. ; Witten, Ian H. |
---|---|
Format: | Book |
Language: | English |
Published: | Englewood Cliffs, NJ: Prentice Hall, 1990 |
Series: | Prentice Hall advanced reference series : Computer science |
Subjects: | Textkompression, Textverarbeitung |
Online access: | Table of contents |
Description: | XVIII, 318 pp. |
ISBN: | 0139119914 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV004142701 | ||
003 | DE-604 | ||
005 | 20220207 | ||
007 | t | ||
008 | 901114s1990 |||| 00||| engod | ||
020 | |a 0139119914 |9 0-13-911991-4 | ||
035 | |a (OCoLC)263159093 | ||
035 | |a (DE-599)BVBBV004142701 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-12 |a DE-739 |a DE-20 |a DE-91 |a DE-19 |a DE-706 |a DE-83 |a DE-188 | ||
084 | |a ST 284 |0 (DE-625)143647: |2 rvk | ||
084 | |a ST 350 |0 (DE-625)143667: |2 rvk | ||
084 | |a DAT 579f |2 stub | ||
100 | 1 | |a Bell, Timothy C. |e Verfasser |4 aut | |
245 | 1 | 0 | |a Text compression |c Timothy C. Bell ; John G. Cleary ; Ian H. Witten |
264 | 1 | |a Englewood Cliffs, NJ |b Prentice Hall |c 1990 | |
300 | |a XVIII, 318 S. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Prentice Hall advanced reference series : Computer science | |
650 | 0 | 7 | |a Textkompression |0 (DE-588)4254794-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Textverarbeitung |0 (DE-588)4059667-9 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Textkompression |0 (DE-588)4254794-5 |D s |
689 | 0 | |5 DE-604 | |
689 | 1 | 0 | |a Textverarbeitung |0 (DE-588)4059667-9 |D s |
689 | 1 | |5 DE-604 | |
700 | 1 | |a Cleary, John G. |e Verfasser |4 aut | |
700 | 1 | |a Witten, Ian H. |d 1947- |e Verfasser |0 (DE-588)138440166 |4 aut | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=002584703&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-002584703 |
Record in the search index
_version_ | 1804118357364965376 |
---|---|
adam_text | Title: Text compression
Author: Bell, Timothy C.
Year: 1990
CONTENTS
PREFACE xiii
Structure of the Book xiv
Acknowledgments xvii
1 WHY COMPRESS TEXT? 1
1.1 Compression, Prediction, and Modeling 6
1.1.1 Inexact Compression, 6
1.1.2 Exact Compression, 8
1.1.3 Modeling and Coding, 10
1.1.4 Static and Adaptive Modeling, 12
1.2 The Importance of Text Compression 13
1.2.1 Compression in the Workplace, 15
1.3 Ad Hoc Compression Methods 18
1.3.1 Irreversible Compression, 20
1.3.2 Run-Length Encoding, 20
1.3.3 Bit Mapping, 22
1.3.4 Packing, 22
1.3.5 Differential Coding, 23
1.3.6 Special Data Types, 23
1.3.7 Move to Front (MTF) Coding, 24
1.4 The Subject of Text Compression 24
1.5 A Note for the Practitioner 25
Notes 26
2 INFORMATION AND MODELS 28
2.1 Measuring Information 29
2.1.1 Entropy, 30
2.1.2 Compression, 31
2.2 Models 33
2.2.1 Finite-Context Models, 33
2.2.2 Finite-State Models, 34
2.2.3 Grammar Models, 36
2.2.4 Ergodic Models, 38
2.2.5 Computing the Entropy of a Model, 40
2.2.6 Conditioning Classes, 43
Notes 47
2A Noiseless Source Coding Theorem 47
2B Examples of Calculating Exact Message Probabilities 48
3 ADAPTIVE MODELS 52
3.1 Zero-Frequency Problem 54
3.2 Comparison of Adaptive and Nonadaptive Models 56
3.2.1 Approximations to Enumerative Codes, 57
3.2.2 Encoding Without Prior Statistics, 62
3.2.3 Analysis and Comparison, 65
3.2.4 Robustness Analysis, 68
3.2.5 Changing Conditioning Classes, 70
Notes 70
3A Exact Encoding Inequality 70
3B Mathematical Techniques 74
3B-1 Stirling's Approximation, 74
3B-2 Expanding Log n, 74
3B-3 Minimizing/Maximizing Multinomials, 75
4 MODELING NATURAL LANGUAGE 77
4.1 Letters 78
4.2 Words 81
4.3 Theoretical Models of Natural Distributions 85
4.4 The Information Content of Natural Language 93
Notes 98
5 FROM PROBABILITIES TO BITS 100
5.1 The Quest for Optimal Source Coding 102
5.1.1 Shannon-Fano Coding, 103
5.1.2 Huffman Coding, 105
5.1.3 The Birth of Arithmetic Coding, 108
5.2 Arithmetic Coding 109
5.2.1 Implementing Arithmetic Coding, 112
5.2.2 Incremental Transmission and Reception, 113
5.2.3 Decoding Using Integer Arithmetic, 115
5.2.4 Underflow and Overflow, 115
5.2.5 Word-Length Constraints, 118
5.2.6 Terminating the Message, 119
5.2.7 The Model, 119
5.3 Implementing Adaptive Models 120
5.3.1 Adaptive Huffman Coding, 121
5.3.2 Adaptive Arithmetic Coding, 123
5.3.3 Efficient Adaptive Arithmetic Coding for Large Alphabets, 124
5.3.4 Efficient Adaptive Arithmetic Coding for Binary Alphabets, 127
5.4 Performance of Arithmetic Coding 129
5.4.1 Compression Efficiency, 129
5.4.2 Execution Time, 130
Notes 131
5A An Implementation of Arithmetic Coding 132
Encoding and Decoding, 132
Implementing Models, 137
5B Proof of Decoding Inequality 137
6 CONTEXT MODELING 140
6.1 Blending 141
6.1.1 A Simple Blending Algorithm, 142
6.1.2 Escape Probabilities, 143
6.1.3 Determining Escape Probabilities: Method A, 144
6.1.4 Determining Escape Probabilities: Method B, 145
6.1.5 Determining Escape Probabilities: Method C, 146
6.1.6 Exclusion, 146
6.2 Building Models 149
6.2.1 Contexts and Dictionaries, 149
6.2.2 Scaling Counts, 151
6.2.3 Alphabets, 152
6.3 Description of Some Finite-Context Models 152
6.4 Experimental Performance 153
6.4.1 Blending, 154
6.4.2 Escape Probabilities, 154
6.5 Implementation 155
6.5.1 Finding Contexts, 157
6.5.2 Computing Probabilities, 162
6.5.3 Space, 165
6.5.4 Small Alphabets, 165
Notes 166
7 STATE-BASED MODELING 167
7.1 Optimal State Models 169
7.1.1 Input-Output Models, 172
7.1.2 General Models, 176
7.1.3 State versus Context Models, 179
7.2 Approximate State Models 185
7.2.1 Limited-Context State Modeling, 185
7.2.2 Reduction and Evaluation, 189
7.3 Dynamic Markov Modeling 191
7.3.1 How DMC Works, 191
7.3.2 Initial Models, 193
7.3.3 Forgetting, 196
7.4 The Power of DMC 197
7.4.1 Distinguishing Finite-Context Models, 197
7.4.2 Proof That DMC Models Are Finite Context, 199
7.4.3 Extending the Proof to Other Initial Models, 200
Notes 205
8 DICTIONARY TECHNIQUES 206
8.1 Parsing Strategies 208
8.2 Static and Semiadaptive Dictionary Encoders 211
8.2.1 Static Dictionary Encoders, 211
8.2.2 Semiadaptive Dictionary Encoders, 212
8.3 Adaptive Dictionary Encoders: Ziv-Lempel Coding 214
8.3.1 LZ77, 218
8.3.2 LZR, 220
8.3.3 LZSS, 221
8.3.4 LZB, 222
8.3.5 LZH, 223
8.3.6 LZ78, 225
8.3.7 LZW, 226
8.3.8 LZC, 228
8.3.9 LZT, 228
8.3.10 LZMW, 229
8.3.11 LZJ, 229
8.3.12 LZFG, 230
8.3.13 Compression Performance of Ziv-Lempel Coding, 234
8.4 Data Structures for Dictionary Coding 235
8.4.1 Unsorted List, 235
8.4.2 Sorted List, 235
8.4.3 Binary Tree, 236
8.4.4 Trie, 238
8.4.5 Hash Table, 239
8.4.6 Coping With Data Structures Where Deletion Is Difficult, 241
8.4.7 Parallel Algorithms, 242
Notes 242
9 CHOOSING YOUR WEAPON 244
9.1 Dictionary Coding versus Statistical Coding 245
9.1.1 General Algorithm, 248
9.1.2 Proof of Equivalence, 252
9.1.3 Equivalent Statistical Compression, 253
9.2 Looking Inside Models 254
9.3 Practical Comparisons 257
9.3.1 Measuring Compression, 258
9.3.2 The Schemes, 258
9.3.3 Parameters for the Schemes, 259
9.3.4 Compression Experiments, 261
9.3.5 Speed and Storage, 265
9.4 Convergence 268
9.5 Compression in Practice 270
9.5.1 Finite Memory, 271
9.5.2 Error Control, 272
9.5.3 Upward Compatibility, 272
Notes 273
10 THE WIDER VIEW 274
10.1 An Overview 274
10.1.1 The Reactive Keyboard, 275
10.1.2 An Autoprogramming Calculator, 277
10.1.3 Predicting User Input to an Interactive System, 280
10.1.4 Error Control with Arithmetic Coding, 281
10.1.5 Privacy and Compression, 284
10.1.6 Progressive Transmission of Pictures, 285
Notes 289
A VARIABLE-LENGTH REPRESENTATION OF THE INTEGERS 290
A.1 Coding Arbitrarily Large Integers 290
A.2 Phasing in Binary Codes 293
A.3 Start-Step-Stop Codes 294
Notes 295
B THE COMPRESSION CORPUS 296
GLOSSARY 299
REFERENCES 303
INDEX 311
|
any_adam_object | 1 |
author | Bell, Timothy C. Cleary, John G. Witten, Ian H. 1947- |
author_GND | (DE-588)138440166 |
author_facet | Bell, Timothy C. Cleary, John G. Witten, Ian H. 1947- |
author_role | aut aut aut |
author_sort | Bell, Timothy C. |
author_variant | t c b tc tcb j g c jg jgc i h w ih ihw |
building | Verbundindex |
bvnumber | BV004142701 |
classification_rvk | ST 284 ST 350 |
classification_tum | DAT 579f |
ctrlnum | (OCoLC)263159093 (DE-599)BVBBV004142701 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01623nam a2200409 c 4500</leader><controlfield tag="001">BV004142701</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20220207 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">901114s1990 |||| 00||| engod</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0139119914</subfield><subfield code="9">0-13-911991-4</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)263159093</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV004142701</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-12</subfield><subfield code="a">DE-739</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-91</subfield><subfield code="a">DE-19</subfield><subfield code="a">DE-706</subfield><subfield code="a">DE-83</subfield><subfield code="a">DE-188</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 284</subfield><subfield code="0">(DE-625)143647:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 350</subfield><subfield code="0">(DE-625)143667:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 579f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Bell, Timothy C.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Text 
compression</subfield><subfield code="c">Timothy C. Bell ; John G. Cleary ; Ian H. Witten</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Englewood Cliffs, NJ</subfield><subfield code="b">Prentice Hall</subfield><subfield code="c">1990</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XVIII, 318 S.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Prentice Hall advanced reference series : Computer science</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Textkompression</subfield><subfield code="0">(DE-588)4254794-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Textverarbeitung</subfield><subfield code="0">(DE-588)4059667-9</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Textkompression</subfield><subfield code="0">(DE-588)4254794-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="689" ind1="1" ind2="0"><subfield code="a">Textverarbeitung</subfield><subfield code="0">(DE-588)4059667-9</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="1" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Cleary, John G.</subfield><subfield code="e">Verfasser</subfield><subfield 
code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Witten, Ian H.</subfield><subfield code="d">1947-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)138440166</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=002584703&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-002584703</subfield></datafield></record></collection> |
id | DE-604.BV004142701 |
illustrated | Not Illustrated |
indexdate | 2024-07-09T16:09:01Z |
institution | BVB |
isbn | 0139119914 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-002584703 |
oclc_num | 263159093 |
open_access_boolean | |
owner | DE-12 DE-739 DE-20 DE-91 DE-BY-TUM DE-19 DE-BY-UBM DE-706 DE-83 DE-188 |
owner_facet | DE-12 DE-739 DE-20 DE-91 DE-BY-TUM DE-19 DE-BY-UBM DE-706 DE-83 DE-188 |
physical | XVIII, 318 S. |
publishDate | 1990 |
publishDateSearch | 1990 |
publishDateSort | 1990 |
publisher | Prentice Hall |
record_format | marc |
series2 | Prentice Hall advanced reference series : Computer science |
spelling | Bell, Timothy C. Verfasser aut Text compression Timothy C. Bell ; John G. Cleary ; Ian H. Witten Englewood Cliffs, NJ Prentice Hall 1990 XVIII, 318 S. txt rdacontent n rdamedia nc rdacarrier Prentice Hall advanced reference series : Computer science Textkompression (DE-588)4254794-5 gnd rswk-swf Textverarbeitung (DE-588)4059667-9 gnd rswk-swf Textkompression (DE-588)4254794-5 s DE-604 Textverarbeitung (DE-588)4059667-9 s Cleary, John G. Verfasser aut Witten, Ian H. 1947- Verfasser (DE-588)138440166 aut HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=002584703&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Bell, Timothy C. Cleary, John G. Witten, Ian H. 1947- Text compression Textkompression (DE-588)4254794-5 gnd Textverarbeitung (DE-588)4059667-9 gnd |
subject_GND | (DE-588)4254794-5 (DE-588)4059667-9 |
title | Text compression |
title_auth | Text compression |
title_exact_search | Text compression |
title_full | Text compression Timothy C. Bell ; John G. Cleary ; Ian H. Witten |
title_fullStr | Text compression Timothy C. Bell ; John G. Cleary ; Ian H. Witten |
title_full_unstemmed | Text compression Timothy C. Bell ; John G. Cleary ; Ian H. Witten |
title_short | Text compression |
title_sort | text compression |
topic | Textkompression (DE-588)4254794-5 gnd Textverarbeitung (DE-588)4059667-9 gnd |
topic_facet | Textkompression Textverarbeitung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=002584703&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT belltimothyc textcompression AT clearyjohng textcompression AT wittenianh textcompression |