Analytic pattern matching: from DNA to Twitter
Main authors: | Jacquet, Philippe ; Szpankowski, Wojciech |
---|---|
Format: | Book |
Language: | English |
Published: | Cambridge : Cambridge Univ. Press, 2015 |
Subjects: | |
Online access: | Blurb Table of contents |
Description: | Includes bibliographical references and index |
Physical description: | XXII, 366 pp., ill., graphs |
ISBN: | 9780521876087 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV042382838 | ||
003 | DE-604 | ||
005 | 20230324 | ||
007 | t | ||
008 | 150227s2015 xxkad|| |||| 00||| eng d | ||
010 | |a 014017256 | ||
020 | |a 9780521876087 |c hbk. |9 978-0-521-87608-7 | ||
035 | |a (OCoLC)915133921 | ||
035 | |a (DE-599)BVBBV042382838 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
044 | |a xxk |c GB | ||
049 | |a DE-703 |a DE-91G | ||
050 | 0 | |a TK7882.P3 | |
082 | 0 | |a 519.2 |2 23 | |
084 | |a ST 330 |0 (DE-625)143663: |2 rvk | ||
084 | |a DAT 770f |2 stub | ||
084 | |a DAT 536f |2 stub | ||
100 | 1 | |a Jacquet, Philippe |d 20. Jh./21. Jh. |e Verfasser |0 (DE-588)130150703 |4 aut | |
245 | 1 | 0 | |a Analytic pattern matching |b from DNA to Twitter |c Philippe Jacquet ; Wojciech Szpankowski |
264 | 1 | |a Cambridge |b Cambridge Univ. Press |c 2015 | |
300 | |a XXII, 366 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Includes bibliographical references and index | ||
650 | 4 | |a Pattern recognition systems | |
650 | 0 | 7 | |a Algorithmus |0 (DE-588)4001183-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Mustervergleich |0 (DE-588)4307192-2 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Mustervergleich |0 (DE-588)4307192-2 |D s |
689 | 0 | 1 | |a Algorithmus |0 (DE-588)4001183-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Szpankowski, Wojciech |d 1952- |e Verfasser |0 (DE-588)130150746 |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027818885&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Klappentext |
856 | 4 | 2 | |m Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027818885&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-027818885 |
Record in the search index
_version_ | 1804153026696445952 |
---|---|
adam_text | How do you distinguish a cat from a dog by their DNA?
Did Shakespeare really write all his plays?
Pattern matching techniques can offer answers to these questions and to
many others, in contexts from molecular biology to telecommunications to
the classification of Twitter content.
This book, intended for researchers and graduate students, demonstrates the
probabilistic approach to pattern matching, which predicts the performance
of pattern matching algorithms with very high precision using analytic
combinatorics and analytic information theory. Part I compiles results for
pattern matching problems that can be obtained via analytic methods. Part
II focuses on applications to various data structures on words, such as digital
trees, suffix trees, string complexity, and string-based data compression.
The authors use results and techniques from Part I and also introduce new
methodology such as the Mellin transform and analytic depoissonization.
More than 100 end-of-chapter problems will help the reader to make the link
between theory and practice.
Philippe Jacquet is a research director at INRIA, a major public research
laboratory in Computer Science in France. He has been a major contributor
to the Internet OLSR protocol for mobile networks. His research interests
include information theory, probability theory, quantum telecommunication,
protocol design, performance evaluation and optimization, and the analysis
of algorithms. Since 2012 he has been with Alcatel-Lucent Bell Labs as head
of the department of Mathematics of Dynamic Networks and Information.
Jacquet is a member of the prestigious French Corps des Mines, known for
excellence in French industry, with the rank of Ingenieur General. He is also
a member of ACM and IEEE.
Wojciech Szpankowski is Saul Rosen Professor of Computer Science and (by
courtesy) Electrical and Computer Engineering at Purdue University, where
he teaches and conducts research in analysis of algorithms, information
theory, bioinformatics, analytic combinatorics, random structures,
and stability problems of distributed systems. In 2008 he launched the
interdisciplinary Institute for Science of Information, and in 2010 he became
the Director of the newly established NSF Science and Technology Center
for Science of Information. Szpankowski is a Fellow of IEEE and an Erskine
Fellow. He received the Humboldt Research Award in 2010.
Contents
Foreword
Preface
Acknowledgments
About the sketches
I ANALYSIS
Chapter 1 Probabilistic Models
1.1 Probabilistic models on words
1.2 Probabilistic tools
1.3 Generating functions and analytic tools
1.4 Special functions
1.4.1 Euler’s gamma function
1.4.2 Riemann’s zeta function
1.5 Exercises
Bibliographical notes
Chapter 2 Exact String Matching
2.1 Formulation of the problem
2.2 Language representation
2.3 Generating functions
2.4 Moments
2.5 Limit laws
2.5.1 Pattern count probability for small values of r
2.5.2 Central limit laws
2.5.3 Large deviations
2.5.4 Poisson laws
2.6 Waiting times
2.7 Exercises
Bibliographical notes
Chapter 3 Constrained Exact String Matching
3.1 Enumeration of (d,k) sequences
3.1.1 Languages and generating functions
3.2 Moments
3.3 The probability count
3.4 Central limit law
3.5 Large deviations
3.6 Some extensions
3.7 Application: significant signals in neural data
3.8 Exercises
Bibliographical notes
Chapter 4 Generalized String Matching
4.1 String matching over a reduced set
4.2 Generalized string matching via automata
4.3 Generalized string matching via a language approach
4.3.1 Symbolic inclusion-exclusion principle
4.3.2 Multivariate generating function
4.3.3 Generating function of a cluster
4.3.4 Moments and covariance
4.4 Exercises
Bibliographical notes
Chapter 5 Subsequence String Matching
5.1 Problem formulation
5.2 Mean and variance analysis
5.3 Autocorrelation polynomial revisited
5.4 Central limit laws
5.5 Limit laws for fully constrained pattern
5.6 Generalized subsequence problem
5.7 Exercises
Bibliographical notes
II APPLICATIONS
Chapter 6 Algorithms and Data Structures
6.1 Tries
6.2 Suffix trees
6.3 Lempel-Ziv’77 scheme
6.4 Digital search tree
6.5 Parsing trees and Lempel-Ziv’78 algorithm
Bibliographical notes
Chapter 7 Digital Trees
7.1 Digital tree shape parameters
7.2 Moments
7.2.1 Average path length in a trie by Rice’s method
7.2.2 Average size of a trie
7.2.3 Average depth in a DST by Rice’s method
7.2.4 Multiplicity parameter by Mellin transform
7.2.5 Increasing domains
7.3 Limiting distributions
7.3.1 Depth in a trie
7.3.2 Depth in a digital search tree
7.3.3 Asymptotic distribution of the multiplicity parameter
7.4 Average profile of tries
7.5 Exercises
Bibliographical notes
Chapter 8 Suffix Trees and Lempel-Ziv’77
8.1 Random tries resemble suffix trees
8.1.1 Proof of Theorem 8.1.2
8.1.2 Suffix trees and finite suffix trees are equivalent
8.2 Size of suffix tree
8.3 Lempel-Ziv’77
8.4 Exercises
Bibliographical notes
Chapter 9 Lempel-Ziv’78 Compression Algorithm
9.1 Description of the algorithm
9.2 Number of phrases and redundancy of LZ’78
9.3 From Lempel-Ziv to digital search tree
9.3.1 Moments
9.3.2 Distributional analysis
9.3.3 Large deviation results
9.4 Proofs of Theorems 9.2.1 and 9.2.2
9.4.1 Large deviations: proof of Theorem 9.2.2
9.4.2 Central limit theorem: Proof of Theorem 9.2.1
9.4.3 Some technical results
9.4.4 Proof of Theorem 9.4.2
9.5 Exercises
Bibliographical notes
Chapter 10 String Complexity
10.1 Introduction to string complexity
10.1.1 String self-complexity
10.1.2 Joint string complexity
10.2 Analysis of string self-complexity
10.3 Analysis of the joint complexity
10.3.1 Independent joint complexity
10.3.2 Key property
10.3.3 Recurrence and generating functions
10.3.4 Double depoissonization
10.4 Average joint complexity for identical sources
10.5 Average joint complexity for nonidentical sources
10.5.1 The kernel and its properties
10.5.2 Main results
10.5.3 Proof of (10.18) for one symmetric source
10.5.4 Finishing the proof of Theorem 10.5.6
10.6 Joint complexity via suffix trees
10.6.1 Joint complexity of two suffix trees for nonidentical sources
10.6.2 Joint complexity of two suffix trees for identical sources
10.7 Conclusion and applications
10.8 Exercises
Bibliographical notes
Bibliography
Index
|
any_adam_object | 1 |
author | Jacquet, Philippe 20. Jh./21. Jh Szpankowski, Wojciech 1952- |
author_GND | (DE-588)130150703 (DE-588)130150746 |
author_facet | Jacquet, Philippe 20. Jh./21. Jh Szpankowski, Wojciech 1952- |
author_role | aut aut |
author_sort | Jacquet, Philippe 20. Jh./21. Jh |
author_variant | p j pj w s ws |
building | Verbundindex |
bvnumber | BV042382838 |
callnumber-first | T - Technology |
callnumber-label | TK7882 |
callnumber-raw | TK7882.P3 |
callnumber-search | TK7882.P3 |
callnumber-sort | TK 47882 P3 |
callnumber-subject | TK - Electrical and Nuclear Engineering |
classification_rvk | ST 330 |
classification_tum | DAT 770f DAT 536f |
ctrlnum | (OCoLC)915133921 (DE-599)BVBBV042382838 |
dewey-full | 519.2 |
dewey-hundreds | 500 - Natural sciences and mathematics |
dewey-ones | 519 - Probabilities and applied mathematics |
dewey-raw | 519.2 |
dewey-search | 519.2 |
dewey-sort | 3519.2 |
dewey-tens | 510 - Mathematics |
discipline | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02023nam a2200457 c 4500</leader><controlfield tag="001">BV042382838</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20230324 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">150227s2015 xxkad|| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">014017256</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780521876087</subfield><subfield code="c">hbk.</subfield><subfield code="9">978-0-521-87608-7</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)915133921</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV042382838</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxk</subfield><subfield code="c">GB</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-703</subfield><subfield code="a">DE-91G</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TK7882.P3</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">519.2</subfield><subfield code="2">23</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 330</subfield><subfield code="0">(DE-625)143663:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 770f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 536f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" 
ind2=" "><subfield code="a">Jacquet, Philippe</subfield><subfield code="d">20. Jh./21. Jh.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)130150703</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Analytic pattern matching</subfield><subfield code="b">from DNA to Twitter</subfield><subfield code="c">Philippe Jacquet ; Wojciech Szpankowski</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Cambridge</subfield><subfield code="b">Cambridge Univ. Press</subfield><subfield code="c">2015</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXII, 366 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Pattern recognition systems</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Algorithmus</subfield><subfield code="0">(DE-588)4001183-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Mustervergleich</subfield><subfield code="0">(DE-588)4307192-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Mustervergleich</subfield><subfield code="0">(DE-588)4307192-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" 
ind1="0" ind2="1"><subfield code="a">Algorithmus</subfield><subfield code="0">(DE-588)4001183-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Szpankowski, Wojciech</subfield><subfield code="d">1952-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)130150746</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027818885&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Klappentext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027818885&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-027818885</subfield></datafield></record></collection> |
id | DE-604.BV042382838 |
illustrated | Illustrated |
indexdate | 2024-07-10T01:20:04Z |
institution | BVB |
isbn | 9780521876087 |
language | English |
lccn | 014017256 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-027818885 |
oclc_num | 915133921 |
open_access_boolean | |
owner | DE-703 DE-91G DE-BY-TUM |
owner_facet | DE-703 DE-91G DE-BY-TUM |
physical | XXII, 366 S. Ill., graph. Darst. |
publishDate | 2015 |
publishDateSearch | 2015 |
publishDateSort | 2015 |
publisher | Cambridge Univ. Press |
record_format | marc |
spelling | Jacquet, Philippe 20. Jh./21. Jh. Verfasser (DE-588)130150703 aut Analytic pattern matching from DNA to Twitter Philippe Jacquet ; Wojciech Szpankowski Cambridge Cambridge Univ. Press 2015 XXII, 366 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Includes bibliographical references and index Pattern recognition systems Algorithmus (DE-588)4001183-5 gnd rswk-swf Mustervergleich (DE-588)4307192-2 gnd rswk-swf Mustervergleich (DE-588)4307192-2 s Algorithmus (DE-588)4001183-5 s DE-604 Szpankowski, Wojciech 1952- Verfasser (DE-588)130150746 aut Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027818885&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Klappentext Digitalisierung UB Bayreuth - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027818885&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Jacquet, Philippe 20. Jh./21. Jh Szpankowski, Wojciech 1952- Analytic pattern matching from DNA to Twitter Pattern recognition systems Algorithmus (DE-588)4001183-5 gnd Mustervergleich (DE-588)4307192-2 gnd |
subject_GND | (DE-588)4001183-5 (DE-588)4307192-2 |
title | Analytic pattern matching from DNA to Twitter |
title_auth | Analytic pattern matching from DNA to Twitter |
title_exact_search | Analytic pattern matching from DNA to Twitter |
title_full | Analytic pattern matching from DNA to Twitter Philippe Jacquet ; Wojciech Szpankowski |
title_fullStr | Analytic pattern matching from DNA to Twitter Philippe Jacquet ; Wojciech Szpankowski |
title_full_unstemmed | Analytic pattern matching from DNA to Twitter Philippe Jacquet ; Wojciech Szpankowski |
title_short | Analytic pattern matching |
title_sort | analytic pattern matching from dna to twitter |
title_sub | from DNA to Twitter |
topic | Pattern recognition systems Algorithmus (DE-588)4001183-5 gnd Mustervergleich (DE-588)4307192-2 gnd |
topic_facet | Pattern recognition systems Algorithmus Mustervergleich |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027818885&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=027818885&sequence=000002&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT jacquetphilippe analyticpatternmatchingfromdnatotwitter AT szpankowskiwojciech analyticpatternmatchingfromdnatotwitter |