Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences
Main Authors: | Navarro, Gonzalo, Raffinot, Mathieu |
---|---|
Format: | Book |
Language: | English |
Published: | Cambridge [u.a.]: Cambridge Univ. Pr., 2007 |
Subjects: | Information Retrieval, Algorithmus |
Online Access: | Table of contents |
Physical Description: | X, 221 pp., ill., diagrams |
ISBN: | 0521039932 9780521813075 9780521039932 |
Staff View
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV036081854 | ||
003 | DE-604 | ||
005 | 20100401 | ||
007 | t | ||
008 | 100316s2007 ad|| |||| 00||| eng d | ||
020 | |a 0521039932 |9 0-521-03993-2 | ||
020 | |a 9780521813075 |9 978-0-521-81307-5 | ||
020 | |a 9780521039932 |9 978-0-521-03993-2 | ||
035 | |a (OCoLC)1050124351 | ||
035 | |a (DE-599)HBZHT014078940 | ||
040 | |a DE-604 |b ger |e aacr | ||
041 | 0 | |a eng | |
049 | |a DE-92 |a DE-83 |a DE-355 | ||
082 | 0 | |a 005.74 |2 22 | |
084 | |a ST 134 |0 (DE-625)143590: |2 rvk | ||
100 | 1 | |a Navarro, Gonzalo |e Verfasser |4 aut | |
245 | 1 | 0 | |a Flexible pattern matching in strings |b practical on-line search algorithms for texts and biological sequences |c Gonzalo Navarro ; Mathieu Raffinot |
264 | 1 | |a Cambridge [u.a.] |b Cambridge Univ. Pr. |c 2007 | |
300 | |a X, 221 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a Information Retrieval |0 (DE-588)4072803-1 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Algorithmus |0 (DE-588)4001183-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Information Retrieval |0 (DE-588)4072803-1 |D s |
689 | 0 | 1 | |a Algorithmus |0 (DE-588)4001183-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Raffinot, Mathieu |e Verfasser |4 aut | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018972913&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-018972913 |
Record in the Search Index
_version_ | 1814328986877034496 |
---|---|
adam_text |
Contents
1 Introduction 1
1.1 Why this book? Our aim and focus 1
1.2 Overview 3
1.3 Basic concepts 8
1.3.1 Bit-parallelism and bit operations 8
1.3.2 Labeled rooted tree, trie 9
1.3.3 Automata 11
1.3.4 Complexity notations 12
2 String matching 15
2.1 Basic concepts 15
2.2 Prefix based approach 17
2.2.1 Knuth-Morris-Pratt idea 18
2.2.2 Shift-And/Shift-Or algorithm 19
2.3 Suffix based approach 22
2.3.1 Boyer-Moore idea 22
2.3.2 Horspool algorithm 25
2.4 Factor based approach 27
2.4.1 Backward Dawg Matching idea 28
2.4.2 Backward Nondeterministic Dawg Matching algorithm 29
2.4.3 Backward Oracle Matching algorithm 34
2.5 Experimental map 38
2.6 Other algorithms and references 39
3 Multiple string matching 41
3.1 Basic concepts 41
3.2 Prefix based approach 45
3.2.1 Multiple Shift-And algorithm 45
3.2.2 Basic Aho-Corasick algorithm 49
3.2.3 Advanced Aho-Corasick algorithm 54
3.3 Suffix based approach 54
3.3.1 Commentz-Walter idea 55
3.3.2 Set Horspool algorithm 56
3.3.3 Wu-Manber algorithm 59
3.4 Factor based approach 62
3.4.1 Multiple BNDM algorithm 63
3.4.2 Set Backward Dawg Matching idea 68
3.4.3 Set Backward Oracle Matching algorithm 69
3.5 Experimental maps 74
3.6 Other algorithms and references 74
4 Extended string matching 77
4.1 Basic concepts 77
4.2 Classes of characters 78
4.2.1 Classes in the pattern 78
4.2.2 Classes in the text 80
4.3 Bounded length gaps 81
4.3.1 Extending Shift-And 82
4.3.2 Extending BNDM 84
4.4 Optional characters 87
4.5 Wild cards and repeatable characters 89
4.5.1 Extended Shift-And 91
4.5.2 Extended BNDM 93
4.6 Multipattern searching 96
4.7 Other algorithms and references 97
5 Regular expression matching 99
5.1 Basic concepts 99
5.2 Building an NFA 102
5.2.1 Thompson automaton 102
5.2.2 Glushkov automaton 105
5.3 Classical approaches to regular expression searching 111
5.3.1 Thompson's NFA simulation 111
5.3.2 Using a deterministic automaton 111
5.3.3 A hybrid approach 115
5.4 Bit-parallel algorithms 117
5.4.1 Bit-parallel Thompson 118
5.4.2 Bit-parallel Glushkov 122
5.5 Filtration approaches 125
5.5.1 Multistring matching approach 126
5.5.2 Gnu's heuristic based on necessary factors 130
5.5.3 An approach based on BNDM 131
5.6 Experimental map 137
5.7 Other algorithms and references 139
5.8 Building a parse tree 139
6 Approximate matching 145
6.1 Basic concepts 145
6.2 Dynamic programming algorithms 146
6.2.1 Computing edit distance 146
6.2.2 Text searching 147
6.2.3 Improving the average case 148
6.2.4 Other algorithms based on dynamic programming 150
6.3 Algorithms based on automata 150
6.4 Bit-parallel algorithms 152
6.4.1 Parallelizing the NFA 152
6.4.2 Parallelizing the DP matrix 158
6.5 Algorithms for fast filtering the text 162
6.5.1 Partitioning into k + 1 pieces 163
6.5.2 Approximate BNDM 166
6.5.3 Other filtration algorithms 170
6.6 Multipattern approximate searching 171
6.6.1 A hashing based algorithm for one error 171
6.6.2 Partitioning into k + 1 pieces 173
6.6.3 Superimposed automata 174
6.7 Searching for extended strings and regular expressions 175
6.7.1 A dynamic programming based approach 176
6.7.2 A Four-Russians approach 178
6.7.3 A bit-parallel approach 180
6.8 Experimental map 181
6.9 Other algorithms and references 183
7 Conclusion 185
7.1 Available software 185
7.1.1 Gnu Grep 185
7.1.2 Wu and Manber's Agrep 186
7.1.3 Navarro's Nrgrep 187
7.1.4 Mehldau and Myers' Anrep 188
7.1.5 Other resources for computational biology 189
7.2 Other books 190
7.2.1 Books on string matching 190
7.2.2 Books on computational biology 192
7.3 Other resources 193
7.3.1 Journals 193
7.3.2 Conferences 193
7.3.3 On-line resources 194
7.4 Related topics 194
7.4.1 Indexing 195
7.4.2 Searching compressed text 196
7.4.3 Repeats and repetitions 199
7.4.4 Pattern matching in two and more dimensions 200
7.4.5 Tree pattern matching 202
7.4.6 Sequence comparison 203
7.4.7 Meaningful string occurrences 205
Bibliography 207
Index 219 |
any_adam_object | 1 |
author | Navarro, Gonzalo Raffinot, Mathieu |
author_facet | Navarro, Gonzalo Raffinot, Mathieu |
author_role | aut aut |
author_sort | Navarro, Gonzalo |
author_variant | g n gn m r mr |
building | Verbundindex |
bvnumber | BV036081854 |
classification_rvk | ST 134 |
ctrlnum | (OCoLC)1050124351 (DE-599)HBZHT014078940 |
dewey-full | 005.74 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 005 - Computer programming, programs, data, security |
dewey-raw | 005.74 |
dewey-search | 005.74 |
dewey-sort | 15.74 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 c 4500</leader><controlfield tag="001">BV036081854</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20100401</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">100316s2007 ad|| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0521039932</subfield><subfield code="9">0-521-03993-2</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780521813075</subfield><subfield code="9">978-0-521-81307-5</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780521039932</subfield><subfield code="9">978-0-521-03993-2</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1050124351</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)HBZHT014078940</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">aacr</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-92</subfield><subfield code="a">DE-83</subfield><subfield code="a">DE-355</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">005.74</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 134</subfield><subfield code="0">(DE-625)143590:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Navarro, Gonzalo</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Flexible pattern matching in strings</subfield><subfield 
code="b">practical on-line search algorithms for texts and biological sequences</subfield><subfield code="c">Gonzalo Navarro ; Mathieu Raffinot</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Cambridge [u.a.]</subfield><subfield code="b">Cambridge Univ. Pr.</subfield><subfield code="c">2007</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">X, 221 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Algorithmus</subfield><subfield code="0">(DE-588)4001183-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Information Retrieval</subfield><subfield code="0">(DE-588)4072803-1</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Algorithmus</subfield><subfield code="0">(DE-588)4001183-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Raffinot, Mathieu</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg</subfield><subfield 
code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018972913&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-018972913</subfield></datafield></record></collection> |
id | DE-604.BV036081854 |
illustrated | Illustrated |
indexdate | 2024-10-30T09:02:36Z |
institution | BVB |
isbn | 0521039932 9780521813075 9780521039932 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-018972913 |
oclc_num | 1050124351 |
open_access_boolean | |
owner | DE-92 DE-83 DE-355 DE-BY-UBR |
owner_facet | DE-92 DE-83 DE-355 DE-BY-UBR |
physical | X, 221 S. Ill., graph. Darst. |
publishDate | 2007 |
publishDateSearch | 2007 |
publishDateSort | 2007 |
publisher | Cambridge Univ. Pr. |
record_format | marc |
spelling | Navarro, Gonzalo Verfasser aut Flexible pattern matching in strings practical on-line search algorithms for texts and biological sequences Gonzalo Navarro ; Mathieu Raffinot Cambridge [u.a.] Cambridge Univ. Pr. 2007 X, 221 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier Information Retrieval (DE-588)4072803-1 gnd rswk-swf Algorithmus (DE-588)4001183-5 gnd rswk-swf Information Retrieval (DE-588)4072803-1 s Algorithmus (DE-588)4001183-5 s DE-604 Raffinot, Mathieu Verfasser aut Digitalisierung UB Regensburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018972913&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Navarro, Gonzalo Raffinot, Mathieu Flexible pattern matching in strings practical on-line search algorithms for texts and biological sequences Information Retrieval (DE-588)4072803-1 gnd Algorithmus (DE-588)4001183-5 gnd |
subject_GND | (DE-588)4072803-1 (DE-588)4001183-5 |
title | Flexible pattern matching in strings practical on-line search algorithms for texts and biological sequences |
title_auth | Flexible pattern matching in strings practical on-line search algorithms for texts and biological sequences |
title_exact_search | Flexible pattern matching in strings practical on-line search algorithms for texts and biological sequences |
title_full | Flexible pattern matching in strings practical on-line search algorithms for texts and biological sequences Gonzalo Navarro ; Mathieu Raffinot |
title_fullStr | Flexible pattern matching in strings practical on-line search algorithms for texts and biological sequences Gonzalo Navarro ; Mathieu Raffinot |
title_full_unstemmed | Flexible pattern matching in strings practical on-line search algorithms for texts and biological sequences Gonzalo Navarro ; Mathieu Raffinot |
title_short | Flexible pattern matching in strings |
title_sort | flexible pattern matching in strings practical on line search algorithms for texts and biological sequences |
title_sub | practical on-line search algorithms for texts and biological sequences |
topic | Information Retrieval (DE-588)4072803-1 gnd Algorithmus (DE-588)4001183-5 gnd |
topic_facet | Information Retrieval Algorithmus |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018972913&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT navarrogonzalo flexiblepatternmatchinginstringspracticalonlinesearchalgorithmsfortextsandbiologicalsequences AT raffinotmathieu flexiblepatternmatchinginstringspracticalonlinesearchalgorithmsfortextsandbiologicalsequences |