Sequence data mining
Main authors: | Dong, Guozhu; Pei, Jian |
---|---|
Format: | Book |
Language: | English |
Published: | Berlin: Springer, 2007 |
Series: | Advances in Database Systems 33 |
Subjects: | Data Mining |
Online access: | Table of contents |
Description: | 150 pp., ill. |
ISBN: | 9780387699363 0387699368 |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV022301682 | ||
003 | DE-604 | ||
005 | 20071126 | ||
007 | t | ||
008 | 070307s2007 gw a||| |||| 00||| eng d | ||
015 | |a 07,N07,0025 |2 dnb | ||
016 | 7 | |a 982648189 |2 DE-101 | |
020 | |a 9780387699363 |c Gb. : ca. EUR 81.59 (freier Pr.), ca. sfr 129.00 (freier Pr.) |9 978-0-387-69936-3 | ||
020 | |a 0387699368 |c Gb. : ca. EUR 81.59 (freier Pr.), ca. sfr 129.00 (freier Pr.) |9 0-387-69936-8 | ||
024 | 3 | |a 9780387699363 | |
028 | 5 | 2 | |a 11736974 |
035 | |a (OCoLC)255755736 | ||
035 | |a (DE-599)BVBBV022301682 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
044 | |a gw |c XA-DE-BE | ||
049 | |a DE-355 |a DE-824 |a DE-N2 |a DE-1051 |a DE-11 | ||
084 | |a ST 530 |0 (DE-625)143679: |2 rvk | ||
084 | |a 004 |2 sdnb | ||
100 | 1 | |a Dong, Guozhu |d 1957- |e Verfasser |0 (DE-588)124939783 |4 aut | |
245 | 1 | 0 | |a Sequence data mining |c by Guozhu Dong and Jian Pei |
264 | 1 | |a Berlin |b Springer |c 2007 | |
300 | |a 150 S. |b Ill. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Advances in Database Systems |v 33 | |
650 | 0 | 7 | |a Data Mining |0 (DE-588)4428654-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Data Mining |0 (DE-588)4428654-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Pei, Jian |e Verfasser |4 aut | |
830 | 0 | |a Advances in Database Systems |v 33 |w (DE-604)BV021653394 |9 33 | |
856 | 4 | 2 | |m Digitalisierung UB Regensburg |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015511662&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-015511662 |
Record in the search index
_version_ | 1804136319584043008 |
---|---|
adam_text | Contents
1 Introduction ... 1
1.1 Examples and Applications of Sequence Data ... 1
1.1.1 Examples of Sequence Data ... 2
1.1.2 Examples of Sequence Mining Applications ... 4
1.2 Basic Definitions ... 6
1.2.1 Sequences and Sequence Types ... 6
1.2.2 Characteristics of Sequence Data ... 7
1.2.3 Sequence Patterns and Sequence Models ... 8
1.3 General Data Mining Processes and Research Issues ... 11
1.4 Overview of the Book ... 12
2 Frequent and Closed Sequence Patterns ... 15
2.1 Sequential Patterns ... 15
2.2 GSP: An Apriori-like Method ... 18
2.3 PrefixSpan: A Pattern-growth, Depth-first Search Method ... 20
2.3.1 Apriori-like, Breadth-first Search versus Pattern-growth, Depth-first Search ... 20
2.3.2 PrefixSpan ... 22
2.3.3 Pseudo-Projection ... 26
2.4 Mining Sequential Patterns with Constraints ... 28
2.4.1 Categories of Constraints ... 29
2.4.2 Mining Sequential Patterns with Prefix-Monotone Constraints ... 33
2.4.3 Prefix-Monotone Property ... 33
2.4.4 Pushing Prefix-Monotone Constraints into Sequential Pattern Mining ... 35
2.4.5 Handling Tough Aggregate Constraints by Prefix-growth ... 39
2.5 Mining Closed Sequential Patterns ... 42
2.5.1 Closed Sequential Patterns ... 42
2.5.2 Efficiently Mining Closed Sequential Patterns ... 44
2.6 Summary ... 45
3 Classification, Clustering, Features and Distances of Sequence Data ... 47
3.1 Three Tasks on Sequence Classification/Clustering ... 47
3.2 Sequence Features ... 48
3.2.1 Sequence Feature Types ... 48
3.2.2 Sequence Feature Selection ... 50
3.3 Distance Functions over Sequences ... 51
3.3.1 Overview on Sequence Distance Functions ... 51
3.3.2 Edit, Hamming, and Alignment based Distances ... 52
3.3.3 Conditional Probability Distribution based Distance ... 53
3.3.4 An Example of Feature based Distance: d2 ... 53
3.3.5 Web Session Similarity ... 54
3.4 Classification of Sequence Data ... 55
3.4.1 Support Vector Machines ... 55
3.4.2 Artificial Neural Networks ... 57
3.4.3 Other Methods ... 58
3.4.4 Evaluation of Classifiers and Classification Algorithms ... 58
3.5 Clustering Sequence Data ... 60
3.5.1 Popular Sequence Clustering Approaches ... 60
3.5.2 Quality Evaluation of Clustering Results ... 65
4 Sequence Motifs: Identifying and Characterizing Sequence Families ... 67
4.1 Motivations and Problems ... 68
4.1.1 Motivations ... 68
4.1.2 Four Motif Analysis Problems ... 69
4.2 Motif Representations ... 70
4.2.1 Consensus Sequence ... 71
4.2.2 Position Weight Matrix (PWM) ... 71
4.2.3 Markov Chain Model ... 74
4.2.4 Hidden Markov Model (HMM) ... 77
4.3 Representative Algorithms for Motif Problems ... 79
4.3.1 Dynamic Programming for Sequence Scoring and Explanation with HMM ... 80
4.3.2 Gibbs Sampling for Constructing PWM-based Motif ... 82
4.3.3 Expectation Maximization for Building HMM ... 84
4.4 Discussion ... 86
5 Mining Partial Orders from Sequences ... 89
5.1 Mining Frequent Closed Partial Orders ... 91
5.1.1 Problem Definition ... 91
5.1.2 How Is Frequent Closed Partial Order Mining Different from Other Data Mining Tasks? ... 94
5.1.3 TranClose: A Rudimentary Method ... 97
5.1.4 Algorithm Frecpo ... 100
5.1.5 Applications ... 106
5.2 Mining Global Partial Orders ... 107
5.2.1 Motivation and Preliminaries ... 107
5.2.2 Mining Algorithms ... 108
5.2.3 Mixture Models ... 111
5.3 Summary ... 112
6 Distinguishing Sequence Patterns ... 113
6.1 Categories of Distinguishing Sequence Patterns ... 113
6.2 Class-Characteristics Distinguishing Sequence Patterns ... 115
6.2.1 Definitions and Terminology ... 115
6.2.2 The ConSGapMiner Algorithm ... 117
6.2.3 Extending ConSGapMiner: Minimum Gap Constraints ... 124
6.2.4 Extending ConSGapMiner: Coverage and Prefix-Based Pattern Minimization ... 126
6.3 Surprising Sequence Patterns ... 128
7 Related Topics ... 131
7.1 Structured-Data Mining ... 131
7.2 Partial Periodic Pattern Mining ... 132
7.3 Bioinformatics ... 134
7.4 Sequence Alignment ... 135
7.5 Biological Sequence Databases and Biological Data Analysis Resources ... 137
References ... 139
Index ... 147
|
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Dong, Guozhu 1957- Pei, Jian |
author_GND | (DE-588)124939783 |
author_facet | Dong, Guozhu 1957- Pei, Jian |
author_role | aut aut |
author_sort | Dong, Guozhu 1957- |
author_variant | g d gd j p jp |
building | Verbundindex |
bvnumber | BV022301682 |
classification_rvk | ST 530 |
ctrlnum | (OCoLC)255755736 (DE-599)BVBBV022301682 |
discipline | Informatik |
discipline_str_mv | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01691nam a2200433 cb4500</leader><controlfield tag="001">BV022301682</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20071126 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">070307s2007 gw a||| |||| 00||| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">07,N07,0025</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">982648189</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780387699363</subfield><subfield code="c">Gb. : ca. EUR 81.59 (freier Pr.), ca. sfr 129.00 (freier Pr.)</subfield><subfield code="9">978-0-387-69936-3</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0387699368</subfield><subfield code="c">Gb. : ca. EUR 81.59 (freier Pr.), ca. 
sfr 129.00 (freier Pr.)</subfield><subfield code="9">0-387-69936-8</subfield></datafield><datafield tag="024" ind1="3" ind2=" "><subfield code="a">9780387699363</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">11736974</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)255755736</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV022301682</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">gw</subfield><subfield code="c">XA-DE-BE</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-355</subfield><subfield code="a">DE-824</subfield><subfield code="a">DE-N2</subfield><subfield code="a">DE-1051</subfield><subfield code="a">DE-11</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 530</subfield><subfield code="0">(DE-625)143679:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">004</subfield><subfield code="2">sdnb</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Dong, Guozhu</subfield><subfield code="d">1957-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)124939783</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Sequence data mining</subfield><subfield code="c">by Guozhu Dong and Jian Pei</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Berlin</subfield><subfield code="b">Springer</subfield><subfield code="c">2007</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">150 S.</subfield><subfield 
code="b">Ill.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Advances in Database Systems</subfield><subfield code="v">33</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Data Mining</subfield><subfield code="0">(DE-588)4428654-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Pei, Jian</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Advances in Database Systems</subfield><subfield code="v">33</subfield><subfield code="w">(DE-604)BV021653394</subfield><subfield code="9">33</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015511662&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-015511662</subfield></datafield></record></collection> |
id | DE-604.BV022301682 |
illustrated | Illustrated |
index_date | 2024-07-02T16:55:18Z |
indexdate | 2024-07-09T20:54:31Z |
institution | BVB |
isbn | 9780387699363 0387699368 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-015511662 |
oclc_num | 255755736 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR DE-824 DE-N2 DE-1051 DE-11 |
owner_facet | DE-355 DE-BY-UBR DE-824 DE-N2 DE-1051 DE-11 |
physical | 150 S. Ill. |
publishDate | 2007 |
publishDateSearch | 2007 |
publishDateSort | 2007 |
publisher | Springer |
record_format | marc |
series | Advances in Database Systems |
series2 | Advances in Database Systems |
spelling | Dong, Guozhu 1957- Verfasser (DE-588)124939783 aut Sequence data mining by Guozhu Dong and Jian Pei Berlin Springer 2007 150 S. Ill. txt rdacontent n rdamedia nc rdacarrier Advances in Database Systems 33 Data Mining (DE-588)4428654-5 gnd rswk-swf Data Mining (DE-588)4428654-5 s DE-604 Pei, Jian Verfasser aut Advances in Database Systems 33 (DE-604)BV021653394 33 Digitalisierung UB Regensburg application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015511662&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Dong, Guozhu 1957- Pei, Jian Sequence data mining Advances in Database Systems Data Mining (DE-588)4428654-5 gnd |
subject_GND | (DE-588)4428654-5 |
title | Sequence data mining |
title_auth | Sequence data mining |
title_exact_search | Sequence data mining |
title_exact_search_txtP | Sequence data mining |
title_full | Sequence data mining by Guozhu Dong and Jian Pei |
title_fullStr | Sequence data mining by Guozhu Dong and Jian Pei |
title_full_unstemmed | Sequence data mining by Guozhu Dong and Jian Pei |
title_short | Sequence data mining |
title_sort | sequence data mining |
topic | Data Mining (DE-588)4428654-5 gnd |
topic_facet | Data Mining |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=015511662&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV021653394 |
work_keys_str_mv | AT dongguozhu sequencedatamining AT peijian sequencedatamining |