Context sensitive neural networks for the classification of DNA, RNA and protein sequences
Main author: | Mock, Florian 1991- |
---|---|
Format: | Thesis Book |
Language: | English German |
Published: | Jena [2022?] |
Subjects: | DNS; Molekülstruktur; Neuronales Netz; Klassifikation; Hochschulschrift |
Online access: | Table of contents |
Description: | German summary: pages v-vi |
Physical description: | xviii, 137 pages, illustrations, diagrams, 30 cm |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV048973746 | ||
003 | DE-604 | ||
005 | 20230718 | ||
007 | t | ||
008 | 230525s2022 gw a||| m||| 00||| eng d | ||
015 | |a 23,H04 |2 dnb | ||
016 | 7 | |a 1272996719 |2 DE-101 | |
020 | |c Broschur | ||
035 | |a (OCoLC)1348921342 | ||
035 | |a (DE-599)DNB1272996719 | ||
040 | |a DE-604 |b ger |e rda | ||
041 | 0 | |a eng |a ger | |
044 | |a gw |c XA-DE | ||
049 | |a DE-355 | ||
084 | |a ST 301 |0 (DE-625)143651: |2 rvk | ||
084 | |8 1\p |a 570 |2 23sdnb | ||
100 | 1 | |a Mock, Florian |d 1991- |e Verfasser |0 (DE-588)1214757367 |4 aut | |
245 | 1 | 0 | |a Context sensitive neural networks for the classification of DNA, RNA and protein sequences |c von M. Sc. Florian Mock |
264 | 1 | |a Jena |c [2022?] | |
300 | |a xviii, 137 Seiten |b Illustrationen, Diagramme |c 30 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Deutsche Zusammenfassung: Seite v-vi | ||
502 | |b Dissertation |c Friedrich-Schiller-Universität Jena |d 2022 | ||
650 | 0 | 7 | |a DNS |0 (DE-588)4070512-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Molekülstruktur |0 (DE-588)4170383-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Neuronales Netz |0 (DE-588)4226127-2 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Klassifikation |0 (DE-588)4030958-7 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4113937-9 |a Hochschulschrift |2 gnd-content | |
689 | 0 | 0 | |a Neuronales Netz |0 (DE-588)4226127-2 |D s |
689 | 0 | 1 | |a Klassifikation |0 (DE-588)4030958-7 |D s |
689 | 0 | 2 | |a Molekülstruktur |0 (DE-588)4170383-2 |D s |
689 | 0 | 3 | |a DNS |0 (DE-588)4070512-2 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m B:DE-101 |q application/pdf |u https://d-nb.info/1272996719/04 |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m DNB Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
856 | 4 | 2 | |m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-034237323 | ||
883 | 2 | |8 1\p |a dnb |d 20230308 |q DE-101 |u https://d-nb.info/provenance/plan#dnb |
Record in search index
_version_ | 1804185215347720192 |
---|---|
adam_text | Contents
1 Preface ... 1
1.1 Motivation ... 1
1.2 Contribution and Scope of This Thesis ... 2
2 Background ... 5
2.1 What Are DNA, RNA and Protein Sequences? ... 5
2.2 Why Does Context Matter? ... 7
2.3 Why Machine Learning? ... 8
2.3.1 Overview of Machine Learning Fields ... 8
2.3.2 More Detailed Look Into Supervised Machine Learning ... 11
2.4 What Are Neural Networks - And Why Use Them? ... 14
2.4.1 Benefits of Neural Networks ... 14
2.4.2 Working Principle ... 16
2.4.3 Training Principle and Typical Problems While Training ... 17
2.4.4 Empower Neural Networks for Sequence Classification ... 19
2.4.5 The Field of Natural Language Processing ... 22
2.4.6 Transfer Learning ... 23
2.5 What Data Is Needed? ... 24
2.6 How to Statistically Evaluate the Quality of My Model? ... 24
2.7 Take Away ... 29
3 Correct Categorization Exemplified by Viral-Host Prediction ... 31
3.1 Background and Prospects ... 31
3.1.1 Limiting Viral Outbreaks Before Getting a Global Thread ... 32
3.1.2 Previous Approaches ... 32
3.2 Building a Deep Neural Network Approach to Determine the Host of a Virus ... 33
3.2.1 Data ... 34
3.2.2 Deep Neural Network Architecture ... 40
3.2.3 Final Host Prediction From Fragment Predictions ... 41
3.3 Deep Neural Networks Are Well Suited for Host Prediction ... 42
3.3.1 Comparison ... 42
3.3.2 Best Practice and Useful Observations ... 45
3.3.3 Combining Fragment Host Predictions Results in Higher Accuracy ... 46
3.3.4 VIDHOP Outperforms Other Approaches ... 47
3.3.5 Potential Optimizations ... 50
3.3.6 Generalization on New Viruses ... 50
3.4 Deep Neural Networks Support Scientific Progress ... 51
3.4.1 Chances for Tackling Future Pandemics ... 51
3.4.2 Improved Categorization With Deep Neural Networks ... 52
4 Localization of Areas of Interest Exemplified by Linear B-cell Epitopes Identification ... 53
4.1 Background and Prospects ... 53
4.1.1 Accelerating the Antibody Test Development ... 54
4.1.2 Previous Approaches ... 56
4.2 Creating a Hybrid Approach to Detect B-Cell Epitopes, With and Without Context ... 56
4.2.1 Pre-Training: Learning the Protein-Context ... 57
4.2.2 Data ... 58
4.2.3 Deep Neural Network Architecture ... 63
4.2.4 Benchmark Method ... 65
4.3 Context Helps Deep Neural Networks to Classify Epitopes ... 66
4.3.1 Comparison With Competition ... 66
4.3.2 Evaluation Recap ... 69
4.3.3 EpiDope Output and Visualization ... 70
4.3.4 Potential Optimizations ... 70
4.4 Context Is Essential When Locating Areas of Interest ... 71
4.4.1 Accelerated Development of Diagnostics Possible ... 71
4.4.2 Progress in Locating Areas of Interest ... 72
5 Categorization and Attribution Exemplified by Taxonomic Read Classification ... 73
5.1 Background and Prospects ... 73
5.1.1 Reduce the Metagenomic Dark Matter ... 74
5.1.2 Previous Approaches ... 74
5.2 Developing a NLP Based Taxonomy Predictor ... 75
5.2.1 Pre-training: From DNA Sequence to Understanding Context ... 76
5.2.2 Fine-tuning: From Context to Taxonomic Prediction ... 79
5.2.3 Data ... 80
5.2.4 Deep Neural Network Architectures ... 85
5.2.5 Evaluating Competing Methods ... 87
5.3 NLP Can Partially Compensate Missing Data ... 88
5.3.1 Better to Predict All at Once Than to Infer ... 88
5.3.2 Comparable Performance on Similar Dataset ... 89
5.3.3 Competing Methods Fail on Unknown Sequences ... 92
5.3.4 The More Data the Better Is BERTax ... 93
5.3.5 Attribution, or How to Peek in the Black-Box ... 94
5.3.6 Broad Usability of BERTax ... 96
5.3.7 Potential Optimizations ... 96
5.4 NLP Predictor Very Suitable for Taxonomic Classification but Difficult to Understand ... 97
5.4.1 Fuller Picture of Metagenomic Samples ... 97
5.4.2 Attribution Is Only the Start, and Maybe the Limit ... 98
6 Conclusion ... 99
6.1 Challenges in Developing Neural Networks for Biological Data ... 100
6.2 Common Challenges With Neural Networks ... 101
6.3 Put Into Context ... 104
6.4 Bitter Lessons and Sweet Expectations ... 105
|
any_adam_object | 1 |
any_adam_object_boolean | 1 |
author | Mock, Florian 1991- |
author_GND | (DE-588)1214757367 |
author_facet | Mock, Florian 1991- |
author_role | aut |
author_sort | Mock, Florian 1991- |
author_variant | f m fm |
building | Verbundindex |
bvnumber | BV048973746 |
classification_rvk | ST 301 |
ctrlnum | (OCoLC)1348921342 (DE-599)DNB1272996719 |
discipline | Informatik |
discipline_str_mv | Informatik |
format | Thesis Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02333nam a2200505 c 4500</leader><controlfield tag="001">BV048973746</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20230718 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">230525s2022 gw a||| m||| 00||| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">23,H04</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">1272996719</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="c">Broschur</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1348921342</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DNB1272996719</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield><subfield code="a">ger</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">gw</subfield><subfield code="c">XA-DE</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-355</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 301</subfield><subfield code="0">(DE-625)143651:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="8">1\p</subfield><subfield code="a">570</subfield><subfield code="2">23sdnb</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Mock, Florian</subfield><subfield code="d">1991-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1214757367</subfield><subfield code="4">aut</subfield></datafield><datafield 
tag="245" ind1="1" ind2="0"><subfield code="a">Context sensitive neural networks for the classification of DNA, RNA and protein sequences</subfield><subfield code="c">von M. Sc. Florian Mock</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Jena</subfield><subfield code="c">[2022?]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xviii, 137 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield><subfield code="c">30 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Deutsche Zusammenfassung: Seite v-vi</subfield></datafield><datafield tag="502" ind1=" " ind2=" "><subfield code="b">Dissertation</subfield><subfield code="c">Friedrich-Schiller-Universität Jena</subfield><subfield code="d">2022</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">DNS</subfield><subfield code="0">(DE-588)4070512-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Molekülstruktur</subfield><subfield code="0">(DE-588)4170383-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Neuronales Netz</subfield><subfield code="0">(DE-588)4226127-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Klassifikation</subfield><subfield code="0">(DE-588)4030958-7</subfield><subfield 
code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4113937-9</subfield><subfield code="a">Hochschulschrift</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Neuronales Netz</subfield><subfield code="0">(DE-588)4226127-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Klassifikation</subfield><subfield code="0">(DE-588)4030958-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Molekülstruktur</subfield><subfield code="0">(DE-588)4170383-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">DNS</subfield><subfield code="0">(DE-588)4070512-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">B:DE-101</subfield><subfield code="q">application/pdf</subfield><subfield code="u">https://d-nb.info/1272996719/04</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">DNB Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield 
code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-034237323</subfield></datafield><datafield tag="883" ind1="2" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">dnb</subfield><subfield code="d">20230308</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#dnb</subfield></datafield></record></collection> |
genre | (DE-588)4113937-9 Hochschulschrift gnd-content |
genre_facet | Hochschulschrift |
id | DE-604.BV048973746 |
illustrated | Illustrated |
index_date | 2024-07-03T22:03:29Z |
indexdate | 2024-07-10T09:51:41Z |
institution | BVB |
language | English German |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-034237323 |
oclc_num | 1348921342 |
open_access_boolean | |
owner | DE-355 DE-BY-UBR |
owner_facet | DE-355 DE-BY-UBR |
physical | xviii, 137 Seiten Illustrationen, Diagramme 30 cm |
publishDate | 2022 |
publishDateSearch | 2022 |
publishDateSort | 2022 |
record_format | marc |
spelling | Mock, Florian 1991- Verfasser (DE-588)1214757367 aut Context sensitive neural networks for the classification of DNA, RNA and protein sequences von M. Sc. Florian Mock Jena [2022?] xviii, 137 Seiten Illustrationen, Diagramme 30 cm txt rdacontent n rdamedia nc rdacarrier Deutsche Zusammenfassung: Seite v-vi Dissertation Friedrich-Schiller-Universität Jena 2022 DNS (DE-588)4070512-2 gnd rswk-swf Molekülstruktur (DE-588)4170383-2 gnd rswk-swf Neuronales Netz (DE-588)4226127-2 gnd rswk-swf Klassifikation (DE-588)4030958-7 gnd rswk-swf (DE-588)4113937-9 Hochschulschrift gnd-content Neuronales Netz (DE-588)4226127-2 s Klassifikation (DE-588)4030958-7 s Molekülstruktur (DE-588)4170383-2 s DNS (DE-588)4070512-2 s DE-604 B:DE-101 application/pdf https://d-nb.info/1272996719/04 Inhaltsverzeichnis DNB Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis 1\p dnb 20230308 DE-101 https://d-nb.info/provenance/plan#dnb |
spellingShingle | Mock, Florian 1991- Context sensitive neural networks for the classification of DNA, RNA and protein sequences DNS (DE-588)4070512-2 gnd Molekülstruktur (DE-588)4170383-2 gnd Neuronales Netz (DE-588)4226127-2 gnd Klassifikation (DE-588)4030958-7 gnd |
subject_GND | (DE-588)4070512-2 (DE-588)4170383-2 (DE-588)4226127-2 (DE-588)4030958-7 (DE-588)4113937-9 |
title | Context sensitive neural networks for the classification of DNA, RNA and protein sequences |
title_auth | Context sensitive neural networks for the classification of DNA, RNA and protein sequences |
title_exact_search | Context sensitive neural networks for the classification of DNA, RNA and protein sequences |
title_exact_search_txtP | Context sensitive neural networks for the classification of DNA, RNA and protein sequences |
title_full | Context sensitive neural networks for the classification of DNA, RNA and protein sequences von M. Sc. Florian Mock |
title_fullStr | Context sensitive neural networks for the classification of DNA, RNA and protein sequences von M. Sc. Florian Mock |
title_full_unstemmed | Context sensitive neural networks for the classification of DNA, RNA and protein sequences von M. Sc. Florian Mock |
title_short | Context sensitive neural networks for the classification of DNA, RNA and protein sequences |
title_sort | context sensitive neural networks for the classification of dna rna and protein sequences |
topic | DNS (DE-588)4070512-2 gnd Molekülstruktur (DE-588)4170383-2 gnd Neuronales Netz (DE-588)4226127-2 gnd Klassifikation (DE-588)4030958-7 gnd |
topic_facet | DNS Molekülstruktur Neuronales Netz Klassifikation Hochschulschrift |
url | https://d-nb.info/1272996719/04 http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT mockflorian contextsensitiveneuralnetworksfortheclassificationofdnarnaandproteinsequences |
No print copy is available.
Table of contents