Availability: Context sensitive neural networks for the classification of DNA, RNA and protein sequences

Context sensitive neural networks for the classification of DNA, RNA and protein sequences

Bibliographic details
Main author:	Mock, Florian, 1991- (author)
Format:	Thesis, Book
Language:	English, German
Published:	Jena [2022?]
Subjects:	DNA; Molecular structure; Neural network; Classification; University thesis
Online access:	Table of contents
Description:	German summary: pages v-vi
Physical description:	xviii, 137 pages, illustrations, diagrams, 30 cm

Internal format

MARC


LEADER	00000nam a2200000 c 4500
001	BV048973746
003	DE-604
005	20230718
007	t
008	230525s2022 gw a||| m||| 00||| eng d
015			|a 23,H04 |2 dnb
016	7		|a 1272996719 |2 DE-101
020			|c Broschur
035			|a (OCoLC)1348921342
035			|a (DE-599)DNB1272996719
040			|a DE-604 |b ger |e rda
041	0		|a eng |a ger
044			|a gw |c XA-DE
049			|a DE-355
084			|a ST 301 |0 (DE-625)143651: |2 rvk
084			|8 1\p |a 570 |2 23sdnb
100	1		|a Mock, Florian |d 1991- |e Verfasser |0 (DE-588)1214757367 |4 aut
245	1	0	|a Context sensitive neural networks for the classification of DNA, RNA and protein sequences |c von M. Sc. Florian Mock
264		1	|a Jena |c [2022?]
300			|a xviii, 137 Seiten |b Illustrationen, Diagramme |c 30 cm
336			|b txt |2 rdacontent
337			|b n |2 rdamedia
338			|b nc |2 rdacarrier
500			|a Deutsche Zusammenfassung: Seite v-vi
502			|b Dissertation |c Friedrich-Schiller-Universität Jena |d 2022
650	0	7	|a DNS |0 (DE-588)4070512-2 |2 gnd |9 rswk-swf
650	0	7	|a Molekülstruktur |0 (DE-588)4170383-2 |2 gnd |9 rswk-swf
650	0	7	|a Neuronales Netz |0 (DE-588)4226127-2 |2 gnd |9 rswk-swf
650	0	7	|a Klassifikation |0 (DE-588)4030958-7 |2 gnd |9 rswk-swf
655		7	|0 (DE-588)4113937-9 |a Hochschulschrift |2 gnd-content
689	0	0	|a Neuronales Netz |0 (DE-588)4226127-2 |D s
689	0	1	|a Klassifikation |0 (DE-588)4030958-7 |D s
689	0	2	|a Molekülstruktur |0 (DE-588)4170383-2 |D s
689	0	3	|a DNS |0 (DE-588)4070512-2 |D s
689	0		|5 DE-604
856	4	2	|m B:DE-101 |q application/pdf |u https://d-nb.info/1272996719/04 |3 Inhaltsverzeichnis
856	4	2	|m DNB Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
856	4	2	|m Digitalisierung UB Regensburg - ADAM Catalogue Enrichment |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis
999			|a oai:aleph.bib-bvb.de:BVB01-034237323
883	2		|8 1\p |a dnb |d 20230308 |q DE-101 |u https://d-nb.info/provenance/plan#dnb

Record in the search index

_version_	1804185215347720192
adam_text	Contents
1 Preface 1
1.1 Motivation 1
1.2 Contribution and Scope of This Thesis 2
2 Background 5
2.1 What Are DNA, RNA and Protein Sequences? 5
2.2 Why Does Context Matter? 7
2.3 Why Machine Learning? 8
2.3.1 Overview of Machine Learning Fields 8
2.3.2 More Detailed Look Into Supervised Machine Learning 11
2.4 What Are Neural Networks - And Why Use Them? 14
2.4.1 Benefits of Neural Networks 14
2.4.2 Working Principle 16
2.4.3 Training Principle and Typical Problems While Training 17
2.4.4 Empower Neural Networks for Sequence Classification 19
2.4.5 The Field of Natural Language Processing 22
2.4.6 Transfer Learning 23
2.5 What Data Is Needed? 24
2.6 How to Statistically Evaluate the Quality of My Model? 24
2.7 Take Away 29
3 Correct Categorization Exemplified by Viral-Host Prediction 31
3.1 Background and Prospects 31
3.1.1 Limiting Viral Outbreaks Before Getting a Global Thread 32
3.1.2 Previous Approaches 32
3.2 Building a Deep Neural Network Approach to Determine the Host of a Virus 33
3.2.1 Data 34
3.2.2 Deep Neural Network Architecture 40
3.2.3 Final Host Prediction From Fragment Predictions 41
3.3 Deep Neural Networks Are Well Suited for Host Prediction 42
3.3.1 Comparison 42
3.3.2 Best Practice and Useful Observations 45
3.3.3 Combining Fragment Host Predictions Results in Higher Accuracy 46
3.3.4 VIDHOP Outperforms Other Approaches 47
3.3.5 Potential Optimizations 50
3.3.6 Generalization on New Viruses 50
3.4 Deep Neural Networks Support Scientific Progress 51
3.4.1 Chances for Tackling Future Pandemics 51
3.4.2 Improved Categorization With Deep Neural Networks 52
4 Localization of Areas of Interest Exemplified by Linear B-cell Epitopes Identification 53
4.1 Background and Prospects 53
4.1.1 Accelerating the Antibody Test Development 54
4.1.2 Previous Approaches 56
4.2 Creating a Hybrid Approach to Detect B-Cell Epitopes, With and Without Context 56
4.2.1 Pre-Training: Learning the Protein-Context 57
4.2.2 Data 58
4.2.3 Deep Neural Network Architecture 63
4.2.4 Benchmark Method 65
4.3 Context Helps Deep Neural Networks to Classify Epitopes 66
4.3.1 Comparison With Competition 66
4.3.2 Evaluation Recap 69
4.3.3 EpiDope Output and Visualization 70
4.3.4 Potential Optimizations 70
4.4 Context Is Essential When Locating Areas of Interest 71
4.4.1 Accelerated Development of Diagnostics Possible 71
4.4.2 Progress in Locating Areas of Interest 72
5 Categorization and Attribution Exemplified by Taxonomic Read Classification 73
5.1 Background and Prospects 73
5.1.1 Reduce the Metagenomic Dark Matter 74
5.1.2 Previous Approaches 74
5.2 Developing a NLP Based Taxonomy Predictor 75
5.2.1 Pre-training: From DNA Sequence to Understanding Context 76
5.2.2 Fine-tuning: From Context to Taxonomic Prediction 79
5.2.3 Data 80
5.2.4 Deep Neural Network Architectures 85
5.2.5 Evaluating Competing Methods 87
5.3 NLP Can Partially Compensate Missing Data 88
5.3.1 Better to Predict All at Once Than to Infer 88
5.3.2 Comparable Performance on Similar Dataset 89
5.3.3 Competing Methods Fail on Unknown Sequences 92
5.3.4 The More Data the Better Is BERTax 93
5.3.5 Attribution, or How to Peek in the Black-Box 94
5.3.6 Broad Usability of BERTax 96
5.3.7 Potential Optimizations 96
5.4 NLP Predictor Very Suitable for Taxonomic Classification but Difficult to Understand 97
5.4.1 Fuller Picture of Metagenomic Samples 97
5.4.2 Attribution Is Only the Start, and Maybe the Limit 98
6 Conclusion 99
6.1 Challenges in Developing Neural Networks for Biological Data 100
6.2 Common Challenges With Neural Networks 101
6.3 Put Into Context 104
6.4 Bitter Lessons and Sweet Expectations 105
any_adam_object	1
any_adam_object_boolean	1
author	Mock, Florian 1991-
author_GND	(DE-588)1214757367
author_facet	Mock, Florian 1991-
author_role	aut
author_sort	Mock, Florian 1991-
author_variant	f m fm
building	Verbundindex
bvnumber	BV048973746
classification_rvk	ST 301
ctrlnum	(OCoLC)1348921342 (DE-599)DNB1272996719
discipline	Informatik
discipline_str_mv	Informatik
format	Thesis Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02333nam a2200505 c 4500</leader><controlfield tag="001">BV048973746</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20230718 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">230525s2022 gw a\|\|\| m\|\|\| 00\|\|\| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">23,H04</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="016" ind1="7" ind2=" "><subfield code="a">1272996719</subfield><subfield code="2">DE-101</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="c">Broschur</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1348921342</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DNB1272996719</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rda</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield><subfield code="a">ger</subfield></datafield><datafield tag="044" ind1=" " ind2=" "><subfield code="a">gw</subfield><subfield code="c">XA-DE</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-355</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 301</subfield><subfield code="0">(DE-625)143651:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="8">1\p</subfield><subfield code="a">570</subfield><subfield code="2">23sdnb</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Mock, Florian</subfield><subfield code="d">1991-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1214757367</subfield><subfield 
code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Context sensitive neural networks for the classification of DNA, RNA and protein sequences</subfield><subfield code="c">von M. Sc. Florian Mock</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Jena</subfield><subfield code="c">[2022?]</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xviii, 137 Seiten</subfield><subfield code="b">Illustrationen, Diagramme</subfield><subfield code="c">30 cm</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Deutsche Zusammenfassung: Seite v-vi</subfield></datafield><datafield tag="502" ind1=" " ind2=" "><subfield code="b">Dissertation</subfield><subfield code="c">Friedrich-Schiller-Universität Jena</subfield><subfield code="d">2022</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">DNS</subfield><subfield code="0">(DE-588)4070512-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Molekülstruktur</subfield><subfield code="0">(DE-588)4170383-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Neuronales Netz</subfield><subfield code="0">(DE-588)4226127-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Klassifikation</subfield><subfield 
code="0">(DE-588)4030958-7</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4113937-9</subfield><subfield code="a">Hochschulschrift</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Neuronales Netz</subfield><subfield code="0">(DE-588)4226127-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Klassifikation</subfield><subfield code="0">(DE-588)4030958-7</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Molekülstruktur</subfield><subfield code="0">(DE-588)4170383-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="3"><subfield code="a">DNS</subfield><subfield code="0">(DE-588)4070512-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">B:DE-101</subfield><subfield code="q">application/pdf</subfield><subfield code="u">https://d-nb.info/1272996719/04</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">DNB Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">Digitalisierung UB Regensburg - ADAM Catalogue Enrichment</subfield><subfield code="q">application/pdf</subfield><subfield 
code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-034237323</subfield></datafield><datafield tag="883" ind1="2" ind2=" "><subfield code="8">1\p</subfield><subfield code="a">dnb</subfield><subfield code="d">20230308</subfield><subfield code="q">DE-101</subfield><subfield code="u">https://d-nb.info/provenance/plan#dnb</subfield></datafield></record></collection>
genre	(DE-588)4113937-9 Hochschulschrift gnd-content
genre_facet	Hochschulschrift
id	DE-604.BV048973746
illustrated	Illustrated
index_date	2024-07-03T22:03:29Z
indexdate	2024-07-10T09:51:41Z
institution	BVB
language	English German
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-034237323
oclc_num	1348921342
open_access_boolean
owner	DE-355 DE-BY-UBR
owner_facet	DE-355 DE-BY-UBR
physical	xviii, 137 Seiten Illustrationen, Diagramme 30 cm
publishDate	2022
publishDateSearch	2022
publishDateSort	2022
record_format	marc
spelling	Mock, Florian 1991- Verfasser (DE-588)1214757367 aut Context sensitive neural networks for the classification of DNA, RNA and protein sequences von M. Sc. Florian Mock Jena [2022?] xviii, 137 Seiten Illustrationen, Diagramme 30 cm txt rdacontent n rdamedia nc rdacarrier Deutsche Zusammenfassung: Seite v-vi Dissertation Friedrich-Schiller-Universität Jena 2022 DNS (DE-588)4070512-2 gnd rswk-swf Molekülstruktur (DE-588)4170383-2 gnd rswk-swf Neuronales Netz (DE-588)4226127-2 gnd rswk-swf Klassifikation (DE-588)4030958-7 gnd rswk-swf (DE-588)4113937-9 Hochschulschrift gnd-content Neuronales Netz (DE-588)4226127-2 s Klassifikation (DE-588)4030958-7 s Molekülstruktur (DE-588)4170383-2 s DNS (DE-588)4070512-2 s DE-604 B:DE-101 application/pdf https://d-nb.info/1272996719/04 Inhaltsverzeichnis DNB Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis Digitalisierung UB Regensburg - ADAM Catalogue Enrichment application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis 1\p dnb 20230308 DE-101 https://d-nb.info/provenance/plan#dnb
spellingShingle	Mock, Florian 1991- Context sensitive neural networks for the classification of DNA, RNA and protein sequences DNS (DE-588)4070512-2 gnd Molekülstruktur (DE-588)4170383-2 gnd Neuronales Netz (DE-588)4226127-2 gnd Klassifikation (DE-588)4030958-7 gnd
subject_GND	(DE-588)4070512-2 (DE-588)4170383-2 (DE-588)4226127-2 (DE-588)4030958-7 (DE-588)4113937-9
title	Context sensitive neural networks for the classification of DNA, RNA and protein sequences
title_auth	Context sensitive neural networks for the classification of DNA, RNA and protein sequences
title_exact_search	Context sensitive neural networks for the classification of DNA, RNA and protein sequences
title_exact_search_txtP	Context sensitive neural networks for the classification of DNA, RNA and protein sequences
title_full	Context sensitive neural networks for the classification of DNA, RNA and protein sequences von M. Sc. Florian Mock
title_fullStr	Context sensitive neural networks for the classification of DNA, RNA and protein sequences von M. Sc. Florian Mock
title_full_unstemmed	Context sensitive neural networks for the classification of DNA, RNA and protein sequences von M. Sc. Florian Mock
title_short	Context sensitive neural networks for the classification of DNA, RNA and protein sequences
title_sort	context sensitive neural networks for the classification of dna rna and protein sequences
topic	DNS (DE-588)4070512-2 gnd Molekülstruktur (DE-588)4170383-2 gnd Neuronales Netz (DE-588)4226127-2 gnd Klassifikation (DE-588)4030958-7 gnd
topic_facet	DNS Molekülstruktur Neuronales Netz Klassifikation Hochschulschrift
url	https://d-nb.info/1272996719/04 http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000001&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=034237323&sequence=000003&line_number=0002&func_code=DB_RECORDS&service_type=MEDIA
work_keys_str_mv	AT mockflorian contextsensitiveneuralnetworksfortheclassificationofdnarnaandproteinsequences

Availability

No print copy is available.

Order via interlibrary loan. Note: not in the THWS holdings! Table of contents