Cross-layer reliability of computing systems

Bibliographic details
Other contributors: Di Natale, Giorgio (editor), Gizopoulos, Dimitris (editor), Di Carlo, Stefano (editor), Bosio, Alberto (editor), Canal, Ramon (editor)
Format: Electronic e-book
Language: English
Published: Stevenage: The Institution of Engineering and Technology, 2020
Series: IET materials, circuits and devices series 57
Online access: TUM01, UBY01, UER01
Summary: This book presents state-of-the-art solutions for increasing the resilience of computing systems, both at single levels of abstraction and across multiple layers. It is a valuable resource for researchers, postgraduate students and professional computer architects focusing on the dependability of computing systems.
Table of contents:
Intro
Contents
Part I: Design techniques to improve the resilience of computing systems
  1. Technological layer | Antonio Rubio and Ramon Canal
    1.1 Introduction
      1.1.1 Faults, errors and failures
    1.2 Technology overview
      1.2.1 Technologies based on electric charge
      1.2.2 Roadmap for adoption
      1.2.3 Sources of unreliability in technology
    1.3 CPU building blocks
      1.3.1 Combinatorial circuits
      1.3.2 Memories
      1.3.3 Main memory and storage
      1.3.4 Emerging memories
    1.4 Characterization
      1.4.1 Manufacturing
      1.4.2 Radiation
    1.5 Conclusions
    References
  2. Design techniques to improve the resilience of computing systems: logic layer | Lorena Anghel and Michael Nicolaidis
    2.1 Introduction
    2.2 Performance and reliability monitors
      2.2.1 Double-sampling methodology and the basic architecture
    2.3 Double-sampling-based monitors for detecting performance violations and transient faults
      2.3.1 External-design monitors
      2.3.2 Embedded monitors
      2.3.3 Other types of monitors
      2.3.4 Discussions
    2.4 Conclusions
    References
  3. Design techniques to improve the resilience of computing systems: architectural layer | Aviral Shrivastava, Kyoungwoo Lee, Hwisoo So, Jinhyo Jung, and Prudhvi Gali
    3.1 Cache protection techniques
    3.2 Register file protection techniques
    3.3 Pipeline and core protection
    References
  4. Design techniques to improve the resilience of computing systems: software layer | Alberto Bosio, Stefano Di Carlo, Giorgio Di Natale, Matteo Sonza Reorda, and Josie E. Rodriguez Condia
    4.1 Introduction
    4.2 Fault taxonomy
      4.2.1 Software faults
    4.3 Software-Implemented Hardware Fault Tolerance
      4.3.1 Modify the software in order to reduce the probability of fault occurrences
      4.3.2 Detecting/tolerating the presence of an error
    4.4 Software-Based Self-Test
      4.4.1 Basics on SBST
    4.5 SBST for GPGPUs
      4.5.1 Introduction
      4.5.2 Effects of permanent faults in GPGPU devices
      4.5.3 SBST techniques for testing the GPGPU scheduler
    References
  5. Cross-layer resilience | Eric Cheng and Subhasish Mitra
    5.1 Introduction
    5.2 CLEAR framework
      5.2.1 Reliability analysis
      5.2.2 Execution time
      5.2.3 Physical design
      5.2.4 Resilience library
    5.3 Cross-layer combinations
      5.3.1 Combinations for general-purpose processors
      5.3.2 Targeting specific applications
    5.4 Application benchmark dependence
    5.5 The design of new resilience techniques
    5.6 Conclusions
    Acknowledgments
    References
Part II: Reliability assessment
  6. Physical stress | Fernando Fernandes dos Santos, Fabio Benevenuti, Gennaro Rodrigues, Fernanda Kastensmidt, and Paolo Rech
    6.1 Introduction
    6.2 Effects and physical sources
    6.3 Reliability metrics
    6.4 General setup
    6.5 Neutron beam experiments
    6.6 Heavy ions and proton experiments
    6.7 Laser test
    6.8 Conclusions
    References
  7. Soft error modeling and simulation | Mojtaba Ebrahimi and Mehdi Tahoori
    7.1 Introduction
    7.2 FIT rate analysis at device level
    7.3 Multiple transient error site identification using layout information
      7.3.1 Motivation for layout-based MT analysis and mitigation
      7.3.2 Proposed layout-based MT error site extraction technique
      7.3.3 Experimental results of MT modeling
    7.4 Propagating flip-flop errors at circuit level
      7.4.1 Event-driven logic simulation
      7.4.2 Error propagation from single flip-flop
      7.4.3 Concurrent transient error propagation from multiple flip-flops
      7.4.4 Experimental results
    7.5 Propagating combinational gates errors at circuit level
    7.6 Emulation-based fault injection platform
      7.6.1 Shadow components
      7.6.2 Shadow components-based fault injection technique
      7.6.3 Experimental results
    7.7 Fault injection acceleration
      7.7.1 Workflow
      7.7.2 Analytical modeling
      7.7.3 Case study: fault injection on memory arrays of Leon3
    7.8 Conclusions
    References
  8. Microarchitecture-level reliability assessment of multi-core processors | Athanasios Chatzidimitriou and Dimitris Gizopoulos
    8.1 Introduction
    8.2 Background
      8.2.1 Threats and vulnerability
    8.3 Fault-effect classes
    8.4 Statistical fault injection
    8.5 Cross-layer and single-layer evaluation
    8.6 Assessment throughput
      8.6.1 Simulation acceleration
      8.6.2 Fault list reduction
    8.7 Estimation accuracy
    8.8 Conclusions
    References
  9. Fault injection at the instruction set architecture (ISA) level | Karthik Pattabiraman and Guanpeng Li
    9.1 Introduction
    9.2 Background
      9.2.1 Terms and definitions
      9.2.2 Failure outcomes
      9.2.3 Metrics
      9.2.4 Fault Injection process
      9.2.5 Fault model
    9.3 Classification of injection techniques
      9.3.1 Simulation versus direct
      9.3.2 Intrusive versus nonintrusive
      9.3.3 Level of injection
      9.3.4 Platform
      9.3.5 Classification results
    9.4 LLFI and PINFI fault injectors
      9.4.1 LLVM fault injector: LLFI
      9.4.2 PINFI
    9.5 Open challenges and conclusion
      9.5.1 Challenge 1: level of injection
      9.5.2 Challenge 2: target platform
      9.5.3 Challenge 3: bit-flip model
      9.5.4 Conclusion
    Acknowledgments
    References
  10. Analytical modeling for cross-layer resiliency | Arijit Biswas
    10.1 Introduction
    10.2 ACE lifetime analysis
      10.2.1 Un-ACE and ACE
      10.2.2 Little's law
      10.2.3 Example of ACE lifetime analysis
      10.2.4 AVFs of various structures and workloads using ACE lifetime analysis
      10.2.5 Hamming Distance Analysis and bit field analysis
      10.2.6 Hamming Distance Analysis and multi-bit fault modeling
    10.3 Sequential AVF analysis
      10.3.1 port AVF (pAVF) and structure AVF
      10.3.2 Sequential AVF computation
    10.4 Program vulnerability factor
      10.4.1 Cross-layer modeling using AVF and PVF
    10.5 Artifacts of analytical vulnerability modeling and mitigations
      10.5.1 Significance of data values in analytical modeling
      10.5.2 Reducing unknowns: warmup and cooldown
      10.5.3 Dealing with large and complex models
    10.6 Future directions for analytical technique
    10.7 Summary of analytical modeling for vulnerability
    References
  11. Stochastic methods | Alessandro Savino, Alessandro Vallero, and Stefano Di Carlo
    11.1 Introduction
    11.2 Methodologies
      11.2.1 Reliability Block Diagrams
      11.2.2 Markov Chains
      11.2.3 Bayesian Networks
    11.3 Conclusions
    References
Index
Physical description: 1 online resource, illustrations, diagrams
ISBN: 9781785617980

No print copy is available.
