Checkpointing and rollback recovery for distributed systems
Main authors: | Koo, Richard ; Toueg, Sam |
---|---|
Format: | Book |
Language: | English |
Published: | Ithaca, New York, 1985 |
Series: | Cornell University <Ithaca, NY> / Department of Computer Science: Technical report 706 |
Subjects: | Fault-tolerant computing ; Operating systems (Computers) |
Summary: | We consider the problem of bringing a distributed system to a consistent state after transient failures. We address the two components of this problem by describing a distributed algorithm to create consistent checkpoints, as well as a rollback-recovery algorithm to recover the system to a consistent state. In contrast to previous algorithms, they tolerate failures that occur during their executions. Furthermore, when a process takes a checkpoint, a minimal number of additional processes are forced to take checkpoints. Similarly, when a process rolls back and restarts after a failure, a minimal number of additional processes are forced to roll back with it. Our algorithms require each process to store at most two checkpoints in stable storage. This storage requirement is shown to be minimal under general assumptions. |
Description: | 22 pp. |
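
The summary states that each process stores at most two checkpoints in stable storage. The sketch below is one way such a bound could be kept, assuming a two-slot scheme with a committed ("permanent") checkpoint and a pending ("tentative") one. It is an illustration only, not the authors' algorithm: the names `Process`, `take_tentative`, `commit`, `abort`, `rollback`, and `coordinated_checkpoint` are hypothetical, and the sketch models neither failures during the protocol nor the minimal set of processes that must participate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Process:
    """Hypothetical process that keeps at most two checkpoints."""
    name: str
    state: int = 0
    permanent: Optional[int] = None  # last committed checkpoint
    tentative: Optional[int] = None  # checkpoint awaiting a commit decision

    def take_tentative(self) -> bool:
        # Save the current state; the permanent checkpoint is left untouched.
        self.tentative = self.state
        return True

    def commit(self) -> None:
        # Promote the tentative checkpoint; the old permanent one is dropped.
        if self.tentative is not None:
            self.permanent = self.tentative
            self.tentative = None

    def abort(self) -> None:
        # Discard the tentative checkpoint; the permanent one stays valid.
        self.tentative = None

    def rollback(self) -> None:
        # Restore the last committed state, if there is one.
        if self.permanent is not None:
            self.state = self.permanent

    def stored_checkpoints(self) -> int:
        return sum(c is not None for c in (self.permanent, self.tentative))


def coordinated_checkpoint(processes: List[Process]) -> bool:
    """Phase 1: every process takes a tentative checkpoint.
    Phase 2: commit everywhere if all succeeded, otherwise abort everywhere."""
    results = [p.take_tentative() for p in processes]
    ok = all(results)
    for p in processes:
        if ok:
            p.commit()
        else:
            p.abort()
    return ok
```

Because a tentative checkpoint either replaces the permanent one or is discarded, a process in this sketch never holds more than two checkpoints at once, and a rollback always has a committed state to return to once the first checkpoint has been committed.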
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV010589727 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t | ||
008 | 960130s1985 |||| 00||| engod | ||
035 | |a (OCoLC)14637583 | ||
035 | |a (DE-599)BVBBV010589727 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-91G | ||
100 | 1 | |a Koo, Richard |e Verfasser |4 aut | |
245 | 1 | 0 | |a Checkpointing and rollback recovery for distributed systems |c Richard Koo ; Sam Toueg |
264 | 1 | |a Ithaca, New York |c 1985 | |
300 | |a 22 S. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Cornell University <Ithaca, NY> / Department of Computer Science: Technical report |v 706 | |
520 | 3 | |a We consider the problem of bringing a distributed system to a consistent state after transient failures. We address the two components of this problem by describing a distributed algorithm to create consistent checkpoints, as well as a rollback-recovery algorithm to recover the system to a consistent state. In contrast to previous algorithms, they tolerate failures that occur during their executions. Furthermore, when a process takes a checkpoint, a minimal number of additional processes are forced to take checkpoints. Similarly, when a process rolls back and restarts after a failure, a minimal number of additional processes are forced to roll back with it. Our algorithms require each process to store at most two checkpoints in stable storage. This storage requirement is shown to be minimal under general assumptions. | |
650 | 4 | |a Fault-tolerant computing | |
650 | 4 | |a Operating systems (Computers) | |
700 | 1 | |a Toueg, Sam |e Verfasser |4 aut | |
810 | 2 | |a Department of Computer Science: Technical report |t Cornell University <Ithaca, NY> |v 706 |w (DE-604)BV006185504 |9 706 | |
999 | |a oai:aleph.bib-bvb.de:BVB01-007061596 |
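
The tag and subfield layout shown above can be read programmatically from a MARCXML serialization of the record. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `SAMPLE` string is a trimmed excerpt built from the fields above, and the helper names `subfields` and `summarize` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Trimmed excerpt of the record, for illustration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim"><record>
<controlfield tag="001">BV010589727</controlfield>
<datafield tag="100" ind1="1" ind2=" "><subfield code="a">Koo, Richard</subfield></datafield>
<datafield tag="245" ind1="1" ind2="0"><subfield code="a">Checkpointing and rollback recovery for distributed systems</subfield></datafield>
<datafield tag="700" ind1="1" ind2=" "><subfield code="a">Toueg, Sam</subfield></datafield>
</record></collection>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.findall(path, NS)]


def summarize(xml_text):
    """Extract the record id, title, and authors from a MARCXML collection."""
    record = ET.fromstring(xml_text).find("marc:record", NS)
    return {
        "id": record.findtext("marc:controlfield[@tag='001']", namespaces=NS),
        "title": subfields(record, "245", "a")[0],
        # 100 holds the main author, 700 any additional authors.
        "authors": subfields(record, "100", "a") + subfields(record, "700", "a"),
    }
```

Calling `summarize(SAMPLE)` returns a dictionary with the control number from field 001, the title from 245 $a, and both authors from 100 $a and 700 $a.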
Record in the search index
_version_ | 1804125060520214528 |
---|---|
any_adam_object | |
author | Koo, Richard Toueg, Sam |
author_facet | Koo, Richard Toueg, Sam |
author_role | aut aut |
author_sort | Koo, Richard |
author_variant | r k rk s t st |
building | Verbundindex |
bvnumber | BV010589727 |
ctrlnum | (OCoLC)14637583 (DE-599)BVBBV010589727 |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01874nam a2200313 cb4500</leader><controlfield tag="001">BV010589727</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">960130s1985 |||| 00||| engod</controlfield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)14637583</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV010589727</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91G</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Koo, Richard</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Checkpointing and rollback recovery for distributed systems</subfield><subfield code="c">Richard Koo ; Sam Toueg</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Ithaca, New York</subfield><subfield code="c">1985</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">22 S.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Cornell University &lt;Ithaca, NY&gt; / Department of Computer Science: Technical report</subfield><subfield code="v">706</subfield></datafield><datafield tag="520" ind1="3" ind2=" "><subfield code="a">We consider the problem of bringing a distributed system to a consistent state after transient failures. We address the two components of this problem by describing a distributed algorithm to create consistent checkpoints, as well as a rollback-recovery algorithm to recover the system to a consistent state. In contrast to previous algorithms, they tolerate failures that occur during their executions. Furthermore, when a process takes a checkpoint, a minimal number of additional processes are forced to take checkpoints. Similarly, when a process rolls back and restarts after a failure, a minimal number of additional processes are forced to roll back with it. Our algorithms require each process to store at most two checkpoints in stable storage. This storage requirement is shown to be minimal under general assumptions.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Fault-tolerant computing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Operating systems (Computers)</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Toueg, Sam</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="810" ind1="2" ind2=" "><subfield code="a">Department of Computer Science: Technical report</subfield><subfield code="t">Cornell University &lt;Ithaca, NY&gt;</subfield><subfield code="v">706</subfield><subfield code="w">(DE-604)BV006185504</subfield><subfield code="9">706</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-007061596</subfield></datafield></record></collection> |
id | DE-604.BV010589727 |
illustrated | Not Illustrated |
indexdate | 2024-07-09T17:55:33Z |
institution | BVB |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-007061596 |
oclc_num | 14637583 |
open_access_boolean | |
owner | DE-91G DE-BY-TUM |
owner_facet | DE-91G DE-BY-TUM |
physical | 22 S. |
publishDate | 1985 |
publishDateSearch | 1985 |
publishDateSort | 1985 |
record_format | marc |
series2 | Cornell University <Ithaca, NY> / Department of Computer Science: Technical report |
spelling | Koo, Richard Verfasser aut Checkpointing and rollback recovery for distributed systems Richard Koo ; Sam Toueg Ithaca, New York 1985 22 S. txt rdacontent n rdamedia nc rdacarrier Cornell University <Ithaca, NY> / Department of Computer Science: Technical report 706 We consider the problem of bringing a distributed system to a consistent state after transient failures. We address the two components of this problem by describing a distributed algorithm to create consistent checkpoints, as well as a rollback-recovery algorithm to recover the system to a consistent state. In contrast to previous algorithms, they tolerate failures that occur during their executions. Furthermore, when a process takes a checkpoint, a minimal number of additional processes are forced to take checkpoints. Similarly, when a process rolls back and restarts after a failure, a minimal number of additional processes are forced to roll back with it. Our algorithms require each process to store at most two checkpoints in stable storage. This storage requirement is shown to be minimal under general assumptions. Fault-tolerant computing Operating systems (Computers) Toueg, Sam Verfasser aut Department of Computer Science: Technical report Cornell University <Ithaca, NY> 706 (DE-604)BV006185504 706 |
spellingShingle | Koo, Richard Toueg, Sam Checkpointing and rollback recovery for distributed systems Fault-tolerant computing Operating systems (Computers) |
title | Checkpointing and rollback recovery for distributed systems |
title_auth | Checkpointing and rollback recovery for distributed systems |
title_exact_search | Checkpointing and rollback recovery for distributed systems |
title_full | Checkpointing and rollback recovery for distributed systems Richard Koo ; Sam Toueg |
title_fullStr | Checkpointing and rollback recovery for distributed systems Richard Koo ; Sam Toueg |
title_full_unstemmed | Checkpointing and rollback recovery for distributed systems Richard Koo ; Sam Toueg |
title_short | Checkpointing and rollback recovery for distributed systems |
title_sort | checkpointing and rollback recovery for distributed systems |
topic | Fault-tolerant computing Operating systems (Computers) |
topic_facet | Fault-tolerant computing Operating systems (Computers) |
volume_link | (DE-604)BV006185504 |
work_keys_str_mv | AT koorichard checkpointingandrollbackrecoveryfordistributedsystems AT touegsam checkpointingandrollbackrecoveryfordistributedsystems |