Faults

This section looks at how to deal with faults.

  • Taxonomy - How to categorize faults.
  • Detection - Simply realizing that a fault has occurred can be difficult. Detection is vitally important, especially since many systems assume that components are "Fail Stop": faults do not propagate and are detected immediately.
  • Recovery - Once the fault has been detected, the next step is recovery: restarting the offending component, rebuilding state, loading a checkpoint, diagnosing the problem, and so on.
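
As a rough illustration of detection, one common approach is a timeout on heartbeats: a component that has been silent for too long is suspected of having failed. The sketch below is only a minimal example under that assumption. The class name `HeartbeatDetector`, its methods, and the injectable `clock` parameter are hypothetical and are not taken from any implementation discussed on these pages.

```python
import time


class HeartbeatDetector:
    """Suspects a component once no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock      # injectable so the detector can be tested
        self.last_seen = {}     # component name -> time of last heartbeat

    def heartbeat(self, component):
        # Record that `component` was alive at the current time.
        self.last_seen[component] = self.clock()

    def suspected(self):
        # Components whose last heartbeat is older than the timeout.
        now = self.clock()
        return sorted(c for c, t in self.last_seen.items()
                      if now - t > self.timeout)
```

A timeout can only suspect a fault: a slow component looks the same as a crashed one, which is part of why the "Fail Stop" assumption is hard to satisfy in practice. A recovery step would typically take the names returned by `suspected()` and restart those components or reload their checkpoints.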

General Strategies

Protection against faults usually relies on one of three general strategies. Various implementations are discussed in this section's sub-pages. The strategies themselves are:

  • k-modular redundancy - run k replicas of a component and vote on their outputs.
  • Primary Backup - hot and cold standby are variations, as is active replication, which is a decentralized version.
  • Checkpointing
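
To make the first strategy concrete, the following is a minimal sketch of k-modular redundancy with majority voting. It assumes replicas are plain callables whose outputs are hashable and comparable for equality. The function names `majority_vote` and `run_redundant` are hypothetical, chosen for illustration only.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas."""
    if not outputs:
        raise ValueError("no replica outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No value was produced by more than half of the replicas.
        raise RuntimeError("no majority among replica outputs")
    return value


def run_redundant(replicas, x):
    # Run every replica on the same input and vote on the results.
    return majority_vote([replica(x) for replica in replicas])
```

With k = 3 replicas, a single replica producing a wrong value is outvoted by the other two. If no value reaches a strict majority, this sketch raises an error rather than guessing.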
Page last modified on June 01, 2014, at 06:35 PM