If something gets out of control, why not just “let it crash”? If you think about critical infrastructure or safety-critical systems such as planes, trains or nuclear reactors – well, obviously you can’t.
Nobody wants their system to crash, but does the same hold for every one of its components? Often it is even preferable for a misbehaving control unit or software component to stop working entirely – and be replaced instantly by a redundant (or, better, diverse) part – rather than compromise the whole system for a longer period of time. Of course, the crash must then be detected as early as possible and the failed component replaced by a hot spare as quickly as possible. Detecting and compensating for a hardware or software component that keeps misbehaving and corrupts the whole system is much more difficult.
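The fail-fast-and-replace idea can be illustrated with a small supervisor, similar in spirit to the supervision trees of Erlang/OTP, where the “let it crash” phrase comes from. This is a minimal Python sketch under stated assumptions, not a real control system: `Controller`, `Supervisor`, `ComponentCrashed`, the gain-based fault model and the safe range of 0 to 100 are all hypothetical. It also assumes the fault is detectable by a plausibility check on the component's output, which is the easy case. A component that misbehaves without tripping any check is the harder problem described above.

```python
from typing import List

# Hypothetical actuator range; anything outside it counts as implausible.
SAFE_MIN, SAFE_MAX = 0.0, 100.0


class ComponentCrashed(Exception):
    """Raised when a component detects its own misbehaviour and stops."""


class Controller:
    """Hypothetical control unit: turns a setpoint into an actuator command."""

    def __init__(self, name: str, gain: float = 0.5) -> None:
        self.name = name
        self.gain = gain  # a faulty unit is modelled by a wrong gain

    def command(self, setpoint: float) -> float:
        value = self.gain * setpoint
        # Fail fast: an implausible output is never handed on to the rest
        # of the system; the component crashes instead.
        if not SAFE_MIN <= value <= SAFE_MAX:
            raise ComponentCrashed(
                f"{self.name}: command {value} outside safe range"
            )
        return value


class Supervisor:
    """Routes requests to the active unit and fails over to hot spares."""

    def __init__(self, primary: Controller, spares: List[Controller]) -> None:
        self.active = primary
        self.spares = list(spares)
        self.crashes: List[str] = []

    def command(self, setpoint: float) -> float:
        while True:
            try:
                return self.active.command(setpoint)
            except ComponentCrashed as exc:
                self.crashes.append(str(exc))  # crash is detected immediately
                if not self.spares:
                    raise  # no spare left: escalate to the caller
                self.active = self.spares.pop(0)  # hot-spare switchover


if __name__ == "__main__":
    sup = Supervisor(Controller("primary", gain=5.0), [Controller("spare")])
    print(sup.command(40.0))  # 20.0, served by the spare
    print(sup.active.name)    # spare
```

Retrying the same request on the spare assumes the operation has no side effects. A real failover would also have to carry over or re-establish whatever state the primary held.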