- Always have a backup.
- Always monitor your backups.
- Always back up your monitoring.
- Always investigate failures; work out how to monitor for each one so you catch it quicker next time (hopefully before it fails).
A backup without monitoring is no backup at all; if your backups fail, you won’t know until you try to use them.
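One way to monitor a backup is to check that the newest file is both recent and non-empty. The following is a minimal sketch, not a complete monitoring setup: the function name `backup_problems`, the 26-hour age limit, and the minimum size are all assumptions chosen for illustration, and a real check would feed its result into whatever alerting you already run.

```python
import os
import time


def backup_problems(path, max_age_hours=26, min_bytes=1, now=None):
    """Return a list of problems with the backup at `path`.

    An empty list means the check passed. The thresholds are
    illustrative defaults, not recommendations.
    """
    now = time.time() if now is None else now
    try:
        st = os.stat(path)
    except OSError as exc:
        # A missing or unreadable backup is itself a failure to report.
        return [f"cannot stat {path}: {exc}"]

    problems = []
    age_hours = (now - st.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{path} is {age_hours:.1f}h old (limit {max_age_hours}h)")
    if st.st_size < min_bytes:
        problems.append(f"{path} is {st.st_size} bytes (minimum {min_bytes})")
    return problems
```

Run it from a scheduler on a machine other than the one taking the backup, and alert on any non-empty result. Freshness and size only show that something was written; they do not prove the backup can be restored.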
A note about RAID
Unmonitored RAID 1/5/6/10 is no better than a single disk; as each disk pops, you’ll never notice until the array has lost one disk too many.
A note about UPSs
UPSs fail, OK.
Resilience and cost
Resilient, fast, and cheap; choose any two.
Lies in status messages
Just because something says OK doesn’t mean it is OK.
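One way to act on this is to verify the result yourself instead of trusting the reported status. The sketch below assumes the backup is a tar archive (optionally compressed) and reads every member in full, so a truncated or corrupt file is caught even if the job that wrote it said OK. The function name `archive_is_readable` is hypothetical, and this is only one possible check, not a substitute for a real test restore.

```python
import tarfile
import zlib


def archive_is_readable(path):
    """Read every member of a tar archive; return (ok, detail)."""
    count = 0
    try:
        # "r:*" lets tarfile detect the compression transparently.
        with tarfile.open(path, "r:*") as tar:
            for member in tar:
                if member.isfile():
                    data = tar.extractfile(member)
                    # Read in 1 MiB chunks so corruption anywhere surfaces.
                    while data.read(1 << 20):
                        pass
                count += 1
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    if count == 0:
        return False, "archive contains no members"
    return True, f"{count} members read"
```

The point is that the check reads the data independently rather than repeating what the backup tool reported about itself.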