Resiliency
an ability to recover from or adjust easily to adversity or change.
Resiliency is very important in my home lab setup, both as practice for real-world production deployments and to ease the maintenance burden of running servers. Thinking about resiliency and failure from the start can greatly reduce the worry and headaches that we would otherwise face when things go wrong.
For me, resiliency boils down to the following points:
- Reduce single points of failure
- Make it easy to recover from failures in hardware or software
- Document everything useful
TBC