Static stability is the principle that a system should continue to operate through a failure using capacity that is already provisioned, rather than depending on new actions during the event. AWS Well-Architected describes it as avoiding bimodal behavior, where a workload behaves differently in normal and failure modes.
If you run across two Availability Zones, provision enough capacity in each so that losing one AZ leaves the survivor able to carry full load immediately. The anti-pattern is running just enough capacity and assuming you can launch more in the surviving AZ during the failure, exactly when the control plane may be degraded.
Chaos experiments verify static stability by deliberately removing an Availability Zone and confirming the workload keeps serving from the rest.
The practice of deliberately injecting failures into a system to discover weaknesses before they cause outages, by forming a hypothesis about steady state and testing it under real-world fault conditions.
The maximum acceptable length of time a system can be down after a failure before the impact is unacceptable. It defines how fast you must recover.
The scope of impact when a security incident occurs - how many resources, accounts, or users are affected. Smaller blast radius means better security posture.
Toc Consulting: AWS Security & Cloud Architecture
Our team helps engineering teams secure and architect AWS the right way: assessment in week one, a prioritized action plan in week two.