    Chaos Engineering

    Architecture & Design

    Chaos Engineering is the discipline of experimenting on a system by deliberately introducing failures, so that you learn how it actually behaves under stress instead of assuming it is resilient. The goal is not to break things for fun; it is to build confidence that the system can withstand turbulent conditions in production.

    The Experiment Loop

    • Define the steady state: a measurable signal of normal health (latency percentiles, error rate, successful orders per minute).
    • Form a hypothesis: the steady state will hold when a specific fault occurs.
    • Inject the fault: stop an instance, add latency, fail over a database, isolate an Availability Zone.
    • Measure and learn: did the steady state hold? If not, you found a weakness to fix.
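
    The four steps above can be sketched as a small harness. This is an illustrative sketch only, not a real chaos tool: `FakeService`, `run_experiment`, and the 1% error-rate threshold are hypothetical names and values chosen for the example, and the "fault" is simulated in memory rather than injected into real infrastructure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeService:
    """Hypothetical stand-in for a service behind a load balancer."""
    healthy_instances: int
    required_instances: int  # capacity needed to serve all traffic

    def error_rate(self) -> float:
        # Requests fail in proportion to the missing capacity.
        if self.healthy_instances >= self.required_instances:
            return 0.0
        return 1.0 - self.healthy_instances / self.required_instances


@dataclass
class ExperimentResult:
    hypothesis: str
    baseline_ok: bool        # was the steady state healthy before the fault?
    held_under_fault: bool   # did the steady state hold during the fault?


def run_experiment(
    hypothesis: str,
    steady_state: Callable[[], bool],
    inject_fault: Callable[[], None],
    rollback: Callable[[], None],
) -> ExperimentResult:
    # Step 1: confirm the steady state before touching anything.
    if not steady_state():
        return ExperimentResult(hypothesis, baseline_ok=False, held_under_fault=False)
    # Step 3: inject the fault, then (step 4) measure; always roll back.
    inject_fault()
    try:
        held = steady_state()
    finally:
        rollback()
    return ExperimentResult(hypothesis, baseline_ok=True, held_under_fault=held)


def stop_one_instance_experiment(service: FakeService) -> ExperimentResult:
    def steady_state() -> bool:
        return service.error_rate() <= 0.01  # assumed threshold: 1% errors

    def inject_fault() -> None:
        service.healthy_instances -= 1  # simulate stopping one instance

    def rollback() -> None:
        service.healthy_instances += 1  # simulate restarting it

    # Step 2: the hypothesis is stated up front and recorded with the result.
    return run_experiment(
        "Error rate stays at or below 1% when one instance stops",
        steady_state,
        inject_fault,
        rollback,
    )
```

    With spare capacity (for example `FakeService(healthy_instances=3, required_instances=2)`) the hypothesis holds; with none (`FakeService(2, 2)`) the same fault breaks the steady state, which is the weakness the experiment is meant to surface.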

    On AWS

    AWS provides AWS Fault Injection Service (FIS) to run these experiments safely, with stop conditions wired to CloudWatch alarms so an experiment aborts automatically if it goes too far. Always start in a sandbox, keep the blast radius small, and graduate to production only once you trust the controls.
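
    As a rough illustration of a small blast radius plus a stop condition, the sketch below builds the body of an FIS experiment template as a plain Python dictionary. The function name, the `chaos-ready` tag, the target and action labels, and the five-minute restart delay are assumptions made for this example; the alarm and role ARNs must be supplied by you. Check the field names against the current FIS documentation before relying on them.

```python
def build_stop_instance_template(
    alarm_arn: str,
    role_arn: str,
    tag_key: str = "chaos-ready",   # hypothetical opt-in tag
    tag_value: str = "true",
) -> dict:
    """Return an experiment template body that stops one tagged EC2 instance."""
    return {
        "description": "Stop one opted-in EC2 instance and restart it after 5 minutes",
        "roleArn": role_arn,  # IAM role that FIS assumes to run the actions
        # Stop condition: the experiment halts if this CloudWatch alarm fires.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn},
        ],
        # Blast radius: only instances carrying the tag, and only one of them.
        "targets": {
            "oneInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",
            },
        },
        "actions": {
            "stopInstance": {
                "actionId": "aws:ec2:stop-instances",
                # ISO 8601 duration; assumed value for this sketch.
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "oneInstance"},
            },
        },
    }
```

    Assuming boto3 is installed and credentials are configured for a sandbox account, a dictionary like this could be passed to the FIS client, for example `boto3.client("fis").create_experiment_template(**template)`. Keeping the template as data makes it easy to review the targets and stop conditions before anything runs.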

    Toc Consulting: AWS Security & Cloud Architecture
