ROUTE 05: If heroics are propping up your system, start here


What this route does in 10 minutes:

You’ll understand why systems built on heroics are brittle, identify which heroic behaviors are masking structural problems, and know how to build resilience that doesn’t depend on specific people being available.

This diagram illustrates contrasting models: a centralized “Heroic System” versus a distributed, resilient system.

Start here: Three fast questions

Before you dive in, orient yourself:

  1. Who keeps things running? Can you name specific people without whom the system fails?
  2. What happens when they’re gone? Do things break, stall, or require emergency workarounds?
  3. Is this sustainable? Can the system survive if key people leave, get sick, or go on vacation?

If your system depends on heroic effort from specific individuals, you’re in the right route.


Quick diagnostic

This diagram illustrates how brittle systems, marked by heroics and rushed fixes, contrast with resilient systems built on strong processes.

You are probably in this route if:

  • System works only when specific people intervene
  • Someone is “the only one who knows how that works”
  • After-hours calls are routine, not exceptional
  • Knowledge is tribal, not documented
  • People can’t take vacation without system degradation
  • You hear “just call [person’s name]” as the solution
  • Burnout is visible but considered necessary
  • New people can’t ramp up quickly

You are probably NOT in this route if:

  • System has clear owners but interfaces fail (try Route 01)
  • Heroics aren’t the problem; decision paralysis is (try Route 02)
  • System works but you need portfolio visibility (try Route 03)
  • External partners are the constraint (try Route 06)

Doctrine and Annex anchors

These pieces define the principles and provide the models:

Doctrine 10: Degraded Operations Are the Normal Mode, Not the Exception
Why systems must be designed to function when things aren’t perfect, not just when everything is ideal.

Doctrine 12: Resilience Is an Emergent Property, Not a Feature
How resilience emerges from system structure, not from adding “resilience features” or relying on heroic individuals.

ANNEX E. Prevention-Contingency Matrix
A framework for distinguishing what you prevent from what you prepare contingencies for, and how to design for both.


Field Notes that show the failure mode in the wild

These cases show what happens when systems depend on heroics and how to build resilience instead:

Field Note: Systems Built On Heroics Are Brittle
Why depending on specific people creates single points of failure, and how to identify when heroics are masking structural problems.

Field Note: Gates That Matter: Task Books, Checkrides And Real Safety
How aviation and wildland fire use structured qualification systems to ensure resilience without depending on specific individuals.

Field Note: Stranded in Vienna, Responsible in Kyiv
A case study of distributed systems working when key people are unavailable, showing how to build resilience through structure.

Field Note: Guardrails, Not Gates
How to design systems that prevent catastrophic failure without creating rigid processes that slow everything down.


Choose your situation

Pick the scenario that most closely matches your context:

Scenario A: One person is the system

Signals:

  • System depends entirely on one individual
  • That person can’t take vacation
  • Knowledge transfer hasn’t happened
  • New hires can’t fill the role quickly
  • Succession planning is “someday, not now”

Start with:

  1. Field Note: “Systems Built On Heroics” (see the pattern)
  2. Doctrine 12 (resilience through structure, not people)
  3. ANNEX E (design prevention and contingency)

Scenario B: After-hours heroics are routine

Signals:

  • Weekends and evenings are expected work time
  • “Emergency” fixes happen multiple times per week
  • On-call rotation is constant fire-fighting
  • People joke about not having work-life balance
  • Burnout is visible but normalized

Start with:

  1. Doctrine 10 (design for degraded mode)
  2. Field Note: “Guardrails, Not Gates” (prevent catastrophic failure without rigidity)
  3. ANNEX E (distinguish prevention from contingency)

Scenario C: Tribal knowledge blocks new people

Signals:

  • New hires struggle for months to ramp up
  • Documentation doesn’t exist or is badly outdated
  • Only 2-3 people understand the system
  • Knowledge is passed through shadowing, not formal training
  • You can’t scale the team even when budget exists

Start with:

  1. Field Note: “Gates That Matter” (structured qualification systems)
  2. Field Note: “Systems Built On Heroics” (identify tribal knowledge patterns)
  3. Doctrine 12 (build structural resilience)

Scenario D: System breaks when a key person is unavailable

Signals:

  • Vacation causes system degradation
  • Illness creates immediate crisis
  • Travel means emergency calls
  • Coverage doesn’t actually cover
  • “Just wait until [person] gets back”

Start with:

  1. Field Note: “Stranded in Vienna” (distributed resilience in action)
  2. Doctrine 10 (design for absence)
  3. ANNEX E (contingency planning)

Scenario E: Not sure which scenario fits

Start with:

  1. Field Note: “Systems Built On Heroics” (identify the pattern)
  2. Doctrine 10 (design for degraded mode)
  3. Doctrine 12 (structural resilience)
  4. ANNEX E (prevention vs contingency)
  5. Field Note: “Gates That Matter” (qualification systems)

Recommended default path

If you’re unsure which scenario fits, follow this sequence:

  1. Doctrine 10: Degraded Operations Are the Normal Mode (5 minutes)
     Understand why systems must work when things aren’t perfect.
  2. Doctrine 12: Resilience Is an Emergent Property (5 minutes)
     Learn how resilience comes from structure, not heroic individuals.
  3. ANNEX E. Prevention-Contingency Matrix (10 minutes)
     A framework for designing prevention and contingency systems.
  4. Field Note: Systems Built On Heroics (5 minutes)
     See the failure mode in practice.
  5. Field Note: Stranded in Vienna (5 minutes)
     See distributed resilience working without key individuals.

Total reading time: 30 minutes to understand the path from heroics-dependent to structurally resilient.


What to do next

In the next 15 minutes:

  • List your top 5 heroic dependencies (systems that depend on specific people)
  • For each one, ask: “What breaks when this person is unavailable?”
  • Identify the highest-risk dependency (most critical system, least backup)
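
One way to make the ranking in the last step concrete is a small scoring sketch. The field names, the 1 to 5 criticality scale, the scoring formula, and the example entries below are all illustrative assumptions, not part of this route; adjust them to your own context.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    system: str       # what depends on a specific person
    hero: str         # role rather than name, if sensitive
    criticality: int  # assumed scale: 1 (minor) to 5 (mission-critical)
    backups: int      # people who could cover today, even partially


def risk_score(dep: Dependency) -> float:
    # Assumed heuristic: higher criticality and fewer backups mean higher risk.
    return dep.criticality / (1 + dep.backups)


def rank_by_risk(deps: list[Dependency]) -> list[Dependency]:
    # Highest-risk dependency first; ties keep their original order.
    return sorted(deps, key=risk_score, reverse=True)


# Hypothetical example data.
dependencies = [
    Dependency("billing export", "senior DBA", criticality=5, backups=0),
    Dependency("release pipeline", "build engineer", criticality=4, backups=1),
    Dependency("vendor reports", "analyst", criticality=2, backups=0),
]

ranked = rank_by_risk(dependencies)
for dep in ranked:
    print(f"{dep.system} ({dep.hero}): risk {risk_score(dep):.2f}")
```

The first entry in `ranked` is the candidate for the 60-minute exercise that follows. The formula is deliberately crude; its only job is to force an explicit comparison of "most critical system" against "least backup".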

In the next 60 minutes:

  • Pick the highest-risk heroic dependency
  • Document: What does this person do that nobody else can?
  • Identify: Is this tribal knowledge, a structural design flaw, or both?
  • Draft a one-page resilience plan: cross-training, documentation, or redesign

This week:

  • Create succession documentation for the top heroic role
  • Identify 2-3 people who could cover (even partially)
  • Schedule knowledge transfer sessions (don’t wait for an emergency)
  • Document the critical procedures that only one person knows
  • Build degraded mode plans: what still works when the key person is unavailable?
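
The coverage questions in this list can be checked against a simple skills matrix. The sketch below assumes a hypothetical mapping from each critical procedure to the roles that can perform it unaided. It flags procedures with too few qualified people and shows what stops working when given people are out. The procedure names, role names, and the minimum of two are assumptions for illustration.

```python
def single_points_of_failure(coverage, minimum=2):
    """Return procedures that fewer than `minimum` people can perform."""
    return {proc: people for proc, people in coverage.items()
            if len(people) < minimum}


def blocked_when_absent(coverage, unavailable):
    """Return procedures nobody can perform while `unavailable` people are out."""
    return sorted(proc for proc, people in coverage.items()
                  if not (people - unavailable))


# Hypothetical skills matrix: procedure -> roles that can perform it unaided.
coverage = {
    "restore database": {"role_a"},
    "rotate certificates": {"role_a", "role_b"},
    "rerun nightly batch": {"role_b", "role_c"},
}

fragile = single_points_of_failure(coverage)
print("Needs cross-training:", sorted(fragile))
print("Blocked if role_a is out:", blocked_when_absent(coverage, {"role_a"}))
```

Anything returned by `blocked_when_absent` for a plausible absence (vacation, illness, travel) is a gap the degraded mode plan has to address, either through cross-training or through a documented fallback.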

Optional: Send me 5 sentences

If you want targeted guidance for your specific situation, describe:

  1. What system depends on heroics? (what breaks, when, how)
  2. Who are the heroes? (roles, not names if sensitive)
  3. What happens when they’re unavailable? (consequences)
  4. What constraints matter? (budget, timeline, politics, technical)
  5. What would “resilient” look like? (desired state)