A well-run tabletop exercise is one of the most operationally informative activities a security and resilience function can undertake. A badly run one — which is what supervisors encounter far more often — is a two-hour meeting in which a facilitator reads a pre-written scenario aloud, participants agree that their plans cover it, and an "exercise report" is filed in a document management system to satisfy the audit requirement. DORA Article 25 and NIS2 Article 21(2)(b) both require testing of business continuity and crisis management capabilities, and both regulatory regimes have moved past accepting the second kind.
This post describes how to design and run tabletop exercises that produce genuine learning, generate audit-worthy evidence, and do not waste the time of the twelve senior people in the room. It also covers the failure modes that supervisors have publicly flagged as disqualifying.
What Auditors Are Actually Looking For
Before designing the exercise, it is worth clarifying what the supervisor will examine. In recent DORA-aligned reviews and NIS2 competent authority inspections across 2025, the patterns converge on four specific evidence items.
The scenario rationale. Why this scenario, why this year, why this scale. A scenario plucked from a vendor's playbook library is harder to defend than one traced directly to the entity's threat and risk assessment, its register of past incidents, or the ENISA threat landscape. Supervisors have explicitly flagged "scenario divorced from entity's risk profile" as a common finding.
The participant list and their real authority. An exercise that lists the right names from the org chart but at which none of those people were actually present, or that involves only the IT function when the scenario implicates legal, communications, and executive decision-making, is not a genuine test. Evidence of attendance must be matched to the roles required for the scenario.
The findings and the follow-through. A tabletop that generates no findings is not credible. A tabletop that generates findings but no tracked remediation is worse — it demonstrates that the entity knows its weaknesses and has chosen not to address them. The evidence supervisors ask for is not just the exercise report but the register of findings, the assigned owners, the remediation deadlines, and the completion evidence.
The independence. For significant financial entities under DORA, the exercise must involve adequate independent challenge. A facilitator from the same function being tested is not independent challenge. Supervisors are increasingly looking for evidence of external facilitation or, at minimum, facilitation by a function organisationally separate from the one whose plans are being tested.
Scenario Design That Exposes Weaknesses
The common failure mode of tabletop design is the "comfortable scenario" — a plausible incident that happens to be covered cleanly by the existing plan, so everyone performs well. These exercises are not useless, but they generate no learning and they do not satisfy supervisors who have seen several of them. The better design starts from the opposite premise: which scenarios would expose the plan to meaningful stress?
Three scenario families reliably produce learning.
Simultaneity. Two incidents occurring at once, where each plan individually covers one but neither plan covers the resource contention that the combination produces. A typical formulation: a critical vendor outage occurring during a period of elevated market volatility, with the CISO unavailable. The combination tests whether the deputy chain works, whether vendor communications scale, and whether the executive decision cadence holds up when it does not fit into the single-incident rhythm the plans assume.
Boundary incidents. Scenarios at the edge of the plan's activation criteria, where the first question is whether the plan is activated at all. A typical formulation: an unusual pattern of customer complaints suggesting a potential data integrity issue, with ambiguous technical evidence. These scenarios test the classification and escalation logic, which is where most plans fail in practice.
Compounding third-party scenarios. A subcontractor in the ICT chain — not the direct vendor — experiences a failure that propagates through multiple direct vendors. These test the chain-level concentration analysis and expose whether the entity actually understands its fourth-party exposure or has merely documented it.
The scenario should come with a structured set of injects — additional information revealed at specific points during the exercise — that force decision-making. Without injects, a tabletop tends to devolve into a discussion of the plan rather than a test of the decisions. Good injects are drawn from real patterns: the media contact that happens before the PR plan is activated, the supervisor who calls asking for a status update, the subsidiary that insists on activating its own plan in parallel.
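An inject schedule can be kept as structured data rather than prose, which makes the release times and the role-based distribution auditable after the fact. A minimal sketch in Python — the inject content and role names here are hypothetical, illustrating the patterns named above:

```python
from dataclasses import dataclass

@dataclass
class Inject:
    offset_minutes: int  # exercise-clock time at which the inject is released
    audience: list[str]  # only these roles receive it (information asymmetry)
    content: str         # the information revealed
    forces: str          # the decision the inject is designed to force

# Hypothetical injects drawn from the real patterns described above
schedule = [
    Inject(20, ["comms_lead"],
           "A journalist emails asking about a service disruption",
           "Engage before the PR plan is formally activated?"),
    Inject(45, ["crisis_lead"],
           "The supervisor calls asking for a status update",
           "Who responds, and with which approved facts?"),
    Inject(70, ["crisis_lead", "legal"],
           "A subsidiary insists on activating its own plan in parallel",
           "Consolidate command, or tolerate a parallel response?"),
]

# The facilitator script simply releases injects in time order
for inject in sorted(schedule, key=lambda i: i.offset_minutes):
    print(f"T+{inject.offset_minutes}m -> "
          f"{', '.join(inject.audience)}: {inject.content}")
```

The `audience` field is what enforces information asymmetry: an inject that only the communications lead receives has to be actively shared before the rest of the room can act on it.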
The Mechanics That Separate Useful From Performative
Four mechanical choices materially affect whether an exercise produces real findings.
Information asymmetry. Not every participant should receive the same information. In a real incident, the CEO does not have the engineer's terminal open, and the legal team does not see the operations dashboard. Exercises that distribute identical briefing packs to every participant teach nothing about how information flows. Distribute the briefing and the injects by role, so that participants have to actively request information from each other, as they would in a real event.
Time compression. Real incidents unfold over hours or days. A two-hour tabletop cannot literally replay a 48-hour incident, but it can compress time by using clear time skips ("the scenario now jumps to T+6 hours; here is what has happened in between"). Without time compression, every tabletop ends up testing only the first four hours of an incident, which is where most plans are strongest. The interesting failures are at T+36 hours, when fatigue, shift handoffs, and information drift start to dominate.
Decision logging. Every significant decision made during the exercise — including the decision not to escalate, not to communicate, not to activate — must be logged in real time with the rationale. This log is a primary evidence artefact for supervisors. It also makes the after-action review substantive rather than speculative, because the facilitator can ask "at 14:32 you decided not to activate the crisis communications plan; what changed between the information you had then and the information you had at 15:10 when you did activate it?"
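In practice the decision log is just an append-only record carrying a timestamp, an actor, the decision (including explicit non-decisions), the rationale, and the information basis at that moment. A minimal sketch, with hypothetical field and role names:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    timestamp: datetime
    actor: str
    decision: str       # explicit non-decisions are logged too
    rationale: str
    known_at_time: str  # information basis, for the after-action comparison

class DecisionLog:
    """Append-only; nothing is edited or deleted after the fact."""

    def __init__(self) -> None:
        self.records: list[DecisionRecord] = []

    def log(self, actor: str, decision: str,
            rationale: str, known_at_time: str) -> None:
        self.records.append(DecisionRecord(
            datetime.now(timezone.utc), actor,
            decision, rationale, known_at_time))

log = DecisionLog()
log.log("crisis_lead",
        "Did NOT activate the crisis communications plan",
        "Impact not yet confirmed",
        "Single unverified report of customer-facing errors")
log.log("crisis_lead",
        "Activated the crisis communications plan",
        "Impact confirmed",
        "Error rates now visible on the operations dashboard")
```

Pairing entries like these two is what lets the facilitator ask the "what changed between then and now" question from the record rather than from memory.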
Observable roles. Assign observers whose only job is to watch specific processes — one watching the information flow between the operations cell and the executive committee, one watching the external communications cadence, one watching the third-party vendor engagement. Observers are the richest source of findings because they see patterns that participants, immersed in the scenario, do not notice.
The After-Action Review and the Findings Register
A two-hour exercise followed by a thirty-minute wrap-up produces a weak after-action review. The pattern that reliably produces supervisory-grade findings is a structured after-action review conducted 24-72 hours after the exercise — late enough for participants to have reflected, early enough for memory to be accurate.
The review should cover, in order:
- What actually happened in the room. Reconstructed from the decision log, the observer notes, and participant recollection. Distinguish clearly between what the plan says should have happened, what participants thought was happening, and what the observers recorded.
- Where plans and reality diverged. Each divergence is a candidate finding. Some are genuine gaps; some are acceptable deviations that should be codified into the plan; some are exercise artefacts.
- Where decisions were made without documented authority. Another rich source of findings, because real incidents expose the same pattern.
- What information was needed and not available. Gaps in monitoring, in vendor contracts, in regulatory contacts.
- What information was available but not used. Frequently more telling than the previous item, because it points to training and process gaps rather than instrumentation gaps.
Each finding must have an owner, a target date, and a tracking identifier. The register of findings should be reviewable alongside prior exercises' findings to demonstrate continuous improvement. Repeat findings across exercises are themselves a finding — they indicate that the remediation process is not working.
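A register with those fields, plus a mechanical check for repeat findings across exercises, can be sketched as follows — the tracking-identifier scheme and finding descriptions are hypothetical:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Finding:
    tracking_id: str  # e.g. "TTX-2025-01-F01" -- hypothetical scheme
    exercise_id: str
    description: str
    owner: str
    target_date: date
    status: str       # "open", "in_progress", or "closed"

def repeat_findings(register: list[Finding]) -> list[Finding]:
    """Findings whose description recurs in a later exercise --
    themselves evidence that the remediation process is not working."""
    first_seen: dict[str, str] = {}
    repeats: list[Finding] = []
    for f in register:
        key = f.description.lower()
        if key in first_seen and first_seen[key] != f.exercise_id:
            repeats.append(f)
        first_seen.setdefault(key, f.exercise_id)
    return repeats

register = [
    Finding("TTX-2025-01-F01", "TTX-2025-01",
            "Deputy chain for CISO unclear",
            "head_of_resilience", date(2025, 6, 30), "closed"),
    Finding("TTX-2025-03-F02", "TTX-2025-03",
            "Deputy chain for CISO unclear",
            "head_of_resilience", date(2025, 12, 31), "open"),
]
repeats = repeat_findings(register)
```

Here the second finding would surface as a repeat, exactly the pattern a supervisor reviewing two exercises side by side would flag.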
The Supervisor-Facing Evidence Package
For DORA-regulated entities and significant NIS2 entities, the evidence package the supervisor will request typically includes:
- The exercise charter, including scenario rationale traced to the risk assessment
- The participant list, with role and attendance evidence
- The briefing pack and inject schedule
- The decision log
- The observer notes
- The after-action review report
- The findings register with ownership and status
- Evidence of remediation completion for findings from prior exercises
A mature programme packages these in a standard structure for every exercise. An immature programme reassembles them for each supervisory request, and in the reassembly some elements inevitably go missing or reveal inconsistencies between documents.
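A standard structure makes the completeness check mechanical rather than a scramble. A minimal sketch of such a check, using hypothetical artefact keys and file paths that mirror the list above:

```python
# Hypothetical artefact keys mirroring the supervisor-facing list above
REQUIRED_ARTEFACTS = [
    "exercise_charter",          # incl. scenario rationale traced to risk
    "participant_list",
    "briefing_pack_and_injects",
    "decision_log",
    "observer_notes",
    "after_action_report",
    "findings_register",
    "prior_remediation_evidence",
]

def missing_artefacts(package: dict[str, str]) -> list[str]:
    """Artefacts absent or empty in the package; run this when the
    exercise is filed, not when the supervisory request arrives."""
    return [a for a in REQUIRED_ARTEFACTS if not package.get(a)]

# Simulate a package with one commonly missing artefact
package = {a: f"evidence/{a}.pdf" for a in REQUIRED_ARTEFACTS}
del package["observer_notes"]
gaps = missing_artefacts(package)
```

Running the check at filing time is what turns the mature/immature distinction in the paragraph above into a one-line gate rather than a reassembly exercise.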
Realistic Cadence
DORA Article 25 requires a comprehensive testing programme with a specified cadence for different test types. Tabletops are typically run:
- Executive-level crisis management tabletop: annually at minimum, with full executive committee participation
- Operational tabletop covering a specific critical function: every 6-12 months per critical or important function
- Scenario-specific tabletops triggered by threat intelligence: as needed, when the threat landscape or the entity's risk profile changes materially
NIS2 has a less prescriptive cadence but a similar effective expectation — a NIS2 entity that cannot demonstrate at least one substantive crisis management exercise per year will struggle in a competent authority inspection.
The cadence matters less than the continuity. A single excellent exercise followed by two years of nothing is weaker evidence than a consistent quarterly cadence of smaller, more focused exercises. Supervisors look for the programme, not the individual exercise.
What to Avoid
Three patterns are reliably disqualifying in 2026 supervisory reviews:
- Exercises where the scenario was the same as the vendor's template, with no entity-specific adaptation
- Exercises with no findings, or findings that all resolved to "the plan worked as expected"
- Exercises that involved only the technology function when the scenario required executive, legal, and communications decision-making
A tabletop exercise is a cheap form of testing compared to a full-scale failover or a threat-led penetration test. It is cheap because it does not require a technical environment, not because it requires less preparation. The entities that get value from tabletops — and that satisfy supervisors — are the ones that invest the preparation budget in scenario design, facilitation, and follow-through, not the ones that treat the exercise itself as the deliverable.
