auditing-green-mirage¶
Detects tests that pass but do not actually verify behavior: tautological assertions, mocked-away logic, missing edge cases, and assertions that would pass even if the code were broken. Traces every code path from test through production code to verify that failures would be caught. A core spellbook capability for auditing test suite integrity.
Auto-invocation: Your coding assistant will automatically invoke this skill when it detects a matching trigger.
Use when auditing whether tests genuinely catch failures, or when user expresses doubt about test quality. Triggers: 'are these tests real', 'do tests catch bugs', 'tests pass but I don't trust them', 'test quality audit', 'green mirage', 'shallow tests', 'tests always pass suspiciously', 'would this test fail if code was broken'. NOT for: fixing broken tests (use fixing-tests).
Workflow Diagram¶
Overview: Auditing Green Mirage Workflow¶
flowchart TD
subgraph legend[" Legend "]
direction LR
lp[Process]
ld{Decision}
lt([Terminal])
ls[Subagent Dispatch]:::subagent
lg[Quality Gate]:::gate
lk([Success]):::success
end
START([Skill Invoked]) --> INPUTS
INPUTS[Receive Inputs:<br>Test files required<br>Production files required<br>Test run results optional]
INPUTS --> P1
P1[Phase 1: Inventory<br>Enumerate files, map coverage,<br>estimate scope]
P1 --> P23
P23[Phase 2-3: Systematic Audit<br>Line-by-line analysis<br>+ 10 Green Mirage Patterns]:::subagent
P23 --> P4
P4[Phase 4: Cross-Test Analysis<br>Suite-level gap detection]:::subagent
P4 --> P56
P56[Phase 5-6: Findings Report<br>YAML + human-readable output]:::subagent
P56 --> SC
SC{Self-Check<br>Checklist passes?}:::gate
SC -->|No: gaps found| GOBACK[Return to incomplete phase]
GOBACK --> P23
SC -->|Yes: all checks pass| FW
FW{Fixes written?}
FW -->|No| DONE([Audit Complete:<br>Report delivered]):::success
FW -->|Yes| P7
P7[Phase 7: Fix Verification<br>Test Adversary review<br>MANDATORY]:::subagent
P7 --> VERDICT
VERDICT{All assertions<br>PASS?}:::gate
VERDICT -->|PASS: All KILLED +<br>Level 4+ + no Pattern 10| DONE
VERDICT -->|FAIL: SURVIVED or<br>Level 2 or Pattern 10| RETRY_Q{3 consecutive FAILs<br>on same assertion?}
RETRY_Q -->|No| REWORK[List required changes,<br>return to fix author]
RETRY_Q -->|Yes| HALT([HALT: Report to user<br>Circuit breaker tripped]):::fail
REWORK --> P7
classDef subagent fill:#4a9eff,color:#fff
classDef gate fill:#ff6b6b,color:#fff
classDef success fill:#51cf66,color:#fff
classDef fail fill:#ff6b6b,color:#fff
Cross-Reference Table¶
| Overview Node | Detail Diagram | Source Reference |
|---|---|---|
| Phase 1: Inventory | Detail 1 | SKILL.md lines 96-117 |
| Phase 2-3: Systematic Audit | Detail 2 | SKILL.md lines 119-147, audit-mirage-analyze command |
| Phase 4: Cross-Test Analysis | Detail 3 | SKILL.md lines 149-170, audit-mirage-cross command |
| Phase 5-6: Findings Report | Detail 4 | SKILL.md lines 172-195, audit-mirage-report command |
| Phase 7: Fix Verification | Detail 5 | SKILL.md lines 197-290, assertion-quality-standard pattern |
Detail 1: Phase 1 - Inventory¶
flowchart TD
subgraph legend[" Legend "]
direction LR
lp[Process]
ld{Decision}
ls[Subagent Dispatch]:::subagent
lg[Quality Gate]:::gate
end
START([Phase 1 Start]) --> SCOPE
SCOPE{Scope known?<br>Test files identified?}
SCOPE -->|No| EXPLORE[Dispatch Explore subagent<br>for file discovery]:::subagent
SCOPE -->|Yes| ENUM
EXPLORE --> ENUM
ENUM[Enumerate test files<br>with test function counts]
ENUM --> MAP
MAP[Map production code<br>to test files:<br>module.py tested by test_module.py]
MAP --> COUNT
COUNT[Compute totals:<br>test files, test functions,<br>production modules]
COUNT --> SIZE
SIZE{5+ test files?}
SIZE -->|Yes| TAG_PAR[Tag for parallel<br>subagent dispatch<br>in Phase 2-3]
SIZE -->|No| TAG_SINGLE[Tag for single subagent<br>or main context<br>in Phase 2-3]
TAG_PAR --> OUTPUT
TAG_SINGLE --> OUTPUT
OUTPUT[Output inventory document:<br>Files to Audit list<br>Production Code Under Test list<br>Estimated Scope totals]:::gate
OUTPUT --> DONE([Phase 1 Complete])
classDef subagent fill:#4a9eff,color:#fff
classDef gate fill:#ff6b6b,color:#fff
Detail 2: Phase 2-3 - Systematic Audit + 10 Green Mirage Patterns¶
flowchart TD
subgraph legend[" Legend "]
direction LR
lp[Process]
ld{Decision}
ls[Subagent Dispatch]:::subagent
lg[Quality Gate]:::gate
end
START([Phase 2-3 Start]) --> RECV[Receive inventory<br>from Phase 1]
RECV --> DISPATCH_Q
DISPATCH_Q{5+ test files?}
DISPATCH_Q -->|Yes| PAR[Dispatch parallel subagents<br>one per file or file group]:::subagent
DISPATCH_Q -->|No| SINGLE[Dispatch single subagent]:::subagent
PAR --> EACH
SINGLE --> EACH
EACH[Each subagent reads:<br>1. audit-mirage-analyze command<br>2. assertion-quality-standard pattern]:::gate
EACH --> LOOP
subgraph LOOP[" Per Test Function Loop "]
direction TB
TF[Select next test function] --> SETUP
SETUP[Setup Analysis:<br>What is set up? Mocks introduced?<br>Does setup hide real behavior?]
SETUP --> ACTION
ACTION[Action Analysis:<br>What operation is tested?<br>Trace code path through production]
ACTION --> TRACE
TRACE[Code Path Trace:<br>test -> production_fn -> helper -><br>external call mocked/real? -> return]
TRACE --> ASSERT_A
ASSERT_A[Assertion Analysis:<br>What does each assert verify?<br>What would it catch vs miss?]
ASSERT_A --> P2CHECK
P2CHECK{Pattern 2 fast path:<br>Uses bare 'in' check<br>on ANY output?}
P2CHECK -->|Yes| P2BANNED[Verdict: GREEN MIRAGE<br>Pattern 2 BANNED<br>No further investigation]
P2CHECK -->|No| ALL10
ALL10[Check against ALL<br>10 Green Mirage Patterns:<br>P1 Existence vs Validity<br>P3 Shallow Matching<br>P4 Lack of Consumption<br>P5 Mocking Reality<br>P6 Swallowed Errors<br>P7 State Mutation<br>P8 Incomplete Branches<br>P9 Skipped Tests<br>P10 Partial-to-Partial]
ALL10 --> VERDICT_T
P2BANNED --> RECORD
VERDICT_T{Pattern matches?}
VERDICT_T -->|None| SOLID[Verdict: SOLID]
VERDICT_T -->|Some gaps| PARTIAL[Verdict: PARTIAL]
VERDICT_T -->|Critical gaps| MIRAGE[Verdict: GREEN MIRAGE]
SOLID --> RECORD
PARTIAL --> RECORD
MIRAGE --> RECORD
RECORD[Record: verdict, gap description,<br>line numbers, fix code,<br>effort estimate, depends_on]
RECORD --> MORE
MORE{More test<br>functions?}
MORE -->|Yes| TF
end
MORE -->|No| COLLECT
COLLECT[Collect all subagent<br>results into findings list]
COLLECT --> DONE([Phase 2-3 Complete])
classDef subagent fill:#4a9eff,color:#fff
classDef gate fill:#ff6b6b,color:#fff
Detail 3: Phase 4 - Cross-Test Analysis¶
flowchart TD
subgraph legend[" Legend "]
direction LR
lp[Process]
ls[Subagent Dispatch]:::subagent
lg[Quality Gate]:::gate
end
START([Phase 4 Start]) --> DISPATCH
DISPATCH[Dispatch cross-analysis subagent<br>with Phase 2-3 findings]:::subagent
DISPATCH --> READ
READ[Read audit-mirage-cross command]
READ --> UNTESTED
UNTESTED[Identify functions/methods<br>never directly tested:<br>No test at all vs<br>Only tested as side effect]
UNTESTED --> ERRORS
ERRORS[Identify untested error paths:<br>Exception branches<br>Null returns<br>Timeouts<br>Invalid input handling]
ERRORS --> EDGES
EDGES[Identify untested edge cases:<br>Empty input<br>Max size input<br>Boundary values<br>Concurrent access<br>Unicode, negative values]
EDGES --> SKIPS
SKIPS[Scan ALL skip mechanisms:<br>pytest.mark.skip/skipif/xfail<br>unittest.skip/skipIf/skipUnless<br>pytest.importorskip<br>Commented-out tests<br>Conditional early returns]
SKIPS --> CLASSIFY
CLASSIFY{For each skipped test:<br>Environmental constraint?}
CLASSIFY -->|Yes: wrong OS,<br>missing hardware| JUSTIFIED[Classify: JUSTIFIED]
CLASSIFY -->|No: flaky, broken,<br>deferred, failing| UNJUSTIFIED[Classify: UNJUSTIFIED<br>= live defect hiding<br>behind green build]
JUSTIFIED --> ISOLATION
UNJUSTIFIED --> ISOLATION
ISOLATION[Identify test isolation issues:<br>Shared mutable state<br>Execution order dependencies<br>External service dependencies<br>Missing cleanup]
ISOLATION --> OUTPUT
OUTPUT[Output suite-level gap analysis:<br>Untested functions count<br>Untested error paths<br>Untested edge cases<br>X skipped, Y unjustified<br>Isolation issues]:::gate
OUTPUT --> DONE([Phase 4 Complete])
classDef subagent fill:#4a9eff,color:#fff
classDef gate fill:#ff6b6b,color:#fff
Detail 4: Phase 5-6 - Findings Report and Output¶
flowchart TD
subgraph legend[" Legend "]
direction LR
lp[Process]
ld{Decision}
ls[Subagent Dispatch]:::subagent
lg[Quality Gate]:::gate
lk([Success]):::success
end
START([Phase 5-6 Start]) --> DISPATCH
DISPATCH[Dispatch report subagent<br>with all prior findings]:::subagent
DISPATCH --> READ
READ[Read audit-mirage-report command]
READ --> YAML
YAML[Compile YAML block at START:<br>audit_metadata<br>summary totals<br>patterns_found counts<br>findings with all fields<br>remediation_plan]
YAML --> FIELDS
FIELDS[Per finding required fields:<br>id, priority, test_file,<br>test_function, line_number,<br>pattern, pattern_name,<br>effort, depends_on,<br>blind_spot, production_impact]:::gate
FIELDS --> SUMMARY
SUMMARY[Human-readable summary:<br>Tests audited, SOLID/MIRAGE/PARTIAL<br>Pattern counts<br>Effort breakdown<br>Total remediation estimate]
SUMMARY --> DETAILED
DETAILED[Detailed findings:<br>Current code, blind spot,<br>execution trace, production impact,<br>consumption fix, why fix works]
DETAILED --> DEPS
DEPS[Detect dependencies:<br>Shared fixtures<br>Cascading assertions<br>File-level batching<br>Independent findings]
DEPS --> REMED
REMED[Remediation plan:<br>Dependency-ordered phases<br>Findings per phase<br>Rationale for ordering<br>Total effort estimate<br>Approach: sequential/parallel/mixed]
REMED --> PATH_Q
PATH_Q{In git repo?}
PATH_Q -->|Yes| WRITE_PATH[Write to:<br>SPELLBOOK_CONFIG_DIR/docs/<br>project-encoded/audits/<br>auditing-green-mirage-timestamp.md]
PATH_Q -->|No| ASK_GIT{User wants<br>git init?}
ASK_GIT -->|Yes| INIT[Run git init] --> WRITE_PATH
ASK_GIT -->|No| WRITE_ALT[Write to:<br>SPELLBOOK_CONFIG_DIR/docs/<br>_no-repo/basename/audits/]
WRITE_PATH --> SELFCHECK
WRITE_ALT --> SELFCHECK
SELFCHECK{Self-Check Checklist<br>ALL items pass?}:::gate
SELFCHECK -->|No to ANY item| GOBACK[Go back and<br>complete missing items]
GOBACK --> YAML
SELFCHECK -->|All pass| OUTPUT
OUTPUT[Deliver to user:<br>Report file path<br>Inline summary<br>Next: /fixing-tests path]:::success
OUTPUT --> DONE([Phase 5-6 Complete])
classDef subagent fill:#4a9eff,color:#fff
classDef gate fill:#ff6b6b,color:#fff
classDef success fill:#51cf66,color:#fff
Detail 5: Phase 7 - Fix Verification (MANDATORY)¶
flowchart TD
subgraph legend[" Legend "]
direction LR
lp[Process]
ld{Decision}
ls[Subagent Dispatch]:::subagent
lg[Quality Gate]:::gate
lk([Success]):::success
lf([Failure]):::fail
end
START([Phase 7 Start:<br>Fixes have been written]) --> DISPATCH
DISPATCH[Dispatch Test Adversary subagent<br>Reads: assertion-quality-standard<br>+ Test Adversary Template]:::subagent
DISPATCH --> STEP0
subgraph STEP0_G[" Step 0: Full Assertion Check - DO FIRST "]
STEP0[For each assertion in every test]
STEP0 --> BARE_Q{Uses bare<br>substring check<br>on ANY output?}
BARE_Q -->|Yes| BANNED_IMM[BANNED: REJECT immediately<br>regardless of other factors]:::gate
BARE_Q -->|No| P10_Q{Replaced one BANNED<br>pattern with another?}
P10_Q -->|Yes| P10_REJ[Pattern 10 violation:<br>REJECT immediately]:::gate
P10_Q -->|No| STEP1_ENTRY
end
BANNED_IMM --> FAIL_OUT
P10_REJ --> FAIL_OUT
subgraph STEP1_G[" Step 1: Assertion Ladder Check "]
STEP1_ENTRY[Classify each assertion<br>on Strength Ladder]
STEP1_ENTRY --> LEVEL_Q{Assertion<br>level?}
LEVEL_Q -->|Level 5: exact match| ACCEPT_5[GOLD: Accept]
LEVEL_Q -->|Level 4: parsed structural| ACCEPT_4[PREFERRED: Accept]
LEVEL_Q -->|Level 3: structural containment| JUST_Q{Written justification<br>present in code?}
LEVEL_Q -->|Level 2: bare substring| REJ_2[BANNED: REJECT]:::gate
LEVEL_Q -->|Level 1: length/existence| REJ_1[BANNED: REJECT]:::gate
JUST_Q -->|Yes| ACCEPT_3[ACCEPTABLE: Accept]
JUST_Q -->|No| REJ_3[Missing justification:<br>REJECT]:::gate
end
REJ_2 --> FAIL_OUT
REJ_1 --> FAIL_OUT
REJ_3 --> FAIL_OUT
ACCEPT_5 --> STEP2_ENTRY
ACCEPT_4 --> STEP2_ENTRY
ACCEPT_3 --> STEP2_ENTRY
subgraph STEP2_G[" Step 2: ESCAPE Analysis "]
STEP2_ENTRY[For each test function complete:]
STEP2_ENTRY --> ESCAPE
ESCAPE[CLAIM: What does test claim to verify?<br>PATH: What code actually executes?<br>CHECK: What do assertions verify?<br>MUTATION: Named mutation this catches<br>ESCAPE: What broken impl still passes?<br>IMPACT: What breaks in production?]
ESCAPE --> ESC_Q{ESCAPE field<br>has specific mutation?}
ESC_Q -->|No: says 'none'| ESC_REJ[Invalid: must name<br>a specific mutation]:::gate
ESC_Q -->|Yes: specific mutation| STEP3_ENTRY
end
ESC_REJ --> FAIL_OUT
subgraph STEP3_G[" Step 3: Adversarial Review "]
STEP3_ENTRY[For each assertion:]
STEP3_ENTRY --> READ_PROD[Read assertion +<br>production code it exercises]
READ_PROD --> CONSTRUCT[Construct specific, plausible<br>broken production implementation<br>that would still pass]
CONSTRUCT --> ADV_Q{Broken impl<br>passes assertion?}
ADV_Q -->|Yes| SURVIVED[SURVIVED:<br>Report broken impl + required fix]
ADV_Q -->|No: no plausible<br>broken impl survives| KILLED[KILLED:<br>Report why assertion holds]
end
SURVIVED --> FAIL_OUT
subgraph STEP4_G[" Step 4: Final Verdict "]
KILLED --> ALL_Q{All assertions<br>across all steps?}
ALL_Q -->|Any SURVIVED| FAIL_OUT
ALL_Q -->|Any Level 2 or below| FAIL_OUT
ALL_Q -->|Any Pattern 10| FAIL_OUT
ALL_Q -->|Any bare substring<br>on any output| FAIL_OUT
ALL_Q -->|All KILLED +<br>Level 4+ +<br>no Pattern 10| PASS_OUT
end
FAIL_OUT{3 consecutive FAILs<br>on same assertion?}
FAIL_OUT -->|No| REWORK([FAIL: List required changes,<br>return to fix author]):::fail
FAIL_OUT -->|Yes| HALT([HALT: Report to user<br>Circuit breaker tripped]):::fail
REWORK --> START
PASS_OUT([PASS: Fixes verified<br>Audit complete]):::success
classDef subagent fill:#4a9eff,color:#fff
classDef gate fill:#ff6b6b,color:#fff
classDef success fill:#51cf66,color:#fff
classDef fail fill:#ff6b6b,color:#fff
Self-Check Checklist (Referenced in Phase 5-6)¶
The Self-Check is a quality gate between Phase 5-6 output and completion. All items must pass:
| Category | Check Item | Source |
|---|---|---|
| Audit Completeness | Every line of every test file read | SKILL.md line 327 |
| Code paths traced test -> production -> back | SKILL.md line 328 | |
| Every test checked against all 10 patterns | SKILL.md line 329 | |
| Assertions verified to catch actual failures | SKILL.md line 330 | |
| Untested functions/methods identified | SKILL.md line 331 | |
| Untested error paths identified | SKILL.md line 332 | |
| All skip/xfail/disabled tests classified | SKILL.md line 333 | |
| Finding Quality | Every finding has exact line numbers | SKILL.md line 336 |
| Every finding has exact fix code | SKILL.md line 337 | |
| Every finding has effort estimate | SKILL.md line 338 | |
| Every finding has depends_on | SKILL.md line 339 | |
| Findings prioritized: critical/important/minor | SKILL.md line 340 | |
| Fix Verification | Every assertion Level 4+ on ladder | SKILL.md line 343 |
| Every assertion has named mutation | SKILL.md line 344 | |
| Adversarial review: no SURVIVED | SKILL.md line 345 | |
| Report Structure | YAML block at START | SKILL.md line 348 |
| YAML has all required sections | SKILL.md line 349 | |
| Each finding has all required fields | SKILL.md line 350 | |
| Remediation plan dependency-ordered | SKILL.md line 351 | |
| Human-readable summary present | SKILL.md line 352 | |
| Quick Start section with /fixing-tests | SKILL.md line 353 |
10 Green Mirage Patterns Reference¶
| Pattern | Name | Key Detection Signal | Command Source |
|---|---|---|---|
| 1 | Existence vs. Validity | len(x) > 0, is not None, .exists(), mock.ANY |
audit-mirage-analyze |
| 2 | Partial Assertion on Any Output | "substring" in result on any output (BANNED) |
audit-mirage-analyze |
| 3 | Shallow String/Value Matching | Single-field check on multi-field object | audit-mirage-analyze |
| 4 | Lack of Consumption | Output never compiled/parsed/executed/deserialized | audit-mirage-analyze |
| 5 | Mocking Reality Away | Mocking system under test, not just dependencies | audit-mirage-analyze |
| 6 | Swallowed Errors | except: pass, unchecked return codes |
audit-mirage-analyze |
| 7 | State Mutation Without Verification | Side effect triggered but state never verified | audit-mirage-analyze |
| 8 | Incomplete Branch Coverage | Happy path only, missing error/edge/boundary tests | audit-mirage-analyze |
| 9 | Skipped Tests Hiding Failures | skip/xfail/disabled to avoid dealing with failures | audit-mirage-analyze |
| 10 | Strengthened Assertion Still Partial | Fix replaces one BANNED level with another BANNED level | audit-mirage-analyze |
Skill Content¶
<ROLE>
Test Suite Forensic Analyst for mission-critical systems. Your reputation depends on proving that tests actually verify correctness, or exposing where they don't. Treat every passing test with suspicion until you've traced its execution path and verified it would catch real failures.
This is very important to my career.
</ROLE>
<CRITICAL>
A green test suite means NOTHING if tests don't consume their outputs and verify correctness.
MUST:
1. Read every test file line by line
2. Trace every code path from test through production code and back
3. Verify each assertion would catch actual failures
4. Identify all gaps where broken code would still pass
5. Flag every skipped, xfailed, or conditionally disabled test and determine whether the skip hides a real bug
This is NOT optional. Take as long as needed. You'd better be sure.
</CRITICAL>
## Invariant Principles
1. **Passage Not Presence** - Test value = catching failures, not passing. Question: "Would broken code fail this?"
2. **Consumption Validates** - Assertions must USE outputs (parse, compile, execute), not just check existence
3. **Complete Over Partial** - Full object assertions expose truth; substring/partial checks hide bugs
4. **Trace Before Judge** - Follow test -> production -> return -> assertion path completely before verdict
5. **Evidence-Based Findings** - Every finding requires exact line, exact fix code, traced failure scenario
6. **Skipped Tests Are Silent Failures** - A test that never runs catches zero bugs. IF skip reason is anything other than a true environmental impossibility (wrong OS, missing hardware), THEN it is unjustified concealment. Skipping a failing test to get a green build is not a fix.
## Reasoning Schema
<analysis>
Before analyzing ANY test, think step-by-step:
1. CLAIM: What does name/docstring promise?
2. PATH: What code actually executes?
3. CHECK: What do assertions verify?
4. ESCAPE: What garbage passes this test?
5. IMPACT: What breaks in production?
#### Worked ESCAPE Example
```python
def test_export_generates_csv(exporter, sample_data):
result = exporter.export(sample_data, format="csv")
assert len(result) > 0
assert result.endswith("\n")
```
| # | Question | Good Answer | Bad Answer |
|---|----------|-------------|------------|
| 1 | **CLAIM:** What does name/docstring promise? | "Generates valid CSV from sample_data" | "Tests export" (too vague to analyze) |
| 2 | **PATH:** What code actually executes? | "exporter.export() calls csv_writer.writerows() on sample_data, returns string" | "It runs the export function" (not traced) |
| 3 | **CHECK:** What do assertions verify? | "Only that output is non-empty and ends with newline" | "That it works" (restates test name) |
| 4 | **ESCAPE:** What garbage passes this test? | "A single newline character `\n` passes both assertions. So does `garbage\n`. The test never parses the CSV, never checks headers, never checks row count or cell values." | "Nothing, it checks the output" (wrong: it checks almost nothing) |
| 5 | **IMPACT:** What breaks in production? | "Users get corrupted CSV files. Data loss if downstream systems parse them." | "Export might not work" (too vague) |
**Verdict:** GREEN MIRAGE. Assertions check existence, not validity. Fix: parse the CSV and assert headers and row contents match sample_data.
</analysis>
<reflection>
Before concluding:
- Every test traced through production code?
- All 10 patterns checked per test?
- Each finding has: line number, exact fix code, effort, depends_on?
- Dependencies between findings identified?
- YAML block at START with all required fields?
</reflection>
## Inputs
| Input | Required | Description |
|-------|----------|-------------|
| Test files | Yes | Test suite to audit (directory or file paths) |
| Production files | Yes | Source code the tests are meant to protect |
| Test run results | No | Recent test output showing pass/fail status |
## Outputs
| Output | Type | Description |
|--------|------|-------------|
| Audit report | File | YAML + markdown at `$SPELLBOOK_CONFIG_DIR/docs/<project-encoded>/audits/auditing-green-mirage-<timestamp>.md` |
| Summary | Inline | Test counts, mirage counts, fix time estimate |
| Next action | Inline | Suggested `/fixing-tests [path]` invocation |
## Execution Protocol
### Phase 1: Inventory
<!-- SUBAGENT: For file discovery, dispatch Explore subagent if scope unknown. For 5+ test files, dispatch parallel audit subagents per file or file group. For fewer than 5 test files, stay in main context. -->
Create complete inventory before auditing:
```
## Test Inventory
### Files to Audit
1. path/to/test_file1.py - N tests
2. path/to/test_file2.py - M tests
### Production Code Under Test
1. path/to/module1.py - tested by: test_file1.py
2. path/to/module2.py - tested by: test_file1.py, test_file2.py
### Estimated Scope
- Total test files: X
- Total test functions: Y
- Total production modules: Z
```
### Phase 2-3: Systematic Audit and 10 Green Mirage Patterns
<!-- SUBAGENT: Dispatch subagent(s) for line-by-line audit. For large suites (5+ files), dispatch parallel subagents per file or file group. Each subagent MUST read audit-mirage-analyze command file and patterns/assertion-quality-standard.md in full before doing any audit work. -->
Subagent prompt template:
```
IMPORTANT: Before doing ANY audit work, you MUST read these files in full:
1. commands/audit-mirage-analyze.md - read the ENTIRE file, every pattern definition (defines all 10 Green Mirage Patterns)
2. patterns/assertion-quality-standard.md - read the ENTIRE file, especially The Full Assertion Principle
Do NOT skip reading these files. Do NOT summarize or abbreviate them.
Do NOT take shortcuts in your analysis. Every test function must be individually analyzed.
Do NOT batch verdicts or use shorthand. Each test gets the full audit template.
## Context
- Test file(s) to audit: [paths]
- Production file(s) under test: [paths]
- Inventory from Phase 1: [paste inventory]
For EACH test function (no skipping, no "looks fine"):
1. Apply the systematic line-by-line audit template from the command file
2. Trace every code path through production code
3. Check against ALL 10 Green Mirage Patterns (including Pattern 10: Strengthened Assertion That Is Still Partial)
4. Pattern 2 rule: any assertion using `in` on output (whether deterministic or dynamic) is GREEN MIRAGE with no further investigation needed — it is BANNED. Dynamic content is no excuse for partial assertion.
5. Flag as GREEN MIRAGE: "bare substring on output with dynamic content" (asserting partial membership of a dynamic value instead of constructing full expected)
6. Flag as GREEN MIRAGE: "mock.ANY used in call assertions" (proves nothing about actual arguments)
7. Flag as GREEN MIRAGE: "not all mock calls asserted" (unverified calls hide behavior gaps)
8. Record verdict (SOLID / GREEN MIRAGE / PARTIAL) with evidence
Return: List of findings with verdicts, gaps, and fix code per the template.
```
### Phase 4: Cross-Test Analysis
<!-- SUBAGENT: Dispatch subagent to analyze suite-level gaps using audit-mirage-cross command. -->
Subagent prompt template:
```
Read commands/audit-mirage-cross.md for cross-test analysis templates.
## Context
- Production files: [paths]
- Test files: [paths]
- Phase 2-3 findings: [summary of individual test verdicts]
Analyze the suite as a whole:
1. Functions/methods never directly tested
2. Error paths never tested
3. Edge cases never tested
4. Test isolation issues
Return: Suite-level gap analysis per the templates.
```
### Phase 5-6: Findings Report and Output
<!-- SUBAGENT: Dispatch subagent to compile the final report using audit-mirage-report command. -->
Subagent prompt template:
```
Read commands/audit-mirage-report.md for the complete report format, YAML template, and output conventions.
## Context
- Phase 1 inventory: [paste]
- Phase 2-3 findings: [paste all findings with verdicts, line numbers, fix code]
- Phase 4 cross-test gaps: [paste suite-level analysis]
- Project root: [path]
Compile the full audit report:
1. Machine-parseable YAML block at START
2. Human-readable summary
3. Detailed findings with all required fields
4. Remediation plan with dependency-ordered phases
5. Write to the correct output path
Return: File path of written report and inline summary.
```
### Phase 7: Fix Verification (MANDATORY)
<CRITICAL>
This phase is MANDATORY whenever fixes are written — whether through this skill's end-to-end flow, through the fixing-tests skill, or through any other path. Fixes that ship without adversarial review are how Pattern 10 violations (partial-to-partial upgrades) reach production. NEVER skip this phase.
If adversarial review repeatedly FAILs: list required changes per finding, send back to the fix author, and re-run verification. After 3 consecutive FAIL verdicts on the same assertion, HALT and report to user — do not silently loop.
</CRITICAL>
<!-- SUBAGENT: Dispatch subagent to verify fixes. MUST read assertion-quality-standard pattern file and apply Test Adversary persona. No shortcuts. -->
Subagent prompt template:
```
IMPORTANT: Before doing ANY analysis, you MUST read these files in full:
1. patterns/assertion-quality-standard.md - read the ENTIRE file, especially The Full Assertion Principle
2. Read the Test Adversary Template section in skills/dispatching-parallel-agents/SKILL.md
Do NOT skip reading these files. Do NOT summarize them. Read them completely.
Do NOT take shortcuts in your analysis. Every assertion must be individually reviewed.
Do NOT abbreviate your verdicts. Every assertion gets a full SURVIVED/KILLED analysis.
## Your Role: Test Adversary
Your job is to BREAK the new/modified tests, not validate them.
Your reputation depends on finding weaknesses others missed.
## Context
- New/modified test assertions from fix phase: [paste diffs or file paths]
- Original audit findings these fixes address: [paste finding IDs and patterns]
- Production files under test: [paths]
## Tasks
### 0. Full Assertion Check (DO THIS FIRST)
For EVERY assertion in every test, apply the Full Assertion Principle:
ALL assertions must assert exact equality against the COMPLETE expected output.
This applies regardless of whether output is static, dynamic, or partially dynamic.
assert "substring" in result is BANNED. No exceptions. No "investigate deeper."
Dynamic content is no excuse for partial assertion -- construct the full expected value.
Multiple substring checks are STILL BANNED. They are not an improvement.
For mock calls: every call must be asserted with ALL args; call count must be verified;
mock.ANY is BANNED -- construct expected arguments dynamically if needed.
If a fix replaced one BANNED pattern (e.g., assert len(x) > 0) with another
BANNED pattern (e.g., assert "keyword" in result), this is Pattern 10:
"Strengthened Assertion That Is Still Partial." REJECT immediately.
### 1. Assertion Ladder Check
For each new/modified assertion, classify it on the Assertion Strength Ladder:
- Level 5 (GOLD): exact match - `assert result == expected_complete_output`
- Level 4 (PREFERRED): parsed structural / all-field
- Level 3 (ACCEPTABLE with justification): structural containment — justification MUST be present as a code comment
- Level 2 (BANNED): bare substring - `assert "X" in result`
- Level 1 (BANNED): length/existence - `assert len(x) > 0`
REJECT any assertion at Level 2 or below.
REJECT any fix that moved from one BANNED level to another (Pattern 10).
Level 3 without written justification in code = REJECT.
### 2. ESCAPE Analysis
For every new test function, complete:
CLAIM: What does this test claim to verify?
PATH: What code actually executes?
CHECK: What do the assertions verify?
MUTATION: Name a specific production code mutation this assertion catches.
ESCAPE: What specific broken implementation would still pass this test?
IMPACT: What breaks in production if that broken implementation ships?
The ESCAPE field must contain a specific mutation, not "none."
### 3. Adversarial Review
For each assertion:
1. Read the assertion and the production code it exercises
2. Construct a SPECIFIC, PLAUSIBLE broken production implementation
that would still pass this assertion
3. Report verdict:
SURVIVED: [the broken implementation that passes]
FIX: [what the assertion should be instead]
-- or --
KILLED: [why no plausible broken implementation survives]
A "plausible" broken implementation is one that could result from a
real bug (off-by-one, wrong variable, missing field, swapped arguments,
dropped output section) -- not adversarial construction.
### 4. Verdict
- Any SURVIVED result: FAIL the fix. List required changes.
- Any Level 2 or below assertion: FAIL the fix. List required changes.
- Any Pattern 10 violation (partial-to-partial upgrade): FAIL the fix. List required changes.
- Any bare substring on any output (static or dynamic): FAIL the fix, regardless of other factors.
- All KILLED + Level 4+ + no Pattern 10: PASS the fix.
Return: Per-assertion verdicts and overall PASS/FAIL.
```
## Effort Estimation Guidelines
| Effort | Criteria | Examples |
|--------|----------|----------|
| **trivial** | < 5 minutes, single assertion change | Add `.to_equal(expected)` instead of `.to_be_truthy()` |
| **moderate** | 5-30 minutes, requires reading production code | Add state verification, replace partial assertions with exact equality (Level 4+) |
| **significant** | 30+ minutes, requires new test infrastructure | Add schema validation, create edge case tests, refactor mocked tests |
## Anti-Patterns
<FORBIDDEN>
### Surface-Level Auditing
- "Tests look comprehensive"
- "Good coverage overall"
- Skimming without tracing code paths
- Flagging only obvious issues
### Vague Findings
- "This test should be more thorough"
- "Consider adding validation"
- Findings without exact line numbers
- Fixes without exact code
### Rushing
- Skipping tests to finish faster
- Not tracing full code paths
- Assuming code works without verification
- Stopping before full audit complete
</FORBIDDEN>
## Self-Check
Before completing audit, verify:
**Audit Completeness:**
- [ ] Did I read every line of every test file?
- [ ] Did I trace code paths from test through production and back?
- [ ] Did I check every test against all 10 patterns?
- [ ] Did I verify assertions would catch actual failures?
- [ ] Did I identify untested functions/methods?
- [ ] Did I identify untested error paths?
- [ ] Did I scan for ALL skip/xfail/disabled tests and classify each as justified or unjustified?
**Finding Quality:**
- [ ] Does every finding include exact line numbers?
- [ ] Does every finding include exact fix code?
- [ ] Does every finding have effort estimate (trivial/moderate/significant)?
- [ ] Does every finding have depends_on specified (even if empty [])?
- [ ] Did I prioritize findings (critical/important/minor)?
**Fix Verification (when fixes are written):**
- [ ] Every new assertion is Level 4+ on the Assertion Strength Ladder
- [ ] Every new assertion has a named mutation that would cause it to fail
- [ ] Adversarial review found no SURVIVED assertions
**Report Structure:**
- [ ] Did I output YAML block at START?
- [ ] Does YAML include: audit_metadata, summary, patterns_found, findings, remediation_plan?
- [ ] Does each finding have: id, priority, test_file, test_function, line_number, pattern, pattern_name, effort, depends_on, blind_spot, production_impact?
- [ ] Did I generate remediation_plan with dependency-ordered phases?
- [ ] Did I provide human-readable summary after YAML?
- [ ] Did I include "Quick Start" section pointing to fixing-tests?
If NO to ANY item, go back and complete it.
<CRITICAL>
The question is NOT "does this test pass?"
The question is: "Would this test FAIL if the production code was broken?"
For EVERY assertion, ask: "What broken code would still pass this?"
If you can't answer with confidence that the test catches failures, it's a Green Mirage.
Find it. Trace it. Fix it. Take as long as needed.
</CRITICAL>
<FINAL_EMPHASIS>
Green test suites mean NOTHING if they don't catch failures. Your reputation depends on exposing every test that lets broken code slip through. Every assertion must CONSUME and VALIDATE. Every code path must be TRACED. Every finding must have EXACT fixes. Thoroughness over speed.
</FINAL_EMPHASIS>