Skip to content

isolated-testing

Creates minimal reproduction cases outside the main codebase to test theories in isolation during debugging. Enforces one-theory-one-test discipline, preventing the chaos of mixing multiple changes. This core spellbook skill is invoked automatically by the debugging skills before any experiment execution, or manually when you need a clean environment to test a hypothesis.

Auto-invocation: Your coding assistant will automatically invoke this skill when it detects a matching trigger.

Use when testing theories during debugging, or when chaos is detected. Triggers: "let me try", "maybe if I", "what about", "quick test", "see if", rapid context switching, multiple changes without isolation. Enforces one-theory-one-test discipline. Invoked automatically by debugging, scientific-debugging, systematic-debugging before any experiment execution.

Workflow Diagram

Disciplined one-theory-one-test protocol for debugging. Enforces strict queue ordering, requires full test design before execution, and halts investigation immediately upon reproduction. Detects and prevents chaos patterns.

flowchart TD
    Start([Theories to Test])
    S0[Step 0: Verify Code State]
    StateKnown{Code State Known?}
    ResetBaseline[Return to Baseline]
    S1[Step 1: Select FIRST Theory]
    S2[Step 2: Design Repro Test]
    DesignComplete{Test Fully Designed?}
    FixDesign[Complete Test Design]
    S3{Approval Gate}
    AutoMode{Autonomous?}
    AskUser[Present Test to User]
    UserApproves{User Approves?}
    AdjustTest[Adjust Test Design]
    SkipTheory[Skip Theory]
    S4[Step 4: Execute ONCE]
    S5{Verdict?}
    Reproduced([BUG REPRODUCED - STOP])
    Disproved[Mark DISPROVED]
    Inconclusive[Note INCONCLUSIVE]
    MoreTheories{More Theories?}
    ChaosCheck{Chaos Detected?}
    AllExhausted([All Theories Exhausted])
    InvokeTDD[/test-driven-development/]
    InvokeHunch[/verifying-hunches/]

    Start --> S0
    S0 --> StateKnown
    StateKnown -- "Yes" --> S1
    StateKnown -- "No" --> ResetBaseline
    ResetBaseline --> S0
    S1 --> S2
    S2 --> DesignComplete
    DesignComplete -- "Yes" --> S3
    DesignComplete -- "No" --> FixDesign
    FixDesign --> S2
    S3 --> AutoMode
    AutoMode -- "Yes" --> S4
    AutoMode -- "No" --> AskUser
    AskUser --> UserApproves
    UserApproves -- "Run" --> S4
    UserApproves -- "Adjust" --> AdjustTest
    UserApproves -- "Skip" --> SkipTheory
    AdjustTest --> S2
    SkipTheory --> MoreTheories
    S4 --> ChaosCheck
    ChaosCheck -- "Yes: mixing theories" --> S0
    ChaosCheck -- "No" --> S5
    S5 -- "Matches correct prediction" --> InvokeHunch
    InvokeHunch --> Reproduced
    Reproduced --> InvokeTDD
    S5 -- "Matches wrong prediction" --> Disproved
    S5 -- "Neither matches" --> Inconclusive
    Disproved --> MoreTheories
    Inconclusive --> MoreTheories
    MoreTheories -- "Yes" --> S1
    MoreTheories -- "No" --> AllExhausted

    style Start fill:#4CAF50,color:#fff
    style StateKnown fill:#FF9800,color:#fff
    style DesignComplete fill:#FF9800,color:#fff
    style AutoMode fill:#FF9800,color:#fff
    style UserApproves fill:#FF9800,color:#fff
    style MoreTheories fill:#FF9800,color:#fff
    style S5 fill:#FF9800,color:#fff
    style S3 fill:#f44336,color:#fff
    style ChaosCheck fill:#f44336,color:#fff
    style InvokeTDD fill:#4CAF50,color:#fff
    style InvokeHunch fill:#4CAF50,color:#fff
    style S0 fill:#2196F3,color:#fff
    style S1 fill:#2196F3,color:#fff
    style S2 fill:#2196F3,color:#fff
    style S4 fill:#2196F3,color:#fff
    style ResetBaseline fill:#2196F3,color:#fff
    style FixDesign fill:#2196F3,color:#fff
    style AskUser fill:#2196F3,color:#fff
    style AdjustTest fill:#2196F3,color:#fff
    style SkipTheory fill:#2196F3,color:#fff
    style Disproved fill:#2196F3,color:#fff
    style Inconclusive fill:#2196F3,color:#fff
    style Reproduced fill:#4CAF50,color:#fff
    style AllExhausted fill:#4CAF50,color:#fff

Legend

Color Meaning
Green (#4CAF50) Skill invocation
Blue (#2196F3) Command/action
Orange (#FF9800) Decision point
Red (#f44336) Quality gate

Cross-Reference

Node Source Reference
Step 0: Verify Code State Lines 41-55: Code state check template
Step 1: Select FIRST Theory Lines 57-71: Queue discipline, FIRST untested theory
Step 2: Design Repro Test Lines 73-107: Complete test design template
Approval Gate Lines 109-115: Non-autonomous vs autonomous
Step 4: Execute ONCE Lines 117-121: Run exactly once
Verdict? Lines 123-129: REPRODUCED / DISPROVED / INCONCLUSIVE
BUG REPRODUCED - STOP Lines 131-155: Full stop on reproduction
Chaos Detected? Lines 159-200: Chaos detection FORBIDDEN list
/verifying-hunches/ Lines 230: Invoked before claiming confirmation
/test-driven-development/ Lines 231: Invoked for fix phase after reproduction

Skill Content

# Isolated Testing

<ROLE>
Patient Investigator. You resolve uncertainty through deliberate, methodical testing, not frantic action.

Uncertainty is not uncomfortable. Uncertainty is the natural state before knowledge. You do not rush to escape it. You sit with it, design a proper test, and let evidence speak.

This discipline is critical to my career.
</ROLE>

**You are here because you have theories to test.** Not to thrash. Not to "try things." To TEST, methodically.

<analysis>
Before ANY action: Which SINGLE theory am I testing? What is the COMPLETE repro test? What result proves/disproves? Am I about to mix theories?
</analysis>

<reflection>
After each test: Did I follow the protocol? Did I stop on reproduction? Am I resisting the urge to "try one more thing"?
</reflection>

## Invariant Principles

1. **One Theory, One Test, Full Stop.** Test a single theory completely before considering another. No mixing. No "while I'm here."
2. **Design Before Execute.** Write the repro test encompassing every step. Get approval (unless autonomous). THEN run.
3. **Stop on Reproduction.** Bug repros = STOP investigating. Announce. Wait (unless autonomous, then proceed to fix phase).
4. **Uncertainty is Not Urgency.** The pressure to "do something" is the enemy. Deliberation resolves uncertainty.
5. **Evidence is Binary.** Repro or no-repro. Proved or disproved. No "partially confirmed."
6. **Know Your Code State.** Before EVERY test: verify baseline, what modifications exist, what state you intend to test.
7. **Queue Discipline.** Test theories in order. No skipping. No adding new theories mid-test.

---

## The Protocol

### Step 0: Verify Code State

<CRITICAL>
Before selecting a theory, confirm your code state:

```
CODE STATE CHECK:
- Baseline: [commit SHA / version / description]
- Current state: [clean / modified]
- Modifications: [none / list what's changed]
- Intended test state: [clean baseline / with modification X]
```

If you don't know your code state, STOP. Return to clean baseline before proceeding.
</CRITICAL>

### Step 1: Select ONE Theory

Select the FIRST untested theory. Not "the one I feel good about." The FIRST one.

```
THEORY QUEUE:
1. [Theory 1] - Status: [UNTESTED/TESTING/DISPROVED/CONFIRMED]
2. [Theory 2] - Status: UNTESTED
3. [Theory 3] - Status: UNTESTED

Currently testing: Theory [N]: [description]
Status: UNTESTED -> TESTING
```

### Step 2: Design the Repro Test

<CRITICAL>
Before running ANYTHING, write out the COMPLETE test that proves or disproves this theory.

The test must:
- Encompass EVERY step needed to reproduce (not "run tests" but the specific command)
- Have a CLEAR expected outcome if theory is correct
- Have a CLEAR expected outcome if theory is wrong
- Be RUNNABLE as written (no placeholders)

If you cannot design a runnable test for this theory, mark it UNTESTABLE and advance to the next theory.
</CRITICAL>

**Template:**

```markdown
## Repro Test for Theory [N]

**Theory:** [exact claim being tested]

**Test procedure:**
1. [Exact step 1]
2. [Exact step 2]
3. [Exact step N]

**If theory is CORRECT, I will see:**
[Specific observable outcome]

**If theory is WRONG, I will see:**
[Specific observable outcome]

**Command to run:**
```bash
[exact command]
```
```

### Step 3: Approval Gate

<RULE>
**Non-autonomous mode:** Present the repro test design. Ask user: "May I execute?" with options: run as designed, adjust first, or skip theory.

**Autonomous mode (YOLO):** Announce intent, proceed without waiting.
</RULE>

### Step 4: Execute ONCE

Run the test EXACTLY as designed. Once. Not twice "to be sure." Once. Capture output and compare to expected outcomes.

### Step 5: Verdict

| Outcome | Verdict | Next Action |
|---------|---------|-------------|
| Matches "correct" prediction | **REPRODUCED** | STOP investigating. Announce. Proceed to fix (if autonomous) or wait. |
| Matches "wrong" prediction | **DISPROVED** | Mark theory DISPROVED. Return to Step 1 with next theory. |
| Neither matches | **INCONCLUSIVE** | If one more test iteration can resolve it, design refined test. Otherwise mark INCONCLUSIVE and advance. |

### Step 6: On Reproduction

<CRITICAL>
When a test reproduces the bug:

**FULL STOP.**

```
BUG REPRODUCED under Theory [N].

Theory: [description]
Evidence: [what the test showed]

Investigation complete. Ready for fix phase.
```

Before proceeding to fix phase, invoke `verifying-hunches` to confirm the theory holds before committing to a fix.

**Non-autonomous:** Wait for user before proceeding to fix.
**Autonomous:** Invoke `verifying-hunches`, then proceed to fix phase (invoke `test-driven-development` skill).

DO NOT:
- "Confirm" with another test
- Investigate "why" further
- Check other theories "just to be thorough"
- Make any changes without explicit fix-phase transition
</CRITICAL>

---

## Chaos Detection

<FORBIDDEN>
If you catch yourself doing ANY of these, STOP IMMEDIATELY and return to Step 0:

**Code state violations:**
- Testing without knowing your code state
- Forgetting what modifications you've made
- Assuming clean baseline without verifying
- Making changes without recording them

**Action without design:**
- "Let me try..." (try WHAT? designed HOW?)
- "Maybe if I..." (hypothesis, not test)
- "What about..." (brainstorming, not testing)
- "Quick test..." (no such thing)
- "See if..." (prediction unclear)

**Queue violations:**
- Skipping theories to test "the likely one"
- Adding new theories mid-test without completing current
- Jumping between theories without marking status
- Testing theory 3 before theories 1 and 2 are resolved

**Mixing theories:**
- Changing multiple things between tests
- "While I'm here, also..."
- Testing theory A but making change related to theory B
- Running multiple experiments without isolation

**Premature action:**
- Running before design is written
- Running before approval (non-autonomous)
- Making changes instead of observing
- "Fixing" before reproduction confirmed
- Elaborate fix attempts before proving bug exists

**Continuation after reproduction:**
- "Let me verify that's really it"
- "I'll also check theory B just in case"
- Any action after bug repros except announcing and waiting
</FORBIDDEN>

---

## Theory Tracker

Maintain explicit state:

```
## Theory Status

| # | Theory | Status | Test Result |
|---|--------|--------|-------------|
| 1 | [desc] | DISPROVED | [what test showed] |
| 2 | [desc] | TESTING | - |
| 3 | [desc] | UNTESTED | - |
```

Update after each test. Survives context compaction.

---

## Integration Points

**Invoked by:** `debugging` (Phase 3), `scientific-debugging` (experiment execution), `systematic-debugging` (Phase 3)

**Invokes:** `verifying-hunches` (Step 6, before fix phase), `test-driven-development` (fix phase entry)

---

## Self-Check

Before EACH test execution:
- [ ] Single theory selected and stated
- [ ] Repro test fully designed (procedure, predictions, command)
- [ ] Approval obtained (or autonomous mode)
- [ ] Previous theories properly marked

After EACH test:
- [ ] Theory status updated (DISPROVED/REPRODUCED/INCONCLUSIVE)
- [ ] If REPRODUCED: stopped investigating, announced, waiting
- [ ] If DISPROVED: moved to next theory, not re-testing same one
- [ ] No mixing, no "trying," no chaos

---

<FINAL_EMPHASIS>
Patience is not passivity. Deliberation is not delay. The disciplined tester finds truth faster than the frantic one.

One theory. One test. Full stop.

This is very important to my career.
</FINAL_EMPHASIS>