Skip to content

/fix-tests-execute

Workflow Diagram

Execute test fixes by priority: investigate each work item, classify the fix type, apply the fix, verify it catches the original blind spot, and commit independently.

flowchart TD
  Start([Start: Work items parsed]) --> PickItem[Pick next by priority]

  style Start fill:#4CAF50,color:#fff
  style PickItem fill:#2196F3,color:#fff

  PickItem --> ReadTest[Read test file]

  style ReadTest fill:#2196F3,color:#fff

  ReadTest --> ReadProd[Read production code]

  style ReadProd fill:#2196F3,color:#fff

  ReadProd --> Analyze[Analyze what is wrong]

  style Analyze fill:#2196F3,color:#fff

  Analyze --> ClassifyFix{Fix type?}

  style ClassifyFix fill:#FF9800,color:#000

  ClassifyFix -->|Weak assertions| Strengthen[Strengthen assertions]
  ClassifyFix -->|Missing edge case| AddEdge[Add test cases]
  ClassifyFix -->|Wrong expectations| CorrectExp[Correct expectations]
  ClassifyFix -->|Broken setup| FixSetup[Fix setup/teardown]
  ClassifyFix -->|Flaky timing| FixFlaky[Fix isolation]
  ClassifyFix -->|Tests internals| Rewrite[Rewrite for behavior]
  ClassifyFix -->|Production bug| StopReport[STOP and report bug]

  style Strengthen fill:#2196F3,color:#fff
  style AddEdge fill:#2196F3,color:#fff
  style CorrectExp fill:#2196F3,color:#fff
  style FixSetup fill:#2196F3,color:#fff
  style FixFlaky fill:#2196F3,color:#fff
  style Rewrite fill:#2196F3,color:#fff
  style StopReport fill:#f44336,color:#fff

  Strengthen --> RunTest[Run fixed test]
  AddEdge --> RunTest
  CorrectExp --> RunTest
  FixSetup --> RunTest
  FixFlaky --> RunTest
  Rewrite --> RunTest

  style RunTest fill:#2196F3,color:#fff

  RunTest --> TestPass{Test passes?}

  style TestPass fill:#FF9800,color:#000

  TestPass -->|No| Analyze
  TestPass -->|Yes| RunFile[Run entire test file]

  style RunFile fill:#2196F3,color:#fff

  RunFile --> FilePass{File tests pass?}

  style FilePass fill:#FF9800,color:#000

  FilePass -->|No| FixSideEffect[Fix side effects]
  FilePass -->|Yes| CatchGate{Fix catches blind spot?}

  style FixSideEffect fill:#2196F3,color:#fff
  style CatchGate fill:#f44336,color:#fff

  FixSideEffect --> RunFile

  CatchGate -->|No| Analyze
  CatchGate -->|Yes| Commit[Commit fix]

  style Commit fill:#2196F3,color:#fff

  Commit --> MoreItems{More work items?}

  style MoreItems fill:#FF9800,color:#000

  MoreItems -->|Yes| PickItem
  MoreItems -->|No| End([End: All fixes applied])

  StopReport --> End

  style End fill:#4CAF50,color:#fff

Legend

Color Meaning
Green (#4CAF50) Skill invocation
Blue (#2196F3) Command/action
Orange (#FF9800) Decision point
Red (#f44336) Quality gate

Command Content

<ROLE>
Test Quality Enforcer. Your reputation depends on fixes that ELIMINATE false confidence, not just fix syntax. A test that passes with weak assertions is worse than a failing test — it lies. This is very important to my career.
</ROLE>

# Phase 2: Fix Execution

## Invariant Principles

1. **Read before fixing** — Read the test file and production code before any changes; never guess at code structure.
2. **Verify the fix, not just the pass** — A test that passes after modification must be confirmed to catch the originally identified blind spot.
3. **One fix per commit** — Each work item fix is verified and committed independently for traceability and safe rollback.
4. **ALL output demands exact equality** — See `<FORBIDDEN>` below and `patterns/assertion-quality-standard.md`.
5. **Fixes must reach Level 4+** — Level 1→2 is still banned. Replacing one weak assertion with another is NOT a fix.

<CRITICAL>
## Assertion Quality Requirements (Non-Negotiable)

Read `patterns/assertion-quality-standard.md` in full before writing ANY fix.

### The Full Assertion Principle

Every assertion MUST assert exact equality against the COMPLETE expected output — static, dynamic, or partially dynamic. For dynamic output, construct the complete expected value dynamically, then assert `==`.

```python
# CORRECT
assert result == "the complete expected output, every character"

# CORRECT - dynamic output
assert message == f"Today's date is {datetime.date.today().isoformat()}"
```

### Required Assertion Level

Every new or modified assertion must be:
- **Level 5 (GOLD):** `assert result == expected` (exact equality on complete output)
- **Level 4 (PREFERRED):** Parse output, assert on full parsed structure

Levels 3 and below require written justification. Levels 1–2 are BANNED outright.

### Per-Assertion Verification

For EACH assertion you write, answer in your reasoning:
1. Does this assertion verify the COMPLETE expected output?
2. What specific production code mutation would cause this assertion to fail?
3. If the production code returned garbage, would this assertion catch it?

If you cannot answer #2 with a specific mutation, the assertion is too weak.
</CRITICAL>

<FORBIDDEN>
- `assert "X" in output` (bare substring on any output — static or dynamic)
- `assert len(result) > 0` (existence only)
- `assert len(result) == N` without content verification
- `assert result is not None` without value assertion
- `assert result == function_under_test(same_input)` (tautological)
- Multiple `assert "X" in result` checks (still partial)
- `mock.ANY` in any mock call assertion (construct expected argument instead)
</FORBIDDEN>

Process work items by priority: critical > important > minor.

## 2.1 Investigation

<analysis>
For EACH work item:
- What does the test claim to do? (name, docstring)
- What is actually wrong? (error, audit finding)
- What production code is involved?
</analysis>

<RULE>Always read before fixing. Never guess at code structure.</RULE>

1. Read test file (specific function + setup/teardown).
2. Read production code being tested.
3. If audit output from Phase 1 is available: treat suggested fix as starting point; verify it makes sense in context.

## 2.2 Fix Type Classification

| Situation | Fix Type |
|-----------|----------|
| Weak assertions (green mirage) | Replace with Level 4+ exact equality assertions. See FORBIDDEN and CRITICAL blocks above. |
| Missing edge cases | Add test cases |
| Wrong expectations | Correct expectations |
| Broken setup | Fix setup, not weaken test |
| Flaky (timing/ordering) | Fix isolation/determinism |
| Tests implementation details | Rewrite to test behavior |
| **Production code buggy** | STOP and report |

## 2.3 Fix Examples

**Green Mirage Fix (Pattern 2: Partial Assertions):**

```python
# BEFORE: Checks existence only (Level 1 - BANNED)
def test_generate_report():
    report = generate_report(data)
    assert report is not None
    assert len(report) > 0

# WRONG "FIX": Still partial (Level 2 - STILL BANNED)
def test_generate_report():
    report = generate_report(data)
    assert "Expected Title" in str(report)  # STILL A GREEN MIRAGE

# CORRECT FIX: Exact equality on complete output (Level 5 - GOLD)
def test_generate_report():
    report = generate_report(data)
    assert report == {
        "title": "Expected Title",
        "sections": [
            {"name": "Section 1", "valid": True, "content": "..."},
            {"name": "Section 2", "valid": True, "content": "..."},
            {"name": "Section 3", "valid": True, "content": "..."},
        ],
        "generated_at": mock_timestamp
    }
```

**Edge Case Addition:**

```python
def test_generate_report_empty_data():
    with pytest.raises(ValueError, match="Data cannot be empty"):
        generate_report([])

def test_generate_report_malformed_data():
    result = generate_report({"invalid": "structure"})
    assert result["error"] == "Invalid data format"
```

**Flaky Test Fix:**

```python
# BEFORE: Sleep and hope
def test_async_operation():
    start_operation()
    time.sleep(1)
    assert get_result() is not None

# AFTER: Deterministic waiting
def test_async_operation():
    start_operation()
    result = wait_for_result(timeout=5)
    assert result == expected_value
```

**Implementation-Coupling Fix:**

```python
# BEFORE: Tests implementation
def test_user_save():
    user = User(name="test")
    user.save()
    assert user._db_connection.execute.called_with("INSERT...")

# AFTER: Tests behavior
def test_user_save():
    user = User(name="test")
    user.save()
    loaded = User.find_by_name("test")
    assert loaded == User(name="test")
```

## 2.4 Verify Fix

```bash
pytest path/to/test.py::test_function -v
pytest path/to/test.py -v
```

<reflection>
Verification checklist:
- [ ] Specific test passes
- [ ] Other tests in file still pass
- [ ] Fix would actually catch the failure it should catch
- [ ] Every new assertion is Level 4+ on the Assertion Strength Ladder
- [ ] No bare substring checks (`assert "X" in result`) on any output (static or dynamic)
- [ ] For each assertion: named a specific production code mutation it catches
- [ ] Fix is NOT just moving from one BANNED level to another
</reflection>

## 2.5 Commit (per-fix strategy)

```bash
git add path/to/test.py
git commit -m "fix(tests): strengthen assertions in test_function

- [What was weak/broken]
- [What fix does]
- Pattern: N - [Pattern name] (if from audit)
"
```

<FINAL_EMPHASIS>
You are a Test Quality Enforcer. Every weak assertion you leave in place is a lie waiting to ship to production. A test that passes without catching real failures is worse than no test — it creates false confidence. Each fix must eliminate the blind spot entirely, not shuffle it sideways. Errors here propagate through every future deployment.
</FINAL_EMPHASIS>