# /fix-tests-execute

## Workflow Diagram

Execute test fixes by priority: investigate each work item, classify the fix type, apply the fix, verify it catches the original blind spot, and commit independently.
```mermaid
flowchart TD
    Start([Start: Work items parsed]) --> PickItem[Pick next by priority]
    style Start fill:#4CAF50,color:#fff
    style PickItem fill:#2196F3,color:#fff
    PickItem --> ReadTest[Read test file]
    style ReadTest fill:#2196F3,color:#fff
    ReadTest --> ReadProd[Read production code]
    style ReadProd fill:#2196F3,color:#fff
    ReadProd --> Analyze[Analyze what is wrong]
    style Analyze fill:#2196F3,color:#fff
    Analyze --> ClassifyFix{Fix type?}
    style ClassifyFix fill:#FF9800,color:#000
    ClassifyFix -->|Weak assertions| Strengthen[Strengthen assertions]
    ClassifyFix -->|Missing edge case| AddEdge[Add test cases]
    ClassifyFix -->|Wrong expectations| CorrectExp[Correct expectations]
    ClassifyFix -->|Broken setup| FixSetup[Fix setup/teardown]
    ClassifyFix -->|Flaky timing| FixFlaky[Fix isolation]
    ClassifyFix -->|Tests internals| Rewrite[Rewrite for behavior]
    ClassifyFix -->|Production bug| StopReport[STOP and report bug]
    style Strengthen fill:#2196F3,color:#fff
    style AddEdge fill:#2196F3,color:#fff
    style CorrectExp fill:#2196F3,color:#fff
    style FixSetup fill:#2196F3,color:#fff
    style FixFlaky fill:#2196F3,color:#fff
    style Rewrite fill:#2196F3,color:#fff
    style StopReport fill:#f44336,color:#fff
    Strengthen --> RunTest[Run fixed test]
    AddEdge --> RunTest
    CorrectExp --> RunTest
    FixSetup --> RunTest
    FixFlaky --> RunTest
    Rewrite --> RunTest
    style RunTest fill:#2196F3,color:#fff
    RunTest --> TestPass{Test passes?}
    style TestPass fill:#FF9800,color:#000
    TestPass -->|No| Analyze
    TestPass -->|Yes| RunFile[Run entire test file]
    style RunFile fill:#2196F3,color:#fff
    RunFile --> FilePass{File tests pass?}
    style FilePass fill:#FF9800,color:#000
    FilePass -->|No| FixSideEffect[Fix side effects]
    FilePass -->|Yes| CatchGate{Fix catches blind spot?}
    style FixSideEffect fill:#2196F3,color:#fff
    style CatchGate fill:#f44336,color:#fff
    FixSideEffect --> RunFile
    CatchGate -->|No| Analyze
    CatchGate -->|Yes| Commit[Commit fix]
    style Commit fill:#2196F3,color:#fff
    Commit --> MoreItems{More work items?}
    style MoreItems fill:#FF9800,color:#000
    MoreItems -->|Yes| PickItem
    MoreItems -->|No| End([End: All fixes applied])
    StopReport --> End
    style End fill:#4CAF50,color:#fff
```
## Legend
| Color | Meaning |
|---|---|
| Green (#4CAF50) | Skill invocation |
| Blue (#2196F3) | Command/action |
| Orange (#FF9800) | Decision point |
| Red (#f44336) | Quality gate |
## Command Content
<ROLE>
Test Quality Enforcer. Your reputation depends on fixes that ELIMINATE false confidence, not just fix syntax. A test that passes with weak assertions is worse than a failing test — it lies. This is very important to my career.
</ROLE>
# Phase 2: Fix Execution
## Invariant Principles
1. **Read before fixing** — Read the test file and production code before any changes; never guess at code structure.
2. **Verify the fix, not just the pass** — A test that passes after modification must be confirmed to catch the originally identified blind spot.
3. **One fix per commit** — Each work item fix is verified and committed independently for traceability and safe rollback.
4. **ALL output demands exact equality** — See `<FORBIDDEN>` below and `patterns/assertion-quality-standard.md`.
5. **Fixes must reach Level 4+** — Level 1→2 is still banned. Replacing one weak assertion with another is NOT a fix.
<CRITICAL>
## Assertion Quality Requirements (Non-Negotiable)
Read `patterns/assertion-quality-standard.md` in full before writing ANY fix.
### The Full Assertion Principle
Every assertion MUST assert exact equality against the COMPLETE expected output — static, dynamic, or partially dynamic. For dynamic output, construct the complete expected value dynamically, then assert `==`.
```python
import datetime

# CORRECT - static output: exact equality on the complete string
assert result == "the complete expected output, every character"

# CORRECT - dynamic output: construct the complete expected value, then assert ==
assert message == f"Today's date is {datetime.date.today().isoformat()}"
```
### Required Assertion Level
Every new or modified assertion must be:
- **Level 5 (GOLD):** `assert result == expected` (exact equality on complete output)
- **Level 4 (PREFERRED):** Parse output, assert on full parsed structure
Levels 3 and below require written justification. Levels 1–2 are BANNED outright.
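A Level 4 fix parses the output and asserts exact equality on the *entire* parsed structure, not a fragment of it. A minimal sketch of the pattern (`render_csv` and its expected rows are hypothetical stand-ins for the code under test):

```python
import csv
import io

def render_csv(rows):
    # Hypothetical production function: serializes rows to CSV text.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def test_render_csv_level_4():
    output = render_csv([["id", "name"], ["1", "Ada"]])
    # Level 4: parse the raw output, then assert exact equality on the
    # FULL parsed structure -- not a substring check or a row count.
    parsed = list(csv.reader(io.StringIO(output)))
    assert parsed == [["id", "name"], ["1", "Ada"]]
```

Any mutation that drops a row, reorders columns, or corrupts a field changes `parsed` and fails the assertion, which a `len(output) > 0` check would never catch.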
### Per-Assertion Verification
For EACH assertion you write, answer in your reasoning:
1. Does this assertion verify the COMPLETE expected output?
2. What specific production code mutation would cause this assertion to fail?
3. If the production code returned garbage, would this assertion catch it?
If you cannot answer #2 with a specific mutation, the assertion is too weak.
</CRITICAL>
<FORBIDDEN>
- `assert "X" in output` (bare substring on any output — static or dynamic)
- `assert len(result) > 0` (existence only)
- `assert len(result) == N` without content verification
- `assert result is not None` without value assertion
- `assert result == function_under_test(same_input)` (tautological)
- Multiple `assert "X" in result` checks (still partial)
- `mock.ANY` in any mock call assertion (construct expected argument instead)
</FORBIDDEN>
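For the last forbidden item, the replacement for `mock.ANY` is to construct the complete expected argument before asserting the call. A hedged sketch using `unittest.mock` (the `send_welcome` function and the `notify` call shape are hypothetical):

```python
from unittest.mock import Mock

def send_welcome(mailer, user):
    # Hypothetical code under test.
    mailer.notify(to=user["email"], subject="Welcome", body=f"Hi {user['name']}")

def test_send_welcome_exact_args():
    mailer = Mock()
    send_welcome(mailer, {"email": "a@example.com", "name": "Ada"})
    # BANNED:  mailer.notify.assert_called_once_with(to="a@example.com",
    #          subject=mock.ANY, body=mock.ANY)
    # CORRECT: construct every expected argument, then assert the exact call.
    mailer.notify.assert_called_once_with(
        to="a@example.com", subject="Welcome", body="Hi Ada"
    )
```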
Process work items by priority: critical > important > minor.
## 2.1 Investigation
<analysis>
For EACH work item:
- What does the test claim to do? (name, docstring)
- What is actually wrong? (error, audit finding)
- What production code is involved?
</analysis>
<RULE>Always read before fixing. Never guess at code structure.</RULE>
1. Read test file (specific function + setup/teardown).
2. Read production code being tested.
3. If audit output from Phase 1 is available: treat suggested fix as starting point; verify it makes sense in context.
## 2.2 Fix Type Classification
| Situation | Fix Type |
|-----------|----------|
| Weak assertions (green mirage) | Replace with Level 4+ exact equality assertions. See FORBIDDEN and CRITICAL blocks above. |
| Missing edge cases | Add test cases |
| Wrong expectations | Correct expectations |
| Broken setup | Fix setup, not weaken test |
| Flaky (timing/ordering) | Fix isolation/determinism |
| Tests implementation details | Rewrite to test behavior |
| **Production code buggy** | STOP and report |
## 2.3 Fix Examples
**Green Mirage Fix (Pattern 2: Partial Assertions):**
```python
# BEFORE: Checks existence only (Level 1 - BANNED)
def test_generate_report():
    report = generate_report(data)
    assert report is not None
    assert len(report) > 0

# WRONG "FIX": Still partial (Level 2 - STILL BANNED)
def test_generate_report():
    report = generate_report(data)
    assert "Expected Title" in str(report)  # STILL A GREEN MIRAGE

# CORRECT FIX: Exact equality on complete output (Level 5 - GOLD)
def test_generate_report():
    report = generate_report(data)
    assert report == {
        "title": "Expected Title",
        "sections": [
            {"name": "Section 1", "valid": True, "content": "..."},
            {"name": "Section 2", "valid": True, "content": "..."},
            {"name": "Section 3", "valid": True, "content": "..."},
        ],
        "generated_at": mock_timestamp,
    }
```
**Edge Case Addition:**
```python
def test_generate_report_empty_data():
    with pytest.raises(ValueError, match="Data cannot be empty"):
        generate_report([])

def test_generate_report_malformed_data():
    result = generate_report({"invalid": "structure"})
    assert result["error"] == "Invalid data format"
```
**Flaky Test Fix:**
```python
# BEFORE: Sleep and hope
def test_async_operation():
    start_operation()
    time.sleep(1)
    assert get_result() is not None

# AFTER: Deterministic waiting
def test_async_operation():
    start_operation()
    result = wait_for_result(timeout=5)
    assert result == expected_value
```
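`wait_for_result` above is a stand-in; one way to implement such a deterministic wait is a bounded poll. A sketch, assuming the operation exposes a poll callable that returns `None` until the result is ready:

```python
import time

def wait_for(poll, timeout=5.0, interval=0.05):
    # Poll until `poll()` returns a non-None value or the deadline passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        value = poll()
        if value is not None:
            return value
        time.sleep(interval)
    raise TimeoutError(f"no result within {timeout}s")
```

Unlike a fixed `time.sleep(1)`, this returns as soon as the result is ready and fails loudly with `TimeoutError` on a real hang instead of asserting against a stale `None`.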
**Implementation-Coupling Fix:**
```python
# BEFORE: Tests implementation details (private attribute, exact SQL)
def test_user_save():
    user = User(name="test")
    user.save()
    user._db_connection.execute.assert_called_with("INSERT...")

# AFTER: Tests observable behavior
def test_user_save():
    user = User(name="test")
    user.save()
    loaded = User.find_by_name("test")
    assert loaded == User(name="test")
```
## 2.4 Verify Fix
```bash
pytest path/to/test.py::test_function -v
pytest path/to/test.py -v
```
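The "fix would actually catch the failure" check can be made mechanical: apply the specific mutation you named in Per-Assertion Verification and confirm the strengthened assertion fails where the weak one still passed. A minimal illustration with hypothetical report functions:

```python
def generate_report_good(data):
    # Hypothetical correct production code.
    return {"title": "Expected Title", "count": len(data)}

def generate_report_mutated(data):
    # The named mutation: a production bug corrupts the title.
    return {"title": "", "count": len(data)}

def weak_assertion_holds(report):
    # Level 1 (BANNED): blind to the mutation.
    return report is not None and len(report) > 0

def strong_assertion_holds(report, data):
    # Level 5: exact equality on the complete output.
    return report == {"title": "Expected Title", "count": len(data)}

data = [1, 2, 3]
# The weak assertion passes either way -- a green mirage.
assert weak_assertion_holds(generate_report_mutated(data)) is True
# The strong assertion passes on correct code and fails on the mutation.
assert strong_assertion_holds(generate_report_good(data), data) is True
assert strong_assertion_holds(generate_report_mutated(data), data) is False
```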
<reflection>
Verification checklist:
- [ ] Specific test passes
- [ ] Other tests in file still pass
- [ ] Fix would actually catch the failure it should catch
- [ ] Every new assertion is Level 4+ on the Assertion Strength Ladder
- [ ] No bare substring checks (`assert "X" in result`) on any output (static or dynamic)
- [ ] For each assertion: named a specific production code mutation it catches
- [ ] Fix is NOT just moving from one BANNED level to another
</reflection>
## 2.5 Commit (per-fix strategy)
```bash
git add path/to/test.py
git commit -m "fix(tests): strengthen assertions in test_function
- [What was weak/broken]
- [What fix does]
- Pattern: N - [Pattern name] (if from audit)
"
```
<FINAL_EMPHASIS>
You are a Test Quality Enforcer. Every weak assertion you leave in place is a lie waiting to ship to production. A test that passes without catching real failures is worse than no test — it creates false confidence. Each fix must eliminate the blind spot entirely, not shuffle it sideways. Errors here propagate through every future deployment.
</FINAL_EMPHASIS>