/review-plan-behavior¶
Workflow Diagram¶
Phase 3 of reviewing-impl-plans: Behavior Verification Audit. Audits every code reference in an implementation plan to ensure behaviors are verified from source (file:line) rather than assumed from method names. Flags the fabrication anti-pattern, detects dangerous assumption patterns, and identifies trial-and-error loop indicators.
Process Flow¶
flowchart TD
subgraph Legend
direction LR
L1[Process]
L2{Decision}
L3([Terminal])
L4[/"Deliverable"/]
L5[Quality Gate]:::gate
L6[Critical Flag]:::critical
end
Start([Receive implementation plan]) --> Principles[Apply Invariant Principles:<br>1. Inferred != verified<br>2. Fabrication is root failure<br>3. Every ref needs file:line]
Principles --> CollectRefs[Collect all code references<br>in the plan]
CollectRefs --> PickRef[Pick next reference]
PickRef --> HasCitation{Has file:line<br>citation?}
HasCitation -->|Yes| ReadSrc[Read actual source<br>at cited location]
HasCitation -->|No| FlagNoCite[Flag: Missing citation]:::critical
ReadSrc --> MatchBehavior{Behavior matches<br>plan's claim?}
MatchBehavior -->|Yes| LogVerified[Log as VERIFIED<br>in verification table]
MatchBehavior -->|No| LogMismatch[Log as ASSUMED - CRITICAL:<br>actual behavior differs]:::critical
FlagNoCite --> LogAssumed[Log as ASSUMED<br>in verification table]:::critical
LogMismatch --> LogAssumed
LogVerified --> CheckPatterns[Check Dangerous<br>Assumption Patterns]
LogAssumed --> CheckPatterns
CheckPatterns --> P1{Assumes convenience<br>parameters exist?}
P1 -->|"Yes: e.g. partial=True,<br>strict_mode=False"| FlagP1[Flag: Unverified<br>parameter assumption]:::critical
P1 -->|No| P2
FlagP1 --> P2{Assumes flexible behavior<br>from strict interfaces?}
P2 -->|"Yes: e.g. partial assertions,<br>subset of fields"| FlagP2[Flag: Unverified<br>interface assumption]:::critical
P2 -->|No| P3
FlagP2 --> P3{Assumes library behavior<br>from method names?}
P3 -->|"Yes: e.g. update() merges,<br>validate() returns"| FlagP3[Flag: Unverified<br>library assumption]:::critical
P3 -->|No| P4
FlagP3 --> P4{Assumes test utilities<br>work conveniently?}
P4 -->|"Yes: e.g. assert_model_updated<br>checks only specified fields"| FlagP4[Flag: Unverified<br>test utility assumption]:::critical
P4 -->|No| MoreRefs
FlagP4 --> MoreRefs{More references<br>to audit?}
MoreRefs -->|Yes| PickRef
MoreRefs -->|No| LoopDetect[Loop Detection Scan]
LoopDetect --> HasLoops{Plan describes<br>trial-and-error?}
HasLoops -->|"Yes: try X, if fails try Y,<br>experiment, adjust until pass"| FlagLoop[RED FLAG: Author did not<br>verify behavior.<br>Require source citation.]:::critical
HasLoops -->|No| BuildTable
FlagLoop --> BuildTable
BuildTable --> BuildVerifTable[Build Verification Table:<br>Interface / Verified or Assumed /<br>Source Read / Actual Behavior /<br>Constraints]
BuildVerifTable --> GateCheck{All references<br>VERIFIED?}:::gate
GateCheck -->|"Yes: 0 ASSUMED entries"| DeliverClean[/"Deliver structured output:<br>- All D verified, 0 assumed<br>- No CRITICAL findings<br>- No loop red flags"/]
GateCheck -->|"No: ASSUMED entries exist"| Remediate[Generate remediation:<br>- Source files to read<br>- Citations to add<br>- Specific verifications needed]
Remediate --> DeliverFindings[/"Deliver structured output:<br>- D verified, E assumed<br>- All CRITICAL findings<br>- Loop detection red flags<br>- Remediation steps"/]
DeliverClean --> Done([Phase 3 Complete]):::success
DeliverFindings --> Done
classDef gate fill:#ff6b6b,stroke:#cc5555,color:#fff
classDef critical fill:#f44336,stroke:#c62828,color:#fff
classDef success fill:#51cf66,stroke:#40a854,color:#fff
Key Decision Points¶
| Decision | Branches | Outcome |
|---|---|---|
| Has file:line citation? | Yes / No | Proceed to source verification vs flag missing citation |
| Behavior matches plan's claim? | Yes / No | VERIFIED vs ASSUMED (CRITICAL) |
| Assumes convenience parameters? | Yes / No | Flag unverified parameter assumption |
| Assumes flexible behavior from strict interfaces? | Yes / No | Flag unverified interface assumption |
| Assumes library behavior from method names? | Yes / No | Flag unverified library assumption |
| Assumes test utilities work conveniently? | Yes / No | Flag unverified test utility assumption |
| More references to audit? | Yes / No | Loop back or proceed to loop detection |
| Plan describes trial-and-error? | Yes / No | RED FLAG requiring source citation |
| All references VERIFIED? | Yes (0 assumed) / No (assumed exist) | Clean delivery vs delivery with remediation |
Fabrication Anti-Pattern (flagged by this audit)¶
Plan assumes method does X based on name
-> Agent writes code, fails (method does Y)
-> Agent INVENTS parameter (partial=True)
-> Fails (parameter doesn't exist)
-> Debugging loop, never reads source
-> Hours wasted on fabricated solutions
The audit breaks this chain by requiring verified source citations before implementation begins.
Command Content¶
<ROLE>
Behavior Verification Auditor. Your reputation depends on catching every assumed behavior before it triggers a fabrication loop. A plan reaching implementation with unverified code references wastes hours of agent work.
</ROLE>
# Phase 3: Behavior Verification Audit
## Invariant Principles
1. **Inferred behavior is not verified behavior** - Method names suggest intent; only source confirms it
2. **Fabrication is the root failure** - Invented parameters or return types cascade into debugging loops
3. **Every code reference needs file:line** - Plans citing existing code without source location are unverified
<CRITICAL>
Every code reference MUST cite verified source (file:line). Method names do not constitute verification.
</CRITICAL>
## The Fabrication Anti-Pattern
```
# FORBIDDEN: The Fabrication Loop
1. Plan assumes method does X based on name
2. Agent writes code, fails because method actually does Y
3. Agent INVENTS parameter: method(..., partial=True)
4. Fails because parameter doesn't exist
5. Agent enters debugging loop, never reads source
6. Hours wasted on fabricated solutions
# REQUIRED in Plan
1. "Behavior verified by reading [file:line]"
2. Actual method signatures from source
3. Constraints discovered from reading source
4. Executing agents follow verified behavior, no guessing
```
## Dangerous Assumption Patterns
Flag when plan exhibits any of:
**1. Assumes convenience parameters exist:**
- "Pass `partial=True` to allow partial matching" (VERIFY THIS EXISTS)
- "Use `strict_mode=False` to relax validation" (VERIFY THIS EXISTS)
**2. Assumes flexible behavior from strict interfaces:**
- "The test context allows partial assertions" (VERIFY: many require exhaustive assertions)
- "The validator accepts subset of fields" (VERIFY: many require complete objects)
**3. Assumes library behavior from method names:**
- "The `update()` method will merge fields" (VERIFY: might replace entirely)
- "The `validate()` method returns errors" (VERIFY: might raise exceptions)
**4. Assumes test utilities work "conveniently":**
- "Our `assert_model_updated()` checks specified fields" (VERIFY: might require ALL changes)
- "Our `mock_service()` auto-mocks everything" (VERIFY: might require explicit setup)
## Verification Requirements
| Interface | Verified/Assumed | Source Read | Actual Behavior | Constraints |
|-----------|------------------|-------------|-----------------|-------------|
| [name] | VERIFIED/ASSUMED | [file:line] | [what it does] | [limitations] |
**Flag every ASSUMED entry as CRITICAL gap.**
## Loop Detection
Flag when plan describes:
- "Try X, if that fails try Y, if that fails try Z"
- "Experiment with different parameter combinations"
- "Adjust until tests pass"
**RED FLAG**: Plan author did not verify behavior. Require source citation instead.
## Deliverable
Structured output to orchestrator:
- Behavior verifications: D verified, E assumed (assumed = CRITICAL)
- All CRITICAL findings for assumed behaviors
- All loop detection red flags
- Specific remediation: source files to read, citations to add
<FINAL_EMPHASIS>
Assumed behavior in an implementation plan is not a minor gap—it is a time bomb. Every ASSUMED entry in the verification table is a fabrication waiting to happen. Flag them all. Your reputation depends on plans that implement correctly on the first pass, not on plans that merely look complete.
</FINAL_EMPHASIS>