
/advanced-code-review-verify

Workflow Diagram

Phase 4 of advanced-code-review: Verification that fact-checks every finding against the actual codebase, removes false positives, flags inconclusive items, detects duplicates, and calculates signal-to-noise ratio.

flowchart TD
    Start([Phase 4 Start])

    DetectDups[Detect duplicate findings]
    DupsFound{Duplicates found?}
    MergeDups[Merge duplicate findings]

    NextFinding{More findings?}
    ExtractClaims[Extract verifiable claims]

    ClaimType{Claim type?}
    VerifyLine[Verify line content]
    VerifyFunc[Verify function behavior]
    VerifyCall[Verify call pattern]
    VerifyPattern[Verify pattern violation]

    AggResult{Aggregate result?}
    MarkVerified[Mark: VERIFIED]
    MarkRefuted[Mark: REFUTED]
    MarkInconclusive[Mark: INCONCLUSIVE]

    ValidateLines[Validate line numbers]
    LinesValid{Lines valid?}
    AdjustLines[Flag invalid lines]

    AllVerified{All findings processed?}

    RemoveRefuted[Remove REFUTED findings]
    LogRefuted[Log to verification audit]
    FlagInconclusive[Flag INCONCLUSIVE items]

    CalcSNR[Calculate signal-to-noise]
    SNRResult[Signal/Noise ratio computed]

    WriteAudit[Write verification-audit.md]
    UpdateJSON[Update findings.json]

    SelfCheck{Phase 4 self-check OK?}
    SelfCheckFail([STOP: Unverified findings])
    Phase4Done([Phase 4 Complete])

    Start --> DetectDups
    DetectDups --> DupsFound
    DupsFound -->|Yes| MergeDups
    MergeDups --> NextFinding
    DupsFound -->|No| NextFinding

    NextFinding -->|Yes| ExtractClaims
    ExtractClaims --> ClaimType

    ClaimType -->|line_content| VerifyLine
    ClaimType -->|function_behavior| VerifyFunc
    ClaimType -->|call_pattern| VerifyCall
    ClaimType -->|pattern_violation| VerifyPattern

    VerifyLine --> AggResult
    VerifyFunc --> AggResult
    VerifyCall --> AggResult
    VerifyPattern --> AggResult

    AggResult -->|Verified| MarkVerified
    AggResult -->|Refuted| MarkRefuted
    AggResult -->|Inconclusive| MarkInconclusive

    MarkVerified --> ValidateLines
    MarkRefuted --> ValidateLines
    MarkInconclusive --> ValidateLines

    ValidateLines --> LinesValid
    LinesValid -->|No| AdjustLines
    AdjustLines --> AllVerified
    LinesValid -->|Yes| AllVerified

    AllVerified -->|No| NextFinding
    AllVerified -->|Yes| RemoveRefuted

    NextFinding -->|No| RemoveRefuted

    RemoveRefuted --> LogRefuted
    LogRefuted --> FlagInconclusive
    FlagInconclusive --> CalcSNR
    CalcSNR --> SNRResult

    SNRResult --> WriteAudit
    WriteAudit --> UpdateJSON
    UpdateJSON --> SelfCheck

    SelfCheck -->|No| SelfCheckFail
    SelfCheck -->|Yes| Phase4Done

    style Start fill:#2196F3,color:#fff
    style Phase4Done fill:#2196F3,color:#fff
    style SelfCheckFail fill:#2196F3,color:#fff
    style WriteAudit fill:#2196F3,color:#fff
    style UpdateJSON fill:#2196F3,color:#fff
    style DupsFound fill:#FF9800,color:#fff
    style NextFinding fill:#FF9800,color:#fff
    style ClaimType fill:#FF9800,color:#fff
    style AggResult fill:#FF9800,color:#fff
    style LinesValid fill:#FF9800,color:#fff
    style AllVerified fill:#FF9800,color:#fff
    style SelfCheck fill:#f44336,color:#fff

Legend

| Color | Meaning |
|-------|---------|
| Green (#4CAF50) | Skill invocation |
| Blue (#2196F3) | Command/action |
| Orange (#FF9800) | Decision point |
| Red (#f44336) | Quality gate |

Command Content

<ROLE>
Verification Engineer. Your reputation depends on a clean, accurate finding set. Every false positive you leave in the report destroys a developer's trust. Every false negative you miss lets a real bug ship. Precision is the only acceptable standard.
</ROLE>

# Phase 4: Verification

## Invariant Principles

1. **Every finding must be verifiable against actual code**: If a finding cannot be verified by reading the file at the specified line, it is not a valid finding.
2. **REFUTED findings must be removed, not just flagged**: False positives erode trust. Remove them from final output entirely; log in audit for transparency.
3. **INCONCLUSIVE findings must be clearly marked**: Uncertainty is acceptable; hidden uncertainty is not. Mark findings that could not be verified so humans can assess.
4. **PR mode = diff-only source**: When reviewing a PR whose head commit is not checked out locally, the diff is the only authoritative code. Local files reflect a different git state and MUST NOT be used to verify or refute findings (see Section 4.0).

## 4.0 Pre-Flight: Branch Safety Check

<CRITICAL>
Before verifying any finding, determine whether local files can be trusted.

```python
import subprocess

def get_review_source(manifest: dict) -> str:
    """Determine if local files are safe for verification. Returns 'LOCAL_FILES' or 'DIFF_ONLY'."""
    pr_head_sha = manifest.get("pr_head_sha")  # from review-manifest.json
    if not pr_head_sha:
        return "LOCAL_FILES"  # local branch review; files are authoritative

    local_head = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout.strip()

    return "LOCAL_FILES" if local_head == pr_head_sha else "DIFF_ONLY"
```

**When `review_source == "DIFF_ONLY"`** (PR review, local branch not checked out to PR HEAD):
- ALL verify_* functions return `"INCONCLUSIVE"` immediately — do NOT read local files
- Mark the finding `[NEEDS VERIFICATION]` in the report
- Reason: "PR review — local HEAD does not match PR HEAD SHA; local files reflect different code"

A REFUTED verdict from local files in DIFF_ONLY mode is a **wrong verdict**. Real bugs introduced by the PR will not appear in the local pre-PR code. Reading local files will cause you to declare them absent.
</CRITICAL>

<FORBIDDEN>
- Read local files to verify or refute findings when `review_source == "DIFF_ONLY"`
- Return REFUTED based on local file content when local branch differs from PR branch
- Skip the Section 4.0 branch safety check for PR reviews
- Treat a finding as REFUTED because local files do not show the issue — in PR mode, that just means local doesn't have the PR's changes yet
</FORBIDDEN>
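
One way to enforce the DIFF_ONLY rule mechanically is to wrap each verifier so it cannot touch the filesystem in that mode. The sketch below is illustrative only: `diff_only_guard`, the `review_source` keyword argument, and `verify_stub` are hypothetical names, not part of this command.

```python
from functools import wraps
from typing import Callable

calls = []  # records which claims actually reached the wrapped verifier

def diff_only_guard(verify_fn: Callable[..., str]) -> Callable[..., str]:
    """Hypothetical decorator: short-circuit a verify_* function unless local files are trusted."""
    @wraps(verify_fn)
    def wrapper(claim, repo_root, review_source: str = "DIFF_ONLY") -> str:
        if review_source != "LOCAL_FILES":
            # Local files may reflect pre-PR code, so no verdict from them can be trusted.
            return "INCONCLUSIVE"
        return verify_fn(claim, repo_root)
    return wrapper

@diff_only_guard
def verify_stub(claim, repo_root) -> str:
    """Stand-in for a real verify_* function."""
    calls.append(claim)
    return "VERIFIED"
```

Defaulting `review_source` to `"DIFF_ONLY"` is a deliberate choice in this sketch: a caller that forgets to pass the mode gets INCONCLUSIVE rather than a verdict based on untrusted files.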

## 4.1 Verification Scope

<analysis>
This phase covers four verification dimensions: line content, function behavior, call patterns, and pattern violations. It does not invoke the full `fact-checking` skill — scope is constrained to what can be confirmed by reading files and matching patterns.
</analysis>

## 4.2 Claim Types

| Claim Type | Example | Verification Method |
|------------|---------|---------------------|
| line_content | "Line 45 contains SQL interpolation" | Read line 45, pattern match |
| function_behavior | "Function X doesn't validate input" | Read function, check for validation |
| call_pattern | "Y is called without error handling" | Trace callers of Y |
| pattern_violation | "Same code at A and B (DRY violation)" | Compare code at A and B |

## 4.3 Claim Extraction Algorithm

```python
import re
from dataclasses import dataclass
from typing import Literal, Optional

ClaimType = Literal["line_content", "function_behavior", "call_pattern", "pattern_violation"]

@dataclass
class Claim:
    type: ClaimType
    file: str
    line: Optional[int]
    function: Optional[str]
    pattern: str
    expected: Optional[str]
    compare_to: Optional[str]

# Extraction patterns (most specific first)
CLAIM_PATTERNS = [
    # Line content: "Line 45 contains X" / "at line 45"
    (r"(?:line\s+(\d+)|at\s+line\s+(\d+)).*?(?:contains?|has|shows?)\s+['\"]?([^'\"]+)['\"]?", "line_content"),

    # Function behavior: "function X doesn't validate"
    (r"(?:function|method)\s+['\"]?(\w+)['\"]?\s+(?:doesn't|lacks?|missing)\s+(\w+)", "function_behavior"),

    # Call pattern: "X is called without error handling"
    (r"['\"]?(\w+)['\"]?\s+(?:is\s+)?called\s+without\s+([^.]+)", "call_pattern"),

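    # Pattern violation: "same code at A and B"
    # (\S+) captures each location; the lookahead drops trailing punctuation but keeps dots inside paths
    (r"(?:same|identical|duplicated?)\s+(?:code|logic)\s+(?:at|in)\s+(\S+)\s+and\s+(\S+?)(?=[.,;:]?(?:\s|$))", "pattern_violation"),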
]

def build_claim(claim_type: ClaimType, groups: tuple, file_context: str, line_context: Optional[int]) -> Optional[Claim]:
    """Construct a Claim from regex match groups and finding context."""
    if claim_type == "line_content":
        line = int(groups[0] or groups[1])
        pattern = groups[2] if len(groups) > 2 else ""
        return Claim(type="line_content", file=file_context, line=line,
                     function=None, pattern=pattern, expected=None, compare_to=None)
    elif claim_type == "function_behavior":
        func_name = groups[0]
        missing_attr = groups[1]
        return Claim(type="function_behavior", file=file_context, line=line_context,
                     function=func_name, pattern=missing_attr, expected="missing", compare_to=None)
    elif claim_type == "call_pattern":
        func_name = groups[0]
        missing_ctx = groups[1].strip()
        return Claim(type="call_pattern", file=file_context, line=line_context,
                     function=func_name, pattern=missing_ctx, expected="missing", compare_to=None)
    elif claim_type == "pattern_violation":
        loc_a = groups[0].strip()
        loc_b = groups[1].strip()
        return Claim(type="pattern_violation", file=loc_a, line=None,
                     function=None, pattern="", expected=None, compare_to=loc_b)
    return None

def extract_claims(finding: dict) -> list[Claim]:
    """Extract verifiable claims from a finding. Returns [] if no patterns match — caller treats as INCONCLUSIVE."""
    claims = []
    text = finding.get("reason", "") + " " + finding.get("evidence", "")
    file_context = finding.get("file", "")
    line_context = finding.get("line")

    for pattern, claim_type in CLAIM_PATTERNS:
        for match in re.finditer(pattern, text, re.IGNORECASE):
            groups = match.groups()
            claim = build_claim(claim_type, groups, file_context, line_context)
            if claim:
                claims.append(claim)

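    # Implicit claim from the finding's file:line, added only when evidence is present.
    # Only one source line is compared, so use the first non-empty evidence line.
    if line_context and file_context:
        evidence_lines = [ln.strip() for ln in finding.get("evidence", "").splitlines() if ln.strip()]
        if evidence_lines:
            claims.append(Claim(
                type="line_content",
                file=file_context,
                line=line_context,
                function=None,
                pattern=evidence_lines[0][:100],
                expected=None,
                compare_to=None
            ))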

    return claims
```
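
To make the mapping from finding text to claim fields concrete, the sketch below applies two of the patterns from `CLAIM_PATTERNS` to sample sentences. The helper `first_groups` and the function names in the sample text (`parse_user`, `save_record`) are made up for illustration.

```python
import re

# Patterns copied from CLAIM_PATTERNS above.
FUNCTION_BEHAVIOR = r"(?:function|method)\s+['\"]?(\w+)['\"]?\s+(?:doesn't|lacks?|missing)\s+(\w+)"
CALL_PATTERN = r"['\"]?(\w+)['\"]?\s+(?:is\s+)?called\s+without\s+([^.]+)"

def first_groups(pattern: str, text: str):
    """Return the capture groups of the first case-insensitive match, or None."""
    match = re.search(pattern, text, re.IGNORECASE)
    return match.groups() if match else None

# groups[0] becomes Claim.function, groups[1] becomes Claim.pattern
behavior = first_groups(FUNCTION_BEHAVIOR, "Function parse_user doesn't validate input")
call = first_groups(CALL_PATTERN, "save_record is called without error handling.")

# Free-form text that matches no pattern yields no claim from that pattern
unmatched = first_groups(CALL_PATTERN, "Naming is inconsistent")
```

Note that the captured `pattern` is a literal word or phrase from the finding text (for example `validate`), which the verifiers in Section 4.4 later search for as a case-insensitive substring.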

## 4.4 Verification Functions

```python
from pathlib import Path

def extract_function_body(content: str, start: int) -> str:
    """Extract function body from content starting after the def line.
    Collects lines until indentation returns to base level (or EOF)."""
    lines = content[start:].splitlines()
    if not lines:
        return ""
    # Determine base indent from first non-empty line
    base_indent = None
    body_lines = []
    for line in lines:
        if not line.strip():
            body_lines.append(line)
            continue
        indent = len(line) - len(line.lstrip())
        if base_indent is None:
            base_indent = indent
        if indent < base_indent:
            break
        body_lines.append(line)
    return "\n".join(body_lines)

def verify_line_content(claim: Claim, repo_root: Path) -> str:
    """Verify a line contains expected content."""
    try:
        file_path = repo_root / claim.file
        if not file_path.exists():
            return "INCONCLUSIVE"
        lines = file_path.read_text().splitlines()
        if claim.line is None or not 1 <= claim.line <= len(lines):
            return "INCONCLUSIVE"
        actual_line = lines[claim.line - 1]  # 1-indexed
        if claim.pattern.lower() in actual_line.lower():
            return "VERIFIED"
        return "REFUTED"
    except Exception:
        return "INCONCLUSIVE"


def verify_function_behavior(claim: Claim, repo_root: Path) -> str:
    """Verify function has or lacks expected behavior."""
    try:
        file_path = repo_root / claim.file
        if not file_path.exists():
            return "INCONCLUSIVE"
        content = file_path.read_text()
        # Allow an optional "-> annotation" so annotated signatures are also found
        func_pattern = rf"def\s+{re.escape(claim.function)}\s*\([^)]*\)\s*(?:->[^:\n]+)?:"
        match = re.search(func_pattern, content)
        if not match:
            return "INCONCLUSIVE"
        func_body = extract_function_body(content, match.end())
        if claim.pattern.lower() in func_body.lower():
            return "REFUTED" if claim.expected == "missing" else "VERIFIED"
        else:
            return "VERIFIED" if claim.expected == "missing" else "REFUTED"
    except Exception:
        return "INCONCLUSIVE"


def verify_call_pattern(claim: Claim, repo_root: Path) -> str:
    """Verify call sites in claim.file lack the claimed pattern (within 500 chars of context)."""
    try:
        file_path = repo_root / claim.file
        if not file_path.exists():
            return "INCONCLUSIVE"
        content = file_path.read_text()
        # \b avoids matching longer names; the lookbehind skips the definition itself
        call_pattern = rf"(?<!def )\b{re.escape(claim.function)}\s*\("
        matches = list(re.finditer(call_pattern, content))
        if not matches:
            return "INCONCLUSIVE"
        sites_with_pattern = 0
        for match in matches:
            start_pos = max(0, match.start() - 500)
            end_pos = min(len(content), match.end() + 500)
            context = content[start_pos:end_pos]
            if claim.pattern.lower() in context.lower():
                sites_with_pattern += 1
        if sites_with_pattern == 0:
            return "VERIFIED"  # Pattern missing at every call site
        if sites_with_pattern == len(matches):
            return "REFUTED"  # Found what was claimed missing at every call site
        return "INCONCLUSIVE"  # Mixed: some call sites have it, some do not
    except Exception:
        return "INCONCLUSIVE"


def verify_pattern_violation(claim: Claim, repo_root: Path) -> str:
    """Verify duplicate code exists at two locations. Compares first 1000 chars only."""
    try:
        from difflib import SequenceMatcher
        path_a = repo_root / claim.file
        path_b = repo_root / claim.compare_to
        if not path_a.exists() or not path_b.exists():
            return "INCONCLUSIVE"
        content_a = path_a.read_text()[:1000]
        content_b = path_b.read_text()[:1000]
        norm_a = re.sub(r'\s+', ' ', content_a.lower().strip())
        norm_b = re.sub(r'\s+', ' ', content_b.lower().strip())
        ratio = SequenceMatcher(None, norm_a, norm_b).ratio()
        if ratio > 0.5:
            return "VERIFIED"
        return "REFUTED"
    except Exception:
        return "INCONCLUSIVE"
```

## 4.5 Finding Verification

```python
def verify_finding(finding: dict, repo_root: Path) -> str:
    """
    Verify a single finding's claims.

    Returns: "VERIFIED" | "REFUTED" | "INCONCLUSIVE"
    """
    claims = extract_claims(finding)
    if not claims:
        # Nothing checkable was extracted; never treat that as VERIFIED
        return "INCONCLUSIVE"
    results = []
    for claim in claims:
        if claim.type == "line_content":
            results.append(verify_line_content(claim, repo_root))
        elif claim.type == "function_behavior":
            results.append(verify_function_behavior(claim, repo_root))
        elif claim.type == "call_pattern":
            results.append(verify_call_pattern(claim, repo_root))
        elif claim.type == "pattern_violation":
            results.append(verify_pattern_violation(claim, repo_root))

    # Aggregate: any REFUTED = REFUTED; any INCONCLUSIVE (no REFUTED) = INCONCLUSIVE
    if "REFUTED" in results:
        return "REFUTED"
    elif "INCONCLUSIVE" in results:
        return "INCONCLUSIVE"
    return "VERIFIED"
```

## 4.6 Duplicate Detection

```python
def detect_duplicates(findings: list[dict]) -> list[tuple[str, str]]:
    """Find duplicate or near-duplicate findings."""
    duplicates = []
    for i, f1 in enumerate(findings):
        for f2 in findings[i+1:]:
            if is_duplicate(f1, f2):
                duplicates.append((f1["id"], f2["id"]))
    return duplicates

def is_duplicate(f1: dict, f2: dict) -> bool:
    """Check if two findings are duplicates."""
    return (
        f1["file"] == f2["file"] and
        f1["line"] == f2["line"] and
        f1["category"] == f2["category"]
    )
```
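
The workflow merges duplicates once they are detected, but the merge step itself is not spelled out here. A minimal sketch, assuming the first finding of each pair is kept and the others are recorded on it, could look like this; `merge_duplicates` and the `merged_from` field are hypothetical.

```python
def merge_duplicates(findings: list[dict], duplicates: list[tuple[str, str]]) -> list[dict]:
    """Keep the first finding of each duplicate group and record the ids merged into it."""
    by_id = {f["id"]: dict(f) for f in findings}  # shallow copies; inputs stay untouched
    canonical: dict[str, str] = {}  # dropped id -> id of the finding that absorbed it
    for keep_id, drop_id in duplicates:
        root = canonical.get(keep_id, keep_id)
        if drop_id == root or drop_id in canonical:
            continue  # already merged through an earlier pair
        canonical[drop_id] = root
        by_id[root].setdefault("merged_from", []).append(drop_id)
    return [by_id[f["id"]] for f in findings if f["id"] not in canonical]

# Three findings on the same file, line, and category produce three pairs
sample = [{"id": "finding-001"}, {"id": "finding-002"}, {"id": "finding-003"}]
pairs = [
    ("finding-001", "finding-002"),
    ("finding-001", "finding-003"),
    ("finding-002", "finding-003"),
]
merged = merge_duplicates(sample, pairs)
```

Recording the absorbed ids keeps the merge auditable: the audit log can still show which findings were collapsed into the survivor.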

## 4.7 Line Number Validation

```python
def validate_line_numbers(finding: dict, repo_root: Path) -> bool:
    """Verify the finding's line (and end_line, if set) fall within the file."""
    file_path = repo_root / finding["file"]
    if not file_path.exists():
        return False
    lines = file_path.read_text().splitlines()
    line = finding.get("line")
    if not isinstance(line, int) or not 1 <= line <= len(lines):
        return False
    end_line = finding.get("end_line")
    if end_line and not line <= end_line <= len(lines):
        return False
    return True
```

## 4.8 Signal-to-Noise Calculation

```python
def calculate_snr(findings: list[dict]) -> float:
    """
    Signal/noise ratio: 0.0 (all noise) to 1.0 (all signal).

    Signal = CRITICAL + HIGH + MEDIUM findings with status VERIFIED
    Noise  = LOW + NIT findings, or any INCONCLUSIVE finding
    REFUTED findings excluded entirely.

    Returns 1.0 if no findings remain after filtering REFUTED.
    """
    signal = 0
    noise = 0
    for f in findings:
        if f["verification_status"] == "REFUTED":
            continue
        severity = f["severity"]
        status = f["verification_status"]
        if severity in ("CRITICAL", "HIGH", "MEDIUM") and status == "VERIFIED":
            signal += 1
        elif severity in ("LOW", "NIT") or status == "INCONCLUSIVE":
            noise += 1
    total = signal + noise
    if total == 0:
        return 1.0
    return round(signal / total, 3)
```
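
As a worked example of the bucketing rules, the sketch below classifies ten findings and reproduces the 0.75 ratio shown in the sample audit in Section 4.11. The severity mix is an assumption made for illustration (the sample audit does not list severities), and `classify` is a hypothetical helper that mirrors the branches of `calculate_snr`.

```python
SIGNAL_SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM")

def classify(severity: str, status: str) -> str:
    """Return 'signal', 'noise', or 'excluded' using the rules from calculate_snr."""
    if status == "REFUTED":
        return "excluded"
    if severity in SIGNAL_SEVERITIES and status == "VERIFIED":
        return "signal"
    if severity in ("LOW", "NIT") or status == "INCONCLUSIVE":
        return "noise"
    return "excluded"  # unrecognized severity: counted in neither bucket

# 6 verified, 2 inconclusive, 2 refuted
sample = (
    [("HIGH", "VERIFIED")] * 4
    + [("MEDIUM", "VERIFIED")] * 2
    + [("HIGH", "INCONCLUSIVE"), ("LOW", "INCONCLUSIVE")]
    + [("NIT", "REFUTED"), ("HIGH", "REFUTED")]
)
labels = [classify(severity, status) for severity, status in sample]
signal, noise = labels.count("signal"), labels.count("noise")
snr = round(signal / (signal + noise), 3)  # 6 / (6 + 2)
```

Two consequences of these rules are easy to miss: a VERIFIED finding of LOW or NIT severity still counts as noise, and an INCONCLUSIVE finding counts as noise even when its severity is HIGH.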

## 4.9 REFUTED Finding Handling

- REFUTED findings are **removed** from final output
- Logged in verification-audit.md for transparency
- User is informed: "N findings removed after verification"

## 4.10 INCONCLUSIVE Finding Handling

- INCONCLUSIVE findings are **kept** with a flag
- Report marks them: `[NEEDS VERIFICATION]`
- User must manually verify these before acting on them
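
The handling rules in Sections 4.9 and 4.10 can be sketched as a single pass over the findings. This is an illustration under assumptions, not the command's implementation: `apply_verdicts`, `removal_notice`, and the `title` field are hypothetical.

```python
def apply_verdicts(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split findings into (kept, removed) according to verification_status."""
    kept, removed = [], []
    for finding in findings:
        status = finding.get("verification_status")
        if status == "REFUTED":
            removed.append(finding)  # reported in verification-audit.md only
            continue
        entry = dict(finding)  # copy so the input list is not modified
        if status == "INCONCLUSIVE":
            entry["title"] = f"[NEEDS VERIFICATION] {entry.get('title', '')}".strip()
        kept.append(entry)
    return kept, removed

def removal_notice(removed: list[dict]) -> str:
    """Build the user-facing message about removed findings."""
    return f"{len(removed)} findings removed after verification"
```

A finding with no `verification_status` passes through this sketch unflagged, so it relies on the Phase 4 self-check to catch unset statuses.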

## 4.11 Output: verification-audit.md

```markdown
# Verification Audit

**Findings Checked:** 10
**Verified:** 6
**Refuted:** 2
**Inconclusive:** 2
**Signal/Noise:** 0.75

## Refuted Findings (Removed)

### finding-003: "Unused import os"
**Reason:** Line 5 does not contain `import os`
**Actual:** Line 5 is `import sys`

### finding-007: "Missing null check"
**Reason:** Null check found at line 88
**Actual:** `if user is None: return`

## Inconclusive Findings (Flagged)

### finding-005: "Potential race condition"
**Reason:** Could not trace all code paths
**Action:** Human verification required

## Verification Log

| Finding | Status | Claims | Result |
|---------|--------|--------|--------|
| finding-001 | VERIFIED | 2 | All claims confirmed |
| finding-002 | VERIFIED | 1 | Claim confirmed |
| finding-003 | REFUTED | 1 | Line content mismatch |
...
```

<CRITICAL>
Every finding in the final report must have `verification_status` set. An unset status means Phase 4 is incomplete — do not proceed to Phase 5.
</CRITICAL>

## Phase 4 Self-Check

Before proceeding to Phase 5:

- [ ] All findings verified against codebase
- [ ] REFUTED findings removed and logged in verification-audit.md
- [ ] INCONCLUSIVE findings flagged with `[NEEDS VERIFICATION]`
- [ ] Duplicates detected and merged
- [ ] Line numbers validated
- [ ] Signal-to-noise ratio calculated
- [ ] verification-audit.md written
- [ ] findings.json updated with `verification_status`
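
Part of this checklist can be checked mechanically. The sketch below covers only the status-related items (every finding has a status, and no REFUTED finding survives into the final output); `self_check_problems` is a hypothetical helper, and the remaining items still need to be confirmed separately.

```python
VALID_STATUSES = {"VERIFIED", "REFUTED", "INCONCLUSIVE"}

def self_check_problems(final_findings: list[dict]) -> list[str]:
    """List reasons the final finding set is not ready for Phase 5."""
    problems = []
    for finding in final_findings:
        fid = finding.get("id", "<no id>")
        status = finding.get("verification_status")
        if status not in VALID_STATUSES:
            problems.append(f"{fid}: verification_status unset or invalid")
        elif status == "REFUTED":
            problems.append(f"{fid}: REFUTED finding still present in final output")
    return problems
```

An empty list means these two checks passed; any entry should stop the transition to Phase 5.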

<FORBIDDEN>
- Flagging REFUTED findings instead of removing them
- Leaving `verification_status` unset on any finding
- Merging INCONCLUSIVE findings without marking them
- Treating an empty-claims finding as VERIFIED
- Skipping line number validation
- Skipping duplicate detection before verification
- Proceeding to Phase 5 with any Self-Check item unchecked
</FORBIDDEN>

<FINAL_EMPHASIS>
You are a Verification Engineer. A false positive in the final report is your failure. A false negative that hides a real bug is also your failure. Remove what is wrong. Flag what is uncertain. Let nothing through that you cannot prove.
</FINAL_EMPHASIS>