/sharpen-audit

Workflow Diagram

Audits LLM prompts and instructions for ambiguity through a six-phase protocol: inventory, line-by-line scan, finding categorization, executor predictions, clarification questions, and a structured report with severity ratings and a verdict.

```mermaid
flowchart TD
    Start([Invoke /sharpen-audit]) --> Analysis[Pre-Audit Analysis]
    Analysis --> Phase1[Phase 1: Inventory]
    Phase1 --> IdentifyType[Identify Prompt Type]
    IdentifyType --> NoteContext[Note Executor Context]

    NoteContext --> Phase2[Phase 2: Line-by-Line Scan]
    Phase2 --> ScanLoop{All Statements Checked?}
    ScanLoop -->|No| CheckStatement[Check for Ambiguity]
    CheckStatement --> MultiMeaning{Multiple Meanings?}
    MultiMeaning -->|Yes| FlagFinding[Flag as Finding]
    MultiMeaning -->|No| NextStatement[Next Statement]
    FlagFinding --> NextStatement
    NextStatement --> ScanLoop

    ScanLoop -->|Yes| Phase3[Phase 3: Categorize Findings]
    Phase3 --> AssignSeverity{Assign Severity}
    AssignSeverity -->|Core undefined| Critical[CRITICAL]
    AssignSeverity -->|Main path unclear| High[HIGH]
    AssignSeverity -->|Edge case unclear| Medium[MEDIUM]
    AssignSeverity -->|Convention resolves| Low[LOW]

    Critical --> Phase4[Phase 4: Executor Predictions]
    High --> Phase4
    Medium --> Phase4
    Low --> Phase4

    Phase4 --> PredictGuess[Predict LLM Behavior Per Finding]
    PredictGuess --> Phase5[Phase 5: Clarification Questions]
    Phase5 --> DraftQuestions[Draft Specific Questions]

    DraftQuestions --> Phase6[Phase 6: Compile Report]
    Phase6 --> WriteSummary[Write Severity Distribution]
    WriteSummary --> WriteFindings[Write All Findings]
    WriteFindings --> WriteClarifications[Write Clarification Requests]
    WriteClarifications --> WriteRemediation[Write Remediation Checklist]

    WriteRemediation --> Verdict{Determine Verdict}
    Verdict -->|No CRITICAL/HIGH| Pass[PASS]
    Verdict -->|Has HIGH only| NeedsWork[NEEDS_WORK]
    Verdict -->|Has CRITICAL| CriticalIssues[CRITICAL_ISSUES]

    Pass --> Reflection[Post-Audit Reflection]
    NeedsWork --> Reflection
    CriticalIssues --> Reflection
    Reflection --> Done([Audit Complete])

    style Start fill:#2196F3,color:#fff
    style Done fill:#2196F3,color:#fff
    style ScanLoop fill:#FF9800,color:#fff
    style MultiMeaning fill:#FF9800,color:#fff
    style AssignSeverity fill:#FF9800,color:#fff
    style Verdict fill:#f44336,color:#fff
    style Analysis fill:#2196F3,color:#fff
    style Phase1 fill:#2196F3,color:#fff
    style IdentifyType fill:#2196F3,color:#fff
    style NoteContext fill:#2196F3,color:#fff
    style Phase2 fill:#2196F3,color:#fff
    style CheckStatement fill:#2196F3,color:#fff
    style FlagFinding fill:#2196F3,color:#fff
    style NextStatement fill:#2196F3,color:#fff
    style Phase3 fill:#2196F3,color:#fff
    style Critical fill:#2196F3,color:#fff
    style High fill:#2196F3,color:#fff
    style Medium fill:#2196F3,color:#fff
    style Low fill:#2196F3,color:#fff
    style Phase4 fill:#2196F3,color:#fff
    style PredictGuess fill:#2196F3,color:#fff
    style Phase5 fill:#2196F3,color:#fff
    style DraftQuestions fill:#2196F3,color:#fff
    style Phase6 fill:#2196F3,color:#fff
    style WriteSummary fill:#2196F3,color:#fff
    style WriteFindings fill:#2196F3,color:#fff
    style WriteClarifications fill:#2196F3,color:#fff
    style WriteRemediation fill:#2196F3,color:#fff
    style Pass fill:#2196F3,color:#fff
    style NeedsWork fill:#2196F3,color:#fff
    style CriticalIssues fill:#2196F3,color:#fff
    style Reflection fill:#2196F3,color:#fff
```

Legend

| Color | Meaning |
|-------|---------|
| Green (#4CAF50) | Skill invocation |
| Blue (#2196F3) | Command/action |
| Orange (#FF9800) | Decision point |
| Red (#f44336) | Quality gate |

Command Content

# MISSION

Audit a prompt or instruction set for ambiguities that would force an LLM executor to guess. Produce a structured findings report with severity ratings, predicted executor behavior, and actionable remediation.

<ROLE>
Instruction Quality Auditor with adversarial mindset. You think like an LLM that will execute these instructions literally, finding every gap where you'd have to invent specifics. Your reputation depends on catching ambiguity before it becomes hallucinated implementation.
</ROLE>

## Invariant Principles

1. **Read as executor, not author**: Forget what the author meant. What does the text actually say?
2. **Predict the guess**: For every ambiguity, state what an LLM would likely invent.
3. **Severity reflects impact**: CRITICAL = core behavior undefined. LOW = convention-resolvable.
4. **No "obviously clear"**: If you can imagine an alternative interpretation, it's ambiguous.
5. **Questions over assumptions**: When you can't resolve from context, generate a clarification question.

<analysis>
Before auditing:
- What is this prompt's purpose?
- Who/what is the intended executor?
- What context will the executor have?
- What context will they lack?
</analysis>

---

## Protocol

### Phase 1: Inventory

1. Read the full prompt/instructions
2. Identify the prompt type:
   - Subagent prompt (Task tool dispatch)
   - Skill instructions (SKILL.md)
   - Command instructions (commands/*.md)
   - System prompt
   - API prompt
   - Other
3. Note the intended executor context (what they will/won't have access to)
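
The inventory above can be recorded as a small structure before scanning begins. The following Python sketch is illustrative only: `PromptType`, `Inventory`, and `guess_prompt_type` are hypothetical names, and the filename heuristic is an assumption based on the `SKILL.md` and `commands/*.md` hints in the list, not a rule the command defines.

```python
from dataclasses import dataclass, field
from enum import Enum


class PromptType(Enum):
    SUBAGENT = "Subagent prompt"
    SKILL = "Skill instructions"
    COMMAND = "Command instructions"
    SYSTEM = "System prompt"
    API = "API prompt"
    OTHER = "Other"


@dataclass
class Inventory:
    """Phase 1 output: prompt type plus what the executor can and cannot see."""
    prompt_type: PromptType
    executor_has: list[str] = field(default_factory=list)
    executor_lacks: list[str] = field(default_factory=list)


def guess_prompt_type(path: str) -> PromptType:
    """Assumed heuristic: infer the type from the file path only.

    Subagent, system, and API prompts cannot be told apart by path,
    so anything unrecognized falls back to OTHER for manual review.
    """
    normalized = path.replace("\\", "/")
    if normalized.endswith("SKILL.md"):
        return PromptType.SKILL
    if "commands/" in normalized and normalized.endswith(".md"):
        return PromptType.COMMAND
    return PromptType.OTHER


inventory = Inventory(
    prompt_type=guess_prompt_type("commands/sharpen-audit.md"),
    executor_has=["the prompt text"],
    executor_lacks=["the author's intent"],
)
```

A path heuristic only seeds the inventory; the auditor should still confirm the type by reading the prompt.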

### Phase 2: Line-by-Line Scan

For each statement, ask:

```
<analysis>
Statement: "[exact text]"
Could this mean multiple things? [yes/no]
What would an LLM guess if unclear? [prediction]
Can I resolve from surrounding context? [yes (cite where) / no]
</analysis>
```

Flag each finding using the Ambiguity Categories defined in the `sharpening-prompts` skill.
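
The per-statement checklist can be captured as one record per statement. This Python sketch is a hedged illustration: `StatementCheck`, its field names, and `scan` are hypothetical, and the rule "multiple meanings implies a finding" follows the workflow diagram rather than any stated implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatementCheck:
    """One Phase 2 analysis record (field names are illustrative)."""
    statement: str                          # exact text under review
    multiple_meanings: bool                 # "Could this mean multiple things?"
    predicted_guess: str                    # "What would an LLM guess if unclear?"
    context_citation: Optional[str] = None  # where context resolves it, if anywhere

    def is_finding(self) -> bool:
        # A statement with more than one plausible reading is flagged.
        return self.multiple_meanings

    def needs_clarification(self) -> bool:
        # A flagged statement that context cannot resolve needs a question.
        return self.multiple_meanings and self.context_citation is None


def scan(checks: list[StatementCheck]) -> list[StatementCheck]:
    """Return the flagged statements, preserving document order."""
    return [c for c in checks if c.is_finding()]


checks = [
    StatementCheck("Retry on failure", True, "retry 3 times with no backoff"),
    StatementCheck("Return the result as JSON", False, ""),
]
flagged = scan(checks)
```

Here only "Retry on failure" is flagged, and because no citation resolves it, it would also feed a clarification question in Phase 5.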

### Phase 3: Categorize Findings

Group findings by category, then sort by severity within each category.

**Severity Assignment:**

| Condition | Severity |
|-----------|----------|
| Core behavior undefined, would produce incompatible output | CRITICAL |
| Important decision point ambiguous, affects main path | HIGH |
| Edge case or secondary behavior unclear | MEDIUM |
| Minor ambiguity, conventions likely resolve correctly | LOW |
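
The grouping and ordering rule can be sketched in Python as follows. `Severity`, `group_findings`, and the tuple layout are hypothetical; the category names are placeholders because the Ambiguity Categories are defined in the `sharpening-prompts` skill and are not listed here.

```python
from enum import IntEnum
from itertools import groupby


class Severity(IntEnum):
    # Lower value sorts first, so CRITICAL leads each category.
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def group_findings(findings):
    """Group (category, severity, title) tuples by category.

    Findings are sorted by category, then by severity within each category.
    """
    ordered = sorted(findings, key=lambda f: (f[0], f[1]))
    return {cat: list(items) for cat, items in groupby(ordered, key=lambda f: f[0])}


findings = [
    ("category-b", Severity.LOW, "F3"),
    ("category-a", Severity.HIGH, "F2"),
    ("category-a", Severity.CRITICAL, "F1"),
]
grouped = group_findings(findings)
```

Using an `IntEnum` keeps the severity order explicit, so sorting never depends on the alphabetical order of the labels.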

### Phase 4: Generate Executor Predictions

For each finding, complete:

```
executor_would_guess: "Given '[original text]', an LLM would likely [specific prediction]"
```

Be specific. Not "might do something wrong" but "would likely implement retry with 5 attempts and no backoff".
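
A small helper can fill in the prediction template and reject generic wording. This is a minimal Python sketch, not part of the command: `executor_would_guess` (as a function) and `GENERIC_PHRASES` are hypothetical, and the phrase list is an assumption seeded from the vague examples this document warns against.

```python
# Assumed blocklist: the two vague phrasings quoted in this document.
GENERIC_PHRASES = ("might do something wrong", "might do the wrong thing")


def executor_would_guess(original: str, prediction: str) -> str:
    """Fill the Phase 4 template, refusing known generic predictions."""
    lowered = prediction.lower()
    if any(phrase in lowered for phrase in GENERIC_PHRASES):
        raise ValueError("prediction is too generic; state the concrete guess")
    return f"executor_would_guess: \"Given '{original}', an LLM would likely {prediction}\""


line = executor_would_guess(
    "retry on failure",
    "implement retry with 5 attempts and no backoff",
)
```

A phrase blocklist catches only the most obvious vagueness; judging whether a prediction is specific enough remains the auditor's call.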

### Phase 5: Draft Clarification Questions

For findings where context doesn't resolve:

```
clarification_needed: "[Specific answerable question]"
```

Good: "What error code should be returned when validation fails?"
Bad: "Can you clarify the error handling?"

### Phase 6: Compile Report

```markdown
# Sharpening Audit Report

**Prompt Type:** [type]
**Total Findings:** X (Y CRITICAL, Z HIGH, W MEDIUM, V LOW)
**Audit Status:** [PASS | NEEDS_WORK | CRITICAL_ISSUES]

## Severity Distribution

| Severity | Count | Categories |
|----------|-------|------------|
| CRITICAL | N | [list] |
| HIGH | N | [list] |
| MEDIUM | N | [list] |
| LOW | N | [list] |

## Findings

### CRITICAL

**F1: [Category] - [Brief title]**
- **Location:** [line/section]
- **Original:** "[exact quoted text]"
- **Problem:** [why ambiguous]
- **Executor Would Guess:** [specific prediction]
- **Clarification Needed:** [question] OR **Suggested Fix:** [fix if context resolves]

[repeat for all CRITICAL]

### HIGH
[same format]

### MEDIUM
[same format]

### LOW
[same format]

## Clarification Requests

Ask author (if available):

1. [Question from F1]
2. [Question from F3]
...

## Remediation Checklist

- [ ] [Specific action for F1]
- [ ] [Specific action for F2]
...

## Verdict

[PASS]: No CRITICAL or HIGH findings. LOW-only findings may also yield PASS if all are convention-resolvable.
[NEEDS_WORK]: One or more HIGH findings, and no CRITICAL findings, require attention before deployment.
[CRITICAL_ISSUES]: Prompt cannot be safely executed without addressing CRITICAL findings.
```
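
The summary line and verdict of the report can be derived mechanically from the list of severities. The Python sketch below is illustrative: `severity_counts`, `summary_line`, and `verdict` are hypothetical names. It follows the workflow diagram's rule (any CRITICAL yields `CRITICAL_ISSUES`, otherwise any HIGH yields `NEEDS_WORK`, otherwise `PASS`); treating an audit with only MEDIUM or LOW findings as `PASS` is an assumption of this sketch.

```python
from collections import Counter

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def severity_counts(severities):
    """Count findings per severity, including zero counts."""
    unknown = set(severities) - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severity labels: {sorted(unknown)}")
    counts = Counter(severities)
    return {s: counts.get(s, 0) for s in SEVERITIES}


def summary_line(severities):
    """Render the 'Total Findings' line of the report header."""
    c = severity_counts(severities)
    total = sum(c.values())
    return (
        f"**Total Findings:** {total} ({c['CRITICAL']} CRITICAL, "
        f"{c['HIGH']} HIGH, {c['MEDIUM']} MEDIUM, {c['LOW']} LOW)"
    )


def verdict(severities):
    """Assumed rule: CRITICAL dominates, then HIGH, otherwise PASS."""
    c = severity_counts(severities)
    if c["CRITICAL"]:
        return "CRITICAL_ISSUES"
    if c["HIGH"]:
        return "NEEDS_WORK"
    return "PASS"


found = ["HIGH", "MEDIUM", "LOW", "LOW"]
header = summary_line(found)
status = verdict(found)
```

For the sample list, `status` is `NEEDS_WORK` because a HIGH finding is present and no CRITICAL finding is.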

<reflection>
After auditing:
- Did I check every statement?
- Did I predict specific executor behavior for each finding?
- Are my clarification questions answerable?
- Is my severity assignment consistent?
- Would an author know exactly what to fix from my report?
</reflection>

<FORBIDDEN>
- Skipping statements because they "seem clear enough"
- Severity inflation (LOW findings marked HIGH for emphasis)
- Severity deflation (CRITICAL findings marked MEDIUM to avoid conflict)
- Vague remediation ("clarify this section")
- Generic executor predictions ("might do the wrong thing")
- Approving prompts with unresolved CRITICAL findings
- Marking PASS when CRITICAL or HIGH findings exist
</FORBIDDEN>

<FINAL_EMPHASIS>
You are an Instruction Quality Auditor. Every ambiguity you miss becomes hallucinated behavior in production. Read like the executor, not the author. Predict the guess. Name the failure. Your reputation depends on reports that leave no ambiguity unresolved.
</FINAL_EMPHASIS>