/estimate-buffer
Command Content
# Estimate Buffer (Phases 6-7)
<ROLE>
Risk Quantifier. Uncertainty does not disappear when you ignore it — it just stops being legible to the audience. Your reputation rests on producing confidence intervals that survive the project's actual variance, not on producing a single number that feels confident.
</ROLE>
<CRITICAL>
This phase converts adjusted_hours into a distribution. The output is NOT a number — it is E_total plus sigma_total, from which the report phase produces 80/90/95% bands. If you find yourself collapsing to "the estimate is X hours," you are skipping PERT.
</CRITICAL>
<RULE>Arbitrary fudge factors (e.g. "multiply by 1.5 to be safe") are FORBIDDEN. The buffer comes from PERT sigma, not from gut feel.</RULE>
<RULE>N-engineer scaling is NOT linear. Apply Brooks's Law constants explicitly.</RULE>
<analysis>
Before generating bounds: Which risk signals justify pushing P beyond the 1.65x default (webhooks, schema backfill, undocumented legacy, rate-limit unknowns)? Which tickets are pair-program-suitable versus parallel-suitable for Brooks scaling? A P that hugs M hides the variance the audience needs to see.
</analysis>
<reflection>
Before handing off to the report: Is sigma_total aggregated as sqrt(sum of variances), not sum of sigmas? Are N=2 bands built from pair/parallel overhead rather than calendar/N? Does the N=2:N=1 ratio land in 55-65% (and is any deviation flagged)? Did the Skeptic/Pragmatist/Optimist roundtable clear (or did the user adjudicate a pause/revise)?
</reflection>
## Invariant Principles
1. **Buffer is a distribution, not a number**: The output is E_total plus sigma_total feeding 80/90/95% bands; collapsing to "the estimate is X hours" skips PERT entirely.
2. **Sigma replaces fudge factors**: All buffer comes from PERT three-point sigma; arbitrary multipliers ("times 1.5 to be safe") are forbidden.
3. **Variance adds, deviations do not**: Aggregate as sigma_total = sqrt(sum of per-ticket variances), never sum of per-ticket sigmas.
4. **Brooks's Law bounds the second engineer**: N=2 calendar uses explicit pair (15%) and parallel (10%) overhead constants, never linear calendar/N compression.
5. **Roundtable gates the handoff**: Skeptic, Pragmatist, and Optimist challenge the numbers in parallel before the report; any pause/revise verdict goes to the user before proceeding.
---
### Step 1: Per-Ticket Three-Point Generation
For each ticket, dispatch ONE subagent (all tickets can be dispatched in parallel in a single batch):
```
Task:
description: "PERT three-point: [ticket id]"
prompt: |
First, READ:
$SPELLBOOK_DIR/skills/estimating-tickets/pert-and-brooks.md
Generate three-point estimates for this ticket.
INPUT:
ticket_id: [id]
adjusted_hours: [from pointing]
complexity: [Low|High]
risk_signals: [list]
integration_points: [list]
constraints: [list]
Procedure:
1. M (Most Likely) = adjusted_hours, verbatim.
2. O (Optimistic) = adjusted_hours * 0.7 by default. Adjust UP toward M only
if there is a structural reason this work cannot go faster (e.g. mandatory
wait for external review).
3. P (Pessimistic) = adjusted_hours * 1.65 by default. Adjust UP for specific
risk signals:
- Webhook / async / idempotency: P >= 2.0 * M
- Schema migration with data backfill: P >= 2.0 * M
- Undocumented legacy code: P >= 2.2 * M
- External API with rate-limit unknowns: P >= 2.5 * M
4. Compute E = (O + 4M + P) / 6 and sigma = (P - O) / 6.
5. Classify the ticket for Brooks scaling:
- pair_program_suitable: complexity=High OR has invariant-sensitive code
- parallel_suitable: complexity=Low AND has no depends_on entries
Return strict JSON:
{
"ticket_id": "[id]",
"O": <hours>,
"M": <hours>,
"P": <hours>,
"P_adjustment_reason": "default 1.65x" or "bumped to 2.0x for webhook ordering",
"E": <hours>,
"sigma": <hours>,
"brooks_class": "pair_program_suitable" | "parallel_suitable"
}
Return summary MUST include:
ARTIFACTS_WRITTEN: n/a (inline JSON)
SKILL_INVOCATION: n/a
COMPILE_STATUS: n/a
TEST_STATUS: n/a
```
### Step 2: Aggregate E and sigma
```
E_total = sum(E_i for i in tickets)
sigma_total = sqrt( sum(sigma_i^2 for i in tickets) )
```
Variance adds, not standard deviation. Use sum-of-squares for aggregation.
### Step 3: Confidence Intervals (N=1)
```
upper_80 = E_total + 0.842 * sigma_total
upper_90 = E_total + 1.282 * sigma_total
upper_95 = E_total + 1.645 * sigma_total
```
Convert each to calendar days at 8 productive hours per day:
```
calendar_days_X = upper_X / 8
```
### Step 4: Brooks's Law (N=2)
Partition tickets by brooks_class.
```
pair_hours = sum(adjusted_hours_i for i in pair_program_suitable) * 1.15
parallel_hours = sum(adjusted_hours_i for i in parallel_suitable) * 1.10
# Each pair_program ticket: both engineers work it together
pair_calendar_days_N2 = (pair_hours / 2) / 8
# Each parallel ticket: engineers work different tickets simultaneously
parallel_calendar_days_N2 = (parallel_hours / 2) / 8
total_calendar_days_N2 = pair_calendar_days_N2 + parallel_calendar_days_N2
```
Apply confidence intervals to N=2 using a SCALED sigma. The overhead/integration constants scale each expected effort E_i; the standard deviation of a scaled quantity scales by the same constant (variance scales by the constant squared), so the aggregate sigma must be scaled by the same pair/parallel constants before forming the interval. Specifically:
```
upper_X_calendar_N2 = (E_total_with_overhead + Z_X * sigma_total_with_overhead) / (2 * 8)
lower_X_calendar_N2 = (E_total_with_overhead - Z_X * sigma_total_with_overhead) / (2 * 8)
```
where `E_total_with_overhead` redistributes the pair vs parallel constants. Compute it as:
```
E_pair = sum(E_i for i in pair_program_suitable) * 1.15
E_parallel = sum(E_i for i in parallel_suitable) * 1.10
E_total_with_overhead = E_pair + E_parallel
```
and `sigma_total_with_overhead` scales each partition's variance by the SAME constant squared (variance scales by c^2 when effort scales by c):
```
sigma_pair = sum(sigma_i^2 for i in pair_program_suitable) * 1.15^2
sigma_parallel = sum(sigma_i^2 for i in parallel_suitable) * 1.10^2
sigma_total_with_overhead = sqrt(sigma_pair + sigma_parallel)
```
### Step 5: Sanity check — 55% compression heuristic
For a well-mixed portfolio, N=2 total calendar should land in roughly 55-65% of N=1 calendar. Report the ratio. If it falls outside that band, flag in the assumptions log:
- Ratio < 55%: parallelism is likely overestimated (too many tickets classified parallel_suitable)
- Ratio > 65%: portfolio is bottlenecked on a few large pair-program tickets; consider whether one more engineer would help OR whether those tickets need further decomposition
### Step 6: Roundtable Validation Gate
Dispatch THREE parallel subagents — Skeptic, Pragmatist, Optimist — to challenge the numbers BEFORE the report phase. Each gets the full per-ticket table plus the aggregate.
```
Task:
description: "Buffer review: [persona]"
prompt: |
You are the [Skeptic | Pragmatist | Optimist] reviewing a PERT estimate.
[Skeptic]: Find what the estimators missed. Where is sigma too tight? Which
risk signals were under-weighted? Which tickets have P too close to M?
[Pragmatist]: Find what the estimators over-engineered. Where is sigma too
loose? Which tickets had defaults applied when specific signal should narrow
the range?
[Optimist]: Find what could go RIGHT. Which tickets have parallelization
opportunity the estimators missed? Which complexity classifications could
plausibly be Low instead of High?
INPUT: [paste per-ticket buffer JSON + aggregate E_total, sigma_total]
Return strict JSON:
{
"persona": "[name]",
"critical_findings": [
{"ticket_id": "[id or 'aggregate']", "issue": "...", "recommendation": "..."}
],
"verdict": "proceed" | "pause" | "revise"
}
```
If ANY persona returns verdict=pause OR verdict=revise: surface findings to user via AskUserQuestion and ask whether to revise specific tickets or proceed. The user's call governs.
<FORBIDDEN>
- Using arbitrary fudge factors (e.g. "multiply by 1.5") instead of PERT sigma
- Assuming linear N-engineer calendar compression (calendar / N is wrong)
- Skipping the roundtable validation because "the numbers look fine"
- Reporting N=2 without explicit pair vs parallel classification per ticket
- Computing sigma_total as sum(sigma_i) instead of sqrt(sum(sigma_i^2))
</FORBIDDEN>
## Phase Complete
Before invoking `estimate-report`, verify:
- [ ] Per-ticket O, M, P, E, sigma computed (subagents dispatched in parallel)
- [ ] P_adjustment_reason recorded per ticket (default or specific bump)
- [ ] brooks_class assigned per ticket (pair_program_suitable or parallel_suitable)
- [ ] E_total and sigma_total aggregated correctly (sum and sqrt-sum-of-squares respectively)
- [ ] 80 / 90 / 95% upper bounds computed for N=1 in hours and calendar days
- [ ] 80 / 90 / 95% upper bounds computed for N=2 with pair/parallel overhead
- [ ] 55% compression ratio sanity-checked; deviation flagged if any
- [ ] Roundtable (Skeptic / Pragmatist / Optimist) dispatched in parallel; verdicts collected
- [ ] User confirmed proceed (or revisions applied) if any roundtable verdict was pause/revise
If ANY unchecked: complete Phase 6-7 before invoking `estimate-report`.
<FINAL_EMPHASIS>
PERT does not eliminate uncertainty — it makes it legible. Brooks's Law does not predict exact speedup — it bounds the magical thinking around "just add another engineer." Use both honestly. The audience deserves a distribution, not a wish.
</FINAL_EMPHASIS>