/dedupe-apply

# MISSION

Phase 4 of the `dedupe` skill: consume the report artifact produced by
`/dedupe-report`, apply every EXTRACT finding marked `apply` after one
final pre-edit checkpoint, and journal each edit in a deterministic,
rollback-ready format.

**Part of the dedupe-* command family.** Run after `/dedupe-report`.

## Invariant Principles

1. **Clean working tree is a hard gate**: `git status --porcelain` must
   be empty. There is no override flag. The operator commits / stashes /
   discards before this command will touch the filesystem.
2. **Every edit is journaled** — the journal is the rollback source of
   truth. If an edit is not journaled, it did not happen.
3. **Per-finding final checkpoint** — every EXTRACT finding marked
   `apply` in the report receives one `AskUserQuestion` prompt
   immediately before its edit. There is no "apply all" affordance.
4. **Rollback is byte-exact** — restoring an original block requires the
   `new_path` content to match what the journal recorded as the
   post-apply state. Mismatch means the canonical home was edited
   externally; warn and skip rather than overwrite the operator's work.
5. **No Python, no shell scripts** — base64 encode/decode runs via the
   `base64` CLI through the harness Bash tool. Journal parsing uses
   POSIX text utilities plus `jq` (preinstalled via brew on the dev
   machine).

**Clean-tree-gate invariant:** The clean-tree gate has no override.
Every applied edit is journaled; the only way rollback remains
trustworthy is if the working tree before apply is a known git state.
Suppressing the gate would silently break rollback's correctness
invariant.

**Prohibited operations:**

- Editing any file before the clean-tree gate passes.
- Editing any file before the per-finding `AskUserQuestion` checkpoint
  for that finding has returned `apply`.
- Writing a journal entry whose `original_text_*_b64` field does not
  round-trip back to the exact bytes read from the source file.
- Rolling back a finding whose `new_path` content does not match the
  recorded `new_text_b64` (warn and skip instead).
- Implementing base64 encode/decode in Python.

---

## Invocation

```
/dedupe-apply <report-path>
/dedupe-apply --rollback <journal-path>
```

- `<report-path>` — the artifact produced by `/dedupe-report`. Required
  in apply mode.
- `--rollback <journal-path>` — switches to rollback mode. The journal
  is the artifact produced by an earlier successful or partial apply
  run.

The two modes are mutually exclusive.

---

## Mode A — Apply

### Phase 4.1 — Clean-tree hard gate

The very first action is:

```sh
git status --porcelain
```

run via the harness Bash tool from the repository root. If the output
is non-empty, HALT with the explicit error message:

```
dedupe-apply: working tree is not clean.
The following files have uncommitted changes:
<list of files>

Commit, stash, or discard them and re-invoke /dedupe-apply.
There is no override flag.
```

No journal entry is written. No file is edited.
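
The gate can be sketched as a small shell function. This is a minimal sketch, not the command's actual implementation: the name `clean_tree_gate` is hypothetical, the caller is assumed to pass in the captured `git status --porcelain` output, and the `cut -c4-` path extraction assumes the porcelain v1 layout (two status characters, a space, then the path).

```sh
# Hypothetical helper: returns 0 when the porcelain output is empty,
# otherwise prints the halt message to stderr and returns 1.
clean_tree_gate() {
  porcelain="$1"
  if [ -z "$porcelain" ]; then
    return 0
  fi
  {
    printf '%s\n' 'dedupe-apply: working tree is not clean.'
    printf '%s\n' 'The following files have uncommitted changes:'
    # Assumed porcelain v1 layout "XY path": drop the 3-character prefix.
    printf '%s\n' "$porcelain" | cut -c4-
    printf '\n'
    printf '%s\n' 'Commit, stash, or discard them and re-invoke /dedupe-apply.'
    printf '%s\n' 'There is no override flag.'
  } >&2
  return 1
}

# Usage from the repository root:
#   clean_tree_gate "$(git status --porcelain)" || exit 1
```

Passing the output in as an argument keeps the check itself free of side effects, so it can be exercised without a repository.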

### Phase 4.2 — Parse the report

Read the report at `<report-path>`. Identify every EXTRACT subsection
whose `Disposition` line records `apply`. Skip every other disposition
(`skip-this`, `defer-to-drift`, `mark-keep`) silently.

For each `apply` finding, extract:

- the pair's `finding_id`;
- the two source file paths and heading chains for blocks A and B;
- the proposed canonical home path under `skills/shared-references/`;
- the rationale and counterfactual-loss prose (echoed into the journal
  narrative).

### Phase 4.3 — Compute the journal artifact path

```
~/.local/spellbook/docs/<project-encoded>/dedupe-journal-YYYY-MM-DD-<seed-slug>.md
```

The project-encoded prefix and seed slug are inherited from the report
header. `YYYY-MM-DD` is today's date in UTC, obtained via
`date -u +%Y-%m-%d`. Create the parent directory with `mkdir -p` if it
does not exist.

If a journal file already exists at that path (resume scenario), append
new entries to it rather than truncating. Read the existing journal to
identify which `finding_id`s have already been applied (entries with
`status=applied`); skip those on the second pass to keep the run
idempotent.
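
The path computation can be sketched as follows. The helper name `journal_path` and the example values are hypothetical; the sketch assumes the caller has already read the project-encoded prefix and seed slug from the report header.

```sh
# Hypothetical helper: prints the journal path for a given project-encoded
# prefix ($1) and seed slug ($2).
journal_path() {
  project_encoded="$1"
  seed_slug="$2"
  today="$(date -u +%Y-%m-%d)"   # today's date in UTC
  printf '%s/.local/spellbook/docs/%s/dedupe-journal-%s-%s.md\n' \
    "$HOME" "$project_encoded" "$today" "$seed_slug"
}

# Usage sketch ("my-project" and "auth-docs" are placeholder values):
#   JOURNAL="$(journal_path my-project auth-docs)"
#   mkdir -p "$(dirname "$JOURNAL")"
#   # Open the journal with >> (append) so a resumed run never truncates it.
```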

### Phase 4.4 — Per-finding apply loop

For each `apply` finding, in the order they appear in the report:

1. **Final pre-edit checkpoint.** Drive `AskUserQuestion` with the
   options:

   | Option | Effect |
   |---|---|
   | `apply` | Proceed with the edit for this finding. |
   | `skip-this` | Skip this finding; do not edit, do not journal. |
   | `abort-remaining` | Skip this and every remaining finding; finish the run. |

   This is the final safety checkpoint. There is no "apply all"
   affordance. If 50 prompts are operationally excessive, the operator
   should have narrowed the cost ceiling at `/dedupe-analyze`.

2. **Read the original block bodies.** For each of block A and block B,
   read the verbatim bytes from the source file. Hold them in memory
   for the journal entry.

3. **Base64-encode the originals** via the `base64` CLI through the
   harness Bash tool, e.g.:

   ```sh
   ORIG_A_B64="$(printf '%s' "$ORIG_A" | base64 | tr -d '\n')"
   ```

   The `printf '%s'` form avoids `echo`'s trailing newline. The
   `| tr -d '\n'` strips GNU `base64`'s default 76-column line wraps so
   the resulting string survives embedding in a single-line JSON value.
   Decoded round-trip is verified before writing the journal entry: if
   `printf '%s' "$ORIG_A_B64" | base64 --decode` does not produce
   byte-exact `$ORIG_A`, HALT with a journal-encode-failure error.
   Use `base64 --decode` (not `-d`) for portability across GNU and
   BSD/macOS implementations.

4. **Write the canonical home.** Create or overwrite the file at the
   proposed canonical home path under `skills/shared-references/`. The
   canonical home contains the consolidated block body. Capture the
   exact bytes written and base64-encode them as `new_text_b64`.

5. **Replace block A in its source file** with the single-line
   reference plumbing:

   ```
   See [<slug>](../shared-references/<slug>.md).
   ```

   Replace block B in its source file the same way.

6. **Append a journal entry.** Each entry is one markdown subsection
   with an HTML-comment-fenced JSON block embedded in it:

   ```
   ## EXTRACT-<finding_id_pair>

   Applied EXTRACT for blocks in <file_a> and <file_b>. Canonical home
   written to <new_path>; both original blocks replaced with single-line
   references.

   <!--FINDING {
     "id": "<finding_id_pair>",
     "verdict": "EXTRACT",
     "status": "applied",
     "timestamp": "<ISO8601 UTC>",
     "original_path_a": "<resolved path>",
     "original_text_a_b64": "<base64 of block A bytes>",
     "replacement_text_a": "See [<slug>](../shared-references/<slug>.md).",
     "original_path_b": "<resolved path>",
     "original_text_b_b64": "<base64 of block B bytes>",
     "replacement_text_b": "See [<slug>](../shared-references/<slug>.md).",
     "new_path": "skills/shared-references/<slug>.md",
     "new_text_b64": "<base64 of canonical home bytes>"
   } FINDING-->
   ```

   The `<!--FINDING ... FINDING-->` fence makes the entry parseable by
   substring scan without a markdown parser and invisible in rendered
   markdown.

   If step 2, 4, or 5 fails for any reason, append the journal entry
   with `status=failed` and an `error_message` field carrying the error
   text. The apply loop continues to the next finding; idempotency
   requires that partial runs be resumable.
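
The round-trip check from step 3 can also be run against files rather than shell variables, which avoids command substitution stripping trailing newlines from the block bytes. A minimal sketch, assuming the block body has first been written to a temporary file; the helper name `encode_verified` is hypothetical.

```sh
# Hypothetical helper: base64-encodes the file at $1 onto a single line,
# verifies that decoding reproduces the input byte for byte, and prints
# the encoded string. Returns 1 on a round-trip mismatch.
encode_verified() {
  src="$1"
  b64="$(base64 < "$src" | tr -d '\n')"   # strip any line wraps
  check="$(mktemp)" || return 1
  printf '%s' "$b64" | base64 --decode > "$check"
  if cmp -s "$src" "$check"; then
    rm -f "$check"
    printf '%s\n' "$b64"
  else
    rm -f "$check"
    printf '%s\n' 'dedupe-apply: journal-encode-failure' >&2
    return 1
  fi
}

# Usage sketch ("$BLOCK_A_FILE" is a placeholder for a temp file holding
# block A's bytes):
#   ORIG_A_B64="$(encode_verified "$BLOCK_A_FILE")" || exit 1
```

Comparing files with `cmp -s` keeps the check byte-exact even when a block ends in blank lines.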

### Phase 4.5 — Final summary

When every approved finding has been processed, emit a single-line
summary:

```
dedupe-apply complete: A applied, S skipped, F failed (journal: <path>)
```

---

## Mode B — Rollback

### Phase 4.6 — Parse the journal

Read the journal at `<journal-path>`. Extract every `<!--FINDING ...
FINDING-->` block via substring scan; do NOT rely on a markdown
parser. The pattern is anchored: scan for the literal opening sentinel
`<!--FINDING`, then for the matching closing sentinel `FINDING-->`.
The content between them is JSON.

Pipe each JSON block through `jq` for parsing and field extraction:

```sh
printf '%s\n' "$JSON" | jq -r '.id, .status, .original_path_a, .original_text_a_b64, ...'
```

Use `printf '%s\n'` rather than `echo "$JSON"`: `echo` may interpret
backslashes (`\n`, `\t`) inside JSON string values or treat a JSON
fragment beginning with `-` as a flag.

Discard entries whose `status` is not `applied`.
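
The sentinel scan described above can be sketched with `awk`. The helper name `extract_findings` is hypothetical, and the sketch assumes each opening sentinel starts its own line, as in the entry format written by the apply loop.

```sh
# Hypothetical helper: prints each fenced JSON object from the journal at $1
# on its own line. Raw newlines cannot occur inside JSON strings, so joining
# the lines of one block does not change its meaning.
extract_findings() {
  awk '
    /^[[:space:]]*<!--FINDING/ {
      inblock = 1; buf = ""
      sub(/^[[:space:]]*<!--FINDING[[:space:]]*/, "")
    }
    inblock {
      if (index($0, "FINDING-->") > 0) {
        sub(/[[:space:]]*FINDING-->.*/, "")
        print buf $0
        inblock = 0
      } else {
        buf = buf $0
      }
    }
  ' "$1"
}

# Usage sketch: keep only applied entries, one compact JSON object per line.
#   extract_findings "$JOURNAL" | jq -c 'select(.status == "applied")'
```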

### Phase 4.7 — Per-entry restore

For each applied entry, in REVERSE order of the journal (most-recent
first), restore in four steps:

1. **Verify the canonical home content.** Read the current bytes at
   `new_path`. Compute its base64 via the `base64` CLI. Compare against
   the journal's `new_text_b64`.

   - If the bytes match byte-exactly, proceed to step 2.
   - If they differ, EMIT a single-line warning:

     ```
     dedupe-apply --rollback: skipping <new_path> — content differs from journal record.
     The operator may have edited the canonical home; reconcile manually.
     ```

     Skip steps 2 through 4 for this entry, leaving the canonical home
     in place. Move to the next entry.

2. **Restore block A.** Base64-decode `original_text_a_b64` via
   `base64 --decode` through the harness Bash tool. In the source file at
   `original_path_a`, replace the current single-line reference
   plumbing with the decoded original bytes.

3. **Restore block B.** Same procedure with `original_text_b_b64` and
   `original_path_b`.

4. **Delete the canonical home.** Remove the file at `new_path`. If it
   does not exist (already deleted in a prior rollback), proceed silently.
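
The step 1 verification can be sketched as a predicate that gates the remaining steps. The helper name `canonical_home_matches` is hypothetical. Treating a missing file as a mismatch is an assumption of this sketch; the steps above do not say how step 1 should handle that case.

```sh
# Hypothetical helper: returns 0 when the file at $1 encodes to exactly the
# base64 string in $2 (the journal's new_text_b64), 1 otherwise.
canonical_home_matches() {
  new_path="$1"
  expected_b64="$2"
  # Assumption: a missing canonical home counts as a mismatch.
  [ -f "$new_path" ] || return 1
  current_b64="$(base64 < "$new_path" | tr -d '\n')"
  [ "$current_b64" = "$expected_b64" ]
}

# Usage sketch for one journal entry ($NEW_PATH and $NEW_TEXT_B64 are
# placeholders for the entry's new_path and new_text_b64 values):
#   if canonical_home_matches "$NEW_PATH" "$NEW_TEXT_B64"; then
#     : # restore block A, restore block B, then rm -f "$NEW_PATH"
#   else
#     : # emit the warning and move to the next entry
#   fi
```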

### Phase 4.8 — Companion rollback journal

Write a companion artifact at:

```
~/.local/spellbook/docs/<project-encoded>/dedupe-rollback-YYYY-MM-DD-<seed-slug>.md
```

It records every restore action with the same HTML-comment-fenced JSON
shape, with `status` set to `rolled-back` or `skipped-content-differs`
as appropriate.

### Phase 4.9 — Rollback summary

Emit:

```
dedupe-apply --rollback complete: R restored, S skipped (companion journal: <path>)
```

Rollback does NOT require a clean working tree, but every restore step
that detected a post-apply edit (step 1 mismatch) is logged so the
operator can reconcile manually.

---

## Output

In apply mode:

```
~/.local/spellbook/docs/<project-encoded>/dedupe-journal-YYYY-MM-DD-<seed-slug>.md
```

In rollback mode:

```
~/.local/spellbook/docs/<project-encoded>/dedupe-rollback-YYYY-MM-DD-<seed-slug>.md
```

---

## References

- The classifier JSON schema (referenced for the verdict field set
  embedded in journal entries): `skills/dedupe/references/counterfactual-prompt.md`.
- Verdict catalog: `skills/dedupe/references/verdict-taxonomy.md`.

**Closing:** The clean-tree gate has no override. Every edit is
journaled. Rollback is byte-exact or it warns and skips. There are
no shortcuts on this phase; the operator's working tree is the
contract.