feat: session recovery toolset (orphan detector + /recover)

Reconstructs session logs from Claude Code transcripts when a session
crashes or is closed before /save. Two entry points:

- /recover <uuid|latest> : manual, Claude-reviewed reconstruction
- detect_orphaned_sessions.py : scheduled scan that auto-builds logs for
  substantive, unsaved, not-yet-recovered transcripts (banner-marked
  RECOVERED-UNVERIFIED), commits them, and posts a #bot-alerts FYI.

recover_session.py is the shared engine: Python extracts the verbatim
command/config/reference timeline; Ollama drafts prose-only narrative.
Machine-local ledger (.claude/state/) prevents reprocessing. Reviewed:
git add scoped to own files, ledger written only after successful push,
per-uuid idempotency, --max cap for unattended runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 18:33:07 -07:00
parent e8144a862e
commit eed3ece2c7
9 changed files with 1897 additions and 0 deletions


@@ -274,6 +274,7 @@ Vault structure: `infrastructure/`, `clients/`, `services/`, `projects/`, `msp-t
| `/shape-spec` | Pre-implementation spec for a GuruRMM feature — produces plan.md, shape.md, references.md, standards.md |
| `/rmm-audit` | Full end-to-end audit of GuruRMM: API coverage, UI gaps, Rust/TS quality, security, data integrity. Produces timestamped report + updates UI_GAPS.md |
| `/forum-post` | Post a technical article to community.azcomputerguru.com — drafts from context, shows preview, inserts via paramiko SSH to Flarum DB |
| `/recover` | Reconstruct a session log from a Claude Code transcript after a crash/close-before-save. `/recover <uuid>`, `/recover latest`, or `/recover --list`. See `.claude/RECOVERY.md` |
---

76
.claude/RECOVERY.md Normal file

@@ -0,0 +1,76 @@
# Session Recovery
Never lose work again when a Claude Code session crashes or is closed before `/save`.

Claude Code writes every session live to a transcript JSONL. This toolset distills those transcripts back into normal session logs in the `.claude/commands/save.md` format.

---
## The three pieces

| Piece | File | Role |
|---|---|---|
| Engine | `.claude/scripts/recover_session.py` | Parses one transcript, classifies it, and reconstructs a full session log. CLI: `--uuid` / `--latest` / `--path` with `--print` (default), `--auto`, or `--json`. |
| Detector | `.claude/scripts/detect_orphaned_sessions.py` | Scans all idle transcripts, auto-recovers the orphans (substantive + unsaved), oldest first and at most `--max` (default 25) per run, updates the ledger, commits + pushes, and posts an FYI to `#bot-alerts`. CLI: `--dry-run`, `--idle-min N`, `--max N`, `--no-commit`, `--no-alert`. |
| Command | `.claude/commands/recover.md` | `/recover <uuid>` / `/recover latest` / `/recover --list` — the **manual, reviewed** path where Claude edits the draft before writing. |

The scheduled-task registration script `.claude/scripts/register-orphan-detector.ps1` wires the detector into the Windows Task Scheduler (Windows only).

---
## Where things live
- **Transcripts:** `~/.claude/projects/<slug>/<uuid>.jsonl`, where `<slug>` is the claudetools repo root with `/`, `\`, and `:` each replaced by `-`. On a `D:\claudetools` machine the slug is `D--claudetools`, so `C:\Users\<you>\.claude\projects\D--claudetools\*.jsonl`. The slug is computed portably from `claudetools_root` in `.claude/identity.json`. Sibling `<uuid>/` dirs hold subagent transcripts and are ignored for the main narrative.
- **Ledger:** `.claude/state/recovered-sessions.json` (machine-local, gitignored). Records every processed uuid with its verdict (`recovered` / `skipped-saved` / `skipped-trivial` / `error`) so it is never re-scanned. Transcripts are per-machine, so the ledger is too.
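The slug rule above is simple enough to sketch in Python. This is a minimal illustration, not the engine's code: `transcript_slug` and `transcript_dir` are hypothetical helper names, and the sketch assumes `claudetools_root` is passed in exactly as stored in `.claude/identity.json`.

```python
from __future__ import annotations

from pathlib import Path


def transcript_slug(claudetools_root: str) -> str:
    """Replace each '/', '\\' and ':' in the repo root with '-'."""
    slug = claudetools_root
    for ch in ("/", "\\", ":"):
        slug = slug.replace(ch, "-")
    return slug


def transcript_dir(claudetools_root: str, home: Path | None = None) -> Path:
    """~/.claude/projects/<slug>/ : where this machine's <uuid>.jsonl files live."""
    base = home if home is not None else Path.home()
    return base / ".claude" / "projects" / transcript_slug(claudetools_root)


print(transcript_slug("D:\\claudetools"))  # D--claudetools
```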
---
## How to run
```bash
# See candidate orphans without writing anything:
py .claude/scripts/detect_orphaned_sessions.py --dry-run
# Inspect one transcript's verdict as JSON (writes nothing):
py .claude/scripts/recover_session.py --json --uuid <uuid>
# Print a reconstructed log to stdout (writes nothing):
py .claude/scripts/recover_session.py --uuid <uuid> --print
# Full unattended run (writes logs, updates ledger, commits, pushes, alerts):
py .claude/scripts/detect_orphaned_sessions.py
```
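Besides the idle gate and the ledger, an unattended run caps how many orphan logs it builds (`--max`, default 25), working oldest-first by the transcript's first timestamp with mtime as the fallback; orphans over the cap stay out of the ledger and are picked up by the next run. The sketch below restates that selection on plain data. `Transcript` and `select_batch` are hypothetical names, and the `orphan` flag stands in for the engine's full classification.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Transcript:
    uuid: str
    mtime: float             # last-write time, epoch seconds
    first_ts: float | None   # first message time, epoch seconds, if known
    orphan: bool             # substantive and not saved


def select_batch(
    transcripts: list[Transcript],
    ledger: dict[str, dict],
    now: float,
    idle_min: int = 90,
    max_recover: int = 25,
) -> tuple[list[Transcript], list[Transcript]]:
    """Return (to_build, deferred) orphans, oldest first."""
    idle_secs = idle_min * 60
    # Idle gate + ledger skip: recently written or already-processed
    # transcripts are never considered.
    eligible = [
        t for t in transcripts
        if now - t.mtime >= idle_secs and t.uuid not in ledger
    ]
    # Oldest first: prefer the first-message timestamp, fall back to mtime.
    eligible.sort(key=lambda t: t.first_ts if t.first_ts is not None else t.mtime)
    orphans = [t for t in eligible if t.orphan]
    # Over-cap orphans are deferred; the caller leaves them out of the ledger.
    return orphans[:max_recover], orphans[max_recover:]
```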
### Register the scheduled task (Windows)
```powershell
powershell -ExecutionPolicy Bypass -File D:\claudetools\.claude\scripts\register-orphan-detector.ps1
```
Registers `ClaudeTools - Orphaned Session Detector`: runs at logon and every 4 hours. The detector's 90-minute idle gate is what keeps a session that is still in use from being grabbed mid-flight; the 4-hour cadence only bounds how long an orphan waits before it is picked up.

---
## Accuracy split: Ollama prose vs Python verbatim
This is the core design principle.

- **Ollama drafts prose only** — Session Summary, Key Decisions, Problems Encountered, Pending / Incomplete Tasks. It never sees and never emits commands, IPs, credentials, file paths, commit SHAs, or ticket IDs. If Ollama is unreachable the log is still produced with a placeholder note in the prose sections.
- **Python extracts the verbatim evidence** — Configuration Changes (Write/Edit/NotebookEdit targets), Commands & Outputs (mutating Bash/PowerShell with truncated results), Reference Information (regex-extracted SHAs, URLs, IPs, ticket numbers, coord message ids), and Infrastructure & Servers. This is the high-value, accuracy-critical part and it comes straight from the transcript.

Trust the verbatim sections for facts; treat the prose as a draft.
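To make the split concrete, here is a minimal sketch of the kind of regex pass the verbatim side runs over transcript text. The patterns and the `extract_references` name are illustrative assumptions, not the engine's actual rules (its ticket-number and coord-message-id formats are not visible in this commit), and the SHA pattern deliberately requires both a digit and a letter so plain numbers and ordinary words are not mistaken for commits.

```python
import re

# Illustrative patterns only; the engine's real regexes may differ.
PATTERNS = {
    # 7-40 hex chars containing at least one a-f letter and one digit.
    "sha": re.compile(r"\b(?=[0-9a-f]*[a-f])(?=[0-9a-f]*[0-9])[0-9a-f]{7,40}\b"),
    # Stop before trailing punctuation such as "," or ".".
    "url": re.compile(r"https?://[^\s<>()\[\]]+[^\s<>()\[\].,;:]"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def extract_references(text: str) -> dict[str, list[str]]:
    """Return unique matches per category, in first-seen order."""
    found: dict[str, list[str]] = {}
    for name, pattern in PATTERNS.items():
        seen: list[str] = []
        for match in pattern.findall(text):
            if match not in seen:
                seen.append(match)
        found[name] = seen
    return found
```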
---
## Classification
- **substantive** — the session did real work: a Write/Edit/NotebookEdit, a mutating Bash/PowerShell command (git commit/push/add, ssh, schtasks, New-Item, Set-Content, Remove-Item, Out-File, a POST/PUT/DELETE/PATCH curl, an `/api/` call, `vault.sh`, a mutating Invoke-RestMethod), or a mutating Skill (syncro, rmm, remediation-tool, mailbox, forum-post, syncro-emergency-billing).
- **saved** — the session was already saved: a save/scc/checkpoint Skill, or a Write/Edit into a `session-logs/` path.
- **orphan** = substantive AND not saved. Only orphans are auto-recovered.
- **scope** — client / project / general, decided by Python from the transcript text, `cwd`, and `gitBranch` against the known client and project slugs. Conservative: ambiguous resolves to `general`.
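The verdict rules reduce to a small decision per transcript. A minimal sketch, assuming the transcript has already been reduced to a list of tool uses; `ToolUse`, `classify`, and the simplified `MUTATING_CMD` pattern are hypothetical stand-ins for the engine's fuller rules (it omits the mutating Invoke-RestMethod check, for example). The three results correspond to the detector's handling: `orphan` is recovered, `saved` is ledgered as `skipped-saved`, and `trivial` as `skipped-trivial`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

SAVE_SKILLS = {"save", "scc", "checkpoint"}
MUTATING_SKILLS = {"syncro", "rmm", "remediation-tool", "mailbox",
                   "forum-post", "syncro-emergency-billing"}
# Simplified subset of the mutating-command markers listed above.
MUTATING_CMD = re.compile(
    r"\bgit\s+(?:commit|push|add)\b"
    r"|\bssh\b|\bschtasks\b"
    r"|\b(?:New-Item|Set-Content|Remove-Item|Out-File)\b"
    r"|\bcurl\b.*-X\s*(?:POST|PUT|DELETE|PATCH)\b"
    r"|/api/|vault\.sh"
)


@dataclass
class ToolUse:
    tool: str      # e.g. "Write", "Edit", "Bash", "PowerShell", "Skill"
    arg: str = ""  # file path, command line, or skill name


def classify(events: list[ToolUse]) -> str:
    """Return "saved", "orphan" (substantive and unsaved), or "trivial"."""
    substantive = saved = False
    for e in events:
        if e.tool in ("Write", "Edit", "NotebookEdit"):
            substantive = True
            # A write into any session-logs/ dir means the session was saved.
            if "session-logs/" in e.arg.replace("\\", "/"):
                saved = True
        elif e.tool in ("Bash", "PowerShell") and MUTATING_CMD.search(e.arg):
            substantive = True
        elif e.tool == "Skill":
            if e.arg in SAVE_SKILLS:
                saved = True
            elif e.arg in MUTATING_SKILLS:
                substantive = True
    if saved:
        return "saved"
    return "orphan" if substantive else "trivial"
```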
---
## Banner discipline
Auto-recovered logs are written with a `[RECOVERED -- UNVERIFIED]` banner. **The banner stays until a human reviews the log** and removes it. The manual `/recover` path lets Claude review and correct the draft before writing, and drops the banner once verified.
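Because the banner is a fixed string, finding logs that still await human review is a plain text search. A minimal sketch, assuming session logs are Markdown files inside `session-logs/` directories anywhere under the repo root; `unverified_logs` is a hypothetical helper, not part of the toolset.

```python
from pathlib import Path

BANNER = "[RECOVERED -- UNVERIFIED]"


def unverified_logs(repo_root: Path) -> list[Path]:
    """List session logs under any session-logs/ dir that still carry the banner."""
    hits: list[Path] = []
    # rglob prepends "**/", so root-level session-logs/ is matched too.
    for log in sorted(repo_root.rglob("session-logs/*.md")):
        try:
            if BANNER in log.read_text(encoding="utf-8", errors="replace"):
                hits.append(log)
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
    return hits
```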


@@ -0,0 +1,84 @@
Reconstruct a session log from a Claude Code transcript when a session crashed or was closed before `/save`.

Claude Code writes every session live to a transcript JSONL under `~/.claude/projects/<slug>/<uuid>.jsonl`. `/recover` distills one of those transcripts back into a normal session log in the `.claude/commands/save.md` format. This is the **manual, reviewed** path; the background detector (`detect_orphaned_sessions.py`) handles unattended auto-recovery.

---
## Usage
| Invocation | Action |
|---|---|
| `/recover <uuid>` | Reconstruct the session with that transcript uuid |
| `/recover latest` | Reconstruct the newest transcript by mtime |
| `/recover --list` | Show candidate orphans (runs the detector `--dry-run`) |
---
## Flow: `/recover --list`
Run the detector in scan-only mode and present the table to the user:
```bash
py .claude/scripts/detect_orphaned_sessions.py --dry-run
```
The table shows every past-idle, not-yet-processed transcript with its uuid (truncated to the first 8 characters; the full uuid is the transcript's filename), mtime, `subst`/`saved`/`orphan` verdicts, classified scope, and the path a recovery would write to. Point the user at the rows where `orphan` is `YES`; those are unsaved substantive sessions. Nothing is written.

---
## Flow: `/recover <uuid>` or `/recover latest`
This is a **reviewed** recovery. Claude is the editor, not a passive writer.

1. **Generate the draft** (prints to stdout, writes nothing):

   ```bash
   py .claude/scripts/recover_session.py --uuid <uuid> --print
   ```

   (or `--latest`). The draft contains:
   - Ollama-drafted prose: Session Summary, Key Decisions, Problems Encountered, Pending / Incomplete Tasks.
   - Python-extracted verbatim evidence: Configuration Changes, Commands & Outputs, Reference Information, Infrastructure & Servers, Credentials & Secrets.
   - A `[RECOVERED -- UNVERIFIED]` banner and the canonical User block (from `whoami-block.sh`).
2. **Review the draft.** This is the point of the manual path:
   - Verify the **Commands / Config / Reference** appendix matches what actually happened and what the user intended. These are machine-extracted verbatim — confirm they are complete and not misleading.
   - Correct the **scope and slug**: the classifier is conservative and may land on `general` (or the wrong project/client) when work spanned several areas. Fix the target `session-logs/` directory accordingly.
   - Tighten the **topic** in the filename and the title.
   - Correct or rewrite the **Ollama prose** where it is imprecise. If Ollama was unreachable, write the prose sections yourself from the verbatim evidence.
3. **Write the final log.** Once verified, write the corrected markdown to the correct `session-logs/` path (client -> `clients/<slug>/session-logs/`, project -> `projects/<project>/session-logs/`, general -> root `session-logs/`), using the transcript's first-timestamp date: `YYYY-MM-DD-recovered-<topic>.md`. **Drop the UNVERIFIED banner** — by writing it yourself you have verified it.
4. **Sync:**

   ```bash
   bash .claude/scripts/sync.sh
   ```

5. **Unseeded wiki check.** If the scope is a client or project with no `wiki/<type>/<slug>.md` article yet, suggest:

   ```
   [INFO] No wiki article for '<slug>' yet. Run /wiki-compile <type>:<slug> to seed it.
   ```
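The target path in step 3 follows a fixed scope-to-directory mapping. A minimal sketch, assuming the scope arrives as a `(type, slug)` pair and the topic is already a filename-safe slug; `recovered_log_path` is a hypothetical name, and the engine's own `compute_output_path` additionally handles filename collisions (appending the 8-character uuid prefix).

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def recovered_log_path(repo_root: Path, scope_type: str, slug: str | None,
                       first_ts: str, topic: str) -> Path:
    """Build <dir>/YYYY-MM-DD-recovered-<topic>.md for a recovered session."""
    if scope_type == "client" and slug:
        log_dir = repo_root / "clients" / slug / "session-logs"
    elif scope_type == "project" and slug:
        log_dir = repo_root / "projects" / slug / "session-logs"
    else:  # general, and the conservative fallback for a missing slug
        log_dir = repo_root / "session-logs"
    # Date of the transcript's FIRST timestamp as recorded ("Z" read as UTC),
    # not today's date.
    day = datetime.fromisoformat(first_ts.replace("Z", "+00:00")).date().isoformat()
    return log_dir / f"{day}-recovered-{topic}.md"
```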
---
## Difference from the automatic detector
| | `/recover` (this command) | `detect_orphaned_sessions.py` (background) |
|---|---|---|
| Trigger | Manual, on demand | Scheduled task (every 4 hours + at logon) |
| Review | Claude reviews and corrects before writing | None — auto-writes unreviewed |
| Banner | Removed once verified | Kept (`[RECOVERED -- UNVERIFIED]`) until a human reviews |
| Scope/topic | Corrected by Claude | Whatever the classifier decided |
| Output | Final, clean session log | Banner-marked draft committed for later review |

Use `/recover` when you know a specific session was lost and want a clean log. Let the detector catch the ones you forget.

---
## Notes
- `--auto` and `--json` modes on `recover_session.py` exist for the detector and for scripting; `/recover` uses `--print` so Claude always reviews before anything lands on disk.
- The prose is Ollama-drafted from the transcript; the Commands/Config/Reference sections are extracted verbatim by Python. Never trust the prose for exact commands, IPs, credentials, paths, SHAs, or ticket IDs — read those from the verbatim sections.
- Transcripts are per-machine. You can only recover sessions that ran on the machine you are on.


@@ -52,6 +52,7 @@
- [Add Mike as owner on all Entra apps](feedback_entra_app_owner.md) — Apps created via management SP have no user owner — must add Mike manually or publisher verification fails.
- [No TOML/config file approach for endpoints](feedback_no_toml_config_endpoints.md) — User explicitly prohibits TOML or config-file-based endpoint configuration — this will never be approved.
- [Python on Windows — use py launcher](feedback_python_windows.md) — Windows Store python/python3 aliases disabled; always use py or jq on DESKTOP-0O8A1RL.
- [Unsaved sessions are recoverable from transcripts](feedback_session_recovery.md) — Crashed/closed-before-save sessions live in `~/.claude/projects/<slug>/*.jsonl`; the detector auto-recovers orphans, `/recover <uuid>` does it manually. Ollama prose + Python verbatim. See `.claude/RECOVERY.md`.
### Syncro
- [Syncro API plumbing](feedback_syncro_api.md) — Content-Type required on all POST/PUT; NO idempotency anywhere — always GET before retrying; response wrappers (`.ticket.id`, `.comment.id`); add_line_item shape (internal ID, flat response, required fields); HTML uses `<br>` not `<ul>/<li>`; timer_entry response is FLAT but SUPERSEDED (use add_line_item).


@@ -0,0 +1,19 @@
---
name: Unsaved sessions are recoverable from transcripts
description: Claude Code transcripts let you rebuild a session log after a crash/close-before-save; a detector auto-recovers orphans and /recover does it manually
type: feedback
---
Claude Code writes every session live to a transcript JSONL at `~/.claude/projects/<slug>/<uuid>.jsonl` (slug = the claudetools repo root with `/`, `\`, and `:` each replaced by `-`; computed from `claudetools_root` in identity.json). A session closed or crashed before `/save` is NOT lost — the work is fully recorded in that transcript and can be distilled back into a normal session log.

Toolset (`.claude/RECOVERY.md`):

- `.claude/scripts/recover_session.py` — engine. `--uuid`/`--latest`/`--path` with `--print`/`--auto`/`--json`.
- `.claude/scripts/detect_orphaned_sessions.py` — scans idle transcripts, auto-recovers orphans (substantive AND not saved), commits + pushes, FYIs `#bot-alerts`. `--dry-run` to scan only. Ledger at `.claude/state/recovered-sessions.json` (machine-local).
- `/recover <uuid>` — manual reviewed path; Claude corrects the draft before writing.
- `.claude/scripts/register-orphan-detector.ps1` — registers the scheduled task (Windows).

Accuracy split: Ollama drafts ONLY prose (summary/decisions/problems/pending); Python extracts commands, file paths, IPs, SHAs, tickets verbatim. Auto-recovered logs carry a `[RECOVERED -- UNVERIFIED]` banner until a human reviews them.

**Why:** Mike wanted to never lose work to a crashed/unclosed session again. Manual `/save` is the only thing that wrote logs before; the transcript is a complete fallback record.

**How to apply:** If a user says a session crashed or work was lost, run `py .claude/scripts/detect_orphaned_sessions.py --dry-run` to find candidate orphans, then `/recover <uuid>` to reconstruct and review a clean log. Don't assume work is gone — check the transcripts first.


@@ -0,0 +1,431 @@
#!/usr/bin/env python3
"""detect_orphaned_sessions.py -- find and auto-recover unsaved Claude Code sessions.

A session is "orphaned" when its transcript records substantive (mutating) work
but the session was never saved (no /save, /scc, or /checkpoint, and no write into
a session-logs/ path). This script scans the per-machine transcript directory,
classifies each idle transcript via the recover_session engine, auto-builds a
banner-marked recovery log for each orphan, records every processed uuid in a
machine-local ledger so it is never re-scanned, commits + pushes the recovered
logs, and posts an FYI to #bot-alerts.

Modes:
  (default)      full run: build logs, update ledger, commit, push, alert
  --dry-run      scan + print a report table; write/commit/alert nothing
  --idle-min N   minutes of mtime-idle before a transcript is eligible (default 90)
  --max N        max orphan logs to build per run, oldest-first (default 25);
                 orphans over the cap are deferred to the next run
  --no-commit    build logs, but skip git commit/push (recovered uuids stay
                 out of the ledger so the next run re-attempts them)
  --no-alert     build + ledger + commit, but skip the Discord alert

The detector NEVER touches sync.sh; it does its own git add/commit/push so it has
no surprising side effects. Soft-fails on git/alert errors (work is already saved
to disk -- those are best-effort).

stdlib only; targets Python 3.11+.
"""
from __future__ import annotations

import argparse
import json
import shutil
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

# Import the shared engine (same directory).
sys.path.insert(0, str(Path(__file__).resolve().parent))
import recover_session as engine  # noqa: E402

LEDGER_REL = Path(".claude") / "state" / "recovered-sessions.json"


def _now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def ledger_path() -> Path:
    return engine.repo_root() / LEDGER_REL


def load_ledger() -> dict:
    p = ledger_path()
    if p.exists():
        try:
            return json.loads(p.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return {}
    return {}


def save_ledger(ledger: dict) -> None:
    p = ledger_path()
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(ledger, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")


def _scope_str(scope: dict) -> str:
    t = scope.get("type", "general")
    if t == "general":
        return "general"
    return f"{t}:{scope.get('slug', '?')}"
def scan(idle_min: int, ledger: dict) -> tuple[list[dict], list[dict]]:
    """Scan transcripts.

    Returns (eligible, recoverable):
      eligible    -- every transcript that is past idle and not already in ledger
                     (each a dict with parsed metadata + verdict fields)
      recoverable -- the subset that are orphans (substantive and not saved)
    """
    base = engine.transcript_base_dir()
    now = datetime.now().timestamp()
    idle_secs = idle_min * 60
    eligible: list[dict] = []
    recoverable: list[dict] = []
    if not base.is_dir():
        return eligible, recoverable

    for jf in sorted(base.glob("*.jsonl")):
        uuid = jf.stem
        try:
            mtime = jf.stat().st_mtime
        except OSError:
            continue
        # Skip recently-active sessions.
        if (now - mtime) < idle_secs:
            continue
        # Skip anything already processed.
        if uuid in ledger:
            continue
        parsed = engine.parse_transcript(jf)
        verdict = engine.classify(parsed)
        orphan = bool(verdict["substantive"] and not verdict["saved"])
        rec = {
            "uuid": uuid,
            "path": jf,
            "mtime": mtime,
            "substantive": verdict["substantive"],
            "saved": verdict["saved"],
            "orphan": orphan,
            "scope": verdict["scope"],
            "title": verdict["title"],
            "parsed": parsed,
        }
        # would-write path (metadata-cheap; no Ollama)
        rec["would_write"] = str(
            engine.compute_output_path(parsed, verdict["scope"], verdict["title"])
        )
        eligible.append(rec)
        if orphan:
            recoverable.append(rec)

    # Process OLDEST-FIRST so a capped run drains the longest-waiting orphans
    # first. Prefer the transcript's first_ts when available; fall back to mtime.
    def _age_key(r: dict):
        ts = (r.get("parsed").first_ts if r.get("parsed") else "") or ""
        if ts:
            try:
                return datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp()
            except ValueError:
                pass
        return r.get("mtime", 0.0)

    eligible.sort(key=_age_key)
    recoverable.sort(key=_age_key)
    return eligible, recoverable
def print_dry_run_table(eligible: list[dict]) -> None:
    if not eligible:
        print("[INFO] No eligible (past-idle, unprocessed) transcripts found.")
        return
    headers = ["uuid", "mtime", "subst", "saved", "orphan", "scope", "would-write-path"]
    rows = []
    for r in eligible:
        mt = datetime.fromtimestamp(r["mtime"]).strftime("%Y-%m-%d %H:%M")
        rows.append(
            [
                r["uuid"][:8],
                mt,
                "yes" if r["substantive"] else "no",
                "yes" if r["saved"] else "no",
                "YES" if r["orphan"] else "no",
                _scope_str(r["scope"]),
                r["would_write"],
            ]
        )
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(str(cell)))
    fmt = " ".join("{:<" + str(w) + "}" for w in widths)
    print(fmt.format(*headers))
    print(fmt.format(*["-" * w for w in widths]))
    for row in rows:
        print(fmt.format(*[str(c) for c in row]))
    n_orphan = sum(1 for r in eligible if r["orphan"])
    print()
    print(f"[INFO] {len(eligible)} eligible, {n_orphan} orphan(s) would be recovered.")
def _existing_recovered_for_uuid(out_dir: Path, uuid: str) -> Path | None:
    """Return a prior recovered log for THIS uuid in ``out_dir``, if one exists.

    The tool's own collision filename embeds the 8-char uuid prefix as a trailing
    ``-recovered-...-<short>.md`` suffix (see ``compute_output_path``). Matching on
    that prefix lets a re-run overwrite its OWN prior draft for the same uuid in
    place -- the one safe overwrite -- instead of minting a second suffixed copy.

    Only files that are clearly recovered drafts (``-recovered-`` in the name AND
    ending in ``-<short>.md``) are considered. A genuine non-recovered human log
    will never match, so its suffix protection is preserved.
    """
    if not out_dir.is_dir():
        return None
    short = uuid[:8]
    suffix = f"-{short}.md"
    for f in out_dir.glob(f"*-recovered-*{suffix}"):
        if f.is_file() and f.name.endswith(suffix):
            return f
    return None


def recover_one(rec: dict) -> str:
    """Build + write the recovery log for one orphan. Returns the written path.

    Idempotent per-uuid: if a prior recovered draft for THIS uuid already exists
    in the target directory (a run that died after writing but before the ledger
    was updated), overwrite that same file in place rather than creating a new
    suffixed copy. Never overwrites a non-recovered human log.
    """
    parsed = rec["parsed"]
    markdown, meta = engine.build_log(parsed)
    out_path = Path(meta["path_would_be"])
    prior = _existing_recovered_for_uuid(out_path.parent, rec["uuid"])
    if prior is not None:
        out_path = prior
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(markdown, encoding="utf-8")
    rec["written"] = str(out_path)
    rec["date"] = meta["date"]
    return str(out_path)
def git(*args: str) -> subprocess.CompletedProcess:
    """Run git in the repo root.

    Never raises: a missing git binary or a timeout comes back as a non-zero
    CompletedProcess, so every caller's returncode check soft-fails as documented.
    """
    try:
        return subprocess.run(
            ["git", *args],
            cwd=str(engine.repo_root()),
            capture_output=True,
            text=True,
            timeout=120,
        )
    except (OSError, subprocess.SubprocessError) as e:
        return subprocess.CompletedProcess(["git", *args], returncode=1, stdout="", stderr=str(e))


def _current_branch() -> str:
    """Return the current git branch name, or empty string if undeterminable."""
    res = git("rev-parse", "--abbrev-ref", "HEAD")
    if res.returncode == 0:
        name = res.stdout.strip()
        if name and name != "HEAD":
            return name
    return ""


def commit_and_push(written_paths: list[str], count: int) -> bool:
    """Stage only the recovered logs, commit only those paths, push. Soft-fail on errors.

    NEVER stages the ledger -- it is machine-local and correctly gitignored;
    appending it to ``git add`` aborts the whole add (exit 1) and stages nothing.
    The commit is limited to the recovered paths (pathspec form), so anything
    the user already had staged stays staged and is not swept into this commit.

    Returns True only when BOTH the commit AND the push succeed. On any failure
    returns False so the caller knows not to mark these uuids ``recovered`` (the
    next run must re-attempt them).
    """
    root = engine.repo_root()
    rel_paths = []
    for p in written_paths:
        try:
            rel_paths.append(str(Path(p).resolve().relative_to(root)))
        except ValueError:
            rel_paths.append(p)
    add = git("add", "--", *rel_paths)
    if add.returncode != 0:
        print(f"[WARNING] git add failed; logs are on disk but uncommitted: {add.stderr.strip()}", file=sys.stderr)
        return False
    msg = (
        f"chore: auto-recover {count} unsaved session log(s)\n\n"
        f"{engine._COMMIT_FOOTER}"
    )
    commit = git("commit", "-m", msg, "--", *rel_paths)
    if commit.returncode != 0:
        # Nothing to commit, or hook failure -- soft-fail.
        print(f"[WARNING] git commit returned non-zero: {commit.stdout.strip()} {commit.stderr.strip()}", file=sys.stderr)
        return False
    print(f"[OK] committed {count} recovered log(s).")
    branch = _current_branch()
    if branch:
        push = git("push", "origin", branch)
    else:
        push = git("push")
    if push.returncode != 0:
        target = f"origin {branch}" if branch else "origin"
        print(
            f"[WARNING] git push to {target} failed (commit is local): {push.stderr.strip()}",
            file=sys.stderr,
        )
        return False
    print(f"[OK] pushed to origin{(' ' + branch) if branch else ''}.")
    return True
def post_alert(recovered: list[dict]) -> None:
    """Post an FYI to #bot-alerts via post-bot-alert.sh. Soft-fail."""
    script = engine.repo_root() / ".claude" / "scripts" / "post-bot-alert.sh"
    if not script.exists():
        print("[WARNING] post-bot-alert.sh not found; alert skipped.", file=sys.stderr)
        return
    bash = shutil.which("bash")
    if not bash:
        print(
            "[WARNING] 'bash' not found on PATH (restricted scheduler env?); "
            "#bot-alerts FYI skipped. Recovered logs are already on disk.",
            file=sys.stderr,
        )
        return
    lines = [
        f"[INFO] Auto-recovered {len(recovered)} unsaved session log(s) -- "
        f"already saved to the repo; FYI, please review and remove the UNVERIFIED banner:"
    ]
    for r in recovered:
        lines.append(
            f"- {r['uuid'][:8]} | {r.get('date', '?')} | {_scope_str(r['scope'])} | {r.get('written', '?')}"
        )
    message = "\n".join(lines)
    try:
        res = subprocess.run(
            [bash, str(script), message, "bot"],
            cwd=str(engine.repo_root()),
            capture_output=True,
            text=True,
            timeout=30,
        )
        out = (res.stdout or "").strip() or (res.stderr or "").strip()
        if out:
            print(out)
    except (OSError, subprocess.SubprocessError) as e:
        print(f"[WARNING] alert post failed: {e}", file=sys.stderr)
def main(argv: list[str] | None = None) -> int:
    # Force UTF-8 stdout (Windows console defaults to cp1252; titles/paths in
    # the dry-run table can contain characters outside that codepage).
    try:
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")
    except (AttributeError, ValueError):
        pass

    parser = argparse.ArgumentParser(
        description="Detect and auto-recover unsaved Claude Code sessions."
    )
    parser.add_argument("--dry-run", action="store_true", help="scan + print report; no writes/commit/alert")
    parser.add_argument("--idle-min", type=int, default=90, help="minutes of mtime-idle before eligible (default 90)")
    parser.add_argument("--max", type=int, default=25, dest="max_recover", help="max orphan logs to build per run, oldest-first (default 25)")
    parser.add_argument("--no-commit", action="store_true", help="skip git commit/push")
    parser.add_argument("--no-alert", action="store_true", help="skip the Discord alert")
    args = parser.parse_args(argv)

    # Respect the ledger in both modes (dry-run still skips already-processed).
    ledger = load_ledger()
    eligible, recoverable = scan(args.idle_min, ledger)

    if args.dry_run:
        print_dry_run_table(eligible)
        return 0

    if not eligible:
        print("[INFO] No eligible transcripts to process.")
        return 0

    written_paths: list[str] = []
    recovered_recs: list[dict] = []
    deferred = 0
    built = 0
    for rec in eligible:
        uuid = rec["uuid"]
        if rec["orphan"]:
            # Cap actual log-builds per run (oldest-first). Remaining orphans are
            # left OUT of the ledger so the next run re-attempts them.
            if built >= args.max_recover:
                deferred += 1
                continue
            try:
                path = recover_one(rec)
            except Exception as e:  # noqa: BLE001 -- never let one bad transcript abort the run
                print(f"[WARNING] failed to recover {uuid[:8]}: {e}", file=sys.stderr)
                # No on-disk artifact -> safe to mark immediately.
                ledger[uuid] = {"verdict": "error", "at": _now_iso(), "path": None, "error": str(e)}
                continue
            built += 1
            written_paths.append(path)
            recovered_recs.append(rec)
            print(f"[OK] recovered {uuid[:8]} -> {path}")
        elif rec["saved"]:
            # No on-disk artifact -> safe to mark immediately.
            ledger[uuid] = {"verdict": "skipped-saved", "at": _now_iso(), "path": None}
        else:
            ledger[uuid] = {"verdict": "skipped-trivial", "at": _now_iso(), "path": None}

    if deferred:
        print(f"[INFO] {deferred} more orphan(s) deferred to next run (--max {args.max_recover}).")

    # Persist the skipped/error verdicts now (they have no artifact, so they are
    # safe regardless of the commit/push outcome below).
    save_ledger(ledger)

    if not recovered_recs:
        print("[INFO] No orphans recovered (all eligible sessions were saved or trivial).")
        return 0

    if not args.no_commit:
        pushed = commit_and_push(written_paths, len(recovered_recs))
        if pushed:
            # H1: only mark uuids 'recovered' AFTER a successful commit+push, so a
            # push failure leaves them out of the ledger for the next run to retry.
            for rec in recovered_recs:
                ledger[rec["uuid"]] = {
                    "verdict": "recovered",
                    "at": _now_iso(),
                    "path": rec.get("written"),
                }
            save_ledger(ledger)
        else:
            print(
                "[WARNING] commit/push did not succeed; recovered uuids left UNLEDGERED "
                "so the next run re-attempts them (logs are on disk).",
                file=sys.stderr,
            )
    else:
        print("[INFO] --no-commit set; recovered logs left unstaged and UNLEDGERED (next run will re-attempt).")

    if not args.no_alert:
        post_alert(recovered_recs)
    else:
        print("[INFO] --no-alert set; Discord alert skipped.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

File diff suppressed because it is too large


@@ -0,0 +1,95 @@
# register-orphan-detector.ps1
# Register the "ClaudeTools - Orphaned Session Detector" scheduled task on this
# Windows machine. The task runs detect_orphaned_sessions.py, which scans the
# per-machine Claude Code transcript directory for unsaved substantive sessions,
# auto-builds banner-marked recovery logs, commits + pushes them, and posts an
# FYI to #bot-alerts.
#
# Mirrors the GrepAI watcher registration pattern in .claude/OLLAMA.md.
#
# Triggers:
# - AtLogOn (catch sessions lost since the last logon)
# - Daily, repeating every 4 hours (catch crashes during a long workday;
# 4h cadence pairs with the detector's 90-minute idle gate so an active
# session is never grabbed mid-flight)
#
# Idempotent: -Force replaces any existing task with the same name.
# This script only REGISTERS the task. It does not run the detector now.
#
# Run from an ordinary (non-admin) PowerShell:
# powershell -ExecutionPolicy Bypass -File D:\claudetools\.claude\scripts\register-orphan-detector.ps1
$ErrorActionPreference = "Stop"
$TaskName = "ClaudeTools - Orphaned Session Detector"
# Resolve the repo root portably. Prefer claudetools_root from identity.json
# (per-machine, gitignored); fall back to two levels up from this script
# (.claude/scripts/ -> repo root), resolved to a full path.
$ScriptDir = $PSScriptRoot
$FallbackRoot = (Resolve-Path (Join-Path $ScriptDir "..\..")).Path
$IdentityPath = Join-Path $FallbackRoot ".claude\identity.json"
$RepoRoot = $FallbackRoot
if (Test-Path $IdentityPath) {
    try {
        $identity = Get-Content -Raw -Path $IdentityPath | ConvertFrom-Json
        if ($identity.claudetools_root -and (Test-Path $identity.claudetools_root)) {
            $RepoRoot = (Resolve-Path $identity.claudetools_root).Path
        }
    } catch {
        Write-Host "[WARNING] Could not parse $IdentityPath; using $FallbackRoot" -ForegroundColor Yellow
    }
}

$Script = Join-Path $RepoRoot ".claude\scripts\detect_orphaned_sessions.py"
if (-not (Test-Path $Script)) {
    Write-Host "[ERROR] Detector not found at $Script" -ForegroundColor Red
    exit 1
}

# Resolve the py launcher's full path (the action's Execute wants an absolute
# path; "py" alone usually resolves but we pin it for reliability under the
# Task Scheduler's environment).
$PyCmd = Get-Command py -ErrorAction SilentlyContinue
if ($null -ne $PyCmd) {
    $PyPath = $PyCmd.Source
} else {
    $PyPath = "py"  # fall back to PATH resolution at run time
}
$Action = New-ScheduledTaskAction `
    -Execute $PyPath `
    -Argument "`"$Script`"" `
    -WorkingDirectory $RepoRoot

# Trigger 1: at logon for the current user.
$TriggerLogon = New-ScheduledTaskTrigger -AtLogOn -User $env:USERNAME

# Trigger 2: daily at a fixed start, repeating every 4 hours all day.
$TriggerDaily = New-ScheduledTaskTrigger -Daily -At 9am
$TriggerDaily.Repetition = (New-ScheduledTaskTrigger `
    -Once -At 9am `
    -RepetitionInterval (New-TimeSpan -Hours 4) `
    -RepetitionDuration (New-TimeSpan -Hours 24)).Repetition

$Settings = New-ScheduledTaskSettingsSet `
    -ExecutionTimeLimit (New-TimeSpan -Minutes 30) `
    -MultipleInstances IgnoreNew `
    -StartWhenAvailable `
    -DontStopOnIdleEnd

Register-ScheduledTask `
    -TaskName $TaskName `
    -Action $Action `
    -Trigger $TriggerLogon, $TriggerDaily `
    -Settings $Settings `
    -Description "Scans Claude Code transcripts for unsaved substantive sessions and auto-recovers them into session logs." `
    -Force | Out-Null
Write-Host "[OK] Registered scheduled task '$TaskName'."
Write-Host "[INFO] Action: $PyPath `"$Script`""
Write-Host "[INFO] WorkDir: $RepoRoot"
Write-Host "[INFO] Triggers: AtLogOn ($env:USERNAME) + daily every 4h"
Write-Host "[INFO] To inspect: Get-ScheduledTask -TaskName '$TaskName' | Format-List"
Write-Host "[INFO] To run now: Start-ScheduledTask -TaskName '$TaskName'"
Write-Host "[INFO] To remove: Unregister-ScheduledTask -TaskName '$TaskName' -Confirm:`$false"

3
.gitignore vendored

@@ -18,6 +18,9 @@ tmp-remediation/
.claude/coord-broadcasts-seen
.claude/scheduled_tasks.lock
# Session-recovery ledger (machine-local — transcripts are per-machine)
.claude/state/
# /autotask command — kept local/undistributed (Syncro is the default PSA; Autotask is opt-in).
# Remove this line to distribute /autotask to the fleet. See .claude/memory/feedback_psa_default_syncro.md
.claude/commands/autotask.md