feat: session recovery toolset (orphan detector + /recover)
Reconstructs session logs from Claude Code transcripts when a session crashes or is closed before /save.

Two entry points:
- /recover <uuid|latest> : manual, Claude-reviewed reconstruction
- detect_orphaned_sessions.py : scheduled scan that auto-builds logs for substantive, unsaved, not-yet-recovered transcripts (banner-marked RECOVERED-UNVERIFIED), commits them, and posts a #bot-alerts FYI.

recover_session.py is the shared engine: Python extracts the verbatim command/config/reference timeline; Ollama drafts prose-only narrative. Machine-local ledger (.claude/state/) prevents reprocessing.

Reviewed: git add scoped to own files, ledger written only after successful push, per-uuid idempotency, --max cap for unattended runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -274,6 +274,7 @@ Vault structure: `infrastructure/`, `clients/`, `services/`, `projects/`, `msp-t
| `/shape-spec` | Pre-implementation spec for a GuruRMM feature: produces plan.md, shape.md, references.md, standards.md |
| `/rmm-audit` | Full end-to-end audit of GuruRMM: API coverage, UI gaps, Rust/TS quality, security, data integrity. Produces timestamped report + updates UI_GAPS.md |
| `/forum-post` | Post a technical article to community.azcomputerguru.com: drafts from context, shows preview, inserts via paramiko SSH to Flarum DB |
| `/recover` | Reconstruct a session log from a Claude Code transcript after a crash/close-before-save. `/recover <uuid>`, `/recover latest`, or `/recover --list`. See `.claude/RECOVERY.md` |

---

76
.claude/RECOVERY.md
Normal file
@@ -0,0 +1,76 @@
# Session Recovery

Never lose work again when a Claude Code session crashes or is closed before `/save`.

Claude Code writes every session live to a transcript JSONL. This toolset distills those transcripts back into normal session logs in the `.claude/commands/save.md` format.

---

## The three pieces

| Piece | File | Role |
|---|---|---|
| Engine | `.claude/scripts/recover_session.py` | Parses one transcript, classifies it, and reconstructs a full session log. CLI: `--uuid` / `--latest` / `--path` with `--print` (default), `--auto`, or `--json`. |
| Detector | `.claude/scripts/detect_orphaned_sessions.py` | Scans all idle transcripts, auto-recovers the orphans (substantive + unsaved), updates the ledger, commits + pushes, and posts an FYI to `#bot-alerts`. CLI: `--dry-run`, `--idle-min N`, `--max N`, `--no-commit`, `--no-alert`. |
| Command | `.claude/commands/recover.md` | `/recover <uuid>` / `/recover latest` / `/recover --list`: the **manual, reviewed** path where Claude edits the draft before writing. |

The scheduled-task registration script `.claude/scripts/register-orphan-detector.ps1` wires the detector into the Windows Task Scheduler (Windows only).

---

## Where things live

- **Transcripts:** `~/.claude/projects/<slug>/<uuid>.jsonl`, where `<slug>` is the claudetools repo root with `/`, `\`, and `:` each replaced by `-`. On a `D:\claudetools` machine the slug is `D--claudetools`, so `C:\Users\<you>\.claude\projects\D--claudetools\*.jsonl`. The slug is computed portably from `claudetools_root` in `.claude/identity.json`. Sibling `<uuid>/` dirs hold subagent transcripts and are ignored for the main narrative.
- **Ledger:** `.claude/state/recovered-sessions.json` (machine-local, gitignored). Records every processed uuid with its verdict (`recovered` / `skipped-saved` / `skipped-trivial` / `error`) so it is never re-scanned. Transcripts are per-machine, so the ledger is too.
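
The slug rule above can be sketched in a few lines. This is only an illustration of the stated replacement rule, not the engine's actual code (the `recover_session.py` diff is not shown here); the function name `transcript_slug` is hypothetical.

```python
def transcript_slug(repo_root: str) -> str:
    """Replace each '/', backslash, and ':' with '-' to form the <slug> directory name."""
    return "".join("-" if ch in "/\\:" else ch for ch in repo_root)


# The example from the text: a D:\claudetools checkout.
print(transcript_slug(r"D:\claudetools"))  # D--claudetools
```

A POSIX-style root follows the same rule, so a path that starts with `/` yields a slug that starts with `-`.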

---

## How to run

```bash
# See candidate orphans without writing anything:
py .claude/scripts/detect_orphaned_sessions.py --dry-run

# Inspect one transcript's verdict as JSON (writes nothing):
py .claude/scripts/recover_session.py --json --uuid <uuid>

# Print a reconstructed log to stdout (writes nothing):
py .claude/scripts/recover_session.py --uuid <uuid> --print

# Full unattended run (writes logs, updates ledger, commits, pushes, alerts):
py .claude/scripts/detect_orphaned_sessions.py
```

### Register the scheduled task (Windows)

```powershell
powershell -ExecutionPolicy Bypass -File D:\claudetools\.claude\scripts\register-orphan-detector.ps1
```

Registers `ClaudeTools - Orphaned Session Detector`: runs at logon and every 4 hours. The detector's idle gate (`--idle-min`, default 90) skips any transcript modified within the last 90 minutes, so an active session is never grabbed mid-flight; a later 4-hour run picks it up once it has gone idle.
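
The idle gate amounts to a single comparison on the transcript's mtime. A minimal sketch, mirroring the check in the detector's `scan()`; the helper name `is_idle` is an assumption made for illustration.

```python
import time


def is_idle(mtime: float, now: float, idle_min: int = 90) -> bool:
    """True when the transcript has not been modified for at least idle_min minutes."""
    return (now - mtime) >= idle_min * 60


now = time.time()
print(is_idle(now - 30 * 60, now))   # False: touched 30 minutes ago, still active
print(is_idle(now - 120 * 60, now))  # True: idle for 2 hours, eligible
```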

---

## Accuracy split: Ollama prose vs Python verbatim

This is the core design principle.

- **Ollama drafts prose only:** Session Summary, Key Decisions, Problems Encountered, Pending / Incomplete Tasks. It never sees and never emits commands, IPs, credentials, file paths, commit SHAs, or ticket IDs. If Ollama is unreachable the log is still produced with a placeholder note in the prose sections.
- **Python extracts the verbatim evidence:** Configuration Changes (Write/Edit/NotebookEdit targets), Commands & Outputs (mutating Bash/PowerShell with truncated results), Reference Information (regex-extracted SHAs, URLs, IPs, ticket numbers, coord message ids), and Infrastructure & Servers. This is the high-value, accuracy-critical part and it comes straight from the transcript.

Trust the verbatim sections for facts; treat the prose as a draft.
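
To illustrate what regex-based verbatim extraction can look like, here is a small sketch. The patterns and the name `extract_references` are assumptions; the engine's real patterns live in `recover_session.py`, whose diff is suppressed, and may be stricter. The SHA pattern below, for instance, would also match any 7+ character run of hex digits.

```python
import re

# Hypothetical patterns for illustration only.
SHA_RE = re.compile(r"\b[0-9a-f]{7,40}\b")
URL_RE = re.compile(r"https?://[^\s\"'<>)]+")
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_references(text: str) -> dict[str, list[str]]:
    """Return de-duplicated matches per category, in first-seen order."""
    def uniq(pattern: re.Pattern) -> list[str]:
        return list(dict.fromkeys(pattern.findall(text)))

    return {"shas": uniq(SHA_RE), "urls": uniq(URL_RE), "ips": uniq(IP_RE)}


sample = "Pushed 3f2a9c1 to https://example.com/repo.git from 10.0.0.5"
print(extract_references(sample))
```

Because the values are copied out of the transcript text rather than generated, they cannot be paraphrased or invented along the way, which is the point of keeping this step out of the language model.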

---

## Classification

- **substantive**: the session did real work, i.e. a Write/Edit/NotebookEdit, a mutating Bash/PowerShell command (git commit/push/add, ssh, schtasks, New-Item, Set-Content, Remove-Item, Out-File, a POST/PUT/DELETE/PATCH curl, an `/api/` call, `vault.sh`, a mutating Invoke-RestMethod), or a mutating Skill (syncro, rmm, remediation-tool, mailbox, forum-post, syncro-emergency-billing).
- **saved**: the session was already saved, i.e. a save/scc/checkpoint Skill, or a Write/Edit into a `session-logs/` path.
- **orphan** = substantive AND not saved. Only orphans are auto-recovered.
- **scope**: client / project / general, decided by Python from the transcript text, `cwd`, and `gitBranch` against the known client and project slugs. Conservative: ambiguous resolves to `general`.
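
The orphan rule and the ledger verdicts that follow from it can be sketched as below. `Verdict` and `ledger_verdict` are hypothetical names; the detector itself works on a dict with `substantive` and `saved` keys, and it records `recovered` only after a successful commit and push.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical container for the two flags the classifier reports."""
    substantive: bool
    saved: bool

    @property
    def orphan(self) -> bool:
        # orphan = substantive AND not saved
        return self.substantive and not self.saved


def ledger_verdict(v: Verdict) -> str:
    """Map the flags to the ledger verdict recorded on a clean run."""
    if v.orphan:
        return "recovered"
    if v.saved:
        return "skipped-saved"
    return "skipped-trivial"


for flags in [(True, False), (True, True), (False, False)]:
    v = Verdict(*flags)
    print(flags, v.orphan, ledger_verdict(v))
```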

---

## Banner discipline

Auto-recovered logs are written with a `[RECOVERED -- UNVERIFIED]` banner. **The banner stays until a human reviews the log** and removes it. The manual `/recover` path lets Claude review and correct the draft before writing, and drops the banner once verified.
|
||||
84
.claude/commands/recover.md
Normal file
@@ -0,0 +1,84 @@
Reconstruct a session log from a Claude Code transcript when a session crashed or was closed before `/save`.

Claude Code writes every session live to a transcript JSONL under `~/.claude/projects/<slug>/<uuid>.jsonl`. `/recover` distills one of those transcripts back into a normal session log in the `.claude/commands/save.md` format. This is the **manual, reviewed** path; the background detector (`detect_orphaned_sessions.py`) handles unattended auto-recovery.

---

## Usage

| Invocation | Action |
|---|---|
| `/recover <uuid>` | Reconstruct the session with that transcript uuid |
| `/recover latest` | Reconstruct the newest transcript by mtime |
| `/recover --list` | Show candidate orphans (runs the detector `--dry-run`) |

---

## Flow: `/recover --list`

Run the detector in scan-only mode and present the table to the user:

```bash
py .claude/scripts/detect_orphaned_sessions.py --dry-run
```

The table shows every past-idle, not-yet-processed transcript with its uuid, mtime, `substantive`/`saved`/`orphan` verdicts, classified scope, and the path a recovery would write to. Point the user at the rows where `orphan` is `YES`; those are unsaved substantive sessions. Nothing is written.

---

## Flow: `/recover <uuid>` or `/recover latest`

This is a **reviewed** recovery. Claude is the editor, not a passive writer.

1. **Generate the draft** (prints to stdout, writes nothing):

   ```bash
   py .claude/scripts/recover_session.py --uuid <uuid> --print
   ```

   (or `--latest`). The draft contains:
   - Ollama-drafted prose: Session Summary, Key Decisions, Problems Encountered, Pending / Incomplete Tasks.
   - Python-extracted verbatim evidence: Configuration Changes, Commands & Outputs, Reference Information, Infrastructure & Servers, Credentials & Secrets.
   - A `[RECOVERED -- UNVERIFIED]` banner and the canonical User block (from `whoami-block.sh`).

2. **Review the draft.** This is the point of the manual path:
   - Verify the **Commands / Config / Reference** appendix matches what actually happened and what the user intended. These are machine-extracted verbatim; confirm they are complete and not misleading.
   - Correct the **scope and slug**: the classifier is conservative and may land on `general` (or the wrong project/client) when work spanned several areas. Fix the target `session-logs/` directory accordingly.
   - Tighten the **topic** in the filename and the title.
   - Correct or rewrite the **Ollama prose** where it is imprecise. If Ollama was unreachable, write the prose sections yourself from the verbatim evidence.

3. **Write the final log.** Once verified, write the corrected markdown to the correct `session-logs/` path (client -> `clients/<slug>/session-logs/`, project -> `projects/<project>/session-logs/`, general -> root `session-logs/`), using the transcript's first-timestamp date: `YYYY-MM-DD-recovered-<topic>.md`. **Drop the UNVERIFIED banner**: by writing it yourself you have verified it.

4. **Sync:**

   ```bash
   bash .claude/scripts/sync.sh
   ```

5. **Unseeded wiki check.** If the scope is a client or project with no `wiki/<type>/<slug>.md` article yet, suggest:

   ```
   [INFO] No wiki article for '<slug>' yet. Run /wiki-compile <type>:<slug> to seed it.
   ```
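
The scope-to-directory mapping and filename rule from step 3 can be sketched as follows. This is a hedged illustration, not the engine's `compute_output_path` (its source is not shown); the function name `target_log_path` and the example slug and topic are hypothetical.

```python
from pathlib import Path


def target_log_path(root: Path, scope: dict, date: str, topic: str) -> Path:
    """Pick the session-logs directory for a scope and build the recovered filename."""
    kind = scope.get("type", "general")
    slug = scope.get("slug", "")
    if kind == "client" and slug:
        base = root / "clients" / slug / "session-logs"
    elif kind == "project" and slug:
        base = root / "projects" / slug / "session-logs"
    else:
        # general, or anything ambiguous, lands in the root session-logs/
        base = root / "session-logs"
    return base / f"{date}-recovered-{topic}.md"


print(target_log_path(Path("repo"), {"type": "client", "slug": "acme"}, "2025-01-15", "vpn-fix").as_posix())
```

Falling back to the root `session-logs/` for anything unrecognized matches the conservative classification described above.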

---

## Difference from the automatic detector

| | `/recover` (this command) | `detect_orphaned_sessions.py` (background) |
|---|---|---|
| Trigger | Manual, on demand | Scheduled task (every 4 hours + at logon) |
| Review | Claude reviews and corrects before writing | None; auto-writes unreviewed |
| Banner | Removed once verified | Kept (`[RECOVERED -- UNVERIFIED]`) until a human reviews |
| Scope/topic | Corrected by Claude | Whatever the classifier decided |
| Output | Final, clean session log | Banner-marked draft committed for later review |

Use `/recover` when you know a specific session was lost and want a clean log. Let the detector catch the ones you forget.

---

## Notes

- `--auto` and `--json` modes on `recover_session.py` exist for the detector and for scripting; `/recover` uses `--print` so Claude always reviews before anything lands on disk.
- The prose is Ollama-drafted from the transcript; the Commands/Config/Reference sections are extracted verbatim by Python. Never trust the prose for exact commands, IPs, credentials, paths, SHAs, or ticket IDs; read those from the verbatim sections.
- Transcripts are per-machine. You can only recover sessions that ran on the machine you are on.
|
||||
@@ -52,6 +52,7 @@
- [Add Mike as owner on all Entra apps](feedback_entra_app_owner.md) - Apps created via management SP have no user owner; must add Mike manually or publisher verification fails.
- [No TOML/config file approach for endpoints](feedback_no_toml_config_endpoints.md) - User explicitly prohibits TOML or config-file-based endpoint configuration; this will never be approved.
- [Python on Windows: use py launcher](feedback_python_windows.md) - Windows Store python/python3 aliases disabled; always use py or jq on DESKTOP-0O8A1RL.
- [Unsaved sessions are recoverable from transcripts](feedback_session_recovery.md) - Crashed/closed-before-save sessions live in `~/.claude/projects/<slug>/*.jsonl`; the detector auto-recovers orphans, `/recover <uuid>` does it manually. Ollama prose + Python verbatim. See `.claude/RECOVERY.md`.

### Syncro
- [Syncro API plumbing](feedback_syncro_api.md) - Content-Type required on all POST/PUT; NO idempotency anywhere, so always GET before retrying; response wrappers (`.ticket.id`, `.comment.id`); add_line_item shape (internal ID, flat response, required fields); HTML uses `<br>` not `<ul>/<li>`; timer_entry response is FLAT but SUPERSEDED (use add_line_item).

19
.claude/memory/feedback_session_recovery.md
Normal file
@@ -0,0 +1,19 @@
---
name: Unsaved sessions are recoverable from transcripts
description: Claude Code transcripts let you rebuild a session log after a crash/close-before-save; a detector auto-recovers orphans and /recover does it manually
type: feedback
---

Claude Code writes every session live to a transcript JSONL at `~/.claude/projects/<slug>/<uuid>.jsonl` (slug = the claudetools repo root with `/`, `\`, and `:` each replaced by `-`; computed from `claudetools_root` in identity.json). A session closed or crashed before `/save` is NOT lost: the work is fully recorded in that transcript and can be distilled back into a normal session log.

Toolset (`.claude/RECOVERY.md`):
- `.claude/scripts/recover_session.py`: engine. `--uuid`/`--latest`/`--path` with `--print`/`--auto`/`--json`.
- `.claude/scripts/detect_orphaned_sessions.py`: scans idle transcripts, auto-recovers orphans (substantive AND not saved), commits + pushes, FYIs `#bot-alerts`. `--dry-run` to scan only. Ledger at `.claude/state/recovered-sessions.json` (machine-local).
- `/recover <uuid>`: manual reviewed path; Claude corrects the draft before writing.
- `.claude/scripts/register-orphan-detector.ps1`: registers the scheduled task (Windows).

Accuracy split: Ollama drafts ONLY prose (summary/decisions/problems/pending); Python extracts commands, file paths, IPs, SHAs, tickets verbatim. Auto-recovered logs carry a `[RECOVERED -- UNVERIFIED]` banner until a human reviews them.

**Why:** Mike wanted to never lose work to a crashed/unclosed session again. Previously, manual `/save` was the only thing that wrote logs; the transcript is a complete fallback record.

**How to apply:** If a user says a session crashed or work was lost, run `py .claude/scripts/detect_orphaned_sessions.py --dry-run` to find candidate orphans, then `/recover <uuid>` to reconstruct and review a clean log. Don't assume work is gone; check the transcripts first.
|
||||
431
.claude/scripts/detect_orphaned_sessions.py
Normal file
@@ -0,0 +1,431 @@
#!/usr/bin/env python3
"""detect_orphaned_sessions.py -- find and auto-recover unsaved Claude Code sessions.

A session is "orphaned" when its transcript records substantive (mutating) work
but the session was never saved (no /save, /scc, or /checkpoint, and no write into
a session-logs/ path). This script scans the per-machine transcript directory,
classifies each idle transcript via the recover_session engine, auto-builds a
banner-marked recovery log for each orphan, records every processed uuid in a
machine-local ledger so it is never re-scanned, commits + pushes the recovered
logs, and posts an FYI to #bot-alerts.

Modes:
  (default)     full run: build logs, update ledger, commit, push, alert
  --dry-run     scan + print a report table; write/commit/alert nothing
  --idle-min N  minutes of mtime-idle before a transcript is eligible (default 90)
  --max N       max orphan logs to build per run, oldest-first (default 25)
  --no-commit   build logs, but skip git commit/push (recovered uuids stay unledgered)
  --no-alert    build + ledger + commit, but skip the Discord alert

The detector NEVER touches sync.sh; it does its own git add/commit/push so it has
no surprising side effects. Soft-fails on git/alert errors (work is already saved
to disk -- those are best-effort).

stdlib only; targets Python 3.11+.
"""

from __future__ import annotations

import argparse
import json
import shutil
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

# Import the shared engine (same directory).
sys.path.insert(0, str(Path(__file__).resolve().parent))
import recover_session as engine  # noqa: E402


LEDGER_REL = Path(".claude") / "state" / "recovered-sessions.json"


def _now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def ledger_path() -> Path:
    return engine.repo_root() / LEDGER_REL


def load_ledger() -> dict:
    p = ledger_path()
    if p.exists():
        try:
            return json.loads(p.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return {}
    return {}


def save_ledger(ledger: dict) -> None:
    p = ledger_path()
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(ledger, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")


def _scope_str(scope: dict) -> str:
    t = scope.get("type", "general")
    if t == "general":
        return "general"
    return f"{t}:{scope.get('slug', '?')}"


def scan(idle_min: int, ledger: dict) -> tuple[list[dict], list[dict]]:
    """Scan transcripts.

    Returns (eligible, recoverable):
      eligible    -- every transcript that is past idle and not already in ledger
                     (each a dict with parsed metadata + verdict fields)
      recoverable -- the subset that are orphans (substantive and not saved)
    """
    base = engine.transcript_base_dir()
    now = datetime.now().timestamp()
    idle_secs = idle_min * 60

    eligible: list[dict] = []
    recoverable: list[dict] = []

    if not base.is_dir():
        return eligible, recoverable

    for jf in sorted(base.glob("*.jsonl")):
        uuid = jf.stem
        try:
            mtime = jf.stat().st_mtime
        except OSError:
            continue
        # Skip recently-active sessions.
        if (now - mtime) < idle_secs:
            continue
        # Skip anything already processed.
        if uuid in ledger:
            continue

        parsed = engine.parse_transcript(jf)
        verdict = engine.classify(parsed)
        orphan = bool(verdict["substantive"] and not verdict["saved"])
        rec = {
            "uuid": uuid,
            "path": jf,
            "mtime": mtime,
            "substantive": verdict["substantive"],
            "saved": verdict["saved"],
            "orphan": orphan,
            "scope": verdict["scope"],
            "title": verdict["title"],
            "parsed": parsed,
        }
        # would-write path (metadata-cheap; no Ollama)
        rec["would_write"] = str(
            engine.compute_output_path(parsed, verdict["scope"], verdict["title"])
        )
        eligible.append(rec)
        if orphan:
            recoverable.append(rec)

    # Process OLDEST-FIRST so a capped run drains the longest-waiting orphans
    # first. Prefer the transcript's first_ts when available; fall back to mtime.
    def _age_key(r: dict):
        ts = (r.get("parsed").first_ts if r.get("parsed") else "") or ""
        if ts:
            try:
                return datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp()
            except ValueError:
                pass
        return r.get("mtime", 0.0)

    eligible.sort(key=_age_key)
    recoverable.sort(key=_age_key)

    return eligible, recoverable


def print_dry_run_table(eligible: list[dict]) -> None:
    if not eligible:
        print("[INFO] No eligible (past-idle, unprocessed) transcripts found.")
        return
    headers = ["uuid", "mtime", "subst", "saved", "orphan", "scope", "would-write-path"]
    rows = []
    for r in eligible:
        mt = datetime.fromtimestamp(r["mtime"]).strftime("%Y-%m-%d %H:%M")
        rows.append(
            [
                r["uuid"][:8],
                mt,
                "yes" if r["substantive"] else "no",
                "yes" if r["saved"] else "no",
                "YES" if r["orphan"] else "no",
                _scope_str(r["scope"]),
                r["would_write"],
            ]
        )
    # Size each column to its widest cell, then print left-aligned rows.
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(str(cell)))
    fmt = " ".join("{:<" + str(w) + "}" for w in widths)
    print(fmt.format(*headers))
    print(fmt.format(*["-" * w for w in widths]))
    for row in rows:
        print(fmt.format(*[str(c) for c in row]))
    n_orphan = sum(1 for r in eligible if r["orphan"])
    print()
    print(f"[INFO] {len(eligible)} eligible, {n_orphan} orphan(s) would be recovered.")


def _existing_recovered_for_uuid(out_dir: Path, uuid: str) -> Path | None:
    """Return a prior recovered log for THIS uuid in ``out_dir``, if one exists.

    The tool's own collision filename embeds the 8-char uuid prefix as a trailing
    ``-recovered-...-<short>.md`` suffix (see ``compute_output_path``). Matching on
    that prefix lets a re-run overwrite its OWN prior draft for the same uuid in
    place -- the one safe overwrite -- instead of minting a second suffixed copy.

    Only files that are clearly recovered drafts (``-recovered-`` in the name AND
    ending in ``-<short>.md``) are considered. A genuine non-recovered human log
    will never match, so its suffix protection is preserved.
    """
    if not out_dir.is_dir():
        return None
    short = uuid[:8]
    suffix = f"-{short}.md"
    for f in out_dir.glob(f"*-recovered-*{suffix}"):
        if f.is_file() and f.name.endswith(suffix):
            return f
    return None


def recover_one(rec: dict) -> str:
    """Build + write the recovery log for one orphan. Returns the written path.

    Idempotent per-uuid: if a prior recovered draft for THIS uuid already exists
    in the target directory (a run that died after writing but before the ledger
    was updated), overwrite that same file in place rather than creating a new
    suffixed copy. Never overwrites a non-recovered human log.
    """
    parsed = rec["parsed"]
    markdown, meta = engine.build_log(parsed)
    out_path = Path(meta["path_would_be"])
    prior = _existing_recovered_for_uuid(out_path.parent, rec["uuid"])
    if prior is not None:
        out_path = prior
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(markdown, encoding="utf-8")
    rec["written"] = str(out_path)
    rec["date"] = meta["date"]
    return str(out_path)


def git(*args: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        ["git", *args],
        cwd=str(engine.repo_root()),
        capture_output=True,
        text=True,
        timeout=120,
    )


def _current_branch() -> str:
    """Return the current git branch name, or empty string if undeterminable."""
    res = git("rev-parse", "--abbrev-ref", "HEAD")
    if res.returncode == 0:
        name = res.stdout.strip()
        if name and name != "HEAD":
            return name
    return ""


def commit_and_push(written_paths: list[str], count: int) -> bool:
    """Stage only the recovered logs, commit, push. Soft-fail on errors.

    NEVER stages the ledger -- it is machine-local and correctly gitignored;
    appending it to ``git add`` aborts the whole add (exit 1) and stages nothing.

    Returns True only when BOTH the commit AND the push succeed. On any failure
    returns False so the caller knows not to mark these uuids ``recovered`` (the
    next run must re-attempt them).
    """
    root = engine.repo_root()
    rel_paths = []
    for p in written_paths:
        try:
            rel_paths.append(str(Path(p).resolve().relative_to(root)))
        except ValueError:
            rel_paths.append(p)

    add = git("add", "--", *rel_paths)
    if add.returncode != 0:
        print(f"[WARNING] git add failed; logs are on disk but uncommitted: {add.stderr.strip()}", file=sys.stderr)
        return False

    msg = (
        f"chore: auto-recover {count} unsaved session log(s)\n\n"
        f"{engine._COMMIT_FOOTER}"
    )
    commit = git("commit", "-m", msg)
    if commit.returncode != 0:
        # Nothing to commit, or hook failure -- soft-fail.
        print(f"[WARNING] git commit returned non-zero: {commit.stdout.strip()} {commit.stderr.strip()}", file=sys.stderr)
        return False
    print(f"[OK] committed {count} recovered log(s).")

    branch = _current_branch()
    if branch:
        push = git("push", "origin", branch)
    else:
        push = git("push")
    if push.returncode != 0:
        target = f"origin {branch}" if branch else "origin"
        print(
            f"[WARNING] git push to {target} failed (commit is local): {push.stderr.strip()}",
            file=sys.stderr,
        )
        return False
    print(f"[OK] pushed to origin{(' ' + branch) if branch else ''}.")
    return True


def post_alert(recovered: list[dict]) -> None:
    """Post an FYI to #bot-alerts via post-bot-alert.sh. Soft-fail."""
    script = engine.repo_root() / ".claude" / "scripts" / "post-bot-alert.sh"
    if not script.exists():
        print("[WARNING] post-bot-alert.sh not found; alert skipped.", file=sys.stderr)
        return
    bash = shutil.which("bash")
    if not bash:
        # main() calls this even when the commit/push was skipped or failed, so
        # only claim what is always true here: the logs are on disk.
        print(
            "[WARNING] 'bash' not found on PATH (restricted scheduler env?); "
            "#bot-alerts FYI skipped. Recovered logs are already on disk.",
            file=sys.stderr,
        )
        return
    lines = [
        f"[INFO] Auto-recovered {len(recovered)} unsaved session log(s) -- "
        f"already saved to the repo; FYI, please review and remove the UNVERIFIED banner:"
    ]
    for r in recovered:
        lines.append(
            f"- {r['uuid'][:8]} | {r.get('date', '?')} | {_scope_str(r['scope'])} | {r.get('written', '?')}"
        )
    message = "\n".join(lines)
    try:
        res = subprocess.run(
            [bash, str(script), message, "bot"],
            cwd=str(engine.repo_root()),
            capture_output=True,
            text=True,
            timeout=30,
        )
        out = (res.stdout or "").strip() or (res.stderr or "").strip()
        if out:
            print(out)
    except (OSError, subprocess.SubprocessError) as e:
        print(f"[WARNING] alert post failed: {e}", file=sys.stderr)


def main(argv: list[str] | None = None) -> int:
    # Force UTF-8 stdout (Windows console defaults to cp1252; titles/paths in
    # the dry-run table can contain characters outside that codepage).
    try:
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")
    except (AttributeError, ValueError):
        pass

    parser = argparse.ArgumentParser(
        description="Detect and auto-recover unsaved Claude Code sessions."
    )
    parser.add_argument("--dry-run", action="store_true", help="scan + print report; no writes/commit/alert")
    parser.add_argument("--idle-min", type=int, default=90, help="minutes of mtime-idle before eligible (default 90)")
    parser.add_argument("--max", type=int, default=25, dest="max_recover", help="max orphan logs to build per run, oldest-first (default 25)")
    parser.add_argument("--no-commit", action="store_true", help="skip git commit/push")
    parser.add_argument("--no-alert", action="store_true", help="skip the Discord alert")
    args = parser.parse_args(argv)

    # Respect the ledger in both modes (dry-run still skips already-processed).
    ledger = load_ledger()

    eligible, recoverable = scan(args.idle_min, ledger)

    if args.dry_run:
        print_dry_run_table(eligible)
        return 0

    if not eligible:
        print("[INFO] No eligible transcripts to process.")
        return 0

    written_paths: list[str] = []
    recovered_recs: list[dict] = []
    deferred = 0
    built = 0

    for rec in eligible:
        uuid = rec["uuid"]
        if rec["orphan"]:
            # Cap actual log-builds per run (oldest-first). Remaining orphans are
            # left OUT of the ledger so the next run re-attempts them.
            if built >= args.max_recover:
                deferred += 1
                continue
            try:
                path = recover_one(rec)
            except Exception as e:  # noqa: BLE001 -- never let one bad transcript abort the run
                print(f"[WARNING] failed to recover {uuid[:8]}: {e}", file=sys.stderr)
                # No on-disk artifact -> safe to mark immediately.
                ledger[uuid] = {"verdict": "error", "at": _now_iso(), "path": None, "error": str(e)}
                continue
            built += 1
            written_paths.append(path)
            recovered_recs.append(rec)
            print(f"[OK] recovered {uuid[:8]} -> {path}")
        elif rec["saved"]:
            # No on-disk artifact -> safe to mark immediately.
            ledger[uuid] = {"verdict": "skipped-saved", "at": _now_iso(), "path": None}
        else:
            ledger[uuid] = {"verdict": "skipped-trivial", "at": _now_iso(), "path": None}

    if deferred:
        print(f"[INFO] {deferred} more orphan(s) deferred to next run (--max {args.max_recover}).")

    # Persist the skipped/error verdicts now (they have no artifact, so they are
    # safe regardless of the commit/push outcome below).
    save_ledger(ledger)

    if not recovered_recs:
        print("[INFO] No orphans recovered (all eligible sessions were saved or trivial).")
        return 0

    if not args.no_commit:
        pushed = commit_and_push(written_paths, len(recovered_recs))
        if pushed:
            # H1: only mark uuids 'recovered' AFTER a successful commit+push, so a
            # push failure leaves them out of the ledger for the next run to retry.
            for rec in recovered_recs:
                ledger[rec["uuid"]] = {
                    "verdict": "recovered",
                    "at": _now_iso(),
                    "path": rec.get("written"),
                }
            save_ledger(ledger)
        else:
            print(
                "[WARNING] commit/push did not succeed; recovered uuids left UNLEDGERED "
                "so the next run re-attempts them (logs are on disk).",
                file=sys.stderr,
            )
    else:
        print("[INFO] --no-commit set; recovered logs left unstaged and UNLEDGERED (next run will re-attempt).")

    if not args.no_alert:
        post_alert(recovered_recs)
    else:
        print("[INFO] --no-alert set; Discord alert skipped.")

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
|
||||
1187
.claude/scripts/recover_session.py
Normal file
File diff suppressed because it is too large
95
.claude/scripts/register-orphan-detector.ps1
Normal file
@@ -0,0 +1,95 @@
# register-orphan-detector.ps1
# Register the "ClaudeTools - Orphaned Session Detector" scheduled task on this
# Windows machine. The task runs detect_orphaned_sessions.py, which scans the
# per-machine Claude Code transcript directory for unsaved substantive sessions,
# auto-builds banner-marked recovery logs, commits + pushes them, and posts an
# FYI to #bot-alerts.
#
# Mirrors the GrepAI watcher registration pattern in .claude/OLLAMA.md.
#
# Triggers:
#   - AtLogOn (catch sessions lost since the last logon)
#   - Daily, repeating every 4 hours (catch crashes during a long workday;
#     4h cadence pairs with the detector's 90-minute idle gate so an active
#     session is never grabbed mid-flight)
#
# Idempotent: -Force replaces any existing task with the same name.
# This script only REGISTERS the task. It does not run the detector now.
#
# Run from an ordinary (non-admin) PowerShell:
#   powershell -ExecutionPolicy Bypass -File D:\claudetools\.claude\scripts\register-orphan-detector.ps1

$ErrorActionPreference = "Stop"

$TaskName = "ClaudeTools - Orphaned Session Detector"

# Resolve the repo root portably. Prefer claudetools_root from identity.json
# (per-machine, gitignored); fall back to two levels up from this script
# (.claude/scripts/ -> repo root), resolved to a full path.
$ScriptDir = $PSScriptRoot
$FallbackRoot = (Resolve-Path (Join-Path $ScriptDir "..\..")).Path
$IdentityPath = Join-Path $FallbackRoot ".claude\identity.json"
$RepoRoot = $FallbackRoot
if (Test-Path $IdentityPath) {
    try {
        $identity = Get-Content -Raw -Path $IdentityPath | ConvertFrom-Json
        if ($identity.claudetools_root -and (Test-Path $identity.claudetools_root)) {
            $RepoRoot = (Resolve-Path $identity.claudetools_root).Path
        }
    } catch {
        Write-Host "[WARNING] Could not parse $IdentityPath; using $FallbackRoot" -ForegroundColor Yellow
    }
}
$Script = Join-Path $RepoRoot ".claude\scripts\detect_orphaned_sessions.py"

if (-not (Test-Path $Script)) {
    Write-Host "[ERROR] Detector not found at $Script" -ForegroundColor Red
    exit 1
}

# Resolve the py launcher's full path (the action's Execute wants an absolute
# path; "py" alone usually resolves but we pin it for reliability under the
# Task Scheduler's environment).
$PyCmd = Get-Command py -ErrorAction SilentlyContinue
if ($null -ne $PyCmd) {
    $PyPath = $PyCmd.Source
} else {
    $PyPath = "py" # fall back to PATH resolution at run time
}

$Action = New-ScheduledTaskAction `
    -Execute $PyPath `
    -Argument "`"$Script`"" `
    -WorkingDirectory $RepoRoot

# Trigger 1: at logon for the current user.
$TriggerLogon = New-ScheduledTaskTrigger -AtLogOn -User $env:USERNAME

# Trigger 2: daily at a fixed start, repeating every 4 hours all day.
$TriggerDaily = New-ScheduledTaskTrigger -Daily -At 9am
$TriggerDaily.Repetition = (New-ScheduledTaskTrigger `
    -Once -At 9am `
    -RepetitionInterval (New-TimeSpan -Hours 4) `
    -RepetitionDuration (New-TimeSpan -Hours 24)).Repetition

$Settings = New-ScheduledTaskSettingsSet `
    -ExecutionTimeLimit (New-TimeSpan -Minutes 30) `
    -MultipleInstances IgnoreNew `
    -StartWhenAvailable `
    -DontStopOnIdleEnd

Register-ScheduledTask `
    -TaskName $TaskName `
    -Action $Action `
    -Trigger $TriggerLogon, $TriggerDaily `
    -Settings $Settings `
    -Description "Scans Claude Code transcripts for unsaved substantive sessions and auto-recovers them into session logs." `
    -Force | Out-Null

Write-Host "[OK] Registered scheduled task '$TaskName'."
Write-Host "[INFO] Action: $PyPath `"$Script`""
Write-Host "[INFO] WorkDir: $RepoRoot"
Write-Host "[INFO] Triggers: AtLogOn ($env:USERNAME) + daily every 4h"
Write-Host "[INFO] To inspect: Get-ScheduledTask -TaskName '$TaskName' | Format-List"
Write-Host "[INFO] To run now: Start-ScheduledTask -TaskName '$TaskName'"
Write-Host "[INFO] To remove: Unregister-ScheduledTask -TaskName '$TaskName' -Confirm:`$false"
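The header comment pairs a 4-hour scan cadence with the detector's 90-minute idle gate. A rough sketch of how the two interact, assuming the daily trigger fires at 9am and then every 4 hours within its 24-hour repetition window; `daily_fire_times` and `is_idle` are hypothetical helpers for illustration, and the detector's real gate logic is not shown in this diff.

```python
from datetime import datetime, timedelta

IDLE_GATE = timedelta(minutes=90)   # taken from the script comment; assumed value
INTERVAL = timedelta(hours=4)       # -RepetitionInterval
DURATION = timedelta(hours=24)      # -RepetitionDuration


def daily_fire_times(start):
    """Scan times for one repetition window beginning at `start`."""
    times = []
    t = start
    while t < start + DURATION:
        times.append(t)
        t += INTERVAL
    return times


def is_idle(last_write, now, gate=IDLE_GATE):
    """True when a transcript has not been written to for at least `gate`."""
    return now - last_write >= gate
```

Under these assumptions, a transcript last written at 12:00 is skipped by the 13:00 scan (only 60 minutes idle) and becomes eligible at the 17:00 scan, so a crashed session waits at most roughly 5.5 hours before it is picked up.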