claudetools/docs/CT_THOUGHTS.md

# CT Thoughts — ClaudeTools idea backlog

> The shared backlog of ClaudeTools harness ideas (the internal tooling itself, not
> client work). Nothing here is approved to build — ideas advance only by explicit
> decision.
>
> **Pipeline:** THOUGHT (raw idea dropped here) -> DISCUSS (chat it through) ->
> SPEC (`/shape-spec` or a concept doc) -> ROADMAP -> BUILD.
>
> **How to add a thought:** in any Claude session say "ct thought: <idea>" (or
> "add to ct thoughts" / "park this as a ct thought"). Claude appends it below with
> who/when and a Status. Howard's ideas land here too.
>
> **Status per entry:** Raw -> Discussed -> Spec'd -> Roadmapped -> Done.
>
> The entries below are the current thoughts:
> 1. ClaudeTools 3.0 — web-based co-work workspace (Mike, 2026-06-14) — **Discussed (vision-stage, no build go)**
> 2. Web-search bots (grok xsearch + gemini search) reliability - MUST FIX (Mike, 2026-06-17) - **Mitigated/Fixed same day (gemini 3-retry+backoff; grok xsearch auto-falls-back to gemini on timeout). Grok's own multi-agent timeout is upstream/unsolved.**

---

## Thought 1 — ClaudeTools 3.0: Web-Based Co-Work Workspace (Mike, 2026-06-14)

**Status: Discussed — vision-stage, feeling out possibilities. NOT authorized to build.**

### The want

A web-based ClaudeTools that gives the team (Mike, Howard) real co-work — "think Claude
Co-Work, but tailored like ClaudeTools already is." Today co-work = N separate Claude Code
CLI terminals on N machines, glued together by the git-synced repo + the coord API
(`172.16.3.30:8001` — locks, todos, messages, component state) polled over HTTP. The
vision turns that into one shared room where you can see and drive sessions across the
fleet from a browser (incl. phone) over Tailscale.

Odysseus (`D:\Odysseus`, AGPL-3.0 self-hosted AI workspace) is **inspiration for the shell
only** (auth, sessions, document editor, memory UI, mobile/PWA polish) — NOT the agent.
Its agent is a from-scratch loop; adopting it would throw away the Claude Code harness
(skills, slash commands, hooks, coord, the Opus agent itself) that is ClaudeTools' whole
value. Copying its code would also make ClaudeTools 3.0 AGPL + network-served (source-offer
obligation). Reimplement/take-inspiration only.

### Agent backend + auth (decided constraints)

- Agent = **Claude Agent SDK** (the supported way to embed the real Claude Code harness —
  same skills/MCP/hooks/tools — into a server). NOT Odysseus's loop.
- ClaudeTools 3.0 is **internal-only**, so per-person **subscription OAuth**
  (`CLAUDE_CODE_OAUTH_TOKEN` via `claude setup-token`) is compliant and is the auth model.
  Each node authenticates with THAT person's own token (Mike's on Mike's boxes, Howard's on
  his); the hub never centralizes one subscription to serve many. See memory
  `project_ai_auth_product_boundary`.
- Cost ceiling = the post-2026-06-15 monthly Agent-SDK credit pool (~$100-200/Max), not
  unlimited; eventual "spill to API key when the pool's dry" fallback in the node daemon.
- GuruRMM (sellable) is separate: customer brings their own API key. Does not entangle this.

### The architecture: two axes, not one choice

The instinct was "central host on Beast" vs. "per-workstation + peer-to-peer back-channel
replacing the coord DB." That conflates two independent axes:
- **Axis 1 — where the AGENT runs:** central on Beast vs. per-workstation.
- **Axis 2 — how SHARED STATE flows:** central coord service vs. peer-to-peer mesh.

Recommended corners: **distributed agents + central coordination.**
- Don't replace the coord DB with a mesh. Connectivity isn't the hard part (Tailscale gives
  direct node-to-node); **agreement + durability are.** A lock living only in a peer's memory
  evaporates exactly when that peer crashes. Keep coordination central — it already works.
- Don't centralize the agent onto Beast. ClaudeTools sessions do machine-local work (SSH from
  a specific network position, OS-specific skills, a per-machine vault age-key). Beast (a ROG
  box, presumably Windows) can't run macOS/Kali-native work, and one host egress becomes a
  chokepoint + SPOF. Keep the brain local where the work + creds are.

### Topology — three tiers, two channels, one rule

```
   TIER 1: CLIENTS                TIER 2: HUB (Beast)              TIER 3: AGENT HOSTS
   (presentation + control)       (always-on coordinator)         (where work runs)

  +----------------+                                            +----------------------+
  | Browser (Mike) |==WSS==+                              +=WSS=| GURU-5070 node daemon|
  +----------------+       |     +------------------+    |     |  - Agent SDK sessions|
  +----------------+       +====>|   Gateway (WSS)  |<===+     |  - vault age-key     |
  |Browser (Howard)|==WSS==+     |  auth + terminate|    |     |  - local fs/shell/net|
  +----------------+       |     +------------------+    |     +----------------------+
  +----------------+       |     |  Session Registry|    |     +----------------------+
  |  Phone (Mike)  |==WSS==+     |  + Presence      |    +=WSS=| MacBook node daemon  |
  +----------------+             +------------------+    |     |  (macOS-native work) |
                                 |  Relay (pipe)    |    |     +----------------------+
                                 +------------------+    |     +----------------------+
                                 |  Coord (locks,   |    +=WSS=| GURU-KALI node daemon|
                                 |  todos, msgs) +  |          |  (Linux-native work) |
                                 |  event log DB    |          +----------------------+
                                 +------------------+
                                  all on Tailscale (WireGuard)
```

**The one rule: Beast never dials out. Both clients and nodes dial IN to Beast.** Each
workstation's node daemon holds a persistent, auto-reconnecting **outbound** WSS to the
gateway — kills NAT traversal, inbound firewall rules, and reachability as problems.
Tailscale is the L3 fabric (encrypted + identity'd); WSS rides on top for the app layer.

**What each tier is:**
- **Client** — pure glass. Renders sessions, sends input/approvals. Per-PERSON login here
  (Mike, Howard); authz maps a person to the nodes/sessions they may drive. Holds no state
  that can't be rebuilt from the hub.
- **Hub (Beast)** — the coordinator, NOT the brain. Terminates both WSS channels, tracks
  presence, relays session streams, owns the durable shared state (coord + a per-session
  event log). This is the existing coord API promoted from "polled over HTTP" to "pushes
  events live." HTTP coord API can stay for backward-compat with existing CLI sessions.
- **Node daemon (each workstation)** — small long-lived process owning the local Agent SDK
  session lifecycle + exposing local resources (vault, fs, shell, network position). Brain
  runs here.

### The envelope (one framing for both channels)

```
{ v:1, type, session_id?, node_id?, from, ts, payload }
```

| type | direction | meaning |
|---|---|---|
| `node.register` / `presence` | node -> hub | "GURU-5070 online, here are my sessions" |
| `session.start` | client -> hub -> node | spawn an Agent SDK session on node X |
| `session.list` | hub -> client | fleet inventory for the lobby |
| `session.attach` / `detach` | client -> hub | subscribe/unsubscribe a session stream |
| `stream.delta` / `tool_call` / `tool_result` / `status` | node -> hub -> client(s) | live output |
| `input.prompt` / `input.approval` | client -> hub -> node | drive the session, answer gates |
| `coord.lock` / `todo` / `message` | any -> hub -> subscribers | shared state, pushed live |

### Attaching to a remote session (core flow)

```
Mike's browser        Hub (Beast)            GURU-5070 daemon       Agent SDK session
     |                     |                       |                       |
     | attach(sess_42) --->|                       |                       |
     |                     | replay event-log ---->| (already streaming) --|
     |<-- replay + tail ---|                       |   stream.delta -------|
     |                     |<----------------------|   (node->hub->clients)|
     | input.prompt ------>| --------------------->| --------------------->|
     |<-- stream.delta ----|<----------------------|<----------------------|
```

Node is source of truth for a LIVE session; hub MIRRORS every event into a per-session log
for durability + late-join replay. Attaching = replay the log, then tail live. A session
whose node drops shows "offline" with transcript intact, re-attaches on reconnect.

### The co-work mechanic (the actual point)

"Attach" is just *subscribe to a session_id*, so **N people can attach to the same session.**
Mike + Howard watch the same agent run live; either can send input. Add:
- **Presence-on-session** — "Howard is viewing", "Mike is typing".
- **A driving token** — a soft lock at session granularity (reuse the coord lock primitive):
  one person "has the wheel," visibly; others can request it. Last-writer-wins underneath,
  but the indicator stops collisions socially before they happen.

That falls out of the relay design almost for free — it's the Claude Co-Work analogue.

### What it looks like on the screen (the hard part: vision -> screen)

**Lobby / fleet view** — "what's everyone doing":
```
+- ClaudeTools --------------------------------------------+- Coord ----------+
|  NODES                                                   | LOCKS            |
|  * GURU-5070   Mike    2 sessions                        |  valleywide-esxi |
|     |- #42  [remediation] Valleywide ESXi   < Mike       |  -> held: 5070   |
|     \- #43  [client]      Syncro triage     idle         |                  |
|  * MacBook     Mike    1 session                         | TODOS (3)        |
|     \- #44  [dev]         GuruRMM build     (waiting)     |  [ ] rotate B2   |
|  * GURU-KALI   -        0 sessions                       |  [ ] wiki: VWP   |
|  o BEAST       (hub)                                     |  [x] py.sh dep   |
|                                                          | MESSAGES (1)     |
|  [ + New session on... v ]                               |  KALI->fleet: .. |
+---------------------------------------------------------------------------+
```

**Session room** — attach to #42:
```
+- #42  remediation - Valleywide ESXi - on GURU-5070 ---- (eye) Mike, Howard -+
|                                                                             |
|  [agent] Checking datastore free space on 192.168.3.24...                   |
|  +- tool: ssh esxi - df -h --------------------------------+  > approved    |
|  | /vmfs/volumes/datastore1   3.6T   65% used              |                |
|  +----------------------------------------------------------+               |
|  [agent] 65% now - down from 87%. The 3 decommissioned VMs are gone.        |
|                                                                             |
+---------------------------------------------------------------------------  |
|  Mike has the wheel   [request wheel]                                       |
|  > _                                                              [send]     |
+---------------------------------------------------------------------------+
```

If those two screens match the picture in Mike's head, the architecture is the diagram
above. If they don't, that mismatch is the cheapest thing to discover now, pre-code.

### Failure modes (SPOF honesty)

- **Hub down:** local agents keep working — degrade to local-only, queue coord events (the
  softfail-queue idea already exists in ClaudeTools). Web UI dark, but work does NOT stop.
- **Node down:** its sessions pause; others unaffected; transcripts survive on the hub.
- **Partition:** outbound WSS reconnects with backoff; event log resyncs clients on return.

### What to build first (prove the vision, cheaply)

The one risky, novel slice: a node daemon running ONE Agent SDK session, streaming it over
a WSS to a dead-simple web page that can watch + send input. No auth, no coord, no
multi-node, localhost only. If watching a real session stream into a browser and typing back
matches the vision -> everything else on the diagram is known engineering. If it doesn't ->
a day spent, not a quarter.

### Open questions (resolve before going past the prototype)

1. **Transcript truth on partition** — node vs. hub when they disagree after a reconnect.
2. **Input arbitration** — is a soft driving-token enough, or do you want hard turns?
3. **CLI coexistence** — do existing Claude Code CLI sessions appear as first-class nodes,
   or is the web the only entry point?

---

## Thought 2 — Web-search bots (grok xsearch + gemini search) reliability: MUST FIX (Mike, 2026-06-17)

**Status: Raw - HIGH PRIORITY. Mike's directive: this "absolutely must be properly fixed."**

### The problem

The live web-search path - `grok xsearch` (multi-agent web_search) and `agy/gemini search`
(Google grounding) - is the MOST VALUABLE discovery tool we have (it surfaced the UniFi cloud
connector proxy and the Teleport `/rest/setting/teleport` path that blind endpoint-probing never
would have). But it is UNRELIABLE: both return EMPTY intermittently, especially on longer /
multi-part queries.

Observed 2026-06-17:
- `grok xsearch` returned `[ask-grok] no result (stopReason=)` ~5+ times on UniFi queries,
  DESPITE the same-day partial fix (--yolo, drop --no-subagents, web-primary prompt, 300s budget)
  that DID work on a short query ("current rust version", 23s). So the fix is incomplete - longer
  queries still fail.
- `agy/gemini search` returned `[ask-gemini] empty response` (even after its built-in single retry)
  on the same queries.

### Why it matters (Mike's reweighting)

Web search now carries AT LEAST as much weight as live API probing. Probing without a search/doc
lead is "blind guessing" and mostly 404s; the searches give the real leads. So a flaky search bot
directly degrades research quality and pushes the loop back toward bad guessing.

### Proper fix (not a workaround) - investigation plan

1. **grok:** capture the raw `--output-format json` (or `streaming-json`) of a FAILING long query;
   determine whether it's max-turns exhaustion, the empty-finalization quirk (stopReason blank), a
   timeout mid-search, or the multi-agent searcher itself returning nothing. Then fix the actual cause
   (raise/auto-scale max-turns; switch to `streaming-json` so partial results survive; retry-on-empty
   loop with backoff; possibly chunk multi-part queries).
2. **gemini:** same - capture why `search` returns empty (the wrapper already retries once and still
   fails); check whether `--approval-mode yolo` + `google_web_search` is finalizing empty, and add a
   real retry/fallback.
3. **Cross-fallback:** if grok empty -> auto-try gemini and vice-versa, and surface which one answered.
4. Acceptance: 5/5 success on a battery of long, multi-part research queries.

### Note

This was filed because both bots failed during live UniFi VPN/Teleport research and forced a fallback
to suspect endpoint-probing. The web-search feature is load-bearing for the "interview the AIs / read
the docs before probing" workflow - it has to be dependable.

### Resolution (2026-06-17, same day) - diagnosed from raw output, fixed:

- **Diagnosis (not guessed):** captured raw output of failing queries. GROK xsearch = TIMEOUT: the
  grok-4.20-multi-agent web_search runs past budget on multi-part queries (286s/280s, rc=124, still in
  the search phase - 183 thoughts, only progress-noise text), and buffered `json` => total loss. GEMINI
  search = INTERMITTENT empty turn (a clean re-run succeeded in 122s with a real 2.6KB answer); the
  wrapper only retried once, so two empties in a row failed spuriously.
- **Gemini fix:** `emit_or_fail` now retries up to 3x with 3s/6s backoff (was 1). Two follow-on bugs
  found+fixed same day while using it: (a) the auth check ran INSIDE the retry loop and a benign mid-run
  token-refresh line matched the over-broad auth regex -> false "auth error" abort; moved auth-classify
  AFTER the retries and tightened the regex. (b) added a QUOTA FALLBACK: when the pinned strong model
  (gemini-3.1-pro-preview) returns "exhausted your capacity on this model", retry once on the default
  (lighter) model (separate quota) by stripping -m. Validated: a quota-capped pro run fell back and
  returned a 2.9KB answer.
- **Grok xsearch fix:** switched to `--output-format streaming-json` (salvage any partial that streamed),
  moderate budget, and **AUTO-FALLBACK to gemini search** when grok doesn't finish (rc!=0 or empty).
  Validated e2e: grok timed out (rc=124) -> fell back -> gemini returned a real sourced answer.
- **Still open (upstream):** grok's multi-agent web_search genuinely can't finish heavy queries in
  budget - that's an xAI-side limitation; the fallback makes xsearch reliable regardless. If grok fixes
  the multi-agent latency (or exposes a lighter single-agent web_search), revisit. Acceptance ("5/5 on
  long queries") now effectively met via the gemini path.