From 7e2e3a58821a1f86d187a0febf2b2982cd55e432 Mon Sep 17 00:00:00 2001 From: Howard Enos Date: Thu, 23 Apr 2026 06:21:24 -0700 Subject: [PATCH] sync: auto-sync from HOWARD-HOME at 2026-04-23 06:21:23 Author: Howard Enos Machine: HOWARD-HOME Timestamp: 2026-04-23 06:21:23 --- .claude/OLLAMA.md | 55 +++++++++++++------ .claude/memory/MEMORY.md | 1 + .../memory/feedback_ollama_tier0_routing.md | 46 ++++++++++++++++ .../docs/cloud/user-account-rollout-plan.md | 3 +- .../docs/security/hipaa-review-2026-04-22.md | 31 +++++++++++ 5 files changed, 118 insertions(+), 18 deletions(-) create mode 100644 .claude/memory/feedback_ollama_tier0_routing.md diff --git a/.claude/OLLAMA.md b/.claude/OLLAMA.md index 5df74d8..22dddaa 100644 --- a/.claude/OLLAMA.md +++ b/.claude/OLLAMA.md @@ -12,15 +12,31 @@ Ollama runs on Mike's workstation (DESKTOP-0O8A1RL) with GPU acceleration. Avail ## Endpoints -- **DESKTOP-0O8A1RL** (local): `http://localhost:11434` -- **Any other machine** (Tailscale required): `http://100.92.127.64:11434` +Auto-detect: any machine that has a local Ollama listening on `127.0.0.1:11434` uses local. Otherwise fall back to Mike's workstation over Tailscale. + +```bash +# Preferred universal resolver — works on any machine +if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then + OLLAMA="http://localhost:11434" +else + OLLAMA="http://100.92.127.64:11434" +fi +``` + +Rationale: +- **Mike's workstation (DESKTOP-0O8A1RL):** local matches, no change. +- **HOWARD-HOME:** also has a local Ollama with the canonical model set (confirmed 2026-04-22). Uses local — faster, zero Tailscale hop, no load on Mike's GPU. +- **Other team machines:** no local Ollama → falls back to Mike's over Tailscale. +- **Mike's machine offline:** graceful degradation — local users continue working; non-local users get a clean timeout. + +Manual override (for testing or explicit preference): set `OLLAMA=http://100.92.127.64:11434` before the call. 
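The same auto-detect rule can be sketched in Python. This is a minimal sketch, not the repository's implementation: the names `is_up` and `resolve_ollama` and the injectable `probe` parameter are assumptions introduced here for illustration. One detail matters: `urllib.request.urlopen` raises `URLError` (a subclass of `OSError`) when nothing is listening, so the fallback has to go through `try`/`except` rather than a truthiness test.

```python
import http.client
import os
import urllib.request

LOCAL = "http://localhost:11434"
REMOTE = "http://100.92.127.64:11434"  # Tailscale address quoted in this section


def is_up(base_url, timeout=2.0):
    """Return True if GET <base_url>/api/tags answers within the timeout."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, connection refused and timeouts are all OSError subclasses.
        return False


def resolve_ollama(env=None, probe=is_up):
    """Explicit OLLAMA override first, then local if reachable, else the remote host."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA")
    if override:
        return override
    return LOCAL if probe(LOCAL) else REMOTE
```

Calling `resolve_ollama()` with no arguments probes the local port for real. Note that a value set by the shell `if`/`else` block is only visible to Python if the variable is exported (`export OLLAMA`).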
Check reachability: ```bash -curl -s http://100.92.127.64:11434/api/tags | jq -r '.models[].name' +curl -s "$OLLAMA/api/tags" | jq -r '.models[].name' ``` -If it fails: verify Tailscale is connected (`tailscale status`) and Mike's workstation is online. +If neither endpoint responds: verify Tailscale (`tailscale status`) and whether your local Ollama service is running. ## Access Control @@ -30,24 +46,29 @@ If it fails: verify Tailscale is connected (`tailscale status`) and Mike's works ## Calling Ollama -Resolve endpoint from identity.json first: -```bash -OLLAMA=$([ "$(jq -r .machine .claude/identity.json 2>/dev/null)" = "DESKTOP-0O8A1RL" ] \ - && echo "http://localhost:11434" || echo "http://100.92.127.64:11434") -``` +Use the `/api/chat` endpoint with `think:false` for qwen3 models. The older `/api/generate` endpoint on qwen3 puts output into thinking tokens that don't appear in the `response` field, so you'll get an empty response if you use `/api/generate`. -Preferred one-liner (avoids shell escaping): +Preferred one-liner: ```bash -py -c " -import urllib.request, json, sys -url = 'http://localhost:11434/api/generate' -body = json.dumps({'model':'qwen3:14b','prompt': sys.argv[1],'stream':False}).encode() -res = json.loads(urllib.request.urlopen(urllib.request.Request(url, body)).read()) -print(res['response']) +python -c " +import urllib.request, json, sys, os, socket +OLLAMA = os.environ.get('OLLAMA') or ('http://localhost:11434' if socket.socket().connect_ex(('127.0.0.1', 11434)) == 0 else 'http://100.92.127.64:11434') +body = json.dumps({ + 'model':'qwen3:14b', + 'messages':[{'role':'user','content': sys.argv[1]}], + 'stream':False, + 'think':False +}).encode() +res = json.loads(urllib.request.urlopen(urllib.request.Request(OLLAMA+'/api/chat', body), timeout=120).read()) +print(res['message']['content']) " "Your prompt here" ``` -For code suggestions, swap `qwen3:14b` for `codestral:22b`.
+Or set `$OLLAMA` once from bash (see auto-detect formula above) and reuse it across calls. + +For code suggestions, swap `qwen3:14b` for `codestral:22b`. Codestral doesn't need `think:false`. + +Cold-start is ~30-50s on first call per model per session. Warm calls are 1-5s. ## When to Use Which Model diff --git a/.claude/memory/MEMORY.md b/.claude/memory/MEMORY.md index e87863e..8431648 100644 --- a/.claude/memory/MEMORY.md +++ b/.claude/memory/MEMORY.md @@ -23,6 +23,7 @@ - [D2TESTNAS SSH Access](feedback_d2testnas_ssh.md) - Use root@192.168.0.9 with Paper123!@#, not sysadmin - [Bypass Permissions Setting](feedback_bypass_permissions_setting.md) - Set permissions.defaultMode to bypassPermissions in settings.json on all machines - [365 Remediation Tool](feedback_365_remediation_tool.md) - Always means Graph API app fabb3421, not CIPP +- [Ollama Tier-0 Routing](feedback_ollama_tier0_routing.md) - Route drafts/summaries/classifications through Ollama (qwen3:14b). Mike designed ClaudeTools this way — not optional. ## Machine - [ACG-5070 Workstation Setup](reference_workstation_setup.md) - Windows 11 Pro clean install 2026-03-30, replaced CachyOS. All tools installed. diff --git a/.claude/memory/feedback_ollama_tier0_routing.md b/.claude/memory/feedback_ollama_tier0_routing.md new file mode 100644 index 0000000..4a73568 --- /dev/null +++ b/.claude/memory/feedback_ollama_tier0_routing.md @@ -0,0 +1,46 @@ +--- +name: Route Tier-0 tasks through Ollama (Mike's ClaudeTools design intent) +description: Drafts, summaries, classifications, extractions MUST go through Ollama per Mike's tiered-model architecture. Don't default to Claude inference for low-stakes text generation. +type: feedback +--- + +Route Tier-0 tasks (summaries, classifications, drafts, extractions) through Ollama. Not optional — this is how Mike designed ClaudeTools to work. + +**Why:** Mike built the tiered-model architecture (`CLAUDE.md` Model Routing section + `.claude/OLLAMA.md`) deliberately. 
Tier 0 is free + fast + private. Defaulting to Claude for every drafting task burns context window and Anthropic tokens on work that qwen3:14b does fine. + +**How to apply:** +- Drafting emails, session-log paragraphs, status-update sentences, commit-message first-drafts → qwen3:14b +- Summarizing long output (Graph JSON, PowerShell transcripts, log tails) → qwen3:14b +- Extracting structured data from text → qwen3:14b +- Suggesting refactors / generating docstrings → codestral:22b (then review) +- NEVER for: auth decisions, credential handling, production migrations, security review, citation work, production-change scripts + +**Endpoint resolution (updated 2026-04-22 in `.claude/OLLAMA.md`):** +```bash +if curl -s -m 2 http://localhost:11434/api/tags >/dev/null 2>&1; then + OLLAMA="http://localhost:11434" +else + OLLAMA="http://100.92.127.64:11434" +fi +``` + +HOWARD-HOME has the canonical models loaded locally (qwen3:14b, codestral:22b, nomic-embed-text, plus bonus qwen3-coder:30b) — so HOWARD-HOME uses local Ollama, not Mike's. Zero Tailscale hop. + +**Call pattern for qwen3 — use `/api/chat` with `think:false`**, NOT `/api/generate`. qwen3 on generate endpoint dumps reasoning into internal thinking tokens and returns empty `response` field. Chat endpoint with `think:false` returns clean content in `message.content`: + +```python +body = json.dumps({ + 'model':'qwen3:14b', + 'messages':[{'role':'user','content': prompt}], + 'stream':False, + 'think':False +}).encode() +# POST to OLLAMA + '/api/chat' +# Read res['message']['content'] +``` + +Codestral doesn't need `think:false` — just use it on `/api/chat` normally. + +Cold-start ~30-50s on first call per model per session; warm calls 1-5s. + +**Incident 2026-04-22:** Spent an entire Cascades rollout session (G1 hygiene, orphan cleanup, risk register, synology discovery, etc.) without routing a single task through Ollama despite many drafting opportunities (report drafts, summary text, email drafts). 
Howard called this out: "just make sure ollama is being used as mike has designed claudetools to work." diff --git a/clients/cascades-tucson/docs/cloud/user-account-rollout-plan.md b/clients/cascades-tucson/docs/cloud/user-account-rollout-plan.md index 357f3dd..1a36529 100644 --- a/clients/cascades-tucson/docs/cloud/user-account-rollout-plan.md +++ b/clients/cascades-tucson/docs/cloud/user-account-rollout-plan.md @@ -165,7 +165,8 @@ Without Entra Connect, new accounts are cloud-only and create the same AD-vs-M36 | **G4. Take out of staging, directory sync ONLY (no Password Hash Sync)** | Hybrid identity appears in Entra. Passwords remain separate between AD and M365. | None — users sign in exactly as today | 48 hours stable with no new support tickets about sign-in | | **G5. Announce + enable Password Hash Sync** | AD password hash pushes to Entra. Next Outlook / Teams / Edge launch, prompts once for password. Users enter AD password. | **ONE password prompt, once.** After that: one password for everything. | Zero unresolved helpdesk tickets; test user confirms PC + Outlook + OWA work on same password | | **G6. Conditional Access policies go live in REPORT-ONLY mode** | CA evaluates every sign-in and records what WOULD have been blocked, but doesn't actually block. | None | 7–14 days of logs reviewed — zero "would have been blocked" events for legitimate users. Fix trusted-location / compliance gaps as needed. | -| **G7. CA enforcement flip** | Policy blocks out-of-scope sign-ins for real. | Off-site users unexpectedly on the allow-list see no change; users NOT on allow-list get blocked from outside the building as intended. | Break-glass account confirmed working. Meredith notified. | +| **G7. CA enforcement flip** | Policy blocks out-of-scope sign-ins for real. | Off-site users unexpectedly on the allow-list see no change; users NOT on allow-list get blocked from outside the building as intended. | Break-glass account confirmed working. Meredith notified. 
**User comms sent 48h before flip** — see G7a below. | +| **G7a. Pre-enforcement user comms (MUST run before G7)** | Query Entra sign-in logs for any licensed user with >0 off-site sign-ins in last 30 days. Anyone NOT in `SG-External-Signin-Allowed` gets a targeted email: "Starting [date] you will only be able to sign into Cascades email and apps from inside the building. If you work from home / travel / check email on your phone off-site, reply to Meredith by [date-1] to be added to the allow-list." | Users who legitimately work off-site get warned; those who don't get confirmation that silent behavior change is coming. | Report from Entra sign-in logs shows comms sent to every off-site-active user. No silent blocks at G7 cutover. | | **G8 (separate project). ALIS SSO Enterprise App registration** | "Sign in with Microsoft" option appears on ALIS login. Existing ALIS username/password keeps working during transition. | Optional new sign-in button. | N/A — rollout when ALIS support has provided federation metadata. | **Rollback points:** G3 through G5 all have clean reverse paths (remove from staging, disable PHS, reset individual passwords). G6/G7 CA policies can be disabled with one click. Only hard-to-reverse step is G1's AD renames — mitigated by the pre-change reg-exports/backups already in the `D:\Backups\pre-entra-connect-*` folder from the 2026-04-22 preflight remediation. diff --git a/clients/cascades-tucson/docs/security/hipaa-review-2026-04-22.md b/clients/cascades-tucson/docs/security/hipaa-review-2026-04-22.md index 1c9b8c6..d747e43 100644 --- a/clients/cascades-tucson/docs/security/hipaa-review-2026-04-22.md +++ b/clients/cascades-tucson/docs/security/hipaa-review-2026-04-22.md @@ -7,6 +7,37 @@ --- +## Findings classified ACTIVE ONGOING VIOLATION — present-tense gap + +### A1. Synology role-based shared-login accounts with PHI access + +**Rule:** 45 CFR §164.312(a)(2)(i) Unique User Identification (Required). 
+ +**Current state:** The Synology NAS `cascadesds` (192.168.0.120) hosts 7 role-based shared-credential local accounts that multiple humans sign into. Several of these accounts have access to shares containing PHI (`homes`, `Management`, `pacs`). Per `docs/migration/synology-permission-inventory.md` these accounts are: + +- `Accounting` +- `Dining Manager` +- `Front Desk` +- `mcnurse` +- `Memcare Receptionist` +- `memcarenurse` +- `Nurse Tower` + +**Gap:** These are NOT scheduled for remediation until Phase 4 (Synology retirement + CS-SERVER file-share cutover), which will be weeks away at best. **Every day until Phase 4, these shared credentials are an active Required-spec violation if any of them access PHI shares.** The `pacs` share (likely medical imaging) and `Management` (clinical admin docs) are the highest-risk. + +**Options:** +1. **Accelerate disable.** Immediately disable shared logins on Synology + force users onto their personal AD-synced accounts. Risk: breaks known workflows, disrupts front-desk / nursing stations that rely on shared logins today. +2. **Documented risk-acceptance in Risk Analysis.** Capture the exception explicitly: "7 Synology shared-login accounts remain operational until Phase 4 cutover, target [date]. Compensating controls: physical access restricted to Cascades building, shift-based sign-in sheets on each shared workstation, monthly SMB access-log review by Howard." Meredith signs the residual-risk acknowledgment. +3. **Hybrid.** Disable the highest-sensitivity shared accounts immediately (`mcnurse`, `memcarenurse`, `Nurse Tower` if they touch `pacs`), accept risk on the less-sensitive ones (`Accounting`, `Front Desk`). + +**Decision required:** Which option does Meredith prefer? Option 2 is most common but the residual-risk paperwork has to be real, not just assumed. + +**Detection:** Monthly sample of Synology SMB access logs for those accounts, mapped against shift schedules. 
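The monthly check described under Detection could be sketched roughly as below. This is a hedged illustration only: the review does not specify the Synology log export format or how shift schedules are stored, so the record shapes (`(account, timestamp)` events and `(account, start, end, person)` shifts) and the function name `unexplained_accesses` are hypothetical, and log parsing is deliberately left out.

```python
from datetime import datetime

# The seven shared-login accounts listed in finding A1.
SHARED_ACCOUNTS = {
    "Accounting", "Dining Manager", "Front Desk", "mcnurse",
    "Memcare Receptionist", "memcarenurse", "Nurse Tower",
}


def unexplained_accesses(events, shifts):
    """Return shared-account events that fall outside every scheduled shift.

    events: iterable of (account, datetime) pairs (hypothetical shape).
    shifts: list of (account, start, end, person) tuples (hypothetical shape).
    """
    flagged = []
    for account, ts in events:
        if account not in SHARED_ACCOUNTS:
            continue  # personal accounts are out of scope for this check
        covered = any(
            a == account and start <= ts < end
            for a, start, end, _person in shifts
        )
        if not covered:
            flagged.append((account, ts))
    return flagged


shifts = [("mcnurse", datetime(2026, 4, 22, 7), datetime(2026, 4, 22, 15), "RN on day shift")]
events = [
    ("mcnurse", datetime(2026, 4, 22, 9, 30)),   # inside the scheduled shift
    ("mcnurse", datetime(2026, 4, 22, 23, 5)),   # no shift covers this access
    ("jsmith", datetime(2026, 4, 22, 23, 5)),    # not a shared account, ignored
]
flagged = unexplained_accesses(events, shifts)
```

An access that no sign-in sheet or shift entry explains is the item a reviewer would follow up on; what counts as an acceptable explanation remains a policy decision outside this sketch.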
+ +**Target resolution:** Phase 4 (Synology retirement) OR explicit immediate-disable event. Whichever comes first. + +--- + ## Findings classified CRITICAL — must fix before rollout ### C1. Shared agency logins would violate §164.312(a)(2)(i) — Unique User Identification