Mike Swanson a8abe4a14b glaztech: staged-remediation pacing strategy + Steve approval + softened Tom message

Adds the "from emergency to deliberate staged objectives" pacing strategy
(severity unchanged, tempo deliberate - the depth of the Glaz tools estate makes
rushing the bigger risk) and records Steve's blanket approval (Tier A
execution-cleared). Softens the Tom outreach to a partnership / not-a-fire-drill
tone per Mike.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 10:40:14 -07:00


Glaztech — The "Minimal-Tom" Path to a Real Fix

Status: DRAFT v0.2 (Grok + Gemini reviewed; strong consensus; reframed below) · Date: 2026-06-05 · Owner: ACG · Relates to: #32378, the least-priv migration scope (2026-06-05-least-privilege-db-migration-scope.md), assessment E-bucket + items 16/17/19/20/22

The human constraint (read first — it drives the design)

Tom built and has owned this application for ~20 years; it is his baby. He is overwhelmed right now, and this engagement has made clear that security is not his domain. So the remediation is structured around these rules, not the technical ideal:

ACG does the security/infra/DBA heavy lifting — anything at the network/host/DB-admin layer is ours.
What we need from Tom is minimal, concrete, and within his existing skill — knowledge first, then a few small, well-scoped code edits, never "rearchitect your life's work."
Real, not cosmetic — the minimal-Tom path must actually cut the C0 blast radius.
Protect the relationship — don't hand Tom the threat model as a to-do list; frame it as "we'll handle the hard parts; we need your help on these few specific things."

Strategy & pacing — from "emergency" to deliberate staged objectives (2026-06-05)

Up to now this engagement has been framed as RED-FLAG / EMERGENCY — which was right when we believed it was a website problem. The deeper recon changed the calculus: this is the whole Glaz tools ecosystem — the public website + GTIware/portal + the in-process card engine + a centralized SQL estate wired into accounting and payroll across 10 offices.

Reframe: severity is unchanged; tempo changes. The findings are still genuinely critical (stored CVV, plaintext PANs, SQLi running as sysadmin) — we are not downplaying the risk. But precisely because the system is this deep and interconnected, hair-on-fire speed is the thing most likely to cause real damage. A rushed fix risks breaking live billing, accounting, payroll, and customer payments — the 06-05 outage from a "simple" hardening change is the proof. If this were just a website, the emergency tempo would still be right; it isn't, so getting it right matters more than getting it fast.

We split the work by blast radius:

Move fast (emergency-appropriate) on the CONTAINED, reversible, ACG-owned items — the E-bucket (disable xp_cmdshell, rotate the msdb domain-admin password, disable sa, de-priv the Agent), WAF, SQL network segmentation. These shut the worst escalation paths quickly with low risk to the business.
Go deliberate and staged on the DEEP app/data work — the DB de-priv proof-out, the 59-query fix, tokenization, engine decoupling. Each one staged, validated, and reversible — not rushed.

This is the defensible posture for a critical-but-complex environment: identify the risk, contain the immediate escalation fast, then remediate the structural issues in careful, validated stages rather than risk an outage that takes the business down. Client-facing messaging should reflect this calm, in-control, staged tone — not perpetual alarm. (It also serves the "don't break Tom" goal: a staged plan is something an overwhelmed dev can engage with; a five-alarm fire is not.)

Authorization & status (2026-06-05)

Steve Eastman has given ACG blanket approval to execute whatever remediation actions are necessary — no per-action re-authorization required. So the ACG-owned Tier A work below (E-bucket, DB de-privilege, WAF, SQL network segmentation) is authorized to proceed, applying the engineering discipline the 06-05 outage reinforced (backup + real-binding health check + one-step rollback; no in-place hotfixes). Risky DB/AD steps still get done carefully — Steve's approval covers authorization, not the care.
Still gating full execution: the network recon (Mike enrolling the site servers in RMM) to finalize the estate map and the scoped-login boundary; and Tom's input for the path map + the 59-query fix (see the Tom message).
The customer-facing security ticket #32378 can move from "Waiting on Customer" once ACG-side Tier A begins.
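
The discipline named above (backup, real-binding health check, one-step rollback) can be sketched as a small wrapper. This is a minimal illustration, not ACG's actual tooling: `apply_with_rollback` and its four callables are hypothetical names, and the real backup, change, check, and rollback steps would be supplied per change.

```python
from typing import Callable


def apply_with_rollback(
    backup: Callable[[], None],
    apply_change: Callable[[], None],
    health_check: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Run one change; return True if it stuck, False if it was rolled back."""
    backup()  # take the restore point before touching anything
    try:
        apply_change()
        # The check should exercise the real site binding, not localhost.
        healthy = health_check()
    except Exception:
        # A failed apply or a crashing check both count as unhealthy.
        healthy = False
    if not healthy:
        rollback()  # single step back to the restore point
    return healthy


# Toy usage: a config dict stands in for the real system.
config = {"setting": "old"}
saved: dict = {}
stuck = apply_with_rollback(
    backup=lambda: saved.update(config),
    apply_change=lambda: config.update(setting="new"),
    health_check=lambda: False,  # simulate a failed health check
    rollback=lambda: config.update(saved),
)
print(stuck, config)
```

The point of the shape is that rollback is decided by the health check alone, so no one has to make a judgment call mid-outage.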

The reframe both reviewers forced (this is the important part)

My first draft led with "split the site into external/internal and network-gate /emp/ to VPN." Both models agree that's cosmetic for the headline risk. Why: the customer-facing payment pages (quick-pay*, ach*, order-detail*) contain the quo() SQL injection and run as sysadmin tom in the same worker process. An attacker injects on the public quick-pay.aspx and gets everything — OS takeover via xp_cmdshell, the domain-admin password in msdb, all 10 offices' cards — regardless of which URLs are internet-visible. Gating /emp/ takes the back-office tool off the internet (a real, worthwhile defense-in-depth win for the employee surface) but does nothing for C0.

So the real minimal-Tom path leads with what ACG can own, and reduces Tom to one within-skill code ask.

The way forward — tiered, consensus-driven

Tier A — Pure ACG, do first (≈0 Tom; immediate, real escalation/C0 reduction)

These need no path map, no Tom code, no customer confirmation. Highest leverage available right now.

Finish E-bucket containment (assessment E2-E5): rotate glaztech\administrator + strip the cleartext use from the msdb Agent job steps; disable xp_cmdshell (still =1 on :3436); disable sa + de-privilege the SQL Agent service account; turn on the host firewall profiles; replace EOL RealVNC. This kills the OS-RCE → domain-admin escalation — the worst of C0-Extended — with zero app change. (E1, the web-root ACL, was completed carefully during the 06-05 outage fix.)
DB de-privilege: strip sysadmin/securityadmin/dbcreator (and xp_cmdshell ability) from the website's login — but KEEP its legitimate access (all offices, cc_file, Sage, payroll, msdb) so nothing breaks. Caps the catastrophic blast radius (no OS RCE, no domain-admin, no arbitrary-instance control) without touching app code. (This is the achievable core of the scope's Phase 1, minus the unrealistic DENYs the recon disproved.)
WAF / IIS request-filtering in front of the site: block obvious SQLi/XSS patterns + rate-limit/bot-throttle on public paths. Immediate reduction of quo() exploitability while the code fix lands. (Also closes the failed-login-detection gap if added at the proxy/IIS layer.)
SQL network segmentation: restrict the :3436 listener + the linked-server mesh + backup shares to the WWW IP + required internal hosts only.
Run the §2a inventory now (ACG read-only, via RMM + on-box source grep + an Extended-Events trace on the tom login over a business day incl. a gt_auto_process run). Prerequisite for any scoped login, and it lets ACG derive a candidate customer-vs-employee path map before asking Tom anything.
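
For the DB de-privilege item above, a reviewable script can be generated rather than typed by hand. The sketch below assumes the website login is `tom` (as named elsewhere in this doc) and that its current fixed server roles have already been read from the instance; the function names are hypothetical. It emits only `ALTER SERVER ROLE ... DROP MEMBER` statements plus the matching rollback, and no DENYs.

```python
# Fixed server roles the plan strips from the website login.
FIXED_ROLES_TO_STRIP = ("sysadmin", "securityadmin", "dbcreator")


def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def _role_statements(login: str, current_roles: set[str], verb: str) -> list[str]:
    # Only touch roles the login actually holds, so rollback mirrors reality.
    return [
        f"ALTER SERVER ROLE {quote_ident(role)} {verb} MEMBER {quote_ident(login)};"
        for role in FIXED_ROLES_TO_STRIP
        if role in current_roles
    ]


def build_depriv_script(login: str, current_roles: set[str]) -> list[str]:
    """T-SQL that drops the high-risk fixed server roles from the login."""
    return _role_statements(login, current_roles, "DROP")


def build_rollback_script(login: str, current_roles: set[str]) -> list[str]:
    """T-SQL that restores exactly the roles the de-priv script removed."""
    return _role_statements(login, current_roles, "ADD")


# Example with an assumed role set; the real set comes from the inventory.
roles_held = {"sysadmin", "dbcreator", "public"}
for stmt in build_depriv_script("tom", roles_held):
    print(stmt)
for stmt in build_rollback_script("tom", roles_held):
    print("-- rollback:", stmt)
```

One caveat the sketch does not handle: a sysadmin member bypasses permission checks in every database, so the explicit database users and grants for the login's legitimate reach must exist before the role is dropped. That is why the inventory is a prerequisite.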

Tier B — ACG infra + small Tom (after Tier A)

Network-gate /emp/ + sensitive paths to VPN/LAN-only. Real reduction of the employee-tool surface (takes the back-office off the public internet). ACG owns the gating (reverse proxy or IIS <location> ipSecurity). Tom's ask = review/correct ACG's candidate path map (ACG derives it first from IIS logs + the /emp/ tree + source grep). Needs office/VPN ranges from Steve + an employee heads-up ("back-office now requires VPN").
THE one within-skill Tom code ask — parameterize the 59 quo() queries. ACG greps and hands Tom the exact 59 lines (many are in the public payment paths). Fixing them kills the SQLi at the source. It's an isolated, repetitive, no-architecture coding task — squarely within his skill. This is the single highest-value thing only Tom can do.
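
To make the size of the code ask concrete, here is the before/after shape of one such edit. This is an illustration only: it uses Python and an in-memory SQLite table as a stand-in for the real ASP.NET/SQL Server code, the `invoices` table is made up, and `quo()` here is a guess at the helper's behavior (wrapping a value in single quotes without escaping), not its actual source.

```python
import sqlite3


def quo(value: str) -> str:
    """Hypothetical stand-in for the app's helper: wrap a value in quotes."""
    return "'" + value + "'"


def find_invoice_concat(conn: sqlite3.Connection, invoice_no: str) -> list:
    # BEFORE: user input is concatenated into the SQL text.
    sql = "SELECT amount FROM invoices WHERE invoice_no = " + quo(invoice_no)
    return conn.execute(sql).fetchall()


def find_invoice_param(conn: sqlite3.Connection, invoice_no: str) -> list:
    # AFTER: the value travels as a bound parameter, never as SQL text.
    sql = "SELECT amount FROM invoices WHERE invoice_no = ?"
    return conn.execute(sql, (invoice_no,)).fetchall()


def demo_db() -> sqlite3.Connection:
    """Build a throwaway table with two rows (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (invoice_no TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?)", [("A100", 50), ("B200", 75)]
    )
    return conn


conn = demo_db()
payload = "x' OR '1'='1"
print(len(find_invoice_concat(conn, payload)))  # injected: matches every row
print(len(find_invoice_param(conn, payload)))   # parameterized: matches none
```

Each of the 59 edits should look like the second function: same query, same result for legitimate input, with the value moved out of the string and into a parameter.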

Tier C — Named sub-project, deferred, heavily scaffolded (the real dev work — NOT a casual ask)

Tokenized payments (so the external surface needs zero card access) + decouple the GTIware card engine off the public host. Both reviewers were emphatic: this is not a "small deferred step." It's a multi-month, multi-surface project — the customer payment pages must stop touching PANs (processor hosted fields/vault), the in-process DLLs and the server-side gt_console_apps.exe jobs must bill by token, schema/dual-write, historical purge only after writes stop, PCI SAQ re-scope. Treat it as a named project with ACG (and possibly processor) support, small steps, pair-reviewed rollouts — never "Tom, go tokenize the payments." This is also the prerequisite to ever truly separate external/internal or DENY cc_file.

The Tom-ask ledger (the entire list — keep it this short)

| Ask | Type | Tier |
| --- | --- | --- |
| Review/correct ACG's candidate customer-vs-employee path map | Knowledge | B |
| Parameterize the 59 `quo()` lines (ACG provides them) | Small code, in-skill | B |
| Awareness/sign-off on the DB de-priv + E-bucket + conn-string swap | Yes/no | A |
| Tokenized payments + engine decouple (scaffolded sub-project) | Large code (deferred) | C |
| (Also eventually his: password hashing + stop emailing plaintext (C2), CSRF on payment POSTs (H7), `get_cc_data` IDOR) | Small-medium code | later |

Everything else — WAF, E-bucket, DB de-priv, network segmentation, the gating config, the inventories, monitoring/rollback — is ACG's.

Breakage to watch (from the red-team)

Network-gating breaks legit remote employees (06-04 logs show real timecard traffic from residential IPs/iPhones) unless they're on VPN first → get authoritative office/VPN ranges from Steve, communicate the change, test customer post-login payment flows for any hidden /emp/ asset/AJAX dependency.
DB de-priv must KEEP the app's real reach (all offices, cc_file, Sage, payroll, msdb) — DENYing any of them hard-breaks the multi-office app and the in-process engine. Plus the earlier catches: app-pool recycle to flush the ADO.NET pool on swap/rollback; alphanumeric-only interim password; dynamic-SQL/ownership-chaining; don't DENY tempdb; audit linked-login catch-all mappings.
No staging (H1) — the 06-05 outage proved "simple" config changes here can take the whole site down; every step needs backup + real-binding health check + one-step auto-rollback.
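
Before the gate goes live, the remote-employee risk above can be sized from existing logs: count which client IPs reach `/emp/` from outside the ranges Steve confirms. A minimal sketch, assuming the log has already been reduced to (client IP, path) pairs; `outside_allowed` is a hypothetical name and the CIDRs shown are placeholders, not Glaztech's real ranges.

```python
import ipaddress


def outside_allowed(hits, allowed_cidrs, gated_prefix="/emp/"):
    """Count gated-path hits per client IP that fall outside the allowed ranges.

    hits: iterable of (client_ip, url_path) pairs.
    allowed_cidrs: office/VPN ranges as CIDR strings.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    flagged: dict[str, int] = {}
    for ip_text, path in hits:
        if not path.lower().startswith(gated_prefix):
            continue  # not a path the gate would cover
        ip = ipaddress.ip_address(ip_text)
        if any(ip in net for net in networks):
            continue  # already inside an office/VPN range
        flagged[ip_text] = flagged.get(ip_text, 0) + 1
    return flagged


# Placeholder data: documentation-range IPs and made-up page names.
sample_hits = [
    ("203.0.113.9", "/emp/timecard.aspx"),
    ("10.1.2.3", "/emp/timecard.aspx"),
    ("198.51.100.7", "/quick-pay.aspx"),
]
print(outside_allowed(sample_hits, ["10.0.0.0/8"]))
```

Every IP this flags is a person who will be locked out on cutover unless they are on VPN first, which gives the employee heads-up a concrete recipient list.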

How to approach Tom (delicately)

Lead with relief: "We're locking down the server and putting a WAF in front so you don't have to touch the network or database. We just need you to fix these 59 specific lines of SQL that we'll hand you."
Give him the shortest possible, concrete ask (the 59 lines; the path-map review) — only after ACG has done the homework.
Position ACG as the security partner doing the hard parts; the fix is done with him, not to his code.
Tokenization = a future named project we'll scaffold and pair on — not a burden dropped on an overwhelmed solo dev.

Reviewer consensus (Grok 4.3 + Gemini 3 Pro, 2026-06-05)

Both reviewed independently and agree — no material divergence:

Network-gating /emp/ is cosmetic for C0 (customer SQLi as sysadmin tom keeps full reach); real only for the employee-surface. Do the E-bucket + DB de-priv + WAF + SQL segmentation first — pure ACG, higher leverage, no path map needed.
The minimal-Tom real fix = WAF + DB de-priv (ACG) + Tom parameterizes the 59 quo() queries (ACG hands him the lines). Gemini: "the path map is a red herring; fixing those 59 queries kills the SQLi." Grok: same, with the split still useful to shrink the employee surface + the interim login scope.
Tokenization is underestimated — a multi-month, multi-surface rewrite of payments + the GTIware engine; must be a scaffolded sub-project, not a small Tom step.
ACG should derive the candidate path map first (logs + source + trace) and bring Tom only "review and correct," because in a 20yo intertwined monolith "confirm no customer page depends on /emp/" is not a 5-minute answer.
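
A first cut of the candidate path map can come straight from the IIS logs. The sketch below assumes W3C extended log format with a `#Fields:` header that includes `cs-uri-stem`; the function names and the sample page names are illustrative, and a plain `/emp/` prefix split is only a starting point for Tom to correct.

```python
from collections import Counter


def parse_w3c(lines):
    """Yield one dict per request line, keyed by the names in '#Fields:'."""
    fields: list[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):  # skip truncated or odd lines
                yield dict(zip(fields, values))


def candidate_path_map(lines, employee_prefix="/emp/"):
    """Bucket requested paths into 'employee' vs 'customer' with hit counts."""
    buckets = {"employee": Counter(), "customer": Counter()}
    for row in parse_w3c(lines):
        path = row.get("cs-uri-stem", "").lower()
        if not path:
            continue
        side = "employee" if path.startswith(employee_prefix) else "customer"
        buckets[side][path] += 1
    return buckets


# Made-up sample in W3C extended format.
sample_log = [
    "#Fields: date time cs-method cs-uri-stem c-ip sc-status",
    "2026-06-04 14:02:11 GET /quick-pay.aspx 203.0.113.9 200",
    "2026-06-04 14:02:15 POST /emp/timecard.aspx 198.51.100.7 200",
]
for side, counts in candidate_path_map(sample_log).items():
    print(side, dict(counts))
```

This does not catch the hard case (a customer page that pulls an `/emp/` asset or AJAX endpoint); that needs the referer field or the source grep, and is exactly what the "review and correct" step is for.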

v0.2 — rebuilt around the Grok+Gemini consensus. Steve has authorized ACG to proceed (Tier A is execution-cleared, applying outage-grade rollback discipline); full execution still gated on the network recon (site-server RMM enrollment) and Tom's path-map + 59-query work.