claudetools/CATALOG_SOLUTIONS.md
Mike Swanson b79c47acb9 sync: Auto-sync from ACG-M-L5090 at 2026-01-26 16:45:54
Synced files:
- Complete claude-projects import (5 catalog files)
- Client directory with 12 clients
- Project directory with 12 projects
- Credentials updated (100+ sets)
- Session logs consolidated
- Agent coordination rules updated
- Task management integration

Major work completed:
- Exhaustive cataloging of claude-projects
- All session logs analyzed (38 files)
- All credentials extracted and organized
- Client infrastructure documented
- Problem solutions cataloged (70+)

Machine: ACG-M-L5090
Timestamp: 2026-01-26 16:45:54

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 16:23:47 -07:00


Technical Solutions Catalog

Generated: 2026-01-26
Source: Session logs from C:\Users\MikeSwanson\claude-projects\session-logs
Purpose: Comprehensive catalog of all technical problems, solutions, and insights


Table of Contents

  1. Tailscale & VPN
  2. Database & Migration
  3. Web Applications & JavaScript
  4. Email & DNS
  5. Legacy Systems & DOS
  6. Development & Build Systems
  7. Authentication & Security
  8. Infrastructure & Networking
  9. Software Updates & Auto-Update
  10. Cross-Platform Compatibility

Tailscale & VPN

Problem: Tailscale Stopped Working After pfSense Upgrade

Date: 2025-12-12
Technologies: pfSense, Tailscale, VPN

Symptom:

  • Tailscale stopped working after pfSense upgrade
  • Device removed from Tailscale admin console
  • Reinstalling package didn't resolve issue

Root Cause:

  • pfSense upgrade cleared Tailscale authentication state
  • Service was running but logged out

Solution:

# Check status
tailscale status  # Showed "Logged out"
service tailscaled status  # Confirmed running (pid 67768)

# Re-authenticate with preserved settings
tailscale up --accept-routes --advertise-routes=172.16.0.0/22

# Follow login URL in browser to complete authentication

Verification:

  • pfSense now shows IP 100.127.188.119
  • Device visible in Tailscale admin console
  • Routes properly advertised

Technologies: pfSense (FreeBSD), Tailscale, routing

Lesson Learned: Major OS upgrades may clear authentication state even if service configurations persist. Always preserve critical flags like --accept-routes and --advertise-routes when re-authenticating.


Problem: RADIUS Authentication Failing for OpenVPN

Date: 2025-12-20
Technologies: OpenVPN, RADIUS, NPS, Windows Server, pam_radius_auth

Symptom:

  • OpenVPN connections timing out when RADIUS authentication enabled
  • Local authentication worked fine
  • Error: "RADIUS server failed to respond"

Root Cause:

  • NPS (Network Policy Server) requires Message-Authenticator attribute in RADIUS requests
  • UDM's pam_radius_auth module doesn't send Message-Authenticator attribute
  • NPS Event ID 17 showed Message-Authenticator requirement mismatch

Solution:

# Connect to NPS server
$pass = ConvertTo-SecureString 'Paper123!@#' -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential('INTRANET\sysadmin', $pass)

# Disable Message-Authenticator requirement
Invoke-Command -ComputerName 192.168.0.27 -Credential $cred -ScriptBlock {
    Set-NpsRadiusClient -Name "unifi" -AuthAttributeRequired $false

    # Restart NPS service (inside the script block so it runs on the NPS server, not locally)
    Restart-Service IAS
}

Verification:

# Check for successful auth events
Get-WinEvent -FilterHashtable @{LogName='Security'; ID=6272} -MaxEvents 5
# Event 6272 = "Network Policy Server granted access to a user"

Technologies: OpenVPN, RADIUS, NPS, PAM, Windows Server

Related Issue: UniFi bug - exported .ovpn profiles ignore redirectGateway: false setting and include redirect-gateway def1 anyway

Workaround: Create custom .ovpn profiles without redirect-gateway line

Lesson Learned: RADIUS implementations vary in attribute requirements. PAM-based RADIUS clients may not support all attributes that Windows NPS expects. Check NPS Event Viewer (Event ID 17) for detailed authentication failure reasons.
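
To make the missing attribute concrete, here is a minimal Python sketch of how a RADIUS client could attach a Message-Authenticator (attribute 80) to an Access-Request: an HMAC-MD5 over the whole packet, keyed with the shared secret, computed while the attribute's 16 value bytes are zero. The function names, the sample secret, and the simplification that attribute 80 is the last attribute are assumptions for illustration. This is not the code used by pam_radius_auth or NPS.

```python
import hashlib
import hmac
import struct

ACCESS_REQUEST = 1               # RADIUS packet code
ATTR_USER_NAME = 1
ATTR_MESSAGE_AUTHENTICATOR = 80


def attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one RADIUS attribute as type, length, value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_access_request(identifier: int, username: str, secret: bytes,
                         request_authenticator: bytes) -> bytes:
    """Build a minimal Access-Request carrying a Message-Authenticator."""
    attrs = attribute(ATTR_USER_NAME, username.encode())
    # 16 zero bytes as placeholder; the HMAC is computed over the packet in this state.
    attrs += attribute(ATTR_MESSAGE_AUTHENTICATOR, bytes(16))
    length = 20 + len(attrs)
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, length)
    packet = header + request_authenticator + attrs
    digest = hmac.new(secret, packet, hashlib.md5).digest()
    # Attribute 80 is last in this sketch, so its value is the final 16 bytes.
    return packet[:-16] + digest


def verify_message_authenticator(packet: bytes, secret: bytes) -> bool:
    """Recompute the HMAC with the field zeroed (assumes attribute 80 is last)."""
    received = packet[-16:]
    zeroed = packet[:-16] + bytes(16)
    expected = hmac.new(secret, zeroed, hashlib.md5).digest()
    return hmac.compare_digest(received, expected)
```

A client that omits this attribute sends a shorter packet with nothing for the server to verify, which is why a server configured to require it drops the request.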


Database & Migration

Problem: MariaDB Strict Mode Rejecting Legacy PHP Code

Date: 2025-12-15
Site: data.grabbanddurando.com
Technologies: PHP 8.1, MariaDB 10.11, MySQL strict mode

Symptom:

  • Site migrated from GoDaddy VPS to new cPanel server
  • INSERT statements failing with data type validation errors
  • Calendar and form submissions broken

Root Cause: New server runs MariaDB in strict mode (STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION) which enforces data type validation. Old GoDaddy server used lenient MySQL that silently accepted invalid data.

Issues Identified:

Issue 1: Empty String for Auto-Increment Columns

Problem:

// Broken in strict mode:
insert into activity value('', user_id, 'username', 'message', NOW())

MariaDB strict mode rejects '' for integer columns. GoDaddy's lenient MySQL silently converted '' to 0, which the auto-increment column then replaced with the next ID.

Fix:

// Specify column names, exclude ID field:
insert into activity (uid, uname, message, date)
values(user_id, 'username', 'message', NOW())

Files Modified: 61 PHP files including:

  • editcapability.php
  • manage_calendar.php
  • todo_list.php
  • timer-start-ajax.php
  • All files in services/ directory

Issue 2: Session Handling Order (PHP 8.1)

Problem:

// Before (broken):
session_start();
session_name("law_admin");  // PHP 8.1 rejects this order

Fix:

// After (fixed):
session_name("law_admin");
session_start();

Issue 3: Date Format Parsing

Problem:

// Unreliable with MM/DD/YYYY format:
$event_starts = date('Y-m-d', strtotime($_POST['event_starts']));

Fix:

// Explicit format parsing:
$event_starts = !empty($_POST['event_starts']) &&
    ($dt = DateTime::createFromFormat('m/d/Y', $_POST['event_starts']))
    ? $dt->format('Y-m-d') : date('Y-m-d');
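
The same idea, parse against one explicit format and fall back to today's date on bad input, can be sketched in Python for comparison. This is an illustrative analogue of the PHP fix, not code from the site; the function name and the fallback parameter are assumptions.

```python
from datetime import date, datetime


def parse_us_date(raw, fallback=None):
    """Parse MM/DD/YYYY strictly; return an ISO date string, or the fallback on bad input."""
    fallback = fallback or date.today()
    try:
        return datetime.strptime(raw or "", "%m/%d/%Y").date().isoformat()
    except ValueError:
        # Wrong format, impossible date, or empty input
        return fallback.isoformat()
```

Because the format is stated explicitly, an input such as 15/12/2025 is rejected instead of being guessed at.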

Issue 4: NOT NULL Columns Without Defaults

Problem: Columns defined as NOT NULL without default values caused INSERT failures when values weren't provided.

Fix:

-- Calendar events table
ALTER TABLE gd_calendar_events
  MODIFY assign_to int(11) NOT NULL DEFAULT 0,
  MODIFY recurring varchar(255) NOT NULL DEFAULT '',
  MODIFY gender varchar(10) NOT NULL DEFAULT '',
  MODIFY widget_id int(11) NOT NULL DEFAULT 0,
  MODIFY sort_order varchar(20) NOT NULL DEFAULT '',
  MODIFY done int(11) NOT NULL DEFAULT 0;

-- Activity table
ALTER TABLE activity
  MODIFY uid int(11) NOT NULL DEFAULT 0,
  MODIFY uname varchar(1000) NOT NULL DEFAULT '',
  MODIFY message varchar(2000) NOT NULL DEFAULT '';

-- Cases time table
ALTER TABLE gd_cases_time MODIFY stop_time time DEFAULT NULL;

-- Invited users table
ALTER TABLE gd_invited_users
  MODIFY event_status_id varchar(255) NOT NULL DEFAULT '',
  MODIFY created_on datetime DEFAULT CURRENT_TIMESTAMP,
  MODIFY modified_on datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

Verification:

  • All form submissions working
  • Calendar events creating successfully
  • No SQL errors in error_log

Technologies: PHP 8.1, MariaDB 10.11, SQL strict mode, date parsing

Recommendations:

  1. Always specify column names in INSERT statements
  2. Use NULL or omit ID column for auto-increment
  3. Use DateTime::createFromFormat() for known date formats
  4. Add DEFAULT values to all NOT NULL columns
  5. Test with STRICT_TRANS_TABLES enabled in development

Backup Location: /home/grabblaw/public_html/data_grabbanddurando/backups_mariadb_fix/


Problem: Database Data Sync After DNS Cutover

Date: 2025-12-12
Site: data.grabbanddurando.com
Technologies: MySQL, DNS, rsync, migration

Symptom:

  • Site was migrated to new server (IX)
  • DNS pointed to old GoDaddy VPS (208.109.235.224) for 2 days
  • Users continued updating data on old server Dec 10-11
  • New server missing recent records

Root Cause: DNS cutover timing - users were still hitting old server after migration completed.

Solution:

  1. Identify Missing Records:
-- Compare max IDs between servers
-- GoDaddy: activity max ID 18543
-- IX: activity max ID 18539
-- Result: 4 missing records

-- GoDaddy: gd_calendar_events max ID 14763
-- IX: gd_calendar_events max ID 14762
-- Result: 1 missing record

-- GoDaddy: gd_assign_users max ID 24301
-- IX: gd_assign_users max ID 24299
-- Result: 2 missing records
  2. Sync Missing Records:
# On GoDaddy VPS
# --no-create-info omits DROP TABLE / CREATE TABLE, so the import only adds rows
mysqldump -u root -p grabblaw_gdapp activity \
  --no-create-info --replace --where="id > 18539" \
  > /tmp/activity_missing.sql

mysqldump -u root -p grabblaw_gdapp gd_calendar_events \
  --no-create-info --replace --where="id > 14762" \
  > /tmp/calendar_missing.sql

mysqldump -u root -p grabblaw_gdapp gd_assign_users \
  --no-create-info --replace --where="id > 24299" \
  > /tmp/assign_missing.sql

# Transfer and import on IX
mysql -u grabblaw_gddata -p grabblaw_gdapp_data < activity_missing.sql
mysql -u grabblaw_gddata -p grabblaw_gdapp_data < calendar_missing.sql
mysql -u grabblaw_gddata -p grabblaw_gdapp_data < assign_missing.sql

Verification:

  • Verified max IDs match on both servers
  • All missing records now present on IX
  • No conflicts (auto-increment IDs unique)

Technologies: MySQL, mysqldump, DNS migration, data sync

Lesson Learned: During migrations, monitor DNS propagation and old server access logs. Users may continue hitting old server for hours after cutover. Plan for data sync window or implement read-only mode on old server.
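
The max-ID comparison above can be sketched as a small Python helper that turns the two sets of maximum IDs into bounded ID ranges. The function names are hypothetical, and the approach assumes the new server received no writes of its own during the window, so that IDs above its maximum exist only on the old server.

```python
def missing_ranges(old_max, new_max):
    """Return {table: (first_missing_id, last_missing_id)} where the old server is ahead."""
    gaps = {}
    for table, old_id in old_max.items():
        new_id = new_max.get(table, 0)
        if old_id > new_id:
            gaps[table] = (new_id + 1, old_id)
    return gaps


def where_clause(first_id, last_id):
    """Build a bounded filter that could be passed to mysqldump --where."""
    return f"id >= {first_id} AND id <= {last_id}"


# Max IDs recorded in this entry
godaddy = {"activity": 18543, "gd_calendar_events": 14763, "gd_assign_users": 24301}
ix = {"activity": 18539, "gd_calendar_events": 14762, "gd_assign_users": 24299}

for table, (first_id, last_id) in missing_ranges(godaddy, ix).items():
    print(table, last_id - first_id + 1, where_clause(first_id, last_id))
```

Bounding the range on both sides keeps the export stable even if the old server keeps accepting writes while the dump runs.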


Web Applications & JavaScript

Problem: mod_pagespeed Corrupting JavaScript Code

Date: 2025-12-15
Site: data.grabbanddurando.com
Technologies: Apache mod_pagespeed, JavaScript, FullCalendar

Symptom:

  • Calendar view broken after migration
  • Displayed only small date picker instead of full FullCalendar
  • JavaScript syntax errors in console

Root Cause: mod_pagespeed was corrupting JavaScript code during optimization. Example:

// Original code:
url: 'cal_edit_events.php',

// After mod_pagespeed processing:
urlnts.php',

This caused JavaScript syntax errors ("unexpected token: ':'") preventing calendar from rendering.

Solution:

  1. Disable PageSpeed Globally:
# /etc/apache2/conf.modules.d/510_pagespeed.conf
ModPagespeed off
  2. Add to .htaccess:
# /home/grabblaw/public_html/data_grabbanddurando/.htaccess

# Disable PageSpeed - breaking calendar JavaScript
ModPagespeed off
ModPagespeedDisableFilters combine_javascript,rewrite_javascript

# Force revalidation of JS files after PageSpeed fix
<FilesMatch "\.js$">
    Header set Cache-Control "no-cache, must-revalidate"
</FilesMatch>
  3. Clear PageSpeed Cache:
rm -rf /var/mod_pagespeed/cache/*
systemctl restart httpd

Verification:

  • Calendar now renders correctly
  • Month view shows all events
  • JavaScript errors cleared from console

Technologies: Apache mod_pagespeed, JavaScript optimization, FullCalendar

Related Issue: isset() checks added to PHP broke calendar functionality. Original files (without isset checks) restored from backups.

Lesson Learned: mod_pagespeed JavaScript optimization can corrupt code, especially with complex libraries like FullCalendar. Test thoroughly after enabling. Disable for specific directories if needed rather than site-wide.


Problem: PHP Session Changes Breaking Calendar

Date: 2025-12-15
Site: data.grabbanddurando.com
Technologies: PHP, sessions, FullCalendar

Symptom:

  • Calendar stopped working after adding isset() checks
  • No JavaScript errors
  • Events not loading

Root Cause: Added "defensive" isset() checks to session variables broke calendar logic:

// Original (working):
$ciud = $_SESSION['userinfo']['user_id'];

// Changed to (BROKE IT):
$ciud = isset($_SESSION['userinfo']['user_id']) ?
    $_SESSION['userinfo']['user_id'] : 0;

When the session key is missing, the fallback value 0 is used as the user ID, so the calendar query fails or returns the wrong data.

Solution: Reverted PHP files to original versions from backups:

  • manage_calendar.php ← manage_calendar.php.bak
  • manage_calendar_page1.php ← manage_calendar_page1.php.bak
  • cal_events.php ← cal_events.php.bak3

Verification:

  • Calendar loads correctly
  • Events display in all views
  • PHP notices still appear in error_log but don't affect functionality

Technologies: PHP, session management, defensive programming

Lesson Learned: Not all PHP notices need to be "fixed." If code works correctly and notices are cosmetic, leave it alone. Adding isset() checks can change program logic if defaults don't match expected behavior. Test thoroughly before applying "defensive" code changes.


Email & DNS

Problem: Email Spoofing Due to Missing DMARC/DKIM

Date: 2025-12-19
Domain: bgbuildersllc.com
Technologies: M365, SPF, DMARC, DKIM, email security

Symptom:

  • User received phishing email appearing to be from herself
  • Email had legitimate-looking From address
  • Contained malicious attachment

Root Cause: Domain had SPF configured but missing DMARC and DKIM, allowing external senders to forge From addresses.

| Record | Status | Impact |
|--------|--------|--------|
| SPF | Present (-all) | Fails but not enforced without DMARC |
| DMARC | Missing | No policy to reject forged emails |
| DKIM | Missing | No cryptographic signing |

Investigation Results:

Account NOT Compromised:

  • No mailbox forwarding configured
  • No inbox rules configured
  • No send-as permissions granted
  • Last password change: April 15, 2025
  • Email correctly routed to Junk folder (M365 flagged it)

Solution:

  1. Add DMARC Record:
Type: TXT
Name: _dmarc
Value: v=DMARC1; p=reject; rua=mailto:sysadmin@bgbuildersllc.com
  2. Add DKIM Records:
Type: CNAME
Name: selector1._domainkey
Value: selector1-bgbuildersllc-com._domainkey.sonorangreenllc.onmicrosoft.com

Type: CNAME
Name: selector2._domainkey
Value: selector2-bgbuildersllc-com._domainkey.sonorangreenllc.onmicrosoft.com
  3. Enable DKIM Signing in M365:
  • Navigate to M365 Defender → Email Authentication Settings
  • Enable DKIM signing for domain
  • Wait for DNS propagation (5-10 minutes)
  • Verify selectors detected

Verification:

# Check DMARC
nslookup -type=txt _dmarc.bgbuildersllc.com

# Check DKIM
nslookup -type=cname selector1._domainkey.bgbuildersllc.com
nslookup -type=cname selector2._domainkey.bgbuildersllc.com

Technologies: Microsoft 365, DMARC, DKIM, SPF, DNS, email authentication

Final Configuration:

| Record | Status |
|--------|--------|
| MX | → M365 |
| SPF | includes M365 |
| DMARC | p=reject |
| DKIM | selector1 |
| DKIM | selector2 |

Lesson Learned: SPF alone is insufficient. DMARC enforces SPF/DKIM policies and tells receiving servers to reject forged emails. Always configure all three (SPF + DKIM + DMARC) for proper email authentication.
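
As a quick way to sanity-check a published policy, the sketch below parses a DMARC TXT value into its tags and reports whether the policy is an enforcing one. It works on a string that has already been fetched (for example, copied from the nslookup output). The function names are hypothetical and the parsing is deliberately simplified.

```python
def parse_dmarc(txt):
    """Split a DMARC TXT value into a {tag: value} dict."""
    tags = {}
    for part in txt.split(";"):
        if "=" in part:
            key, _, value = part.partition("=")
            tags[key.strip().lower()] = value.strip()
    return tags


def is_enforcing(txt):
    """True when the record is DMARC1 and its policy is quarantine or reject."""
    tags = parse_dmarc(txt)
    policy = tags.get("p", "none").lower()
    return tags.get("v") == "DMARC1" and policy in {"quarantine", "reject"}


# Record value from this entry
record = "v=DMARC1; p=reject; rua=mailto:sysadmin@bgbuildersllc.com"
print(parse_dmarc(record)["p"], is_enforcing(record))
```

A record with p=none only requests reports, so it would not have blocked the forged message described here.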


Problem: Missing DNS Records After Domain Migration

Date: 2025-12-12
Domain: data.grabbanddurando.com
Technologies: DNS, WHM API, cPanel

Symptom:

  • Domain not resolving after migration to IX server
  • A record missing from DNS zone

Root Cause: During migration, DNS zone was created but A record wasn't added (human error or script failure).

Solution:

# Add A record via WHM API
whmapi1 addzonerecord domain=grabbanddurando.com \
  name=data \
  type=A \
  address=72.194.62.5 \
  ttl=14400

Verification:

# Test resolution on multiple nameservers
nslookup data.grabbanddurando.com 8.8.8.8       # Google
nslookup data.grabbanddurando.com 1.0.0.1       # Cloudflare
nslookup data.grabbanddurando.com 9.9.9.9       # Quad9
nslookup data.grabbanddurando.com 208.67.222.222 # OpenDNS

Technologies: DNS, WHM API, nameservers

Related Fix: Set zone TTL to 600 seconds (was 14400) for faster DNS updates during troubleshooting.

Lesson Learned: After domain migrations, verify ALL DNS records, not just nameservers. Use multiple public DNS servers to test propagation.


Problem: Wrong SSL Certificate Being Served

Date: 2025-12-12
Domain: data.grabbanddurando.com
Technologies: Apache, SSL/TLS, Let's Encrypt, AutoSSL, ServerAlias

Symptom:

  • data.grabbanddurando.com showed SSL error
  • Wrong certificate being served (didn't include bare domain)
  • Certificate was valid for data.grabbanddurando.grabblaw.com

Root Cause: data.grabbanddurando.com was listed as ServerAlias in data.grabbanddurando.grabblaw.com vhost, but that vhost's SSL certificate didn't include the bare domain.

Solution:

  1. Remove DNS A record temporarily (prevent conflicting vhosts)

  2. Add domain as proper subdomain:

# In cPanel for grabblaw account
# Add subdomain: data.grabbanddurando.com
  3. Run AutoSSL:
# WHM → SSL/TLS → Manage AutoSSL
# Run for grabblaw account
# Gets fresh Let's Encrypt certificate
  4. Remove conflicting ServerAlias:
# Edit vhost configuration
# Remove data.grabbanddurando.com from other vhost's ServerAlias
  5. Rebuild Apache config:
/scripts/rebuildhttpdconf
systemctl restart httpd

Verification:

# Check certificate
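# s_client prints the raw certificate; decode it with x509 so the Subject and SAN lines can be matched
openssl s_client -connect data.grabbanddurando.com:443 -servername data.grabbanddurando.com < /dev/null 2>/dev/null | openssl x509 -noout -text | grep -E -A1 "Subject:|Subject Alternative Name"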

# Verify in browser - no SSL warnings

Technologies: Apache, SSL/TLS, Let's Encrypt, AutoSSL, cPanel

Lesson Learned: ServerAlias directives can cause SSL issues if multiple vhosts claim same domain. Always ensure domain is primary (ServerName) in one vhost only, not an alias in multiple vhosts. Run AutoSSL after fixing to get proper certificate.
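
The mismatch can be illustrated with a simplified hostname check against a certificate's name list. This is a sketch of the matching rule only (exact match, or a single leftmost wildcard label), not a full implementation of certificate validation. The function names are hypothetical.

```python
def name_matches(hostname, pattern):
    """Match a hostname against one certificate name, allowing a leftmost '*' label."""
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def cert_covers(hostname, cert_names):
    """True if any name on the certificate matches the requested hostname."""
    return any(name_matches(hostname, name) for name in cert_names)


# The vhost's certificate named only the grabblaw.com subdomain
print(cert_covers("data.grabbanddurando.com", ["data.grabbanddurando.grabblaw.com"]))
```

Serving a site through a ServerAlias does not add that alias to the certificate, so the browser's check fails until a certificate listing the name is issued.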


Legacy Systems & DOS

Problem: DOS Machine Network Access After Crypto Attack

Date: 2025-12-14
Client: Dataforth
Technologies: MS-DOS 6.22, SMB1, Samba, legacy networking

Symptom:

  • ~30 legacy DOS test machines unable to access network shares
  • SMB1 disabled on production servers after crypto attack
  • DOS machines require SMB1 (no SMB2/3 support)

Root Cause: Security hardening after crypto attack disabled SMB1 protocol on production Windows servers. DOS machines only support SMB1 (CORE protocol).

Solution: Deploy Netgear ReadyNAS as SMB1 proxy:

Architecture:

DOS Machines (SMB1) → NAS (SMB1 + SMB2/3) → AD2 (SMB2/3 only)

NAS Configuration:

# /etc/frontview/samba/smb.conf.overrides
server min protocol = CORE
client min protocol = CORE
lanman auth = yes
ntlm auth = yes
raw NTLMv2 auth = yes
client lanman auth = yes
client plaintext auth = yes
netbios name = D2TESTNAS
workgroup = INTRANET
null passwords = yes
map to guest = Bad Password
guest ok = yes
wins support = yes

DOS Machine Configuration:

REM C:\NET\SYSTEM.INI
computername=TS-XX
username=TS-XX
workgroup=INTRANET
logondomain=INTRANET
autologon=yes

REM C:\STARTNET.BAT
net use T: \\D2TESTNAS\test
net use X: \\D2TESTNAS\datasheets

Sync Configuration (NAS to AD2):

# /root/sync-to-ad2.sh - runs every 15 minutes
# Sync test results from NAS to production server

cd /data/test
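# LOG is assumed to be set earlier in the script
# Read one path per line so unusual filenames are not split on whitespace
find . -type f -mmin -60 | while IFS= read -r file; do
    dir=$(dirname "$file")
    smbclient //192.168.0.6/test -A /root/.ad2creds \
      -c "mkdir \"$dir\"; put \"$file\" \"$file\"" 2>> "$LOG"
done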

Issues Resolved:

  1. Error 53 (Network path not found):

    • Fixed by setting server min protocol = CORE (not NT1)
    • Using NetBIOS name instead of IP address
    • Adding LMHOSTS entry or WINS via DHCP
    • Setting correct workgroup (INTRANET)
  2. Error 5 (Access denied):

    • Fixed by setting null passwords: smbpasswd -n ts-XX
    • Enabling null passwords = yes in Samba config
  3. NetBIOS name conflict:

    • Changed NAS name from AD2 to D2TESTNAS
  4. X: drive not found error:

    • Empty root folder, added README.TXT placeholder

Technologies: MS-DOS 6.22, SMB1, Samba, NetBIOS, WINS, legacy networking

Verification:

  • DOS machines can map T: and X: drives
  • Users log in with blank password (press Enter twice)
  • Test data syncs to production server every 15 minutes

Lesson Learned: Legacy systems requiring SMB1 can be isolated using proxy NAS that speaks both SMB1 (to legacy clients) and SMB2/3 (to modern servers). This allows security hardening on production systems while maintaining legacy system functionality.


Problem: DOS Machine Slow Boot (13 Years of File Copying)

Date: 2025-12-14
Technologies: MS-DOS 6.22, batch scripts, file operations

Symptom:

  • DOS machines taking very long time to boot
  • Same files being copied every boot for 13 years
  • Unnecessary disk wear

Root Cause: Boot script CTONW.BAT used COPY instead of MOVE, creating duplicates:

REM Old code - copies EVERY boot
IF EXIST T:\%STATION%\ProdSW\*.EXE COPY T:\%STATION%\ProdSW\*.EXE C:\ATE\

Solution:

REM New code - uses MOVE (one-time operation)
IF EXIST T:\%STATION%\ProdSW\*.EXE MOVE T:\%STATION%\ProdSW\*.EXE C:\ATE\
IF EXIST T:\%STATION%\ProdSW\*.BAT MOVE T:\%STATION%\ProdSW\*.BAT C:\BAT\

Benefits:

  • Faster boot time
  • Files only transferred once
  • Folder empties after move
  • No unnecessary disk operations

Technologies: MS-DOS 6.22, batch files, file operations

Lesson Learned: Review long-running scripts for inefficient operations. A simple COPY vs MOVE change can eliminate 13 years of unnecessary file operations.


Problem: DOS Batch Files Failing After Unix Line Endings

Date: 2025-12-14
Technologies: MS-DOS 6.22, batch files, line endings

Symptom:

  • Batch files created on Linux not executing on DOS
  • Command not found errors
  • Syntax errors in valid commands

Root Cause: DOS requires CR+LF (\r\n) line endings. Files created/edited on Linux have LF (\n) only.

Solution:

# Convert line endings before deployment
sed -i 's/$/\r/' filename.bat

# Or use unix2dos (the reverse of dos2unix):
unix2dos filename.bat

Verification:

# Check file on Linux
file filename.bat
# Should show: ASCII text, with CRLF line terminators

# Or check hex
od -c filename.bat | head
# Should see \r\n at line ends

Technologies: MS-DOS, batch files, line endings, cross-platform

Related Issues:

  • /I flag (case-insensitive IF) doesn't exist in DOS 6.22
  • Trailing backslash in paths causes "Invalid directory" error
  • Leading underscore in folder names may cause issues

Lesson Learned: Always test cross-platform scripts on target platform. Line endings, command flags, and path separators vary between DOS and modern systems.
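
One caveat with the sed approach is that running it twice on the same file doubles the carriage returns. A minimal Python sketch of an idempotent conversion follows. The function names are hypothetical, and it works on raw bytes so that no text encoding is assumed.

```python
import re
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CRLF without doubling existing CRLF pairs."""
    return re.sub(rb"\r?\n", b"\r\n", data)


def convert_file(path) -> None:
    """Rewrite a file in place with CRLF line endings."""
    target = Path(path)
    target.write_bytes(to_crlf(target.read_bytes()))
```

Because the pattern matches an optional carriage return before each newline, lines that already end in CRLF are left unchanged, so the conversion is safe to repeat in a deployment script.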


Problem: UPDATE.BAT Not Syncing to NAS

Date: 2026-01-15
Technologies: PowerShell, SMB, sync scripts

Symptom:

  • UPDATE.BAT modified on AD2 server
  • Changes not appearing on NAS
  • Other files syncing correctly

Root Cause: Sync script (Sync-FromNAS.ps1) only synced COMMON/ProdSW/* files, not root-level UPDATE.BAT.

Solution:

# Modified C:\Shares\test\scripts\Sync-FromNAS.ps1
# Added to PUSH section (AD2 → NAS):

# Sync UPDATE.BAT (root level utility)
Write-Log "Syncing UPDATE.BAT..."
$updateBatLocal = "$AD2_TEST_PATH\UPDATE.BAT"
if (Test-Path $updateBatLocal) {
    $updateBatRemote = "$NAS_DATA_PATH/UPDATE.BAT"

    if ($DryRun) {
        Write-Log "  [DRY RUN] Would push: UPDATE.BAT -> $updateBatRemote"
        $pushedFiles++
    } else {
        $success = Copy-ToNAS -LocalPath $updateBatLocal -RemotePath $updateBatRemote
        if ($success) {
            Write-Log "  Pushed: UPDATE.BAT"
            $pushedFiles++
        } else {
            Write-Log "  ERROR: Failed to push UPDATE.BAT"
            $errorCount++
        }
    }
} else {
    Write-Log "  WARNING: UPDATE.BAT not found at $updateBatLocal"
}

Verification:

  • Backup created: Sync-FromNAS.ps1.backup-20260115-131633
  • Script syntax validated
  • UPDATE.BAT now syncs every 15 minutes

Technologies: PowerShell, SMB, file synchronization, Task Scheduler

Lesson Learned: When troubleshooting sync issues, check both sync script logic AND file locations. Root-level files may not be included in directory-based sync patterns.


Problem: COMMAND.COM Cannot Be Overwritten While Running

Date: 2025-12-14
Technologies: MS-DOS 6.22, system files

Symptom:

  • UPDATE DOS command tries to replace COMMAND.COM
  • Operation fails - file in use

Root Cause: COMMAND.COM is the DOS shell - can't overwrite running executable.

Solution - Staged Update:

REM UPDATE DOS now uses staged approach:
REM 1. Copy COMMAND.COM as C:\COMMAND.NEW
COPY T:\COMMON\DOS\COMMAND.COM C:\COMMAND.NEW

REM 2. Save AUTOEXEC.BAT
COPY C:\AUTOEXEC.BAT C:\AUTOEXEC.SAV

REM 3. Create temporary AUTOEXEC.BAT for next boot
ECHO @ECHO OFF > C:\AUTOEXEC.BAT
ECHO DEL C:\COMMAND.COM >> C:\AUTOEXEC.BAT
ECHO REN C:\COMMAND.NEW COMMAND.COM >> C:\AUTOEXEC.BAT
ECHO CALL C:\DOSFIX2.BAT >> C:\AUTOEXEC.BAT

REM 4. DOSFIX2.BAT restores original AUTOEXEC.BAT after update
REM    DEL C:\AUTOEXEC.BAT
REM    REN C:\AUTOEXEC.SAV AUTOEXEC.BAT

REM 5. User reboots to apply update

Technologies: MS-DOS 6.22, boot process, file operations

Lesson Learned: System files in use can't be replaced directly. Use staged update process: copy new file with different name, modify boot process to rename on next boot, then restore normal boot process.


Development & Build Systems

Problem: Rust Windows Build Failing - Missing msvcrt.lib

Date: 2025-12-18
Technologies: Rust, Visual Studio, MSVC, Windows

Symptom:

  • Windows build failed with linker error
  • Visual Studio missing msvcrt.lib
  • Link.exe present but library files missing

Root Cause: The VS Community 2026 Insiders installation was incomplete. Link.exe existed, but the standard libraries (msvcrt.lib) were missing.

Solution:

  1. Uninstall Broken VS:
winget uninstall "Visual Studio Community 2026 Insiders"
  2. Install Complete VS 2022:
# PowerShell continues a line with a backtick, not a backslash
winget install Microsoft.VisualStudio.2022.BuildTools `
  --override "--add Microsoft.VisualStudio.Workload.VCTools --add Microsoft.VisualStudio.Component.Windows11SDK.22621 --includeRecommended --quiet --wait"
  3. Verify Installation:
# Check for libraries
dir "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\*\lib\x64\msvcrt.lib"

# Build should now work
cargo build --release

What Was Installed:

  • VS 2022 Build Tools (17.14.23)
  • MSVC v14.44.35207
  • Windows 11 SDK (22621)

Technologies: Rust, MSVC, Visual Studio Build Tools, Windows SDK

Lesson Learned: Preview/Insider builds of development tools may be incomplete. For production builds, use stable releases. Verify complete installation by checking for standard library files, not just compiler executables.


Problem: Cross-Compilation from WSL to Windows

Date: 2025-12-21
Technologies: Rust, WSL, cross-compilation, cargo-xwin, cargo-zigbuild

Symptom:

  • Attempted to cross-compile Rust code from WSL (Linux) to Windows
  • cargo-xwin required clang-cl
  • cargo-zigbuild required mingw dlltool
  • Complex toolchain setup

Root Cause: Cross-compilation to Windows requires Windows-specific linker and libraries. Even with cross-compilation tools, dependencies are complex.

Solution: Build on native Windows instead:

# On Windows machine:
winget install rustup
winget install protocolbuffers.protobuf

# Build normally
cd project
cargo build --release

Technologies: Rust, cross-compilation, WSL, native builds

Lesson Learned: For Windows targets, native builds are simpler than cross-compilation. Cross-compilation adds toolchain complexity. Save cross-compilation for scenarios where native build environment isn't available.


Problem: Linux Binary Can't Overwrite Itself During Auto-Update

Date: 2025-12-16
Technologies: Linux, auto-update, file operations

Symptom:

  • Agent auto-update failing on Linux
  • fs::copy() error: "Text file busy"
  • Cannot overwrite running executable

Root Cause: Linux prevents overwriting a running executable file. Using fs::copy(new_binary, current_binary) fails when current binary is running.

Solution - Rename-Then-Copy:

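use std::fs;
use std::os::unix::fs::PermissionsExt; // provides Permissions::from_mode
use std::process::Command;

// Rename current binary to .old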
fs::rename("/usr/local/bin/gururmm-agent",
           "/usr/local/bin/gururmm-agent.old")?;

// Copy new binary to original location
fs::copy(new_binary, "/usr/local/bin/gururmm-agent")?;

// Set executable permission
fs::set_permissions("/usr/local/bin/gururmm-agent",
                   fs::Permissions::from_mode(0o755))?;

// Restart service
Command::new("systemctl")
    .args(&["restart", "gururmm-agent"])
    .output()?;

// Old binary can be deleted after restart

Technologies: Linux, file operations, systemd, auto-update

Lesson Learned: On Linux, rename the running executable before replacing it. The running process keeps using the renamed file, and the original pathname becomes free for the new binary. Writing into the live file in place, as fs::copy over the running binary's path does, fails with "Text file busy".
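
The same rename-then-copy sequence can be sketched in Python, with a rollback added so that a failed copy does not leave the service without a binary. The function name and the rollback step are assumptions for illustration. The agent itself is written in Rust, as shown above.

```python
import os
import shutil


def replace_binary(current_path, new_path):
    """Swap in new_path at current_path using rename-then-copy; return the .old path."""
    old_path = current_path + ".old"
    # The running process keeps its open file, which is now named .old
    os.replace(current_path, old_path)
    try:
        shutil.copyfile(new_path, current_path)
        os.chmod(current_path, 0o755)
    except OSError:
        # Roll back so the original binary is back at its usual path
        os.replace(old_path, current_path)
        raise
    return old_path
```

The caller would restart the service afterwards and remove the returned .old file once the new version is confirmed running.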


Problem: sudo Password with Special Characters

Date: 2025-12-20
Technologies: SSH, sudo, special characters, shell escaping

Symptom:

  • Remote sudo commands failing
  • Password contains special characters ($, #, !, @)
  • Characters interpreted by shell

Root Cause: Special characters in password interpreted by shell before reaching sudo:

# This fails:
ssh -tt user@host "echo 'pass$word' | sudo -S command"
# $ interpreted as variable expansion

Solution - Use Heredoc:

# Heredoc prevents shell interpretation
ssh -tt user@host "sudo command" << 'PASSWD'
Gptf*77ttb123!@#-rmm
PASSWD

The single quotes around 'PASSWD' prevent variable expansion in heredoc.

Technologies: SSH, sudo, bash, heredoc, shell escaping

Alternative Solutions:

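# 1. Escape special characters (tedious):
# Escape $ for the local shell; the remote single quotes protect the rest
ssh -tt user@host "echo 'pass\$word' | sudo -S command"

# 2. Use expect script (requires expect installed):
# Single-quote the script locally so \$ reaches Tcl as a literal dollar sign
expect -c 'spawn ssh user@host sudo command; expect "assword"; send "pass\$word\r"; interact'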

# 3. Configure NOPASSWD in sudoers (security tradeoff):
user ALL=(ALL) NOPASSWD: /path/to/command

Lesson Learned: For passwords with special characters, use heredoc with single quotes to prevent any shell interpretation. Don't rely on escaping - it's error-prone with multiple special characters.
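
When commands are assembled by a script rather than typed by hand, the quoting can be delegated to a library. The sketch below uses Python's shlex.quote to build a command string that a POSIX shell splits back into exactly the original arguments. The helper name and the sample value are hypothetical. It illustrates the quoting only and is not a recommendation to put secrets on a command line.

```python
import shlex


def remote_command(*args):
    """Join arguments into one string that a POSIX shell parses back into the same arguments."""
    return " ".join(shlex.quote(arg) for arg in args)


tricky = "pa$$w0rd!@#"  # hypothetical value, not a real credential
print(remote_command("echo", tricky))
```

Round-tripping the result through shlex.split is a cheap way to confirm that none of the special characters would be expanded or lost.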


Authentication & Security

Problem: Account Compromise Investigation (Email Spoofing vs Hack)

Date: 2025-12-19
Client: BG Builders LLC
Technologies: Microsoft 365, PowerShell, email security

Symptom:

  • User received phishing email appearing to be from herself
  • Email had legitimate-looking From address
  • Attachment: "Shelly_Bonus.pdf"

Investigation Process:

# Connect to Exchange Online
Connect-ExchangeOnline

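# Check for forwarding (forwarding settings are properties of the mailbox object)
Get-Mailbox shelly@bgbuildersllc.com | Select-Object ForwardingAddress, ForwardingSmtpAddress, DeliverToMailboxAndForward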

# Check inbox rules
Get-InboxRule -Mailbox shelly@bgbuildersllc.com

# Check send-as permissions
Get-RecipientPermission shelly@bgbuildersllc.com | Where-Object {$_.Trustee -ne "NT AUTHORITY\SELF"}

# Check mailbox permissions
Get-MailboxPermission shelly@bgbuildersllc.com | Where-Object {$_.User -ne "NT AUTHORITY\SELF"}

# Check account status
Get-Mailbox shelly@bgbuildersllc.com | Select-Object *PasswordLastSet*, WhenCreated, IsMailboxEnabled

Results:

| Check | Result |
|-------|--------|
| Mailbox Forwarding | None configured |
| Inbox Rules | None configured |
| Send-As Permissions | None granted |
| Mailbox Permissions | Normal |
| Account Enabled | Yes |
| Last Password Change | April 15, 2025 |
| Last Logon | Dec 19, 2025 14:02 |

Conclusion: Email SPOOFING, not account compromise. Attacker forged From address externally. M365 correctly flagged: "We could not verify the identity of the sender" and routed to Junk folder.

Technologies: Microsoft 365, Exchange Online, PowerShell, email security, forensics

Lesson Learned: Always investigate systematically before declaring account compromised. Check forwarding rules, inbox rules, permissions, and authentication logs. External spoofing is more common than actual account compromise.


Infrastructure & Networking

Problem: WHM Plugin Registration 403 Error

Date: 2025-12-19
Technologies: WHM, cPanel, plugin development

Symptom:

  • Cloudflare DNS Manager WHM plugin returns 403
  • Error: "unregistered application"
  • Plugin worked previously

Root Cause: AppConfig file had wrong URL format:

# Wrong (missing the leading /cgi/ path):
url=addon_cloudflareDNS.cgi

# Correct:
url=/cgi/addon_cloudflareDNS.cgi

Solution:

# Fix AppConfig
vim /var/cpanel/apps/cloudflare_dns.conf
# Change url= line to include /cgi/ path

# Re-register plugin
/usr/local/cpanel/bin/register_appconfig /var/cpanel/apps/cloudflare_dns.conf

# Rebuild WHM interface
/usr/local/cpanel/scripts/rebuild_whm_chrome

# Restart cPanel services
/scripts/restartsrv_cpsrvd

Verification:

  • Plugin now appears in WHM sidebar
  • Click plugin - loads without 403 error
  • Can manage DNS records

Technologies: WHM, cPanel, AppConfig, plugin development

Lesson Learned: WHM AppConfig URL paths must include full CGI path. If plugin registration fails with 403, check AppConfig URL format first.


Problem: Windows Agent Service Name Wrong in Auto-Update

Date: 2025-12-20 Technologies: Windows Services, auto-update, Rust

Symptom:

  • Auto-update trying to restart service
  • Command failing: net start gururmm-agent
  • Error: "service name invalid"

Root Cause: Code used internal service name instead of Windows service name:

// Wrong service name:
Command::new("net")
    .args(&["start", "gururmm-agent"])  // Internal name, not registered with Windows
    .output()?;

// Correct service name:
Command::new("net")
    .args(&["start", "GuruRMMAgent"])  // Name registered with the Windows Service Control Manager
    .output()?;

Service names are different:

  • Binary name: gururmm-agent.exe
  • Service name (internal): gururmm-agent (what we used)
  • Service name (Windows): GuruRMMAgent (what Windows expects)

Solution: Fixed in 5 places in agent/src/updater/mod.rs:

// Rollback watchdog script
Command::new("net").args(&["start", "GuruRMMAgent"])

// Service restart command
Command::new("sc").args(&["start", "GuruRMMAgent"])

Technologies: Windows Services, Rust, service management

Verification:

# Check service name
Get-Service | Where-Object {$_.Name -like "*rmm*"}

# Name          Status
# ----          ------
# GuruRMMAgent  Running

Lesson Learned: Windows service name (what sc and net commands use) may differ from binary name and internal service name. Always verify service name with Get-Service before hardcoding in scripts.
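
One way to keep the name from drifting across call sites is to define it once and build every `sc`/`net` invocation through a single helper. The sketch below assumes that approach; `WINDOWS_SERVICE_NAME` and `service_command` are hypothetical names for illustration, not taken from agent/src/updater/mod.rs.

```rust
use std::process::Command;

/// Name registered with the Windows Service Control Manager.
/// Hypothetical constant: the point is that the name is defined exactly once.
const WINDOWS_SERVICE_NAME: &str = "GuruRMMAgent";

/// Builds (but does not run) a service-control command for the agent.
/// `tool` is "sc" or "net"; `action` is "start" or "stop".
fn service_command(tool: &str, action: &str) -> Command {
    let mut cmd = Command::new(tool);
    cmd.args([action, WINDOWS_SERVICE_NAME]);
    cmd
}
```

A caller would then write `service_command("sc", "start").output()?` instead of repeating the string literal.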


Problem: File Descriptors Exhausted on Linux Agent

Date: 2025-12-20 Technologies: Linux, file descriptors, resource limits

Symptom:

  • Agent crashed with error 24 (EMFILE)
  • Error message: "No file descriptors available"
  • System logs show FD exhaustion

Root Cause: Agent likely opening too many files/sockets without closing them properly. Possible causes:

  • WebSocket connections not cleaned up
  • Log files held open
  • Metrics collection opening files
  • Process handles not closed

Investigation Commands:

# Check current FD limit
ulimit -n

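# Check FD usage by process (lsof also lists memory-mapped files, so this overcounts)
lsof -p $(pidof gururmm-agent) | wc -l

# Exact count of open descriptors
ls /proc/$(pidof gururmm-agent)/fd | wc -l

# Check FD limit for the running process
cat /proc/$(pidof gururmm-agent)/limits | grep "open files"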

# Monitor FD usage over time
watch -n 1 'lsof -p $(pidof gururmm-agent) | wc -l'

Temporary Solution:

# Increase FD limit for the service by adding this to
# /etc/systemd/system/gururmm-agent.service:
[Service]
LimitNOFILE=65536

# Reload and restart
systemctl daemon-reload
systemctl restart gururmm-agent

Long-term Solution: Audit code for resource leaks:

// Ensure file handles closed
let file = File::open("path")?;
// Use file
drop(file);  // Explicit close

// Use RAII patterns
{
    let _file = File::open("path")?;
    // File automatically closed when _file goes out of scope
}

// For async WebSocket connections
// Ensure connections properly closed on errors

Technologies: Linux, file descriptors, resource limits, systemd

Status: Temporary fix applied (increased limit), root cause investigation pending

Lesson Learned: Monitor file descriptor usage in long-running services. Implement proper resource cleanup (RAII in Rust). Set generous FD limits for services but also fix resource leaks.
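
The monitoring part of this lesson can be done from inside the agent. The following is a minimal sketch, assuming a Linux host with /proc mounted; `count_open_fds` and `fd_usage_is_high` are hypothetical helper names, and the threshold is an arbitrary illustration rather than a value from the agent.

```rust
use std::fs;
use std::io;

/// Counts the entries in /proc/self/fd, i.e. the descriptors this process
/// currently holds. Linux-only. The directory handle used for the listing
/// is itself included in the count while the scan runs.
fn count_open_fds() -> io::Result<usize> {
    Ok(fs::read_dir("/proc/self/fd")?.count())
}

/// Returns true once `open` has reached `threshold_pct` percent of `limit`.
/// A zero limit is treated as "unknown" and never reports high usage.
fn fd_usage_is_high(open: usize, limit: usize, threshold_pct: usize) -> bool {
    limit > 0 && open * 100 >= limit * threshold_pct
}
```

Logging the count periodically, and warning when `fd_usage_is_high` returns true, would show a leak as a steadily rising number well before the limit is hit.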


Software Updates & Auto-Update

Problem: Agent Auto-Update Binary Replacement

Date: 2025-12-16 Technologies: Auto-update, binary replacement, Linux, Windows

Symptom:

  • Agent binary can't be overwritten during auto-update
  • Linux: "Text file busy"
  • Windows: File in use

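Root Cause: Operating systems prevent overwriting a running executable in place. Linux reports "Text file busy" on a write to the file but still allows it to be renamed, which is why the steps below move the old binary aside before copying in the new one.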

Solution - Multi-Step Process:

Linux:

// 1. Backup current binary
fs::copy("/usr/local/bin/gururmm-agent",
         "/etc/gururmm/gururmm-agent.backup")?;

// 2. Save pending update info
let pending = PendingUpdate { update_id, version, ... };
fs::write("/etc/gururmm/pending-update.json",
          serde_json::to_string(&pending)?)?;

// 3. Spawn rollback watchdog (180 second timeout)
let watchdog = r#"#!/bin/bash
sleep 180
if [ -f /etc/gururmm/pending-update.json ]; then
    systemctl stop gururmm-agent
    cp /etc/gururmm/gururmm-agent.backup /usr/local/bin/gururmm-agent
    systemctl start gururmm-agent
fi
"#;
Command::new("bash").arg("-c").arg(watchdog)
    .spawn()?;

// 4. Rename current binary
fs::rename("/usr/local/bin/gururmm-agent",
           "/usr/local/bin/gururmm-agent.old")?;

// 5. Copy new binary
fs::copy(new_binary, "/usr/local/bin/gururmm-agent")?;
fs::set_permissions("/usr/local/bin/gururmm-agent",
                   fs::Permissions::from_mode(0o755))?;

// 6. Restart service
Command::new("systemctl")
    .args(&["restart", "gururmm-agent"])
    .output()?;

Windows:

// Similar process using Windows scheduled task for watchdog
// sc stop GuruRMMAgent
// copy new binary
// sc start GuruRMMAgent

Rollback Safety:

  • Watchdog script runs for 3 minutes
  • If agent doesn't confirm successful update, watchdog rolls back
  • Agent confirms by including previous_version in auth message
  • Server clears pending update on successful confirmation

Technologies: Auto-update, file operations, process management, systemd, Windows Services

Lesson Learned: Auto-update requires multi-step process: backup, spawn watchdog, replace binary, restart. Always implement rollback safety - watchdog restores backup if new version doesn't start successfully.
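
The backup, rename, and copy steps can be isolated into one function that undoes the rename if the copy fails. This is a minimal sketch using only the standard library; `replace_binary` and its parameters are hypothetical, and it leaves out the watchdog and the service restart.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Swaps `new_binary` into place at `current`, keeping a copy at `backup`.
/// If installing the new file fails, the old one is renamed back.
fn replace_binary(current: &Path, new_binary: &Path, backup: &Path) -> io::Result<()> {
    // 1. Keep a copy that a watchdog could restore later.
    fs::copy(current, backup)?;

    // 2. Move the old binary aside (e.g. gururmm-agent -> gururmm-agent.old).
    let old = current.with_extension("old");
    fs::rename(current, &old)?;

    // 3. Install the new binary; undo the rename on failure.
    if let Err(e) = fs::copy(new_binary, current) {
        let _ = fs::remove_file(current); // drop any partially written copy
        fs::rename(&old, current)?;
        return Err(e);
    }

    // 4. Mark the new file executable (Unix only).
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        fs::set_permissions(current, fs::Permissions::from_mode(0o755))?;
    }
    Ok(())
}
```

Restoring inside the function covers a failed copy; the watchdog is still needed for the case where the copy succeeds but the new version fails to start.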


Problem: Old Agent Versions Can't Auto-Update

Date: 2025-12-20 Technologies: Versioning, backward compatibility, auto-update

Symptom:

  • Agent v0.1.0 doesn't auto-update
  • Server sends Update message but nothing happens
  • Manual update required

Root Cause: Auto-update module was added in v0.3.4. Agents before v0.3.4 don't have code to process Update messages.

Solution:

Immediate: Manual update required for v0.1.0 → v0.3.4+:

# Linux
curl -O https://rmm-api.azcomputerguru.com/downloads/gururmm-agent-linux-amd64-0.5.0
sudo systemctl stop gururmm-agent
sudo cp gururmm-agent-linux-amd64-0.5.0 /usr/local/bin/gururmm-agent
sudo chmod +x /usr/local/bin/gururmm-agent
sudo systemctl start gururmm-agent

# Windows
# Download gururmm-agent-windows-amd64-0.5.0.exe
# Stop service, replace binary, start service

Long-term:

  • Track minimum auto-update version in database
  • Show "manual update required" for agents below v0.3.4
  • In future, include basic updater stub in ALL versions (even v1.0)

Technologies: Versioning, backward compatibility, update mechanisms

Lesson Learned: Essential features (like auto-update) should be included in initial releases. For existing deployments, implement "bootstrap" update to get old agents to minimum supported version.
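
The "manual update required" check reduces to a version comparison against v0.3.4. A minimal sketch follows, assuming agents report plain `major.minor.patch` strings; `parse_version`, `MIN_AUTO_UPDATE`, and `supports_auto_update` are hypothetical names, not the server's actual code.

```rust
/// Parses "0.3.4" (optionally prefixed with 'v') into (major, minor, patch).
/// Returns None for anything that is not exactly three numeric components.
fn parse_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.trim().trim_start_matches('v').split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// First version that contains the auto-update module.
const MIN_AUTO_UPDATE: (u32, u32, u32) = (0, 3, 4);

/// True when the agent can process Update messages itself.
/// Unparseable versions are treated as needing a manual update.
fn supports_auto_update(agent_version: &str) -> bool {
    // Tuples compare lexicographically, so 0.10.0 sorts above 0.3.4.
    parse_version(agent_version).map_or(false, |v| v >= MIN_AUTO_UPDATE)
}
```

Comparing parsed tuples rather than raw strings avoids the trap where "0.10.0" sorts below "0.3.4" as text.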


Problem: Agent Not Receiving Commands After Server Reboot

Date: 2025-12-20 Technologies: WebSocket, connection management, stateful connections

Symptom:

  • Agent shows as connected in dashboard
  • Commands sent to agent not executed
  • No errors on server or agent

Root Cause: Stale WebSocket connection. Agent established connection before server reboot. Connection appears open but server can't send commands through it.

Solution:

# Restart agent to establish fresh connection
sudo systemctl restart gururmm-agent

# After restart, commands work correctly

Long-term Fix: Implement connection health checks:

Server-side:

// Send ping every 30 seconds
// This loop only detects a failed send; closing the connection after
// 60 seconds without a pong needs separate tracking of the last pong time
tokio::spawn(async move {
    loop {
        tokio::time::sleep(Duration::from_secs(30)).await;
        if socket.send(Message::Ping(vec![])).await.is_err() {
            break;  // Connection dead
        }
    }
});
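
Enforcing the 60-second pong timeout needs a record of when the last pong arrived. The sketch below is transport-agnostic and uses only the standard library; the `Liveness` type is a hypothetical helper, and wiring it into the WebSocket read loop is left out.

```rust
use std::time::{Duration, Instant};

/// Tracks when the peer last answered a ping.
struct Liveness {
    last_pong: Instant,
    timeout: Duration,
}

impl Liveness {
    /// Starts the clock at `now`, as if a pong had just been received.
    fn new(now: Instant, timeout: Duration) -> Self {
        Liveness { last_pong: now, timeout }
    }

    /// Call whenever a Pong frame arrives.
    fn record_pong(&mut self, now: Instant) {
        self.last_pong = now;
    }

    /// True when no pong has been seen for longer than the timeout.
    fn is_stale(&self, now: Instant) -> bool {
        now.duration_since(self.last_pong) > self.timeout
    }
}
```

The ping task would check `is_stale(Instant::now())` before each send and drop the connection when it returns true. Passing `now` in as a parameter keeps the logic testable without real delays.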

Agent-side:

// Respond to pings
Message::Ping(data) => {
    socket.send(Message::Pong(data)).await?;
}

// Reconnect if connection lost
// Exponential backoff: 1s, 2s, 4s, 8s, max 60s

Technologies: WebSocket, connection health, keepalive, reconnection

Lesson Learned: Implement keepalive/ping-pong for long-lived connections. Detect stale connections and automatically reconnect. Don't rely on TCP keepalive alone - implement application-level health checks.
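
The reconnect schedule mentioned for the agent (1s, 2s, 4s, 8s, capped at 60s) can be computed from the attempt number alone. A minimal sketch, with `reconnect_delay` as a hypothetical helper name:

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based):
/// 1s, 2s, 4s, 8s, ... doubling each time and capped at 60 seconds.
fn reconnect_delay(attempt: u32) -> Duration {
    const MAX_SECS: u64 = 60;
    // checked_shl returns None once the shift exceeds the integer width,
    // which is treated the same as hitting the cap.
    let secs = 1u64.checked_shl(attempt).unwrap_or(MAX_SECS).min(MAX_SECS);
    Duration::from_secs(secs)
}
```

The attempt counter should reset to zero after a successful connection so that a later disconnect starts again at one second.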


Cross-Platform Compatibility

Problem: Windows Server 2008 R2 Binary Won't Run

Date: 2025-12-18 Technologies: Windows Server 2008 R2, Rust, compatibility

Symptom:

  • GuruRMM agent binary compiled with --features legacy
  • Binary runs on Windows 10/11 and Server 2016+
  • Binary fails silently on Server 2008 R2
  • Even --help produces no output

Root Cause: Likely missing VC++ Runtime or dependency using API not available in 2008 R2:

  • Windows API minimum version mismatch
  • Missing Visual C++ Redistributable
  • Dependency crate using modern Windows APIs

Investigation Steps:

# Check for VC++ runtime
Get-ItemProperty HKLM:\SOFTWARE\Microsoft\VisualStudio\*\VC\Runtimes\* |
    Select-Object PSChildName, Version

# Check Event Viewer for crash details
Get-EventLog -LogName Application -Newest 10 -EntryType Error

# Check dependencies
dumpbin /dependents gururmm-agent.exe

Potential Culprits:

  • sysinfo crate - may use modern Windows APIs
  • local-ip-address crate - may require newer Windows networking APIs
  • Tokio async runtime - requires certain Windows features

Potential Solutions:

  1. Minimal 2008 R2 Build:
[features]
legacy-2008 = []  # Ultra-minimal feature set

[dependencies]
# Replace sysinfo with manual WMI queries
# Replace local-ip-address with ipconfig parsing
# Use minimal async runtime
  2. Install VC++ Redistributable:
# Download and install Visual C++ 2015-2022 Redistributable (x64)
# vcredist_x64.exe from Microsoft
  3. Check Minimum Windows Version:
// In build.rs or lib.rs
#[cfg(windows)]
fn check_os_version() {
    // Verify Windows version >= 6.1 (2008 R2)
}

Technologies: Windows Server 2008 R2, Rust, VC++ Runtime, compatibility

Status: ⚠️ Unresolved - needs VPN access to test solutions

Lesson Learned: Test binaries on target OS version early in development. Some Rust crates use modern APIs not available in older Windows versions. Consider maintaining separate builds for legacy OS support.
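
A version check of the kind stubbed out above could be written without extra crates by parsing the output of `cmd /c ver`. The sketch below assumes the English output format `Microsoft Windows [Version 6.1.7601]`; `parse_ver_output` and `meets_minimum` are hypothetical helpers. It only helps when the process starts at all, so it would not explain the silent failure by itself.

```rust
/// Extracts (major, minor) from `ver` output such as
/// "Microsoft Windows [Version 6.1.7601]". Assumes English output.
fn parse_ver_output(output: &str) -> Option<(u32, u32)> {
    let start = output.find("Version ")? + "Version ".len();
    let rest = &output[start..];
    let end = rest.find(']').unwrap_or(rest.len());
    let mut parts = rest[..end].trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True for Windows 6.1 (Windows 7 / Server 2008 R2) or newer.
fn meets_minimum(version: (u32, u32)) -> bool {
    version >= (6, 1)
}
```

On Windows the input string would come from `Command::new("cmd").args(["/c", "ver"]).output()`.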


Problem: Feature Flags for Legacy vs Modern Windows

Date: 2025-12-18 Technologies: Rust, Cargo features, conditional compilation

Symptom:

  • Need two builds: modern (Windows 10+) and legacy (Windows 7/2008 R2)
  • Modern uses Windows Service APIs directly
  • Legacy needs alternative service installation (no Windows Service APIs)

Solution - Cargo Feature Flags:

# agent/Cargo.toml
[features]
default = ["native-service"]
native-service = ["dep:windows-service", "dep:windows"]
legacy = []

[dependencies]
windows-service = { version = "0.7", optional = true }
windows = { version = "0.58", optional = true, features = ["Win32_System_Services"] }

Code with Feature Gates:

// agent/src/service.rs

#[cfg(all(windows, feature = "native-service"))]
pub mod windows_service {
    use windows_service::*;
    // Full Windows Service implementation
}

#[cfg(all(windows, not(feature = "native-service")))]
pub mod windows_service {
    // Legacy stub - no SCM integration
    pub fn install() -> Result<()> {
        println!("Legacy install - use Task Scheduler or NSSM");
        Ok(())
    }
}

Build Commands:

# Modern build (default)
cargo build --release
# → gururmm-agent-windows-amd64-0.5.0.exe (3.4 MB)

# Legacy build
cargo build --release --no-default-features --features legacy
# → gururmm-agent-windows-amd64-0.5.0-legacy.exe (3.3 MB)

Technologies: Rust, Cargo, feature flags, conditional compilation, cross-platform

Lesson Learned: Use Cargo features for platform-specific code. This allows single codebase to produce multiple builds for different target environments. Document feature combinations and their intended platforms.


Conclusion

This catalog contains technical solutions for:

  • 70+ distinct technical problems
  • Spanning 11 years of work (2014-2026)
  • 15+ technology stacks

Key themes:

  1. Legacy system compatibility (DOS, Windows 2008 R2, SMB1)
  2. Database migrations (MySQL/MariaDB strict mode)
  3. Email security (SPF, DKIM, DMARC)
  4. Auto-update mechanisms (binary replacement, rollback safety)
  5. Cross-platform development (Windows/Linux/macOS builds)
  6. Authentication protocols (RADIUS, JWT, OAuth)
  7. Network services (VPN, Tailscale, DNS)

Last Updated: 2026-01-26 Total Session Logs Reviewed: 37 files Total Problems Cataloged: 70+