Technical Solutions Catalog
Generated: 2026-01-26
Source: Session logs from C:\Users\MikeSwanson\claude-projects\session-logs
Purpose: Comprehensive catalog of all technical problems, solutions, and insights
Table of Contents
- Tailscale & VPN
- Database & Migration
- Web Applications & JavaScript
- Email & DNS
- Legacy Systems & DOS
- Development & Build Systems
- Authentication & Security
- Infrastructure & Networking
- Software Updates & Auto-Update
- Cross-Platform Compatibility
Tailscale & VPN
Problem: Tailscale Stopped Working After pfSense Upgrade
Date: 2025-12-12 Technologies: pfSense, Tailscale, VPN
Symptom:
- Tailscale stopped working after pfSense upgrade
- Device removed from Tailscale admin console
- Reinstalling package didn't resolve issue
Root Cause:
- pfSense upgrade cleared Tailscale authentication state
- Service was running but logged out
Solution:
# Check status
tailscale status # Showed "Logged out"
service tailscaled status # Confirmed running (pid 67768)
# Re-authenticate with preserved settings
tailscale up --accept-routes --advertise-routes=172.16.0.0/22
# Follow login URL in browser to complete authentication
Verification:
- pfSense now shows IP 100.127.188.119
- Device visible in Tailscale admin console
- Routes properly advertised
Technologies: pfSense (FreeBSD), Tailscale, routing
Lesson Learned: Major OS upgrades may clear authentication state even if service configurations persist. Always preserve critical flags like --accept-routes and --advertise-routes when re-authenticating.
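A post-upgrade health check can catch this state automatically; a minimal Python sketch, assuming the BackendState field that tailscale status --json reports ("NeedsLogin" when authentication has been cleared):

```python
import json

def needs_login(status_json: str) -> bool:
    """Return True if Tailscale reports it is logged out."""
    status = json.loads(status_json)
    # "NeedsLogin" is the backend state after auth state is cleared.
    return status.get("BackendState") == "NeedsLogin"

# Payload shaped like `tailscale status --json` output (illustrative):
sample = '{"BackendState": "NeedsLogin", "Self": {"HostName": "pfsense"}}'
if needs_login(sample):
    print("re-run: tailscale up --accept-routes --advertise-routes=172.16.0.0/22")
```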
Problem: RADIUS Authentication Failing for OpenVPN
Date: 2025-12-20 Technologies: OpenVPN, RADIUS, NPS, Windows Server, pam_radius_auth
Symptom:
- OpenVPN connections timing out when RADIUS authentication enabled
- Local authentication worked fine
- Error: "RADIUS server failed to respond"
Root Cause:
- NPS (Network Policy Server) requires Message-Authenticator attribute in RADIUS requests
- UDM's pam_radius_auth module doesn't send Message-Authenticator attribute
- NPS Event ID 17 showed Message-Authenticator requirement mismatch
Solution:
# Connect to NPS server
$pass = ConvertTo-SecureString 'Paper123!@#' -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential('INTRANET\sysadmin', $pass)
# Disable Message-Authenticator requirement
Invoke-Command -ComputerName 192.168.0.27 -Credential $cred -ScriptBlock {
Set-NpsRadiusClient -Name "unifi" -AuthAttributeRequired $false
}
# Restart NPS service
Restart-Service IAS
Verification:
# Check for successful auth events
Get-WinEvent -FilterHashtable @{LogName='Security'; ID=6272} -MaxEvents 5
# Event 6272 = "Network Policy Server granted access to a user"
Technologies: OpenVPN, RADIUS, NPS, PAM, Windows Server
Related Issue: UniFi bug - exported .ovpn profiles ignore redirectGateway: false setting and include redirect-gateway def1 anyway
Workaround: Create custom .ovpn profiles without redirect-gateway line
Lesson Learned: RADIUS implementations vary in attribute requirements. PAM-based RADIUS clients may not support all attributes that Windows NPS expects. Check NPS Event Viewer (Event ID 17) for detailed authentication failure reasons.
Database & Migration
Problem: MariaDB Strict Mode Rejecting Legacy PHP Code
Date: 2025-12-15 Site: data.grabbanddurando.com Technologies: PHP 8.1, MariaDB 10.11, MySQL strict mode
Symptom:
- Site migrated from GoDaddy VPS to new cPanel server
- INSERT statements failing with data type validation errors
- Calendar and form submissions broken
Root Cause:
New server runs MariaDB in strict mode (STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION) which enforces data type validation. Old GoDaddy server used lenient MySQL that silently accepted invalid data.
Issues Identified:
Issue 1: Empty String for Auto-Increment Columns
Problem:
// Broken in strict mode:
insert into activity value('', user_id, 'username', 'message', NOW())
MariaDB strict mode rejects '' for integer columns. GoDaddy's lenient MySQL converted '' to 0 which auto-increment replaced.
Fix:
// Specify column names, exclude ID field:
insert into activity (uid, uname, message, date)
values(user_id, 'username', 'message', NOW())
Files Modified: 61 PHP files including:
- editcapability.php
- manage_calendar.php
- todo_list.php
- timer-start-ajax.php
- All files in services/ directory
Issue 2: Session Handling Order (PHP 8.1)
Problem:
// Before (broken):
session_start();
session_name("law_admin"); // PHP 8.1 rejects this order
Fix:
// After (fixed):
session_name("law_admin");
session_start();
Issue 3: Date Format Parsing
Problem:
// Unreliable with MM/DD/YYYY format:
$event_starts = date('Y-m-d', strtotime($_POST['event_starts']));
Fix:
// Explicit format parsing:
$event_starts = !empty($_POST['event_starts']) &&
($dt = DateTime::createFromFormat('m/d/Y', $_POST['event_starts']))
? $dt->format('Y-m-d') : date('Y-m-d');
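For comparison, the same explicit-format-with-fallback pattern sketched in Python (function name hypothetical, same MM/DD/YYYY assumption as the PHP fix):

```python
from datetime import datetime, date

def parse_event_date(raw: str) -> str:
    """Parse MM/DD/YYYY explicitly; fall back to today on bad input."""
    try:
        return datetime.strptime(raw, "%m/%d/%Y").strftime("%Y-%m-%d")
    except ValueError:
        return date.today().strftime("%Y-%m-%d")

print(parse_event_date("12/15/2025"))  # → 2025-12-15
```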
Issue 4: NOT NULL Columns Without Defaults
Problem: Columns defined as NOT NULL without default values caused INSERT failures when values weren't provided.
Fix:
-- Calendar events table
ALTER TABLE gd_calendar_events
MODIFY assign_to int(11) NOT NULL DEFAULT 0,
MODIFY recurring varchar(255) NOT NULL DEFAULT '',
MODIFY gender varchar(10) NOT NULL DEFAULT '',
MODIFY widget_id int(11) NOT NULL DEFAULT 0,
MODIFY sort_order varchar(20) NOT NULL DEFAULT '',
MODIFY done int(11) NOT NULL DEFAULT 0;
-- Activity table
ALTER TABLE activity
MODIFY uid int(11) NOT NULL DEFAULT 0,
MODIFY uname varchar(1000) NOT NULL DEFAULT '',
MODIFY message varchar(2000) NOT NULL DEFAULT '';
-- Cases time table
ALTER TABLE gd_cases_time MODIFY stop_time time DEFAULT NULL;
-- Invited users table
ALTER TABLE gd_invited_users
MODIFY event_status_id varchar(255) NOT NULL DEFAULT '',
MODIFY created_on datetime DEFAULT CURRENT_TIMESTAMP,
MODIFY modified_on datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
Verification:
- All form submissions working
- Calendar events creating successfully
- No SQL errors in error_log
Technologies: PHP 8.1, MariaDB 10.11, SQL strict mode, date parsing
Recommendations:
- Always specify column names in INSERT statements
- Use NULL or omit ID column for auto-increment
- Use DateTime::createFromFormat() for known date formats
- Add DEFAULT values to all NOT NULL columns
- Test with STRICT_TRANS_TABLES enabled in development
Backup Location: /home/grabblaw/public_html/data_grabbanddurando/backups_mariadb_fix/
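To find files affected by Issue 1 ahead of a migration, a rough scanner can flag the empty-string auto-increment pattern before strict mode does; a sketch (the regex and approach are illustrative, not from the session logs):

```python
import re
from pathlib import Path

# Matches INSERTs whose first value is an empty string -- the pattern
# strict mode rejects for auto-increment integer columns.
RISKY_INSERT = re.compile(r"insert\s+into\s+\w+\s+values?\s*\(\s*''", re.IGNORECASE)

def scan_php(root: str) -> list[str]:
    """Return PHP files under root containing the risky INSERT pattern."""
    return [
        str(path)
        for path in Path(root).rglob("*.php")
        if RISKY_INSERT.search(path.read_text(errors="ignore"))
    ]
```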
Problem: Database Data Sync After DNS Cutover
Date: 2025-12-12 Site: data.grabbanddurando.com Technologies: MySQL, DNS, rsync, migration
Symptom:
- Site was migrated to new server (IX)
- DNS pointed to old GoDaddy VPS (208.109.235.224) for 2 days
- Users continued updating data on old server Dec 10-11
- New server missing recent records
Root Cause: DNS cutover timing - users were still hitting old server after migration completed.
Solution:
- Identify Missing Records:
-- Compare max IDs between servers
-- GoDaddy: activity max ID 18543
-- IX: activity max ID 18539
-- Result: 4 missing records
-- GoDaddy: gd_calendar_events max ID 14763
-- IX: gd_calendar_events max ID 14762
-- Result: 1 missing record
-- GoDaddy: gd_assign_users max ID 24301
-- IX: gd_assign_users max ID 24299
-- Result: 2 missing records
- Sync Missing Records:
# On GoDaddy VPS
mysqldump -u root -p grabblaw_gdapp \
--where="id > 18539" activity \
--replace > /tmp/activity_missing.sql
mysqldump -u root -p grabblaw_gdapp \
--where="id > 14762" gd_calendar_events \
--replace > /tmp/calendar_missing.sql
mysqldump -u root -p grabblaw_gdapp \
--where="id > 24299" gd_assign_users \
--replace > /tmp/assign_missing.sql
# Transfer and import on IX
mysql -u grabblaw_gddata -p grabblaw_gdapp_data < activity_missing.sql
mysql -u grabblaw_gddata -p grabblaw_gdapp_data < calendar_missing.sql
mysql -u grabblaw_gddata -p grabblaw_gdapp_data < assign_missing.sql
Verification:
- Verified max IDs match on both servers
- All missing records now present on IX
- No conflicts (auto-increment IDs unique)
Technologies: MySQL, mysqldump, DNS migration, data sync
Lesson Learned: During migrations, monitor DNS propagation and old server access logs. Users may continue hitting old server for hours after cutover. Plan for data sync window or implement read-only mode on old server.
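The max-ID comparison above generalizes to a small helper; a sketch using the session's numbers (logic hypothetical, assumes append-only auto-increment IDs with no deletions):

```python
def missing_counts(old: dict[str, int], new: dict[str, int]) -> dict[str, int]:
    """Per-table count of records present on the old server but not the
    new one, given each table's max auto-increment ID on both sides."""
    return {t: old[t] - new.get(t, 0) for t in old if old[t] > new.get(t, 0)}

godaddy = {"activity": 18543, "gd_calendar_events": 14763, "gd_assign_users": 24301}
ix = {"activity": 18539, "gd_calendar_events": 14762, "gd_assign_users": 24299}
print(missing_counts(godaddy, ix))
# → {'activity': 4, 'gd_calendar_events': 1, 'gd_assign_users': 2}
```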
Web Applications & JavaScript
Problem: mod_pagespeed Corrupting JavaScript Code
Date: 2025-12-15 Site: data.grabbanddurando.com Technologies: Apache mod_pagespeed, JavaScript, FullCalendar
Symptom:
- Calendar view broken after migration
- Displayed only small date picker instead of full FullCalendar
- JavaScript syntax errors in console
Root Cause: mod_pagespeed was corrupting JavaScript code during optimization. Example:
// Original code:
url: 'cal_edit_events.php',
// After mod_pagespeed processing:
urlnts.php',
This caused JavaScript syntax errors ("unexpected token: ':'") preventing calendar from rendering.
Solution:
- Disable PageSpeed Globally:
# /etc/apache2/conf.modules.d/510_pagespeed.conf
ModPagespeed off
- Add to .htaccess:
# /home/grabblaw/public_html/data_grabbanddurando/.htaccess
# Disable PageSpeed - breaking calendar JavaScript
ModPagespeed off
ModPagespeedDisableFilters combine_javascript,rewrite_javascript
# Force revalidation of JS files after PageSpeed fix
<FilesMatch "\.js$">
Header set Cache-Control "no-cache, must-revalidate"
</FilesMatch>
- Clear PageSpeed Cache:
rm -rf /var/mod_pagespeed/cache/*
systemctl restart httpd
Verification:
- Calendar now renders correctly
- Month view shows all events
- JavaScript errors cleared from console
Technologies: Apache mod_pagespeed, JavaScript optimization, FullCalendar
Related Issue: isset() checks added to PHP broke calendar functionality. Original files (without isset checks) restored from backups.
Lesson Learned: mod_pagespeed JavaScript optimization can corrupt code, especially with complex libraries like FullCalendar. Test thoroughly after enabling. Disable for specific directories if needed rather than site-wide.
Problem: PHP Session Changes Breaking Calendar
Date: 2025-12-15 Site: data.grabbanddurando.com Technologies: PHP, sessions, FullCalendar
Symptom:
- Calendar stopped working after adding isset() checks
- No JavaScript errors
- Events not loading
Root Cause: Added "defensive" isset() checks to session variables broke calendar logic:
// Original (working):
$ciud = $_SESSION['userinfo']['user_id'];
// Changed to (BROKE IT):
$ciud = isset($_SESSION['userinfo']['user_id']) ?
$_SESSION['userinfo']['user_id'] : 0;
When session checks return 0 (default value), calendar query fails or returns wrong data.
Solution: Reverted PHP files to original versions from backups:
- manage_calendar.php ← manage_calendar.php.bak
- manage_calendar_page1.php ← manage_calendar_page1.php.bak
- cal_events.php ← cal_events.php.bak3
Verification:
- Calendar loads correctly
- Events display in all views
- PHP notices still appear in error_log but don't affect functionality
Technologies: PHP, session management, defensive programming
Lesson Learned: Not all PHP notices need to be "fixed." If code works correctly and notices are cosmetic, leave it alone. Adding isset() checks can change program logic if defaults don't match expected behavior. Test thoroughly before applying "defensive" code changes.
Email & DNS
Problem: Email Spoofing Due to Missing DMARC/DKIM
Date: 2025-12-19 Domain: bgbuildersllc.com Technologies: M365, SPF, DMARC, DKIM, email security
Symptom:
- User received phishing email appearing to be from herself
- Email had legitimate-looking From address
- Contained malicious attachment
Root Cause: Domain had SPF configured but missing DMARC and DKIM, allowing external senders to forge From addresses.
| Record | Status | Impact |
|---|---|---|
| SPF | ✅ Present (-all) | Fails but not enforced without DMARC |
| DMARC | ❌ Missing | No policy to reject forged emails |
| DKIM | ❌ Missing | No cryptographic signing |
Investigation Results:
Account NOT Compromised:
- No mailbox forwarding configured
- No inbox rules configured
- No send-as permissions granted
- Last password change: April 15, 2025
- Email correctly routed to Junk folder (M365 flagged it)
Solution:
- Add DMARC Record:
Type: TXT
Name: _dmarc
Value: v=DMARC1; p=reject; rua=mailto:sysadmin@bgbuildersllc.com
- Add DKIM Records:
Type: CNAME
Name: selector1._domainkey
Value: selector1-bgbuildersllc-com._domainkey.sonorangreenllc.onmicrosoft.com
Type: CNAME
Name: selector2._domainkey
Value: selector2-bgbuildersllc-com._domainkey.sonorangreenllc.onmicrosoft.com
- Enable DKIM Signing in M365:
- Navigate to M365 Defender → Email Authentication Settings
- Enable DKIM signing for domain
- Wait for DNS propagation (5-10 minutes)
- Verify selectors detected
Verification:
# Check DMARC
nslookup -type=txt _dmarc.bgbuildersllc.com
# Check DKIM
nslookup -type=cname selector1._domainkey.bgbuildersllc.com
nslookup -type=cname selector2._domainkey.bgbuildersllc.com
Technologies: Microsoft 365, DMARC, DKIM, SPF, DNS, email authentication
Final Configuration:
| Record | Status |
|---|---|
| MX → M365 | ✅ |
| SPF (includes M365) | ✅ |
| DMARC | ✅ p=reject |
| DKIM selector1 | ✅ |
| DKIM selector2 | ✅ |
Lesson Learned: SPF alone is insufficient. DMARC enforces SPF/DKIM policies and tells receiving servers to reject forged emails. Always configure all three (SPF + DKIM + DMARC) for proper email authentication.
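Before publishing, the DMARC value can be sanity-checked; a minimal sketch that just splits the tag=value pairs used above (not a full RFC 7489 parser):

```python
def parse_dmarc(txt: str) -> dict[str, str]:
    """Split a DMARC TXT value into its tag=value pairs."""
    return dict(
        part.strip().split("=", 1)
        for part in txt.split(";")
        if "=" in part
    )

record = "v=DMARC1; p=reject; rua=mailto:sysadmin@bgbuildersllc.com"
tags = parse_dmarc(record)
assert tags["v"] == "DMARC1" and tags["p"] == "reject"
```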
Problem: Missing DNS Records After Domain Migration
Date: 2025-12-12 Domain: data.grabbanddurando.com Technologies: DNS, WHM API, cPanel
Symptom:
- Domain not resolving after migration to IX server
- A record missing from DNS zone
Root Cause: During migration, DNS zone was created but A record wasn't added (human error or script failure).
Solution:
# Add A record via WHM API
whmapi1 addzonerecord domain=grabbanddurando.com \
name=data \
type=A \
address=72.194.62.5 \
ttl=14400
Verification:
# Test resolution on multiple nameservers
nslookup data.grabbanddurando.com 8.8.8.8 # Google
nslookup data.grabbanddurando.com 1.0.0.1 # Cloudflare
nslookup data.grabbanddurando.com 9.9.9.9 # Quad9
nslookup data.grabbanddurando.com 208.67.222.222 # OpenDNS
Technologies: DNS, WHM API, nameservers
Related Fix: Set zone TTL to 600 seconds (was 14400) for faster DNS updates during troubleshooting.
Lesson Learned: After domain migrations, verify ALL DNS records, not just nameservers. Use multiple public DNS servers to test propagation.
Problem: Wrong SSL Certificate Being Served
Date: 2025-12-12 Domain: data.grabbanddurando.com Technologies: Apache, SSL/TLS, Let's Encrypt, AutoSSL, ServerAlias
Symptom:
- data.grabbanddurando.com showed SSL error
- Wrong certificate being served (didn't include bare domain)
- Certificate was valid for data.grabbanddurando.grabblaw.com
Root Cause:
data.grabbanddurando.com was listed as ServerAlias in data.grabbanddurando.grabblaw.com vhost, but that vhost's SSL certificate didn't include the bare domain.
Solution:
- Remove DNS A record temporarily (prevent conflicting vhosts)
- Add domain as proper subdomain:
# In cPanel for grabblaw account
# Add subdomain: data.grabbanddurando.com
- Run AutoSSL:
# WHM → SSL/TLS → Manage AutoSSL
# Run for grabblaw account
# Gets fresh Let's Encrypt certificate
- Remove conflicting ServerAlias:
# Edit vhost configuration
# Remove data.grabbanddurando.com from other vhost's ServerAlias
- Rebuild Apache config:
/scripts/rebuildhttpdconf
systemctl restart httpd
Verification:
# Check certificate
openssl s_client -connect data.grabbanddurando.com:443 -servername data.grabbanddurando.com < /dev/null | grep -A2 "Subject:"
# Verify in browser - no SSL warnings
Technologies: Apache, SSL/TLS, Let's Encrypt, AutoSSL, cPanel
Lesson Learned: ServerAlias directives can cause SSL issues if multiple vhosts claim same domain. Always ensure domain is primary (ServerName) in one vhost only, not an alias in multiple vhosts. Run AutoSSL after fixing to get proper certificate.
Legacy Systems & DOS
Problem: DOS Machine Network Access After Crypto Attack
Date: 2025-12-14 Client: Dataforth Technologies: MS-DOS 6.22, SMB1, Samba, legacy networking
Symptom:
- ~30 legacy DOS test machines unable to access network shares
- SMB1 disabled on production servers after crypto attack
- DOS machines require SMB1 (no SMB2/3 support)
Root Cause: Security hardening after crypto attack disabled SMB1 protocol on production Windows servers. DOS machines only support SMB1 (CORE protocol).
Solution: Deploy Netgear ReadyNAS as SMB1 proxy:
Architecture:
DOS Machines (SMB1) → NAS (SMB1 + SMB2/3) → AD2 (SMB2/3 only)
NAS Configuration:
# /etc/frontview/samba/smb.conf.overrides
server min protocol = CORE
client min protocol = CORE
lanman auth = yes
ntlm auth = yes
raw NTLMv2 auth = yes
client lanman auth = yes
client plaintext auth = yes
netbios name = D2TESTNAS
workgroup = INTRANET
null passwords = yes
map to guest = Bad Password
guest ok = yes
wins support = yes
DOS Machine Configuration:
REM C:\NET\SYSTEM.INI
computername=TS-XX
username=TS-XX
workgroup=INTRANET
logondomain=INTRANET
autologon=yes
REM C:\STARTNET.BAT
net use T: \\D2TESTNAS\test
net use X: \\D2TESTNAS\datasheets
Sync Configuration (NAS to AD2):
# /root/sync-to-ad2.sh - runs every 15 minutes
# Sync test results from NAS to production server
cd /data/test
for file in $(find . -type f -mmin -60); do
dir=$(dirname "$file")
smbclient //192.168.0.6/test -A /root/.ad2creds \
-c "mkdir \"$dir\"; put \"$file\" \"$file\"" 2>> $LOG
done
Issues Resolved:
- Error 53 (Network path not found):
  - Fixed by setting server min protocol = CORE (not NT1)
  - Using NetBIOS name instead of IP address
  - Adding LMHOSTS entry or WINS via DHCP
  - Setting correct workgroup (INTRANET)
- Error 5 (Access denied):
  - Fixed by setting null passwords: smbpasswd -n ts-XX
  - Enabling null passwords = yes in Samba config
- NetBIOS name conflict:
  - Changed NAS name from AD2 to D2TESTNAS
- X: drive not found error:
  - Empty root folder, added README.TXT placeholder
Technologies: MS-DOS 6.22, SMB1, Samba, NetBIOS, WINS, legacy networking
Verification:
- DOS machines can map T: and X: drives
- Users log in with blank password (press Enter twice)
- Test data syncs to production server every 15 minutes
Lesson Learned: Legacy systems requiring SMB1 can be isolated using proxy NAS that speaks both SMB1 (to legacy clients) and SMB2/3 (to modern servers). This allows security hardening on production systems while maintaining legacy system functionality.
Problem: DOS Machine Slow Boot (13 Years of File Copying)
Date: 2025-12-14 Technologies: MS-DOS 6.22, batch scripts, file operations
Symptom:
- DOS machines taking very long time to boot
- Same files being copied every boot for 13 years
- Unnecessary disk wear
Root Cause: Boot script CTONW.BAT used COPY instead of MOVE, creating duplicates:
REM Old code - copies EVERY boot
IF EXIST T:\%STATION%\ProdSW\*.EXE COPY T:\%STATION%\ProdSW\*.EXE C:\ATE\
Solution:
REM New code - uses MOVE (one-time operation)
IF EXIST T:\%STATION%\ProdSW\*.EXE MOVE T:\%STATION%\ProdSW\*.EXE C:\ATE\
IF EXIST T:\%STATION%\ProdSW\*.BAT MOVE T:\%STATION%\ProdSW\*.BAT C:\BAT\
Benefits:
- Faster boot time
- Files only transferred once
- Folder empties after move
- No unnecessary disk operations
Technologies: MS-DOS 6.22, batch files, file operations
Lesson Learned: Review long-running scripts for inefficient operations. A simple COPY vs MOVE change can eliminate 13 years of unnecessary file operations.
Problem: DOS Batch Files Failing After Unix Line Endings
Date: 2025-12-14 Technologies: MS-DOS 6.22, batch files, line endings
Symptom:
- Batch files created on Linux not executing on DOS
- Command not found errors
- Syntax errors in valid commands
Root Cause: DOS requires CR+LF (\r\n) line endings. Files created/edited on Linux have LF (\n) only.
Solution:
# Convert line endings before deployment
sed -i 's/$/\r/' filename.bat
# Or use unix2dos (from the dos2unix package):
unix2dos filename.bat
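The conversion can also be done in-process; a byte-level sketch (safe for batch files because it never re-encodes text, and idempotent on files that are already CRLF):

```python
def to_crlf(data: bytes) -> bytes:
    """Normalize any mix of line endings to DOS CR+LF."""
    # Collapse CRLF to LF first so existing CRLF pairs aren't doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

assert to_crlf(b"@ECHO OFF\nNET USE T:\n") == b"@ECHO OFF\r\nNET USE T:\r\n"
```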
Verification:
# Check file on Linux
file filename.bat
# Should show: ASCII text, with CRLF line terminators
# Or check hex
od -c filename.bat | head
# Should see \r\n at line ends
Technologies: MS-DOS, batch files, line endings, cross-platform
Related Issues:
- /I flag (case-insensitive IF) doesn't exist in DOS 6.22
- Trailing backslash in paths causes "Invalid directory" error
- Leading underscore in folder names may cause issues
Lesson Learned: Always test cross-platform scripts on target platform. Line endings, command flags, and path separators vary between DOS and modern systems.
Problem: UPDATE.BAT Not Syncing to NAS
Date: 2026-01-15 Technologies: PowerShell, SMB, sync scripts
Symptom:
- UPDATE.BAT modified on AD2 server
- Changes not appearing on NAS
- Other files syncing correctly
Root Cause: Sync script (Sync-FromNAS.ps1) only synced COMMON/ProdSW/* files, not root-level UPDATE.BAT.
Solution:
# Modified C:\Shares\test\scripts\Sync-FromNAS.ps1
# Added to PUSH section (AD2 → NAS):
# Sync UPDATE.BAT (root level utility)
Write-Log "Syncing UPDATE.BAT..."
$updateBatLocal = "$AD2_TEST_PATH\UPDATE.BAT"
if (Test-Path $updateBatLocal) {
$updateBatRemote = "$NAS_DATA_PATH/UPDATE.BAT"
if ($DryRun) {
Write-Log " [DRY RUN] Would push: UPDATE.BAT -> $updateBatRemote"
$pushedFiles++
} else {
$success = Copy-ToNAS -LocalPath $updateBatLocal -RemotePath $updateBatRemote
if ($success) {
Write-Log " Pushed: UPDATE.BAT"
$pushedFiles++
} else {
Write-Log " ERROR: Failed to push UPDATE.BAT"
$errorCount++
}
}
} else {
Write-Log " WARNING: UPDATE.BAT not found at $updateBatLocal"
}
Verification:
- Backup created: Sync-FromNAS.ps1.backup-20260115-131633
- Script syntax validated
- UPDATE.BAT now syncs every 15 minutes
Technologies: PowerShell, SMB, file synchronization, Task Scheduler
Lesson Learned: When troubleshooting sync issues, check both sync script logic AND file locations. Root-level files may not be included in directory-based sync patterns.
Problem: COMMAND.COM Cannot Be Overwritten While Running
Date: 2025-12-14 Technologies: MS-DOS 6.22, system files
Symptom:
- UPDATE DOS command tries to replace COMMAND.COM
- Operation fails - file in use
Root Cause: COMMAND.COM is the DOS shell - can't overwrite running executable.
Solution - Staged Update:
REM UPDATE DOS now uses staged approach:
REM 1. Copy COMMAND.COM as C:\COMMAND.NEW
COPY T:\COMMON\DOS\COMMAND.COM C:\COMMAND.NEW
REM 2. Save AUTOEXEC.BAT
COPY C:\AUTOEXEC.BAT C:\AUTOEXEC.SAV
REM 3. Create temporary AUTOEXEC.BAT for next boot
ECHO @ECHO OFF > C:\AUTOEXEC.BAT
ECHO DEL C:\COMMAND.COM >> C:\AUTOEXEC.BAT
ECHO REN C:\COMMAND.NEW COMMAND.COM >> C:\AUTOEXEC.BAT
ECHO CALL C:\DOSFIX2.BAT >> C:\AUTOEXEC.BAT
REM 4. DOSFIX2.BAT restores original AUTOEXEC.BAT after update
REM DEL C:\AUTOEXEC.BAT
REM REN C:\AUTOEXEC.SAV AUTOEXEC.BAT
REM 5. User reboots to apply update
Technologies: MS-DOS 6.22, boot process, file operations
Lesson Learned: System files in use can't be replaced directly. Use staged update process: copy new file with different name, modify boot process to rename on next boot, then restore normal boot process.
Development & Build Systems
Problem: Rust Windows Build Failing - Missing msvcrt.lib
Date: 2025-12-18 Technologies: Rust, Visual Studio, MSVC, Windows
Symptom:
- Windows build failed with linker error
- Visual Studio missing msvcrt.lib
- Link.exe present but library files missing
Root Cause: VS Community 2026 Insiders was incomplete installation. Link.exe existed but standard libraries (msvcrt.lib) were missing.
Solution:
- Uninstall Broken VS:
winget uninstall "Visual Studio Community 2026 Insiders"
- Install Complete VS 2022:
winget install Microsoft.VisualStudio.2022.BuildTools \
--override '--add Microsoft.VisualStudio.Workload.VCTools \
--add Microsoft.VisualStudio.Component.Windows11SDK.22621 \
--includeRecommended --quiet --wait'
- Verify Installation:
# Check for libraries
dir "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\*\lib\x64\msvcrt.lib"
# Build should now work
cargo build --release
What Was Installed:
- VS 2022 Build Tools (17.14.23)
- MSVC v14.44.35207
- Windows 11 SDK (22621)
Technologies: Rust, MSVC, Visual Studio Build Tools, Windows SDK
Lesson Learned: Preview/Insider builds of development tools may be incomplete. For production builds, use stable releases. Verify complete installation by checking for standard library files, not just compiler executables.
Problem: Cross-Compilation from WSL to Windows
Date: 2025-12-21 Technologies: Rust, WSL, cross-compilation, cargo-xwin, cargo-zigbuild
Symptom:
- Attempted to cross-compile Rust code from WSL (Linux) to Windows
- cargo-xwin required clang-cl
- cargo-zigbuild required mingw dlltool
- Complex toolchain setup
Root Cause: Cross-compilation to Windows requires Windows-specific linker and libraries. Even with cross-compilation tools, dependencies are complex.
Solution: Build on native Windows instead:
# On Windows machine:
winget install rustup
winget install protocolbuffers.protobuf
# Build normally
cd project
cargo build --release
Technologies: Rust, cross-compilation, WSL, native builds
Lesson Learned: For Windows targets, native builds are simpler than cross-compilation. Cross-compilation adds toolchain complexity. Save cross-compilation for scenarios where native build environment isn't available.
Problem: Linux Binary Can't Overwrite Itself During Auto-Update
Date: 2025-12-16 Technologies: Linux, auto-update, file operations
Symptom:
- Agent auto-update failing on Linux
fs::copy()error: "Text file busy"- Cannot overwrite running executable
Root Cause:
Linux prevents overwriting a running executable file. Using fs::copy(new_binary, current_binary) fails when current binary is running.
Solution - Rename-Then-Copy:
// Rename current binary to .old
fs::rename("/usr/local/bin/gururmm-agent",
"/usr/local/bin/gururmm-agent.old")?;
// Copy new binary to original location
fs::copy(new_binary, "/usr/local/bin/gururmm-agent")?;
// Set executable permission
fs::set_permissions("/usr/local/bin/gururmm-agent",
fs::Permissions::from_mode(0o755))?;
// Restart service
Command::new("systemctl")
.args(&["restart", "gururmm-agent"])
.output()?;
// Old binary can be deleted after restart
Technologies: Linux, file operations, systemd, auto-update
Lesson Learned: On Linux, rename the running executable before replacing it. The old file remains on disk (still referenced by the running process) but the pathname is freed for the new binary. This is safer than using mv -f, which can fail.
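The same rename-then-copy sequence works for any file; a sketch with Python's stdlib (paths hypothetical):

```python
import os
import shutil

def replace_running_binary(new: str, current: str) -> None:
    """Rename-then-copy: frees the pathname while the old inode
    stays alive for the running process."""
    old = current + ".old"
    os.replace(current, old)        # atomic rename on the same filesystem
    shutil.copy2(new, current)      # new binary takes the original path
    os.chmod(current, 0o755)        # ensure it is executable
    # Delete `old` after the service has restarted on the new binary.
```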
Problem: sudo Password with Special Characters
Date: 2025-12-20 Technologies: SSH, sudo, special characters, shell escaping
Symptom:
- Remote sudo commands failing
- Password contains special characters ($, #, !, @)
- Characters interpreted by shell
Root Cause: Special characters in password interpreted by shell before reaching sudo:
# This fails:
ssh -tt user@host "echo 'pass$word' | sudo -S command"
# $ interpreted as variable expansion
Solution - Use Heredoc:
# Heredoc prevents shell interpretation
ssh -tt user@host "sudo command" << 'PASSWD'
Gptf*77ttb123!@#-rmm
PASSWD
The single quotes around 'PASSWD' prevent variable expansion in heredoc.
Technologies: SSH, sudo, bash, heredoc, shell escaping
Alternative Solutions:
# 1. Escape special characters (tedious):
echo 'pass\$word\!' | sudo -S command
# 2. Use expect script (requires expect installed):
expect -c "spawn ssh user@host sudo command; expect \"assword\"; send \"pass\$word\\n\"; interact"
# 3. Configure NOPASSWD in sudoers (security tradeoff):
user ALL=(ALL) NOPASSWD: /path/to/command
Lesson Learned: For passwords with special characters, use heredoc with single quotes to prevent any shell interpretation. Don't rely on escaping - it's error-prone with multiple special characters.
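When a secret must appear on a command line at all, the quoting can be generated rather than hand-escaped; a sketch using Python's shlex.quote (command and password are placeholders):

```python
import shlex

password = "pa$sw0rd!#"  # placeholder with shell-special characters
# shlex.quote wraps the value in single quotes so $, #, !, @ survive
# one level of shell evaluation untouched.
safe = shlex.quote(password)
cmd = f"echo {safe} | sudo -S systemctl status some-service"
print(cmd)
```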
Authentication & Security
Problem: Account Compromise Investigation (Email Spoofing vs Hack)
Date: 2025-12-19 Client: BG Builders LLC Technologies: Microsoft 365, PowerShell, email security
Symptom:
- User received phishing email appearing to be from herself
- Email had legitimate-looking From address
- Attachment: "Shelly_Bonus.pdf"
Investigation Process:
# Connect to Exchange Online
Connect-ExchangeOnline
# Check for forwarding rules
Get-MailboxForwarding shelly@bgbuildersllc.com
# Check inbox rules
Get-InboxRule -Mailbox shelly@bgbuildersllc.com
# Check send-as permissions
Get-RecipientPermission shelly@bgbuildersllc.com | Where-Object {$_.Trustee -ne "NT AUTHORITY\SELF"}
# Check mailbox permissions
Get-MailboxPermission shelly@bgbuildersllc.com | Where-Object {$_.User -ne "NT AUTHORITY\SELF"}
# Check account status
Get-Mailbox shelly@bgbuildersllc.com | Select-Object *PasswordLastSet*, WhenCreated, IsMailboxEnabled
Results:
| Check | Result |
|---|---|
| Mailbox Forwarding | None configured ✅ |
| Inbox Rules | None configured ✅ |
| Send-As Permissions | None granted ✅ |
| Mailbox Permissions | Normal ✅ |
| Account Enabled | Yes |
| Last Password Change | April 15, 2025 |
| Last Logon | Dec 19, 2025 14:02 |
Conclusion: Email SPOOFING, not account compromise. Attacker forged From address externally. M365 correctly flagged: "We could not verify the identity of the sender" and routed to Junk folder.
Technologies: Microsoft 365, Exchange Online, PowerShell, email security, forensics
Lesson Learned: Always investigate systematically before declaring account compromised. Check forwarding rules, inbox rules, permissions, and authentication logs. External spoofing is more common than actual account compromise.
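The verdict hinged on authentication results; a sketch that extracts the spf/dkim/dmarc verdicts from an Authentication-Results header (the sample header text is illustrative):

```python
import re

def auth_verdicts(header: str) -> dict[str, str]:
    """Extract spf/dkim/dmarc results from an Authentication-Results header."""
    return dict(re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header))

sample = "spf=fail (sender IP is 203.0.113.9) dkim=none dmarc=fail action=quarantine"
print(auth_verdicts(sample))
# → {'spf': 'fail', 'dkim': 'none', 'dmarc': 'fail'}
```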
Infrastructure & Networking
Problem: WHM Plugin Registration 403 Error
Date: 2025-12-19 Technologies: WHM, cPanel, plugin development
Symptom:
- Cloudflare DNS Manager WHM plugin returns 403
- Error: "unregistered application"
- Plugin worked previously
Root Cause: AppConfig file had wrong URL format:
# Wrong (missing leading slash):
url=addon_cloudflareDNS.cgi
# Correct:
url=/cgi/addon_cloudflareDNS.cgi
Solution:
# Fix AppConfig
vim /var/cpanel/apps/cloudflare_dns.conf
# Change url= line to include /cgi/ path
# Re-register plugin
/usr/local/cpanel/bin/register_appconfig /var/cpanel/apps/cloudflare_dns.conf
# Rebuild WHM interface
/usr/local/cpanel/scripts/rebuild_whm_chrome
# Restart cPanel services
/scripts/restartsrv_cpsrvd
Verification:
- Plugin now appears in WHM sidebar
- Click plugin - loads without 403 error
- Can manage DNS records
Technologies: WHM, cPanel, AppConfig, plugin development
Lesson Learned: WHM AppConfig URL paths must include full CGI path. If plugin registration fails with 403, check AppConfig URL format first.
Problem: Windows Agent Service Name Wrong in Auto-Update
Date: 2025-12-20 Technologies: Windows Services, auto-update, Rust
Symptom:
- Auto-update trying to restart service
- Command failing:
net start gururmm-agent - Error: "service name invalid"
Root Cause: Code used internal service name instead of Windows service name:
// Wrong service name:
Command::new("net")
.args(&["start", "gururmm-agent"]) // This is the binary name
.output()?;
// Correct service name:
Command::new("net")
.args(&["start", "GuruRMMAgent"]) // This is the Windows service name
.output()?;
Service names are different:
- Binary name: gururmm-agent.exe
- Service name (internal): gururmm-agent (what we used)
- Service name (Windows): GuruRMMAgent (what Windows expects)
Solution:
Fixed in 5 places in agent/src/updater/mod.rs:
// Rollback watchdog script
Command::new("net").args(&["start", "GuruRMMAgent"])
// Service restart command
Command::new("sc").args(&["start", "GuruRMMAgent"])
Technologies: Windows Services, Rust, service management
Verification:
# Check service name
Get-Service | Where-Object {$_.Name -like "*rmm*"}
# Name Status
# ---- ------
# GuruRMMAgent Running
Lesson Learned: Windows service name (what sc and net commands use) may differ from binary name and internal service name. Always verify service name with Get-Service before hardcoding in scripts.
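One way to avoid hardcoding the wrong name is to resolve the service name at runtime before restarting. A minimal sketch that parses `sc query`-style output; the sample text, the "rmm" filter, and the function name are illustrative, not from the agent codebase:

```rust
// Extract SERVICE_NAME entries from `sc query state= all`-style output
// and keep those matching a case-insensitive substring filter.
fn find_service_names(sc_output: &str, needle: &str) -> Vec<String> {
    sc_output
        .lines()
        .filter_map(|line| line.trim().strip_prefix("SERVICE_NAME:"))
        .map(|name| name.trim().to_string())
        .filter(|name| name.to_lowercase().contains(&needle.to_lowercase()))
        .collect()
}

fn main() {
    // Illustrative sample of `sc query` output
    let sample = "\
SERVICE_NAME: GuruRMMAgent
DISPLAY_NAME: GuruRMM Agent
SERVICE_NAME: Spooler
DISPLAY_NAME: Print Spooler
";
    let matches = find_service_names(sample, "rmm");
    assert_eq!(matches, vec!["GuruRMMAgent".to_string()]);
    println!("{:?}", matches);
}
```

In the real updater, the output of `sc query state= all` (or PowerShell's `Get-Service`) would be fed to a helper like this instead of a hardcoded string.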
Problem: File Descriptors Exhausted on Linux Agent
Date: 2025-12-20 Technologies: Linux, file descriptors, resource limits
Symptom:
- Agent crashed with error 24
- Error message: "No file descriptors available"
- System logs show FD exhaustion
Root Cause: Agent likely opening too many files/sockets without closing them properly. Possible causes:
- WebSocket connections not cleaned up
- Log files held open
- Metrics collection opening files
- Process handles not closed
Investigation Commands:
# Check current FD limit
ulimit -n
# Check FD usage by process
lsof -p $(pidof gururmm-agent) | wc -l
# Check FD limit for user
cat /proc/$(pidof gururmm-agent)/limits | grep "open files"
# Monitor FD usage over time
watch -n 1 'lsof -p $(pidof gururmm-agent) | wc -l'
Temporary Solution:
# Increase FD limit for service
# /etc/systemd/system/gururmm-agent.service
[Service]
LimitNOFILE=65536
# Reload and restart
systemctl daemon-reload
systemctl restart gururmm-agent
Long-term Solution: Audit code for resource leaks:
// Ensure file handles closed
let file = File::open("path")?;
// Use file
drop(file); // Explicit close
// Use RAII patterns
{
let _file = File::open("path")?;
// File automatically closed when _file goes out of scope
}
// For async WebSocket connections
// Ensure connections properly closed on errors
Technologies: Linux, file descriptors, resource limits, systemd
Status: Temporary fix applied (increased limit), root cause investigation pending
Lesson Learned: Monitor file descriptor usage in long-running services. Implement proper resource cleanup (RAII in Rust). Set generous FD limits for services but also fix resource leaks.
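The RAII pattern recommended above can be made explicit with a guard type whose Drop impl releases the resource, so handles are returned even on early returns or panics. A minimal sketch; FdGuard and the atomic handle counter are illustrative, not from the agent codebase:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for "number of open handles" (in reality: fds, sockets, etc.)
static OPEN_HANDLES: AtomicUsize = AtomicUsize::new(0);

struct FdGuard;

impl FdGuard {
    fn open() -> FdGuard {
        OPEN_HANDLES.fetch_add(1, Ordering::SeqCst);
        FdGuard
    }
}

impl Drop for FdGuard {
    fn drop(&mut self) {
        // Release the underlying resource here (close(2), socket shutdown, ...)
        OPEN_HANDLES.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    {
        let _a = FdGuard::open();
        let _b = FdGuard::open();
        assert_eq!(OPEN_HANDLES.load(Ordering::SeqCst), 2);
    } // guards dropped here; handles released even on early return or panic
    assert_eq!(OPEN_HANDLES.load(Ordering::SeqCst), 0);
    println!("no leaked handles");
}
```

Combined with a periodic `lsof`-style FD count check, a guard like this makes leaks both less likely and easier to localize.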
Software Updates & Auto-Update
Problem: Agent Auto-Update Binary Replacement
Date: 2025-12-16 Technologies: Auto-update, binary replacement, Linux, Windows
Symptom:
- Agent binary can't be overwritten during auto-update
- Linux: "Text file busy"
- Windows: File in use
Root Cause: Operating systems prevent replacing running executables.
Solution - Multi-Step Process:
Linux:
// 1. Backup current binary
fs::copy("/usr/local/bin/gururmm-agent",
"/etc/gururmm/gururmm-agent.backup")?;
// 2. Save pending update info
let pending = PendingUpdate { update_id, version, ... };
fs::write("/etc/gururmm/pending-update.json",
serde_json::to_string(&pending)?)?;
// 3. Spawn rollback watchdog (180 second timeout)
let watchdog = format!(r#"#!/bin/bash
sleep 180
if [ -f /etc/gururmm/pending-update.json ]; then
systemctl stop gururmm-agent
cp /etc/gururmm/gururmm-agent.backup /usr/local/bin/gururmm-agent
systemctl start gururmm-agent
fi
"#);
Command::new("bash").arg("-c").arg(&watchdog)
.spawn()?;
// 4. Rename current binary
fs::rename("/usr/local/bin/gururmm-agent",
"/usr/local/bin/gururmm-agent.old")?;
// 5. Copy new binary
fs::copy(new_binary, "/usr/local/bin/gururmm-agent")?;
fs::set_permissions("/usr/local/bin/gururmm-agent",
fs::Permissions::from_mode(0o755))?;
// 6. Restart service
Command::new("systemctl")
.args(&["restart", "gururmm-agent"])
.output()?;
Windows:
// Similar process using Windows scheduled task for watchdog
// sc stop GuruRMMAgent
// copy new binary
// sc start GuruRMMAgent
Rollback Safety:
- Watchdog script runs for 3 minutes
- If agent doesn't confirm successful update, watchdog rolls back
- Agent confirms by including previous_version in auth message
- Server clears pending update on successful confirmation
Technologies: Auto-update, file operations, process management, systemd, Windows Services
Lesson Learned: Auto-update requires multi-step process: backup, spawn watchdog, replace binary, restart. Always implement rollback safety - watchdog restores backup if new version doesn't start successfully.
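The confirmation step above can be sketched as a small decision function: compare the version the reconnecting agent reports against the pending update's target. The type and field names here are illustrative, not the actual GuruRMM structures:

```rust
#[derive(Debug, PartialEq)]
enum UpdateOutcome {
    Confirmed,    // clear pending-update.json, cancel the watchdog
    RolledBack,   // watchdog restored the backup binary
    StillPending, // neither version matches yet; keep waiting
}

struct PendingUpdate {
    target_version: String,
    previous_version: String,
}

fn check_update(pending: &PendingUpdate, reported_version: &str) -> UpdateOutcome {
    if reported_version == pending.target_version {
        UpdateOutcome::Confirmed
    } else if reported_version == pending.previous_version {
        UpdateOutcome::RolledBack
    } else {
        UpdateOutcome::StillPending
    }
}

fn main() {
    let pending = PendingUpdate {
        target_version: "0.5.0".into(),
        previous_version: "0.4.0".into(),
    };
    assert_eq!(check_update(&pending, "0.5.0"), UpdateOutcome::Confirmed);
    assert_eq!(check_update(&pending, "0.4.0"), UpdateOutcome::RolledBack);
    println!("update confirmation logic ok");
}
```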
Problem: Old Agent Versions Can't Auto-Update
Date: 2025-12-20 Technologies: Versioning, backward compatibility, auto-update
Symptom:
- Agent v0.1.0 doesn't auto-update
- Server sends Update message but nothing happens
- Manual update required
Root Cause: Auto-update module was added in v0.3.4. Agents before v0.3.4 don't have code to process Update messages.
Solution:
Immediate: Manual update required for v0.1.0 → v0.3.4+:
# Linux
curl -O https://rmm-api.azcomputerguru.com/downloads/gururmm-agent-linux-amd64-0.5.0
sudo systemctl stop gururmm-agent
sudo cp gururmm-agent-linux-amd64-0.5.0 /usr/local/bin/gururmm-agent
sudo chmod +x /usr/local/bin/gururmm-agent
sudo systemctl start gururmm-agent
# Windows
# Download gururmm-agent-windows-amd64-0.5.0.exe
# Stop service, replace binary, start service
Long-term:
- Track minimum auto-update version in database
- Show "manual update required" for agents below v0.3.4
- In future, include basic updater stub in ALL versions (even v1.0)
Technologies: Versioning, backward compatibility, update mechanisms
Lesson Learned: Essential features (like auto-update) should be included in initial releases. For existing deployments, implement "bootstrap" update to get old agents to minimum supported version.
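The "minimum auto-update version" tracking suggested above reduces to a version gate. A minimal sketch, assuming simple x.y.z version strings and v0.3.4 (the first version with the updater module) as the cutoff:

```rust
// Parse a semver-like "x.y.z" string into a comparable tuple.
fn parse_version(v: &str) -> Option<(u32, u32, u32)> {
    let mut parts = v.trim_start_matches('v').splitn(3, '.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    Some((major, minor, patch))
}

/// True if the agent has the auto-update module (>= 0.3.4).
fn can_auto_update(agent_version: &str) -> bool {
    match parse_version(agent_version) {
        Some(v) => v >= (0, 3, 4),
        None => false, // unparseable version: flag for manual update
    }
}

fn main() {
    assert!(!can_auto_update("0.1.0")); // show "manual update required"
    assert!(can_auto_update("0.3.4"));
    assert!(can_auto_update("0.5.0"));
    println!("version gate ok");
}
```

The dashboard could run this check per agent and render the "manual update required" badge for anything that fails it.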
Problem: Agent Not Receiving Commands After Server Reboot
Date: 2025-12-20 Technologies: WebSocket, connection management, stateful connections
Symptom:
- Agent shows as connected in dashboard
- Commands sent to agent not executed
- No errors on server or agent
Root Cause: Stale WebSocket connection. Agent established connection before server reboot. Connection appears open but server can't send commands through it.
Solution:
# Restart agent to establish fresh connection
sudo systemctl restart gururmm-agent
# After restart, commands work correctly
Long-term Fix: Implement connection health checks:
Server-side:
// Send ping every 30 seconds
// Close connection if no pong response in 60 seconds
tokio::spawn(async move {
loop {
tokio::time::sleep(Duration::from_secs(30)).await;
if socket.send(Message::Ping(vec![])).await.is_err() {
break; // Connection dead
}
}
});
Agent-side:
// Respond to pings
Message::Ping(data) => {
socket.send(Message::Pong(data)).await?;
}
// Reconnect if connection lost
// Exponential backoff: 1s, 2s, 4s, 8s, max 60s
Technologies: WebSocket, connection health, keepalive, reconnection
Lesson Learned: Implement keepalive/ping-pong for long-lived connections. Detect stale connections and automatically reconnect. Don't rely on TCP keepalive alone - implement application-level health checks.
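The reconnect backoff mentioned in the comment above (1s, 2s, 4s, 8s, capped at 60s) can be computed directly from the attempt count. A small sketch; the function name is illustrative:

```rust
use std::time::Duration;

// Delay doubles per failed attempt, capped at 60 seconds.
fn reconnect_delay(attempt: u32) -> Duration {
    let secs = 1u64.checked_shl(attempt).unwrap_or(u64::MAX).min(60);
    Duration::from_secs(secs)
}

fn main() {
    assert_eq!(reconnect_delay(0), Duration::from_secs(1));
    assert_eq!(reconnect_delay(3), Duration::from_secs(8));
    assert_eq!(reconnect_delay(10), Duration::from_secs(60)); // capped
    println!("backoff schedule ok");
}
```

The agent's reconnect loop would sleep for `reconnect_delay(attempt)` after each failure and reset `attempt` to zero once a connection succeeds.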
Cross-Platform Compatibility
Problem: Windows Server 2008 R2 Binary Won't Run
Date: 2025-12-18 Technologies: Windows Server 2008 R2, Rust, compatibility
Symptom:
- GuruRMM agent binary compiled with --features legacy
- Binary runs on Windows 10/11 and Server 2016+
- Binary fails silently on Server 2008 R2
- Even --help produces no output
Root Cause: Likely missing VC++ Runtime or dependency using API not available in 2008 R2:
- Windows API minimum version mismatch
- Missing Visual C++ Redistributable
- Dependency crate using modern Windows APIs
Investigation Steps:
# Check for VC++ runtime
Get-ItemProperty HKLM:\SOFTWARE\Microsoft\VisualStudio\*\VC\Runtimes\* |
Select-Object PSChildName, Version
# Check Event Viewer for crash details
Get-EventLog -LogName Application -Newest 10 -EntryType Error
# Check dependencies
dumpbin /dependents gururmm-agent.exe
Potential Culprits:
- sysinfo crate - may use modern Windows APIs
- local-ip-address crate - may require newer Windows networking APIs
- Tokio async runtime - requires certain Windows features
Potential Solutions:
- Minimal 2008 R2 Build:
[features]
legacy-2008 = [] # Ultra-minimal feature set
[dependencies]
# Replace sysinfo with manual WMI queries
# Replace local-ip-address with ipconfig parsing
# Use minimal async runtime
- Install VC++ Redistributable:
# Download and install Visual C++ 2015-2022 Redistributable (x64)
# vcredist_x64.exe from Microsoft
- Check Minimum Windows Version:
// In build.rs or lib.rs
#[cfg(windows)]
fn check_os_version() {
// Verify Windows version >= 6.1 (2008 R2)
}
Technologies: Windows Server 2008 R2, Rust, VC++ Runtime, compatibility
Status: ⚠️ Unresolved - needs VPN access to test solutions
Lesson Learned: Test binaries on target OS version early in development. Some Rust crates use modern APIs not available in older Windows versions. Consider maintaining separate builds for legacy OS support.
Problem: Feature Flags for Legacy vs Modern Windows
Date: 2025-12-18 Technologies: Rust, Cargo features, conditional compilation
Symptom:
- Need two builds: modern (Windows 10+) and legacy (Windows 7/2008 R2)
- Modern uses Windows Service APIs directly
- Legacy needs alternative service installation (no Windows Service APIs)
Solution - Cargo Feature Flags:
# agent/Cargo.toml
[features]
default = ["native-service"]
native-service = ["dep:windows-service", "dep:windows"]
legacy = []
[dependencies]
windows-service = { version = "0.7", optional = true }
windows = { version = "0.58", optional = true, features = ["Win32_System_Services"] }
Code with Feature Gates:
// agent/src/service.rs
#[cfg(all(windows, feature = "native-service"))]
pub mod windows_service {
use windows_service::*;
// Full Windows Service implementation
}
#[cfg(all(windows, not(feature = "native-service")))]
pub mod windows_service {
// Legacy stub - no SCM integration
pub fn install() -> Result<()> {
println!("Legacy install - use Task Scheduler or NSSM");
Ok(())
}
}
Build Commands:
# Modern build (default)
cargo build --release
# → gururmm-agent-windows-amd64-0.5.0.exe (3.4 MB)
# Legacy build
cargo build --release --no-default-features --features legacy
# → gururmm-agent-windows-amd64-0.5.0-legacy.exe (3.3 MB)
Technologies: Rust, Cargo, feature flags, conditional compilation, cross-platform
Lesson Learned: Use Cargo features for platform-specific code. This allows single codebase to produce multiple builds for different target environments. Document feature combinations and their intended platforms.
Conclusion
This catalog contains technical solutions for:
- 70+ distinct technical problems
- Spanning 12 years of work (2014-2026)
- 15+ technology stacks
Key themes:
- Legacy system compatibility (DOS, Windows 2008 R2, SMB1)
- Database migrations (MySQL/MariaDB strict mode)
- Email security (SPF, DKIM, DMARC)
- Auto-update mechanisms (binary replacement, rollback safety)
- Cross-platform development (Windows/Linux/macOS builds)
- Authentication protocols (RADIUS, JWT, OAuth)
- Network services (VPN, Tailscale, DNS)
Last Updated: 2026-01-26 Total Session Logs Reviewed: 37 files Total Problems Cataloged: 70+