Zombie Process Investigation - Preliminary Findings
Date: 2026-01-17
Issue: Zombie processes accumulate during long dev sessions until the machine runs out of memory
Reported Symptoms
User reports these specific zombie processes:
- Multiple "Git for Windows" processes
- Multiple "Console Window Host" (conhost.exe) processes
- Many bash instances
- 5 SSH processes
- 1 ssh-agent process
Initial Investigation Findings
SMOKING GUN: periodic_save_check.py
File: .claude/hooks/periodic_save_check.py
Frequency: Runs EVERY 1 MINUTE via Task Scheduler
Problem: Spawns subprocesses without a timeout
Subprocess Calls (per execution):
# Lines 70-76: Git config check (NO TIMEOUT)
subprocess.run(
    ["git", "config", "--local", "claude.projectid"],
    capture_output=True,
    text=True,
    check=False,
    cwd=PROJECT_ROOT,
)
# Lines 81-87: Git remote URL check (NO TIMEOUT)
subprocess.run(
    ["git", "config", "--get", "remote.origin.url"],
    capture_output=True,
    text=True,
    check=False,
    cwd=PROJECT_ROOT,
)
# Lines 102-107: Process check (NO TIMEOUT)
subprocess.run(
    ["tasklist.exe"],
    capture_output=True,
    text=True,
    check=False,
)
Impact Analysis:
- Runs: 60 times/hour, 1,440 times/day
- Each run spawns: 3 subprocess calls
- Total spawns: 180/hour, 4,320/day
- If 1% hang: 1.8 zombies/hour, 43 zombies/day
- If 5% hang: 9 zombies/hour, 216 zombies/day
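The arithmetic above can be reproduced with a short sketch. The function name and its parameters are illustrative only and do not come from periodic_save_check.py; the hang rates are the same hypothetical 1% and 5% figures used in the list.

```python
def spawn_stats(interval_minutes=1, calls_per_run=3, hang_rate=0.01):
    """Estimate subprocess spawns and hung processes per hour and per day."""
    runs_per_hour = 60 / interval_minutes
    spawns_per_hour = runs_per_hour * calls_per_run
    spawns_per_day = spawns_per_hour * 24
    return {
        "runs_per_day": runs_per_hour * 24,
        "spawns_per_hour": spawns_per_hour,
        "spawns_per_day": spawns_per_day,
        # hang_rate is an assumed fraction of calls that never return
        "hung_per_hour": spawns_per_hour * hang_rate,
        "hung_per_day": spawns_per_day * hang_rate,
    }


if __name__ == "__main__":
    for rate in (0.01, 0.05):
        stats = spawn_stats(hang_rate=rate)
        print(
            f"{rate:.0%} hang: {stats['hung_per_hour']:.1f}/hour, "
            f"{stats['hung_per_day']:.1f}/day"
        )
```

Passing interval_minutes=5 shows the effect of running the check less often: the daily spawn count drops from 4,320 to 864.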
Process Tree (Windows):
periodic_save_check.py (python.exe)
└─> git.exe (Git for Windows)
    └─> bash.exe (for git internals)
        └─> conhost.exe (Console Window Host)
Each git command spawns this entire tree!
Why Git/Bash/Conhost Zombies?
Git for Windows Architecture
Git for Windows is built on an MSYS2 runtime (itself derived from Cygwin), and a git invocation can spawn:
- git.exe - Main Git binary
- bash.exe - Shell for git hooks/internals
- conhost.exe - Console host for each shell
Normal Lifecycle
subprocess.run(["git", ...])
→ spawn git.exe
→ git spawns bash.exe
→ bash spawns conhost.exe
→ command completes
→ all processes terminate
Problem Scenarios
Scenario 1: Git Hangs (No Timeout)
- Git operation waits indefinitely
- Subprocess never returns
- Processes accumulate
Scenario 2: Orphaned Processes
- Parent (python) terminates before children
- bash.exe and conhost.exe orphaned
- Windows doesn't auto-kill orphans
Scenario 3: Rapid Spawning
- Running every 60 seconds
- Each call spawns 3 processes
- Cleanup slower than spawning
- Processes accumulate
SSH Process Mystery
Question: Why 5 SSH processes if remote is HTTPS?
Remote URL Check:
git config --get remote.origin.url
# Result: https://git.azcomputerguru.com/azcomputerguru/claudetools.git
Hypotheses:
- Credential Helper: Git HTTPS may use SSH credential helper
- SSH Agent: ssh-agent running for other purposes (GitHub, other repos)
- Git Hooks: Pre-commit/post-commit hooks might use SSH
- Background Fetches: Git background maintenance tasks
- Multiple Repos: Other repos on system using SSH
Action: Agents investigating this further
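Some of the hypotheses above can be checked directly from git configuration. The sketch below is one possible way to do that, not something the investigation has run: the helper name inspect_git_ssh_config and the choice of keys are assumptions. It reads credential.helper, core.sshCommand, and any url.<base>.insteadOf rewrites, which could redirect an HTTPS remote to SSH.

```python
import subprocess

# Config lookups that could explain SSH activity on an HTTPS remote.
CHECKS = {
    "credential_helper": ["git", "config", "--get-all", "credential.helper"],
    "ssh_command": ["git", "config", "--get", "core.sshCommand"],
    "url_rewrites": ["git", "config", "--get-regexp", r"^url\..*\.insteadof$"],
}


def inspect_git_ssh_config(cwd=None, timeout=5):
    """Return the raw output of each lookup, or None if unset or unavailable."""
    findings = {}
    for name, cmd in CHECKS.items():
        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                check=False,
                cwd=cwd,
                timeout=timeout,
            )
            findings[name] = result.stdout.strip() or None
        except (OSError, subprocess.TimeoutExpired):
            # git missing, not runnable, or hung: treat as "no information"
            findings[name] = None
    return findings


if __name__ == "__main__":
    for key, value in inspect_git_ssh_config().items():
        print(f"{key}: {value}")
```

An empty result for all three keys would not rule out the other hypotheses (hooks, background maintenance, other repositories); it only narrows the search.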
Agents Currently Investigating
- Process Investigation Agent (a381b9a): Root cause analysis
- Solution Design Agent (a8dbf87): Proposing solutions
- Code Pattern Review Agent (a06900a): Reviewing subprocess patterns
- Bash Process Lifecycle Agent (a0da635): Bash/git/conhost lifecycle (IN PROGRESS)
- SSH/Network Connection Agent (a6a748f): SSH connection analysis (IN PROGRESS)
Immediate Observations
Confirmed Issues
- [HIGH] No Timeout on Subprocess Calls
  - periodic_save_check.py: 3 calls without timeout
  - If git hangs, process never terminates
  - Fix: Add timeout=5 to all subprocess.run() calls
- [HIGH] High Frequency Execution
  - Every 1 minute = 1,440 executions/day
  - Each spawns 3+ processes
  - Cleanup lag accumulates zombies
- [MEDIUM] No Error Handling
  - No try/finally for cleanup
  - If exception occurs, processes may not clean up
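A minimal sketch of the timeout and error-handling points, assuming a hypothetical helper named run_quiet (it does not exist in periodic_save_check.py). When the timeout elapses, subprocess.run() kills the direct child, waits for it, and raises subprocess.TimeoutExpired, so the caller needs to catch that exception. Grandchildren such as bash.exe or conhost.exe are not covered by this alone.

```python
import subprocess
import sys


def run_quiet(cmd, cwd=None, timeout=5):
    """Run cmd and return its stdout, or None if it fails, hangs, or is missing."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            check=False,
            cwd=cwd,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # run() has already killed and waited for the direct child here.
        return None
    except OSError:
        # Executable not found or not runnable.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_quiet([sys.executable, "-c", "print('ok')"]))
```

With such a helper, the project ID lookup would read run_quiet(["git", "config", "--local", "claude.projectid"], cwd=PROJECT_ROOT).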
Suspected Issues
- [MEDIUM] Git for Windows Process Tree
  - Each git call spawns bash + conhost
  - Windows may not clean up tree properly
  - Need process group cleanup
- [LOW] SSH Processes
  - 5 SSH + 1 ssh-agent
  - Not directly related to HTTPS git URL
  - May be separate issue (background git operations?)
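One way to approach the process-tree concern is to kill the whole tree when a call times out rather than only the direct child. The sketch below is an assumption about how that could look, with a hypothetical helper name run_and_reap. On Windows it shells out to taskkill with /T (tree) and /F (force); on POSIX it starts the child in its own session and signals the process group.

```python
import os
import signal
import subprocess
import sys


def run_and_reap(cmd, timeout=5.0):
    """Run cmd; on timeout kill the child and its descendants.

    Returns (returncode, stdout), or (None, "") if the call timed out.
    """
    kwargs = {}
    if os.name == "nt":
        kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP
    else:
        kwargs["start_new_session"] = True  # child becomes a group leader
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, **kwargs
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        if os.name == "nt":
            # /T also terminates child processes started by this PID.
            subprocess.run(
                ["taskkill", "/T", "/F", "/PID", str(proc.pid)],
                capture_output=True, check=False, timeout=10,
            )
        else:
            try:
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group exited on its own in the meantime
        proc.communicate()  # reap the child and close the pipes
        return None, ""


if __name__ == "__main__":
    print(run_and_reap([sys.executable, "-c", "print('ok')"]))
```

Whether this is sufficient for the git.exe, bash.exe, and conhost.exe tree described here has not been verified; it is a starting point pending the agent reports.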
Recommended Fixes (Pending Agent Reports)
Immediate (High Priority)
- Add Timeouts to All Subprocess Calls
  subprocess.run(
      ["git", "config", "--local", "claude.projectid"],
      capture_output=True,
      text=True,
      check=False,
      cwd=PROJECT_ROOT,
      timeout=5,  # ADD THIS
  )
- Reduce Execution Frequency
  - Change from every 1 minute to every 5 minutes
  - 80% reduction in process spawns
  - Still frequent enough for context saving
- Cache Git Config Results
  - Project ID doesn't change frequently
  - Cache for 5-10 minutes
  - Reduce git calls by 80-90%
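The caching idea could be sketched as follows. Because Task Scheduler starts a fresh Python process on every run, an in-memory cache would not survive between runs, so this sketch assumes a small JSON file on disk. The function name cached_value, the cache file location, and the 300-second default are illustrative assumptions.

```python
import json
import time
from pathlib import Path


def cached_value(cache_file, compute, ttl_seconds=300, now=time.time):
    """Return the value stored in cache_file if still fresh, else recompute it."""
    path = Path(cache_file)
    try:
        entry = json.loads(path.read_text(encoding="utf-8"))
        if now() - entry["saved_at"] < ttl_seconds:
            return entry["value"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or corrupt cache: fall through and recompute
    value = compute()
    path.write_text(
        json.dumps({"saved_at": now(), "value": value}), encoding="utf-8"
    )
    return value
```

A caller would pass the git lookup as compute, for example a lambda wrapping the existing git config call, so git is only spawned when the cached entry is older than the TTL.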
Secondary (Medium Priority)
- Process Group Cleanup
  - Use process groups on Windows
  - Ensure child processes terminate with parent
- Monitor and Alert
  - Track running process count
  - Alert if exceeds threshold
  - Auto-cleanup if memory pressure
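For the monitoring item, a rough sketch of counting processes by image name from the CSV output of tasklist /FO CSV /NH. The function names, the watched image names, and the threshold of 20 are assumptions chosen for illustration; the parser takes the text as an argument so it can be exercised without running tasklist.

```python
import csv
import io
from collections import Counter

# Image names matching the reported symptoms (assumed spellings).
WATCHED = ("git.exe", "bash.exe", "conhost.exe", "ssh.exe", "ssh-agent.exe")


def count_processes(tasklist_csv):
    """Count processes per image name from `tasklist /FO CSV /NH` output."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row:
            counts[row[0].lower()] += 1  # first column is the image name
    return counts


def over_threshold(counts, limit=20, watched=WATCHED):
    """Return the watched image names whose count exceeds limit."""
    return {name: counts[name] for name in watched if counts[name] > limit}


if __name__ == "__main__":
    sample = (
        '"bash.exe","101","Console","1","5,000 K"\n'
        '"bash.exe","102","Console","1","5,100 K"\n'
        '"conhost.exe","103","Console","1","3,000 K"\n'
    )
    print(over_threshold(count_processes(sample), limit=1))
```

In practice the text would come from a subprocess call to tasklist, which should itself carry a timeout.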
Pending Agent Analysis
Waiting for comprehensive reports from:
- Bash Process Lifecycle Agent (analyzing bash/git lifecycle)
- SSH/Network Connection Agent (analyzing SSH zombies)
- Solution Design Agent (proposing comprehensive solution)
- Code Pattern Review Agent (finding all subprocess usage)
Next Steps
- Wait for all agent reports to complete
- Coordinate findings across all agents
- Synthesize comprehensive solution
- Present options to user for final decision
- Implement chosen solution
- Test and verify fix
Status: Investigation in progress
Preliminary Confidence: HIGH that periodic_save_check.py is primary culprit
ETA: Waiting for agent reports (est. 5-10 minutes)