# Zombie Process Investigation - Preliminary Findings

**Date:** 2026-01-17
**Issue:** Zombie processes accumulate during long dev sessions and run the machine out of memory

---

## Reported Symptoms

User reports these specific zombie processes:
1. Multiple "Git for Windows" processes
2. Multiple "Console Window Host" (conhost.exe) processes
3. Many bash instances
4. 5 SSH processes
5. 1 ssh-agent process

---

## Initial Investigation Findings

### SMOKING GUN: periodic_save_check.py

**File:** `.claude/hooks/periodic_save_check.py`
**Frequency:** Runs EVERY MINUTE via Task Scheduler
**Problem:** Spawns subprocesses without a timeout

**Subprocess Calls (per execution):**

```python
# Lines 70-76: Git config check (NO TIMEOUT)
subprocess.run(
    ["git", "config", "--local", "claude.projectid"],
    capture_output=True,
    text=True,
    check=False,
    cwd=PROJECT_ROOT,
)

# Lines 81-87: Git remote URL check (NO TIMEOUT)
subprocess.run(
    ["git", "config", "--get", "remote.origin.url"],
    capture_output=True,
    text=True,
    check=False,
    cwd=PROJECT_ROOT,
)

# Lines 102-107: Process check (NO TIMEOUT)
subprocess.run(
    ["tasklist.exe"],
    capture_output=True,
    text=True,
    check=False,
)
```

**Impact Analysis:**
- Runs: 60 times/hour, 1,440 times/day
- Each run makes: 3 subprocess calls
- Total spawns: 180/hour, 4,320/day
- If 1% hang: 1.8 zombies/hour, about 43 zombies/day
- If 5% hang: 9 zombies/hour, 216 zombies/day

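
The figures above are plain multiplication. A minimal sketch of the arithmetic, assuming each hung call leaves one process behind; the 1% and 5% hang rates are illustrative scenarios from the list, not measurements, and `leaked_processes` is a name made up for this example:

```python
RUNS_PER_HOUR = 60   # one run per minute, per the Task Scheduler frequency
CALLS_PER_RUN = 3    # subprocess.run() calls per execution
HOURS_PER_DAY = 24


def leaked_processes(hang_rate: float) -> tuple[float, float]:
    """Return (per_hour, per_day) hung processes for a given hang rate."""
    spawns_per_hour = RUNS_PER_HOUR * CALLS_PER_RUN  # 180 spawns/hour
    per_hour = spawns_per_hour * hang_rate
    return per_hour, per_hour * HOURS_PER_DAY


for rate in (0.01, 0.05):
    per_hour, per_day = leaked_processes(rate)
    print(f"{rate:.0%} hang rate: {per_hour:.1f}/hour, {per_day:.1f}/day")
```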

**Process Tree (Windows):**
```
periodic_save_check.py (python.exe)
  └─> git.exe (Git for Windows)
      └─> bash.exe (for git internals)
          └─> conhost.exe (Console Window Host)
```

Each of the two git commands can spawn this entire tree; the `tasklist.exe` call does not go through Git.

---

## Why Git/Bash/Conhost Zombies?

### Git for Windows Architecture
Git for Windows is built on MSYS2 (a Cygwin derivative), so a git invocation can spawn:
1. `git.exe` - Main Git binary
2. `bash.exe` - Shell for git hooks/internals
3. `conhost.exe` - Console host for each shell

### Normal Lifecycle
```
subprocess.run(["git", ...])
  → spawn git.exe
  → git spawns bash.exe
  → bash spawns conhost.exe
  → command completes
  → all processes terminate
```

### Problem Scenarios

**Scenario 1: Git Hangs (No Timeout)**
- Git operation waits indefinitely
- `subprocess.run()` never returns
- Processes accumulate

**Scenario 2: Orphaned Processes**
- Parent (python) terminates before its children
- bash.exe and conhost.exe are orphaned
- Windows doesn't auto-kill orphans

**Scenario 3: Rapid Spawning**
- Running every 60 seconds
- Each git call spawns up to 3 processes (git.exe, bash.exe, conhost.exe)
- Cleanup slower than spawning
- Processes accumulate

---

## SSH Process Mystery

**Question:** Why are there 5 SSH processes if the remote is HTTPS?

**Remote URL Check:**
```bash
git config --get remote.origin.url
# Result: https://git.azcomputerguru.com/azcomputerguru/claudetools.git
```

**Hypotheses:**
1. **Credential Helper:** Git over HTTPS may use an SSH-based credential helper
2. **SSH Agent:** ssh-agent running for other purposes (GitHub, other repos)
3. **Git Hooks:** Pre-commit/post-commit hooks might use SSH
4. **Background Fetches:** Git background maintenance tasks
5. **Multiple Repos:** Other repos on the system using SSH

**Action:** Agents are investigating this further

---

## Agents Currently Investigating

1. **Process Investigation Agent (a381b9a):** Root cause analysis
2. **Solution Design Agent (a8dbf87):** Proposing solutions
3. **Code Pattern Review Agent (a06900a):** Reviewing subprocess patterns
4. **Bash Process Lifecycle Agent (a0da635):** Bash/git/conhost lifecycle (IN PROGRESS)
5. **SSH/Network Connection Agent (a6a748f):** SSH connection analysis (IN PROGRESS)

---

## Immediate Observations

### Confirmed Issues

1. [HIGH] **No Timeout on Subprocess Calls**
   - periodic_save_check.py: 3 calls without timeout
   - If git hangs, the process never terminates
   - Fix: Add `timeout=5` to all subprocess.run() calls

2. [HIGH] **High Frequency Execution**
   - Every 1 minute = 1,440 executions/day
   - Each spawns 3+ processes
   - Cleanup lag accumulates zombies

3. [MEDIUM] **No Error Handling**
   - No try/finally for cleanup
   - If an exception occurs, processes may not clean up

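
Items 1 and 3 above can be handled together by wrapping the calls in one helper. The following is a minimal sketch, not the hook's actual code: `run_quiet` is a hypothetical name, and returning `None` on any failure is an assumed convention. One caveat worth verifying on the affected machine: the `timeout` of `subprocess.run()` kills only the direct child, so with captured output the call may still block on Windows if a grandchild keeps the pipe open.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_quiet(cmd: Sequence[str], timeout: float = 5.0,
              cwd: Optional[str] = None) -> Optional[str]:
    """Return stripped stdout, or None on timeout, launch failure, or non-zero exit."""
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            check=False,
            cwd=cwd,
            timeout=timeout,  # run() kills and waits for the direct child on expiry
        )
    except subprocess.TimeoutExpired:
        return None  # the command hung; the caller can skip this cycle
    except OSError:
        return None  # e.g. the executable is not on PATH
    if result.returncode != 0:
        return None
    return result.stdout.strip()


# Demo with the current interpreter so the sketch runs without git installed.
print(run_quiet([sys.executable, "-c", "print('ok')"]))
```

In the hook, a call would look like `run_quiet(["git", "config", "--local", "claude.projectid"], cwd=PROJECT_ROOT)`, with `None` treated as "no project ID available this run".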

### Suspected Issues

4. [MEDIUM] **Git for Windows Process Tree**
   - Each git call spawns bash + conhost
   - Windows may not clean up the tree properly
   - Need process group cleanup

5. [LOW] **SSH Processes**
   - 5 SSH + 1 ssh-agent
   - Not directly related to HTTPS git URL
   - May be a separate issue (background git operations?)

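
Item 4 above asks for cleanup of the whole process tree. As a simpler stand-in for a true process group or job object, the sketch below uses `taskkill /T /F` on Windows to end the tree rooted at the timed-out child. `run_and_reap` is a hypothetical helper, output is discarded for brevity, and on other platforms only the direct child is killed.

```python
import os
import subprocess
import sys
from typing import Optional, Sequence


def run_and_reap(cmd: Sequence[str], timeout: float = 5.0) -> Optional[int]:
    """Run cmd and return its exit code, or None if it had to be killed after timeout."""
    proc = subprocess.Popen(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        if os.name == "nt":
            # /T ends the child and its descendants; /F forces termination.
            subprocess.run(
                ["taskkill", "/PID", str(proc.pid), "/T", "/F"],
                capture_output=True,
                check=False,
            )
        proc.kill()  # fallback, and the only step taken off Windows
        proc.wait()  # reap the child so nothing is left waiting on it
        return None


# Demo: a child that sleeps longer than the timeout gets cleaned up.
print(run_and_reap([sys.executable, "-c", "import time; time.sleep(30)"], timeout=0.5))
```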

---

## Recommended Fixes (Pending Agent Reports)

### Immediate (High Priority)

1. **Add Timeouts to All Subprocess Calls**
   ```python
   subprocess.run(
       ["git", "config", "--local", "claude.projectid"],
       capture_output=True,
       text=True,
       check=False,
       cwd=PROJECT_ROOT,
       timeout=5,  # ADD THIS
   )
   ```

2. **Reduce Execution Frequency**
   - Change from every 1 minute to every 5 minutes
   - 80% reduction in process spawns
   - Still frequent enough for context saving

3. **Cache Git Config Results**
   - Project ID doesn't change frequently
   - Cache for 5-10 minutes
   - Reduce git calls by 80-90%

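
Because Task Scheduler starts a fresh Python process on every run, a cache for item 3 has to live on disk, not in memory. A minimal sketch under that assumption follows. `cached_value`, the JSON layout, and the cache file location are all hypothetical, and `loader` stands for whatever function actually calls git.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


def cached_value(cache_file: Path, loader: Callable[[], str],
                 ttl_seconds: float = 300.0, now: Optional[float] = None) -> str:
    """Return the cached value if it is younger than ttl_seconds, else reload and store it."""
    now = time.time() if now is None else now
    try:
        entry = json.loads(cache_file.read_text(encoding="utf-8"))
        if now - entry["saved_at"] < ttl_seconds:
            return entry["value"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, unreadable, or malformed cache: fall through and reload
    value = loader()
    cache_file.write_text(
        json.dumps({"saved_at": now, "value": value}), encoding="utf-8"
    )
    return value
```

With the job still running every minute, a 5-minute TTL means roughly one run in five calls git (an 80% reduction) and a 10-minute TTL roughly one in ten (90%), which is where the range in item 3 comes from.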

### Secondary (Medium Priority)

4. **Process Group Cleanup**
   - Use process groups on Windows
   - Ensure child processes terminate with parent

5. **Monitor and Alert**
   - Track running process count
   - Alert if it exceeds a threshold
   - Auto-cleanup under memory pressure

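
The tracking part of item 5 could start as a count of processes per image name. The sketch below parses the CSV produced by `tasklist /FO CSV /NH`. The watched image names, the threshold, and the sample rows are assumptions for illustration, not values from the investigation. On the affected machine the input would come from something like `subprocess.run(["tasklist", "/FO", "CSV", "/NH"], capture_output=True, text=True, timeout=10).stdout`.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable

# Image names assumed to match the processes listed under Reported Symptoms.
WATCHED = ("git.exe", "bash.exe", "conhost.exe", "ssh.exe", "ssh-agent.exe")


def count_images(tasklist_csv: str) -> Counter:
    """Count processes per lower-cased image name in `tasklist /FO CSV /NH` output."""
    counts: Counter = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row:  # skip blank lines
            counts[row[0].lower()] += 1
    return counts


def over_threshold(counts: Counter, threshold: int,
                   watched: Iterable[str] = WATCHED) -> Dict[str, int]:
    """Return the watched image names whose process count exceeds threshold."""
    return {name: counts[name] for name in watched if counts[name] > threshold}


# Made-up sample rows in tasklist's CSV layout, for demonstration only.
SAMPLE = (
    '"git.exe","101","Console","1","10,000 K"\n'
    '"git.exe","102","Console","1","10,000 K"\n'
    '"bash.exe","103","Console","1","8,000 K"\n'
)
print(over_threshold(count_images(SAMPLE), threshold=1))
```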

---

## Pending Agent Analysis

Waiting for comprehensive reports from:
- Bash Process Lifecycle Agent (analyzing bash/git lifecycle)
- SSH/Network Connection Agent (analyzing SSH zombies)
- Solution Design Agent (proposing comprehensive solution)
- Code Pattern Review Agent (finding all subprocess usage)

---

## Next Steps

1. Wait for all agent reports to complete
2. Coordinate findings across all agents
3. Synthesize comprehensive solution
4. Present options to user for final decision
5. Implement chosen solution
6. Test and verify fix

---

**Status:** Investigation in progress
**Preliminary Confidence:** HIGH that periodic_save_check.py is the primary culprit
**ETA:** Waiting for agent reports (est. 5-10 minutes)