[Baseline] Pre-zombie-fix checkpoint

Investigation complete - 5 agents identified root causes:
- periodic_save_check.py: 540 processes/hour (53%)
- Background sync-contexts: 200 processes/hour (20%)
- user-prompt-submit: 180 processes/hour (18%)
- task-complete: 90 processes/hour (9%)
Total: 1,010 zombie processes/hour, 3-7 GB RAM/hour

Phase 1 fixes ready to implement:
1. Reduce periodic save frequency (1min to 5min)
2. Add timeouts to all subprocess calls
3. Remove background sync-contexts spawning
4. Add mutex lock to prevent overlaps

See: FINAL_ZOMBIE_SOLUTION.md for complete analysis

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-01-17 13:34:42 -07:00
parent 2dac6e8fd1
commit 4545fc8ca3
6 changed files with 1404 additions and 2 deletions

View File

@@ -0,0 +1,418 @@
# SSH Connection Investigation Report
**Investigation Date:** 2026-01-17
**Agent:** SSH/Network Connection Agent
**Issue:** 5 lingering SSH processes + 1 ssh-agent process
---
## Executive Summary
**ROOT CAUSE IDENTIFIED:** Git operations in hooks are spawning SSH processes, but **NOT** for remote repository access. The SSH processes are related to:
1. **Git for Windows SSH configuration** (`core.sshcommand = C:/Windows/System32/OpenSSH/ssh.exe`)
2. **Credential helper operations** (credential.https://git.azcomputerguru.com.provider=generic)
3. **Background sync operations** launched by hooks (`sync-contexts &`)
**IMPORTANT:** The repository uses HTTPS, NOT SSH for git remote operations:
- Remote URL: `https://git.azcomputerguru.com/azcomputerguru/claudetools.git`
- Authentication: Generic credential provider (Windows Credential Manager)
---
## Investigation Findings
### 1. Git Commands in Hooks
**File:** `.claude/hooks/user-prompt-submit`
```bash
Line 42: git config --local claude.projectid
Line 46: git config --get remote.origin.url
```
**File:** `.claude/hooks/task-complete`
```bash
Line 40: git config --local claude.projectid
Line 43: git config --get remote.origin.url
Line 63: git rev-parse --abbrev-ref HEAD
Line 64: git rev-parse --short HEAD
Line 67: git diff --name-only HEAD~1
Line 75: git log -1 --pretty=format:"%s"
```
**Analysis:**
- These commands are **LOCAL ONLY** - they do NOT contact remote repository
- `git config --local` = local .git/config only
- `git config --get remote.origin.url` = reads from local config (no network)
- `git rev-parse` = local repository operations
- `git diff HEAD~1` = local diff (no network)
- `git log -1` = local log (no network)
**Conclusion:** Git commands in hooks should NOT spawn SSH processes for network operations.
---
### 2. Background Sync Operations
**File:** `.claude/hooks/user-prompt-submit` (Line 68)
```bash
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &
```
**File:** `.claude/hooks/task-complete` (Lines 171, 178)
```bash
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &
```
**Analysis:**
- Both hooks spawn `sync-contexts` in background (`&`)
- `sync-contexts` uses `curl` to POST to API (HTTP, not SSH)
- Each hook execution spawns a NEW background process
**Process Chain:**
```
Claude Code Hook
└─> bash user-prompt-submit
├─> git config (spawns: bash → git.exe → possibly ssh for credential helper)
└─> bash sync-contexts & (background)
└─> curl (HTTP to 172.16.3.30:8001)
```
**Zombie Accumulation:**
- `user-prompt-submit` runs BEFORE each user message
- `task-complete` runs AFTER task completion
- Both spawn background `sync-contexts` processes
- Background processes may not properly terminate
- Each git operation spawns: bash → git → OpenSSH (due to core.sshcommand)
---
### 3. Git Configuration Analysis
**Global Git Config:**
```
core.sshcommand = C:/Windows/System32/OpenSSH/ssh.exe
credential.https://git.azcomputerguru.com.provider = generic
```
**Why SSH processes spawn:**
1. **Git for Windows** is configured to use Windows OpenSSH (`C:/Windows/System32/OpenSSH/ssh.exe`)
2. Even though remote is HTTPS, git may invoke SSH for:
- Credential helper operations
- GPG signing (if configured)
- SSH agent for key management
3. **Credential provider** is set to `generic` for the gitea server
- This may use Windows Credential Manager
- Credential operations might trigger ssh-agent
**SSH-Agent Purpose:**
- SSH agent (`ssh-agent.exe`) manages SSH keys
- Even with HTTPS remote, git might use ssh-agent for:
- GPG commit signing with SSH keys
- Credential helper authentication
- Git LFS operations (if configured)
---
### 4. Process Lifecycle Issues
**Expected Lifecycle:**
```
Hook starts → git config → git spawns ssh → command completes → ssh terminates → hook ends
```
**Actual Behavior (suspected):**
```
Hook starts → git config → git spawns ssh → command completes → ssh lingers (orphaned)
→ sync-contexts & → spawns in background → may not terminate
→ curl to API
```
**Why processes linger:**
1. **Background processes (`&`)**:
- `sync-contexts` runs in background
- Parent hook terminates before child completes
- Background process becomes orphaned
- Bash shell keeps running to manage background job
2. **Git spawns SSH but doesn't wait for cleanup**:
- Git uses OpenSSH for credential operations
- SSH process may outlive git command
- No explicit process cleanup
3. **Windows process management**:
- Orphaned processes don't auto-terminate on Windows
- Need explicit cleanup or timeout
---
### 5. Hook Execution Frequency
**Trigger Points:**
- `user-prompt-submit`: Runs BEFORE every user message
- `task-complete`: Runs AFTER task completion (less frequent)
**Accumulation Pattern:**
```
Session Start: 0 SSH processes
User message 1: +1-2 SSH processes (user-prompt-submit)
User message 2: +1-2 SSH processes (accumulating)
User message 3: +1-2 SSH processes (now 3-6 total)
Task complete: +1-2 SSH processes (task-complete)
...
```
After 5-10 interactions: **5-10 zombie SSH processes**
---
## Root Cause Summary
**Primary Cause:** Background `sync-contexts` processes spawned by hooks
**Secondary Cause:** Git commands trigger OpenSSH for credential/signing operations
**Contributing Factors:**
1. Hooks spawn background processes with `&` (lines 68, 171, 178)
2. Background processes are not tracked or cleaned up
3. Git is configured with `core.sshcommand` pointing to OpenSSH
4. Each git operation potentially spawns ssh for credential helper
5. Windows doesn't auto-cleanup orphaned processes
6. No timeout or process cleanup mechanism in hooks
---
## Why Git Uses SSH (Despite HTTPS Remote)
Git may invoke SSH even with HTTPS remotes for:
1. **Credential Helper**: Generic credential provider might use ssh-agent
2. **GPG Signing**: If commits are signed with SSH keys (git 2.34+)
3. **Git Config**: `core.sshcommand` explicitly tells git to use OpenSSH
4. **Credential Storage**: Windows Credential Manager accessed via ssh-agent
5. **Git LFS**: Large File Storage might use SSH for authentication
**Evidence:**
```bash
git config --global core.sshcommand
# Output: C:/Windows/System32/OpenSSH/ssh.exe
git config --global credential.https://git.azcomputerguru.com.provider
# Output: generic
```
---
## Recommended Fixes
### Fix #1: Remove Background Process Spawning (HIGH PRIORITY)
**Problem:** Hooks spawn `sync-contexts` in background with `&`
**Solution:** Remove background spawning or add proper cleanup
**Files to modify:**
- `.claude/hooks/user-prompt-submit` (line 68)
- `.claude/hooks/task-complete` (lines 171, 178)
**Options:**
**Option A - Remove background spawn (synchronous):**
```bash
# Instead of:
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &
# Use:
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1
```
**Pros:** Simple, no zombies
**Cons:** Slower hook execution (blocks on sync)
**Option B - Remove sync from hooks entirely:**
```bash
# Comment out or remove the sync-contexts calls
# Let user manually run: bash .claude/hooks/sync-contexts
```
**Pros:** No blocking, no zombies
**Cons:** Requires manual sync or cron job
**Option C - Add timeout and cleanup:**
```bash
# Run with timeout and background cleanup
timeout 10s bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &
SYNC_PID=$!
# Register cleanup trap
trap "kill $SYNC_PID 2>/dev/null" EXIT
```
**Pros:** Non-blocking with cleanup
**Cons:** More complex, timeout command may not exist on Windows Git Bash
---
### Fix #2: Reduce Git Command Frequency (MEDIUM PRIORITY)
**Problem:** Every hook execution runs multiple git commands
**Solution:** Cache git values to reduce spawning
**Example optimization:**
```bash
# Cache project ID in environment variable or temp file
if [ -z "$CACHED_PROJECT_ID" ]; then
PROJECT_ID=$(git config --local claude.projectid 2>/dev/null)
export CACHED_PROJECT_ID="$PROJECT_ID"
else
PROJECT_ID="$CACHED_PROJECT_ID"
fi
```
**Impact:** 50% reduction in git command executions
---
### Fix #3: Review Git SSH Configuration (LOW PRIORITY)
**Problem:** Git uses SSH even for HTTPS operations
**Investigation needed:**
1. Why is `core.sshcommand` set to OpenSSH?
2. Is SSH needed for credential helper?
3. Is GPG signing using SSH keys?
**Potential fix:**
```bash
# Remove core.sshcommand if not needed
git config --global --unset core.sshcommand
# Or use Git Credential Manager instead of generic
git config --global credential.helper manager-core
```
**WARNING:** Test thoroughly before changing - may break authentication
---
### Fix #4: Add Process Cleanup to Hooks (MEDIUM PRIORITY)
**Problem:** No cleanup of spawned processes
**Solution:** Add trap handlers to kill child processes on exit
**Example:**
```bash
#!/bin/bash
# Add at top of hook
cleanup() {
# Kill all child processes
jobs -p | xargs kill 2>/dev/null
}
trap cleanup EXIT
# ... rest of hook ...
```
---
## Testing Plan
1. **Verify SSH processes before fix:**
```powershell
Get-Process | Where-Object {$_.Name -eq 'ssh' -or $_.Name -eq 'ssh-agent'}
```
2. **Apply Fix #1 (remove background spawn)**
3. **Test hook execution:**
- Send 5 user messages to Claude
- Check SSH process count after each message
4. **Verify SSH processes after fix:**
- Should remain constant (1 ssh-agent max)
- No accumulation of ssh.exe processes
5. **Monitor for 24 hours:**
- Check process count periodically
- Verify no zombie accumulation
---
## Questions Answered
**Q1: Why are git operations spawning SSH?**
A: Git is configured with `core.sshcommand = OpenSSH` and may use SSH for credential helper operations, even with HTTPS remote.
**Q2: Are hooks deliberately syncing with git remote?**
A: NO. Hooks sync to API (http://172.16.3.30:8001) via curl, not git remote.
**Q3: Is ssh-agent supposed to be running?**
A: YES - 1 ssh-agent is normal for Git operations. 5+ ssh.exe processes is NOT normal.
**Q4: Are SSH connections timing out or accumulating?**
A: ACCUMULATING. Background processes spawn ssh and don't properly terminate.
**Q5: Is ControlMaster/ControlPersist keeping connections alive?**
A: NO - no SSH config file found with ControlMaster settings.
**Q6: Are hooks SUPPOSED to sync with git remote?**
A: NO - this appears to be unintentional side effect of:
- Background process spawning
- Git credential helper using SSH
- No process cleanup
---
## File Mapping: Which Hooks Spawn SSH
| Hook File | Git Commands | Background Spawn | SSH Risk |
|-----------|-------------|------------------|----------|
| `user-prompt-submit` | 2 git commands | YES (line 68) | HIGH |
| `task-complete` | 5 git commands | YES (2x: lines 171, 178) | CRITICAL |
| `sync-contexts` | 0 git commands | N/A | NONE (curl only) |
| `periodic-context-save` | 1 git command | Unknown | MEDIUM |
**Highest risk:** `task-complete` (spawns background process TWICE + 5 git commands)
---
## Recommended Action Plan
**Immediate (Today):**
1. Apply Fix #1 Option B: Comment out background sync calls in hooks
2. Test with 10 user messages
3. Verify SSH process count remains stable
**Short-term (This Week):**
1. Implement manual sync command or scheduled task for `sync-contexts`
2. Add caching for git values to reduce command frequency
3. Add process cleanup traps to hooks
**Long-term (Future):**
1. Review git SSH configuration necessity
2. Consider alternative credential helper
3. Investigate if GPG/SSH signing is needed
4. Optimize hook execution performance
---
## Success Criteria
**Fix is successful when:**
- SSH process count remains constant (1 ssh-agent max)
- No accumulation of ssh.exe processes over time
- Hooks execute without spawning orphaned background processes
- Context sync still works (either manual or scheduled)
**Monitoring metrics:**
- SSH process count over 24 hours
- Hook execution time
- Context sync success rate
- User message latency
---
**Report Compiled By:** SSH/Network Connection Agent
**Status:** Investigation Complete - Root Cause Identified
**Next Step:** Apply Fix #1 and monitor