# SSH Connection Investigation Report

**Investigation Date:** 2026-01-17
**Agent:** SSH/Network Connection Agent
**Issue:** 5 lingering SSH processes + 1 ssh-agent process

---

## Executive Summary

**ROOT CAUSE IDENTIFIED:** Git operations in hooks are spawning SSH processes, but **NOT** for remote repository access. The SSH processes are related to:

1. **Git for Windows SSH configuration** (`core.sshcommand = C:/Windows/System32/OpenSSH/ssh.exe`)
2. **Credential helper operations** (`credential.https://git.azcomputerguru.com.provider=generic`)
3. **Background sync operations** launched by hooks (`sync-contexts &`)

**IMPORTANT:** The repository uses HTTPS, NOT SSH, for git remote operations:
- Remote URL: `https://git.azcomputerguru.com/azcomputerguru/claudetools.git`
- Authentication: Generic credential provider (Windows Credential Manager)

---

## Investigation Findings

### 1. Git Commands in Hooks

**File:** `.claude/hooks/user-prompt-submit`
```bash
git config --local claude.projectid       # line 42
git config --get remote.origin.url        # line 46
```

**File:** `.claude/hooks/task-complete`
```bash
git config --local claude.projectid       # line 40
git config --get remote.origin.url        # line 43
git rev-parse --abbrev-ref HEAD           # line 63
git rev-parse --short HEAD                # line 64
git diff --name-only HEAD~1               # line 67
git log -1 --pretty=format:"%s"           # line 75
```

**Analysis:**
- These commands are **LOCAL ONLY**: they do NOT contact the remote repository
- `git config --local` = reads the local `.git/config` only
- `git config --get remote.origin.url` = reads from local config (no network)
- `git rev-parse` = local repository operations
- `git diff HEAD~1` = local diff (no network)
- `git log -1` = local log (no network)

**Conclusion:** Git commands in hooks should NOT spawn SSH processes for network operations.

---

### 2. Background Sync Operations

**File:** `.claude/hooks/user-prompt-submit` (Line 68)
```bash
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &
```

**File:** `.claude/hooks/task-complete` (Lines 171, 178)
```bash
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &
```

**Analysis:**
- Both hooks spawn `sync-contexts` in the background (`&`)
- `sync-contexts` uses `curl` to POST to the API (HTTP, not SSH)
- Each hook execution spawns a NEW background process

**Process Chain:**
```
Claude Code Hook
  └─> bash user-prompt-submit
       ├─> git config (spawns: bash → git.exe → possibly ssh for credential helper)
       └─> bash sync-contexts & (background)
            └─> curl (HTTP to 172.16.3.30:8001)
```

**Zombie Accumulation:**
- `user-prompt-submit` runs BEFORE each user message
- `task-complete` runs AFTER task completion
- Both spawn background `sync-contexts` processes
- Background processes may not properly terminate
- Each git operation may spawn: bash → git → OpenSSH (suspected, due to `core.sshcommand`)

---

### 3. Git Configuration Analysis

**Global Git Config:**
```
core.sshcommand = C:/Windows/System32/OpenSSH/ssh.exe
credential.https://git.azcomputerguru.com.provider = generic
```

**Why SSH processes may spawn (hypotheses, not yet confirmed):**

1. **Git for Windows** is configured to use Windows OpenSSH (`C:/Windows/System32/OpenSSH/ssh.exe`)
2. Even though the remote is HTTPS, git may invoke SSH for:
   - Credential helper operations
   - GPG signing (if configured)
   - SSH agent for key management
3. **Credential provider** is set to `generic` for the Gitea server
   - This may use Windows Credential Manager
   - Credential operations might trigger ssh-agent

Git documents `core.sshCommand` as the command that `git fetch` and `git push` use when they connect to a remote over SSH, so with an HTTPS remote and local-only commands it would not normally be exercised. Each item above still needs to be verified against the running processes.

**SSH-Agent Purpose:**
- SSH agent (`ssh-agent.exe`) manages SSH keys
- Even with an HTTPS remote, git might use ssh-agent for:
  - Commit signing with SSH keys
  - Credential helper authentication
  - Git LFS operations (if configured)

---

### 4. Process Lifecycle Issues

**Expected Lifecycle:**
```
Hook starts → git config → git spawns ssh → command completes → ssh terminates → hook ends
```

**Actual Behavior (suspected):**
```
Hook starts → git config → git spawns ssh → command completes → ssh lingers (orphaned)
            → sync-contexts & → spawns in background → may not terminate
                              → curl to API
```

**Why processes linger:**

1. **Background processes (`&`)**:
   - `sync-contexts` runs in background
   - Parent hook terminates before child completes
   - Background process becomes orphaned
   - Bash shell keeps running to manage background job

2. **Git spawns SSH but doesn't wait for cleanup**:
   - Git uses OpenSSH for credential operations
   - SSH process may outlive git command
   - No explicit process cleanup

3. **Windows process management**:
   - Orphaned processes don't auto-terminate on Windows
   - Need explicit cleanup or timeout
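
The orphaning described in item 1 can be reproduced in isolation. The sketch below is an illustration only, not code from the hooks: a short-lived parent shell backgrounds a `sleep` (standing in for `sync-contexts`, an assumption) and exits, and the child is still running afterwards. The function name `spawn_and_exit` is hypothetical.

```bash
#!/bin/bash
# spawn_and_exit: start a parent shell that backgrounds a child and exits at once.
# Prints the child's PID. "sleep 5" stands in for sync-contexts (assumption).
spawn_and_exit() {
    bash -c 'sleep 5 >/dev/null 2>&1 & echo $!'
}

child_pid=$(spawn_and_exit)

# The parent shell has already exited; kill -0 only tests whether the PID exists.
if kill -0 "$child_pid" 2>/dev/null; then
    echo "child $child_pid outlived its parent"
fi

# Explicit cleanup, which the hooks currently lack.
kill "$child_pid" 2>/dev/null || true
```

Nothing in the parent waits for or signals the child, which matches the pattern the hooks use at lines 68, 171 and 178.
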

---

### 5. Hook Execution Frequency

**Trigger Points:**
- `user-prompt-submit`: Runs BEFORE every user message
- `task-complete`: Runs AFTER task completion (less frequent)

**Accumulation Pattern:**
```
Session Start:    0 SSH processes
User message 1:   +1-2 SSH processes (user-prompt-submit)
User message 2:   +1-2 SSH processes (accumulating)
User message 3:   +1-2 SSH processes (now 3-6 total)
Task complete:    +1-2 SSH processes (task-complete)
...
```

After 5-10 interactions, at 1-2 processes per interaction: **5-20 zombie SSH processes**
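
The accumulation figures follow from simple multiplication, sketched below under the assumption that no lingering process ever exits on its own. The helper name `estimate_lingering` is hypothetical.

```bash
#!/bin/bash
# estimate_lingering MIN_PER MAX_PER INTERACTIONS
# Prints "LOW-HIGH": the range of lingering processes if none ever exit.
estimate_lingering() {
    local min_per="$1" max_per="$2" interactions="$3"
    echo "$(( min_per * interactions ))-$(( max_per * interactions ))"
}

estimate_lingering 1 2 3    # prints 3-6, matching "User message 3" above
estimate_lingering 1 2 5    # prints 5-10
estimate_lingering 1 2 10   # prints 10-20
```
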

---

## Root Cause Summary

**Primary Cause:** Background `sync-contexts` processes spawned by hooks

**Secondary Cause (unconfirmed):** Git commands may trigger OpenSSH for credential/signing operations

**Contributing Factors:**
1. Hooks spawn background processes with `&` (lines 68, 171, 178)
2. Background processes are not tracked or cleaned up
3. Git is configured with `core.sshcommand` pointing to OpenSSH
4. Each git operation potentially spawns ssh for credential helper
5. Windows does not automatically clean up orphaned processes
6. No timeout or process cleanup mechanism in hooks

---

## Why Git Uses SSH (Despite HTTPS Remote)

Git may invoke SSH even with HTTPS remotes for the following reasons (all still to be verified):

1. **Credential Helper**: Generic credential provider might use ssh-agent
2. **Commit Signing**: If commits are signed with SSH keys instead of GPG (git 2.34+)
3. **Git Config**: `core.sshcommand` explicitly tells git which SSH client to use
4. **Credential Storage**: Windows Credential Manager access might involve ssh-agent
5. **Git LFS**: Large File Storage might use SSH for authentication

**Evidence:**
```bash
git config --global core.sshcommand
# Output: C:/Windows/System32/OpenSSH/ssh.exe

git config --global credential.https://git.azcomputerguru.com.provider
# Output: generic
```

---

## Recommended Fixes

### Fix #1: Remove Background Process Spawning (HIGH PRIORITY)

**Problem:** Hooks spawn `sync-contexts` in background with `&`

**Solution:** Remove background spawning or add proper cleanup

**Files to modify:**
- `.claude/hooks/user-prompt-submit` (line 68)
- `.claude/hooks/task-complete` (lines 171, 178)

**Options:**

**Option A - Remove background spawn (synchronous):**
```bash
# Instead of:
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &

# Use:
bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1
```
**Pros:** Simple, no zombies
**Cons:** Slower hook execution (blocks on sync)

**Option B - Remove sync from hooks entirely:**
```bash
# Comment out or remove the sync-contexts calls
# Let user manually run: bash .claude/hooks/sync-contexts
```
**Pros:** No blocking, no zombies
**Cons:** Requires manual sync or cron job

**Option C - Add timeout and cleanup:**
```bash
# Run with timeout and background cleanup
timeout 10s bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1 &
SYNC_PID=$!
# Register cleanup trap (note: this also stops the sync when the hook exits)
trap 'kill "$SYNC_PID" 2>/dev/null' EXIT
```
**Pros:** Non-blocking with cleanup
**Cons:** More complex; the EXIT trap ends the sync as soon as the hook exits, so a slow sync may be cut short; `timeout` command may not exist on Windows Git Bash
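
If `timeout` turns out to be unavailable, a bounded run can be approximated in plain bash. The sketch below is one possible fallback, not part of the existing hooks; `run_with_timeout` is a hypothetical helper. It runs the command in the foreground of the hook (so it behaves like Option A with an upper bound) and signals only the top-level process it started.

```bash
#!/bin/bash
# run_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND, sending it SIGTERM if it is still alive after SECONDS.
# Returns the command's exit status (143 when terminated by the watchdog).
run_with_timeout() {
    local limit="$1"; shift
    "$@" &
    local cmd_pid=$!

    # Watchdog: poll once per second; kill the command if it outlives the limit.
    (
        elapsed=0
        while [ "$elapsed" -lt "$limit" ]; do
            sleep 1
            elapsed=$(( elapsed + 1 ))
            kill -0 "$cmd_pid" 2>/dev/null || exit 0   # command already gone
        done
        kill "$cmd_pid" 2>/dev/null
    ) >/dev/null 2>&1 &
    local watchdog_pid=$!

    local status=0
    wait "$cmd_pid" || status=$?

    # Stop the watchdog; its current "sleep 1" may linger for up to one second.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true
    return "$status"
}
```

In a hook this could be called as `run_with_timeout 10 bash "$(dirname "${BASH_SOURCE[0]}")/sync-contexts" >/dev/null 2>&1`. Because only the top-level `bash` is signalled, a `curl` started inside `sync-contexts` could still outlive it; giving that `curl` call its own `--max-time` limit would be a complementary safeguard.
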

---

### Fix #2: Reduce Git Command Frequency (MEDIUM PRIORITY)

**Problem:** Every hook execution runs multiple git commands

**Solution:** Cache git values to reduce spawning

**Example optimization:**
```bash
# Cache project ID in environment variable or temp file.
# Note: an exported variable is only visible to this hook process and its
# children; caching across separate hook runs requires a temp file.
if [ -z "$CACHED_PROJECT_ID" ]; then
    PROJECT_ID=$(git config --local claude.projectid 2>/dev/null)
    export CACHED_PROJECT_ID="$PROJECT_ID"
else
    PROJECT_ID="$CACHED_PROJECT_ID"
fi
```

**Estimated impact:** 50% reduction in git command executions
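
Since each hook invocation is a separate process, a cache that should survive between invocations has to live on disk. The following is a minimal sketch of a file-backed cache; the `cached_value` helper and the `CACHE_DIR` location are assumptions introduced here, and the key must be safe to use as a file name.

```bash
#!/bin/bash
# Hypothetical cache directory; not part of the original hooks.
CACHE_DIR="${CACHE_DIR:-${TMPDIR:-/tmp}/claude-hook-cache}"

# cached_value KEY COMMAND [ARGS...]
# Prints the cached value for KEY, running COMMAND only on a cache miss.
cached_value() {
    local key="$1"; shift
    local cache_file="$CACHE_DIR/$key"

    # Cache hit: no command is spawned.
    if [ -f "$cache_file" ]; then
        cat "$cache_file"
        return 0
    fi

    # Cache miss: run the command once and store its output.
    mkdir -p "$CACHE_DIR" || return 1
    local value
    value=$("$@") || return 1
    printf '%s\n' "$value" > "$cache_file"
    printf '%s\n' "$value"
}

# Possible use inside a hook:
# PROJECT_ID=$(cached_value projectid git config --local claude.projectid)
```

This sketch never invalidates entries and does not distinguish between repositories, so a real hook would need a per-repository key and a way to clear the cache when the config changes.
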

---

### Fix #3: Review Git SSH Configuration (LOW PRIORITY)

**Problem:** Git uses SSH even for HTTPS operations

**Investigation needed:**
1. Why is `core.sshcommand` set to OpenSSH?
2. Is SSH needed for credential helper?
3. Is GPG signing using SSH keys?

**Potential fix:**
```bash
# Remove core.sshcommand if not needed
git config --global --unset core.sshcommand

# Or use Git Credential Manager instead of generic
# (in newer Git for Windows releases the helper is named "manager")
git config --global credential.helper manager-core
```

**WARNING:** Test thoroughly before changing, as this may break authentication

---

### Fix #4: Add Process Cleanup to Hooks (MEDIUM PRIORITY)

**Problem:** No cleanup of spawned processes

**Solution:** Add trap handlers to kill child processes on exit

**Example:**
```bash
#!/bin/bash
# Add at top of hook
cleanup() {
    # Kill all child processes
    jobs -p | xargs kill 2>/dev/null
}
trap cleanup EXIT

# ... rest of hook ...
```

---

## Testing Plan

1. **Verify SSH processes before fix:**
   ```powershell
   Get-Process | Where-Object {$_.Name -eq 'ssh' -or $_.Name -eq 'ssh-agent'}
   ```

2. **Apply Fix #1 (remove background spawn)**

3. **Test hook execution:**
   - Send 5 user messages to Claude
   - Check SSH process count after each message

4. **Verify SSH processes after fix:**
   - Should remain constant (1 ssh-agent max)
   - No accumulation of ssh.exe processes

5. **Monitor for 24 hours:**
   - Check process count periodically
   - Verify no zombie accumulation
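
For the repeated checks in steps 3 to 5, the count can also be taken from Git Bash. The sketch below assumes the process listing is supplied in the CSV form produced by `tasklist /FO CSV /NH` (image name as the first quoted field); `count_process` is a hypothetical helper that reads the listing on stdin.

```bash
#!/bin/bash
# count_process IMAGE_NAME
# Reads a CSV process listing on stdin and prints how many rows have
# IMAGE_NAME (case-insensitive) as their first field.
count_process() {
    local image="$1"
    awk -F',' -v img="\"$image\"" \
        'tolower($1) == tolower(img) { n++ } END { print n + 0 }'
}

# Possible use from Git Bash (the doubled slashes are an assumption about
# MSYS path conversion, which would otherwise rewrite /FO as a path):
# tasklist //FO CSV //NH | count_process ssh.exe
# tasklist //FO CSV //NH | count_process ssh-agent.exe
```

Logging these two numbers after each message makes it easy to see whether the `ssh.exe` count stays flat once Fix #1 is applied.
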

---

## Questions Answered

**Q1: Why are git operations spawning SSH?**
A: Git is configured with `core.sshcommand = OpenSSH` and may use SSH for credential helper operations, even with an HTTPS remote.

**Q2: Are hooks deliberately syncing with git remote?**
A: NO. Hooks sync to the API (http://172.16.3.30:8001) via curl, not the git remote.

**Q3: Is ssh-agent supposed to be running?**
A: YES - 1 ssh-agent is normal for Git operations. 5+ ssh.exe processes is NOT normal.

**Q4: Are SSH connections timing out or accumulating?**
A: ACCUMULATING. Processes started by the hooks (background `sync-contexts` and any ssh spawned by git) do not terminate properly.

**Q5: Is ControlMaster/ControlPersist keeping connections alive?**
A: NO - no SSH config file found with ControlMaster settings.

**Q6: Are hooks SUPPOSED to sync with git remote?**
A: NO - this appears to be an unintentional side effect of:
- Background process spawning
- Git credential helper using SSH
- No process cleanup

---

## File Mapping: Which Hooks Spawn SSH

| Hook File | Git Commands | Background Spawn | SSH Risk |
|-----------|-------------|------------------|----------|
| `user-prompt-submit` | 2 git commands | YES (line 68) | HIGH |
| `task-complete` | 6 git commands | YES (2x: lines 171, 178) | CRITICAL |
| `sync-contexts` | 0 git commands | N/A | NONE (curl only) |
| `periodic-context-save` | 1 git command | Unknown | MEDIUM |

**Highest risk:** `task-complete` (spawns background process TWICE + 6 git commands)

---

## Recommended Action Plan

**Immediate (Today):**
1. Apply Fix #1 Option B: Comment out background sync calls in hooks
2. Test with 10 user messages
3. Verify SSH process count remains stable

**Short-term (This Week):**
1. Implement manual sync command or scheduled task for `sync-contexts`
2. Add caching for git values to reduce command frequency
3. Add process cleanup traps to hooks

**Long-term (Future):**
1. Review git SSH configuration necessity
2. Consider alternative credential helper
3. Investigate if GPG/SSH signing is needed
4. Optimize hook execution performance

---

## Success Criteria

**Fix is successful when:**
- SSH process count remains constant (1 ssh-agent max)
- No accumulation of ssh.exe processes over time
- Hooks execute without spawning orphaned background processes
- Context sync still works (either manual or scheduled)

**Monitoring metrics:**
- SSH process count over 24 hours
- Hook execution time
- Context sync success rate
- User message latency

---

**Report Compiled By:** SSH/Network Connection Agent
**Status:** Investigation Complete - Root Cause Identified
**Next Step:** Apply Fix #1 and monitor