# GuruConnect - Technical Debt & Future Work Tracker

**Last Updated:** 2026-01-18
**Project Phase:** Phase 1 Complete (89%)

---

## Critical Items (Blocking Production Use)

### 1. Gitea Actions Runner Registration
**Status:** PENDING (requires admin access)
**Priority:** HIGH
**Effort:** 5 minutes
**Tracked In:** PHASE1_WEEK3_COMPLETE.md line 181

**Description:**
The runner is installed but not yet registered with the Gitea instance. The CI/CD pipeline is ready but stays inactive until registration completes.

**Action Required:**
```bash
# Get token from: https://git.azcomputerguru.com/admin/actions/runners
# --no-interactive makes act_runner use the flags below instead of prompting
sudo -u gitea-runner act_runner register \
  --no-interactive \
  --instance https://git.azcomputerguru.com \
  --token YOUR_REGISTRATION_TOKEN_HERE \
  --name gururmm-runner \
  --labels ubuntu-latest,ubuntu-22.04

sudo systemctl enable gitea-runner
sudo systemctl start gitea-runner
```

**Verification:**
- Runner shows "Online" in Gitea admin panel
- Test commit triggers build workflow

---

## High Priority Items (Security & Stability)

### 2. TLS Certificate Auto-Renewal
**Status:** NOT IMPLEMENTED
**Priority:** HIGH
**Effort:** 2-4 hours
**Tracked In:** PHASE1_COMPLETE.md line 51

**Description:**
Let's Encrypt certificates are currently renewed manually. Certbot auto-renewal should be set up so that certificates renew before they expire.

**Implementation:**
```bash
# Install certbot and its nginx plugin
sudo apt install certbot python3-certbot-nginx

# Obtain the certificate and let certbot update the nginx configuration
sudo certbot --nginx -d connect.azcomputerguru.com

# Set up automatic renewal (systemd timer shown; a cron job is the alternative)
sudo systemctl enable certbot.timer
sudo systemctl start certbot.timer
```

**Verification:**
- `sudo certbot renew --dry-run` succeeds
- Certificate auto-renews before expiration

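**Example Expiry Check (sketch):**
To confirm that renewal stays ahead of expiration, the remaining certificate lifetime can be checked from the shell. This is a minimal sketch, not part of the existing tooling: the function name `days_until_expiry` is hypothetical, and it assumes GNU `date` and the date format printed by `openssl x509 -enddate`. The commented usage shows how the value could be read from the live endpoint.

```bash
# days_until_expiry NOT_AFTER [NOW_EPOCH]
# Prints the whole days between NOW_EPOCH (default: current time) and NOT_AFTER.
# NOT_AFTER is a date string such as "Feb 17 00:00:00 2026 GMT".
days_until_expiry() {
  local not_after="$1"
  local now="${2:-$(date +%s)}"
  local expiry
  # GNU date parses the openssl-style date string into epoch seconds.
  expiry=$(date -u -d "$not_after" +%s) || return 1
  echo $(( (expiry - now) / 86400 ))
}

# Usage against the live certificate (not executed here):
#   not_after=$(echo | openssl s_client -connect connect.azcomputerguru.com:443 \
#       -servername connect.azcomputerguru.com 2>/dev/null \
#     | openssl x509 -noout -enddate | cut -d= -f2)
#   days_until_expiry "$not_after"
```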
---

### 3. Systemd Watchdog Implementation
**Status:** PARTIALLY COMPLETED (issue fixed, proper implementation pending)
**Priority:** MEDIUM
**Effort:** 4-8 hours (remaining for sd_notify implementation)
**Discovered:** 2026-01-18 (dashboard 502 error)
**Issue Fixed:** 2026-01-18

**Description:**
The systemd watchdog was causing service crashes. Removing `WatchdogSec=30s` from the service file resolved the immediate 502 error, and the server now runs stably without watchdog configuration. Proper sd_notify watchdog support should still be implemented so that systemd can automatically restart a hung process.

**Implementation:**
1. Add `systemd` crate to server/Cargo.toml
2. Implement `sd_notify_watchdog()` calls in main loop
3. Re-enable `WatchdogSec=30s` in systemd service
4. Test that service doesn't crash and watchdog works

**Files to Modify:**
- `server/Cargo.toml` - Add dependency
- `server/src/main.rs` - Add watchdog notifications
- `/etc/systemd/system/guruconnect.service` - Re-enable WatchdogSec

**Benefits:**
- Systemd can detect hung server process
- Automatic restart on deadlock/hang conditions

---

### 4. Invalid Agent API Key Investigation
**Status:** ONGOING ISSUE
**Priority:** MEDIUM
**Effort:** 1-2 hours
**Discovered:** 2026-01-18

**Description:**
The agent at 172.16.3.20 (machine ID 935a3920-6e32-4da3-a74f-3e8e8b2a426a) reconnects every 5 seconds with an invalid API key.

**Log Evidence:**
```
WARN guruconnect_server::relay: Agent connection rejected: 935a3920-6e32-4da3-a74f-3e8e8b2a426a from 172.16.3.20 - invalid API key
```

**Investigation Needed:**
1. Identify which machine is 172.16.3.20
2. Check agent configuration on that machine
3. Update agent with correct API key OR remove agent
4. Consider implementing rate limiting for failed auth attempts

**Potential Impact:**
- Fills logs with warnings
- Wastes server resources processing invalid connections
- May indicate misconfigured or rogue agent

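**Example Log Tally (sketch):**
To measure how much noise the rejected agent generates, the warning line shown under Log Evidence can be tallied per source address. This is a minimal sketch: the function name `count_rejections` is hypothetical, and it assumes every rejection line follows the exact format of the single sample above.

```bash
# count_rejections: read log lines on stdin and print "<count> <ip>" for each
# source address that was rejected for an invalid API key, busiest first.
count_rejections() {
  grep 'Agent connection rejected' \
    | sed -n 's/.* from \([0-9.]*\) - invalid API key.*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'
}

# Usage (unit name taken from /etc/systemd/system/guruconnect.service):
#   journalctl -u guruconnect --since "1 hour ago" | count_rejections
```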
---

### 5. Comprehensive Security Audit Logging
**Status:** PARTIALLY IMPLEMENTED
**Priority:** MEDIUM
**Effort:** 8-16 hours
**Tracked In:** PHASE1_COMPLETE.md line 51

**Description:**
Current logging covers basic operations only. A comprehensive audit trail is needed for security events.

**Events to Track:**
- All authentication attempts (success/failure)
- Session creation/termination
- Agent connections/disconnections
- User account changes
- Configuration changes
- Administrative actions
- File transfer operations (when implemented)

**Implementation:**
1. Create `audit_logs` table in database
2. Implement `AuditLogger` service
3. Add audit calls to all security-sensitive operations
4. Create audit log viewer in dashboard
5. Implement log retention policy

**Files to Create/Modify:**
- `server/migrations/XXX_create_audit_logs.sql`
- `server/src/audit.rs` - Audit logging service
- `server/src/api/audit.rs` - Audit log API endpoints
- `server/static/audit.html` - Audit log viewer

---

### 6. Session Timeout Enforcement (UI-Side)
**Status:** NOT IMPLEMENTED
**Priority:** MEDIUM
**Effort:** 2-4 hours
**Tracked In:** PHASE1_COMPLETE.md line 51

**Description:**
JWT tokens expire after 24 hours (enforced server-side), but the UI does not detect or handle expiration gracefully.

**Implementation:**
1. Add token expiration check to dashboard JavaScript
2. Implement automatic logout on token expiration
3. Add session timeout warning (e.g., "Session expires in 5 minutes")
4. Implement token refresh mechanism (optional)

**Files to Modify:**
- `server/static/dashboard.html` - Add expiration check
- `server/static/viewer.html` - Add expiration check
- `server/src/api/auth.rs` - Add token refresh endpoint (optional)

**User Experience:**
- User gets warned before automatic logout
- Clear messaging: "Session expired, please log in again"
- No confusing error messages on expired tokens

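**Example Expiration Logic (sketch):**
The dashboard check itself belongs in JavaScript, as listed above. The shell sketch below only illustrates the decision logic (expired, expiring soon, or valid) and can double as a command-line aid when debugging a token. The function names `jwt_exp` and `session_state` are hypothetical, the 300-second warning window mirrors the "5 minutes" example, and the sketch assumes the token carries a numeric `exp` claim and that GNU `base64` is available. It does not verify the signature.

```bash
# jwt_exp TOKEN: print the numeric "exp" claim from the JWT payload.
jwt_exp() {
  local payload
  # The payload is the second dot-separated field, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 -d accepts the input.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d 2>/dev/null \
    | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# session_state TOKEN NOW_EPOCH [WARN_SECS]: print expired, expiring, or valid.
session_state() {
  local exp now="$2" warn="${3:-300}"
  exp=$(jwt_exp "$1")
  if [ -z "$exp" ]; then
    echo "invalid"
    return 1
  fi
  if [ "$exp" -le "$now" ]; then
    echo "expired"      # log the user out
  elif [ $(( exp - now )) -le "$warn" ]; then
    echo "expiring"     # show the timeout warning
  else
    echo "valid"
  fi
}
```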
---

## Medium Priority Items (Operational Excellence)

### 7. Grafana Dashboard Import
**Status:** NOT COMPLETED
**Priority:** MEDIUM
**Effort:** 15 minutes
**Tracked In:** PHASE1_COMPLETE.md

**Description:**
The dashboard JSON file exists but has not been imported into Grafana.

**Action Required:**
1. Log in to Grafana: http://172.16.3.30:3000
2. Go to Dashboards > Import
3. Upload `infrastructure/grafana-dashboard.json`
4. Verify all panels display data

**File Location:**
- `infrastructure/grafana-dashboard.json`

---

### 8. Grafana Default Password Change
**Status:** NOT CHANGED
**Priority:** MEDIUM
**Effort:** 2 minutes
**Tracked In:** Multiple docs

**Description:**
Grafana is still using the default admin/admin credentials.

**Action Required:**
1. Log in to Grafana: http://172.16.3.30:3000
2. Change password from admin/admin to secure password
3. Update documentation with new password

**Security Risk:**
- Low (internal network only, not exposed to internet)
- But should follow security best practices

---

### 9. Deployment SSH Keys for Full Automation
**Status:** NOT CONFIGURED
**Priority:** MEDIUM
**Effort:** 1-2 hours
**Tracked In:** PHASE1_WEEK3_COMPLETE.md, CI_CD_SETUP.md

**Description:**
The CI/CD deployment workflow is ready but requires SSH key configuration before it can run fully automated.

**Implementation:**
```bash
# Generate SSH key for runner
sudo -u gitea-runner ssh-keygen -t ed25519 -C "gitea-runner@gururmm"

# Add public key to guru's authorized_keys
# (a plain ">>" redirect would run as the calling user, not under sudo,
#  so the append goes through tee running as guru instead)
sudo cat /home/gitea-runner/.ssh/id_ed25519.pub | sudo -u guru tee -a ~guru/.ssh/authorized_keys > /dev/null

# Test SSH connection
sudo -u gitea-runner ssh guru@172.16.3.30 whoami

# Add secrets to Gitea repository settings
# SSH_PRIVATE_KEY - content of /home/gitea-runner/.ssh/id_ed25519
# SSH_HOST - 172.16.3.30
# SSH_USER - guru
```

**Current State:**
- Manual deployment works via deploy.sh
- Automated deployment via workflow will fail on SSH step

---

### 10. Backup Offsite Sync
**Status:** NOT IMPLEMENTED
**Priority:** MEDIUM
**Effort:** 4-8 hours
**Tracked In:** PHASE1_COMPLETE.md

**Description:**
Daily backups are stored locally but not synced offsite, so a server failure could destroy both the data and its backups.

**Implementation Options:**

**Option A: Rsync to Remote Server**
```bash
# Add to backup script
rsync -avz /home/guru/backups/guruconnect/ \
  backup-server:/backups/gururmm/guruconnect/
```

**Option B: Cloud Storage (S3, Azure Blob, etc.)**
```bash
# Install rclone
sudo apt install rclone

# Configure cloud provider
rclone config

# Sync backups
rclone sync /home/guru/backups/guruconnect/ remote:guruconnect-backups/
```

**Considerations:**
- Encryption for backups in transit
- Retention policy on remote storage
- Cost of cloud storage
- Bandwidth usage

---

### 11. Alertmanager for Prometheus
**Status:** NOT CONFIGURED
**Priority:** MEDIUM
**Effort:** 4-8 hours
**Tracked In:** PHASE1_COMPLETE.md

**Description:**
Prometheus collects metrics, but no alerting is configured. Alertmanager should send notifications when issues occur.

**Alerts to Configure:**
- Service down
- High error rate
- Database connection failures
- Disk space low
- High CPU/memory usage
- Failed authentication spike

**Implementation:**
```bash
# Install Alertmanager
sudo apt install prometheus-alertmanager

# Configure alert rules
sudo tee /etc/prometheus/alert.rules.yml << 'EOF'
groups:
  - name: guruconnect
    rules:
      - alert: ServiceDown
        expr: up{job="guruconnect"} == 0
        for: 1m
        annotations:
          summary: "GuruConnect service is down"

      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        annotations:
          summary: "High error rate detected"
EOF

# Configure notification channels (email, Slack, etc.)
```

---

### 12. CI/CD Notification Webhooks
**Status:** NOT CONFIGURED
**Priority:** LOW
**Effort:** 2-4 hours
**Tracked In:** PHASE1_COMPLETE.md

**Description:**
No notifications are sent when builds fail or deployments complete.

**Implementation:**
1. Configure webhook in Gitea repository settings
2. Point to Slack/Discord/Email service
3. Select events: Push, Pull Request, Release
4. Test notifications

**Events to Notify:**
- Build started
- Build failed
- Build succeeded
- Deployment started
- Deployment completed
- Deployment failed

---

## Low Priority Items (Future Enhancements)

### 13. Windows Runner for Native Agent Builds
**Status:** NOT IMPLEMENTED
**Priority:** LOW
**Effort:** 8-16 hours
**Tracked In:** PHASE1_WEEK3_COMPLETE.md

**Description:**
The Windows agent is currently cross-compiled from Linux. Native Windows builds would be faster and more reliable.

**Implementation:**
1. Set up Windows server/VM
2. Install Gitea Actions runner on Windows
3. Configure runner with windows-latest label
4. Update build workflow to use Windows runner for agent builds

**Benefits:**
- Faster agent builds (no cross-compilation)
- More accurate Windows testing
- Ability to run Windows-specific tests

**Cost:**
- Windows Server license (or Windows 10/11 Pro)
- Additional hardware/VM resources

---

### 14. Staging Environment
**Status:** NOT IMPLEMENTED
**Priority:** LOW
**Effort:** 16-32 hours
**Tracked In:** PHASE1_COMPLETE.md

**Description:**
All changes deploy directly to production. A staging environment is needed so changes can be tested first.

**Implementation:**
1. Set up staging server (VM or separate port)
2. Configure separate database for staging
3. Update CI/CD workflows:
   - Push to develop → Deploy to staging
   - Push tag → Deploy to production
4. Add smoke tests for staging

**Benefits:**
- Test deployments before production
- QA environment for testing
- Reduced production downtime

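**Example Ref Routing (sketch):**
The branch and tag rules in implementation step 3 can be sketched as a small shell helper for a workflow step. The function name `deploy_target` is hypothetical, and the sketch assumes the runner exposes the pushed ref in full form (`refs/heads/...`, `refs/tags/...`), for example through a `GITHUB_REF`-style variable as GitHub-compatible runners generally provide. Any ref other than `develop` or a tag deploys nowhere.

```bash
# deploy_target REF: print the environment that a pushed ref deploys to.
deploy_target() {
  case "$1" in
    refs/heads/develop) echo "staging" ;;     # push to develop
    refs/tags/*)        echo "production" ;;  # push of any tag
    *)                  echo "none" ;;        # everything else: no deployment
  esac
}

# Usage inside a workflow step (variable name is an assumption):
#   target=$(deploy_target "$GITHUB_REF")
#   [ "$target" = "none" ] || ./deploy.sh "$target"
```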
---

### 15. Code Coverage Thresholds
**Status:** NOT ENFORCED
**Priority:** LOW
**Effort:** 2-4 hours
**Tracked In:** Multiple docs

**Description:**
Code coverage is collected, but no minimum threshold is enforced.

**Implementation:**
1. Analyze current coverage baseline
2. Set reasonable thresholds (e.g., 70% overall)
3. Update test workflow to fail if below threshold
4. Add coverage badge to README

**Files to Modify:**
- `.gitea/workflows/test.yml` - Add threshold check
- `README.md` - Add coverage badge

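**Example Threshold Check (sketch):**
A threshold gate for the test workflow can be as small as one comparison. This is a minimal sketch: the function name `check_coverage` is hypothetical, and it assumes the coverage tool already in use can report an overall percentage as a plain number. The 70 in the usage comment is the example value from the list above, not a measured baseline.

```bash
# check_coverage ACTUAL THRESHOLD
# Succeeds when ACTUAL >= THRESHOLD (both percentages, decimals allowed);
# otherwise prints a message to stderr and returns 1 so the CI step fails.
check_coverage() {
  local actual="$1" threshold="$2"
  # awk handles the floating-point comparison that plain shell tests cannot.
  if awk -v a="$actual" -v t="$threshold" 'BEGIN { exit !(a + 0 >= t + 0) }'; then
    echo "coverage ${actual}% meets threshold ${threshold}%"
  else
    echo "coverage ${actual}% is below threshold ${threshold}%" >&2
    return 1
  fi
}

# Usage in a workflow step (COVERAGE_PCT is a placeholder for the tool's output):
#   check_coverage "$COVERAGE_PCT" 70
```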
---

### 16. Performance Benchmarking in CI
**Status:** NOT IMPLEMENTED
**Priority:** LOW
**Effort:** 8-16 hours
**Tracked In:** PHASE1_COMPLETE.md

**Description:**
There is no automated performance testing, so performance regressions can go unnoticed.

**Implementation:**
1. Create performance benchmarks using `criterion`
2. Add benchmark job to CI workflow
3. Track performance trends over time
4. Alert on performance regression (>10% slower)

**Benchmarks to Add:**
- WebSocket message throughput
- Authentication latency
- Database query performance
- Screen capture encoding speed

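**Example Regression Check (sketch):**
The ">10% slower" rule from step 4 reduces to one comparison between a stored baseline and the current measurement. This is a minimal sketch: the function name `is_regression` is hypothetical, and it assumes both values are timings in the same unit where lower is better. How the numbers are extracted from the benchmark output is left open.

```bash
# is_regression BASELINE CURRENT [LIMIT_PCT]
# Exit 0 when CURRENT is more than LIMIT_PCT percent slower than BASELINE
# (default 10), exit 1 when it is within the limit, exit 2 on a bad baseline.
is_regression() {
  local baseline="$1" current="$2" limit="${3:-10}"
  # (c - b) * 100 > l * b is the percentage test without a division.
  awk -v b="$baseline" -v c="$current" -v l="$limit" \
    'BEGIN { if (b + 0 <= 0) exit 2; exit !((c - b) * 100 > l * b) }'
}

# Usage in a CI step (values are placeholders):
#   if is_regression "$BASELINE_NS" "$CURRENT_NS"; then
#     echo "performance regression detected" >&2
#     exit 1
#   fi
```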
---

### 17. Database Replication
**Status:** NOT IMPLEMENTED
**Priority:** LOW
**Effort:** 16-32 hours
**Tracked In:** PHASE1_COMPLETE.md

**Description:**
The database runs as a single instance, with no high availability or read scaling.

**Implementation:**
1. Set up PostgreSQL streaming replication
2. Configure automatic failover (pg_auto_failover)
3. Update application to use read replicas
4. Test failover scenarios

**Benefits:**
- High availability
- Read scaling
- Faster backups (from replica)

**Complexity:**
- Significant operational overhead
- Monitoring and alerting needed
- Failover testing required

---

### 18. Centralized Logging (ELK Stack)
**Status:** NOT IMPLEMENTED
**Priority:** LOW
**Effort:** 16-32 hours
**Tracked In:** PHASE1_COMPLETE.md

**Description:**
Logs are stored in the systemd journal, which makes them hard to search across time periods.

**Implementation:**
1. Install Elasticsearch, Logstash, Kibana
2. Configure log shipping from systemd journal
3. Create Kibana dashboards
4. Set up log retention policy

**Benefits:**
- Powerful log search
- Log aggregation across services
- Visual log analysis

**Cost:**
- Significant resource usage (RAM for Elasticsearch)
- Operational complexity

---

## Discovered Issues (Need Investigation)

### 19. Agent Connection Retry Logic
**Status:** NEEDS REVIEW
**Priority:** LOW
**Effort:** 2-4 hours
**Discovered:** 2026-01-18

**Description:**
The agent at 172.16.3.20 retries every 5 seconds with an invalid API key (see Item #4). The agent should use exponential backoff, or the server should rate-limit repeated failures.

**Investigation:**
1. Check agent retry logic in codebase
2. Determine if 5-second retry is intentional
3. Consider exponential backoff for failed auth
4. Add server-side rate limiting for repeated failures

**Files to Review:**
- `agent/src/transport/` - WebSocket connection logic
- `server/src/relay/` - Rate limiting for auth failures

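**Example Backoff Schedule (sketch):**
The exponential backoff considered in step 3 doubles the wait after each failed attempt up to a cap. The sketch below only illustrates the schedule in shell; the agent would implement the equivalent in its own transport code. The function name `backoff_delay` is hypothetical, the 5-second base matches the retry interval observed today, and the 300-second cap is an assumed value.

```bash
# backoff_delay ATTEMPT [BASE_SECS] [CAP_SECS]
# Prints the delay before retry number ATTEMPT (0-based): BASE * 2^ATTEMPT,
# never exceeding CAP.
backoff_delay() {
  local attempt="$1" base="${2:-5}" cap="${3:-300}"
  local delay="$base" i=0
  # Stop doubling once the cap is reached so the value cannot overflow.
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay="$cap"
  fi
  echo "$delay"
}

# Schedule with the defaults: 5, 10, 20, 40, 80, 160, 300, 300, ...
#   for n in 0 1 2 3 4 5 6 7; do backoff_delay "$n"; done
```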
---

### 20. Database Connection Pool Sizing
**Status:** NEEDS MONITORING
**Priority:** LOW
**Effort:** 2-4 hours
**Discovered:** During infrastructure setup

**Description:**
The default connection pool settings may not be optimal and need to be monitored under load.

**Monitoring:**
- Check `db_connections_active` metric in Prometheus
- Monitor for pool exhaustion warnings
- Track query latency

**Tuning:**
- Adjust `max_connections` in PostgreSQL config
- Adjust pool size in server .env file
- Monitor and iterate

---

## Completed Items (For Reference)

### ✓ Systemd Service Configuration
**Completed:** 2026-01-17
**Phase:** Phase 1 Week 2

### ✓ Prometheus Metrics Integration
**Completed:** 2026-01-17
**Phase:** Phase 1 Week 2

### ✓ Grafana Dashboard Setup
**Completed:** 2026-01-17
**Phase:** Phase 1 Week 2

### ✓ Automated Backup System
**Completed:** 2026-01-17
**Phase:** Phase 1 Week 2

### ✓ Log Rotation Configuration
**Completed:** 2026-01-17
**Phase:** Phase 1 Week 2

### ✓ CI/CD Workflows Created
**Completed:** 2026-01-18
**Phase:** Phase 1 Week 3

### ✓ Deployment Automation Script
**Completed:** 2026-01-18
**Phase:** Phase 1 Week 3

### ✓ Version Tagging Automation
**Completed:** 2026-01-18
**Phase:** Phase 1 Week 3

### ✓ Gitea Actions Runner Installation
**Completed:** 2026-01-18
**Phase:** Phase 1 Week 3

### ✓ Systemd Watchdog Issue Fixed (Partial Completion)
**Completed:** 2026-01-18
**What Was Done:** Removed `WatchdogSec=30s` from systemd service file
**Result:** Resolved immediate 502 error; server now runs stably
**Status:** Issue fixed but full implementation (sd_notify) still pending
**Item Reference:** Item #3 (full sd_notify implementation remains as future work)
**Impact:** Production server is now stable and responding correctly

---

## Summary by Priority

**Critical (1 item):**
1. Gitea Actions runner registration

**High (4 items):**
2. TLS certificate auto-renewal
4. Invalid agent API key investigation
5. Comprehensive security audit logging
6. Session timeout enforcement

**High - Partial/Pending (1 item):**
3. Systemd watchdog implementation (issue fixed; sd_notify implementation pending)

**Medium (6 items):**
7. Grafana dashboard import
8. Grafana password change
9. Deployment SSH keys
10. Backup offsite sync
11. Alertmanager for Prometheus
12. CI/CD notification webhooks

**Low (8 items):**
13. Windows runner for agent builds
14. Staging environment
15. Code coverage thresholds
16. Performance benchmarking
17. Database replication
18. Centralized logging (ELK)
19. Agent retry logic review
20. Database pool sizing monitoring

---

## Tracking Notes

**How to Use This Document:**
1. Before starting new work, review this list
2. When discovering new issues, add them here
3. When completing items, move to "Completed Items" section
4. Prioritize based on: Security > Stability > Operations > Features
5. Update status and dates as work progresses

**Related Documents:**
- `PHASE1_COMPLETE.md` - Overall Phase 1 status
- `PHASE1_WEEK3_COMPLETE.md` - CI/CD specific items
- `CI_CD_SETUP.md` - CI/CD documentation
- `INFRASTRUCTURE_STATUS.md` - Infrastructure status

---

**Document Version:** 1.1
**Items Tracked:** 20 (1 critical, 4 high, 1 high-partial, 6 medium, 8 low)
**Last Updated:** 2026-01-18 (Item #3 marked as partial completion)
**Next Review:** Before Phase 2 planning