claudetools/projects/msp-tools/guru-connect/CHECKLIST_STATE.json
Mike Swanson 8521c95755 Phase 1 Week 2: Infrastructure & Monitoring
Added comprehensive production infrastructure:

Systemd Service:
- guruconnect.service with auto-restart, resource limits, security hardening
- setup-systemd.sh installation script

Prometheus Metrics:
- Added prometheus-client dependency
- Created metrics module tracking:
  - HTTP requests (count, latency)
  - Sessions (created, closed, active)
  - Connections (WebSocket, by type)
  - Errors (by type)
  - Database operations (count, latency)
  - Server uptime
- Added /metrics endpoint
- Background task for uptime updates

Monitoring Configuration:
- prometheus.yml with scrape configs for GuruConnect and node_exporter
- alerts.yml with alerting rules
- grafana-dashboard.json with 10 panels
- setup-monitoring.sh installation script

PostgreSQL Backups:
- backup-postgres.sh with gzip compression
- restore-postgres.sh with safety checks
- guruconnect-backup.service and .timer for automated daily backups
- Retention policy: 30 daily, 4 weekly, 6 monthly

Health Monitoring:
- health-monitor.sh checking HTTP, disk, memory, database, metrics
- guruconnect.logrotate for log rotation
- Email alerts on failures

Updated CHECKLIST_STATE.json to reflect Week 1 completion (77%) and Week 2 start.
Created PHASE1_WEEK2_INFRASTRUCTURE.md with comprehensive planning.

Ready for deployment and testing on RMM server.
2026-01-17 20:24:32 -07:00
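
The retention policy in the commit message (30 daily, 4 weekly, 6 monthly) can be illustrated with a short sketch. This is not the logic in backup-postgres.sh, which is not shown here. It assumes that "weekly" means the newest backup in each of the most recent 4 ISO weeks and "monthly" means the newest backup in each of the most recent 6 calendar months. The function name `backups_to_keep` is hypothetical.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, daily=30, weekly=4, monthly=6):
    """Return the set of backup dates to retain (assumed policy, see lead-in)."""
    ordered = sorted(set(backup_dates), reverse=True)  # newest first

    # Daily tier: the newest `daily` backups.
    keep = set(ordered[:daily])

    # Weekly and monthly tiers: the newest backup in each period.
    # Dicts preserve insertion order, so periods are recorded newest first.
    newest_per_week = {}
    newest_per_month = {}
    for d in ordered:
        iso = d.isocalendar()
        newest_per_week.setdefault((iso[0], iso[1]), d)
        newest_per_month.setdefault((d.year, d.month), d)

    keep.update(list(newest_per_week.values())[:weekly])
    keep.update(list(newest_per_month.values())[:monthly])
    return keep


if __name__ == "__main__":
    # One backup per day for a year, ending on the commit date.
    today = date(2026, 1, 17)
    history = [today - timedelta(days=n) for n in range(365)]
    for d in sorted(backups_to_keep(history)):
        print(d.isoformat())
```

Anything not in the returned set would be a candidate for deletion. Under these assumptions the weekly tier adds nothing when backups run every day, because the last 4 weeks already fall inside the 30-day daily window.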


{
  "project": "GuruConnect",
  "last_updated": "2026-01-18T03:30:00Z",
  "current_phase": 1,
  "current_week": 2,
  "current_day": 1,
  "deployment_status": "deployed_to_production",
  "phases": {
    "phase1": {
      "name": "Security & Infrastructure",
      "status": "in_progress",
      "progress_percentage": 50,
      "checklist_summary": {
        "total_items": 147,
        "completed": 74,
        "in_progress": 0,
        "pending": 73
      },
      "weeks": {
        "week1": {
          "name": "Critical Security Fixes",
          "status": "complete",
          "progress_percentage": 77,
          "items_completed": 10,
          "items_total": 13,
          "completed_items": [
            "SEC-1: Remove hardcoded JWT secret",
            "SEC-1: Add JWT_SECRET environment variable",
            "SEC-1: Validate JWT secret strength",
            "SEC-3: SQL injection audit (verified safe)",
            "SEC-4: IP address extraction and logging",
            "SEC-4: Failed connection attempt logging",
            "SEC-4: API key strength validation",
            "SEC-5: Token blacklist implementation",
            "SEC-5: JWT validation with revocation",
            "SEC-5: Logout and revocation endpoints",
            "SEC-5: Blacklist monitoring tools",
            "SEC-5: Middleware integration",
            "SEC-6: Remove password logging (write to .admin-credentials)",
            "SEC-7: XSS prevention (CSP headers)",
            "SEC-9: Verify Argon2id usage (explicitly configured)",
            "SEC-11: CORS configuration review (restricted origins)",
            "SEC-12: Security headers (6 headers implemented)",
            "SEC-13: Session expiration enforcement (strict validation)",
            "Production deployment to 172.16.3.30:3002",
            "Security header verification via HTTP responses",
            "IP logging operational verification"
          ],
          "deferred_items": [
            "SEC-2: Rate limiting (deferred - tower_governor type issues)",
            "SEC-8: TLS certificate validation (not applicable - NPM handles)",
            "SEC-10: HTTPS enforcement (delegated to NPM reverse proxy)"
          ]
        },
        "week2": {
          "name": "Infrastructure & Monitoring",
          "status": "starting",
          "progress_percentage": 0,
          "items_completed": 0,
          "items_total": 8,
          "pending_items": [
            "Systemd service configuration",
            "Auto-restart on failure",
            "Prometheus metrics endpoint",
            "Grafana dashboard setup",
            "PostgreSQL automated backups",
            "Backup retention policy",
            "Log rotation configuration",
            "Health check monitoring"
          ]
        },
        "week3": {
          "name": "CI/CD & Automation",
          "status": "not_started",
          "progress_percentage": 0,
          "items_total": 6,
          "pending_items": [
            "Gitea CI pipeline configuration",
            "Automated builds on commit",
            "Automated tests in CI",
            "Deployment automation scripts",
            "Build artifact storage",
            "Version tagging automation"
          ]
        },
        "week4": {
          "name": "Production Hardening",
          "status": "not_started",
          "progress_percentage": 0,
          "items_total": 5,
          "pending_items": [
            "Load testing (50+ concurrent sessions)",
            "Performance optimization",
            "Database connection pooling",
            "Security audit",
            "Production deployment checklist"
          ]
        }
      }
    },
    "phase2": {
      "name": "Core Features",
      "status": "not_started",
      "progress_percentage": 0,
      "weeks": {
        "week5": {
          "name": "End-User Portal",
          "status": "not_started"
        },
        "week6-8": {
          "name": "One-Time Agent Download",
          "status": "not_started"
        },
        "week9-12": {
          "name": "Core Session Features",
          "status": "not_started"
        }
      }
    }
  },
  "recent_completions": [
    {
      "timestamp": "2026-01-17T18:00:00Z",
      "item": "SEC-1: JWT Secret Security",
      "notes": "Removed hardcoded secrets, added validation"
    },
    {
      "timestamp": "2026-01-17T18:30:00Z",
      "item": "SEC-3: SQL Injection Audit",
      "notes": "Verified all queries safe"
    },
    {
      "timestamp": "2026-01-17T19:00:00Z",
      "item": "SEC-4: Agent Connection Validation",
      "notes": "IP logging, failed connection tracking complete"
    },
    {
      "timestamp": "2026-01-17T20:30:00Z",
      "item": "SEC-5: Session Takeover Prevention",
      "notes": "Token blacklist and revocation complete"
    },
    {
      "timestamp": "2026-01-18T01:00:00Z",
      "item": "SEC-6 through SEC-13 Implementation",
      "notes": "Password file write, XSS prevention, Argon2id, CORS, security headers, JWT expiration"
    },
    {
      "timestamp": "2026-01-18T02:00:00Z",
      "item": "Production Deployment - Week 1 Security",
      "notes": "All security fixes deployed to 172.16.3.30:3002, verified via curl and logs"
    },
    {
      "timestamp": "2026-01-18T03:06:00Z",
      "item": "Final Deployment Verification",
      "notes": "All security headers operational, server stable (PID 3839055)"
    }
  ],
  "blockers": [
    {
      "item": "SEC-2: Rate Limiting",
      "issue": "tower_governor type incompatibility with Axum 0.7",
      "workaround": "Documented in SEC2_RATE_LIMITING_TODO.md - will revisit with custom middleware"
    },
    {
      "item": "Database Connectivity",
      "issue": "PostgreSQL password authentication failed",
      "impact": "Cannot test token revocation end-to-end, server runs in memory-only mode",
      "workaround": "Server operational without database persistence"
    }
  ],
  "next_milestone": {
    "name": "Phase 1 Week 2 - Infrastructure Complete",
    "target_date": "2026-01-25",
    "deliverables": [
      "Systemd service running with auto-restart",
      "Prometheus metrics exposed",
      "Grafana dashboard configured",
      "Automated PostgreSQL backups",
      "Log rotation configured"
    ]
  }
}
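
The progress_percentage fields above appear to be derived from the item counts stored next to them (10 of 13 rounds to 77, 74 of 147 rounds to 50). The sketch below shows one way to check that the stored percentages still match the counts after a manual edit. It is not part of the repository; the helper names `expected_percentage` and `find_mismatches` are hypothetical, and rounding to the nearest whole percent is an assumption inferred from the values in this file.

```python
import json


def expected_percentage(completed, total):
    """Percentage rounded to the nearest whole number (assumed convention)."""
    return round(100 * completed / total) if total else 0


def find_mismatches(state):
    """Return (location, stored, expected) for every inconsistent percentage."""
    mismatches = []
    for phase_id, phase in state.get("phases", {}).items():
        # Phase level: compare against checklist_summary when it is present.
        summary = phase.get("checklist_summary")
        if summary:
            want = expected_percentage(summary["completed"], summary["total_items"])
            if phase.get("progress_percentage") != want:
                mismatches.append((phase_id, phase.get("progress_percentage"), want))
        # Week level: weeks without items_total carry no counts to check.
        for week_id, week in phase.get("weeks", {}).items():
            if "items_total" not in week:
                continue
            want = expected_percentage(week.get("items_completed", 0), week["items_total"])
            if week.get("progress_percentage") != want:
                mismatches.append((f"{phase_id}.{week_id}", week.get("progress_percentage"), want))
    return mismatches


# A trimmed copy of the structure above, used only for illustration.
sample = json.loads("""
{
  "phases": {
    "phase1": {
      "progress_percentage": 50,
      "checklist_summary": {"total_items": 147, "completed": 74},
      "weeks": {
        "week1": {"progress_percentage": 77, "items_completed": 10, "items_total": 13},
        "week2": {"progress_percentage": 0, "items_completed": 0, "items_total": 8},
        "week3": {"progress_percentage": 0, "items_total": 6}
      }
    },
    "phase2": {
      "progress_percentage": 0,
      "weeks": {"week5": {"status": "not_started"}}
    }
  }
}
""")

if __name__ == "__main__":
    for location, stored, want in find_mismatches(sample):
        print(f"{location}: stored {stored}, expected {want}")
```

To run the check against the real file, load it with `json.load(open("CHECKLIST_STATE.json"))` and pass the result to `find_mismatches`. Note that week1 lists 21 entries in completed_items while items_completed is 10; the count seems to track the 10 distinct SEC identifiers rather than individual entries, so this sketch deliberately does not compare against the list length.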