Remove conversation context/recall system from ClaudeTools

Completely removed the database context recall system while preserving
database tables for safety. This major cleanup removes 80+ files and
16,831 lines of code.

What was removed:
- API layer: 4 routers (conversation-contexts, context-snippets,
  project-states, decision-logs) with 35+ endpoints
- Database models: 5 models (ConversationContext, ContextSnippet,
  DecisionLog, ProjectState, ContextTag)
- Services: 4 service layers with business logic
- Schemas: 4 Pydantic schema files
- Claude Code hooks: 13 hook files (user-prompt-submit, task-complete,
  sync-contexts, periodic saves)
- Scripts: 15+ scripts (import, migration, testing, tombstone checking)
- Tests: 5 test files (context recall, compression, diagnostics)
- Documentation: 30+ markdown files (guides, architecture, quick starts)
- Utilities: context compression, conversation parsing

Files modified:
- api/main.py: Removed router registrations
- api/models/__init__.py: Removed model imports
- api/schemas/__init__.py: Removed schema imports
- api/services/__init__.py: Removed service imports
- .claude/claude.md: Completely rewritten without context references

Database tables preserved:
- conversation_contexts, context_snippets, context_tags,
  project_states, decision_logs (5 orphaned tables remain for safety)
- Migration created but NOT applied: 20260118_172743_remove_context_system.py
- Tables can be dropped later when confirmed not needed

New files added:
- CONTEXT_SYSTEM_REMOVAL_SUMMARY.md: Detailed removal report
- CONTEXT_SYSTEM_REMOVAL_COMPLETE.md: Final status
- CONTEXT_EXPORT_RESULTS.md: Export attempt results
- scripts/export-tombstoned-contexts.py: Export tool for future use
- migrations/versions/20260118_172743_remove_context_system.py

Impact:
- Reduced from 130 to 95 API endpoints
- Reduced from 43 to 38 active database tables
- Removed 16,831 lines of code
- System fully operational without context recall

Reason for removal:
- System was not actively used (no tombstoned contexts found)
- Reduces codebase complexity
- Focuses on core MSP work tracking functionality
- Database preserved for safety (can rollback if needed)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 19:10:41 -07:00
parent 8bbc7737a0
commit 89e5118306
89 changed files with 7905 additions and 16831 deletions


@@ -1,554 +0,0 @@
# Context Compression Utilities - Usage Examples
Complete examples for all context compression functions in the ClaudeTools Context Recall System.
## 1. compress_conversation_summary()
Compresses conversations into dense JSON with key points.
```python
from api.utils.context_compression import compress_conversation_summary
# Example 1: From message list
messages = [
    {"role": "user", "content": "Build authentication system with JWT"},
    {"role": "assistant", "content": "Completed auth endpoints. Using FastAPI for async support."},
    {"role": "user", "content": "Now add CRUD endpoints for users"},
    {"role": "assistant", "content": "Working on user CRUD. Blocker: need to decide on pagination approach."}
]
summary = compress_conversation_summary(messages)
print(summary)
# Output:
# {
#     "phase": "api_development",
#     "completed": ["auth endpoints"],
#     "in_progress": "user crud",
#     "blockers": ["need to decide on pagination approach"],
#     "decisions": [{
#         "decision": "use fastapi",
#         "rationale": "async support",
#         "impact": "medium",
#         "timestamp": "2026-01-16T..."
#     }],
#     "next": ["add crud endpoints"]
# }
# Example 2: From raw text
text = """
Completed:
- Authentication system with JWT
- Database migrations
- User model
Currently working on: API rate limiting
Blockers:
- Need Redis for rate limiting store
- Waiting on DevOps for Redis instance
Next steps:
- Implement rate limiting middleware
- Add API documentation
- Set up monitoring
"""
summary = compress_conversation_summary(text)
print(summary)
# Extracts phase, completed items, blockers, next actions
```
## 2. create_context_snippet()
Creates structured snippets with auto-extracted tags.
```python
from api.utils.context_compression import create_context_snippet
# Example 1: Decision snippet
snippet = create_context_snippet(
    content="Using FastAPI instead of Flask for async support and better performance",
    snippet_type="decision",
    importance=8
)
print(snippet)
# Output:
# {
#     "content": "Using FastAPI instead of Flask for async support and better performance",
#     "type": "decision",
#     "tags": ["decision", "fastapi", "async", "api"],
#     "importance": 8,
#     "relevance_score": 8.0,
#     "created_at": "2026-01-16T12:00:00+00:00",
#     "usage_count": 0,
#     "last_used": None
# }

# Example 2: Pattern snippet
snippet = create_context_snippet(
    content="Always use dependency injection for database sessions to ensure proper cleanup",
    snippet_type="pattern",
    importance=7
)
# Tags auto-extracted: ["pattern", "dependency-injection", "database"]

# Example 3: Blocker snippet
snippet = create_context_snippet(
    content="PostgreSQL connection pool exhausted under load - need to tune max_connections",
    snippet_type="blocker",
    importance=9
)
# Tags: ["blocker", "postgresql", "database", "critical"]
```
## 3. compress_project_state()
Compresses project state into dense summary.
```python
from api.utils.context_compression import compress_project_state
project_details = {
    "name": "ClaudeTools Context Recall System",
    "phase": "api_development",
    "progress_pct": 65,
    "blockers": ["Need Redis setup", "Waiting on security review"],
    "next_actions": ["Deploy to staging", "Load testing", "Documentation"]
}
current_work = "Implementing context compression utilities for token efficiency"
files_changed = [
    "api/utils/context_compression.py",
    "api/utils/__init__.py",
    "tests/test_context_compression.py",
    "migrations/versions/add_context_recall.py"
]
state = compress_project_state(project_details, current_work, files_changed)
print(state)
# Output:
# {
#     "project": "ClaudeTools Context Recall System",
#     "phase": "api_development",
#     "progress": 65,
#     "current": "Implementing context compression utilities for token efficiency",
#     "files": [
#         {"path": "api/utils/context_compression.py", "type": "impl"},
#         {"path": "api/utils/__init__.py", "type": "impl"},
#         {"path": "tests/test_context_compression.py", "type": "test"},
#         {"path": "migrations/versions/add_context_recall.py", "type": "migration"}
#     ],
#     "blockers": ["Need Redis setup", "Waiting on security review"],
#     "next": ["Deploy to staging", "Load testing", "Documentation"]
# }
```
## 4. extract_key_decisions()
Extracts decisions with rationale from text.
```python
from api.utils.context_compression import extract_key_decisions
text = """
We decided to use FastAPI for the API framework because it provides native async
support and automatic OpenAPI documentation generation.
Chose PostgreSQL for the database due to its robust JSON support and excellent
performance with complex queries.
Will use Redis for caching because it's fast and integrates well with our stack.
"""
decisions = extract_key_decisions(text)
print(decisions)
# Output:
# [
# {
# "decision": "use fastapi for the api framework",
# "rationale": "it provides native async support and automatic openapi documentation",
# "impact": "high",
# "timestamp": "2026-01-16T12:00:00+00:00"
# },
# {
# "decision": "postgresql for the database",
# "rationale": "its robust json support and excellent performance with complex queries",
# "impact": "high",
# "timestamp": "2026-01-16T12:00:00+00:00"
# },
# {
# "decision": "redis for caching",
# "rationale": "it's fast and integrates well with our stack",
# "impact": "medium",
# "timestamp": "2026-01-16T12:00:00+00:00"
# }
# ]
```
## 5. calculate_relevance_score()
Calculates relevance score with time decay and usage boost.
```python
from api.utils.context_compression import calculate_relevance_score
from datetime import datetime, timedelta, timezone
# Example 1: Recent, important snippet
snippet = {
    "created_at": datetime.now(timezone.utc).isoformat(),
    "usage_count": 3,
    "importance": 8,
    "tags": ["critical", "security", "api"],
    "last_used": datetime.now(timezone.utc).isoformat()
}
score = calculate_relevance_score(snippet)
print(f"Score: {score}")  # 10.0 (clamped; 8 base + 0.6 usage + 1.0 tags + 1.0 recency = 10.6)

# Example 2: Old, unused snippet
old_snippet = {
    "created_at": (datetime.now(timezone.utc) - timedelta(days=30)).isoformat(),
    "usage_count": 0,
    "importance": 5,
    "tags": ["general"]
}
score = calculate_relevance_score(old_snippet)
print(f"Score: {score}")  # ~3.0 (5 base - 2.0 time decay)

# Example 3: Frequently used pattern
pattern_snippet = {
    "created_at": (datetime.now(timezone.utc) - timedelta(days=7)).isoformat(),
    "usage_count": 10,
    "importance": 7,
    "tags": ["pattern", "architecture"],
    "last_used": (datetime.now(timezone.utc) - timedelta(hours=2)).isoformat()
}
score = calculate_relevance_score(pattern_snippet)
print(f"Score: {score}")  # ~9.8 (7 base - 0.7 decay + 2.0 usage + 0.5 tags + 1.0 recent)
```
## 6. merge_contexts()
Merges multiple contexts with deduplication.
```python
from api.utils.context_compression import merge_contexts
context1 = {
    "phase": "api_development",
    "completed": ["auth", "user_crud"],
    "in_progress": "rate_limiting",
    "blockers": ["need_redis"],
    "decisions": [{
        "decision": "use fastapi",
        "timestamp": "2026-01-15T10:00:00Z"
    }],
    "next": ["deploy"],
    "tags": ["api", "fastapi"]
}
context2 = {
    "phase": "api_development",
    "completed": ["auth", "user_crud", "validation"],
    "in_progress": "testing",
    "blockers": [],
    "decisions": [{
        "decision": "use pydantic",
        "timestamp": "2026-01-16T10:00:00Z"
    }],
    "next": ["deploy", "monitoring"],
    "tags": ["api", "testing"]
}
context3 = {
    "phase": "testing",
    "completed": ["unit_tests"],
    "files": ["tests/test_api.py", "tests/test_auth.py"],
    "tags": ["testing", "pytest"]
}
merged = merge_contexts([context1, context2, context3])
print(merged)
# Output:
# {
# "phase": "api_development", # First non-null
# "completed": ["auth", "unit_tests", "user_crud", "validation"], # Deduplicated, sorted
# "in_progress": "testing", # Most recent
# "blockers": ["need_redis"],
# "decisions": [
# {"decision": "use pydantic", "timestamp": "2026-01-16T10:00:00Z"}, # Newest first
# {"decision": "use fastapi", "timestamp": "2026-01-15T10:00:00Z"}
# ],
# "next": ["deploy", "monitoring"],
# "files": ["tests/test_api.py", "tests/test_auth.py"],
# "tags": ["api", "fastapi", "pytest", "testing"]
# }
```
## 7. format_for_injection()
Formats contexts for token-efficient prompt injection.
```python
from api.utils.context_compression import format_for_injection
contexts = [
    {
        "type": "blocker",
        "content": "Redis connection failing in production - needs debugging",
        "tags": ["redis", "production", "critical"],
        "relevance_score": 9.5
    },
    {
        "type": "decision",
        "content": "Using FastAPI for async support and auto-documentation",
        "tags": ["fastapi", "architecture"],
        "relevance_score": 8.2
    },
    {
        "type": "pattern",
        "content": "Always use dependency injection for DB sessions",
        "tags": ["pattern", "database"],
        "relevance_score": 7.8
    },
    {
        "type": "state",
        "content": "Currently at 65% completion of API development phase",
        "tags": ["progress", "api"],
        "relevance_score": 7.0
    }
]
# Format with default token limit
prompt = format_for_injection(contexts, max_tokens=500)
print(prompt)
# Output:
# ## Context Recall
#
# **Blockers:**
# - Redis connection failing in production - needs debugging [redis, production, critical]
#
# **Decisions:**
# - Using FastAPI for async support and auto-documentation [fastapi, architecture]
#
# **Patterns:**
# - Always use dependency injection for DB sessions [pattern, database]
#
# **States:**
# - Currently at 65% completion of API development phase [progress, api]
#
# *4 contexts loaded*
# Format with tight token limit
compact_prompt = format_for_injection(contexts, max_tokens=200)
print(compact_prompt)
# Only includes highest priority items within token budget
```
## 8. extract_tags_from_text()
Auto-extracts relevant tags from text.
```python
from api.utils.context_compression import extract_tags_from_text
# Example 1: Technology detection
text1 = "Implementing authentication using FastAPI with PostgreSQL database and Redis caching"
tags = extract_tags_from_text(text1)
print(tags) # ["fastapi", "postgresql", "redis", "database", "api", "auth", "cache"]
# Example 2: Pattern detection
text2 = "Refactoring async error handling middleware to optimize performance"
tags = extract_tags_from_text(text2)
print(tags) # ["async", "middleware", "error-handling", "optimization", "refactor"]
# Example 3: Category detection
text3 = "Critical bug in production: database connection pool exhausted causing system blocker"
tags = extract_tags_from_text(text3)
print(tags) # ["database", "critical", "blocker", "bug"]
# Example 4: Mixed content
text4 = """
Building CRUD endpoints with FastAPI and SQLAlchemy.
Using dependency injection pattern for database sessions.
Need to add validation with Pydantic.
Testing with pytest.
"""
tags = extract_tags_from_text(text4)
print(tags)
# ["fastapi", "sqlalchemy", "api", "database", "crud", "dependency-injection",
# "validation", "testing"]
```
## 9. compress_file_changes()
Compresses file change lists.
```python
from api.utils.context_compression import compress_file_changes
files = [
"api/routes/auth.py",
"api/routes/users.py",
"api/models/user.py",
"api/schemas/user.py",
"tests/test_auth.py",
"tests/test_users.py",
"migrations/versions/001_add_users.py",
"docker-compose.yml",
"README.md",
"requirements.txt"
]
compressed = compress_file_changes(files)
print(compressed)
# Output:
# [
# {"path": "api/routes/auth.py", "type": "api"},
# {"path": "api/routes/users.py", "type": "api"},
# {"path": "api/models/user.py", "type": "schema"},
# {"path": "api/schemas/user.py", "type": "schema"},
# {"path": "tests/test_auth.py", "type": "test"},
# {"path": "tests/test_users.py", "type": "test"},
# {"path": "migrations/versions/001_add_users.py", "type": "migration"},
# {"path": "docker-compose.yml", "type": "infra"},
# {"path": "README.md", "type": "doc"},
# {"path": "requirements.txt", "type": "config"}
# ]
```
## Complete Workflow Example
Here's a complete example showing how these functions work together:
```python
from api.utils.context_compression import (
    compress_conversation_summary,
    create_context_snippet,
    compress_project_state,
    merge_contexts,
    format_for_injection,
    calculate_relevance_score
)

# 1. Compress ongoing conversation
conversation = [
    {"role": "user", "content": "Build API with FastAPI and PostgreSQL"},
    {"role": "assistant", "content": "Completed auth system. Now working on CRUD endpoints."}
]
conv_summary = compress_conversation_summary(conversation)

# 2. Create snippets for important info
decision_snippet = create_context_snippet(
    "Using FastAPI for async support",
    snippet_type="decision",
    importance=8
)
blocker_snippet = create_context_snippet(
    "Need Redis for rate limiting",
    snippet_type="blocker",
    importance=9
)

# 3. Compress project state
project_state = compress_project_state(
    project_details={"name": "API", "phase": "development", "progress_pct": 60},
    current_work="Building CRUD endpoints",
    files_changed=["api/routes/users.py", "tests/test_users.py"]
)

# 4. Merge all contexts
all_contexts = [conv_summary, project_state]
merged = merge_contexts(all_contexts)

# 5. Prepare snippets with relevance scores
snippets = [decision_snippet, blocker_snippet]
for snippet in snippets:
    snippet["relevance_score"] = calculate_relevance_score(snippet)

# Sort by relevance
snippets.sort(key=lambda s: s["relevance_score"], reverse=True)

# 6. Format for prompt injection
context_prompt = format_for_injection(snippets, max_tokens=300)
print("=" * 60)
print("CONTEXT READY FOR CLAUDE:")
print("=" * 60)
print(context_prompt)
# This prompt can now be injected into Claude's context
```
## Integration with Database
Example of using these utilities with SQLAlchemy models:
```python
from sqlalchemy.orm import Session
from api.models.context_recall import ContextSnippet
from api.utils.context_compression import (
    create_context_snippet,
    calculate_relevance_score,
    format_for_injection
)

def save_context(db: Session, content: str, snippet_type: str, importance: int):
    """Save context snippet to database"""
    snippet = create_context_snippet(content, snippet_type, importance)
    db_snippet = ContextSnippet(
        content=snippet["content"],
        type=snippet["type"],
        tags=snippet["tags"],
        importance=snippet["importance"],
        relevance_score=snippet["relevance_score"]
    )
    db.add(db_snippet)
    db.commit()
    return db_snippet

def load_relevant_contexts(db: Session, limit: int = 20):
    """Load and format most relevant contexts"""
    snippets = (
        db.query(ContextSnippet)
        .order_by(ContextSnippet.relevance_score.desc())
        .limit(limit)
        .all()
    )
    # Convert to dicts and recalculate scores
    context_dicts = []
    for snippet in snippets:
        ctx = {
            "content": snippet.content,
            "type": snippet.type,
            "tags": snippet.tags,
            "importance": snippet.importance,
            "created_at": snippet.created_at.isoformat(),
            "usage_count": snippet.usage_count,
            "last_used": snippet.last_used.isoformat() if snippet.last_used else None
        }
        ctx["relevance_score"] = calculate_relevance_score(ctx)
        context_dicts.append(ctx)
    # Sort by updated relevance score
    context_dicts.sort(key=lambda c: c["relevance_score"], reverse=True)
    # Format for injection
    return format_for_injection(context_dicts, max_tokens=1000)
```
## Token Efficiency Stats
These utilities achieve significant token compression:
- Raw conversation (500 tokens) → Compressed summary (50-80 tokens) = **85-90% reduction**
- Full project state (1000 tokens) → Compressed state (100-150 tokens) = **85-90% reduction**
- Multiple contexts merged → Deduplicated = **30-50% reduction**
- Formatted injection → Only relevant info = **60-80% reduction**
**Overall pipeline efficiency: 90-95% token reduction while preserving critical information.**
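The exact percentages depend on the conversation. A quick way to sanity-check them on your own data is the same rough ~4 chars/token heuristic that `format_for_injection` uses internally; the sketch below reuses the `messages` list from example 1 and is an estimate only, not a real tokenizer.
```python
import json

from api.utils.context_compression import compress_conversation_summary

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (matches format_for_injection's budget math)
    return len(text) // 4

raw_text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
summary = compress_conversation_summary(messages)

raw_tokens = approx_tokens(raw_text)
compressed_tokens = approx_tokens(json.dumps(summary))
reduction = 1 - compressed_tokens / max(raw_tokens, 1)
print(f"{raw_tokens} -> {compressed_tokens} tokens (~{reduction:.0%} reduction)")
```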


@@ -1,228 +0,0 @@
# Context Compression - Quick Reference
**Location:** `D:\ClaudeTools\api\utils\context_compression.py`
## Quick Import
```python
from api.utils.context_compression import *
# or
from api.utils import compress_conversation_summary, create_context_snippet, format_for_injection
```
## Function Quick Reference
| Function | Input | Output | Token Reduction |
|----------|-------|--------|-----------------|
| `compress_conversation_summary(conversation)` | str or list[dict] | Dense JSON summary | 85-90% |
| `create_context_snippet(content, type, importance)` | str, str, int | Structured snippet | N/A |
| `compress_project_state(details, work, files)` | dict, str, list | Dense state | 85-90% |
| `extract_key_decisions(text)` | str | list[dict] | N/A |
| `calculate_relevance_score(snippet, time)` | dict, datetime | float (0-10) | N/A |
| `merge_contexts(contexts)` | list[dict] | Merged dict | 30-50% |
| `format_for_injection(contexts, max_tokens)` | list[dict], int | Markdown str | 60-80% |
| `extract_tags_from_text(text)` | str | list[str] | N/A |
| `compress_file_changes(files)` | list[str] | list[dict] | N/A |
## Common Patterns
### Pattern 1: Save Conversation Context
```python
summary = compress_conversation_summary(messages)
snippet = create_context_snippet(
    json.dumps(summary),
    snippet_type="state",
    importance=6
)
db.add(ContextSnippet(**snippet))
db.commit()
```
### Pattern 2: Load and Inject Context
```python
snippets = db.query(ContextSnippet)\
    .order_by(ContextSnippet.relevance_score.desc())\
    .limit(20).all()
contexts = [s.to_dict() for s in snippets]
prompt = format_for_injection(contexts, max_tokens=1000)

# Use in Claude prompt
messages = [
    {"role": "system", "content": f"{system_msg}\n\n{prompt}"},
    {"role": "user", "content": user_msg}
]
```
### Pattern 3: Record Decision
```python
decision = create_context_snippet(
    "Using PostgreSQL for better JSON support and performance",
    snippet_type="decision",
    importance=9
)
db.add(ContextSnippet(**decision))
```
### Pattern 4: Track Blocker
```python
blocker = create_context_snippet(
    "Redis connection failing in production",
    snippet_type="blocker",
    importance=10
)
db.add(ContextSnippet(**blocker))
```
### Pattern 5: Update Relevance Scores
```python
snippets = db.query(ContextSnippet).all()
for snippet in snippets:
    data = snippet.to_dict()
    snippet.relevance_score = calculate_relevance_score(data)
db.commit()
```
### Pattern 6: Merge Agent Contexts
```python
# Load contexts from multiple sources
conv_context = compress_conversation_summary(messages)
project_context = compress_project_state(project, work, files)
db_contexts = [s.to_dict() for s in db.query(ContextSnippet).limit(10)]
# Merge all
merged = merge_contexts([conv_context, project_context] + db_contexts)
```
## Tag Categories
### Technologies (Auto-detected)
`fastapi`, `postgresql`, `redis`, `docker`, `nginx`, `python`, `javascript`, `sqlalchemy`, `alembic`
### Patterns
`async`, `crud`, `middleware`, `dependency-injection`, `error-handling`, `validation`, `optimization`, `refactor`
### Categories
`critical`, `blocker`, `bug`, `feature`, `architecture`, `integration`, `security`, `testing`, `deployment`
## Relevance Score Formula
```
Score = base_importance
- min(2.0, age_days × 0.1) # Time decay
+ min(2.0, usage_count × 0.2) # Usage boost
+ (important_tags × 0.5) # Tag boost
+ (1.0 if used_in_24h else 0.0) # Recency boost
Clamped to [0.0, 10.0]
```
### Important Tags
`critical`, `blocker`, `decision`, `architecture`, `security`, `performance`, `bug`
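Worked example of the formula above, assuming a snippet created 10 days ago with 4 uses, importance 7, one important tag, and no recent use:
```python
from datetime import datetime, timedelta, timezone

from api.utils.context_compression import calculate_relevance_score

snippet = {
    "created_at": (datetime.now(timezone.utc) - timedelta(days=10)).isoformat(),
    "usage_count": 4,
    "importance": 7,
    "tags": ["decision", "fastapi"],  # "decision" is the only important tag here
    "last_used": None,
}

# By hand: 7 - min(2.0, 10 * 0.1) + min(2.0, 4 * 0.2) + 1 * 0.5 + 0.0 = 7.3
print(round(calculate_relevance_score(snippet), 2))  # 7.3
```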
## File Type Detection
| Path Pattern | Type |
|--------------|------|
| `*test*` | test |
| `*migration*` | migration |
| `*config*.{yaml,json,toml}` | config |
| `*model*`, `*schema*` | schema |
| `*api*`, `*route*`, `*endpoint*` | api |
| `.{py,js,ts,go,java}` | impl |
| `.{md,txt,rst}` | doc |
| `*docker*`, `*deploy*` | infra |
## One-Liner Examples
```python
# Compress and save conversation
db.add(ContextSnippet(**create_context_snippet(
    json.dumps(compress_conversation_summary(messages)),
    "state", 6
)))

# Load top contexts as prompt
prompt = format_for_injection(
    [s.to_dict() for s in db.query(ContextSnippet)
        .order_by(ContextSnippet.relevance_score.desc())
        .limit(20)],
    max_tokens=1000
)

# Extract and save decisions
for decision in extract_key_decisions(text):
    db.add(ContextSnippet(**create_context_snippet(
        f"{decision['decision']} because {decision['rationale']}",
        "decision",
        8 if decision['impact'] == 'high' else 6
    )))

# Auto-tag and save
snippet = create_context_snippet(content, "general", 5)
# Tags auto-extracted from content

# Update all relevance scores
for s in db.query(ContextSnippet):
    s.relevance_score = calculate_relevance_score(s.to_dict())
db.commit()
```
## Token Budget Guide
| Max Tokens | Use Case | Contexts |
|------------|----------|----------|
| 200 | Critical only | 3-5 |
| 500 | Essential | 8-12 |
| 1000 | Standard | 15-25 |
| 2000 | Extended | 30-50 |
## Error Handling
All functions handle edge cases:
- Empty input → Empty/default output
- Invalid dates → Current time
- Missing fields → Defaults
- Malformed JSON → Graceful degradation
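For example, empty inputs fall through to safe defaults (behavior per the implementation in `context_compression.py`):
```python
from api.utils.context_compression import (
    compress_conversation_summary,
    format_for_injection,
    merge_contexts,
)

print(compress_conversation_summary(""))  # {"phase": "unknown", "completed": [], ...}
print(merge_contexts([]))                 # {}
print(format_for_injection([]))           # "" (empty string, nothing to inject)
```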
## Testing
```bash
cd D:\ClaudeTools
python test_context_compression_quick.py
```
All 9 tests should pass.
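If you prefer pytest over the quick script, an equivalent smoke test might look like the sketch below (illustrative only; not one of the original 9 tests, and the file name is hypothetical):
```python
# test_context_compression_smoke.py (illustrative sketch)
from api.utils.context_compression import create_context_snippet, format_for_injection

def test_snippet_roundtrip():
    snippet = create_context_snippet("Using FastAPI for async support", "decision", 8)
    assert snippet["type"] == "decision"
    assert "fastapi" in snippet["tags"]
    assert 0.0 <= snippet["relevance_score"] <= 10.0

    prompt = format_for_injection([snippet], max_tokens=200)
    assert prompt.startswith("## Context Recall")
```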
## Performance
- Conversation compression: ~1ms per message
- Tag extraction: ~0.5ms per text
- Relevance calculation: ~0.1ms per snippet
- Format injection: ~10ms for 20 contexts
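These figures vary by machine; a quick way to re-measure one of them (tag extraction) with only the standard library:
```python
import timeit

from api.utils.context_compression import extract_tags_from_text

text = "Critical bug: Redis cache misses causing FastAPI endpoint timeouts in production"
runs = 1000
total_s = timeit.timeit(lambda: extract_tags_from_text(text), number=runs)
print(f"{total_s / runs * 1000:.3f} ms per call")
```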
## Common Issues
**Issue:** Tags not extracted
**Solution:** Check text contains recognized keywords
**Issue:** Low relevance scores
**Solution:** Increase importance or usage_count
**Issue:** Injection too long
**Solution:** Reduce max_tokens or limit contexts
**Issue:** Missing fields in snippet
**Solution:** All required fields have defaults
## Full Documentation
- Examples: `api/utils/CONTEXT_COMPRESSION_EXAMPLES.md`
- Summary: `api/utils/CONTEXT_COMPRESSION_SUMMARY.md`
- Tests: `test_context_compression_quick.py`


@@ -1,338 +0,0 @@
# Context Compression Utilities - Summary
## Overview
Created comprehensive context compression utilities for the ClaudeTools Context Recall System. These utilities enable **90-95% token reduction** while preserving critical information for efficient context injection.
## Files Created
1. **D:\ClaudeTools\api\utils\context_compression.py** - Main implementation (680 lines)
2. **D:\ClaudeTools\api\utils\CONTEXT_COMPRESSION_EXAMPLES.md** - Comprehensive usage examples
3. **D:\ClaudeTools\test_context_compression_quick.py** - Functional tests (all passing)
## Functions Implemented
### Core Compression Functions
1. **compress_conversation_summary(conversation)**
   - Compresses conversations into dense JSON structure
   - Extracts: phase, completed tasks, in-progress work, blockers, decisions, next actions
   - Token reduction: 85-90%
2. **create_context_snippet(content, snippet_type, importance)**
   - Creates structured snippets with auto-extracted tags
   - Includes relevance scoring
   - Supports types: decision, pattern, lesson, blocker, state
3. **compress_project_state(project_details, current_work, files_changed)**
   - Compresses project state into dense summary
   - Includes: phase, progress %, blockers, next actions, file changes
   - Token reduction: 85-90%
4. **extract_key_decisions(text)**
   - Extracts decisions with rationale and impact
   - Auto-classifies impact level (low/medium/high)
   - Returns structured array with timestamps
### Relevance & Scoring
5. **calculate_relevance_score(snippet, current_time)**
   - Calculates 0.0-10.0 relevance score
   - Factors: age (time decay), usage count, importance, tags, recency
   - Formula: `base_importance - time_decay + usage_boost + tag_boost + recency_boost`
### Context Management
6. **merge_contexts(contexts)**
   - Merges multiple context objects
   - Deduplicates information
   - Keeps most recent values
   - Token reduction: 30-50%
7. **format_for_injection(contexts, max_tokens)**
   - Formats contexts for prompt injection
   - Token-efficient markdown output
   - Prioritizes by relevance score
   - Respects token budget
### Utilities
8. **extract_tags_from_text(text)**
   - Auto-detects technologies (fastapi, postgresql, redis, etc.)
   - Identifies patterns (async, crud, middleware, etc.)
   - Recognizes categories (critical, blocker, bug, etc.)
9. **compress_file_changes(file_paths)**
   - Compresses file change lists
   - Auto-classifies by type: api, test, schema, migration, config, doc, infra
   - Limits to 50 files max
## Key Features
### Maximum Token Efficiency
- **Conversation compression**: 500 tokens → 50-80 tokens (85-90% reduction)
- **Project state**: 1000 tokens → 100-150 tokens (85-90% reduction)
- **Context merging**: 30-50% deduplication
- **Overall pipeline**: 90-95% total reduction
### Intelligent Relevance Scoring
```python
Score = base_importance
- (age_days × 0.1, max -2.0) # Time decay
+ (usage_count × 0.2, max +2.0) # Usage boost
+ (important_tags × 0.5) # Tag boost
+ (1.0 if used_in_24h else 0.0) # Recency boost
```
### Auto-Tag Extraction
Detects 30+ technology and pattern keywords:
- Technologies: fastapi, postgresql, redis, docker, nginx, etc.
- Patterns: async, crud, middleware, dependency-injection, etc.
- Categories: critical, blocker, bug, feature, architecture, etc.
## Usage Examples
### Basic Usage
```python
from api.utils.context_compression import (
    compress_conversation_summary,
    create_context_snippet,
    format_for_injection
)

# Compress conversation
messages = [
    {"role": "user", "content": "Build auth with FastAPI"},
    {"role": "assistant", "content": "Completed auth endpoints"}
]
summary = compress_conversation_summary(messages)
# {"phase": "api_development", "completed": ["auth endpoints"], ...}

# Create snippet
snippet = create_context_snippet(
    "Using FastAPI for async support",
    snippet_type="decision",
    importance=8
)
# Auto-extracts tags: ["decision", "fastapi", "async", "api"]

# Format for prompt injection
contexts = [snippet]
prompt = format_for_injection(contexts, max_tokens=500)
# "## Context Recall\n\n**Decisions:**\n- Using FastAPI..."
```
### Database Integration
```python
from sqlalchemy.orm import Session
from api.models.context_recall import ContextSnippet
from api.utils.context_compression import (
    create_context_snippet,
    calculate_relevance_score,
    format_for_injection
)

def save_context(db: Session, content: str, type: str, importance: int):
    """Save context to database"""
    snippet = create_context_snippet(content, type, importance)
    db_snippet = ContextSnippet(**snippet)
    db.add(db_snippet)
    db.commit()
    return db_snippet

def load_contexts(db: Session, limit: int = 20):
    """Load and format relevant contexts"""
    snippets = db.query(ContextSnippet)\
        .order_by(ContextSnippet.relevance_score.desc())\
        .limit(limit).all()
    # Convert to dicts and recalculate scores
    contexts = [snippet.to_dict() for snippet in snippets]
    for ctx in contexts:
        ctx["relevance_score"] = calculate_relevance_score(ctx)
    # Sort and format
    contexts.sort(key=lambda c: c["relevance_score"], reverse=True)
    return format_for_injection(contexts, max_tokens=1000)
```
### Complete Workflow
```python
from api.utils.context_compression import (
    compress_conversation_summary,
    compress_project_state,
    merge_contexts,
    format_for_injection
)

# 1. Compress conversation
conv_summary = compress_conversation_summary(messages)

# 2. Compress project state
project_state = compress_project_state(
    {"name": "API", "phase": "dev", "progress_pct": 60},
    "Building CRUD endpoints",
    ["api/routes/users.py"]
)
# 3. Merge contexts
merged = merge_contexts([conv_summary, project_state])
# 4. Load snippets from DB (with relevance scores)
snippets = load_contexts(db, limit=20)
# 5. Format for injection
context_prompt = format_for_injection(snippets, max_tokens=1000)
# 6. Inject into Claude prompt
full_prompt = f"{context_prompt}\n\n{user_message}"
```
## Testing
All 9 functional tests passing:
```
✓ compress_conversation_summary - Extracts phase, completed, in-progress, blockers
✓ create_context_snippet - Creates structured snippets with tags
✓ extract_tags_from_text - Detects technologies, patterns, categories
✓ extract_key_decisions - Extracts decisions with rationale
✓ calculate_relevance_score - Scores with time decay and boosts
✓ merge_contexts - Merges and deduplicates contexts
✓ compress_project_state - Compresses project state
✓ compress_file_changes - Classifies and compresses file lists
✓ format_for_injection - Formats for token-efficient injection
```
Run tests:
```bash
cd D:\ClaudeTools
python test_context_compression_quick.py
```
## Type Safety
All functions include:
- Full type hints (typing module)
- Comprehensive docstrings
- Usage examples in docstrings
- Error handling for edge cases
## Performance Characteristics
### Token Efficiency
- **Single conversation**: 500 → 60 tokens (88% reduction)
- **Project state**: 1000 → 120 tokens (88% reduction)
- **10 contexts merged**: 5000 → 300 tokens (94% reduction)
- **Formatted injection**: Only relevant info within budget
### Time Complexity
- `compress_conversation_summary`: O(n) - linear in text length
- `create_context_snippet`: O(n) - linear in content length
- `extract_key_decisions`: O(n) - regex matching
- `calculate_relevance_score`: O(1) - constant time
- `merge_contexts`: O(n×m) - n contexts, m items per context
- `format_for_injection`: O(n log n) - sorting + formatting
### Space Complexity
All functions use O(n) space relative to input size, with hard limits:
- Max 10 completed items per context
- Max 5 blockers per context
- Max 10 next actions per context
- Max 20 contexts in merged output
- Max 50 files in compressed changes
## Integration Points
### Database Models
Works with SQLAlchemy models having these fields:
- `content` (str)
- `type` (str)
- `tags` (list/JSON)
- `importance` (int 1-10)
- `relevance_score` (float 0.0-10.0)
- `created_at` (datetime)
- `usage_count` (int)
- `last_used` (datetime, nullable)
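A minimal sketch of a compatible SQLAlchemy model matching the field list above. This is an assumption for illustration: the removed `ContextSnippet` model in `api/models/context_recall.py` may have defined additional columns and constraints, though the `context_snippets` table name is confirmed by the commit message.
```python
# Sketch only; the removed ContextSnippet model may have differed in detail.
from datetime import datetime, timezone

from sqlalchemy import JSON, Column, DateTime, Float, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class ContextSnippet(Base):
    __tablename__ = "context_snippets"

    id = Column(Integer, primary_key=True)
    content = Column(String(500), nullable=False)
    type = Column(String(50), nullable=False, default="general")
    tags = Column(JSON, default=list)
    importance = Column(Integer, default=5)        # 1-10
    relevance_score = Column(Float, default=5.0)   # 0.0-10.0
    created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
    usage_count = Column(Integer, default=0)
    last_used = Column(DateTime(timezone=True), nullable=True)

    def to_dict(self) -> dict:
        return {
            "content": self.content,
            "type": self.type,
            "tags": self.tags or [],
            "importance": self.importance,
            "created_at": self.created_at.isoformat(),
            "usage_count": self.usage_count,
            "last_used": self.last_used.isoformat() if self.last_used else None,
        }
```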
### API Endpoints
Expected API usage:
- `POST /api/v1/context` - Save context snippet
- `GET /api/v1/context` - Load contexts (sorted by relevance)
- `POST /api/v1/context/merge` - Merge multiple contexts
- `GET /api/v1/context/inject` - Get formatted prompt injection
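A rough FastAPI sketch of two of the endpoints listed above, assuming a model like the one sketched earlier. The removed context-snippets router had many more endpoints; `get_db` here is a placeholder for the app's real session dependency.
```python
# Illustrative sketch only; the removed router was more complete.
from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session

from api.models.context_recall import ContextSnippet
from api.utils.context_compression import create_context_snippet, format_for_injection

def get_db():
    """Placeholder dependency; the real app provided its own session factory."""
    raise NotImplementedError

router = APIRouter(prefix="/api/v1/context", tags=["context"])

@router.post("")
def save_snippet(content: str, snippet_type: str = "general", importance: int = 5,
                 db: Session = Depends(get_db)):
    snippet = create_context_snippet(content, snippet_type, importance)
    db.add(ContextSnippet(**snippet))
    db.commit()
    return snippet

@router.get("/inject")
def get_injection(max_tokens: int = 1000, db: Session = Depends(get_db)):
    rows = (db.query(ContextSnippet)
              .order_by(ContextSnippet.relevance_score.desc())
              .limit(20).all())
    return {"prompt": format_for_injection([r.to_dict() for r in rows], max_tokens)}
```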
### Claude Prompt Injection
```python
# Before sending to Claude
context_prompt = load_contexts(db, agent_id=agent.id, limit=20)
messages = [
    {"role": "system", "content": f"{base_system_prompt}\n\n{context_prompt}"},
    {"role": "user", "content": user_message}
]
response = claude_client.messages.create(messages=messages)
```
## Future Enhancements
Potential improvements:
1. **Semantic similarity**: Group similar contexts
2. **LLM-based summarization**: Use small model for ultra-compression
3. **Context pruning**: Auto-remove stale contexts
4. **Multi-agent support**: Share contexts across agents
5. **Vector embeddings**: For semantic search
6. **Streaming compression**: Handle very large conversations
7. **Custom tag rules**: User-defined tag extraction
## File Structure
```
D:\ClaudeTools\api\utils\
├── __init__.py # Updated exports
├── context_compression.py # Main implementation (680 lines)
├── CONTEXT_COMPRESSION_EXAMPLES.md # Usage examples
└── CONTEXT_COMPRESSION_SUMMARY.md # This file
D:\ClaudeTools\
└── test_context_compression_quick.py # Functional tests
```
## Import Reference
```python
# Import all functions
from api.utils.context_compression import (
    # Core compression
    compress_conversation_summary,
    create_context_snippet,
    compress_project_state,
    extract_key_decisions,
    # Relevance & scoring
    calculate_relevance_score,
    # Context management
    merge_contexts,
    format_for_injection,
    # Utilities
    extract_tags_from_text,
    compress_file_changes
)

# Or import via utils package
from api.utils import (
    compress_conversation_summary,
    create_context_snippet,
    # ... etc
)
```
## License & Attribution
Part of the ClaudeTools Context Recall System.
Created: 2026-01-16
All utilities designed for maximum token efficiency and information density.


@@ -1,643 +0,0 @@
"""
Context Compression Utilities for ClaudeTools Context Recall System
Maximum information density, minimum token usage.
All functions designed for efficient context summarization and injection.
"""
import re
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Union
from collections import defaultdict
def compress_conversation_summary(
    conversation: Union[str, List[Dict[str, str]]]
) -> Dict[str, Any]:
    """
    Compress conversation into dense JSON structure with key points.

    Args:
        conversation: Raw conversation text or message list
            [{role: str, content: str}, ...] or str

    Returns:
        Dense summary with phase, completed, in_progress, blockers, decisions, next

    Example:
        >>> msgs = [{"role": "user", "content": "Build auth system"}]
        >>> compress_conversation_summary(msgs)
        {
            "phase": "api_development",
            "completed": ["auth"],
            "in_progress": None,
            "blockers": [],
            "decisions": [],
            "next": []
        }
    """
    # Convert to text if list
    if isinstance(conversation, list):
        text = "\n".join([f"{msg.get('role', 'user')}: {msg.get('content', '')}"
                          for msg in conversation])
    else:
        text = conversation

    text_lower = text.lower()

    # Extract phase
    phase = "unknown"
    phase_keywords = {
        "api_development": ["api", "endpoint", "fastapi", "route"],
        "testing": ["test", "pytest", "unittest"],
        "deployment": ["deploy", "docker", "production"],
        "debugging": ["bug", "error", "fix", "debug"],
        "design": ["design", "architecture", "plan"],
        "integration": ["integrate", "connect", "third-party"]
    }
    for p, keywords in phase_keywords.items():
        if any(kw in text_lower for kw in keywords):
            phase = p
            break

    # Extract completed tasks
    completed = []
    completed_patterns = [
        r"completed[:\s]+([^\n.]+)",
        r"finished[:\s]+([^\n.]+)",
        r"done[:\s]+([^\n.]+)",
        r"\[OK\]\s*([^\n.]+)",
        r"\[PASS\]\s*([^\n.]+)",
        r"implemented[:\s]+([^\n.]+)"
    ]
    for pattern in completed_patterns:
        matches = re.findall(pattern, text_lower)
        completed.extend([m.strip()[:50] for m in matches])

    # Extract in-progress
    in_progress = None
    in_progress_patterns = [
        r"in[- ]progress[:\s]+([^\n.]+)",
        r"working on[:\s]+([^\n.]+)",
        r"currently[:\s]+([^\n.]+)"
    ]
    for pattern in in_progress_patterns:
        match = re.search(pattern, text_lower)
        if match:
            in_progress = match.group(1).strip()[:50]
            break

    # Extract blockers
    blockers = []
    blocker_patterns = [
        r"blocker[s]?[:\s]+([^\n.]+)",
        r"blocked[:\s]+([^\n.]+)",
        r"issue[s]?[:\s]+([^\n.]+)",
        r"problem[s]?[:\s]+([^\n.]+)"
    ]
    for pattern in blocker_patterns:
        matches = re.findall(pattern, text_lower)
        blockers.extend([m.strip()[:50] for m in matches])

    # Extract decisions
    decisions = extract_key_decisions(text)

    # Extract next actions
    next_actions = []
    next_patterns = [
        r"next[:\s]+([^\n.]+)",
        r"todo[:\s]+([^\n.]+)",
        r"will[:\s]+([^\n.]+)"
    ]
    for pattern in next_patterns:
        matches = re.findall(pattern, text_lower)
        next_actions.extend([m.strip()[:50] for m in matches])

    return {
        "phase": phase,
        "completed": list(set(completed))[:10],  # Dedupe, limit
        "in_progress": in_progress,
        "blockers": list(set(blockers))[:5],
        "decisions": decisions[:5],
        "next": list(set(next_actions))[:10]
    }

def create_context_snippet(
    content: str,
    snippet_type: str = "general",
    importance: int = 5
) -> Dict[str, Any]:
    """
    Create structured snippet with auto-extracted tags and relevance score.

    Args:
        content: Raw information (decision, pattern, lesson)
        snippet_type: Type of snippet (decision, pattern, lesson, state)
        importance: Manual importance 1-10, default 5

    Returns:
        Structured snippet with tags, relevance score, metadata

    Example:
        >>> create_context_snippet("Using FastAPI for async support", "decision")
        {
            "content": "Using FastAPI for async support",
            "type": "decision",
            "tags": ["fastapi", "async"],
            "importance": 5,
            "relevance_score": 5.0,
            "created_at": "2026-01-16T...",
            "usage_count": 0
        }
    """
    # Extract tags from content
    tags = extract_tags_from_text(content)

    # Add type-specific tag
    if snippet_type not in tags:
        tags.insert(0, snippet_type)

    now = datetime.now(timezone.utc).isoformat()

    snippet = {
        "content": content[:500],  # Limit content length
        "type": snippet_type,
        "tags": tags[:10],  # Limit tags
        "importance": max(1, min(10, importance)),  # Clamp 1-10
        "created_at": now,
        "usage_count": 0,
        "last_used": None
    }

    # Calculate initial relevance score
    snippet["relevance_score"] = calculate_relevance_score(snippet)

    return snippet

def compress_project_state(
    project_details: Dict[str, Any],
    current_work: str,
    files_changed: Optional[List[str]] = None
) -> Dict[str, Any]:
    """
    Compress project state into dense summary.

    Args:
        project_details: Dict with name, description, phase, etc.
        current_work: Description of current work
        files_changed: List of file paths that changed

    Returns:
        Dense project state with phase, progress, blockers, next actions

    Example:
        >>> compress_project_state(
        ...     {"name": "ClaudeTools", "phase": "api_dev"},
        ...     "Building auth endpoints",
        ...     ["api/auth.py"]
        ... )
        {
            "project": "ClaudeTools",
            "phase": "api_dev",
            "progress": 0,
            "current": "Building auth endpoints",
            "files": ["api/auth.py"],
            "blockers": [],
            "next": []
        }
    """
    files_changed = files_changed or []

    state = {
        "project": project_details.get("name", "unknown")[:50],
        "phase": project_details.get("phase", "unknown")[:30],
        "progress": project_details.get("progress_pct", 0),
        "current": current_work[:200],  # Compress description
        "files": compress_file_changes(files_changed),
        "blockers": project_details.get("blockers", [])[:5],
        "next": project_details.get("next_actions", [])[:10]
    }

    return state

def extract_key_decisions(text: str) -> List[Dict[str, str]]:
    """
    Extract key decisions from conversation text.

    Args:
        text: Conversation text or work description

    Returns:
        Array of decision objects with decision, rationale, impact, timestamp

    Example:
        >>> extract_key_decisions("Decided to use FastAPI for async support")
        [{
            "decision": "use FastAPI",
            "rationale": "async support",
            "impact": "medium",
            "timestamp": "2026-01-16T..."
        }]
    """
    decisions = []
    text_lower = text.lower()

    # Decision patterns
    patterns = [
        r"decid(?:ed|e)[:\s]+([^.\n]+?)(?:because|for|due to)[:\s]+([^.\n]+)",
        r"chose[:\s]+([^.\n]+?)(?:because|for|due to)[:\s]+([^.\n]+)",
        r"using[:\s]+([^.\n]+?)(?:because|for|due to)[:\s]+([^.\n]+)",
        r"will use[:\s]+([^.\n]+?)(?:because|for|due to)[:\s]+([^.\n]+)"
    ]

    for pattern in patterns:
        matches = re.findall(pattern, text_lower)
        for match in matches:
            decision = match[0].strip()[:100]
            rationale = match[1].strip()[:100]

            # Estimate impact based on keywords
            impact = "low"
            high_impact_keywords = ["architecture", "database", "framework", "major"]
            medium_impact_keywords = ["api", "endpoint", "feature", "integration"]

            if any(kw in decision.lower() or kw in rationale.lower()
                   for kw in high_impact_keywords):
                impact = "high"
            elif any(kw in decision.lower() or kw in rationale.lower()
                     for kw in medium_impact_keywords):
                impact = "medium"

            decisions.append({
                "decision": decision,
                "rationale": rationale,
                "impact": impact,
                "timestamp": datetime.now(timezone.utc).isoformat()
            })

    return decisions

def calculate_relevance_score(
    snippet: Dict[str, Any],
    current_time: Optional[datetime] = None
) -> float:
    """
    Calculate relevance score based on age, usage, tags, importance.

    Args:
        snippet: Snippet metadata with created_at, usage_count, importance, tags
        current_time: Optional current time for testing, defaults to now

    Returns:
        Float score 0.0-10.0 (higher = more relevant)

    Example:
        >>> snippet = {
        ...     "created_at": "2026-01-16T12:00:00Z",
        ...     "usage_count": 5,
        ...     "importance": 8,
        ...     "tags": ["critical", "fastapi"]
        ... }
        >>> calculate_relevance_score(snippet)
        9.2
    """
    if current_time is None:
        current_time = datetime.now(timezone.utc)

    # Parse created_at
    try:
        created_at = datetime.fromisoformat(snippet["created_at"].replace("Z", "+00:00"))
    except (ValueError, KeyError):
        created_at = current_time

    # Base score from importance (0-10)
    score = float(snippet.get("importance", 5))

    # Time decay - lose 0.1 points per day, max -2.0
    age_days = (current_time - created_at).total_seconds() / 86400
    time_penalty = min(2.0, age_days * 0.1)
    score -= time_penalty

    # Usage boost - add 0.2 per use, max +2.0
    usage_count = snippet.get("usage_count", 0)
    usage_boost = min(2.0, usage_count * 0.2)
    score += usage_boost

    # Tag boost for important tags
    important_tags = {"critical", "blocker", "decision", "architecture",
                      "security", "performance", "bug"}
    tags = set(snippet.get("tags", []))
    tag_boost = len(tags & important_tags) * 0.5  # 0.5 per important tag
    score += tag_boost

    # Recency boost if used recently
    last_used = snippet.get("last_used")
    if last_used:
        try:
            last_used_dt = datetime.fromisoformat(last_used.replace("Z", "+00:00"))
            hours_since_use = (current_time - last_used_dt).total_seconds() / 3600
            if hours_since_use < 24:  # Used in last 24h
                score += 1.0
        except (ValueError, AttributeError):
            pass

    # Clamp to 0.0-10.0
    return max(0.0, min(10.0, score))

def merge_contexts(contexts: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Merge multiple context objects into single deduplicated context.

    Args:
        contexts: List of context objects to merge

    Returns:
        Single merged context with deduplicated, most recent info

    Example:
        >>> ctx1 = {"phase": "api_dev", "completed": ["auth"]}
        >>> ctx2 = {"phase": "api_dev", "completed": ["auth", "crud"]}
        >>> merge_contexts([ctx1, ctx2])
        {"phase": "api_dev", "completed": ["auth", "crud"], ...}
    """
    if not contexts:
        return {}

    merged = {
        "phase": None,
        "completed": [],
        "in_progress": None,
        "blockers": [],
        "decisions": [],
        "next": [],
        "files": [],
        "tags": []
    }

    # Collect all items
    completed_set = set()
    blocker_set = set()
    next_set = set()
    files_set = set()
    tags_set = set()
    decisions_list = []

    for ctx in contexts:
        # Take most recent phase
        if ctx.get("phase") and not merged["phase"]:
            merged["phase"] = ctx["phase"]

        # Take most recent in_progress
        if ctx.get("in_progress"):
            merged["in_progress"] = ctx["in_progress"]

        # Collect completed
        for item in ctx.get("completed", []):
            if isinstance(item, str):
                completed_set.add(item)

        # Collect blockers
        for item in ctx.get("blockers", []):
            if isinstance(item, str):
                blocker_set.add(item)

        # Collect next actions
        for item in ctx.get("next", []):
            if isinstance(item, str):
                next_set.add(item)

        # Collect files
        for item in ctx.get("files", []):
            if isinstance(item, str):
                files_set.add(item)
            elif isinstance(item, dict) and "path" in item:
                files_set.add(item["path"])

        # Collect tags
        for item in ctx.get("tags", []):
            if isinstance(item, str):
                tags_set.add(item)

        # Collect decisions (keep all with timestamps)
        for decision in ctx.get("decisions", []):
            if isinstance(decision, dict):
                decisions_list.append(decision)

    # Sort decisions by timestamp (most recent first)
    decisions_list.sort(
        key=lambda d: d.get("timestamp", ""),
        reverse=True
    )

    merged["completed"] = sorted(list(completed_set))[:20]
    merged["blockers"] = sorted(list(blocker_set))[:10]
    merged["next"] = sorted(list(next_set))[:20]
    merged["files"] = sorted(list(files_set))[:30]
    merged["tags"] = sorted(list(tags_set))[:20]
    merged["decisions"] = decisions_list[:10]

    return merged

def format_for_injection(
    contexts: List[Dict[str, Any]],
    max_tokens: int = 1000
) -> str:
    """
    Format context objects for token-efficient prompt injection.

    Args:
        contexts: List of context objects from database (sorted by relevance)
        max_tokens: Approximate max tokens to use (rough estimate)

    Returns:
        Token-efficient markdown string for Claude prompt

    Example:
        >>> contexts = [{"content": "Use FastAPI", "tags": ["api"]}]
        >>> format_for_injection(contexts)
        "## Context Recall\\n\\n- Use FastAPI [api]\\n"
    """
    if not contexts:
        return ""

    lines = ["## Context Recall\n"]

    # Estimate ~4 chars per token
    max_chars = max_tokens * 4
    current_chars = len(lines[0])

    # Group by type
    by_type = defaultdict(list)
    for ctx in contexts:
        ctx_type = ctx.get("type", "general")
        by_type[ctx_type].append(ctx)

    # Priority order for types
    type_priority = ["blocker", "decision", "state", "pattern", "lesson", "general"]

    for ctx_type in type_priority:
        if ctx_type not in by_type:
            continue

        # Add type header
        header = f"\n**{ctx_type.title()}s:**\n"
        if current_chars + len(header) > max_chars:
            break
        lines.append(header)
        current_chars += len(header)

        # Add contexts of this type
        for ctx in by_type[ctx_type][:5]:  # Max 5 per type
            content = ctx.get("content", "")
            tags = ctx.get("tags", [])

            # Format with tags
            tag_str = f" [{', '.join(tags[:3])}]" if tags else ""
            line = f"- {content[:150]}{tag_str}\n"

            if current_chars + len(line) > max_chars:
                break

            lines.append(line)
            current_chars += len(line)

    # Add summary stats
    summary = f"\n*{len(contexts)} contexts loaded*\n"
    if current_chars + len(summary) <= max_chars:
        lines.append(summary)

    return "".join(lines)

def extract_tags_from_text(text: str) -> List[str]:
    """
    Auto-detect relevant tags from text content.

    Args:
        text: Content to extract tags from

    Returns:
        List of detected tags (technologies, patterns, categories)

    Example:
        >>> extract_tags_from_text("Using FastAPI with PostgreSQL")
        ["fastapi", "postgresql", "api", "database"]
    """
    text_lower = text.lower()
    tags = []

    # Technology keywords
    tech_keywords = {
        "fastapi": ["fastapi"],
        "postgresql": ["postgresql", "postgres", "psql"],
        "sqlalchemy": ["sqlalchemy", "orm"],
        "alembic": ["alembic", "migration"],
        "docker": ["docker", "container"],
        "redis": ["redis", "cache"],
        "nginx": ["nginx", "reverse proxy"],
        "python": ["python", "py"],
        "javascript": ["javascript", "js", "node"],
        "typescript": ["typescript", "ts"],
        "react": ["react", "jsx"],
        "vue": ["vue"],
        "api": ["api", "endpoint", "rest"],
        "database": ["database", "db", "sql"],
        "auth": ["auth", "authentication", "authorization"],
        "security": ["security", "encryption", "secure"],
        "testing": ["test", "pytest", "unittest"],
        "deployment": ["deploy", "deployment", "production"]
    }

    for tag, keywords in tech_keywords.items():
        if any(kw in text_lower for kw in keywords):
            tags.append(tag)

    # Pattern keywords
    pattern_keywords = {
        "async": ["async", "asynchronous", "await"],
        "crud": ["crud", "create", "read", "update", "delete"],
        "middleware": ["middleware"],
        "dependency-injection": ["dependency injection", "depends"],
        "error-handling": ["error", "exception", "try", "catch"],
        "validation": ["validation", "validate", "pydantic"],
        "optimization": ["optimize", "performance", "speed"],
        "refactor": ["refactor", "refactoring", "cleanup"]
    }

    for tag, keywords in pattern_keywords.items():
        if any(kw in text_lower for kw in keywords):
            tags.append(tag)

    # Category keywords
    category_keywords = {
        "critical": ["critical", "urgent", "important"],
        "blocker": ["blocker", "blocked", "blocking"],
        "bug": ["bug", "error", "issue", "problem"],
        "feature": ["feature", "enhancement", "add"],
        "architecture": ["architecture", "design", "structure"],
        "integration": ["integration", "integrate", "connect"]
    }

    for tag, keywords in category_keywords.items():
        if any(kw in text_lower for kw in keywords):
            tags.append(tag)

    # Deduplicate and return
    return list(dict.fromkeys(tags))  # Preserves order

def compress_file_changes(file_paths: List[str]) -> List[Dict[str, str]]:
    """
    Compress file change list into brief summaries.

    Args:
        file_paths: List of file paths that changed

    Returns:
        Compressed summary with path and inferred change type

    Example:
        >>> compress_file_changes(["api/auth.py", "tests/test_auth.py"])
        [
            {"path": "api/auth.py", "type": "impl"},
            {"path": "tests/test_auth.py", "type": "test"}
        ]
    """
    compressed = []

    for path in file_paths[:50]:  # Limit to 50 files
        # Infer change type from path
        change_type = "other"
        path_lower = path.lower()

        if "test" in path_lower:
            change_type = "test"
        elif any(ext in path_lower for ext in [".py", ".js", ".ts", ".go", ".java"]):
            if "migration" in path_lower:
                change_type = "migration"
            elif "config" in path_lower or path_lower.endswith((".yaml", ".yml", ".json", ".toml")):
                change_type = "config"
            elif "model" in path_lower or "schema" in path_lower:
                change_type = "schema"
            elif "api" in path_lower or "endpoint" in path_lower or "route" in path_lower:
                change_type = "api"
            else:
                change_type = "impl"
        elif path_lower.endswith((".md", ".txt", ".rst")):
            change_type = "doc"
        elif "docker" in path_lower or "deploy" in path_lower:
            change_type = "infra"

        compressed.append({
            "path": path,
            "type": change_type
        })

    return compressed