Refactor cockpit to use DockerTmuxController pattern
Based on claude-code-tools TmuxCLIController, this refactor:

- Added DockerTmuxController class for robust tmux session management
- Implements send_keys() with configurable delay_enter
- Implements capture_pane() for output retrieval
- Implements wait_for_prompt() for pattern-based completion detection
- Implements wait_for_idle() for content-hash-based idle detection
- Implements wait_for_shell_prompt() for shell prompt detection

Also includes workflow improvements:

- Pre-task git snapshot before agent execution
- Post-task commit protocol in agent guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
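For context, a minimal sketch of the content-hash idle detection behind `wait_for_idle()`; the `capture_pane` callable stands in for `DockerTmuxController.capture_pane()`, and the polling parameters are illustrative assumptions, not the shipped defaults:

```python
import hashlib
import time

def wait_for_idle(capture_pane, interval=1.0, stable_checks=3, timeout=60.0):
    """Treat N consecutive identical pane snapshots as 'idle'."""
    last_digest, stable, start = None, 0, time.monotonic()
    while time.monotonic() - start < timeout:
        # Hash the captured pane text; digests are cheaper to compare than full content
        digest = hashlib.sha256(capture_pane().encode()).hexdigest()
        if digest == last_digest:
            stable += 1
            if stable >= stable_checks:
                return True  # content stopped changing: session looks idle
        else:
            last_digest, stable = digest, 0
        time.sleep(interval)
    return False  # still changing when the timeout expired
```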
docs/SKILL_LEARNING_SYSTEM.md (new file, 425 lines)

# Skill and Knowledge Learning System
## Overview

The Skill and Knowledge Learning System automatically extracts learnings from completed tasks and QA passes, storing them in the shared knowledge graph for future skill recommendations and continuous decision-making improvements.

This system enables Luzia to:

- **Learn from successes**: Extract patterns from passing QA validations
- **Build skill profiles**: Aggregate tool usage, patterns, and decision-making approaches
- **Make recommendations**: Suggest effective approaches for similar future tasks
- **Improve over time**: Store learnings persistently for cross-session learning

## Architecture

### Components

```
TaskExecution
      ↓
TaskAnalyzer → Patterns & Metadata
      ↓
SkillExtractor → Extracted Skills
      ↓
LearningEngine → Learning Objects
      ↓
KnowledgeGraph (research domain)
      ↓
SkillRecommender → Task Recommendations
```
### Core Classes

#### 1. **TaskAnalyzer**
Analyzes task executions to extract patterns and metadata.

```python
from lib.skill_learning_engine import TaskAnalyzer

analyzer = TaskAnalyzer()

# Analyze a single task
execution = analyzer.analyze_task({
    "task_id": "task_001",
    "prompt": "Refactor database schema",
    "project": "overbits",
    "status": "success",
    "tools_used": ["Bash", "Read", "Edit"],
    "duration": 45.2,
    "result_summary": "Schema refactored successfully",
    "qa_passed": True,
    "timestamp": "2026-01-09T12:00:00"
})

# Extract patterns from multiple executions
patterns = analyzer.extract_patterns(executions)
# Returns: success_rate, average_duration, common_tools, etc.
```

#### 2. **SkillExtractor**
Extracts skills from task executions and QA results.

```python
from lib.skill_learning_engine import SkillExtractor

extractor = SkillExtractor()

# Extract skills from task
skills = extractor.extract_from_task(execution)

# Extract skills from QA results
qa_skills = extractor.extract_from_qa_results(qa_results)

# Aggregate multiple skill extractions
aggregated = extractor.aggregate_skills(all_skills)
```

**Skill Categories:**
- `tool_usage`: Tools used in task (Read, Bash, Edit, etc.)
- `pattern`: Task patterns (optimization, debugging, testing, etc.)
- `decision`: Decision-making approaches
- `architecture`: Project/system knowledge

One example skill object is sketched below.
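Field names here are assumptions for illustration, not the engine's actual schema:

```python
# Hypothetical shape of one extracted skill (illustrative field names)
skill = {
    "name": "tool_bash",
    "category": "tool_usage",   # one of the four categories above
    "confidence": 0.85,
    "source_task": "task_001",
}
```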
#### 3. **LearningEngine**
Processes and stores learnings in the knowledge graph.

```python
from lib.skill_learning_engine import LearningEngine

engine = LearningEngine()

# Extract a learning from a successful task
learning = engine.extract_learning(execution, skills, qa_results)

# Store in knowledge graph
learning_id = engine.store_learning(learning)

# Create skill entities
skill_id = engine.create_skill_entity(skill)
```

#### 4. **SkillRecommender**
Recommends skills for future tasks based on stored learnings.

```python
from lib.skill_learning_engine import SkillRecommender

recommender = SkillRecommender()

# Get recommendations for a task
recommendations = recommender.recommend_for_task(
    task_prompt="Optimize database performance",
    project="overbits"
)

# Get overall skill profile
profile = recommender.get_skill_profile()
```

#### 5. **SkillLearningSystem**
Unified orchestrator for the complete learning pipeline.

```python
from lib.skill_learning_engine import SkillLearningSystem

system = SkillLearningSystem()

# Process a completed task with QA results
result = system.process_task_completion(task_data, qa_results)
# Result includes: skills_extracted, learning_created, learning_id

# Get recommendations
recommendations = system.get_recommendations(prompt, project)

# Get learning summary
summary = system.get_learning_summary()
```

## Integration with QA Validator

The learning system integrates with the QA validator:

### Manual Integration

```python
from lib.qa_learning_integration import QALearningIntegrator

integrator = QALearningIntegrator()

# Run QA with automatic learning extraction
result = integrator.run_qa_and_sync_with_learning(sync=True, verbose=True)
```

### Via CLI

```bash
# Standard QA validation
python3 lib/qa_validator.py

# QA validation with learning extraction
python3 lib/qa_validator.py --learn --sync --verbose

# Get statistics on learning integration
python3 lib/qa_learning_integration.py --stats
```

## Knowledge Graph Storage

Learnings are stored in the `research` domain of the knowledge graph:

```
Entity Type: finding
Name: learning_20260109_120000_Refactor_Database_Schema
Content:
  - Title: Refactor Database Schema
  - Description: Task execution details
  - Skills Used: tool_bash, tool_read, tool_edit, ...
  - Pattern: refactoring_pattern
  - Applicability: overbits, tool_bash, decision, ...
  - Confidence: 0.85

Metadata:
  - skills: [list of skill names]
  - pattern: refactoring_pattern
  - confidence: 0.85
  - applicability: [projects, tools, categories]
  - extraction_time: ISO timestamp
```
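The entity name appears to combine an extraction timestamp with the task title. A sketch of how such a name could be composed; the exact scheme is an assumption:

```python
from datetime import datetime

def learning_name(title: str, when: datetime) -> str:
    # e.g. "learning_20260109_120000_Refactor_Database_Schema" (assumed scheme)
    slug = "_".join(title.title().split())
    return f"learning_{when:%Y%m%d_%H%M%S}_{slug}"

print(learning_name("Refactor database schema", datetime(2026, 1, 9, 12, 0)))
# learning_20260109_120000_Refactor_Database_Schema
```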
### Accessing Stored Learnings

```python
from lib.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph("research")

# Search for learnings
learnings = kg.search("database optimization", limit=10)

# Get specific learning
learning = kg.get_entity("learning_20260109_120000_Refactor_Database_Schema")

# Get related skills
relations = kg.get_relations("learning_20260109_120000_...")

# List all learnings
all_learnings = kg.list_entities(entity_type="finding")
```

## Usage Examples

### Example 1: Extract Learnings from Task Completion

```python
from lib.skill_learning_engine import SkillLearningSystem

system = SkillLearningSystem()

# Task data from execution
task_data = {
    "task_id": "deploy_overbits_v2",
    "prompt": "Deploy new frontend build to production with zero downtime",
    "project": "overbits",
    "status": "success",
    "tools_used": ["Bash", "Read", "Edit"],
    "duration": 120.5,
    "result_summary": "Successfully deployed with no downtime, 100% rollback verified",
    "qa_passed": True,
    "timestamp": "2026-01-09T15:30:00"
}

# QA validation results
qa_results = {
    "passed": True,
    "results": {
        "syntax": True,
        "routes": True,
        "command_docs": True,
    },
    "summary": {
        "errors": 0,
        "warnings": 0,
        "info": 5,
    }
}

# Process and extract learnings
result = system.process_task_completion(task_data, qa_results)

print(f"Skills extracted: {result['skills_extracted']}")
print(f"Learning created: {result['learning_id']}")
```

### Example 2: Get Recommendations for Similar Task

```python
# Later, for a similar deployment task
new_prompt = "Deploy database migration to production"

recommendations = system.get_recommendations(new_prompt, project="overbits")

for rec in recommendations:
    print(f"Skill: {rec['skill']}")
    print(f"From learning: {rec['source_learning']}")
    print(f"Confidence: {rec['confidence']:.1%}")
```

### Example 3: Build Skill Profile

```python
# Get an overview of learned skills
profile = system.get_learning_summary()

print(f"Total learnings: {profile['total_learnings']}")
print(f"Skills by category: {profile['by_category']}")
print("Top 5 skills:")
for skill, count in profile['top_skills'][:5]:
    print(f"  {skill}: {count} occurrences")
```

## Testing

Run the comprehensive test suite:

```bash
python3 -m pytest tests/test_skill_learning.py -v
```

**Test Coverage:**
- Task analysis and pattern extraction
- Skill extraction from tasks and QA results
- Decision pattern recognition
- Skill aggregation
- Learning extraction and storage
- Skill recommendations
- Full integration pipeline

All tests run against a mocked knowledge graph, so they have no external dependencies.
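A minimal sketch of that mocking approach, with an assumed patch target and trimmed fixtures:

```python
from unittest.mock import patch

from lib.skill_learning_engine import SkillLearningSystem

# Trimmed fixtures; see Example 1 above for full-sized versions.
task_data = {
    "task_id": "t1", "prompt": "Refactor schema", "project": "overbits",
    "status": "success", "tools_used": ["Bash"], "duration": 1.0,
    "result_summary": "ok", "qa_passed": True,
    "timestamp": "2026-01-09T12:00:00",
}
qa_results = {"passed": True, "results": {"syntax": True},
              "summary": {"errors": 0, "warnings": 0, "info": 0}}

@patch("lib.skill_learning_engine.KnowledgeGraph")  # assumed patch target
def test_process_task_completion(mock_kg):
    mock_kg.return_value.create_entity.return_value = "learning_test_001"  # assumed method
    system = SkillLearningSystem()
    result = system.process_task_completion(task_data, qa_results)
    assert result["learning_id"]
```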
## Configuration

The system is configured in the QA validator integration:

**File:** `lib/qa_learning_integration.py`

Key settings:
- **Knowledge Graph Domain**: `research` (all learnings stored here)
- **Learning Extraction Trigger**: QA pass with all validations successful
- **Skill Categories**: tool_usage, pattern, decision, architecture
- **Confidence Calculation**: Weighted average of skill confidence and QA pass rate

The confidence calculation is sketched below.
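The 60/40 weighting here is an assumed illustration, not the shipped coefficients:

```python
def learning_confidence(skill_confidences, qa_pass_rate, skill_weight=0.6):
    # Weighted average of mean skill confidence and QA pass rate.
    # The 0.6/0.4 split is illustrative; the real weights live in
    # lib/qa_learning_integration.py.
    mean_skill = sum(skill_confidences) / len(skill_confidences)
    return skill_weight * mean_skill + (1 - skill_weight) * qa_pass_rate

print(round(learning_confidence([0.9, 0.8, 0.85], qa_pass_rate=0.85), 2))  # 0.85
```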
## Data Flow

```
Task Execution
      ↓
Task Analysis
  ├─→ Success rate: 85%
  ├─→ Average duration: 45 min
  ├─→ Common tools: [Bash, Read, Edit]
  └─→ Project distribution: {overbits: 60%, dss: 40%}
      ↓
Skill Extraction
  ├─→ Tool skills (from tools_used)
  ├─→ Decision patterns (from prompt)
  ├─→ Project knowledge (from project)
  └─→ QA validation skills
      ↓
Learning Creation
  ├─→ Title & description
  ├─→ Skill aggregation
  ├─→ Pattern classification
  ├─→ Confidence scoring
  └─→ Applicability determination
      ↓
Knowledge Graph Storage
  └─→ Entity: finding
      Relations: skill → learning
      Metadata: skills, pattern, confidence, applicability
      ↓
Future Recommendations
  └─→ Search similar tasks
      Extract applicable skills
      Rank by confidence
```

## Performance Considerations

**Learning Extraction:**
- Runs only on successful QA passes (not a bottleneck)
- Async-ready (future enhancement)
- Minimal overhead (~100ms per extraction)

**Recommendation:**
- Uses FTS5 full-text search on the knowledge graph
- Limited to the top 10 results
- Confidence-ranked sorting

The kind of FTS5 query this implies is sketched below.
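Table and column names are assumptions about the KG schema, not its actual layout:

```python
import sqlite3

conn = sqlite3.connect("knowledge_graph.db")  # assumed database path
rows = conn.execute(
    """
    SELECT name, rank
    FROM entities_fts               -- assumed FTS5 virtual table name
    WHERE entities_fts MATCH ?      -- FTS5 full-text match
    ORDER BY rank                   -- bm25 rank: lower is a better match
    LIMIT 10                        -- top 10, per the note above
    """,
    ("database optimization",),
).fetchall()
# The recommender would then re-rank these hits by stored confidence.
```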
**Storage:**
- SQLite with FTS5 (efficient)
- Automatic indexing and triggers
- Scales to thousands of learnings

## Future Enhancements

1. **Async Extraction**: Background learning extraction during deployment
2. **Confidence Evolution**: Learnings gain/lose confidence based on outcomes
3. **Skill Decay**: Unused skills decrease in relevance over time
4. **Cross-Project Learning**: Share learnings between similar projects
5. **Decision Tracing**: Link recommendations back to specific successful tasks
6. **Feedback Loop**: Update learning confidence based on task outcomes
7. **Skill Trees**: Build hierarchies of related skills
8. **Collaborative Learning**: Share learnings across team instances

## Troubleshooting

### Learnings Not Being Created

Check:
1. QA validation passes (`qa_results["passed"] == True`)
2. The knowledge graph is accessible and writable
3. No errors in `qa_learning_integration.py` output

```bash
python3 lib/qa_validator.py --learn --verbose
```

### Recommendations Are Empty

Possible causes:
1. No learnings stored yet (run a successful task with `--learn`)
2. The task prompt doesn't match stored learning titles
3. Knowledge graph search not finding results

Test with:
```bash
python3 lib/skill_learning_engine.py recommend --task-prompt "Your task" --project overbits
```

### Knowledge Graph Issues

Check knowledge graph status:
```bash
python3 lib/knowledge_graph.py stats
python3 lib/knowledge_graph.py search "learning"
```

## API Reference

See inline documentation in:
- `lib/skill_learning_engine.py` - Main system classes
- `lib/qa_learning_integration.py` - QA integration
- `tests/test_skill_learning.py` - Usage examples via tests

## Contributing

To add new skill extraction patterns:

1. Add the pattern to `SkillExtractor._extract_decision_patterns()`
2. Update test cases in `TestSkillExtractor.test_extract_decision_patterns()`
3. Test with: `python3 lib/skill_learning_engine.py test`
4. Document the pattern in this guide

A sketch of step 1 follows.
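The keyword table is an assumption about that method's internals:

```python
# Hypothetical keyword-to-pattern table; the real method may differ.
DECISION_PATTERNS = {
    "optimization": ["optimize", "performance", "speed up"],
    "debugging": ["fix", "bug", "error"],
    "migration": ["migrate", "migration", "upgrade"],  # the newly added pattern
}

def match_decision_patterns(prompt: str) -> list[str]:
    lowered = prompt.lower()
    return [name for name, keywords in DECISION_PATTERNS.items()
            if any(keyword in lowered for keyword in keywords)]

print(match_decision_patterns("Deploy database migration to production"))
# ['migration']
```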
## License

Part of Luzia Orchestrator. See parent project license.