Skill and Knowledge Learning System

Overview

The Skill and Knowledge Learning System automatically extracts learnings from completed tasks and QA passes, storing them in the shared knowledge graph to power future skill recommendations and continuously improve decision-making.

This system enables Luzia to:

  • Learn from successes: Extract patterns from passing QA validations
  • Build skill profiles: Aggregate tool usage, patterns, and decision-making approaches
  • Make recommendations: Suggest effective approaches for similar future tasks
  • Improve over time: Store learnings persistently for cross-session learning

Architecture

Components

TaskExecution
     ↓
TaskAnalyzer → Patterns & Metadata
     ↓
SkillExtractor → Extracted Skills
     ↓
LearningEngine → Learning Objects
     ↓
KnowledgeGraph (research domain)
     ↓
SkillRecommender → Task Recommendations

Core Classes

1. TaskAnalyzer

Analyzes task executions to extract patterns and metadata.

from lib.skill_learning_engine import TaskAnalyzer

analyzer = TaskAnalyzer()

# Analyze a single task
execution = analyzer.analyze_task({
    "task_id": "task_001",
    "prompt": "Refactor database schema",
    "project": "overbits",
    "status": "success",
    "tools_used": ["Bash", "Read", "Edit"],
    "duration": 45.2,
    "result_summary": "Schema refactored successfully",
    "qa_passed": True,
    "timestamp": "2026-01-09T12:00:00"
})

# Extract patterns from multiple executions
# (executions is a list of results from analyze_task)
patterns = analyzer.extract_patterns(executions)
# Returns: success_rate, average_duration, common_tools, etc.

2. SkillExtractor

Extracts skills from task executions and QA results.

from lib.skill_learning_engine import SkillExtractor

extractor = SkillExtractor()

# Extract skills from task
skills = extractor.extract_from_task(execution)

# Extract skills from QA results
qa_skills = extractor.extract_from_qa_results(qa_results)

# Aggregate multiple skill extractions
aggregated = extractor.aggregate_skills(all_skills)

Skill Categories:

  • tool_usage: Tools used in task (Read, Bash, Edit, etc.)
  • pattern: Task patterns (optimization, debugging, testing, etc.)
  • decision: Decision-making approaches
  • architecture: Project/system knowledge
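
The exact record format is defined in lib/skill_learning_engine.py; the sketch below shows one plausible shape for an extracted skill, with field names that are hypothetical and used only for illustration:

# Hypothetical shape of an extracted skill record (illustrative only;
# see SkillExtractor in lib/skill_learning_engine.py for the real schema)
skill = {
    "name": "tool_bash",        # derived from tools_used
    "category": "tool_usage",   # one of the four categories above
    "confidence": 0.85,         # how strongly the task evidences the skill
    "source_task": "task_001",  # task it was extracted from
}

# aggregate_skills() merges extractions across tasks; one plausible
# approach is counting occurrences and averaging confidence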

3. LearningEngine

Processes and stores learnings in the knowledge graph.

from lib.skill_learning_engine import LearningEngine

engine = LearningEngine()

# Extract a learning from successful task
learning = engine.extract_learning(execution, skills, qa_results)

# Store in knowledge graph
learning_id = engine.store_learning(learning)

# Create skill entities
skill_id = engine.create_skill_entity(skill)

4. SkillRecommender

Recommends skills for future tasks based on stored learnings.

from lib.skill_learning_engine import SkillRecommender

recommender = SkillRecommender()

# Get recommendations for a task
recommendations = recommender.recommend_for_task(
    task_prompt="Optimize database performance",
    project="overbits"
)

# Get overall skill profile
profile = recommender.get_skill_profile()

5. SkillLearningSystem

Unified orchestrator for the complete learning pipeline.

from lib.skill_learning_engine import SkillLearningSystem

system = SkillLearningSystem()

# Process a completed task with QA results
result = system.process_task_completion(task_data, qa_results)
# Result includes: skills_extracted, learning_created, learning_id

# Get recommendations
recommendations = system.get_recommendations(prompt, project)

# Get learning summary
summary = system.get_learning_summary()

Integration with QA Validator

The learning system integrates seamlessly with the QA validator:

Manual Integration

from lib.qa_learning_integration import QALearningIntegrator

integrator = QALearningIntegrator()

# Run QA with automatic learning extraction
result = integrator.run_qa_and_sync_with_learning(sync=True, verbose=True)

Via CLI

# Standard QA validation
python3 lib/qa_validator.py

# QA validation with learning extraction
python3 lib/qa_validator.py --learn --sync --verbose

# Get statistics on learning integration
python3 lib/qa_learning_integration.py --stats

Knowledge Graph Storage

Learnings are stored in the research domain of the knowledge graph:

Entity Type: finding
Name: learning_20260109_120000_Refactor_Database_Schema
Content:
  - Title: Refactor Database Schema
  - Description: Task execution details
  - Skills Used: tool_bash, tool_read, tool_edit, ...
  - Pattern: refactoring_pattern
  - Applicability: overbits, tool_bash, decision, ...
  - Confidence: 0.85

Metadata:
  - skills: [list of skill names]
  - pattern: refactoring_pattern
  - confidence: 0.85
  - applicability: [projects, tools, categories]
  - extraction_time: ISO timestamp

Accessing Stored Learnings

from lib.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph("research")

# Search for learnings
learnings = kg.search("database optimization", limit=10)

# Get specific learning
learning = kg.get_entity("learning_20260109_120000_Refactor_Database_Schema")

# Get related skills
relations = kg.get_relations("learning_20260109_120000_...")

# List all learnings
all_learnings = kg.list_entities(entity_type="finding")

Usage Examples

Example 1: Extract Learnings from Task Completion

from lib.skill_learning_engine import SkillLearningSystem

system = SkillLearningSystem()

# Task data from execution
task_data = {
    "task_id": "deploy_overbits_v2",
    "prompt": "Deploy new frontend build to production with zero downtime",
    "project": "overbits",
    "status": "success",
    "tools_used": ["Bash", "Read", "Edit"],
    "duration": 120.5,
    "result_summary": "Successfully deployed with no downtime, 100% rollback verified",
    "qa_passed": True,
    "timestamp": "2026-01-09T15:30:00"
}

# QA validation results
qa_results = {
    "passed": True,
    "results": {
        "syntax": True,
        "routes": True,
        "command_docs": True,
    },
    "summary": {
        "errors": 0,
        "warnings": 0,
        "info": 5,
    }
}

# Process and extract learnings
result = system.process_task_completion(task_data, qa_results)

print(f"Skills extracted: {result['skills_extracted']}")
print(f"Learning created: {result['learning_id']}")

Example 2: Get Recommendations for Similar Task

# Later, for a similar deployment task
new_prompt = "Deploy database migration to production"

recommendations = system.get_recommendations(new_prompt, project="overbits")

for rec in recommendations:
    print(f"Skill: {rec['skill']}")
    print(f"From learning: {rec['source_learning']}")
    print(f"Confidence: {rec['confidence']:.1%}")

Example 3: Build Skill Profile

# Get overview of learned skills
profile = system.get_learning_summary()

print(f"Total learnings: {profile['total_learnings']}")
print(f"Skills by category: {profile['by_category']}")
print(f"Top 5 skills:")
for skill, count in profile['top_skills'][:5]:
    print(f"  {skill}: {count} occurrences")

Testing

Run the comprehensive test suite:

python3 -m pytest tests/test_skill_learning.py -v

Test Coverage:

  • Task analysis and pattern extraction
  • Skill extraction from tasks and QA results
  • Decision pattern recognition
  • Skill aggregation
  • Learning extraction and storage
  • Skill recommendations
  • Full integration pipeline

All tests run against a mocked knowledge graph, so they have no external dependencies.

Configuration

The system is configured in the QA validator integration:

File: lib/qa_learning_integration.py

Key settings:

  • Knowledge Graph Domain: research (all learnings stored here)
  • Learning Extraction Trigger: QA pass with all validations successful
  • Skill Categories: tool_usage, pattern, decision, architecture
  • Confidence Calculation: Weighted average of skill confidence and QA pass rate
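
The exact weights live in the integration code; as a rough illustration (the weights below are assumed, not the shipped values), the weighted average could look like:

# Illustrative confidence calculation; the 0.6/0.4 weights are assumed,
# not the values used by lib/qa_learning_integration.py
def learning_confidence(skill_confidences, qa_pass_rate,
                        skill_weight=0.6, qa_weight=0.4):
    """Weighted average of mean skill confidence and QA pass rate."""
    mean_skill = sum(skill_confidences) / len(skill_confidences)
    return skill_weight * mean_skill + qa_weight * qa_pass_rate

# Example: skills at 0.9, 0.8, 0.7 with a full QA pass
# -> 0.6 * 0.8 + 0.4 * 1.0 = 0.88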

Data Flow

Task Execution
      ↓
Task Analysis
      ├─→ Success rate: 85%
      ├─→ Average duration: 45 min
      ├─→ Common tools: [Bash, Read, Edit]
      └─→ Project distribution: {overbits: 60%, dss: 40%}
      ↓
Skill Extraction
      ├─→ Tool skills (from tools_used)
      ├─→ Decision patterns (from prompt)
      ├─→ Project knowledge (from project)
      └─→ QA validation skills
      ↓
Learning Creation
      ├─→ Title & description
      ├─→ Skill aggregation
      ├─→ Pattern classification
      ├─→ Confidence scoring
      └─→ Applicability determination
      ↓
Knowledge Graph Storage
      └─→ Entity: finding
          Relations: skill → learning
          Metadata: skills, pattern, confidence, applicability
      ↓
Future Recommendations
      └─→ Search similar tasks
          Extract applicable skills
          Rank by confidence
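
Concretely, the final step can be pictured as a search → filter → rank pass over stored learnings. The sketch below is illustrative and assumes kg.search() returns entity dicts carrying the metadata fields listed above; the real SkillRecommender may differ in detail:

# Hedged sketch of the search -> filter -> rank flow (assumes kg.search()
# returns dicts with a "metadata" key shaped as documented above)
def sketch_recommend(kg, prompt, project, limit=10):
    recs = []
    for entity in kg.search(prompt, limit=limit):  # FTS5 full-text search
        meta = entity.get("metadata", {})
        if project not in meta.get("applicability", []):
            continue  # keep only learnings applicable to this project
        for skill in meta.get("skills", []):
            recs.append({
                "skill": skill,
                "source_learning": entity.get("name"),
                "confidence": meta.get("confidence", 0.0),
            })
    return sorted(recs, key=lambda r: r["confidence"], reverse=True)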

Performance Considerations

Learning Extraction:

  • Runs only on successful QA passes (not a bottleneck)
  • Async-ready (future enhancement)
  • Minimal overhead (~100ms per extraction)

Recommendations:

  • Uses FTS5 full-text search on KG
  • Limited to top 10 results
  • Confidence-ranked sorting

Storage:

  • SQLite with FTS5 (efficient)
  • Automatic indexing and triggers
  • Scales to thousands of learnings
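
For readers unfamiliar with FTS5, the storage layer amounts to a SQLite virtual table kept in sync with the entity table by triggers. The snippet below is a generic, self-contained illustration of that pattern, not the knowledge graph's actual schema:

# Generic FTS5 pattern (illustrative; not the knowledge graph's real DDL)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities(id INTEGER PRIMARY KEY, name TEXT, content TEXT);
    -- external-content FTS5 index over the entities table
    CREATE VIRTUAL TABLE entities_fts USING fts5(
        name, content, content='entities', content_rowid='id'
    );
    -- trigger keeps the index in sync on insert (update/delete analogous)
    CREATE TRIGGER entities_ai AFTER INSERT ON entities BEGIN
        INSERT INTO entities_fts(rowid, name, content)
        VALUES (new.id, new.name, new.content);
    END;
""")
conn.execute("INSERT INTO entities(name, content) VALUES (?, ?)",
             ("learning_example", "database optimization via indexing"))
rows = conn.execute(
    "SELECT name FROM entities_fts WHERE entities_fts MATCH ?",
    ("database optimization",),
).fetchall()  # -> [("learning_example",)]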

Future Enhancements

  1. Async Extraction: Background learning extraction during deployment
  2. Confidence Evolution: Learnings gain/lose confidence based on outcomes
  3. Skill Decay: Unused skills decrease in relevance over time
  4. Cross-Project Learning: Share learnings between similar projects
  5. Decision Tracing: Link recommendations back to specific successful tasks
  6. Feedback Loop: Update learning confidence based on task outcomes
  7. Skill Trees: Build hierarchies of related skills
  8. Collaborative Learning: Share learnings across team instances

Troubleshooting

Learnings Not Being Created

Check:

  1. QA validation passes (qa_results["passed"] == True)
  2. Knowledge graph is accessible and writable
  3. No errors in qa_learning_integration.py output

Debug with:

python3 lib/qa_validator.py --learn --verbose

Recommendations Are Empty

Possible causes:

  1. No learnings stored yet (run a successful task with --learn)
  2. Task prompt doesn't match stored learning titles
  3. Knowledge graph search not finding results

Test with:

python3 lib/skill_learning_engine.py recommend --task-prompt "Your task" --project overbits

Knowledge Graph Issues

Check knowledge graph status:

python3 lib/knowledge_graph.py stats
python3 lib/knowledge_graph.py search "learning"

API Reference

See inline documentation in:

  • lib/skill_learning_engine.py - Main system classes
  • lib/qa_learning_integration.py - QA integration
  • tests/test_skill_learning.py - Usage examples via tests

Contributing

To add new skill extraction patterns:

  1. Add the pattern to SkillExtractor._extract_decision_patterns() (see the sketch after this list)
  2. Update test cases in TestSkillExtractor.test_extract_decision_patterns()
  3. Test with: python3 lib/skill_learning_engine.py test
  4. Document pattern in this guide
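
The extractor's internals are not reproduced here; as a purely hypothetical illustration, a keyword-based decision pattern might be registered like this (names and structure are assumptions, not the module's actual code):

# Purely hypothetical keyword-based decision patterns; mirror the actual
# structure used inside SkillExtractor._extract_decision_patterns()
DECISION_PATTERNS = {
    "rollback_safety": ["rollback", "zero downtime", "revert"],
    "incremental_migration": ["migration", "schema", "backfill"],
}

def match_decision_patterns(prompt):
    prompt_lower = prompt.lower()
    return [name for name, keywords in DECISION_PATTERNS.items()
            if any(kw in prompt_lower for kw in keywords)]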

License

Part of Luzia Orchestrator. See parent project license.