Refactor cockpit to use DockerTmuxController pattern

Based on claude-code-tools TmuxCLIController, this refactor:

- Adds DockerTmuxController class for robust tmux session management
- Implements send_keys() with configurable delay_enter
- Implements capture_pane() for output retrieval
- Implements wait_for_prompt() for pattern-based completion detection
- Implements wait_for_idle() for content-hash-based idle detection
- Implements wait_for_shell_prompt() for shell prompt detection

Also includes workflow improvements:
- Pre-task git snapshot before agent execution
- Post-task commit protocol in agent guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commit ec33ac1936 by admin, 2026-01-14 10:42:16 -03:00
265 changed files with 92011 additions and 0 deletions

AGENT-AUTONOMY-RESEARCH.md (new file, 881 lines):
# Luzia Agent Autonomy Research
## Interactive Prompts and Autonomous Agent Patterns
**Date:** 2026-01-09
**Version:** 1.0
**Status:** Complete
---
## Executive Summary
This research documents how **Luzia** and the **Claude Agent SDK** enable autonomous agents to handle interactive scenarios without blocking. The key insight is that **blocking is prevented through architectural choices, not technical tricks**:
1. **Detached Execution** - Agents run in background processes, not waiting for input
2. **Non-Interactive Mode** - Permission mode set to `bypassPermissions` to avoid approval dialogs
3. **Async Communication** - Results delivered via files and notification logs, not stdin/stdout
4. **Failure Recovery** - Exit codes captured for retry logic without agent restart
5. **Context-First Design** - All necessary context provided upfront in prompts
---
## Part 1: How Luzia Prevents Agent Blocking
### 1.1 The Core Pattern: Detached Spawning
**File:** `/opt/server-agents/orchestrator/bin/luzia` (lines 1012-1200)
```python
# Agents run detached with nohup, not waiting for completion
os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
```
**Key Design Decisions:**
| Aspect | Implementation | Why It Works |
|--------|---|---|
| **Process Isolation** | `nohup ... &` spawns detached process | Parent doesn't block; agent runs independently |
| **Permission Mode** | `--permission-mode bypassPermissions` | No permission dialogs to pause agent |
| **PID Tracking** | Job directory captures PID at startup | Can monitor/kill if needed without blocking |
| **Output Capture** | `tee` pipes output to log file | Results captured even if agent backgrounded |
| **Status Tracking** | Exit code appended to output.log | Job status determined post-execution |
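The detached-spawn row above can be sketched in Python. This is a minimal illustration, not Luzia's actual code: it uses `subprocess.Popen` with `start_new_session=True` as the equivalent of `nohup ... &`, and redirects output to the job's log file the way `tee` does in `run.sh`:

```python
import subprocess
from pathlib import Path

def spawn_detached(script_file: Path, output_file: Path) -> int:
    """Launch a job script so the parent returns immediately.

    start_new_session=True detaches the child from the parent's
    session (the nohup/setsid effect), so it survives the parent
    exiting or the terminal closing.
    """
    with open(output_file, "ab") as log:
        proc = subprocess.Popen(
            ["/bin/sh", str(script_file)],
            stdout=log,                 # output captured even when backgrounded
            stderr=subprocess.STDOUT,
            start_new_session=True,     # own session, no controlling terminal
        )
    return proc.pid                     # tracked so the job can be monitored/killed
```

The returned PID plays the role of the `pid` file in the job directory.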
### 1.2 The Full Agent Spawn Flow
**Complete lifecycle (simplified):**
```
1. spawn_claude_agent() called
2. Job directory created: /var/log/luz-orchestrator/jobs/{job_id}/
├── prompt.txt (full context + task)
├── run.sh (executable shell script)
├── meta.json (job metadata)
└── output.log (will capture all output)
3. Script written with all environment setup
├── TMPDIR set to user's home (prevent /tmp collisions)
├── HOME set to project user
├── Current directory: project path
└── Claude CLI invoked with full prompt
4. Execution via nohup (detached)
os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
5. Control returns immediately to CLI
6. Agent continues in background:
├── Reads prompt from file
├── Executes task (reads/writes files)
├── All output captured to output.log
├── Exit code captured: "exit:{code}"
└── Completion logged to notifications.log
```
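Step 2 of the lifecycle (laying out the job directory) can be sketched as follows. This is a simplified illustration, not the orchestrator's code; field names follow the `meta.json` example given later in this document:

```python
import json
import time
import uuid
from pathlib import Path

def create_job_dir(root: Path, project: str, task: str) -> Path:
    """Create the job directory with its four standard files."""
    job_id = uuid.uuid4().hex[:8]
    job_dir = root / job_id
    job_dir.mkdir(parents=True)

    (job_dir / "prompt.txt").write_text(task)       # full context + task
    (job_dir / "meta.json").write_text(json.dumps({
        "id": job_id,
        "project": project,
        "status": "running",
        "started": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }, indent=2))

    run_sh = job_dir / "run.sh"                     # executable shell script
    run_sh.write_text("#!/bin/sh\n# environment setup + claude invocation go here\n")
    run_sh.chmod(0o755)

    (job_dir / "output.log").touch()                # will capture all output
    return job_dir
```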
### 1.3 Permission Bypass Strategy
**Critical Flag:** `--permission-mode bypassPermissions`
```python
# From spawn_claude_agent()
claude_cmd = f'claude --dangerously-skip-permissions --permission-mode bypassPermissions ...'
```
**Why This Works:**
- **No User Prompts**: Claude doesn't ask for approval on tool use
- **Full Autonomy**: Agent makes all decisions without waiting
- **Pre-Authorization**: All permissions granted upfront in job spawning
- **Isolation**: Each agent runs as project user in their own space
**When This is Safe:**
- All agents have limited scope (project directory)
- Running as restricted user (not root)
- Task fully specified in prompt (no ambiguity)
- Agent context includes execution environment details
---
## Part 2: Handling Clarification Without Blocking
### 2.1 The AskUserQuestion Problem
When agents need clarification, Claude's `AskUserQuestion` tool blocks the agent process waiting for stdin input. For background agents, this is problematic.
**Solutions in Luzia:**
#### Solution 1: Context-First Design
Provide all necessary context upfront so agents rarely need to ask:
```python
prompt = f"""
You are a project agent working on the **{project}** project.
## Your Task
{task}
## Execution Environment
- You are running as user: {run_as_user}
- Working directory: {project_path}
- All file operations are pre-authorized
- Complete the task autonomously
## Guidelines
- Complete the task autonomously
- If you encounter errors, debug and fix them
- Store important findings in the shared knowledge graph
"""
```
#### Solution 2: Structured Task Format
Use specific, unambiguous task descriptions:
```
GOOD: "Run tests in /workspace/tests and report pass/fail count"
BAD: "Fix the test suite" (unclear what 'fix' means)
GOOD: "Analyze src/index.ts for complexity metrics"
BAD: "Improve code quality" (needs clarification on what to improve)
```
#### Solution 3: Async Fallback Mechanism
If clarification is truly needed, agents can:
1. **Create a hold file** in job directory
2. **Log the question** to a status file
3. **Return exit code 1** (needs input)
4. **Await resolution** via file modification
```python
# Example pattern for agent code:
# (Not yet implemented in Luzia, but pattern is documented)
import json
from datetime import datetime
from pathlib import Path
job_dir = Path("/var/log/luz-orchestrator/jobs/{job_id}")
# Agent encounters ambiguity
question = "Should I update production database or staging?"
# Write question to file
clarification = {
"status": "awaiting_clarification",
"question": question,
"options": ["production", "staging"],
"agent_paused_at": datetime.now().isoformat()
}
(job_dir / "clarification.json").write_text(json.dumps(clarification))
# Exit with code 1 to signal "needs input"
exit(1)
```
Then externally:
```bash
# Operator provides input
echo '{"choice": "staging"}' > /var/log/luz-orchestrator/jobs/{job_id}/clarification.json
# Restart agent (automatic retry system)
luzia retry {job_id}
```
### 2.2 Why AskUserQuestion Doesn't Work for Async Agents
| Scenario | Issue | Solution |
|----------|-------|----------|
| User runs `luzia project task` | User might close terminal | Store prompt in file, not stdin |
| Agent backgrounded | stdin not available | No interactive input possible |
| Multiple agents running | stdin interference | Use file-based IPC instead |
| Agent on remote machine | stdin tunneling complex | All I/O via files or HTTP |
---
## Part 3: Job State Machine and Exit Codes
### 3.1 Job Lifecycle States
**Defined in:** `/opt/server-agents/orchestrator/bin/luzia` (lines 607-646)
```python
def _get_actual_job_status(job_dir: Path) -> str:
"""Determine actual job status by checking output.log"""
# Status values:
# - "running" (process still active)
# - "completed" (exit:0)
# - "failed" (exit:non-zero)
# - "killed" (exit:-9)
# - "unknown" (no status info)
```
**State Transitions:**
```
Job Created
[meta.json: status="running", output.log: empty]
Agent Executes (captured in output.log)
Agent Completes/Exits
output.log appended with "exit:{code}"
Status determined:
├─ exit:0 → "completed"
├─ exit:non-0 → "failed"
├─ exit:-9 → "killed"
└─ no exit → "running" (still active)
```
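A concrete version of the status function following these transitions might look like this (a sketch, not the orchestrator's exact code):

```python
from pathlib import Path

def job_status(job_dir: Path) -> str:
    """Map the last "exit:{code}" marker in output.log to a status."""
    log = job_dir / "output.log"
    if not log.exists():
        return "unknown"
    markers = [line for line in log.read_text().splitlines()
               if line.startswith("exit:")]
    if not markers:
        return "running"        # process has not appended its exit code yet
    code = int(markers[-1].split(":", 1)[1])
    if code == 0:
        return "completed"
    if code == -9:
        return "killed"
    return "failed"
```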
### 3.2 The Critical Line: Capturing Exit Code
**From run.sh template (lines 1148-1173):**
```bash
# Command with output capture
stdbuf ... {claude_cmd} 2>&1 | tee "{output_file}"
exit_code=${PIPESTATUS[0]}
# CRITICAL: Append exit code to log
echo "" >> "{output_file}"
echo "exit:$exit_code" >> "{output_file}"
# Notify completion
{notify_cmd}
```
**Why This Matters:**
- Job status determined by examining log file, not process exit
- Exit code persists in file even after process terminates
- Allows status queries without spawning process
- Enables automatic retry logic based on exit code
---
## Part 4: Handling Approval Prompts in Background
### 4.1 How Claude Code Approval Prompts Work
Claude Code tools can ask for permission before executing risky operations:
```
⚠️ This command has high privilege level. Approve?
[Y/n] _
```
In **interactive mode**: User can respond
In **background mode**: Command blocks indefinitely waiting for stdin
### 4.2 Luzia's Prevention Mechanism
**Three-layer approach:**
1. **CLI Flag**: `--permission-mode bypassPermissions`
- Tells Claude CLI to skip permission dialogs
- Requires `--dangerously-skip-permissions` flag
2. **Environment Setup**: User runs as project user, not root
- Limited scope prevents catastrophic damage
- Job runs in isolated directory
- File ownership is correct by default
3. **Process Isolation**: Agent runs detached
- Even if blocked, parent CLI returns immediately
- Job continues in background
- Can be monitored/killed separately
**Example: Safe Bash Execution**
```python
# This command would normally require approval in interactive mode
command = "rm -rf /opt/sensitive-data"
# But in agent context:
# 1. Agent running as limited user (not root)
# 2. Project path restricted (can't access /opt from project user)
# 3. Permission flags bypass confirmation dialog
# 4. Agent detached (blocking doesn't affect CLI)
# Result: Command executes without interactive prompt
```
---
## Part 5: Async Communication Patterns
### 5.1 File-Based Job Queue
**Implemented in:** `/opt/server-agents/orchestrator/lib/queue_controller.py`
**Pattern:**
```
User provides task → Enqueue to file-based queue → Status logged to disk
Load-aware scheduler polls queue
Task spawned as background agent
Agent writes to output.log
User queries status via filesystem
```
**Queue Structure:**
```
/var/lib/luzia/queue/
├── pending/
│ ├── high/
│ │ └── {priority}_{timestamp}_{project}_{task_id}.json
│ └── normal/
│ └── {priority}_{timestamp}_{project}_{task_id}.json
├── config.json
└── capacity.json
```
**Task File Format:**
```json
{
"id": "a1b2c3d4",
"project": "musica",
"priority": 5,
"prompt": "Run tests in /workspace/tests",
"skill_match": "test-runner",
"enqueued_at": "2026-01-09T15:30:45Z",
"enqueued_by": "admin",
"status": "pending"
}
```
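Enqueuing a task file in this layout can be sketched as below. Note this is an illustration: the high/normal split at priority >= 8 is an assumption, not Luzia's documented threshold.

```python
import json
import time
import uuid
from pathlib import Path

def enqueue_task(queue_root: Path, project: str, prompt: str,
                 priority: int = 5, enqueued_by: str = "admin") -> Path:
    """Write a task file into the pending queue, named per the scheme above."""
    lane = "high" if priority >= 8 else "normal"    # assumed threshold
    task_id = uuid.uuid4().hex[:8]
    name = f"{priority}_{int(time.time())}_{project}_{task_id}.json"
    path = queue_root / "pending" / lane / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({
        "id": task_id,
        "project": project,
        "priority": priority,
        "prompt": prompt,
        "enqueued_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "enqueued_by": enqueued_by,
        "status": "pending",
    }, indent=2))
    return path
```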
### 5.2 Notification Log Pattern
**Location:** `/var/log/luz-orchestrator/notifications.log`
**Pattern:**
```
[14:23:15] Agent 142315-a1b2 finished (exit 0)
[14:24:03] Agent 142403-c3d4 finished (exit 1)
[14:25:12] Agent 142512-e5f6 finished (exit 0)
```
**Usage:**
```bash
# A script can tail this file to await completion
# without polling job directories; -m1 makes grep
# exit on the first match instead of blocking forever
tail -f /var/log/luz-orchestrator/notifications.log | \
  grep -m1 "Agent {job_id}"
```
### 5.3 Job Directory as IPC Channel
**Location:** `/var/log/luz-orchestrator/jobs/{job_id}/`
**Files Used for Communication:**
| File | Purpose | Direction |
|------|---------|-----------|
| `prompt.txt` | Task definition & context | Input (before agent starts) |
| `output.log` | Agent's stdout/stderr + exit code | Output (written during execution) |
| `meta.json` | Job metadata & status | Both (initial + final) |
| `clarification.json` | Awaiting user input (pattern) | Bidirectional |
| `run.sh` | Execution script | Input |
| `pid` | Process ID | Output |
**Example: Monitoring Job Completion**
```bash
#!/bin/bash
job_id="142315-a1b2"
job_dir="/var/log/luz-orchestrator/jobs/$job_id"
# Poll for completion
while true; do
if grep -q "^exit:" "$job_dir/output.log"; then
exit_code=$(grep "^exit:" "$job_dir/output.log" | tail -1 | cut -d: -f2)
echo "Job completed with exit code: $exit_code"
break
fi
sleep 1
done
```
---
## Part 6: Prompt Patterns for Agent Autonomy
### 6.1 The Ideal Autonomous Agent Prompt
**Pattern:**
```
1. Identity & Context
- What role is the agent playing?
- What project/domain?
2. Task Specification
- What needs to be done?
- What are success criteria?
- What are the constraints?
3. Execution Environment
- What tools are available?
- What directories can be accessed?
- What permissions are granted?
4. Decision Autonomy
- What decisions can the agent make alone?
- When should it ask for clarification? (ideally: never)
- What should it do if ambiguous?
5. Communication
- Where should results be written?
- What format (JSON, text, files)?
- When should it report progress?
6. Failure Handling
- What to do if task fails?
- Should it retry? How many times?
- What exit codes to use?
```
### 6.2 Good vs Bad Prompts for Autonomy
**BAD - Requires Clarification:**
```
"Help me improve the code"
- Ambiguous: which files? What metrics?
- No success criteria
- Agent likely to ask questions
"Fix the bug"
- Which bug? What symptoms?
- Agent needs to investigate then ask
```
**GOOD - Autonomous:**
```
"Run tests in /workspace/tests and report:
- Total test count
- Passed count
- Failed count
- Exit code (0 if all pass, 1 if any fail)"
"Analyze src/index.ts for:
- Lines of code
- Number of functions
- Max function complexity
- Save results to analysis.json"
```
### 6.3 Prompt Template for Autonomous Agents
**Used in Luzia:** `/opt/server-agents/orchestrator/bin/luzia` (lines 1053-1079)
```python
prompt_template = """You are a project agent working on the **{project}** project.
{context}
## Your Task
{task}
## Execution Environment
- You are running as user: {run_as_user}
- You are running directly in the project directory: {project_path}
- You have FULL permission to read, write, and execute files in this directory
- Use standard Claude tools (Read, Write, Edit, Bash) directly
- All file operations are pre-authorized - proceed without asking for permission
## Knowledge Graph - IMPORTANT
Use the **shared/global knowledge graph** for storing knowledge:
- Use `mcp__shared-projects-memory__store_fact` to store facts
- Use `mcp__shared-projects-memory__query_relations` to query
- Use `mcp__shared-projects-memory__search_context` to search
## Guidelines
- Complete the task autonomously
- If you encounter errors, debug and fix them
- Store important findings in the shared knowledge graph
- Provide a summary of what was done when complete
"""
```
**Key Autonomy Features:**
- No "ask for help" - pre-authorization is explicit
- Clear environment details - no guessing about paths/permissions
- Knowledge graph integration - preserve learnings across runs
- Exit code expectations - clear success/failure criteria
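Filling the template is plain string formatting; a shortened sketch with illustrative values:

```python
# Abbreviated version of the Luzia prompt template shown above
template = """You are a project agent working on the **{project}** project.

## Your Task
{task}

## Execution Environment
- You are running as user: {run_as_user}
- You are running directly in the project directory: {project_path}
"""

prompt = template.format(
    project="musica",
    task="Run tests in /workspace/tests and report pass/fail counts",
    run_as_user="musica",
    project_path="/workspace",
)
```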
---
## Part 7: Pattern Summary - Building Autonomous Agents
### 7.1 The Five Patterns
| Pattern | When to Use | Implementation |
|---------|------------|---|
| **Detached Spawning** | Background tasks that shouldn't block CLI | `nohup ... &` with PID tracking |
| **Permission Bypass** | Autonomous execution without prompts | `--permission-mode bypassPermissions` |
| **File-Based IPC** | Async communication with agents | Job directory as channel |
| **Exit Code Signaling** | Status determination without polling | Append "exit:{code}" to output |
| **Context-First Prompts** | Avoid clarification questions | Detailed spec + success criteria |
### 7.2 Comparison: Interactive vs Autonomous Patterns
| Aspect | Interactive Agent | Autonomous Agent |
|--------|---|---|
| **Execution** | Runs in foreground, blocks | Detached process, returns immediately |
| **Prompts** | Can use `AskUserQuestion` | Must provide all context upfront |
| **Approval** | Can request tool permission | Uses `--permission-mode bypassPermissions` |
| **I/O** | stdin/stdout with user | Files, logs, notification channels |
| **Failure** | User responds to errors | Agent handles/reports via exit code |
| **Monitoring** | User watches output | Query filesystem for status |
### 7.3 When to Use Each Pattern
**Use Interactive Agents When:**
- User is present and waiting
- Task requires user input/decisions
- Working in development/exploration mode
- Real-time feedback is valuable
**Use Autonomous Agents When:**
- Running background maintenance tasks
- Multiple parallel operations needed
- No user available to respond to prompts
- Results needed asynchronously
---
## Part 8: Real Implementation Examples
### 8.1 Example: Running Tests Autonomously
**Task:**
```
Run pytest in /workspace/tests and report results as JSON
```
**Luzia Command:**
```bash
luzia musica "Run pytest in /workspace/tests and save results to tests.json with {passed: int, failed: int, errors: int}"
```
**What Happens:**
1. Job directory created with UUID
2. Prompt written with full context
3. Script prepared with environment setup
4. Launched via nohup
5. Immediately returns job_id to user
6. Agent runs in background:
```bash
cd /workspace
   pytest tests/ --json-report --json-report-file=tests.json  # needs the pytest-json-report plugin
# Results saved to file
```
7. Exit code captured (0 if all pass, 1 if failures)
8. Output logged to output.log
9. Completion notification sent
**User Monitor:**
```bash
luzia jobs {job_id}
# Status: running/completed/failed
# Exit code: 0/1
# Output preview: last 10 lines
```
### 8.2 Example: Code Analysis Autonomously
**Task:**
```
Analyze the codebase structure and save metrics to analysis.json
```
**Agent Does (no prompts needed):**
1. Reads prompt from job directory
2. Scans project structure
3. Collects metrics (LOC, functions, classes, complexity)
4. Writes results to analysis.json
5. Stores findings in knowledge graph
6. Exits with 0
**Success Criteria (in prompt):**
```
Results saved to analysis.json with:
- total_files: int
- total_lines: int
- total_functions: int
- total_classes: int
- average_complexity: float
```
---
## Part 9: Best Practices
### 9.1 Prompt Design for Autonomy
1. **Be Specific**
- What files? What directories?
- What exact metrics/outputs?
- What format (JSON, CSV, text)?
2. **Provide Success Criteria**
- What makes this task complete?
- What should the output look like?
- What exit code for success/failure?
3. **Include Error Handling**
- What if file doesn't exist?
- What if command fails?
- Should it retry or report and exit?
4. **Minimize Ambiguity**
- Don't say "improve code quality" - say what to measure
- Don't say "fix bugs" - specify which bugs or how to find them
- Don't say "optimize" - specify what metric to optimize
### 9.2 Environment Setup for Autonomy
1. **Pre-authorize Everything**
- Set correct user/group
- Ensure file permissions allow operations
- Document what's accessible
2. **Provide Full Context**
- Include CLAUDE.md or similar
- Document project structure
- Explain architectural decisions
3. **Set Clear Boundaries**
- Which directories can be modified?
- Which operations are allowed?
- What can't be changed?
### 9.3 Failure Recovery for Autonomy
1. **Use Exit Codes Meaningfully**
- 0 = success
- 1 = recoverable failure
- 2 = unrecoverable failure
- -9 = killed/timeout
2. **Log Failures Comprehensively**
- What was attempted?
- What failed and why?
- What was tried to recover?
3. **Enable Automatic Retry**
- Retry on exit code 1 (optional)
- Don't retry on exit code 2 (unrecoverable)
- Track retry count to prevent infinite loops
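These three rules condense into a small decision function (a sketch of the policy, not orchestrator code):

```python
def should_retry(exit_code: int, attempts: int, max_attempts: int = 3) -> bool:
    """Retry policy from the exit-code convention above.

    0  -> success, nothing to retry
    1  -> recoverable failure, retry until max_attempts
    2  -> unrecoverable failure, never retry
    -9 -> killed/timeout; treated as non-retryable here (an assumption;
          a scheduler might choose to re-queue these instead)
    """
    if attempts >= max_attempts:    # prevent infinite retry loops
        return False
    return exit_code == 1
```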
---
## Part 10: Advanced Patterns
### 10.1 Multi-Phase Autonomous Tasks
For complex tasks requiring multiple steps:
```python
# Phase 1: Context gathering
# Phase 2: Analysis
# Phase 3: Implementation
# Phase 4: Verification
# Phase 5: Reporting
# All phases defined upfront in prompt
# Agent proceeds through all without asking
# Exit code reflects overall success
```
### 10.2 Knowledge Graph Integration
Agents store findings persistently:
```python
# From agent code (illustrative — MCP tools are invoked as tool calls
# by the agent runtime, not imported as Python modules):
# After analysis, store for future agents
store_fact(
entity_source_name="musica-project",
relation="has_complexity_metrics",
entity_target_name="analysis-2026-01-09",
context={
"avg_complexity": 3.2,
"hotspots": ["index.ts", "processor.ts"]
}
)
```
### 10.3 Cross-Agent Coordination
Use shared state file for coordination:
```python
# /opt/server-agents/state/cross-agent-todos.json
# Agents read/update this to coordinate
{
"current_tasks": [
{
"id": "analyze-musica",
"project": "musica",
"status": "in_progress",
"assigned_to": "agent-a1b2",
"started": "2026-01-09T14:23:00Z"
}
],
"completed_archive": { ... }
}
```
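Claiming a task from this shared file can be sketched as below. Two assumptions are baked in: unclaimed tasks carry a `"pending"` status (the format above only shows `"in_progress"`), and the write goes through a temp file plus `os.replace` so concurrent readers never see half-written JSON:

```python
import json
import os
from pathlib import Path

def claim_task(state_file: Path, task_id: str, agent_id: str) -> bool:
    """Mark a shared todo as in_progress for this agent; False if taken."""
    state = json.loads(state_file.read_text())
    for task in state["current_tasks"]:
        if task["id"] == task_id and task.get("status") == "pending":
            task["status"] = "in_progress"
            task["assigned_to"] = agent_id
            tmp = state_file.with_suffix(".tmp")
            tmp.write_text(json.dumps(state, indent=2))
            os.replace(tmp, state_file)     # atomic rename on POSIX
            return True
    return False                            # already claimed or unknown id
```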
---
## Part 11: Failure Cases and Solutions
### 11.1 Common Blocking Issues
| Issue | Cause | Solution |
|-------|-------|----------|
| Agent pauses on permission prompt | Tool permission check enabled | Use `--permission-mode bypassPermissions` |
| Agent blocks on AskUserQuestion | Prompt causes clarification needed | Redesign prompt with full context |
| stdin unavailable | Agent backgrounded | Use file-based IPC for input |
| Exit code not recorded | Script exits before writing exit code | Ensure "exit:{code}" in output.log |
| Job marked "running" forever | Process dies but exit code not appended | Use `tee` and explicit exit code capture |
### 11.2 Debugging Blocking Agents
```bash
# Check if agent is actually running
ps aux | grep {job_id}
# Check job output (should show what it's doing)
tail -f /var/log/luz-orchestrator/jobs/{job_id}/output.log
# Check if waiting on stdin
strace -p {pid} 2>&1 | grep read   # strace writes to stderr, so redirect before grep
# Look for approval prompts in output
grep -i "approve\|confirm\|permission" /var/log/luz-orchestrator/jobs/{job_id}/output.log
# Check if exit code was written
tail -5 /var/log/luz-orchestrator/jobs/{job_id}/output.log
```
---
## Part 12: Conclusion - Key Takeaways
### 12.1 The Core Principle
**Autonomous agents don't ask for input because they don't need to.**
Rather than implementing complex async prompting, the better approach is:
1. **Specify tasks completely** - No ambiguity
2. **Provide full context** - No guessing required
3. **Set clear boundaries** - Know what's allowed
4. **Detach execution** - Run independent of CLI
5. **Capture results** - File-based communication
### 12.2 Implementation Checklist
- [ ] Prompt includes all necessary context
- [ ] Task has clear success criteria
- [ ] Environment fully described (user, directory, permissions)
- [ ] No ambiguous language in prompt
- [ ] Exit codes defined (0=success, 1=failure, 2=error)
- [ ] Output format specified (JSON, text, files)
- [ ] Job runs as appropriate user
- [ ] Results captured to files/logs
- [ ] Notification system tracks completion
- [ ] Status queryable without blocking
### 12.3 When Blocks Still Occur
1. **Rare**: Well-designed prompts rarely need clarification
2. **Detectable**: Agent exits with code 1 and logs to output.log
3. **Recoverable**: Can retry or modify task and re-queue
4. **Monitorable**: Parent CLI never blocks, can watch from elsewhere
---
## Appendix A: Key Code Locations
| Location | Purpose |
|----------|---------|
| `/opt/server-agents/orchestrator/bin/luzia` (lines 1012-1200) | `spawn_claude_agent()` - Core autonomous agent spawning |
| `/opt/server-agents/orchestrator/lib/docker_bridge.py` | Container isolation for project agents |
| `/opt/server-agents/orchestrator/lib/queue_controller.py` | File-based task queue with load awareness |
| `/var/log/luz-orchestrator/jobs/` | Job directory structure and IPC |
| `/opt/server-agents/orchestrator/CLAUDE.md` | Embedded instructions for agents |
---
## Appendix B: Environment Variables
Agents have these set automatically:
```bash
TMPDIR="/home/{user}/.tmp" # Prevent /tmp collisions
TEMP="/home/{user}/.tmp" # Same
TMP="/home/{user}/.tmp" # Same
HOME="/home/{user}" # User's home directory
PWD="/path/to/project" # Working directory
```
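Constructing that environment for the spawned script might look like this (sketch):

```python
import os

def agent_env(user: str, project_path: str) -> dict:
    """Environment for the job's run.sh, mirroring the variables above."""
    home = f"/home/{user}"
    tmp_dir = f"{home}/.tmp"
    env = dict(os.environ)          # inherit, then override
    env.update({
        "HOME": home,
        "TMPDIR": tmp_dir,          # keep temp files out of shared /tmp
        "TEMP": tmp_dir,
        "TMP": tmp_dir,
        "PWD": project_path,        # working directory
    })
    return env
```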
---
## Appendix C: File Formats Reference
### Job Directory Files
**meta.json:**
```json
{
"id": "142315-a1b2",
"project": "musica",
"task": "Run tests and report results",
"type": "agent",
"user": "musica",
"pid": "12847",
"started": "2026-01-09T14:23:15Z",
"status": "running",
"debug": false
}
```
**output.log:**
```
[14:23:15] Starting agent...
[14:23:16] Reading prompt from file
[14:23:17] Executing task...
[14:23:18] Running tests...
PASSED: test_1
PASSED: test_2
...
[14:23:25] Task complete
exit:0
```
---
## References
- **Luzia CLI**: `/opt/server-agents/orchestrator/bin/luzia`
- **Agent SDK**: Claude Agent SDK (Anthropic)
- **Docker Bridge**: Container isolation for agent execution
- **Queue Controller**: File-based task queue implementation
- **Bot Orchestration Protocol**: `/opt/server-agents/BOT-ORCHESTRATION-PROTOCOL.md`