Refactor cockpit to use DockerTmuxController pattern

Based on claude-code-tools TmuxCLIController, this refactor:

- Adds DockerTmuxController class for robust tmux session management
- Implements send_keys() with configurable delay_enter
- Implements capture_pane() for output retrieval
- Implements wait_for_prompt() for pattern-based completion detection
- Implements wait_for_idle() for content-hash-based idle detection
- Implements wait_for_shell_prompt() for shell prompt detection

Also includes workflow improvements:
- Pre-task git snapshot before agent execution
- Post-task commit protocol in agent guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commit ec33ac1936 by admin, 2026-01-14 10:42:16 -03:00
265 changed files with 92011 additions and 0 deletions

AGENT-AUTONOMY-RESEARCH.md (new file, 881 lines):
# Luzia Agent Autonomy Research
## Interactive Prompts and Autonomous Agent Patterns
**Date:** 2026-01-09
**Version:** 1.0
**Status:** Complete
---
## Executive Summary
This research documents how **Luzia** and the **Claude Agent SDK** enable autonomous agents to handle interactive scenarios without blocking. The key insight is that **blocking is prevented through architectural choices, not technical tricks**:
1. **Detached Execution** - Agents run in background processes, not waiting for input
2. **Non-Interactive Mode** - Permission mode set to `bypassPermissions` to avoid approval dialogs
3. **Async Communication** - Results delivered via files and notification logs, not stdin/stdout
4. **Failure Recovery** - Exit codes captured for retry logic without agent restart
5. **Context-First Design** - All necessary context provided upfront in prompts
---
## Part 1: How Luzia Prevents Agent Blocking
### 1.1 The Core Pattern: Detached Spawning
**File:** `/opt/server-agents/orchestrator/bin/luzia` (lines 1012-1200)
```python
# Agents run detached with nohup, not waiting for completion
os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
```
**Key Design Decisions:**
| Aspect | Implementation | Why It Works |
|--------|---|---|
| **Process Isolation** | `nohup ... &` spawns detached process | Parent doesn't block; agent runs independently |
| **Permission Mode** | `--permission-mode bypassPermissions` | No permission dialogs to pause agent |
| **PID Tracking** | Job directory captures PID at startup | Can monitor/kill if needed without blocking |
| **Output Capture** | `tee` pipes output to log file | Results captured even if agent backgrounded |
| **Status Tracking** | Exit code appended to output.log | Job status determined post-execution |
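The detached-spawn row above can be sketched in Python. This is a minimal illustration, not Luzia's actual code: it uses `subprocess.Popen` with `start_new_session=True` as the equivalent of `nohup ... &`, and redirects output to the job's log file the way `tee` does in `run.sh`:

```python
import subprocess
from pathlib import Path

def spawn_detached(script_file: Path, output_file: Path) -> int:
    """Launch a job script so the parent returns immediately.

    start_new_session=True detaches the child from the parent's
    session (the nohup/setsid effect), so it survives the parent
    exiting or the terminal closing.
    """
    with open(output_file, "ab") as log:
        proc = subprocess.Popen(
            ["/bin/sh", str(script_file)],
            stdout=log,                 # output captured even when backgrounded
            stderr=subprocess.STDOUT,
            start_new_session=True,     # own session, no controlling terminal
        )
    return proc.pid                     # tracked so the job can be monitored/killed
```

The returned PID plays the role of the `pid` file in the job directory.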
### 1.2 The Full Agent Spawn Flow
**Complete lifecycle (simplified):**
```
1. spawn_claude_agent() called
2. Job directory created: /var/log/luz-orchestrator/jobs/{job_id}/
├── prompt.txt (full context + task)
├── run.sh (executable shell script)
├── meta.json (job metadata)
└── output.log (will capture all output)
3. Script written with all environment setup
├── TMPDIR set to user's home (prevent /tmp collisions)
├── HOME set to project user
├── Current directory: project path
└── Claude CLI invoked with full prompt
4. Execution via nohup (detached)
os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
5. Control returns immediately to CLI
6. Agent continues in background:
├── Reads prompt from file
├── Executes task (reads/writes files)
├── All output captured to output.log
├── Exit code captured: "exit:{code}"
└── Completion logged to notifications.log
```
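Step 2 of the lifecycle (laying out the job directory) can be sketched as follows. This is a simplified illustration, not the orchestrator's code; field names follow the `meta.json` example given later in this document:

```python
import json
import time
import uuid
from pathlib import Path

def create_job_dir(root: Path, project: str, task: str) -> Path:
    """Create the job directory with its four standard files."""
    job_id = uuid.uuid4().hex[:8]
    job_dir = root / job_id
    job_dir.mkdir(parents=True)

    (job_dir / "prompt.txt").write_text(task)       # full context + task
    (job_dir / "meta.json").write_text(json.dumps({
        "id": job_id,
        "project": project,
        "status": "running",
        "started": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }, indent=2))

    run_sh = job_dir / "run.sh"                     # executable shell script
    run_sh.write_text("#!/bin/sh\n# environment setup + claude invocation go here\n")
    run_sh.chmod(0o755)

    (job_dir / "output.log").touch()                # will capture all output
    return job_dir
```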
### 1.3 Permission Bypass Strategy
**Critical Flag:** `--permission-mode bypassPermissions`
```python
# From spawn_claude_agent()
claude_cmd = f'claude --dangerously-skip-permissions --permission-mode bypassPermissions ...'
```
**Why This Works:**
- **No User Prompts**: Claude doesn't ask for approval on tool use
- **Full Autonomy**: Agent makes all decisions without waiting
- **Pre-Authorization**: All permissions granted upfront in job spawning
- **Isolation**: Each agent runs as project user in their own space
**When This is Safe:**
- All agents have limited scope (project directory)
- Running as restricted user (not root)
- Task fully specified in prompt (no ambiguity)
- Agent context includes execution environment details
---
## Part 2: Handling Clarification Without Blocking
### 2.1 The AskUserQuestion Problem
When agents need clarification, Claude's `AskUserQuestion` tool blocks the agent process waiting for stdin input. For background agents, this is problematic.
**Solutions in Luzia:**
#### Solution 1: Context-First Design
Provide all necessary context upfront so agents rarely need to ask:
```python
prompt = f"""
You are a project agent working on the **{project}** project.
## Your Task
{task}
## Execution Environment
- You are running as user: {run_as_user}
- Working directory: {project_path}
- All file operations are pre-authorized
- Complete the task autonomously
## Guidelines
- Complete the task autonomously
- If you encounter errors, debug and fix them
- Store important findings in the shared knowledge graph
"""
```
#### Solution 2: Structured Task Format
Use specific, unambiguous task descriptions:
```
GOOD: "Run tests in /workspace/tests and report pass/fail count"
BAD: "Fix the test suite" (unclear what 'fix' means)
GOOD: "Analyze src/index.ts for complexity metrics"
BAD: "Improve code quality" (needs clarification on what to improve)
```
#### Solution 3: Async Fallback Mechanism
If clarification is truly needed, agents can:
1. **Create a hold file** in job directory
2. **Log the question** to a status file
3. **Return exit code 1** (needs input)
4. **Await resolution** via file modification
```python
# Example pattern for agent code:
# (Not yet implemented in Luzia, but pattern is documented)
import json
from datetime import datetime
from pathlib import Path
job_dir = Path("/var/log/luz-orchestrator/jobs/{job_id}")
# Agent encounters ambiguity
question = "Should I update production database or staging?"
# Write question to file
clarification = {
"status": "awaiting_clarification",
"question": question,
"options": ["production", "staging"],
"agent_paused_at": datetime.now().isoformat()
}
(job_dir / "clarification.json").write_text(json.dumps(clarification))
# Exit with code 1 to signal "needs input"
exit(1)
```
Then externally:
```bash
# Operator provides input
echo '{"choice": "staging"}' > /var/log/luz-orchestrator/jobs/{job_id}/clarification.json
# Restart agent (automatic retry system)
luzia retry {job_id}
```
### 2.2 Why AskUserQuestion Doesn't Work for Async Agents
| Scenario | Issue | Solution |
|----------|-------|----------|
| User runs `luzia project task` | User might close terminal | Store prompt in file, not stdin |
| Agent backgrounded | stdin not available | No interactive input possible |
| Multiple agents running | stdin interference | Use file-based IPC instead |
| Agent on remote machine | stdin tunneling complex | All I/O via files or HTTP |
---
## Part 3: Job State Machine and Exit Codes
### 3.1 Job Lifecycle States
**Defined in:** `/opt/server-agents/orchestrator/bin/luzia` (lines 607-646)
```python
def _get_actual_job_status(job_dir: Path) -> str:
"""Determine actual job status by checking output.log"""
# Status values:
# - "running" (process still active)
# - "completed" (exit:0)
# - "failed" (exit:non-zero)
# - "killed" (exit:-9)
# - "unknown" (no status info)
```
**State Transitions:**
```
Job Created
[meta.json: status="running", output.log: empty]
Agent Executes (captured in output.log)
Agent Completes/Exits
output.log appended with "exit:{code}"
Status determined:
├─ exit:0 → "completed"
├─ exit:non-0 → "failed"
├─ exit:-9 → "killed"
└─ no exit → "running" (still active)
```
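A concrete version of the status function following these transitions might look like this (a sketch, not the orchestrator's exact code):

```python
from pathlib import Path

def job_status(job_dir: Path) -> str:
    """Map the last "exit:{code}" marker in output.log to a status."""
    log = job_dir / "output.log"
    if not log.exists():
        return "unknown"
    markers = [line for line in log.read_text().splitlines()
               if line.startswith("exit:")]
    if not markers:
        return "running"        # process has not appended its exit code yet
    code = int(markers[-1].split(":", 1)[1])
    if code == 0:
        return "completed"
    if code == -9:
        return "killed"
    return "failed"
```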
### 3.2 The Critical Line: Capturing Exit Code
**From run.sh template (lines 1148-1173):**
```bash
# Command with output capture
stdbuf ... {claude_cmd} 2>&1 | tee "{output_file}"
exit_code=${PIPESTATUS[0]}
# CRITICAL: Append exit code to log
echo "" >> "{output_file}"
echo "exit:$exit_code" >> "{output_file}"
# Notify completion
{notify_cmd}
```
**Why This Matters:**
- Job status determined by examining log file, not process exit
- Exit code persists in file even after process terminates
- Allows status queries without spawning process
- Enables automatic retry logic based on exit code
---
## Part 4: Handling Approval Prompts in Background
### 4.1 How Claude Code Approval Prompts Work
Claude Code tools can ask for permission before executing risky operations:
```
⚠️ This command has high privilege level. Approve?
[Y/n] _
```
In **interactive mode**: User can respond
In **background mode**: Command blocks indefinitely waiting for stdin
### 4.2 Luzia's Prevention Mechanism
**Three-layer approach:**
1. **CLI Flag**: `--permission-mode bypassPermissions`
- Tells Claude CLI to skip permission dialogs
- Requires `--dangerously-skip-permissions` flag
2. **Environment Setup**: User runs as project user, not root
- Limited scope prevents catastrophic damage
- Job runs in isolated directory
- File ownership is correct by default
3. **Process Isolation**: Agent runs detached
- Even if blocked, parent CLI returns immediately
- Job continues in background
- Can be monitored/killed separately
**Example: Safe Bash Execution**
```python
# This command would normally require approval in interactive mode
command = "rm -rf /opt/sensitive-data"
# But in agent context:
# 1. Agent running as limited user (not root)
# 2. Project path restricted (can't access /opt from project user)
# 3. Permission flags bypass confirmation dialog
# 4. Agent detached (blocking doesn't affect CLI)
# Result: Command executes without interactive prompt
```
---
## Part 5: Async Communication Patterns
### 5.1 File-Based Job Queue
**Implemented in:** `/opt/server-agents/orchestrator/lib/queue_controller.py`
**Pattern:**
```
User provides task → Enqueue to file-based queue → Status logged to disk
Load-aware scheduler polls queue
Task spawned as background agent
Agent writes to output.log
User queries status via filesystem
```
**Queue Structure:**
```
/var/lib/luzia/queue/
├── pending/
│ ├── high/
│ │ └── {priority}_{timestamp}_{project}_{task_id}.json
│ └── normal/
│ └── {priority}_{timestamp}_{project}_{task_id}.json
├── config.json
└── capacity.json
```
**Task File Format:**
```json
{
"id": "a1b2c3d4",
"project": "musica",
"priority": 5,
"prompt": "Run tests in /workspace/tests",
"skill_match": "test-runner",
"enqueued_at": "2026-01-09T15:30:45Z",
"enqueued_by": "admin",
"status": "pending"
}
```
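Enqueuing a task file in this layout can be sketched as below. Note this is an illustration: the high/normal split at priority >= 8 is an assumption, not Luzia's documented threshold.

```python
import json
import time
import uuid
from pathlib import Path

def enqueue_task(queue_root: Path, project: str, prompt: str,
                 priority: int = 5, enqueued_by: str = "admin") -> Path:
    """Write a task file into the pending queue, named per the scheme above."""
    lane = "high" if priority >= 8 else "normal"    # assumed threshold
    task_id = uuid.uuid4().hex[:8]
    name = f"{priority}_{int(time.time())}_{project}_{task_id}.json"
    path = queue_root / "pending" / lane / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({
        "id": task_id,
        "project": project,
        "priority": priority,
        "prompt": prompt,
        "enqueued_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "enqueued_by": enqueued_by,
        "status": "pending",
    }, indent=2))
    return path
```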
### 5.2 Notification Log Pattern
**Location:** `/var/log/luz-orchestrator/notifications.log`
**Pattern:**
```
[14:23:15] Agent 142315-a1b2 finished (exit 0)
[14:24:03] Agent 142403-c3d4 finished (exit 1)
[14:25:12] Agent 142512-e5f6 finished (exit 0)
```
**Usage:**
```bash
# A script can tail this file to await completion
# without polling job directories; -m1 makes grep
# exit on the first match instead of blocking forever
tail -f /var/log/luz-orchestrator/notifications.log | \
  grep -m1 "Agent {job_id}"
```
### 5.3 Job Directory as IPC Channel
**Location:** `/var/log/luz-orchestrator/jobs/{job_id}/`
**Files Used for Communication:**
| File | Purpose | Direction |
|------|---------|-----------|
| `prompt.txt` | Task definition & context | Input (before agent starts) |
| `output.log` | Agent's stdout/stderr + exit code | Output (written during execution) |
| `meta.json` | Job metadata & status | Both (initial + final) |
| `clarification.json` | Awaiting user input (pattern) | Bidirectional |
| `run.sh` | Execution script | Input |
| `pid` | Process ID | Output |
**Example: Monitoring Job Completion**
```bash
#!/bin/bash
job_id="142315-a1b2"
job_dir="/var/log/luz-orchestrator/jobs/$job_id"
# Poll for completion
while true; do
if grep -q "^exit:" "$job_dir/output.log"; then
exit_code=$(grep "^exit:" "$job_dir/output.log" | tail -1 | cut -d: -f2)
echo "Job completed with exit code: $exit_code"
break
fi
sleep 1
done
```
---
## Part 6: Prompt Patterns for Agent Autonomy
### 6.1 The Ideal Autonomous Agent Prompt
**Pattern:**
```
1. Identity & Context
- What role is the agent playing?
- What project/domain?
2. Task Specification
- What needs to be done?
- What are success criteria?
- What are the constraints?
3. Execution Environment
- What tools are available?
- What directories can be accessed?
- What permissions are granted?
4. Decision Autonomy
- What decisions can the agent make alone?
- When should it ask for clarification? (ideally: never)
- What should it do if ambiguous?
5. Communication
- Where should results be written?
- What format (JSON, text, files)?
- When should it report progress?
6. Failure Handling
- What to do if task fails?
- Should it retry? How many times?
- What exit codes to use?
```
### 6.2 Good vs Bad Prompts for Autonomy
**BAD - Requires Clarification:**
```
"Help me improve the code"
- Ambiguous: which files? What metrics?
- No success criteria
- Agent likely to ask questions
"Fix the bug"
- Which bug? What symptoms?
- Agent needs to investigate then ask
```
**GOOD - Autonomous:**
```
"Run tests in /workspace/tests and report:
- Total test count
- Passed count
- Failed count
- Exit code (0 if all pass, 1 if any fail)"
"Analyze src/index.ts for:
- Lines of code
- Number of functions
- Max function complexity
- Save results to analysis.json"
```
### 6.3 Prompt Template for Autonomous Agents
**Used in Luzia:** `/opt/server-agents/orchestrator/bin/luzia` (lines 1053-1079)
```python
prompt_template = """You are a project agent working on the **{project}** project.
{context}
## Your Task
{task}
## Execution Environment
- You are running as user: {run_as_user}
- You are running directly in the project directory: {project_path}
- You have FULL permission to read, write, and execute files in this directory
- Use standard Claude tools (Read, Write, Edit, Bash) directly
- All file operations are pre-authorized - proceed without asking for permission
## Knowledge Graph - IMPORTANT
Use the **shared/global knowledge graph** for storing knowledge:
- Use `mcp__shared-projects-memory__store_fact` to store facts
- Use `mcp__shared-projects-memory__query_relations` to query
- Use `mcp__shared-projects-memory__search_context` to search
## Guidelines
- Complete the task autonomously
- If you encounter errors, debug and fix them
- Store important findings in the shared knowledge graph
- Provide a summary of what was done when complete
"""
```
**Key Autonomy Features:**
- No "ask for help" - pre-authorization is explicit
- Clear environment details - no guessing about paths/permissions
- Knowledge graph integration - preserve learnings across runs
- Exit code expectations - clear success/failure criteria
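Filling the template is plain string formatting; a shortened sketch with illustrative values:

```python
# Abbreviated version of the Luzia prompt template shown above
template = """You are a project agent working on the **{project}** project.

## Your Task
{task}

## Execution Environment
- You are running as user: {run_as_user}
- You are running directly in the project directory: {project_path}
"""

prompt = template.format(
    project="musica",
    task="Run tests in /workspace/tests and report pass/fail counts",
    run_as_user="musica",
    project_path="/workspace",
)
```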
---
## Part 7: Pattern Summary - Building Autonomous Agents
### 7.1 The Five Patterns
| Pattern | When to Use | Implementation |
|---------|------------|---|
| **Detached Spawning** | Background tasks that shouldn't block CLI | `nohup ... &` with PID tracking |
| **Permission Bypass** | Autonomous execution without prompts | `--permission-mode bypassPermissions` |
| **File-Based IPC** | Async communication with agents | Job directory as channel |
| **Exit Code Signaling** | Status determination without polling | Append "exit:{code}" to output |
| **Context-First Prompts** | Avoid clarification questions | Detailed spec + success criteria |
### 7.2 Comparison: Interactive vs Autonomous Patterns
| Aspect | Interactive Agent | Autonomous Agent |
|--------|---|---|
| **Execution** | Runs in foreground, blocks | Detached process, returns immediately |
| **Prompts** | Can use `AskUserQuestion` | Must provide all context upfront |
| **Approval** | Can request tool permission | Uses `--permission-mode bypassPermissions` |
| **I/O** | stdin/stdout with user | Files, logs, notification channels |
| **Failure** | User responds to errors | Agent handles/reports via exit code |
| **Monitoring** | User watches output | Query filesystem for status |
### 7.3 When to Use Each Pattern
**Use Interactive Agents When:**
- User is present and waiting
- Task requires user input/decisions
- Working in development/exploration mode
- Real-time feedback is valuable
**Use Autonomous Agents When:**
- Running background maintenance tasks
- Multiple parallel operations needed
- No user available to respond to prompts
- Results needed asynchronously
---
## Part 8: Real Implementation Examples
### 8.1 Example: Running Tests Autonomously
**Task:**
```
Run pytest in /workspace/tests and report results as JSON
```
**Luzia Command:**
```bash
luzia musica "Run pytest in /workspace/tests and save results to tests.json with {passed: int, failed: int, errors: int}"
```
**What Happens:**
1. Job directory created with UUID
2. Prompt written with full context
3. Script prepared with environment setup
4. Launched via nohup
5. Immediately returns job_id to user
6. Agent runs in background:
```bash
cd /workspace
   pytest tests/ --json-report --json-report-file=tests.json  # needs the pytest-json-report plugin
# Results saved to file
```
7. Exit code captured (0 if all pass, 1 if failures)
8. Output logged to output.log
9. Completion notification sent
**User Monitor:**
```bash
luzia jobs {job_id}
# Status: running/completed/failed
# Exit code: 0/1
# Output preview: last 10 lines
```
### 8.2 Example: Code Analysis Autonomously
**Task:**
```
Analyze the codebase structure and save metrics to analysis.json
```
**Agent Does (no prompts needed):**
1. Reads prompt from job directory
2. Scans project structure
3. Collects metrics (LOC, functions, classes, complexity)
4. Writes results to analysis.json
5. Stores findings in knowledge graph
6. Exits with 0
**Success Criteria (in prompt):**
```
Results saved to analysis.json with:
- total_files: int
- total_lines: int
- total_functions: int
- total_classes: int
- average_complexity: float
```
---
## Part 9: Best Practices
### 9.1 Prompt Design for Autonomy
1. **Be Specific**
- What files? What directories?
- What exact metrics/outputs?
- What format (JSON, CSV, text)?
2. **Provide Success Criteria**
- What makes this task complete?
- What should the output look like?
- What exit code for success/failure?
3. **Include Error Handling**
- What if file doesn't exist?
- What if command fails?
- Should it retry or report and exit?
4. **Minimize Ambiguity**
- Don't say "improve code quality" - say what to measure
- Don't say "fix bugs" - specify which bugs or how to find them
- Don't say "optimize" - specify what metric to optimize
### 9.2 Environment Setup for Autonomy
1. **Pre-authorize Everything**
- Set correct user/group
- Ensure file permissions allow operations
- Document what's accessible
2. **Provide Full Context**
- Include CLAUDE.md or similar
- Document project structure
- Explain architectural decisions
3. **Set Clear Boundaries**
- Which directories can be modified?
- Which operations are allowed?
- What can't be changed?
### 9.3 Failure Recovery for Autonomy
1. **Use Exit Codes Meaningfully**
- 0 = success
- 1 = recoverable failure
- 2 = unrecoverable failure
- -9 = killed/timeout
2. **Log Failures Comprehensively**
- What was attempted?
- What failed and why?
- What was tried to recover?
3. **Enable Automatic Retry**
- Retry on exit code 1 (optional)
- Don't retry on exit code 2 (unrecoverable)
- Track retry count to prevent infinite loops
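These three rules condense into a small decision function (a sketch of the policy, not orchestrator code):

```python
def should_retry(exit_code: int, attempts: int, max_attempts: int = 3) -> bool:
    """Retry policy from the exit-code convention above.

    0  -> success, nothing to retry
    1  -> recoverable failure, retry until max_attempts
    2  -> unrecoverable failure, never retry
    -9 -> killed/timeout; treated as non-retryable here (an assumption;
          a scheduler might choose to re-queue these instead)
    """
    if attempts >= max_attempts:    # prevent infinite retry loops
        return False
    return exit_code == 1
```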
---
## Part 10: Advanced Patterns
### 10.1 Multi-Phase Autonomous Tasks
For complex tasks requiring multiple steps:
```python
# Phase 1: Context gathering
# Phase 2: Analysis
# Phase 3: Implementation
# Phase 4: Verification
# Phase 5: Reporting
# All phases defined upfront in prompt
# Agent proceeds through all without asking
# Exit code reflects overall success
```
### 10.2 Knowledge Graph Integration
Agents store findings persistently:
```python
# From agent code (illustrative — MCP tools are invoked as tool calls
# by the agent runtime, not imported as Python modules):
# After analysis, store for future agents
store_fact(
entity_source_name="musica-project",
relation="has_complexity_metrics",
entity_target_name="analysis-2026-01-09",
context={
"avg_complexity": 3.2,
"hotspots": ["index.ts", "processor.ts"]
}
)
```
### 10.3 Cross-Agent Coordination
Use shared state file for coordination:
```python
# /opt/server-agents/state/cross-agent-todos.json
# Agents read/update this to coordinate
{
"current_tasks": [
{
"id": "analyze-musica",
"project": "musica",
"status": "in_progress",
"assigned_to": "agent-a1b2",
"started": "2026-01-09T14:23:00Z"
}
],
"completed_archive": { ... }
}
```
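Claiming a task from this shared file can be sketched as below. Two assumptions are baked in: unclaimed tasks carry a `"pending"` status (the format above only shows `"in_progress"`), and the write goes through a temp file plus `os.replace` so concurrent readers never see half-written JSON:

```python
import json
import os
from pathlib import Path

def claim_task(state_file: Path, task_id: str, agent_id: str) -> bool:
    """Mark a shared todo as in_progress for this agent; False if taken."""
    state = json.loads(state_file.read_text())
    for task in state["current_tasks"]:
        if task["id"] == task_id and task.get("status") == "pending":
            task["status"] = "in_progress"
            task["assigned_to"] = agent_id
            tmp = state_file.with_suffix(".tmp")
            tmp.write_text(json.dumps(state, indent=2))
            os.replace(tmp, state_file)     # atomic rename on POSIX
            return True
    return False                            # already claimed or unknown id
```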
---
## Part 11: Failure Cases and Solutions
### 11.1 Common Blocking Issues
| Issue | Cause | Solution |
|-------|-------|----------|
| Agent pauses on permission prompt | Tool permission check enabled | Use `--permission-mode bypassPermissions` |
| Agent blocks on AskUserQuestion | Prompt causes clarification needed | Redesign prompt with full context |
| stdin unavailable | Agent backgrounded | Use file-based IPC for input |
| Exit code not recorded | Script exits before writing exit code | Ensure "exit:{code}" in output.log |
| Job marked "running" forever | Process dies but exit code not appended | Use `tee` and explicit exit code capture |
### 11.2 Debugging Blocking Agents
```bash
# Check if agent is actually running
ps aux | grep {job_id}
# Check job output (should show what it's doing)
tail -f /var/log/luz-orchestrator/jobs/{job_id}/output.log
# Check if waiting on stdin
strace -p {pid} 2>&1 | grep read   # strace writes to stderr, so redirect before grep
# Look for approval prompts in output
grep -i "approve\|confirm\|permission" /var/log/luz-orchestrator/jobs/{job_id}/output.log
# Check if exit code was written
tail -5 /var/log/luz-orchestrator/jobs/{job_id}/output.log
```
---
## Part 12: Conclusion - Key Takeaways
### 12.1 The Core Principle
**Autonomous agents don't ask for input because they don't need to.**
Rather than implementing complex async prompting, the better approach is:
1. **Specify tasks completely** - No ambiguity
2. **Provide full context** - No guessing required
3. **Set clear boundaries** - Know what's allowed
4. **Detach execution** - Run independent of CLI
5. **Capture results** - File-based communication
### 12.2 Implementation Checklist
- [ ] Prompt includes all necessary context
- [ ] Task has clear success criteria
- [ ] Environment fully described (user, directory, permissions)
- [ ] No ambiguous language in prompt
- [ ] Exit codes defined (0=success, 1=failure, 2=error)
- [ ] Output format specified (JSON, text, files)
- [ ] Job runs as appropriate user
- [ ] Results captured to files/logs
- [ ] Notification system tracks completion
- [ ] Status queryable without blocking
### 12.3 When Blocks Still Occur
1. **Rare**: Well-designed prompts rarely need clarification
2. **Detectable**: Agent exits with code 1 and logs to output.log
3. **Recoverable**: Can retry or modify task and re-queue
4. **Monitorable**: Parent CLI never blocks, can watch from elsewhere
---
## Appendix A: Key Code Locations
| Location | Purpose |
|----------|---------|
| `/opt/server-agents/orchestrator/bin/luzia` (lines 1012-1200) | `spawn_claude_agent()` - Core autonomous agent spawning |
| `/opt/server-agents/orchestrator/lib/docker_bridge.py` | Container isolation for project agents |
| `/opt/server-agents/orchestrator/lib/queue_controller.py` | File-based task queue with load awareness |
| `/var/log/luz-orchestrator/jobs/` | Job directory structure and IPC |
| `/opt/server-agents/orchestrator/CLAUDE.md` | Embedded instructions for agents |
---
## Appendix B: Environment Variables
Agents have these set automatically:
```bash
TMPDIR="/home/{user}/.tmp" # Prevent /tmp collisions
TEMP="/home/{user}/.tmp" # Same
TMP="/home/{user}/.tmp" # Same
HOME="/home/{user}" # User's home directory
PWD="/path/to/project" # Working directory
```
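Constructing that environment for the spawned script might look like this (sketch):

```python
import os

def agent_env(user: str, project_path: str) -> dict:
    """Environment for the job's run.sh, mirroring the variables above."""
    home = f"/home/{user}"
    tmp_dir = f"{home}/.tmp"
    env = dict(os.environ)          # inherit, then override
    env.update({
        "HOME": home,
        "TMPDIR": tmp_dir,          # keep temp files out of shared /tmp
        "TEMP": tmp_dir,
        "TMP": tmp_dir,
        "PWD": project_path,        # working directory
    })
    return env
```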
---
## Appendix C: File Formats Reference
### Job Directory Files
**meta.json:**
```json
{
"id": "142315-a1b2",
"project": "musica",
"task": "Run tests and report results",
"type": "agent",
"user": "musica",
"pid": "12847",
"started": "2026-01-09T14:23:15Z",
"status": "running",
"debug": false
}
```
**output.log:**
```
[14:23:15] Starting agent...
[14:23:16] Reading prompt from file
[14:23:17] Executing task...
[14:23:18] Running tests...
PASSED: test_1
PASSED: test_2
...
[14:23:25] Task complete
exit:0
```
---
## References
- **Luzia CLI**: `/opt/server-agents/orchestrator/bin/luzia`
- **Agent SDK**: Claude Agent SDK (Anthropic)
- **Docker Bridge**: Container isolation for agent execution
- **Queue Controller**: File-based task queue implementation
- **Bot Orchestration Protocol**: `/opt/server-agents/BOT-ORCHESTRATION-PROTOCOL.md`