Based on claude-code-tools TmuxCLIController, this refactor: - Added DockerTmuxController class for robust tmux session management - Implements send_keys() with configurable delay_enter - Implements capture_pane() for output retrieval - Implements wait_for_prompt() for pattern-based completion detection - Implements wait_for_idle() for content-hash-based idle detection - Implements wait_for_shell_prompt() for shell prompt detection Also includes workflow improvements: - Pre-task git snapshot before agent execution - Post-task commit protocol in agent guidelines Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
882 lines
24 KiB
Markdown
882 lines
24 KiB
Markdown
# Luzia Agent Autonomy Research
|
|
## Interactive Prompts and Autonomous Agent Patterns
|
|
|
|
**Date:** 2026-01-09
|
|
**Version:** 1.0
|
|
**Status:** Complete
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
This research documents how **Luzia** and the **Claude Agent SDK** enable autonomous agents to handle interactive scenarios without blocking. The key insight is that **blocking is prevented through architectural choices, not technical tricks**:
|
|
|
|
1. **Detached Execution** - Agents run in background processes, not waiting for input
|
|
2. **Non-Interactive Mode** - Permission mode set to `bypassPermissions` to avoid approval dialogs
|
|
3. **Async Communication** - Results delivered via files and notification logs, not stdin/stdout
|
|
4. **Failure Recovery** - Exit codes captured for retry logic without agent restart
|
|
5. **Context-First Design** - All necessary context provided upfront in prompts
|
|
|
|
---
|
|
|
|
## Part 1: How Luzia Prevents Agent Blocking
|
|
|
|
### 1.1 The Core Pattern: Detached Spawning
|
|
|
|
**File:** `/opt/server-agents/orchestrator/bin/luzia` (lines 1012-1200)
|
|
|
|
```bash
|
|
# Agents run detached with nohup, not waiting for completion
|
|
os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
|
|
```
|
|
|
|
**Key Design Decisions:**
|
|
|
|
| Aspect | Implementation | Why It Works |
|
|
|--------|---|---|
|
|
| **Process Isolation** | `nohup ... &` spawns detached process | Parent doesn't block; agent runs independently |
|
|
| **Permission Mode** | `--permission-mode bypassPermissions` | No permission dialogs to pause agent |
|
|
| **PID Tracking** | Job directory captures PID at startup | Can monitor/kill if needed without blocking |
|
|
| **Output Capture** | `tee` pipes output to log file | Results captured even if agent backgrounded |
|
|
| **Status Tracking** | Exit code appended to output.log | Job status determined post-execution |
|
|
|
|
### 1.2 The Full Agent Spawn Flow
|
|
|
|
**Complete lifecycle (simplified):**
|
|
|
|
```
|
|
1. spawn_claude_agent() called
|
|
↓
|
|
2. Job directory created: /var/log/luz-orchestrator/jobs/{job_id}/
|
|
├── prompt.txt (full context + task)
|
|
├── run.sh (executable shell script)
|
|
├── meta.json (job metadata)
|
|
└── output.log (will capture all output)
|
|
↓
|
|
3. Script written with all environment setup
|
|
├── TMPDIR set to user's home (prevent /tmp collisions)
|
|
├── HOME set to project user
|
|
├── Current directory: project path
|
|
└── Claude CLI invoked with full prompt
|
|
↓
|
|
4. Execution via nohup (detached)
|
|
os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
|
|
↓
|
|
5. Control returns immediately to CLI
|
|
↓
|
|
6. Agent continues in background:
|
|
├── Reads prompt from file
|
|
├── Executes task (reads/writes files)
|
|
├── All output captured to output.log
|
|
├── Exit code captured: "exit:{code}"
|
|
└── Completion logged to notifications.log
|
|
```
|
|
|
|
### 1.3 Permission Bypass Strategy
|
|
|
|
**Critical Flag:** `--permission-mode bypassPermissions`
|
|
|
|
```python
|
|
# From spawn_claude_agent()
|
|
claude_cmd = f'claude --dangerously-skip-permissions --permission-mode bypassPermissions ...'
|
|
```
|
|
|
|
**Why This Works:**
|
|
- **No User Prompts**: Claude doesn't ask for approval on tool use
|
|
- **Full Autonomy**: Agent makes all decisions without waiting
|
|
- **Pre-Authorization**: All permissions granted upfront in job spawning
|
|
- **Isolation**: Each agent runs as project user in their own space
|
|
|
|
**When This is Safe:**
|
|
- All agents have limited scope (project directory)
|
|
- Running as restricted user (not root)
|
|
- Task fully specified in prompt (no ambiguity)
|
|
- Agent context includes execution environment details
|
|
|
|
---
|
|
|
|
## Part 2: Handling Clarification Without Blocking
|
|
|
|
### 2.1 The AskUserQuestion Problem
|
|
|
|
When agents need clarification, Claude's `AskUserQuestion` tool blocks the agent process waiting for stdin input. For background agents, this is problematic.
|
|
|
|
**Solutions in Luzia:**
|
|
|
|
#### Solution 1: Context-First Design
|
|
Provide all necessary context upfront so agents rarely need to ask:
|
|
|
|
```python
|
|
prompt = f"""
|
|
You are a project agent working on the **{project}** project.
|
|
|
|
## Your Task
|
|
{task}
|
|
|
|
## Execution Environment
|
|
- You are running as user: {run_as_user}
|
|
- Working directory: {project_path}
|
|
- All file operations are pre-authorized
|
|
- Complete the task autonomously
|
|
|
|
## Guidelines
|
|
- Complete the task autonomously
|
|
- If you encounter errors, debug and fix them
|
|
- Store important findings in the shared knowledge graph
|
|
"""
|
|
```
|
|
|
|
#### Solution 2: Structured Task Format
|
|
Use specific, unambiguous task descriptions:
|
|
|
|
```
|
|
GOOD: "Run tests in /workspace/tests and report pass/fail count"
|
|
BAD: "Fix the test suite" (unclear what 'fix' means)
|
|
|
|
GOOD: "Analyze src/index.ts for complexity metrics"
|
|
BAD: "Improve code quality" (needs clarification on what to improve)
|
|
```
|
|
|
|
#### Solution 3: Async Fallback Mechanism
|
|
If clarification is truly needed, agents can:
|
|
|
|
1. **Create a hold file** in job directory
|
|
2. **Log the question** to a status file
|
|
3. **Return exit code 1** (needs input)
|
|
4. **Await resolution** via file modification
|
|
|
|
```python
|
|
# Example pattern for agent code:
|
|
# (Not yet implemented in Luzia, but pattern is documented)
|
|
|
|
import json
|
|
from pathlib import Path
|
|
|
|
job_dir = Path("/var/log/luz-orchestrator/jobs/{job_id}")
|
|
|
|
# Agent encounters ambiguity
|
|
question = "Should I update production database or staging?"
|
|
|
|
# Write question to file
|
|
clarification = {
|
|
"status": "awaiting_clarification",
|
|
"question": question,
|
|
"options": ["production", "staging"],
|
|
"agent_paused_at": datetime.now().isoformat()
|
|
}
|
|
(job_dir / "clarification.json").write_text(json.dumps(clarification))
|
|
|
|
# Exit with code 1 to signal "needs input"
|
|
exit(1)
|
|
```
|
|
|
|
Then externally:
|
|
```bash
|
|
# Operator provides input
|
|
echo '{"choice": "staging"}' > /var/log/luz-orchestrator/jobs/{job_id}/clarification.json
|
|
|
|
# Restart agent (automatic retry system)
|
|
luzia retry {job_id}
|
|
```
|
|
|
|
### 2.2 Why AskUserQuestion Doesn't Work for Async Agents
|
|
|
|
| Scenario | Issue | Solution |
|
|
|----------|-------|----------|
|
|
| User runs `luzia project task` | User might close terminal | Store prompt in file, not stdin |
|
|
| Agent backgrounded | stdin not available | No interactive input possible |
|
|
| Multiple agents running | stdin interference | Use file-based IPC instead |
|
|
| Agent on remote machine | stdin tunneling complex | All I/O via files or HTTP |
|
|
|
|
---
|
|
|
|
## Part 3: Job State Machine and Exit Codes
|
|
|
|
### 3.1 Job Lifecycle States
|
|
|
|
**Defined in:** `/opt/server-agents/orchestrator/bin/luzia` (lines 607-646)
|
|
|
|
```python
|
|
def _get_actual_job_status(job_dir: Path) -> str:
|
|
"""Determine actual job status by checking output.log"""
|
|
|
|
# Status values:
|
|
# - "running" (process still active)
|
|
# - "completed" (exit:0)
|
|
# - "failed" (exit:non-zero)
|
|
# - "killed" (exit:-9)
|
|
# - "unknown" (no status info)
|
|
```
|
|
|
|
**State Transitions:**
|
|
|
|
```
|
|
Job Created
|
|
↓
|
|
[meta.json: status="running", output.log: empty]
|
|
↓
|
|
Agent Executes (captured in output.log)
|
|
↓
|
|
Agent Completes/Exits
|
|
↓
|
|
output.log appended with "exit:{code}"
|
|
↓
|
|
Status determined:
|
|
├─ exit:0 → "completed"
|
|
├─ exit:non-0 → "failed"
|
|
├─ exit:-9 → "killed"
|
|
└─ no exit → "running" (still active)
|
|
```
|
|
|
|
### 3.2 The Critical Line: Capturing Exit Code
|
|
|
|
**From run.sh template (lines 1148-1173):**
|
|
|
|
```bash
|
|
# Command with output capture
|
|
stdbuf ... {claude_cmd} 2>&1 | tee "{output_file}"
|
|
exit_code=${PIPESTATUS[0]}
|
|
|
|
# CRITICAL: Append exit code to log
|
|
echo "" >> "{output_file}"
|
|
echo "exit:$exit_code" >> "{output_file}"
|
|
|
|
# Notify completion
|
|
{notify_cmd}
|
|
```
|
|
|
|
**Why This Matters:**
|
|
- Job status determined by examining log file, not process exit
|
|
- Exit code persists in file even after process terminates
|
|
- Allows status queries without spawning process
|
|
- Enables automatic retry logic based on exit code
|
|
|
|
---
|
|
|
|
## Part 4: Handling Approval Prompts in Background
|
|
|
|
### 4.1 How Claude Code Approval Prompts Work
|
|
|
|
Claude Code tools can ask for permission before executing risky operations:
|
|
|
|
```
|
|
⚠️ This command has high privilege level. Approve?
|
|
[Y/n] _
|
|
```
|
|
|
|
In **interactive mode**: User can respond
|
|
In **background mode**: Command blocks indefinitely waiting for stdin
|
|
|
|
### 4.2 Luzia's Prevention Mechanism
|
|
|
|
**Three-layer approach:**
|
|
|
|
1. **CLI Flag**: `--permission-mode bypassPermissions`
|
|
- Tells Claude CLI to skip permission dialogs
|
|
- Requires `--dangerously-skip-permissions` flag
|
|
|
|
2. **Environment Setup**: User runs as project user, not root
|
|
- Limited scope prevents catastrophic damage
|
|
- Job runs in isolated directory
|
|
- File ownership is correct by default
|
|
|
|
3. **Process Isolation**: Agent runs detached
|
|
- Even if blocked, parent CLI returns immediately
|
|
- Job continues in background
|
|
- Can be monitored/killed separately
|
|
|
|
**Example: Safe Bash Execution**
|
|
|
|
```python
|
|
# This command would normally require approval in interactive mode
|
|
command = "rm -rf /opt/sensitive-data"
|
|
|
|
# But in agent context:
|
|
# 1. Agent running as limited user (not root)
|
|
# 2. Project path restricted (can't access /opt from project user)
|
|
# 3. Permission flags bypass confirmation dialog
|
|
# 4. Agent detached (blocking doesn't affect CLI)
|
|
|
|
# Result: Command executes without interactive prompt
|
|
```
|
|
|
|
---
|
|
|
|
## Part 5: Async Communication Patterns
|
|
|
|
### 5.1 File-Based Job Queue
|
|
|
|
**Implemented in:** `/opt/server-agents/orchestrator/lib/queue_controller.py`
|
|
|
|
**Pattern:**
|
|
|
|
```
|
|
User provides task → Enqueue to file-based queue → Status logged to disk
|
|
↓
|
|
Load-aware scheduler polls queue
|
|
↓
|
|
Task spawned as background agent
|
|
↓
|
|
Agent writes to output.log
|
|
↓
|
|
User queries status via filesystem
|
|
```
|
|
|
|
**Queue Structure:**
|
|
|
|
```
|
|
/var/lib/luzia/queue/
|
|
├── pending/
|
|
│ ├── high/
|
|
│ │ └── {priority}_{timestamp}_{project}_{task_id}.json
|
|
│ └── normal/
|
|
│ └── {priority}_{timestamp}_{project}_{task_id}.json
|
|
├── config.json
|
|
└── capacity.json
|
|
```
|
|
|
|
**Task File Format:**
|
|
|
|
```json
|
|
{
|
|
"id": "a1b2c3d4",
|
|
"project": "musica",
|
|
"priority": 5,
|
|
"prompt": "Run tests in /workspace/tests",
|
|
"skill_match": "test-runner",
|
|
"enqueued_at": "2026-01-09T15:30:45Z",
|
|
"enqueued_by": "admin",
|
|
"status": "pending"
|
|
}
|
|
```
|
|
|
|
### 5.2 Notification Log Pattern
|
|
|
|
**Location:** `/var/log/luz-orchestrator/notifications.log`
|
|
|
|
**Pattern:**
|
|
|
|
```
|
|
[14:23:15] Agent 142315-a1b2 finished (exit 0)
|
|
[14:24:03] Agent 142403-c3d4 finished (exit 1)
|
|
[14:25:12] Agent 142512-e5f6 finished (exit 0)
|
|
```
|
|
|
|
**Usage:**
|
|
|
|
```python
|
|
# Script can tail this file to await completion
|
|
# without polling job directories
|
|
tail -f /var/log/luz-orchestrator/notifications.log | \
|
|
grep "Agent {job_id}"
|
|
```
|
|
|
|
### 5.3 Job Directory as IPC Channel
|
|
|
|
**Location:** `/var/log/luz-orchestrator/jobs/{job_id}/`
|
|
|
|
**Files Used for Communication:**
|
|
|
|
| File | Purpose | Direction |
|
|
|------|---------|-----------|
|
|
| `prompt.txt` | Task definition & context | Input (before agent starts) |
|
|
| `output.log` | Agent's stdout/stderr + exit code | Output (written during execution) |
|
|
| `meta.json` | Job metadata & status | Both (initial + final) |
|
|
| `clarification.json` | Awaiting user input (pattern) | Bidirectional |
|
|
| `run.sh` | Execution script | Input |
|
|
| `pid` | Process ID | Output |
|
|
|
|
**Example: Monitoring Job Completion**
|
|
|
|
```bash
|
|
#!/bin/bash
|
|
job_id="142315-a1b2"
|
|
job_dir="/var/log/luz-orchestrator/jobs/$job_id"
|
|
|
|
# Poll for completion
|
|
while true; do
|
|
if grep -q "^exit:" "$job_dir/output.log"; then
|
|
exit_code=$(grep "^exit:" "$job_dir/output.log" | tail -1 | cut -d: -f2)
|
|
echo "Job completed with exit code: $exit_code"
|
|
break
|
|
fi
|
|
sleep 1
|
|
done
|
|
```
|
|
|
|
---
|
|
|
|
## Part 6: Prompt Patterns for Agent Autonomy
|
|
|
|
### 6.1 The Ideal Autonomous Agent Prompt
|
|
|
|
**Pattern:**
|
|
|
|
```
|
|
1. Identity & Context
|
|
- What role is the agent playing?
|
|
- What project/domain?
|
|
|
|
2. Task Specification
|
|
- What needs to be done?
|
|
- What are success criteria?
|
|
- What are the constraints?
|
|
|
|
3. Execution Environment
|
|
- What tools are available?
|
|
- What directories can be accessed?
|
|
- What permissions are granted?
|
|
|
|
4. Decision Autonomy
|
|
- What decisions can the agent make alone?
|
|
- When should it ask for clarification? (ideally: never)
|
|
- What should it do if ambiguous?
|
|
|
|
5. Communication
|
|
- Where should results be written?
|
|
- What format (JSON, text, files)?
|
|
- When should it report progress?
|
|
|
|
6. Failure Handling
|
|
- What to do if task fails?
|
|
- Should it retry? How many times?
|
|
- What exit codes to use?
|
|
```
|
|
|
|
### 6.2 Good vs Bad Prompts for Autonomy
|
|
|
|
**BAD - Requires Clarification:**
|
|
```
|
|
"Help me improve the code"
|
|
- Ambiguous: which files? What metrics?
|
|
- No success criteria
|
|
- Agent likely to ask questions
|
|
|
|
"Fix the bug"
|
|
- Which bug? What symptoms?
|
|
- Agent needs to investigate then ask
|
|
```
|
|
|
|
**GOOD - Autonomous:**
|
|
```
|
|
"Run tests in /workspace/tests and report:
|
|
- Total test count
|
|
- Passed count
|
|
- Failed count
|
|
- Exit code (0 if all pass, 1 if any fail)"
|
|
|
|
"Analyze src/index.ts for:
|
|
- Lines of code
|
|
- Number of functions
|
|
- Max function complexity
|
|
- Save results to analysis.json"
|
|
```
|
|
|
|
### 6.3 Prompt Template for Autonomous Agents
|
|
|
|
**Used in Luzia:** `/opt/server-agents/orchestrator/bin/luzia` (lines 1053-1079)
|
|
|
|
```python
|
|
prompt_template = """You are a project agent working on the **{project}** project.
|
|
|
|
{context}
|
|
|
|
## Your Task
|
|
{task}
|
|
|
|
## Execution Environment
|
|
- You are running as user: {run_as_user}
|
|
- You are running directly in the project directory: {project_path}
|
|
- You have FULL permission to read, write, and execute files in this directory
|
|
- Use standard Claude tools (Read, Write, Edit, Bash) directly
|
|
- All file operations are pre-authorized - proceed without asking for permission
|
|
|
|
## Knowledge Graph - IMPORTANT
|
|
Use the **shared/global knowledge graph** for storing knowledge:
|
|
- Use `mcp__shared-projects-memory__store_fact` to store facts
|
|
- Use `mcp__shared-projects-memory__query_relations` to query
|
|
- Use `mcp__shared-projects-memory__search_context` to search
|
|
|
|
## Guidelines
|
|
- Complete the task autonomously
|
|
- If you encounter errors, debug and fix them
|
|
- Store important findings in the shared knowledge graph
|
|
- Provide a summary of what was done when complete
|
|
"""
|
|
```
|
|
|
|
**Key Autonomy Features:**
|
|
- No "ask for help" - pre-authorization is explicit
|
|
- Clear environment details - no guessing about paths/permissions
|
|
- Knowledge graph integration - preserve learnings across runs
|
|
- Exit code expectations - clear success/failure criteria
|
|
|
|
---
|
|
|
|
## Part 7: Pattern Summary - Building Autonomous Agents
|
|
|
|
### 7.1 The Five Patterns
|
|
|
|
| Pattern | When to Use | Implementation |
|
|
|---------|------------|---|
|
|
| **Detached Spawning** | Background tasks that shouldn't block CLI | `nohup ... &` with PID tracking |
|
|
| **Permission Bypass** | Autonomous execution without prompts | `--permission-mode bypassPermissions` |
|
|
| **File-Based IPC** | Async communication with agents | Job directory as channel |
|
|
| **Exit Code Signaling** | Status determination without polling | Append "exit:{code}" to output |
|
|
| **Context-First Prompts** | Avoid clarification questions | Detailed spec + success criteria |
|
|
|
|
### 7.2 Comparison: Interactive vs Autonomous Patterns
|
|
|
|
| Aspect | Interactive Agent | Autonomous Agent |
|
|
|--------|---|---|
|
|
| **Execution** | Runs in foreground, blocks | Detached process, returns immediately |
|
|
| **Prompts** | Can use `AskUserQuestion` | Must provide all context upfront |
|
|
| **Approval** | Can request tool permission | Uses `--permission-mode bypassPermissions` |
|
|
| **I/O** | stdin/stdout with user | Files, logs, notification channels |
|
|
| **Failure** | User responds to errors | Agent handles/reports via exit code |
|
|
| **Monitoring** | User watches output | Query filesystem for status |
|
|
|
|
### 7.3 When to Use Each Pattern
|
|
|
|
**Use Interactive Agents When:**
|
|
- User is present and waiting
|
|
- Task requires user input/decisions
|
|
- Working in development/exploration mode
|
|
- Real-time feedback is valuable
|
|
|
|
**Use Autonomous Agents When:**
|
|
- Running background maintenance tasks
|
|
- Multiple parallel operations needed
|
|
- No user available to respond to prompts
|
|
- Results needed asynchronously
|
|
|
|
---
|
|
|
|
## Part 8: Real Implementation Examples
|
|
|
|
### 8.1 Example: Running Tests Autonomously
|
|
|
|
**Task:**
|
|
```
|
|
Run pytest in /workspace/tests and report results as JSON
|
|
```
|
|
|
|
**Luzia Command:**
|
|
```bash
|
|
luzia musica "Run pytest in /workspace/tests and save results to tests.json with {passed: int, failed: int, errors: int}"
|
|
```
|
|
|
|
**What Happens:**
|
|
|
|
1. Job directory created with UUID
|
|
2. Prompt written with full context
|
|
3. Script prepared with environment setup
|
|
4. Launched via nohup
|
|
5. Immediately returns job_id to user
|
|
6. Agent runs in background:
|
|
```bash
|
|
cd /workspace
|
|
pytest tests/ --json > tests.json
|
|
# Results saved to file
|
|
```
|
|
7. Exit code captured (0 if all pass, 1 if failures)
|
|
8. Output logged to output.log
|
|
9. Completion notification sent
|
|
|
|
**User Monitor:**
|
|
```bash
|
|
luzia jobs {job_id}
|
|
# Status: running/completed/failed
|
|
# Exit code: 0/1
|
|
# Output preview: last 10 lines
|
|
```
|
|
|
|
### 8.2 Example: Code Analysis Autonomously
|
|
|
|
**Task:**
|
|
```
|
|
Analyze the codebase structure and save metrics to analysis.json
|
|
```
|
|
|
|
**Agent Does (no prompts needed):**
|
|
|
|
1. Reads prompt from job directory
|
|
2. Scans project structure
|
|
3. Collects metrics (LOC, functions, classes, complexity)
|
|
4. Writes results to analysis.json
|
|
5. Stores findings in knowledge graph
|
|
6. Exits with 0
|
|
|
|
**Success Criteria (in prompt):**
|
|
```
|
|
Results saved to analysis.json with:
|
|
- total_files: int
|
|
- total_lines: int
|
|
- total_functions: int
|
|
- total_classes: int
|
|
- average_complexity: float
|
|
```
|
|
|
|
---
|
|
|
|
## Part 9: Best Practices
|
|
|
|
### 9.1 Prompt Design for Autonomy
|
|
|
|
1. **Be Specific**
|
|
- What files? What directories?
|
|
- What exact metrics/outputs?
|
|
- What format (JSON, CSV, text)?
|
|
|
|
2. **Provide Success Criteria**
|
|
- What makes this task complete?
|
|
- What should the output look like?
|
|
- What exit code for success/failure?
|
|
|
|
3. **Include Error Handling**
|
|
- What if file doesn't exist?
|
|
- What if command fails?
|
|
- Should it retry or report and exit?
|
|
|
|
4. **Minimize Ambiguity**
|
|
- Don't say "improve code quality" - say what to measure
|
|
- Don't say "fix bugs" - specify which bugs or how to find them
|
|
- Don't say "optimize" - specify what metric to optimize
|
|
|
|
### 9.2 Environment Setup for Autonomy
|
|
|
|
1. **Pre-authorize Everything**
|
|
- Set correct user/group
|
|
- Ensure file permissions allow operations
|
|
- Document what's accessible
|
|
|
|
2. **Provide Full Context**
|
|
- Include CLAUDE.md or similar
|
|
- Document project structure
|
|
- Explain architectural decisions
|
|
|
|
3. **Set Clear Boundaries**
|
|
- Which directories can be modified?
|
|
- Which operations are allowed?
|
|
- What can't be changed?
|
|
|
|
### 9.3 Failure Recovery for Autonomy
|
|
|
|
1. **Use Exit Codes Meaningfully**
|
|
- 0 = success
|
|
- 1 = recoverable failure
|
|
- 2 = unrecoverable failure
|
|
- -9 = killed/timeout
|
|
|
|
2. **Log Failures Comprehensively**
|
|
- What was attempted?
|
|
- What failed and why?
|
|
- What was tried to recover?
|
|
|
|
3. **Enable Automatic Retry**
|
|
- Retry on exit code 1 (optional)
|
|
- Don't retry on exit code 2 (unrecoverable)
|
|
- Track retry count to prevent infinite loops
|
|
|
|
---
|
|
|
|
## Part 10: Advanced Patterns
|
|
|
|
### 10.1 Multi-Phase Autonomous Tasks
|
|
|
|
For complex tasks requiring multiple steps:
|
|
|
|
```python
|
|
# Phase 1: Context gathering
|
|
# Phase 2: Analysis
|
|
# Phase 3: Implementation
|
|
# Phase 4: Verification
|
|
# Phase 5: Reporting
|
|
|
|
# All phases defined upfront in prompt
|
|
# Agent proceeds through all without asking
|
|
# Exit code reflects overall success
|
|
```
|
|
|
|
### 10.2 Knowledge Graph Integration
|
|
|
|
Agents store findings persistently:
|
|
|
|
```python
|
|
# From agent code:
|
|
from mcp__shared-projects-memory__store_fact import store_fact
|
|
|
|
# After analysis, store for future agents
|
|
store_fact(
|
|
entity_source_name="musica-project",
|
|
relation="has_complexity_metrics",
|
|
entity_target_name="analysis-2026-01-09",
|
|
context={
|
|
"avg_complexity": 3.2,
|
|
"hotspots": ["index.ts", "processor.ts"]
|
|
}
|
|
)
|
|
```
|
|
|
|
### 10.3 Cross-Agent Coordination
|
|
|
|
Use shared state file for coordination:
|
|
|
|
```python
|
|
# /opt/server-agents/state/cross-agent-todos.json
|
|
# Agents read/update this to coordinate
|
|
|
|
{
|
|
"current_tasks": [
|
|
{
|
|
"id": "analyze-musica",
|
|
"project": "musica",
|
|
"status": "in_progress",
|
|
"assigned_to": "agent-a1b2",
|
|
"started": "2026-01-09T14:23:00Z"
|
|
}
|
|
],
|
|
"completed_archive": { ... }
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Part 11: Failure Cases and Solutions
|
|
|
|
### 11.1 Common Blocking Issues
|
|
|
|
| Issue | Cause | Solution |
|
|
|-------|-------|----------|
|
|
| Agent pauses on permission prompt | Tool permission check enabled | Use `--permission-mode bypassPermissions` |
|
|
| Agent blocks on AskUserQuestion | Prompt causes clarification needed | Redesign prompt with full context |
|
|
| stdin unavailable | Agent backgrounded | Use file-based IPC for input |
|
|
| Exit code not recorded | Script exits before writing exit code | Ensure "exit:{code}" in output.log |
|
|
| Job marked "running" forever | Process dies but exit code not appended | Use `tee` and explicit exit code capture |
|
|
|
|
### 11.2 Debugging Blocking Agents
|
|
|
|
```bash
|
|
# Check if agent is actually running
|
|
ps aux | grep {job_id}
|
|
|
|
# Check job output (should show what it's doing)
|
|
tail -f /var/log/luz-orchestrator/jobs/{job_id}/output.log
|
|
|
|
# Check if waiting on stdin
|
|
strace -p {pid} | grep read
|
|
|
|
# Look for approval prompts in output
|
|
grep -i "approve\|confirm\|permission" /var/log/luz-orchestrator/jobs/{job_id}/output.log
|
|
|
|
# Check if exit code was written
|
|
tail -5 /var/log/luz-orchestrator/jobs/{job_id}/output.log
|
|
```
|
|
|
|
---
|
|
|
|
## Part 12: Conclusion - Key Takeaways
|
|
|
|
### 12.1 The Core Principle
|
|
|
|
**Autonomous agents don't ask for input because they don't need to.**
|
|
|
|
Rather than implementing complex async prompting, the better approach is:
|
|
1. **Specify tasks completely** - No ambiguity
|
|
2. **Provide full context** - No guessing required
|
|
3. **Set clear boundaries** - Know what's allowed
|
|
4. **Detach execution** - Run independent of CLI
|
|
5. **Capture results** - File-based communication
|
|
|
|
### 12.2 Implementation Checklist
|
|
|
|
- [ ] Prompt includes all necessary context
|
|
- [ ] Task has clear success criteria
|
|
- [ ] Environment fully described (user, directory, permissions)
|
|
- [ ] No ambiguous language in prompt
|
|
- [ ] Exit codes defined (0=success, 1=failure, 2=error)
|
|
- [ ] Output format specified (JSON, text, files)
|
|
- [ ] Job runs as appropriate user
|
|
- [ ] Results captured to files/logs
|
|
- [ ] Notification system tracks completion
|
|
- [ ] Status queryable without blocking
|
|
|
|
### 12.3 When Blocks Still Occur
|
|
|
|
1. **Rare**: Well-designed prompts rarely need clarification
|
|
2. **Detectable**: Agent exits with code 1 and logs to output.log
|
|
3. **Recoverable**: Can retry or modify task and re-queue
|
|
4. **Monitorable**: Parent CLI never blocks, can watch from elsewhere
|
|
|
|
---
|
|
|
|
## Appendix A: Key Code Locations
|
|
|
|
| Location | Purpose |
|
|
|----------|---------|
|
|
| `/opt/server-agents/orchestrator/bin/luzia` (lines 1012-1200) | `spawn_claude_agent()` - Core autonomous agent spawning |
|
|
| `/opt/server-agents/orchestrator/lib/docker_bridge.py` | Container isolation for project agents |
|
|
| `/opt/server-agents/orchestrator/lib/queue_controller.py` | File-based task queue with load awareness |
|
|
| `/var/log/luz-orchestrator/jobs/` | Job directory structure and IPC |
|
|
| `/opt/server-agents/orchestrator/CLAUDE.md` | Embedded instructions for agents |
|
|
|
|
---
|
|
|
|
## Appendix B: Environment Variables
|
|
|
|
Agents have these set automatically:
|
|
|
|
```bash
|
|
TMPDIR="/home/{user}/.tmp" # Prevent /tmp collisions
|
|
TEMP="/home/{user}/.tmp" # Same
|
|
TMP="/home/{user}/.tmp" # Same
|
|
HOME="/home/{user}" # User's home directory
|
|
PWD="/path/to/project" # Working directory
|
|
```
|
|
|
|
---
|
|
|
|
## Appendix C: File Formats Reference
|
|
|
|
### Job Directory Files
|
|
|
|
**meta.json:**
|
|
```json
|
|
{
|
|
"id": "142315-a1b2",
|
|
"project": "musica",
|
|
"task": "Run tests and report results",
|
|
"type": "agent",
|
|
"user": "musica",
|
|
"pid": "12847",
|
|
"started": "2026-01-09T14:23:15Z",
|
|
"status": "running",
|
|
"debug": false
|
|
}
|
|
```
|
|
|
|
**output.log:**
|
|
```
|
|
[14:23:15] Starting agent...
|
|
[14:23:16] Reading prompt from file
|
|
[14:23:17] Executing task...
|
|
[14:23:18] Running tests...
|
|
PASSED: test_1
|
|
PASSED: test_2
|
|
...
|
|
[14:23:25] Task complete
|
|
|
|
exit:0
|
|
```
|
|
|
|
---
|
|
|
|
## References
|
|
|
|
- **Luzia CLI**: `/opt/server-agents/orchestrator/bin/luzia`
|
|
- **Agent SDK**: Claude Agent SDK (Anthropic)
|
|
- **Docker Bridge**: Container isolation for agent execution
|
|
- **Queue Controller**: File-based task queue implementation
|
|
- **Bot Orchestration Protocol**: `/opt/server-agents/BOT-ORCHESTRATION-PROTOCOL.md`
|
|
|