# Luzia Agent Autonomy Research

## Interactive Prompts and Autonomous Agent Patterns

**Date:** 2026-01-09
**Version:** 1.0
**Status:** Complete

---

## Executive Summary

This research documents how **Luzia** and the **Claude Agent SDK** enable autonomous agents to handle interactive scenarios without blocking. The key insight is that **blocking is prevented through architectural choices, not technical tricks**:

1. **Detached Execution** - Agents run in background processes, not waiting for input
2. **Non-Interactive Mode** - Permission mode set to `bypassPermissions` to avoid approval dialogs
3. **Async Communication** - Results delivered via files and notification logs, not stdin/stdout
4. **Failure Recovery** - Exit codes captured for retry logic without an agent restart
5. **Context-First Design** - All necessary context provided upfront in prompts

---

## Part 1: How Luzia Prevents Agent Blocking

### 1.1 The Core Pattern: Detached Spawning

**File:** `/opt/server-agents/orchestrator/bin/luzia` (lines 1012-1200)

```python
# Agents run detached with nohup; the parent does not wait for completion
os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
```

**Key Design Decisions:**

| Aspect | Implementation | Why It Works |
|--------|----------------|--------------|
| **Process Isolation** | `nohup ... &` spawns a detached process | Parent doesn't block; agent runs independently |
| **Permission Mode** | `--permission-mode bypassPermissions` | No permission dialogs to pause the agent |
| **PID Tracking** | Job directory captures PID at startup | Can monitor/kill if needed without blocking |
| **Output Capture** | `tee` pipes output to a log file | Results captured even if the agent is backgrounded |
| **Status Tracking** | Exit code appended to output.log | Job status determined post-execution |

### 1.2 The Full Agent Spawn Flow

**Complete lifecycle (simplified):**

```
1. spawn_claude_agent() called
       ↓
2.
   Job directory created: /var/log/luz-orchestrator/jobs/{job_id}/
       ├── prompt.txt   (full context + task)
       ├── run.sh       (executable shell script)
       ├── meta.json    (job metadata)
       └── output.log   (will capture all output)
       ↓
3. Script written with all environment setup
       ├── TMPDIR set to user's home (prevents /tmp collisions)
       ├── HOME set to project user
       ├── Current directory: project path
       └── Claude CLI invoked with full prompt
       ↓
4. Execution via nohup (detached)
       os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
       ↓
5. Control returns immediately to the CLI
       ↓
6. Agent continues in background:
       ├── Reads prompt from file
       ├── Executes task (reads/writes files)
       ├── All output captured to output.log
       ├── Exit code captured: "exit:{code}"
       └── Completion logged to notifications.log
```

### 1.3 Permission Bypass Strategy

**Critical Flag:** `--permission-mode bypassPermissions`

```python
# From spawn_claude_agent()
claude_cmd = f'claude --dangerously-skip-permissions --permission-mode bypassPermissions ...'
```

**Why This Works:**

- **No User Prompts**: Claude doesn't ask for approval on tool use
- **Full Autonomy**: Agent makes all decisions without waiting
- **Pre-Authorization**: All permissions granted upfront at job spawning
- **Isolation**: Each agent runs as the project user in its own space

**When This Is Safe:**

- All agents have limited scope (project directory)
- Running as a restricted user (not root)
- Task fully specified in the prompt (no ambiguity)
- Agent context includes execution environment details

---

## Part 2: Handling Clarification Without Blocking

### 2.1 The AskUserQuestion Problem

When agents need clarification, Claude's `AskUserQuestion` tool blocks the agent process waiting for stdin input. For background agents, this is problematic.

**Solutions in Luzia:**

#### Solution 1: Context-First Design

Provide all necessary context upfront so agents rarely need to ask:

```python
prompt = f"""
You are a project agent working on the **{project}** project.
## Your Task

{task}

## Execution Environment
- You are running as user: {run_as_user}
- Working directory: {project_path}
- All file operations are pre-authorized

## Guidelines
- Complete the task autonomously
- If you encounter errors, debug and fix them
- Store important findings in the shared knowledge graph
"""
```

#### Solution 2: Structured Task Format

Use specific, unambiguous task descriptions:

```
GOOD: "Run tests in /workspace/tests and report pass/fail count"
BAD:  "Fix the test suite" (unclear what 'fix' means)

GOOD: "Analyze src/index.ts for complexity metrics"
BAD:  "Improve code quality" (needs clarification on what to improve)
```

#### Solution 3: Async Fallback Mechanism

If clarification is truly needed, agents can:

1. **Create a hold file** in the job directory
2. **Log the question** to a status file
3. **Return exit code 1** (needs input)
4. **Await resolution** via file modification

```python
# Example pattern for agent code:
# (Not yet implemented in Luzia, but the pattern is documented)
import json
from datetime import datetime
from pathlib import Path

job_dir = Path("/var/log/luz-orchestrator/jobs/{job_id}")

# Agent encounters ambiguity
question = "Should I update production database or staging?"
# Write the question to a file
clarification = {
    "status": "awaiting_clarification",
    "question": question,
    "options": ["production", "staging"],
    "agent_paused_at": datetime.now().isoformat()
}
(job_dir / "clarification.json").write_text(json.dumps(clarification))

# Exit with code 1 to signal "needs input"
exit(1)
```

Then externally:

```bash
# Operator provides the answer
echo '{"choice": "staging"}' > /var/log/luz-orchestrator/jobs/{job_id}/clarification.json

# Restart agent (automatic retry system)
luzia retry {job_id}
```

### 2.2 Why AskUserQuestion Doesn't Work for Async Agents

| Scenario | Issue | Solution |
|----------|-------|----------|
| User runs `luzia project task` | User might close the terminal | Store prompt in a file, not stdin |
| Agent backgrounded | stdin not available | No interactive input possible |
| Multiple agents running | stdin interference | Use file-based IPC instead |
| Agent on remote machine | stdin tunneling is complex | All I/O via files or HTTP |

---

## Part 3: Job State Machine and Exit Codes

### 3.1 Job Lifecycle States

**Defined in:** `/opt/server-agents/orchestrator/bin/luzia` (lines 607-646)

```python
def _get_actual_job_status(job_dir: Path) -> str:
    """Determine actual job status by checking output.log"""
    # Status values:
    # - "running"   (process still active)
    # - "completed" (exit:0)
    # - "failed"    (exit:non-zero)
    # - "killed"    (exit:-9)
    # - "unknown"   (no status info)
```

**State Transitions:**

```
Job Created
   ↓
[meta.json: status="running", output.log: empty]
   ↓
Agent Executes (captured in output.log)
   ↓
Agent Completes/Exits
   ↓
output.log appended with "exit:{code}"
   ↓
Status determined:
   ├─ exit:0     → "completed"
   ├─ exit:non-0 → "failed"
   ├─ exit:-9    → "killed"
   └─ no exit    → "running" (still active)
```

### 3.2 The Critical Line: Capturing Exit Code

**From run.sh template (lines 1148-1173):**

```bash
# Command with output capture
stdbuf ... {claude_cmd} 2>&1 | tee "{output_file}"
exit_code=${PIPESTATUS[0]}

# CRITICAL: Append exit code to log
echo "" >> "{output_file}"
echo "exit:$exit_code" >> "{output_file}"

# Notify completion
{notify_cmd}
```

**Why This Matters:**

- Job status is determined by examining the log file, not the process exit
- The exit code persists in the file even after the process terminates
- Allows status queries without spawning a process
- Enables automatic retry logic based on exit code

---

## Part 4: Handling Approval Prompts in Background

### 4.1 How Claude Code Approval Prompts Work

Claude Code tools can ask for permission before executing risky operations:

```
⚠️ This command has high privilege level. Approve? [Y/n] _
```

In **interactive mode**: the user can respond.
In **background mode**: the command blocks indefinitely waiting for stdin.

### 4.2 Luzia's Prevention Mechanism

**Three-layer approach:**

1. **CLI Flag**: `--permission-mode bypassPermissions`
   - Tells the Claude CLI to skip permission dialogs
   - Requires the `--dangerously-skip-permissions` flag
2. **Environment Setup**: Agent runs as the project user, not root
   - Limited scope prevents catastrophic damage
   - Job runs in an isolated directory
   - File ownership is correct by default
3. **Process Isolation**: Agent runs detached
   - Even if blocked, the parent CLI returns immediately
   - Job continues in the background
   - Can be monitored/killed separately

**Example: Safe Bash Execution**

```python
# This command would normally require approval in interactive mode
command = "rm -rf /opt/sensitive-data"

# But in agent context:
# 1. Agent running as limited user (not root)
# 2. Project path restricted (can't access /opt as the project user)
# 3. Permission flags bypass the confirmation dialog
# 4. Agent detached (blocking doesn't affect the CLI)

# Result: Command executes without an interactive prompt
```

---

## Part 5: Async Communication Patterns

### 5.1 File-Based Job Queue

**Implemented in:** `/opt/server-agents/orchestrator/lib/queue_controller.py`

**Pattern:**

```
User provides task → Enqueue to file-based queue → Status logged to disk
   ↓
Load-aware scheduler polls queue
   ↓
Task spawned as background agent
   ↓
Agent writes to output.log
   ↓
User queries status via filesystem
```

**Queue Structure:**

```
/var/lib/luzia/queue/
├── pending/
│   ├── high/
│   │   └── {priority}_{timestamp}_{project}_{task_id}.json
│   └── normal/
│       └── {priority}_{timestamp}_{project}_{task_id}.json
├── config.json
└── capacity.json
```

**Task File Format:**

```json
{
  "id": "a1b2c3d4",
  "project": "musica",
  "priority": 5,
  "prompt": "Run tests in /workspace/tests",
  "skill_match": "test-runner",
  "enqueued_at": "2026-01-09T15:30:45Z",
  "enqueued_by": "admin",
  "status": "pending"
}
```

### 5.2 Notification Log Pattern

**Location:** `/var/log/luz-orchestrator/notifications.log`

**Pattern:**

```
[14:23:15] Agent 142315-a1b2 finished (exit 0)
[14:24:03] Agent 142403-c3d4 finished (exit 1)
[14:25:12] Agent 142512-e5f6 finished (exit 0)
```

**Usage:**

```bash
# A script can tail this file to await completion
# without polling job directories
tail -f /var/log/luz-orchestrator/notifications.log | \
  grep "Agent {job_id}"
```

### 5.3 Job Directory as IPC Channel

**Location:** `/var/log/luz-orchestrator/jobs/{job_id}/`

**Files Used for Communication:**

| File | Purpose | Direction |
|------|---------|-----------|
| `prompt.txt` | Task definition & context | Input (before agent starts) |
| `output.log` | Agent's stdout/stderr + exit code | Output (written during execution) |
| `meta.json` | Job metadata & status | Both (initial + final) |
| `clarification.json` | Awaiting user input (pattern) | Bidirectional |
| `run.sh` | Execution script | Input |
| `pid` | Process ID | Output |

**Example: Monitoring Job
Completion**

```bash
#!/bin/bash
job_id="142315-a1b2"
job_dir="/var/log/luz-orchestrator/jobs/$job_id"

# Poll for completion
while true; do
    if grep -q "^exit:" "$job_dir/output.log"; then
        exit_code=$(grep "^exit:" "$job_dir/output.log" | tail -1 | cut -d: -f2)
        echo "Job completed with exit code: $exit_code"
        break
    fi
    sleep 1
done
```

---

## Part 6: Prompt Patterns for Agent Autonomy

### 6.1 The Ideal Autonomous Agent Prompt

**Pattern:**

```
1. Identity & Context
   - What role is the agent playing?
   - What project/domain?

2. Task Specification
   - What needs to be done?
   - What are the success criteria?
   - What are the constraints?

3. Execution Environment
   - What tools are available?
   - What directories can be accessed?
   - What permissions are granted?

4. Decision Autonomy
   - What decisions can the agent make alone?
   - When should it ask for clarification? (ideally: never)
   - What should it do if ambiguous?

5. Communication
   - Where should results be written?
   - What format (JSON, text, files)?
   - When should it report progress?

6. Failure Handling
   - What to do if the task fails?
   - Should it retry? How many times?
   - What exit codes to use?
```

### 6.2 Good vs Bad Prompts for Autonomy

**BAD - Requires Clarification:**

```
"Help me improve the code"
- Ambiguous: which files? What metrics?
- No success criteria
- Agent likely to ask questions

"Fix the bug"
- Which bug? What symptoms?
- Agent needs to investigate, then ask
```

**GOOD - Autonomous:**

```
"Run tests in /workspace/tests and report:
- Total test count
- Passed count
- Failed count
- Exit code (0 if all pass, 1 if any fail)"

"Analyze src/index.ts for:
- Lines of code
- Number of functions
- Max function complexity
- Save results to analysis.json"
```

### 6.3 Prompt Template for Autonomous Agents

**Used in Luzia:** `/opt/server-agents/orchestrator/bin/luzia` (lines 1053-1079)

```python
prompt_template = """You are a project agent working on the **{project}** project.
{context}

## Your Task

{task}

## Execution Environment
- You are running as user: {run_as_user}
- You are running directly in the project directory: {project_path}
- You have FULL permission to read, write, and execute files in this directory
- Use standard Claude tools (Read, Write, Edit, Bash) directly
- All file operations are pre-authorized - proceed without asking for permission

## Knowledge Graph - IMPORTANT

Use the **shared/global knowledge graph** for storing knowledge:
- Use `mcp__shared-projects-memory__store_fact` to store facts
- Use `mcp__shared-projects-memory__query_relations` to query
- Use `mcp__shared-projects-memory__search_context` to search

## Guidelines
- Complete the task autonomously
- If you encounter errors, debug and fix them
- Store important findings in the shared knowledge graph
- Provide a summary of what was done when complete
"""
```

**Key Autonomy Features:**

- No "ask for help" - pre-authorization is explicit
- Clear environment details - no guessing about paths/permissions
- Knowledge graph integration - preserves learnings across runs
- Exit code expectations - clear success/failure criteria

---

## Part 7: Pattern Summary - Building Autonomous Agents

### 7.1 The Five Patterns

| Pattern | When to Use | Implementation |
|---------|-------------|----------------|
| **Detached Spawning** | Background tasks that shouldn't block the CLI | `nohup ... &` with PID tracking |
| **Permission Bypass** | Autonomous execution without prompts | `--permission-mode bypassPermissions` |
| **File-Based IPC** | Async communication with agents | Job directory as channel |
| **Exit Code Signaling** | Status determination without polling | Append "exit:{code}" to output |
| **Context-First Prompts** | Avoid clarification questions | Detailed spec + success criteria |

### 7.2 Comparison: Interactive vs Autonomous Patterns

| Aspect | Interactive Agent | Autonomous Agent |
|--------|-------------------|------------------|
| **Execution** | Runs in foreground, blocks | Detached process, returns immediately |
| **Prompts** | Can use `AskUserQuestion` | Must provide all context upfront |
| **Approval** | Can request tool permission | Uses `--permission-mode bypassPermissions` |
| **I/O** | stdin/stdout with user | Files, logs, notification channels |
| **Failure** | User responds to errors | Agent handles/reports via exit code |
| **Monitoring** | User watches output | Query filesystem for status |

### 7.3 When to Use Each Pattern

**Use Interactive Agents When:**

- The user is present and waiting
- The task requires user input/decisions
- Working in development/exploration mode
- Real-time feedback is valuable

**Use Autonomous Agents When:**

- Running background maintenance tasks
- Multiple parallel operations are needed
- No user is available to respond to prompts
- Results are needed asynchronously

---

## Part 8: Real Implementation Examples

### 8.1 Example: Running Tests Autonomously

**Task:**

```
Run pytest in /workspace/tests and report results as JSON
```

**Luzia Command:**

```bash
luzia musica "Run pytest in /workspace/tests and save results to tests.json with {passed: int, failed: int, errors: int}"
```

**What Happens:**

1. Job directory created with UUID
2. Prompt written with full context
3. Script prepared with environment setup
4. Launched via nohup
5. Immediately returns job_id to the user
6. Agent runs in background:

   ```bash
   cd /workspace
   pytest tests/ --json > tests.json
   # Results saved to file
   ```

7. Exit code captured (0 if all pass, 1 if failures)
8. Output logged to output.log
9. Completion notification sent

**User Monitor:**

```bash
luzia jobs {job_id}
# Status: running/completed/failed
# Exit code: 0/1
# Output preview: last 10 lines
```

### 8.2 Example: Code Analysis Autonomously

**Task:**

```
Analyze the codebase structure and save metrics to analysis.json
```

**Agent Does (no prompts needed):**

1. Reads prompt from job directory
2. Scans project structure
3. Collects metrics (LOC, functions, classes, complexity)
4. Writes results to analysis.json
5. Stores findings in knowledge graph
6. Exits with 0

**Success Criteria (in prompt):**

```
Results saved to analysis.json with:
- total_files: int
- total_lines: int
- total_functions: int
- total_classes: int
- average_complexity: float
```

---

## Part 9: Best Practices

### 9.1 Prompt Design for Autonomy

1. **Be Specific**
   - What files? What directories?
   - What exact metrics/outputs?
   - What format (JSON, CSV, text)?

2. **Provide Success Criteria**
   - What makes this task complete?
   - What should the output look like?
   - What exit code for success/failure?

3. **Include Error Handling**
   - What if a file doesn't exist?
   - What if a command fails?
   - Should it retry, or report and exit?

4. **Minimize Ambiguity**
   - Don't say "improve code quality" - say what to measure
   - Don't say "fix bugs" - specify which bugs or how to find them
   - Don't say "optimize" - specify what metric to optimize

### 9.2 Environment Setup for Autonomy

1. **Pre-authorize Everything**
   - Set the correct user/group
   - Ensure file permissions allow the required operations
   - Document what's accessible

2. **Provide Full Context**
   - Include CLAUDE.md or similar
   - Document project structure
   - Explain architectural decisions

3. **Set Clear Boundaries**
   - Which directories can be modified?
   - Which operations are allowed?
   - What can't be changed?
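The environment side of this setup can be sketched as a small helper that assembles a job's environment before the agent is spawned. This is a minimal illustration, not code from Luzia; `make_agent_env` and its signature are invented for the sketch, mirroring the variables listed in Appendix B:

```python
import os


def make_agent_env(user: str, project_path: str) -> dict:
    """Assemble the environment for an autonomous agent job (sketch).

    Pre-authorizes everything up front: a per-user temp directory
    (avoids /tmp collisions between agents), the project user's HOME,
    and the project path as the working directory.
    """
    home = f"/home/{user}"
    tmp = f"{home}/.tmp"
    env = dict(os.environ)  # inherit, then apply the pre-authorized overrides
    env.update({
        "HOME": home,
        "TMPDIR": tmp,
        "TEMP": tmp,
        "TMP": tmp,
        "PWD": project_path,  # the spawner also chdirs here before exec
    })
    return env


env = make_agent_env("musica", "/workspace/musica")
```

In Luzia itself this preparation is written into the generated `run.sh` (see Part 1.2), so the values travel with the detached job rather than depending on the parent CLI's environment.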
### 9.3 Failure Recovery for Autonomy

1. **Use Exit Codes Meaningfully**
   - 0 = success
   - 1 = recoverable failure
   - 2 = unrecoverable failure
   - -9 = killed/timeout

2. **Log Failures Comprehensively**
   - What was attempted?
   - What failed, and why?
   - What was tried to recover?

3. **Enable Automatic Retry**
   - Retry on exit code 1 (optional)
   - Don't retry on exit code 2 (unrecoverable)
   - Track the retry count to prevent infinite loops

---

## Part 10: Advanced Patterns

### 10.1 Multi-Phase Autonomous Tasks

For complex tasks requiring multiple steps:

```python
# Phase 1: Context gathering
# Phase 2: Analysis
# Phase 3: Implementation
# Phase 4: Verification
# Phase 5: Reporting

# All phases defined upfront in the prompt
# Agent proceeds through all without asking
# Exit code reflects overall success
```

### 10.2 Knowledge Graph Integration

Agents store findings persistently:

```python
# Conceptual sketch of agent code: the MCP tool
# mcp__shared-projects-memory__store_fact is invoked through Claude's
# tool interface (hyphenated MCP tool names are not importable Python
# modules)

# After analysis, store for future agents
store_fact(
    entity_source_name="musica-project",
    relation="has_complexity_metrics",
    entity_target_name="analysis-2026-01-09",
    context={
        "avg_complexity": 3.2,
        "hotspots": ["index.ts", "processor.ts"]
    }
)
```

### 10.3 Cross-Agent Coordination

Use a shared state file, `/opt/server-agents/state/cross-agent-todos.json`, for coordination; agents read and update it to coordinate:

```json
{
  "current_tasks": [
    {
      "id": "analyze-musica",
      "project": "musica",
      "status": "in_progress",
      "assigned_to": "agent-a1b2",
      "started": "2026-01-09T14:23:00Z"
    }
  ],
  "completed_archive": { ...
  }
}
```

---

## Part 11: Failure Cases and Solutions

### 11.1 Common Blocking Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| Agent pauses on permission prompt | Tool permission check enabled | Use `--permission-mode bypassPermissions` |
| Agent blocks on AskUserQuestion | Prompt leaves clarification needed | Redesign the prompt with full context |
| stdin unavailable | Agent backgrounded | Use file-based IPC for input |
| Exit code not recorded | Script exits before writing exit code | Ensure "exit:{code}" reaches output.log |
| Job marked "running" forever | Process dies but exit code not appended | Use `tee` and explicit exit code capture |

### 11.2 Debugging Blocking Agents

```bash
# Check if the agent is actually running
ps aux | grep {job_id}

# Check job output (should show what it's doing)
tail -f /var/log/luz-orchestrator/jobs/{job_id}/output.log

# Check if it is waiting on stdin (strace writes to stderr)
strace -p {pid} 2>&1 | grep read

# Look for approval prompts in output
grep -i "approve\|confirm\|permission" /var/log/luz-orchestrator/jobs/{job_id}/output.log

# Check if the exit code was written
tail -5 /var/log/luz-orchestrator/jobs/{job_id}/output.log
```

---

## Part 12: Conclusion - Key Takeaways

### 12.1 The Core Principle

**Autonomous agents don't ask for input because they don't need to.**

Rather than implementing complex async prompting, the better approach is:

1. **Specify tasks completely** - No ambiguity
2. **Provide full context** - No guessing required
3. **Set clear boundaries** - Know what's allowed
4. **Detach execution** - Run independently of the CLI
5. **Capture results** - File-based communication

### 12.2 Implementation Checklist

- [ ] Prompt includes all necessary context
- [ ] Task has clear success criteria
- [ ] Environment fully described (user, directory, permissions)
- [ ] No ambiguous language in the prompt
- [ ] Exit codes defined (0=success, 1=failure, 2=error)
- [ ] Output format specified (JSON, text, files)
- [ ] Job runs as the appropriate user
- [ ] Results captured to files/logs
- [ ] Notification system tracks completion
- [ ] Status queryable without blocking

### 12.3 When Blocks Still Occur

1. **Rare**: Well-designed prompts rarely need clarification
2. **Detectable**: Agent exits with code 1 and logs to output.log
3. **Recoverable**: Can retry, or modify the task and re-queue
4. **Monitorable**: The parent CLI never blocks; can watch from elsewhere

---

## Appendix A: Key Code Locations

| Location | Purpose |
|----------|---------|
| `/opt/server-agents/orchestrator/bin/luzia` (lines 1012-1200) | `spawn_claude_agent()` - Core autonomous agent spawning |
| `/opt/server-agents/orchestrator/lib/docker_bridge.py` | Container isolation for project agents |
| `/opt/server-agents/orchestrator/lib/queue_controller.py` | File-based task queue with load awareness |
| `/var/log/luz-orchestrator/jobs/` | Job directory structure and IPC |
| `/opt/server-agents/orchestrator/CLAUDE.md` | Embedded instructions for agents |

---

## Appendix B: Environment Variables

Agents have these set automatically:

```bash
TMPDIR="/home/{user}/.tmp"   # Prevent /tmp collisions
TEMP="/home/{user}/.tmp"     # Same
TMP="/home/{user}/.tmp"      # Same
HOME="/home/{user}"          # User's home directory
PWD="/path/to/project"       # Working directory
```

---

## Appendix C: File Formats Reference

### Job Directory Files

**meta.json:**

```json
{
  "id": "142315-a1b2",
  "project": "musica",
  "task": "Run tests and report results",
  "type": "agent",
  "user": "musica",
  "pid": "12847",
  "started": "2026-01-09T14:23:15Z",
  "status": "running",
  "debug": false
}
```

**output.log:**

```
[14:23:15] Starting agent...
[14:23:16] Reading prompt from file
[14:23:17] Executing task...
[14:23:18] Running tests...
  PASSED: test_1
  PASSED: test_2
  ...
[14:23:25] Task complete
exit:0
```

---

## References

- **Luzia CLI**: `/opt/server-agents/orchestrator/bin/luzia`
- **Agent SDK**: Claude Agent SDK (Anthropic)
- **Docker Bridge**: Container isolation for agent execution
- **Queue Controller**: File-based task queue implementation
- **Bot Orchestration Protocol**: `/opt/server-agents/BOT-ORCHESTRATION-PROTOCOL.md`