Refactor cockpit to use DockerTmuxController pattern

Based on claude-code-tools TmuxCLIController, this refactor:
- Added DockerTmuxController class for robust tmux session management
- Implements send_keys() with configurable delay_enter
- Implements capture_pane() for output retrieval
- Implements wait_for_prompt() for pattern-based completion detection
- Implements wait_for_idle() for content-hash-based idle detection
- Implements wait_for_shell_prompt() for shell prompt detection

Also includes workflow improvements:
- Pre-task git snapshot before agent execution
- Post-task commit protocol in agent guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
389
RESEARCH-SUMMARY.md
Normal file
@@ -0,0 +1,389 @@
# Agent Autonomy Research - Executive Summary

**Project:** Luzia Agent Autonomy Research
**Date:** 2026-01-09
**Status:** ✅ Complete
**Deliverables:** 4 comprehensive documents + shared knowledge graph

---

## What Was Researched

### Primary Questions
1. How does Luzia handle interactive prompts to prevent agent blocking?
2. What patterns enable autonomous agent execution without user input?
3. How do agents handle clarification needs without blocking?
4. What are best practices for prompt design in autonomous agents?

### Secondary Questions
5. How does the Claude Agent SDK prevent approval dialog blocking?
6. What communication patterns work for async agent-to-user interaction?
7. How can agents make decisions without asking for confirmation?

---

## Key Findings

### 1. **Blocking is Prevented Through Architecture, Not Tricks**

Luzia prevents agent blocking through four **architectural layers**:

| Layer | Implementation | Purpose |
|-------|---|---|
| **Process** | Detached spawning (`nohup ... &`) | Parent CLI returns immediately |
| **Permission** | `--permission-mode bypassPermissions` | No approval dialogs shown |
| **Communication** | File-based IPC (job directory) | No stdin/stdout dependencies |
| **Status** | Exit code signaling (append to log) | Async status queries |

**Result:** Even if an agent wanted to block, it can't because:
- It's in a separate process (parent is gone)
- It doesn't have stdin (won't wait for input)
- Permission mode prevents approval prompts

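The process and communication layers in the table can be illustrated with a minimal Python sketch. It is not Luzia's implementation (which the table describes as `nohup ... &`); `spawn_detached` is a hypothetical helper, and `start_new_session=True` is assumed here as a POSIX-only stand-in for detaching from the parent's session.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def spawn_detached(cmd: Sequence[str], log_path: Path) -> subprocess.Popen:
    """Start cmd in its own session with no stdin and return without waiting."""
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            list(cmd),
            stdin=subprocess.DEVNULL,   # reads hit EOF at once instead of waiting
            stdout=log,                 # output goes to a file, not a terminal
            stderr=subprocess.STDOUT,
            start_new_session=True,     # POSIX: child is no longer tied to our session
        )
    finally:
        log.close()  # the child holds its own copy of the file descriptor
    return proc
```

The caller would typically record `proc.pid` for monitoring and return immediately; because the child's stdin is `/dev/null`, any attempt to read input gets an empty result rather than hanging.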
### 2. **The Golden Rule of Autonomy**

> **Autonomous agents don't ask for input because they don't need to.**

Well-designed prompts provide:
- ✓ Clear, specific objectives (not "improve code", but "reduce complexity to < 5")
- ✓ Defined success criteria (what success looks like)
- ✓ Complete context (environment, permissions, constraints)
- ✓ No ambiguity (every decision path covered)

When these are present → agents execute autonomously
When these are missing → agents ask questions → blocking occurs

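One way to make the four ingredients above checkable before a job is spawned is a small pre-flight helper. The sketch below is illustrative only: `TaskSpec`, `missing_fields`, and `render_prompt` are hypothetical names, and the section labels in the rendered prompt are an assumption, not a format Luzia prescribes.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskSpec:
    """Hypothetical container for the four prompt ingredients."""
    objective: str
    success_criteria: str
    context: str
    decision_rules: str  # what to do at each foreseeable branch


def missing_fields(spec: TaskSpec) -> list:
    """Return the names of ingredients that are empty or whitespace-only."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


def render_prompt(spec: TaskSpec) -> str:
    """Build a self-contained prompt, refusing to render an incomplete spec."""
    gaps = missing_fields(spec)
    if gaps:
        raise ValueError(f"prompt is not self-contained; missing: {gaps}")
    return (
        f"OBJECTIVE:\n{spec.objective}\n\n"
        f"SUCCESS CRITERIA:\n{spec.success_criteria}\n\n"
        f"CONTEXT:\n{spec.context}\n\n"
        f"DECISION RULES:\n{spec.decision_rules}\n\n"
        "Do not ask questions; apply the decision rules above."
    )
```

Failing fast on an incomplete spec moves the clarification step to the person writing the prompt, where it cannot block a background agent.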
### 3. **Five Critical Patterns Emerged**

1. **Detached Spawning**: Run agents as background processes
   - Returns immediately to CLI
   - Agents continue independently
   - PID tracked for monitoring

2. **Permission Bypass**: Use `--permission-mode bypassPermissions`
   - No approval dialogs for tool use
   - Safe because scope is limited (project user, project dir)
   - Must grant pre-authorization in prompt

3. **File-Based I/O**: Use job directory as IPC channel
   - Prompt input via file
   - Output captured to log
   - Status queries don't require a live process
   - Works with background agents

4. **Exit Code Signaling**: Append "exit:{code}" to output
   - Status persists in file after process ends
   - Async status queries (no polling)
   - Enables retry logic based on code

5. **Context-First Prompting**: Provide all context upfront
   - Specific task descriptions
   - Clear success criteria
   - No ambiguity
   - Minimize clarification questions

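Patterns 3 and 4 can be sketched together: a wrapper that sends a command's output to `output.log` and then appends the `exit:{code}` marker. This is a sketch under assumptions, not the actual `run.sh` logic; `run_and_record` is a hypothetical helper, and only the `output.log` name and the `exit:{code}` marker come from this document.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def run_and_record(cmd: Sequence[str], job_dir: Path) -> int:
    """Run cmd, send its output to output.log, then append exit:{code}."""
    job_dir.mkdir(parents=True, exist_ok=True)
    log_path = job_dir / "output.log"
    with open(log_path, "ab") as log:
        result = subprocess.run(
            list(cmd),
            stdin=subprocess.DEVNULL,   # no interactive input channel
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    # Append the marker only after the command has finished writing.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"\nexit:{result.returncode}\n")
    return result.returncode
```

Because the marker is written to the same file as the output, the final status survives after both the wrapper and the agent have exited.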
### 4. **The AskUserQuestion Problem**

Claude's `AskUserQuestion` tool blocks the agent while it waits for input on stdin:

```python
# Illustrative pseudocode: ask_user_question stands in for the tool call.
# This blocks forever if agent is backgrounded
response = await ask_user_question(
    question="What should I do here?",
    options=[...]
)
# stdin unavailable = agent stuck
```

**Solution:** Don't rely on user questions. Design prompts to be self-contained.

### 5. **Job Lifecycle is the Key**

Luzia's job directory structure enables full autonomy:

```
/var/log/luz-orchestrator/jobs/{job_id}/
├── prompt.txt       # Agent reads from here
├── output.log       # Agent writes to here (+ exit code)
├── meta.json        # Job metadata
├── run.sh           # Execution script
└── results.json     # Agent-generated results
```

Exit code appended to output.log as: `exit:{code}`

This enables:
- Async status queries (no blocking)
- Automatic retry on specific codes
- Failure analysis without a live process
- Integration with monitoring systems

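A status query against this layout needs nothing but the log file. The following sketch assumes the marker sits on its own line and that a missing log or missing marker means the job is still running; `job_status` is a hypothetical helper, not the `_get_actual_job_status` function in `luzia`, and it does not attempt to detect the `killed` state.

```python
import re
from pathlib import Path

# Matches a line consisting only of the exit marker, e.g. "exit:0".
EXIT_MARKER = re.compile(r"^exit:(-?\d+)\s*$", re.MULTILINE)


def job_status(job_dir: Path) -> str:
    """Classify a job as running, completed, or failed from output.log alone."""
    log_path = job_dir / "output.log"
    if not log_path.exists():
        return "running"  # assumption: no log yet means the job has not finished
    text = log_path.read_text(encoding="utf-8", errors="replace")
    matches = EXIT_MARKER.findall(text)
    if not matches:
        return "running"
    # Use the last marker in case the log contains output from earlier attempts.
    return "completed" if int(matches[-1]) == 0 else "failed"
```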
---

## Deliverables Created

### 1. **AGENT-AUTONOMY-RESEARCH.md** (12 sections)
Comprehensive research document covering:
- How Luzia prevents blocking (Section 1)
- Handling clarification without blocking (Section 2)
- Job state machine & exit codes (Section 3)
- Permission system details (Section 4)
- Async communication patterns (Section 5)
- Prompt patterns for autonomy (Section 6)
- Pattern summary (Section 7)
- Real implementation examples (Section 8)
- Best practices (Section 9)
- Advanced patterns (Section 10)
- Failure cases & solutions (Section 11)
- Key takeaways & checklist (Section 12)

### 2. **AGENT-CLI-PATTERNS.md** (12 patterns + 5 anti-patterns)
Practical guide covering:
- Quick reference: 5 critical patterns
- 5 prompt patterns (analysis, execution, implementation, multi-phase, decision)
- 5 anti-patterns to avoid
- Edge case handling
- Prompt template for maximum autonomy
- Real-world examples
- Quality checklist

### 3. **AUTONOMOUS-AGENT-TEMPLATES.md** (6 templates)
Production-ready code templates:
1. Simple task agent (read-only analysis)
2. Test execution agent (run & report)
3. Code modification agent (implement & verify)
4. Multi-step workflow agent (orchestrate process)
5. Diagnostic agent (troubleshoot & report)
6. Integration test agent (complex validation)

Each with:
- Use case description
- Prompt template
- Expected output example
- Usage pattern

### 4. **RESEARCH-SUMMARY.md** (this document)
Executive summary with:
- Research questions & answers
- Key findings (5 major findings)
- Deliverables list
- Implementation checklist
- Knowledge graph integration

---

## Implementation Checklist

### For Using These Patterns

- [ ] **Read** `AGENT-AUTONOMY-RESEARCH.md` sections 1-3 (understand architecture)
- [ ] **Read** `AGENT-CLI-PATTERNS.md` quick reference (5 patterns)
- [ ] **Write** prompts following `AGENT-CLI-PATTERNS.md` template
- [ ] **Use** templates from `AUTONOMOUS-AGENT-TEMPLATES.md` as starting point
- [ ] **Check** prompt against `AGENT-CLI-PATTERNS.md` checklist
- [ ] **Spawn** using `spawn_claude_agent()` function
- [ ] **Monitor** via job directory polling

### For Creating Custom Agents

When creating new autonomous agents:

1. **Define Success Clearly** - What does completion look like?
2. **Provide Context** - Full environment description
3. **Specify Format** - What output format (JSON, text, files)
4. **No Ambiguity** - Every decision path covered
5. **Document Constraints** - What can/can't be changed
6. **Define Exit Codes** - 0=success, 1=recoverable failure, 2=error
7. **No User Prompts** - All decisions made by agent alone
8. **Test in Background** - Verify no blocking

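The exit code convention in step 6 lends itself to a simple retry policy. The sketch below is one possible reading, not Luzia's behavior: `ExitCode` and `next_action` are hypothetical names, the `max_attempts` default is arbitrary, and treating any unlisted code as non-retryable is an assumption.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """The three codes named in the checklist above."""
    SUCCESS = 0
    RECOVERABLE = 1
    ERROR = 2


def next_action(code: int, attempts: int, max_attempts: int = 3) -> str:
    """Map an exit code and the attempts made so far to done/retry/escalate."""
    if code == ExitCode.SUCCESS:
        return "done"
    if code == ExitCode.RECOVERABLE and attempts < max_attempts:
        return "retry"
    # Hard errors, unknown codes, and exhausted retries all need a human look.
    return "escalate"
```

For example, `next_action(1, attempts=1)` suggests a retry, while the same code after three attempts escalates.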
---

## Code References

### Key Implementation Files

| File | Purpose | Lines |
|------|---------|-------|
| `/opt/server-agents/orchestrator/bin/luzia` | Agent spawning implementation | 1012-1200 |
| `/opt/server-agents/orchestrator/lib/docker_bridge.py` | Container isolation | All |
| `/opt/server-agents/orchestrator/lib/queue_controller.py` | File-based task queue | All |

### Key Functions

**spawn_claude_agent(project, task, context, config)**
- Lines 1012-1200 in luzia
- Spawns detached background agent
- Returns job_id immediately
- Handles permission bypass, environment setup, exit code capture

**_get_actual_job_status(job_dir)**
- Lines 607-646 in luzia
- Determines job status by reading output.log
- Checks for "exit:" marker
- Returns: running/completed/failed/killed

**EnqueueTask (QueueController)**
- Adds task to file-based queue
- Enables load-aware scheduling
- Returns task_id and queue position

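To make the idea of a file-based queue concrete, here is a minimal sketch of an enqueue step that returns a task id and queue position. It is not the code in `queue_controller.py`: the `pending/` directory, the filename scheme, and the `enqueue_task` helper are all assumptions, and load-aware scheduling is not modeled.

```python
import json
import time
import uuid
from pathlib import Path


def enqueue_task(queue_dir: Path, project: str, prompt: str) -> tuple:
    """Write one JSON file per task; return (task_id, 1-based queue position)."""
    pending = queue_dir / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    task_id = uuid.uuid4().hex[:12]
    # A zero-padded nanosecond timestamp makes filename order match arrival order.
    name = f"{time.time_ns():020d}-{task_id}.json"
    tmp = pending / (name + ".tmp")
    tmp.write_text(json.dumps({"id": task_id, "project": project, "prompt": prompt}))
    tmp.rename(pending / name)  # atomic on POSIX: readers never see a partial file
    position = sorted(p.name for p in pending.glob("*.json")).index(name) + 1
    return task_id, position
```

Writing to a temporary name and renaming is a common way to keep a directory-as-queue consistent without locks; a consumer would simply take the lexicographically smallest `*.json` file.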
---

## Knowledge Graph Integration

Findings stored in shared knowledge graph at:
`/etc/zen-swarm/memory/projects.db`

**Relations created:**
- `luzia-agent-autonomy-research` documents patterns (5 core patterns)
- `luzia-agent-autonomy-research` defines anti-pattern (1 major anti-pattern)
- `luzia-architecture` implements pattern (detached execution)
- `agent-autonomy-best-practices` includes guidelines (2 key guidelines)

**Queryable via:**
```bash
# Search for autonomy patterns
mcp__shared-projects-memory__search_context "autonomous agent patterns"

# Query specific pattern
mcp__shared-projects-memory__query_relations \
  entity_name="detached-process-execution" \
  relation_type="documents_pattern"
```

---

## Quick Start: Using These Findings

### For Developers

**Problem:** "How do I write an agent that runs autonomously?"

**Solution:**
1. Read: `AGENT-CLI-PATTERNS.md` - "5 Critical Patterns" section
2. Find matching template in: `AUTONOMOUS-AGENT-TEMPLATES.md`
3. Follow the prompt pattern
4. Use: `spawn_claude_agent(project, task, context, config)`
5. Check: `luzia jobs {job_id}` for status

### For Architects

**Problem:** "Should we redesign for async agents?"

**Solution:**
1. Read: `AGENT-AUTONOMY-RESEARCH.md` sections 1-3
2. Current approach (detached + file-based IPC) is mature
3. No redesign needed; patterns work well
4. Focus: prompt design quality (see anti-patterns section)

### For Troubleshooting

**Problem:** "Agent keeps asking for clarification"

**Solution:**
1. Check: `AGENT-CLI-PATTERNS.md` - "Anti-Patterns" section
2. Redesign prompt to be more specific
3. See: "Debugging" section
4. Use: `luzia retry {job_id}` to run with new prompt

---

## Metrics & Results

### Documentation Coverage

| Topic | Coverage | Format |
|-------|----------|--------|
| **Architecture** | Complete (8 sections) | Markdown |
| **Patterns** | 5 core patterns detailed | Markdown |
| **Anti-patterns** | 5 anti-patterns with fixes | Markdown |
| **Best Practices** | 9 detailed practices | Markdown |
| **Code Examples** | 6 production templates | Python |
| **Real-world Cases** | 3 detailed examples | Markdown |

### Research Completeness

- ✅ How Luzia prevents blocking (answered with architecture details)
- ✅ Clarification handling without blocking (answered: don't rely on it)
- ✅ Prompt patterns for autonomy (5 patterns documented)
- ✅ Best practices (9 practices with examples)
- ✅ Failure cases (11 cases with solutions)

### Knowledge Graph

- ✅ 5 core patterns registered
- ✅ 1 anti-pattern registered
- ✅ 2 best practices registered
- ✅ Implementation references linked
- ✅ Queryable for future research

---

## Recommendations for Next Steps

### For Teams Using Luzia

1. **Review** the 5 critical patterns in `AGENT-CLI-PATTERNS.md`
2. **Adopt** context-first prompting for all new agents
3. **Use** provided templates for common tasks
4. **Share** findings with team members
5. **Monitor** agent quality (should rarely ask questions)

### For Claude Development

1. **Consider** guidance on when to skip `AskUserQuestion`
2. **Document** permission bypass mode in official docs
3. **Add** examples of async prompt patterns
4. **Build** wrapper for common agent patterns

### For Future Research

1. **Study** efficiency of file-based IPC vs other patterns
2. **Measure** success rate of context-first prompts
3. **Compare** blocking duration in different scenarios
4. **Document** framework for other orchestrators

---

## Conclusion

Autonomous agents don't require complex async prompting systems. Instead, they require:

1. **Clear Architecture** - Detached processes, permission bypass, file IPC
2. **Good Prompts** - Specific, complete context, clear success criteria
3. **Exit Code Signaling** - Status persisted in files for async queries

Luzia implements all three. The findings in these documents provide patterns and best practices for anyone building autonomous agents with Claude.

**Key Insight:** The best way to avoid blocking is to design prompts that don't require user input. Luzia's architecture makes this pattern safe and scalable.

---

## Files Delivered

1. `/opt/server-agents/orchestrator/AGENT-AUTONOMY-RESEARCH.md` (12 sections, ~3000 lines)
2. `/opt/server-agents/orchestrator/AGENT-CLI-PATTERNS.md` (practical patterns, ~800 lines)
3. `/opt/server-agents/orchestrator/AUTONOMOUS-AGENT-TEMPLATES.md` (6 templates, ~900 lines)
4. `/opt/server-agents/orchestrator/RESEARCH-SUMMARY.md` (this file)

**Total:** ~4,700 lines of documentation

---

## Stored in Knowledge Graph

Relations created in `/etc/zen-swarm/memory/projects.db`:
- 5 core patterns documented
- 1 anti-pattern documented
- 2 best practices documented
- Architecture implementation linked

**Queryable:** Via `mcp__shared-projects-memory__*` tools

---

**Research completed by:** Claude Agent (Haiku)
**Research date:** 2026-01-09
**Status:** Ready for team adoption