Based on claude-code-tools TmuxCLIController, this refactor: - Added DockerTmuxController class for robust tmux session management - Implements send_keys() with configurable delay_enter - Implements capture_pane() for output retrieval - Implements wait_for_prompt() for pattern-based completion detection - Implements wait_for_idle() for content-hash-based idle detection - Implements wait_for_shell_prompt() for shell prompt detection Also includes workflow improvements: - Pre-task git snapshot before agent execution - Post-task commit protocol in agent guidelines Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
399 lines
12 KiB
Markdown
399 lines
12 KiB
Markdown
# Claude Dispatch and Monitor Flow Analysis
|
|
|
|
**Date:** 2026-01-11
|
|
**Author:** Luzia Research Agent
|
|
**Status:** Complete
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
Luzia dispatches Claude tasks using a **fully autonomous, non-blocking pattern**. The current architecture intentionally **does not support human-in-the-loop interaction** for background agents. This analysis documents the current flow, identifies pain points, and researches potential improvements for scenarios where user input is needed.
|
|
|
|
---
|
|
|
|
## Part 1: Current Dispatch Flow
|
|
|
|
### 1.1 Task Dispatch Mechanism
|
|
|
|
**Entry Point:** `spawn_claude_agent()` in `/opt/server-agents/orchestrator/bin/luzia:1102-1401`
|
|
|
|
**Flow:**
|
|
```
|
|
User runs: luzia <project> <task>
|
|
↓
|
|
1. Permission check (project access validation)
|
|
↓
|
|
2. QA Preflight checks (optional, validates task)
|
|
↓
|
|
3. Job directory created: /var/log/luz-orchestrator/jobs/{job_id}/
|
|
├── prompt.txt (task + context)
|
|
├── run.sh (shell script with env setup)
|
|
├── meta.json (job metadata)
|
|
└── output.log (will capture output)
|
|
↓
|
|
4. Shell script generated with:
|
|
- User-specific TMPDIR to avoid /tmp collisions
|
|
- HOME set to target user
|
|
- stdbuf for unbuffered output
|
|
- tee to capture to output.log
|
|
↓
|
|
5. Launched via: os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
|
|
↓
|
|
6. Control returns IMMEDIATELY to CLI (job_id returned)
|
|
↓
|
|
7. Agent runs in background, detached from parent process
|
|
```
|
|
|
|
### 1.2 Claude CLI Invocation
|
|
|
|
**Command Line Built:**
|
|
```bash
|
|
claude --dangerously-skip-permissions \
|
|
--permission-mode bypassPermissions \
|
|
--add-dir "{project_path}" \
|
|
--add-dir /opt/server-agents \
|
|
--print \
|
|
--verbose \
|
|
-p # Reads prompt from stdin
|
|
```
|
|
|
|
**Critical Flags:**
|
|
| Flag | Purpose |
|
|
|------|---------|
|
|
| `--dangerously-skip-permissions` | Required to use bypassPermissions mode |
|
|
| `--permission-mode bypassPermissions` | Skip ALL interactive prompts |
|
|
| `--print` | Non-interactive output mode |
|
|
| `--verbose` | Progress visibility in logs |
|
|
| `-p` | Read prompt from stdin (piped from prompt.txt) |
|
|
|
|
### 1.3 Output Capture
|
|
|
|
**Run script template:**
|
|
```bash
|
|
#!/bin/bash
|
|
echo $$ > "{pid_file}"
|
|
|
|
# Environment setup
|
|
export TMPDIR="{user_tmp_dir}"
|
|
export HOME="{user_home}"
|
|
|
|
# Execute with unbuffered output capture
|
|
sudo -u {user} bash -c '... cd "{project_path}" && cat "{prompt_file}" | stdbuf -oL -eL {claude_cmd}' 2>&1 | tee "{output_file}"
|
|
|
|
exit_code=${PIPESTATUS[0]}
|
|
echo "" >> "{output_file}"
|
|
echo "exit:$exit_code" >> "{output_file}"
|
|
```
|
|
|
|
---
|
|
|
|
## Part 2: Output Monitoring Flow
|
|
|
|
### 2.1 Status Checking
|
|
|
|
**Function:** `get_job_status()` at line 1404
|
|
|
|
Status is determined by:
|
|
1. Reading `output.log` for `exit:` line at end
|
|
2. Checking if process is still running (via PID)
|
|
3. Updating `meta.json` with completion time metrics
|
|
|
|
**Status Values:**
|
|
- `running` - No exit code yet, PID may still be active
|
|
- `completed` - `exit:0` found
|
|
- `failed` - `exit:non-zero` found
|
|
- `killed` - `exit:-9` or manual kill detected
|
|
|
|
### 2.2 User Monitoring Commands
|
|
|
|
```bash
|
|
# List all jobs
|
|
luzia jobs
|
|
|
|
# Show specific job status
|
|
luzia jobs {job_id}
|
|
|
|
# View job output
|
|
luzia logs {job_id}
|
|
|
|
# Show with timing details
|
|
luzia jobs --timing
|
|
```
|
|
|
|
### 2.3 Notification Flow
|
|
|
|
On completion, the run script appends to notification log:
|
|
```bash
|
|
echo "[$(date +%H:%M:%S)] Agent {job_id} finished (exit $exit_code)" >> /var/log/luz-orchestrator/notifications.log
|
|
```
|
|
|
|
This allows external monitoring via:
|
|
```bash
|
|
tail -f /var/log/luz-orchestrator/notifications.log
|
|
```
|
|
|
|
---
|
|
|
|
## Part 3: Current User Interaction Handling
|
|
|
|
### 3.1 The Problem: No Interaction Supported
|
|
|
|
**Current Design:** Background agents **cannot** receive user input.
|
|
|
|
**Why:**
|
|
1. `nohup` detaches from terminal - stdin unavailable
|
|
2. `--permission-mode bypassPermissions` skips prompts
|
|
3. No mechanism exists to pause agent and wait for input
|
|
4. Output is captured to file, not interactive terminal
|
|
|
|
### 3.2 When Claude Would Ask Questions
|
|
|
|
Claude's `AskUserQuestion` tool would block waiting for stdin, which isn't available. Current mitigations:
|
|
|
|
1. **Context-First Design** - Prompts include all necessary context
|
|
2. **Pre-Authorization** - Permissions granted upfront
|
|
3. **Structured Tasks** - Clear success criteria reduce ambiguity
|
|
4. **Exit Code Signaling** - Agent exits with code 1 if unable to proceed
|
|
|
|
### 3.3 Current Pain Points
|
|
|
|
| Pain Point | Impact | Current Workaround |
|
|
|------------|--------|-------------------|
|
|
| Agent can't ask clarifying questions | May proceed with wrong assumptions | Write detailed prompts |
|
|
| User can't provide mid-task guidance | Task might fail when adjustments needed | Retry with modified task |
|
|
| No approval workflow for risky actions | Security relies on upfront authorization | Careful permission scoping |
|
|
| Long tasks give no progress updates | User doesn't know if task is stuck | Check output.log manually |
|
|
| AskUserQuestion blocks indefinitely | Agent hangs, appears as "running" forever | Must kill and retry |
|
|
|
|
---
|
|
|
|
## Part 4: Research on Interaction Improvements
|
|
|
|
### 4.1 Pattern: File-Based Clarification Queue
|
|
|
|
**Concept:** Agent writes questions to file, waits for answer file.
|
|
|
|
```
|
|
/var/log/luz-orchestrator/jobs/{job_id}/
|
|
├── clarification.json # Agent writes question
|
|
├── response.json # User writes answer
|
|
└── output.log # Agent logs waiting status
|
|
```
|
|
|
|
**Agent Behavior:**
|
|
```python
|
|
# Agent encounters ambiguity
|
|
question = {
|
|
"type": "choice",
|
|
"question": "Which database: production or staging?",
|
|
"options": ["production", "staging"],
|
|
"timeout_minutes": 30,
|
|
"default_if_timeout": "staging"
|
|
}
|
|
Path("clarification.json").write_text(json.dumps(question))
|
|
|
|
# Wait for response (polling)
|
|
for _ in range(timeout * 60):
|
|
if Path("response.json").exists():
|
|
response = json.loads(Path("response.json").read_text())
|
|
return response["choice"]
|
|
time.sleep(1)
|
|
|
|
# Timeout - use default
|
|
return question["default_if_timeout"]
|
|
```
|
|
|
|
**User Side:**
|
|
```bash
|
|
# List pending questions
|
|
luzia questions
|
|
|
|
# Answer a question
|
|
luzia answer {job_id} staging
|
|
```
|
|
|
|
### 4.2 Pattern: WebSocket Status Bridge
|
|
|
|
**Concept:** Real-time bidirectional communication via WebSocket.
|
|
|
|
```
|
|
User Browser ←→ Luzia Status Server ←→ Agent Process
|
|
↑
|
|
/var/lib/luzia/status.sock
|
|
```
|
|
|
|
**Implementation in Existing Code:**
|
|
`lib/luzia_status_integration.py` already has a status publisher framework that could be extended.
|
|
|
|
**Flow:**
|
|
1. Agent publishes status updates to socket
|
|
2. Status server broadcasts to connected clients
|
|
3. When question arises, server notifies all clients
|
|
4. User responds via web UI or CLI
|
|
5. Response routed back to agent
|
|
|
|
### 4.3 Pattern: Telegram/Chat Integration
|
|
|
|
**Existing:** `/opt/server-agents/mcp-servers/assistant-channel/` provides Telegram integration.
|
|
|
|
**Extended for Agent Questions:**
|
|
```python
|
|
# Agent needs input
|
|
channel_query(
|
|
sender=f"agent-{job_id}",
|
|
question="Should I update production database?",
|
|
context="Running migration task for musica project"
|
|
)
|
|
|
|
# Bruno responds via Telegram
|
|
# Response delivered to agent via file or status channel
|
|
```
|
|
|
|
### 4.4 Pattern: Approval Gates
|
|
|
|
**Concept:** Pre-define checkpoints where agent must wait for approval.
|
|
|
|
```python
|
|
# In task prompt
|
|
"""
|
|
## Approval Gates
|
|
- Before running migrations: await approval
|
|
- Before deleting files: await approval
|
|
- Before modifying production config: await approval
|
|
|
|
Write to approval.json when reaching a gate. Wait for approved.json.
|
|
"""
|
|
```
|
|
|
|
**Gate File:**
|
|
```json
|
|
{
|
|
"gate": "database_migration",
|
|
"description": "About to run 3 migrations on staging DB",
|
|
"awaiting_since": "2026-01-11T14:30:00Z",
|
|
"auto_approve_after_minutes": null
|
|
}
|
|
```
|
|
|
|
### 4.5 Pattern: Interactive Mode Flag
|
|
|
|
**Concept:** Allow foreground execution when user is present.
|
|
|
|
```bash
|
|
# Background (current default)
|
|
luzia musica "run tests"
|
|
|
|
# Foreground/Interactive
|
|
luzia musica "run tests" --fg
|
|
|
|
# Interactive session (already exists)
|
|
luzia work on musica
|
|
```
|
|
|
|
The `--fg` flag already exists but doesn't fully support interactive Q&A. Enhancement needed:
|
|
- Don't detach process
|
|
- Keep stdin connected
|
|
- Allow Claude's AskUserQuestion to work normally
|
|
|
|
---
|
|
|
|
## Part 5: Recommendations
|
|
|
|
### 5.1 Short-Term (Quick Wins)
|
|
|
|
1. **Better Exit Code Semantics**
|
|
- Exit 100 = "needs clarification" (new code)
|
|
- Capture the question in `clarification.json`
|
|
- `luzia questions` command to list pending
|
|
|
|
2. **Enhanced `--fg` Mode**
|
|
- Don't background the process
|
|
- Keep stdin/stdout connected
|
|
- Allow normal interactive Claude session
|
|
|
|
3. **Progress Streaming**
|
|
- Add `luzia watch {job_id}` for `tail -f` on output.log
|
|
- Color-coded output for better readability
|
|
|
|
### 5.2 Medium-Term (New Features)
|
|
|
|
4. **File-Based Clarification System**
|
|
- Agent writes to `clarification.json`
|
|
- Luzia CLI watches for pending questions
|
|
- `luzia answer {job_id} <response>` writes `response.json`
|
|
- Agent polls and continues
|
|
|
|
5. **Telegram/Chat Bridge for Questions**
|
|
- Extend assistant-channel for agent questions
|
|
- Push notification when agent needs input
|
|
- Reply via chat, response routed to agent
|
|
|
|
6. **Status Dashboard**
|
|
- Web UI showing all running agents
|
|
- Real-time output streaming
|
|
- Question/response interface
|
|
|
|
### 5.3 Long-Term (Architecture Evolution)
|
|
|
|
7. **Approval Workflows**
|
|
- Define approval gates in task specification
|
|
- Configurable auto-approve timeouts
|
|
- Audit log of approvals
|
|
|
|
8. **Agent Orchestration Layer**
|
|
- Queue of pending questions across agents
|
|
- Priority handling for urgent questions
|
|
- SLA tracking for response times
|
|
|
|
9. **Hybrid Execution Mode**
|
|
- Start background, attach to foreground if question arises
|
|
- Agent sends signal when needing input
|
|
- CLI can "attach" to running agent
|
|
|
|
---
|
|
|
|
## Part 6: Implementation Priority
|
|
|
|
| Priority | Feature | Effort | Impact |
|
|
|----------|---------|--------|--------|
|
|
| **P0** | Better `--fg` mode | Low | High - enables immediate interactive use |
|
|
| **P0** | Exit code 100 for clarification | Low | Medium - better failure understanding |
|
|
| **P1** | `luzia watch {job_id}` | Low | Medium - easier monitoring |
|
|
| **P1** | File-based clarification | Medium | High - enables async Q&A |
|
|
| **P2** | Telegram question bridge | Medium | Medium - mobile notification |
|
|
| **P2** | Status dashboard | High | High - visual monitoring |
|
|
| **P3** | Approval workflows | High | Medium - enterprise feature |
|
|
|
|
---
|
|
|
|
## Conclusion
|
|
|
|
The current Luzia dispatch architecture is optimized for **fully autonomous** agent execution. This is the right default for background tasks. However, there's a gap for scenarios where:
|
|
- Tasks are inherently ambiguous
|
|
- User guidance is needed mid-task
|
|
- High-stakes actions require approval
|
|
|
|
The recommended path forward is:
|
|
1. **Improve `--fg` mode** for true interactive sessions
|
|
2. **Add file-based clarification** for async Q&A on background tasks
|
|
3. **Integrate with Telegram** for push notifications on questions
|
|
4. **Build status dashboard** for visual monitoring and interaction
|
|
|
|
These improvements maintain the autonomous-by-default philosophy while enabling human-in-the-loop interaction when needed.
|
|
|
|
---
|
|
|
|
## Appendix: Key File Locations
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `/opt/server-agents/orchestrator/bin/luzia:1102-1401` | `spawn_claude_agent()` - main dispatch |
|
|
| `/opt/server-agents/orchestrator/bin/luzia:1404-1449` | `get_job_status()` - status checking |
|
|
| `/opt/server-agents/orchestrator/bin/luzia:4000-4042` | `route_logs()` - log viewing |
|
|
| `/opt/server-agents/orchestrator/lib/responsive_dispatcher.py` | Async dispatch patterns |
|
|
| `/opt/server-agents/orchestrator/lib/cli_feedback.py` | CLI output formatting |
|
|
| `/opt/server-agents/orchestrator/AGENT-AUTONOMY-RESEARCH.md` | Prior research on autonomy |
|
|
| `/var/log/luz-orchestrator/jobs/` | Job directories |
|
|
| `/var/log/luz-orchestrator/notifications.log` | Completion notifications |
|