Claude Dispatch and Monitor Flow Analysis
Date: 2026-01-11 Author: Luzia Research Agent Status: Complete
Executive Summary
Luzia dispatches Claude tasks using a fully autonomous, non-blocking pattern. The current architecture intentionally does not support human-in-the-loop interaction for background agents. This analysis documents the current flow, identifies pain points, and researches potential improvements for scenarios where user input is needed.
Part 1: Current Dispatch Flow
1.1 Task Dispatch Mechanism
Entry Point: spawn_claude_agent() in /opt/server-agents/orchestrator/bin/luzia:1102-1401
Flow:
User runs: luzia <project> <task>
↓
1. Permission check (project access validation)
↓
2. QA Preflight checks (optional, validates task)
↓
3. Job directory created: /var/log/luz-orchestrator/jobs/{job_id}/
├── prompt.txt (task + context)
├── run.sh (shell script with env setup)
├── meta.json (job metadata)
└── output.log (will capture output)
↓
4. Shell script generated with:
- User-specific TMPDIR to avoid /tmp collisions
- HOME set to target user
- stdbuf for unbuffered output
- tee to capture to output.log
↓
5. Launched via: os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
↓
6. Control returns IMMEDIATELY to CLI (job_id returned)
↓
7. Agent runs in background, detached from parent process
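Steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in spawn_claude_agent(): the helper names create_job and launch_detached are hypothetical, the meta.json fields are invented for the example, and the launch uses subprocess.Popen with start_new_session=True as a swapped-in alternative to the os.system/nohup call shown above.

```python
import json
import os
import stat
import subprocess
import time
from pathlib import Path


def create_job(jobs_root: Path, job_id: str, prompt: str, command: str) -> Path:
    """Create the job directory layout and return the path to run.sh.

    Hypothetical helper: the real run.sh also sets TMPDIR/HOME and uses stdbuf.
    """
    job_dir = jobs_root / job_id
    job_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse a job id

    (job_dir / "prompt.txt").write_text(prompt)
    # The metadata fields below are assumptions made for this sketch.
    (job_dir / "meta.json").write_text(
        json.dumps({"id": job_id, "started": time.time(), "status": "running"})
    )

    output_file = job_dir / "output.log"
    script = job_dir / "run.sh"
    script.write_text(f'#!/bin/bash\n{command} 2>&1 | tee "{output_file}"\n')
    script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return script


def launch_detached(script: Path) -> int:
    """Start run.sh in its own session so the CLI can return immediately."""
    proc = subprocess.Popen(
        [str(script)],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child no longer shares the caller's terminal session
    )
    return proc.pid
```

Because the child has no stdin and runs in a separate session, nothing in this path can prompt the user, which is the property the rest of this analysis builds on.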
1.2 Claude CLI Invocation
Command Line Built:
claude --dangerously-skip-permissions \
--permission-mode bypassPermissions \
--add-dir "{project_path}" \
--add-dir /opt/server-agents \
--print \
--verbose \
-p # Reads prompt from stdin
Critical Flags:
| Flag | Purpose |
|---|---|
| --dangerously-skip-permissions | Required to use bypassPermissions mode |
| --permission-mode bypassPermissions | Skip ALL interactive prompts |
| --print | Non-interactive output mode |
| --verbose | Progress visibility in logs |
| -p | Read prompt from stdin (piped from prompt.txt) |
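One way to assemble this command without quoting mistakes is to build it as an argument list and only render it to a string when it is embedded in run.sh. The sketch below uses exactly the flags listed above; the function name build_claude_cmd is hypothetical and is not taken from the luzia source.

```python
import shlex
from typing import List


def build_claude_cmd(project_path: str) -> List[str]:
    """Return the Claude CLI argument list described in section 1.2."""
    return [
        "claude",
        "--dangerously-skip-permissions",
        "--permission-mode", "bypassPermissions",
        "--add-dir", project_path,
        "--add-dir", "/opt/server-agents",
        "--print",
        "--verbose",
        "-p",
    ]


def render_for_shell(cmd: List[str]) -> str:
    """Quote each argument so the command can be pasted into a shell script."""
    return shlex.join(cmd)
```

Keeping the list form until the last moment means a project path containing spaces is quoted once, by shlex.join, instead of by hand-written string formatting.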
1.3 Output Capture
Run script template:
#!/bin/bash
echo $$ > "{pid_file}"
# Environment setup
export TMPDIR="{user_tmp_dir}"
export HOME="{user_home}"
# Execute with unbuffered output capture
sudo -u {user} bash -c '... cd "{project_path}" && cat "{prompt_file}" | stdbuf -oL -eL {claude_cmd}' 2>&1 | tee "{output_file}"
exit_code=${PIPESTATUS[0]}
echo "" >> "{output_file}"
echo "exit:$exit_code" >> "{output_file}"
Part 2: Output Monitoring Flow
2.1 Status Checking
Function: get_job_status() at line 1404
Status is determined by:
- Reading output.log for the exit: line at the end
- Checking whether the process is still running (via PID)
- Updating meta.json with completion time metrics
Status Values:
- running - No exit code yet, PID may still be active
- completed - exit:0 found
- failed - exit:non-zero found
- killed - exit:-9 or manual kill detected
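The status rules above can be expressed as a small pure function. This is a sketch, not the body of get_job_status(): the function names are hypothetical, and treating "no exit line and the PID is gone" as killed is an assumption about what "manual kill detected" means.

```python
import re
from typing import Optional

# Matches the trailer that run.sh appends, e.g. "exit:0" or "exit:-9".
EXIT_LINE = re.compile(r"^exit:(-?\d+)\s*$")


def read_exit_code(log_text: str) -> Optional[int]:
    """Return the exit code from the last non-empty line, or None if absent."""
    for line in reversed(log_text.splitlines()):
        if not line.strip():
            continue
        match = EXIT_LINE.match(line.strip())
        return int(match.group(1)) if match else None
    return None


def job_status(log_text: str, pid_alive: bool) -> str:
    """Map the log trailer and PID liveness to a status value."""
    code = read_exit_code(log_text)
    if code is None:
        # Assumption: a dead process that never wrote an exit line was killed.
        return "running" if pid_alive else "killed"
    if code == 0:
        return "completed"
    if code == -9:
        return "killed"
    return "failed"
```

Only the last non-empty line is inspected, so agent output that happens to contain "exit:" earlier in the log does not affect the result.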
2.2 User Monitoring Commands
# List all jobs
luzia jobs
# Show specific job status
luzia jobs {job_id}
# View job output
luzia logs {job_id}
# Show with timing details
luzia jobs --timing
2.3 Notification Flow
On completion, the run script appends to notification log:
echo "[$(date +%H:%M:%S)] Agent {job_id} finished (exit $exit_code)" >> /var/log/luz-orchestrator/notifications.log
This allows external monitoring via:
tail -f /var/log/luz-orchestrator/notifications.log
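A monitoring script that consumes this log needs to turn each line back into structured data. The following sketch parses the line format produced by the echo command above; the function name parse_notification is hypothetical, and the job id is matched as any run of non-whitespace characters because the id format is not specified here.

```python
import re
from typing import Optional, Tuple

# "[HH:MM:SS] Agent <job_id> finished (exit <code>)"
NOTIFICATION = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] Agent (\S+) finished \(exit (-?\d+)\)$"
)


def parse_notification(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (time, job_id, exit_code), or None for lines in another format."""
    match = NOTIFICATION.match(line.strip())
    if match is None:
        return None
    timestamp, job_id, code = match.groups()
    return timestamp, job_id, int(code)
```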
Part 3: Current User Interaction Handling
3.1 The Problem: No Interaction Supported
Current Design: Background agents cannot receive user input.
Why:
- nohup detaches from the terminal - stdin unavailable
- --permission-mode bypassPermissions skips prompts
- No mechanism exists to pause agent and wait for input
- Output is captured to file, not interactive terminal
3.2 When Claude Would Ask Questions
Claude's AskUserQuestion tool would block waiting for stdin, which isn't available. Current mitigations:
- Context-First Design - Prompts include all necessary context
- Pre-Authorization - Permissions granted upfront
- Structured Tasks - Clear success criteria reduce ambiguity
- Exit Code Signaling - Agent exits with code 1 if unable to proceed
3.3 Current Pain Points
| Pain Point | Impact | Current Workaround |
|---|---|---|
| Agent can't ask clarifying questions | May proceed with wrong assumptions | Write detailed prompts |
| User can't provide mid-task guidance | Task might fail when adjustments needed | Retry with modified task |
| No approval workflow for risky actions | Security relies on upfront authorization | Careful permission scoping |
| Long tasks give no progress updates | User doesn't know if task is stuck | Check output.log manually |
| AskUserQuestion blocks indefinitely | Agent hangs, appears as "running" forever | Must kill and retry |
Part 4: Research on Interaction Improvements
4.1 Pattern: File-Based Clarification Queue
Concept: Agent writes questions to file, waits for answer file.
/var/log/luz-orchestrator/jobs/{job_id}/
├── clarification.json # Agent writes question
├── response.json # User writes answer
└── output.log # Agent logs waiting status
Agent Behavior:
import json
import time
from pathlib import Path

def ask_for_clarification(question: dict) -> str:
    """Write the question, then poll for response.json until the timeout."""
    Path("clarification.json").write_text(json.dumps(question))
    # Wait for response (poll once per second)
    for _ in range(question["timeout_minutes"] * 60):
        if Path("response.json").exists():
            response = json.loads(Path("response.json").read_text())
            return response["choice"]
        time.sleep(1)
    # Timeout - use default
    return question["default_if_timeout"]

# Agent encounters ambiguity
question = {
    "type": "choice",
    "question": "Which database: production or staging?",
    "options": ["production", "staging"],
    "timeout_minutes": 30,
    "default_if_timeout": "staging"
}
choice = ask_for_clarification(question)
User Side:
# List pending questions
luzia questions
# Answer a question
luzia answer {job_id} staging
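The user-side commands could be backed by two small functions over the job directories. This is a sketch of the proposed behavior, since neither luzia questions nor luzia answer exists yet; the function names are hypothetical, and the response format {"choice": ...} is assumed to match what the polling agent reads.

```python
import json
import os
from pathlib import Path
from typing import Dict


def pending_questions(jobs_root: Path) -> Dict[str, dict]:
    """Return {job_id: question} for jobs with a question and no response yet."""
    pending = {}
    for question_file in sorted(jobs_root.glob("*/clarification.json")):
        if not (question_file.parent / "response.json").exists():
            pending[question_file.parent.name] = json.loads(question_file.read_text())
    return pending


def answer(jobs_root: Path, job_id: str, choice: str) -> None:
    """Validate the choice and publish response.json atomically."""
    job_dir = jobs_root / job_id
    question = json.loads((job_dir / "clarification.json").read_text())
    options = question.get("options")
    if options and choice not in options:
        raise ValueError(f"{choice!r} is not one of {options}")

    # Write to a temp file first so the polling agent never reads partial JSON.
    tmp = job_dir / "response.json.tmp"
    tmp.write_text(json.dumps({"choice": choice}))
    os.replace(tmp, job_dir / "response.json")
```

The rename via os.replace matters because the agent polls for the file's existence: with a direct write, it could open response.json between creation and the end of the write.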
4.2 Pattern: WebSocket Status Bridge
Concept: Real-time bidirectional communication via WebSocket.
User Browser ←→ Luzia Status Server ←→ Agent Process
↑
/var/lib/luzia/status.sock
Implementation in Existing Code:
lib/luzia_status_integration.py already has a status publisher framework that could be extended.
Flow:
- Agent publishes status updates to socket
- Status server broadcasts to connected clients
- When question arises, server notifies all clients
- User responds via web UI or CLI
- Response routed back to agent
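The routing logic in this flow can be modeled in-process before any socket code is written. The sketch below is an assumption-heavy stand-in: StatusHub and its methods are hypothetical names, it uses asyncio queues instead of a WebSocket or /var/lib/luzia/status.sock, and it does not reflect the existing code in lib/luzia_status_integration.py.

```python
import asyncio
from typing import Dict, List


class StatusHub:
    """In-memory model of the status server: broadcast out, route answers back."""

    def __init__(self) -> None:
        self._subscribers: List[asyncio.Queue] = []
        self._pending: Dict[str, asyncio.Future] = {}

    def subscribe(self) -> asyncio.Queue:
        """Register a client; it receives every event published afterwards."""
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    def publish(self, job_id: str, kind: str, text: str) -> None:
        """Broadcast a status update or question to all connected clients."""
        for queue in self._subscribers:
            queue.put_nowait({"job_id": job_id, "kind": kind, "text": text})

    async def ask(self, job_id: str, question: str) -> str:
        """Called by the agent: notify clients, then wait for a response."""
        future = asyncio.get_running_loop().create_future()
        self._pending[job_id] = future
        self.publish(job_id, "question", question)
        try:
            return await future
        finally:
            self._pending.pop(job_id, None)

    def respond(self, job_id: str, answer: str) -> bool:
        """Called by a client: deliver the answer to the waiting agent."""
        future = self._pending.get(job_id)
        if future is None or future.done():
            return False  # nobody is waiting on this job
        future.set_result(answer)
        return True
```

A real bridge would replace the queues with network connections, but the two invariants shown here would still apply: every client sees every question, and only the first response for a job is delivered.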
4.3 Pattern: Telegram/Chat Integration
Existing: /opt/server-agents/mcp-servers/assistant-channel/ provides Telegram integration.
Extended for Agent Questions:
# Agent needs input
channel_query(
sender=f"agent-{job_id}",
question="Should I update production database?",
context="Running migration task for musica project"
)
# Bruno responds via Telegram
# Response delivered to agent via file or status channel
4.4 Pattern: Approval Gates
Concept: Pre-define checkpoints where agent must wait for approval.
# In task prompt
"""
## Approval Gates
- Before running migrations: await approval
- Before deleting files: await approval
- Before modifying production config: await approval
Write to approval.json when reaching a gate. Wait for approved.json.
"""
Gate File:
{
"gate": "database_migration",
"description": "About to run 3 migrations on staging DB",
"awaiting_since": "2026-01-11T14:30:00Z",
"auto_approve_after_minutes": null
}
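If the gate were enforced by a helper rather than by prompt instructions alone, it might look like the sketch below. The function name wait_for_approval and its timeout parameters are hypothetical; the file names and gate fields follow the example above. On timeout this version refuses to proceed, which is an assumed fail-closed policy rather than something the design above specifies.

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path


def wait_for_approval(
    job_dir: Path,
    gate: str,
    description: str,
    timeout_s: float = 1800.0,
    poll_s: float = 1.0,
) -> bool:
    """Write approval.json, then block until approved.json appears or time runs out."""
    request = {
        "gate": gate,
        "description": description,
        "awaiting_since": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "auto_approve_after_minutes": None,  # no auto-approval in this sketch
    }
    (job_dir / "approval.json").write_text(json.dumps(request, indent=2))

    deadline = time.monotonic() + timeout_s
    while True:
        if (job_dir / "approved.json").exists():
            return True
        if time.monotonic() >= deadline:
            return False  # fail closed: no approval means do not proceed
        time.sleep(poll_s)
```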
4.5 Pattern: Interactive Mode Flag
Concept: Allow foreground execution when user is present.
# Background (current default)
luzia musica "run tests"
# Foreground/Interactive
luzia musica "run tests" --fg
# Interactive session (already exists)
luzia work on musica
The --fg flag already exists but doesn't fully support interactive Q&A. Enhancement needed:
- Don't detach process
- Keep stdin connected
- Allow Claude's AskUserQuestion to work normally
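The first two points amount to running the child without redirecting its standard streams. A minimal sketch, assuming a hypothetical run_foreground helper; whether AskUserQuestion then behaves interactively also depends on the Claude CLI flags in use, which this sketch does not address.

```python
import subprocess
from typing import List


def run_foreground(cmd: List[str]) -> int:
    """Run the command attached to the caller's terminal and return its exit code.

    No stdin/stdout/stderr arguments are passed, so the child inherits the
    caller's streams and can read what the user types.
    """
    completed = subprocess.run(cmd)
    return completed.returncode
```

Unlike the background path, this call blocks until the agent exits, so the CLI can report the exit code directly instead of reading it from output.log.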
Part 5: Recommendations
5.1 Short-Term (Quick Wins)
- Better Exit Code Semantics
  - Exit 100 = "needs clarification" (new code)
  - Capture the question in clarification.json
  - luzia questions command to list pending
- Enhanced --fg Mode
  - Don't background the process
  - Keep stdin/stdout connected
  - Allow normal interactive Claude session
- Progress Streaming
  - Add luzia watch {job_id} for tail -f on output.log
  - Color-coded output for better readability
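The proposed luzia watch command could be implemented as a generator that follows output.log and stops at the exit: trailer written by run.sh. This is a sketch under assumptions: the function name watch and the polling interval are hypothetical, and any output line that itself starts with "exit:" would end the stream early.

```python
import time
from pathlib import Path
from typing import Iterator


def watch(log_path: Path, poll_s: float = 0.5) -> Iterator[str]:
    """Yield complete lines as they are appended; stop after the exit: line."""
    with log_path.open("r", errors="replace") as handle:
        buffer = ""
        while True:
            chunk = handle.readline()
            if not chunk:
                time.sleep(poll_s)  # nothing new yet; wait for the writer
                continue
            buffer += chunk
            if not buffer.endswith("\n"):
                continue  # partial line; the rest arrives on a later read
            line, buffer = buffer.rstrip("\n"), ""
            yield line
            if line.startswith("exit:"):
                return
```

Stopping at the trailer lets the command terminate on its own when the job finishes, which a plain tail -f does not do.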
5.2 Medium-Term (New Features)
- File-Based Clarification System
  - Agent writes to clarification.json
  - Luzia CLI watches for pending questions
  - luzia answer {job_id} <response> writes response.json
  - Agent polls and continues
- Telegram/Chat Bridge for Questions
  - Extend assistant-channel for agent questions
  - Push notification when agent needs input
  - Reply via chat, response routed to agent
- Status Dashboard
  - Web UI showing all running agents
  - Real-time output streaming
  - Question/response interface
5.3 Long-Term (Architecture Evolution)
- Approval Workflows
  - Define approval gates in task specification
  - Configurable auto-approve timeouts
  - Audit log of approvals
- Agent Orchestration Layer
  - Queue of pending questions across agents
  - Priority handling for urgent questions
  - SLA tracking for response times
- Hybrid Execution Mode
  - Start background, attach to foreground if question arises
  - Agent sends signal when needing input
  - CLI can "attach" to running agent
Part 6: Implementation Priority
| Priority | Feature | Effort | Impact |
|---|---|---|---|
| P0 | Better --fg mode | Low | High - enables immediate interactive use |
| P0 | Exit code 100 for clarification | Low | Medium - better failure understanding |
| P1 | luzia watch {job_id} | Low | Medium - easier monitoring |
| P1 | File-based clarification | Medium | High - enables async Q&A |
| P2 | Telegram question bridge | Medium | Medium - mobile notification |
| P2 | Status dashboard | High | High - visual monitoring |
| P3 | Approval workflows | High | Medium - enterprise feature |
Conclusion
The current Luzia dispatch architecture is optimized for fully autonomous agent execution. This is the right default for background tasks. However, there's a gap for scenarios where:
- Tasks are inherently ambiguous
- User guidance is needed mid-task
- High-stakes actions require approval
The recommended path forward is:
- Improve --fg mode for true interactive sessions
- Add file-based clarification for async Q&A on background tasks
- Integrate with Telegram for push notifications on questions
- Build status dashboard for visual monitoring and interaction
These improvements maintain the autonomous-by-default philosophy while enabling human-in-the-loop interaction when needed.
Appendix: Key File Locations
| File | Purpose |
|---|---|
| /opt/server-agents/orchestrator/bin/luzia:1102-1401 | spawn_claude_agent() - main dispatch |
| /opt/server-agents/orchestrator/bin/luzia:1404-1449 | get_job_status() - status checking |
| /opt/server-agents/orchestrator/bin/luzia:4000-4042 | route_logs() - log viewing |
| /opt/server-agents/orchestrator/lib/responsive_dispatcher.py | Async dispatch patterns |
| /opt/server-agents/orchestrator/lib/cli_feedback.py | CLI output formatting |
| /opt/server-agents/orchestrator/AGENT-AUTONOMY-RESEARCH.md | Prior research on autonomy |
| /var/log/luz-orchestrator/jobs/ | Job directories |
| /var/log/luz-orchestrator/notifications.log | Completion notifications |