Files
luzia/docs/CLAUDE-DISPATCH-ANALYSIS.md
admin ec33ac1936 Refactor cockpit to use DockerTmuxController pattern
Based on claude-code-tools TmuxCLIController, this refactor:

- Added DockerTmuxController class for robust tmux session management
- Implements send_keys() with configurable delay_enter
- Implements capture_pane() for output retrieval
- Implements wait_for_prompt() for pattern-based completion detection
- Implements wait_for_idle() for content-hash-based idle detection
- Implements wait_for_shell_prompt() for shell prompt detection

Also includes workflow improvements:
- Pre-task git snapshot before agent execution
- Post-task commit protocol in agent guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 10:42:16 -03:00

12 KiB

Claude Dispatch and Monitor Flow Analysis

Date: 2026-01-11 Author: Luzia Research Agent Status: Complete


Executive Summary

Luzia dispatches Claude tasks using a fully autonomous, non-blocking pattern. The current architecture intentionally does not support human-in-the-loop interaction for background agents. This analysis documents the current flow, identifies pain points, and researches potential improvements for scenarios where user input is needed.


Part 1: Current Dispatch Flow

1.1 Task Dispatch Mechanism

Entry Point: spawn_claude_agent() in /opt/server-agents/orchestrator/bin/luzia:1102-1401

Flow:

User runs: luzia <project> <task>
                ↓
1. Permission check (project access validation)
                ↓
2. QA Preflight checks (optional, validates task)
                ↓
3. Job directory created: /var/log/luz-orchestrator/jobs/{job_id}/
   ├── prompt.txt     (task + context)
   ├── run.sh         (shell script with env setup)
   ├── meta.json      (job metadata)
   └── output.log     (will capture output)
                ↓
4. Shell script generated with:
   - User-specific TMPDIR to avoid /tmp collisions
   - HOME set to target user
   - stdbuf for unbuffered output
   - tee to capture to output.log
                ↓
5. Launched via: os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
                ↓
6. Control returns IMMEDIATELY to CLI (job_id returned)
                ↓
7. Agent runs in background, detached from parent process

1.2 Claude CLI Invocation

Command Line Built:

claude --dangerously-skip-permissions \
       --permission-mode bypassPermissions \
       --add-dir "{project_path}" \
       --add-dir /opt/server-agents \
       --print \
       --verbose \
       -p  # Reads prompt from stdin

Critical Flags:

Flag Purpose
--dangerously-skip-permissions Required to use bypassPermissions mode
--permission-mode bypassPermissions Skip ALL interactive prompts
--print Non-interactive output mode
--verbose Progress visibility in logs
-p Read prompt from stdin (piped from prompt.txt)

1.3 Output Capture

Run script template:

#!/bin/bash
echo $$ > "{pid_file}"

# Environment setup
export TMPDIR="{user_tmp_dir}"
export HOME="{user_home}"

# Execute with unbuffered output capture
sudo -u {user} bash -c '... cd "{project_path}" && cat "{prompt_file}" | stdbuf -oL -eL {claude_cmd}' 2>&1 | tee "{output_file}"

exit_code=${PIPESTATUS[0]}
echo "" >> "{output_file}"
echo "exit:$exit_code" >> "{output_file}"

Part 2: Output Monitoring Flow

2.1 Status Checking

Function: get_job_status() at line 1404

Status is determined by:

  1. Reading output.log for exit: line at end
  2. Checking if process is still running (via PID)
  3. Updating meta.json with completion time metrics

Status Values:

  • running - No exit code yet, PID may still be active
  • completed - exit:0 found
  • failed - exit:non-zero found
  • killed - exit:-9 or manual kill detected

2.2 User Monitoring Commands

# List all jobs
luzia jobs

# Show specific job status
luzia jobs {job_id}

# View job output
luzia logs {job_id}

# Show with timing details
luzia jobs --timing

2.3 Notification Flow

On completion, the run script appends to notification log:

echo "[$(date +%H:%M:%S)] Agent {job_id} finished (exit $exit_code)" >> /var/log/luz-orchestrator/notifications.log

This allows external monitoring via:

tail -f /var/log/luz-orchestrator/notifications.log

Part 3: Current User Interaction Handling

3.1 The Problem: No Interaction Supported

Current Design: Background agents cannot receive user input.

Why:

  1. nohup detaches from terminal - stdin unavailable
  2. --permission-mode bypassPermissions skips prompts
  3. No mechanism exists to pause agent and wait for input
  4. Output is captured to file, not interactive terminal

3.2 When Claude Would Ask Questions

Claude's AskUserQuestion tool would block waiting for stdin, which isn't available. Current mitigations:

  1. Context-First Design - Prompts include all necessary context
  2. Pre-Authorization - Permissions granted upfront
  3. Structured Tasks - Clear success criteria reduce ambiguity
  4. Exit Code Signaling - Agent exits with code 1 if unable to proceed

3.3 Current Pain Points

Pain Point Impact Current Workaround
Agent can't ask clarifying questions May proceed with wrong assumptions Write detailed prompts
User can't provide mid-task guidance Task might fail when adjustments needed Retry with modified task
No approval workflow for risky actions Security relies on upfront authorization Careful permission scoping
Long tasks give no progress updates User doesn't know if task is stuck Check output.log manually
AskUserQuestion blocks indefinitely Agent hangs, appears as "running" forever Must kill and retry

Part 4: Research on Interaction Improvements

4.1 Pattern: File-Based Clarification Queue

Concept: Agent writes questions to file, waits for answer file.

/var/log/luz-orchestrator/jobs/{job_id}/
├── clarification.json     # Agent writes question
├── response.json          # User writes answer
└── output.log             # Agent logs waiting status

Agent Behavior:

# Agent encounters ambiguity
question = {
    "type": "choice",
    "question": "Which database: production or staging?",
    "options": ["production", "staging"],
    "timeout_minutes": 30,
    "default_if_timeout": "staging"
}
Path("clarification.json").write_text(json.dumps(question))

# Wait for response (polling)
for _ in range(timeout * 60):
    if Path("response.json").exists():
        response = json.loads(Path("response.json").read_text())
        return response["choice"]
    time.sleep(1)

# Timeout - use default
return question["default_if_timeout"]

User Side:

# List pending questions
luzia questions

# Answer a question
luzia answer {job_id} staging

4.2 Pattern: WebSocket Status Bridge

Concept: Real-time bidirectional communication via WebSocket.

User Browser ←→ Luzia Status Server ←→ Agent Process
                     ↑
              /var/lib/luzia/status.sock

Implementation in Existing Code: lib/luzia_status_integration.py already has a status publisher framework that could be extended.

Flow:

  1. Agent publishes status updates to socket
  2. Status server broadcasts to connected clients
  3. When question arises, server notifies all clients
  4. User responds via web UI or CLI
  5. Response routed back to agent

4.3 Pattern: Telegram/Chat Integration

Existing: /opt/server-agents/mcp-servers/assistant-channel/ provides Telegram integration.

Extended for Agent Questions:

# Agent needs input
channel_query(
    sender=f"agent-{job_id}",
    question="Should I update production database?",
    context="Running migration task for musica project"
)

# Bruno responds via Telegram
# Response delivered to agent via file or status channel

4.4 Pattern: Approval Gates

Concept: Pre-define checkpoints where agent must wait for approval.

# In task prompt
"""
## Approval Gates
- Before running migrations: await approval
- Before deleting files: await approval
- Before modifying production config: await approval

Write to approval.json when reaching a gate. Wait for approved.json.
"""

Gate File:

{
    "gate": "database_migration",
    "description": "About to run 3 migrations on staging DB",
    "awaiting_since": "2026-01-11T14:30:00Z",
    "auto_approve_after_minutes": null
}

4.5 Pattern: Interactive Mode Flag

Concept: Allow foreground execution when user is present.

# Background (current default)
luzia musica "run tests"

# Foreground/Interactive
luzia musica "run tests" --fg

# Interactive session (already exists)
luzia work on musica

The --fg flag already exists but doesn't fully support interactive Q&A. Enhancement needed:

  • Don't detach process
  • Keep stdin connected
  • Allow Claude's AskUserQuestion to work normally

Part 5: Recommendations

5.1 Short-Term (Quick Wins)

  1. Better Exit Code Semantics

    • Exit 100 = "needs clarification" (new code)
    • Capture the question in clarification.json
    • luzia questions command to list pending
  2. Enhanced --fg Mode

    • Don't background the process
    • Keep stdin/stdout connected
    • Allow normal interactive Claude session
  3. Progress Streaming

    • Add luzia watch {job_id} for tail -f on output.log
    • Color-coded output for better readability

5.2 Medium-Term (New Features)

  1. File-Based Clarification System

    • Agent writes to clarification.json
    • Luzia CLI watches for pending questions
    • luzia answer {job_id} <response> writes response.json
    • Agent polls and continues
  2. Telegram/Chat Bridge for Questions

    • Extend assistant-channel for agent questions
    • Push notification when agent needs input
    • Reply via chat, response routed to agent
  3. Status Dashboard

    • Web UI showing all running agents
    • Real-time output streaming
    • Question/response interface

5.3 Long-Term (Architecture Evolution)

  1. Approval Workflows

    • Define approval gates in task specification
    • Configurable auto-approve timeouts
    • Audit log of approvals
  2. Agent Orchestration Layer

    • Queue of pending questions across agents
    • Priority handling for urgent questions
    • SLA tracking for response times
  3. Hybrid Execution Mode

    • Start background, attach to foreground if question arises
    • Agent sends signal when needing input
    • CLI can "attach" to running agent

Part 6: Implementation Priority

Priority Feature Effort Impact
P0 Better --fg mode Low High - enables immediate interactive use
P0 Exit code 100 for clarification Low Medium - better failure understanding
P1 luzia watch {job_id} Low Medium - easier monitoring
P1 File-based clarification Medium High - enables async Q&A
P2 Telegram question bridge Medium Medium - mobile notification
P2 Status dashboard High High - visual monitoring
P3 Approval workflows High Medium - enterprise feature

Conclusion

The current Luzia dispatch architecture is optimized for fully autonomous agent execution. This is the right default for background tasks. However, there's a gap for scenarios where:

  • Tasks are inherently ambiguous
  • User guidance is needed mid-task
  • High-stakes actions require approval

The recommended path forward is:

  1. Improve --fg mode for true interactive sessions
  2. Add file-based clarification for async Q&A on background tasks
  3. Integrate with Telegram for push notifications on questions
  4. Build status dashboard for visual monitoring and interaction

These improvements maintain the autonomous-by-default philosophy while enabling human-in-the-loop interaction when needed.


Appendix: Key File Locations

File Purpose
/opt/server-agents/orchestrator/bin/luzia:1102-1401 spawn_claude_agent() - main dispatch
/opt/server-agents/orchestrator/bin/luzia:1404-1449 get_job_status() - status checking
/opt/server-agents/orchestrator/bin/luzia:4000-4042 route_logs() - log viewing
/opt/server-agents/orchestrator/lib/responsive_dispatcher.py Async dispatch patterns
/opt/server-agents/orchestrator/lib/cli_feedback.py CLI output formatting
/opt/server-agents/orchestrator/AGENT-AUTONOMY-RESEARCH.md Prior research on autonomy
/var/log/luz-orchestrator/jobs/ Job directories
/var/log/luz-orchestrator/notifications.log Completion notifications