Files

admin ec33ac1936 Refactor cockpit to use DockerTmuxController pattern

Based on claude-code-tools TmuxCLIController, this refactor:

- Added DockerTmuxController class for robust tmux session management
- Implements send_keys() with configurable delay_enter
- Implements capture_pane() for output retrieval
- Implements wait_for_prompt() for pattern-based completion detection
- Implements wait_for_idle() for content-hash-based idle detection
- Implements wait_for_shell_prompt() for shell prompt detection

Also includes workflow improvements:
- Pre-task git snapshot before agent execution
- Post-task commit protocol in agent guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-14 10:42:16 -03:00

12 KiB

Raw Blame History

Claude Dispatch and Monitor Flow Analysis

Date: 2026-01-11 Author: Luzia Research Agent Status: Complete

Executive Summary

Luzia dispatches Claude tasks using a fully autonomous, non-blocking pattern. The current architecture intentionally does not support human-in-the-loop interaction for background agents. This analysis documents the current flow, identifies pain points, and researches potential improvements for scenarios where user input is needed.

Part 1: Current Dispatch Flow

1.1 Task Dispatch Mechanism

Entry Point: spawn_claude_agent() in /opt/server-agents/orchestrator/bin/luzia:1102-1401

Flow:

User runs: luzia <project> <task>
                ↓
1. Permission check (project access validation)
                ↓
2. QA Preflight checks (optional, validates task)
                ↓
3. Job directory created: /var/log/luz-orchestrator/jobs/{job_id}/
   ├── prompt.txt     (task + context)
   ├── run.sh         (shell script with env setup)
   ├── meta.json      (job metadata)
   └── output.log     (will capture output)
                ↓
4. Shell script generated with:
   - User-specific TMPDIR to avoid /tmp collisions
   - HOME set to target user
   - stdbuf for unbuffered output
   - tee to capture to output.log
                ↓
5. Launched via: os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
                ↓
6. Control returns IMMEDIATELY to CLI (job_id returned)
                ↓
7. Agent runs in background, detached from parent process

1.2 Claude CLI Invocation

Command Line Built:

claude --dangerously-skip-permissions \
       --permission-mode bypassPermissions \
       --add-dir "{project_path}" \
       --add-dir /opt/server-agents \
       --print \
       --verbose \
       -p  # Reads prompt from stdin

Critical Flags:

Flag	Purpose
`--dangerously-skip-permissions`	Required to use bypassPermissions mode
`--permission-mode bypassPermissions`	Skip ALL interactive prompts
`--print`	Non-interactive output mode
`--verbose`	Progress visibility in logs
`-p`	Read prompt from stdin (piped from prompt.txt)

1.3 Output Capture

Run script template:

#!/bin/bash
echo $$ > "{pid_file}"

# Environment setup
export TMPDIR="{user_tmp_dir}"
export HOME="{user_home}"

# Execute with unbuffered output capture
sudo -u {user} bash -c '... cd "{project_path}" && cat "{prompt_file}" | stdbuf -oL -eL {claude_cmd}' 2>&1 | tee "{output_file}"

exit_code=${PIPESTATUS[0]}
echo "" >> "{output_file}"
echo "exit:$exit_code" >> "{output_file}"

Part 2: Output Monitoring Flow

2.1 Status Checking

Function: get_job_status() at line 1404

Status is determined by:

Reading output.log for exit: line at end
Checking if process is still running (via PID)
Updating meta.json with completion time metrics

Status Values:

running - No exit code yet, PID may still be active
completed - exit:0 found
failed - exit:non-zero found
killed - exit:-9 or manual kill detected

2.2 User Monitoring Commands

# List all jobs
luzia jobs

# Show specific job status
luzia jobs {job_id}

# View job output
luzia logs {job_id}

# Show with timing details
luzia jobs --timing

2.3 Notification Flow

On completion, the run script appends to notification log:

echo "[$(date +%H:%M:%S)] Agent {job_id} finished (exit $exit_code)" >> /var/log/luz-orchestrator/notifications.log

This allows external monitoring via:

tail -f /var/log/luz-orchestrator/notifications.log

Part 3: Current User Interaction Handling

3.1 The Problem: No Interaction Supported

Current Design: Background agents cannot receive user input.

Why:

nohup detaches from terminal - stdin unavailable
--permission-mode bypassPermissions skips prompts
No mechanism exists to pause agent and wait for input
Output is captured to file, not interactive terminal

3.2 When Claude Would Ask Questions

Claude's AskUserQuestion tool would block waiting for stdin, which isn't available. Current mitigations:

Context-First Design - Prompts include all necessary context
Pre-Authorization - Permissions granted upfront
Structured Tasks - Clear success criteria reduce ambiguity
Exit Code Signaling - Agent exits with code 1 if unable to proceed

3.3 Current Pain Points

Pain Point	Impact	Current Workaround
Agent can't ask clarifying questions	May proceed with wrong assumptions	Write detailed prompts
User can't provide mid-task guidance	Task might fail when adjustments needed	Retry with modified task
No approval workflow for risky actions	Security relies on upfront authorization	Careful permission scoping
Long tasks give no progress updates	User doesn't know if task is stuck	Check output.log manually
AskUserQuestion blocks indefinitely	Agent hangs, appears as "running" forever	Must kill and retry

Part 4: Research on Interaction Improvements

4.1 Pattern: File-Based Clarification Queue

Concept: Agent writes questions to file, waits for answer file.

/var/log/luz-orchestrator/jobs/{job_id}/
├── clarification.json     # Agent writes question
├── response.json          # User writes answer
└── output.log             # Agent logs waiting status

Agent Behavior:

# Agent encounters ambiguity
question = {
    "type": "choice",
    "question": "Which database: production or staging?",
    "options": ["production", "staging"],
    "timeout_minutes": 30,
    "default_if_timeout": "staging"
}
Path("clarification.json").write_text(json.dumps(question))

# Wait for response (polling)
for _ in range(timeout * 60):
    if Path("response.json").exists():
        response = json.loads(Path("response.json").read_text())
        return response["choice"]
    time.sleep(1)

# Timeout - use default
return question["default_if_timeout"]

User Side:

# List pending questions
luzia questions

# Answer a question
luzia answer {job_id} staging

4.2 Pattern: WebSocket Status Bridge

Concept: Real-time bidirectional communication via WebSocket.

User Browser ←→ Luzia Status Server ←→ Agent Process
                     ↑
              /var/lib/luzia/status.sock

Implementation in Existing Code: lib/luzia_status_integration.py already has a status publisher framework that could be extended.

Flow:

Agent publishes status updates to socket
Status server broadcasts to connected clients
When question arises, server notifies all clients
User responds via web UI or CLI
Response routed back to agent

4.3 Pattern: Telegram/Chat Integration

Existing: /opt/server-agents/mcp-servers/assistant-channel/ provides Telegram integration.

Extended for Agent Questions:

# Agent needs input
channel_query(
    sender=f"agent-{job_id}",
    question="Should I update production database?",
    context="Running migration task for musica project"
)

# Bruno responds via Telegram
# Response delivered to agent via file or status channel

4.4 Pattern: Approval Gates

Concept: Pre-define checkpoints where agent must wait for approval.

# In task prompt
"""
## Approval Gates
- Before running migrations: await approval
- Before deleting files: await approval
- Before modifying production config: await approval

Write to approval.json when reaching a gate. Wait for approved.json.
"""

Gate File:

{
    "gate": "database_migration",
    "description": "About to run 3 migrations on staging DB",
    "awaiting_since": "2026-01-11T14:30:00Z",
    "auto_approve_after_minutes": null
}

4.5 Pattern: Interactive Mode Flag

Concept: Allow foreground execution when user is present.

# Background (current default)
luzia musica "run tests"

# Foreground/Interactive
luzia musica "run tests" --fg

# Interactive session (already exists)
luzia work on musica

The --fg flag already exists but doesn't fully support interactive Q&A. Enhancement needed:

Don't detach process
Keep stdin connected
Allow Claude's AskUserQuestion to work normally

Part 5: Recommendations

5.1 Short-Term (Quick Wins)

Better Exit Code Semantics
- Exit 100 = "needs clarification" (new code)
- Capture the question in clarification.json
- luzia questions command to list pending
Enhanced --fg Mode
- Don't background the process
- Keep stdin/stdout connected
- Allow normal interactive Claude session
Progress Streaming
- Add luzia watch {job_id} for tail -f on output.log
- Color-coded output for better readability

5.2 Medium-Term (New Features)

File-Based Clarification System
- Agent writes to clarification.json
- Luzia CLI watches for pending questions
- luzia answer {job_id} <response> writes response.json
- Agent polls and continues
Telegram/Chat Bridge for Questions
- Extend assistant-channel for agent questions
- Push notification when agent needs input
- Reply via chat, response routed to agent
Status Dashboard
- Web UI showing all running agents
- Real-time output streaming
- Question/response interface

5.3 Long-Term (Architecture Evolution)

Approval Workflows
- Define approval gates in task specification
- Configurable auto-approve timeouts
- Audit log of approvals
Agent Orchestration Layer
- Queue of pending questions across agents
- Priority handling for urgent questions
- SLA tracking for response times
Hybrid Execution Mode
- Start background, attach to foreground if question arises
- Agent sends signal when needing input
- CLI can "attach" to running agent

Part 6: Implementation Priority

Priority	Feature	Effort	Impact
P0	Better `--fg` mode	Low	High - enables immediate interactive use
P0	Exit code 100 for clarification	Low	Medium - better failure understanding
P1	`luzia watch {job_id}`	Low	Medium - easier monitoring
P1	File-based clarification	Medium	High - enables async Q&A
P2	Telegram question bridge	Medium	Medium - mobile notification
P2	Status dashboard	High	High - visual monitoring
P3	Approval workflows	High	Medium - enterprise feature

Conclusion

The current Luzia dispatch architecture is optimized for fully autonomous agent execution. This is the right default for background tasks. However, there's a gap for scenarios where:

Tasks are inherently ambiguous
User guidance is needed mid-task
High-stakes actions require approval

The recommended path forward is:

Improve --fg mode for true interactive sessions
Add file-based clarification for async Q&A on background tasks
Integrate with Telegram for push notifications on questions
Build status dashboard for visual monitoring and interaction

These improvements maintain the autonomous-by-default philosophy while enabling human-in-the-loop interaction when needed.

Appendix: Key File Locations

File	Purpose
`/opt/server-agents/orchestrator/bin/luzia:1102-1401`	`spawn_claude_agent()` - main dispatch
`/opt/server-agents/orchestrator/bin/luzia:1404-1449`	`get_job_status()` - status checking
`/opt/server-agents/orchestrator/bin/luzia:4000-4042`	`route_logs()` - log viewing
`/opt/server-agents/orchestrator/lib/responsive_dispatcher.py`	Async dispatch patterns
`/opt/server-agents/orchestrator/lib/cli_feedback.py`	CLI output formatting
`/opt/server-agents/orchestrator/AGENT-AUTONOMY-RESEARCH.md`	Prior research on autonomy
`/var/log/luz-orchestrator/jobs/`	Job directories
`/var/log/luz-orchestrator/notifications.log`	Completion notifications

12 KiB Raw Blame History