luzia/docs/CLAUDE-DISPATCH-ANALYSIS.md

# Claude Dispatch and Monitor Flow Analysis

**Date:** 2026-01-11
**Author:** Luzia Research Agent
**Status:** Complete

---

## Executive Summary

Luzia dispatches Claude tasks using a **fully autonomous, non-blocking pattern**. The current architecture intentionally **does not support human-in-the-loop interaction** for background agents. This analysis documents the current flow, identifies pain points, and researches potential improvements for scenarios where user input is needed.

---

## Part 1: Current Dispatch Flow

### 1.1 Task Dispatch Mechanism

**Entry Point:** `spawn_claude_agent()` in `/opt/server-agents/orchestrator/bin/luzia:1102-1401`

**Flow:**
```
User runs: luzia <project> <task>
                ↓
1. Permission check (project access validation)
                ↓
2. QA Preflight checks (optional, validates task)
                ↓
3. Job directory created: /var/log/luz-orchestrator/jobs/{job_id}/
   ├── prompt.txt     (task + context)
   ├── run.sh         (shell script with env setup)
   ├── meta.json      (job metadata)
   └── output.log     (will capture output)
                ↓
4. Shell script generated with:
   - User-specific TMPDIR to avoid /tmp collisions
   - HOME set to target user
   - stdbuf for unbuffered output
   - tee to capture to output.log
                ↓
5. Launched via: os.system(f'nohup "{script_file}" >/dev/null 2>&1 &')
                ↓
6. Control returns IMMEDIATELY to CLI (job_id returned)
                ↓
7. Agent runs in background, detached from parent process
```

### 1.2 Claude CLI Invocation

**Command Line Built:**
```bash
claude --dangerously-skip-permissions \
       --permission-mode bypassPermissions \
       --add-dir "{project_path}" \
       --add-dir /opt/server-agents \
       --print \
       --verbose \
       -p  # Reads prompt from stdin
```

**Critical Flags:**
| Flag | Purpose |
|------|---------|
| `--dangerously-skip-permissions` | Required to use bypassPermissions mode |
| `--permission-mode bypassPermissions` | Skip ALL interactive prompts |
| `--print` | Non-interactive output mode |
| `--verbose` | Progress visibility in logs |
| `-p` | Read prompt from stdin (piped from prompt.txt) |

### 1.3 Output Capture

**Run script template:**
```bash
#!/bin/bash
echo $$ > "{pid_file}"

# Environment setup
export TMPDIR="{user_tmp_dir}"
export HOME="{user_home}"

# Execute with unbuffered output capture
sudo -u {user} bash -c '... cd "{project_path}" && cat "{prompt_file}" | stdbuf -oL -eL {claude_cmd}' 2>&1 | tee "{output_file}"

exit_code=${PIPESTATUS[0]}
echo "" >> "{output_file}"
echo "exit:$exit_code" >> "{output_file}"
```

---

## Part 2: Output Monitoring Flow

### 2.1 Status Checking

**Function:** `get_job_status()` at line 1404

Status is determined by:
1. Reading `output.log` for `exit:` line at end
2. Checking if process is still running (via PID)
3. Updating `meta.json` with completion time metrics

**Status Values:**
- `running` - No exit code yet, PID may still be active
- `completed` - `exit:0` found
- `failed` - `exit:non-zero` found
- `killed` - `exit:-9` or manual kill detected

### 2.2 User Monitoring Commands

```bash
# List all jobs
luzia jobs

# Show specific job status
luzia jobs {job_id}

# View job output
luzia logs {job_id}

# Show with timing details
luzia jobs --timing
```

### 2.3 Notification Flow

On completion, the run script appends to notification log:
```bash
echo "[$(date +%H:%M:%S)] Agent {job_id} finished (exit $exit_code)" >> /var/log/luz-orchestrator/notifications.log
```

This allows external monitoring via:
```bash
tail -f /var/log/luz-orchestrator/notifications.log
```

---

## Part 3: Current User Interaction Handling

### 3.1 The Problem: No Interaction Supported

**Current Design:** Background agents **cannot** receive user input.

**Why:**
1. `nohup` detaches from terminal - stdin unavailable
2. `--permission-mode bypassPermissions` skips prompts
3. No mechanism exists to pause agent and wait for input
4. Output is captured to file, not interactive terminal

### 3.2 When Claude Would Ask Questions

Claude's `AskUserQuestion` tool would block waiting for stdin, which isn't available. Current mitigations:

1. **Context-First Design** - Prompts include all necessary context
2. **Pre-Authorization** - Permissions granted upfront
3. **Structured Tasks** - Clear success criteria reduce ambiguity
4. **Exit Code Signaling** - Agent exits with code 1 if unable to proceed

### 3.3 Current Pain Points

| Pain Point | Impact | Current Workaround |
|------------|--------|-------------------|
| Agent can't ask clarifying questions | May proceed with wrong assumptions | Write detailed prompts |
| User can't provide mid-task guidance | Task might fail when adjustments needed | Retry with modified task |
| No approval workflow for risky actions | Security relies on upfront authorization | Careful permission scoping |
| Long tasks give no progress updates | User doesn't know if task is stuck | Check output.log manually |
| AskUserQuestion blocks indefinitely | Agent hangs, appears as "running" forever | Must kill and retry |

---

## Part 4: Research on Interaction Improvements

### 4.1 Pattern: File-Based Clarification Queue

**Concept:** Agent writes questions to file, waits for answer file.

```
/var/log/luz-orchestrator/jobs/{job_id}/
├── clarification.json     # Agent writes question
├── response.json          # User writes answer
└── output.log             # Agent logs waiting status
```

**Agent Behavior:**
```python
# Agent encounters ambiguity
question = {
    "type": "choice",
    "question": "Which database: production or staging?",
    "options": ["production", "staging"],
    "timeout_minutes": 30,
    "default_if_timeout": "staging"
}
Path("clarification.json").write_text(json.dumps(question))

# Wait for response (polling)
for _ in range(timeout * 60):
    if Path("response.json").exists():
        response = json.loads(Path("response.json").read_text())
        return response["choice"]
    time.sleep(1)

# Timeout - use default
return question["default_if_timeout"]
```

**User Side:**
```bash
# List pending questions
luzia questions

# Answer a question
luzia answer {job_id} staging
```

### 4.2 Pattern: WebSocket Status Bridge

**Concept:** Real-time bidirectional communication via WebSocket.

```
User Browser ←→ Luzia Status Server ←→ Agent Process
                     ↑
              /var/lib/luzia/status.sock
```

**Implementation in Existing Code:**
`lib/luzia_status_integration.py` already has a status publisher framework that could be extended.

**Flow:**
1. Agent publishes status updates to socket
2. Status server broadcasts to connected clients
3. When question arises, server notifies all clients
4. User responds via web UI or CLI
5. Response routed back to agent

### 4.3 Pattern: Telegram/Chat Integration

**Existing:** `/opt/server-agents/mcp-servers/assistant-channel/` provides Telegram integration.

**Extended for Agent Questions:**
```python
# Agent needs input
channel_query(
    sender=f"agent-{job_id}",
    question="Should I update production database?",
    context="Running migration task for musica project"
)

# Bruno responds via Telegram
# Response delivered to agent via file or status channel
```

### 4.4 Pattern: Approval Gates

**Concept:** Pre-define checkpoints where agent must wait for approval.

```python
# In task prompt
"""
## Approval Gates
- Before running migrations: await approval
- Before deleting files: await approval
- Before modifying production config: await approval

Write to approval.json when reaching a gate. Wait for approved.json.
"""
```

**Gate File:**
```json
{
    "gate": "database_migration",
    "description": "About to run 3 migrations on staging DB",
    "awaiting_since": "2026-01-11T14:30:00Z",
    "auto_approve_after_minutes": null
}
```

### 4.5 Pattern: Interactive Mode Flag

**Concept:** Allow foreground execution when user is present.

```bash
# Background (current default)
luzia musica "run tests"

# Foreground/Interactive
luzia musica "run tests" --fg

# Interactive session (already exists)
luzia work on musica
```

The `--fg` flag already exists but doesn't fully support interactive Q&A. Enhancement needed:
- Don't detach process
- Keep stdin connected
- Allow Claude's AskUserQuestion to work normally

---

## Part 5: Recommendations

### 5.1 Short-Term (Quick Wins)

1. **Better Exit Code Semantics**
   - Exit 100 = "needs clarification" (new code)
   - Capture the question in `clarification.json`
   - `luzia questions` command to list pending

2. **Enhanced `--fg` Mode**
   - Don't background the process
   - Keep stdin/stdout connected
   - Allow normal interactive Claude session

3. **Progress Streaming**
   - Add `luzia watch {job_id}` for `tail -f` on output.log
   - Color-coded output for better readability

### 5.2 Medium-Term (New Features)

4. **File-Based Clarification System**
   - Agent writes to `clarification.json`
   - Luzia CLI watches for pending questions
   - `luzia answer {job_id} <response>` writes `response.json`
   - Agent polls and continues

5. **Telegram/Chat Bridge for Questions**
   - Extend assistant-channel for agent questions
   - Push notification when agent needs input
   - Reply via chat, response routed to agent

6. **Status Dashboard**
   - Web UI showing all running agents
   - Real-time output streaming
   - Question/response interface

### 5.3 Long-Term (Architecture Evolution)

7. **Approval Workflows**
   - Define approval gates in task specification
   - Configurable auto-approve timeouts
   - Audit log of approvals

8. **Agent Orchestration Layer**
   - Queue of pending questions across agents
   - Priority handling for urgent questions
   - SLA tracking for response times

9. **Hybrid Execution Mode**
   - Start background, attach to foreground if question arises
   - Agent sends signal when needing input
   - CLI can "attach" to running agent

---

## Part 6: Implementation Priority

| Priority | Feature | Effort | Impact |
|----------|---------|--------|--------|
| **P0** | Better `--fg` mode | Low | High - enables immediate interactive use |
| **P0** | Exit code 100 for clarification | Low | Medium - better failure understanding |
| **P1** | `luzia watch {job_id}` | Low | Medium - easier monitoring |
| **P1** | File-based clarification | Medium | High - enables async Q&A |
| **P2** | Telegram question bridge | Medium | Medium - mobile notification |
| **P2** | Status dashboard | High | High - visual monitoring |
| **P3** | Approval workflows | High | Medium - enterprise feature |

---

## Conclusion

The current Luzia dispatch architecture is optimized for **fully autonomous** agent execution. This is the right default for background tasks. However, there's a gap for scenarios where:
- Tasks are inherently ambiguous
- User guidance is needed mid-task
- High-stakes actions require approval

The recommended path forward is:
1. **Improve `--fg` mode** for true interactive sessions
2. **Add file-based clarification** for async Q&A on background tasks
3. **Integrate with Telegram** for push notifications on questions
4. **Build status dashboard** for visual monitoring and interaction

These improvements maintain the autonomous-by-default philosophy while enabling human-in-the-loop interaction when needed.

---

## Appendix: Key File Locations

| File | Purpose |
|------|---------|
| `/opt/server-agents/orchestrator/bin/luzia:1102-1401` | `spawn_claude_agent()` - main dispatch |
| `/opt/server-agents/orchestrator/bin/luzia:1404-1449` | `get_job_status()` - status checking |
| `/opt/server-agents/orchestrator/bin/luzia:4000-4042` | `route_logs()` - log viewing |
| `/opt/server-agents/orchestrator/lib/responsive_dispatcher.py` | Async dispatch patterns |
| `/opt/server-agents/orchestrator/lib/cli_feedback.py` | CLI output formatting |
| `/opt/server-agents/orchestrator/AGENT-AUTONOMY-RESEARCH.md` | Prior research on autonomy |
| `/var/log/luz-orchestrator/jobs/` | Job directories |
| `/var/log/luz-orchestrator/notifications.log` | Completion notifications |