Refactor cockpit to use DockerTmuxController pattern

Based on claude-code-tools TmuxCLIController, this refactor:

- Adds DockerTmuxController class for robust tmux session management (interface sketched below)
- Implements send_keys() with configurable delay_enter
- Implements capture_pane() for output retrieval
- Implements wait_for_prompt() for pattern-based completion detection
- Implements wait_for_idle() for content-hash-based idle detection
- Implements wait_for_shell_prompt() for shell prompt detection
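
A minimal interface sketch of the controller described above, assuming it shells out to `docker exec ... tmux ...` via `subprocess`; the method names follow the bullet list, but the signatures, defaults, and heuristics are illustrative guesses, not the committed implementation:

```python
import hashlib
import re
import subprocess
import time


class DockerTmuxController:
    """Sketch only: drives a tmux session inside a Docker container."""

    def __init__(self, container: str, session: str = "main"):
        self.container = container
        self.session = session

    def _tmux(self, *args: str) -> str:
        # All operations shell out to tmux inside the container.
        cmd = ["docker", "exec", self.container, "tmux", *args]
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    def send_keys(self, keys: str, delay_enter: float = 0.0) -> None:
        # Type the keys, optionally pausing before pressing Enter.
        self._tmux("send-keys", "-t", self.session, keys)
        if delay_enter:
            time.sleep(delay_enter)
        self._tmux("send-keys", "-t", self.session, "Enter")

    def capture_pane(self) -> str:
        # Return the current pane contents as text.
        return self._tmux("capture-pane", "-p", "-t", self.session)

    def wait_for_prompt(self, pattern: str, timeout: float = 60.0) -> bool:
        # Poll until a regex appears anywhere in the pane output.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if re.search(pattern, self.capture_pane(), re.MULTILINE):
                return True
            time.sleep(1.0)
        return False

    def wait_for_idle(self, quiet_polls: int = 3, interval: float = 2.0,
                      timeout: float = 300.0) -> bool:
        # Idle = the pane's content hash stops changing for a few polls in a row.
        deadline = time.time() + timeout
        last, stable = None, 0
        while time.time() < deadline:
            digest = hashlib.sha256(self.capture_pane().encode()).hexdigest()
            stable = stable + 1 if digest == last else 0
            last = digest
            if stable >= quiet_polls:
                return True
            time.sleep(interval)
        return False

    def wait_for_shell_prompt(self, timeout: float = 60.0) -> bool:
        # Heuristic: a line ending in "$ " or "# " usually marks a shell prompt.
        return self.wait_for_prompt(r"[$#] ?$", timeout)
```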

Also includes workflow improvements:
- Pre-task git snapshot before agent execution
- Post-task commit protocol in agent guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
commit ec33ac1936 (admin, 2026-01-14 10:42:16 -03:00): 265 changed files, 92011 additions

AGENT-CLI-PATTERNS.md (new file, 629 lines):

# CLI Agent Patterns and Prompt Design
## Practical Guide for Building Non-Blocking Agents
**Date:** 2026-01-09
**Version:** 1.0
**Audience:** Agent developers, prompt engineers
---
## Quick Reference: 5 Critical Patterns
### 1. Detached Spawning (Never Block)
```python
# ✅ CORRECT: Agent runs in background
os.system('nohup script.sh >/dev/null 2>&1 &')
job_id = generate_uuid()
return job_id # Return immediately
# ❌ WRONG: Parent waits for agent to finish
result = subprocess.run(['claude', ...])  # run() always waits for the process
# CLI blocked until agent completes!
```
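A slightly fuller sketch of the correct case without `nohup`: `subprocess.Popen` with `start_new_session=True` detaches the child so the CLI can return at once. The `run_agent.sh` runner script and the job-directory layout are illustrative assumptions, not an existing Luzia API.
```python
import subprocess
import uuid
from pathlib import Path

def spawn_agent(job_dir: Path, prompt_file: Path) -> str:
    """Launch the agent detached and return a job id immediately (sketch)."""
    job_id = str(uuid.uuid4())
    log = open(job_dir / "output.log", "ab")
    subprocess.Popen(
        ["bash", "run_agent.sh", str(prompt_file)],  # illustrative runner script
        stdout=log,
        stderr=subprocess.STDOUT,
        stdin=subprocess.DEVNULL,    # never wire the agent to the CLI's stdin
        start_new_session=True,      # detach: child survives the parent
        cwd=job_dir,
    )
    log.close()                      # child keeps its own copy of the handle
    return job_id                    # no wait(): the parent returns right away
```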
### 2. Permission Bypass (No Approval Dialogs)
```bash
# ✅ CORRECT: Agents don't ask for tool approval
claude --permission-mode bypassPermissions --dangerously-skip-permissions ...
# ❌ WRONG: Default mode asks for confirmation on tool use
claude ...
# Blocks waiting for user to approve: "This command has high privileges. Approve? [Y/n]"
```
### 3. File-Based I/O (No stdin/stdout)
```python
# ✅ CORRECT: All I/O via files
with open(f"{job_dir}/prompt.txt", "w") as f:
f.write(full_prompt)
# Agent reads prompt from file
# Agent writes output to log file
# Status checked by reading exit code from file
# ❌ WRONG: Trying to use stdin/stdout
process = subprocess.Popen(..., stdin=subprocess.PIPE, stdout=subprocess.PIPE)
process.stdin.write(prompt) # What if backgrounded? stdin unavailable!
result = process.stdout.read() # Parent blocked waiting!
```
### 4. Exit Code Signaling (Async Status)
```bash
# ✅ CORRECT: Append exit code to output
command...
exit_code=$?
echo "exit:$exit_code" >> output.log
# Later, check status without process
grep "^exit:" output.log # Returns immediately
# ❌ WRONG: Only store in memory
# Process exits, exit code lost
# Can't determine status later
```
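The check side of this pattern in Python, a sketch assuming the runner appended an `exit:<code>` line to the job's `output.log` as shown above; the helper name is hypothetical.
```python
from pathlib import Path
from typing import Optional

def job_exit_code(output_log: Path) -> Optional[int]:
    """Return the recorded exit code, or None if no marker has been written yet."""
    if not output_log.exists():
        return None
    # Scan from the end: the "exit:" marker is appended last.
    for line in reversed(output_log.read_text().splitlines()):
        if line.startswith("exit:"):
            return int(line.split(":", 1)[1])
    return None  # no "exit:" marker: agent still running (or blocked)
```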
### 5. Context-First Prompts (Minimize Questions)
```
# ✅ CORRECT: Specific, complete, unambiguous
You are running as user: musica
Working directory: /workspace
You have permission to read/write files here.
Task: Run pytest in /workspace/tests and save results to results.json
Success criteria: File contains {passed: int, failed: int, skipped: int}
Exit code: 0 if all tests pass, 1 if any fail
Do NOT ask for clarification. You have all needed information.
# ❌ WRONG: Vague, requires interpretation
Fix the test suite.
(What needs fixing? Which tests? Agent will need to ask!)
```
---
## Prompt Patterns for Autonomy
### Pattern 1: Analysis Task (Read-Only)
**Goal:** Agent analyzes code without modifying anything
```markdown
## Task
Analyze the TypeScript codebase in /workspace/src for:
1. Total files
2. Total lines of code (excluding comments/blanks)
3. Number of functions
4. Number of classes
5. Average cyclomatic complexity per function
6. Top 3 most complex files
## Success Criteria
Save results to /workspace/analysis.json with structure:
{
"total_files": number,
"total_loc": number,
"functions": number,
"classes": number,
"avg_complexity": number,
"hotspots": [
{"file": string, "complexity": number, "functions": number}
]
}
## Exit Codes
- Exit 0: Success, file created with all fields
- Exit 1: File not created or missing fields
- Exit 2: Unrecoverable error (no TypeScript found, etc)
## Autonomy
You have all information needed. Do NOT:
- Ask which files to analyze
- Ask which metrics matter
- Request clarification on format
```
### Pattern 2: Execution Task (Run & Report)
**Goal:** Agent runs command and reports results
```markdown
## Task
Run the test suite in /workspace/tests with the following requirements:
1. Use pytest with JSON output
2. Run: pytest tests/ --json=results.json
3. Capture exit code
4. Create summary.json with:
- Total tests run
- Passed count
- Failed count
- Skipped count
- Exit code from pytest
## Success Criteria
Both results.json (from pytest) and summary.json (created by you) must exist.
Exit 0 if pytest exit code is 0 (all passed)
Exit 1 if pytest exit code is non-zero (failures)
## What to Do If Tests Fail
1. Create summary.json anyway with failure counts
2. Exit with code 1 (not 2, this is expected)
3. Do NOT try to fix tests yourself
## Autonomy
You know what to do. Do NOT:
- Ask which tests to run
- Ask about test configuration
- Request approval before running tests
```
### Pattern 3: Implementation Task (Read + Modify)
**Goal:** Agent modifies code based on specification
```markdown
## Task
Add error handling to /workspace/src/database.ts
Requirements:
1. All database calls must have try/catch
2. Catch blocks must log to console.error
3. Catch blocks must return null (not throw)
4. Add TypeScript types for error parameter
## Success Criteria
Modified file compiles without syntax errors (verify with: npm run build)
All database functions protected (search file for db\. calls)
## Exit Codes
- Exit 0: All database calls wrapped, no TypeScript errors
- Exit 1: Some database calls not wrapped, OR TypeScript errors exist
- Exit 2: File not found or unrecoverable
## Verification
After modifications:
npm run build # Must succeed with no errors
## Autonomy
You have specific requirements. Do NOT:
- Ask which functions need wrapping
- Ask about error logging format
- Request confirmation before modifying
```
### Pattern 4: Multi-Phase Task (Sequential Steps)
**Goal:** Agent completes multiple dependent steps
```markdown
## Task
Complete this CI/CD pipeline step:
Phase 1: Build
- npm install
- npm run build
- Check: no errors in output
Phase 2: Test
- npm run test
- Check: exit code 0
- If exit code 1: STOP, exit 1 from this task
Phase 3: Report
- Create build-report.json with:
{
"build": {success: true, timestamp: string},
"tests": {success: true, count: number, failed: number},
"status": "ready_for_deploy"
}
## Success Criteria
All three phases complete AND exit codes from npm are 0
build-report.json created with all fields
Overall exit code: 0 (success) or 1 (failure at any phase)
## Autonomy
Execute phases in order. Do NOT:
- Ask whether to skip phases
- Ask about error handling
- Request approval between phases
```
### Pattern 5: Decision Task (Branch Logic)
**Goal:** Agent makes decisions based on conditions
```markdown
## Task
Decide whether to deploy based on build status.
Steps:
1. Read build-report.json (created by previous task)
2. Check: all phases successful
3. If successful:
a. Create deployment-plan.json
b. Exit 0
4. If not successful:
a. Create failure-report.json
b. Exit 1
## Decision Logic
IF (build.success AND tests.success AND no_syntax_errors):
Deploy ready
ELSE:
Cannot deploy
## Success Criteria
One of these files exists:
- deployment-plan.json (exit 0)
- failure-report.json (exit 1)
## Autonomy
You have criteria. Do NOT:
- Ask whether to deploy
- Request confirmation
- Ask about deployment process
```
---
## Anti-Patterns: What NOT to Do
### ❌ Anti-Pattern 1: Ambiguous Tasks
```
WRONG: "Improve the code"
- What needs improvement?
- Which files?
- What metrics?
AGENT WILL ASK: "Can you clarify what you mean by improve?"
```
**FIX:**
```
CORRECT: "Reduce cyclomatic complexity in src/processor.ts"
- Identify functions with complexity > 5
- Refactor to reduce to < 5
- Run tests to verify no regression
```
### ❌ Anti-Pattern 2: Vague Success Criteria
```
WRONG: "Make sure it works"
- What is "it"?
- How do we verify it works?
AGENT WILL ASK: "How should I know when the task is complete?"
```
**FIX:**
```
CORRECT: "Task complete when:"
- All tests pass (pytest exit 0)
- No TypeScript errors (npm run build succeeds)
- Code coverage > 80% (check coverage report)
```
### ❌ Anti-Pattern 3: Implicit Constraints
```
WRONG: "Add this feature to the codebase"
- What files can be modified?
- What can't be changed?
AGENT WILL ASK: "Can I modify the database schema?"
```
**FIX:**
```
CORRECT: "Add feature to src/features/auth.ts:"
- This file ONLY
- Don't modify: database schema, config, types
- Do maintain: existing function signatures
```
### ❌ Anti-Pattern 4: Interactive Questions in Prompts
```
WRONG:
"Do you think we should refactor this?
Try a few approaches and tell me which is best."
AGENT WILL ASK: "What criteria for 'best'? Performance? Readability?"
```
**FIX:**
```
CORRECT:
"Refactor for readability:"
- Break functions > 20 lines into smaller functions
- Add clear variable names (no x, y, temp)
- Check: ESLint passes, no new warnings
```
### ❌ Anti-Pattern 5: Requiring User Approval
```
WRONG:
"I'm about to deploy. Is this okay? [Y/n]"
BLOCKS: Waiting for user input via stdin (won't work in background!)
```
**FIX:**
```
CORRECT:
"Validate deployment prerequisites and create deployment-plan.json"
(No approval request. User runs separately: cat deployment-plan.json)
(If satisfied, user can execute deployment)
```
---
## Handling Edge Cases Without Blocking
### Case 1: File Not Found
```markdown
## If /workspace/config.json doesn't exist:
1. Log to output: "Config file not found"
2. Create default config
3. Continue with default values
4. Do NOT ask user: "Should I create a default?"
## If error occurs during execution:
1. Log full error to output.log
2. Include: what failed, why, what was attempted
3. Exit with code 1
4. Do NOT ask: "What should I do?"
```
### Case 2: Ambiguous State
```markdown
## If multiple versions of file exist:
1. Document all versions found
2. Choose: most recent by timestamp
3. Continue
4. Log choice to output.log
5. Do NOT ask: "Which one should I use?"
## If task instructions conflict:
1. Document the conflict
2. Follow: primary instruction (first mentioned)
3. Log reasoning to output.log
4. Do NOT ask: "Which should I follow?"
```
### Case 3: Partial Success
```markdown
## If some tests pass, some fail:
1. Report both: {passed: 45, failed: 3}
2. Exit with code 1 (not 0, even though some passed)
3. Include in output: which tests failed
4. Do NOT ask: "Should I count partial success?"
```
---
## Prompt Template for Maximum Autonomy
```markdown
# Agent Task Template
## Role & Context
You are a {project_name} project agent.
Working directory: {absolute_path}
Running as user: {username}
Permissions: Full read/write in working directory
## Task Specification
{SPECIFIC task description}
Success looks like:
- {Specific deliverable 1}
- {Specific deliverable 2}
- {Specific output file/format}
## Execution Environment
Tools available: Read, Write, Edit, Bash, Glob, Grep
Directories accessible: {list specific paths}
Commands available: {list specific commands}
Constraints: {List what cannot be done}
## Exit Codes
- 0: Success (all success criteria met)
- 1: Failure (some success criteria not met, but not unrecoverable)
- 2: Error (unrecoverable, cannot continue)
## If Something Goes Wrong
1. Log the error to output
2. Try once to recover
3. If recovery fails, exit with appropriate code
4. Do NOT ask for help or clarification
## Do NOT
- Ask any clarifying questions
- Request approval for any action
- Wait for user input
- Modify files outside {working directory}
- Use tools not listed above
```
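One way a dispatcher might render this template, sketched here with simple placeholder substitution plus the file-based I/O pattern from the quick reference; the helper name and field keys are illustrative, not an existing Luzia API.
```python
from pathlib import Path

def write_prompt(job_dir: Path, template: str, fields: dict) -> Path:
    """Fill the autonomy template and write it where the agent will read it (sketch)."""
    prompt = template
    for placeholder, value in fields.items():
        prompt = prompt.replace("{" + placeholder + "}", value)
    prompt_file = job_dir / "prompt.txt"
    prompt_file.write_text(prompt)  # file-based I/O: the agent never reads stdin
    return prompt_file

# Example (hypothetical values):
# write_prompt(Path("/var/log/luz-orchestrator/jobs/abc123"), template_text,
#              {"project_name": "luzia", "absolute_path": "/workspace",
#               "username": "musica", "SPECIFIC task description": "Run pytest ..."})
```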
---
## Real-World Examples
### Example 1: Code Quality Scan (Read-Only)
**Prompt:**
```
Analyze code quality in /workspace/src using:
1. ESLint (npm run lint) - capture all warnings
2. TypeScript compiler (npm run build) - capture all errors
3. Count lines of code per file
Save to quality-report.json:
{
"eslint": {
"errors": number,
"warnings": number,
"rules_violated": [string]
},
"typescript": {
"errors": number,
"errors_list": [string]
},
"code_metrics": {
"total_loc": number,
"total_files": number,
"avg_loc_per_file": number
}
}
Exit 0 if both eslint and typescript succeeded.
Exit 1 if either had errors.
Do NOT try to fix errors, just report.
```
**Expected Agent Behavior:**
- Runs linters (no approval needed)
- Collects metrics
- Creates JSON file
- Exits with appropriate code
- No questions asked ✓
### Example 2: Database Migration (Modify + Verify)
**Prompt:**
```
Apply database migration /workspace/migrations/001_add_users_table.sql
Steps:
1. Read migration file
2. Run: psql -U postgres -d mydb -f migrations/001_add_users_table.sql
3. If success: psql ... -c "SELECT COUNT(*) FROM users;" to verify
4. Save results to migration-log.json
Success criteria:
- Migration file executed without errors
- New table exists
- migration-log.json contains:
{
"timestamp": string,
"migration": "001_add_users_table.sql",
"status": "success" | "failed",
"error": string | null
}
Exit 0 on success.
Exit 1 on any database error.
Do NOT manually create table if migration fails.
```
**Expected Agent Behavior:**
- Executes SQL (no approval needed)
- Verifies results
- Logs to JSON
- Exits appropriately
- No questions asked ✓
### Example 3: Deployment Check (Decision Logic)
**Prompt:**
```
Verify deployment readiness:
Checks:
1. All tests passing: npm test -> exit 0
2. Build succeeds: npm run build -> exit 0
3. No security warnings: npm audit -> moderate/high = 0
4. Environment configured: .env file exists
Create deployment-readiness.json:
{
"ready": boolean,
"checks": {
"tests": boolean,
"build": boolean,
"security": boolean,
"config": boolean
},
"blockers": [string],
"timestamp": string
}
If all checks pass: ready = true, exit 0
If any check fails: ready = false, exit 1
Do NOT try to fix blockers. Only report.
```
**Expected Agent Behavior:**
- Runs all checks
- Documents results
- No fixes attempted
- Clear decision output
- No questions asked ✓
---
## Debugging: When Agents DO Ask Questions
### How to Detect Blocking Questions
```bash
# Check agent output for clarification questions
grep -i "should i\|would you\|can you\|do you want\|clarif" \
/var/log/luz-orchestrator/jobs/{job_id}/output.log
# Check for approval prompts
grep -i "approve\|confirm\|permission\|y/n" \
/var/log/luz-orchestrator/jobs/{job_id}/output.log
# Agent blocked (or still running) = no "exit:" marker in output.log
tail -5 /var/log/luz-orchestrator/jobs/{job_id}/output.log
# If the last line is NOT "exit:{code}", the agent has not finished
```
### How to Fix
1. **Identify the question** - What is agent asking?
2. **Redesign prompt** - Provide the answer upfront
3. **Be more specific** - Remove ambiguity
4. **Retry** - `luzia retry {job_id}`
---
## Checklist: Autonomous Prompt Quality
- [ ] Task is specific (not "improve" or "fix")
- [ ] Success criteria defined (what success looks like)
- [ ] Output format specified (JSON, file, etc)
- [ ] Exit codes documented (0=success, 1=failure)
- [ ] Constraints listed (what can't be changed)
- [ ] No ambiguous language
- [ ] No requests for clarification
- [ ] No approval prompts
- [ ] No "if you think..." or "do you want to..."
- [ ] All context provided upfront
- [ ] Agent runs as a limited user (not root)
- [ ] Task scope limited to project directory
---
## Summary
**The Core Rule:**
> Autonomous agents don't ask questions because they don't need to.
Well-designed prompts provide:
1. Clear objectives
2. Specific success criteria
3. Complete context
4. Defined boundaries
5. No ambiguity
When these are present, agents execute autonomously. When they're missing, agents ask clarifying questions, causing blocking.
**For Luzia agents:** Use the 5 patterns (detached spawning, permission bypass, file-based I/O, exit code signaling, context-first prompting) and avoid the anti-patterns listed above.