Refactor cockpit to use DockerTmuxController pattern

Based on claude-code-tools TmuxCLIController, this refactor:

- Adds DockerTmuxController class for robust tmux session management (interface sketched below)
- Implements send_keys() with configurable delay_enter
- Implements capture_pane() for output retrieval
- Implements wait_for_prompt() for pattern-based completion detection
- Implements wait_for_idle() for content-hash-based idle detection
- Implements wait_for_shell_prompt() for shell prompt detection
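
A minimal interface sketch of the controller described above, assuming it shells out to `docker exec ... tmux ...` via `subprocess`; the method names follow the bullet list, but the signatures, defaults, and heuristics are illustrative guesses, not the committed implementation:

```python
import hashlib
import re
import subprocess
import time


class DockerTmuxController:
    """Sketch only: drives a tmux session inside a Docker container."""

    def __init__(self, container: str, session: str = "main"):
        self.container = container
        self.session = session

    def _tmux(self, *args: str) -> str:
        # All operations shell out to tmux inside the container.
        cmd = ["docker", "exec", self.container, "tmux", *args]
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    def send_keys(self, keys: str, delay_enter: float = 0.0) -> None:
        # Type the keys, optionally pausing before pressing Enter.
        self._tmux("send-keys", "-t", self.session, keys)
        if delay_enter:
            time.sleep(delay_enter)
        self._tmux("send-keys", "-t", self.session, "Enter")

    def capture_pane(self) -> str:
        # Return the current pane contents as text.
        return self._tmux("capture-pane", "-p", "-t", self.session)

    def wait_for_prompt(self, pattern: str, timeout: float = 60.0) -> bool:
        # Poll until a regex appears anywhere in the pane output.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if re.search(pattern, self.capture_pane(), re.MULTILINE):
                return True
            time.sleep(1.0)
        return False

    def wait_for_idle(self, quiet_polls: int = 3, interval: float = 2.0,
                      timeout: float = 300.0) -> bool:
        # Idle = the pane's content hash stops changing for a few polls in a row.
        deadline = time.time() + timeout
        last, stable = None, 0
        while time.time() < deadline:
            digest = hashlib.sha256(self.capture_pane().encode()).hexdigest()
            stable = stable + 1 if digest == last else 0
            last = digest
            if stable >= quiet_polls:
                return True
            time.sleep(interval)
        return False

    def wait_for_shell_prompt(self, timeout: float = 60.0) -> bool:
        # Heuristic: a line ending in "$ " or "# " usually marks a shell prompt.
        return self.wait_for_prompt(r"[$#] ?$", timeout)
```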

Also includes workflow improvements:
- Pre-task git snapshot before agent execution
- Post-task commit protocol in agent guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
commit ec33ac1936 (admin, 2026-01-14 10:42:16 -03:00): 265 changed files, 92011 additions

AGENT-CLI-PATTERNS.md (new file, 629 lines):

# CLI Agent Patterns and Prompt Design
## Practical Guide for Building Non-Blocking Agents
**Date:** 2026-01-09
**Version:** 1.0
**Audience:** Agent developers, prompt engineers
---
## Quick Reference: 5 Critical Patterns
### 1. Detached Spawning (Never Block)
```python
# ✅ CORRECT: Agent runs in background
os.system('nohup script.sh >/dev/null 2>&1 &')
job_id = generate_uuid()
return job_id # Return immediately
# ❌ WRONG: Parent waits for agent to finish
result = subprocess.run(['claude', ...])  # run() always waits for the process
# CLI blocked until agent completes!
```
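A slightly fuller sketch of the correct case without `nohup`: `subprocess.Popen` with `start_new_session=True` detaches the child so the CLI can return at once. The `run_agent.sh` runner script and the job-directory layout are illustrative assumptions, not an existing Luzia API.
```python
import subprocess
import uuid
from pathlib import Path

def spawn_agent(job_dir: Path, prompt_file: Path) -> str:
    """Launch the agent detached and return a job id immediately (sketch)."""
    job_id = str(uuid.uuid4())
    log = open(job_dir / "output.log", "ab")
    subprocess.Popen(
        ["bash", "run_agent.sh", str(prompt_file)],  # illustrative runner script
        stdout=log,
        stderr=subprocess.STDOUT,
        stdin=subprocess.DEVNULL,    # never wire the agent to the CLI's stdin
        start_new_session=True,      # detach: child survives the parent
        cwd=job_dir,
    )
    log.close()                      # child keeps its own copy of the handle
    return job_id                    # no wait(): the parent returns right away
```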
### 2. Permission Bypass (No Approval Dialogs)
```bash
# ✅ CORRECT: Agents don't ask for tool approval
claude --permission-mode bypassPermissions --dangerously-skip-permissions ...
# ❌ WRONG: Default mode asks for confirmation on tool use
claude ...
# Blocks waiting for user to approve: "This command has high privileges. Approve? [Y/n]"
```
### 3. File-Based I/O (No stdin/stdout)
```python
# ✅ CORRECT: All I/O via files
with open(f"{job_dir}/prompt.txt", "w") as f:
f.write(full_prompt)
# Agent reads prompt from file
# Agent writes output to log file
# Status checked by reading exit code from file
# ❌ WRONG: Trying to use stdin/stdout
process = subprocess.Popen(..., stdin=subprocess.PIPE, stdout=subprocess.PIPE)
process.stdin.write(prompt) # What if backgrounded? stdin unavailable!
result = process.stdout.read() # Parent blocked waiting!
```
### 4. Exit Code Signaling (Async Status)
```bash
# ✅ CORRECT: Append exit code to output
command...
exit_code=$?
echo "exit:$exit_code" >> output.log
# Later, check status without process
grep "^exit:" output.log # Returns immediately
# ❌ WRONG: Only store in memory
# Process exits, exit code lost
# Can't determine status later
```
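The check side of this pattern in Python, a sketch assuming the runner appended an `exit:<code>` line to the job's `output.log` as shown above; the helper name is hypothetical.
```python
from pathlib import Path
from typing import Optional

def job_exit_code(output_log: Path) -> Optional[int]:
    """Return the recorded exit code, or None if no marker has been written yet."""
    if not output_log.exists():
        return None
    # Scan from the end: the "exit:" marker is appended last.
    for line in reversed(output_log.read_text().splitlines()):
        if line.startswith("exit:"):
            return int(line.split(":", 1)[1])
    return None  # no "exit:" marker: agent still running (or blocked)
```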
### 5. Context-First Prompts (Minimize Questions)
```
# ✅ CORRECT: Specific, complete, unambiguous
You are running as user: musica
Working directory: /workspace
You have permission to read/write files here.
Task: Run pytest in /workspace/tests and save results to results.json
Success criteria: File contains {passed: int, failed: int, skipped: int}
Exit code: 0 if all tests pass, 1 if any fail
Do NOT ask for clarification. You have all needed information.
# ❌ WRONG: Vague, requires interpretation
Fix the test suite.
(What needs fixing? Which tests? Agent will need to ask!)
```
---
## Prompt Patterns for Autonomy
### Pattern 1: Analysis Task (Read-Only)
**Goal:** Agent analyzes code without modifying anything
```markdown
## Task
Analyze the TypeScript codebase in /workspace/src for:
1. Total files
2. Total lines of code (excluding comments/blanks)
3. Number of functions
4. Number of classes
5. Average cyclomatic complexity per function
6. Top 3 most complex files
## Success Criteria
Save results to /workspace/analysis.json with structure:
{
"total_files": number,
"total_loc": number,
"functions": number,
"classes": number,
"avg_complexity": number,
"hotspots": [
{"file": string, "complexity": number, "functions": number}
]
}
## Exit Codes
- Exit 0: Success, file created with all fields
- Exit 1: File not created or missing fields
- Exit 2: Unrecoverable error (no TypeScript found, etc)
## Autonomy
You have all information needed. Do NOT:
- Ask which files to analyze
- Ask which metrics matter
- Request clarification on format
```
### Pattern 2: Execution Task (Run & Report)
**Goal:** Agent runs command and reports results
```markdown
## Task
Run the test suite in /workspace/tests with the following requirements:
1. Use pytest with JSON output
2. Run: pytest tests/ --json=results.json
3. Capture exit code
4. Create summary.json with:
- Total tests run
- Passed count
- Failed count
- Skipped count
- Exit code from pytest
## Success Criteria
Both results.json (from pytest) and summary.json (created by you) must exist.
Exit 0 if pytest exit code is 0 (all passed)
Exit 1 if pytest exit code is non-zero (failures)
## What to Do If Tests Fail
1. Create summary.json anyway with failure counts
2. Exit with code 1 (not 2, this is expected)
3. Do NOT try to fix tests yourself
## Autonomy
You know what to do. Do NOT:
- Ask which tests to run
- Ask about test configuration
- Request approval before running tests
```
### Pattern 3: Implementation Task (Read + Modify)
**Goal:** Agent modifies code based on specification
```markdown
## Task
Add error handling to /workspace/src/database.ts
Requirements:
1. All database calls must have try/catch
2. Catch blocks must log to console.error
3. Catch blocks must return null (not throw)
4. Add TypeScript types for error parameter
## Success Criteria
Modified file compiles without syntax errors (verify with: npm run build)
All database functions protected (search file for db\. calls)
## Exit Codes
- Exit 0: All database calls wrapped, no TypeScript errors
- Exit 1: Some database calls not wrapped, OR TypeScript errors exist
- Exit 2: File not found or unrecoverable
## Verification
After modifications:
npm run build # Must succeed with no errors
## Autonomy
You have specific requirements. Do NOT:
- Ask which functions need wrapping
- Ask about error logging format
- Request confirmation before modifying
```
### Pattern 4: Multi-Phase Task (Sequential Steps)
**Goal:** Agent completes multiple dependent steps
```markdown
## Task
Complete this CI/CD pipeline step:
Phase 1: Build
- npm install
- npm run build
- Check: no errors in output
Phase 2: Test
- npm run test
- Check: exit code 0
- If exit code 1: STOP, exit 1 from this task
Phase 3: Report
- Create build-report.json with:
{
"build": {success: true, timestamp: string},
"tests": {success: true, count: number, failed: number},
"status": "ready_for_deploy"
}
## Success Criteria
All three phases complete AND exit codes from npm are 0
build-report.json created with all fields
Overall exit code: 0 (success) or 1 (failure at any phase)
## Autonomy
Execute phases in order. Do NOT:
- Ask whether to skip phases
- Ask about error handling
- Request approval between phases
```
### Pattern 5: Decision Task (Branch Logic)
**Goal:** Agent makes decisions based on conditions
```markdown
## Task
Decide whether to deploy based on build status.
Steps:
1. Read build-report.json (created by previous task)
2. Check: all phases successful
3. If successful:
a. Create deployment-plan.json
b. Exit 0
4. If not successful:
a. Create failure-report.json
b. Exit 1
## Decision Logic
IF (build.success AND tests.success AND no_syntax_errors):
Deploy ready
ELSE:
Cannot deploy
## Success Criteria
One of these files exists:
- deployment-plan.json (exit 0)
- failure-report.json (exit 1)
## Autonomy
You have criteria. Do NOT:
- Ask whether to deploy
- Request confirmation
- Ask about deployment process
```
---
## Anti-Patterns: What NOT to Do
### ❌ Anti-Pattern 1: Ambiguous Tasks
```
WRONG: "Improve the code"
- What needs improvement?
- Which files?
- What metrics?
AGENT WILL ASK: "Can you clarify what you mean by improve?"
```
**FIX:**
```
CORRECT: "Reduce cyclomatic complexity in src/processor.ts"
- Identify functions with complexity > 5
- Refactor to reduce to < 5
- Run tests to verify no regression
```
### ❌ Anti-Pattern 2: Vague Success Criteria
```
WRONG: "Make sure it works"
- What is "it"?
- How do we verify it works?
AGENT WILL ASK: "How should I know when the task is complete?"
```
**FIX:**
```
CORRECT: "Task complete when:"
- All tests pass (pytest exit 0)
- No TypeScript errors (npm run build succeeds)
- Code coverage > 80% (check coverage report)
```
### ❌ Anti-Pattern 3: Implicit Constraints
```
WRONG: "Add this feature to the codebase"
- What files can be modified?
- What can't be changed?
AGENT WILL ASK: "Can I modify the database schema?"
```
**FIX:**
```
CORRECT: "Add feature to src/features/auth.ts:"
- This file ONLY
- Don't modify: database schema, config, types
- Do maintain: existing function signatures
```
### ❌ Anti-Pattern 4: Interactive Questions in Prompts
```
WRONG:
"Do you think we should refactor this?
Try a few approaches and tell me which is best."
AGENT WILL ASK: "What criteria for 'best'? Performance? Readability?"
```
**FIX:**
```
CORRECT:
"Refactor for readability:"
- Break functions > 20 lines into smaller functions
- Add clear variable names (no x, y, temp)
- Check: ESLint passes, no new warnings
```
### ❌ Anti-Pattern 5: Requiring User Approval
```
WRONG:
"I'm about to deploy. Is this okay? [Y/n]"
BLOCKS: Waiting for user input via stdin (won't work in background!)
```
**FIX:**
```
CORRECT:
"Validate deployment prerequisites and create deployment-plan.json"
(No approval request. User runs separately: cat deployment-plan.json)
(If satisfied, user can execute deployment)
```
---
## Handling Edge Cases Without Blocking
### Case 1: File Not Found
```markdown
## If /workspace/config.json doesn't exist:
1. Log to output: "Config file not found"
2. Create default config
3. Continue with default values
4. Do NOT ask user: "Should I create a default?"
## If error occurs during execution:
1. Log full error to output.log
2. Include: what failed, why, what was attempted
3. Exit with code 1
4. Do NOT ask: "What should I do?"
```
### Case 2: Ambiguous State
```markdown
## If multiple versions of file exist:
1. Document all versions found
2. Choose: most recent by timestamp
3. Continue
4. Log choice to output.log
5. Do NOT ask: "Which one should I use?"
## If task instructions conflict:
1. Document the conflict
2. Follow: primary instruction (first mentioned)
3. Log reasoning to output.log
4. Do NOT ask: "Which should I follow?"
```
### Case 3: Partial Success
```markdown
## If some tests pass, some fail:
1. Report both: {passed: 45, failed: 3}
2. Exit with code 1 (not 0, even though some passed)
3. Include in output: which tests failed
4. Do NOT ask: "Should I count partial success?"
```
---
## Prompt Template for Maximum Autonomy
```markdown
# Agent Task Template
## Role & Context
You are a {project_name} project agent.
Working directory: {absolute_path}
Running as user: {username}
Permissions: Full read/write in working directory
## Task Specification
{SPECIFIC task description}
Success looks like:
- {Specific deliverable 1}
- {Specific deliverable 2}
- {Specific output file/format}
## Execution Environment
Tools available: Read, Write, Edit, Bash, Glob, Grep
Directories accessible: {list specific paths}
Commands available: {list specific commands}
Constraints: {List what cannot be done}
## Exit Codes
- 0: Success (all success criteria met)
- 1: Failure (some success criteria not met, but not unrecoverable)
- 2: Error (unrecoverable, cannot continue)
## If Something Goes Wrong
1. Log the error to output
2. Try once to recover
3. If recovery fails, exit with appropriate code
4. Do NOT ask for help or clarification
## Do NOT
- Ask any clarifying questions
- Request approval for any action
- Wait for user input
- Modify files outside {working directory}
- Use tools not listed above
```
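One way a dispatcher might render this template, sketched here with simple placeholder substitution plus the file-based I/O pattern from the quick reference; the helper name and field keys are illustrative, not an existing Luzia API.
```python
from pathlib import Path

def write_prompt(job_dir: Path, template: str, fields: dict) -> Path:
    """Fill the autonomy template and write it where the agent will read it (sketch)."""
    prompt = template
    for placeholder, value in fields.items():
        prompt = prompt.replace("{" + placeholder + "}", value)
    prompt_file = job_dir / "prompt.txt"
    prompt_file.write_text(prompt)  # file-based I/O: the agent never reads stdin
    return prompt_file

# Example (hypothetical values):
# write_prompt(Path("/var/log/luz-orchestrator/jobs/abc123"), template_text,
#              {"project_name": "luzia", "absolute_path": "/workspace",
#               "username": "musica", "SPECIFIC task description": "Run pytest ..."})
```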
---
## Real-World Examples
### Example 1: Code Quality Scan (Read-Only)
**Prompt:**
```
Analyze code quality in /workspace/src using:
1. ESLint (npm run lint) - capture all warnings
2. TypeScript compiler (npm run build) - capture all errors
3. Count lines of code per file
Save to quality-report.json:
{
"eslint": {
"errors": number,
"warnings": number,
"rules_violated": [string]
},
"typescript": {
"errors": number,
"errors_list": [string]
},
"code_metrics": {
"total_loc": number,
"total_files": number,
"avg_loc_per_file": number
}
}
Exit 0 if both eslint and typescript succeeded.
Exit 1 if either had errors.
Do NOT try to fix errors, just report.
```
**Expected Agent Behavior:**
- Runs linters (no approval needed)
- Collects metrics
- Creates JSON file
- Exits with appropriate code
- No questions asked ✓
### Example 2: Database Migration (Modify + Verify)
**Prompt:**
```
Apply database migration /workspace/migrations/001_add_users_table.sql
Steps:
1. Read migration file
2. Run: psql -U postgres -d mydb -f migrations/001_add_users_table.sql
3. If success: psql ... -c "SELECT COUNT(*) FROM users;" to verify
4. Save results to migration-log.json
Success criteria:
- Migration file executed without errors
- New table exists
- migration-log.json contains:
{
"timestamp": string,
"migration": "001_add_users_table.sql",
"status": "success" | "failed",
"error": string | null
}
Exit 0 on success.
Exit 1 on any database error.
Do NOT manually create table if migration fails.
```
**Expected Agent Behavior:**
- Executes SQL (no approval needed)
- Verifies results
- Logs to JSON
- Exits appropriately
- No questions asked ✓
### Example 3: Deployment Check (Decision Logic)
**Prompt:**
```
Verify deployment readiness:
Checks:
1. All tests passing: npm test -> exit 0
2. Build succeeds: npm run build -> exit 0
3. No security warnings: npm audit -> moderate/high = 0
4. Environment configured: .env file exists
Create deployment-readiness.json:
{
"ready": boolean,
"checks": {
"tests": boolean,
"build": boolean,
"security": boolean,
"config": boolean
},
"blockers": [string],
"timestamp": string
}
If all checks pass: ready = true, exit 0
If any check fails: ready = false, exit 1
Do NOT try to fix blockers. Only report.
```
**Expected Agent Behavior:**
- Runs all checks
- Documents results
- No fixes attempted
- Clear decision output
- No questions asked ✓
---
## Debugging: When Agents DO Ask Questions
### How to Detect Blocking Questions
```bash
# Check agent output for clarification questions
grep -i "should i\|would you\|can you\|do you want\|clarif" \
/var/log/luz-orchestrator/jobs/{job_id}/output.log
# Check for approval prompts
grep -i "approve\|confirm\|permission\|y/n" \
/var/log/luz-orchestrator/jobs/{job_id}/output.log
# Agent blocked (or still running) = no "exit:" marker in output.log
tail -5 /var/log/luz-orchestrator/jobs/{job_id}/output.log
# If the last line is NOT "exit:{code}", the agent has not finished
```
### How to Fix
1. **Identify the question** - What is agent asking?
2. **Redesign prompt** - Provide the answer upfront
3. **Be more specific** - Remove ambiguity
4. **Retry** - `luzia retry {job_id}`
---
## Checklist: Autonomous Prompt Quality
- [ ] Task is specific (not "improve" or "fix")
- [ ] Success criteria defined (what success looks like)
- [ ] Output format specified (JSON, file, etc)
- [ ] Exit codes documented (0=success, 1=failure)
- [ ] Constraints listed (what can't be changed)
- [ ] No ambiguous language
- [ ] No requests for clarification
- [ ] No approval prompts
- [ ] No "if you think..." or "do you want to..."
- [ ] All context provided upfront
- [ ] Agent runs as a limited user (not root)
- [ ] Task scope limited to project directory
---
## Summary
**The Core Rule:**
> Autonomous agents don't ask questions because they don't need to.
Well-designed prompts provide:
1. Clear objectives
2. Specific success criteria
3. Complete context
4. Defined boundaries
5. No ambiguity
When these are present, agents execute autonomously. When they're missing, agents ask clarifying questions, causing blocking.
**For Luzia agents:** Use the 5 patterns (detached spawning, permission bypass, file-based I/O, exit code signaling, context-first prompting) and avoid the anti-patterns listed above.