Refactor cockpit to use DockerTmuxController pattern
Based on claude-code-tools TmuxCLIController, this refactor:

- Added DockerTmuxController class for robust tmux session management
- Implements send_keys() with configurable delay_enter
- Implements capture_pane() for output retrieval
- Implements wait_for_prompt() for pattern-based completion detection
- Implements wait_for_idle() for content-hash-based idle detection
- Implements wait_for_shell_prompt() for shell prompt detection

Also includes workflow improvements:

- Pre-task git snapshot before agent execution
- Post-task commit protocol in agent guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
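A minimal sketch of the controller surface named in the commit message. Method names follow the message; the bodies (`docker exec` + tmux CLI calls, content-hash polling) are illustrative assumptions, not the committed implementation, and wait_for_shell_prompt() is omitted since it would be a wait_for_prompt() specialization:

```python
import hashlib
import re
import subprocess
import time

class DockerTmuxController:
    """Sketch: drive a tmux session inside a running Docker container."""

    def __init__(self, container: str, session: str = "main"):
        self.container = container
        self.session = session

    def _tmux(self, *args: str) -> str:
        result = subprocess.run(
            ["docker", "exec", self.container, "tmux", *args],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    def send_keys(self, text: str, delay_enter: float = 0.0) -> None:
        # Type the text, optionally pausing before pressing Enter
        self._tmux("send-keys", "-t", self.session, text)
        if delay_enter:
            time.sleep(delay_enter)
        self._tmux("send-keys", "-t", self.session, "Enter")

    def capture_pane(self) -> str:
        # -p prints the pane contents to stdout
        return self._tmux("capture-pane", "-p", "-t", self.session)

    def wait_for_prompt(self, pattern: str, timeout: float = 60.0) -> bool:
        # Pattern-based completion detection: poll until `pattern` appears
        deadline = time.time() + timeout
        while time.time() < deadline:
            if re.search(pattern, self.capture_pane()):
                return True
            time.sleep(1.0)
        return False

    def wait_for_idle(self, interval: float = 1.0, stable_polls: int = 3) -> None:
        # Content-hash-based idle detection: idle once the pane content
        # hash is unchanged for `stable_polls` consecutive polls
        last_hash, streak = None, 0
        while streak < stable_polls:
            current = hashlib.sha256(self.capture_pane().encode()).hexdigest()
            streak = streak + 1 if current == last_hash else 0
            last_hash = current
            time.sleep(interval)
```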
# CLI Agent Patterns and Prompt Design

## Practical Guide for Building Non-Blocking Agents

**Date:** 2026-01-09
**Version:** 1.0
**Audience:** Agent developers, prompt engineers

---

## Quick Reference: 5 Critical Patterns
### 1. Detached Spawning (Never Block)

```python
import os
import subprocess
import uuid

# ✅ CORRECT: Agent runs in background
os.system('nohup script.sh >/dev/null 2>&1 &')
job_id = str(uuid.uuid4())
return job_id  # Return immediately

# ❌ WRONG: Parent waits for agent to finish
result = subprocess.run(['claude', ...])  # subprocess.run blocks by default
# CLI blocked until agent completes!
```
### 2. Permission Bypass (No Approval Dialogs)

```bash
# ✅ CORRECT: Agents don't ask for tool approval
claude --permission-mode bypassPermissions --dangerously-skip-permissions ...

# ❌ WRONG: Default mode asks for confirmation on tool use
claude ...
# Blocks waiting for user to approve: "This command has high privileges. Approve? [Y/n]"
```
### 3. File-Based I/O (No stdin/stdout)

```python
# ✅ CORRECT: All I/O via files
with open(f"{job_dir}/prompt.txt", "w") as f:
    f.write(full_prompt)

# Agent reads prompt from file
# Agent writes output to log file
# Status checked by reading exit code from file

# ❌ WRONG: Trying to use stdin/stdout
process = subprocess.Popen(..., stdin=subprocess.PIPE, stdout=subprocess.PIPE)
process.stdin.write(prompt)  # What if backgrounded? stdin unavailable!
result = process.stdout.read()  # Parent blocked waiting!
```
### 4. Exit Code Signaling (Async Status)

```bash
# ✅ CORRECT: Append exit code to output
command...
exit_code=$?
echo "exit:$exit_code" >> output.log

# Later, check status without the process
grep "^exit:" output.log  # Returns immediately

# ❌ WRONG: Only store in memory
# Process exits, exit code lost
# Can't determine status later
```
### 5. Context-First Prompts (Minimize Questions)

```
# ✅ CORRECT: Specific, complete, unambiguous
You are running as user: musica
Working directory: /workspace
You have permission to read/write files here.

Task: Run pytest in /workspace/tests and save results to results.json
Success criteria: File contains {passed: int, failed: int, skipped: int}
Exit code: 0 if all tests pass, 1 if any fail

Do NOT ask for clarification. You have all needed information.

# ❌ WRONG: Vague, requires interpretation
Fix the test suite.
(What needs fixing? Which tests? Agent will need to ask!)
```
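Patterns 1, 3, and 4 compose naturally. Below is a minimal sketch of that composition; `spawn_job`, `job_status`, the `/tmp/jobs` layout, and the exact `claude` invocation are illustrative assumptions, not a prescribed interface:

```python
import os
import shlex
import uuid

def spawn_job(full_prompt: str, jobs_root: str = "/tmp/jobs") -> str:
    """Illustrative composition of patterns 1, 3, and 4."""
    job_id = str(uuid.uuid4())
    job_dir = os.path.join(jobs_root, job_id)
    os.makedirs(job_dir)

    # Pattern 3: prompt goes in via a file, output comes back via a file
    with open(os.path.join(job_dir, "prompt.txt"), "w") as f:
        f.write(full_prompt)

    # Pattern 4: append "exit:<code>" so status survives the process;
    # Pattern 2: bypass flags from above (-p print mode assumed here)
    inner = (
        'claude --permission-mode bypassPermissions '
        '--dangerously-skip-permissions -p "$(cat prompt.txt)" '
        '>> output.log 2>&1; echo "exit:$?" >> output.log'
    )
    # Pattern 1: nohup + & so this function returns immediately
    cmd = f"cd {shlex.quote(job_dir)} && {inner}"
    os.system(f"nohup sh -c {shlex.quote(cmd)} >/dev/null 2>&1 &")
    return job_id

def job_status(job_id: str, jobs_root: str = "/tmp/jobs") -> str | None:
    """Non-blocking status check: look for the exit marker in output.log."""
    try:
        with open(os.path.join(jobs_root, job_id, "output.log")) as f:
            for line in f:
                if line.startswith("exit:"):
                    return line.strip().split(":", 1)[1]
    except FileNotFoundError:
        pass
    return None  # no marker yet: still running (or never started)
```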
---

## Prompt Patterns for Autonomy
### Pattern 1: Analysis Task (Read-Only)

**Goal:** Agent analyzes code without modifying anything

```markdown
## Task
Analyze the TypeScript codebase in /workspace/src for:
1. Total files
2. Total lines of code (excluding comments/blanks)
3. Number of functions
4. Number of classes
5. Average cyclomatic complexity per function
6. Top 3 most complex files

## Success Criteria
Save results to /workspace/analysis.json with structure:
{
  "total_files": number,
  "total_loc": number,
  "functions": number,
  "classes": number,
  "avg_complexity": number,
  "hotspots": [
    {"file": string, "complexity": number, "functions": number}
  ]
}

## Exit Codes
- Exit 0: Success, file created with all fields
- Exit 1: File not created or missing fields
- Exit 2: Unrecoverable error (no TypeScript found, etc.)

## Autonomy
You have all information needed. Do NOT:
- Ask which files to analyze
- Ask which metrics matter
- Request clarification on format
```
### Pattern 2: Execution Task (Run & Report)

**Goal:** Agent runs a command and reports results

```markdown
## Task
Run the test suite in /workspace/tests with the following requirements:

1. Use pytest with JSON output
2. Run: pytest tests/ --json=results.json
3. Capture exit code
4. Create summary.json with:
   - Total tests run
   - Passed count
   - Failed count
   - Skipped count
   - Exit code from pytest

## Success Criteria
Both results.json (from pytest) and summary.json (created by you) must exist.

Exit 0 if pytest exit code is 0 (all passed)
Exit 1 if pytest exit code is non-zero (failures)

## What to Do If Tests Fail
1. Create summary.json anyway with failure counts
2. Exit with code 1 (not 2, this is expected)
3. Do NOT try to fix tests yourself

## Autonomy
You know what to do. Do NOT:
- Ask which tests to run
- Ask about test configuration
- Request approval before running tests
```
### Pattern 3: Implementation Task (Read + Modify)

**Goal:** Agent modifies code based on a specification

```markdown
## Task
Add error handling to /workspace/src/database.ts

Requirements:
1. All database calls must have try/catch
2. Catch blocks must log to console.error
3. Catch blocks must return null (not throw)
4. Add TypeScript types for error parameter

## Success Criteria
File compiles without syntax errors (use: npm run build)
All database functions protected (search file for db\. calls)

## Exit Codes
- Exit 0: All database calls wrapped, no TypeScript errors
- Exit 1: Some database calls not wrapped, OR TypeScript errors exist
- Exit 2: File not found or unrecoverable

## Verification
After modifications:
npm run build  # Must succeed with no errors

## Autonomy
You have specific requirements. Do NOT:
- Ask which functions need wrapping
- Ask about error logging format
- Request confirmation before modifying
```
### Pattern 4: Multi-Phase Task (Sequential Steps)

**Goal:** Agent completes multiple dependent steps

```markdown
## Task
Complete this CI/CD pipeline step:

Phase 1: Build
- npm install
- npm run build
- Check: no errors in output

Phase 2: Test
- npm run test
- Check: exit code 0
- If exit code 1: STOP, exit 1 from this task

Phase 3: Report
- Create build-report.json with:
  {
    "build": {success: true, timestamp: string},
    "tests": {success: true, count: number, failed: number},
    "status": "ready_for_deploy"
  }

## Success Criteria
All three phases complete AND exit codes from npm are 0
build-report.json created with all fields
Overall exit code: 0 (success) or 1 (failure at any phase)

## Autonomy
Execute phases in order. Do NOT:
- Ask whether to skip phases
- Ask about error handling
- Request approval between phases
```
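The stop-on-first-failure logic this template asks for is small; here is a hedged Python sketch of the agent-side behavior (file names and report fields mirror the template above; the timestamp format and omitted test counts are illustrative assumptions):

```python
import json
import subprocess
import sys
import time

# Run phases in order; stop at the first non-zero exit (Phase 2 rule above)
phases = [["npm", "install"], ["npm", "run", "build"], ["npm", "run", "test"]]
for cmd in phases:
    if subprocess.run(cmd).returncode != 0:
        sys.exit(1)

# Phase 3: write the report only if everything above succeeded
# (tests.count / tests.failed would come from parsing test output)
report = {
    "build": {"success": True, "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S")},
    "tests": {"success": True},
    "status": "ready_for_deploy",
}
with open("build-report.json", "w") as f:
    json.dump(report, f, indent=2)
```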
### Pattern 5: Decision Task (Branch Logic)

**Goal:** Agent makes decisions based on conditions

```markdown
## Task
Decide whether to deploy based on build status.

Steps:
1. Read build-report.json (created by previous task)
2. Check: all phases successful
3. If successful:
   a. Create deployment-plan.json
   b. Exit 0
4. If not successful:
   a. Create failure-report.json
   b. Exit 1

## Decision Logic
IF (build.success AND tests.success AND no_syntax_errors):
  Deploy ready
ELSE:
  Cannot deploy

## Success Criteria
One of these files exists:
- deployment-plan.json (exit 0)
- failure-report.json (exit 1)

## Autonomy
You have criteria. Do NOT:
- Ask whether to deploy
- Request confirmation
- Ask about deployment process
```
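The decision logic above is small enough to show in full. A sketch, with file names taken from the template and the report contents assumed:

```python
import json
import sys
import time

# Read the report produced by the previous task
with open("build-report.json") as f:
    report = json.load(f)

ready = report["build"]["success"] and report["tests"]["success"]

# Exactly one of the two files gets created; the exit code mirrors it
out_file = "deployment-plan.json" if ready else "failure-report.json"
with open(out_file, "w") as f:
    json.dump({"timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"), "ready": ready}, f)

sys.exit(0 if ready else 1)
```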
---

## Anti-Patterns: What NOT to Do
### ❌ Anti-Pattern 1: Ambiguous Tasks

```
WRONG: "Improve the code"
- What needs improvement?
- Which files?
- What metrics?
AGENT WILL ASK: "Can you clarify what you mean by improve?"
```

**FIX:**
```
CORRECT: "Reduce cyclomatic complexity in src/processor.ts"
- Identify functions with complexity > 5
- Refactor to reduce to < 5
- Run tests to verify no regression
```
### ❌ Anti-Pattern 2: Vague Success Criteria

```
WRONG: "Make sure it works"
- What is "it"?
- How do we verify it works?
AGENT WILL ASK: "How should I know when the task is complete?"
```

**FIX:**
```
CORRECT: "Task complete when:"
- All tests pass (pytest exit 0)
- No TypeScript errors (npm run build succeeds)
- Code coverage > 80% (check coverage report)
```
### ❌ Anti-Pattern 3: Implicit Constraints

```
WRONG: "Add this feature to the codebase"
- What files can be modified?
- What can't be changed?
AGENT WILL ASK: "Can I modify the database schema?"
```

**FIX:**
```
CORRECT: "Add feature to src/features/auth.ts:"
- This file ONLY
- Don't modify: database schema, config, types
- Do maintain: existing function signatures
```
### ❌ Anti-Pattern 4: Interactive Questions in Prompts

```
WRONG:
"Do you think we should refactor this?
Try a few approaches and tell me which is best."
AGENT WILL ASK: "What criteria for 'best'? Performance? Readability?"
```

**FIX:**
```
CORRECT:
"Refactor for readability:"
- Break functions > 20 lines into smaller functions
- Add clear variable names (no x, y, temp)
- Check: ESLint passes, no new warnings
```
### ❌ Anti-Pattern 5: Requiring User Approval

```
WRONG:
"I'm about to deploy. Is this okay? [Y/n]"
BLOCKS: Waiting for user input via stdin (won't work in background!)
```

**FIX:**
```
CORRECT:
"Validate deployment prerequisites and create deployment-plan.json"
(No approval request. User runs separately: cat deployment-plan.json)
(If satisfied, user can execute deployment)
```

---
## Handling Edge Cases Without Blocking
### Case 1: File Not Found

```markdown
## If /workspace/config.json doesn't exist:
1. Log to output: "Config file not found"
2. Create default config
3. Continue with default values
4. Do NOT ask user: "Should I create a default?"

## If error occurs during execution:
1. Log full error to output.log
2. Include: what failed, why, what was attempted
3. Exit with code 1
4. Do NOT ask: "What should I do?"
```
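Agent-side, the non-blocking fallback is a few lines of Python. A sketch, with hypothetical default values:

```python
import json
import os
import sys

CONFIG_PATH = "/workspace/config.json"
DEFAULTS = {"timeout": 60, "retries": 3}  # hypothetical defaults for illustration

if not os.path.exists(CONFIG_PATH):
    # Log the condition and continue with defaults -- never stop to ask
    print("Config file not found; creating default", file=sys.stderr)
    with open(CONFIG_PATH, "w") as f:
        json.dump(DEFAULTS, f, indent=2)
```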
### Case 2: Ambiguous State

```markdown
## If multiple versions of a file exist:
1. Document all versions found
2. Choose: most recent by timestamp
3. Continue
4. Log choice to output.log
5. Do NOT ask: "Which one should I use?"

## If task instructions conflict:
1. Document the conflict
2. Follow: primary instruction (first mentioned)
3. Log reasoning to output.log
4. Do NOT ask: "Which should I follow?"
```
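"Most recent by timestamp" is one line of Python. A sketch with a hypothetical glob pattern; the scenario assumes at least one candidate exists:

```python
import glob
import os

# Hypothetical: several copies of the same config were found
candidates = glob.glob("/workspace/**/config.json", recursive=True)
chosen = max(candidates, key=os.path.getmtime)  # most recent mtime wins

# Log the choice (and the alternatives) instead of asking
print(f"Chose {chosen}; alternatives were {sorted(set(candidates) - {chosen})}")
```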
### Case 3: Partial Success

```markdown
## If some tests pass, some fail:
1. Report both: {passed: 45, failed: 3}
2. Exit with code 1 (not 0, even though some passed)
3. Include in output: which tests failed
4. Do NOT ask: "Should I count partial success?"
```

---
## Prompt Template for Maximum Autonomy

```markdown
# Agent Task Template

## Role & Context
You are a {project_name} project agent.
Working directory: {absolute_path}
Running as user: {username}
Permissions: Full read/write in working directory

## Task Specification
{SPECIFIC task description}

Success looks like:
- {Specific deliverable 1}
- {Specific deliverable 2}
- {Specific output file/format}

## Execution Environment
Tools available: Read, Write, Edit, Bash, Glob, Grep
Directories accessible: {list specific paths}
Commands available: {list specific commands}
Constraints: {list what cannot be done}

## Exit Codes
- 0: Success (all success criteria met)
- 1: Failure (some success criteria not met, but not unrecoverable)
- 2: Error (unrecoverable, cannot continue)

## If Something Goes Wrong
1. Log the error to output
2. Try once to recover
3. If recovery fails, exit with appropriate code
4. Do NOT ask for help or clarification

## Do NOT
- Ask any clarifying questions
- Request approval for any action
- Wait for user input
- Modify files outside {working directory}
- Use tools not listed above
```

---
## Real-World Examples
### Example 1: Code Quality Scan (Read-Only)

**Prompt:**
```
Analyze code quality in /workspace/src using:
1. ESLint (npm run lint) - capture all warnings
2. TypeScript compiler (npm run build) - capture all errors
3. Count lines of code per file

Save to quality-report.json:
{
  "eslint": {
    "errors": number,
    "warnings": number,
    "rules_violated": [string]
  },
  "typescript": {
    "errors": number,
    "errors_list": [string]
  },
  "code_metrics": {
    "total_loc": number,
    "total_files": number,
    "avg_loc_per_file": number
  }
}

Exit 0 if both eslint and typescript succeeded.
Exit 1 if either had errors.
Do NOT try to fix errors, just report.
```

**Expected Agent Behavior:**
- Runs linters (no approval needed)
- Collects metrics
- Creates JSON file
- Exits with appropriate code
- No questions asked ✓
### Example 2: Database Migration (Modify + Verify)

**Prompt:**
```
Apply database migration /workspace/migrations/001_add_users_table.sql

Steps:
1. Read migration file
2. Run: psql -U postgres -d mydb -f migrations/001_add_users_table.sql
3. If success: psql ... -c "SELECT COUNT(*) FROM users;" to verify
4. Save results to migration-log.json

Success criteria:
- Migration file executed without errors
- New table exists
- migration-log.json contains:
  {
    "timestamp": string,
    "migration": "001_add_users_table.sql",
    "status": "success" | "failed",
    "error": string | null
  }

Exit 0 on success.
Exit 1 on any database error.
Do NOT manually create table if migration fails.
```

**Expected Agent Behavior:**
- Executes SQL (no approval needed)
- Verifies results
- Logs to JSON
- Exits appropriately
- No questions asked ✓
### Example 3: Deployment Check (Decision Logic)

**Prompt:**
```
Verify deployment readiness:

Checks:
1. All tests passing: npm test -> exit 0
2. Build succeeds: npm run build -> exit 0
3. No security warnings: npm audit -> moderate/high = 0
4. Environment configured: .env file exists

Create deployment-readiness.json:
{
  "ready": boolean,
  "checks": {
    "tests": boolean,
    "build": boolean,
    "security": boolean,
    "config": boolean
  },
  "blockers": [string],
  "timestamp": string
}

If all checks pass: ready = true, exit 0
If any check fails: ready = false, exit 1
Do NOT try to fix blockers. Only report.
```

**Expected Agent Behavior:**
- Runs all checks
- Documents results
- No fixes attempted
- Clear decision output
- No questions asked ✓

---
## Debugging: When Agents DO Ask Questions

### How to Detect Blocking Questions

```bash
# Check agent output for clarification questions
grep -i "should i\|would you\|can you\|do you want\|clarif" \
  /var/log/luz-orchestrator/jobs/{job_id}/output.log

# Check for approval prompts
grep -i "approve\|confirm\|permission\|y/n" \
  /var/log/luz-orchestrator/jobs/{job_id}/output.log

# Agent blocked = no exit marker in output.log
tail -5 /var/log/luz-orchestrator/jobs/{job_id}/output.log
# If the last line is NOT "exit:{code}", the agent is blocked
```
### How to Fix

1. **Identify the question** - What is the agent asking?
2. **Redesign the prompt** - Provide the answer upfront
3. **Be more specific** - Remove ambiguity
4. **Retry** - `luzia retry {job_id}`

---
## Checklist: Autonomous Prompt Quality

- [ ] Task is specific (not "improve" or "fix")
- [ ] Success criteria defined (what success looks like)
- [ ] Output format specified (JSON, file, etc.)
- [ ] Exit codes documented (0=success, 1=failure)
- [ ] Constraints listed (what can't be changed)
- [ ] No ambiguous language
- [ ] No requests for clarification
- [ ] No approval prompts
- [ ] No "if you think..." or "do you want to..."
- [ ] All context provided upfront
- [ ] Agent running as a limited user (not root)
- [ ] Task scope limited to project directory
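Parts of this checklist can be linted mechanically before a prompt is dispatched. A rough sketch; the phrase lists are heuristic assumptions, not a complete check:

```python
import re

# Heuristic red flags: vague verbs and interactive phrasing from the list above
RED_FLAGS = [
    r"\bimprove\b",
    r"\bshould i\b",
    r"\bdo you want\b",
    r"\bif you think\b",
]
# Sections a prompt should mention: success criteria and exit codes
REQUIRED = ["success", "exit"]

def lint_prompt(prompt: str) -> list[str]:
    """Return a list of problems; an empty list means no heuristic hits."""
    problems = []
    lowered = prompt.lower()
    for pattern in RED_FLAGS:
        if re.search(pattern, lowered):
            problems.append(f"red-flag phrase matched: {pattern}")
    for word in REQUIRED:
        if word not in lowered:
            problems.append(f"no mention of: {word}")
    return problems
```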
---

## Summary

**The Core Rule:**
> Autonomous agents don't ask questions because they don't need to.

Well-designed prompts provide:
1. Clear objectives
2. Specific success criteria
3. Complete context
4. Defined boundaries
5. No ambiguity

When these are present, agents execute autonomously. When they're missing, agents ask clarifying questions and block.

**For Luzia agents:** Use the five patterns (detached spawning, permission bypass, file-based I/O, exit code signaling, context-first prompting) and steer clear of the anti-patterns above.