# CLI Agent Patterns and Prompt Design

## Practical Guide for Building Non-Blocking Agents

**Date:** 2026-01-09
**Version:** 1.0
**Audience:** Agent developers, prompt engineers

---

## Quick Reference: 5 Critical Patterns
### 1. Detached Spawning (Never Block)

```python
import os
import subprocess
import uuid

# ✅ CORRECT: Agent runs in background
os.system('nohup script.sh >/dev/null 2>&1 &')
job_id = str(uuid.uuid4())
return job_id  # Return immediately (from inside your spawn function)

# ❌ WRONG: Parent waits for agent to finish
result = subprocess.run(['claude', ...])  # run() always blocks until exit
# CLI blocked until agent completes!
```
### 2. Permission Bypass (No Approval Dialogs)

```bash
# ✅ CORRECT: Agents don't ask for tool approval
claude --permission-mode bypassPermissions --dangerously-skip-permissions ...

# ❌ WRONG: Default mode asks for confirmation on tool use
claude ...
# Blocks waiting for user to approve: "This command has high privileges. Approve? [Y/n]"
```
### 3. File-Based I/O (No stdin/stdout)

```python
# ✅ CORRECT: All I/O via files
with open(f"{job_dir}/prompt.txt", "w") as f:
    f.write(full_prompt)

# Agent reads prompt from file
# Agent writes output to log file
# Status checked by reading exit code from file

# ❌ WRONG: Trying to use stdin/stdout
process = subprocess.Popen(..., stdin=PIPE, stdout=PIPE)
process.stdin.write(prompt)   # What if backgrounded? stdin unavailable!
result = process.stdout.read()  # Parent blocked waiting!
```
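Putting patterns 1–3 together: a minimal spawn sketch, assuming a per-job directory layout and that `claude` accepts the prompt on stdin in print mode (`-p`). The helper name, paths, and exact CLI wiring are illustrative assumptions, not a fixed API; the exit-code append anticipates pattern 4 below.

```python
import os
import subprocess
import uuid

def spawn_agent(prompt: str, jobs_root: str = "/tmp/agent-jobs") -> str:
    """Sketch: spawn a detached agent with file-based I/O (assumed layout)."""
    job_id = str(uuid.uuid4())
    job_dir = os.path.join(jobs_root, job_id)
    os.makedirs(job_dir, exist_ok=True)

    # Pattern 3: prompt goes in via a file, output comes out via a log file
    with open(os.path.join(job_dir, "prompt.txt"), "w") as f:
        f.write(prompt)

    # Patterns 1 + 2: detached process, permissions bypassed;
    # exit code appended to the log (pattern 4)
    script = (
        f'claude -p --permission-mode bypassPermissions '
        f'--dangerously-skip-permissions < "{job_dir}/prompt.txt" '
        f'>> "{job_dir}/output.log" 2>&1; '
        f'echo "exit:$?" >> "{job_dir}/output.log"'
    )
    subprocess.Popen(["bash", "-c", script], start_new_session=True)
    return job_id  # returns immediately; poll output.log for "exit:" later
```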
### 4. Exit Code Signaling (Async Status)

```bash
# ✅ CORRECT: Append exit code to output
command...
exit_code=$?
echo "exit:$exit_code" >> output.log

# Later, check status without the process
grep "^exit:" output.log  # Returns immediately

# ❌ WRONG: Only store in memory
# Process exits, exit code lost
# Can't determine status later
```
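On the polling side, a status check is just a scan for the sentinel line. A minimal sketch, assuming the `exit:` convention above and the job-directory layout from the earlier spawn sketch:

```python
import os
import re

def check_status(job_dir: str):
    """Return the agent's exit code if finished, else None (still running)."""
    log_path = os.path.join(job_dir, "output.log")
    if not os.path.exists(log_path):
        return None  # job not started, or log not created yet
    with open(log_path) as f:
        for line in f:
            m = re.match(r"exit:(\d+)\s*$", line)
            if m:
                return int(m.group(1))
    return None  # no sentinel yet: agent still running (or blocked)
```

A `None` that persists well past the expected runtime is the blocked-agent signal discussed in the debugging section at the end.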
### 5. Context-First Prompts (Minimize Questions)

```
# ✅ CORRECT: Specific, complete, unambiguous
You are running as user: musica
Working directory: /workspace
You have permission to read/write files here.

Task: Run pytest in /workspace/tests and save results to results.json
Success criteria: File contains {passed: int, failed: int, skipped: int}
Exit code: 0 if all tests pass, 1 if any fail

Do NOT ask for clarification. You have all needed information.

# ❌ WRONG: Vague, requires interpretation
Fix the test suite.
(What needs fixing? Which tests? Agent will need to ask!)
```
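Context-first prompts can also be assembled mechanically so no field is forgotten. A sketch using the same layout as the correct example above (the function and parameter names are illustrative):

```python
def build_prompt(user: str, workdir: str, task: str,
                 success: str, exit_codes: str) -> str:
    """Assemble a context-first prompt: environment, task, criteria, autonomy."""
    return "\n".join([
        f"You are running as user: {user}",
        f"Working directory: {workdir}",
        "You have permission to read/write files here.",
        "",
        f"Task: {task}",
        f"Success criteria: {success}",
        f"Exit code: {exit_codes}",
        "",
        "Do NOT ask for clarification. You have all needed information.",
    ])
```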
---

## Prompt Patterns for Autonomy

### Pattern 1: Analysis Task (Read-Only)

**Goal:** Agent analyzes code without modifying anything

```markdown
## Task
Analyze the TypeScript codebase in /workspace/src for:
1. Total files
2. Total lines of code (excluding comments/blanks)
3. Number of functions
4. Number of classes
5. Average cyclomatic complexity per function
6. Top 3 most complex files

## Success Criteria
Save results to /workspace/analysis.json with structure:
{
  "total_files": number,
  "total_loc": number,
  "functions": number,
  "classes": number,
  "avg_complexity": number,
  "hotspots": [
    {"file": string, "complexity": number, "functions": number}
  ]
}

## Exit Codes
- Exit 0: Success, file created with all fields
- Exit 1: File not created or missing fields
- Exit 2: Unrecoverable error (no TypeScript found, etc.)

## Autonomy
You have all information needed. Do NOT:
- Ask which files to analyze
- Ask which metrics matter
- Request clarification on format
```
### Pattern 2: Execution Task (Run & Report)

**Goal:** Agent runs command and reports results

```markdown
## Task
Run the test suite in /workspace/tests with the following requirements:

1. Use pytest with JSON output
2. Run: pytest tests/ --json=results.json
3. Capture exit code
4. Create summary.json with:
   - Total tests run
   - Passed count
   - Failed count
   - Skipped count
   - Exit code from pytest

## Success Criteria
Both results.json (from pytest) and summary.json (created by you) must exist.

Exit 0 if pytest exit code is 0 (all passed)
Exit 1 if pytest exit code is non-zero (failures)

## What to Do If Tests Fail
1. Create summary.json anyway with failure counts
2. Exit with code 1 (not 2, this is expected)
3. Do NOT try to fix tests yourself

## Autonomy
You know what to do. Do NOT:
- Ask which tests to run
- Ask about test configuration
- Request approval before running tests
```
### Pattern 3: Implementation Task (Read + Modify)

**Goal:** Agent modifies code based on specification

```markdown
## Task
Add error handling to /workspace/src/database.ts

Requirements:
1. All database calls must have try/catch
2. Catch blocks must log to console.error
3. Catch blocks must return null (not throw)
4. Add TypeScript types for error parameter

## Success Criteria
File compiles without syntax errors (verify with: npm run build)
All database functions protected (search file for db\. calls)

## Exit Codes
- Exit 0: All database calls wrapped, no TypeScript errors
- Exit 1: Some database calls not wrapped, OR TypeScript errors exist
- Exit 2: File not found or unrecoverable

## Verification
After modifications:
npm run build  # Must succeed with no errors

## Autonomy
You have specific requirements. Do NOT:
- Ask which functions need wrapping
- Ask about error logging format
- Request confirmation before modifying
```
### Pattern 4: Multi-Phase Task (Sequential Steps)

**Goal:** Agent completes multiple dependent steps

```markdown
## Task
Complete this CI/CD pipeline step:

Phase 1: Build
- npm install
- npm run build
- Check: no errors in output

Phase 2: Test
- npm run test
- Check: exit code 0
- If exit code 1: STOP, exit 1 from this task

Phase 3: Report
- Create build-report.json with:
  {
    "build": {success: true, timestamp: string},
    "tests": {success: true, count: number, failed: number},
    "status": "ready_for_deploy"
  }

## Success Criteria
All three phases complete AND exit codes from npm are 0
build-report.json created with all fields
Overall exit code: 0 (success) or 1 (failure at any phase)

## Autonomy
Execute phases in order. Do NOT:
- Ask whether to skip phases
- Ask about error handling
- Request approval between phases
```
### Pattern 5: Decision Task (Branch Logic)

**Goal:** Agent makes decisions based on conditions

```markdown
## Task
Decide whether to deploy based on build status.

Steps:
1. Read build-report.json (created by previous task)
2. Check: all phases successful
3. If successful:
   a. Create deployment-plan.json
   b. Exit 0
4. If not successful:
   a. Create failure-report.json
   b. Exit 1

## Decision Logic
IF (build.success AND tests.success AND no_syntax_errors):
  Deploy ready
ELSE:
  Cannot deploy

## Success Criteria
One of these files exists:
- deployment-plan.json (exit 0)
- failure-report.json (exit 1)

## Autonomy
You have criteria. Do NOT:
- Ask whether to deploy
- Request confirmation
- Ask about deployment process
```
---

## Anti-Patterns: What NOT to Do

### ❌ Anti-Pattern 1: Ambiguous Tasks

```
WRONG: "Improve the code"
- What needs improvement?
- Which files?
- What metrics?
AGENT WILL ASK: "Can you clarify what you mean by improve?"
```

**FIX:**
```
CORRECT: "Reduce cyclomatic complexity in src/processor.ts"
- Identify functions with complexity > 5
- Refactor to reduce to < 5
- Run tests to verify no regression
```
### ❌ Anti-Pattern 2: Vague Success Criteria

```
WRONG: "Make sure it works"
- What is "it"?
- How do we verify it works?
AGENT WILL ASK: "How should I know when the task is complete?"
```

**FIX:**
```
CORRECT: "Task complete when:"
- All tests pass (pytest exit 0)
- No TypeScript errors (npm run build succeeds)
- Code coverage > 80% (check coverage report)
```
### ❌ Anti-Pattern 3: Implicit Constraints

```
WRONG: "Add this feature to the codebase"
- What files can be modified?
- What can't be changed?
AGENT WILL ASK: "Can I modify the database schema?"
```

**FIX:**
```
CORRECT: "Add feature to src/features/auth.ts:"
- This file ONLY
- Don't modify: database schema, config, types
- Do maintain: existing function signatures
```
### ❌ Anti-Pattern 4: Interactive Questions in Prompts

```
WRONG:
"Do you think we should refactor this?
Try a few approaches and tell me which is best."
AGENT WILL ASK: "What criteria for 'best'? Performance? Readability?"
```

**FIX:**
```
CORRECT:
"Refactor for readability:"
- Break functions > 20 lines into smaller functions
- Add clear variable names (no x, y, temp)
- Check: ESLint passes, no new warnings
```
### ❌ Anti-Pattern 5: Requiring User Approval

```
WRONG:
"I'm about to deploy. Is this okay? [Y/n]"
BLOCKS: Waiting for user input via stdin (won't work in background!)
```

**FIX:**
```
CORRECT:
"Validate deployment prerequisites and create deployment-plan.json"
(No approval request. User reviews separately: cat deployment-plan.json)
(If satisfied, the user can execute the deployment)
```
---

## Handling Edge Cases Without Blocking

### Case 1: File Not Found

```markdown
## If /workspace/config.json doesn't exist:
1. Log to output: "Config file not found"
2. Create default config
3. Continue with default values
4. Do NOT ask user: "Should I create a default?"

## If error occurs during execution:
1. Log full error to output.log
2. Include: what failed, why, what was attempted
3. Exit with code 1
4. Do NOT ask: "What should I do?"
```
### Case 2: Ambiguous State

```markdown
## If multiple versions of a file exist:
1. Document all versions found
2. Choose: most recent by timestamp
3. Continue
4. Log choice to output.log
5. Do NOT ask: "Which one should I use?"

## If task instructions conflict:
1. Document the conflict
2. Follow: primary instruction (first mentioned)
3. Log reasoning to output.log
4. Do NOT ask: "Which should I follow?"
```
### Case 3: Partial Success

```markdown
## If some tests pass, some fail:
1. Report both: {passed: 45, failed: 3}
2. Exit with code 1 (not 0, even though some passed)
3. Include in output: which tests failed
4. Do NOT ask: "Should I count partial success?"
```
---

## Prompt Template for Maximum Autonomy

```markdown
# Agent Task Template

## Role & Context
You are a {project_name} project agent.
Working directory: {absolute_path}
Running as user: {username}
Permissions: Full read/write in working directory

## Task Specification
{SPECIFIC task description}

Success looks like:
- {Specific deliverable 1}
- {Specific deliverable 2}
- {Specific output file/format}

## Execution Environment
Tools available: Read, Write, Edit, Bash, Glob, Grep
Directories accessible: {list specific paths}
Commands available: {list specific commands}
Constraints: {list what cannot be done}

## Exit Codes
- 0: Success (all success criteria met)
- 1: Failure (some success criteria not met, but not unrecoverable)
- 2: Error (unrecoverable, cannot continue)

## If Something Goes Wrong
1. Log the error to output
2. Try once to recover
3. If recovery fails, exit with appropriate code
4. Do NOT ask for help or clarification

## Do NOT
- Ask any clarifying questions
- Request approval for any action
- Wait for user input
- Modify files outside {working directory}
- Use tools not listed above
```
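The template can be filled per job with plain string substitution. A minimal sketch, assuming the template text above is stored in a file; the keys mirror its placeholders and the values are illustrative:

```python
TEMPLATE_VALUES = {
    "{project_name}": "luzia",
    "{absolute_path}": "/workspace",
    "{username}": "musica",
    "{SPECIFIC task description}": "Run pytest in /workspace/tests; save summary.json",
}

def fill_template(template_path: str, values: dict) -> str:
    """Substitute each {placeholder} literally. str.format() would choke on
    placeholders containing spaces, so plain replace() is used instead."""
    with open(template_path) as f:
        text = f.read()
    for placeholder, value in values.items():
        text = text.replace(placeholder, value)
    return text
```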
---

## Real-World Examples

### Example 1: Code Quality Scan (Read-Only)

**Prompt:**
```
Analyze code quality in /workspace/src using:
1. ESLint (npm run lint) - capture all warnings
2. TypeScript compiler (npm run build) - capture all errors
3. Count lines of code per file

Save to quality-report.json:
{
  "eslint": {
    "errors": number,
    "warnings": number,
    "rules_violated": [string]
  },
  "typescript": {
    "errors": number,
    "errors_list": [string]
  },
  "code_metrics": {
    "total_loc": number,
    "total_files": number,
    "avg_loc_per_file": number
  }
}

Exit 0 if both eslint and typescript succeeded.
Exit 1 if either had errors.
Do NOT try to fix errors, just report.
```

**Expected Agent Behavior:**
- Runs linters (no approval needed)
- Collects metrics
- Creates JSON file
- Exits with appropriate code
- No questions asked ✓
### Example 2: Database Migration (Modify + Verify)

**Prompt:**
```
Apply database migration /workspace/migrations/001_add_users_table.sql

Steps:
1. Read migration file
2. Run: psql -U postgres -d mydb -f migrations/001_add_users_table.sql
3. If success: psql ... -c "SELECT COUNT(*) FROM users;" to verify
4. Save results to migration-log.json

Success criteria:
- Migration file executed without errors
- New table exists
- migration-log.json contains:
  {
    "timestamp": string,
    "migration": "001_add_users_table.sql",
    "status": "success" | "failed",
    "error": string | null
  }

Exit 0 on success.
Exit 1 on any database error.
Do NOT manually create the table if the migration fails.
```

**Expected Agent Behavior:**
- Executes SQL (no approval needed)
- Verifies results
- Logs to JSON
- Exits appropriately
- No questions asked ✓
### Example 3: Deployment Check (Decision Logic)

**Prompt:**
```
Verify deployment readiness:

Checks:
1. All tests passing: npm test -> exit 0
2. Build succeeds: npm run build -> exit 0
3. No security warnings: npm audit -> moderate/high = 0
4. Environment configured: .env file exists

Create deployment-readiness.json:
{
  "ready": boolean,
  "checks": {
    "tests": boolean,
    "build": boolean,
    "security": boolean,
    "config": boolean
  },
  "blockers": [string],
  "timestamp": string
}

If all checks pass: ready = true, exit 0
If any check fails: ready = false, exit 1
Do NOT try to fix blockers. Only report.
```

**Expected Agent Behavior:**
- Runs all checks
- Documents results
- No fixes attempted
- Clear decision output
- No questions asked ✓
---

## Debugging: When Agents DO Ask Questions

### How to Detect Blocking Questions

```bash
# Check agent output for clarification questions
grep -i "should i\|would you\|can you\|do you want\|clarif" \
  /var/log/luz-orchestrator/jobs/{job_id}/output.log

# Check for approval prompts
grep -i "approve\|confirm\|permission\|y/n" \
  /var/log/luz-orchestrator/jobs/{job_id}/output.log

# Agent blocked = exit code not in output.log
tail -5 /var/log/luz-orchestrator/jobs/{job_id}/output.log
# If last line is NOT "exit:{code}", agent is blocked
```
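The same detection can run programmatically. A sketch that mirrors the greps above (the phrase list comes from this section; the function name and the 20-line tail window are illustrative choices):

```python
import re

BLOCKING_PHRASES = re.compile(
    r"should i|would you|can you|do you want|clarif|approve|confirm|y/n",
    re.IGNORECASE,
)

def diagnose_job(log_path: str) -> str:
    """Classify a job log: 'done', 'blocked' (asking a question), or 'running'."""
    with open(log_path) as f:
        lines = f.read().splitlines()
    if any(line.startswith("exit:") for line in lines):
        return "done"
    # No exit sentinel yet: look for question/approval phrasing near the end
    if any(BLOCKING_PHRASES.search(line) for line in lines[-20:]):
        return "blocked"
    return "running"
```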
### How to Fix

1. **Identify the question** - What is the agent asking?
2. **Redesign the prompt** - Provide the answer upfront
3. **Be more specific** - Remove ambiguity
4. **Retry** - `luzia retry {job_id}`
---

## Checklist: Autonomous Prompt Quality

- [ ] Task is specific (not "improve" or "fix")
- [ ] Success criteria defined (what success looks like)
- [ ] Output format specified (JSON, file, etc.)
- [ ] Exit codes documented (0=success, 1=failure)
- [ ] Constraints listed (what can't be changed)
- [ ] No ambiguous language
- [ ] No requests for clarification
- [ ] No approval prompts
- [ ] No "if you think..." or "do you want to..."
- [ ] All context provided upfront
- [ ] Agent runs as a limited user (not root)
- [ ] Task scope limited to the project directory
---

## Summary

**The Core Rule:**
> Autonomous agents don't ask questions because they don't need to.

Well-designed prompts provide:
1. Clear objectives
2. Specific success criteria
3. Complete context
4. Defined boundaries
5. No ambiguity

When these are present, agents execute autonomously. When they're missing, agents ask clarifying questions and block.

**For Luzia agents:** Use the 5 patterns (detached spawning, permission bypass, file-based I/O, exit code signaling, context-first prompting) and avoid the anti-patterns above.