Quality Gates
Quality gates enforce code quality standards between implementation and PR creation. They ensure all code passes basic checks before being submitted for review.
Quality gates run automatically after task implementation. If gates fail, Pilot receives feedback and retries the implementation.
Quality gates are disabled by default (enabled: false). You must explicitly enable them in your configuration.
Overview
Quality gates provide automated quality assurance by running checks like build verification, tests, linting, and security scans. When gates fail, the system provides specific error feedback to guide fixes.
When Gates Run
Task Implementation → Quality Gates → Pass? → Create PR
                                        ↓ Fail
                               Retry with feedback
                                        ↓ Still fail
                                  Notify & Stop
Gates execute in the project directory after code changes are complete but before PR creation. This catches issues early while providing actionable feedback for automatic fixes.
Gate Types
Quality gates support seven built-in types with configurable commands and thresholds:
| Type | Default Timeout | Description |
|---|---|---|
| build | 5 minutes | Compilation and syntax checking |
| test | 10 minutes | Unit, integration, and e2e test execution |
| lint | 2 minutes | Code style, formatting, and static analysis |
| coverage | 10 minutes | Test coverage measurement and threshold enforcement |
| security | 5 minutes | Security vulnerability scanning |
| typecheck | 3 minutes | Type checking for TypeScript, Flow, or similar |
| custom | 5 minutes | Project-specific checks and validations |
Build Gates
Verify code compiles and has no syntax errors. Auto-detects project type:
- Go: go build ./...
- Node.js + TypeScript: npm run build || npx tsc --noEmit
- Rust: cargo check
- Python: python -m py_compile *.py
Test Gates
Execute test suites with configurable timeout. Common commands:
- make test
- npm test
- go test ./...
- pytest
Lint Gates
Enforce code style and catch common issues:
- make lint
- npm run lint
- golangci-lint run
- eslint .
Coverage Gates
Measure test coverage and enforce minimum thresholds. Supports parsing:
- Go: go test -cover ./...
- Jest: npm test -- --coverage
- Python: pytest --cov=.
Set threshold: 80 to require 80% coverage.
Security Gates
Scan for vulnerabilities and security issues:
- npm audit (Node.js)
- govulncheck ./... (Go)
- safety check (Python)
- cargo audit (Rust)
Type Check Gates
Verify type safety in typed languages:
- npx tsc --noEmit (TypeScript)
- mypy . (Python)
- flow check (Flow)
Custom Gates
Run project-specific checks:
- Bundle size validation
- Performance benchmarks
- API contract verification
- Database migration validation
Configuration
Enable quality gates in ~/.pilot/config.yaml:
quality:
  enabled: true
  gates:
    - name: build
      type: build
      command: "make build"
      required: true
      timeout: 5m
      max_retries: 2
      failure_hint: "Fix compilation errors in the changed files"
    - name: test
      type: test
      command: "make test"
      required: true
      timeout: 10m
      max_retries: 2
      failure_hint: "Fix failing tests or update test expectations"
    - name: lint
      type: lint
      command: "make lint"
      required: false  # warn only
      timeout: 2m
      max_retries: 1
      failure_hint: "Fix linting errors: formatting, unused imports, etc."
    - name: coverage
      type: coverage
      command: "go test -cover ./..."
      required: true
      threshold: 80    # minimum coverage percentage
      timeout: 10m
    - name: security
      type: security
      command: "npm audit"
      required: false
      timeout: 5m
  on_failure:
    action: retry      # retry | fail | warn
    max_retries: 2
    notify_on: [failed]
Gate Properties
| Property | Description | Required |
|---|---|---|
| name | Unique gate identifier | ✓ |
| type | Gate type (build, test, etc.) | ✓ |
| command | Shell command to execute | ✓ |
| required | Fail pipeline if gate fails | |
| timeout | Maximum execution time | |
| threshold | Coverage percentage (coverage gates only) | |
| max_retries | Retry attempts on failure | |
| retry_delay | Delay between retries | |
| failure_hint | Guidance for Claude on failure | |
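To make the table concrete, here is a minimal Python sketch that checks a gate definition for the three required properties and fills in the default timeout for its type. The helper name `normalize_gate` and its error messages are assumptions made for this illustration, not Pilot's actual implementation; the default timeouts are copied from the Gate Types table.

```python
# Hypothetical helper: validate a gate definition and apply per-type defaults.
# Default timeouts mirror the Gate Types table; everything else is assumed.
DEFAULT_TIMEOUTS = {
    "build": "5m", "test": "10m", "lint": "2m", "coverage": "10m",
    "security": "5m", "typecheck": "3m", "custom": "5m",
}
REQUIRED_KEYS = ("name", "type", "command")


def normalize_gate(gate: dict) -> dict:
    """Return a copy of `gate` with the default timeout filled in.

    Raises ValueError if a required property is missing, the type is
    unknown, or `threshold` is set on a non-coverage gate.
    """
    missing = [key for key in REQUIRED_KEYS if not gate.get(key)]
    if missing:
        raise ValueError(f"gate is missing required properties: {missing}")
    if gate["type"] not in DEFAULT_TIMEOUTS:
        raise ValueError(f"unknown gate type: {gate['type']!r}")
    if "threshold" in gate and gate["type"] != "coverage":
        raise ValueError("threshold only applies to coverage gates")
    normalized = dict(gate)
    normalized.setdefault("timeout", DEFAULT_TIMEOUTS[gate["type"]])
    return normalized


if __name__ == "__main__":
    print(normalize_gate({"name": "lint", "type": "lint", "command": "make lint"}))
```

An explicit `timeout` in the gate definition is left untouched; only a missing one falls back to the table value.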
Behavior
Required vs Optional Gates
- Required gates: Pipeline fails if gate fails after retries
- Optional gates: Generate warnings but allow PR creation
Retry Logic
Failed gates trigger automatic retries with:
- Error feedback: Gate output is provided to Claude
- Failure hints: Custom guidance for common issues
- Delay: Configurable wait between attempts
- Max attempts: Prevents infinite retry loops
Failure Actions
Configure pipeline behavior when required gates fail:
- retry: Provide feedback and retry implementation (default)
- fail: Stop pipeline immediately
- warn: Log warning but continue to PR creation
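The interaction between required/optional gates and the three failure actions can be sketched as a small decision function. This is an illustrative model only: the function name `pipeline_outcome` and the returned labels are hypothetical, and the sketch assumes that only failed required gates trigger the configured action.

```python
def pipeline_outcome(gate_results, action="retry", pipeline_retries_left=0):
    """Decide the next pipeline step from gate results.

    gate_results: list of (passed, required) boolean pairs, one per gate.
    action: the configured on_failure action ("retry", "fail" or "warn").
    Returns one of the hypothetical labels
    "create_pr", "retry_with_feedback" or "failed".
    """
    required_failed = any(
        not passed and required for passed, required in gate_results
    )
    if not required_failed:
        # Optional gate failures only generate warnings.
        return "create_pr"
    if action == "warn":
        return "create_pr"
    if action == "retry" and pipeline_retries_left > 0:
        return "retry_with_feedback"
    # action == "fail", or the retry budget is exhausted.
    return "failed"


if __name__ == "__main__":
    print(pipeline_outcome([(True, True), (False, False)]))
    print(pipeline_outcome([(False, True)], "retry", pipeline_retries_left=2))
```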
Retry Behavior
Quality gates use a two-level retry system to maximize recovery from failures:
Two-Level Retry System
Level 1: Gate-Level Retries
Individual gates retry their command with configurable delay before reporting failure:
- name: build
  max_retries: 2     # Retry up to 2 times
  retry_delay: 5s    # Wait 5 seconds between attempts
Gate-level retries handle transient failures (network issues, flaky commands) without re-invoking Claude Code.
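A gate-level retry loop of this kind might look like the following Python sketch. The function `run_gate` and its return shape are assumptions for illustration, as is the choice to treat a timeout like any other failed attempt; Pilot's own runner may differ.

```python
import subprocess
import time


def run_gate(command, max_retries=2, retry_delay=5.0, timeout=300.0):
    """Run a gate command, retrying on failure.

    Returns (passed, attempts, last_output). The command runs once and is
    then retried up to `max_retries` times, sleeping `retry_delay` seconds
    between attempts. Assumption: a timeout counts as a failed attempt.
    """
    output = ""
    for attempt in range(1, max_retries + 2):  # first run + retries
        try:
            proc = subprocess.run(
                command, shell=True, capture_output=True,
                text=True, timeout=timeout,
            )
            output = proc.stdout + proc.stderr
            if proc.returncode == 0:
                return True, attempt, output
        except subprocess.TimeoutExpired:
            output = f"timed out after {timeout}s"
        if attempt <= max_retries:
            time.sleep(retry_delay)  # wait before the next attempt
    return False, max_retries + 1, output


if __name__ == "__main__":
    print(run_gate("exit 1", max_retries=2, retry_delay=0.1))
```

The captured output of the final attempt is what a pipeline-level retry would hand back as error feedback.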
Level 2: Pipeline-Level Retries
When gate-level retries are exhausted, the pipeline can re-invoke Claude Code with error feedback:
on_failure:
  action: retry      # Re-invoke Claude with feedback
  max_retries: 2     # Maximum pipeline-level retries
Pipeline-level retries provide Claude with the full error output and failure_hint, allowing it to fix the underlying code issues.
Retry Flow
┌────────────────────────────────────────────────────────┐
│                 QUALITY GATE EXECUTION                 │
└────────────────────────────────────────────────────────┘
               │
               ▼
       ┌───────────────┐
       │   Run Gate    │
       │    Command    │
       └───────────────┘
               │
     ┌─────────┴─────────┐
     │                   │
     ▼                   ▼
┌──────────┐        ┌──────────┐
│   Pass   │        │   Fail   │
└──────────┘        └──────────┘
     │                   │
     │           ┌───────┴───────┐
     │           │  Gate-level   │
     │           │    retries    │
     │           │  remaining?   │
     │           └───────────────┘
     │                   │
     │         ┌─────────┴─────────┐
     │         │ YES               │ NO
     │         ▼                   ▼
     │   ┌───────────┐      ┌─────────────┐
     │   │   Wait    │      │    Check    │
     │   │   delay   │      │ on_failure  │
     │   └───────────┘      │   action    │
     │         │            └─────────────┘
     │         │                   │
     │         │           ┌───────┴───────┬────────┐
     │         ▼           ▼               ▼        ▼
     │      [retry]     "retry"         "fail"   "warn"
     │                     │               │        │
     │           ┌─────────┴─────────┐     │        │
     │           │ Pipeline retries  │     │        │
     │           │    remaining?     │     │        │
     │           └───────────────────┘     │        │
     │                     │               │        │
     │           ┌─────────┴───────┐       │        │
     │           │ YES             │ NO    │        │
     │           ▼                 ▼       │        ▼
     │   ┌───────────────┐   ┌──────────┐  │  ┌──────────┐
     │   │   Re-invoke   │   │ Pipeline │  │  │ Continue │
     │   │  Claude Code  │   │  Failed  │◄─┘  │  to PR   │
     │   │ with feedback │   └──────────┘     └──────────┘
     │   └───────────────┘
     │           │
     │           ▼
     │   [run gates again]
     ▼
┌──────────┐
│ Continue │
│  to PR   │
└──────────┘
on_failure Actions
| Action | Behavior | Use Case |
|---|---|---|
| retry | Re-invoke Claude Code with error feedback | Default. Allows AI to fix issues |
| fail | Stop pipeline immediately | Strict mode. No automatic recovery |
| warn | Log warning, continue to PR creation | Non-blocking gates |
Example: Full Retry Configuration
quality:
  enabled: true
  gates:
    - name: build
      type: build
      command: "go build ./..."
      required: true
      max_retries: 2        # Gate-level: retry command 2 times
      retry_delay: 3s       # Wait 3s between gate retries
      failure_hint: "Fix compilation errors shown above"
    - name: test
      type: test
      command: "go test ./..."
      required: true
      max_retries: 1
      retry_delay: 5s
      failure_hint: "Fix failing tests. Check assertions and expected values"
  on_failure:
    action: retry           # Pipeline-level: re-invoke Claude
    max_retries: 2          # Up to 2 full re-runs with feedback
    notify_on: [failed]     # Alert when pipeline fails
With this configuration:
- Build gate runs, fails → waits 3s, retries → fails → waits 3s, retries → fails
- Gate-level retries exhausted → pipeline checks on_failure.action
- Action is retry → Claude Code re-invoked with build error output
- Claude fixes code → gates run again
- Up to 2 pipeline retries before final failure
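As a rough worked example of how the two levels multiply, the sketch below computes the worst-case number of times a single gate command could run. It assumes that each pipeline-level retry re-runs the gate with a fresh gate-level retry budget, which this page does not state explicitly; the function name `max_command_runs` is hypothetical.

```python
def max_command_runs(gate_max_retries: int, pipeline_max_retries: int) -> int:
    """Worst-case executions of one gate command.

    Assumption: the gate-level retry budget resets on every pipeline round.
    """
    attempts_per_round = 1 + gate_max_retries   # first run + gate retries
    rounds = 1 + pipeline_max_retries           # first round + re-invocations
    return attempts_per_round * rounds


if __name__ == "__main__":
    # Build gate from the example: max_retries 2, on_failure.max_retries 2.
    print(max_command_runs(2, 2))  # 3 attempts x 3 rounds = 9
```

Under that assumption, the build gate above could run up to 9 times and the test gate up to 6 times before the pipeline reports a final failure, which is worth keeping in mind when choosing timeouts.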
Auto Build Gate
When quality gates are enabled but no gates are configured, Pilot automatically creates a minimal build gate by detecting the project type.
Detection Priority
Pilot checks for project indicators in this order:
| Priority | File | Build Command |
|---|---|---|
| 1 | go.mod | go build ./... |
| 2 | package.json + tsconfig.json | npm run build \|\| npx tsc --noEmit |
| 3 | package.json | npm run build --if-present |
| 4 | Cargo.toml | cargo check |
| 5 | pyproject.toml or setup.py | python -m py_compile ... |
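The priority order in the table can be expressed as a simple first-match lookup. The Python sketch below is an illustration under stated assumptions: `detect_build_command` is a hypothetical name, detection is modeled as a plain file-existence check in the project root, and the Python command is kept exactly as elided in the table.

```python
import tempfile
from pathlib import Path


def detect_build_command(project_dir):
    """Return the build command for the first matching project indicator.

    Checks follow the priority order of the detection table; returns None
    when no indicator file is found.
    """
    root = Path(project_dir)

    def has(name):
        return (root / name).is_file()

    if has("go.mod"):
        return "go build ./..."
    if has("package.json") and has("tsconfig.json"):
        return "npm run build || npx tsc --noEmit"
    if has("package.json"):
        return "npm run build --if-present"
    if has("Cargo.toml"):
        return "cargo check"
    if has("pyproject.toml") or has("setup.py"):
        # The file arguments are elided in the source table; kept verbatim.
        return "python -m py_compile ..."
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "package.json").write_text("{}")
        print(detect_build_command(tmp))
```

Because the checks run in priority order, a repository containing both go.mod and package.json resolves to the Go command.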
Generated Configuration
When auto-detection triggers, Pilot generates this minimal config:
quality:
  enabled: true
  gates:
    - name: build
      type: build
      command: "<detected-command>"  # Based on project type
      required: true
      timeout: 3m
      max_retries: 1
      retry_delay: 3s
      failure_hint: "Fix compilation errors in the changed files"
  on_failure:
    action: retry
    max_retries: 1
Enabling Auto Build Gate
Simply enable quality gates without specifying any gates:
quality:
  enabled: true
  # No gates specified - auto-detection kicks in
Auto build gate provides basic compilation safety with zero configuration. For test, lint, or coverage gates, you must configure them explicitly.
Override Auto-Detection
To use a custom build command instead of auto-detection:
quality:
  enabled: true
  gates:
    - name: build
      type: build
      command: "make build"  # Custom command overrides detection
Self-Review
After quality gates pass, Pilot runs an automatic self-review phase where Claude examines its own changes for common issues. This runs before PR creation.
Self-review is advisory only. Errors during self-review do not block PR creation. The phase is designed to catch and fix issues when possible, but failures are logged and execution continues.
What Self-Review Checks
- Diff Analysis: Examines staged changes for:
  - Methods called that don’t exist
  - Struct fields added but never used
  - Config fields that aren’t wired through
  - Unused imports
- Build Verification: Runs build command to catch compile errors
- Wiring Check: For new struct fields:
  - Verifies field is assigned when creating the struct
  - Verifies field is used somewhere in the code
- Method Existence Check: For new method calls:
  - Searches for method implementation
  - Implements missing methods if needed
- Issue-to-Changes Alignment: Compares issue title/body with actual changes:
  - Detects if files mentioned in the issue weren’t modified
  - Flags incomplete implementations
Self-Review Flow
 Quality Gates Pass
          │
          ▼
┌───────────────────┐
│    Self-Review    │
│       Phase       │
│  (2 min timeout)  │
└───────────────────┘
          │
          ├─── Issues found? ───► Fix automatically
          │                               │
          │                               ▼
          │                         Commit fixes
          │                               │
          ▼                               │
     No issues / ◄────────────────────────┘
    Review passed
          │
          ▼
      Create PR
Configuration
Self-review is enabled by default. To disable:
executor:
  skip_self_review: true
Acceptance Criteria Verification
Starting in v2.49.0, self-review includes automatic verification of acceptance criteria (ACs) from the source issue.
How ACs flow through the pipeline:
  Issue Body (ACs)
          │
          ▼
┌───────────────────┐
│ prompt_builder.go │ ← Extracts ACs from issue body (lines 73-80)
│ Parse checkboxes  │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│     Execution     │ ← Claude implements with ACs in context
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│    Self-Review    │ ← Verifies each AC was actually implemented
│     AC Check      │
└───────────────────┘
- Extraction: The prompt builder parses the issue body for Markdown checkboxes (- [ ] ...) and structured acceptance criteria sections. These are injected into the execution prompt so Claude knows exactly what to deliver.
- Verification: During self-review, each extracted AC is checked against the actual diff. The reviewer looks for evidence that the criterion was addressed: matching file changes, new tests, config additions, etc.
- Signals: If an AC appears unaddressed, self-review emits an INCOMPLETE: signal with the specific criterion, giving Claude a chance to fix it before PR creation.
Example issue body:
### Acceptance Criteria
- [ ] Add rate limiting middleware to /api/v1/* routes
- [ ] Default limit: 100 requests per minute per IP
- [ ] Return 429 status with Retry-After header
- [ ] Add rate limit config to config.yaml
Each checkbox becomes a verification target. Self-review confirms the diff includes middleware registration, the 100 req/min default, 429 response handling, and a config struct field.
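The checkbox extraction step can be approximated with a short regular expression. The Python sketch below is only an illustration of the idea, not the logic in prompt_builder.go: the name `extract_acceptance_criteria` is hypothetical, and it handles Markdown checkboxes only, not the structured acceptance criteria sections mentioned above.

```python
import re

# Matches "- [ ] text", "- [x] text" and "* [ ] text" list items.
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*\S)\s*$")


def extract_acceptance_criteria(issue_body):
    """Return a list of (criterion_text, already_checked) tuples."""
    criteria = []
    for line in issue_body.splitlines():
        match = CHECKBOX.match(line)
        if match:
            checked = match.group(1).lower() == "x"
            criteria.append((match.group(2), checked))
    return criteria


if __name__ == "__main__":
    body = """### Acceptance Criteria
- [ ] Add rate limiting middleware to /api/v1/* routes
- [ ] Default limit: 100 requests per minute per IP
- [ ] Return 429 status with Retry-After header
- [ ] Add rate limit config to config.yaml
"""
    for text, checked in extract_acceptance_criteria(body):
        print(checked, text)
```

Each returned criterion would then become one verification target for the self-review phase.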
Skipped Conditions
Self-review is automatically skipped for:
- Trivial tasks: Simple changes that don’t warrant review
- Disabled in config: When skip_self_review: true is set
Output Signals
Self-review emits signals in its output:
| Signal | Meaning |
|---|---|
| REVIEW_PASSED | No issues found |
| REVIEW_FIXED: | Issues found and fixed |
| INCOMPLETE: | Files mentioned in the issue were not modified, or an acceptance criterion appears unaddressed |
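For tooling that consumes self-review output, the signals can be picked out line by line. The following Python sketch assumes each signal appears at the start of its own line, followed by optional detail text; that layout and the name `parse_review_signals` are assumptions for illustration, not a documented output format.

```python
import re

# Signal names are taken from the table above; the line layout is assumed.
SIGNAL = re.compile(r"^(REVIEW_PASSED|REVIEW_FIXED:|INCOMPLETE:)\s*(.*)$")


def parse_review_signals(output):
    """Return a list of (signal_name, detail) tuples found in `output`."""
    signals = []
    for line in output.splitlines():
        match = SIGNAL.match(line.strip())
        if match:
            name = match.group(1).rstrip(":")
            signals.append((name, match.group(2).strip()))
    return signals


if __name__ == "__main__":
    sample = (
        "Checking diff...\n"
        "REVIEW_FIXED: removed unused import\n"
        "INCOMPLETE: config.yaml not modified\n"
    )
    print(parse_review_signals(sample))
```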
Minimal Configuration
For basic protection without full configuration:
quality:
  enabled: true
  # Uses minimal build gate with auto-detection
This enables build verification only, with commands auto-detected from project type.
Monitoring
Quality gate results are logged and tracked:
# View recent gate results
pilot logs --quality
# Dashboard with gate status
pilot start --dashboard
Best Practices
Start with build and test gates only. Add linting and coverage gates gradually to avoid overwhelming the system with failures.
- Start simple: Enable build gates first, add others incrementally
- Tune timeouts: Adjust based on project size and CI performance
- Meaningful hints: Provide specific guidance in failure_hint
- Optional first: Make new gates optional until they’re stable
- Test locally: Verify gate commands work in your environment
Troubleshooting
Gates Always Failing
Check that gate commands work in your project:
# Test gate commands manually
make build
make test
make lint
Timeout Issues
Increase timeout for slow operations:
- name: integration-test
  timeout: 20m  # Longer for slow tests
Coverage Parsing
Ensure coverage output format is supported. Quality gates parse:
- Go: coverage: X.X% of statements
- Jest: All files | X.X |
- Python: TOTAL.*X%
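To check locally whether your coverage output matches one of these formats, a small parser along the following lines can help. This Python sketch is an approximation built from the three patterns listed above; the exact expressions Pilot uses are not shown on this page, and returning the first match when several lines qualify is an assumption.

```python
import re

# One pattern per supported format, approximating the shapes listed above.
COVERAGE_PATTERNS = [
    re.compile(r"coverage:\s*(\d+(?:\.\d+)?)% of statements"),  # Go
    re.compile(r"All files\s*\|\s*(\d+(?:\.\d+)?)\s*\|"),       # Jest
    re.compile(r"TOTAL.*?(\d+(?:\.\d+)?)%"),                    # Python
]


def parse_coverage(output):
    """Return the first coverage percentage found, or None."""
    for pattern in COVERAGE_PATTERNS:
        match = pattern.search(output)
        if match:
            return float(match.group(1))
    return None


def coverage_gate_passes(output, threshold):
    """True when coverage was parsed and meets the threshold."""
    percent = parse_coverage(output)
    return percent is not None and percent >= threshold


if __name__ == "__main__":
    go_line = "ok  example/pkg 0.012s coverage: 82.5% of statements"
    print(parse_coverage(go_line), coverage_gate_passes(go_line, 80))
```

If `parse_coverage` returns None for your tool's output, the format probably differs from the three listed here.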