Quality Gates

Quality gates enforce code quality standards between implementation and PR creation. They ensure all code passes basic checks before being submitted for review. Gates run as step 8 of Pilot’s execution pipeline, immediately before self-review.

Quality gates run automatically after task implementation. If gates fail, Pilot receives feedback and retries the implementation.

Quality gates are disabled by default (enabled: false). You must explicitly enable them in your configuration.

Overview

Quality gates provide automated quality assurance by running checks like build verification, tests, linting, and security scans. When gates fail, the system provides specific error feedback to guide fixes.

When Gates Run


Task Implementation → Quality Gates → Pass? → Create PR
                            ↓ Fail
                      Retry with feedback
                            ↓ Still fail
                      Notify & Stop

Gates execute in the project directory after code changes are complete but before PR creation. This catches issues early while providing actionable feedback for automatic fixes.

Gate Types

Quality gates support seven built-in types with configurable commands and thresholds:

Type	Default Timeout	Description
build	5 minutes	Compilation and syntax checking
test	10 minutes	Unit, integration, and e2e test execution
lint	2 minutes	Code style, formatting, and static analysis
coverage	10 minutes	Test coverage measurement and threshold enforcement
security	5 minutes	Security vulnerability scanning
typecheck	3 minutes	Type checking for TypeScript, Flow, or similar
custom	5 minutes	Project-specific checks and validations
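
As a rough illustration of how the defaults in this table might be applied, the sketch below resolves a gate's effective timeout: an explicit `timeout` wins, otherwise the default for the gate type is used. `DEFAULT_TIMEOUTS`, `parse_duration`, and `effective_timeout` are hypothetical names, not Pilot's actual code, and the parser assumes simple single-unit durations such as `5m` or `30s`.

```python
import re

# Defaults copied from the table above, expressed in seconds.
DEFAULT_TIMEOUTS = {
    "build": 5 * 60,
    "test": 10 * 60,
    "lint": 2 * 60,
    "coverage": 10 * 60,
    "security": 5 * 60,
    "typecheck": 3 * 60,
    "custom": 5 * 60,
}

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert a single-unit duration such as '5m' or '30s' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def effective_timeout(gate):
    """Return the gate's timeout in seconds, falling back to the type default."""
    if "timeout" in gate:
        return parse_duration(gate["timeout"])
    return DEFAULT_TIMEOUTS[gate["type"]]
```

For example, a `lint` gate with no `timeout` resolves to 120 seconds under these assumptions, while a `test` gate with `timeout: 20m` resolves to 1200 seconds.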

Build Gates

Verify that the code compiles and has no syntax errors. Auto-detected commands by project type:

Go: go build ./...
Node.js + TypeScript: npm run build || npx tsc --noEmit
Rust: cargo check
Python: python -m py_compile *.py

Test Gates

Execute test suites with configurable timeout. Common commands:

make test
npm test
go test ./...
pytest

Lint Gates

Enforce code style and catch common issues:

make lint
npm run lint
golangci-lint run
eslint .

Coverage Gates

Measure test coverage and enforce minimum thresholds. Coverage output is parsed from:

Go: go test -cover ./...
Jest: npm test -- --coverage
Python: pytest --cov=.

Set threshold: 80 to require 80% coverage.

Security Gates

Scan for vulnerabilities and security issues:

npm audit
govulncheck ./... (Go)
safety check (Python)
cargo audit (Rust)

Type Check Gates

Verify type safety in typed languages:

npx tsc --noEmit (TypeScript)
mypy . (Python)
flow check (Flow)

Custom Gates

Run project-specific checks:

Bundle size validation
Performance benchmarks
API contract verification
Database migration validation

Configuration

Enable quality gates in ~/.pilot/config.yaml:


quality:
  enabled: true
  gates:
    - name: build
      type: build
      command: "make build"
      required: true
      timeout: 5m
      max_retries: 2
      failure_hint: "Fix compilation errors in the changed files"
 
    - name: test
      type: test
      command: "make test"
      required: true
      timeout: 10m
      max_retries: 2
      failure_hint: "Fix failing tests or update test expectations"
 
    - name: lint
      type: lint
      command: "make lint"
      required: false  # warn only
      timeout: 2m
      max_retries: 1
      failure_hint: "Fix linting errors: formatting, unused imports, etc."
 
    - name: coverage
      type: coverage
      command: "go test -cover ./..."
      required: true
      threshold: 80  # minimum coverage percentage
      timeout: 10m
 
    - name: security
      type: security
      command: "npm audit"
      required: false
      timeout: 5m
 
  on_failure:
    action: retry  # retry | fail | warn
    max_retries: 2
    notify_on: [failed]

Gate Properties

Property	Description	Required
`name`	Unique gate identifier	✓
`type`	Gate type (build, test, etc.)	✓
`command`	Shell command to execute	✓
`required`	Fail pipeline if gate fails
`timeout`	Maximum execution time
`threshold`	Coverage percentage (coverage gates only)
`max_retries`	Retry attempts on failure
`retry_delay`	Delay between retries
`failure_hint`	Guidance for Claude on failure
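
The property rules in this table can be checked before a pipeline ever runs. The following is a minimal sketch of such a check, assuming gates have already been loaded from YAML into plain dictionaries; `validate_gates` and the exact messages are hypothetical, and Pilot's own validation may differ.

```python
KNOWN_TYPES = {"build", "test", "lint", "coverage", "security", "typecheck", "custom"}
REQUIRED_PROPERTIES = ("name", "type", "command")


def validate_gates(gates):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    seen_names = set()
    for index, gate in enumerate(gates):
        label = gate.get("name", f"gate #{index + 1}")
        # name, type, and command are the properties marked as required.
        for prop in REQUIRED_PROPERTIES:
            if not gate.get(prop):
                problems.append(f"{label}: missing required property '{prop}'")
        if gate.get("type") and gate["type"] not in KNOWN_TYPES:
            problems.append(f"{label}: unknown type '{gate['type']}'")
        # threshold is documented for coverage gates only.
        if "threshold" in gate and gate.get("type") != "coverage":
            problems.append(f"{label}: 'threshold' only applies to coverage gates")
        # name is documented as a unique identifier.
        name = gate.get("name")
        if name in seen_names:
            problems.append(f"{label}: duplicate gate name")
        if name:
            seen_names.add(name)
    return problems
```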

Behavior

Required vs Optional Gates

Required gates: Pipeline fails if gate fails after retries
Optional gates: Generate warnings but allow PR creation
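
One way to picture the distinction is as a small decision function over final gate results. This is a sketch only: `pipeline_outcome` is a hypothetical helper, it looks at results after all retries are exhausted, and it ignores the `on_failure` action described later.

```python
def pipeline_outcome(results):
    """Summarize final gate results.

    `results` is a list of (gate_name, required, passed) tuples.
    """
    # A failed required gate blocks the PR; a failed optional gate only warns.
    blocking = [name for name, required, passed in results if required and not passed]
    warnings = [name for name, required, passed in results if not required and not passed]
    return {
        "create_pr": not blocking,
        "blocking": blocking,
        "warnings": warnings,
    }
```

With a passing required `build` gate and a failing optional `lint` gate, the result allows PR creation and lists `lint` as a warning.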

Retry Logic

Failed gates trigger automatic retries with:

Error feedback: Gate output is provided to Claude
Failure hints: Custom guidance for common issues
Delay: Configurable wait between attempts
Max attempts: Prevents infinite retry loops

Failure Actions

Configure pipeline behavior when required gates fail:

retry: Provide feedback and retry implementation (default)
fail: Stop pipeline immediately
warn: Log warning but continue to PR creation

Retry Behavior

Quality gates use a two-level retry system to maximize recovery from failures:

Two-Level Retry System

Level 1: Gate-Level Retries

Individual gates retry their command with configurable delay before reporting failure:


- name: build
  max_retries: 2      # Retry up to 2 times
  retry_delay: 5s     # Wait 5 seconds between attempts

Gate-level retries handle transient failures (network issues, flaky commands) without re-invoking Claude Code.
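
A gate-level retry loop of this kind can be sketched with the standard library alone. The function below is illustrative, not Pilot's implementation: `run_gate` is a hypothetical name, delays and timeouts are given in seconds, and a timeout is treated as a failed attempt.

```python
import subprocess
import time


def run_gate(command, max_retries=0, retry_delay=0.0, timeout=300.0):
    """Run a shell command, retrying on failure.

    Returns (passed, output, attempts).
    """
    attempts = 0
    output = ""
    while attempts <= max_retries:
        attempts += 1
        try:
            proc = subprocess.run(
                command,
                shell=True,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            output = proc.stdout + proc.stderr
            if proc.returncode == 0:
                return True, output, attempts
        except subprocess.TimeoutExpired:
            output = f"timed out after {timeout} seconds"
        # Wait only if another attempt is still allowed.
        if attempts <= max_retries:
            time.sleep(retry_delay)
    return False, output, attempts
```

With `max_retries=2`, a command that always fails is executed three times in total: the initial run plus two retries.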

Level 2: Pipeline-Level Retries

When gate-level retries are exhausted, the pipeline can re-invoke Claude Code with error feedback:


on_failure:
  action: retry       # Re-invoke Claude with feedback
  max_retries: 2      # Maximum pipeline-level retries

Pipeline-level retries provide Claude with the full error output and failure_hint, allowing it to fix the underlying code issues.

Retry Flow


┌─────────────────────────────────────────────────────────────────────┐
│                         QUALITY GATE EXECUTION                       │
└─────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
                           ┌───────────────┐
                           │   Run Gate    │
                           │   Command     │
                           └───────────────┘
                                    │
                          ┌─────────┴─────────┐
                          │                   │
                          ▼                   ▼
                    ┌──────────┐        ┌──────────┐
                    │  Pass    │        │  Fail    │
                    └──────────┘        └──────────┘
                          │                   │
                          │           ┌───────┴───────┐
                          │           │  Gate-level   │
                          │           │  retries      │
                          │           │  remaining?   │
                          │           └───────────────┘
                          │                   │
                          │         ┌─────────┴─────────┐
                          │         │ YES               │ NO
                          │         ▼                   ▼
                          │   ┌───────────┐      ┌─────────────┐
                          │   │   Wait    │      │ Check       │
                          │   │   delay   │      │ on_failure  │
                          │   └───────────┘      │ action      │
                          │         │            └─────────────┘
                          │         │                   │
                          │         └──────┐    ┌───────┴───────┬────────┐
                          │                │    │               │        │
                          │                ▼    ▼               ▼        ▼
                          │           [retry]  "retry"        "fail"   "warn"
                          │                     │               │        │
                          │           ┌─────────┴─────────┐     │        │
                          │           │ Pipeline retries  │     │        │
                          │           │ remaining?        │     │        │
                          │           └───────────────────┘     │        │
                          │                   │                 │        │
                          │         ┌─────────┴─────────┐       │        │
                          │         │ YES               │ NO    │        │
                          │         ▼                   ▼       ▼        ▼
                          │   ┌───────────────┐   ┌──────────┐  │  ┌──────────┐
                          │   │ Re-invoke     │   │ Pipeline │  │  │ Continue │
                          │   │ Claude Code   │   │ Failed   │◄─┘  │ to PR    │
                          │   │ with feedback │   └──────────┘     └──────────┘
                          │   └───────────────┘
                          │         │
                          │         └───────────────┐
                          │                         │
                          ▼                         ▼
                    ┌──────────┐             [run gates again]
                    │ Continue │
                    │  to PR   │
                    └──────────┘

on_failure Actions

Action	Behavior	Use Case
`retry`	Re-invoke Claude Code with error feedback	Default. Allows AI to fix issues
`fail`	Stop pipeline immediately	Strict mode. No automatic recovery
`warn`	Log warning, continue to PR creation	Non-blocking gates

Example: Full Retry Configuration


quality:
  enabled: true
  gates:
    - name: build
      type: build
      command: "go build ./..."
      required: true
      max_retries: 2        # Gate-level: retry command 2 times
      retry_delay: 3s       # Wait 3s between gate retries
      failure_hint: "Fix compilation errors shown above"
 
    - name: test
      type: test
      command: "go test ./..."
      required: true
      max_retries: 1
      retry_delay: 5s
      failure_hint: "Fix failing tests. Check assertions and expected values"
 
  on_failure:
    action: retry           # Pipeline-level: re-invoke Claude
    max_retries: 2          # Up to 2 full re-runs with feedback
    notify_on: [failed]     # Alert when pipeline fails

With this configuration:

Build gate runs, fails → waits 3s, retries → fails → waits 3s, retries → fails
Gate-level retries exhausted → pipeline checks on_failure.action
Action is retry → Claude Code re-invoked with build error output
Claude fixes code → gates run again
Up to 2 pipeline retries before final failure
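
To make the interaction of the two levels concrete, the sketch below simulates the loop with injected callbacks instead of real commands. `run_pipeline`, `run_command`, and `fix_code` are hypothetical names; `fix_code` stands in for re-invoking Claude Code with feedback, and delays are omitted.

```python
def run_pipeline(gates, run_command, fix_code, pipeline_retries):
    """Simulate the two-level retry loop.

    gates: list of dicts with 'name' and 'max_retries'.
    run_command(name) -> bool: one execution of the gate command.
    fix_code(name): called when a gate fails and pipeline retries remain.
    Returns True if all gates eventually pass.
    """
    for pipeline_attempt in range(pipeline_retries + 1):
        failed = None
        for gate in gates:
            # Level 1: initial run plus up to max_retries re-runs,
            # stopping at the first success.
            passed = any(
                run_command(gate["name"]) for _ in range(gate["max_retries"] + 1)
            )
            if not passed:
                failed = gate["name"]
                break
        if failed is None:
            return True
        if pipeline_attempt < pipeline_retries:
            # Level 2: hand the failure back for a fix, then run gates again.
            fix_code(failed)
    return False
```

Under this reading of the configuration above, a build that never passes runs 3 times per round across 3 rounds (the initial round plus 2 pipeline retries), for 9 command executions and 2 re-invocations before the pipeline fails.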

Auto Build Gate

When quality gates are enabled but no gates are configured, Pilot automatically creates a minimal build gate by detecting the project type.

Detection Priority

Pilot checks for project indicators in this order:

Priority	File	Build Command
1	`go.mod`	`go build ./...`
2	`package.json` + `tsconfig.json`	`npm run build || npx tsc --noEmit`
3	`package.json`	`npm run build --if-present`
4	`Cargo.toml`	`cargo check`
5	`pyproject.toml` or `setup.py`	`python -m py_compile ...`
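
The priority order in this table amounts to a first-match lookup over indicator files. A minimal sketch, assuming detection only checks the project root; `detect_build_command` is a hypothetical name and Pilot's actual detection may inspect more than file presence.

```python
from pathlib import Path


def detect_build_command(project_dir):
    """Return the build command for the first matching indicator, or None."""
    root = Path(project_dir)

    def has(name):
        return (root / name).is_file()

    if has("go.mod"):
        return "go build ./..."
    if has("package.json") and has("tsconfig.json"):
        return "npm run build || npx tsc --noEmit"
    if has("package.json"):
        return "npm run build --if-present"
    if has("Cargo.toml"):
        return "cargo check"
    if has("pyproject.toml") or has("setup.py"):
        # The table elides the arguments; the string is kept verbatim here.
        return "python -m py_compile ..."
    return None
```

Because the checks run in priority order, a repository containing both `go.mod` and `package.json` resolves to the Go command.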

Generated Configuration

When auto-detection triggers, Pilot generates this minimal config:


quality:
  enabled: true
  gates:
    - name: build
      type: build
      command: "<detected-command>"  # Based on project type
      required: true
      timeout: 3m
      max_retries: 1
      retry_delay: 3s
      failure_hint: "Fix compilation errors in the changed files"
 
  on_failure:
    action: retry
    max_retries: 1

Enabling Auto Build Gate

Simply enable quality gates without specifying any gates:


quality:
  enabled: true
  # No gates specified - auto-detection kicks in

The auto build gate provides basic compilation safety with zero configuration. For test, lint, or coverage gates, you must configure them explicitly.

Override Auto-Detection

To use a custom build command instead of auto-detection:


quality:
  enabled: true
  gates:
    - name: build
      type: build
      command: "make build"  # Custom command overrides detection

Self-Review

After quality gates pass, Pilot runs an automatic self-review phase where Claude examines its own changes for common issues. This runs before PR creation.

Self-review is advisory only. Errors during self-review do not block PR creation. The phase is designed to catch and fix issues when possible, but failures are logged and execution continues.

What Self-Review Checks

Diff Analysis: Examines staged changes for:
- Methods called that don’t exist
- Struct fields added but never used
- Config fields that aren’t wired through
- Unused imports
Build Verification: Runs build command to catch compile errors
Wiring Check: For new struct fields:
- Verifies field is assigned when creating the struct
- Verifies field is used somewhere in the code
Method Existence Check: For new method calls:
- Searches for method implementation
- Implements missing methods if needed
Issue-to-Changes Alignment: Compares issue title/body with actual changes:
- Detects if files mentioned in the issue weren’t modified
- Flags incomplete implementations

Self-Review Flow


Quality Gates Pass
        │
        ▼
┌───────────────────┐
│   Self-Review     │
│   Phase           │
│   (2 min timeout) │
└───────────────────┘
        │
        ├─── Issues found? ───► Fix automatically
        │                              │
        │                              ▼
        │                       Commit fixes
        │                              │
        ▼                              │
  No issues / ◄────────────────────────┘
  Review passed
        │
        ▼
   Create PR

Configuration

Self-review is enabled by default. To disable:


executor:
  skip_self_review: true

Acceptance Criteria Verification

Starting in v2.49.0, self-review includes automatic verification of acceptance criteria (ACs) from the source issue.

How ACs flow through the pipeline:


Issue Body (ACs)
       │
       ▼
┌──────────────────┐
│ prompt_builder.go │  ← Extracts ACs from issue body (lines 73-80)
│ Parse checkboxes  │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   Execution      │  ← Claude implements with ACs in context
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   Self-Review    │  ← Verifies each AC was actually implemented
│   AC Check       │
└──────────────────┘

Extraction: The prompt builder parses the issue body for Markdown checkboxes (- [ ] ...) and structured acceptance criteria sections. These are injected into the execution prompt so Claude knows exactly what to deliver.
Verification: During self-review, each extracted AC is checked against the actual diff. The reviewer looks for evidence that the criterion was addressed — matching file changes, new tests, config additions, etc.
Signals: If an AC appears unaddressed, self-review emits an INCOMPLETE: signal with the specific criterion, giving Claude a chance to fix it before PR creation.

Example issue body:


### Acceptance Criteria
- [ ] Add rate limiting middleware to /api/v1/* routes
- [ ] Default limit: 100 requests per minute per IP
- [ ] Return 429 status with Retry-After header
- [ ] Add rate limit config to config.yaml

Each checkbox becomes a verification target. Self-review confirms the diff includes middleware registration, the 100 req/min default, 429 response handling, and a config struct field.
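
The extraction step can be approximated with a single regular expression over the issue body. This is a sketch under stated assumptions, not the logic in `prompt_builder.go`: `extract_acceptance_criteria` is a hypothetical name, and it handles only Markdown checkboxes, not the structured acceptance criteria sections mentioned above.

```python
import re

# Matches "- [ ] text" or "- [x] text" (also "*" bullets), one per line.
CHECKBOX = re.compile(
    r"^[ \t]*[-*][ \t]+\[[ xX]\][ \t]+(.+?)[ \t]*$",
    re.MULTILINE,
)


def extract_acceptance_criteria(issue_body):
    """Return the text of each Markdown checkbox in the issue body."""
    return [match.group(1) for match in CHECKBOX.finditer(issue_body)]
```

Applied to the example issue body, this yields four strings, one per checkbox, with the heading line ignored.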

Skipped Conditions

Self-review is automatically skipped for:

Trivial tasks: Simple changes that don’t warrant review
Disabled in config: When skip_self_review: true

Output Signals

Self-review emits signals in its output:

Signal	Meaning
`REVIEW_PASSED`	No issues found
`REVIEW_FIXED:`	Issues found and fixed
`INCOMPLETE:`	Files mentioned in the issue were not modified, or an acceptance criterion appears unaddressed
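
If you post-process self-review output yourself, the signals can be picked out line by line. The sketch below assumes each signal starts its own line and that any detail follows the colon; `parse_review_signals` is a hypothetical helper, and the exact output layout is not specified here.

```python
def parse_review_signals(output):
    """Classify self-review output by the signals it contains."""
    result = {"passed": False, "fixed": [], "incomplete": []}
    for line in output.splitlines():
        line = line.strip()
        if line == "REVIEW_PASSED":
            result["passed"] = True
        elif line.startswith("REVIEW_FIXED:"):
            result["fixed"].append(line[len("REVIEW_FIXED:"):].strip())
        elif line.startswith("INCOMPLETE:"):
            result["incomplete"].append(line[len("INCOMPLETE:"):].strip())
    return result
```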

Minimal Configuration

For basic protection without full configuration:


quality:
  enabled: true
  # Uses minimal build gate with auto-detection

This enables build verification only, with commands auto-detected from project type.

Monitoring

Quality gate results are logged and tracked:


# View recent gate results
pilot logs --quality
 
# Dashboard with gate status
pilot start --dashboard

Best Practices

Start with build and test gates only. Add linting and coverage gates gradually to avoid overwhelming the system with failures.

Start simple: Enable the build gate first, then tests; add others incrementally
Tune timeouts: Adjust based on project size and CI performance
Meaningful hints: Provide specific guidance in failure_hint
Optional first: Make new gates optional until they’re stable
Test locally: Verify gate commands work in your environment

Troubleshooting

Gates Always Failing

Check that gate commands work in your project:


# Test gate commands manually
make build
make test
make lint

Timeout Issues

Increase timeout for slow operations:


- name: integration-test
  timeout: 20m  # Longer for slow tests

Coverage Parsing

Ensure the coverage output format is supported. Quality gates parse these patterns:

Go: coverage: X.X% of statements
Jest: All files | X.X |
Python: TOTAL.*X%
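
To check locally whether your coverage output would match one of these shapes, a sketch like the following can help. The regular expressions are approximations written for this example, not Pilot's actual patterns, and `parse_coverage` returns the first match it finds, which is an assumption for multi-package Go output.

```python
import re

COVERAGE_PATTERNS = [
    # Go: "coverage: 84.2% of statements"
    re.compile(r"coverage:\s+(\d+(?:\.\d+)?)% of statements"),
    # Jest: "All files |   91.5 | ..."
    re.compile(r"All files\s*\|\s*(\d+(?:\.\d+)?)\s*\|"),
    # pytest-cov: "TOTAL    120    10    92%"
    re.compile(r"^TOTAL\b.*?(\d+(?:\.\d+)?)%", re.MULTILINE),
]


def parse_coverage(output):
    """Return the first coverage percentage found, or None."""
    for pattern in COVERAGE_PATTERNS:
        match = pattern.search(output)
        if match:
            return float(match.group(1))
    return None


def meets_threshold(output, threshold):
    """True only if coverage was parsed and is at least `threshold`."""
    coverage = parse_coverage(output)
    return coverage is not None and coverage >= threshold
```

If `parse_coverage` returns `None` for your tool's output, the format likely differs from the three shapes listed above.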