CI/CD Pipelines - Production Deployment Automation and Best Practices
Master CI/CD pipelines with GitHub Actions, GitLab CI, Jenkins, automated testing, deployment strategies, rollback mechanisms, and production best practices for continuous delivery.
Continuous Integration and Continuous Deployment (CI/CD) automate software delivery from code commit to production deployment, reducing deployment time from hours to minutes while improving reliability. This comprehensive guide covers modern CI/CD patterns, pipeline configuration, testing strategies, deployment automation, and production best practices used by companies deploying hundreds of times daily.
Why CI/CD Matters
Faster Delivery: Automated pipelines deploy code within minutes of merge, enabling rapid iteration and feature delivery.
Reduced Risk: Automated testing catches bugs before production, while gradual rollout strategies minimize blast radius.
Consistency: Every deployment follows the same process, eliminating manual errors and configuration drift.
Developer Productivity: Developers focus on features instead of deployment procedures; teams adopting CI/CD commonly report severalfold gains in delivery velocity.
Amazon has famously reported averaging a deployment every 11.7 seconds using automated CI/CD pipelines, while Netflix deploys thousands of times daily across its microservices architecture.
CI/CD Fundamentals
Pipeline Stages
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  # Stage 1: Code Quality
  lint-and-format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      - name: Check formatting
        run: npm run format:check
      - name: Type check
        run: npm run type-check

  # Stage 2: Unit Tests
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run unit tests
        run: npm run test:unit -- --coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info

  # Stage 3: Integration Tests
  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432  # map to the runner so tests can reach localhost:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run database migrations
        run: npm run migrate:test
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test
      - name: Run integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test
          REDIS_URL: redis://localhost:6379

  # Stage 4: E2E Tests
  e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright
        run: npx playwright install --with-deps
      - name: Build application
        run: npm run build
      - name: Run E2E tests
        run: npm run test:e2e
      - name: Upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/

  # Stage 5: Build Container
  build:
    needs: [lint-and-format, unit-tests, integration-tests, e2e-tests]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:latest
            ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # Stage 6: Deploy to Staging
  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy to staging
        run: |
          kubectl set image deployment/app \
            app=ghcr.io/${{ github.repository }}:${{ github.sha }} \
            --namespace=staging
      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/app \
            --namespace=staging \
            --timeout=5m

  # Stage 7: Smoke Tests
  smoke-tests:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run smoke tests
        run: npm run test:smoke
        env:
          BASE_URL: https://staging.myapp.com
      - name: Check health endpoints
        run: |
          curl -f https://staging.myapp.com/health || exit 1
          curl -f https://staging.myapp.com/ready || exit 1

  # Stage 8: Deploy to Production
  deploy-production:
    needs: smoke-tests
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy to production
        run: |
          kubectl set image deployment/app \
            app=ghcr.io/${{ github.repository }}:${{ github.sha }} \
            --namespace=production
      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/app \
            --namespace=production \
            --timeout=10m
      - name: Notify deployment
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "🚀 Deployed ${{ github.sha }} to production",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "Deployment successful!\n*Commit*: `${{ github.sha }}`\n*Author*: ${{ github.actor }}"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
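The deploy steps in stages 6 and 8 differ only in namespace and timeout, which makes them easy to factor out. A minimal shell sketch of how the image reference and deploy command fit together (`acme/myapp` and the SHA are placeholder values, and the functions only print the command rather than invoking kubectl):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the image reference produced by the build stage.
image_ref() {
  local repo="$1" sha="$2"
  printf 'ghcr.io/%s:%s\n' "$repo" "$sha"
}

# Compose the kubectl command issued in stages 6 and 8 (printed, not run).
deploy_cmd() {
  local repo="$1" sha="$2" ns="$3"
  printf 'kubectl set image deployment/app app=%s --namespace=%s\n' \
    "$(image_ref "$repo" "$sha")" "$ns"
}

deploy_cmd "acme/myapp" "abc1234" "staging"
# → kubectl set image deployment/app app=ghcr.io/acme/myapp:abc1234 --namespace=staging
```

Keeping the command construction in one place means staging and production cannot drift apart in anything but their parameters.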
Testing in CI/CD
Parallel Test Execution
# .github/workflows/parallel-tests.yml
name: Parallel Tests

on: [push, pull_request]

jobs:
  test-matrix:
    strategy:
      matrix:
        node-version: [18, 20, 22]
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - name: Run tests
        run: npm test

  test-sharding:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run test shard ${{ matrix.shard }}
        run: npm test -- --shard=${{ matrix.shard }}/4
Test Coverage Requirements
# .github/workflows/coverage.yml (step fragment)
- name: Check coverage
  run: |
    npm test -- --coverage --coverageReporters=json-summary
    COVERAGE=$(jq '.total.lines.pct' coverage/coverage-summary.json)
    if (( $(echo "$COVERAGE < 80" | bc -l) )); then
      echo "Coverage $COVERAGE% is below 80% threshold"
      exit 1
    fi
    echo "Coverage: $COVERAGE%"
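The same gate can be pulled out of the workflow into a plain function so the threshold logic is testable locally. This sketch uses awk for the floating-point comparison instead of bc, which is not guaranteed to be installed on every runner:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Coverage gate as a standalone function: takes the percentage (and an
# optional threshold, default 80) and fails when coverage is below it.
coverage_gate() {
  local pct="$1" threshold="${2:-80}"
  # awk exits 0 when pct < threshold, i.e. when the gate should fail
  if awk -v p="$pct" -v t="$threshold" 'BEGIN { exit (p < t) ? 0 : 1 }'; then
    echo "Coverage ${pct}% is below ${threshold}% threshold"
    return 1
  fi
  echo "Coverage: ${pct}%"
}

coverage_gate 85.5          # passes the gate
coverage_gate 72.3 || true  # fails the gate; || true keeps the demo running
```

In CI the function would be fed the jq output shown above; locally it can be exercised with any number.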
Deployment Strategies
Blue-Green Deployment
# blue-green-deploy.yml
name: Blue-Green Deployment

on:
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to green environment
        run: |
          kubectl apply -f k8s/deployment-green.yaml
          kubectl wait --for=condition=available deployment/app-green --timeout=5m
      - name: Run smoke tests on green
        run: npm run test:smoke
        env:
          BASE_URL: https://green.myapp.com
      - name: Switch traffic to green
        run: |
          kubectl patch service app \
            -p '{"spec":{"selector":{"version":"green"}}}'
      - name: Monitor for errors
        run: |
          sleep 60
          ERROR_RATE=$(curl -s https://metrics.myapp.com/error-rate)
          if (( $(echo "$ERROR_RATE > 1.0" | bc -l) )); then
            echo "High error rate detected, rolling back"
            kubectl patch service app \
              -p '{"spec":{"selector":{"version":"blue"}}}'
            exit 1
          fi
      - name: Decommission blue environment
        run: kubectl delete deployment app-blue
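The rollback decision in the "Monitor for errors" step boils down to one numeric comparison. Extracting it as a function makes the threshold explicit and easy to unit test; the 1.0% default mirrors the check above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns success (0) when the observed error rate exceeds the threshold,
# meaning traffic should be switched back to blue. Threshold defaults to 1.0.
should_rollback() {
  local error_rate="$1" max="${2:-1.0}"
  awk -v e="$error_rate" -v m="$max" 'BEGIN { exit (e > m) ? 0 : 1 }'
}

if should_rollback 2.5; then
  echo "rollback: switch service selector back to blue"
else
  echo "healthy: keep traffic on green"
fi
```

In the workflow, `should_rollback "$ERROR_RATE"` would replace the inline bc expression, and the threshold becomes a single tunable parameter.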
Canary Deployment
# canary-deploy.yml
name: Canary Deployment

on:
  push:
    branches: [main]

jobs:
  canary:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy canary (10% traffic)
        run: |
          kubectl apply -f k8s/deployment-canary.yaml
          kubectl apply -f k8s/virtualservice-10-percent.yaml
      - name: Monitor canary for 10 minutes
        run: |
          for i in {1..10}; do
            echo "Monitoring minute $i/10..."
            CANARY_ERRORS=$(kubectl logs -l version=canary | grep ERROR | wc -l)
            STABLE_ERRORS=$(kubectl logs -l version=stable | grep ERROR | wc -l)
            if [ "$CANARY_ERRORS" -gt $((STABLE_ERRORS * 2)) ]; then
              echo "Canary error rate too high, rolling back"
              kubectl delete deployment app-canary
              exit 1
            fi
            sleep 60
          done
      - name: Increase to 50% traffic
        run: kubectl apply -f k8s/virtualservice-50-percent.yaml
      - name: Monitor for 5 minutes
        run: sleep 300
      - name: Promote canary to stable
        run: |
          kubectl apply -f k8s/deployment-stable-new-version.yaml
          kubectl delete deployment app-canary
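The canary health check compares relative error counts rather than an absolute threshold. As a standalone function it reads like this (note this sketch assumes comparable log volumes; unequal replica counts between canary and stable would still need normalizing):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Flag the canary as unhealthy when its error count is more than double
# the stable version's, matching the check in the monitoring loop above.
canary_unhealthy() {
  local canary_errors="$1" stable_errors="$2"
  [ "$canary_errors" -gt $((stable_errors * 2)) ]
}

if canary_unhealthy 25 10; then
  echo "rollback canary"       # 25 > 2 * 10
else
  echo "canary healthy"
fi
```

A ratio-based check tolerates baseline noise: if stable is already logging errors, the canary is only blamed when it is meaningfully worse.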
Rolling Deployment
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  selector:          # required: must match the pod template labels
    matchLabels:
      app: app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # Allow 2 extra pods during update
      maxUnavailable: 1  # Allow 1 pod to be unavailable
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: myapp:v2.0.0
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
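The surge settings translate into hard bounds on pod count during the update: never more than replicas + maxSurge pods exist, and never fewer than replicas - maxUnavailable are ready. A quick sketch of that arithmetic for the manifest above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Compute the pod-count envelope a RollingUpdate is allowed to occupy.
rolling_bounds() {
  local replicas="$1" max_surge="$2" max_unavailable="$3"
  echo "max_pods=$((replicas + max_surge)) min_ready=$((replicas - max_unavailable))"
}

rolling_bounds 10 2 1   # the values from k8s/deployment.yaml
# → max_pods=12 min_ready=9
```

So with 10 replicas the cluster briefly runs up to 12 pods while guaranteeing at least 9 serve traffic, which is the capacity trade-off to budget for.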
Secrets Management
Encrypted Secrets
# .github/workflows/secrets.yml
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Get secrets from Parameter Store
        run: |
          DATABASE_URL=$(aws ssm get-parameter \
            --name /myapp/production/database-url \
            --with-decryption \
            --query Parameter.Value \
            --output text)
          echo "::add-mask::$DATABASE_URL"
          echo "DATABASE_URL=$DATABASE_URL" >> "$GITHUB_ENV"
      - name: Deploy with secrets
        run: |
          kubectl create secret generic app-secrets \
            --from-literal=database-url="$DATABASE_URL" \
            --dry-run=client -o yaml | kubectl apply -f -
HashiCorp Vault Integration
- name: Import secrets from Vault
  uses: hashicorp/vault-action@v2
  with:
    url: https://vault.mycompany.com
    token: ${{ secrets.VAULT_TOKEN }}
    secrets: |
      secret/data/myapp/production database_url | DATABASE_URL ;
      secret/data/myapp/production api_key | API_KEY
Monitoring and Observability
Deployment Metrics
- name: Record deployment metrics
  run: |
    # START_TIME is exported by an earlier step in the job
    DEPLOYMENT_TIME=$(($(date +%s) - START_TIME))
    curl -X POST https://metrics.myapp.com/deployments \
      -H "Content-Type: application/json" \
      -d "{
        \"commit\": \"${{ github.sha }}\",
        \"duration\": $DEPLOYMENT_TIME,
        \"status\": \"success\",
        \"environment\": \"production\"
      }"
- name: Create deployment marker in Datadog
  run: |
    curl -X POST "https://api.datadoghq.com/api/v1/events" \
      -H "DD-API-KEY: ${{ secrets.DATADOG_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{
        "title": "Deployment to production",
        "text": "Deployed ${{ github.sha }}",
        "tags": ["environment:production", "service:api"]
      }'
Automated Rollback
- name: Monitor error rates
  run: |
    sleep 300  # Wait 5 minutes
    ERROR_RATE=$(curl -s https://api.newrelic.com/v2/applications/$APP_ID/metrics/data.json \
      -H "X-Api-Key: ${{ secrets.NEW_RELIC_API_KEY }}" \
      -G -d "names[]=Errors/all" \
      -d "summarize=true" | \
      jq '.metric_data.metrics[0].timeslices[0].values.error_count')
    if [ "$ERROR_RATE" -gt 100 ]; then
      echo "High error rate detected: $ERROR_RATE errors"
      kubectl rollout undo deployment/app
      curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
        -H "Content-Type: application/json" \
        -d "{\"text\":\"⚠️ Auto-rollback triggered due to high error rate\"}"
      exit 1
    fi
Advanced Patterns
Feature Flags Integration
- name: Enable feature flag for canary
  run: |
    curl -X PATCH https://api.launchdarkly.com/api/v2/flags/my-project/new-feature \
      -H "Authorization: ${{ secrets.LAUNCHDARKLY_TOKEN }}" \
      -H "Content-Type: application/json" \
      -d '{
        "instructions": [{
          "kind": "updateFlagVariationOrRolloutRule",
          "ruleId": "canary-rule",
          "rolloutWeights": {
            "on": 10,
            "off": 90
          }
        }]
      }'
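Behind a percentage rollout like this one, the flag service deterministically buckets each user so the same user always gets the same answer. A toy sketch of that idea (cksum stands in for the real hash purely because it is POSIX and deterministic; LaunchDarkly's actual algorithm differs):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hash a stable user key into one of 100 buckets; the flag is on for the
# user when their bucket falls below the rollout percentage.
in_rollout() {
  local user_key="$1" percentage="$2"
  local bucket=$(( $(printf '%s' "$user_key" | cksum | cut -d' ' -f1) % 100 ))
  [ "$bucket" -lt "$percentage" ]
}

if in_rollout "user-42" 10; then
  echo "new-feature ON for user-42"
else
  echo "new-feature OFF for user-42"
fi
```

The key property is stickiness: raising the percentage from 10 to 50 only adds users, it never flips someone who already had the feature back off.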
Database Migration Automation
- name: Run database migrations
  run: |
    # Create a one-off migration job from the new image
    # (kubectl create job accepts either --image or --from, not both;
    # assumes the image defines an npm "migrate" script)
    kubectl create job migrate-${{ github.sha }} \
      --image=myapp:${{ github.sha }} \
      -- npm run migrate
    # Wait for completion
    kubectl wait --for=condition=complete job/migrate-${{ github.sha }} \
      --timeout=5m
    # Check migration status
    if kubectl get job migrate-${{ github.sha }} -o jsonpath='{.status.failed}' | grep -q "1"; then
      echo "Migration failed"
      kubectl logs job/migrate-${{ github.sha }}
      exit 1
    fi
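One detail worth guarding in this pattern: Kubernetes object names are limited to 63 characters, and prefixing a full 40-character commit SHA leaves limited headroom. A small, hypothetical helper that truncates defensively:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a migration job name from a commit SHA, truncated to the 63-char
# Kubernetes name limit so a longer prefix or ref never breaks job creation.
migration_job_name() {
  local sha="$1"
  printf 'migrate-%s' "$sha" | cut -c1-63
}

migration_job_name "0123456789abcdef0123456789abcdef01234567"
# → migrate-0123456789abcdef0123456789abcdef01234567
```

With the `migrate-` prefix a full SHA comes to 48 characters, safely under the limit; the truncation only matters if the naming convention grows.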
Performance Budget Enforcement
- name: Run Lighthouse CI
  run: |
    npm install -g @lhci/cli
    lhci autorun --config=lighthouserc.json

# lighthouserc.json
{
  "ci": {
    "assert": {
      "assertions": {
        "first-contentful-paint": ["error", {"maxNumericValue": 2000}],
        "interactive": ["error", {"maxNumericValue": 3500}],
        "speed-index": ["error", {"maxNumericValue": 3000}]
      }
    }
  }
}
Real-World Examples
Netflix's Spinnaker Pipeline
Netflix uses Spinnaker for multi-region deployments:
- Automated baking: Build AMIs with application code
- Red/Black deployments: New version deployed alongside old
- Automated canary analysis: Machine learning detects anomalies
- Multi-region rollout: Deploy region-by-region with automated rollback
Their pipelines handle 4,000+ daily deployments across microservices.
GitLab's Auto DevOps
GitLab Auto DevOps provides zero-configuration CI/CD:
- Auto Build: Detects language and builds container
- Auto Test: Runs tests based on project structure
- Auto Deploy: Deploys to Kubernetes automatically
- Auto Monitoring: Prometheus metrics collection
Reduces CI/CD setup time from days to minutes for standard applications.
Kubernetes Progressive Delivery
Flagger automates progressive delivery on Kubernetes:
- Canary analysis: Monitors metrics during rollout
- A/B testing: Route traffic based on headers
- Blue/green switching: Zero-downtime deployments
- Automatic rollback: Revert on metric threshold violations
Used by Weaveworks to deploy multiple times daily with confidence.
Conclusion
Modern CI/CD pipelines automate the entire deployment process from code commit to production, enabling rapid iteration while maintaining reliability. Implement comprehensive testing at every stage, use progressive deployment strategies to minimize risk, and automate rollback procedures for quick recovery.
Key patterns - parallel test execution for speed, blue-green deployments for zero downtime, canary releases for gradual rollout, and automated monitoring for early issue detection - create robust deployment pipelines. Treat CI/CD configuration as code, version control pipeline definitions, and regularly review and optimize pipeline performance.
Start simple with basic CI, add deployment automation incrementally, and expand to advanced patterns like canary analysis as team maturity grows. The goal is deployment confidence - teams should feel comfortable deploying multiple times daily without fear of breaking production.
Written by StaticBlock Editorial
StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.