CI/CD Pipelines - Production Deployment Automation and Best Practices
Master CI/CD pipelines with GitHub Actions, GitLab CI, Jenkins, automated testing, deployment strategies, rollback mechanisms, and production best practices for continuous delivery.
Continuous Integration and Continuous Deployment (CI/CD) automate software delivery from code commit to production deployment, reducing deployment time from hours to minutes while improving reliability. This comprehensive guide covers modern CI/CD patterns, pipeline configuration, testing strategies, deployment automation, and production best practices used by companies deploying hundreds of times daily.
Why CI/CD Matters
Faster Delivery: Automated pipelines deploy code within minutes of merge, enabling rapid iteration and feature delivery.
Reduced Risk: Automated testing catches bugs before production, while gradual rollout strategies minimize blast radius.
Consistency: Every deployment follows the same process, eliminating manual errors and configuration drift.
Developer Productivity: Developers focus on features instead of deployment procedures; teams adopting CI/CD commonly report severalfold gains in delivery velocity.
Amazon has famously reported averaging a deployment every 11.7 seconds using automated CI/CD pipelines, while Netflix deploys thousands of times daily across its microservices architecture.
CI/CD Fundamentals
Pipeline Stages
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  # Stage 1: Code Quality
  lint-and-format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      - name: Check formatting
        run: npm run format:check
      - name: Type check
        run: npm run type-check

  # Stage 2: Unit Tests
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run unit tests
        run: npm run test:unit -- --coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info

  # Stage 3: Integration Tests
  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432  # map to the runner so tests can reach localhost:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run database migrations
        run: npm run migrate:test
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test
      - name: Run integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test
          REDIS_URL: redis://localhost:6379

  # Stage 4: E2E Tests
  e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright
        run: npx playwright install --with-deps
      - name: Build application
        run: npm run build
      - name: Run E2E tests
        run: npm run test:e2e
      - name: Upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/

  # Stage 5: Build Container
  build:
    needs: [lint-and-format, unit-tests, integration-tests, e2e-tests]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:latest
            ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # Stage 6: Deploy to Staging
  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy to staging
        run: |
          kubectl set image deployment/app \
            app=ghcr.io/${{ github.repository }}:${{ github.sha }} \
            --namespace=staging
      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/app \
            --namespace=staging \
            --timeout=5m

  # Stage 7: Smoke Tests
  smoke-tests:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run smoke tests
        run: npm run test:smoke
        env:
          BASE_URL: https://staging.myapp.com
      - name: Check health endpoints
        run: |
          curl -f https://staging.myapp.com/health || exit 1
          curl -f https://staging.myapp.com/ready || exit 1

  # Stage 8: Deploy to Production
  deploy-production:
    needs: smoke-tests
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy to production
        run: |
          kubectl set image deployment/app \
            app=ghcr.io/${{ github.repository }}:${{ github.sha }} \
            --namespace=production
      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/app \
            --namespace=production \
            --timeout=10m
      - name: Notify deployment
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "🚀 Deployed ${{ github.sha }} to production",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "Deployment successful!\n*Commit*: `${{ github.sha }}`\n*Author*: ${{ github.actor }}"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
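The deploy steps in stages 6 and 8 differ only in namespace and timeout, which makes them easy to factor out. A minimal shell sketch of how the image reference and deploy command fit together (`acme/myapp` and the SHA are placeholder values, and the functions only print the command rather than invoking kubectl):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the image reference produced by the build stage.
image_ref() {
  local repo="$1" sha="$2"
  printf 'ghcr.io/%s:%s\n' "$repo" "$sha"
}

# Compose the kubectl command issued in stages 6 and 8 (printed, not run).
deploy_cmd() {
  local repo="$1" sha="$2" ns="$3"
  printf 'kubectl set image deployment/app app=%s --namespace=%s\n' \
    "$(image_ref "$repo" "$sha")" "$ns"
}

deploy_cmd "acme/myapp" "abc1234" "staging"
# → kubectl set image deployment/app app=ghcr.io/acme/myapp:abc1234 --namespace=staging
```

Keeping the command construction in one place means staging and production cannot drift apart in anything but their parameters.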
Testing in CI/CD
Parallel Test Execution
# .github/workflows/parallel-tests.yml
name: Parallel Tests

on: [push, pull_request]

jobs:
  test-matrix:
    strategy:
      matrix:
        node-version: [18, 20, 22]
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - name: Run tests
        run: npm test

  test-sharding:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run test shard ${{ matrix.shard }}
        run: npm test -- --shard=${{ matrix.shard }}/4
Test Coverage Requirements
# .github/workflows/coverage.yml (step fragment)
- name: Check coverage
  run: |
    npm test -- --coverage --coverageReporters=json-summary
    COVERAGE=$(jq '.total.lines.pct' coverage/coverage-summary.json)
    if (( $(echo "$COVERAGE < 80" | bc -l) )); then
      echo "Coverage $COVERAGE% is below 80% threshold"
      exit 1
    fi
    echo "Coverage: $COVERAGE%"
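The same gate can be pulled out of the workflow into a plain function so the threshold logic is testable locally. This sketch uses awk for the floating-point comparison instead of bc, which is not guaranteed to be installed on every runner:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Coverage gate as a standalone function: takes the percentage (and an
# optional threshold, default 80) and fails when coverage is below it.
coverage_gate() {
  local pct="$1" threshold="${2:-80}"
  # awk exits 0 when pct < threshold, i.e. when the gate should fail
  if awk -v p="$pct" -v t="$threshold" 'BEGIN { exit (p < t) ? 0 : 1 }'; then
    echo "Coverage ${pct}% is below ${threshold}% threshold"
    return 1
  fi
  echo "Coverage: ${pct}%"
}

coverage_gate 85.5          # passes the gate
coverage_gate 72.3 || true  # fails the gate; || true keeps the demo running
```

In CI the function would be fed the jq output shown above; locally it can be exercised with any number.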
Deployment Strategies
Blue-Green Deployment
# blue-green-deploy.yml
name: Blue-Green Deployment

on:
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to green environment
        run: |
          kubectl apply -f k8s/deployment-green.yaml
          kubectl wait --for=condition=available deployment/app-green --timeout=5m
      - name: Run smoke tests on green
        run: npm run test:smoke
        env:
          BASE_URL: https://green.myapp.com
      - name: Switch traffic to green
        run: |
          kubectl patch service app \
            -p '{"spec":{"selector":{"version":"green"}}}'
      - name: Monitor for errors
        run: |
          sleep 60
          ERROR_RATE=$(curl -s https://metrics.myapp.com/error-rate)
          if (( $(echo "$ERROR_RATE > 1.0" | bc -l) )); then
            echo "High error rate detected, rolling back"
            kubectl patch service app \
              -p '{"spec":{"selector":{"version":"blue"}}}'
            exit 1
          fi
      - name: Decommission blue environment
        run: kubectl delete deployment app-blue
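The rollback decision in the "Monitor for errors" step boils down to one numeric comparison. Extracting it as a function makes the threshold explicit and easy to unit test; the 1.0% default mirrors the check above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns success (0) when the observed error rate exceeds the threshold,
# meaning traffic should be switched back to blue. Threshold defaults to 1.0.
should_rollback() {
  local error_rate="$1" max="${2:-1.0}"
  awk -v e="$error_rate" -v m="$max" 'BEGIN { exit (e > m) ? 0 : 1 }'
}

if should_rollback 2.5; then
  echo "rollback: switch service selector back to blue"
else
  echo "healthy: keep traffic on green"
fi
```

In the workflow, `should_rollback "$ERROR_RATE"` would replace the inline bc expression, and the threshold becomes a single tunable parameter.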
Canary Deployment
# canary-deploy.yml
name: Canary Deployment

on:
  push:
    branches: [main]

jobs:
  canary:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy canary (10% traffic)
        run: |
          kubectl apply -f k8s/deployment-canary.yaml
          kubectl apply -f k8s/virtualservice-10-percent.yaml
      - name: Monitor canary for 10 minutes
        run: |
          for i in {1..10}; do
            echo "Monitoring minute $i/10..."
            CANARY_ERRORS=$(kubectl logs -l version=canary | grep ERROR | wc -l)
            STABLE_ERRORS=$(kubectl logs -l version=stable | grep ERROR | wc -l)
            if [ "$CANARY_ERRORS" -gt $((STABLE_ERRORS * 2)) ]; then
              echo "Canary error rate too high, rolling back"
              kubectl delete deployment app-canary
              exit 1
            fi
            sleep 60
          done
      - name: Increase to 50% traffic
        run: kubectl apply -f k8s/virtualservice-50-percent.yaml
      - name: Monitor for 5 minutes
        run: sleep 300
      - name: Promote canary to stable
        run: |
          kubectl apply -f k8s/deployment-stable-new-version.yaml
          kubectl delete deployment app-canary
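The canary health check compares relative error counts rather than an absolute threshold. As a standalone function it reads like this (note this sketch assumes comparable log volumes; unequal replica counts between canary and stable would still need normalizing):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Flag the canary as unhealthy when its error count is more than double
# the stable version's, matching the check in the monitoring loop above.
canary_unhealthy() {
  local canary_errors="$1" stable_errors="$2"
  [ "$canary_errors" -gt $((stable_errors * 2)) ]
}

if canary_unhealthy 25 10; then
  echo "rollback canary"       # 25 > 2 * 10
else
  echo "canary healthy"
fi
```

A ratio-based check tolerates baseline noise: if stable is already logging errors, the canary is only blamed when it is meaningfully worse.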
Rolling Deployment
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  selector:          # required: must match the pod template labels
    matchLabels:
      app: app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # Allow 2 extra pods during update
      maxUnavailable: 1  # Allow 1 pod to be unavailable
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: myapp:v2.0.0
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
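The surge settings translate into hard bounds on pod count during the update: never more than replicas + maxSurge pods exist, and never fewer than replicas - maxUnavailable are ready. A quick sketch of that arithmetic for the manifest above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Compute the pod-count envelope a RollingUpdate is allowed to occupy.
rolling_bounds() {
  local replicas="$1" max_surge="$2" max_unavailable="$3"
  echo "max_pods=$((replicas + max_surge)) min_ready=$((replicas - max_unavailable))"
}

rolling_bounds 10 2 1   # the values from k8s/deployment.yaml
# → max_pods=12 min_ready=9
```

So with 10 replicas the cluster briefly runs up to 12 pods while guaranteeing at least 9 serve traffic, which is the capacity trade-off to budget for.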
Secrets Management
Encrypted Secrets
# .github/workflows/secrets.yml
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Get secrets from Parameter Store
        run: |
          DATABASE_URL=$(aws ssm get-parameter \
            --name /myapp/production/database-url \
            --with-decryption \
            --query Parameter.Value \
            --output text)
          echo "::add-mask::$DATABASE_URL"
          echo "DATABASE_URL=$DATABASE_URL" >> "$GITHUB_ENV"
      - name: Deploy with secrets
        run: |
          kubectl create secret generic app-secrets \
            --from-literal=database-url="$DATABASE_URL" \
            --dry-run=client -o yaml | kubectl apply -f -
HashiCorp Vault Integration
- name: Import secrets from Vault
  uses: hashicorp/vault-action@v2
  with:
    url: https://vault.mycompany.com
    token: ${{ secrets.VAULT_TOKEN }}
    secrets: |
      secret/data/myapp/production database_url | DATABASE_URL ;
      secret/data/myapp/production api_key | API_KEY
Monitoring and Observability
Deployment Metrics
- name: Record deployment metrics
  run: |
    # START_TIME is exported by an earlier step in the job
    DEPLOYMENT_TIME=$(($(date +%s) - START_TIME))
    curl -X POST https://metrics.myapp.com/deployments \
      -H "Content-Type: application/json" \
      -d "{
        \"commit\": \"${{ github.sha }}\",
        \"duration\": $DEPLOYMENT_TIME,
        \"status\": \"success\",
        \"environment\": \"production\"
      }"
- name: Create deployment marker in Datadog
  run: |
    curl -X POST "https://api.datadoghq.com/api/v1/events" \
      -H "DD-API-KEY: ${{ secrets.DATADOG_API_KEY }}" \
      -H "Content-Type: application/json" \
      -d '{
        "title": "Deployment to production",
        "text": "Deployed ${{ github.sha }}",
        "tags": ["environment:production", "service:api"]
      }'
Automated Rollback
- name: Monitor error rates
  run: |
    sleep 300  # Wait 5 minutes
    ERROR_RATE=$(curl -s https://api.newrelic.com/v2/applications/$APP_ID/metrics/data.json \
      -H "X-Api-Key: ${{ secrets.NEW_RELIC_API_KEY }}" \
      -G -d "names[]=Errors/all" \
      -d "summarize=true" | \
      jq '.metric_data.metrics[0].timeslices[0].values.error_count')
    if [ "$ERROR_RATE" -gt 100 ]; then
      echo "High error rate detected: $ERROR_RATE errors"
      kubectl rollout undo deployment/app
      curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
        -H "Content-Type: application/json" \
        -d "{\"text\":\"⚠️ Auto-rollback triggered due to high error rate\"}"
      exit 1
    fi
Advanced Patterns
Feature Flags Integration
- name: Enable feature flag for canary
  run: |
    curl -X PATCH https://api.launchdarkly.com/api/v2/flags/my-project/new-feature \
      -H "Authorization: ${{ secrets.LAUNCHDARKLY_TOKEN }}" \
      -H "Content-Type: application/json" \
      -d '{
        "instructions": [{
          "kind": "updateFlagVariationOrRolloutRule",
          "ruleId": "canary-rule",
          "rolloutWeights": {
            "on": 10,
            "off": 90
          }
        }]
      }'
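Behind a percentage rollout like this one, the flag service deterministically buckets each user so the same user always gets the same answer. A toy sketch of that idea (cksum stands in for the real hash purely because it is POSIX and deterministic; LaunchDarkly's actual algorithm differs):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hash a stable user key into one of 100 buckets; the flag is on for the
# user when their bucket falls below the rollout percentage.
in_rollout() {
  local user_key="$1" percentage="$2"
  local bucket=$(( $(printf '%s' "$user_key" | cksum | cut -d' ' -f1) % 100 ))
  [ "$bucket" -lt "$percentage" ]
}

if in_rollout "user-42" 10; then
  echo "new-feature ON for user-42"
else
  echo "new-feature OFF for user-42"
fi
```

The key property is stickiness: raising the percentage from 10 to 50 only adds users, it never flips someone who already had the feature back off.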
Database Migration Automation
- name: Run database migrations
  run: |
    # Create a one-off migration job from the new image
    # (kubectl create job accepts either --image or --from, not both;
    # assumes the image defines an npm "migrate" script)
    kubectl create job migrate-${{ github.sha }} \
      --image=myapp:${{ github.sha }} \
      -- npm run migrate
    # Wait for completion
    kubectl wait --for=condition=complete job/migrate-${{ github.sha }} \
      --timeout=5m
    # Check migration status
    if kubectl get job migrate-${{ github.sha }} -o jsonpath='{.status.failed}' | grep -q "1"; then
      echo "Migration failed"
      kubectl logs job/migrate-${{ github.sha }}
      exit 1
    fi
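One detail worth guarding in this pattern: Kubernetes object names are limited to 63 characters, and prefixing a full 40-character commit SHA leaves limited headroom. A small, hypothetical helper that truncates defensively:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a migration job name from a commit SHA, truncated to the 63-char
# Kubernetes name limit so a longer prefix or ref never breaks job creation.
migration_job_name() {
  local sha="$1"
  printf 'migrate-%s' "$sha" | cut -c1-63
}

migration_job_name "0123456789abcdef0123456789abcdef01234567"
# → migrate-0123456789abcdef0123456789abcdef01234567
```

With the `migrate-` prefix a full SHA comes to 48 characters, safely under the limit; the truncation only matters if the naming convention grows.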
Performance Budget Enforcement
- name: Run Lighthouse CI
  run: |
    npm install -g @lhci/cli
    lhci autorun --config=lighthouserc.json

# lighthouserc.json
{
  "ci": {
    "assert": {
      "assertions": {
        "first-contentful-paint": ["error", {"maxNumericValue": 2000}],
        "interactive": ["error", {"maxNumericValue": 3500}],
        "speed-index": ["error", {"maxNumericValue": 3000}]
      }
    }
  }
}
Real-World Examples
Netflix's Spinnaker Pipeline
Netflix uses Spinnaker for multi-region deployments:
- Automated baking: Build AMIs with application code
- Red/Black deployments: New version deployed alongside old
- Automated canary analysis: Machine learning detects anomalies
- Multi-region rollout: Deploy region-by-region with automated rollback
Their pipelines handle 4,000+ daily deployments across microservices.
GitLab's Auto DevOps
GitLab Auto DevOps provides zero-configuration CI/CD:
- Auto Build: Detects language and builds container
- Auto Test: Runs tests based on project structure
- Auto Deploy: Deploys to Kubernetes automatically
- Auto Monitoring: Prometheus metrics collection
Reduces CI/CD setup time from days to minutes for standard applications.
Kubernetes Progressive Delivery
Flagger automates progressive delivery on Kubernetes:
- Canary analysis: Monitors metrics during rollout
- A/B testing: Route traffic based on headers
- Blue/green switching: Zero-downtime deployments
- Automatic rollback: Revert on metric threshold violations
Used by Weaveworks to deploy multiple times daily with confidence.
Conclusion
Modern CI/CD pipelines automate the entire deployment process from code commit to production, enabling rapid iteration while maintaining reliability. Implement comprehensive testing at every stage, use progressive deployment strategies to minimize risk, and automate rollback procedures for quick recovery.
Key patterns - parallel test execution for speed, blue-green deployments for zero downtime, canary releases for gradual rollout, and automated monitoring for early issue detection - create robust deployment pipelines. Treat CI/CD configuration as code, version control pipeline definitions, and regularly review and optimize pipeline performance.
Start simple with basic CI, add deployment automation incrementally, and expand to advanced patterns like canary analysis as team maturity grows. The goal is deployment confidence - teams should feel comfortable deploying multiple times daily without fear of breaking production.
Written by StaticBlock Editorial
StaticBlock Editorial is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.