Automating the path from a developer's editor to production systems
Before CI/CD, teams faced integration hell — developers worked in isolation for weeks, then merged everything at once.
The longer code sits unintegrated, the more expensive integration becomes. CI/CD makes integration a continuous, cheap activity rather than a periodic expensive one.
CI is a practice where developers integrate their changes into a shared branch frequently — ideally multiple times per day — with each integration verified by an automated build and test run.
The two CDs are often confused. The distinction is whether the final push to production is manual or automatic.
| Aspect | Continuous Delivery | Continuous Deployment |
|---|---|---|
| Definition | Code is always in a releasable state; deploy to prod requires a human click | Every commit that passes the pipeline is deployed to prod automatically |
| Human gate | Yes — explicit approval step | No — fully automated |
| Risk tolerance | Suitable when compliance, QA, or product sign-off is required | Requires very high test coverage and observability confidence |
| Typical users | Regulated industries, enterprise products | SaaS, high-cadence web services (e.g. Netflix, Etsy) |
| Prerequisite | Solid CI (you cannot have CD without CI) | Solid CI (you cannot have CD without CI) |
A pipeline is a sequence of automated stages. Each stage must pass before the next runs. A failure stops the pipeline and notifies the team.
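The stop-on-first-failure behaviour can be sketched in a few lines of Python. This is an illustrative model only, not how any particular CI system is implemented; the `Stage` type, `run_pipeline`, and the stage names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str                # hypothetical stage label, e.g. "build"
    run: Callable[[], bool]  # returns True on success


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure.

    Returns (succeeded, names of the stages that actually ran).
    """
    executed: list[str] = []
    for stage in stages:
        executed.append(stage.name)
        if not stage.run():
            # A real system would notify the team at this point.
            return False, executed
    return True, executed


stages = [
    Stage("build", lambda: True),
    Stage("test", lambda: False),   # simulated failing stage
    Stage("deploy", lambda: True),  # never reached
]
ok, ran = run_pipeline(stages)
print(ok, ran)
```

Because the simulated test stage fails, the deploy stage is never executed.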
Pipeline definitions live in the repo (e.g. .github/workflows/ci.yml). They are versioned, reviewed, and evolved alongside the application code.
The build stage converts source code into a runnable artefact — a compiled binary, a Docker image, a wheel package, or a bundled web app.
Install dependencies (pip install, npm ci, go mod download).
Use npm ci, not npm install, in CI. ci is stricter: it installs exactly what the lockfile specifies and fails if package.json and the lockfile are out of sync, which prevents silent dependency drift.
# GitHub Actions build step
- name: Install deps
  run: npm ci
- name: Lint
  run: npm run lint
- name: Build
  run: npm run build
- name: Upload artefact
  uses: actions/upload-artifact@v4
  with:
    name: dist-${{ github.sha }}
    path: dist/
Mike Cohn's test pyramid guides how to balance test types. More tests at the base (fast, cheap); fewer at the apex (slow, expensive).
Test a single function or class in isolation. No I/O, no network. Should run in < 1 s total for a module.
Test how components work together — e.g. service + database, or two microservices. Use real or containerised dependencies.
Drive the full stack through a browser or API client. Playwright, Cypress, Selenium. Run last; slowest.
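As a small illustration of the base of the pyramid, here is a sketch of a pure function with unit tests that touch no I/O or network. The function `apply_discount` and its rules are invented for this example; the tests are written so that pytest would collect them, but they also run as plain functions.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Pure function: no I/O and no network, so it is easy to test in isolation."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


def test_apply_discount() -> None:
    assert apply_discount(1000, 25) == 750
    assert apply_discount(999, 0) == 999
    assert apply_discount(500, 100) == 0


def test_rejects_invalid_percent() -> None:
    try:
        apply_discount(1000, 101)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

Integration and end-to-end tests would exercise the same logic through a database or a browser, which is why they sit higher in the pyramid and run later.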
A quality gate is a threshold that the pipeline enforces. If the code does not meet the standard, the pipeline fails and the change cannot proceed.
Require minimum test coverage — e.g. 80% line or branch coverage. Tools: pytest-cov, nyc, jacoco.
coverage: 82.4% ✔ (≥ 80%)
branch: 76.1% ✘ (< 80%) → FAIL
Run SAST tools: semgrep, bandit, eslint, SonarQube. Block merges that introduce new high-severity findings.
Scan for known CVEs in dependencies. npm audit, pip-audit, Snyk, Dependabot. Fail on HIGH or CRITICAL findings.
Start with loose gates and tighten over time. Enforcing 80% coverage on a legacy codebase from day one is demoralising — begin at your current level and ratchet upward.
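A coverage gate and the ratchet idea can both be sketched in plain Python. This is a minimal illustration, assuming coverage figures have already been parsed from a tool's report; `check_gates`, `ratchet`, and the 80% minimums are hypothetical and not part of pytest-cov, nyc, or jacoco.

```python
def check_gates(measured: dict[str, float], minimums: dict[str, float]) -> list[str]:
    """Return one message per metric that falls below its minimum."""
    failures = []
    for metric, minimum in minimums.items():
        value = measured.get(metric, 0.0)
        if value < minimum:
            failures.append(f"{metric}: {value:.1f}% < {minimum:.1f}%")
    return failures


def ratchet(floor: float, measured: float, target: float = 80.0) -> float:
    """Raise the floor toward the measured value, capped at target, never lowering it."""
    return max(floor, min(target, measured))


# Figures from the sample report above; the 80% minimums are an assumption.
failures = check_gates({"line": 82.4, "branch": 76.1}, {"line": 80.0, "branch": 80.0})
print(failures)

# Legacy codebase starting at 61% coverage: the floor only moves upward.
floor = 61.0
for measured in (63.5, 62.0, 85.0):
    floor = ratchet(floor, measured)
    print(floor)
```

A CI script would exit with a non-zero status when `check_gates` returns any failures, and would store the ratcheted floor somewhere versioned, such as a config file in the repo.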
How you branch determines when and what the pipeline runs.
Developers commit directly to main (or via very short-lived branches). CI runs on every push. Favoured by Google, Meta. Requires feature flags for incomplete work.
main is always deployable. Work happens in feature branches, merged via pull request. CI runs on each PR and on merge to main. Simple and effective for most teams.
Separate develop, release, and hotfix branches. More overhead — now considered heavyweight for most modern software.
| Trigger | Typical pipeline |
|---|---|
| Push to feature branch | Build + unit tests |
| Pull request opened | Full CI (build + all tests + analysis) |
| Merge to main | Full CI + deploy to staging |
| Tag push (v1.2.0) | Full CI + deploy to production |
| Scheduled (nightly) | Slow tests, security scans |
Don't run the full slow suite on every commit to a feature branch — developers lose patience and disable CI. Run a fast subset; run everything on merge.
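The trigger table can be read as a small decision function. The sketch below is one possible encoding, assuming GitHub-style event names and refs (`refs/heads/...`, `refs/tags/...`); the stage names and `select_stages` are hypothetical labels for illustration.

```python
FULL_CI = ["build", "unit-tests", "integration-tests", "e2e-tests", "analysis"]


def select_stages(event: str, ref: str = "") -> list[str]:
    """Map a trigger to the stages that should run, following the table above."""
    if event == "schedule":
        return ["slow-tests", "security-scans"]
    if event == "pull_request":
        return list(FULL_CI)
    if event == "push":
        if ref.startswith("refs/tags/v"):
            return FULL_CI + ["deploy-production"]
        if ref == "refs/heads/main":
            return FULL_CI + ["deploy-staging"]
        # Any other branch: keep the feedback loop fast.
        return ["build", "unit-tests"]
    raise ValueError(f"unknown event: {event}")


print(select_stages("push", "refs/heads/feature/login"))
print(select_stages("push", "refs/tags/v1.2.0"))
```

In practice this logic lives in the CI platform's trigger configuration rather than in application code; the function only makes the mapping explicit.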
A build artefact is the immutable, versioned output of the build stage. It is promoted through environments — never rebuilt.
Never rebuild the artefact for staging vs production. Rebuilding introduces the risk that the artefact that passed testing is not what gets deployed.
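One way to enforce "promote, never rebuild" is to record a content digest when the artefact passes testing and to refuse deployment of anything with a different digest. The sketch below assumes the artefact is available as bytes; `digest` and `promote` are hypothetical helpers, not a registry API.

```python
import hashlib


def digest(artefact: bytes) -> str:
    """Content digest that identifies the artefact regardless of its tag or name."""
    return "sha256:" + hashlib.sha256(artefact).hexdigest()


def promote(artefact: bytes, tested_digest: str, environment: str) -> str:
    """Allow promotion only if the bytes match what was tested."""
    actual = digest(artefact)
    if actual != tested_digest:
        raise ValueError(f"refusing to deploy to {environment}: digest mismatch")
    return f"{environment} <- {actual}"


build = b"compiled application bytes"  # stand-in for a real build output
tested = digest(build)                 # recorded when the test stage passes

print(promote(build, tested, "staging"))
print(promote(build, tested, "production"))
```

A rebuilt artefact, even from the same commit, may produce different bytes, and `promote` would then raise instead of deploying it.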
Containers solve the "works on my machine" problem. A Docker image bundles the application and its entire runtime — the same image runs in CI, staging, and production.
# Multi-stage Dockerfile (Python example)
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
FROM python:3.12-slim AS runtime
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.12 \
/usr/local/lib/python3.12
COPY src/ .
RUN useradd -m appuser && chown -R appuser /app
USER appuser
CMD ["python", "main.py"]
Multi-stage builds keep the final image lean — the builder stage (with compilers and dev tools) is discarded. Runtime images should contain only what is needed to run.
Build: docker build -t app:$SHA .
Scan: trivy image app:$SHA
Push: ghcr.io, ECR, Docker Hub
:latest (avoid in production; ambiguous)
:abc1234 (git SHA, fully traceable)
:1.4.2 (semver for releases)
:main-20260306 (branch + date)
# .github/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
  pull_request:
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Lint
        run: ruff check .
      - name: Unit tests
        # Block scalar (|) keeps the backslash line continuation intact
        run: |
          pytest tests/unit --cov=src \
            --cov-fail-under=80
      - name: Build Docker image
        run: |
          docker build \
            -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
      - name: Push to registry
        # Assumes an earlier registry login step, which is not shown here
        if: github.ref == 'refs/heads/main'
        run: |
          docker push \
            ghcr.io/${{ github.repository }}:${{ github.sha }}
Workflow files live in .github/workflows/.
Split slow test suites across multiple jobs that run concurrently. Use needs: to express dependencies between jobs.
| Tool | Model | Config file | Strengths | Considerations |
|---|---|---|---|---|
| GitHub Actions | SaaS / self-hosted | .github/workflows/*.yml | Tight GitHub integration, huge Marketplace, free for public repos | Costs scale with minutes on private repos |
| GitLab CI/CD | SaaS / self-hosted | .gitlab-ci.yml | All-in-one DevOps platform, strong environments & review apps | GitLab hosting required (or self-host) |
| Jenkins | Self-hosted | Jenkinsfile (Groovy) | Fully configurable, huge plugin ecosystem, runs anywhere | High operational burden; Groovy DSL has a learning curve |
| CircleCI | SaaS / self-hosted | .circleci/config.yml | Fast, good caching, orbs for reusable config | Costs can surprise at scale |
| Tekton / ArgoCD | Kubernetes-native | CRD YAML manifests | Cloud-native, GitOps-friendly, very scalable | Steep learning curve; requires Kubernetes |
For a new project on GitHub, start with GitHub Actions. For an enterprise self-hosted requirement with Kubernetes, evaluate Tekton + ArgoCD.
Pipelines need credentials — API keys, registry passwords, cloud credentials. Mishandling them is one of the most common CI/CD security failures.
Avoid: echo $MY_SECRET; .env files with real values.
Use: ${{ secrets.MY_KEY }} (redacted from logs); SLSA provenance and signed artefacts.
Runners should have the minimum IAM permissions needed. Use ephemeral runners (a fresh VM per job) rather than long-lived shared agents.
GitHub Actions supports OIDC federation with AWS, GCP, and Azure. The runner gets a short-lived token automatically — no stored secret needed.
How you deploy to production determines your blast radius and rollback speed.
Stop old version, start new version. Simple but causes downtime. Only for non-critical systems.
Replace instances one at a time. Zero downtime. Old and new versions briefly coexist — APIs must be backwards-compatible.
Two identical environments. Route traffic from blue (old) to green (new). Instant rollback by switching the load balancer back. Doubles infrastructure cost briefly.
Route a small percentage (e.g. 5%) of traffic to the new version. Monitor error rates and latency. Gradually shift to 100% if healthy, or roll back quickly if not.
Deploy code but hide it behind a runtime flag. Decouple deployment from feature release. Allows trunk-based development with incomplete features safely in production.
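A runtime flag with a percentage rollout can be sketched with stable hashing, so the same user always lands in the same bucket. This is a minimal illustration, not the API of any feature-flag product; `bucket`, `is_enabled`, and the flag name are hypothetical.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..99."""
    raw = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(raw[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """The flag is on for users whose bucket falls below the rollout percentage."""
    return bucket(flag, user_id) < rollout_percent


def checkout(user_id: str, rollout_percent: int) -> str:
    # Both code paths are deployed; the flag decides which one is released.
    if is_enabled("new-checkout", user_id, rollout_percent):
        return "new checkout flow"
    return "old checkout flow"


print(checkout("user-42", 0))
print(checkout("user-42", 100))
```

Raising the percentage only ever adds users, because each user's bucket is fixed; setting it to 0 acts as a kill switch without a redeploy.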
Canary releases combined with feature flags are a common choice for high-velocity teams (Netflix, Spotify, GitHub itself).
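The canary procedure is essentially a loop over traffic steps with a health check at each one. The sketch below assumes a caller-supplied `is_healthy` callback and an arbitrary step schedule; a real rollout would also reconfigure a load balancer and wait for a monitoring window between steps.

```python
from typing import Callable, Sequence


def canary_rollout(
    is_healthy: Callable[[int], bool],
    steps: Sequence[int] = (5, 25, 50, 100),
) -> tuple[str, int]:
    """Shift traffic stepwise; return (outcome, percent of traffic on the new version)."""
    if not steps:
        raise ValueError("at least one traffic step is required")
    for percent in steps:
        # Placeholder for: set traffic weight to `percent`, then observe metrics.
        if not is_healthy(percent):
            return ("rolled_back", 0)
    return ("promoted", steps[-1])


print(canary_rollout(lambda percent: True))
print(canary_rollout(lambda percent: percent < 50))  # simulated failure at 50%
```

In the second call the new version looks healthy at 5% and 25% but fails at 50%, so all traffic returns to the old version.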
Deploying fast is only safe if you can detect failures quickly and recover from them just as quickly.
Configure alerts on key SLIs. If error rate exceeds threshold within a defined window after deploy, trigger automatic rollback — re-deploy the previous artefact SHA.
Deploy completes. Automated smoke tests run.
Monitor p50/p99 latency and error rate. Compare to pre-deploy baseline.
Check application logs for new error patterns.
Mark deploy as stable. Close the change window.
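The comparison against a pre-deploy baseline can be reduced to a small decision function. This is a sketch under assumed thresholds (2% error rate, p99 latency no worse than 1.5x baseline); the `Metrics` type, `post_deploy_decision`, and the numbers are hypothetical and would come from your monitoring system and SLOs.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate: float      # fraction of failed requests, e.g. 0.01 means 1%
    p99_latency_ms: float


def post_deploy_decision(
    baseline: Metrics,
    current: Metrics,
    max_error_rate: float = 0.02,
    max_latency_ratio: float = 1.5,
) -> str:
    """Return "rollback" if the new version breaches a threshold, else "stable"."""
    if current.error_rate > max_error_rate:
        return "rollback"
    if current.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        return "rollback"
    return "stable"


baseline = Metrics(error_rate=0.004, p99_latency_ms=320.0)
print(post_deploy_decision(baseline, Metrics(0.005, 340.0)))
print(post_deploy_decision(baseline, Metrics(0.060, 330.0)))
```

A "rollback" result would trigger a re-deploy of the previous artefact SHA rather than a new build.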
A slow pipeline is a pipeline that developers work around. Target < 10 minutes for the core CI loop.
Cache dependency downloads between runs. Most platforms support keying the cache on the lockfile hash.
- uses: actions/cache@v4
with:
path: ~/.npm
key: npm-${{ hashFiles('**/package-lock.json') }}
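The idea behind keying a cache on the lockfile hash can be shown in a few lines: identical lockfile content yields the same key (cache hit), and any change yields a new key (cache miss). This sketch is conceptual and is not intended to reproduce the exact value that `hashFiles` computes; `cache_key` is a hypothetical helper.

```python
import hashlib


def cache_key(prefix: str, lockfile_content: bytes) -> str:
    """Derive a cache key that changes whenever the lockfile content changes."""
    return f"{prefix}-{hashlib.sha256(lockfile_content).hexdigest()}"


lock_v1 = b'{"dependencies": {"left-pad": "1.3.0"}}'
lock_v2 = b'{"dependencies": {"left-pad": "1.3.1"}}'

print(cache_key("npm", lock_v1) == cache_key("npm", lock_v1))  # same content, same key
print(cache_key("npm", lock_v1) == cache_key("npm", lock_v2))  # changed content, new key
```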
Split test suites into shards. Run lint, unit tests, and security scans as parallel jobs. Use a matrix strategy for multi-version or multi-OS testing.
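Sharding needs a deterministic rule so that every parallel job picks a disjoint slice and together they cover the whole suite. A minimal sketch, assuming each job knows its own index and the total shard count; `select_shard` and the file names are hypothetical.

```python
def select_shard(test_files: list[str], shard_index: int, total_shards: int) -> list[str]:
    """Return the files for one shard using round-robin over the sorted list."""
    if not 0 <= shard_index < total_shards:
        raise ValueError("shard_index must be in range(total_shards)")
    # Sorting first makes the assignment identical on every runner.
    return sorted(test_files)[shard_index::total_shards]


files = [
    "tests/test_e.py",
    "tests/test_a.py",
    "tests/test_c.py",
    "tests/test_b.py",
    "tests/test_d.py",
]
print(select_shard(files, 0, 2))
print(select_shard(files, 1, 2))
```

Round-robin balances the number of files, not their duration; splitting by recorded test timings usually gives more even shards.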
Only trigger expensive jobs when relevant files change. Skip the full test suite if only documentation changed.
on:
push:
paths:
- 'src/**'
- 'tests/**'
- 'requirements*.txt'
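The effect of a path filter can be approximated in Python to reason about which changes would trigger a run. This sketch uses `fnmatch`-style matching, which is only an approximation of the platform's glob rules (for example, `*` here also matches `/`); `should_run` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

PATTERNS = ["src/**", "tests/**", "requirements*.txt"]


def should_run(changed_files: list[str], patterns: list[str] = PATTERNS) -> bool:
    """True if at least one changed file matches at least one pattern."""
    return any(fnmatchcase(path, pattern) for path in changed_files for pattern in patterns)


print(should_run(["docs/README.md"]))
print(should_run(["docs/README.md", "src/app/main.py"]))
print(should_run(["requirements-dev.txt"]))
```

A documentation-only change matches nothing and skips the job, while any change under src/ or to a requirements file triggers it.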
Order Dockerfile instructions from least to most frequently changing. Copy and install dependencies before copying application code.
The DORA (DevOps Research and Assessment) team identified four metrics that predict software delivery performance.
How often does the team successfully deploy to production? Elite teams deploy multiple times per day. Frequent, small deployments are safer than infrequent large ones.
Time from code committed to code running in production. Elite: < 1 hour. Measures the efficiency of the entire pipeline, including review and approvals.
Percentage of deployments that cause a production failure requiring rollback or hotfix. Elite: 0–5%. A high rate indicates inadequate testing or deployment strategy.
How long to restore service after a failure. Elite: < 1 hour. Requires fast detection (observability), fast rollback, and on-call processes.
These metrics are positively correlated with business outcomes — teams in the elite tier have 127× faster lead time than low performers (DORA State of DevOps Report).
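The four metrics can be computed from a simple deployment log. The sketch below assumes each record carries a commit time, a deploy time, a failure flag, and an optional restore time; the `Deployment` type and `dora_summary` are hypothetical, and real measurement needs agreed definitions of "failure" and "restored".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarise the four metrics over a reporting period."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deploys if d.failed]
    restores = [hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(hours(d.deployed_at - d.committed_at) for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore_hours": median(restores) if restores else 0.0,
    }


t0 = datetime(2026, 1, 1, 12, 0)
log = [
    Deployment(t0, t0 + timedelta(minutes=30)),
    Deployment(t0, t0 + timedelta(minutes=90), failed=True,
               restored_at=t0 + timedelta(minutes=120)),
    Deployment(t0, t0 + timedelta(minutes=60)),
    Deployment(t0, t0 + timedelta(minutes=120)),
]
summary = dora_summary(log, period_days=2)
print(summary)
```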
Add a .github/workflows/ci.yml to your next project.
Humble & Farley, Continuous Delivery (2010) · Kim et al., The DevOps Handbook (2nd ed. 2021) · DORA State of DevOps Reports · Martin Fowler's Continuous Integration article (martinfowler.com)