Deploying Node.js Microservices
From localhost to production-grade distributed systems
Node.js
Microservices
DevOps
Design → Build → Containerise → Deploy → Monitor
22 slides · architecture · containers · orchestration · observability
00
Topics We'll Cover
Part 1 — Architecture & Code
- Monolith vs Microservices
- Anatomy of a Node.js microservice
- Project structure (mono/polyrepo)
- Health checks & readiness probes
- Graceful shutdown patterns
- Environment configuration
Part 2 — Containers & Networking
- Production Dockerfiles (multi-stage)
- Docker Compose for local dev
- Inter-service communication
- Message queues & event-driven arch
- API Gateway pattern
- Service discovery & load balancing
Part 3 — Security & Observability
- Service-to-service authentication
- Logging & correlation IDs
- Database-per-service pattern
Part 4 — Deployment & Operations
- Deploying to Kubernetes
- Deploying to AWS (ECS/Fargate/Lambda)
- CI/CD pipelines for microservices
- Common pitfalls & production lessons
01
Monolith vs Microservices
When to Stay Monolithic
- Small team (< 5 developers)
- Unclear domain boundaries
- Rapid prototyping / MVP phase
- Low traffic, simple scaling needs
Rule: Start monolithic, extract when pain points emerge
When to Split
- Independent deploy cadences needed
- Different scaling profiles per domain
- Multiple teams owning distinct features
- Technology diversity requirements
Goal: Organisational autonomy + independent deployability
02
Anatomy of a Node.js Microservice
// src/server.ts — Entry point
import Fastify from 'fastify';
import { healthRoutes } from './routes/health.js';
import { userRoutes } from './routes/users.js';
import { gracefulShutdown } from './lifecycle.js';
import { config } from './config.js';
const app = Fastify({
logger: {
level: config.LOG_LEVEL,
transport: config.NODE_ENV === 'development'
? { target: 'pino-pretty' }
: undefined,
},
});
// Register routes
app.register(healthRoutes, { prefix: '/' });
app.register(userRoutes, { prefix: '/api/users' });
// Start
const start = async () => {
await app.listen({
port: config.PORT,
host: '0.0.0.0', // bind all interfaces in Docker
});
app.log.info(`Service running on :${config.PORT}`);
};
start().catch((err) => {
app.log.error(err, 'Failed to start');
process.exit(1);
});
gracefulShutdown(app);
Key Principles
- Single responsibility — one bounded context per service
- Bind to 0.0.0.0 — required for container networking
- Structured logging — JSON via pino (built into Fastify)
- Health endpoints — K8s liveness & readiness probes
- Graceful shutdown — drain in-flight requests on SIGTERM
Fastify vs Express
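- Fastify: ~77k req/s in Fastify's own hello-world benchmark; schema validation, built-in logging
- Express: ~15k req/s in the same benchmark; massive ecosystem, simpler mental model
- Both work; Fastify is preferred for high-throughput microservices, though the gap narrows once handlers do real I/O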
03
Structuring a Node.js Project for Microservices
Monorepo (Recommended for Most Teams)
platform/
├── packages/
│ ├── shared-types/ # Shared TS interfaces
│ ├── logger/ # Shared pino config
│ └── auth-middleware/ # JWT validation
├── services/
│ ├── user-service/
│ │ ├── src/
│ │ │ ├── server.ts
│ │ │ ├── config.ts
│ │ │ ├── routes/
│ │ │ ├── services/
│ │ │ └── repositories/
│ │ ├── Dockerfile
│ │ ├── package.json
│ │ └── tsconfig.json
│ ├── order-service/
│ └── payment-service/
├── docker-compose.yml
├── turbo.json # Turborepo config
└── package.json # Workspace root
Monorepo Pros
- Atomic cross-service changes in one PR
- Shared packages without npm publishing
- Unified CI/CD and tooling config
- Easier refactoring across boundaries
Polyrepo Pros
- Complete team autonomy
- Independent versioning & release cycles
- Smaller clone / CI scope per service
- Better for large orgs with distinct ownership
Tooling
- Turborepo — fast, caches build artifacts
- Nx — more features, steeper learning curve
- pnpm workspaces — lightweight, fast installs
04
Health Checks & Readiness Probes
// src/routes/health.ts
import { FastifyInstance } from 'fastify';
import { pool } from '../db.js';
import { redis } from '../redis.js';
export async function healthRoutes(app: FastifyInstance) {
// Liveness — is the process alive?
app.get('/health', async (_req, reply) => {
return reply.send({ status: 'ok', uptime: process.uptime() });
});
// Readiness — can it serve traffic?
app.get('/ready', async (_req, reply) => {
const checks: Record<string, string> = {};
try {
await pool.query('SELECT 1');
checks.postgres = 'ok';
} catch {
checks.postgres = 'fail';
}
try {
await redis.ping();
checks.redis = 'ok';
} catch {
checks.redis = 'fail';
}
const allOk = Object.values(checks)
.every(v => v === 'ok');
return reply
.status(allOk ? 200 : 503)
.send({ status: allOk ? 'ready' : 'degraded', checks });
});
}
/health (Liveness)
- Returns 200 if process is alive
- K8s restarts container if this fails
- Keep it fast — no DB calls
/ready (Readiness)
- Checks all dependencies (DB, Redis, disk)
- Returns 503 if any check fails
- K8s removes pod from load balancer
- Traffic resumes when check passes again
What to Check
| Dependency | Check |
| PostgreSQL | SELECT 1 |
| Redis | PING |
| RabbitMQ | Channel open |
| Disk | Write temp file |
| External API | HEAD request |
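The `/ready` handler above awaits each dependency one after another with no upper bound, so a hung connection stalls the probe until Kubernetes gives up on it. A minimal sketch, assuming only standard promises, that runs the checks in parallel and caps each one; `withTimeout` and `runChecks` are hypothetical helpers, not Fastify APIs.

```typescript
// Sketch: bounded, parallel readiness checks (helper names are hypothetical).
export type Check = () => Promise<unknown>;
export type CheckResult = 'ok' | 'fail';

// Rejects if `promise` has not settled within `ms` milliseconds.
export function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever side wins so no stray timers are left behind.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs every check concurrently; a throw, a rejection or a timeout all count as 'fail'.
export async function runChecks(
  checks: Record<string, Check>,
  timeoutMs = 1_000,
): Promise<{ ready: boolean; results: Record<string, CheckResult> }> {
  const names = Object.keys(checks);
  const outcomes = await Promise.all(
    names.map((name) =>
      // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
      withTimeout(Promise.resolve().then(checks[name]), timeoutMs).then(
        (): CheckResult => 'ok',
        (): CheckResult => 'fail',
      ),
    ),
  );
  const results: Record<string, CheckResult> = {};
  names.forEach((name, i) => {
    results[name] = outcomes[i];
  });
  return { ready: outcomes.every((o) => o === 'ok'), results };
}
```

Inside the route this could become `runChecks({ postgres: () => pool.query('SELECT 1'), redis: () => redis.ping() })`, replying 200 when `ready` is true and 503 otherwise. Keep the per-check timeout below the probe's own `timeoutSeconds` (1 second by default in Kubernetes).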
05
Graceful Shutdown
// src/lifecycle.ts
import { FastifyInstance } from 'fastify';
import { pool } from './db.js';
import { redis } from './redis.js';
import { channel } from './mq.js';
const SHUTDOWN_TIMEOUT = 15_000; // 15 seconds
export function gracefulShutdown(app: FastifyInstance) {
let shuttingDown = false;
const shutdown = async (signal: string) => {
if (shuttingDown) return;
shuttingDown = true;
app.log.info({ signal }, 'Shutdown signal received');
// Force-exit if cleanup overruns (kept below K8s' 30s grace period)
const timer = setTimeout(() => {
app.log.error('Shutdown timed out, forcing exit');
process.exit(1);
}, SHUTDOWN_TIMEOUT);
try {
// 1. Stop accepting new connections and drain in-flight requests
// (Fastify's close() does both)
await app.close();
app.log.info('HTTP server closed');
// 2. Close external connections: queue, then cache, then database
await channel?.close();
app.log.info('RabbitMQ channel closed');
await redis.quit();
app.log.info('Redis connection closed');
await pool.end();
app.log.info('PostgreSQL pool closed');
clearTimeout(timer);
app.log.info('Clean shutdown complete');
process.exit(0);
} catch (err) {
app.log.error(err, 'Error during shutdown');
process.exit(1);
}
};
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
}
Why It Matters
- K8s sends SIGTERM before killing a pod
- Default grace period: 30 seconds, then SIGKILL
- Without handling: dropped requests, half-finished writes
Shutdown Order
- Stop accepting new connections
- Drain in-flight requests
- Close message queue channels
- Close cache connections (Redis)
- Close database pools
- Exit process
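The shutdown order above can be expressed as a small sequential runner with one overall deadline, which keeps the ordering explicit and makes it testable without real connections. This is a sketch under assumptions: `ShutdownStep` and `shutdownInOrder` are hypothetical names, and the caller decides the exit code.

```typescript
// Sketch: close resources in a fixed order under one overall deadline.
export interface ShutdownStep {
  name: string;
  close: () => Promise<unknown>;
}

export type ShutdownResult = 'clean' | 'timeout' | 'error';

export async function shutdownInOrder(
  steps: ShutdownStep[],
  timeoutMs: number,
  log: (msg: string) => void = console.log,
): Promise<ShutdownResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<ShutdownResult>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });

  const sequence = (async (): Promise<ShutdownResult> => {
    for (const step of steps) {
      // Strictly sequential: the next step starts only after this one finishes.
      await step.close();
      log(`${step.name} closed`);
    }
    return 'clean';
  })().catch((err: unknown): ShutdownResult => {
    log(`shutdown step failed: ${String(err)}`);
    return 'error';
  });

  try {
    return await Promise.race([sequence, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

In `lifecycle.ts` the signal handler would pass steps such as `{ name: 'HTTP server', close: () => app.close() }` followed by the RabbitMQ channel, Redis and the PostgreSQL pool, then call `process.exit(result === 'clean' ? 0 : 1)`. The deadline only stops waiting; a hung step keeps running in the background until the process exits.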
Common Mistakes
- No timeout — shutdown hangs forever
- Closing DB before draining HTTP requests
- Forgetting SIGINT (Ctrl+C in dev)
- Using process.exit() without cleanup
06
Environment Configuration
// src/config.ts — Typed, validated configuration
import { z } from 'zod';
import 'dotenv/config';
const envSchema = z.object({
NODE_ENV: z.enum(['development', 'production', 'test'])
.default('development'),
PORT: z.coerce.number().int().default(3000),
LOG_LEVEL: z.enum(['fatal','error','warn','info','debug','trace'])
.default('info'),
// Database
DATABASE_URL: z.string().url()
.describe('PostgreSQL connection string'),
DB_POOL_MIN: z.coerce.number().int().default(2),
DB_POOL_MAX: z.coerce.number().int().default(10),
// Redis
REDIS_URL: z.string().url().default('redis://localhost:6379'),
// Auth
JWT_SECRET: z.string().min(32),
JWT_EXPIRES: z.string().default('15m'),
// External services
ORDER_SERVICE_URL: z.string().url()
.default('http://order-service:3001'),
});
export type Config = z.infer<typeof envSchema>;
const parsed = envSchema.safeParse(process.env);
if (!parsed.success) {
console.error('Invalid environment:', parsed.error.format());
process.exit(1);
}
export const config = parsed.data;
12-Factor App Config
- Store config in environment variables
- Never commit secrets to source control
- Same artefact deployed to all environments
- Only env vars change between dev/staging/prod
Validation Benefits
- Fail fast at startup — not at 3 AM
- TypeScript types inferred from schema
- Default values for optional config
- Descriptive errors for missing values
Terminal Output on Failure
$ node dist/server.js
Invalid environment:
{
JWT_SECRET: {
_errors: ["String must contain at least 32 character(s)"]
},
DATABASE_URL: {
_errors: ["Required"]
}
}
# Process exits with code 1
07
Containerising a Node.js Service
# Dockerfile — Multi-stage production build
# Stage 1: Install all dependencies (dev deps are needed to compile TS)
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile
# Stage 2: Build TypeScript
FROM deps AS build
COPY tsconfig.json ./
COPY src/ ./src/
RUN pnpm build
# Stage 3: Production dependencies only
FROM node:22-alpine AS prod-deps
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile --prod
# Stage 4: Production image
FROM node:22-alpine AS runtime
WORKDIR /app
# Security: non-root user
RUN addgroup -g 1001 appgroup && \
adduser -u 1001 -G appgroup -s /bin/sh -D appuser
ENV NODE_ENV=production
# Copy only production artifacts (no devDependencies)
COPY --from=prod-deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package.json ./
# Drop privileges
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
# .dockerignore
node_modules
dist
.env*
.git
*.md
coverage
.turbo
Multi-Stage Benefits
- Final image has no devDependencies
- No TypeScript compiler in production
- Image size: ~150 MB (Alpine) vs ~900 MB (full)
Security Checklist
- Non-root user — never run as root
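- NODE_ENV=production: frameworks such as Express switch to production mode (view caching, less verbose errors)
- No .env files in image
- Pin exact base image versions (a full version tag or @sha256 digest rather than a floating tag like node:22-alpine)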
| Base Image | Size (approx.) | Use Case |
| node:22-alpine | ~55 MB | Best default choice |
| node:22-slim | ~80 MB | Need glibc (native addons) |
| gcr.io/distroless/nodejs22-debian12 | ~45 MB | Minimal attack surface; no shell, so no wget HEALTHCHECK |
| node:22 | ~350 MB | Avoid in production |
08
Docker Compose for Local Development
# docker-compose.yml
services:
user-service:
build: ./services/user-service
ports: ["3000:3000"]
environment:
DATABASE_URL: postgres://app:secret@postgres:5432/users
REDIS_URL: redis://redis:6379
RABBITMQ_URL: amqp://rabbit:5672
JWT_SECRET: local-dev-secret-min-32-chars-long!!
depends_on:
postgres: { condition: service_healthy }
redis: { condition: service_healthy }
rabbit: { condition: service_healthy }
networks: [backend]
order-service:
build: ./services/order-service
ports: ["3001:3001"]
environment:
DATABASE_URL: postgres://app:secret@postgres:5432/orders
REDIS_URL: redis://redis:6379
RABBITMQ_URL: amqp://rabbit:5672
USER_SERVICE_URL: http://user-service:3000
JWT_SECRET: local-dev-secret-min-32-chars-long!!
depends_on:
postgres: { condition: service_healthy }
redis: { condition: service_healthy }
rabbit: { condition: service_healthy }
networks: [backend]
payment-service:
build: ./services/payment-service
ports: ["3002:3002"]
environment:
RABBITMQ_URL: amqp://rabbit:5672
STRIPE_KEY: sk_test_fake_key_for_dev
depends_on:
rabbit: { condition: service_healthy }
networks: [backend]
postgres:
image: postgres:16-alpine
environment: { POSTGRES_USER: app, POSTGRES_PASSWORD: secret }
volumes: [pg_data:/var/lib/postgresql/data, ./init-db.sql:/docker-entrypoint-initdb.d/init.sql]
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app"]
interval: 5s
networks: [backend]
redis:
image: redis:7-alpine
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
networks: [backend]
rabbit:
image: rabbitmq:3.13-management-alpine
ports: ["5672:5672", "15672:15672"]
healthcheck:
test: ["CMD", "rabbitmq-diagnostics", "check_running"]
interval: 10s
networks: [backend]
volumes:
pg_data:
networks:
backend:
driver: bridge
09
Inter-Service Communication
Synchronous (Request/Response)
- HTTP/REST — simple, widely understood
- gRPC — binary protocol, codegen, streaming
- Client expects a response immediately
- Creates temporal coupling between services
// Calling another service via HTTP
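const res = await fetch(
`${config.USER_SERVICE_URL}/api/users/${id}`,
{ headers: req.headers.authorization
? { Authorization: req.headers.authorization } : {} }
);
if (!res.ok) throw new Error(`user-service responded ${res.status}`);
const user = await res.json();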
Asynchronous (Event-Driven)
- Message queues — RabbitMQ, SQS, Redis Streams
- Event bus — Kafka, NATS, EventBridge
- Fire-and-forget, eventual consistency
- Services are decoupled in time
// Publishing an event (publish() returns a boolean, not a Promise)
channel.publish('events', 'order.created',
Buffer.from(JSON.stringify({
orderId, userId, total, createdAt: new Date()
})),
{ persistent: true, contentType: 'application/json' }
);
| Criteria | HTTP/REST | gRPC | Message Queue | Event Bus |
| Latency | Medium | Low | Variable | Variable |
| Coupling | High | High | Low | Very Low |
| Complexity | Low | Medium | Medium | High |
| Reliability | Retry needed | Retry needed | Built-in | Built-in |
| Best For | CRUD, queries | High-throughput internal | Task processing | Domain events |
10
Message Queues & Event-Driven Architecture
Publisher (order-service)
// src/mq.ts — RabbitMQ with amqplib
import amqp from 'amqplib';
// Exported as live bindings so lifecycle.ts and consumers can import them
export let connection: amqp.Connection;
export let channel: amqp.Channel;
export async function connectMQ(url: string) {
connection = await amqp.connect(url);
channel = await connection.createChannel();
// Ensure exchange exists
await channel.assertExchange(
'events', 'topic', { durable: true }
);
return { connection, channel };
}
// Publish a domain event
export async function publishEvent(
routingKey: string,
payload: Record<string, unknown>
) {
const msg = JSON.stringify({
id: crypto.randomUUID(),
type: routingKey,
timestamp: new Date().toISOString(),
data: payload,
});
channel.publish(
'events',
routingKey,
Buffer.from(msg),
{ persistent: true, contentType: 'application/json' }
);
}
Consumer (payment-service)
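// src/consumers/order-created.ts
import { channel } from '../mq.js';
import { processPayment } from '../services/payments.js'; // hypothetical module
export async function startConsumer() {
const q = await channel.assertQueue(
'payment.order-created',
// Rejected messages go to a dead-letter exchange ('events.dlx' is an
// assumed name; declare and bind it before this queue is created)
{ durable: true, deadLetterExchange: 'events.dlx' }
);
await channel.bindQueue(
q.queue, 'events', 'order.created'
);
// Prefetch 1 = process one at a time
await channel.prefetch(1);
await channel.consume(q.queue, async (msg) => {
if (!msg) return; // consumer was cancelled by the broker
try {
const event = JSON.parse(
msg.content.toString()
);
await processPayment(event.data);
channel.ack(msg); // success
} catch {
// requeue=false: a poison message would otherwise be redelivered forever
channel.nack(msg, false, false); // dead-letter
}
});
}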
Broker Comparison
| Broker | Strength |
| RabbitMQ | Routing flexibility, mature |
| NATS | Ultra low latency, cloud-native |
| Redis Streams | If you already have Redis |
| AWS SQS | Zero ops, serverless |
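Because the consumer acks only after `processPayment` succeeds, delivery is at-least-once: a crash between processing and `ack` redelivers the same `order.created` message, and a payment must not be charged twice. A minimal sketch of an idempotent wrapper keyed on the event `id` that `publishEvent` generates; `ProcessedEventStore`, `idempotent` and the in-memory store are hypothetical and stand in for a database table.

```typescript
// Sketch: make an at-least-once consumer safe to re-run (names are hypothetical).
export interface DomainEvent<T> {
  id: string;
  type: string;
  timestamp: string;
  data: T;
}

// In production this would be a table with a UNIQUE(event_id) constraint.
export interface ProcessedEventStore {
  has(eventId: string): Promise<boolean>;
  add(eventId: string): Promise<void>;
}

export class InMemoryProcessedEventStore implements ProcessedEventStore {
  private readonly seen = new Set<string>();

  async has(eventId: string): Promise<boolean> {
    return this.seen.has(eventId);
  }

  async add(eventId: string): Promise<void> {
    this.seen.add(eventId);
  }
}

export function idempotent<T>(
  store: ProcessedEventStore,
  handler: (data: T) => Promise<void>,
): (event: DomainEvent<T>) => Promise<'processed' | 'duplicate'> {
  return async (event) => {
    // Already handled: the caller can ack and skip.
    if (await store.has(event.id)) return 'duplicate';
    // If the handler throws, the id is NOT recorded, so a redelivery retries it.
    await handler(event.data);
    await store.add(event.id);
    return 'processed';
  };
}
```

In the consumer this would wrap `processPayment` and run before `channel.ack(msg)`. With a real database, the business write and the `add` should share one transaction; otherwise a crash after the charge but before `add` still reprocesses the event.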
11
API Gateway Pattern
Gateway Options
| Tool | Type | Notes |
| Kong | Full-featured | Plugins, Lua-based |
| Nginx | Reverse proxy | High perf, config-driven |
| AWS API GW | Managed | Zero ops, pay-per-call |
| Express | Custom | Full control, more work |
Simple Express Gateway
// gateway/src/server.ts
import express from 'express';
import { createProxyMiddleware } from
'http-proxy-middleware';
import rateLimit from 'express-rate-limit';
import { verifyJWT } from './auth.js';
const app = express();
// Global rate limiting
app.use(rateLimit({
windowMs: 60_000,
max: 100,
standardHeaders: true,
}));
// Auth middleware: every /api route below requires a JWT
// (mount public routes above this line)
app.use('/api', verifyJWT);
// Route to services. http-proxy-middleware v3 proxies req.url, from which
// Express has stripped the mount path, so the path is re-added in target.
const routes: Record<string, string> = {
'/api/users': 'http://user-service:3000',
'/api/orders': 'http://order-service:3001',
'/api/payments': 'http://payment-service:3002',
};
for (const [path, service] of Object.entries(routes)) {
app.use(path, createProxyMiddleware({
target: `${service}${path}`,
changeOrigin: true,
}));
}
app.listen(8080, () =>
console.log('Gateway on :8080')
);
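The `rateLimit({ windowMs: 60_000, max: 100 })` call allows each client 100 requests per minute. To make that concrete, here is a minimal fixed-window counter keyed by client (for example, its IP). It is an illustrative sketch of the idea, not express-rate-limit's source; the class name and the injectable clock are assumptions made for testability.

```typescript
// Sketch of a fixed-window counter: at most `limit` requests per key per window.
export class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request is allowed, false if it should get HTTP 429.
  hit(key: string): boolean {
    const t = this.now();
    const w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.windows.set(key, { start: t, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= this.limit;
  }
}
```

Counters like this live in one process, so with several gateway replicas each replica enforces its own limit; a shared store such as Redis is the usual fix. This sketch also never evicts old keys, which a production limiter must do.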
12
Service Discovery & Load Balancing
DNS-Based (Simplest)
- Docker Compose: service name = hostname
- Kubernetes: service-name.namespace.svc.cluster.local
- No code changes needed
- DNS caching can cause stale lookups
// In Docker Compose, service names
// resolve to container IPs:
const composeUrl = 'http://user-service:3000';
// K8s: the Service's DNS name (port 80 → targetPort 3000)
const k8sUrl =
'http://user-service.default.svc.cluster.local';
Client-Side Discovery
- Service registers itself in a registry
- Client queries registry, picks an instance
- Tools: Consul, etcd, Eureka
- More control, more complexity
// Using Consul for discovery
import Consul from 'consul';
const consul = new Consul();
const services = await consul.health
.service({ service: 'user-service', passing: true });
if (services.length === 0) {
throw new Error('No healthy user-service instances');
}
const instance = services[
Math.floor(Math.random() * services.length)
];
const url =
`http://${instance.Service.Address}:${instance.Service.Port}`;
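The Consul snippet picks a random instance on every call. A common alternative is round-robin, sketched here in plain TypeScript; `ServiceInstance` and `roundRobin` are hypothetical, and `getInstances` is assumed to return a cached list mapped from the health query (`Service.Address`, `Service.Port`) rather than querying Consul per request.

```typescript
// Sketch: round-robin over whatever healthy instances the registry returns.
export interface ServiceInstance {
  address: string;
  port: number;
}

export function roundRobin(getInstances: () => ServiceInstance[]): () => string {
  let counter = 0;
  return () => {
    const instances = getInstances(); // e.g. a periodically refreshed Consul result
    if (instances.length === 0) {
      throw new Error('no healthy instances available');
    }
    // Modulo tolerates the list growing or shrinking between calls.
    const index = counter % instances.length;
    counter = index + 1; // stays bounded by the list length
    const instance = instances[index];
    return `http://${instance.address}:${instance.port}`;
  };
}
```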
Service Mesh
- Sidecar proxy intercepts all traffic
- Zero application code changes
- mTLS, retries, circuit breaking built in
- Tools: Istio, Linkerd, Consul Connect
# Istio VirtualService
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: user-service
spec:
hosts: [user-service]
http:
- route:
- destination:
host: user-service
weight: 90
- destination:
host: user-service-canary
weight: 10
DNS (Compose/K8s) → Client Discovery (Consul) → Service Mesh (Istio)
(complexity & power increase left to right)
13
Authentication Between Services
JWT Middleware
// src/middleware/auth.ts
import jwt from 'jsonwebtoken';
import { config } from '../config.js';
import { FastifyRequest, FastifyReply } from 'fastify';
interface JWTPayload {
sub: string;
email: string;
roles: string[];
iat: number;
exp: number;
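}
// Let TypeScript know that requests may carry the decoded token
declare module 'fastify' {
interface FastifyRequest {
user?: JWTPayload;
}
}
export async function verifyJWT(
req: FastifyRequest,
reply: FastifyReply
) {
const header = req.headers.authorization;
if (!header?.startsWith('Bearer ')) {
return reply.status(401).send({
error: 'Missing bearer token'
});
}
try {
const token = header.slice(7);
const payload = jwt.verify(
token, config.JWT_SECRET,
{ algorithms: ['HS256'] } // only accept the algorithm the tokens are signed with
) as JWTPayload;
// Attach to request for downstream use
req.user = payload;
} catch {
return reply.status(401).send({
error: 'Invalid or expired token'
});
}
}
// Propagate JWT and request ID to downstream services
export function forwardAuth(req: FastifyRequest): Record<string, string> {
const headers: Record<string, string> = {};
if (req.headers.authorization) {
headers.Authorization = req.headers.authorization;
}
const requestId = req.headers['x-request-id'];
if (typeof requestId === 'string') {
headers['X-Request-Id'] = requestId;
}
return headers;
}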
JWT Propagation (Most Common)
- Gateway validates user JWT once
- JWT forwarded to internal services via the Authorization header
- Each service can inspect claims (roles, sub)
- No extra network calls for auth
Service-to-Service Auth
- API Keys — simple, rotated via env vars
- mTLS — mutual TLS, handled by service mesh
- Service JWTs — short-lived, machine-to-machine
// Service-to-service with API key (assumes INTERNAL_API_KEY has been added to the config schema)
const res = await fetch(url, {
headers: {
'X-Service-Key': config.INTERNAL_API_KEY,
'X-Service-Name': 'order-service',
}
});
Security Pitfalls
- Never expose internal services to the internet
- Rotate keys/secrets via K8s Secrets or Vault
- Set short JWT expiry (15m) + refresh tokens
- Log auth failures — detect brute force
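The API-key snippet shows the calling side. On the receiving side, a plain `===` comparison can leak information through timing, so a minimal sketch using Node's `crypto.timingSafeEqual` follows. It assumes the accepted keys come from config, for example `INTERNAL_API_KEY` plus a hypothetical previous key kept valid during rotation; `isValidServiceKey` is a hypothetical helper name.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

const sha256 = (value: string): Buffer => createHash('sha256').update(value).digest();

// Accepts the current key and, during rotation, the previous one.
export function isValidServiceKey(
  presented: string | undefined,
  acceptedKeys: readonly string[],
): boolean {
  if (!presented) return false;
  const presentedDigest = sha256(presented);
  let valid = false;
  for (const key of acceptedKeys) {
    // Hashing makes both buffers 32 bytes; timingSafeEqual throws on unequal lengths.
    // Every key is compared (no early exit) so timing does not reveal which matched.
    valid = timingSafeEqual(presentedDigest, sha256(key)) || valid;
  }
  return valid;
}
```

A receiving service would call this with the `X-Service-Key` header value and return 401 when it is false.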
14
Logging & Observability
Structured Logging with Pino
// src/logger.ts
import pino from 'pino';
import { config } from './config.js';
export const logger = pino({
level: config.LOG_LEVEL,
base: {
service: 'order-service',
version: process.env.npm_package_version, // set by npm run; undefined when started with plain node dist/server.js
env: config.NODE_ENV,
},
timestamp: pino.stdTimeFunctions.isoTime,
formatters: {
level: (label) => ({ level: label }),
},
redact: ['req.headers.authorization',
'req.headers.cookie'],
});
Correlation IDs Across Services
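// src/middleware/correlation.ts
import { FastifyInstance } from 'fastify';
import crypto from 'node:crypto';
declare module 'fastify' {
interface FastifyRequest {
correlationId: string;
}
}
// Call on the root instance; if registered via app.register(), wrap it
// with fastify-plugin so the hooks apply outside this plugin's scope
export function correlationPlugin(app: FastifyInstance) {
app.decorateRequest('correlationId', '');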
app.addHook('onRequest', (req, _reply, done) => {
// Use incoming ID or generate new one
const correlationId =
(req.headers['x-request-id'] as string)
?? crypto.randomUUID();
req.correlationId = correlationId;
// Bind to logger for this request
req.log = req.log.child({ correlationId });
done();
});
app.addHook('onSend', (req, reply, _payload, done) => {
reply.header('X-Request-Id', req.correlationId);
done();
});
}
JSON Log Output
{
"level": "info",
"time": "2026-03-24T10:23:45.123Z",
"service": "order-service",
"correlationId": "a1b2c3d4-e5f6-...",
"msg": "Order created",
"orderId": "ord_98765",
"userId": "usr_12345",
"total": 49.99,
"responseTime": 23
}
Observability Stack
| Pillar | Tool |
| Logs | ELK Stack / Loki + Grafana |
| Metrics | Prometheus + Grafana |
| Traces | Jaeger / Tempo (OpenTelemetry) |
| Alerting | Grafana Alerting / PagerDuty |
Key Rule
Every log line must include a correlationId. Without it, debugging across 5+ services is impossible.
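The hook above stores the ID on `req`, so every function that makes an outbound call needs `req` passed down to it. One common alternative, swapped in here and not part of the Fastify plugin above, is Node's `AsyncLocalStorage`, which keeps the ID in scope across `await`s. A minimal sketch; `withCorrelationId` and `correlationHeaders` are hypothetical helpers, and wiring `withCorrelationId` into Fastify's request lifecycle is not shown.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

interface RequestContext {
  correlationId: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Runs `fn` (and every async continuation it creates) with a correlation ID in scope.
export function withCorrelationId<T>(incoming: string | undefined, fn: () => T): T {
  return requestContext.run({ correlationId: incoming ?? randomUUID() }, fn);
}

// Headers to attach to outbound HTTP calls made while handling the current request.
export function correlationHeaders(): Record<string, string> {
  const ctx = requestContext.getStore();
  return ctx ? { 'X-Request-Id': ctx.correlationId } : {};
}
```

An outbound call then becomes `fetch(url, { headers: { ...correlationHeaders(), ...otherHeaders } })`, with no `req` threaded through the call chain.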
15
Deploying to Kubernetes
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
labels:
app: user-service
spec:
replicas: 3 # drop this field once the HPA below owns the replica count
selector:
matchLabels: { app: user-service }
template:
metadata:
labels: { app: user-service }
spec:
containers:
- name: user-service
image: ghcr.io/myorg/user-service:1.4.2
ports:
- containerPort: 3000
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: user-db-secret
key: url
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
httpGet: { path: /health, port: 3000 }
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet: { path: /ready, port: 3000 }
initialDelaySeconds: 5
periodSeconds: 5
terminationGracePeriodSeconds: 30
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
name: user-service
spec:
selector: { app: user-service }
ports:
- port: 80
targetPort: 3000
type: ClusterIP
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: api-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /api/users/$2 # /users/42 → /api/users/42
spec:
ingressClassName: nginx
rules:
- host: api.example.com
http:
paths:
- path: /users(/|$)(.*)
pathType: ImplementationSpecific
backend:
service:
name: user-service
port: { number: 80 }
# k8s/hpa.yaml — Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: user-service-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: user-service
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
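With `averageUtilization: 70`, the HPA compares each pod's CPU usage with its request (`cpu: 100m`), not its limit, so the target is roughly 70m per pod. The Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). For example, 4 pods averaging 90% gives ceil(4 × 90 / 70) = ceil(5.14) = 6. A simplified model of that rule, clamped to `minReplicas`/`maxReplicas`; `desiredReplicas` is a hypothetical name, and it ignores the stabilization windows and unready-pod handling the real controller applies.

```typescript
// Simplified model of the HPA scaling rule (not the controller's actual code).
export function desiredReplicas(
  currentReplicas: number,
  currentUtilization: number, // average CPU as % of the pod's request
  targetUtilization: number, // averageUtilization from the HPA spec
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1, // Kubernetes' default: ignore ratios within ±10% of 1.0
): number {
  const ratio = currentUtilization / targetUtilization;
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
  const desired = Math.ceil(currentReplicas * ratio);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}
```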
16
Deploying to AWS (ECS/Fargate & Lambda)
ECS Task Definition
{
"family": "user-service",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512",
"executionRoleArn": "arn:aws:iam::123456789:role/ecsTaskExecutionRole",
"containerDefinitions": [{
"name": "user-service",
"image": "123456789.dkr.ecr.us-east-1.amazonaws.com/user-service:1.4.2",
"portMappings": [{ "containerPort": 3000 }],
"environment": [
{ "name": "NODE_ENV", "value": "production" }
],
"secrets": [{
"name": "DATABASE_URL",
"valueFrom": "arn:aws:ssm:us-east-1:123456789:parameter/prod/user-service/db-url"
}],
"healthCheck": {
"command": ["CMD-SHELL",
"wget -qO- http://localhost:3000/health || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/user-service",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
}
}]
}
| Criteria | ECS/Fargate | Lambda |
| Startup | ~30s (container) | ~200ms (cold start) |
| Long-running | Yes, always on | Max 15 min |
| Scaling | Min/max tasks | Automatic, per-request |
| WebSockets | Supported | Via API GW WS |
| Cost model | Per second of vCPU + memory (1-min minimum) | Per request + execution duration (GB-seconds) |
| Best for | Stateful / high traffic | Bursty / event-driven |
Fargate vs EC2 Launch Type
- Fargate: zero server management, pay-per-task, best default
- EC2: cheaper at scale, GPU workloads, more control
Lambda as Microservice
// handler.ts — Lambda with API GW
import type { APIGatewayProxyHandlerV2 } from 'aws-lambda';
import { userService } from './services/user-service.js'; // hypothetical shared module
export const handler: APIGatewayProxyHandlerV2 =
async (event) => {
const id = event.pathParameters?.id;
if (!id) {
return { statusCode: 400, body: JSON.stringify({ error: 'Missing id' }) };
}
// Same business logic, different wrapper
const user = await userService.getById(id);
return {
statusCode: 200,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(user),
};
};
17
CI/CD Pipeline for Microservices
# .github/workflows/deploy-service.yml
name: Build & Deploy Service
on:
push:
branches: [main]
paths: ['services/user-service/**', 'packages/**'] # this service + shared packages
env:
SERVICE: user-service
REGISTRY: ghcr.io/${{ github.repository_owner }}
jobs:
build-and-deploy:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- uses: pnpm/action-setup@v4 # pnpm must exist before setup-node can cache it (version from packageManager)
- uses: actions/setup-node@v4
with: { node-version: 22, cache: 'pnpm' }
- name: Install & Test
run: |
pnpm install --frozen-lockfile
pnpm --filter ${{ env.SERVICE }} run lint
pnpm --filter ${{ env.SERVICE }} run test
pnpm --filter ${{ env.SERVICE }} run build
- name: Set image tag
id: tag
run: echo "tag=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT
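- uses: docker/setup-buildx-action@v3 # Buildx is needed for the gha cache backend
- uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build & Push Docker Image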
uses: docker/build-push-action@v5
with:
context: services/${{ env.SERVICE }}
push: true
tags: |
${{ env.REGISTRY }}/${{ env.SERVICE }}:${{ steps.tag.outputs.tag }}
${{ env.REGISTRY }}/${{ env.SERVICE }}:latest
cache-from: type=gha
cache-to: type=gha,mode=max
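- uses: azure/k8s-set-context@v4
with:
method: kubeconfig
kubeconfig: ${{ secrets.KUBECONFIG }} # assumes a KUBECONFIG repository secret
- name: Deploy to Kubernetes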
uses: azure/k8s-deploy@v5
with:
manifests: services/${{ env.SERVICE }}/k8s/
images: ${{ env.REGISTRY }}/${{ env.SERVICE }}:${{ steps.tag.outputs.tag }}
namespace: production
strategy: canary
percentage: 20
Push to main → Lint + Test → Build Image → Canary 20% → Full Rollout
18
Database per Service
Why Own Your Data
- Independent schema evolution
- Choose the right DB per workload
- No shared-DB coupling between teams
- Services can be deployed independently
The Saga Pattern
Distributed transactions across services using a sequence of local transactions + compensating actions:
- order-service: create order (PENDING)
- payment-service: charge card
- inventory-service: reserve stock
- If any step fails → run compensations
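The saga steps above can be sketched as a small in-process orchestrator, the "orchestration" variant in the table below. `runSaga` and `SagaStep` are hypothetical names; in practice each `execute`/`compensate` would call a service or publish a command, and the saga's progress would be persisted so a crashed orchestrator can resume.

```typescript
// Sketch of an orchestrated saga: run local transactions in order and, if one
// fails, run the compensations of the already-completed steps in reverse.
export interface SagaStep<Ctx> {
  name: string;
  execute: (ctx: Ctx) => Promise<void>;
  compensate: (ctx: Ctx) => Promise<void>;
}

export type SagaResult =
  | { ok: true }
  | { ok: false; failedStep: string; error: unknown };

export async function runSaga<Ctx>(steps: SagaStep<Ctx>[], ctx: Ctx): Promise<SagaResult> {
  const completed: SagaStep<Ctx>[] = [];
  for (const step of steps) {
    try {
      await step.execute(ctx);
    } catch (error) {
      // Undo in reverse order. A throwing compensation propagates to the caller,
      // which should alert or retry, so compensations should be idempotent.
      for (const done of [...completed].reverse()) {
        await done.compensate(ctx);
      }
      return { ok: false, failedStep: step.name, error };
    }
    completed.push(step);
  }
  return { ok: true };
}
```

For the order flow, `compensate` for "create order" would mark the order CANCELLED and for "charge card" would issue a refund; the failing step itself is not compensated because its local transaction did not commit.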
Data Consistency Strategies
| Pattern | Consistency |
| Saga (choreography) | Eventual |
| Saga (orchestration) | Eventual |
| Event Sourcing | Eventual |
| Outbox Pattern | Eventual (at-least-once event delivery) |
19
Common Pitfalls & Production Lessons
The Fallacies of Distributed Computing, Corrected
- The network is not reliable
- Latency is not zero
- Bandwidth is not infinite
- The topology does change
Every inter-service call can fail. Design for it.
Circuit Breaker (opossum)
import CircuitBreaker from 'opossum';
import { config } from './config.js';
import { logger as log } from './logger.js';
const breaker = new CircuitBreaker(
async (userId: string) => {
const res = await fetch(
`${config.USER_SERVICE_URL}/api/users/${userId}`
);
if (!res.ok) throw new Error(`HTTP ${res.status}`);
return res.json();
},
{
timeout: 3000, // 3s per call
errorThresholdPercentage: 50,
resetTimeout: 10000, // try again after 10s
volumeThreshold: 5, // min calls before tripping
}
);
breaker.on('open', () => log.warn('Circuit OPEN'));
breaker.on('halfOpen',() => log.info('Circuit HALF-OPEN'));
breaker.on('close', () => log.info('Circuit CLOSED'));
// Usage: the fallback receives the same arguments as fire()
breaker.fallback((userId: string) => ({ id: userId, name: 'Unknown' }));
const user = await breaker.fire(userId);
Retry with Exponential Backoff
// src/utils/retry.ts
export async function retry<T>(
fn: () => Promise<T>,
opts = { maxRetries: 3, baseDelay: 200 }
): Promise<T> {
let lastError: Error | undefined;
for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
try {
return await fn();
} catch (err) {
lastError = err as Error;
if (attempt < opts.maxRetries) {
const delay = opts.baseDelay * 2 ** attempt
+ Math.random() * 100; // jitter
await new Promise(r => setTimeout(r, delay));
}
}
}
throw lastError;
}
// Usage
const data = await retry(
() => fetchFromService(url),
{ maxRetries: 3, baseDelay: 200 }
);
Timeout Management
- Set timeouts on every outbound call
- Use AbortSignal.timeout() (or an AbortController) with fetch
- Cascade: gateway 30s > service 10s > DB 5s
- Without timeouts: one slow service kills all
// AbortSignal.timeout() aborts after 5s; no timer is left to clear
const res = await fetch(url, {
signal: AbortSignal.timeout(5000),
});
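The 30s > 10s > 5s cascade works best when each hop derives its timeout from what is left of the caller's budget rather than using a fixed value. A minimal sketch, assuming the service knows an absolute deadline for the current request (for example, computed as `Date.now() + 10_000` when the request arrives); `hopTimeoutMs` and the `reserveMs` margin are hypothetical.

```typescript
// Sketch: derive each hop's timeout from the remaining budget so an inner call
// never outlives the outer request.
export function hopTimeoutMs(
  deadlineMs: number, // absolute epoch ms by which the caller needs an answer
  nowMs: number,
  capMs: number, // this hop's own maximum, e.g. 5_000 for the DB
  reserveMs = 100, // time kept back to build and send our own response
): number {
  const remaining = deadlineMs - nowMs - reserveMs;
  if (remaining <= 0) {
    // No budget left: fail fast instead of calling downstream.
    throw new Error('deadline exceeded');
  }
  return Math.min(capMs, remaining);
}
```

Used as `fetch(url, { signal: AbortSignal.timeout(hopTimeoutMs(deadline, Date.now(), 5_000)) })`, the call never waits longer than the caller can.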
20
Summary & Further Reading
Key Takeaways
- Start monolithic — extract services when team/scaling pain is real
- Each service: health checks, graceful shutdown, structured logging
- Validate config at startup — fail fast, not at 3 AM
- Multi-stage Dockerfiles, non-root user, Alpine base
- Prefer async messaging over sync HTTP between services
- Database per service + saga pattern for consistency
- Circuit breakers, retries, timeouts on every call
- Correlation IDs in every log line across every service
Learning Path
Node.js basics → REST APIs → Docker → K8s → Service Mesh
Recommended Books
- Building Microservices — Sam Newman (2nd ed.) — the definitive guide to microservice architecture
- Node.js Design Patterns — Mario Casciaro & Luciano Mammino — advanced patterns for production Node
- Designing Data-Intensive Applications — Martin Kleppmann — distributed data fundamentals
- Release It! — Michael Nygard — production stability patterns
Tools & Resources
| Category | Tools |
| Framework | Fastify, NestJS, Express |
| Queue | RabbitMQ, BullMQ, NATS |
| Containers | Docker, Podman, Buildah |
| Orchestration | Kubernetes, ECS, Nomad |
| CI/CD | GitHub Actions, GitLab CI |
| Monitoring | Prometheus, Grafana, Datadog |
| Tracing | OpenTelemetry, Jaeger |