CLOUD SERVICE MODELS · PART 3 OF 6

PaaS · FaaS · CaaS

The managed-compute middle — where most workloads should live
[Cover diagram: platform logos grouped by model — PaaS (Heroku · Render · Fly · Vercel), CaaS (Cloud Run · Fargate · Container Apps), FaaS (Lambda · Cloud Functions · Workers), managed K8s (EKS · GKE · AKS) — above a git push → build image → deploy → live URL pipeline]

Side-by-side tour of every flavour of managed compute · cold-start economics · pricing & latency · decision matrix.

Platform  ·  Function  ·  Container  ·  Choose
01

Topics

PaaS

  • Heroku-style — Render, Fly, Railway
  • Hyperscaler — Beanstalk, App Engine, App Service
  • Frontend-shaped — Vercel, Netlify, Cloudflare Pages
  • Buildpacks & build orchestration

CaaS

  • "Run my container" — Cloud Run, Fargate, Container Apps, Fly Machines
  • Managed Kubernetes — EKS, GKE, AKS, Autopilot
  • The serverless container revolution

FaaS

  • Lambda anatomy — invoke → init → execute
  • Cold starts — what causes them, what fixes them
  • Edge FaaS — V8 isolates (Workers, Edge Functions)
  • Event-driven patterns & orchestration (Step Functions)

Choosing

  • Cost model comparison — per-hour vs per-request
  • Latency profiles — cold start, warm, p99
  • Decision matrix & anti-patterns
  • Deployment strategies — blue/green, canary, rolling
02

The Managed-Compute Spectrum

Reading left-to-right: your unit shrinks (server → container → function), scale-to-zero gets faster, the platform handles more, and you trade away more flexibility.

Layer | Unit | Scale-to-0 | Control | Billing
VMs (IaaS) | instance | never | max | $/hour — idle pays ("the box is yours")
Managed K8s | pod | nodes only | high | $/node-hour
CaaS | container | yes | medium | $/req-second
PaaS | app | some | low | $/instance-hour or req
FaaS | function | instant | minimal | $/invocation·ms ("a function ran somewhere")

More managed → less control · faster scale-to-zero · finer-grained billing.
03

Heroku-Style PaaS — git push, get URL

The original PaaS pattern — Heroku, 2007. Push code; the platform detects the language, runs a build, deploys an app, gives you a URL, scales it on a slider. Modern siblings have closed the gap on price and added preview environments, multi-region, and IPv6.

The five contenders

Platform | Build | Killer feature | From ($)
Heroku | Buildpacks | add-ons marketplace, 18 yrs of stability | $5/mo eco dyno
Render | Auto-detect / Dockerfile | preview environments, native HTTPS, free DBs | free tier · $7/mo paid
Fly.io | Dockerfile / nixpacks | multi-region by default, Firecracker μVMs | ~$5/mo
Railway | Nixpacks | Postgres-and-friends in a click | $5/mo
Koyeb | Buildpacks / Docker | edge global, scale-to-0 on paid tier | free hobby

A typical workflow

$ git push render main
==> Detected language: Node.js (package.json)
==> Running build: npm ci && npm run build
==> Building image (cached layers)
==> Provisioning HTTPS cert (Let's Encrypt)
==> Deploying to: my-app.onrender.com
==> Health check passing
==> Live in 2m18s

What you give up

  • Build-image choice (your runtime is whatever the platform says it is)
  • Networking depth — limited VPC, no NAT, no peering
  • Sidecars / multi-process pods (Heroku's multi-procfile buildpack and Fly's per-machine processes only partly cover this)
  • OS tweaks (no apt-get install at runtime; build-time only)

What you get back

  • Time. From "I have an idea" to "URL anyone can hit" in < 10 minutes
  • Free TLS, free preview env per PR, autoscaling, log UI, no on-call for hypervisor
  • Add-ons — Postgres, Redis, S3-compat object stores, CDN, Sentry — one click

When you outgrow it

When per-instance pricing overtakes CaaS (roughly beyond ~$2k/month of spend), when you need a private link to an internal data store, or when your stack hits a runtime limit, it's time to climb to CaaS or managed K8s.

04

Hyperscaler PaaS

Each big-three cloud has its own PaaS. They're more enterprisey, more integrated with the rest of the cloud (IAM, VPC, observability) and more boring than the indie PaaS — which is the point if you're already in the cloud.

AWS

  • Elastic Beanstalk — the original, EC2 underneath, dated UX
  • App Runner — modern, container-or-source, scale-to-zero (2024+)
  • Amplify Hosting — frontend SPAs & SSR, Vercel-like
  • Lightsail — VPS-style fixed-price PaaS

GCP

  • App Engine Standard — strict sandbox, sub-second autoscale, not recommended for greenfield (Google now steers new apps to Cloud Run)
  • App Engine Flex — VM-based, less popular than Cloud Run
  • Firebase Hosting — frontend + functions

Azure

  • App Service — broad runtime support, Windows + Linux, slot-based deployment
  • Static Web Apps — frontend + API in one
  • Container Apps — also CaaS, see slide 08

Why they exist

Selling "your team uses one cloud" — same IAM, same network, same support contract. Procurement loves them. Developers usually prefer Render, Vercel, or Cloud Run.

The Beanstalk warning

Old, opinionated, slow-deploying. Many teams have been trying to escape Beanstalk in 2024–2026; App Runner is the AWS-native escape hatch, ECS Fargate the polyglot one.

05

Frontend-Shaped PaaS

A new category: PaaS optimised for the JS frontend stack. Edge by default, framework-aware, preview-per-PR, and tightly bound to one or two frameworks (Next, Astro, SvelteKit).

Vercel

  • Made by the Next.js team — first-class for Next App Router, RSC, ISR, image optimisation
  • Edge functions (V8 isolates) + Lambda functions (Node)
  • Preview deployments per branch / PR — game-changing UX
  • Pricing: free hobby, $20/mo Pro, custom Enterprise; bandwidth gets expensive at scale

Netlify

  • The original JAMstack PaaS (2014)
  • Functions, Edge Handlers, Forms, Identity, A/B split testing
  • Strong agency / marketing-site fit

Cloudflare Pages + Workers

  • Sites pushed to 300+ edge locations globally
  • Workers = backend at the same edge, V8-isolate cold-start <5ms
  • D1 (SQLite), R2 (object), KV, Durable Objects, Queues — full-stack-on-the-edge
  • Pricing: free tier huge, $5/mo Workers Paid; no egress fees ever

Same-day frontends

The Vercel/Netlify/CF pattern collapses what used to be hosting + CDN + TLS + functions + previews + analytics into one git push. For greenfield consumer-facing apps on React/Next/Svelte, this is the new default.

The framework lock-in

Vercel works best with Next, Cloudflare Pages with their Workers SDK, Netlify with their Functions runtime. Self-hosting Next (via next start in a container) is harder than they advertise; the OpenNext project has emerged to bridge the gap.

06

Buildpacks · Dockerfiles · Nixpacks

How a PaaS turns your repo into something runnable. Three approaches — pick your trade-off between magic and control.

Buildpacks

  • Invented by Heroku (2011); Cloud Native Buildpacks (CNB) is the OSS standard
  • Auto-detect: package.json → Node, requirements.txt → Python
  • Reproducible, layered, rebases efficiently on base-image updates
  • Used by Heroku, GCP App Engine / Cloud Run source-deploy, IBM Code Engine

Dockerfile

  • You write the recipe; the platform builds & runs it
  • Maximum control, language-agnostic, portable across every CaaS
  • Multi-stage builds for tiny final images — see Multi-Stage Builds deck
  • Required for: heavy native deps, custom CA bundles, FIPS, BYO image

Nixpacks

  • Railway-led modern alternative, uses Nix under the hood
  • Faster than buildpacks, deterministic, cache-friendly
  • Used by Railway, Coolify, Sevalla

Build at PaaS vs build in CI

  • Build-in-PaaS — push source, platform builds. Simpler, less reproducible.
  • Build-in-CI — GitHub Actions builds the image, pushes to a registry, deploys. More complex, fully reproducible, scans for CVEs in pipeline. The pattern at scale.
  • Hybrid — CI builds & tags; PaaS pulls. Best of both for medium teams.

The "works in CI, fails in PaaS" trap

Different base images, different versions of node, different libc. Lock the runtime version in .tool-versions / package.json engines / runtime.txt — and ideally build the image yourself.
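One concrete version lock, as a sketch. The exact versions and file contents are illustrative; note that npm only warns on an engines mismatch unless engine-strict is set, so pairing the two is the usual belt-and-braces move.

```
# package.json — declare the runtime the app expects
{ "engines": { "node": ">=20.11 <21" } }

# .npmrc — make npm fail (not just warn) on an engines mismatch
engine-strict=true

# .tool-versions (asdf / mise) — pin the exact version used locally and in CI
nodejs 20.11.1
```

With all three in place, a PaaS that respects engines, your CI image, and your laptop all resolve to the same runtime.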

07

"Run My Container" CaaS

Push a container image; the platform runs it, scales it, networks it, gives you a URL. No VMs, no Kubernetes, no buildpack opinions. Most workloads should default here.

Service | Cloud | Cold start | Min CPU | Scale-to-0? | Pricing
Cloud Run | GCP | ~500 ms gen1, <100 ms gen2 | 0.08 vCPU | yes | per-100ms vCPU + GB-sec + req
ECS Fargate | AWS | 20–60 s (task launch) | 0.25 vCPU | no (min 1 task) | per-second vCPU + GB
App Runner | AWS | 10–30 s (warm pool) | 0.25 vCPU | yes (paused state) | per-second + provisioned
Container Apps | Azure | ~1 s | 0.25 vCPU | yes (KEDA) | per-second vCPU + GB + req
Fly Machines | Fly.io | 200–600 ms cold, <50 ms suspended | shared-1x CPU | yes (auto-stop) | per-second + bandwidth
Cloudflare Containers | CF | <1 s (2025 GA) | 0.25 vCPU | yes | per-10ms vCPU + GB

Why CaaS won

  • You bring a container — it works on your laptop, in prod, in CI, anywhere
  • Scale to zero — pay nothing when idle (most platforms)
  • No platform team needed; one team can ship, deploy, and operate
  • Migration path from PaaS or Kubernetes alike

A Cloud Run deploy

gcloud run deploy myapp \
  --image=us-docker.pkg.dev/proj/repo/app:v42 \
  --region=europe-west2 \
  --concurrency=80 \
  --min-instances=0 \
  --max-instances=100 \
  --cpu=1 --memory=512Mi \
  --service-account=app-runner@proj.iam... \
  --allow-unauthenticated
08

Managed Kubernetes — EKS · GKE · AKS

The cloud runs the control plane (etcd, API server, scheduler); you bring node pools and deploy workloads. The "platform you build on top of" — and the right answer when CaaS isn't enough.

When you actually need Kubernetes

  • You need workload-level customisation — sidecars, init containers, custom CSI / CNI
  • You're running enough services that Cloud Run / Fargate billing crosses break-even (~$3-5k/month workloads)
  • You need a service mesh, advanced traffic shaping, custom CRDs
  • Cross-cluster portability (on-prem ↔ cloud, or multi-cloud)

The three managed offerings

Service | Control plane | Auto mode | Notes
EKS | $0.10/hr (~$73/mo) | EKS Auto Mode (2024) | Most flexible, most ops
GKE | $0.10/hr per cluster after the first | Autopilot | Smartest defaults; Autopilot is the standout
AKS | Free (Standard SKU costs extra) | — | Tightest Azure integration

Auto / Autopilot mode

  • Provider runs both control plane and node lifecycle
  • You declare pods; the platform decides nodes (Karpenter-style)
  • Closes the gap with CaaS — pay per pod, scale to zero, but keep K8s API
  • GKE Autopilot is the most mature; EKS Auto Mode launched re:Invent 2024

Critical add-ons (real prod K8s)

  • Karpenter / cluster-autoscaler — node-level scaling
  • External-DNS, cert-manager — DNS + TLS automation
  • ArgoCD / Flux — GitOps
  • External Secrets Operator — secrets sync
  • Datadog / OpenTelemetry agents — observability
  • Kyverno / OPA Gatekeeper — admission policy

The truth about K8s cost

"Kubernetes is cheap" is a myth — the control plane is, the platform team isn't. A serious K8s install needs at least 1.5 SREs to operate well. If you don't have them, use CaaS.

09

FaaS — How a Lambda Actually Runs

Caller → FaaS frontend → worker (Firecracker μVM) → your handler:

  1. invoke (HTTP / event)
  2. assign a warm worker, or trigger a "cold start"
  3. download package, init runtime, init handler (cold only)
  4. handler executes — this is the only part billed pre-2024
  5. response to the frontend
  6. response to the caller (HTTP / SQS ack / etc.)

The worker stays warm for a few minutes — the next invocation skips steps 2–3 entirely (warm path).

Cold path

Steps 2–3. 50ms (Workers) → 1.5s (Java Lambda in VPC). Once per fresh worker.

Warm path

Step 4 only. Handler is already in memory. P50 latency = your code's latency.

Burst path

Sudden traffic → hundreds of cold workers spun up in parallel. P99 spikes; this is where load tests reveal the truth.

10

Cold Starts — Causes & Cures

What contributes to a cold start

  • μVM boot — ~125 ms on Firecracker
  • Runtime init — Node, Python ~50ms; JVM 300–800ms; .NET 200–600ms; Go < 50ms
  • Code download — bigger zip = longer; 50 MB Lambda zip ~150 ms slower than 1 MB
  • Handler init — module imports, DB connection setup, AWS SDK warmup. This is usually the biggest variable.
  • VPC ENI attach — historically seconds; "Hyperplane ENIs" (2019) brought it to ~250 ms

Mitigations

  • Provisioned concurrency / min instances — keep N warm; you pay for them
  • SnapStart — Lambda Java/Python — restore from a memory snapshot, ~100 ms instead of 800 ms
  • Smaller deploy package — split function dependencies, layer common code
  • Lazy-init — keep module-top work minimal and defer optional clients until first use
  • Compile-on-build — Go & Rust dominate cold-start charts
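The lazy-init bullet in code, as a minimal sketch: the table name and the boto3 call are illustrative stand-ins for any heavy client. Module scope runs once per cold start, so anything optional moves behind a cached factory.

```python
import functools

# Module scope runs once per cold start -- keep it cheap.
# Heavy clients go behind a cached factory so only the first
# invocation that needs them pays the cost.

@functools.lru_cache(maxsize=None)
def get_table():
    import boto3  # deferred import: skipped entirely on paths that never touch the DB
    return boto3.resource("dynamodb").Table("orders")  # illustrative table name

def handler(event, context):
    if event.get("action") == "read":
        return get_table().get_item(Key={"id": event["id"]})
    # Fast path: no DB client is ever constructed.
    return {"ok": True}
```

Requests that never hit the "read" branch pay neither the import nor the client construction, even on a cold start.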

Cold-start latency, real numbers (2024-25)

Runtime | p50 cold | p99 cold
Workers (V8 isolate) | <5 ms | <30 ms
Lambda Node 20 | 180 ms | 650 ms
Lambda Python 3.12 | 200 ms | 700 ms
Lambda Java + SnapStart | 200 ms | 800 ms
Lambda Java (no SnapStart) | 900 ms | 3500 ms
Cloud Run gen2 Node | 120 ms | 400 ms
11

Edge FaaS — V8 Isolates

A different beast: each function runs in a V8 isolate (the same construct as a browser tab) instead of a μVM. ~5 ms cold start. No Node-native APIs. Code runs at whichever of hundreds of edge locations is closest to the user.

What's running it

  • Cloudflare Workers — 300+ POPs, the original isolate-based platform
  • Vercel Edge Functions — also V8 isolates, on Vercel's edge
  • Deno Deploy — the Deno runtime at the edge, V8 isolates under the hood
  • Fastly Compute@Edge — WASM-based, multi-language

Best fits

  • Auth checks, A/B routing, geofencing — request rewriting at the edge
  • Lightweight APIs with global users (auth, KV reads, image proxying)
  • Edge SSR — render close to the user; revalidate from the origin

What you give up

  • No fs, no net as in Node — only fetch, Request, Response, Streams, WebCrypto
  • No native binaries (sharp, tensorflow.js without WASM)
  • Limited CPU per request (10 ms CPU on Workers Free, up to 30 s on Paid; varies by platform)
  • No long-running background jobs (Workers Cron, Durable Objects fill some of this)

Workers KV / D1 / Durable Objects

  • KV — eventually consistent global key-value, microseconds at edge
  • D1 — SQLite per region, replicated
  • Durable Objects — single-tenant in-memory state, transactional, the "actor" pattern at the edge

Closest equivalents elsewhere: Vercel KV/Postgres, Deno KV, PlanetScale Hyperdrive + edge functions.

12

Event-Driven Patterns

FaaS shines on events, not requests. Most production FaaS code runs in response to: a queue message, an S3 upload, a CloudWatch event, a webhook, a DynamoDB change, a Kinesis record.

Common event sources

  • Queues — SQS / Pub/Sub / Service Bus / EventBridge
  • Storage — S3 / GCS / Blob notifications
  • DBs — DynamoDB Streams, Firestore triggers, Cosmos change feed
  • Schedule — EventBridge / Cloud Scheduler / Cron
  • HTTP — API Gateway, ALB, Function URLs

Orchestration — when one function isn't enough

  • AWS Step Functions — JSON state machine; retries, parallel, choice; the standard for AWS pipelines
  • GCP Workflows — YAML state machine
  • Azure Durable Functions — code-first orchestration in C# / JS
  • Temporal — vendor-neutral workflow engine; runs on any cloud

Idempotency & at-least-once

Event sources retry. Your function will be called with the same event again. Design for it: idempotency keys, conditional writes (DynamoDB attribute_not_exists), de-dup via a transaction id.
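A minimal sketch of the idempotency-key pattern, with an in-memory set standing in for a conditional write (in DynamoDB this would be a put_item with a ConditionExpression of attribute_not_exists(id)); all names are illustrative.

```python
# At-least-once delivery means the same event can arrive twice.
# Record each event's idempotency key with an insert-if-absent;
# a duplicate fails the condition and is skipped.

processed: set = set()   # stands in for a table with a uniqueness guarantee
charges: list = []       # the side effect we must not repeat

def handle_payment(event: dict) -> str:
    key = event["idempotency_key"]   # caller-supplied, stable across retries
    if key in processed:             # in DynamoDB: ConditionalCheckFailedException
        return "duplicate-skipped"
    processed.add(key)               # must be atomic with the check in real storage
    charges.append((event["account"], event["amount"]))
    return "charged"
```

In real storage the check and the record must be one atomic conditional write; a separate read-then-write reintroduces the race this pattern exists to close.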

Dead-letter queues are not optional

  • Configure DLQ on every function
  • Set max-receive count (3–5)
  • Alarm on DLQ depth > 0
  • Have a redrive runbook

The "lambda calls lambda" smell

If one function synchronously calls another, you're paying double, doubling cold starts, and increasing tail latency. Use Step Functions, queues, or just one bigger function.
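The queue alternative in miniature, with queue.Queue standing in for SQS or Pub/Sub and all names illustrative: the first function acknowledges and returns instead of waiting on the second, so its billed time ends immediately.

```python
import queue

# Instead of function A synchronously invoking function B
# (two concurrent billed executions, two cold-start chances),
# A enqueues a message and returns; B consumes on its own schedule.
jobs = queue.Queue()   # stands in for SQS / Pub/Sub

def handler_a(event: dict) -> dict:
    jobs.put({"order_id": event["order_id"]})   # fire-and-forget
    return {"accepted": True}                   # A's billed time ends here

def handler_b() -> list:
    done = []
    while not jobs.empty():
        done.append(jobs.get()["order_id"])     # B drains at its own pace
    return done
```

The queue also buys you retries, buffering under burst, and a natural place to attach a DLQ — none of which a synchronous call gives you.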

13

Cost Comparison — Same Workload, Five Layers

A small JSON API at 100 requests/sec sustained and 100 ms median, 256 MB. Approximate, us-east-1, 2025 prices.

Layer | Service | Sizing | $/month (rough) | Comment
IaaS | 2× t3.small (HA) | 2 vCPU, 2 GB each | ~$30 (RI)–$60 (on-demand) | + ALB ($23) + NAT GW ($32) + EBS — closer to $120–150
K8s | EKS cluster + 2× t3.small nodes | 1 control plane + 2 nodes | ~$73 + $30–60 = $100–135 | Plus ALB / NAT; before any platform-team cost
CaaS | Cloud Run, max-instances 4, scale-to-0 | 1 vCPU, 256 MB | ~$15–25 | Excludes free tier; idle costs $0
PaaS | Render Standard or Heroku Basic ($7/mo each) × 2 | HA, autoscaling off | $14–25 | + $7/mo Postgres add-on
FaaS | Lambda + API GW | 256 MB, 100 ms | ~$70 (Lambda) + ~$45 (API GW) | API GW is the killer; a Function URL or ALB cuts the total to ~$25

Take-aways

  • CaaS & PaaS are the cheapest for steady ~100 RPS workloads
  • Lambda is cheapest only if you're not using API Gateway
  • K8s is rarely cheapest for small workloads — it's a platform, not an optimisation
  • IaaS only wins below this scale (single VM running everything) or above (3-year RI giants)

Where the curve flips

  • < 1 RPS, spiky → FaaS
  • 1–500 RPS, steady → CaaS / PaaS
  • 500+ RPS or stateful → managed K8s or IaaS with reserved capacity
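These thresholds fall out of the billing formulas. A back-of-envelope sketch for the FaaS end, using per-request and per-GB-second rates taken as assumptions from published Lambda list prices — verify against current pricing; free tier and tiered discounts are ignored, so raw list-rate math will not match rounded real-world figures exactly.

```python
# Assumed us-east-1 list rates -- check current pricing before relying on these.
REQ_RATE = 0.20 / 1_000_000   # $ per request
GBS_RATE = 0.0000166667       # $ per GB-second of billed duration

def lambda_monthly(rps: float, duration_s: float, mem_gb: float) -> float:
    """Rough monthly Lambda compute cost for a steady request rate."""
    requests = rps * 86_400 * 30                  # 30-day month
    gb_seconds = requests * duration_s * mem_gb   # billed duration x memory
    return requests * REQ_RATE + gb_seconds * GBS_RATE

# A low-volume workload (100 ms at 256 MB) is where per-invocation billing shines:
print(f"1 rps avg: ${lambda_monthly(1, 0.1, 0.25):,.2f}/mo")   # → $1.60/mo
```

Cost scales linearly with invocations, so at sustained triple-digit RPS a flat-rate container eventually wins — which is the curve flip the list above describes.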
14

Deployment Strategies on Managed Compute

Rolling

The default everywhere. Replace N pods/instances at a time; drain old, start new. Simple, slow, hard to roll back mid-deploy.

Blue / green

Stand up the new version alongside, swap traffic atomically. Cloud Run revisions, AWS App Runner deploys, K8s with two services + a switch. Fast rollback (swap back).

Canary

Send 1% / 5% / 25% / 100% of traffic to the new version while watching SLO metrics. Cloud Run native traffic-splitting, ALB weighted target groups, Linkerd / Istio + Argo Rollouts.
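The ramp can be pictured as a biased coin flipped per request. A toy simulation of the 1% → 100% progression (revision names illustrative — platforms like Cloud Run implement the weighted choice for you):

```python
import random

STAGES = [0.01, 0.05, 0.25, 1.00]   # canary weight at each promotion step

def pick_revision(weight: float, rng: random.Random) -> str:
    # Per-request weighted choice between the stable and canary revisions.
    return "v2-canary" if rng.random() < weight else "v1-stable"

def observed_share(weight: float, requests: int = 100_000, seed: int = 7) -> float:
    rng = random.Random(seed)
    hits = sum(pick_revision(weight, rng) == "v2-canary" for _ in range(requests))
    return hits / requests

for w in STAGES:
    # Promote to the next stage only while SLO metrics on v2 stay healthy.
    print(f"target {w:>4.0%} -> observed {observed_share(w):.1%}")
```

The point of the staged weights: at 1% a bad revision hurts a sliver of traffic, and rollback is just setting the weight back to zero.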

Feature flags & ring deploys

Different problem from infra deploys. Push code dark; flip flags per cohort. LaunchDarkly, Unleash, GrowthBook, Statsig, AWS AppConfig. Decouples "deploy" from "release".

The forgotten half — db migrations

Code can roll forward; data rarely can. Separate schema migrations from code deploys: ship the backwards-compatible migration first, then the code, then the drop-old-column migration. Separate deploys, never one big one.
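The expand → code → contract sequence, sketched with sqlite3 standing in for a real migration tool (table and column names illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('ada')")

# Deploy 1 -- expand: purely additive, so the old code still running is unaffected
db.execute("ALTER TABLE users ADD COLUMN email TEXT")

# Deploy 2 -- new code starts writing (and backfilling) the new column
db.execute("UPDATE users SET email = name || '@example.com'")

# Deploy 3 -- contract: dropping the old column ships only once nothing reads it,
# as its own later migration, never bundled with deploy 2

print(db.execute("SELECT name, email FROM users").fetchone())
# → ('ada', 'ada@example.com')
```

Rolling back deploy 2 is safe at every step because the schema each code version expects already exists.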

Where to read more

The CI/CD deck covers the pipeline; see Deploying Web Applications for beginner deploy strategies.

15

Decision Matrix — Which Layer?

You are… | Default | Why
Building a Next.js / SvelteKit consumer site | Vercel / Cloudflare Pages | Framework-aware, edge SSR, preview-per-PR, free TLS
Shipping a small Go / Python / Node API | Cloud Run / Render / Fly | One container, scale-to-0, deploy via gcloud run deploy or git push
Glue / webhook / S3-triggered ETL | Lambda / Cloud Functions | Per-event billing; nothing else fits this shape
Cron / scheduled jobs | EventBridge + Lambda; Cloud Scheduler + Cloud Run Jobs | $0 idle, robust retries, observability built in
Auth / geofence / A/B at the request edge | Cloudflare Workers / Vercel Edge | <5 ms cold start, runs at every POP
20+ services, custom CRDs, service mesh | Managed K8s (Autopilot / Auto Mode) | K8s API, but nodes are someone else's problem
GPU training / inference workload | IaaS GPUs or specialist (Modal, RunPod) | Lambda has no GPU; CaaS GPU support is thin
Stateful service (DB, queue, cache) | Managed service (RDS, MemoryStore, Aurora, Redis Cloud) | Don't run state on FaaS or PaaS — use the DBaaS
Existing Heroku app, growing | Render / Fly first; Cloud Run if the data side is on GCP | Same DX, modern infra, often cheaper
16

Anti-Patterns

"Lambda for everything"

200 functions, mutually coupled, sharing one Postgres, stitched together by a 14-step Step Function. Move steady-state HTTP services to CaaS; keep FaaS for events.

"K8s for everything"

Three engineers, one Helm chart per service, one Crossplane install per cloud. Cloud Run / App Runner / Container Apps will give back literal months of life.

"Vercel as our backend"

Long DB connections, background jobs, big files, 6 GB function deploys, 10s timeouts. Push your backend off the frontend PaaS; use a real CaaS.

"Multi-cloud abstraction layer"

Abstracting across Lambda + Cloud Functions + Azure Functions in one repo. The lowest common denominator is awful, debugging gets worse, and you still pick a cloud per region. Don't.

"Free tier serverless will never bill"

Until it does. A runaway Lambda + S3 + DynamoDB loop can rack up thousands of dollars overnight. Set hard quotas, concurrent-execution limits, and budget alarms before launch.

"Serverless = no monitoring"

If anything, serverless needs more instrumentation: cold starts, throttles, DLQ depth, init duration. Default CloudWatch is the floor; OpenTelemetry → Honeycomb / Datadog / Tempo is what you actually want.

17

Summary

Three takeaways

  1. Most workloads belong on CaaS (Cloud Run / Container Apps / App Runner). Default there; deviate when there's a real reason.
  2. FaaS is for events, not services. Use it where its shape fits — webhooks, queues, S3 triggers — not as a substitute for an API.
  3. Kubernetes is a platform you build on top of, not a faster way to ship. Choose it deliberately, or you'll feed it for years.

Next in the series

  • 04 SaaS Architecture — multi-tenancy, B2B identity, metering
  • 05 Cloud Security — IAM in depth, secrets, network, compliance
  • 06 LLM-as-a-Service

Companion decks

One sentence

"Pick the smallest unit of deploy that fits your workload — function for events, container for services, pod for platforms — and resist the temptation to use the same hammer for all three."