How services prove who they are to other services — without shared secrets, without long-lived keys, and without an operator in the loop.
Every modern system has a sea of services calling other services. They authenticate to each other using secrets — tokens, API keys, certificates. Where do those secrets come from, where are they stored, and what happens when one leaks?
SPIFFE = Secure Production Identity Framework For Everyone. CNCF graduated 2022. A specification, not a product. Defines what a workload identity looks like.
spiffe://acme.com/billing/payments-svc
└─ trust domain ┘└── workload path ──┘
The SPIFFE ID is carried inside the SVID: as a URI SAN in an X.509 SVID, or as the sub claim in a JWT SVID.
Certificate:
Subject: O=SPIRE
X509v3 Subject Alternative Name:
URI:spiffe://acme.com/billing/payments-svc
Validity:
Not Before: 2026-05-06T09:00:00Z
Not After: 2026-05-06T10:00:00Z
Public Key: ECDSA P-256 …
Signed by: spiffe://acme.com (intermediate CA)
{
"iss": "spiffe://acme.com",
"sub": "spiffe://acme.com/billing/payments-svc",
"aud": ["spiffe://acme.com/billing/ledger-svc"],
"exp": 1800003600,
"iat": 1800000000
}
JWT SVIDs are audience-scoped — payments can only call ledger if the SVID was minted with that aud, so a token minted for ledger can't be replayed against another service.
SPIRE = SPIFFE Runtime Environment. Two components: a server and an agent on every node. Together they issue SVIDs to workloads after attesting them.
The workload has no shared key with the Server. The Agent's attestation chain proves the workload is legitimate; the SVID falls out as a by-product. App code calls the Workload API and receives a fresh SVID — no enrolment ceremony, no admin step.
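To poke at this by hand, the SPIRE CLI can exercise the same Workload API an app library or sidecar would call (the socket path below is deployment-specific):

# fetch the X.509 SVID the agent serves to this workload
$ spire-agent api fetch x509 -socketPath /run/spire/agent/public/api.sock

# fetch a JWT SVID scoped to a single audience
$ spire-agent api fetch jwt -audience spiffe://acme.com/billing/ledger-svc \
    -socketPath /run/spire/agent/public/api.sock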
The attestation chain is the heart of SPIRE. The Agent and Server verify the workload's identity against the substrate it runs on, not against a credential the workload presents.
Attestor plugins include:
k8s_psat — Kubernetes projected service account token, validated against the cluster's OIDC issuer.
aws_iid — AWS instance identity document signed by EC2.
aws_iam — STS GetCallerIdentity proves the EC2 role.
gcp_iit / azure_msi — cloud equivalents.
x509pop / sshpop — proof-of-possession of pre-issued credentials.
tpm — TPM EK/AK chains for bare metal.

A registration entry binds selectors to a SPIFFE ID:

spire-server entry create \
  -spiffeID spiffe://acme.com/billing/payments-svc \
  -parentID spiffe://acme.com/spire/agent/k8s_psat/cluster/abc \
  -selector k8s:ns:billing \
  -selector k8s:sa:payments-svc \
  -selector k8s:container-image-sha:sha256:e1b7… \
  -ttl 300
The agent only mints this SPIFFE ID for a workload that matches every selector. Wrong namespace, wrong SA, wrong image hash → no SVID.
Loose selectors give every Pod in a namespace the same identity. Tight selectors (image SHA) break on every deploy. Sweet spot: namespace + service account + image tag with admission policy enforcing valid image registries.
Federation connects separate trust domains: spiffe://acme.com for production, plus a second domain such as spiffe://oldco.com. After a bundle exchange, oldco validates acme SVIDs against acme's bundle, and vice versa.
# SPIRE servers exchange bundles via HTTPS
GET /federation/spiffe/v1/bundle
# response is a SPIFFE Trust Domain bundle (JWK Set + X.509)
{
"spiffe_sequence": 1234,
"spiffe_refresh_hint": 60,
"keys": [
{ "kty": "EC", "use": "x509-svid", "x5c": ["MIIB…"] },
{ "kty": "EC", "use": "jwt-svid", "kid": "abc1", "crv": "P-256", "x":"…", "y":"…" }
]
}
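Bootstrapping that trust is an explicit administrative step on each server. A sketch with the SPIRE CLI — the endpoint URL and profile are illustrative, and flag spellings vary across SPIRE versions:

# On acme's server: trust oldco's bundle endpoint
$ spire-server federation create \
    -trustDomain oldco.com \
    -bundleEndpointURL https://spire.oldco.com:8443 \
    -bundleEndpointProfile https_web

# federatesWith: this workload talks to oldco, so its trust bundle includes oldco's
$ spire-server entry create \
    -spiffeID spiffe://acme.com/billing/payments-svc \
    -parentID spiffe://acme.com/spire/agent/k8s_psat/cluster/abc \
    -selector k8s:ns:billing \
    -federatesWith spiffe://oldco.com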
A service mesh injects a sidecar (or, in newer architectures, an ambient agent) into every pod's network path. That sidecar terminates and originates connections, which means it can do three things app code traditionally had to do itself: identity, encryption, policy.
Sidecar holds the workload's SVID / cert. Other pod sees the verified identity, not an IP.
mTLS for free, between every pair of pods. Rotated by the mesh, not the app.
Policies expressed as Kubernetes CRDs, applied at the sidecar — outside the app.
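The encryption piece, for example, is typically one small CRD. In Istio, a namespace-wide PeerAuthentication flips every workload in billing to mTLS-only (a sketch, reusing the running example's namespace):

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata: { name: default, namespace: billing }
spec:
  mtls:
    mode: STRICT   # plaintext connections to pods in this namespace are rejected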
Istio's AuthorizationPolicy CRD is evaluated by the Envoy sidecar (in ambient mode, by ztunnel for L4 rules and by a waypoint proxy for L7). It supports both L4 and L7 rules.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata: { name: ledger-allow-payments, namespace: billing }
spec:
  selector:
    matchLabels: { app: ledger-svc }
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/billing/sa/payments-svc"]
    to:
    - operation:
        methods: ["GET","POST"]
        paths: ["/v1/postings/*"]
    when:
    - key: request.auth.claims[scope]
      values: ["postings:write"]
    - key: request.headers[x-tenant-id]
      notValues: [""]
action is one of ALLOW, DENY, AUDIT, or CUSTOM (delegate to ext_authz).
principals names the workload identities (cluster.local/ns/X/sa/Y) of allowed callers.
Conditions on request.auth.claims need a matching RequestAuthentication; without one, the policy would be reading claims from an unverified token.
Linkerd's authorisation model is intentionally smaller than Istio's — fewer CRDs, simpler defaults, gateway-API-aligned.
# 1. Server — selects ports on workloads to protect
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata: { name: ledger, namespace: billing }
spec:
  podSelector:
    matchLabels: { app: ledger-svc }
  port: 8080
  proxyProtocol: HTTP/2
---
# 2. AuthorizationPolicy — bind a target to a Required* auth
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata: { name: ledger-from-payments, namespace: billing }
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: ledger
  requiredAuthenticationRefs:
  - group: policy.linkerd.io
    kind: MeshTLSAuthentication
    name: payments-svc-id
---
# 3. MeshTLSAuthentication — who is allowed
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata: { name: payments-svc-id, namespace: billing }
spec:
  identities:
  - "payments-svc.billing.serviceaccount.identity.linkerd.cluster.local"
Linkerd 2.12+ defaults to "ports without a Server are open; ports with a Server require an explicit policy". Add a Server and you've turned on the lock.
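The default can also be tightened per namespace before any Server exists; a sketch using Linkerd's default-inbound-policy annotation on the running example's namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: billing
  annotations:
    # deny all inbound traffic unless an AuthorizationPolicy explicitly allows it
    config.linkerd.io/default-inbound-policy: deny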
Linkerd is lighter and more opinionated; Istio more powerful and more complex. For most teams the rule of thumb is: Linkerd unless you specifically need Istio's L7 features (custom Envoy filters, EnvoyFilter CRD, Wasm extensions, fine-grained traffic shaping).
An eBPF-based CNI for Kubernetes. Network policy, observability, encryption and L7 inspection are enforced by eBPF programs in the kernel — no per-pod sidecar, often lower overhead.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata: { name: payments-to-ledger, namespace: billing }
spec:
  endpointSelector:
    matchLabels: { app: ledger-svc }
  ingress:
  - fromEndpoints:
    - matchLabels: { app: payments-svc }
    toPorts:
    - ports: [{ port: "8080", protocol: TCP }]
      rules:
        http:
        - method: "GET"
          path: "/v1/postings/.*"
        - method: "POST"
          path: "/v1/postings"
eBPF also gives kernel-level visibility into what workloads actually do: execve, connect, file open.
Cilium's L7 inspection only works for protocols it understands (HTTP/1.1, HTTP/2, gRPC, Kafka, DNS). For anything custom you're back to L4 + a service mesh sidecar.
| System | Who issues |
|---|---|
| Istio (in-mesh) | Istiod / cert-manager+istio-csr / SPIRE federated |
| Linkerd | Linkerd's own identity component / cert-manager |
| Cilium service mesh | Cilium agent, optionally with cert-manager |
| SPIRE-issued mesh | SPIRE Server |
| Cloud-managed (App Mesh, Service Mesh) | Cloud's CA service |
cert-manager's Issuer / ClusterIssuer CRDs cover ACME, Vault, AWS PCA, SPIFFE.
CA rotation: rehearse it before you need it. Pre-distribute the new root for ≥ 1 cert TTL before switching issuance, then keep the old root in trust bundles for ≥ 1 cert TTL after.
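A sketch of where cert-manager sits in that flow, assuming the root CA key pair already lives in a Secret (with istio-csr the wiring differs, but the Issuer/Certificate split is the same):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata: { name: mesh-root }
spec:
  ca:
    secretName: mesh-root-ca        # existing root CA key pair
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata: { name: istio-intermediate, namespace: istio-system }
spec:
  isCA: true
  commonName: istio-intermediate
  secretName: cacerts               # istiod picks up its signing CA from here
  duration: 720h
  renewBefore: 360h
  issuerRef: { name: mesh-root, kind: ClusterIssuer }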
Each cloud has its own implementation of "this pod / VM / function is allowed to call our APIs". They all converge on the same shape — bind a workload identity to an IAM role, mint short-lived credentials on demand.
| Cloud | Mechanism | How it binds |
|---|---|---|
| AWS — IRSA (IAM Roles for Service Accounts) | EKS pod's projected SA token has aud=sts.amazonaws.com | STS trusts the cluster's OIDC issuer; pod calls AssumeRoleWithWebIdentity using the SA token. |
| AWS — EKS Pod Identity | Newer alternative; agent on every node | Agent intermediates; pod calls a local IMDSv2-like endpoint. |
| GCP — Workload Identity | Bind GCP service account to K8s service account | GKE metadata server intermediates; pod gets GCP creds via metadata calls. |
| GCP — Workload Identity Federation | Outside-GCP workloads (other clouds, GitHub Actions) | External OIDC IdP → STS-style token exchange → short-lived GCP creds. |
| Azure — Workload Identity | K8s SA federated with Entra app registration | SA token is exchanged via OIDC for an Entra access token. |
| Azure — Managed Identity | VMs / Container Apps / Functions | Local IMDS endpoint at 169.254.169.254 mints tokens for the assigned identity. |
An OIDC token signed by the workload-substrate (cluster, cloud) is exchanged for short-lived API credentials at the cloud's STS. No long-lived secret, no shared key, no manual rotation.
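Concretely, for IRSA the binding lives in the IAM role's trust policy: it names the cluster's OIDC issuer and pins the exact service account (account ID and issuer ID below are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/ABC123"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.eu-west-1.amazonaws.com/id/ABC123:sub": "system:serviceaccount:billing:payments-svc",
        "oidc.eks.eu-west-1.amazonaws.com/id/ABC123:aud": "sts.amazonaws.com"
      }
    }
  }]
}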
Cloud workload identity authenticates the workload to the cloud's APIs. SPIFFE authenticates workloads to each other. They compose: the SPIRE server can attest using IRSA, and SPIFFE workloads can federate into cloud IAM via OIDC-based workload identity federation.
The Kubernetes-native way to get a short-lived OIDC-style token into a pod. It is the substrate that cloud workload identity (IRSA, GCP WI, Azure WI) and SPIRE's k8s_psat attestor all build on.
apiVersion: v1
kind: Pod
spec:
  serviceAccountName: payments-svc
  containers:
  - name: app
    image: acme/payments:1.4
    volumeMounts:
    - mountPath: /var/run/secrets/aws
      name: aws-token
  volumes:
  - name: aws-token
    projected:
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 3600
          path: token
{
"iss": "https://oidc.eks.eu-west-1.amazonaws.com/id/ABC123",
"sub": "system:serviceaccount:billing:payments-svc",
"aud": "sts.amazonaws.com",
"exp": 1800003600,
"iat": 1800000000,
"kubernetes.io": {
"namespace": "billing",
"serviceaccount": { "name": "payments-svc", "uid": "…" },
"pod": { "name": "payments-svc-7df4f", "uid": "…" }
}
}
$ kubectl get --raw /.well-known/openid-configuration
{ "issuer": "https://oidc.eks.eu-west-1.amazonaws.com/id/ABC123",
"jwks_uri": "https://oidc…/keys",
"response_types_supported": ["id_token"],
"subject_types_supported": ["public"],
"id_token_signing_alg_values_supported": ["RS256"] }
Any external IdP / cloud STS can be configured to trust this issuer — that's how IRSA / WIF / GitHub Actions OIDC all work.
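Registering the issuer with AWS is a one-off step per cluster, for example with eksctl (cluster name illustrative):

$ eksctl utils associate-iam-oidc-provider --cluster prod-eu-west-1 --approve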
Pin the caller's sub in the IAM role's trust policy — not just iss + aud.
mTLS gives you workload identity. JWT validation at the sidecar gives you end-user identity. Together you get authz that knows both who is calling and on whose behalf.
# 1. Validate JWTs from this issuer
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata: { name: jwt-acme, namespace: billing }
spec:
  selector: { matchLabels: { app: ledger-svc } }
  jwtRules:
  - issuer: "https://login.acme.com/"
    jwksUri: "https://login.acme.com/.well-known/jwks.json"
    audiences: ["https://api.acme.com/ledger"]
---
# 2. Require a valid JWT with the right scope
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata: { name: ledger-write, namespace: billing }
spec:
  selector: { matchLabels: { app: ledger-svc } }
  action: ALLOW
  rules:
  - to: [{ operation: { methods: ["POST"], paths: ["/v1/postings"] } }]
    when:
    - key: request.auth.claims[scope]
      values: ["postings:write"]
    - key: request.auth.claims[iss]
      values: ["https://login.acme.com/"]
The JWT arrives in Authorization or in a custom header. RequestAuthentication alone only rejects invalid tokens; requests carrying no token at all still get through. AuthorizationPolicy alone can reference request.auth.claims, but without RequestAuthentication those claims are never populated, so the condition silently never matches. Always pair the two.
The end-user token travels downstream in Authorization. The acting workload or agent shows up as an agent claim in the access token (or in the act-on-behalf-of structure of a token-exchanged JWT). This is where OAuth_for_MCP (OAuth profile) + Advanced_OpenID_Connect (Token Exchange + workload OIDC) + this deck (workload identity + service-mesh enforcement) meet.
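A sketch of that act-on-behalf-of shape after an RFC 8693 token exchange (claim values illustrative): sub stays the end user, act records the workload acting for them.

{
  "iss": "https://login.acme.com/",
  "sub": "user:alice",
  "aud": "https://api.acme.com/ledger",
  "scope": "postings:write",
  "act": { "sub": "spiffe://acme.com/billing/payments-svc" },
  "exp": 1800003600,
  "iat": 1800000000
}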
| You're starting with... | Best fit | Why |
|---|---|---|
| One Kubernetes cluster, one cloud, want low-friction mTLS | Linkerd or Cilium service-mesh mode | Identity + mTLS work out of the box; no extra system to operate. |
| Many K8s clusters, many clouds, polyglot workloads (VM + serverless) | SPIRE + a thin mesh | Vendor-neutral SPIFFE identity travels everywhere; mesh enforces only inside K8s. |
| Heavily AWS-centric, EKS + Lambda + EC2 | IRSA + AWS App Mesh / Cloud Map / native VPC controls | Cloud-managed; deep IAM integration; no SPIRE to operate. |
| Need fine L7 traffic policy (canary, circuit breaking, custom Envoy filters) | Istio | The widest L7 surface; can call out to ext_authz for complex rules. |
| "We just want pod-to-pod NetworkPolicy with identity" | Cilium | eBPF in the kernel; no sidecars; identity-aware NetworkPolicy. |
| Compliance needs hardware-rooted attestation | SPIRE with TPM / AWS Nitro / Azure Confidential / GCP Shielded VM plugins | Workload attestation chain rooted in hardware. |
| Multi-org B2B service-to-service (no shared cluster, no shared cloud) | SPIRE federation | Trust-bundle exchange enables verifiable cross-org calls without VPNs. |
Cloud-native (IRSA / WI / Managed Identity) is the lowest-cost path inside one cloud — but you re-build it each time you add a cloud. SPIRE costs more to operate up front, pays back with portability. Mesh-based identity gets you authz between services for free; less useful for "identity for AWS API calls".
Most common cause: the workload uses a port the mesh isn't proxying (e.g. headless service, host-network pod). Confirm the proxy is actually in the path.
Workload not refreshing the cert before expiry; SPIFFE Workload API call rate-limited or blocked. Re-fetch ahead of expiry (~ 50 % of TTL), with retries.
One cluster has the new CA, another still has the old. Calls fail mTLS validation. Trust Manager / SPIRE bundle endpoint should be polled; verify TTLs aren't too long.
Almost always: the SA annotation is wrong, the IAM role's trust policy doesn't pin the right SA, or the cluster's OIDC issuer isn't registered as an STS identity provider. CloudTrail's AssumeRoleWithWebIdentity error is verbose; read it.
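The annotation to check first, for IRSA (role ARN is a placeholder):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-svc
  namespace: billing
  annotations:
    # must name the role whose trust policy pins this exact service account
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/payments-svc-role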
Sidecar's view of "now" diverges from the IdP's; tokens look expired or not-yet-valid. NTP everywhere, leeway ≤ 60 s.
mTLS strict mode + outbound to a non-mesh endpoint = handshake failure. Configure egress policy explicitly; don't rely on default-allow.
You can't inspect HTTP that you can't decrypt. Either terminate TLS at the proxy (mesh-mTLS handles this) or accept L4-only enforcement on opaque traffic.
Loose SPIRE selectors → wrong identity issued to the wrong pod. Tight selectors → pod won't get an identity after a routine deploy. Pin namespace + SA + image registry; iterate when you change those.
Authorization Models — RBAC/ABAC/ReBAC/PaC foundations. Edge & Gateway AuthZ — north-south enforcement. OAuth for MCP Servers · Advanced OpenID Connect · Cloud_aaS_05_Cloud_Security — the wider context this deck refers back to.
SPIFFE specification · SPIRE — spiffe.io · "Solving the Bottom Turtle" (SPIFFE/SPIRE book) · Istio Security · Linkerd Authorization Policy docs · Cilium Network Policies · cert-manager · AWS IRSA / EKS Pod Identity · GCP Workload Identity / WIF · Azure Workload Identity · NIST SP 800-204A (Service Mesh) · CNCF TAG-Security: Workload Identity
Replace every long-lived service credential with a short-lived, attested, automatically-rotated identity. The platforms exist; the only thing left is to wire them up.