Understanding how Docker manages data — from ephemeral container layers to production-grade persistent storage strategies
Containers use a layered, copy-on-write filesystem. Each image layer is read-only; the running container adds a thin writable layer on top.
docker rm destroys all data written to the container layer.
┌─────────────────────────────────────┐
│ Writable Container Layer (R/W) │ ← Deleted on container removal
├─────────────────────────────────────┤
│ Layer 4: CMD / EXPOSE │ ← Read-only
├─────────────────────────────────────┤
│ Layer 3: COPY app source │ ← Read-only
├─────────────────────────────────────┤
│ Layer 2: RUN npm install │ ← Read-only
├─────────────────────────────────────┤
│ Layer 1: FROM node:20-alpine │ ← Read-only (base image)
└─────────────────────────────────────┘
Docker uses a union filesystem to merge multiple read-only layers into a single coherent view. On modern Linux the default storage driver is overlay2, built on the kernel's OverlayFS.
# Inspect the storage driver in use
docker info | grep "Storage Driver"
# Storage Driver: overlay2
# See the layers of an image
docker inspect node:20-alpine \
--format='{{.GraphDriver.Data}}'
# View overlay mount for a running container
docker inspect myapp \
--format='{{.GraphDriver.Data.MergedDir}}'
# Check disk usage by layer
docker system df -v
Docker provides three mount types to persist data beyond the container lifecycle. Each serves a different purpose.
  Host Filesystem              Docker Area               Memory
┌──────────────┐ ┌──────────────────────┐ ┌──────────┐
│ /home/user/ │──bind mount──│ /var/lib/docker/ │ │ tmpfs │
│ /data/app/ │ │ volumes/mydata/_data│ │ (RAM) │
└──────────────┘ └──────────────────────┘ └──────────┘
↕ ↕ ↕
┌─────────────────────────────────────────────────────────────────────┐
│ Container Filesystem │
└─────────────────────────────────────────────────────────────────────┘
# Create a named volume
docker volume create pgdata
# Use it with a container
docker run -d \
-v pgdata:/var/lib/postgresql/data \
--name db postgres:16
# Same volume, new container
docker run -d \
-v pgdata:/var/lib/postgresql/data \
--name db2 postgres:16
Anonymous volumes are created by a VOLUME instruction in a Dockerfile, or by -v /container/path with no name on the left side.
# Create an anonymous volume
docker run -d \
-v /var/lib/postgresql/data \
postgres:16
# List volumes — spot the anonymous ones
docker volume ls
# DRIVER VOLUME NAME
# local pgdata ← named
# local a1b2c3d4e5f6... ← anonymous
# Clean up dangling anonymous volumes
docker volume prune
Bind mounts map a specific host directory into the container. Changes on either side are reflected instantly.
# Legacy -v syntax
docker run -d \
-v $(pwd)/src:/app/src \
-v $(pwd)/config:/app/config:ro \
myapp:dev
# Modern --mount syntax (recommended)
docker run -d \
--mount type=bind,src=$(pwd)/src,dst=/app/src \
--mount type=bind,src=$(pwd)/config,dst=/app/config,readonly \
myapp:dev
# Key difference: --mount errors if
# source doesn't exist (safer)
# -v silently creates an empty directory
Add :ro or readonly to prevent the container from modifying host files. Essential for config and secrets.
tmpfs mounts store data in host memory only. Data is never written to the host filesystem and is lost when the container stops.
# Using --tmpfs flag
docker run -d \
--tmpfs /tmp:rw,size=64m \
--tmpfs /run/secrets:rw,size=1m \
myapp:1.0
# Using --mount syntax (more options)
docker run -d \
--mount type=tmpfs,dst=/tmp,tmpfs-size=67108864 \
--mount type=tmpfs,dst=/run/secrets,tmpfs-mode=0700 \
myapp:1.0
# tmpfs-size: bytes (64MB = 67108864)
# tmpfs-mode: file permissions (octal)
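The byte values that tmpfs-size expects can be computed with plain shell arithmetic — a minimal sketch (the helper name mb_to_bytes is ours, not a Docker command):

```shell
# Convert a size in MB to the raw byte count tmpfs-size expects.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 64   # 67108864 — matches the tmpfs-size value above
mb_to_bytes 1    # 1048576
```

This is handy when templating --mount type=tmpfs flags in scripts, since --mount takes bytes while --tmpfs accepts suffixed sizes like 64m.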
Docker volumes use the local driver by default, but plugins enable storage on remote and cloud backends.
| Driver / Plugin | Backend | Use Case |
|---|---|---|
local | Host filesystem | Default; single-host workloads |
local + NFS opts | NFS share | Shared storage across hosts |
rexray/ebs | AWS EBS | Persistent block storage on AWS |
rexray/efs | AWS EFS | Shared file storage on AWS |
azure_file | Azure Files | Shared storage on Azure |
vieux/sshfs | SSH/SFTP | Mount remote dir via SSH |
netapp/trident | NetApp | Enterprise SAN/NAS |
# Create NFS-backed volume
docker volume create \
--driver local \
--opt type=nfs \
--opt o=addr=192.168.1.100,rw \
--opt device=:/exports/data \
nfs_data
# Install and use a plugin
docker plugin install \
vieux/sshfs
docker volume create \
--driver vieux/sshfs \
-o sshcmd=user@host:/remote/path \
-o password=secret \
sshvol
| Command | Description | Example |
|---|---|---|
docker volume create | Create a named volume | docker volume create mydata |
docker volume ls | List all volumes | docker volume ls --filter dangling=true |
docker volume inspect | Show volume details | docker volume inspect mydata |
docker volume rm | Remove a volume | docker volume rm mydata |
docker volume prune | Remove all unused volumes | docker volume prune -f |
docker run -v | Mount volume (legacy syntax) | docker run -v mydata:/app/data img |
docker run --mount | Mount volume (modern syntax) | docker run --mount src=mydata,dst=/data img |
-v name:/path — concise; auto-creates the volume if missing
--mount type=volume,src=name,dst=/path — explicit and self-documenting. (Note: the errors-on-missing-source behavior applies to bind mounts; named volumes are created on demand by both forms.) Prefer --mount for clarity.
docker volume inspect pgdata returns output like:
{
"Name": "pgdata",
"Driver": "local",
"Mountpoint": "/var/lib/docker/volumes/pgdata/_data",
"Scope": "local"
}
Named volumes can be mounted by multiple containers simultaneously, enabling shared storage and data pipelines.
# Create a shared volume
docker volume create shared_data
# Writer container
docker run -d --name writer \
-v shared_data:/data \
alpine sh -c \
'while true; do date >> /data/log.txt; sleep 5; done'
# Reader container
docker run -d --name reader \
-v shared_data:/data:ro \
alpine tail -f /data/log.txt
# Init: populate config from git
docker run --rm \
-v app_config:/config \
alpine/git clone \
https://github.com/org/config.git /config
# App: use the config
docker run -d \
-v app_config:/app/config:ro \
myapp:latest
Multiple writers to the same volume can cause data corruption. Use application-level locking or a database for concurrent access.
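For simple append-style workloads, advisory locking with flock from util-linux can serialize writers — a sketch under the assumption that all writers run on the same host and util-linux is installed (the file paths here are hypothetical; flock does not coordinate across hosts and is no substitute for a database under real concurrent load):

```shell
# Serialize concurrent appends to a shared file with an advisory lock.
LOG=/tmp/demo-log.txt
LOCK=/tmp/demo-log.lock
: > "$LOG"   # start from an empty log

append_locked() {
  # Only one writer holds the lock at a time; others block until it is free.
  flock "$LOCK" sh -c "echo \"$1\" >> '$LOG'"
}

append_locked "writer-1 entry"
append_locked "writer-2 entry"
cat "$LOG"
```

Inside containers, each writer would run the same flock wrapper against a lock file on the shared volume.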
# docker-compose.yml
services:
app:
build: .
ports: ["3000:3000"]
volumes:
- ./src:/app/src # bind mount (dev)
- app_uploads:/app/uploads # named volume
depends_on:
db:
condition: service_healthy
db:
image: postgres:16-alpine
environment:
POSTGRES_PASSWORD: secret
volumes:
- pgdata:/var/lib/postgresql/data
- ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
healthcheck:
test: ["CMD-SHELL", "pg_isready"]
interval: 5s
redis:
image: redis:7-alpine
volumes:
- redis_data:/data
command: redis-server --appendonly yes
volumes:
pgdata: # default local driver
redis_data:
driver: local
app_uploads:
driver_opts:
type: none
o: bind
device: /mnt/uploads
Compose prefixes volume names with the project name, so pgdata becomes myapp_pgdata on disk. To reference a pre-existing volume by its exact name instead:
volumes:
pgdata:
external: true # must pre-exist
name: production_pgdata
Use external: true to reference volumes managed outside Compose.
docker compose down keeps volumes; docker compose down -v destroys them. Never use the -v flag casually in production.
# Backup: mount volume into temp container
docker run --rm \
-v pgdata:/source:ro \
-v $(pwd):/backup \
alpine tar czf /backup/pgdata-backup.tar.gz \
-C /source .
# Restore: extract into volume
docker run --rm \
-v pgdata:/target \
-v $(pwd):/backup \
alpine tar xzf /backup/pgdata-backup.tar.gz \
-C /target
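The tar pattern above can be sanity-checked without Docker at all — a plain-shell sketch that round-trips a directory through a tarball using the same -C trick the volume backup relies on (all paths here are scratch paths):

```shell
# Round-trip a directory through tar, mirroring the backup/restore pattern:
# archive the *contents* of the source (-C "$src" .), then unpack into an
# empty target and verify nothing was lost.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/file.txt"
mkdir -p "$src/nested"
echo "world" > "$src/nested/deep.txt"

tar czf /tmp/demo-backup.tar.gz -C "$src" .
tar xzf /tmp/demo-backup.tar.gz -C "$dst"

diff -r "$src" "$dst" && echo "backup verified"
```

Running diff -r against a test restore like this is a cheap way to validate backup scripts before trusting them with a real volume.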
# PostgreSQL — pg_dump
docker exec db \
pg_dump -U postgres mydb \
> backup.sql
# MySQL — mysqldump
docker exec mysql \
mysqldump -u root -p mydb \
> backup.sql
# Restore
cat backup.sql | docker exec -i db \
psql -U postgres mydb
#!/bin/bash
# backup-volumes.sh
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR=/backups
for vol in pgdata redis_data uploads; do
docker run --rm \
-v ${vol}:/source:ro \
-v ${BACKUP_DIR}:/backup \
alpine tar czf \
/backup/${vol}_${DATE}.tar.gz \
-C /source .
done
# Retention: keep last 7 days
find ${BACKUP_DIR} -name "*.tar.gz" \
-mtime +7 -delete
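The retention rule can be exercised safely in a scratch directory before pointing it at real backups — a sketch assuming GNU coreutils (touch -d) and GNU find:

```shell
# Demonstrate the 7-day retention rule from the script above.
dir=$(mktemp -d)
touch "$dir/fresh.tar.gz"                    # modified just now
touch -d "10 days ago" "$dir/stale.tar.gz"   # backdated 10 days

# -mtime +7 matches files last modified more than 7 full days ago
find "$dir" -name "*.tar.gz" -mtime +7 -delete

ls "$dir"   # only fresh.tar.gz remains
```

Testing destructive find expressions this way (or with -print instead of -delete first) avoids deleting the wrong backups.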
Ship backups off-host with aws s3 cp or gsutil cp.
Storage drivers control how image layers and the writable container layer are stored on disk. This is distinct from volume drivers.
| Driver | Backing Filesystem | Status | Notes |
|---|---|---|---|
| overlay2 | xfs, ext4 | Default & recommended | Best all-round performance; uses kernel OverlayFS |
| btrfs | btrfs | Supported | Native snapshots; good for many builds |
| zfs | zfs | Supported | Snapshots, compression, dedup; resource-heavy |
| devicemapper | direct-lvm | Deprecated | Was default on CentOS/RHEL; avoid for new installs |
| vfs | Any | Testing only | No CoW; full copy per layer. Very slow, very large |
# Check current driver
docker info | grep -i storage
# Set in /etc/docker/daemon.json
{
"storage-driver": "overlay2"
}
Use overlay2 unless you have a specific reason not to. It is the default on all major Linux distributions and offers the best balance of performance and stability.
| Mount Type | Read | Write | Best For |
|---|---|---|---|
| tmpfs | Fastest | Fastest | Temp/scratch data |
| Bind mount | Native | Native | Dev, host data |
| Named volume | Near-native | Near-native | Databases, state |
| Container layer | Slower | Slowest | Avoid for data |
Performance tips:
- On Docker Desktop (macOS), use :cached or :delegated consistency modes for bind mounts
- Keep node_modules in a named volume rather than a bind mount
- Use --mount type=tmpfs for temp/scratch data
- Use .dockerignore to reduce build context
- Ensure the backing filesystem reports d_type=true for overlay2
# Test write performance
docker run --rm -v mydata:/data alpine \
dd if=/dev/zero of=/data/test \
bs=1M count=1024 oflag=direct
# Compare with container layer
docker run --rm alpine \
dd if=/dev/zero of=/tmp/test \
bs=1M count=1024 oflag=direct
Running databases in Docker is common, but losing data is the #1 beginner mistake. Always use named volumes.
services:
postgres:
image: postgres:16-alpine
environment:
POSTGRES_DB: myapp
POSTGRES_USER: admin
POSTGRES_PASSWORD_FILE: /run/secrets/db_pass
volumes:
- pgdata:/var/lib/postgresql/data
- ./init:/docker-entrypoint-initdb.d:ro
secrets:
- db_pass
volumes:
pgdata:
docker run -d --name mysql \
-e MYSQL_ROOT_PASSWORD=secret \
-v mysql_data:/var/lib/mysql \
mysql:8
docker run -d --name mongo \
-v mongo_data:/data/db \
-v mongo_config:/data/configdb \
mongo:7
docker run -d --name redis \
-v redis_data:/data \
redis:7-alpine \
redis-server --appendonly yes
Never run docker compose down -v carelessly.
# Label volumes for easier management
docker volume create \
--label env=production \
--label service=api \
api_uploads
Never store important data without a volume. One docker rm and it's gone forever.
Security considerations:
- Bind-mounting / or /etc gives the container full host access
- Mounting the Docker socket (/var/run/docker.sock) is equivalent to root on the host
- Use read_only: true on the container root filesystem where possible
# Immutable container with targeted writable mounts
docker run -d \
--read-only \
--tmpfs /tmp:rw,size=64m \
--tmpfs /run:rw,size=8m \
-v app_data:/app/data \
myapp:latest
# Create a secret (requires Swarm mode; printf avoids a trailing newline)
printf 's3cr3t' | docker secret create db_pass -
# Use in a service (mounted at /run/secrets/)
docker service create \
--secret db_pass \
--name db postgres:16
Secrets are stored encrypted at rest and mounted as tmpfs in the container.
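Official images consume file-based secrets through the _FILE convention (e.g. POSTGRES_PASSWORD_FILE above). A hedged plain-shell sketch of the idea — this is our illustration, not the official entrypoint code, and the variable names are hypothetical:

```shell
# Resolve VAR from VAR_FILE if set, mirroring how official images load
# secrets mounted under /run/secrets/.
file_env() {
  var="$1"
  file_var="${var}_FILE"
  eval "file_path=\${$file_var:-}"
  if [ -n "$file_path" ]; then
    eval "$var=\$(cat \"$file_path\")"
  fi
  eval "echo \"\$$var\""
}

# Simulate a mounted secret file
echo "s3cr3t" > /tmp/db_pass
DB_PASSWORD_FILE=/tmp/db_pass
file_env DB_PASSWORD   # prints s3cr3t
```

The benefit over plain environment variables: the secret never appears in docker inspect output or the process environment of child processes that don't need it.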
Prefer --mount over -v for clarity.
| Scenario | Use |
|---|---|
| Database data | Named volume |
| Source code (dev) | Bind mount |
| Secrets | tmpfs / Docker secrets |
| Shared config | Read-only bind mount |
| Cross-host sharing | NFS volume driver |
| Large file uploads | Object storage (S3) |