Storage & Backup Configuration
Longhorn storage, NFS and S3 connections, backup policies, and restore procedures
Overview
This guide covers storage and backup configuration for PodWarden-managed K3s clusters. It complements the Storage reference (volume types, mount syntax) and the Backups reference (policy creation, retention rules) with practical setup instructions and troubleshooting from real deployments.
Longhorn Distributed Storage
PodWarden automatically installs Longhorn during K3s cluster provisioning. Longhorn provides replicated block storage across your nodes, making PVCs resilient to single-node failures.
Verifying Longhorn Is Running
After provisioning your cluster, verify Longhorn is healthy:
# Check Longhorn pods (all should be Running)
kubectl get pods -n longhorn-system
# Check the Longhorn StorageClass exists
kubectl get storageclass

Expected output: a `longhorn` StorageClass with `driver.longhorn.io` as the provisioner.
In the PodWarden UI, go to the cluster detail page — the StorageClasses section shows all available storage classes including Longhorn.
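Workloads consume Longhorn by naming the StorageClass in a PVC. A minimal sketch; the claim name and size are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data            # illustrative name
spec:
  accessModes:
    - ReadWriteOnce         # Longhorn block volumes are single-node writers
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
```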
Longhorn Prerequisites
Longhorn needs these packages on each node, which PodWarden installs during provisioning:
- `open-iscsi` — iSCSI initiator for volume attachment
- `nfs-common` — NFS client (for Longhorn backup targets)
If you added nodes manually (not via PodWarden provisioning), install these packages before joining the cluster:
sudo apt-get update && sudo apt-get install -y open-iscsi nfs-common
sudo systemctl enable --now iscsid

Longhorn Health Checks in PodWarden
PodWarden monitors Longhorn status as part of cluster health:
- Cluster detail page shows Longhorn node status and disk availability
- Node schedulability — Longhorn marks a node as unschedulable when disk space drops below the threshold (default: 25% remaining)
- Volume health — degraded volumes (missing replicas) are flagged in the cluster detail
Common Longhorn Issues
Node Not Schedulable
Longhorn marks nodes as unschedulable when available disk space is too low. This prevents new volumes from being created on that node.
Symptoms: PVC stays in Pending state, workload cannot start.
Fix:
# Check disk pressure on the node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'
# Check Longhorn node status
kubectl -n longhorn-system get nodes.longhorn.io -o wide

Free disk space by cleaning up unused images, old logs, or expanding the disk. Longhorn's threshold is configurable in the Longhorn UI (exposed by the `longhorn-frontend` service in the `longhorn-system` namespace).
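The same threshold can also be adjusted from the CLI. This is a sketch assuming Longhorn's default setting name `storage-minimal-available-percentage`; verify the name in your Longhorn version before patching:

```shell
# Inspect the current low-disk threshold (Longhorn stores settings as custom resources)
kubectl -n longhorn-system get settings.longhorn.io storage-minimal-available-percentage
# Lower the threshold to 15% remaining (the value is a string)
kubectl -n longhorn-system patch settings.longhorn.io storage-minimal-available-percentage \
  --type merge -p '{"value": "15"}'
```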
Volume Degraded
A volume shows as degraded when it has fewer healthy replicas than configured (default: 3 replicas).
Common causes:
- A node went offline — Longhorn will rebuild replicas when the node returns or when another node has space
- Disk full on a node — same fix as above
- Network partition between nodes — check flannel and node connectivity
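To see which volumes are affected, Longhorn's volume CRs report robustness directly; a quick check:

```shell
# The ROBUSTNESS column shows healthy/degraded/faulted per volume
kubectl -n longhorn-system get volumes.longhorn.io
```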
Slow Volume I/O
Longhorn replicates data across nodes over the network. On slow LANs or high-latency mesh connections, this can impact write performance.
Mitigations:
- Reduce replica count to 2 for non-critical workloads
- Pin high-I/O workloads to nodes with fast local storage
- Use the `local-path` StorageClass (shipped with K3s) for workloads that do not need replication
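The first mitigation can be implemented as a second StorageClass rather than changing the default. A sketch; the class name is illustrative and `numberOfReplicas` is a standard Longhorn parameter:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2replica    # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"      # less write amplification than the default of 3
```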
Storage Connections
Storage connections are global named backends registered in PodWarden. Workloads reference them by name instead of hardcoding addresses. Manage them under Storage in the sidebar.
Adding NFS Storage
- Go to Storage and click Add Connection
- Select NFS as the type
- Fill in:
  - Name — descriptive name (e.g. `nas-media`, `backup-share`)
  - Server — NFS server hostname or IP (e.g. `10.10.0.50` or `nas.local`)
  - Base Path — NFS export path (e.g. `/exports/data`)
  - Network Types — how the storage is reachable (LAN, mesh, public)
- Click Test Connection
The test verifies:
- TCP port 2049 reachability
- RPC portmapper response
- NFS export listing
- If managed hosts are available: SSH-based mount and 100 MB read/write speed test
Note: The NFS server must allow connections from all cluster nodes. If using IP-based access control in `/etc/exports`, include the IP range of your K3s nodes.
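For example, an `/etc/exports` entry that admits a whole node subnet (the subnet and mount options are illustrative; run `exportfs -ra` after editing):

```text
/exports/data 10.10.0.0/24(rw,sync,no_subtree_check)
```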
Adding S3 Storage
- Go to Storage and click Add Connection
- Select S3 as the type
- Fill in:
  - Name — descriptive name (e.g. `offsite-backup`, `media-bucket`)
  - Endpoint — S3 endpoint URL (e.g. `https://s3.amazonaws.com` for AWS, `https://minio.local:9000` for MinIO)
  - Bucket — bucket name
  - Region — AWS region, or empty for non-AWS
  - Access Key — stored encrypted
  - Secret Key — stored encrypted
  - Network Types — typically `public` for AWS S3, `lan` or `mesh` for local MinIO
- Click Test Connection
The test verifies endpoint reachability, authentication, and runs a 100 MB upload/download speed test.
MinIO example:
| Field | Value |
|---|---|
| Endpoint | https://minio.local:9000 |
| Bucket | podwarden-backups |
| Region | (leave empty) |
| Access Key | minioadmin |
| Secret Key | minioadmin |
AWS S3 example:
| Field | Value |
|---|---|
| Endpoint | https://s3.amazonaws.com |
| Bucket | my-podwarden-backups |
| Region | us-east-1 |
| Access Key | AKIA... |
| Secret Key | wJal... |
Testing Connections
Always test connections after creating them. The test runs from the PodWarden API server, not from the K3s cluster. A successful test confirms:
- Network connectivity to the storage
- Authentication works
- Read/write operations succeed
If the test passes but workloads fail to mount, check that the K3s nodes can also reach the storage. The PodWarden API server and cluster nodes may be on different networks.
Backup Policies
PodWarden uses Restic for incremental, deduplicated, encrypted backups. The Backup Operator (a system app) runs inside the cluster and manages the backup lifecycle.
Prerequisites
Install the Backup Operator before creating backup policies:
kubectl apply -f https://www.podwarden.com/operators/backup-operator/install.yaml

Or deploy it from the Hub catalog under System Apps. Without the operator, policies remain in Pending state and no backups run. See System Apps for details.
Verify it is running:
kubectl get pods -n podwarden-system
# Should show backup-operator pod in Running state
kubectl get crd | grep backup
# Should show backuppolicies.podwarden.com and backupruns.podwarden.com

Creating a Backup Policy
- Go to Backups in the sidebar
- Click New backup policy
- Select the deployed workload to back up
- Choose backup mode:
- Hot — backs up while workload runs (optional pre-backup hook for database dumps)
- Cold — scales workload to 0 during backup (~30s downtime)
- Set the schedule:
  - Daily at 2:00 AM: `0 2 * * *`
  - Weekly on Sunday: `0 2 * * 0`
  - Every hour: `0 * * * *`
- Select the storage target (an NFS or S3 connection)
- Configure retention:
- Keep last: 7 (number of most recent snapshots to keep)
- Keep daily: 7 (one per day for N days)
- Keep weekly: 4 (one per week for N weeks)
- Click Create policy
The operator picks up the policy and starts scheduling runs automatically.
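The retention fields map onto Restic's snapshot-pruning flags. Conceptually, each run ends with something like the following (the repository URL is illustrative):

```shell
restic -r s3:https://minio.local:9000/podwarden-backups/myapp \
  forget --keep-last 7 --keep-daily 7 --keep-weekly 4 --prune
```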
Hot vs Cold Backups
| | Hot | Cold |
|---|---|---|
| Downtime | None | ~30 seconds |
| Consistency | Application-level (use pre-backup hook) | Filesystem-level (volume is idle) |
| Best for | Web apps, services with dump commands | Databases without dump hooks, apps that need quiescent state |
Hot mode with pre-backup hook is the recommended approach for databases:
# PostgreSQL hook
pg_dumpall -U postgres > /var/lib/postgresql/data/dump.sql
# MySQL hook
mysqldump -u root -p$MYSQL_ROOT_PASSWORD --all-databases > /var/lib/mysql/dump.sql
# Redis hook
redis-cli BGSAVE && sleep 2

The hook runs via kubectl exec inside the running pod. File backup starts only after the hook completes successfully. If the hook exits non-zero, the backup is aborted.
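The gating behavior is plain exit-code logic. A minimal shell sketch; `pre_backup_hook` is a stand-in for whichever hook command you configure:

```shell
# Stand-in for the configured hook; any command that exits 0 on success works.
pre_backup_hook() {
  echo "dumping database"
}

# The operator proceeds to the file backup only when the hook exits 0.
if pre_backup_hook; then
  echo "hook succeeded: starting file backup"
else
  echo "hook failed: backup aborted" >&2
  exit 1
fi
```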
Manual Backup Triggers
To run a backup outside the normal schedule:
- Go to Backups
- Click into the policy
- Click Run now
The operator creates an immediate BackupRun. This is useful before risky operations like upgrades or migrations.
Restoring from a Backup
- Go to Backups and click into the policy
- Browse snapshots — each shows timestamp, size, and status
- Click Restore on the snapshot you want
- Confirm the restore
PodWarden creates a RestoreRun CRD. The operator:
- Scales the workload to 0 replicas
- Runs `restic restore` to overwrite PVC data with the snapshot
- Scales the workload back to 1
Restore takes 30-120 seconds plus data transfer time. All data written after the snapshot is overwritten.
Warning: Restore overwrites the current PVC data with the snapshot contents. There is no undo. If you need to preserve current data, take a manual backup first.
PostgreSQL Backups
For databases external to K8s or for dedicated database backup pipelines:
- Create a new backup policy with Backup type set to `postgres`
- Enter database connection details (host, port, database, user, password)
- Set schedule and retention
- Select a storage target
The operator runs `pg_dump --format=custom` in an init container and uploads the dump via Restic. The custom format supports selective restore with `pg_restore`.
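Selective restore from a custom-format dump uses standard `pg_restore` options; the file, table, and database names below are illustrative:

```shell
# Inspect the dump's table of contents without touching any database
pg_restore --list myapp.pgdump
# Restore a single table into an existing database
pg_restore --format=custom --table=users --dbname=myapp myapp.pgdump
```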
Backup Operator (System App)
The Backup Operator is a Kubernetes operator that PodWarden deploys to your cluster. It watches for BackupPolicy and RestoreRun CRDs and handles the actual data movement.
How It Works
- PodWarden creates a BackupPolicy CRD in the cluster when you create a policy in the UI
- The operator schedules BackupRun resources according to the cron expression
- Each BackupRun creates a Kubernetes Job that:
- Runs the pre-backup hook (if configured)
- Scales down the workload (cold mode only)
- Runs Restic to backup PVC data to the storage target
- Scales up the workload (cold mode only)
- Applies retention rules (`restic forget`)
- PodWarden syncs BackupRun status back to the UI
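The Restic step boils down to a single invocation. A conceptual sketch using Restic's standard environment variables; the repository path, mount point, and credentials are assumptions, not PodWarden's actual values:

```shell
# Standard Restic environment variables (values illustrative)
export RESTIC_REPOSITORY="s3:https://minio.local:9000/podwarden-backups/myapp"
export RESTIC_PASSWORD="value-of-the-repo-password-secret"
export AWS_ACCESS_KEY_ID="minioadmin"
export AWS_SECRET_ACCESS_KEY="minioadmin"

restic backup /data   # assumption: the PVC is mounted at /data in the Job pod
restic forget --keep-last 7 --keep-daily 7 --keep-weekly 4 --prune
```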
Kubernetes Resources Created
Each backup policy creates:
| Resource | Name | Purpose |
|---|---|---|
| BackupPolicy | pw-backup-{workload} | CRD watched by the operator |
| Secret | pw-backup-{suffix} | Restic repository password |
| Secret | S3 credentials (if applicable) | From storage connection |
| ServiceAccount | podwarden-backup | RBAC for deployment reads and pod exec |
Monitoring the Operator
# Operator logs
kubectl -n podwarden-system logs deployment/backup-operator --tail=200
# List all backup policies
kubectl get backuppolicies
# List recent backup runs
kubectl get backupruns --sort-by=.metadata.creationTimestamp
# Describe a specific run
kubectl describe backuprun <name>

Troubleshooting
Policy Stuck in Pending
The backup operator is not installed or not running.
# Check if CRDs exist
kubectl get crd | grep backup
# Check operator pod
kubectl -n podwarden-system get pods

If CRDs are missing, install the operator. If the pod is crashing, check its logs.
Backup Run Failed
# Check the BackupRun status
kubectl describe backuprun <name>
# Check the backup Job logs
kubectl logs job/<backup-job-name>

Common causes:
- Storage unreachable — the cluster nodes cannot reach the NFS or S3 endpoint
- Pre-backup hook failed — the command returned non-zero; check the hook command
- PVC not found — the workload may have been undeployed between policy creation and backup run
- Restic password mismatch — the repo was initialized with a different password. Delete the policy, create a new one (fresh repo path), and run a backup
Longhorn PVC Stuck in Pending
# Check PVC events
kubectl describe pvc <pvc-name>
# Check Longhorn node status
kubectl -n longhorn-system get nodes.longhorn.io

Common causes:
- No Longhorn nodes are schedulable (disk full)
- StorageClass `longhorn` does not exist (Longhorn not installed)
- Access mode mismatch (`ReadWriteMany` requires NFS, not Longhorn)
NFS Mount Fails in Pods
# Test NFS mount from a cluster node
ssh user@node-ip "sudo mount -t nfs storage-ip:/export/path /mnt/test && ls /mnt/test"

Common causes:
- `nfs-common` not installed on the node
- NFS server firewall blocks the node's IP
- Export path does not exist or is not exported
S3 Connection Timeout
If the test passes from PodWarden but backups fail:
- PodWarden's test runs from the API server; backups run from the cluster
- Verify cluster nodes can reach the S3 endpoint: `kubectl run test --rm -it --image=curlimages/curl -- curl -I https://s3.amazonaws.com`
- For MinIO on LAN: ensure the endpoint URL uses an IP or hostname reachable from pods
Best Practices
- Test storage connections before creating backup policies. A failing connection means failed backups, and you may not notice until you need a restore.
- Use hot mode with pre-backup hooks for databases. Cold mode works but causes downtime. A properly configured dump hook gives you application-consistent backups with zero downtime.
- Keep at least one off-site backup. NFS is fast and cheap for local backups. Add an S3 target (even a cheap one like Backblaze B2) for disaster recovery.
- Monitor backup run status. Check the Backups page periodically. A policy with consecutive failed runs needs attention.
- Test restores before you need them. Create a test workload, back it up, delete the data, and restore. Verify the data is intact. Do this at least once per storage target.
- Use Longhorn for stateful workloads, local-path for ephemeral ones. Longhorn provides replication but has network overhead. `local-path` is faster but data lives on a single node.
- Set retention policies thoughtfully. 7 daily + 4 weekly snapshots is a sensible default. Increase for critical data. Restic deduplication keeps storage costs manageable even with many snapshots.
Next Steps
- Storage — volume types, mount syntax, StorageClass discovery
- Backups — detailed backup reference (CRD specs, compose stack backups, pre-backup hooks)
- System Apps — backup operator installation and detection
- Infrastructure Setup — initial cluster provisioning