Storage & Backup Configuration
Longhorn storage, NFS and S3 connections, backup policies, and restore procedures
Overview
This guide covers storage and backup configuration for PodWarden-managed K3s clusters. It complements the Storage reference (volume types, mount syntax) and the Backups reference (policy creation, retention rules) with practical setup instructions and troubleshooting from real deployments.
Longhorn Distributed Storage
PodWarden automatically installs Longhorn during K3s cluster provisioning. Longhorn provides replicated block storage across your nodes, making PVCs resilient to single-node failures.
Verifying Longhorn Is Running
After provisioning your cluster, verify Longhorn is healthy:
# Check Longhorn pods (all should be Running)
kubectl get pods -n longhorn-system
# Check the Longhorn StorageClass exists
kubectl get storageclass

Expected output: a `longhorn` StorageClass with `driver.longhorn.io` as the provisioner.
In the PodWarden UI, go to the cluster detail page — the StorageClasses section shows all available storage classes including Longhorn.
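Workloads consume Longhorn by naming the StorageClass in a PVC. A minimal sketch; the claim name and size are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data            # illustrative name
spec:
  accessModes:
    - ReadWriteOnce         # Longhorn block volumes are single-node writers
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
```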
Longhorn Prerequisites
Longhorn needs these packages on each node, which PodWarden installs during provisioning:
- `open-iscsi` — iSCSI initiator for volume attachment
- `nfs-common` — NFS client (for Longhorn backup targets)
If you added nodes manually (not via PodWarden provisioning), install these packages before joining the cluster:
sudo apt-get update && sudo apt-get install -y open-iscsi nfs-common
sudo systemctl enable --now iscsid

Longhorn Health Checks in PodWarden
PodWarden monitors Longhorn status as part of cluster health:
- Cluster detail page shows Longhorn node status and disk availability
- Node schedulability — Longhorn marks a node as unschedulable when disk space drops below the threshold (default: 25% remaining)
- Volume health — degraded volumes (missing replicas) are flagged in the cluster detail
Common Longhorn Issues
Node Not Schedulable
Longhorn marks nodes as unschedulable when available disk space is too low. This prevents new volumes from being created on that node.
Symptoms: PVC stays in Pending state, workload cannot start.
Fix:
# Check disk pressure on the node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'
# Check Longhorn node status
kubectl -n longhorn-system get nodes.longhorn.io -o wide

Free disk space by cleaning up unused images, old logs, or expanding the disk. Longhorn's threshold is configurable in the Longhorn UI (exposed by the `longhorn-frontend` service in the `longhorn-system` namespace).
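The same threshold can also be adjusted from the CLI. This is a sketch assuming Longhorn's default setting name `storage-minimal-available-percentage`; verify the name in your Longhorn version before patching:

```shell
# Inspect the current low-disk threshold (Longhorn stores settings as custom resources)
kubectl -n longhorn-system get settings.longhorn.io storage-minimal-available-percentage
# Lower the threshold to 15% remaining (the value is a string)
kubectl -n longhorn-system patch settings.longhorn.io storage-minimal-available-percentage \
  --type merge -p '{"value": "15"}'
```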
Volume Degraded
A volume shows as degraded when it has fewer healthy replicas than configured (default: 3 replicas).
Common causes:
- A node went offline — Longhorn will rebuild replicas when the node returns or when another node has space
- Disk full on a node — same fix as above
- Network partition between nodes — check flannel and node connectivity
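To see which volumes are affected, Longhorn's volume CRs report robustness directly; a quick check:

```shell
# The ROBUSTNESS column shows healthy/degraded/faulted per volume
kubectl -n longhorn-system get volumes.longhorn.io
```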
Slow Volume I/O
Longhorn replicates data across nodes over the network. On slow LANs or high-latency mesh connections, this can impact write performance.
Mitigations:
- Reduce replica count to 2 for non-critical workloads
- Pin high-I/O workloads to nodes with fast local storage
- Use the `local-path` StorageClass (shipped with K3s) for workloads that do not need replication
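The first mitigation can be implemented as a second StorageClass rather than changing the default. A sketch; the class name is illustrative and `numberOfReplicas` is a standard Longhorn parameter:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2replica    # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"      # less write amplification than the default of 3
```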
Storage Connections
Storage connections are global named backends registered in PodWarden. Workloads reference them by name instead of hardcoding addresses. Manage them under Storage in the sidebar.
Adding NFS Storage
- Go to Storage and click Add Connection
- Select NFS as the type
- Fill in:
  - Name — descriptive name (e.g. `nas-media`, `backup-share`)
  - Server — NFS server hostname or IP (e.g. `10.10.0.50` or `nas.local`)
  - Base Path — NFS export path (e.g. `/exports/data`)
  - Network Types — how the storage is reachable (LAN, mesh, public)
- Click Test Connection
The test verifies:
- TCP port 2049 reachability
- RPC portmapper response
- NFS export listing
- If managed hosts are available: SSH-based mount and 100 MB read/write speed test
Note: The NFS server must allow connections from all cluster nodes. If using IP-based access control in `/etc/exports`, include the IP range of your K3s nodes.
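For example, an `/etc/exports` entry that admits a whole node subnet (the subnet and mount options are illustrative; run `exportfs -ra` after editing):

```text
/exports/data 10.10.0.0/24(rw,sync,no_subtree_check)
```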
Adding S3 Storage
- Go to Storage and click Add Connection
- Select S3 as the type
- Fill in:
  - Name — descriptive name (e.g. `offsite-backup`, `media-bucket`)
  - Endpoint — S3 endpoint URL (e.g. `https://s3.amazonaws.com` for AWS, `https://minio.local:9000` for MinIO)
  - Bucket — bucket name
  - Region — AWS region, or empty for non-AWS
  - Access Key — stored encrypted
  - Secret Key — stored encrypted
  - Network Types — typically `public` for AWS S3, `lan` or `mesh` for local MinIO
- Click Test Connection
The test verifies endpoint reachability, authentication, and runs a 100 MB upload/download speed test.
MinIO example:
| Field | Value |
|---|---|
| Endpoint | https://minio.local:9000 |
| Bucket | podwarden-backups |
| Region | (leave empty) |
| Access Key | minioadmin |
| Secret Key | minioadmin |
AWS S3 example:
| Field | Value |
|---|---|
| Endpoint | https://s3.amazonaws.com |
| Bucket | my-podwarden-backups |
| Region | us-east-1 |
| Access Key | AKIA... |
| Secret Key | wJal... |
Testing Connections
Always test connections after creating them. The test runs from the PodWarden API server, not from the K3s cluster. A successful test confirms:
- Network connectivity to the storage
- Authentication works
- Read/write operations succeed
If the test passes but workloads fail to mount, check that the K3s nodes can also reach the storage. The PodWarden API server and cluster nodes may be on different networks.
Backup Policies
PodWarden uses Restic for incremental, deduplicated, encrypted backups. The Backup Operator (a system app) runs inside the cluster and manages the backup lifecycle.
Prerequisites
Install the Backup Operator before creating backup policies:
kubectl apply -f https://www.podwarden.com/operators/backup-operator/install.yaml

Or deploy it from the Hub catalog under System Apps. Without the operator, policies remain in Pending state and no backups run. See System Apps for details.
Verify it is running:
kubectl get pods -n podwarden-system
# Should show backup-operator pod in Running state
kubectl get crd | grep backup
# Should show backuppolicies.podwarden.com and backupruns.podwarden.com

Creating a Backup Policy
- Go to Backups in the sidebar
- Click New backup policy
- Select the deployed workload to back up
- Choose backup mode:
- Hot — backs up while workload runs (optional pre-backup hook for database dumps)
- Cold — scales workload to 0 during backup (~30s downtime)
- Set the schedule:
  - Daily at 2:00 AM: `0 2 * * *`
  - Weekly on Sunday: `0 2 * * 0`
  - Every hour: `0 * * * *`
- Select the storage target (an NFS or S3 connection)
- Configure retention:
- Keep last: 7 (number of most recent snapshots to keep)
- Keep daily: 7 (one per day for N days)
- Keep weekly: 4 (one per week for N weeks)
- Click Create policy
The operator picks up the policy and starts scheduling runs automatically.
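The retention fields map onto Restic's snapshot-pruning flags. Conceptually, each run ends with something like the following (the repository URL is illustrative):

```shell
restic -r s3:https://minio.local:9000/podwarden-backups/myapp \
  forget --keep-last 7 --keep-daily 7 --keep-weekly 4 --prune
```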
Hot vs Cold Backups
| | Hot | Cold |
|---|---|---|
| Downtime | None | ~30 seconds |
| Consistency | Application-level (use pre-backup hook) | Filesystem-level (volume is idle) |
| Best for | Web apps, services with dump commands | Databases without dump hooks, apps that need quiescent state |
Hot mode with pre-backup hook is the recommended approach for databases:
# PostgreSQL hook
pg_dumpall -U postgres > /var/lib/postgresql/data/dump.sql
# MySQL hook
mysqldump -u root -p$MYSQL_ROOT_PASSWORD --all-databases > /var/lib/mysql/dump.sql
# Redis hook
redis-cli BGSAVE && sleep 2

The hook runs via kubectl exec inside the running pod. File backup starts only after the hook completes successfully. If the hook exits non-zero, the backup is aborted.
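The gating behavior is plain exit-code logic. A minimal shell sketch; `pre_backup_hook` is a stand-in for whichever hook command you configure:

```shell
# Stand-in for the configured hook; any command that exits 0 on success works.
pre_backup_hook() {
  echo "dumping database"
}

# The operator proceeds to the file backup only when the hook exits 0.
if pre_backup_hook; then
  echo "hook succeeded: starting file backup"
else
  echo "hook failed: backup aborted" >&2
  exit 1
fi
```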
Manual Backup Triggers
To run a backup outside the normal schedule:
- Go to Backups
- Click into the policy
- Click Run now
The operator creates an immediate BackupRun. This is useful before risky operations like upgrades or migrations.
Restoring from a Backup
- Go to Backups and click into the policy
- Browse snapshots — each shows timestamp, size, and status
- Click Restore on the snapshot you want
- Confirm the restore
PodWarden creates a RestoreRun CRD. The operator:
- Scales the workload to 0 replicas
- Runs `restic restore` to overwrite PVC data with the snapshot
- Scales the workload back to 1
Restore takes 30-120 seconds plus data transfer time. All data written after the snapshot is overwritten.
Warning: Restore overwrites the current PVC data with the snapshot contents. There is no undo. If you need to preserve current data, take a manual backup first.
PostgreSQL Backups
For databases external to K8s or for dedicated database backup pipelines:
- Create a new backup policy with Backup type set to `postgres`
- Enter database connection details (host, port, database, user, password)
- Set schedule and retention
- Select a storage target
The operator runs `pg_dump --format=custom` in an init container and uploads the dump via Restic. The custom format supports selective restore with `pg_restore`.
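Selective restore from a custom-format dump uses standard `pg_restore` options; the file, table, and database names below are illustrative:

```shell
# Inspect the dump's table of contents without touching any database
pg_restore --list myapp.pgdump
# Restore a single table into an existing database
pg_restore --format=custom --table=users --dbname=myapp myapp.pgdump
```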
Backup Operator (System App)
The Backup Operator is a Kubernetes operator that PodWarden deploys to your cluster. It watches for BackupPolicy and RestoreRun CRDs and handles the actual data movement.
How It Works
- PodWarden creates a BackupPolicy CRD in the cluster when you create a policy in the UI
- The operator schedules BackupRun resources according to the cron expression
- Each BackupRun creates a Kubernetes Job that:
- Runs the pre-backup hook (if configured)
- Scales down the workload (cold mode only)
- Runs Restic to backup PVC data to the storage target
- Scales up the workload (cold mode only)
- Applies retention rules (`restic forget`)
- PodWarden syncs BackupRun status back to the UI
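The Restic step boils down to a single invocation. A conceptual sketch using Restic's standard environment variables; the repository path, mount point, and credentials are assumptions, not PodWarden's actual values:

```shell
# Standard Restic environment variables (values illustrative)
export RESTIC_REPOSITORY="s3:https://minio.local:9000/podwarden-backups/myapp"
export RESTIC_PASSWORD="value-of-the-repo-password-secret"
export AWS_ACCESS_KEY_ID="minioadmin"
export AWS_SECRET_ACCESS_KEY="minioadmin"

restic backup /data   # assumption: the PVC is mounted at /data in the Job pod
restic forget --keep-last 7 --keep-daily 7 --keep-weekly 4 --prune
```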
Kubernetes Resources Created
Each backup policy creates:
| Resource | Name | Purpose |
|---|---|---|
| BackupPolicy | pw-backup-{workload} | CRD watched by the operator |
| Secret | pw-backup-{suffix} | Restic repository password |
| Secret | S3 credentials (if applicable) | From storage connection |
| ServiceAccount | podwarden-backup | RBAC for deployment reads and pod exec |
Monitoring the Operator
# Operator logs
kubectl -n podwarden-system logs deployment/backup-operator --tail=200
# List all backup policies
kubectl get backuppolicies
# List recent backup runs
kubectl get backupruns --sort-by=.metadata.creationTimestamp
# Describe a specific run
kubectl describe backuprun <name>

Troubleshooting
Policy Stuck in Pending
The backup operator is not installed or not running.
# Check if CRDs exist
kubectl get crd | grep backup
# Check operator pod
kubectl -n podwarden-system get pods

If CRDs are missing, install the operator. If the pod is crashing, check its logs.
Backup Run Failed
# Check the BackupRun status
kubectl describe backuprun <name>
# Check the backup Job logs
kubectl logs job/<backup-job-name>

Common causes:
- Storage unreachable — the cluster nodes cannot reach the NFS or S3 endpoint
- Pre-backup hook failed — the command returned non-zero; check the hook command
- PVC not found — the workload may have been undeployed between policy creation and backup run
- Restic password mismatch — the repo was initialized with a different password. Delete the policy, create a new one (fresh repo path), and run a backup
Longhorn PVC Stuck in Pending
# Check PVC events
kubectl describe pvc <pvc-name>
# Check Longhorn node status
kubectl -n longhorn-system get nodes.longhorn.io

Common causes:
- No Longhorn nodes are schedulable (disk full)
- StorageClass `longhorn` does not exist (Longhorn not installed)
- Access mode mismatch (`ReadWriteMany` requires NFS, not Longhorn)
NFS Mount Fails in Pods
# Test NFS mount from a cluster node
ssh user@node-ip "sudo mount -t nfs storage-ip:/export/path /mnt/test && ls /mnt/test"

Common causes:
- `nfs-common` not installed on the node
- NFS server firewall blocks the node's IP
- Export path does not exist or is not exported
S3 Connection Timeout
If the test passes from PodWarden but backups fail:
- PodWarden's test runs from the API server; backups run from the cluster
- Verify cluster nodes can reach the S3 endpoint: `kubectl run test --rm -it --image=curlimages/curl -- curl -I https://s3.amazonaws.com`
- For MinIO on LAN: ensure the endpoint URL uses an IP or hostname reachable from pods
Best Practices
- Test storage connections before creating backup policies. A failing connection means failed backups, and you may not notice until you need a restore.
- Use hot mode with pre-backup hooks for databases. Cold mode works but causes downtime. A properly configured dump hook gives you application-consistent backups with zero downtime.
- Keep at least one off-site backup. NFS is fast and cheap for local backups. Add an S3 target (even a cheap one like Backblaze B2) for disaster recovery.
- Monitor backup run status. Check the Backups page periodically. A policy with consecutive failed runs needs attention.
- Test restores before you need them. Create a test workload, back it up, delete the data, and restore. Verify the data is intact. Do this at least once per storage target.
- Use Longhorn for stateful workloads, local-path for ephemeral ones. Longhorn provides replication but has network overhead. `local-path` is faster but data lives on a single node.
- Set retention policies thoughtfully. 7 daily + 4 weekly snapshots is a sensible default. Increase for critical data. Restic deduplication keeps storage costs manageable even with many snapshots.
Next Steps
- Storage — volume types, mount syntax, StorageClass discovery
- Backups — detailed backup reference (CRD specs, compose stack backups, pre-backup hooks)
- System Apps — backup operator installation and detection
- Infrastructure Setup — initial cluster provisioning