If you’re running a two-node Proxmox VE 9 cluster using ZFS as your storage backend, one of the most powerful built-in features you can use for disaster recovery is ZFS replication.

ZFS replication automatically synchronizes VM and container data between nodes using ZFS snapshots and incremental data transfers, ensuring your workloads are protected against node failure — even without expensive shared storage.

In this post, we’ll explain how ZFS replication works, how to configure it, and how it behaves during a node failure.

Note: A two-node Proxmox VE cluster is not recommended for production, because losing one node also loses quorum. Adding a QDevice (covered later in this post) mitigates this.


What Is ZFS Replication?

ZFS replication is a snapshot-based data synchronization mechanism built into Proxmox VE.
It allows you to replicate local ZFS volumes (VM or container disks) from one node to another — automatically, securely, and incrementally.

It relies on the native ZFS commands:

zfs send | zfs receive

After the first full copy, only the changed blocks between snapshots are transferred. This makes replication fast and bandwidth-efficient.
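You can see this efficiency for yourself before a transfer runs: ZFS can estimate the size of a send as a dry run. A small sketch, using the dataset and snapshot names that appear later in this post:

# Estimate the size of a full initial send (dry run, no data is transferred)
zfs send -nv rpool/data/vm-101-disk-0@replication-2025-10-19_10-00-00

# Estimate the size of an incremental send between two consecutive snapshots
zfs send -nv -i @replication-2025-10-19_09-00-00 \
    rpool/data/vm-101-disk-0@replication-2025-10-19_10-00-00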


Example Setup

Node | Role      | Pool  | IP
pve1 | Primary   | rpool | 10.10.10.1
pve2 | Secondary | rpool | 10.10.10.2

Replication direction: pve1 → pve2
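Before setting up replication it is worth confirming that both nodes are clustered and that the target pool exists under the same name on each side, since replication requires a matching pool on the target. A quick pre-flight check using the names from this example:

# Confirm cluster membership and quorum
pvecm status

# Confirm the pool name matches on both nodes
zpool list rpool                        # on pve1
ssh root@10.10.10.2 zpool list rpool    # check pve2 from pve1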


How to Configure ZFS Replication in Proxmox VE 9

Option 1: Using the Proxmox GUI

  1. Go to Datacenter → Replication → Add

  2. Choose:

    • Source Node: pve1

    • Target Node: pve2

    • CT/VM ID: the VM or container to replicate (one replication job per guest)

    • Schedule: e.g., every 15 minutes

  3. Click Create to start replication.

Option 2: Using the Command Line

pvesr create-local-job 101-0 pve2 --schedule "*/15" --rate 10

The job ID takes the form <VMID>-<job number> (here, job 0 for VM 101), and the target node is given as a positional argument.
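Either way, the job is stored in the cluster-wide file /etc/pve/replication.cfg; for the job above it would look roughly like this (exact fields depend on the options you set):

local: 101-0
        target pve2
        rate 10
        schedule */15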

List replication jobs:

pvesr list
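To check when each job last ran, how long it took, and whether it failed, pvesr status gives a per-job overview (the --guest filter below assumes VM 101 from the example):

# Show state, last sync time, and duration for all replication jobs on this node
pvesr status

# Limit the output to a single guest
pvesr status --guest 101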

Under the Hood: The Replication Process Explained

Step 1: ZFS Snapshot Creation

For every replication run, Proxmox automatically creates a snapshot of each replicated disk, for example:

rpool/data/vm-101-disk-0@replication-2025-10-19_10-00-00

You can view snapshots:

zfs list -t snapshot
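On a node with many guests that command prints every snapshot in the pool, so it helps to restrict it to the disk in question:

# List only this disk's snapshots, with space used and creation time
zfs list -t snapshot -r -o name,used,creation rpool/data/vm-101-disk-0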

Step 2: Data Transfer Over SSH

Proxmox uses a secure SSH connection between nodes:

zfs send rpool/data/vm-101-disk-0@replication-2025-10-19_10-00-00 | ssh root@pve2 zfs receive -F rpool/data/vm-101-disk-0

If the previous snapshot exists on both nodes, only delta changes are sent:

zfs send -i @replication-2025-10-19_09-00-00 rpool/data/vm-101-disk-0@replication-2025-10-19_10-00-00 | ssh ...
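After a run you can confirm that the snapshot actually arrived on the target, for example by checking from pve1 over the same SSH path:

# List the replicated disk's snapshots on the target node
ssh root@pve2 zfs list -t snapshot -r rpool/data/vm-101-disk-0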

Step 3: Verification and Retention

Once transfer completes:

  • Checksums verify data integrity

  • The target dataset is read-only

  • Old snapshots are automatically deleted according to retention policy

 


Step 4: Ready for Failover

After replication, VM disks exist on both nodes.
If one node fails, the replicated VM can be started manually — or automatically if HA and quorum are configured.


Advanced Features in ZFS Replication

Feature                 | Description
Incremental replication | Transfers only changed data blocks
Compression             | Uses ZFS LZ4 compression (the send stream can also be compressed)
Bandwidth control       | Limit transfer speed, e.g. --rate 20 (MB/s)
Resumable transfers     | Automatically resumes after network interruptions
Parallel jobs           | Configure per-node concurrency in /etc/pve/datacenter.cfg

Example:

replication: max-per-node 3
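You can also adjust an existing job without recreating it; for example, changing the rate limit or forcing an immediate run of the job created earlier (job ID 101-0 from the CLI example):

# Lower the bandwidth cap for the existing job to 20 MB/s
pvesr update 101-0 --rate 20

# Run the job right away instead of waiting for the next scheduled slot
pvesr schedule-now 101-0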

Incremental vs. Full Replication

Type        | Description               | Use Case
Full        | Sends the entire dataset  | First replication
Incremental | Sends only changed blocks | Routine sync
Resume      | Continues a partial send  | After an interrupted transfer
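The resume case relies on ZFS resumable streams: a receive started with -s keeps its partially written state and exposes a resume token that a later send can pick up. The replication scheduler retries interrupted jobs on its own; the sketch below only illustrates the underlying ZFS mechanism, using the example disk and nodes from above:

# A receive that was started with -s and got interrupted leaves a resume token on the target
TOKEN=$(ssh root@pve2 zfs get -H -o value receive_resume_token rpool/data/vm-101-disk-0)

# Feed the token back into zfs send to continue where the stream stopped
zfs send -t "$TOKEN" | ssh root@pve2 zfs receive -s rpool/data/vm-101-disk-0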

 


What Happens When a Node Fails

Without HA or QDevice

  • Cluster loses quorum (no automatic failover)

  • Replication stops

  • After restoring quorum and moving the VM's configuration to the surviving node (see the sketch below), you can manually unlock and start the VM there:

    qm unlock 101
    qm start 101
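A rough outline of that manual recovery, run on the surviving node (pve2 in this example); forcing the expected vote count down is a deliberate override and should only be done when you are sure the other node is really dead:

# Allow the single remaining vote to form quorum so /etc/pve becomes writable again
pvecm expected 1

# Move the VM's configuration from the failed node's directory to this node
mv /etc/pve/nodes/pve1/qemu-server/101.conf /etc/pve/nodes/pve2/qemu-server/

# Clear any stale lock and start the VM from its replicated disk
qm unlock 101
qm start 101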

With HA + QDevice

  • Cluster maintains quorum via QDevice

  • Proxmox HA automatically restarts the replicated VMs on the surviving node

  • Once the failed node is restored, replication resumes and syncs deltas.
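Reaching that behaviour requires two pieces: an external QDevice vote so the surviving node keeps quorum, and an HA resource entry for each guest. A minimal sketch, assuming a third machine at 10.10.10.3 (a hypothetical address) runs corosync-qnetd:

# On both cluster nodes: install the QDevice client package
apt install corosync-qdevice

# On one cluster node: register the external vote provider
pvecm qdevice setup 10.10.10.3

# Register VM 101 as an HA resource so it is recovered automatically after a node failure
ha-manager add vm:101 --state started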

 


Best Practices for Reliable ZFS Replication

Aspect        | Recommendation
Network       | Use a dedicated 10GbE or VLAN interface for replication traffic
Pool names    | Keep identical pool names on both nodes
Compression   | Keep LZ4 compression enabled
ZFS scrub     | Run monthly: zpool scrub rpool
SLOG/ZIL      | Use a dedicated SSD for synchronous writes
Deduplication | Avoid it: high RAM usage and not needed for replication
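Several of these checks are one-liners; for example, with the pool name from this setup:

# Confirm LZ4 compression is enabled on the pool
zfs get compression rpool

# Kick off a scrub (schedule it monthly via cron or a systemd timer)
zpool scrub rpool

# Watch scrub progress and overall pool health
zpool status rpool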

 


Summary

Feature            | Description
Mechanism          | ZFS send/receive over SSH
Direction          | One-way (source → target)
Sync type          | Snapshot-based, incremental
Automatic failover | Requires HA + QDevice
Storage type       | Local ZFS pools
Efficiency         | Only changed blocks are sent

 


Conclusion

ZFS replication in Proxmox VE 9 offers a simple, efficient, and robust disaster recovery method — perfect for small to mid-size clusters that don’t have shared SAN or Ceph storage.

With incremental snapshots, dedicated replication networks, and QDevice-based quorum, even a two-node Proxmox setup can achieve near-enterprise-grade data protection and uptime.