Understanding the Issue: ZFS Failed but VM Didn’t Migrate

You might encounter this situation in a Proxmox VE cluster using local ZFS storage on each node:

  • A ZFS pool (for example tank or rpool) fails or goes offline.

  • The node itself stays powered on and reachable.

  • But your VMs remain stuck, and Proxmox HA doesn’t migrate them to another node.

It’s a confusing scenario — especially if you’ve configured ZFS replication between nodes and expected automatic failover. So why doesn’t Proxmox move those VMs automatically?


Proxmox HA Works at Node Level, Not Storage Level

The key to understanding this lies in how Proxmox HA Manager operates.

Proxmox HA monitors node health, not the storage layer.


This means:

  • If a node goes offline, HA migrates or restarts VMs on another node.

  • If the ZFS storage pool fails but the node is still running, HA sees the node as healthy and takes no action.

In short:

Proxmox HA cannot detect or act on local ZFS storage failure.

So even if the storage under a VM becomes unreadable, HA won’t migrate it — because the host node didn’t actually “fail.”
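
A quick way to see this boundary is to ask the HA stack what it actually tracks. The command below is standard Proxmox tooling; its output covers quorum, the current master, the per-node LRM state, and the managed services, but nothing about pool health:

ha-manager status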


Local ZFS Is Not Shared Storage

Most Proxmox setups using ZFS are configured with local pools per node, e.g.:

tank/vm-100-disk-0
tank/vm-102-disk-0

Even with ZFS replication enabled, these are still separate local datasets. Replication only sends periodic, snapshot-based copies to the other node; it does not keep a live, active disk there that a VM could simply keep running from.

So when a ZFS pool on Node A fails:

  • The replicated copy on Node B exists, but

  • It’s a point-in-time replica, not a live disk that HA will attach and boot from automatically.
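
You can confirm this on the standby node: the replicated dataset and its replication snapshots are present, but the VM configuration still lives under the failed node. A quick check, assuming the pool name tank and the VM ID 102 used elsewhere in this article:

zfs list -t all -r tank | grep vm-102
ls /etc/pve/nodes/pve2/qemu-server/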


Why ZFS Replication Doesn’t Trigger Automatic Failover

Proxmox ZFS replication (pve-zsync or built-in replication jobs) copies datasets between nodes at scheduled intervals.


However, the replicated dataset:

  • Remains in snapshot form.

  • Is not automatically promoted or activated as a live volume.

Therefore, after a failure you must manually move the VM's configuration to the standby node and start it there. Until that happens, Proxmox HA cannot restart the VM from the replica on its own.


Recovery Steps After ZFS Failure

If your ZFS pool failed but replication was enabled, here is how to bring a VM back up from the latest replica on another node. Assume the ZFS storage failed on node PVE2 while PVE1 is up and has a recent successful replication. Run the following commands on PVE1 to bring up VM 102.

  1. Move VM config 102.conf to PVE1:

    mv /etc/pve/nodes/pve2/qemu-server/102.conf /etc/pve/nodes/pve1/qemu-server/102.conf


  2. Start the VM manually:

    qm start 102

Now the VM will boot from the replicated ZFS dataset on the healthy node.

You can then wipe the ZFS pool and disks on PVE2, create a new ZFS pool using the same name as before, and the existing replication configuration will automatically resume normal operation.
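
A rough sketch of that cleanup on PVE2 (the pool name tank, the mirror layout, and the device names below are assumptions; adjust them to your hardware):

zpool destroy -f tank                           # only if the failed pool is still visible to ZFS
wipefs -a /dev/sdb /dev/sdc                     # wipe old partition tables and labels on the replaced or reused disks
zpool create -f tank mirror /dev/sdb /dev/sdc   # same pool name as before

Reusing the original pool name matters: the storage definition in /etc/pve/storage.cfg and the replication jobs reference it, so nothing else needs to be reconfigured.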


How to Fix It — and Build True HA for ZFS in Proxmox VE

Here are your options to ensure automatic recovery or faster failover.


1. Use Shared Storage for True HA

The most reliable solution is to store all VM disks on shared storage accessible by all cluster nodes.

Recommended options include:

  • Ceph RBD (native to Proxmox VE)

  • NFS or iSCSI SAN

  • ZFS over iSCSI (TrueNAS, StarWind VSAN, etc.)

With shared storage, all nodes can access the same disk image. If one node fails, HA instantly restarts the VM on another host — no replication or promotion required.
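
As an example, an NFS export can be attached as cluster-wide VM storage with a single command; the storage ID, server address, and export path below are placeholders:

pvesm add nfs shared-vmstore --server 192.168.1.50 --export /export/vmstore --content images

Because the storage is defined at the cluster level and reachable from every node, HA can restart a VM on any surviving host without copying disk data first.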


2. Automate Failover with ZFS Replication

If you prefer using local ZFS storage per node, you can still achieve partial HA with automation.

  1. Set up Proxmox replication jobs (every 5–15 minutes; see the pvesr example after this list).

  2. Use a failover script (like ha-replication-manager) that:

    • Detects ZFS pool failure.

    • Promotes the replicated dataset on the standby node.

    • Registers the VM config.

    • Starts the VM automatically.
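
For the replication jobs in step 1, the built-in pvesr tool can create them from the command line. The job ID (VM ID plus a job number), target node, and schedule below are examples:

pvesr create-local-job 102-0 pve1 --schedule "*/15" --comment "failover copy of VM 102"
pvesr status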

This approach provides storage-aware failover — a practical compromise between full Ceph and manual recovery.
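
For the failover script in step 2, here is a minimal sketch of the idea, assuming VM 102 replicated from pve2 to pve1, a pool named tank, and that it runs on the standby node pve1 while the cluster is still quorate:

#!/bin/bash
# Hypothetical failover sketch, not a production tool: the VM ID, node names,
# and pool name are assumptions; add locking, logging, and fencing before real use.
VMID=102
FAILED_NODE=pve2
LOCAL_NODE=pve1
POOL=tank

# Treat the storage as failed if the node is unreachable or zpool reports a problem
if ! ssh -o ConnectTimeout=5 "root@${FAILED_NODE}" "zpool status -x ${POOL}" 2>/dev/null \
      | grep -q "is healthy"; then
    # /etc/pve is the cluster-wide config filesystem, so the standby node can
    # claim the VM by moving its config file into its own node directory
    mv "/etc/pve/nodes/${FAILED_NODE}/qemu-server/${VMID}.conf" \
       "/etc/pve/nodes/${LOCAL_NODE}/qemu-server/${VMID}.conf"
    # Boot the VM from the locally replicated ZFS dataset
    qm start "${VMID}"
fi

Before trusting something like this, make sure the VM cannot still be running on the failed node; starting it twice against diverging replicas is worse than a short outage.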


3. Enable ZFS Health Monitoring & Alerts

Proxmox integrates ZFS monitoring tools that can detect pool degradation early.

Useful commands:

zpool status
zpool events -v

To automate notifications, enable ZFS ZED (ZFS Event Daemon):

apt install zfs-zed
systemctl enable zfs-zed --now
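
ZED only emails you if a destination address is configured and the node can deliver mail (Proxmox VE ships with postfix). The address below is a placeholder; the settings live in ZED's configuration file:

# /etc/zfs/zed.d/zed.rc
ZED_EMAIL_ADDR="admin@example.com"   # placeholder; set your real address
ZED_NOTIFY_INTERVAL_SECS=3600        # rate-limit repeated notifications
ZED_NOTIFY_VERBOSE=1                 # also report non-fatal events such as finished scrubs

systemctl restart zfs-zed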

You’ll receive email alerts for:

  • Disk failures

  • Pool degradation

  • Resilvering or corruption events

This gives you time to replace disks before the pool collapses.


Summary: Why Proxmox Didn’t Migrate and How to Prevent It

Cause                   Why Migration Didn’t Happen     Solution
Local ZFS pool failed   HA only detects node failure    Use shared storage or automation
ZFS replication used    Replicated data not live        Promote snapshot and start manually
Node still reachable    HA assumes node healthy         Add storage-level monitoring
No alerts configured    Missed early warnings           Enable ZED, SMART, and email alerts


Recommended High Availability Setup for ZFS in Proxmox VE 9

Component   Recommended Setup
Cluster     2 or 3 nodes with QDevice
Storage     ZFS per node + replication
Backup      Proxmox Backup Server (PBS)
Failover    Custom replication promotion script
Alerts      ZFS ZED + SMART monitoring


Final Thoughts

ZFS provides rock-solid storage reliability in Proxmox VE 9, but automatic HA migration requires shared storage or additional automation.


If your cluster uses local ZFS pools, HA won’t detect storage failure by default — it only reacts to node-level outages.

To achieve true resilience:

  • Use Ceph or shared ZFS over iSCSI for seamless migration.

  • Or enhance ZFS replication with automatic failover scripts.

  • And always keep Proxmox Backup Server running for last-resort recovery.

With these adjustments, your Proxmox ZFS cluster can survive hardware failures, storage faults, and even full node loss — without manual intervention.