Proxmox High Availability (HA) ensures that virtual machines (VMs) and containers (CTs) automatically restart on another node in a cluster if the original node fails. It uses Corosync for cluster communication and quorum, and requires at least 3 nodes for reliable operation. HA is enabled per VM or container and depends on shared or replicated storage. While it does not offer live failover (like memory state transfer), it provides automated recovery with minimal downtime, making it ideal for critical workloads in a Proxmox cluster.

 


Why 3 Nodes Are Required for HA in Proxmox

1. Quorum

  • Proxmox VE uses Corosync for cluster communication and quorum-based decision making.
  • Quorum requires a majority of nodes to agree on the current state of the cluster.
  • In a 2-node setup, losing a single node (or the link between the nodes) drops the cluster below a majority, so the surviving node loses quorum and HA services stop; a network split between the two nodes also risks split-brain.
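You can check the quorum state at any time from the shell; for example (output fields vary slightly between versions):

```
# Show cluster membership, vote counts, and whether the cluster is quorate
pvecm status
```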

2. Failover Logic

  • Proxmox HA relies on the pve-ha-crm (Cluster Resource Manager) and pve-ha-lrm (Local Resource Manager) to detect failure and move services.
  • With 3 nodes, if one fails:
    • The other two still maintain quorum
    • The HA manager can restart the failed node's VMs/CTs on the healthy nodes
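To see what the HA stack is doing, the standard Proxmox VE CLI tools can be used, for example:

```
# Summary of HA state: quorum, current CRM master, per-node LRM state, managed resources
ha-manager status

# Confirm the CRM and LRM services are running on a given node
systemctl status pve-ha-crm pve-ha-lrm
```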

2-Node HA is NOT Recommended

  • While a 2-node cluster is technically possible in Proxmox VE, HA on it is neither supported nor reliable, since quorum is lost as soon as either node goes down.
  • You can add a QDevice (external quorum vote) to simulate a 3rd vote, but that adds complexity and still has limitations.
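If you do run two nodes, a QDevice is added roughly as follows (a sketch; the IP is a placeholder, and the external host must be reachable from both nodes):

```
# On the external quorum host (any small Debian/Ubuntu machine outside the cluster)
apt install corosync-qnetd

# On every Proxmox node
apt install corosync-qdevice

# On one Proxmox node: register the external vote
pvecm qdevice setup 192.168.10.50
```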

Recommended Minimum for HA

| Nodes | HA Capable | Notes |
|-------|------------|-------|
| 1 | No | Single node = no cluster or HA |
| 2 | No* | Possible with QDevice, but fragile |
| 3+ | Yes | Full support for HA and stable quorum logic |

Best practice: Start with 3 nodes and scale in odd numbers (e.g., 3, 5, 7) for quorum stability.

 


Cluster Hardware Overview

| Component | Node 1 | Node 2 | Node 3 |
|-----------|--------|--------|--------|
| CPU | Xeon / Ryzen (8+ cores) | Same | Same |
| RAM | 64–128 GB ECC | Same | Same |
| Boot Drive | 256–512 GB SSD (ZFS mirror recommended) | Same | Same |
| VM Storage | Ceph OSD SSDs or NFS-backed drives | Same | Same |
| Network NICs | 2x 1G + 2x 10G NICs | Same | Same |

Network Design

| Network Role | Description | NIC Type | VLAN / Separate Physical |
|--------------|-------------|----------|--------------------------|
| Management | Web GUI, SSH, API | 1G NIC | VLAN 10 or physical |
| Corosync | Cluster heartbeat traffic | 1G or 10G NIC | VLAN 20 or physical |
| VM/Storage LAN | Ceph, NFS, iSCSI, VM traffic | 10G NIC | VLAN 30 or physical |
| Backup LAN | Proxmox Backup Server, replication | Optional | VLAN 40 |

Best Practice: Use dedicated or VLAN-isolated networks for Corosync and Ceph/Storage traffic to avoid congestion and latency.
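As an illustration only (interface names, VLAN IDs, and addresses are examples, not a recommended layout), this separation can be expressed in /etc/network/interfaces roughly like this:

```
# Management bridge for the GUI/SSH/API on the first 1G NIC
auto vmbr0
iface vmbr0 inet static
    address 192.168.10.11/24
    gateway 192.168.10.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

# Dedicated Corosync link on VLAN 20, tagged on the second 1G NIC
auto eno2.20
iface eno2.20 inet static
    address 10.0.20.11/24
```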


Storage Design

Option 1: Ceph (Recommended for full HA)

  • 3-node Ceph storage with 3 OSDs per node
  • Use enterprise SSDs or NVMe (min. 2 TB per node)
  • Replication: 3x
  • WAL/DB (journal): place on separate fast SSDs or dedicated WAL/DB partitions
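A minimal Ceph bootstrap might look like the following (a sketch for recent Proxmox VE releases; device names, the network, and the pool name are placeholders):

```
# On every node: install the Ceph packages
pveceph install

# On the first node only: initialise Ceph on the storage network
pveceph init --network 10.0.30.0/24

# On each of the 3 nodes: create a monitor and the OSDs
pveceph mon create
pveceph osd create /dev/nvme0n1
# Optionally keep the WAL/DB on a separate fast device:
# pveceph osd create /dev/sdb --db_dev /dev/nvme1n1

# Create a 3x replicated pool for VM disks
pveceph pool create vm-pool --size 3 --min_size 2
```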

Option 2: Shared NFS/iSCSI

  • NFS or iSCSI from TrueNAS or similar high-availability NAS
  • Accessible to all 3 nodes
  • VMs stored on shared volume
  • No local-only storage for HA VMs
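Adding a shared NFS export so every node sees the same storage ID could look like this (server address, export path, and storage name are examples):

```
# Register an NFS share for VM disks and container volumes on all nodes at once
pvesm add nfs vm-nfs --server 192.168.30.50 --export /mnt/tank/proxmox --content images,rootdir
```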

Option 3: ZFS with Replication (Low-cost HA)

  • Each node has local ZFS mirror
  • Use ZFS replication (manual or scheduled)
  • Enables semi-HA (failover with some delay, and possible loss of changes made since the last replication run)
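A replication job can be created per guest with pvesr, for example (VM ID, target node, schedule, and rate are placeholders):

```
# Replicate VM 100 to node pve2 every 15 minutes, limited to 50 MB/s
pvesr create-local-job 100-0 pve2 --schedule "*/15" --rate 50

# Check replication state and last sync times
pvesr status
```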

Fencing and Quorum

| Feature | Description |
|---------|-------------|
| Quorum | Needs 2 of 3 nodes online |
| Corosync Rings | 2 (ring0 and ring1 for redundancy) |
| Fencing | Software-based via the Proxmox HA stack |
| No STONITH | Proxmox uses internal fencing logic |
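For reference, a dual-ring setup shows up in /etc/pve/corosync.conf roughly like the excerpt below (node names and addresses are examples; the rings are best defined at cluster creation time rather than by hand-editing the file):

```
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    # first Corosync network
    ring0_addr: 10.0.20.11
    # second, independent network
    ring1_addr: 10.0.21.11
  }
  # ... one node { } block per cluster member
}
```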

VM HA Configuration

  • Enable HA per VM in Datacenter > HA
  • Group critical VMs into HA Groups with node preferences
  • Avoid overcommitting all nodes with HA VMs
  • Enable no-failback if you don’t want VMs to jump back after recovery
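The same configuration can also be expressed with the CLI, for example (group name, node names, and VM ID are placeholders):

```
# HA group that prefers pve1 (priority 2) over pve2 (priority 1) and never fails back automatically
ha-manager groupadd critical --nodes "pve1:2,pve2:1" --nofailback 1

# Manage VM 100 under HA in that group
ha-manager add vm:100 --group critical --state started --max_restart 2 --max_relocate 1

# Review the resulting HA state
ha-manager status
```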

Backup Strategy

  • Deploy Proxmox Backup Server on a separate node (physical, or a VM with external storage)
  • Run daily incremental backups of HA-enabled VMs
  • Store backups on a ZFS dataset or an external NAS
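Connecting PBS and running a backup from the CLI could look like this (server address, datastore, credentials, and VM ID are placeholders; scheduled jobs are normally defined under Datacenter > Backup):

```
# Register the PBS datastore as a storage entry on the cluster
pvesm add pbs pbs-backup --server 192.168.40.10 --datastore main \
    --username backup@pbs --password 'REPLACE-ME' --fingerprint 'AA:BB:...:FF'

# Ad-hoc backup of VM 100 to that storage
vzdump 100 --storage pbs-backup --mode snapshot
```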

Maintenance & Monitoring

| Task | Frequency | Tools / Notes |
|------|-----------|---------------|
| Corosync link test | Monthly | corosync-cfgtool, ping, traceroute |
| Disk health check | Weekly | smartctl, Ceph dashboard |
| Backup restore test | Monthly | Restore to a non-production node |
| Resource usage monitoring | Daily | Proxmox GUI, Nagios |
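Typical commands for these checks (run on each node; device names are examples):

```
# Corosync link and ring status
corosync-cfgtool -s

# SMART health of a boot or OSD disk
smartctl -a /dev/sda

# Overall Ceph health, if Ceph is in use
ceph -s
```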

Configuration Checklist

  • Proxmox VE installed and up to date on all 3 nodes
  • Cluster created using pvecm create and pvecm add
  • Corosync dual-ring configured
  • Shared or replicated storage accessible on all nodes
  • VMs created on shared storage (not local)
  • HA groups defined for critical workloads
  • Proxmox Backup Server connected and tested
  • Monitoring and alerts configured
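For the cluster-creation items, the commands look roughly like this on Proxmox VE 6 and later (cluster name, node addresses, and link networks are examples):

```
# On the first node: create the cluster with two Corosync links
pvecm create prod-cluster --link0 10.0.20.11 --link1 10.0.21.11

# On each additional node: join via the first node, passing this node's own link addresses
pvecm add 10.0.20.11 --link0 10.0.20.12 --link1 10.0.21.12

# Verify membership and quorum afterwards
pvecm status
```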

 

Get in touch with Saturn ME today for a free Proxmox consulting session—no strings attached.