Proxmox High Availability (HA) is designed to keep virtual machines (VMs) and containers (CTs) running by automatically restarting them on other nodes in the cluster when a failure occurs. It provides automated failover, minimizing downtime without manual intervention.

How Proxmox HA Works – Key Concepts

1. Cluster Setup

  • Proxmox HA only works in a clustered environment (minimum 3 nodes recommended).
  • Clustering is built on Corosync, which handles node communication, membership, and quorum; fencing is handled by the Proxmox HA stack itself using watchdog devices.

2. Quorum and Corosync

  • Quorum ensures that the cluster can make valid decisions.
  • Corosync uses the Totem protocol to exchange heartbeat messages between nodes.
  • If a node loses quorum while running HA resources, it fences itself (via a watchdog reset) to avoid split-brain scenarios.
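As a quick check of the concepts above, cluster membership and quorum can be inspected from any node. These commands assume a Proxmox VE host; the labels in the comments are indicative, not verbatim output:

```shell
# Show cluster membership, vote counts, and quorum state
# (the "Quorum information" section reports whether the cluster is quorate)
pvecm status

# Lower-level view straight from Corosync
corosync-quorumtool -s
```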

3. HA Manager (pve-ha-crm)

  • The Cluster Resource Manager (pve-ha-crm) runs on every node, but only one instance is active as the cluster master at any time.
  • The active master monitors HA-enabled services and decides where they run, including during failover.
  • It delegates the actual start, stop, and migrate actions to the pve-ha-lrm (Local Resource Manager) on each node.
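The division of labour between the two daemons can be observed directly on a node. A sketch, assuming a Proxmox VE host:

```shell
# Both the CRM and the LRM run as systemd services on every node
systemctl status pve-ha-crm pve-ha-lrm

# Reports which node currently holds the active CRM master role,
# plus the state of every HA-managed resource
ha-manager status
```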

How Failover Works

  1. You enable HA for a VM or CT in the Proxmox GUI or CLI.
  2. If the host node fails (no heartbeat), other nodes detect this via Corosync.
  3. The HA Manager elects a new node to take over the failed VM/CT.
  4. The service is restarted automatically on the new node.
  5. Once the original node comes back online, the resource stays on its new node by default; it only moves back if it belongs to an HA group with node priorities and the group’s nofailback option is unset.

Failover requires shared or replicated storage (like Ceph, ZFS replication, or NFS) so that the VM’s disk is accessible from other nodes.
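For clusters without Ceph or NFS, ZFS-based replication can satisfy this storage requirement. A minimal sketch using the built-in pvesr tool, assuming a VM with ID 101 and a target node named node2:

```shell
# Replicate VM 101's disks to node2 every 15 minutes (ZFS-backed storage only)
pvesr create-local-job 101-0 node2 --schedule "*/15"

# List replication jobs and their last sync status
pvesr status
```

Note that this replication is asynchronous, so a failover can lose up to one sync interval of data.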


Requirements for Proxmox HA

  • Nodes: minimum of 3 for quorum-based HA
  • Shared storage: Ceph, NFS, iSCSI, or ZFS replication
  • Cluster communication: stable, redundant network; for Corosync, low latency matters more than raw bandwidth, so a dedicated link is preferred
  • VM/CT configuration: must be HA-enabled manually (per VM or via a group)

How to Enable HA (GUI)

  1. Go to Datacenter > HA > Resources
  2. Click “Add”
  3. Select a VM or CT
  4. Set group (optional) and state (enabled/disabled)
  5. Proxmox will now monitor and manage the HA status of that resource

How to Enable HA (CLI)

# Add VM 101 as an HA-managed resource
ha-manager add vm:101

# Check HA status
ha-manager status

# Remove from HA
ha-manager remove vm:101
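Beyond adding and removing resources, each HA resource also has a requested state that can be changed at any time. A sketch, assuming VM 101 is already HA-managed:

```shell
# Keep the resource registered with HA but powered off
ha-manager set vm:101 --state stopped

# Return it to normal HA-managed operation
ha-manager set vm:101 --state started
```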

HA Groups and Priorities

  • You can define HA Groups with preferred nodes and priorities.
  • This lets you control where services should run first, or where they should move during failover.

Example:

# Create an HA group
ha-manager groupadd mygroup --nodes node1,node2,node3 --nofailback 1
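Node priorities can be encoded directly in the --nodes list, where a higher number means a more preferred node. A sketch using hypothetical group and node names:

```shell
# node1 preferred (priority 2), then node2; node3 has no explicit priority
ha-manager groupadd prod-group --nodes "node1:2,node2:1,node3" --nofailback 1

# Pin VM 101 to that group
ha-manager add vm:101 --group prod-group
```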

HA Recovery Actions

  • Node crash/failure: VMs/CTs are restarted on another node
  • Graceful node reboot: by default, HA waits for the node to return and does not move VMs
  • Manual shutdown: depends on the Datacenter shutdown policy; with the “migrate” policy, VMs are live-migrated away first
  • Network partition (split): a node that loses quorum fences itself, and its HA resources are recovered elsewhere

Limitations of Proxmox HA

  • No live failover (failover involves a restart, not a state transfer)
  • No automatic failback by default (resources return to a preferred node only via HA group priorities)
  • Requires shared or replicated storage
  • Doesn’t guarantee zero downtime — only minimal downtime

Best Practices for Proxmox HA

  • Always use at least 3 nodes
  • Use redundant networking for Corosync (e.g., ring0 and ring1)
  • Set up fencing correctly (Proxmox uses watchdog-based self-fencing, not traditional STONITH)
  • Use replicated or shared storage
  • Don’t HA-enable every VM — only critical services
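The redundant-networking advice above translates into two Corosync links per node in /etc/pve/corosync.conf. A trimmed sketch with placeholder node names and addresses (this file is cluster-synchronized, so edit it only via the documented procedure):

```
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.1   # primary Corosync network
    ring1_addr: 10.1.0.1   # redundant second network
  }
  # ... one node { } entry per cluster member ...
}

totem {
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  # ... other totem settings unchanged ...
}
```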

Summary

  • Automatic failover: yes; the VM is restarted on another node
  • Live migration on failover: no; a restart is required
  • Node count: minimum of 3 recommended for quorum
  • Shared storage required: yes (shared or replicated)
  • Memory state preserved: no; the VM restarts, so RAM state is lost


Proxmox HA is reliable, lightweight, and built-in, offering a simple and powerful way to ensure uptime across small to medium virtualization clusters — without the licensing overhead of proprietary systems like VMware vSphere HA.


Get in touch with Saturn ME today for a free Proxmox consulting session—no strings attached.