Proxmox High Availability (HA) is designed to keep virtual machines (VMs) and containers (CTs) running by automatically restarting them on other nodes in the cluster when a failure occurs. It provides automated failover, minimizing downtime without manual intervention.

How Proxmox HA Works – Key Concepts

1. Cluster Setup

  • Proxmox HA only works in a clustered environment (minimum 3 nodes recommended).
  • Clustering is built on Corosync, which handles node communication, membership, and quorum; fencing is handled by the Proxmox HA stack itself using watchdog devices.

2. Quorum and Corosync

  • Quorum ensures that the cluster can make valid decisions.
  • Corosync uses the Totem protocol to exchange heartbeat messages between nodes.
  • If a node loses quorum while running HA resources, it fences itself (via a watchdog reset) to avoid split-brain scenarios.
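As a quick check of the concepts above, cluster membership and quorum can be inspected from any node. These commands assume a Proxmox VE host; the labels in the comments are indicative, not verbatim output:

```shell
# Show cluster membership, vote counts, and quorum state
# (the "Quorum information" section reports whether the cluster is quorate)
pvecm status

# Lower-level view straight from Corosync
corosync-quorumtool -s
```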

3. HA Manager (pve-ha-crm)

  • The Cluster Resource Manager (pve-ha-crm) runs on every node, but only one instance is active as the cluster master at any time.
  • The active master monitors HA-enabled services and decides where they run, including during failover.
  • It delegates the actual start, stop, and migrate actions to the pve-ha-lrm (Local Resource Manager) on each node.
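The division of labour between the two daemons can be observed directly on a node. A sketch, assuming a Proxmox VE host:

```shell
# Both the CRM and the LRM run as systemd services on every node
systemctl status pve-ha-crm pve-ha-lrm

# Reports which node currently holds the active CRM master role,
# plus the state of every HA-managed resource
ha-manager status
```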

How Failover Works

  1. You enable HA for a VM or CT in the Proxmox GUI or CLI.
  2. If the host node fails (no heartbeat), other nodes detect this via Corosync.
  3. The HA Manager elects a new node to take over the failed VM/CT.
  4. The service is restarted automatically on the new node.
  5. Once the original node comes back online, the resource stays on its new node by default; it only moves back if it belongs to an HA group with node priorities and the group’s nofailback option is unset.

Failover requires shared or replicated storage (like Ceph, ZFS replication, or NFS) so that the VM’s disk is accessible from other nodes.
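For clusters without Ceph or NFS, ZFS-based replication can satisfy this storage requirement. A minimal sketch using the built-in pvesr tool, assuming a VM with ID 101 and a target node named node2:

```shell
# Replicate VM 101's disks to node2 every 15 minutes (ZFS-backed storage only)
pvesr create-local-job 101-0 node2 --schedule "*/15"

# List replication jobs and their last sync status
pvesr status
```

Note that this replication is asynchronous, so a failover can lose up to one sync interval of data.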


Requirements for Proxmox HA

  • Nodes: minimum of 3 for quorum-based HA
  • Shared storage: Ceph, NFS, iSCSI, or ZFS replication
  • Cluster communication: stable, redundant network; for Corosync, low latency matters more than raw bandwidth, so a dedicated link is preferred
  • VM/CT configuration: must be HA-enabled manually (per VM or via a group)

How to Enable HA (GUI)

  1. Go to Datacenter > HA > Resources
  2. Click “Add”
  3. Select a VM or CT
  4. Set group (optional) and state (enabled/disabled)
  5. Proxmox will now monitor and manage the HA status of that resource

How to Enable HA (CLI)

# Add VM 101 as an HA-managed resource
ha-manager add vm:101

# Check HA status
ha-manager status

# Remove from HA
ha-manager remove vm:101
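Beyond adding and removing resources, each HA resource also has a requested state that can be changed at any time. A sketch, assuming VM 101 is already HA-managed:

```shell
# Keep the resource registered with HA but powered off
ha-manager set vm:101 --state stopped

# Return it to normal HA-managed operation
ha-manager set vm:101 --state started
```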

HA Groups and Priorities

  • You can define HA Groups with preferred nodes and priorities.
  • This lets you control where services should run first, or where they should move during failover.

Example:

# Create an HA group
ha-manager groupadd mygroup --nodes node1,node2,node3 --nofailback 1
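Node priorities can be encoded directly in the --nodes list, where a higher number means a more preferred node. A sketch using hypothetical group and node names:

```shell
# node1 preferred (priority 2), then node2; node3 has no explicit priority
ha-manager groupadd prod-group --nodes "node1:2,node2:1,node3" --nofailback 1

# Pin VM 101 to that group
ha-manager add vm:101 --group prod-group
```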

HA Recovery Actions

  • Node crash/failure: VMs/CTs are restarted on another node
  • Graceful node reboot: by default, HA waits for the node to return and does not move VMs
  • Manual shutdown: depends on the Datacenter shutdown policy; with the “migrate” policy, VMs are live-migrated away first
  • Network partition (split): a node that loses quorum fences itself, and its HA resources are recovered elsewhere

Limitations of Proxmox HA

  • No live failover (failover involves a restart, not a state transfer)
  • No automatic failback by default (resources return to a preferred node only via HA group priorities)
  • Requires shared or replicated storage
  • Doesn’t guarantee zero downtime — only minimal downtime

Best Practices for Proxmox HA

  • Always use at least 3 nodes
  • Use redundant networking for Corosync (e.g., ring0 and ring1)
  • Set up fencing correctly (Proxmox uses watchdog-based self-fencing, not traditional STONITH)
  • Use replicated or shared storage
  • Don’t HA-enable every VM — only critical services
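The redundant-networking advice above translates into two Corosync links per node in /etc/pve/corosync.conf. A trimmed sketch with placeholder node names and addresses (this file is cluster-synchronized, so edit it only via the documented procedure):

```
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.1   # primary Corosync network
    ring1_addr: 10.1.0.1   # redundant second network
  }
  # ... one node { } entry per cluster member ...
}

totem {
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  # ... other totem settings unchanged ...
}
```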

Summary

  • Automatic failover: yes; the VM is restarted on another node
  • Live migration on failover: no; a restart is required
  • Node count: minimum of 3 recommended for quorum
  • Shared storage required: yes (shared or replicated)
  • Memory state preserved: no; the VM restarts, so RAM state is lost


Proxmox HA is reliable, lightweight, and built-in, offering a simple and powerful way to ensure uptime across small to medium virtualization clusters — without the licensing overhead of proprietary systems like VMware vSphere HA.


Get in touch with Saturn ME today for a free Proxmox consulting session—no strings attached.