Proxmox High Availability (HA) is designed to automatically restart virtual machines (VMs) and containers (CTs) on other nodes in a cluster when a failure occurs. It provides automated failover, minimizing downtime without manual intervention.
How Proxmox HA Works – Key Concepts
1. Cluster Setup
- Proxmox HA only works in a clustered environment (minimum 3 nodes recommended).
- Clustering is built on Corosync, which handles cluster membership, messaging, and quorum; fencing is handled by the HA stack's watchdog mechanism.
2. Quorum and Corosync
- Quorum ensures that the cluster can make valid decisions.
- Corosync uses a ring-based protocol to exchange heartbeat messages between nodes.
- If a node loses quorum, it fences itself (via a watchdog reset) to avoid split-brain scenarios.
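Quorum comes down to simple majority arithmetic; a minimal sketch (the node counts are just examples):

```shell
# Quorum is a strict majority of cluster votes (one vote per node by default).
nodes=3
quorum=$(( nodes / 2 + 1 ))
echo "A ${nodes}-node cluster needs ${quorum} votes for quorum"

# A 2-node cluster also needs 2 votes, so losing either node loses
# quorum -- which is why 3 nodes are the recommended minimum.
```

On a real cluster, `pvecm status` shows the current vote and quorum figures.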
3. HA Manager (pve-ha-crm and pve-ha-lrm)
- The Cluster Resource Manager (pve-ha-crm) runs on every node, but only the current master actively makes placement decisions.
- It monitors HA-enabled services and coordinates failover.
- It works with the Local Resource Manager (pve-ha-lrm) on each node, which actually starts, stops, and migrates the services.
How Failover Works
- You enable HA for a VM or CT in the Proxmox GUI or CLI.
- If the host node fails (no heartbeat), other nodes detect this via Corosync.
- The HA Manager elects a new node to take over the failed VM/CT.
- The service is restarted automatically on the new node.
- Once the original node comes back online, the service does not automatically move back unless an HA group with node priorities pulls it back (failback can be disabled per group with the nofailback option).
Failover requires shared or replicated storage (like Ceph, ZFS replication, or NFS) so that the VM’s disk is accessible from other nodes.
Requirements for Proxmox HA
| Component | Requirement |
|---|---|
| Nodes | Minimum of 3 for quorum-based HA |
| Shared Storage | Ceph, NFS, iSCSI, or ZFS replication |
| Cluster Communication | Stable, low-latency network; redundant Corosync links recommended |
| VM/CT Configuration | Must be HA-enabled manually (per VM or group) |
How to Enable HA (GUI)
- Go to Datacenter > HA > Resources
- Click “Add”
- Select a VM or CT
- Set group (optional) and state (enabled/disabled)
- Proxmox will now monitor and manage the HA status of that resource
How to Enable HA (CLI)
```shell
# Add VM 101 to HA with the default group
ha-manager add vm:101

# Check HA status
ha-manager status

# Remove VM 101 from HA
ha-manager remove vm:101
```
HA Groups and Priorities
- You can define HA Groups with preferred nodes and priorities.
- This lets you control where services should run first, or where they should move during failover.
Example:

```shell
# Create an HA group; higher priority numbers are preferred
# (node1 is preferred here), and nofailback prevents services
# from moving back automatically when a failed node returns
ha-manager groupadd mygroup --nodes "node1:2,node2:1,node3:1" --nofailback 1
```
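The placement rule behind node priorities can be sketched as follows. This is illustrative shell only, not Proxmox's actual implementation; the node names and priorities are made up:

```shell
# Sketch of the HA group placement rule: among the group's online
# nodes, recover the service on the one with the highest priority.
group_nodes="node1:2,node2:1,node3:1"
online="node2 node3"              # pretend node1 just failed

best="" ; best_pri=-1
for entry in $(echo "$group_nodes" | tr ',' ' '); do
  node="${entry%%:*}"
  pri="${entry##*:}"
  # skip nodes that are not currently online
  case " $online " in *" $node "*) ;; *) continue ;; esac
  if [ "$pri" -gt "$best_pri" ]; then
    best="$node" ; best_pri="$pri"
  fi
done
echo "Service recovers on: $best"
```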
HA Recovery Actions
| Scenario | What Happens |
|---|---|
| Node crash/failure | VMs/CTs are recovered on another node and restarted |
| Node reboot (graceful) | With the default policy, services are frozen and resume on the same node after reboot |
| Manual shutdown | With the default policy, services are failed over to another node; the migrate policy live-migrates them instead |
| Network partition (split) | A node that loses quorum fences itself; its services are recovered elsewhere |
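The reboot and shutdown behavior above is controlled by the cluster-wide HA `shutdown_policy` option (`freeze`, `failover`, `migrate`, or the default `conditional`) in `/etc/pve/datacenter.cfg`. A sketch of the setting:

```
# /etc/pve/datacenter.cfg
# migrate = live-migrate HA services away before a graceful shutdown/reboot
ha: shutdown_policy=migrate
```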
Limitations of Proxmox HA
- No live failover (failover involves a restart, not a state transfer)
- No automatic failback when the failed node comes back online
- Requires shared or replicated storage
- Doesn’t guarantee zero downtime — only minimal downtime
Best Practices for Proxmox HA
- Always use at least 3 nodes
- Use redundant networking for Corosync (e.g., ring0 and ring1)
- Set up fencing correctly (Proxmox uses watchdog-based self-fencing rather than traditional STONITH)
- Use replicated or shared storage
- Don’t HA-enable every VM — only critical services
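The redundant-networking bullet above maps to per-node link entries in `/etc/pve/corosync.conf`. A sketch with example addresses:

```
# excerpt from /etc/pve/corosync.conf (addresses are examples)
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.1    # primary Corosync network
    ring1_addr: 10.0.1.1    # redundant second network
  }
}
```

When editing this file by hand, remember to increment its `config_version` so the change propagates across the cluster.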
Summary
| Feature | Description |
|---|---|
| Automatic Failover | Yes, restarts the VM on another node |
| Live Migration on Failover | No (restart required) |
| Node Count | Minimum 3 recommended for quorum |
| Shared Storage Required | Yes (shared or replicated) |
| Memory State Preserved | No (the VM is restarted, so RAM contents are lost) |
Proxmox HA is reliable, lightweight, and built-in, offering a simple and powerful way to ensure uptime across small to medium virtualization clusters — without the licensing overhead of proprietary systems like VMware vSphere HA.
Get in touch with Saturn ME today for a free Proxmox consulting session—no strings attached.