Proxmox VE 9 is a powerful open-source virtualization platform that combines KVM virtual machines, LXC containers, and software-defined storage such as ZFS and Ceph.

To keep your cluster stable and responsive, you need to understand what Proxmox's metrics mean and how to interpret them. This guide explains every key metric, including CPU, memory, I/O pressure, and storage performance, with tips for troubleshooting common issues.

What Are Proxmox Metrics?

Proxmox continuously collects system performance metrics from the Linux kernel, KVM hypervisor, and storage subsystems. These metrics give you insights into:

  • Node health (CPU, RAM, I/O, swap)
  • VM and container performance
  • Cluster communication
  • ZFS or Ceph storage behavior

You can view these metrics from the Proxmox web GUI, the command line, or external monitoring tools such as Nagios and Grafana.
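
For ad-hoc checks, the pvesh CLI exposes the same data as the web API. A minimal sketch (replace pve1 with your own node name):

pvesh get /nodes                # list cluster nodes and basic status
pvesh get /nodes/pve1/status    # CPU, memory, load, and uptime for one node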

Node-Level Metrics in Proxmox VE 9

These metrics represent the overall health of your host (node).
You can find them under Datacenter → Node → Summary / Metrics tabs, or via CLI commands such as pveperf.

| Metric | Description | Healthy Range | Notes |
|---|---|---|---|
| CPU Usage (%) | Total CPU load across all cores | <70% sustained | Includes KVM overhead |
| IO Delay (%) | Time CPU spends waiting for disk I/O | <5% | High values = disk bottleneck |
| System Load Average | Average number of running/waiting processes | ≤ number of CPU cores | Higher = CPU or I/O congestion |
| Memory Usage (%) | RAM consumed by OS, VMs, ZFS ARC | <80% | ARC can use a large memory portion |
| Swap Usage (%) | Used swap memory | 0–10% | Persistent use = insufficient RAM |
| Network (In/Out) | Bandwidth per NIC or bridge | — | Track for traffic spikes |
| Uptime | Time since last reboot | — | Indicates stability |
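
As a quick sanity check of the load-average rule above, compare the averages against the core count. Both commands are standard on any Proxmox node:

nproc                # number of CPU cores
cat /proc/loadavg    # 1-, 5-, and 15-minute load averages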

Disk and Storage Metrics (ZFS, LVM, Ceph)

Storage performance directly impacts VM responsiveness.
You can check metrics under Datacenter → Storage → Summary or via:

zpool iostat -v 1    # per-vdev ZFS throughput and latency, refreshed every second
iostat -x 1          # extended per-device statistics (await, %util)
pvesm status         # status and usage of every configured Proxmox storage

| Metric | Description | Ideal Value | Notes |
|---|---|---|---|
| Read/Write Throughput (MB/s) | Data transfer speed | — | Indicates workload intensity |
| IOPS (ops/sec) | Input/output operations per second | Higher is better | Depends on disk type |
| Latency (ms/op) | Average delay per I/O operation | <5 ms (SSD), <20 ms (HDD) | Key indicator of slow storage |
| ZFS ARC Size | ZFS read cache memory usage | — | Boosts read performance |
| ZIL/SLOG Activity | Sync write log operations | — | Add a dedicated SLOG for NFS/databases |
| Fragmentation % | Data fragmentation in the pool | <50% | High = degraded performance |
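
To read fragmentation and ARC behavior directly, two read-only checks using the zfsutils tools that ship with Proxmox:

zpool list -o name,size,alloc,free,frag,health    # FRAG column = pool fragmentation
arc_summary | head -n 25                          # ARC size, hit rate, and tuning summary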

Understanding I/O Pressure Stall in Proxmox VE 9

Proxmox VE 9's kernel exposes Pressure Stall Information (PSI), a Linux feature available since kernel 4.20 that shows how long processes are delayed due to resource contention.

PSI types:

| Type | File | Meaning |
|---|---|---|
| CPU Pressure | /proc/pressure/cpu | Waiting for CPU time |
| Memory Pressure | /proc/pressure/memory | Waiting for memory reclaim |
| IO Pressure | /proc/pressure/io | Waiting for disk I/O completion |

Check with:

cat /proc/pressure/io

If avg10 (average stall time over 10 seconds) is above 10–15%, it indicates I/O bottlenecks — usually due to slow disks, overloaded ZFS pools, or too many simultaneous sync writes.
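
The file reports two lines: some (at least one task was stalled) and full (all non-idle tasks were stalled). A threshold check as a one-liner sketch, where the 15% cutoff is a rule of thumb rather than an official limit:

awk '/^some/ { split($2, f, "="); if (f[2] + 0 > 15) print "I/O pressure high: avg10 = " f[2] "%" }' /proc/pressure/io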

Virtual Machine & LXC Container Metrics

Each VM and container has its own set of resource usage stats.
You can view them under VM → Summary or via CLI:

qm status <vmid> --verbose

| Metric | Description | What It Means |
|---|---|---|
| CPU Usage (%) | CPU time used by the VM | High = heavy load or a runaway process |
| Memory Usage (%) | RAM consumption inside the VM | Watch for overcommit or leaks |
| Disk Read/Write (MB/s) | Storage throughput | Indicates storage demand |
| IOPS | I/O operations per second | Useful for database or mail servers |
| Network Traffic (In/Out) | vNIC data flow | Monitor bandwidth per VM |
| Ballooning | Dynamic memory adjustment | Helps reclaim unused memory |
| Uptime | VM running time | Detects reboots or instability |
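
The same counters are available over the API, which is convenient for scripting. A sketch with pvesh (node name pve1 and guest IDs 100/101 are placeholders):

pvesh get /nodes/pve1/qemu/100/status/current    # runtime stats for a VM
pvesh get /nodes/pve1/lxc/101/status/current     # runtime stats for a container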

Cluster Metrics and Quorum Monitoring

Cluster stability is measured by Corosync metrics, visible via:

pvecm status           # quorum state, votes, and membership
corosync-cfgtool -s    # link status for each Corosync ring

| Metric | Meaning | Ideal Value |
|---|---|---|
| Quorum Status | Whether majority voting is achieved | Must be “Yes” for HA |
| Vote Count | Number of active nodes with votes | Use an odd number or a QDevice |
| Cluster Latency (ms) | Delay between nodes | <2 ms ideal |
| Link Errors / Drops | Packet loss between cluster rings | 0 preferred |

If a two-node cluster loses one node, it loses quorum unless a QDevice (tie-breaker) is configured.
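
Setting one up is a one-time operation, sketched here with a placeholder address (10.0.0.50 must be a third host running corosync-qnetd):

apt install corosync-qdevice     # on every cluster node
pvecm qdevice setup 10.0.0.50    # run once, from one cluster node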

Key Proxmox Monitoring Commands

| Command | Purpose |
|---|---|
| pveperf | Quick node performance snapshot |
| top / htop | Live CPU/memory usage |
| zpool iostat -v 1 | ZFS pool throughput and latency |
| iostat -x 1 | Disk stats (non-ZFS) |
| arcstat | ZFS ARC cache efficiency |
| pvecm status | Cluster quorum info |
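
For a rolling view, the PSI files can be watched together; a simple example:

watch -n 5 'cat /proc/pressure/cpu /proc/pressure/memory /proc/pressure/io'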

How to Interpret Common Scenarios

| Symptom | Likely Cause | Metric to Check |
|---|---|---|
| VMs lag or freeze | Disk I/O bottleneck | IO delay, /proc/pressure/io |
| Backups slow down | High storage latency | zpool iostat, iostat -x |
| High memory usage | ZFS ARC expansion | arcstat, free -h |
| No quorum in cluster | Node down / latency | pvecm status |
| GUI sluggishness | CPU or I/O pressure | pveperf, load average |
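
For the most common case, laggy VMs, a top-down triage might look like this (rpool is a placeholder pool name):

cat /proc/pressure/io        # kernel-level I/O stall percentages
zpool iostat -v rpool 1 5    # per-vdev latency and throughput, five samples
iostat -x 1 5                # device-level utilization and await times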

Summary: Key Takeaways

| Layer | Metric Type | Check With | Healthy Range |
|---|---|---|---|
| Host | CPU, IO Delay, Memory | pveperf, GUI | IO Delay <5% |
| Storage | Latency, IOPS | zpool iostat, iostat | <5 ms SSD / <20 ms HDD |
| VM | CPU, Memory, Disk IO | GUI, qm status | Depends on workload |
| Cluster | Quorum, Latency | pvecm status | Quorum = Yes |
| PSI | CPU, Memory, IO Stall | /proc/pressure/* | avg10 <10% |

Pro Tips for Optimizing Proxmox VE 9 Performance

  • Use SSD or NVMe storage for VM disks and metadata.
  • Add a dedicated SLOG for ZFS sync-heavy workloads (e.g. databases).
  • Enable ZFS compression (lz4) for faster I/O and better space efficiency.
  • Limit concurrent backup and replication jobs.
  • Use virtio-scsi with iothreads for high-performance VMs (see the sketch after this list).
  • Set the disk I/O scheduler to none for SSDs backing ZFS (echo none > /sys/block/sdX/queue/scheduler).
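
The compression and virtio-scsi tips translate into one-liners; a sketch in which the pool name rpool, VMID 100, and the volume name are placeholders for your own:

zfs set compression=lz4 rpool                            # enable lz4 on the pool (applies to new writes)
qm set 100 --scsihw virtio-scsi-single                   # one iothread-capable controller per disk
qm set 100 --scsi0 local-zfs:vm-100-disk-0,iothread=1    # re-attach the disk with an iothread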

Final Thoughts

Proxmox VE 9 offers deep visibility into host, VM, and storage performance — but understanding its metrics is the key to maintaining a stable and responsive cluster.

By monitoring I/O pressure, CPU load, and ZFS latency, you can detect performance issues early, plan hardware upgrades, and ensure your virtual infrastructure remains efficient and reliable.