To keep your cluster stable and responsive, you need to understand what Proxmox's metrics mean and how to interpret them. This guide explains the key metrics, including CPU, memory, I/O pressure, and storage performance, with tips for troubleshooting common issues.
What Are Proxmox Metrics?
Proxmox continuously collects system performance metrics from the Linux kernel, KVM hypervisor, and storage subsystems. These metrics give you insights into:
- Node health (CPU, RAM, I/O, swap)
- VM and container performance
- Cluster communication
- ZFS or Ceph storage behavior
You can view these metrics from the Proxmox web GUI, the command line, or external monitoring tools such as Nagios and Grafana.
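For scripting, the same node-level data can also be pulled through pvesh, the CLI wrapper around the Proxmox API. A minimal sketch, assuming a node named pve1 (substitute your own node name):

```bash
#!/bin/bash
# Query node status through the Proxmox API CLI wrapper (pvesh).
# "pve1" is a placeholder - use your node name as shown in the GUI tree.
NODE="pve1"

# Returns CPU usage, load average, memory, uptime and more as JSON.
pvesh get /nodes/"$NODE"/status --output-format json
```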
Node-Level Metrics in Proxmox VE 9
These metrics represent the overall health of your host (node).
You can find them under:
Datacenter → your node → Summary / Metrics tabs, or via CLI commands such as pveperf.
| Metric | Description | Healthy Range | Notes |
|---|---|---|---|
| CPU Usage (%) | Total CPU load across all cores | <70% sustained | Includes KVM overhead |
| IO Delay (%) | Time CPU spends waiting for disk I/O | <5% | High values = disk bottleneck |
| System Load Average | Average number of running/waiting processes | ≤ number of CPU cores | Higher = CPU or I/O congestion |
| Memory Usage (%) | RAM consumed by OS, VMs, ZFS ARC | <80% | ARC can use large memory portion |
| Swap Usage (%) | Used swap memory | 0–10% | Persistent use = insufficient RAM |
| Network (In/Out) | Bandwidth per NIC or bridge | – | Track for traffic spikes |
| Uptime | Time since last reboot | – | Indicates stability |
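As a rough, scriptable version of the thresholds in the table, the sketch below compares the 1-minute load average with the core count and flags memory usage above 80%. It is only an illustration of the guideline values, not a substitute for real monitoring:

```bash
#!/bin/bash
# Rough node health check using the guideline values from the table above.

CORES=$(nproc)
LOAD1=$(cut -d' ' -f1 /proc/loadavg)

# Memory usage in percent (used / total as reported by free)
MEM_PCT=$(free | awk '/^Mem:/ { printf "%d", $3 / $2 * 100 }')

echo "Load (1m): $LOAD1 on $CORES cores, memory used: ${MEM_PCT}%"

# A load average persistently above the core count points to CPU or I/O congestion
awk -v l="$LOAD1" -v c="$CORES" 'BEGIN { if (l > c) print "WARNING: load exceeds core count" }'

# Memory above ~80% warrants a look at VM allocation and ZFS ARC size
if [ "$MEM_PCT" -gt 80 ]; then
    echo "WARNING: memory usage above 80%"
fi
```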
Disk and Storage Metrics (ZFS, LVM, Ceph)
Storage performance directly impacts VM responsiveness.
You can check metrics under Datacenter → Storage → Summary or via:

```bash
zpool iostat -v 1
iostat -x 1
pvesm status
```
| Metric | Description | Ideal Value | Notes |
|---|---|---|---|
| Read/Write Throughput (MB/s) | Data transfer speed | – | Indicates workload intensity |
| IOPS (Ops/sec) | Input/output operations per second | Higher is better | Depends on disk type |
| Latency (ms/op) | Average delay per I/O operation | <5ms (SSD), <20ms (HDD) | Key indicator for slow storage |
| ZFS ARC Size | ZFS read cache memory usage | – | Boosts read performance |
| ZIL/SLOG Activity | Sync write log operations | – | Add dedicated SLOG for NFS/databases |
| Fragmentation % | Free-space fragmentation in the pool | <50% | High values slow down new writes |
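For a quick look at these numbers from the shell, the following sketch assumes a pool named rpool (replace with your own pool name); zpool iostat -l adds per-vdev latency columns and zpool list shows capacity and fragmentation:

```bash
#!/bin/bash
# Spot-check ZFS pool latency, capacity and fragmentation.
# "rpool" is just an example pool name - substitute your own.
POOL="rpool"

# Per-vdev I/O, bandwidth and average latency, sampled once per second, 5 samples.
# Sustained wait times above ~5 ms (SSD) or ~20 ms (HDD) point to a storage bottleneck.
zpool iostat -v -l "$POOL" 1 5

# Capacity, fragmentation and health at a glance.
zpool list -o name,size,alloc,frag,cap,health "$POOL"
```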
Understanding I/O Pressure Stall in Proxmox VE 9
The Linux kernel shipped with Proxmox includes Pressure Stall Information (PSI), available since kernel 4.20: a mechanism that shows how long processes are delayed due to resource contention.
PSI types:
| Type | File | Meaning |
|---|---|---|
| CPU Pressure | /proc/pressure/cpu | Waiting for CPU time |
| Memory Pressure | /proc/pressure/memory | Waiting for memory reclaim |
| IO Pressure | /proc/pressure/io | Waiting for disk I/O completion |
Check with:

```bash
cat /proc/pressure/io
```
If avg10 (the percentage of time tasks were stalled, averaged over the last 10 seconds) stays above 10–15%, you have an I/O bottleneck, usually caused by slow disks, an overloaded ZFS pool, or too many simultaneous sync writes.
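A small sketch that reads avg10 from /proc/pressure/io and flags sustained I/O pressure, using the rough 10% threshold mentioned above:

```bash
#!/bin/bash
# Warn when the 10-second I/O pressure average crosses a rough 10% threshold.
# The "some" line means at least one task was stalled on I/O;
# the "full" line means all non-idle tasks were stalled at once.

AVG10=$(awk '/^some/ { sub("avg10=", "", $2); print $2 }' /proc/pressure/io)

echo "I/O pressure (some, avg10): ${AVG10}%"

awk -v v="$AVG10" 'BEGIN {
    if (v > 10)
        print "WARNING: sustained I/O stalls - check disk latency and ZFS sync writes"
}'
```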
Virtual Machine & LXC Container Metrics
Each VM and container has its own set of resource usage stats.
You can view them under VM → Summary (or CT → Summary for containers), or via the CLI:

```bash
qm status <vmid> --verbose    # virtual machines
pct status <ctid> --verbose   # LXC containers
```
| Metric | Description | What It Means |
|---|---|---|
| CPU Usage (%) | CPU time used by VM | High = heavy load or runaway process |
| Memory Usage (%) | RAM consumption inside VM | Watch for overcommit or leaks |
| Disk Read/Write (MB/s) | Storage throughput | Indicates storage demand |
| IOPS | I/O operations per second | Useful for database or mail servers |
| Network Traffic (In/Out) | vNIC data flow | Monitor bandwidth per VM |
| Ballooning | Dynamic memory adjustment | Helps reclaim unused memory |
| Uptime | VM running time | Detects reboots or instability |
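To pull the same counters from the shell for every running VM, a loop over qm list works as a quick sketch; the field names printed below (mem, maxmem, netin, netout) are assumed from qm status --verbose output and are worth verifying on your own node:

```bash
#!/bin/bash
# Print basic counters for every running VM on this node.
# Field names are assumed from "qm status --verbose" output - verify locally.

for VMID in $(qm list | awk '$3 == "running" { print $1 }'); do
    echo "--- VM $VMID ---"
    qm status "$VMID" --verbose | grep -E '^(name|cpus|mem|maxmem|netin|netout|uptime):'
done
```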
Cluster Metrics and Quorum Monitoring
Cluster stability is measured by Corosync metrics, visible via:

```bash
pvecm status
corosync-cfgtool -s
```
| Metric | Meaning | Ideal Value |
|---|---|---|
| Quorum Status | Whether majority voting achieved | Must be “Yes” for HA |
| Vote Count | Number of active nodes with votes | Use odd number or QDevice |
| Cluster Latency (ms) | Delay between nodes | <2ms ideal |
| Link Errors / Drops | Packet loss between cluster rings | 0 preferred |
If a two-node cluster loses one node, it loses quorum unless a QDevice (tie-breaker) is configured.
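Quorum can also be checked from a cron job or monitoring agent; a minimal sketch that parses the Quorate line of pvecm status:

```bash
#!/bin/bash
# Alert when the cluster has lost quorum.
# "pvecm status" prints a "Quorate: Yes/No" line in its quorum section.

if pvecm status | grep -q 'Quorate:.*Yes'; then
    echo "Cluster is quorate"
else
    echo "CRITICAL: cluster has lost quorum - HA actions are blocked" >&2
    exit 2
fi
```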
Key Proxmox Monitoring Commands
| Tool / Command | Purpose |
|---|---|
| pveperf | Quick node performance snapshot |
| top / htop | Live CPU and memory usage |
| zpool iostat -v 1 | ZFS pool throughput and latency |
| iostat -x 1 | Disk stats for non-ZFS storage |
| arcstat | ZFS ARC cache efficiency |
| pvecm status | Cluster quorum info |
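The commands above can be combined into a one-shot health snapshot, for example as a small script run by hand or from cron (adapt the sections to your storage and cluster setup):

```bash
#!/bin/bash
# One-shot node health snapshot combining the commands from the table above.

echo "== Node performance =="
pveperf

echo "== Storage =="
# "zpool status -x" only prints pools with problems; silence means healthy.
zpool status -x 2>/dev/null
pvesm status

echo "== I/O pressure =="
cat /proc/pressure/io

echo "== Cluster =="
# Only meaningful on clustered nodes.
pvecm status 2>/dev/null || echo "standalone node or quorum information unavailable"
```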
How to Interpret Common Scenarios
| Symptom | Likely Cause | Metric to Check |
|---|---|---|
| VMs lag or freeze | Disk I/O bottleneck | IO delay, /proc/pressure/io |
| Backups slow down | High storage latency | zpool iostat, iostat -x |
| High memory usage | ZFS ARC expansion | arcstat, free -h |
| No quorum in cluster | Node down / latency | pvecm status |
| GUI sluggishness | CPU or I/O pressure | pveperf, load average |
Summary: Key Takeaways
| Layer | Metric Type | Check With | Healthy Range |
|---|---|---|---|
| Host | CPU, IO Delay, Memory | pveperf, GUI | IO Delay <5% |
| Storage | Latency, IOPS | zpool iostat, iostat | <5ms SSD / <20ms HDD |
| VM | CPU, Memory, Disk IO | GUI, qm status | Depends on workload |
| Cluster | Quorum, Latency | pvecm status | Quorum = Yes |
| PSI | CPU, Memory, IO Stall | /proc/pressure/* | avg10 <10% |
Pro Tips for Optimizing Proxmox VE 9 Performance
- Use SSD or NVMe storage for VM disks and metadata.
- Add a dedicated SLOG for ZFS sync-heavy workloads (e.g. databases).
- Enable ZFS compression (lz4) for faster I/O and better space efficiency.
- Limit concurrent backup and replication jobs.
- Use virtio-scsi with iothreads for high-performance VMs.
- Set the disk I/O scheduler to none for SSDs (echo none > /sys/block/sdX/queue/scheduler), as shown in the sketch below.
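Two of these tips map directly onto commands. A hedged sketch, assuming a pool named rpool and an SSD at /dev/sdX (both placeholders):

```bash
#!/bin/bash
# Examples for two of the tips above. The pool name "rpool" and device "sdX"
# are placeholders - substitute your own before running anything.

# Enable lz4 compression on an existing pool (applies to newly written data only).
zfs set compression=lz4 rpool

# Check, then set, the block I/O scheduler for an SSD backing the pool.
cat /sys/block/sdX/queue/scheduler
echo none > /sys/block/sdX/queue/scheduler

# To persist the scheduler across reboots, a udev rule can be used, e.g. in
# /etc/udev/rules.d/60-ssd-scheduler.rules (example rule, adjust to your devices):
#   ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
```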
Final Thoughts
Proxmox VE 9 offers deep visibility into host, VM, and storage performance — but understanding its metrics is the key to maintaining a stable and responsive cluster.
By monitoring I/O pressure, CPU load, and ZFS latency, you can detect performance issues early, plan hardware upgrades, and ensure your virtual infrastructure remains efficient and reliable.