Metrics Exporter ¶

Exposes Proxmox VE metrics over a Prometheus-compatible HTTP endpoint, ready to be scraped and visualised in Grafana or any compatible monitoring stack. Each enabled cluster gets its own scrape endpoint; collectors are opt-in, individually cached and parallelised.

Features¶

Two-level Enable

Master Enabled switch for the module, plus an independent Prometheus Enabled toggle inside the Prometheus tab. Either disabled returns 503 to scrapes.
Per-cluster Token

Each cluster has its own token, stored encrypted at rest and validated with CryptographicOperations.FixedTimeEquals at scrape time. Rotate it any time from the settings page.
Presets

Three one-click profiles: Fast (essentials only — low API impact), Standard (sensible default), Full (everything on). Each preset can still be tweaked collector by collector.
Parallel Collection

MaxParallelRequests (default 5) controls how many per-node/per-guest fetches run concurrently. Lower it on small or slow clusters; raise it for big clusters with fast APIs.
Per-collector Cache

Every collector has its own CacheSeconds TTL. Slow collectors like S.M.A.R.T. or Subscription can be cached for minutes while keeping fast ones (Node Status) fresh.
API Instrumentation

Optional — exports counters and histograms about the cv4pve-admin → PVE API calls themselves (latency, errors). Handy to spot a misbehaving cluster.

Hot reload

Saving the settings clears the cached engine for that cluster — the next scrape rebuilds it with the new configuration. No service restart needed.

Why¶

Why this exporter when PVE already has metrics?

Cluster-wide in one scrape

A single endpoint covers cluster, every node, every guest. Prometheus doesn't need to talk to each PVE node individually.

Opt-in collectors, no surprises

Disk SMART, replication, HA, balloon memory — each is a switch with its own cache TTL. You pay for what you actually scrape.

Auth without certificates

Each cluster has a per-cluster token validated in constant time. No PVE API token in your Prometheus config, no certificate rotation pain.

Drop-in for Grafana

Standard Prometheus text format with # HELP and # TYPE lines on every metric — your existing dashboards keep working.

Sections¶

Status — health check for the current cluster: the scrape URL (clickable), how many requests have been served, and when the last one happened

Endpoint¶

GET /module/*/metrics-exporter/prometheus/<clusterName>?token=<TOKEN>

<clusterName> — name of the configured Proxmox VE cluster
<TOKEN> — per-cluster token defined in the module settings

A 503 response means the module or the Prometheus exporter is disabled; 401 means the token is wrong; 400 means the cluster name is unknown or disabled.

Grab the URL from the UI

The fully-qualified scrape URL for the current cluster is shown (clickable) in the Status tab — copy it from there instead of building it by hand.

Collectors¶

Grouped per scope; each collector is independently toggleable and has its own cache TTL.

Cluster¶

Collector	What it exposes
High Availability	HA state of cluster manager, groups and resources
Backup Info	Configured backup jobs and per-VM backup coverage

Node¶

Collector	What it exposes
Status	Memory, swap, load average, uptime
Subscription	License status and tier per node (Community / Basic / Standard / Premium / None)
Replication	Per-VM replication jobs, sync status and lag
Disk S.M.A.R.T.	S.M.A.R.T. attributes per physical disk (one API call per disk per node — keep cached)

Guest¶

Collector	What it exposes
QEMU Balloon Memory	Real used memory inside the guest (only on QEMU VMs with the balloon driver enabled)

Connecting Prometheus¶

Minimal prometheus.yml scrape job:

scrape_configs:
  - job_name: cv4pve-admin
    metrics_path: "/module/*/metrics-exporter/prometheus/my-cluster"
    params:
      token: ['REPLACE_WITH_YOUR_TOKEN']
    static_configs:
      - targets: ['cv4pve-admin.example.com']

Quote the metrics_path

Quote metrics_path because the literal * in the URL is a wildcard for the "all clusters" routing context and must be preserved as-is — without quotes some YAML parsers treat it as a special character.

If you have multiple clusters, add one scrape job per cluster (or a single job with __metrics_path__ relabeling from a static list).

Metric Naming¶

All metrics are prefixed with cv4pve_ and grouped by scope. Examples:

cv4pve_cluster_quorate, cv4pve_cluster_nodes
cv4pve_node_cpu_assigned_cores, cv4pve_node_memory_total_bytes, cv4pve_node_disk_health, cv4pve_node_disk_wearout
cv4pve_guest_cpu_usage_ratio, cv4pve_guest_memory_usage_bytes, cv4pve_guest_disk_size_bytes, cv4pve_guest_uptime_seconds, cv4pve_guest_network_receive_bytes_total
cv4pve_ha_state, cv4pve_ha_node_state, cv4pve_ha_quorate
cv4pve_guests_not_backed_up
cv4pve_api_request_duration_seconds, cv4pve_api_request_errors_total (when API instrumentation is on)

For the full, always up-to-date catalogue scrape the endpoint above — every metric carries its own # HELP and # TYPE line in the response.

Settings¶

The default preset on a new module is Fast. Each collector has its own Enabled switch and CacheSeconds TTL (0 = no cache).

Show all settings

Module

Setting	Default	Purpose
Enabled	off	Master on/off switch for the module on the cluster
Token	empty (required)	Per-cluster token (stored encrypted, validated with constant-time compare)

Prometheus (Fast preset)

Setting	Default	Purpose
Enabled	on	Enable the Prometheus scrape endpoint
Max Parallel Requests	5	Max concurrent per-node / per-guest fetches (1 = sequential)
API Instrumentation	off in Fast (on in Standard / Full)	Export counters and histograms about the underlying PVE API calls (latency, errors)

Cluster collectors

Collector	Default in Fast	Default in Standard	Default in Full
High Availability (`Ha`)	on, cache 0s	on, cache 0s	on, cache 30s
Backup Info	on, cache 0s	on, cache 600s	on, cache 600s

Node collectors

Collector	Default in Fast	Default in Standard	Default in Full
Status (memory, swap, load, uptime)	off	on, cache 0s	on, cache 0s
Subscription	off	on, cache 3600s	on, cache 3600s
Replication	off	on, cache 0s	on, cache 60s
Disk S.M.A.R.T.	off	off	on, cache 600s

Guest collectors

Collector	Default in Fast	Default in Standard	Default in Full
QEMU Balloon Memory	off	off	on, cache 0s