Troubleshooting

When a node is online but earning nothing, or a freshly installed node is failing validator verification, start here. This page covers the two diagnostic surfaces providers reach for first:

Grafana Job Logs — the canonical place to read raw validator-side error messages for a specific node.
The verifyx-test benchmark — runs the same network, storage, and memory checks the validator runs, on demand.

If neither narrows down the problem, the Provider Portal monitoring view and the full Grafana suite cover the rest of the day-2 telemetry.

1. Pull error details from Grafana Job Logs

The Job Logs dashboard is the per-node scoring log. Every validator run for your node — successful, failed, or unreachable — lands here with the exact error message the validator produced.

Open the dashboard with your filters

Go to grafana.lium.io → Job Logs.
In the top toolbar, set the two template variables:
- miner_hotkey — your provider SS58 hotkey (the one registered on subnet 51).
- executor_id — the UUID of the specific node you're debugging. You can copy this from the Provider Portal node detail page or from provider.lium.io/executors/{executor-id}.
Set the time range (top-right) to cover the period you care about. The default is the last hour; widen it if your node has been silent.

You can also reach the same view without leaving the portal — each node's detail page has a Grafana tab that pre-fills both variables for that one node (see Monitoring Nodes → Grafana Integration).

Read the three log feeds

The dashboard shows three separate panels, all keyed by the same miner_hotkey + executor_id:

Panel	When it has rows	What to look for
Job Logs	Every scoring run	The score the validator gave you, the uptime, and any non-fatal warnings.
Job Error Logs (Score = 0 AND GPU > 0)	The node was reachable but failed scoring	Synthetic-job errors, missing `sysbox-runc` runtime, GPU verification failures, network-metric failures.
Machine scrape Error Logs (Score = 0 AND GPU = 0)	The validator could not even read your machine	SSH / auth failures, unreachable host, no GPU detected, port closed.

Triage rule of thumb:

Rows in Machine scrape Error Logs (panel 3) → it is a connectivity or configuration problem (network, firewall, SSH, machine offline). Fix the host before doing anything else.
Rows in Job Error Logs (panel 2) → the host is up but a specific check is failing. Read the error message; for verifyx network or storage failures, jump to §2 and reproduce locally.
Rows in Job Logs (panel 1) with non-zero scores → your node is being scored normally; revenue concerns belong in Rewards and the rewards FAQ.
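
The triage rule can be written down as a small decision function. This is only a sketch of the logic implied by the panel titles (Score = 0 AND GPU > 0, Score = 0 AND GPU = 0); the names `triage`, `score`, and `gpu_count` are hypothetical and are not fields exported by the dashboard.

```python
def triage(score: float, gpu_count: int) -> str:
    """Map one validator run to the next troubleshooting step.

    Assumption: `score` and `gpu_count` are the values the panel
    titles filter on; the names are illustrative only.
    """
    if score == 0 and gpu_count == 0:
        # Would appear in Machine scrape Error Logs (panel 3).
        return "fix host connectivity or configuration"
    if score == 0 and gpu_count > 0:
        # Would appear in Job Error Logs (panel 2).
        return "read the error message and reproduce with verifyx"
    # Non-zero score: the run appears only in Job Logs (panel 1).
    return "node is being scored normally"
```

Checking the machine-scrape case first mirrors the advice above: a host the validator cannot read has to be fixed before any individual check is worth debugging.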

2. Run the verifyx benchmark to reproduce validator checks

When the Job Error Logs panel (panel 2) reports a network or storage failure, the fastest way to confirm and iterate on a fix is to run the validator's own checks locally. We ship them as a Docker image, daturaai/verifyx-test, that runs on the node host itself. Run it before registering a new node, and re-run it any time validator-side network or storage scoring drops.

Quick start (network only)

docker run --gpus all --rm daturaai/verifyx-test:latest

Runs a 10-iteration network benchmark and prints download and upload speeds. Finishes in roughly 2–3 minutes.

Full benchmark (network + storage + memory)

docker run --gpus all --rm daturaai/verifyx-test:latest python3 benchmark.py --full

Runs all three test suites. Expect 5–10 minutes due to the 5 GB storage I/O test.

What is tested

Test	What it measures	Minimum requirement
Network download	PyPI package download speed	50 Mbps
Network upload	Cloudflare speedtest upload	—
Storage write	Sequential write throughput (5 GB)	100 GB free space
Storage read	Sequential read throughput (5 GB)	—
Memory allocation	Allocates 75 % of available RAM (8–128 GB range)	8 GB RAM
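
If you record the benchmark numbers, a few lines of Python can flag which minimums from the table are missed. This is a hedged sketch: the function name, its parameters, and the choice to treat a value exactly at the minimum as passing are assumptions, and the benchmark itself may apply the thresholds differently.

```python
# Minimums copied from the "What is tested" table.
MIN_DOWNLOAD_MBPS = 50
MIN_FREE_DISK_GB = 100
MIN_RAM_GB = 8


def failed_minimums(download_mbps: float, free_disk_gb: float, ram_gb: float) -> list[str]:
    """Return the names of the checks whose minimum is not met.

    Assumption: a value equal to the minimum counts as a pass.
    """
    failures = []
    if download_mbps < MIN_DOWNLOAD_MBPS:
        failures.append("network download")
    if free_disk_gb < MIN_FREE_DISK_GB:
        failures.append("storage")
    if ram_gb < MIN_RAM_GB:
        failures.append("memory")
    return failures
```

An empty list means none of the three documented minimums is the reason for a failed run, so the cause is more likely elsewhere (for example, GPU passthrough or connectivity).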

Interpreting results

The benchmark runs 10 iterations and reports an Exponential Moving Average (EMA) with alpha = 0.3 (decay = 0.7), so recent runs carry more weight. The final printed values are the EMA-smoothed numbers — a single slow run will not dominate the result.
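
To see why one slow run does not dominate, here is a minimal sketch of EMA smoothing with alpha = 0.3. Seeding the average with the first sample is an assumption; the benchmark's actual initialisation is not documented here, and the function name `ema` is illustrative.

```python
def ema(samples: list[float], alpha: float = 0.3) -> float:
    """Exponential moving average over samples, oldest first.

    Each new sample gets weight `alpha`; the running average keeps
    weight `1 - alpha` (the 0.7 decay). Assumption: the first sample
    seeds the average.
    """
    if not samples:
        raise ValueError("need at least one sample")
    smoothed = samples[0]
    for value in samples[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


if __name__ == "__main__":
    # Nine runs at 100 Mbps followed by one slow run at 20 Mbps.
    speeds = [100.0] * 9 + [20.0]
    print(f"EMA: {ema(speeds):.1f} Mbps, plain mean: {sum(speeds) / len(speeds):.1f} Mbps")
```

With these inputs the smoothed value lands near 76 Mbps: the final slow run pulls the result down by 30 % of the gap, not all the way to 20 Mbps.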

The last section of the output also shows your hardware stats (total RAM, free disk space, utilization) from the most recent successful run.

Prerequisites

Your host must have the NVIDIA Container Toolkit installed so Docker can pass GPU access through the --gpus all flag. If you followed the Node setup guide, this is already configured.

# Verify the toolkit is installed
docker run --gpus all --rm nvidia/cuda:12.8.1-base-ubuntu22.04 nvidia-smi

3. Common issues

Symptom (Job Logs message or benchmark output)	Likely cause	Fix
Download speed below 50 Mbps	ISP throttling or congested uplink	Contact your hosting provider; consider a dedicated port.
Storage test fails — not enough free space	Less than 100 GB free on the node disk	Free space or resize the volume. See Docker storage.
Memory allocation fails	Less than 8 GB RAM available to Docker	Reduce other processes consuming RAM; upgrade RAM if needed.
`--gpus all` flag not recognised	NVIDIA Container Toolkit not installed	Follow the Node setup guide.
Machine scrape Error Logs (panel 3) has rows; SSH / auth errors	Validator cannot reach the node	Check firewall, the node's public port, SSH service, and the credentials registered in the Provider Portal.
Job Error Logs (panel 2) mentions `sysbox-runc`	Sysbox runtime missing or not the default	Install Sysbox (see Sysbox setup). It is required; validators reject any node without it.
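
For the unreachable-node case, a quick TCP connect test from a machine outside the node's network shows whether the public port is open at all. This is a generic sketch, not the validator's own probe; the address `203.0.113.10` and port `22` below are placeholders, so substitute the IP and port registered for your node.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address and port: replace with your node's values.
    print(port_is_open("203.0.113.10", 22))
```

A `False` result points at the firewall, port forwarding, or the service not listening. A `True` result only proves the port accepts connections, so SSH credentials still need checking separately.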

Next steps

Grafana Dashboards — full dashboard catalogue beyond Job Logs (Penalty Events, Weights, GPU Demand Analytics).
Provider Portal — Monitoring — portal-side status indicators and embedded Grafana tab.
Node Quickstart — full setup if you suspect a misconfiguration at the host level.
