Troubleshooting
When a node is online but earning nothing, or a freshly installed node is failing validator verification, start here. This page covers the two diagnostic surfaces providers reach for first:
- Grafana Job Logs: the canonical place to read raw validator-side error messages for a specific node.
- The verifyx-test benchmark: runs the same network, storage, and memory checks the validator runs, on demand.
If neither narrows down the problem, the Provider Portal monitoring view and the full Grafana suite cover the rest of the day-2 telemetry.
1. Pull error details from Grafana Job Logs
The Job Logs dashboard is the per-node scoring log. Every validator run for your node (successful, failed, or unreachable) lands here with the exact error message the validator produced.
Open the dashboard with your filters
- Go to grafana.lium.io → Job Logs.
- In the top toolbar, set the two template variables:
  - miner_hotkey: your provider SS58 hotkey (the one registered on subnet 51).
  - executor_id: the UUID of the specific node you're debugging. You can copy this from the Provider Portal node detail page or from provider.lium.io/executors/{executor-id}.
- Set the time range (top-right) to cover the period you care about. Default is the last 1 hour; widen it if your node has been silent.
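If you debug the same node repeatedly, a bookmarked link with both filters and the time range already applied saves a few clicks. The sketch below assumes Grafana's standard URL conventions (var-<name> query parameters for template variables, from/to for the time range). job_logs_url is a hypothetical helper, and because this page does not give the dashboard's full URL, you pass in whatever your browser shows after opening Job Logs (without its query string).

```shell
# Hypothetical helper: print a pre-filtered Job Logs link.
# Assumes Grafana's URL conventions: var-<name>=<value> sets a template
# variable, from/to set the time range (relative values like now-24h work).
#   $1 = Job Logs dashboard URL copied from your browser, no query string
#   $2 = miner_hotkey (SS58)   $3 = executor_id (UUID)
#   $4 = start of time range (optional, defaults to the dashboard's 1 hour)
job_logs_url() {
  base=$1 hotkey=$2 executor=$3 from=${4:-now-1h}
  # SS58 addresses and UUIDs are URL-safe, so no percent-encoding is needed.
  printf '%s?var-miner_hotkey=%s&var-executor_id=%s&from=%s&to=now\n' \
    "$base" "$hotkey" "$executor" "$from"
}
```

For example, job_logs_url "<dashboard URL>" "<your SS58 hotkey>" "<executor UUID>" now-24h gives a link covering the last day for that node.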
You can also reach the same view without leaving the portal: each node's detail page has a Grafana tab that pre-fills both variables for that one node (see Monitoring Nodes → Grafana Integration).
Read the three log feeds
The dashboard shows three separate panels, all keyed by the same miner_hotkey and executor_id:
| Panel | When it has rows | What to look for |
|---|---|---|
| Job Logs | Every scoring run | The score the validator gave you, the uptime, and any non-fatal warnings. |
| Job Error Logs (Score = 0 AND GPU > 0) | The node was reachable but failed scoring | Synthetic-job errors, missing sysbox-runc runtime, GPU verification failures, network-metric failures. |
| Machine scrape Error Logs (Score = 0 AND GPU = 0) | The validator could not even read your machine | SSH / auth failures, unreachable host, no GPU detected, port closed. |
Triage rule of thumb:
- Rows in panel 3 → it is a connectivity or configuration problem (network, firewall, SSH, machine offline). Fix the host before doing anything else.
- Rows in panel 2 → the host is up but a specific check is failing. Read the error message; for verifyx network or storage failures, jump to §2 and reproduce locally.
- Rows in panel 1 with non-zero scores → your node is being scored normally; revenue concerns belong in Rewards and the rewards FAQ.
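The rule of thumb is an ordered check: any rows in panel 3 outrank panel 2, which outranks panel 1. A minimal sketch of that ordering as a shell function; triage is a hypothetical name, and its three arguments are the row counts you read off panels 1, 2, and 3 for your chosen time range.

```shell
# Hypothetical helper encoding the triage order above.
#   $1 = rows in panel 1 (Job Logs)
#   $2 = rows in panel 2 (Job Error Logs, score = 0 and GPU > 0)
#   $3 = rows in panel 3 (Machine scrape Error Logs, score = 0 and GPU = 0)
triage() {
  p1=$1 p2=$2 p3=$3
  if [ "$p3" -gt 0 ]; then
    # The validator could not read the machine at all: host problem first.
    echo "panel 3: fix host connectivity (network, firewall, SSH) first"
  elif [ "$p2" -gt 0 ]; then
    # Host reachable, but a specific check failed.
    echo "panel 2: read the error; reproduce network/storage failures with verifyx-test"
  elif [ "$p1" -gt 0 ]; then
    # Only regular scoring rows: the node is being scored normally.
    echo "panel 1: scored normally; see Rewards"
  else
    # Nothing in range: the node may have been silent longer than the window.
    echo "no rows: widen the time range"
  fi
}
```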
2. Run the verifyx benchmark to reproduce validator checks
When Job Logs panel 2 reports a network or storage failure, the fastest way to confirm it and iterate on a fix is to run the validator's own checks locally. We ship them as a Docker image, daturaai/verifyx-test, that runs on the node host itself. Run it before registering a new node, and re-run it any time validator-side network or storage scoring drops.
Quick start (network only)
```shell
docker run --gpus all --rm daturaai/verifyx-test:latest
```
Runs a 10-iteration network benchmark and prints download and upload speeds. Finishes in roughly 2–3 minutes.
Full benchmark (network + storage + memory)
```shell
docker run --gpus all --rm daturaai/verifyx-test:latest python3 benchmark.py --full
```
Runs all three test suites. Expect 5–10 minutes due to the 5 GB storage I/O test.
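Since the advice is to re-run the benchmark whenever scoring drops, it helps to keep each run's output so you can compare before and after a fix. A minimal sketch, assuming a POSIX shell with tee available; run_verifyx and the DRY_RUN switch are hypothetical, and the two docker commands are exactly the ones shown above.

```shell
# Hypothetical wrapper: run verifyx-test and save a timestamped copy of the output.
# Usage: run_verifyx [--full]   (DRY_RUN=1 prints the command instead of running it)
run_verifyx() {
  log="verifyx-$(date +%Y%m%dT%H%M%S).log"
  # Rebuild this function's positional parameters as the docker command line.
  if [ "${1:-}" = "--full" ]; then
    set -- docker run --gpus all --rm daturaai/verifyx-test:latest python3 benchmark.py --full
  else
    set -- docker run --gpus all --rm daturaai/verifyx-test:latest
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
    return 0
  fi
  # Show progress live and keep a copy. Note: without pipefail, the exit
  # status returned here is tee's, not docker's.
  "$@" 2>&1 | tee "$log"
}
```

For example, run run_verifyx --full before and after a firewall or disk change, then diff the two verifyx-*.log files.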
What is tested
| Test | What it measures | Minimum requirement |
|---|---|---|
| Network download | PyPI package download speed | 50 Mbps |
| Network upload | Cloudflare speedtest upload | None specified |
| Storage write | Sequential write throughput (5 GB) | 100 GB free space |
| Storage read | Sequential read throughput (5 GB) | None specified |
| Memory allocation | Allocates 75% of available RAM (8–128 GB range) | 8 GB RAM |
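Two of these minimums (8 GB RAM, 100 GB free disk) can be checked on the host before pulling the image. A sketch under stated assumptions: Linux with /proc/meminfo, a POSIX df, decimal GB, and /var/lib/docker as the disk to check because container writes usually land in Docker's data root. check_minimums and preflight are hypothetical names, and this is not the validator's own check.

```shell
# Hypothetical preflight against the table's minimums. Assumes decimal GB.
MIN_RAM_GB=8
MIN_FREE_DISK_GB=100

# Pure comparison: $1 = RAM in GB, $2 = free disk in GB. Returns 1 on any failure.
check_minimums() {
  ram_gb=$1 disk_gb=$2 rc=0
  if [ "$ram_gb" -ge "$MIN_RAM_GB" ]; then
    echo "RAM OK (${ram_gb} GB)"
  else
    echo "RAM FAIL (${ram_gb} GB < ${MIN_RAM_GB} GB)"; rc=1
  fi
  if [ "$disk_gb" -ge "$MIN_FREE_DISK_GB" ]; then
    echo "Disk OK (${disk_gb} GB free)"
  else
    echo "Disk FAIL (${disk_gb} GB free < ${MIN_FREE_DISK_GB} GB)"; rc=1
  fi
  return "$rc"
}

# Reads live values. $1 = path on the disk to check (default: /var/lib/docker).
preflight() {
  path=${1:-/var/lib/docker}
  # MemTotal is reported in KiB; convert to whole decimal GB.
  ram_gb=$(awk '/^MemTotal:/ { print int($2 * 1024 / 1000000000) }' /proc/meminfo)
  # df -P -k: POSIX output, 1024-byte blocks; column 4 is "Available".
  disk_gb=$(df -P -k "$path" | awk 'NR == 2 { print int($4 * 1024 / 1000000000) }')
  check_minimums "$ram_gb" "$disk_gb"
}
```

If your Docker data root is elsewhere, pass it explicitly, for example preflight "$(docker info --format '{{.DockerRootDir}}')".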
Interpreting results
The benchmark runs 10 iterations and reports an Exponential Moving Average (EMA) with alpha = 0.3 (decay = 0.7), so recent runs carry more weight. The final printed values are the EMA-smoothed numbers; a single slow run will not dominate the result.
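To see why one slow iteration barely moves the final figure, here is the smoothing step in isolation. This is a sketch, not the benchmark's code: it assumes the EMA is seeded with the first measurement and then updated as ema = alpha * new + (1 - alpha) * ema. ema is a hypothetical helper name.

```shell
# Sketch of EMA smoothing with a configurable alpha (the benchmark uses 0.3).
# Assumption: seeded with the first value; the real benchmark may seed differently.
# Usage: ema ALPHA VALUE1 VALUE2 ...   (values in run order, oldest first)
ema() {
  alpha=$1; shift
  printf '%s\n' "$@" | awk -v a="$alpha" '
    NR == 1 { e = $1; next }          # seed with the first run
    { e = a * $1 + (1 - a) * e }      # new run weighs alpha, history weighs 1 - alpha
    END { printf "%.2f\n", e }'
}
```

With ten runs where only the fifth drops from 900 to 180 Mbps, ema 0.3 900 900 900 900 180 900 900 900 900 900 prints 863.70: the dip is still visible, but five later runs have largely washed it out.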
The last section of the output also shows your hardware stats (total RAM, free disk space, utilization) from the most recent successful run.
Prerequisites
Your host must have the NVIDIA Container Toolkit installed so Docker can pass GPU access through the --gpus all flag. If you followed the Node setup guide, this is already configured.
```shell
# Verify the toolkit is installed: this should print the nvidia-smi GPU table
docker run --gpus all --rm nvidia/cuda:12.8.1-base-ubuntu22.04 nvidia-smi
```
3. Common issues
| Symptom (Job Logs message or benchmark output) | Likely cause | Fix |
|---|---|---|
| Download speed below 50 Mbps | ISP throttling or congested uplink | Contact your hosting provider; consider a dedicated port. |
| Storage test fails: not enough free space | Less than 100 GB free on the node disk | Free space or resize the volume. See Docker storage. |
| Memory allocation fails | Less than 8 GB RAM available to Docker | Reduce other processes consuming RAM; upgrade RAM if needed. |
| --gpus all flag not recognised | NVIDIA Container Toolkit not installed | Follow the Node setup guide. |
| Job Logs panel 3 has rows; SSH / auth errors | Validator cannot reach the node | Check firewall, the node's public port, SSH service, and the credentials registered in the Provider Portal. |
| Job Logs panel 2 mentions sysbox-runc | Sysbox runtime missing or not the default | Install Sysbox (see Sysbox setup). It is required; validators reject any node without it. |
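For the sysbox-runc row, docker info can confirm both halves of the likely cause: that Docker has the runtime registered and that it is the default. A sketch; check_sysbox_runtime and check_sysbox_live are hypothetical names, while .DefaultRuntime and .Runtimes are fields that docker info exposes to --format templates.

```shell
# Hypothetical check for the sysbox-runc row in the table above.
#   $1 = Docker's default runtime   $2 = space-separated registered runtime names
check_sysbox_runtime() {
  default_rt=$1 runtimes=$2
  # Pad with spaces so the match is on whole names only.
  case " $runtimes " in
    *" sysbox-runc "*) ;;
    *) echo "FAIL: sysbox-runc runtime not registered with Docker"; return 1 ;;
  esac
  if [ "$default_rt" != "sysbox-runc" ]; then
    echo "FAIL: default runtime is $default_rt, not sysbox-runc"; return 1
  fi
  echo "OK: sysbox-runc is registered and is the default runtime"
}

# Reads both values from the local Docker daemon and runs the check.
check_sysbox_live() {
  check_sysbox_runtime \
    "$(docker info --format '{{.DefaultRuntime}}')" \
    "$(docker info --format '{{range $name, $rt := .Runtimes}}{{$name}} {{end}}')"
}
```

Run check_sysbox_live on the node host; either FAIL line points back to the Sysbox setup guide.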
Next steps
- Grafana Dashboards: full dashboard catalogue beyond Job Logs (Penalty Events, Weights, GPU Demand Analytics).
- Provider Portal → Monitoring: portal-side status indicators and the embedded Grafana tab.
- Node Quickstart: full setup, if you suspect a misconfiguration at the host level.