---
sidebar_position: 8
---

> ## Documentation Index
> Fetch the complete documentation index at: https://docs.lium.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Troubleshooting

When a node is online but earning nothing, or a freshly installed node is failing validator verification, start here. This page covers the two diagnostic surfaces providers reach for first:

1. **Grafana Job Logs** — the canonical place to read raw validator-side error messages for a specific node.
2. **The `verifyx-test` benchmark** — runs the same network, storage, and memory checks the validator runs, on demand.

If neither narrows down the problem, the [Provider Portal monitoring view](./portal/monitoring.md) and the [full Grafana suite](./grafana.md) cover the rest of the day-2 telemetry.

## 1. Pull error details from Grafana Job Logs

The [**Job Logs** dashboard](https://grafana.lium.io/d/aejriu31349hcb/job-logs) is the per-node scoring log. Every validator run for your node — successful, failed, or unreachable — lands here with the exact error message the validator produced.

### Open the dashboard with your filters

1. Go to [grafana.lium.io → Job Logs](https://grafana.lium.io/d/aejriu31349hcb/job-logs).
2. In the top toolbar, set the two template variables:
   - **`miner_hotkey`** — your provider SS58 hotkey (the one registered on subnet 51).
   - **`executor_id`** — the UUID of the specific node you're debugging. You can copy this from the [Provider Portal node detail page](./portal/managing-nodes.md) or from `provider.lium.io/executors/{executor-id}`.
3. Set the time range (top-right) to cover the period you care about. The default is the last hour; widen it if your node has been silent for longer.

You can also reach the same view without leaving the portal — each node's detail page has a **Grafana** tab that pre-fills both variables for that one node (see [Monitoring Nodes → Grafana Integration](./portal/monitoring.md#2-grafana-integration)).
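
If you want a bookmarkable link instead, Grafana's standard `var-<name>` query parameters can pre-fill the same template variables; a sketch with placeholder values:

```
https://grafana.lium.io/d/aejriu31349hcb/job-logs?var-miner_hotkey=<YOUR_HOTKEY>&var-executor_id=<EXECUTOR_UUID>&from=now-24h&to=now
```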

### Read the three log feeds

The dashboard shows three separate panels, all keyed by the same `miner_hotkey` + `executor_id`:

| Panel | When it has rows | What to look for |
|---|---|---|
| **Job Logs** | Every scoring run | The score the validator gave you, the uptime, and any non-fatal warnings. |
| **Job Error Logs (Score = 0 AND GPU > 0)** | The node was reachable but failed scoring | Synthetic-job errors, missing `sysbox-runc` runtime, GPU verification failures, network-metric failures. |
| **Machine scrape Error Logs (Score = 0 AND GPU = 0)** | The validator could not even read your machine | SSH / auth failures, unreachable host, no GPU detected, port closed. |

**Triage rule of thumb:**

- Rows in panel 3 → the problem is connectivity or configuration (network, firewall, SSH, machine offline). Fix the host before doing anything else; a quick reachability sketch follows this list.
- Rows in panel 2 → the host is up but a specific check is failing. Read the error message; for verifyx network or storage failures, jump to [§2](#2-run-the-verifyx-benchmark-to-reproduce-validator-checks) and reproduce locally.
- Rows in panel 1 with non-zero scores → your node is being scored normally; revenue concerns belong in [Rewards](./rewards/index.mdx) and the [rewards FAQ](./rewards/faq.mdx).
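
For panel-3 failures, two standard checks run from any outside machine will usually confirm the diagnosis. A minimal sketch; the address, port, and user are placeholders for the values registered for your node:

```bash
# Is the node's SSH port reachable at all?
nc -zv <node-public-ip> <ssh-port>

# Does SSH authentication succeed, and is a GPU visible once logged in?
ssh -p <ssh-port> <user>@<node-public-ip> nvidia-smi
```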

## 2. Run the verifyx benchmark to reproduce validator checks

When the Job Logs panel 2 reports a network or storage failure, the fastest way to confirm and iterate on a fix is to run the validator's own checks locally. We ship them as a Docker image, `daturaai/verifyx-test`, that runs on the node host itself. Run it before registering a new node, and re-run it any time validator-side network or storage scoring drops.

### Quick start (network only)

```bash
docker run --gpus all --rm daturaai/verifyx-test:latest
```

Runs a 10-iteration network benchmark and prints download and upload speeds. Finishes in roughly 2–3 minutes.
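
If the reported download speed looks implausibly low, you can cross-check the raw link independently of verifyx. A rough sketch using `curl` against Cloudflare's public speed-test endpoint (not the validator's exact method):

```bash
# Pull ~100 MB and print the measured throughput in bytes per second
curl -s -o /dev/null -w 'download: %{speed_download} bytes/s\n' \
  'https://speed.cloudflare.com/__down?bytes=100000000'
```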

### Full benchmark (network + storage + memory)

```bash
docker run --gpus all --rm daturaai/verifyx-test:latest python3 benchmark.py --full
```

Runs all three test suites. Expect 5–10 minutes due to the 5 GB storage I/O test.
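
If the storage numbers look off, a plain `dd` run approximates the sequential-write half of the test; this is a sketch, not the benchmark's exact I/O pattern:

```bash
# Write 5 GB sequentially, bypassing the page cache. Target the disk the node
# actually uses, not a tmpfs mount (on some distros /tmp lives in RAM).
dd if=/dev/zero of=/var/tmp/verifyx-probe bs=1M count=5120 oflag=direct status=progress
rm -f /var/tmp/verifyx-probe
```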

### What is tested

| Test | What it measures | Minimum requirement |
|------|-----------------|---------------------|
| Network download | PyPI package download speed | **50 Mbps** |
| Network upload | Cloudflare speedtest upload | — |
| Storage write | Sequential write throughput (5 GB) | **100 GB free space** |
| Storage read | Sequential read throughput (5 GB) | — |
| Memory allocation | Allocates 75 % of available RAM (8–128 GB range) | **8 GB RAM** |
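
You can sanity-check the storage and memory minimums before a full run with standard tools; a quick sketch, assuming Docker's default data root:

```bash
# At least 100 GB free on the disk backing Docker?
df -h /var/lib/docker

# At least 8 GB of RAM?
free -g
```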

### Interpreting results

The benchmark runs **10 iterations** and reports an Exponential Moving Average (EMA) with `alpha = 0.3` (decay = 0.7), so recent runs carry more weight. The final printed values are the EMA-smoothed numbers — a single slow run will not dominate the result.
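
Concretely, with $x_t$ the speed measured on iteration $t$, each update folds the new sample into the running average:

$$
\text{EMA}_t = 0.3\,x_t + 0.7\,\text{EMA}_{t-1}
$$

Assuming the average is seeded with the first sample, by the tenth iteration that first sample's weight has decayed to $0.7^{9} \approx 0.04$.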

The last section of the output also shows your hardware stats (total RAM, free disk space, utilization) from the most recent successful run.

### Prerequisites

Your host must have the **NVIDIA Container Toolkit** installed so Docker can pass GPU access through the `--gpus all` flag. If you followed the [Node setup guide](./nodes/quickstart.md), this is already configured.

```bash
# Verify the toolkit is installed
docker run --gpus all --rm nvidia/cuda:12.8.1-base-ubuntu22.04 nvidia-smi
```
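
If that check fails, the usual fix on Ubuntu follows NVIDIA's documented steps, sketched below on the assumption that NVIDIA's apt repository is already configured; the [Node setup guide](./nodes/quickstart.md) remains the canonical reference:

```bash
# Install the toolkit, register it with Docker, and restart the daemon
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```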

## 3. Common issues

| Symptom (Job Logs message or benchmark output) | Likely cause | Fix |
|---|---|---|
| Download speed below 50 Mbps | ISP throttling or congested uplink | Contact your hosting provider; consider a dedicated port. |
| Storage test fails — not enough free space | Less than 100 GB free on the node disk | Free space or resize the volume. See [Docker storage](./nodes/docker-storage.md). |
| Memory allocation fails | Less than 8 GB RAM available to Docker | Reduce other processes consuming RAM; upgrade RAM if needed. |
| `--gpus all` flag not recognised | NVIDIA Container Toolkit not installed | Follow the [Node setup guide](./nodes/quickstart.md). |
| Job Logs panel 3 shows SSH / auth errors | Validator cannot reach the node | Check the firewall, the node's public port, the SSH service, and the credentials registered in the Provider Portal. |
| Job Logs panel 2 mentions `sysbox-runc` | Sysbox runtime missing or not the default | Install Sysbox — see [Sysbox setup](./nodes/sysbox.md). It is required; validators reject any node without it. |
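
For the `sysbox-runc` row, you can confirm locally whether Docker knows about the runtime at all; a quick check:

```bash
# sysbox-runc should appear among the registered runtimes
docker info --format '{{json .Runtimes}}'
```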

## Next steps

- [Grafana Dashboards](./grafana.md) — full dashboard catalogue beyond Job Logs (Penalty Events, Weights, GPU Demand Analytics).
- [Provider Portal — Monitoring](./portal/monitoring.md) — portal-side status indicators and embedded Grafana tab.
- [Node Quickstart](./nodes/quickstart.md) — full setup if you suspect a misconfiguration at the host level.
