---
sidebar_position: 6
---

# Confidential Virtual Machine (CVM)

<div style={{position: 'relative', paddingBottom: '56.25%', height: 0, overflow: 'hidden', marginBottom: '2rem'}}>
  <iframe
    src="https://drive.google.com/file/d/1JOU8BvqGNwtirnelrhI5BQxbhwSjAsLg/preview"
    style={{position: 'absolute', top: 0, left: 0, width: '100%', height: '100%', border: 'none'}}
    allow="autoplay"
    allowFullScreen
  />
</div>

<details>
<summary>**What is CVM and Why Do We Need It?** (background — skip if you already know)</summary>

**The Problem**

In the current Lium platform design, GPU providers have full access to rental containers, which may allow them to:

- Inspect the contents of a rental container and exfiltrate sensitive data (model weights, proprietary code, private keys)
- Modify or replace the node container with a customized version that behaves maliciously
- Intercept network traffic or memory contents of running workloads
- Tamper with job execution in ways that are invisible to validators and renters

This undermines trust in the platform — renters cannot be confident that their workloads are running in a safe, unmodified environment.

**The Solution: CVM with Intel TDX**

A **Confidential Virtual Machine (CVM)** is a hardware-isolated virtual machine powered by [Intel Trust Domain Extensions (TDX)](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html). TDX provides:

- **Memory encryption** — All VM memory is encrypted by the CPU hardware. The host OS and hypervisor cannot read VM memory contents, even with physical access.
- **Hardware attestation** — The VM can generate cryptographically signed quotes (TDX quotes) that prove to a remote party exactly what software is running inside.
- **Measurement integrity** — Any modification to the VM image or boot environment changes the hardware measurements, making tampering detectable.

**How This Protects the Lium Platform**

| Threat | Without CVM | With CVM |
|---|---|---|
| GPU provider inspects rental container | Possible — full host access | Blocked — VM memory is hardware-encrypted |
| GPU provider replaces node container | Possible — container is writable from host | Blocked — TDX measurements detect any tampering |
| Rental container escapes to bare metal | Possible in edge cases | Blocked — rental container is nested inside TDX VM |
| Validator verifies node integrity | Not possible | Possible — TDX quotes provide cryptographic proof |

In short: GPU providers can run the CVM but cannot see inside it. Renters are isolated from bare metal. Validators can cryptographically verify that the node is running exactly the expected software stack.

</details>

---

## Prerequisites

### 1. CPU — Intel TDX Support

Your host machine must have a CPU with [Intel TDX (Trust Domain Extensions)](https://cc-enabling.trustedservices.intel.com/intel-tdx-enabling-guide/01/introduction/) support.

| Generation | Codename | TDX Support |
|---|---|---|
| 4th Gen Intel Xeon Scalable | Sapphire Rapids | XCC and MCC SKUs only |
| 5th Gen Intel Xeon Scalable | Emerald Rapids | All SKUs |
| Intel Xeon 6 | Granite Rapids | All SKUs — also supports [TDX Connect](https://community.intel.com/t5/Blogs/Tech-Innovation/Data-Center/Announcing-Intel-TDX-Connect-Support-on-Intel-Xeon-6/post/1668423) for encrypted CPU↔GPU communication |

> For 4th Gen, verify with your hardware vendor that the specific SKU supports TDX. TDX availability depends on both the CPU SKU and the platform firmware.

In addition to TDX, the CPU must support **Intel SGX**, as the key provider component relies on SGX enclaves. Verify that `/dev/sgx_enclave` and `/dev/sgx_provision` are present on the host after boot.
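
A quick presence check for the device nodes named above:

```bash
# Both SGX device nodes should exist once SGX is enabled in BIOS
ls -l /dev/sgx_enclave /dev/sgx_provision
```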

### 2. GPU — NVIDIA Confidential Compute Support

The GPU must support [NVIDIA Confidential Computing](https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/). Supported architectures are **Hopper** (H100, H200) and **Blackwell** (B200, GB200).

For the full list of supported GPU SKUs, VBIOS versions, CUDA driver versions, and Confidential Computing modes, refer to the official [NVIDIA Secure AI Compatibility Matrix](https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/secure-ai-compatibility-matrix/).

> Consumer-grade GPUs (RTX series, A-series workstation) do **not** support NVIDIA Confidential Computing.

### 3. BIOS / Firmware

1. Update to the **latest BIOS** from your server/motherboard vendor before enabling TDX
2. Enable the following settings in BIOS:
   - **Intel TDX** (Trust Domain Extensions)
   - **Intel SGX** (Software Guard Extensions)
   - **KVM / Virtualization** (VT-x and VT-d)

> Exact BIOS menu paths vary by vendor (Dell, HPE, Supermicro, etc.) — consult your server's platform configuration guide.
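
After rebooting with these settings, the kernel log is a quick way to confirm the features were picked up (exact messages vary by kernel version):

```bash
# Look for TDX/SGX initialization messages; on a TDX-capable kernel you
# should see something like "virt/tdx: module initialized"
sudo dmesg | grep -i -e tdx -e sgx
```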

### 4. Kernel

TDX requires a kernel with Intel TDX support. Install the latest Intel-optimized kernel — the tested version is `6.14.0-1009-intel`.
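
On Ubuntu, one way to install an Intel-optimized kernel is the `linux-image-intel` meta-package (an assumption; confirm the exact package name for your release):

```bash
# Assumes Ubuntu packages the intel kernel flavour as linux-image-intel
sudo apt update
sudo apt install -y linux-image-intel
sudo reboot
```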

Check your current kernel version:

```bash
uname -r
```

The output should match or exceed the tested version, for example:

```
6.14.0-1009-intel
```

### 5. Operating System

1. Install **[Ubuntu 25.04](https://releases.ubuntu.com/25.04/)** on the host machine
2. Ensure KVM is enabled and the `kvm_intel` module is loaded (a quick check is sketched below)
3. Install Docker and Docker Compose
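
A quick way to confirm items 2 and 3 (output details vary by system):

```bash
# KVM module present, with TDX enabled on kvm_intel
lsmod | grep kvm_intel
cat /sys/module/kvm_intel/parameters/tdx   # should print Y (assumes the parameter is exposed)

# Docker and the compose plugin respond
docker --version
docker compose version
```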

---

## Setup Guide

### 1. Check System Compatibility

Run the built-in compatibility check to verify that all required software and kernel features are present:

```bash
./lium-cvm.sh check
```

Fix any errors reported before continuing.

---

### 2. Check vfio-pci Module

The `vfio-pci` kernel module is required to pass GPUs through to the CVM. Verify it is loaded:

```bash
lsmod | grep vfio
```

If the module is not listed, load it manually:

```bash
sudo modprobe vfio-pci
```

To make this persistent across reboots, add it to `/etc/modules`:

```bash
echo "vfio-pci" | sudo tee -a /etc/modules
```

---

### 3. Verify GPU is Bound to vfio-pci

Confirm that the target GPU is using `vfio-pci` as its kernel driver. First, find the PCI address of your GPU:

```bash
lspci | grep -i h200   # adjust the model name as needed
```

Then inspect the specific device (replace `19:00.0` with your PCI address):

```bash
lspci -nnk -s 19:00.0
```

Expected output:

```
19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:2335] (rev a1)
    Subsystem: NVIDIA Corporation Device [10de:18be]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
```

The line `Kernel driver in use: vfio-pci` confirms the GPU is correctly bound. If it shows `nvidia` instead, the GPU is still claimed by the NVIDIA driver and must be rebound to `vfio-pci` before creating the CVM.
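
One common way to rebind a device is via the sysfs `driver_override` mechanism (a sketch; `19:00.0` is an example address, and the NVIDIA driver must not be in use by any process):

```bash
# Release the device from its current driver (ignore errors if already unbound)
echo "0000:19:00.0" | sudo tee /sys/bus/pci/devices/0000:19:00.0/driver/unbind

# Tell the kernel to prefer vfio-pci for this device, then reprobe it
echo "vfio-pci" | sudo tee /sys/bus/pci/devices/0000:19:00.0/driver_override
echo "0000:19:00.0" | sudo tee /sys/bus/pci/drivers_probe
```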

To list all available GPUs and their PCI addresses, you can also use:

```bash
./lium-cvm.sh lsgpu
```

---

### 4. Enable GPU Confidential Computing Mode

Use NVIDIA's [gpu-admin-tools](https://github.com/NVIDIA/gpu-admin-tools) to configure Confidential Computing mode on the GPU.

```bash
git clone https://github.com/NVIDIA/gpu-admin-tools.git
cd gpu-admin-tools
```

Disable Protected PCIe mode first, then enable CC mode:

```bash
# Disable PPCIE mode
sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-ppcie-mode=off --reset-after-ppcie-mode-switch

# Enable CC mode
sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-cc-mode=on --reset-after-cc-mode-switch
```

> Both commands trigger a GPU reset. Run them before the CVM is started.
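
To confirm the mode took effect, recent revisions of the tool document a query flag; treat the flag name as an assumption and verify it against `--help` in your checkout:

```bash
# Assumes --query-cc-settings is available in your gpu-admin-tools revision
sudo python3 ./nvidia_gpu_tools.py --devices gpus --query-cc-settings
```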

---

### 5. Download the OS Image

Download the dstack TDX OS image that the CVM will boot from:

```bash
./lium-cvm.sh download
```

The image is saved to `run/images/` and reused on subsequent runs.

---

### 6. Start the Key Provider

The **key-provider** is an SGX enclave service that supplies sealing keys to the TDX VM. It must be running on the host before the CVM boots.

The service consists of two containers: `aesmd` (Intel SGX architectural enclave service) and `gramine-sealing-key-provider` (the key provider itself), listening on `127.0.0.1:3443`.

1. Navigate to the key-provider directory and start the containers:

```bash
cd key-provider && docker compose up --build -d
```

2. Verify both containers are running:

```bash
docker compose ps
```

3. Check the key-provider logs for errors:

```bash
docker compose logs -f gramine-sealing-key-provider
```

4. Confirm the endpoint is reachable:

```bash
curl -k https://localhost:3443
```

> **Note:** `lium-cvm.sh run` will also attempt to auto-start the key-provider, but starting it manually first is recommended so you can catch build errors early — especially on first setup when the container images need to be built.

---

### 7. Configure Environment and Create the CVM

Copy the example environment file and fill in your settings:

```bash
cp .env.example .env
```

Key fields to configure:

```bash
# Provider identity
MINER_HOTKEY_SS58_ADDRESS=<your_hotkey>

# Ports
SSH_PORT=2200
RENTING_PORT_RANGE="19001,19002,19003"

# CVM resources
CVM_VCPUS=16
CVM_MEMORY=64G
CVM_DISK=200G

# GPU passthrough — use PCI addresses from step 3, or "all" to pass through every GPU
CVM_GPUS=19:00.0,3b:00.0
# CVM_GPUS=all
```

> **Important:** `CVM_GPUS` must list the PCI addresses of GPUs that are already bound to `vfio-pci` (verified in step 3). Using the wrong address or a GPU still bound to the NVIDIA driver will cause the CVM to fail on launch.
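
To double-check which PCI addresses are currently bound to `vfio-pci` before filling in `CVM_GPUS`:

```bash
# Devices bound to vfio-pci appear as 0000:BB:DD.F symlinks in this directory
ls /sys/bus/pci/drivers/vfio-pci/
```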

Once `.env` is configured, create the CVM:

```bash
./lium-cvm.sh new my-executor
```

---

### 8. Run the CVM

Start the CVM:

```bash
./lium-cvm.sh run my-executor
```

To verify the launch command without actually starting the VM, use the dry-run flag:

```bash
./lium-cvm.sh run my-executor --dry-run
```

---

### 9. Check the Dashboard

Once the CVM is running, a logging dashboard is available at:

```
http://<host-ip>:8090
```

Port `8090` must be included in `RENTING_PORT_RANGE` in your `.env` file so it is exposed by the CVM:

```bash
RENTING_PORT_RANGE="8090,19001,19002,19003"
```

The dashboard provides real-time logs and status for the node running inside the CVM.
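
A quick reachability check from another machine (replace `<host-ip>` with your host's address):

```bash
# Expect an HTTP status code once the dashboard is up
curl -s -o /dev/null -w '%{http_code}\n' http://<host-ip>:8090
```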

---

## Troubleshooting

1. [CVM fails to start — `vfio-dev: No such file or directory`](#1-cvm-fails-to-start--vfio-dev-no-such-file-or-directory)

---

### 1. CVM fails to start — `vfio-dev: No such file or directory`

**Symptom**

QEMU exits immediately with an error like:

```
qemu-system-x86_64: -device vfio-pci,host=19:00.0,...: vfio 0000:19:00.0:
vfio /sys/bus/pci/devices/0000:19:00.0/vfio-dev: couldn't open directory
/sys/bus/pci/devices/0000:19:00.0/vfio-dev: No such file or directory
```

**Cause**

The kernel was not booted with the parameters required to enable Intel IOMMU and bind the GPU to `vfio-pci` at boot time. Without `intel_iommu=on`, the kernel does not create the `vfio-dev` sysfs entry even if `vfio-pci` is loaded.
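
You can confirm whether the IOMMU came up by searching the kernel log (message wording varies by kernel):

```bash
# With intel_iommu=on you should see DMAR/IOMMU initialization messages
sudo dmesg | grep -i -e dmar -e iommu | head
```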

**Solution**

1. Find the PCI device IDs of your NVIDIA GPUs:

```bash
lspci -nn | grep -i nvidia
```

Example output:

```
19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:2335] (rev a1)
3b:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:22a3] (rev a1)
```

Note the IDs in brackets — e.g. `10de:2335,10de:22a3`.

2. Edit the GRUB configuration at `/etc/default/grub`:

```bash
sudo nano /etc/default/grub
```

Set the following lines (replace the `vfio-pci.ids` values with the IDs from step 1):

```
GRUB_CMDLINE_LINUX_DEFAULT="kvm_intel.tdx=on nohibernate intel_iommu=on video=efifb:off vfio_iommu_type1.dma_entry_limit=1048576 vfio-pci.ids=10de:2335,10de:22a3 default_hugepagesz=1G hugepagesz=1G hugepages=10"
GRUB_CMDLINE_LINUX="console=tty0"
```

3. Apply the changes and reboot:

```bash
sudo update-grub
sudo reboot
```

4. After reboot, verify the GPU is bound to `vfio-pci` (see [Setup Guide step 3](#3-verify-gpu-is-bound-to-vfio-pci)).
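
You can also confirm that the kernel picked up the new parameters:

```bash
# The options set in GRUB_CMDLINE_LINUX_DEFAULT should appear here
cat /proc/cmdline | tr ' ' '\n' | grep -e iommu -e vfio -e tdx
```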
