
Confidential Virtual Machine (CVM)

What is CVM and Why Do We Need It?​

The Problem​

In the current Lium platform design, GPU providers have full access to rental containers, which may allow a GPU provider to:

  • Inspect the contents of a rental container and exfiltrate sensitive data (model weights, proprietary code, private keys)
  • Modify or replace the executor container with a customized version that behaves maliciously
  • Intercept network traffic or memory contents of running workloads
  • Tamper with job execution in ways that are invisible to validators and renters

This undermines trust in the platform — renters cannot be confident that their workloads are running in a safe, unmodified environment.

The Solution: CVM with Intel TDX​

A Confidential Virtual Machine (CVM) is a hardware-isolated virtual machine powered by Intel Trust Domain Extensions (TDX). TDX provides:

  • Memory encryption — All VM memory is encrypted by the CPU hardware. The host OS and hypervisor cannot read VM memory contents, even with physical access.
  • Hardware attestation — The VM can generate cryptographically signed quotes (TDX quotes) that prove to a remote party exactly what software is running inside.
  • Measurement integrity — Any modification to the VM image or boot environment changes the hardware measurements, making tampering detectable.

How This Protects the Lium Platform​

| Threat | Without CVM | With CVM |
| --- | --- | --- |
| GPU provider inspects rental container | Possible — full host access | Blocked — VM memory is hardware-encrypted |
| GPU provider replaces executor container | Possible — container is writable from host | Blocked — TDX measurements detect any tampering |
| Rental container escapes to bare metal | Possible in edge cases | Blocked — rental container is nested inside the TDX VM |
| Validator verifies executor integrity | Not possible | Possible — TDX quotes provide cryptographic proof |

In short: GPU providers can run the CVM but cannot see inside it. Renters are isolated from bare metal. Validators can cryptographically verify that the executor is running exactly the expected software stack.


Prerequisites​

1. CPU — Intel TDX Support​

Your host machine must have a CPU with Intel TDX (Trust Domain Extensions) support.

| Generation | Codename | TDX Support |
| --- | --- | --- |
| 4th Gen Intel Xeon Scalable | Sapphire Rapids | XCC and MCC SKUs only (not all SKUs) |
| 5th Gen Intel Xeon Scalable | Emerald Rapids | All SKUs |
| Intel Xeon 6 | Granite Rapids | All SKUs — also supports TDX Connect for encrypted CPU↔GPU communication |

For 4th Gen, verify with your hardware vendor that the specific SKU supports TDX. TDX availability depends on both the CPU SKU and the platform firmware.

In addition to TDX, the CPU must support Intel SGX, as the key provider component relies on SGX enclaves. Verify that /dev/sgx_enclave and /dev/sgx_provision are present on the host after boot.
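The presence of both device nodes can be checked with a small shell snippet (a sketch; `check_dev` is just a local helper, not part of any tool):

```shell
# Report whether the SGX device nodes are present on the host.
check_dev() {
  if [ -e "$1" ]; then
    echo "present: $1"
  else
    echo "missing: $1"
  fi
}

check_dev /dev/sgx_enclave
check_dev /dev/sgx_provision
```

If either node is missing, re-check that SGX is enabled in the BIOS and that the kernel includes the in-kernel SGX driver.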

2. GPU — NVIDIA Confidential Compute Support​

The GPU must support NVIDIA Confidential Computing. Supported architectures are Hopper (H100, H200) and Blackwell (B200, GB200).

For the full list of supported GPU SKUs, VBIOS versions, CUDA driver versions, and Confidential Computing modes, refer to the official NVIDIA Secure AI Compatibility Matrix.

Consumer-grade GPUs (RTX series, A-series workstation) do not support NVIDIA Confidential Computing.

3. BIOS / Firmware​

  1. Update to the latest BIOS from your server/motherboard vendor before enabling TDX
  2. Enable the following settings in BIOS:
    • Intel TDX (Trust Domain Extensions)
    • Intel SGX (Software Guard Extensions)
    • KVM / Virtualization (VT-x and VT-d)

Exact BIOS menu paths vary by vendor (Dell, HPE, Supermicro, etc.) — consult your server's platform configuration guide.

4. Kernel​

TDX requires a kernel with Intel TDX support. Install the latest Intel-optimized kernel — the tested version is 6.14.0-1009-intel.

Check your current kernel version:

uname -r

The output should match or exceed the tested version, for example:

6.14.0-1009-intel
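The comparison can also be done programmatically with `sort -V` (a sketch; `at_least` is a hypothetical helper, not part of lium-cvm.sh):

```shell
# at_least MIN VERSION -> succeeds when VERSION >= MIN (version-aware compare)
at_least() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

if at_least 6.14.0 "$(uname -r)"; then
  echo "kernel version OK: $(uname -r)"
else
  echo "kernel too old: $(uname -r) (need >= 6.14.0)"
fi
```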

5. Operating System​

  1. Install Ubuntu 25.04 on the host machine
  2. Ensure KVM is enabled and the kvm_intel module is loaded
  3. Install Docker and Docker Compose

Setup Guide​

1. Check System Compatibility​

Run the built-in compatibility check to verify that all required software and kernel features are present:

./lium-cvm.sh check

Fix any errors reported before continuing.


2. Check vfio-pci Module​

The vfio-pci kernel module is required to pass GPUs through to the CVM. Verify it is loaded:

lsmod | grep vfio

If the module is not listed, load it manually:

sudo modprobe vfio-pci

To make this persistent across reboots, add it to /etc/modules:

echo "vfio-pci" | sudo tee -a /etc/modules

3. Verify GPU is Bound to vfio-pci​

Confirm that the target GPU is using vfio-pci as its kernel driver. First, find the PCI address of your GPU:

lspci | grep -i h200   # adjust the model name as needed

Then inspect the specific device (replace 19:00.0 with your PCI address):

lspci -nnk -s 19:00.0

Expected output:

19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:2335] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:18be]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau

The line Kernel driver in use: vfio-pci confirms the GPU is correctly bound. If it shows nvidia instead, the GPU is still claimed by the NVIDIA driver and must be rebound to vfio-pci before creating the CVM.
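One way to rebind is through the sysfs `driver_override` mechanism (a sketch; `rebind_to_vfio` is a hypothetical helper, and sysfs requires the full domain-qualified address, so `19:00.0` is written as `0000:19:00.0`):

```shell
# rebind_to_vfio ADDR -> unbind ADDR from its current driver and hand it to vfio-pci
rebind_to_vfio() {
  dev="/sys/bus/pci/devices/$1"
  if [ ! -e "$dev" ]; then
    echo "no such PCI device: $1" >&2
    return 1
  fi
  # Release the device from whichever driver currently claims it (e.g. nvidia)
  if [ -e "$dev/driver" ]; then
    echo "$1" | sudo tee "$dev/driver/unbind" >/dev/null
  fi
  # Tell the PCI core that vfio-pci should claim this device, then reprobe it
  echo vfio-pci | sudo tee "$dev/driver_override" >/dev/null
  echo "$1" | sudo tee /sys/bus/pci/drivers_probe >/dev/null
}

rebind_to_vfio 0000:19:00.0 || echo "no device at that address - adjust for your system"
```

Alternatively, the boot-time `vfio-pci.ids` kernel parameter (see Troubleshooting) binds the GPU to vfio-pci automatically at every boot.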

To list all available GPUs and their PCI addresses, you can also use:

./lium-cvm.sh lsgpu

4. Enable GPU Confidential Computing Mode​

Use NVIDIA's gpu-admin-tools to configure Confidential Computing mode on the GPU.

git clone https://github.com/NVIDIA/gpu-admin-tools.git
cd gpu-admin-tools

Disable Protected PCIe mode first, then enable CC mode:

# Disable PPCIE mode
sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-ppcie-mode=off --reset-after-ppcie-mode-switch

# Enable CC mode
sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-cc-mode=on --reset-after-cc-mode-switch

Both commands trigger a GPU reset. Run them before the CVM is started.


5. Download the OS Image​

Download the dstack TDX OS image that the CVM will boot from:

./lium-cvm.sh download

The image is saved to run/images/ and reused on subsequent runs.


6. Start the Key Provider​

The key-provider is an SGX enclave service that supplies sealing keys to the TDX VM. It must be running on the host before the CVM boots.

The service consists of two containers: aesmd (Intel SGX architectural enclave service) and gramine-sealing-key-provider (the key provider itself), listening on 127.0.0.1:3443.

  1. Navigate to the key-provider directory and start the containers:
cd key-provider && docker compose up --build -d
  2. Verify both containers are running:
docker compose ps
  3. Check the key-provider logs for errors:
docker compose logs -f gramine-sealing-key-provider
  4. Confirm the endpoint is reachable:
curl -k https://localhost:3443

Note: lium-cvm.sh run will also attempt to auto-start the key-provider, but starting it manually first is recommended so you can catch build errors early — especially on first setup when the container images need to be built.


7. Configure Environment and Create the CVM​

Copy the example environment file and fill in your settings:

cp .env.example .env

Key fields to configure:

# Miner identity
MINER_HOTKEY_SS58_ADDRESS=<your_hotkey>

# Ports
SSH_PORT=2200
RENTING_PORT_RANGE="19001,19002,19003"

# CVM resources
CVM_VCPUS=16
CVM_MEMORY=64G
CVM_DISK=200G

# GPU passthrough — use PCI addresses from step 3, or "all" to pass through every GPU
CVM_GPUS=19:00.0,3b:00.0
# CVM_GPUS=all

Important: CVM_GPUS must list the PCI addresses of GPUs that are already bound to vfio-pci (verified in step 3). Using the wrong address or a GPU still bound to the NVIDIA driver will cause the CVM to fail on launch.
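Each address listed in CVM_GPUS can be cross-checked against sysfs before launching (a sketch; `driver_of` is a hypothetical helper, and sysfs uses domain-qualified addresses, so `19:00.0` appears as `0000:19:00.0`):

```shell
# driver_of ADDR -> print the kernel driver currently bound to ADDR (empty if none)
driver_of() {
  link="/sys/bus/pci/devices/$1/driver"
  [ -e "$link" ] && basename "$(readlink "$link")"
}

for addr in 0000:19:00.0 0000:3b:00.0; do   # the addresses from your CVM_GPUS setting
  echo "$addr -> $(driver_of "$addr")"      # should print vfio-pci for a correctly bound GPU
done
```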

Once .env is configured, create the CVM:

./lium-cvm.sh new my-executor

8. Run the CVM​

Start the CVM:

./lium-cvm.sh run my-executor

To verify the launch command without actually starting the VM, use the dry-run flag:

./lium-cvm.sh run my-executor --dry-run

9. Check the Dashboard​

Once the CVM is running, a logging dashboard is available at:

http://<host-ip>:8090

Port 8090 must be included in RENTING_PORT_RANGE in your .env file so it is exposed by the CVM:

RENTING_PORT_RANGE="8090,19001,19002,19003"

The dashboard provides real-time logs and status for the executor running inside the CVM.


Troubleshooting​

1. CVM fails to start — vfio-dev: No such file or directory​

Symptom

QEMU exits immediately with an error like:

qemu-system-x86_64: -device vfio-pci,host=19:00.0,...: vfio 0000:19:00.0:
vfio /sys/bus/pci/devices/0000:19:00.0/vfio-dev: couldn't open directory
/sys/bus/pci/devices/0000:19:00.0/vfio-dev: No such file or directory

Cause

The kernel was not booted with the parameters required to enable Intel IOMMU and bind the GPU to vfio-pci at boot time. Without intel_iommu=on, the kernel does not create the vfio-dev sysfs entry even if vfio-pci is loaded.
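Whether the IOMMU actually came up can be confirmed from sysfs before touching GRUB (a sketch; on a correctly configured host the group count is non-zero):

```shell
# Count IOMMU groups; zero means the IOMMU is not active
iommu_groups() {
  ls /sys/kernel/iommu_groups 2>/dev/null | wc -l
}

n=$(iommu_groups)
if [ "$n" -gt 0 ]; then
  echo "IOMMU active: $n groups"
else
  echo "IOMMU not active - check that intel_iommu=on is on the kernel command line"
fi
```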

Solution

  1. Find the PCI device IDs of your NVIDIA GPUs:
lspci -nn | grep -i nvidia

Example output:

19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:2335] (rev a1)
3b:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:22a3] (rev a1)

Note the IDs in brackets — e.g. 10de:2335,10de:22a3.

  2. Edit the GRUB configuration at /etc/default/grub:
sudo nano /etc/default/grub

Set the following lines (replace vfio-pci.ids with the IDs from step 1):

GRUB_CMDLINE_LINUX_DEFAULT="kvm_intel.tdx=on nohibernate intel_iommu=on video=efifb:off vfio_iommu_type1.dma_entry_limit=1048576 vfio-pci.ids=10de:2335,10de:22a3 default_hugepagesz=1G hugepagesz=1G hugepages=10"
GRUB_CMDLINE_LINUX="console=tty0"
  3. Apply the changes and reboot:
sudo update-grub
sudo reboot
  4. After reboot, verify the GPU is bound to vfio-pci (see Setup Guide step 3).