
Confidential Virtual Machine (CVM)

What is CVM and Why Do We Need It?​

The Problem​

In the current Lium platform design, GPU providers have full access to rental containers, which may allow a GPU provider to:

  • Inspect the contents of a rental container and exfiltrate sensitive data (model weights, proprietary code, private keys)
  • Modify or replace the executor container with a customized version that behaves maliciously
  • Intercept network traffic or memory contents of running workloads
  • Tamper with job execution in ways that are invisible to validators and renters

This undermines trust in the platform — renters cannot be confident that their workloads are running in a safe, unmodified environment.

The Solution: CVM with Intel TDX​

A Confidential Virtual Machine (CVM) is a hardware-isolated virtual machine powered by Intel Trust Domain Extensions (TDX). TDX provides:

  • Memory encryption — All VM memory is encrypted by the CPU hardware. The host OS and hypervisor cannot read VM memory contents, even with physical access.
  • Hardware attestation — The VM can generate cryptographically signed quotes (TDX quotes) that prove to a remote party exactly what software is running inside.
  • Measurement integrity — Any modification to the VM image or boot environment changes the hardware measurements, making tampering detectable.

How This Protects the Lium Platform​

| Threat | Without CVM | With CVM |
| --- | --- | --- |
| GPU provider inspects rental container | Possible — full host access | Blocked — VM memory is hardware-encrypted |
| GPU provider replaces executor container | Possible — container is writable from host | Blocked — TDX measurements detect any tampering |
| Rental container escapes to bare metal | Possible in edge cases | Blocked — rental container is nested inside the TDX VM |
| Validator verifies executor integrity | Not possible | Possible — TDX quotes provide cryptographic proof |

In short: GPU providers can run the CVM but cannot see inside it. Renters are isolated from bare metal. Validators can cryptographically verify that the executor is running exactly the expected software stack.


Prerequisites​

1. CPU — Intel TDX Support​

Your host machine must have a CPU with Intel TDX (Trust Domain Extensions) support.

| Generation | Codename | TDX Support |
| --- | --- | --- |
| 4th Gen Intel Xeon Scalable | Sapphire Rapids | XCC and MCC SKUs only (not all SKUs) |
| 5th Gen Intel Xeon Scalable | Emerald Rapids | All SKUs |
| Intel Xeon 6 | Granite Rapids | All SKUs — also supports TDX Connect for encrypted CPU↔GPU communication |

For 4th Gen, verify with your hardware vendor that the specific SKU supports TDX. TDX availability depends on both the CPU SKU and the platform firmware.

In addition to TDX, the CPU must support Intel SGX, as the key provider component relies on SGX enclaves. Verify that /dev/sgx_enclave and /dev/sgx_provision are present on the host after boot.
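The presence of both device nodes can be checked with a small shell snippet (a sketch; `check_dev` is just a local helper, not part of any tool):

```shell
# Report whether the SGX device nodes are present on the host.
check_dev() {
  if [ -e "$1" ]; then
    echo "present: $1"
  else
    echo "missing: $1"
  fi
}

check_dev /dev/sgx_enclave
check_dev /dev/sgx_provision
```

If either node is missing, re-check that SGX is enabled in the BIOS and that the kernel includes the in-kernel SGX driver.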

2. GPU — NVIDIA Confidential Compute Support​

The GPU must support NVIDIA Confidential Computing. Supported architectures are Hopper (H100, H200) and Blackwell (B200, GB200).

For the full list of supported GPU SKUs, VBIOS versions, CUDA driver versions, and Confidential Computing modes, refer to the official NVIDIA Secure AI Compatibility Matrix.

Consumer-grade GPUs (RTX series, A-series workstation) do not support NVIDIA Confidential Computing.

3. BIOS / Firmware​

  1. Update to the latest BIOS from your server/motherboard vendor before enabling TDX
  2. Enable the following settings in BIOS:
    • Intel TDX (Trust Domain Extensions)
    • Intel SGX (Software Guard Extensions)
    • KVM / Virtualization (VT-x and VT-d)

Exact BIOS menu paths vary by vendor (Dell, HPE, Supermicro, etc.) — consult your server's platform configuration guide.

4. Kernel​

TDX requires a kernel with Intel TDX support. Install the latest Intel-optimized kernel — the tested version is 6.14.0-1009-intel.

Check your current kernel version:

uname -r

The output should match or exceed the tested version, for example:

6.14.0-1009-intel
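The comparison can also be done programmatically with `sort -V` (a sketch; `at_least` is a hypothetical helper, not part of lium-cvm.sh):

```shell
# at_least MIN VERSION -> succeeds when VERSION >= MIN (version-aware compare)
at_least() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

if at_least 6.14.0 "$(uname -r)"; then
  echo "kernel version OK: $(uname -r)"
else
  echo "kernel too old: $(uname -r) (need >= 6.14.0)"
fi
```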

5. Operating System​

  1. Install Ubuntu 25.04 on the host machine
  2. Ensure KVM is enabled and the kvm_intel module is loaded
  3. Install Docker and Docker Compose

Setup Guide​

1. Check System Compatibility​

Run the built-in compatibility check to verify that all required software and kernel features are present:

./lium-cvm.sh check

Fix any errors reported before continuing.


2. Check vfio-pci Module​

The vfio-pci kernel module is required to pass GPUs through to the CVM. Verify it is loaded:

lsmod | grep vfio

If the module is not listed, load it manually:

sudo modprobe vfio-pci

To make this persistent across reboots, add it to /etc/modules:

echo "vfio-pci" | sudo tee -a /etc/modules

3. Verify GPU is Bound to vfio-pci​

Confirm that the target GPU is using vfio-pci as its kernel driver. First, find the PCI address of your GPU:

lspci | grep -i h200   # adjust the model name as needed

Then inspect the specific device (replace 19:00.0 with your PCI address):

lspci -nnk -s 19:00.0

Expected output:

19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:2335] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:18be]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau

The line Kernel driver in use: vfio-pci confirms the GPU is correctly bound. If it shows nvidia instead, the GPU is still claimed by the NVIDIA driver and must be rebound to vfio-pci before creating the CVM.
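One way to rebind is through the sysfs `driver_override` mechanism (a sketch; `rebind_to_vfio` is a hypothetical helper, and sysfs requires the full domain-qualified address, so `19:00.0` is written as `0000:19:00.0`):

```shell
# rebind_to_vfio ADDR -> unbind ADDR from its current driver and hand it to vfio-pci
rebind_to_vfio() {
  dev="/sys/bus/pci/devices/$1"
  if [ ! -e "$dev" ]; then
    echo "no such PCI device: $1" >&2
    return 1
  fi
  # Release the device from whichever driver currently claims it (e.g. nvidia)
  if [ -e "$dev/driver" ]; then
    echo "$1" | sudo tee "$dev/driver/unbind" >/dev/null
  fi
  # Tell the PCI core that vfio-pci should claim this device, then reprobe it
  echo vfio-pci | sudo tee "$dev/driver_override" >/dev/null
  echo "$1" | sudo tee /sys/bus/pci/drivers_probe >/dev/null
}

rebind_to_vfio 0000:19:00.0 || echo "no device at that address - adjust for your system"
```

Alternatively, the boot-time `vfio-pci.ids` kernel parameter (see Troubleshooting) binds the GPU to vfio-pci automatically at every boot.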

To list all available GPUs and their PCI addresses, you can also use:

./lium-cvm.sh lsgpu

4. Enable GPU Confidential Computing Mode​

Use NVIDIA's gpu-admin-tools to configure Confidential Computing mode on the GPU.

git clone https://github.com/NVIDIA/gpu-admin-tools.git
cd gpu-admin-tools

Disable Protected PCIe mode first, then enable CC mode:

# Disable PPCIE mode
sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-ppcie-mode=off --reset-after-ppcie-mode-switch

# Enable CC mode
sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-cc-mode=on --reset-after-cc-mode-switch

Both commands trigger a GPU reset. Run them before the CVM is started.


5. Download the OS Image​

Download the dstack TDX OS image that the CVM will boot from:

./lium-cvm.sh download

The image is saved to run/images/ and reused on subsequent runs.


6. Start the Key Provider​

The key-provider is an SGX enclave service that supplies sealing keys to the TDX VM. It must be running on the host before the CVM boots.

The service consists of two containers: aesmd (Intel SGX architectural enclave service) and gramine-sealing-key-provider (the key provider itself), listening on 127.0.0.1:3443.

  1. Navigate to the key-provider directory and start the containers:
cd key-provider && docker compose up --build -d
  2. Verify both containers are running:
docker compose ps
  3. Check the key-provider logs for errors:
docker compose logs -f gramine-sealing-key-provider
  4. Confirm the endpoint is reachable:
curl -k https://localhost:3443

Note: lium-cvm.sh run will also attempt to auto-start the key-provider, but starting it manually first is recommended so you can catch build errors early — especially on first setup when the container images need to be built.


7. Configure Environment and Create the CVM​

Copy the example environment file and fill in your settings:

cp .env.example .env

Key fields to configure:

# Miner identity
MINER_HOTKEY_SS58_ADDRESS=<your_hotkey>

# Ports
SSH_PORT=2200
RENTING_PORT_RANGE="19001,19002,19003"

# CVM resources
CVM_VCPUS=16
CVM_MEMORY=64G
CVM_DISK=200G

# GPU passthrough — use PCI addresses from step 3, or "all" to pass through every GPU
CVM_GPUS=19:00.0,3b:00.0
# CVM_GPUS=all

Important: CVM_GPUS must list the PCI addresses of GPUs that are already bound to vfio-pci (verified in step 3). Using the wrong address or a GPU still bound to the NVIDIA driver will cause the CVM to fail on launch.
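Each address listed in CVM_GPUS can be cross-checked against sysfs before launching (a sketch; `driver_of` is a hypothetical helper, and sysfs uses domain-qualified addresses, so `19:00.0` appears as `0000:19:00.0`):

```shell
# driver_of ADDR -> print the kernel driver currently bound to ADDR (empty if none)
driver_of() {
  link="/sys/bus/pci/devices/$1/driver"
  [ -e "$link" ] && basename "$(readlink "$link")"
}

for addr in 0000:19:00.0 0000:3b:00.0; do   # the addresses from your CVM_GPUS setting
  echo "$addr -> $(driver_of "$addr")"      # should print vfio-pci for a correctly bound GPU
done
```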

Once .env is configured, create the CVM:

./lium-cvm.sh new my-executor

8. Run the CVM​

Start the CVM:

./lium-cvm.sh run my-executor

To verify the launch command without actually starting the VM, use the dry-run flag:

./lium-cvm.sh run my-executor --dry-run

9. Check the Dashboard​

Once the CVM is running, a logging dashboard is available at:

http://<host-ip>:8090

Port 8090 must be included in RENTING_PORT_RANGE in your .env file so it is exposed by the CVM:

RENTING_PORT_RANGE="8090,19001,19002,19003"

The dashboard provides real-time logs and status for the executor running inside the CVM.


Troubleshooting​

1. CVM fails to start — vfio-dev: No such file or directory​

Symptom

QEMU exits immediately with an error like:

qemu-system-x86_64: -device vfio-pci,host=19:00.0,...: vfio 0000:19:00.0:
vfio /sys/bus/pci/devices/0000:19:00.0/vfio-dev: couldn't open directory
/sys/bus/pci/devices/0000:19:00.0/vfio-dev: No such file or directory

Cause

The kernel was not booted with the parameters required to enable Intel IOMMU and bind the GPU to vfio-pci at boot time. Without intel_iommu=on, the kernel does not create the vfio-dev sysfs entry even if vfio-pci is loaded.
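Whether the IOMMU actually came up can be confirmed from sysfs before touching GRUB (a sketch; on a correctly configured host the group count is non-zero):

```shell
# Count IOMMU groups; zero means the IOMMU is not active
iommu_groups() {
  ls /sys/kernel/iommu_groups 2>/dev/null | wc -l
}

n=$(iommu_groups)
if [ "$n" -gt 0 ]; then
  echo "IOMMU active: $n groups"
else
  echo "IOMMU not active - check that intel_iommu=on is on the kernel command line"
fi
```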

Solution

  1. Find the PCI device IDs of your NVIDIA GPUs:
lspci -nn | grep -i nvidia

Example output:

19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:2335] (rev a1)
3b:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:22a3] (rev a1)

Note the IDs in brackets — e.g. 10de:2335,10de:22a3.

  2. Edit the GRUB configuration at /etc/default/grub:
sudo nano /etc/default/grub

Set the following lines (replace vfio-pci.ids with the IDs from step 1):

GRUB_CMDLINE_LINUX_DEFAULT="kvm_intel.tdx=on nohibernate intel_iommu=on video=efifb:off vfio_iommu_type1.dma_entry_limit=1048576 vfio-pci.ids=10de:2335,10de:22a3 default_hugepagesz=1G hugepagesz=1G hugepages=10"
GRUB_CMDLINE_LINUX="console=tty0"
  3. Apply the changes and reboot:
sudo update-grub
sudo reboot
  4. After reboot, verify the GPU is bound to vfio-pci (see Setup Guide step 3).