# Confidential Virtual Machine (CVM)
## What is CVM and Why Do We Need It?

### The Problem
In the current Lium platform design, GPU providers have full access to rental containers, which may allow a GPU provider to:
- Inspect the contents of a rental container and exfiltrate sensitive data (model weights, proprietary code, private keys)
- Modify or replace the executor container with a customized version that behaves maliciously
- Intercept network traffic or memory contents of running workloads
- Tamper with job execution in ways that are invisible to validators and renters
This undermines trust in the platform — renters cannot be confident that their workloads are running in a safe, unmodified environment.
### The Solution: CVM with Intel TDX
A Confidential Virtual Machine (CVM) is a hardware-isolated virtual machine powered by Intel Trust Domain Extensions (TDX). TDX provides:
- Memory encryption — All VM memory is encrypted by the CPU hardware. The host OS and hypervisor cannot read VM memory contents, even with physical access.
- Hardware attestation — The VM can generate cryptographically signed quotes (TDX quotes) that prove to a remote party exactly what software is running inside.
- Measurement integrity — Any modification to the VM image or boot environment changes the hardware measurements, making tampering detectable.
### How This Protects the Lium Platform
| Threat | Without CVM | With CVM |
|---|---|---|
| GPU provider inspects rental container | Possible — full host access | Blocked — VM memory is hardware-encrypted |
| GPU provider replaces executor container | Possible — container is writable from host | Blocked — TDX measurements detect any tampering |
| Rental container escapes to bare metal | Possible in edge cases | Blocked — rental container is nested inside TDX VM |
| Validator verifies executor integrity | Not possible | Possible — TDX quotes provide cryptographic proof |
In short: GPU providers can run the CVM but cannot see inside it. Renters are isolated from bare metal. Validators can cryptographically verify that the executor is running exactly the expected software stack.
## Prerequisites

### 1. CPU — Intel TDX Support
Your host machine must have a CPU with Intel TDX (Trust Domain Extensions) support.
| Generation | Codename | TDX Support |
|---|---|---|
| 4th Gen Intel Xeon Scalable | Sapphire Rapids | XCC and MCC SKUs only (not all SKUs) |
| 5th Gen Intel Xeon Scalable | Emerald Rapids | All SKUs |
| Intel Xeon 6 | Granite Rapids | All SKUs — also supports TDX Connect for encrypted CPU↔GPU communication |
For 4th Gen, verify with your hardware vendor that the specific SKU supports TDX. TDX availability depends on both the CPU SKU and the platform firmware.
In addition to TDX, the CPU must support Intel SGX, as the key provider component relies on SGX enclaves. Verify that `/dev/sgx_enclave` and `/dev/sgx_provision` are present on the host after boot.
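To spot a missing node quickly, here is a small illustrative shell helper; the `check_dev` function is ours, not part of the platform tooling:

```shell
#!/usr/bin/env bash
# Report whether a device node (or any path) exists.
check_dev() {
  if [ -e "$1" ]; then
    echo "$1: present"
  else
    echo "$1: missing"
  fi
}

# Check both SGX device nodes the key provider depends on.
check_dev /dev/sgx_enclave
check_dev /dev/sgx_provision
```

If either node is reported missing, revisit the BIOS settings (SGX must be enabled) before continuing.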
### 2. GPU — NVIDIA Confidential Compute Support
The GPU must support NVIDIA Confidential Computing. Supported architectures are Hopper (H100, H200) and Blackwell (B200, GB200).
For the full list of supported GPU SKUs, VBIOS versions, CUDA driver versions, and Confidential Computing modes, refer to the official NVIDIA Secure AI Compatibility Matrix.
Consumer-grade GPUs (RTX series, A-series workstation) do not support NVIDIA Confidential Computing.
### 3. BIOS / Firmware
- Update to the latest BIOS from your server/motherboard vendor before enabling TDX
- Enable the following settings in BIOS:
  - Intel TDX (Trust Domain Extensions)
  - Intel SGX (Software Guard Extensions)
  - KVM / Virtualization (VT-x and VT-d)
Exact BIOS menu paths vary by vendor (Dell, HPE, Supermicro, etc.) — consult your server's platform configuration guide.
### 4. Kernel

TDX requires a kernel with Intel TDX support. Install the latest Intel-optimized kernel — the tested version is `6.14.0-1009-intel`.

Check your current kernel version:

```bash
uname -r
```

The output should match or exceed the tested version, for example:

```
6.14.0-1009-intel
```
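If you prefer to script the comparison rather than eyeball it, version ordering can be checked with `sort -V`; the `kernel_ok` helper below is a sketch of ours, not part of lium-cvm.sh:

```shell
#!/usr/bin/env bash
# Sketch: check the running kernel against a required minimum version.
required="6.14.0-1009-intel"

kernel_ok() {
  local current="$1" required="$2"
  # sort -V orders version strings; if the required version sorts first
  # (or the two are equal), the current kernel is new enough.
  [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]
}

if kernel_ok "$(uname -r)" "$required"; then
  echo "kernel OK: $(uname -r)"
else
  echo "kernel too old: $(uname -r) (need >= $required)"
fi
```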
### 5. Operating System

- Install Ubuntu 25.04 on the host machine
- Ensure KVM is enabled and the `kvm_intel` module is loaded
- Install Docker and Docker Compose
## Setup Guide

### 1. Check System Compatibility

Run the built-in compatibility check to verify that all required software and kernel features are present:

```bash
./lium-cvm.sh check
```
Fix any errors reported before continuing.
### 2. Check vfio-pci Module

The `vfio-pci` kernel module is required to pass GPUs through to the CVM. Verify it is loaded:

```bash
lsmod | grep vfio
```

If the module is not listed, load it manually:

```bash
sudo modprobe vfio-pci
```

To make this persistent across reboots, add it to `/etc/modules`:

```bash
echo "vfio-pci" | sudo tee -a /etc/modules
```
### 3. Verify GPU is Bound to vfio-pci

Confirm that the target GPU is using `vfio-pci` as its kernel driver. First, find the PCI address of your GPU:

```bash
lspci | grep -i h200   # adjust the model name as needed
```

Then inspect the specific device (replace `19:00.0` with your PCI address):

```bash
lspci -nnk -s 19:00.0
```

Expected output:

```
19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:2335] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:18be]
	Kernel driver in use: vfio-pci
	Kernel modules: nvidiafb, nouveau
```

The line `Kernel driver in use: vfio-pci` confirms the GPU is correctly bound. If it shows `nvidia` instead, the GPU is still claimed by the NVIDIA driver and must be rebound to `vfio-pci` before creating the CVM.

To list all available GPUs and their PCI addresses, you can also use:

```bash
./lium-cvm.sh lsgpu
```
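For scripted checks, the active driver can be pulled out of `lspci -nnk` output; this sketch runs an illustrative `driver_in_use` helper (ours, not part of the tooling) over sample output like the one above:

```shell
#!/usr/bin/env bash
# Extract the "Kernel driver in use" value from lspci -nnk output.
driver_in_use() {
  awk -F': ' '/Kernel driver in use/ {print $2}'
}

# Sample lspci -nnk output, as shown in the guide.
sample='19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:2335] (rev a1)
    Subsystem: NVIDIA Corporation Device [10de:18be]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau'

driver="$(printf '%s\n' "$sample" | driver_in_use)"
if [ "$driver" = "vfio-pci" ]; then
  echo "GPU is bound to vfio-pci"
else
  echo "GPU is NOT bound to vfio-pci (got: $driver)"
fi
```

Against a live system you would pipe the real output instead: `lspci -nnk -s 19:00.0 | driver_in_use`.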
### 4. Enable GPU Confidential Computing Mode

Use NVIDIA's gpu-admin-tools to configure Confidential Computing mode on the GPU.

```bash
git clone https://github.com/NVIDIA/gpu-admin-tools.git
cd gpu-admin-tools
```

Disable Protected PCIe mode first, then enable CC mode:

```bash
# Disable PPCIE mode
sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-ppcie-mode=off --reset-after-ppcie-mode-switch

# Enable CC mode
sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-cc-mode=on --reset-after-cc-mode-switch
```

Both commands trigger a GPU reset. Run them before the CVM is started.
### 5. Download the OS Image

Download the dstack TDX OS image that the CVM will boot from:

```bash
./lium-cvm.sh download
```

The image is saved to `run/images/` and reused on subsequent runs.
### 6. Start the Key Provider

The key-provider is an SGX enclave service that supplies sealing keys to the TDX VM. It must be running on the host before the CVM boots.

The service consists of two containers: `aesmd` (Intel SGX architectural enclave service) and `gramine-sealing-key-provider` (the key provider itself), listening on `127.0.0.1:3443`.

- Navigate to the key-provider directory and start the containers:

  ```bash
  cd key-provider && docker compose up --build -d
  ```

- Verify both containers are running:

  ```bash
  docker compose ps
  ```

- Check the key-provider logs for errors:

  ```bash
  docker compose logs -f gramine-sealing-key-provider
  ```

- Confirm the endpoint is reachable:

  ```bash
  curl -k https://localhost:3443
  ```

Note: `lium-cvm.sh run` will also attempt to auto-start the key-provider, but starting it manually first is recommended so you can catch build errors early — especially on first setup when the container images need to be built.
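If you script the setup and want to block until the key-provider port accepts connections, here is a bash sketch using the shell's built-in `/dev/tcp` (the `wait_for_port` helper is ours, and assumes bash rather than POSIX sh):

```shell
#!/usr/bin/env bash
# Wait until a TCP port accepts connections, retrying up to a limit.
wait_for_port() {
  local host="$1" port="$2" retries="${3:-10}" i
  for ((i = 0; i < retries; i++)); do
    # Open (and immediately discard) a TCP connection in a subshell.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "port $port on $host is reachable"
      return 0
    fi
    sleep 1
  done
  echo "gave up waiting for $host:$port" >&2
  return 1
}
```

Example use before the curl check: `wait_for_port 127.0.0.1 3443 30 && curl -k https://localhost:3443`.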
### 7. Configure Environment and Create the CVM

Copy the example environment file and fill in your settings:

```bash
cp .env.example .env
```

Key fields to configure:

```bash
# Miner identity
MINER_HOTKEY_SS58_ADDRESS=<your_hotkey>

# Ports
SSH_PORT=2200
RENTING_PORT_RANGE="19001,19002,19003"

# CVM resources
CVM_VCPUS=16
CVM_MEMORY=64G
CVM_DISK=200G

# GPU passthrough — use PCI addresses from step 3, or "all" to pass through every GPU
CVM_GPUS=19:00.0,3b:00.0
# CVM_GPUS=all
```

Important: `CVM_GPUS` must list the PCI addresses of GPUs that are already bound to `vfio-pci` (verified in step 3). Using the wrong address or a GPU still bound to the NVIDIA driver will cause the CVM to fail on launch.

Once `.env` is configured, create the CVM:

```bash
./lium-cvm.sh new my-executor
```
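Before launching, it can help to sanity-check the `CVM_GPUS` value. This sketch validates either the `all` keyword or a comma-separated list of short-form PCI addresses (the `valid_cvm_gpus` function is illustrative, not part of lium-cvm.sh):

```shell
#!/usr/bin/env bash
# Validate CVM_GPUS: "all", or a comma-separated list of addresses
# shaped like 19:00.0 (bus:device.function).
valid_cvm_gpus() {
  local value="$1" addr
  local -a addrs
  [ "$value" = "all" ] && return 0
  IFS=',' read -ra addrs <<< "$value"
  [ "${#addrs[@]}" -gt 0 ] || return 1
  for addr in "${addrs[@]}"; do
    [[ "$addr" =~ ^[0-9a-fA-F]{2,4}:[0-9a-fA-F]{2}\.[0-7]$ ]] || return 1
  done
  return 0
}

valid_cvm_gpus "19:00.0,3b:00.0" && echo "CVM_GPUS looks well-formed"
```

A well-formed value is necessary but not sufficient: you should still confirm each listed address reports `Kernel driver in use: vfio-pci` under `lspci -nnk -s <addr>` (step 3).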
### 8. Run the CVM

Start the CVM:

```bash
./lium-cvm.sh run my-executor
```

To verify the launch command without actually starting the VM, use the dry-run flag:

```bash
./lium-cvm.sh run my-executor --dry-run
```
### 9. Check the Dashboard

Once the CVM is running, a logging dashboard is available at:

```
http://<host-ip>:8090
```

Port 8090 must be included in `RENTING_PORT_RANGE` in your `.env` file so it is exposed by the CVM:

```bash
RENTING_PORT_RANGE="8090,19001,19002,19003"
```

The dashboard provides real-time logs and status for the executor running inside the CVM.
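A quick way to catch a missing dashboard port in scripts is to check list membership; the `port_in_range` helper here is ours, for illustration:

```shell
#!/usr/bin/env bash
# Check whether a port appears in a comma-separated port list.
port_in_range() {
  local port="$1" list="$2" p
  local -a ports
  IFS=',' read -ra ports <<< "$list"
  for p in "${ports[@]}"; do
    [ "$p" = "$port" ] && return 0
  done
  return 1
}

RENTING_PORT_RANGE="8090,19001,19002,19003"
if port_in_range 8090 "$RENTING_PORT_RANGE"; then
  echo "dashboard port 8090 is exposed"
else
  echo "add 8090 to RENTING_PORT_RANGE"
fi
```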
## Troubleshooting

### 1. CVM fails to start — vfio-dev: No such file or directory

**Symptom**

QEMU exits immediately with an error like:

```
qemu-system-x86_64: -device vfio-pci,host=19:00.0,...: vfio 0000:19:00.0:
vfio /sys/bus/pci/devices/0000:19:00.0/vfio-dev: couldn't open directory
/sys/bus/pci/devices/0000:19:00.0/vfio-dev: No such file or directory
```
**Cause**

The kernel was not booted with the parameters required to enable Intel IOMMU and bind the GPU to `vfio-pci` at boot time. Without `intel_iommu=on`, the kernel does not create the `vfio-dev` sysfs entry even if `vfio-pci` is loaded.
**Solution**

1. Find the PCI device IDs of your NVIDIA GPUs:

   ```bash
   lspci -nn | grep -i nvidia
   ```

   Example output:

   ```
   19:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:2335] (rev a1)
   3b:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H200 SXM 141GB] [10de:22a3] (rev a1)
   ```

   Note the IDs in brackets — e.g. `10de:2335,10de:22a3`.

2. Edit the GRUB configuration at `/etc/default/grub`:

   ```bash
   sudo nano /etc/default/grub
   ```

   Set the following lines (replace the `vfio-pci.ids` value with the IDs from step 1):

   ```
   GRUB_CMDLINE_LINUX_DEFAULT="kvm_intel.tdx=on nohibernate intel_iommu=on video=efifb:off vfio_iommu_type1.dma_entry_limit=1048576 vfio-pci.ids=10de:2335,10de:22a3 default_hugepagesz=1G hugepagesz=1G hugepages=10"
   GRUB_CMDLINE_LINUX="console=tty0"
   ```

3. Apply the changes and reboot:

   ```bash
   sudo update-grub
   sudo reboot
   ```

4. After reboot, verify the GPU is bound to `vfio-pci` (see Setup Guide step 3).
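After the reboot, you can also confirm the new parameters actually took effect by inspecting the live command line in `/proc/cmdline`. A small sketch (the `cmdline_has` helper is ours; adjust the token list to match your GRUB line, including your own `vfio-pci.ids` value):

```shell
#!/usr/bin/env bash
# Check that a kernel command line contains every required token.
cmdline_has() {
  local cmdline="$1" token
  shift
  for token in "$@"; do
    case " $cmdline " in
      *" $token "*) ;;
      *) echo "missing: $token"; return 1 ;;
    esac
  done
  echo "all required parameters present"
}
```

Example use on a live host: `cmdline_has "$(cat /proc/cmdline)" intel_iommu=on kvm_intel.tdx=on`.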