Remote Inference with @lium.machine

This example runs a local Python function on a GPU pod. The decorator provisions the pod, uploads the function, installs Python dependencies, runs inference, returns the result, and tears the pod down.
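The lifecycle described above can be pictured with a purely local stand-in. This is a conceptual sketch, not Lium's implementation: `fake_machine` and `last_steps` are hypothetical names, nothing is provisioned or uploaded, and the choice to tear down even when the function raises is an assumption that the text does not state.

```python
import functools


def fake_machine(machine: str, requirements: list[str]):
    """Local stand-in that records the lifecycle steps in order.

    Hypothetical illustration only; it never contacts a remote pod.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            steps = [f"provision {machine}", f"upload {func.__name__}"]
            steps.append("install " + ", ".join(requirements))
            try:
                steps.append("run")
                result = func(*args, **kwargs)
                steps.append("return result")
                return result
            finally:
                # Assumed behavior: teardown runs even if the function raises.
                steps.append("teardown")
                wrapper.last_steps = steps
        return wrapper
    return decorator


@fake_machine(machine="A100", requirements=["torch"])
def double(x: int) -> int:
    return 2 * x
```

Calling `double(21)` runs the function locally and leaves the recorded step list on `double.last_steps`, which makes the ordering of provisioning, execution, and teardown easy to inspect.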

Authenticate before running the script:

lium init

For CI, scripts, or temporary overrides, set LIUM_API_KEY instead.
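For the environment-variable path, a script can fail fast when the key is missing instead of failing partway through a run. The following is a minimal sketch: `require_api_key` is a hypothetical helper, not part of the lium package, and only the variable name LIUM_API_KEY comes from the text above. It does not check whether credentials saved by `lium init` are present.

```python
import os


def require_api_key(env=None) -> str:
    """Return the value of LIUM_API_KEY or raise a clear error.

    Hypothetical helper for illustration; not provided by lium.
    """
    env = os.environ if env is None else env
    key = env.get("LIUM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "LIUM_API_KEY is not set; export it or run `lium init` first."
        )
    return key
```

In a CI job, calling `require_api_key()` at the top of the script surfaces a missing secret before any GPU work is requested.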

#!/usr/bin/env python3
"""Run a small instruct model on a remote Lium GPU."""

import lium

@lium.machine(machine="A100", requirements=["torch", "transformers", "accelerate"])
def infer(prompt: str) -> str:
    # Imported inside the function so the import runs where the
    # requirements are installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        device_map="cuda",
    )

    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    # Greedy decoding (no sampling), capped at 64 new tokens.
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Drop the prompt tokens and decode only the generated continuation.
    response_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(response_tokens, skip_special_tokens=True).strip()

print(infer("In one sentence, who discovered penicillin?"))

The exact generated text can vary by model version, but it should identify Alexander Fleming.
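Because the wording can differ between runs, a loose check on the answer is more robust than comparing against a fixed string. A minimal sketch, assuming a case-insensitive substring match is enough for this prompt; `mentions_fleming` is a hypothetical helper name.

```python
def mentions_fleming(text: str) -> bool:
    """Case-insensitive check that the answer names Fleming."""
    return "fleming" in text.casefold()
```

Applied to the script above, this would be `mentions_fleming(infer("In one sentence, who discovered penicillin?"))`, which still requires a remote run to produce the text being checked.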