Remote Inference with @lium.machine
This example runs a local Python function on a GPU pod. The decorator provisions the pod, uploads the function, installs Python dependencies, runs inference, returns the result, and tears the pod down.
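The order of those steps can be sketched with a purely local stand-in. The `fake_machine` decorator below is hypothetical: it does not call Lium or reflect how `@lium.machine` is implemented, and only records the steps named above while running the function locally. The `finally` around teardown is also an assumption, since the text does not say what happens when the function fails.

```python
import functools


def fake_machine(machine, requirements=()):
    """Hypothetical local stand-in; records the lifecycle steps, nothing more."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            steps = wrapper.steps
            steps.clear()
            steps.append(f"provision {machine}")
            try:
                steps.append(f"upload {fn.__name__}")
                steps.append("install " + ", ".join(requirements))
                steps.append("run")
                result = fn(*args, **kwargs)  # local call here; remote in the real flow
                steps.append("return result")
                return result
            finally:
                # Assumed behavior: release the pod whether or not the call succeeded.
                steps.append("teardown")
        wrapper.steps = []
        return wrapper
    return decorate


@fake_machine(machine="A100", requirements=["torch"])
def double(x: int) -> int:
    return 2 * x
```

After calling `double(21)`, `double.steps` lists the recorded steps in order, from provisioning through teardown.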
Authenticate before running the script:
lium init
For CI, scripts, or temporary overrides, set LIUM_API_KEY instead.
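In a CI job where LIUM_API_KEY is the only credential source, it may help to fail fast when the variable is missing. The helper below is a minimal sketch; `require_api_key` is a hypothetical name and not part of the lium package, and the check would be wrong on a machine that authenticates through `lium init` instead.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the LIUM_API_KEY value, or raise if it is unset or blank."""
    key = env.get("LIUM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "LIUM_API_KEY is not set; export it or run `lium init` first"
        )
    return key
```

Calling `require_api_key()` at the top of a CI script stops the job before any pod is requested.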
#!/usr/bin/env python3
"""Run a small instruct model on a remote Lium GPU."""

import lium


@lium.machine(machine="A100", requirements=["torch", "transformers", "accelerate"])
def infer(prompt: str) -> str:
    # Imported inside the function so the import runs where the function executes.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        device_map="cuda",
    )

    # Wrap the prompt in the model's chat template as a single user turn.
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    # Greedy decoding, capped at 64 new tokens.
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Drop the prompt tokens so only the newly generated tokens are decoded.
    response_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(response_tokens, skip_special_tokens=True).strip()


print(infer("In one sentence, who discovered penicillin?"))
The exact generated text can vary by model version, but it should identify Alexander Fleming.
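Because the wording can change between runs or model versions, a loose check on the answer is more robust than comparing against a fixed string. The sketch below assumes a hypothetical helper, `mentions_fleming`, that only looks for the surname, ignoring case.

```python
def mentions_fleming(answer: str) -> bool:
    """Return True if the answer names Fleming, regardless of capitalization."""
    return "fleming" in answer.lower()
```

A script could wrap the call as `assert mentions_fleming(infer(...))` to turn the example into a smoke test.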