Remote Inference with `@lium.machine`

This example runs a locally defined Python function on a remote GPU pod. The decorator provisions the pod, uploads the function, installs the listed Python dependencies, runs inference, returns the result, and tears the pod down.
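The lifecycle described above can be modeled locally to show the order of the steps. This is a hypothetical sketch, not Lium's implementation. `sketch_machine`, `EVENTS`, and `shout` are invented for illustration, and the function runs in-process rather than on a pod. The `try`/`finally` encodes an assumption that teardown also happens when the function raises. The text above does not say how Lium handles failures.

```python
import functools
from typing import Any, Callable

# Hypothetical event log: each simulated step is recorded here so the
# control flow can be inspected without any remote machine.
EVENTS: list[str] = []


def sketch_machine(machine: str, requirements: list[str]) -> Callable:
    """Illustrative decorator mirroring provision -> run -> teardown.

    NOT Lium's code: it only records the documented steps and then
    calls the wrapped function locally.
    """
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            EVENTS.append(f"provision {machine}")
            try:
                EVENTS.append(f"upload {fn.__name__}")
                EVENTS.append("install " + " ".join(requirements))
                result = fn(*args, **kwargs)  # runs locally in this sketch
                EVENTS.append("return result")
                return result
            finally:
                # Assumption: the pod is torn down even if fn raises.
                EVENTS.append(f"teardown {machine}")
        return wrapper
    return decorator


@sketch_machine(machine="A100", requirements=["torch", "transformers"])
def shout(text: str) -> str:
    return text.upper()


print(shout("hello"))
for event in EVENTS:
    print(event)
```

Putting teardown in `finally` is the usual way to make cleanup unconditional. Whether a real remote runner behaves that way is something to confirm in its documentation.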

Authenticate before running the script:

```bash
lium init
```

For CI, scripts, or temporary overrides, set the `LIUM_API_KEY` environment variable instead.
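This sketch assumes a CI job that supplies the key only through `LIUM_API_KEY`. A small guard called before the first decorated function makes a missing secret fail early with a clear message. `require_api_key` is a hypothetical helper, not part of the `lium` package. It only checks that the variable is non-empty and does not validate the key against Lium.

```python
import os


def require_api_key(var: str = "LIUM_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error.

    Hypothetical fail-fast guard for CI jobs that rely on the environment
    variable rather than credentials saved by `lium init`.
    """
    # Treat an unset variable and a whitespace-only value the same way.
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set. Add it as a CI secret, or run `lium init` locally."
        )
    return key


# Usage (hypothetical): call require_api_key() at the top of a CI script,
# before invoking any @lium.machine-decorated function.
```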

```python
#!/usr/bin/env python3
"""Run a small instruct model on a remote Lium GPU."""

import lium


@lium.machine(machine="A100", requirements=["torch", "transformers", "accelerate"])
def infer(prompt: str) -> str:
    # Imported inside the function body, which runs on the pod where the
    # listed requirements are installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",  # use the dtype recorded in the checkpoint config
        device_map="cuda",   # place the model on the GPU; relies on accelerate
    )

    # Wrap the prompt in the model's chat template.
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    # Greedy decoding, capped at 64 new tokens.
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Drop the echoed prompt tokens before decoding.
    response_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(response_tokens, skip_special_tokens=True).strip()


if __name__ == "__main__":
    print(infer("In one sentence, who discovered penicillin?"))
```

The exact generated text can vary by model version, but it should identify Alexander Fleming.
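For decoder-only models such as this one, `generate` returns the prompt tokens followed by the newly generated tokens. That is why `infer` slices `outputs[0]` at the prompt length, `inputs["input_ids"].shape[-1]`, before decoding. The pure-Python sketch below shows the same slice on a plain list. The token IDs and the `strip_prompt` helper are made up for illustration, and no model is involved.

```python
def strip_prompt(sequence: list[int], prompt_length: int) -> list[int]:
    """Drop the echoed prompt tokens, keeping only newly generated ones."""
    return sequence[prompt_length:]


# Hypothetical token IDs: 3 prompt tokens followed by 4 generated tokens.
prompt_ids = [101, 2054, 2003]
generated = prompt_ids + [7592, 2088, 999, 102]

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)  # [7592, 2088, 999, 102]
```

Without the slice, the decoded string would begin with the chat-formatted prompt rather than just the model's answer.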