InferX | Serverless Inference Platform

Model L3.3-70B-Loki-V2.0

Namespace	Model Name	Type	Standby GPU	Standby Pageable	Standby Pinned Memory	GPU Count	vRam (MB)	CPU	Memory (MB)	State	Revision
Trial	L3.3-70B-Loki-V2.0	text2text	Mem	File	File	2	71000	20.0	100000	Normal	259

Pods

Tenant	Namespace	Pod Name	State	Required Resource	Allocated Resource	GPU
public	Trial	public/Trial/L3.3-70B-Loki-V2.0/259/274	Standby	CPU: 20000, Mem: 100000, CacheMem: 0, GPU Type: Any, GPU Count: 2, GPU vRam: 71000, GPU Contexts: 0	Node Name: computeinstance-e00r2jrqynf83a8b4f, CPU: 0, Memory: 0, Cache Memory: 0	GPU Type: NVIDIA H100 80GB HBM3, vRam: 0, Slot Size: 268435456, Total Slot Count: 285, Max Context Per GPU: 1

Logs

tenant	namespace	model name	revision	id	node name	create time	exit info	state
public	Trial	L3.3-70B-Loki-V2.0	259	262	computeinstance-e00r2jrqynf83a8b4f	2026-03-01 17:12:55	None	log

Snapshot History

tenant	namespace	model name	revision	nodename	state	detail	updatetime
public	Trial	L3.3-70B-Loki-V2.0	259	computeinstance-e00r2jrqynf83a8b4f	Scheduled	Scheduled	2026-03-01 16:42:08
public	Trial	L3.3-70B-Loki-V2.0	259	computeinstance-e00r2jrqynf83a8b4f	Done	Done	2026-03-01 17:12:55
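
The two rows above bracket the snapshot run for revision 259. As a rough sketch, assuming both updatetime values share a timezone and that Scheduled and Done mark the start and end of the snapshot, the elapsed time can be worked out like this:

```python
from datetime import datetime

# Timestamp layout used in the updatetime column above.
FMT = "%Y-%m-%d %H:%M:%S"

# Values copied from the Snapshot History rows for revision 259.
scheduled = datetime.strptime("2026-03-01 16:42:08", FMT)
done = datetime.strptime("2026-03-01 17:12:55", FMT)

# ASSUMPTION: Scheduled/Done are treated as start/end of the snapshot.
elapsed = done - scheduled
print(elapsed)                  # 0:30:47
print(elapsed.total_seconds())  # 1847.0
```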

Model Spec

{
    "image": "vllm/vllm-openai:v0.9.0",
    "commands": [
        "--model",
        "CrucibleLab/L3.3-70B-Loki-V2.0",
        "--disable-custom-all-reduce",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        "0.99",
        "--max-model-len",
        "10000",
        "--tensor-parallel-size=2"
    ],
    "resources": {
        "GPU": {
            "Count": 2,
            "vRam": 71000
        }
    },
    "envs": [],
    "sample_query": {
        "apiType": "text2text",
        "path": "v1/completions",
        "prompt": "what is the integral of x^2 from 0 to 2?\nPlease reason step by step, and put your final answer within \\boxed{}.",
        "prompts": [],
        "dataUrl": "",
        "body": {
            "max_tokens": "800",
            "model": "CrucibleLab/L3.3-70B-Loki-V2.0",
            "stream": "true",
            "temperature": "0"
        },
        "loadingTimeout": 90
    },
    "policy": {
        "Obj": {
            "queue_timeout": 30.0,
            "scalein_timeout": 1.0
        }
    }
}
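
The sample_query object above contains the pieces of a completions request: a path, a prompt, and a body. The following is a minimal sketch of how they might be assembled into an HTTP POST, not the platform's own client. It rests on several assumptions: `BASE_URL` is a placeholder (the page does not show the real endpoint prefix), the helper names `build_payload` and `build_request` are hypothetical, authentication is left out, and the string values in body are converted to the number and boolean types that an OpenAI-style completions endpoint normally expects.

```python
import json
from urllib import request

# Values copied from the "sample_query" object in the Model Spec above.
SAMPLE_QUERY = {
    "path": "v1/completions",
    "prompt": "what is the integral of x^2 from 0 to 2?\n"
              "Please reason step by step, and put your final answer within \\boxed{}.",
    "body": {
        "max_tokens": "800",
        "model": "CrucibleLab/L3.3-70B-Loki-V2.0",
        "stream": "true",
        "temperature": "0",
    },
}

# ASSUMPTION: placeholder host; the real URL prefix is not shown on this page.
BASE_URL = "https://inferx.example.invalid/"


def build_payload(sample_query):
    """Merge prompt and body, converting the string values to typed ones."""
    body = sample_query["body"]
    return {
        "model": body["model"],
        "prompt": sample_query["prompt"],
        "max_tokens": int(body["max_tokens"]),
        "temperature": float(body["temperature"]),
        "stream": body["stream"].lower() == "true",
    }


def build_request(base_url, sample_query):
    """Build (but do not send) a JSON POST request for the sample query."""
    url = base_url.rstrip("/") + "/" + sample_query["path"].lstrip("/")
    data = json.dumps(build_payload(sample_query)).encode("utf-8")
    return request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(BASE_URL, SAMPLE_QUERY)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The sketch stops at building the request. To send it, replace `BASE_URL` with a real endpoint and pass the request to `urllib.request.urlopen`. Because stream is true, the response would be expected to arrive incrementally rather than as one JSON document.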
