InferX — Serverless GPU Inference Platform for Production Workloads

Model: translategemma-27b-it-FP8-Dynamic

Namespace:             Trial
Model Name:            translategemma-27b-it-FP8-Dynamic
Type:                  text2text
Standby GPU:           Mem
Standby Pageable:      File
Standby Pinned Memory: File
GPU Count:             1
vRAM (MB):             32000
CPU:                   20.0
Memory (MB):           80000
State:                 Normal
Revision:              266


Sample REST Call
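The sample call itself was not captured in this view. A minimal sketch in Python of what a request to this text2text model might look like, assuming a JSON predict endpoint; the gateway URL, path layout, payload fields, and response shape are all assumptions, not the documented InferX API:

```python
import json
import urllib.request

# Hypothetical endpoint: the actual InferX gateway host, port, path
# scheme, and any auth headers are deployment-specific (assumption).
INFERX_URL = (
    "http://localhost:8080/v1/namespaces/Trial"
    "/models/translategemma-27b-it-FP8-Dynamic/predict"
)

def build_request(prompt: str) -> bytes:
    # Payload shape is an assumption based on common text2text APIs.
    payload = {"prompt": prompt, "max_tokens": 256}
    return json.dumps(payload).encode("utf-8")

def call_inferx(prompt: str) -> dict:
    req = urllib.request.Request(
        INFERX_URL,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The JSON structure of the response is also an assumption.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Show the request body without hitting the (hypothetical) endpoint.
    print(build_request("Translate to French: Hello, world.").decode())
```

Only the request construction runs as written; `call_inferx` performs the actual HTTP POST once a real endpoint is substituted.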

Pods

Tenant | Namespace | Pod Name | State | Required Resource | Allocated Resource | GPU
(no pods listed)

Logs

Tenant | Namespace | Model Name | Revision | Id | Node Name | Create Time | Exit Info
public | Trial | translategemma-27b-it-FP8-Dynamic | 266 | 269 | computeinstance-e00r2jrqynf83a8b4f | 2026-03-01 17:05:27 | Error("DockerContainerWaitError { error: \"\", code: 1 }")
public | Trial | translategemma-27b-it-FP8-Dynamic | 266 | 271 | computeinstance-e00r2jrqynf83a8b4f | 2026-03-01 17:14:33 | Error("DockerContainerWaitError { error: \"\", code: 1 }")
public | Trial | translategemma-27b-it-FP8-Dynamic | 266 | 273 | computeinstance-e00r2jrqynf83a8b4f | 2026-03-01 17:19:31 | Error("DockerContainerWaitError { error: \"\", code: 1 }")

All three launch attempts ran on the same node and exited with container exit code 1.

Snapshot History

Tenant | Namespace | Model Name | Revision | Node Name | State | Detail | Update Time
public | Trial | translategemma-27b-it-FP8-Dynamic | 266 | computeinstance-e00r2jrqynf83a8b4f | Scheduled | Scheduled | 2026-03-01 17:00:26

Model Spec
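The spec body is empty in this capture. Presumably it encodes the resource and standby values shown in the model table at the top of the page; a hypothetical rendering as a Python dict, where every field name (and the interpretation of the standby values) is an assumption, not the actual InferX schema:

```python
# Hypothetical model spec mirroring the values shown in the model table.
# Field names are illustrative assumptions, not the real InferX schema.
model_spec = {
    "namespace": "Trial",
    "model_name": "translategemma-27b-it-FP8-Dynamic",
    "type": "text2text",
    "revision": 266,
    "resources": {
        "gpu_count": 1,
        "vram_mb": 32000,
        "cpu": 20.0,
        "memory_mb": 80000,
    },
    # "Mem"/"File" values copied from the Standby columns above; whether
    # they denote in-memory vs. file-backed snapshots is an assumption.
    "standby": {
        "gpu": "Mem",
        "pageable": "File",
        "pinned_memory": "File",
    },
}

print(model_spec["resources"]["vram_mb"])  # → 32000
```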


Policy