Last updated:

Model Inference Overview

What is Model Inference

The platform provides one-click inference functionality, helping users quickly allocate compute and start inference services on supported model pages — no complex environment configuration needed.

Core Advantages

Flexible Invocation: Provides an intuitive web interface for conversation testing, while also generating standard API interfaces for business code integration.
Rich Framework Support: Supports multiple mainstream inference frameworks including vLLM, llama.cpp, SGLang, and TGI.
Instantly Available: Eliminates complex configuration by automatically launching container environments with all required dependencies.

Supported Inference Frameworks

Framework	Features	Best For
vLLM	High throughput, low latency, supports continuous batching	Production-grade high-concurrency inference services
SGLang	Optimized for structured generation, supports RadixAttention	Complex reasoning and structured output scenarios
TGI (Text Generation Inference)	Hugging Face's official inference server	Inference within the Hugging Face ecosystem
llama.cpp	Supports GGUF format, runs on both CPU and GPU	Resource-constrained environments or GGUF format models

Inference Task Types

The platform supports multiple inference task types — refer to the corresponding documentation for API usage:

Model Inference Overview

What is Model Inference

Core Advantages

Supported Inference Frameworks

Inference Task Types

Related Documentation