Last updated:

Create Dedicated Inference Instance

Access Entry

On the model details page, click the Deploy Model button in the top right corner, then select Dedicated Instance from the dropdown menu to navigate to the creation page.

Note

Only some models support creating dedicated instances. If the desired model does not have a "Dedicated Instance" option, contact the platform administrator.

Configuration Parameters

On the dedicated instance creation page, fill in the following configuration, then click Create Instance:

Parameter	Description
Instance Name	Custom name; must not duplicate existing instances
Model ID	The model identifier on the platform; defaults to the current model
Region/Resource Config	Select compute resource specifications for the inference service (GPU model, VRAM size)
Runtime Framework	Select the inference framework: vLLM, SGLang, TGI, or llama.cpp
Security Level	Public: accessible without authentication (default); Private: requires authentication
Elastic Replicas	Number of instance replicas, range 1–5

View Instance List

After creation, use the top navigation to open Model Inference → Dedicated Instances to view all created instances and their running status. You can also view them centrally in the dedicated instances section of Resource Management.

Calling the Inference Service

Once the instance is running, the platform provides:

Web Testing Interface: Test the model directly in your browser via conversation.
API Interface: OpenAI-compatible API for business code integration.

For private instances, include an access token in the request header:

curl https://<instance-address>/v1/chat/completions \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-name>",
    "messages": [{"role": "user", "content": "Hello"}]
  }'