    Create Dedicated Inference Instance

    Entry Point

    On the model details page, click the Deploy Model button in the top right corner, then select Dedicated Instance from the dropdown menu to navigate to the creation page.

    Note

    Only some models support dedicated instances. If the Dedicated Instance option does not appear for a model, contact the platform administrator.

    Configuration Parameters

    On the dedicated instance creation page, fill in the following parameters, then click Create Instance:

    Parameter                Description
    Instance Name            Custom name; must not match the name of an existing instance
    Model ID                 Identifier of the model on the platform; defaults to the current model
    Region/Resource Config   Compute resource specification for the inference service (GPU model, VRAM size)
    Runtime Framework        Inference framework: vLLM, SGLang, TGI, or llama.cpp
    Security Level           Public: accessible without authentication (default); Private: requires authentication
    Elastic Replicas         Number of instance replicas, from 1 to 5
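    If you prepare instance settings in scripts, it can help to check them against the constraints in this table before entering them. The following is a hypothetical POSIX shell pre-check, not a platform tool: the function name validate_instance_config and its argument order are illustrative assumptions, and the platform's own validation on the creation page remains authoritative.

```shell
#!/bin/sh
# Hypothetical pre-check mirroring the creation-page constraints.
# The platform's own validation is authoritative; this only catches typos early.
# Usage: validate_instance_config NAME FRAMEWORK SECURITY REPLICAS [EXISTING_NAME...]

validate_instance_config() {
    if [ $# -lt 4 ]; then
        echo "usage: validate_instance_config NAME FRAMEWORK SECURITY REPLICAS [EXISTING_NAME...]" >&2
        return 2
    fi
    name=$1 framework=$2 security=$3 replicas=$4
    shift 4   # remaining arguments are names of existing instances

    # Instance Name: required and must not duplicate an existing instance
    if [ -z "$name" ]; then
        echo "error: instance name is empty" >&2
        return 1
    fi
    for existing in "$@"; do
        if [ "$existing" = "$name" ]; then
            echo "error: an instance named '$name' already exists" >&2
            return 1
        fi
    done

    # Runtime Framework: one of the four listed options
    case $framework in
        vLLM | SGLang | TGI | llama.cpp) ;;
        *) echo "error: unsupported framework '$framework'" >&2; return 1 ;;
    esac

    # Security Level: Public (default) or Private
    case $security in
        Public | Private) ;;
        *) echo "error: security level must be Public or Private" >&2; return 1 ;;
    esac

    # Elastic Replicas: a single digit from 1 to 5
    case $replicas in
        [1-5]) ;;
        *) echo "error: replicas must be an integer from 1 to 5" >&2; return 1 ;;
    esac
    return 0
}
```

    For example, validate_instance_config my-llm vLLM Public 2 other-llm succeeds, while a replica count of 6 or a name that matches an existing instance returns a non-zero status with an error message.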

    View Instance List

    After creation, open Model Inference → Dedicated Instances from the top navigation bar to view all of your instances and their running status. The same instances are also listed in the Dedicated Instances section of Resource Management.

    Calling the Inference Service

    Once the instance is running, the platform provides:

    • Web Testing Interface: chat with the model directly in your browser to test it.
    • API Interface: an OpenAI-compatible API for integrating the model into application code.
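    Because the API follows the OpenAI format, a client can often look up the served model name (the value for "model" in requests) with GET /v1/models. This is an assumption: the endpoint is standard in OpenAI-compatible servers (vLLM provides it, for example), but this page does not confirm that the platform exposes it. The helper list_models and the variables INSTANCE_URL, ACCESS_TOKEN, and DRY_RUN are hypothetical; for private instances, the token is sent as a Bearer token in the Authorization header.

```shell
#!/bin/sh
# Hypothetical helper: list the models served by a dedicated instance.
# Assumes the instance exposes the OpenAI-style GET /v1/models endpoint.
# INSTANCE_URL: base address, e.g. https://<instance-address>
# ACCESS_TOKEN: leave empty for public instances
# DRY_RUN=1:    print the curl command instead of sending it

list_models() {
    if [ -z "${INSTANCE_URL:-}" ]; then
        echo "error: INSTANCE_URL is not set" >&2
        return 1
    fi
    url="${INSTANCE_URL%/}/v1/models"   # drop one trailing slash, if any

    # Private instances need the Bearer token; public ones do not
    if [ -n "${ACCESS_TOKEN:-}" ]; then
        set -- -H "Authorization: Bearer $ACCESS_TOKEN" "$url"
    else
        set -- "$url"
    fi

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'curl -sS'
        printf " '%s'" "$@"
        printf '\n'
    else
        curl -sS "$@"
    fi
}
```

    Running it with DRY_RUN=1 first shows exactly which URL and headers would be sent, which is a cheap way to confirm the instance address and token before making a real request.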

    For private instances, pass an access token in the Authorization request header; public instances accept requests without it:

    curl https://<instance-address>/v1/chat/completions \
      -H "Authorization: Bearer <access-token>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "<model-name>",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
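    For repeated calls, the request above can be wrapped in a small shell function. This is a minimal sketch assuming a POSIX shell with curl and sed available; the names chat, chat_payload, json_escape, INSTANCE_URL, MODEL_NAME, and ACCESS_TOKEN are hypothetical. The Authorization header is added only when ACCESS_TOKEN is set, matching the Public and Private security levels, and the JSON escaping handles quotes and backslashes in single-line messages only.

```shell
#!/bin/sh
# Hypothetical wrapper around POST /v1/chat/completions.

# Escape backslashes, then double quotes, so text can sit inside a JSON string.
# Limitation: newlines and other control characters are NOT escaped.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body: chat_payload MODEL MESSAGE
chat_payload() {
    printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send one user message: chat MESSAGE
chat() {
    if [ -z "${INSTANCE_URL:-}" ] || [ -z "${MODEL_NAME:-}" ]; then
        echo "error: set INSTANCE_URL and MODEL_NAME" >&2
        return 1
    fi
    body=$(chat_payload "$MODEL_NAME" "$1")

    # Only private instances need the Bearer token header
    if [ -n "${ACCESS_TOKEN:-}" ]; then
        set -- -H "Authorization: Bearer $ACCESS_TOKEN"
    else
        set --
    fi

    curl -sS "${INSTANCE_URL%/}/v1/chat/completions" "$@" \
        -H "Content-Type: application/json" \
        -d "$body"
}
```

    Example call after sourcing the script: INSTANCE_URL=https://<instance-address> MODEL_NAME=<model-name> ACCESS_TOKEN=<access-token> chat "Hello". For multi-line prompts or untrusted input, a JSON-aware tool such as jq (via its --arg option) is a safer way to build the request body than sed-based escaping.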