
Text Generation

Task Overview

Text generation is the most common large language model (LLM) inference task, covering chat completion, text continuation, code generation, and more. The platform's dedicated inference instances expose an OpenAI-compatible API.

API Usage

Chat Completions

curl https://<instance-address>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <access-token>" \
  -d '{
    "model": "<model-name>",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain large language models briefly."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'

Text Completions

curl https://<instance-address>/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <access-token>" \
  -d '{
    "model": "<model-name>",
    "prompt": "Large language models are",
    "max_tokens": 200
  }'

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://<instance-address>/v1",
    api_key="<access-token>"
)

response = client.chat.completions.create(
    model="<model-name>",
    messages=[
        {"role": "user", "content": "Explain large language models briefly."}
    ]
)
print(response.choices[0].message.content)
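The text completions endpoint can be reached through the same client. The sketch below is a minimal illustration, assuming the instance follows the OpenAI completions response shape, in which the generated text is in `choices[0].text`. The helper name `complete_text` is hypothetical and not part of the platform or the `openai` package.

```python
from typing import Any


def complete_text(client: Any, model: str, prompt: str, max_tokens: int = 200) -> str:
    """Call the text completions endpoint and return the generated text.

    `client` is expected to be an `openai.OpenAI` instance configured with the
    instance's base URL and access token (hypothetical helper, for illustration).
    """
    response = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
    )
    # Text completions put the output in `choices[i].text`, whereas chat
    # completions use `choices[i].message.content`.
    return response.choices[0].text


# Usage (requires the `openai` package and a reachable instance):
# from openai import OpenAI
# client = OpenAI(base_url="https://<instance-address>/v1", api_key="<access-token>")
# print(complete_text(client, "<model-name>", "Large language models are"))
```

Passing the client in as an argument keeps the helper independent of how the connection is configured, so the same function can be used against different instances.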

Common Parameters

Parameter	Type	Description
`temperature`	float	Sampling randomness, range 0–2; lower values give more deterministic output; default 1.0
`max_tokens`	int	Maximum number of tokens to generate
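`top_p`	float	Nucleus sampling threshold, range 0–1; sampling is restricted to the smallest set of tokens whose cumulative probability reaches this value; can be combined with `temperature`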
`stream`	bool	Whether to use streaming output; default false
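The parameters in the table can be combined in a single request. The following sketch shows one way to pass them through the `openai` client and read a streamed response piece by piece. It assumes the instance emits OpenAI-style streaming chunks, where each chunk carries its text in `choices[0].delta.content`. The helper name `stream_chat`, its default values, and the `top_p` range check are illustrative assumptions, not platform behavior.

```python
from typing import Any, Iterator


def stream_chat(
    client: Any,
    model: str,
    prompt: str,
    temperature: float = 0.7,
    top_p: float = 1.0,
    max_tokens: int = 512,
) -> Iterator[str]:
    """Yield text fragments from a streaming chat completion as they arrive.

    Hypothetical helper; `client` is expected to be an `openai.OpenAI` instance.
    Because this is a generator, the checks below run on first iteration.
    """
    # The 0-2 range for temperature comes from the parameter table; the
    # (0, 1] range for top_p is an assumption based on it being a probability.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        stream=True,  # ask for incremental chunks instead of one final response
    )
    for chunk in stream:
        # Some chunks carry no text (for example a role-only first chunk),
        # so skip chunks without choices or with an empty delta.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# Usage (requires the `openai` package and a reachable instance):
# from openai import OpenAI
# client = OpenAI(base_url="https://<instance-address>/v1", api_key="<access-token>")
# for piece in stream_chat(client, "<model-name>", "Explain large language models briefly."):
#     print(piece, end="", flush=True)
```

Streaming is mainly useful for interactive interfaces, where showing partial output reduces perceived latency. For batch jobs, leaving `stream` at its default of false and reading `response.choices[0].message.content` is usually simpler.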
