Using Model Evaluation
After an evaluation task is created, you can track its progress, view results, and manage tasks from the evaluation task list.
Evaluation Status
Evaluation tasks go through the following statuses during execution:
| Status | Description |
|---|---|
| Evaluating | The evaluation task is running; details are not yet available |
| Completed | The evaluation has finished; results and detailed reports are viewable |
| Failed | The evaluation task failed; check the logs for details |
Note
View Evaluation Details
When the evaluation task status changes to Completed, click the Details button in the task list to open the evaluation results page.
The details page contains the following information:
| Information | Description |
|---|---|
| Overall Score | Composite score across all evaluation benchmarks |
| Per-Benchmark Scores | Individual scores for each benchmark/dataset |
| Evaluation Metrics | Specific metrics such as accuracy, F1 score, BLEU score, etc. |
| Evaluation Configuration | Framework, datasets, and parameter settings used during evaluation |
The evaluation details page gives you a comprehensive view of model performance across different benchmarks, providing data-driven insights for model selection and optimization.
Download Evaluation Results
On the evaluation details page or in the task list, click the Download button to download the complete evaluation results file.
The downloaded file contains:
- Evaluation score summary table
- Detailed evaluation data for each benchmark
- Sample model prediction outputs
Tip
Delete Evaluation Tasks
If an evaluation task is no longer needed, click the Delete button in the task list to remove it.
Warning