
Using Fine-tuning and Monitoring

After the fine-tuning instance is created and reaches the Running state, you can fine-tune models through either the Web UI or a Notebook.

Using the Web UI (LlamaBoard)

If you selected the LLaMA-Factory framework during instance creation, the system provides the LlamaBoard visual training interface, enabling no-code model fine-tuning.

Steps

Load Training Configuration: Click the Open Web UI button in the instance action bar to open the LlamaBoard training interface. Select the base model and fine-tuning method (e.g., LoRA, QLoRA, or full-parameter fine-tuning) from the top section.
Select Training Dataset: In the dataset configuration area, select a training dataset that has been uploaded to the platform. Both custom datasets and built-in platform datasets are supported.

Adjust Hyperparameters: Tune the training hyperparameters below to fit your task and hardware.

Parameter	Description	Recommended Range
Batch Size	Number of training samples per batch	Typically 4-16, limited by available VRAM
Learning Rate	Step size for optimizer weight updates	1e-5 to 5e-4
Epochs	Number of full passes over the training dataset	Typically 1-5
LoRA Rank	Dimension of LoRA low-rank matrices	8, 16, or 32
LoRA Alpha	LoRA scaling coefficient	Usually 2× the Rank

Start Training: After configuration, click the Start Fine-tuning button to launch the training task. During training, you can view real-time loss curves and training progress in the interface.
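
The recommended ranges in the table above can be turned into a quick sanity check before you launch a run. The following is a minimal sketch only: the `LoraHyperparams` class and its field names are hypothetical and are not part of LlamaBoard or LLaMA-Factory. It fills in LoRA Alpha as 2× the Rank when none is given and reports any value outside the recommended range.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoraHyperparams:
    """Hypothetical container for the hyperparameters in the table above."""

    batch_size: int = 8
    learning_rate: float = 1e-4
    epochs: int = 3
    lora_rank: int = 16
    lora_alpha: Optional[int] = None  # filled in as 2 x rank when omitted

    def __post_init__(self) -> None:
        if self.lora_alpha is None:
            self.lora_alpha = 2 * self.lora_rank

    def warnings(self) -> List[str]:
        """Return one message per value outside the recommended range."""
        issues = []
        if not 4 <= self.batch_size <= 16:
            issues.append(f"batch_size={self.batch_size} is outside 4-16")
        if not 1e-5 <= self.learning_rate <= 5e-4:
            issues.append(f"learning_rate={self.learning_rate} is outside 1e-5 to 5e-4")
        if not 1 <= self.epochs <= 5:
            issues.append(f"epochs={self.epochs} is outside 1-5")
        if self.lora_rank not in (8, 16, 32):
            issues.append(f"lora_rank={self.lora_rank} is not one of 8, 16, 32")
        if self.lora_alpha != 2 * self.lora_rank:
            issues.append(f"lora_alpha={self.lora_alpha} is not 2 x lora_rank")
        return issues


if __name__ == "__main__":
    params = LoraHyperparams(batch_size=64, lora_rank=8)
    print(params.lora_alpha)
    for message in params.warnings():
        print("warning:", message)
```

The ranges are guidelines rather than hard limits, so the check returns warnings instead of raising errors.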

Tip

LlamaBoard automatically saves training configurations. For iterative experiments, you can quickly reload previous configurations and adjust parameters.

Using Notebook

Click the Launch Notebook button in the instance action bar to open a JupyterLab development environment in your browser, giving you full control over training code and workflow.

Notebook mode is ideal for:

Writing custom training scripts and data preprocessing logic
Running CLI commands for MS-Swift or other frameworks
Exercising fine-grained control over the training process
Debugging model or dataset issues
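
As an illustration of the data preprocessing you might do in a Notebook, the sketch below converts raw question/answer rows into instruction-style records and writes them to a JSON file. The `question` and `answer` field names and both helper functions are hypothetical. The `instruction`/`input`/`output` layout follows the widely used Alpaca convention, so check the dataset format your framework actually expects before relying on it.

```python
import json
from pathlib import Path


def to_alpaca_records(raw_rows):
    """Convert raw question/answer rows into instruction-style records.

    Rows with a missing or empty question or answer are dropped.
    """
    records = []
    for row in raw_rows:
        question = (row.get("question") or "").strip()
        answer = (row.get("answer") or "").strip()
        if not question or not answer:
            continue
        records.append({"instruction": question, "input": "", "output": answer})
    return records


def save_dataset(records, path):
    """Write records as a UTF-8 JSON array and return the record count."""
    text = json.dumps(records, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
    return len(records)


if __name__ == "__main__":
    rows = [
        {"question": " What is LoRA? ", "answer": "A low-rank adaptation method."},
        {"question": "", "answer": "This row is dropped."},
    ]
    print(save_dataset(to_alpaca_records(rows), "train.json"))
```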

Note

The Notebook environment comes with the relevant fine-tuning framework and its dependencies pre-installed. To add other packages, run pip install in the terminal.
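
Before installing anything, you may want to check which modules are already importable. The sketch below uses only the Python standard library. The `missing_packages` helper is hypothetical, and the module names in the example are placeholders, since the pre-installed set depends on the instance image. Note that a module's import name can differ from its pip package name.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Placeholder names; replace with the modules your script imports.
    wanted = ["json", "torch", "peft"]
    missing = missing_packages(wanted)
    if missing:
        print("Not found, consider: pip install " + " ".join(missing))
    else:
        print("All requested modules are available.")
```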

Training Monitoring

Click the instance name in the instance list to open the details page, then switch to the Analysis tab to view real-time training status and resource monitoring.

Resource Monitoring

Metric	Description
CPU Utilization	System CPU usage percentage
GPU Utilization	GPU compute core usage
Memory Usage	System memory consumption
VRAM Usage	GPU memory allocation and usage
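
If you want the same GPU figures inside a Notebook, one option is to query `nvidia-smi` directly. This is a minimal sketch that assumes an NVIDIA GPU with the driver tools on the PATH. It is independent of the Analysis tab and may not match its sampling or rounding. The function names are hypothetical, and the parser expects numeric values for every field.

```python
import shutil
import subprocess

# Ask nvidia-smi for utilization (%) and memory (MiB) as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(output):
    """Parse nvidia-smi CSV output into one dict per GPU."""
    stats = []
    for line in output.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append({
            "gpu_util_pct": util,
            "vram_used_mib": used,
            "vram_total_mib": total,
            "vram_pct": round(100 * used / total, 1),
        })
    return stats


def read_gpu_stats():
    """Run nvidia-smi if available; return [] when it is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(read_gpu_stats()):
        print(index, gpu)
```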

Training Metrics

Metric	Description
Loss Curve	Training loss trend over steps/epochs
Learning Rate Schedule	Learning rate actually applied by the scheduler at each step
Training Speed	Samples or training steps processed per second

Monitoring these metrics helps you determine whether training is progressing normally and identify issues such as overfitting, underfitting, or resource bottlenecks.
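
To make the overfitting check concrete, here is a rough heuristic sketch. It assumes you also log an evaluation loss alongside the training loss, which the table above does not list. The function names, the window size, and the returned labels are all hypothetical. A run is flagged when training loss keeps falling while evaluation loss rises over the most recent window.

```python
def diagnose_losses(train_losses, eval_losses, window=3):
    """Classify a run by comparing each latest loss with the one `window` points earlier."""
    if len(train_losses) <= window or len(eval_losses) <= window:
        return "insufficient data"
    train_delta = train_losses[-1] - train_losses[-1 - window]
    eval_delta = eval_losses[-1] - eval_losses[-1 - window]
    if train_delta < 0 and eval_delta > 0:
        # Training loss falls while evaluation loss rises.
        return "possible overfitting"
    if train_delta >= 0:
        return "training loss not decreasing"
    return "ok"


def samples_per_second(samples_processed, elapsed_seconds):
    """Training speed as samples processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return samples_processed / elapsed_seconds


if __name__ == "__main__":
    train = [2.0, 1.5, 1.2, 1.0, 0.8]
    evaluation = [2.1, 1.6, 1.5, 1.6, 1.8]
    print(diagnose_losses(train, evaluation))
    print(samples_per_second(1200, 60))
```

Loss curves are noisy, so treat the result as a prompt to inspect the curves rather than as a verdict.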

View Billing Details

On the instance details page, switch to the Billing tab to view resource consumption details:

Field	Description
Billing Start Time	When the instance started consuming compute resources
Billing End Time	When the instance was stopped or resources were released
Resource Spec	GPU/CPU/memory configuration in use
Accumulated Cost	Total cost to date

Warning

Fine-tuning instances incur charges continuously while running. Even after a training task completes, charges continue as long as the instance is not stopped. Stop the instance promptly after training to avoid unnecessary costs.
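
The effect of leaving an instance running can be estimated with simple arithmetic. The sketch below assumes a flat hourly rate prorated by the second. Both the rate and the proration rule are hypothetical, and the platform's actual billing rules may differ, so use the Billing tab for authoritative figures.

```python
from datetime import datetime


def accumulated_cost(billing_start, billing_end, hourly_rate):
    """Estimate cost for a billed interval at a flat, per-second prorated hourly rate."""
    seconds = (billing_end - billing_start).total_seconds()
    if seconds < 0:
        raise ValueError("billing_end is earlier than billing_start")
    return round(seconds / 3600 * hourly_rate, 2)


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 9, 0)
    training_done = datetime(2025, 1, 1, 11, 30)
    stopped_next_morning = datetime(2025, 1, 2, 9, 0)
    rate = 4.0  # hypothetical price per hour
    print(accumulated_cost(start, training_done, rate))
    print(accumulated_cost(start, stopped_next_morning, rate))
```

In this made-up example, the idle hours after training cost far more than the training itself, which is why stopping the instance promptly matters.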