
    Datasets

    What are Datasets

    The Dataset Hub provides a single place to manage, access, and share data. Datasets are categorized by modality: text, image, audio, video, multimodal, scientific computing, and others. They support model training and inference for tasks such as text generation, speech recognition, image classification, and relation extraction.

    Dataset Types

    The platform supports the following dataset types:

    Type          Example Tasks
    Text          Text classification, sentiment analysis, Q&A, text generation
    Image         Image classification, object detection, image segmentation
    Audio         Speech recognition, audio classification
    Video         Video understanding, action recognition
    Multimodal    Image-text understanding, visual Q&A

    Core Features

    • Dataset Upload: Upload data files in various formats via the Web interface or Git.
    • Dataset Preview: Preview files in Parquet, CSV, JSONL, and other formats online.
    • Version Control: Manage datasets with Git versioning.
    • Access Control: Set each dataset's visibility to public or private.
    • Processing Integration: Use datasets directly in the platform's fine-tuning and evaluation tasks.
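
    Before uploading, it can help to check locally that a file parses the way the online preview is likely to read it. The sketch below is an illustration only and does not use any platform API: `preview_rows` is a hypothetical helper built on the Python standard library, and it covers CSV and JSONL only, because reading Parquet requires a third-party library.

```python
import csv
import json
from itertools import islice
from pathlib import Path


def preview_rows(path, limit=5):
    """Return up to `limit` records from a CSV or JSONL file as dictionaries.

    Hypothetical local helper; not part of the platform's tooling.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    with path.open(newline="", encoding="utf-8") as handle:
        if suffix == ".csv":
            # DictReader uses the first line of the file as column names.
            return list(islice(csv.DictReader(handle), limit))
        if suffix == ".jsonl":
            # JSONL stores one JSON object per line; blank lines are skipped.
            records = (json.loads(line) for line in handle if line.strip())
            return list(islice(records, limit))
    raise ValueError(f"unsupported file type: {suffix}")
```

    Calling `preview_rows("train.jsonl", limit=3)` returns the first three records as dictionaries. CSV values come back as strings, while JSONL values keep their JSON types.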

    Dataset Card

    Learn about Dataset Cards: their purpose, their metadata format, and how to write one that helps users understand and use your dataset.

    Create Dataset

    How to create a dataset repository on the platform, including the entry point, form parameters, and next steps.

    Upload Datasets

    How to upload dataset files using the Web interface, Git, CLI tools, or the Python SDK.

    Update Datasets

    How to edit dataset files, modify repository settings, and delete datasets.

    Download Datasets

    How to download datasets using Git, CLI tools, or the Python SDK.
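
    Because dataset repositories are versioned with Git, the Git download path can be sketched as an ordinary `git clone` driven from Python. This is a minimal sketch under stated assumptions: `build_clone_command` and `download_dataset` are hypothetical helper names, the repository URL must be taken from the dataset's page on the platform, and the platform's own CLI and Python SDK are not shown here.

```python
import subprocess


def build_clone_command(repo_url, dest, shallow=True):
    """Build the argument list for cloning a dataset repository with Git."""
    command = ["git", "clone"]
    if shallow:
        # --depth 1 fetches only the most recent commit, not the full history.
        command += ["--depth", "1"]
    return command + [repo_url, dest]


def download_dataset(repo_url, dest):
    """Clone the repository; raises CalledProcessError if Git exits with an error."""
    subprocess.run(build_clone_command(repo_url, dest), check=True)
```

    For example, `download_dataset(repo_url, "my-dataset")` clones the repository at `repo_url` into a local `my-dataset` directory. Pass `shallow=False` to `build_clone_command` when the full version history is needed.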