
    Download Datasets

    Download Methods

    The platform supports three command-line methods to download datasets; a Python SDK option is described at the end of this page:

    Method            Use Case
    HTTPS Git Clone   General-purpose download for all users
    SSH Git Clone     Password-free download after SSH key configuration
    csghub-cli        Supports resumable downloads; ideal for large datasets

    Prerequisites

    Before downloading datasets containing large files, install Git LFS:

    # Install Git LFS (macOS)
    brew install git-lfs
    
    # Install Git LFS (Ubuntu/Debian)
    sudo apt-get install git-lfs
    
    # Initialize Git LFS
    git lfs install
    

    Download via HTTPS

    # Enable Git LFS, then clone the dataset repository
    git lfs install
    git clone https://<platform-host>/<namespace>/<dataset-name>
    

    For private datasets, include your username and an access token in the clone URL:

    git clone https://<username>:<access-token>@<platform-host>/<namespace>/<dataset-name>
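
    Usernames or tokens that contain characters such as "@" or "/" have to be percent-encoded before they are placed in a URL. The following is a minimal sketch of building such a clone URL; the function name build_clone_url and the host hub.example.com are illustrative assumptions, not part of the platform.

```python
from urllib.parse import quote


def build_clone_url(host, namespace, dataset, username=None, token=None):
    """Return an HTTPS clone URL, embedding percent-encoded credentials if given."""
    path = f"{host}/{namespace}/{dataset}"
    if username is None or token is None:
        # Public dataset: no credentials in the URL.
        return f"https://{path}"
    # safe="" makes quote() also encode "/" so it cannot be mistaken for a path separator.
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@{path}"


# hub.example.com is a placeholder host used only for illustration.
print(build_clone_url("hub.example.com", "alice", "demo"))
print(build_clone_url("hub.example.com", "alice", "demo", "alice", "p@ss/word"))
```

    Keep in mind that Git records the remote URL, including any embedded token, in the repository's .git/config file.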
    

    To clone without downloading large files (small LFS pointer files are checked out in their place):

    GIT_LFS_SKIP_SMUDGE=1 git clone https://<platform-host>/<namespace>/<dataset-name>
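
    A clone made with GIT_LFS_SKIP_SMUDGE=1 contains small text pointer files where the large files would be. As a rough sketch, the helper below lists which files are still pointers by checking for the Git LFS pointer header; the function name find_lfs_pointers is a hypothetical choice for this example.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return sorted relative paths of files that are still Git LFS pointers."""
    root = Path(repo_dir)
    pointers = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in relative.parts:
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(relative.as_posix())
    return sorted(pointers)
```

    Running git lfs pull inside the repository replaces the pointers with the real file contents; git lfs pull --include="<path>" limits the download to selected paths.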
    

    Download via SSH

    After adding your SSH public key in User Settings → SSH Keys, clone over SSH:

    git lfs install
    git clone ssh://git@<platform-host>/<namespace>/<dataset-name>
    

    Download via csghub-cli

    Install csghub-cli, which is provided by the csghub-sdk package:

    pip install csghub-sdk
    

    Download a dataset:

    # Download the entire dataset repository
    csghub-cli download <namespace>/<dataset-name> --repo_type dataset
    
    # Download a specific revision
    csghub-cli download <namespace>/<dataset-name> --repo_type dataset --revision main
    

    Download via Python SDK

    from pycsghub.snapshot_download import snapshot_download
    
    dataset_path = snapshot_download(
        repo_id="<namespace>/<dataset-name>",
        repo_type="dataset",
        endpoint="https://<platform-host>",
        token="<access-token>"  # Required for private datasets
    )
    print(f"Dataset downloaded to: {dataset_path}")
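
    To avoid hard-coding the access token in a script, it can be read from an environment variable instead. The sketch below assumes a variable named CSGHUB_TOKEN, which is a hypothetical name chosen for this example rather than one the SDK is known to read; the helper names are likewise illustrative.

```python
import os


def build_download_kwargs(repo_id, endpoint, token_env="CSGHUB_TOKEN"):
    """Assemble snapshot_download arguments, reading the token from the environment."""
    kwargs = {"repo_id": repo_id, "repo_type": "dataset", "endpoint": endpoint}
    token = os.environ.get(token_env)
    if token:
        # Only pass a token when one is set; public datasets need none.
        kwargs["token"] = token
    return kwargs


def download_dataset(repo_id, endpoint):
    """Download a dataset with the arguments assembled above."""
    # Imported lazily so the helper above works without csghub-sdk installed.
    from pycsghub.snapshot_download import snapshot_download

    return snapshot_download(**build_download_kwargs(repo_id, endpoint))
```

    With the variable exported in the shell, download_dataset("<namespace>/<dataset-name>", "https://<platform-host>") returns the local path of the downloaded dataset.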
    

    Note

    Access tokens can be generated in User Settings → Access Tokens.