
AI Compute Management Platform: Simplify Complexity, 3–10x Better Utilization

By pooling and virtualizing heterogeneous GPU/CPU compute, the platform can improve resource utilization by 3–10x. It provides unified scheduling and management for AI clusters, so hardware is managed at the platform level.
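To make the utilization claim concrete, here is a minimal sketch of the arithmetic behind card sharing. It is not the platform's implementation: the job demands are invented numbers, `pack_first_fit` and `utilization` are hypothetical helper names, and the model ignores memory limits, interference between jobs, and hardware-specific slicing rules. It compares giving every job a dedicated card with packing the same jobs onto shared cards using a simple first-fit-decreasing heuristic.

```python
def pack_first_fit(demands_pct, card_capacity_pct=100):
    """Pack jobs onto shared cards (first-fit decreasing); return cards used.

    Each demand is the share of one card a job needs, in whole percent,
    and is assumed to be no larger than card_capacity_pct.
    """
    free = []  # remaining free capacity of each card already in use
    for demand in sorted(demands_pct, reverse=True):
        for i, remaining in enumerate(free):
            if demand <= remaining:
                free[i] = remaining - demand
                break
        else:
            # No existing card has room, so bring another card into use.
            free.append(card_capacity_pct - demand)
    return len(free)


def utilization(demands_pct, num_cards, card_capacity_pct=100):
    """Fraction of provisioned card capacity that the jobs actually use."""
    return sum(demands_pct) / (num_cards * card_capacity_pct)


# Illustrative workload only: six small jobs, each needing part of a card.
demands = [25, 10, 20, 25, 10, 10]

dedicated = utilization(demands, num_cards=len(demands))  # one card per job
shared_cards = pack_first_fit(demands)
shared = utilization(demands, num_cards=shared_cards)

print(f"dedicated cards: {len(demands)}, utilization {dedicated:.0%}")
print(f"shared cards:    {shared_cards}, utilization {shared:.0%}")
print(f"improvement:     {shared / dedicated:.1f}x")
```

The gain in this toy model depends entirely on how small the jobs are relative to a card, so the printed ratio says nothing about real workloads.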

The platform is fully compatible with domestic GPU/CPU/NPU hardware, helping build a secure and controllable local compute infrastructure.

Heterogeneous Compute Management

Four Core Values to Fully Unleash AI Compute Potential

  • Compute Pooling for Maximum Utilization

    Pooling and virtualizing heterogeneous GPU/CPU compute improves utilization by 3–10x.

    • Heterogeneous GPU virtualization supporting NVIDIA, Cambricon, Huawei Ascend, Iluvatar, and more
    • Unified compute resource pooling to manage GPU clusters, CPU clusters, and file storage
    • Native Kubernetes integration for seamless compute resource and service management
    • Harbor image registry integration with out-of-the-box application management based on Helm charts
  • On-Demand Allocation and Flexible Scheduling

    Supports compute over-provisioning and fine-grained slicing, with flexible allocation by tenant/department/project and elastic scaling on demand.

    • Multi-tenant isolation with standard authentication, custom authentication, and multi-tenant permission management
    • Fine-grained compute slicing with GPU scheduling at sub-card granularity
    • Elastic scaling to dynamically adjust compute resources based on workload demands
    • Comprehensive cluster management covering nodes, groups, containers, storage, monitoring, and logging
  • Metering, Billing, and Fine-grained Cost Management

    Supports compute usage monitoring and cost accounting to help enterprises manage IT costs with precision.

    • Real-time compute usage monitoring for accurate tracking of GPU/CPU resource consumption
    • Multi-dimensional billing with flexible statistics by tenant, project, and time period
    • Cost visualization for intuitive display of compute usage and cost allocation across departments
  • Intelligent Scheduling and Operations

    Distributed scheduling at thousand-card scale, with built-in job queues, service deployment, monitoring and alerting, and log management.

    • Thousand-card distributed scheduling supporting large-scale parallel AI training jobs
    • Built-in job queues with priority scheduling and resource reservation policies
    • Monitoring and alerting for real-time cluster health awareness and automatic fault notification
    • Log management for unified collection and retrieval of container logs to quickly locate issues
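The combination of priority job queues and sub-card GPU scheduling described above can be sketched as follows. This is a toy model under stated assumptions, not the platform's scheduler or API: `JobQueue`, `submit`, `schedule`, and `pending` are hypothetical names, GPU shares are expressed as whole percentages of one card, and resource reservation, preemption, and multi-card jobs are left out.

```python
import heapq
from itertools import count


class JobQueue:
    """Toy priority queue that places jobs on fractions of GPU cards."""

    def __init__(self, free_pct_per_card):
        self.free = list(free_pct_per_card)  # free share of each card, in percent
        self._heap = []
        self._seq = count()  # tie-breaker: equal priorities run in submit order

    def submit(self, name, priority, gpu_pct):
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), name, gpu_pct))

    def schedule(self):
        """Place queued jobs in priority order and return {job name: card index}.

        Jobs that fit nowhere stay queued. Lower-priority jobs may still be
        placed after a job that did not fit (simple backfilling), which is a
        simplification chosen for this sketch.
        """
        placed, waiting = {}, []
        while self._heap:
            item = heapq.heappop(self._heap)
            _, _, name, gpu_pct = item
            card = next(
                (i for i, free in enumerate(self.free) if free >= gpu_pct), None
            )
            if card is None:
                waiting.append(item)
            else:
                self.free[card] -= gpu_pct
                placed[name] = card
        for item in waiting:
            heapq.heappush(self._heap, item)
        return placed

    def pending(self):
        """Names of jobs still waiting, highest priority first."""
        return [name for _, _, name, _ in sorted(self._heap)]


# Illustrative run: two cards, four jobs with invented priorities and sizes.
queue = JobQueue([100, 100])
queue.submit("train-a", priority=5, gpu_pct=80)
queue.submit("notebook", priority=1, gpu_pct=30)
queue.submit("infer-b", priority=9, gpu_pct=50)
queue.submit("train-c", priority=5, gpu_pct=70)

placements = queue.schedule()
print(placements)       # jobs that received a card share
print(queue.pending())  # jobs still waiting for capacity
```

A production scheduler would also need reservation and fairness policies so that a large high-priority job is not starved by smaller backfilled jobs.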

Product Architecture: Full-Stack Management from User Permissions to Heterogeneous Hardware

Get started quickly with the AI Compute Management Platform by reading the documentation.