Kube AI Hub Compute Platform

Unified Heterogeneous Compute
3–10x Better GPU Utilization

Kube AI Hub is a heterogeneous compute management platform built on Kubernetes. It pools GPU/CPU resources and virtualizes GPUs (vGPU) to manage hardware clusters at the platform level, and it supports domestic GPU/CPU/NPU hardware for a secure, self-controlled AI compute infrastructure.

Full-stack Compute Management, Simplified

Kube AI Hub provides end-to-end compute management from hardware resources to business applications. A unified console manages heterogeneous GPU/CPU clusters with built-in multi-tenancy, elastic scheduling, and fine-grained metering — helping enterprises rapidly build a self-controlled AI compute infrastructure.

Easy to Deploy

Deploy on any existing Kubernetes cluster or on bare metal. Supports online and air-gapped installation, with one-click scaling and upgrades.
Feature Complete

Manage GPU nodes, job queues, compute scheduling, multi-tenancy, monitoring, metering, and log management in a single unified platform.
Modular & Pluggable

All modules are loosely coupled and optional. Flexibly integrate third-party schedulers, storage systems, and monitoring stacks.

Value for Every Team

The built-in multi-tenant design lets infrastructure teams, AI engineers, operations staff, and business owners work on the same platform. Infra teams control hardware resources centrally, engineers focus on model development, ops teams gain complete observability and automation, and business owners get transparent, auditable compute costs.

Infra Team

Unified management of heterogeneous GPU/CPU clusters — resource pooling reduces hardware costs

Unified onboarding of NVIDIA, Huawei Ascend, Cambricon, Iluvatar, and other GPUs and NPUs
vGPU virtualization slices compute resources, improving hardware utilization by 3–10x
Built-in CSI support for S3, NFS, Ceph, and other storage backends
Multi-cluster management across data centers and hybrid cloud environments
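
To see why slicing can raise utilization, consider a minimal sketch that packs fractional GPU requests onto cards. It is an illustration only, not Kube AI Hub's scheduler: the `pack_first_fit` helper, the request sizes, and the first-fit-decreasing heuristic are all assumptions made for this example.

```python
from fractions import Fraction


def pack_first_fit(requests, card_capacity=Fraction(1)):
    """Pack fractional GPU requests onto cards using first-fit decreasing.

    Hypothetical helper for illustration. Returns a list of cards, each a
    list of the request sizes placed on that card.
    """
    cards = []
    for size in sorted(requests, reverse=True):
        if size > card_capacity:
            raise ValueError(f"request {size} exceeds one card")
        for card in cards:
            if sum(card) + size <= card_capacity:
                card.append(size)
                break
        else:
            # No existing card has room, so allocate a new one.
            cards.append([size])
    return cards


# Eight small workloads, each needing a fraction of one GPU (made-up sizes).
requests = [Fraction(1, 4)] * 4 + [Fraction(1, 2)] * 2 + [Fraction(1, 8)] * 2

whole_card_count = len(requests)          # one job per card, no sharing
shared_cards = pack_first_fit(requests)   # sub-card slicing
shared_card_count = len(shared_cards)

demand = sum(requests)
print(f"whole-card allocation: {whole_card_count} cards, "
      f"utilization {float(demand / whole_card_count):.0%}")
print(f"sliced allocation:     {shared_card_count} cards, "
      f"utilization {float(demand / shared_card_count):.0%}")
```

With these made-up sizes, eight whole-card allocations shrink to three shared cards. The actual gain depends entirely on the workload mix.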

AI Engineers

Focus on model training and inference — no more fighting Kubernetes YAML

Submit AI training jobs via web console without writing complex Kubernetes manifests
Built-in job queues with priority scheduling and resource reservation for fair compute allocation
Supports distributed training with PyTorch, TensorFlow, and other major frameworks
One-click inference service deployment with auto horizontal scaling
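
The idea of a priority queue with resource reservation can be sketched in a few lines. This is a toy model under stated assumptions, not the platform's scheduler: the `JobQueue` class, its parameters, the job names, and the simple backfill behavior are all hypothetical.

```python
import heapq
import itertools


class JobQueue:
    """Toy priority queue that holds some GPUs back for urgent jobs."""

    def __init__(self, total_gpus, reserved_gpus, reserved_min_priority):
        self.free = total_gpus
        self.reserved = reserved_gpus                  # held back for urgent jobs
        self.reserved_min_priority = reserved_min_priority
        self._heap = []
        self._counter = itertools.count()              # FIFO tie-breaker

    def submit(self, name, priority, gpus):
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), name, gpus))

    def schedule(self):
        """Start every queued job that fits; return started names in order."""
        started, waiting = [], []
        while self._heap:
            neg_priority, seq, name, gpus = heapq.heappop(self._heap)
            urgent = -neg_priority >= self.reserved_min_priority
            # Only urgent jobs may dip into the reserved GPUs.
            usable = self.free if urgent else self.free - self.reserved
            if gpus <= usable:
                self.free -= gpus
                started.append(name)
            else:
                waiting.append((neg_priority, seq, name, gpus))
        # Jobs that did not fit go back on the queue for the next pass.
        for item in waiting:
            heapq.heappush(self._heap, item)
        return started


queue = JobQueue(total_gpus=8, reserved_gpus=2, reserved_min_priority=10)
queue.submit("nightly-finetune", priority=1, gpus=4)
queue.submit("prod-retrain", priority=10, gpus=2)
queue.submit("ablation", priority=1, gpus=4)

started = queue.schedule()
print("started:", started)
print("free GPUs:", queue.free)
```

Here the high-priority job starts first even though it was submitted second, and the second low-priority job waits because the remaining two GPUs stay reserved.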

Ops Team

One-stop operations and observability for the entire compute platform

Multi-dimensional monitoring and alerting for GPU temperature, utilization, and memory usage
Centralized log collection and search to quickly diagnose job failures
Node health checks and automatic fault isolation to ensure training job stability
Graphical console and web terminal to accommodate different operational preferences

Business Owner

Compute costs are transparent and auditable — manage IT budgets with precision

View compute usage and cost allocation reports by tenant, department, or project
Quota management prevents resource contention and waste
Metering reports support IT budget planning and cost accounting
Multi-tenant isolation ensures data and resource security across teams
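
As a rough illustration of per-tenant metering and quota checks, the sketch below aggregates usage records and flags tenants over quota. The record layout, the flat price, the quota values, and the helper names are assumptions for this example and do not describe Kube AI Hub's actual data model.

```python
from collections import defaultdict

# Hypothetical usage records: (tenant, project, gpu_hours).
usage_records = [
    ("research", "llm-pretrain", 120.0),
    ("research", "vision", 30.0),
    ("marketing", "recsys", 45.5),
    ("research", "llm-pretrain", 80.0),
]

# Assumed flat rate and per-tenant quotas; real pricing would be configured.
PRICE_PER_GPU_HOUR = 2.0
quota_gpu_hours = {"research": 200.0, "marketing": 100.0}


def summarize(records):
    """Total GPU-hours per tenant and per (tenant, project) pair."""
    by_tenant = defaultdict(float)
    by_project = defaultdict(float)
    for tenant, project, hours in records:
        by_tenant[tenant] += hours
        by_project[(tenant, project)] += hours
    return dict(by_tenant), dict(by_project)


def over_quota(by_tenant, quotas):
    """Tenants whose usage exceeds their quota (missing quota counts as 0)."""
    return sorted(t for t, hours in by_tenant.items() if hours > quotas.get(t, 0.0))


by_tenant, by_project = summarize(usage_records)
for tenant, hours in sorted(by_tenant.items()):
    cost = hours * PRICE_PER_GPU_HOUR
    print(f"{tenant}: {hours:.1f} GPU-hours, cost {cost:.2f}")
print("over quota:", over_quota(by_tenant, quota_gpu_hours))
```

The same aggregation can be keyed by department or project to produce the cost allocation views described above.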

Key Platform Features

Kube AI Hub covers the full compute management lifecycle from hardware onboarding to workload delivery. All features are modular and can be enabled on demand.

Heterogeneous GPU Cluster Management

Unified onboarding of NVIDIA, Huawei Ascend, Cambricon, Iluvatar, and other GPUs and NPUs. Supports online node expansion and cross-cluster resource allocation.
vGPU Virtualization & Scheduling

Fine-grained GPU slicing lets concurrent workloads share a card at sub-card granularity, improving hardware utilization by 3–10x.
Multi-tenant Access Control

Three-tier permission system across platform, workspace, and project. Supports AD/LDAP integration for secure multi-team resource isolation.
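
One way to picture a three-tier permission model is as role bindings that are resolved from the broadest tier down. The sketch below is an assumption-laden illustration: the role names, the binding tables, and the "highest applicable role wins" rule are hypothetical and are not taken from the product.

```python
from typing import Optional

# Hypothetical role bindings at each tier; names are illustrative only.
platform_roles = {"alice": "admin"}
workspace_roles = {("bob", "ml-team"): "editor"}
project_roles = {("carol", "ml-team", "chatbot"): "viewer"}

# Assumed privilege ordering, lowest to highest.
RANK = {"viewer": 1, "editor": 2, "admin": 3}


def effective_role(user: str, workspace: str, project: str) -> Optional[str]:
    """Return the highest-ranked role granted at any applicable tier."""
    candidates = [
        platform_roles.get(user),                          # applies everywhere
        workspace_roles.get((user, workspace)),            # applies to the workspace
        project_roles.get((user, workspace, project)),     # applies to one project
    ]
    granted = [role for role in candidates if role is not None]
    if not granted:
        return None
    return max(granted, key=RANK.__getitem__)


for user in ("alice", "bob", "carol", "dave"):
    print(user, "->", effective_role(user, "ml-team", "chatbot"))
```

A user with no binding at any tier resolves to `None`, which models isolation between teams: access has to be granted explicitly.
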
Storage & Networking

Supports S3, NFS, Ceph, LocalPV, and other storage backends. Built-in network policy management, with support for Calico, Flannel, and other CNI plugins.

Heterogeneous Compute Management

Pooling and virtualization of heterogeneous GPU/CPU compute improve utilization by 3–10x, with support for domestic GPU/CPU/NPU hardware as a secure local compute foundation.

Intelligent Job Scheduling

Thousand-GPU distributed scheduling with built-in priority job queues and resource reservation policies for large-scale parallel AI training workloads.

Full-stack Observability

Multi-dimensional GPU/CPU monitoring, alerting, and log management with multi-tenant isolation and support for multiple notification channels.

Metering and Billing

Compute usage monitoring and cost accounting by tenant, department, and project — helping enterprises manage IT costs with precision.

Multi-cluster Management

Unified management of multiple GPU/CPU clusters across data centers and hybrid cloud, with high availability and disaster recovery best practices.

Edge Node Support

Extend compute scheduling to edge nodes via KubeEdge, enabling cloud-edge collaborative AI inference job distribution and management.

App Marketplace

Built-in Helm-based app marketplace and image registry (Harbor) for one-click deployment and lifecycle management of AI frameworks and tools.
