
Concepts & Roles

Understanding the platform's tenancy model and permission system helps you work effectively with projects, groups, and storage.


Tenancy Hierarchy

Resources on the platform are organised in a hierarchy:

```mermaid
graph TD
    Admin[Platform Admin] -->|creates| Group[Group\ne.g. Research Lab]
    Group -->|contains| Project[Project\ne.g. LLM Fine-Tuning]
    Project -->|launches| Workspace[Workspace / IDE]
    Project -->|binds| ProjStorage[Project Storage PVC]
    Group -->|owns| GroupStorage[Group Storage PVC]
    GroupStorage -->|inherited by| ProjStorage
```

| Entity | Description |
|---|---|
| Group | A team or department. Groups own shared storage and can be project members. |
| Project | The primary resource allocation unit. Holds GPU/CPU quotas, config files, and deployments. |
| Workspace | An interactive IDE (JupyterLab / VSCode) running inside the project's quota. |
| Storage (PVC) | Persistent volume claim; survives workspace restarts. |
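As a concrete illustration, a project storage PVC is an ordinary Kubernetes PersistentVolumeClaim. The sketch below is hypothetical: the claim name is invented, and using the storage-lane name as the storageClassName is an assumption, not confirmed platform behaviour.

```yaml
# Hypothetical project-storage PVC. The name and the use of the
# "shared-rwx" lane as a storageClassName are illustrative assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llm-fine-tuning-storage
spec:
  accessModes:
    - ReadWriteMany          # shared lane: mountable by several workspaces
  resources:
    requests:
      storage: 100Gi
  storageClassName: shared-rwx
```

Because the PVC is bound to the project rather than the pod, its data outlives any individual workspace restart.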

Quota Lifecycle

```mermaid
flowchart LR
    Admin["Admin defines\nResource Plan"] -->|assigned to| Project
    Project -->|quota enforced on| Launch[Workspace Launch]
    Launch -->|requests| DRA["DRA ResourceClaim"]
    Launch -->|uses| Queue["Platform Queue"]
    DRA -->|allocates| GPULimit["GPU / SM Share"]
    Queue -->|applies| Priority["Priority / Preemption Policy"]
```

Each project is assigned a Resource Plan by an admin. When you launch a workspace, the platform verifies project quota and any per-user quota before creating Kubernetes DRA resources for the pod.

A plan also carries:

  • An allowed GPU model list — only these models are selectable from the launch form.
  • An optional schedule window that restricts when plan-bound queues are usable (see Plan Window below).
  • One or more bound queues with their priority, preemption, and GPU-model policy.
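Putting those pieces together, a plan might be expressed roughly as follows. This is a sketch only: the field names, GPU model identifiers, and queue names are all invented for illustration and do not reflect the platform's actual schema.

```yaml
# Hypothetical shape of a Resource Plan; every field name here is
# illustrative, not the platform's real configuration format.
name: research-lab-standard
quotas:
  gpu: 8
  cpu: "64"
  memory: 256Gi
  perUserGpu: 2              # per-user quota checked at launch time
allowedGpuModels:            # only these appear on the launch form
  - NVIDIA-A100-80GB
  - NVIDIA-L40S
planWindow:                  # see Plan Window below
  days: [Mon, Tue, Wed, Thu, Fri]
  open: "09:00"
  close: "21:00"
boundQueues:
  - name: lab-high
    priority: 100
    preemptible: false
  - name: lab-batch
    priority: 10
    preemptible: true
    deservedGpu: 2
```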

Scheduling Queues, Priority, and Preemption

Workloads choose a platform queue. Queue records are synced to Volcano Queue CRDs for policy compatibility, while DRA GPU pods are scheduled by the Kubernetes default scheduler using ResourceClaims. A queue owns a priority value, an optional preemptible flag, and an optional deserved GPU limit:

| Concept | Meaning |
|---|---|
| Priority | Higher-priority queues are scheduled first when GPUs are scarce. |
| Preemptible | Pods in this queue can be evicted to free GPUs for higher-priority queues. |
| Deserved GPU | The GPU floor a queue is guaranteed even when other queues compete. A 0 value marks a queue as a "victim" eligible to be drained first. |

The default queue is always available; plan-bound queues are listed alongside it on the workspace and deploy forms.
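Since queue records are synced to Volcano Queue CRDs, the three concepts above map onto that CRD roughly as sketched below. The resource name and values are illustrative, and field availability (notably spec.priority and spec.deserved) depends on the Volcano version deployed, so treat this as an assumption about the sync target rather than the platform's exact output.

```yaml
# Sketch of the Volcano Queue a platform queue might sync to.
# Values and the nvidia.com/gpu resource name are illustrative.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: lab-batch
spec:
  priority: 10            # higher values win under GPU contention
  reclaimable: true       # maps to the platform's "preemptible" flag
  deserved:
    nvidia.com/gpu: 2     # GPU floor; 0 would mark the queue a victim
```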


Plan Window

A plan window is the time interval during which a project's plan-bound queues are usable. Outside the window, plan-bound queues stop scheduling new pods and the platform's plan-window reaper may evict pods already running in those queues. Pods on the default queue are not affected.

The dashboard shows live countdowns for the next boundary (open/close), so users can avoid eviction by switching to the default queue or waiting for the next window.


DRA Resource Claims

GPU access on the platform is provisioned through Kubernetes Dynamic Resource Allocation (DRA) ResourceClaims. Each claim describes:

  • A device class (e.g., a specific GPU model)
  • An SM share percentage that describes the requested compute-time share
  • A VRAM policy: elastic shares VRAM with peers, hard_cap enforces a strict ceiling
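In current Kubernetes DRA terms, those three properties could be carried by a ResourceClaim like the sketch below. The API group/version matches upstream DRA, but the driver name, device class, and the smShare/vramPolicy parameter names are assumptions about this platform's driver, not documented values.

```yaml
# Hypothetical ResourceClaim; the driver name, device class, and the
# smShare / vramPolicy parameters are illustrative assumptions.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: train-a
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.example.com   # a specific GPU model class
    config:
      - requests: ["gpu"]
        opaque:
          driver: gpu.example.com
          parameters:
            smShare: 50            # 50% compute-time share
            vramPolicy: elastic    # or hard_cap for a strict ceiling
```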

There are two ways a claim is created:

  • Inline claim — the workspace launch flow creates a ResourceClaim that lives with the pod and disappears when the pod is deleted.
  • Project-managed claim — created from a project's GPU Claims tab and reused by Pod or Deployment config files across launches. It is a pending contract until Kubernetes DRA allocates it and reservedFor points at a live Pod; only that bound state counts against quota and resource-hours.

Config files refer to project-managed claims through named deploy-time slots, for example platform-go/dra-claim-name: '{{ gpuClaimName "train-a" }}'. The slot name is not the claim name; the deploying user maps each slot to one of their own ResourceClaims. A Deployment with multiple replicas shares the same claim allocation and is counted once per claim. A single Kubernetes Deployment cannot assign different claims to different replicas because every replica shares one PodTemplate, so split the workload into multiple Deployments when only some pods should use a claim.
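In a config file, the slot wiring might look like the fragment below. The platform-go/dra-claim-name annotation and the "train-a" slot are taken from the text above; everything else (image, labels, how the platform rewrites the annotation into DRA pod fields at deploy time) is an assumption for illustration.

```yaml
# Hypothetical Deployment config using a named claim slot. Only the
# annotation line is from the docs; the rest is an illustrative sketch.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: train
spec:
  replicas: 2                  # all replicas share the one claim allocation
  selector:
    matchLabels: { app: train }
  template:
    metadata:
      labels: { app: train }
      annotations:
        platform-go/dra-claim-name: '{{ gpuClaimName "train-a" }}'
    spec:
      containers:
        - name: main
          image: registry.example.com/train:latest   # illustrative image
```

At deploy time the user maps the "train-a" slot to one of their own project-managed ResourceClaims; if only some pods should hold a claim, they go in a separate Deployment, since one PodTemplate means one claim mapping.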


Role-Based Permissions

System Roles

Permission User Manager Admin
Use workspaces within quota
Create personal projects
Submit resource requests
Access admin panel
Manage all users / groups
Define resource plans
Manage platform queues
View audit logs
RBAC policy management

Project Roles

Permission Member Manager Admin
Launch workspaces
View project configs
Edit config files
Manage deployments
Add/remove members
Delete project

Group Roles

Permission Member Admin
Access group storage
View group members
Add/remove members
Set storage permissions

Storage Permission Model

Storage permissions follow a dual-path inheritance model:

```mermaid
flowchart TD
    GroupAdmin["Group Admin\nsets permissions"] -->|batch set| GroupStorage["Group PVC\n(source of truth)"]
    GroupStorage -->|inherited, read-only| ProjectView["Project Storage View\nfor Group Members"]
    ProjectAdmin["Project Admin"] -->|direct management| DirectPerms["Project Storage\nfor Non-Group Members"]
    DirectPerms --> ProjectView
```
  • Group members always inherit storage permissions from the group level. Their permissions cannot be overridden at the project level.
  • Non-group project members have permissions managed directly at the project level.

Key Terms

| Term | Definition |
|---|---|
| PVC | Persistent Volume Claim — a Kubernetes storage resource that survives pod restarts |
| Platform Queue | A named scheduling policy that controls priority, preemption, queue windows, and optional GPU model affinity |
| Plan Window | The time interval during which a project's plan-bound queues are usable |
| Preemption | Eviction of a pod in a preemptible queue so a higher-priority workload can run |
| DRA | Dynamic Resource Allocation — Kubernetes API that provisions GPUs via ResourceClaim objects |
| ResourceClaim | A DRA object that requests a fractional or whole GPU; platform-managed standalone claims consume quota only while bound to live Pods |
| Config File | A versioned Kubernetes YAML/config stored with content-addressable immutability |
| SM share | Streaming Multiprocessor share percentage used to describe fractional GPU compute allocation |
| Resource Plan | A named template defining GPU, CPU, memory limits, allowed GPU models, and schedule windows for a project |
| Storage Lane | A profile (shared-rwx, legacy-rwx, fast-rwo) that selects how a PVC is provisioned |
| Content-Addressable Storage (CAS) | Storage where files are identified by their SHA-256 hash, making them immutable |
| ltree | PostgreSQL data type for hierarchical tree paths (used for nested projects) |
| Workspace | An interactive cloud IDE (JupyterLab, VSCode) running inside a Kubernetes pod |