# Concepts & Roles
Understanding the platform's tenancy model and permission system helps you work effectively with projects, groups, and storage.
## Tenancy Hierarchy
Resources on the platform are organised in a hierarchy:
```mermaid
graph TD
    Admin[Platform Admin] -->|creates| Group["Group\ne.g. Research Lab"]
    Group -->|contains| Project["Project\ne.g. LLM Fine-Tuning"]
    Project -->|launches| Workspace["Workspace / IDE"]
    Project -->|binds| ProjStorage[Project Storage PVC]
    Group -->|owns| GroupStorage[Group Storage PVC]
    GroupStorage -->|inherited by| ProjStorage
```
| Entity | Description |
|---|---|
| Group | A team or department. Groups own shared storage and can be project members. |
| Project | The primary resource allocation unit. Holds GPU/CPU quotas, config files, and deployments. |
| Workspace | An interactive IDE (JupyterLab / VSCode) running inside the project's quota. |
| Storage (PVC) | Persistent volume claim; survives workspace restarts. |
## Quota Lifecycle
```mermaid
flowchart LR
    Admin["Admin defines\nResource Plan"] -->|assigned to| Project
    Project -->|quota enforced on| Launch[Workspace Launch]
    Launch -->|requests| DRA["DRA ResourceClaim"]
    Launch -->|uses| Queue["Platform Queue"]
    DRA -->|allocates| GPULimit["GPU / SM Share"]
    Queue -->|applies| Priority["Priority / Preemption Policy"]
```
Each project is assigned a Resource Plan by an admin. When you launch a workspace, the platform verifies project quota and any per-user quota before creating Kubernetes DRA resources for the pod.
A plan also carries (see the sketch after this list):
- An allowed GPU model list — only these models are selectable from the launch form.
- An optional schedule window that restricts when plan-bound queues are usable (see Plan Window below).
- One or more bound queues with their priority, preemption, and GPU-model policy.
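For orientation, the sketch below shows how these plan fields might fit together. It is illustrative only: every field name in it is an assumption, not the platform's actual plan schema.

```yaml
# Illustrative sketch only — all field names are assumptions, not the real schema.
name: lab-standard
allowedGpuModels: [A100, L40S]   # only these models appear on the launch form
window:                          # optional; see Plan Window below
  open: "09:00"
  close: "18:00"
boundQueues:
  - name: lab-high
    priority: 100
    preemptible: false
    gpuModels: [A100]            # per-queue GPU-model policy
```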
## Scheduling Queues, Priority, and Preemption
Workloads choose a platform queue. Queue records are synced to Volcano Queue CRDs for policy compatibility, while DRA GPU pods run through Kubernetes default-scheduler with ResourceClaims. A queue owns a priority value, an optional preemptible flag, and an optional deserved GPU limit:
| Concept | Meaning |
|---|---|
| Priority | Higher-priority queues are scheduled first when GPUs are scarce. |
| Preemptible | Pods in this queue can be evicted to free GPUs for higher-priority queues. |
| Deserved GPU | The GPU floor a queue is guaranteed even when other queues compete. A 0 value marks a queue as a "victim" eligible to be drained first. |
The default queue is always available; plan-bound queues are listed alongside it on the workspace and deploy forms.
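As a rough illustration of the Volcano sync, a queue record with priority 100, preemption enabled, and a deserved floor of 4 GPUs might land in a Queue CRD like the sketch below. The CRD kind and API group come from Volcano; the metadata name, values, and the exact resource key are assumptions about how the platform writes them.

```yaml
# Hedged sketch of a synced Volcano Queue CRD; names and values are illustrative.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: lab-high                 # assumed queue name
spec:
  priority: 100                  # scheduled first when GPUs are scarce
  reclaimable: true              # preemptible: pods here may be evicted
  deserved:
    nvidia.com/gpu: "4"          # guaranteed GPU floor; 0 would mark a victim
```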
## Plan Window
A plan window is the time interval during which a project's plan-bound queues are usable. Outside the window, plan-bound queues stop scheduling new pods and the platform's plan-window reaper may evict pods already running in those queues. Pods on the default queue are not affected.
The dashboard shows live countdowns for the next boundary (open/close), so users can avoid eviction by switching to the default queue or waiting for the next window.
## DRA Resource Claims
GPU access on the platform is provisioned through Kubernetes Dynamic Resource Allocation (DRA) ResourceClaims. Each claim describes:
- A device class (e.g., a specific GPU model)
- An SM share percentage that describes the requested compute-time share
- A VRAM policy — `elastic` shares VRAM with peers, `hard_cap` enforces a strict ceiling (see the sketch after this list)
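Concretely, a claim for half of a GPU with elastic VRAM might look like the sketch below. The object structure is standard Kubernetes DRA (here `resource.k8s.io/v1beta1`; the exact API version depends on the cluster), while the device class name, driver name, and the `smShare`/`vramPolicy` parameter keys are assumptions about this platform's GPU driver.

```yaml
# Sketch of a DRA ResourceClaim; driver name and opaque parameters are assumed.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: train-a
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: nvidia-a100       # assumed device class per GPU model
    config:
      - requests: ["gpu"]
        opaque:
          driver: gpu.platform.example     # assumed driver name
          parameters:
            smShare: "50"                  # requested compute-time share (%)
            vramPolicy: elastic            # elastic | hard_cap
```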
There are two ways a claim is created:
- Inline claim — the workspace launch flow creates a ResourceClaim that lives with the pod and disappears when the pod is deleted (see the sketch below).
- Project-managed claim — created from a project's GPU Claims tab and reused by Pod or Deployment config files across launches. It is a pending contract until Kubernetes DRA allocates it and `reservedFor` points at a live Pod; only that bound state counts against quota and resource-hours.
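For the inline path, standard Kubernetes DRA ties a claim's lifetime to the pod through a ResourceClaimTemplate; a generated pod spec might look roughly like this (all names are illustrative, and the launch flow writes this for you):

```yaml
# Sketch of the inline path: the claim is generated from a template and
# garbage-collected with the pod. All names here are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: workspace-abc123
spec:
  resourceClaims:
    - name: gpu
      resourceClaimTemplateName: workspace-gpu-template  # assumed template name
  containers:
    - name: ide
      image: jupyterlab-image:latest                     # assumed image
      resources:
        claims:
          - name: gpu    # container consumes the pod-level claim
```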
Config files refer to project-managed claims through named deploy-time slots, for example `platform-go/dra-claim-name: '{{ gpuClaimName "train-a" }}'`. The slot name is not the claim name; the deploying user maps each slot to one of their own ResourceClaims. A Deployment with multiple replicas shares the same claim allocation and is counted once per claim. A single Kubernetes Deployment cannot assign different claims to different replicas, because every replica shares one PodTemplate; split the workload into multiple Deployments when only some pods should use a claim.
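Putting that together, a Deployment config file using the "train-a" slot might look like the sketch below. The annotation and `gpuClaimName` template function are as documented above; the pod-level `resourceClaimName` wiring is standard Kubernetes DRA, and whether the platform injects it from the slot mapping is an assumption here.

```yaml
# Sketch of a Deployment using a project-managed claim via a deploy-time slot.
# The resolved claim name, labels, and image are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trainer
spec:
  replicas: 2                            # replicas share one claim allocation
  selector:
    matchLabels: { app: trainer }
  template:
    metadata:
      labels: { app: trainer }
      annotations:
        platform-go/dra-claim-name: '{{ gpuClaimName "train-a" }}'
    spec:
      resourceClaims:
        - name: gpu
          resourceClaimName: train-a-claim   # assumed: resolved from the slot
      containers:
        - name: main
          image: trainer-image:latest        # assumed image
          resources:
            claims:
              - name: gpu
```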
## Role-Based Permissions
### System Roles
| Permission | User | Manager | Admin |
|---|---|---|---|
| Use workspaces within quota | ✓ | ✓ | ✓ |
| Create personal projects | ✓ | ✓ | ✓ |
| Submit resource requests | ✓ | ✓ | ✓ |
| Access admin panel | | ✓ | ✓ |
| Manage all users / groups | | | ✓ |
| Define resource plans | | | ✓ |
| Manage platform queues | | | ✓ |
| View audit logs | | ✓ | ✓ |
| RBAC policy management | | | ✓ |
### Project Roles
| Permission | Member | Manager | Admin |
|---|---|---|---|
| Launch workspaces | ✓ | ✓ | ✓ |
| View project configs | ✓ | ✓ | ✓ |
| Edit config files | | ✓ | ✓ |
| Manage deployments | | ✓ | ✓ |
| Add/remove members | | | ✓ |
| Delete project | | | ✓ |
### Group Roles
| Permission | Member | Admin |
|---|---|---|
| Access group storage | ✓ | ✓ |
| View group members | ✓ | ✓ |
| Add/remove members | | ✓ |
| Set storage permissions | | ✓ |
## Storage Permission Model
Storage permissions follow a dual-path inheritance model:
```mermaid
flowchart TD
    GroupAdmin["Group Admin\nsets permissions"] -->|batch set| GroupStorage["Group PVC\n(source of truth)"]
    GroupStorage -->|inherited, read-only| ProjectView["Project Storage View\nfor Group Members"]
    ProjectAdmin["Project Admin"] -->|direct management| DirectPerms["Project Storage\nfor Non-Group Members"]
    DirectPerms --> ProjectView
```
- Group members always inherit storage permissions from the group level. Their permissions cannot be overridden at the project level.
- Non-group project members have permissions managed directly at the project level.
## Key Terms
| Term | Definition |
|---|---|
| PVC | Persistent Volume Claim — a Kubernetes storage resource that survives pod restarts |
| Platform Queue | A named scheduling policy that controls priority, preemption, queue windows, and optional GPU model affinity |
| Plan Window | The time interval during which a project's plan-bound queues are usable |
| Preemption | Eviction of a pod in a preemptible queue so a higher-priority workload can run |
| DRA | Dynamic Resource Allocation — Kubernetes API that provisions GPUs via ResourceClaim objects |
| ResourceClaim | A DRA object that requests a fractional or whole GPU; platform-managed standalone claims consume quota only while bound to live Pods |
| Config File | A versioned Kubernetes YAML/config stored with content-addressable immutability |
| SM share | Streaming Multiprocessor share percentage used to describe fractional GPU compute allocation |
| Resource Plan | A named template defining GPU, CPU, memory limits, allowed GPU models, and schedule windows for a project |
| Storage Lane | A profile (`shared-rwx`, `legacy-rwx`, `fast-rwo`) that selects how a PVC is provisioned |
| Content-Addressable Storage (CAS) | Storage where files are identified by their SHA-256 hash, making them immutable |
| ltree | PostgreSQL data type for hierarchical tree paths (used for nested projects) |
| Workspace | An interactive cloud IDE (JupyterLab, VSCode) running inside a Kubernetes pod |