System Components
Core platform architecture, services, nodes, and component interactions. These components determine how jobs are submitted, scheduled, and executed across the CosmicAC infrastructure.
Components used across all job types:
| Component | Description | Role |
|---|---|---|
| app-ui | Web interface | Browser-based dashboard for job management |
| app-node | Application server | Handles HTTP API requests, authenticates users, and routes commands to the orchestrator |
| cosmicac-cli | Command-line interface | Submits jobs, manages resources, and connects to containers from the terminal |
| wrk-ork | Orchestrator | Manages resource allocation, distributes jobs across the cluster, and routes requests to workers |
| wrk-server-k8s-nvidia | Kubernetes GPU server worker | Manages GPU server provisioning and communicates with the Kubernetes cluster |
| k8s-control plane | Kubernetes control plane | Schedules pods, allocates resources, and manages workload lifecycle |
GPU container jobs run user workloads inside KubeVirt virtual machines with direct GPU access. Each container runs in an isolated VM, scheduled by Kubernetes and managed through the CosmicAC orchestration layer.
| Component | Description | Role |
|---|---|---|
| wrk-agent-instance | Instance agent | Runs inside the VM and exposes SSH access over Hyperswarm |
| VMI | KubeVirt VirtualMachineInstance | Virtual machine instance managed by KubeVirt |
- User submits a job via app-ui or cosmicac-cli.
- The request is authenticated by app-node over HTTPS (ttr-token).
- app-node forwards the request to wrk-ork over HRPC.
- wrk-ork routes the job to wrk-server-k8s-nvidia over HRPC.
- wrk-server-k8s-nvidia instructs the k8s-control plane to schedule the workload.
- Kubernetes creates a pod containing a VMI with wrk-agent-instance.
- User runs `cosmicac jobs shell` from cosmicac-cli.
- CLI connects directly to wrk-agent-instance over hyperswarm-ssh.
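The submission flow above can be sketched as a minimal simulation. All function and field names here are illustrative stand-ins, not the actual CosmicAC APIs; in the real system these hops go over HTTPS and HRPC rather than direct function calls:

```python
# Illustrative sketch of the GPU container job flow; component names mirror
# the architecture above, but message shapes and the token check are assumed.

def app_node(request):
    """Authenticate the HTTPS request, then forward it to the orchestrator (HRPC)."""
    if request.get("ttr_token") != "valid-token":  # hypothetical ttr-token check
        raise PermissionError("authentication failed")
    return wrk_ork({"job": request["job"]})

def wrk_ork(job):
    """Orchestrator: route the job to the Kubernetes GPU server worker (HRPC)."""
    return wrk_server_k8s_nvidia(job)

def wrk_server_k8s_nvidia(job):
    """Ask the k8s-control plane to schedule a pod containing a VMI with the agent."""
    pod = {"vmi": job["job"], "agent": "wrk-agent-instance"}
    return {"status": "scheduled", "pod": pod}

result = app_node({"ttr_token": "valid-token", "job": "gpu-container"})
```

Once the pod is running, the CLI bypasses this chain entirely and reaches wrk-agent-instance directly over hyperswarm-ssh.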
Managed inference jobs run vLLM inside KubeVirt virtual machines and expose model endpoints through a proxy layer. The proxy handles API key verification, load balancing, and service discovery through a distributed hash table.
| Component | Description | Role |
|---|---|---|
| proxy-inference | Inference proxy | Verifies API keys, load balances requests, and routes to inference agents |
| wrk-agent-inference | Inference agent | Runs vLLM inside the VM and handles inference requests over HRPC |
| HyperDB + Autobase | Distributed database | Stores API keys, usage metrics, and job metadata |
| dht table | Distributed hash table | Enables service discovery for inference agents |
| VMI | KubeVirt VirtualMachineInstance | Virtual machine instance managed by KubeVirt |
- User submits a managed inference job via app-ui.
- The request is authenticated by app-node over HTTPS (ttr-token).
- app-node forwards the request to wrk-ork over HRPC.
- wrk-ork routes the job to wrk-server-k8s-nvidia over HRPC.
- Kubernetes creates a pod containing a VMI with wrk-agent-inference.
- On spin-up, wrk-agent-inference registers itself to the dht table.
- User sends an inference request via cosmicac-cli.
- The request is authenticated by proxy-inference over HTTPS or HRPC (api-key).
- proxy-inference verifies the API key.
- proxy-inference queries the dht table (search by topic) to discover inference agents.
- The load balancer routes the request to a wrk-agent-inference over HRPC.
- wrk-agent-inference processes the request using vLLM and returns the response.
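The discovery and load-balancing steps above can be sketched with in-memory stand-ins. The dict below plays the role of the dht table and the closure plays proxy-inference; in the real system registration happens over Hyperswarm, the API-key lookup is backed by HyperDB + Autobase, and requests travel over HRPC:

```python
import itertools

# Hypothetical in-memory stand-in for the dht table: topic -> registered agents.
dht = {}

def register(topic, agent):
    """wrk-agent-inference announces itself under a topic on spin-up."""
    dht.setdefault(topic, []).append(agent)

def make_proxy(valid_api_keys, topic):
    """proxy-inference: verify the API key, discover agents by topic, round-robin."""
    rr = itertools.cycle(dht[topic])  # simple round-robin load balancing
    def handle(api_key, prompt):
        if api_key not in valid_api_keys:  # lookup backed by HyperDB + Autobase
            raise PermissionError("invalid api-key")
        agent = next(rr)
        # Stand-in for forwarding the request to vLLM over HRPC.
        return f"{agent}: completion for {prompt!r}"
    return handle

register("llm-inference", "agent-a")
register("llm-inference", "agent-b")
proxy = make_proxy({"key-123"}, "llm-inference")
```

Successive requests alternate between the registered agents, which is one plausible balancing policy; the document does not specify the actual algorithm proxy-inference uses.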
| Protocol | Description | Usage |
|---|---|---|
| HTTPS | HTTP over TLS | app-ui/cli → app-node, cli/client → proxy-inference |
| HRPC | Hyperswarm RPC | internal service communication (app-node → wrk-ork → workers), cli → proxy-inference |
| hyperswarm-ssh | SSH over Hyperswarm | cli → wrk-agent-instance |
| Method | Description | Used By |
|---|---|---|
| ttr-token | Token-based authentication over HTTPS | app-ui, cosmicac-cli → app-node |
| api-key | API key authentication | cosmicac-cli, HTTP clients → proxy-inference |
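A minimal sketch of how a client might attach each credential type to a request. The header names are assumptions for illustration; the document does not specify the actual CosmicAC wire format:

```python
# Hypothetical request-header shapes for the two auth methods above.

def dashboard_headers(ttr_token):
    """app-ui / cosmicac-cli -> app-node: token-based auth over HTTPS."""
    return {"Authorization": f"Bearer {ttr_token}"}  # assumed header scheme

def inference_headers(api_key):
    """cosmicac-cli / HTTP clients -> proxy-inference: API key auth."""
    return {"X-API-Key": api_key}  # assumed header name
```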
CosmicAC uses KubeVirt to run user workloads in isolated virtual machines on Kubernetes. KubeVirt runs VMs in non-privileged pods, applies Kubernetes security controls (RBAC, SELinux, network policies), and exposes GPU devices through secure device plugins.
| Feature | Benefit |
|---|---|
| Pod-level isolation | Runs each VM in its own namespace with SELinux enforcement |
| Non-privileged pods | Runs VMs without elevated container privileges |
| GPU device plugins | Exposes GPU hardware through device plugin resources rather than hostPath volumes, preserving isolation |
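For concreteness, the shape of a KubeVirt VirtualMachineInstance manifest with a GPU attached via a device plugin can be sketched as a Python dict. The structure (`spec.domain.devices.gpus` with a `deviceName` resource) follows KubeVirt's manifest layout, but the name, resource string, and memory size are example values, not CosmicAC's actual configuration:

```python
# Example-shaped KubeVirt VMI manifest with GPU passthrough via a device plugin.
vmi = {
    "apiVersion": "kubevirt.io/v1",
    "kind": "VirtualMachineInstance",
    "metadata": {"name": "gpu-job-example"},  # illustrative name
    "spec": {
        "domain": {
            "devices": {
                # GPU exposed through a device plugin resource name,
                # not a hostPath volume, so pod-level isolation is preserved.
                "gpus": [{"name": "gpu0", "deviceName": "nvidia.com/gpu"}],
            },
            "resources": {"requests": {"memory": "8Gi"}},  # example sizing
        },
    },
}
```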