Cluster Execution
RoboVAST can execute scenarios at scale on a Kubernetes cluster,
running each run configuration as an independent Job and collecting results
via a built-in MinIO S3 server. This section covers everything from cluster
setup and job queueing to multi-context workflows and cloud-provider-specific
configuration.
Overview
When a cluster run is triggered, RoboVAST performs the following steps
internally:
Config upload — All scenario configurations (entrypoints, scenario
files, parameter files) are uploaded to a MinIO S3 bucket inside the
cluster.
Job creation — For each configuration × run number, a Kubernetes
Job is created from a manifest template. Each job runs an initContainer
that pulls its config files from S3 and a main robovast container that
executes the scenario.
Queueing (Kueue) — Jobs are submitted to a dedicated Kueue
LocalQueue (robovast). Kueue’s gang-scheduling and resource quotas
ensure that jobs are admitted only when sufficient CPU/memory is available,
preventing cluster oversubscription.
Result collection — After each job, the scenario container uploads
result files back to the S3 bucket. vast execution cluster upload-to-share
compresses and transfers the archives from inside the pod to a configured
share service (Nextcloud, GCS, …), or vast execution cluster download-cleanup
removes the buckets once results have been handled.
Prerequisites
The following tools must be installed and available on PATH before using
cluster execution:
Tool |
Purpose |
Install |
kubectl
|
Communicate with the Kubernetes cluster (apply manifests, port-forward,
wait for pods) |
kubectl install guide |
helm
|
Install and upgrade Kueue (the job-queueing controller) via the Helm
chart registry |
helm install guide |
k9s (recommended)
|
Terminal UI for monitoring pods, jobs, and logs in real time — not
required but greatly simplifies observability during a run |
k9s install guide |
For GCP clusters the gcloud CLI is additionally required — see
GCP (Google Kubernetes Engine) below.
Cluster Setup
Before the first run, deploy the MinIO S3 server and Kueue into the cluster:
vast execution cluster setup <cluster-config>
Available cluster configs (--list):
vast execution cluster setup --list
The setup command:
Deploys a robovast pod containing MinIO, an nginx HTTP server, and a
Python/boto3 archiver sidecar.
Installs Kueue via Helm and creates a
ClusterQueue and LocalQueue sized to the cluster’s available
CPU/memory.
To tear everything down after use:
vast execution cluster cleanup
Running Scenarios
# Run all configs defined in the project's .vast file
vast execution cluster run
# Override the number of runs from the CLI
vast execution cluster run --runs 5
# Run only one specific config by name
vast execution cluster run --config my-config
Monitoring and Results
Check the status of a running (or recently completed) run:
vast execution cluster monitor
Upload results to a share service once jobs have finished:
vast execution cluster upload-to-share
Clean up only the job objects (without touching the result storage):
vast execution cluster run-cleanup
vast execution cluster run-cleanup --campaign campaign-2025-06-01-120000
Remove result archives from S3 (after uploading or when no longer needed):
vast execution cluster download-cleanup
Manual Deployment (prepare-run)
To generate all necessary manifests and scripts without running them
(e.g. for airgapped clusters or CI pipelines):
vast execution cluster prepare-run ./output-dir
The output directory contains:
robovast-manifest.yaml — robovast base services (e.g. MinIO pod/service manifest)
kueue-queue-setup.yaml + README_kueue.md — Kueue queue objects
out_template/ — scenario configuration files
jobs/ — individual Kubernetes Job YAML files per scenario/run
all-jobs.yaml — all jobs in a single file
upload_configs.py — script to upload configs to S3
README.md + cluster-specific README files
Job Queueing with Kueue
RoboVAST uses Kueue (version 0.16.1)
for admission control and resource quotas.
What Kueue does:
Admits batch jobs only when the cluster has enough CPU and memory.
Queues excess jobs and starts them as capacity becomes available.
Prevents oversubscription: no node goes out-of-memory from too many
concurrent simulation pods.
Enables fair sharing when the cluster is shared with other workloads.
How it is set up:
A single ResourceFlavor (default-flavor) represents the cluster’s
homogeneous node pool.
A ClusterQueue (robovast-cluster-queue) holds the combined CPU/memory
quota, sized automatically from allocatable − requested at setup time.
A LocalQueue named robovast in the execution namespace is the
submission target for every RoboVAST job.
Each generated Job manifest carries the annotation
kueue.x-k8s.io/queue-name: robovast so Kueue picks it up automatically.
If Kueue is not installed, jobs are still created but are not queued —
they start immediately, which can overload the cluster.
Selecting a Cluster Context
RoboVAST uses kubeconfig contexts to address different clusters. Pass
the --context flag to any cluster sub-command to select a specific context
(as listed by kubectl config get-contexts):
# Use the currently active context (default)
vast execution cluster run
# Explicitly target a context
vast execution cluster run --context gcp-c4
The --context flag is available on setup, run, monitor,
upload-to-share, prepare-run, run-cleanup, and cleanup.
Contexts can be renamed to shorter, human-friendly identifiers:
kubectl config rename-context <old-name> <new-name>
Per-Cluster Resource Limits
When the same .vast file is used on multiple clusters that have
different hardware, resource fields (cpu, memory) can be expressed as
a list of {context-name: value} mappings instead of a plain scalar.
execution:
resources:
cpu:
- gcp-c4: 4
- local: 8
memory:
- gcp-c4: 10Gi
- local: 20Gi
secondary_containers:
- nav:
resources:
cpu:
- gcp-c4: 2
- local: 4
- simulation:
resources:
cpu:
- gcp-c4: 2
- local: 4
memory:
- gcp-c4: 8Gi
- local: 16Gi
Rules:
Scalars take precedence — a plain integer/string is used unchanged on
every cluster.
For per-cluster lists the entry whose key matches the active context is
used. If no entry matches, RoboVAST raises a ValueError.
Fields can be mixed: cpu as a scalar and memory as a per-cluster list
is valid.
If a per-cluster list is present and no --context is supplied, RoboVAST
will ask you to provide one.
Running the same config on two clusters:
vast execution cluster run --context gcp-c4
vast execution cluster run --context local
Cloud Provider Configurations
Three cluster configurations are shipped out of the box. Select the one
matching your environment.
GCP (Google Kubernetes Engine)
Config name: gcp
Uses a GCP Persistent Disk (PD) as MinIO storage, provisioned automatically
through a dedicated StorageClass.
Prerequisites:
Install and authenticate the gcloud CLI.
Install the GKE auth plugin required by kubectl to authenticate against
GKE clusters:
sudo apt-get install google-cloud-cli-gke-gcloud-auth-plugin
Fetch the cluster credentials into your kubeconfig:
gcloud container clusters get-credentials <cluster-name> --region <region>
Optionally rename the context for brevity:
kubectl config rename-context \
gke_<project>_<region>_<cluster-name> gcp-c4
Setup:
vast execution cluster setup gcp
# With a larger disk or a faster disk type:
vast execution cluster setup gcp \
--option storage_size=50Gi \
--option disk_type=pd-ssd
Available options:
Option |
Default |
Description |
storage_size
|
10Gi
|
Size of the GCP PD PVC |
disk_type
|
pd-standard
|
GCP PD type (pd-standard, pd-ssd, pd-balanced) |
Note
After a cleanup, the PersistentVolume may need to be deleted manually
in the GCP console (the StorageClass uses reclaimPolicy: Delete
but cloud disks are not always reclaimed immediately).
RKE2
Config name: rke2
Targets on-premise clusters managed by
Rancher RKE2. Uses MinIO with an emptyDir
volume — data persists as long as the pod is alive.
Prerequisites:
Setup:
vast execution cluster setup rke2
Notes:
Minikube
Config name: minikube
Targets a local minikube cluster.
Uses MinIO with ephemeral emptyDir storage. Intended for development
and local integration tests.
Prerequisites:
Setup:
vast execution cluster setup minikube
Notes:
No HTTP result server — the nginx sidecar and archiver are not included in
the minikube manifest. Use vast execution cluster download-cleanup to
remove S3 buckets after processing results via kubectl port-forward.
emptyDir storage means all data is lost if the pod restarts.
API Reference
The resolution logic for per-cluster resources lives in
robovast.common.cluster_context:
Kubernetes context awareness and per-cluster resource resolution.
Resource values in the .vast config file may be given as a per-cluster
list keyed by the real Kubernetes context name instead of a scalar:
resources:
cpu:
- gke_my-project_us-central1_my-cluster: 4
- minikube: 8
memory:
- gke_my-project_us-central1_my-cluster: 10Gi
- minikube: 20Gi
Scalars always work and are the recommended default when a single cluster is
used. Pass the matching context name via --context/-x when running
commands against a specific cluster.
-
robovast.common.cluster_context.get_active_kube_context() → str | None
Return the name of the currently active Kubernetes context.
Reads the active context from the local kubeconfig (~/.kube/config
or KUBECONFIG). Returns None when the context cannot be
determined (e.g. kubeconfig is absent).
-
robovast.common.cluster_context.get_config_context_names(config_path: str) → set[str]
Extract all context names used in per-cluster resource lists.
Scans a .vast YAML config for any field that uses the per-cluster list
syntax ([{context-name: value}, …]) and returns the union of all keys.
- Parameters:
config_path – Absolute path to a .vast YAML config file.
- Returns:
Set of context name strings. Empty when no per-cluster lists are found.
-
robovast.common.cluster_context.list_all_contexts() → list[tuple[str, str]]
List all available (label, kube_context_name) pairs from the kubeconfig.
- Returns:
List of (label, kube_context_name) tuples sorted by name.
Returns an empty list when no kubeconfig is available.
-
robovast.common.cluster_context.require_context_for_multi_cluster(kube_context: str | None) → None
Raise ValueError when a multi-cluster config is used without --context.
Discovers the project .vast config file, scans it for per-cluster
resource lists, and raises an informative error when more than one context
name is present and no kube_context was specified.
This is a no-op when:
kube_context is already set (the user supplied --context).
No project config can be found.
The config uses only a single context name (or only plain scalars).
- Parameters:
kube_context – The Kubernetes context name (None when the user did
not pass --context).
- Raises:
ValueError – When multiple context names are found and kube_context
is None.
-
robovast.common.cluster_context.resolve_resource_value(value: Any, context: str | None) → Any
Resolve a resource value for the active Kubernetes context.
Handles two forms:
Scalar (int, float, or str): returned as-is.
Per-cluster list ([{context-name: value}, …]): the entry whose
key matches context is returned.
- Raises:
ValueError – When the value is a per-cluster list but context is
None, or when the context has no entry in the list.
- Parameters:
value – Raw resource value (scalar or per-cluster list).
context – Active Kubernetes context name, or None.
- Returns:
Resolved scalar value, or None when value is None.
-
robovast.common.cluster_context.resolve_resources(resources: dict, context: str | None) → dict
Resolve all resource fields in a resources dict for the active cluster.
Calls resolve_resource_value() for every key in resources and
returns a new dict with all per-cluster lists replaced by their resolved
scalar values.
- Raises:
ValueError – Propagated from resolve_resource_value() when a
per-cluster list has no entry for context.
- Parameters:
resources – Raw resources dict (e.g. {'cpu': 15} or
{'cpu': [{'gke_my-project_…_cluster': 4}, {'minikube': 8}]}).
context – Active Kubernetes context name, or None.
- Returns:
New dict with resolved scalar values (None entries removed).
Sharing Results via cluster upload-to-share
The upload-to-share command transfers campaign results entirely inside
the archiver sidecar of the robovast pod — directly to a shared folder
(Nextcloud, GCS, …). No data ever reaches the user’s machine.
vast execution cluster upload-to-share
How it works
For each available campaign the command:
Creates a compressed {campaign_id}.tar.gz archive in /data/ on the
pod. If the archive already exists it is reused.
Executes the share-provider upload script inside the archiver container,
streaming upload progress back to the local terminal.
Removes the archive from the pod on success (use --keep-archive to
retain it for other uses).
Keeps both the archive and the S3 bucket if the upload fails, so the
command can be retried safely.
Configuration via .env
All credentials and share URLs are stored in a .env file in the project
directory (or any parent directory). The file is never committed to the
.vast project configuration, keeping secrets out of version control.
Load order: python-dotenv searches for .env starting from the
current working directory and walks up to the root.
Required variables (for all share types):
ROBOVAST_SHARE_TYPE=<provider> # e.g. nextcloud
Provider-specific variables are listed in the sections below.
Note
If any required variable is missing, the command prints a clear error
message listing what is needed before performing any cluster operation.
Nextcloud
The Nextcloud share must be a public link that allows file uploads without
a password (“Allow upload and editing” enabled in the Nextcloud sharing
dialog).
ROBOVAST_SHARE_TYPE=nextcloud
# Copy the link from the Nextcloud sharing dialog.
# Example: https://cloud.example.com/s/AbCdEfGhIjKlMn
ROBOVAST_SHARE_URL=https://cloud.example.com/s/<token>
The upload uses the WebDAV public-share endpoint (/public.php/webdav/)
with the share token as the HTTP Basic-Auth username and an empty password.
Only the standard Python library is used inside the pod — no additional
packages need to be installed.
Example usage:
# Upload all available runs
vast execution cluster upload-to-share
# Keep the pod-side archive after upload
vast execution cluster upload-to-share --keep-archive
# Force recreation of the tar.gz even if it already exists
vast execution cluster upload-to-share --force
Progress output
A single-line progress bar is printed for each run during upload, showing
the percentage, transferred size, and upload rate:
campaign-2026-03-01-120000 [████████████░░░░░░░░] 60.0% 1.2 MiB/2.0 MiB 3.4 MiB/s
campaign-2026-03-01-120000 uploaded (2.0 MiB) ✓
✓ Uploaded 3 campaign(s) to nextcloud successfully!
Google Cloud Storage (GCS)
The GCS provider uploads archives directly from the archiver pod to a GCS
bucket using a service-account key. Downloads use the public GCS HTTP API
and do not require credentials when the bucket is publicly readable.
ROBOVAST_SHARE_TYPE=gcs
# GCS bucket name
ROBOVAST_GCS_BUCKET=my-robovast-results
# Required for upload (cluster upload-to-share) only.
# Not needed for results download on public buckets.
ROBOVAST_GCS_KEY_FILE=/path/to/service-account-key.json
# Optional: object-name prefix inside the bucket (default: bucket root)
# ROBOVAST_GCS_PREFIX=results/
Service-account setup (upload only):
Create a service account in the GCP IAM console.
Grant it the Storage Object Creator role on the target bucket.
Generate a JSON key, download it, and set ROBOVAST_GCS_KEY_FILE to its
path.
Making the bucket publicly readable (for download):
Grant the Storage Object Viewer role to the special principal
allUsers in the GCP console (or via gsutil iam):
gsutil iam ch allUsers:objectViewer gs://my-robovast-results
Once the bucket is public, vast results download works without
any credentials — only ROBOVAST_SHARE_TYPE and ROBOVAST_GCS_BUCKET
need to be set.
Adding a new share provider (plugin system)
Share providers are discovered as entry-point plugins under the
robovast.share_providers group. To add a new provider:
Create a provider class that inherits from
BaseShareProvider
and implements the three abstract methods:
from robovast.execution.cluster_execution.share_providers.base import (
BaseShareProvider,
)
class MyShareProvider(BaseShareProvider):
SHARE_TYPE = "myshare"
def required_env_vars(self) -> dict[str, str]:
return {
"ROBOVAST_SHARE_URL": "URL of the target folder",
"MY_SHARE_TOKEN": "API token for the share service",
}
def get_upload_script_path(self) -> str:
import os
return os.path.join(os.path.dirname(__file__), "myshare_upload_script.py")
def build_pod_env(self) -> dict[str, str]:
import os
return {
"MY_SHARE_URL": os.environ["ROBOVAST_SHARE_URL"],
"MY_SHARE_TOKEN": os.environ["MY_SHARE_TOKEN"],
}
Create a pod-side upload script (myshare_upload_script.py). It
runs inside the robovast-archiver image (python:3.12-alpine +
pigz, boto3, google-auth, google-api-python-client). It
receives the run ID as sys.argv[1] and finds the archive at
/data/{campaign}.tar.gz. Env vars from build_pod_env() are
available via os.environ.
Register the provider in your package’s pyproject.toml:
[tool.poetry.plugins."robovast.share_providers"]
myshare = "mypackage.myshare:MyShareProvider"
Re-install the package (pip install -e .) so the entry point is
registered.
After that, ROBOVAST_SHARE_TYPE=myshare in .env will select your
provider automatically.
Share provider API reference
-
class robovast.execution.cluster_execution.share_providers.base.BaseShareProvider
Base class for all share providers.
A share provider encapsulates everything needed to upload a tar.gz archive
from inside the archiver sidecar of the robovast pod to a remote storage
service (Nextcloud, Google Drive, …).
Subclasses must:
The constructor automatically validates that all required env vars are
present; it raises click.UsageError if any are missing. Values
are read from os.environ (which is already populated by
python-dotenv before the provider is instantiated).
Pod-side scripts run inside the robovast-sidecar image
(python:3.12-alpine + pigz, mc, boto3, google-auth,
requests, paramiko, pyyaml). No additional packages need to be
pip-installed at runtime for the built-in providers.
To add a new provider:
Create a new file in this package (e.g. myshare.py).
Subclass BaseShareProvider, fill in the four abstract members.
Create a pod-side upload script (piped via stdin to python -).
Register the provider in pyproject.toml under
[tool.poetry.plugins."robovast.share_providers"].
-
SHARE_TYPE: str = ''
Short identifier for the provider, e.g. "nextcloud" .
-
archive_exists_on_share(object_name: str) → bool
Return True if object_name already exists on the share.
Used by cluster upload-to-share to skip uploads when the archive
is already present (unless --force is given). Only meaningful for
providers that support remote listing or HTTP HEAD checks.
The default implementation always returns False (no skip), so
providers that do not override this method will always re-upload.
- Parameters:
object_name – Filename of the archive on the server
(e.g. campaign-2025-02-27-123456.tar.gz).
- Returns:
True if the archive already exists on the share, False
otherwise.
-
abstract build_pod_env() → dict[str, str]
Return environment variables to inject into the pod exec call.
These variables will be set for the upload script executed inside the
archiver container. Include everything the script needs: URLs, tokens,
credentials, etc.
The return value is merged into the pod’s environment via the
--env flag of kubectl exec.
- Returns:
Mapping of variable name to value.
- Return type:
dict[str, str]
-
download_archive(object_name: str, dest_path: str, progress_callback=None, resume_offset: int = 0) → None
Download object_name from the share to the local dest_path.
- Parameters:
object_name – The object/file name on the share (as returned by
list_campaign_archives()).
dest_path – Absolute local path to write the downloaded file to.
progress_callback – Optional callable (bytes_received, total_bytes)
called periodically during the download.
resume_offset – Byte offset to resume downloading from. When
non-zero the provider should skip the first resume_offset
bytes and append to dest_path.
Raise NotImplementedError if the provider does not support
downloading (default).
-
abstract get_upload_script_path() → str
Return the absolute path to the pod-side Python upload script.
The script is piped via stdin to python - inside the archiver
container (robovast-sidecar image). It must be self-contained
and only use packages available in that image: standard library,
boto3, google-auth, requests, paramiko, pyyaml.
The script receives the run ID as sys.argv[1].
Environment variables from build_pod_env() are available via
os.environ.
-
list_campaign_archives() → list[str]
Return a list of campaign *.tar.gz object names on the share.
Archives whose base name (without .tar.gz) matches the campaign
naming convention (<campaign-name>-YYYY-MM-DD-HHMMSS) are returned.
Raise NotImplementedError if the provider does not support
downloading (default). Implementations should return bare object names
(keys), not full URLs.
The default implementation delegates to
list_campaign_archives_with_size() and discards the size.
Override list_campaign_archives_with_size() to provide sizes.
-
list_campaign_archives_with_size() → list[tuple[str, int]]
Return a list of (object_name, size_in_bytes) for each
campaign-*.tar.gz object on the share.
size_in_bytes is -1 when the provider cannot determine the file
size. Raise NotImplementedError if the provider does not
support listing at all (default).
Implementations should return bare object names (keys), not full URLs.
-
remove_archive(object_name: str) → None
Remove object_name from the share.
- Parameters:
object_name – The object/file name on the share (as returned by
list_campaign_archives()).
Raise NotImplementedError if the provider does not support
removal (default).
-
abstract required_env_vars() → dict[str, str]
Return a mapping of environment-variable name → human-readable description.
All listed variables must be non-empty strings in the environment when
the provider is instantiated. The base class validates them
automatically and raises click.UsageError if any are missing.
Example:
return {
"ROBOVAST_SHARE_URL": "Public share URL of the target folder",
}
-
class robovast.execution.cluster_execution.share_providers.nextcloud.NextcloudShareProvider
Upload/download campaign archives to a public Nextcloud share (WebDAV).
The share must be a public link that allows file uploads without a
password. In the Nextcloud web UI, create a share with “Allow upload
and editing” enabled and copy the link.
Required .env variables:
Variable |
Description |
ROBOVAST_SHARE_TYPE
|
Must be nextcloud |
ROBOVAST_SHARE_URL
|
Public share URL (e.g.
https://cloud.example.com/s/AbCdEfGhIjKlMn) |
-
SHARE_TYPE: str = 'nextcloud'
Short identifier for the provider, e.g. "nextcloud" .
-
build_pod_env() → dict[str, str]
Return environment variables to inject into the pod exec call.
These variables will be set for the upload script executed inside the
archiver container. Include everything the script needs: URLs, tokens,
credentials, etc.
The return value is merged into the pod’s environment via the
--env flag of kubectl exec.
- Returns:
Mapping of variable name to value.
- Return type:
dict[str, str]
-
download_archive(object_name: str, dest_path: str, progress_callback: Callable[[int, int], None] | None = None, resume_offset: int = 0) → None
Download object_name from the Nextcloud share to dest_path.
- Parameters:
object_name – Filename of the archive on the share.
dest_path – Local destination path.
progress_callback – Optional (bytes_received, total_bytes) callable.
resume_offset – Byte offset to resume downloading from.
-
get_upload_script_path() → str
Return the absolute path to the pod-side Python upload script.
The script is piped via stdin to python - inside the archiver
container (robovast-sidecar image). It must be self-contained
and only use packages available in that image: standard library,
boto3, google-auth, requests, paramiko, pyyaml.
The script receives the run ID as sys.argv[1].
Environment variables from build_pod_env() are available via
os.environ.
-
list_campaign_archives() → list[str]
Return a list of campaign *.tar.gz filenames on the share.
-
list_campaign_archives_with_size() → list[tuple[str, int]]
Return (filename, size_in_bytes) for each campaign *.tar.gz on the share.
Uses WebDAV PROPFIND Depth: 1 against the Nextcloud
public.php/webdav/ endpoint, authenticated with the share token
as the HTTP Basic Auth username. size_in_bytes is -1 when the
server does not return a getcontentlength value.
-
remove_archive(object_name: str) → None
Delete object_name from the Nextcloud share via WebDAV DELETE.
- Parameters:
object_name – Filename of the archive on the share (as returned by
list_campaign_archives()).
-
required_env_vars() → dict[str, str]
Return a mapping of environment-variable name → human-readable description.
All listed variables must be non-empty strings in the environment when
the provider is instantiated. The base class validates them
automatically and raises click.UsageError if any are missing.
Example:
return {
"ROBOVAST_SHARE_URL": "Public share URL of the target folder",
}
-
class robovast.execution.cluster_execution.share_providers.gcs.GcsShareProvider
Upload campaign archives to a Google Cloud Storage bucket.
Authentication uses a service-account key file. Create a service account
with the Storage Object Creator role on the target bucket, generate a
JSON key, download the file, and point ROBOVAST_GCS_KEY_FILE at it.
Required .env variables:
Variable |
Description |
ROBOVAST_SHARE_TYPE
|
Must be gcs |
ROBOVAST_GCS_BUCKET
|
Target GCS bucket name (e.g. my-robovast-results) |
ROBOVAST_GCS_KEY_FILE
|
Path to the service-account JSON key file
(required for cluster upload-to-share only;
not needed for results download on public buckets) |
Optional .env variables:
Variable |
Description |
ROBOVAST_GCS_PREFIX
|
Object-name prefix inside the bucket (e.g. results/).
Defaults to the bucket root. |
-
SHARE_TYPE: str = 'gcs'
Short identifier for the provider, e.g. "nextcloud" .
-
build_pod_env() → dict[str, str]
Return environment variables to inject into the pod exec call.
These variables will be set for the upload script executed inside the
archiver container. Include everything the script needs: URLs, tokens,
credentials, etc.
The return value is merged into the pod’s environment via the
--env flag of kubectl exec.
- Returns:
Mapping of variable name to value.
- Return type:
dict[str, str]
-
download_archive(object_name: str, dest_path: str, progress_callback: Callable[[int, int], None] | None = None, resume_offset: int = 0) → None
Stream object_name from the public GCS bucket to dest_path.
Uses chunked streaming so that archives of any size (including 100 GB+)
are written incrementally without loading the file into memory.
- Parameters:
object_name – GCS object key (as returned by list_campaign_archives()).
dest_path – Local file path to write the downloaded content to.
progress_callback – Optional (bytes_received, total_bytes) callable
called after each chunk. total_bytes is 0 if unknown.
resume_offset – Byte offset to resume downloading from.
-
get_upload_script_path() → str
Return the absolute path to the pod-side Python upload script.
The script is piped via stdin to python - inside the archiver
container (robovast-sidecar image). It must be self-contained
and only use packages available in that image: standard library,
boto3, google-auth, requests, paramiko, pyyaml.
The script receives the run ID as sys.argv[1].
Environment variables from build_pod_env() are available via
os.environ.
-
list_campaign_archives_with_size() → list[str]
List all campaign *.tar.gz objects in the configured GCS bucket.
Recognizes archives whose base name (without .tar.gz) matches the
campaign naming convention (<campaign-name>-YYYY-MM-DD-HHMMSS).
Uses the public GCS XML API (no credentials required for public buckets).
Handles GCS list pagination via the NextContinuationToken marker.
- Returns:
List of (object_name, size_in_bytes) tuples.
-
remove_archive(object_name: str) → None
Delete object_name from the GCS bucket.
Requires ROBOVAST_GCS_KEY_FILE to be set to a service-account key
file with at least Storage Object Admin (or Storage Object Viewer +
Storage Object Creator + delete permission) on the bucket.
Uses the GCS JSON API DELETE endpoint with a Bearer token.
- Parameters:
object_name – GCS object key (as returned by
list_campaign_archives()).
-
required_env_vars() → dict[str, str]
Return a mapping of environment-variable name → human-readable description.
All listed variables must be non-empty strings in the environment when
the provider is instantiated. The base class validates them
automatically and raises click.UsageError if any are missing.
Example:
return {
"ROBOVAST_SHARE_URL": "Public share URL of the target folder",
}
-
class robovast.execution.cluster_execution.upload_to_share.ShareUploader(namespace: str = 'default', cluster_config=None, context: str | None = None, provider: BaseShareProvider | None = None)
Upload campaign archives from the cluster pod to an external share service.
- Parameters:
namespace – Kubernetes namespace where the robovast pod lives.
cluster_config – Cluster configuration object providing S3/GCS credentials.
context – Kubernetes context name (or None for the active context).
provider – Instantiated BaseShareProvider
that supplies the upload script and pod environment.
-
list_available_campaigns()
List campaigns that are finished and ready for upload.
- Returns:
- (available_campaigns, excluded_runs) where excluded_runs
is a list of (campaign_id, running_count, pending_count)
for campaigns that still have active jobs.
- Return type:
tuple
-
upload_campaigns(force: bool = False, verbose: bool = False, keep_archive: bool = False, skip_removal: bool = False, campaign_ids: list | None = None) → int
Create tar.gz archives on the pod then upload them to the share.
Steps for each available run:
Existence check – if the archive already exists on the share and
force is False, skip both compression and upload (only for
providers that implement
archive_exists_on_share(),
e.g. WebDAV).
Create the remote {campaign}.tar.gz archive in /data/ (skip
if it already exists and force is False).
Execute the provider’s upload script inside the archiver container,
streaming progress to the local terminal.
On success: remove the remote archive from /data/ unless
keep_archive is True. Also delete the S3 bucket for the run
unless skip_removal is True (mirrors cluster download
behavior).
On failure: keep both the remote archive and the S3 bucket so the
user can retry or fall back to cluster download.
- Parameters:
force – If True, recreate the tar.gz archive even if it already
exists in /data/.
verbose – If True, emit detailed log messages instead of the
single-line progress bar.
keep_archive – If True, do not remove the tar.gz from /data/
after a successful upload.
skip_removal – If True, do not delete the S3 bucket after a
successful upload.
campaign_ids – If provided, only upload these campaigns.
- Returns:
Number of runs successfully uploaded.
- Return type:
int