Cluster Execution

RoboVAST can execute scenarios at scale on a Kubernetes cluster, running each run configuration as an independent Job and collecting results via a built-in MinIO S3 server. This section covers everything from cluster setup and job queueing to multi-context workflows and cloud-provider-specific configuration.

Overview

When a cluster run is triggered, RoboVAST performs the following steps internally:

  1. Config upload — All scenario configurations (entrypoints, scenario files, parameter files) are uploaded to a MinIO S3 bucket inside the cluster.

  2. Job creation — For each configuration × run number, a Kubernetes Job is created from a manifest template. Each job runs an initContainer that pulls its config files from S3 and a main robovast container that executes the scenario.

  3. Queueing (Kueue) — Jobs are submitted to a dedicated Kueue LocalQueue (robovast). Kueue’s gang-scheduling and resource quotas ensure that jobs are admitted only when sufficient CPU/memory is available, preventing cluster oversubscription.

  4. Result collection — After each job, the scenario container uploads result files back to the S3 bucket. vast execution cluster upload-to-share compresses and transfers the archives from inside the pod to a configured share service (Nextcloud, GCS, …), or vast execution cluster download-cleanup removes the buckets once results have been handled.

Prerequisites

The following tools must be installed and available on PATH before using cluster execution:

Tool

Purpose

Install

kubectl

Communicate with the Kubernetes cluster (apply manifests, port-forward, wait for pods)

kubectl install guide

helm

Install and upgrade Kueue (the job-queueing controller) via the Helm chart registry

helm install guide

k9s (recommended)

Terminal UI for monitoring pods, jobs, and logs in real time — not required but greatly simplifies observability during a run

k9s install guide

For GCP clusters the gcloud CLI is additionally required — see GCP (Google Kubernetes Engine) below.

Cluster Setup

Before the first run, deploy the MinIO S3 server and Kueue into the cluster:

vast execution cluster setup <cluster-config>

Available cluster configs (--list):

vast execution cluster setup --list

The setup command:

  • Deploys a robovast pod containing MinIO, an nginx HTTP server, and a Python/boto3 archiver sidecar.

  • Installs Kueue via Helm and creates a ClusterQueue and LocalQueue sized to the cluster’s available CPU/memory.

To tear everything down after use:

vast execution cluster cleanup

Running Scenarios

# Run all configs defined in the project's .vast file
vast execution cluster run

# Override the number of runs from the CLI
vast execution cluster run --runs 5

# Run only one specific config by name
vast execution cluster run --config my-config

Monitoring and Results

Check the status of a running (or recently completed) run:

vast execution cluster monitor

Upload results to a share service once jobs have finished:

vast execution cluster upload-to-share

Clean up only the job objects (without touching the result storage):

vast execution cluster run-cleanup
vast execution cluster run-cleanup --campaign campaign-2025-06-01-120000

Remove result archives from S3 (after uploading or when no longer needed):

vast execution cluster download-cleanup

Manual Deployment (prepare-run)

To generate all necessary manifests and scripts without running them (e.g. for airgapped clusters or CI pipelines):

vast execution cluster prepare-run ./output-dir

The output directory contains:

  • robovast-manifest.yaml — robovast base services (e.g. MinIO pod/service manifest)

  • kueue-queue-setup.yaml + README_kueue.md — Kueue queue objects

  • out_template/ — scenario configuration files

  • jobs/ — individual Kubernetes Job YAML files per scenario/run

  • all-jobs.yaml — all jobs in a single file

  • upload_configs.py — script to upload configs to S3

  • README.md + cluster-specific README files

Job Queueing with Kueue

RoboVAST uses Kueue (version 0.16.1) for admission control and resource quotas.

What Kueue does:

  • Admits batch jobs only when the cluster has enough CPU and memory.

  • Queues excess jobs and starts them as capacity becomes available.

  • Prevents oversubscription: no node goes out-of-memory from too many concurrent simulation pods.

  • Enables fair sharing when the cluster is shared with other workloads.

How it is set up:

  • A single ResourceFlavor (default-flavor) represents the cluster’s homogeneous node pool.

  • A ClusterQueue (robovast-cluster-queue) holds the combined CPU/memory quota, sized automatically from allocatable requested at setup time.

  • A LocalQueue named robovast in the execution namespace is the submission target for every RoboVAST job.

Each generated Job manifest carries the annotation kueue.x-k8s.io/queue-name: robovast so Kueue picks it up automatically.

If Kueue is not installed, jobs are still created but are not queued — they start immediately, which can overload the cluster.

Selecting a Cluster Context

RoboVAST uses kubeconfig contexts to address different clusters. Pass the --context flag to any cluster sub-command to select a specific context (as listed by kubectl config get-contexts):

# Use the currently active context (default)
vast execution cluster run

# Explicitly target a context
vast execution cluster run --context gcp-c4

The --context flag is available on setup, run, monitor, upload-to-share, prepare-run, run-cleanup, and cleanup.

Contexts can be renamed to shorter, human-friendly identifiers:

kubectl config rename-context <old-name> <new-name>

Per-Cluster Resource Limits

When the same .vast file is used on multiple clusters that have different hardware, resource fields (cpu, memory) can be expressed as a list of {context-name: value} mappings instead of a plain scalar.

execution:
  resources:
    cpu:
      - gcp-c4: 4
      - local:  8
    memory:
      - gcp-c4: 10Gi
      - local:  20Gi
  secondary_containers:
    - nav:
        resources:
          cpu:
            - gcp-c4: 2
            - local:  4
    - simulation:
        resources:
          cpu:
            - gcp-c4: 2
            - local:  4
          memory:
            - gcp-c4: 8Gi
            - local:  16Gi

Rules:

  • Scalars take precedence — a plain integer/string is used unchanged on every cluster.

  • For per-cluster lists the entry whose key matches the active context is used. If no entry matches, RoboVAST raises a ValueError.

  • Fields can be mixed: cpu as a scalar and memory as a per-cluster list is valid.

  • If a per-cluster list is present and no --context is supplied, RoboVAST will ask you to provide one.

Running the same config on two clusters:

vast execution cluster run --context gcp-c4
vast execution cluster run --context local

Cloud Provider Configurations

Three cluster configurations are shipped out of the box. Select the one matching your environment.

GCP (Google Kubernetes Engine)

Config name: gcp

Uses a GCP Persistent Disk (PD) as MinIO storage, provisioned automatically through a dedicated StorageClass.

Prerequisites:

  1. Install and authenticate the gcloud CLI.

  2. Install the GKE auth plugin required by kubectl to authenticate against GKE clusters:

    sudo apt-get install google-cloud-cli-gke-gcloud-auth-plugin
    
  3. Fetch the cluster credentials into your kubeconfig:

    gcloud container clusters get-credentials <cluster-name> --region <region>
    
  4. Optionally rename the context for brevity:

    kubectl config rename-context \
      gke_<project>_<region>_<cluster-name> gcp-c4
    

Setup:

vast execution cluster setup gcp

# With a larger disk or a faster disk type:
vast execution cluster setup gcp \
  --option storage_size=50Gi \
  --option disk_type=pd-ssd

Available options:

Option

Default

Description

storage_size

10Gi

Size of the GCP PD PVC

disk_type

pd-standard

GCP PD type (pd-standard, pd-ssd, pd-balanced)

Note

After a cleanup, the PersistentVolume may need to be deleted manually in the GCP console (the StorageClass uses reclaimPolicy: Delete but cloud disks are not always reclaimed immediately).

RKE2

Config name: rke2

Targets on-premise clusters managed by Rancher RKE2. Uses MinIO with an emptyDir volume — data persists as long as the pod is alive.

Prerequisites:

  • Ensure the kubeconfig for the RKE2 cluster is available (typically provided by the cluster administrator as /etc/rancher/rke2/rke2.yaml).

Setup:

vast execution cluster setup rke2

Notes:

  • emptyDir is ephemeral: if the robovast pod is restarted, all data is lost. Upload results with vast execution cluster upload-to-share before modifying or restarting the pod.

Minikube

Config name: minikube

Targets a local minikube cluster. Uses MinIO with ephemeral emptyDir storage. Intended for development and local integration tests.

Prerequisites:

  • Start a minikube cluster:

    minikube start
    

Setup:

vast execution cluster setup minikube

Notes:

  • No HTTP result server — the nginx sidecar and archiver are not included in the minikube manifest. Use vast execution cluster download-cleanup to remove S3 buckets after processing results via kubectl port-forward.

  • emptyDir storage means all data is lost if the pod restarts.

API Reference

The resolution logic for per-cluster resources lives in robovast.common.cluster_context:

Kubernetes context awareness and per-cluster resource resolution.

Resource values in the .vast config file may be given as a per-cluster list keyed by the real Kubernetes context name instead of a scalar:

resources:
  cpu:
    - gke_my-project_us-central1_my-cluster: 4
    - minikube: 8
  memory:
    - gke_my-project_us-central1_my-cluster: 10Gi
    - minikube: 20Gi

Scalars always work and are the recommended default when a single cluster is used. Pass the matching context name via --context/-x when running commands against a specific cluster.

robovast.common.cluster_context.get_active_kube_context() str | None

Return the name of the currently active Kubernetes context.

Reads the active context from the local kubeconfig (~/.kube/config or KUBECONFIG). Returns None when the context cannot be determined (e.g. kubeconfig is absent).

robovast.common.cluster_context.get_config_context_names(config_path: str) set[str]

Extract all context names used in per-cluster resource lists.

Scans a .vast YAML config for any field that uses the per-cluster list syntax ([{context-name: value}, …]) and returns the union of all keys.

Parameters:

config_path – Absolute path to a .vast YAML config file.

Returns:

Set of context name strings. Empty when no per-cluster lists are found.

robovast.common.cluster_context.list_all_contexts() list[tuple[str, str]]

List all available (label, kube_context_name) pairs from the kubeconfig.

Returns:

List of (label, kube_context_name) tuples sorted by name. Returns an empty list when no kubeconfig is available.

robovast.common.cluster_context.require_context_for_multi_cluster(kube_context: str | None) None

Raise ValueError when a multi-cluster config is used without --context.

Discovers the project .vast config file, scans it for per-cluster resource lists, and raises an informative error when more than one context name is present and no kube_context was specified.

This is a no-op when:

  • kube_context is already set (the user supplied --context).

  • No project config can be found.

  • The config uses only a single context name (or only plain scalars).

Parameters:

kube_context – The Kubernetes context name (None when the user did not pass --context).

Raises:

ValueError – When multiple context names are found and kube_context is None.

robovast.common.cluster_context.resolve_resource_value(value: Any, context: str | None) Any

Resolve a resource value for the active Kubernetes context.

Handles two forms:

  • Scalar (int, float, or str): returned as-is.

  • Per-cluster list ([{context-name: value}, …]): the entry whose key matches context is returned.

Raises:

ValueError – When the value is a per-cluster list but context is None, or when the context has no entry in the list.

Parameters:
  • value – Raw resource value (scalar or per-cluster list).

  • context – Active Kubernetes context name, or None.

Returns:

Resolved scalar value, or None when value is None.

robovast.common.cluster_context.resolve_resources(resources: dict, context: str | None) dict

Resolve all resource fields in a resources dict for the active cluster.

Calls resolve_resource_value() for every key in resources and returns a new dict with all per-cluster lists replaced by their resolved scalar values.

Raises:

ValueError – Propagated from resolve_resource_value() when a per-cluster list has no entry for context.

Parameters:
  • resources – Raw resources dict (e.g. {'cpu': 15} or {'cpu': [{'gke_my-project_…_cluster': 4}, {'minikube': 8}]}).

  • context – Active Kubernetes context name, or None.

Returns:

New dict with resolved scalar values (None entries removed).

Sharing Results via cluster upload-to-share

The upload-to-share command transfers campaign results entirely inside the archiver sidecar of the robovast pod — directly to a shared folder (Nextcloud, GCS, …). No data ever reaches the user’s machine.

vast execution cluster upload-to-share

How it works

For each available campaign the command:

  1. Creates a compressed {campaign_id}.tar.gz archive in /data/ on the pod. If the archive already exists it is reused.

  2. Executes the share-provider upload script inside the archiver container, streaming upload progress back to the local terminal.

  3. Removes the archive from the pod on success (use --keep-archive to retain it for other uses).

  4. Keeps both the archive and the S3 bucket if the upload fails, so the command can be retried safely.

Configuration via .env

All credentials and share URLs are stored in a .env file in the project directory (or any parent directory). The file is never committed to the .vast project configuration, keeping secrets out of version control.

Load order: python-dotenv searches for .env starting from the current working directory and walks up to the root.

Required variables (for all share types):

ROBOVAST_SHARE_TYPE=<provider>   # e.g. nextcloud

Provider-specific variables are listed in the sections below.

Note

If any required variable is missing, the command prints a clear error message listing what is needed before performing any cluster operation.

Nextcloud

The Nextcloud share must be a public link that allows file uploads without a password (“Allow upload and editing” enabled in the Nextcloud sharing dialog).

ROBOVAST_SHARE_TYPE=nextcloud

# Copy the link from the Nextcloud sharing dialog.
# Example: https://cloud.example.com/s/AbCdEfGhIjKlMn
ROBOVAST_SHARE_URL=https://cloud.example.com/s/<token>

The upload uses the WebDAV public-share endpoint (/public.php/webdav/) with the share token as the HTTP Basic-Auth username and an empty password. Only the standard Python library is used inside the pod — no additional packages need to be installed.

Example usage:

# Upload all available runs
vast execution cluster upload-to-share

# Keep the pod-side archive after upload
vast execution cluster upload-to-share --keep-archive

# Force recreation of the tar.gz even if it already exists
vast execution cluster upload-to-share --force

Progress output

A single-line progress bar is printed for each run during upload, showing the percentage, transferred size, and upload rate:

campaign-2026-03-01-120000  [████████████░░░░░░░░]   60.0%  1.2 MiB/2.0 MiB  3.4 MiB/s
campaign-2026-03-01-120000  uploaded (2.0 MiB)  ✓

✓ Uploaded 3 campaign(s) to nextcloud successfully!

Google Cloud Storage (GCS)

The GCS provider uploads archives directly from the archiver pod to a GCS bucket using a service-account key. Downloads use the public GCS HTTP API and do not require credentials when the bucket is publicly readable.

ROBOVAST_SHARE_TYPE=gcs

# GCS bucket name
ROBOVAST_GCS_BUCKET=my-robovast-results

# Required for upload (cluster upload-to-share) only.
# Not needed for results download on public buckets.
ROBOVAST_GCS_KEY_FILE=/path/to/service-account-key.json

# Optional: object-name prefix inside the bucket (default: bucket root)
# ROBOVAST_GCS_PREFIX=results/

Service-account setup (upload only):

  1. Create a service account in the GCP IAM console.

  2. Grant it the Storage Object Creator role on the target bucket.

  3. Generate a JSON key, download it, and set ROBOVAST_GCS_KEY_FILE to its path.

Making the bucket publicly readable (for download):

Grant the Storage Object Viewer role to the special principal allUsers in the GCP console (or via gsutil iam):

gsutil iam ch allUsers:objectViewer gs://my-robovast-results

Once the bucket is public, vast results download works without any credentials — only ROBOVAST_SHARE_TYPE and ROBOVAST_GCS_BUCKET need to be set.

Adding a new share provider (plugin system)

Share providers are discovered as entry-point plugins under the robovast.share_providers group. To add a new provider:

  1. Create a provider class that inherits from BaseShareProvider and implements the three abstract methods:

    from robovast.execution.cluster_execution.share_providers.base import (
        BaseShareProvider,
    )
    
    class MyShareProvider(BaseShareProvider):
        SHARE_TYPE = "myshare"
    
        def required_env_vars(self) -> dict[str, str]:
            return {
                "ROBOVAST_SHARE_URL": "URL of the target folder",
                "MY_SHARE_TOKEN":     "API token for the share service",
            }
    
        def get_upload_script_path(self) -> str:
            import os
            return os.path.join(os.path.dirname(__file__), "myshare_upload_script.py")
    
        def build_pod_env(self) -> dict[str, str]:
            import os
            return {
                "MY_SHARE_URL":   os.environ["ROBOVAST_SHARE_URL"],
                "MY_SHARE_TOKEN": os.environ["MY_SHARE_TOKEN"],
            }
    
  2. Create a pod-side upload script (myshare_upload_script.py). It runs inside the robovast-archiver image (python:3.12-alpine + pigz, boto3, google-auth, google-api-python-client). It receives the run ID as sys.argv[1] and finds the archive at /data/{campaign}.tar.gz. Env vars from build_pod_env() are available via os.environ.

  3. Register the provider in your package’s pyproject.toml:

    [tool.poetry.plugins."robovast.share_providers"]
    myshare = "mypackage.myshare:MyShareProvider"
    
  4. Re-install the package (pip install -e .) so the entry point is registered.

After that, ROBOVAST_SHARE_TYPE=myshare in .env will select your provider automatically.

Share provider API reference

class robovast.execution.cluster_execution.share_providers.base.BaseShareProvider

Base class for all share providers.

A share provider encapsulates everything needed to upload a tar.gz archive from inside the archiver sidecar of the robovast pod to a remote storage service (Nextcloud, Google Drive, …).

Subclasses must:

The constructor automatically validates that all required env vars are present; it raises click.UsageError if any are missing. Values are read from os.environ (which is already populated by python-dotenv before the provider is instantiated).

Pod-side scripts run inside the robovast-sidecar image (python:3.12-alpine + pigz, mc, boto3, google-auth, requests, paramiko, pyyaml). No additional packages need to be pip-installed at runtime for the built-in providers.

To add a new provider:

  1. Create a new file in this package (e.g. myshare.py).

  2. Subclass BaseShareProvider, fill in the four abstract members.

  3. Create a pod-side upload script (piped via stdin to python -).

  4. Register the provider in pyproject.toml under [tool.poetry.plugins."robovast.share_providers"].

SHARE_TYPE: str = ''

Short identifier for the provider, e.g. "nextcloud" .

archive_exists_on_share(object_name: str) bool

Return True if object_name already exists on the share.

Used by cluster upload-to-share to skip uploads when the archive is already present (unless --force is given). Only meaningful for providers that support remote listing or HTTP HEAD checks.

The default implementation always returns False (no skip), so providers that do not override this method will always re-upload.

Parameters:

object_name – Filename of the archive on the server (e.g. campaign-2025-02-27-123456.tar.gz).

Returns:

True if the archive already exists on the share, False otherwise.

abstract build_pod_env() dict[str, str]

Return environment variables to inject into the pod exec call.

These variables will be set for the upload script executed inside the archiver container. Include everything the script needs: URLs, tokens, credentials, etc.

The return value is merged into the pod’s environment via the --env flag of kubectl exec.

Returns:

Mapping of variable name to value.

Return type:

dict[str, str]

download_archive(object_name: str, dest_path: str, progress_callback=None, resume_offset: int = 0) None

Download object_name from the share to the local dest_path.

Parameters:
  • object_name – The object/file name on the share (as returned by list_campaign_archives()).

  • dest_path – Absolute local path to write the downloaded file to.

  • progress_callback – Optional callable (bytes_received, total_bytes) called periodically during the download.

  • resume_offset – Byte offset to resume downloading from. When non-zero the provider should skip the first resume_offset bytes and append to dest_path.

Raise NotImplementedError if the provider does not support downloading (default).

abstract get_upload_script_path() str

Return the absolute path to the pod-side Python upload script.

The script is piped via stdin to python - inside the archiver container (robovast-sidecar image). It must be self-contained and only use packages available in that image: standard library, boto3, google-auth, requests, paramiko, pyyaml.

The script receives the run ID as sys.argv[1]. Environment variables from build_pod_env() are available via os.environ.

list_campaign_archives() list[str]

Return a list of campaign *.tar.gz object names on the share.

Archives whose base name (without .tar.gz) matches the campaign naming convention (<campaign-name>-YYYY-MM-DD-HHMMSS) are returned.

Raise NotImplementedError if the provider does not support downloading (default). Implementations should return bare object names (keys), not full URLs.

The default implementation delegates to list_campaign_archives_with_size() and discards the size. Override list_campaign_archives_with_size() to provide sizes.

list_campaign_archives_with_size() list[tuple[str, int]]

Return a list of (object_name, size_in_bytes) for each campaign-*.tar.gz object on the share.

size_in_bytes is -1 when the provider cannot determine the file size. Raise NotImplementedError if the provider does not support listing at all (default).

Implementations should return bare object names (keys), not full URLs.

remove_archive(object_name: str) None

Remove object_name from the share.

Parameters:

object_name – The object/file name on the share (as returned by list_campaign_archives()).

Raise NotImplementedError if the provider does not support removal (default).

abstract required_env_vars() dict[str, str]

Return a mapping of environment-variable name → human-readable description.

All listed variables must be non-empty strings in the environment when the provider is instantiated. The base class validates them automatically and raises click.UsageError if any are missing.

Example:

return {
    "ROBOVAST_SHARE_URL": "Public share URL of the target folder",
}
class robovast.execution.cluster_execution.share_providers.nextcloud.NextcloudShareProvider

Upload/download campaign archives to a public Nextcloud share (WebDAV).

The share must be a public link that allows file uploads without a password. In the Nextcloud web UI, create a share with “Allow upload and editing” enabled and copy the link.

Required .env variables:

Variable

Description

ROBOVAST_SHARE_TYPE

Must be nextcloud

ROBOVAST_SHARE_URL

Public share URL (e.g. https://cloud.example.com/s/AbCdEfGhIjKlMn)

SHARE_TYPE: str = 'nextcloud'

Short identifier for the provider, e.g. "nextcloud" .

build_pod_env() dict[str, str]

Return environment variables to inject into the pod exec call.

These variables will be set for the upload script executed inside the archiver container. Include everything the script needs: URLs, tokens, credentials, etc.

The return value is merged into the pod’s environment via the --env flag of kubectl exec.

Returns:

Mapping of variable name to value.

Return type:

dict[str, str]

download_archive(object_name: str, dest_path: str, progress_callback: Callable[[int, int], None] | None = None, resume_offset: int = 0) None

Download object_name from the Nextcloud share to dest_path.

Parameters:
  • object_name – Filename of the archive on the share.

  • dest_path – Local destination path.

  • progress_callback – Optional (bytes_received, total_bytes) callable.

  • resume_offset – Byte offset to resume downloading from.

get_upload_script_path() str

Return the absolute path to the pod-side Python upload script.

The script is piped via stdin to python - inside the archiver container (robovast-sidecar image). It must be self-contained and only use packages available in that image: standard library, boto3, google-auth, requests, paramiko, pyyaml.

The script receives the run ID as sys.argv[1]. Environment variables from build_pod_env() are available via os.environ.

list_campaign_archives() list[str]

Return a list of campaign *.tar.gz filenames on the share.

list_campaign_archives_with_size() list[tuple[str, int]]

Return (filename, size_in_bytes) for each campaign *.tar.gz on the share.

Uses WebDAV PROPFIND Depth: 1 against the Nextcloud public.php/webdav/ endpoint, authenticated with the share token as the HTTP Basic Auth username. size_in_bytes is -1 when the server does not return a getcontentlength value.

remove_archive(object_name: str) None

Delete object_name from the Nextcloud share via WebDAV DELETE.

Parameters:

object_name – Filename of the archive on the share (as returned by list_campaign_archives()).

required_env_vars() dict[str, str]

Return a mapping of environment-variable name → human-readable description.

All listed variables must be non-empty strings in the environment when the provider is instantiated. The base class validates them automatically and raises click.UsageError if any are missing.

Example:

return {
    "ROBOVAST_SHARE_URL": "Public share URL of the target folder",
}
class robovast.execution.cluster_execution.share_providers.gcs.GcsShareProvider

Upload campaign archives to a Google Cloud Storage bucket.

Authentication uses a service-account key file. Create a service account with the Storage Object Creator role on the target bucket, generate a JSON key, download the file, and point ROBOVAST_GCS_KEY_FILE at it.

Required .env variables:

Variable

Description

ROBOVAST_SHARE_TYPE

Must be gcs

ROBOVAST_GCS_BUCKET

Target GCS bucket name (e.g. my-robovast-results)

ROBOVAST_GCS_KEY_FILE

Path to the service-account JSON key file (required for cluster upload-to-share only; not needed for results download on public buckets)

Optional .env variables:

Variable

Description

ROBOVAST_GCS_PREFIX

Object-name prefix inside the bucket (e.g. results/). Defaults to the bucket root.

SHARE_TYPE: str = 'gcs'

Short identifier for the provider, e.g. "nextcloud" .

build_pod_env() dict[str, str]

Return environment variables to inject into the pod exec call.

These variables will be set for the upload script executed inside the archiver container. Include everything the script needs: URLs, tokens, credentials, etc.

The return value is merged into the pod’s environment via the --env flag of kubectl exec.

Returns:

Mapping of variable name to value.

Return type:

dict[str, str]

download_archive(object_name: str, dest_path: str, progress_callback: Callable[[int, int], None] | None = None, resume_offset: int = 0) None

Stream object_name from the public GCS bucket to dest_path.

Uses chunked streaming so that archives of any size (including 100 GB+) are written incrementally without loading the file into memory.

Parameters:
  • object_name – GCS object key (as returned by list_campaign_archives()).

  • dest_path – Local file path to write the downloaded content to.

  • progress_callback – Optional (bytes_received, total_bytes) callable called after each chunk. total_bytes is 0 if unknown.

  • resume_offset – Byte offset to resume downloading from.

get_upload_script_path() str

Return the absolute path to the pod-side Python upload script.

The script is piped via stdin to python - inside the archiver container (robovast-sidecar image). It must be self-contained and only use packages available in that image: standard library, boto3, google-auth, requests, paramiko, pyyaml.

The script receives the run ID as sys.argv[1]. Environment variables from build_pod_env() are available via os.environ.

list_campaign_archives_with_size() list[str]

List all campaign *.tar.gz objects in the configured GCS bucket.

Recognizes archives whose base name (without .tar.gz) matches the campaign naming convention (<campaign-name>-YYYY-MM-DD-HHMMSS). Uses the public GCS XML API (no credentials required for public buckets). Handles GCS list pagination via the NextContinuationToken marker.

Returns:

List of (object_name, size_in_bytes) tuples.

remove_archive(object_name: str) None

Delete object_name from the GCS bucket.

Requires ROBOVAST_GCS_KEY_FILE to be set to a service-account key file with at least Storage Object Admin (or Storage Object Viewer + Storage Object Creator + delete permission) on the bucket.

Uses the GCS JSON API DELETE endpoint with a Bearer token.

Parameters:

object_name – GCS object key (as returned by list_campaign_archives()).

required_env_vars() dict[str, str]

Return a mapping of environment-variable name → human-readable description.

All listed variables must be non-empty strings in the environment when the provider is instantiated. The base class validates them automatically and raises click.UsageError if any are missing.

Example:

return {
    "ROBOVAST_SHARE_URL": "Public share URL of the target folder",
}
class robovast.execution.cluster_execution.upload_to_share.ShareUploader(namespace: str = 'default', cluster_config=None, context: str | None = None, provider: BaseShareProvider | None = None)

Upload campaign archives from the cluster pod to an external share service.

Parameters:
  • namespace – Kubernetes namespace where the robovast pod lives.

  • cluster_config – Cluster configuration object providing S3/GCS credentials.

  • context – Kubernetes context name (or None for the active context).

  • provider – Instantiated BaseShareProvider that supplies the upload script and pod environment.

list_available_campaigns()

List campaigns that are finished and ready for upload.

Returns:

(available_campaigns, excluded_runs) where excluded_runs

is a list of (campaign_id, running_count, pending_count) for campaigns that still have active jobs.

Return type:

tuple

upload_campaigns(force: bool = False, verbose: bool = False, keep_archive: bool = False, skip_removal: bool = False, campaign_ids: list | None = None) int

Create tar.gz archives on the pod then upload them to the share.

Steps for each available run:

  1. Existence check – if the archive already exists on the share and force is False, skip both compression and upload (only for providers that implement archive_exists_on_share(), e.g. WebDAV).

  2. Create the remote {campaign}.tar.gz archive in /data/ (skip if it already exists and force is False).

  3. Execute the provider’s upload script inside the archiver container, streaming progress to the local terminal.

  4. On success: remove the remote archive from /data/ unless keep_archive is True. Also delete the S3 bucket for the run unless skip_removal is True (mirrors cluster download behavior).

  5. On failure: keep both the remote archive and the S3 bucket so the user can retry or fall back to cluster download.

Parameters:
  • force – If True, recreate the tar.gz archive even if it already exists in /data/.

  • verbose – If True, emit detailed log messages instead of the single-line progress bar.

  • keep_archive – If True, do not remove the tar.gz from /data/ after a successful upload.

  • skip_removal – If True, do not delete the S3 bucket after a successful upload.

  • campaign_ids – If provided, only upload these campaigns.

Returns:

Number of runs successfully uploaded.

Return type:

int