Evaluation

RoboVAST provides an interactive GUI (vast eval gui), based on Jupyter notebooks, for exploring and visualizing scenario execution results.

GUI

vast eval gui [OPTIONS]

Opens a GUI application for interactive exploration and visualization of run results. Automatically runs postprocessing before launching, unless --skip-postprocessing is specified.

Options

-r, --results-dir PATH

Directory containing the run results. When omitted, the value configured with vast init is used.

-f, --force

Force postprocessing even if the results directory is unchanged.

--skip-postprocessing

Launch the GUI without running postprocessing first.

-o, --override VAST_FILE

Use the given .vast file for both postprocessing and notebook discovery instead of the campaign copy. See Using --override to Supply a Local .vast File.

Using --override to Supply a Local .vast File

By default, vast eval gui reads the .vast configuration from the campaign snapshot stored in <results-dir>/<campaign-name>-<timestamp>/_config/<name>.vast. This snapshot is copied at execution time, so it may be out of date if the working copy has changed since.
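To check which snapshots exist under a results directory, the path pattern above can be globbed directly. This is a minimal sketch; the helper name find_snapshot_vast_files is hypothetical and not part of RoboVAST.

```python
from pathlib import Path


def find_snapshot_vast_files(results_dir):
    """Return the snapshot .vast files found under <results-dir>/*/_config/."""
    root = Path(results_dir)
    # Every campaign folder is assumed to keep its snapshot in a _config subfolder.
    return sorted(root.glob("*/_config/*.vast"))
```

Comparing the modification time of a snapshot with that of the working copy is one way to spot a stale snapshot.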

--override (short form -o) lets you point to any .vast file on disk, for example your current working copy:

# Use a local/updated .vast file
vast eval gui --override my_project.vast

When to use --override

  • You have updated the evaluation.visualization section (e.g. new notebooks, changed paths) and want the GUI to pick up the changes immediately without re-running the scenario.

  • The results were produced in a different directory and the campaign snapshot points to stale paths.

  • During notebook development: point to your working .vast so the GUI always uses the latest notebook paths.

Note

When --override is supplied, the same .vast file is used for every campaign folder (<campaign-name>-<timestamp>) found under the results directory. The config directory of the override file (its parent folder) is used to resolve relative notebook paths defined under evaluation.visualization.
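The path resolution described in this note can be sketched as follows. The helper name resolve_notebook_path is hypothetical, and leaving absolute notebook paths untouched is an assumption, not documented RoboVAST behavior.

```python
from pathlib import Path


def resolve_notebook_path(override_vast, notebook_path):
    """Resolve a notebook path against the override file's parent folder."""
    config_dir = Path(override_vast).resolve().parent
    candidate = Path(notebook_path)
    # Assumption: absolute paths are kept, relative ones are anchored at config_dir.
    return candidate if candidate.is_absolute() else config_dir / candidate
```

With --override /work/my_project.vast, the entry analysis/analysis_run.ipynb would resolve to /work/analysis/analysis_run.ipynb under this sketch.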

Writing Evaluation Notebooks

Notebooks are plain Jupyter .ipynb files referenced from the evaluation.visualization section of the .vast file:

evaluation:
  visualization:
    - MyAnalysis:
        run: analysis/analysis_run.ipynb
        config: analysis/analysis_config.ipynb
        campaign: analysis/analysis_campaign.ipynb

There are three reserved scopes:

  • run – executed once per individual run directory (<campaign-name>-<timestamp>/<config>/<run-number>/).

  • config – executed once per configuration directory (<campaign-name>-<timestamp>/<config>/).

  • campaign – executed once per campaign directory (<campaign-name>-<timestamp>/).
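The three scopes correspond to directory depth below the campaign folder. A minimal sketch of that mapping, assuming the layout listed above; scope_of and SCOPE_BY_DEPTH are hypothetical names used only for illustration:

```python
from pathlib import Path

# Depth below the campaign directory for each reserved scope.
SCOPE_BY_DEPTH = {0: "campaign", 1: "config", 2: "run"}


def scope_of(data_dir, campaign_dir):
    """Return the scope name for data_dir, given its enclosing campaign directory."""
    # relative_to raises ValueError when data_dir is not inside campaign_dir.
    relative = Path(data_dir).relative_to(Path(campaign_dir))
    depth = len(relative.parts)
    if depth not in SCOPE_BY_DEPTH:
        raise ValueError(f"{data_dir} is not a campaign, config, or run directory")
    return SCOPE_BY_DEPTH[depth]
```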

The only hard requirement is that every notebook contains the line:

DATA_DIR = ''

When the GUI executes a notebook it replaces this line with the actual path for the currently selected item. The output is cached so subsequent views are instant.
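The GUI's own replacement code is not shown here. The sketch below illustrates one way such a line substitution could work, which is also why the DATA_DIR line has to be present; inject_data_dir is a hypothetical helper, not a RoboVAST function.

```python
import re

# Matches a whole line that assigns DATA_DIR, whatever value it currently holds.
DATA_DIR_LINE = re.compile(r"^DATA_DIR\s*=.*$", flags=re.MULTILINE)


def inject_data_dir(cell_source, data_dir):
    """Return cell_source with its DATA_DIR assignment pointing at data_dir."""
    if not DATA_DIR_LINE.search(cell_source):
        raise ValueError("notebook cell has no 'DATA_DIR = ...' line")
    replacement = f"DATA_DIR = {data_dir!r}"
    # A function replacement keeps backslashes in the path from being read as escapes.
    return DATA_DIR_LINE.sub(lambda _match: replacement, cell_source, count=1)
```

Because the whole line is matched, it makes no difference whether the notebook holds an empty string or a real development path.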

Self-Contained Evaluation Notebooks

The self-contained pattern extends the basic requirement above: the notebook is written so it can be opened and executed directly in VS Code or JupyterLab (i.e. without the GUI) by setting DATA_DIR to a real path, while still remaining fully compatible with the GUI.

The approach

Set DATA_DIR to a real results directory in the very first code cell:

# Self-contained: set DATA_DIR to a real path during development.
# The RoboVAST GUI replaces this line at runtime.
DATA_DIR = '/path/to/results/dynamic_obstacle-2026-03-04-132444/my-config-1/'

When the GUI runs the notebook it replaces the entire DATA_DIR = ... line, so the hardcoded path is never used in production.

Handling missing columns defensively

When developing against a specific dataset, guard against unexpected DataFrame schemas so the notebook fails clearly rather than with a cryptic KeyError:

required_cols = {'run', 'config', 'timestamp', 'frame'}
missing = required_cols - set(df.columns)
if missing:
    raise ValueError(f"DataFrame is missing expected columns: {missing}. "
                     f"Available: {list(df.columns)}")

Scoping DATA_DIR per notebook type

Use paths appropriate to the scope of the notebook:

Scope     Example DATA_DIR                                      Available columns
run       /<campaign-name>-<timestamp>/<config>/<run-number>/   frame, timestamp, …
config    /<campaign-name>-<timestamp>/<config>/                run, frame, timestamp, …
campaign  /<campaign-name>-<timestamp>/                         run, config, test, frame, …

Note

The test and config columns are only present when DATA_DIR points to a campaign or config directory that contains multiple runs. When DATA_DIR points to a single run directory those columns are absent. Grouping by ['test', 'config'] on a run-level notebook will raise a KeyError; always match the notebook scope to its DATA_DIR level.
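If one notebook has to tolerate more than one scope during development, the grouping keys can be limited to the columns that are actually present. This is a defensive sketch rather than a recommended replacement for matching scope and DATA_DIR; present_group_keys is a hypothetical helper.

```python
def present_group_keys(columns, wanted=("test", "config")):
    """Return the wanted grouping keys that exist in columns, in the wanted order."""
    available = set(columns)
    return [key for key in wanted if key in available]
```

In a notebook this could be used as keys = present_group_keys(df.columns), followed by df.groupby(keys) only when keys is non-empty.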

Benefits of the self-contained pattern

  • Interactive development: run all cells with Run All in VS Code without launching the GUI.

  • No context switching: tweak a visualization, re-run, inspect – all in one editor window.

  • GUI-compatible: the notebook works unchanged in the GUI; the hardcoded path is simply overwritten at runtime.

  • Reproducible: the path embedded in DATA_DIR documents which dataset the notebook was last developed against.

Typical development workflow

  1. Run an execution campaign to produce results.

  2. Open the relevant .ipynb file in VS Code.

  3. Set DATA_DIR to the actual campaign/config/run directory.

  4. Develop and iterate with Run All (or cell-by-cell).

  5. Once satisfied, commit the notebook. The GUI will use it via the evaluation.visualization section of the .vast file; DATA_DIR will be replaced automatically.

  6. To share the notebook with colleagues working on the same dataset, leave the real DATA_DIR value in place; they only need to change the path to match their local copy.

MCP Server

RoboVAST ships an MCP (Model Context Protocol) server that exposes campaign results as tools so that AI assistants (Claude, Open WebUI, etc.) can inspect runs, read logs, and analyze data.

vast eval mcp-server                                # legacy SSE on 0.0.0.0:8000 (default)
vast eval mcp-server --transport stdio              # stdio
vast eval mcp-server --transport streamable-http    # modern HTTP
vast eval mcp-server --transport streamable-http --host 127.0.0.1 --port 9000
vast eval mcp-server --debug                        # log all MCP messages

Options

--transport {stdio,sse,streamable-http}

Transport layer. stdio is the default and is used by local MCP clients such as Claude Desktop. sse and streamable-http expose the server over HTTP.

--host HOST

Host to bind when using an HTTP transport (default 0.0.0.0).

--port PORT

Port to bind when using an HTTP transport (default 8000).

--debug

Enable DEBUG logging for all MCP messages.
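When the server is started from a script, the command line can be assembled from the options above. A minimal sketch that uses only the flags documented in this section; the function name mcp_server_command is hypothetical.

```python
def mcp_server_command(transport=None, host=None, port=None, debug=False):
    """Build the argument list for `vast eval mcp-server` from optional settings."""
    allowed = {"stdio", "sse", "streamable-http"}
    if transport is not None and transport not in allowed:
        raise ValueError(f"unknown transport {transport!r}; expected one of {sorted(allowed)}")
    command = ["vast", "eval", "mcp-server"]
    # Options left at None are omitted so the server applies its own defaults.
    if transport is not None:
        command += ["--transport", transport]
    if host is not None:
        command += ["--host", host]
    if port is not None:
        command += ["--port", str(port)]
    if debug:
        command.append("--debug")
    return command
```

The resulting list can be passed to subprocess.run(command, check=True).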

Tool Taxonomy

The MCP server organizes its tools along two dimensions: operations (verbs) and resources. Tool names follow the pattern <verb>_<resource>[_<detail>].

Operations

Verb      Meaning
get       Retrieve a specific structured metadata object
list      Enumerate objects within a resource
search    Filter and query resources by criteria
inspect   Compute derived analysis or statistics

Resources

Resource        Description
campaign        An experiment dataset containing configurations and runs. Defines the shared input files (scenario, .vast config) available to every configuration and run.
configuration   A specific parameterized experiment setup. May add configuration-specific files generated during variation.
run             An individual execution of a configuration. Inherits all input files from its configuration and campaign. Produces output files (test results, logs, rosbags).
run_data        Structured tabular run output exposed via dedicated query/inspect tools.
artifact        Files generated or consumed during execution
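The naming pattern can be checked mechanically. The sketch below splits a tool name into verb, resource, and detail using only the verbs and resources listed in this section; parse_tool_name is a hypothetical helper. Names that use another verb (such as query), a plural resource, or a resource outside the table (such as plugin) do not fit the strict pattern and yield None.

```python
VERBS = ("get", "list", "search", "inspect")
# run_data is listed before run so that the longer resource name is tried first.
RESOURCES = ("configuration", "run_data", "campaign", "artifact", "run")


def parse_tool_name(name):
    """Split name into (verb, resource, detail), or return None if it does not fit."""
    verb, _, rest = name.partition("_")
    if verb not in VERBS:
        return None
    for resource in RESOURCES:
        if rest == resource:
            return verb, resource, None
        if rest.startswith(resource + "_"):
            return verb, resource, rest[len(resource) + 1:]
    return None
```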

Tool groups

The tools are grouped by domain. Two introspection tools (describe_server_capabilities, describe_tool_taxonomy) are always available so that AI clients can orient themselves at runtime.

Available Tools

All tools are provided by plugins loaded at startup via the robovast.mcp_plugins entry-point group.

Use MCP Inspector or a compatible client to explore the available tools and their input/output schemas.

npx @modelcontextprotocol/inspector

Campaign metadata tools

Tool                                  Description
list_campaigns                        List available campaigns
get_campaign_summary                  Get aggregated statistics about a campaign.
get_campaign_scenario                 Return the full scenario description used within a campaign.
get_campaign_scenario_parameters      Return scenario parameters and their unique values across all configurations.
list_campaign_run_files               List files available during every run in every configuration.
get_campaign_run_file                 Read a single campaign run file (or a page of it).
get_campaign_config                   Return the YAML-formatted VAST configuration file for a campaign.
get_campaign_execution_details        Return the full execution details for a campaign.
get_campaign_postprocessing_details   Return postprocessing details for a campaign.
list_campaign_configurations          List fully resolved configurations of a campaign.
get_campaign_agents                   Return the agents defined in the campaign and their configuration files.
list_campaign_transient_files         List transient files of a campaign.
get_campaign_transient_file           Read a single campaign transient file (or a page of it).

Configuration metadata tools

Tool                                   Description
get_configuration_summary              Get details about a specific configuration.
get_configuration_scenario_parameter   Get the scenario parameter values of a configuration.
get_configuration_variations           Return the variation steps that produced this configuration.
list_configuration_transient_files     List transient files of a configuration.
get_configuration_transient_file       Read a single configuration transient file.
list_configuration_config_files        List config files specific to a configuration.
get_configuration_config_file          Read a single configuration config file (or a page of it).

Run metadata tools

Tool                               Description
get_run_details                    Get test result details for a single run.
get_run_sysinfo                    Get system information recorded during a run.
list_run_additional_output_files   List additional output files of a single run, that are not already provided
get_run_output_file                Read a single output file from a run.

Run data tools

Tool                     Description
list_run_data_tables     List the available data tables for a specific run.
query_run_data_table     Query rows from a run data table with filters, projection, and pagination.
inspect_run_data_table   Inspect a run data table using aggregate/statistical operations only.
query_run_log            Query log entries for a specific run.

Plugin metadata tools

Tool                 Description
list_plugin_groups   List all RoboVAST plugin extension groups.
list_plugins         List all installed plugins for a given extension group.
search_plugin        Search for a plugin by name across all extension groups.
get_plugin_details   Get full details for a specific plugin.