Evaluation

RoboVAST provides an interactive GUI (vast eval gui), based on Jupyter notebooks, for exploring and visualizing scenario execution results.

GUI

vast eval gui [OPTIONS]

Opens a GUI application for interactive exploration and visualization of run results. Postprocessing runs automatically before launch unless --skip-postprocessing is specified.

Options

-r, --results-dir PATH: Directory containing the run results. When omitted, the value configured with vast init is used.

-f, --force: Force postprocessing even if the results directory is unchanged.

--skip-postprocessing: Launch the GUI without running postprocessing first.

-o, --override VAST_FILE: Use the given .vast file for both postprocessing and notebook discovery instead of the campaign copy. See Using --override to Supply a Local .vast File.

Using `--override` to Supply a Local `.vast` File

By default vast eval gui reads the .vast configuration from the campaign snapshot stored in <results-dir>/<campaign-name>-<timestamp>/_config/<name>.vast. This snapshot is copied at execution time and may be out of date.
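To see which snapshots exist before deciding whether to override them, the layout described above can be searched with a glob. This is a minimal sketch, not part of RoboVAST: the helper name find_snapshot_vast_files is hypothetical, and it assumes only the <results-dir>/<campaign-name>-<timestamp>/_config/<name>.vast layout stated in the paragraph above.

```python
from pathlib import Path


def find_snapshot_vast_files(results_dir):
    """Return the .vast snapshots stored in each campaign's _config folder."""
    # Layout assumed from the text:
    #   <results-dir>/<campaign-name>-<timestamp>/_config/<name>.vast
    return sorted(Path(results_dir).glob("*/_config/*.vast"))
```

Calling find_snapshot_vast_files("results") (where "results" is a placeholder for your results directory) returns one path per snapshot, sorted by campaign folder name.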

--override (short form -o) lets you point to any .vast file on disk, for example your current working copy:

# Use a local/updated .vast file
vast eval gui --override my_project.vast

When to use `--override`

You have updated the evaluation.visualization section (e.g. new notebooks, changed paths) and want the GUI to pick up the changes immediately without re-running the scenario.
The results were produced in a different directory and the campaign snapshot points to stale paths.
You are developing notebooks and want the GUI to always use the latest notebook paths from your working .vast file.

Note

When --override is supplied, the same .vast file is used for every campaign folder (<campaign-name>-<timestamp>) found under the results directory. The config directory of the override file (its parent folder) is used to resolve relative notebook paths defined under evaluation.visualization.
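The path resolution rule in this note can be pictured as follows. This is an illustrative sketch only, not RoboVAST's implementation: resolve_notebook_path is a hypothetical helper, and returning absolute notebook paths unchanged is an assumption that the text does not confirm.

```python
from pathlib import Path


def resolve_notebook_path(override_vast_file, notebook_path):
    """Resolve a notebook path against the folder that contains the .vast file."""
    notebook = Path(notebook_path)
    if notebook.is_absolute():
        # Assumption: absolute paths are used as given.
        return notebook
    # Relative paths are resolved against the override file's parent folder.
    return Path(override_vast_file).parent / notebook


print(resolve_notebook_path("project/my_project.vast", "analysis/analysis_run.ipynb"))
```

In practice this means that moving the override file to another folder also moves the base folder for every relative notebook path it references.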

Writing Evaluation Notebooks

Notebooks are plain Jupyter .ipynb files referenced from the evaluation.visualization section of the .vast file:

evaluation:
  visualization:
    - MyAnalysis:
        run: analysis/analysis_run.ipynb
        config: analysis/analysis_config.ipynb
        campaign: analysis/analysis_campaign.ipynb

There are three reserved scopes:

run – executed once per individual run directory (<campaign-name>-<timestamp>/<config>/<run-number>/).
config – executed once per configuration directory (<campaign-name>-<timestamp>/<config>/).
campaign – executed once per campaign directory (<campaign-name>-<timestamp>/).
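Because the three scopes map to directory depth below the campaign directory, a notebook can check which scope a given path corresponds to. The helper below is a hypothetical sketch based only on the directory layout listed above; scope_for and SCOPES are illustrative names, not RoboVAST APIs.

```python
from pathlib import PurePath

# Depth below the campaign directory: 0 = campaign, 1 = config, 2 = run.
SCOPES = ("campaign", "config", "run")


def scope_for(data_dir, campaign_dir):
    """Return the scope whose directory level matches data_dir."""
    # relative_to raises ValueError when data_dir is not inside campaign_dir.
    relative = PurePath(data_dir).relative_to(campaign_dir)
    depth = len(relative.parts)
    if depth >= len(SCOPES):
        raise ValueError(f"{data_dir} is nested deeper than a run directory")
    return SCOPES[depth]
```

For example, scope_for(".../my-campaign-<timestamp>/my-config/0", ".../my-campaign-<timestamp>") maps to the run scope because the path sits two levels below the campaign directory.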

The only hard requirement is that every notebook contains the line:

DATA_DIR = ''

When the GUI executes a notebook it replaces this line with the actual path for the currently selected item. The output is cached so subsequent views are instant.
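To make the replacement step concrete, here is a minimal sketch of how such a substitution could be done on the source of a code cell. It is an assumption about the mechanism, not RoboVAST's actual code: inject_data_dir and DATA_DIR_LINE are hypothetical names, and the sketch replaces only the first matching line.

```python
import re

# Matches a whole line that starts with a DATA_DIR assignment.
DATA_DIR_LINE = re.compile(r"^DATA_DIR\s*=.*$", flags=re.MULTILINE)


def inject_data_dir(cell_source, data_dir):
    """Return cell_source with its DATA_DIR line pointing at data_dir."""
    replacement = f"DATA_DIR = {data_dir!r}"
    # A function replacement avoids backslash escapes being interpreted.
    new_source, count = DATA_DIR_LINE.subn(lambda _m: replacement, cell_source, count=1)
    if count == 0:
        raise ValueError("cell does not contain a 'DATA_DIR = ...' line")
    return new_source
```

The sketch also shows why the assignment should sit on a line of its own at the start of the line: an indented or multi-line assignment would not match this pattern.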

Self-Contained Evaluation Notebooks

The self-contained pattern extends the basic requirement above: the notebook is written so it can be opened and executed directly in VS Code or JupyterLab (i.e. without the GUI) by setting DATA_DIR to a real path, while still remaining fully compatible with the GUI.

The approach

Set DATA_DIR to a real results directory in the very first code cell:

# Self-contained: set DATA_DIR to a real path during development.
# The RoboVAST GUI replaces this line at runtime.
DATA_DIR = '/path/to/results/dynamic_obstacle-2026-03-04-132444/my-config-1/'

When the GUI runs the notebook it replaces the entire DATA_DIR = ... line, so the hardcoded path is never used in production.

Recommended first-cell pattern

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

# Set DATA_DIR to a real path for interactive development.
# The RoboVAST GUI replaces this line automatically.
DATA_DIR = '/path/to/results/<campaign-name>-<timestamp>/<config-name>/'

try:
    from robovast.common.analysis import read_output_files, read_output_csv
    df = read_output_files(DATA_DIR, lambda d: read_output_csv(d, "poses.csv"))
except Exception as e:
    print(f"Error reading data: {e}")
    raise SystemExit("No data found -- check DATA_DIR.")

Handling missing columns defensively

When developing against a specific dataset, guard against unexpected DataFrame schemas so the notebook fails clearly rather than with a cryptic KeyError:

required_cols = {'run', 'config', 'timestamp', 'frame'}
missing = required_cols - set(df.columns)
if missing:
    raise ValueError(f"DataFrame is missing expected columns: {missing}. "
                     f"Available: {list(df.columns)}")

Scoping `DATA_DIR` per notebook type

Use paths appropriate to the scope of the notebook:

Scope	Example `DATA_DIR`	Available columns
`run`	`/<campaign-name>-<timestamp>/<config>/<run-number>/`	`frame`, `timestamp`, …
`config`	`/<campaign-name>-<timestamp>/<config>/`	`run`, `frame`, `timestamp`, …
`campaign`	`/<campaign-name>-<timestamp>/`	`run`, `config`, `test`, `frame`, …

Note

The test and config columns are only present when DATA_DIR points to a campaign or config directory that contains multiple runs. When DATA_DIR points to a single run directory those columns are absent. Grouping by ['test', 'config'] on a run-level notebook will raise a KeyError; always match the notebook scope to its DATA_DIR level.
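If a notebook has to tolerate more than one scope, one option is to compute the grouping keys from the columns that are actually present instead of hardcoding them. The sketch below is illustrative: usable_group_keys is a hypothetical helper, and the run-level column list (including x and y) is made up for the example.

```python
def usable_group_keys(columns, wanted=("test", "config")):
    """Return the wanted grouping keys that exist in columns, in order."""
    present = set(columns)
    return [key for key in wanted if key in present]


run_level_columns = ["frame", "timestamp", "x", "y"]  # illustrative only
keys = usable_group_keys(run_level_columns)
if keys:
    print(f"group by {keys}")
else:
    print("no grouping keys available; treat the data as a single group")
```

In a notebook, the same helper can be called as usable_group_keys(df.columns) and its result passed to df.groupby only when it is non-empty.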

Benefits of the self-contained pattern

Interactive development: run all cells with Run All in VS Code without launching the GUI.
No context switching: tweak a visualization, re-run, inspect – all in one editor window.
GUI-compatible: the notebook works unchanged in the GUI; the hardcoded path is simply overwritten at runtime.
Reproducible: the path embedded in DATA_DIR documents which dataset the notebook was last developed against.

Typical development workflow

Run an execution campaign to produce results.
Open the relevant .ipynb file in VS Code.
Set DATA_DIR to the actual campaign/config/run directory.
Develop and iterate with Run All (or cell-by-cell).
Once satisfied, commit the notebook. The GUI will use it via the evaluation.visualization section of the .vast file; DATA_DIR will be replaced automatically.
To share the notebook with colleagues working on the same dataset, leave the real DATA_DIR value in place – they only need to update the path.

MCP Server

RoboVAST ships an MCP (Model Context Protocol) server that exposes campaign results as tools so that AI assistants (Claude, Open WebUI, etc.) can inspect runs, read logs, and analyze data.

vast eval mcp-server                                # legacy SSE on 0.0.0.0:8000 (default)
vast eval mcp-server --transport stdio              # stdio
vast eval mcp-server --transport streamable-http    # modern HTTP
vast eval mcp-server --transport streamable-http --host 127.0.0.1 --port 9000
vast eval mcp-server --debug                        # log all MCP messages

Options

--transport {stdio,sse,streamable-http}: Transport layer. sse is the default, as in the first example above. stdio is used by local MCP clients such as Claude Desktop. sse and streamable-http expose the server over HTTP.

--host HOST: Host to bind when using an HTTP transport (default 0.0.0.0).

--port PORT: Port to bind when using an HTTP transport (default 8000).

--debug: Enable DEBUG logging for all MCP messages.

Tool Taxonomy

The MCP server organizes its tools along two dimensions: operations (verbs) and resources. Tool names follow the pattern <verb>_<resource>[_<detail>].

Operations

Verb	Meaning
`get`	Retrieve a specific structured metadata object
`list`	Enumerate objects within a resource
`search`	Filter and query resources by criteria
`inspect`	Compute derived analysis or statistics

Resources

Resource	Description
`campaign`	An experiment dataset containing configurations and runs. Defines the shared input files (scenario, .vast config) available to every configuration and run.
`configuration`	A specific parameterized experiment setup. May add configuration-specific files generated during variation.
`run`	An individual execution of a configuration. Inherits all input files from its configuration and campaign. Produces output files (test results, logs, rosbags).
`run_data`	Structured tabular run output exposed via dedicated query/inspect tools.
`artifact`	Files generated or consumed during execution
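The naming pattern can be used on the client side to sort tool names by verb and resource. The following is a rough sketch under stated assumptions, not something the server provides: split_tool_name and KNOWN_RESOURCES are hypothetical, the resource list is copied from the table above, and names that do not start with a listed resource (for example plural forms) are returned with the resource set to None.

```python
# Resource names taken from the table above.
KNOWN_RESOURCES = ("run_data", "campaign", "configuration", "run", "artifact")


def split_tool_name(name):
    """Split a tool name into (verb, resource, detail) using the known resources."""
    verb, _, rest = name.partition("_")
    # Try longer resource names first so "run_data" wins over "run".
    for resource in sorted(KNOWN_RESOURCES, key=len, reverse=True):
        if rest == resource or rest.startswith(resource + "_"):
            detail = rest[len(resource) + 1:] or None
            return verb, resource, detail
    # No listed resource matched: keep the remainder as the detail.
    return verb, None, rest or None


for tool in ("get_campaign_summary", "inspect_run_data_table", "get_run_details"):
    print(tool, "->", split_tool_name(tool))
```

Trying the longest resource name first matters because run is a prefix of run_data; without that ordering, inspect_run_data_table would be attributed to the run resource.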

Tool groups

The tools are grouped by domain. Two introspection tools (describe_server_capabilities, describe_tool_taxonomy) are always available so that AI clients can orient themselves at runtime.

Available Tools

All tools are provided by plugins loaded at startup via the robovast.mcp_plugins entry-point group.

Use MCP Inspector or a compatible client to explore the available tools and their input/output schemas.

npx @modelcontextprotocol/inspector

Campaign metadata tools

Tool	Description
`list_campaigns`	List available campaigns.
`get_campaign_summary`	Get aggregated statistics about a campaign.
`get_campaign_scenario`	Return the full scenario description used within a campaign.
`get_campaign_scenario_parameters`	Return scenario parameters and their unique values across all configurations.
`list_campaign_run_files`	List files available during every run in every configuration.
`get_campaign_run_file`	Read a single campaign run file (or a page of it).
`get_campaign_config`	Return the YAML-formatted VAST configuration file for a campaign.
`get_campaign_execution_details`	Return the full execution details for a campaign.
`get_campaign_postprocessing_details`	Return postprocessing details for a campaign.
`list_campaign_configurations`	List fully resolved configurations of a campaign.
`get_campaign_agents`	Return the agents defined in the campaign and their configuration files.
`list_campaign_transient_files`	List transient files of a campaign.
`get_campaign_transient_file`	Read a single campaign transient file (or a page of it).

Configuration metadata tools

Tool	Description
`get_configuration_summary`	Get details about a specific configuration.
`get_configuration_scenario_parameter`	Get the scenario parameter values of a configuration.
`get_configuration_variations`	Return the variation steps that produced this configuration.
`list_configuration_transient_files`	List transient files of a configuration.
`get_configuration_transient_file`	Read a single configuration transient file.
`list_configuration_config_files`	List config files specific to a configuration.
`get_configuration_config_file`	Read a single configuration config file (or a page of it).

Run metadata tools

Tool	Description
`get_run_details`	Get test result details for a single run.
`get_run_sysinfo`	Get system information recorded during a run.
`list_run_additional_output_files`	List additional output files of a single run, that are not already provided
`get_run_output_file`	Read a single output file from a run.

Run data tools

Tool	Description
`list_run_data_tables`	List the available data tables for a specific run.
`query_run_data_table`	Query rows from a run data table with filters, projection, and pagination.
`inspect_run_data_table`	Inspect a run data table using aggregate/statistical operations only.
`query_run_log`	Query log entries for a specific run.

Plugin metadata tools

Tool	Description
`list_plugin_groups`	List all RoboVAST plugin extension groups.
`list_plugins`	List all installed plugins for a given extension group.
`search_plugin`	Search for a plugin by name across all extension groups.
`get_plugin_details`	Get full details for a specific plugin.