Evaluation
RoboVAST provides an interactive GUI (vast eval) based on Jupyter notebooks
for exploration and visualization of scenario execution results.
GUI
vast eval gui [OPTIONS]
Opens a GUI application for interactive exploration and visualization of run results.
Automatically runs postprocessing before launching, unless --skip-postprocessing
is specified.
Options
- -r, --results-dir PATH
Directory containing the run results. When omitted, the value configured with
vast init is used.
- -f, --force
Force postprocessing even if the results directory is unchanged.
- --skip-postprocessing
Launch the GUI without running postprocessing first.
- -o, --override VAST_FILE
Use the given .vast file for both postprocessing and notebook discovery instead of the campaign copy. See Using --override to Supply a Local .vast File.
Using --override to Supply a Local .vast File
By default vast eval gui reads the .vast configuration from the
campaign snapshot stored in
<results-dir>/<campaign-name>-<timestamp>/_config/<name>.vast. This snapshot is copied
at execution time and may be out of date.
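To see which snapshot files exist before deciding whether an override is needed, the layout above can be searched directly. This is a minimal sketch that only assumes the documented directory pattern; the helper name find_snapshot_configs is hypothetical and not part of RoboVAST.

```python
from pathlib import Path


def find_snapshot_configs(results_dir):
    """Return snapshot .vast files found under each campaign folder.

    Assumes the documented layout:
    <results-dir>/<campaign-name>-<timestamp>/_config/<name>.vast
    """
    root = Path(results_dir)
    # One wildcard level for the campaign folder, then the _config directory.
    return sorted(root.glob("*/_config/*.vast"))
```

Comparing the modification time of a snapshot with that of the working copy is one simple way to spot an outdated snapshot.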
--override (short form -o) lets you point to any .vast file on disk,
for example your current working copy:
# Use a local/updated .vast file
vast eval gui --override my_project.vast
When to use --override
- You have updated the evaluation.visualization section (e.g. new notebooks, changed paths) and want the GUI to pick up the changes immediately without re-running the scenario.
- The results were produced in a different directory and the campaign snapshot points to stale paths.
- During notebook development: point to your working .vast so the GUI always uses the latest notebook paths.
Note
When --override is supplied, the same .vast file is used for
every campaign folder (<campaign-name>-<timestamp>) found under the results directory. The
config directory of the override file (its parent folder) is used to
resolve relative notebook paths defined under evaluation.visualization.
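The path resolution described in the note can be illustrated as follows. This is a sketch of the rule only, not RoboVAST's code: resolve_notebook_path is a hypothetical helper, and passing absolute notebook paths through unchanged is an assumption.

```python
from pathlib import Path


def resolve_notebook_path(vast_file, notebook_path):
    """Resolve a notebook path against the folder containing the .vast file."""
    notebook = Path(notebook_path)
    if notebook.is_absolute():
        # Assumption: absolute paths are used as given.
        return notebook
    # Relative paths are interpreted from the override file's parent folder.
    return Path(vast_file).parent / notebook
```

For example, with --override proj/my_project.vast, an entry analysis/analysis_run.ipynb would resolve to proj/analysis/analysis_run.ipynb under this rule.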
Writing Evaluation Notebooks
Notebooks are plain Jupyter .ipynb files referenced from the
evaluation.visualization section of the .vast file:
evaluation:
  visualization:
    - MyAnalysis:
        run: analysis/analysis_run.ipynb
        config: analysis/analysis_config.ipynb
        campaign: analysis/analysis_campaign.ipynb
There are three reserved scopes:
- run – executed once per individual run directory (<campaign-name>-<timestamp>/<config>/<run-number>/).
- config – executed once per configuration directory (<campaign-name>-<timestamp>/<config>/).
- campaign – executed once per campaign directory (<campaign-name>-<timestamp>/).
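To make the three levels concrete, the sketch below lists the directories each scope would be executed against for one campaign folder. It assumes only the directory layout given above; list_scope_dirs is a hypothetical helper, and skipping folders whose names start with an underscore (such as _config) is an assumption of this sketch.

```python
from pathlib import Path


def list_scope_dirs(campaign_dir):
    """Group the directories under one campaign folder by notebook scope."""
    campaign = Path(campaign_dir)
    # Configuration directories sit directly below the campaign folder.
    configs = sorted(
        d for d in campaign.iterdir()
        if d.is_dir() and not d.name.startswith("_")
    )
    # Run directories sit directly below each configuration directory.
    runs = sorted(r for c in configs for r in c.iterdir() if r.is_dir())
    return {"campaign": [campaign], "config": configs, "run": runs}
```

Each returned path is a candidate value for DATA_DIR in a notebook of the matching scope.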
The only hard requirement is that every notebook contains the line:
DATA_DIR = ''
When the GUI executes a notebook it replaces this line with the actual path for the currently selected item. The output is cached so subsequent views are instant.
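How the GUI performs this substitution internally is not specified here. The following is a minimal sketch of one way such a line replacement could work on a notebook's JSON, using only the standard library; inject_data_dir and DATA_DIR_LINE are hypothetical names, not RoboVAST APIs.

```python
import json
import re

# Matches a whole line that assigns DATA_DIR, whatever value it currently holds.
DATA_DIR_LINE = re.compile(r"^DATA_DIR[ \t]*=.*$", re.MULTILINE)


def inject_data_dir(notebook_json, data_dir):
    """Return notebook JSON with every DATA_DIR assignment line replaced."""
    notebook = json.loads(notebook_json)
    new_line = f"DATA_DIR = {data_dir!r}"
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # Notebook cells store source as a string or as a list of line strings.
        text = "".join(source) if isinstance(source, list) else source
        # A function replacement avoids backslash processing in the new path.
        cell["source"] = DATA_DIR_LINE.sub(lambda _m: new_line, text)
    return json.dumps(notebook)
```

Because the match covers the entire assignment line, it makes no difference whether the notebook contains the empty placeholder or a real development path.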
Self-Contained Evaluation Notebooks
The self-contained pattern extends the basic requirement above: the
notebook is written so it can be opened and executed directly in VS Code
or JupyterLab (i.e. without the GUI) by setting DATA_DIR to a real
path, while still remaining fully compatible with the GUI.
The approach
Set DATA_DIR to a real results directory in the very first code cell:
# Self-contained: set DATA_DIR to a real path during development.
# The RoboVAST GUI replaces this line at runtime.
DATA_DIR = '/path/to/results/dynamic_obstacle-2026-03-04-132444/my-config-1/'
When the GUI runs the notebook it replaces the entire DATA_DIR = ...
line, so the hardcoded path is never used in production.
Recommended first-cell pattern
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
# Set DATA_DIR to a real path for interactive development.
# The RoboVAST GUI replaces this line automatically.
DATA_DIR = '/path/to/results/<campaign-name>-<timestamp>/<config-name>/'
try:
    from robovast.common.analysis import read_output_files, read_output_csv
    df = read_output_files(DATA_DIR, lambda d: read_output_csv(d, "poses.csv"))
except Exception as e:
    print(f"Error reading data: {e}")
    raise SystemExit("No data found -- check DATA_DIR.")
Handling missing columns defensively
When developing against a specific dataset, guard against unexpected
DataFrame schemas so the notebook fails clearly rather than with a cryptic
KeyError:
required_cols = {'run', 'config', 'timestamp', 'frame'}
missing = required_cols - set(df.columns)
if missing:
    raise ValueError(f"DataFrame is missing expected columns: {missing}. "
                     f"Available: {list(df.columns)}")
Scoping DATA_DIR per notebook type
Use paths appropriate to the scope of the notebook:
| Scope | Example | Available columns |
|---|---|---|
| | | |
| | | |
| | | |
Note
The test and config columns are only present when DATA_DIR
points to a campaign or config directory that contains multiple
runs. When DATA_DIR points to a single run directory those columns
are absent. Grouping by ['test', 'config'] on a run-level notebook
will raise a KeyError; always match the notebook scope to its
DATA_DIR level.
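When one analysis cell is reused across scopes, the KeyError described in the note can be avoided by grouping only on the columns that are actually present. The helper below is a hypothetical sketch in plain Python; it takes any iterable of column names, such as df.columns.

```python
def available_group_keys(columns, wanted=("test", "config")):
    """Return the wanted grouping keys that actually exist, in order."""
    present = set(columns)
    return [key for key in wanted if key in present]
```

In a notebook this could be used as keys = available_group_keys(df.columns), followed by df.groupby(keys) only when keys is non-empty. Matching the notebook scope to its DATA_DIR level remains the cleaner approach.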
Benefits of the self-contained pattern
- Interactive development: run all cells with Run All in VS Code without launching the GUI.
- No context switching: tweak a visualization, re-run, inspect – all in one editor window.
- GUI-compatible: the notebook works unchanged in the GUI; the hardcoded path is simply overwritten at runtime.
- Reproducible: the path embedded in DATA_DIR documents which dataset the notebook was last developed against.
Typical development workflow
1. Run an execution campaign to produce results.
2. Open the relevant .ipynb file in VS Code.
3. Set DATA_DIR to the actual campaign/config/run directory.
4. Develop and iterate with Run All (or cell-by-cell).
5. Once satisfied, commit the notebook. The GUI will use it via the evaluation.visualization section of the .vast file; DATA_DIR will be replaced automatically.
6. To share the notebook with colleagues working on the same dataset, leave the real DATA_DIR value in place – they only need to update the path.
MCP Server
RoboVAST ships an MCP (Model Context Protocol) server that exposes campaign results as tools so that AI assistants (Claude, Open WebUI, etc.) can inspect runs, read logs, and analyze data.
vast eval mcp-server # legacy SSE on 0.0.0.0:8000 (default)
vast eval mcp-server --transport stdio # stdio
vast eval mcp-server --transport streamable-http # modern HTTP
vast eval mcp-server --transport streamable-http --host 127.0.0.1 --port 9000
vast eval mcp-server --debug # log all MCP messages
Options
- --transport {stdio,sse,streamable-http}
Transport layer. stdio is the default and is used by local MCP clients such as Claude Desktop. sse and streamable-http expose the server over HTTP.
- --host HOST
Host to bind when using an HTTP transport (default 0.0.0.0).
- --port PORT
Port to bind when using an HTTP transport (default 8000).
- --debug
Enable DEBUG logging for all MCP messages.
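When the server is started from a script, for instance as part of a test harness, the command line can be assembled from the flags listed above. The sketch below uses only those documented flags; mcp_server_command is a hypothetical helper and not shipped with RoboVAST.

```python
import shlex

# Transport names as listed for the --transport option.
TRANSPORTS = ("stdio", "sse", "streamable-http")


def mcp_server_command(transport=None, host=None, port=None, debug=False):
    """Build an argv list for `vast eval mcp-server` from the documented flags."""
    if transport is not None and transport not in TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    argv = ["vast", "eval", "mcp-server"]
    if transport is not None:
        argv += ["--transport", transport]
    if host is not None:
        argv += ["--host", host]
    if port is not None:
        argv += ["--port", str(port)]
    if debug:
        argv.append("--debug")
    return argv


# Show the command as it would be typed in a shell.
print(shlex.join(mcp_server_command("streamable-http", "127.0.0.1", 9000)))
```

The returned list can be passed directly to subprocess.run or subprocess.Popen without shell quoting.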
Tool Taxonomy
The MCP server organizes its tools along two dimensions: operations
(verbs) and resources. Tool names follow the pattern
<verb>_<resource>[_<detail>].
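A client that wants to group tools by verb or resource can split names along this pattern. The sketch below is illustrative only: parse_tool_name and ToolName are hypothetical, and it assumes that neither the verb nor the resource contains an underscore, which the naming pattern does not guarantee.

```python
from typing import NamedTuple, Optional


class ToolName(NamedTuple):
    verb: str
    resource: str
    detail: Optional[str]


def parse_tool_name(name):
    """Split <verb>_<resource>[_<detail>] on the first two underscores."""
    parts = name.split("_", 2)
    # Reject names with fewer than two parts or with an empty part.
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a <verb>_<resource>[_<detail>] name: {name!r}")
    detail = parts[2] if len(parts) == 3 else None
    return ToolName(parts[0], parts[1], detail)
```

Applied to describe_server_capabilities, this yields the verb describe, the resource server, and the detail capabilities.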
Operations
| Verb | Meaning |
|---|---|
| | Retrieve a specific structured metadata object |
| | Enumerate objects within a resource |
| | Filter and query resources by criteria |
| | Compute derived analysis or statistics |
Resources
| Resource | Description |
|---|---|
| | An experiment dataset containing configurations and runs. Defines the shared input files (scenario, .vast config) available to every configuration and run. |
| | A specific parameterized experiment setup. May add configuration-specific files generated during variation. |
| | An individual execution of a configuration. Inherits all input files from its configuration and campaign. Produces output files (test results, logs, rosbags). |
| | Structured tabular run output exposed via dedicated query/inspect tools. |
| | Files generated or consumed during execution |
Tool groups
The tools are grouped by domain. Two introspection tools
(describe_server_capabilities, describe_tool_taxonomy) are always
available so that AI clients can orient themselves at runtime.
Available Tools
All tools are provided by plugins loaded at startup via the
robovast.mcp_plugins entry-point group.
Use MCP Inspector or a compatible client to explore the available tools and their input/output schemas.
npx @modelcontextprotocol/inspector
Campaign metadata tools
| Tool | Description |
|---|---|
| | List available campaigns |
| | Get aggregated statistics about a campaign. |
| | Return the full scenario description used within a campaign. |
| | Return scenario parameters and their unique values across all configurations. |
| | List files available during every run in every configuration. |
| | Read a single campaign run file (or a page of it). |
| | Return the YAML-formatted VAST configuration file for a campaign. |
| | Return the full execution details for a campaign. |
| | Return postprocessing details for a campaign. |
| | List fully resolved configurations of a campaign. |
| | Return the agents defined in the campaign and their configuration files. |
| | List transient files of a campaign. |
| | Read a single campaign transient file (or a page of it). |
Configuration metadata tools
| Tool | Description |
|---|---|
| | Get details about a specific configuration. |
| | Get the scenario parameter values of a configuration. |
| | Return the variation steps that produced this configuration. |
| | List transient files of a configuration. |
| | Read a single configuration transient file. |
| | List config files specific to a configuration. |
| | Read a single configuration config file (or a page of it). |
Run metadata tools
| Tool | Description |
|---|---|
| | Get test result details for a single run. |
| | Get system information recorded during a run. |
| | List additional output files of a single run, that are not already provided |
| | Read a single output file from a run. |
Run data tools
| Tool | Description |
|---|---|
| | List the available data tables for a specific run. |
| | Query rows from a run data table with filters, projection, and pagination. |
| | Inspect a run data table using aggregate/statistical operations only. |
| | Query log entries for a specific run. |
Plugin metadata tools
| Tool | Description |
|---|---|
| | List all RoboVAST plugin extension groups. |
| | List all installed plugins for a given extension group. |
| | Search for a plugin by name across all extension groups. |
| | Get full details for a specific plugin. |