Module path
Bridges live under `src/worldflux/eval_bridges/` and are called from the adapter's `run.py`; the manifest writer picks up the result.
Pattern
LIBERO is the cleanest example: `run_libero_eval(...)` returns a `LIBEROEvalSummary` per suite. If `output_dir` is set, it also writes a JSON artifact with the same content.
What ships
| Bridge | Suite | Entry function | Output |
|---|---|---|---|
| `cosmos` | RoboCasa rollouts via Cosmos-Predict | `run_cosmos_rollout(...)` | `RolloutSummary` per scene |
| `libero` | LIBERO 4-suite benchmark | `run_libero_eval(...)` | `dict[str, LIBEROEvalSummary]` |
| `minecraft` | Minecraft offline RL replay | `run_minecraft_eval(...)` | `MinecraftReplaySummary` |
| `vjepa` | V-JEPA latent embedding probe | `run_vjepa_embed_eval(...)` | `VJEPAEmbedSummary` |
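As a concrete illustration of the pattern, here is a self-contained sketch of consuming a bridge's per-suite summaries. `LIBEROEvalSummary` is stubbed here with assumed fields; the real bridge API may differ:

```python
from dataclasses import dataclass

# Stand-in for the real LIBEROEvalSummary; these field names are assumptions.
@dataclass(frozen=True)
class LIBEROEvalSummary:
    suite: str
    episodes: int
    mean_success: float  # always a float, never None

# A bridge like run_libero_eval(...) would return one summary per suite.
summaries = {
    "libero_spatial": LIBEROEvalSummary("libero_spatial", episodes=20, mean_success=0.85),
    "libero_object": LIBEROEvalSummary("libero_object", episodes=20, mean_success=0.90),
}

for suite, summary in summaries.items():
    print(f"{suite}: {summary.mean_success:.2f} over {summary.episodes} episodes")
```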
Wiring an adapter
A curated adapter’s `run.py` typically does three things:
- Start the adapter’s policy server (or call its inference function directly).
- Call the bridge with the right `task` argument.
- Hand the bridge’s return value to the manifest writer.
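The three steps above can be sketched as follows; every name here (the policy-server helper, the bridge call, the manifest writer) is an illustrative stand-in, not the real API:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class RolloutSummary:  # stand-in for a bridge's output dataclass
    episodes: int
    mean_success: float

def start_policy_server() -> str:
    # Stand-in: would launch the adapter's policy server and return its URL.
    return "http://localhost:8000"

def run_bridge(task: str, server_url: str) -> RolloutSummary:
    # Stand-in for a bridge entry point such as run_cosmos_rollout(...).
    return RolloutSummary(episodes=10, mean_success=0.7)

def write_manifest(summary: RolloutSummary) -> dict:
    # Stand-in for the manifest writer.
    return {"metrics": asdict(summary)}

server_url = start_policy_server()                # 1. start the policy server
summary = run_bridge("example_task", server_url)  # 2. call the bridge with the right task
manifest = write_manifest(summary)                # 3. hand the result to the manifest writer
```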
Adding a bridge
Write the function
Add `src/worldflux/eval_bridges/<suite>.py`. Keep it function-first; only add a class if state genuinely needs to live across episodes.
Define the output dataclass
A frozen `@dataclass` with the fields the dashboard will need to render. Avoid optionals where you can; `mean_*` should always be a float, even if it is 0.0.
Write the JSON artifact
If the suite produces per-episode detail, write a JSON file under `output_dir` and reference it from the dataclass. The dashboard’s run detail panel previews JSON inline.
When you do not need a bridge
If the suite already returns JSON in roughly the shape the manifest expects, the adapter can write straight into `manifest.metrics` and skip the bridge layer. Bridges exist for suites whose native output (per-episode logs, per-task success matrices, vector embeddings) does not map onto the manifest 1:1.
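To tie the "Adding a bridge" steps together, here is a self-contained sketch of a frozen output dataclass that references a JSON artifact under `output_dir`; the class name and fields are illustrative assumptions, not the real API:

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path

# Illustrative output dataclass: frozen, no optionals, mean_* always a float.
@dataclass(frozen=True)
class ExampleEvalSummary:
    episodes: int
    mean_success: float  # 0.0 rather than None when nothing succeeded
    detail_path: str     # points at the per-episode JSON artifact

output_dir = Path(tempfile.mkdtemp())

# Per-episode detail goes into a JSON file under output_dir ...
detail = output_dir / "episodes.json"
detail.write_text(json.dumps([{"episode": 0, "success": True}], indent=2))

# ... and the dataclass references it so the dashboard can preview it inline.
summary = ExampleEvalSummary(episodes=1, mean_success=1.0, detail_path=str(detail))
```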