Evidence Triage Rules
WorldFlux treats existing run folders as untrusted input. The run-folder
importer is read-only: it inventories evidence, writes a private operator report,
and emits audit_input.json only when the selected candidate is claim-safe.
Accepted For Claim Packaging
WorldFlux can emit audit_input.json when one selected candidate has explicit
per-episode JSON or JSONL with:
- boolean success per episode;
- task, suite, or episode identity;
- a metric contract that matches the claim/protocol;
- denominator information that prevents silently dropping failed or missing episodes;
- no secrets, private credentials, signed URLs, or raw customer-only payloads in public fields.
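The first, second, and fourth criteria above can be sketched as a small check over per-episode JSONL. This is a minimal illustration, not the WorldFlux importer: the function name `check_episodes`, the field names `success`, `task`, `suite`, and `episode_id`, and the `expected_episodes` parameter are all assumptions made for the example, since the real schema is not specified here. The metric-contract and secrets checks are omitted.

```python
import json

# Hypothetical identity fields; the real schema may use different names.
IDENTITY_KEYS = ("task", "suite", "episode_id")


def check_episodes(jsonl_text, expected_episodes):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    seen = 0
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {lineno}: not valid JSON")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: record is not a JSON object")
            continue
        seen += 1
        # Success must be a real boolean, not 0/1 or a numeric score.
        if not isinstance(record.get("success"), bool):
            problems.append(f"line {lineno}: success must be a boolean")
        # At least one identity field must be present and non-empty.
        if not any(record.get(k) not in (None, "") for k in IDENTITY_KEYS):
            problems.append(f"line {lineno}: no task, suite, or episode identity")
    # Denominator check: every expected episode must be accounted for,
    # so failed or missing episodes cannot be dropped silently.
    if seen != expected_episodes:
        problems.append(
            f"denominator mismatch: {seen} records, {expected_episodes} expected"
        )
    return problems
```

Comparing the record count against a declared denominator is what makes a dropped episode visible: a file with 48 clean records still fails if the protocol declared 50 episodes.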
Report-Only Evidence
WorldFlux inventories these signals for review but does not treat them as comparable claim metrics by default:
- W&B, MLflow, tracker summaries, and aggregate dashboards;
- simulator, ROS/MCAP, Isaac, Cosmos, LeRobot/GR00T dataset signals;
- custom metrics without an explicit metric contract;
- model names or benchmark labels without episode-level outcomes;
- logs, screenshots, videos, and narrative notes.
Rejected For audit_input.json
WorldFlux writes the private report and refuses to emit audit_input.json when
the folder is ambiguous, partial, archive-only, CSV-only, numeric-score-only,
aggregate-only, missing episode-level success outcomes, missing denominator
policy, dominated by raw binaries, or likely to expose secrets.
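Three of these rejection conditions (archive-only, CSV-only, and missing episode-level outcomes) can be approximated from file names alone. The sketch below is a rough illustration under stated assumptions: `rejection_reasons`, the suffix sets, and the idea of deciding by file extension are hypothetical and are not taken from the WorldFlux importer, which would also need to inspect file contents.

```python
from pathlib import PurePosixPath

# Hypothetical suffix sets chosen for this example only.
ARCHIVE_SUFFIXES = {".zip", ".tar", ".gz", ".tgz"}
EPISODE_SUFFIXES = {".json", ".jsonl"}


def rejection_reasons(filenames):
    """Return the rejection reasons that file names alone can reveal."""
    suffixes = [PurePosixPath(name).suffix.lower() for name in filenames]
    reasons = []
    # Guard against the empty folder, where all() would be vacuously true.
    if suffixes and all(s in ARCHIVE_SUFFIXES for s in suffixes):
        reasons.append("archive-only")
    if suffixes and all(s == ".csv" for s in suffixes):
        reasons.append("CSV-only")
    # No JSON or JSONL file means no candidate for per-episode outcomes.
    if not any(s in EPISODE_SUFFIXES for s in suffixes):
        reasons.append("missing episode-level success outcomes")
    return reasons
```

An empty result here only means none of these three name-based conditions fired; the remaining conditions (ambiguity, denominator policy, secrets exposure) cannot be judged from names.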
VLA Benchmark Claims
For LIBERO, OpenPI, OpenVLA, GR00T, and related VLA benchmarks, evidence grade must be chosen before execution if WorldFlux is involved in the run. Imported results that lack pre-run model identity attestation, frozen episode manifests, attempt policy, or denominator policy must be labeled with the weaker Grade B/C/D wording from the VLA apple-to-apple definition.
Safe Customer Wording
Use:
- “WorldFlux packaged imported evaluation output.”
- “The package is signed and tamper-evident.”
- “The claim is limited to the recorded protocol and evidence scope.”