# Evidence Triage Rules

> What WorldFlux can and cannot turn into an evidence claim from imported run folders.

WorldFlux treats existing run folders as untrusted input. The `run-folder`
importer is read-only: it inventories evidence, writes a private operator report,
and emits `audit_input.json` only when the selected candidate is claim-safe.

## Accepted For Claim Packaging

WorldFlux can emit `audit_input.json` when one selected candidate has explicit
per-episode JSON or JSONL with:

* boolean success per episode;
* task, suite, or episode identity;
* a metric contract that matches the claim/protocol;
* denominator information that prevents silently dropping failed or missing
  episodes;
* no secrets, private credentials, signed URLs, or raw customer-only payloads in
  public fields.
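The record-level checks above can be sketched as a small validator. This is a minimal illustration, not the importer's actual logic: the field names (`success`, `task`, `suite`, `episode_id`) and the `expected_episodes` argument are assumptions, and the metric-contract and secret-scanning checks are left out.

```python
import json

# Hypothetical field names; the real WorldFlux schema may differ.
IDENTITY_KEYS = ("task", "suite", "episode_id")


def check_episodes(jsonl_text, expected_episodes):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    for i, rec in enumerate(records):
        # Success must be a real boolean, not 0/1, a score, or a string.
        if not isinstance(rec.get("success"), bool):
            problems.append(f"record {i}: success is not a boolean")
        # At least one identity field must be present and non-empty.
        if not any(rec.get(k) not in (None, "") for k in IDENTITY_KEYS):
            problems.append(f"record {i}: no task, suite, or episode identity")
    # Denominator check: every planned episode needs a record, so failed or
    # missing episodes cannot be dropped silently.
    if len(records) != expected_episodes:
        problems.append(
            f"denominator mismatch: {len(records)} records, {expected_episodes} expected"
        )
    return problems
```

In this sketch a candidate with any reported problem would stay report-only rather than produce `audit_input.json`.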

## Report-Only Evidence

WorldFlux inventories these signals for review but does not treat them as
comparable claim metrics by default:

* W&B, MLflow, tracker summaries, and aggregate dashboards;
* simulator, ROS/MCAP, Isaac, Cosmos, and LeRobot/GR00T dataset signals;
* custom metrics without an explicit metric contract;
* model names or benchmark labels without episode-level outcomes;
* logs, screenshots, videos, and narrative notes.

## Rejected For `audit_input.json`

WorldFlux writes the private report and refuses to emit `audit_input.json` when
the folder is ambiguous, partial, archive-only, CSV-only, numeric-score-only,
aggregate-only, missing episode-level success outcomes, missing denominator
policy, dominated by raw binaries, or likely to expose secrets.
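A few of these folder-level conditions can be approximated from file names alone. The sketch below is an assumption-heavy illustration: the suffix sets and the `rejection_reasons` helper are hypothetical, and it covers only the archive-only, CSV-only, and missing-episode-file cases. The presence of a JSON or JSONL file does not by itself show that episode-level outcomes exist.

```python
from pathlib import PurePosixPath

# Hypothetical suffix sets chosen for illustration only.
ARCHIVE_SUFFIXES = {".zip", ".tar", ".gz", ".tgz"}
EPISODE_SUFFIXES = {".json", ".jsonl"}


def rejection_reasons(file_names):
    """Return coarse rejection reasons inferred from file suffixes."""
    suffixes = [PurePosixPath(name).suffix.lower() for name in file_names]
    reasons = []
    if suffixes and all(s in ARCHIVE_SUFFIXES for s in suffixes):
        reasons.append("archive-only")
    if suffixes and all(s == ".csv" for s in suffixes):
        reasons.append("CSV-only")
    # Without any JSON or JSONL file there is nowhere for per-episode
    # success outcomes to live.
    if not any(s in EPISODE_SUFFIXES for s in suffixes):
        reasons.append("missing episode-level success outcomes")
    return reasons
```

Any non-empty result would mean the private report is still written but `audit_input.json` is withheld.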

## VLA Benchmark Claims

For LIBERO, OpenPI, OpenVLA, GR00T, and related VLA benchmarks, the evidence
grade must be chosen before execution whenever WorldFlux is involved in the run.
Imported results that lack any of pre-run model identity attestation, frozen
episode manifests, attempt policy, or denominator policy must be labeled with
the weaker Grade B/C/D wording from the VLA apple-to-apple definition.
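As a rough illustration of this rule, the check below lists which of the four pre-run items are absent from an imported result. The metadata keys are hypothetical, and the sketch deliberately does not pick between Grade B, C, or D, since that mapping lives in the VLA apple-to-apple definition.

```python
# Hypothetical metadata keys for the four pre-run items named above.
REQUIRED_PRE_RUN = (
    "model_identity_attestation",
    "frozen_episode_manifest",
    "attempt_policy",
    "denominator_policy",
)


def missing_pre_run_evidence(run_metadata):
    """Return the required pre-run items that are absent or empty."""
    return [key for key in REQUIRED_PRE_RUN if not run_metadata.get(key)]


def needs_weaker_grade(run_metadata):
    """True when at least one pre-run item is missing."""
    return bool(missing_pre_run_evidence(run_metadata))
```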

## Safe Customer Wording

Use:

* "WorldFlux packaged imported evaluation output."
* "The package is signed and tamper-evident."
* "The claim is limited to the recorded protocol and evidence scope."

Do not say "official benchmark score", "deployment-safe", "regulatory
certified", "fully Apple-to-Apple", "live provider runtime supported", or
"tamper-proof" unless separately proven.
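A simple lint over draft customer text can catch the phrases listed above before they ship. This is a sketch under stated assumptions: `flag_unsafe_wording` is a hypothetical helper, matching is case-insensitive substring matching only, and a flagged phrase may still be acceptable if it has been separately proven.

```python
# Phrases taken from the list above, lowercased for matching.
FORBIDDEN_PHRASES = (
    "official benchmark score",
    "deployment-safe",
    "regulatory certified",
    "fully apple-to-apple",
    "live provider runtime supported",
    "tamper-proof",
)


def flag_unsafe_wording(text):
    """Return the forbidden phrases found in text, ignoring case and line wraps."""
    # Collapse whitespace so a phrase split across lines still matches.
    normalized = " ".join(text.lower().split())
    return [phrase for phrase in FORBIDDEN_PHRASES if phrase in normalized]
```

For example, "tamper-evident" passes this check while "tamper-proof" is flagged.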
