Audit Quickstart

This guide starts from a LeRobot evaluation you have already run. WorldFlux does not rerun the model; it reads the saved eval_info.json and turns it into an evidence package. The same worldflux audit run command also accepts audit inputs from openpi, lbm_eval, vla-eval, gr00t-n1.7, pi-0.7, and embodied-gov-bench, as well as bridge-generated cosmos-predict inputs.

1. Prerequisites

You need:

an existing LeRobot install
an eval_info.json from a past LeRobot eval run
a local WorldFlux install

Example source file:

ls /tmp/lerobot_run_001/eval_info.json
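You can sanity-check the file before auditing by counting its episodes and successes. The sketch below assumes the layout that recent LeRobot eval runs write: a top-level per_episode list whose entries carry a boolean success field, plus an aggregated dict with pc_success as a percentage. Those key names are an assumption about LeRobot's output, not something WorldFlux specifies, so adjust them if your file differs.

```python
# Sanity-check a LeRobot eval_info.json before auditing it.
# ASSUMPTION: the file has a top-level "per_episode" list whose entries carry a
# boolean "success" field, and optionally an "aggregated" dict with "pc_success"
# (a percentage). Adjust the keys if your LeRobot version writes another layout.
from __future__ import annotations

import json
from pathlib import Path


def summarize_eval_info(path: str | Path) -> dict:
    data = json.loads(Path(path).read_text())
    episodes = data.get("per_episode", [])
    # Only count entries that actually report a success flag.
    flags = [bool(ep["success"]) for ep in episodes if "success" in ep]
    summary = {
        "episodes": len(episodes),
        "episodes_with_success_flag": len(flags),
        "successes": sum(flags),
        "success_rate": sum(flags) / len(flags) if flags else None,
    }
    aggregated = data.get("aggregated", {})
    if "pc_success" in aggregated:
        # Convert the percentage to a ratio, the unit used in the claim template.
        summary["reported_success_rate"] = aggregated["pc_success"] / 100.0
    return summary


source = Path("/tmp/lerobot_run_001/eval_info.json")
if source.exists():
    print(summarize_eval_info(source))
```

If episodes_with_success_flag is smaller than episodes, some episodes lack a success flag. Open the file and check it before you audit.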

2. Install WorldFlux

During beta, design partners install from a private checkout or private package index. Public PyPI installation will be documented after the public package release.

git clone <private-worldflux-checkout-url>
cd WorldFlux
uv sync --extra dev --extra cloud

For a package published to a private index:

uv pip install --index-url <private-index-url> worldflux
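To confirm the install, the sketch below looks up the installed distribution version and checks whether a worldflux executable is on PATH. It assumes the distribution is named worldflux, as in the uv pip install command above, and that it installs the worldflux console script used throughout this guide. With the checkout install, run it through uv run python so the project environment is active.

```python
# Check that WorldFlux is installed and its CLI is reachable.
# ASSUMPTION: the distribution is named "worldflux" and ships a "worldflux" console script.
from __future__ import annotations

import shutil
from importlib.metadata import PackageNotFoundError, version


def worldflux_status(dist: str = "worldflux", cli: str = "worldflux") -> dict:
    try:
        installed = version(dist)
    except PackageNotFoundError:
        installed = None
    return {
        "version": installed,           # None if the distribution is not installed
        "cli_path": shutil.which(cli),  # None if the executable is not on PATH
    }


print(worldflux_status())
```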

3. Create a claim package

The fastest path is the built-in OpenPI/LIBERO template:

worldflux claim create --template openpi-libero --output claim_pkg/

This writes claim_pkg/claim.json and claim_pkg/protocol.json. If you need a paper-derived draft instead, scaffold it locally:

worldflux claim from-paper-url https://arxiv.org/abs/... --output claim_pkg/

For a fully custom claim, copy this template and edit the ids, the subject and checkpoint, and the baseline values (expected_metrics and decision_rule) for your model:

{
  "schema_version": "worldflux.claim.v0_draft",
  "id": "clm_01HXY7K8ABCDEFGHJKMNPQRSTV",
  "slug": "smolvla-libero-spatial-success-rate",
  "claim_text": "SmolVLA reaches at least 80% success on LIBERO-Spatial.",
  "subject": {
    "type": "hf_model",
    "id": "nvidia/smolvla-arena-gr1-microwave",
    "checkpoint_uri": "hf://nvidia/smolvla-arena-gr1-microwave"
  },
  "capability": "robot manipulation",
  "applicability": {
    "benchmark": "LIBERO",
    "benchmark_version": "v1",
    "robot_embodiment": "Franka Panda",
    "simulator": "Isaac Lab",
    "camera_distribution": "default",
    "language_template": "standard",
    "license_assumptions": ["user-provided LeRobot output"]
  },
  "expected_metrics": [
    {"name": "success_rate", "operator": ">=", "value": 0.8, "unit": "ratio"}
  ],
  "decision_rule": {
    "null_hypothesis": "success_rate below baseline minus margin",
    "margin": 0.05,
    "alpha": 0.05,
    "interval_method": "wilson"
  },
  "source": {
    "type": "huggingface",
    "title": "SmolVLA LIBERO example",
    "url": "https://huggingface.co/nvidia/smolvla-arena-gr1-microwave",
    "section": "model card"
  },
  "curator": {
    "agent": "local-user",
    "created_at": "2026-05-06T00:00:00Z"
  },
  "status": "untested"
}

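If you take the manual path, save the edited template as claim_pkg/claim.json, which is the path the audit command in step 5 reads:

mkdir -p claim_pkg
$EDITOR claim_pkg/claim.json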
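A small pre-flight check can catch typos in a hand-edited claim before you run the audit. This is not WorldFlux's schema validation. The sketch treats every top-level field in the template as required and uses a hypothetical operator allowlist, and both of those choices are assumptions.

```python
# Pre-flight check for a hand-edited claim_pkg/claim.json (not the WorldFlux validator).
# ASSUMPTIONS: every top-level field in the template is required, and the operator
# allowlist below is hypothetical.
from __future__ import annotations

import json
from pathlib import Path

REQUIRED_TOP_LEVEL = ["schema_version", "id", "slug", "claim_text", "subject", "capability",
                      "applicability", "expected_metrics", "decision_rule", "source",
                      "curator", "status"]
ASSUMED_OPERATORS = {">=", ">", "<=", "<", "=="}  # hypothetical allowlist


def check_claim(path: str | Path) -> list[str]:
    claim = json.loads(Path(path).read_text())
    problems = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in claim]
    if not str(claim.get("id", "")).startswith("clm_"):
        problems.append("id should start with 'clm_' like the template")
    for i, metric in enumerate(claim.get("expected_metrics", [])):
        if metric.get("operator") not in ASSUMED_OPERATORS:
            problems.append(f"expected_metrics[{i}]: unexpected operator {metric.get('operator')!r}")
        value = metric.get("value")
        # A "ratio" threshold such as success_rate must lie in [0, 1], not be a percentage.
        if metric.get("unit") == "ratio" and not (isinstance(value, (int, float)) and 0 <= value <= 1):
            problems.append(f"expected_metrics[{i}]: ratio value {value!r} is outside [0, 1]")
    rule = claim.get("decision_rule", {})
    for key in ("margin", "alpha"):
        if key in rule and not 0 < rule[key] < 1:
            problems.append(f"decision_rule.{key} should be between 0 and 1")
    return problems


claim_path = Path("claim_pkg/claim.json")
if claim_path.exists():
    for problem in check_claim(claim_path) or ["no problems found"]:
        print(problem)
```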

4. Create protocol.json

Skip this step if you used worldflux claim create or worldflux claim from-paper-url; both commands already wrote protocol.json.

{
  "schema_version": "worldflux.protocol.v0_draft",
  "id": "prt_01HXY7K9ABCDEFGHJKMNPQRSTV",
  "claim_id": "clm_01HXY7K8ABCDEFGHJKMNPQRSTV",
  "claim_hash": "sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "benchmark": {
    "name": "LIBERO",
    "version": "v1",
    "task_list": ["libero-spatial"],
    "split": "libero-spatial"
  },
  "runtime": {
    "provider": "audit-import",
    "instance_class_min": "cpu",
    "max_runtime_seconds": 600
  },
  "execution": {
    "episodes_per_task": 50,
    "seeds": [0, 1, 2],
    "max_steps_per_episode": 500,
    "reset_policy": "seeded",
    "observation_space": ["rgb", "state"],
    "action_space": "continuous"
  },
  "evaluator": {
    "name": "lerobot",
    "version": "local",
    "implementation_ref": "lerobot.scripts.lerobot_eval"
  },
  "model_runtime": {
    "checkpoint_uri": "hf://nvidia/smolvla-arena-gr1-microwave"
  }
}

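Before saving, set claim_id to the id in your claim.json and replace the placeholder claim_hash. Then save the file next to the claim:

$EDITOR claim_pkg/protocol.json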
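The protocol is tied to its claim through claim_id, claim_hash, and the checkpoint. The sketch below cross-checks all three. It assumes claim_hash is "sha256:" followed by the SHA-256 of the claim file's raw bytes. WorldFlux may hash a canonicalized JSON form instead, so treat a hash mismatch as a prompt to double-check, not as proof that the claim is wrong.

```python
# Cross-check that protocol.json points at the claim it was written for.
# ASSUMPTION: claim_hash = "sha256:" + SHA-256 of the claim file's raw bytes;
# WorldFlux may canonicalize the JSON before hashing.
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def file_claim_hash(claim_path: str | Path) -> str:
    digest = hashlib.sha256(Path(claim_path).read_bytes()).hexdigest()
    return f"sha256:{digest}"


def check_binding(claim_path: str | Path, protocol_path: str | Path) -> list[str]:
    claim = json.loads(Path(claim_path).read_text())
    protocol = json.loads(Path(protocol_path).read_text())
    problems = []
    if protocol.get("claim_id") != claim.get("id"):
        problems.append("protocol.claim_id does not match claim.id")
    if protocol.get("claim_hash") != file_claim_hash(claim_path):
        problems.append("protocol.claim_hash differs from the raw-bytes hash (see assumption)")
    claim_ckpt = claim.get("subject", {}).get("checkpoint_uri")
    proto_ckpt = protocol.get("model_runtime", {}).get("checkpoint_uri")
    if claim_ckpt != proto_ckpt:
        problems.append("checkpoint_uri differs between claim.subject and protocol.model_runtime")
    return problems


if Path("claim_pkg/protocol.json").exists():
    print(check_binding("claim_pkg/claim.json", "claim_pkg/protocol.json"))
```

The all-"a" claim_hash in the template is a placeholder. The check will report it as a mismatch until you replace it with a real value.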

5. Run Audit

worldflux audit run lerobot \
  --from /tmp/lerobot_run_001/eval_info.json \
  --claim claim_pkg/claim.json \
  --protocol claim_pkg/protocol.json \
  --output evidence_pkg/

Optional evidence features include compliance mappings, CycloneDX ML-BOM sidecars, SAVI/PPI stopping metadata, and completeness scores for supported sources:

worldflux audit run embodied-gov-bench \
  --from /tmp/bench_output \
  --claim claim_pkg/claim.json \
  --protocol claim_pkg/protocol.json \
  --output evidence_pkg/ \
  --compliance eu-ai-act-annex8 \
  --mlbom \
  --emit-completeness-score
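To script the audit, for example in CI, you can build the same command lines from Python and pass them to subprocess. The sketch below uses only the flags shown in this guide. The helper name is hypothetical, and the optional flags are added only when requested, because not every source supports them.

```python
# Build and run `worldflux audit run` with the flags documented in this guide.
# The helper is hypothetical; it only assembles argv and never adds undocumented flags.
from __future__ import annotations

import shutil
import subprocess


def audit_run_argv(source: str, from_path: str, *, claim: str = "claim_pkg/claim.json",
                   protocol: str = "claim_pkg/protocol.json", output: str = "evidence_pkg/",
                   compliance: str | None = None, mlbom: bool = False,
                   completeness: bool = False) -> list[str]:
    argv = ["worldflux", "audit", "run", source, "--from", from_path,
            "--claim", claim, "--protocol", protocol, "--output", output]
    if compliance:
        argv += ["--compliance", compliance]
    if mlbom:
        argv.append("--mlbom")
    if completeness:
        argv.append("--emit-completeness-score")
    return argv


argv = audit_run_argv("lerobot", "/tmp/lerobot_run_001/eval_info.json")
if shutil.which("worldflux"):
    # check=True raises CalledProcessError if the audit exits non-zero.
    subprocess.run(argv, check=True)
else:
    print("worldflux not on PATH; would run:", " ".join(argv))
```

Passing argv as a list, not a shell string, avoids quoting problems with paths that contain spaces.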

6. Inspect

worldflux evidence inspect evidence_pkg/
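WorldFlux computes the verdict itself. To build intuition for how the claim's decision_rule (margin, alpha, and interval_method set to wilson) treats the counts you inspect, the sketch below computes a Wilson score interval by hand. It rests on several assumptions that WorldFlux does not confirm: that "baseline" means the expected_metrics threshold, that the test compares the lower bound against baseline minus margin, and that the interval is two-sided.

```python
# Illustrate the claim's decision_rule with a Wilson score interval.
# ASSUMPTIONS (not confirmed WorldFlux behavior):
#   - "baseline" is the expected_metrics threshold (0.8 in the template);
#   - the null hypothesis is rejected when the interval's lower bound is at
#     least baseline - margin;
#   - the interval is two-sided at confidence 1 - alpha.
from __future__ import annotations

import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp tiny floating-point overshoot at the 0 and 1 boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def rejects_null(successes: int, n: int, baseline: float = 0.8,
                 margin: float = 0.05, alpha: float = 0.05) -> bool:
    lower, _ = wilson_interval(successes, n, alpha)
    return lower >= baseline - margin


# Hypothetical counts, not real results: 126 successes in 150 episodes.
print(wilson_interval(126, 150), rejects_null(126, 150))
```

With these made-up counts, the observed rate is 0.84 and the lower bound lands a little above 0.77. That clears 0.8 - 0.05 = 0.75, so the null is rejected. With 100 successes out of 150, it would not be. Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] near the extremes.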

Compare two evidence packages:

worldflux evidence diff left/evidence.json right/evidence.json
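worldflux evidence diff is the supported comparison. For a quick look at which fields changed between two evidence.json files, for example in a notebook, a schema-agnostic recursive JSON diff is enough. The sketch assumes nothing about evidence.json beyond it being valid JSON, and the helper names are hypothetical.

```python
# Schema-agnostic recursive diff of two JSON documents (a complement to
# `worldflux evidence diff`, not a replacement). Helper names are hypothetical.
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterator


def json_diff(left: Any, right: Any, path: str = "$") -> Iterator[tuple[str, Any, Any]]:
    """Yield (json_path, left_value, right_value) for every differing leaf."""
    if isinstance(left, dict) and isinstance(right, dict):
        for key in sorted(set(left) | set(right)):
            # A key missing on one side shows up as None, so it looks like an explicit null.
            yield from json_diff(left.get(key), right.get(key), f"{path}.{key}")
    elif isinstance(left, list) and isinstance(right, list) and len(left) == len(right):
        for i, (a, b) in enumerate(zip(left, right)):
            yield from json_diff(a, b, f"{path}[{i}]")
    elif left != right:
        # Scalars that differ, type changes, or lists of different lengths.
        yield path, left, right


def diff_files(left_path: str, right_path: str) -> list[tuple[str, Any, Any]]:
    left = json.loads(Path(left_path).read_text())
    right = json.loads(Path(right_path).read_text())
    return list(json_diff(left, right))


if Path("left/evidence.json").exists() and Path("right/evidence.json").exists():
    for p, a, b in diff_files("left/evidence.json", "right/evidence.json"):
        print(f"{p}: {a!r} -> {b!r}")
```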

7. Publish

worldflux audit sign evidence_pkg/
worldflux audit verify evidence_pkg/
worldflux audit publish evidence_pkg/ \
  --share \
  --cloud-run-id <cloud-run-uuid> \
  --confirm-public-share-upload \
  --approval-file public_share_approval.json \
  --password-env WORLDFLUX_SHARE_ACCESS_CODE
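When you script publication, the upload source has to be chosen explicitly. You either pass a Cloud run id confirmed for public upload, or you pass an evidence package artifact that is already in Cloud. The sketch below enforces that choice while building the command, using only flags that appear in this guide. It assumes --sigstore-policy-config and --trusted-signer-config each take a file path, and the helper itself is hypothetical.

```python
# Build a `worldflux audit publish --share` command with exactly one upload source.
# ASSUMPTIONS: --sigstore-policy-config and --trusted-signer-config take a path;
# the helper is hypothetical and only assembles argv.
from __future__ import annotations


def publish_argv(pkg: str, *, cloud_run_id: str | None = None,
                 artifact_id: str | None = None,
                 approval_file: str | None = None,
                 password_env: str | None = None,
                 sigstore_policy_config: str | None = None,
                 trusted_signer_config: str | None = None) -> list[str]:
    if (cloud_run_id is None) == (artifact_id is None):
        raise ValueError("pass exactly one of cloud_run_id or artifact_id")
    argv = ["worldflux", "audit", "publish", pkg, "--share"]
    if cloud_run_id is not None:
        # Uploading package bytes for a Cloud run requires the explicit confirmation flag.
        argv += ["--cloud-run-id", cloud_run_id, "--confirm-public-share-upload"]
    else:
        argv += ["--evidence-package-artifact-id", artifact_id]
    if approval_file:
        argv += ["--approval-file", approval_file]
    if password_env:
        argv += ["--password-env", password_env]
    if sigstore_policy_config:
        argv += ["--sigstore-policy-config", sigstore_policy_config]
    if trusted_signer_config:
        argv += ["--trusted-signer-config", trusted_signer_config]
    return argv


print(" ".join(publish_argv("evidence_pkg/", cloud_run_id="<cloud-run-uuid>",
                            approval_file="public_share_approval.json",
                            password_env="WORLDFLUX_SHARE_ACCESS_CODE")))
```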

Publishing a share requires:

Cloud login
a signed and verified evidence package
exactly one upload source: either --cloud-run-id <cloud-run-uuid> together with --confirm-public-share-upload, so WorldFlux can upload the customer-approved, sanitized package bytes, or --evidence-package-artifact-id <artifact-uuid> for an evidence package artifact already uploaded to Cloud

Production Sigstore publication also requires --sigstore-policy-config. Self-signed publication requires --trusted-signer-config so that Cloud can bind the share to an explicitly trusted signer.

For messy customer run folders that are already extracted, start with a private, read-only import report. The --dry-run pass only writes the reports. After reviewing them, rerun with --select-run <candidate_id> to write the normalized audit_input.json for the chosen candidate:

worldflux audit import run-folder --from /tmp/customer_run --dry-run --report import_report.json --customer-report import_report_public.md
worldflux audit import run-folder --from /tmp/customer_run --select-run <candidate_id> --output audit_input.json --report import_report.json --customer-report import_report_public.md

8. Share

cat evidence_pkg/share_url.txt

Send that URL to reviewers.

The local evidence package contains:

claim.json
protocol.json
evidence.json
evidence.md
audit_input.json: the normalized input WorldFlux audited.
audit_provenance.json
episode_results.jsonl: per-episode summaries plus bounded metadata supplied by adapters.
raw_evidence_manifest.json: safe references to raw/source evidence and redacted export hashes. It does not automatically copy raw videos, traces, provider responses, or model outputs.
failure_evidence_index.jsonl: a searchable seed for later failure-graph ingestion, not a graph database.
failure_replay_manifest.jsonl: replay hints such as task, seed, and input hashes when available. It is not a replay runner.
artifact_manifest.json

Hosted public shares are narrower than the local package. Cloud accepts only these signed package members: claim.json, protocol.json, evidence.json, evidence.md, audit_input.json, audit_provenance.json, artifact_manifest.json, and the signature files required by the signing backend.

The reviewer URL exposes a public-safe DTO assembled from four sources:

computed fields: expiry and password status
the approval record: reviewer label, audience, approver, revocation owner, and retention policy
the verified package: display id, claim, protocol, result, scope, recommendation, verification status, package-derived deployment summary, missing-evidence labels, and next falsification axes
artifacts: sanitized display name, type, size bucket, and state

It does not publish raw logs, raw videos, checkpoints, provider responses, local paths, signed URLs, workspace controls, or private object keys. It also hides the raw client_run_id, raw recipe/runtime, raw artifact paths, workspace/project/user ids, API key ids, and token hashes.

Reliability metadata is separate from publishing. It is opt-in only. When enabled, it stores allowlisted labels derived from the verified public-safe summary, not customer model weights, raw folders, logs, or videos.
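You can turn the member lists above into a quick local check. The sketch reports which expected members are missing from evidence_pkg/, which present files stay local and are never accepted for a hosted share, and which files fall outside both lists. Signature file names depend on the signing backend and are not given in this guide, so they appear under other_files. The helper name is hypothetical.

```python
# Compare a local evidence package with the member lists in this guide.
# Signature files are backend-specific and therefore land in "other_files".
from __future__ import annotations

from pathlib import Path

LOCAL_MEMBERS = {
    "claim.json", "protocol.json", "evidence.json", "evidence.md",
    "audit_input.json", "audit_provenance.json", "episode_results.jsonl",
    "raw_evidence_manifest.json", "failure_evidence_index.jsonl",
    "failure_replay_manifest.jsonl", "artifact_manifest.json",
}
# Members Cloud accepts for a hosted share, besides the signature files.
SHARE_MEMBERS = {
    "claim.json", "protocol.json", "evidence.json", "evidence.md",
    "audit_input.json", "audit_provenance.json", "artifact_manifest.json",
}


def package_report(pkg_dir: str | Path) -> dict[str, list[str]]:
    present = {p.name for p in Path(pkg_dir).iterdir() if p.is_file()}
    return {
        "missing": sorted(LOCAL_MEMBERS - present),
        "local_only_present": sorted((LOCAL_MEMBERS - SHARE_MEMBERS) & present),
        "other_files": sorted(present - LOCAL_MEMBERS),  # e.g. signatures, share_url.txt
    }


if Path("evidence_pkg").is_dir():
    print(package_report("evidence_pkg"))
```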
