# Audit Quickstart

> Create a WorldFlux audit evidence package from an existing LeRobot eval_info.json.

This guide starts from a LeRobot evaluation you have already run. WorldFlux does not rerun the model; it reads the saved `eval_info.json` and turns it into an evidence package. The same audit command tree also accepts `openpi`, `lbm_eval`, `vla-eval`, `gr00t-n1.7`, `pi-0.7`, `embodied-gov-bench`, and bridge-generated `cosmos-predict` audit inputs.

## 1. Prerequisites

You need:

* an existing LeRobot install
* an `eval_info.json` from a past LeRobot eval run
* a local WorldFlux install

Example source file:

```bash theme={null}
ls /tmp/lerobot_run_001/eval_info.json
```
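
Before going further, it can help to confirm that the file exists and parses as JSON. The following is a minimal sketch: `check_eval_info` is a hypothetical helper, not part of WorldFlux or LeRobot, and it assumes only that the file holds a JSON object at the top level.

```python
import json
from pathlib import Path


def check_eval_info(path):
    """Return the sorted top-level keys of an eval_info.json file.

    Hypothetical helper: it only verifies that the file exists and holds a
    JSON object. It does not validate the LeRobot schema.
    """
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"no eval_info.json at {file}")
    data = json.loads(file.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data)
```

Calling `check_eval_info("/tmp/lerobot_run_001/eval_info.json")` lists the top-level keys, which is a quick way to see what the run recorded before you audit it.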

## 2. Install WorldFlux

During beta, design partners install from a private checkout or private package
index. Public PyPI installation will be documented after the public package
release.

```bash theme={null}
git clone <private-worldflux-checkout-url>
cd WorldFlux
uv sync --extra dev --extra cloud
```

For a package published to a private index:

```bash theme={null}
uv pip install --index-url <private-index-url> worldflux
```

## 3. Create a claim package

The fastest path is the built-in OpenPI/LIBERO template:

```bash theme={null}
worldflux claim create --template openpi-libero --output claim_pkg/
```

This writes `claim_pkg/claim.json` and `claim_pkg/protocol.json`.

If you need a paper-derived draft instead, scaffold it locally:

```bash theme={null}
worldflux claim from-paper-url https://arxiv.org/abs/... --output claim_pkg/
```

For a fully custom claim, copy this template and edit the ids, checkpoint, and baseline for your model:

```json theme={null}
{
  "schema_version": "worldflux.claim.v0_draft",
  "id": "clm_01HXY7K8ABCDEFGHJKMNPQRSTV",
  "slug": "smolvla-libero-spatial-success-rate",
  "claim_text": "SmolVLA reaches at least 80% success on LIBERO-Spatial.",
  "subject": {
    "type": "hf_model",
    "id": "nvidia/smolvla-arena-gr1-microwave",
    "checkpoint_uri": "hf://nvidia/smolvla-arena-gr1-microwave"
  },
  "capability": "robot manipulation",
  "applicability": {
    "benchmark": "LIBERO",
    "benchmark_version": "v1",
    "robot_embodiment": "Franka Panda",
    "simulator": "Isaac Lab",
    "camera_distribution": "default",
    "language_template": "standard",
    "license_assumptions": ["user-provided LeRobot output"]
  },
  "expected_metrics": [
    {"name": "success_rate", "operator": ">=", "value": 0.8, "unit": "ratio"}
  ],
  "decision_rule": {
    "null_hypothesis": "success_rate below baseline minus margin",
    "margin": 0.05,
    "alpha": 0.05,
    "interval_method": "wilson"
  },
  "source": {
    "type": "huggingface",
    "title": "SmolVLA LIBERO example",
    "url": "https://huggingface.co/nvidia/smolvla-arena-gr1-microwave",
    "section": "model card"
  },
  "curator": {
    "agent": "local-user",
    "created_at": "2026-05-06T00:00:00Z"
  },
  "status": "untested"
}
```

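If you take the manual path, save it as `claim_pkg/claim.json`, the path the audit command in step 5 reads:

```bash theme={null}
mkdir -p claim_pkg
$EDITOR claim_pkg/claim.json
```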
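
The `decision_rule` in the template names a Wilson interval with `alpha` 0.05 and `margin` 0.05. This guide does not spell out how WorldFlux evaluates the rule, so the sketch below shows the standard Wilson score interval plus one plausible reading of the rule: pass when the lower bound clears the claimed value minus the margin. Both `wilson_interval` and `passes` are hypothetical helpers, and the pass condition is an assumption, not the WorldFlux implementation.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, trials, alpha=0.05):
    """Two-sided Wilson score interval for a binomial success rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    # Normal quantile for a two-sided interval at level alpha.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials**2)) / denom
    return center - half, center + half


def passes(successes, trials, threshold=0.8, margin=0.05, alpha=0.05):
    """Assumed reading: pass when the lower bound clears threshold - margin."""
    lower, _ = wilson_interval(successes, trials, alpha)
    return lower >= threshold - margin
```

For a hypothetical run with 130 successes in 150 episodes, the lower bound is roughly 0.80, so this reading of the rule would pass against a cutoff of 0.75.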

## 4. Create `protocol.json`

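Skip this step if you used `worldflux claim create` or `worldflux claim from-paper-url`; both commands already wrote `protocol.json`. On the manual path, start from this template, in which `claim_id` refers to the `id` in your `claim.json`: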

```json theme={null}
{
  "schema_version": "worldflux.protocol.v0_draft",
  "id": "prt_01HXY7K9ABCDEFGHJKMNPQRSTV",
  "claim_id": "clm_01HXY7K8ABCDEFGHJKMNPQRSTV",
  "claim_hash": "sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "benchmark": {
    "name": "LIBERO",
    "version": "v1",
    "task_list": ["libero-spatial"],
    "split": "libero-spatial"
  },
  "runtime": {
    "provider": "audit-import",
    "instance_class_min": "cpu",
    "max_runtime_seconds": 600
  },
  "execution": {
    "episodes_per_task": 50,
    "seeds": [0, 1, 2],
    "max_steps_per_episode": 500,
    "reset_policy": "seeded",
    "observation_space": ["rgb", "state"],
    "action_space": "continuous"
  },
  "evaluator": {
    "name": "lerobot",
    "version": "local",
    "implementation_ref": "lerobot.scripts.lerobot_eval"
  },
  "model_runtime": {
    "checkpoint_uri": "hf://nvidia/smolvla-arena-gr1-microwave"
  }
}
```

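Save it as `claim_pkg/protocol.json`, the path the audit command in step 5 reads:

```bash theme={null}
mkdir -p claim_pkg
$EDITOR claim_pkg/protocol.json
```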
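
In the two templates, the protocol repeats a few values from the claim: `claim_id`, the checkpoint URI, and the benchmark name. A small cross-check can catch copy-and-edit slips before you run the audit. This is a sketch under the assumption that these fields are meant to agree; `cross_check` is a hypothetical helper, and WorldFlux's own validation may check more, or differently.

```python
import json
from pathlib import Path


def cross_check(claim, protocol):
    """Return a list of mismatches between a claim dict and a protocol dict.

    Assumption: the two files are meant to agree on these fields.
    """
    problems = []
    if protocol.get("claim_id") != claim.get("id"):
        problems.append("protocol.claim_id does not match claim.id")
    claim_uri = claim.get("subject", {}).get("checkpoint_uri")
    protocol_uri = protocol.get("model_runtime", {}).get("checkpoint_uri")
    if claim_uri != protocol_uri:
        problems.append("checkpoint_uri differs between claim and protocol")
    claim_bench = claim.get("applicability", {}).get("benchmark")
    protocol_bench = protocol.get("benchmark", {}).get("name")
    if claim_bench != protocol_bench:
        problems.append("benchmark name differs between claim and protocol")
    return problems


def cross_check_files(claim_path, protocol_path):
    """Load both JSON files and run cross_check on them."""
    claim = json.loads(Path(claim_path).read_text(encoding="utf-8"))
    protocol = json.loads(Path(protocol_path).read_text(encoding="utf-8"))
    return cross_check(claim, protocol)
```

An empty list from `cross_check_files("claim_pkg/claim.json", "claim_pkg/protocol.json")` means the three fields line up.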

## 5. Run Audit

```bash theme={null}
worldflux audit run lerobot \
  --from /tmp/lerobot_run_001/eval_info.json \
  --claim claim_pkg/claim.json \
  --protocol claim_pkg/protocol.json \
  --output evidence_pkg/
```

Optional evidence features include compliance mappings, CycloneDX ML-BOM sidecars, SAVI/PPI stopping metadata, and completeness scores for supported sources:

```bash theme={null}
worldflux audit run embodied-gov-bench \
  --from /tmp/bench_output \
  --claim claim_pkg/claim.json \
  --protocol claim_pkg/protocol.json \
  --output evidence_pkg/ \
  --compliance eu-ai-act-annex8 \
  --mlbom \
  --emit-completeness-score
```

## 6. Inspect

```bash theme={null}
worldflux evidence inspect evidence_pkg/
```

Compare two evidence packages:

```bash theme={null}
worldflux evidence diff left/evidence.json right/evidence.json
```

## 7. Publish

```bash theme={null}
worldflux audit sign evidence_pkg/
worldflux audit verify evidence_pkg/
worldflux audit publish evidence_pkg/ \
  --share \
  --cloud-run-id <cloud-run-uuid> \
  --confirm-public-share-upload \
  --approval-file public_share_approval.json \
  --password-env WORLDFLUX_SHARE_ACCESS_CODE
```

Publishing a share requires Cloud login, a signed and verified evidence package,
and either `--cloud-run-id <cloud-run-uuid>` with
`--confirm-public-share-upload` so WorldFlux can upload customer-approved
sanitized package bytes, or `--evidence-package-artifact-id <artifact-uuid>` for
an already uploaded Cloud evidence package artifact.
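
The package-source rule above can be written down as a small preflight check. This sketch only mirrors the wording of the paragraph; `share_source_errors` is a hypothetical helper whose parameters are named after the flags, and the CLI performs its own validation, which may treat some combinations differently.

```python
def share_source_errors(cloud_run_id=None, confirm_upload=False, artifact_id=None):
    """Return a list of problems with the package source for a share.

    Mirrors the documented rule only: either an already uploaded artifact id,
    or a cloud run id together with the upload confirmation.
    """
    if artifact_id:
        return []
    if cloud_run_id and confirm_upload:
        return []
    if cloud_run_id:
        return ["--cloud-run-id also needs --confirm-public-share-upload"]
    return [
        "pass --cloud-run-id with --confirm-public-share-upload, "
        "or --evidence-package-artifact-id"
    ]
```

An empty list means the invocation names a package source in one of the two documented ways; it says nothing about login, signing, or verification status.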

Production Sigstore publication also requires `--sigstore-policy-config`.
Self-sign publication requires `--trusted-signer-config` so Cloud can bind the
share to an explicitly trusted signer.

For messy, already-extracted customer run folders, start with a private,
read-only import report. The first command is a dry run that writes the
reports; the second selects one candidate run and writes `audit_input.json`:

```bash theme={null}
worldflux audit import run-folder \
  --from /tmp/customer_run \
  --dry-run \
  --report import_report.json \
  --customer-report import_report_public.md
worldflux audit import run-folder \
  --from /tmp/customer_run \
  --select-run <candidate_id> \
  --output audit_input.json \
  --report import_report.json \
  --customer-report import_report_public.md
```

## 8. Share

```bash theme={null}
cat evidence_pkg/share_url.txt
```

Send that URL to reviewers. The evidence package contains `claim.json`, `protocol.json`, `evidence.json`, `evidence.md`, `audit_input.json`, `audit_provenance.json`, `episode_results.jsonl`, `raw_evidence_manifest.json`, `failure_evidence_index.jsonl`, `failure_replay_manifest.jsonl`, and `artifact_manifest.json`.

`audit_input.json` is the normalized input WorldFlux audited. `episode_results.jsonl` stores per-episode summaries plus bounded metadata that adapters supplied. `raw_evidence_manifest.json` stores safe references to raw/source evidence and redacted export hashes; it does not automatically copy raw videos, traces, provider responses, or model outputs. `failure_evidence_index.jsonl` is a searchable seed for later failure graph ingestion, not a graph database. `failure_replay_manifest.jsonl` records replay hints such as task, seed, and input hashes when available, but it is not a replay runner.
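
To confirm that a local package contains the members listed above, a short check like the following may help. It is a sketch that assumes every member sits at the top level of the package directory; `missing_members` is a hypothetical helper, not a WorldFlux command.

```python
from pathlib import Path

# Member names as listed in this guide.
EXPECTED_MEMBERS = [
    "claim.json",
    "protocol.json",
    "evidence.json",
    "evidence.md",
    "audit_input.json",
    "audit_provenance.json",
    "episode_results.jsonl",
    "raw_evidence_manifest.json",
    "failure_evidence_index.jsonl",
    "failure_replay_manifest.jsonl",
    "artifact_manifest.json",
]


def missing_members(package_dir):
    """Return the expected member names not found in package_dir."""
    root = Path(package_dir)
    return [name for name in EXPECTED_MEMBERS if not (root / name).is_file()]
```

`missing_members("evidence_pkg/")` returns an empty list when all eleven files are present.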

Hosted public shares are narrower than the local package. Cloud accepts only these signed package members, plus the signature files required by the signing backend:

* `claim.json`
* `protocol.json`
* `evidence.json`
* `evidence.md`
* `audit_input.json`
* `audit_provenance.json`
* `artifact_manifest.json`

The reviewer URL exposes a public-safe DTO containing:

* computed fields: expiry and password status
* fields from the approval record: reviewer label, audience, approver, revocation owner, and retention policy
* fields from the verified package: display id, claim/protocol/result/scope/recommendation, verification status, package-derived deployment summary, missing evidence labels, and next falsification axes
* sanitized artifact metadata: display name, type, size bucket, and state

It does not publish raw logs, raw videos, checkpoints, provider responses, local paths, signed URLs, workspace controls, or private object keys. It also hides raw `client_run_id`, raw recipe/runtime, raw artifact paths, workspace/project/user ids, API key ids, and token hashes.
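
To preview which local files fall inside the hosted allowlist, the member names above can be used as a filter. This is an illustrative sketch only: `split_for_share` is a hypothetical helper, and because signature file names depend on the signing backend and are not listed here, the sketch reports them as local-only even though Cloud accepts them.

```python
# Signed package members that Cloud accepts, as listed in this guide.
SHAREABLE_MEMBERS = frozenset({
    "claim.json",
    "protocol.json",
    "evidence.json",
    "evidence.md",
    "audit_input.json",
    "audit_provenance.json",
    "artifact_manifest.json",
})


def split_for_share(names):
    """Split file names into (shareable, local_only), each sorted."""
    shareable = sorted(n for n in names if n in SHAREABLE_MEMBERS)
    local_only = sorted(n for n in names if n not in SHAREABLE_MEMBERS)
    return shareable, local_only
```

For example, `episode_results.jsonl` and `raw_evidence_manifest.json` land in the local-only list, which matches the description of hosted shares as narrower than the local package.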

Reliability metadata is separate from publishing. It is opt-in only, and when enabled it stores allowlisted labels derived from the verified public-safe summary, not customer model weights, raw folders, logs, or videos.
