Boundary
WorldFlux separates two objects:
- A Protocol Plan is pre-score: it is fixed before any results are scored. It freezes the robot/policy profile, the candidate probe registry digest, the selected probes, tasks, and axes, the episode indexes and seeds, the denominator policy, and the missing-evidence questions.
- An Eval Packet exists only after normalized AuditInput evidence is attached and checked against the frozen plan. Packet generation validates plan/evidence conformance, including matched cells, missing cells, out-of-protocol episodes, digest mismatches, and metadata mismatches.
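The plan/evidence split can be illustrated with two pieces: a frozen plan that carries a content digest, and a conformance pass that sorts each evidence record into one of the categories listed above. This is a minimal sketch under stated assumptions. The cell key (task, axis, episode index, seed), the class and function names, and the use of SHA-256 over canonical JSON are hypothetical choices. They are not WorldFlux's schema or algorithm.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical cell key: (task, axis, episode_index, seed).
Cell = tuple[str, str, int, int]


def canonical_digest(obj) -> str:
    """SHA-256 of canonical JSON (sorted keys, no extra whitespace)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProtocolPlan:
    """Hypothetical pre-score plan: fixed before any evidence is attached."""
    profile: str          # robot/policy profile label
    cells: frozenset      # frozenset of Cell tuples selected up front

    def digest(self) -> str:
        # Sorting makes the digest independent of set iteration order.
        return canonical_digest({"profile": self.profile, "cells": sorted(self.cells)})


@dataclass
class EvidenceRecord:
    """One normalized evidence row, tagged with the plan it claims to follow."""
    cell: Cell
    plan_digest: str
    profile: str


@dataclass
class ConformanceReport:
    matched: list = field(default_factory=list)
    missing: list = field(default_factory=list)
    out_of_protocol: list = field(default_factory=list)
    digest_mismatches: list = field(default_factory=list)
    metadata_mismatches: list = field(default_factory=list)


def check_conformance(plan: ProtocolPlan, evidence: list) -> ConformanceReport:
    """Classify each record against the frozen plan (hypothetical helper)."""
    report = ConformanceReport()
    expected = plan.digest()
    covered = set()
    for rec in evidence:
        if rec.plan_digest != expected:
            report.digest_mismatches.append(rec.cell)
        elif rec.profile != plan.profile:
            report.metadata_mismatches.append(rec.cell)
        elif rec.cell not in plan.cells:
            report.out_of_protocol.append(rec.cell)
        else:
            report.matched.append(rec.cell)
            covered.add(rec.cell)
    # A planned cell counts as covered only by a fully conforming record,
    # so a cell whose only evidence had a digest mismatch is also "missing".
    report.missing = sorted(plan.cells - covered)
    return report
```

Consider a plan with two cells and evidence covering one planned cell plus one unplanned cell. Running check_conformance on it yields one matched, one missing, and one out-of-protocol entry. The sketch checks the digest first, so evidence produced against a different plan is never silently matched.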
Distribution is LOCAL_PRIVATE by default. The MVP does not run benchmarks, upload data, host models, proxy credentials, or decide whether a robot can be deployed. The intended distribution is internal review/prep plus local-private reviewer handoff only.
Private reviewer briefs are not public-share-ready. They keep redaction/consent warnings, unsupported claims, missing evidence, and reviewer questions visible. Public sharing, third-party publication, or endorsement-style use requires a separate path for review, redaction, customer consent, signing, and verification.
Flow
Eval Packets are generated from normalized AuditInput files, including outputs from worldflux audit import run-folder. By default, missing or underpowered evidence stays in the packet as a reviewer-visible warning. Add --strict-missing-evidence to make packet generation fail instead.
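The --strict-missing-evidence switch decides whether coverage gaps stay visible as warnings or stop packet generation. The following is a minimal sketch of that gating decision under stated assumptions. The per-cell coverage record, the MissingEvidenceError name, and the min_fraction threshold for "underpowered" are all hypothetical. The real denominator policy comes from the frozen Protocol Plan, and this code does not model the CLI itself.

```python
from dataclasses import dataclass


class MissingEvidenceError(RuntimeError):
    """Hypothetical error raised in strict mode instead of emitting warnings."""


@dataclass
class CellCoverage:
    cell: str       # hypothetical cell label, e.g. "pick/lighting"
    planned: int    # episodes the frozen plan expects for this cell
    observed: int   # episodes present in the attached evidence


def gate_missing_evidence(coverage, min_fraction=1.0, strict=False):
    """Return reviewer-visible warnings, or raise if strict and any exist.

    min_fraction is an assumed threshold: a cell with fewer than
    min_fraction * planned observed episodes is treated as underpowered.
    """
    warnings = []
    for c in coverage:
        if c.observed == 0:
            warnings.append(f"missing: {c.cell} (0/{c.planned} episodes)")
        elif c.observed < min_fraction * c.planned:
            warnings.append(f"underpowered: {c.cell} ({c.observed}/{c.planned} episodes)")
    if strict and warnings:
        # Strict mode turns every warning into a hard failure.
        raise MissingEvidenceError("; ".join(warnings))
    return warnings
```

In this sketch, both modes compute the same warning list. Strict mode differs only in whether that list is returned or raised, so a reviewer never sees a different diagnosis depending on the flag.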
Custom eval contracts
Use --custom-eval-contract when the selected probe is customer-owned rather than drawn from a public benchmark family. The contract is a private review input and must include:
- a task manifest reference and digest
- a metric schema reference
- inclusion and exclusion rules
- replay or audit metadata keys
- a reviewer-owned task source
- a customer consent marker
- the customer use case, acceptance question, workflow claim, and task owner
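The required contents above can be checked mechanically before a contract is used. The following is a minimal sketch under stated assumptions. It assumes the contract has been loaded as a Python dict and uses hypothetical key names. The source lists what a contract must contain but not its file format or schema, and WorldFlux's own validation may differ.

```python
# Hypothetical key names mapped to the required items listed above.
REQUIRED_FIELDS = {
    "task_manifest_ref": "task manifest reference",
    "task_manifest_digest": "task manifest digest",
    "metric_schema_ref": "metric schema reference",
    "inclusion_rules": "inclusion rules",
    "exclusion_rules": "exclusion rules",
    "replay_metadata_keys": "replay or audit metadata keys",
    "task_source": "reviewer-owned task source",
    "customer_consent": "customer consent marker",
    "use_case": "customer use case",
    "acceptance_question": "acceptance question",
    "workflow_claim": "workflow claim",
    "task_owner": "task owner",
}


def _is_empty(value) -> bool:
    """Treat None and empty strings, lists, or dicts as absent."""
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def validate_contract(contract: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing is missing."""
    problems = []
    for key, label in REQUIRED_FIELDS.items():
        if _is_empty(contract.get(key)):
            problems.append(f"missing {label}")
    # Assumption: consent must be an explicit affirmative marker, not just present.
    if contract.get("customer_consent") not in (None, True):
        problems.append("customer consent marker is not affirmative")
    return problems
```

The function returns every problem rather than stopping at the first one. This keeps gaps reviewer-visible, matching how the rest of the flow treats missing evidence.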