A catalogued run knows what it is (cryo-EM, crystallography, or SAXS), and the agent uses
that to dispatch the right processing pipeline without being told. It reads the run's
technique field, locates the inputs by their pointers, runs the appropriate
tools, and writes each step back as a typed WorkflowRun. Phenix is one
engine in the rack, not the whole rack.
The idea
Each technique has a well-trodden processing path, but they share almost no tools. A
cryo-EM movie needs motion correction and 3D classification; a diffraction dataset needs
indexing and scaling; a scattering curve needs buffer subtraction and a Guinier fit. The
agent reads the typed metadata and sends each run down the correct lane — and because the
WorkflowTypeEnum names these steps, the plan is legible before anything runs.
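A minimal sketch of that routing, assuming each run record exposes a technique string. The technique keys and step names below paraphrase the prose above and are placeholders, not confirmed WorkflowTypeEnum values; plan_for and route are hypothetical helpers.

```python
from typing import Dict, List

# Hypothetical lanes: keys and step names paraphrase the text and are
# NOT confirmed schema values.
LANES: Dict[str, List[str]] = {
    "cryo_em": ["motion_correction", "classification_3d"],
    "xray_crystallography": ["indexing", "scaling"],
    "saxs": ["buffer_subtraction", "guinier_fit"],
}


def plan_for(technique: str) -> List[str]:
    """Return the ordered step names for a technique, before anything runs."""
    try:
        return list(LANES[technique])
    except KeyError:
        raise ValueError(f"no lane registered for technique {technique!r}") from None


def route(runs: List[dict]) -> Dict[str, List[str]]:
    """Map each run id to its plan, using only the run's technique field."""
    return {run["id"]: plan_for(run["technique"]) for run in runs}
```

Because the plan is plain data, it can be printed or reviewed before any tool is launched, and an unknown technique fails loudly instead of falling into a default lane.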
Technique-routed lanes
Every step maps to a workflow_type from the schema, so the pipeline is just
a sequence of typed WorkflowRun rows — auditable, resumable, and the same
shape regardless of which external tool did the work.
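One way to picture "resumable" is the sketch below, which assumes a pipeline is held as an ordered list of rows and that any status other than completed means the step still needs work. Only the completed status value appears in this document; the others used here are assumptions, and next_pending is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkflowRun:
    workflow_type: str
    software_name: str
    processing_status: str  # "completed" is from the record; other values are assumed


def next_pending(pipeline: List[WorkflowRun]) -> Optional[int]:
    """Index of the first step that has not completed, or None if all are done."""
    for i, run in enumerate(pipeline):
        if run.processing_status != "completed":
            return i
    return None
```

The check never looks at software_name, which is the point: the resume logic is the same whichever external tool did the work.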
How the router decides
The record it leaves
Whatever tool ran, the result is the same kind of row — capturing the type, the software, the version, and the status. That uniformity is what lets the re-analysis use case (06) later find and re-run exactly the right steps.
workflow_runs:
  - id: lambdaber:wf_ncp_refine_001
    workflow_code: WF-NCP-REFINE-001
    workflow_type: model_refinement  # from WorkflowTypeEnum
    software_name: Phenix
    software_version: 1.21.1-5286  # pinned; matters for use case 06
    processing_status: completed
workflow_input_associations:
  - workflow_id: lambdaber:wf_ncp_refine_001
    data_file_id: lambdaber:df_ncp_mtz  # resolved via storage_uri at run time
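A rough sketch of how such a record could be assembled in code, assuming one association row per input file. The field names come from the record above; record_run itself is a hypothetical helper, not part of the schema or of any tool named here.

```python
from typing import Dict, List


def record_run(
    run_id: str,
    workflow_code: str,
    workflow_type: str,
    software_name: str,
    software_version: str,
    input_file_ids: List[str],
    processing_status: str = "completed",
) -> Dict[str, List[dict]]:
    """Build one WorkflowRun row plus one association row per input file."""
    run = {
        "id": run_id,
        "workflow_code": workflow_code,
        "workflow_type": workflow_type,
        "software_name": software_name,
        "software_version": software_version,
        "processing_status": processing_status,
    }
    associations = [
        {"workflow_id": run_id, "data_file_id": file_id}
        for file_id in input_file_ids
    ]
    return {"workflow_runs": [run], "workflow_input_associations": associations}
```

The function takes no tool-specific arguments, so a RELION, DIALS, ATSAS, or Phenix step would all pass through the same call.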
The agent at work
Payoff
The technique field decides the pipeline, so every run in a mixed queue of cryo-EM, X-ray, and SAXS data goes down the correct lane.
RELION, DIALS, ATSAS, and Phenix all leave the same shape of WorkflowRun — one provenance model across every technique.
Typed steps mean a half-finished pipeline can be inspected and resumed, not restarted from scratch.
Pinned software_version on each step is what lets use case 06 later target exactly what a new release affects.
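To illustrate the last point, here is a sketch of how pinned versions could narrow a re-analysis, assuming rows are available as dictionaries with the fields shown in the record. The selection rule (same software, different version, already completed) is an assumption for illustration; how use case 06 actually selects runs is not described here, and runs_to_revisit is a hypothetical helper.

```python
from typing import List


def runs_to_revisit(runs: List[dict], software_name: str, new_version: str) -> List[str]:
    """Ids of completed runs made by software_name at any version other than new_version."""
    return [
        run["id"]
        for run in runs
        if run["software_name"] == software_name
        and run["software_version"] != new_version
        and run["processing_status"] == "completed"
    ]
```

Without a pinned software_version on every row, the second condition could not be evaluated and every run from that tool would have to be treated as affected.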