Structure software never stops improving — a new reconstruction algorithm, a better CTF model, an updated sequence database. Old data often holds more than it gave the first time. Because every result records the exact software and inputs that produced it, the agent can find precisely which past results a new release affects, re-run only those, and report whether the numbers actually moved.
What sets it off
A re-analysis sweep starts from a change in the world, not a calendar. The schema's
pinned software_name / software_version on each
WorkflowRun is what makes "which results are now stale?" a query rather
than a guess.
RELION 5, a new cryoSPARC, or a Phenix point release with a better refinement target.
A refreshed sequence DB, a new ligand restraint library, or an improved reference model.
Improved motion correction or CTF estimation that can squeeze more from existing raw frames.
The sweep
Scope the work
The expensive mistake is re-processing everything. The point of pinned versions is surgical scope: a query returns exactly the runs a release can improve, and nothing else is touched.
-- which completed runs used a now-superseded RELION? SELECT w.workflow_code, w.software_version, e.experiment_code FROM workflow_run w JOIN workflow_input_assoc i USING (workflow_id) JOIN experiment_run e ON e.id = i.experiment_id WHERE w.software_name = 'RELION' AND w.workflow_type = 'refinement' AND w.software_version LIKE '3.%' AND w.processing_status = 'completed'; -- → 18 runs. Re-run these 18, not the other 3,400.
The result
Re-running produces new WorkflowRun rows beside the old ones — never
overwriting history. The agent reports the metric deltas so a scientist can accept the
improvements, investigate regressions, and ignore the unchanged.
| Run | Metric | Was (v3) | Now (v5) | Δ |
|---|---|---|---|---|
| EXP-NCP-CRYOEM-001 | resolution (Å) | 3.4 | 3.0 | ▲ 0.4 better |
| EXP-COMPLEX-001 | map-model FSC | 0.71 | 0.78 | ▲ improved |
| EXP-APO-014 | resolution (Å) | 2.9 | 2.9 | — flat |
| EXP-LIG-022 | clashscore | 4.1 | 5.8 | ▼ regressed — flag |
Pair this with the lakehouse (use case 05) and every sweep becomes a versioned snapshot: you can always answer "what did this structure look like before the RELION 5 reprocessing?"
The agent at work
Payoff
Raw collections keep paying off as methods improve — without anyone remembering to revisit them.
Pinned versions mean you re-run the 18 runs a release touches, not the 3,400 it doesn't.
A diff catches the rare case where "newer" is worse, before it quietly ships into a deposition.
New results sit beside old ones as versioned rows, so every past state stays reproducible.