Resolving how the CHD1 remodeler acts on a nucleosome takes more than any single method can give: a cryo-EM map of the bound complex, a small-angle scattering envelope for CHD1's flexible regions, and a crystallographic structure of the nucleosome core. Each comes from a different instrument and facility, in its own format. A structural computational biologist, increasingly with the assistance of an AI agent, retrieves each result as an RO-Crate, reconciles them in local tables, and stages one integrative-modeling bundle. LAMBDA-BER is the shared model that lets the independent measurements be joined.
The scenario
A chromatin-remodeling project studies how the CHD1 ATPase engages a reconstituted nucleosome. No single technique answers the question, so the structure is attacked from three directions — each producing data at a different light source or microscope, in a different format, with provenance recorded a different way.
Without a shared model, integrating these is a week of manual bookkeeping: chasing down which sample prep fed which beamtime, reconciling buffer conditions, and hand-copying resolution numbers into a modeling spreadsheet. LAMBDA-BER turns that into a query.
Federation
Raw frames stay where they were collected. The agent fetches lightweight LAMBDA-BER RO-Crates over the lambda API, registers their metadata locally, cross-references a PDB mirror, and only streams heavy data (maps, MTZs, images) on demand when a downstream tool needs it.
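The metadata-first, payload-on-demand pattern described above can be sketched in a few lines. This is a minimal illustration, not the lambda API: `RemoteFile`, `LocalRegistry`, the `fetch` callable, and the file ids are all hypothetical names, and the transport is injected so the sketch runs without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class RemoteFile:
    """Metadata for one DataFile; the payload stays remote until requested."""
    file_id: str
    technique: str                      # e.g. "cryo_em", "saxs", "xray"
    size_bytes: int
    _payload: Optional[bytes] = field(default=None, repr=False)


class LocalRegistry:
    """Keeps crate metadata locally and pulls heavy payloads only on demand."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch             # hypothetical transport, injected by the caller
        self._files: Dict[str, RemoteFile] = {}
        self.bytes_transferred = 0

    def register(self, f: RemoteFile) -> None:
        # Registering costs nothing on the wire: only metadata is stored.
        self._files[f.file_id] = f

    def open(self, file_id: str) -> bytes:
        f = self._files[file_id]
        if f._payload is None:          # first access: stream from the source
            f._payload = self._fetch(file_id)
            self.bytes_transferred += len(f._payload)
        return f._payload               # later accesses reuse the local copy


def fake_fetch(file_id: str) -> bytes:
    # Stand-in for a real download; returns 16 placeholder bytes.
    return b"x" * 16


registry = LocalRegistry(fake_fetch)
registry.register(RemoteFile("map_001", "cryo_em", 16))
registry.register(RemoteFile("iq_001", "saxs", 16))
payload = registry.open("map_001")      # only this file crosses the wire
```

Injecting the fetch function keeps the registry independent of any particular facility's transfer mechanism, which is the property the federation relies on.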
The interchange layer
The cryo-EM crate, the SAXS crate, and the X-ray crate each describe the project in LAMBDA-BER terms. Because they share class definitions, the agent can join across them on shared Sample and Study identifiers; that join is what makes integration possible.
| Class | From |
|---|---|
| Sample · CHD1–NCP complex | all three |
| ExperimentRun · cryo_em | Krios crate |
| ExperimentRun · saxs | SIBYLS crate |
| ExperimentRun · xray | beamline crate |
| Instrument ×3 | per source |
| DataFile · maps, I(q), MTZ | per source |

| Association | Ties together |
|---|---|
| study_sample | study ↔ each sample |
| study_experiment | study ↔ all 3 runs |
| experiment_sample | run ↔ sample (+ role) |
| experiment_instrument | run ↔ instrument |
| workflow_input | files → modeling job |
| workflow_output | job → integrative model |

The association tables are the seam. A single Study id ties together three independent ExperimentRuns recorded months apart, on different continents:

    # one study collects all three measurements of the project
    study_experiment_associations:
      - study_id: lambdaber:study_integrative_nucleosome
        experiment_id: lambdaber:exp_ncp_cryoem_001      # cryo-EM @ Krios
      - study_id: lambdaber:study_integrative_nucleosome
        experiment_id: lambdaber:exp_chd1_saxs_001       # SAXS @ SIBYLS 12.3.1
      - study_id: lambdaber:study_integrative_nucleosome
        experiment_id: lambdaber:exp_complex_cryoem_001  # bound-state map

    # each run points back to its sample, its prep, and its instrument
    experiment_instrument_associations:
      - experiment_id: lambdaber:exp_chd1_saxs_001
        instrument_id: lambdaber:instrument_sibyls_1231
        role: primary

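To make the "integration as a query" idea concrete, the association rows above can be loaded into local tables and joined on the shared ids. The sketch below uses Python's built-in `sqlite3`; the table layout and the `runs_for_study` helper are assumptions for illustration, not the actual LAMBDA-BER storage schema. The identifiers are the ones shown in the snippet above.

```python
import sqlite3

# In-memory stand-in for the agent's local tables (layout is an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study_experiment (study_id TEXT, experiment_id TEXT);
CREATE TABLE experiment_instrument (experiment_id TEXT, instrument_id TEXT, role TEXT);
""")

STUDY = "lambdaber:study_integrative_nucleosome"
conn.executemany(
    "INSERT INTO study_experiment VALUES (?, ?)",
    [
        (STUDY, "lambdaber:exp_ncp_cryoem_001"),
        (STUDY, "lambdaber:exp_chd1_saxs_001"),
        (STUDY, "lambdaber:exp_complex_cryoem_001"),
    ],
)
conn.execute(
    "INSERT INTO experiment_instrument VALUES (?, ?, ?)",
    ("lambdaber:exp_chd1_saxs_001", "lambdaber:instrument_sibyls_1231", "primary"),
)


def runs_for_study(db: sqlite3.Connection, study_id: str):
    """Return (experiment_id, instrument_id) for every run in a study.

    LEFT JOIN keeps runs whose instrument row has not been registered yet;
    their instrument_id comes back as None.
    """
    return db.execute(
        """
        SELECT se.experiment_id, ei.instrument_id
        FROM study_experiment AS se
        LEFT JOIN experiment_instrument AS ei
               ON ei.experiment_id = se.experiment_id
        WHERE se.study_id = ?
        ORDER BY se.experiment_id
        """,
        (study_id,),
    ).fetchall()


rows = runs_for_study(conn, STUDY)
```

A left join is used deliberately: a run with a missing instrument association still appears in the result, so gaps in the metadata surface instead of silently dropping a measurement.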
The agent at work
Running in the Beril compute environment, the agent reasons over the local tables, pulls only what it needs across the federation, and stages a Phenix/IMP-ready crate.
Where Phenix plugs in
Phenix tools don't read RO-Crates — they read MTZs, maps, and models. The crate's job is upstream of that: because every file is typed by technique, resolution, and role, the agent knows which files each step actually needs, fetches only those across the federation, converts them to the expected formats, and dispatches the right Phenix tool.
| Phenix tool | Role |
|---|---|
| phenix.refine | MTZ → refined NCP model |
| phenix.dock_in_map | place NCP in cryo-EM density |
| phenix.mtriage | FSC / local resolution from half-maps |
| phenix.real_space_refine | fit complex into 3.2 Å map |

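
One way to picture the dispatch step is a lookup from a file's technique and role to the Phenix tools listed above. The sketch below only plans which tool consumes which file; it does not build command lines, since the exact arguments depend on the Phenix version and the job. The role names (`reflections`, `half_map`, `full_map`), the file ids, and the `plan_steps` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# (technique, role) -> Phenix tools that consume such a file, in order.
# Role names are illustrative assumptions, not LAMBDA-BER vocabulary.
TOOL_PLAN: Dict[Tuple[str, str], List[str]] = {
    ("xray", "reflections"): ["phenix.refine"],
    ("cryo_em", "half_map"): ["phenix.mtriage"],
    ("cryo_em", "full_map"): ["phenix.dock_in_map", "phenix.real_space_refine"],
}


def plan_steps(files: List[dict]) -> Dict[str, List[str]]:
    """Group file ids by the Phenix tool that needs them.

    Files with no entry in TOOL_PLAN (for example SAXS curves, which go to
    integrative modeling instead) are skipped and never fetched.
    """
    plan: Dict[str, List[str]] = {}
    for f in files:
        for tool in TOOL_PLAN.get((f["technique"], f["role"]), []):
            plan.setdefault(tool, []).append(f["file_id"])
    return plan


files = [
    {"file_id": "ncp.mtz", "technique": "xray", "role": "reflections"},
    {"file_id": "half1.mrc", "technique": "cryo_em", "role": "half_map"},
    {"file_id": "half2.mrc", "technique": "cryo_em", "role": "half_map"},
    {"file_id": "complex.mrc", "technique": "cryo_em", "role": "full_map"},
    {"file_id": "chd1_iq.dat", "technique": "saxs", "role": "curve"},
]
plan = plan_steps(files)
```

Because the plan is computed from metadata alone, the agent can decide what to fetch before any heavy file crosses the wire.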
The SAXS results (Rg, Dmax, P(r)) then enter integrative modeling (IMP/HADDOCK) as spatial restraints on CHD1's flexible regions. Every Phenix and IMP invocation is written back as a WorkflowRun, so the provenance chain from raw beamtime to final model never breaks.
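A provenance record of this kind could look roughly like the following sketch, which checksums each input and output with SHA-256 and ties them to one tool invocation. The field names loosely echo the `workflow_input` and `workflow_output` associations, but the record layout, the `workflow_run_record` helper, and the ids are assumptions rather than the LAMBDA-BER WorkflowRun schema.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a payload."""
    return hashlib.sha256(data).hexdigest()


def workflow_run_record(
    run_id: str,
    tool: str,
    inputs: Dict[str, bytes],
    outputs: Dict[str, bytes],
) -> dict:
    """Build a provenance record linking checksummed inputs to outputs.

    The structure is a hypothetical illustration, not the real schema.
    """
    def describe(files: Dict[str, bytes]):
        # Sort by file id so the record is stable across runs.
        return [
            {"file_id": fid, "sha256": sha256_hex(blob)}
            for fid, blob in sorted(files.items())
        ]

    return {
        "id": run_id,
        "tool": tool,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "workflow_input": describe(inputs),
        "workflow_output": describe(outputs),
    }


record = workflow_run_record(
    run_id="run_example_001",                      # hypothetical id
    tool="phenix.real_space_refine",
    inputs={"complex.mrc": b"map-bytes", "ncp_model.pdb": b"model-bytes"},
    outputs={"refined.pdb": b"refined-bytes"},
)
serialized = json.dumps(record, indent=2)          # ready to store alongside the crate
```

Recording digests rather than file paths means a later re-run can verify it is operating on the same bytes, wherever the files are staged.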
Payoff
Three facilities, one set of classes. The agent merges rows instead of writing a parser for every beamline's export.
Terabytes of frames remain at the source. Only metadata and the specific files a tool needs cross the wire.
A shared Study/Sample id makes "everything we measured on this complex" a single query against local tables.
The output is itself a validatable RO-Crate — checksummed inputs, typed workflows, ready to re-run or hand off.