04 · LAMBDA-BER Use Case

Cross-source integration for integrative structural biology

Resolving how the CHD1 remodeler acts on a nucleosome takes more than any single method can give: a cryo-EM map of the bound complex, a small-angle scattering envelope for CHD1's flexible regions, and a crystallographic structure of the nucleosome core. Each comes from a different instrument and facility, in its own format. A structural computational biologist — increasingly with the assistance of an AI agent — retrieves each result as an RO-Crate, reconciles them in local tables, and stages one integrative-modeling bundle, with LAMBDA-BER as the shared model that lets independent measurements join.

Complex: CHD1 · nucleosome remodeler
Techniques: cryo-EM + SAXS + X-ray
Env: Beril compute + Claude Code agent
Anchor: Dataset-integrative.yaml

The scenario

The same sample is measured in three different places

A chromatin-remodeling project studies how the CHD1 ATPase engages a reconstituted nucleosome. No single technique answers the question, so the structure is attacked from three directions — each producing data at a different light source or microscope, in a different format, with provenance recorded a different way.

cryo-EM

CHD1–nucleosome complex

Titan Krios G3i · 300 kV
  • 3.2 Å reconstruction (in progress)
  • Movies, motion-corrected micrographs, half-maps
  • Defines the bound-state architecture

SAXS / WAXS

CHD1 in solution

SIBYLS 12.3.1 · ALS
  • 30 Å envelope, concentration series
  • I(q) curves, P(r), Rg / Dmax
  • Captures flexibility the map can't

X-ray crystallography

Nucleosome core particle

Macromolecular beamline
  • High-resolution rigid reference
  • MTZ reflections → refined model
  • Docked into the cryo-EM density

Without a shared model, integrating these is a week of manual bookkeeping: chasing down which sample prep fed which beamtime, reconciling buffer conditions, and hand-copying resolution numbers into a modeling spreadsheet. LAMBDA-BER turns that into a query.

Federation

How the agent pulls it together

Raw frames stay where they were collected. The agent fetches lightweight LAMBDA-BER RO-Crates over the lambda API, registers their metadata locally, cross-references a PDB mirror, and only streams heavy data (maps, MTZs, images) on demand when a downstream tool needs it.

[Diagram: three facility RO-Crates (cryo-EM facility, Krios; SIBYLS 12.3.1, SAXS; X-ray beamline, MTZ) → lambda API (federated fetch) → local metadata (Sample, ExperimentRun, WorkflowRun, DataFile + association tables, in DuckDB / SQLite) → integrative RO-Crate (one bundle → Phenix / IMP / HADDOCK). An optional PDB mirror supplies reference structures to compare and validate. Claude Code orchestrates.]

The trick: the RO-Crate is the transport, LAMBDA-BER is the payload schema. Each facility emits crates that validate against the same classes, so the agent never writes per-source parsing glue — it merges rows.
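
As a rough illustration of "merging rows", the sketch below groups the entities in an RO-Crate metadata document by `@type` and folds several crates into one table per class, keyed by `@id`. It assumes that LAMBDA-BER class names appear directly as `@type` values; the sample identifier and the `technique` field are hypothetical, and the toy crates omit the root dataset and metadata descriptor that a real crate carries.

```python
import json
from collections import defaultdict


def crate_to_rows(metadata_json: str) -> dict[str, list[dict]]:
    """Group the entities of one RO-Crate metadata document by @type."""
    graph = json.loads(metadata_json)["@graph"]
    rows = defaultdict(list)
    for entity in graph:
        types = entity.get("@type", [])
        if isinstance(types, str):  # @type may be a string or a list
            types = [types]
        for cls in types:
            rows[cls].append(entity)
    return dict(rows)


def merge_crates(crates: list[str]) -> dict[str, dict[str, dict]]:
    """Build one table per class, keyed by @id, so shared entities collapse."""
    tables = defaultdict(dict)
    for doc in crates:
        for cls, entities in crate_to_rows(doc).items():
            for entity in entities:
                tables[cls].setdefault(entity["@id"], {}).update(entity)
    return {cls: dict(rows) for cls, rows in tables.items()}


# Two toy crates (hypothetical content) that describe the same Sample.
krios = json.dumps({"@graph": [
    {"@id": "lambdaber:sample_chd1_ncp", "@type": "Sample"},
    {"@id": "lambdaber:exp_complex_cryoem_001", "@type": "ExperimentRun",
     "technique": "cryo_em"},
]})
sibyls = json.dumps({"@graph": [
    {"@id": "lambdaber:sample_chd1_ncp", "@type": "Sample"},
    {"@id": "lambdaber:exp_chd1_saxs_001", "@type": "ExperimentRun",
     "technique": "saxs"},
]})

tables = merge_crates([krios, sibyls])
print(len(tables["Sample"]), len(tables["ExperimentRun"]))
```

Because both crates use the same `@id` for the sample, it lands in a single row, while the two runs stay distinct. No facility-specific parsing is involved at any point.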

The interchange layer

One schema reconciles three vocabularies

The cryo-EM crate, the SAXS crate, and the X-ray crate each describe the project in LAMBDA-BER terms. Because they share class definitions, the agent can join across them on shared Sample and Study identifiers — that join is what makes integration possible.

Entities the crates contribute

Class                           From
Sample · CHD1–NCP complex       all three
ExperimentRun · cryo_em         Krios crate
ExperimentRun · saxs            SIBYLS crate
ExperimentRun · xray            beamline crate
Instrument ×3                   per source
DataFile · maps, I(q), MTZ      per source

Associations that bind them

Association                     Ties together
study_sample                    study ↔ each sample
study_experiment                study ↔ all 3 runs
experiment_sample               run ↔ sample (+ role)
experiment_instrument           run ↔ instrument
workflow_input                  files → modeling job
workflow_output                 job → integrative model

The association tables are the seam. A single Study id gathers three independent ExperimentRuns, recorded months apart and on different continents:

Dataset-integrative.yaml — the binding rows
# one study collects all three measurements of the project
study_experiment_associations:
- study_id: lambdaber:study_integrative_nucleosome
  experiment_id: lambdaber:exp_ncp_cryoem_001      # cryo-EM @ Krios
- study_id: lambdaber:study_integrative_nucleosome
  experiment_id: lambdaber:exp_chd1_saxs_001       # SAXS @ SIBYLS 12.3.1
- study_id: lambdaber:study_integrative_nucleosome
  experiment_id: lambdaber:exp_complex_cryoem_001   # bound-state map

# each run points back to its sample, its prep, and its instrument (only the instrument link is shown here)
experiment_instrument_associations:
- experiment_id: lambdaber:exp_chd1_saxs_001
  instrument_id: lambdaber:instrument_sibyls_1231
  role: primary
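
To make the join concrete, here is a minimal sketch that loads the binding rows above into an in-memory SQLite database and asks for every run in the study, with its instrument where one is recorded. The table and column layout is an assumption modeled on the YAML keys, not the schema's official SQL mapping.

```python
import sqlite3

STUDY = "lambdaber:study_integrative_nucleosome"

# Rows copied from the YAML excerpt above.
study_experiment = [
    (STUDY, "lambdaber:exp_ncp_cryoem_001"),
    (STUDY, "lambdaber:exp_chd1_saxs_001"),
    (STUDY, "lambdaber:exp_complex_cryoem_001"),
]
experiment_instrument = [
    ("lambdaber:exp_chd1_saxs_001", "lambdaber:instrument_sibyls_1231", "primary"),
]

con = sqlite3.connect(":memory:")
# Assumed layout: one table per association list, one column per YAML key.
con.execute(
    "CREATE TABLE study_experiment_associations "
    "(study_id TEXT, experiment_id TEXT)"
)
con.execute(
    "CREATE TABLE experiment_instrument_associations "
    "(experiment_id TEXT, instrument_id TEXT, role TEXT)"
)
con.executemany(
    "INSERT INTO study_experiment_associations VALUES (?, ?)", study_experiment
)
con.executemany(
    "INSERT INTO experiment_instrument_associations VALUES (?, ?, ?)",
    experiment_instrument,
)

# LEFT JOIN keeps runs whose instrument link is not in the excerpt.
rows = con.execute(
    """
    SELECT se.experiment_id, ei.instrument_id
    FROM study_experiment_associations AS se
    LEFT JOIN experiment_instrument_associations AS ei
      ON ei.experiment_id = se.experiment_id
    WHERE se.study_id = ?
    ORDER BY se.experiment_id
    """,
    (STUDY,),
).fetchall()

for experiment_id, instrument_id in rows:
    print(experiment_id, instrument_id)
```

The same query shape should carry over to DuckDB with few changes, since it uses only standard SQL.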

The agent at work

Claude Code assembles the integrative bundle

Running in the Beril compute environment, the agent reasons over the local tables, pulls only what it needs across the federation, and stages a Phenix/IMP-ready crate.

beril · claude-code · ~/chd1-nucleosome
> assemble an integrative-modeling bundle for the CHD1 nucleosome study

⏺ query local metadata
SELECT * FROM experiment_run JOIN experiment_sample USING(...)
WHERE study_id = 'study_integrative_nucleosome'
→ 3 runs: cryo_em (3.2 Å), saxs (30 Å), xray (1.9 Å) · 1 shared complex sample

⏺ cross-reference (PDB mirror, optional)
→ nucleosome core consistent with 1KX5 · histone identities confirmed

⏺ federated fetch (lambda API · only heavy inputs)
GET sibyls://saxs/exp_chd1_saxs_001 → chd1.dat (I(q), P(r))
GET krios://maps/exp_complex_cryoem_001 → complex_half{1,2}.map
GET xray://mtz/ncp → ncp_refine.mtz
→ 3 DataFile rows registered with checksums (no raw frames copied)

⏺ write WorkflowRun + input associations
→ workflow_integrative_imp_001 · 5 inputs wired via workflow_input_associations

⏺ stage RO-Crate
→ chd1-ncp-integrative.crate/ (validates against lambda_ber_schema)
✓ ready for phenix.dock_in_map + IMP · buffer/Rg/resolution carried in metadata
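
The "registered with checksums" step in the transcript can be sketched as below: hash each fetched file in chunks, record a DataFile-style row, and link it to the modeling job. The field names (`file_name`, `size_bytes`, `sha256`, `workflow_id`) are hypothetical stand-ins rather than the schema's actual slots, and the file content is a placeholder written to a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large maps never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register(path: Path, experiment_id: str) -> dict:
    """Build a DataFile-style row (hypothetical field names)."""
    return {
        "file_name": path.name,
        "experiment_id": experiment_id,
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of(path),
    }


PLACEHOLDER = b"q I(q) sigma\n0.01 1.0 0.1\n"  # stand-in for a real SAXS curve

with tempfile.TemporaryDirectory() as tmp:
    fetched = Path(tmp) / "chd1.dat"
    fetched.write_bytes(PLACEHOLDER)
    row = register(fetched, "lambdaber:exp_chd1_saxs_001")

# One workflow_input association: this file feeds the modeling job.
workflow_input = {
    "workflow_id": "workflow_integrative_imp_001",
    "file_name": row["file_name"],
}
print(row["file_name"], row["size_bytes"], row["sha256"][:12])
```

Storing the digest next to the association means a later re-run can verify that it received the same bytes before any modeling starts.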

Where Phenix plugs in

From crate to Phenix inputs

Phenix tools don't read RO-Crates — they read MTZs, maps, and models. The crate's job is upstream of that: because every file is typed by technique, resolution, and role, the agent knows which files each step actually needs, fetches only those across the federation, converts them to the expected formats, and dispatches the right Phenix tool.

X-ray reference → rigid body

phenix.refine               MTZ → refined NCP model
phenix.dock_in_map          place NCP in cryo-EM density

Cryo-EM map → bound state

phenix.mtriage              FSC / local resolution from half-maps
phenix.real_space_refine    fit complex into 3.2 Å map
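
One way to picture the dispatch is a lookup from a file's technique and role to the Phenix tools that consume it. The sketch below only plans which files go to which tool; it does not build command lines or invoke Phenix. The `DataFile` fields and the role labels (`reflections`, `half_map`, `map`, `curve`) are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    name: str
    technique: str  # "xray", "cryo_em", or "saxs"
    role: str       # hypothetical role labels, not schema-defined values


# Which Phenix tools consume which kind of file (tool names from the text).
TOOLS_BY_TYPE = {
    ("xray", "reflections"): ["phenix.refine"],
    ("cryo_em", "half_map"): ["phenix.mtriage"],
    ("cryo_em", "map"): ["phenix.dock_in_map", "phenix.real_space_refine"],
}


def plan_steps(files: list[DataFile]) -> dict[str, list[str]]:
    """Map each tool to the input files it needs; unmatched files are skipped."""
    steps: dict[str, list[str]] = {}
    for f in files:
        for tool in TOOLS_BY_TYPE.get((f.technique, f.role), []):
            steps.setdefault(tool, []).append(f.name)
    return steps


files = [
    DataFile("ncp_refine.mtz", "xray", "reflections"),
    DataFile("complex_half1.map", "cryo_em", "half_map"),
    DataFile("complex_half2.map", "cryo_em", "half_map"),
    DataFile("chd1.dat", "saxs", "curve"),  # no Phenix step; goes to IMP/HADDOCK
]

for tool, inputs in plan_steps(files).items():
    print(tool, inputs)
```

The SAXS curve matches no Phenix entry, which mirrors the text: it bypasses Phenix and feeds integrative modeling instead.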

The SAXS envelope and its derived parameters (Rg, Dmax, P(r)) then enter integrative modeling (IMP/HADDOCK) as spatial restraints on CHD1's flexible regions. Every Phenix and IMP invocation is written back as a WorkflowRun, so the provenance chain from raw beamtime to final model stays intact.
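
For intuition about what an Rg restraint measures, the sketch below computes the radius of gyration of a set of coordinates and scores its deviation from an experimental value with a simple harmonic penalty. This is a simplified illustration, not how IMP or HADDOCK implement SAXS restraints: it assumes equal weights per point and ignores the hydration shell and scattering contrast. The coordinates, target Rg, and sigma are made-up numbers.

```python
import math

Point = tuple[float, float, float]


def radius_of_gyration(coords: list[Point]) -> float:
    """Equal-weight Rg: root-mean-square distance from the centroid."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def rg_penalty(model_rg: float, saxs_rg: float, sigma: float) -> float:
    """Harmonic penalty: squared deviation in units of the uncertainty."""
    return ((model_rg - saxs_rg) / sigma) ** 2


# Four points on a unit circle (illustrative only, not real coordinates).
coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
model_rg = radius_of_gyration(coords)
score = rg_penalty(model_rg, saxs_rg=1.5, sigma=0.5)
print(model_rg, score)
```

A conformation whose Rg sits close to the measured value scores near zero, so the penalty steers flexible regions toward shapes compatible with the solution data.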

Payoff

What the standardization buys

[ ↯ ] No per-source glue

Three facilities, one set of classes. The agent merges rows instead of writing a parser for every beamline's export.

[ ⇄ ] Data stays put

Terabytes of frames remain at the source. Only metadata and the specific files a tool needs cross the wire.

[ ⌖ ] Joinable provenance

A shared Study/Sample id makes "everything we measured on this complex" a single query against local tables.

[ ✓ ] Reproducible bundles

The output is itself a validatable RO-Crate — checksummed inputs, typed workflows, ready to re-run or hand off.