01 · LAMBDA-BER Use Case

Federated data pull:
register a beamtime without hauling it home

A shift at the beamline leaves terabytes of frames sitting on the facility's storage. You don't want them on your laptop — you want them known: searchable, linked to the sample and instrument, ready to act on. The agent pulls a LAMBDA-BER RO-Crate over the lambda API and registers the run as rows in your local tables — pointers and checksums, not bytes.

Move: metadata, not data
Pointer: storage_uri + checksum
Source: any LAMBDA light source
Result: a queryable local catalog

The scenario

The data is huge, remote, and yours to track

A SAXS session at SIBYLS or a cryo-EM collection on a Krios produces hundreds of gigabytes to terabytes of raw frames. Copying everything to local disk is slow, expensive, and usually pointless — you need most of it only when a processing step actually reads it. What you need immediately is the record: what was run, on what sample, with what instrument, and where the bytes live.

The federation principle: the facility stays the source of truth for raw data. The agent ingests a lightweight crate that describes the run, points at the bytes by storage_uri, and records each file's checksum so integrity can be verified whenever the bytes are fetched. Heavy files are streamed later, on demand, and only if a workflow needs them.
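A minimal sketch of that on-demand integrity check follows. It assumes the checksum string keeps the `sha256:<hex>` form used in the ingested-rows excerpt. The names `parse_checksum` and `verify_stream` are hypothetical and are not part of the lambda API. The digest is computed chunk by chunk as bytes stream in, so a multi-gigabyte file never has to sit in memory or on local disk before it is checked.

```python
import hashlib
from typing import Iterable


def parse_checksum(checksum: str) -> tuple[str, str]:
    """Split an 'algo:<hex>' string (e.g. 'sha256:...') into (algo, hex digest)."""
    algo, sep, digest = checksum.partition(":")
    if not sep or not algo or not digest:
        raise ValueError(f"malformed checksum: {checksum!r}")
    return algo.lower(), digest.lower()


def verify_stream(chunks: Iterable[bytes], checksum: str) -> bool:
    """Hash streamed chunks incrementally; True if they match the recorded checksum."""
    algo, expected = parse_checksum(checksum)
    h = hashlib.new(algo)  # raises ValueError for algorithms hashlib does not support
    for chunk in chunks:
        h.update(chunk)  # only one chunk is held in memory at a time
    return h.hexdigest() == expected
```

Because `hashlib.new` takes the algorithm by name, the same check would also cover other prefixes a facility might emit, as long as hashlib supports them.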

How it works

One pull, three things land locally

The crate the facility emits validates against the same schema you use everywhere else, so ingest is a row-insert, not a parsing project. A single session populates the instrument, the experiment, and the file index together.

[Diagram: light source facility (SIBYLS · Krios · beamline object store; ≈ TB of raw frames and movies stay here) → lambda API crate fetch → metadata crate (KB, not TB) → local metadata tables (Instrument · current_status; ExperimentRun · technique, metrics; DataFile · storage_uri + checksum). Pointers resolve back to the facility, and bytes stream on demand.]
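One way to picture the "row-insert, not a parsing project" step is shown below. An RO-Crate's ro-crate-metadata.json is JSON-LD whose entities sit in a flat `@graph` list, so ingest can be a single pass that routes each entity to a table by its `@type`. This sketch rests on a stated assumption: it supposes the crate tags entities with types named after the schema classes (`Instrument`, `ExperimentRun`, `DataFile`). The actual LAMBDA-BER crate layout may differ, and `crate_to_rows` is a hypothetical helper.

```python
import json
from collections import defaultdict

# ASSUMPTION: @type values mirror the schema class names; adjust to the real crate.
TABLE_FOR_TYPE = {
    "Instrument": "instruments",
    "ExperimentRun": "experiment_runs",
    "DataFile": "data_files",
}


def crate_to_rows(crate_json: str) -> dict[str, list[dict]]:
    """Group @graph entities into per-table row lists; unmapped types are skipped."""
    graph = json.loads(crate_json).get("@graph", [])
    rows: dict[str, list[dict]] = defaultdict(list)
    for entity in graph:
        types = entity.get("@type", [])
        if isinstance(types, str):  # JSON-LD allows a single type or a list
            types = [types]
        for t in types:
            table = TABLE_FOR_TYPE.get(t)
            if table is None:
                continue
            # Drop JSON-LD keywords (@id, @type, ...) and keep @id as the row id.
            row = {k: v for k, v in entity.items() if not k.startswith("@")}
            row["id"] = entity["@id"]
            rows[table].append(row)
    return dict(rows)
```

Entities the catalog does not track, such as the crate's own metadata descriptor, simply fall through. That keeps the single pass tolerant of extra crate content.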

What gets registered

A DataFile row is a pointer, not a payload

Each raw or derived file becomes a DataFile with enough to find it, verify it, and reason about it — without a single byte leaving the facility.

ingested rows (excerpt) — pointers + provenance, zero bytes copied
data_files:
- file_name: saxs_session_2024-05-16_buffer-subtracted.h5
  file_format: hdf5
  data_type: processed
  file_size_bytes: 41229312
  storage_uri: sibyls://als/12.3.1/2024-05-16/run047.h5   # lives at the facility
  checksum: sha256:9f2c…a71b                          # integrity on fetch
  related_entity: lambdaber:exp_chd1_saxs_001

experiment_runs:
- id: lambdaber:exp_chd1_saxs_001
  experiment_code: EXP-CHD1-SAXS-001
  technique: saxs
  processing_status: collected          # raw on day one; updated as it advances

Lands on ingest
Class           Carries
Instrument      code, model, current_status
ExperimentRun   technique, conditions, quality_metrics
DataFile        storage_uri, checksum, size, role
assoc tables    experiment↔instrument, experiment↔sample

Stays remote
Asset               Why
raw movie stacks    TB-scale, rarely re-read whole
detector frames     needed only by processing
intermediate maps   regenerable from inputs
→ fetched by storage_uri when a step demands it
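The classes above map naturally onto a handful of SQL tables. The transcript uses DuckDB. This sketch uses Python's built-in sqlite3 so it runs without extra packages, and the DDL is plain SQL that should port with little change. Column names follow the ingested-rows excerpt and the "Lands on ingest" table. Storing `conditions` and `quality_metrics` as JSON text is an assumption, as are the association-table names and `open_catalog`.

```python
import sqlite3

# Illustrative schema, not the official LAMBDA-BER DDL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS instruments (
    id TEXT PRIMARY KEY,
    code TEXT,
    model TEXT,
    current_status TEXT
);
CREATE TABLE IF NOT EXISTS experiment_runs (
    id TEXT PRIMARY KEY,
    experiment_code TEXT,
    technique TEXT,
    processing_status TEXT,
    conditions TEXT,        -- JSON text (assumed encoding)
    quality_metrics TEXT    -- JSON text (assumed encoding)
);
CREATE TABLE IF NOT EXISTS data_files (
    file_name TEXT,
    file_format TEXT,
    data_type TEXT,
    file_size_bytes INTEGER,
    storage_uri TEXT NOT NULL,  -- where the bytes live at the facility
    checksum TEXT NOT NULL,     -- e.g. 'sha256:<hex>', checked on any later fetch
    related_entity TEXT
);
-- association tables (names are hypothetical)
CREATE TABLE IF NOT EXISTS experiment_instrument (experiment_id TEXT, instrument_id TEXT);
CREATE TABLE IF NOT EXISTS experiment_sample (experiment_id TEXT, sample_id TEXT);
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local catalog and make sure the metadata tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)  # IF NOT EXISTS makes re-opening an existing file safe
    return conn
```

The NOT NULL constraints on storage_uri and checksum encode the rule that a DataFile row is only useful if it can locate the bytes and check them.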

The agent at work

Ingest a session in one pass

beril · claude-code · ~/lab-catalog
> pull yesterday's SIBYLS session into the catalog

⏺ list available crates (lambda API · facility=sibyls, date=2024-05-16)
→ 1 session · EXP-CHD1-SAXS-001 · 47 runs · 612 GB raw (remote)

⏺ fetch metadata crate
GET sibyls://als/12.3.1/2024-05-16/ro-crate-metadata.json (318 KB)
✓ validates against lambda_ber_schema

⏺ register rows (DuckDB)
+ 1 Instrument · + 1 ExperimentRun · + 49 DataFile (pointers)
+ experiment↔instrument, experiment↔sample associations
✓ 612 GB now searchable · 0 bytes copied · checksums recorded

> how much of my catalog is still unprocessed?
⏺ query WHERE processing_status IN ('collected','raw')
→ 1 run ready to hand to use case 03 (auto-orchestrate)
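The catalog question in the transcript reduces to a filtered SELECT. This is a minimal sketch against a sqlite3 stand-in for the DuckDB catalog. Treating `collected` and `raw` as the "not yet processed" statuses mirrors the query shown, but the full status vocabulary comes from the schema and is not reproduced here. `unprocessed_runs` is a hypothetical helper.

```python
import sqlite3

# Statuses counted as "not yet processed", mirroring the transcript's query.
UNPROCESSED = ("collected", "raw")


def unprocessed_runs(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (id, technique) for every run still awaiting processing, sorted by id."""
    placeholders = ", ".join("?" for _ in UNPROCESSED)
    sql = (
        "SELECT id, technique FROM experiment_runs "
        f"WHERE processing_status IN ({placeholders}) ORDER BY id"
    )
    # Statuses are bound as parameters, never interpolated into the SQL text.
    return conn.execute(sql, UNPROCESSED).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment_runs (id TEXT PRIMARY KEY, technique TEXT, processing_status TEXT)"
    )
    with conn:
        conn.execute(
            "INSERT INTO experiment_runs VALUES (?, ?, ?)",
            ("lambdaber:exp_chd1_saxs_001", "saxs", "collected"),
        )
    print(unprocessed_runs(conn))  # [('lambdaber:exp_chd1_saxs_001', 'saxs')]
```

The returned ids are what a downstream step, such as the orchestration use case, would pick up.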

Payoff

What pulling metadata-first buys

[ ⇣ ] Seconds, not hours

A session is catalogued the moment the shift ends — no overnight transfer before you can even see what you have.

[ ⛃ ] No local data lake

You track terabytes without storing them. Bytes stay at the facility and stream only when a step reads them.

[ ✓ ] Integrity built in

Every pointer carries a SHA-256, so any later fetch is verified against what was collected.

[ ↦ ] Ready for everything else

Once registered, a run feeds the query, processing, integration, and deposition use cases — same rows, no re-import.