02 · LAMBDA-BER Use Case

Ask your data:
your holdings, joined to the PDB

Once a lab's runs are catalogued (use case 01), the metadata becomes a database you can interrogate in plain language. Ask "which of my structures are already public, and which are still novel?" and the agent turns the question into SQL over your local tables, cross-references a PDB mirror, and answers in seconds, without anyone writing a join by hand.

Store: local DuckDB / SQLite
Reference: PDB mirror
Interface: natural language → SQL
Answers: gaps, cohorts, dedup

Why the PDB join matters here

Your data on one side, the world's on the other

A lab accumulates hundreds of samples and experiment runs. On its own that tells you what you measured. Joined against a mirror of the PDB, it tells you something far more useful: what's novel, what duplicates existing structures, where your resolution beats the published entry, and which results you've never deposited.

Unlike in the integrative case (04), the PDB is load-bearing here. The whole point is the comparison: every local Sample / ExperimentRun is matched against published entries by sequence, ligand, or organism, turning a private catalog into a map of where your science sits relative to the field.

Ask in plain language

Three questions, three answers

The agent translates each question into SQL over the LAMBDA-BER tables (plus the PDB mirror), runs it locally, and reports. The generated SQL is shown so the scientist can audit and refine it — the natural language is a front-end, not a black box.
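The audit step described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the project's implementation: `run_audited` and `Answer` are hypothetical names, the NL → SQL translation itself is left out, SQLite from the standard library stands in for the local store, and the text check is a convenience guard rather than a security boundary.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Answer:
    sql: str       # the exact statement that ran, shown to the scientist
    columns: list  # column names of the result
    rows: list     # result rows as tuples


def run_audited(conn: sqlite3.Connection, sql: str) -> Answer:
    """Run agent-generated SQL only if it is a single SELECT statement."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # Conservative: also rejects a literal ';' inside a string.
        raise ValueError("expected a single statement")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    cur = conn.execute(statement)
    columns = [d[0] for d in cur.description]
    return Answer(sql=statement, columns=columns, rows=cur.fetchall())


# Toy catalog with one table (hypothetical contents).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_code TEXT)")
conn.execute("INSERT INTO sample (sample_code) VALUES ('S-001'), ('S-002')")

answer = run_audited(conn, "SELECT sample_code FROM sample ORDER BY sample_code;")
print(answer.sql)   # the query is reported alongside the rows
print(answer.rows)
```

Returning the statement together with the rows is what keeps the natural-language layer auditable: the scientist can copy the SQL, tweak a threshold, and rerun it directly.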

Q1 "Which of my purified samples diffracted better than 2.5 Å but were never deposited?"
SELECT s.sample_code, e.experiment_code, e.resolution_ang
FROM sample s
JOIN experiment_sample_assoc a USING (sample_id)
JOIN experiment_run e ON e.id = a.experiment_id
LEFT JOIN pdb.entry p ON p.sequence_md5 = s.sequence_md5   -- PDB mirror
WHERE e.technique = 'xray_crystallography'
  AND e.resolution_ang < 2.5
  AND (p.pdb_id IS NULL                          -- no public match = undeposited
       OR p.resolution_ang > e.resolution_ang);  -- public match, but at worse resolution (mirror column name assumed)
4 samples. Three are genuinely novel; one matches a deposited construct by sequence but at worse resolution than yours — a candidate to supersede.
Q2 "Have we ever collected SAXS on anything homologous to this kinase?"
SELECT s.sample_code, e.experiment_code, p.pdb_id
FROM pdb.entry p
JOIN sample s ON s.organism = p.organism
JOIN experiment_sample_assoc a USING (sample_id)
JOIN experiment_run e ON e.id = a.experiment_id
WHERE p.ec_number LIKE '2.7.11.%'   -- protein kinases
  AND e.technique = 'saxs';
2 runs on a homologous kinase, both with usable Rg — a starting envelope you already own rather than new beamtime.
Q3 "Show me duplicate sample prep across studies so we stop re-purifying."
SELECT s.molecular_composition_hash,
       count(*) AS n,
       array_agg(s.sample_code) AS samples
FROM sample s
GROUP BY s.molecular_composition_hash
HAVING count(*) > 1;
6 clusters. The CHD1 construct was independently prepared in three studies — consolidate the protocol and the freezer stock.
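For readers who want to try the duplicate-prep pattern from Q3 without a full catalog, here is a small sketch on an in-memory SQLite database. The rows are made-up toy data, and because `array_agg` is not available in SQLite, the sketch substitutes `group_concat`; on DuckDB the query shown in Q3 should work as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample ("
    " sample_id INTEGER PRIMARY KEY,"
    " sample_code TEXT,"
    " molecular_composition_hash TEXT)"
)
# Toy rows: three preps share hash 'h1', one is unique.
conn.executemany(
    "INSERT INTO sample (sample_code, molecular_composition_hash) VALUES (?, ?)",
    [("A-01", "h1"), ("B-07", "h1"), ("C-03", "h2"), ("D-11", "h1")],
)

DUPLICATES_SQL = """
SELECT molecular_composition_hash,
       count(*) AS n,
       group_concat(sample_code, ',') AS samples   -- SQLite stand-in for array_agg
FROM sample
GROUP BY molecular_composition_hash
HAVING count(*) > 1
"""

# group_concat does not guarantee an order, so sort the codes in Python.
clusters = [
    (comp_hash, n, sorted(samples.split(",")))
    for comp_hash, n, samples in conn.execute(DUPLICATES_SQL)
]
print(clusters)
```

The grouping only works if identical preparations hash to the same value, so how `molecular_composition_hash` is computed matters more than the query itself.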

Architecture

A reference DB sits beside your catalog

[Diagram: question (plain language) → agent (NL → SQL) → SELECT on your LAMBDA-BER catalog (Sample · ExperimentRun · WorkflowRun · DataFile + association tables, local DuckDB), JOINed to the PDB mirror (sequences · ligands · organism · resolution · EC numbers · accessions)]

The PDB mirror lives next to the catalog as just another set of tables, so a join is local and fast. Keep it fresh with the project's ETL tooling; the agent never has to leave the box to answer "is this novel?"
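One way to picture "just another set of tables" is SQLite's `ATTACH DATABASE`, which exposes a second database under a schema name so that `pdb.entry` can be joined like any local table. The sketch below is an assumption-laden toy: both databases are in-memory, the columns are reduced to the ones the novelty check needs, and the identifiers (`TOY1`, `S-001`, the hash values) are placeholders rather than real entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                  # stands in for the local catalog
conn.execute("ATTACH DATABASE ':memory:' AS pdb")   # stands in for the PDB mirror file

conn.executescript("""
CREATE TABLE sample (
    sample_id    INTEGER PRIMARY KEY,
    sample_code  TEXT,
    sequence_md5 TEXT
);
CREATE TABLE pdb.entry (
    pdb_id       TEXT PRIMARY KEY,
    sequence_md5 TEXT
);
INSERT INTO sample (sample_code, sequence_md5)
VALUES ('S-001', 'aaa'), ('S-002', 'bbb');
INSERT INTO pdb.entry VALUES ('TOY1', 'aaa');
""")

# Anti-join: keep local samples that have no sequence match in the mirror.
NOVEL_SQL = """
SELECT s.sample_code
FROM sample s
LEFT JOIN pdb.entry p ON p.sequence_md5 = s.sequence_md5
WHERE p.pdb_id IS NULL
ORDER BY s.sample_code
"""
novel = [row[0] for row in conn.execute(NOVEL_SQL)]
print(novel)  # ['S-002']
```

In a real deployment the attached name would point at the mirror file maintained by the ETL tooling; the join itself stays a local operation either way.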

What makes it answerable

The schema is already query-shaped

Local columns the questions hit

From           Slot
Sample         organism, molecular_composition, purity_percentage
ExperimentRun  technique, quality_metrics.resolution
WorkflowRun    workflow_type, software_name
associations   sample ↔ experiment ↔ study

Bridges to the PDB

Match on           Answers
sequence           novel vs. deposited
ligand / cofactor  complexes seen before
organism / EC      homolog coverage
resolution         do we beat the public entry?
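The sequence bridge depends on both sides producing the same key for the same sequence. The sketch below shows one plausible way to derive a `sequence_md5` value; the normalization (drop whitespace, uppercase) and the helper name are assumptions for illustration, and whatever convention the PDB mirror actually uses has to be applied identically to the local catalog.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Hash a sequence after removing whitespace and uppercasing it.

    Assumed normalization; the mirror's real convention may differ.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


# The same toy sequence, formatted two different ways, yields one key.
key_a = sequence_md5("mkt aya\n")
key_b = sequence_md5("MKTAYA")
print(key_a == key_b)
```

An exact-match hash only answers "is this identical construct public?"; homolog questions still go through the organism / EC bridge or a proper sequence search.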

Payoff

What asking-your-data buys

[ ? ] No SQL tax

A PI or bench scientist gets answers without a data engineer in the loop — and sees the generated query to keep it honest.

[ ⌘ ] Novelty at a glance

The PDB join turns "did anyone solve this?" from a literature search into a row count.

[ ⊟ ] Stop redundant work

Surfacing duplicate preps and data you have already collected saves reagents, freezer space, and beamtime shifts.

[ ↧ ] Find your deposition gaps

Instantly list high-quality results that were never made public — a to-do list for use case 05.