Once a lab's runs are catalogued (use case 01), the metadata becomes a database you can interrogate in plain language. The agent turns a question into SQL over your local tables, cross-references a PDB mirror, and answers in seconds — "which of my structures are already public, and which are still novel?" — without anyone writing a join by hand.
Why the PDB join matters here
A lab accumulates hundreds of samples and experiment runs. On its own that tells you what you measured. Joined against a mirror of the PDB, it tells you something far more useful: what's novel, what duplicates existing structures, where your resolution beats the published entry, and which results you've never deposited.
Each Sample / ExperimentRun is matched against published entries by sequence, ligand, or organism, turning a private catalog into a map of where your science sits relative to the field.
Ask in plain language
The agent translates each question into SQL over the LAMBDA-BER tables (plus the PDB mirror), runs it locally, and reports. The generated SQL is shown so the scientist can audit and refine it — the natural language is a front-end, not a black box.
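As a rough illustration of what the agent might generate for "which of my structures are already public, and which are still novel?", here is a minimal sketch using Python's built-in `sqlite3`. Everything schema-related is an assumption made for the example: the table names `sample` and `pdb_entry`, their columns, the toy sequences, and the placeholder entry IDs are hypothetical, and the real LAMBDA-BER tables and PDB mirror may be laid out differently. Exact string equality stands in for sequence matching here; a real comparison would more likely use an alignment-based identity threshold.

```python
import sqlite3

# Hypothetical schema and toy data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (sample_id TEXT PRIMARY KEY, organism TEXT, sequence TEXT);
CREATE TABLE pdb_entry (entry_id TEXT PRIMARY KEY, sequence TEXT);
INSERT INTO sample VALUES
  ('S1', 'E. coli',    'MKTAYIAK'),
  ('S2', 'E. coli',    'MSLLPTQR'),
  ('S3', 'H. sapiens', 'MGSSHHHH');
INSERT INTO pdb_entry VALUES
  ('mirror-1', 'MKTAYIAK'),
  ('mirror-2', 'MGSSHHHH');
""")

# The kind of SQL the agent could show next to its answer.
# LEFT JOIN keeps samples with no public match, so they surface as 'novel'.
NOVELTY_SQL = """
SELECT s.sample_id,
       CASE WHEN COUNT(p.entry_id) > 0 THEN 'public' ELSE 'novel' END AS status
FROM sample AS s
LEFT JOIN pdb_entry AS p ON p.sequence = s.sequence
GROUP BY s.sample_id
ORDER BY s.sample_id
"""

rows = conn.execute(NOVELTY_SQL).fetchall()
for sample_id, status in rows:
    print(sample_id, status)
```

With this toy data, S1 and S3 come back as `public` and S2 as `novel`. Because the query text is plain SQL, a scientist can tighten it (for example, restrict by organism) without touching the natural-language layer.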
Architecture
The PDB mirror lives next to the catalog as just another set of tables, so a join is local and fast. Keep it fresh with the project's ETL tooling; the agent never has to leave the box to answer "is this novel?"
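One way to picture "just another set of tables" is a second database attached to the same connection. The sketch below assumes SQLite purely for illustration; the source does not say which engine backs the catalog. The schema alias `pdb`, the tables `sample` and `entry`, and the ligand values are all hypothetical. In-memory databases stand in for what would be two local files.

```python
import sqlite3

# The main connection stands in for the lab catalog (assumed engine: SQLite).
conn = sqlite3.connect(":memory:")
# A second database stands in for the local PDB mirror file.
conn.execute("ATTACH DATABASE ':memory:' AS pdb")

# Hypothetical tables and toy rows, for illustration only.
conn.executescript("""
CREATE TABLE main.sample (sample_id TEXT PRIMARY KEY, ligand TEXT);
CREATE TABLE pdb.entry (entry_id TEXT PRIMARY KEY, ligand TEXT);
INSERT INTO main.sample VALUES ('S1', 'ATP'), ('S2', 'NADH'), ('S3', 'FAD');
INSERT INTO pdb.entry VALUES
  ('mirror-1', 'ATP'), ('mirror-2', 'ATP'), ('mirror-3', 'FAD');
""")

# A single local query spans both databases: how many public entries
# share each sample's ligand?
seen_before = conn.execute("""
SELECT s.sample_id, COUNT(e.entry_id) AS public_hits
FROM main.sample AS s
LEFT JOIN pdb.entry AS e ON e.ligand = s.ligand
GROUP BY s.sample_id
ORDER BY s.sample_id
""").fetchall()

print(seen_before)
```

No network call is involved at query time; keeping the mirror current is a separate ETL concern.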
What makes it answerable
| From | Slot |
|---|---|
| Sample | organism, molecular_composition, purity_percentage |
| ExperimentRun | technique, quality_metrics.resolution |
| WorkflowRun | workflow_type, software_name |
| associations | sample ↔ experiment ↔ study |

| Match on | Answers |
|---|---|
| sequence | novel vs. deposited |
| ligand / cofactor | complexes seen before |
| organism / EC | homolog coverage |
| resolution | do we beat the public entry? |
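The last row of the table, "do we beat the public entry?", can be sketched as a comparison of best resolutions per sample. This is a hedged example, not the project's actual query: `experiment_run` flattens `quality_metrics.resolution` into a plain `resolution` column, and `pdb_match` is a hypothetical table holding public entries already matched to each sample. Resolution is taken to be in ångströms, where a lower value is better.

```python
import sqlite3

# Hypothetical tables and toy values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment_run (
  run_id TEXT PRIMARY KEY, sample_id TEXT, technique TEXT, resolution REAL);
CREATE TABLE pdb_match (sample_id TEXT, entry_id TEXT, resolution REAL);
INSERT INTO experiment_run VALUES
  ('R1', 'S1', 'x-ray', 2.9),
  ('R2', 'S1', 'x-ray', 1.8),
  ('R3', 'S2', 'x-ray', 2.4);
INSERT INTO pdb_match VALUES
  ('S1', 'mirror-1', 2.1),
  ('S1', 'mirror-2', 2.5),
  ('S2', 'mirror-3', 2.0);
""")

# Best (lowest) resolution on each side, then compare per sample.
comparison = conn.execute("""
WITH ours AS (
  SELECT sample_id, MIN(resolution) AS best
  FROM experiment_run GROUP BY sample_id
),
theirs AS (
  SELECT sample_id, MIN(resolution) AS best
  FROM pdb_match GROUP BY sample_id
)
SELECT ours.sample_id,
       ours.best   AS our_best,
       theirs.best AS public_best,
       ours.best < theirs.best AS we_beat_public
FROM ours
JOIN theirs ON theirs.sample_id = ours.sample_id
ORDER BY ours.sample_id
""").fetchall()

for sample_id, our_best, public_best, beats in comparison:
    print(sample_id, our_best, public_best, bool(beats))
```

In this toy data the lab's 1.8 Å run on S1 beats the best public match at 2.1 Å, while S2 at 2.4 Å does not beat 2.0 Å. The inner join drops samples with no public match, which are the novel ones and have nothing to compare against.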
Payoff
A PI or bench scientist gets answers without a data engineer in the loop — and sees the generated query to keep it honest.
The PDB join turns "did anyone solve this?" from a literature search into a row count.
Surfacing duplicate preps and unused beamtime saves reagents, freezer space, and shifts.
Instantly list high-quality results that were never made public — a to-do list for use case 05.