Use Case 02 · Ask Your Data

Why the PDB join matters here

Your data on one side, the world's on the other

A lab accumulates hundreds of samples and experiment runs. On its own that tells you what you measured. Joined against a mirror of the PDB, it tells you something far more useful: what's novel, what duplicates existing structures, where your resolution beats the published entry, and which results you've never deposited.

Unlike the integrative case (04), the PDB is load-bearing here. The whole point is the comparison: every local Sample / ExperimentRun is matched against published entries by sequence, ligand, or organism — turning a private catalog into a map of where your science sits relative to the field.

Ask in plain language

Three questions, three answers

The agent translates each question into SQL over the LAMBDA-BER tables (plus the PDB mirror), runs it locally, and reports. The generated SQL is shown so the scientist can audit and refine it — the natural language is a front-end, not a black box.
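The "run it locally and show the SQL" step can be sketched as a thin wrapper that returns the query text together with its rows. This is a minimal illustration, not the agent's actual implementation: `run_audited` is a hypothetical helper, the prefix check is a deliberately simple read-only guard, and SQLite stands in for whatever engine hosts the real tables.

```python
import sqlite3

def run_audited(conn, sql, params=()):
    """Run one read-only query and return it alongside its rows for review."""
    # Hypothetical guard: accept only statements that begin with SELECT or WITH.
    if not sql.lstrip().lower().startswith(("select", "with")):
        raise ValueError("only read-only SELECT queries are allowed")
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    return {"sql": sql, "columns": columns, "rows": cur.fetchall()}

# Toy table standing in for the real catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_code TEXT)")
conn.execute("INSERT INTO sample (sample_code) VALUES ('S-001')")

report = run_audited(conn, "SELECT sample_code FROM sample")
print(report["sql"])   # the query the scientist can audit
print(report["rows"])  # the answer it produced
```

Returning the SQL with the result keeps the audit trail attached to every answer. A prefix check alone is a weak guard; opening the database read-only would be a stronger one.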

Q1 "Which of my purified samples diffracted better than 2.5 Å but were never deposited?"

    SELECT s.sample_code, e.experiment_code, e.resolution_ang
    FROM sample s
    JOIN experiment_sample_assoc a USING (sample_id)
    JOIN experiment_run e ON e.id = a.experiment_id
    LEFT JOIN pdb.entry p ON p.sequence_md5 = s.sequence_md5  -- PDB mirror
    WHERE e.technique = 'xray_crystallography'
      AND e.resolution_ang < 2.5
      AND p.pdb_id IS NULL;  -- no public match = undeposited

4 samples. Three are genuinely novel; one matches a deposited construct by sequence but at worse resolution than yours — a candidate to supersede.
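The Q1 pattern (a LEFT JOIN filtered on `IS NULL`, often called an anti-join) can be tried on toy data. The sketch below assumes SQLite, with a second in-memory database attached under the name `pdb` so that `pdb.entry` resolves; the rows, sample codes, and the PDB ID `XXXX` are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second database as "pdb" to mimic a local PDB mirror.
conn.execute("ATTACH DATABASE ':memory:' AS pdb")
conn.executescript("""
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_code TEXT, sequence_md5 TEXT);
CREATE TABLE experiment_run (id INTEGER PRIMARY KEY, experiment_code TEXT,
                             technique TEXT, resolution_ang REAL);
CREATE TABLE experiment_sample_assoc (sample_id INTEGER, experiment_id INTEGER);
CREATE TABLE pdb.entry (pdb_id TEXT, sequence_md5 TEXT);

-- Toy rows: S-A is sharp and unmatched, S-B is sharp but matched, S-C is too coarse.
INSERT INTO sample VALUES (1, 'S-A', 'aaa'), (2, 'S-B', 'bbb'), (3, 'S-C', 'ccc');
INSERT INTO experiment_run VALUES
  (10, 'E-10', 'xray_crystallography', 2.1),
  (11, 'E-11', 'xray_crystallography', 1.8),
  (12, 'E-12', 'xray_crystallography', 3.0);
INSERT INTO experiment_sample_assoc VALUES (1, 10), (2, 11), (3, 12);
INSERT INTO pdb.entry VALUES ('XXXX', 'bbb');
""")

rows = conn.execute("""
    SELECT s.sample_code, e.experiment_code, e.resolution_ang
    FROM sample s
    JOIN experiment_sample_assoc a USING (sample_id)
    JOIN experiment_run e ON e.id = a.experiment_id
    LEFT JOIN pdb.entry p ON p.sequence_md5 = s.sequence_md5
    WHERE e.technique = 'xray_crystallography'
      AND e.resolution_ang < 2.5
      AND p.pdb_id IS NULL
""").fetchall()
print(rows)
```

Only the sample with no sequence match in the mirror survives the filter. A sample whose sequence does match a public entry is excluded by `p.pdb_id IS NULL`, so spotting entries you could supersede needs a separate comparison on resolution.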

Q2 "Have we ever collected SAXS on anything homologous to this kinase?"

    SELECT s.sample_code, e.experiment_code, p.pdb_id
    FROM pdb.entry p
    JOIN sample s ON s.organism = p.organism
    JOIN experiment_sample_assoc a USING (sample_id)
    JOIN experiment_run e ON e.id = a.experiment_id
    WHERE p.ec_number LIKE '2.7.11.%'  -- protein kinases
      AND e.technique = 'saxs';

2 runs on a homologous kinase, both with usable Rg — a starting envelope you already own rather than new beamtime.

Q3 "Show me duplicate sample prep across studies so we stop re-purifying."

    SELECT s.molecular_composition_hash,
           count(*) AS n,
           array_agg(s.sample_code) AS samples
    FROM sample s
    GROUP BY s.molecular_composition_hash
    HAVING count(*) > 1;

6 clusters. The CHD1 construct was independently prepared in three studies — consolidate the protocol and the freezer stock.
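Grouping on `molecular_composition_hash` only finds duplicates if identical preps hash identically. How the real hash is built is not stated here; the sketch below assumes one plausible scheme (SHA-256 over key-sorted JSON) and mirrors the `GROUP BY ... HAVING count(*) > 1` step in plain Python. `composition_hash`, `duplicate_clusters`, and the sample records are hypothetical.

```python
import hashlib
import json
from collections import defaultdict

def composition_hash(composition):
    """Hash a composition dict so that field order does not change the result."""
    # Assumption: canonical form is compact JSON with sorted keys.
    canonical = json.dumps(composition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def duplicate_clusters(samples):
    """Return lists of sample codes that share a composition hash."""
    clusters = defaultdict(list)
    for code, composition in samples:
        clusters[composition_hash(composition)].append(code)
    # Equivalent of HAVING count(*) > 1.
    return [sorted(codes) for codes in clusters.values() if len(codes) > 1]

# Toy records: S-1 and S-2 describe the same prep with fields in a different order.
samples = [
    ("S-1", {"protein": "CHD1", "buffer": "HEPES"}),
    ("S-2", {"buffer": "HEPES", "protein": "CHD1"}),
    ("S-3", {"protein": "other", "buffer": "Tris"}),
]
dupes = duplicate_clusters(samples)
print(dupes)
```

Canonicalizing before hashing is the design choice that matters: without it, two studies recording the same prep with fields in a different order would land in different groups.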

What makes it answerable

The schema is already query-shaped

Local columns the questions hit

From	Slot
Sample	`organism`, `molecular_composition`, `purity_percentage`
ExperimentRun	`technique`, `quality_metrics.resolution`
WorkflowRun	`workflow_type`, `software_name`
associations	sample ↔ experiment ↔ study

Bridges to the PDB

Match on	Answers
sequence	novel vs. deposited
ligand / cofactor	complexes seen before
organism / EC	homolog coverage
resolution	do we beat the public entry?
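The resolution bridge can be sketched as an inner join on sequence that keeps only rows where the local value is numerically lower (sharper) than the public one. This assumes SQLite with an attached `pdb` database and a hypothetical `resolution_ang` column on `pdb.entry`; the real mirror's column name is not given here, and all rows and IDs are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS pdb")
conn.executescript("""
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sample_code TEXT, sequence_md5 TEXT);
CREATE TABLE experiment_run (id INTEGER PRIMARY KEY, resolution_ang REAL);
CREATE TABLE experiment_sample_assoc (sample_id INTEGER, experiment_id INTEGER);
-- resolution_ang on pdb.entry is an assumed column name.
CREATE TABLE pdb.entry (pdb_id TEXT, sequence_md5 TEXT, resolution_ang REAL);

INSERT INTO sample VALUES (1, 'S-A', 'aaa'), (2, 'S-B', 'bbb');
INSERT INTO experiment_run VALUES (10, 1.8), (11, 2.9);
INSERT INTO experiment_sample_assoc VALUES (1, 10), (2, 11);
INSERT INTO pdb.entry VALUES ('XXXX', 'aaa', 2.4), ('YYYY', 'bbb', 2.0);
""")

better = conn.execute("""
    SELECT s.sample_code, e.resolution_ang AS ours,
           p.pdb_id, p.resolution_ang AS published
    FROM sample s
    JOIN experiment_sample_assoc a USING (sample_id)
    JOIN experiment_run e ON e.id = a.experiment_id
    JOIN pdb.entry p ON p.sequence_md5 = s.sequence_md5
    WHERE e.resolution_ang < p.resolution_ang  -- lower value in angstroms is better
""").fetchall()
print(better)
```

Here the join is an inner join, so only samples with a public sequence match are compared; unmatched samples belong to the novelty question instead.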

Payoff

What asking-your-data buys

[ ? ] No SQL tax

A PI or bench scientist gets answers without a data engineer in the loop — and sees the generated query to keep it honest.

[ ⌘ ] Novelty at a glance

The PDB join turns "did anyone solve this?" from a literature search into a row count.

[ ⊟ ] Stop redundant work

Surfacing duplicate preps and unused beamtime saves reagents, freezer space, and shifts.

[ ↧ ] Find your deposition gaps

Instantly list high-quality results that were never made public — a to-do list for use case 05.

Ask your data:
your holdings, joined to the PDB
