lambda-ber-schema
lambda-ber-schema is a comprehensive schema for representing multimodal structural biology imaging data, from atomic-resolution structures to tissue-level organization. It supports diverse experimental techniques including cryo-EM, X-ray crystallography, SAXS/SANS, fluorescence microscopy, and spectroscopic imaging.
NOTE: This schema was developed rapidly using AI assistance, so there may be mistakes!
Schema Organization
The schema follows a relational design with flat entity collections and explicit association tables for many-to-many relationships. This maps cleanly to SQL databases while supporting flexible data reuse across studies.
The top-level entity is a Dataset, which serves as a container for related research. A dataset might represent all data from a specific grant, collaboration, or publication.
Entity Tables
All entities are stored in flat collections at the Dataset level:
Biological Materials
- Samples: The biological specimens being studied (proteins, nucleic acids, complexes, cells, tissues). Each sample includes detailed molecular composition, buffer conditions, and storage information. For example, a sample record for a purified protein includes its sequence, concentration, and buffer pH.
- Sample Preparations: How samples were prepared for specific techniques. This includes cryo-EM grid preparation (vitrification parameters), crystallization conditions for X-ray studies, or staining protocols for fluorescence microscopy.
Data Collection
- Instruments: The equipment used, from Titan Krios microscopes to synchrotron beamlines. Each instrument type (CryoEMInstrument, XRayInstrument, SAXSInstrument) has specific parameters like accelerating voltage, detector type, or beam energy.
- Experiment Runs: Individual data collection sessions. An experiment run captures when, how, and under what conditions data was collected, including quality metrics like resolution and completeness.
Data Processing
- Workflow Runs: Computational processing steps applied to raw data. This includes motion correction for cryo-EM movies, 3D reconstruction, model building, or phase determination for crystallography. Each workflow tracks the software used, parameters, and computational resources.
Data Products
- Data Files: Any files generated or used, from raw data to final models. Each file is tracked with checksums for data integrity and typed (micrograph, particles, volume, model).
- Images: Specialized classes for different imaging modalities:
  - Image2D: Micrographs, diffraction patterns
  - Image3D: 3D reconstructions, tomograms
  - FTIRImage: Molecular composition maps from infrared spectroscopy
  - FluorescenceImage: Fluorophore-labeled cellular components
  - OpticalImage: Brightfield/phase contrast microscopy
  - XRFImage: Elemental distribution maps
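The checksum tracking described for Data Files can be sketched in Python as follows. This is only an illustration, not part of the schema tooling: the helper names `sha256_checksum` and `describe_data_file` are hypothetical, the record is a plain dict keyed by slot names from the Slots table (`file_name`, `file_path`, `file_size_bytes`, `data_type`, `checksum`), and the `data_type` value passed in is assumed to be one of the file types mentioned above.

```python
import hashlib
from pathlib import Path


def sha256_checksum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_data_file(path, data_type):
    """Build a DataFile-like record (hypothetical helper, plain dict)."""
    path = Path(path)
    return {
        "file_name": path.name,
        "file_path": str(path),
        "file_size_bytes": path.stat().st_size,
        "data_type": data_type,  # assumed value, e.g. "micrograph"
        "checksum": sha256_checksum(path),
    }
```

Reading in chunks keeps memory use flat for large files such as raw movie stacks.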
Logical Groupings
- Studies: Lightweight groupings representing focused investigations of specific biological questions. For example, a study might investigate "Heat stress response in Arabidopsis" or "Structure of the human ribosome under different conditions."
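To make the flat-collection layout concrete, here is a minimal Python sketch using dataclasses. It is an assumption-laden illustration rather than the generated data model: only a handful of slots are shown, the attribute sets of each class are guessed from the Slots table, and the values `"protein"` and `"cryo_em"` are hypothetical enum values. The point it shows is that entities sit side by side under the Dataset and refer to each other by id instead of being nested.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    id: str
    sample_code: str
    sample_type: str


@dataclass
class SamplePreparation:
    id: str
    preparation_type: str
    sample_id: str  # reference to a Sample by id, not a nested Sample


@dataclass
class Dataset:
    id: str
    title: str
    # Flat collections: every entity lives directly under the Dataset.
    samples: List[Sample] = field(default_factory=list)
    sample_preparations: List[SamplePreparation] = field(default_factory=list)

    def sample_by_id(self, sample_id: str) -> Optional[Sample]:
        """Resolve an id reference against the flat samples collection."""
        return next((s for s in self.samples if s.id == sample_id), None)


# Hypothetical identifiers and values, for illustration only.
dataset = Dataset(id="dataset:demo", title="Demo dataset")
dataset.samples.append(
    Sample(id="sample:1", sample_code="S-001", sample_type="protein")
)
dataset.sample_preparations.append(
    SamplePreparation(id="prep:1", preparation_type="cryo_em", sample_id="sample:1")
)

prep = dataset.sample_preparations[0]
sample = dataset.sample_by_id(prep.sample_id)
```

Because references are plain ids, the same sample can be pointed to by any number of preparations without duplicating its record.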
Association Tables
Many-to-many relationships are represented via explicit association tables, which can carry relationship metadata (e.g., the role of a sample in an experiment):
- StudySampleAssociation: Links samples to studies (with role: target, control, reference)
- StudyExperimentAssociation: Links experiments to studies
- StudyWorkflowAssociation: Links workflows to studies
- ExperimentSampleAssociation: Links samples to experiments (with role and preparation used)
- ExperimentInstrumentAssociation: Links instruments to experiments (with role: primary, detector)
- WorkflowExperimentAssociation: Links source experiments to workflows
- WorkflowInputAssociation: Links input files to workflows
- WorkflowOutputAssociation: Links output files to workflows
This relational design enables:
- Sample reuse: The same sample can be used in multiple studies and experiments
- Multi-instrument experiments: An experiment can use multiple instruments with different roles
- Integrative workflows: A workflow can combine data from multiple experiments
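As a rough illustration of how an association table carries relationship metadata, the sketch below models StudySampleAssociation in an in-memory SQLite database. The table and column names (`study_sample_association`, `study_id`, `sample_id`, `role`) are assumptions chosen for this example, not the schema's generated SQL, and the study titles and identifiers are hypothetical. Only the role values (target, control, reference) come from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (id TEXT PRIMARY KEY, sample_code TEXT NOT NULL);
    CREATE TABLE study (id TEXT PRIMARY KEY, title TEXT NOT NULL);
    -- Association table: one row per (study, sample) link, plus its role.
    CREATE TABLE study_sample_association (
        study_id TEXT NOT NULL REFERENCES study(id),
        sample_id TEXT NOT NULL REFERENCES sample(id),
        role TEXT CHECK (role IN ('target', 'control', 'reference')),
        PRIMARY KEY (study_id, sample_id)
    );
    """
)
conn.execute("INSERT INTO sample VALUES (?, ?)", ("sample:1", "S-001"))
conn.executemany(
    "INSERT INTO study VALUES (?, ?)",
    [("study:1", "Complex structure at pH 7"),
     ("study:2", "Complex structure under heat stress")],
)
# The same sample is reused in two studies with different roles.
conn.executemany(
    "INSERT INTO study_sample_association VALUES (?, ?, ?)",
    [("study:1", "sample:1", "target"),
     ("study:2", "sample:1", "reference")],
)


def studies_for_sample(connection, sample_id):
    """Return (study title, role) pairs for one sample, ordered by study id."""
    rows = connection.execute(
        """
        SELECT study.title, assoc.role
        FROM study_sample_association AS assoc
        JOIN study ON study.id = assoc.study_id
        WHERE assoc.sample_id = ?
        ORDER BY study.id
        """,
        (sample_id,),
    )
    return rows.fetchall()
```

Calling `studies_for_sample(conn, "sample:1")` returns both studies with their respective roles, which is the sample-reuse case listed above.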
Example Usage
A typical cryo-EM study of a protein complex would include:
- Sample records for the purified complex with molecular weight and buffer composition
- Grid preparation details with vitrification parameters
- Microscope specifications and data collection parameters
- Processing workflows from motion correction through 3D refinement
- Final reconstructed volumes and fitted atomic models
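For a cryo-EM pipeline like the one listed above, the WorkflowInputAssociation and WorkflowOutputAssociation tables are what allow a final volume to be traced back to its raw data. The sketch below is a simplified assumption: each association is reduced to a `(workflow_id, file_id)` pair, all identifiers are hypothetical, and each file is assumed to be produced by at most one workflow.

```python
# Hypothetical association rows as (workflow_id, file_id) pairs.
workflow_inputs = [
    ("wf:motioncorr", "file:movies"),
    ("wf:refine3d", "file:micrographs"),
]
workflow_outputs = [
    ("wf:motioncorr", "file:micrographs"),
    ("wf:refine3d", "file:volume"),
]


def upstream_files(file_id, inputs, outputs):
    """Return every file the given file was derived from, nearest first."""
    # Map each output file to the workflow that produced it.
    produced_by = {f: wf for wf, f in outputs}
    # Map each workflow to the list of files it consumed.
    consumed = {}
    for wf, f in inputs:
        consumed.setdefault(wf, []).append(f)

    seen = []
    stack = [file_id]
    while stack:
        current = stack.pop()
        wf = produced_by.get(current)
        if wf is None:
            continue  # raw file: no producing workflow recorded
        for parent in consumed.get(wf, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


lineage = upstream_files("file:volume", workflow_inputs, workflow_outputs)
```

Here `lineage` lists the motion-corrected micrographs first and the raw movies second, walking the two workflow steps in reverse.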
A multimodal plant imaging study might combine:
- Whole plant optical imaging for morphology
- XRF imaging to map nutrient distribution
- FTIR spectroscopy to identify stress-related molecular changes
- Fluorescence microscopy to track specific protein responses
- Cryo-EM of isolated organelles for ultrastructural details
Key Features
- Relational design: Flat entity tables with explicit association tables for M:N relationships
- SQL-friendly: Maps directly to normalized database tables
- Technique-agnostic core: The same core classes describe data from all supported techniques, with technique-specific subclasses for instruments and preparations
- Rich metadata: Comprehensive tracking from sample to structure
- Workflow provenance: Each workflow run records its software, parameters, compute resources, and input and output files to support computational reproducibility
- Multimodal support: Integrates data across scales and techniques
- Standards-compliant: Follows FAIR principles and integrates with existing ontologies
URI: https://w3id.org/lambda-ber-schema/
Name: lambda-ber-schema-schema
Classes
| Class | Description |
|---|---|
| BufferComposition | Buffer composition for sample storage |
| ComputeResources | Computational resources used |
| DataCollectionStrategy | Strategy for data collection |
| ExperimentalConditions | Environmental and experimental conditions |
| ImageFeature | |
| MolecularComposition | Molecular composition of a sample |
| NamedThing | A named thing |
| DataFile | A data file generated or used in the study |
| Dataset | A collection of studies |
| ExperimentRun | An experimental data collection session |
| Image | An image file from structural biology experiments |
| FTIRImage | Fourier Transform Infrared (FTIR) spectroscopy image capturing molecular comp... |
| Image2D | A 2D image (micrograph, diffraction pattern) |
| FluorescenceImage | Fluorescence microscopy image capturing specific molecular targets through fl... |
| OpticalImage | Visible light optical microscopy or photography image |
| XRFImage | X-ray fluorescence (XRF) image showing elemental distribution |
| Image3D | A 3D volume or tomogram |
| Instrument | An instrument used to collect data |
| CryoEMInstrument | Cryo-EM microscope specifications |
| SAXSInstrument | SAXS/WAXS instrument specifications |
| XRayInstrument | X-ray diffractometer or synchrotron beamline specifications |
| Sample | A biological sample used in structural biology experiments |
| SamplePreparation | A process that prepares a sample for imaging |
| Study | |
| WorkflowRun | A computational processing workflow execution |
| OntologyTerm | |
| QualityMetrics | Quality metrics for experiments |
| StorageConditions | Storage conditions for samples |
| TechniqueSpecificPreparation | Base class for technique-specific preparation details |
| CryoEMPreparation | Cryo-EM specific sample preparation |
| SAXSPreparation | SAXS/WAXS specific preparation |
| XRayPreparation | X-ray crystallography specific preparation |
Slots
| Slot | Description |
|---|---|
| accelerating_voltage | Accelerating voltage in kV |
| acquisition_date | Date image was acquired |
| additives | Additional additives in the buffer |
| apodization_function | Mathematical function used for apodization |
| astigmatism | Astigmatism value |
| atmosphere | Storage atmosphere conditions |
| autoloader_capacity | Number of grids the autoloader can hold |
| background_correction | Method used for background correction |
| beam_energy | X-ray beam energy in keV |
| beam_size | X-ray beam size in micrometers |
| beam_size_max | Maximum beam size in micrometers |
| beam_size_min | Minimum beam size in micrometers |
| blot_force | Blotting force setting |
| blot_time | Blotting time in seconds |
| buffer_composition | Buffer composition including pH, salts, additives |
| buffer_matching_protocol | Protocol for buffer matching |
| calibration_standard | Reference standard used for calibration |
| cell_path_length | Path length in mm |
| chamber_temperature | Chamber temperature in Celsius |
| channel_name | Name of the fluorescence channel |
| checksum | SHA-256 checksum for data integrity |
| collection_mode | Mode of data collection |
| color_channels | Color channels present |
| completed_at | Workflow completion time |
| completeness | Data completeness percentage |
| components | Buffer components and their concentrations |
| compute_resources | Computational resources used |
| concentration | Sample concentration in mg/mL or µM |
| concentration_series | Concentration values for series measurements |
| concentration_unit | Unit of concentration measurement |
| contrast_method | Contrast enhancement method used |
| cpu_hours | CPU hours used |
| creation_date | File creation date |
| cryoprotectant | Cryoprotectant used |
| cryoprotectant_concentration | Cryoprotectant concentration percentage |
| crystal_cooling_capability | Crystal cooling system available |
| crystal_size | Crystal dimensions in micrometers |
| crystallization_conditions | Detailed crystallization conditions |
| crystallization_method | Method used for crystallization |
| cs_corrector | Spherical aberration corrector present |
| current_status | Current operational status |
| data_collection_strategy | Strategy for data collection |
| data_files | |
| data_type | Type of data in the file |
| definition | |
| defocus | Defocus value in micrometers |
| description | |
| detector_dimensions | Detector dimensions in pixels |
| detector_distance_max | Maximum detector distance in mm |
| detector_distance_min | Minimum detector distance in mm |
| detector_type | Type of detector |
| dimensions_x | Image width in pixels |
| dimensions_y | Image height in pixels |
| dimensions_z | Image depth in pixels/slices |
| dose | Electron dose in e-/Ų |
| dose_per_frame | Dose per frame |
| duration | Storage duration |
| dwell_time | Dwell time per pixel in milliseconds |
| elements_measured | Elements detected and measured |
| emission_filter | Specifications of the emission filter |
| emission_wavelength | Emission wavelength in nanometers |
| energy_max | Maximum X-ray energy in keV |
| energy_min | Minimum X-ray energy in keV |
| excitation_filter | Specifications of the excitation filter |
| excitation_wavelength | Excitation wavelength in nanometers |
| experiment_code | Unique experiment identifier |
| experiment_date | Date of the experiment |
| experiment_id | Reference to the source experiment |
| experimental_conditions | Environmental and experimental conditions |
| exposure_time | Exposure time in seconds |
| file_format | File format |
| file_name | Name of the file |
| file_path | Path to the file |
| file_size_bytes | File size in bytes |
| flash_cooling_method | Flash cooling protocol |
| fluorophore | Name or type of fluorophore used |
| flux | Photon flux in photons/second |
| flux_density | Photon flux density in photons/s/mm² |
| frame_rate | Frames per second |
| goniometer_type | Type of goniometer |
| gpu_hours | GPU hours used |
| grid_type | Type of EM grid used |
| hole_size | Hole size in micrometers |
| humidity | Humidity percentage |
| humidity_percentage | Chamber humidity during vitrification |
| id | |
| illumination_type | Type of illumination (brightfield, darkfield, phase contrast, DIC) |
| images | |
| installation_date | Date of instrument installation |
| instrument_code | Unique identifier code for the instrument |
| instrument_id | Reference to the instrument used |
| instrument_runs | |
| keywords | |
| label | |
| laser_power | Laser power in milliwatts or percentage |
| ligands | Bound ligands or cofactors |
| magnification | Optical magnification factor |
| manufacturer | Instrument manufacturer |
| memory_gb | Maximum memory used in GB |
| model | Instrument model |
| modifications | Post-translational modifications or chemical modifications |
| molecular_composition | Description of molecular composition including sequences, modifications, liga... |
| molecular_signatures | Identified molecular signatures or peaks |
| molecular_weight | Molecular weight in kDa |
| monochromator_type | Type of monochromator |
| mounting_method | Crystal mounting method |
| number_of_scans | Number of scans averaged for the spectrum |
| numerical_aperture | Numerical aperture of the objective lens |
| ontology | |
| operator_id | Person who performed the preparation |
| output_files | Output files generated |
| parent_sample_id | Reference to parent sample for derivation tracking |
| ph | pH of the buffer |
| phase_plate | Phase plate available |
| pinhole_size | Pinhole size in Airy units for confocal microscopy |
| pixel_size | Pixel size in Angstroms |
| pixel_size_max | Maximum pixel size in Angstroms per pixel |
| pixel_size_min | Minimum pixel size in Angstroms per pixel |
| plasma_treatment | Plasma treatment details |
| preparation_date | Date of sample preparation |
| preparation_method | Method used to prepare the sample |
| preparation_type | Type of sample preparation |
| pressure | Pressure in kPa |
| processing_level | Processing level (0=raw, 1=corrected, 2=derived, 3=model) |
| processing_parameters | Parameters used in processing |
| processing_status | Current processing status |
| protocol_description | Detailed protocol description |
| purity_percentage | Sample purity as percentage |
| q_range_max | Maximum q value in inverse Angstroms |
| q_range_min | Minimum q value in inverse Angstroms |
| quality_metrics | Quality control metrics for the sample |
| quantum_yield | Quantum yield of the fluorophore |
| r_factor | R-factor for crystallography |
| raw_data_location | Location of raw data files |
| reconstruction_method | Method used for 3D reconstruction |
| resolution | Resolution in Angstroms |
| sample_cell_type | Type of sample cell used |
| sample_changer_capacity | Number of samples in automatic sample changer |
| sample_code | Unique identifier code for the sample |
| sample_id | Reference to the sample being prepared |
| sample_preparations | |
| sample_type | Type of biological sample |
| samples | |
| sequences | Amino acid or nucleotide sequences |
| signal_to_noise | Signal to noise ratio |
| software_name | Software used for processing |
| software_version | Software version |
| source_type | Type of X-ray source |
| spectral_resolution | Spectral resolution in cm⁻¹ |
| started_at | Workflow start time |
| storage_conditions | Storage conditions for the sample |
| storage_gb | Storage used in GB |
| studies | |
| support_film | Support film type |
| technique | Technique used for data collection |
| temperature | Storage temperature in Celsius |
| temperature_control | Temperature control settings |
| temperature_control_range | Temperature control range in Celsius |
| temperature_unit | Temperature unit |
| terms | |
| title | |
| total_dose | Total electron dose for cryo-EM |
| total_frames | Total number of frames/images |
| vitrification_method | Method used for vitrification |
| voxel_size | Voxel size in Angstroms |
| wavenumber_max | Maximum wavenumber in cm⁻¹ |
| wavenumber_min | Minimum wavenumber in cm⁻¹ |
| white_balance | White balance settings |
| workflow_code | Unique workflow identifier |
| workflow_runs | |
| workflow_type | Type of processing workflow |
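The `processing_level` slot encodes its meaning as small integers (0=raw, 1=corrected, 2=derived, 3=model). One way to work with that encoding in Python is an `IntEnum`, sketched below. The class name `ProcessingLevel`, the helper `files_at_or_above`, and the example records are all hypothetical; only the four integer codes come from the slot description.

```python
from enum import IntEnum


class ProcessingLevel(IntEnum):
    """Integer codes as documented for the processing_level slot."""
    RAW = 0
    CORRECTED = 1
    DERIVED = 2
    MODEL = 3


def files_at_or_above(files, level):
    """Keep records whose processing_level is at least the given level."""
    return [f for f in files if f["processing_level"] >= level]


# Hypothetical file records, for illustration only.
files = [
    {"file_name": "movies.tiff", "processing_level": 0},
    {"file_name": "micrographs.mrc", "processing_level": 1},
    {"file_name": "volume.mrc", "processing_level": 2},
    {"file_name": "model.cif", "processing_level": 3},
]
derived_or_later = files_at_or_above(files, ProcessingLevel.DERIVED)
```

Because `IntEnum` members compare as integers, stored values do not need to be converted before filtering.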
Enumerations
| Enumeration | Description |
|---|---|
| CollectionModeEnum | Data collection modes |
| ConcentrationUnitEnum | Units for concentration measurement |
| CrystallizationMethodEnum | Methods for protein crystallization |
| DataTypeEnum | Types of data |
| DetectorTypeEnum | Types of detectors for cryo-EM |
| FileFormatEnum | File formats |
| GridTypeEnum | Types of EM grids |
| IlluminationTypeEnum | Types of illumination for optical microscopy |
| InstrumentStatusEnum | Operational status of instruments |
| PreparationTypeEnum | Types of sample preparation |
| ProcessingStatusEnum | Processing status |
| SampleTypeEnum | Types of biological samples |
| TechniqueEnum | Structural biology techniques |
| TemperatureUnitEnum | Units for temperature measurement |
| VitrificationMethodEnum | Methods for vitrification |
| WorkflowTypeEnum | Types of processing workflows |
| XRaySourceTypeEnum | Types of X-ray sources |
Types
| Type | Description |
|---|---|
| Boolean | A binary (true or false) value |
| Curie | a compact URI |
| Date | a date (year, month and day) in an idealized calendar |
| DateOrDatetime | Either a date or a datetime |
| Datetime | The combination of a date and time |
| Decimal | A real number with arbitrary precision that conforms to the xsd:decimal speci... |
| Double | A real number that conforms to the xsd:double specification |
| Float | A real number that conforms to the xsd:float specification |
| Integer | An integer |
| Jsonpath | A string encoding a JSON Path |
| Jsonpointer | A string encoding a JSON Pointer |
| Ncname | Prefix part of CURIE |
| Nodeidentifier | A URI, CURIE or BNODE that represents a node in a model |
| Objectidentifier | A URI or CURIE that represents an object in the model |
| Sparqlpath | A string encoding a SPARQL Property Path |
| String | A character string |
| Time | A time object represents a (local) time of day, independent of any particular... |
| Uri | a complete URI |
| Uriorcurie | a URI or a CURIE |
Subsets
| Subset | Description |
|---|---|