REMBI Alignment: Integrating Biological Imaging Metadata Standards with lambda-ber-schema
Executive Summary
REMBI (Recommended Metadata for Biological Images) represents a community-driven standard for biological imaging metadata that shares significant overlap with lambda-ber-schema's goals for structural biology data management. Published in Nature Methods in 2021, REMBI provides guidelines for describing imaging experiments across light microscopy, electron microscopy, and X-ray microscopy—making it highly relevant to lambda-ber-schema's multi-modal approach. This document analyzes the alignment between REMBI and lambda-ber-schema, identifying opportunities for integration and mutual reinforcement of metadata standards in structural biology.
1. Introduction to REMBI
1.1 Background and Development
REMBI emerged from a critical need in the bioimaging community to standardize metadata for the vast amounts of imaging data being generated. Developed through a collaborative workshop in Hinxton (October 2019), REMBI brought together representatives from light, electron, and X-ray microscopy communities to establish common metadata guidelines.
Key Publication: Sarkans, U., Chiu, W., Collinson, L., et al. (2021). REMBI: Recommended Metadata for Biological Images—enabling reuse of microscopy data in biology. Nature Methods, 18, 1418–1422.
1.2 Core Objectives
REMBI aims to: - Enable systematic archiving of imaging data and metadata in public databases - Support FAIR (Findable, Accessible, Interoperable, Reusable) data principles - Facilitate automated data harvesting using machine learning techniques - Bridge diverse imaging communities with flexible yet comprehensive metadata standards
1.3 Adoption and Implementation
REMBI has been adopted by major imaging repositories: - BioImage Archive: Primary implementation platform - EMPIAR: Electron Microscopy Public Image Archive - Cell-IDR: Image Data Resource for cellular imaging - Tissue-IDR: Tissue-level imaging repository
2. REMBI's Eight-Component Structure
2.1 Hierarchical Organization
REMBI organizes metadata into eight high-level components that encompass diverse biological imaging methods:
REMBI_Structure:
1_Study:
description: "Top-level project metadata"
alignment: "Dublin Core, DataCite, schema.org"
2_Study_Component:
description: "Container for organizing related data"
example: "Separate components for EM and confocal in correlative studies"
3_Biosample:
description: "Biological sample metadata"
importance: "Critical for biological context"
4_Specimen:
description: "Specimen preparation details"
includes: "Fixation, staining, mounting"
5_Image_Acquisition:
description: "Microscopy technique and settings"
audience: "Imaging scientists and microscopists"
6_Image_Data:
description: "Image file metadata"
implementation: "File list with technical parameters"
7_Image_Correlation:
description: "Multi-modal image registration"
status: "Optional"
use_case: "Correlative microscopy"
8_Analyzed_Data:
description: "Image analysis and measurements"
status: "Optional"
format: "Typically tabular"
2.2 Target Audiences
REMBI addresses three primary user groups:
- Imaging Scientists: Need detailed acquisition parameters, physical properties of instruments
- Computer Vision Researchers: Require ground truth data, segmentations, labeled datasets
- Biologists: Focus on biological context, experimental conditions, sample preparation
3. Alignment with lambda-ber-schema
3.1 Structural Mapping
lambda-ber-schema and REMBI share substantial conceptual overlap, with complementary strengths:
| REMBI Component | lambda-ber-schema Equivalent | Alignment Level | Notes |
|---|---|---|---|
| Study | Study | High | Direct conceptual match |
| Study Component | ExperimentRun | High | Both organize data collection sessions |
| Biosample | Sample | Very High | Core biological material description |
| Specimen | SamplePreparation | Very High | Preparation protocols and methods |
| Image Acquisition | Instrument + ExperimentRun | High | lambda-ber-schema separates instrument from run |
| Image Data | Image (2D/3D) + DataFile | High | lambda-ber-schema has specialized image classes |
| Image Correlation | Cross-modal relationships | Medium | lambda-ber-schema handles via Study relationships |
| Analyzed Data | WorkflowRun + DataFile | High | Processing and derived data |
3.2 Complementary Strengths
REMBI Strengths for lambda-ber-schema
- Comprehensive Imaging Focus: REMBI provides deeper metadata for microscopy-specific parameters
- Community Consensus: Broad acceptance across imaging communities
- Correlation Metadata: Explicit support for multi-modal image registration
- Machine Learning Ready: Designed with AI/ML applications in mind
lambda-ber-schema Strengths for REMBI
- Structural Biology Depth: Specialized support for crystallography, SAXS/SANS, spectroscopy
- Workflow Provenance: Detailed computational processing tracking
- Multi-Scale Integration: From atomic to tissue-level organization
- LinkML Foundation: Semantic web compatibility and validation
3.3 Integration Opportunities
Integration_Strategy:
metadata_harmonization:
- Map REMBI fields to lambda-ber-schema schema
- Extend lambda-ber-schema with REMBI-specific attributes
- Create bidirectional converters
shared_vocabularies:
- Adopt REMBI controlled terms for microscopy
- Contribute structural biology terms to REMBI
- Align with common ontologies (UBERON, CL, etc.)
tool_ecosystem:
- Support REMBI export from lambda-ber-schema
- Import REMBI-compliant data
- Validation across both standards
4. Specific Relevance to Structural Biology
4.1 Cryo-EM Applications
REMBI's adoption by EMPIAR creates direct relevance for structural biology:
Case Study: EMPIAR-10061 - Dataset: β-galactosidase raw cryo-EM data (12.4 TB) - Original resolution: 2.2 Å - Reuse examples: - Reprocessed to higher resolution by multiple groups - Used for algorithm development - Training data for deep learning particle picking - Cloud processing pipeline development
This demonstrates how REMBI metadata enables extensive data reuse in structural biology.
4.2 Correlative Microscopy
REMBI's Image Correlation component is particularly valuable for multi-modal structural biology:
Correlative_Example:
technique_1:
type: "Cryo-EM"
resolution: "3-5 Å"
provides: "Molecular structure"
technique_2:
type: "Fluorescence"
resolution: "200-300 nm"
provides: "Cellular context"
correlation_metadata:
transformation_matrix: "3x4 affine"
registration_error: "50 nm"
fiducial_markers: "Gold particles"
lambda-ber-schema_extension:
add_rembi_correlation: true
maintain_provenance: true
cross_validate: true
4.3 Light Microscopy Integration
Many structural biology studies now incorporate light microscopy for context:
- Live cell imaging before vitrification
- Fluorescence-guided lamella preparation
- CLEM (Correlative Light and Electron Microscopy)
- Super-resolution microscopy for validation
REMBI provides the metadata framework for these techniques that lambda-ber-schema can leverage.
5. Implementation Recommendations
5.1 Short-Term Integration (0-6 months)
Metadata Mapping
Create explicit mappings between REMBI and lambda-ber-schema:
class REMBIlambda-ber-schemaMapper:
"""Bidirectional mapper between REMBI and lambda-ber-schema schemas"""
def rembi_to_lambda-ber-schema(self, rembi_data):
lambda-ber-schema_study = {
'title': rembi_data['study']['title'],
'description': rembi_data['study']['description'],
'samples': self.map_biosamples(rembi_data['biosample']),
'sample_preparations': self.map_specimen(rembi_data['specimen']),
'instrument_runs': self.map_acquisition(rembi_data['image_acquisition']),
'images': self.map_image_data(rembi_data['image_data'])
}
# Handle optional correlation data
if 'image_correlation' in rembi_data:
lambda-ber-schema_study['cross_modal_registration'] = \
self.map_correlation(rembi_data['image_correlation'])
return lambda-ber-schema_study
def lambda-ber-schema_to_rembi(self, lambda-ber-schema_data):
# Inverse mapping for REMBI compliance
pass
Vocabulary Alignment
Harmonize controlled vocabularies:
Vocabulary_Mapping:
sample_types:
rembi: ["cell", "tissue", "organism", "molecule"]
lambda-ber-schema: ["protein", "nucleic_acid", "complex", "cell", "tissue"]
unified: ["protein", "nucleic_acid", "complex", "cell", "tissue", "organism", "small_molecule"]
preparation_methods:
rembi: ["fixation", "staining", "mounting"]
lambda-ber-schema: ["vitrification", "crystallization", "purification"]
unified: # Combine both sets with clear categories
imaging_techniques:
rembi: ["brightfield", "fluorescence", "confocal", "EM", "super-resolution"]
lambda-ber-schema: ["cryo_em", "x_ray", "saxs", "sans", "ftir", "xrf"]
unified: # Comprehensive technique ontology
5.2 Medium-Term Development (6-12 months)
Schema Extension
Extend lambda-ber-schema to incorporate REMBI-specific fields:
# Addition to lambda-ber-schema schema
ImageCorrelation:
is_a: AttributeGroup
description: "REMBI-compatible image correlation metadata"
attributes:
source_image:
range: Image
description: "Reference image for alignment"
target_image:
range: Image
description: "Image to be aligned"
transformation_type:
range: string
description: "Type of transformation (affine, deformable, etc.)"
transformation_parameters:
range: float
multivalued: true
description: "Transformation matrix or parameters"
registration_error:
range: float
description: "Registration accuracy in nm"
fiducial_markers:
range: string
description: "Type of fiducial markers used"
correlation_coefficient:
range: float
description: "Correlation quality metric"
Validation Framework
Implement cross-standard validation:
class CrossStandardValidator:
"""Validate data against both REMBI and lambda-ber-schema"""
def validate(self, data):
results = {
'lambda-ber-schema_valid': self.validate_lambda-ber-schema(data),
'rembi_valid': self.validate_rembi(data),
'cross_compatible': self.check_compatibility(data),
'warnings': [],
'errors': []
}
# Check for conflicts between standards
if results['lambda-ber-schema_valid'] and not results['rembi_valid']:
results['warnings'].append(
"Data valid for lambda-ber-schema but missing REMBI requirements"
)
return results
5.3 Long-Term Vision (1-2 years)
Unified Imaging Metadata Standard
Work toward a unified standard that combines strengths:
Future_Standard:
name: "Unified Biological Imaging Metadata (UBIM)"
foundation: "LinkML"
components:
- structural_biology: "From lambda-ber-schema"
- light_microscopy: "From REMBI"
- image_correlation: "From REMBI"
- workflow_provenance: "From lambda-ber-schema"
- ai_readiness: "From both"
governance:
- committee: "Joint REMBI-lambda-ber-schema working group"
- stakeholders: ["wwPDB", "EMDB", "BioImage Archive", "IDR"]
- update_cycle: "Annual"
6. Practical Examples
6.1 Example: Cryo-CLEM Study
Combining cryo-EM with fluorescence microscopy:
Study:
title: "Mitochondrial ATP synthase in situ structure"
# REMBI-style biosample
biosample:
organism: "Homo sapiens"
cell_type: "HeLa"
organelle: "Mitochondria"
# REMBI-style specimen preparation
specimen:
fixation: "Vitrification"
temperature: "-180°C"
fluorescent_label: "MitoTracker Green"
# lambda-ber-schema-style experiment runs
experiments:
- technique: "fluorescence_microscopy"
rembi_metadata:
excitation_wavelength: "488 nm"
emission_wavelength: "510 nm"
objective: "100x/1.4 NA"
- technique: "cryo_em"
voltage: "300 kV"
detector: "K3"
pixel_size: "0.83 Å"
# REMBI image correlation
correlation:
method: "Fiducial-based"
markers: "100 nm gold"
accuracy: "50 nm"
software: "ec-CLEM"
6.2 Example: Multi-Scale Plant Imaging
From organ to molecular level:
Study:
title: "Chloroplast structure under stress"
study_components:
- name: "Whole leaf imaging"
technique: "optical_microscopy"
rembi_compliance: true
scale: "mm"
- name: "Cellular imaging"
technique: "confocal_microscopy"
rembi_compliance: true
scale: "μm"
- name: "Chloroplast ultrastructure"
technique: "cryo_em_tomography"
lambda-ber-schema_compliance: true
scale: "nm"
- name: "Protein structure"
technique: "single_particle_cryoem"
lambda-ber-schema_compliance: true
scale: "Å"
cross_scale_correlation:
registration_method: "Hierarchical"
coordinate_system: "Unified 3D"
metadata_standard: "REMBI+lambda-ber-schema"
7. Benefits of Integration
7.1 For the Structural Biology Community
- Broader Data Integration: Seamlessly combine molecular structures with cellular context
- Enhanced Discoverability: REMBI-compliant data more findable in imaging repositories
- Improved Reproducibility: Comprehensive metadata across scales
- AI/ML Readiness: Standardized metadata for training algorithms
7.2 For the Imaging Community
- Structural Context: Link cellular images to molecular structures
- Advanced Analytics: Leverage structural biology computational tools
- Multi-Modal Workflows: Streamlined correlative approaches
- Validation Frameworks: Benefit from structural biology's rigorous validation
7.3 For Data Repositories
- Interoperability: Exchange data between EMPIAR, BioImage Archive, and others
- Reduced Redundancy: Shared metadata models reduce duplication
- Unified Submission: Single pipeline for multi-modal studies
- Enhanced Services: Offer cross-repository search and analysis
8. Challenges and Mitigation
8.1 Technical Challenges
| Challenge | Mitigation Strategy |
|---|---|
| Schema complexity | Provide user-friendly tools and wizards |
| Vocabulary conflicts | Create mapping tables and conversion tools |
| Validation overhead | Implement automated validation pipelines |
| Storage requirements | Use hierarchical storage with metadata separation |
8.2 Community Challenges
| Challenge | Mitigation Strategy |
|---|---|
| Adoption resistance | Demonstrate clear benefits through case studies |
| Training needs | Develop comprehensive educational materials |
| Legacy data | Provide migration tools and services |
| Governance | Establish joint committees with clear mandates |
9. Implementation Roadmap
Phase 1: Foundation (Months 1-3)
- Establish working group with REMBI representatives
- Create detailed field mapping documentation
- Develop proof-of-concept converter tools
- Identify pilot projects for testing
Phase 2: Integration (Months 4-9)
- Implement bidirectional converters
- Extend lambda-ber-schema schema with REMBI fields
- Develop validation frameworks
- Launch pilot projects with early adopters
Phase 3: Deployment (Months 10-12)
- Release integrated tools and documentation
- Train repository staff and users
- Process feedback and iterate
- Plan long-term governance structure
Phase 4: Maturation (Year 2+)
- Establish formal partnership with REMBI
- Contribute to standard evolution
- Expand adoption across communities
- Develop advanced integration features
10. Conclusions
10.1 Strategic Importance
The alignment between REMBI and lambda-ber-schema represents a critical opportunity to bridge the imaging and structural biology communities. With REMBI's broad adoption in biological imaging and lambda-ber-schema's comprehensive structural biology support, integration of these standards will enable unprecedented multi-scale, multi-modal studies.
10.2 Mutual Benefits
Both standards benefit from integration: - REMBI gains access to structural biology's sophisticated data models and validation frameworks - lambda-ber-schema gains established imaging metadata standards and community adoption - The scientific community gains a more complete picture from molecules to organisms
10.3 Path Forward
The path to integration is clear and achievable: 1. Start with metadata mapping and vocabulary alignment 2. Build practical tools for data conversion 3. Demonstrate value through pilot projects 4. Formalize collaboration through governance structures 5. Work toward eventual standard convergence
10.4 Call to Action
We recommend: 1. Immediate: Contact REMBI maintainers to discuss collaboration 2. Short-term: Implement REMBI support in lambda-ber-schema tools 3. Medium-term: Joint workshops and training materials 4. Long-term: Co-develop next-generation unified standards
The convergence of REMBI and lambda-ber-schema will accelerate scientific discovery by enabling researchers to seamlessly navigate from cellular contexts to atomic details, ultimately advancing our understanding of biological systems across all scales.
References
-
Sarkans, U., et al. (2021). REMBI: Recommended Metadata for Biological Images—enabling reuse of microscopy data in biology. Nature Methods, 18, 1418–1422.
-
REMBI Overview. European Bioinformatics Institute. https://www.ebi.ac.uk/bioimage-archive/rembi-help-overview/
-
Iudin, A., et al. (2016). EMPIAR: a public archive for raw electron microscopy image data. Nature Methods, 13, 387–388.
-
Williams, E., et al. (2017). The Image Data Resource: A Bioimage Data Integration and Publication Platform. Nature Methods, 14, 775–781.
-
Hartley, M., et al. (2022). The BioImage Archive – Building a Home for Life-Sciences Microscopy Data. Journal of Molecular Biology, 434(11), 167505.