SimpleScattering GluRS Dataset Analysis

Dataset Overview

Source: SimpleScattering.com
Dataset ID: xsbhevph
URL: https://simplescattering.com/open_dataset/xsbhevph
Title: SEC-SAXS-MALS of Pseudomonas aeruginosa GluRS

Scientific Context

This dataset represents a comprehensive structural characterization of Glutamyl-tRNA synthetase (GluRS) from Pseudomonas aeruginosa using an integrated approach:

  1. SEC-SAXS: Size Exclusion Chromatography coupled with Small Angle X-ray Scattering
  2. MALS: Multi-Angle Light Scattering
  3. Crystallography: Complementary crystal structure (PDB: 8VC5)

Scientific Goal: Determine the oligomerization state of GluRS in solution

Data Structure

Raw Data

  • 660 SEC-SAXS frames: Individual scattering profiles collected during SEC elution
  • Format: ASCII text files with Q, I(Q), and Error columns
  • File pattern: GluRS_1_00001.dat to GluRS_1_00660.dat
  • Size: ~35 KB per frame
  • Total: 10.8 MB compressed
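
As a minimal sketch of how the raw frames described above might be enumerated and read, the following Python uses only the standard library. The function names are illustrative, and the handling of header or comment lines is an assumption, since the exact layout of the .dat files is not specified here.

```python
from pathlib import Path

N_FRAMES = 660  # number of SEC-SAXS frames stated in the dataset description


def frame_names(n_frames=N_FRAMES, prefix="GluRS_1_"):
    """Return the expected frame file names, GluRS_1_00001.dat onward."""
    return [f"{prefix}{i:05d}.dat" for i in range(1, n_frames + 1)]


def parse_frame(text):
    """Parse three-column ASCII text into (q, intensity, error) tuples.

    Lines that do not begin with three numeric fields (headers, comments,
    blank lines) are skipped; that header convention is an assumption.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            q, intensity, error = (float(x) for x in fields[:3])
        except ValueError:
            continue
        rows.append((q, intensity, error))
    return rows


def load_frame(path):
    """Read one frame file from disk and parse it."""
    return parse_frame(Path(path).read_text())
```

Calling load_frame on each name from frame_names(), joined to the directory where the archive was extracted, would give one list of rows per elution frame.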

Processed Data

  • Averaged profile: A_S_GluRS_1_5.dat (107 KB)
  • Merged data from peak fractions
  • Q range: 0.0109 to 0.4729 Å⁻¹
  • 913 data points

Structural Data

  • Crystal structure: 8vc5.pdb (1.3 MB)
  • Zinc-bound form of GluRS
  • Resolution data from crystallography
  • Deposited: 2023-12-13

Technical Parameters

Beamline Configuration

  • Facility: Advanced Light Source (ALS)
  • Beamline: SIBYLS BL12.3.1
  • Wavelength: 1.127 Å (11 keV)
  • Detector: Pilatus 3 2M
  • Sample-to-detector distance: 2.1 m
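
The wavelength and energy listed above are related by E = hc/λ, and the momentum transfer follows q = 4π sin(θ)/λ, where 2θ is the scattering angle. The sketch below checks those relations for the stated values; the function names are illustrative and not part of any beamline software.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV·Å


def wavelength_to_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength in Å (E = hc / λ)."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def q_to_two_theta_deg(q, wavelength_angstrom):
    """Scattering angle 2θ in degrees for momentum transfer q in Å⁻¹.

    Inverts q = 4π sin(θ) / λ.
    """
    theta = math.asin(q * wavelength_angstrom / (4.0 * math.pi))
    return math.degrees(2.0 * theta)


energy = wavelength_to_energy_kev(1.127)           # close to the stated 11 keV
max_angle = q_to_two_theta_deg(0.4729, 1.127)      # angle at the top of the Q range
```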

Data Collection

  • Technique: SEC-SAXS with inline MALS
  • Frames: 660 during elution
  • Q range: 0.0109 - 0.4729 Å⁻¹
  • Temperature: 20°C (standard)

lambda-ber-schema Representation

Key Design Decisions

  1. Dataset Structure

     Dataset
     └── Study: GluRS oligomerization
         ├── Sample: PA-GluRS protein
         ├── SamplePreparations: expression, purification, SEC-SAXS prep
         ├── ExperimentRun: SEC-SAXS-MALS collection
         ├── WorkflowRuns: SAXS processing, MALS analysis
         ├── DataFiles: frames, averaged data, structure, archive
         └── Images: chromatogram, Guinier plot

  2. Sample Tracking
     • Linked UniProt ID (Q9XCL6) for sequence reference
     • Expression system details (E. coli BL21(DE3))
     • Buffer composition for reproducibility
     • Concentration and preparation methods

  3. Multi-technique Integration
     • SEC-SAXS-MALS represented as single experiment run
     • Technique set to 'saxs' (primary method)
     • MALS processing as separate workflow
     • Crystal structure linked as complementary data

  4. Data Provenance
     • All 660 frames referenced (example shown)
     • Processing parameters captured
     • Software versions documented
     • Compute resources tracked
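
To make the hierarchy in the design decisions concrete, here is a rough Python sketch of the same nesting as a plain dictionary. The key names are hypothetical stand-ins chosen for readability; they are not taken from the lambda-ber-schema definition, whose actual slot names may differ.

```python
# Illustrative only: key names below are assumptions, not schema slot names.
dataset = {
    "study": {
        "title": "GluRS oligomerization",
        "samples": [
            {"name": "PA-GluRS protein", "uniprot_id": "Q9XCL6"},
        ],
        "sample_preparations": ["expression", "purification", "SEC-SAXS prep"],
        "experiment_runs": [
            # One run for the combined experiment, with SAXS as the technique.
            {"name": "SEC-SAXS-MALS collection", "technique": "saxs"},
        ],
        "workflow_runs": ["SAXS processing", "MALS analysis"],
        "data_files": ["frames", "averaged data", "structure", "archive"],
        "images": ["chromatogram", "Guinier plot"],
    }
}


def summarize(ds):
    """Count the entries in each list-valued section of the study."""
    study = ds["study"]
    return {key: len(value) for key, value in study.items()
            if isinstance(value, list)}
```

A real annotation would be written as YAML or JSON against the schema and validated there; the dictionary only mirrors the shape of the tree.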

Schema Extensions Required

  1. FileFormatEnum: Added zip for compressed archives
  2. Existing enums covered:
     • sec_saxs collection mode (already added)
     • saxs_analysis workflow type (already added)
     • ascii file format (already added)

Metadata Preservation

The lambda-ber-schema annotation preserves all critical metadata:

  • Experimental conditions (wavelength, detector, distance)
  • Processing parameters (Q range, buffer subtraction)
  • Quality metrics (estimated Rg ~32.5 Å, I(0) ~45)
  • Data lineage (raw frames → averaged profile)
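
Rg and I(0) values such as the estimates quoted above are typically obtained from a Guinier fit, ln I(q) = ln I(0) − (Rg²/3)·q², applied to the lowest-q points. The following is a minimal standard-library sketch of that fit, not the processing actually used for this dataset; the function names and the choice of an unweighted least-squares fit are assumptions.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line through (x, y); returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def guinier_fit(q, intensity, q_max):
    """Fit ln I against q^2 for points with q <= q_max.

    Returns (Rg, I0, n_points). Measurement errors are ignored here,
    which is a simplification.
    """
    points = [(qi * qi, math.log(ii))
              for qi, ii in zip(q, intensity)
              if qi <= q_max and ii > 0]
    if len(points) < 3:
        raise ValueError("too few points in the Guinier region")
    x, y = zip(*points)
    slope, intercept = linear_fit(x, y)
    if slope >= 0:
        raise ValueError("non-negative slope: no valid Guinier region")
    return math.sqrt(-3.0 * slope), math.exp(intercept), len(points)
```

A common rule of thumb for globular particles is to keep q_max·Rg at or below about 1.3, so for an Rg near 32.5 Å the fit would use points up to roughly q = 0.04 Å⁻¹.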

Comparison with Native Format

SimpleScattering Format

  • Web-based presentation
  • Minimal structured metadata
  • Files as separate downloads
  • No formal schema

lambda-ber-schema Advantages

  1. Structured metadata: Machine-readable YAML/JSON
  2. Complete provenance: Sample → Data → Analysis
  3. Validation: LinkML schema ensures data integrity
  4. Integration ready: Can combine with other techniques
  5. FAIR compliant: Findable, Accessible, Interoperable, Reusable

Use Cases Enabled

  1. Data Integration
     • Combine SEC-SAXS with crystallography
     • Cross-reference with other GluRS studies
     • Meta-analysis across synthetases

  2. Reproducibility
     • Complete experimental parameters
     • Processing workflow documented
     • Software versions tracked

  3. Machine Learning
     • Structured data for training
     • Quality metrics for filtering
     • Multi-modal feature extraction

  4. Data Discovery
     • Searchable by protein, technique, facility
     • Queryable quality metrics
     • Linked to public databases

Recommendations

  1. For Data Providers
     • Include buffer subtraction details
     • Provide Rg and Dmax values
     • Link to publications when available
     • Include MALS molecular weight results

  2. For Schema Development
     • Consider adding MALS-specific fields
     • Add SEC parameters (column, flow rate)
     • Include data quality indicators
     • Support for time-resolved data

  3. For Users
     • Validate oligomeric state claims
     • Compare solution vs crystal structures
     • Check for aggregation in SEC profile
     • Verify Guinier region linearity
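
One way a user might start checking the SEC profile is to build an elution trace from the integrated scattering intensity of each frame. The sketch below assumes each frame is already available as a list of (q, intensity, error) tuples and that no buffer subtraction has been applied; the function names are illustrative.

```python
def integrated_intensity(frame, q_min=0.0, q_max=float("inf")):
    """Sum I(q) over the rows of one frame whose q lies in [q_min, q_max]."""
    return sum(i for q, i, _err in frame if q_min <= q <= q_max)


def elution_profile(frames, q_min=0.0, q_max=float("inf")):
    """Integrated intensity per frame, in elution order."""
    return [integrated_intensity(f, q_min, q_max) for f in frames]


def peak_frame_number(profile):
    """1-based number of the frame with the highest integrated intensity."""
    return max(range(len(profile)), key=profile.__getitem__) + 1
```

Plotting the profile against frame number gives a rough chromatogram. Larger species elute earlier in size exclusion, so a shoulder or separate peak ahead of the main peak may point to aggregation and is worth inspecting before averaging peak frames.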

Conclusion

The SimpleScattering GluRS dataset demonstrates lambda-ber-schema's capability to:

  • Capture complex multi-technique experiments
  • Preserve all scientific metadata
  • Enable data integration and reuse
  • Support FAIR data principles

This annotation serves as a template for representing SEC-SAXS-MALS datasets from various sources in a standardized, interoperable format.