Title: Advancements in Diffraction Analysis Methods and Data Reduction Techniques for Serial Crystallography
Language: English
Author: Galchenkova, Marina
Keywords: serial crystallography; data reduction; automatisation; compression techniques; protein crystallography
Publication date: 2024
Date of oral examination: 2024-03-05
Proteins play a crucial role in living cells. Their functions are determined by their three-dimensional (3D) structure. This atomic-scale structure is usually investigated by crystallography using X-ray sources such as an X-ray tube, a synchrotron or a free-electron laser (FEL). The conventional approach to macromolecular crystallography (MX) is to acquire diffraction patterns from a crystal as it is rotated about one or more axes, in order to obtain the full 3D diffraction volume of the studied crystal. The total X-ray exposure of the crystal is limited by the accumulation of damage to the protein structure and crystal lattice caused by ionising radiation. Cryogenic cooling slows radiolysis and extends the dose that can be tolerated. However, such cooling may alter the macromolecular structure and precludes measuring dynamical processes by time-resolved methods.

For efficient measurements at room temperature (RT) and for investigating fast protein dynamics, serial crystallography (SX) comes into play. In this method, the 3D diffraction volume (reciprocal space) of the studied crystals is merged from still diffraction patterns collected from small, randomly oriented crystals exposed to X-rays. The technique must be capable of assembling a complete three-dimensional dataset of structure factor moduli from a large number of individual still diffraction patterns. SX enables a wide range of experiments, including measurements at room temperature, time-resolved studies on biological crystals, measurements of sub-micron-sized crystals, and structure determination of radiation-sensitive proteins. Known problems in serial crystallography are the high threshold for entering the field, the lack of a user-friendly data processing pipeline, and the huge amount of data that must be processed and reduced to obtain the structure of the studied protein. This dissertation is dedicated to developing solutions to these problems.

Recent advancements in X-ray facilities, including 4th generation synchrotrons and FELs, in combination with state-of-the-art X-ray detectors, have made it possible to conduct SX experiments at a remarkable rate, capturing more than 1000 images per second. However, this increased acquisition rate comes with a trade-off: an enormous volume of data, with some experiments already yielding up to 5 PB of measured data. As a result, novel data reduction strategies need to be developed and implemented to handle this vast amount of information efficiently. The most common way to reduce the size of the measured data is lossless compression. The compression ratio and speed of the different compression algorithms available for the HDF5 library were evaluated on different datasets. This extensive evaluation demonstrated that lossless compression methods preserve the original data without any alteration but cannot achieve a high compression ratio. Thus, some form of lossy compression and data reduction is needed. For this reason, the following approaches were successfully tested on different datasets: binning, quantisation (including quantisation using a non-uniform step), and non-hit rejection. It was also shown that approaches such as measuring less data or storing only the data within the areas of identified Bragg peaks in a diffraction pattern may degrade data quality and are therefore not recommended for general use.
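
Two of the lossy reduction steps named above, binning and quantisation with a non-uniform step, can be illustrated with a minimal sketch. The abstract does not specify the exact schemes used in the dissertation, so the choices here are assumptions made for illustration only: 2x2 summation for binning and a square-root scale for the non-uniform quantisation. The function names `bin_2x2`, `quantise` and `dequantise` and the `scale` parameter are hypothetical.

```python
import math


def bin_2x2(image):
    """Sum non-overlapping 2x2 pixel blocks of a 2D list of intensities.

    A trailing odd row or column is dropped. The result has a quarter of
    the pixels of the input.
    """
    rows = len(image) - len(image) % 2
    cols = len(image[0]) - len(image[0]) % 2
    return [
        [
            image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def quantise(value, scale=1.0):
    """Map an intensity to an integer code on a square-root scale.

    Assumed scheme, not taken from the dissertation: equal steps in
    sqrt(intensity) correspond to steps in intensity that widen as the
    intensity grows. Negative values are clipped to zero.
    """
    return round(math.sqrt(max(value, 0)) / scale)


def dequantise(code, scale=1.0):
    """Recover the approximate intensity represented by a code."""
    return (code * scale) ** 2
```

With `scale=1.0`, neighbouring codes 1 and 2 represent intensities 1 and 4, whereas codes 10 and 11 represent 100 and 121, so weak pixels keep fine resolution while strong pixels are stored more coarsely. A larger `scale` gives fewer distinct codes and therefore better compressibility at the cost of precision.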

To evaluate the effect of lossy compression schemes, a set of data quality metrics capable of assessing the resulting loss of information is used. These metrics are described and applied to test various data reduction schemes, and the proper way to use each of them is explained in detail.
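
The abstract does not name the metrics it uses. As an illustration only, the sketch below computes two figures of merit that are widely used in serial crystallography, R_split and CC1/2, from the merged intensities of two half-datasets; whether the dissertation relies on exactly these is an assumption. The function names and the plain-list inputs are hypothetical.

```python
import math


def r_split(half1, half2):
    """R_split between intensities merged from two half-datasets.

    R_split = (1 / sqrt(2)) * sum(|I1 - I2|) / (0.5 * sum(I1 + I2)).
    Lower is better; identical halves give 0.
    """
    numerator = sum(abs(a - b) for a, b in zip(half1, half2))
    denominator = 0.5 * sum(a + b for a, b in zip(half1, half2))
    return numerator / (math.sqrt(2) * denominator)


def cc_half(half1, half2):
    """CC1/2: Pearson correlation between the two half-dataset intensities."""
    n = len(half1)
    mean1 = sum(half1) / n
    mean2 = sum(half2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(half1, half2))
    var1 = sum((a - mean1) ** 2 for a in half1)
    var2 = sum((b - mean2) ** 2 for b in half2)
    return cov / math.sqrt(var1 * var2)
```

A comparison of this kind would be run once on the original data and once on the data after lossy reduction, with both inputs listing the same reflections in the same order; a clear increase in R_split or drop in CC1/2 would indicate that the reduction removed useful signal.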

Notably, automated non-hit rejection and binning have been successfully implemented in the routine data processing pipeline and tested on data collected with the TapeDrive sample-delivery method at the P11 beamline, PETRA III. Furthermore, the presented non-uniform quantisation compression technique holds potential for application to other datasets, including electron or neutron diffraction data.

The enormous amount of measured data poses another challenge: it cannot be processed manually. Instead, an auto-processing pipeline has to be developed. Because crystals are measured differently in MX and SX, the data analysis techniques for the two methods also differ. Therefore, the existing pipelines used for MX are hardly applicable to SX data. Despite significant progress in this field for SX over the past decade, establishing a universal, reliable processing pipeline compatible with different sample delivery systems remains a complex challenge. This dissertation aims to develop a robust and universally applicable data processing pipeline for SX, which includes generating various figures of merit and compiling overall statistics for proper data evaluation at each stage of data processing and for publishing purposes.

Multiple experiments at FELs and synchrotrons were processed during the work on the dissertation, and some of the results are presented to illustrate the benefits of using the developed algorithms. Particular attention was paid to data with observable undesirable features, such as ice rings and salt reflections. To address these issues, a special software package was developed and used as part of the developed data processing pipeline. This automatic data processing pipeline has been implemented in the control system of a drug-screening P09 beamline, PETRA III.

This dissertation also outlines a strategy to optimise serial synchrotron crystallography (SSX) beamtimes that use fixed-target sample delivery methods such as chips. The approach involves two key steps: first, a rapid raster scan of the chip identifies crystal positions via diffraction; second, a rotational series is measured at these positions within a small range of angles. This method efficiently avoids empty positions during data acquisition, saving precious beam time and reducing data volume. It is particularly effective when the chip holds few crystals, which is common with proteins that are challenging to crystallise. This approach is critical for maximising crystal utilisation and enhancing the likelihood of successfully determining protein structures.
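
The two-step fixed-target strategy described above can be sketched as follows. This is a simplified illustration, not the beamline implementation: the criterion for deciding that a position holds a crystal (a minimum number of found peaks), the threshold of 10 peaks, the angular range and step, and the function names are all assumptions.

```python
def select_hit_positions(peak_counts, min_peaks=10):
    """Step 1: keep chip positions whose raster-scan frame looks like a hit.

    peak_counts maps a chip position (row, column) to the number of peaks
    found in the frame recorded there. min_peaks is a hypothetical threshold.
    """
    return [pos for pos, n in sorted(peak_counts.items()) if n >= min_peaks]


def plan_rotation_series(positions, start_deg=-2.0, end_deg=2.0, step_deg=0.5):
    """Step 2: plan a small rotation series at every selected position.

    Returns a list of (position, angles) pairs. The angular range and step
    are illustrative values only.
    """
    n_steps = int(round((end_deg - start_deg) / step_deg)) + 1
    angles = [start_deg + i * step_deg for i in range(n_steps)]
    return [(pos, angles) for pos in positions]


# Usage: a 2x2 chip in which only two positions diffract.
raster = {(0, 0): 0, (0, 1): 35, (1, 0): 3, (1, 1): 18}
hits = select_hit_positions(raster)
plan = plan_rotation_series(hits)
```

In this example only two of the four positions are revisited, so half of the chip is skipped during the slower rotational measurement.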

The dissertation contributes to the advancement of serial crystallography by establishing a reliable data processing and reduction framework that ensures the reproducibility and reliability of the final results. The developed strategies open up new possibilities for carrying out experiments efficiently and for overcoming the data storage problem.
URL: https://ediss.sub.uni-hamburg.de/handle/ediss/10804
URN: urn:nbn:de:gbv:18-ediss-116422
Document type: Dissertation
Supervisors: Chapman, Henry N.; Yefanov, Oleksandr
Appears in collections: Electronic Dissertations and Habilitations

Files for this resource:
File: Thesis_Galchenkova_PhD_v2.pdf
Checksum: fc988a413c056974d98ad14d659a5aa6
Size: 11.99 MB
Format: Adobe PDF