Applications of deep generative modeling approaches on Omics data
Single-cell RNA-sequencing (scRNA-seq) provides detailed insights into the biology of tissue and disease development at the level of single cells. This cell-specific information can be used for cell identification, inference of cell development, and disease characterization. However, current sequencing methods suffer from technical constraints, especially large differences between experiments (batch effects) and a high number of technically absent expression values (dropout). These constraints can impede common analyses such as differential expression analysis, clustering, and cell type identification. Common methods for scRNA-seq analysis address either batch effects, through batch correction, or dropout, through imputation. However, both problems are closely related.
Given this insight, a combined approach for expression reconstruction, called DISCERN, was developed and extensively evaluated in the project described here. DISCERN, a generative deep learning model, is the first approach that combines batch correction with imputation. It is based on the architecture of Wasserstein autoencoders (WAEs) and uses conditional instance normalization (CIN) to reconstruct gene expression values and adjust them to a reference batch.
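The conditioning idea behind CIN can be illustrated with a minimal sketch. It is not DISCERN's implementation: the function name, the parameter tables, the batch labels, and the toy values are assumptions made for illustration, and in a trained model the per-batch scale and shift would be learned rather than set by hand. The sketch normalizes the activations of one cell and then applies the scale and shift of a chosen batch, so selecting the reference batch's parameters projects the cell toward the reference.

```python
import math


def conditional_instance_norm(x, batch, gamma, beta, eps=1e-5):
    """Normalize one cell's activations, then apply the scale/shift of `batch`.

    x     : list of floats, activations of a single cell
    batch : key selecting which batch's parameters to apply
    gamma : dict mapping batch -> per-feature scale (hypothetical values here)
    beta  : dict mapping batch -> per-feature shift (hypothetical values here)
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    std = math.sqrt(var + eps)
    return [
        gamma[batch][i] * (v - mean) / std + beta[batch][i]
        for i, v in enumerate(x)
    ]


# Hypothetical per-batch parameters; a real model would learn these.
gamma = {"reference": [1.0, 1.0, 1.0], "query": [0.5, 0.5, 0.5]}
beta = {"reference": [0.0, 0.0, 0.0], "query": [1.0, 1.0, 1.0]}

hidden = [2.0, 4.0, 6.0]  # toy activations of one cell from the "query" batch

# Condition on the cell's own batch (as one would when reconstructing it as-is).
own = conditional_instance_norm(hidden, "query", gamma, beta)

# Condition on the reference batch instead, to express the cell in reference terms.
projected = conditional_instance_norm(hidden, "reference", gamma, beta)
```

The only difference between the two calls is the batch label, which is the design point: the same normalized signal is re-expressed under the statistics associated with the chosen batch.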
DISCERN was extensively compared to previous batch correction and imputation methods. In several benchmarks, it outperforms state-of-the-art methods for batch correction, e.g. Seurat, scGEN, and scVI, as well as state-of-the-art imputation methods, e.g. scImpute, CarDEC, and DCA. DISCERN differs from previous approaches to batch correction and imputation by directly adjusting gene expression information and by using a high-quality reference for the reconstruction of multiple batches. In contrast, established batch correction methods rely on an adjusted embedding of gene expression values, and current imputation methods are not evaluated on data sets composed of multiple batches. The evaluations show that DISCERN improves the analysis of scRNA-seq data with respect to the detection of marker genes and cell type identification when, for example, single-nuclei RNA-sequencing (snRNA-seq) or bulk RNA-sequencing (RNA-seq) data is used as a reference.
Bulk RNA-seq data obtained from cells sorted by type is especially well suited as a reference because, for technical reasons, it usually has almost no dropout of gene expression values. Applying DISCERN to a scRNA-seq data set with a bulk RNA-seq reference data set delivered novel insights into the development of severe lung damage in coronavirus disease 2019 (COVID-19). These insights could be verified using other data modalities.
Thus, reference-based reconstruction with deep generative networks, such as the one implemented in DISCERN, provides a real advance in the analysis of Omics data.
Included in the collections:
Electronic Dissertations and Habilitations