Title: Development and Validation of Algorithms for the Generation of Conformer Ensembles Representing Protein-Bound Ligand Conformations
Language: English
Author: Friedrich, Nils-Ole
Keywords: conformer ensemble generation; conformer ensemble generator validation; protein-bound ligand conformations; computer aided drug design; macrocycle conformation
Publication date: 2020-06-21
Date of oral examination: 2020-10-27
Abstract:
The systematic search for new drugs is expensive and time-consuming, and drug discovery is increasingly supported by computer-aided drug design. Applications such as docking, pharmacophore search, 3D database searching, and the creation of 3D-QSAR models depend on conformer ensembles that adequately represent the flexibility of small molecules. Generating such ensembles is a complex problem because even small molecules have many degrees of freedom. Because of its importance to the field, conformer ensemble generation has been the subject of intensive research for more than three decades. While many intriguing algorithmic ideas have been proposed, the quality and size of the benchmark datasets used to validate them have improved only slowly.
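To make the "many degrees of freedom" concrete: a naive systematic search that samples every rotatable bond at a fixed set of torsion angles must enumerate the product of those set sizes. The sketch below assumes independent torsions, fixed bond lengths and angles, and no ring flexibility; the function name `systematic_grid_size` is hypothetical and does not describe how any of the benchmarked generators work.

```python
import math
from typing import Sequence


def systematic_grid_size(values_per_torsion: Sequence[int]) -> int:
    """Number of torsion-angle combinations a naive systematic search would enumerate.

    Assumes every rotatable bond is sampled independently at a fixed number of
    torsion values (hypothetical simplification; rings and stereo are ignored).
    """
    if any(v < 1 for v in values_per_torsion):
        raise ValueError("each torsion needs at least one sampled value")
    # Empty input gives 1: a molecule without rotatable bonds has one combination.
    return math.prod(values_per_torsion)
```

For a molecule with 10 rotatable bonds, sampling only the three staggered minima per bond gives `systematic_grid_size([3] * 10)`, i.e. 59,049 combinations, while a 30° grid (12 values per bond) gives 12^10, roughly 6.2 × 10^10. This exponential growth is why practical generators rely on knowledge-based torsion preferences and pruning rather than exhaustive enumeration.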
To compile a large dataset of high-quality protein-bound ligand conformations from X-ray structural data, a fully automated cheminformatics pipeline for their selection and extraction was developed in this thesis. The pipeline evaluates the validity and accuracy of the 3D structures of small molecules according to multiple criteria, including their physicochemical and structural properties and, most importantly, their fit to the experimentally determined electron density. Extracted from a total of more than 350,000 structures of co-crystallized ligands stored in the Protein Data Bank, the resulting Sperrylite and Platinum datasets are the largest publicly available datasets of such high quality. The Sperrylite Dataset consists of 10,936 high-quality structures of 4,548 unique ligands and was used to assess the variability of the bioactive conformations of small molecules. The Platinum Dataset contains, for each of these 4,548 unique ligands, the protein-bound structure with the smallest diffraction-component precision index (DPI) in the Sperrylite Dataset. The Platinum Diverse Dataset is a diversified subset of the Platinum Dataset comprising 2,859 compounds. In addition to its high quality and remarkable size, it is unbiased, diverse, and easily updatable. The Platinum Diverse Dataset is the first publicly available dataset derived from X-ray structural data in the Protein Data Bank that is of sufficiently high quality and size for thorough benchmark studies of conformer ensemble generators, allowing statements on the statistical significance of performance differences between algorithms. In the course of this thesis, the Platinum Diverse Dataset was used to conduct the most comprehensive benchmark study of conformer ensemble generators to date, in which the performance of seven freely available and eight commercial conformer ensemble generators was compared.
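The two-stage dataset construction described above (filter ligand instances, then keep one best instance per unique ligand) can be sketched as follows. The abstract names the quality criteria only in general terms, so the record fields, the single `density_fit` score, and its 0.8 threshold are hypothetical stand-ins for the pipeline's multiple physicochemical, structural, and electron-density checks. The first function mimics building a Sperrylite-like set of accepted structures; the second keeps, per ligand, the accepted structure with the smallest DPI, as stated for the Platinum Dataset.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LigandStructure:
    ligand_id: str      # identifier of the unique ligand (hypothetical field name)
    entry_id: str       # PDB entry the ligand instance was extracted from
    dpi: float          # diffraction-component precision index of the entry, in Å
    density_fit: float  # hypothetical electron-density fit score, 1.0 = perfect fit


def filter_structures(
    structures: Iterable[LigandStructure], min_density_fit: float = 0.8
) -> List[LigandStructure]:
    """Keep instances passing the single, hypothetical quality check (Sperrylite-like step)."""
    return [s for s in structures if s.density_fit >= min_density_fit]


def select_lowest_dpi(structures: Iterable[LigandStructure]) -> List[LigandStructure]:
    """For every unique ligand, keep the accepted instance with the smallest DPI (Platinum-like step)."""
    best: Dict[str, LigandStructure] = {}
    for s in structures:
        current = best.get(s.ligand_id)
        if current is None or s.dpi < current.dpi:
            best[s.ligand_id] = s
    # Sort by ligand identifier so the output order is deterministic.
    return sorted(best.values(), key=lambda s: s.ligand_id)
```

Usage: `select_lowest_dpi(filter_structures(records))`. Filtering before selection matters: an instance with an excellent DPI but a poor fit to its electron density must not be chosen as the representative conformation.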
The benchmark showed that the commercial algorithms generally achieve higher accuracy and greater robustness with respect to input formats and molecular geometries than the freely available ones.
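Statements about the significance of performance differences rest on paired comparisons: every algorithm is evaluated on the same molecules. The abstract does not name the statistical test used, so the sketch below uses an exact two-sided sign test purely as an illustration (a Wilcoxon signed-rank test is a common alternative). It assumes a per-molecule error score for each algorithm, such as the RMSD to the protein-bound conformation, where lower is better; the function name `paired_sign_test` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def paired_sign_test(score_a: Sequence[float], score_b: Sequence[float]) -> Tuple[int, int, float]:
    """Exact two-sided sign test on paired per-molecule error scores (lower is better).

    Returns (wins_a, n_untied, p_value). A 'win' for A means a strictly smaller score.
    """
    if len(score_a) != len(score_b):
        raise ValueError("results must be paired per molecule")
    wins_a = sum(1 for x, y in zip(score_a, score_b) if x < y)
    wins_b = sum(1 for x, y in zip(score_a, score_b) if x > y)
    n = wins_a + wins_b  # ties carry no information in a sign test and are dropped
    if n == 0:
        return wins_a, n, 1.0
    m = min(wins_a, wins_b)
    # Under H0 each untied molecule is a fair coin flip; sum the lower binomial tail
    # with exact integers (2 ** n stays exact even for thousands of molecules).
    tail = sum(math.comb(n, i) for i in range(m + 1))
    p_value = min(1.0, 2 * tail / 2 ** n)
    return wins_a, n, p_value
```

If algorithm A beats B on all 10 of 10 untied molecules, the two-sided p-value is 2/1024 ≈ 0.002; with an even 5/5 split it is 1.0. Larger benchmark sets shrink the p-value attainable for a given win rate, which is one reason a dataset of a few thousand molecules permits statistically meaningful comparisons.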
The findings and experience gained from the benchmark studies and from the analysis of the variability of bioactive conformations were used to develop Conformator, a new knowledge-based algorithm for generating conformer ensembles. Conformator is freely available for noncommercial use and academic research. The conformer ensembles generated by Conformator are significantly more accurate than those of all free tools tested and do not differ significantly from those of the best-performing commercial algorithm. It was demonstrated that Conformator, with its high accuracy and speed, its robustness with respect to input formats and molecular geometries, and its handling of macrocycles, effectively closes the gap between commercial and freely available algorithms.
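In conformer-generator benchmarks, accuracy is commonly expressed as the root-mean-square deviation (RMSD) between the experimentally observed protein-bound conformation and the closest member of the generated ensemble after optimal rigid superposition. The abstract does not spell out the metric, so the following is a sketch under that assumption. It uses Horn's quaternion formulation, in which the minimal sum of squared deviations equals G_ref + G_conf − 2·λ_max (λ_max being the largest eigenvalue of a symmetric 4×4 matrix built from the centered coordinates), and finds λ_max with a shifted power iteration so that only the standard library is needed. The function names are hypothetical; atoms are assumed to be matched one-to-one in the same order, and molecular symmetry is ignored.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _centered(coords: Sequence[Coord]) -> List[Coord]:
    """Translate coordinates so that their centroid lies at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def superposed_rmsd(ref: Sequence[Coord], conf: Sequence[Coord], iterations: int = 2000) -> float:
    """RMSD of two atom-matched coordinate sets after optimal rigid superposition (Horn's method)."""
    if len(ref) != len(conf) or not ref:
        raise ValueError("coordinate sets must be non-empty and atom-matched")
    a, b = _centered(ref), _centered(conf)
    # Cross-covariance matrix S[i][j] = sum over atoms of a[i] * b[j].
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    k = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # The max absolute row sum bounds every |eigenvalue|; adding it to the diagonal makes
    # all eigenvalues non-negative, so power iteration converges to the largest one of K.
    shift = max(sum(abs(x) for x in row) for row in k)
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        w = [sum(k[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate input: all atoms coincide after centering
            break
        v = [x / norm for x in w]
    kv = [sum(k[i][j] * v[j] for j in range(4)) for i in range(4)]
    lam_max = sum(x * y for x, y in zip(v, kv)) / sum(x * x for x in v)  # Rayleigh quotient
    g_a = sum(x * x + y * y + z * z for x, y, z in a)
    g_b = sum(x * x + y * y + z * z for x, y, z in b)
    msd = max(0.0, (g_a + g_b - 2.0 * lam_max) / len(a))  # clamp tiny negative round-off
    return math.sqrt(msd)


def best_match_rmsd(bound: Sequence[Coord], ensemble: Sequence[Sequence[Coord]]) -> float:
    """Smallest superposed RMSD between the protein-bound pose and any ensemble member."""
    return min(superposed_rmsd(bound, conf) for conf in ensemble)
```

Taking the minimum over the ensemble rewards a generator for containing at least one conformer close to the bioactive pose, which is what downstream applications such as docking need; a production implementation would typically use an established toolkit and also account for symmetry-equivalent atom mappings.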
Appears in collections: Electronic Dissertations and Habilitations