Daptev: Deep Aptamer Evolutionary Modelling for Covid-19 Drug Design
Cameron Andress, Kalli Kappel, Marcus Elbert Villena, Miroslava Cuperlovic-Culf, Hongbin Yan, Yifeng Li
Typical drug discovery and development processes are costly, time consuming and often biased by expert opinion. Aptamers are short, single-stranded oligonucleotides (RNA/DNA) that bind to target proteins and other types of biomolecules. Compared with small-molecule drugs, aptamers can bind to their targets with high affinity (binding strength) and specificity (uniquely interacting with the target only). The conventional development process for aptamers utilizes a manual process known as Systematic Evolution of Ligands by Exponential Enrichment (SELEX), which is costly, slow, dependent on library choice and often produces aptamers that are not optimized.
Viruses contain deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) but are incapable of self-reproduction and rely on commandeering the cell’s protein creation capabilities to reproduce. After the viral protein has been successfully reproduced, it goes on to infect other cells in a process known as viral proliferation. Attaching and injecting of viral DNA or RNA to host cells is achieved through binding to cellular receptor through receptor-binding domain (RBD), an area on the viral protein evolved to specifically bind hosts’ cell receptor. In the case of the SARS-CoV-2 spike protein, the RBD targets the lung cell angiotensin-converting enzyme (ACE2) receptor [1–6].
Materials and method
Our deep aptamer evolutionary modelling (DAPTEV) framework is visualized in Fig 1. It is a hybrid approach that integrates the strengths of DGMs (variational autoencoders) for aptamer encoding and modelling, computational intelligence (evolutionary computation) for aptamer optimization, and bioinformatics tools for RNA secondary and tertiary structure prediction and RNA-protein folding-and-docking (Rosetta). The continuous docking score is used as the fitness value or objective in the evolutionary computation component to guide aptamer optimization.
In our experiments, we used the SARS-CoV-2 spike glycoprotein (closed state, PDB: 6VXX)  as the docking target with an RBD between residues 333 and 524 [2, 4–6, 49]. See Fig 3 for an illustration. As it shows, the SARS-CoV-2 spike protein’s RBD represents a relatively small area at the apical face of the protein. Using the entire spike protein in the docking procedure would drastically increase the computation time. For example, performing five runs of the docking process with an RNA base count of 25 on the full SARS-CoV-2 spike protein took over 98 minutes to finish computing. Thus, one should remove all unnecessary residues from the target PDB file and save the new PDB structure and the new FASTA sequence. Unnecessary residues are those on the target that an RNA could not possibly interact with during the RBD-docking process due to constraint specification. Once these residues are removed, the PDB file must be cleaned (renumbered and sequenced).
Discussion and Conclusion
While it may seem that GA performed the best in terms of docking scores, this is not actually the case if the aptamer structures are considered as well. As previously mentioned, the docking score can be artificially improved by producing an unfolded RNA secondary structure which, in turn, will incur fewer penalties during the RNA tertiary structure prediction and the docking simulation. If a model prioritizes only unconnected structures, it stands to reason that it would seem to perform better when only considering score output. However, the expectation of this research is three-fold. (1) The docking scores has to be optimized. This was accomplished best by GA as substantiated by the returned scores and statistical analysis. (2) Some learning of well-performing structural motifs in the provided RNA secondary structures is required. Producing unfolded RNAs is not overly helpful when attempting to develop aptamer-based drugs. Even if this is a desirable trait, we expect to produce more complex structures from intelligent models and not be forced to sacrifice these features. Based on the percentage of folded structures in the last generation, it is clear that DAPTEV performs significantly better than GA
The use of Rosetta was technical supported by Das Lab’s Dr. Rhiju Das, Dr. Ramya Rangan, and Dr. Andy Watkins. Additional insights were rendered by Brock University’s Dr. Robson De Grande, Dr. Sheridan Houghten, and Dr. Ali Emami.
Citation: Andress C, Kappel K, Villena ME, Cuperlovic-Culf M, Yan H, Li Y (2023) DAPTEV: Deep aptamer evolutionary modelling for COVID-19 drug design. PLoS Comput Biol 19(7): e1010774. https://doi.org/10.1371/journal.pcbi.1010774
Editor: Alexander MacKerell, University of Maryland School of Pharmacy, UNITED STATES
Received: November 29, 2022; Accepted: June 13, 2023; Published: July 5, 2023
Copyright: © 2023 Andress et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Original data and the code for this model are accessible at https://github.com/candress/DAPTEV_Model.
Funding: This work was supported by the AI for Design Challenge Program from the National Research Council Canada (AI4D-108-2 to YL), the Discovery Grant Program from the National Sciences and Engineering Research Council of Canada (RGPIN 2021-03879 to YL), Ontario Graduate Scholarships (to CA), Schmidt Science Fellows in partnership with the Rhodes Trust and the HHMI Hanna H. Gray Fellows Program (to KK). MCC is an employee of the National Research Council Canada and, therefore, receives a salary. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.