The world of High Performance Computing (HPC) is now moving towards exascale performance, i.e. the ability of calculating 1018 operations per second. A variety of applications will be improved to take advantage of this computing power, leading to better prediction and models in different fields, like Environmental Sciences, Artificial Intelligence, Material Sciences and Life Sciences.
In Life Sciences, HPC advancements can improve different areas:
- a reduced time to scientific discovery;
- the ability of generating predictions necessary for precision medicine;
- new healthcare and genomics-driven research approaches;
- the processing of huge datasets for deep and machine learning;
- the optimization of modeling, such as Computer Aided Drug Design (CADD);
- enhanched security and protection of healthcare data in HPC environments, in compliance with European GDPR regulations;
- management of massive amount of data for example for clinical trials, drug development and genomics data analytics.
The outbreak of COVID-19 has further accelerated this progress from different points of view. Some European projects aim at reusing known and active ingredients to prepare new drugs as contrast therapy against COVID disease [Exscalate4CoV, Ligate], while others focus on the management and monitoring of contagion clusters to provide an innovative approach to learn from SARS-CoV-2 crisis and derive recommendations for future waves and pandemics [Orchestra].
The ability to deal with massive amounts of data in HPC environments is also used to create databases with data from nucleic acids sequencing and use them to detect allelic variant frequencies, as in the NIG project [Nig], a collaboration with the Network for Italian Genomes. Another example of usage of this capability is the set-up of data sharing platform based on novel Federated Learning schemes, to advance research in personalised medicine in haematological diseases [Genomed4All].
Supercomputing is widely used in Drug Design (the process of finding medicines for disease for which there are no or insufficient treatments), with many projects active in this field just like RISC2.
Sometimes, when there is no previous knowledge of the biological target, just like what happened with COVID-19, discovering new drugs requires creating from scratch new molecules [Novartis]. This process involves billion dollar investments to produce and test thousands of molecules and it usually has a low success rate: only about 12% of potential drugs entering the clinical development are approved [Engitix]. The whole process from identifying a possible compound to the end of the clinical trial can take up to 10 years. Nowadays there is an uneven coverage of disease: most of the compounds are used for genetic conditions, while only a few antiviral and antibiotics have been found.
The search for candidate drugs occurs mainly through two different approaches: high-throughput screening and virtual screening. The first one is more reliable but also very expensive and time consuming: it is usually applied when dealing with well-known targets by mainly pharmaceutical companies. The second approach is a good compromise between cost and accuracy and is typically applied against relatively new targets, in academics laboratories, where it is also used to discover or understand better mechanisms of these targets. [Liu2016]
Candidate drugs are usually small molecules that bind to a specific protein or part of it, inhibiting the usual activity of the protein itself. For example, binding the correct ligand to a vial enzyme may stop viral infection. In the process of virtual screening million of compounds are screened against the target protein at different levels: the most basic one simply takes into account the shape to correctly fit into the protein, at higher level also other features are considered as specific interactions, protein flexibility, solubility, human tolerance, and so on. A “score” is assigned to each docked ligand: compounds with highest score are further studied. With massively parallel computers, we can rapidly filter extremely large molecule databases (e.g. billions of molecules).
The current computational power of HPC clusters allow us to analyze up to 3 million compounds per second [Exscalate]. Even though vaccines were developed remarkably quickly, effective drug treatments for people already suffering from covid-19 were very fresh at the beginning of the pandemic. At that time, supercomputers around the world were asked to help with drug design, a real-world example of the power of Urgent Computing. CINECA participates in Exscalate4cov [Exscalate4Cov], currently the most advanced center of competence for fighting the coronavirus, combining the most powerful supercomputing resources and Artificial Intelligence with experimental facilities and clinical validation.
[Liu2016] T. Liu, D. Lu, H. Zhang, M. Zheng, H. Yang, Ye. Xu, C. Luo, W. Zhu, K. Yu, and H. Jiang, “Applying high-performance computing in drug discovery and molecular simulation” Natl Sci Rev. 2016 Mar; 3(1): 49–63.