Drug Discovery and AI: Academic Research Worth Talking About

Are you behind in drug discovery because you aren't using AI to its fullest potential?

AI has changed how drug discovery is done, and academic researchers are a major driver of that change. In this post we cover three academic projects that use AI to push drug discovery further: DeepCE, DOCKSTRING, and Equivariant Graph Neural Networks.

For a more in-depth look at each case, be sure to download our AI in Drug Discovery eBook.


In a collaborative project between Ohio State University, City University of New York, and Cornell University, researchers Thai-Hoang Pham, Yue Qiu, Jucheng Zeng, Lei Xie, and Ping Zhang developed DeepCE, a mechanism-driven neural network-based method.


DeepCE expands on phenotype-based compound screening by modelling chemical substructure-gene and gene-gene associations, predicting the differential gene expression profile perturbed by de novo chemicals. Essentially, DeepCE uses deep learning to predict how drugs will influence the amounts of RNA, and therefore the amounts of various proteins, produced by a cell, which in turn provides insights into how the drug may modulate the disease.


DeepCE uses a neural network-based model for gene expression profile prediction consisting of several components. A graph convolutional network is used to learn a vector representation for each chemical compound from its graph structure. A feed-forward neural network is used to learn vector representations for cell line and chemical dose size.

These vector representations are then fed into the interaction component (two multihead attention modules, concatenated into a normalization layer followed by a feed-forward layer and another normalization layer) to learn high-level feature associations, including chemical substructure-gene and gene-gene feature associations. Finally, the prediction component (a two-layer feed-forward neural network with a rectified linear unit activation function) takes the interaction component’s outputs as inputs to simultaneously predict the gene expression values for all L1000 genes.
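To make the data flow concrete, here is a minimal sketch of the two ideas in this paragraph: attention that lets each gene representation pull in information from chemical substructure representations, followed by a two-layer feed-forward network with ReLU that emits one expression value per gene. It is not the DeepCE authors' code. The function names, the toy sizes, and the random weights are all assumptions made for illustration, and it uses a single attention head with no normalization layers and no cell-line or dose inputs.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention (DeepCE itself uses multihead attention)."""
    dim = len(keys[0])
    mixed = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        mixed.append([sum(w * v[c] for w, v in zip(weights, values))
                      for c in range(len(values[0]))])
    return mixed


def predict_expression(gene_features, w1, b1, w2, b2):
    """Two-layer feed-forward net with ReLU, applied to every gene's feature vector."""
    predictions = []
    for x in gene_features:
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(w1, b1)]
        predictions.append(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return predictions


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for readability; they are assumptions, not the real model's sizes.
NUM_GENES, NUM_SUBSTRUCTURES, DIM, HIDDEN = 5, 3, 4, 8
gene_vecs = rand_matrix(NUM_GENES, DIM)
substructure_vecs = rand_matrix(NUM_SUBSTRUCTURES, DIM)

# Genes query the substructures (a substructure-gene association).
gene_context = attend(gene_vecs, substructure_vecs, substructure_vecs)

w1, b1 = rand_matrix(HIDDEN, DIM), [0.0] * HIDDEN
w2, b2 = [rng.uniform(-1, 1) for _ in range(HIDDEN)], 0.0
expression = predict_expression(gene_context, w1, b1, w2, b2)
print(len(expression))  # one predicted value per gene
```

The point of the sketch is the shape of the computation: every gene gets its own prediction in a single forward pass, which is what "simultaneously predict" refers to.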


DeepCE offers improved performance compared to existing methods and has the advantage of providing data augmentation, which makes it possible to tackle areas with minimal or unreliable data.


A research team from the Engineering, Chemistry, and Statistical Laboratory departments at the University of Cambridge, UK, created DOCKSTRING, a software and data bundle for meaningful and robust comparison of machine learning models.


One challenge in drug discovery is making use of the full spectrum of available knowledge. Approaches that would be beneficial often require researchers to have a deep understanding of the underlying biology. Molecular docking is one example: it requires extensive domain knowledge to set up experiments and train machine learning models correctly. DOCKSTRING was created to help address this challenge.


As machine learning methods for drug discovery continue to be developed, benchmarks are required to compare performance against experimental data, giving an indication of what performance can be expected in the real world.

DOCKSTRING offers standardized and accessible benchmarking capabilities based on molecular docking. The three-component DOCKSTRING bundle includes code, datasets, and benchmarking tasks which allow ML practitioners without biological expertise to obtain meaningful docking scores.
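As a rough illustration of what a docking-based benchmark measures, the sketch below scores a model's predicted docking scores against reference docking scores using the coefficient of determination (R²). Treat the choice of metric as an assumption for this example. The function name and the four score pairs are made up for illustration and do not come from the DOCKSTRING dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1.0 is a perfect fit, 0.0 matches a mean-only baseline."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only: docking scores are typically negative,
# with more negative meaning stronger predicted binding.
reference_scores = [-9.0, -7.5, -8.2, -6.1]
predicted_scores = [-8.7, -7.9, -8.0, -6.5]

score = r_squared(reference_scores, predicted_scores)
print(round(score, 3))
```

A standardized benchmark fixes the reference scores and the metric, so two models can be compared on the same footing without either team having to set up docking themselves.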


DOCKSTRING aims to lower the barrier to entry for drug discovery startups. It goes beyond structure-based modelling and brings more complex techniques for predicting binding affinity into more ligand design pipelines.


The UvA-Bosch Delta Lab at the University of Amsterdam focuses on the fundamentals of deep learning. At the 38th International Conference on Machine Learning in 2021, their team of researchers introduced Equivariant Graph Neural Networks (EGNNs) [1].


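Satorras, Hoogeboom, and Welling introduced EGNN, a graph neural network architecture that is equivariant to translations, rotations, reflections, and permutations. In practice this means that moving, turning, or mirroring a molecule in space, or relabelling its atoms, changes the model's outputs in the same predictable way instead of producing an unrelated result.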
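The equivariance property can be checked numerically. The sketch below is a heavily simplified layer in the spirit of EGNN, not the published implementation: messages depend only on node features and squared distances, and coordinates are updated along the difference vectors between nodes. The fixed `tanh` expressions stand in for the learned networks of the real model, and every name and constant here is an assumption chosen for illustration.

```python
import math


def egnn_layer(h, x):
    """One simplified equivariant layer. h: scalar feature per node, x: 3D coordinate per node."""
    n = len(x)
    new_h, new_x = [], []
    for i in range(n):
        message_sum = 0.0
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            diff = [a - b for a, b in zip(x[i], x[j])]
            dist2 = sum(d * d for d in diff)  # unchanged by rotation or translation
            m_ij = math.tanh(h[i] + h[j] - 0.1 * dist2)  # stand-in for a learned edge network
            # Move along the difference vector, scaled by a function of the message.
            shift = [s + 0.5 * m_ij * d for s, d in zip(shift, diff)]
            message_sum += m_ij
        new_x.append([c + s / (n - 1) for c, s in zip(x[i], shift)])
        new_h.append(math.tanh(h[i] + message_sum))  # stand-in for a learned node network
    return new_h, new_x


def rotate_and_translate(x, angle=0.7, offset=(1.0, -2.0, 0.5)):
    """Rotate every point about the z-axis, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * px - s * py + offset[0], s * px + c * py + offset[1], pz + offset[2]]
            for px, py, pz in x]


h = [0.2, -0.5, 1.0]
x = [[0.0, 0.0, 0.0], [1.0, 0.5, -0.3], [-0.7, 1.2, 0.4]]

h_plain, x_plain = egnn_layer(h, x)
h_moved, x_moved = egnn_layer(h, rotate_and_translate(x))

# Node features should be identical; coordinates should differ only by the same transform.
features_match = all(abs(a - b) < 1e-9 for a, b in zip(h_plain, h_moved))
coords_match = all(abs(a - b) < 1e-9
                   for p, q in zip(rotate_and_translate(x_plain), x_moved)
                   for a, b in zip(p, q))
print(features_match, coords_match)
```

Both checks are expected to hold up to floating-point tolerance: transforming the input and then applying the layer gives the same result as applying the layer and then transforming the output.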

Trained and tested against the QM9 dataset (a standard in ML for chemical property prediction tasks), EGNNs produce highly competitive results in all property prediction tasks while remaining simple, not requiring the use of higher-order molecular representations, molecular angles, or spherical harmonics.


The EGNN-based model can predict all features from the QM9 dataset including equilibrium geometries, frontier orbital eigenvalues, dipole moments, harmonic frequencies, polarizabilities, and thermochemical energetics corresponding to atomization energies, enthalpies, and entropies at ambient temperature.


Graph Neural Networks (GNNs) can accelerate the drug discovery process by providing an ability to analyze molecules and their properties at a previously unattainable level, and EGNNs in particular represent a step forward in terms of simplicity and efficiency.

AI is making it possible for researchers to do more, faster, and with greater accuracy. If you’re interested in diving deeper into how each of these cases is using AI, exploring additional resources, or checking our sources, download our latest eBook. Or, if you want to learn more about how our drug data can help kick your AI-powered research into high gear, we would love to chat with you.

Looking for more? Check out our first blog in this series and explore how artificial intelligence has changed the drug discovery game.

The Little Book of Big Changes in AI-Powered Drug Discovery eBook

Learn more about AI in drug discovery and get access to helpful resources and references.


[1] Satorras VG, Hoogeboom E, Welling M. E(n) Equivariant Graph Neural Networks. 2021.