Generative design for small molecule drugs

A. Roy
United States

Keywords: generative design, deep learning, multi-objective optimisation


Success in computational drug design has been elusive. Not for a lack of research activity, for the number of publications in the field has increased every year over the last decade. Not for a want of funds, since investors have lavished more than $115 billion on 600 companies in the last eight years. Not even for a lack of urgency, which was exemplified vividly by the onset of the pandemic. Instead, the failures in this field can be attributed to endogenous causes.

Computational drug design is dominated by virtual screening of chemical libraries with deep-learning models. Even though these models feature clever computational architectures, their operations lack the important element of design. The consequences are serious:

1. The chemical space that is explored is a negligible fraction of the total drug-like space.
2. The molecules that are produced are riddled with undesirable characteristics.
3. Lead optimisation must be done manually.

A new computational paradigm is needed. But what characteristics should such a paradigm possess?

Firstly, recognising that the drug-like chemical space is nearly infinite and cannot be covered appreciably by any finite set, the paradigm must be able to venture beyond the boundaries of the initial library of molecules with which it is presented. It must be able to chart its own path through the forest of drug-like molecular structures to find the most promising candidates for a disease.

Secondly, recognising that the requirements of any effective drug molecule are multifarious, the paradigm must be able to optimise molecules against numerous, often conflicting, objectives simultaneously. It must not designate lead optimisation as subsequent manual labour to be performed by medicinal chemists.

Finally, recognising that the hard work of scientists over several decades has produced a vast body of curated knowledge of high quality, the paradigm must be able to absorb current science.
It may do so by encoding scientific laws and precepts as constraints or objectives. There is no shame in standing on the shoulders of giants.

Enter Norachem, with its generative design paradigm, which possesses all of the above characteristics! The importance of a generative step that can transcend the boundaries of the initial library cannot be overstated. Yet, with few exceptions, such a step is absent in this field of computation. One such exception, whose publication made a notable splash at the end of 2019, uses stochastic gradient descent to find optimal points in a trained manifold comprising the latent space of a variational autoencoder. In "On the shortcomings of continuous representations of chemical space," Norachem discusses its flaws.

Norachem's generative design draws inspiration from mathematics, computation, and Nature. Our investors and stakeholders understand our view that the evolutionary processes shaping biology can also be modelled to contend with the pathologies that biology creates. Our computational methods include proprietary implementations of evolutionary algorithms, geometric deep learning models, combinatorial routines, and many others, which we use to search hitherto inaccessible areas of the drug-like chemical space.
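
To make the idea of optimising against conflicting objectives while encoding scientific precepts as constraints more concrete, here is a minimal sketch in Python. It is not Norachem's proprietary method, which is not described in this text. The candidate names, the two objectives (`potency`, `solubility`), their scores, and the `violates_rule` flag are all hypothetical, invented purely for illustration. The sketch discards candidates that break a hard constraint and then keeps the Pareto front, that is, the candidates that no other feasible candidate beats on every objective at once.

```python
from typing import Dict, List

# A candidate is a plain dictionary; all names and scores are made up.
Candidate = Dict[str, object]


def dominates(a: Candidate, b: Candidate, objectives: List[str]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives are maximised)."""
    at_least_as_good = all(a[k] >= b[k] for k in objectives)
    strictly_better = any(a[k] > b[k] for k in objectives)
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Candidate], objectives: List[str]) -> List[Candidate]:
    """Drop constraint violators, then keep only non-dominated candidates."""
    # Hard constraint: a stand-in for a scientific precept that must hold.
    feasible = [c for c in candidates if not c["violates_rule"]]
    return [
        c for c in feasible
        if not any(dominates(other, c, objectives) for other in feasible)
    ]


# Hypothetical scores in [0, 1]; higher is better for both objectives.
candidates = [
    {"name": "mol_a", "potency": 0.90, "solubility": 0.20, "violates_rule": False},
    {"name": "mol_b", "potency": 0.60, "solubility": 0.70, "violates_rule": False},
    {"name": "mol_c", "potency": 0.50, "solubility": 0.50, "violates_rule": False},
    {"name": "mol_d", "potency": 0.95, "solubility": 0.90, "violates_rule": True},
]

front = pareto_front(candidates, ["potency", "solubility"])
front_names = [c["name"] for c in front]
print(front_names)
```

In this toy data, `mol_d` scores best on both objectives but is rejected by the constraint, `mol_c` is dominated by `mol_b`, and `mol_a` and `mol_b` remain as an unresolved trade-off between potency and solubility. An evolutionary algorithm would typically generate new candidates from such a front and repeat the selection, which is one way a search can move beyond its initial library.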