AI in Drug Discovery

Data: What AI Needs The Most

All AI approaches need one thing: masses and masses of high-quality data. New data points help improve the performance of the model and thus also the predictions.

Our tools help generate large amounts of usable, versatile data that can be fed into various points of the AI workflow.

Chemical Spaces

Chemical Spaces serve as the largest resources for purchasable and synthesizable compounds as well as the synthons to create them. Different methods are available to retrieve relevant chemistry.
  • compounds (based on three different similarities)
  • building blocks
  • reactions
  • Bemis-Murcko scaffold SMILES

Ligand Assessment

In the 3D environment, the interactions between the target and ligand provide an additional layer of information that can be used to predict novel binders.
  • estimated affinity
  • individual atomic ΔG contributions
  • H-bonds, desolvation penalties
  • poses/interactions

Properties & Descriptors

It is also possible to calculate the most common physicochemical and pharmacokinetically relevant parameters of datasets to enable direct classification of candidates.
  • MW, logP, LLE, TPSA
  • formal charge of molecule
  • ADME parameters
  • numbers of H-bond donors/acceptors, rings, halogens, rotatable bonds, …
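
As an illustration of how such descriptors can feed a direct classification, the following minimal sketch counts violations of Lipinski's rule of five (MW ≤ 500, logP ≤ 5, at most 5 H-bond donors, at most 10 H-bond acceptors). The `Descriptors` class, the function names, and the `max_violations` cutoff are hypothetical choices for this example and are not part of any BioSolveIT tool.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed descriptor values."""
    name: str
    mw: float    # molecular weight in Da
    logp: float  # calculated logP
    hbd: int     # number of H-bond donors
    hba: int     # number of H-bond acceptors


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria are violated."""
    checks = [d.mw > 500, d.logp > 5, d.hbd > 5, d.hba > 10]
    return sum(checks)


def classify(d: Descriptors, max_violations: int = 1) -> str:
    """Label a compound; the cutoff of one tolerated violation is an assumption."""
    if rule_of_five_violations(d) <= max_violations:
        return "drug-like"
    return "flagged"
```

In practice the thresholds and the set of descriptors would be adapted to the project, for example by adding TPSA or rotatable-bond limits.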

People Want Molecules That Can Be Made
… or even better: that can be bought!

The most promising molecules proposed by AI in drug discovery have only minor value if they are difficult to access. Our methods enable access to the largest compound collections with tangible entries: Combinatorial Chemical Spaces.

  1. The core of the innovation is the AI/ML machinery, which identifies complex patterns from input data. The generated model can then be applied to predict outcomes.
  2. The method suggests optimized queries that have been fine-tuned for potency or ADME properties. Depending on the workflow and its units, these may be more or less synthetically accessible.
  3. The search in Chemical Spaces delivers only synthetically accessible compounds with a high degree of similarity to the proposed candidates.

Ultra-Large Screening:
The Pinnacle of Compound Hunting Grounds

Bigger Is Better!
Chemical Spaces are the largest collections of potential drug candidates, offering a vast selection of compounds that can be screened significantly faster and more efficiently than pre-enumerated libraries, which makes them well suited to AI in drug discovery. While the concept of Chemical Spaces has existed for more than 20 years, they are only now moving into the limelight.

This transformation is driven by companies specializing in on-demand compound synthesis, making these enormous collections not only theoretically accessible but also practically viable for real-world drug discovery. By leveraging this synergy, researchers can explore billion- to trillion-sized Chemical Spaces with the confidence that promising hits can be synthesized and tested quickly.

Navigating the Chemical Space

Different 2D search methods (pharmacophore, fingerprint-based, substructure) are available to explore new areas and expand chemical diversity.

The algorithms provide case-dependent similarity scores for orthogonal screening that can be used to enrich similar compounds and promising hits.
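
To make the idea of a similarity score concrete, here is a minimal sketch of the Tanimoto coefficient, a widely used measure for fingerprint-based comparison. It is a generic stand-in and not the scoring used by any specific search tool; fingerprints are represented as sets of "on" bit indices, and the function names are hypothetical.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by the union of bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query: set[int], library: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Scores from a second, independent method can then be compared against such a ranking to check whether both agree on the top candidates.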

Chemistry Tailored to the Campaign's Objectives

Expanding into uncharted chemical regions enables you to mine for new analogs and seeds for the iterative optimization of compound parameters. Focusing on the proximity around compounds of interest retrieves relevant subsets for your projects.

The best part is that compounds retrieved from our commercial partners’ Spaces can be ordered directly and delivered to your bench.

Leveraging Chemical Spaces:
Almost Infinite Compounds = Almost Infinite Data

Due to their nature, Chemical Spaces can be used in various ways to optimize drug discovery workflows and adapt them to specific objectives. AI in drug discovery is no exception.

Using a collection of building blocks or a set of possible reactions, valuable data can be generated.
Space Generation:
Using the insights from AI to populate the Chemical Space.
  • Generate all possible compounds based on the provided building blocks and reactions.
  • Categorize these compounds based on the complexity of their synthesis (e.g., easy, moderate, challenging).
Training the AI Model:
Train the AI using historical data on reaction success and challenges.
  • Identify reactions that are efficient and straightforward.
  • Flag reactions that require complex conditions or additional effort.
Predictive Analysis:
Assess the content and its accessibility.
  • Compatibility of building blocks.
  • Optimal reaction conditions for successful outcomes.
  • Synthetic feasibility of each compound.
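
The space generation step above can be sketched in a few lines. The example below is a toy under stated assumptions: the reaction names, building-block identifiers, and effort labels are invented for illustration, and real Chemical Spaces encode chemistry rather than plain strings. It shows that the size of a combinatorial space can be counted from the building-block pools without enumerating every product.

```python
from itertools import product
from math import prod

# Hypothetical reactions: one building-block pool per reactant position,
# plus an assumed synthesis-effort label.
REACTIONS = {
    "amide_coupling": {
        "pools": [["acid_1", "acid_2", "acid_3"], ["amine_1", "amine_2"]],
        "effort": "easy",
    },
    "suzuki_coupling": {
        "pools": [["boronate_1", "boronate_2"], ["halide_1", "halide_2"]],
        "effort": "moderate",
    },
}


def space_size(reactions: dict) -> int:
    """Number of products, computed without enumerating them."""
    return sum(prod(len(pool) for pool in r["pools"]) for r in reactions.values())


def enumerate_space(reactions: dict):
    """Lazily yield every building-block combination with its effort label."""
    for name, r in reactions.items():
        for combo in product(*r["pools"]):
            yield {"reaction": name, "building_blocks": combo, "effort": r["effort"]}
```

Because the count grows multiplicatively with each pool, adding a handful of building blocks can enlarge the space by orders of magnitude, which is why explicit enumeration quickly becomes impractical.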

Reducing Costs while Expanding the Possibilities

Compounds are money. And money should be spent wisely.
A byproduct of dynamically creating Chemical Spaces is a map of the effort associated with each contained compound, which provides additional factors for informed decision-making.

Furthermore, Chemical Spaces can be trained and improved. Minor tweaks to the building blocks and reactions used improve how well the Space fits the project.
Distance to Purchasability:
The setup is suitable for capturing the time required for compound acquisition.
Depending on complexity and current availability, two kinds of sub-spaces can be generated: those that can be made with 'on-the-shelf' building blocks and those that would take longer due to procurement.
Smart Exploration:
Chemical Spaces allow for the targeted exploration of vast molecular possibilities based on the needs of the project.
By strategically navigating these spaces, researchers can prioritize the most promising candidates and focus on the chemical proximity around them.
Cost Effectiveness:
The interplay of the mentioned factors enables the prediction of costs per compound and the generation of multiple cheaper and/or faster alternatives to a compound and its related hits.
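
The split into sub-spaces by building-block availability can be illustrated with a short sketch. All identifiers and procurement times below are invented for the example; the assumption is that the slowest building block determines how long a compound has to wait before synthesis can start.

```python
# Hypothetical procurement times in days; 0 means the block is on the shelf.
PROCUREMENT_DAYS = {"acid_1": 0, "amine_1": 0, "amine_2": 14}

# Hypothetical compounds, each described by the building blocks it needs.
COMPOUNDS = {
    "cpd_A": ["acid_1", "amine_1"],
    "cpd_B": ["acid_1", "amine_2"],
}


def wait_days(blocks: list, procurement_days: dict) -> int:
    """The slowest building block determines the waiting time."""
    return max(procurement_days[b] for b in blocks)


def split_sub_spaces(compounds: dict, procurement_days: dict):
    """Separate compounds that can be made now from those awaiting procurement."""
    on_shelf, delayed = [], []
    for compound_id, blocks in compounds.items():
        days = wait_days(blocks, procurement_days)
        (on_shelf if days == 0 else delayed).append((compound_id, days))
    return on_shelf, delayed
```

The same structure could carry a price per building block, so that cost per compound is estimated alongside the waiting time.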

Structure-Based Drug Design Augmentation

3D approaches profit from additional insights and layers of information when applying AI in drug discovery. Our scoring algorithm HYDE enables the incremental calculation of the binding contribution of individual heavy atoms, providing a highly detailed view of how each part of a molecule interacts within a target complex.

Guided by interactions, HYDE identifies particularly favorable placements of atoms in the binding site, highlighting regions where high-quality interactions can be achieved. This approach takes into account the delicate balance of solvation and desolvation effects for both the ligand and the target, ensuring a realistic and meaningful assessment of binding affinity.

By focusing on individual atomic contributions, HYDE helps refine molecular designs with precision, optimizing interactions for stronger and more effective binding.

Additional Layer of Information for AI-Driven SBDD

The individual atom scores are written out in CSV format, which can be easily read by common applications to identify important interaction hotspots within the AI/ML workflow.

HYDE captures the ligand-complex holistically: Important factors such as torsions, inter- and intra-molecular clashes, number of H-bonds, and the estimated affinity of the entire ligand are part of the output.
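
A per-atom CSV of this kind can be parsed with standard tooling. The sketch below uses Python's built-in `csv` module; the column names (`ligand`, `atom_index`, `element`, `contribution`), the sample values, and the threshold are all assumptions made for illustration and do not reflect the actual HYDE output layout. Negative contributions are treated as favorable.

```python
import csv
import io

# Made-up example data; the real file layout may differ.
SAMPLE_CSV = """ligand,atom_index,element,contribution
lig_1,1,N,-4.2
lig_1,2,C,0.3
lig_1,3,O,-6.1
lig_1,4,C,1.8
"""


def load_atom_scores(handle) -> list:
    """Read rows and convert the contribution column to float."""
    return [
        {**row, "contribution": float(row["contribution"])}
        for row in csv.DictReader(handle)
    ]


def hotspots(rows: list, threshold: float = -3.0) -> list:
    """Atoms at or below the threshold, most favorable first."""
    favorable = (r for r in rows if r["contribution"] <= threshold)
    return sorted(favorable, key=lambda r: r["contribution"])


rows = load_atom_scores(io.StringIO(SAMPLE_CSV))
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`, and the column names adjusted to match the header of the exported file.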

Structure Preparation Setup for Modeling

All modeling approaches require the best starting points to ensure the highest possible precision of predictions. Embedding our tools into your structure-based workflows enhances the performance of your virtual screenings and molecular modeling studies.

Nice-to-know: With the --hbond-only option, only the H-bond network is calculated and optimized for each input ligand. HYDE scoring and pose optimization are skipped, i.e., no HYDE scores are available in the output file.

More Potential Applications of HYDE

Holistic Assessment of Interactions

The conformational sampling of the complex can be seamlessly combined with the interpretable feedback from HYDE to enable iterative optimization within ligand design. A further layer of information (calculated binding affinities, LE/LLE, torsions, clashes) provides new insights into the binding behavior of the compound (series).

Input for Model Generation

Identifying hot spots for high-quality interactions between the (potential) ligand and target structure can serve as the basis for rational drug design, which is implemented through a generative workflow. It is particularly helpful that both aspects, ligand AND target, are analyzed for their contributions to the overall affinity.

More Use Cases

  • Orthogonal scoring metric for hit enrichment.
  • Refinement of poses for template-based docking or fragment extension.
  • Sanity check for workflow-generated poses.
  • Surface sampling of target surface for poorly assembled regions.
  • Assessment of interactions with (conserved) water, co-factors, and ions.

Understanding and Analyzing Your Compound Sets

Our tools enable the generation of parameters that can additionally be used to categorize compounds and serve as new input for ML/AI approaches.

The generated data points can help you understand and discover patterns in your compounds.
Use the generated parameters to further characterize your compounds and apply dimensionality reduction methods such as UMAP or PCA to cluster your compounds, improve your model, and interpret the topology of your chemical diversity.

Computable Parameters

  • docking score (FlexX)
  • predicted binding affinity (HYDE)
  • LE/LLE
  • molecular torsions
  • inter- & intramolecular clashes
  • logP
  • TPSA
  • ADME properties
  • max ring size
  • total charge
  • Bemis-Murcko scaffold
  • # acceptors
  • # donors
  • # heavy atoms
  • # aromatic atoms
  • # N and O
  • # rings
  • # aromatic rings
  • # halogens
  • # rotatable bonds

Workflows in Motion: AI in Drug Discovery

Many users have already recognized the value of our applications and successfully integrated them into their drug discovery pipelines. Among the numerous possibilities, we would like to highlight two published success stories in more detail.

AIDDISON Workflow by MilliporeSigma & Merck

To enhance virtual screening performance and compound selection, Kuhn, Ghogare et al. from Merck and MilliporeSigma embedded Chemical Space navigation with FTrees into the AIDDISON workflow.
FTrees is used for 2D pharmacophore searches, enabling scaffold hopping and efficient ligand-based screening to expand the chemical diversity of compounds related to a proposed candidate. AIDDISON incorporates several Chemical Spaces: Enamine's REAL Space, WuXi's GalaXi, and their own in-house SA-Space, each covering billions of synthetically accessible molecules.

The integration of the search algorithm and the combinatorial Chemical Spaces accelerates searching for, filtering, and identifying promising drug-like molecules while expanding into uncharted areas of the chemical space that would not be covered by other 2D methods.

(Workflow representation modified after Kuhn, Ghogare et al. 2023.)

SpaceHASTEN Workflow by Orion Pharma

In order to mine relevant chemistry based on an enumerated set of seed compounds, Kalliokoski et al. integrated SpaceLight and FTrees into their SpaceHASTEN workflow to optimize the performance of a structure-based virtual screening approach.
SpaceHASTEN is designed to handle giga-scale compound collections like Enamine's REAL Space and combines molecular docking with machine learning-guided compound selection.
ML models predict docking scores, which are used to prioritize compounds retrieved through similarity searches. The iteratively refined selection aims to maximize hit rates and deliver diverse, synthesizable virtual hits with high docking scores for experimental validation.

This approach enables efficient handling of billion-sized molecule clusters with reduced computational costs compared to brute-force docking.
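
The general idea of surrogate-guided selection can be illustrated with a deliberately simple toy. This is not the SpaceHASTEN implementation: compounds are reduced to a single number, `toy_dock` is a hypothetical stand-in for an expensive docking run (lower is better), and the "model" is a one-nearest-neighbour lookup instead of a trained ML model. The loop docks a small batch, updates the surrogate, and uses its predictions to choose the next batch.

```python
def toy_dock(x: float) -> float:
    """Hypothetical stand-in for an expensive docking run; lower is better."""
    return (x - 0.7) ** 2


def predict(x: float, docked: dict) -> float:
    """One-nearest-neighbour surrogate: reuse the score of the closest docked compound."""
    nearest = min(docked, key=lambda known: abs(known - x))
    return docked[nearest]


def guided_screen(pool: list, seeds: list, rounds: int, batch: int) -> dict:
    """Dock the seeds, then repeatedly dock the candidates the surrogate ranks best."""
    docked = {x: toy_dock(x) for x in seeds}
    for _ in range(rounds):
        candidates = [x for x in pool if x not in docked]
        if not candidates:
            break
        candidates.sort(key=lambda x: predict(x, docked))
        for x in candidates[:batch]:
            docked[x] = toy_dock(x)
    return docked


pool = [i / 100 for i in range(101)]
docked = guided_screen(pool, seeds=[0.0, 0.5, 1.0], rounds=3, batch=5)
```

Only a fraction of the pool is ever docked, which is the source of the savings compared to brute-force docking. A purely greedy selection like this one can get stuck near the seeds, so practical workflows typically add diversity or exploration to the selection step.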

(Workflow representation modified after Kalliokoski et al. 2023.)

Clearly, the limit of possibilities has not yet been reached. Just as drug discovery is complex, the solutions to achieve results can be equally diverse.

Our versatile portfolio not only enables the generation of valuable additional data points for AI-driven drug discovery but also helps improve and optimize the economic efficiency of the processes.

Augment your workflows with BioSolveIT Applications