
Molecule Libraries and Their Use in Drug Discovery

Molecule libraries are collections of individual compounds. They can exist physically, with every substance stored as a powder or in solution, ready to be used and tested when needed, or virtually, as molecule collections stored on a computer.

Virtual libraries are a resource widely used in computer-aided drug discovery (CADD). They serve many purposes: searching for potential active substances, extracting subsets to train or test a model, or simply keeping track of the compounds in stock.

Harnessing Your Molecule Libraries

Molecule libraries can come from different sources. On the one hand, the substances they contain may already exist and be stored in vials, waiting to be used. On the other hand, the molecules can be entirely virtual and may not (yet) exist. The entries can also originate in-house or come from compound vendors' catalogs, to be ordered as needed.
The content of libraries can also vary greatly. They may contain highly reactive reagents, small fragments, drug-like compounds, beyond-rule-of-five compounds, natural products, and anything else currently in the research portfolio.
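To illustrate how such a heterogeneous library might be sorted by content, the sketch below splits entries into rule-of-five compliant and beyond-rule-of-five compounds using Lipinski's thresholds (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors). It assumes the descriptors have already been computed, for example with a cheminformatics toolkit; the `Compound` record, the function names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical library record carrying precomputed descriptors."""
    name: str
    mol_weight: float  # molecular weight in Da
    clogp: float       # calculated octanol/water logP
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def ro5_violations(c: Compound) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def partition_library(library):
    """Split a library into rule-of-five compliant and beyond-rule-of-five entries.

    Following Lipinski's original formulation, a compound is flagged as
    beyond-rule-of-five only when it violates more than one criterion.
    """
    compliant, beyond = [], []
    for c in library:
        (beyond if ro5_violations(c) > 1 else compliant).append(c)
    return compliant, beyond


# Hypothetical entries with made-up descriptor values.
library = [
    Compound("cpd_A", 180.2, 1.2, 1, 3),    # no violations
    Compound("cpd_B", 1202.6, 3.6, 5, 12),  # MW and acceptors violated
    Compound("cpd_C", 520.0, 4.0, 2, 6),    # only MW violated
]
compliant, beyond = partition_library(library)
print([c.name for c in compliant])  # ['cpd_A', 'cpd_C']
print([c.name for c in beyond])     # ['cpd_B']
```

Counting violations rather than returning a yes/no answer keeps the cutoff adjustable, for instance if a project wants only strictly compliant compounds.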

It is also common to create project-specific molecule libraries. For kinase-focused projects, for example, a diverse library rich in hinge-binding motifs can be useful. Similarly, a library of molecules known to be inactive against a target can serve as a decoy set.

Managing Molecule Libraries and Making Them Accessible

As versatile as molecule libraries may be, their true potential is only realized when they are actually put to use. It is therefore essential that every internal user can access them. Unrestricted access and ease of use enable efficient work, leading to faster and better overall results.

A central molecule library management system addresses these needs and comes with several advantages. Libraries are stored in one central location, accessible to all users for their specific tasks. This eliminates the need to transfer data from one place to another, which conserves resources. For large virtual screening campaigns, the system can run the demanding computations, and once they are completed, the results can easily be loaded and analyzed. Additionally, one or more administrators can manage the entire user team.
To streamline in-house virtual screening campaigns, we have developed a high-performance computing solution that can be used by any team.
  • HPSee: Scalable virtual screening workflow environment.
    HPSee lets users upload molecule libraries that can subsequently be used for virtual screening runs. Molecule libraries and users can be managed within the interface.
    Serving as a gateway to high-performance computing, HPSee enables the efficient docking of large molecule libraries. Once the calculations are complete, the results can be analyzed and evaluated in SeeSAR's graphical user interface.

Making Molecule Libraries Versatile

As mentioned, molecule libraries can be used in a variety of ways. A more creative approach is to use the molecules as fragments for growing. In a structure-based scenario, they can be attached to a ligand to see which fragments best complement the binding site.

This maximizes the versatility of the compounds and draws the greatest benefit from in-house structures. Quick screening methods can also generate ideas on the fly and lead to concepts that might not have been considered otherwise.

FastGrow is a tool that can both generate databases of molecular fragments and use them for compound extension. Sampling hundreds of thousands of fragments, each in several conformations, to find the solutions with the best shape complementarity to a binding site takes only a few seconds on standard hardware.

Software for compound extension:
  • SeeSAR: Visual drug design dashboard for computational and medicinal chemists.
Command-line tools for compound extension:

Create Custom Molecule Libraries of Tangible Results

Custom molecule libraries are a useful means of creating tailored datasets for a specific task. For example, it may be advantageous to have several thousand molecules on hand that are very similar, but not identical, to a particular molecule. Such datasets are often used in machine learning, for example to train a model or to test it on data it has not seen before.
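One simple way to assemble such a set is to keep every library entry whose fingerprint similarity to the query falls in a band just below identity. The sketch below is a minimal illustration, assuming fingerprints are already available as sets of on-bit indices; the Tanimoto coefficient, the 0.7 lower cutoff, and the helper names are hypothetical choices. It treats a similarity of 1.0 (identical fingerprints) as a stand-in for identity, although two different molecules can share a fingerprint.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_but_not_identical(
    query: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    lower: float = 0.7,
    upper: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs with lower <= similarity < upper, best first."""
    hits = []
    for name, fp in library.items():
        sim = tanimoto(query, fp)
        if lower <= sim < upper:
            hits.append((name, sim))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical 10-bit query and library fingerprints.
query = frozenset(range(10))
library = {
    "cpd_1": frozenset(range(10)),             # identical fingerprint: 1.0, excluded
    "cpd_2": frozenset(range(9)) | {10},       # 9/11 = 0.82, kept
    "cpd_3": frozenset(range(8)) | {10, 11},   # 8/12 = 0.67, below cutoff
    "cpd_4": frozenset(range(9)),              # 9/10 = 0.90, kept
}
print([name for name, _ in similar_but_not_identical(query, library)])
# ['cpd_4', 'cpd_2']
```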

Chemical Spaces are the perfect sources for such custom molecule libraries. Because of their sheer size, with billions or even trillions of compounds, they are large enough to return relevant results for many query molecules. De novo methods in particular benefit from Chemical Spaces, which can be searched for the commercially available compounds closest to the generated designs.
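Such sizes are reachable because a Chemical Space does not have to be stored as a list of enumerated molecules. Assuming a combinatorial layout of reactions and building-block sets, each reaction contributes the product of its building-block counts, and the size of the space is the sum over all reactions. The sketch below uses hypothetical reaction names and counts and treats every product as distinct, so it gives an upper-bound style count.

```python
from math import prod
from typing import Dict, List


def enumerated_size(reactions: Dict[str, List[int]]) -> int:
    """Count the products a combinatorial space encodes without enumerating them.

    Each reaction maps to the number of building blocks available at each
    reactant position. Picking one building block per position yields one
    product, so a reaction contributes the product of its counts; the space
    size is the sum over reactions (duplicates across reactions are ignored).
    """
    return sum(prod(counts) for counts in reactions.values())


# Hypothetical reactions and building-block counts.
reactions = {
    "two_component_coupling": [50_000, 40_000],
    "three_component_reaction": [2_000, 3_000, 1_000],
}
print(enumerated_size(reactions))  # 8000000000
```

With only 96,000 building blocks, the two hypothetical reactions above already encode eight billion products.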

For mining relevant molecules, different methods are available that retrieve compounds based on various forms of similarity. Some projects require an exact substructure search, others want to explore analogs of a compound, and still others seek a bit more structural fuzziness to break out of a specific chemotype.
BioSolveIT software for compound mining:
  • infiniSee: Chemical Space navigation platform with a graphical user interface.
    infiniSee retrieves relevant chemistry from ultra-large Chemical Spaces containing billions or even trillions of compounds, based on their similarity to a query compound. Results are synthetically accessible by design in one or two steps and, in the case of our partners' Chemical Spaces, can be ordered directly.
  • infiniSee xREAL: Exclusive platform to screen Enamine's largest compound catalog featuring trillions of compounds.
    infiniSee xREAL contains all features of infiniSee and supports all three Chemical Space exploration search modes.
Command-line tools for compound mining:
  • FTrees: Pharmacophore-based similarity screen. Algorithm behind the Scaffold Hopper Mode.
  • SpaceLight: Retrieves close analogs based on molecular fingerprints. Algorithm behind the Analog Hunter Mode.
  • SpaceMACS: Performs maximum common substructure searches, as well as exact substructure mining. Algorithm behind the Motif Matcher Mode.
