Class MaxCommonSubstructure

java.lang.Object
com.chemaxon.search.mcs.MaxCommonSubstructure
All Implemented Interfaces:
chemaxon.license.Licensable
Direct Known Subclasses:
BuildupMcs, MaxCliqueMcs

@PublicApi public abstract class MaxCommonSubstructure extends Object implements chemaxon.license.Licensable
Abstract base class of the algorithms for finding the maximum common substructure (MCS) of two molecules. More precisely, these algorithms find the maximum common edge subgraph (MCES) of the input structures. For more information, see the user's guide.

An instance of the default MCS algorithm implementation can be created using newInstance() or newInstance(McsSearchOptions). The provided MCS algorithms are powerful heuristic methods, which typically find large common substructures in a short time. However, they do not always provide the exact optimal result due to the complexity of the MCS problem (especially for large molecules). Furthermore, as the algorithms perform randomized search, different results might be obtained for equivalent molecule representations.

Warning: MCS algorithms do not transform the input molecules, so apply aromatization (and any other standardization actions) consistently to both molecules before searching.

Features: Query and target structures play essentially the same role in MCS search, except for query features: a query molecule may contain generic query atoms (A, Q, M, X, list atom, not list atom, etc.) and query bonds (any, single or double, etc.), but query properties (e.g., valence, hydrogen count) are ignored. If exact query atom/bond matching is set to true, generic atoms and bonds are allowed in both molecules, but they are matched exactly. Reactions are also supported, but Markush structures are not.

Typical usage:

 // queryMol and targetMol are Molecule objects prepared (e.g., standardized) beforehand
 MaxCommonSubstructure mcs = MaxCommonSubstructure.newInstance();
 mcs.setMolecules(queryMol, targetMol);
 // Run the search; a valid result is returned even if the MCS is empty
 McsSearchResult result = mcs.find();
 System.out.println("Atoms in MCS: " + result.getAtomCount());
 System.out.println("Bonds in MCS: " + result.getBondCount());
 System.out.println("MCS molecule: " + MolExporter.exportToFormat(result.getAsMolecule(), "smiles"));
 
  • Field Details

    • DEFAULT_RANDOM_SEED

      public static final long DEFAULT_RANDOM_SEED
      Default random seed.
    • randomSeed

      protected long randomSeed
      Random seed (0 means using the current time).
    • searchMode

      protected SearchMode searchMode
      Search mode.
    • timeLimit

      protected long timeLimit
      Time limit in milliseconds (-1 means disabled).
    • queryMol

      protected Molecule queryMol
      Query molecule (it is neither cloned nor modified).
    • targetMol

      protected Molecule targetMol
      Target molecule (it is neither cloned nor modified).
    • searchOpts

      protected final McsSearchOptions searchOpts
      Search options.
    • LOG

      protected static final System.Logger LOG
      Logger object.
    • matcher

      protected com.chemaxon.search.vf2.Vf2Matcher matcher
      The matcher object containing the preprocessed query and target molecules.
  • Constructor Details

    • MaxCommonSubstructure

      protected MaxCommonSubstructure(McsSearchOptions searchOpts)
  • Method Details

    • newInstance

      public static MaxCommonSubstructure newInstance()
      Creates a new instance of the MCS search algorithm using the default search options.

      The MaxCliqueMcs implementation is used, which proved to be the most efficient and robust in our benchmark tests.

      Returns:
      a new instance of the default algorithm implementation
    • newInstance

      public static MaxCommonSubstructure newInstance(McsSearchOptions searchOpts)
      Creates a new instance of the MCS search algorithm using the given search options.

      The MaxCliqueMcs implementation is used, which proved to be the most efficient and robust in our benchmark tests.

      Parameters:
      searchOpts - the search options (not null)
      Returns:
      a new instance of the default algorithm implementation
    • isLicensed

      public final boolean isLicensed()
      Returns information about the licensing of the product.
      Specified by:
      isLicensed in interface chemaxon.license.Licensable
      Returns:
      true if the product is correctly licensed
    • setLicenseEnvironment

      public final void setLicenseEnvironment(String env)
      Sets the license environment.
      Specified by:
      setLicenseEnvironment in interface chemaxon.license.Licensable
    • setMolecules

      public final void setMolecules(Molecule query, Molecule target)
      Sets the two molecular structures to be matched.
      Parameters:
      query - query molecule (not null)
      target - target molecule (not null)
    • setQuery

      public final void setQuery(Molecule query)
      Sets the query structure.
      Parameters:
      query - query molecule (not null)
    • setTarget

      public final void setTarget(Molecule target)
      Sets the target structure.
      Parameters:
      target - target molecule (not null)
    • getQuery

      public final Molecule getQuery()
      Gets the query structure.
      Returns:
      the query molecule
    • getTarget

      public final Molecule getTarget()
      Gets the target structure.
      Returns:
      the target molecule
    • getSearchOptions

      public final McsSearchOptions getSearchOptions()
      Returns the search options used by this instance. For more information, see McsSearchOptions.
      Returns:
      search options (not null)
    • setSearchMode

      public final void setSearchMode(SearchMode mode)
      Sets the search mode that controls the running time and the accuracy of the algorithm. The default option is SearchMode.NORMAL. For more information, see SearchMode.
      Parameters:
      mode - search mode (not null)
    • getSearchMode

      public final SearchMode getSearchMode()
      Gets the current search mode. For more information, see SearchMode.
      Returns:
      current search mode (not null)
    • setTimeLimit

      public final void setTimeLimit(long maxMilliseconds)
      Sets the maximum allowed time for MCS search. If the given limit is exceeded, the search process terminates with the best result obtained so far. When searching for multiple results (without changing the input molecules), the time limit is applied only to the first call of nextResult() (or hasNextResult()).

      This is an optional limit, which is set to 1 minute by default. You can use a negative parameter value to disable it.

      If the search process seems to be too slow, consider using FAST search mode instead of decreasing the time limit (see setSearchMode(SearchMode)).

      Parameters:
      maxMilliseconds - maximum running time in milliseconds (negative value means disabled)
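
The "best result obtained so far" behavior described for setTimeLimit can be pictured with a small, library-independent sketch. This is a conceptual illustration only, not the ChemAxon implementation: the class TimeLimitedSearchSketch, the method bestWithinLimit and the maxIterations parameter are hypothetical names introduced here, and candidate scores stand in for candidate common substructures.

```java
import java.util.function.IntSupplier;

/** Conceptual sketch of a time-limited "best so far" search loop (hypothetical, not ChemAxon code). */
final class TimeLimitedSearchSketch {

    /**
     * Evaluates candidates one by one and keeps the best score seen.
     * A negative limitMillis disables the time check, mirroring the
     * documented convention of setTimeLimit. maxIterations should be >= 1
     * so that at least one candidate is always evaluated.
     */
    static int bestWithinLimit(IntSupplier candidate, long limitMillis, int maxIterations) {
        long start = System.nanoTime();
        int best = Integer.MIN_VALUE;
        for (int i = 0; i < maxIterations; i++) {
            best = Math.max(best, candidate.getAsInt());
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (limitMillis >= 0 && elapsedMillis >= limitMillis) {
                break; // limit reached: stop and report the best result so far
            }
        }
        return best;
    }
}
```

Because the limit is checked only after a candidate has been evaluated, the sketch always returns some result, which matches the documented guarantee that a valid result is produced even when the time limit is reached.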
    • getTimeLimit

      public final long getTimeLimit()
      Gets the maximum allowed MCS search time in milliseconds. If no such limit is specified, this method returns -1. For more information, see setTimeLimit(long).
      Returns:
      maximum running time in milliseconds or -1 if no such limit is specified
    • setRandomSeed

      public final void setRandomSeed(long seed)
      Sets the random seed value. 0 means using the current time.
      Parameters:
      seed - random seed
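
Since the algorithms perform randomized search, a fixed non-zero seed is the usual way to make runs repeatable. The sketch below illustrates the "0 means using the current time" convention with java.util.Random. It is an assumption about how such a convention is typically resolved, not a description of the library's internals; SeedSketch, effectiveSeed and draw are hypothetical names.

```java
import java.util.Random;

/** Illustrates a "0 = use current time" seed convention (hypothetical helper, not ChemAxon code). */
final class SeedSketch {

    /** Returns the seed itself, or the current time in milliseconds if the seed is 0. */
    static long effectiveSeed(long seed) {
        return seed == 0 ? System.currentTimeMillis() : seed;
    }

    /** Draws n pseudo-random integers in [0, 100) from a generator using the given seed. */
    static int[] draw(long seed, int n) {
        Random rnd = new Random(effectiveSeed(seed));
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = rnd.nextInt(100);
        }
        return out;
    }
}
```

Two calls such as draw(42, 5) yield identical sequences, whereas draw(0, 5) depends on the clock and may differ between runs.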
    • getRandomSeed

      public final long getRandomSeed()
      Gets the random seed value. 0 means using the current time.
      Returns:
      the random seed
    • find

      public final McsSearchResult find()
      Performs MCS search according to the specified options. This method returns a valid result object even if the MCS is empty or the search time limit is reached.

      If multiple MCS results are desired, use hasNextResult() and nextResult().

      Returns:
      the McsSearchResult object containing the found common substructure
      Throws:
      IllegalStateException - if the input molecules are not set before calling this method or the method is called more than once for the same input molecules
      CancellationException - if the thread has been interrupted
    • hasNextResult

      public final boolean hasNextResult()
      Returns whether there are more results available. See nextResult().
      Returns:
      true if nextResult() can be called to obtain a new McsSearchResult object
      Throws:
      IllegalStateException - if the input molecules are not set before calling this method
      CancellationException - if the thread has been interrupted
    • nextResult

      public final McsSearchResult nextResult()
      Finds the next MCS search result according to the specified options. The first call to this method always returns a valid result object, even if the MCS is empty or the search time limit is reached.

      If multiple search results are desired, this method can be called repeatedly. The results typically correspond to equivalent common substructures but with different mappings. (See McsSearchResult for more information.) When this method is called multiple times, hasNextResult() should be called first to avoid an IllegalStateException.

      The search state is reset after a call to setQuery(Molecule), setTarget(Molecule) or setMolecules(Molecule, Molecule).

      Returns:
      the McsSearchResult object containing the found common substructure
      Throws:
      IllegalStateException - if the input molecules are not set before the first call to this method or there are no more results to return
      CancellationException - if the thread has been interrupted
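
The hasNextResult()/nextResult() protocol described above amounts to a guarded loop. A minimal sketch follows, written against plain functional interfaces so that it compiles without the library; ResultCollector and collect are hypothetical names, and the cap on the number of results is an added safeguard, not something the API requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Collects results from a hasNext/next style source (hypothetical helper, not ChemAxon code). */
final class ResultCollector {

    /**
     * Calls next only after hasNext has returned true, and stops after
     * maxResults items so that a large number of equivalent mappings
     * cannot make the loop run unexpectedly long.
     */
    static <T> List<T> collect(BooleanSupplier hasNext, Supplier<T> next, int maxResults) {
        List<T> results = new ArrayList<>();
        while (results.size() < maxResults && hasNext.getAsBoolean()) {
            results.add(next.get());
        }
        return results;
    }
}

// Possible wiring with this class, assuming the molecules have already been set:
//   List<McsSearchResult> results =
//       ResultCollector.collect(() -> mcs.hasNextResult(), () -> mcs.nextResult(), 10);
```

Checking the size limit before calling hasNext avoids triggering a further search step once enough results have been gathered.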
    • calculateUpperBound

      public final int calculateUpperBound()
      Calculates an upper bound on the number of bonds the maximum common substructure may contain with respect to the specified search options. The returned value is an upper bound on McsSearchResult.getBondCount().

      As this method is much faster than executing the MCS algorithm (find()), it can be used for pre-filtering.

      Returns:
      an upper bound on the size (bond count) of the MCS
      Throws:
      IllegalStateException - if the input molecules are not set
      CancellationException - if the thread has been interrupted
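
To see why a bond-count upper bound can be much cheaper than a full search, consider a simple label-counting bound: every matched bond pair must carry the same label, so for each label at most min(count in query, count in target) bonds can be matched. The sketch below implements that idea on plain string labels. It is an illustration of the general principle under exact label matching and is not claimed to be the method used by calculateUpperBound(); BondLabelUpperBound and the label strings are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Label-histogram upper bound on the number of common bonds (illustrative, not ChemAxon code). */
final class BondLabelUpperBound {

    /** Sums, over all bond labels, the smaller of the two occurrence counts. */
    static int upperBound(List<String> queryBonds, List<String> targetBonds) {
        Map<String, Integer> queryCounts = histogram(queryBonds);
        Map<String, Integer> targetCounts = histogram(targetBonds);
        int bound = 0;
        for (Map.Entry<String, Integer> e : queryCounts.entrySet()) {
            bound += Math.min(e.getValue(), targetCounts.getOrDefault(e.getKey(), 0));
        }
        return bound;
    }

    /** Counts how often each label occurs. */
    private static Map<String, Integer> histogram(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }
}
```

The bound needs only one pass over each bond list, while the exact MCS may be considerably smaller because the bound ignores connectivity.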
    • calculateSimilarityUpperBound

      public final float calculateSimilarityUpperBound()
      Calculates an upper bound on the similarity of the query and target molecules with respect to the specified search options. The returned value is an upper bound on McsSearchResult.getSimilarity().

      As this method is much faster than executing the MCS algorithm (find()), it can be used for pre-filtering.

      Returns:
      an upper bound on the (MCS based) similarity of the query and the target
      Throws:
      IllegalStateException - if the input molecules are not set
      CancellationException - if the thread has been interrupted
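
The pre-filtering idea mentioned for the two upper-bound methods can be sketched as follows: compute the cheap bound first and run the expensive search only if the bound reaches a chosen threshold. The helper is written against plain functional interfaces so that it compiles without the library; McsPrefilter, searchIfPromising and the threshold values are hypothetical.

```java
import java.util.Optional;
import java.util.function.DoubleSupplier;
import java.util.function.Supplier;

/** Runs an expensive search only when a cheap upper bound allows a useful result (hypothetical helper). */
final class McsPrefilter {

    /**
     * Returns an empty Optional without calling search if the upper bound
     * is below the threshold; otherwise returns the search result.
     */
    static <R> Optional<R> searchIfPromising(DoubleSupplier upperBound, double threshold,
                                             Supplier<R> search) {
        if (upperBound.getAsDouble() < threshold) {
            return Optional.empty(); // the real value cannot reach the threshold
        }
        return Optional.ofNullable(search.get());
    }
}

// Possible wiring with this class, assuming the molecules have already been set:
//   Optional<McsSearchResult> bySize =
//       McsPrefilter.searchIfPromising(() -> mcs.calculateUpperBound(), 8, () -> mcs.find());
//   Optional<McsSearchResult> bySimilarity =
//       McsPrefilter.searchIfPromising(() -> mcs.calculateSimilarityUpperBound(), 0.6, () -> mcs.find());
```

Passing the bound as a supplier keeps the helper independent of whether the caller filters on bond count or on similarity.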
    • findMcs

      protected abstract McsSearchResult findMcs()
      Finds the MCS and all related data (fragments and mappings). Deriving classes must implement this method to calculate an (approximate) maximum common substructure after internal data structures are properly initialized.

      For internal use only.