Class LibraryMcs

java.lang.Object
chemaxon.clustering.libmcs.LibraryMcs
All Implemented Interfaces:
chemaxon.license.Licensable
Direct Known Subclasses:
LibraryMCS

@PublicApi public class LibraryMcs extends Object implements chemaxon.license.Licensable
The LibraryMCS class computes the maximum common substructure (MCS) of a set of compounds. It can suggest scaffolds for a library, in particular for VHTS hit sets. A typical input set holds a few thousand molecules, but LibraryMCS can cope with tens of thousands.
The algorithm does not determine a single subgraph common to all or most of the input molecules; it determines the set of the most frequently occurring common substructures. The more diverse the analysed set is, the larger the number of frequent common substructures; a more focused set with limited structural diversity yields fewer of them.
The algorithm can go one or more levels further in this kind of scaffold analysis by finding the MCS of the frequent common substructures, and so on in a hierarchical manner.
Practically speaking, structures are clustered based on their MCSs (not on their similarities etc.) in a hierarchical clustering procedure.
This class implements the nested ClusterEnumerator class, which allows clients to retrieve the hierarchy. The tree of clusters and the data associated with the nodes in this tree can be accessed, along with various code values that help reconstruct the hierarchy in custom applications.
This class also provides a simple command line interface for batch processing of MCS search for a set of structures, as well as a simple graphical user interface for easy navigation through clusters of structures.
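As a rough intuition for the "set of frequent common substructures" idea described above, the following toy analogy works on plain strings, with the longest common substring standing in for the MCS. It is not the LibraryMCS algorithm; the class ToyFrequentCommon and both of its methods are hypothetical names introduced only for this illustration. It merely shows how pairwise comparison of a mixed input set can yield several frequent common fragments instead of one fragment shared by every item.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy analogy only: strings stand in for structures and the longest common
// substring stands in for the MCS. This is NOT the LibraryMCS algorithm.
final class ToyFrequentCommon {

    /** Longest common substring of a and b; on ties the first one found in a wins. */
    static String longestCommonSubstring(String a, String b) {
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        int endInA = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        endInA = i;
                    }
                }
            }
        }
        return a.substring(endInA - best, endInA);
    }

    /** Counts, over all pairs, each common substring of at least minSize characters. */
    static Map<String, Integer> frequentCommon(List<String> items, int minSize) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                String common = longestCommonSubstring(items.get(i), items.get(j));
                if (common.length() >= minSize) {
                    counts.merge(common, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

For the inputs "abcdeX", "YabcdeZ", "abcde9", "pqrs1" and "2pqrs" with a minimum size of 3, the map holds two fragments, "abcde" (3 pairs) and "pqrs" (1 pair), which loosely mirrors a library that splits into two scaffold families.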
Since:
JChem 3.2
  • Field Details

    • DEFAULT_REQUIRED_CLUSTER_COUNT

      public static final int DEFAULT_REQUIRED_CLUSTER_COUNT
      minimum number of top-level clusters
    • DEFAULT_ALLOWED_LEVEL_COUNT

      public static final int DEFAULT_ALLOWED_LEVEL_COUNT
      maximum number of levels in the hierarchy
    • ATOM_COUNT_UPPER_BOUND

      public static final int ATOM_COUNT_UPPER_BOUND
      structures above this size are not searched for pairwise MCS, as the calculation would take too long
    • MAX_LEVEL_COUNT

      public static final int MAX_LEVEL_COUNT
      maximum allowed number of hierarchy levels
    • DEFAULT_MCS_MODE

      public static final SearchMode DEFAULT_MCS_MODE
      default MCS search mode
    • DEFAULT_KEEP_RINGS_MODE

      public static final boolean DEFAULT_KEEP_RINGS_MODE
      Rings are not broken by default
    • DEFAULT_ATOM_TYPE_MATCH

      public static final boolean DEFAULT_ATOM_TYPE_MATCH
      atom types are matched by default
    • DEFAULT_BOND_TYPE_MATCH

      public static final boolean DEFAULT_BOND_TYPE_MATCH
      bond types are matched by default
    • DEFAULT_CHARGE_MATCH

      public static final boolean DEFAULT_CHARGE_MATCH
      atom formal charges are matched by default
    • DEFAULT_RADICAL_MATCH

      public static final boolean DEFAULT_RADICAL_MATCH
      atom radicals are not matched by default
    • DEFAULT_ISOTOPE_MATCH

      public static final boolean DEFAULT_ISOTOPE_MATCH
      atom isotopes are not matched by default
    • DEFAULT_MIN_MCS_SIZE

      public static final int DEFAULT_MIN_MCS_SIZE
      default MCS size limit, the algorithm does not search for an MCS below this limit
    • TERMINATION_UNKNOWN

      public static final int TERMINATION_UNKNOWN
      last search terminated for an unknown reason, solution may not be found
    • TERMINATION_ERROR

      public static final int TERMINATION_ERROR
      last search terminated due to an error, solution is not found
    • TERMINATION_LEVEL_COUNT

      public static final int TERMINATION_LEVEL_COUNT
      last search terminated because the predefined allowed level count was reached
    • TERMINATION_CLUSTER_COUNT

      public static final int TERMINATION_CLUSTER_COUNT
      last search terminated because the required top level cluster count was reached
    • TERMINATION_MCS_SIZE_LIMIT

      public static final int TERMINATION_MCS_SIZE_LIMIT
      last search terminated because the allowed minimum MCS size was reached
    • TERMINATION_CANCEL

      public static final int TERMINATION_CANCEL
      last search terminated due to user cancellation
    • TERMINATION_SAME_PARAMETERS

      public static final int TERMINATION_SAME_PARAMETERS
      last attempt to cluster one level failed as the clustering parameters were the same as on the last level
    • TERMINATION_STEP_NOT_ALLOWED

      public static final int TERMINATION_STEP_NOT_ALLOWED
      invalid call of method step()
  • Constructor Details

    • LibraryMcs

      public LibraryMcs() throws LicenseException
      Creates a new LibraryMCS instance. It is an empty chemical space that is ready to take structures to be clustered.
      Throws:
      LicenseException - when no valid license is found
  • Method Details

    • isLicensed

      public final boolean isLicensed()
      Specified by:
      isLicensed in interface chemaxon.license.Licensable
    • setLicenseEnvironment

      public final void setLicenseEnvironment(String string)
      Specified by:
      setLicenseEnvironment in interface chemaxon.license.Licensable
    • reset

      public void reset()
      Resets the internal state to the initial values. Note that it does not clear the chemical space: input structures that were added (and clustered) previously are kept, and only the clusters are deleted. This allows running clustering from scratch without the need to import and add the input molecules again.
      Typically, parameters are changed before reclustering.
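The reset-then-recluster cycle described above can be sketched as a small parameter sweep. This is a minimal sketch under stated assumptions: the Reclusterable interface is a hypothetical stand-in that copies four method signatures from this page so the code compiles without JChem, and ToyClusterer is a made-up implementation whose cluster counts are arbitrary. Parameters are set after reset() on the assumption that this order is safe whether or not reset() also restores parameter defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in that mirrors four methods documented on this page,
// declared only so the sketch compiles without JChem on the classpath.
interface Reclusterable {
    void reset();
    void setMinimumMCSSize(int mcsSize);
    boolean search() throws InterruptedException;
    int getTopLevelClusterCount();
}

final class ParameterSweep {
    /** Re-clusters the same input once per size; maps size to top-level cluster count. */
    static Map<Integer, Integer> sweep(Reclusterable mcs, int... minSizes) {
        Map<Integer, Integer> topLevelCounts = new LinkedHashMap<>();
        try {
            for (int size : minSizes) {
                mcs.reset();                   // clusters are dropped, input structures stay
                mcs.setMinimumMCSSize(size);   // change parameters before re-clustering
                if (mcs.search()) {            // record only runs that found a solution
                    topLevelCounts.put(size, mcs.getTopLevelClusterCount());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
        }
        return topLevelCounts;
    }
}

// Toy implementation with made-up behaviour, for demonstration only.
final class ToyClusterer implements Reclusterable {
    private int minSize = 1;
    private int clusters = 0;
    int resetCalls = 0;

    public void reset() { clusters = 0; resetCalls++; }
    public void setMinimumMCSSize(int mcsSize) { minSize = mcsSize; }
    public boolean search() { clusters = 20 / minSize; return clusters > 0; } // arbitrary formula
    public int getTopLevelClusterCount() { return clusters; }
}
```

Because the input structures survive reset(), the molecules only need to be added once before the sweep; each iteration then starts from an empty hierarchy.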
    • setRequiredClusterCount

      public void setRequiredClusterCount(int requiredClusterCount)
      Sets the minimum number of clusters required on the top level of the hierarchy. The search terminates if the number of clusters on the highest level of the hierarchy is less than this limit.
      Parameters:
      requiredClusterCount - number of top level clusters
    • setAllowedLevelCount

      public void setAllowedLevelCount(int allowedLevelCount)
      Sets the maximum number of hierarchy levels allowed in clustering. Clustering terminates when the hierarchy has this many levels.
      Parameters:
      allowedLevelCount - number of hierarchy levels allowed (tree depth)
    • setAtomCountUpperBound

      public void setAtomCountUpperBound(int atomCountUpperBound)
      Sets the maximum structure size for pairwise MCS search. Structures above this size are not selected for a pairwise MCS search. This limit has a strong effect on the results as well as on the total running time. MCS search for larger structures (e.g. above 40 atoms) can be slow.
    • setMCSMode

      public void setMCSMode(SearchMode mode)
      Sets the MCS search strategy. Allowed values are NORMAL and FAST (default).
      Parameters:
      mode - mode flag
    • setMinimumMCSSize

      public void setMinimumMCSSize(int mcsSize)
      Sets the minimum size of any MCS found. MCSs below this size limit are ignored.
      Parameters:
      mcsSize - minimum required size of any MCS
    • setKeepRings

      public void setKeepRings(boolean keepRings)
      Sets whether rings should be kept or they can be broken.
      Parameters:
      keepRings - false if rings can be broken.
    • setAtomTypeMatch

      public void setAtomTypeMatch(boolean b)
      Sets the matching mode for atom types. Atom types can either be considered (checked) or ignored when two molecules are searched for an MCS.
      Parameters:
      b - flags if atom types are considered (true) or ignored (false)
    • setBondTypeMatch

      public void setBondTypeMatch(boolean b)
      Sets the matching mode for bond types. Bond types can either be considered (checked) or ignored when two molecules are searched for an MCS.
      Parameters:
      b - flags if bond types are considered (true) or ignored (false)
    • setChargeMatch

      public void setChargeMatch(boolean b)
      Sets the matching mode for atom formal charges. Charges can either be considered (checked) or ignored when two molecules are searched for an MCS.
      Parameters:
      b - flags if atom charges are considered (true) or ignored (false)
    • setRadicalMatch

      public void setRadicalMatch(boolean b)
      Sets the matching mode for radicals on atoms. Radicals can either be considered (checked) or ignored when two molecules are searched for an MCS.
      Parameters:
      b - flags if atom radicals are considered (true) or ignored (false)
    • setIsotopeMatch

      public void setIsotopeMatch(boolean b)
      Sets the matching mode for isotopes. Isotopes can either be considered (checked) or ignored when two molecules are searched for an MCS.
      Parameters:
      b - flags if atom isotopes are considered (true) or ignored (false)
    • addMolecule

      public void addMolecule(Molecule mol)
      Adds a new molecule to the set of structures to be clustered. The input molecule will be aromatized and the hybridization states of its atoms will also be calculated. The mol object is not copied, so it is modified in place: its state after the call differs from its state before the call.
      Parameters:
      mol - a molecular structure to be clustered
    • search

      public boolean search() throws InterruptedException
      Performs hierarchical maximum common substructure search. The search terminates if any of the conditions below holds:
      • no more MCS above a given size is found
      • required cluster count is reached
      • allowed number of levels is reached
      When search() terminates, getStopCause() can be invoked to get the termination code (see the TERMINATION_* constants).
      Returns:
      true if a solution was found (i.e. at least one MCS was found and a cluster was successfully formed), false otherwise
      Throws:
      InterruptedException
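The call pattern described for search() (run it, then query why it stopped) can be sketched as follows. This is a minimal sketch under stated assumptions: McsSearch is a hypothetical stand-in interface that copies three method signatures from this page so the code compiles without JChem, and ToySearch returns arbitrary numbers and texts that are not the real TERMINATION_* values or messages.

```java
// Hypothetical stand-in mirroring three methods documented on this page,
// declared only so the sketch compiles without JChem on the classpath.
interface McsSearch {
    boolean search() throws InterruptedException;
    int getStopCause();
    String getStopCauseExplanation();
}

final class SearchReport {
    /** Runs the search and turns the outcome into a one-line report. */
    static String run(McsSearch mcs) {
        try {
            boolean found = mcs.search();
            String outcome = found ? "clusters formed" : "no solution";
            // The stop cause is meaningful only after search() has returned.
            return outcome + " (stop code " + mcs.getStopCause() + "): "
                    + mcs.getStopCauseExplanation();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return "interrupted";
        }
    }
}

// Toy implementation for demonstration only.
final class ToySearch implements McsSearch {
    private final boolean found;

    ToySearch(boolean found) { this.found = found; }

    public boolean search() { return found; }
    // Arbitrary numbers, NOT the real values of the TERMINATION_* constants.
    public int getStopCause() { return found ? 2 : 1; }
    public String getStopCauseExplanation() {
        return found ? "allowed level count reached" : "error";
    }
}
```

In real code the numeric code would be compared against the TERMINATION_* constants by name rather than against literal numbers.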
    • step

      public boolean step()
      Adds one more level to the existing cluster hierarchy. Method search() must be called prior to this method and it has to return true. In addition, clustering options (typically the allowed minimum size of the MCS, see setMinimumMCSSize(int)) must be changed, otherwise step() has no effect (since the termination conditions were already reached when the previous search() terminated).
      Returns:
      true if one more level was successfully added to the existing cluster hierarchy
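The requirement that a clustering option must change before each step() call can be sketched as a loop that relaxes the minimum MCS size one atom at a time. This is a minimal sketch under stated assumptions: SteppableMcs is a hypothetical stand-in interface that copies three method signatures from this page, LevelExtender is an illustrative helper, and ToyHierarchy only imitates the documented "no parameter change, no effect" behaviour with made-up numbers. The caller is assumed to have already run search() with a true result.

```java
// Hypothetical stand-in mirroring three methods documented on this page,
// declared only so the sketch compiles without JChem on the classpath.
interface SteppableMcs {
    void setMinimumMCSSize(int mcsSize);
    boolean step();
    int getLevelCount();
}

final class LevelExtender {
    /**
     * Lowers the minimum MCS size by one and calls step() until the hierarchy
     * has wantedLevels levels or the size floor is reached. Assumes search()
     * was already called and returned true. Returns the number of levels added.
     */
    static int extend(SteppableMcs mcs, int currentMinSize, int floorSize, int wantedLevels) {
        int added = 0;
        int size = currentMinSize;
        while (mcs.getLevelCount() < wantedLevels && size > floorSize) {
            size--;
            mcs.setMinimumMCSSize(size); // without a parameter change step() has no effect
            if (mcs.step()) {
                added++;
            }
        }
        return added;
    }
}

// Toy implementation with made-up behaviour, for demonstration only.
final class ToyHierarchy implements SteppableMcs {
    private int levels = 1;
    private int minSize = 9;        // arbitrary starting value
    private int sizeAtLastStep = 9;

    public void setMinimumMCSSize(int mcsSize) { minSize = mcsSize; }

    public boolean step() {
        if (minSize == sizeAtLastStep) {
            return false;           // unchanged parameters: nothing happens
        }
        sizeAtLastStep = minSize;
        levels++;
        return true;
    }

    public int getLevelCount() { return levels; }
}
```

Stopping at a size floor keeps the loop from relaxing the MCS size until the fragments become too small to be meaningful; the floor value itself is a choice left to the caller.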
    • getClusterEnumerator

      public LibraryMcs.ClusterEnumerator getClusterEnumerator(boolean leavesOnly)
      Gets a new LibraryMcs.ClusterEnumerator object.
      Parameters:
      leavesOnly - whether only leaf nodes (true) or all clusters (false) are enumerated
      Returns:
      the initialized enumerator
    • getClusterEnumerator

      public LibraryMcs.ClusterEnumerator getClusterEnumerator(boolean leavesOnly, boolean selectedOnly)
      Gets a new LibraryMcs.ClusterEnumerator object.
      Parameters:
      leavesOnly - whether only leaf nodes (true) or all clusters (false) are enumerated
      selectedOnly - whether only selected clusters and leaf nodes are listed
      Returns:
      the initialized enumerator
    • getStopCause

      public int getStopCause()
      Internal code of the last termination condition. This can be called after search() or step().
      Returns:
      code of last termination condition, see TERMINATION*
    • getStopCauseExplanation

      public String getStopCauseExplanation()
      Detailed explanation of why the last search terminated. This can be called after search() or step().
      Returns:
      text explaining why the search algorithm terminated
    • getInputStructureCount

      public int getInputStructureCount()
      Retrieves the total number of input structures clustered.
      Returns:
      number of clusters in the lowest level of the hierarchy
    • getLevelCount

      public int getLevelCount()
      Retrieves the total number of levels in the hierarchy.
      Returns:
      number of hierarchy levels
    • getTotalClusterCount

      public int getTotalClusterCount()
      Gets the total number of clusters in the hierarchy. Leaf nodes storing the input structures are not considered, only higher-level nodes that represent real clusters. Singletons are included.
      Returns:
      number of clusters
    • getTopLevelClusterCount

      public int getTopLevelClusterCount()
      Gets the number of clusters on the highest level of the hierarchy. Singletons are included.
      Returns:
      number of clusters on the top hierarchy level
    • main

      @Deprecated(forRemoval=true) @SubjectToRemoval(date=JAN_01_2025) public static void main(String[] args)
      Deprecated, for removal: This API element is subject to removal in a future version.
      This main method will be removed, CLI interfaces should not be used directly from Java code.
      Simple command line interface for batch processing. Run this class with the -h flag on its command line to get a brief summary of the command line syntax and the available options.
      Parameters:
      args - command line arguments