Class LibraryMCS

  • All Implemented Interfaces:
    chemaxon.license.Licensable

    @PublicAPI
    public class LibraryMCS
    extends Object
    implements chemaxon.license.Licensable
    The LibraryMCS class computes the maximum common substructure (MCS) of a set of compounds. It can suggest scaffolds of a library, in particular VHTS hit sets. Typical input sets contain a few thousand molecules, but LibraryMCS can cope with tens of thousands of molecules.
    The algorithm does not determine a single subgraph common to all, or to the majority of, the input molecules. Instead, it determines the set of the most frequently occurring common substructures. The more diverse the analysed set is, the larger the number of frequent common substructures; a more focused set with limited structural diversity yields fewer frequent common substructures.
    The algorithm is capable of going one or more levels further in this kind of scaffold analysis by finding the MCS of the frequent common substructures, and so on in a hierarchical manner.
    Practically speaking, structures are clustered based on their MCSs (not on their similarities etc.) in a hierarchical clustering procedure.
    This class provides the LibraryMCS.ClusterEnumerator class, which allows clients to retrieve the hierarchy. The tree of clusters as well as data associated with nodes in this tree can be accessed along with various code values that help reconstruct the hierarchy in custom applications.
    This class also provides a simple command line interface for batch processing of MCS search for a set of structures, as well as a simple graphical user interface for easy navigation through clusters of structures.
    Since:
    JChem 3.2
    • Field Detail

      • DEFAULT_REQUIRED_CLUSTER_COUNT

        public static final int DEFAULT_REQUIRED_CLUSTER_COUNT
        minimum number of top-level clusters
        See Also:
        Constant Field Values
      • DEFAULT_ALLOWED_LEVEL_COUNT

        public static final int DEFAULT_ALLOWED_LEVEL_COUNT
        maximum number of levels in the hierarchy
        See Also:
        Constant Field Values
      • ATOM_COUNT_UPPER_BOUND

        public static final int ATOM_COUNT_UPPER_BOUND
        structures above this size are not searched for pairwise MCS, as it would take too long to calculate the MCS
        See Also:
        Constant Field Values
      • MAX_LEVEL_COUNT

        public static final int MAX_LEVEL_COUNT
        maximum allowed number of hierarchy levels
        See Also:
        Constant Field Values
      • DEFAULT_MCS_MODE

        public static final SearchMode DEFAULT_MCS_MODE
        default MCS search mode
      • DEFAULT_KEEP_RINGS_MODE

        public static final boolean DEFAULT_KEEP_RINGS_MODE
        Rings are not broken by default
        See Also:
        Constant Field Values
      • DEFAULT_ATOM_TYPE_MATCH

        public static final boolean DEFAULT_ATOM_TYPE_MATCH
        atom types are matched by default
        See Also:
        Constant Field Values
      • DEFAULT_BOND_TYPE_MATCH

        public static final boolean DEFAULT_BOND_TYPE_MATCH
        bond types are matched by default
        See Also:
        Constant Field Values
      • DEFAULT_CHARGE_MATCH

        public static final boolean DEFAULT_CHARGE_MATCH
        atom formal charges are matched by default
        See Also:
        Constant Field Values
      • DEFAULT_RADICAL_MATCH

        public static final boolean DEFAULT_RADICAL_MATCH
        atom radicals are not matched by default
        See Also:
        Constant Field Values
      • DEFAULT_ISOTOPE_MATCH

        public static final boolean DEFAULT_ISOTOPE_MATCH
        atom isotopes are not matched by default
        See Also:
        Constant Field Values
      • DEFAULT_MIN_MCS_SIZE

        public static final int DEFAULT_MIN_MCS_SIZE
        default MCS size limit, the algorithm does not search for an MCS below this limit
        See Also:
        Constant Field Values
      • TERMINATION_UNKNOWN

        public static final int TERMINATION_UNKNOWN
        last search terminated for an unknown reason, solution may not be found
        See Also:
        Constant Field Values
      • TERMINATION_ERROR

        public static final int TERMINATION_ERROR
        last search terminated due to an error, solution is not found
        See Also:
        Constant Field Values
      • TERMINATION_LEVEL_COUNT

        public static final int TERMINATION_LEVEL_COUNT
        last search terminated because the predefined allowed level count was reached
        See Also:
        Constant Field Values
      • TERMINATION_CLUSTER_COUNT

        public static final int TERMINATION_CLUSTER_COUNT
        last search terminated because the required top level cluster count was reached
        See Also:
        Constant Field Values
      • TERMINATION_MCS_SIZE_LIMIT

        public static final int TERMINATION_MCS_SIZE_LIMIT
        last search terminated because the allowed minimum MCS size was reached
        See Also:
        Constant Field Values
      • TERMINATION_CANCEL

        public static final int TERMINATION_CANCEL
        last search terminated due to user cancellation
        See Also:
        Constant Field Values
      • TERMINATION_SAME_PARAMETERS

        public static final int TERMINATION_SAME_PARAMETERS
        last attempt to cluster one level failed as the clustering parameters were the same as on the last level
        See Also:
        Constant Field Values
      • TERMINATION_STEP_NOT_ALLOWED

        public static final int TERMINATION_STEP_NOT_ALLOWED
        invalid call of method step()
        See Also:
        Constant Field Values
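        The termination codes above fall into a few natural groups. The following is a minimal sketch of how a client might sort them into coarse outcomes. The numeric values are placeholders (the real ones are listed under Constant Field Values and are not reproduced here), and the Outcome enum and its grouping are assumptions made for illustration, not part of LibraryMCS.

```java
/** Sketch: sorting termination codes into coarse outcomes. */
final class StopCauseTriage {
    // Placeholder values only, NOT the real constants. Real code would
    // reference LibraryMCS.TERMINATION_* directly instead of these stand-ins.
    static final int TERMINATION_UNKNOWN = 0;
    static final int TERMINATION_ERROR = 1;
    static final int TERMINATION_LEVEL_COUNT = 2;
    static final int TERMINATION_CLUSTER_COUNT = 3;
    static final int TERMINATION_MCS_SIZE_LIMIT = 4;
    static final int TERMINATION_CANCEL = 5;
    static final int TERMINATION_SAME_PARAMETERS = 6;
    static final int TERMINATION_STEP_NOT_ALLOWED = 7;

    /** Hypothetical grouping, introduced for this example only. */
    enum Outcome { LIMIT_REACHED, CANCELLED, STEP_REJECTED, FAILED, UNCERTAIN }

    static Outcome triage(int stopCause) {
        switch (stopCause) {
            case TERMINATION_LEVEL_COUNT:
            case TERMINATION_CLUSTER_COUNT:
            case TERMINATION_MCS_SIZE_LIMIT:
                return Outcome.LIMIT_REACHED;   // a configured limit ended the search
            case TERMINATION_CANCEL:
                return Outcome.CANCELLED;       // the user cancelled the search
            case TERMINATION_SAME_PARAMETERS:
            case TERMINATION_STEP_NOT_ALLOWED:
                return Outcome.STEP_REJECTED;   // step() was invalid or had no effect
            case TERMINATION_ERROR:
                return Outcome.FAILED;          // no solution was found
            default:
                return Outcome.UNCERTAIN;       // TERMINATION_UNKNOWN or an unexpected code
        }
    }

    public static void main(String[] args) {
        System.out.println(triage(TERMINATION_MCS_SIZE_LIMIT));
        System.out.println(triage(TERMINATION_ERROR));
    }
}
```

        Because the real constants are declared static final int, they could be used as case labels in the same way.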
    • Constructor Detail

      • LibraryMCS

        public LibraryMCS()
                   throws chemaxon.license.LicenseException
        Creates a new LibraryMCS instance. It is an empty chemical space that is ready to take structures to be clustered.
        Throws:
        chemaxon.license.LicenseException - when no valid license is found
    • Method Detail

      • isLicensed

        public final boolean isLicensed()
        Specified by:
        isLicensed in interface chemaxon.license.Licensable
      • setLicenseEnvironment

        public final void setLicenseEnvironment​(String string)
        Specified by:
        setLicenseEnvironment in interface chemaxon.license.Licensable
      • reset

        public void reset()
        Resets the internal state to the initial values. Note that it does not clear the chemical space: input structures that were added previously (and clustered) are not removed, but clusters are deleted. This allows running clustering from scratch without the need to import and add input molecules again.
        Typically, parameters are changed before reclustering.
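        The reclustering pattern described here can be sketched as follows. The Clusterer interface is a hypothetical stand-in that mirrors only the reset(), setMinimumMCSSize(int) and search() signatures documented on this page, and RecordingClusterer is a test double that merely logs calls, so the example runs without the library and performs no chemistry.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: search, then reset and search again with a changed parameter. */
final class ReclusterSketch {
    /** Hypothetical stand-in mirroring three documented LibraryMCS methods. */
    interface Clusterer {
        void reset();
        void setMinimumMCSSize(int mcsSize);
        boolean search() throws InterruptedException;
    }

    /** Test double that records the calls it receives. */
    static final class RecordingClusterer implements Clusterer {
        final List<String> calls = new ArrayList<>();
        @Override public void reset() { calls.add("reset"); }
        @Override public void setMinimumMCSSize(int mcsSize) {
            calls.add("setMinimumMCSSize(" + mcsSize + ")");
        }
        @Override public boolean search() { calls.add("search"); return true; }
    }

    /** Runs one search per size; input structures are assumed to be added already. */
    static void searchWithSizes(Clusterer clusterer, int... minSizes)
            throws InterruptedException {
        boolean first = true;
        for (int size : minSizes) {
            if (!first) {
                clusterer.reset(); // deletes clusters, keeps the input structures
            }
            clusterer.setMinimumMCSSize(size); // change a parameter before reclustering
            clusterer.search();
            first = false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RecordingClusterer recorder = new RecordingClusterer();
        searchWithSizes(recorder, 9, 6);
        System.out.println(recorder.calls);
    }
}
```

        With the real class, the molecules would be added once before the first search and would not need to be added again after reset().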
      • setRequiredClusterCount

        public void setRequiredClusterCount​(int requiredClusterCount)
        Sets the minimal number of clusters required on the top level of the hierarchy. Search terminates if the number of clusters on the highest level of the hierarchy is less than this limit.
        Parameters:
        requiredClusterCount - number of top level clusters
      • setAllowedLevelCount

        public void setAllowedLevelCount​(int allowedLevelCount)
        Sets the maximum number of hierarchy levels allowed in clustering. Clustering terminates when the hierarchy has this many levels.
        Parameters:
        allowedLevelCount - number of hierarchy levels allowed (tree depth)
      • setAtomCountUpperBound

        public void setAtomCountUpperBound​(int atomCountUpperBound)
        Sets the maximum structure size for pairwise MCS search. Structures above this size are not selected for a pairwise MCS search. This limit has a strong effect on the results as well as on the total running time. MCS search for larger structures (e.g. above 40 atoms) can be slow.
        Parameters:
        atomCountUpperBound - maximum atom count of structures selected for pairwise MCS search
      • setMCSMode

        public void setMCSMode​(SearchMode mode)
        Sets MCS search strategy. Allowed values are NORMAL or FAST (default).
        Parameters:
        mode - mode flag
      • setMinimumMCSSize

        public void setMinimumMCSSize​(int mcsSize)
        Sets the minimum size of any MCS found. MCSs below this size limit are ignored.
        Parameters:
        mcsSize - minimum required size of any MCS
      • setKeepRings

        public void setKeepRings​(boolean keepRings)
        Sets whether rings should be kept or they can be broken.
        Parameters:
        keepRings - false if rings can be broken.
      • setAtomTypeMatch

        public void setAtomTypeMatch​(boolean b)
        Sets the matching mode for atom types. Atom types can either be considered (checked) or ignored when two molecules are searched for an MCS.
        Parameters:
        b - flags if atom types are considered (true) or ignored (false)
      • setBondTypeMatch

        public void setBondTypeMatch​(boolean b)
        Sets the matching mode for bond types. Bond types can either be considered (checked) or ignored when two molecules are searched for an MCS.
        Parameters:
        b - flags if bond types are considered (true) or ignored (false)
      • setChargeMatch

        public void setChargeMatch​(boolean b)
        Sets the matching mode for atom formal charges. Charges can either be considered (checked) or ignored when two molecules are searched for an MCS.
        Parameters:
        b - flags if atom charges are considered (true) or ignored (false)
      • setRadicalMatch

        public void setRadicalMatch​(boolean b)
        Sets the matching mode for radicals on atoms. Radicals can either be considered (checked) or ignored when two molecules are searched for an MCS.
        Parameters:
        b - flags if atom radicals are considered (true) or ignored (false)
      • setIsotopeMatch

        public void setIsotopeMatch​(boolean b)
        Sets the matching mode for isotopes. Isotopes can either be considered (checked) or ignored when two molecules are searched for an MCS.
        Parameters:
        b - flags if atom isotopes are considered (true) or ignored (false)
      • addMolecule

        public void addMolecule​(Molecule mol)
        Adds a new molecule to the set of structures to be clustered. The input molecule will be aromatized and the hybridization states of atoms will also be calculated. The mol object is not copied, so its value on output differs from the input value.
        Parameters:
        mol - a molecular structure to be clustered
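        Because the argument is modified in place, a caller that still needs the original structure may want to hand over a copy. The following is a minimal sketch of that defensive-copy pattern. Structure and its copy() method are hypothetical stand-ins, not ChemAxon types, and how a Molecule is best copied is not covered by this page.

```java
import java.util.function.Consumer;

/** Sketch: protecting an object from a method that modifies its argument. */
final class DefensiveCopySketch {
    /** Hypothetical stand-in for a mutable structure object. */
    static final class Structure {
        String representation;
        Structure(String representation) { this.representation = representation; }
        Structure copy() { return new Structure(representation); }
    }

    /** Passes a copy to the consumer, so the original stays untouched. */
    static void addCopy(Structure original, Consumer<Structure> mutatingAdd) {
        mutatingAdd.accept(original.copy());
    }

    public static void main(String[] args) {
        Structure original = new Structure("as drawn");
        // Stand-in for an add method that rewrites its argument in place.
        addCopy(original, s -> s.representation = "modified");
        System.out.println(original.representation);
    }
}
```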
      • search

        public boolean search()
                       throws InterruptedException
        Performs hierarchical maximum common substructure search. The search terminates if any of the conditions below holds:
        • no more MCS above a given size is found
        • required cluster count is reached
        • allowed number of levels is reached
        When search() terminates, getStopCause() can be invoked to get the termination code (see constants TERMINATION*).
        Returns:
        true if a solution was found (i.e. at least one MCS was found and a cluster was successfully formed), false otherwise
        Throws:
        InterruptedException
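        A caller has to deal with both the boolean result and the checked InterruptedException. The following is a minimal sketch of one way to do that. The Searcher interface is a hypothetical stand-in that mirrors only the documented search() and getStopCauseExplanation() signatures, and the fake in main returns invented text.

```java
/** Sketch: running a search and reporting why it stopped. */
final class SearchRunSketch {
    /** Hypothetical stand-in mirroring two documented LibraryMCS methods. */
    interface Searcher {
        boolean search() throws InterruptedException;
        String getStopCauseExplanation();
    }

    /** Runs the search and summarises the outcome as one line of text. */
    static String runAndDescribe(Searcher searcher) {
        try {
            boolean found = searcher.search();
            String prefix = found ? "clusters formed" : "no solution";
            return prefix + ": " + searcher.getStopCauseExplanation();
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers further up can react to it.
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // Canned fake; the explanation text is invented for this demo.
        Searcher fake = new Searcher() {
            @Override public boolean search() { return true; }
            @Override public String getStopCauseExplanation() { return "demo explanation"; }
        };
        System.out.println(runAndDescribe(fake));
    }
}
```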
      • step

        public boolean step()
        Adds one more level to the existing cluster hierarchy. Method search() must be called prior to this method and it has to return true. Besides, clustering options (typically, the allowed minimum size of the MCS, see setMinimumMCSSize(int)) must be changed, otherwise step() has no effect (since termination conditions were reached when the previous search() terminated).
        Returns:
        true if one more level was successfully added to the existing cluster hierarchy
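        The search-then-step protocol can be sketched as a loop that changes the minimum MCS size before each step(). The SteppableClusterer interface is a hypothetical stand-in mirroring only signatures documented on this page. FakeClusterer has invented behaviour (a step succeeds only when the size limit changed since the last level), and lowering the limit by one per step is an assumption about typical use, not a documented recommendation.

```java
/** Sketch: adding levels with step() after a successful search(). */
final class StepSketch {
    /** Hypothetical stand-in mirroring four documented LibraryMCS methods. */
    interface SteppableClusterer {
        void setMinimumMCSSize(int mcsSize);
        boolean search() throws InterruptedException;
        boolean step();
        int getLevelCount();
    }

    /** Fake with invented behaviour, used only to make the sketch runnable. */
    static final class FakeClusterer implements SteppableClusterer {
        private int minSize;
        private int sizeAtLastLevel = -1;
        private int levels;
        @Override public void setMinimumMCSSize(int mcsSize) { minSize = mcsSize; }
        @Override public boolean search() {
            levels = 1;
            sizeAtLastLevel = minSize;
            return true;
        }
        @Override public boolean step() {
            if (levels == 0 || minSize == sizeAtLastLevel) {
                return false; // no prior search, or parameters unchanged
            }
            levels++;
            sizeAtLastLevel = minSize;
            return true;
        }
        @Override public int getLevelCount() { return levels; }
    }

    /** Searches at startSize, then steps while lowering the limit to floorSize. */
    static int searchThenDeepen(SteppableClusterer clusterer, int startSize, int floorSize)
            throws InterruptedException {
        clusterer.setMinimumMCSSize(startSize);
        if (!clusterer.search()) {
            return clusterer.getLevelCount(); // step() requires a successful search()
        }
        for (int size = startSize - 1; size >= floorSize; size--) {
            clusterer.setMinimumMCSSize(size); // change an option so step() can take effect
            if (!clusterer.step()) {
                break; // no further level could be added
            }
        }
        return clusterer.getLevelCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(searchThenDeepen(new FakeClusterer(), 6, 4));
    }
}
```

        With the real class, getStopCause() could be checked after a failed step() to see why no level was added.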
      • getClusterEnumerator

        public LibraryMCS.ClusterEnumerator getClusterEnumerator​(boolean leavesOnly,
                                                                 boolean selectedOnly)
        Gets a new LibraryMCS.ClusterEnumerator object.
        Parameters:
        leavesOnly - if true, only leaf nodes are enumerated; otherwise all clusters are enumerated
        selectedOnly - if true, only selected clusters and leaf nodes are listed
        Returns:
        the initialized enumerator
      • getStopCause

        public int getStopCause()
        Internal code of last termination condition. This can be called after search() or step().
        Returns:
        code of last termination condition, see TERMINATION*
      • getStopCauseExplanation

        public String getStopCauseExplanation()
        Detailed explanation why last search terminated. This can be called after search() or step().
        Returns:
        text explaining why the search algorithm terminated
      • getInputStructureCount

        public int getInputStructureCount()
        Retrieves the total number of input structures clustered.
        Returns:
        number of clusters in the lowest level of the hierarchy
      • getLevelCount

        public int getLevelCount()
        Retrieves the total number of levels in the hierarchy.
        Returns:
        number of hierarchy levels
      • getTotalClusterCount

        public int getTotalClusterCount()
        Gets the total number of clusters in the hierarchy. Leaf nodes storing the input structures are not considered, only higher level nodes that represent real clusters. Singletons are included.
        Returns:
        number of clusters
      • getTopLevelClusterCount

        public int getTopLevelClusterCount()
        Gets the number of clusters on the highest level of the hierarchy. Singletons are included.
        Returns:
        number of clusters on the top hierarchy level
      • main

        public static void main​(String[] args)
        Simple command line interface for batch processing. Run this class with the -h flag on the command line to get a brief summary of the command line syntax and the available options.
        Parameters:
        args - command line arguments