Package chemaxon.clustering.libmcs
Class LibraryMcs
java.lang.Object
chemaxon.clustering.libmcs.LibraryMcs
- All Implemented Interfaces:
chemaxon.license.Licensable
- Direct Known Subclasses:
LibraryMCS
The LibraryMCS class computes the maximum common substructure (MCS) of a set of compounds. It can suggest scaffolds of a library, in particular VHTS hit sets.
The typical size of such an input structure set is a few thousand molecules, but LibraryMCS can cope with tens of thousands of molecules.
The algorithm does not determine a single subgraph common to all or to the majority of the input molecules; it determines the set of the most frequently occurring common substructures. The more diverse the set to be analysed, the larger the number of frequent common substructures; for a more focused set with limited structural diversity, the number of frequent common substructures is smaller.
The algorithm is capable of going one or more levels further in this kind of scaffold analysis by finding the MCS of the frequent common substructures, and so on in a hierarchical manner.
Practically speaking, structures are clustered based on their MCSs (not on their similarities etc.) in a hierarchical clustering procedure.
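As a loose analogy for the idea described above (collecting the most frequent pairwise common substructures instead of one substructure shared by every input), the sketch below uses plain strings in place of molecules and the longest common substring in place of the MCS. It is not the LibraryMCS algorithm; the class name `FrequentCommonSubstringSketch` and both helper methods are hypothetical and exist only for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy analogy only: strings stand in for molecules and the longest common
// substring stands in for the MCS. This is NOT the LibraryMCS algorithm.
class FrequentCommonSubstringSketch {

    // Classic dynamic-programming longest common substring.
    static String longestCommonSubstring(String a, String b) {
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        int end = 0; // exclusive end index of the best match inside a
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        end = i;
                    }
                }
            }
        }
        return a.substring(end - best, end);
    }

    // Counts how often each pairwise common substring of at least minSize occurs.
    static Map<String, Integer> frequentCommon(List<String> items, int minSize) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                String common = longestCommonSubstring(items.get(i), items.get(j));
                if (common.length() >= minSize) {
                    counts.merge(common, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> library = List.of("abcX", "abcY", "Zabc", "qrsT", "Uqrs");
        System.out.println(frequentCommon(library, 3)); // prints {abc=3, qrs=1}
    }
}
```

In this toy library no substring is shared by all five items, yet two frequent common fragments emerge, which mirrors how a diverse input set yields several frequent common substructures rather than one.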
This class provides the ClusterEnumerator class, which allows clients to retrieve the hierarchy. The tree of clusters, as well as the data associated with the nodes of this tree, can be accessed along with various code values that help reconstruct the hierarchy in custom applications.
This class also provides a simple command line interface for batch processing of MCS search for a set of structures, as well as a simple graphical user interface for easy navigation through clusters of structures.
- Since:
- JChem 3.2
-
Nested Class Summary
Modifier and Type | Class | Description
class | LibraryMcs.ClusterEnumerator | The ClusterEnumerator is the right way to obtain results of a LibraryMCS clustering.
-
Field Summary
Modifier and Type | Field | Description
static final int | ATOM_COUNT_UPPER_BOUND | structures above this size are not searched for pair-wise MCS as it would take too long to calculate the MCS
static final int | DEFAULT_ALLOWED_LEVEL_COUNT | maximum number of levels in the hierarchy
static final boolean | DEFAULT_ATOM_TYPE_MATCH | atom types are matched by default
static final boolean | DEFAULT_BOND_TYPE_MATCH | bond types are matched by default
static final boolean | DEFAULT_CHARGE_MATCH | atom formal charges are matched by default
static final boolean | DEFAULT_ISOTOPE_MATCH | atom isotopes are not matched by default
static final boolean | DEFAULT_KEEP_RINGS_MODE | rings are not broken by default
static final SearchMode | DEFAULT_MCS_MODE | default MCS search mode
static final int | DEFAULT_MIN_MCS_SIZE | default MCS size limit, the algorithm does not search for an MCS below this limit
static final boolean | DEFAULT_RADICAL_MATCH | atom radicals are not matched by default
static final int | DEFAULT_REQUIRED_CLUSTER_COUNT | minimum number of top-level clusters
static final int | MAX_LEVEL_COUNT | maximum allowed number of hierarchy levels
static final int | TERMINATION_CANCEL | last search terminated due to user cancellation
static final int | TERMINATION_CLUSTER_COUNT | last search terminated because the required top level cluster count was reached
static final int | TERMINATION_ERROR | last search terminated due to an error, solution is not found
static final int | TERMINATION_LEVEL_COUNT | last search terminated because the predefined allowed level count was reached
static final int | TERMINATION_MCS_SIZE_LIMIT | last search terminated because the allowed minimum MCS size was reached
static final int | TERMINATION_SAME_PARAMETERS | last attempt to cluster one level failed as the clustering parameters were the same as on the last level
static final int | TERMINATION_STEP_NOT_ALLOWED | invalid call of method step()
static final int | TERMINATION_UNKNOWN | last search terminated for an unknown reason, solution may not be found
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | addMolecule(Molecule mol) | Adds a new molecule to the set of structures to be clustered.
LibraryMcs.ClusterEnumerator | getClusterEnumerator(boolean leavesOnly) | Gets a new LibraryMcs.ClusterEnumerator object.
LibraryMcs.ClusterEnumerator | getClusterEnumerator(boolean leavesOnly, boolean selectedOnly) | Gets a new LibraryMcs.ClusterEnumerator object.
int | getInputStructureCount() | Retrieves the total number of input structures clustered.
int | getLevelCount() | Retrieves the total number of levels in the hierarchy.
int | getStopCause() | Internal code of last termination condition.
String | getStopCauseExplanation() | Detailed explanation why last search terminated.
int | getTopLevelClusterCount() | Gets the number of clusters on the highest level of the hierarchy.
int | getTotalClusterCount() | Gets the total number of clusters in the hierarchy.
final boolean | isLicensed() |
static void | main(String[] args) | Deprecated, for removal: This API element is subject to removal in a future version. This main method will be removed, CLI interfaces should not be used directly from Java code.
void | reset() | Resets the internal state to the initial values.
boolean | search() | Performs hierarchical maximum common substructure search.
void | setAllowedLevelCount(int allowedLevelCount) | Sets the maximum number of hierarchy levels allowed in clustering.
void | setAtomCountUpperBound(int atomCountUpperBound) | Sets the maximum structure size for pairwise MCS search.
void | setAtomTypeMatch(boolean b) | Sets the matching mode for atom types.
void | setBondTypeMatch(boolean b) | Sets the matching mode for bond types.
void | setChargeMatch(boolean b) | Sets the matching mode for atom formal charges.
void | setIsotopeMatch(boolean b) | Sets the matching mode for isotopes.
void | setKeepRings(boolean keepRings) | Sets whether rings should be kept or they can be broken.
final void | setLicenseEnvironment(String string) |
void | setMCSMode(SearchMode mode) | Sets MCS search strategy.
void | setMinimumMCSSize(int mcsSize) | Sets the minimum size of any MCS found.
void | setRadicalMatch(boolean b) | Sets the matching mode for radicals on atoms.
void | setRequiredClusterCount(int requiredClusterCount) | Sets the minimal number of clusters required on the top level of hierarchy.
boolean | step() | Adds one more level to the existing cluster hierarchy.
-
Field Details
-
DEFAULT_REQUIRED_CLUSTER_COUNT
public static final int DEFAULT_REQUIRED_CLUSTER_COUNT
Minimum number of top-level clusters.
-
DEFAULT_ALLOWED_LEVEL_COUNT
public static final int DEFAULT_ALLOWED_LEVEL_COUNT
Maximum number of levels in the hierarchy.
-
ATOM_COUNT_UPPER_BOUND
public static final int ATOM_COUNT_UPPER_BOUND
Structures above this size are not searched for pair-wise MCS as it would take too long to calculate the MCS.
-
MAX_LEVEL_COUNT
public static final int MAX_LEVEL_COUNT
Maximum allowed number of hierarchy levels.
-
DEFAULT_MCS_MODE
Default MCS search mode.
-
DEFAULT_KEEP_RINGS_MODE
public static final boolean DEFAULT_KEEP_RINGS_MODE
Rings are not broken by default.
-
DEFAULT_ATOM_TYPE_MATCH
public static final boolean DEFAULT_ATOM_TYPE_MATCH
Atom types are matched by default.
-
DEFAULT_BOND_TYPE_MATCH
public static final boolean DEFAULT_BOND_TYPE_MATCH
Bond types are matched by default.
-
DEFAULT_CHARGE_MATCH
public static final boolean DEFAULT_CHARGE_MATCH
Atom formal charges are matched by default.
-
DEFAULT_RADICAL_MATCH
public static final boolean DEFAULT_RADICAL_MATCH
Atom radicals are not matched by default.
-
DEFAULT_ISOTOPE_MATCH
public static final boolean DEFAULT_ISOTOPE_MATCH
Atom isotopes are not matched by default.
-
DEFAULT_MIN_MCS_SIZE
public static final int DEFAULT_MIN_MCS_SIZE
Default MCS size limit; the algorithm does not search for an MCS below this limit.
-
TERMINATION_UNKNOWN
public static final int TERMINATION_UNKNOWN
Last search terminated for an unknown reason; a solution may not have been found.
-
TERMINATION_ERROR
public static final int TERMINATION_ERROR
Last search terminated due to an error; no solution was found.
-
TERMINATION_LEVEL_COUNT
public static final int TERMINATION_LEVEL_COUNT
Last search terminated because the predefined allowed level count was reached.
-
TERMINATION_CLUSTER_COUNT
public static final int TERMINATION_CLUSTER_COUNT
Last search terminated because the required top level cluster count was reached.
-
TERMINATION_MCS_SIZE_LIMIT
public static final int TERMINATION_MCS_SIZE_LIMIT
Last search terminated because the allowed minimum MCS size was reached.
-
TERMINATION_CANCEL
public static final int TERMINATION_CANCEL
Last search terminated due to user cancellation.
-
TERMINATION_SAME_PARAMETERS
public static final int TERMINATION_SAME_PARAMETERS
Last attempt to cluster one level failed as the clustering parameters were the same as on the last level.
-
TERMINATION_STEP_NOT_ALLOWED
public static final int TERMINATION_STEP_NOT_ALLOWED
Invalid call of method step().
-
-
Constructor Details
-
LibraryMcs
Creates a new LibraryMCS instance. It is an empty chemical space that is ready to take structures to be clustered.
- Throws:
LicenseException - when no valid license is found
-
-
Method Details
-
isLicensed
public final boolean isLicensed()
- Specified by:
isLicensed in interface chemaxon.license.Licensable
-
setLicenseEnvironment
- Specified by:
setLicenseEnvironment in interface chemaxon.license.Licensable
-
reset
public void reset()
Resets the internal state to the initial values. Note that this does not clear the chemical space: input structures that were added (and clustered) previously are not removed, but the clusters are deleted. This allows running the clustering from scratch without the need to import and add the input molecules again.
Typically, parameters are changed before reclustering.
-
setRequiredClusterCount
public void setRequiredClusterCount(int requiredClusterCount)
Sets the minimal number of clusters required on the top level of the hierarchy. The search terminates if the number of clusters on the highest level of the hierarchy is less than this limit.
- Parameters:
requiredClusterCount - number of top level clusters
-
setAllowedLevelCount
public void setAllowedLevelCount(int allowedLevelCount)
Sets the maximum number of hierarchy levels allowed in clustering. Clustering terminates when the hierarchy has this many levels.
- Parameters:
allowedLevelCount - number of hierarchy levels allowed (tree depth)
-
setAtomCountUpperBound
public void setAtomCountUpperBound(int atomCountUpperBound)
Sets the maximum structure size for pairwise MCS search. Structures above this size are not selected for a pairwise MCS search. This limit has a strong effect on the results as well as on the total running time. MCS search for larger structures (e.g. above 40 atoms) can be slow.
-
setMCSMode
Sets the MCS search strategy. Allowed values are NORMAL and FAST (default).
- Parameters:
mode - mode flag
-
setMinimumMCSSize
public void setMinimumMCSSize(int mcsSize)
Sets the minimum size of any MCS found. MCSs below this size limit are ignored.
- Parameters:
mcsSize - minimum required size of any MCS
-
setKeepRings
public void setKeepRings(boolean keepRings)
Sets whether rings should be kept or whether they can be broken.
- Parameters:
keepRings - false if rings can be broken.
-
setAtomTypeMatch
public void setAtomTypeMatch(boolean b)
Sets the matching mode for atom types. Atom types can either be considered (checked) or ignored when two molecules are searched for an MCS.
- Parameters:
b - flags if atom types are considered (true) or ignored (false)
-
setBondTypeMatch
public void setBondTypeMatch(boolean b)
Sets the matching mode for bond types. Bond types can either be considered (checked) or ignored when two molecules are searched for an MCS.
- Parameters:
b - flags if bond types are considered (true) or ignored (false)
-
setChargeMatch
public void setChargeMatch(boolean b)
Sets the matching mode for atom formal charges. Charges can either be considered (checked) or ignored when two molecules are searched for an MCS.
- Parameters:
b - flags if atom charges are considered (true) or ignored (false)
-
setRadicalMatch
public void setRadicalMatch(boolean b)
Sets the matching mode for radicals on atoms. Radicals can either be considered (checked) or ignored when two molecules are searched for an MCS.
- Parameters:
b - flags if atom radicals are considered (true) or ignored (false)
-
setIsotopeMatch
public void setIsotopeMatch(boolean b)
Sets the matching mode for isotopes. Isotopes can either be considered (checked) or ignored when two molecules are searched for an MCS.
- Parameters:
b - flags if atom isotopes are considered (true) or ignored (false)
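To make the effect of the matching flags above more concrete, here is a minimal sketch of how such flags could gate an atom-to-atom comparison. It is an assumption about the general idea, not the library's implementation: the `Atom` and `Options` types and the `matches` method are hypothetical, and only the default flag values are taken from the DEFAULT_*_MATCH field descriptions on this page.

```java
// Hypothetical illustration of flag-controlled atom comparison.
// None of these types belong to the ChemAxon API.
class AtomMatchSketch {

    // Simplified atom description used only in this sketch.
    static final class Atom {
        final String symbol;
        final int charge;
        final int massNumber;
        final int radicalElectrons;

        Atom(String symbol, int charge, int massNumber, int radicalElectrons) {
            this.symbol = symbol;
            this.charge = charge;
            this.massNumber = massNumber;
            this.radicalElectrons = radicalElectrons;
        }
    }

    // Defaults follow the documented behaviour: atom types and charges are
    // matched, isotopes and radicals are not.
    static final class Options {
        boolean atomType = true;
        boolean charge = true;
        boolean isotope = false;
        boolean radical = false;
    }

    // A property is compared only when its flag is switched on.
    static boolean matches(Atom a, Atom b, Options o) {
        if (o.atomType && !a.symbol.equals(b.symbol)) return false;
        if (o.charge && a.charge != b.charge) return false;
        if (o.isotope && a.massNumber != b.massNumber) return false;
        if (o.radical && a.radicalElectrons != b.radicalElectrons) return false;
        return true;
    }

    public static void main(String[] args) {
        Atom c12 = new Atom("C", 0, 12, 0);
        Atom c13 = new Atom("C", 0, 13, 0);
        Options options = new Options();
        System.out.println(matches(c12, c13, options)); // true: isotopes ignored
        options.isotope = true;
        System.out.println(matches(c12, c13, options)); // false: isotopes compared
    }
}
```

Switching a flag off makes the comparison more permissive, so more atoms can be paired; switching it on makes it stricter.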
-
addMolecule
Adds a new molecule to the set of structures to be clustered. The input molecule will be aromatized and the hybridization states of its atoms will also be calculated. The mol object is not copied, thus its value on output is different from the input value.
- Parameters:
mol - a molecular structure to be clustered
-
search
Performs hierarchical maximum common substructure search. The search terminates if any of the conditions below holds:
- no more MCS above a given size is found
- required cluster count is reached
- allowed number of levels is reached
getStopCause() can be invoked to get the termination code (see constants TERMINATION*).
- Returns:
- indicates whether a solution was found (i.e. at least one MCS was found and a cluster was successfully formed)
- Throws:
InterruptedException
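One way a caller might react to the termination code is sketched below. The numeric constant values are placeholders invented for this example (the real values are not listed on this page, so real code should refer to the LibraryMcs.TERMINATION_* fields), and the suggested follow-up actions are an assumed policy rather than documented behaviour. The class `StopCauseSketch` and the method `nextAction` are hypothetical.

```java
// Hypothetical dispatch on a termination code. The numbers are placeholders;
// real code should use the LibraryMcs.TERMINATION_* constants instead.
class StopCauseSketch {
    static final int TERMINATION_UNKNOWN = 0;        // placeholder value
    static final int TERMINATION_ERROR = 1;          // placeholder value
    static final int TERMINATION_LEVEL_COUNT = 2;    // placeholder value
    static final int TERMINATION_CLUSTER_COUNT = 3;  // placeholder value
    static final int TERMINATION_MCS_SIZE_LIMIT = 4; // placeholder value
    static final int TERMINATION_CANCEL = 5;         // placeholder value

    // Maps a stop cause to a possible follow-up (an assumed policy).
    static String nextAction(int stopCause) {
        switch (stopCause) {
            case TERMINATION_MCS_SIZE_LIMIT:
                return "lower the minimum MCS size, then try step()";
            case TERMINATION_LEVEL_COUNT:
                return "raise the allowed level count, then try step()";
            case TERMINATION_CLUSTER_COUNT:
                return "done: required top-level cluster count reached";
            case TERMINATION_CANCEL:
                return "cancelled by the user";
            case TERMINATION_ERROR:
                return "failed: no solution found";
            default:
                return "unknown: inspect getStopCauseExplanation()";
        }
    }

    public static void main(String[] args) {
        System.out.println(nextAction(TERMINATION_MCS_SIZE_LIMIT));
        System.out.println(nextAction(TERMINATION_UNKNOWN));
    }
}
```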
-
step
public boolean step()
Adds one more level to the existing cluster hierarchy. Method search() must be called prior to this method and it has to return true. Besides, clustering options (typically the allowed minimum size of the MCS, see setMinimumMCSSize(int)) must be changed, otherwise step() has no effect (since the termination conditions were reached when the previous search() terminated).
- Returns:
- true if one more level was successfully added to the existing cluster hierarchy
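The calling pattern described for step() (a successful search() first, then a changed option before each step()) can be sketched as follows. Because the ChemAxon library is not assumed to be on the classpath, the example uses a hypothetical `StandIn` class that models only this documented contract; its internals, the starting value 9, and the helper `buildHierarchy` are assumptions made for illustration.

```java
// Hypothetical stand-in that models only the documented search()/step()
// contract, so the calling pattern can be run without the ChemAxon library.
class StepLoopSketch {

    static class StandIn {
        private int minimumMcsSize = 9;   // placeholder, not the real default
        private int sizeAtLastLevel = -1; // option value used for the last level
        private int levels = 0;
        private boolean searchSucceeded = false;

        void setMinimumMCSSize(int size) { minimumMcsSize = size; }

        // Placeholder search: always builds a first level successfully.
        boolean search() {
            levels = 1;
            sizeAtLastLevel = minimumMcsSize;
            searchSucceeded = true;
            return true;
        }

        // Mirrors the documented rule: step() needs a prior successful
        // search() and changed options, otherwise it adds nothing.
        boolean step() {
            if (!searchSucceeded || minimumMcsSize == sizeAtLastLevel) {
                return false;
            }
            sizeAtLastLevel = minimumMcsSize;
            levels++;
            return true;
        }

        int getLevelCount() { return levels; }
    }

    // Runs search() once, then lowers the minimum MCS size before each step().
    static int buildHierarchy(StandIn clusterer, int startSize, int floorSize) {
        clusterer.setMinimumMCSSize(startSize);
        if (!clusterer.search()) {
            return clusterer.getLevelCount();
        }
        for (int size = startSize - 1; size >= floorSize; size--) {
            clusterer.setMinimumMCSSize(size);
            if (!clusterer.step()) {
                break;
            }
        }
        return clusterer.getLevelCount();
    }

    public static void main(String[] args) {
        System.out.println(buildHierarchy(new StandIn(), 9, 7)); // prints 3
    }
}
```

With the real class, the same loop shape would apply, with getStopCause() consulted whenever step() returns false.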
-
getClusterEnumerator
Gets a new LibraryMcs.ClusterEnumerator object.
- Parameters:
leavesOnly - whether only leaf nodes or all clusters are enumerated
- Returns:
- the initialized enumerator
-
getClusterEnumerator
Gets a new LibraryMcs.ClusterEnumerator object.
- Parameters:
leavesOnly - whether only leaf nodes or all clusters are enumerated
selectedOnly - selected clusters and leaf nodes are listed
- Returns:
- the initialized enumerator
-
getStopCause
public int getStopCause()
Internal code of the last termination condition. This can be called after search() or step().
- Returns:
- code of last termination condition, see TERMINATION*
-
getStopCauseExplanation
Detailed explanation of why the last search terminated. This can be called after search() or step().
- Returns:
- text explaining why the search algorithm terminated
-
getInputStructureCount
public int getInputStructureCount()
Retrieves the total number of input structures clustered.
- Returns:
- number of clusters in the lowest level of the hierarchy
-
getLevelCount
public int getLevelCount()
Retrieves the total number of levels in the hierarchy.
- Returns:
- number of hierarchy levels
-
getTotalClusterCount
public int getTotalClusterCount()
Gets the total number of clusters in the hierarchy. Leaf nodes storing the input structures are not considered, only higher level nodes that represent real clusters. Singletons are included.
- Returns:
- number of clusters
-
getTopLevelClusterCount
public int getTopLevelClusterCount()
Gets the number of clusters on the highest level of the hierarchy. Singletons are included.
- Returns:
- number of clusters on the top hierarchy level
-
main
@Deprecated(forRemoval=true) @SubjectToRemoval(date=JAN_01_2025) public static void main(String[] args)
Deprecated, for removal: This API element is subject to removal in a future version. This main method will be removed, CLI interfaces should not be used directly from Java code.
Simple command line interface for batch processing. Run this class with the -h flag on its command line to get a brief list of the command line syntax and the options available.
- Parameters:
args - command line arguments
-