Class MDSimilarity

java.lang.Object
chemaxon.descriptors.MDSimilarity
All Implemented Interfaces:
chemaxon.license.Licensable

@PublicApi public class MDSimilarity extends Object implements chemaxon.license.Licensable

Performs similarity comparisons between MDSets (see MDSet), for example sets of chemical fingerprints and/or pharmacophore fingerprints. A comparison may be performed once all query descriptor sets have been added, the metrics to be used have been selected, and the filtering options have been set. If filtering thresholds are applied, they must also be given.

After a comparison, results may be retrieved by calling getDissimilarityCoeff() or getDissimilarityCoeffs().

Typical usage:

 MDSimilarity similarity = new MDSimilarity();

 // Add queries from MDReader
 similarity.addQueries( queryReader );
 // Setup metrics and thresholds
 for ( int d = 0; d < descriptorCount; d++ ) {
     for ( int m = 0; m < metricIndices[ d ].length; m++ ) {
          similarity.useMetric( d, metricIndices[ d ][ m ], thresholds[ d ][ m ]);
     }
 }
 // Setup filtering
 if ( andMetrics )
     similarity.passWithAllMetrics();
 else
     similarity.passWithOneMetric();
 if ( andDescriptors )
     similarity.passWithAllDescriptors();
 else
     similarity.passWithOneDescriptor();

 // Setup result writer (table writer in this case)
 MDSimilarityTableWriter twr = new MDSimilarityTableWriter( outputStream, precision );
 if ( !verboseSet ) {
    twr.setVerbosity( verbose );
    twr.setVerboseFrequency( verboseFreq );
    verboseSet = true;
 }
 twr.setPrintId( generateId );
 if ( idTagName != null ) {
     twr.setPrintNaturalId( true );
     twr.setNaturalIdName( idTagName );
 }
 twr.setPrecision( precision );
 similarity.addResultWriter( twr );

 // Perform comparisons, results are written into the specified result writer
 similarity.compare( targetReader );
 
Since:
JChem 2.0
  • Constructor Summary

    Constructors
    Constructor
    Description
    MDSimilarity()
    Creates a new instance.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    addQueries(MDReader queryReader)
    Adds new query molecules as their set of descriptors from a chemical descriptor reader.
    void
    addQueries(MDSet[] queries)
    Adds new query molecules as their set of descriptors from an array.
    void
    addQuery(MDSet query)
    Adds a new query molecule as its set of descriptors.
    void
    addResultWriter(MDSimilarityResultWriter rwr)
    Adds an MDSimilarityResultWriter object.
    boolean
    compare(int mdIndex, int metricIndex, MDSet target)
    Compares a target descriptor against all queries added prior to the call of this method using the given metric of the given descriptor.
    int
    compare(MDReader targetReader)
    Compares a list of target descriptor sets (read by a molecular descriptor reader) against all queries added prior to the call of this method, in the same way as compare(MDSet target), but for each target in turn.
    boolean
    compare(MDSet target)
    Compares a target descriptor set (for instance from a database) against all queries added prior to the call of this method.
    float
    getDissimilarityCoeff(int queryIndex, int mdIndex, int metricIndex)
    Retrieves a single query dissimilarity coefficient from the last compare() call.
    float[][]
    getDissimilarityCoeffs(int queryIndex)
    Retrieves the dissimilarity coefficients of one query, for all descriptors and metrics, from the last compare() call.
    float[]
    getDissimilarityCoeffs(int queryIndex, int mdIndex)
    Retrieves the dissimilarity coefficients of one query and one descriptor, for all metrics, from the last compare() call.
    int
    getNrOfQueries()
    Gets the number of queries that have already been added.
    int
    getNrOfUsedMetrics(int mdIndex)
    Returns the number of metrics used with the given molecular descriptor in similarity calculations.
    MDSet
    getQuery(int queryIndex)
    Gets a query.
    boolean
    isComponentWise()
    Checks the component-wise flag.
    boolean
    isLicensed()
     
    boolean
    isPassWithAllDescriptors()
    Tells whether filtering of target descriptor sets is set to pass only if each descriptor in the set passes.
    boolean
    isPassWithAllMetrics()
    Tells whether filtering of target descriptor sets is set to pass only if dissimilarity calculated with each metric used with the descriptor is under the required threshold.
    boolean
    isPassWithOneDescriptor()
    Tells whether filtering of target descriptor sets is set to pass if at least one descriptor in the set passes.
    boolean
    isPassWithOneMetric()
    Tells whether filtering of target descriptor sets is set to pass if dissimilarity calculated with at least one metric used with the descriptor is under the required threshold.
    boolean
    isUsedMetric(int mdIndex, int metricIndex)
    Returns whether the given metric is used with the given molecular descriptor in similarity calculations.
    void
    passWithAllDescriptors()
    In subsequent searches, the descriptor set of a target molecule passes the comparison with a query descriptor set if all descriptors of the set have passed the corresponding comparisons.
    void
    passWithAllMetrics()
    In subsequent searches, a target molecule's molecular descriptor passes the comparison with the corresponding query descriptor if all dissimilarity coefficients (distances calculated with each metric) between these descriptors are under the previously given threshold.
    void
    passWithOneDescriptor()
    In subsequent searches, the descriptor set of a target molecule passes the comparison with a query descriptor set if at least one descriptor of the set has passed the corresponding comparison.
    void
    passWithOneMetric()
    In subsequent searches, a target molecule's molecular descriptor passes the comparison with the corresponding query descriptor if at least one dissimilarity coefficient between these descriptors is under the previously given threshold.
    void
    setComponentWise(boolean componentWise)
    Sets MDSet evaluation mode.
    void
    setLicenseEnvironment(String env)
     
    void
    setThreshold(float threshold)
    Sets threshold for descriptor set mode.
    float
    threshold(int mdIndex, int metricIndex)
    Returns the acceptance threshold of the given metric for the given molecular descriptor.
    void
    useMetric(int mdIndex, int metricIndex)
    Uses the specified metric for the specified molecular descriptor with the dissimilarity threshold stored in the corresponding parameter settings.
    void
    useMetric(int mdIndex, int metricIndex, float threshold)
    Uses the specified metric for the specified molecular descriptor along with the given dissimilarity threshold.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • MDSimilarity

      public MDSimilarity()
      Creates a new instance. Allocates internal storage.
  • Method Details

    • setComponentWise

      public void setComponentWise(boolean componentWise)
      Sets MDSet evaluation mode. The default is composite (descriptor set) mode, in which one dissimilarity value is calculated for each descriptor set (using the selected or default metric for each component and taking the weighted sum of these dissimilarity values). In component-wise mode each component of a descriptor set yields one dissimilarity value, and these values are kept independent in screening (i.e. they are not summed).
      Parameters:
      componentWise - indicates component-wise evaluation mode
      Since:
      JChem 2.2
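
The difference between the two evaluation modes can be illustrated with a small, self-contained sketch. It does not use the JChem API: the class EvaluationModeSketch, its methods and the weights are hypothetical, and how MDSimilarity actually chooses its weights is not described on this page.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the JChem API.
final class EvaluationModeSketch {

    // Composite mode: one value per descriptor set, computed as the
    // weighted sum of the per-component dissimilarities.
    static float composite(float[] componentDissimilarities, float[] weights) {
        if (componentDissimilarities.length != weights.length) {
            throw new IllegalArgumentException("one weight per component is required");
        }
        float sum = 0f;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * componentDissimilarities[i];
        }
        return sum;
    }

    // Component-wise mode: every component keeps its own value, nothing is summed.
    static float[] componentWise(float[] componentDissimilarities) {
        return componentDissimilarities.clone();
    }

    public static void main(String[] args) {
        float[] dissimilarities = {0.25f, 0.5f}; // assumed per-component values
        float[] weights = {0.5f, 0.5f};          // assumed weights
        System.out.println(composite(dissimilarities, weights));
        System.out.println(Arrays.toString(componentWise(dissimilarities)));
    }
}
```

In composite mode a single threshold is enough (see setThreshold), whereas the component-wise values would each be checked against their own threshold.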
    • addResultWriter

      public void addResultWriter(MDSimilarityResultWriter rwr)
      Adds an MDSimilarityResultWriter object. An MDSimilarity instance can have an arbitrary number of such MDSimilarityResultWriters, of any type, and all of them are invoked (in the order in which they were added) after each target MDSet has been processed.
      Parameters:
      rwr - a result writer object
      Since:
      JChem 2.2
    • addQuery

      public void addQuery(MDSet query)
      Adds a new query molecule as its set of descriptors. The number of queries is not limited, but it is expected to be significantly smaller than the number of targets. In typical usage the number of queries does not exceed 10.
      Once a query is added, it cannot be withdrawn. All added queries must be composed of the same kinds of descriptors.
      Parameters:
      query - Query descriptor set, it is not cloned.
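
"It is not cloned" suggests that the instance keeps a reference to the object passed in, so later changes to that object would be visible through getQuery(). The toy model below shows this aliasing effect with a float[] standing in for an MDSet. The class NotClonedSketch is hypothetical and is not the JChem implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; float[] stands in for MDSet.
final class NotClonedSketch {
    private final List<float[]> queries = new ArrayList<>();

    // Stores the reference itself; no defensive copy is made.
    void addQuery(float[] query) {
        queries.add(query);
    }

    float[] getQuery(int queryIndex) {
        return queries.get(queryIndex);
    }

    int getNrOfQueries() {
        return queries.size();
    }

    public static void main(String[] args) {
        NotClonedSketch sketch = new NotClonedSketch();
        float[] query = {1f, 2f};
        sketch.addQuery(query);
        query[0] = 9f; // the change is visible through the stored reference
        System.out.println(sketch.getQuery(0)[0]);
    }
}
```

If a query object has to be reused or modified after it has been added, passing a copy avoids this coupling.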
    • addQueries

      public void addQueries(MDSet[] queries)
      Adds new query molecules as their set of descriptors from an array.
      Parameters:
      queries - Array of query descriptor sets, it is not cloned.
    • addQueries

      public void addQueries(MDReader queryReader) throws MDReaderException
      Adds new query molecules as their set of descriptors from a chemical descriptor reader.
      Parameters:
      queryReader - Molecular descriptor set reader of the queries.
      Throws:
      MDReaderException - when failed reading the next descriptor set
      Since:
      JChem 2.2
    • getQuery

      public MDSet getQuery(int queryIndex)
      Gets a query.
      Parameters:
      queryIndex - The index of the query (in order of addition) from 0 to getNrOfQueries() - 1 (both inclusive).
      Returns:
      The set of molecular descriptors of the query
    • getNrOfQueries

      public int getNrOfQueries()
      Gets the number of queries that have already been added.
      Returns:
      Number of query descriptors.
    • setThreshold

      public void setThreshold(float threshold)
      Sets threshold for descriptor set mode. (Component-wise mode uses different threshold values for each descriptor component and metric.)
      Parameters:
      threshold - similarity threshold
      Since:
      JChem 2.2
    • useMetric

      public void useMetric(int mdIndex, int metricIndex, float threshold)
      Uses the specified metric for the specified molecular descriptor along with the given dissimilarity threshold.
      Parameters:
      mdIndex - Index of the molecular descriptor in the set.
      metricIndex - Index of the metric.
      threshold - Maximum dissimilarity allowed.
    • useMetric

      public void useMetric(int mdIndex, int metricIndex)
      Uses the specified metric for the specified molecular descriptor with the dissimilarity threshold stored in the corresponding parameter settings.
      Parameters:
      mdIndex - Index of the molecular descriptor in the set.
      metricIndex - Index of the metric.
    • isUsedMetric

      public boolean isUsedMetric(int mdIndex, int metricIndex)
      Returns whether the given metric is used with the given molecular descriptor in similarity calculations.
      Parameters:
      mdIndex - Index of the molecular descriptor in the set.
      metricIndex - Index of the metric.
      Returns:
      Metric in use flag.
    • getNrOfUsedMetrics

      public int getNrOfUsedMetrics(int mdIndex)
      Returns the number of metrics used with the given molecular descriptor in similarity calculations.
      Parameters:
      mdIndex - Index of the molecular descriptor in the set.
      Returns:
      Number of metrics in use.
    • threshold

      public float threshold(int mdIndex, int metricIndex)
      Returns the acceptance threshold of the given metric for the given molecular descriptor.
      Parameters:
      mdIndex - Index of the molecular descriptor in the set.
      metricIndex - Index of the metric.
      Returns:
      Threshold value, or -1.0F if the metric is not used.
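
The bookkeeping described by useMetric(), isUsedMetric(), getNrOfUsedMetrics() and threshold() can be pictured as a per-descriptor map from metric index to threshold. The sketch below is a hypothetical model of that contract, including the -1.0F value for unused metrics. The class MetricRegistrySketch is an assumption for illustration and says nothing about how MDSimilarity stores its settings internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the metric/threshold bookkeeping; not the JChem code.
final class MetricRegistrySketch {
    // mdIndex -> (metricIndex -> threshold)
    private final Map<Integer, Map<Integer, Float>> thresholds = new HashMap<>();

    void useMetric(int mdIndex, int metricIndex, float threshold) {
        thresholds.computeIfAbsent(mdIndex, k -> new HashMap<>()).put(metricIndex, threshold);
    }

    boolean isUsedMetric(int mdIndex, int metricIndex) {
        Map<Integer, Float> metrics = thresholds.get(mdIndex);
        return metrics != null && metrics.containsKey(metricIndex);
    }

    int getNrOfUsedMetrics(int mdIndex) {
        Map<Integer, Float> metrics = thresholds.get(mdIndex);
        return metrics == null ? 0 : metrics.size();
    }

    // Returns -1.0F when the metric has not been selected for the descriptor.
    float threshold(int mdIndex, int metricIndex) {
        return isUsedMetric(mdIndex, metricIndex)
                ? thresholds.get(mdIndex).get(metricIndex)
                : -1.0f;
    }

    public static void main(String[] args) {
        MetricRegistrySketch registry = new MetricRegistrySketch();
        registry.useMetric(0, 1, 0.5f);
        System.out.println(registry.threshold(0, 1));
        System.out.println(registry.threshold(0, 2));
    }
}
```

Callers that read a threshold back should therefore check isUsedMetric() first, or treat a negative value as "not configured".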
    • isComponentWise

      public boolean isComponentWise()
      Checks the component-wise flag.
      Returns:
      true if screening works in component-wise mode
      Since:
      JChem 2.2
    • passWithAllMetrics

      public void passWithAllMetrics()
      In subsequent searches, a target molecule's molecular descriptor passes the comparison with the corresponding query descriptor if all dissimilarity coefficients (distances calculated with each metric) between these descriptors are under the previously given threshold. If this flag is not set, then one coefficient under the threshold is enough for passing (default).
    • isPassWithAllMetrics

      public boolean isPassWithAllMetrics()
      Tells whether filtering of target descriptor sets is set to pass only if dissimilarity calculated with each metric used with the descriptor is under the required threshold.
      Returns:
      true if the condition is met
    • passWithOneMetric

      public void passWithOneMetric()
      In subsequent searches, a target molecule's molecular descriptor passes the comparison with the corresponding query descriptor if at least one dissimilarity coefficient between these descriptors is under the previously given threshold. This is the default setting.
    • isPassWithOneMetric

      public boolean isPassWithOneMetric()
      Tells whether filtering of target descriptor sets is set to pass if dissimilarity calculated with at least one metric used with the descriptor is under the required threshold.
      Returns:
      true if the condition is met
    • passWithAllDescriptors

      public void passWithAllDescriptors()
      In subsequent searches, the descriptor set of a target molecule passes the comparison with a query descriptor set if all descriptors of the set have passed the corresponding comparisons. If this flag is not set, then one passing descriptor from the set is enough for passing (default).
    • isPassWithAllDescriptors

      public boolean isPassWithAllDescriptors()
      Tells whether filtering of target descriptor sets is set to pass only if each descriptor in the set passes.
      Returns:
      true if the condition is met
    • passWithOneDescriptor

      public void passWithOneDescriptor()
      In subsequent searches, the descriptor set of a target molecule passes the comparison with a query descriptor set if at least one descriptor of the set has passed the corresponding comparison. This is the default setting.
    • isPassWithOneDescriptor

      public boolean isPassWithOneDescriptor()
      Tells whether filtering of target descriptor sets is set to pass if at least one descriptor in the set passes.
      Returns:
      true if the condition is met
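
The four pass flags combine as two nested AND/OR decisions: metrics are combined within a descriptor, then descriptors are combined within the set. The following sketch shows that logic on plain arrays. It is a hypothetical illustration (class PassLogicSketch), it assumes a strict "less than" comparison for "under the threshold", and it is not taken from the JChem sources.

```java
// Hypothetical illustration only; not part of the JChem API.
final class PassLogicSketch {

    // coeffs[d][m] is the dissimilarity for metric m of descriptor d;
    // thresholds[d][m] is the maximum dissimilarity allowed for it.
    static boolean passes(float[][] coeffs, float[][] thresholds,
                          boolean allMetrics, boolean allDescriptors) {
        boolean setPasses = allDescriptors; // AND starts from true, OR from false
        for (int d = 0; d < coeffs.length; d++) {
            boolean descriptorPasses = allMetrics;
            for (int m = 0; m < coeffs[d].length; m++) {
                boolean under = coeffs[d][m] < thresholds[d][m]; // assumed strict
                descriptorPasses = allMetrics ? (descriptorPasses && under)
                                              : (descriptorPasses || under);
            }
            setPasses = allDescriptors ? (setPasses && descriptorPasses)
                                       : (setPasses || descriptorPasses);
        }
        return setPasses;
    }

    public static void main(String[] args) {
        float[][] coeffs = {{0.2f, 0.9f}, {0.8f, 0.7f}};
        float[][] thresholds = {{0.5f, 0.5f}, {0.5f, 0.5f}};
        // Defaults described above: one metric and one descriptor suffice.
        System.out.println(passes(coeffs, thresholds, false, false));
        // Strictest setting: all metrics of all descriptors must pass.
        System.out.println(passes(coeffs, thresholds, true, true));
    }
}
```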
    • compare

      public boolean compare(int mdIndex, int metricIndex, MDSet target) throws RuntimeException
      Compares a target descriptor against all queries added prior to the call of this method using the given metric of the given descriptor. The results of the comparison (the dissimilarity coefficients) are stored internally, but only the results of the last comparison are kept; earlier values are discarded. It is therefore the caller's responsibility to retrieve any required values by calling getDissimilarityCoeff() or getDissimilarityCoeffs() after compare() returns.
      The method can be used for filtering purposes, in which case its return value indicates whether the current target descriptor set is filtered out or not. Threshold values are set separately with useMetric().
      Parameters:
      mdIndex - Index of the molecular descriptor.
      metricIndex - Index of the metric.
      target - Target descriptor set.
      Returns:
      true if the target passed filtering, false otherwise.
      Throws:
      RuntimeException - in case of invalid configuration
    • compare

      public boolean compare(MDSet target) throws RuntimeException
      Compares a target descriptor set (for instance from a database) against all queries added prior to the call of this method. The results of the comparison (the dissimilarity coefficients) are stored internally, but only the results of the last comparison are kept; earlier values are discarded. It is therefore the caller's responsibility to retrieve any required values by calling getDissimilarityCoeff() or getDissimilarityCoeffs() after compare() returns.
      The method can be used for filtering purposes, in which case its return value indicates whether the current target descriptor set is filtered out or not. Threshold values are set separately with useMetric().
      Parameters:
      target - Target descriptor set.
      Returns:
      true if the target passed filtering, false otherwise.
      Throws:
      RuntimeException - in case of invalid configuration
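
Because only the results of the most recent comparison are kept, coefficients have to be read out between two compare() calls. The toy model below shows the pattern with scalar stand-ins for descriptor sets. The class LastResultSketch, the absolute-difference "dissimilarity" and the rule that one query under the threshold is enough to pass are all assumptions made for the example, not JChem behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; not part of the JChem API.
final class LastResultSketch {
    private final float[] queries;             // scalar stand-ins for query sets
    private float[] lastCoeffs = new float[0]; // results of the last compare() only

    LastResultSketch(float[] queries) {
        this.queries = queries.clone();
    }

    // Compares one target against all queries and overwrites earlier results.
    boolean compare(float target, float threshold) {
        float[] coeffs = new float[queries.length];
        boolean passed = false;
        for (int q = 0; q < queries.length; q++) {
            coeffs[q] = Math.abs(queries[q] - target); // toy dissimilarity
            passed |= coeffs[q] < threshold;
        }
        lastCoeffs = coeffs;
        return passed;
    }

    float[] getLastCoeffs() {
        return lastCoeffs.clone();
    }

    // Reads the coefficients right after each comparison, before the next
    // call replaces them.
    static List<float[]> collectAll(LastResultSketch sim, float[] targets, float threshold) {
        List<float[]> all = new ArrayList<>();
        for (float target : targets) {
            sim.compare(target, threshold);
            all.add(sim.getLastCoeffs());
        }
        return all;
    }

    public static void main(String[] args) {
        LastResultSketch sim = new LastResultSketch(new float[] {1.0f, 2.0f});
        for (float[] row : collectAll(sim, new float[] {1.5f, 3.0f}, 0.75f)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```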
    • compare

      public int compare(MDReader targetReader) throws MDReaderException, RuntimeException
      Compares a list of target descriptor sets (read by a molecular descriptor reader) against all queries added prior to the call of this method, in the same way as compare(MDSet target), but for each target in turn.
      Processing the results is the responsibility of the class implementing the MDSimilarityResultWriter interface.
      Before target processing starts, the open() procedure of MDSimilarityResultWriter is executed; after each target has been processed, the write() procedure is invoked; once processing has ended, the close() procedure is invoked.
      Parameters:
      targetReader - Reader of target descriptor sets.
      Returns:
      Number of targets that passed filtering.
      Throws:
      MDReaderException - when failed reading the next descriptor set
      RuntimeException - in case of invalid configuration
      Since:
      JChem 2.2
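
The open / write / close sequence, together with the rule that writers are invoked in the order they were added, can be sketched as follows. The nested Writer interface and its method signatures are hypothetical: the real MDSimilarityResultWriter signatures are not shown on this page and may differ. The threshold test is a stand-in for the real filtering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the JChem API.
final class WriterLifecycleSketch {

    // Assumed shape of a result writer; the real interface may differ.
    interface Writer {
        void open();
        void write(float dissimilarity, boolean passed);
        void close();
    }

    // Records every callback so that the call order can be inspected.
    static final class LoggingWriter implements Writer {
        private final String name;
        private final List<String> log;

        LoggingWriter(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        public void open() { log.add(name + ":open"); }
        public void write(float dissimilarity, boolean passed) { log.add(name + ":write"); }
        public void close() { log.add(name + ":close"); }
    }

    // Opens all writers, reports every target to each writer in the order
    // the writers were added, closes all writers, and returns the hit count.
    static int compareAll(float[] targetDissimilarities, float threshold, List<Writer> writers) {
        for (Writer w : writers) {
            w.open();
        }
        int passedCount = 0;
        for (float d : targetDissimilarities) {
            boolean passed = d < threshold;
            if (passed) {
                passedCount++;
            }
            for (Writer w : writers) {
                w.write(d, passed);
            }
        }
        for (Writer w : writers) {
            w.close();
        }
        return passedCount;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Writer> writers = new ArrayList<>();
        writers.add(new LoggingWriter("A", log));
        writers.add(new LoggingWriter("B", log));
        int hits = compareAll(new float[] {0.1f, 0.9f}, 0.5f, writers);
        System.out.println(hits + " " + log);
    }
}
```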
    • getDissimilarityCoeff

      public float getDissimilarityCoeff(int queryIndex, int mdIndex, int metricIndex)
      Retrieves a single query dissimilarity coefficient from the last compare() call.
      Parameters:
      queryIndex - Index of the query molecule. Query molecules are numbered from 0 to getNrOfQueries() - 1, in the same order as they were added with addQuery().
      mdIndex - Index of molecular descriptor component in the set.
      metricIndex - Index of the metric.
      Returns:
      Value of the dissimilarity coefficient.
    • getDissimilarityCoeffs

      public float[] getDissimilarityCoeffs(int queryIndex, int mdIndex)
      Retrieves the dissimilarity coefficients of one query and one descriptor, for all metrics, from the last compare() call.
      Parameters:
      queryIndex - Index of the query molecule. Query molecules are numbered from 0 to getNrOfQueries() - 1, in the same order as they were added with addQuery().
      mdIndex - Index of molecular descriptor component in the set.
      Returns:
      Array of dissimilarity coefficients, one for each metric.
    • getDissimilarityCoeffs

      public float[][] getDissimilarityCoeffs(int queryIndex)
      Retrieves the dissimilarity coefficients of one query, for all descriptors and metrics, from the last compare() call.
      Parameters:
      queryIndex - Index of the query molecule. Query molecules are numbered from 0 to getNrOfQueries() - 1, in the same order as they were added with addQuery().
      Returns:
      Array of dissimilarity coefficients with each descriptor and metric.
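
A returned float[][] of this kind can be scanned like any two-dimensional array. The helper below finds the descriptor/metric pair with the smallest dissimilarity. It assumes the array is indexed as [mdIndex][metricIndex], matching the parameter order of getDissimilarityCoeff(), which this page does not state explicitly. The class CoeffMatrixSketch is hypothetical and works on plain arrays only.

```java
import java.util.Arrays;

// Hypothetical helper; operates on plain arrays, not on the JChem API.
final class CoeffMatrixSketch {

    // Returns {mdIndex, metricIndex} of the smallest coefficient,
    // or null if the array holds no values. Rows may differ in length.
    static int[] argMin(float[][] coeffs) {
        int[] best = null;
        float bestValue = Float.POSITIVE_INFINITY;
        for (int d = 0; d < coeffs.length; d++) {
            for (int m = 0; m < coeffs[d].length; m++) {
                if (coeffs[d][m] < bestValue) {
                    bestValue = coeffs[d][m];
                    best = new int[] {d, m};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][] coeffs = {{0.4f, 0.7f}, {0.2f}}; // assumed example values
        System.out.println(Arrays.toString(argMin(coeffs)));
    }
}
```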
    • isLicensed

      public boolean isLicensed()
      Specified by:
      isLicensed in interface chemaxon.license.Licensable
    • setLicenseEnvironment

      public void setLicenseEnvironment(String env)
      Specified by:
      setLicenseEnvironment in interface chemaxon.license.Licensable