Class MDSimilarity

  • All Implemented Interfaces:
    chemaxon.license.Licensable

    @PublicAPI
    public class MDSimilarity
    extends Object
    implements chemaxon.license.Licensable

    Performs similarity comparisons between MDSets (see MDSet), for example sets of chemical fingerprints and/or pharmacophore fingerprints. Comparisons may be performed once all the query descriptor sets to which target descriptor sets will be compared have been added, the metrics to be used have been set, and the filtering options have been set. If filtering thresholds are applied, they must also be given.

    After a comparison, results may be retrieved by calling getDissimilarityCoeff() or getDissimilarityCoeffs().

    Typical usage:

     MDSimilarity similarity = new MDSimilarity();
    
     // Add queries from MDReader
     similarity.addQueries( queryReader );            
     // Setup metrics and thresholds
     for ( int d = 0; d < descriptorCount; d++ ) {
         for ( int m = 0; m < metricIndices[ d ].length; m++ ) {
              similarity.useMetric( d, metricIndices[ d ][ m ], thresholds[ d ][ m ]);
         }
     }
     // Setup filtering
     if ( andMetrics ) 
         similarity.passWithAllMetrics();
     else
         similarity.passWithOneMetric();
     if ( andDescriptors ) 
         similarity.passWithAllDescriptors();
     else
         similarity.passWithOneDescriptor();
     
     // Setup result writer (table writer in this case)  
     MDSimilarityTableWriter twr = new MDSimilarityTableWriter( outputStream, precision );
     if ( !verboseSet ) {
        twr.setVerbosity( verbose );
        twr.setVerboseFrequency( verboseFreq );
        verboseSet = true;
     }
     twr.setPrintId( generateId );
     if ( idTagName != null ) {
         twr.setPrintNaturalId( true );
         twr.setNaturalIdName( idTagName );
     }
     twr.setPrecision( precision );
     similarity.addResultWriter( twr );
    
     // Perform comparisons, results are written into the specified result writer
     similarity.compare( targetReader );
     
    Since:
    JChem 2.0
    • Constructor Summary

      Constructors 
      Constructor Description
      MDSimilarity()
      Creates a new instance.
    • Method Summary

      Modifier and Type Method Description
      void addQueries​(MDReader queryReader)
      Adds new query molecules as their set of descriptors from a chemical descriptor reader.
      void addQueries​(MDSet[] queries)
      Adds new query molecules as their set of descriptors from an array.
      void addQuery​(MDSet query)
      Adds a new query molecule as its set of descriptors.
      void addResultWriter​(MDSimilarityResultWriter rwr)
      Adds an MDSimilarityResultWriter object.
      boolean compare​(int mdIndex, int metricIndex, MDSet target)
      Compares a target descriptor against all queries added prior to the call of this method using the given metric of the given descriptor.
      int compare​(MDReader targetReader)
      Compares a list of target descriptor sets (read by a molecular descriptor reader) against all queries added prior to the call of this method, in the same way as compare( MDSet target ) but for each target.
      boolean compare​(MDSet target)
      Compares a target descriptor set (for instance from a database) against all queries added prior to the call of this method.
      float getDissimilarityCoeff(int queryIndex, int mdIndex, int metricIndex)
      Retrieves a single query dissimilarity coefficient from the last compare() call.
      float[][] getDissimilarityCoeffs(int queryIndex)
      Retrieves the dissimilarity coefficients of one query, for all descriptors and all metrics, from the last compare() call.
      float[] getDissimilarityCoeffs(int queryIndex, int mdIndex)
      Retrieves the dissimilarity coefficients of one query and one descriptor, for all metrics, from the last compare() call.
      int getNrOfQueries()
      Gets the number of queries that have already been added.
      int getNrOfUsedMetrics​(int mdIndex)
      Return the number of metrics used with the given molecular descriptor in similarity calculations.
      MDSet getQuery​(int queryIndex)
      Gets a query.
      boolean isComponentWise()
      Checks the component-wise flag.
      boolean isLicensed()  
      boolean isPassWithAllDescriptors()
      Tells whether filtering of target descriptor sets is set to pass only if each descriptor in the set passes.
      boolean isPassWithAllMetrics()
      Tells whether filtering of target descriptor sets is set to pass only if dissimilarity calculated with each metric used with the descriptor is under the required threshold.
      boolean isPassWithOneDescriptor()
      Tells whether filtering of target descriptor sets is set to pass if at least one descriptor in the set passes.
      boolean isPassWithOneMetric()
      Tells whether filtering of target descriptor sets is set to pass if dissimilarity calculated with at least one metric used with the descriptor is under the required threshold.
      boolean isUsedMetric​(int mdIndex, int metricIndex)
      Returns whether the given metric is used with the given molecular descriptor in similarity calculations.
      void passWithAllDescriptors()
      In the following searches the descriptor set of a target molecule passes the comparison with a query descriptor set, if all descriptors of the set have passed the corresponding comparisons.
      void passWithAllMetrics()
      In the following searches a target molecule's molecular descriptor passes the comparison with a corresponding query descriptor, if all dissimilarity coefficients (distances calculated with each metric) between these descriptors are under the previously given threshold.
      void passWithOneDescriptor()
      In the following searches the descriptor set of a target molecule passes the comparison with a query descriptor set, if at least one descriptor of the set has passed the corresponding comparisons.
      void passWithOneMetric()
      In the following searches a target molecule's molecular descriptor passes the comparison with a corresponding query descriptor, if at least one dissimilarity coefficient between these descriptors is under the previously given threshold.
      void setComponentWise​(boolean componentWise)
      Sets MDSet evaluation mode.
      void setLicenseEnvironment​(String env)  
      void setThreshold​(float threshold)
      Sets threshold for descriptor set mode.
      float threshold​(int mdIndex, int metricIndex)
      Return the acceptance threshold of the given metric for the given molecular descriptor.
      void useMetric​(int mdIndex, int metricIndex)
      Use the specified metric for the specified molecular descriptor with the dissimilarity threshold stored in the corresponding parameters settings.
      void useMetric​(int mdIndex, int metricIndex, float threshold)
      Use the specified metric for the specified molecular descriptor along with the given dissimilarity threshold.
    • Constructor Detail

      • MDSimilarity

        public MDSimilarity()
        Creates a new instance. Allocates internal storage.
    • Method Detail

      • setComponentWise

        public void setComponentWise​(boolean componentWise)
        Sets MDSet evaluation mode. The default mode is composite (descriptor set) mode, in which one dissimilarity value is calculated for each descriptor set (using the selected/default metric for each component and calculating the weighted sum of these dissimilarity values). In component-wise mode each component of a descriptor set yields one dissimilarity value, and these values are kept independent in screening (i.e. they are not summed).
        Parameters:
        componentWise - indicates component-wise evaluation mode
        Since:
        JChem 2.2
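The difference between the two evaluation modes can be illustrated with a minimal, self-contained sketch. It does not call the JChem API: the class name EvaluationModeSketch, the weights, and the per-component dissimilarity values are hypothetical and chosen only for illustration. Whether the library normalizes the weights is not stated here, so the sketch assumes a plain weighted sum.

```java
import java.util.Arrays;

public class EvaluationModeSketch {

    // Composite (descriptor set) mode: one value per descriptor set,
    // computed here as a plain weighted sum of the per-component values.
    static float composite(float[] componentDissimilarities, float[] weights) {
        float sum = 0f;
        for (int i = 0; i < componentDissimilarities.length; i++) {
            sum += weights[i] * componentDissimilarities[i];
        }
        return sum;
    }

    // Component-wise mode: the per-component values stay independent,
    // so each one can later be screened against its own threshold.
    static float[] componentWise(float[] componentDissimilarities) {
        return componentDissimilarities.clone();
    }

    public static void main(String[] args) {
        float[] d = {0.25f, 0.5f}; // hypothetical per-component dissimilarities
        float[] w = {0.5f, 0.5f};  // hypothetical weights
        System.out.println("composite:      " + composite(d, w));
        System.out.println("component-wise: " + Arrays.toString(componentWise(d)));
    }
}
```

In composite mode a single threshold (see setThreshold) applies to the one summed value, whereas in component-wise mode each value is compared with the threshold given for its own descriptor and metric.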
      • addResultWriter

        public void addResultWriter​(MDSimilarityResultWriter rwr)
        Adds an MDSimilarityResultWriter object. An MDSimilarity instance can have an arbitrary number and type of such MDSimilarityResultWriters, and all are invoked (in the same order as they were added) after each target MDSet has been processed.
        Parameters:
        rwr - a result writer object
        Since:
        JChem 2.2
      • addQuery

        public void addQuery​(MDSet query)
        Adds a new query molecule as its set of descriptors. The number of queries is not limited; however, it is expected to be significantly smaller than the number of targets. In typical usage the number of queries does not exceed 10.
        Once a query is added, it cannot be withdrawn. All added queries must be composed of the same kinds of descriptors.
        Parameters:
        query - Query descriptor set; it is not cloned.
      • addQueries

        public void addQueries​(MDSet[] queries)
        Adds new query molecules as their set of descriptors from an array.
        Parameters:
        queries - Array of query descriptor sets; it is not cloned.
      • addQueries

        public void addQueries​(MDReader queryReader)
                        throws MDReaderException
        Adds new query molecules as their set of descriptors from a chemical descriptor reader.
        Parameters:
        queryReader - Molecular descriptor set reader of the queries.
        Throws:
        MDReaderException - when failed reading the next descriptor set
        Since:
        JChem 2.2
      • getQuery

        public MDSet getQuery​(int queryIndex)
        Gets a query.
        Parameters:
        queryIndex - The index of the query (in order of addition) from 0 to getNrOfQueries() - 1 (both inclusive).
        Returns:
        The set of molecular descriptors of the query
      • getNrOfQueries

        public int getNrOfQueries()
        Gets the number of queries that have already been added.
        Returns:
        Number of query descriptors.
      • setThreshold

        public void setThreshold​(float threshold)
        Sets threshold for descriptor set mode. (Component-wise mode uses different threshold values for each descriptor component and metric.)
        Parameters:
        threshold - similarity threshold
        Since:
        JChem 2.2
      • useMetric

        public void useMetric​(int mdIndex,
                              int metricIndex,
                              float threshold)
        Use the specified metric for the specified molecular descriptor along with the given dissimilarity threshold.
        Parameters:
        mdIndex - Index of the molecular descriptor in the set.
        metricIndex - Index of the metric.
        threshold - Maximum dissimilarity allowed.
      • useMetric

        public void useMetric​(int mdIndex,
                              int metricIndex)
        Use the specified metric for the specified molecular descriptor with the dissimilarity threshold stored in the corresponding parameters settings.
        Parameters:
        mdIndex - Index of the molecular descriptor in the set.
        metricIndex - Index of the metric.
      • isUsedMetric

        public boolean isUsedMetric​(int mdIndex,
                                    int metricIndex)
        Returns whether the given metric is used with the given molecular descriptor in similarity calculations.
        Parameters:
        mdIndex - Index of the molecular descriptor in the set.
        metricIndex - Index of the metric.
        Returns:
        Metric in use flag.
      • getNrOfUsedMetrics

        public int getNrOfUsedMetrics​(int mdIndex)
        Return the number of metrics used with the given molecular descriptor in similarity calculations.
        Parameters:
        mdIndex - Index of the molecular descriptor in the set.
        Returns:
        Number of metrics in use for the given descriptor.
      • threshold

        public float threshold​(int mdIndex,
                               int metricIndex)
        Return the acceptance threshold of the given metric for the given molecular descriptor.
        Parameters:
        mdIndex - Index of the molecular descriptor in the set.
        metricIndex - Index of the metric.
        Returns:
        Threshold value, or -1.0F if the metric is not used.
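The relationship between useMetric(), isUsedMetric(), getNrOfUsedMetrics() and threshold() can be sketched with a small stand-in table. This is not the library's implementation: the class MetricTableSketch is hypothetical, it mirrors the documented method names only for readability, and it assumes that real thresholds are never negative so that -1.0F can serve as the "not used" marker.

```java
import java.util.Arrays;

public class MetricTableSketch {

    static final float NOT_USED = -1.0f;

    // thresholds[mdIndex][metricIndex]; NOT_USED marks metrics that are not in use.
    private final float[][] thresholds;

    MetricTableSketch(int descriptorCount, int metricCount) {
        thresholds = new float[descriptorCount][metricCount];
        for (float[] row : thresholds) {
            Arrays.fill(row, NOT_USED);
        }
    }

    // Marks the metric as used by storing its threshold.
    void useMetric(int mdIndex, int metricIndex, float threshold) {
        thresholds[mdIndex][metricIndex] = threshold;
    }

    boolean isUsedMetric(int mdIndex, int metricIndex) {
        return thresholds[mdIndex][metricIndex] != NOT_USED;
    }

    // Counts the metrics of one descriptor that have a threshold stored.
    int getNrOfUsedMetrics(int mdIndex) {
        int count = 0;
        for (float t : thresholds[mdIndex]) {
            if (t != NOT_USED) {
                count++;
            }
        }
        return count;
    }

    // Returns the stored threshold, or NOT_USED if the metric is not in use.
    float threshold(int mdIndex, int metricIndex) {
        return thresholds[mdIndex][metricIndex];
    }

    public static void main(String[] args) {
        MetricTableSketch table = new MetricTableSketch(2, 3);
        table.useMetric(0, 1, 0.4f);
        System.out.println("used(0,1):      " + table.isUsedMetric(0, 1));
        System.out.println("count(0):       " + table.getNrOfUsedMetrics(0));
        System.out.println("threshold(1,2): " + table.threshold(1, 2));
    }
}
```

A caller that reads threshold() directly should check for the -1.0F marker (or call isUsedMetric() first) before treating the value as a real threshold.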
      • isComponentWise

        public boolean isComponentWise()
        Checks the component-wise flag.
        Returns:
        true if screening works in component-wise mode
        Since:
        JChem 2.2
      • passWithAllMetrics

        public void passWithAllMetrics()
        In the following searches a target molecule's molecular descriptor passes the comparison with a corresponding query descriptor, if all dissimilarity coefficients (distances calculated with each metric) between these descriptors are under the previously given threshold. If this flag is not set, then one coefficient under the threshold is enough for passing (default).
      • isPassWithAllMetrics

        public boolean isPassWithAllMetrics()
        Tells whether filtering of target descriptor sets is set to pass only if dissimilarity calculated with each metric used with the descriptor is under the required threshold.
        Returns:
        true if the condition is met
      • passWithOneMetric

        public void passWithOneMetric()
        In the following searches a target molecule's molecular descriptor passes the comparison with a corresponding query descriptor, if at least one dissimilarity coefficient between these descriptors is under the previously given threshold. This is the default setting.
      • isPassWithOneMetric

        public boolean isPassWithOneMetric()
        Tells whether filtering of target descriptor sets is set to pass if dissimilarity calculated with at least one metric used with the descriptor is under the required threshold.
        Returns:
        true if the condition is met
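The two metric-level filtering modes amount to a logical AND or a logical OR over the metrics used with one descriptor. The following sketch is a hypothetical illustration, not library code: the class MetricFilterSketch and its method are invented for this example, and it assumes a strict "less than" comparison, since the text above says "under the threshold" without stating whether the boundary value itself passes.

```java
public class MetricFilterSketch {

    // Decides whether one descriptor passes, given one dissimilarity value
    // and one threshold per metric in use.
    // allMetrics == true  : every value must be under its threshold (AND).
    // allMetrics == false : one value under its threshold is enough (OR).
    static boolean passes(float[] dissimilarities, float[] thresholds, boolean allMetrics) {
        boolean any = false;
        boolean all = true;
        for (int m = 0; m < dissimilarities.length; m++) {
            boolean under = dissimilarities[m] < thresholds[m]; // strict comparison is an assumption
            any |= under;
            all &= under;
        }
        return allMetrics ? all : any;
    }

    public static void main(String[] args) {
        float[] d = {0.2f, 0.9f}; // hypothetical dissimilarities for two metrics
        float[] t = {0.5f, 0.5f}; // hypothetical thresholds
        System.out.println("one metric enough: " + passes(d, t, false));
        System.out.println("all metrics:       " + passes(d, t, true));
    }
}
```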
      • passWithAllDescriptors

        public void passWithAllDescriptors()
        In the following searches the descriptor set of a target molecule passes the comparison with a query descriptor set, if all descriptors of the set have passed the corresponding comparisons. If this flag is not set, then one passing descriptor from the set is enough for passing (default).
      • isPassWithAllDescriptors

        public boolean isPassWithAllDescriptors()
        Tells whether filtering of target descriptor sets is set to pass only if each descriptor in the set passes.
        Returns:
        true if the condition is met
      • passWithOneDescriptor

        public void passWithOneDescriptor()
        In the following searches the descriptor set of a target molecule passes the comparison with a query descriptor set, if at least one descriptor of the set has passed the corresponding comparisons. This is the default setting.
      • isPassWithOneDescriptor

        public boolean isPassWithOneDescriptor()
        Tells whether filtering of target descriptor sets is set to pass if at least one descriptor in the set passes.
        Returns:
        true if the condition is met
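Descriptor-level filtering combines the per-descriptor outcomes in the same AND/OR fashion, one level above the metric-level decision. The sketch below is a hypothetical illustration (the class DescriptorFilterSketch is not part of the API); it takes the already computed pass/fail result of each descriptor as input and stops at the first result that settles the outcome.

```java
public class DescriptorFilterSketch {

    // Combines per-descriptor results into a decision for the whole set.
    // allDescriptors == true  : every descriptor must have passed (AND).
    // allDescriptors == false : one passing descriptor is enough (OR).
    static boolean setPasses(boolean[] descriptorPassed, boolean allDescriptors) {
        for (boolean passed : descriptorPassed) {
            if (allDescriptors && !passed) {
                return false; // AND mode: a single failure rejects the set
            }
            if (!allDescriptors && passed) {
                return true;  // OR mode: a single success accepts the set
            }
        }
        // AND mode: no failure was found. OR mode: no success was found.
        return allDescriptors;
    }

    public static void main(String[] args) {
        boolean[] results = {true, false}; // hypothetical per-descriptor outcomes
        System.out.println("one descriptor enough: " + setPasses(results, false));
        System.out.println("all descriptors:       " + setPasses(results, true));
    }
}
```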
      • compare

        public boolean compare​(int mdIndex,
                               int metricIndex,
                               MDSet target)
                        throws RuntimeException
        Compares a target descriptor against all queries added prior to the call of this method using the given metric of the given descriptor. The results of the comparison (the dissimilarity coefficients) are stored internally, but only the results of the last comparison are kept; former values are discarded. It is therefore the caller's responsibility to retrieve the required values by calling getDissimilarityCoeffs() after compare() has been performed.
        The method can be used for filtering purposes, in which case its return value indicates whether the current target descriptor set is filtered out or not. Threshold values are set separately with useMetric().
        Parameters:
        mdIndex - Index of the molecular descriptor.
        metricIndex - Index of the metric.
        target - Target descriptor set.
        Returns:
        Target passed filtering or not.
        Throws:
        RuntimeException - in case of invalid configuration
      • compare

        public boolean compare​(MDSet target)
                        throws RuntimeException
        Compares a target descriptor set (for instance from a database) against all queries added prior to the call of this method. The results of the comparison (the dissimilarity coefficients) are stored internally, but only the results of the last comparison are kept; former values are discarded. It is therefore the caller's responsibility to retrieve the required values by calling getDissimilarityCoeffs() after compare() has been performed.
        The method can be used for filtering purposes, in which case its return value indicates whether the current target descriptor set is filtered out or not. Threshold values are set separately with useMetric().
        Parameters:
        target - Target descriptor set.
        Returns:
        Target passed filtering or not.
        Throws:
        RuntimeException - in case of invalid configuration
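Because only the results of the last comparison are kept, a caller that needs the coefficients of several targets has to copy them out after each comparison. The sketch below shows that pattern with a hypothetical stand-in class, LastResultSketch, which is not the JChem API: it uses single numbers instead of descriptor sets, an absolute difference as a placeholder dissimilarity, and it assumes that the getter exposes the internal buffer rather than a copy (the text above does not say which the library does).

```java
import java.util.ArrayList;
import java.util.List;

public class LastResultSketch {

    private final float[] queries;
    private final float[] lastCoeffs; // one buffer, reused by every comparison

    LastResultSketch(float[] queries) {
        this.queries = queries.clone();
        this.lastCoeffs = new float[queries.length];
    }

    // Overwrites the buffer with a placeholder dissimilarity per query.
    void compare(float target) {
        for (int q = 0; q < queries.length; q++) {
            lastCoeffs[q] = Math.abs(queries[q] - target);
        }
    }

    // Returns the internal buffer (an assumption made for this sketch).
    float[] getLastCoeffs() {
        return lastCoeffs;
    }

    // Compares each target in turn and saves a copy of the results
    // before the next comparison overwrites them.
    static List<float[]> collect(LastResultSketch similarity, float[] targets) {
        List<float[]> saved = new ArrayList<>();
        for (float target : targets) {
            similarity.compare(target);
            saved.add(similarity.getLastCoeffs().clone());
        }
        return saved;
    }

    public static void main(String[] args) {
        LastResultSketch similarity = new LastResultSketch(new float[]{1f, 3f});
        List<float[]> saved = collect(similarity, new float[]{2f, 5f});
        for (float[] row : saved) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Without the clone() call, every entry of the saved list would refer to the same buffer and would therefore show only the values of the last target.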
      • compare

        public int compare​(MDReader targetReader)
                    throws MDReaderException,
                           RuntimeException
        Compares a list of target descriptor sets (read by a molecular descriptor reader) against all queries added prior to the call of this method, in the same way as compare( MDSet target ) but for each target.
        Processing the results is the responsibility of the class implementing the MDSimilarityResultWriter interface.
        Before the processing of targets starts, the open() procedure of MDSimilarityResultWriter is executed; after each target has been processed, the write() procedure is invoked; and after processing has ended, the close() procedure is invoked.
        Parameters:
        targetReader - Reader of target descriptor sets.
        Returns:
        Number of targets that passed filtering.
        Throws:
        MDReaderException - when failed reading the next descriptor set
        RuntimeException - in case of invalid configuration
        Since:
        JChem 2.2
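The open / write / close sequence described for result writers can be sketched as follows. The interface ResultWriter and the class WriterLifecycleSketch are hypothetical stand-ins: the real MDSimilarityResultWriter methods may take parameters and throw exceptions that are not shown here, and the sketch only assumes that writers are called in the order in which they were added.

```java
import java.util.ArrayList;
import java.util.List;

public class WriterLifecycleSketch {

    // Hypothetical stand-in for a result writer; not the JChem interface.
    interface ResultWriter {
        void open();
        void write(int targetIndex);
        void close();
    }

    // Drives every writer through open, one write per target, then close.
    // Writers are called in the order in which they appear in the list.
    static void run(List<ResultWriter> writers, int targetCount) {
        for (ResultWriter w : writers) {
            w.open();
        }
        for (int t = 0; t < targetCount; t++) {
            for (ResultWriter w : writers) {
                w.write(t);
            }
        }
        for (ResultWriter w : writers) {
            w.close();
        }
    }

    // Builds a writer that records each call in a shared log.
    static ResultWriter recording(String name, List<String> log) {
        return new ResultWriter() {
            @Override public void open() { log.add(name + ".open"); }
            @Override public void write(int targetIndex) { log.add(name + ".write(" + targetIndex + ")"); }
            @Override public void close() { log.add(name + ".close"); }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<ResultWriter> writers = new ArrayList<>();
        writers.add(recording("A", log));
        writers.add(recording("B", log));
        run(writers, 2);
        System.out.println(log);
    }
}
```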
      • getDissimilarityCoeff

        public float getDissimilarityCoeff​(int queryIndex,
                                           int mdIndex,
                                           int metricIndex)
        Retrieves a single query dissimilarity coefficient from the last compare() call.
        Parameters:
        queryIndex - Index of the query molecule. Query molecules are numbered from 0 to getNrOfQueries() - 1, in the same order as added with addQuery().
        mdIndex - Index of molecular descriptor component in the set.
        metricIndex - Index of the metric.
        Returns:
        Value of the dissimilarity coefficient.
      • getDissimilarityCoeffs

        public float[] getDissimilarityCoeffs​(int queryIndex,
                                              int mdIndex)
        Retrieves the dissimilarity coefficients of one query and one descriptor, for all metrics, from the last compare() call.
        Parameters:
        queryIndex - Index of the query molecule. Query molecules are numbered from 0 to getNrOfQueries() - 1, in the same order as added with addQuery().
        mdIndex - Index of molecular descriptor component in the set.
        Returns:
        Array of dissimilarity coefficients, one for each metric.
      • getDissimilarityCoeffs

        public float[][] getDissimilarityCoeffs​(int queryIndex)
        Retrieves the dissimilarity coefficients of one query, for all descriptors and all metrics, from the last compare() call.
        Parameters:
        queryIndex - Index of the query molecule. Query molecules are numbered from 0 to getNrOfQueries() - 1, in the same order as added with addQuery().
        Returns:
        Array of dissimilarity coefficients with each descriptor and metric.
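The three getters can be read as progressively narrower views of one table of coefficients. The sketch below illustrates this with a hypothetical class, CoeffIndexSketch, and made-up values. The index order [queryIndex][mdIndex][metricIndex] is an assumption inferred from the parameter order; the text above does not state how the returned arrays are laid out.

```java
public class CoeffIndexSketch {

    // Assumed layout: values[queryIndex][mdIndex][metricIndex].
    private final float[][][] values;

    CoeffIndexSketch(float[][][] values) {
        this.values = values;
    }

    // One coefficient: a single query, descriptor and metric.
    float coeff(int queryIndex, int mdIndex, int metricIndex) {
        return values[queryIndex][mdIndex][metricIndex];
    }

    // All metrics of one descriptor for one query.
    float[] coeffs(int queryIndex, int mdIndex) {
        return values[queryIndex][mdIndex];
    }

    // All descriptors and metrics for one query.
    float[][] coeffs(int queryIndex) {
        return values[queryIndex];
    }

    public static void main(String[] args) {
        // Two queries; descriptor 0 uses two metrics, descriptor 1 uses one.
        float[][][] data = {
            {{0.1f, 0.2f}, {0.3f}},
            {{0.4f, 0.5f}, {0.6f}}
        };
        CoeffIndexSketch sketch = new CoeffIndexSketch(data);
        float[][] perQuery = sketch.coeffs(1);
        for (int md = 0; md < perQuery.length; md++) {
            for (int metric = 0; metric < perQuery[md].length; metric++) {
                System.out.println("query 1, descriptor " + md + ", metric " + metric
                        + ": " + perQuery[md][metric]);
            }
        }
    }
}
```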
      • isLicensed

        public boolean isLicensed()
        Specified by:
        isLicensed in interface chemaxon.license.Licensable
      • setLicenseEnvironment

        public void setLicenseEnvironment​(String env)
        Specified by:
        setLicenseEnvironment in interface chemaxon.license.Licensable