Class MDHitEvaluator


  • @PublicAPI
    public class MDHitEvaluator
    extends Object

    Retrieves statistical information from a test screen of a set of molecules. The following information is supplied:

    • Value of the selected evaluator function (enrichment or selectivity effectiveness) for a screen
    • Distribution of dissimilarity values (with histograms) for a given metric
    • Hits from the set of known similars (structures similar to the actives)
    • Hits from the target set
    • Thresholds for dissimilarity metrics

    Basic input:

    • Set of query molecules
    • Set of molecules known to be similar to the actives (e.g. the original set of actives can be divided into two parts: one is used as queries, the other as the test set of known similars)
    • Set of target molecules, which is screened against the actives (called the dissimilar set in the code)

    There are two possible ways of using this class. The first is intended for smaller numbers of molecules and allows statistical information to be retrieved quickly in several ways: all dissimilarity values are calculated in advance (by calcDissimilarity()) and stored, which enables fast queries.

    If the 'memory-safe' methods (the overloads that take MDReader arguments) are used instead, dissimilarities are calculated on the fly each time a query method is called, and they are not stored in memory.

    Typical usage in non-memory-safe mode:

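     MDHitEvaluator evaluator = new MDHitEvaluator( similarity );
     evaluator.setSelectivityAsymmetryFactor( 0.3F );  // float parameter: 0.3 alone would not compile
     int functionIndex = evaluator.getEvaluatorFunctionIndex( "SelectivityEffectiveness" );
     evaluator.setCurrentEvaluatorFunction( functionIndex );
     // precalculate and store all dissimilarity values once
     evaluator.calcDissimilarity( testReader, targetReader );
     int nSimilars = evaluator.getNumberOfSimilars();
     // best value when 30% to 80% of the known similars must be found
     // (parenthesize before casting: (int) 0.3 * nSimilars would always be 0)
     float e1 = evaluator.evaluateByMetric( descrIndex, metrIndex,
                                            (int) (0.3 * nSimilars), (int) (0.8 * nSimilars) );
     // reuses the stored values: 50% to 100% of the known similars
     float e2 = evaluator.evaluateByMetric( descrIndex, metrIndex,
                                            (int) (0.5 * nSimilars), nSimilars );
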

    Memory-safe mode (dissimilarities are recalculated on every call):

     MDHitEvaluator evaluator = new MDHitEvaluator( similarity );
     evaluator.setSelectivityAsymmetryFactor( 0.3F );
     int functionIndex = evaluator.getEvaluatorFunctionIndex( "SelectivityEffectiveness" );
     evaluator.setCurrentEvaluatorFunction( functionIndex );
     // at least 50% of the known similars must be found; nothing is precalculated
     float e = evaluator.evaluateByMetric( descrIndex, metrIndex, 50.0F,
                                           testReader, targetReader );

    Since:
    JChem 2.0
    • Constructor Summary

      Constructors 
      Constructor Description
      MDHitEvaluator(MDSimilarity similarity)
      Creates a new instance, allocates storage.
    • Method Summary

      Modifier and Type Method Description
      void calcDissimilarity(MDReader similarSetReader, MDReader dissimilarSetReader)
      Precalculates dissimilarity values.
      int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues)
      Retrieves the distribution of the given metric from the dissimilarity values calculated by a previous call to calcDissimilarity().
      int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues, MDReader similarSetReader, MDReader dissimilarSetReader)
      Retrieves the distribution of the given metric from the dissimilarity values calculated by a screen using the given two molecular descriptor readers.
      float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits)
      Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage.
      float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader)
      Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage.
      float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits)
      Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found.
      float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits)
      Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers.
      float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader)
      Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers.
      float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader)
      Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found.
      int getCurrentEvaluatorFunction()
      Gets the index of the current evaluator function.
      int getEvaluatorFunctionIndex(String name)
      Gets the index of the evaluator function from its name.
      String getEvaluatorFunctionName(int index)
      Gets the name of the evaluator function from its index.
      ArrayList[] getInsertedDissimilars()
      Returns lists of dissimilars whose dissimilarity values are lower than those of the similars.
      int getNextDissimilarHit()
      Retrieves the ids of the target hits found in a previous screen or evaluation, one per call.
      int getNextSimilarHit()
      Retrieves the ids of the known similar hits found in a previous screen or evaluation, one per call.
      int getNumberOfDissimilarHits()
      Returns the number of hits from the set of target molecules found in a previous evaluation or screen.
      int getNumberOfDissimilars()
      Returns the number of target molecules (read previously by the dissimilar set reader).
      int getNumberOfSimilarHits()
      Returns the number of hits from the known similar molecules found in a previous evaluation or screen.
      int getNumberOfSimilars()
      Returns the number of known similar molecules (read previously by the similar set reader).
      float getSelectivityAsymmetryFactor()
      Returns the value of the asymmetry factor (weight) of the selectivity effectiveness evaluator function.
      float getThreshold(int descrIndex, int metricIndex)
      Returns the threshold set by the last screen (given by the user as a parameter) or evaluation (determined by the evaluation).
      void resetDissimilarHits()
      Resets the iteration over the target hits found in a previous screen or evaluation, so that they can subsequently be retrieved one by one.
      void resetSimilarHits()
      Resets the iteration over the known similar hits found in a previous screen or evaluation, so that they can subsequently be retrieved one by one.
      float[] screen(int descrIndex, int metricIndex, float threshold)
      Screens the similar set and the dissimilar set with the given descriptor, metric and threshold.
      float[] screen(int descrIndex, int metricIndex, float threshold, MDReader similarSetReader, MDReader dissimilarSetReader)
      Screens the similar set and the dissimilar set with the given descriptor, metric and threshold.
      void setCurrentEvaluatorFunction(int index)
      Sets the evaluator function whose value is returned by each evaluate call.
      void setSelectivityAsymmetryFactor(float alpha)
      Sets the asymmetry factor (weight) of the selectivity effectiveness evaluator function.
    • Field Detail

      • evaluatorFunctions

        public String[] evaluatorFunctions
    • Constructor Detail

      • MDHitEvaluator

        public MDHitEvaluator(MDSimilarity similarity)
        Creates a new instance, allocates storage.
        Parameters:
        similarity - A complete MDSimilarity object to which the queries have already been added
    • Method Detail

      • setCurrentEvaluatorFunction

        public void setCurrentEvaluatorFunction(int index)
        Sets the evaluator function whose value is returned by each evaluate call.
        Parameters:
        index - Index of the evaluator function
      • getCurrentEvaluatorFunction

        public int getCurrentEvaluatorFunction()
        Gets the index of the current evaluator function.
        Returns:
        Index of the evaluator function
      • getEvaluatorFunctionIndex

        public int getEvaluatorFunctionIndex(String name)
                                      throws IllegalArgumentException
        Gets the index of the evaluator function from its name.
        Parameters:
        name - Name of the evaluator function
        Returns:
        Index of the evaluator function
        Throws:
        IllegalArgumentException
      • getEvaluatorFunctionName

        public String getEvaluatorFunctionName(int index)
        Gets the name of the evaluator function from its index.
        Parameters:
        index - Index of the evaluator function
        Returns:
        Name of evaluator function
      • setSelectivityAsymmetryFactor

        public void setSelectivityAsymmetryFactor(float alpha)
                                           throws IllegalArgumentException
        Sets the asymmetry factor (weight) of the selectivity effectiveness evaluator function.
        Parameters:
        alpha - Value of the asymmetry factor
        Throws:
        IllegalArgumentException
      • getSelectivityAsymmetryFactor

        public float getSelectivityAsymmetryFactor()
        Returns the value of the asymmetry factor (weight) of the selectivity effectiveness evaluator function.
        Returns:
        Value of the asymmetry factor
      • calcDissimilarity

        public void calcDissimilarity(MDReader similarSetReader,
                                      MDReader dissimilarSetReader)
                               throws MDReaderException
        Precalculates dissimilarity values. This is worthwhile if all dissimilarity values fit into memory: several evaluations can then be performed afterwards without recalculating the dissimilarity values.
        Parameters:
        similarSetReader - Reader of the test set of known similars
        dissimilarSetReader - Reader of the set of target molecules (in which similars are sought)
        Throws:
        MDReaderException - if the record couldn't be read.
      • screen

        public float[] screen(int descrIndex,
                              int metricIndex,
                              float threshold)
        Screens the similar set and the dissimilar set with the given descriptor, metric and threshold, and evaluates all evaluator functions. Call this only after calcDissimilarity() has been called.
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        threshold - Threshold value for selecting hits
        Returns:
        Array of evaluator function values
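The following is a minimal sketch, not the JChem implementation, of the counting step behind a threshold screen. It assumes that each molecule already has a single dissimilarity value (for example, to its closest query), that a molecule is a hit when its value is less than or equal to the threshold, and that hit ids are positions in the input set. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of the JChem API.
final class ThresholdScreenSketch {

    // Assumption: a molecule is a hit when its dissimilarity is <= threshold.
    static List<Integer> hitIds(float[] dissimilarities, float threshold) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < dissimilarities.length; id++) {
            if (dissimilarities[id] <= threshold) {
                ids.add(id); // id = position in the input set (assumption)
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        float[] similars = {0.10f, 0.25f, 0.40f, 0.70f};       // known similars
        float[] targets = {0.05f, 0.35f, 0.60f, 0.80f, 0.90f}; // target (dissimilar) set
        float threshold = 0.40f;
        System.out.println("similar hits: " + hitIds(similars, threshold)); // [0, 1, 2]
        System.out.println("target hits:  " + hitIds(targets, threshold));  // [0, 1]
    }
}
```

An evaluator function such as enrichment would then be computed from the two hit counts together with the sizes of both sets; its exact formula in JChem is not given here.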
      • screen

        public float[] screen(int descrIndex,
                              int metricIndex,
                              float threshold,
                              MDReader similarSetReader,
                              MDReader dissimilarSetReader)
                       throws MDReaderException
        Screens the similar set and the dissimilar set with the given descriptor, metric and threshold, and evaluates all evaluator functions.
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        threshold - Threshold value for selecting hits
        similarSetReader - Reader of the test set of known similars
        dissimilarSetReader - Reader of the set of target molecules (in which similars are sought)
        Returns:
        Array of evaluator function values
        Throws:
        MDReaderException - if the record couldn't be read.
      • evaluateByMetric

        public float evaluateByMetric(int descrIndex,
                                      int metricIndex,
                                      int nSimilarHits)
        Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found. The threshold for the metric is set from this requirement and can be retrieved by a subsequent call to getThreshold( descrIndex, metricIndex ). Call this only after calcDissimilarity() has been called.
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        nSimilarHits - Number of known similars required as hits
        Returns:
        Value of current evaluator function
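One plausible reading of how a threshold follows from the requirement "nSimilarHits similars must be found" is sketched below: sort the dissimilarity values of the known similars and take the n-th smallest as the tightest threshold that admits n of them. This is an illustration under stated assumptions (one value per molecule, hit means value <= threshold), not JChem's actual code; the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of the JChem API.
final class ThresholdForHitsSketch {

    // Smallest threshold at which at least nSimilarHits similars are hits,
    // assuming a hit means dissimilarity <= threshold.
    static float thresholdForHits(float[] similarDissimilarities, int nSimilarHits) {
        if (nSimilarHits < 1 || nSimilarHits > similarDissimilarities.length) {
            throw new IllegalArgumentException("nSimilarHits out of range: " + nSimilarHits);
        }
        float[] sorted = similarDissimilarities.clone();
        Arrays.sort(sorted);
        // The n-th smallest value admits n similars (ties may admit more).
        return sorted[nSimilarHits - 1];
    }

    public static void main(String[] args) {
        float[] similars = {0.40f, 0.10f, 0.70f, 0.25f};
        System.out.println(thresholdForHits(similars, 3)); // 0.4
    }
}
```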
      • evaluateByMetric

        public float evaluateByMetric(int descrIndex,
                                      int metricIndex,
                                      int nSimilarHits,
                                      MDReader similarSetReader,
                                      MDReader dissimilarSetReader)
                               throws MDReaderException
        Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found. The threshold for the metric is set from this requirement and can be retrieved by a subsequent call to getThreshold( descrIndex, metricIndex ).
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        nSimilarHits - Number of known similars required as hits
        similarSetReader - Reader of the test set of known similars
        dissimilarSetReader - Reader of the set of target molecules (in which similars are sought)
        Returns:
        Value of current evaluator function
        Throws:
        MDReaderException - if the record couldn't be read.
      • evaluateByMetric

        public float evaluateByMetric(int descrIndex,
                                      int metricIndex,
                                      int fromNSimilarHits,
                                      int toNSimilarHits)
        Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers. The actual number of similar hits and the threshold for the metric are those for which the evaluator function is maximal among the allowed ones. They can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ). Call this only after calcDissimilarity() has been called.
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        fromNSimilarHits - Minimal number of known similars required as hits
        toNSimilarHits - Maximal number of known similars required as hits
        Returns:
        Value of current evaluator function
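The search described above (try every allowed number of similar hits and keep the one that maximizes the evaluator function) can be sketched as follows. The `EvaluatorFunction` interface, the `Result` class and the example function (share of all hits that are known similars) are hypothetical illustrations, not JChem's enrichment or selectivity effectiveness; a hit is again assumed to mean dissimilarity <= threshold.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of the JChem API.
final class RangeSearchSketch {

    // Hypothetical evaluator function: higher values are better.
    interface EvaluatorFunction {
        double evaluate(int similarHits, int dissimilarHits, int nSimilars, int nDissimilars);
    }

    // Best value plus the threshold and similar-hit count that produced it.
    static final class Result {
        final double value;
        final float threshold;
        final int similarHits;

        Result(double value, float threshold, int similarHits) {
            this.value = value;
            this.threshold = threshold;
            this.similarHits = similarHits;
        }
    }

    // Number of entries <= threshold in an ascending array (upper-bound binary search).
    static int countAtOrBelow(float[] sortedAsc, float threshold) {
        int lo = 0;
        int hi = sortedAsc.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedAsc[mid] <= threshold) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static Result best(float[] similars, float[] dissimilars, int from, int to, EvaluatorFunction f) {
        if (from < 1 || to > similars.length || from > to) {
            throw new IllegalArgumentException("invalid hit range");
        }
        float[] s = similars.clone();
        Arrays.sort(s);
        float[] d = dissimilars.clone();
        Arrays.sort(d);
        Result best = null;
        for (int n = from; n <= to; n++) {
            float threshold = s[n - 1];                  // tightest threshold admitting n similars
            int simHits = countAtOrBelow(s, threshold);  // may exceed n when values are tied
            int disHits = countAtOrBelow(d, threshold);
            double value = f.evaluate(simHits, disHits, s.length, d.length);
            if (best == null || value > best.value) {
                best = new Result(value, threshold, simHits);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] similars = {0.10f, 0.25f, 0.40f, 0.70f};
        float[] targets = {0.05f, 0.35f, 0.60f, 0.80f, 0.90f};
        // Illustrative function only: share of all hits that are known similars.
        EvaluatorFunction share = (sh, dh, ns, nd) -> (double) sh / (sh + dh);
        Result r = best(similars, targets, 2, 4, share);
        System.out.println(r.similarHits + " similar hits at threshold " + r.threshold);
    }
}
```

With these toy values the run prints "2 similar hits at threshold 0.25" (2/3 beats 3/5 and 4/7). Sorting both sets once keeps each candidate threshold at a logarithmic lookup cost, which is one reason precalculating dissimilarities pays off when several ranges are evaluated.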
      • evaluateByMetric

        public float evaluateByMetric(int descrIndex,
                                      int metricIndex,
                                      float minPercentageOfSimilarHits)
        Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage. The actual number of similar hits and the threshold for the metric are those for which the evaluator function is maximal among the allowed ones. They can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ). Call this only after calcDissimilarity() has been called.
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        minPercentageOfSimilarHits - Minimal percentage of known similars required as hits compared to total number of similars
        Returns:
        Value of current evaluator function
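A percentage requirement can be reduced to a minimum hit count, after which the search runs over the range from that count up to the total number of similars. The sketch below assumes a 0-100 percentage scale (the memory-safe usage example passes 50.0F) and rounds up so that the resulting share is never below the requested percentage; the class and method names are hypothetical.

```java
// Hypothetical sketch; not part of the JChem API.
final class PercentageSketch {

    // Smallest hit count whose share of nSimilars is >= minPercentage.
    // Assumption: minPercentage is on a 0-100 scale.
    static int minSimilarHits(float minPercentage, int nSimilars) {
        if (minPercentage < 0f || minPercentage > 100f || nSimilars < 0) {
            throw new IllegalArgumentException("invalid percentage or set size");
        }
        // Computed in double to limit rounding error before taking the ceiling.
        return (int) Math.ceil((double) minPercentage * nSimilars / 100.0);
    }

    public static void main(String[] args) {
        int nSimilars = 7;
        int from = minSimilarHits(50.0f, nSimilars); // 4/7 >= 50%, but 3/7 < 50%
        // The search then runs over [from, nSimilars], as in the range overload.
        System.out.println("search range: " + from + ".." + nSimilars);
    }
}
```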
      • evaluateByMetric

        public float evaluateByMetric(int descrIndex,
                                      int metricIndex,
                                      int fromNSimilarHits,
                                      int toNSimilarHits,
                                      MDReader similarSetReader,
                                      MDReader dissimilarSetReader)
                               throws MDReaderException
        Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers. The actual number of similar hits and the threshold for the metric are those for which the evaluator function is maximal among the allowed ones. They can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ).
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        fromNSimilarHits - Minimal number of known similars required as hits
        toNSimilarHits - Maximal number of known similars required as hits
        similarSetReader - Reader of the test set of known similars
        dissimilarSetReader - Reader of the set of target molecules (in which similars are sought)
        Returns:
        Value of current evaluator function
        Throws:
        MDReaderException - if the record couldn't be read.
      • evaluateByMetric

        public float evaluateByMetric(int descrIndex,
                                      int metricIndex,
                                      float minPercentageOfSimilarHits,
                                      MDReader similarSetReader,
                                      MDReader dissimilarSetReader)
                               throws MDReaderException
        Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage. The actual number of similar hits and the threshold for the metric are those for which the evaluator function is maximal among the allowed ones. They can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ).
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        minPercentageOfSimilarHits - Minimal percentage of known similars required as hits, relative to the total number of similars
        similarSetReader - Reader of the test set of known similars
        dissimilarSetReader - Reader of the set of target molecules (in which similars are sought)
        Returns:
        Value of current evaluator function
        Throws:
        MDReaderException - if the record couldn't be read.
      • getNumberOfSimilars

        public int getNumberOfSimilars()
        Returns the number of known similar molecules (read previously by the similar set reader).
        Returns:
        Number of known similar structures
      • getNumberOfDissimilars

        public int getNumberOfDissimilars()
        Returns the number of target molecules (read previously by the dissimilar set reader).
        Returns:
        Number of target structures (molecules not known to be similars)
      • getNumberOfSimilarHits

        public int getNumberOfSimilarHits()
        Returns the number of hits from the known similar molecules, found in a previous evaluation or screen.
        Returns:
        Number of known similar hits
      • getNumberOfDissimilarHits

        public int getNumberOfDissimilarHits()
        Returns the number of hits from the set of target molecules, found in a previous evaluation or screen.
        Returns:
        Number of target hits
      • resetSimilarHits

        public void resetSimilarHits()
        Resets the iteration over the known similar hits found in a previous screen or evaluation, so that they can subsequently be retrieved one by one with getNextSimilarHit().
      • resetDissimilarHits

        public void resetDissimilarHits()
        Resets the iteration over the target hits found in a previous screen or evaluation, so that they can subsequently be retrieved one by one with getNextDissimilarHit().
      • getNextSimilarHit

        public int getNextSimilarHit()
        Retrieves the ids of the known similar hits found in a previous screen or evaluation, one per call.
        Returns:
        Id of next similar hit
      • getNextDissimilarHit

        public int getNextDissimilarHit()
        Retrieves the ids of the target hits found in a previous screen or evaluation, one per call.
        Returns:
        Id of next hit from target set of dissimilars
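The reset/next methods suggest a cursor-style iteration. The sketch below shows one way to drain all similar-hit ids into a list; it assumes (this is not stated above) that after a reset exactly getNumberOfSimilarHits() calls to getNextSimilarHit() are valid. `HitSource` is a hypothetical stand-in interface that only mirrors the documented method names so the example compiles without JChem, and `FixedHits` is a toy implementation for demonstration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of the JChem API.
final class HitDrainSketch {

    // Stand-in mirroring the documented accessor names of MDHitEvaluator.
    interface HitSource {
        int getNumberOfSimilarHits();
        void resetSimilarHits();
        int getNextSimilarHit();
    }

    // Assumption: one getNextSimilarHit() call per counted hit after a reset.
    static List<Integer> similarHitIds(HitSource source) {
        source.resetSimilarHits();
        int n = source.getNumberOfSimilarHits();
        List<Integer> ids = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            ids.add(source.getNextSimilarHit());
        }
        return ids;
    }

    // Toy in-memory source used only for the demonstration.
    static final class FixedHits implements HitSource {
        private final int[] ids;
        private int cursor;

        FixedHits(int... ids) {
            this.ids = ids.clone();
        }

        public int getNumberOfSimilarHits() { return ids.length; }
        public void resetSimilarHits() { cursor = 0; }
        public int getNextSimilarHit() { return ids[cursor++]; }
    }

    public static void main(String[] args) {
        System.out.println(similarHitIds(new FixedHits(3, 7, 12))); // [3, 7, 12]
    }
}
```

Target hits would be drained the same way with resetDissimilarHits(), getNumberOfDissimilarHits() and getNextDissimilarHit().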
      • getThreshold

        public float getThreshold(int descrIndex,
                                  int metricIndex)
        Returns the threshold set by the last screen (given by the user as a parameter) or evaluation (determined by the evaluation).
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        Returns:
        Threshold value
      • getInsertedDissimilars

        public ArrayList[] getInsertedDissimilars()
        Returns lists of dissimilars whose dissimilarity values are lower than those of the similars. With the similars ordered by increasing dissimilarity, the first element contains the ids of dissimilars with dissimilarity lower than that of all similars, the second contains the ids of dissimilars with dissimilarity between those of the first and second similar, and so on.
        Returns:
        Array of lists of dissimilar ids, length is the number of similars.
        Since:
        JChem 2.2
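The bucketing described above can be sketched with one sort and a binary search per dissimilar. This is an illustration, not JChem's code: it returns a generic `List<List<Integer>>` instead of the raw `ArrayList[]`, it assumes ids are positions in the dissimilar set, and it assumes a dissimilar whose value equals a similar's value is not "lower" than that similar. Dissimilars at or above the largest similar value fall into no bucket, which matches a result length equal to the number of similars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not part of the JChem API.
final class InsertedDissimilarsSketch {

    // Bucket 0: ids below every similar; bucket k > 0: ids between the k-th and
    // (k+1)-th smallest similar values. Ids at or above the largest are dropped.
    static List<List<Integer>> insertedDissimilars(float[] similars, float[] dissimilars) {
        float[] sorted = similars.clone();
        Arrays.sort(sorted); // similars ordered by increasing dissimilarity
        List<List<Integer>> buckets = new ArrayList<>(sorted.length);
        for (int i = 0; i < sorted.length; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int id = 0; id < dissimilars.length; id++) {
            int k = countAtOrBelow(sorted, dissimilars[id]);
            if (k < sorted.length) {
                buckets.get(k).add(id); // strictly lower than sorted[k]
            }
        }
        return buckets;
    }

    // Number of entries <= value in an ascending array (upper-bound binary search).
    static int countAtOrBelow(float[] sortedAsc, float value) {
        int lo = 0;
        int hi = sortedAsc.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedAsc[mid] <= value) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        float[] similars = {0.40f, 0.10f, 0.70f, 0.25f};
        float[] targets = {0.05f, 0.35f, 0.60f, 0.80f, 0.90f};
        System.out.println(insertedDissimilars(similars, targets)); // [[0], [], [1], [2]]
    }
}
```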
      • calcMetricDistribution

        public int[] calcMetricDistribution(int descrIndex,
                                            int metricIndex,
                                            float lowerBound,
                                            float upperBound,
                                            int nHistograms,
                                            float[] metricValues)
        Retrieves the distribution of the given metric from the dissimilarity values calculated by a previous call to calcDissimilarity(). The distribution is returned as the number of dissimilarities falling into each of the (nHistograms - 2) equal-size intervals between lowerBound and upperBound, plus two extra intervals: one for all values lower than the lower bound and one for all values greater than the upper bound. The i-th interval is defined as [ metricValues[ i ], metricValues[ i + 1 ] ].
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        lowerBound - Lower bound for dissimilarity distribution
        upperBound - Upper bound for dissimilarity distribution
        nHistograms - Refinement of the distribution: number of intervals (histogram bins), including the two extra intervals
        metricValues - Output parameter: must be allocated by the caller with length (nHistograms + 1); receives the endpoints of the dissimilarity value intervals
        Returns:
        Array of numbers of dissimilarity values falling into the intervals defined by metricValues
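The binning rule above (nHistograms - 2 equal inner intervals, one extra interval on each side, nHistograms + 1 endpoints) is sketched below over a plain array of values. Two details are not specified above and are assumptions here: the outermost endpoints are stored as -Infinity and +Infinity, and a value exactly on an inner boundary is counted in the interval above it, with upperBound itself counted in the last inner interval. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of the JChem API.
final class MetricDistributionSketch {

    // Fills metricValues (length nHistograms + 1) with interval endpoints and returns
    // per-interval counts. Assumes outer endpoints of -Infinity / +Infinity.
    static int[] distribution(float[] values, float lowerBound, float upperBound,
                              int nHistograms, float[] metricValues) {
        if (nHistograms < 3 || metricValues.length != nHistograms + 1 || !(lowerBound < upperBound)) {
            throw new IllegalArgumentException("invalid bounds or array sizes");
        }
        int inner = nHistograms - 2;                    // equal-size intervals
        float width = (upperBound - lowerBound) / inner;
        metricValues[0] = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < inner; i++) {
            metricValues[i + 1] = lowerBound + i * width;
        }
        metricValues[nHistograms - 1] = upperBound;     // set exactly, no rounding drift
        metricValues[nHistograms] = Float.POSITIVE_INFINITY;

        int[] counts = new int[nHistograms];
        for (float v : values) {
            if (v < lowerBound) {
                counts[0]++;                            // extra interval below lowerBound
            } else if (v > upperBound) {
                counts[nHistograms - 1]++;              // extra interval above upperBound
            } else {
                int bin = Math.min((int) ((v - lowerBound) / width), inner - 1);
                counts[bin + 1]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        float[] values = {-0.1f, 0.0f, 0.2f, 0.45f, 0.5f, 0.99f, 1.0f, 1.3f};
        int nHistograms = 4;                            // 2 inner + 2 extra intervals
        float[] metricValues = new float[nHistograms + 1];
        int[] counts = distribution(values, 0.0f, 1.0f, nHistograms, metricValues);
        System.out.println(Arrays.toString(metricValues)); // [-Infinity, 0.0, 0.5, 1.0, Infinity]
        System.out.println(Arrays.toString(counts));       // [1, 3, 3, 1]
    }
}
```

As in the documented method, the caller allocates metricValues with length nHistograms + 1 and the method fills it in place.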
      • calcMetricDistribution

        public int[] calcMetricDistribution(int descrIndex,
                                            int metricIndex,
                                            float lowerBound,
                                            float upperBound,
                                            int nHistograms,
                                            float[] metricValues,
                                            MDReader similarSetReader,
                                            MDReader dissimilarSetReader)
                                     throws MDReaderException
        Retrieves the distribution of the given metric from the dissimilarity values calculated by a screen using the given two molecular descriptor readers. The distribution is returned as the number of dissimilarities falling into each of the (nHistograms - 2) equal-size intervals between lowerBound and upperBound, plus two extra intervals: one for all values lower than the lower bound and one for all values greater than the upper bound. The i-th interval is defined as [ metricValues[ i ], metricValues[ i + 1 ] ].
        Parameters:
        descrIndex - Index of molecular descriptor
        metricIndex - Index of metric (of the given molecular descriptor)
        lowerBound - Lower bound for dissimilarity distribution
        upperBound - Upper bound for dissimilarity distribution
        nHistograms - Refinement of the distribution: number of intervals (histogram bins), including the two extra intervals
        metricValues - Output parameter: must be allocated by the caller with length (nHistograms + 1); receives the endpoints of the dissimilarity value intervals
        similarSetReader - Reader of the test set of known similars
        dissimilarSetReader - Reader of the set of target molecules (in which similars are sought)
        Returns:
        Array of numbers of dissimilarity values falling into the intervals defined by metricValues
        Throws:
        MDReaderException - if the record couldn't be read.