Class MDHitEvaluator

java.lang.Object
chemaxon.descriptors.MDHitEvaluator

@PublicApi public class MDHitEvaluator extends Object

Retrieves statistical information from a test screen on a set of molecules. Statistical information supplied:

  • Value of the selected evaluator function (enrichment, selectivity effectiveness) of a screening
  • Distribution of dissimilarity values (with histograms) for a given metric.
  • Hits from the set, containing structures similar to the actives
  • Hits from target set
  • Thresholds for dissimilarity metrics

Basic input:

  • Set of query molecules
  • Set of molecules, which are known to be similar to the actives (e.g. the original set of actives can be divided into two parts, one of which is used as queries, the other is the test set of known similars)
  • Set of target molecules, which is screened against the actives (called dissimilar set in the code).

The class can be used in two ways. The first is intended for smaller sets of molecules and offers fast retrieval of statistical information in several ways: all dissimilarity values are calculated up front and stored, so that subsequent queries are fast.

The second uses the 'memory-safe' methods, which take the two readers as arguments: dissimilarities are calculated on the fly each time a query function is called and are not stored in memory.

Typical usage: Not memory-safe mode:

 MDHitEvaluator evaluator = new MDHitEvaluator( similarity );
 evaluator.setSelectivityAsymmetryFactor( 0.3f );
 int functionIndex = evaluator.getEvaluatorFunctionIndex( "SelectivityEffectiveness" );
 evaluator.setCurrentEvaluatorFunction( functionIndex );
 // precalculate and store all dissimilarity values once
 evaluator.calcDissimilarity( testReader, targetReader );
 int nSimilars = evaluator.getNumberOfSimilars();
 // parenthesize the product: a bare (int) 0.3 would truncate to 0
 float e1 = evaluator.evaluateByMetric( descrIndex, metrIndex,
                                        (int) ( 0.3 * nSimilars ), (int) ( 0.8 * nSimilars ) );
 float e2 = evaluator.evaluateByMetric( descrIndex, metrIndex,
                                        (int) ( 0.5 * nSimilars ), nSimilars );
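
When hit-count bounds are derived from a fraction of nSimilars, operator precedence matters: in Java a cast binds more tightly than multiplication. The following library-independent sketch illustrates the difference; the class and method names are made up for this example and are not part of JChem.

```java
// Illustrative only: shows why the product must be parenthesized before casting.
final class HitBoundDemo {
    /** Cast applies to the literal first: (int) 0.5 is 0, so the result is always 0. */
    static int castFirst(int nSimilars) {
        return (int) 0.5 * nSimilars;
    }

    /** Cast applies to the whole product: 0.5 * nSimilars truncated toward zero. */
    static int castProduct(int nSimilars) {
        return (int) (0.5 * nSimilars);
    }

    public static void main(String[] args) {
        System.out.println(castFirst(200));   // prints 0
        System.out.println(castProduct(200)); // prints 100
    }
}
```

With the cast applied to the literal alone, both bounds collapse to zero regardless of the size of the similar set.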

Memory-safe mode (dissimilarities are recalculated on every call):

 MDHitEvaluator evaluator = new MDHitEvaluator( similarity );
 evaluator.setSelectivityAsymmetryFactor( 0.3f );
 int functionIndex = evaluator.getEvaluatorFunctionIndex( "SelectivityEffectiveness" );
 evaluator.setCurrentEvaluatorFunction( functionIndex );
 float e = evaluator.evaluateByMetric( descrIndex, metrIndex, 50.0F,
                                       testReader, targetReader );

Since:
JChem 2.0
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    String[]
    evaluatorFunctions
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    MDHitEvaluator(MDSimilarity similarity)
    Creates a new instance, allocates storage.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    calcDissimilarity(MDReader similarSetReader, MDReader dissimilarSetReader)
    Precalculates dissimilarity values.
    int[]
    calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues)
    Retrieves the distribution of the given metric from the dissimilarity values calculated by a previous call to calcDissimilarity().
    int[]
    calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues, MDReader similarSetReader, MDReader dissimilarSetReader)
    Retrieves the distribution of the given metric from the dissimilarity values calculated by a screen using the given two molecular descriptor readers.
    float
    evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits)
    Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits to the total number of similars must be greater than or equal to the given percentage.
    float
    evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader)
    Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits to the total number of similars must be greater than or equal to the given percentage.
    float
    evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits)
    Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found.
    float
    evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits)
    Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers.
    float
    evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader)
    Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers.
    float
    evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader)
    Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found.
    int
    getCurrentEvaluatorFunction()
    Gets the index of the current evaluator function.
    int
    getEvaluatorFunctionIndex(String name)
    Gets the index of the evaluator function from its name.
    String
    getEvaluatorFunctionName(int index)
    Gets the name of the evaluator function from its index.
    ArrayList<Integer>[]
    getInsertedDissimilars()
    Returns lists of dissimilars which have dissimilarity values lower than the similars.
    int
    getNextDissimilarHit()
    Retrieves ids of target hits found in a previous screen or evaluation one by one.
    int
    getNextSimilarHit()
    Retrieves ids of known similar hits found in a previous screen or evaluation one by one.
    int
    getNumberOfDissimilarHits()
    Returns the number of hits from the set of target molecules, found in a previous evaluation or screen.
    int
    getNumberOfDissimilars()
    Returns the number of target molecules (read by dissimilarReader previously).
    int
    getNumberOfSimilarHits()
    Returns the number of hits from the known similar molecules, found in a previous evaluation or screen.
    int
    getNumberOfSimilars()
    Returns the number of known similar molecules (read by similarReader previously).
    float
    getSelectivityAsymmetryFactor()
    Returns the value of the asymmetry factor (weight) of the evaluator function selectivity effectiveness.
    float
    getThreshold(int descrIndex, int metricIndex)
    Returns the threshold set by the last screen (given by the user as a parameter) or evaluation (set by the evaluation).
    void
    resetDissimilarHits()
    Resets target hits found in a previous screen or evaluation for following retrieval one by one.
    void
    resetSimilarHits()
    Resets known similar hits found in a previous screen or evaluation for following retrieval one by one.
    float[]
    screen(int descrIndex, int metricIndex, float threshold)
    Screen the similar set and the dissimilar set with the given descriptor, metric and threshold.
    float[]
    screen(int descrIndex, int metricIndex, float threshold, MDReader similarSetReader, MDReader dissimilarSetReader)
    Screen the similar set and the dissimilar set with the given descriptor, metric and threshold.
    void
    setCurrentEvaluatorFunction(int index)
    Sets the evaluator function, the value of which is returned in each evaluate call.
    void
    setSelectivityAsymmetryFactor(float alpha)
    Sets the asymmetry factor (weight) of the evaluator function selectivity effectiveness.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • evaluatorFunctions

      public String[] evaluatorFunctions
  • Constructor Details

    • MDHitEvaluator

      public MDHitEvaluator(MDSimilarity similarity)
      Creates a new instance, allocates storage.
      Parameters:
      similarity - A complete MDSimilarity object with added queries
  • Method Details

    • setCurrentEvaluatorFunction

      public void setCurrentEvaluatorFunction(int index)
      Sets the evaluator function, the value of which is returned in each evaluate call.
      Parameters:
      index - Index of evaluator function
    • getCurrentEvaluatorFunction

      public int getCurrentEvaluatorFunction()
      Gets the index of the current evaluator function.
      Returns:
      Index of evaluator function
    • getEvaluatorFunctionIndex

      public int getEvaluatorFunctionIndex(String name) throws IllegalArgumentException
      Gets the index of the evaluator function from its name
      Parameters:
      name - Name of evaluator function
      Returns:
      Index of evaluator function
      Throws:
      IllegalArgumentException
    • getEvaluatorFunctionName

      public String getEvaluatorFunctionName(int index)
      Gets the name of the evaluator function from its index
      Parameters:
      index - Index of evaluator function
      Returns:
      Name of evaluator function
    • setSelectivityAsymmetryFactor

      public void setSelectivityAsymmetryFactor(float alpha) throws IllegalArgumentException
      Sets the asymmetry factor (weight) of the evaluator function selectivity effectiveness.
      Parameters:
      alpha - Value of the asymmetry factor
      Throws:
      IllegalArgumentException
    • getSelectivityAsymmetryFactor

      public float getSelectivityAsymmetryFactor()
      Returns the value of the asymmetry factor (weight) of the evaluator function selectivity effectiveness.
      Returns:
      Value of the asymmetry factor
    • calcDissimilarity

      public void calcDissimilarity(MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
      Precalculates dissimilarity values. This is worth doing if all dissimilarity values fit into memory. In that case several evaluations can be performed afterwards without recalculating the dissimilarity values.
      Parameters:
      similarSetReader - Reader of the test set of known similars
      dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
      Throws:
      MDReaderException - if the record couldn't be read.
    • screen

      public float[] screen(int descrIndex, int metricIndex, float threshold)
      Screens the similar set and the dissimilar set with the given descriptor, metric and threshold, and evaluates all evaluator functions. To be called only if calcDissimilarity() has been called previously.
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      threshold - Threshold value for selecting hits
      Returns:
      Array of evaluator function values
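
To make the screening step concrete, here is a minimal standalone sketch of threshold screening over dissimilarity values that are already available as arrays. It is not the MDHitEvaluator implementation. The array inputs, the rule that a value at or below the threshold counts as a hit, and the enrichment formula are all assumptions made for illustration; the exact definitions used by the library are not given in this page.

```java
// Sketch under stated assumptions; class and method names are hypothetical.
final class ScreenSketch {
    /** Counts values that are at or below the threshold (assumed hit criterion). */
    static int countHits(float[] dissimilarities, float threshold) {
        int hits = 0;
        for (float d : dissimilarities) {
            if (d <= threshold) {
                hits++;
            }
        }
        return hits;
    }

    /**
     * Assumed enrichment definition: the fraction of known similars among the hits,
     * divided by the fraction of known similars in the whole screened collection.
     */
    static double enrichment(float[] similars, float[] targets, float threshold) {
        int similarHits = countHits(similars, threshold);
        int targetHits = countHits(targets, threshold);
        int totalHits = similarHits + targetHits;
        if (totalHits == 0 || similars.length == 0) {
            return 0.0; // nothing found or nothing to find: report 0 by convention
        }
        double hitRate = (double) similarHits / totalHits;
        double baseRate = (double) similars.length / (similars.length + targets.length);
        return hitRate / baseRate;
    }
}
```

A value above 1 under this definition means the hit list is richer in known similars than the screened collection as a whole.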
    • screen

      public float[] screen(int descrIndex, int metricIndex, float threshold, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
      Screens the similar set and the dissimilar set with the given descriptor, metric and threshold, and evaluates all evaluator functions.
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      threshold - Threshold value for selecting hits
      similarSetReader - Reader of the test set of known similars
      dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
      Returns:
      Array of evaluator function values
      Throws:
      MDReaderException - if the record couldn't be read.
    • evaluateByMetric

      public float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits)
      Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found. Threshold for metric is set using this requirement, and can be retrieved by a following call to getThreshold( descrIndex, metricIndex ). To be called only if calcDissimilarity() has been called previously.
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      nSimilarHits - Number of known similars required as hits
      Returns:
      Value of current evaluator function
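
One way to picture how a threshold can follow from a required number of similar hits: sort the dissimilarity values of the known similars and take the value at the required rank. The sketch below assumes that a molecule is a hit when its dissimilarity is at or below the threshold; the class name and the array input are hypothetical and the library may derive the threshold differently.

```java
import java.util.Arrays;

// Sketch under stated assumptions; not the library's algorithm.
final class ThresholdByCount {
    /**
     * Returns the smallest threshold at which at least nSimilarHits of the
     * known similars are hits, assuming "hit" means dissimilarity <= threshold.
     */
    static float thresholdFor(float[] similarDissimilarities, int nSimilarHits) {
        if (nSimilarHits < 1 || nSimilarHits > similarDissimilarities.length) {
            throw new IllegalArgumentException("nSimilarHits out of range: " + nSimilarHits);
        }
        float[] sorted = similarDissimilarities.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        return sorted[nSimilarHits - 1];
    }
}
```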
    • evaluateByMetric

      public float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
      Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found. Threshold for metric is set using this requirement, and can be retrieved by a following call to getThreshold( descrIndex, metricIndex ).
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      nSimilarHits - Number of known similars required as hits
      similarSetReader - Reader of the test set of known similars
      dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
      Returns:
      Value of current evaluator function
      Throws:
      MDReaderException - if the record couldn't be read.
    • evaluateByMetric

      public float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits)
      Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. They can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ). To be called only if calcDissimilarity() has been called previously.
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      fromNSimilarHits - Minimal number of known similars required as hits
      toNSimilarHits - Maximal number of known similars required as hits
      Returns:
      Value of current evaluator function
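
The search over an allowed range of hit counts can be sketched as a simple loop: for each candidate count, derive a threshold, count the target hits it lets in, score the outcome, and keep the best. Everything in this example is an assumption for illustration: the Score interface stands in for the evaluator function, a hit is taken to mean dissimilarity at or below the threshold, and the similar values are assumed to be distinct.

```java
import java.util.Arrays;

// Sketch under stated assumptions; names are hypothetical.
final class BestHitCount {
    /** Hypothetical stand-in for an evaluator function. */
    interface Score {
        double of(int similarHits, int targetHits);
    }

    /**
     * Returns the hit count k in [from, to] whose threshold gives the highest
     * score, or -1 if the range contains no valid count. Ties keep the smaller k.
     */
    static int bestCount(float[] similars, float[] targets, int from, int to, Score score) {
        float[] sorted = similars.clone();
        Arrays.sort(sorted);
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int k = Math.max(from, 1); k <= Math.min(to, sorted.length); k++) {
            float threshold = sorted[k - 1]; // k-th smallest similar value
            int targetHits = 0;
            for (float d : targets) {
                if (d <= threshold) {
                    targetHits++;
                }
            }
            double value = score.of(k, targetHits);
            if (value > bestValue) {
                bestValue = value;
                best = k;
            }
        }
        return best;
    }
}
```

Passing a lambda such as `(s, t) -> (double) s / (s + t)` scores each candidate by the share of known similars among all hits.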
    • evaluateByMetric

      public float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits)
      Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits to the total number of similars must be greater than or equal to the given percentage. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. They can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ). To be called only if calcDissimilarity() has been called previously.
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      minPercentageOfSimilarHits - Minimal percentage of known similars required as hits compared to total number of similars
      Returns:
      Value of current evaluator function
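
A percentage requirement can be turned into a minimum hit count by rounding up. The sketch below assumes the percentage is expressed on a 0 to 100 scale, which is inferred from the 50.0F argument in the class-level usage example and is not stated explicitly; the class name is hypothetical.

```java
// Sketch under stated assumptions; not part of the JChem API.
final class PercentageToCount {
    /** Smallest hit count h such that h / nSimilars * 100 >= minPercentage. */
    static int minHits(float minPercentage, int nSimilars) {
        if (minPercentage < 0f || minPercentage > 100f) {
            throw new IllegalArgumentException("percentage must be in [0, 100]: " + minPercentage);
        }
        // multiply before dividing to avoid rounding error in the intermediate fraction
        return (int) Math.ceil((double) minPercentage * nSimilars / 100.0);
    }
}
```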
    • evaluateByMetric

      public float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
      Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. They can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ).
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      fromNSimilarHits - Minimal number of known similars required as hits
      toNSimilarHits - Maximal number of known similars required as hits
      similarSetReader - Reader of the test set of known similars
      dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
      Returns:
      Value of current evaluator function
      Throws:
      MDReaderException - if the record couldn't be read.
    • evaluateByMetric

      public float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
      Return the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits to the total number of similars must be greater than or equal to the given percentage. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. They can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ).
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      minPercentageOfSimilarHits - Minimal percentage of known similars required as hits compared to total number of similars
      similarSetReader - Reader of the test set of known similars
      dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
      Returns:
      Value of current evaluator function
      Throws:
      MDReaderException - if the record couldn't be read.
    • getNumberOfSimilars

      public int getNumberOfSimilars()
      Returns the number of known similar molecules (read by similarReader previously).
      Returns:
      Number of known similar structures
    • getNumberOfDissimilars

      public int getNumberOfDissimilars()
      Returns the number of target molecules (read by dissimilarReader previously).
      Returns:
      Number of known target structures, which are not known to be similars
    • getNumberOfSimilarHits

      public int getNumberOfSimilarHits()
      Returns the number of hits from the known similar molecules, found in a previous evaluation or screen.
      Returns:
      Number of known similar hits
    • getNumberOfDissimilarHits

      public int getNumberOfDissimilarHits()
      Returns the number of hits from the set of target molecules, found in a previous evaluation or screen.
      Returns:
      Number of target hits
    • resetSimilarHits

      public void resetSimilarHits()
      Resets known similar hits found in a previous screen or evaluation for following retrieval one by one.
    • resetDissimilarHits

      public void resetDissimilarHits()
      Resets target hits found in a previous screen or evaluation for following retrieval one by one.
    • getNextSimilarHit

      public int getNextSimilarHit()
      Retrieves ids of known similar hits found in a previous screen or evaluation one by one.
      Returns:
      Id of next similar hit
    • getNextDissimilarHit

      public int getNextDissimilarHit()
      Retrieves ids of target hits found in a previous screen or evaluation one by one.
      Returns:
      Id of next hit from target set of dissimilars
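
The reset and next methods form a simple cursor over the hits. The sketch below drains such a cursor into a list. It is written against a hypothetical stand-in interface that mirrors the three documented method signatures, so that it compiles without JChem; with the real class the same loop would call the methods on an MDHitEvaluator instance. The loop is bounded by the hit count because the behavior of the next method past the last hit is not documented here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch; SimilarHitCursor is a hypothetical stand-in, not a JChem type.
final class HitCursorSketch {
    /** Mirrors resetSimilarHits(), getNumberOfSimilarHits() and getNextSimilarHit(). */
    interface SimilarHitCursor {
        void resetSimilarHits();
        int getNumberOfSimilarHits();
        int getNextSimilarHit();
    }

    /** Rewinds the cursor and collects every hit id in retrieval order. */
    static List<Integer> collect(SimilarHitCursor cursor) {
        cursor.resetSimilarHits();
        int n = cursor.getNumberOfSimilarHits();
        List<Integer> ids = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            ids.add(cursor.getNextSimilarHit());
        }
        return ids;
    }
}
```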
    • getThreshold

      public float getThreshold(int descrIndex, int metricIndex)
      Returns the threshold set by the last screen (given by the user as a parameter) or evaluation (set by the evaluation).
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      Returns:
      Threshold value
    • getInsertedDissimilars

      public ArrayList<Integer>[] getInsertedDissimilars()
      Returns lists of dissimilars which have dissimilarity values lower than the similars. First element contains the list of dissimilar ids that have dissimilarity lower than all similars, second contains the ids of dissimilars with dissimilarity values between the first and second similar (if similars are ordered by their dissimilarity values) etc.
      Returns:
      Array of lists of dissimilar ids, length is the number of similars.
      Since:
      JChem 2.2
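
The bucketing described above can be sketched as follows. This is an illustration, not the library's code: ids are taken to be array indices, a dissimilar whose value equals a similar's value is placed after that similar, and nested lists are used instead of an array of lists to avoid generic array creation. All of these choices are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch under stated assumptions; names are hypothetical.
final class InsertedDissimilarsSketch {
    /**
     * Bucket i holds the ids of dissimilars whose value lies below the i-th
     * smallest similar value and not below the (i-1)-th. Dissimilars that are
     * not below any similar are left out.
     */
    static List<List<Integer>> bucket(float[] similars, float[] dissimilars) {
        float[] sorted = similars.clone();
        Arrays.sort(sorted);
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < sorted.length; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int id = 0; id < dissimilars.length; id++) {
            // find the first similar whose value is strictly greater
            int i = 0;
            while (i < sorted.length && sorted[i] <= dissimilars[id]) {
                i++;
            }
            if (i < sorted.length) {
                buckets.get(i).add(id);
            }
        }
        return buckets;
    }
}
```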
    • calcMetricDistribution

      public int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues)
      Retrieves the distribution of the given metric from the dissimilarity values calculated by a previous call to calcDissimilarity(). The distribution is returned as the number of dissimilarities falling into the (nHistograms - 2) equal-size intervals between lowerBound and upperBound, plus two extra intervals: one for all values lower than the given lower bound and one for all values greater than the given upper bound. The i-th interval is defined as [ metricValues[ i ], metricValues[ i + 1 ] ].
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      lowerBound - Lower bound for dissimilarity distribution
      upperBound - Upper bound for dissimilarity distribution
      nHistograms - Refinement of distribution: number of histogram intervals (including the two extra intervals)
      metricValues - Output parameter. Must be allocated by the caller with length (nHistograms + 1); receives the endpoints of the dissimilarity value intervals
      Returns:
      Array of numbers of dissimilarity values falling into the intervals defined by metricValues
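
The interval layout can be illustrated with a standalone binning sketch. It follows the description above: (nHistograms - 2) equal-width inner intervals plus one interval below lowerBound and one above upperBound, with the endpoints written into a caller-allocated array of length (nHistograms + 1). Using infinities as the two outermost endpoints, and placing a value that falls exactly on an inner boundary into the higher interval, are assumptions of this sketch; the values the library actually writes there are not specified here.

```java
// Sketch under stated assumptions; not the library's implementation.
final class DistributionSketch {
    static int[] distribution(float[] values, float lowerBound, float upperBound,
                              int nHistograms, float[] metricValues) {
        if (nHistograms < 3 || upperBound <= lowerBound) {
            throw new IllegalArgumentException("need nHistograms >= 3 and lowerBound < upperBound");
        }
        if (metricValues.length != nHistograms + 1) {
            throw new IllegalArgumentException("metricValues must have length nHistograms + 1");
        }
        int inner = nHistograms - 2;
        float width = (upperBound - lowerBound) / inner;

        // fill the endpoint array (output parameter)
        metricValues[0] = Float.NEGATIVE_INFINITY; // assumed outer endpoint
        for (int i = 0; i < inner; i++) {
            metricValues[i + 1] = lowerBound + i * width;
        }
        metricValues[nHistograms - 1] = upperBound;
        metricValues[nHistograms] = Float.POSITIVE_INFINITY; // assumed outer endpoint

        int[] counts = new int[nHistograms];
        for (float v : values) {
            int bin;
            if (v < lowerBound) {
                bin = 0;
            } else if (v > upperBound) {
                bin = nHistograms - 1;
            } else {
                // clamp so that v == upperBound lands in the last inner interval
                bin = 1 + Math.min(inner - 1, (int) ((v - lowerBound) / width));
            }
            counts[bin]++;
        }
        return counts;
    }
}
```

A caller would allocate the endpoint array first, for example `float[] metricValues = new float[nHistograms + 1];`, and read it back after the call.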
    • calcMetricDistribution

      public int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
      Retrieves the distribution of the given metric from the dissimilarity values calculated by a screen using the given two molecular descriptor readers. The distribution is returned as the number of dissimilarities falling into the (nHistograms - 2) equal-size intervals between lowerBound and upperBound, plus two extra intervals: one for all values lower than the given lower bound and one for all values greater than the given upper bound. The i-th interval is defined as [ metricValues[ i ], metricValues[ i + 1 ] ].
      Parameters:
      descrIndex - Index of molecular descriptor
      metricIndex - Index of metric (of the given molecular descriptor)
      lowerBound - Lower bound for dissimilarity distribution
      upperBound - Upper bound for dissimilarity distribution
      nHistograms - Refinement of distribution: number of histogram intervals (including the two extra intervals)
      metricValues - Output parameter. Must be allocated by the caller with length (nHistograms + 1); receives the endpoints of the dissimilarity value intervals
      similarSetReader - Reader of the test set of known similars
      dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
      Returns:
      Array of numbers of dissimilarity values falling into the intervals defined by metricValues
      Throws:
      MDReaderException - if the record couldn't be read.