Package chemaxon.descriptors
Class MDHitEvaluator
java.lang.Object
chemaxon.descriptors.MDHitEvaluator
Retrieves statistical information from a test screen on a set of molecules. Statistical information supplied:
- Value of the selected evaluator function (enrichment, selectivity effectiveness) of a screening
- Distribution of dissimilarity values (with histograms) for a given metric.
- Hits from the set, containing structures similar to the actives
- Hits from target set
- Thresholds for dissimilarity metrics
Basic input:
- Set of query molecules
- Set of molecules, which are known to be similar to the actives (e.g. the original set of actives can be divided into two parts, one of which is used as queries, the other is the test set of known similars)
- Set of target molecules, which is screened against the actives (called dissimilar set in the code).
There are two ways to use this class. The first is intended for smaller sets of molecules and offers fast retrieval of statistical information in several ways: all dissimilarity values are calculated in advance and stored in memory to enable fast queries.
The second uses the 'memory-safe' methods, which calculate dissimilarities on the fly each time a query function is called and do not store them in memory.
Typical usage: Not memory-safe mode:
    MDHitEvaluator evaluator = new MDHitEvaluator( similarity );
    evaluator.setSelectivityAsymmetryFactor( 0.3f );
    int functionIndex = evaluator.getEvaluatorFunctionIndex( "SelectivityEffectiveness" );
    evaluator.setCurrentEvaluatorFunction( functionIndex );
    // Precalculate and store all dissimilarity values once.
    evaluator.calcDissimilarity( testReader, targetReader );
    int nSimilars = evaluator.getNumberOfSimilars();
    // Parenthesize before casting: (int) 0.3 * nSimilars would evaluate to 0.
    float E1 = evaluator.evaluateByMetric( descrIndex, metrIndex, (int) ( 0.3 * nSimilars ), (int) ( 0.8 * nSimilars ) );
    float E2 = evaluator.evaluateByMetric( descrIndex, metrIndex, (int) ( 0.5 * nSimilars ), nSimilars );
Memory-safe mode, dissimilarities are always calculated!
    MDHitEvaluator evaluator = new MDHitEvaluator( similarity );
    evaluator.setSelectivityAsymmetryFactor( 0.3f );
    int functionIndex = evaluator.getEvaluatorFunctionIndex( "SelectivityEffectiveness" );
    evaluator.setCurrentEvaluatorFunction( functionIndex );
    // No calcDissimilarity() call: the readers are passed to each evaluation.
    float E = evaluator.evaluateByMetric( descrIndex, metrIndex, 50.0F, testReader, targetReader );
Since:
JChem 2.0
-
Field Summary
-
Constructor Summary
Constructor, Description
- MDHitEvaluator(MDSimilarity similarity): Creates a new instance, allocates storage.
Method Summary
Modifier and Type, Method, Description
- void calcDissimilarity(MDReader similarSetReader, MDReader dissimilarSetReader): Precalculates dissimilarity values.
- int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues): Retrieves the distribution of the given metric from the dissimilarity values calculated by a previous call to calcDissimilarity().
- int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues, MDReader similarSetReader, MDReader dissimilarSetReader): Retrieves the distribution of the given metric from the dissimilarity values calculated by a screen using the given two molecular descriptor readers.
- float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits): Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage.
- float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader): Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage.
- float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits): Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found.
- float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits): Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers.
- float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader): Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers.
- float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader): Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found.
- int getCurrentEvaluatorFunction(): Gets the index of the current evaluator function.
- int getEvaluatorFunctionIndex(name): Gets the index of the evaluator function from its name.
- getEvaluatorFunctionName(int index): Gets the name of the evaluator function from its index.
- getInsertedDissimilars(): Returns lists of dissimilars which have dissimilarity values lower than the similars.
- int getNextDissimilarHit(): Retrieves the ids of target hits found in a previous screen or evaluation, one by one.
- int getNextSimilarHit(): Retrieves the ids of known similar hits found in a previous screen or evaluation, one by one.
- int getNumberOfDissimilarHits(): Returns the number of hits from the set of target molecules, found in a previous evaluation or screen.
- int getNumberOfDissimilars(): Returns the number of target molecules (read by dissimilarReader previously).
- int getNumberOfSimilarHits(): Returns the number of hits from the known similar molecules, found in a previous evaluation or screen.
- int getNumberOfSimilars(): Returns the number of known similar molecules (read by similarReader previously).
- float getSelectivityAsymmetryFactor(): Returns the value of the asymmetry factor (weight) of the evaluator function selectivity effectiveness.
- float getThreshold(int descrIndex, int metricIndex): Returns the threshold set by the last screen (given by the user as a parameter) or evaluation (set by the evaluation).
- void resetDissimilarHits(): Resets the target hits found in a previous screen or evaluation, so that they can subsequently be retrieved one by one.
- void resetSimilarHits(): Resets the known similar hits found in a previous screen or evaluation, so that they can subsequently be retrieved one by one.
- float[] screen(int descrIndex, int metricIndex, float threshold): Screens the similar set and the dissimilar set with the given descriptor, metric and threshold.
- float[] screen(int descrIndex, int metricIndex, float threshold, MDReader similarSetReader, MDReader dissimilarSetReader): Screens the similar set and the dissimilar set with the given descriptor, metric and threshold.
- void setCurrentEvaluatorFunction(int index): Sets the evaluator function, the value of which is returned in each evaluate call.
- void setSelectivityAsymmetryFactor(float alpha): Sets the asymmetry factor (weight) of the evaluator function selectivity effectiveness.
-
Field Details
-
evaluatorFunctions
-
-
Constructor Details
-
MDHitEvaluator
Creates a new instance, allocates storage.
Parameters:
similarity - A complete MDSimilarity object with added queries
-
-
Method Details
-
setCurrentEvaluatorFunction
public void setCurrentEvaluatorFunction(int index)
Sets the evaluator function, the value of which is returned in each evaluate call.
Parameters:
index - Index of evaluator function
-
getCurrentEvaluatorFunction
public int getCurrentEvaluatorFunction()
Gets the index of the current evaluator function.
Returns:
Index of evaluator function
-
getEvaluatorFunctionIndex
Gets the index of the evaluator function from its name.
Parameters:
name - Name of evaluator function
Returns:
Index of evaluator function
Throws:
IllegalArgumentException
-
getEvaluatorFunctionName
Gets the name of the evaluator function from its index.
Parameters:
index - Index of evaluator function
Returns:
Name of evaluator function
-
setSelectivityAsymmetryFactor
Sets the asymmetry factor (weight) of the evaluator function selectivity effectiveness.
Parameters:
alpha - Value of the asymmetry factor
Throws:
IllegalArgumentException
-
getSelectivityAsymmetryFactor
public float getSelectivityAsymmetryFactor()
Returns the value of the asymmetry factor (weight) of the evaluator function selectivity effectiveness.
Returns:
Value of the asymmetry factor
-
calcDissimilarity
public void calcDissimilarity(MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
Precalculates dissimilarity values. This is worth doing if all dissimilarity values fit into memory. In that case, several evaluations can be performed afterwards without recalculating the dissimilarity values.
Parameters:
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Throws:
MDReaderException - if the record couldn't be read.
-
screen
public float[] screen(int descrIndex, int metricIndex, float threshold)
Screens the similar set and the dissimilar set with the given descriptor, metric and threshold. Evaluates all evaluator functions. To be called only if calcDissimilarity() has been called previously.
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
threshold - Threshold value for selecting hits
Returns:
Array of evaluator function values
-
screen
public float[] screen(int descrIndex, int metricIndex, float threshold, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
Screens the similar set and the dissimilar set with the given descriptor, metric and threshold. Evaluates all evaluator functions.
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
threshold - Threshold value for selecting hits
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Array of evaluator function values
Throws:
MDReaderException - if the record couldn't be read.
-
evaluateByMetric
public float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits)
Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found. The threshold for the metric is set using this requirement and can be retrieved by a subsequent call to getThreshold( descrIndex, metricIndex ). To be called only if calcDissimilarity() has been called previously.
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
nSimilarHits - Number of known similars required as hits
Returns:
Value of current evaluator function
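How a required number of similar hits can determine a threshold may be easier to see in a small, library-independent sketch. It assumes that each molecule is summarized by a single dissimilarity value and that a molecule counts as a hit when its value does not exceed the threshold. The class and method names (ThresholdSketch, thresholdForHits, countHits) are hypothetical and not part of the JChem API, and the actual implementation may differ.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of chemaxon.descriptors.
final class ThresholdSketch {

    /** Smallest threshold that admits at least nSimilarHits known similars. */
    static float thresholdForHits(float[] similarDissims, int nSimilarHits) {
        if (nSimilarHits < 1 || nSimilarHits > similarDissims.length) {
            throw new IllegalArgumentException("nSimilarHits out of range");
        }
        float[] sorted = similarDissims.clone();
        Arrays.sort(sorted);
        // The n-th smallest dissimilarity is the loosest value still needed.
        return sorted[nSimilarHits - 1];
    }

    /** Counts values that are hits, i.e. not greater than the threshold (assumed rule). */
    static int countHits(float[] dissims, float threshold) {
        int hits = 0;
        for (float d : dissims) {
            if (d <= threshold) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        float[] similars = {0.4f, 0.1f, 0.3f, 0.2f};
        float[] targets = {0.15f, 0.25f, 0.9f};
        float threshold = thresholdForHits(similars, 2);
        System.out.println("threshold = " + threshold);
        System.out.println("target hits = " + countHits(targets, threshold));
    }
}
```

The number of target hits at that threshold is what an evaluator function would then weigh against the number of similar hits.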
-
evaluateByMetric
public float evaluateByMetric(int descrIndex, int metricIndex, int nSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the given number of similars must be found. The threshold for the metric is set using this requirement and can be retrieved by a subsequent call to getThreshold( descrIndex, metricIndex ).
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
nSimilarHits - Number of known similars required as hits
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Value of current evaluator function
Throws:
MDReaderException - if the record couldn't be read.
-
evaluateByMetric
public float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits)
Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. Their values can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ). To be called only if calcDissimilarity() has been called previously.
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
fromNSimilarHits - Minimal number of known similars required as hits
toNSimilarHits - Maximal number of known similars required as hits
Returns:
Value of current evaluator function
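The selection of the best hit count within a range can be sketched as a simple search. This page does not define the evaluator functions, so the score used below is an assumed, simplified enrichment: the share of known similars among the hits divided by their share in the whole screened set. All names (RangeEvaluationSketch, enrichment, bestHitCount) are hypothetical, the similar dissimilarity values are assumed to be distinct, and the sketch is not the JChem implementation.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of chemaxon.descriptors.
final class RangeEvaluationSketch {

    /** Assumed score: similar share among hits relative to similar share overall. */
    static double enrichment(int simHits, int disHits, int nSim, int nDis) {
        double hitRate = (double) simHits / (simHits + disHits);
        double baseRate = (double) nSim / (nSim + nDis);
        return hitRate / baseRate;
    }

    /**
     * Returns the number of similar hits n, with from <= n <= to, whose
     * threshold yields the highest score. Requires 1 <= from <= to <= number
     * of similars; ties are resolved in favor of the smaller n.
     */
    static int bestHitCount(float[] simDissims, float[] disDissims, int from, int to) {
        if (from < 1 || to < from || to > simDissims.length) {
            throw new IllegalArgumentException("invalid hit count range");
        }
        float[] sim = simDissims.clone();
        Arrays.sort(sim);
        int best = from;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int n = from; n <= to; n++) {
            float threshold = sim[n - 1]; // admits exactly n distinct similars
            int disHits = 0;
            for (float d : disDissims) {
                if (d <= threshold) {
                    disHits++;
                }
            }
            double score = enrichment(n, disHits, sim.length, disDissims.length);
            if (score > bestScore) {
                bestScore = score;
                best = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] similars = {0.1f, 0.2f, 0.6f, 0.7f};
        float[] targets = {0.15f, 0.5f, 0.55f, 0.9f};
        System.out.println("best n in [2, 4] = " + bestHitCount(similars, targets, 2, 4));
    }
}
```

A wider range gives the search more freedom to trade similar hits against target hits; a range with equal bounds reduces to the fixed hit count case.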
-
evaluateByMetric
public float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits)
Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. Their values can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ). To be called only if calcDissimilarity() has been called previously.
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
minPercentageOfSimilarHits - Minimal percentage of known similars required as hits compared to total number of similars
Returns:
Value of current evaluator function
-
evaluateByMetric
public float evaluateByMetric(int descrIndex, int metricIndex, int fromNSimilarHits, int toNSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the number of similar hits must be between the given numbers. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. Their values can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ).
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
fromNSimilarHits - Minimal number of known similars required as hits
toNSimilarHits - Maximal number of known similars required as hits
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Value of current evaluator function
Throws:
MDReaderException - if the record couldn't be read.
-
evaluateByMetric
public float evaluateByMetric(int descrIndex, int metricIndex, float minPercentageOfSimilarHits, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
Returns the value of the current evaluator function for a screen of the similar set and the dissimilar set with the given descriptor and metric, when the percentage of similar hits relative to the total number of similars must be greater than or equal to the given percentage. The actual number of similar hits and the threshold for the metric are those for which the value of the evaluator function is maximal among the allowed ones. Their values can be retrieved by subsequent calls to getNumberOfSimilarHits() and getThreshold( descrIndex, metricIndex ).
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
minPercentageOfSimilarHits - Minimal percentage of known similars required as hits compared to total number of similars
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Value of current evaluator function
Throws:
MDReaderException - if the record couldn't be read.
-
getNumberOfSimilars
public int getNumberOfSimilars()
Returns the number of known similar molecules (read by similarReader previously).
Returns:
Number of known similar structures
-
getNumberOfDissimilars
public int getNumberOfDissimilars()
Returns the number of target molecules (read by dissimilarReader previously).
Returns:
Number of target structures, which are not known to be similars
-
getNumberOfSimilarHits
public int getNumberOfSimilarHits()
Returns the number of hits from the known similar molecules, found in a previous evaluation or screen.
Returns:
Number of known similar hits
-
getNumberOfDissimilarHits
public int getNumberOfDissimilarHits()
Returns the number of hits from the set of target molecules, found in a previous evaluation or screen.
Returns:
Number of target hits
-
resetSimilarHits
public void resetSimilarHits()
Resets the known similar hits found in a previous screen or evaluation, so that they can subsequently be retrieved one by one.
-
resetDissimilarHits
public void resetDissimilarHits()
Resets the target hits found in a previous screen or evaluation, so that they can subsequently be retrieved one by one.
-
getNextSimilarHit
public int getNextSimilarHit()
Retrieves the ids of known similar hits found in a previous screen or evaluation, one by one.
Returns:
Id of next similar hit
-
getNextDissimilarHit
public int getNextDissimilarHit()
Retrieves the ids of target hits found in a previous screen or evaluation, one by one.
Returns:
Id of next hit from target set of dissimilars
-
getThreshold
public float getThreshold(int descrIndex, int metricIndex)
Returns the threshold set by the last screen (given by the user as a parameter) or evaluation (set by the evaluation).
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
Returns:
Threshold value
-
getInsertedDissimilars
Returns lists of dissimilars which have dissimilarity values lower than the similars. The first element contains the list of dissimilar ids that have dissimilarity lower than all similars, the second contains the ids of dissimilars with dissimilarity values between the first and second similar (if similars are ordered by their dissimilarity values), and so on.
Returns:
Array of lists of dissimilar ids, length is the number of similars.
Since:
JChem 2.2
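The grouping of dissimilars between consecutive similars can be illustrated with a short standalone sketch. It assumes one dissimilarity value per molecule, uses array positions as ids, ranks a dissimilar that ties with a similar after that similar, and omits dissimilars that lie above every similar. These are assumptions made for illustration; the class and method names are hypothetical and the real return type is not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of chemaxon.descriptors.
final class InsertedDissimilarsSketch {

    /**
     * Slot i holds the ids of dissimilars whose value lies below the i-th
     * smallest similar value and not below the (i-1)-th smallest one.
     */
    static List<List<Integer>> insertedDissimilars(float[] simDissims, float[] disDissims) {
        float[] sim = simDissims.clone();
        Arrays.sort(sim);
        List<List<Integer>> slots = new ArrayList<>();
        for (int i = 0; i < sim.length; i++) {
            slots.add(new ArrayList<>());
        }
        for (int id = 0; id < disDissims.length; id++) {
            // Find the first similar that is strictly greater than this dissimilar.
            int slot = 0;
            while (slot < sim.length && sim[slot] <= disDissims[id]) {
                slot++;
            }
            if (slot < sim.length) { // values above all similars are left out
                slots.get(slot).add(id);
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        float[] similars = {0.5f, 0.2f, 0.8f};
        float[] targets = {0.1f, 0.3f, 0.9f, 0.6f, 0.15f};
        System.out.println(insertedDissimilars(similars, targets));
    }
}
```

A long first list indicates that many target molecules rank ahead of every known similar, which usually signals poor separation for the chosen descriptor and metric.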
-
calcMetricDistribution
public int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues)
Retrieves the distribution of the given metric from the dissimilarity values calculated by a previous call to calcDissimilarity(). The distribution is returned as the number of dissimilarities falling into the (nHistograms - 2) equal-size intervals between lowerBound and upperBound, plus two extra intervals: one for values lower than the given lower bound and one for values greater than the given upper bound. The i-th interval is defined as: [ metricValues[ i ], metricValues[ i + 1 ] ].
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
lowerBound - Lower bound for dissimilarity distribution
upperBound - Upper bound for dissimilarity distribution
nHistograms - Refinement of distribution: number of histograms (including the two extra histograms)
metricValues - Outgoing parameter! Must be allocated previously with length (nHistograms + 1), contains endpoints of the dissimilarity value intervals
Returns:
Array of numbers of dissimilarity values falling into the intervals defined by metricValues
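The interval layout can be made concrete with a standalone sketch that bins a plain array of dissimilarity values. It assumes that the two outer endpoints of metricValues are negative and positive infinity and that a value exactly on an inner boundary falls into the higher interval; the page does not specify either detail. The names (MetricDistributionSketch, distribution) are hypothetical and the code is not the JChem implementation.

```java
// Hypothetical illustration only; not part of chemaxon.descriptors.
final class MetricDistributionSketch {

    /**
     * Counts values in nHistograms intervals: one below lowerBound,
     * (nHistograms - 2) equal-size ones inside the bounds, one above upperBound.
     * metricValues is filled with the nHistograms + 1 interval endpoints.
     */
    static int[] distribution(float[] dissims, float lowerBound, float upperBound,
                              int nHistograms, float[] metricValues) {
        if (nHistograms < 3 || metricValues.length != nHistograms + 1
                || !(upperBound > lowerBound)) {
            throw new IllegalArgumentException("invalid bounds or array length");
        }
        int inner = nHistograms - 2;
        float width = (upperBound - lowerBound) / inner;

        metricValues[0] = Float.NEGATIVE_INFINITY;           // assumed outer endpoint
        for (int i = 0; i <= inner; i++) {
            metricValues[i + 1] = lowerBound + i * width;
        }
        metricValues[inner + 1] = upperBound;                // avoid rounding drift
        metricValues[nHistograms] = Float.POSITIVE_INFINITY; // assumed outer endpoint

        int[] counts = new int[nHistograms];
        for (float d : dissims) {
            int bin;
            if (d < lowerBound) {
                bin = 0;
            } else if (d > upperBound) {
                bin = nHistograms - 1;
            } else {
                // Clamp so that d == upperBound lands in the last inner interval.
                bin = 1 + Math.min(inner - 1, (int) ((d - lowerBound) / width));
            }
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        float[] values = {-0.1f, 0.0f, 0.2f, 0.5f, 0.99f, 1.0f, 1.5f};
        float[] endpoints = new float[5];
        int[] counts = distribution(values, 0.0f, 1.0f, 4, endpoints);
        System.out.println(java.util.Arrays.toString(counts));
        System.out.println(java.util.Arrays.toString(endpoints));
    }
}
```

As in the documented method, the caller allocates the endpoint array with nHistograms + 1 elements before the call and reads the interval boundaries from it afterwards.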
-
calcMetricDistribution
public int[] calcMetricDistribution(int descrIndex, int metricIndex, float lowerBound, float upperBound, int nHistograms, float[] metricValues, MDReader similarSetReader, MDReader dissimilarSetReader) throws MDReaderException
Retrieves the distribution of the given metric from the dissimilarity values calculated by a screen using the given two molecular descriptor readers. The distribution is returned as the number of dissimilarities falling into the (nHistograms - 2) equal-size intervals between lowerBound and upperBound, plus two extra intervals: one for values lower than the given lower bound and one for values greater than the given upper bound. The i-th interval is defined as: [ metricValues[ i ], metricValues[ i + 1 ] ].
Parameters:
descrIndex - Index of molecular descriptor
metricIndex - Index of metric (of the given molecular descriptor)
lowerBound - Lower bound for dissimilarity distribution
upperBound - Upper bound for dissimilarity distribution
nHistograms - Refinement of distribution: number of histograms (including the two extra histograms)
metricValues - Outgoing parameter! Must be allocated previously with length (nHistograms + 1), contains endpoints of the dissimilarity value intervals
similarSetReader - Reader of the test set of known similars
dissimilarSetReader - Reader of the set of target molecules (where similars are sought)
Returns:
Array of numbers of dissimilarity values falling into the intervals defined by metricValues
Throws:
MDReaderException - if the record couldn't be read.
-