Package chemaxon.descriptors
Class PharmacophoreFingerprint
java.lang.Object
chemaxon.descriptors.MolecularDescriptor
chemaxon.descriptors.PharmacophoreFingerprint
- All Implemented Interfaces:
chemaxon.license.Licensable, Cloneable
@PublicApi
public class PharmacophoreFingerprint
extends MolecularDescriptor
implements chemaxon.license.Licensable
The PharmacophoreFingerprint class implements 2D pharmacophoric fingerprints.
Such fingerprints (which are chemical descriptors) are constructed from
sequences of histograms, each of which has the same number of bars.
(Each bar represents a descriptor cell.)
The number of histograms is determined by the number of pharmacophore
types (often also referred to as features or properties).
If the number of distinct pharmacophore features (for instance
H-donor, H-acceptor, charge, etc.) is n, then the number of
histograms is n*(n+1)/2. Pharmacophoric point types can be customized by the user and are specified in an external configuration file; see the user documentation for details.
The number of bars (or bins) in one histogram (that is, the number of descriptor cells belonging to that histogram) is determined by two distance values: the minimal and maximal distances of pharmacophoric point pairs (atom pairs). Since the fingerprints handled by this class are two-dimensional, distances are topological: the distance of two atoms in the same molecule equals the number of edges in the shortest path connecting the two corresponding nodes in the chemical graph of the molecule. (This implies that chemical graphs should be connected.) Atom pairs closer to each other than the minimal distance are regarded as being the minimal distance apart, and pairs farther apart than the maximal distance are regarded as being the maximal distance apart.
Thus the number of bars in one histogram is equal to: maximal distance - minimal distance + 1.
The three configuration parameters described above (minimal distance, maximal distance, and the number of pharmacophore types) have a substantial influence on the size of pharmacophoric fingerprints. When this class is instantiated, these parameters have to be provided in a PFParameters object.
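The size formulas above can be checked with a few lines of plain Java. This is only an arithmetic sketch: the class and method names are hypothetical and not part of the chemaxon API, and it assumes the total number of cells is simply the number of histograms multiplied by the number of bars per histogram.

```java
/** Arithmetic sketch of the fingerprint size formulas (hypothetical helper, not chemaxon API). */
final class FingerprintSizeSketch {

    /** Number of histograms for n distinct pharmacophore features: n*(n+1)/2. */
    static int histogramCount(int featureCount) {
        return featureCount * (featureCount + 1) / 2;
    }

    /** Number of bars in one histogram: maximal distance - minimal distance + 1. */
    static int barsPerHistogram(int minDist, int maxDist) {
        if (maxDist < minDist) {
            throw new IllegalArgumentException("maxDist must not be smaller than minDist");
        }
        return maxDist - minDist + 1;
    }

    /** Assumed total number of cells: histograms times bars per histogram. */
    static int totalBins(int featureCount, int minDist, int maxDist) {
        return histogramCount(featureCount) * barsPerHistogram(minDist, maxDist);
    }

    public static void main(String[] args) {
        // Two features (a and d), distances 1..10: 3 histograms of 10 bars each.
        System.out.println(histogramCount(2));   // prints 3
        System.out.println(totalBins(2, 1, 10)); // prints 30
    }
}
```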
Besides fingerprint size, two further properties determine the internal logical structure of fingerprints: the order of the histograms in the fingerprint, and the order of histogram bars within one histogram. Histograms are ordered by pharmacophore type symbols. That is, if the H-bond acceptor is denoted by a and the H-donor property by d (and no more features are specified), then the order of histograms is a-a, a-d, d-d (and according to the formula introduced above, the number of histograms is 2*(2+1)/2 = 3). Histogram bars are ordered from left to right by distance value (from minimal to maximal distance).
This fingerprint structure results in a unique (well-defined, unambiguous) representation that enables the canonical numbering (indexing) of individual bins. This is vital for accessing cells efficiently: if only symbolic keys such as ('a','d',3) could be used instead of integer index numbers, retrieving information from fingerprints would be dramatically less efficient. It is therefore crucial to introduce distinct symbols for different pharmacophore types in the XML configuration file, and to use the same symbols when fingerprints are generated and when they are used in dissimilarity calculations. Otherwise, the interpretation (meaning) of the fingerprints could be significantly different.
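To make the idea of canonical bin numbering concrete, the following sketch shows one numbering that is consistent with the ordering described above (histograms a-a, a-d, d-d; bars from minimal to maximal distance). It is an assumption for illustration only: the class, its methods, and the exact layout are hypothetical, and the numbering actually used by the library is not confirmed by this page.

```java
/** One possible canonical bin numbering (an assumption, not the library's confirmed layout). */
final class BinIndexSketch {

    /**
     * Position of the histogram of the unordered feature pair (fa, fb) among the
     * n*(n+1)/2 histograms, ordered 0-0, 0-1, ..., 0-(n-1), 1-1, 1-2, ...
     */
    static int histogramIndex(int fa, int fb, int featureCount) {
        int lo = Math.min(fa, fb);
        int hi = Math.max(fa, fb);
        // Number of histograms whose smaller feature index is below lo.
        int before = lo * featureCount - lo * (lo - 1) / 2;
        return before + (hi - lo);
    }

    /** Integer index of the bin for (fa, fb, dist); dist must already lie in [minDist, maxDist]. */
    static int binIndex(int fa, int fb, int dist, int featureCount, int minDist, int maxDist) {
        if (dist < minDist || dist > maxDist) {
            throw new IllegalArgumentException("distance must be normalized first");
        }
        int bars = maxDist - minDist + 1;
        return histogramIndex(fa, fb, featureCount) * bars + (dist - minDist);
    }

    public static void main(String[] args) {
        // Features a = 0 and d = 1, distances 1..10.
        System.out.println(binIndex(0, 0, 1, 2, 1, 10));  // prints 0  (first bar of a-a)
        System.out.println(binIndex(1, 0, 3, 2, 1, 10));  // prints 12 (third bar of a-d)
        System.out.println(binIndex(1, 1, 10, 2, 1, 10)); // prints 29 (last bar of d-d)
    }
}
```

With such a numbering, the symbolic key ('a','d',3) is resolved to an array position once, and every later access is a plain array lookup.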
Operations
Three main groups of operations (methods) can be distinguished:
- Direct bin manipulation: put value in a bin, increase the value stored in a bin, retrieve the value stored in a bin.
- Conversion methods: string representations, extracting into database format and building up from string and database formats.
- (Dis)similarity metrics: these compare two fingerprints and calculate a distance value (dissimilarity ratio or coefficient) between them.
- Since:
- JChem 2.0
-
Field Summary
Fields inherited from class chemaxon.descriptors.MolecularDescriptor
params
-
Constructor Summary
Constructor and Description
PharmacophoreFingerprint()
Creates a new, empty instance of PharmacophoreFingerprint without allocating internal storage.
PharmacophoreFingerprint(PFParameters params)
Creates a new instance of PharmacophoreFingerprint according to the parameters given.
PharmacophoreFingerprint(PharmacophoreFingerprint pfp)
Copy constructor.
PharmacophoreFingerprint(String params)
Creates a new instance of PharmacophoreFingerprint according to the parameters given.
-
Method Summary
Modifier and TypeMethodDescriptionfinal void
clear()
Clears the fingerprint: sets all bins to store zero value.clone()
Creates a new instance with identical internal state.protected byte[]
decompress
(byte[] data) Uncompresses input byte array and stores the uncompressed array in params.data.void
fromData
(byte[] dbRepr) Builds a PharmacophoreFingerprint from an external data format, created by a previous call to toData().
void
fromFloatArray
(float[] descr) Builds a molecular descriptor from its float array representation.final void
fromString
(String pfp) Builds a fingerprint from its string representation created by toString().
String[]
Creates the PharmacophoreFingerprint descriptor from the given Molecule.final float
get
(int bin) Gets the content of the specified histogram bin.
float
get
(int feature1, int feature2, float dist) final float
get
(int fa, int fb, int dist) Gets the histogram bar height of two features ('fa'-'fb') corresponding to the given distance 'dist'.
final float
final float
Calculates the asymmetric FBPA convolution product based distance of the fingerprint from another (given as parameter).
Color[]
Determines the coloring of atoms.int[]
Gets the individual atom colors by pharmacophore point type.
String[]
float[]
Gets the default dissimilarity threshold values for all dissimilarity metrics defined.float
Calculates the dissimilarity between two pharmacophore fingerprints using the default distance measure.float
getDissimilarity
(MolecularDescriptor fp2, int metricIndex) Calculates the dissimilarity between two pharmacophore fingerprints using the specified parametrized distance metric.String[]
Gets the dissimilarity metric names.final float
Calculates the Euclidean distance.float
Calculates the lower bound estimate of the dissimilarity from the given fingerprint.float
float
getName()
Gets the name of the PharmacophoreFingerprint object.int
Gets the name of the parameters class corresponding to the descriptor.float
final float
getScaledTanimoto
(PharmacophoreFingerprint f, PharmacophoreFingerprint hypothesis) Calculates the scaled Tanimoto metric (adapted to histograms).
Gets the short name of the descriptor.
getSymbol
(int feature) final float
Calculates the symmetric FBPA convolution product based distance of the fingerprint from another (given as parameter).
final float
Calculates the Tanimoto metric (adapted to histograms).
float
Calculates the Tversky dissimilarity index.
final float
Calculates the weighted asymmetric Euclidean distance.final float
Calculates the weighted Euclidean distance.final void
inc
(int bin) Increments the content of the specified histogram bin by one.
final void
inc
(int fa, int fb, int dist) Increments the histogram corresponding to two features ('fa'-'fb') and a distance, 'dist'.final void
inc
(int fa, int fb, int dist, float[] incr) The fuzzy version of inc( int fa, int fb, int dist ).
final void
inc
(int fa, int fb, int dist, int nrRotBonds) The fuzzy version of inc( int fa, int fb, int dist ).
int
index
(int fa, int fb, int dist) Calculates the index of the bin specified by the arguments.boolean
Returns information about the licensing of the product.boolean
Checks if this fingerprint is a subset of another fingerprint that is passed as method parameter.final void
put
(int bin, float newValue) Stores the given value in the specified histogram bin.
final void
put
(int bin, int newValue) Stores the given value in the specified histogram bin.
void
Sets the license environment.void
setParameters
(MDParameters parameters) Sets parameters, allocates internal storage if needed and cleans the descriptor.void
setParameters
(String parameters) Sets the parameters of an already created PharmacophoreFingerprint object.
byte[]
toData()
Converts a PharmacophoreFingerprint object into a byte array.
final String
Converts the fingerprint into a string of decimal numbers.
float[]
Creates the float array representation of a MolecularDescriptor object.
final String
toHistogramString
(String sep, boolean nonZeroOnly) Creates the string representation of the fingerprint.final String
toString()
Converts the fingerprint into a readable string.final String
Creates the string representation of the pharmacophore fingerprint.Methods inherited from class chemaxon.descriptors.MolecularDescriptor
generate, getDefaultMetricIndex, getDefaultThreshold, getDissimilarityMetricIndex, getMetricIndex, getMetricName, getMetricName, getNumberOfMetrics, getNumberOfWeights, getParameters, getThreshold, getThreshold, main, needsConfig, newInstance, newInstance, newInstanceFromXML, setScreeningConfiguration, toBinaryString
-
Field Details
-
fp
protected float[] fpstorage for the fingerprint
-
-
Constructor Details
-
PharmacophoreFingerprint
public PharmacophoreFingerprint()
Creates a new, empty instance of PharmacophoreFingerprint without allocating internal storage.
-
PharmacophoreFingerprint
Creates a new instance of PharmacophoreFingerprint according to the parameters given.
- Parameters:
params - parameters used in fingerprint generation and handling
-
PharmacophoreFingerprint
Creates a new instance of PharmacophoreFingerprint according to the parameters given.
- Parameters:
params - parameter settings
-
PharmacophoreFingerprint
Copy constructor. An identical copy of the pharmacophore fingerprint passed is created; both share the same PFParameters object.
- Parameters:
pfp - fingerprint to be copied
-
-
Method Details
-
clone
Creates a new instance with identical internal state.
- Specified by:
clone in class MolecularDescriptor
- Returns:
- the newly copied object
-
isLicensed
public boolean isLicensed()
Returns information about the licensing of the product.
- Specified by:
isLicensed in interface chemaxon.license.Licensable
- Returns:
- true if the product is correctly licensed
-
setLicenseEnvironment
Sets the license environment.
- Specified by:
setLicenseEnvironment in interface chemaxon.license.Licensable
-
getName
Gets the name of the PharmacophoreFingerprint object. The name differs from the class name: it is more readable and more meaningful for end users.
- Overrides:
getName in class MolecularDescriptor
- Returns:
- the readable, external name for PharmacophoreFingerprint class objects
-
getShortName
Gets the short name of the descriptor.
- Overrides:
getShortName in class MolecularDescriptor
- Returns:
- the short name used in text outputs (tables etc.)
-
getParametersClassName
Gets the name of the parameters class corresponding to the descriptor.
- Overrides:
getParametersClassName in class MolecularDescriptor
- Returns:
- the name of the parameters class
-
setParameters
Sets parameters, allocates internal storage if needed and cleans the descriptor.
- Overrides:
setParameters in class MolecularDescriptor
- Parameters:
parameters - fingerprint parameters
- Since:
- JChem 2.2
-
setParameters
Sets the parameters of an already created PharmacophoreFingerprint object.
- Specified by:
setParameters in class MolecularDescriptor
- Parameters:
parameters - parameter settings for the descriptor
- Throws:
MDParametersException - any XML error
-
toData
public byte[] toData()
Converts a PharmacophoreFingerprint object into a byte array. This format can be referred to as an "external representation" since it serves as the data format for storing fingerprints in databases.
Use the fromData() method to build the pharmacophore fingerprint from this "external" representation.
- Specified by:
toData in class MolecularDescriptor
- Returns:
- byte array representation of the fingerprint object
-
fromData
public void fromData(byte[] dbRepr)
Builds a PharmacophoreFingerprint from an external data format, created by a previous call to toData().
- Specified by:
fromData in class MolecularDescriptor
- Parameters:
dbRepr - "external" representation of PharmacophoreFingerprint
-
decompress
protected byte[] decompress(byte[] data)
Uncompresses the input byte array and stores the uncompressed array in params.data. This is the reverse of compress( final byte[] ). Checks the header (first byte) and decompresses only if the value of the first byte is ZERO_SEQUENCE_COMPRESSION_CODE. Otherwise null is returned.
- Parameters:
data - compressed data
-
generate
Creates the PharmacophoreFingerprint descriptor from the given Molecule. Calls the generator created by the corresponding MDParameters class.
- Overrides:
generate in class MolecularDescriptor
- Returns:
- property names set in the molecule passed during generation
- Throws:
MDGeneratorException - when failed to generate descriptor
-
inc
public final void inc(int fa, int fb, int dist)
Increments the histogram corresponding to two features ('fa'-'fb') and a distance, 'dist'. For the sake of efficiency, pharmacophore features (types, properties) are not used directly; their indices (as introduced by the PSymbols class) have to be provided instead. Distance values are normalized in this method to fall within the minimum and maximum distance range specified by the previously given parameters.
If the bin is already full, its value is not changed.
- Parameters:
fa - feature index of one of the features
fb - feature index of the other pharmacophore feature
dist - distance value of the two features
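The distance normalization described for this method can be sketched on a single stand-in histogram. The class below is hypothetical and is not the library's implementation; it assumes that normalization means clamping the distance into the [minimum, maximum] range, and it does not model the "bin is already full" check.

```java
import java.util.Arrays;

/** Sketch of distance clamping before incrementing a bar (hypothetical, not chemaxon code). */
final class ClampedIncrementSketch {

    /** Clamps dist into the closed range [minDist, maxDist]. */
    static int clampDistance(int dist, int minDist, int maxDist) {
        return Math.max(minDist, Math.min(maxDist, dist));
    }

    /** Increments the bar of one histogram that corresponds to the clamped distance. */
    static void inc(float[] histogram, int dist, int minDist, int maxDist) {
        histogram[clampDistance(dist, minDist, maxDist) - minDist] += 1f;
    }

    public static void main(String[] args) {
        float[] histogram = new float[4]; // bars for distances 2, 3, 4 and 5
        inc(histogram, 1, 2, 5);          // too close: counted at distance 2
        inc(histogram, 3, 2, 5);          // in range: counted at distance 3
        inc(histogram, 9, 2, 5);          // too far: counted at distance 5
        System.out.println(Arrays.toString(histogram)); // prints [1.0, 1.0, 0.0, 1.0]
    }
}
```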
-
inc
public final void inc(int fa, int fb, int dist, int nrRotBonds)
The fuzzy version of inc( int fa, int fb, int dist ). The contents of all bins in the (fa,fb) histogram are incremented by the appropriate value, depending on the distance, the number of rotatable bonds, and the fuzzy smoothing factor.
- Parameters:
fa - feature index of one of the features
fb - feature index of the other pharmacophore feature
dist - distance value of the two features
nrRotBonds - number of rotatable bonds on the path connecting the two pharmacophoric points
-
inc
public final void inc(int fa, int fb, int dist, float[] incr)
The fuzzy version of inc( int fa, int fb, int dist ). The contents of all bins in the (fa,fb) histogram are incremented by the appropriate value, depending on the user-defined fuzzy smoothing vector.
- Parameters:
fa - feature index of one of the features
fb - feature index of the other pharmacophore feature
dist - distance value of the two features
incr - distance-dependent fuzzy increments
-
inc
public final void inc(int bin)
Increments the content of the specified histogram bin by one. No overflow check is performed for the sake of efficiency (in normal use no overflow should occur, since 2^32-1 is large enough for molecules having about 90000 atoms). See the class description for the exact meaning of the bin index.
- Parameters:
bin - index of the bin to be incremented by one
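The 90000-atom figure can be checked with a short calculation. The sketch below assumes that the worst case is every unordered atom pair of a molecule landing in the same bin, which is one reading of the remark and is not stated on this page; the class name is hypothetical.

```java
/** Arithmetic check of the overflow remark (hypothetical helper; assumes one increment per atom pair). */
final class OverflowBoundSketch {

    /** Number of unordered pairs of distinct atoms: n*(n-1)/2. */
    static long atomPairs(long atoms) {
        return atoms * (atoms - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(atomPairs(90_000L)); // prints 4049955000
        System.out.println((1L << 32) - 1);     // prints 4294967295
    }
}
```

Under that assumption, 90000 atoms yield fewer pairs than 2^32-1, which is consistent with the bound quoted above.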
-
put
public final void put(int bin, int newValue)
Stores the given value in the specified histogram bin. The previous value of the bin is discarded.
- Parameters:
bin - index of the bin to be set
newValue - value to be stored in the given bin
-
put
public final void put(int bin, float newValue)
Stores the given value in the specified histogram bin. The previous value of the bin is discarded.
- Parameters:
bin - index of the bin to be set
newValue - value to be stored in the given bin
-
get
public final float get(int fa, int fb, int dist)
Gets the histogram bar height of two features ('fa'-'fb') corresponding to the given distance 'dist'. Distance values have to be normalized before calling this method!
- Parameters:
fa - feature index of one of the features
fb - feature index of the other pharmacophore feature
dist - distance value of the two features
- Returns:
- height (value) of the histogram bar (column) corresponding to the input arguments
-
get
public final float get(int bin)
Gets the content of the specified histogram bin. See the description of the PharmacophoreFingerprint class for the meaning of the bin index.
- Parameters:
bin - index of the bin queried
- Returns:
- the value stored in the specified bin
-
clear
public final void clear()
Clears the fingerprint: sets all bins to store zero value.
-
toString
Converts the fingerprint into a readable string. This is the default external text format of the pharmacophore fingerprint, also written into SDfiles in the field named (tagged) PFP2D (see setPMAPTagName( String tagName )). See toHistogramString(String sep, boolean nonZeroOnly) for a detailed format description.
- Specified by:
toString in class MolecularDescriptor
- Returns:
- string representation of the pharmacophore fingerprint
-
fromString
Builds a fingerprint from its string representation created by toString().
- Specified by:
fromString in class MolecularDescriptor
- Parameters:
pfp - pharmacophore fingerprint string
- Throws:
ParseException
-
toString
Creates the string representation of the pharmacophore fingerprint. The output format differs from that of toString(): <feature symbol> ' ' <feature symbol> @ <distance> '=' <value> <sep> ... Note that such a text representation cannot be converted back into pharmacophore fingerprint data.
- Parameters:
sep - separator character printed between two bins
nonZeroOnly - bins containing zero values are not printed
- Returns:
- the string representation of the fingerprint
-
toHistogramString
Creates the string representation of the fingerprint. Depending on the parameter settings, either all bins are printed, or only the bins of those histograms in which at least one feature pair has at least one occurrence (that is, at least one non-zero valued bin).
The format is: <feature symbol> ' ' <feature symbol> '=' '|' b1 b2 ... bn '|' <separator>, where bi denotes the value stored in bin i.
- Parameters:
sep - separator string to be printed between histograms
nonZeroOnly - whether all histograms or only those containing a non-zero value are printed
- Returns:
- the string representation of the fingerprint
-
toDecimalString
Converts the fingerprint into a string of decimal numbers. All bins are printed in an unstructured way; values are simply separated by tabs.
- Specified by:
toDecimalString in class MolecularDescriptor
- Returns:
- decimal string representation of the fingerprint
-
toFloatArray
public float[] toFloatArray()
Creates the float array representation of a MolecularDescriptor object. This array contains all values of the descriptor (including all zeros).
- Specified by:
toFloatArray in class MolecularDescriptor
- Returns:
- float array of the fingerprint cells
- Since:
- JChem 2.0.1
-
fromFloatArray
public void fromFloatArray(float[] descr)
Builds a molecular descriptor from its float array representation. Typically used when a hypothesis is created.
- Specified by:
fromFloatArray in class MolecularDescriptor
- Parameters:
descr - descriptor represented in a float array (e.g. generated by toFloatArray())
- Since:
- JChem 2.0.1
-
getAtomSetColors
Determines the coloring of atoms. This coloring reflects pharmacophore point types rather than element types. This method should be called after each call of setParameters(), as that may change the coloring scheme to be applied.
- Overrides:
getAtomSetColors in class MolecularDescriptor
- Returns:
- array of colors of different pharmacophore point types
-
getAtomSetNames
- Overrides:
getAtomSetNames in class MolecularDescriptor
-
getAtomSetIndexes
Gets the individual atom colors by pharmacophore point type.
- Overrides:
getAtomSetIndexes in class MolecularDescriptor
- Parameters:
m - a molecule to assign pharmacophore point colors to
- Returns:
- array of color indexes indexed by atom indexes
-
getDissimilarityMetrics
Gets the dissimilarity metric names.
- Specified by:
getDissimilarityMetrics in class MolecularDescriptor
- Returns:
- the metrics array
-
getDefaultDissimilarityMetricThresholds
public float[] getDefaultDissimilarityMetricThresholds()
Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
- Specified by:
getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
- Returns:
- array of dissimilarity threshold values
-
getEuclidean
Calculates the Euclidean distance. The dissimilarity coefficient returned ranges from 0 to MAX_FLOAT; it is not normalized.
- Parameters:
f - another fingerprint from which the distance is measured
- Returns:
- dissimilarity coefficient
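For reference, the standard unnormalized Euclidean distance between two histograms of equal length can be sketched as follows. This is the textbook formula applied to plain float arrays (such as those returned by toFloatArray()); the class is hypothetical, and whether the library computes its value in exactly this way is an assumption.

```java
/** Textbook Euclidean distance on two bin arrays (hypothetical helper, not chemaxon code). */
final class EuclideanSketch {

    /** Square root of the sum of squared bin differences; not normalized. */
    static float euclidean(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("fingerprints must have the same number of bins");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return (float) Math.sqrt(sum);
    }

    public static void main(String[] args) {
        System.out.println(euclidean(new float[] {0f, 3f}, new float[] {4f, 0f})); // prints 5.0
    }
}
```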
-
getAsymmetricEuclidean
-
getWeightedEuclidean
Calculates the weighted Euclidean distance. Weights are taken from the associated PFParameters.
- Parameters:
f - a fingerprint from which the distance is measured
- Returns:
- dissimilarity coefficient
-
getWeightedAsymmetricEuclidean
Calculates the weighted asymmetric Euclidean distance. Weights and asymmetry ratio are taken from the associated PFParameters.
- Parameters:
f - a fingerprint from which the distance is measured
- Returns:
- dissimilarity coefficient
-
getSymmetricFBPA
Calculates the symmetric FBPA convolution product based distance of the fingerprint from another (given as parameter).
- Parameters:
f - the distance of this fingerprint is taken from f
- Returns:
- Euclidean distance (dissimilarity measure)
-
getAsymmetricFBPA
Calculates the asymmetric FBPA convolution product based distance of the fingerprint from another (given as parameter).
- Parameters:
f - the reference fingerprint (denoted by M)
- Returns:
- the Euclidean distance (dissimilarity measure)
-
getTanimoto
Calculates the Tanimoto metric (adapted to histograms).
- Parameters:
f - the distance from f is calculated
- Returns:
- the Tanimoto distance (dissimilarity measure)
-
getTversky
Calculates the Tversky dissimilarity index.
- Parameters:
f - the distance from f is calculated
- Returns:
- the Tversky dissimilarity index as float
-
getScaledTanimoto
public final float getScaledTanimoto(PharmacophoreFingerprint f, PharmacophoreFingerprint hypothesis)
Calculates the scaled Tanimoto metric (adapted to histograms).
- Parameters:
f - the distance is measured from f
- Returns:
- the Tanimoto distance (dissimilarity measure)
-
index
public int index(int fa, int fb, int dist)
Calculates the index of the bin specified by the arguments.
- Parameters:
fa - index of the first pharmacophore point type
fb - index of the second (other) pharmacophore point type
dist - distance of the pharmacophore points
- Returns:
- index of the specified bin
-
getDissimilarity
Calculates the dissimilarity between two pharmacophore fingerprints using the default distance measure.
- Specified by:
getDissimilarity in class MolecularDescriptor
- Parameters:
fp2 - the other pharmacophore fingerprint
- Returns:
- dissimilarity ratio
-
getDissimilarity
Calculates the dissimilarity between two pharmacophore fingerprints using the specified parametrized distance metric.
- Specified by:
getDissimilarity in class MolecularDescriptor
- Parameters:
fp2 - the pharmacophore fingerprint from which the distance is measured
metricIndex - index of the parametrized metric to be used
- Returns:
- the dissimilarity ratio
- See Also:
-
getLowerBound
Calculates the lower bound estimate of the dissimilarity from the given fingerprint. This method is required by Diffable; see the remarks at getDistance for further explanation. In the case of PharmacophoreFingerprint, a good estimate for the minimum distance cannot be obtained efficiently (that is, significantly faster than calculating the proper distance), therefore 0 is returned. This trivial distance bound estimation will lead to calling getDissimilarity( final Object fp2 ).
- Overrides:
getLowerBound in class MolecularDescriptor
- Parameters:
fp2 - pharmacophore fingerprint from which distance is measured
- Returns:
- estimate of the minimum distance
-
isSubsetOf
Checks if this fingerprint is a subset of another fingerprint that is passed as the method parameter. A histogram (fingerprint) is considered to be a subset of another if none of its bars is higher than the corresponding bar of the other.
- Parameters:
d - a descriptor which is supposed to be a superset
- Returns:
- true if this descriptor is a subset of the parameter
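The subset rule stated above translates directly into a bar-by-bar comparison. The sketch below applies it to plain float arrays; the class is a hypothetical stand-in and not the library's implementation, and it assumes both fingerprints have the same number of bins.

```java
/** Bar-by-bar subset test on two bin arrays (hypothetical helper, not chemaxon code). */
final class SubsetSketch {

    /** Returns true if no bar of candidate is higher than the corresponding bar of other. */
    static boolean isSubsetOf(float[] candidate, float[] other) {
        if (candidate.length != other.length) {
            throw new IllegalArgumentException("fingerprints must have the same number of bins");
        }
        for (int i = 0; i < candidate.length; i++) {
            if (candidate[i] > other[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        float[] small = {1f, 0f, 2f};
        float[] large = {1f, 3f, 2f};
        System.out.println(isSubsetOf(small, large)); // prints true
        System.out.println(isSubsetOf(large, small)); // prints false
    }
}
```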
-
getMaxDist
public float getMaxDist()
-
getMinDist
public float getMinDist()
-
getResolution
public float getResolution()
-
getNumberOfFeatures
public int getNumberOfFeatures()
-
getSymbol
-
get
public float get(int feature1, int feature2, float dist)
-
getAliasNames
-