Package chemaxon.descriptors
Class PharmacophoreFingerprint
java.lang.Object
chemaxon.descriptors.MolecularDescriptor
chemaxon.descriptors.PharmacophoreFingerprint
- All Implemented Interfaces:
- chemaxon.license.Licensable, Cloneable
@PublicApi
public class PharmacophoreFingerprint
extends MolecularDescriptor
implements chemaxon.license.Licensable
The PharmacophoreFingerprint class implements 2D pharmacophoric fingerprints. Such fingerprints (which are chemical descriptors) are constructed from a sequence of histograms, all of which have the same number of bars. (Each bar represents one descriptor cell.) The number of histograms is determined by the number of pharmacophore types (often also referred to as features or properties). If the number of distinct pharmacophore features (for instance H-donor, H-acceptor, charge etc.) is n, then the number of histograms is n*(n+1)/2. Pharmacophoric point types can be customized by the user of the software and are specified in an external configuration file; see the user documentation for details.
The number of bars (or bins) in one histogram, each bar being one descriptor cell, is determined by two distance values: the minimal and maximal distances of pharmacophoric point pairs (atom pairs). Since fingerprints handled by this class are two-dimensional, distances are topological distances: the distance of two atoms in the same molecule equals the number of edges in the shortest path connecting the two corresponding nodes in the chemical graph of the molecule. (This implies that chemical graphs should be connected.) Atom pairs closer to each other than the minimal distance are regarded as being the minimal distance apart; similarly, pairs farther apart than the maximal distance are regarded as being the maximal distance apart.
Thus the number of bars in one histogram is: maximal distance - minimal distance + 1.
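The distance rules above can be sketched in a few lines. This is a hypothetical, stand-alone illustration (the class TopologicalDistance and its methods are not part of the ChemAxon API): the topological distance is computed with a breadth-first search over a plain adjacency list, out-of-range distances are clamped to the [minimal, maximal] range, and the bar count follows the formula above. An unreachable atom gets -1, which shows why the chemical graph should be connected.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of topological distances; not part of the ChemAxon API. */
class TopologicalDistance {

    /** Edge counts of shortest paths from one atom (breadth-first search); -1 marks unreachable atoms. */
    static int[] distancesFrom(List<int[]> adjacency, int source) {
        int[] dist = new int[adjacency.size()];
        Arrays.fill(dist, -1);
        dist[source] = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            int atom = queue.poll();
            for (int neighbour : adjacency.get(atom)) {
                if (dist[neighbour] < 0) { // first visit is along a shortest path
                    dist[neighbour] = dist[atom] + 1;
                    queue.add(neighbour);
                }
            }
        }
        return dist;
    }

    /** Closer pairs count as minDist apart, farther pairs as maxDist apart. */
    static int clamp(int d, int minDist, int maxDist) {
        return Math.max(minDist, Math.min(maxDist, d));
    }

    /** Number of bars in one histogram. */
    static int barsPerHistogram(int minDist, int maxDist) {
        return maxDist - minDist + 1;
    }

    public static void main(String[] args) {
        // Hypothetical 5-atom chain 0-1-2-3-4 given as adjacency lists.
        List<int[]> chain = List.of(new int[]{1}, new int[]{0, 2}, new int[]{1, 3},
                new int[]{2, 4}, new int[]{3});
        int d = distancesFrom(chain, 0)[4];
        System.out.println(d + " -> " + clamp(d, 1, 3)); // prints "4 -> 3"
        System.out.println(barsPerHistogram(1, 3));      // prints "3"
    }
}
```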
These three configuration parameters (minimal distance, maximal distance, and the number of pharmacophore types) substantially influence the size of the pharmacophoric fingerprints. When this class is instantiated, these parameters have to be provided in a PFParameters object.
Besides fingerprint size, two further circumstances determine the internal logical structure of fingerprints: the order of the histograms in the fingerprint, and the order of the bars within one histogram. Histograms are ordered by pharmacophore type symbols: if the H-bond acceptor is denoted by a and the H-donor property by d (and no other features are specified), then the order of histograms is a-a, a-d, d-d (and, according to the formula introduced above, the number of histograms is 2*(2+1)/2 = 3). Histogram bars are ordered from left to right by distance value (from minimal to maximal distance).
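A minimal sketch of the histogram order and the resulting fingerprint size, assuming the feature symbols are passed in the order the configuration defines (alphabetical in the a/d example). HistogramLayout and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of histogram order and fingerprint size; not ChemAxon code. */
class HistogramLayout {

    /** Feature pairs in a-a, a-d, d-d order; assumes symbols are already in configuration order. */
    static List<String> histogramOrder(char[] symbols) {
        List<String> order = new ArrayList<>();
        for (int i = 0; i < symbols.length; i++) {
            for (int j = i; j < symbols.length; j++) { // j >= i: each unordered pair once
                order.add(symbols[i] + "-" + symbols[j]);
            }
        }
        return order;
    }

    /** n*(n+1)/2 histograms, each with (maxDist - minDist + 1) bars. */
    static int fingerprintLength(int n, int minDist, int maxDist) {
        return n * (n + 1) / 2 * (maxDist - minDist + 1);
    }

    public static void main(String[] args) {
        System.out.println(histogramOrder(new char[]{'a', 'd'})); // prints "[a-a, a-d, d-d]"
        System.out.println(fingerprintLength(2, 1, 10));          // prints "30"
    }
}
```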
This fingerprint structure results in a unique (well-defined, unambiguous) representation that enables canonical numbering (indexing) of individual bins, which is vital for accessing cells efficiently. If only symbolic keys (as opposed to integer indices) could be used, for example ('a','d',3), retrieving information from fingerprints would be dramatically less efficient. It is therefore crucial to introduce distinct symbols for different pharmacophore types in the XML configuration file, and to use the same symbols both when fingerprints are generated and when they are used in dissimilarity calculations; otherwise, the interpretation (meaning) of the fingerprints could differ significantly.
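One way to turn the canonical ordering into an integer bin index is sketched below. It assumes, without confirmation from this page, that histograms are stored one after another in the order described above and that bars within a histogram are stored by increasing distance. BinIndex is a hypothetical class; the real index(int fa, int fb, int dist) may use a different layout.

```java
/** Hypothetical bin layout (an assumption, not the ChemAxon implementation). */
class BinIndex {
    private final int n;       // number of pharmacophore feature types
    private final int minDist; // minimal topological distance
    private final int maxDist; // maximal topological distance

    BinIndex(int n, int minDist, int maxDist) {
        this.n = n;
        this.minDist = minDist;
        this.maxDist = maxDist;
    }

    int bars() {
        return maxDist - minDist + 1;
    }

    /** Total number of cells: n*(n+1)/2 histograms of bars() bars each. */
    int length() {
        return n * (n + 1) / 2 * bars();
    }

    /** Position of the (fa, fb) histogram in a-a, a-d, d-d order; argument order does not matter. */
    int histogram(int fa, int fb) {
        int i = Math.min(fa, fb);
        int j = Math.max(fa, fb);
        // Rows 0..i-1 hold n + (n-1) + ... + (n-i+1) = i*n - i*(i-1)/2 histograms.
        return i * n - i * (i - 1) / 2 + (j - i);
    }

    /** Clamps dist into [minDist, maxDist], then addresses the bar inside the histogram. */
    int index(int fa, int fb, int dist) {
        int d = Math.max(minDist, Math.min(maxDist, dist));
        return histogram(fa, fb) * bars() + (d - minDist);
    }
}
```

For two features (a = 0, d = 1) and distances 1 to 10, this layout places the a-d bar at distance 3 at index 1*10 + 2 = 12, and the fingerprint has 30 cells in total.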
Operations
Three main groups of operations (methods) can be distinguished:
- Direct bin manipulation: put value in a bin, increase the value stored in a bin, retrieve the value stored in a bin.
- Conversion methods: string representations, conversion into the database format, and building fingerprints from string and database formats.
- (Dis)similarity metrics: these compare two fingerprints and calculate a distance value (dissimilarity ratio or coefficient) between them.
- Since:
- JChem 2.0
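Once a bin index is known, direct bin manipulation amounts to plain array access. The class below is a hypothetical stand-in backed by a float array (mirroring the protected fp field); it is not the ChemAxon implementation and does not model the "bin already full" behavior mentioned for inc(int fa, int fb, int dist).

```java
import java.util.Arrays;

/** Hypothetical minimal bin store illustrating put/inc/get/clear; not ChemAxon code. */
class BinStore {
    private final float[] fp; // one cell per bin

    BinStore(int cells) {
        fp = new float[cells];
    }

    /** Stores a value; the previous content of the bin is discarded. */
    void put(int bin, float value) {
        fp[bin] = value;
    }

    /** Adds one to the bin; no overflow check is performed. */
    void inc(int bin) {
        fp[bin] += 1f;
    }

    float get(int bin) {
        return fp[bin];
    }

    /** Sets every bin to zero. */
    void clear() {
        Arrays.fill(fp, 0f);
    }
}
```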
Field Summary
Fields inherited from class chemaxon.descriptors.MolecularDescriptor: params

Constructor Summary
- PharmacophoreFingerprint(): Creates a new, empty instance of PharmacophoreFingerprint without allocating internal storage.
- PharmacophoreFingerprint(PFParameters params): Creates a new instance of PharmacophoreFingerprint according to the parameters given.
- PharmacophoreFingerprint(PharmacophoreFingerprint pfp): Copy constructor.
- PharmacophoreFingerprint(String params): Creates a new instance of PharmacophoreFingerprint according to the parameters given.

Method Summary
- final void clear(): Clears the fingerprint: sets all bins to store zero value.
- clone(): Creates a new instance with identical internal state.
- protected byte[] decompress(byte[] data): Uncompresses the input byte array and stores the uncompressed array in params.data.
- void fromData(byte[] dbRepr): Builds a PharmacophoreFingerprint from an external data format, created by a previous call to toData().
- void fromFloatArray(float[] descr): Builds a molecular descriptor from its float array representation.
- final void fromString(String pfp): Builds a fingerprint from its string representation created by toString().
- String[] generate: Creates the PharmacophoreFingerprint descriptor from the given Molecule.
- final float get(int bin): Gets the content of the specified histogram bin.
- float get(int feature1, int feature2, float dist)
- final float get(int fa, int fb, int dist): Gets the histogram bar height of two features ('fa'-'fb') corresponding to the given distance 'dist'.
- final float getAsymmetricEuclidean
- final float getAsymmetricFBPA: Calculates the asymmetric FBPA convolution product based distance of the fingerprint from another one (given as parameter).
- Color[] getAtomSetColors: Determines the coloring of atoms.
- int[] getAtomSetIndexes: Gets the individual atom colors by pharmacophore point type.
- String[] getAtomSetNames
- float[] getDefaultDissimilarityMetricThresholds(): Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
- float getDissimilarity: Calculates the dissimilarity between two pharmacophore fingerprints using the default distance measure.
- float getDissimilarity(MolecularDescriptor fp2, int metricIndex): Calculates the dissimilarity between two pharmacophore fingerprints using the specified parametrized distance metric.
- String[] getDissimilarityMetrics: Gets the dissimilarity metric names.
- final float getEuclidean: Calculates the Euclidean distance.
- float getLowerBound: Calculates the lower bound estimate of the dissimilarity from the given fingerprint.
- float getMaxDist()
- float getMinDist()
- getName(): Gets the name of the PharmacophoreFingerprint object.
- int getNumberOfFeatures()
- getParametersClassName: Gets the name of the parameters class corresponding to the descriptor.
- float getResolution()
- final float getScaledTanimoto(PharmacophoreFingerprint f, PharmacophoreFingerprint hypothesis): Calculates the scaled Tanimoto metric (adapted to histograms).
- getShortName: Gets the short name of the descriptor.
- getSymbol(int feature)
- final float getSymmetricFBPA: Calculates the symmetric FBPA convolution product based distance of the fingerprint from another one (given as parameter).
- final float getTanimoto: Calculates the Tanimoto metric (adapted to histograms).
- float getTversky: Calculates the Tversky dissimilarity index.
- final float getWeightedAsymmetricEuclidean: Calculates the weighted asymmetric Euclidean distance.
- final float getWeightedEuclidean: Calculates the weighted Euclidean distance.
- final void inc(int bin): Increments the content of the specified histogram bin by one.
- final void inc(int fa, int fb, int dist): Increments the histogram corresponding to two features ('fa'-'fb') and a distance, 'dist'.
- final void inc(int fa, int fb, int dist, float[] incr): The fuzzy version of inc( int fa, int fb, int dist ).
- final void inc(int fa, int fb, int dist, int nrRotBonds): The fuzzy version of inc( int fa, int fb, int dist ).
- int index(int fa, int fb, int dist): Calculates the index of the bin specified by the arguments.
- boolean isLicensed(): Returns information about the licensing of the product.
- boolean isSubsetOf: Checks if this fingerprint is a subset of another fingerprint that is passed as method parameter.
- final void put(int bin, float newValue): Stores the given value in the specified histogram bin.
- final void put(int bin, int newValue): Stores the given value in the specified histogram bin.
- void setLicenseEnvironment: Sets the license environment.
- void setParameters(MDParameters parameters): Sets parameters, allocates internal storage if needed and clears the descriptor.
- void setParameters(String parameters): Sets the parameters of an already created PharmacophoreFingerprint object.
- byte[] toData(): Converts a PharmacophoreFingerprint object into a byte array.
- final String toDecimalString: Converts the fingerprint into a string of decimal numbers.
- float[] toFloatArray(): Creates the float array representation of a MolecularDescriptor object.
- final String toHistogramString(String sep, boolean nonZeroOnly): Creates the string representation of the fingerprint.
- final String toString(): Converts the fingerprint into a readable string.
- final String toString: Creates the string representation of the pharmacophore fingerprint.

Methods inherited from class chemaxon.descriptors.MolecularDescriptor: generate, getDefaultMetricIndex, getDefaultThreshold, getDissimilarityMetricIndex, getMetricIndex, getMetricName, getMetricName, getNumberOfMetrics, getNumberOfWeights, getParameters, getThreshold, getThreshold, needsConfig, newInstance, newInstance, newInstanceFromXML, setScreeningConfiguration, toBinaryString

Field Details

fp
protected float[] fp
Storage for the fingerprint.

Constructor Details

PharmacophoreFingerprint
public PharmacophoreFingerprint()
Creates a new, empty instance of PharmacophoreFingerprint without allocating internal storage.

PharmacophoreFingerprint
Creates a new instance of PharmacophoreFingerprint according to the parameters given.
- Parameters: params - parameters used in fingerprint generation and handling

PharmacophoreFingerprint
Creates a new instance of PharmacophoreFingerprint according to the parameters given.
- Parameters: params - parameter settings

PharmacophoreFingerprint
Copy constructor. An identical copy of the pharmacophore fingerprint passed is created; the copy and the original share the same PFParameters object.
- Parameters: pfp - fingerprint to be copied

Method Details

clone
Creates a new instance with identical internal state.
- Specified by: clone in class MolecularDescriptor
- Returns: the newly copied object

isLicensed
public boolean isLicensed()
Returns information about the licensing of the product.
- Specified by: isLicensed in interface chemaxon.license.Licensable
- Returns: true if the product is correctly licensed

setLicenseEnvironment
Sets the license environment.
- Specified by: setLicenseEnvironment in interface chemaxon.license.Licensable

getName
Gets the name of the PharmacophoreFingerprint object. This name differs from the class name: it is more readable and meaningful for end users too.
- Overrides: getName in class MolecularDescriptor
- Returns: the readable, external name for PharmacophoreFingerprint class objects

getShortName
Gets the short name of the descriptor.
- Overrides: getShortName in class MolecularDescriptor
- Returns: the short name used in text outputs (tables etc.)

getParametersClassName
Gets the name of the parameters class corresponding to the descriptor.
- Overrides: getParametersClassName in class MolecularDescriptor
- Returns: the name of the parameters class

setParameters
Sets the parameters, allocates internal storage if needed, and clears the descriptor.
- Overrides: setParameters in class MolecularDescriptor
- Parameters: parameters - fingerprint parameters
- Since: JChem 2.2

setParameters
Sets the parameters of an already created PharmacophoreFingerprint object.
- Specified by: setParameters in class MolecularDescriptor
- Parameters: parameters - parameter settings for the descriptor
- Throws: MDParametersException - on any XML error

toData
public byte[] toData()
Converts a PharmacophoreFingerprint object into a byte array. This format can be referred to as an "external representation", since it serves as the data format for storing fingerprints in databases.
Use the fromData() method to rebuild the pharmacophore fingerprint from this "external" representation.
- Specified by: toData in class MolecularDescriptor
- Returns: byte array representation of the fingerprint object

fromData
public void fromData(byte[] dbRepr)
Builds a PharmacophoreFingerprint from an external data format created by a previous call to toData().
- Specified by: fromData in class MolecularDescriptor
- Parameters: dbRepr - "external" representation of PharmacophoreFingerprint

decompress
protected byte[] decompress(byte[] data)
Uncompresses the input byte array and stores the uncompressed array in params.data. This is the reverse of compress( final byte[] ). The header (first byte) is checked, and the data is decompressed only if the first byte equals ZERO_SEQUENCE_COMPRESSION_CODE; otherwise null is returned.
- Parameters: data - compressed data

generate
Creates the PharmacophoreFingerprint descriptor from the given Molecule by calling the generator created by the corresponding MDParameters class.
- Overrides: generate in class MolecularDescriptor
- Returns: property names set in the molecule passed during generation
- Throws: MDGeneratorException - when the descriptor could not be generated

inc
public final void inc(int fa, int fb, int dist)
Increments the histogram bin corresponding to two features ('fa'-'fb') and a distance, 'dist'. For the sake of efficiency, pharmacophore features (types, properties) are not passed directly; their indices (as introduced by the PSymbols class) have to be provided instead. Distance values are normalized in this method to fall within the minimum and maximum distance range specified by the previously given parameters.
If the bin is already full, its value is not changed.
- Parameters:
  fa - feature index of one of the features
  fb - feature index of the other pharmacophore feature
  dist - distance value of the two features

inc
public final void inc(int fa, int fb, int dist, int nrRotBonds)
The fuzzy version of inc( int fa, int fb, int dist ). The contents of all bins in the (fa,fb) histogram are incremented by an appropriate value that depends on the distance, the number of rotatable bonds, and the fuzzy smoothing factor.
- Parameters:
  fa - feature index of one of the features
  fb - feature index of the other pharmacophore feature
  dist - distance value of the two features
  nrRotBonds - number of rotatable bonds on the path connecting the two pharmacophoric points

inc
public final void inc(int fa, int fb, int dist, float[] incr)
The fuzzy version of inc( int fa, int fb, int dist ). The contents of all bins in the (fa,fb) histogram are incremented by an appropriate value that depends on the user-defined fuzzy smoothing vector.
- Parameters:
  fa - feature index of one of the features
  fb - feature index of the other pharmacophore feature
  dist - distance value of the two features
  incr - distance-dependent fuzzy increments

inc
public final void inc(int bin)
Increments the content of the specified histogram bin by one. For the sake of efficiency, no overflow check is performed (in normal use no overflow should occur, since 2^32-1 is large enough for molecules having about 90000 atoms). See the class description for the exact meaning of the bin index.
- Parameters: bin - index of the bin to be incremented by one

put
public final void put(int bin, int newValue)
Stores the given value in the specified histogram bin. The previous value of the bin is discarded.
- Parameters:
  bin - index of the bin in which the value is stored
  newValue - value to be stored in the given bin

put
public final void put(int bin, float newValue)
Stores the given value in the specified histogram bin. The previous value of the bin is discarded.
- Parameters:
  bin - index of the bin in which the value is stored
  newValue - value to be stored in the given bin

get
public final float get(int fa, int fb, int dist)
Gets the histogram bar height of two features ('fa'-'fb') corresponding to the given distance 'dist'. Distance values have to be normalized before calling this method.
- Parameters:
  fa - feature index of one of the features
  fb - feature index of the other pharmacophore feature
  dist - distance value of the two features
- Returns: height (value) of the histogram bar (column) corresponding to the input arguments

get
public final float get(int bin)
Gets the content of the specified histogram bin. See the class description of PharmacophoreFingerprint for the meaning of the bin index.
- Parameters: bin - index of the bin queried
- Returns: the value stored in the specified bin

clear
public final void clear()
Clears the fingerprint: sets all bins to store zero value.

toString
Converts the fingerprint into a readable string. This is the default external text format of the pharmacophore fingerprint, which is also written into SDfiles, into the field named (tagged) PFP2D (see setPMAPTagName( String tagName )). See toHistogramString(String sep, boolean nonZeroOnly) for a detailed format description.
- Specified by: toString in class MolecularDescriptor
- Returns: string representation of the pharmacophore fingerprint

fromString
Builds a fingerprint from its string representation created by toString().
- Specified by: fromString in class MolecularDescriptor
- Parameters: pfp - pharmacophore fingerprint string
- Throws: ParseException

toString
Creates the string representation of the pharmacophore fingerprint. The output format differs from that of toString(): <feature symbol> ' ' <feature symbol> @ <distance> '=' <value> <sep> ... Note that such a text representation cannot be converted back into pharmacophore fingerprint data.
- Parameters:
  sep - separator character printed between two bins
  nonZeroOnly - if true, bins containing zero values are not printed
- Returns: the string representation of the fingerprint

toHistogramString
Creates the string representation of the fingerprint. Depending on the parameter settings, either all histograms are printed, or only those histograms in which the feature pair has at least one occurrence (that is, at least one non-zero bin).
The format is: <feature symbol> ' ' <feature symbol> '=' '|' b1 b2 ... bn '|' <separator>, where bi denotes the value stored in bin i.
- Parameters:
  sep - separator string to be printed between histograms
  nonZeroOnly - if true, only histograms containing at least one non-zero value are printed; otherwise all histograms are printed
- Returns: the string representation of the fingerprint

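The two text layouts described for toString(String sep, boolean nonZeroOnly) and toHistogramString(String sep, boolean nonZeroOnly) can be illustrated with a small renderer. This is a hypothetical sketch (HistogramText is not ChemAxon code): it takes one float array per histogram in a-a, a-d, d-d order instead of the real flat storage, prints values with String.valueOf, and puts the separator only between items; the exact number formatting and trailing-separator behavior of the real methods are not confirmed here.

```java
import java.util.StringJoiner;

/** Hypothetical renderers for the two documented text layouts; not ChemAxon code. */
class HistogramText {

    /** True if every bar in the histogram is zero. */
    static boolean allZero(float[] bars) {
        for (float b : bars) {
            if (b != 0f) {
                return false;
            }
        }
        return true;
    }

    /** toHistogramString-style layout: "<sym> <sym>=|b1 b2 ... bn|", histograms joined by sep. */
    static String histogramString(char[] symbols, float[][] histograms, String sep, boolean nonZeroOnly) {
        StringJoiner out = new StringJoiner(sep);
        int h = 0; // position of the current histogram in a-a, a-d, d-d order
        for (int i = 0; i < symbols.length; i++) {
            for (int j = i; j < symbols.length; j++, h++) {
                float[] bars = histograms[h];
                if (nonZeroOnly && allZero(bars)) {
                    continue; // histogram without any occurrence is skipped
                }
                StringJoiner values = new StringJoiner(" ", "|", "|");
                for (float b : bars) {
                    values.add(String.valueOf(b));
                }
                out.add(symbols[i] + " " + symbols[j] + "=" + values);
            }
        }
        return out.toString();
    }

    /** toString(sep, nonZeroOnly)-style layout: "<sym> <sym>@<distance>=<value>", bins joined by sep. */
    static String binString(char[] symbols, float[][] histograms, int minDist, String sep, boolean nonZeroOnly) {
        StringJoiner out = new StringJoiner(sep);
        int h = 0;
        for (int i = 0; i < symbols.length; i++) {
            for (int j = i; j < symbols.length; j++, h++) {
                for (int k = 0; k < histograms[h].length; k++) {
                    float v = histograms[h][k];
                    if (nonZeroOnly && v == 0f) {
                        continue; // zero-valued bins are omitted
                    }
                    out.add(symbols[i] + " " + symbols[j] + "@" + (minDist + k) + "=" + v);
                }
            }
        }
        return out.toString();
    }
}
```

With symbols a and d, bars {1, 0}, {0, 0} and {2, 3}, separator ";" and nonZeroOnly set, the histogram layout gives "a a=|1.0 0.0|;d d=|2.0 3.0|".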
toDecimalString
Converts the fingerprint into a string of decimal numbers. All bins are printed in an unstructured way; values are simply separated by tabs.
- Specified by: toDecimalString in class MolecularDescriptor
- Returns: decimal string representation of the fingerprint

toFloatArray
public float[] toFloatArray()
Creates the float array representation of a MolecularDescriptor object. The array contains all values of the descriptor, including zeros.
- Specified by: toFloatArray in class MolecularDescriptor
- Returns: float array of the fingerprint cells
- Since: JChem 2.0.1

fromFloatArray
public void fromFloatArray(float[] descr)
Builds a molecular descriptor from its float array representation. Typically used when a hypothesis is created.
- Specified by: fromFloatArray in class MolecularDescriptor
- Parameters: descr - descriptor represented in a float array (e.g. generated by toFloatArray())
- Since: JChem 2.0.1

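The float-array and decimal-string conversions can be pictured with the hypothetical stand-in below (FloatArrayRoundTrip is not ChemAxon code). It shows a lossless round trip through a float array that keeps every cell, zeros included, and a tab-separated dump of all cells; the number formatting of the real toDecimalString may differ.

```java
import java.util.Arrays;
import java.util.StringJoiner;

/** Hypothetical stand-in for the float-array and decimal-string conversions; not ChemAxon code. */
class FloatArrayRoundTrip {
    private float[] bins; // every cell, zeros included

    FloatArrayRoundTrip(int cells) {
        bins = new float[cells];
    }

    /** Copies all cells, including zeros, into a new array. */
    float[] toFloatArray() {
        return Arrays.copyOf(bins, bins.length);
    }

    /** Rebuilds the cells from a float array, e.g. one produced by toFloatArray(). */
    void fromFloatArray(float[] descr) {
        bins = Arrays.copyOf(descr, descr.length);
    }

    /** All cells as decimal numbers separated by tabs, with no further structure. */
    String toDecimalString() {
        StringJoiner out = new StringJoiner("\t");
        for (float b : bins) {
            out.add(String.valueOf(b));
        }
        return out.toString();
    }
}
```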
getAtomSetColors
Determines the coloring of atoms. The coloring reflects pharmacophore point types rather than element types. This method should be called after each call of setParameters(), since that may change the coloring scheme to be applied.
- Overrides: getAtomSetColors in class MolecularDescriptor
- Returns: array of colors of the different pharmacophore point types

getAtomSetNames
- Overrides: getAtomSetNames in class MolecularDescriptor

getAtomSetIndexes
Gets the individual atom colors by pharmacophore point type.
- Overrides: getAtomSetIndexes in class MolecularDescriptor
- Parameters: m - a molecule to assign pharmacophore point colors to
- Returns: array of color indexes, indexed by atom indices

getDissimilarityMetrics
Gets the dissimilarity metric names.
- Specified by: getDissimilarityMetrics in class MolecularDescriptor
- Returns: the metrics array

getDefaultDissimilarityMetricThresholds
public float[] getDefaultDissimilarityMetricThresholds()
Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
- Specified by: getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
- Returns: array of dissimilarity threshold values

getEuclidean
Calculates the Euclidean distance. The dissimilarity coefficient returned ranges from 0 to MAX_FLOAT; it is not normalized.
- Parameters: f - another fingerprint from which the distance is measured
- Returns: dissimilarity coefficient

getAsymmetricEuclidean

getWeightedEuclidean
Calculates the weighted Euclidean distance. Weights are taken from the associated PFParameters.
- Parameters: f - a fingerprint from which the distance is measured
- Returns: dissimilarity coefficient

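The plain and weighted Euclidean distances can be sketched over flat float arrays of cells. This is a hypothetical illustration (EuclideanSketch is not ChemAxon code); in particular, it assumes one weight per cell, whereas the granularity of the weights held by PFParameters is not specified on this page. As documented for getEuclidean, the result is not normalized.

```java
/** Hypothetical Euclidean distance sketches over fingerprint cells; not ChemAxon code. */
class EuclideanSketch {

    /** sqrt(sum (x_i - y_i)^2); ranges from 0 upward and is not normalized. */
    static float euclidean(float[] x, float[] y) {
        checkLengths(x.length, y.length);
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return (float) Math.sqrt(sum);
    }

    /** sqrt(sum w_i (x_i - y_i)^2); assumes one (hypothetical) weight per cell. */
    static float weightedEuclidean(float[] x, float[] y, float[] w) {
        checkLengths(x.length, y.length);
        checkLengths(x.length, w.length);
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += w[i] * diff * diff;
        }
        return (float) Math.sqrt(sum);
    }

    private static void checkLengths(int a, int b) {
        if (a != b) {
            throw new IllegalArgumentException("arrays must have the same number of cells");
        }
    }
}
```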
getWeightedAsymmetricEuclidean
Calculates the weighted asymmetric Euclidean distance. Weights and the asymmetry ratio are taken from the associated PFParameters.
- Parameters: f - a fingerprint from which the distance is measured
- Returns: dissimilarity coefficient

getSymmetricFBPA
Calculates the symmetric FBPA convolution product based distance of the fingerprint from another one (given as parameter).
- Parameters: f - the distance of this fingerprint is taken from f
- Returns: Euclidean distance (dissimilarity measure)

getAsymmetricFBPA
Calculates the asymmetric FBPA convolution product based distance of the fingerprint from another one (given as parameter).
- Parameters: f - the reference fingerprint (denoted by M)
- Returns: the Euclidean distance (dissimilarity measure)

getTanimoto
Calculates the Tanimoto metric (adapted to histograms).
- Parameters: f - the fingerprint from which the distance is calculated
- Returns: the Tanimoto distance (dissimilarity measure)

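This page does not spell out how the Tanimoto metric is adapted to histograms. As an assumed stand-in, the sketch below uses the common continuous (real-valued) Tanimoto form, distance = 1 - x.y / (x.x + y.y - x.y), which reduces to the usual bit-string Tanimoto distance for 0/1 cells. TanimotoSketch is a hypothetical name, and its result may differ from getTanimoto.

```java
/** Continuous Tanimoto distance; an assumed stand-in, not necessarily ChemAxon's adaptation. */
class TanimotoSketch {

    static float tanimotoDistance(float[] x, float[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("fingerprints must have the same number of cells");
        }
        double xy = 0, xx = 0, yy = 0;
        for (int i = 0; i < x.length; i++) {
            xy += (double) x[i] * y[i];
            xx += (double) x[i] * x[i];
            yy += (double) y[i] * y[i];
        }
        double denominator = xx + yy - xy; // zero only when both fingerprints are all zero
        if (denominator == 0) {
            return 0f; // two empty fingerprints are treated as identical
        }
        return (float) (1.0 - xy / denominator);
    }
}
```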
getTversky
Calculates the Tversky index as a DISSIMILARITY (not a similarity) value.
- Parameters: f - the fingerprint from which the distance is calculated
- Returns: the Tversky dissimilarity index as float

getScaledTanimoto
public final float getScaledTanimoto(PharmacophoreFingerprint f, PharmacophoreFingerprint hypothesis)
Calculates the scaled Tanimoto metric (adapted to histograms).
- Parameters: f - the fingerprint from which the distance is measured
- Returns: the Tanimoto distance (dissimilarity measure)

index
public int index(int fa, int fb, int dist)
Calculates the index of the bin specified by the arguments.
- Parameters:
  fa - index of the first pharmacophore point type
  fb - index of the second (other) pharmacophore point type
  dist - distance of the pharmacophore points
- Returns: index of the specified bin

getDissimilarity
Calculates the dissimilarity between two pharmacophore fingerprints using the default distance measure.
- Specified by: getDissimilarity in class MolecularDescriptor
- Parameters: fp2 - the other pharmacophore fingerprint
- Returns: dissimilarity ratio

getDissimilarity
Calculates the dissimilarity between two pharmacophore fingerprints using the specified parametrized distance metric.
- Specified by: getDissimilarity in class MolecularDescriptor
- Parameters:
  fp2 - the pharmacophore fingerprint from which the distance is measured
  metricIndex - index of the parametrized metric to be used
- Returns: the dissimilarity ratio

getLowerBound
Calculates a lower-bound estimate of the dissimilarity from the given fingerprint. This method is required by Diffable; see the remarks at getDissimilarity( final Object fp2 ) and getDistance.
- Overrides: getLowerBound in class MolecularDescriptor
- Parameters: fp2 - pharmacophore fingerprint from which the distance is measured
- Returns: estimate of the minimum distance

isSubsetOf
Checks whether this fingerprint is a subset of another fingerprint passed as a method parameter. A histogram (fingerprint) is considered a subset of another if none of its bars is higher than the corresponding bar of the other.
- Parameters: d - a descriptor which is supposed to be a superset
- Returns: true if this descriptor is a subset of the parameter

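The subset rule stated above (no bar higher than the corresponding bar of the other fingerprint) is a cell-by-cell comparison. A minimal sketch over flat float arrays follows; SubsetSketch is a hypothetical name, not the ChemAxon implementation.

```java
/** Hypothetical cell-wise subset check; not ChemAxon code. */
class SubsetSketch {

    /** True if no cell of candidate is higher than the corresponding cell of superset. */
    static boolean isSubsetOf(float[] candidate, float[] superset) {
        if (candidate.length != superset.length) {
            throw new IllegalArgumentException("fingerprints must have the same number of cells");
        }
        for (int i = 0; i < candidate.length; i++) {
            if (candidate[i] > superset[i]) {
                return false; // one taller bar is enough to break the subset relation
            }
        }
        return true;
    }
}
```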
getMaxDist
public float getMaxDist()

getMinDist
public float getMinDist()

getResolution
public float getResolution()

getNumberOfFeatures
public int getNumberOfFeatures()

getSymbol

get
public float get(int feature1, int feature2, float dist)

getAliasNames