Class PharmacophoreFingerprint

java.lang.Object
chemaxon.descriptors.MolecularDescriptor
chemaxon.descriptors.PharmacophoreFingerprint
All Implemented Interfaces:
chemaxon.license.Licensable, Cloneable

@PublicAPI public class PharmacophoreFingerprint extends MolecularDescriptor implements chemaxon.license.Licensable
The PharmacophoreFingerprint class implements 2D pharmacophoric fingerprints. Such fingerprints (which are chemical descriptors) are constructed from sequences of histograms, each of which has the same number of bars. (Each bar represents a descriptor cell.) The number of histograms is determined by the number of pharmacophore types (also often referred to as features or properties). If the number of distinct pharmacophore features (for instance H-donor, H-acceptor, charge etc.) is n, then the number of histograms is n*(n+1)/2.
Pharmacophoric point types can be customized by the user of the software, and are specified in an external configuration file, see user documentation for details.
The total number of bars (or bins) in one histogram (that is, the number of cells in the descriptor) is determined by two distance values: the minimal and maximal distances of pharmacophoric point pairs (atom pairs). Since fingerprints handled by this class are two-dimensional, distances are considered as topological distances (that is, the distance of two atoms in the same molecule is equal to the number of edges in the shortest path connecting the two nodes corresponding to the two atoms in the chemical graph of the molecule). (This implies that chemical graphs should be connected.) Atom pairs closer to each other than minimal distance are regarded as being minimal distance apart (and similarly for distance greater than the maximal distance).
Thus the number of bars in one histogram is equal to: maximal distance - minimal distance + 1.
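The topological distance and the clamping into the [minimal, maximal] range described above can be sketched as follows. This is a minimal illustration and not the class's actual implementation: the class name, the method names and the adjacency-list input are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Illustrative helper only; not part of the ChemAxon API. */
final class TopologicalDistanceSketch {

    /**
     * Breadth-first search over an adjacency list. Returns the number of
     * bonds on the shortest path between two atoms, or -1 if the atoms are
     * not connected.
     */
    static int topologicalDistance(int[][] adjacency, int from, int to) {
        int[] dist = new int[adjacency.length];
        Arrays.fill(dist, -1);          // -1 marks an unvisited atom
        dist[from] = 0;
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(from);
        while (!queue.isEmpty()) {
            int atom = queue.remove();
            if (atom == to) {
                return dist[atom];
            }
            for (int neighbour : adjacency[atom]) {
                if (dist[neighbour] < 0) {
                    dist[neighbour] = dist[atom] + 1;
                    queue.add(neighbour);
                }
            }
        }
        return -1;
    }

    /** Clamps a distance into [minDist, maxDist] and maps it to a bar offset. */
    static int barOffset(int distance, int minDist, int maxDist) {
        int clamped = Math.max(minDist, Math.min(maxDist, distance));
        return clamped - minDist;
    }
}
```

For a linear chain of five atoms, the two end atoms are four bonds apart; with a minimal distance of 1 and a maximal distance of 3 that pair would fall into the last bar (offset 2).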
The above described three configuration parameters (minimal and maximal distance, and the number of pharmacophore types) have substantial influence on the size of the pharmacophoric fingerprints. When this class is instantiated these params have to be provided in a PFParameters object.
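The effect of these three parameters on fingerprint size can be made concrete with a short calculation. The sketch below only restates the two formulas given in this description; the class and method names are hypothetical and do not belong to PFParameters or to this class.

```java
/** Illustrative size arithmetic only; not part of the ChemAxon API. */
final class FingerprintSizeSketch {

    /** Number of histograms for n feature types: n*(n+1)/2. */
    static int histogramCount(int featureCount) {
        return featureCount * (featureCount + 1) / 2;
    }

    /** Number of bars in one histogram: maximal distance - minimal distance + 1. */
    static int barsPerHistogram(int minDist, int maxDist) {
        return maxDist - minDist + 1;
    }

    /** Total number of cells in the fingerprint. */
    static int totalBins(int featureCount, int minDist, int maxDist) {
        return histogramCount(featureCount) * barsPerHistogram(minDist, maxDist);
    }
}
```

For example, six feature types with distances from 1 to 10 give 21 histograms of 10 bars each, that is, 210 cells.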
Besides fingerprint size, two further circumstances determine the internal logical structure of fingerprints: the order of the histograms in the fingerprint, and the order of histogram bars in one histogram. Histograms are ordered by pharmacophore type symbols, that is, if H-bond acceptor is denoted by a, and H-donor property by d (and there are no more features specified), then the order of histograms is: a-a, a-d, d-d (and according to the above introduced formula, the number of histograms is 2*(2+1)/2 = 3). Histogram bars are ordered from left to right by distance value (from minimal to maximal distance).
This fingerprint structure results in a unique (well-defined, unambiguous) representation that enables the canonical numbering (indexing) of individual bins. This is vital in accessing cells efficiently. Otherwise, if only symbolic keys (in contrast to integer index numbers) could be used (for example ('a','d',3) ) a dramatic loss of efficiency in retrieving information from fingerprints would be experienced. Therefore it is crucial to introduce distinct symbols for different pharmacophore types in the XML configuration file and also to use the same symbols when fingerprints are generated and when they are used in dissimilarity calculations. Otherwise, the interpretation (meaning) of the fingerprints could be significantly different.
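One way to derive such a canonical integer index from a (feature, feature, distance) triple is sketched below. It assumes that feature indices follow the symbol order, that histograms are laid out in the order a-a, a-d, d-d as described above, and that bars within a histogram run from minimal to maximal distance. The class and method names are hypothetical; the actual index(int, int, int) method may compute the value differently.

```java
/** Illustrative bin indexing only; not part of the ChemAxon API. */
final class BinIndexSketch {

    /**
     * Position of the (fa, fb) histogram in the upper-triangular ordering
     * (0,0), (0,1), ..., (0,n-1), (1,1), ..., (n-1,n-1).
     */
    static int histogramIndex(int fa, int fb, int featureCount) {
        int lo = Math.min(fa, fb);   // the pair is unordered
        int hi = Math.max(fa, fb);
        return lo * featureCount - lo * (lo - 1) / 2 + (hi - lo);
    }

    /** Canonical index of the bin for a feature pair and a distance. */
    static int binIndex(int fa, int fb, int dist,
                        int featureCount, int minDist, int maxDist) {
        int bars = maxDist - minDist + 1;
        int clamped = Math.max(minDist, Math.min(maxDist, dist));
        return histogramIndex(fa, fb, featureCount) * bars + (clamped - minDist);
    }
}
```

With two features (a = 0, d = 1) and distances from 1 to 4, the symbolic key ('a','d',3) maps to bin 6, and the same bin is reached when the two features are given in the opposite order.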

Operations

Three main groups of operations (methods) can be distinguished:

  • Direct bin manipulation: put value in a bin, increase the value stored in a bin, retrieve the value stored in a bin.
  • Conversion methods: string representations, extracting into database format and building up from string and database formats.
  • (Dis)similarity metrics: these compare two fingerprints and calculate a distance value (dissimilarity ratio or coefficient) between them.
Since:
JChem 2.0
  • Field Details

    • fp

      protected float[] fp
      storage for the fingerprint
  • Constructor Details

    • PharmacophoreFingerprint

      public PharmacophoreFingerprint()
      Creates a new, empty instance of PharmacophoreFingerprint without allocating internal storage.
    • PharmacophoreFingerprint

      public PharmacophoreFingerprint(PFParameters params)
      Creates a new instance of PharmacophoreFingerprint according to the parameters given.
      Parameters:
      params - parameters used in fingerprint generation and handling
    • PharmacophoreFingerprint

      public PharmacophoreFingerprint(String params)
      Creates a new instance of PharmacophoreFingerprint according to the parameters given.
      Parameters:
      params - parameter settings
    • PharmacophoreFingerprint

      public PharmacophoreFingerprint(PharmacophoreFingerprint pfp)
      Copy constructor. Creates an identical copy of the pharmacophore fingerprint passed; the copy and the original share the same PFParameters object.
      Parameters:
      pfp - fingerprint to be copied
  • Method Details

    • clone

      public PharmacophoreFingerprint clone()
      Creates a new instance with identical internal state.
      Specified by:
      clone in class MolecularDescriptor
      Returns:
      the newly copied object
    • isLicensed

      public boolean isLicensed()
      Returns information about the licensing of the product.
      Specified by:
      isLicensed in interface chemaxon.license.Licensable
      Returns:
      true if the product is correctly licensed
    • setLicenseEnvironment

      public void setLicenseEnvironment(String env)
      Sets the license environment.
      Specified by:
      setLicenseEnvironment in interface chemaxon.license.Licensable
    • getName

      public String getName()
      Gets the name of the PharmacophoreFingerprint object. The name is not the same as the class name: it is more readable and more meaningful for end users.
      Overrides:
      getName in class MolecularDescriptor
      Returns:
      the nice, external name for PharmacophoreFingerprint class objects
    • getShortName

      public String getShortName()
      Gets the short name of the descriptor.
      Overrides:
      getShortName in class MolecularDescriptor
      Returns:
      the short name used in text outputs (tables etc.)
    • getParametersClassName

      public String getParametersClassName()
      Gets the name of the parameters class corresponding to the descriptor.
      Overrides:
      getParametersClassName in class MolecularDescriptor
      Returns:
      the name of the parameters class
    • setParameters

      public void setParameters(MDParameters parameters)
      Sets parameters, allocates internal storage if needed and cleans the descriptor.
      Overrides:
      setParameters in class MolecularDescriptor
      Parameters:
      parameters - fingerprint parameters
      Since:
      JChem 2.2
    • setParameters

      public void setParameters(String parameters) throws MDParametersException
      Sets the parameters of an already created PharmacophoreFingerprint object.
      Specified by:
      setParameters in class MolecularDescriptor
      Parameters:
      parameters - parameter settings for the descriptor
      Throws:
      MDParametersException - any XML error
    • toData

      public byte[] toData()
      Converts a PharmacophoreFingerprint object into a byte array. This format can be referred to as an "external representation" since it serves as the data format for storing fingerprints in databases.
      Use the fromData() method to build the pharmacophore fingerprint from this "external" representation.
      Specified by:
      toData in class MolecularDescriptor
      Returns:
      byte array representation of the fingerprint object
    • fromData

      public void fromData(byte[] dbRepr)
      Builds a PharmacophoreFingerprint from an external data format, created by a previous call to toData().
      Specified by:
      fromData in class MolecularDescriptor
      Parameters:
      dbRepr - "external" representation of PharmacophoreFingerprint
    • decompress

      protected byte[] decompress(byte[] data)
      Uncompresses input byte array and stores the uncompressed array in params.data. This is the reverse of compress( final byte[] ). Checks header (first byte) and decompresses only if the value of the first byte is ZERO_SEQUENCE_COMPRESSION_CODE. Otherwise null is returned.
      Parameters:
      data - compressed data
    • generate

      public String[] generate(Molecule m) throws MDGeneratorException
      Creates the PharmacophoreFingerprint descriptor from the given Molecule. Calls the generator created by the corresponding MDParameters class.
      Overrides:
      generate in class MolecularDescriptor
      Returns:
      property names set in the molecule passed during generation
      Throws:
      MDGeneratorException - when failed to generate descriptor
    • inc

      public final void inc(int fa, int fb, int dist)
      Increments the histogram corresponding to two features ('fa'-'fb') and a distance, 'dist'. Pharmacophore features (types, properties) are not used directly, but instead their indices (as introduced by PSymbols class) have to be provided for the sake of efficiency. Distance values are normalized in this method to fall within the minimum and maximum distance range, as specified by the previously given parameters.
      If the bin is already full its value is not changed.
      Parameters:
      fa - feature index of one of the features
      fb - feature index of the other pharmacophore feature
      dist - distance value of the two features
    • inc

      public final void inc(int fa, int fb, int dist, int nrRotBonds)
      The fuzzy version of inc( int fa, int fb, int dist ). The contents of all bins in the (fa,fb) histogram are incremented by the appropriate value, depending on the distance, the number of rotatable bonds, and the fuzzy smoothing factor.
      Parameters:
      fa - feature index of one of the features
      fb - feature index of the other pharmacophore feature
      dist - distance value of the two features
      nrRotBonds - number of rotatable bonds on the path connecting the two pharmacophoric points
    • inc

      public final void inc(int fa, int fb, int dist, float[] incr)
      The fuzzy version of inc( int fa, int fb, int dist ). The contents of all bins in the (fa,fb) histogram are incremented by the appropriate value, depending on the user-defined fuzzy smoothing vector.
      Parameters:
      fa - feature index of one of the features
      fb - feature index of the other pharmacophore feature
      dist - distance value of the two features
      incr - distance-dependent fuzzy increments
    • inc

      public final void inc(int bin)
      Increments the content of the specified histogram bin by one. No overflow check is performed for the sake of efficiency (in normal use no overflow should occur, since 2^32-1 is large enough for molecules having about 90000 atoms). See the class description for the exact meaning of the bin index.
      Parameters:
      bin - index of the bin to be incremented by one
    • put

      public final void put(int bin, int newValue)
      Stores the given value in the specified histogram bin. The previous value of the bin is discarded.
      Parameters:
      bin - index of the bin in which the value is stored
      newValue - value to be stored in the given bin
    • put

      public final void put(int bin, float newValue)
      Stores the given value in the specified histogram bin. The previous value of the bin is discarded.
      Parameters:
      bin - index of the bin in which the value is stored
      newValue - value to be stored in the given bin
    • get

      public final float get(int fa, int fb, int dist)
      Gets the histogram bar height of two features ('fa'-'fb') corresponding to the given distance 'dist'. Distance values must be normalized before calling this method!
      Parameters:
      fa - feature index of one of the features
      fb - feature index of the other pharmacophore feature
      dist - distance value of the two features
      Returns:
      height (value) of the histogram bar (column) corresponding to the input arguments
    • get

      public final float get(int bin)
      Gets the content of the specified histogram bin. See the description of PharmacophoreFingerprint class for the meaning of the bin index.
      Parameters:
      bin - index of the bin queried
      Returns:
      the value stored in the specified bin
    • clear

      public final void clear()
      Clears the fingerprint: sets all bins to store zero value.
    • toString

      public final String toString()
      Converts the fingerprint into a readable string. This is the default external text format of the pharmacophore fingerprint, also written into SDfile into the field named (tagged) PFP2D (see setPMAPTagName( String tagName )). See toHistogramString(String sep, boolean nonZeroOnly) for detailed format description.
      Specified by:
      toString in class MolecularDescriptor
      Returns:
      string representation of the pharmacophore fingerprint
    • fromString

      public final void fromString(String pfp) throws ParseException
      Builds a fingerprint from its string representation created by toString().
      Specified by:
      fromString in class MolecularDescriptor
      Parameters:
      pfp - pharmacophore fingerprint string
      Throws:
      ParseException
    • toString

      public final String toString(String sep, boolean nonZeroOnly)
      Creates the string representation of the pharmacophore fingerprint. The output format is different than in toString: <feature symbol> ' ' <feature symbol> @ <distance> '=' <value> <sep> ... . Note, that such text representation cannot be converted into pharmacophore fingerprint data.
      Parameters:
      sep - separator character printed between two bins
      nonZeroOnly - bins containing zero values are not printed
      Returns:
      the string representation of the fingerprint
    • toHistogramString

      public final String toHistogramString(String sep, boolean nonZeroOnly)
      Creates the string representation of the fingerprint. Depending on the parameter settings, either all bins are printed, or only the bins of those histograms in which at least one feature pair has at least one occurrence (that is, at least one non-zero valued bin).
      The format is: <feature symbol> ' ' <feature symbol> '=' '|' b1 b2 ... bn '|' <separator>, where bi denotes the value stored in bin i.
      Parameters:
      sep - separator string to be printed between histograms
      nonZeroOnly - if true, only histograms containing at least one non-zero value are printed; otherwise all histograms are printed
      Returns:
      the string representation of the fingerprint
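The histogram format described for toHistogramString can be illustrated with a small standalone formatter. This is a sketch under assumptions: the class name, the flat float[] input, the single space between bin values and the default Java float formatting are choices made for the example, and the real method may format numbers and whitespace differently.

```java
/** Illustrative formatter only; not part of the ChemAxon API. */
final class HistogramStringSketch {

    /**
     * Prints one "<symbol> <symbol>=|b1 b2 ... bn|<sep>" group per histogram.
     * Histograms are assumed to be stored consecutively in 'bins', in the
     * order (0,0), (0,1), ..., (n-1,n-1) of the feature symbols.
     */
    static String toHistogramString(String[] symbols, float[] bins,
                                    int barsPerHistogram, String sep,
                                    boolean nonZeroOnly) {
        StringBuilder out = new StringBuilder();
        int histogram = 0;
        for (int i = 0; i < symbols.length; i++) {
            for (int j = i; j < symbols.length; j++, histogram++) {
                int start = histogram * barsPerHistogram;
                boolean hasNonZero = false;
                for (int k = 0; k < barsPerHistogram; k++) {
                    if (bins[start + k] != 0.0f) {
                        hasNonZero = true;
                        break;
                    }
                }
                if (nonZeroOnly && !hasNonZero) {
                    continue;   // skip histograms with no occurrences
                }
                out.append(symbols[i]).append(' ').append(symbols[j]).append("=|");
                for (int k = 0; k < barsPerHistogram; k++) {
                    if (k > 0) {
                        out.append(' ');
                    }
                    out.append(bins[start + k]);
                }
                out.append('|').append(sep);
            }
        }
        return out.toString();
    }
}
```

With symbols a and d, two bars per histogram, and only the a-a and d-d histograms populated, setting nonZeroOnly to true omits the a-d histogram from the output.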
    • toDecimalString

      public final String toDecimalString()
      Converts the fingerprint into a string of decimal numbers. All bins are printed in an unstructured way; values are simply separated by tabs.
      Specified by:
      toDecimalString in class MolecularDescriptor
      Returns:
      decimal string representation of the fingerprint
    • toFloatArray

      public float[] toFloatArray()
      Creates the float array representation of a MolecularDescriptor object. This array contains all values of the descriptor (including all zeros) in the elements of the array.
      Specified by:
      toFloatArray in class MolecularDescriptor
      Returns:
      float array of the fingerprint cells
      Since:
      JChem 2.0.1
    • fromFloatArray

      public void fromFloatArray(float[] descr)
      Builds a molecular descriptor from its float array representation. Typically used when a hypothesis is created.
      Specified by:
      fromFloatArray in class MolecularDescriptor
      Parameters:
      descr - descriptor represented in a float array (e.g. generated by toFloatArray())
      Since:
      JChem 2.0.1
    • getAtomSetColors

      public Color[] getAtomSetColors()
      Determines the coloring of atoms. This coloring reflects pharmacophore point types rather than element types. This method should be called after each call of setParameters(), as that may change the coloring scheme to be applied.
      Overrides:
      getAtomSetColors in class MolecularDescriptor
      Returns:
      array of colors of different pharmacophore point types
    • getAtomSetNames

      public String[] getAtomSetNames()
      Overrides:
      getAtomSetNames in class MolecularDescriptor
    • getAtomSetIndexes

      public int[] getAtomSetIndexes(Molecule m)
      Gets the individual atom colors by pharmacophore point type.
      Overrides:
      getAtomSetIndexes in class MolecularDescriptor
      Parameters:
      m - a molecule to assign pharmacophore point colors to
      Returns:
      array of color indexes indexed by atom indexes
    • getDissimilarityMetrics

      public String[] getDissimilarityMetrics()
      Gets the dissimilarity metric names.
      Specified by:
      getDissimilarityMetrics in class MolecularDescriptor
      Returns:
      the metrics array
    • getDefaultDissimilarityMetricThresholds

      public float[] getDefaultDissimilarityMetricThresholds()
      Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
      Specified by:
      getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
      Returns:
      array of dissimilarity threshold values
    • getEuclidean

      public final float getEuclidean(PharmacophoreFingerprint f)
      Calculates the Euclidean distance. The dissimilarity coefficient returned ranges from 0 to MAX_FLOAT, this coefficient is not normalized.
      Parameters:
      f - another fingerprint from which the distance is measured
      Returns:
      dissimilarity coefficient
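As an illustration of the unnormalized Euclidean distance over histogram bins, the sketch below works on plain float arrays such as those returned by toFloatArray(). The class and method names are hypothetical, and the code is not taken from the library.

```java
/** Illustrative metric only; not part of the ChemAxon API. */
final class EuclideanSketch {

    /** Square root of the sum of squared bin differences. */
    static float euclidean(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("fingerprints differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return (float) Math.sqrt(sum);
    }
}
```

Because the value is not normalized, it grows with both the number of bins and the bar heights, so thresholds are only comparable between fingerprints generated with the same parameters.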
    • getAsymmetricEuclidean

      public final float getAsymmetricEuclidean(PharmacophoreFingerprint f)
    • getWeightedEuclidean

      public final float getWeightedEuclidean(PharmacophoreFingerprint f)
      Calculates the weighted Euclidean distance. Weights are taken from the associated PFParameters.
      Parameters:
      f - a fingerprint from which the distance is measured
      Returns:
      dissimilarity coefficient
    • getWeightedAsymmetricEuclidean

      public final float getWeightedAsymmetricEuclidean(PharmacophoreFingerprint f)
      Calculates the weighted asymmetric Euclidean distance. Weights and asymmetry ratio are taken from the associated PFParameters.
      Parameters:
      f - a fingerprint from which the distance is measured
      Returns:
      dissimilarity coefficient
    • getSymmetricFBPA

      public final float getSymmetricFBPA(PharmacophoreFingerprint f)
      Calculates the symmetric FBPA convolution product based distance of the fingerprint from another one (given as parameter).
      Parameters:
      f - distance of this is taken from f
      Returns:
      Euclidean distance (dissimilarity measure)
    • getAsymmetricFBPA

      public final float getAsymmetricFBPA(PharmacophoreFingerprint f)
      Calculates the asymmetric FBPA convolution product based distance of the fingerprint from another one (given as parameter).
      Parameters:
      f - the reference fingerprint (denoted by M)
      Returns:
      the Euclidean distance (dissimilarity measure)
    • getTanimoto

      public final float getTanimoto(PharmacophoreFingerprint f)
      Calculates the Tanimoto metric (adapted to histograms).
      Parameters:
      f - the distance from f is calculated
      Returns:
      the Tanimoto distance (dissimilarity measure)
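This page does not spell out how the Tanimoto metric is adapted to histograms. One common generalization to count vectors replaces the bit intersection and union with bin-wise minimum and maximum; the sketch below implements that variant as an assumption, with hypothetical names, and may differ from what getTanimoto actually computes.

```java
/** Illustrative metric only; not part of the ChemAxon API. */
final class HistogramTanimotoSketch {

    /**
     * Dissimilarity 1 - sum(min(a_i, b_i)) / sum(max(a_i, b_i)).
     * Two all-zero fingerprints are treated as identical (result 0).
     */
    static float tanimotoDissimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("fingerprints differ in length");
        }
        double minSum = 0.0;
        double maxSum = 0.0;
        for (int i = 0; i < a.length; i++) {
            minSum += Math.min(a[i], b[i]);
            maxSum += Math.max(a[i], b[i]);
        }
        if (maxSum == 0.0) {
            return 0.0f;
        }
        return (float) (1.0 - minSum / maxSum);
    }
}
```

For non-negative bins the result lies between 0 (identical histograms) and 1 (no overlapping bars).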
    • getTversky

      public float getTversky(PharmacophoreFingerprint f)
      Calculates the Tversky dissimilarity index. Note that this is a dissimilarity, not a similarity, value.
      Parameters:
      f - the distance from f is calculated
      Returns:
      the Tversky dissimilarity index as float
    • getScaledTanimoto

      public final float getScaledTanimoto(PharmacophoreFingerprint f, PharmacophoreFingerprint hypothesis)
      Calculates the scaled Tanimoto metric (adapted to histograms).
      Parameters:
      f - the distance is measured from f
      Returns:
      the Tanimoto distance (dissimilarity measure)
    • index

      public int index(int fa, int fb, int dist)
      Calculates the index of the bin specified by the arguments.
      Parameters:
      fa - index of the first pharmacophore point type
      fb - index of the second (other) pharmacophore point type
      dist - distance of the pharmacophore points
      Returns:
      index of the specified bin
    • getDissimilarity

      public float getDissimilarity(MolecularDescriptor fp2)
      Calculates the dissimilarity between two pharmacophore fingerprints using the default distance measure.
      Specified by:
      getDissimilarity in class MolecularDescriptor
      Parameters:
      fp2 - the other pharmacophore fingerprint
      Returns:
      dissimilarity ratio
    • getDissimilarity

      public float getDissimilarity(MolecularDescriptor fp2, int metricIndex)
      Calculates the dissimilarity between two pharmacophore fingerprints using the specified parametrized distance metric.
      Specified by:
      getDissimilarity in class MolecularDescriptor
      Parameters:
      fp2 - the pharmacophore fingerprint from which the distance is measured
      metricIndex - index of the parametrized metric to be used
      Returns:
      the dissimilarity ratio
    • getLowerBound

      public float getLowerBound(MolecularDescriptor fp2)
      Calculates the lower bound estimate of the dissimilarity from the given fingerprint. This method is required by Diffable; see the remarks at getDissimilarity( final Object fp2 ) for further explanation. In the case of PharmacophoreFingerprint, a good estimate for the minimum distance cannot be obtained efficiently (that is, significantly faster than calculating the proper distance), therefore 0 is returned. This trivial distance bound estimation will lead to calling getDistance.
      Overrides:
      getLowerBound in class MolecularDescriptor
      Parameters:
      fp2 - pharmacophore fingerprint from which distance is measured
      Returns:
      estimate of the minimum distance
    • isSubsetOf

      public boolean isSubsetOf(PharmacophoreFingerprint d)
      Checks if this fingerprint is a subset of another fingerprint that is passed as method parameter. A histogram (fingerprint) is considered to be a subset of another, if none of its bars is higher than that of the other's.
      Parameters:
      d - a descriptor which is supposed to be a superset
      Returns:
      true if this descriptor is a subset of the parameter
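The subset relation defined above reduces to a bin-wise comparison. A minimal sketch over plain float arrays follows; the class and method names are hypothetical and the code is not taken from the library.

```java
/** Illustrative check only; not part of the ChemAxon API. */
final class SubsetSketch {

    /** True if no bar of 'candidate' is higher than the matching bar of 'other'. */
    static boolean isSubset(float[] candidate, float[] other) {
        if (candidate.length != other.length) {
            throw new IllegalArgumentException("fingerprints differ in length");
        }
        for (int i = 0; i < candidate.length; i++) {
            if (candidate[i] > other[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Under this definition every fingerprint is a subset of itself, and an all-zero fingerprint is a subset of any fingerprint with non-negative bins.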
    • getMaxDist

      public float getMaxDist()
    • getMinDist

      public float getMinDist()
    • getResolution

      public float getResolution()
    • getNumberOfFeatures

      public int getNumberOfFeatures()
    • getSymbol

      public String getSymbol(int feature)
    • get

      public float get(int feature1, int feature2, float dist)
    • getAliasNames

      public List<String> getAliasNames()