Class ECFP

All Implemented Interfaces:
chemaxon.license.Licensable, Cloneable

@PublicAPI public class ECFP extends MolecularDescriptor implements chemaxon.license.Licensable
The ECFP class implements Extended-Connectivity Fingerprints (ECFPs) as a type of MolecularDescriptor. ECFPs are circular topological fingerprints designed for molecular characterization, similarity searching, and structure-activity modeling.

They are among the most popular similarity search tools in drug discovery and are used effectively in a wide variety of applications.

The main properties of ECFPs are the following.

  • They represent molecular structures by means of circular atom neighborhoods.
  • They can be very rapidly calculated.
  • Their features represent the presence of particular substructures.
  • They are not predefined and can represent a huge number of different molecular features (including stereochemical information).
  • They are designed to represent both the presence and the absence of functionality, since both are crucial for analyzing molecular activity.
  • Their generation method can be flexibly customized to produce various types of circular fingerprints for diverse applications.

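The circular-neighborhood idea described above can be illustrated with a simplified, Morgan-style sketch. It is not ChemAxon's implementation: the class name `CircularIdSketch` is hypothetical, atoms start from their atomic numbers instead of the richer atom invariants ECFP normally uses, bond types are ignored, no duplicate-neighborhood removal is done, and `Arrays.hashCode` stands in for the real hash function. Each round combines an atom's current identifier with the sorted identifiers of its neighbors, so the set of identifiers grows with the neighborhood radius and does not depend on atom numbering.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified sketch of circular (Morgan/ECFP-style) identifier generation. */
final class CircularIdSketch {

    /**
     * @param atomInvariants initial invariant of each atom (here: just the atomic number)
     * @param neighbors      neighbors[i] lists the indices of atoms bonded to atom i
     * @param iterations     number of update rounds (radius of the largest neighborhood)
     * @return all distinct identifiers produced in rounds 0..iterations
     */
    static Set<Integer> identifiers(int[] atomInvariants, int[][] neighbors, int iterations) {
        int n = atomInvariants.length;
        int[] current = new int[n];
        Set<Integer> ids = new TreeSet<>();

        // Round 0: each atom's identifier depends only on its own invariant.
        for (int i = 0; i < n; i++) {
            current[i] = Arrays.hashCode(new int[] {0, atomInvariants[i]});
            ids.add(current[i]);
        }

        // Rounds 1..iterations: combine each atom's identifier with those of its direct neighbors.
        for (int round = 1; round <= iterations; round++) {
            int[] next = new int[n];
            for (int i = 0; i < n; i++) {
                int[] data = new int[neighbors[i].length + 2];
                data[0] = round;
                data[1] = current[i];
                for (int k = 0; k < neighbors[i].length; k++) {
                    data[k + 2] = current[neighbors[i][k]];
                }
                // Sorting the neighbor part makes the result independent of atom numbering.
                Arrays.sort(data, 2, data.length);
                next[i] = Arrays.hashCode(data);
                ids.add(next[i]);
            }
            current = next;
        }
        return ids;
    }

    public static void main(String[] args) {
        // Ethanol heavy atoms: C0-C1-O2 (hydrogens and bond orders ignored).
        int[] atoms = {6, 6, 8};
        int[][] bonds = {{1}, {0, 2}, {1}};
        System.out.println(identifiers(atoms, bonds, 1));
    }
}
```

With one round, the three heavy atoms of ethanol yield five distinct identifiers: two from the initial atom types (both carbons share one) and three from the radius-1 neighborhoods.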
For more information, see the user's guide.

Since:
JChem 5.4
  • Field Details

    • ids

      protected int[] ids
      Identifier list storage of the fingerprint
    • fp

      protected int[] fp
      Binary vector storage of the fingerprint
    • brightness

      protected int brightness
      The number of bits set in the binary vector storage
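The relationship between the `ids` identifier list, the folded `fp` binary vector and `brightness` can be sketched as follows. This assumes that folding maps each identifier to a bit position with `Math.floorMod(id, length)`; ChemAxon's actual folding scheme and vector length may differ, and `FoldSketch` is a hypothetical helper. Because distinct identifiers can fold onto the same bit, the brightness can be smaller than the number of identifiers.

```java
import java.util.BitSet;

/** Hypothetical helper illustrating identifier folding; not the ECFP implementation. */
final class FoldSketch {

    /** Maps every identifier to bit floorMod(id, length); colliding identifiers share a bit. */
    static BitSet fold(int[] ids, int length) {
        BitSet bits = new BitSet(length);
        for (int id : ids) {
            bits.set(Math.floorMod(id, length)); // floorMod keeps negative ids in range
        }
        return bits;
    }

    /** Brightness: the number of bits set to 1. */
    static int brightness(BitSet bits) {
        return bits.cardinality();
    }

    /** Fixed-length string of '0' and '1' characters, bit 0 first. */
    static String toBinaryString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] ids = {3, 11, -5, 42};
        BitSet fp = fold(ids, 8);
        System.out.println(toBinaryString(fp, 8) + " brightness=" + brightness(fp));
    }
}
```

Here four identifiers fold onto only two bits (3, 11 and -5 all map to bit 3, and 42 maps to bit 2), so the brightness is 2.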
  • Constructor Details

    • ECFP

      public ECFP()
      Creates a new, empty instance of ECFP without allocating internal storage.
    • ECFP

      public ECFP(ECFPParameters params)
      Creates a new instance of ECFP according to the parameters given.
      Parameters:
      params - parameter settings
    • ECFP

      public ECFP(String params)
      Creates a new instance of ECFP according to the parameters given.
      Parameters:
      params - parameter settings
    • ECFP

      public ECFP(ECFP ecfp)
      Copy constructor. Creates an identical copy of the given ECFP fingerprint. The original and the new instance share the same ECFPParameters object.
      Parameters:
      ecfp - fingerprint to be copied
  • Method Details

    • clone

      public ECFP clone()
      Creates a new instance with identical internal state.
      Specified by:
      clone in class MolecularDescriptor
      Returns:
      the newly copied object
    • isLicensed

      public boolean isLicensed()
      Returns information about the licensing of the product.
      Specified by:
      isLicensed in interface chemaxon.license.Licensable
      Returns:
      true if the product is correctly licensed
    • setLicenseEnvironment

      public void setLicenseEnvironment(String env)
      Sets the license environment.
      Specified by:
      setLicenseEnvironment in interface chemaxon.license.Licensable
    • getName

      public String getName()
      Gets the name of the ECFP fingerprint object. This name differs from the class name: it is more readable and more meaningful for end users.
      Overrides:
      getName in class MolecularDescriptor
      Returns:
      the readable, external name for ECFP objects
    • getShortName

      public String getShortName()
      Gets the short name of the fingerprint.
      Overrides:
      getShortName in class MolecularDescriptor
      Returns:
      the short name used in text outputs (tables etc.)
    • getParametersClassName

      public String getParametersClassName()
      Gets the name of the parameters class corresponding to the descriptor.
      Overrides:
      getParametersClassName in class MolecularDescriptor
      Returns:
      the name of the parameters class
    • setParameters

      public void setParameters(MDParameters parameters) throws MDParametersException
      Sets the parameters of an already created ECFP object.
      Overrides:
      setParameters in class MolecularDescriptor
      Parameters:
      parameters - parameter settings for the fingerprint
      Throws:
      MDParametersException - any XML error
    • setParameters

      public void setParameters(String parameters) throws MDParametersException
      Sets the parameters of an already created ECFP object.
      Specified by:
      setParameters in class MolecularDescriptor
      Parameters:
      parameters - parameter settings for the fingerprint
      Throws:
      MDParametersException - any XML error
    • clear

      public void clear()
      Clears the fingerprint: all values are set to zero.
    • toData

      public byte[] toData()
      Converts an ECFP object into a byte array. This format can be referred to as an "external representation", since it serves as the data format for storing ECFP fingerprints in databases.
      Use the fromData() method to build the ECFP object from this "external" representation.
      Specified by:
      toData in class MolecularDescriptor
      Returns:
      byte array representation of the fingerprint object
    • fromData

      public void fromData(byte[] data)
      Builds an ECFP fingerprint from an external data format created by toData().
      Specified by:
      fromData in class MolecularDescriptor
      Parameters:
      data - "external" representation of an ECFP object
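A round trip between an identifier list and a byte array, in the spirit of `toData()` and `fromData()`, might look like the sketch below. The layout (a 4-byte big-endian count followed by the identifiers) and the class name `ByteFormatSketch` are assumptions for illustration only; the real ECFP external format is not described here, so this code should not be used to parse data produced by `toData()`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical byte layout: [count][id_0]...[id_n-1], each a 4-byte big-endian int. */
final class ByteFormatSketch {

    static byte[] toData(int[] ids) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * ids.length); // big-endian by default
        buf.putInt(ids.length);
        for (int id : ids) {
            buf.putInt(id);
        }
        return buf.array();
    }

    static int[] fromData(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] ids = new int[buf.getInt()]; // first int is the identifier count
        for (int i = 0; i < ids.length; i++) {
            ids[i] = buf.getInt();
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = {967, 969, -123456};
        byte[] stored = toData(ids);
        System.out.println(stored.length + " bytes, round trip ok: "
                + Arrays.equals(ids, fromData(stored)));
    }
}
```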
    • toString

      public final String toString()
      Converts the fingerprint into a readable string. This is the default external text format of the fingerprint, which can also be stored in an SDfile.
      Specified by:
      toString in class MolecularDescriptor
      Returns:
      string representation of the fingerprint
    • toDecimalString

      public final String toDecimalString()
      Converts the ECFP fingerprint into a tab-separated string.
      Specified by:
      toDecimalString in class MolecularDescriptor
      Returns:
      string representation of the fingerprint
    • toBinaryString

      public String toBinaryString()
      Converts the fingerprint into a fixed-length string of 0 and 1 characters. This string represents the "folded" binary version of the fingerprint.
      Overrides:
      toBinaryString in class MolecularDescriptor
      Returns:
      binary string representation of the fingerprint
    • fromString

      public final void fromString(String ecfp) throws ParseException
      Builds an ECFP fingerprint from its string representation created by toString().
      Specified by:
      fromString in class MolecularDescriptor
      Parameters:
      ecfp - ECFP fingerprint string
      Throws:
      ParseException - if the string is not a valid ECFP fingerprint representation
    • toFloatArray

      public float[] toFloatArray()
      Creates the float array representation of an ECFP fingerprint object.
      Specified by:
      toFloatArray in class MolecularDescriptor
      Returns:
      a float array of the fingerprint values
    • fromFloatArray

      public void fromFloatArray(float[] descr)
      Builds an ECFP fingerprint from its float array representation. Typically used when a hypothesis is created.
      Specified by:
      fromFloatArray in class MolecularDescriptor
      Parameters:
      descr - fingerprint represented in a float array (e.g. generated by toFloatArray())
    • toIntArray

      public int[] toIntArray()
      Converts the fingerprint to an array of int identifiers.
    • fromIntArray

      public void fromIntArray(int[] array)
      Builds an ECFP fingerprint from an array of int identifiers.
    • toIdentiferSet

      public Set<Integer> toIdentiferSet()
      Converts the fingerprint to a set of Integer identifiers.
    • fromIdentiferSet

      public void fromIdentiferSet(Set<Integer> set)
      Builds an ECFP fingerprint from a set of Integer identifiers.
    • toFeatureSet

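      @Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public Set<Integer> toFeatureSet()
      Deprecated, for removal: This API element is subject to removal in a future version.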
      As of JChem 5.4.1, replaced by toIdentiferSet().
      Converts the fingerprint to a set of Integer identifiers.
    • fromFeatureSet

      @Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public void fromFeatureSet(Set<Integer> set)
      Deprecated, for removal: This API element is subject to removal in a future version.
      As of JChem 5.4.1, replaced by fromIdentiferSet().
      Builds an ECFP fingerprint from a set of Integer identifiers.
    • toBitSet

      public BitSet toBitSet()
      Returns a bit vector storing the "folded" binary representation of the fingerprint.
    • getIdentiferCount

      public int getIdentiferCount()
      Gets the number of integer identifiers generated for the fingerprint.
      Returns:
      the number of identifiers in the fingerprint
    • getFeatureCount

      @Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public int getFeatureCount()
      Deprecated, for removal: This API element is subject to removal in a future version.
      As of JChem 5.4.1, replaced by getIdentiferCount().
      Gets the number of integer identifiers generated for the fingerprint.
      Returns:
      the number of identifiers in the fingerprint
    • getBrightness

      public int getBrightness()
      Gets the brightness of the fingerprint (sometimes also called darkness), that is, the number of bits set to 1 (one) in the fingerprint.
      Returns:
      number of bits set to 1
    • requireBinaryVector

      protected void requireBinaryVector()
      Checks the binary vector storage and generates it from the identifier list if necessary.
    • dropBinaryVector

      public void dropBinaryVector()
      Drops the binary vector storage. It will be regenerated when required.
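The split between `requireBinaryVector()` and `dropBinaryVector()` follows a lazy-cache pattern: the identifier list is the primary data, and the binary vector is derived from it on demand. The sketch below, using the hypothetical class `LazyFingerprint`, shows that pattern under the same assumed floor-mod folding; it is not the ECFP source code.

```java
import java.util.BitSet;

/** Hypothetical fingerprint holder showing a lazily built, droppable binary vector. */
final class LazyFingerprint {
    private final int[] ids;   // identifier list: the primary data
    private final int length;  // folded vector length in bits
    private BitSet fp;         // cached binary vector, null until required
    private int brightness;    // valid only while fp != null

    LazyFingerprint(int[] ids, int length) {
        this.ids = ids.clone();
        this.length = length;
    }

    /** Builds the binary vector from the identifier list if it is not cached. */
    private void requireBinaryVector() {
        if (fp == null) {
            fp = new BitSet(length);
            for (int id : ids) {
                fp.set(Math.floorMod(id, length)); // assumed folding scheme
            }
            brightness = fp.cardinality();
        }
    }

    /** Discards the cached vector; it is rebuilt on the next request. */
    void dropBinaryVector() {
        fp = null;
    }

    boolean hasBinaryVector() {
        return fp != null;
    }

    int getBrightness() {
        requireBinaryVector();
        return brightness;
    }

    BitSet toBitSet() {
        requireBinaryVector();
        return (BitSet) fp.clone(); // defensive copy so callers cannot corrupt the cache
    }

    public static void main(String[] args) {
        LazyFingerprint f = new LazyFingerprint(new int[] {1, 9, 4}, 8);
        System.out.println(f.hasBinaryVector()); // false: nothing built yet
        System.out.println(f.getBrightness());   // builds the vector on demand
        f.dropBinaryVector();
        System.out.println(f.hasBinaryVector()); // false again after dropping
    }
}
```

Dropping the cache can save memory for fingerprints that are only handled through their identifiers; the next call that needs bits simply rebuilds them.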
    • generate

      public String[] generate(Molecule m) throws MDGeneratorException
      Creates the ECFP fingerprint for the given Molecule. Calls the generator created by the corresponding ECFPParameters class.
      Overrides:
      generate in class MolecularDescriptor
      Returns:
      property names set in the molecule during generation
      Throws:
      MDGeneratorException - if the fingerprint cannot be generated
    • getDissimilarityMetrics

      public String[] getDissimilarityMetrics()
      Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.
      Specified by:
      getDissimilarityMetrics in class MolecularDescriptor
      Returns:
      the metrics array
    • getDefaultDissimilarityMetricThresholds

      public float[] getDefaultDissimilarityMetricThresholds()
      Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
      Specified by:
      getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
      Returns:
      array of dissimilarity threshold values
    • getDefaultMetricIndex

      public int getDefaultMetricIndex()
      Gets the index of the default metric. In the case of ECFP, this is Tanimoto.
      Overrides:
      getDefaultMetricIndex in class MolecularDescriptor
      Returns:
      metric index of the default metric
    • getDefaultThreshold

      public float getDefaultThreshold(int metricIndex)
      Gets a metric-dependent default threshold value. Ideally, this value would be based on statistics; however, the exact value is not critical, since these defaults are only used in user interfaces to make applications easier for beginners to use.
      Overrides:
      getDefaultThreshold in class MolecularDescriptor
      Parameters:
      metricIndex - index of a parameterized metric
    • getTanimoto

      public float getTanimoto(ECFP f)
      Calculates the Tanimoto distance.
      Parameters:
      f - the fingerprint to which the distance is calculated
      Returns:
      the Tanimoto distance (dissimilarity coefficient)
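Assuming the fingerprints are compared as identifier sets (for example, as returned by `toIdentiferSet()`), the Tanimoto distance is one minus the ratio of shared identifiers to all identifiers, 1 - |A ∩ B| / |A ∪ B|. The sketch below computes this for plain `Set<Integer>` values. Whether `getTanimoto(ECFP)` works on identifiers or on the folded bits is not stated here, the handling of two empty sets is an assumption, and `TanimotoSketch` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: Tanimoto distance between two identifier sets. */
final class TanimotoSketch {

    static float tanimotoDistance(Set<Integer> a, Set<Integer> b) {
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0f; // assumption: two empty fingerprints count as identical
        }
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b);
        // Similarity = |A intersect B| / |A union B|; distance = 1 - similarity.
        return 1f - (float) common.size() / union.size();
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2, 3, 4);
        Set<Integer> b = Set.of(3, 4, 5);
        System.out.println(tanimotoDistance(a, b)); // 2 shared identifiers out of 5 in total
    }
}
```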
    • getEuclidean

      public float getEuclidean(ECFP f)
      Calculates the Euclidean distance. This is the same as the Euclidean distance for bit strings.
      Parameters:
      f - the fingerprint to which the distance is calculated
      Returns:
      the dissimilarity coefficient
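For 0/1 vectors, the plain Euclidean distance reduces to the square root of the number of positions where the two vectors differ (the Hamming distance). The sketch below computes that unnormalized value with `BitSet.xor`; the ECFP implementation may normalize or otherwise scale its result, and `EuclideanSketch` is a hypothetical helper.

```java
import java.util.BitSet;

/** Hypothetical helper: unnormalized Euclidean distance between two bit vectors. */
final class EuclideanSketch {

    static double euclidean(BitSet a, BitSet b) {
        BitSet diff = (BitSet) a.clone(); // clone so the caller's vector is not modified
        diff.xor(b);                      // bits set in exactly one of the two vectors
        // For 0/1 vectors each differing position contributes (1 - 0)^2 = 1.
        return Math.sqrt(diff.cardinality());
    }

    public static void main(String[] args) {
        BitSet a = BitSet.valueOf(new long[] {0b0111L}); // bits 0, 1, 2
        BitSet b = BitSet.valueOf(new long[] {0b1100L}); // bits 2, 3
        System.out.println(euclidean(a, b)); // three differing bits: 0, 1 and 3
    }
}
```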
    • getWeightedEuclidean

      public float getWeightedEuclidean(ECFP f)
      Calculates the weighted Euclidean distance. This is the same as the weighted Euclidean distance for bit strings.
      Parameters:
      f - the fingerprint to which the distance is calculated
      Returns:
      the dissimilarity coefficient
    • getAsymmetricEuclidean

      public float getAsymmetricEuclidean(ECFP f)
      Calculates the asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.
      Parameters:
      f - the fingerprint to which the distance is calculated
      Returns:
      the dissimilarity coefficient
    • getWeightedAsymmetricEuclidean

      public float getWeightedAsymmetricEuclidean(ECFP f)
      Calculates the weighted asymmetric Euclidean distance. This is the same as the weighted asymmetric Euclidean distance for bit strings.
      Parameters:
      f - the fingerprint to which the distance is calculated
      Returns:
      the dissimilarity coefficient
    • getDissimilarity

      public float getDissimilarity(MolecularDescriptor other)
      Calculates the dissimilarity ratio between two ECFP objects using the current default metric. The default metric is set in the corresponding ECFPParameters object by setCurrentParametrizedMetric(int metricIndex). For asymmetric distances, swapping the two fingerprints can make a big difference.
      Specified by:
      getDissimilarity in class MolecularDescriptor
      Parameters:
      other - a fingerprint to which the dissimilarity ratio is measured
      Returns:
      the dissimilarity ratio
    • getDissimilarity

      public float getDissimilarity(MolecularDescriptor other, int metricIndex)
      Calculates the dissimilarity between two ECFP objects using the specified metric; otherwise it is the same as getDissimilarity(MolecularDescriptor other).
      Specified by:
      getDissimilarity in class MolecularDescriptor
      Parameters:
      other - a fingerprint to which the dissimilarity ratio is measured
      metricIndex - the index of the metric to be used
      Returns:
      the dissimilarity ratio
      See Also:
    • getAliasNames

      public List<String> getAliasNames()