Class ECFP

  • All Implemented Interfaces:
    chemaxon.license.Licensable, Cloneable

    @PublicAPI
    public class ECFP
    extends MolecularDescriptor
    implements chemaxon.license.Licensable
    The ECFP class implements Extended-Connectivity Fingerprints (ECFPs) as a type of MolecularDescriptor. ECFPs are circular topological fingerprints designed for molecular characterization, similarity searching, and structure-activity modeling. They are among the most popular similarity search tools in drug discovery and are used effectively in a wide variety of applications.

    The main properties of ECFPs are the following.

    • They represent molecular structures by means of circular atom neighborhoods.
    • They can be very rapidly calculated.
    • Their features represent the presence of particular substructures.
    • Their features are not predefined, so they can represent a huge number of different molecular features (including stereochemical information).
    • They are designed to represent both the presence and the absence of functionality, since both are crucial for analyzing molecular activity.
    • Their generation method can be flexibly customized to produce various types of circular fingerprints for diverse applications.

    For more information, see the user's guide.
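
    The idea of circular atom neighborhoods can be illustrated with a small, self-contained sketch. It is not the ChemAxon algorithm: the class CircularIdSketch, its method, and the hashing scheme are hypothetical, and bond orders, stereochemistry, and duplicate-substructure removal are ignored. Each atom starts with an integer label, and in every iteration its identifier is recomputed from its own identifier and the sorted identifiers of its neighbors, so the result does not depend on atom numbering.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: a simplified circular-neighborhood identifier scheme. */
final class CircularIdSketch {

    /**
     * @param atomLabels initial integer label per atom (here: the atomic number)
     * @param neighbors  adjacency list; neighbors[i] holds the indices of atoms bonded to atom i
     * @param radius     number of neighborhood-growing iterations
     * @return the set of all identifiers generated in iterations 0..radius
     */
    static Set<Integer> identifiers(int[] atomLabels, int[][] neighbors, int radius) {
        int n = atomLabels.length;
        int[] current = new int[n];
        Set<Integer> ids = new TreeSet<>();

        // Iteration 0: one identifier per atom, derived from the atom label alone.
        for (int i = 0; i < n; i++) {
            current[i] = Arrays.hashCode(new int[] {0, atomLabels[i]});
            ids.add(current[i]);
        }

        // Each further iteration widens the neighborhood by one bond.
        for (int iter = 1; iter <= radius; iter++) {
            int[] next = new int[n];
            for (int i = 0; i < n; i++) {
                int[] nb = new int[neighbors[i].length];
                for (int k = 0; k < nb.length; k++) {
                    nb[k] = current[neighbors[i][k]];
                }
                Arrays.sort(nb); // makes the hash independent of neighbor order

                int[] key = new int[nb.length + 2];
                key[0] = iter;
                key[1] = current[i];
                System.arraycopy(nb, 0, key, 2, nb.length);
                next[i] = Arrays.hashCode(key);
                ids.add(next[i]);
            }
            current = next;
        }
        return ids;
    }
}
```

    For example, ethanol written as C-C-O (labels {6, 6, 8}) and as O-C-C (labels {8, 6, 6}) with the same chain adjacency yields the same identifier set in this sketch.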

    Since:
    JChem 5.4
    • Field Detail

      • ids

        protected int[] ids
        Identifier list storage of the fingerprint
      • fp

        protected int[] fp
        Binary vector storage of the fingerprint
      • brightness

        protected int brightness
        The number of bits set in the binary vector storage
    • Constructor Detail

      • ECFP

        public ECFP()
        Creates a new, empty instance of ECFP without allocating internal storage.
      • ECFP

        public ECFP​(ECFPParameters params)
        Creates a new instance of ECFP according to the parameters given.
        Parameters:
        params - parameter settings
      • ECFP

        public ECFP​(String params)
        Creates a new instance of ECFP according to the parameters given.
        Parameters:
        params - parameter settings
      • ECFP

        public ECFP​(ECFP ecfp)
        Copy constructor. An identical copy of the ECFP fingerprint passed is created. The old and the new instances share the same ECFPParameters object.
        Parameters:
        ecfp - fingerprint to be copied
    • Method Detail

      • clone

        public ECFP clone()
        Creates a new instance with identical internal state.
        Specified by:
        clone in class MolecularDescriptor
        Returns:
        the newly copied object
      • isLicensed

        public boolean isLicensed()
        Returns information about the licensing of the product.
        Specified by:
        isLicensed in interface chemaxon.license.Licensable
        Returns:
        true if the product is correctly licensed
      • setLicenseEnvironment

        public void setLicenseEnvironment​(String env)
        Sets the license environment.
        Specified by:
        setLicenseEnvironment in interface chemaxon.license.Licensable
      • getName

        public String getName()
        Gets the name of the ECFP fingerprint object. This name differs from the class name: it is more readable and more meaningful to end users.
        Overrides:
        getName in class MolecularDescriptor
        Returns:
        the user-friendly external name of ECFP class objects
      • getShortName

        public String getShortName()
        Gets the short name of the fingerprint.
        Overrides:
        getShortName in class MolecularDescriptor
        Returns:
        the short name used in text outputs (tables etc.)
      • getParametersClassName

        public String getParametersClassName()
        Gets the name of the parameters class corresponding to the descriptor.
        Overrides:
        getParametersClassName in class MolecularDescriptor
        Returns:
        the name of the parameters class
      • clear

        public void clear()
        Clears the fingerprint; all values are set to zero.
      • toData

        public byte[] toData()
        Converts an ECFP object into a byte array. This format can be referred to as an "external representation" since it serves as the data format for storing ECFP fingerprints in databases.
        Use the fromData() method to build the ECFP object from this "external" representation.
        Specified by:
        toData in class MolecularDescriptor
        Returns:
        byte array representation of the fingerprint object
      • fromData

        public void fromData​(byte[] data)
        Builds an ECFP fingerprint from an external data format created by toData().
        Specified by:
        fromData in class MolecularDescriptor
        Parameters:
        data - "external" representation of an ECFP object
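
        The toData()/fromData() pair is a round trip between the in-memory fingerprint and a byte array. The actual byte layout used by ECFP is not documented here, so the following is only a sketch under an assumed layout (each identifier written as a 4-byte big-endian int); the class ExternalFormSketch and its methods are hypothetical.

```java
import java.nio.ByteBuffer;

/** Illustrative only: an assumed byte layout, not the format used by ECFP.toData(). */
final class ExternalFormSketch {

    /** Packs each identifier into 4 bytes (big-endian, the ByteBuffer default). */
    static byte[] toData(int[] ids) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * ids.length);
        for (int id : ids) {
            buf.putInt(id);
        }
        return buf.array();
    }

    /** Rebuilds the identifier list from bytes produced by toData(). */
    static int[] fromData(byte[] data) {
        if (data.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] ids = new int[data.length / Integer.BYTES];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = buf.getInt();
        }
        return ids;
    }
}
```

        The point of the sketch is the contract rather than the layout: fromData(toData(x)) should reproduce x exactly, which is what makes the byte array usable as a database storage format.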
      • toString

        public final String toString()
        Converts the fingerprint into a readable string. This is the default external text format of the fingerprint, which can also be stored into an SDfile.
        Specified by:
        toString in class MolecularDescriptor
        Returns:
        string representation of the fingerprint
      • toDecimalString

        public final String toDecimalString()
        Converts the ECFP fingerprint into a tab-separated string.
        Specified by:
        toDecimalString in class MolecularDescriptor
        Returns:
        string representation of the fingerprint
      • toBinaryString

        public String toBinaryString()
        Converts the fingerprint into a fixed-length 0,1 string. This string represents the "folded" binary version of the fingerprint.
        Overrides:
        toBinaryString in class MolecularDescriptor
        Returns:
        binary string representation of the fingerprint
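
        Folding maps an identifier list of arbitrary size onto a bit vector of fixed length. How ECFP assigns an identifier to a bit position is not stated here; the sketch below assumes a simple modulo mapping, and FoldingSketch with its methods is a hypothetical helper, not part of the API.

```java
import java.util.BitSet;

/** Illustrative only: folds integer identifiers into a fixed-length bit vector. */
final class FoldingSketch {

    /** Sets bit (id mod length) for every identifier; collisions share a bit. */
    static BitSet fold(int[] identifiers, int length) {
        BitSet bits = new BitSet(length);
        for (int id : identifiers) {
            // floorMod keeps the index non-negative for negative identifiers.
            bits.set(Math.floorMod(id, length));
        }
        return bits;
    }

    /** Renders the first 'length' bits as a string of '0' and '1' characters. */
    static String toBinaryString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }
}
```

        With length 8, the identifiers 1, 9 and -1 fold to positions 1, 1 and 7, so the string is "01000001". Two identifiers sharing one bit shows why the folded form loses information compared with the identifier list.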
      • toFloatArray

        public float[] toFloatArray()
        Creates the float array representation of an ECFP fingerprint object.
        Specified by:
        toFloatArray in class MolecularDescriptor
        Returns:
        a float array of the fingerprint values
      • fromFloatArray

        public void fromFloatArray​(float[] descr)
        Builds an ECFP fingerprint from its float array representation. Typically used when a hypothesis is created.
        Specified by:
        fromFloatArray in class MolecularDescriptor
        Parameters:
        descr - fingerprint represented in a float array (e.g. generated by toFloatArray())
      • toIntArray

        public int[] toIntArray()
        Converts the fingerprint to an array of int identifiers.
      • fromIntArray

        public void fromIntArray​(int[] array)
        Builds an ECFP fingerprint from an array of int identifiers.
      • toIdentiferSet

        public Set<Integer> toIdentiferSet()
        Converts the fingerprint to a set of Integer identifiers.
      • fromIdentiferSet

        public void fromIdentiferSet​(Set<Integer> set)
        Builds an ECFP fingerprint from a set of Integer identifiers.
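
        The int array and the Set<Integer> forms carry the same identifiers in different containers. The sketch below shows the conversion in both directions with plain Java collections; IdentifierSetSketch is a hypothetical helper, and whether ECFP itself sorts or deduplicates the identifiers is an assumption made only for this example.

```java
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: converts between int[] and Set<Integer> identifier containers. */
final class IdentifierSetSketch {

    /** Copies the identifiers into a sorted set; duplicates collapse to one entry. */
    static Set<Integer> toSet(int[] ids) {
        Set<Integer> set = new TreeSet<>();
        for (int id : ids) {
            set.add(id);
        }
        return set;
    }

    /** Copies the set into an ascending int array, regardless of the set implementation. */
    static int[] toArray(Set<Integer> set) {
        return set.stream().mapToInt(Integer::intValue).sorted().toArray();
    }
}
```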
      • fromFeatureSet

        @Deprecated
        public void fromFeatureSet​(Set<Integer> set)
        Deprecated.
        As of JChem 5.4.1, replaced by fromIdentiferSet().
        Builds an ECFP fingerprint from a set of Integer identifiers.
      • toBitSet

        public BitSet toBitSet()
        Returns a bit vector storing the "folded" binary representation of the fingerprint.
      • getIdentiferCount

        public int getIdentiferCount()
        Gets the number of integer identifiers generated for the fingerprint.
        Returns:
        the number of identifiers in the fingerprint
      • getFeatureCount

        @Deprecated
        public int getFeatureCount()
        Deprecated.
        As of JChem 5.4.1, replaced by getIdentiferCount().
        Gets the number of integer identifiers generated for the fingerprint.
        Returns:
        the number of identifiers in the fingerprint
      • getBrightness

        public int getBrightness()
        Gets the brightness of the fingerprint, that is, the number of bits set to 1 in the binary vector. This value is sometimes also called the darkness.
        Returns:
        number of bits set to 1
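
        Counting the set bits of a binary vector stored in an int array is a population count over its words. The sketch below assumes the bits are packed 32 per int, as the int[] type of the fp field suggests; BrightnessSketch is a hypothetical helper, not the library code.

```java
/** Illustrative only: counts the 1 bits of a bit vector packed into an int array. */
final class BrightnessSketch {

    static int brightness(int[] words) {
        int count = 0;
        for (int w : words) {
            count += Integer.bitCount(w); // population count of one 32-bit word
        }
        return count;
    }
}
```

        For the words {0b1011, 0, -1} the result is 3 + 0 + 32 = 35.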
      • requireBinaryVector

        protected void requireBinaryVector()
        Checks the binary vector storage and generates it from the identifier list if necessary.
      • dropBinaryVector

        public void dropBinaryVector()
        Drops the binary vector storage. It will be regenerated when required.
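
        Together, requireBinaryVector() and dropBinaryVector() describe a lazily built cache: the identifier list is the primary data, and the binary vector is derived from it on demand. A minimal sketch of that pattern follows; the class LazyVectorSketch, the modulo folding, and the builds counter are assumptions made for illustration.

```java
import java.util.BitSet;

/** Illustrative only: a binary vector that is derived lazily from an identifier list. */
final class LazyVectorSketch {
    private final int[] ids;
    private final int length;
    private BitSet bits; // null until the vector is first required
    int builds;          // number of times the vector has been generated

    LazyVectorSketch(int[] ids, int length) {
        this.ids = ids.clone();
        this.length = length;
    }

    /** Generates the binary vector from the identifier list if it is missing. */
    private void requireBinaryVector() {
        if (bits == null) {
            bits = new BitSet(length);
            for (int id : ids) {
                bits.set(Math.floorMod(id, length));
            }
            builds++;
        }
    }

    /** Frees the derived vector; the identifier list is kept. */
    void dropBinaryVector() {
        bits = null;
    }

    /** Any operation that needs bits calls requireBinaryVector() first. */
    int brightness() {
        requireBinaryVector();
        return bits.cardinality();
    }
}
```

        Dropping the vector trades memory for a later regeneration, which can be worthwhile when many fingerprints are held but only a few are compared at a time.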
      • getDissimilarityMetrics

        public String[] getDissimilarityMetrics()
        Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.
        Specified by:
        getDissimilarityMetrics in class MolecularDescriptor
        Returns:
        the metrics array
      • getDefaultDissimilarityMetricThresholds

        public float[] getDefaultDissimilarityMetricThresholds()
        Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
        Specified by:
        getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
        Returns:
        array of dissimilarity threshold values
      • getDefaultMetricIndex

        public int getDefaultMetricIndex()
        Gets the index of the default metric. In the case of ECFP, this is Tanimoto.
        Overrides:
        getDefaultMetricIndex in class MolecularDescriptor
        Returns:
        metric index of the default metric
      • getDefaultThreshold

        public float getDefaultThreshold​(int metricIndex)
        Gets a metric-dependent default threshold value. Ideally, this value should be based on statistics, but the exact value is not critical, since default thresholds are only used in user interfaces to simplify applications for beginners.
        Overrides:
        getDefaultThreshold in class MolecularDescriptor
        Parameters:
        metricIndex - index of a parameterized metric
      • getTanimoto

        public float getTanimoto​(ECFP f)
        Calculates the Tanimoto distance.
        Parameters:
        f - the fingerprint from which the distance is calculated
        Returns:
        the Tanimoto distance (dissimilarity coefficient)
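
        For binary fingerprints the Tanimoto dissimilarity is commonly defined as 1 - |A ∩ B| / |A ∪ B|, where A and B are the sets of bits set to 1. The sketch below computes that textbook form on java.util.BitSet; TanimotoSketch is a hypothetical helper, and the handling of two empty fingerprints is a convention chosen here, not documented ECFP behavior.

```java
import java.util.BitSet;

/** Illustrative only: textbook Tanimoto dissimilarity on bit vectors. */
final class TanimotoSketch {

    static float dissimilarity(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b); // bits set in both fingerprints
        BitSet union = (BitSet) a.clone();
        union.or(b);   // bits set in at least one fingerprint

        int unionCount = union.cardinality();
        if (unionCount == 0) {
            return 0f; // convention used here: two empty fingerprints are identical
        }
        return 1f - (float) common.cardinality() / unionCount;
    }
}
```

        With bits {0, 1, 2} and {1, 2, 3} set, two bits are shared out of four in the union, so the dissimilarity is 0.5.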
      • getEuclidean

        public float getEuclidean​(ECFP f)
        Calculates the Euclidean distance. This is the same as the Euclidean distance for bit strings.
        Parameters:
        f - the fingerprint from which the distance is calculated
        Returns:
        the dissimilarity coefficient
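
        For bit strings, every coordinate difference is 0 or 1, so the unnormalized Euclidean distance reduces to the square root of the number of differing bits. The sketch below shows that reduction; EuclideanSketch is hypothetical, and any normalization or scaling the library may apply is not reflected.

```java
import java.util.BitSet;

/** Illustrative only: unnormalized Euclidean distance between two bit vectors. */
final class EuclideanSketch {

    static float distance(BitSet a, BitSet b) {
        BitSet differing = (BitSet) a.clone();
        differing.xor(b); // bits set in exactly one of the two fingerprints
        return (float) Math.sqrt(differing.cardinality());
    }
}
```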
      • getWeightedEuclidean

        public float getWeightedEuclidean​(ECFP f)
        Calculates the weighted Euclidean distance. This is the same as the weighted Euclidean distance for bit strings.
        Parameters:
        f - the fingerprint from which the distance is calculated
        Returns:
        the dissimilarity coefficient
      • getAsymmetricEuclidean

        public float getAsymmetricEuclidean​(ECFP f)
        Calculates the asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.
        Parameters:
        f - the fingerprint from which the distance is calculated
        Returns:
        the dissimilarity coefficient
      • getWeightedAsymmetricEuclidean

        public float getWeightedAsymmetricEuclidean​(ECFP f)
        Calculates the weighted asymmetric Euclidean distance. This is the same as the weighted asymmetric Euclidean distance for bit strings.
        Parameters:
        f - the fingerprint from which the distance is calculated
        Returns:
        the dissimilarity coefficient
      • getDissimilarity

        public float getDissimilarity​(MolecularDescriptor other)
        Calculates the dissimilarity ratio between two ECFP objects using the current default metric. The default metric is set in the corresponding ECFPParameters object by setCurrentParametrizedMetric(int metricIndex). In the case of asymmetric distances, swapping the two fingerprints can make a big difference.
        Specified by:
        getDissimilarity in class MolecularDescriptor
        Parameters:
        other - the fingerprint to which the dissimilarity ratio is measured
        Returns:
        the dissimilarity ratio
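
        Why argument order matters for an asymmetric measure can be seen with a deliberately simple example. The measure below is hypothetical and is not the asymmetric Euclidean formula used by ECFP: it reports the fraction of the query's bits that are missing from the target, so a small query contained in a large target scores 0, while the reverse does not.

```java
import java.util.BitSet;

/** Illustrative only: a made-up asymmetric measure showing order dependence. */
final class AsymmetrySketch {

    /** Fraction of bits set in the query that are not set in the target. */
    static float missingFraction(BitSet query, BitSet target) {
        int queryCount = query.cardinality();
        if (queryCount == 0) {
            return 0f; // an empty query has nothing that could be missing
        }
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // query bits absent from the target
        return (float) missing.cardinality() / queryCount;
    }
}
```

        With query bits {0, 1, 2, 3} and target bits {0, 1} the value is 0.5; with the roles swapped it is 0.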
      • getAliasNames

        public List<String> getAliasNames()