Package chemaxon.descriptors
Class ECFP
java.lang.Object
chemaxon.descriptors.MolecularDescriptor
chemaxon.descriptors.ECFP
- All Implemented Interfaces:
chemaxon.license.Licensable, Cloneable
The ECFP class implements Extended-Connectivity Fingerprints (ECFPs) as a type of MolecularDescriptor.
ECFPs are circular topological fingerprints designed for molecular characterization,
similarity searching, and structure-activity modeling.
They are among the most popular similarity search tools in drug discovery and are used effectively in a wide variety of applications.
The main properties of ECFPs are the following.
- They represent molecular structures by means of circular atom neighborhoods.
- They can be very rapidly calculated.
- Their features represent the presence of particular substructures.
- They are not predefined and can represent a huge number of different molecular features (including stereochemical information).
- They are designed to represent both the presence and the absence of functionality, since both are crucial for analyzing molecular activity.
- Their generation method can be flexibly customized to produce various types of circular fingerprints for diverse applications.
For more information, see the user's guide.
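The circular-neighborhood idea in the list above can be illustrated with a small, self-contained Java sketch. It is not ChemAxon's implementation. The following are all assumptions made for illustration:

- the atom invariants (plain atomic numbers);
- the use of `Arrays.hashCode` as the hash function;
- ignoring bonds;
- the class name `CircularIdSketch`.

Each round combines an atom's previous identifier with its sorted neighbor identifiers. As a result, identifiers describe progressively larger circular neighborhoods and are not drawn from a predefined dictionary.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, heavily simplified ECFP-style identifier generator.
 * Not ChemAxon's algorithm: real ECFPs use richer atom invariants,
 * bond information, their own hash function and duplicate-environment removal.
 */
class CircularIdSketch {

    /**
     * @param adjacency  neighbor indices for each heavy atom
     * @param invariants initial integer atom invariants (here simply atomic numbers)
     * @param iterations neighborhood-growing rounds; the neighborhood diameter is 2 * iterations
     * @return the sorted set of integer identifiers collected over all rounds
     */
    static Set<Integer> identifiers(int[][] adjacency, int[] invariants, int iterations) {
        int[] current = invariants.clone();
        Set<Integer> ids = new TreeSet<>();
        for (int id : current) ids.add(id);               // round 0: the atoms themselves

        for (int round = 1; round <= iterations; round++) {
            int[] next = new int[current.length];
            for (int atom = 0; atom < current.length; atom++) {
                int[] neighbors = adjacency[atom];
                int[] data = new int[2 + neighbors.length];
                data[0] = round;                           // distinguishes identifiers of different radii
                data[1] = current[atom];                   // the atom's own previous identifier
                for (int k = 0; k < neighbors.length; k++) {
                    data[2 + k] = current[neighbors[k]];
                }
                Arrays.sort(data, 2, data.length);         // neighbor order must not matter
                next[atom] = Arrays.hashCode(data);        // identifier for the larger neighborhood
            }
            current = next;
            for (int id : current) ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[][] propane = {{1}, {0, 2}, {1}};              // C-C-C, hydrogens suppressed
        int[] carbons = {6, 6, 6};
        System.out.println(identifiers(propane, carbons, 1));
    }
}
```

In propane the two terminal carbons have identical environments, so they contribute a single identifier per round.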
- Since:
- JChem 5.4
-
Field Summary
- protected int brightness: The number of bits set in the binary vector storage.
- protected int[] fp: Binary vector storage of the fingerprint.
- protected int[] ids: Identifier list storage of the fingerprint.
Fields inherited from class chemaxon.descriptors.MolecularDescriptor
params
-
Constructor Summary
- ECFP(): Creates a new, empty instance of ECFP without allocating internal storage.
- ECFP(ECFP ecfp): Copy constructor.
- ECFP(ECFPParameters params): Creates a new instance of ECFP according to the parameters given.
- ECFP(… params): Creates a new instance of ECFP according to the parameters given.
-
Method Summary
- void clear(): Clears the fingerprint; all values are set to zero.
- clone(): Creates a new instance with identical internal state.
- void dropBinaryVector(): Drops the binary vector storage.
- void fromData(byte[] data): Builds an ECFP fingerprint from an external data format created by toData().
- void fromFeatureSet(Set<Integer> set): Deprecated, for removal: This API element is subject to removal in a future version.
- void fromFloatArray(float[] descr): Builds an ECFP fingerprint from its float array representation.
- void fromIdentiferSet(Set<Integer> set): Builds an ECFP fingerprint from a set of Integer identifiers.
- void fromIntArray(int[] array): Builds an ECFP fingerprint from an array of int identifiers.
- final void fromString(String ecfp): Builds an ECFP fingerprint from its string representation created by toString().
- String[] generate(…): Creates the ECFP fingerprint for the given Molecule.
- float getAsymmetricEuclidean(… f): Calculates the asymmetric Euclidean distance.
- int getBrightness(): Gets the brightness of the fingerprint.
- float[] getDefaultDissimilarityMetricThresholds(): Gets the default dissimilarity threshold values for all defined dissimilarity metrics.
- int getDefaultMetricIndex(): Gets the index of the default metric.
- float getDefaultThreshold(int metricIndex): Gets a metric-dependent default threshold value.
- float getDissimilarity(MolecularDescriptor other): Calculates the dissimilarity ratio between two ECFP objects using the current default metric.
- float getDissimilarity(MolecularDescriptor other, int metricIndex): Calculates the dissimilarity between two ECFP objects using the specified metric; apart from that, it is the same as getDissimilarity(final MolecularDescriptor other).
- String[] getDissimilarityMetrics(): Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.
- float getEuclidean(ECFP f): Calculates the Euclidean distance.
- int getFeatureCount(): Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by getIdentiferCount().
- int getIdentiferCount(): Gets the number of integer identifiers generated for the fingerprint.
- getName(): Gets the name of the ECFP fingerprint object.
- getParametersClassName(): Gets the name of the parameters class corresponding to the descriptor.
- getShortName(): Gets the short name of the fingerprint.
- float getTanimoto(ECFP f): Calculates the Tanimoto distance.
- float getWeightedAsymmetricEuclidean(… f): Calculates the weighted asymmetric Euclidean distance.
- float getWeightedEuclidean(… f): Calculates the weighted Euclidean distance.
- boolean isLicensed(): Returns information about the licensing of the product.
- protected void requireBinaryVector(): Checks the binary vector storage and generates it from the identifier list if necessary.
- void setLicenseEnvironment(…): Sets the license environment.
- void setParameters(MDParameters parameters): Sets the parameters of an already created ECFP object.
- void setParameters(String parameters): Sets the parameters of an already created ECFP object.
- toBinaryString(): Converts the fingerprint into a fixed-length string of 0 and 1 characters.
- toBitSet(): Returns a bit vector storing the "folded" binary representation of the fingerprint.
- byte[] toData(): Converts an ECFP object into a byte array.
- final String toDecimalString(): Converts the ECFP fingerprint into a tab-separated string.
- toFeatureSet(): Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by toIdentiferSet().
- float[] toFloatArray(): Creates the float array representation of an ECFP fingerprint object.
- toIdentiferSet(): Converts the fingerprint to a set of Integer identifiers.
- int[] toIntArray(): Converts the fingerprint to an array of int identifiers.
- final String toString(): Converts the fingerprint into a readable string.
Methods inherited from class chemaxon.descriptors.MolecularDescriptor
generate, getAtomSetColors, getAtomSetIndexes, getAtomSetNames, getDissimilarityMetricIndex, getLowerBound, getMetricIndex, getMetricName, getMetricName, getNumberOfMetrics, getNumberOfWeights, getParameters, getThreshold, getThreshold, main, needsConfig, newInstance, newInstance, newInstanceFromXML, setScreeningConfiguration
-
Field Details
-
ids
protected int[] ids
Identifier list storage of the fingerprint.
-
fp
protected int[] fp
Binary vector storage of the fingerprint.
-
brightness
protected int brightness
The number of bits set in the binary vector storage.
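The three fields describe a dual representation: an identifier list, a packed int[] binary vector and a cached bit count. The hedged Java sketch below shows one plausible way an identifier list could be folded into such a vector and its brightness counted. Two points are assumptions, not ChemAxon's documented scheme: that each identifier maps to bit position id mod length, and that bits are packed 32 per int word. `FoldingSketch` is a hypothetical name.

```java
/** Hypothetical folding of ECFP-style identifiers; not ChemAxon's documented scheme. */
class FoldingSketch {

    /** Folds identifiers into a bit vector of {@code length} bits, packed 32 bits per int word. */
    static int[] fold(int[] ids, int length) {
        int[] words = new int[(length + 31) / 32];
        for (int id : ids) {
            int bit = Math.floorMod(id, length);   // also maps negative identifiers into [0, length)
            words[bit >>> 5] |= 1 << (bit & 31);   // word index = bit / 32, offset = bit % 32
        }
        return words;
    }

    /** Counts the bits set to 1, i.e. the quantity a brightness field would cache. */
    static int brightness(int[] words) {
        int count = 0;
        for (int w : words) count += Integer.bitCount(w);
        return count;
    }

    public static void main(String[] args) {
        int[] fp = fold(new int[] {5, 69, -1}, 64);   // 5 and 69 collide at bit 5; -1 folds to bit 63
        System.out.println(brightness(fp));         // prints 2
    }
}
```

Collisions such as 5 and 69 above are why a folded vector can have fewer bits set than there are identifiers.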
-
Constructor Details
-
ECFP
public ECFP()
Creates a new, empty instance of ECFP without allocating internal storage.
-
ECFP
Creates a new instance of ECFP according to the parameters given.
- Parameters:
params - parameter settings
-
ECFP
Creates a new instance of ECFP according to the parameters given.
- Parameters:
params - parameter settings
-
ECFP
Copy constructor. An identical copy of the ECFP fingerprint passed is created. The old and the new instances share the same ECFPParameters object.
- Parameters:
ecfp - fingerprint to be copied
-
Method Details
-
clone
Creates a new instance with identical internal state.
- Specified by:
clone in class MolecularDescriptor
- Returns:
- the newly copied object
-
isLicensed
public boolean isLicensed()
Returns information about the licensing of the product.
- Specified by:
isLicensed in interface chemaxon.license.Licensable
- Returns:
- true if the product is correctly licensed
-
setLicenseEnvironment
Sets the license environment.
- Specified by:
setLicenseEnvironment in interface chemaxon.license.Licensable
-
getName
Gets the name of the ECFP fingerprint object. This name differs from the class name: it is more readable and more meaningful for end users.
- Overrides:
getName in class MolecularDescriptor
- Returns:
- the readable, external name for ECFP class objects
-
getShortName
Gets the short name of the fingerprint.
- Overrides:
getShortName in class MolecularDescriptor
- Returns:
- the short name used in text outputs (tables etc.)
-
getParametersClassName
Gets the name of the parameters class corresponding to the descriptor.
- Overrides:
getParametersClassName in class MolecularDescriptor
- Returns:
- the name of the parameters class
-
setParameters
Sets the parameters of an already created ECFP object.
- Overrides:
setParameters in class MolecularDescriptor
- Parameters:
parameters - parameter settings for the fingerprint
- Throws:
MDParametersException - any XML error
-
setParameters
Sets the parameters of an already created ECFP object.
- Specified by:
setParameters in class MolecularDescriptor
- Parameters:
parameters - parameter settings for the fingerprint
- Throws:
MDParametersException - any XML error
-
clear
public void clear()
Clears the fingerprint; all values are set to zero.
-
toData
public byte[] toData()
Converts an ECFP object into a byte array. This format can be referred to as an "external representation" since it serves as the data format for storing ECFP fingerprints in databases.
Use the fromData() method to build the ECFP object from this "external" representation.
- Specified by:
toData in class MolecularDescriptor
- Returns:
- byte array representation of the fingerprint object
-
fromData
public void fromData(byte[] data)
Builds an ECFP fingerprint from an external data format created by toData().
- Specified by:
fromData in class MolecularDescriptor
- Parameters:
data - "external" representation of an ECFP object
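toData() and fromData() form a round trip through bytes. The byte layout ECFP actually uses is not documented on this page. The sketch below therefore invents a simple layout, a big-endian count followed by the identifiers, purely to illustrate the round-trip contract. `ExternalFormatSketch` is a hypothetical name, and its output is not compatible with data written by JChem.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical byte layout: [count][id_0]...[id_n-1], each a big-endian 32-bit int. */
class ExternalFormatSketch {

    static byte[] toData(int[] ids) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * (ids.length + 1)); // big-endian by default
        buf.putInt(ids.length);
        for (int id : ids) buf.putInt(id);
        return buf.array();
    }

    static int[] fromData(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] ids = new int[buf.getInt()];
        for (int i = 0; i < ids.length; i++) ids[i] = buf.getInt();
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = {-1723453, 0, 98765};
        byte[] stored = toData(ids);                        // e.g. the value kept in a database column
        System.out.println(Arrays.equals(ids, fromData(stored)));   // prints true
    }
}
```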
-
toString
Converts the fingerprint into a readable string. This is the default external text format of the fingerprint, which can also be stored in an SDfile.
- Specified by:
toString in class MolecularDescriptor
- Returns:
- string representation of the fingerprint
-
toDecimalString
Converts the ECFP fingerprint into a tab-separated string.
- Specified by:
toDecimalString in class MolecularDescriptor
- Returns:
- string representation of the fingerprint
-
toBinaryString
Converts the fingerprint into a fixed-length string of 0 and 1 characters. This string represents the "folded" binary version of the fingerprint.
- Overrides:
toBinaryString in class MolecularDescriptor
- Returns:
- binary string representation of the fingerprint
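A fixed-length 0/1 string is a direct rendering of the folded bit vector. The sketch below assumes that character i corresponds to bit i. The bit order JChem uses is not stated here, and `BinaryStringSketch` is a hypothetical name.

```java
import java.util.BitSet;

/** Hypothetical rendering of a folded bit vector as a fixed-length 0/1 string. */
class BinaryStringSketch {

    /** Character i is '1' exactly when bit i is set; the output always has {@code length} characters. */
    static String toBinaryString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet folded = new BitSet();
        folded.set(0);
        folded.set(3);
        System.out.println(toBinaryString(folded, 8));     // prints 10010000
    }
}
```

Passing the length explicitly keeps trailing zero bits. A BitSet does not remember them itself.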
-
fromString
Builds an ECFP fingerprint from its string representation created by toString().
- Specified by:
fromString in class MolecularDescriptor
- Parameters:
ecfp - ECFP fingerprint string
- Throws:
ParseException
-
toFloatArray
public float[] toFloatArray()
Creates the float array representation of an ECFP fingerprint object.
- Specified by:
toFloatArray in class MolecularDescriptor
- Returns:
- a float array of the fingerprint values
-
fromFloatArray
public void fromFloatArray(float[] descr)
Builds an ECFP fingerprint from its float array representation. Typically used when a hypothesis is created.
- Specified by:
fromFloatArray in class MolecularDescriptor
- Parameters:
descr - fingerprint represented in a float array (e.g. generated by toFloatArray())
-
toIntArray
public int[] toIntArray()
Converts the fingerprint to an array of int identifiers.
-
fromIntArray
public void fromIntArray(int[] array)
Builds an ECFP fingerprint from an array of int identifiers.
-
toIdentiferSet
Converts the fingerprint to a set of Integer identifiers.
-
fromIdentiferSet
Builds an ECFP fingerprint from a set of Integer identifiers.
-
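The int[] views and Set<Integer> views of the identifiers (toIntArray/fromIntArray and toIdentiferSet/fromIdentiferSet) carry the same information in different containers. This hypothetical sketch makes two assumptions that are not stated for ECFP: the set view is duplicate-free, and the array view is returned sorted. `idsToSet` and `setToIds` are illustrative names, not JChem methods.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical conversions between int[] and Set<Integer> identifier views. */
class IdentifierConversions {

    /** Boxes the identifiers into a sorted set; duplicates collapse into a single entry. */
    static Set<Integer> idsToSet(int[] ids) {
        return Arrays.stream(ids).boxed().collect(Collectors.toCollection(TreeSet::new));
    }

    /** Unboxes a set back into a sorted int array. */
    static int[] setToIds(Set<Integer> set) {
        return set.stream().mapToInt(Integer::intValue).sorted().toArray();
    }

    public static void main(String[] args) {
        Set<Integer> set = idsToSet(new int[] {42, -7, 42});
        System.out.println(set);                                // prints [-7, 42]
        System.out.println(Arrays.toString(setToIds(set)));    // prints [-7, 42]
    }
}
```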
toFeatureSet
Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by toIdentiferSet().
Converts the fingerprint to a set of Integer identifiers.
-
fromFeatureSet
@Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public void fromFeatureSet(Set<Integer> set)
Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by fromIdentiferSet().
Builds an ECFP fingerprint from a set of Integer identifiers.
-
toBitSet
Returns a bit vector storing the "folded" binary representation of the fingerprint.
-
getIdentiferCount
public int getIdentiferCount()
Gets the number of integer identifiers generated for the fingerprint.
- Returns:
- the number of identifiers in the fingerprint
-
getFeatureCount
Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by getIdentiferCount().
Gets the number of integer identifiers generated for the fingerprint.
- Returns:
- the number of identifiers in the fingerprint
-
getBrightness
public int getBrightness()
Gets the brightness of the fingerprint, sometimes also called its darkness: the number of bits set to 1 in the fingerprint.
- Returns:
- number of bits set to 1
-
requireBinaryVector
protected void requireBinaryVector()
Checks the binary vector storage and generates it from the identifier list if necessary.
-
dropBinaryVector
public void dropBinaryVector()
Drops the binary vector storage. It will be regenerated when required.
-
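requireBinaryVector() and dropBinaryVector() describe a lazily derived cache: the identifier list is primary, and the binary vector is rebuilt on demand. Below is a hypothetical Java sketch of that pattern. `LazyFingerprint` is an illustrative class whose method names only mirror this page. The modulo folding rule and the BitSet storage are assumptions.

```java
import java.util.BitSet;

/** Hypothetical lazy binary-vector cache; method names only mirror the ECFP docs. */
class LazyFingerprint {
    private final int[] ids;   // identifier list: the primary storage
    private final int length;  // folded vector length (assumed parameter)
    private BitSet fp;         // derived binary vector; null until first needed

    LazyFingerprint(int[] ids, int length) {
        this.ids = ids.clone();
        this.length = length;
    }

    /** Builds the binary vector from the identifier list only if it is not already present. */
    private void requireBinaryVector() {
        if (fp == null) {
            fp = new BitSet(length);
            for (int id : ids) fp.set(Math.floorMod(id, length)); // assumed modulo folding
        }
    }

    /** Releases the derived storage; the next call that needs it rebuilds it. */
    void dropBinaryVector() {
        fp = null;
    }

    boolean hasBinaryVector() {
        return fp != null;
    }

    /** Number of bits set in the (possibly regenerated) binary vector. */
    int getBrightness() {
        requireBinaryVector();
        return fp.cardinality();
    }

    public static void main(String[] args) {
        LazyFingerprint f = new LazyFingerprint(new int[] {1, 65, 2}, 64);
        System.out.println(f.hasBinaryVector());  // prints false
        System.out.println(f.getBrightness());    // 1 and 65 fold onto bit 1: prints 2
        f.dropBinaryVector();
        System.out.println(f.hasBinaryVector());  // prints false
    }
}
```

Dropping the derived vector trades memory for recomputation. This suits situations where many fingerprints are held but only a few are compared bitwise at a time.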
generate
Creates the ECFP fingerprint for the given Molecule by calling the generator created by the corresponding ECFPParameters class.
- Overrides:
generate in class MolecularDescriptor
- Returns:
- property names set in the molecule during generation
- Throws:
MDGeneratorException - if the fingerprint cannot be generated
-
getDissimilarityMetrics
Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.
- Specified by:
getDissimilarityMetrics in class MolecularDescriptor
- Returns:
- the metrics array
-
getDefaultDissimilarityMetricThresholds
public float[] getDefaultDissimilarityMetricThresholds()
Gets the default dissimilarity threshold values for all defined dissimilarity metrics.
- Specified by:
getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
- Returns:
- array of dissimilarity threshold values
-
getDefaultMetricIndex
public int getDefaultMetricIndex()
Gets the index of the default metric. For ECFP, this is Tanimoto.
- Overrides:
getDefaultMetricIndex in class MolecularDescriptor
- Returns:
- metric index of the default metric
-
getDefaultThreshold
public float getDefaultThreshold(int metricIndex)
Gets a metric-dependent default threshold value. Ideally, this value should be based on statistics, though the actual value is not critical: these thresholds are only used in user interfaces to make applications easier for beginners to use.
- Overrides:
getDefaultThreshold in class MolecularDescriptor
- Parameters:
metricIndex - index of a parameterized metric
-
getTanimoto
Calculates the Tanimoto distance.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the Tanimoto distance (dissimilarity coefficient)
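The Tanimoto distance is commonly defined as 1 - |A ∩ B| / |A ∪ B|. The hedged Java sketch below applies that formula to identifier sets. This page does not say whether ECFP.getTanimoto works on the identifier list or on the folded bits, so treat the choice of sets as an assumption. `TanimotoSketch` is a hypothetical name, and treating two empty inputs as identical is a design choice of the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical Tanimoto distance on identifier sets; JChem may compute it differently. */
class TanimotoSketch {

    /** Returns 1 - |A intersect B| / |A union B|; two empty inputs are treated as identical (distance 0). */
    static float tanimotoDistance(Set<Integer> a, Set<Integer> b) {
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b);                                   // |A intersect B|
        int union = a.size() + b.size() - common.size();       // inclusion-exclusion
        if (union == 0) return 0f;
        return 1f - (float) common.size() / union;
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> b = new HashSet<>(Arrays.asList(2, 3, 4));
        System.out.println(tanimotoDistance(a, b));           // 2 shared of 4 distinct: prints 0.5
    }
}
```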
-
getEuclidean
Calculates the Euclidean distance. This is the same as the Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
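For 0/1 vectors, every differing position contributes (1 - 0)^2 = 1 to the squared Euclidean distance. The distance is therefore the square root of the number of differing bits. The sketch below shows this unnormalized form on folded BitSet vectors. Whether JChem normalizes the value, for example by the fingerprint length, is not stated here. `EuclideanSketch` is a hypothetical name.

```java
import java.util.BitSet;

/** Hypothetical Euclidean distance between two folded bit vectors (unnormalized). */
class EuclideanSketch {

    /** Square root of the number of positions where exactly one of the vectors has a 1 bit. */
    static float euclidean(BitSet a, BitSet b) {
        BitSet diff = (BitSet) a.clone();            // copy so the inputs are not modified
        diff.xor(b);                                 // bits set in exactly one vector
        return (float) Math.sqrt(diff.cardinality());
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0);
        a.set(1);
        BitSet b = new BitSet();
        b.set(1);
        b.set(2);
        System.out.println(euclidean(a, b));         // bits 0 and 2 differ: sqrt(2)
    }
}
```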
-
getWeightedEuclidean
Calculates the weighted Euclidean distance. This is the same as the weighted Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getAsymmetricEuclidean
Calculates the asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getWeightedAsymmetricEuclidean
Calculates the weighted asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getDissimilarity
Calculates the dissimilarity ratio between two ECFP objects using the current default metric. The default metric is set in the corresponding ECFPParameters object by setCurrentParametrizedMetric(int metricIndex). For asymmetric distances, swapping the two fingerprints can make a big difference.
- Specified by:
getDissimilarity in class MolecularDescriptor
- Parameters:
other - a fingerprint to which the dissimilarity ratio is measured
- Returns:
- the dissimilarity ratio
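Why argument order matters for asymmetric distances can be seen with a deliberately simple measure: the fraction of the query's bits that are missing from the target. This measure is an illustrative stand-in chosen only to show order dependence. It is not the asymmetric Euclidean metric JChem implements. `AsymmetrySketch` and `missingFraction` are hypothetical names.

```java
import java.util.BitSet;

/** Illustrative asymmetric dissimilarity; NOT the asymmetric Euclidean metric JChem implements. */
class AsymmetrySketch {

    /** Fraction of the query's set bits that are missing from the target (0 when the query is empty). */
    static float missingFraction(BitSet query, BitSet target) {
        int queryBits = query.cardinality();
        if (queryBits == 0) return 0f;
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);                        // bits in the query but not in the target
        return (float) missing.cardinality() / queryBits;
    }

    public static void main(String[] args) {
        BitSet small = new BitSet();
        small.set(1);
        small.set(2);
        BitSet large = new BitSet();
        large.set(1);
        large.set(2);
        large.set(3);
        large.set(4);
        System.out.println(missingFraction(small, large));   // prints 0.0
        System.out.println(missingFraction(large, small));   // prints 0.5
    }
}
```

A substructure-like query contained in a larger molecule scores as fully matched in one direction and only half matched in the other.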
-
getDissimilarity
Calculates the dissimilarity between two ECFP objects using the specified metric; apart from that, it is the same as getDissimilarity(final MolecularDescriptor other).
- Specified by:
getDissimilarity in class MolecularDescriptor
- Parameters:
other - a fingerprint to which the dissimilarity ratio is measured
metricIndex - the index of the metric to be used
- Returns:
- the dissimilarity ratio
- See Also:
-
getAliasNames
-
fromIdentiferSet()
.