Package chemaxon.descriptors
Class ECFP
java.lang.Object
chemaxon.descriptors.MolecularDescriptor
chemaxon.descriptors.ECFP
- All Implemented Interfaces:
chemaxon.license.Licensable, Cloneable
The ECFP class implements Extended-Connectivity Fingerprints (ECFPs) as a type of MolecularDescriptor.
ECFPs are circular topological fingerprints designed for molecular characterization,
similarity searching, and structure-activity modeling.
They are among the most popular similarity search tools in drug discovery, and they are used effectively in a wide variety of applications.
The main properties of ECFPs are the following.
- They represent molecular structures by means of circular atom neighborhoods.
- They can be calculated very rapidly.
- Their features represent the presence of particular substructures.
- They are not predefined and can represent a huge number of different molecular features (including stereochemical information).
- They are designed to represent both the presence and the absence of functionality, since both are crucial for analyzing molecular activity.
- Their generation method can be flexibly customized to produce various types of circular fingerprints for diverse applications.
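The idea of circular atom neighborhoods can be illustrated with a simplified, Morgan-style sketch. This is not ChemAxon's implementation: the class `CircularIdentifierSketch`, the integer atom invariants, and the use of `Arrays.hashCode` as the hash function are all assumptions made for the example. Each round replaces every atom's identifier with a hash of its own identifier plus its sorted neighbor identifiers, so after r rounds an identifier describes a neighborhood of diameter 2r, and identical neighborhoods produce identical identifiers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified sketch of circular (Morgan-style) identifier generation.
// Not ChemAxon's implementation: atom invariants and the hash are placeholders.
final class CircularIdentifierSketch {

    // atomInvariants[i]: initial integer code of atom i (placeholder for a hash of atom properties)
    // neighbors[i]: indices of the atoms bonded to atom i
    // diameter: neighborhood diameter in bonds; diameter / 2 update rounds are run (e.g. 4 -> 2 rounds)
    static Set<Integer> identifiers(int[] atomInvariants, int[][] neighbors, int diameter) {
        int[] current = atomInvariants.clone();
        Set<Integer> ids = new HashSet<>();
        for (int id : current) {
            ids.add(id); // round 0: each atom on its own
        }
        for (int round = 1; round <= diameter / 2; round++) {
            int[] next = new int[current.length];
            for (int atom = 0; atom < current.length; atom++) {
                // Sort neighbor identifiers so the result does not depend on atom numbering.
                int[] nb = new int[neighbors[atom].length];
                for (int k = 0; k < nb.length; k++) {
                    nb[k] = current[neighbors[atom][k]];
                }
                Arrays.sort(nb);
                int[] key = new int[nb.length + 2];
                key[0] = round;
                key[1] = current[atom];
                System.arraycopy(nb, 0, key, 2, nb.length);
                next[atom] = Arrays.hashCode(key); // placeholder hash of the enlarged neighborhood
            }
            current = next;
            for (int id : current) {
                ids.add(id); // identical neighborhoods give identical identifiers; the set merges them
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Toy heavy-atom graph C-O-C, with invariant 6 for carbon and 8 for oxygen.
        int[] invariants = {6, 8, 6};
        int[][] neighbors = {{1}, {0, 2}, {1}};
        System.out.println(identifiers(invariants, neighbors, 2).size()); // 4 distinct identifiers
    }
}
```

Because the two terminal carbons have the same neighborhood, they contribute a single identifier in each round, which is why the toy molecule yields only four identifiers.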
For more information, see the user's guide.
- Since:
- JChem 5.4
-
Field Summary
- protected int brightness: The number of bits set in the binary vector storage
- protected int[] fp: Binary vector storage of the fingerprint
- protected int[] ids: Identifier list storage of the fingerprint
Fields inherited from class chemaxon.descriptors.MolecularDescriptor
params
-
Constructor Summary
- ECFP(): Creates a new, empty instance of ECFP without allocating internal storage.
- ECFP(ECFP ecfp): Copy constructor.
- ECFP(ECFPParameters params): Creates a new instance of ECFP according to the parameters given.
- Creates a new instance of ECFP according to the parameters given.
-
Method Summary
- void clear(): Clears the fingerprint; all values are set to zero.
- clone(): Creates a new instance with identical internal state.
- void dropBinaryVector(): Drops the binary vector storage.
- void fromData(byte[] data): Builds an ECFP fingerprint from an external data format created by toData().
- void fromFeatureSet(Set<Integer> set): Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by fromIdentiferSet().
- void fromFloatArray(float[] descr): Builds an ECFP fingerprint from its float array representation.
- void fromIdentiferSet(Set<Integer> set): Builds an ECFP fingerprint from a set of Integer identifiers.
- void fromIntArray(int[] array): Builds an ECFP fingerprint from an array of int identifiers.
- final void fromString(String ecfp): Builds an ECFP fingerprint from its string representation created by toString().
- String[] generate: Creates the ECFP fingerprint for the given Molecule.
- float getAsymmetricEuclidean(ECFP f): Calculates the asymmetric Euclidean distance.
- int getBrightness(): Gets the brightness of the fingerprint.
- float[] getDefaultDissimilarityMetricThresholds(): Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
- int getDefaultMetricIndex(): Gets the index of the default metric.
- float getDefaultThreshold(int metricIndex): Gets a metric-dependent default threshold value.
- float getDissimilarity(MolecularDescriptor other): Calculates the dissimilarity ratio between two ECFP objects using the current default metric.
- float getDissimilarity(MolecularDescriptor other, int metricIndex): Calculates the dissimilarity between two ECFP objects using the specified metric; apart from that, it is the same as getDissimilarity(final MolecularDescriptor other).
- String[] getDissimilarityMetrics(): Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.
- float getEuclidean(ECFP f): Calculates the Euclidean distance.
- int getFeatureCount(): Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by getIdentiferCount().
- int getIdentiferCount(): Gets the number of integer identifiers generated for the fingerprint.
- getName(): Gets the name of the ECFP fingerprint object.
- getParametersClassName(): Gets the name of the parameters class corresponding to the descriptor.
- getShortName(): Gets the short name of the fingerprint.
- float getTanimoto(ECFP f): Calculates the Tanimoto distance.
- float getWeightedAsymmetricEuclidean(ECFP f): Calculates the weighted asymmetric Euclidean distance.
- float getWeightedEuclidean(ECFP f): Calculates the weighted Euclidean distance.
- boolean isLicensed(): Returns information about the licensing of the product.
- protected void requireBinaryVector(): Checks the binary vector storage and generates it from the identifier list if necessary.
- void setLicenseEnvironment: Sets the license environment.
- void setParameters(MDParameters parameters): Sets the parameters of an already created ECFP object.
- void setParameters(String parameters): Sets the parameters of an already created ECFP object.
- toBinaryString(): Converts the fingerprint into a fixed-length 0,1 string.
- toBitSet(): Returns a bit vector storing the "folded" binary representation of the fingerprint.
- byte[] toData(): Converts an ECFP object into a byte array.
- final String toDecimalString(): Converts the ECFP fingerprint into a tab-separated string.
- toFeatureSet(): Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by toIdentiferSet().
- float[] toFloatArray(): Creates the float array representation of an ECFP fingerprint object.
- toIdentiferSet(): Converts the fingerprint to a set of Integer identifiers.
- int[] toIntArray(): Converts the fingerprint to an array of int identifiers.
- final String toString(): Converts the fingerprint into a readable string.
Methods inherited from class chemaxon.descriptors.MolecularDescriptor
generate, getAtomSetColors, getAtomSetIndexes, getAtomSetNames, getDissimilarityMetricIndex, getLowerBound, getMetricIndex, getMetricName, getMetricName, getNumberOfMetrics, getNumberOfWeights, getParameters, getThreshold, getThreshold, needsConfig, newInstance, newInstance, newInstanceFromXML, setScreeningConfiguration
-
Field Details
-
ids
protected int[] ids
Identifier list storage of the fingerprint.
-
fp
protected int[] fp
Binary vector storage of the fingerprint.
-
brightness
protected int brightness
The number of bits set in the binary vector storage.
-
-
Constructor Details
-
ECFP
public ECFP()
Creates a new, empty instance of ECFP without allocating internal storage.
-
ECFP
Creates a new instance of ECFP according to the parameters given.
- Parameters:
params - parameter settings
-
ECFP
Creates a new instance of ECFP according to the parameters given.
- Parameters:
params - parameter settings
-
ECFP
Copy constructor. An identical copy of the ECFP fingerprint passed is created. The old and the new instances share the same ECFPParameters object.
- Parameters:
ecfp - fingerprint to be copied
-
-
Method Details
-
clone
Creates a new instance with identical internal state.
- Specified by:
clone in class MolecularDescriptor
- Returns:
- the newly copied object
-
isLicensed
public boolean isLicensed()
Returns information about the licensing of the product.
- Specified by:
isLicensed in interface chemaxon.license.Licensable
- Returns:
- true if the product is correctly licensed
-
setLicenseEnvironment
Sets the license environment.
- Specified by:
setLicenseEnvironment in interface chemaxon.license.Licensable
-
getName
Gets the name of the ECFP fingerprint object. This name differs from the class name: it is more readable and more meaningful to end users.
- Overrides:
getName in class MolecularDescriptor
- Returns:
- the readable, external name for ECFP class objects
-
getShortName
Gets the short name of the fingerprint.
- Overrides:
getShortName in class MolecularDescriptor
- Returns:
- the short name used in text outputs (tables etc.)
-
getParametersClassName
Gets the name of the parameters class corresponding to the descriptor.
- Overrides:
getParametersClassName in class MolecularDescriptor
- Returns:
- the name of the parameters class
-
setParameters
Sets the parameters of an already created ECFP object.
- Overrides:
setParameters in class MolecularDescriptor
- Parameters:
parameters - parameter settings for the fingerprint
- Throws:
MDParametersException - any XML error
-
setParameters
Sets the parameters of an already created ECFP object.
- Specified by:
setParameters in class MolecularDescriptor
- Parameters:
parameters - parameter settings for the fingerprint
- Throws:
MDParametersException - any XML error
-
clear
public void clear()
Clears the fingerprint; all values are set to zero.
-
toData
public byte[] toData()
Converts an ECFP object into a byte array. This format can be referred to as an "external representation", since it serves as the data format for storing ECFP fingerprints in databases.
Use the fromData() method to build the ECFP object from this "external" representation.
- Specified by:
toData in class MolecularDescriptor
- Returns:
- byte array representation of the fingerprint object
-
fromData
public void fromData(byte[] data)
Builds an ECFP fingerprint from an external data format created by toData().
- Specified by:
fromData in class MolecularDescriptor
- Parameters:
data - "external" representation of an ECFP object
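A byte-array round trip of the kind toData() and fromData() provide can be sketched in plain Java. The layout below (a 4-byte count followed by big-endian 4-byte identifiers) and the class `IdentifierBytesSketch` are assumptions for illustration only; the actual format written by toData() is not documented on this page, so real ECFP objects should always be stored and restored with toData() and fromData().

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of an "external" byte representation of an identifier list.
// The layout [count][id0][id1]... is an assumption, not ChemAxon's format.
final class IdentifierBytesSketch {

    static byte[] toBytes(int[] ids) {
        ByteBuffer buf = ByteBuffer.allocate(4 * (ids.length + 1)); // big-endian by default
        buf.putInt(ids.length);
        for (int id : ids) {
            buf.putInt(id);
        }
        return buf.array();
    }

    static int[] fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int n = buf.getInt();
        // Reject counts that do not match the remaining payload.
        if (n < 0 || n > buf.remaining() / 4) {
            throw new IllegalArgumentException("corrupt data: bad identifier count " + n);
        }
        int[] ids = new int[n];
        for (int i = 0; i < n; i++) {
            ids[i] = buf.getInt();
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = {-1234567, 42, 987654321};
        byte[] data = toBytes(ids);
        System.out.println(data.length + " bytes, round trip ok: " + Arrays.equals(ids, fromBytes(data)));
    }
}
```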
-
toString
Converts the fingerprint into a readable string. This is the default external text format of the fingerprint, which can also be stored in an SDfile.
- Specified by:
toString in class MolecularDescriptor
- Returns:
- string representation of the fingerprint
-
toDecimalString
Converts the ECFP fingerprint into a tab-separated string.
- Specified by:
toDecimalString in class MolecularDescriptor
- Returns:
- string representation of the fingerprint
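Assuming the tab-separated string simply lists the fingerprint's integer identifiers in decimal (this page does not say exactly what the fields are), the conversion and its inverse could look like the hypothetical sketch below. The class `DecimalStringSketch` and the parsing method are illustrative only; for reading fingerprints back, the documented pair is toString() and fromString().

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch: integer identifiers <-> tab-separated decimal string.
final class DecimalStringSketch {

    static String toDecimalString(int[] ids) {
        return Arrays.stream(ids)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining("\t"));
    }

    static int[] fromDecimalString(String s) {
        if (s.isEmpty()) {
            return new int[0]; // an empty fingerprint has no fields
        }
        return Arrays.stream(s.split("\t"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        int[] ids = {1, -2, 30};
        String s = toDecimalString(ids);
        System.out.println(s.replace("\t", "<TAB>")); // 1<TAB>-2<TAB>30
        System.out.println(Arrays.equals(ids, fromDecimalString(s))); // true
    }
}
```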
-
toBinaryString
Converts the fingerprint into a fixed-length 0,1 string. This string represents the "folded" binary version of the fingerprint.
- Overrides:
toBinaryString in class MolecularDescriptor
- Returns:
- binary string representation of the fingerprint
-
fromString
Builds an ECFP fingerprint from its string representation created by toString().
- Specified by:
fromString in class MolecularDescriptor
- Parameters:
ecfp - ECFP fingerprint string
- Throws:
ParseException
-
toFloatArray
public float[] toFloatArray()
Creates the float array representation of an ECFP fingerprint object.
- Specified by:
toFloatArray in class MolecularDescriptor
- Returns:
- a float array of the fingerprint values
-
fromFloatArray
public void fromFloatArray(float[] descr)
Builds an ECFP fingerprint from its float array representation. Typically used when a hypothesis is created.
- Specified by:
fromFloatArray in class MolecularDescriptor
- Parameters:
descr - fingerprint represented in a float array (e.g. generated by toFloatArray())
-
toIntArray
public int[] toIntArray()
Converts the fingerprint to an array of int identifiers.
-
fromIntArray
public void fromIntArray(int[] array)
Builds an ECFP fingerprint from an array of int identifiers.
-
toIdentiferSet
Converts the fingerprint to a set of Integer identifiers.
-
fromIdentiferSet
Builds an ECFP fingerprint from a set of Integer identifiers.
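The relationship between the int[] view (toIntArray(), fromIntArray()) and the Set<Integer> view (toIdentiferSet(), fromIdentiferSet()) can be sketched in plain Java. The class `IdentifierSetSketch` is hypothetical; it assumes that repeated identifiers collapse into one set element and sorts the array on the way back so that equal sets give equal arrays. ChemAxon's own ordering is not specified here.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: converting between int[] identifiers and a Set<Integer> view.
final class IdentifierSetSketch {

    static Set<Integer> toIdentifierSet(int[] ids) {
        Set<Integer> set = new TreeSet<>();
        for (int id : ids) {
            set.add(id); // repeated identifiers collapse into one element
        }
        return set;
    }

    static int[] fromIdentifierSet(Set<Integer> set) {
        // Sorted so that equal sets always give equal arrays.
        return set.stream().mapToInt(Integer::intValue).sorted().toArray();
    }

    public static void main(String[] args) {
        Set<Integer> set = toIdentifierSet(new int[] {42, -7, 42});
        System.out.println(set); // [-7, 42]
        System.out.println(Arrays.toString(fromIdentifierSet(set))); // [-7, 42]
    }
}
```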
toFeatureSet
Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by toIdentiferSet().
Converts the fingerprint to a set of Integer identifiers.
-
fromFeatureSet
@Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public void fromFeatureSet(Set<Integer> set)
Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by fromIdentiferSet().
Builds an ECFP fingerprint from a set of Integer identifiers.
-
toBitSet
Returns a bit vector storing the "folded" binary representation of the fingerprint.
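A "folded" representation maps an open-ended set of integer identifiers onto a fixed number of bits. A common scheme, assumed here rather than taken from ChemAxon's code, is to set bit (identifier mod length). The hypothetical `FoldingSketch` below builds a java.util.BitSet that way and prints it as a 0/1 string, in the spirit of toBitSet() and toBinaryString(). It also shows why folding loses information: two identifiers can land on the same bit.

```java
import java.util.BitSet;

// Hypothetical sketch: folds integer identifiers into a fixed-length bit vector.
final class FoldingSketch {

    // Sets bit floorMod(id, length) for every identifier (assumed folding rule).
    static BitSet fold(int[] ids, int length) {
        BitSet bits = new BitSet(length);
        for (int id : ids) {
            bits.set(Math.floorMod(id, length)); // floorMod keeps negative identifiers in range
        }
        return bits;
    }

    // Renders the first `length` bits as a 0/1 string, lowest index first.
    static String toBinaryString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet bits = fold(new int[] {3, 11, -1}, 8); // 3 and 11 collide on bit 3; -1 maps to bit 7
        System.out.println(toBinaryString(bits, 8));  // 00010001
        System.out.println(bits.cardinality());       // 2
    }
}
```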
getIdentiferCount
public int getIdentiferCount()
Gets the number of integer identifiers generated for the fingerprint.
- Returns:
- the number of identifiers in the fingerprint
-
getFeatureCount
Deprecated, for removal: This API element is subject to removal in a future version. As of JChem 5.4.1, replaced by getIdentiferCount().
Gets the number of integer identifiers generated for the fingerprint.
- Returns:
- the number of identifiers in the fingerprint
-
getBrightness
public int getBrightness()
Gets the brightness of the fingerprint, sometimes also called its darkness: the number of bits set to 1 in the fingerprint.
- Returns:
- number of bits set to 1
-
requireBinaryVector
protected void requireBinaryVector()
Checks the binary vector storage and generates it from the identifier list if necessary.
-
dropBinaryVector
public void dropBinaryVector()
Drops the binary vector storage. It will be regenerated when required.
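The ids, fp and brightness fields, together with requireBinaryVector() and dropBinaryVector(), describe a lazily built cache: the identifier list is kept, and the binary vector is derived from it only when needed. The hypothetical class below sketches that pattern. The folding rule (identifier mod length), the packing of 32 bits per int, the `length` parameter and all names are assumptions, not ChemAxon's code.

```java
// Hypothetical sketch of a lazily built binary vector, mirroring the ids / fp / brightness fields.
final class LazyFingerprintSketch {
    private final int[] ids;   // identifier list storage (always kept)
    private final int length;  // folded length in bits (assumed parameter)
    private int[] fp;          // binary vector, 32 bits per int; null until needed
    private int brightness;    // number of bits set in fp

    LazyFingerprintSketch(int[] ids, int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        this.ids = ids.clone();
        this.length = length;
    }

    // Builds fp (and brightness) from the identifier list only if it is missing.
    private void requireBinaryVector() {
        if (fp != null) {
            return;
        }
        int[] words = new int[(length + 31) / 32];
        for (int id : ids) {
            int bit = Math.floorMod(id, length); // assumed folding rule
            words[bit >>> 5] |= 1 << (bit & 31);
        }
        int count = 0;
        for (int word : words) {
            count += Integer.bitCount(word);
        }
        fp = words;
        brightness = count;
    }

    // Frees the binary vector; it is rebuilt on the next access.
    void dropBinaryVector() {
        fp = null;
    }

    boolean hasBinaryVector() {
        return fp != null;
    }

    int getBrightness() {
        requireBinaryVector();
        return brightness;
    }

    boolean getBit(int index) {
        requireBinaryVector();
        return (fp[index >>> 5] & (1 << (index & 31))) != 0;
    }

    public static void main(String[] args) {
        LazyFingerprintSketch f = new LazyFingerprintSketch(new int[] {3, 11, -1}, 64);
        System.out.println(f.hasBinaryVector()); // false: nothing built yet
        System.out.println(f.getBrightness());   // 3 (bits 3, 11 and 63)
        f.dropBinaryVector();
        System.out.println(f.hasBinaryVector()); // false again until the next access
    }
}
```

Keeping the identifier list as the primary storage lets the binary vector be discarded to save memory and rebuilt on demand, which matches the behavior described for dropBinaryVector().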
generate
Creates the ECFP fingerprint for the given Molecule by calling the generator created by the corresponding ECFPParameters class.
- Overrides:
generate in class MolecularDescriptor
- Returns:
- property names set in the molecule during generation
- Throws:
MDGeneratorException - if the fingerprint cannot be generated
-
getDissimilarityMetrics
Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.
- Specified by:
getDissimilarityMetrics in class MolecularDescriptor
- Returns:
- the metrics array
-
getDefaultDissimilarityMetricThresholds
public float[] getDefaultDissimilarityMetricThresholds()
Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
- Specified by:
getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
- Returns:
- array of dissimilarity threshold values
-
getDefaultMetricIndex
public int getDefaultMetricIndex()
Gets the index of the default metric. In the case of ECFP, this is Tanimoto.
- Overrides:
getDefaultMetricIndex in class MolecularDescriptor
- Returns:
- metric index of the default metric
-
getDefaultThreshold
public float getDefaultThreshold(int metricIndex)
Gets a metric-dependent default threshold value. Ideally, this value should be based on statistics, but the actual value is not critical, since these thresholds are used only in user interfaces to make applications easier for beginners.
- Overrides:
getDefaultThreshold in class MolecularDescriptor
- Parameters:
metricIndex - index of a parameterized metric
-
getTanimoto
Calculates the Tanimoto distance.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the Tanimoto distance (dissimilarity coefficient)
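The Tanimoto distance is one minus the Tanimoto similarity, i.e. 1 - |A ∩ B| / |A ∪ B|. The sketch below computes it on sets of integer identifiers. Whether getTanimoto() works on the identifier sets or on the folded bit vector is not stated here, so the choice of sets, the handling of two empty inputs, and the class name `TanimotoSketch` are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the Tanimoto distance between two identifier sets.
final class TanimotoSketch {

    // Returns 1 - |intersection| / |union|; two empty sets are treated as identical (distance 0) by choice.
    static float tanimotoDistance(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0f;
        }
        Set<Integer> common = new HashSet<>(a); // copy so the caller's set is not modified
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return 1f - (float) common.size() / union;
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2, 3);
        Set<Integer> b = Set.of(2, 3, 4);
        System.out.println(tanimotoDistance(a, b)); // 0.5 (2 shared out of 4 distinct)
    }
}
```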
-
getEuclidean
Calculates the Euclidean distance. This is the same as the Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
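For bit strings, each differing position contributes (1 - 0)^2 = 1 to the squared distance, so the Euclidean distance reduces to the square root of the number of differing bits. The sketch below shows this unnormalized form on java.util.BitSet values. ChemAxon may scale or normalize the value differently, so the exact formula and the class name `BitEuclideanSketch` are assumptions for illustration.

```java
import java.util.BitSet;

// Hypothetical sketch: Euclidean distance between two 0/1 vectors stored as BitSets.
final class BitEuclideanSketch {

    // Square root of the number of positions where the vectors differ (unnormalized).
    static float euclidean(BitSet a, BitSet b) {
        BitSet diff = (BitSet) a.clone();
        diff.xor(b); // 1 exactly where a and b differ
        return (float) Math.sqrt(diff.cardinality());
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0, 3); // bits 0, 1, 2
        BitSet b = new BitSet();
        b.set(1, 5); // bits 1, 2, 3, 4
        System.out.println(euclidean(a, b)); // sqrt(3), since bits 0, 3 and 4 differ
    }
}
```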
-
getWeightedEuclidean
Calculates the weighted Euclidean distance. This is the same as the weighted Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getAsymmetricEuclidean
Calculates the asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getWeightedAsymmetricEuclidean
Calculates the weighted asymmetric Euclidean distance. This is the same as the weighted asymmetric Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getDissimilarity
Calculates the dissimilarity ratio between two ECFP objects using the current default metric. The default metric is set in the corresponding ECFPParameters object by setCurrentParametrizedMetric(int metricIndex). For asymmetric distances, swapping the two fingerprints can make a big difference.
- Specified by:
getDissimilarity in class MolecularDescriptor
- Parameters:
other - a fingerprint to which the dissimilarity ratio is measured
- Returns:
- the dissimilarity ratio
-
getDissimilarity
Calculates the dissimilarity between two ECFP objects using the specified metric; apart from that, it is the same as getDissimilarity(final MolecularDescriptor other).
- Specified by:
getDissimilarity in class MolecularDescriptor
- Parameters:
other - a fingerprint to which the dissimilarity ratio is measured
metricIndex - the index of the metric to be used
- Returns:
- the dissimilarity ratio
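The two getDissimilarity overloads can be pictured as a dispatch on the metric index, where the one-argument form delegates using the default index. In this hypothetical sketch the metric names, their order and the fixed default (Tanimoto at index 0) are assumptions; in the real class the metric list comes from getDissimilarityMetrics() and the current default from the ECFPParameters object.

```java
import java.util.BitSet;

// Hypothetical sketch of metric dispatch by index, with a default metric.
final class MetricDispatchSketch {
    static final String[] METRICS = {"Tanimoto", "Euclidean"}; // assumed names and order
    static final int DEFAULT_METRIC = 0;                        // assumed default: Tanimoto

    private final BitSet bits;

    MetricDispatchSketch(int... setBits) {
        bits = new BitSet();
        for (int b : setBits) {
            bits.set(b);
        }
    }

    // One-argument form: uses the default metric.
    float getDissimilarity(MetricDispatchSketch other) {
        return getDissimilarity(other, DEFAULT_METRIC);
    }

    float getDissimilarity(MetricDispatchSketch other, int metricIndex) {
        switch (metricIndex) {
            case 0: { // Tanimoto distance on the bit vectors
                BitSet common = (BitSet) bits.clone();
                common.and(other.bits);
                BitSet union = (BitSet) bits.clone();
                union.or(other.bits);
                return union.isEmpty() ? 0f : 1f - (float) common.cardinality() / union.cardinality();
            }
            case 1: { // unnormalized Euclidean distance on 0/1 vectors
                BitSet diff = (BitSet) bits.clone();
                diff.xor(other.bits);
                return (float) Math.sqrt(diff.cardinality());
            }
            default:
                throw new IllegalArgumentException("unknown metric index: " + metricIndex);
        }
    }

    public static void main(String[] args) {
        MetricDispatchSketch a = new MetricDispatchSketch(0, 1);
        MetricDispatchSketch b = new MetricDispatchSketch(1, 2);
        for (int i = 0; i < METRICS.length; i++) {
            System.out.println(METRICS[i] + ": " + a.getDissimilarity(b, i));
        }
    }
}
```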