Package chemaxon.descriptors
Class ECFP
- java.lang.Object
  - chemaxon.descriptors.MolecularDescriptor
    - chemaxon.descriptors.ECFP
- All Implemented Interfaces: chemaxon.license.Licensable, Cloneable
@PublicAPI public class ECFP extends MolecularDescriptor implements chemaxon.license.Licensable
The ECFP class implements Extended-Connectivity Fingerprints (ECFPs) as a type of MolecularDescriptor. ECFPs are circular topological fingerprints designed for molecular characterization, similarity searching, and structure-activity modeling. They are among the most popular similarity search tools in drug discovery and they are effectively used in a wide variety of applications.
The main properties of ECFPs are the following.
- They represent molecular structures by means of circular atom neighborhoods.
- They can be very rapidly calculated.
- Their features represent the presence of particular substructures.
- They are not predefined and can represent a huge number of different molecular features (including stereochemical information).
- They are designed to represent both the presence and the absence of functionality, since both are crucial for analyzing molecular activity.
- Their generation method can be flexibly customized to produce various types of circular fingerprints for diverse applications.
For more information, see the user's guide.
- Since: JChem 5.4
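The idea of circular atom neighborhoods described above can be illustrated with a small standalone sketch. It is not the ChemAxon generator: the class name `CircularSketch`, the adjacency-list input, and the hash function are all assumptions made for illustration, and a real generator typically also encodes bond information and removes duplicate substructures.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified illustration of circular-neighborhood identifiers.
class CircularSketch {
    /** Collects one identifier per atom for every radius from 0 to the given radius. */
    static Set<Integer> identifiers(int[] atomTypes, int[][] neighbors, int radius) {
        int n = atomTypes.length;
        int[] current = atomTypes.clone();           // radius 0: the atom type itself
        Set<Integer> ids = new TreeSet<>();
        for (int id : current) ids.add(id);
        for (int r = 1; r <= radius; r++) {
            int[] next = new int[n];
            for (int a = 0; a < n; a++) {
                int[] around = new int[neighbors[a].length];
                for (int k = 0; k < around.length; k++) around[k] = current[neighbors[a][k]];
                Arrays.sort(around);                 // make the result independent of neighbor order
                int h = 31 * r + current[a];         // arbitrary hash, chosen only for this sketch
                for (int v : around) h = 31 * h + v;
                next[a] = h;
            }
            current = next;
            for (int id : current) ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Three carbons in a chain (propane-like skeleton), atom type 6 for carbon.
        int[] types = {6, 6, 6};
        int[][] bonds = {{1}, {0, 2}, {1}};
        System.out.println(identifiers(types, bonds, 1)); // three distinct identifiers
    }
}
```

The two terminal atoms receive the same identifier at radius 1 because their neighborhoods are equivalent, which is why the set holds three values rather than six.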
-
-
Field Summary
Fields (modifier and type, field: description)
- protected int brightness: The number of bits set in the binary vector storage
- protected int[] fp: Binary vector storage of the fingerprint
- protected int[] ids: Identifier list storage of the fingerprint
Fields inherited from class chemaxon.descriptors.MolecularDescriptor
params
-
-
Constructor Summary
Constructors (constructor: description)
- ECFP(): Creates a new, empty instance of ECFP without allocating internal storage.
- ECFP(ECFP ecfp): Copy constructor.
- ECFP(ECFPParameters params): Creates a new instance of ECFP according to the parameters given.
- ECFP(String params): Creates a new instance of ECFP according to the parameters given.
-
Method Summary
Methods (modifier and type, method: description)
- void clear(): Clears the fingerprint; all values are set to zero.
- ECFP clone(): Creates a new instance with identical internal state.
- void dropBinaryVector(): Drops the binary vector storage.
- void fromData(byte[] data): Builds an ECFP fingerprint from an external data format created by toData().
- void fromFeatureSet(Set<Integer> set): Deprecated. As of JChem 5.4.1, replaced by fromIdentiferSet().
- void fromFloatArray(float[] descr): Builds an ECFP fingerprint from its float array representation.
- void fromIdentiferSet(Set<Integer> set): Builds an ECFP fingerprint from a set of Integer identifiers.
- void fromIntArray(int[] array): Builds an ECFP fingerprint from an array of int identifiers.
- void fromString(String ecfp): Builds an ECFP fingerprint from its string representation created by toString().
- String[] generate(Molecule m): Creates the ECFP fingerprint for the given Molecule.
- List<String> getAliasNames()
- float getAsymmetricEuclidean(ECFP f): Calculates the asymmetric Euclidean distance.
- int getBrightness(): Gets the brightness of the fingerprint.
- float[] getDefaultDissimilarityMetricThresholds(): Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
- int getDefaultMetricIndex(): Gets the index of the default metric.
- float getDefaultThreshold(int metricIndex): Gets a metric dependent default threshold value.
- float getDissimilarity(MolecularDescriptor other): Calculates the dissimilarity ratio between two ECFP objects using the current default metric.
- float getDissimilarity(MolecularDescriptor other, int metricIndex): Calculates the dissimilarity between two ECFP objects using the specified metric. Otherwise it behaves the same as getDissimilarity(MolecularDescriptor other).
- String[] getDissimilarityMetrics(): Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.
- float getEuclidean(ECFP f): Calculates the Euclidean distance.
- int getFeatureCount(): Deprecated. As of JChem 5.4.1, replaced by getIdentiferCount().
- int getIdentiferCount(): Gets the number of integer identifiers generated for the fingerprint.
- String getName(): Gets the name of the ECFP fingerprint object.
- String getParametersClassName(): Gets the name of the parameters class corresponding to the descriptor.
- String getShortName(): Gets the short name of the fingerprint.
- float getTanimoto(ECFP f): Calculates the Tanimoto distance.
- float getWeightedAsymmetricEuclidean(ECFP f): Calculates the weighted asymmetric Euclidean distance.
- float getWeightedEuclidean(ECFP f): Calculates the weighted Euclidean distance.
- boolean isLicensed(): Returns information about the licensing of the product.
- protected void requireBinaryVector(): Checks the binary vector storage and generates it from the identifier list if necessary.
- void setLicenseEnvironment(String env): Sets the license environment.
- void setParameters(MDParameters parameters): Sets the parameters of an already created ECFP object.
- void setParameters(String parameters): Sets the parameters of an already created ECFP object.
- String toBinaryString(): Converts the fingerprint into a fixed-length 0,1 string.
- BitSet toBitSet(): Returns a bit vector storing the "folded" binary representation of the fingerprint.
- byte[] toData(): Converts an ECFP object into a byte array.
- String toDecimalString(): Converts the ECFP fingerprint into a tab separated string.
- Set<Integer> toFeatureSet(): Deprecated. As of JChem 5.4.1, replaced by toIdentiferSet().
- float[] toFloatArray(): Creates the float array representation of an ECFP fingerprint object.
- Set<Integer> toIdentiferSet(): Converts the fingerprint to a set of Integer identifiers.
- int[] toIntArray(): Converts the fingerprint to an array of int identifiers.
- String toString(): Converts the fingerprint into a readable string.
Methods inherited from class chemaxon.descriptors.MolecularDescriptor
generate, getAtomSetColors, getAtomSetIndexes, getAtomSetNames, getDissimilarityMetricIndex, getLowerBound, getMetricIndex, getMetricName, getMetricName, getNumberOfMetrics, getNumberOfWeights, getParameters, getThreshold, getThreshold, main, needsConfig, newInstance, newInstance, newInstanceFromXML, newInstanceSupplier, setScreeningConfiguration
-
-
-
-
Constructor Detail
-
ECFP
public ECFP()
Creates a new, empty instance of ECFP without allocating internal storage.
-
ECFP
public ECFP(ECFPParameters params)
Creates a new instance of ECFP according to the parameters given.
- Parameters: params - parameter settings
-
ECFP
public ECFP(String params)
Creates a new instance of ECFP according to the parameters given.
- Parameters: params - parameter settings
-
ECFP
public ECFP(ECFP ecfp)
Copy constructor. An identical copy of the ECFP fingerprint passed is created. The old and the new instances share the same ECFPParameters object.
- Parameters: ecfp - fingerprint to be copied
-
-
Method Detail
-
clone
public ECFP clone()
Creates a new instance with identical internal state.
- Specified by: clone in class MolecularDescriptor
- Returns: the newly copied object
-
isLicensed
public boolean isLicensed()
Returns information about the licensing of the product.
- Specified by: isLicensed in interface chemaxon.license.Licensable
- Returns: true if the product is correctly licensed
-
setLicenseEnvironment
public void setLicenseEnvironment(String env)
Sets the license environment.
- Specified by: setLicenseEnvironment in interface chemaxon.license.Licensable
-
getName
public String getName()
Gets the name of the ECFP fingerprint object. This name differs from the class name: it is a friendlier name that is more meaningful to end users.
- Overrides: getName in class MolecularDescriptor
- Returns: the external, user-friendly name for ECFP class objects
-
getShortName
public String getShortName()
Gets the short name of the fingerprint.
- Overrides: getShortName in class MolecularDescriptor
- Returns: the short name used in text outputs (tables etc.)
-
getParametersClassName
public String getParametersClassName()
Gets the name of the parameters class corresponding to the descriptor.
- Overrides: getParametersClassName in class MolecularDescriptor
- Returns: the name of the parameters class
-
setParameters
public void setParameters(MDParameters parameters) throws MDParametersException
Sets the parameters of an already created ECFP object.
- Overrides: setParameters in class MolecularDescriptor
- Parameters: parameters - parameter settings for the fingerprint
- Throws: MDParametersException - any XML error
-
setParameters
public void setParameters(String parameters) throws MDParametersException
Sets the parameters of an already created ECFP object.
- Specified by: setParameters in class MolecularDescriptor
- Parameters: parameters - parameter settings for the fingerprint
- Throws: MDParametersException - any XML error
-
clear
public void clear()
Clears the fingerprint; all values are set to zero.
-
toData
public byte[] toData()
Converts an ECFP object into a byte array. This format can be referred to as an "external representation" since it serves as the data format for storing ECFP fingerprints in databases.
Use the fromData() method to build the ECFP object from this "external" representation.
- Specified by: toData in class MolecularDescriptor
- Returns: byte array representation of the fingerprint object
-
fromData
public void fromData(byte[] data)
Builds an ECFP fingerprint from an external data format created by toData().
- Specified by: fromData in class MolecularDescriptor
- Parameters: data - "external" representation of an ECFP object
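The round trip between toData() and fromData() can be pictured with a standalone sketch. The actual byte layout used by ECFP is not described here, so the layout below (4 big-endian bytes per identifier) and the class name `IdentifierCodec` are assumptions for illustration only.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical byte layout: each int identifier stored as 4 big-endian bytes.
class IdentifierCodec {
    /** Packs the identifiers into a byte array. */
    static byte[] toData(int[] ids) {
        ByteBuffer buf = ByteBuffer.allocate(ids.length * Integer.BYTES);
        for (int id : ids) buf.putInt(id);
        return buf.array();
    }

    /** Restores the identifiers from bytes written by toData. */
    static int[] fromData(byte[] data) {
        if (data.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] ids = new int[data.length / Integer.BYTES];
        for (int i = 0; i < ids.length; i++) ids[i] = buf.getInt();
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = {-1510328189, 42, 7};
        System.out.println(Arrays.equals(ids, fromData(toData(ids)))); // prints true
    }
}
```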
-
toString
public final String toString()
Converts the fingerprint into a readable string. This is the default external text format of the fingerprint, which can also be stored in an SDfile.
- Specified by: toString in class MolecularDescriptor
- Returns: string representation of the fingerprint
-
toDecimalString
public final String toDecimalString()
Converts the ECFP fingerprint into a tab separated string.
- Specified by: toDecimalString in class MolecularDescriptor
- Returns: string representation of the fingerprint
-
toBinaryString
public String toBinaryString()
Converts the fingerprint into a fixed-length 0,1 string. This string represents the "folded" binary version of the fingerprint.
- Overrides: toBinaryString in class MolecularDescriptor
- Returns: binary string representation of the fingerprint
-
fromString
public final void fromString(String ecfp) throws ParseException
Builds an ECFP fingerprint from its string representation created by toString().
- Specified by: fromString in class MolecularDescriptor
- Parameters: ecfp - ECFP fingerprint string
- Throws: ParseException
-
toFloatArray
public float[] toFloatArray()
Creates the float array representation of an ECFP fingerprint object.
- Specified by: toFloatArray in class MolecularDescriptor
- Returns: a float array of the fingerprint values
-
fromFloatArray
public void fromFloatArray(float[] descr)
Builds an ECFP fingerprint from its float array representation. Typically used when a hypothesis is created.
- Specified by: fromFloatArray in class MolecularDescriptor
- Parameters: descr - fingerprint represented in a float array (e.g. generated by toFloatArray())
-
toIntArray
public int[] toIntArray()
Converts the fingerprint to an array of int identifiers.
-
fromIntArray
public void fromIntArray(int[] array)
Builds an ECFP fingerprint from an array of int identifiers.
-
toIdentiferSet
public Set<Integer> toIdentiferSet()
Converts the fingerprint to a set of Integer identifiers.
-
fromIdentiferSet
public void fromIdentiferSet(Set<Integer> set)
Builds an ECFP fingerprint from a set of Integer identifiers.
-
toFeatureSet
@Deprecated public Set<Integer> toFeatureSet()
Deprecated. As of JChem 5.4.1, replaced by toIdentiferSet().
Converts the fingerprint to a set of Integer identifiers.
-
fromFeatureSet
@Deprecated public void fromFeatureSet(Set<Integer> set)
Deprecated. As of JChem 5.4.1, replaced by fromIdentiferSet().
Builds an ECFP fingerprint from a set of Integer identifiers.
-
toBitSet
public BitSet toBitSet()
Returns a bit vector storing the "folded" binary representation of the fingerprint.
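To make the notion of a "folded" binary representation concrete, here is a minimal sketch that maps arbitrary int identifiers onto a fixed-length bit vector and renders it as a 0,1 string. The modulo-based folding, the class name `FoldingSketch`, and the length of 8 bits are assumptions for illustration; the folding scheme actually used by ECFP is not specified in this page.

```java
import java.util.BitSet;

// Hypothetical folding: identifier -> bit index via a non-negative modulo.
class FoldingSketch {
    /** Sets one bit per identifier; different identifiers may collide on the same bit. */
    static BitSet fold(int[] ids, int length) {
        BitSet bits = new BitSet(length);
        for (int id : ids) bits.set(Math.floorMod(id, length));
        return bits;
    }

    /** Renders the first 'length' bits as a fixed-length string of '0' and '1'. */
    static String toBinaryString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) sb.append(bits.get(i) ? '1' : '0');
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] ids = {1, 9, 4, -3};          // 1 and 9 collide on bit 1 when length is 8
        BitSet folded = fold(ids, 8);
        System.out.println(toBinaryString(folded, 8)); // prints 01001100
    }
}
```

Folding trades information for a fixed size: the four identifiers above set only three bits because two of them collide.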
-
getIdentiferCount
public int getIdentiferCount()
Gets the number of integer identifiers generated for the fingerprint.
- Returns: the number of identifiers in the fingerprint
-
getFeatureCount
@Deprecated public int getFeatureCount()
Deprecated. As of JChem 5.4.1, replaced by getIdentiferCount().
Gets the number of integer identifiers generated for the fingerprint.
- Returns: the number of identifiers in the fingerprint
-
getBrightness
public int getBrightness()
Gets the brightness of the fingerprint, sometimes also called the darkness. More precisely, this method returns the number of bits set to 1 in the fingerprint.
- Returns: number of bits set to 1
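Counting the bits set to 1 in an int[] binary vector can be sketched as follows. This assumes that the vector packs 32 bits per int, which is a guess based on the int[] type of the fp field; the class name `BrightnessSketch` is hypothetical.

```java
// Hypothetical popcount over an int[] bit vector (32 bits per array element).
class BrightnessSketch {
    /** Returns the total number of 1 bits across all words. */
    static int brightness(int[] fp) {
        int count = 0;
        for (int word : fp) count += Integer.bitCount(word);
        return count;
    }

    public static void main(String[] args) {
        int[] fp = {0b1011, 0, -1};            // 3 bits + 0 bits + 32 bits
        System.out.println(brightness(fp));    // prints 35
    }
}
```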
-
requireBinaryVector
protected void requireBinaryVector()
Checks the binary vector storage and generates it from the identifier list if necessary.
-
dropBinaryVector
public void dropBinaryVector()
Drops the binary vector storage. It will be regenerated when required.
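The interplay of requireBinaryVector() and dropBinaryVector() resembles a lazily built cache: the identifier list is the source of truth and the binary vector is derived on demand. The sketch below is an assumption about that pattern, not the class's actual code; `LazyBinaryVector`, its fields, and the modulo folding are hypothetical.

```java
// Hypothetical lazy cache: the bit vector is rebuilt from identifiers when needed.
class LazyBinaryVector {
    private final int[] ids;
    private final int lengthInBits;
    private int[] fp;                       // null while the binary vector is dropped

    LazyBinaryVector(int[] ids, int lengthInBits) {
        if (lengthInBits <= 0 || lengthInBits % 32 != 0) {
            throw new IllegalArgumentException("length must be a positive multiple of 32");
        }
        this.ids = ids.clone();
        this.lengthInBits = lengthInBits;
    }

    /** Builds the bit vector from the identifier list unless it already exists. */
    void requireBinaryVector() {
        if (fp != null) return;
        fp = new int[lengthInBits / 32];
        for (int id : ids) {
            int bit = Math.floorMod(id, lengthInBits);
            fp[bit / 32] |= 1 << (bit % 32);
        }
    }

    /** Frees the bit vector; it is regenerated by the next requireBinaryVector call. */
    void dropBinaryVector() { fp = null; }

    boolean hasBinaryVector() { return fp != null; }

    /** Number of 1 bits; triggers generation of the bit vector if necessary. */
    int brightness() {
        requireBinaryVector();
        int count = 0;
        for (int word : fp) count += Integer.bitCount(word);
        return count;
    }

    public static void main(String[] args) {
        LazyBinaryVector v = new LazyBinaryVector(new int[]{1, 33, 65, 1}, 64);
        System.out.println(v.brightness());  // prints 2 (65 folds onto bit 1)
        v.dropBinaryVector();
        System.out.println(v.brightness());  // prints 2 again after regeneration
    }
}
```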
-
generate
public String[] generate(Molecule m) throws MDGeneratorException
Creates the ECFP fingerprint for the given Molecule. Calls the generator created by the corresponding ECFPParameters class.
- Overrides: generate in class MolecularDescriptor
- Returns: property names set in the molecule during generation
- Throws: MDGeneratorException - when failed to generate fingerprint
-
getDissimilarityMetrics
public String[] getDissimilarityMetrics()
Gets the dissimilarity metric names introduced for this class of MolecularDescriptor.
- Specified by: getDissimilarityMetrics in class MolecularDescriptor
- Returns: the metrics array
-
getDefaultDissimilarityMetricThresholds
public float[] getDefaultDissimilarityMetricThresholds()
Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
- Specified by: getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
- Returns: array of dissimilarity threshold values
-
getDefaultMetricIndex
public int getDefaultMetricIndex()
Gets the index of the default metric. In the case of ECFP, this is Tanimoto.
- Overrides: getDefaultMetricIndex in class MolecularDescriptor
- Returns: metric index of the default metric
-
getDefaultThreshold
public float getDefaultThreshold(int metricIndex)
Gets a metric dependent default threshold value. Ideally, this value would be based on statistics, but the exact value is not critical, since these defaults are only used in user interfaces to simplify applications for beginners.
- Overrides: getDefaultThreshold in class MolecularDescriptor
- Parameters: metricIndex - index of a parameterized metric
-
getTanimoto
public float getTanimoto(ECFP f)
Calculates the Tanimoto distance.
- Parameters: f - the fingerprint from which the distance is calculated
- Returns: the Tanimoto distance (dissimilarity coefficient)
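As a reminder of what a Tanimoto dissimilarity coefficient measures, the sketch below uses the textbook definition on bit vectors, 1 - |A and B| / |A or B|. It is not taken from the ECFP source: the class name `TanimotoSketch` and the handling of two empty fingerprints are assumptions.

```java
import java.util.BitSet;

// Textbook Tanimoto dissimilarity on bit vectors (not the ECFP implementation).
class TanimotoSketch {
    /** Returns 1 - (common bits / bits set in either fingerprint). */
    static float dissimilarity(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        if (union == 0) return 0f;          // assumption: two empty fingerprints are identical
        return 1f - (float) common.cardinality() / union;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0); a.set(1); a.set(2);
        BitSet b = new BitSet();
        b.set(1); b.set(2); b.set(3);
        System.out.println(dissimilarity(a, b)); // prints 0.5 (2 common bits out of 4)
    }
}
```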
-
getEuclidean
public float getEuclidean(ECFP f)
Calculates the Euclidean distance. This is the same as the Euclidean distance for bit strings.
- Parameters: f - the fingerprint from which the distance is calculated
- Returns: the dissimilarity coefficient
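For 0/1 vectors, every differing position contributes exactly 1 to the sum of squared differences, so the plain Euclidean distance reduces to the square root of the number of differing bits. The sketch below shows this unnormalized form; whether ECFP applies any normalization or scaling is not stated here, and the class name `EuclideanSketch` is hypothetical.

```java
import java.util.BitSet;

// Unnormalized Euclidean distance between two bit strings (illustrative only).
class EuclideanSketch {
    /** Returns sqrt(number of positions where the two bit vectors differ). */
    static float distance(BitSet a, BitSet b) {
        BitSet differing = (BitSet) a.clone();
        differing.xor(b);
        return (float) Math.sqrt(differing.cardinality());
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0); a.set(1); a.set(2);
        BitSet b = new BitSet();
        b.set(2); b.set(3); b.set(4);
        System.out.println(distance(a, b)); // prints 2.0 (bits 0, 1, 3, 4 differ)
    }
}
```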
-
getWeightedEuclidean
public float getWeightedEuclidean(ECFP f)
Calculates the weighted Euclidean distance. This is the same as the weighted Euclidean distance for bit strings.
- Parameters: f - the fingerprint from which the distance is calculated
- Returns: the dissimilarity coefficient
-
getAsymmetricEuclidean
public float getAsymmetricEuclidean(ECFP f)
Calculates the asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.
- Parameters: f - the fingerprint from which the distance is calculated
- Returns: the dissimilarity coefficient
-
getWeightedAsymmetricEuclidean
public float getWeightedAsymmetricEuclidean(ECFP f)
Calculates the weighted asymmetric Euclidean distance. This is the same as the weighted asymmetric Euclidean distance for bit strings.
- Parameters: f - the fingerprint from which the distance is calculated
- Returns: the dissimilarity coefficient
-
getDissimilarity
public float getDissimilarity(MolecularDescriptor other)
Calculates the dissimilarity ratio between two ECFP objects using the current default metric. The default metric is set in the corresponding ECFPParameters object by setCurrentParametrizedMetric(int metricIndex). In the case of asymmetric distances, swapping the two fingerprints can make a big difference.
- Specified by: getDissimilarity in class MolecularDescriptor
- Parameters: other - a fingerprint, to which the dissimilarity ratio is measured
- Returns: the dissimilarity ratio
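Why swapping the fingerprints matters for an asymmetric distance can be seen in a small sketch. The formula used here, sqrt(alpha * |A only| + (1 - alpha) * |B only|), and the asymmetry factor `alpha` are assumptions chosen for illustration; this page does not give the formula ECFP uses, and `AsymmetricSketch` is a hypothetical name.

```java
import java.util.BitSet;

// Hypothetical asymmetric Euclidean-style distance with an asymmetry factor alpha.
class AsymmetricSketch {
    /** Weights bits present only in a by alpha and bits present only in b by (1 - alpha). */
    static float distance(BitSet a, BitSet b, double alpha) {
        BitSet onlyA = (BitSet) a.clone();
        onlyA.andNot(b);
        BitSet onlyB = (BitSet) b.clone();
        onlyB.andNot(a);
        return (float) Math.sqrt(alpha * onlyA.cardinality() + (1 - alpha) * onlyB.cardinality());
    }

    public static void main(String[] args) {
        BitSet big = new BitSet();
        big.set(0, 5);                       // bits 0..4
        BitSet small = new BitSet();
        small.set(0);                        // bit 0 only
        System.out.println(distance(big, small, 0.75)); // sqrt(3), about 1.73
        System.out.println(distance(small, big, 0.75)); // prints 1.0
    }
}
```

With alpha set to 0.5 both directions give the same value, which is the symmetric special case.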
-
getDissimilarity
public float getDissimilarity(MolecularDescriptor other, int metricIndex)
Calculates the dissimilarity between two ECFP objects using the specified metric. Otherwise it behaves the same as getDissimilarity(MolecularDescriptor other).
- Specified by: getDissimilarity in class MolecularDescriptor
- Parameters:
  other - a fingerprint, to which the dissimilarity ratio is measured
  metricIndex - the index of the metric to be used
- Returns: the dissimilarity ratio
- See Also: MDParameters, PFParameters
-
-