Package chemaxon.descriptors
Class ChemicalFingerprint
- java.lang.Object
- chemaxon.descriptors.MolecularDescriptor
- chemaxon.descriptors.ChemicalFingerprint
- All Implemented Interfaces:
Cloneable
@PublicAPI public class ChemicalFingerprint extends MolecularDescriptor
The ChemicalFingerprint class implements topological fingerprints as a type of MolecularDescriptor. Such fingerprints encode the topological connections between atoms of the chemical graph. Although this encoding loses information, it still preserves enough to allow fast comparison of chemical structures: instead of comparing the structures directly, their topological fingerprints are compared.
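To make the idea of a topological fingerprint concrete, here is a minimal sketch of a generic path-based fingerprint. Every linear path of up to a few bonds is spelled out as a string of atom labels and hashed into one bit of a fixed-length bit array. This is not ChemAxon's generation algorithm. The class PathFingerprintSketch, the label-array plus bond-list molecule representation, and the use of String.hashCode() for hashing are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of a path-based topological fingerprint (not ChemAxon's algorithm). */
final class PathFingerprintSketch {

    // labels[i] is the element symbol of atom i; bonds holds undirected atom index pairs.
    static BitSet fingerprint(String[] labels, int[][] bonds, int maxBonds, int nBits) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) adj.add(new ArrayList<>());
        for (int[] b : bonds) {
            adj.get(b[0]).add(b[1]);
            adj.get(b[1]).add(b[0]);
        }
        BitSet bits = new BitSet(nBits);
        for (int start = 0; start < labels.length; start++) {
            walk(start, labels, adj, maxBonds, nBits, new boolean[labels.length], new ArrayList<>(), bits);
        }
        return bits;
    }

    // Depth-first enumeration of simple paths starting at `atom`; each path sets one bit.
    private static void walk(int atom, String[] labels, List<List<Integer>> adj, int maxBonds,
                             int nBits, boolean[] onPath, List<Integer> path, BitSet bits) {
        onPath[atom] = true;
        path.add(atom);
        bits.set(bitFor(path, labels, nBits));
        if (path.size() - 1 < maxBonds) {
            for (int next : adj.get(atom)) {
                if (!onPath[next]) walk(next, labels, adj, maxBonds, nBits, onPath, path, bits);
            }
        }
        path.remove(path.size() - 1); // backtrack (removes by index)
        onPath[atom] = false;
    }

    // Hashes a path to a bit index; "C-O" and "O-C" map to the same bit.
    private static int bitFor(List<Integer> path, String[] labels, int nBits) {
        StringBuilder fwd = new StringBuilder();
        StringBuilder rev = new StringBuilder();
        for (int i = 0; i < path.size(); i++) {
            if (i > 0) { fwd.append('-'); rev.append('-'); }
            fwd.append(labels[path.get(i)]);
            rev.append(labels[path.get(path.size() - 1 - i)]);
        }
        String f = fwd.toString(), r = rev.toString();
        String key = f.compareTo(r) <= 0 ? f : r;
        return Math.floorMod(key.hashCode(), nBits);
    }
}
```

Every path of a smaller fragment is also a path of any molecule containing it, so in this sketch the fragment's bits are always a subset of the molecule's bits. Hash collisions can make unrelated paths share a bit, which is part of the information loss mentioned above, but they never break that subset relation.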
This class provides two metrics for dissimilarity calculations: Tanimoto and Euclidean. Several variants of the base metrics are supported, for instance scaled, directed (asymmetric) and weighted forms. The Euclidean metric also has a normalized form that puts an upper bound on the otherwise unbounded Euclidean metric.
Typical usage:
Generating fingerprints
```java
import chemaxon.descriptors.*;

CFParameters params = new CFParameters( "config.xml" );
ChemicalFingerprint fp = new ChemicalFingerprint( params );
// always use an MDSet object, even if it has one component only
MDSet ds = new MDSet();
ds.addDescriptor( fp );
// create an input source reader that takes molecules from a SMILES file
MDFileReader src = new MDFileReader( "input.smiles" );
src.setIdTagName( "CGX_ID" ); // just an example
// process input: get the fingerprints from the input source and do something with them
while ( src.next( ds ) ) { // src generates the descriptor!
    doSomething( ds ); // placeholder for application-specific processing
}
src.close();
```
- Since:
- JChem 2.0
Field Summary
Modifier and Type | Field | Description
protected int | brightness | number of bits set in the fingerprint (sometimes called the darkness, though that term seems less plausible)
protected int[] | fp | storage for the fingerprint
Fields inherited from class chemaxon.descriptors.MolecularDescriptor
params
-
-
Constructor Summary
Constructor | Description
ChemicalFingerprint() | Creates a new, empty instance of ChemicalFingerprint without allocating internal storage.
ChemicalFingerprint(CFParameters params) | Creates a new instance of ChemicalFingerprint according to the parameters given.
ChemicalFingerprint(ChemicalFingerprint cfp) | Copy constructor.
ChemicalFingerprint(String params) | Creates a new instance of ChemicalFingerprint according to the parameters given.
-
Method Summary
Modifier and Type | Method | Description
void | clear() | Clears the fingerprint: sets all bins to zero.
ChemicalFingerprint | clone() | Creates a new instance with identical internal state.
void | fromData(byte[] dbRepr) | Builds a fingerprint from an external data format created by a previous call to toData().
void | fromFloatArray(float[] descr) | Builds a fingerprint from its float array representation.
void | fromString(String cfp) | Builds a fingerprint from its string representation created by toString().
String[] | generate(Molecule m) | Creates the ChemicalFingerprint descriptor for the given Molecule.
List<String> | getAliasNames() | Simple test function for engineering purposes; comment it out from the released version.
float | getAsymmetricEuclidean(ChemicalFingerprint f) | Calculates the asymmetric Euclidean distance.
int | getBrightness() | Gets the brightness of the fingerprint.
int | getCommonBitCount(ChemicalFingerprint f) |
float[] | getDefaultDissimilarityMetricThresholds() | Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
int | getDefaultMetricIndex() | Gets the index of the default metric.
float | getDefaultThreshold(int metricIndex) | Gets a metric-dependent default threshold value.
float | getDissimilarity(MolecularDescriptor fp2) | Calculates the dissimilarity between two chemical fingerprints using the default distance measure.
float | getDissimilarity(MolecularDescriptor fp2, int metricIndex) | Calculates the dissimilarity between two chemical fingerprints using the specified distance metric.
String[] | getDissimilarityMetrics() | Gets the dissimilarity metric names.
float | getEuclidean(ChemicalFingerprint f) | Calculates the Euclidean distance.
float | getLowerBound(Object fp2) | Calculates the lower-bound estimate of the dissimilarity from the given fingerprint.
String | getName() | Gets the name of the ChemicalFingerprint object.
String | getParametersClassName() | Gets the name of the parameters class corresponding to the descriptor.
String | getShortName() | Gets the short name of the descriptor.
float | getTanimoto(ChemicalFingerprint f) | Calculates the Tanimoto metric.
float | getTversky(ChemicalFingerprint f) | Calculates the Tversky dissimilarity index (1 minus the commonly used Tversky similarity).
float | getWeightedAsymmetricEuclidean(ChemicalFingerprint f) | Calculates the weighted asymmetric Euclidean distance.
float | getWeightedEuclidean(ChemicalFingerprint f) | Calculates the weighted Euclidean distance.
boolean | isSubSetOf(ChemicalFingerprint f) | Checks whether this fingerprint is a subset of the fingerprint passed as the parameter.
void | setParameters(MDParameters parameters) | Sets parameters, allocates internal storage if needed and clears the descriptor.
void | setParameters(String parameters) | Sets the parameters of an already created ChemicalFingerprint object.
String | toBinaryString() | Converts the fingerprint into a string of 0 and 1 characters.
byte[] | toData() | Converts a chemical fingerprint object into a byte array.
String | toDecimalString() | Converts the fingerprint into a tab-separated string.
float[] | toFloatArray() | Creates the float array representation of the fingerprint.
String | toString() | Converts the fingerprint into a readable string.
Methods inherited from class chemaxon.descriptors.MolecularDescriptor
generate, getAtomSetColors, getAtomSetIndexes, getAtomSetNames, getDissimilarityMetricIndex, getLowerBound, getMetricIndex, getMetricName, getMetricName, getNumberOfMetrics, getNumberOfWeights, getParameters, getThreshold, getThreshold, main, needsConfig, newInstance, newInstance, newInstanceFromXML, newInstanceSupplier, setScreeningConfiguration
-
Constructor Detail
-
ChemicalFingerprint
public ChemicalFingerprint()
Creates a new, empty instance of ChemicalFingerprint without allocating internal storage.
-
ChemicalFingerprint
public ChemicalFingerprint(CFParameters params)
Creates a new instance of ChemicalFingerprint according to the parameters given.
- Parameters:
params - parameters used in fingerprint generation and handling
- Since:
- JChem 2.2
-
ChemicalFingerprint
public ChemicalFingerprint(String params)
Creates a new instance of ChemicalFingerprint according to the parameters given.
- Parameters:
params - parameter settings
-
ChemicalFingerprint
public ChemicalFingerprint(ChemicalFingerprint cfp)
Copy constructor. Creates an identical copy of the given chemical fingerprint. The old and the new instances share the same CFParameters object.
- Parameters:
cfp - fingerprint to be copied
-
-
Method Detail
-
clone
public ChemicalFingerprint clone()
Creates a new instance with identical internal state.
- Specified by:
clone in class MolecularDescriptor
- Returns:
- the newly copied object
-
getName
public String getName()
Gets the name of the ChemicalFingerprint object. This name differs from the class name: it is more readable and also meaningful to end users.
- Overrides:
getName in class MolecularDescriptor
- Returns:
- the nice, external name for ChemicalFingerprint class objects
-
getShortName
public String getShortName()
Gets the short name of the descriptor.
- Overrides:
getShortName in class MolecularDescriptor
- Returns:
- the short name used in text outputs (tables etc.)
-
getParametersClassName
public String getParametersClassName()
Gets the name of the parameters class corresponding to the descriptor.
- Overrides:
getParametersClassName in class MolecularDescriptor
- Returns:
- the name of the parameters class
-
getBrightness
public int getBrightness()
Gets the brightness of the fingerprint (sometimes also called the darkness), that is, the number of 1 bits in the fingerprint.
- Returns:
- number of bits set to 1
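A minimal sketch of the brightness computation, assuming the bits are packed 32 per element into an int[] like the protected fp field (the packing is an assumption, not documented here). Under that assumption brightness is simply a population count over the words. BrightnessSketch is a hypothetical name.

```java
/** Hypothetical sketch: brightness as a population count over int[]-packed bits. */
final class BrightnessSketch {
    // Counts the 1 bits in a fingerprint stored as 32-bit words.
    static int brightness(int[] fp) {
        int count = 0;
        for (int word : fp) {
            count += Integer.bitCount(word); // number of set bits in this word
        }
        return count;
    }
}
```

For example, the words {0b1011, 0, -1} contain 3 + 0 + 32 = 35 set bits.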
-
setParameters
public void setParameters(MDParameters parameters)
Sets parameters, allocates internal storage if needed and clears the descriptor.
- Overrides:
setParameters in class MolecularDescriptor
- Parameters:
parameters - fingerprint parameters
-
setParameters
public void setParameters(String parameters) throws MDParametersException
Sets the parameters of an already created ChemicalFingerprint object.
- Specified by:
setParameters in class MolecularDescriptor
- Parameters:
parameters - parameter settings for the descriptor
- Throws:
MDParametersException - any XML error
-
toData
public byte[] toData()
Converts a chemical fingerprint object into a byte array. This format can be referred to as an "external representation", since it serves as the data format for storing fingerprints in databases.
Use the fromData() method to build the fingerprint from this "external" representation.
- Specified by:
toData in class MolecularDescriptor
- Returns:
- byte array representation of the fingerprint object
-
fromData
public void fromData(byte[] dbRepr)
Builds a fingerprint from an external data format created by a previous call to toData().
- Specified by:
fromData in class MolecularDescriptor
- Parameters:
dbRepr - "external" representation of ChemicalFingerprint
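A minimal sketch of a toData()/fromData() style round trip, assuming the fingerprint is held as int[] words. The byte layout below (big-endian, 4 bytes per word via java.nio.ByteBuffer) is an assumption for illustration; JChem's actual storage format is not described here. ByteCodecSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a byte[] round trip; not JChem's actual byte layout. */
final class ByteCodecSketch {
    // Packs each 32-bit word into 4 bytes (big-endian, ByteBuffer's default order).
    static byte[] toData(int[] fp) {
        ByteBuffer buf = ByteBuffer.allocate(fp.length * Integer.BYTES);
        for (int word : fp) buf.putInt(word);
        return buf.array();
    }

    // Rebuilds the int[] words from a byte array produced by toData.
    static int[] fromData(byte[] data) {
        if (data.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] fp = new int[data.length / Integer.BYTES];
        for (int i = 0; i < fp.length; i++) fp[i] = buf.getInt();
        return fp;
    }
}
```

The design point the sketch shares with the documented pair is that fromData must invert toData exactly, so whatever layout is chosen has to be fixed and self-describing in length.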
-
clear
public final void clear()
Clears the fingerprint: sets all bins to zero.
-
toString
public final String toString()
Converts the fingerprint into a readable string. This is the default external text format of the fingerprint, which can also be stored in an SDfile.
- Specified by:
toString in class MolecularDescriptor
- Returns:
- string representation of the fingerprint
-
toDecimalString
public final String toDecimalString()
Converts the fingerprint into a tab-separated string.
- Specified by:
toDecimalString in class MolecularDescriptor
- Returns:
- string representation of the fingerprint
-
toBinaryString
public String toBinaryString()
Converts the fingerprint into a string of 0 and 1 characters.
- Overrides:
toBinaryString in class MolecularDescriptor
- Returns:
- binary string representation of the fingerprint
- Since:
- JChem 2.3
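A minimal sketch of a 0/1 rendering of int[]-packed bits. The bit order (bit i is bit i % 32 of word i / 32, printed from bit 0 upward) is an assumption, not JChem's documented output format; BinaryStringSketch is a hypothetical name.

```java
/** Hypothetical sketch of a 0/1 rendering; the bit order is an assumption. */
final class BinaryStringSketch {
    // Writes bit i as the i-th character, where bit i is bit (i % 32) of word i / 32.
    static String toBinaryString(int[] fp) {
        StringBuilder sb = new StringBuilder(fp.length * Integer.SIZE);
        for (int i = 0; i < fp.length * Integer.SIZE; i++) {
            int word = fp[i / Integer.SIZE];
            sb.append(((word >>> (i % Integer.SIZE)) & 1) == 1 ? '1' : '0');
        }
        return sb.toString();
    }
}
```

With this order the single word 0b101 renders as "101" followed by 29 zeros.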
-
fromString
public final void fromString(String cfp) throws ParseException
Builds a fingerprint from its string representation created by toString().
- Specified by:
fromString in class MolecularDescriptor
- Parameters:
cfp - fingerprint string
- Throws:
ParseException
-
toFloatArray
public final float[] toFloatArray()
Creates the float array representation of the fingerprint. Each element of the array holds one value of the fingerprint, and every value is included, zeros as well.
- Specified by:
toFloatArray in class MolecularDescriptor
- Returns:
- a float array representation of the fingerprint
- Since:
- JChem 2.0.1
-
fromFloatArray
public void fromFloatArray(float[] descr) throws RuntimeException
Builds a fingerprint from its float array representation. Typically used when a hypothesis is created.
- Specified by:
fromFloatArray in class MolecularDescriptor
- Parameters:
descr - fingerprint represented as a float array (e.g. generated by toFloatArray())
- Throws:
RuntimeException
- Since:
- JChem 2.0.1
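A minimal sketch of the float array round trip, assuming int[]-packed bits where bit i is bit i % 32 of word i / 32. Two further assumptions: fromFloatArray treats any non-zero element as a set bit (how JChem binarizes hypothesis values is not stated here), and it signals a malformed length with IllegalArgumentException, a RuntimeException subclass, although the conditions under which the documented method throws are not described. FloatArraySketch is a hypothetical name.

```java
/** Hypothetical sketch of toFloatArray()/fromFloatArray(); bit order and binarization are assumptions. */
final class FloatArraySketch {
    // One float per fingerprint bit: 1.0f for a set bit, 0.0f otherwise (zeros are kept).
    static float[] toFloatArray(int[] fp) {
        float[] out = new float[fp.length * Integer.SIZE];
        for (int i = 0; i < out.length; i++) {
            out[i] = (fp[i / Integer.SIZE] >>> (i % Integer.SIZE)) & 1;
        }
        return out;
    }

    // Rebuilds the words; any non-zero element is treated as a set bit.
    static int[] fromFloatArray(float[] descr) {
        if (descr.length % Integer.SIZE != 0) {
            throw new IllegalArgumentException("length must be a multiple of 32");
        }
        int[] fp = new int[descr.length / Integer.SIZE];
        for (int i = 0; i < descr.length; i++) {
            if (descr[i] != 0.0f) fp[i / Integer.SIZE] |= 1 << (i % Integer.SIZE);
        }
        return fp;
    }
}
```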
-
generate
public String[] generate(Molecule m) throws MDGeneratorException
Creates the ChemicalFingerprint descriptor for the given Molecule. Calls the generator created by the corresponding MDParameters class.
- Overrides:
generate in class MolecularDescriptor
- Returns:
- property names set in the molecule passed during generation
- Throws:
MDGeneratorException
- when failed to generate descriptor
-
getDissimilarityMetrics
public String[] getDissimilarityMetrics()
Gets the dissimilarity metric names.
- Specified by:
getDissimilarityMetrics in class MolecularDescriptor
- Returns:
- the metrics array
-
getDefaultDissimilarityMetricThresholds
public float[] getDefaultDissimilarityMetricThresholds()
Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
- Specified by:
getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
- Returns:
- array of dissimilarity threshold values
-
getDefaultMetricIndex
public int getDefaultMetricIndex()
Gets the index of the default metric. For this class, the default metric is Tanimoto.
- Overrides:
getDefaultMetricIndex in class MolecularDescriptor
- Returns:
- metric index of the default metric
-
getDefaultThreshold
public float getDefaultThreshold(int metricIndex)
Gets a metric-dependent default threshold value. Ideally, this value would be based on statistics; however, the exact value is not critical, since these thresholds are only used in user interfaces to make applications easier to use for beginners.
- Overrides:
getDefaultThreshold in class MolecularDescriptor
- Parameters:
metricIndex - index of a parameterized metric
-
getCommonBitCount
public int getCommonBitCount(ChemicalFingerprint f)
-
getTanimoto
public float getTanimoto(ChemicalFingerprint f)
Calculates the Tanimoto metric.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the Tanimoto distance (dissimilarity coefficient)
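For binary fingerprints, the Tanimoto similarity is commonly defined as c / (a + b - c), where a and b are the brightnesses of the two fingerprints and c is the number of bits set in both; the Tanimoto distance is 1 minus that value. The sketch below assumes int[]-packed bits of equal length and treats two empty fingerprints as identical, a convention chosen here rather than documented behavior of getTanimoto. TanimotoSketch is a hypothetical name.

```java
/** Hypothetical sketch of a Tanimoto distance on int[]-packed bits of equal length. */
final class TanimotoSketch {
    static float tanimotoDistance(int[] x, int[] y) {
        int common = 0; // c: bits set in both fingerprints
        int union = 0;  // a + b - c: bits set in either fingerprint
        for (int i = 0; i < x.length; i++) {
            common += Integer.bitCount(x[i] & y[i]);
            union += Integer.bitCount(x[i] | y[i]);
        }
        // Convention assumed here: two empty fingerprints are identical (distance 0).
        if (union == 0) return 0.0f;
        return 1.0f - (float) common / union;
    }
}
```

For example, {0b1110} and {0b0111} share 2 bits out of the 4 set in either, so their distance is 1 - 2/4 = 0.5.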
-
getTversky
public float getTversky(ChemicalFingerprint f)
Calculates the Tversky dissimilarity index, that is, 1 minus the commonly used Tversky similarity.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the Tversky dissimilarity index as a float
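The commonly used Tversky similarity weights the bits that are unique to each fingerprint: S = c / (c + alpha * (a - c) + beta * (b - c)), where a and b are the brightnesses and c the number of common bits. The dissimilarity index described above is 1 - S, and alpha = beta = 1 reduces it to the Tanimoto distance. In this hypothetical sketch, alpha and beta are plain method parameters; how JChem chooses its Tversky weights is not stated here.

```java
/** Hypothetical sketch of a Tversky dissimilarity; alpha and beta are assumed parameters. */
final class TverskySketch {
    static float tverskyDistance(int[] x, int[] y, float alpha, float beta) {
        int common = 0, onlyX = 0, onlyY = 0;
        for (int i = 0; i < x.length; i++) {
            common += Integer.bitCount(x[i] & y[i]);
            onlyX += Integer.bitCount(x[i] & ~y[i]); // a - c: set in x but not in y
            onlyY += Integer.bitCount(y[i] & ~x[i]); // b - c: set in y but not in x
        }
        float denom = common + alpha * onlyX + beta * onlyY;
        if (denom == 0.0f) return 0.0f; // assumed convention when the denominator vanishes
        return 1.0f - common / denom;
    }
}
```

With alpha = 1 and beta = 0, a fingerprint fully contained in another has distance 0 from it, but not the other way round; this asymmetry is what makes Tversky useful for directional comparisons.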
-
getEuclidean
public float getEuclidean(ChemicalFingerprint f)
Calculates the Euclidean distance. This is the same as the Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
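For bit strings, each position where the two fingerprints differ contributes (1 - 0)^2 = 1 to the sum of squares, so the Euclidean distance is the square root of the number of differing bits. The class description mentions a normalized form that bounds this metric; dividing by the largest possible distance, the square root of the fingerprint length in bits, is one way to get a value in [0, 1], but whether JChem normalizes this way is an assumption. EuclideanSketch is hypothetical and assumes int[]-packed bits of equal length.

```java
/** Hypothetical sketch of Euclidean distance for int[]-packed bit strings. */
final class EuclideanSketch {
    // Each differing bit contributes 1 to the sum of squares.
    static float euclidean(int[] x, int[] y) {
        int differing = 0;
        for (int i = 0; i < x.length; i++) {
            differing += Integer.bitCount(x[i] ^ y[i]);
        }
        return (float) Math.sqrt(differing);
    }

    // One possible normalization (an assumption): divide by sqrt(total bits) to land in [0, 1].
    static float normalizedEuclidean(int[] x, int[] y) {
        int totalBits = x.length * Integer.SIZE;
        return totalBits == 0 ? 0.0f : euclidean(x, y) / (float) Math.sqrt(totalBits);
    }
}
```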
-
getWeightedEuclidean
public float getWeightedEuclidean(ChemicalFingerprint f)
Calculates the weighted Euclidean distance. This is the same as the weighted Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
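A minimal sketch of a weighted Euclidean distance, sqrt(sum of w_i * (x_i - y_i)^2), where each bit position carries its own weight. The per-bit float[] weights and the bit order (bit i is bit i % 32 of word i / 32) are assumptions; how JChem supplies weights is not described here. WeightedEuclideanSketch is a hypothetical name.

```java
/** Hypothetical sketch of a weighted Euclidean distance over int[]-packed bits. */
final class WeightedEuclideanSketch {
    // weights[i] is an assumed non-negative weight for bit i; weights.length <= 32 * x.length.
    static float weightedEuclidean(int[] x, int[] y, float[] weights) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            int xi = (x[i / Integer.SIZE] >>> (i % Integer.SIZE)) & 1;
            int yi = (y[i / Integer.SIZE] >>> (i % Integer.SIZE)) & 1;
            int d = xi - yi;
            sum += weights[i] * d * d; // only differing bits contribute
        }
        return (float) Math.sqrt(sum);
    }
}
```

With all weights equal to 1 this reduces to the unweighted Euclidean distance for bit strings.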
-
getAsymmetricEuclidean
public float getAsymmetricEuclidean(ChemicalFingerprint f)
Calculates the asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getWeightedAsymmetricEuclidean
public float getWeightedAsymmetricEuclidean(ChemicalFingerprint f)
Calculates the weighted asymmetric Euclidean distance. This is the same as the weighted asymmetric Euclidean distance for bit strings.
- Parameters:
f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getDissimilarity
public float getDissimilarity(MolecularDescriptor fp2)
Calculates the dissimilarity between two chemical fingerprints using the default distance measure.
- Specified by:
getDissimilarity in class MolecularDescriptor
- Parameters:
fp2 - the other chemical fingerprint
- Returns:
- dissimilarity ratio
-
getDissimilarity
public float getDissimilarity(MolecularDescriptor fp2, int metricIndex)
Calculates the dissimilarity between two chemical fingerprints using the specified distance metric. The index of the required metric can be obtained by calling getMetricIndex(String metricName). New metrics implemented by this class must be appended after the existing ones.
- Specified by:
getDissimilarity in class MolecularDescriptor
- Parameters:
fp2 - the chemical fingerprint from which the distance is measured
metricIndex - index of the metric to be used
- Returns:
- the dissimilarity ratio
- See Also:
MDParameters, PFParameters
-
getLowerBound
public float getLowerBound(Object fp2)
Calculates the lower-bound estimate of the dissimilarity from the given fingerprint. In the case of ChemicalFingerprint, a good estimate of the minimum distance cannot be obtained efficiently (that is, significantly faster than calculating the proper distance), therefore 0 is returned. This trivial bound means that getDistance is always called.
- Parameters:
fp2 - chemical fingerprint from which the distance is measured
- Returns:
- estimate of the minimum distance
-
isSubSetOf
public boolean isSubSetOf(ChemicalFingerprint f)
Checks whether this fingerprint is a subset of the fingerprint passed as the parameter. A binary fingerprint is considered a subset of another if none of its bits is larger than the corresponding bit of the other.
- Parameters:
f - a descriptor which is supposed to be a superset
- Returns:
- true if this descriptor is a subset of the parameter
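The subset rule translates directly into word-wise bit operations: one fingerprint is a subset of another exactly when no bit is 1 in the first and 0 in the second, i.e. (x & ~y) == 0 for every word. A minimal sketch assuming int[]-packed bits of equal length; SubsetSketch is a hypothetical name.

```java
/** Hypothetical sketch of the subset test on int[]-packed bits of equal length. */
final class SubsetSketch {
    // x is a subset of y when no bit is 1 in x but 0 in y.
    static boolean isSubSetOf(int[] x, int[] y) {
        for (int i = 0; i < x.length; i++) {
            if ((x[i] & ~y[i]) != 0) return false;
        }
        return true;
    }
}
```

In substructure screening such a test is typically used as a filter: if a query's fingerprint is not a subset of a target's fingerprint, the query cannot be a substructure of that target, so the expensive atom-by-atom search can be skipped.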
-
-