Package chemaxon.descriptors
Class ChemicalFingerprint
java.lang.Object
chemaxon.descriptors.MolecularDescriptor
chemaxon.descriptors.ChemicalFingerprint
- All Implemented Interfaces:
Cloneable
The ChemicalFingerprint class implements topological fingerprints as a
type of MolecularDescriptor. Such fingerprints encode the
topological connections between the atoms of the chemical graph. Although this
encoding loses information, it preserves enough to allow fast
comparison of chemical structures through their topological fingerprints
rather than through direct structural comparison. This class provides two base metrics for dissimilarity calculations: Tanimoto and Euclidean. Several variants of the base metrics are supported, for instance scaled, directed and weighted forms. Euclidean also has a normalized form, which puts an upper bound on the otherwise unbounded Euclidean metric.
Typical usage:
Generating fingerprints
CFParameters params = new CFParameters( "config.xml" );
ChemicalFingerprint fp = new ChemicalFingerprint( params );
// always use an MDSet object, even if it has one component only
MDSet ds = new MDSet();
ds.addDescriptor( fp );
// create an input source reader that takes molecules from a SMILES file
MDFileReader src = new MDFileReader( "input.smiles" );
src.setIdTagName( "CGX_ID" ); // just an example
// process input: get the fingerprints from the input source and do something with them
while ( src.next( ds ) ) { // src generates the descriptor!
    do_something( ds ); // placeholder for application code
}
src.close();
- Since:
- JChem 2.0
-
Field Summary
Fields
Modifier and Type / Field / Description
protected int brightness
    Number of bits set in the fingerprint (sometimes this is called the darkness, but that seems less plausible).
protected int[] fp
    Storage for the fingerprint.
Fields inherited from class chemaxon.descriptors.MolecularDescriptor
params
-
Constructor Summary
Constructors
Constructor / Description
ChemicalFingerprint()
    Creates a new, empty instance of ChemicalFingerprint without allocating internal storage.
ChemicalFingerprint(CFParameters params)
    Creates a new instance of ChemicalFingerprint according to the parameters given.
ChemicalFingerprint(ChemicalFingerprint cfp)
    Copy constructor.
ChemicalFingerprint(String params)
    Creates a new instance of ChemicalFingerprint according to the parameters given.
-
Method Summary
Modifier and Type / Method / Description
final void clear()
    Clears the fingerprint: sets all bins to store zero value.
clone()
    Creates a new instance with identical internal state.
void fromData(byte[] dbRepr)
    Builds a fingerprint from an external data format, created by a previous call to toData().
void fromFloatArray(float[] descr)
    Builds a fingerprint from its float array representation.
final void fromString(String cfp)
    Builds a fingerprint from its string representation created by toString().
String[] generate
    Creates the ChemicalFingerprint descriptor for the given Molecule.
float getAsymmetricEuclidean
    Calculates the asymmetric Euclidean distance.
int getBrightness()
    Gets the brightness of the fingerprint.
int getCommonBitCount
float[] getDefaultDissimilarityMetricThresholds()
    Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
int getDefaultMetricIndex()
    Gets the index of the default metric.
float getDefaultThreshold(int metricIndex)
    Gets a metric dependent default threshold value.
float getDissimilarity
    Calculates the dissimilarity between two chemical fingerprints using the default distance measure.
float getDissimilarity(MolecularDescriptor fp2, int metricIndex)
    Calculates the dissimilarity between two chemical fingerprints using the specified distance metric.
String[] getDissimilarityMetrics
    Gets the dissimilarity metric names.
float getEuclidean
    Calculates the Euclidean distance.
getName()
    Gets the name of the ChemicalFingerprint object.
getParametersClassName
    Gets the name of the parameters class corresponding to the descriptor.
getShortName
    Gets the short name of the descriptor.
float getTanimoto
    Calculates the Tanimoto metric.
float getTversky
    Calculates the Tversky dissimilarity index: 1 - (the commonly used Tversky index).
float getWeightedAsymmetricEuclidean
    Calculates the weighted asymmetric Euclidean distance.
float getWeightedEuclidean
    Calculates the weighted Euclidean distance.
boolean isSubSetOf
    Checks if this fingerprint is a subset of another fingerprint that is passed as method parameter.
void setParameters(MDParameters parameters)
    Sets parameters, allocates internal storage if needed and clears the descriptor.
void setParameters(String parameters)
    Sets the parameters of an already created ChemicalFingerprint object.
toBinaryString
    Converts the fingerprint into a 0,1 string.
byte[] toData()
    Converts a chemical fingerprint object into a byte array.
final String toDecimalString
    Converts the fingerprint into a tab separated string.
final float[] toFloatArray()
    Creates the float array representation of the fingerprint.
final String toString()
    Converts the fingerprint into a readable string.
Methods inherited from class chemaxon.descriptors.MolecularDescriptor
generate, getAtomSetColors, getAtomSetIndexes, getAtomSetNames, getDissimilarityMetricIndex, getLowerBound, getMetricIndex, getMetricName, getMetricName, getNumberOfMetrics, getNumberOfWeights, getParameters, getThreshold, getThreshold, needsConfig, newInstance, newInstance, newInstanceFromXML, setScreeningConfiguration
-
Field Details
-
fp
protected int[] fp
Storage for the fingerprint.
-
brightness
protected int brightness
Number of bits set in the fingerprint (sometimes this is called the darkness, but that seems less plausible).
-
-
Constructor Details
-
ChemicalFingerprint
public ChemicalFingerprint()
Creates a new, empty instance of ChemicalFingerprint without allocating internal storage.
-
ChemicalFingerprint
Creates a new instance of ChemicalFingerprint according to the parameters given.
- Parameters:
- params - parameters used in fingerprint generation and handling
- Since:
- JChem 2.2
-
ChemicalFingerprint
Creates a new instance of ChemicalFingerprint according to the parameters given.
- Parameters:
- params - parameter settings
-
ChemicalFingerprint
Copy constructor. An identical copy of the chemical fingerprint passed is created. The old and the new instances share the same CFParameters object.
- Parameters:
- cfp - fingerprint to be copied
-
-
Method Details
-
clone
Creates a new instance with identical internal state.
- Specified by:
- clone in class MolecularDescriptor
- Returns:
- the newly copied object
-
getName
Gets the name of the ChemicalFingerprint object. This is not the class name: it is a more readable name that is also meaningful to end users.
- Overrides:
- getName in class MolecularDescriptor
- Returns:
- the readable, external name for ChemicalFingerprint class objects
-
getShortName
Gets the short name of the descriptor.
- Overrides:
- getShortName in class MolecularDescriptor
- Returns:
- the short name used in text outputs (tables etc.)
-
getParametersClassName
Gets the name of the parameters class corresponding to the descriptor.
- Overrides:
- getParametersClassName in class MolecularDescriptor
- Returns:
- the name of the parameters class
-
getBrightness
public int getBrightness()
Gets the brightness of the fingerprint, that is, the number of 1 (one) bits in it. This value is sometimes called the darkness.
- Returns:
- number of bits set to 1
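The bit count described above can be illustrated with a minimal sketch. It assumes the fingerprint is stored as packed 32-bit words in an int[] (as the fp field suggests); the class and method names are hypothetical and this is not ChemAxon's implementation.

```java
// Illustrative sketch only: assumes a packed int[] bit layout; not library code.
final class BrightnessSketch {
    /** Counts the bits set to 1 across all words of the fingerprint. */
    static int brightness(int[] words) {
        int count = 0;
        for (int w : words) {
            count += Integer.bitCount(w); // population count of one 32-bit word
        }
        return count;
    }

    public static void main(String[] args) {
        int[] fp = {0b1011, 0, 0x80000000};
        System.out.println(brightness(fp));
    }
}
```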
-
setParameters
Sets parameters, allocates internal storage if needed and clears the descriptor.
- Overrides:
- setParameters in class MolecularDescriptor
- Parameters:
- parameters - fingerprint parameters
-
setParameters
Sets the parameters of an already created ChemicalFingerprint object.
- Specified by:
- setParameters in class MolecularDescriptor
- Parameters:
- parameters - parameter settings for the descriptor
- Throws:
- MDParametersException - any XML error
-
toData
public byte[] toData()
Converts a chemical fingerprint object into a byte array. This format can be referred to as an "external representation" since it serves as the data format for storing fingerprints in databases.
Use the fromData() method to build the fingerprint from this "external" representation.
- Specified by:
- toData in class MolecularDescriptor
- Returns:
- byte array representation of the fingerprint object
-
fromData
public void fromData(byte[] dbRepr)
Builds a fingerprint from an external data format, created by a previous call to toData().
- Specified by:
- fromData in class MolecularDescriptor
- Parameters:
- dbRepr - "external" representation of ChemicalFingerprint
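The round trip between toData() and fromData() can be pictured with a small sketch. The byte layout used here (big-endian, four bytes per int word) is an assumption made for illustration only; the actual external format is not specified in this page, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: the real external format may differ.
final class FingerprintBytesSketch {
    /** Serializes packed 32-bit words into a big-endian byte array. */
    static byte[] toBytes(int[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * Integer.BYTES);
        for (int w : words) {
            buf.putInt(w);
        }
        return buf.array();
    }

    /** Rebuilds the words from a byte array produced by toBytes(). */
    static int[] fromBytes(byte[] data) {
        if (data.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] words = new int[data.length / Integer.BYTES];
        for (int i = 0; i < words.length; i++) {
            words[i] = buf.getInt();
        }
        return words;
    }
}
```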
-
clear
public final void clear()
Clears the fingerprint: sets all bins to store zero value.
-
toString
Converts the fingerprint into a readable string. This is the default external text format of the fingerprint, which can also be stored into an SDfile.
- Specified by:
- toString in class MolecularDescriptor
- Returns:
- string representation of the fingerprint
-
toDecimalString
Converts the fingerprint into a tab separated string.
- Specified by:
- toDecimalString in class MolecularDescriptor
- Returns:
- string representation of the fingerprint
-
toBinaryString
Converts the fingerprint into a 0,1 string.
- Overrides:
- toBinaryString in class MolecularDescriptor
- Returns:
- binary string representation of the fingerprint
- Since:
- JChem 2.3
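A 0,1 string of this kind can be produced from a packed int[] as sketched below. The bit order (lowest bit of the first word comes first) is an assumption for illustration; the ordering used by the library is not stated here. Names are hypothetical.

```java
// Illustrative sketch only: bit order is an assumption, not the library's format.
final class BinaryStringSketch {
    /** Writes one '0' or '1' character per bit, lowest bit of each word first. */
    static String toBinaryString(int[] words) {
        StringBuilder sb = new StringBuilder(words.length * Integer.SIZE);
        for (int w : words) {
            for (int bit = 0; bit < Integer.SIZE; bit++) {
                sb.append(((w >>> bit) & 1) == 1 ? '1' : '0');
            }
        }
        return sb.toString();
    }
}
```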
-
fromString
Builds a fingerprint from its string representation created by toString().
- Specified by:
- fromString in class MolecularDescriptor
- Parameters:
- cfp - fingerprint string
- Throws:
- ParseException
-
toFloatArray
public final float[] toFloatArray()
Creates the float array representation of the fingerprint. This array contains all values of the fingerprint (including all zeros) in the elements of the array.
- Specified by:
- toFloatArray in class MolecularDescriptor
- Returns:
- a float array representation of the fingerprint
- Since:
- JChem 2.0.1
-
fromFloatArray
Builds a fingerprint from its float array representation. Typically used when a hypothesis is created.
- Specified by:
- fromFloatArray in class MolecularDescriptor
- Parameters:
- descr - fingerprint represented in a float array (e.g. generated by toFloatArray())
- Throws:
- RuntimeException
- Since:
- JChem 2.0.1
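The float array form holds one element per fingerprint position, zeros included. A minimal sketch of such a conversion and its inverse follows, assuming packed 32-bit words and one float (0.0 or 1.0) per bit; the names and the bit order are hypothetical and not taken from the library.

```java
// Illustrative sketch only: one float per bit, lowest bit of each word first (assumed).
final class FloatArraySketch {
    /** Expands every bit, set or not, into a 0.0f or 1.0f element. */
    static float[] toFloatArray(int[] words) {
        float[] out = new float[words.length * Integer.SIZE];
        for (int i = 0; i < out.length; i++) {
            out[i] = (words[i / Integer.SIZE] >>> (i % Integer.SIZE)) & 1;
        }
        return out;
    }

    /** Packs a float array back into words; any non-zero element sets the bit. */
    static int[] fromFloatArray(float[] values) {
        if (values.length % Integer.SIZE != 0) {
            throw new IllegalArgumentException("length must be a multiple of 32");
        }
        int[] words = new int[values.length / Integer.SIZE];
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0.0f) {
                words[i / Integer.SIZE] |= 1 << (i % Integer.SIZE);
            }
        }
        return words;
    }
}
```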
-
generate
Creates the ChemicalFingerprint descriptor for the given Molecule. Calls the generator created by the corresponding MDParameters class.
- Overrides:
- generate in class MolecularDescriptor
- Returns:
- property names set in the molecule passed during generation
- Throws:
- MDGeneratorException - when failed to generate descriptor
-
getDissimilarityMetrics
Gets the dissimilarity metric names.
- Specified by:
- getDissimilarityMetrics in class MolecularDescriptor
- Returns:
- the metrics array
-
getDefaultDissimilarityMetricThresholds
public float[] getDefaultDissimilarityMetricThresholds()
Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
- Specified by:
- getDefaultDissimilarityMetricThresholds in class MolecularDescriptor
- Returns:
- array of dissimilarity threshold values
-
getDefaultMetricIndex
public int getDefaultMetricIndex()
Gets the index of the default metric. For this class the default metric is Tanimoto.
- Overrides:
- getDefaultMetricIndex in class MolecularDescriptor
- Returns:
- metric index of the default metric
-
getDefaultThreshold
public float getDefaultThreshold(int metricIndex)
Gets a metric dependent default threshold value. Ideally, this value should be based on statistics, though the actual value is not too critical, since default thresholds are only used in user interfaces to simplify the use of applications for beginners.
- Overrides:
- getDefaultThreshold in class MolecularDescriptor
- Parameters:
- metricIndex - index of a parameterized metric
-
getCommonBitCount
-
getTanimoto
Calculates the Tanimoto metric.
- Parameters:
- f - the fingerprint from which the distance is calculated
- Returns:
- the Tanimoto distance (dissimilarity coefficient)
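For binary fingerprints, the Tanimoto dissimilarity is commonly computed as 1 - c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The sketch below applies that textbook formula to packed int[] words; the names are hypothetical, and the scaled or weighted variants mentioned for this class are not covered.

```java
// Illustrative sketch only: textbook Tanimoto on bit sets, not the library's code.
final class TanimotoSketch {
    /** Returns 1 - |A and B| / |A or B| for two equally sized bit fingerprints. */
    static float tanimotoDissimilarity(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        int common = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            common += Integer.bitCount(a[i] & b[i]);
            union += Integer.bitCount(a[i] | b[i]);
        }
        // Assumption: two empty fingerprints are treated as identical.
        return union == 0 ? 0.0f : 1.0f - (float) common / union;
    }
}
```

A result of 0 means the two fingerprints have the same bits set; 1 means they share no set bit.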
-
getTversky
Calculates the Tversky dissimilarity index, that is, 1 - (the commonly used Tversky index).
- Parameters:
- f - the fingerprint from which the distance is calculated
- Returns:
- the Tversky dissimilarity index as float
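The commonly used Tversky index is c / (alpha * (a - c) + beta * (b - c) + c), so the dissimilarity form is one minus that value. The sketch below passes alpha and beta as explicit arguments; how this class obtains its weights is not described here, so those parameters and all names are assumptions for illustration.

```java
// Illustrative sketch only: alpha and beta are hypothetical explicit parameters.
final class TverskySketch {
    /** Returns 1 - Tversky(A, B) for two equally sized bit fingerprints. */
    static float tverskyDissimilarity(int[] a, int[] b, float alpha, float beta) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        int common = 0;
        int onlyA = 0;
        int onlyB = 0;
        for (int i = 0; i < a.length; i++) {
            common += Integer.bitCount(a[i] & b[i]);
            onlyA += Integer.bitCount(a[i] & ~b[i]);
            onlyB += Integer.bitCount(~a[i] & b[i]);
        }
        float denominator = alpha * onlyA + beta * onlyB + common;
        // Assumption: a zero denominator is treated as "no dissimilarity".
        return denominator == 0.0f ? 0.0f : 1.0f - common / denominator;
    }
}
```

With alpha = beta = 1 the value coincides with the Tanimoto dissimilarity; unequal weights make the measure asymmetric.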
-
getEuclidean
Calculates the Euclidean distance. This is the same as the Euclidean distance for bit strings.
- Parameters:
- f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
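For bit strings, each squared coordinate difference is 0 or 1, so the Euclidean distance reduces to the square root of the number of differing bits. The sketch below shows this, together with one possible normalization (dividing the differing bit count by the total number of bits) that bounds the result by 1. The normalization actually used by this class is not stated here, so that part and all names are assumptions.

```java
// Illustrative sketch only: the normalization shown is an assumption.
final class EuclideanSketch {
    /** Counts the positions where the two fingerprints differ. */
    static int differingBits(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        int diff = 0;
        for (int i = 0; i < a.length; i++) {
            diff += Integer.bitCount(a[i] ^ b[i]);
        }
        return diff;
    }

    /** Unbounded Euclidean distance between two bit fingerprints. */
    static float euclidean(int[] a, int[] b) {
        return (float) Math.sqrt(differingBits(a, b));
    }

    /** Euclidean distance scaled into the range [0, 1]. */
    static float normalizedEuclidean(int[] a, int[] b) {
        int totalBits = a.length * Integer.SIZE;
        return totalBits == 0 ? 0.0f : (float) Math.sqrt((double) differingBits(a, b) / totalBits);
    }
}
```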
-
getWeightedEuclidean
Calculates the weighted Euclidean distance. This is the same as the weighted Euclidean distance for bit strings.
- Parameters:
- f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getAsymmetricEuclidean
Calculates the asymmetric Euclidean distance. This is the same as the asymmetric Euclidean distance for bit strings.
- Parameters:
- f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getWeightedAsymmetricEuclidean
Calculates the weighted asymmetric Euclidean distance. This is the same as the weighted asymmetric Euclidean distance for bit strings.
- Parameters:
- f - the fingerprint from which the distance is calculated
- Returns:
- the dissimilarity coefficient
-
getDissimilarity
Calculates the dissimilarity between two chemical fingerprints using the default distance measure.
- Specified by:
- getDissimilarity in class MolecularDescriptor
- Parameters:
- fp2 - the other chemical fingerprint
- Returns:
- dissimilarity ratio
-
getDissimilarity
Calculates the dissimilarity between two chemical fingerprints using the specified distance metric. The index of the required metric can be obtained by calling getMetricIndex( String metricName ).
New metrics implemented by this class have to be added at the end of the existing ones.
- Specified by:
- getDissimilarity in class MolecularDescriptor
- Parameters:
- fp2 - the chemical fingerprint from which the distance is measured
- metricIndex - index of the metric to be used
- Returns:
- the dissimilarity ratio
- See Also:
-
isSubSetOf
Checks if this fingerprint is a subset of another fingerprint that is passed as method parameter. A binary fingerprint is considered a subset of another if none of its bits is larger than the corresponding bit of the other.
- Parameters:
- f - a descriptor which is supposed to be a superset
- Returns:
- true if this descriptor is a subset of the parameter
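The subset condition above (no bit larger than the corresponding bit of the other fingerprint) means that no bit may be set in this fingerprint while unset in the other. On packed words this can be checked with a bitwise AND-NOT, as in the following sketch; the names are hypothetical and the packed int[] layout is an assumption.

```java
// Illustrative sketch only: subset test on packed bit words, not library code.
final class SubsetSketch {
    /** Returns true if every bit set in candidate is also set in superset. */
    static boolean isSubsetOf(int[] candidate, int[] superset) {
        if (candidate.length != superset.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        for (int i = 0; i < candidate.length; i++) {
            if ((candidate[i] & ~superset[i]) != 0) {
                return false; // a bit is set here but not in the superset
            }
        }
        return true;
    }
}
```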
-
getAliasNames
-