Class ECFPFeatureLookup
- java.lang.Object
  - chemaxon.descriptors.ECFPFeatureLookup
@PublicAPI public class ECFPFeatureLookup extends Object
Class for retrieving the substructural features of ECFP fingerprints. ECFPs are represented either as lists of integer identifiers or as fixed-length bit strings, in which the identifiers and bit positions account for particular substructural features of the input molecule. This class provides a lookup service for both kinds of ECFP representation.
A related class, ECFPFeature, represents the substructural features of ECFP fingerprints. More precisely, each ECFPFeature instance captures a circular atom neighborhood of the input molecule by recording a central atom and a diameter. The ECFP generation process assigns integer identifiers to these substructural features by a hashing procedure. The positions of the 1 bits in the fixed-length bit string representation are derived from these identifiers. This lookup class provides methods to obtain the represented ECFP features for a given identifier or bit position.
Note that there is no one-to-one relationship between the substructural features and the generated identifiers. Therefore, the lookup methods of this class return a list of corresponding ECFPFeature objects for the given identifier or bit position. Atom neighborhoods that are equivalent with respect to the considered atom properties are, by design, represented by the same identifier and bit position. However, unwanted collisions may also occur, especially for the fixed-length bit string representation. That is, completely different substructural features may be represented by the same bit position due to the applied hashing method (folding). In such cases, all represented features are listed by the lookup methods. (These collisions are inevitable effects of the limited representation capability of fixed-length fingerprints.)
Apart from the collisions, it is also possible that two different identifiers represent the same atom neighborhood but originate from different central atoms. In such cases, the fingerprint generation method eliminates the redundancy by keeping only one representation according to a specific rule. For example, in the ECFP fingerprint of CO, only three identifiers (bits) are kept out of the four generated.
This class requires ECFP configuration parameters, which determine both the generation process of ECFP features and the standardization actions that are applied to the input molecule. Use exactly the same configuration parameters for fingerprint generation and feature retrieval to ensure correct results.
For more information about ECFPs, see the related HTML documentation.
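The collision behaviour described above can be illustrated with a minimal, library-independent sketch. It assumes that identifiers are folded into a fixed-length bit string by taking the identifier modulo the bit-string length. That folding rule, the class name FoldingSketch, and the sample identifiers are illustrative assumptions, not the documented JChem implementation. The sketch builds a reverse index from bit position to identifiers, which shows why a lookup by bit position has to return a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FoldingSketch {

    // Assumed folding rule: identifier modulo the bit-string length.
    // Math.floorMod keeps the result non-negative for negative identifiers.
    public static int fold(int identifier, int length) {
        return Math.floorMod(identifier, length);
    }

    // Reverse index: bit position -> all identifiers folded onto that position.
    public static Map<Integer, List<Integer>> reverseIndex(int[] identifiers, int length) {
        Map<Integer, List<Integer>> index = new TreeMap<>();
        for (int id : identifiers) {
            index.computeIfAbsent(fold(id, length), k -> new ArrayList<>()).add(id);
        }
        return index;
    }

    public static void main(String[] args) {
        int[] ids = {5, 1029, -3, 2048}; // made-up identifiers
        Map<Integer, List<Integer>> index = reverseIndex(ids, 1024);
        // Identifiers 5 and 1029 share a bit position under this rule.
        index.forEach((bit, list) -> System.out.println("bit " + bit + " <- " + list));
    }
}
```

With a 1024-bit string, the identifiers 5 and 1029 land on the same position in this sketch, so a position-based lookup cannot tell them apart and must report both.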
Typical usage
    ECFPFeatureLookup lookup = new ECFPFeatureLookup();
    lookup.processMolecule(mol);
    for (ECFPFeature f : lookup.getFeaturesFromIdentifier(id)) {
        System.out.println(f.getSubstructure().toFormat("SMARTS"));
    }
Since:
JChem 5.5
See Also:
ECFPFeature, ECFP
Constructor Summary
Constructors
Constructor and Description
ECFPFeatureLookup()
    Creates a new ECFPFeatureLookup instance with the default ECFP configuration parameters.
ECFPFeatureLookup(ECFPParameters params)
    Creates a new ECFPFeatureLookup instance with the given ECFP configuration parameters.
ECFPFeatureLookup(String configString)
    Creates a new ECFPFeatureLookup instance with the ECFP configuration parameters given as an XML configuration string.
-
Method Summary
Modifier and Type, Method and Description
int getBitPosition(int id)
    Returns the corresponding bit position for the given integer identifier.
Integer getBitPosition(MolAtom atom, int diameter)
    Returns the corresponding bit position for the given atom neighborhood.
List<ECFPFeature> getFeaturesFromBitPosition(int bitPos)
    Returns the substructural features represented by the given bit position.
List<ECFPFeature> getFeaturesFromIdentifier(int id)
    Returns the substructural features represented by the given integer identifier.
Integer getIdentifier(MolAtom atom, int diameter)
    Returns the corresponding integer identifier for the given atom neighborhood.
void processMolecule(Molecule mol)
    Performs the necessary preprocessing for the given molecule.
-
-
-
Constructor Detail
-
ECFPFeatureLookup
public ECFPFeatureLookup()
Creates a new ECFPFeatureLookup instance with the default ECFP configuration parameters.
-
ECFPFeatureLookup
public ECFPFeatureLookup(String configString)
Creates a new ECFPFeatureLookup instance with the given ECFP configuration parameters.
Parameters:
configString - ECFP configuration string in XML
-
ECFPFeatureLookup
public ECFPFeatureLookup(ECFPParameters params)
Creates a new ECFPFeatureLookup instance with the given ECFP configuration parameters.
Parameters:
params - ECFP parameters object
-
-
Method Detail
-
processMolecule
public void processMolecule(Molecule mol)
Performs the necessary preprocessing for the given molecule.
Parameters:
mol - the molecule
-
getFeaturesFromIdentifier
public List<ECFPFeature> getFeaturesFromIdentifier(int id)
Returns the substructural features represented by the given integer identifier. If no such feature is found, this method returns an empty list.
Parameters:
id - the identifier
Returns:
the list of ECFP features
-
getFeaturesFromBitPosition
public List<ECFPFeature> getFeaturesFromBitPosition(int bitPos)
Returns the substructural features represented by the given bit position. If no such feature is found, this method returns an empty list.
Parameters:
bitPos - the position in the fixed-length bit string
Returns:
the list of ECFP features
-
getIdentifier
public Integer getIdentifier(MolAtom atom, int diameter) throws IllegalArgumentException
Returns the corresponding integer identifier for the given atom neighborhood.
Note that the generated identifier is often removed by the fingerprint generation process because the same atom neighborhood is represented by another center atom and diameter. In these cases, this method returns null.
Parameters:
atom - the center atom of the circular neighborhood. It must be a chemical atom that is not removed in the standardization phase.
diameter - the diameter of the circular neighborhood. It must be an even number between zero and the maximum diameter specified by the ECFP configuration parameters.
Returns:
the integer identifier, or null if no identifier corresponds to the given neighborhood in the generated fingerprint.
Throws:
IllegalArgumentException - if the center atom or the diameter is illegal (e.g., the given atom is an explicit hydrogen, which is removed by the applied standardizer).
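The diameter constraint stated above can be checked before calling the lookup. The helper below is a hypothetical sketch: the class name DiameterCheck, the method requireValidDiameter, and the maxDiameter parameter are assumptions for illustration and are not part of the ECFPFeatureLookup API. It mirrors only the documented rule that the diameter must be even and lie between zero and the configured maximum, and it assumes both bounds are inclusive.

```java
public class DiameterCheck {

    // Hypothetical helper: returns the diameter unchanged if it satisfies the
    // documented rule, otherwise throws IllegalArgumentException.
    // Assumption: the range [0, maxDiameter] is inclusive on both ends.
    public static int requireValidDiameter(int diameter, int maxDiameter) {
        if (diameter < 0 || diameter > maxDiameter) {
            throw new IllegalArgumentException(
                    "diameter " + diameter + " is outside [0, " + maxDiameter + "]");
        }
        if (diameter % 2 != 0) {
            throw new IllegalArgumentException("diameter " + diameter + " is not even");
        }
        return diameter;
    }

    public static void main(String[] args) {
        System.out.println("accepted: " + requireValidDiameter(2, 4));
        try {
            requireValidDiameter(3, 4);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating up front gives a more specific error message than the single exception type documented for the lookup method, which covers both illegal atoms and illegal diameters.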
-
getBitPosition
public Integer getBitPosition(MolAtom atom, int diameter) throws IllegalArgumentException
Returns the corresponding bit position for the given atom neighborhood.
Note that the generated identifier is often removed by the fingerprint generation process because the same atom neighborhood is represented by another center atom and diameter. In these cases, this method returns null.
Parameters:
atom - the center atom of the circular neighborhood. It must be a chemical atom that is not removed in the standardization phase.
diameter - the diameter of the circular neighborhood. It must be an even number between zero and the maximum diameter specified by the ECFP configuration parameters.
Returns:
the bit position, or null if no identifier corresponds to the given neighborhood in the generated fingerprint.
Throws:
IllegalArgumentException - if the center atom or the diameter is illegal (e.g., the given atom is an explicit hydrogen, which is removed by the applied standardizer).
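Because this method returns an Integer that may be null, assigning the result directly to an int would unbox null and throw a NullPointerException. The sketch below shows one defensive pattern. It uses a stand-in function in place of the real lookup: lookupBitPosition, toOptional, and the sample table are hypothetical and serve only to demonstrate the null handling.

```java
import java.util.Map;
import java.util.OptionalInt;

public class NullableResultSketch {

    // Stand-in for a lookup that returns null when nothing matches (hypothetical data).
    public static Integer lookupBitPosition(Map<String, Integer> table, String key) {
        return table.get(key);
    }

    // Wraps the nullable result so that callers cannot unbox null by accident.
    public static OptionalInt toOptional(Integer value) {
        return value == null ? OptionalInt.empty() : OptionalInt.of(value);
    }

    public static void main(String[] args) {
        Map<String, Integer> table = Map.of("C/0", 17);
        OptionalInt hit = toOptional(lookupBitPosition(table, "C/0"));
        OptionalInt miss = toOptional(lookupBitPosition(table, "O/2"));
        System.out.println(hit.isPresent() ? "bit " + hit.getAsInt() : "no bit");
        System.out.println(miss.isPresent() ? "bit " + miss.getAsInt() : "no bit");
    }
}
```

The same pattern applies to any method in this class that returns Integer rather than int.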
-
getBitPosition
public int getBitPosition(int id)
Returns the corresponding bit position for the given integer identifier.
Parameters:
id - the identifier
Returns:
the bit position
-
-