Class ECFPFeatureLookup
A related class, ECFPFeature, represents the substructural features of
ECFP fingerprints. More precisely, each ECFPFeature instance captures
a circular atom neighborhood of the input molecule by recording a central atom
and a diameter. The ECFP generation process assigns integer identifiers to these
substructural features by a hashing procedure. The positions of the 1 bits in the
fixed-length bit string representation are derived from these identifiers.
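To make the relationship between identifiers and bit positions concrete, the following minimal sketch folds an integer identifier into a fixed-length bit string. It assumes a simple modulo folding; the actual scheme used by the ECFP generator is not described on this page, and the class and method names here are hypothetical.

```java
// Illustrative sketch only: modulo folding is an ASSUMPTION, not the documented ECFP scheme.
final class FoldingSketch {

    /** Maps a hashed integer identifier to a position in a bit string of the given length. */
    static int bitPosition(int identifier, int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        // floorMod keeps the result in [0, length) even for negative identifiers.
        return Math.floorMod(identifier, length);
    }

    public static void main(String[] args) {
        int length = 1024; // hypothetical fingerprint length
        System.out.println(bitPosition(2050, length));
        System.out.println(bitPosition(-1, length));
    }
}
```

For real fingerprints, use getBitPosition(int id) instead, so that the mapping always matches the configured ECFP parameters.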
This lookup class provides methods to obtain the represented ECFP features for a given
identifier or bit position.
Note that there is no one-to-one relationship between the substructural features and the
generated identifiers. Therefore, the lookup methods of this class return a list of
corresponding ECFPFeature objects for the given identifier or bit position.
By design, atom neighborhoods that are equivalent with respect to the considered atom
properties are represented by the same identifier and bit position.
However, unwanted collisions may also occur, especially for the fixed-length bit string
representation. That is, completely different substructural features may be represented by
the same bit position due to the applied hashing method (folding).
In such cases, all represented features are listed by the lookup methods.
(These collisions are inevitable effects of the limited representation capability
of fixed-length fingerprints.)
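The folding collisions described above can be illustrated with a small self-contained sketch. It groups a handful of made-up identifiers by the bit position they fold to. The modulo folding, the identifier values, and all names are assumptions chosen for illustration, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: modulo folding and the sample identifiers are ASSUMPTIONS.
final class CollisionSketch {

    /** Groups identifiers by the bit position they fold to in a bit string of the given length. */
    static Map<Integer, List<Integer>> groupByBit(int[] identifiers, int length) {
        Map<Integer, List<Integer>> byBit = new TreeMap<>();
        for (int id : identifiers) {
            int bit = Math.floorMod(id, length);
            byBit.computeIfAbsent(bit, k -> new ArrayList<>()).add(id);
        }
        return byBit;
    }

    public static void main(String[] args) {
        int[] ids = {5, 1029, 77}; // hypothetical identifiers
        // With length 1024, identifiers 5 and 1029 share bit position 5.
        groupByBit(ids, 1024).forEach((bit, group) ->
                System.out.println("bit " + bit + " <- " + group));
    }
}
```

A bit position whose group holds more than one identifier corresponds to the case where getFeaturesFromBitPosition lists several unrelated features.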
Apart from collisions, two different identifiers may also represent the same atom
neighborhood, each originating from a different central atom.
In such cases, the fingerprint generation method eliminates the redundancy by keeping only
one representation according to a specific rule.
For example, in the ECFP fingerprint of CO, only three of the four generated
identifiers (bits) are kept.
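The CO example can be traced with a short sketch that models each neighborhood as the set of atom indices it covers and drops any neighborhood whose atom set has already been seen. The "keep the first occurrence" rule and all names below are assumptions made for illustration; the page only states that a specific rule is applied.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: "keep the first occurrence" is an ASSUMED rule.
final class DuplicateNeighborhoodSketch {

    /** Returns the indices of the neighborhoods kept after dropping repeated atom sets. */
    static List<Integer> keepFirstOccurrences(List<Set<Integer>> neighborhoods) {
        Set<Set<Integer>> seen = new HashSet<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < neighborhoods.size(); i++) {
            // Set.add returns false when an equal atom set is already present.
            if (seen.add(neighborhoods.get(i))) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // CO: atom 0 is the carbon, atom 1 is the oxygen.
        List<Set<Integer>> neighborhoods = List.of(
                Set.of(0),      // center C, diameter 0
                Set.of(1),      // center O, diameter 0
                Set.of(0, 1),   // center C, diameter 2
                Set.of(0, 1));  // center O, diameter 2: same atoms as the previous entry
        List<Integer> kept = keepFirstOccurrences(neighborhoods);
        System.out.println(kept.size() + " of " + neighborhoods.size() + " kept"); // prints "3 of 4 kept"
    }
}
```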
This class requires ECFP configuration parameters, which determine both the generation process of ECFP features and the standardization actions applied to the input molecule. Use exactly the same configuration parameters for fingerprint generation and feature retrieval to ensure correct results.
For more information about ECFPs, see the related HTML documentation.
Typical usage
ECFPFeatureLookup lookup = new ECFPFeatureLookup();
lookup.processMolecule(mol);
for (ECFPFeature f : lookup.getFeaturesFromIdentifier(id)) {
    System.out.println(f.getSubstructure().toFormat("SMARTS"));
}
- Since:
- JChem 5.5
Constructor Summary
ECFPFeatureLookup()
- Creates a new ECFPFeatureLookup instance with the default ECFP configuration parameters.
ECFPFeatureLookup(ECFPParameters params)
- Creates a new ECFPFeatureLookup instance with the given ECFP configuration parameters.
ECFPFeatureLookup(String configString)
- Creates a new ECFPFeatureLookup instance with the given ECFP configuration parameters.
Method Summary
int getBitPosition(int id)
- Returns the corresponding bit position for the given integer identifier.
Integer getBitPosition(MolAtom atom, int diameter)
- Returns the corresponding bit position for the given atom neighborhood.
List<ECFPFeature> getFeaturesFromBitPosition(int bitPos)
- Returns the substructural features represented by the given bit position.
List<ECFPFeature> getFeaturesFromIdentifier(int id)
- Returns the substructural features represented by the given integer identifier.
Integer getIdentifier(MolAtom atom, int diameter)
- Returns the corresponding integer identifier for the given atom neighborhood.
void processMolecule(Molecule mol)
- Performs the necessary preprocessing for the given molecule.
Constructor Details
ECFPFeatureLookup
public ECFPFeatureLookup()
Creates a new ECFPFeatureLookup instance with the default ECFP configuration parameters.

ECFPFeatureLookup
Creates a new ECFPFeatureLookup instance with the given ECFP configuration parameters.
- Parameters:
configString
- ECFP configuration string in XML

ECFPFeatureLookup
Creates a new ECFPFeatureLookup instance with the given ECFP configuration parameters.
- Parameters:
params
- ECFP parameters object
Method Details
processMolecule
Performs the necessary preprocessing for the given molecule.
- Parameters:
mol
- the molecule

getFeaturesFromIdentifier
Returns the substructural features represented by the given integer identifier. If no such feature is found, this method returns an empty list.
- Parameters:
id
- the identifier
- Returns:
- the list of ECFP features

getFeaturesFromBitPosition
Returns the substructural features represented by the given bit position. If no such feature is found, this method returns an empty list.
- Parameters:
bitPos
- the position in the fixed-length bit string
- Returns:
- the list of ECFP features

getIdentifier
Returns the corresponding integer identifier for the given atom neighborhood.
Note that the generated identifier is often removed by the fingerprint generation process because the same atom neighborhood is represented by another center atom and diameter. In these cases, this method returns null.
- Parameters:
atom
- the center atom of the circular neighborhood. It must be a chemical atom that is not removed in the standardization phase.
diameter
- the diameter of the circular neighborhood. It must be an even number between zero and the maximum diameter specified by the ECFP configuration parameters.
- Returns:
- the integer identifier, or null if no identifier corresponds to the given neighborhood in the generated fingerprint.
- Throws:
IllegalArgumentException
- if the central atom or the diameter is illegal (e.g., the given atom is an explicit hydrogen, which is removed by the applied standardizer).
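
The diameter constraint stated above (an even number between zero and the configured maximum) can be checked before calling the lookup. The sketch below is a hypothetical stand-alone helper, not part of the library. It assumes that the bounds are inclusive and that the caller supplies the maximum diameter from the ECFP configuration.

```java
// Illustrative sketch only: a hypothetical pre-check mirroring the documented constraint.
final class DiameterCheckSketch {

    /** Returns true if the diameter is even and lies in [0, maxDiameter] (bounds ASSUMED inclusive). */
    static boolean isValidDiameter(int diameter, int maxDiameter) {
        return diameter >= 0 && diameter <= maxDiameter && diameter % 2 == 0;
    }

    /** Throws IllegalArgumentException for an invalid diameter, as the lookup methods are documented to do. */
    static void checkDiameter(int diameter, int maxDiameter) {
        if (!isValidDiameter(diameter, maxDiameter)) {
            throw new IllegalArgumentException(
                    "diameter must be an even number between 0 and " + maxDiameter + ": " + diameter);
        }
    }

    public static void main(String[] args) {
        int maxDiameter = 4; // hypothetical value taken from the ECFP configuration
        for (int d = -2; d <= 6; d++) {
            System.out.println(d + " -> " + isValidDiameter(d, maxDiameter));
        }
    }
}
```

Even with a valid atom and diameter, the lookup may still return null, so callers should test the returned Integer before unboxing it.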

getBitPosition
Returns the corresponding bit position for the given atom neighborhood.
Note that the generated identifier is often removed by the fingerprint generation process because the atom neighborhood is represented by another center atom and diameter. In these cases, this method returns null.
- Parameters:
atom
- the center atom of the circular neighborhood. It must be a chemical atom that is not removed in the standardization phase.
diameter
- the diameter of the circular neighborhood. It must be an even number between zero and the maximum diameter specified by the ECFP configuration parameters.
- Returns:
- the bit position, or null if no identifier corresponds to the given neighborhood in the generated fingerprint.
- Throws:
IllegalArgumentException
- if the central atom or the diameter is illegal (e.g., the given atom is an explicit hydrogen, which is removed by the applied standardizer).

getBitPosition
public int getBitPosition(int id)
Returns the corresponding bit position for the given integer identifier.
- Parameters:
id
- the identifier
- Returns:
- the bit position