Class DescriptorGenerator


  • @PublicAPI
    public class DescriptorGenerator
    extends Object
    Simple class for generating molecular descriptors (fingerprints). The main purpose of this class is to provide a lightweight common interface for creating various molecular descriptors and obtaining them in different formats.

    Typical usage

          DescriptorGenerator gen = new DescriptorGenerator("ECFP");
          gen.setParameter("Length", "512");
          Molecule mol = getFirstMoleculeFromSomewhere();
          while (mol != null) {
              gen.generate(mol);
              doSomethingWith(gen.getAsString());
              doSomethingWith(gen.getAsBitSet());
              mol = getNextMoleculeFromSomewhere();
          }
     
    Since:
    JChem 5.4
    • Constructor Detail

      • DescriptorGenerator

        public DescriptorGenerator​(String descrType)
        Creates a new instance using the given descriptor type with its default configuration parameters.
        Parameters:
        descrType - Predefined type name or class name of the desired molecular descriptor type. The list of available descriptor types can be obtained using getDescriptorTypes(). If the given string does not match any of the predefined names, it is assumed to be a class name.
        Throws:
        RuntimeException - if the given name matches no predefined descriptor type and no derived class of MolecularDescriptor with that name can be initialized.
      • DescriptorGenerator

        public DescriptorGenerator​(String descrType,
                                   String configString)
                            throws MDParametersException
        Creates a new instance using the given descriptor type with the given XML configuration.
        Parameters:
        descrType - Predefined type name or class name of the desired molecular descriptor type. The list of available descriptor types can be obtained using getDescriptorTypes(). If the given string does not match any of the predefined names, it is assumed to be a class name.
        configString - XML configuration string for the selected descriptor type.
        Throws:
        RuntimeException - if the given name matches no predefined descriptor type and no derived class of MolecularDescriptor with that name can be initialized.
        MDParametersException - if the XML configuration is invalid.
    • Method Detail

      • getDescriptorTypes

        public static String[] getDescriptorTypes()
        Returns the list of the built-in molecular descriptor types. The returned array contains the short names of the descriptors. The long names can be obtained using getDescriptorLongName(String).
      • getDescriptorLongName

        public static String getDescriptorLongName​(String descrType)
        Returns the long name for the given molecular descriptor type.
        Parameters:
        descrType - Predefined short name of a descriptor type. The list of available short names can be obtained using getDescriptorTypes().
        Throws:
        IllegalArgumentException - if the given parameter is not an available descriptor type.
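        The two static methods above are typically combined to build a table of short and long names. The following is a minimal sketch of that pattern, not library code: DescriptorTypeCatalog and longNames are hypothetical names, and the lookup data in main is placeholder data standing in for what the library would return. The helper takes the name array and the lookup function as arguments so that it compiles without JChem on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical helper: pairs each descriptor short name with its long name. */
final class DescriptorTypeCatalog {

    /**
     * Maps each short name to its long name, keeping the order of the input array.
     * With JChem available, the arguments would presumably be
     * DescriptorGenerator.getDescriptorTypes() and
     * DescriptorGenerator::getDescriptorLongName.
     */
    static Map<String, String> longNames(String[] shortNames, UnaryOperator<String> longNameOf) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String shortName : shortNames) {
            // An IllegalArgumentException from the lookup propagates to the caller.
            result.put(shortName, longNameOf.apply(shortName));
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder data; the real names come from the library at run time.
        String[] types = {"ECFP", "TYPE_B"};
        UnaryOperator<String> lookup = shortName -> {
            switch (shortName) {
                case "ECFP":
                    return "placeholder long name for ECFP";
                case "TYPE_B":
                    return "placeholder long name for TYPE_B";
                default:
                    throw new IllegalArgumentException("Unknown descriptor type: " + shortName);
            }
        };
        longNames(types, lookup).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

        Passing the lookup as a function keeps the sketch independent of the library; an unknown short name surfaces as the same IllegalArgumentException that getDescriptorLongName(String) is documented to throw.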
      • setParameter

        public void setParameter​(String paramName,
                                 String paramValue)
        Sets a parameter of the current descriptor configuration. Only the few main parameters of each descriptor type that are stored as attributes of a designated element in the XML configuration can be set this way. To specify other parameters, pass a full XML configuration to the constructor of the class.
        Parameters:
        paramName - the name of the parameter, which must be the same as the attribute name in the XML configuration.
        paramValue - the new value of the parameter.
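        To illustrate what "the same as the attribute name in the XML configuration" means, the sketch below sets an attribute on an element of a small XML string using only the JDK's DOM API. It does not call JChem and is not how the library implements setParameter. The root element ExampleConfiguration, the element name Parameters, and the helper withParameter are all assumptions made for illustration; the real element and attribute names depend on the descriptor type.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical illustration of a parameter stored as an XML attribute. */
final class ParameterAttributeSketch {

    /** Parses the XML, sets paramName=paramValue on the first matching element, and returns it. */
    static Element withParameter(String xml, String elementName, String paramName, String paramValue) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid XML configuration", e);
        }
        Element element = (Element) doc.getElementsByTagName(elementName).item(0);
        if (element == null) {
            throw new IllegalArgumentException("No <" + elementName + "> element in configuration");
        }
        // The parameter name is used directly as the attribute name.
        element.setAttribute(paramName, paramValue);
        return element;
    }

    public static void main(String[] args) {
        // Assumed layout: one designated element carrying the main parameters as attributes.
        String xml = "<ExampleConfiguration><Parameters Length=\"1024\"/></ExampleConfiguration>";
        Element updated = withParameter(xml, "Parameters", "Length", "512");
        System.out.println(updated.getAttribute("Length")); // prints 512
    }
}
```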
      • setStandardizer

        public void setStandardizer​(Standardizer standardizer)
        Sets the standardizer object to be used during descriptor generation. This method replaces any standardizer defined earlier, whether by a previous call to this method or by the configuration parameters of the descriptor.
        Parameters:
        standardizer - the standardizer object
        Since:
        JChem 5.12
      • generate

        public void generate​(Molecule mol,
                             int[] atoms)
                      throws MDGeneratorException
        Generates a partial descriptor for the given molecule. The generated descriptor will contain only those features that are related to the given atoms of the input molecule.

        Currently, only ChemicalFingerprint supports this kind of partial descriptor generation. UnsupportedOperationException is thrown for all other descriptor types.

        Parameters:
        mol - the molecule.
        atoms - indexes of the selected atoms.
        Throws:
        MDGeneratorException - if the descriptor cannot be generated.
        UnsupportedOperationException - if the selected descriptor type does not support partial generation.
        Since:
        JChem 5.4.1
      • getAsString

        public String getAsString()
        Returns the generated descriptor in its native string representation. This function is applicable to all kinds of descriptors.
      • getAsFloatArray

        public float[] getAsFloatArray()
                                throws UnsupportedOperationException
        Returns the generated descriptor in a float array representation if it is available.
        Throws:
        UnsupportedOperationException - if no appropriate conversion can be applied for the selected descriptor type.
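        Because getAsString() applies to every descriptor type while getAsFloatArray() may throw, callers that handle several descriptor types can try the float array first and fall back to the string form. The following is a minimal sketch of that pattern. The DescriptorOutput interface is a hypothetical stand-in that mirrors the two getters documented above so that the example compiles without JChem; describe and fixed are illustrative helpers, not library methods.

```java
import java.util.Arrays;

/** Sketch: prefer the float-array form, fall back to the string form. */
final class DescriptorOutputFallback {

    /** Hypothetical stand-in for the two DescriptorGenerator getters used here. */
    interface DescriptorOutput {
        String getAsString();
        float[] getAsFloatArray();
    }

    /** Returns a printable form, using the float array when the descriptor supports it. */
    static String describe(DescriptorOutput out) {
        try {
            return Arrays.toString(out.getAsFloatArray());
        } catch (UnsupportedOperationException e) {
            // No float conversion for this descriptor type; use the string form instead.
            return out.getAsString();
        }
    }

    /** Test double: a null float array simulates a type without a float conversion. */
    static DescriptorOutput fixed(final String text, final float[] values) {
        return new DescriptorOutput() {
            @Override
            public String getAsString() {
                return text;
            }

            @Override
            public float[] getAsFloatArray() {
                if (values == null) {
                    throw new UnsupportedOperationException("no float representation");
                }
                return values;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(fixed("1.5 2.0", new float[] {1.5f, 2.0f}))); // prints [1.5, 2.0]
        System.out.println(describe(fixed("0101", null)));                        // prints 0101
    }
}
```

        Catching only UnsupportedOperationException keeps other failures visible instead of silently replacing them with the string form.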