Class MolecularDescriptor

  • All Implemented Interfaces:
    Cloneable
    Direct Known Subclasses:
    BCUT, ChemicalFingerprint, CustomDescriptor, ECFP, PharmacophoreFingerprint, ReactionFingerprint, ScalarDescriptor, ShapeDescriptor

    @PublicAPI
    public abstract class MolecularDescriptor
    extends Object
    implements Cloneable
    Generic definition of molecular descriptors. The MolecularDescriptor class models all kinds of structural keys, fingerprints (hashed, pharmacophoric), MDL keys and many others, which can be implemented in derived classes; some of these are implemented in JChem. For the sake of generality, the MolecularDescriptor class does not introduce operations that manipulate descriptors at the "atomic" level, that is, cells (or bins) of descriptors cannot be accessed either for reading or for writing. This is because cells in various chemical descriptors can have different types (for example bit, integer or floating point value).
    Operations between different MolecularDescriptor derivatives are not supported, though for the sake of efficiency no extra type checking is introduced (other than that provided by the language itself).
    Derived classes should define their own dissimilarity metrics for the sake of efficiency. This is one reason why this class does not define any dissimilarity metric; the other is that different MolecularDescriptor subclasses may have different representations (for instance integer array vs. float array).
    Since:
    JChem 2.0
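The design described above (no cell access in the base class, each derived class bringing a metric suited to its own representation) can be illustrated with a small stand-alone sketch. All class names below are hypothetical stand-ins and are not part of JChem; the Tanimoto and Euclidean metrics are chosen only as familiar examples.

```java
import java.util.BitSet;

/** Hypothetical stand-in for the abstract base: it exposes no cells and no metric of its own. */
abstract class DescriptorSketch {
    /** Each subclass supplies a metric suited to its own representation. */
    abstract float getDissimilarity(DescriptorSketch other);
}

/** Bit-based stand-in: Tanimoto dissimilarity computed directly on bit sets. */
final class BitDescriptorSketch extends DescriptorSketch {
    private final BitSet bits;

    BitDescriptorSketch(BitSet bits) {
        this.bits = (BitSet) bits.clone();
    }

    @Override
    float getDissimilarity(DescriptorSketch other) {
        // No extra type checking: mixing descriptor types fails with ClassCastException.
        BitSet o = ((BitDescriptorSketch) other).bits;
        BitSet and = (BitSet) bits.clone();
        and.and(o);
        BitSet or = (BitSet) bits.clone();
        or.or(o);
        int union = or.cardinality();
        return union == 0 ? 0f : 1f - (float) and.cardinality() / union;
    }
}

/** Float-based stand-in: Euclidean distance computed directly on the float array. */
final class FloatDescriptorSketch extends DescriptorSketch {
    private final float[] values;

    FloatDescriptorSketch(float[] values) {
        this.values = values.clone();
    }

    @Override
    float getDissimilarity(DescriptorSketch other) {
        float[] o = ((FloatDescriptorSketch) other).values;
        if (o.length != values.length) {
            throw new IllegalArgumentException("descriptor lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            double d = values[i] - o[i];
            sum += d * d;
        }
        return (float) Math.sqrt(sum);
    }
}

final class DescriptorSketchDemo {
    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0, 3); // bits 0, 1, 2
        BitSet b = new BitSet();
        b.set(1, 4); // bits 1, 2, 3
        System.out.println(new BitDescriptorSketch(a).getDissimilarity(new BitDescriptorSketch(b)));
    }
}
```

Because each subclass works on its own native representation, no conversion to a common cell type is needed before a distance is computed.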
    • Field Detail

      • params

        protected MDParameters params
        Parameter settings related to the descriptor. Instances of the MolecularDescriptor class with the same parameter settings share a common MDParameters object; however, MolecularDescriptor objects with different parameters are also allowed in the same application. MDParameters can be specialized through inheritance, so classes derived from MolecularDescriptor are recommended to derive their own MDParameters subclasses.
    • Constructor Detail

      • MolecularDescriptor

        public MolecularDescriptor​(MDParameters parameters)
        Creates a new MolecularDescriptor with the given parameters.
        Parameters:
        parameters - parameter settings of the descriptor to be created
        Since:
        JChem 2.2
      • MolecularDescriptor

        public MolecularDescriptor()
        Default constructor, creates an empty object.
      • MolecularDescriptor

        public MolecularDescriptor​(MolecularDescriptor c)
        Copy constructor, creates an identical copy of the MolecularDescriptor passed as a parameter.
        Parameters:
        c - a MolecularDescriptor to be copied
    • Method Detail

      • newInstance

        public static final MolecularDescriptor newInstance​(String descriptorTypeName,
                                                            String parameters)
        Creates a MolecularDescriptor specified by its name and XML parameter configuration.
        Parameters:
        descriptorTypeName - predefined type name or class name. If the name does not match any of the predefined names, it is assumed to be a class name.
        parameters - XML parameter configuration
        Returns:
        a new instance of the appropriate MolecularDescriptor
      • newInstanceSupplier

        public static final com.google.common.base.Supplier<MolecularDescriptor> newInstanceSupplier​(String descriptorTypeName)
        Creates a Supplier of the specified molecular descriptor. See newInstance(String).
      • newInstance

        public static final MolecularDescriptor newInstance​(String descriptorTypeName)
        Creates a MolecularDescriptor specified by its name. The descriptor is created with the default parameter settings.
        Parameters:
        descriptorTypeName - predefined type name or class name. If the name does not match any of the predefined names, it is assumed to be a class name.
        Returns:
        a new instance of the appropriate MolecularDescriptor
      • newInstanceFromXML

        public static final MolecularDescriptor newInstanceFromXML​(String parameters)
        Creates a new MolecularDescriptor instance according to the given parameter string. The parameter string is an XML configuration that specifies the type of the MolecularDescriptor as well as its parameter settings.
        Parameters:
        parameters - XML configuration string
        Returns:
        a new instance of the appropriate MolecularDescriptor
      • clone

        public abstract MolecularDescriptor clone()
        Creates a new instance with identical internal state.
        Overrides:
        clone in class Object
        Returns:
        the newly copied object
      • getName

        public String getName()
        Gets the name of the descriptor. The name differs from the class name: it is more readable and meaningful to end users.
        Returns:
        the descriptor's name depending on its actual type
      • getShortName

        public String getShortName()
        Gets the short name of the descriptor.
        Returns:
        the short name used in text outputs (tables etc.)
      • getParametersClassName

        public String getParametersClassName()
        Gets the name of the parameters class corresponding to the descriptor (prefixed with the package name as getClass().getName() would return).
        Returns:
        the name of the parameters class
      • setParameters

        public void setParameters​(MDParameters parameters)
        Sets the parameters for an already created MolecularDescriptor.
        Parameters:
        parameters - parameter settings to apply to the descriptor
      • setParameters

        public abstract void setParameters​(String parameters)
                                    throws MDParametersException
        Sets the parameters for an already created MolecularDescriptor.
        Parameters:
        parameters - parameter settings of the descriptor. If it is null or empty, then the default settings will be used.
        Throws:
        MDParametersException
      • getParameters

        public MDParameters getParameters()
        Gets the parameters associated with the object.
        Returns:
        parameter settings
      • needsConfig

        public boolean needsConfig()
        Indicates whether the class takes parameters from a configuration file. Derived classes have to override this method appropriately.
        Returns:
        true; most descriptor classes must have a configuration (file)
        Since:
        JChem 2.2
      • setScreeningConfiguration

        public void setScreeningConfiguration​(String config)
                                       throws MDParametersException
        Sets the screening configuration. Overwrites old parameters with the new ones; parameters not affected by the screening configuration remain unchanged.
        Parameters:
        config - screening configuration string
        Throws:
        MDParametersException
      • toData

        public abstract byte[] toData()
        Converts the internal (memory) representation of a MolecularDescriptor instance into an external format that can be stored in a database.
        Returns:
        binary representation of the descriptor
      • fromData

        public abstract void fromData​(byte[] dbRepr)
        Builds a MolecularDescriptor object from its external (database) representation.
        Parameters:
        dbRepr - an array generated by toData()
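The contract between toData() and fromData(byte[]) is a round trip: whatever one writes, the other must read back unchanged. A minimal stand-alone sketch of such a pair for a float-valued descriptor follows. The class name FloatCellCodec and the byte layout (big-endian IEEE 754 floats, no header) are assumptions made for illustration; the actual JChem storage format is not specified here.

```java
import java.nio.ByteBuffer;

/** Hypothetical codec: packs float cells into a byte array and back. */
final class FloatCellCodec {
    /** Internal (memory) representation to external (database) representation. */
    static byte[] toData(float[] cells) {
        ByteBuffer buf = ByteBuffer.allocate(Float.BYTES * cells.length);
        for (float c : cells) {
            buf.putFloat(c); // ByteBuffer defaults to big-endian byte order
        }
        return buf.array();
    }

    /** External representation back to the internal one. */
    static float[] fromData(byte[] dbRepr) {
        if (dbRepr.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of " + Float.BYTES);
        }
        ByteBuffer buf = ByteBuffer.wrap(dbRepr);
        float[] cells = new float[dbRepr.length / Float.BYTES];
        for (int i = 0; i < cells.length; i++) {
            cells[i] = buf.getFloat();
        }
        return cells;
    }

    public static void main(String[] args) {
        float[] original = {0f, 1.5f, -2.25f};
        float[] restored = fromData(toData(original));
        System.out.println(java.util.Arrays.equals(original, restored));
    }
}
```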
      • toString

        public abstract String toString()
        Creates the string representation of a MolecularDescriptor object. This string value is stored in SDfiles, though the use of this string is not limited to this purpose. Typically, this string is compact; for instance, zero values are not necessarily printed.
        Overrides:
        toString in class Object
        Returns:
        a formatted string of the descriptor
      • toDecimalString

        public abstract String toDecimalString()
        Creates the decimal string representation of a MolecularDescriptor object. This string value contains all values of the descriptor (including all zeros); values are separated by tabs.
        Returns:
        a formatted string of the descriptor
      • toBinaryString

        public String toBinaryString()
        Creates the binary string representation of a MolecularDescriptor object.
        Returns:
        a string of 0 and 1 characters representing the descriptor
        Since:
        JChem 2.3
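To make the difference between the decimal and binary string forms concrete, here is a minimal stand-alone sketch. The class DescriptorStrings and its cell types (int[] for counts, boolean[] for bits) are hypothetical; only the tab separator and the 0/1 alphabet come from the descriptions of toDecimalString() and toBinaryString() above.

```java
/** Hypothetical formatter for two of the string forms described above. */
final class DescriptorStrings {
    /** All values, zeros included, separated by tabs. */
    static String toDecimalString(int[] cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(cells[i]);
        }
        return sb.toString();
    }

    /** One character per bit: '1' for a set bit, '0' otherwise. */
    static String toBinaryString(boolean[] bits) {
        StringBuilder sb = new StringBuilder(bits.length);
        for (boolean bit : bits) {
            sb.append(bit ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDecimalString(new int[] {0, 3, 0, 1}));
        System.out.println(toBinaryString(new boolean[] {true, false, false, true}));
    }
}
```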
      • fromString

        public abstract void fromString​(String descr)
                                 throws ParseException
        Builds a molecular descriptor from its string representation. Typically used when an SDfile is read.
        Parameters:
        descr - descriptor string, previously generated by toString()
        Throws:
        ParseException
      • toFloatArray

        public abstract float[] toFloatArray()
        Creates the float array representation of a MolecularDescriptor object. This array contains all values of the descriptor (including all zeros) in the elements of the array.
        Returns:
        a formatted float array of the descriptor
        Since:
        JChem 2.0.1
      • fromFloatArray

        public abstract void fromFloatArray​(float[] descr)
        Builds a molecular descriptor from its float array representation. Typically used when a hypothesis is created.
        Parameters:
        descr - descriptor represented in a float array (e.g. generated by toFloatArray())
        Since:
        JChem 2.0.1
      • generate

        public String[] generate​(String molRepr)
                          throws MDGeneratorException
        Creates the descriptor for the given molecule.
        Parameters:
        molRepr - string representation of the molecule, e.g. SMILES
        Returns:
        property names set in the molecule passed during generation
        Throws:
        MDGeneratorException - when failed to generate descriptor
      • getAtomSetColors

        public Color[] getAtomSetColors()
        Determines the coloring of atoms. This coloring does not reflect element types, instead other categories related to the specific descriptor are considered. Therefore, whenever the parameters are changed, it is advisable to call getAtomSetColors() to obtain the current coloring scheme. Typically, the coloring scheme is defined in the MDParameters object associated with the MolecularDescriptor object.
        Returns:
        array of colors of different atom categories
      • getAtomSetNames

        public String[] getAtomSetNames()
      • getAtomSetIndexes

        public int[] getAtomSetIndexes​(Molecule m)
        Gets the individual atom color indexes. This allows color mapping and visualization of various properties encoded into the MolecularDescriptor. Prior to this method, getAtomSetColors() has to be called to obtain a color array. This method returns a per-atom color index array, and the returned indexes refer to the color array returned by getAtomSetColors().
        Parameters:
        m - a molecule to assign atom colors to
        Returns:
        array of color indexes (indexed by atom indexes)
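The two arrays are meant to be combined: the per-atom index array selects entries from the color array. A minimal stand-alone sketch of that lookup follows. The helper class AtomColoringSketch is hypothetical, and the sample arrays stand in for the values that getAtomSetColors() and getAtomSetIndexes(Molecule) would return.

```java
import java.awt.Color;

/** Hypothetical helper that resolves per-atom color indexes into colors. */
final class AtomColoringSketch {
    /**
     * @param atomSetColors  one color per atom category
     * @param atomSetIndexes one category index per atom, indexed by atom index
     * @return one color per atom
     */
    static Color[] colorsPerAtom(Color[] atomSetColors, int[] atomSetIndexes) {
        Color[] perAtom = new Color[atomSetIndexes.length];
        for (int atom = 0; atom < atomSetIndexes.length; atom++) {
            perAtom[atom] = atomSetColors[atomSetIndexes[atom]];
        }
        return perAtom;
    }

    public static void main(String[] args) {
        Color[] palette = {Color.RED, Color.BLUE}; // stand-in for getAtomSetColors()
        int[] indexes = {1, 0, 1};                 // stand-in for getAtomSetIndexes(m)
        for (Color c : colorsPerAtom(palette, indexes)) {
            System.out.println(c);
        }
    }
}
```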
      • getDissimilarityMetrics

        public abstract String[] getDissimilarityMetrics()
        Gets the dissimilarity metric names in an array.
        This method must be overridden by derived classes in order to get the metrics array depending on the dynamic type. (This is needed because the metrics[] array is a class variable, but class variables are shared among all derived classes.)
        Returns:
        the metrics array
      • getDefaultDissimilarityMetricThresholds

        public abstract float[] getDefaultDissimilarityMetricThresholds()
        Gets the default dissimilarity threshold values for all dissimilarity metrics defined.
        Returns:
        array of dissimilarity threshold values
      • getDissimilarityMetricIndex

        public int getDissimilarityMetricIndex​(String metricName)
                                        throws IllegalArgumentException
        Gets the internal index of the given metric.
        Parameters:
        metricName - name of a metric
        Returns:
        index of the specified metric
        Throws:
        IllegalArgumentException
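A name-to-index lookup of this kind can be sketched as a linear search over the array of metric names, rejecting unknown names with IllegalArgumentException. The class MetricLookupSketch, the sample metric names, and the case-sensitive comparison are all assumptions made for illustration; the source does not state how JChem matches names.

```java
/** Hypothetical lookup of a metric index by name. */
final class MetricLookupSketch {
    /**
     * @param metricNames names in index order, e.g. as an array of metric names
     * @param metricName  the name to find (compared case-sensitively in this sketch)
     * @return the position of metricName in metricNames
     * @throws IllegalArgumentException if the name is not present
     */
    static int metricIndex(String[] metricNames, String metricName) {
        for (int i = 0; i < metricNames.length; i++) {
            if (metricNames[i].equals(metricName)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unknown metric: " + metricName);
    }

    public static void main(String[] args) {
        String[] names = {"Tanimoto", "Euclidean"}; // sample names, not taken from JChem
        System.out.println(metricIndex(names, "Euclidean"));
    }
}
```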
      • getNumberOfWeights

        public int getNumberOfWeights​(String dissimilarityMetricName)
                               throws IllegalArgumentException
        Gets the number of weight factors used by the specified metric. This method can be applied to the dissimilarity metrics provided by the MolecularDescriptor class or its derived classes, but not to parameterized metrics.
        Parameters:
        dissimilarityMetricName - name as returned by getDissimilarityMetrics()
        Returns:
        number of weights the metric uses
        Throws:
        IllegalArgumentException - if the given parameter is not a valid metric name
      • getNumberOfMetrics

        public int getNumberOfMetrics()
        Gets the number of parameterized metrics available for the particular descriptor.
        Returns:
        number of metrics implemented in this class
      • getMetricIndex

        public int getMetricIndex​(String metricName)
                           throws IllegalArgumentException
        Gets the index of the given parameterized metric.
        Parameters:
        metricName - name of a metric
        Returns:
        index of the specified metric
        Throws:
        IllegalArgumentException - when given metric name is not valid
      • getDefaultThreshold

        public float getDefaultThreshold​(int metricIndex)
        Gets a metric-dependent default threshold value. The actual value of this hard-wired parameter is not important, since it is only used in user interfaces to simplify the use of applications for beginners. Note: this method is kept for compatibility reasons.
        Parameters:
        metricIndex - index of a parameterized metric
      • getThreshold

        public float getThreshold​(int metricIndex)
        Gets a metric-dependent default threshold value. Ideally, this value should be based on statistics, though the actual value is not too critical, since it is only used in user interfaces to simplify the use of applications for beginners. Note: this method is kept for compatibility reasons.
        Parameters:
        metricIndex - index of a parameterized metric
      • getThreshold

        public float getThreshold()
        Gets the threshold value of the current parameterized metric.
        Returns:
        threshold value
      • getMetricName

        public String getMetricName()
        Gets the name of the current parameterized metric.
        Returns:
        name of the current metric
      • getMetricName

        public String getMetricName​(int metricIndex)
        Gets the name of a parameterized metric specified by its index. Note: this method is kept for backward compatibility.
        Parameters:
        metricIndex - metric index
        Returns:
        name of the given metric
      • getDefaultMetricIndex

        public int getDefaultMetricIndex()
        Gets the index of the default metric. The default metric is the available metric most commonly used for the given MolecularDescriptor.
        Returns:
        metric index of the default metric
      • getDissimilarity

        public abstract float getDissimilarity​(MolecularDescriptor other)
        Calculates the dissimilarity ratio between two MolecularDescriptor objects using the default metric. The default metric is set in the corresponding MDParameters object. In the case of asymmetric distances, swapping the two descriptors can make a big difference.
        Parameters:
        other - a descriptor, to which the dissimilarity ratio is measured
        Returns:
        dissimilarity ratio
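The remark about asymmetric distances can be checked with a small stand-alone sketch. Tversky dissimilarity is used here only as a well-known example of an asymmetric measure; the source does not say which asymmetric metrics a given descriptor offers, and the class AsymmetricDistanceSketch is hypothetical.

```java
import java.util.BitSet;

/** Hypothetical example of an asymmetric dissimilarity on bit sets. */
final class AsymmetricDistanceSketch {
    /**
     * Tversky dissimilarity: 1 - c / (c + alpha * |A only| + beta * |B only|),
     * where c is the number of bits common to a and b.
     * The measure is asymmetric whenever alpha differs from beta.
     */
    static float tversky(BitSet a, BitSet b, float alpha, float beta) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet onlyA = (BitSet) a.clone();
        onlyA.andNot(b);
        BitSet onlyB = (BitSet) b.clone();
        onlyB.andNot(a);
        float c = common.cardinality();
        float denom = c + alpha * onlyA.cardinality() + beta * onlyB.cardinality();
        return denom == 0f ? 0f : 1f - c / denom;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0, 4); // bits 0..3
        BitSet b = new BitSet();
        b.set(0, 2); // bits 0..1, a subset of a
        System.out.println(tversky(a, b, 1f, 0f)); // a measured against b
        System.out.println(tversky(b, a, 1f, 0f)); // b measured against a
    }
}
```

With alpha = 1 and beta = 0, only bits missing from the second argument are penalized, so a subset measured against its superset scores 0 while the reverse direction does not.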
      • getDissimilarity

        public abstract float getDissimilarity​(MolecularDescriptor other,
                                               int parametrizedMetricIndex)
        Calculates the dissimilarity between two MolecularDescriptor objects using the specified metric; otherwise it behaves the same as getDissimilarity(MolecularDescriptor other).
        Parameters:
        other - a descriptor, to which the dissimilarity ratio is measured
        parametrizedMetricIndex - the index of the parameterized metric to use
        Returns:
        dissimilarity ratio
        See Also:
        MDParameters, PFParameters
      • getLowerBound

        public float getLowerBound​(MolecularDescriptor other)
        Calculates an estimate for the minimum value of the distance using the default distance metric.
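A lower bound on the distance is useful for screening: if even the bound exceeds a threshold, the exact distance need not be computed. The source does not say how the estimate is obtained, so the following is only a sketch of one standard bound for Tanimoto dissimilarity on bit sets, where the intersection can be no larger than the smaller set and the union no smaller than the larger one. The class LowerBoundSketch is hypothetical.

```java
import java.util.BitSet;

/** Hypothetical lower-bound estimate for Tanimoto dissimilarity. */
final class LowerBoundSketch {
    /** Exact Tanimoto dissimilarity: 1 - |A and B| / |A or B|. */
    static float tanimotoDissimilarity(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        return union == 0 ? 0f : 1f - (float) and.cardinality() / union;
    }

    /** Bound from bit counts alone: 1 - min(|A|, |B|) / max(|A|, |B|). */
    static float lowerBound(int bitCountA, int bitCountB) {
        int larger = Math.max(bitCountA, bitCountB);
        return larger == 0 ? 0f : 1f - (float) Math.min(bitCountA, bitCountB) / larger;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0, 10); // 10 bits set
        BitSet b = new BitSet();
        b.set(5, 45); // 40 bits set, 5 shared with a
        System.out.println(lowerBound(a.cardinality(), b.cardinality()));
        System.out.println(tanimotoDissimilarity(a, b));
    }
}
```

The bound needs only the two bit counts, which can be stored alongside each descriptor, so it costs far less than the full bitwise comparison.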
      • main

        public static void main​(String[] args)