Class MDParameters

  • Direct Known Subclasses:
    BCUTParameters, CDParameters, CFParameters, ECFPParameters, PFParameters, RFParameters, SDParameters, ShapeParameters

    @PublicAPI
    public class MDParameters
    extends Object
    MolecularDescriptor parameter settings. This class serves as the base class for the parameter classes of specific MolecularDescriptor derivatives.
    Descriptor objects of the same type share one common MDParameters object that stores all parameters.
    Besides parameters, this object also stores other internal data that are better kept here than in individual MolecularDescriptor objects, for the sake of memory efficiency.
    The naming convention, similar to that of the derivatives of the MolecularDescriptor class, is as follows: the derived class name begins with the name of the corresponding MolecularDescriptor class and is postfixed by Parameters. For instance, the parameters class of the descriptor class MDXyZ is XyZParameters.
    Parameters are read from configuration files. MDParameters provides extensive functionality for processing XML configuration files; however, parameter classes extending MDParameters do not necessarily have to use XML for storing parameters.
    MDParameters plays an important role in providing so-called Screening Configurations for the dissimilarity calculations. Such configurations contain so-called parametrized metrics that are based on dissimilarity metrics implemented in classes that extend the MolecularDescriptor class. It is important to make a clear distinction between these two categories: dissimilarity metrics are the basis of the parametrized metrics. MDParameters stores metric parameters and provides services for retrieving and storing parametrized metrics and for accessing them by either name or index.
    Another important function of this class is to allow new parametrized metrics to be created and written into the XML document.
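    The access by name or by index described above can be pictured with a small stand-in. The sketch below is not the MDParameters implementation: MetricRegistrySketch and all of its members are hypothetical, and it models only parallel lists of symbolic names and underlying metric indexes plus a notion of a current metric.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT part of JChem. It mimics the idea of keeping
// parametrized metrics in parallel lists (symbolic name, index of the
// underlying dissimilarity metric) and selecting one of them as current.
final class MetricRegistrySketch {
    private final List<String> names = new ArrayList<>();
    private final List<Integer> baseMetricIndexes = new ArrayList<>();
    private int current = -1; // no parametrized metric selected yet

    /** Appends a parametrized metric and returns its index. */
    int add(String name, int baseMetricIndex) {
        names.add(name);
        baseMetricIndexes.add(baseMetricIndex);
        return names.size() - 1;
    }

    /** Index of the named parametrized metric, or -1 if unknown (assumed behaviour). */
    int indexOf(String name) {
        return names.indexOf(name);
    }

    /** Symbolic name of the parametrized metric at the given index. */
    String nameOf(int index) {
        return names.get(index);
    }

    /** Selects the parametrized metric that later calls refer to. */
    void setCurrent(int index) {
        if (index < 0 || index >= names.size()) {
            throw new IllegalArgumentException("no such parametrized metric: " + index);
        }
        current = index;
    }

    /** Index of the underlying dissimilarity metric of the current parametrized metric. */
    int currentBaseMetricIndex() {
        if (current < 0) {
            throw new IllegalStateException("no current parametrized metric");
        }
        return baseMetricIndexes.get(current);
    }

    public static void main(String[] args) {
        MetricRegistrySketch registry = new MetricRegistrySketch();
        registry.add("Tanimoto", 0);
        int index = registry.add("Euclidean_ACE", 1);
        registry.setCurrent(registry.indexOf("Euclidean_ACE"));
        System.out.println(registry.nameOf(index) + " -> " + registry.currentBaseMetricIndex());
    }
}
```

    In the real class the corresponding calls would be getMetricIndex(String), getMetricName(int), setCurrentParametrizedMetric(int) and getInternalMetricIndex(), all documented in the method details.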
    Since:
    JChem 2.2
    • Field Detail

      • DEFAULT_SCALE_FACTOR

        public static final float DEFAULT_SCALE_FACTOR
        constants, default parameter values
        See Also:
        Constant Field Values
      • DEFAULT_ASYMMETRY_FACTOR

        public static final float DEFAULT_ASYMMETRY_FACTOR
        See Also:
        Constant Field Values
      • DEFAULT_OUTPUT_PRECISION

        public static final int DEFAULT_OUTPUT_PRECISION
        See Also:
        Constant Field Values
      • cellSize

        protected int cellSize
        size - number of bits - of one descriptor cell
      • length

        protected int length
        the length of the descriptor: the number of cells
      • internalSize

        protected int internalSize
        required memory size of one descriptor instance
      • data

        protected byte[] data
        buffer for external data format generation, used in MolecularDescriptor.toData()
      • configFilePath

        protected String configFilePath
        location of the configuration file
      • document

        protected org.dom4j.Document document
        contains the XML document
      • standardizerConfigurationNode

        protected org.dom4j.Node standardizerConfigurationNode
        node defining the Standardizer configuration
      • similarityNode

        protected org.dom4j.Element similarityNode
        node holding the similarity calculations related parameters
      • screeningConfigurationNode

        protected org.dom4j.Node screeningConfigurationNode
      • parametrizedMetricsNode

        protected org.dom4j.Element parametrizedMetricsNode
      • parametrizedMetricNodes

        protected List parametrizedMetricNodes
      • parametrizedMetrics

        protected ArrayList parametrizedMetrics
        symbolic names (mnemonics) of parametrized metrics
      • metricIndexes

        protected ArrayList metricIndexes
        convert parameterized indexes to MolecularDescriptor metric indexes
      • scaleFactors

        protected ArrayList scaleFactors
        scale factor of scalable parametrized metrics
      • tverskyA

        protected ArrayList<Float> tverskyA
        alpha values for tversky dissimilarity
      • tverskyB

        protected ArrayList<Float> tverskyB
        beta values for tversky dissimilarity
      • asymmetryFactors

        protected ArrayList asymmetryFactors
        asymmetry ratio of parametrized asymmetric metrics
      • thresholds

        protected ArrayList thresholds
        dissimilarity threshold values
      • normalized

        protected ArrayList normalized
        flags indicating if the metric is normalized or not
      • defaultWeight

        protected float defaultWeight
        value for all missing weight parameters
      • weights

        protected ArrayList weights
        weights for parametrized metrics
      • cellwiseWeights

        protected ArrayList cellwiseWeights
        flags indicating whether cell-wise weights are used for parametrized metrics
      • outputPrecision

        protected int outputPrecision
        number of fraction digits in floating point output format
      • currentMetricIndex

        protected int currentMetricIndex
        index of the parametrized metric currently in use
      • md

        protected MolecularDescriptor md
        this object is needed to access default dissimilarity functions
      • decForm

        protected NumberFormat decForm
        to format floating point output
      • generator

        protected MDGenerator generator
        generates MolecularDescriptors
      • standardizer

        protected Standardizer standardizer
        transform molecules into standard form before descriptor generation
    • Constructor Detail

      • MDParameters

        protected MDParameters()
        Creates and initializes an empty object. This class is not allowed to be instantiated directly. Only its derived classes can call the superclass constructors.
    • Method Detail

      • initParameters

        protected void initParameters()
        Initializes object after configuration parameters are loaded.
      • fromString

        public void fromString​(String parameterString)
                        throws MDParametersException
        Sets parameters from a string representation. This method assumes that parameters are described in XML format.
        Parameters:
        parameterString - configuration parameters in string
        Throws:
        MDParametersException - when the parameter string is not well-formed
      • fromFile

        public void fromFile​(File parameterFile)
                      throws MDParametersException
        Sets parameters from an XML file. Stores filepath in configFilePath.
        Parameters:
        parameterFile - initialized parameter file
        Throws:
        MDParametersException - failed to process parameter file
      • addParameters

        public void addParameters​(String parameterString)
                           throws MDParametersException
        Sets parameters from an XML string representation keeping all previous settings.
        Parameters:
        parameterString - parameters in string
        Throws:
        MDParametersException - when the parameter string is not well-formed
      • addParameters

        public void addParameters​(File parameterFile)
                           throws MDParametersException
        Sets parameters from an XML config file keeping all previous settings.
        Parameters:
        parameterFile - parameter file
        Throws:
        MDParametersException - when the content of the parameter file is not well-formed
      • setParameters

        public void setParameters​(String parametersString)
                           throws MDParametersException
        Sets parameters from an XML string representation overwriting all previous parameters settings with the new ones.
        Parameters:
        parametersString - parameters in string
        Throws:
        MDParametersException - when the parameter string is not well-formed
      • setParameters

        public void setParameters​(File parametersFile)
                           throws MDParametersException
        Sets parameters from an XML file representation overwriting all previous settings with the new ones. Stores filepath in configFilePath.
        Parameters:
        parametersFile - parameters File
        Throws:
        MDParametersException - when the content of the parameter file is not well-formed
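        The difference between the addParameters and setParameters variants (keep previous settings versus overwrite them) can be sketched with a plain map. This is an illustration only: ParameterMergeSketch is hypothetical, the real class works on an XML document, and the source does not say what happens when a new setting clashes with an existing one, so the clash rule below is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, NOT part of JChem: settings are modelled as
// name/value pairs instead of nodes of an XML document.
final class ParameterMergeSketch {
    private final Map<String, String> settings = new LinkedHashMap<>();

    // Analogue of addParameters: all previous settings are kept.
    // ASSUMPTION: on a name clash the previous value wins.
    void add(Map<String, String> incoming) {
        incoming.forEach(settings::putIfAbsent);
    }

    // Analogue of setParameters: previous settings are discarded first.
    void set(Map<String, String> incoming) {
        settings.clear();
        settings.putAll(incoming);
    }

    /** Read-only view of the settings currently held. */
    Map<String, String> view() {
        return Collections.unmodifiableMap(settings);
    }

    public static void main(String[] args) {
        ParameterMergeSketch params = new ParameterMergeSketch();
        params.set(Map.of("threshold", "0.5"));
        params.add(Map.of("scaleFactor", "2.0")); // threshold is still present
        System.out.println(params.view());
        params.set(Map.of("scaleFactor", "1.0")); // threshold is gone
        System.out.println(params.view());
    }
}
```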
      • toString

        public String toString()
        Returns the parameter values as a string. This implementation uses XML for the external format of parameters; however, derived classes may use different formats.
        Overrides:
        toString in class Object
        Returns:
        parameter string
        Throws:
        MDParametersException - when creating the parameter string fails
      • getScreeningConfigurationString

        public String getScreeningConfigurationString​(String nodeName,
                                                      String attrib,
                                                      String value)
                                               throws MDParametersException
        Returns part of the parameter values as a string. Selects a sub-tree of the DOM tree, specified by the tag name of one of its nodes, and writes the sub-tree into an XML string.
        Parameters:
        nodeName - name of the node to be printed
        attrib - attribute name
        value - value of the attribute
        Returns:
        parameter string
        Throws:
        MDParametersException - when creating the parameter string fails
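        Selecting a sub-tree by node name and attribute value and writing it out as XML can be sketched with the XML APIs bundled with the JDK. MDParameters itself is documented as holding dom4j objects, so this is a swapped-in technique rather than the library's code. SubtreeToStringSketch is hypothetical, and so are the attribute names used in the example input; returning null when nothing matches is also an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, NOT part of JChem; uses JDK DOM/XPath instead of dom4j.
final class SubtreeToStringSketch {

    /** Returns the first element named nodeName whose attribute equals value, as XML text. */
    static String subtree(String xml, String nodeName, String attrib, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The arguments are inlined for brevity; this assumes they contain no quotes.
            String expr = "//" + nodeName + "[@" + attrib + "='" + value + "']";
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODE);
            if (node == null) {
                return null; // ASSUMPTION: the real method may throw instead
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("creating the parameter string failed", e);
        }
    }

    public static void main(String[] args) {
        // Element and attribute names below are made up for the example.
        String xml = "<Config><ParametrizedMetric Name=\"m1\"/>"
                + "<ParametrizedMetric Name=\"m2\"/></Config>";
        System.out.println(subtree(xml, "ParametrizedMetric", "Name", "m2"));
    }
}
```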
      • toString

        protected String toString​(org.dom4j.Node node)
                           throws MDParametersException
        Returns part of the parameter values as a string. Selects a sub-tree of the DOM tree, specified by a node, and writes the sub-tree into an XML string.
        Parameters:
        node - rootnode of the subtree to be printed
        Returns:
        parameter string
        Throws:
        MDParametersException - when creating the parameter string fails
      • setCellSize

        public void setCellSize​(int cellSize)
        Sets the size (number of bits) of the bins (cells). This has to be at least 1, but should not exceed 32.
        Parameters:
        cellSize - the width of one (and each) cell (bin) in bits
      • setLength

        public void setLength​(int length)
                       throws MDParametersException
        Sets the length (number of cells) of the descriptor.
        Parameters:
        length - the required length (cell count)
        Throws:
        MDParametersException - if argument is not positive
      • setScalingHypothesis

        public void setScalingHypothesis​(MolecularDescriptor scalingHypothesis)
        Sets (stores) the specified scaling hypothesis. It is used by various scaled metrics. It cannot be passed to these metrics directly as an argument, because of the uniform method header of metric functions (which has to be preserved).
        Parameters:
        scalingHypothesis - the consensus hypothesis used for scaling
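        The design constraint mentioned here (every metric function keeps the same header, so extra inputs travel through the shared parameters object) can be illustrated as follows. All type names are hypothetical, descriptors are reduced to float arrays, and the way the hypothesis scales the distance is an arbitrary choice made for the example, not the formula used by JChem.

```java
// Hypothetical uniform header: every metric takes exactly two descriptors.
interface MetricSketch {
    float dissimilarity(float[] a, float[] b);
}

// Hypothetical shared parameters object holding the scaling hypothesis.
final class SharedParamsSketch {
    private float[] scalingHypothesis;

    void setScalingHypothesis(float[] hypothesis) {
        scalingHypothesis = hypothesis.clone();
    }

    float[] getScalingHypothesis() {
        return scalingHypothesis;
    }
}

// A scaled metric reads the hypothesis from the shared object, so its
// method header stays identical to that of unscaled metrics.
final class ScaledEuclideanSketch implements MetricSketch {
    private final SharedParamsSketch params;

    ScaledEuclideanSketch(SharedParamsSketch params) {
        this.params = params;
    }

    @Override
    public float dissimilarity(float[] a, float[] b) {
        float[] h = params.getScalingHypothesis();
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            // ASSUMPTION: per-cell multiplication by the hypothesis value.
            double d = (a[i] - b[i]) * h[i];
            sum += d * d;
        }
        return (float) Math.sqrt(sum);
    }
}
```

        A caller would set the hypothesis once on the shared object and then invoke any metric through the two-argument header, scaled or not.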
      • setScaleFactor

        public void setScaleFactor​(float scaleFactor)
        Sets scaleFactor used with the current parametrized metrics.
        Parameters:
        scaleFactor - the new value of the scaleFactor
      • setAsymmetryFactor

        public void setAsymmetryFactor​(float af)
        Sets the value of the asymmetry factor of the current parametrized metric.
        Parameters:
        af - asymmetry factor
      • setThreshold

        public void setThreshold​(float th)
        Sets the value of the threshold of the current parametrized metric.
        Parameters:
        th - dissimilarity threshold value
      • setWeights

        public void setWeights​(float[] w)
        Sets the cell-wise weight factors for the current parametrized metric.
        Parameters:
        w - weights
      • setCellwiseWeights

        public void setCellwiseWeights​(boolean c)
        Sets boolean telling whether cell weights are to be generated for current parametrized metric.
        Parameters:
        c - true if cell weights are to be generated
      • setNormalized

        public void setNormalized​(boolean yes)
        Toggles the normalized flag of the current parametrized metric.
        Parameters:
        yes - true, if the metric is normalized
      • setOutputPrecision

        public void setOutputPrecision​(int precision)
        Specifies the output precision for floating point values. This method can be used in conjunction with getDecForm().
        Parameters:
        precision - number of digits after the decimal point
      • setCurrentParametrizedMetric

        public void setCurrentParametrizedMetric​(int metricIndex)
        Selects the specified parametrized metric to be the current.
        Parameters:
        metricIndex - index of the selected parametrized metric
      • setCreateStatistics

        public void setCreateStatistics​(boolean createStatistics)
        Toggles the create statistics flag of the MDGenerator object.
        Parameters:
        createStatistics - new value for the create statistics flag
      • addParametrizedMetric

        public int addParametrizedMetric​(String name,
                                         String metric,
                                         String activeFamily)
                                  throws MDParametersException
        Expands the set of parametrized metrics with a new item. The first parameter is optional; if it is not specified, the symbolic name is formed from the second and third parameters (both of which are mandatory).
        Parameters:
        name - symbolic name of the parametrized metric
        metric - name of the metric (like Tanimoto, Euclidean etc)
        activeFamily - name of the active compounds family
        Throws:
        MDParametersException
      • getCellSize

        public int getCellSize()
        Gets the number of bits of an atomic cell in the descriptor.
        Returns:
        the number of bits in one single descriptor cell
      • getLength

        public int getLength()
        Returns the number of cells forming the descriptor.
      • getInternalSize

        public int getInternalSize()
        Gets the required memory size to store the descriptor according to the specified parameters.
        Returns:
        size of a suitable array that can store one descriptor
      • getData

        public byte[] getData()
        Gets the byte array which is used for conversions between internal and external data formats. Internal is the memory representation, while external is used for storing descriptors in files.
        Returns:
        an array large enough to hold the descriptor in external format
      • getCurrentMetricIndex

        public int getCurrentMetricIndex()
      • getNumberOfMetrics

        public int getNumberOfMetrics()
        Gets the total number of parametrized metrics available in the present configuration.
        Returns:
        number of metrics
      • getNumberOfWeights

        public int getNumberOfWeights()
        Gets the number of weights the current parametrized metric takes.
        Returns:
        number of weight factors corresponding to the current metric
      • getNumberOfWeights

        protected int getNumberOfWeights​(int parametrizedMetricIndex)
                                  throws IllegalArgumentException
        Gets the number of weight factors used by the specified metric. This method can be applied to the dissimilarity metrics provided by the MolecularDescriptor class or its derived classes, but not to parametrized metrics.
        Parameters:
        parametrizedMetricIndex - parametrized metric index
        Returns:
        number of weights the metric uses
        Throws:
        IllegalArgumentException - if the given parameter is not a valid metric index
      • getThreshold

        public float getThreshold​(int metricIndex)
        Gets a metric dependent threshold value. Ideally, this value should be based on statistics, though the actual value is not too critical, since these are only used in user interfaces to simplify the use of applications for beginners. Note: this parametrized version of getThreshold() is kept for compatibility reasons.
        Parameters:
        metricIndex - index of a parametrized metric
        Returns:
        threshold corresponding to the given metric index
      • getThreshold

        public float getThreshold()
        Gets the threshold value set for the current parametrized metric.
        Returns:
        dissimilarity threshold corresponding to the current metric
      • getScalingHypothesis

        public MolecularDescriptor getScalingHypothesis()
        Gets the scaling hypothesis used in scaled metrics.
        Returns:
        the scaling hypothesis
      • getInternalMetricIndex

        public int getInternalMetricIndex()
        Gets the MolecularDescriptor specific metric index of the current parametrized metric.
        Returns:
        metric index
      • getMetricName

        public String getMetricName()
        Gets the user defined symbolic name of the current parametrized metric.
        Returns:
        metric name
      • getMetricName

        public String getMetricName​(int metricIndex)
        Gets the user defined symbolic name of the specified parametrized metric.
        Returns:
        metric name
      • getMetricIndex

        public int getMetricIndex​(String name)
        Gets the index of the given parametrized metric.
        Parameters:
        name - name of the parametrized metric
        Returns:
        metric index
      • getScaleFactor

        public float getScaleFactor()
        Gets the scale factor used in the current parametrized scalable metrics.
        Returns:
        the scale factor
      • getAsymmetryFactor

        public float getAsymmetryFactor()
        Gets the asymmetry factor used in the current parametrized asymmetric metrics.
        Returns:
        the value of the asymmetry factor
      • getWeights

        public float[] getWeights()
        Gets all weights for the given parametrized metric. If the specified metric is not a weighted metric or the corresponding weights are not set, null is returned.
        Returns:
        all weights as float values
      • getTverskyAlpha

        public float getTverskyAlpha()
        Gets Tversky alpha value for the given parametrized metric.
        Returns:
        alpha as float value
      • getTverskyBeta

        public float getTverskyBeta()
        Gets Tversky beta value for the given parametrized metric.
        Returns:
        beta as float value
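        For readers unfamiliar with the alpha and beta parameters, the sketch below computes a textbook Tversky dissimilarity on bit sets: one minus the ratio of common bits to common bits plus alpha times the bits only in the first set plus beta times the bits only in the second. How JChem applies these values internally is not stated in this page; TverskySketch is hypothetical and the handling of two empty sets is an assumption.

```java
import java.util.BitSet;

// Hypothetical helper, NOT part of JChem: textbook Tversky index on bit sets.
final class TverskySketch {

    /** Returns 1 - |A and B| / (|A and B| + alpha * |A minus B| + beta * |B minus A|). */
    static float dissimilarity(BitSet a, BitSet b, float alpha, float beta) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet onlyA = (BitSet) a.clone();
        onlyA.andNot(b);
        BitSet onlyB = (BitSet) b.clone();
        onlyB.andNot(a);

        float c = common.cardinality();
        float denominator = c + alpha * onlyA.cardinality() + beta * onlyB.cardinality();
        if (denominator == 0f) {
            return 0f; // ASSUMPTION: two empty descriptors count as identical
        }
        return 1f - c / denominator;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0, 3); // bits 0, 1, 2
        BitSet b = new BitSet();
        b.set(1, 4); // bits 1, 2, 3
        // With alpha = beta = 1 the Tversky index reduces to Tanimoto.
        System.out.println(dissimilarity(a, b, 1f, 1f));
        // With alpha = 1 and beta = 0 only the bits missing from b are penalized.
        System.out.println(dissimilarity(a, b, 1f, 0f));
    }
}
```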
      • isCellwiseWeights

        public boolean isCellwiseWeights()
        Gets boolean telling whether cell weights are to be generated for current parametrized metric.
        Returns:
        true if weights are assigned to each individual descriptor cell
      • getDecForm

        public DecimalFormat getDecForm()
        Gets the formatter object that is capable of formatting fractions with the given precision. The precision can be set by calling setOutputPrecision(int precision).
        Returns:
        decimal number formatter object (not localized)
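        A non-localized formatter with a fixed number of fraction digits can be built with the JDK alone, as in the sketch below. OutputPrecisionSketch is hypothetical, and whether the real formatter pads with trailing zeros or uses exactly these settings is an assumption; the page only states that the precision is the number of digits after the decimal point.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, NOT part of JChem.
final class OutputPrecisionSketch {

    /** Builds a locale-independent formatter with a fixed number of fraction digits. */
    static DecimalFormat formatter(int precision) {
        // Locale.ROOT symbols keep '.' as the decimal separator on every machine.
        DecimalFormat format =
                new DecimalFormat("0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        // ASSUMPTION: exactly 'precision' digits are written, zero-padded.
        format.setMinimumFractionDigits(precision);
        format.setMaximumFractionDigits(precision);
        format.setGroupingUsed(false);
        return format;
    }

    public static void main(String[] args) {
        DecimalFormat threeDigits = formatter(3);
        System.out.println(threeDigits.format(0.123456));
        System.out.println(threeDigits.format(2.0));
    }
}
```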
      • isScaled

        public boolean isScaled()
        Returns whether current parametrized metric is scaled or not.
        Returns:
        true if the metric is scaled
      • isAsymmetric

        public boolean isAsymmetric()
        Returns whether current parametrized metric is asymmetric or not.
        Returns:
        true if the metric is asymmetric
      • isWeighted

        public boolean isWeighted()
        Returns whether current parametrized metric is weighted or not.
        Returns:
        true if the metric is weighted
      • isNormalized

        public boolean isNormalized()
        Returns whether current parametrized metric is normalized or not.
        Returns:
        true if the metric is normalized
      • isStandardizationMandatory

        public boolean isStandardizationMandatory()
        Checks whether standardization of molecules is mandatory for the corresponding MolecularDescriptor before descriptor generation. This method always returns true. Derived classes should override it when standardization is not obligatory.
        Returns:
        whether or not standardization of molecules is needed
        Since:
        JChem 2.2
      • getDefaultStandardizerConfiguration

        public static String getDefaultStandardizerConfiguration()
        Gets the default configuration of the standardizer. This method is called if no standardizer configuration is set in the parameters configuration but standardization is mandatory for the corresponding MolecularDescriptor. The default on this top level is aromatization and dehydrogenization, but derived parameter classes may overload this behaviour.
        Returns:
        standardizer configuration XML string
        Since:
        JChem 2.2
      • getDefaultDocumentFrame

        public String getDefaultDocumentFrame()
        Gets the default XML configuration string. This is needed when an optional XML configuration is not specified. Descriptor files and descriptor tables always store the configuration that corresponds to the descriptors stored, thus something has to be stored even when nothing is specified.
        Returns:
        default XML configuration string of the actual Parameters class
        Since:
        JChem 2.2
      • standardize

        public Molecule standardize​(Molecule m)
        Standardizes the Molecule and returns the standardized form. The standardization is configured via XML. StandardizerConfiguration is the corresponding XML tag. If no standardizer is set up, null is returned.
        Parameters:
        m - molecular structure to be standardized
        Returns:
        standardized form of the input structure, or null if no standardizer is set up
        Since:
        JChem 2.2
      • readFromXmlFile

        protected void readFromXmlFile​(File file,
                                       boolean merge,
                                       boolean all)
                                throws MDParametersException
        Reads configuration from an XML file. Builds a DOM tree, picks the nodes and elements that store information that can be processed on this level (leaving others for derived classes), converts their values into the internal format, and stores them in data members of this class. Stores the file path in configFilePath.
        Parameters:
        file - the XML file to read configuration data from
        merge - merge config from file into already existing parameters or overwrite existing parameter values
        all - process the complete document or only the ScreeningConfiguration tag
        Throws:
        MDParametersException - in the case of any failure
      • readFromXmlString

        protected void readFromXmlString​(String xml,
                                         boolean merge,
                                         boolean all)
                                  throws MDParametersException
        Reads configuration from an XML string. Builds a DOM tree, picks the nodes and elements that store information that can be processed on this level (leaving others for derived classes), converts their values into the internal format, and stores them in data members of this class.
        Parameters:
        xml - the XML string to get the configuration data from
        merge - merge config from the string into already existing parameters or overwrite existing parameter values
        all - process the complete document or only the ScreeningConfiguration tag
        Throws:
        MDParametersException - in the case of any failure
      • checkDocumentVersion

        protected void checkDocumentVersion​(String docType,
                                            String version)
                                     throws MDParametersException
        Checks if the document is the right version.
        Parameters:
        docType - the required document type
        version - the expected version number
        Throws:
        MDParametersException
      • processDocument

        protected void processDocument​(boolean all)
                                throws MDParametersException
        Searches the DOM tree for relevant nodes and sets internal variables to some of these nodes for the sake of easier information processing.
        Parameters:
        all - process the complete document or only the ScreeningConfiguration tag
        Throws:
        MDParametersException
      • readValues

        protected void readValues​(boolean all)
                           throws MDParametersException
        Picks attribute values from the document tree that are relevant to the actual MDParameters sub-class.
        Parameters:
        all - process the complete document or only the ScreeningConfiguration tag
        Throws:
        MDParametersException
      • readMetricParameters

        protected void readMetricParameters()
                                     throws MDParametersException
        Processes all ParametrizedMetric nodes in the DOM tree. Reads parametrized metric names and the associated parameter settings and stores them in data members for faster and easier access in getter methods.
        Throws:
        MDParametersException - if one of the nodes is not well-formed
      • writeMetricParameter

        protected void writeMetricParameter​(ArrayList pl,
                                            String attr,
                                            int mi,
                                            boolean useDecForm)
        Writes a given parameter of the specified metric into the corresponding tree node.
        Parameters:
        pl - list of parameters (for all metric indexes)
        attr - name of the attribute which the parameter corresponds to
        mi - index of the metric
        useDecForm - use precision for writing floating point values
      • appendParametrizedMetric

        protected int appendParametrizedMetric​(String name,
                                               String metric)
        Extends internal data with a new parametrized metric. Neither the DOM tree nor the XML document is modified.
        Parameters:
        name - name of the parametrized metric
        metric - dissimilarity metric name (as defined in its implementor class)
      • addParametrizedMetricsNode

        protected void addParametrizedMetricsNode()
        Adds the ParametrizedMetrics node to the DOM tree.
      • addParametrizedMetricNode

        protected org.dom4j.Element addParametrizedMetricNode​(String name,
                                                              String activeFamily,
                                                              String metric)
        Adds a ParametrizedMetric node to the DOM tree.
        Parameters:
        name - name of the parameterized metric, given by the user
        activeFamily - name of the active compound family (e.g. ACE)
        metric - name of the dissimilarity metric
      • importNodes

        protected boolean importNodes​(org.dom4j.Document doc,
                                      boolean merge)
        Imports nodes from the specified Document into the current (main) Document. New nodes can either be merged into the existing ones without removing them, or they may overwrite existing nodes.
        Parameters:
        doc - import nodes from this document
        merge - merge (add new) or overwrite (replace with new) existing nodes
      • getDescriptorTypeName

        public static String getDescriptorTypeName​(String xmlConfig)
        Takes the descriptor type name from the root element of the XML configuration.
        Parameters:
        xmlConfig - configuration string
        Returns:
        descriptor type name
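        Reading a name from the root element of an XML configuration string can be sketched with the JDK parser. The sketch assumes the type name is the tag name of the root element, which this page does not spell out; DescriptorTypeNameSketch and the sample root tag in the example are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, NOT part of JChem; uses the JDK DOM parser.
final class DescriptorTypeNameSketch {

    /** Returns the tag name of the root element of the given XML text. */
    static String rootElementName(String xmlConfig) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlConfig)));
            // ASSUMPTION: the descriptor type name is the root tag name itself.
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalArgumentException("configuration is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // "SomeDescriptorConfiguration" is a made-up root tag for the example.
        String xml = "<SomeDescriptorConfiguration Version=\"0.1\">"
                + "<Parameters/></SomeDescriptorConfiguration>";
        System.out.println(rootElementName(xml));
    }
}
```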