Class MDParameters

java.lang.Object
chemaxon.descriptors.MDParameters
Direct Known Subclasses:
BCUTParameters, CDParameters, CFParameters, ECFPParameters, PFParameters, RFParameters, SDParameters, ShapeParameters

@PublicAPI public class MDParameters extends Object
MolecularDescriptor parameter settings. This class serves as the base class for the parameter classes of specific MolecularDescriptor derivatives.
Descriptor objects of the same type share one common MDParameters object that stores all parameters.
Besides parameters, this class also stores other internal data that are better kept here than in individual MolecularDescriptor objects for the sake of memory efficiency.
The naming convention, similar to that of the derivatives of the MolecularDescriptor class, is as follows: the derived class name begins with the name of the corresponding MolecularDescriptor class and is postfixed by Parameters. For instance, the parameters class of the descriptor class MDXyZ is XyZParameters.
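As a rough illustration of this naming rule, the sketch below derives the parameters class name from a descriptor class name by stripping the MD prefix and appending Parameters. The class NamingSketch and its method are hypothetical helpers and not part of the JChem API; the handling of names without the MD prefix is an assumption.

```java
// Hypothetical helper, not part of the JChem API: illustrates the naming rule only.
class NamingSketch {
    /** Maps a descriptor class name such as "MDXyZ" to "XyZParameters". */
    static String parametersClassName(String descriptorClassName) {
        String prefix = "MD";
        // Assumption: names that lack the "MD" prefix are used unchanged.
        String stem = descriptorClassName.startsWith(prefix)
                ? descriptorClassName.substring(prefix.length())
                : descriptorClassName;
        return stem + "Parameters";
    }

    public static void main(String[] args) {
        System.out.println(parametersClassName("MDXyZ")); // prints "XyZParameters"
    }
}
```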
Parameters are read from configuration files. MDParameters provides extensive functionality to process XML configuration files; however, parameter classes extending MDParameters do not necessarily have to use XML for storing parameters.
MDParameters plays an important role in providing so-called Screening Configurations for the dissimilarity calculations. Such configurations contain so-called parametrized metrics that are based on dissimilarity metrics implemented in classes that extend the MolecularDescriptor class. It is important to make a clear distinction between these two categories: dissimilarity metrics are the basis of the parametrized metrics. MDParameters stores metric parameters and provides services for retrieving and storing parametrized metrics and for accessing them by either name or index.
Another important functionality of this class is to allow the creation of new parametrized metrics and write them into the XML document.
Since:
JChem 2.2
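The relationship between parametrized metrics and the underlying dissimilarity metrics can be pictured with a small stand-in. The sketch below is not the JChem implementation: the class ParametrizedMetricRegistrySketch, its method names, and the convention of returning -1 for an unknown name are all assumptions. It keeps parallel lists indexed by the parametrized metric index, loosely mirroring the parametrizedMetrics, metricIndexes and thresholds fields, and supports lookup by name or index plus a "current" selection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the JChem implementation.
class ParametrizedMetricRegistrySketch {
    private final List<String> names = new ArrayList<>();          // symbolic names
    private final List<Integer> metricIndexes = new ArrayList<>(); // underlying metric per entry
    private final List<Float> thresholds = new ArrayList<>();      // threshold per entry
    private int current = -1;                                      // nothing selected yet

    /** Appends a parametrized metric and returns its index. */
    int add(String name, int metricIndex, float threshold) {
        names.add(name);
        metricIndexes.add(metricIndex);
        thresholds.add(threshold);
        return names.size() - 1;
    }

    /** Index for a symbolic name, or -1 if unknown (assumed convention). */
    int indexOf(String name) {
        return names.indexOf(name);
    }

    /** Symbolic name stored at the given parametrized metric index. */
    String nameOf(int index) {
        return names.get(index);
    }

    /** Selects the parametrized metric that the "current" getters refer to. */
    void setCurrent(int index) {
        if (index < 0 || index >= names.size()) {
            throw new IllegalArgumentException("no such parametrized metric: " + index);
        }
        current = index;
    }

    /** Threshold of the current parametrized metric. */
    float currentThreshold() {
        return thresholds.get(current);
    }

    /** Index of the dissimilarity metric underlying the current parametrized metric. */
    int currentInternalMetricIndex() {
        return metricIndexes.get(current);
    }

    /** Number of parametrized metrics stored. */
    int size() {
        return names.size();
    }
}
```

Several parametrized metrics may share the same underlying metric index while differing in threshold or other parameters, which is why the two kinds of index are kept apart in this sketch.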
  • Field Details

    • DEFAULT_SCALE_FACTOR

      public static final float DEFAULT_SCALE_FACTOR
      constants, default parameter values
    • DEFAULT_ASYMMETRY_FACTOR

      public static final float DEFAULT_ASYMMETRY_FACTOR
    • DEFAULT_WEIGHT

      public static final float DEFAULT_WEIGHT
    • DEFAULT_OUTPUT_PRECISION

      public static final int DEFAULT_OUTPUT_PRECISION
    • cellSize

      protected int cellSize
      size - number of bits - of one descriptor cell
    • length

      protected int length
      the length of the descriptor: the number of cells
    • internalSize

      protected int internalSize
      required memory size of one descriptor instance
    • data

      protected byte[] data
      buffer for external data format generation, used in MolecularDescriptor.toData()
    • configFilePath

      protected String configFilePath
      location of the configuration file
    • document

      protected org.dom4j.Document document
      contains the XML document
    • standardizerConfigurationNode

      protected org.dom4j.Node standardizerConfigurationNode
      node defining the Standardizer configuration
    • similarityNode

      protected org.dom4j.Element similarityNode
      node holding the similarity calculations related parameters
    • screeningConfigurationNode

      protected org.dom4j.Node screeningConfigurationNode
    • parametrizedMetricsNode

      protected org.dom4j.Element parametrizedMetricsNode
    • parametrizedMetricNodes

      protected List parametrizedMetricNodes
    • parametrizedMetrics

      protected ArrayList parametrizedMetrics
      symbolic names (mnemonics) of parametrized metrics
    • metricIndexes

      protected ArrayList metricIndexes
      maps parametrized metric indexes to MolecularDescriptor metric indexes
    • scaleFactors

      protected ArrayList scaleFactors
      scale factor of scalable parametrized metrics
    • tverskyA

      protected ArrayList<Float> tverskyA
      alpha values for tversky dissimilarity
    • tverskyB

      protected ArrayList<Float> tverskyB
      beta values for tversky dissimilarity
    • asymmetryFactors

      protected ArrayList asymmetryFactors
      asymmetry ratio of parametrized asymmetric metrics
    • thresholds

      protected ArrayList thresholds
      dissimilarity threshold values
    • normalized

      protected ArrayList normalized
      flags indicating if the metric is normalized or not
    • defaultWeight

      protected float defaultWeight
      value for all missing weight parameters
    • weights

      protected ArrayList weights
      weights for parametrized metrics
    • cellwiseWeights

      protected ArrayList cellwiseWeights
      flags indicating whether cell-wise weights are used for the parametrized metrics
    • outputPrecision

      protected int outputPrecision
      number of fraction digits in floating point output format
    • currentMetricIndex

      protected int currentMetricIndex
      index of the parametrized metric currently in use
    • md

      protected MolecularDescriptor md
      this object is needed to access default dissimilarity functions
    • decForm

      protected NumberFormat decForm
      to format floating point output
    • generator

      protected MDGenerator generator
      generates MolecularDescriptors
    • standardizer

      protected Standardizer standardizer
      transform molecules into standard form before descriptor generation
  • Constructor Details

    • MDParameters

      protected MDParameters()
      Creates and initializes an empty object. This class is not allowed to be instantiated directly. Only its derived classes can call the superclass constructors.
  • Method Details

    • initParameters

      protected void initParameters()
      Initializes object after configuration parameters are loaded.
    • fromString

      public void fromString(String parameterString) throws MDParametersException
      Sets parameters from a string representation. This method assumes that parameters are described in XML format.
      Parameters:
      parameterString - configuration parameters in string
      Throws:
      MDParametersException - when the parameter string is not well-formed
    • fromFile

      public void fromFile(File parameterFile) throws MDParametersException
      Sets parameters from an XML file. Stores filepath in configFilePath.
      Parameters:
      parameterFile - initialized parameter file
      Throws:
      MDParametersException - failed to process parameter file
    • addParameters

      public void addParameters(String parameterString) throws MDParametersException
      Sets parameters from an XML string representation keeping all previous settings.
      Parameters:
      parameterString - parameters in string
      Throws:
      MDParametersException - when the parameter string is not well-formed
    • addParameters

      public void addParameters(File parameterFile) throws MDParametersException
      Sets parameters from an XML config file keeping all previous settings.
      Parameters:
      parameterFile - parameter file
      Throws:
      MDParametersException - when the parameter file is not well-formed
    • setParameters

      public void setParameters(String parametersString) throws MDParametersException
      Sets parameters from an XML string representation overwriting all previous parameters settings with the new ones.
      Parameters:
      parametersString - parameters in string
      Throws:
      MDParametersException - when the parameter string is not well-formed
    • setParameters

      public void setParameters(File parametersFile) throws MDParametersException
      Sets parameters from an XML file representation overwriting all previous settings with the new ones. Stores filepath in configFilePath.
      Parameters:
      parametersFile - parameters File
      Throws:
      MDParametersException - when the parameter file is not well-formed
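      The difference between the addParameters and setParameters variants can be sketched with a plain list standing in for the XML settings. This is not the JChem implementation: the class MergeOrOverwriteSketch and its methods are hypothetical, and the real classes operate on a DOM tree rather than on a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the JChem implementation: settings are modelled
// as a plain list of entries rather than an XML document.
class MergeOrOverwriteSketch {
    private final List<String> entries = new ArrayList<>();

    /** Analogue of addParameters: keeps earlier entries and appends the new ones. */
    void add(List<String> newEntries) {
        entries.addAll(newEntries);
    }

    /** Analogue of setParameters: discards earlier entries, then stores the new ones. */
    void set(List<String> newEntries) {
        entries.clear();
        entries.addAll(newEntries);
    }

    /** Returns a copy so callers cannot modify the internal list. */
    List<String> entries() {
        return new ArrayList<>(entries);
    }
}
```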
    • toString

      public String toString()
      Returns the parameter values as a string. This implementation uses XML for the external format of parameters; however, derived classes may use different formats.
      Overrides:
      toString in class Object
      Returns:
      parameter string
      Throws:
      MDParametersException - when creating the parameter string fails
    • getScreeningConfigurationString

      public String getScreeningConfigurationString(String nodeName, String attrib, String value) throws MDParametersException
      Returns parts of the parameter values in string. Selects a sub-tree of the DOM tree specified by the tagname of one of its nodes, and writes the subtree into an XML string.
      Parameters:
      nodeName - name of the node to be printed
      attrib - attribute name
      value - value of the attribute
      Returns:
      parameter string
      Throws:
      MDParametersException - when creating the parameter string fails
    • toString

      protected String toString(org.dom4j.Node node) throws MDParametersException
      Returns parts of the parameter values in string. Selects a sub-tree of the DOM tree specified by a node, and writes the subtree into an XML string.
      Parameters:
      node - rootnode of the subtree to be printed
      Returns:
      parameter string
      Throws:
      MDParametersException - when creating the parameter string fails
    • setCellSize

      public void setCellSize(int cellSize)
      Sets the size (number of bits) of the bins (cells). This has to be at least 1, but should not exceed 32.
      Parameters:
      cellSize - the width of one (and each) cell (bin) in bits
    • setLength

      public void setLength(int length) throws MDParametersException
      Sets the length (number of cells) of the descriptor.
      Parameters:
      length - the required length (cell count)
      Throws:
      MDParametersException - if argument is not positive
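      The documented constraints on cell size and length can be sketched as simple range checks. The class DescriptorShapeSketch is hypothetical and not part of JChem. Two points are assumptions: that an out-of-range cell size is rejected (the description only says what the value should be), and the use of IllegalArgumentException in place of the checked MDParametersException.

```java
// Hypothetical stand-in, not the JChem implementation.
class DescriptorShapeSketch {
    private int cellSize = 1;
    private int length = 1;

    /** Cell width in bits; the documented range is 1 to 32. */
    void setCellSize(int cellSize) {
        // Assumption: out-of-range values are rejected outright.
        if (cellSize < 1 || cellSize > 32) {
            throw new IllegalArgumentException("cellSize must be in 1..32: " + cellSize);
        }
        this.cellSize = cellSize;
    }

    /** Number of cells; must be positive. */
    void setLength(int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive: " + length);
        }
        this.length = length;
    }

    /** Total number of payload bits: one cell width per cell. */
    long totalBits() {
        return (long) cellSize * length;
    }
}
```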
    • setScalingHypothesis

      public void setScalingHypothesis(MolecularDescriptor scalingHypothesis)
      Sets (stores) the specified scaling hypothesis. It is used by various scaled metrics. It cannot be passed to these metrics directly as an argument, because of the uniform method header of metric functions (which has to be preserved).
      Parameters:
      scalingHypothesis - the consensus hypothesis used for scaling
    • setScaleFactor

      public void setScaleFactor(float scaleFactor)
      Sets scaleFactor used with the current parametrized metrics.
      Parameters:
      scaleFactor - the new value of the scaleFactor
    • setAsymmetryFactor

      public void setAsymmetryFactor(float af)
      Sets the value of the asymmetry factor of the current parametrized metric.
      Parameters:
      af - asymmetry factor
    • setThreshold

      public void setThreshold(float th)
      Sets the value of the threshold of the current parametrized metric.
      Parameters:
      th - dissimilarity threshold value
    • setWeights

      public void setWeights(float[] w)
      Sets the cell-wise weight factors for the current parametrized metric.
      Parameters:
      w - weights
    • setCellwiseWeights

      public void setCellwiseWeights(boolean c)
      Sets boolean telling whether cell weights are to be generated for current parametrized metric.
      Parameters:
      c - true if cell weights
    • setNormalized

      public void setNormalized(boolean yes)
      Toggles the normalized flag of the current parametrized metric.
      Parameters:
      yes - true, if the metric is normalized
    • setOutputPrecision

      public void setOutputPrecision(int precision)
      Specifies the output precision for floating point values. This method can be used in conjunction with getDecForm().
      Parameters:
      precision - number of digits after the decimal point
    • setCurrentParametrizedMetric

      public void setCurrentParametrizedMetric(int metricIndex)
      Selects the specified parametrized metric to be the current.
      Parameters:
      metricIndex - index of the selected parametrized metric
    • setCreateStatistics

      public void setCreateStatistics(boolean createStatistics)
      Toggles the create statistics flag of the MDGenerator object.
      Parameters:
      createStatistics - new value for the create statistics flag
    • addParametrizedMetric

      public int addParametrizedMetric(String name, String metric, String activeFamily) throws MDParametersException
      Expands the set of parametrized metrics with a new item. The first parameter is optional; if it is not specified, the symbolic name is formed from the second and the third parameters (both of these are mandatory).
      Parameters:
      name - symbolic name of the parametrized metric
      metric - name of the metric (like Tanimoto, Euclidean etc)
      activeFamily - name of the active compounds family
      Throws:
      MDParametersException
    • getCellSize

      public int getCellSize()
      Gets the number of bits of an atomic cell in the descriptor.
      Returns:
      the number of bits in one single descriptor cell
    • getLength

      public int getLength()
      Returns the number of cells forming the descriptor.
    • getInternalSize

      public int getInternalSize()
      Gets the required memory size to store the descriptor according to the specified parameters.
      Returns:
      size of a suitable array that can store one descriptor
    • getData

      public byte[] getData()
      Gets the byte array which is used for conversions between internal and external data formats. Internal is the memory representation, while external is used for storing descriptors in files.
      Returns:
      an array large enough to hold the descriptor in external format
    • getCurrentMetricIndex

      public int getCurrentMetricIndex()
    • getNumberOfMetrics

      public int getNumberOfMetrics()
      Gets the total number of parametrized metrics available in the present configuration.
      Returns:
      number of metrics
    • getNumberOfWeights

      public int getNumberOfWeights()
      Gets the number of weights the current parametrized metric takes.
      Returns:
      number of weight factors corresponding to the current metric
    • getNumberOfWeights

      protected int getNumberOfWeights(int parametrizedMetricIndex) throws IllegalArgumentException
      Gets the number of weight factors used by the specified metric. This method can be applied to the dissimilarity metrics provided by the MolecularDescriptor class or its derived classes, but not to parametrized metrics.
      Parameters:
      parametrizedMetricIndex - parametrized metric index
      Returns:
      number of weights the metric uses
      Throws:
      IllegalArgumentException - if the given parameter is not a valid metric index
    • getThreshold

      public float getThreshold(int metricIndex)
      Gets a metric dependent threshold value. Ideally, this value should be based on statistics, though the actual value is not too critical, since these are only used in user interfaces to simplify the use of applications for beginners. Note: this parametrized version of getThreshold() is kept for compatibility reasons.
      Parameters:
      metricIndex - index of a parametrized metric
      Returns:
      threshold corresponding to the given metric index
    • getThreshold

      public float getThreshold()
      Gets the threshold value being set for the current parametrized version.
      Returns:
      dissimilarity threshold corresponding to the current metric
    • getScalingHypothesis

      public MolecularDescriptor getScalingHypothesis()
      Gets the scaling hypothesis used in scaled metrics.
      Returns:
      the scaling hypothesis
    • getInternalMetricIndex

      public int getInternalMetricIndex()
      Gets the MolecularDescriptor specific metric index of the current parametrized metric.
      Returns:
      metric index
    • getMetricName

      public String getMetricName()
      Gets the user defined symbolic name of the current parametrized metric.
      Returns:
      metric name
    • getMetricName

      public String getMetricName(int metricIndex)
      Gets the user defined symbolic name of the specified parametrized metric.
      Returns:
      metric name
    • getMetricIndex

      public int getMetricIndex(String name)
      Gets the index of the given parametrized metric.
      Parameters:
      name - name of the parametrized metric
      Returns:
      metric index
    • getScaleFactor

      public float getScaleFactor()
      Gets the scale factor used in the current parametrized scalable metrics.
      Returns:
      the scale factor
    • getAsymmetryFactor

      public float getAsymmetryFactor()
      Gets the asymmetry factor used in the current parametrized asymmetric metrics.
      Returns:
      the value of the asymmetry factor
    • getWeights

      public float[] getWeights()
      Gets all weights for the given parametrized metric. If the specified metric is not a weighted metric or the corresponding weights are not set, null is returned.
      Returns:
      all weights as float values
    • getTverskyAlpha

      public float getTverskyAlpha()
      Gets Tversky alpha value for the given parametrized metric.
      Returns:
      alpha as float value
    • getTverskyBeta

      public float getTverskyBeta()
      Gets Tversky beta value for the given parametrized metric.
      Returns:
      beta as float value
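      For orientation, the textbook Tversky index on two feature sets A and B is c / (c + alpha * a + beta * b), where c counts the shared features and a and b count the features found only in A or only in B. The sketch below implements that formula on bit sets. It is illustrative only: this page does not state how JChem turns alpha and beta into a dissimilarity value, so the one-minus-index conversion and the handling of the zero denominator are assumptions, and TverskySketch is a hypothetical class.

```java
import java.util.BitSet;

// Illustrative only: the textbook Tversky index, not the JChem implementation.
class TverskySketch {
    /** Tversky index: c / (c + alpha * onlyA + beta * onlyB). */
    static double index(BitSet a, BitSet b, double alpha, double beta) {
        BitSet common = (BitSet) a.clone();
        common.and(b);            // bits set in both
        BitSet onlyA = (BitSet) a.clone();
        onlyA.andNot(b);          // bits set in a but not in b
        BitSet onlyB = (BitSet) b.clone();
        onlyB.andNot(a);          // bits set in b but not in a
        double c = common.cardinality();
        double denominator = c + alpha * onlyA.cardinality() + beta * onlyB.cardinality();
        if (denominator == 0.0) {
            // Assumed convention for the degenerate case: identical (hence empty)
            // sets score 1, anything else scores 0.
            return a.equals(b) ? 1.0 : 0.0;
        }
        return c / denominator;
    }

    /** Assumption: dissimilarity is taken as one minus the index. */
    static double dissimilarity(BitSet a, BitSet b, double alpha, double beta) {
        return 1.0 - index(a, b, alpha, beta);
    }
}
```

      With alpha = beta = 1 the index reduces to the Tanimoto coefficient, and with alpha = beta = 0.5 to the Dice coefficient; unequal alpha and beta make the measure asymmetric.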
    • isCellwiseWeights

      public boolean isCellwiseWeights()
      Gets boolean telling whether cell weights are to be generated for current parametrized metric.
      Returns:
      true if weights are assigned to each individual descriptor cell
    • getDecForm

      public DecimalFormat getDecForm()
      Gets the formatter object that is capable of formatting fractions with the given precision. Precision can be set by calling setOutputPrecision(int precision).
      Returns:
      decimal number formatter object (not localized)
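      A non-localized formatter with a fixed number of fraction digits can be built from the JDK alone, as in the sketch below. The class PrecisionFormatSketch is hypothetical; whether JChem pads with trailing zeros or which locale symbols it uses is not stated here, so both choices are assumptions.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical stand-in, not the JChem implementation.
class PrecisionFormatSketch {
    /** Builds a locale-independent formatter with exactly 'precision' fraction digits. */
    static DecimalFormat formatterFor(int precision) {
        // Locale.ROOT symbols give '.' as the decimal separator on every system.
        DecimalFormat format =
                new DecimalFormat("0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        // Assumption: values are padded with trailing zeros up to the precision.
        format.setMinimumFractionDigits(precision);
        format.setMaximumFractionDigits(precision);
        format.setGroupingUsed(false);
        return format;
    }

    public static void main(String[] args) {
        System.out.println(formatterFor(3).format(0.5)); // prints "0.500"
    }
}
```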
    • isScaled

      public boolean isScaled()
      Returns whether current parametrized metric is scaled or not.
      Returns:
      true if the metric is scaled
    • isAsymmetric

      public boolean isAsymmetric()
      Returns whether current parametrized metric is asymmetric or not.
      Returns:
      true if the metric is asymmetric
    • isWeighted

      public boolean isWeighted()
      Returns whether current parametrized metric is weighted or not.
      Returns:
      true if the metric is weighted
    • isNormalized

      public boolean isNormalized()
      Returns whether current parametrized metric is normalized or not.
      Returns:
      true if the metric is normalized
    • isStandardizationMandatory

      public boolean isStandardizationMandatory()
      Checks whether standardization of molecules is mandatory for the corresponding MolecularDescriptor before descriptor generation. This method always returns true. Derived classes should override it when standardization is not obligatory.
      Returns:
      whether or not standardization of molecules is needed
      Since:
      JChem 2.2
    • getDefaultStandardizerConfiguration

      public static String getDefaultStandardizerConfiguration()
      Gets the default configuration of the standardizer. This method is called if no standardizer configuration is set in the parameters configuration but standardization is mandatory for the corresponding MolecularDescriptor. The default on this top level is aromatization and dehydrogenization, but derived parameter classes may overload this behaviour.
      Returns:
      standardizer configuration XML string
      Since:
      JChem 2.2
    • getDefaultDocumentFrame

      public String getDefaultDocumentFrame()
      Gets the default XML configuration string. This is needed when an optional XML configuration is not specified. Descriptor files and descriptor tables always store the configuration that corresponds to the descriptors stored, thus something has to be stored even when nothing is specified.
      Returns:
      default XML configuration string of the actual Parameters class
      Since:
      JChem 2.2
    • standardize

      public Molecule standardize(Molecule m)
      Standardizes the Molecule and returns the standardized form. The standardization is configured via XML. StandardizerConfiguration is the corresponding XML tag. If no standardizer is set up, null is returned.
      Parameters:
      m - molecular structure to be standardized
      Returns:
      standardized form of the input structure, or null if standardization is mandatory for the corresponding descriptor
      Since:
      JChem 2.2
    • readFromXmlFile

      protected void readFromXmlFile(File file, boolean merge, boolean all) throws MDParametersException
      Reads configuration from an XML file. Builds a DOM tree, picks the nodes and elements that store information that can be processed on this level (leaving others for derived classes), converts their values into the internal format, and stores them in data members of this class. Stores filepath in configFilePath.
      Parameters:
      file - the XML file to read configuration data from
      merge - merge config from file into already existing parameters or overwrite existing parameter values
      all - process the complete document or only the ScreeningConfiguration tag
      Throws:
      MDParametersException - in the case of any failure
    • readFromXmlString

      protected void readFromXmlString(String xml, boolean merge, boolean all) throws MDParametersException
      Reads configuration from an XML string. Builds a DOM tree, picks the nodes and elements that store information that can be processed on this level (leaving others for derived classes), converts their values into the internal format, and stores them in data members of this class.
      Parameters:
      xml - the XML string to get the configuration data from
      merge - merge config from file into already existing parameters or overwrite existing parameter values
      all - process the complete document or only the ScreeningConfiguration tag
      Throws:
      MDParametersException - in the case of any failure
    • checkDocumentVersion

      protected void checkDocumentVersion(String docType, String version) throws MDParametersException
      Checks whether the document has the right version.
      Parameters:
      docType - the required document type
      version - the expected version number
      Throws:
      MDParametersException
    • processDocument

      protected void processDocument(boolean all) throws MDParametersException
      Searches the DOM tree for relevant nodes and sets internal variables to some of these nodes for the sake of easier information processing.
      Parameters:
      all - process the complete document or only the ScreeningConfiguration tag
      Throws:
      MDParametersException
    • readValues

      protected void readValues(boolean all) throws MDParametersException
      Picks attribute values from the document tree that are relevant to the actual MDParameters sub-class.
      Parameters:
      all - process the complete document or only the ScreeningConfiguration tag
      Throws:
      MDParametersException
    • readMetricParameters

      protected void readMetricParameters() throws MDParametersException
      Processes all ParametrizedMetric nodes in the DOM tree. Reads parametrized metric names and the associated parameter settings and stores them in data members for faster and easier access in getter methods.
      Throws:
      MDParametersException - if one of the nodes is not well-formed
    • readMetricWeights

      protected void readMetricWeights(org.dom4j.Element parametrizedMetric, int metricIndex) throws MDParametersException
      Throws:
      MDParametersException
    • writeMetricParameter

      protected void writeMetricParameter(ArrayList pl, String attr, int mi, boolean useDecForm)
      Writes a given parameter of the specified metric into the corresponding tree node.
      Parameters:
      pl - list of parameters (for all metric indexes)
      attr - name of the attribute which the parameter corresponds to
      mi - index of the metric
      useDecForm - use precision for writing floating point values
    • appendParametrizedMetric

      protected int appendParametrizedMetric(String name, String metric)
      Extends internal data with a new parametrized metric. Neither the DOM tree nor the XML document is modified.
      Parameters:
      name - name of the parametrized metric
      metric - dissimilarity metric name (as defined in its implementor class)
    • addParametrizedMetricsNode

      protected void addParametrizedMetricsNode()
      Adds the ParametrizedMetrics node to the DOM tree.
    • addParametrizedMetricNode

      protected org.dom4j.Element addParametrizedMetricNode(String name, String activeFamily, String metric)
      Adds a ParametrizedMetric node to the DOM tree.
      Parameters:
      name - name of the parameterized metric, given by the user
      activeFamily - name of the active compound family (e.g. ACE)
      metric - name of the dissimilarity metric
    • importNodes

      protected boolean importNodes(org.dom4j.Document doc, boolean merge)
      Imports nodes from the specified Document into the current (main) Document. New nodes can either be merged into the existing ones without removing them, or new nodes may overwrite existing nodes.
      Parameters:
      doc - import nodes from this document
      merge - merge (add new) or overwrite (replace with new) existing nodes
    • getDescriptorTypeName

      public static String getDescriptorTypeName(String xmlConfig)
      Takes the descriptor type name from the root element of the XML configuration.
      Parameters:
      xmlConfig - configuration string
      Returns:
      descriptor type name
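      Reading a name off the root element of an XML string can be sketched with the JDK's own DOM parser. This is not the JChem implementation: MDParameters works on dom4j documents, the class RootElementSketch is hypothetical, and treating the root tag name itself as the type name is an assumption, since the description only says the name is taken from the root element. The tag names in the usage comment are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in using the JDK DOM parser, not the JChem implementation.
class RootElementSketch {
    /** Returns the tag name of the root element of an XML string. */
    static String rootElementName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            // Malformed input surfaces as an unchecked exception in this sketch.
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up configuration; the tag names are placeholders.
        String xml = "<ExampleConfiguration Version=\"0.1\"><Parameters/></ExampleConfiguration>";
        System.out.println(rootElementName(xml)); // prints "ExampleConfiguration"
    }
}
```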