Package chemaxon.descriptors
Class MDParameters
java.lang.Object
chemaxon.descriptors.MDParameters
- Direct Known Subclasses:
BCUTParameters, CDParameters, CFParameters, ECFPParameters, PFParameters, RFParameters, SDParameters, ShapeParameters
MolecularDescriptor parameter settings. This class serves as the base class for the parameter classes of specific MolecularDescriptor derivatives.
Descriptor objects of the same type share one common MDParameters object that stores all parameters. Besides the parameters, it also holds other internal data that, for the sake of memory efficiency, are better kept in this class than in individual MolecularDescriptor objects.
The naming convention, similar to that of the derivatives of the MolecularDescriptor class, is as follows: the derived class name begins with the name of the corresponding MolecularDescriptor class and is postfixed with Parameters. For instance, the parameters class of the descriptor class MDXyZ is XyZParameters.
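The convention can be illustrated with a small helper. This is only a sketch: the class ParameterClassNames and its method are hypothetical and not part of chemaxon.descriptors, and the helper assumes that a descriptor class name may carry an MD prefix, as in the MDXyZ example.

```java
/** Hypothetical helper illustrating the naming convention; not part of the chemaxon API. */
final class ParameterClassNames {

    private ParameterClassNames() {
    }

    /**
     * Derives a parameters class name from a descriptor class name, assuming
     * the descriptor name may carry an "MD" prefix (MDXyZ becomes XyZParameters).
     */
    static String parametersClassName(String descriptorClassName) {
        String base = descriptorClassName.startsWith("MD")
                ? descriptorClassName.substring(2)
                : descriptorClassName;
        return base + "Parameters";
    }

    public static void main(String[] args) {
        System.out.println(parametersClassName("MDXyZ"));
    }
}
```

Names without the prefix are passed through unchanged, so ECFP maps to ECFPParameters, which matches one of the known subclasses listed for this class.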
Parameters are read from configuration files. MDParameters provides extensive functionality for processing XML configuration files; however, parameter classes extending MDParameters do not necessarily have to use XML for storing parameters.
MDParameters plays an important role in providing so-called Screening Configurations for dissimilarity calculations. Such configurations contain so-called parametrized metrics that are based on dissimilarity metrics implemented in classes that extend the MolecularDescriptor class. It is important to distinguish clearly between these two categories: dissimilarity metrics are the basis of the parametrized metrics. MDParameters stores metric parameters and provides services for storing and retrieving parametrized metrics and for accessing them by either name or index.
Another important function of this class is to allow new parametrized metrics to be created and written into the XML document.
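Access by name or index can be pictured with parallel lists, in the spirit of the parametrizedMetrics and thresholds fields listed in the field summary. The following is a minimal sketch, not the actual implementation: the class MetricRegistry is hypothetical, the threshold values are made up, and returning -1 for an unknown name is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of name/index access to parametrized metrics; not the chemaxon implementation. */
final class MetricRegistry {
    // Parallel lists: position i in each list describes parametrized metric i.
    private final List<String> names = new ArrayList<>();
    private final List<Float> thresholds = new ArrayList<>();
    private int currentMetricIndex = 0;

    /** Registers a metric and returns its index. */
    int add(String name, float threshold) {
        names.add(name);
        thresholds.add(threshold);
        return names.size() - 1;
    }

    /** Returns the index of the named metric, or -1 if unknown (an assumption of this sketch). */
    int getMetricIndex(String name) {
        return names.indexOf(name);
    }

    /** Returns the symbolic name stored at the given index. */
    String getMetricName(int metricIndex) {
        return names.get(metricIndex);
    }

    /** Selects the metric that the parameterless getters refer to. */
    void setCurrentParametrizedMetric(int metricIndex) {
        if (metricIndex < 0 || metricIndex >= names.size()) {
            throw new IllegalArgumentException("invalid metric index: " + metricIndex);
        }
        currentMetricIndex = metricIndex;
    }

    /** Threshold of the metric currently in use. */
    float getThreshold() {
        return thresholds.get(currentMetricIndex);
    }
}
```

Keeping one list per parameter, all indexed by the metric index, lets the "current metric" getters read a single position from each list.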
- Since:
- JChem 2.2
-
Field Summary
Modifier and Type | Field | Description
protected ArrayList | asymmetryFactors | asymmetry ratio of parametrized asymmetric metrics
protected int | cellSize | size (number of bits) of one descriptor cell
protected ArrayList | cellwiseWeights | is cell weights for parametrized metrics
protected String | configFilePath | location of the configuration file
protected int | currentMetricIndex | index of the parametrized metric currently in use
protected byte[] | data | buffer for external data format generation, used in MolecularDescriptor.toData()
protected NumberFormat | decForm | to format floating point output
static final float | DEFAULT_ASYMMETRY_FACTOR |
static final int | DEFAULT_OUTPUT_PRECISION |
static final float | DEFAULT_SCALE_FACTOR | constants, default parameter values
static final float | DEFAULT_WEIGHT |
protected float | defaultWeight | value for all missing weight parameters
protected org.dom4j.Document | document | contains the XML document
protected MDGenerator | generator | generates MolecularDescriptors
protected int | internalSize | required memory size of one descriptor instance
protected int | length | the length of the descriptor: the number of cells
protected MolecularDescriptor | md | this object is needed to access default dissimilarity functions
protected ArrayList | metricIndexes | convert parametrized indexes to MolecularDescriptor metric indexes
protected ArrayList | normalized | flags indicating if the metric is normalized or not
protected int | outputPrecision | number of fraction digits in floating point output format
protected List | parametrizedMetricNodes |
protected ArrayList | parametrizedMetrics | symbolic names (mnemonics) of parametrized metrics
protected org.dom4j.Element | parametrizedMetricsNode |
protected ArrayList | scaleFactors | scale factor of scalable parametrized metrics
protected org.dom4j.Node | screeningConfigurationNode |
protected org.dom4j.Element | similarityNode | node holding the similarity calculations related parameters
protected Standardizer | standardizer | transform molecules into standard form before descriptor generation
protected org.dom4j.Node | standardizerConfigurationNode | node defining the Standardizer configuration
protected ArrayList | thresholds | dissimilarity threshold values
 | tverskyA | alpha values for Tversky dissimilarity
 | tverskyB | beta values for Tversky dissimilarity
protected ArrayList | weights | weights for parametrized metrics
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | addParameters(File parameterFile) | Sets parameters from an XML config file keeping all previous settings.
void | addParameters(String parameterString) | Sets parameters from an XML string representation keeping all previous settings.
int | addParametrizedMetric(String name, String metric, String activeFamily) | Expands the set of parametrized metrics with a new item.
protected org.dom4j.Element | addParametrizedMetricNode(String name, String activeFamily, String metric) | Adds a ParametrizedMetric node to the DOM tree.
protected void | addParametrizedMetricsNode() | Adds the ParametrizedMetrics node to the DOM tree.
protected int | appendParametrizedMetric(String name, String metric) | Extends internal data with a new parametrized metric.
protected void | checkDocumentVersion(String docType, String version) | Checks if the document is the right version.
void | fromFile(File parameterFile) | Sets parameters from an XML file.
void | fromString(String parameterString) | Sets parameters from a string representation.
float | getAsymmetryFactor() | Gets the asymmetry factor used in the current parametrized asymmetric metrics.
int | getCellSize() | Gets the number of bits of an atomic cell in the descriptor.
int | getCurrentMetricIndex() |
byte[] | getData() | Gets the byte array which is used for conversions between internal and external data formats.
 | getDecForm() | Gets the formatter object that is capable of formatting fractions with given precision.
 | getDefaultDocumentFrame() | Gets the default XML configuration string.
static String | getDefaultStandardizerConfiguration() | Gets the default configuration of the standardizer.
static String | getDescriptorTypeName(String xmlConfig) | Takes the descriptor type name from the root element of the XML configuration.
int | getInternalMetricIndex() | Gets the MolecularDescriptor specific metric index of the current parametrized metric.
int | getInternalSize() | Gets the required memory size to store the descriptor according to the specified parameters.
int | getLength() | Returns the number of cells forming the descriptor.
int | getMetricIndex(String name) | Gets the index of the given parametrized metric.
 | getMetricName() | Gets the user defined symbolic name of the current parametrized metric.
 | getMetricName(int metricIndex) | Gets the user defined symbolic name of the specified parametrized metric.
int | getNumberOfMetrics() | Gets the total number of parametrized metrics available in the present configuration.
int | getNumberOfWeights() | Gets the number of weights the current parametrized metric takes.
protected int | getNumberOfWeights(int parametrizedMetricIndex) | Gets the number of weight factors used by the specified metric.
float | getScaleFactor() | Gets the scale factor used in the current parametrized scalable metrics.
 | getScalingHypothesis() | Gets the scaling hypothesis used in scaled metrics.
String | getScreeningConfigurationString(String nodeName, String attrib, String value) | Returns parts of the parameter values in string.
float | getThreshold() | Gets the threshold value being set for the current parametrized version.
float | getThreshold(int metricIndex) | Gets a metric dependent threshold value.
float | getTverskyAlpha() | Gets Tversky alpha value for the given parametrized metric.
float | getTverskyBeta() | Gets Tversky beta value for the given parametrized metric.
float[] | getWeights() | Gets all weights for the given parametrized metric.
protected boolean | importNodes(org.dom4j.Document doc, boolean merge) | Imports nodes from the specified Document into the current (main) Document.
protected void | initParameters() | Initializes object after configuration parameters are loaded.
boolean | isAsymmetric() | Returns whether current parametrized metric is asymmetric or not.
boolean | isCellwiseWeights() | Gets boolean telling whether cell weights are to be generated for current parametrized metric.
boolean | isNormalized() | Returns whether current parametrized metric is normalized or not.
boolean | isScaled() | Returns whether current parametrized metric is scaled or not.
boolean | isStandardizationMandatory() | Checks if standardization of molecules is mandatory for the corresponding MolecularDescriptor before descriptor generation.
boolean | isWeighted() | Returns whether current parametrized metric is weighted or not.
protected void | processDocument(boolean all) | Searches the DOM tree for relevant nodes and sets internal variables to some of these nodes for the sake of easier information processing.
protected void | readFromXmlFile(File file, boolean merge, boolean all) | Reads configuration from XML file.
protected void | readFromXmlString(String xml, boolean merge, boolean all) | Reads configuration from XML string.
protected void | readMetricParameters() | Processes all ParametrizedMetric nodes in the DOM tree.
protected void | readMetricWeights(org.dom4j.Element parametrizedMetric, int metricIndex) |
protected void | readValues(boolean all) | Picks attribute values from the document tree that are relevant to the actual MDParameters sub-class.
void | setAsymmetryFactor(float af) | Sets the value of the asymmetry factor of the current parametrized metric.
void | setCellSize(int cellSize) | Sets the size (number of bits) of the bins (cells).
void | setCellwiseWeights(boolean c) | Sets boolean telling whether cell weights are to be generated for current parametrized metric.
void | setCreateStatistics(boolean createStatistics) | Toggles the create statistics flag of the MDGenerator object.
void | setCurrentParametrizedMetric(int metricIndex) | Selects the specified parametrized metric to be the current.
void | setLength(int length) | Sets the length (number of cells) of the descriptor.
void | setNormalized(boolean yes) | Toggles the normalized flag of the current parametrized metric.
void | setOutputPrecision(int precision) | Specifies the output precision for floating point values.
void | setParameters(File parametersFile) | Sets parameters from an XML file representation overwriting all previous settings with the new ones.
void | setParameters(String parametersString) | Sets parameters from an XML string representation overwriting all previous parameters settings with the new ones.
void | setScaleFactor(float scaleFactor) | Sets scaleFactor used with the current parametrized metrics.
void | setScalingHypothesis(MolecularDescriptor scalingHypothesis) | Sets (stores) the specified scaling hypothesis.
void | setThreshold(float th) | Sets the value of the threshold of the current parametrized metric.
void | setWeights(float[] w) | Sets the cell-wise weight factors for the current parametrized metric.
 | standardize(Molecule m) | Standardizes the Molecule and returns the standardized form.
 | toString() | Returns the parameter values in string.
protected String | toString(org.dom4j.Node node) | Returns parts of the parameter values in string.
protected void | writeMetricParameter(ArrayList pl, String attr, int mi, boolean useDecForm) | Writes a given parameter of the specified metric into the corresponding tree node.
-
Field Details
-
DEFAULT_SCALE_FACTOR
public static final float DEFAULT_SCALE_FACTOR
constants, default parameter values
-
DEFAULT_ASYMMETRY_FACTOR
public static final float DEFAULT_ASYMMETRY_FACTOR
-
DEFAULT_WEIGHT
public static final float DEFAULT_WEIGHT
-
DEFAULT_OUTPUT_PRECISION
public static final int DEFAULT_OUTPUT_PRECISION
-
cellSize
protected int cellSize
size (number of bits) of one descriptor cell
-
length
protected int length
the length of the descriptor: the number of cells
-
internalSize
protected int internalSize
required memory size of one descriptor instance
-
data
protected byte[] data
buffer for external data format generation, used in MolecularDescriptor.toData()
-
configFilePath
protected String configFilePath
location of the configuration file
-
document
protected org.dom4j.Document document
contains the XML document
-
standardizerConfigurationNode
protected org.dom4j.Node standardizerConfigurationNode
node defining the Standardizer configuration
-
similarityNode
protected org.dom4j.Element similarityNode
node holding the parameters related to similarity calculations
-
screeningConfigurationNode
protected org.dom4j.Node screeningConfigurationNode
-
parametrizedMetricsNode
protected org.dom4j.Element parametrizedMetricsNode
-
parametrizedMetricNodes
protected List parametrizedMetricNodes
-
parametrizedMetrics
protected ArrayList parametrizedMetrics
symbolic names (mnemonics) of parametrized metrics
-
metricIndexes
protected ArrayList metricIndexes
converts parametrized metric indexes to MolecularDescriptor metric indexes
-
scaleFactors
protected ArrayList scaleFactors
scale factors of scalable parametrized metrics
-
tverskyA
alpha values for Tversky dissimilarity
-
tverskyB
beta values for Tversky dissimilarity
-
asymmetryFactors
protected ArrayList asymmetryFactors
asymmetry ratios of parametrized asymmetric metrics
-
thresholds
protected ArrayList thresholds
dissimilarity threshold values
-
normalized
protected ArrayList normalized
flags indicating whether each metric is normalized
-
defaultWeight
protected float defaultWeight
value for all missing weight parameters
-
weights
protected ArrayList weights
weights for parametrized metrics
-
cellwiseWeights
protected ArrayList cellwiseWeights
flags indicating whether cell-wise weights are used for the parametrized metrics
-
outputPrecision
protected int outputPrecision
number of fraction digits in floating point output format
-
currentMetricIndex
protected int currentMetricIndex
index of the parametrized metric currently in use
-
md
protected MolecularDescriptor md
this object is needed to access default dissimilarity functions
-
decForm
protected NumberFormat decForm
used to format floating point output
-
generator
protected MDGenerator generator
generates MolecularDescriptors
-
standardizer
protected Standardizer standardizer
transforms molecules into standard form before descriptor generation
-
-
Constructor Details
-
MDParameters
protected MDParameters()
Creates and initializes an empty object. This class cannot be instantiated directly; only its derived classes can call the superclass constructor.
-
-
Method Details
-
initParameters
protected void initParameters()
Initializes the object after configuration parameters are loaded.
-
fromString
Sets parameters from a string representation. This method assumes that parameters are described in XML format.
- Parameters:
parameterString - configuration parameters in string
- Throws:
MDParametersException - when the parameter string is not well-formed
-
fromFile
Sets parameters from an XML file. Stores the file path in configFilePath.
- Parameters:
parameterFile - initialized parameter file
- Throws:
MDParametersException - failed to process parameter file
-
addParameters
Sets parameters from an XML string representation, keeping all previous settings.
- Parameters:
parameterString - parameters in string
- Throws:
MDParametersException - when the parameter string is not well-formed
-
addParameters
Sets parameters from an XML config file, keeping all previous settings.
- Parameters:
parameterFile - parameter file
- Throws:
MDParametersException - when the parameter string is not well-formed
-
setParameters
Sets parameters from an XML string representation, overwriting all previous parameter settings with the new ones.
- Parameters:
parametersString - parameters in string
- Throws:
MDParametersException - when the parameter string is not well-formed
-
setParameters
Sets parameters from an XML file representation, overwriting all previous settings with the new ones. Stores the file path in configFilePath.
- Parameters:
parametersFile - parameters file
- Throws:
MDParametersException - when the parameter string is not well-formed
-
toString
Returns the parameter values as a string. This implementation uses XML for the external format of parameters; however, derived classes may use different formats.
- Overrides:
toString in class Object
- Returns:
parameter string
- Throws:
MDParametersException - when creating the parameter string fails
-
getScreeningConfigurationString
public String getScreeningConfigurationString(String nodeName, String attrib, String value) throws MDParametersException
Returns parts of the parameter values as a string. Selects a sub-tree of the DOM tree specified by the tag name of one of its nodes, and writes the sub-tree into an XML string.
- Parameters:
nodeName - name of the node to be printed
attrib - attribute name
value - value of the attribute
- Returns:
parameter string
- Throws:
MDParametersException - when creating the parameter string fails
-
toString
Returns parts of the parameter values as a string. Selects a sub-tree of the DOM tree specified by a node, and writes the sub-tree into an XML string.
- Parameters:
node - root node of the sub-tree to be printed
- Returns:
parameter string
- Throws:
MDParametersException - when creating the parameter string fails
-
setCellSize
public void setCellSize(int cellSize)
Sets the size (number of bits) of the bins (cells). This has to be at least 1, but should not exceed 32.
- Parameters:
cellSize - the width of one (and each) cell (bin) in bits
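The documented range of 1 to 32 bits can be checked before a value is stored. The sketch below is hypothetical and not taken from the library: the class CellLayout is invented for illustration, whether the real setter rejects out-of-range values is not stated here, and the packed-size estimate assumes cells are stored back to back in 32-bit words, which this page does not confirm.

```java
/** Hypothetical sketch of cell-size validation and a packed-size estimate; not the chemaxon implementation. */
final class CellLayout {

    private CellLayout() {
    }

    /** Rejects cell sizes outside the documented range of 1 to 32 bits. */
    static int checkCellSize(int cellSize) {
        if (cellSize < 1 || cellSize > 32) {
            throw new IllegalArgumentException("cell size must be between 1 and 32 bits: " + cellSize);
        }
        return cellSize;
    }

    /**
     * Number of 32-bit words needed for the given number of cells,
     * ASSUMING cells are packed back to back (an assumption of this sketch).
     */
    static int packedWords(int length, int cellSize) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive: " + length);
        }
        long bits = (long) length * checkCellSize(cellSize);
        return (int) ((bits + 31) / 32); // round up to whole words
    }
}
```

Under that packing assumption, a 1024-cell descriptor with 1-bit cells would occupy 32 words.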
-
setLength
Sets the length (number of cells) of the descriptor.
- Parameters:
length - the required length (cell count)
- Throws:
MDParametersException - if argument is not positive
-
setScalingHypothesis
Sets (stores) the specified scaling hypothesis. It is used by various scaled metrics. It cannot be passed to these metrics directly as an argument, because the uniform method header of metric functions has to be preserved.
- Parameters:
scalingHypothesis - the consensus hypothesis used for scaling
-
setScaleFactor
public void setScaleFactor(float scaleFactor)
Sets the scale factor used with the current parametrized metric.
- Parameters:
scaleFactor - the new value of the scale factor
-
setAsymmetryFactor
public void setAsymmetryFactor(float af)
Sets the value of the asymmetry factor of the current parametrized metric.
- Parameters:
af - asymmetry factor
-
setThreshold
public void setThreshold(float th)
Sets the value of the threshold of the current parametrized metric.
- Parameters:
th - dissimilarity threshold value
-
setWeights
public void setWeights(float[] w)
Sets the cell-wise weight factors for the current parametrized metric.
- Parameters:
w - weights
-
setCellwiseWeights
public void setCellwiseWeights(boolean c)
Sets the flag telling whether cell weights are to be generated for the current parametrized metric.
- Parameters:
c - true if cell weights are to be generated
-
setNormalized
public void setNormalized(boolean yes)
Toggles the normalized flag of the current parametrized metric.
- Parameters:
yes - true, if the metric is normalized
-
setOutputPrecision
public void setOutputPrecision(int precision)
Specifies the output precision for floating point values. This method can be used in conjunction with getDecForm().
- Parameters:
precision - number of digits after the decimal point
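The effect of an output precision on a java.text.NumberFormat can be sketched with the JDK alone. This is an illustration under assumptions, not the library's code: the class OutputPrecisionDemo is hypothetical, and whether the real formatter pads with trailing zeros or disables grouping is not stated on this page.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical sketch of a fixed-precision, non-localized formatter; not the chemaxon implementation. */
final class OutputPrecisionDemo {

    private OutputPrecisionDemo() {
    }

    /** Builds a formatter that always prints exactly the given number of fraction digits. */
    static NumberFormat formatterFor(int precision) {
        // Fixed locale so the output does not depend on the platform default.
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.US);
        nf.setMinimumFractionDigits(precision);
        nf.setMaximumFractionDigits(precision);
        nf.setGroupingUsed(false); // no thousands separators in data output
        return nf;
    }

    public static void main(String[] args) {
        System.out.println(formatterFor(3).format(0.5));
    }
}
```

With a precision of 3, the value 0.5 is printed as 0.500 and 1234.5678 as 1234.568.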
-
setCurrentParametrizedMetric
public void setCurrentParametrizedMetric(int metricIndex)
Selects the specified parametrized metric to be the current one.
- Parameters:
metricIndex - index of the selected parametrized metric
-
setCreateStatistics
public void setCreateStatistics(boolean createStatistics)
Toggles the create statistics flag of the MDGenerator object.
- Parameters:
createStatistics - new value for the create statistics flag
-
addParametrizedMetric
public int addParametrizedMetric(String name, String metric, String activeFamily) throws MDParametersException
Expands the set of parametrized metrics with a new item. The first parameter is optional; if it is not specified, the symbolic name is formed from the second and the third parameters (both of which are mandatory).
- Parameters:
name - symbolic name of the parametrized metric
metric - name of the metric (like Tanimoto, Euclidean etc.)
activeFamily - name of the active compounds family
- Throws:
MDParametersException
-
getCellSize
public int getCellSize()
Gets the number of bits of an atomic cell in the descriptor.
- Returns:
the number of bits in one single descriptor cell
-
getLength
public int getLength()
Returns the number of cells forming the descriptor.
-
getInternalSize
public int getInternalSize()
Gets the required memory size to store the descriptor according to the specified parameters.
- Returns:
size of a suitable array that can store one descriptor
-
getData
public byte[] getData()
Gets the byte array which is used for conversions between internal and external data formats. Internal is the memory representation, while external is used for storing descriptors in files.
- Returns:
an array large enough to hold the descriptor in external format
-
getCurrentMetricIndex
public int getCurrentMetricIndex()
-
getNumberOfMetrics
public int getNumberOfMetrics()
Gets the total number of parametrized metrics available in the present configuration.
- Returns:
number of metrics
-
getNumberOfWeights
public int getNumberOfWeights()
Gets the number of weights the current parametrized metric takes.
- Returns:
number of weight factors corresponding to the current metric
-
getNumberOfWeights
Gets the number of weight factors used by the specified metric. This method can be applied to the dissimilarity metrics provided by the MolecularDescriptor class or its derived classes, but not to parametrized metrics.
- Parameters:
parametrizedMetricIndex - parametrized metric index
- Returns:
number of weights the metric uses
- Throws:
IllegalArgumentException - if the given parameter is not a valid metric index
-
getThreshold
public float getThreshold(int metricIndex)
Gets a metric dependent threshold value. Ideally, this value should be based on statistics, though the actual value is not too critical, since thresholds are only used in user interfaces to simplify the use of applications for beginners. Note: this parametrized version of getThreshold() is kept for compatibility reasons.
- Parameters:
metricIndex - index of a parametrized metric
- Returns:
threshold corresponding to the given metric index
-
getThreshold
public float getThreshold()
Gets the threshold value set for the current parametrized metric.
- Returns:
dissimilarity threshold corresponding to the current metric
-
getScalingHypothesis
Gets the scaling hypothesis used in scaled metrics.
- Returns:
the scaling hypothesis
-
getInternalMetricIndex
public int getInternalMetricIndex()
Gets the MolecularDescriptor specific metric index of the current parametrized metric.
- Returns:
metric index
-
getMetricName
Gets the user defined symbolic name of the current parametrized metric.
- Returns:
metric name
-
getMetricName
Gets the user defined symbolic name of the specified parametrized metric.
- Returns:
metric name
-
getMetricIndex
Gets the index of the given parametrized metric.
- Parameters:
name - name of the parametrized metric
- Returns:
metric index
-
getScaleFactor
public float getScaleFactor()
Gets the scale factor used in the current parametrized scalable metric.
- Returns:
the scale factor
-
getAsymmetryFactor
public float getAsymmetryFactor()
Gets the asymmetry factor used in the current parametrized asymmetric metric.
- Returns:
the value of the asymmetry factor
-
getWeights
public float[] getWeights()
Gets all weights for the given parametrized metric. If the specified metric is not a weighted metric or the corresponding weights are not set, null is returned.
- Returns:
all weights as float values
-
getTverskyAlpha
public float getTverskyAlpha()
Gets the Tversky alpha value for the given parametrized metric.
- Returns:
alpha as float value
-
getTverskyBeta
public float getTverskyBeta()
Gets the Tversky beta value for the given parametrized metric.
- Returns:
beta as float value
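For orientation, the role of alpha and beta can be shown with the textbook Tversky index, S = c / (c + alpha * a + beta * b), turned into a dissimilarity as 1 - S. The sketch below is an assumption-laden illustration on java.util.BitSet fingerprints: the class TverskySketch is hypothetical, and this page does not state which exact formula or which fingerprint each coefficient weights in the library.

```java
import java.util.BitSet;

/** Hypothetical sketch of a Tversky dissimilarity on bit sets; not the chemaxon implementation. */
final class TverskySketch {

    private TverskySketch() {
    }

    /**
     * Returns 1 - c / (c + alpha * a + beta * b), where c counts the bits set in both
     * fingerprints, a the bits set only in query and b the bits set only in target.
     */
    static double dissimilarity(BitSet query, BitSet target, double alpha, double beta) {
        BitSet common = (BitSet) query.clone(); // copy so the inputs stay untouched
        common.and(target);
        int c = common.cardinality();
        int a = query.cardinality() - c;
        int b = target.cardinality() - c;
        double denominator = c + alpha * a + beta * b;
        if (denominator == 0.0) {
            return 0.0; // degenerate case (e.g. both fingerprints empty): treated as identical here
        }
        return 1.0 - c / denominator;
    }
}
```

With alpha = beta = 1 the expression reduces to the Tanimoto dissimilarity; unequal values make the measure asymmetric in the two fingerprints.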
-
isCellwiseWeights
public boolean isCellwiseWeights()
Tells whether cell weights are to be generated for the current parametrized metric.
- Returns:
true if weights are assigned to each individual descriptor cell
-
getDecForm
Gets the formatter object that is capable of formatting fractions with the given precision. Precision can be set by calling setOutputPrecision(int precision).
- Returns:
decimal number formatter object (not localized)
-
isScaled
public boolean isScaled()
Returns whether the current parametrized metric is scaled or not.
- Returns:
true if the metric is scaled
-
isAsymmetric
public boolean isAsymmetric()
Returns whether the current parametrized metric is asymmetric or not.
- Returns:
true if the metric is asymmetric
-
isWeighted
public boolean isWeighted()
Returns whether the current parametrized metric is weighted or not.
- Returns:
true if the metric is weighted
-
isNormalized
public boolean isNormalized()
Returns whether the current parametrized metric is normalized or not.
- Returns:
true if the metric is normalized
-
isStandardizationMandatory
public boolean isStandardizationMandatory()
Checks whether standardization of molecules is mandatory for the corresponding MolecularDescriptor before descriptor generation. This method always returns true. Derived classes should override it when standardization is not obligatory.
- Returns:
whether or not standardization of molecules is needed
- Since:
JChem 2.2
-
getDefaultStandardizerConfiguration
Gets the default configuration of the standardizer. This method is called if no standardizer configuration is set in the parameters configuration but standardization is mandatory for the corresponding MolecularDescriptor. The default on this top level is aromatization and dehydrogenization, but derived parameter classes may override this behaviour.
- Returns:
standardizer configuration XML string
- Since:
JChem 2.2
-
getDefaultDocumentFrame
Gets the default XML configuration string. This is needed when an optional XML configuration is not specified. Descriptor files and descriptor tables always store the configuration that corresponds to the descriptors stored, thus something has to be stored even when nothing is specified.
- Returns:
default XML configuration string of the actual Parameters class
- Since:
JChem 2.2
-
standardize
Standardizes the Molecule and returns the standardized form. The standardization is configured via XML. StandardizerConfiguration is the corresponding XML tag. If no standardizer is set up, null is returned.
- Parameters:
m - molecular structure to be standardized
- Returns:
standardized form of the input structure, or null if no standardizer is set up
- Since:
JChem 2.2
-
readFromXmlFile
Reads configuration from an XML file. Builds a DOM tree, picks the nodes and elements that store information that can be processed on this level (leaving others for derived classes), converts their values into the internal format and stores them in data members of this class. Stores the file path in configFilePath.
- Parameters:
file - the XML file to read configuration data from
merge - merge config from file into already existing parameters or overwrite existing parameter values
all - process the complete document or only the ScreeningConfiguration tag
- Throws:
MDParametersException - in the case of any failure
-
readFromXmlString
protected void readFromXmlString(String xml, boolean merge, boolean all) throws MDParametersException
Reads configuration from an XML string. Builds a DOM tree, picks the nodes and elements that store information that can be processed on this level (leaving others for derived classes), converts their values into the internal format and stores them in data members of this class.
- Parameters:
xml - the XML string to get the configuration data from
merge - merge config from the string into already existing parameters or overwrite existing parameter values
all - process the complete document or only the ScreeningConfiguration tag
- Throws:
MDParametersException - in the case of any failure
-
checkDocumentVersion
Checks if the document is the right version.
- Parameters:
docType - the required document type
version - the expected version number
- Throws:
MDParametersException
-
processDocument
Searches the DOM tree for relevant nodes and sets internal variables to some of these nodes for the sake of easier information processing.
- Parameters:
all - process the complete document or only the ScreeningConfiguration tag
- Throws:
MDParametersException
-
readValues
Picks attribute values from the document tree that are relevant to the actual MDParameters sub-class.
- Parameters:
all - process the complete document or only the ScreeningConfiguration tag
- Throws:
MDParametersException
-
readMetricParameters
Processes all ParametrizedMetric nodes in the DOM tree. Reads parametrized metric names and the associated parameter settings and stores them in data members for faster and easier access in getter methods.
- Throws:
MDParametersException - if one of the nodes is not well-formed
-
readMetricWeights
protected void readMetricWeights(org.dom4j.Element parametrizedMetric, int metricIndex) throws MDParametersException
- Throws:
MDParametersException
-
writeMetricParameter
Writes a given parameter of the specified metric into the corresponding tree node.
- Parameters:
pl - list of parameters (for all metric indexes)
attr - name of the attribute which the parameter corresponds to
mi - index of the metric
useDecForm - use precision for writing floating point values
-
appendParametrizedMetric
Extends internal data with a new parametrized metric. Neither the DOM tree nor the XML document is modified.
- Parameters:
name - name of the parametrized metric
metric - dissimilarity metric name (as defined in its implementor class)
-
addParametrizedMetricsNode
protected void addParametrizedMetricsNode()
Adds the ParametrizedMetrics node to the DOM tree.
-
addParametrizedMetricNode
protected org.dom4j.Element addParametrizedMetricNode(String name, String activeFamily, String metric)
Adds a ParametrizedMetric node to the DOM tree.
- Parameters:
name - name of the parametrized metric, given by the user
activeFamily - name of the active compound family (e.g. ACE)
metric - name of the dissimilarity metric
-
importNodes
protected boolean importNodes(org.dom4j.Document doc, boolean merge)
Imports nodes from the specified Document into the current (main) Document. New nodes can either be merged into the existing ones without removing them, or new nodes may overwrite existing nodes.
- Parameters:
doc - import nodes from this document
merge - merge (add new) or overwrite (replace with new) existing nodes
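The difference between the two modes can be sketched on a plain map of named settings instead of a DOM tree. Everything here is an assumption made for illustration: the class ImportSketch and the setting names are hypothetical, the sketch keeps the old value when a name exists on both sides in merge mode, and it discards all old entries in overwrite mode. The page does not specify how the library resolves such conflicts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of merge vs. overwrite semantics on named settings; not the chemaxon implementation. */
final class ImportSketch {

    private ImportSketch() {
    }

    /**
     * Combines incoming settings with the current ones.
     * merge = true : existing entries are kept, only missing names are added (assumed).
     * merge = false: existing entries are discarded and replaced by the incoming ones (assumed).
     */
    static Map<String, String> importSettings(Map<String, String> current,
                                              Map<String, String> incoming,
                                              boolean merge) {
        Map<String, String> result = new LinkedHashMap<>();
        if (merge) {
            result.putAll(current);
            incoming.forEach(result::putIfAbsent); // never replaces an existing value
        } else {
            result.putAll(incoming);
        }
        return result;
    }
}
```

The method returns a new map rather than mutating its arguments, which keeps the two modes easy to compare side by side.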
-
getDescriptorTypeName
Takes the descriptor type name from the root element of the XML configuration.
- Parameters:
xmlConfig - configuration string
- Returns:
descriptor type name
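Reading the root element of an XML string can be sketched with the JDK's built-in DOM parser. This is not the library's code, which works on dom4j documents: the class RootNameSketch and the sample tag ExampleDescriptor are hypothetical, and the sketch assumes the type name is the tag name of the root element rather than one of its attributes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch: reads the root tag name of an XML string with the JDK parser; not the chemaxon implementation. */
final class RootNameSketch {

    private RootNameSketch() {
    }

    /** Returns the tag name of the root element of the given XML configuration string. */
    static String rootElementName(String xmlConfig) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            org.w3c.dom.Document doc = builder.parse(new InputSource(new StringReader(xmlConfig)));
            return doc.getDocumentElement().getTagName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap checked parser errors so callers get one unchecked failure type.
            throw new IllegalArgumentException("not a well-formed XML configuration", e);
        }
    }
}
```

For an input such as `<ExampleDescriptor><ScreeningConfiguration/></ExampleDescriptor>`, the sketch returns ExampleDescriptor.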
-