Package chemaxon.descriptors
Class MDParameters
java.lang.Object
chemaxon.descriptors.MDParameters
- Direct Known Subclasses:
BCUTParameters, CDParameters, CFParameters, ECFPParameters, PFParameters, RFParameters, SDParameters, ShapeParameters
MolecularDescriptor parameter settings. This class serves as
the base class for the parameter classes of specific
MolecularDescriptor derivatives.
Descriptor objects of the same type share one common MDParameters object that stores all parameters.
Besides parameters, it also stores other internal data that are better kept in this class than in individual
MolecularDescriptor objects for
the sake of memory efficiency.
The naming convention, similar to that of the derivatives of the
MolecularDescriptor class, is as follows: the derived class name begins
with the name of the corresponding MolecularDescriptor class and
is postfixed by Parameters. For instance, the parameters class of the descriptor
class MDXyZ is XyZParameters.
Parameters are read from configuration files.
MDParameters
provides extensive functionality for processing XML configuration files;
however, parameter classes extending MDParameters do
not necessarily have to use XML for storing parameters.
MDParameters plays an important role in providing so-called
Screening Configurations for dissimilarity calculations.
Such configurations contain so-called parametrized metrics that are
based on dissimilarity metrics implemented in classes that extend the
MolecularDescriptor class. It is important to make a clear
distinction between these two categories: dissimilarity metrics are
the basis of parametrized metrics. MDParameters stores
metric parameters and provides services for retrieving and storing
parametrized metrics, which can be accessed by either name or index.
Another important function of this class is to allow the creation of new parametrized metrics and to write them into the XML document.
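The relationship between dissimilarity metrics and parametrized metrics can be illustrated with a small, self-contained model. This is only a sketch and not the MDParameters implementation: the class ParametrizedMetricStoreSketch, its method names, and the metric names "TanimotoStrict" and "TanimotoLoose" are hypothetical. It assumes one parallel list per parameter, loosely mirroring the protected list fields documented below, and shows two parametrized metrics that share the same underlying dissimilarity metric but differ in their threshold.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a parametrized-metric store; not the MDParameters implementation. */
final class ParametrizedMetricStoreSketch {
    // Parallel lists, one entry per parametrized metric (hypothetical layout).
    private final List<String> names = new ArrayList<>();
    private final List<String> baseMetrics = new ArrayList<>();
    private final List<Float> thresholds = new ArrayList<>();
    private int current = -1;

    /** Registers a parametrized metric and returns its index. */
    int add(String name, String baseMetric, float threshold) {
        names.add(name);
        baseMetrics.add(baseMetric);
        thresholds.add(threshold);
        return names.size() - 1;
    }

    /** Looks up a parametrized metric by its symbolic name; returns -1 if unknown. */
    int indexOf(String name) {
        return names.indexOf(name);
    }

    /** Selects the parametrized metric that the "current" getters refer to. */
    void setCurrent(int index) {
        if (index < 0 || index >= names.size()) {
            throw new IndexOutOfBoundsException("no parametrized metric at index " + index);
        }
        current = index;
    }

    String currentName() { return names.get(current); }
    String currentBaseMetric() { return baseMetrics.get(current); }
    float currentThreshold() { return thresholds.get(current); }

    public static void main(String[] args) {
        ParametrizedMetricStoreSketch store = new ParametrizedMetricStoreSketch();
        // Two parametrized metrics built on the same dissimilarity metric.
        store.add("TanimotoStrict", "Tanimoto", 0.2f);
        store.add("TanimotoLoose", "Tanimoto", 0.6f);
        store.setCurrent(store.indexOf("TanimotoLoose"));
        System.out.println(store.currentName() + " is based on "
                + store.currentBaseMetric() + ", threshold " + store.currentThreshold());
    }
}
```

Access by name goes through indexOf, access by index through setCurrent, which corresponds in spirit to getMetricIndex(String) and setCurrentParametrizedMetric(int) in this class.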
- Since:
- JChem 2.2
-
Field Summary
Fields
Modifier and Type, Field: Description
protected ArrayList asymmetryFactors: asymmetry ratio of parametrized asymmetric metrics
protected int cellSize: size (number of bits) of one descriptor cell
protected ArrayList cellwiseWeights: is cell weights for parametrized metrics
protected String configFilePath: location of the configuration file
protected int currentMetricIndex: index of the parametrized metric currently in use
protected byte[] data: buffer for external data format generation, used in MolecularDescriptor.toData()
protected NumberFormat decForm: to format floating point output
static final float DEFAULT_ASYMMETRY_FACTOR
static final int DEFAULT_OUTPUT_PRECISION
static final float DEFAULT_SCALE_FACTOR: constants, default parameter values
static final float DEFAULT_WEIGHT
protected float defaultWeight: value for all missing weight parameters
protected org.dom4j.Document document: contains the XML document
protected MDGenerator generator: generates MolecularDescriptors
protected int internalSize: required memory size of one descriptor instance
protected int length: the length of the descriptor: the number of cells
protected MolecularDescriptor md: this object is needed to access default dissimilarity functions
protected ArrayList metricIndexes: convert parameterized indexes to MolecularDescriptor metric indexes
protected ArrayList normalized: flags indicating if the metric is normalized or not
protected int outputPrecision: number of fraction digits in floating point output format
protected List parametrizedMetricNodes
protected ArrayList parametrizedMetrics: symbolic names (mnemonics) of parametrized metrics
protected org.dom4j.Element parametrizedMetricsNode
protected ArrayList scaleFactors: scale factor of scalable parametrized metrics
protected org.dom4j.Node screeningConfigurationNode
protected org.dom4j.Element similarityNode: node holding the similarity calculations related parameters
protected Standardizer standardizer: transform molecules into standard form before descriptor generation
protected org.dom4j.Node standardizerConfigurationNode: node defining the Standardizer configuration
protected ArrayList thresholds: dissimilarity thresholds values
tverskyA: alpha values for Tversky dissimilarity
tverskyB: beta values for Tversky dissimilarity
protected ArrayList weights: weights for parametrized metrics
-
Constructor Summary
Constructors
protected MDParameters(): Creates and initializes an empty object.
-
Method Summary
Modifier and Type, Method: Description
void addParameters(File parameterFile): Sets parameters from an XML config file keeping all previous settings.
void addParameters(String parameterString): Sets parameters from an XML string representation keeping all previous settings.
int addParametrizedMetric(String name, String metric, String activeFamily): Expands the set of parametrized metrics with a new item.
protected org.dom4j.Element addParametrizedMetricNode(String name, String activeFamily, String metric): Adds a ParametrizedMetric node to the DOM tree.
protected void addParametrizedMetricsNode(): Adds the ParametrizedMetrics node to the DOM tree.
protected int appendParametrizedMetric(String name, String metric): Extends internal data with a new parametrized metric.
protected void checkDocumentVersion(String docType, String version): Checks if the document is the right version.
void fromFile: Sets parameters from an XML file.
void fromString(String parameterString): Sets parameters from a string representation.
float getAsymmetryFactor(): Gets the asymmetry factor used in the current parametrized asymmetric metrics.
int getCellSize(): Gets the number of bits of an atomic cell in the descriptor.
int getCurrentMetricIndex()
byte[] getData(): Gets the byte array which is used for conversions between internal and external data formats.
getDecForm: Gets the formatter object that is capable of formatting fractions with given precision.
getDefaultDocumentFrame: Gets the default XML configuration string.
static String getDefaultStandardizerConfiguration: Gets the default configuration of the standardizer.
static String getDescriptorTypeName(String xmlConfig): Takes the descriptor type name from the root element of the XML configuration.
int getInternalMetricIndex(): Gets the MolecularDescriptor specific metric index of the current parametrized metric.
int getInternalSize(): Gets the required memory size to store the descriptor according to the specified parameters.
int getLength(): Returns the number of cells forming the descriptor.
int getMetricIndex(String name): Gets the index of the given parametrized metric.
getMetricName: Gets the user defined symbolic name of the current parametrized metric.
getMetricName(int metricIndex): Gets the user defined symbolic name of the specified parametrized metric.
int getNumberOfMetrics(): Gets the total number of parametrized metrics available in the present configuration.
int getNumberOfWeights(): Gets the number of weights the current parametrized metric takes.
protected int getNumberOfWeights(int parametrizedMetricIndex): Gets the number of weight factors used by the specified metric.
float getScaleFactor(): Gets the scale factor used in the current parametrized scalable metrics.
getScalingHypothesis: Gets the scaling hypothesis used in scaled metrics.
String getScreeningConfigurationString(String nodeName, String attrib, String value): Returns parts of the parameter values in string.
float getThreshold(): Gets the threshold value being set for the current parametrized version.
float getThreshold(int metricIndex): Gets a metric dependent threshold value.
float getTverskyAlpha(): Gets Tversky alpha value for the given parametrized metric.
float getTverskyBeta(): Gets Tversky beta value for the given parametrized metric.
float[] getWeights(): Gets all weights for the given parametrized metric.
protected boolean importNodes(org.dom4j.Document doc, boolean merge): Imports nodes from the specified Document into the current (main) Document.
protected void initParameters(): Initializes object after configuration parameters are loaded.
boolean isAsymmetric(): Returns whether current parametrized metric is asymmetric or not.
boolean isCellwiseWeights(): Gets boolean telling whether cell weights are to be generated for current parametrized metric.
boolean isNormalized(): Returns whether current parametrized metric is normalized or not.
boolean isScaled(): Returns whether current parametrized metric is scaled or not.
boolean isStandardizationMandatory(): Checks whether standardization of molecules is mandatory for the corresponding MolecularDescriptor before descriptor generation.
boolean isWeighted(): Returns whether current parametrized metric is weighted or not.
protected void processDocument(boolean all): Searches the DOM tree for relevant nodes and sets internal variables to some of these nodes for the sake of easier information processing.
protected void readFromXmlFile(File file, boolean merge, boolean all): Reads configuration from XML file.
protected void readFromXmlString(String xml, boolean merge, boolean all): Reads configuration from XML string.
protected void readMetricParameters: Processes all ParametrizedMetric nodes in the DOM tree.
protected void readMetricWeights(org.dom4j.Element parametrizedMetric, int metricIndex)
protected void readValues(boolean all): Picks attribute values from the document tree that are relevant to the actual MDParameters sub-class.
void setAsymmetryFactor(float af): Sets the value of the asymmetry factor of the current parametrized metric.
void setCellSize(int cellSize): Sets the size (number of bits) of the bins (cells).
void setCellwiseWeights(boolean c): Sets boolean telling whether cell weights are to be generated for current parametrized metric.
void setCreateStatistics(boolean createStatistics): Toggles the create statistics flag of the MDGenerator object.
void setCurrentParametrizedMetric(int metricIndex): Selects the specified parametrized metric to be the current.
void setLength(int length): Sets the length (number of cells) of the descriptor.
void setNormalized(boolean yes): Toggles the normalized flag of the current parametrized metric.
void setOutputPrecision(int precision): Specifies the output precision for floating point values.
void setParameters(File parametersFile): Sets parameters from an XML file representation overwriting all previous settings with the new ones.
void setParameters(String parametersString): Sets parameters from an XML string representation overwriting all previous parameters settings with the new ones.
void setScaleFactor(float scaleFactor): Sets scaleFactor used with the current parametrized metrics.
void setScalingHypothesis(MolecularDescriptor scalingHypothesis): Sets (stores) the specified scaling hypothesis.
void setThreshold(float th): Sets the value of the threshold of the current parametrized metric.
void setWeights(float[] w): Sets the cell-wise weight factors for the current parametrized metric.
standardize: Standardizes the Molecule and returns the standardized form.
String toString(): Returns the parameter values in string.
protected String toString(org.dom4j.Node node): Returns parts of the parameter values in string.
protected void writeMetricParameter(ArrayList pl, String attr, int mi, boolean useDecForm): Writes a given parameter of the specified metric into the corresponding tree node.
-
Field Details
-
DEFAULT_SCALE_FACTOR
public static final float DEFAULT_SCALE_FACTOR
constants, default parameter values
-
DEFAULT_ASYMMETRY_FACTOR
public static final float DEFAULT_ASYMMETRY_FACTOR
-
DEFAULT_WEIGHT
public static final float DEFAULT_WEIGHT
-
DEFAULT_OUTPUT_PRECISION
public static final int DEFAULT_OUTPUT_PRECISION
-
cellSize
protected int cellSize
size (number of bits) of one descriptor cell
-
length
protected int length
the length of the descriptor: the number of cells
-
internalSize
protected int internalSize
required memory size of one descriptor instance
-
data
protected byte[] data
buffer for external data format generation, used in MolecularDescriptor.toData()
-
configFilePath
protected String configFilePath
location of the configuration file
-
document
protected org.dom4j.Document document
contains the XML document
-
standardizerConfigurationNode
protected org.dom4j.Node standardizerConfigurationNode
node defining the Standardizer configuration
-
similarityNode
protected org.dom4j.Element similarityNode
node holding the parameters related to similarity calculations
-
screeningConfigurationNode
protected org.dom4j.Node screeningConfigurationNode
-
parametrizedMetricsNode
protected org.dom4j.Element parametrizedMetricsNode
-
parametrizedMetricNodes
protected List parametrizedMetricNodes
-
parametrizedMetrics
protected ArrayList parametrizedMetrics
symbolic names (mnemonics) of parametrized metrics
-
metricIndexes
protected ArrayList metricIndexes
converts parametrized metric indexes to MolecularDescriptor metric indexes
-
scaleFactors
protected ArrayList scaleFactors
scale factors of scalable parametrized metrics
-
tverskyA
alpha values for Tversky dissimilarity
-
tverskyB
beta values for Tversky dissimilarity
-
asymmetryFactors
protected ArrayList asymmetryFactors
asymmetry ratios of parametrized asymmetric metrics
-
thresholds
protected ArrayList thresholds
dissimilarity threshold values
-
normalized
protected ArrayList normalized
flags indicating whether each metric is normalized
-
defaultWeight
protected float defaultWeight
value for all missing weight parameters
-
weights
protected ArrayList weights
weights for parametrized metrics
-
cellwiseWeights
protected ArrayList cellwiseWeights
flags telling whether cell weights are used for parametrized metrics
-
outputPrecision
protected int outputPrecision
number of fraction digits in floating point output format
-
currentMetricIndex
protected int currentMetricIndex
index of the parametrized metric currently in use
-
md
protected MolecularDescriptor md
this object is needed to access default dissimilarity functions
-
decForm
protected NumberFormat decForm
formats floating point output
-
generator
protected MDGenerator generator
generates MolecularDescriptors
-
standardizer
protected Standardizer standardizer
transforms molecules into standard form before descriptor generation
-
-
Constructor Details
-
MDParameters
protected MDParameters()
Creates and initializes an empty object. This class cannot be instantiated directly; only its derived classes can call the superclass constructor.
-
-
Method Details
-
initParameters
protected void initParameters()
Initializes object after configuration parameters are loaded.
-
fromString
Sets parameters from a string representation. This method assumes that parameters are described in XML format.
- Parameters:
parameterString - configuration parameters as a string
- Throws:
MDParametersException - when the parameter string is not well-formed
-
fromFile
Sets parameters from an XML file. Stores the file path in configFilePath.
- Parameters:
parameterFile - initialized parameter file
- Throws:
MDParametersException - failed to process parameter file
-
addParameters
Sets parameters from an XML string representation keeping all previous settings.
- Parameters:
parameterString - parameters as a string
- Throws:
MDParametersException - when the parameter string is not well-formed
-
addParameters
Sets parameters from an XML config file keeping all previous settings.
- Parameters:
parameterFile - parameter file
- Throws:
MDParametersException - when the parameter file is not well-formed
-
setParameters
Sets parameters from an XML string representation, overwriting all previous parameter settings with the new ones.
- Parameters:
parametersString - parameters as a string
- Throws:
MDParametersException - when the parameter string is not well-formed
-
setParameters
Sets parameters from an XML file representation, overwriting all previous settings with the new ones. Stores the file path in configFilePath.
- Parameters:
parametersFile - parameters file
- Throws:
MDParametersException - when the parameter file is not well-formed
-
toString
Returns the parameter values as a string. This implementation uses XML for the external format of parameters; however, derived classes may use different formats.
- Overrides:
toString in class Object
- Returns:
- parameter string
- Throws:
MDParametersException - when creating the parameter string fails
-
getScreeningConfigurationString
public String getScreeningConfigurationString(String nodeName, String attrib, String value) throws MDParametersException
Returns parts of the parameter values as a string. Selects a sub-tree of the DOM tree specified by the tag name of one of its nodes, and writes the subtree into an XML string.
- Parameters:
nodeName - name of the node to be printed
attrib - attribute name
value - value of the attribute
- Returns:
- parameter string
- Throws:
MDParametersException - when creating the parameter string fails
-
toString
Returns parts of the parameter values as a string. Selects a sub-tree of the DOM tree specified by a node, and writes the subtree into an XML string.
- Parameters:
node - root node of the subtree to be printed
- Returns:
- parameter string
- Throws:
MDParametersException - when creating the parameter string fails
-
setCellSize
public void setCellSize(int cellSize)
Sets the size (number of bits) of the bins (cells). This has to be at least 1, but should not exceed 32.
- Parameters:
cellSize - the width of one (and each) cell (bin) in bits
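To make the 1 to 32 bit range concrete, the following sketch checks a cell size and computes the largest unsigned value a cell of that width can hold. CellSizeSketch and its methods are hypothetical helpers, not part of chemaxon.descriptors. Whether setCellSize itself rejects out-of-range values is not stated above, so the exception thrown here is an assumption of the sketch.

```java
/** Illustrative helper, not part of the chemaxon.descriptors API. */
final class CellSizeSketch {

    /** Rejects cell sizes outside the documented 1..32 range (assumed behaviour). */
    static int checkCellSize(int cellSize) {
        if (cellSize < 1 || cellSize > 32) {
            throw new IllegalArgumentException("cell size must be in 1..32, got " + cellSize);
        }
        return cellSize;
    }

    /** Largest unsigned value that fits into a cell of the given width in bits. */
    static long maxCellValue(int cellSize) {
        // A long is used so that the 32-bit case does not overflow.
        return (1L << checkCellSize(cellSize)) - 1;
    }

    public static void main(String[] args) {
        System.out.println(maxCellValue(1));   // 1
        System.out.println(maxCellValue(8));   // 255
        System.out.println(maxCellValue(32));  // 4294967295
    }
}
```

A 1-bit cell corresponds to a binary fingerprint, while wider cells can hold counts or other integer values.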
-
setLength
Sets the length (number of cells) of the descriptor.
- Parameters:
length - the required length (cell count)
- Throws:
MDParametersException - if the argument is not positive
-
setScalingHypothesis
Sets (stores) the specified scaling hypothesis. It is used by various scaled metrics. It cannot be passed to these metrics directly as an argument because of the uniform method header of metric functions (which has to be preserved).
- Parameters:
scalingHypothesis - the consensus hypothesis used for scaling
-
setScaleFactor
public void setScaleFactor(float scaleFactor)
Sets the scale factor used with the current parametrized metric.
- Parameters:
scaleFactor - the new value of the scale factor
-
setAsymmetryFactor
public void setAsymmetryFactor(float af)
Sets the value of the asymmetry factor of the current parametrized metric.
- Parameters:
af - asymmetry factor
-
setThreshold
public void setThreshold(float th)
Sets the value of the threshold of the current parametrized metric.
- Parameters:
th - dissimilarity threshold value
-
setWeights
public void setWeights(float[] w)
Sets the cell-wise weight factors for the current parametrized metric.
- Parameters:
w - weights
-
setCellwiseWeights
public void setCellwiseWeights(boolean c)
Sets the flag telling whether cell weights are to be generated for the current parametrized metric.
- Parameters:
c - true if cell weights are to be generated
-
setNormalized
public void setNormalized(boolean yes)
Toggles the normalized flag of the current parametrized metric.
- Parameters:
yes - true, if the metric is normalized
-
setOutputPrecision
public void setOutputPrecision(int precision)
Specifies the output precision for floating point values. This method can be used in conjunction with getDecForm().
- Parameters:
precision - number of digits after the decimal point
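The effect of a fixed number of fraction digits can be reproduced with the standard java.text.NumberFormat class. The sketch below is an assumption about what such a formatter might look like, not the formatter that getDecForm() actually returns: OutputPrecisionSketch and fixedPrecision are hypothetical names, and a fixed US locale stands in for the "not localized" behaviour mentioned for getDecForm().

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Illustrative only; shows fixed-precision formatting with the JDK. */
final class OutputPrecisionSketch {

    /** Builds a formatter that always prints exactly {@code precision} fraction digits. */
    static NumberFormat fixedPrecision(int precision) {
        // A fixed locale keeps '.' as the decimal separator on every machine.
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.US);
        nf.setGroupingUsed(false);               // no thousands separators
        nf.setMinimumFractionDigits(precision);  // pad with trailing zeros
        nf.setMaximumFractionDigits(precision);  // round extra digits away
        return nf;
    }

    public static void main(String[] args) {
        NumberFormat threeDigits = fixedPrecision(3);
        System.out.println(threeDigits.format(0.123456)); // 0.123
        System.out.println(threeDigits.format(2.0));      // 2.000
    }
}
```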
-
setCurrentParametrizedMetric
public void setCurrentParametrizedMetric(int metricIndex)
Selects the specified parametrized metric to be the current one.
- Parameters:
metricIndex - index of the selected parametrized metric
-
setCreateStatistics
public void setCreateStatistics(boolean createStatistics)
Toggles the create statistics flag of the MDGenerator object.
- Parameters:
createStatistics - new value for the create statistics flag
-
addParametrizedMetric
public int addParametrizedMetric(String name, String metric, String activeFamily) throws MDParametersException
Expands the set of parametrized metrics with a new item. The first parameter is optional; if it is not specified, the symbolic name is formed from the second and the third parameters (both of which are mandatory).
- Parameters:
name - symbolic name of the parametrized metric
metric - name of the metric (like Tanimoto, Euclidean etc.)
activeFamily - name of the active compounds family
- Throws:
MDParametersException
-
getCellSize
public int getCellSize()
Gets the number of bits of an atomic cell in the descriptor.
- Returns:
- the number of bits in one single descriptor cell
-
getLength
public int getLength()
Returns the number of cells forming the descriptor.
-
getInternalSize
public int getInternalSize()
Gets the required memory size to store the descriptor according to the specified parameters.
- Returns:
- size of a suitable array that can store one descriptor
-
getData
public byte[] getData()
Gets the byte array which is used for conversions between internal and external data formats. Internal is the memory representation, while external is used for storing descriptors in files.
- Returns:
- an array large enough to hold the descriptor in external format
-
getCurrentMetricIndex
public int getCurrentMetricIndex()
-
getNumberOfMetrics
public int getNumberOfMetrics()
Gets the total number of parametrized metrics available in the present configuration.
- Returns:
- number of metrics
-
getNumberOfWeights
public int getNumberOfWeights()
Gets the number of weights the current parametrized metric takes.
- Returns:
- number of weight factors corresponding to the current metric
-
getNumberOfWeights
Gets the number of weight factors used by the specified metric. This method can be applied to the dissimilarity metrics provided by the MolecularDescriptor class or its derived classes, but not to parametrized metrics.
- Parameters:
parametrizedMetricIndex - parametrized metric index
- Returns:
- number of weights the metric uses
- Throws:
IllegalArgumentException - if the given parameter is not a valid metric index
-
getThreshold
public float getThreshold(int metricIndex)
Gets a metric dependent threshold value. Ideally, this value should be based on statistics, though the actual value is not too critical, since these are only used in user interfaces to simplify the use of applications for beginners. Note: this parametrized version of getThreshold() is kept for compatibility reasons.
- Parameters:
metricIndex - index of a parametrized metric
- Returns:
- threshold corresponding to the given metric index
-
getThreshold
public float getThreshold()
Gets the threshold value set for the current parametrized metric.
- Returns:
- dissimilarity threshold corresponding to the current metric
-
getScalingHypothesis
Gets the scaling hypothesis used in scaled metrics.
- Returns:
- the scaling hypothesis
-
getInternalMetricIndex
public int getInternalMetricIndex()
Gets the MolecularDescriptor specific metric index of the current parametrized metric.
- Returns:
- metric index
-
getMetricName
Gets the user defined symbolic name of the current parametrized metric.
- Returns:
- metric name
-
getMetricName
Gets the user defined symbolic name of the specified parametrized metric.
- Returns:
- metric name
-
getMetricIndex
Gets the index of the given parametrized metric.
- Parameters:
name - name of the parametrized metric
- Returns:
- metric index
-
getScaleFactor
public float getScaleFactor()
Gets the scale factor used in the current parametrized scalable metric.
- Returns:
- the scale factor
-
getAsymmetryFactor
public float getAsymmetryFactor()
Gets the asymmetry factor used in the current parametrized asymmetric metric.
- Returns:
- the value of the asymmetry factor
-
getWeights
public float[] getWeights()
Gets all weights for the given parametrized metric. If the specified metric is not a weighted metric or the corresponding weights are not set, null is returned.
- Returns:
- all weights as float values
-
getTverskyAlpha
public float getTverskyAlpha()
Gets the Tversky alpha value for the given parametrized metric.
- Returns:
- alpha as float value
-
getTverskyBeta
public float getTverskyBeta()
Gets the Tversky beta value for the given parametrized metric.
- Returns:
- beta as float value
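For orientation, the role of alpha and beta can be shown on plain bit sets using the textbook Tversky index, S = c / (c + alpha * a + beta * b), where c counts the bits common to both sets and a and b count the bits found only in the first and only in the second set. The sketch below takes the dissimilarity as 1 - S. This is an assumption: the exact formula and weighting used by the MolecularDescriptor implementations is not given in this page, and TverskySketch and its methods are hypothetical names.

```java
import java.util.BitSet;

/** Textbook Tversky dissimilarity on bit sets; not the JChem implementation. */
final class TverskySketch {

    /** Returns 1 - c / (c + alpha * onlyA + beta * onlyB). */
    static double tverskyDissimilarity(BitSet a, BitSet b, double alpha, double beta) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                  // bits set in both
        BitSet onlyA = (BitSet) a.clone();
        onlyA.andNot(b);                // bits set in a but not in b
        BitSet onlyB = (BitSet) b.clone();
        onlyB.andNot(a);                // bits set in b but not in a

        double c = common.cardinality();
        double denominator = c + alpha * onlyA.cardinality() + beta * onlyB.cardinality();
        if (denominator == 0.0) {
            return 0.0;                 // assumption: two empty sets count as identical
        }
        return 1.0 - c / denominator;
    }

    /** Convenience factory: a bit set with the given bit positions switched on. */
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet query = bits(0, 1, 2, 3);
        BitSet target = bits(2, 3, 4);
        // alpha = beta = 1 reduces to the Tanimoto (Jaccard) dissimilarity.
        System.out.println(tverskyDissimilarity(query, target, 1.0, 1.0));
        // Unequal alpha and beta make the measure asymmetric in its arguments.
        System.out.println(tverskyDissimilarity(query, target, 1.0, 0.0));
        System.out.println(tverskyDissimilarity(target, query, 1.0, 0.0));
    }
}
```

With alpha = beta = 0.5 the same formula gives the Dice coefficient, which is why a single pair of parameters can cover several familiar metrics.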
-
isCellwiseWeights
public boolean isCellwiseWeights()
Tells whether cell weights are to be generated for the current parametrized metric.
- Returns:
- true if weights are assigned to each individual descriptor cell
-
getDecForm
Gets the formatter object that is capable of formatting fractions with given precision. Precision can be set by calling setOutputPrecision(int precision).
- Returns:
- decimal number formatter object (not localized)
-
isScaled
public boolean isScaled()
Returns whether the current parametrized metric is scaled or not.
- Returns:
- true if the metric is scaled
-
isAsymmetric
public boolean isAsymmetric()
Returns whether the current parametrized metric is asymmetric or not.
- Returns:
- true if the metric is asymmetric
-
isWeighted
public boolean isWeighted()
Returns whether the current parametrized metric is weighted or not.
- Returns:
- true if the metric is weighted
-
isNormalized
public boolean isNormalized()
Returns whether the current parametrized metric is normalized or not.
- Returns:
- true if the metric is normalized
-
isStandardizationMandatory
public boolean isStandardizationMandatory()
Checks whether standardization of molecules is mandatory for the corresponding MolecularDescriptor before descriptor generation. This method always returns true. Derived classes should override it when standardization is not obligatory.
- Returns:
- whether or not standardization of molecules is needed
- Since:
- JChem 2.2
-
getDefaultStandardizerConfiguration
Gets the default configuration of the standardizer. This method is called if no standardizer configuration is set in the parameters configuration but standardization is mandatory for the corresponding MolecularDescriptor. The default on this top level is aromatization and dehydrogenization, but derived parameter classes may overload this behaviour.
- Returns:
- standardizer configuration XML string
- Since:
- JChem 2.2
-
getDefaultDocumentFrame
Gets the default XML configuration string. This is needed when an optional XML configuration is not specified. Descriptor files and descriptor tables always store the configuration that corresponds to the descriptors stored, thus something has to be stored even when nothing is specified.
- Returns:
- default XML configuration string of the actual Parameters class
- Since:
- JChem 2.2
-
standardize
Standardizes the Molecule and returns the standardized form. The standardization is configured via XML. StandardizerConfiguration is the corresponding XML tag. If no standardizer is set up, null is returned.
- Parameters:
m - molecular structure to be standardized
- Returns:
- standardized form of the input structure, or null if standardization is mandatory for the corresponding descriptor
- Since:
- JChem 2.2
-
readFromXmlFile
Reads configuration from an XML file. Builds a DOM tree, picks the nodes and elements that store information that can be processed on this level (leaving others for derived classes), converts their values into internal format, and stores them in data members of this class. Stores the file path in configFilePath.
- Parameters:
file - the XML file to read configuration data from
merge - merge config from file into already existing parameters or overwrite existing parameter values
all - process the complete document or only the ScreeningConfiguration tag
- Throws:
MDParametersException - in the case of any failure
-
readFromXmlString
protected void readFromXmlString(String xml, boolean merge, boolean all) throws MDParametersException
Reads configuration from an XML string. Builds a DOM tree, picks the nodes and elements that store information that can be processed on this level (leaving others for derived classes), converts their values into internal format, and stores them in data members of this class.
- Parameters:
xml - the XML string to get the configuration data from
merge - merge config from the string into already existing parameters or overwrite existing parameter values
all - process the complete document or only the ScreeningConfiguration tag
- Throws:
MDParametersException - in the case of any failure
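The read path described here (build a DOM tree, pick the relevant elements, convert attribute values, store them in lists) can be sketched with the DOM parser that ships with the JDK. This is not how MDParameters is implemented: the class uses dom4j, whereas the sketch uses javax.xml.parsers only to stay self-contained. The element names ScreeningConfiguration, ParametrizedMetrics and ParametrizedMetric come from this page; the root element name, the attribute names Name, Metric and Threshold, the default threshold, and the class ScreeningConfigSketch are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative reader for a made-up screening configuration; not the JChem format. */
final class ScreeningConfigSketch {
    static final float DEFAULT_THRESHOLD = 0.5f; // hypothetical default

    static final String SAMPLE =
            "<ExampleConfiguration>"
          + "<ScreeningConfiguration><ParametrizedMetrics>"
          + "<ParametrizedMetric Name=\"TanimotoStrict\" Metric=\"Tanimoto\" Threshold=\"0.2\"/>"
          + "<ParametrizedMetric Name=\"PlainEuclidean\" Metric=\"Euclidean\"/>"
          + "</ParametrizedMetrics></ScreeningConfiguration>"
          + "</ExampleConfiguration>";

    final List<String> names = new ArrayList<>();
    final List<String> metrics = new ArrayList<>();
    final List<Float> thresholds = new ArrayList<>();

    /** Parses the XML string and collects one list entry per ParametrizedMetric element. */
    static ScreeningConfigSketch parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            ScreeningConfigSketch cfg = new ScreeningConfigSketch();
            NodeList nodes = doc.getElementsByTagName("ParametrizedMetric");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                cfg.names.add(e.getAttribute("Name"));
                cfg.metrics.add(e.getAttribute("Metric"));
                // getAttribute returns "" for a missing attribute: fall back to the default.
                String t = e.getAttribute("Threshold");
                cfg.thresholds.add(t.isEmpty() ? DEFAULT_THRESHOLD : Float.parseFloat(t));
            }
            return cfg;
        } catch (Exception ex) {
            // Any parse or conversion failure is reported as one unchecked exception.
            throw new IllegalArgumentException("cannot read configuration", ex);
        }
    }

    public static void main(String[] args) {
        ScreeningConfigSketch cfg = parse(SAMPLE);
        for (int i = 0; i < cfg.names.size(); i++) {
            System.out.println(i + ": " + cfg.names.get(i) + " (" + cfg.metrics.get(i)
                    + "), threshold " + cfg.thresholds.get(i));
        }
    }
}
```

Missing attributes fall back to a default in the sketch, in the same spirit as the defaultWeight field, which holds the value used for all missing weight parameters.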
-
checkDocumentVersion
Checks if the document is the right version.
- Parameters:
docType - the required document type
version - the expected version number
- Throws:
MDParametersException
-
processDocument
Searches the DOM tree for relevant nodes and sets internal variables to some of these nodes for the sake of easier information processing.
- Parameters:
all - process the complete document or only the ScreeningConfiguration tag
- Throws:
MDParametersException
-
readValues
Picks attribute values from the document tree that are relevant to the actual MDParameters sub-class.
- Parameters:
all - process the complete document or only the ScreeningConfiguration tag
- Throws:
MDParametersException
-
readMetricParameters
Processes all ParametrizedMetric nodes in the DOM tree. Reads parametrized metric names and the associated parameter settings and stores them in data members for faster and easier access in getter methods.
- Throws:
MDParametersException - if one of the nodes is not well-formed
-
readMetricWeights
protected void readMetricWeights(org.dom4j.Element parametrizedMetric, int metricIndex) throws MDParametersException
- Throws:
MDParametersException
-
writeMetricParameter
Writes a given parameter of the specified metric into the corresponding tree node.
- Parameters:
pl - list of parameters (for all metric indexes)
attr - name of the attribute which the parameter corresponds to
mi - index of the metric
useDecForm - use precision for writing floating point values
-
appendParametrizedMetric
Extends internal data with a new parametrized metric. Neither the DOM tree nor the XML document is modified.
- Parameters:
name - name of the parametrized metric
metric - dissimilarity metric name (as defined in its implementor class)
-
addParametrizedMetricsNode
protected void addParametrizedMetricsNode()
Adds the ParametrizedMetrics node to the DOM tree.
-
addParametrizedMetricNode
protected org.dom4j.Element addParametrizedMetricNode(String name, String activeFamily, String metric)
Adds a ParametrizedMetric node to the DOM tree.
- Parameters:
name - name of the parametrized metric, given by the user
activeFamily - name of the active compound family (e.g. ACE)
metric - name of the dissimilarity metric
-
importNodes
protected boolean importNodes(org.dom4j.Document doc, boolean merge)
Imports nodes from the specified Document into the current (main) Document. New nodes can either be merged into the existing ones without removing them, or they may overwrite existing nodes.
- Parameters:
doc - import nodes from this document
merge - merge (add new) or overwrite (replace with new) existing nodes
-
getDescriptorTypeName
Takes the descriptor type name from the root element of the XML configuration.
- Parameters:
xmlConfig - configuration string
- Returns:
- descriptor type name
-