Class GenerateMD
GenerateMD provides a high-level Application Programming Interface (API)
for generating various molecular descriptors.
The API supports file and database input and output (molecule files, database
tables, descriptor files) and can generate multiple descriptors
simultaneously.
Example of typical usage:
    // Load one parameter set per descriptor type.
    MDParameters cfpConfig = new CFParameters( new File("jchem/examples/config/cfp.xml") );
    MDParameters pfpConfig = new PFParameters( new File("jchem/examples/config/pharma-frag.xml") );

    // Generate a descriptor set with two components from an SDfile.
    GenerateMD generator = new GenerateMD( 2 );
    generator.setInput( "molecules.sdf" );
    generator.setSDfileInput( true );
    generator.setDescriptor( 0, "molecules.cfp", "CF", cfpConfig, "" );
    generator.setDescriptor( 1, "molecules.pfp", "PF", pfpConfig, "" );
    generator.init();
    generator.run();
    generator.close();
The above example generates two descriptors (a descriptor set) at the same
time for every structure read from the input file molecules.sdf.
The first component of the descriptor set is a chemical fingerprint, which
is configured from the parameter file jchem/examples/config/cfp.xml,
while the second is a pharmacophore fingerprint configured by the
jchem/examples/config/pharma-frag.xml configuration file.
The chemical and pharmacophore fingerprints generated are written into the files
molecules.cfp and molecules.pfp, respectively.
GenerateMD supports the following descriptor types (generatemd -L
lists all available built-in descriptor types):
- ECFP fingerprint (ECFP)
- 3D Shape descriptor (Shape)
- Chemical Fingerprint (CF)
- Pharmacophore Fingerprint (PF)
- Reaction Fingerprint (RF)
- BCUT descriptors (BCUT)
- Hydrogen bond Donor/Acceptor count (HDon/HAcc)
- Octanol-water distribution coefficient (LogD)
- Octanol-water partition coefficient (LogP)
- Topological Polar Surface Area (TPSA)
- Mass of molecule (Mass)
- Number of heavy atoms (Heavy)
This class provides no functionality other than transforming molecular
structures retrieved from the input source into one or more descriptor files or
database tables.
GenerateMD also serves as a command line tool for generating
Molecular Descriptors in batch mode.
Besides supporting all kinds of MolecularDescriptors implemented by ChemAxon,
it is capable of generating arbitrary custom MolecularDescriptors (which
are derived from the MolecularDescriptor class) implemented by
users or third parties.
GenerateMD accepts various input sources: molecular files in
many standard formats, and database tables (JChem structure tables).
The MolecularDescriptors generated are stored in files in the case
of file input, and in database tables (so-called MD tables) when
input molecules are retrieved from a structure table. SDfile output stores
the generated descriptors in a custom tag. It is also possible to produce
MolecularDescriptor files that contain only the descriptors in a readable
format, without any structural information. Such files allow faster
operation than SDfiles in further processing steps (for example, in
virtual screening).
- Since:
- JChem 2.0
-
Constructor Summary
- GenerateMD(): Creates an empty MolecularDescriptor generator object.
- GenerateMD(int descriptorCount): Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously.
Method Summary
- void addMDConfig(String descrName, String configName, File configFile): Adds a new parameter configuration to the descriptor.
- void addMDConfig(String descrName, String configName, String config): Adds a new parameter configuration to the descriptor.
- void close(): Closes the generator and all output files or the database connection.
- void createMDTable(String descrName, String className, String settings, String comment): Creates a database table to store the MolecularDescriptors generated.
- void deleteMDConfig(String descrName, String configName): Deletes an extension configuration.
- void deleteMDTable(String descrName): Deletes a database table that stores molecular descriptors.
- int[] getASSBClusters()
- int getCounter(): Gets the number of molecules processed since init() was called.
- String[] getMDNames(): Gets the names of all descriptor types stored in the database that are associated with the current structure table.
- String getStatistics(int di): Gets statistical data on descriptors generated.
- void init(): Initializes the generator object.
- static void main(String[] args): Command-line entry point to the MolecularDescriptor generator.
- void run(): Processes all structures from the input source.
- void setBinaryOutput(boolean binaryOutput): Sets binary output format.
- void setConnectionHandler(ConnectionHandler connectionHandler): Sets the database connection when both structures and descriptors are stored in a database.
- void setCreateStat(boolean createStat): Toggles the create statistics flag.
- void setDecimalOutput(boolean decimalOutput): Sets decimal output format.
- void setDescriptor(int index, String name, String type, MDParameters params, String comment): Sets type, name, parameters and comment for a given descriptor component.
- void setDescriptor(int index, String name, String type, String settings, String comment): Sets type, name, parameters and comment for a given descriptor component.
- void setDescriptor(String name, String type, MDParameters params, String comment): Sets type, name, parameters and comment for a given descriptor component.
- void setDescriptor(String name, String type, String settings, String comment): Sets the descriptor to be generated.
- void setDescriptors(String[] names, String[] types, MDParameters[] params, String[] comments): Sets type, name, parameters and comment for all components of a molecular descriptor set.
- void setDescriptors(String[] names, String[] types, String[] settings, String[] comments): Sets all descriptor components to be generated simultaneously.
- void setGenerateId(int from): Toggles automatic unique structure/descriptor identifier generation mode and sets the value of the first unique identifier.
- void setIdTagName(String idTagName): Sets the name of the input SDfile tag which contains unique structure identifiers.
- void setInput(InputStream input): Sets the input to an already opened molecular structure stream.
- void setInput(String inputFileName): Sets the name of the input molecular structure file.
- void setOutputFileName(String outputFileName): Sets the name of the output SDfile.
- void setSDfileInput(boolean sdfInput): Toggles input file type.
- void setSDfileOutput(boolean sdfOutput): Toggles SDfile output format.
- void setSelectStatement(String whereClause): Sets the optional select statement for fetching molecules from the structure table.
- void setStructureTableName(String structureTableName): Sets the name of the structure table to take molecular structures from.
- void setTagName(int index, String name): Sets the SDfile tag name for the given descriptor set component.
- void setTagName(String name): Sets the SDfile tag name for the only descriptor type generated.
- void setTagNames(String[] names): Sets the SDfile tag names for all descriptor set components.
- void setUpdateOnInsert(boolean updateOnInsert): Sets/clears automatic update on insert mode.
- void setValidateDescriptor(String activityTagName, double clusteringRadius, String metric): Sets parameters for the activity-seeded structure-based clustering.
- boolean step(): Fetches one structure from the input source and generates descriptors as specified before initialization by the setter methods.
- void updateMDTable(String descrName): Systematically regenerates all descriptors.
- void validateDescriptor(): Validates a descriptor by the activity-seeded structure-based clustering.
-
Constructor Details
-
GenerateMD
public GenerateMD()
Creates an empty MolecularDescriptor generator object.
-
GenerateMD
public GenerateMD(int descriptorCount)
Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously.
- Parameters:
descriptorCount
- number of independent descriptor types to be generated
-
-
Method Details
-
setConnectionHandler
Sets the database connection when both structures and descriptors are stored in a database.
- Parameters:
connectionHandler
- valid connection to a database
- Throws:
MDGeneratorException
- when attempting to call this method after init()
-
setStructureTableName
public void setStructureTableName(String structureTableName) throws MDGeneratorException, SQLException
Sets the name of the structure table to take molecular structures from. Use this when input comes from a database.
- Parameters:
structureTableName
- name of the database table of input structures
- Throws:
MDGeneratorException
- when attempting to call this method after init(), when there is no valid database connection, or if the descriptor validation option was selected beforehand
SQLException
- in the case of database management errors
-
setUpdateOnInsert
public void setUpdateOnInsert(boolean updateOnInsert)
Sets/clears automatic update on insert mode. Auto-update on insert means that the descriptor table is automatically updated when a new structure is inserted into the original structure table.
- Parameters:
updateOnInsert
- indicates auto-update mode
- Since:
- JChem 2.3
-
setSelectStatement
Sets the optional select statement for fetching molecules from the structure table.
- Parameters:
whereClause
- restricting condition, given without the WHERE keyword
- Throws:
MDGeneratorException
- when attempting to call this method after init(), when there is no valid, live database connection, or when no structure table name has been set
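Because setSelectStatement expects the restricting condition without the WHERE keyword, code that accepts filters from users may want to normalize them before passing them on. The following is a minimal sketch of such a helper. WhereClauses and stripWhere are hypothetical names that are not part of the GenerateMD API, and the column name cd_id in the usage line is only an illustrative assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of JChem: prepares a condition for setSelectStatement. */
final class WhereClauses {
    // A leading WHERE keyword in any letter case, followed by whitespace.
    private static final Pattern LEADING_WHERE =
            Pattern.compile("^\\s*where\\s+", Pattern.CASE_INSENSITIVE);

    private WhereClauses() {}

    /** Returns the condition without a leading WHERE keyword or surrounding whitespace. */
    static String stripWhere(String clause) {
        return LEADING_WHERE.matcher(clause).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        // The column name cd_id is an assumption used for illustration only.
        System.out.println(stripWhere("WHERE cd_id < 1000"));
    }
}
```

The returned string is what would be handed to setSelectStatement. A condition that merely starts with the letters "where" (for example a column called wherever) is left untouched, because the pattern requires whitespace after the keyword.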
-
setInput
Sets the name of the input molecular structure file.
- Parameters:
inputFileName
- name of the input file
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when there is already a valid, live database connection
IOException
- in case of file reading problems
-
setInput
Sets the input to an already opened molecular structure stream.
- Parameters:
input
- an input stream
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when there is already a valid, live database connection
-
setSDfileInput
Toggles input file type.
- Parameters:
sdfInput
- indicates whether the input file is an SDfile
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when no input file has been specified
-
setOutputFileName
Sets the name of the output SDfile. Note that if the required output is one or more descriptor files, they should not be specified as output files but as descriptor names.
- Parameters:
outputFileName
- name of the output SDfile
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when there is already a valid, live database connection
-
setSDfileOutput
public void setSDfileOutput(boolean sdfOutput)
Toggles SDfile output format.
- Parameters:
sdfOutput
- indicates whether the output file is an SDfile
-
setDecimalOutput
public void setDecimalOutput(boolean decimalOutput)
Sets decimal output format. This file format is recognized by JKlustor tools.
- Parameters:
decimalOutput
- new value for the option
- Since:
- JChem 2.0.1
-
setBinaryOutput
public void setBinaryOutput(boolean binaryOutput)
Sets binary output format. This file format is recognized by JKlustor tools.
- Parameters:
binaryOutput
- new value for the option
- Since:
- JChem 2.3
-
setIdTagName
Sets the name of the input SDfile tag which contains unique structure identifiers. These identifiers are printed in each line of the decimal output format.
- Parameters:
idTagName
- SDfile structure identifier tag name
- Since:
- JChem 2.0.1
-
setValidateDescriptor
public void setValidateDescriptor(String activityTagName, double clusteringRadius, String metric) throws MDGeneratorException
Sets parameters for the activity-seeded structure-based clustering.
- Parameters:
activityTagName
- name of the SDfile tag storing activity data
clusteringRadius
- dissimilarity radius of a cluster around a seed
metric
- metric used in clustering
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when applying this method in the multiple descriptor case
- Since:
- JChem 2.3
-
setDescriptor
public void setDescriptor(String name, String type, String settings, String comment) throws MDGeneratorException
Sets the descriptor to be generated. Use this method when descriptors of a single type are generated (that is, the descriptor set has one component only).
- Parameters:
name
- user given name of the descriptor
type
- type name of the descriptor (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
settings
- parameter settings of the descriptor (XML)
comment
- optional comment to be stored in database
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when applying this method in the multiple descriptor case
-
setDescriptor
public void setDescriptor(int index, String name, String type, String settings, String comment) throws MDGeneratorException
Sets type, name, parameters and comment for a given descriptor component. Use this method when more than one descriptor is generated at a time (e.g. CF and PF simultaneously).
- Parameters:
index
- index of the component
name
- user given name of the descriptor set component
type
- type name of the descriptor (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
settings
- parameter settings for the descriptor (XML)
comment
- optional comment to be stored in database
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when another component's settings were specified with an MDParameters object (rather than a String)
-
setDescriptors
public void setDescriptors(String[] names, String[] types, String[] settings, String[] comments) throws MDGeneratorException
Sets all descriptor components to be generated simultaneously.
- Parameters:
names
- user given names of the descriptor set components
types
- type names of the descriptors (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
settings
- parameter settings for the descriptors (XML)
comments
- optional comments to be stored in database
- Throws:
MDGeneratorException
- when attempting to call this method after init()
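The array-based setDescriptors method takes parallel arrays, where entry i of each array describes component i of the descriptor set. This page does not say how mismatched array lengths are handled, so a caller may prefer to check them up front. The following is a minimal sketch of such a pre-check. DescriptorArrays, componentCount and isConsistent are hypothetical names, not part of JChem, and the empty settings strings stand in for real XML settings.

```java
/** Hypothetical pre-check, not part of JChem, for the parallel arrays passed to setDescriptors. */
final class DescriptorArrays {
    private DescriptorArrays() {}

    /**
     * Returns the number of components if all four arrays are non-null and equally long;
     * throws IllegalArgumentException otherwise.
     */
    static int componentCount(String[] names, String[] types, String[] settings, String[] comments) {
        String[][] all = { names, types, settings, comments };
        String[] labels = { "names", "types", "settings", "comments" };
        for (int i = 0; i < all.length; i++) {
            if (all[i] == null) {
                throw new IllegalArgumentException(labels[i] + " must not be null");
            }
            if (all[i].length != names.length) {
                throw new IllegalArgumentException(
                        labels[i] + " has " + all[i].length + " entries, expected " + names.length);
            }
        }
        return names.length;
    }

    /** Same check, but reports the result as a boolean instead of throwing. */
    static boolean isConsistent(String[] names, String[] types, String[] settings, String[] comments) {
        try {
            componentCount(names, types, settings, comments);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = { "molecules.cfp", "molecules.pfp" };
        String[] types = { "CF", "PF" };
        String[] settings = { "", "" };   // placeholders for XML settings
        String[] comments = { "", "" };
        System.out.println("components: " + componentCount(names, types, settings, comments));
    }
}
```

The count returned by componentCount is also the value one would expect to pass to the GenerateMD(int descriptorCount) constructor.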
-
setDescriptor
public void setDescriptor(String name, String type, MDParameters params, String comment) throws MDGeneratorException
Sets type, name, parameters and comment for a given descriptor component. Use this method when only one descriptor type is generated.
- Parameters:
name
- user given name of the descriptor
type
- type name of the descriptor (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
params
- parameter settings for the descriptor (e.g. CFParameters)
comment
- optional comment to be stored in database
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when applying this method in the multiple descriptor case
-
setDescriptor
public void setDescriptor(int index, String name, String type, MDParameters params, String comment) throws MDGeneratorException
Sets type, name, parameters and comment for a given descriptor component. Use this method when more than one descriptor is generated at a time and the components are not all specified in one call.
- Parameters:
index
- index of the component to be specified
name
- user given name of the descriptor
type
- type name of the descriptor (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
params
- parameter settings of the descriptor (e.g. CFParameters)
comment
- optional comment to be stored in database only
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when a previously set component was specified with a String parameter setting
-
setDescriptors
public void setDescriptors(String[] names, String[] types, MDParameters[] params, String[] comments) throws MDGeneratorException
Sets type, name, parameters and comment for all components of a molecular descriptor set.
- Parameters:
names
- user given names of the descriptor components
types
- type names of the descriptors (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
params
- parameter settings for the descriptor components (e.g. CFParameters)
comments
- optional comments to be stored in database
- Throws:
MDGeneratorException
- when attempting to call this method after init()
-
setTagName
Sets the SDfile tag name for the only descriptor type generated.
- Parameters:
name
- SDfile tag name
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when applying this method in the multiple descriptor case
-
setTagName
Sets the SDfile tag name for the given descriptor set component.
- Parameters:
index
- index of the component
name
- SDfile tag name
- Throws:
MDGeneratorException
- when attempting to call this method after init()
-
setTagNames
Sets the SDfile tag names for all descriptor set components.
- Parameters:
names
- SDfile tag names
- Throws:
MDGeneratorException
- when attempting to call this method after init()
-
setGenerateId
Toggles automatic unique structure/descriptor identifier generation mode and sets the value of the first unique identifier.
- Parameters:
from
- the value of the first id to be generated
- Throws:
MDGeneratorException
- when attempting to call this method after init() or when attempting to generate IDs for database structures
-
setCreateStat
public void setCreateStat(boolean createStat)
Toggles the create statistics flag.
- Parameters:
createStat
- new value for the create statistics flag
-
init
Initializes the generator object. Call this method only after all features, modes and parameters have been set by the setter methods.
- Throws:
MDGeneratorException
- when attempting to initialize again; all input/output (file creation and writing) and database (SQL) exceptions are re-thrown as well
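Most setters on this page reject calls made after init(), and init() itself rejects a second call. The following self-contained sketch models that "configure first, then initialize" contract. ConfigureThenInit is a hypothetical class written for illustration, IllegalStateException stands in for MDGeneratorException, and nothing here is GenerateMD's actual implementation.

```java
/** Hypothetical model of the "configure, then init()" contract; not GenerateMD's real implementation. */
final class ConfigureThenInit {
    private String inputFileName;
    private boolean initialized;

    /** Setter that, like the documented setters, is only legal before init(). */
    void setInput(String inputFileName) {
        requireNotInitialized("setInput");
        this.inputFileName = inputFileName;
    }

    /** Freezes the configuration; a second call is rejected. */
    void init() {
        requireNotInitialized("init");
        initialized = true;
    }

    String inputFileName() {
        return inputFileName;
    }

    private void requireNotInitialized(String method) {
        if (initialized) {
            throw new IllegalStateException(method + "() must not be called after init()");
        }
    }

    /** Returns true if a setter call after init() is rejected. */
    static boolean rejectsLateSetter() {
        ConfigureThenInit generator = new ConfigureThenInit();
        generator.setInput("molecules.sdf");
        generator.init();
        try {
            generator.setInput("other.sdf");
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Returns true if a second init() call is rejected. */
    static boolean rejectsSecondInit() {
        ConfigureThenInit generator = new ConfigureThenInit();
        generator.init();
        try {
            generator.init();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

In practice this means collecting every input, output and descriptor setting first and treating init() as the point after which the configuration is fixed.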
-
step
Fetches one structure from the input source and generates descriptors as specified before initialization by the setter methods.
- Returns:
- true if a structure was successfully processed
- Throws:
MDGeneratorException
- when not yet initialized, or on failure to read input or write output
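Since step() reports whether a structure was processed, it can drive generation one structure at a time, for example to stop after a fixed number of structures. The sketch below shows such a loop against a hypothetical StepSource interface that stands in for the generator, so that it runs without JChem. It treats a false return value as "nothing more to process", which is an assumption rather than something this page states.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the step() method of a generator; not a JChem type. */
interface StepSource {
    /** Returns true if one more structure was processed. */
    boolean step();
}

final class StepLoop {
    private StepLoop() {}

    /** Calls step() until it returns false or the limit is reached; returns the number of successful steps. */
    static int runAtMost(StepSource source, int limit) {
        int processed = 0;
        while (processed < limit && source.step()) {
            processed++;
        }
        return processed;
    }

    /** Fake source that "processes" the given structures, one per step() call. */
    static StepSource fakeSource(List<String> structures) {
        final Iterator<String> remaining = structures.iterator();
        return () -> {
            if (!remaining.hasNext()) {
                return false;
            }
            remaining.next();
            return true;
        };
    }

    public static void main(String[] args) {
        StepSource source = fakeSource(Arrays.asList("mol1", "mol2", "mol3"));
        System.out.println("processed: " + runAtMost(source, 2));
    }
}
```

With a real generator, the local counter would be expected to agree with getCounter(), which reports the number of molecules processed since init().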
-
getCounter
Gets the number of molecules processed since init() was called.
- Returns:
- number of structures processed
- Throws:
MDGeneratorException
- when not yet initialized
-
getASSBClusters
public int[] getASSBClusters()
-
run
Processes all structures from the input source. Structures are retrieved from the input one by one, and all descriptor types set earlier (by the setter methods) are generated and stored in the specified output.
- Throws:
MDGeneratorException
- when not yet initialized, or on failure to read input or write output
-
getStatistics
Gets statistical data on descriptors generated.
- Parameters:
di
- descriptor component index
- Returns:
- statistics in a formatted string
- Since:
- JChem 2.1
-
validateDescriptor
public void validateDescriptor()
Validates a descriptor by the activity-seeded structure-based clustering.
-
close
Closes the generator and all output files or the database connection.
- Throws:
MDGeneratorException
- when not yet initialized or when closing the output files fails
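Because close() releases output files or the database connection, it is usually worth making sure it runs even when processing fails. The sketch below shows one way to arrange that with try/finally. Generator is a hypothetical interface that mirrors only the init(), run() and close() names from this page, and Recorder is a fake used to make the call order visible. The real methods throw MDGeneratorException, which the sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface mirroring the init()/run()/close() names; not a JChem type. */
interface Generator {
    void init();
    void run();
    void close();
}

final class GeneratorRuns {
    private GeneratorRuns() {}

    /** Initializes, runs, and always closes the generator once init() has succeeded. */
    static void runAndClose(Generator generator) {
        // init() stays outside the try block: close() is documented to fail when not yet initialized.
        generator.init();
        try {
            generator.run();
        } finally {
            generator.close();
        }
    }

    /** Fake generator that records the order of calls and can simulate a failing run(). */
    static final class Recorder implements Generator {
        private final boolean failOnRun;
        private final List<String> calls = new ArrayList<>();

        Recorder(boolean failOnRun) {
            this.failOnRun = failOnRun;
        }

        public void init() {
            calls.add("init");
        }

        public void run() {
            calls.add("run");
            if (failOnRun) {
                throw new IllegalStateException("simulated processing failure");
            }
        }

        public void close() {
            calls.add("close");
        }

        String log() {
            return String.join(",", calls);
        }
    }

    /** Returns the call log of a run whose run() step fails. */
    static String logOfFailingRun() {
        Recorder recorder = new Recorder(true);
        try {
            runAndClose(recorder);
        } catch (IllegalStateException expected) {
            // The failure still propagates to the caller; only the cleanup is guaranteed.
        }
        return recorder.log();
    }
}
```

The same shape applies to the init(), run(), close() sequence of the typical usage example, with the exception handling adapted to MDGeneratorException.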
-
createMDTable
public void createMDTable(String descrName, String className, String settings, String comment) throws MDGeneratorException
Creates a database table to store the MolecularDescriptors generated. This method is for advanced usage only; there is no need to call it directly if descriptors are generated with the methods offered by this class.
The corresponding structure table's name should be set by setStructureTableName( String ) prior to calling this method.
- Parameters:
descrName
- symbolic name of the descriptor, given by the user
className
- name of the class implementing the descriptor
settings
- parameter string
comment
- optional comment
- Throws:
MDGeneratorException
- when there is no valid database connection or an SQL error occurred
-
deleteMDTable
Deletes a database table that stores molecular descriptors. All rows, the table and all corresponding administrative information are lost irreversibly.
- Parameters:
descrName
- name of the descriptor (as given by the user when created)
- Throws:
MDGeneratorException
- when there is no valid database connection
SQLException
- any database error
-
updateMDTable
Systematically regenerates all descriptors. Call this method when new structures have been added to the structure table.
- Parameters:
descrName
- name of the descriptor (as given by the user when created)
- Throws:
MDGeneratorException
- when there is no valid database connection
SQLException
- any database error
-
addMDConfig
public void addMDConfig(String descrName, String configName, String config) throws MDGeneratorException, SQLException
Adds a new parameter configuration to the descriptor. Such extra configurations, often called 'screening configurations', can extend or overwrite the parameter settings stored at the time of creation. A typical example is adding new dissimilarity metrics optimized for a new active compound family to the existing set of metrics.
- Parameters:
descrName
- name of the descriptor (as given by the user when created)
configName
- symbolic name given by the user to help identify the extension configuration
config
- extra configuration settings
- Throws:
MDGeneratorException
- when there is no valid database connection or when an attempt is made to redefine an existing configuration
SQLException
- any database error
-
addMDConfig
public void addMDConfig(String descrName, String configName, File configFile) throws SQLException, IOException, MDGeneratorException
Adds a new parameter configuration to the descriptor. Such extra configurations, often called 'screening configurations', can extend or overwrite the parameter settings stored at the time of creation. A typical example is adding new dissimilarity metrics optimized for a new active compound family to the existing set of metrics.
- Parameters:
descrName
- name of the descriptor (as given by the user when created)
configName
- symbolic name given by the user to help identify the extension configuration
configFile
- file of extra configuration settings
- Throws:
MDGeneratorException
- when there is no valid database connection or when an attempt is made to redefine an existing configuration
SQLException
- any database error
IOException
- in case of file reading problems
-
deleteMDConfig
public void deleteMDConfig(String descrName, String configName) throws MDGeneratorException, SQLException
Deletes an extension configuration.
- Parameters:
descrName
- name of the descriptor (as given by the user when created)
configName
- symbolic name given by the user to help identify the extension configuration
- Throws:
MDGeneratorException
- when there is no valid database connection
SQLException
- any database error
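Taken together, addMDConfig and deleteMDConfig describe named extension configurations in which an existing name cannot be redefined, so changing one presumably means deleting it and adding it again. The sketch below models only that naming rule in memory. ConfigNames is a hypothetical class, the configuration values are placeholder strings, and the real configurations live in the database rather than in a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model of named extension configurations; not how JChem stores them. */
final class ConfigNames {
    private final Map<String, String> configs = new LinkedHashMap<>();

    /** Adds a configuration; an attempt to redefine an existing name is rejected. */
    void add(String configName, String config) {
        if (configs.containsKey(configName)) {
            throw new IllegalArgumentException("configuration already exists: " + configName);
        }
        configs.put(configName, config);
    }

    /** Deletes a configuration and reports whether it existed. */
    boolean delete(String configName) {
        boolean existed = configs.containsKey(configName);
        configs.remove(configName);
        return existed;
    }

    /** Changes a configuration by deleting the old entry first and then adding the new one. */
    void replace(String configName, String config) {
        delete(configName);
        add(configName, config);
    }

    String get(String configName) {
        return configs.get(configName);
    }

    int size() {
        return configs.size();
    }

    /** Returns true if adding the same name twice is rejected. */
    static boolean rejectsRedefinition() {
        ConfigNames names = new ConfigNames();
        names.add("screening-1", "placeholder settings");
        try {
            names.add("screening-1", "other settings");
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```

The replace method shows the delete-then-add order that the documented exception for redefinition suggests. Whether JChem offers a more direct way to update a configuration is not covered on this page.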
-
getMDNames
Gets the names of all descriptor types stored in the database that are associated with the current structure table.
- Returns:
- molecular descriptors' names
- Throws:
MDGeneratorException
- when there is no valid database connection
SQLException
- any database error
-
main
Command-line entry point to the MolecularDescriptor generator.
- Parameters:
args
- the command line arguments
-