@PublicAPI public class GenerateMD extends java.lang.Object
GenerateMD provides a high-level Application Programming Interface (API) with comprehensive functionality for generating various molecular descriptors. The API supports several kinds of input and output (molecule files, databases, descriptor files) and can generate multiple descriptors simultaneously.
    // One parameter object per descriptor type, each read from its XML configuration file.
    MDParameters cfpConfig = new CFParameters( new File("jchem/examples/config/cfp.xml") );
    MDParameters pfpConfig = new PFParameters( new File("jchem/examples/config/pharma-frag.xml") );
    // Generator for a descriptor set with two components.
    GenerateMD generator = new GenerateMD( 2 );
    generator.setInput( "molecules.sdf" );
    generator.setSDfileInput( true );
    generator.setDescriptor( 0, "molecules.cfp", "CF", cfpConfig, "" );
    generator.setDescriptor( 1, "molecules.pfp", "PF", pfpConfig, "" );
    // All setters must be called before init().
    generator.init();
    generator.run();
    generator.close();
The above example generates two descriptors (a descriptor set) at the same time for every structure read from the input file molecules.sdf. The first component of the descriptor set is a chemical fingerprint configured from the parameter file jchem/examples/config/cfp.xml, while the second is a pharmacophore fingerprint configured by the jchem/examples/config/pharma-frag.xml configuration file.
GenerateMD supports the following descriptor types (generatemd -L
lists all available built-in descriptor types):
- ECFP fingerprint (ECFP)
- 3D Shape descriptor (Shape)
- Chemical Fingerprint (CF)
- Pharmacophore Fingerprint (PF)
- Reaction Fingerprint (RF)
- BCUT descriptors (BCUT)
- Hydrogen bond Donor/Acceptor count (HDon/HAcc)
- Octanol-water distribution coefficient (LogD)
- Octanol-water partition coefficient (LogP)
- Topological Polar Surface Area (TPSA)
- Mass of molecule (Mass)
- Number of Heavy atoms (Heavy)

The chemical and pharmacophore fingerprints generated in the example are written into the files molecules.cfp and molecules.pfp, respectively.
This class does not provide methods other than those for transforming a molecular structure retrieved from the input source into one or more descriptor files or database tables.
GenerateMD also serves as a command-line tool for batch generation of molecular descriptors. Besides supporting all kinds of MolecularDescriptors implemented by ChemAxon, it can generate arbitrary custom MolecularDescriptors (derived from the MolecularDescriptor class) implemented by users or third parties.
GenerateMD accepts various input sources: molecular files in many standard formats, and database tables (JChem structure tables). The MolecularDescriptors generated are stored in files in the case of file input, and in database tables (so-called MD tables) when input molecules are retrieved from a structure table. SDfile output stores the generated descriptors in a custom tag. It is also possible to produce MolecularDescriptor files that contain no structural information, only the descriptors in a readable format. Such files allow faster operation than SDfiles in further processing steps (for example, in virtual screening).
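To make the "custom tag" idea concrete, the following minimal sketch formats a single SDfile data item: a header line carrying the tag name in angle brackets, the value, and a terminating blank line. The class name SdTagFormatter, the tag name "CF" and the bit string are illustrative assumptions; how GenerateMD actually serializes a descriptor value into the tag is not specified in this documentation.

```java
/** Formats one SDfile data item (tag name plus value). Illustrative helper, not part of JChem. */
final class SdTagFormatter {

    private SdTagFormatter() {}

    /**
     * Returns the data item as it would appear inside an SDfile record:
     * a header line with the tag name in angle brackets, the value,
     * and a blank line that terminates the item.
     */
    static String format(String tagName, String value) {
        if (tagName == null || tagName.isEmpty()) {
            throw new IllegalArgumentException("tag name must not be empty");
        }
        // A blank line ends the data item, so a blank value cannot be represented here.
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("value must not be blank");
        }
        return ">  <" + tagName + ">\n" + value + "\n\n";
    }

    public static void main(String[] args) {
        // "CF" and the bit string are made-up illustration values.
        System.out.print(format("CF", "0110100111"));
    }
}
```

In GenerateMD the tag name itself would be chosen with setTagName or setTagNames; the helper above only shows the surrounding file syntax.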
Constructor and Description |
---|
GenerateMD() Creates an empty MolecularDescriptor generator object. |
GenerateMD(int descriptorCount) Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously. |
Modifier and Type | Method and Description |
---|---|
void | addMDConfig(java.lang.String descrName, java.lang.String configName, java.io.File configFile) Adds a new parameter configuration to the descriptor. |
void | addMDConfig(java.lang.String descrName, java.lang.String configName, java.lang.String config) Adds a new parameter configuration to the descriptor. |
void | close() Closes the generator and all output files or the database connection. |
void | createMDTable(java.lang.String descrName, java.lang.String className, java.lang.String settings, java.lang.String comment) Creates a database table to store the MolecularDescriptors generated. |
void | deleteMDConfig(java.lang.String descrName, java.lang.String configName) Deletes an extension configuration. |
void | deleteMDTable(java.lang.String descrName) Deletes a database table that stores molecular descriptors. |
int[] | getASSBClusters() |
int | getCounter() Gets the number of molecules processed since init() was called. |
java.lang.String[] | getMDNames() Gets the names of all descriptor types stored in the database that are associated with the current structure table. |
java.lang.String | getStatistics(int di) Gets statistical data on descriptors generated. |
void | init() Initializes the generator object. |
static void | main(java.lang.String[] args) Command-line entry point to the MolecularDescriptor generator. |
void | run() Processes all structures from the input source. |
void | setBinaryOutput(boolean binaryOutput) Sets binary output format. |
void | setConnectionHandler(ConnectionHandler connectionHandler) Sets the database connection when both structures and descriptors are stored in a database. |
void | setCreateStat(boolean createStat) Toggles create statistics flag. |
void | setDecimalOutput(boolean decimalOutput) Sets decimal output format. |
void | setDescriptor(int index, java.lang.String name, java.lang.String type, MDParameters params, java.lang.String comment) Sets type, name, parameters and comment for a given descriptor component. |
void | setDescriptor(int index, java.lang.String name, java.lang.String type, java.lang.String settings, java.lang.String comment) Sets type, name, parameters and comment for a given descriptor component. |
void | setDescriptor(java.lang.String name, java.lang.String type, MDParameters params, java.lang.String comment) Sets type, name, parameters and comment for a given descriptor component. |
void | setDescriptor(java.lang.String name, java.lang.String type, java.lang.String settings, java.lang.String comment) Sets the descriptor to be generated. |
void | setDescriptors(java.lang.String[] names, java.lang.String[] types, MDParameters[] params, java.lang.String[] comments) Sets type, name, parameters and comment for all components of a molecular descriptor set. |
void | setDescriptors(java.lang.String[] names, java.lang.String[] types, java.lang.String[] settings, java.lang.String[] comments) Sets all descriptor components to be generated simultaneously. |
void | setGenerateId(int from) Toggles automatic unique structure/descriptor identifier generation mode and sets the value of the first unique identifier. |
void | setIdTagName(java.lang.String idTagName) Sets the name of the input SDfile tag which contains unique structure identifiers. |
void | setInput(java.io.InputStream input) Sets the input to an already opened molecular structure stream. |
void | setInput(java.lang.String inputFileName) Sets the name of the input molecular structure file. |
void | setOutputFileName(java.lang.String outputFileName) Sets the name of the output SDfile. |
void | setSDfileInput(boolean sdfInput) Toggles input file type. |
void | setSDfileOutput(boolean sdfOutput) Toggles SDfile output format. |
void | setSelectStatement(java.lang.String whereClause) Sets the optional select statement for fetching molecules from the structure table. |
void | setStructureTableName(java.lang.String structureTableName) Sets the name of the structure table to take molecular structures from. |
void | setTagName(int index, java.lang.String name) Sets the SDfile tag name for the given descriptor set component. |
void | setTagName(java.lang.String name) Sets the SDfile tag name for the only descriptor type generated. |
void | setTagNames(java.lang.String[] names) Sets the SDfile tag names for all descriptor set components. |
void | setUpdateOnInsert(boolean updateOnInsert) Sets/clears automatic update on insert mode. |
void | setValidateDescriptor(java.lang.String activityTagName, double clusteringRadius, java.lang.String metric) Sets parameters for the Activity-seeded Structure-based clustering. |
boolean | step() Fetches one structure from the input source and generates descriptors as specified before initialization by the setter methods. |
void | updateMDTable(java.lang.String descrName) Systematically regenerates all descriptors. |
void | validateDescriptor() Validates a descriptor by the activity-seeded structure-based clustering. |
public GenerateMD()
Creates an empty MolecularDescriptor generator object.

public GenerateMD(int descriptorCount)
Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously.
Parameters:
descriptorCount - number of independent descriptor types to be generated

public void setConnectionHandler(ConnectionHandler connectionHandler) throws MDGeneratorException
Sets the database connection when both structures and descriptors are stored in a database.
Parameters:
connectionHandler - valid connection to a database
Throws:
MDGeneratorException - when attempting to call this method after init()
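Many methods in this class document the same ordering rule: setters are rejected after init(), a second init() is rejected, and processing is rejected before init(). The sketch below illustrates that contract with a stand-in class; it is not ChemAxon's implementation. LifecycleSketch is a hypothetical name, and the sketch throws the unchecked IllegalStateException where GenerateMD is documented to throw MDGeneratorException.

```java
/** Stand-in illustrating the configure, init, run ordering. Not the JChem implementation. */
final class LifecycleSketch {

    private boolean initialized;
    private String inputFileName;

    /** Setter: legal only before init(). */
    void setInput(String inputFileName) {
        if (initialized) {
            throw new IllegalStateException("setInput called after init()");
        }
        this.inputFileName = inputFileName;
    }

    /** Returns the input name configured so far, or null if none was set. */
    String input() {
        return inputFileName;
    }

    /** Initialization may happen only once. */
    void init() {
        if (initialized) {
            throw new IllegalStateException("already initialized");
        }
        initialized = true;
    }

    /** Processing requires a prior init(). */
    void run() {
        if (!initialized) {
            throw new IllegalStateException("not yet initialized");
        }
    }

    public static void main(String[] args) {
        LifecycleSketch generator = new LifecycleSketch();
        generator.setInput("molecules.sdf");
        generator.init();
        generator.run();
        try {
            generator.setInput("other.sdf"); // too late: init() was already called
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the real class, the practical consequence is to finish all configuration calls first, then call init(), and only then run() or step().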
public void setStructureTableName(java.lang.String structureTableName) throws MDGeneratorException, java.sql.SQLException
Sets the name of the structure table to take molecular structures from.
Parameters:
structureTableName - name of the database table of input structures
Throws:
MDGeneratorException - when attempting to call this method after init(), when there is no valid database connection, or if the descriptor validation option was selected beforehand
java.sql.SQLException - in the case of database management errors

public void setUpdateOnInsert(boolean updateOnInsert)
Sets/clears automatic update on insert mode.
Parameters:
updateOnInsert - indicates auto-update mode

public void setSelectStatement(java.lang.String whereClause) throws MDGeneratorException
Sets the optional select statement for fetching molecules from the structure table.
Parameters:
whereClause - restricting clause without the WHERE keyword
Throws:
MDGeneratorException - when attempting to call this method after init(), when there is no valid and alive database connection, or when no structure table name has been set

public void setInput(java.lang.String inputFileName) throws MDGeneratorException, java.io.IOException
Sets the name of the input molecular structure file.
Parameters:
inputFileName - name of the input file
Throws:
MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection
java.io.IOException - in case of file reading problems

public void setInput(java.io.InputStream input) throws MDGeneratorException
Sets the input to an already opened molecular structure stream.
Parameters:
input - an input stream
Throws:
MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection

public void setSDfileInput(boolean sdfInput) throws MDGeneratorException
Toggles input file type.
Parameters:
sdfInput - indicates whether the input file is an SDfile
Throws:
MDGeneratorException - when attempting to call this method after init() or when no input file has been specified

public void setOutputFileName(java.lang.String outputFileName) throws MDGeneratorException
Sets the name of the output SDfile. Note that if the required output is one or more descriptor files, they should not be specified as output files but as descriptor names.
Parameters:
outputFileName - name of the output SDfile
Throws:
MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection

public void setSDfileOutput(boolean sdfOutput)
Toggles SDfile output format.
Parameters:
sdfOutput - indicates whether the output file is an SDfile

public void setDecimalOutput(boolean decimalOutput)
Sets decimal output format.
Parameters:
decimalOutput - new value for the option

public void setBinaryOutput(boolean binaryOutput)
Sets binary output format.
Parameters:
binaryOutput - new value for the option

public void setIdTagName(java.lang.String idTagName)
Sets the name of the input SDfile tag which contains unique structure identifiers.
Parameters:
idTagName - SDfile structure identifier tag name

public void setValidateDescriptor(java.lang.String activityTagName, double clusteringRadius, java.lang.String metric) throws MDGeneratorException
Sets parameters for the Activity-seeded Structure-based clustering.
Parameters:
activityTagName - name of the SDfile tag storing activity data
clusteringRadius - dissimilarity radius of a cluster around a seed
metric - metric used in clustering
Throws:
MDGeneratorException - when attempting to call this method after init() or when this method is applied in the multiple-descriptor case

public void setDescriptor(java.lang.String name, java.lang.String type, java.lang.String settings, java.lang.String comment) throws MDGeneratorException
Sets the descriptor to be generated.
Parameters:
name - user-given name of the descriptor
type - type name of the descriptor (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
settings - parameter settings of the descriptor (XML)
comment - optional comment to be stored in the database
Throws:
MDGeneratorException - when attempting to call this method after init() or when this method is applied in the multiple-descriptor case

public void setDescriptor(int index, java.lang.String name, java.lang.String type, java.lang.String settings, java.lang.String comment) throws MDGeneratorException
Sets type, name, parameters and comment for a given descriptor component.
Parameters:
index - index of the component
name - user-given name of the descriptor set component
type - type name of the descriptor (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
settings - parameter settings for the descriptor (XML)
comment - optional comment to be stored in the database
Throws:
MDGeneratorException - when attempting to call this method after init() or when another component's settings were specified with an MDParameters object (rather than a String)
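setSelectStatement, documented above, expects the restricting clause without the WHERE keyword. When the condition comes from user input or a configuration file, a small normalizing helper can guard against a stray keyword. The sketch below is one possible approach; WhereClauseSketch and stripWhereKeyword are hypothetical names, and cd_id is used only as an example column name.

```java
/** Normalizes a filter condition so it can be passed without the WHERE keyword. */
final class WhereClauseSketch {

    private WhereClauseSketch() {}

    /**
     * Removes a leading WHERE keyword (any letter case) if present and
     * trims surrounding whitespace. Text that merely starts with the
     * letters "where", such as an identifier, is left untouched.
     */
    static String stripWhereKeyword(String condition) {
        String trimmed = condition.trim();
        boolean startsWithKeyword = trimmed.length() > 5
                && trimmed.regionMatches(true, 0, "WHERE", 0, 5)
                && Character.isWhitespace(trimmed.charAt(5));
        return startsWithKeyword ? trimmed.substring(5).trim() : trimmed;
    }

    public static void main(String[] args) {
        // The cleaned condition is what would be handed to setSelectStatement.
        System.out.println(stripWhereKeyword("WHERE cd_id < 100"));
    }
}
```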
public void setDescriptors(java.lang.String[] names, java.lang.String[] types, java.lang.String[] settings, java.lang.String[] comments) throws MDGeneratorException
Sets all descriptor components to be generated simultaneously.
Parameters:
names - user-given names of the descriptor set components
types - type names of the descriptors (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
settings - parameter settings for the descriptors (XML)
comments - optional comments to be stored in the database
Throws:
MDGeneratorException - when attempting to call this method after init()
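The array-based setDescriptors call describes the components position by position across four arrays. This documentation does not state what happens when the arrays differ in length, so a caller may want to check that before handing them over. The following sketch shows such a pre-check; DescriptorArrays and componentCount are hypothetical names, and the assumption that all four arrays must have the same length is ours.

```java
/** Pre-check for the four parallel arrays passed to an array-based setter. */
final class DescriptorArrays {

    private DescriptorArrays() {}

    /**
     * Verifies that all four arrays have the same length and returns
     * that length, i.e. the number of descriptor set components.
     */
    static int componentCount(String[] names, String[] types,
                              String[] settings, String[] comments) {
        int n = names.length;
        if (types.length != n || settings.length != n || comments.length != n) {
            throw new IllegalArgumentException(
                    "expected " + n + " entries in every array, got types=" + types.length
                            + ", settings=" + settings.length
                            + ", comments=" + comments.length);
        }
        return n;
    }

    public static void main(String[] args) {
        String[] names = {"molecules.cfp", "molecules.pfp"};
        String[] types = {"CF", "PF"};
        String[] settings = {"", ""};  // placeholders; real XML settings would go here
        String[] comments = {"", ""};
        System.out.println(componentCount(names, types, settings, comments) + " components");
    }
}
```

The returned count could also be compared with the descriptorCount given to the GenerateMD(int) constructor, on the assumption that the two are meant to agree.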
public void setDescriptor(java.lang.String name, java.lang.String type, MDParameters params, java.lang.String comment) throws MDGeneratorException
Sets type, name, parameters and comment for a given descriptor component.
Parameters:
name - user-given name of the descriptor
type - type name of the descriptor (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
params - parameter settings for the descriptor (e.g. CFParameters)
comment - optional comment to be stored in the database
Throws:
MDGeneratorException - when attempting to call this method after init() or when this method is applied in the multiple-descriptor case

public void setDescriptor(int index, java.lang.String name, java.lang.String type, MDParameters params, java.lang.String comment) throws MDGeneratorException
Sets type, name, parameters and comment for a given descriptor component.
Parameters:
index - index of the component to be specified
name - user-given name of the descriptor
type - type name of the descriptor (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
params - parameter settings of the descriptor (e.g. CFParameters)
comment - optional comment, stored in the database only
Throws:
MDGeneratorException - when attempting to call this method after init() or when a previously set component was specified with a String parameter setting

public void setDescriptors(java.lang.String[] names, java.lang.String[] types, MDParameters[] params, java.lang.String[] comments) throws MDGeneratorException
Sets type, name, parameters and comment for all components of a molecular descriptor set.
Parameters:
names - user-given names of the descriptor components
types - type names of the descriptors (e.g. ChemicalFingerprint; the generatemd -L command lists all available built-in descriptor types)
params - parameter settings for the descriptor components (e.g. CFParameters)
comments - optional comments to be stored in the database
Throws:
MDGeneratorException - when attempting to call this method after init()
public void setTagName(java.lang.String name) throws MDGeneratorException
Sets the SDfile tag name for the only descriptor type generated.
Parameters:
name - SDfile tag name
Throws:
MDGeneratorException - when attempting to call this method after init() or when this method is applied in the multiple-descriptor case

public void setTagName(int index, java.lang.String name) throws MDGeneratorException
Sets the SDfile tag name for the given descriptor set component.
Parameters:
index - index of the component
name - SDfile tag name
Throws:
MDGeneratorException - when attempting to call this method after init()
public void setTagNames(java.lang.String[] names) throws MDGeneratorException
Sets the SDfile tag names for all descriptor set components.
Parameters:
names - SDfile tag names
Throws:
MDGeneratorException - when attempting to call this method after init()
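public void setGenerateId(int from) throws MDGeneratorException
Toggles automatic unique structure/descriptor identifier generation mode and sets the value of the first unique identifier.
Parameters:
from - the value of the first id to be generated
Throws:
MDGeneratorException - when attempting to call this method after init() or when attempting to generate IDs for database structures

public void setCreateStat(boolean createStat)
Toggles create statistics flag.
Parameters:
createStat - new value for the create statistics flag

public void init() throws MDGeneratorException
Initializes the generator object.
Throws:
MDGeneratorException - when attempting to initialize once again; all input/output (file creation and writing) and all database (SQL) exceptions are also re-thrown

public boolean step() throws MDGeneratorException
Fetches one structure from the input source and generates descriptors as specified before initialization by the setter methods.
Throws:
MDGeneratorException - when not yet initialized or on failure to read input or write output

public int getCounter() throws MDGeneratorException
Gets the number of molecules processed since init() was called.
Throws:
MDGeneratorException - when not yet initialized

public int[] getASSBClusters()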
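step() and getCounter() can be combined into a manual processing loop, for example to report progress while descriptors are generated. The sketch below runs against an in-memory stand-in rather than JChem. Stepper and ListStepper are hypothetical types, and the sketch assumes that step() returns false once the input is exhausted, which this documentation does not state explicitly.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Manual stepping loop with progress reporting, shown on an in-memory stand-in. */
final class StepLoopSketch {

    /** Hypothetical minimal view of the two generator methods used by the loop. */
    interface Stepper {
        boolean step();
        int getCounter();
    }

    /** Stand-in that "processes" strings so the sketch runs without JChem. */
    static final class ListStepper implements Stepper {
        private final Iterator<String> structures;
        private int counter;

        ListStepper(List<String> structures) {
            this.structures = structures.iterator();
        }

        @Override
        public boolean step() {
            if (!structures.hasNext()) {
                return false; // assumed end-of-input signal
            }
            structures.next();
            counter++;
            return true;
        }

        @Override
        public int getCounter() {
            return counter;
        }
    }

    /** Steps until the input is exhausted and returns the number of structures processed. */
    static int processAll(Stepper stepper, int reportEvery) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        while (stepper.step()) {
            int done = stepper.getCounter();
            if (done % reportEvery == 0) {
                System.out.println(done + " structures processed");
            }
        }
        return stepper.getCounter();
    }

    public static void main(String[] args) {
        Stepper stepper = new ListStepper(Arrays.asList("CCO", "c1ccccc1", "CC(=O)O"));
        System.out.println("total: " + processAll(stepper, 2));
    }
}
```

When no per-structure hook is needed, run() covers the same ground in a single call.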
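public void run() throws MDGeneratorException
Processes all structures from the input source.
Throws:
MDGeneratorException - when not yet initialized or on failure to read input or write output

public java.lang.String getStatistics(int di)
Gets statistical data on descriptors generated.
Parameters:
di - descriptor component index

public void validateDescriptor()
Validates a descriptor by the activity-seeded structure-based clustering.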
public void close() throws MDGeneratorException
Closes the generator and all output files or the database connection.
Throws:
MDGeneratorException - when not yet initialized or on failure to close output files

public void createMDTable(java.lang.String descrName, java.lang.String className, java.lang.String settings, java.lang.String comment) throws MDGeneratorException
Creates a database table to store the MolecularDescriptors generated. There is no need to call this method directly if descriptors are generated with the methods offered by this class; it is intended for advanced usage only. Call setStructureTableName( String ) prior to calling this function.
Parameters:
descrName - symbolic name of the descriptor, given by the user
className - name of the class implementing the descriptor
settings - parameter string
comment - optional comment
Throws:
MDGeneratorException - when there is no valid database connection or an SQL error occurred

public void deleteMDTable(java.lang.String descrName) throws MDGeneratorException, java.sql.SQLException
Deletes a database table that stores molecular descriptors.
Parameters:
descrName - name of the descriptor (as given by the user when created)
Throws:
MDGeneratorException - when there is no valid database
java.sql.SQLException - any database error

public void updateMDTable(java.lang.String descrName) throws MDGeneratorException, java.sql.SQLException
Systematically regenerates all descriptors.
Parameters:
descrName - name of the descriptor (as given by the user when created)
Throws:
MDGeneratorException - when there is no valid database
java.sql.SQLException - any database error

public void addMDConfig(java.lang.String descrName, java.lang.String configName, java.lang.String config) throws MDGeneratorException, java.sql.SQLException
Adds a new parameter configuration to the descriptor.
Parameters:
descrName - name of the descriptor (as given by the user when created)
configName - symbolic name given by the user to help identify the extension configuration
config - extra configuration settings
Throws:
MDGeneratorException - when there is no valid database or when an attempt is made to redefine an existing configuration
java.sql.SQLException - any database error

public void addMDConfig(java.lang.String descrName, java.lang.String configName, java.io.File configFile) throws java.sql.SQLException, java.io.IOException, MDGeneratorException
Adds a new parameter configuration to the descriptor.
Parameters:
descrName - name of the descriptor (as given by the user when created)
configName - symbolic name given by the user to help identify the extension configuration
configFile - file of extra configuration settings
Throws:
MDGeneratorException - when there is no valid database or when an attempt is made to redefine an existing configuration
java.sql.SQLException - any database error
java.io.IOException - in case of file reading problems

public void deleteMDConfig(java.lang.String descrName, java.lang.String configName) throws MDGeneratorException, java.sql.SQLException
Deletes an extension configuration.
Parameters:
descrName - name of the descriptor (as given by the user when created)
configName - symbolic name given by the user to help identify the extension configuration
Throws:
MDGeneratorException - when there is no valid database
java.sql.SQLException - any database error

public java.lang.String[] getMDNames() throws MDGeneratorException, java.sql.SQLException
Gets the names of all descriptor types stored in the database that are associated with the current structure table.
Throws:
MDGeneratorException - when there is no valid database
java.sql.SQLException - any database error

public static void main(java.lang.String[] args)
Command-line entry point to the MolecularDescriptor generator.
Parameters:
args - the command line arguments