Class GenerateMD

java.lang.Object
chemaxon.descriptors.GenerateMD

@PublicApi public class GenerateMD extends Object
GenerateMD provides a high-level application programming interface (API) with comprehensive functionality for generating various Molecular Descriptors. The API supports all kinds of inputs and outputs (molecule files, databases, descriptor files) and is capable of generating multiple descriptors simultaneously.
Example of typical usage:
      MDParameters cfpConfig = new CFParameters( new File("jchem/examples/config/cfp.xml") );
      MDParameters pfpConfig = new PFParameters( new File("jchem/examples/config/pharma-frag.xml") );
      GenerateMD generator = new GenerateMD( 2 );
      generator.setInput( "molecules.sdf" );
      generator.setSDfileInput( true );
      generator.setDescriptor( 0, "molecules.cfp", "CF", cfpConfig, "" );
      generator.setDescriptor( 1, "molecules.pfp", "PF", pfpConfig, "" );
      generator.init();
      generator.run();
      generator.close();
 

The above example generates two descriptors (a descriptor set) at the same time for every structure read from the input file molecules.sdf. The first component of the descriptor set is a chemical fingerprint configured from the parameter file jchem/examples/config/cfp.xml, while the second is a pharmacophore fingerprint configured by the jchem/examples/config/pharma-frag.xml configuration file.
GenerateMD supports the following descriptor types (generatemd -L lists all available built-in descriptor types):

   ECFP fingerprint (ECFP)
   3D Shape descriptor (Shape)
   Chemical Fingerprint (CF)
   Pharmacophore Fingerprint (PF)
   Reaction Fingerprint (RF)
   BCUT descriptors (BCUT)
   Hydrogen bond Donor/Acceptor count (HDon/HAcc)
   octanol-water distribution coefficient (LogD)
   octanol-water partition coefficient (LogP)
   Topological Polar Surface Area (TPSA)
   Mass of molecule (Mass)
   number of Heavy atoms (Heavy)
In the example above, the generated chemical and pharmacophore fingerprints are written into the files molecules.cfp and molecules.pfp, respectively.

This class provides no methods other than those for transforming molecular structures retrieved from the input source into one or more descriptor files or database tables.
GenerateMD also serves as a command-line tool for the batch generation of Molecular Descriptors.
Besides supporting all kinds of MolecularDescriptors implemented by ChemAxon, it is capable of generating arbitrary custom MolecularDescriptors (derived from the MolecularDescriptor class) implemented by users or third parties.
GenerateMD accepts various input sources: molecular files in many standard formats, and database tables (JChem structure tables). The generated MolecularDescriptors are stored in files in the case of file input, and in database tables (so-called MD tables) when input molecules are retrieved from a structure table. SDfile output stores the generated descriptors in a custom tag. It is also possible to produce MolecularDescriptor files that contain no structural information, only the descriptors in a readable format. Such files allow faster operation than SDfiles in further processing steps (for example, in virtual screening).

Since:
JChem 2.0
  • Constructor Summary

    Constructors
    Constructor
    Description
    GenerateMD()
    Creates an empty MolecularDescriptor generator object.
    GenerateMD(int descriptorCount)
    Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet ) simultaneously.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    addMDConfig(String descrName, String configName, File configFile)
    Adds a new parameter configuration to the descriptor.
    void
    addMDConfig(String descrName, String configName, String config)
    Adds a new parameter configuration to the descriptor.
    void
    close()
    Closes the generator and all output files or the database connection.
    void
    createMDTable(String descrName, String className, String settings, String comment)
    Creates a database table to store the MolecularDescriptors generated.
    void
    deleteMDConfig(String descrName, String configName)
    Deletes an extension configuration.
    void
    deleteMDTable(String descrName)
    Deletes a database table that stores molecular descriptors.
    int[]
    getASSBClusters()
     
    int
    getCounter()
    Gets the number of molecules processed since init() was called.
    String[]
    getMDNames()
    Gets the names of all descriptor types stored in the database that are associated with the current structure table.
    String
    getStatistics(int di)
    Gets statistical data on descriptors generated.
    void
    init()
    Initializes the generator object.
    static void
    main(String[] args)
    Command-line entry point to the MolecularDescriptor generator.
    void
    run()
    Processes all structures from the input source.
    void
    setBinaryOutput(boolean binaryOutput)
    Sets binary output format.
    void
    setConnectionHandler(ConnectionHandler connectionHandler)
    Sets the database connection when both structures and descriptors are stored in a database.
    void
    setCreateStat(boolean createStat)
    Toggles create statistics flag.
    void
    setDecimalOutput(boolean decimalOutput)
    Sets decimal output format.
    void
    setDescriptor(int index, String name, String type, MDParameters params, String comment)
    Sets type, name, parameters and comment for a given descriptor component.
    void
    setDescriptor(int index, String name, String type, String settings, String comment)
    Sets type, name, parameters and comment for a given descriptor component.
    void
    setDescriptor(String name, String type, MDParameters params, String comment)
    Sets type, name, parameters and comment for a given descriptor component.
    void
    setDescriptor(String name, String type, String settings, String comment)
    Sets the descriptor to be generated.
    void
    setDescriptors(String[] names, String[] types, MDParameters[] params, String[] comments)
    Sets type, name, parameters and comment for all components of a molecular descriptor set.
    void
    setDescriptors(String[] names, String[] types, String[] settings, String[] comments)
    Sets all descriptor components to be generated simultaneously.
    void
    setGenerateId(int from)
    Toggles automatic unique structure/descriptor identifier generation mode and sets the value of the first unique identifier.
    void
    setIdTagName(String idTagName)
    Sets the name of the input SDfile tag which contains unique structure identifiers.
    void
    setInput(InputStream input)
    Sets the input to an already opened molecular structure stream.
    void
    setInput(String inputFileName)
    Sets the name of the input molecular structure file.
    void
    setOutputFileName(String outputFileName)
    Sets the name of the output SDfile.
    void
    setSDfileInput(boolean sdfInput)
    Toggles input file type.
    void
    setSDfileOutput(boolean sdfOutput)
    Toggles SDfile output format.
    void
    setSelectStatement(String whereClause)
    Sets the optional select statement for fetching molecules from the structure table.
    void
    setStructureTableName(String structureTableName)
    Sets the name of the structure table to take molecular structures from.
    void
    setTagName(int index, String name)
    Sets the SDfile tag name for the given descriptor set component.
    void
    setTagName(String name)
    Sets the SDfile tag name for the only descriptor type generated.
    void
    setTagNames(String[] names)
    Sets the SDfile tag names for all descriptor set components.
    void
    setUpdateOnInsert(boolean updateOnInsert)
    Sets/clears automatic update on insert mode.
    void
    setValidateDescriptor(String activityTagName, double clusteringRadius, String metric)
    Sets parameters for the Activity-seeded Structure-based clustering.
    boolean
    step()
    Fetches one structure from the input source and generates descriptors as specified before initialization by the setter methods.
    void
    updateMDTable(String descrName)
    Systematically regenerates all descriptors.
    void
    validateDescriptor()
    Validates a descriptor by the activity-seeded structure-based clustering.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • GenerateMD

      public GenerateMD()
      Creates an empty MolecularDescriptor generator object.
    • GenerateMD

      public GenerateMD(int descriptorCount)
      Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet ) simultaneously.
      Parameters:
      descriptorCount - number of independent descriptor types to be generated
  • Method Details

    • setConnectionHandler

      public void setConnectionHandler(ConnectionHandler connectionHandler) throws MDGeneratorException
      Sets the database connection when both structures and descriptors are stored in a database.
      Parameters:
      connectionHandler - valid connection to a database
      Throws:
      MDGeneratorException - when attempting to call this method after init()
    • setStructureTableName

      public void setStructureTableName(String structureTableName) throws MDGeneratorException, SQLException
      Sets the name of the structure table to take molecular structures from. Use this when input comes from a database.
      Parameters:
      structureTableName - name of the database table of input structures
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when there is no valid database connection; or if descriptor validation option was selected beforehand
      SQLException - in the case of database management errors
    • setUpdateOnInsert

      public void setUpdateOnInsert(boolean updateOnInsert)
      Sets/clears automatic update on insert mode. Auto-update on insert means that the descriptor table is automatically updated when a new structure is inserted into the original structure table.
      Parameters:
      updateOnInsert - indicates auto-update mode
      Since:
      JChem 2.3
    • setSelectStatement

      public void setSelectStatement(String whereClause) throws MDGeneratorException
      Sets the optional select statement for fetching molecules from the structure table.
      Parameters:
      whereClause - restricting clause without the WHERE keyword
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when there is no valid and alive database connection, or when no structure table name has been set
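      Because the clause is expected without the WHERE keyword, code that accepts clauses from users or configuration files may want to normalize them before calling setSelectStatement. The following is a minimal sketch of such a normalizer; WhereClauses and stripWhere are hypothetical names and not part of the JChem API, and the column name in the usage note is illustrative only.

```java
/** Hypothetical helper; not part of the JChem API. */
final class WhereClauses {
    private WhereClauses() {}

    /**
     * Returns the clause with surrounding whitespace and at most one
     * leading WHERE keyword (any letter case) removed.
     */
    static String stripWhere(String clause) {
        String trimmed = clause.trim();
        // Case-insensitive check for a leading "WHERE".
        if (trimmed.regionMatches(true, 0, "WHERE", 0, 5)) {
            if (trimmed.length() == 5) {
                return "";
            }
            // Strip only when WHERE is a whole word, not a prefix of an identifier.
            if (Character.isWhitespace(trimmed.charAt(5))) {
                return trimmed.substring(5).trim();
            }
        }
        return trimmed;
    }
}
```

      A call such as generator.setSelectStatement( WhereClauses.stripWhere( "WHERE id < 100" ) ) would then pass only "id < 100" to the generator.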
    • setInput

      public void setInput(String inputFileName) throws MDGeneratorException, IOException
      Sets the name of the input molecular structure file.
      Parameters:
      inputFileName - name of the input file
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection
      IOException - in case of file reading problems.
    • setInput

      public void setInput(InputStream input) throws MDGeneratorException
      Sets the input to an already opened molecular structure stream.
      Parameters:
      input - an input stream
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection
    • setSDfileInput

      public void setSDfileInput(boolean sdfInput) throws MDGeneratorException
      Toggles input file type.
      Parameters:
      sdfInput - indicates whether the input file is an SDfile
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when no input file has been specified
    • setOutputFileName

      public void setOutputFileName(String outputFileName) throws MDGeneratorException
      Sets the name of the output SDfile. Note that if the required output is one or more descriptor files, they should not be specified as output files, but as descriptor names.
      Parameters:
      outputFileName - name of the output SDfile
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection
    • setSDfileOutput

      public void setSDfileOutput(boolean sdfOutput)
      Toggles SDfile output format.
      Parameters:
      sdfOutput - indicates if output file is an SDfile
    • setDecimalOutput

      public void setDecimalOutput(boolean decimalOutput)
      Sets decimal output format. This file format is recognized by JKlustor tools.
      Parameters:
      decimalOutput - new value for the option
      Since:
      JChem 2.0.1
    • setBinaryOutput

      public void setBinaryOutput(boolean binaryOutput)
      Sets binary output format. This file format is recognized by JKlustor tools.
      Parameters:
      binaryOutput - new value for the option
      Since:
      JChem 2.3
    • setIdTagName

      public void setIdTagName(String idTagName)
      Sets the name of the input SDfile tag which contains unique structure identifiers. These identifiers are printed in each line of the decimal output format.
      Parameters:
      idTagName - SDfile structure identifier tag name
      Since:
      JChem 2.0.1
    • setValidateDescriptor

      public void setValidateDescriptor(String activityTagName, double clusteringRadius, String metric) throws MDGeneratorException
      Sets parameters for the Activity-seeded Structure-based clustering.
      Parameters:
      activityTagName - name of the SDfile tag storing activity data
      clusteringRadius - dissimilarity radius of a cluster around a seed
      metric - metric used in clustering
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case
      Since:
      JChem 2.3
    • setDescriptor

      public void setDescriptor(String name, String type, String settings, String comment) throws MDGeneratorException
      Sets the descriptor to be generated. Use this method when descriptors of only one type are generated (that is, the descriptor set has one component only).
      Parameters:
      name - user given name of the descriptor
      type - type name of the descriptor (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
      settings - parameter settings of the descriptor (XML)
      comment - optional comment to be stored in database
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case
    • setDescriptor

      public void setDescriptor(int index, String name, String type, String settings, String comment) throws MDGeneratorException
      Sets type, name, parameters and comment for a given descriptor component. Use this method when more than one descriptor is generated at a time (e.g. CF and PF simultaneously).
      Parameters:
      index - index of the component
      name - user given name of the descriptor set component
      type - type name of the descriptor (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
      settings - parameter settings for the descriptor (XML)
      comment - optional comment to be stored in database
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when another component's settings were specified with an MDParameters object (rather than a String)
    • setDescriptors

      public void setDescriptors(String[] names, String[] types, String[] settings, String[] comments) throws MDGeneratorException
      Sets all descriptor components to be generated simultaneously.
      Parameters:
      names - user given names of the descriptor set components
      types - type names of the descriptors (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
      settings - parameter settings for the descriptors (XML)
      comments - optional comments to be stored in database
      Throws:
      MDGeneratorException - when attempting to call this method after init()
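      The array-based setter takes four separate arrays. A minimal sketch of keeping each component's values together and converting them at the last moment follows, assuming the arrays are index-aligned (element i of each array describes component i), which the description implies but does not state outright. DescriptorSpecs and Spec are hypothetical helpers, not part of the JChem API.

```java
import java.util.List;

/** Hypothetical helper; not part of the JChem API. */
final class DescriptorSpecs {
    private DescriptorSpecs() {}

    /** One descriptor set component: name, type, XML settings, comment. */
    static final class Spec {
        final String name;
        final String type;
        final String settings;
        final String comment;

        Spec(String name, String type, String settings, String comment) {
            this.name = name;
            this.type = type;
            this.settings = settings;
            this.comment = comment;
        }
    }

    /**
     * Converts the specs into four index-aligned arrays, returned in the
     * order { names, types, settings, comments }.
     */
    static String[][] toParallelArrays(List<Spec> specs) {
        int n = specs.size();
        String[] names = new String[n];
        String[] types = new String[n];
        String[] settings = new String[n];
        String[] comments = new String[n];
        for (int i = 0; i < n; i++) {
            Spec s = specs.get(i);
            names[i] = s.name;
            types[i] = s.type;
            settings[i] = s.settings;
            comments[i] = s.comment;
        }
        return new String[][] { names, types, settings, comments };
    }
}
```

      With a real generator, the result a of toParallelArrays could be passed as generator.setDescriptors( a[0], a[1], a[2], a[3] ) before init() is called.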
    • setDescriptor

      public void setDescriptor(String name, String type, MDParameters params, String comment) throws MDGeneratorException
      Sets type, name, parameters and comment for a given descriptor component. Use this method when only one descriptor type is generated.
      Parameters:
      name - user given name of the descriptor
      type - type name of the descriptor (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
      params - parameter settings for the descriptor (e.g. CFParameters)
      comment - optional comment to be stored in database
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case
    • setDescriptor

      public void setDescriptor(int index, String name, String type, MDParameters params, String comment) throws MDGeneratorException
      Sets type, name, parameters and comment for a given descriptor component. Use this method when more than one descriptor is generated at a time and the components are not all specified in one go.
      Parameters:
      index - index of the component to be specified
      name - user given name of the descriptor
      type - type name of the descriptor (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
      params - parameter settings of the descriptor (e.g. CFParameters)
      comment - optional comment, stored in the database only
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when a previously set component was specified with a String parameter setting
    • setDescriptors

      public void setDescriptors(String[] names, String[] types, MDParameters[] params, String[] comments) throws MDGeneratorException
      Sets type, name, parameters and comment for all components of a molecular descriptor set.
      Parameters:
      names - user given names of the descriptor components
      types - type names of the descriptors (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
      params - parameter settings for the descriptor components (e.g. CFParameters)
      comments - optional comments to be stored in database
      Throws:
      MDGeneratorException - when attempting to call this method after init()
    • setTagName

      public void setTagName(String name) throws MDGeneratorException
      Sets the SDfile tag name for the only descriptor type generated.
      Parameters:
      name - SDfile tag name
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case
    • setTagName

      public void setTagName(int index, String name) throws MDGeneratorException
      Sets the SDfile tag name for the given descriptor set component.
      Parameters:
      index - index of the component
      name - SDfile tag name
      Throws:
      MDGeneratorException - when attempting to call this method after init()
    • setTagNames

      public void setTagNames(String[] names) throws MDGeneratorException
      Sets the SDfile tag names for all descriptor set components.
      Parameters:
      names - SDfile tag names
      Throws:
      MDGeneratorException - when attempting to call this method after init()
    • setGenerateId

      public void setGenerateId(int from) throws MDGeneratorException
      Toggles automatic unique structure/descriptor identifier generation mode and sets the value of the first unique identifier.
      Parameters:
      from - the value of the first id to be generated
      Throws:
      MDGeneratorException - when attempting to call this method after init() or when attempting to generate IDs for database structures
    • setCreateStat

      public void setCreateStat(boolean createStat)
      Toggles create statistics flag.
      Parameters:
      createStat - new value for the create statistics flag
    • init

      public void init() throws MDGeneratorException
      Initializes the generator object. Call this method only after all features, modes and parameters have been set by the setter methods.
      Throws:
      MDGeneratorException - when attempting to initialize more than once; all input/output (file creation and writing) and database (SQL) exceptions are also re-thrown
    • step

      public boolean step() throws MDGeneratorException
      Fetches one structure from the input source and generates descriptors as specified before initialization by the setter methods.
      Returns:
      true if a structure was successfully processed
      Throws:
      MDGeneratorException - when not yet initialized, or on failure to read input or write output
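      Calling step() in a loop, instead of run(), lets the caller do work between structures, such as reporting progress. A minimal sketch of such a loop follows. It is written against a small stand-in interface so that it compiles without the JChem library, and it assumes that a false return value means the input is exhausted. StepLoop, Stepper and runWithProgress are hypothetical names.

```java
import java.util.function.IntConsumer;

/** Hypothetical helper; not part of the JChem API. */
final class StepLoop {
    private StepLoop() {}

    /** Stand-in mirroring the documented shape of step(): boolean result, may throw. */
    interface Stepper {
        boolean step() throws Exception;
    }

    /**
     * Calls step() until it returns false, reporting the running count after
     * every `interval` processed structures. Returns the total number processed.
     */
    static int runWithProgress(Stepper stepper, int interval, IntConsumer progress)
            throws Exception {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        int processed = 0;
        // Assumption: false signals that no further structure is available.
        while (stepper.step()) {
            processed++;
            if (processed % interval == 0) {
                progress.accept(processed);
            }
        }
        return processed;
    }
}
```

      With an initialized generator, the call might look like StepLoop.runWithProgress( generator::step, 1000, n -> System.err.println( n + " structures" ) ), provided MDGeneratorException is a subclass of Exception.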
    • getCounter

      public int getCounter() throws MDGeneratorException
      Gets the number of molecules processed since init() was called.
      Returns:
      number of structures processed
      Throws:
      MDGeneratorException - when not yet initialized
    • getASSBClusters

      public int[] getASSBClusters()
    • run

      public void run() throws MDGeneratorException
      Processes all structures from the input source. Structures are retrieved from the input one by one, and all descriptor types set earlier (by the setter methods) are generated and stored in the specified output.
      Throws:
      MDGeneratorException - when not yet initialized, or on failure to read input or write output
    • getStatistics

      public String getStatistics(int di)
      Gets statistical data on descriptors generated.
      Parameters:
      di - descriptor component index
      Returns:
      statistics in a formatted string
      Since:
      JChem 2.1
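      Since getStatistics takes a component index, reporting on a whole descriptor set means calling it once per component. A minimal sketch follows, assuming zero-based component indices (as in the class-level example, which uses 0 and 1 for two components). StatsReport and collect are hypothetical names; the statistics source is passed in as a function so that the sketch compiles without the JChem library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical helper; not part of the JChem API. */
final class StatsReport {
    private StatsReport() {}

    /**
     * Queries statsOf for each component index 0..componentCount-1 and
     * returns one labelled line per component.
     */
    static List<String> collect(int componentCount, IntFunction<String> statsOf) {
        List<String> lines = new ArrayList<>();
        for (int di = 0; di < componentCount; di++) {
            lines.add("component " + di + ": " + statsOf.apply(di));
        }
        return lines;
    }
}
```

      For a generator created with new GenerateMD( 2 ), the call would be StatsReport.collect( 2, generator::getStatistics ). Presumably statistics are only available if setCreateStat( true ) was called before init(); the documentation does not state this explicitly.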
    • validateDescriptor

      public void validateDescriptor()
      Validates a descriptor by the activity-seeded structure-based clustering.
    • close

      public void close() throws MDGeneratorException
      Closes the generator and all output files or the database connection.
      Throws:
      MDGeneratorException - when not yet initialized or failed to close output files
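      The class is not documented as implementing AutoCloseable, but close() can still be tied to a try-with-resources block through a method reference, so that it runs even if processing fails. A minimal sketch follows; CloseGuard, Body and runThenClose are hypothetical names, and the generator is represented by two functional parameters so that the sketch compiles without the JChem library.

```java
/** Hypothetical helper; not part of the JChem API. */
final class CloseGuard {
    private CloseGuard() {}

    /** Work to perform before closing; may throw. */
    interface Body {
        void run() throws Exception;
    }

    /**
     * Runs the body and then always invokes the closer, even when the body
     * throws. An exception thrown by the closer is added as suppressed when
     * the body has already failed.
     */
    static void runThenClose(Body body, AutoCloseable closer) throws Exception {
        try (AutoCloseable guard = closer) {
            body.run();
        }
    }
}
```

      With a real generator this might read CloseGuard.runThenClose( generator::run, generator::close ). Because close() is documented to throw when the generator is not yet initialized, the guard should only be set up after init() has succeeded.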
    • createMDTable

      public void createMDTable(String descrName, String className, String settings, String comment) throws MDGeneratorException
      Creates a database table to store the MolecularDescriptors generated. There is no need to call this method directly if descriptors are generated with the methods offered by this class; it is intended for advanced usage only.
      The corresponding structure table's name should be set by setStructureTableName( String ) prior to calling this function.
      Parameters:
      descrName - symbolic name of the descriptor, given by the user
      className - name of the class implementing the descriptor
      settings - parameter string
      comment - optional comment
      Throws:
      MDGeneratorException - when there is no valid database connection or an SQL error occurred
    • deleteMDTable

      public void deleteMDTable(String descrName) throws MDGeneratorException, SQLException
      Deletes a database table that stores molecular descriptors. All rows, the table and all corresponding administrative information are lost irreversibly.
      Parameters:
      descrName - name of the descriptor (as given by the user when created)
      Throws:
      MDGeneratorException - when there is no valid database
      SQLException - any database error
    • updateMDTable

      public void updateMDTable(String descrName) throws MDGeneratorException, SQLException
      Systematically regenerates all descriptors. Call this method when new structures are added to the structure table.
      Parameters:
      descrName - name of the descriptor (as given by the user when created)
      Throws:
      MDGeneratorException - when there is no valid database
      SQLException - any database error
    • addMDConfig

      public void addMDConfig(String descrName, String configName, String config) throws MDGeneratorException, SQLException
      Adds a new parameter configuration to the descriptor. Such extra configurations, often called 'screening configurations', can extend or overwrite parameter settings stored at the time of creation. A typical example is adding new dissimilarity metrics optimized for a new active compound family to the existing set of metrics.
      Parameters:
      descrName - name of the descriptor (as given by the user when created)
      configName - symbolic name given by the user to help the identification of the extension configuration
      config - extra configuration settings
      Throws:
      MDGeneratorException - when there is no valid database or when attempting to redefine an existing configuration
      SQLException - any database error
    • addMDConfig

      public void addMDConfig(String descrName, String configName, File configFile) throws SQLException, IOException, MDGeneratorException
      Adds a new parameter configuration to the descriptor. Such extra configurations, often called 'screening configurations', can extend or overwrite parameter settings stored at the time of creation. A typical example is adding new dissimilarity metrics optimized for a new active compound family to the existing set of metrics.
      Parameters:
      descrName - name of the descriptor (as given by the user when created)
      configName - symbolic name given by the user to help the identification of the extension configuration
      configFile - file of extra configuration settings
      Throws:
      MDGeneratorException - when there is no valid database or when attempting to redefine an existing configuration
      SQLException - any database error
      IOException - in case of file reading problems.
    • deleteMDConfig

      public void deleteMDConfig(String descrName, String configName) throws MDGeneratorException, SQLException
      Deletes an extension configuration.
      Parameters:
      descrName - name of the descriptor (as given by the user when created)
      configName - symbolic name given by the user to help the identification of the extension configuration
      Throws:
      MDGeneratorException - when there is no valid database
      SQLException - any database error
    • getMDNames

      public String[] getMDNames() throws MDGeneratorException, SQLException
      Gets the names of all descriptor types stored in the database that are associated with the current structure table.
      Returns:
      molecular descriptors' names
      Throws:
      MDGeneratorException - when there is no valid database
      SQLException - any database error
    • main

      public static void main(String[] args)
      Command-line entry point to the MolecularDescriptor generator.
      Parameters:
      args - the command line arguments