Class GenerateMD


  • @PublicAPI
    public class GenerateMD
    extends Object
    GenerateMD provides a high-level Application Program Interface (API) with comprehensive functionality for the generation of various Molecular Descriptors. The API supports all kinds of inputs and outputs (molecule files, databases, descriptor files), and is capable of generating multiple descriptors simultaneously.
    Example of typical usage:
          // Load the parameter sets for the two fingerprint types.
          MDParameters cfpConfig = new CFParameters( new File("jchem/examples/config/cfp.xml") );
          MDParameters pfpConfig = new PFParameters( new File("jchem/examples/config/pharma-frag.xml") );
          // Generate two descriptors for every input structure.
          GenerateMD generator = new GenerateMD( 2 );
          generator.setInput( "molecules.sdf" );
          generator.setSDfileInput( true );
          generator.setDescriptor( 0, "molecules.cfp", "CF", cfpConfig, "" );
          generator.setDescriptor( 1, "molecules.pfp", "PF", pfpConfig, "" );
          generator.init();
          generator.run();
          generator.close();
     

    The above example generates two descriptors (a descriptor set) at the same time for every structure read from the input file molecules.sdf. The first component of the descriptor set is a chemical fingerprint configured from the parameter file jchem/examples/config/cfp.xml, while the second is a pharmacophore fingerprint configured by the jchem/examples/config/pharma-frag.xml configuration file.
    GenerateMD supports the following descriptor types (generatemd -L lists all available built-in descriptor types):

       ECFP fingerprint (ECFP)
       3D Shape descriptor (Shape)
       Chemical Fingerprint (CF)
       Pharmacophore Fingerprint (PF)
       Reaction Fingerprint (RF)
       BCUT descriptors (BCUT)
       Hydrogen bond Donor/Acceptor count (HDon/HAcc)
       octanol-water distribution coefficient (LogD)
       octanol-water partition coefficient (LogP)
       Topological Polar Surface Area (TPSA)
       Mass of molecule (Mass)
       number of Heavy atoms (Heavy)
    The chemical and pharmacophore fingerprints generated are written into the files molecules.cfp and molecules.pfp respectively.

    This class provides no methods other than those for transforming molecular structures retrieved from the input source into one or more descriptor files or database tables.
    GenerateMD also serves as a command line tool for the batch generation of Molecular Descriptors.
    Besides supporting all kinds of MolecularDescriptors implemented by ChemAxon, it is capable of generating arbitrary custom MolecularDescriptors (derived from the MolecularDescriptor class) implemented by users or third parties.
    GenerateMD accepts various input sources: molecular files in many standard formats, and database tables (JChem structure tables). The generated MolecularDescriptors are stored in files in the case of file input, and in database tables (so-called MD tables) when input molecules are retrieved from a structure table. SDfile output stores the generated descriptors in a custom tag. It is also possible to produce MolecularDescriptor files that contain no structural information, only the descriptors in a readable format. Such files allow faster operation than SDfiles in further processing steps (for example in virtual screening).

    Since:
    JChem 2.0
    • Constructor Summary

      Constructors 
      Constructor Description
      GenerateMD()
      Creates an empty MolecularDescriptor generator object.
      GenerateMD(int descriptorCount)
      Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously.
    • Constructor Detail

      • GenerateMD

        public GenerateMD()
        Creates an empty MolecularDescriptor generator object.
      • GenerateMD

        public GenerateMD​(int descriptorCount)
        Creates an object for generating the given number of different MolecularDescriptors (a molecular descriptor set, MDSet) simultaneously.
        Parameters:
        descriptorCount - number of independent descriptor types to be generated
    • Method Detail

      • setConnectionHandler

        public void setConnectionHandler​(ConnectionHandler connectionHandler)
                                  throws MDGeneratorException
        Sets the database connection when both structures and descriptors are stored in a database.
        Parameters:
        connectionHandler - valid connection to a database
        Throws:
        MDGeneratorException - when attempting to call this method after init()
      • setStructureTableName

        public void setStructureTableName​(String structureTableName)
                                   throws MDGeneratorException,
                                          SQLException
        Sets the name of the structure table to take molecular structures from. Use this when input comes from a database.
        Parameters:
        structureTableName - name of the database table of input structures
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when there is no valid database connection; or if descriptor validation option was selected beforehand
        SQLException - in the case of database management errors
      • setUpdateOnInsert

        public void setUpdateOnInsert​(boolean updateOnInsert)
        Sets/clears automatic update on insert mode. Auto-update on insert means that the descriptor table is automatically updated when a new structure is inserted into the original structure table.
        Parameters:
        updateOnInsert - indicates auto-update mode
        Since:
        JChem 2.3
      • setSelectStatement

        public void setSelectStatement​(String whereClause)
                                throws MDGeneratorException
        Sets the optional select statement for fetching molecules from the structure table.
        Parameters:
        whereClause - restricting clause of the select statement, without the WHERE keyword
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when there is no valid and alive database connection, or when no structure table name has been set
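        Because the argument must not contain the WHERE keyword itself, a caller that receives a complete condition from elsewhere may want to normalize it first. The helper below is a hypothetical convenience and not part of JChem; the column names used in main are illustrative assumptions only.

```java
public class WhereClauseSketch {

    /**
     * Returns the condition without a leading WHERE keyword, so that the
     * result has the "clause without WHERE" form described above.
     * Hypothetical helper, not a JChem method.
     */
    public static String stripWhere(String condition) {
        String trimmed = condition.trim();
        // Case-insensitive match of a leading WHERE that is followed by whitespace.
        if (trimmed.length() > 5
                && trimmed.regionMatches(true, 0, "WHERE", 0, 5)
                && Character.isWhitespace(trimmed.charAt(5))) {
            return trimmed.substring(5).trim();
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // Column names are assumptions chosen for illustration.
        System.out.println(stripWhere("WHERE cd_molweight < 500"));
        System.out.println(stripWhere("cd_id > 100"));
    }
}
```

        The returned string could then be handed to setSelectStatement( String ), provided a connection and a structure table name have already been set.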
      • setInput

        public void setInput​(String inputFileName)
                      throws MDGeneratorException,
                             IOException
        Sets the name of the input molecular structure file.
        Parameters:
        inputFileName - name of the input file
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection
        IOException - in case of file reading problems.
      • setInput

        public void setInput​(InputStream input)
                      throws MDGeneratorException
        Sets the input to an already opened molecular structure stream.
        Parameters:
        input - an input stream
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection
      • setSDfileInput

        public void setSDfileInput​(boolean sdfInput)
                            throws MDGeneratorException
        Toggles input file type.
        Parameters:
        sdfInput - indicates whether the input file is an SDfile
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when no input file has been specified
      • setOutputFileName

        public void setOutputFileName​(String outputFileName)
                               throws MDGeneratorException
        Sets the name of the output SDfile. Note that if the required output is one or more descriptor files, they should not be specified as output files, but as descriptor names.
        Parameters:
        outputFileName - name of the output SDfile
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when there is already a valid and alive database connection
      • setSDfileOutput

        public void setSDfileOutput​(boolean sdfOutput)
        Toggles SDfile output format.
        Parameters:
        sdfOutput - indicates if output file is an SDfile
      • setDecimalOutput

        public void setDecimalOutput​(boolean decimalOutput)
        Sets decimal output format. This file format is recognized by JKlustor tools.
        Parameters:
        decimalOutput - new value for the option
        Since:
        JChem 2.0.1
      • setBinaryOutput

        public void setBinaryOutput​(boolean binaryOutput)
        Sets binary output format. This file format is recognized by JKlustor tools.
        Parameters:
        binaryOutput - new value for the option
        Since:
        JChem 2.3
      • setIdTagName

        public void setIdTagName​(String idTagName)
        Sets the name of the input SDfile tag which contains unique structure identifiers. These identifiers are printed in each line of the decimal output format.
        Parameters:
        idTagName - SDfile structure identifier tag name
        Since:
        JChem 2.0.1
      • setValidateDescriptor

        public void setValidateDescriptor​(String activityTagName,
                                          double clusteringRadius,
                                          String metric)
                                   throws MDGeneratorException
        Sets parameters for the Activity-seeded Structure-based clustering.
        Parameters:
        activityTagName - name of the SDfile tag storing activity data
        clusteringRadius - dissimilarity radius of a cluster around a seed
        metric - metric used in clustering
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case
        Since:
        JChem 2.3
      • setDescriptor

        public void setDescriptor​(String name,
                                  String type,
                                  String settings,
                                  String comment)
                           throws MDGeneratorException
        Sets the descriptor to be generated. Use this method when descriptors of only one type are generated (that is, the descriptor set has one component only).
        Parameters:
        name - user given name of the descriptor
        type - type name of the descriptor (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
        settings - parameter settings of the descriptor (XML)
        comment - optional comment to be stored in database
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case
      • setDescriptor

        public void setDescriptor​(int index,
                                  String name,
                                  String type,
                                  String settings,
                                  String comment)
                           throws MDGeneratorException
        Sets type, name, parameters and comment for a given descriptor component. Use this method when more than one descriptor is generated at a time (e.g. CF and PF simultaneously).
        Parameters:
        index - index of the component
        name - user given name of the descriptor set component
        type - type name of the descriptor (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
        settings - parameter settings for the descriptor (XML)
        comment - optional comment to be stored in database
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when another component's settings were specified with an MDParameters object (rather than a String)
      • setDescriptors

        public void setDescriptors​(String[] names,
                                   String[] types,
                                   String[] settings,
                                   String[] comments)
                            throws MDGeneratorException
        Sets all descriptor components to be generated simultaneously.
        Parameters:
        names - user given names of the descriptor set components
        types - type names of the descriptors (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
        settings - parameter settings for the descriptors (XML)
        comments - optional comments to be stored in database
        Throws:
        MDGeneratorException - when attempting to call this method after init()
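        The four arrays are read in parallel, so it is natural to assume that element i of each array describes component i and that all arrays must therefore have the same length. The documentation above does not state this explicitly, so the check below is a sketch under that assumption; the helper is hypothetical and not part of JChem, and the empty settings strings are placeholders.

```java
public class DescriptorArraysSketch {

    /**
     * Verifies that the parallel arrays describe the same number of
     * components and returns that number. Hypothetical pre-check that a
     * caller could run before handing the arrays to setDescriptors.
     */
    public static int checkedComponentCount(String[] names, String[] types,
                                            String[] settings, String[] comments) {
        int count = names.length;
        if (types.length != count || settings.length != count || comments.length != count) {
            throw new IllegalArgumentException(
                    "names, types, settings and comments must have the same length");
        }
        return count;
    }

    public static void main(String[] args) {
        // Names and type codes follow the class-level usage example;
        // the settings and comments are left empty as placeholders.
        String[] names = { "molecules.cfp", "molecules.pfp" };
        String[] types = { "CF", "PF" };
        String[] settings = { "", "" };
        String[] comments = { "", "" };
        System.out.println(checkedComponentCount(names, types, settings, comments));
    }
}
```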
      • setDescriptor

        public void setDescriptor​(String name,
                                  String type,
                                  MDParameters params,
                                  String comment)
                           throws MDGeneratorException
        Sets type, name, parameters and comment for a given descriptor component. Use this method when only one descriptor type is generated.
        Parameters:
        name - user given name of the descriptor
        type - type name of the descriptor (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
        params - parameter settings for the descriptor (e.g. CFParameters)
        comment - optional comment to be stored in database
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case
      • setDescriptor

        public void setDescriptor​(int index,
                                  String name,
                                  String type,
                                  MDParameters params,
                                  String comment)
                           throws MDGeneratorException
        Sets type, name, parameters and comment for a given descriptor component. Use this method when more than one descriptor is generated at a time and the components are not all specified in one go.
        Parameters:
        index - index of the component to be specified
        name - user given name of the descriptor
        type - type name of the descriptor (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
        params - parameter settings of the descriptor (e.g. CFParameters)
        comment - optional comment to be stored in the database only
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when a previously set component was specified with a String parameter setting
      • setDescriptors

        public void setDescriptors​(String[] names,
                                   String[] types,
                                   MDParameters[] params,
                                   String[] comments)
                            throws MDGeneratorException
        Sets type, name, parameters and comment for all components of a molecular descriptor set.
        Parameters:
        names - user given names of the descriptor components
        types - type names of the descriptors (e.g. ChemicalFingerprint; generatemd -L command lists all available built-in descriptor types)
        params - parameter settings for the descriptor components (e.g. CFParameters)
        comments - optional comments to be stored in database
        Throws:
        MDGeneratorException - when attempting to call this method after init()
      • setTagName

        public void setTagName​(String name)
                        throws MDGeneratorException
        Sets the SDfile tag name for the only descriptor type generated.
        Parameters:
        name - SDfile tag name
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when applying this method in multiple descriptor case
      • setTagName

        public void setTagName​(int index,
                               String name)
                        throws MDGeneratorException
        Sets the SDfile tag name for the given descriptor set component.
        Parameters:
        index - index of the component
        name - SDfile tag name
        Throws:
        MDGeneratorException - when attempting to call this method after init()
      • setTagNames

        public void setTagNames​(String[] names)
                         throws MDGeneratorException
        Sets the SDfile tag names for all descriptor set components.
        Parameters:
        names - SDfile tag names
        Throws:
        MDGeneratorException - when attempting to call this method after init()
      • setGenerateId

        public void setGenerateId​(int from)
                           throws MDGeneratorException
        Toggles automatic unique structure/descriptor identifier generation mode and sets the value of the first unique identifier.
        Parameters:
        from - the value of the first id to be generated
        Throws:
        MDGeneratorException - when attempting to call this method after init() or when attempting to generate IDs for database structures
      • setCreateStat

        public void setCreateStat​(boolean createStat)
        Toggles create statistics flag.
        Parameters:
        createStat - new value for the create statistics flag
      • init

        public void init()
                  throws MDGeneratorException
        Initializes the generator object. Call this method only after all features, modes and parameters have been set by the setter methods.
        Throws:
        MDGeneratorException - when attempting to initialize once again; all input/output (file creation and writing) and database (SQL) exceptions are also re-thrown
      • step

        public boolean step()
                     throws MDGeneratorException
        Fetches one structure from the input source and generates descriptors as specified before initialization by the setter methods.
        Returns:
        true if a structure was successfully processed
        Throws:
        MDGeneratorException - when not yet initialized, or on failure to read input or write output
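        Calling step() repeatedly allows structure-by-structure processing, for instance to report progress between structures. The sketch below is not JChem code: StepwiseGenerator is a hypothetical stand-in interface that mirrors only init(), step(), getCounter() and close() so that the snippet compiles without the JChem library, FakeGenerator is a toy implementation, and the loop assumes that step() returns false once the input is exhausted.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StepLoopSketch {

    /**
     * Hypothetical stand-in for the lifecycle methods used below. The real
     * GenerateMD methods declare MDGeneratorException, omitted here.
     */
    public interface StepwiseGenerator {
        void init();
        boolean step();      // assumed to return false once the input is exhausted
        int getCounter();
        void close();
    }

    /** Toy implementation that "processes" strings instead of molecules. */
    public static class FakeGenerator implements StepwiseGenerator {
        private final Iterator<String> input;
        private int counter;
        private boolean initialized;

        public FakeGenerator(List<String> structures) {
            this.input = structures.iterator();
        }

        @Override public void init() { initialized = true; counter = 0; }

        @Override public boolean step() {
            if (!initialized) {
                throw new IllegalStateException("not yet initialized");
            }
            if (!input.hasNext()) {
                return false;
            }
            input.next();
            counter++;
            return true;
        }

        @Override public int getCounter() { return counter; }

        @Override public void close() { initialized = false; }
    }

    /** Processes one structure per iteration and returns the number processed. */
    public static int processAll(StepwiseGenerator generator, int reportEvery) {
        generator.init();
        try {
            while (generator.step()) {
                int done = generator.getCounter();
                if (done % reportEvery == 0) {
                    System.out.println(done + " structures processed");
                }
            }
            return generator.getCounter();
        } finally {
            // Always release the generator, even if a step fails.
            generator.close();
        }
    }

    public static void main(String[] args) {
        List<String> structures = Arrays.asList("CCO", "c1ccccc1", "CC(=O)O");
        System.out.println("total: " + processAll(new FakeGenerator(structures), 2));
    }
}
```

        With the real class, the same loop shape would apply after the setter calls, with the declared exceptions handled by the caller.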
      • getCounter

        public int getCounter()
                       throws MDGeneratorException
        Gets the number of molecules processed since init() was called.
        Returns:
        number of structures processed
        Throws:
        MDGeneratorException - when not yet initialized
      • getASSBClusters

        public int[] getASSBClusters()
      • run

        public void run()
                 throws MDGeneratorException
        Processes all structures from the input source. Structures are retrieved from the input one by one, and all descriptor types set earlier (by the set methods) are generated and stored in the specified output.
        Throws:
        MDGeneratorException - when not yet initialized, or on failure to read input or write output
      • getStatistics

        public String getStatistics​(int di)
        Gets statistical data on descriptors generated.
        Parameters:
        di - descriptor component index
        Returns:
        statistics in a formatted string
        Since:
        JChem 2.1
      • validateDescriptor

        public void validateDescriptor()
        Validates a descriptor by the activity-seeded structure-based clustering.
      • close

        public void close()
                   throws MDGeneratorException
        Closes the generator, together with all output files or the database connection.
        Throws:
        MDGeneratorException - when not yet initialized or failed to close output files
      • createMDTable

        public void createMDTable​(String descrName,
                                  String className,
                                  String settings,
                                  String comment)
                           throws MDGeneratorException
        Creates a database table to store the MolecularDescriptors generated. This method is intended for advanced usage only; there is no need to call it directly if descriptors are generated with the methods offered by this class.
        The corresponding structure table's name should be set by setStructureTableName( String ) prior to calling this function.
        Parameters:
        descrName - symbolic name of the descriptor, given by the user
        className - name of the class implementing the descriptor
        settings - parameter string
        comment - optional comment
        Throws:
        MDGeneratorException - when there is no valid database connection or an SQL error occurred
      • deleteMDTable

        public void deleteMDTable​(String descrName)
                           throws MDGeneratorException,
                                  SQLException
        Deletes a database table that stores molecular descriptors. All rows, the table and all corresponding administrative information are lost irreversibly.
        Parameters:
        descrName - name of the descriptor (as given by the user when created)
        Throws:
        MDGeneratorException - when there is no valid database
        SQLException - any database error
      • updateMDTable

        public void updateMDTable​(String descrName)
                           throws MDGeneratorException,
                                  SQLException
        Systematically regenerates all descriptors. Call this method when new structures are added to the structure table.
        Parameters:
        descrName - name of the descriptor (as given by the user when created)
        Throws:
        MDGeneratorException - when there is no valid database
        SQLException - any database error
      • addMDConfig

        public void addMDConfig​(String descrName,
                                String configName,
                                String config)
                         throws MDGeneratorException,
                                SQLException
        Adds a new parameter configuration to the descriptor. Such extra configurations, often called 'screening configurations', can extend or overwrite parameter settings stored at the time of creation. A typical example is adding new dissimilarity metrics, optimized for a new active compound family, to the existing set of metrics.
        Parameters:
        descrName - name of the descriptor (as given by the user when created)
        configName - symbolic name given by the user to help the identification of the extension configuration
        config - extra configuration settings
        Throws:
        MDGeneratorException - when there is no valid database or an existing configuration is attempted to be redefined
        SQLException - any database error
      • addMDConfig

        public void addMDConfig​(String descrName,
                                String configName,
                                File configFile)
                         throws SQLException,
                                IOException,
                                MDGeneratorException
        Adds a new parameter configuration to the descriptor. Such extra configurations, often called 'screening configurations', can extend or overwrite parameter settings stored at the time of creation. A typical example is adding new dissimilarity metrics, optimized for a new active compound family, to the existing set of metrics.
        Parameters:
        descrName - name of the descriptor (as given by the user when created)
        configName - symbolic name given by the user to help the identification of the extension configuration
        configFile - file of extra configuration settings
        Throws:
        MDGeneratorException - when there is no valid database or an existing configuration is attempted to be redefined
        SQLException - any database error
        IOException - in case of file reading problems.
      • deleteMDConfig

        public void deleteMDConfig​(String descrName,
                                   String configName)
                            throws MDGeneratorException,
                                   SQLException
        Deletes an extension configuration.
        Parameters:
        descrName - name of the descriptor (as given by the user when created)
        configName - symbolic name given by the user to help the identification of the extension configuration
        Throws:
        MDGeneratorException - when there is no valid database
        SQLException - any database error
      • main

        public static void main​(String[] args)
        Command-line entry point to the MolecularDescriptor generator.
        Parameters:
        args - the command line arguments