Class Importer

  • All Implemented Interfaces:
    chemaxon.jchem.db.Transfer, Runnable

    @PublicAPI
    public class Importer
    extends Thread
    implements chemaxon.jchem.db.Transfer
    Tool for importing molecules into database tables from a File or InputStream object. For an example of usage, see File Import/Export Tools.
    • Constructor Detail

      • Importer

        public Importer()
        Constructor.
    • Method Detail

      • setConnectionHandler

        public void setConnectionHandler(ConnectionHandler conh)
        Setter for property connectionHandler. The ConnectionHandler must represent an open connection to the database.
        Parameters:
        conh - the connection handler
      • getConnectionHandler

        public ConnectionHandler getConnectionHandler()
        Getter for property connectionHandler.
        Returns:
        the connection handler
      • setInput

        public void setInput(File inputFile)
        Sets the source object as a file.
        Parameters:
        inputFile - the source file
      • setInput

        public void setInput(InputStream is)
        Sets the source object as a stream.
        Parameters:
        is - the source stream
      • setInput

        public void setInput(String fileName)
        Sets the source object as a file, specifying the name of the file.
        Parameters:
        fileName - the source file name
      • getInput

        public Object getInput()
        Gets the source object. The object may be a File or an InputStream.
        Returns:
        the source object
      • setTableName

        public void setTableName(String tname)
        Sets the name of the table to import into.
        Parameters:
        tname - the table name
      • getTableName

        public String getTableName()
        Gets the name of the table to import into.
        Returns:
        the table name
      • setConnections

        @Deprecated
        public void setConnections(String connections)
        Deprecated.
        As of 2.2, replaced by setFieldConnections(String).
        Specifies which data fields correspond to which table fields.

        The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"

        • If an empty String, no connections are made to extra fields.
        • If null, the fields are automatically connected by equal names (default).
        Parameters:
        connections - the connection string
      • setFieldConnections

        public void setFieldConnections(String connections)
        Specifies which data fields correspond to which table fields.

        The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"

        • If an empty String, no connections are made to extra fields.
        • If null, the fields are automatically connected by equal names (default).
        Parameters:
        connections - the connection string
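        The connection string is a semicolon-separated list of databaseField=dataField pairs. A minimal sketch of building and parsing such a string from an ordered map follows. FieldConnections, build and parse are hypothetical helpers, not part of JChem, and the column names are made up. Because the documented format shows no escaping, the sketch assumes field names never contain = or ; and rejects them. The null case (automatic connection by equal names) is not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of JChem) for the
// "databaseField1=dataField1;databaseField2=dataField2" connection string.
class FieldConnections {

    // Joins tableField -> dataField pairs in insertion order. An empty map
    // yields "", documented as "no connections are made to extra fields".
    static String build(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            check(e.getKey());
            check(e.getValue());
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Splits a connection string back into an ordered tableField -> dataField map.
    // null (connect by equal names) is not modeled here.
    static Map<String, String> parse(String connections) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (connections.isEmpty()) {
            return pairs;
        }
        for (String pair : connections.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0 || eq == pair.length() - 1) {
                throw new IllegalArgumentException("Malformed pair: " + pair);
            }
            pairs.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return pairs;
    }

    // Assumption: no escaping exists, so names containing separators are rejected.
    private static void check(String name) {
        if (name.isEmpty() || name.indexOf('=') >= 0 || name.indexOf(';') >= 0) {
            throw new IllegalArgumentException("Unsupported field name: " + name);
        }
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("compound_name", "NAME"); // hypothetical table column -> SDfile field
        pairs.put("melting_point", "MP");
        String connections = build(pairs);
        System.out.println(connections);        // compound_name=NAME;melting_point=MP
        System.out.println(parse(connections)); // {compound_name=NAME, melting_point=MP}
    }
}
```

        The resulting string is what would be passed to setFieldConnections(String).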
      • getConnections

        @Deprecated
        public String getConnections()
        Deprecated.
        As of 2.2, replaced by getFieldConnections().
        Returns the specified table field/file field pairs.
        Returns:
        the connection string
      • getFieldConnections

        public String getFieldConnections()
        Returns the specified table field/file field pairs.
        Returns:
        the connection string
      • setRecordsToCheck

        public void setRecordsToCheck(int recordsToCheck)
        Sets the number of records to check for file format. For SDfiles, the same number of records is checked for field names.
        The default value is 500 records.
        Note: When an InputStream is used as the source, these records are buffered in memory. Make sure the JVM has enough memory (-Xmx parameter) when setting this value very high.
        Using a File as input is recommended where feasible, since it does not need buffering.
        Parameters:
        recordsToCheck - the number of records to check for file format
      • getRecordsToCheck

        public int getRecordsToCheck()
        Gets the number of records to check for file format.
        Returns:
        the number of records to check for file format
      • setProgressWriter

        public void setProgressWriter(ProgressWriter pwriter)
        Sets the ProgressWriter object used to track the progress of the actual import. (Format checking and skipping are not monitored by this object.)
        It can be null if no monitoring is necessary.
        Parameters:
        pwriter - the progress writer
      • getProgressWriter

        public ProgressWriter getProgressWriter()
        Gets the ProgressWriter object used for monitoring.
        Returns:
        the progress writer
      • setHaltOnError

        public void setHaltOnError(boolean b)
        Sets whether the import should stop when an error occurs.
        Parameters:
        b - true to halt on error
      • isHaltOnError

        public boolean isHaltOnError()
        Gets whether the import stops when an error occurs.
        Returns:
        true if halt on error
      • setDuplicateImportAllowed

        public void setDuplicateImportAllowed(int duplicateFilteringOption)
        Sets the duplicate filtering option for the import. Duplicates can be banned, allowed, or handled according to the table option.
        Parameters:
        duplicateFilteringOption - one of:
        • DUPLICATE_FILTERING_ON: molecules that already exist in the table with the same topology are not imported. Duplicate filtering is forced ON regardless of the table setting. This check may slow down the import.
        • DUPLICATE_FILTERING_OFF: duplicates are allowed. Duplicate filtering is forced OFF regardless of the table setting.
        • DUPLICATE_FILTERING_TABLE_OPTION: the table option (StructureTableOptions.isDuplicateFiltering()) controls the filtering of duplicates.
        Warning: using a duplicate filtering option on import that differs from the table's duplicate filtering option may leave the table content inconsistent with the table option.
        See Also:
        UpdateHandler.DUPLICATE_FILTERING_ON, UpdateHandler.DUPLICATE_FILTERING_OFF, UpdateHandler.DUPLICATE_FILTERING_TABLE_OPTION, StructureTableOptions.isDuplicateFiltering()
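        How the three options combine with the table setting can be summarized in a small model. This is a hypothetical sketch, not JChem code: the Option enum stands in for the UpdateHandler int constants (whose values are not assumed), and the boolean table flag stands in for StructureTableOptions.isDuplicateFiltering(). mayBeInconsistent flags the situation described in the warning on setDuplicateImportAllowed.

```java
// Hypothetical model (not JChem code) of how the duplicate filtering
// options documented for setDuplicateImportAllowed(int) resolve.
class DuplicatePolicy {

    // Stand-ins for UpdateHandler.DUPLICATE_FILTERING_ON / _OFF / _TABLE_OPTION.
    enum Option { ON, OFF, TABLE_OPTION }

    // True if duplicates are filtered out during this import.
    static boolean filteringActive(Option option, boolean tableFiltersDuplicates) {
        switch (option) {
            case ON:
                return true;                   // forced on, table setting ignored
            case OFF:
                return false;                  // forced off, table setting ignored
            case TABLE_OPTION:
                return tableFiltersDuplicates; // defer to the table option
            default:
                throw new IllegalArgumentException("Unknown option: " + option);
        }
    }

    // The import behaves differently from the table option, so the table
    // content may no longer match that option.
    static boolean mayBeInconsistent(Option option, boolean tableFiltersDuplicates) {
        return filteringActive(option, tableFiltersDuplicates) != tableFiltersDuplicates;
    }

    public static void main(String[] args) {
        boolean tableFilters = true; // assumed table setting for the demo
        for (Option o : Option.values()) {
            System.out.println(o + ": filtering=" + filteringActive(o, tableFilters)
                    + ", mayBeInconsistent=" + mayBeInconsistent(o, tableFilters));
        }
    }
}
```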
      • isDuplicateImportAllowed

        public boolean isDuplicateImportAllowed()
        Gets whether duplicate structures are allowed.
        Returns:
        true if duplicates are allowed
        Throws:
        IllegalArgumentException - if duplicate filtering option of the table cannot be determined.
      • setEmptyStructuresAllowed

        public void setEmptyStructuresAllowed(boolean b)
        Sets whether empty structures are imported. If set to false, empty molecules are not imported.
        Parameters:
        b - set to true if empty structures are allowed
      • getEmptyStructuresAllowed

        public boolean getEmptyStructuresAllowed()
        Gets whether empty structures are allowed.
        Returns:
        true if empty structures are allowed
      • setSetChiralFlag

        public void setSetChiralFlag(boolean setChiralFlag)
        Sets whether the chiral flag should be set to true during import.
        Parameters:
        setChiralFlag - if set to true, the chiral flag is set to true for imported molecules. The default setting is false.
        Since:
        2.3
      • getSetChiralFlag

        public boolean getSetChiralFlag()
        Gets whether the chiral flag is set on import.
        Returns:
        the current state
      • setNameFieldInDB

        public void setNameFieldInDB(String fieldName)
        Sets the database field that will contain the structure name.
        Parameters:
        fieldName - the name of the column in the database
      • getNameFieldInDB

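        public String getNameFieldInDB()
        Gets the database field that contains the structure name.
        Returns:
        the name of the column in the database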
      • isFinished

        public boolean isFinished()
        Checks whether the import has finished.
        Returns:
        true if importing has finished, false otherwise.
      • getErrorMessage

        public String getErrorMessage()
        If an error occurs, this method returns the error message.
        Returns:
        the error message
      • getErrorCause

        public Throwable getErrorCause()
        Retrieves the Throwable caught in the run() method. WARNING: This mechanism is expected to be revised in the near future; use with extreme caution!
        Returns:
        the error cause or null
      • getStructCount

        public int getStructCount()
        Returns the current number of structures examined by the import process.
        Returns:
        the number of structures examined so far
      • getImportedNumber

        public int getImportedNumber()
        Returns the number of imported molecules.
        Returns:
        the number of molecules imported
      • getDuplicates

        public int getDuplicates()
        Returns the number of molecules that were not imported because they were duplicates.
        Returns:
        the number of duplicates encountered
      • getEmptyStructures

        public int getEmptyStructures()
        Returns the number of molecules that were not imported because they were empty structures.
        Returns:
        the count of empty structures skipped
      • getNote

        public String getNote()
        Returns the note of the ProgressWriter.
        Returns:
        the note
      • setSkip

        public void setSkip(int skip)
        Sets the number of molecules to skip from the beginning of the file.
        Parameters:
        skip - the number of molecules to skip
      • getSkip

        public int getSkip()
        Gets the number of molecules to skip from the beginning of the file.
        Returns:
        the number of molecules to skip
      • getProgress

        public long getProgress()
        Gets the current progress of the import.
        Returns:
        the position of the ProgressWriter, -1 if the object is not set (null)
      • run

        public void run()
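        Runs the import when the thread is started. Calls init(), skip(int), and importMols(). Exceptions are caught and printed to stderr, so they do not propagate to the caller; getErrorCause() retrieves the caught Throwable.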
        Specified by:
        run in interface Runnable
        Overrides:
        run in class Thread
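        Because run() catches exceptions itself, a caller that starts the importer as a thread has to inspect the object after it finishes to learn whether it failed. The hypothetical ImportLifecycle class below is not JChem code: records are plain strings and the value "BAD" simulates an unreadable record. It mirrors the documented order init(), skip, importMols() and shows the start, join, then inspect pattern.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in (not JChem code) for the documented run() lifecycle.
class ImportLifecycle extends Thread {
    private final List<String> records;  // stand-in for parsed input records
    private final int skipCount;
    private volatile boolean cancelled;
    private volatile boolean finished;
    private volatile int imported;
    private volatile Throwable errorCause;
    private int position;                // only touched by the worker thread

    ImportLifecycle(List<String> records, int skipCount) {
        this.records = records;
        this.skipCount = skipCount;
    }

    void init() {
        position = 0;                    // format checking would happen here
    }

    void skip(int offset) {
        position = Math.min(records.size(), position + offset);
    }

    int importMols() {
        while (position < records.size() && !cancelled) {
            String record = records.get(position++);
            if (record.equals("BAD")) {  // simulated unreadable record
                throw new IllegalStateException("Cannot read record " + position);
            }
            imported++;
        }
        return imported;
    }

    @Override
    public void run() {
        try {
            init();
            skip(skipCount);
            importMols();
        } catch (RuntimeException e) {
            errorCause = e;              // kept for the caller to inspect
            System.err.println("Import failed: " + e.getMessage());
        } finally {
            finished = true;
        }
    }

    void cancel() { cancelled = true; }
    boolean isFinished() { return finished; }
    int getImportedNumber() { return imported; }
    Throwable getErrorCause() { return errorCause; }

    public static void main(String[] args) throws InterruptedException {
        ImportLifecycle ok = new ImportLifecycle(Arrays.asList("m1", "m2", "m3", "m4"), 1);
        ok.start();                      // run() executes on the new thread
        ok.join();
        System.out.println(ok.getImportedNumber() + " " + ok.getErrorCause()); // 3 null

        ImportLifecycle bad = new ImportLifecycle(Arrays.asList("m1", "BAD", "m3"), 0);
        bad.start();
        bad.join();
        // run() does not rethrow, so the caller checks the error afterwards.
        System.out.println(bad.isFinished() + " " + bad.getImportedNumber()); // true 1
    }
}
```

        With the real class, the corresponding calls would be start(), then join() or polling isFinished(), then getErrorMessage() or getErrorCause().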
      • setInfoStream

        public void setInfoStream(PrintStream st)
        Sets the stream where information about the import process is written (e.g. skipped duplicates and empty structures).
        Parameters:
        st - the stream. The default is null (no info is written).
      • setOutputOptions

        public void setOutputOptions(boolean printDuplicates,
                                     boolean printNonDuplicates,
                                     OutputStream os,
                                     boolean doNotImport)
        Configures printing of duplicate or non-duplicate molecules to a stream. Output is produced only if duplicate filtering is allowed.
        Parameters:
        printDuplicates - prints indexes of duplicate structures
        printNonDuplicates - prints indexes of imported structures
        os - the OutputStream to write to; if null, stdout is used.
        doNotImport - disables the actual import; only the index entries are generated.
      • setStoreDuplicates

        public void setStoreDuplicates(boolean value)
        Specifies whether the IDs of duplicate structures should be stored.
        Parameters:
        value - enable/disable storing of duplicates
        See Also:
        getDuplicateIDList()
      • setStoreImportedIDs

        public void setStoreImportedIDs(boolean value)
        Specifies whether the IDs of imported structures should be stored.
        Parameters:
        value - enable/disable storing of IDs
        Since:
        JChem 3.1.7
        See Also:
        getImportedIDList()
      • getDuplicateIDList

        public List<Integer> getDuplicateIDList()
        Returns the IDs (cd_id column in database table) of duplicate structures.
        Returns:
        the IDs as a list containing Integer objects.
        See Also:
        setStoreDuplicates(boolean)
      • getImportedIDList

        public List<Integer> getImportedIDList()
        Returns the IDs (cd_id column in database table) of imported structures.
        Returns:
        the IDs as a list containing Integer objects.
        See Also:
        setStoreImportedIDs(boolean)
      • importMols

        public int importMols()
                       throws TransferException
        Imports molecules.
        Returns:
        the number of molecules imported
        Throws:
        TransferException - if the settings are invalid
      • cancel

        public void cancel()
        Stops the import process.
      • skip

        public void skip(int offset)
                  throws TransferException
        Skips the given number of molecules.
        Parameters:
        offset - the number of molecules to be skipped
        Throws:
        TransferException - if the settings are invalid
      • init

        public void init()
                  throws TransferException
        Initializes the import by checking the given number of lines for file format and fields. If not called explicitly, it is called automatically by skip or importMols when necessary.
        Throws:
        TransferException - if the settings are invalid
      • getFieldNameList

        public List<String> getFieldNameList()
                                      throws TransferException,
                                             IOException
        Returns the field names in an SDfile. The file may come from an InputStream; importing may follow without reopening the stream. Calls init() if initialization is necessary.
        Returns:
        a list of String objects, the names of the SDfile fields.
        Throws:
        TransferException - in case of failure
        IOException - if an IOException occurs
      • getFieldNameList

        public static List<String> getFieldNameList(InputStream is,
                                                    int linesToCheck)
                                             throws IOException,
                                                    MRecordParseException
        Returns the field names in an SDfile. NOTE: to return to the initial position, the InputStream has to be reopened or repositioned (e.g. a BufferedInputStream using mark/reset).
        Parameters:
        is - inputStream to read from
        linesToCheck - read ahead this many lines during field name identification
        Returns:
        a list of String objects, the names of the SDfile fields.
        Throws:
        IOException - if the input stream encounters a problem
        MRecordParseException - if a record could not be read.
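        Because the static getFieldNameList(InputStream, int) reads ahead, a caller that wants to import from the same stream afterwards has to reposition it. The sketch below shows the mark/reset technique with java.io.BufferedInputStream on a simplified SDfile scan: it collects names from data header lines of the form ">  <NAME>" within the first linesToCheck lines, then rewinds. SdFieldPeek and its methods are hypothetical and only approximate what JChem's parser does; markLimit is an assumed value that must exceed the number of bytes read ahead, otherwise reset() throws an IOException.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not JChem code): peek at SDfile data field names,
// then rewind the stream so a later import can start from the beginning.
class SdFieldPeek {

    // Reads one line byte by byte so nothing past the line is consumed.
    // Returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                buf.write(b);
            }
        }
        if (b == -1 && buf.size() == 0) {
            return null;
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Collects names from data header lines such as ">  <MW>".
    static List<String> peekFieldNames(BufferedInputStream in, int linesToCheck, int markLimit)
            throws IOException {
        Set<String> names = new LinkedHashSet<>(); // unique, first-seen order
        in.mark(markLimit);
        for (int i = 0; i < linesToCheck; i++) {
            String line = readLine(in);
            if (line == null) {
                break;
            }
            int open = line.indexOf('<');
            int close = line.indexOf('>', open + 1);
            if (line.startsWith(">") && open >= 0 && close > open) {
                names.add(line.substring(open + 1, close));
            }
        }
        in.reset();                                // back to the marked position
        return new ArrayList<>(names);
    }

    public static void main(String[] args) throws IOException {
        // Minimal SDfile record with an empty molfile block.
        String sdf = "record1\n\n\n  0  0  0  0  0  0            999 V2000\nM  END\n"
                + ">  <NAME>\nethanol\n\n>  <MW>\n46.07\n\n$$$$\n";
        BufferedInputStream in = new BufferedInputStream(
                new ByteArrayInputStream(sdf.getBytes(StandardCharsets.UTF_8)));
        System.out.println(peekFieldNames(in, 500, 1 << 16)); // [NAME, MW]
        System.out.println(readLine(in));                     // record1 (stream was rewound)
    }
}
```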