Class Importer

java.lang.Object
java.lang.Thread
chemaxon.jchem.db.Importer
All Implemented Interfaces:
chemaxon.jchem.db.Transfer, Runnable

@PublicApi public class Importer extends Thread implements chemaxon.jchem.db.Transfer
Tool for importing molecules to database tables from a File or InputStream object.

Example of usage: File Import/Export Tools.

  • Constructor Details

    • Importer

      public Importer()
      Constructor.
  • Method Details

    • setConnectionHandler

      public void setConnectionHandler(ConnectionHandler conh)
      Setter for property connectionHandler. The ConnectionHandler must represent an open connection to the database.
      Parameters:
      conh - the connection handler
    • getConnectionHandler

      public ConnectionHandler getConnectionHandler()
      Getter for property connectionHandler.
      Returns:
      the connection handler
    • setInput

      public void setInput(File inputFile)
      Sets the source object as a file.
      Parameters:
      inputFile - the source file
    • setInput

      public void setInput(InputStream is)
      Sets the source object as a stream.
      Parameters:
      is - the source stream
    • setInput

      public void setInput(String fileName)
      Sets the source object as a file, specifying the name of the file.
      Parameters:
      fileName - the source file name
    • getInput

      public Object getInput()
      Gets the source object. The object may be File or InputStream.
      Returns:
      the source object
    • setTableName

      public void setTableName(String tname)
      Sets the name of the table to import into.
      Parameters:
      tname - the table name
    • getTableName

      public String getTableName()
      Gets the name of the table to import into.
      Returns:
      the table name
    • setConnections

      @Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public void setConnections(String connections)
      Deprecated, for removal: This API element is subject to removal in a future version.
      Since 2.2, replaced by setFieldConnections(String).
      Specifies which data fields correspond to which table fields.

      The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"

      • If empty String, no connections are made to extra fields.
      • If null, the fields are automatically connected by equal names (default).
      Parameters:
      connections - the connection string
    • setFieldConnections

      public void setFieldConnections(String connections)
      Specifies which data fields correspond to which table fields.

      The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"

      • If empty String, no connections are made to extra fields.
      • If null, the fields are automatically connected by equal names (default).
      Parameters:
      connections - the connection string
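The pair format above can be illustrated with a small stand-alone parser. This is only a sketch of how such a string decomposes, not JChem's own parsing code; the class FieldConnectionSketch and its parse method are hypothetical names introduced for this example, and the handling of whitespace and trailing semicolons is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: decomposes "dbField=dataField;..." strings; not JChem code. */
final class FieldConnectionSketch {

    /**
     * Returns database-field to data-field pairs in declaration order.
     * An empty string yields an empty map (no extra fields connected).
     * null is rejected here because, per the documentation above, null means
     * "connect automatically by equal names", which needs the real field lists.
     */
    static Map<String, String> parse(String connections) {
        if (connections == null) {
            throw new IllegalArgumentException("null means automatic matching by name");
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String pair : connections.split(";")) {
            if (pair.trim().isEmpty()) {
                continue; // tolerate empty input (assumption of this sketch)
            }
            int eq = pair.indexOf('=');
            String dbField = eq < 0 ? "" : pair.substring(0, eq).trim();
            String dataField = eq < 0 ? "" : pair.substring(eq + 1).trim();
            if (dbField.isEmpty() || dataField.isEmpty()) {
                throw new IllegalArgumentException("Expected dbField=dataField, got: " + pair);
            }
            pairs.put(dbField, dataField);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("databaseField1=dataField1;databaseField2=dataField2"));
    }
}
```

Note the direction of each pair: the database (table) field comes first and the data (file) field second.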
    • getConnections

      Deprecated, for removal: This API element is subject to removal in a future version.
      Since 2.2, replaced by getFieldConnections().
      Returns the specified table field - file field pairs.
      Returns:
      the connection string
    • getFieldConnections

      public String getFieldConnections()
      Returns the specified table field - file field pairs.
      Returns:
      the connection string
    • setRecordsToCheck

      public void setRecordsToCheck(int recordsToCheck)
      Sets the number of records to check when detecting the file format. The same number of records is checked for field names in the case of SDfiles.
      The default value is 500 records.
      Note: When an InputStream is used as the source, these records are buffered in memory. Make sure the JVM has enough memory (-Xmx parameter) when setting this value very high.
      Using a File as input is recommended where feasible, since it does not need buffering.
      Parameters:
      recordsToCheck - the number of records to check for file format
    • getRecordsToCheck

      public int getRecordsToCheck()
      Gets the number of records to check for file format.
      Returns:
      the number of records to check for file format
    • setProgressWriter

      public void setProgressWriter(ProgressWriter pwriter)
      Sets the ProgressWriter object used to track the progress of the actual import. (Format checking and skipping are not monitored by this object.)
      It can be null if no monitoring is necessary.
      Parameters:
      pwriter - the progress writer
    • getProgressWriter

      public ProgressWriter getProgressWriter()
      Gets the ProgressWriter object used for monitoring.
      Returns:
      the progress writer
    • setHaltOnError

      public void setHaltOnError(boolean b)
      Sets if import should stop when an error occurs.
      Parameters:
      b - true if halt on error
    • isHaltOnError

      public boolean isHaltOnError()
      Gets if import should stop when an error occurs.
      Returns:
      true if halt on error
    • setDuplicateImportAllowed

      @Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public void setDuplicateImportAllowed(boolean b)
      Deprecated, for removal: This API element is subject to removal in a future version.
      Since JChem 5.4, this import option is a table option; use setDuplicateImportAllowed(int) instead.
      If set to false, does not import molecules that already exist in the table with the same topology. This check may slow down the import.
      Parameters:
      b - new value for duplicateFiltering
    • setDuplicateImportAllowed

      public void setDuplicateImportAllowed(int duplicateFilteringOption)
      Sets the duplicate filtering option on import. It can be banned, allowed or specified by the table option.
      Parameters:
      duplicateFilteringOption -
      • If set to DUPLICATE_FILTERING_ON does not import molecules that already exist in the table with the same topology. Forces switching ON duplicate filtering regardless of table setting. This checking may slow down the import progress.
      • If set to DUPLICATE_FILTERING_OFF duplicates are allowed. Forces switching OFF duplicate filtering regardless of table setting.
      • If set to DUPLICATE_FILTERING_TABLE_OPTION the value of the table option (StructureTableOptions.isDuplicateFiltering()) controls the filtering of duplicates.
      Warning: switching duplicate filtering upon import to a different option than the table duplicate filtering option may result in table content not consistent with the table option.
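The three-way behaviour described above can be modelled as follows. This is a sketch under stated assumptions, not JChem code: the Option enum is a hypothetical stand-in for the DUPLICATE_FILTERING_* constants (whose actual type and values are not shown here), and the boolean tableFiltersDuplicates stands in for the table's duplicate filtering option.

```java
/** Illustrative only: models the three-way option described above; not JChem code. */
final class DuplicateFilteringSketch {

    /** Hypothetical stand-ins for the DUPLICATE_FILTERING_* constants. */
    enum Option { ON, OFF, TABLE_OPTION }

    /**
     * Returns true when duplicates should be filtered out during import.
     * ON and OFF override the table setting; TABLE_OPTION defers to it.
     */
    static boolean filterDuplicates(Option importOption, boolean tableFiltersDuplicates) {
        switch (importOption) {
            case ON:
                return true;
            case OFF:
                return false;
            default:
                return tableFiltersDuplicates;
        }
    }

    /**
     * True when the import setting disagrees with the table setting,
     * which is the situation the warning above refers to.
     */
    static boolean conflictsWithTable(Option importOption, boolean tableFiltersDuplicates) {
        return filterDuplicates(importOption, tableFiltersDuplicates) != tableFiltersDuplicates;
    }

    public static void main(String[] args) {
        for (Option option : Option.values()) {
            System.out.println(option + " on a filtering table: filter="
                    + filterDuplicates(option, true)
                    + ", conflict=" + conflictsWithTable(option, true));
        }
    }
}
```

In this model only ON and OFF can disagree with the table; TABLE_OPTION never does, which is why it is the safe choice when table consistency matters.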
    • isDuplicateImportAllowed

      public boolean isDuplicateImportAllowed()
      Gets whether duplicate structures are allowed.
      Returns:
      true if duplicates are allowed
      Throws:
      IllegalArgumentException - if duplicate filtering option of the table cannot be determined.
    • setEmptyStructuresAllowed

      public void setEmptyStructuresAllowed(boolean b)
      If set to false does not import empty molecules.
      Parameters:
      b - set to true if empty structures are allowed
    • getEmptyStructuresAllowed

      public boolean getEmptyStructuresAllowed()
      Gets whether empty structures are allowed.
      Returns:
      true if empty structures are allowed
    • setSetChiralFlag

      public void setSetChiralFlag(boolean setChiralFlag)
      Sets if chiral flag should be set to true during import.
      Parameters:
      setChiralFlag - if set to true, chiral flag is set to true for imported molecules. The default setting is false. Since 2.3.
    • getSetChiralFlag

      public boolean getSetChiralFlag()
      Gets whether chiral flag is set on import.
      Returns:
      the current state
    • setNameFieldInDB

      public void setNameFieldInDB(String fieldName)
      Sets the DB field that will contain the structure name.
      Parameters:
      fieldName - the name of the column in the database
    • getNameFieldInDB

      public String getNameFieldInDB()
    • isFinished

      public boolean isFinished()
      Checks whether the import has finished.
      Returns:
      true if importing has finished, else returns false.
    • getErrorMessage

      public String getErrorMessage()
      If an error occurred, returns the error message.
      Returns:
      the error message
    • getErrorCause

      public Throwable getErrorCause()
      Retrieves the Throwable caught in the run() method. WARNING: This mechanism is expected to be revised in the near future, use with extreme caution!
      Returns:
      the error cause or null
    • getStructCount

      public int getStructCount()
      Returns the current count of structures which were examined by the import process.
      Returns:
      the number of structures examined so far
    • getImportedNumber

      public int getImportedNumber()
      Returns the number of imported molecules.
      Returns:
      the number of molecules imported
    • getDuplicates

      public int getDuplicates()
      Returns the number of molecules that were not imported, because they are duplicates.
      Returns:
      the number of duplicates encountered
    • getEmptyStructures

      public int getEmptyStructures()
      Returns the number of molecules that were not imported because they are empty structures.
      Returns:
      the count of empty structures skipped
    • getNote

      public String getNote()
      Returns the note of the ProgressWriter.
      Returns:
      the note
    • setSkip

      public void setSkip(int skip)
      Sets the number of molecules to skip from the beginning of the file.
      Parameters:
      skip - the number of molecules to skip
    • getSkip

      public int getSkip()
      Gets the number of molecules to skip from the beginning of the file.
      Returns:
      the number of molecules to skip
    • getProgress

      public long getProgress()
      Gets the status of the importing progress.
      Returns:
      the position of the ProgressWriter, -1 if the object is not set (null)
    • run

      public void run()
      Starts execution as a thread. Calls init(), skip(), and importMols(). Exceptions are caught and printed to stderr.
      Specified by:
      run in interface Runnable
      Overrides:
      run in class Thread
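Because run() catches exceptions rather than propagating them, a caller that starts the import as a thread has to query the outcome afterwards (compare isFinished() and getErrorCause() above). The pattern can be sketched with a stand-in class. BackgroundTaskSketch and runAndWait are hypothetical names for this example and do not use the JChem API; unlike the behaviour described above, this sketch only stores the Throwable and does not print it to stderr.

```java
/** Illustrative only: a stand-in showing the run-in-background pattern; not JChem code. */
final class BackgroundTaskSketch extends Thread {

    private final Runnable work;
    private volatile boolean finished;     // written by the worker, read by the caller
    private volatile Throwable errorCause; // null when the work completed normally

    BackgroundTaskSketch(Runnable work) {
        this.work = work;
    }

    @Override
    public void run() {
        try {
            work.run();
        } catch (Throwable t) {
            errorCause = t; // kept for the caller instead of propagating
        } finally {
            finished = true;
        }
    }

    boolean isFinished() {
        return finished;
    }

    Throwable getErrorCause() {
        return errorCause;
    }

    /** Starts the work on its own thread and blocks until that thread ends. */
    static BackgroundTaskSketch runAndWait(Runnable work) {
        BackgroundTaskSketch task = new BackgroundTaskSketch(work);
        task.start();
        try {
            task.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return task;
    }

    public static void main(String[] args) {
        BackgroundTaskSketch task = runAndWait(() -> {
            throw new IllegalStateException("simulated import failure");
        });
        System.out.println("finished=" + task.isFinished() + ", cause=" + task.getErrorCause());
    }
}
```

A caller that does not want to block would call start() and poll isFinished() periodically instead of joining.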
    • setInfoStream

      public void setInfoStream(PrintStream st)
      Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).
      Parameters:
      st - the stream. The default is null (no info is written).
    • setOutputOptions

      public void setOutputOptions(boolean printDuplicates, boolean printNonDuplicates, OutputStream os, boolean doNotImport)
      With this option one can print duplicate or non-duplicate molecules to a stream. Will print only if duplicate filtering is allowed.
      Parameters:
      printDuplicates - prints indexes of duplicate structures
      printNonDuplicates - prints indexes of imported structures
      os - outputStream to write information, if null stdout is used.
      doNotImport - disables the real importing, just generates the index entry.
    • setStoreDuplicates

      public void setStoreDuplicates(boolean value)
      Specifies whether the IDs of duplicate structures should be stored.
      Parameters:
      value - enable/disable storing of duplicates
    • setStoreImportedIDs

      public void setStoreImportedIDs(boolean value)
      Specifies whether the IDs of imported structures should be stored.
      Parameters:
      value - enable/disable storing of ids
      Since:
      JChem 3.1.7
    • getDuplicateIDList

      public List<Integer> getDuplicateIDList()
      Returns the IDs (cd_id column in database table) of duplicate structures.
      Returns:
      the IDs as a list containing Integer objects.
    • getImportedIDList

      public List<Integer> getImportedIDList()
      Returns the IDs (cd_id column in database table) of imported structures.
      Returns:
      the IDs as a list containing Integer objects.
    • importMols

      public int importMols() throws TransferException
      Imports molecules.
      Returns:
      the number of molecules imported
      Throws:
      TransferException - if the settings are invalid
    • cancel

      public void cancel()
      Stops the importing progress.
    • skip

      public void skip(int offset) throws TransferException
      Skips the given number of molecules.
      Parameters:
      offset - the number of molecules to be skipped
      Throws:
      TransferException - if the settings are invalid
    • init

      public void init() throws TransferException
      Initializes the import, checking the given number of lines for file format and fields. If not called explicitly, it is called automatically by skip or importMols when necessary.
      Throws:
      TransferException - if the settings are invalid
    • getFieldNameList

      public List<String> getFieldNameList() throws TransferException, IOException
      Returns field names in an SDfile. The file may come from an InputStream; import may follow without reopening the stream. Calls init() if initialization is necessary.
      Returns:
      a list of String objects, the names of the SDfile fields.
      Throws:
      TransferException - in case of failure
      IOException - if an IOException occurs
    • getFieldNameList

      public static List<String> getFieldNameList(InputStream is, int linesToCheck) throws IOException, MRecordParseException
      Returns field names in an SDfile. NOTE: in order to return to the initial position, the InputStream has to be reopened or repositioned (e.g. using a BufferedInputStream).
      Parameters:
      is - inputStream to read from
      linesToCheck - read ahead this many lines during field name identification
      Returns:
      a list of String objects, the names of the SDfile fields.
      Throws:
      IOException - if the inputstream encounters a problem
      MRecordParseException - if a record could not be read.
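The repositioning mentioned in the note for the static getFieldNameList can be done with the standard mark/reset mechanism of java.io.BufferedInputStream. The following is a minimal sketch using only the JDK; StreamRewindSketch, peek and peekThenReadAll are hypothetical names, and the byte limit passed to mark is an assumption the caller has to size generously enough to cover everything read before the reset.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only: read ahead on a stream, then rewind it; uses JDK classes only. */
final class StreamRewindSketch {

    /** Reads up to 'limit' bytes as a preview, then rewinds to the starting position. */
    static String peek(BufferedInputStream in, int limit) throws IOException {
        in.mark(limit); // remember the current position
        byte[] buf = new byte[limit];
        int total = 0;
        while (total < limit) {
            int n = in.read(buf, total, limit - total);
            if (n < 0) {
                break; // end of stream reached before the limit
            }
            total += n;
        }
        in.reset(); // valid because no more than 'limit' bytes were read
        return new String(buf, 0, total, StandardCharsets.US_ASCII);
    }

    /** Peeks at an in-memory stream, then reads it fully; returns {preview, full}. */
    static String[] peekThenReadAll(String content, int limit) {
        try (BufferedInputStream in = new BufferedInputStream(
                new ByteArrayInputStream(content.getBytes(StandardCharsets.US_ASCII)))) {
            String preview = peek(in, limit);
            String full = new String(in.readAllBytes(), StandardCharsets.US_ASCII);
            return new String[] { preview, full };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = peekThenReadAll("> <ID>\n42\n\n$$$$\n", 6);
        System.out.println("preview: " + result[0]);
        System.out.println("full length: " + result[1].length());
    }
}
```

The same wrapped stream can be handed to a reader after the reset, so the data does not have to be fetched twice.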