Package chemaxon.jchem.db
Class Importer
java.lang.Object
java.lang.Thread
chemaxon.jchem.db.Importer
- All Implemented Interfaces:
chemaxon.jchem.db.Transfer, Runnable
Tool for importing molecules to database tables from a File
or InputStream object.
Example of usage: File Import/Export Tools.
-
Nested Class Summary
Nested classes/interfaces inherited from class java.lang.Thread
Thread.State, Thread.UncaughtExceptionHandler
-
Field Summary
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
Fields inherited from interface chemaxon.jchem.db.Transfer
CXSMARTS, CXSMILES, INCHI, MOL2FILE, MOLFILE, MRV, RDFILE, RXNFILE, SDFILE, SMARTS, SMILES
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
void cancel()
Stops the importing progress.
getConnectionHandler()
Getter for property connectionHandler.
getConnections()
Deprecated, for removal: This API element is subject to removal in a future version.
getDuplicateIDList()
Returns the IDs (cd_id column in database table) of duplicate structures.
int getDuplicates()
Returns the number of molecules that were not imported because they are duplicates.
int getEmptyStructures()
Returns the number of molecules that were not imported because they are empty structures.
boolean getEmptyStructuresAllowed()
Gets whether empty structures are allowed.
getErrorCause()
Retrieves the Throwable caught in the run() method.
getErrorMessage()
If an error occurs, this function returns the error message.
getFieldConnections()
Returns the specified table field - file field pairs.
getFieldNameList()
Returns field names in an SDfile.
static List<String> getFieldNameList(InputStream is, int linesToCheck)
Returns field names in an SDfile.
getImportedIDList()
Returns the IDs (cd_id column in database table) of imported structures.
int getImportedNumber()
Returns the number of imported molecules.
getInput()
Gets the source object.
getNote()
Returns the note of the ProgressWriter.
long getProgress()
Gets the status of the importing progress.
getProgressWriter()
Gets the ProgressWriter object used for monitoring.
int getRecordsToCheck()
Gets the number of records to check for file format.
boolean getSetChiralFlag()
Gets whether chiral flag is set on import.
int getSkip()
Gets the number of molecules to skip from the beginning of the file.
int getStructCount()
Returns the current count of structures which were examined by the import process.
getTableName()
Gets the name of the table to import into.
int importMols()
Imports molecules.
void init()
Initialization, checking given number of lines for file format and fields.
boolean isDuplicateImportAllowed()
Gets whether duplicate structures are allowed.
boolean isFinished()
Checks whether the import has finished.
boolean isHaltOnError()
Gets if import should stop when an error occurs.
void run()
Starts execution as a thread.
void setConnectionHandler(ConnectionHandler conh)
Setter for property connectionHandler.
void setConnections(String connections)
Deprecated, for removal: This API element is subject to removal in a future version. Since 2.2, replaced by setFieldConnections(String).
void setDuplicateImportAllowed(boolean b)
Deprecated, for removal: This API element is subject to removal in a future version. Since JChem 5.4.
void setDuplicateImportAllowed(int duplicateFilteringOption)
Sets the duplicate filtering option on import.
void setEmptyStructuresAllowed(boolean b)
If set to false, does not import empty molecules.
void setFieldConnections(String connections)
Specifies which data fields correspond to which table fields.
void setHaltOnError(boolean b)
Sets if import should stop when an error occurs.
void setInfoStream(st)
Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).
void setInput(File inputFile)
Sets the source object as a file.
void setInput(InputStream is)
Sets the source object as a stream.
void setInput(String fileName)
Sets the source object as a file, specifying the name of the file.
void setNameFieldInDB(String fieldName)
Set a DB field to contain the structure name.
void setOutputOptions(boolean printDuplicates, boolean printNonDuplicates, OutputStream os, boolean doNotImport)
With this option one can print duplicate or non-duplicate molecules to a stream.
void setProgressWriter(ProgressWriter pwriter)
Sets the ProgressWriter object to track the progress of the actual importing.
void setRecordsToCheck(int recordsToCheck)
Sets the number of records to check for file format.
void setSetChiralFlag(boolean setChiralFlag)
Sets if chiral flag should be set to true during import.
void setSkip(int skip)
Sets the number of molecules to skip from the beginning of the file.
void setStoreDuplicates(boolean value)
Specifies whether the IDs of duplicate structures should be stored.
void setStoreImportedIDs(boolean value)
Specifies whether the IDs of imported structures should be stored.
void setTableName(String tname)
Sets the name of the table to import into.
void skip(int offset)
Skips the given number of molecules.
Methods inherited from class java.lang.Thread
activeCount, checkAccess, clone, countStackFrames, currentThread, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, onSpinWait, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, suspend, toString, yield
-
Constructor Details
-
Importer
public Importer()
Constructor.
-
-
Method Details
-
setConnectionHandler
Setter for property connectionHandler. The ConnectionHandler must represent an open connection to the database.
- Parameters:
conh
- the connection handler
-
getConnectionHandler
Getter for property connectionHandler.
- Returns:
- the connection handler
-
setInput
Sets the source object as a file.
- Parameters:
inputFile
- the source file
-
setInput
Sets the source object as a stream.
- Parameters:
is
- the source stream
-
setInput
Sets the source object as a file, specifying the name of the file.
- Parameters:
fileName
- the source file name
-
getInput
Gets the source object. The object may be a File or an InputStream.
- Returns:
- the source object
-
setTableName
Sets the name of the table to import into.
- Parameters:
tname
- the table name
-
getTableName
Gets the name of the table to import into.
- Returns:
- the table name
-
setConnections
@Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public void setConnections(String connections)
Deprecated, for removal: This API element is subject to removal in a future version. Since 2.2, replaced by setFieldConnections(String).
Specifies which data fields correspond to which table fields. The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"
- If an empty String, no connections are made to extra fields.
- If null, the fields are automatically connected by equal names (default).
- Parameters:
connections
- the connection string
-
setFieldConnections
Specifies which data fields correspond to which table fields. The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"
- If an empty String, no connections are made to extra fields.
- If null, the fields are automatically connected by equal names (default).
- Parameters:
connections
- the connection string
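The pair format described above can be illustrated with a small stand-alone sketch. This is not the parser Importer uses internally; the class name FieldConnectionSketch and the column names (cd_name, stock, NAME, STOCK_QTY) are hypothetical and exist only to show how a "databaseField=dataField" string is composed and split.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: shows the shape of the connection string, not Importer's own parsing. */
final class FieldConnectionSketch {

    /** Splits "dbField1=dataField1;dbField2=dataField2" into an ordered map (database field -> data field). */
    static Map<String, String> parse(String connections) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (connections == null || connections.isEmpty()) {
            // null (auto-connect by equal names) and "" (no extra fields) carry no pairs to split
            return pairs;
        }
        for (String pair : connections.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0 || eq == pair.length() - 1) {
                throw new IllegalArgumentException("Expected databaseField=dataField, got: " + pair);
            }
            pairs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return pairs;
    }

    /** Builds the connection string from a map, in iteration order. */
    static String format(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical table columns on the left, hypothetical SDfile fields on the right.
        Map<String, String> pairs = parse("cd_name=NAME;stock=STOCK_QTY");
        System.out.println(pairs);
        System.out.println(format(pairs));
    }
}
```

Building the string from a map, as format does, keeps the database field on the left of each "=" and avoids hand-typed separator mistakes.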
-
getConnections
Deprecated, for removal: This API element is subject to removal in a future version. Since 2.2, replaced by getFieldConnections().
Returns the specified table field - file field pairs.
- Returns:
- the connection string
-
getFieldConnections
Returns the specified table field - file field pairs.
- Returns:
- the connection string
-
setRecordsToCheck
public void setRecordsToCheck(int recordsToCheck)
Sets the number of records to check for file format. The same number of records will be checked for field names in the case of SDfiles.
Default value is 500 records.
Note: When an InputStream is used as the source, these records are buffered in memory. Make sure Java has enough memory when setting this value very high (-Xmx parameter).
Using a File as input is recommended where feasible, since it does not need buffering.
- Parameters:
recordsToCheck
- the number of records to check for file format
-
getRecordsToCheck
public int getRecordsToCheck()
Gets the number of records to check for file format.
- Returns:
- the number of records to check for file format
-
setProgressWriter
Sets the ProgressWriter object to track the progress of the actual importing. (Format checking and skipping are not monitored by this object.)
It can be null if no monitoring is necessary.
- Parameters:
pwriter
- the progress writer
-
getProgressWriter
Gets the ProgressWriter object used for monitoring.
- Returns:
- the progress writer
-
setHaltOnError
public void setHaltOnError(boolean b)
Sets if import should stop when an error occurs.
- Parameters:
b
- true if halt on error
-
isHaltOnError
public boolean isHaltOnError()
Gets if import should stop when an error occurs.
- Returns:
- true if halt on error
-
setDuplicateImportAllowed
@Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public void setDuplicateImportAllowed(boolean b)
Deprecated, for removal: This API element is subject to removal in a future version. Since JChem 5.4, this import option has become a table option; use the setDuplicateImportAllowed(int) method instead.
If set to false, does not import molecules that already exist in the table with the same topology. This checking may slow down the import progress.
- Parameters:
b
- new value for duplicateFiltering
-
setDuplicateImportAllowed
public void setDuplicateImportAllowed(int duplicateFilteringOption)
Sets the duplicate filtering option on import. It can be banned, allowed or specified by the table option.
- Parameters:
duplicateFilteringOption
- If set to DUPLICATE_FILTERING_ON, does not import molecules that already exist in the table with the same topology. Forces switching ON duplicate filtering regardless of table setting. This checking may slow down the import progress.
- If set to DUPLICATE_FILTERING_OFF, duplicates are allowed. Forces switching OFF duplicate filtering regardless of table setting.
- If set to DUPLICATE_FILTERING_TABLE_OPTION, the value of the table option (StructureTableOptions.isDuplicateFiltering()) controls the filtering of duplicates.
-
isDuplicateImportAllowed
public boolean isDuplicateImportAllowed()
Gets whether duplicate structures are allowed.
- Returns:
- true if duplicates are allowed
- Throws:
IllegalArgumentException
- if duplicate filtering option of the table cannot be determined.
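How the three options of setDuplicateImportAllowed(int) relate to the boolean reported by isDuplicateImportAllowed() can be sketched as a small decision function. This is a reading of the descriptions above, not the library's code: the enum Option is a hypothetical stand-in for the int constants (whose values are not given here), and tableFiltersDuplicates stands in for StructureTableOptions.isDuplicateFiltering().

```java
/** Illustrative only: models the documented ON / OFF / TABLE_OPTION behaviour with stand-in types. */
final class DuplicateFilteringSketch {

    /** Hypothetical stand-in for DUPLICATE_FILTERING_ON, _OFF and _TABLE_OPTION. */
    enum Option { ON, OFF, TABLE_OPTION }

    /** Returns true if duplicate filtering is effectively active for the import. */
    static boolean filteringActive(Option option, boolean tableFiltersDuplicates) {
        switch (option) {
            case ON:
                return true;                    // forced on, table setting ignored
            case OFF:
                return false;                   // forced off, table setting ignored
            default:
                return tableFiltersDuplicates;  // TABLE_OPTION: defer to the table
        }
    }

    /** Duplicates are allowed exactly when filtering is not active. */
    static boolean duplicatesAllowed(Option option, boolean tableFiltersDuplicates) {
        return !filteringActive(option, tableFiltersDuplicates);
    }

    public static void main(String[] args) {
        for (Option option : Option.values()) {
            for (boolean table : new boolean[] {false, true}) {
                System.out.println(option + ", table filtering=" + table
                        + " -> duplicates allowed=" + duplicatesAllowed(option, table));
            }
        }
    }
}
```

Only TABLE_OPTION depends on the table, which is consistent with isDuplicateImportAllowed() being documented to throw when the table's option cannot be determined.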
-
setEmptyStructuresAllowed
public void setEmptyStructuresAllowed(boolean b)
If set to false, does not import empty molecules.
- Parameters:
b
- set to true if empty structures are allowed
-
getEmptyStructuresAllowed
public boolean getEmptyStructuresAllowed()
Gets whether empty structures are allowed.
- Returns:
- true if empty structures are allowed
-
setSetChiralFlag
public void setSetChiralFlag(boolean setChiralFlag)
Sets if chiral flag should be set to true during import.
- Parameters:
setChiralFlag
- if set to true, chiral flag is set to true for imported molecules. The default setting is false.
- Since:
- 2.3
-
getSetChiralFlag
public boolean getSetChiralFlag()
Gets whether chiral flag is set on import.
- Returns:
- the current state
-
setNameFieldInDB
Set a DB field to contain the structure name.
- Parameters:
fieldName
- the name of the column in the database
-
getNameFieldInDB
-
isFinished
public boolean isFinished()
Checks whether the import has finished.
- Returns:
- true if importing has finished, else returns false.
-
getErrorMessage
If an error occurs, this function returns the error message.
- Returns:
- the error message
-
getErrorCause
Retrieves the Throwable caught in the run() method. WARNING: This mechanism is expected to be revised in the near future, use with extreme caution!
- Returns:
- the error cause or null
-
getStructCount
public int getStructCount()
Returns the current count of structures which were examined by the import process.
- Returns:
- the number of molecules imported so far
-
getImportedNumber
public int getImportedNumber()
Returns the number of imported molecules.
- Returns:
- the number of molecules imported
-
getDuplicates
public int getDuplicates()
Returns the number of molecules that were not imported because they are duplicates.
- Returns:
- the number of duplicates encountered
-
getEmptyStructures
public int getEmptyStructures()
Returns the number of molecules that were not imported because they are empty structures.
- Returns:
- the count of empty structures skipped
-
getNote
Returns the note of the ProgressWriter.
- Returns:
- the note
-
setSkip
public void setSkip(int skip)
Sets the number of molecules to skip from the beginning of the file.
- Parameters:
skip
- the number of molecules to skip
-
getSkip
public int getSkip()
Gets the number of molecules to skip from the beginning of the file.
- Returns:
- the number of molecules to skip
-
getProgress
public long getProgress()
Gets the status of the importing progress.
- Returns:
- the position of the ProgressWriter, -1 if the object is not set (null)
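Since Importer extends Thread and exposes isFinished(), counters and cancel(), a caller can start it on its own thread and poll for completion. The sketch below shows that pattern with a hypothetical stand-in class, FakeImport, so that it runs without JChem or a database; the method names mirror those documented here, but the bodies are invented for illustration and say nothing about how Importer is implemented.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: start-and-poll pattern with a stand-in for a long-running import thread. */
final class PollingSketch {

    /** Hypothetical stand-in for Importer: a Thread with isFinished(), a counter and cancel(). */
    static final class FakeImport extends Thread {
        private final int total;
        private final AtomicInteger imported = new AtomicInteger();
        private volatile boolean finished;
        private volatile boolean cancelled;

        FakeImport(int total) {
            this.total = total;
        }

        @Override
        public void run() {
            try {
                while (!cancelled && imported.get() < total) {
                    Thread.sleep(1);            // pretend to import one record
                    imported.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                finished = true;                // set on success, cancel and failure alike
            }
        }

        boolean isFinished() { return finished; }
        int getImportedNumber() { return imported.get(); }
        void cancel() { cancelled = true; }
    }

    /** Starts the job on its own thread, polls until it reports completion, returns the final count. */
    static int runAndPoll(int total, long pollMillis) {
        FakeImport job = new FakeImport(total);
        job.start();                            // start(), not run(): run() would block the caller
        try {
            while (!job.isFinished()) {
                System.out.println("imported so far: " + job.getImportedNumber());
                Thread.sleep(pollMillis);
            }
            job.join();
        } catch (InterruptedException e) {
            job.cancel();
            Thread.currentThread().interrupt();
        }
        return job.getImportedNumber();
    }

    public static void main(String[] args) {
        System.out.println("done, imported " + runAndPoll(50, 10));
    }
}
```

With the real class, a caller would presumably also check getErrorMessage() or getErrorCause() after the loop, since run() is documented to catch exceptions rather than propagate them.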
-
run
public void run()
Starts execution as a thread. Calls init(), skip(), and importMols(). Exceptions are caught and printed to stderr.
-
setInfoStream
Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).
- Parameters:
st
- the stream. The default is null (no info is written).
-
setOutputOptions
public void setOutputOptions(boolean printDuplicates, boolean printNonDuplicates, OutputStream os, boolean doNotImport)
With this option one can print duplicate or non-duplicate molecules to a stream. Will print only if duplicate filtering is allowed.
- Parameters:
printDuplicates
- prints indexes of duplicate structures
printNonDuplicates
- prints indexes of imported structures
os
- outputStream to write information, if null stdout is used.
doNotImport
- disables the real importing, just generates the index entry.
-
setStoreDuplicates
public void setStoreDuplicates(boolean value)
Specifies whether the IDs of duplicate structures should be stored.
- Parameters:
value
- enable/disable storing of duplicates
-
setStoreImportedIDs
public void setStoreImportedIDs(boolean value)
Specifies whether the IDs of imported structures should be stored.
- Parameters:
value
- enable/disable storing of ids
- Since:
- JChem 3.1.7
-
getDuplicateIDList
Returns the IDs (cd_id column in database table) of duplicate structures.
- Returns:
- the IDs as a list containing Integer objects.
-
getImportedIDList
Returns the IDs (cd_id column in database table) of imported structures.
- Returns:
- the IDs as a list containing Integer objects.
-
importMols
Imports molecules.
- Returns:
- the number of molecules imported
- Throws:
TransferException
- if the settings are invalid
-
cancel
public void cancel()
Stops the importing progress.
-
skip
Skips the given number of molecules.
- Parameters:
offset
- the number of molecules to be skipped
- Throws:
TransferException
- if the settings are invalid
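What skipping a number of molecules means for an SDfile can be sketched on plain text, assuming the input is an SDfile in which each record ends with a "$$$$" line. This is not Importer's implementation, which handles other formats as well; the class SkipRecordsSketch and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Illustrative only: skips leading SDfile records by counting "$$$$" terminator lines. */
final class SkipRecordsSketch {

    /** Consumes lines until count record terminators have been read; returns the number of records skipped. */
    static int skipRecords(BufferedReader reader, int count) throws IOException {
        int skipped = 0;
        String line;
        while (skipped < count && (line = reader.readLine()) != null) {
            if (line.startsWith("$$$$")) {
                skipped++;
            }
        }
        return skipped;
    }

    /** Demo helper: returns the text that remains after skipping count records. */
    static String remainderAfterSkipping(String sdf, int count) {
        try (BufferedReader reader = new BufferedReader(new StringReader(sdf))) {
            skipRecords(reader, count);
            StringBuilder rest = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                rest.append(line).append('\n');
            }
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Record bodies are abbreviated to a single line each for the demo.
        String sdf = "record A\n$$$$\nrecord B\n$$$$\nrecord C\n$$$$\n";
        System.out.print(remainderAfterSkipping(sdf, 2));
    }
}
```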
-
init
Initialization, checking given number of lines for file format and fields. If not called explicitly, automatically called by skip or importMols if necessary.
- Throws:
TransferException
- if the settings are invalid
-
getFieldNameList
Returns field names in an SDfile. The file may come from an InputStream; import may follow without reopening the stream. Calls init if initialization is necessary.
- Returns:
- a vector of String objects, the names of the SDfile fields.
- Throws:
TransferException
- in case of failure
IOException
- if an IOException occurs
-
getFieldNameList
public static List<String> getFieldNameList(InputStream is, int linesToCheck) throws IOException, MRecordParseException
Returns field names in an SDfile. NOTE: in order to return to the initial position, the InputStream has to be reopened or repositioned (BufferedInputStream).
- Parameters:
is
- inputStream to read from
linesToCheck
- read ahead this many lines during field name identification
- Returns:
- a vector of String objects, the names of the SDfile fields.
- Throws:
IOException
- if the inputstream encounters a problem
MRecordParseException
- if a record could not be read.
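The repositioning mentioned in the note can be done with the mark/reset facility of java.io.BufferedInputStream. The sketch below peeks at SDfile data headers (lines of the form "> <NAME>") in the first few lines and then rewinds the stream. It does not call the JChem method; the class MarkResetSketch, its regular expression and the sample record are assumptions made for illustration, and the read limit passed to mark must be at least as large as the number of bytes consumed while peeking.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: peek at SDfile field names, then rewind with mark/reset. */
final class MarkResetSketch {

    /** Matches SDfile data header lines such as "> <NAME>" and captures the field name. */
    private static final Pattern DATA_HEADER = Pattern.compile("^>.*<([^<>]+)>");

    /** Reads up to linesToCheck complete lines, collects field names, then resets to the marked position. */
    static List<String> peekFieldNames(BufferedInputStream in, int linesToCheck, int readLimit) {
        try {
            in.mark(readLimit);                 // remember the current position
            Set<String> names = new LinkedHashSet<>();
            StringBuilder line = new StringBuilder();
            int lines = 0;
            int b;
            while (lines < linesToCheck && (b = in.read()) != -1) {
                if (b == '\n') {
                    Matcher m = DATA_HEADER.matcher(line);
                    if (m.find()) {
                        names.add(m.group(1));
                    }
                    line.setLength(0);
                    lines++;
                } else if (b != '\r') {
                    line.append((char) b);
                }
            }
            in.reset();                         // rewind; fails if more than readLimit bytes were read
            return new ArrayList<>(names);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: reads everything left in the stream as ASCII text. */
    static String readRemaining(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated, hypothetical record: only the data items matter for this demo.
        byte[] sdf = "mol1\n\n\nM  END\n> <NAME>\naspirin\n\n> <MW>\n180.16\n\n$$$$\n"
                .getBytes(StandardCharsets.US_ASCII);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(sdf));
        System.out.println(peekFieldNames(in, 100, sdf.length + 1));
        System.out.println(readRemaining(in).length() == sdf.length); // whole stream still readable
    }
}
```

For large inputs the read limit bounds how much is held in memory while peeking, which is the same trade-off the setRecordsToCheck note describes for InputStream sources.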
-