Package chemaxon.jchem.db
Class Importer
java.lang.Object
  java.lang.Thread
    chemaxon.jchem.db.Importer
All Implemented Interfaces:
chemaxon.jchem.db.Transfer, Runnable
@PublicAPI public class Importer extends Thread implements chemaxon.jchem.db.Transfer
Tool for importing molecules into database tables from a File or InputStream object. For an example of usage, see File Import/Export Tools.
Nested Class Summary
Nested classes/interfaces inherited from class java.lang.Thread
Thread.State, Thread.UncaughtExceptionHandler
Field Summary
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
Constructor Summary
- Importer(): Constructor.
Method Summary
- void cancel(): Stops the import process.
- ConnectionHandler getConnectionHandler(): Getter for property connectionHandler.
- String getConnections(): Deprecated. Since 2.2, replaced by getFieldConnections().
- List<Integer> getDuplicateIDList(): Returns the IDs (cd_id column in database table) of duplicate structures.
- int getDuplicates(): Returns the number of molecules that were not imported because they are duplicates.
- int getEmptyStructures(): Returns the number of molecules that were not imported because they are empty structures.
- boolean getEmptyStructuresAllowed(): Gets whether empty structures are allowed.
- Throwable getErrorCause(): Retrieves the Throwable caught in the run() method.
- String getErrorMessage(): If an error occurs, returns the error message.
- String getFieldConnections(): Returns the specified table field/file field pairs.
- List<String> getFieldNameList(): Returns the field names in an SDfile.
- static List<String> getFieldNameList(InputStream is, int linesToCheck): Returns the field names in an SDfile.
- List<Integer> getImportedIDList(): Returns the IDs (cd_id column in database table) of imported structures.
- int getImportedNumber(): Returns the number of imported molecules.
- Object getInput(): Gets the source object.
- String getNameFieldInDB()
- String getNote(): Returns the note of the ProgressWriter.
- long getProgress(): Gets the status of the import progress.
- ProgressWriter getProgressWriter(): Gets the ProgressWriter object used for monitoring.
- int getRecordsToCheck(): Gets the number of records to check for file format.
- boolean getSetChiralFlag(): Gets whether the chiral flag is set on import.
- int getSkip(): Gets the number of molecules to skip from the beginning of the file.
- int getStructCount(): Returns the current count of structures examined by the import process.
- String getTableName(): Gets the name of the table to import into.
- int importMols(): Imports molecules.
- void init(): Initialization: checks the given number of lines for file format and fields.
- boolean isDuplicateImportAllowed(): Gets whether duplicate structures are allowed.
- boolean isFinished(): Checks whether the import has finished.
- boolean isHaltOnError(): Gets whether the import stops when an error occurs.
- void run(): Starts execution as a thread.
- void setConnectionHandler(ConnectionHandler conh): Setter for property connectionHandler.
- void setConnections(String connections): Deprecated. Since 2.2, replaced by setFieldConnections(String).
- void setDuplicateImportAllowed(boolean b): Deprecated. Since JChem 5.4.
- void setDuplicateImportAllowed(int duplicateFilteringOption): Sets the duplicate filtering option on import.
- void setEmptyStructuresAllowed(boolean b): If set to false, empty molecules are not imported.
- void setFieldConnections(String connections): Specifies which data fields correspond to which table fields.
- void setHaltOnError(boolean b): Sets whether the import should stop when an error occurs.
- void setInfoStream(PrintStream st): Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).
- void setInput(File inputFile): Sets the source object as a file.
- void setInput(InputStream is): Sets the source object as a stream.
- void setInput(String fileName): Sets the source object as a file, specifying the name of the file.
- void setNameFieldInDB(String fieldName): Sets the DB field that contains the structure name.
- void setOutputOptions(boolean printDuplicates, boolean printNonDuplicates, OutputStream os, boolean doNotImport): With this option, duplicate or non-duplicate molecules can be printed to a stream.
- void setProgressWriter(ProgressWriter pwriter): Sets the ProgressWriter object used to track the progress of the actual import.
- void setRecordsToCheck(int recordsToCheck): Sets the number of records to check for file format.
- void setSetChiralFlag(boolean setChiralFlag): Sets whether the chiral flag should be set to true during import.
- void setSkip(int skip): Sets the number of molecules to skip from the beginning of the file.
- void setStoreDuplicates(boolean value): Specifies whether the IDs of duplicate structures should be stored.
- void setStoreImportedIDs(boolean value): Specifies whether the IDs of imported structures should be stored.
- void setTableName(String tname): Sets the name of the table to import into.
- void skip(int offset): Skips the given number of molecules.
Methods inherited from class java.lang.Thread
activeCount, checkAccess, clone, countStackFrames, currentThread, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, onSpinWait, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, suspend, toString, yield
Method Detail
-
setConnectionHandler
public void setConnectionHandler(ConnectionHandler conh)
Setter for property connectionHandler. The ConnectionHandler must represent an open connection to the database.
- Parameters:
conh
- the connection handler
-
getConnectionHandler
public ConnectionHandler getConnectionHandler()
Getter for property connectionHandler.
- Returns:
- the connection handler
-
setInput
public void setInput(File inputFile)
Sets the source object as a file.
- Parameters:
inputFile
- the source file
-
setInput
public void setInput(InputStream is)
Sets the source object as a stream.
- Parameters:
is
- the source stream
-
setInput
public void setInput(String fileName)
Sets the source object as a file, specifying the name of the file.
- Parameters:
fileName
- the source file name
-
getInput
public Object getInput()
Gets the source object. The object may be a File or an InputStream.
- Returns:
- the source object
-
setTableName
public void setTableName(String tname)
Sets the name of the table to import into.
- Parameters:
tname
- the table name
-
getTableName
public String getTableName()
Gets the name of the table to import into.
- Returns:
- the table name
-
setConnections
@Deprecated public void setConnections(String connections)
Deprecated. Since 2.2, replaced by setFieldConnections(String).
Specifies which data fields correspond to which table fields. The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"
- If an empty String, no connections are made to extra fields.
- If null, the fields are automatically connected by equal names (default).
- Parameters:
connections
- the connection string
-
setFieldConnections
public void setFieldConnections(String connections)
Specifies which data fields correspond to which table fields. The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"
- If an empty String, no connections are made to extra fields.
- If null, the fields are automatically connected by equal names (default).
- Parameters:
connections
- the connection string
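The pair format above can be produced and checked with a small helper. The sketch below is illustrative only: FieldConnectionStrings, build and parse are hypothetical names, not part of the JChem API, and the code assumes that field names contain neither '=' nor ';'. An empty map builds the empty String, which per the description above means no connections to extra fields; null (automatic matching by equal names) has no explicit pairs, so parse rejects it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.StringJoiner;

// Hypothetical helper, not part of the JChem API: builds and parses the
// "databaseField1=dataField1;databaseField2=dataField2" format.
// Assumes field names contain no '=' and no ';'.
final class FieldConnectionStrings {
    private FieldConnectionStrings() {}

    /** Joins table-field -> data-field pairs; an empty map yields "" (no extra fields). */
    static String build(Map<String, String> tableToDataField) {
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, String> e : tableToDataField.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    /** Splits a non-null connection string into ordered table-field -> data-field pairs. */
    static Map<String, String> parse(String connections) {
        Objects.requireNonNull(connections, "null means automatic matching by name; nothing to parse");
        Map<String, String> pairs = new LinkedHashMap<>();
        if (connections.isEmpty()) {
            return pairs; // empty string: no connections to extra fields
        }
        for (String pair : connections.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0 || eq == pair.length() - 1) {
                throw new IllegalArgumentException("malformed pair: " + pair);
            }
            pairs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return pairs;
    }
}
```

With a real Importer instance, the built string would then be passed to setFieldConnections(String).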
-
getConnections
@Deprecated public String getConnections()
Deprecated. Since 2.2, replaced by getFieldConnections().
Returns the specified table field/file field pairs.
- Returns:
- the connection string
-
getFieldConnections
public String getFieldConnections()
Returns the specified table field/file field pairs.
- Returns:
- the connection string
-
setRecordsToCheck
public void setRecordsToCheck(int recordsToCheck)
Sets the number of records to check for file format. In the case of SDfiles, the same number of records is also checked for field names.
The default value is 500 records.
Note: when an InputStream is used as the source, these records are buffered in memory. Make sure the JVM has enough heap memory (-Xmx option) when setting this value very high.
Using a File as input is recommended when feasible, since it does not need buffering.
- Parameters:
recordsToCheck
- the number of records to check for file format
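Why an InputStream source costs memory can be illustrated with the standard mark/reset mechanism: records read ahead for format checking must stay buffered so that the stream can be rewound before the import starts. The sketch below assumes SDfile input with records terminated by "$$$$" lines; RecordPeek and peekRecords are hypothetical names, and it uses a BufferedReader (a BufferedInputStream offers the same mark(int)/reset() pair at the byte level). It is not how JChem implements format checking.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JChem code: peeks at the first recordsToCheck
// SDfile records and rewinds. Everything read between mark() and reset()
// has to be held in the reader's buffer, so memory grows with the look-ahead.
final class RecordPeek {
    private RecordPeek() {}

    static List<String> peekRecords(BufferedReader in, int recordsToCheck, int readAheadLimit)
            throws IOException {
        in.mark(readAheadLimit); // characters read until reset() stay buffered
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String line;
        while (records.size() < recordsToCheck && (line = in.readLine()) != null) {
            current.append(line).append('\n');
            if (line.equals("$$$$")) { // SDfile record delimiter
                records.add(current.toString());
                current.setLength(0);
            }
        }
        // Rewind so the caller can import from the first record; this fails
        // with an IOException if more than readAheadLimit characters were read.
        in.reset();
        return records;
    }
}
```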
-
getRecordsToCheck
public int getRecordsToCheck()
Gets the number of records to check for file format.
- Returns:
- the number of records to check for file format
-
setProgressWriter
public void setProgressWriter(ProgressWriter pwriter)
Sets the ProgressWriter object used to track the progress of the actual import. (Format checking and skipping are not monitored by this object.)
It can be null if no monitoring is necessary.
- Parameters:
pwriter
- the progress writer
-
getProgressWriter
public ProgressWriter getProgressWriter()
Gets the ProgressWriter object used for monitoring.
- Returns:
- the progress writer
-
setHaltOnError
public void setHaltOnError(boolean b)
Sets whether the import should stop when an error occurs.
- Parameters:
b
- true if the import should halt on error
-
isHaltOnError
public boolean isHaltOnError()
Gets whether the import stops when an error occurs.
- Returns:
true if the import halts on error
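The effect of the halt-on-error switch on a record-by-record loop can be sketched as follows. HaltOnErrorLoop and importAll are hypothetical names; the sketch assumes that an "error" is an exception thrown while importing a single record, and it does not model what the real Importer does with rows written before a halt, which the documentation above does not specify.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not JChem code: control flow of a halt-on-error
// switch in a loop that imports one record at a time.
final class HaltOnErrorLoop {
    private HaltOnErrorLoop() {}

    /** Returns the number of records processed without an exception. */
    static <T> int importAll(List<T> records, Consumer<T> importOne, boolean haltOnError) {
        int ok = 0;
        for (T record : records) {
            try {
                importOne.accept(record);
                ok++;
            } catch (RuntimeException e) {
                if (haltOnError) {
                    throw e; // stop at the first failing record
                }
                // otherwise skip the failing record and continue with the next one
            }
        }
        return ok;
    }
}
```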
-
setDuplicateImportAllowed
@Deprecated public void setDuplicateImportAllowed(boolean b)
Deprecated. Since JChem 5.4. This import option has become a table option; use setDuplicateImportAllowed(int) instead.
If set to false, molecules that already exist in the table with the same topology are not imported. This check may slow down the import process.
- Parameters:
b
- new value for duplicateFiltering
- See Also:
DatabaseProperties.setDuplicateFilteringOption(String, boolean), StructureTableOptions.isDuplicateFiltering()
-
setDuplicateImportAllowed
public void setDuplicateImportAllowed(int duplicateFilteringOption)
Sets the duplicate filtering option on import. Duplicates can be banned, allowed, or handled according to the table option.
- Parameters:
duplicateFilteringOption
- one of the following:
- DUPLICATE_FILTERING_ON: molecules that already exist in the table with the same topology are not imported. Forces duplicate filtering ON regardless of the table setting. This check may slow down the import process.
- DUPLICATE_FILTERING_OFF: duplicates are allowed. Forces duplicate filtering OFF regardless of the table setting.
- DUPLICATE_FILTERING_TABLE_OPTION: the value of the table option (StructureTableOptions.isDuplicateFiltering()) controls the filtering of duplicates.
- See Also:
UpdateHandler.DUPLICATE_FILTERING_ON, UpdateHandler.DUPLICATE_FILTERING_OFF, UpdateHandler.DUPLICATE_FILTERING_TABLE_OPTION, StructureTableOptions.isDuplicateFiltering()
-
isDuplicateImportAllowed
public boolean isDuplicateImportAllowed()
Gets whether duplicate structures are allowed.
- Returns:
- true if duplicates are allowed
- Throws:
IllegalArgumentException
- if duplicate filtering option of the table cannot be determined.
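The three duplicate filtering options of setDuplicateImportAllowed(int) reduce to a simple decision, with the table setting consulted only for DUPLICATE_FILTERING_TABLE_OPTION. The sketch below is a hypothetical model: the real options are the UpdateHandler int constants listed above, whose values are not given here, so an enum (DuplicateFiltering) stands in for them, and a null table setting models the case in which isDuplicateImportAllowed() throws IllegalArgumentException.

```java
// Hypothetical stand-in for the UpdateHandler.DUPLICATE_FILTERING_* int
// constants, used only to keep this sketch self-contained.
enum DuplicateFiltering { ON, OFF, TABLE_OPTION }

// Hypothetical model, not JChem code.
final class DuplicatePolicy {
    private DuplicatePolicy() {}

    /**
     * Returns true if duplicates may be imported.
     * tableFiltering is the table's duplicate-filtering setting, or null if unknown.
     */
    static boolean isDuplicateImportAllowed(DuplicateFiltering option, Boolean tableFiltering) {
        switch (option) {
            case ON:
                return false; // filtering forced on: duplicates are rejected
            case OFF:
                return true;  // filtering forced off: duplicates are imported
            case TABLE_OPTION:
                if (tableFiltering == null) {
                    throw new IllegalArgumentException(
                            "duplicate filtering option of the table cannot be determined");
                }
                return !tableFiltering; // table filtering on means duplicates are not allowed
            default:
                throw new AssertionError(option);
        }
    }
}
```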
-
setEmptyStructuresAllowed
public void setEmptyStructuresAllowed(boolean b)
If set to false, empty molecules are not imported.
- Parameters:
b
- set to true if empty structures are allowed
-
getEmptyStructuresAllowed
public boolean getEmptyStructuresAllowed()
Gets whether empty structures are allowed.
- Returns:
- true if empty structures are allowed
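The documentation does not define "empty structure" precisely. Assuming it means a molecule with no atoms, a check on an MDL molfile could read the atom count from the counts line, as in the hypothetical EmptyMolfileCheck below (V2000: the first three columns of the fourth line; V3000: the first number of the "M  V30 COUNTS" line). This is an illustration under that assumption, not JChem's own test.

```java
// Hypothetical sketch, not JChem code: assumes "empty" means zero atoms.
final class EmptyMolfileCheck {
    private EmptyMolfileCheck() {}

    static boolean isEmpty(String molfile) {
        String[] lines = molfile.split("\r?\n", -1);
        if (lines.length < 4) {
            throw new IllegalArgumentException("missing counts line");
        }
        String counts = lines[3]; // the header block is 3 lines; the counts line is the 4th
        if (counts.contains("V3000")) {
            // V3000 keeps the real counts in the "M  V30 COUNTS na nb ..." line
            String prefix = "M  V30 COUNTS ";
            for (String line : lines) {
                if (line.startsWith(prefix)) {
                    String[] fields = line.substring(prefix.length()).trim().split("\\s+");
                    return Integer.parseInt(fields[0]) == 0;
                }
            }
            throw new IllegalArgumentException("missing V3000 COUNTS line");
        }
        // V2000: the atom count occupies the first three columns of the counts line
        String atoms = counts.length() >= 3 ? counts.substring(0, 3) : counts;
        return Integer.parseInt(atoms.trim()) == 0;
    }
}
```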
-
setSetChiralFlag
public void setSetChiralFlag(boolean setChiralFlag)
Sets whether the chiral flag should be set to true during import.
- Parameters:
setChiralFlag
- if set to true, the chiral flag is set to true for imported molecules. The default setting is false.
- Since:
- 2.3
-
getSetChiralFlag
public boolean getSetChiralFlag()
Gets whether the chiral flag is set on import.
- Returns:
- the current state
-
setNameFieldInDB
public void setNameFieldInDB(String fieldName)
Sets the DB field that contains the structure name.
- Parameters:
fieldName
- the name of the column in the database
-
getNameFieldInDB
public String getNameFieldInDB()
Gets the name of the DB field that contains the structure name.
-
isFinished
public boolean isFinished()
Checks whether the import has finished.
- Returns:
true if importing has finished, false otherwise.
-
getErrorMessage
public String getErrorMessage()
If an error occurs, returns the error message.
- Returns:
- the error message
-
getErrorCause
public Throwable getErrorCause()
Retrieves the Throwable caught in the run() method. WARNING: this mechanism is expected to be revised in the near future; use with extreme caution!
- Returns:
- the error cause, or null
-
getStructCount
public int getStructCount()
Returns the current count of structures examined by the import process.
- Returns:
- the number of structures examined so far
-
getImportedNumber
public int getImportedNumber()
Returns the number of imported molecules.
- Returns:
- the number of molecules imported
-
getDuplicates
public int getDuplicates()
Returns the number of molecules that were not imported because they are duplicates.
- Returns:
- the number of duplicates encountered
-
getEmptyStructures
public int getEmptyStructures()
Returns the number of molecules that were not imported because they are empty structures.
- Returns:
- the count of empty structures skipped
-
getNote
public String getNote()
Returns the note of the ProgressWriter.
- Returns:
- the note
-
setSkip
public void setSkip(int skip)
Sets the number of molecules to skip from the beginning of the file.
- Parameters:
skip
- the number of molecules to skip
-
getSkip
public int getSkip()
Gets the number of molecules to skip from the beginning of the file.
- Returns:
- the number of molecules to skip
-
getProgress
public long getProgress()
Gets the status of the import progress.
- Returns:
- the position of the ProgressWriter, or -1 if no ProgressWriter is set (null)
-
run
public void run()
Starts execution as a thread. Calls init(), skip(), and importMols(). Exceptions are caught and printed to stderr.
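Because Importer extends Thread, the usual pattern is to call start() (which runs run() on a new thread) and then wait with join() or poll isFinished(). The stand-in class below, ToyImporter, is hypothetical and needs no database: it only mirrors the documented order init(), skip(), importMols() and captures exceptions instead of letting them escape the thread. It assumes that isFinished() also becomes true after a failure, which the documentation above does not state.

```java
// Hypothetical stand-in, not JChem code: mirrors the documented run()
// contract so the start()/join()/isFinished() pattern can be shown.
class ToyImporter extends Thread {
    private final StringBuilder log = new StringBuilder(); // read only after join()
    private final boolean failInImport;
    private volatile boolean finished;
    private volatile String errorMessage;

    ToyImporter(boolean failInImport) {
        this.failInImport = failInImport;
    }

    void init() { log.append("init;"); }

    void skip(int offset) { log.append("skip ").append(offset).append(';'); }

    int importMols() {
        log.append("import;");
        if (failInImport) {
            throw new IllegalStateException("simulated import failure");
        }
        return 0;
    }

    @Override
    public void run() {
        try {
            init();
            skip(0);
            importMols();
        } catch (RuntimeException e) {
            errorMessage = e.getMessage(); // the documented class prints to stderr instead
        } finally {
            finished = true; // assumption: finished is set on success and on failure
        }
    }

    boolean isFinished() { return finished; }

    String getErrorMessage() { return errorMessage; }

    String getLog() { return log.toString(); }
}
```

With a real, configured Importer the same shape applies: importer.start(), then importer.join() or a loop polling importer.isFinished() and importer.getProgress(), and finally a check of importer.getErrorMessage().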
-
setInfoStream
public void setInfoStream(PrintStream st)
Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).
- Parameters:
st
- the stream. The default is null (no information is written).
-
setOutputOptions
public void setOutputOptions(boolean printDuplicates, boolean printNonDuplicates, OutputStream os, boolean doNotImport)
With this option, duplicate or non-duplicate molecules can be printed to a stream. Output is printed only if duplicate filtering is enabled.
- Parameters:
printDuplicates
- prints indexes of duplicate structures
printNonDuplicates
- prints indexes of imported structures
os
- the OutputStream to write information to; if null, stdout is used
doNotImport
- disables the actual import; only the index entry is generated
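The selection performed by the printDuplicates and printNonDuplicates flags can be sketched as follows. DuplicateIndexPrinter is a hypothetical name; the actual output format of setOutputOptions is not documented here, so the sketch simply writes one index per line, 0-based (an assumption), and falls back to System.out when os is null, as described above.

```java
import java.io.OutputStream;
import java.io.PrintStream;

// Hypothetical sketch, not JChem code: chooses which record indexes to
// print based on the two flags; the output format is an assumption.
final class DuplicateIndexPrinter {
    private DuplicateIndexPrinter() {}

    static void print(boolean[] isDuplicate, boolean printDuplicates,
                      boolean printNonDuplicates, OutputStream os) {
        // Wrap the caller's stream without closing it, so it stays usable afterwards.
        PrintStream out = (os == null) ? System.out : new PrintStream(os, true);
        for (int i = 0; i < isDuplicate.length; i++) {
            boolean wanted = isDuplicate[i] ? printDuplicates : printNonDuplicates;
            if (wanted) {
                out.println(i);
            }
        }
        out.flush();
    }
}
```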
-
setStoreDuplicates
public void setStoreDuplicates(boolean value)
Specifies whether the IDs of duplicate structures should be stored.
- Parameters:
value
- enable/disable storing of duplicates
- See Also:
getDuplicateIDList()
-
setStoreImportedIDs
public void setStoreImportedIDs(boolean value)
Specifies whether the IDs of imported structures should be stored.
- Parameters:
value
- enable/disable storing of IDs
- Since:
- JChem 3.1.7
- See Also:
getImportedIDList()
-
getDuplicateIDList
public List<Integer> getDuplicateIDList()
Returns the IDs (cd_id column in database table) of duplicate structures.
- Returns:
- the IDs as a list containing Integer objects.
- See Also:
setStoreDuplicates(boolean)
-
getImportedIDList
public List<Integer> getImportedIDList()
Returns the IDs (cd_id column in database table) of imported structures.
- Returns:
- the IDs as a list containing Integer objects.
- See Also:
setStoreImportedIDs(boolean)
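ID storage is opt-in: the stored-ID lists are only filled when storing was enabled before the import. The hypothetical IdBookkeeping class below models that behavior. It assumes that a list whose storing is disabled stays empty (the real getters might behave differently, e.g. return null) and that a duplicate is reported with the cd_id of the row it matched; both are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model, not JChem code: opt-in collection of cd_id values.
final class IdBookkeeping {
    private final boolean storeImported;
    private final boolean storeDuplicates;
    private final List<Integer> importedIds = new ArrayList<>();
    private final List<Integer> duplicateIds = new ArrayList<>();

    IdBookkeeping(boolean storeImported, boolean storeDuplicates) {
        this.storeImported = storeImported;
        this.storeDuplicates = storeDuplicates;
    }

    /** Called once per inserted row with its new cd_id. */
    void recordImported(int cdId) {
        if (storeImported) {
            importedIds.add(cdId);
        }
    }

    /** Called once per rejected duplicate with the cd_id it matched (assumption). */
    void recordDuplicate(int cdId) {
        if (storeDuplicates) {
            duplicateIds.add(cdId);
        }
    }

    List<Integer> getImportedIDList() { return Collections.unmodifiableList(importedIds); }

    List<Integer> getDuplicateIDList() { return Collections.unmodifiableList(duplicateIds); }
}
```

With the real API, the corresponding calls are setStoreImportedIDs(true) and setStoreDuplicates(true) before importMols(), followed by getImportedIDList() and getDuplicateIDList().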
-
importMols
public int importMols() throws TransferException
Imports molecules.
- Returns:
- the number of molecules imported
- Throws:
TransferException
- if the settings are invalid
-
cancel
public void cancel()
Stops the import process.
-
skip
public void skip(int offset) throws TransferException
Skips the given number of molecules.
- Parameters:
offset
- the number of molecules to be skipped
- Throws:
TransferException
- if the settings are invalid
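Skipping records in an SDfile means reading past them, so the cost grows with the offset even though nothing is imported. The hypothetical SdRecordSkipper below counts "$$$$" delimiter lines; it assumes well-formed SDfile input and is not the parser JChem uses.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical sketch, not JChem code: skips `offset` SDfile records by
// consuming lines up to and including each "$$$$" delimiter.
final class SdRecordSkipper {
    private SdRecordSkipper() {}

    /** Returns the number of records actually skipped (fewer if the input ends early). */
    static int skip(BufferedReader in, int offset) throws IOException {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        int skipped = 0;
        String line;
        while (skipped < offset && (line = in.readLine()) != null) {
            if (line.equals("$$$$")) {
                skipped++; // one complete record consumed
            }
        }
        return skipped;
    }
}
```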
-
init
public void init() throws TransferException
Initialization: checks the given number of lines for file format and fields. If not called explicitly, it is called automatically by skip or importMols if necessary.
- Throws:
TransferException
- if the settings are invalid
-
getFieldNameList
public List<String> getFieldNameList() throws TransferException, IOException
Returns the field names in an SDfile. The file may come from an InputStream; the import may follow without reopening the stream. Calls init() if initialization is necessary.
- Returns:
- a list of String objects, the names of the SDfile fields
- Throws:
TransferException
- in case of failure
IOException
- if an I/O error occurs
-
getFieldNameList
public static List<String> getFieldNameList(InputStream is, int linesToCheck) throws IOException, MRecordParseException
Returns the field names in an SDfile. NOTE: to return to the initial position, the InputStream has to be reopened or repositioned (e.g. with a BufferedInputStream).
- Parameters:
is
- the InputStream to read from
linesToCheck
- read ahead this many lines during field name identification
- Returns:
- a list of String objects, the names of the SDfile fields
- Throws:
IOException
- if the InputStream encounters a problem
MRecordParseException
- if a record could not be read
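The static getFieldNameList reads ahead to discover SDfile data fields, which appear in data header lines such as ">  <MELTING.POINT>". The sketch below, with the hypothetical names SdFieldNames and fieldNames, collects the names found within the first linesToCheck lines in first-seen order. It is a heuristic (any line starting with '>' is treated as a data header) and, like the method documented above, it consumes the reader, so the input has to be reopened or repositioned before importing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not JChem code: extracts field names from SDfile
// data header lines like ">  <NAME>" or "> 25 <NAME>".
final class SdFieldNames {
    private SdFieldNames() {}

    static List<String> fieldNames(BufferedReader in, int linesToCheck) throws IOException {
        Set<String> names = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        String line;
        for (int n = 0; n < linesToCheck && (line = in.readLine()) != null; n++) {
            if (line.startsWith(">")) { // data header line
                int open = line.indexOf('<');
                int close = line.indexOf('>', open + 1);
                if (open >= 0 && close > open + 1) {
                    names.add(line.substring(open + 1, close));
                }
            }
        }
        return new ArrayList<>(names);
    }
}
```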