Package chemaxon.jchem.db
Class Importer
java.lang.Object
java.lang.Thread
chemaxon.jchem.db.Importer
- All Implemented Interfaces:
chemaxon.jchem.db.Transfer, Runnable
Tool for importing molecules into database tables from a File or InputStream object.
Example of usage: File Import/Export Tools.
-
Nested Class Summary
Nested classes/interfaces inherited from class java.lang.Thread
Thread.Builder, Thread.State, Thread.UncaughtExceptionHandler
-
Field Summary
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
Fields inherited from interface chemaxon.jchem.db.Transfer
CXSMARTS, CXSMILES, INCHI, MOL2FILE, MOLFILE, MRV, RDFILE, RXNFILE, SDFILE, SMARTS, SMILES
-
Constructor Summary
Constructors
- Importer(): Constructor.
-
Method Summary
- void cancel(): Stops the import process.
- ConnectionHandler getConnectionHandler(): Getter for property connectionHandler.
- String getConnections(): Deprecated, for removal: This API element is subject to removal in a future version.
- getDuplicateIDList(): Returns the IDs (cd_id column in database table) of duplicate structures.
- int getDuplicates(): Returns the number of molecules that were not imported because they are duplicates.
- int getEmptyStructures(): Returns the number of molecules that were not imported because they are empty structures.
- boolean getEmptyStructuresAllowed(): Gets whether empty structures are allowed.
- Throwable getErrorCause(): Retrieves the Throwable caught in the run() method.
- String getErrorMessage(): Returns the error message if an error occurs.
- String getFieldConnections(): Returns the specified table field - file field pairs.
- getFieldNameList(): Returns the field names in an SDfile.
- static List<String> getFieldNameList(InputStream is, int linesToCheck): Returns the field names in an SDfile.
- getImportedIDList(): Returns the IDs (cd_id column in database table) of imported structures.
- int getImportedNumber(): Returns the number of imported molecules.
- getInput(): Gets the source object.
- getNote(): Returns the note of the ProgressWriter.
- long getProgress(): Gets the current progress of the import.
- ProgressWriter getProgressWriter(): Gets the ProgressWriter object used for monitoring.
- int getRecordsToCheck(): Gets the number of records to check for file format.
- boolean getSetChiralFlag(): Gets whether the chiral flag is set on import.
- int getSkip(): Gets the number of molecules to skip from the beginning of the file.
- int getStructCount(): Returns the current count of structures examined by the import process.
- String getTableName(): Gets the name of the table to import into.
- int importMols(): Imports molecules.
- void init(): Initialization: checks the given number of lines for file format and fields.
- boolean isDuplicateImportAllowed(): Gets whether duplicate structures are allowed.
- boolean isFinished(): Checks whether the import has finished.
- boolean isHaltOnError(): Gets whether the import should stop when an error occurs.
- void run(): Starts execution as a thread.
- void setConnectionHandler(ConnectionHandler conh): Setter for property connectionHandler.
- void setConnections(String connections): Deprecated, for removal: This API element is subject to removal in a future version. Since 2.2, replaced by setFieldConnections(String).
- void setDuplicateImportAllowed(boolean b): Deprecated, for removal: This API element is subject to removal in a future version. Since JChem 5.4.
- void setDuplicateImportAllowed(int duplicateFilteringOption): Sets the duplicate filtering option on import.
- void setEmptyStructuresAllowed(boolean b): If set to false, empty molecules are not imported.
- void setFieldConnections(String connections): Specifies which data fields correspond to which table fields.
- void setHaltOnError(boolean b): Sets whether the import should stop when an error occurs.
- void setInfoStream(st): Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).
- void setInput(File inputFile): Sets the source object as a file.
- void setInput(InputStream is): Sets the source object as a stream.
- void setInput(String fileName): Sets the source object as a file, specifying the name of the file.
- void setNameFieldInDB(String fieldName): Sets a DB field to contain the structure name.
- void setOutputOptions(boolean printDuplicates, boolean printNonDuplicates, OutputStream os, boolean doNotImport): With this option one can print duplicate or non-duplicate molecules to a stream.
- void setProgressWriter(ProgressWriter pwriter): Sets the ProgressWriter object used to track the progress of the actual import.
- void setRecordsToCheck(int recordsToCheck): Sets the number of records to check for file format.
- void setSetChiralFlag(boolean setChiralFlag): Sets whether the chiral flag should be set to true during import.
- void setSkip(int skip): Sets the number of molecules to skip from the beginning of the file.
- void setStoreDuplicates(boolean value): Specifies whether the IDs of duplicate structures should be stored.
- void setStoreImportedIDs(boolean value): Specifies whether the IDs of imported structures should be stored.
- void setTableName(String tname): Sets the name of the table to import into.
- void skip(int offset): Skips the given number of molecules.
Methods inherited from class java.lang.Thread
activeCount, checkAccess, clone, countStackFrames, currentThread, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, isVirtual, join, join, join, join, ofPlatform, ofVirtual, onSpinWait, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, sleep, start, startVirtualThread, stop, suspend, threadId, toString, yield
-
Constructor Details
-
Importer
public Importer() Constructor.
-
-
Method Details
-
setConnectionHandler
Setter for property connectionHandler. The ConnectionHandler must represent an open connection to the database.
- Parameters:
conh - the connection handler
-
getConnectionHandler
Getter for property connectionHandler.
- Returns:
- the connection handler
-
setInput
Sets the source object as a file.
- Parameters:
inputFile - the source file
-
setInput
Sets the source object as a stream.
- Parameters:
is - the source stream
-
setInput
Sets the source object as a file, specifying the name of the file.
- Parameters:
fileName - the source file name
-
getInput
Gets the source object. The object may be a File or an InputStream.
- Returns:
- the source object
-
setTableName
Sets the name of the table to import into.
- Parameters:
tname - the table name
-
getTableName
Gets the name of the table to import into.
- Returns:
- the table name
-
setConnections
@Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public void setConnections(String connections) Deprecated, for removal: This API element is subject to removal in a future version. Since 2.2, replaced by setFieldConnections(String).
Specifies which data fields correspond to which table fields. The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"
- If an empty String, no connections are made to extra fields.
- If null, the fields are automatically connected by equal names (default).
- Parameters:
connections - the connection string
-
setFieldConnections
Specifies which data fields correspond to which table fields. The fields are given in pairs in the following format: "databaseField1=dataField1;databaseField2=dataField2"
- If an empty String, no connections are made to extra fields.
- If null, the fields are automatically connected by equal names (default).
- Parameters:
connections - the connection string
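To make the three cases above concrete, here is a minimal Java sketch of how such a connection string could be interpreted. FieldConnectionSketch and resolve are hypothetical names, not part of the JChem API; the table columns and file fields are passed in explicitly, and the case-sensitive name matching used for the null case is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not JChem API): turns a connection string into a column -> field map. */
final class FieldConnectionSketch {

    private FieldConnectionSketch() {}

    /**
     * @param connections  "databaseField1=dataField1;...", "" for no extra fields,
     *                     or null to connect fields by equal names
     * @param tableColumns columns of the target table (assumed to be known)
     * @param fileFields   data fields found in the input file (assumed to be known)
     * @return an insertion-ordered map from table column to file field
     */
    static Map<String, String> resolve(String connections,
                                       List<String> tableColumns,
                                       List<String> fileFields) {
        Map<String, String> result = new LinkedHashMap<>();
        if (connections == null) {
            // Default: connect each file field to the column with the same (case-sensitive) name.
            for (String field : fileFields) {
                if (tableColumns.contains(field)) {
                    result.put(field, field);
                }
            }
            return result;
        }
        if (connections.isEmpty()) {
            // Empty string: no extra fields are connected.
            return result;
        }
        for (String pair : connections.split(";")) {
            if (pair.trim().isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            int eq = pair.indexOf('=');
            if (eq <= 0 || eq == pair.length() - 1) {
                throw new IllegalArgumentException("Malformed pair: " + pair);
            }
            // Left side is the database column, right side the data field in the file.
            result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return result;
    }
}
```

For example, "mp=MELTING_POINT;name=NAME" maps the mp and name columns to the MELTING_POINT and NAME file fields, an empty string yields an empty map, and null pairs up only the names that occur on both sides.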
-
getConnections
Deprecated, for removal: This API element is subject to removal in a future version. Since 2.2, replaced by getFieldConnections().
Returns the specified table field - file field pairs.
- Returns:
- the connection string
-
getFieldConnections
Returns the specified table field - file field pairs.
- Returns:
- the connection string
-
setRecordsToCheck
public void setRecordsToCheck(int recordsToCheck) Sets the number of records to check for file format. For SDfiles, the same number of records is also checked for field names.
The default value is 500 records.
Note: When an InputStream is used as the source, these records are buffered in memory. Make sure the JVM has enough memory (-Xmx parameter) when setting this value very high.
Using a File as input is recommended where feasible, since it does not need buffering.
- Parameters:
recordsToCheck - the number of records to check for file format
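The memory cost comes from having to rewind a stream after the format check. The sketch below shows that read-ahead pattern with java.io.BufferedInputStream mark() and reset(): it peeks at up to recordsToCheck SDfile records (assumed to end with a "$$$$" line) and then rewinds. ReadAheadSketch and peekRecords are hypothetical, and this is not claimed to be how JChem performs the check; it only illustrates why every byte read after mark() must stay in memory until reset().

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical read-ahead helper illustrating mark()/reset(); not JChem code. */
final class ReadAheadSketch {

    private ReadAheadSketch() {}

    /**
     * Reads up to recordsToCheck SDfile records (each assumed to end with a "$$$$" line),
     * then rewinds the stream to where it was. Returns the number of complete records seen.
     */
    static int peekRecords(BufferedInputStream in, int recordsToCheck, int readLimit)
            throws IOException {
        in.mark(readLimit); // bytes read from here on are retained in the buffer
        int records = 0;
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while (records < recordsToCheck && (b = in.read()) != -1) {
            if (b == '\n') {
                String text = new String(line.toByteArray(), StandardCharsets.UTF_8).trim();
                if (text.equals("$$$$")) {
                    records++;
                }
                line.reset(); // start collecting the next line
            } else {
                line.write(b);
            }
        }
        // May throw IOException if more than readLimit bytes were read and the mark was dropped.
        in.reset();
        return records;
    }
}
```

With a File source the same check can simply reopen the file, which is why no such buffer is needed in that case.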
-
getRecordsToCheck
public int getRecordsToCheck() Gets the number of records to check for file format.
- Returns:
- the number of records to check for file format
-
setProgressWriter
Sets the ProgressWriter object used to track the progress of the actual import. (Format checking and skipping are not monitored by this object.)
It can be null if no monitoring is necessary.
- Parameters:
pwriter - the progress writer
-
getProgressWriter
Gets the ProgressWriter object used for monitoring.
- Returns:
- the progress writer
-
setHaltOnError
public void setHaltOnError(boolean b) Sets whether the import should stop when an error occurs.
- Parameters:
b - true to halt on error
-
isHaltOnError
public boolean isHaltOnError() Gets whether the import should stop when an error occurs.
- Returns:
true if the import halts on error
-
setDuplicateImportAllowed
@Deprecated(forRemoval=true) @SubjectToRemoval(date=JUL_01_2025) public void setDuplicateImportAllowed(boolean b) Deprecated, for removal: This API element is subject to removal in a future version. Since JChem 5.4, this import option has become a table option; use the setDuplicateImportAllowed(int) method instead.
If set to false, molecules that already exist in the table with the same topology are not imported. This check may slow down the import.
- Parameters:
b - new value for duplicate filtering
-
setDuplicateImportAllowed
public void setDuplicateImportAllowed(int duplicateFilteringOption) Sets the duplicate filtering option on import. Duplicate filtering can be forced on, forced off, or left to the table option.
- Parameters:
duplicateFilteringOption - one of the following:
- DUPLICATE_FILTERING_ON: molecules that already exist in the table with the same topology are not imported. Forces duplicate filtering ON regardless of the table setting. This check may slow down the import.
- DUPLICATE_FILTERING_OFF: duplicates are allowed. Forces duplicate filtering OFF regardless of the table setting.
- DUPLICATE_FILTERING_TABLE_OPTION: the value of the table option (StructureTableOptions.isDuplicateFiltering()) controls the filtering of duplicates.
-
isDuplicateImportAllowed
public boolean isDuplicateImportAllowed() Gets whether duplicate structures are allowed.
- Returns:
- true if duplicates are allowed
- Throws:
IllegalArgumentException - if the duplicate filtering option of the table cannot be determined.
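The interplay of the three filtering options and isDuplicateImportAllowed() can be modeled as follows. This is a hypothetical sketch: the Option enum stands in for the DUPLICATE_FILTERING_* constants, whose actual type and values are not shown on this page, and a null table flag models the case where the table option cannot be determined.

```java
/** Hypothetical model of the three duplicate filtering options; not JChem classes. */
final class DuplicateFilteringSketch {

    /** Stand-ins for DUPLICATE_FILTERING_ON, _OFF and _TABLE_OPTION. */
    enum Option { ON, OFF, TABLE_OPTION }

    private DuplicateFilteringSketch() {}

    /**
     * Returns true if duplicates may be imported.
     *
     * @param option         the option passed to the (hypothetical) setter
     * @param tableFiltering the table's duplicate-filtering flag, or null if unknown
     */
    static boolean isDuplicateImportAllowed(Option option, Boolean tableFiltering) {
        switch (option) {
            case ON:
                return false; // filtering forced on: duplicates are rejected
            case OFF:
                return true;  // filtering forced off: duplicates are imported
            case TABLE_OPTION:
                if (tableFiltering == null) {
                    throw new IllegalArgumentException(
                            "duplicate filtering option of the table cannot be determined");
                }
                return !tableFiltering; // table filtering on means duplicates not allowed
            default:
                throw new AssertionError("unknown option " + option);
        }
    }
}
```

Only the TABLE_OPTION branch consults the table, which is why only that path can fail with IllegalArgumentException.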
-
setEmptyStructuresAllowed
public void setEmptyStructuresAllowed(boolean b) If set to false, empty molecules are not imported.
- Parameters:
b - true if empty structures are allowed
-
getEmptyStructuresAllowed
public boolean getEmptyStructuresAllowed() Gets whether empty structures are allowed.
- Returns:
- true if empty structures are allowed
-
setSetChiralFlag
public void setSetChiralFlag(boolean setChiralFlag) Sets whether the chiral flag should be set to true during import.
- Parameters:
setChiralFlag - if true, the chiral flag is set to true for imported molecules. The default setting is false. Since 2.3.
-
getSetChiralFlag
public boolean getSetChiralFlag() Gets whether the chiral flag is set on import.
- Returns:
- the current state
-
setNameFieldInDB
Sets a DB field to contain the structure name.
- Parameters:
fieldName - the name of the column in the database
-
getNameFieldInDB
-
isFinished
public boolean isFinished() Checks whether the import has finished.
- Returns:
true if importing has finished, otherwise false.
-
getErrorMessage
Returns the error message if an error occurs.
- Returns:
- the error message
-
getErrorCause
Retrieves the Throwable caught in the run() method. WARNING: This mechanism is expected to be revised in the near future; use it with extreme caution.
- Returns:
- the error cause or null
-
getStructCount
public int getStructCount() Returns the current count of structures examined by the import process.
- Returns:
- the number of molecules examined so far
-
getImportedNumber
public int getImportedNumber() Returns the number of imported molecules.
- Returns:
- the number of molecules imported
-
getDuplicates
public int getDuplicates() Returns the number of molecules that were not imported because they are duplicates.
- Returns:
- the number of duplicates encountered
-
getEmptyStructures
public int getEmptyStructures() Returns the number of molecules that were not imported because they are empty structures.
- Returns:
- the count of empty structures skipped
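How the counters above might relate to each other is sketched below. This is a hypothetical model, not JChem code: a string key stands in for a structure's topology, an in-memory map stands in for the table, sequential integers stand in for cd_id values, and reporting the cd_id of the already stored structure in the duplicate ID list is an assumption this page does not confirm. In this model every examined record is counted exactly once as imported, duplicate, or empty.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of how the import counters could be updated; not JChem code. */
final class ImportCountersSketch {
    // Options (cf. setEmptyStructuresAllowed, duplicate filtering, setStoreImportedIDs,
    // setStoreDuplicates). These initial values are arbitrary, not the library defaults.
    boolean emptyAllowed = true;
    boolean duplicatesAllowed = true;
    boolean storeImportedIds = false;
    boolean storeDuplicates = false;

    // Counters (cf. getStructCount, getImportedNumber, getDuplicates, getEmptyStructures).
    int structCount;
    int importedNumber;
    int duplicates;
    int emptyStructures;

    final List<Integer> importedIds = new ArrayList<>();  // cf. getImportedIDList()
    final List<Integer> duplicateIds = new ArrayList<>(); // cf. getDuplicateIDList()

    private final Map<String, Integer> table = new HashMap<>(); // key -> simulated cd_id
    private int nextId = 1;

    /** Processes one record; an empty key models an empty structure. */
    void process(String key) {
        structCount++; // every record is examined, whatever happens to it
        if (key.isEmpty() && !emptyAllowed) {
            emptyStructures++;
            return;
        }
        Integer existingId = table.get(key);
        if (existingId != null && !duplicatesAllowed) {
            duplicates++;
            if (storeDuplicates) {
                duplicateIds.add(existingId); // assumption: ID of the structure already stored
            }
            return;
        }
        int id = nextId++;
        table.putIfAbsent(key, id); // keep the first ID if duplicates are allowed
        importedNumber++;
        if (storeImportedIds) {
            importedIds.add(id);
        }
    }
}
```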
-
getNote
Returns the note of the ProgressWriter.
- Returns:
- the note
-
setSkip
public void setSkip(int skip) Sets the number of molecules to skip from the beginning of the file.
- Parameters:
skip - the number of molecules to skip
-
getSkip
public int getSkip() Gets the number of molecules to skip from the beginning of the file.
- Returns:
- the number of molecules to skip
-
getProgress
public long getProgress() Gets the current progress of the import.
- Returns:
- the position of the ProgressWriter, -1 if the object is not set (null)
-
run
public void run() Starts execution as a thread. Calls init(), skip(), and importMols(). Exceptions are caught and printed to stderr.
-
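The run() contract (call init(), skip() and importMols(), catch exceptions and print them to stderr), together with init() being called lazily and the isFinished()/cancel() pair, can be sketched with a plain Thread subclass. ImportThreadSketch is hypothetical: it counts simulated records instead of reading a file, failAt simulates a bad record, and runToCompletion is an illustrative helper, not a JChem method.

```java
/**
 * Hypothetical stand-in (not JChem code) mirroring the documented run() contract:
 * run() calls init(), skip() and importMols(), and catches any exception.
 */
class ImportThreadSketch extends Thread {
    private final int total;     // number of simulated input records
    private final int skipCount; // cf. setSkip(int)
    private final int failAt;    // index of a simulated bad record, or -1 for none

    private boolean initialized;
    private int position;                  // index of the next record to read
    private volatile boolean cancelled;
    private volatile boolean finished;
    private volatile int imported;         // written only by this thread
    private volatile Throwable errorCause;

    ImportThreadSketch(int total, int skipCount, int failAt) {
        this.total = total;
        this.skipCount = skipCount;
        this.failAt = failAt;
    }

    void init() {
        // A real importer would check the first records for file format and fields here.
        initialized = true;
    }

    private void ensureInit() {
        if (!initialized) {
            init(); // called lazily if the caller did not call init() explicitly
        }
    }

    void skip(int offset) {
        ensureInit();
        position = Math.min(total, position + offset);
    }

    int importMols() throws Exception {
        ensureInit();
        while (position < total && !cancelled) {
            if (position == failAt) {
                throw new Exception("could not read record " + position);
            }
            position++;
            imported++;
        }
        return imported;
    }

    void cancel() {
        cancelled = true; // the import loop stops before the next record
    }

    @Override
    public void run() {
        try {
            init();
            skip(skipCount);
            importMols();
        } catch (Exception e) {
            errorCause = e;
            System.err.println("Import failed: " + e.getMessage()); // printed, not rethrown
        } finally {
            finished = true;
        }
    }

    boolean isFinished() { return finished; }
    int getImportedNumber() { return imported; }
    Throwable getErrorCause() { return errorCause; }

    /** Starts a sketch importer, waits for it to finish and returns it for inspection. */
    static ImportThreadSketch runToCompletion(int total, int skipCount, int failAt) {
        ImportThreadSketch t = new ImportThreadSketch(total, skipCount, failAt);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return t;
    }
}
```

Because run() swallows exceptions, a caller that uses start() has to inspect the error state after join(), or once isFinished() returns true, instead of expecting the exception to propagate.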
setInfoStream
Sets the stream where information about the import process will be written (e.g. skipped duplicates and empty structures).
- Parameters:
st - the stream. The default is null (no information is written).
-
setOutputOptions
public void setOutputOptions(boolean printDuplicates, boolean printNonDuplicates, OutputStream os, boolean doNotImport) With this option one can print duplicate or non-duplicate molecules to a stream. Printing happens only if duplicate filtering is enabled.
- Parameters:
printDuplicates - prints indexes of duplicate structures
printNonDuplicates - prints indexes of imported structures
os - the OutputStream to write information to; if null, stdout is used.
doNotImport - disables the actual import; only the index entries are generated.
-
setStoreDuplicates
public void setStoreDuplicates(boolean value) Specifies whether the IDs of duplicate structures should be stored.
- Parameters:
value - enable/disable storing of duplicates
-
setStoreImportedIDs
public void setStoreImportedIDs(boolean value) Specifies whether the IDs of imported structures should be stored.
- Parameters:
value - enable/disable storing of IDs
- Since:
- JChem 3.1.7
-
getDuplicateIDList
Returns the IDs (cd_id column in database table) of duplicate structures.
- Returns:
- the IDs as a list containing Integer objects.
-
getImportedIDList
Returns the IDs (cd_id column in database table) of imported structures.
- Returns:
- the IDs as a list containing Integer objects.
-
importMols
Imports molecules.
- Returns:
- the number of molecules imported
- Throws:
TransferException - if the settings are invalid
-
cancel
public void cancel() Stops the import process.
-
skip
Skips the given number of molecules.
- Parameters:
offset - the number of molecules to be skipped
- Throws:
TransferException - if the settings are invalid
-
init
Initialization: checks the given number of lines for file format and fields. If not called explicitly, it is called automatically by skip or importMols when necessary.
- Throws:
TransferException - if the settings are invalid
-
getFieldNameList
Returns the field names in an SDfile. The file may come from an InputStream; the import may follow without reopening the stream. Calls init() if initialization is necessary.
- Returns:
- a vector of String objects, the names of the SDfile fields.
- Throws:
TransferException - in case of failure
IOException - if an IOException occurs
-
getFieldNameList
public static List<String> getFieldNameList(InputStream is, int linesToCheck) throws IOException, MRecordParseException Returns the field names in an SDfile. NOTE: to return to the initial position, the InputStream has to be reopened or repositioned (e.g. with a BufferedInputStream).
- Parameters:
is - the InputStream to read from
linesToCheck - read ahead this many lines during field name identification
- Returns:
- a list of String objects, the names of the SDfile fields.
- Throws:
IOException - if the InputStream encounters a problem
MRecordParseException - if a record could not be read.
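A rough idea of what collecting SDfile field names involves is shown below. SdfFieldNamesSketch is hypothetical and reads from a java.io.Reader rather than an InputStream. It simply treats any line starting with '>' as a data header and takes the text between the first '<' and the following '>'. A production parser would only look for data headers after the "M  END" line of each record, because data values may themselves begin with '>'.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical SDfile field-name scanner; not the JChem implementation. */
final class SdfFieldNamesSketch {

    private SdfFieldNamesSketch() {}

    /** Returns distinct field names, in order of first appearance, within the first linesToCheck lines. */
    static List<String> fieldNames(Reader source, int linesToCheck) throws IOException {
        BufferedReader reader = source instanceof BufferedReader
                ? (BufferedReader) source
                : new BufferedReader(source);
        Set<String> names = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        String line;
        int read = 0;
        while (read < linesToCheck && (line = reader.readLine()) != null) {
            read++;
            if (line.startsWith(">")) {
                // Data header such as ">  <NAME>" or "> 25 <MP> (X1)".
                int open = line.indexOf('<');
                int close = open < 0 ? -1 : line.indexOf('>', open + 1);
                if (close > open + 1) {
                    names.add(line.substring(open + 1, close));
                }
            }
        }
        return new ArrayList<>(names);
    }
}
```

Like the static getFieldNameList(InputStream, int), this sketch consumes the lines it reads, so the caller has to reopen or reset the source before importing.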
-