Class MolImporter
- All Implemented Interfaces:
MolImporterInterface, Closeable, AutoCloseable, Iterable<Molecule>
The input file format is guessed automatically or specified as an import option to the constructor. Many different formats are supported like "mol", "rgf", "sdf", "rdf", "csmol", "csrgf", "cssdf", "csrdf", "mol2", "cml", "mrv", "smiles", "cxsmiles", "pdb", "xyz", "cube", "name". For more information on formats, please visit File Formats in Marvin. MolImporter can also import gzip compressed and base64 encoded structures.
The processing is single-threaded by default. Concurrent mode can be enabled using setThreadCount(int).
Serialized Molecule objects can also be imported using the "chemaxon.struc.Molecule" format. In this case, processing is always single-threaded, regardless of the configured thread count.
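To make the "gzip compressed and base64 encoded" wording concrete, the following is a minimal sketch that only uses the JDK to produce and undo such an encoding for a SMILES string. It does not call MolImporter. The class and method names are hypothetical, and applying gzip first and base64 second is an assumption made for illustration, not a statement about what the importer expects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the documented API.
final class CompressedStructureSketch {

    /** Gzip-compresses the text, then base64-encodes the compressed bytes. */
    static String gzipThenBase64(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the buffer holds a complete gzip member.
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    /** Reverses gzipThenBase64: base64-decodes, then decompresses back to text. */
    static String base64ThenGunzip(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String smiles = "c1ccccc1"; // benzene written as SMILES
        String encoded = gzipThenBase64(smiles);
        System.out.println(encoded);
        System.out.println(base64ThenGunzip(encoded));
    }
}
```

The round trip returns the original text, which is the property an importer relies on when it unwraps such input before format recognition.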
-
Constructor Summary
Constructors
Constructor - Description
MolImporter(File f) - Create a molecule importer for a file.
MolImporter(File f, String opts) - Create a molecule importer for a file.
MolImporter(InputStream is) - Create a molecule importer for an input stream.
MolImporter(InputStream is, String opts) - Create a molecule importer for an input stream.
MolImporter(InputStream is, String opts, String enc) - Create a molecule importer for an input stream.
MolImporter(InputStream is, String opts, String enc, String fileName) - Create a molecule importer for an input stream.
MolImporter(String fname) - Create a molecule importer for a file.
MolImporter(String fname, Component component, String msg) - Create a molecule importer with a progress monitor.
-
Method Summary
Modifier and Type, Method - Description
void close() - Closes the underlying input stream.
int estimateNumRecords() - Estimates the total number of records.
getFile() - Gets the file object for the input.
getFileName() - Gets the name of the input file.
getFormat() - Gets the file format.
getGlobalProperties() - Gets the global properties in a container that was retrieved from the input stream earlier.
getGrabbedMoleculeString() - Gets the last grabbed molecule string with LF style line endings by default.
int getLineCount() - Gets the current line number.
getMDocumentStream() - Creates an MDocument stream with the iterator of the importer.
getMolStream() - Creates a Molecule stream with the iterator of the importer.
final String getOptions() - Gets the import options.
boolean getQueryMode() - Gets query mode.
int getRecordCount() - Gets the current record number.
int getRecordCountMax() - Gets the total number of records read.
static MDocument importDoc(byte[] b) - Reads a document from a byte array.
static MDocument importDoc(byte[] b, String opts, String enc) - Reads a document from a byte array.
static Molecule importMol(byte[] b) - Reads a molecule from a byte array.
static Molecule importMol(byte[] b, String opts, String enc) - Reads a molecule from a byte array.
static Molecule importMol(InputStream is, String opts, String enc) - Reads a molecule from an input stream.
static Molecule importMol(String s) - Reads a molecule from a string.
static Molecule importMol(String s, Object options) - Deprecated.
static Molecule importMol(String s, String opts) - Reads a molecule from a string.
boolean isEndReached() - Tests whether the end of input is already reached.
boolean isMolMovie() - Are the imported molecules interpreted as frames of a molecule movie?
boolean isMultiSet() - Are the imported molecules merged into one multi-set molecule?
boolean isRewindable() - Tests whether rewinding (seeking backwards) is possible in the underlying input stream.
nextDoc() - Reads the next document.
read() - Reads the next molecule from the stream/file.
readRecordAsText() - Reads the next molecule in text format without creating a Molecule object.
void seekRecord(int k, ProgressMonitor pmon) - Seek the specified record.
protected void seekVisitedRecord(int k) - Seeks an already visited position in case of rewindable input.
void setQueryMode(boolean q) - Sets query mode.
void setThreadCount(int threadCount) - Sets the number of threads for concurrent processing.
boolean skipRecord() - Skips the next molecule or document instead of reading it into memory.
long tell() - Returns the current file offset.
Methods inherited from class chemaxon.formats.MDocSource
getDocLabel, getMoleculeIterator, iterator, seekForward, seekRecordAtFraction, skipRecords
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
MolImporter
Create a molecule importer for an input stream. Begins reading the input stream and determines the file format.
- Parameters:
is - the input stream to read
- Throws:
IOException - if an I/O error occurred when determining the file format
MolFormatException - if the molecule file is in a format that cannot be read
IllegalCharsetNameException - if an illegal encoding is used
UnsupportedCharsetException - if an unsupported encoding is used
-
MolImporter
Create a molecule importer for an input stream. Begins reading the input stream and determines the file format. If the option string starts with the substring "MULTISET", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. If it starts with "MOLMOVIE", then the molecules are treated as frames of a molecule movie (the default for the XYZ format). If it starts with "NOMOLMOVIE", then multimolecule XYZ files are not interpreted as molecule movies. Other parts of the option string are passed to the import module. The input character encoding can also be set in "enc{encoding}" form.
- Parameters:
is - the input stream to read
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
- Throws:
IOException - if an I/O error occurred when determining the file format
MolFormatException - if the molecule file is in a format that cannot be read
IllegalCharsetNameException - if an illegal encoding is used
UnsupportedCharsetException - if an unsupported encoding is used
-
MolImporter
Create a molecule importer for an input stream. Begins reading the input stream and determines the file format. If the option string starts with the substring "MULTISET", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. If it starts with "MOLMOVIE", then the molecules are treated as frames of a molecule movie (the default for the XYZ format). If it starts with "NOMOLMOVIE", then multimolecule XYZ files are not interpreted as molecule movies. Other parts of the option string are passed to the import module. The input character encoding can also be set in "enc{encoding}" form.
- Parameters:
is - the input stream to read
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - charset name or null
- Throws:
IOException - if an I/O error occurred when determining the file format
MolFormatException - if the molecule file is in a format that cannot be read
IllegalCharsetNameException - if an illegal encoding is used
UnsupportedCharsetException - if an unsupported encoding is used
- Since:
- Marvin 3.5.5, 01/02/2006
-
MolImporter
public MolImporter(InputStream is, String opts, String enc, String fileName) throws IOException, MolFormatException
Create a molecule importer for an input stream. Begins reading the input stream and determines the file format. If the option string starts with the substring "MULTISET", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. If it starts with "MOLMOVIE", then the molecules are treated as frames of a molecule movie (the default for the XYZ format). If it starts with "NOMOLMOVIE", then multimolecule XYZ files are not interpreted as molecule movies. Other parts of the option string are passed to the import module. The input character encoding can also be set in "enc{encoding}" form.
- Parameters:
is - the input stream to read
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - charset name or null
fileName - the original filename the stream is reading from
- Throws:
IOException - if an I/O error occurred when determining the file format
MolFormatException - if the molecule file is in a format that cannot be read
IllegalCharsetNameException - if an illegal encoding is used
UnsupportedCharsetException - if an unsupported encoding is used
- Since:
- Marvin 5.8
-
MolImporter
Create a molecule importer for a file. Begins reading the input stream and determines the file format. If the option string starts with the substring "MULTISET", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. If it starts with "MOLMOVIE", then the molecules are treated as frames of a molecule movie (the default for the XYZ format). If it starts with "NOMOLMOVIE", then multimolecule XYZ files are not interpreted as molecule movies. The input character encoding can also be set in "enc{encoding}" form. Other parts of the option string are passed to the import module.
- Parameters:
f - the file to read
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
- Throws:
IOException - if an I/O error occurred when determining the file format
MolFormatException - if the molecule file is in a format that cannot be read
IllegalCharsetNameException - if an illegal encoding is used
UnsupportedCharsetException - if an unsupported encoding is used
-
MolImporter
Create a molecule importer for a file. Begins reading the input stream and determines the file format. If the option string starts with the substring "MULTISET", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. If it starts with "MOLMOVIE", then the molecules are treated as frames of a molecule movie (the default for the XYZ format). If it starts with "NOMOLMOVIE", then multimolecule XYZ files are not interpreted as molecule movies. The input character encoding can also be set in "enc{encoding}" form. Other parts of the option string are passed to the import module.
- Parameters:
f - the file to read
- Throws:
IOException - if an I/O error occurred when determining the file format
MolFormatException - if the molecule file is in a format that cannot be read
IllegalCharsetNameException - if an illegal encoding is used
UnsupportedCharsetException - if an unsupported encoding is used
- Since:
- Marvin 6.3
-
MolImporter
Create a molecule importer for a file. Begins reading the input stream and determines the file format. The filename string can contain options in the "file{options}" form. If the option string starts with the substring "MULTISET", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. If it starts with "MOLMOVIE", then the molecules are treated as frames of a molecule movie (the default for the XYZ format). If it starts with "NOMOLMOVIE", then multimolecule XYZ files are not interpreted as molecule movies. Other parts of the option string are passed to the import module. The input character encoding can also be set in "enc{encoding}" form.
- Parameters:
fname - name of the file to read
- Throws:
IOException - if an I/O error occurred when determining the file format
MolFormatException - if the molecule file is in a format that cannot be read
IllegalCharsetNameException - if an illegal encoding is used
UnsupportedCharsetException - if an unsupported encoding is used
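The "file{options}" form and the leading "MULTISET" keyword can be pictured with a small string-handling sketch. This is not the library's parser: the class, the method names and the splitting rule (first opening brace, closing brace at the very end) are assumptions chosen only to illustrate the convention described above.

```java
// Hypothetical illustration of the "file{options}" naming convention.
final class FileOptionsSketch {

    /**
     * Splits a specification such as "mols.sdf{MULTISET}" into the file name
     * and the option string. Returns {spec, null} when no options are attached.
     */
    static String[] split(String spec) {
        int open = spec.indexOf('{');
        if (open > 0 && spec.endsWith("}")) {
            String fileName = spec.substring(0, open);
            String options = spec.substring(open + 1, spec.length() - 1);
            return new String[] { fileName, options };
        }
        return new String[] { spec, null };
    }

    /** True when the option string starts with the "MULTISET" keyword. */
    static boolean requestsMultiSet(String options) {
        return options != null && options.startsWith("MULTISET");
    }

    public static void main(String[] args) {
        String[] parts = split("mols.sdf{MULTISET}");
        System.out.println(parts[0] + " / " + parts[1]
                + " / multiset=" + requestsMultiSet(parts[1]));
    }
}
```

Splitting at the first brace keeps any braces inside the option string (for example an "enc{encoding}" part) within the options rather than in the file name.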
-
MolImporter
public MolImporter(String fname, Component component, String msg) throws IOException, MolFormatException
Create a molecule importer with a progress monitor. Begins reading the input stream and determines the file format. The filename string can contain options in the "file{options}" form. If the option string starts with "MULTISET" or "MULTISET,", then all the molecules in the stream are merged into one molecule object containing multiple atom sets. The input character encoding can also be set in "enc{encoding}" form. Other parts of the option string are passed to the import module.
- Parameters:
fname - name of the file to read
component - the parent component
msg - displayed message, where %p is replaced by the file path
- Throws:
IOException - if an I/O error occurred when determining the file format
MolFormatException - if the molecule file is in a format that cannot be read
IllegalCharsetNameException - if an illegal encoding is used
UnsupportedCharsetException - if an unsupported encoding is used
-
-
Method Details
-
getFileName
Gets the name of the input file.
- Returns:
- the name of the input file
-
getFile
Gets the file object for the input.
- Returns:
- the File or null (if the input is not a File)
-
getOptions
Gets the import options.
- Returns:
- the options
-
getMolImportModule
-
getGrabbedMoleculeString
Gets the last grabbed molecule string with LF style line endings by default. If the "noLF" import option was set for MolImporter, then the original line endings are kept, e.g. new MolImporter(stream, "mrv:noLF");
- Returns:
- the molecule as a string
- Since:
- 4.0, 01/05/2005
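As an illustration of what "LF style line endings" means for a grabbed record, the sketch below normalizes Windows (CR LF) and old Mac (CR) line endings to LF using plain string operations. It is a stand-in written for this page, not the importer's own code, and the class and method names are hypothetical.

```java
// Hypothetical helper showing LF normalization of a text record.
final class LineEndingSketch {

    /** Converts CR LF and lone CR line endings to LF. */
    static String toLf(String text) {
        // Replace the two-character sequence first so it does not become two LFs.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String record = "line one\r\nline two\rline three\n";
        System.out.print(toLf(record));
    }
}
```

With the "noLF" option the documented behavior is the opposite: the text is returned with whatever line endings the input had.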
-
isMultiSet
public boolean isMultiSet()
Are the imported molecules merged into one multi-set molecule?
- Returns:
true if the input is a multi-set molecule
-
isMolMovie
public boolean isMolMovie()
Are the imported molecules interpreted as frames of a molecule movie?
- Returns:
true if the input is read as a molecule movie
- Since:
- Marvin 5.2, 02/12/2009
-
getMolStream
Creates a Molecule stream with the iterator of the importer. Only one iterator can exist at a time, so only one stream can exist at a time.
- Specified by:
getMolStream in interface MolImporterInterface
- Overrides:
getMolStream in class MDocSource
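The "only one stream at a time" restriction follows from building a stream on top of a single iterator. The sketch below shows that general pattern with the JDK only; it uses a list of strings as a stand-in for molecules, and the class and method names are hypothetical. Whether the importer builds its stream exactly this way is an assumption.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical illustration of a stream backed by one shared iterator.
final class IteratorStreamSketch {

    /** Wraps an iterator in a sequential, ordered stream of unknown size. */
    static <T> Stream<T> streamOf(Iterator<T> iterator) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED),
                false);
    }

    public static void main(String[] args) {
        Iterator<String> records = List.of("CCO", "c1ccccc1", "CC(=O)O").iterator();
        // The first stream consumes the iterator completely.
        System.out.println(streamOf(records).toArray().length);
        // A second stream over the same iterator finds nothing left to read.
        System.out.println(streamOf(records).toArray().length);
    }
}
```

Because the second stream shares the exhausted iterator, it is empty; this is why a second stream cannot be used independently of the first.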
-
getMDocumentStream
Creates an MDocument stream with the iterator of the importer. Adds an MDocument to molecules without a document. Only one iterator can exist at a time, so only one stream can exist at a time.
- Specified by:
getMDocumentStream in interface MolImporterInterface
- Overrides:
getMDocumentStream in class MDocSource
-
setThreadCount
Sets the number of threads for concurrent processing. The default value is 1 (single-threaded mode).
In concurrent mode, multiple threads are started in the background, which adds considerable overhead to the processing, so use this mode only if the number of molecules is huge. In single-threaded mode, no background thread is used and every action is done on the caller thread.
- Parameters:
threadCount - the number of threads; set 0 for the number of CPUs, 1 for single-threaded mode
- Throws:
IllegalStateException - if concurrent processing is already started or if an object input stream is used instead of a record importer
- Since:
- Marvin 5.3
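The meaning of the special values 0 and 1 can be sketched as a small resolution step. This is an illustration under assumptions, not the library's code: the class and method names are hypothetical, "number of CPUs" is taken to mean Runtime.availableProcessors(), and rejecting negative values is a choice made here because the documentation does not say how they are handled.

```java
// Hypothetical illustration of how a requested thread count could be resolved.
final class ThreadCountSketch {

    /** Maps 0 to the number of available processors; other positive values pass through. */
    static int resolve(int requested) {
        if (requested < 0) {
            // Assumption: negative counts are treated as a caller error in this sketch.
            throw new IllegalArgumentException("thread count must not be negative");
        }
        return requested == 0 ? Runtime.getRuntime().availableProcessors() : requested;
    }

    /** True when the resolved count means background threads would be used. */
    static boolean isConcurrent(int requested) {
        return resolve(requested) > 1;
    }

    public static void main(String[] args) {
        System.out.println("0 resolves to " + resolve(0) + " thread(s)");
        System.out.println("1 is concurrent: " + isConcurrent(1));
    }
}
```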
-
getQueryMode
public boolean getQueryMode()
Gets query mode. SMILES strings are imported as SMARTS if query mode is set.
- Returns:
- query mode
- Since:
- Marvin 3.3, 11/14/2003
-
setQueryMode
public void setQueryMode(boolean q)
Sets query mode. SMILES strings are imported as SMARTS if query mode is set.
- Parameters:
q - query mode
- Since:
- Marvin 3.3, 11/14/2003
-
read
Reads the next molecule from the stream/file.
- Specified by:
read in interface MolImporterInterface
- Returns:
- the next molecule, or null if no more molecules can be imported
- Throws:
IOException - if an error occurred during reading
-
nextDoc
Reads the next document.
- Specified by:
nextDoc in class MDocSource
- Returns:
- the next document, or null at end of file
- Throws:
IOException - if an I/O error occurred
- Since:
- Marvin 4.1, 04/14/2006
-
skipRecord
Skips the next molecule or document instead of reading it into memory.
- Specified by:
skipRecord in class MDocSource
- Returns:
true if the end of the molecule is found, false if there is no chance to continue
- Throws:
IOException - if a read error occurred
- Since:
- Marvin 4.1, 04/20/2006
-
readRecordAsText
Reads the next molecule in text format without creating a Molecule object. Processing is single-threaded.
- Returns:
- the grabbed record in its original format, or null at end of file
- Throws:
MolExportException - if binary data cannot be exported to MRV format text
IOException - if a read error occurred
- Since:
- Marvin 5.0, 11/13/2006
-
isRewindable
public boolean isRewindable()
Tests whether rewinding (seeking backwards) is possible in the underlying input stream. In concurrent mode it always returns false. Therefore this method should not be called before calling setThreadCount(int).
- Specified by:
isRewindable in class MDocSource
- Returns:
true if rewinding is possible, false otherwise
- Since:
- Marvin 4.1, 04/20/2006
-
seekRecord
Seek the specified record. This method should not be called before calling setThreadCount(int). Backward seeking (rewinding) in the stream is only possible if the underlying input stream is seekable. Note that in concurrent mode the import is never rewindable. Forward seeking is always possible. Seeking terminates before reaching the specified position if the user cancels the progress dialog.
- Specified by:
seekRecord in class MDocSource
- Parameters:
k - position
pmon - progress monitor or null
- Throws:
EOFException - if the end of file is reached while trying to seek
IOException - if a read error occurred
- Since:
- Marvin 4.1, 04/19/2006
-
seekVisitedRecord
Seeks an already visited position in case of rewindable input. This method should not be called before calling setThreadCount(int).
- Specified by:
seekVisitedRecord in class MDocSource
- Parameters:
k - the record index
- Throws:
IOException - if a read error occurred
- Since:
- Marvin 4.1, 06/28/2006
-
isEndReached
public boolean isEndReached()
Tests whether the end of input is already reached.
- Specified by:
isEndReached in class MDocSource
- Returns:
true if the end was reached, false otherwise
- Since:
- Marvin 4.1, 06/18/2006
-
estimateNumRecords
public int estimateNumRecords()
Estimates the total number of records. If the end of file is already reached, then it returns the exact value. Otherwise, in case of a file with known length, it extrapolates from the last read record index and the value of the file pointer at the last read position. If the input is a stream with unknown total length, then it returns two times the current highest record number.
- Specified by:
estimateNumRecords in class MDocSource
- Returns:
- estimated number of records or -1 at the beginning of file
- Since:
- Marvin 4.1, 04/18/2006
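The three cases described for estimateNumRecords can be written down as a short sketch. The method below is a hypothetical stand-alone function, not the library's implementation; in particular, the linear extrapolation formula (records read, scaled by total length over bytes consumed) and the rounding are assumptions about what "extrapolates" means.

```java
// Hypothetical illustration of the record-count estimate described above.
final class RecordEstimateSketch {

    /**
     * @param endReached  whether the end of input has been reached
     * @param recordsRead highest record number read so far
     * @param bytesRead   file pointer at the last read position
     * @param totalBytes  total input length, or a non-positive value if unknown
     * @return the estimate, or -1 when nothing has been read yet
     */
    static int estimate(boolean endReached, int recordsRead, long bytesRead, long totalBytes) {
        if (endReached) {
            return recordsRead; // exact value once the whole input was read
        }
        if (recordsRead <= 0) {
            return -1; // beginning of file: no basis for an estimate
        }
        if (totalBytes > 0 && bytesRead > 0) {
            // Known length: assume the remaining records have the same average size.
            return (int) Math.round((double) recordsRead * totalBytes / bytesRead);
        }
        return 2 * recordsRead; // unknown length: twice the highest record number
    }

    public static void main(String[] args) {
        // 100 records in the first 2500 of 10000 bytes extrapolate to 400 records.
        System.out.println(estimate(false, 100, 2500, 10000));
    }
}
```

For a stream of unknown length, the doubling rule means the estimate keeps growing as more records are read and only becomes exact at the end of input.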
-
tell
Returns the current file offset.
- Returns:
- the file pointer
- Throws:
IOException - if the position cannot be determined
-
getLineCount
public int getLineCount()
Gets the current line number. This method should not be called before calling setThreadCount(int).
- Returns:
- the line number
-
getRecordCount
public int getRecordCount()
Gets the current record number.
- Specified by:
getRecordCount in class MDocSource
- Returns:
- the record number
- Since:
- Marvin 4.1, 04/18/2006
-
getRecordCountMax
public int getRecordCountMax()
Gets the total number of records read.
- Specified by:
getRecordCountMax in class MDocSource
- Returns:
- the number of records
- Since:
- Marvin 4.1, 04/18/2006
-
close
Closes the underlying input stream.
WARNING: call this after reading molecules to close concurrent processing properly.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface MolImporterInterface
- Overrides:
close in class MDocSource
- Throws:
IOException - if an error occurred
-
getFormat
Gets the file format.
- Returns:
- "mrv", "mol", "csmol", "sdf", "cssdf", "rdf", "csrdf", "smiles", "sybyl", "mol2", "pdb", "xyz", "cube", "inchi", "csv", "gzip:{inner file format}", etc. or "chemaxon.struc.Molecule" if imported from ObjectInputStream (serialized molecule)
-
importMol
Reads a molecule from a byte array. If the array contains multiple molecules (e.g. in the case of an SDF or MRV file), reads only the first one. Processing is single-threaded.
- Parameters:
b - the molecule file contents
- Returns:
- the molecule
- Throws:
MolFormatException - if the molecule file is in a format that cannot be read
-
importMol
Reads a molecule from a byte array. If the array contains multiple molecules (e.g. in the case of an SDF or MRV file), reads only the first one. Processing is single-threaded.
- Parameters:
b - the molecule file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - encoding or null
- Returns:
- the molecule
- Throws:
MolFormatException - if the molecule file is in a format that cannot be read
- Since:
- Marvin 5.0, 12/27/2007
-
importMol
Reads a molecule from an input stream. If the stream contains multiple molecules (e.g. in the case of an SDF or MRV file), reads only the first one.
- Parameters:
is - the molecule file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - encoding or null
- Returns:
- the molecule
- Throws:
MolFormatException - if the molecule file is in a format that cannot be read
- Since:
- Marvin 5.0, 12/27/2007
-
importDoc
Reads a document from a byte array. If the array contains multiple molecules (e.g. in the case of an SDF or MRV file), reads only the first one. Processing is single-threaded.
- Parameters:
b - the file contents
- Returns:
- the document, or null if no document is found in the input
- Throws:
MolFormatException - if the molecule file is in a format that cannot be read
- Since:
- Marvin 4.1.8, 04/20/2007
-
importDoc
Reads a document from a byte array. If the array contains multiple molecules (e.g. in the case of an SDF or MRV file), reads only the first one. Processing is single-threaded.
- Parameters:
b - the file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
enc - encoding or null
- Returns:
- the document, or null if no document is found in the input
- Throws:
MolFormatException - if the molecule file is in a format that cannot be read
- Since:
- Marvin 5.0, 12/27/2007
-
importMol
Reads a molecule from a string. If the string contains multiple molecules (e.g. in the case of an SDF or MRV file), reads only the first one. If the format is known, it is faster to use importMol(String, String) to avoid wasting time on format recognition. Processing is single-threaded.
- Parameters:
s - the molecule file contents
- Returns:
- the molecule
- Throws:
MolFormatException - if the molecule file is in a format that cannot be read
-
importMol
Reads a molecule from a string. If the string contains multiple molecules (e.g. in the case of an SDF or MRV file), reads only the first one. Processing is single-threaded.
- Parameters:
s - the molecule file contents
opts - the file format and/or options separated by a colon; use null for automatic format recognition and default options
- Returns:
- the molecule
- Throws:
MolFormatException - if the molecule file is in a format that cannot be read
-
importMol
@Deprecated @SubjectToRemoval(date=JUL_01_2027) public static Molecule importMol(String s, Object options) throws MolFormatException
Deprecated. Use importMol(String, String) instead and pass the options as a string (call toString() on the object). This method will be removed in a future release.
Reads a molecule from a string with the given options. If the string contains multiple molecules (e.g. in the case of an SDF or MRV file), reads only the first one. Processing is single-threaded. If an encoding is specified in the options, it is ignored, because the input is already a string and needs no decoding.
- Parameters:
s - the molecule file contents
options - options defined by an options object
- Returns:
- the molecule
- Throws:
MolFormatException - if the molecule file is in a format that cannot be read
-
getGlobalProperties
Gets the global properties in a container that was retrieved from the input stream earlier. Only MRV import supports global properties. They are read during the initialization of the record importer.
- Returns:
- global properties in a container or null.
- Since:
- Marvin 5.0, 06/05/2007
-