Package chemaxon.formats
Class MFileFormatUtil
java.lang.Object
chemaxon.formats.MFileFormatUtil
File format related utility functions.
- Since:
- Marvin 4.1, 12/15/2005
-
Field Summary
- static final int MOLMOVIE: Read multi-molecule files as movies.
- static final int MULTISET: The multi-molecule file really contains multiple atom sets of one molecule.
- static final int NOMOLMOVIE: Do not read multi-molecule XYZ files as movies.
-
Constructor Summary
MFileFormatUtil()
-
Method Summary
- static String[] convertToSmilingFormat: Tries to convert a molecule to a SMILES-related format.
- static String[] convertToSmilingFormat: Tries to convert a property to text with a SMILES-related format argument.
- static MolExportModule createExportModule(String fmt): Creates an export module for the specified format.
- static MolExportModule createExportModule(String fmt, String enc): Creates an export module for the specified format with the specified encoding.
- static MRecordReader createRecordReader(InputStream is, String opts): Creates a record reader for an input stream.
- static MRecordReader createRecordReader(InputStream is, String opts, String enc, String path): Creates a record reader for an input stream.
- static MFileFormat[] findFormats(String fmt, long flags, long mask): Gets a list of formats.
- static String[] getEncodingFromOptions(String fmtopts): Gets the encoding that was explicitly given as an import option.
- static String getFileExtensionLC (overload taking a file): Gets the file extension in lower case.
- static String getFileExtensionLC(String fname): Gets the file extension in lower case.
- static MFileFormat getFormat: Gets the file format descriptor for the specified codename.
- getFormatNamesWithExtension(String fileName)
- static String getKnownExtension(String fname): Returns the file extension if it is a known extension.
- static String[] getMolfileExtensions: Gets the array of known molecule file extensions.
- static String[] getMolfileFormats: Gets the array of known molecule file formats.
- static String getMostLikelyMolFormat(String fname): Gets the most likely molecule file format from the file name extension.
- static String getUnguessableFormat(String fname): Gets the file format from the file name extension for formats that cannot be guessed from the file content.
- static boolean isOutputCleanable(String fmt): Tests whether the specified output format is cleanable.
- static boolean isSubFormatOf(String f, String other): Tests whether a format is a sub-format of another format.
- static boolean isURLOrFileName: Tests whether the specified string is a URL (absolute or relative) or file name.
- static int preprocessFormatAndOptions(String[] fmtopts): Parses options like "MULTISET", "MOLMOVIE" or "NOMOLMOVIE".
- static String recognizeOneLineFormat: Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name.
- static String recognizeOneLineFormat(String s, MFileFormat... forbiddeneFormats): Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name.
- static void registerFormat: Registers a user-defined file format.
- static String[] splitFileAndOptions: Parses "file{options}" strings used in molecule file import.
- static String[] splitFormatAndOptions(String opts): Parses "format:options" strings used in molecule file import and export.
- static void testEncoding(String enc): Tests whether the given charset name is supported by this JVM.
-
Field Details
-
MULTISET
public static final int MULTISET
The multi-molecule file really contains multiple atom sets of one molecule.
- See Also:
-
MOLMOVIE
public static final int MOLMOVIE
Read multi-molecule files as movies.
- Since:
- Marvin 5.2, 02/12/2009
- See Also:
-
NOMOLMOVIE
public static final int NOMOLMOVIE
Do not read multi-molecule XYZ files as movies.
- Since:
- Marvin 5.2, 02/12/2009
- See Also:
-
-
Constructor Details
-
MFileFormatUtil
public MFileFormatUtil()
-
-
Method Details
-
isSubFormatOf
Tests whether a format is a sub-format of another format.
- Parameters:
f
- the format codename
other
- the other format
- Returns:
- true if it is a format variant of f
- Since:
- Marvin 4.1, 04/07/2006
-
splitFileAndOptions
Parses "file{options}" strings used in molecule file import.
- Parameters:
arg
- string containing the filename and the options (if any)
- Returns:
- a two-element array containing the filename and the options.
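A minimal, hypothetical re-implementation of the "file{options}" split, meant only to make the contract concrete; it is not the library's code. It assumes that the options start at the first "{", that the argument ends with "}", and that a plain file name (no braces) yields null options. The real method may treat nested braces or missing options differently.

```java
// Hypothetical sketch for illustration; not MFileFormatUtil's actual source.
final class FileOptionsSketch {
    /**
     * Splits "file{options}" into {file, options}.
     * Assumptions: options begin at the first '{' and the string ends with '}';
     * without braces, the whole argument is the file name and options are null.
     */
    static String[] splitFileAndOptions(String arg) {
        int open = arg.indexOf('{');
        if (open >= 0 && arg.endsWith("}")) {
            String file = arg.substring(0, open);
            String options = arg.substring(open + 1, arg.length() - 1);
            return new String[] {file, options};
        }
        return new String[] {arg, null};
    }
}
```

For example, under these assumptions "mols.txt{smiles:}" splits into "mols.txt" and "smiles:".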
-
splitFormatAndOptions
Parses "format:options" strings used in molecule file import and export. Examples:
splitFormatAndOptions("xyz:f1.4") returns {"xyz", "f1.4"}
splitFormatAndOptions("f1.4") returns {null, "f1.4"}
splitFormatAndOptions("xyz:") returns {"xyz", ""}
splitFormatAndOptions("gzip:xyz:f1.4") returns {"gzip", "xyz:f1.4"}
The colon can be omitted for Marvin's built-in input formats. Example:
splitFormatAndOptions("xyz") returns {"xyz", ""}
Colons after the first equals sign are ignored, so that option parameters can themselves contain colons (e.g. URLs). Example:
splitFormatAndOptions("param=https://chemaxon.com") returns {null, "param=https://chemaxon.com"}
- Parameters:
opts
- string containing the format and the options
- Returns:
- an array containing the format(s) and the options.
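The splitting rules above can be sketched as follows. This is a hypothetical re-implementation that reproduces the documented examples, not the library's code; in particular, the set of "built-in" format names is a small made-up subset standing in for Marvin's real list.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch reproducing the documented examples; not the real implementation.
final class FormatOptionsSketch {
    // Assumed subset of built-in format names (the real list lives in the library).
    private static final Set<String> BUILT_IN = Set.of("mrv", "mol", "sdf", "smiles", "smarts", "xyz", "cml", "gzip");

    static String[] splitFormatAndOptions(String opts) {
        int eq = opts.indexOf('=');
        int colon = opts.indexOf(':');
        // Only a colon that appears before the first '=' separates the format.
        if (colon >= 0 && (eq < 0 || colon < eq)) {
            return new String[] {opts.substring(0, colon), opts.substring(colon + 1)};
        }
        // No separating colon: a bare built-in format name has empty options.
        if (BUILT_IN.contains(opts.toLowerCase(Locale.ROOT))) {
            return new String[] {opts, ""};
        }
        // Otherwise the whole string is treated as options.
        return new String[] {null, opts};
    }
}
```

Note that only the first separating colon is consumed, which is why "gzip:xyz:f1.4" leaves the nested "xyz:f1.4" intact for a second pass.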
-
preprocessFormatAndOptions
Parses options like "MULTISET", "MOLMOVIE" or "NOMOLMOVIE", removes them from the option string, and returns the corresponding flags. Example:
String[] fmtopts = splitFormatAndOptions("gzip:xyz:MULTISET,f1.4");
// fmtopts == {"gzip", "xyz:MULTISET,f1.4"}
int result = preprocessFormatAndOptions(fmtopts);
// fmtopts == {"gzip", "xyz:f1.4"}, result == MULTISET
- Parameters:
fmtopts
- two-element array containing the format and the options
- Returns:
- flags corresponding to the options
- See Also:
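One way to picture the flag extraction is the hypothetical sketch below, which reproduces the documented example. The bit values 1, 2 and 4 are assumptions (the real constants are defined by MFileFormatUtil), and the sketch assumes the flags appear as comma-separated tokens after an optional nested "format:" prefix.

```java
// Hypothetical sketch; flag values and parsing details are assumptions.
final class OptionFlagsSketch {
    static final int MULTISET = 1, MOLMOVIE = 2, NOMOLMOVIE = 4;

    static int preprocessFormatAndOptions(String[] fmtopts) {
        String opts = fmtopts[1];
        if (opts == null) {
            return 0;
        }
        // Keep a nested format prefix such as "xyz:" (a colon before any '=').
        String prefix = "";
        int colon = opts.indexOf(':');
        int eq = opts.indexOf('=');
        if (colon >= 0 && (eq < 0 || colon < eq)) {
            prefix = opts.substring(0, colon + 1);
            opts = opts.substring(colon + 1);
        }
        int flags = 0;
        StringBuilder rest = new StringBuilder();
        for (String tok : opts.split(",")) {
            switch (tok) {
                case "MULTISET": flags |= MULTISET; break;
                case "MOLMOVIE": flags |= MOLMOVIE; break;
                case "NOMOLMOVIE": flags |= NOMOLMOVIE; break;
                default:
                    // Keep every other non-empty option in its original order.
                    if (!tok.isEmpty()) {
                        if (rest.length() > 0) rest.append(',');
                        rest.append(tok);
                    }
            }
        }
        fmtopts[1] = prefix + rest;
        return flags;
    }
}
```

The array is modified in place, matching the documented behavior where fmtopts loses the flag keywords after the call.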
-
getEncodingFromOptions
Gets the encoding that was explicitly given as an import option. The format is enc{name}, where name is a Java-supported charset name.
- Parameters:
fmtopts
- the input format and options
- Returns:
- a two-element array: the first element is the encoding, the second contains the remaining import options.
- Throws:
IllegalCharsetNameException
- if the encoding is illegal
UnsupportedCharsetException
- if the encoding is unsupported
-
testEncoding
Tests whether the given charset name is supported by this JVM.
- Parameters:
enc
- the name of the charset
- Throws:
IllegalArgumentException
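A check like this can be built on the standard java.nio.charset.Charset API, as in the sketch below. It is an illustration of the documented contract, not the library's implementation. Both IllegalCharsetNameException (malformed name) and UnsupportedCharsetException (well-formed but unknown name) are subclasses of IllegalArgumentException, which is consistent with the single exception type listed above.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

// Illustrative sketch using only the JDK; not MFileFormatUtil's source.
final class EncodingCheckSketch {
    static void testEncoding(String enc) {
        // Charset.isSupported throws IllegalCharsetNameException for malformed names.
        if (!Charset.isSupported(enc)) {
            // Well-formed but unknown to this JVM.
            throw new UnsupportedCharsetException(enc);
        }
    }
}
```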
-
getUnguessableFormat
Gets the file format from the file name extension for formats that cannot be guessed from the file content. Used to distinguish SMARTS from SMILES.
- Parameters:
fname
- the filename
- Returns:
- the file format or null if the file contents can be used to recognize the format
-
getFileExtensionLC
Gets the file extension in lower case.
- Parameters:
f
- the file
- Returns:
- the extension in lower case
-
getFileExtensionLC
Gets the file extension in lower case.
- Parameters:
fname
- the filename
- Returns:
- the extension in lower case
-
getMostLikelyMolFormat
Gets the most likely molecule file format from the file name extension.
- Parameters:
fname
- the filename
- Returns:
- the file format or null if the format cannot be determined from the file name
-
getKnownExtension
Returns the file extension if it is a known extension. Known extensions are the following: mrv, t, gz, mol, mol2, rgf, rxn, csmol, csrgf, csrxn, sdf, cssdf, rdf, smi, smiles, sma, smarts, cml, xml, xyz, txt, html, htm, cgi, gif, jpg, jpeg, msbmp, png, svg, svgz.
- Parameters:
fname
- the filename
- Returns:
- the extension
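A hypothetical sketch combining the lower-case extension lookup with the known-extension list above. It is not the library's code: returning "" when there is no extension and null for an unknown extension are assumptions, as is ignoring dots that occur in directory names.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; return values for missing/unknown extensions are assumptions.
final class KnownExtensionSketch {
    // The known extensions listed in the documentation.
    private static final Set<String> KNOWN = Set.of(
        "mrv", "t", "gz", "mol", "mol2", "rgf", "rxn", "csmol", "csrgf", "csrxn",
        "sdf", "cssdf", "rdf", "smi", "smiles", "sma", "smarts", "cml", "xml",
        "xyz", "txt", "html", "htm", "cgi", "gif", "jpg", "jpeg", "msbmp",
        "png", "svg", "svgz");

    /** Lower-case text after the last '.' of the last path segment, or "" if none. */
    static String getFileExtensionLC(String fname) {
        int sep = Math.max(fname.lastIndexOf('/'), fname.lastIndexOf('\\'));
        int dot = fname.lastIndexOf('.');
        return dot > sep ? fname.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /** Returns the extension if it is in the known list, otherwise null. */
    static String getKnownExtension(String fname) {
        String ext = getFileExtensionLC(fname);
        return KNOWN.contains(ext) ? ext : null;
    }
}
```

Using Locale.ROOT for lower-casing avoids locale-specific surprises (such as the Turkish dotless i) when comparing against the ASCII extension list.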
-
getMolfileExtensions
Gets the array of known molecule file extensions.
- Returns:
- the array of known molecule file extensions
-
getMolfileFormats
Gets the array of known molecule file formats.
- Returns:
- the array of known molecule file formats
-
isOutputCleanable
Tests whether the specified output format is cleanable. For a non-cleanable output format, cleaning is meaningless because coordinates are not stored.
- Parameters:
fmt
- the format string
- Returns:
- true if the specified output format is non-cleanable, false otherwise
- Throws:
SecurityException
- Since:
- Marvin 4.1, 02/13/2006
-
registerFormat
Registers a user-defined file format.
- Parameters:
mff
- the file format
- Since:
- Marvin 5.0, 05/23/2007
-
getFormat
Gets the file format descriptor for the specified codename.
- Parameters:
fmt
- the format codename
- Returns:
- the descriptor or null if not found
- Since:
- Marvin 5.0, 05/23/2007
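The register/lookup pair above behaves like a registry keyed by codename. The sketch below is a hypothetical illustration of that idea: the Descriptor class stands in for MFileFormat, and case-insensitive codenames plus a thread-safe map are assumptions, not documented behavior.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry sketch; Descriptor is a stand-in for a real format descriptor.
final class FormatRegistrySketch {
    static final class Descriptor {
        final String codename;
        final String description;
        Descriptor(String codename, String description) {
            this.codename = codename;
            this.description = description;
        }
    }

    // Codename (lower case, an assumption) -> descriptor; safe for concurrent registration.
    private static final Map<String, Descriptor> FORMATS = new ConcurrentHashMap<>();

    /** Registers a user-defined format under its codename, replacing any earlier entry. */
    static void registerFormat(Descriptor d) {
        FORMATS.put(d.codename.toLowerCase(Locale.ROOT), d);
    }

    /** Returns the descriptor for the codename, or null if not found. */
    static Descriptor getFormat(String fmt) {
        return FORMATS.get(fmt.toLowerCase(Locale.ROOT));
    }
}
```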
-
findFormats
Gets a list of formats.
- Parameters:
fmt
- the format name or null if not important
flags
- select formats of which the specified flags are set
mask
- only bits specified here are taken into account
- Returns:
- the list
- Since:
- Marvin 5.0, 05/24/2007
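The flags/mask pair reads like the common bitmask-filter idiom: a format matches when its bits, restricted to mask, equal the requested flags restricted to mask. The sketch below illustrates that interpretation with made-up capability bits (READ, WRITE, BINARY) and a stand-in Fmt class; both the bit names and the exact matching rule are assumptions, not MFileFormat's real constants.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of flag/mask filtering; bit names and semantics are assumptions.
final class FormatFilterSketch {
    static final long READ = 1L, WRITE = 2L, BINARY = 4L;

    static final class Fmt {
        final String name;
        final long flags;
        Fmt(String name, long flags) {
            this.name = name;
            this.flags = flags;
        }
    }

    static List<Fmt> findFormats(List<Fmt> all, String fmt, long flags, long mask) {
        List<Fmt> out = new ArrayList<>();
        for (Fmt f : all) {
            // A null name means "any format".
            boolean nameOk = fmt == null || fmt.equalsIgnoreCase(f.name);
            // Compare only the bits selected by mask; other bits are ignored.
            boolean bitsOk = (f.flags & mask) == (flags & mask);
            if (nameOk && bitsOk) {
                out.add(f);
            }
        }
        return out;
    }
}
```

With this rule, flags=WRITE and mask=WRITE|BINARY selects writable formats that are not binary, so a mask bit left at zero in flags acts as a "must be clear" condition.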
-
createRecordReader
Creates a record reader for an input stream.
- Parameters:
is
- the input stream
opts
- input options or null
- Returns:
- the record reader or null if the format was not recognized
- Throws:
IllegalCharsetNameException
- if an illegal encoding is used
UnsupportedCharsetException
- if an unsupported encoding is used
SecurityException
- if the module cannot be loaded because of a firewall problem
IOException
- Since:
- Marvin 5.0, 06/03/2007
- See Also:
-
createRecordReader
public static MRecordReader createRecordReader(InputStream is, String opts, String enc, String path) throws IOException
Creates a record reader for an input stream.
- Parameters:
is
- the input stream
opts
- input options or null
enc
- the input encoding or null
path
- the file path (it can also be a URL) or null
- Returns:
- the record reader or null if the format was not recognized
- Throws:
IllegalCharsetNameException
- if an illegal encoding is used
UnsupportedCharsetException
- if an unsupported encoding is used
SecurityException
- if the module cannot be loaded because of a firewall problem
IOException
- Since:
- Marvin 5.0, 06/03/2007, Marvin 5.3
- See Also:
-
createExportModule
Creates an export module for the specified format.
- Parameters:
fmt
- the format name
- Throws:
SecurityException
- if the module cannot be loaded because of a firewall problem
MolExportException
- See Also:
-
createExportModule
Creates an export module for the specified format with the specified encoding.
- Parameters:
fmt
- the format name
enc
- the encoding
- Throws:
SecurityException
- if the module cannot be loaded because of a firewall problem
MolExportException
- See Also:
-
convertToSmilingFormat
Tries to convert a molecule to a SMILES-related format. SMILES, SMARTS, CxSMILES and CxSMARTS are tried in this order.
- Returns:
- the result of the first successful conversion: the 0th array element is the converted text, the 1st element is the format
- Throws:
MolExportException
- if conversion was not successful
- Since:
- Marvin 5.0, 11/11/2007
-
convertToSmilingFormat
Tries to convert a property to text with a SMILES-related format argument. SMILES, SMARTS, CxSMILES and CxSMARTS are tried in this order.
- Returns:
- the result of the first successful conversion: the 0th array element is the converted text, the 1st element is the format
- Throws:
MolExportException
- if conversion was not successful
- Since:
- Marvin 5.0, 11/11/2007
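Both overloads follow a "first success wins" fallback chain. The sketch below illustrates that pattern in isolation. The Converter interface, the lower-case format codenames, and the use of a plain Exception in place of MolExportException are all assumptions made so the example runs without the Marvin library.

```java
import java.util.List;

// Hypothetical fallback-chain sketch; Converter and the codenames are stand-ins.
final class SmilesFallbackSketch {
    /** Stand-in for an exporter: returns text in the given format or throws. */
    interface Converter {
        String convert(String format) throws Exception;
    }

    // Documented order: SMILES, SMARTS, CxSMILES, CxSMARTS.
    static final List<String> ORDER = List.of("smiles", "smarts", "cxsmiles", "cxsmarts");

    /** Returns {text, format} for the first format that converts; throws if all fail. */
    static String[] convertToSmilingFormat(Converter c) throws Exception {
        Exception last = null;
        for (String fmt : ORDER) {
            try {
                return new String[] {c.convert(fmt), fmt};
            } catch (Exception e) {
                last = e; // remember the failure and try the next format
            }
        }
        throw new Exception("no SMILES-related format could represent the input", last);
    }
}
```

Keeping the last failure as the cause preserves some diagnostic detail when every format in the chain is rejected.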
-
recognizeOneLineFormat
Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name.
- Parameters:
s
- the input string
- Returns:
- the most probable format or null
- Since:
- Marvin 4.1, 04/06/2006
-
recognizeOneLineFormat
Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name.
- Parameters:
s
- the input string
forbiddeneFormats
- the formats (MFileFormat) that should not be recognized
- Returns:
- the most probable format or null
- Since:
- Marvin 4.1, 04/06/2006
-
isURLOrFileName
Tests whether the specified string is a URL (absolute or relative) or file name.
- Parameters:
s
- the string
- Returns:
- true if it is a URL or file name, false otherwise
-
getFormatNamesWithExtension
-