Class MFileFormatUtil

java.lang.Object
chemaxon.formats.MFileFormatUtil

@PublicApi public class MFileFormatUtil extends Object
File format related utility functions.
Since:
Marvin 4.1, 12/15/2005
  • Field Details

    • MULTISET

      public static final int MULTISET
      The multi-molecule file really contains multiple atom sets of one molecule.
    • MOLMOVIE

      public static final int MOLMOVIE
      Read multi-molecule files as movies.
      Since:
      Marvin 5.2, 02/12/2009
    • NOMOLMOVIE

      public static final int NOMOLMOVIE
      Do not read multi-molecule XYZ files as movies.
      Since:
      Marvin 5.2, 02/12/2009
  • Constructor Details

    • MFileFormatUtil

      public MFileFormatUtil()
  • Method Details

    • isSubFormatOf

      public static boolean isSubFormatOf(String f, String other)
      Tests whether a format is a sub-format of another format.
      Parameters:
      f - the format codename
      other - the other format
      Returns:
      true if f is a format variant (sub-format) of other, false otherwise
      Since:
      Marvin 4.1, 04/07/2006
    • splitFileAndOptions

      public static String[] splitFileAndOptions(String arg)
      Parses "file{options}" strings used in molecule file import.
      Parameters:
      arg - string containing the filename and the options (if any)
      Returns:
      a two-element array containing the filename and the options.
    • splitFormatAndOptions

      public static String[] splitFormatAndOptions(String opts)
      Parses "format:options" strings used in molecule file import and export. Examples:
       splitFormatAndOptions("xyz:f1.4") returns {"xyz", "f1.4"}
       splitFormatAndOptions("f1.4") returns {null, "f1.4"}
       splitFormatAndOptions("xyz:") returns {"xyz", ""}
       splitFormatAndOptions("gzip:xyz:f1.4") returns {"gzip", "xyz:f1.4"}
       
      The colon can be omitted in the case of Marvin's built-in input formats. Example:
       splitFormatAndOptions("xyz") returns { "xyz", ""}
       
      Colons after the first equality sign are ignored. This is to allow options which have a parameter that can contain a colon (e.g. URLs). Example:
       splitFormatAndOptions("param=https://chemaxon.com") returns {null, "param=https://chemaxon.com"}
       
      Parameters:
      opts - string containing the format and the options
      Returns:
      a two-element array containing the format (or null if none was given) and the options.
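
The splitting rules listed above can be illustrated with a small standalone sketch. It is not Marvin's implementation: the class `FormatOptionsSketch`, the method `split`, and the `BUILT_IN` set are hypothetical names, and the set only stands in for the real list of built-in format codenames.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class FormatOptionsSketch {
    // Hypothetical stand-in for Marvin's built-in format codenames (assumption).
    private static final Set<String> BUILT_IN =
            new HashSet<>(Arrays.asList("xyz", "mol", "sdf", "smiles", "mrv"));

    /** Splits "format:options" following the rules documented above. */
    static String[] split(String opts) {
        // Colons after the first '=' are ignored, so only search before it.
        int eq = opts.indexOf('=');
        String head = eq < 0 ? opts : opts.substring(0, eq);
        int colon = head.indexOf(':');
        if (colon >= 0) {
            // Split at the first colon; later colons stay in the options part.
            return new String[] { opts.substring(0, colon), opts.substring(colon + 1) };
        }
        if (BUILT_IN.contains(opts)) {
            // A bare built-in format name needs no trailing colon.
            return new String[] { opts, "" };
        }
        // No format part at all: everything is options.
        return new String[] { null, opts };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("gzip:xyz:f1.4")));
        System.out.println(Arrays.toString(split("param=https://chemaxon.com")));
    }
}
```

Searching for the colon only in the text before the first equality sign is what keeps URL-valued parameters intact in this sketch.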
    • preprocessFormatAndOptions

      public static int preprocessFormatAndOptions(String[] fmtopts)
      Parses options like "MULTISET", "MOLMOVIE" or "NOMOLMOVIE". Example:
       String[] fmtopts = splitFormatAndOptions("gzip:xyz:MULTISET,f1.4");
       // fmtopts == {"gzip", "xyz:MULTISET,f1.4"}
       int result = preprocessFormatAndOptions(fmtopts);
       // fmtopts == {"gzip", "xyz:f1.4"}, result == MULTISET
       
      Parameters:
      fmtopts - two-element array containing the format and the options
      Returns:
      flags corresponding to the options
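
The in-place behaviour shown in the example (flag tokens are removed from the options element and reported as bit flags) could look roughly like the sketch below. This is an illustration under assumptions, not the library code: the class name, the method name, and the bit values 1, 2 and 4 are hypothetical, and options are assumed to be comma-separated as in the example.

```java
import java.util.ArrayList;
import java.util.List;

final class PreprocessSketch {
    // Hypothetical bit values; the real constants are defined by MFileFormatUtil.
    static final int MULTISET = 1;
    static final int MOLMOVIE = 2;
    static final int NOMOLMOVIE = 4;

    /** Removes flag tokens from fmtopts[1] in place and returns the combined flags. */
    static int preprocess(String[] fmtopts) {
        String opts = fmtopts[1];
        if (opts == null) {
            return 0;
        }
        // Keep a nested "format:" prefix such as "xyz:" untouched;
        // colons after the first '=' do not count as separators.
        int eq = opts.indexOf('=');
        String head = eq < 0 ? opts : opts.substring(0, eq);
        int colon = head.lastIndexOf(':');
        String prefix = opts.substring(0, colon + 1);

        int flags = 0;
        List<String> kept = new ArrayList<>();
        for (String token : opts.substring(colon + 1).split(",")) {
            switch (token) {
                case "MULTISET":   flags |= MULTISET;   break;
                case "MOLMOVIE":   flags |= MOLMOVIE;   break;
                case "NOMOLMOVIE": flags |= NOMOLMOVIE; break;
                default:           kept.add(token);     // ordinary option, keep it
            }
        }
        fmtopts[1] = prefix + String.join(",", kept);
        return flags;
    }

    public static void main(String[] args) {
        String[] fmtopts = { "gzip", "xyz:MULTISET,f1.4" };
        int flags = preprocess(fmtopts);
        System.out.println(fmtopts[1] + " flags=" + flags);
    }
}
```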
    • getEncodingFromOptions

      public static String[] getEncodingFromOptions(String fmtopts)
      Gets the encoding that was explicitly given as an import option. The format is enc{name}, where name is a charset name supported by Java.
      Parameters:
      fmtopts - the input format and options
      Returns:
      a two-element array; the first element is the encoding, the second contains the remaining import options.
      Throws:
      IllegalCharsetNameException - if the encoding is illegal
      UnsupportedCharsetException - if the encoding is unsupported
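
To make the enc{name} convention concrete, here is a minimal sketch of pulling such an option out of an option string. It is an assumption-laden illustration rather than the actual method: `EncodingOptionSketch` and `extract` are hypothetical names, options are assumed to be comma-separated, and a null encoding is returned when no enc{...} option is present. Validation relies on the standard `java.nio.charset.Charset.forName`, which throws the two exceptions listed above.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncodingOptionSketch {
    // Matches enc{name} plus an optional trailing comma separator.
    private static final Pattern ENC = Pattern.compile("enc\\{([^}]*)\\},?");

    /** Returns {encoding, remaining options}; encoding is null if none was given (assumption). */
    static String[] extract(String opts) {
        Matcher m = ENC.matcher(opts);
        if (!m.find()) {
            return new String[] { null, opts };
        }
        String name = m.group(1);
        // Throws IllegalCharsetNameException or UnsupportedCharsetException.
        Charset.forName(name);
        String rest = opts.substring(0, m.start()) + opts.substring(m.end());
        if (rest.endsWith(",")) {
            // Drop the separator left behind when enc{...} was the last option.
            rest = rest.substring(0, rest.length() - 1);
        }
        return new String[] { name, rest };
    }

    public static void main(String[] args) {
        String[] parts = extract("enc{UTF-8},f1.4");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```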
    • testEncoding

      public static void testEncoding(String enc) throws IllegalArgumentException
      Tests whether the given charset name is supported by this JVM.
      Parameters:
      enc - the name of the charset
      Throws:
      IllegalArgumentException
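
A charset check of this kind can be written with the standard library alone. The sketch below is not the method's source; `CharsetCheckSketch`, `check` and `isUsable` are hypothetical names. It relies on the fact that `IllegalCharsetNameException` and `UnsupportedCharsetException` are both subclasses of `IllegalArgumentException`, so a single catch clause covers both failure modes.

```java
import java.nio.charset.Charset;

final class CharsetCheckSketch {
    /** Returns normally if this JVM supports the charset, otherwise throws IllegalArgumentException. */
    static void check(String enc) {
        // forName throws IllegalCharsetNameException for malformed names and
        // UnsupportedCharsetException for unknown ones; both extend IllegalArgumentException.
        Charset.forName(enc);
    }

    /** Convenience wrapper that converts the exception into a boolean. */
    static boolean isUsable(String enc) {
        try {
            check(enc);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsable("UTF-8"));
        System.out.println(isUsable("bad name!"));
    }
}
```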
    • getUnguessableFormat

      public static String getUnguessableFormat(String fname)
      Gets the file format from the file name extension for formats that are not guessable from the file content. Used to distinguish SMARTS and SMILES.
      Parameters:
      fname - the filename
      Returns:
      the file format or null if the file contents can be used to recognize the format
    • getFileExtensionLC

      public static String getFileExtensionLC(File f)
      Gets the file extension in lower case.
      Parameters:
      f - the file
      Returns:
      the extension in lower case
    • getFileExtensionLC

      public static String getFileExtensionLC(String fname)
      Gets the file extension in lower case.
      Parameters:
      fname - the filename
      Returns:
      the extension in lower case
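
As a rough illustration of lower-case extension extraction, consider the sketch below. The names `ExtensionSketch` and `extensionLowerCase` are hypothetical, and returning an empty string when there is no extension is an assumption; the documentation above does not say what the real method returns in that case.

```java
import java.util.Locale;

final class ExtensionSketch {
    /** Returns the text after the last dot of the file name, lower-cased; "" if none (assumption). */
    static String extensionLowerCase(String fname) {
        // Ignore dots that belong to a directory name rather than the file name.
        int slash = Math.max(fname.lastIndexOf('/'), fname.lastIndexOf('\\'));
        int dot = fname.lastIndexOf('.');
        if (dot <= slash) {
            return "";
        }
        // Locale.ROOT avoids locale-specific case mapping surprises.
        return fname.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extensionLowerCase("Caffeine.MOL"));
    }
}
```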
    • getMostLikelyMolFormat

      public static String getMostLikelyMolFormat(String fname)
      Gets the most likely molecule file format from the file name extension.
      Parameters:
      fname - the filename
      Returns:
      the file format or null if the format cannot be determined from the file name
    • getKnownExtension

      public static String getKnownExtension(String fname)
      Returns the file extension if it is a known extension. Known extensions are the following: mrv t gz mol mol2 rgf rxn csmol csrgf csrxn sdf cssdf rdf smi smiles sma smarts cml xml xyz txt html htm cgi gif jpg jpeg msbmp png svg svgz
      Parameters:
      fname - the filename
      Returns:
      the extension
    • getMolfileExtensions

      public static String[] getMolfileExtensions()
      Gets the array of known molecule file extensions.
      Returns:
      the array of known molecule file extensions
    • getMolfileFormats

      public static String[] getMolfileFormats()
      Gets the array of known molecule file formats.
      Returns:
      the array of known molecule file formats
    • isOutputCleanable

      public static boolean isOutputCleanable(String fmt) throws SecurityException
      Tests whether the specified output format is cleanable. For a non-cleanable output format, cleaning is meaningless because coordinates are not stored.
      Parameters:
      fmt - the format string
      Returns:
      true if the specified output format is cleanable, false otherwise
      Throws:
      SecurityException
      Since:
      Marvin 4.1, 02/13/2006
    • registerFormat

      public static void registerFormat(MFileFormat mff)
      Registers a user defined file format.
      Parameters:
      mff - the file format
      Since:
      Marvin 5.0, 05/23/2007
    • getFormat

      public static MFileFormat getFormat(String fmt)
      Gets the file format descriptor for the specified codename.
      Parameters:
      fmt - the format codename
      Returns:
      the descriptor or null if not found
      Since:
      Marvin 5.0, 05/23/2007
    • findFormats

      public static MFileFormat[] findFormats(String fmt, long flags, long mask)
      Gets a list of formats.
      Parameters:
      fmt - the format name, or null to match any name
      flags - select formats that have the specified flags set
      mask - only the bits specified here are taken into account
      Returns:
      the list
      Since:
      Marvin 5.0, 05/24/2007
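
One plausible reading of the flags and mask parameters is the usual masked comparison: a format matches when its flag bits agree with flags on every bit selected by mask. The sketch below shows only that bit arithmetic. It is an interpretation, not the documented implementation, and the names `FlagMaskSketch`, `matches`, `IMPORT`, `EXPORT` and `MULTI` are hypothetical.

```java
final class FlagMaskSketch {
    // Hypothetical flag bits used only for this illustration.
    static final long IMPORT = 1L;
    static final long EXPORT = 2L;
    static final long MULTI = 4L;

    /** True if formatFlags agrees with flags on every bit selected by mask. */
    static boolean matches(long formatFlags, long flags, long mask) {
        return (formatFlags & mask) == (flags & mask);
    }

    public static void main(String[] args) {
        long format = IMPORT | EXPORT;
        // Require IMPORT to be set and ignore every other bit.
        System.out.println(matches(format, IMPORT, IMPORT));
        // Require IMPORT set and EXPORT clear.
        System.out.println(matches(format, IMPORT, IMPORT | EXPORT));
    }
}
```

Under this reading, passing the same value for flags and mask selects formats that have all of those bits set, while a mask wider than flags also requires the extra masked bits to be clear.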
    • createRecordReader

      public static MRecordReader createRecordReader(InputStream is, String opts) throws IOException
      Creates a record reader for an input stream.
      Parameters:
      is - the input stream
      opts - input options or null
      Returns:
      the record reader or null if the format was not recognized
      Throws:
      IllegalCharsetNameException - if illegal encoding is used
      UnsupportedCharsetException - if unsupported encoding is used
      SecurityException - if the module cannot be loaded because of a firewall problem
      IOException
      Since:
      Marvin 5.0, 06/03/2007
    • createRecordReader

      public static MRecordReader createRecordReader(InputStream is, String opts, String enc, String path) throws IOException
      Creates a record reader for an input stream.
      Parameters:
      is - the input stream
      opts - input options or null
      enc - the input encoding or null
      path - the file path (it can also be a URL) or null
      Returns:
      the record reader or null if the format was not recognized
      Throws:
      IllegalCharsetNameException - if illegal encoding is used
      UnsupportedCharsetException - if unsupported encoding is used
      SecurityException - if the module cannot be loaded because of a firewall problem
      IOException
      Since:
      Marvin 5.0, 06/03/2007, Marvin 5.3
    • createExportModule

      public static MolExportModule createExportModule(String fmt) throws MolExportException
      Creates an export module for the specified format.
      Parameters:
      fmt - the format name
      Throws:
      SecurityException - if the module cannot be loaded because of a firewall problem
      MolExportException
    • createExportModule

      public static MolExportModule createExportModule(String fmt, String enc) throws MolExportException
      Creates an export module for the specified format with the specified encoding.
      Parameters:
      fmt - the format name
      enc - the encoding
      Throws:
      SecurityException - if the module cannot be loaded because of a firewall problem
      MolExportException
    • convertToSmilingFormat

      public static String[] convertToSmilingFormat(Molecule m) throws MolExportException
      Tries to convert a molecule to a SMILES related format. SMILES, SMARTS, CxSMILES and CxSMARTS are tried in this order.
      Returns:
      the result of the first successful conversion, the 0th array element is the converted text, the 1st element is the format
      Throws:
      MolExportException - if conversion was not successful
      Since:
      Marvin 5.0, 11/11/2007
    • convertToSmilingFormat

      public static String[] convertToSmilingFormat(MProp p) throws MolExportException
      Tries to convert a property to text with a SMILES related format argument. SMILES, SMARTS, CxSMILES and CxSMARTS are tried in this order.
      Returns:
      the result of the first successful conversion, the 0th array element is the converted text, the 1st element is the format
      Throws:
      MolExportException - if conversion was not successful
      Since:
      Marvin 5.0, 11/11/2007
    • recognizeOneLineFormat

      public static String recognizeOneLineFormat(String s)
      Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name.
      Parameters:
      s - the input string
      Returns:
      the most probable format or null
      Since:
      Marvin 4.1, 04/06/2006
    • recognizeOneLineFormat

      public static String recognizeOneLineFormat(String s, MFileFormat... forbiddeneFormats)
      Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name, skipping the specified formats.
      Parameters:
      s - the input string
      forbiddeneFormats - the formats that should not be recognized
      Returns:
      the most probable format or null
      Since:
      Marvin 4.1, 04/06/2006
    • isURLOrFileName

      public static boolean isURLOrFileName(String s)
      Tests whether the specified string is a URL (absolute or relative) or a file name.
      Parameters:
      s - the string
      Returns:
      true if it is a URL or file name, false otherwise
    • getFormatNamesWithExtension

      public static List<String> getFormatNamesWithExtension(String fileName)