Class MFileFormatUtil


  • @PublicAPI
    public class MFileFormatUtil
    extends Object
    File format related utility functions.
    Since:
    Marvin 4.1, 12/15/2005
    • Field Detail

      • MULTISET

        public static final int MULTISET
        The multi-molecule file really contains multiple atom sets of one molecule.
        See Also:
        Constant Field Values
      • MOLMOVIE

        public static final int MOLMOVIE
        Read multi-molecule files as movies.
        Since:
        Marvin 5.2, 02/12/2009
        See Also:
        Constant Field Values
      • NOMOLMOVIE

        public static final int NOMOLMOVIE
        Do not read multi-molecule XYZ files as movies.
        Since:
        Marvin 5.2, 02/12/2009
        See Also:
        Constant Field Values
    • Constructor Detail

      • MFileFormatUtil

        public MFileFormatUtil()
    • Method Detail

      • isSubFormatOf

        public static boolean isSubFormatOf​(String f,
                                            String other)
        Tests whether a format is a sub-format of another format.
        Parameters:
        f - the format codename
        other - the other format
        Returns:
        true if f is a sub-format (format variant) of other, false otherwise
        Since:
        Marvin 4.1, 04/07/2006
      • splitFileAndOptions

        public static String[] splitFileAndOptions​(String arg)
        Parses "file{options}" strings used in molecule file import.
        Parameters:
        arg - string containing the filename and the options (if any)
        Returns:
        a two-element array containing the filename and the options.
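        The "file{options}" convention can be illustrated with a minimal standalone sketch. It is not Marvin's implementation: the class and method names are hypothetical, and returning null for missing options is an assumption, since the documentation above does not say what the second element holds when no options are given.

```java
final class FileOptionsSketch {
    /**
     * Splits "file{options}" into {file, options}.
     * Assumption: the options element is null when no trailing "{...}" is present.
     */
    static String[] splitFileAndOptionsSketch(String arg) {
        int open = arg.lastIndexOf('{');
        if (open >= 0 && arg.endsWith("}")) {
            return new String[] {
                arg.substring(0, open),                    // file name part
                arg.substring(open + 1, arg.length() - 1)  // text between the braces
            };
        }
        return new String[] { arg, null };
    }

    public static void main(String[] args) {
        String[] parts = splitFileAndOptionsSketch("caffeine.xyz{gzip:xyz:f1.4}");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

        The sketch looks for the last opening brace, which is a simplification: options that themselves contain braces would need more careful parsing.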
      • splitFormatAndOptions

        public static String[] splitFormatAndOptions​(String opts)
        Parses "format:options" strings used in molecule file import and export. Examples:
         splitFormatAndOptions("xyz:f1.4") returns {"xyz", "f1.4"}
         splitFormatAndOptions("f1.4") returns {null, "f1.4"}
         splitFormatAndOptions("xyz:") returns {"xyz", ""}
         splitFormatAndOptions("gzip:xyz:f1.4") returns {"gzip", "xyz:f1.4"}
         
        The colon can be omitted in the case of Marvin's built-in input formats. Example:
         splitFormatAndOptions("xyz") returns { "xyz", ""}
         
        Colons after the first equality sign are ignored. This is to allow options which have a parameter that can contain a colon (e.g. URLs). Example:
         splitFormatAndOptions("param=https://chemaxon.com") returns {null, "param=https://chemaxon.com"}
         
        Parameters:
        opts - string containing the format and the options
        Returns:
        an array containing the format(s) and the options.
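        The splitting rules described above can be reproduced by a short standalone sketch. It is an illustration only, not the library's code: the class name, the method name and the BUILT_IN set are hypothetical, and the set is a small stand-in for Marvin's real list of built-in formats.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class FormatOptionsSketch {
    // Hypothetical stand-in for Marvin's built-in input formats (assumption).
    static final Set<String> BUILT_IN =
            new HashSet<>(Arrays.asList("mrv", "mol", "sdf", "smiles", "xyz"));

    /** Splits "format:options" following the documented examples. */
    static String[] splitSketch(String opts) {
        int eq = opts.indexOf('=');
        int colon = opts.indexOf(':');
        // Only a colon that precedes the first '=' separates format and options.
        if (colon >= 0 && (eq < 0 || colon < eq)) {
            return new String[] { opts.substring(0, colon), opts.substring(colon + 1) };
        }
        // A bare built-in format name counts as a format with empty options.
        if (BUILT_IN.contains(opts)) {
            return new String[] { opts, "" };
        }
        // Otherwise the whole string is treated as options.
        return new String[] { null, opts };
    }

    public static void main(String[] args) {
        String[] inputs = {
            "xyz:f1.4", "f1.4", "xyz:", "gzip:xyz:f1.4", "xyz", "param=https://chemaxon.com"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + Arrays.toString(splitSketch(s)));
        }
    }
}
```

        Only the first colon is used as a separator, which is why "gzip:xyz:f1.4" keeps the nested "xyz:f1.4" in the options element.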
      • preprocessFormatAndOptions

        public static int preprocessFormatAndOptions​(String[] fmtopts)
        Parses options like "MULTISET", "MOLMOVIE" or "NOMOLMOVIE". Example:
         String[] fmtopts = splitFormatAndOptions("gzip:xyz:MULTISET,f1.4");
         // fmtopts == {"gzip", "xyz:MULTISET,f1.4"}
         int result = preprocessFormatAndOptions(fmtopts);
         // fmtopts == {"gzip", "xyz:f1.4"}, result == MULTISET
         
        Parameters:
        fmtopts - two-element array containing the format and the options
        Returns:
        flags corresponding to the options
        See Also:
        splitFormatAndOptions(java.lang.String), MULTISET, MOLMOVIE, NOMOLMOVIE
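        The way flag-like options are stripped from the option string and turned into a bit set can be sketched in plain Java. This is a hypothetical reconstruction based only on the example above: the numeric flag values are invented (the real ones are listed under Constant Field Values), and keeping everything up to the last colon as a nested format prefix is a simplification.

```java
final class OptionFlagsSketch {
    // Hypothetical bit values; the real constants may differ (assumption).
    static final int MULTISET = 1;
    static final int MOLMOVIE = 2;
    static final int NOMOLMOVIE = 4;

    /** Removes flag options from fmtopts[1] and returns the collected flags. */
    static int preprocessSketch(String[] fmtopts) {
        String opts = fmtopts[1];
        int colon = opts.lastIndexOf(':');             // end of a nested "format:" prefix, if any
        String prefix = opts.substring(0, colon + 1);  // empty when there is no colon
        int flags = 0;
        StringBuilder rest = new StringBuilder();
        for (String token : opts.substring(colon + 1).split(",")) {
            switch (token) {
                case "MULTISET":   flags |= MULTISET;   break;
                case "MOLMOVIE":   flags |= MOLMOVIE;   break;
                case "NOMOLMOVIE": flags |= NOMOLMOVIE; break;
                default:
                    // Any other option is kept in its original order.
                    if (rest.length() > 0) rest.append(',');
                    rest.append(token);
            }
        }
        fmtopts[1] = prefix + rest;
        return flags;
    }

    public static void main(String[] args) {
        String[] fmtopts = { "gzip", "xyz:MULTISET,f1.4" };
        int result = preprocessSketch(fmtopts);
        System.out.println(fmtopts[1] + " flags=" + result);
    }
}
```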
      • getEncodingFromOptions

        public static String[] getEncodingFromOptions​(String fmtopts)
        Gets the encoding that was explicitly given as an import option. The format is enc{name}, where name is a charset name supported by Java.
        Parameters:
        fmtopts - the input format and options
        Returns:
        a two-element array: the first element is the encoding, the second contains the remaining import options.
        Throws:
        IllegalCharsetNameException - if the encoding is illegal
        UnsupportedCharsetException - if the encoding is unsupported
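        The two exceptions listed above are the same ones that java.nio.charset.Charset.forName throws. The sketch below shows, in plain Java, which kind of charset name leads to which exception. It does not call getEncodingFromOptions, and whether Marvin delegates to Charset.forName internally is an assumption. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

final class CharsetCheckSketch {
    /** Classifies a charset name by the outcome of Charset.forName. */
    static String classify(String name) {
        try {
            Charset.forName(name);
            return "supported";
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not allowed in charset names.
            return "illegal name";
        } catch (UnsupportedCharsetException e) {
            // The name is well formed, but this JVM has no such charset.
            return "unsupported";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("UTF-8"));
        System.out.println(classify("no such charset"));      // spaces are not legal in a name
        System.out.println(classify("x-not-a-real-charset")); // legal syntax, unknown charset
    }
}
```

        Callers that pass user-supplied encodings may want to catch both exceptions, since both are unchecked.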
      • getUnguessableFormat

        public static String getUnguessableFormat​(String fname)
        Gets the file format from the file name extension for formats that are not guessable from the file content. Used to distinguish SMARTS and SMILES.
        Parameters:
        fname - the filename
        Returns:
        the file format or null if the file contents can be used to recognize the format
      • getFileExtensionLC

        public static String getFileExtensionLC​(File f)
        Gets the file extension in lower case.
        Parameters:
        f - the file
        Returns:
        the extension in lower case
      • getFileExtensionLC

        public static String getFileExtensionLC​(String fname)
        Gets the file extension in lower case.
        Parameters:
        fname - the filename
        Returns:
        the extension in lower case
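        A lower-case extension lookup of this kind can be sketched as follows. This is a hypothetical standalone version, not the library's code: the documentation above does not say what is returned for names without an extension, so returning null in that case is an assumption.

```java
import java.util.Locale;

final class ExtensionSketch {
    /**
     * Returns the text after the last dot of the last path element, in lower case.
     * Assumption: null is returned when the name has no extension.
     */
    static String extensionLowerCase(String fname) {
        int sep = Math.max(fname.lastIndexOf('/'), fname.lastIndexOf('\\'));
        int dot = fname.lastIndexOf('.');
        // The dot must belong to the file name itself and must not be the last character.
        if (dot > sep && dot < fname.length() - 1) {
            return fname.substring(dot + 1).toLowerCase(Locale.ROOT);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extensionLowerCase("dir/Caffeine.MOL"));
        System.out.println(extensionLowerCase("archive.tar.GZ"));
        System.out.println(extensionLowerCase("README"));
    }
}
```

        Locale.ROOT is used so that the result does not depend on the default locale of the JVM.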
      • getMostLikelyMolFormat

        public static String getMostLikelyMolFormat​(String fname)
        Gets the most likely molecule file format from the file name extension.
        Parameters:
        fname - the filename
        Returns:
        the file format or null if the format cannot be determined from the file name
      • getKnownExtension

        public static String getKnownExtension​(String fname)
        Returns the file extension if it is a known extension. Known extensions are the following: mrv t gz mol mol2 rgf rxn csmol csrgf csrxn sdf cssdf rdf smi smiles sma smarts cml xml xyz txt html htm cgi gif jpg jpeg msbmp png svg svgz
        Parameters:
        fname - the filename
        Returns:
        the extension
      • getMolfileExtensions

        public static String[] getMolfileExtensions()
        Gets the array of known molecule file extensions.
        Returns:
        the array of known molecule file extensions
      • getMolfileFormats

        public static String[] getMolfileFormats()
        Gets the array of known molecule file formats.
        Returns:
        the array of known molecule file formats
      • isOutputCleanable

        public static boolean isOutputCleanable​(String fmt)
                                         throws SecurityException
        Tests whether the specified output format is cleanable. For a non-cleanable output format, cleaning is meaningless because coordinates are not stored.
        Parameters:
        fmt - the format string
        Returns:
        true if the specified output format is cleanable, false otherwise
        Throws:
        SecurityException
        Since:
        Marvin 4.1, 02/13/2006
      • registerFormat

        public static void registerFormat​(MFileFormat mff)
        Registers a user defined file format. The MFileFormat.F_USER_DEFINED flag is automatically set.
        Parameters:
        mff - the file format
        Since:
        Marvin 5.0, 05/23/2007
      • getFormat

        public static MFileFormat getFormat​(String fmt)
        Gets the file format descriptor for the specified codename.
        Parameters:
        fmt - the format codename
        Returns:
        the descriptor or null if not found
        Since:
        Marvin 5.0, 05/23/2007
      • findFormats

        public static MFileFormat[] findFormats​(String fmt,
                                                long flags,
                                                long mask)
        Gets a list of formats.
        Parameters:
        fmt - the format name or null if not important
        flags - select formats of which the specified flags are set
        mask - only bits specified here are taken into account
        Returns:
        the list
        Since:
        Marvin 5.0, 05/24/2007
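        One plausible reading of the flags and mask parameters is a masked bit comparison, sketched below. This is an interpretation of the parameter descriptions, not confirmed library behavior. The flag constants and the matches helper are hypothetical and do not correspond to real MFileFormat fields.

```java
final class FlagMaskSketch {
    // Hypothetical flag bits, for illustration only (assumption).
    static final long CAN_IMPORT = 1L;
    static final long CAN_EXPORT = 1L << 1;
    static final long USER_DEFINED = 1L << 2;

    /** True if formatFlags agrees with flags on every bit selected by mask. */
    static boolean matches(long formatFlags, long flags, long mask) {
        return (formatFlags & mask) == (flags & mask);
    }

    public static void main(String[] args) {
        long userImporter = CAN_IMPORT | USER_DEFINED;
        // Only the import bit is inspected, so the user-defined bit is ignored.
        System.out.println(matches(userImporter, CAN_IMPORT, CAN_IMPORT));
        // Import must be set and export must be clear.
        System.out.println(matches(CAN_IMPORT | CAN_EXPORT, CAN_IMPORT, CAN_IMPORT | CAN_EXPORT));
    }
}
```

        Under this reading, a bit that is set in mask but clear in flags selects formats that do not have that flag.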
      • convertToSmilingFormat

        public static String[] convertToSmilingFormat​(Molecule m)
                                               throws MolExportException
        Tries to convert a molecule to a SMILES related format. SMILES, SMARTS, CxSMILES and CxSMARTS are tried in this order.
        Returns:
        the result of the first successful conversion, the 0th array element is the converted text, the 1st element is the format
        Throws:
        MolExportException - if conversion was not successful
        Since:
        Marvin 5.0, 11/11/2007
      • convertToSmilingFormat

        public static String[] convertToSmilingFormat​(MProp p)
                                               throws MolExportException
        Tries to convert a property to text with a SMILES related format argument. SMILES, SMARTS, CxSMILES and CxSMARTS are tried in this order.
        Returns:
        the result of the first successful conversion, the 0th array element is the converted text, the 1st element is the format
        Throws:
        MolExportException - if conversion was not successful
        Since:
        Marvin 5.0, 11/11/2007
      • recognizeOneLineFormat

        public static String recognizeOneLineFormat​(String s)
        Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name.
        Parameters:
        s - the input string
        Returns:
        the most probable format or null
        Since:
        Marvin 4.1, 04/06/2006
      • recognizeOneLineFormat

        public static String recognizeOneLineFormat​(String s,
                                                    MFileFormat... forbiddeneFormats)
        Recognizes a one-line string as CxSMILES, CxSMARTS, AbbrevGroup, Peptide or IUPAC name.
        Parameters:
        s - the input string
        forbiddeneFormats - the list of MFileFormat values that should not be recognized
        Returns:
        the most probable format or null
        Since:
        Marvin 4.1, 04/06/2006
      • isURLOrFileName

        public static boolean isURLOrFileName​(String s)
        Tests whether the specified string is a URL (absolute or relative) or file name.
        Parameters:
        s - the string
        Returns:
        true if it is a URL or file name, false otherwise
      • getFormatNamesWithExtension

        public static List<String> getFormatNamesWithExtension​(String fileName)