Class DocumentAnnotator

  • All Implemented Interfaces:
    chemaxon.marvin.io.formats.MoleculeImporterIface, AutoCloseable

    @PublicAPI
    public class DocumentAnnotator
    extends Object
    implements chemaxon.marvin.io.formats.MoleculeImporterIface, AutoCloseable
    Generates a chemically annotated HTML view of a document.
    • Constructor Detail

      • DocumentAnnotator

        public DocumentAnnotator​(File sourceDocument)
                          throws FileNotFoundException
        Constructs a DocumentAnnotator to annotate the given document file.

        The document format (PDF, HTML, XML) will be auto-detected.

        Parameters:
        sourceDocument - the document to annotate
        Throws:
        FileNotFoundException - if the document does not exist
      • DocumentAnnotator

        public DocumentAnnotator​(InputStream sourceDocument)
                          throws IOException
        Constructs a DocumentAnnotator to annotate the given document.

        The document format (PDF, HTML, XML) will be auto-detected.

        To annotate plain text, use new DocumentAnnotator(sourceDocument, DocumentAnnotator.DocumentType.TXT) instead.

        Parameters:
        sourceDocument - the document to annotate
        Throws:
        IOException
      • DocumentAnnotator

        public DocumentAnnotator​(InputStream sourceDocument,
                                 DocumentAnnotator.DocumentType documentType)
        Constructs a DocumentAnnotator to annotate the given document.
        Parameters:
        sourceDocument - the document to annotate
        documentType - the type of the source document
    • Method Detail

      • fromPlainText

        public static DocumentAnnotator fromPlainText​(Reader source)
        Constructs a DocumentAnnotator to annotate the given text.
        Parameters:
        source - a Reader representing the source document
      • setOptions

        public void setOptions​(DocumentAnnotatorOptions options)
        Sets the options used for document annotation.
      • isAnnotationSupported

        public boolean isAnnotationSupported()
        Checks whether annotation is supported for the current document type.
        Returns:
        true if the document type is supported, false otherwise
      • setAnnotatedOutput

        public boolean setAnnotatedOutput​(File destination)
                                   throws IOException
        Sets the destination file where an annotated version of the source document should be written.

        Since not all types of source documents are supported for annotation, this method can return false, in which case no annotated document is generated.

        Returns:
        true if annotation is supported for the source document, false otherwise.
        Throws:
        IOException - if the file exists but is a directory rather than a regular file, does not exist but cannot be created, or cannot be opened for any other reason
        See Also:
        isAnnotationSupported()
      • setAnnotatedOutput

        public boolean setAnnotatedOutput​(OutputStream destination)
        Sets the destination output stream where an annotated version of the source document should be written.

        Since not all types of source documents are supported for annotation, this method can return false, in which case no annotated document is generated.

        Returns:
        true if annotation is supported for the source document, false otherwise.
        See Also:
        isAnnotationSupported()
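
        Because the return value is the only signal that no annotated document will be produced, callers may want to check it before relying on the output. The following is a minimal sketch of that check, not library code. The nested Target interface is a hypothetical stand-in that mirrors only the setAnnotatedOutput(OutputStream) signature documented above, so the snippet compiles without the library; the class and method names (AnnotatedOutputCheck, describeOutput) are likewise illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

final class AnnotatedOutputCheck {
    // Hypothetical stand-in mirroring the documented signature; not part of the library.
    interface Target {
        boolean setAnnotatedOutput(OutputStream destination);
    }

    /**
     * Registers the destination and reports whether an annotated
     * document will be written, based on the boolean return value.
     */
    static String describeOutput(Target annotator, OutputStream destination) {
        boolean supported = annotator.setAnnotatedOutput(destination);
        return supported
                ? "annotated output enabled"
                : "annotation not supported; structures only";
    }

    public static void main(String[] args) {
        // Lambda stand-ins for a supported and an unsupported source document.
        Target supported = destination -> true;
        Target unsupported = destination -> false;
        OutputStream sink = new ByteArrayOutputStream();
        System.out.println(describeOutput(supported, sink));
        System.out.println(describeOutput(unsupported, sink));
    }
}
```

        With the library on the classpath, a DocumentAnnotator instance would take the place of the stand-in, since it exposes the same method signature.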
      • setAnnotatedOutputDirectory

        public File setAnnotatedOutputDirectory​(File annotateDirectory)
                                         throws IOException
        Sets the directory where the annotated document and associated resources will be placed.

        The name of the annotated document is based on the source document file name, if known. Otherwise, it will be an arbitrary file name, which is returned as the result of this method.

        Parameters:
        annotateDirectory - the destination directory
        Returns:
        the File inside the directory where the main document will be stored, or null if annotation is not supported for the source document.
        Throws:
        IOException - if the directory cannot be used to store files.
      • setAnnotatedOutputDirectory

        public File setAnnotatedOutputDirectory​(File annotateDirectory,
                                                boolean keepOriginalExtension)
                                         throws IOException
        Sets the directory where the annotated document and associated resources will be placed.

        The name of the annotated document is based on the source document file name, if known. Otherwise, it will be an arbitrary file name, which is returned as the result of this method.

        Parameters:
        annotateDirectory - the destination directory
        Returns:
        the File inside the directory where the main document will be stored, or null if annotation is not supported for the source document.
        Throws:
        IOException - if the directory cannot be used to store files.
      • setD2SOptions

        public void setD2SOptions​(String options)
        Sets the options string to use for document annotation.
      • usePopups

        public void usePopups​(boolean addPopups)
        Parameters:
        addPopups - whether a popup should be generated for each hit
      • setMolconvert

        public void setMolconvert​(File molconvert)
      • setCustomHtmlToXmlConverter

        public void setCustomHtmlToXmlConverter​(chemaxon.naming.document.annotate.XmlToHtmlConverter customConverter)
        Provide a custom converter from XML to HTML, to be used instead of the default one.
      • read

        public Molecule read()
                      throws IOException
        Finds the next structure in the source document and returns it, or returns null when the end of the document has been reached.

        If an annotated document is being generated, calling read() might also cause the corresponding portion of the annotated document to be written to the destination.

        Specified by:
        read in interface chemaxon.marvin.io.formats.MoleculeImporterIface
        Returns:
        the next structure found in the source document, or null if the document has been fully processed.
        Throws:
        IOException
      • close

        public void close()
                   throws IOException
        Specified by:
        close in interface AutoCloseable
        Specified by:
        close in interface chemaxon.marvin.io.formats.MoleculeImporterIface
        Throws:
        IOException
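
        The read() and close() contracts above suggest a simple consumption pattern: call read() until it returns null, and close the source in a try-with-resources block. The following is a minimal sketch of that loop, not library code. StructureSource is a hypothetical stand-in interface that mirrors only the documented read()/close() behaviour, and the string items stand in for structures, so the snippet compiles without the library; readAll, sourceOf and demo are illustrative names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ReadLoopSketch {
    // Hypothetical stand-in for the documented read()/close() contract; not part of the library.
    interface StructureSource<T> extends AutoCloseable {
        T read() throws IOException; // null signals the end of the document

        @Override
        void close() throws IOException;
    }

    /** Reads until read() returns null; try-with-resources closes the source afterwards. */
    static <T> List<T> readAll(StructureSource<T> source) throws IOException {
        List<T> structures = new ArrayList<>();
        try (StructureSource<T> s = source) {
            T next;
            while ((next = s.read()) != null) {
                structures.add(next);
            }
        }
        return structures;
    }

    /** Builds an in-memory source from fixed items, for demonstration only. */
    static <T> StructureSource<T> sourceOf(List<T> items) {
        Iterator<T> it = items.iterator();
        return new StructureSource<T>() {
            @Override
            public T read() {
                return it.hasNext() ? it.next() : null;
            }

            @Override
            public void close() {
                // Nothing to release for the in-memory stand-in.
            }
        };
    }

    /** Runs the loop over two placeholder items and returns what was read. */
    static List<String> demo() {
        try {
            return readAll(sourceOf(Arrays.asList("benzene", "aspirin")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

        With the library on the classpath, the same loop would run over a DocumentAnnotator directly, with Molecule as the element type. Per the read() description, finishing the loop matters when an annotated document is being generated, because portions of it may be written as read() advances.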
      • setResourceDirectory

        public void setResourceDirectory​(File resourceDirectory)
                                  throws IOException
        Sets the directory where resources will be stored.
        Throws:
        IOException - if the directory cannot be used to store files.