Class DocumentAnnotatorOptions


  • @PublicAPI
    public class DocumentAnnotatorOptions
    extends Object
    Represents various options regarding how a document should be annotated, both in terms of input and output.
    • Field Detail

      • firstPage

        public final Integer firstPage
        The first page of the document that should be annotated.

        By default or if null, the document is annotated from its first page.

        Note that this option only applies to documents that have page numbers, in particular PDF documents.

      • lastPage

        public final Integer lastPage
        The last page of the document that should be annotated.

        By default or if null, the document is annotated until its last page.

        Note that this option only applies to documents that have page numbers, in particular PDF documents.
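As an illustration of the null defaults described for firstPage and lastPage, the following is a minimal sketch of how a page range could be resolved. The class PageRangeSketch and the method resolvePageRange are hypothetical and not part of this API, and 1-based page numbering is an assumption that the text above does not confirm.

```java
// Hypothetical helper, not part of the documented API.
class PageRangeSketch {

    /**
     * Resolves the effective page range for a document.
     * A null firstPage means "start at the beginning"; a null lastPage
     * means "go to the end". Pages are ASSUMED to be 1-based and inclusive.
     */
    static int[] resolvePageRange(Integer firstPage, Integer lastPage, int pageCount) {
        int first = (firstPage == null) ? 1 : firstPage;
        int last = (lastPage == null) ? pageCount : lastPage;
        return new int[] { first, last };
    }

    public static void main(String[] args) {
        int[] whole = resolvePageRange(null, null, 12);
        System.out.println(whole[0] + "-" + whole[1]);

        int[] tail = resolvePageRange(5, null, 12);
        System.out.println(tail[0] + "-" + tail[1]);
    }
}
```

Because both fields are declared as Integer rather than int, null is a distinct "not set" value and does not need a sentinel such as 0 or -1.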

      • listener

        public final DocumentAnnotator.Listener listener
        A listener that will be notified of the progress of the annotation.

        This can be used for instance to implement a progress bar.
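The progress-bar use case could look roughly like the sketch below. The methods of DocumentAnnotator.Listener are not shown on this page, so the example defines its own hypothetical ProgressCallback interface with an assumed onProgress(double) method taking a fraction between 0 and 1; the real listener may report progress differently.

```java
// Hypothetical sketch: ProgressCallback stands in for the real listener,
// whose actual methods are not documented in this section.
class ProgressBarSketch {

    /** Assumed callback shape: fraction is expected in the range 0.0 to 1.0. */
    interface ProgressCallback {
        void onProgress(double fraction);
    }

    /** Renders a fixed-width text bar such as [#####-----]. */
    static String renderBar(double fraction, int width) {
        // Clamp so that out-of-range values never break the layout.
        double clamped = Math.max(0.0, Math.min(1.0, fraction));
        int filled = (int) Math.round(clamped * width);
        StringBuilder bar = new StringBuilder("[");
        for (int i = 0; i < width; i++) {
            bar.append(i < filled ? '#' : '-');
        }
        return bar.append(']').toString();
    }

    public static void main(String[] args) {
        ProgressCallback callback = fraction -> System.out.println(renderBar(fraction, 10));
        // Simulated progress notifications.
        callback.onProgress(0.0);
        callback.onProgress(0.5);
        callback.onProgress(1.0);
    }
}
```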

      • useOpticalCharacterRecognition

        public final Boolean useOpticalCharacterRecognition
        Whether to use Optical Character Recognition on images.

        By default (and when set to null), Document Annotator will try to determine whether OCR is required. In particular, OCR will be skipped if the document has already been OCR'd by a known program.
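The three states of this Boolean (true, false, null) can be handled as in the sketch below. OcrDecisionSketch, shouldRunOcr and the autoDetectedNeed parameter are hypothetical names used only for illustration; how Document Annotator actually detects whether OCR is needed is not described here.

```java
// Hypothetical sketch of tri-state handling; not the library's implementation.
class OcrDecisionSketch {

    /**
     * @param useOcr           the option value: TRUE forces OCR, FALSE disables it,
     *                         null defers to automatic detection
     * @param autoDetectedNeed stand-in for the result of automatic detection
     */
    static boolean shouldRunOcr(Boolean useOcr, boolean autoDetectedNeed) {
        // Check for null first: unboxing a null Boolean throws NullPointerException.
        if (useOcr != null) {
            return useOcr;
        }
        return autoDetectedNeed;
    }

    public static void main(String[] args) {
        System.out.println(shouldRunOcr(Boolean.TRUE, false));
        System.out.println(shouldRunOcr(Boolean.FALSE, true));
        System.out.println(shouldRunOcr(null, true));
    }
}
```

Code that reads this field should compare it against null before using it in a condition, since writing `if (options.useOpticalCharacterRecognition)` directly would fail with a NullPointerException when the option is left unset.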

      • nameStructureFormat

        public final String nameStructureFormat
        The structure format used to represent hits converted from names.

      • structureFormat

        public final String structureFormat
        The structure format used to represent hits converted from sources other than names (OSR, embedded structures, ...).