Class DocumentToStructure
See the constructors of MolImporter for more general and common document-to-structure operations.
Since: 5.9

Field Summary
Modifier and Type | Field | Description
static final String | BYTE | For internal use only.
static final String | CHARACTER | Molecule property key on results: the starting character offset from the beginning of the document, for text formats (html, xml, txt).
static final String | CONFIDENCE | Molecule property key on results: the confidence that the structure is correct.
static final String | CONTEXT | Molecule property key on results: the context of the structure recognized in the text.
static final String | CONTEXT_INDEX | Molecule property key on results: index of the hit inside the context.
static final String | DOC_AUTHOR | Molecule property key on results: name of the principal author(s) of a document.
static final String | DOC_CREATION_DATE | Molecule property key on results: the date on which the document was created.
static final String | DOC_LAST_AUTHOR | Molecule property key on results: name of the last (most recent) author of a document.
static final String | DOC_PATENT_ASSIGNEES | Molecule property key on results: the assignees of the patent, separated by newline characters.
static final String | DOC_PATENT_ID | Molecule property key on results: the patent identifier.
static final String | DOC_PATENT_INVENTORS | Molecule property key on results: the inventors of the patent, separated by newline characters.
static final String | DOC_PATENT_IPC | Molecule property key on results: the IPC classification(s) for the patent, separated by newline characters.
static final String | DOC_PATENT_IPCR | Molecule property key on results: the IPCR classification(s) for the patent, separated by newline characters.
static final String | DOC_TITLE | Molecule property key on results: the title of the document.
static final String | DOCUMENT | Molecule property key on results: the file name of the source document.
static final String | END_CHARACTER | Molecule property key on results: the ending character offset from the beginning of the document, for text formats (html, xml, txt).
static final String | IDENTIFIER |
static final String | PAGE | Molecule property key on results: the page number, if applicable (e.g. for a PDF document).
static final String | SECTION | Molecule property key on results: the section of the document where the structure was found.
static final String | SOURCE_TEXT | Molecule property key on results: the source text, as it appears in the original document.
static final String | TYPE | Molecule property key on results: the type of source for the structure.
static final String | TYPE_CAS | Possible value for the TYPE property: the source is a CAS Registry Number®.
static final String | TYPE_CDX | Possible value for the TYPE property: the source is an embedded ChemDraw structure.
static final String | TYPE_COMMON | Possible value for the TYPE property: the source is a common name.
static final String | TYPE_EC | Possible value for the TYPE property: the source is an EC Number.
static final String | TYPE_GENERIC | Possible value for the TYPE property: the source is a generic name, for instance "C1-C4 alkyl".
static final String | TYPE_INCHI | Possible value for the TYPE property: the source is an InChI string.
static final String | TYPE_ION | Possible value for the TYPE property: the source is an ion abbreviation, for instance K+ or Ca2+.
static final String | TYPE_MRV | Possible value for the TYPE property: the source is an embedded Chemaxon MRV structure.
static final String | TYPE_OSR | Possible value for the TYPE property: the source is a structure image recognized by Optical Structure Recognition.
static final String | TYPE_PEPTIDE | Possible value for the TYPE property: the source is a peptide notation, for instance Val-Gly-Ser-Ala.
static final String | TYPE_SMILES | Possible value for the TYPE property: the source is a SMILES string.
static final String | TYPE_SYMYX | Possible value for the TYPE property: the source is an embedded Symyx/ISIS draw structure.
static final String | TYPE_SYSTEMATIC | Possible value for the TYPE property: the source is a systematic name.

Constructor Summary
DocumentToStructure()

Method Summary
Modifier and Type | Method | Description
static boolean | isMetadataMol |
static MolImporter | process(text) | Creates a MolImporter instance to import structures from a given text using the default format options.
static MolImporter | process(text, options) | Creates a MolImporter instance to import structures from a given text.

Field Details

SOURCE_TEXT
Molecule property key on results: the source text, as it appears in the original document.

DOCUMENT
Molecule property key on results: the file name of the source document.

PAGE
Molecule property key on results: the page number, if applicable (e.g. for a PDF document).

CHARACTER
Molecule property key on results: the starting character offset from the beginning of the document, for text formats (html, xml, txt).

END_CHARACTER
Molecule property key on results: the ending character offset from the beginning of the document, for text formats (html, xml, txt).
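The CHARACTER and END_CHARACTER offsets can be used to cut the matched text back out of the document. The following is a minimal sketch of a hypothetical helper, not part of the Chemaxon API. It assumes the property values are decimal integer strings that index into the same String the caller holds. The documentation above does not say whether the end offset is inclusive or exclusive, so the helper takes that as a parameter. Reading the values from a result Molecule is not shown.

```java
import java.util.Optional;

/** Hypothetical helper: recovers a hit's text from the document using its offsets. */
final class OffsetSpans {
    private OffsetSpans() {}

    /**
     * Assumes startValue/endValue are the CHARACTER and END_CHARACTER property values
     * as decimal strings. endInclusive is a caller choice because the docs do not
     * state whether the end offset is inclusive.
     */
    static Optional<String> span(String document, String startValue, String endValue,
                                 boolean endInclusive) {
        if (document == null || startValue == null || endValue == null) {
            return Optional.empty();
        }
        try {
            int start = Integer.parseInt(startValue.trim());
            int end = Integer.parseInt(endValue.trim()) + (endInclusive ? 1 : 0);
            // Reject offsets that do not fit the text we hold.
            if (start < 0 || start > end || end > document.length()) {
                return Optional.empty();
            }
            return Optional.of(document.substring(start, end));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```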

BYTE
For internal use only.

IDENTIFIER

DOC_AUTHOR
Molecule property key on results: name of the principal author(s) of a document.

DOC_LAST_AUTHOR
Molecule property key on results: name of the last (most recent) author of a document.

DOC_TITLE
Molecule property key on results: the title of the document.

DOC_CREATION_DATE
Molecule property key on results: the date on which the document was created.

DOC_PATENT_ID
Molecule property key on results: the patent identifier.

DOC_PATENT_IPC
Molecule property key on results: the IPC classification(s) for the patent, separated by newline characters.

DOC_PATENT_IPCR
Molecule property key on results: the IPCR classification(s) for the patent, separated by newline characters.

DOC_PATENT_ASSIGNEES
Molecule property key on results: the assignees of the patent, separated by newline characters.

DOC_PATENT_INVENTORS
Molecule property key on results: the inventors of the patent, separated by newline characters.
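Because the patent assignee, inventor, IPC and IPCR values pack several entries into one string separated by newline characters, a caller usually splits them first. This hypothetical helper does that, trimming each entry and skipping blank lines. Accepting "\r\n" as well as "\n" is a defensive assumption, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for newline-separated patent properties. */
final class PatentFields {
    private PatentFields() {}

    /** Splits a newline-separated property value into trimmed, non-empty entries. */
    static List<String> split(String value) {
        if (value == null || value.isEmpty()) {
            return Collections.emptyList();
        }
        List<String> entries = new ArrayList<>();
        // Accept both "\n" and "\r\n" line endings.
        for (String line : value.split("\r?\n")) {
            String entry = line.trim();
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }
}
```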

CONFIDENCE
Molecule property key on results: the confidence that the structure is correct. A value of 0 or less means very little confidence; a value of 1 or more means high confidence.
This is currently set for image recognition results, that is, structures found by Optical Structure Recognition (OSR), also known as "chemical OCR".
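The following is a sketch of how a caller might bucket CONFIDENCE values, assuming the property holds a decimal number as a string. The two outer thresholds follow the documented meaning (0 or less: very little confidence; 1 or more: high confidence). The INTERMEDIATE and UNKNOWN labels belong to this sketch only, since the documentation does not describe values between 0 and 1 or hits that lack the property.

```java
/** Hypothetical helper that classifies a CONFIDENCE property value. */
final class Confidence {
    enum Level { VERY_LOW, INTERMEDIATE, HIGH, UNKNOWN }

    private Confidence() {}

    static Level classify(String value) {
        if (value == null) {
            return Level.UNKNOWN; // e.g. a hit that did not come from image recognition
        }
        double c;
        try {
            c = Double.parseDouble(value.trim());
        } catch (NumberFormatException e) {
            return Level.UNKNOWN;
        }
        if (Double.isNaN(c)) {
            return Level.UNKNOWN;
        }
        if (c <= 0) {
            return Level.VERY_LOW;     // documented: 0 or less means very little confidence
        }
        if (c >= 1) {
            return Level.HIGH;         // documented: 1 or more means high confidence
        }
        return Level.INTERMEDIATE;     // between 0 and 1: not described by the docs
    }
}
```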

SECTION
Molecule property key on results: the section of the document where the structure was found.
This is currently supported only for US patents in the USPTO XML format, in which case the value of the property can be "abstract", "citation", "description" or "claim N".
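For USPTO XML patents, claims are reported as "claim N". The hypothetical helper below extracts N, assuming the value is exactly "claim", one space and a decimal number. Any other value, including "abstract", "citation" and "description", yields an empty result.

```java
import java.util.OptionalInt;

/** Hypothetical helper for SECTION property values. */
final class Sections {
    private static final String CLAIM_PREFIX = "claim ";

    private Sections() {}

    /** Returns N for a SECTION value of the form "claim N", otherwise empty. */
    static OptionalInt claimNumber(String section) {
        if (section == null) {
            return OptionalInt.empty();
        }
        String s = section.trim();
        if (!s.startsWith(CLAIM_PREFIX)) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(s.substring(CLAIM_PREFIX.length()).trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }
}
```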

CONTEXT
Molecule property key on results: the context of the structure recognized in the text.

CONTEXT_INDEX
Molecule property key on results: index of the hit inside the context.
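One possible way to display a hit in its context is sketched below. The documentation above does not specify the unit of CONTEXT_INDEX, so this sketch assumes it is a character offset into the CONTEXT value and that the hit text equals SOURCE_TEXT. It checks that assumption for every hit and returns the context unchanged when it does not hold, rather than marking the wrong span. The [[ ]] markers are arbitrary.

```java
/** Hypothetical helper that marks a hit inside its CONTEXT string. */
final class ContextHighlighter {
    private ContextHighlighter() {}

    /**
     * Assumes (not documented) that indexValue is a character offset into context
     * and that sourceText is the hit itself.
     */
    static String highlight(String context, String indexValue, String sourceText) {
        if (context == null || indexValue == null || sourceText == null || sourceText.isEmpty()) {
            return context;
        }
        int index;
        try {
            index = Integer.parseInt(indexValue.trim());
        } catch (NumberFormatException e) {
            return context;
        }
        // startsWith(prefix, offset) is false for out-of-range offsets, so this also bounds-checks.
        if (!context.startsWith(sourceText, index)) {
            return context; // the assumption does not hold for this hit
        }
        int end = index + sourceText.length();
        return context.substring(0, index) + "[[" + sourceText + "]]" + context.substring(end);
    }
}
```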

TYPE
Molecule property key on results: the type of source for the structure. Its possible values are the TYPE_ constants below.

TYPE_SYSTEMATIC
Possible value for the TYPE property: the source is a systematic name.

TYPE_COMMON
Possible value for the TYPE property: the source is a common name.

TYPE_GENERIC
Possible value for the TYPE property: the source is a generic name, for instance "C1-C4 alkyl".

TYPE_SMILES
Possible value for the TYPE property: the source is a SMILES string.

TYPE_INCHI
Possible value for the TYPE property: the source is an InChI string.

TYPE_CAS
Possible value for the TYPE property: the source is a CAS Registry Number®.

TYPE_EC
Possible value for the TYPE property: the source is an EC Number.

TYPE_ION
Possible value for the TYPE property: the source is an ion abbreviation, for instance K+ or Ca2+.

TYPE_PEPTIDE
Possible value for the TYPE property: the source is a peptide notation, for instance Val-Gly-Ser-Ala.

TYPE_CDX
Possible value for the TYPE property: the source is an embedded ChemDraw structure.

TYPE_MRV
Possible value for the TYPE property: the source is an embedded Chemaxon MRV structure.

TYPE_SYMYX
Possible value for the TYPE property: the source is an embedded Symyx/ISIS draw structure.

TYPE_OSR
Possible value for the TYPE property: the source is a structure image recognized by Optical Structure Recognition.
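The TYPE property lets a caller keep only certain kinds of hits, for example to drop generic names such as "C1-C4 alkyl". In this sketch each hit is modeled as a hypothetical Map from property keys to values, standing in for the properties of a result Molecule. In real code the key would be the TYPE constant and the allowed set would hold constants such as TYPE_SYSTEMATIC or TYPE_SMILES, whose actual string values are not listed on this page.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: filters hits (modeled as property maps) by their TYPE value. */
final class TypeFilter {
    private TypeFilter() {}

    static List<Map<String, String>> keepTypes(Collection<Map<String, String>> hits,
                                               String typeKey, Set<String> allowed) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> hit : hits) {
            String type = hit.get(typeKey);
            // Hits without a TYPE value are dropped.
            if (type != null && allowed.contains(type)) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

Keeping the key and the allowed values as parameters avoids hard-coding constant values that this page does not document.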

Constructor Details

DocumentToStructure
public DocumentToStructure()

Method Details

process
Creates a MolImporter instance to import structures from a given text using the default format options. A shorthand for process(text, null).

process
Creates a MolImporter instance to import structures from a given text. Generally, the text is treated as plain text. However, for convenience, text that starts immediately with an XML or HTML prologue is recognized as XML or HTML instead. For complete documents, a direct call to a MolImporter constructor is often more appropriate than loading the whole document into a String object.
The returned MolImporter instance does no actual resource management, so closing it is not necessary.
Parameters:
text - the plain text or HTML/XML to process
options - the "d2s" format options passed to MolImporter, or null to use the default options. Starting the String with "d2s:" is optional.
Returns:
a MolImporter that can be used to read the structures found in the text.
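The two input conventions of process(text, options) can be illustrated with a small hypothetical helper. looksLikeMarkup approximates the documented rule that only text starting immediately with an XML or HTML prologue is read as markup; the exact prologues the library recognizes are not documented, so the three prefixes checked here are assumptions. withoutPrefix reflects the note that the "d2s:" prefix on the options is optional, so an options string with or without it should mean the same thing.

```java
import java.util.Locale;

/** Hypothetical helper approximating the input rules described for process(text, options). */
final class D2sInput {
    private D2sInput() {}

    /** Guess only: the prefixes below are assumptions, not the library's actual check. */
    static boolean looksLikeMarkup(String text) {
        if (text == null) {
            return false;
        }
        // "Starts immediately": no whitespace is skipped before the prologue.
        String head = text.substring(0, Math.min(text.length(), 16)).toLowerCase(Locale.ROOT);
        return head.startsWith("<?xml") || head.startsWith("<!doctype html") || head.startsWith("<html");
    }

    /** Strips the optional "d2s:" prefix; null (meaning default options) is passed through. */
    static String withoutPrefix(String options) {
        if (options == null) {
            return null;
        }
        return options.startsWith("d2s:") ? options.substring("d2s:".length()) : options;
    }
}
```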

isMetadataMol