Class DocumentToStructure
See the constructors of MolImporter for more general and common document-to-structure operations.
- Since: 5.9

Field Summary
- static final String BYTE: For internal usage only.
- static final String CHARACTER: Molecule property key on results: the starting character offset, counted from the beginning of the document, for text formats (html, xml, txt).
- static final String CONFIDENCE: Molecule property key on results: the confidence that the structure is correct.
- static final String CONTEXT: Molecule property key on results: the context of the structure recognized in the text.
- static final String CONTEXT_INDEX: Molecule property key on results: index of the hit inside the context.
- static final String DOC_AUTHOR: Molecule property key on results: name of the principal author(s) of a document.
- static final String DOC_CREATION_DATE: Molecule property key on results: the date on which the document was created.
- static final String DOC_LAST_AUTHOR: Molecule property key on results: name of the last (most recent) author of a document.
- static final String DOC_PATENT_ASSIGNEES: Molecule property key on results: the assignees of the patent, separated by newline characters.
- static final String DOC_PATENT_ID: Molecule property key on results: the patent identifier.
- static final String DOC_PATENT_INVENTORS: Molecule property key on results: the inventors of the patent, separated by newline characters.
- static final String DOC_PATENT_IPC: Molecule property key on results: the IPC classification(s) for the patent, separated by newline characters.
- static final String DOC_PATENT_IPCR: Molecule property key on results: the IPCR classification(s) for the patent, separated by newline characters.
- static final String DOC_TITLE: Molecule property key on results: the title of the document.
- static final String DOCUMENT: Molecule property key on results: the file name of the source document.
- static final String END_CHARACTER: Molecule property key on results: the ending character offset, counted from the beginning of the document, for text formats (html, xml, txt).
- static final String IDENTIFIER
- static final String PAGE: Molecule property key on results: the page number, if applicable (e.g. for a PDF document).
- static final String SECTION: Molecule property key on results: the section of the document where the structure was found.
- static final String SOURCE_TEXT: Molecule property key on results: the source text, as it appears in the original document.
- static final String TYPE: Molecule property key on results: the type of source for the structure.
- static final String TYPE_CAS: Possible value for the TYPE property: the source is a CAS Registry Number®.
- static final String TYPE_CDX: Possible value for the TYPE property: the source is an embedded ChemDraw structure.
- static final String TYPE_COMMON: Possible value for the TYPE property: the source is a common name.
- static final String TYPE_EC: Possible value for the TYPE property: the source is an EC Number.
- static final String TYPE_GENERIC: Possible value for the TYPE property: the source is a generic name, for instance "C1-C4 alkyl".
- static final String TYPE_INCHI: Possible value for the TYPE property: the source is an InChI string.
- static final String TYPE_ION: Possible value for the TYPE property: the source is an ion abbreviation, for instance K+ or Ca2+.
- static final String TYPE_MRV: Possible value for the TYPE property: the source is an embedded Chemaxon MRV structure.
- static final String TYPE_OSR: Possible value for the TYPE property: the source is a structure image recognized by Optical Structure Recognition.
- static final String TYPE_PEPTIDE: Possible value for the TYPE property: the source is a peptide notation, for instance Val-Gly-Ser-Ala.
- static final String TYPE_SMILES: Possible value for the TYPE property: the source is a SMILES string.
- static final String TYPE_SYMYX: Possible value for the TYPE property: the source is an embedded Symyx/ISIS draw structure.
- static final String TYPE_SYSTEMATIC: Possible value for the TYPE property: the source is a systematic name.

Constructor Summary
- DocumentToStructure()

Method Summary
- static boolean isMetadataMol
- static MolImporter process(text): Creates a MolImporter instance to import structures from a given text using the default format options.
- static MolImporter process(text, options): Creates a MolImporter instance to import structures from a given text.

Field Details

SOURCE_TEXT
Molecule property key on results: the source text, as it appears in the original document.

DOCUMENT
Molecule property key on results: the file name of the source document.

PAGE
Molecule property key on results: the page number, if applicable (e.g. for a PDF document).

CHARACTER
Molecule property key on results: the starting character offset, counted from the beginning of the document, for text formats (html, xml, txt).

END_CHARACTER
Molecule property key on results: the ending character offset, counted from the beginning of the document, for text formats (html, xml, txt).

BYTE
For internal usage only.

IDENTIFIER

DOC_AUTHOR
Molecule property key on results: name of the principal author(s) of a document.

DOC_LAST_AUTHOR
Molecule property key on results: name of the last (most recent) author of a document.

DOC_TITLE
Molecule property key on results: the title of the document.

DOC_CREATION_DATE
Molecule property key on results: the date on which the document was created.

DOC_PATENT_ID
Molecule property key on results: the patent identifier.

DOC_PATENT_IPC
Molecule property key on results: the IPC classification(s) for the patent, separated by newline characters.

DOC_PATENT_IPCR
Molecule property key on results: the IPCR classification(s) for the patent, separated by newline characters.

DOC_PATENT_ASSIGNEES
Molecule property key on results: the assignees of the patent, separated by newline characters.

DOC_PATENT_INVENTORS
Molecule property key on results: the inventors of the patent, separated by newline characters.

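Several of the patent properties above (DOC_PATENT_IPC, DOC_PATENT_IPCR, DOC_PATENT_ASSIGNEES, DOC_PATENT_INVENTORS) hold multiple entries separated by newline characters. The following is a minimal sketch of how a caller might split such a value into a list. It uses only the Java standard library and assumes the property value has already been read as a String; the class name, the helper `splitEntries`, and the sample value are hypothetical and not part of this API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of DocumentToStructure.
class PatentPropertyValues {

    /** Splits a newline-separated property value into trimmed, non-empty entries. */
    static List<String> splitEntries(String rawValue) {
        if (rawValue == null || rawValue.isEmpty()) {
            return Collections.emptyList();
        }
        // "\\R" matches any line break sequence (\n, \r\n, \r, ...).
        return Arrays.stream(rawValue.split("\\R"))
                .map(String::trim)
                .filter(entry -> !entry.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up value, standing in for what DOC_PATENT_INVENTORS might contain.
        String inventors = "Jane Doe\nJohn Smith\n";
        for (String inventor : splitEntries(inventors)) {
            System.out.println(inventor);
        }
    }
}
```

Trimming and dropping empty entries is a defensive choice of this sketch, so that a trailing newline does not produce an empty entry.
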
CONFIDENCE
Molecule property key on results: the confidence that the structure is correct. A value of 0 or less means very little confidence; a value of 1 or more means high confidence. This property is currently set for structures found by image recognition, that is, Optical Structure Recognition (OSR), also known as "chemical OCR".

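The thresholds described for CONFIDENCE can be turned into a simple classification. The sketch below is illustrative only: it assumes the property value has already been read as text, and the class, the enum, and the INTERMEDIATE label for values strictly between 0 and 1 are assumptions of this example, since the documentation only characterizes the two ends of the range.

```java
import java.util.Optional;

// Hypothetical helper class, not part of DocumentToStructure.
class ConfidenceLevels {

    enum Level { LOW, INTERMEDIATE, HIGH }

    /** Maps a confidence value onto the documented thresholds. */
    static Level classify(double confidence) {
        if (confidence <= 0.0) {
            return Level.LOW;          // "0 or less means very little confidence"
        }
        if (confidence >= 1.0) {
            return Level.HIGH;         // "1 or more means high confidence"
        }
        return Level.INTERMEDIATE;     // label chosen for this sketch only
    }

    /** Classifies a raw property value; empty if the value is absent or not numeric. */
    static Optional<Level> classifyProperty(String rawValue) {
        if (rawValue == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(classify(Double.parseDouble(rawValue.trim())));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(classifyProperty("1.3"));
        System.out.println(classifyProperty("-0.2"));
        System.out.println(classifyProperty(null));
    }
}
```

Returning an empty Optional for a missing value reflects that the property is not set on every result.
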
SECTION
Molecule property key on results: the section of the document where the structure was found. This is currently supported only for US patents in the USPTO XML format, in which case the value of the property can be "abstract", "citation", "description" or "claim N".

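Because a SECTION value of the form "claim N" carries a claim number, a caller may want to extract it. The following sketch shows one way to do that with the Java standard library. It assumes that "claim N" means the lowercase word "claim", a single space, and a decimal number; that exact format, the class name, and the helper `claimNumber` are assumptions of this example rather than documented behavior.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of DocumentToStructure.
class SectionValues {

    // Assumed format: "claim " followed by 1 to 9 decimal digits.
    private static final Pattern CLAIM = Pattern.compile("claim (\\d{1,9})");

    /** Returns the claim number for a "claim N" value, or empty for any other section. */
    static OptionalInt claimNumber(String section) {
        if (section == null) {
            return OptionalInt.empty();
        }
        Matcher matcher = CLAIM.matcher(section.trim());
        if (!matcher.matches()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(matcher.group(1)));
    }

    public static void main(String[] args) {
        // Made-up values in the shapes listed for the SECTION property.
        for (String section : new String[] {"abstract", "description", "claim 12"}) {
            OptionalInt claim = claimNumber(section);
            System.out.println(section + " -> "
                    + (claim.isPresent() ? "claim number " + claim.getAsInt() : "not a claim"));
        }
    }
}
```
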
CONTEXT
Molecule property key on results: the context of the structure recognized in the text.

CONTEXT_INDEX
Molecule property key on results: index of the hit inside the context.

TYPE
Molecule property key on results: the type of source for the structure.

TYPE_SYSTEMATIC
Possible value for the TYPE property: the source is a systematic name.

TYPE_COMMON
Possible value for the TYPE property: the source is a common name.

TYPE_GENERIC
Possible value for the TYPE property: the source is a generic name, for instance "C1-C4 alkyl".

TYPE_SMILES
Possible value for the TYPE property: the source is a SMILES string.

TYPE_INCHI
Possible value for the TYPE property: the source is an InChI string.

TYPE_CAS
Possible value for the TYPE property: the source is a CAS Registry Number®.

TYPE_EC
Possible value for the TYPE property: the source is an EC Number.

TYPE_ION
Possible value for the TYPE property: the source is an ion abbreviation, for instance K+ or Ca2+.

TYPE_PEPTIDE
Possible value for the TYPE property: the source is a peptide notation, for instance Val-Gly-Ser-Ala.

TYPE_CDX
Possible value for the TYPE property: the source is an embedded ChemDraw structure.

TYPE_MRV
Possible value for the TYPE property: the source is an embedded Chemaxon MRV structure.

TYPE_SYMYX
Possible value for the TYPE property: the source is an embedded Symyx/ISIS draw structure.

TYPE_OSR
Possible value for the TYPE property: the source is a structure image recognized by Optical Structure Recognition.

Constructor Details

DocumentToStructure
public DocumentToStructure()

Method Details

process
Creates a MolImporter instance to import structures from a given text using the default format options. A shorthand for process(text, null).

process
Creates a MolImporter instance to import structures from a given text. Generally, the text is treated as plain text. However, for convenience, text that starts immediately with an XML or HTML prologue is recognized as such instead of plain text. For complete documents, a direct call to a MolImporter constructor is often more appropriate than loading the whole document into a String object. The returned MolImporter instance does no actual resource management, so closing it is not necessary.
- Parameters:
- text - the plain text or HTML/XML to process
- options - the "d2s" format options passed to MolImporter, or null if the default options should be used. Starting the String with "d2s:" is optional.
- Returns:
- a MolImporter that can be used to read the structures found in the text.

isMetadataMol