@PublicAPI public class DocumentToStructure extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
static java.lang.String | BYTE |
static java.lang.String | CHARACTER - The starting character offset from the beginning of the document, for text formats (html, xml, txt). |
static java.lang.String | CONFIDENCE - The confidence that the structure is correct. |
static java.lang.String | CONTEXT - The context of the structure recognized in the text. |
static java.lang.String | CONTEXT_INDEX - Index of the hit inside the context. |
static java.lang.String | DOC_AUTHOR |
static java.lang.String | DOC_CREATION_DATE |
static java.lang.String | DOC_LAST_AUTHOR |
static java.lang.String | DOC_PATENT_ASSIGNEES - The assignees of the patent, separated by newline characters. |
static java.lang.String | DOC_PATENT_ID |
static java.lang.String | DOC_PATENT_INVENTORS - The inventors of the patent, separated by newline characters. |
static java.lang.String | DOC_PATENT_IPC - The IPC classification(s) for the patent, separated by newline characters. |
static java.lang.String | DOC_PATENT_IPCR - The IPCR classification(s) for the patent, separated by newline characters. |
static java.lang.String | DOC_TITLE |
static java.lang.String | DOCUMENT - The file name of the source document. |
static java.lang.String | DOCUMENT_METADATA |
static java.lang.String | END_CHARACTER - The ending character offset from the beginning of the document, for text formats (html, xml, txt). |
static java.lang.String | IDENTIFIER |
static java.lang.String | PAGE - The page number, if applicable (e.g. |
static java.lang.String | SECTION - The section of the document where the structure was found. |
static java.lang.String | SOURCE_TEXT - The source text, as it appears in the original document. |
static java.lang.String | TYPE - The type of source for the structure. |
static java.lang.String | TYPE_CAS - CAS Registry Number®. |
static java.lang.String | TYPE_CDX - Embedded ChemDraw structure. |
static java.lang.String | TYPE_COMMON - Common name. |
static java.lang.String | TYPE_EC - EC Number. |
static java.lang.String | TYPE_GENERIC - Generic name, for instance "C1-C4 alkyl". |
static java.lang.String | TYPE_INCHI - InChI string. |
static java.lang.String | TYPE_ION - Ion abbreviation, for instance K+ or Ca2+. |
static java.lang.String | TYPE_MRV - Embedded ChemAxon MRV structure. |
static java.lang.String | TYPE_OSR - Structure image recognized by Optical Structure Recognition. |
static java.lang.String | TYPE_PEPTIDE - Peptide notation, for instance Val-Gly-Ser-Ala. |
static java.lang.String | TYPE_SMILES - SMILES string. |
static java.lang.String | TYPE_SYMYX - Embedded Symyx/ISIS draw structure. |
static java.lang.String | TYPE_SYSTEMATIC - Systematic name. |
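Several of the patent properties listed above (DOC_PATENT_ASSIGNEES, DOC_PATENT_INVENTORS, DOC_PATENT_IPC, DOC_PATENT_IPCR) pack multiple values into one string, separated by newline characters. The following is a minimal sketch of how a caller might split such a value into a list. The class name `MultiValueProperty`, the method `split`, and the sample names are hypothetical and not part of this API. The sketch assumes the property value has already been read from the structure as a `String`.

```java
import java.util.ArrayList;
import java.util.List;

class MultiValueProperty {

    /** Splits a newline-separated property value into trimmed, non-empty entries. */
    static List<String> split(String value) {
        List<String> entries = new ArrayList<>();
        if (value == null) {
            return entries; // property not set: no entries
        }
        // \R matches any line break sequence (\n, \r\n, \r, ...)
        for (String line : value.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical value, shaped like a DOC_PATENT_INVENTORS property value.
        String inventors = "Jane Doe\nJohn Smith";
        System.out.println(split(inventors));
    }
}
```

Returning an empty list for a missing property keeps calling code free of null checks.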
Constructor and Description |
---|
DocumentToStructure() |
Modifier and Type | Method and Description |
---|---|
static boolean | isMetadataMol(Molecule m) |
static MolImporter | process(java.lang.String text) - Creates a MolImporter instance to import structures from a given text using the default format options. |
static MolImporter | process(java.lang.String text, java.lang.String options) - Creates a MolImporter instance to import structures from a given text. |
public static final java.lang.String SOURCE_TEXT
public static final java.lang.String DOCUMENT
public static final java.lang.String PAGE
public static final java.lang.String CHARACTER
public static final java.lang.String END_CHARACTER
public static final java.lang.String BYTE
public static final java.lang.String IDENTIFIER
public static final java.lang.String DOC_AUTHOR
public static final java.lang.String DOC_LAST_AUTHOR
public static final java.lang.String DOC_TITLE
public static final java.lang.String DOC_CREATION_DATE
public static final java.lang.String DOC_PATENT_ID
public static final java.lang.String DOC_PATENT_IPC
public static final java.lang.String DOC_PATENT_IPCR
public static final java.lang.String DOC_PATENT_ASSIGNEES
public static final java.lang.String DOC_PATENT_INVENTORS
public static final java.lang.String DOCUMENT_METADATA
public static final java.lang.String CONFIDENCE
0 or less means very little confidence. 1 or more means high confidence.
This is currently set for image recognition, that is, Optical Structure Recognition (OSR), also known as "chemical OCR".
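The thresholds described for CONFIDENCE can be turned into a simple classification. This is a sketch only: the class `ConfidenceLevel`, the enum `Level`, and the `INTERMEDIATE` bucket for values strictly between 0 and 1 are assumptions made for illustration, and the sketch assumes the property value has already been parsed into a `double`.

```java
class ConfidenceLevel {

    enum Level { LOW, INTERMEDIATE, HIGH }

    /** Maps a confidence value onto the ranges described for CONFIDENCE. */
    static Level classify(double confidence) {
        if (confidence <= 0.0) {
            return Level.LOW;          // 0 or less: very little confidence
        }
        if (confidence >= 1.0) {
            return Level.HIGH;         // 1 or more: high confidence
        }
        return Level.INTERMEDIATE;     // assumed label for values in between
    }
}
```

A caller could, for example, route `LOW` hits to manual review instead of discarding them.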
public static final java.lang.String SECTION
This is currently supported only for US patents in the USPTO XML format, in which case the value of the property can be "abstract", "citation", "description" or "claim N".
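Because SECTION values for claims take the form "claim N", code that groups hits by claim may want to extract N. The sketch below shows one way to do that. The class `SectionValue` and the method `claimNumber` are hypothetical, and the exact format (lowercase "claim", a single space, a decimal number) is an assumption based on the description above.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionValue {

    // Assumed format: the word "claim", one space, then a decimal number.
    private static final Pattern CLAIM = Pattern.compile("claim (\\d{1,9})");

    /** Returns the claim number for a "claim N" value, or empty for any other section. */
    static OptionalInt claimNumber(String section) {
        if (section == null) {
            return OptionalInt.empty();
        }
        Matcher m = CLAIM.matcher(section);
        if (m.matches()) {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        }
        return OptionalInt.empty(); // "abstract", "citation", "description", ...
    }
}
```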
public static final java.lang.String CONTEXT
public static final java.lang.String CONTEXT_INDEX
public static final java.lang.String TYPE
See also: TYPE_SYSTEMATIC, TYPE_COMMON, TYPE_GENERIC, TYPE_SMILES, TYPE_INCHI, TYPE_CAS, Constant Field Values
public static final java.lang.String TYPE_SYSTEMATIC
public static final java.lang.String TYPE_COMMON
public static final java.lang.String TYPE_GENERIC
public static final java.lang.String TYPE_SMILES
public static final java.lang.String TYPE_INCHI
public static final java.lang.String TYPE_CAS
public static final java.lang.String TYPE_EC
public static final java.lang.String TYPE_ION
public static final java.lang.String TYPE_PEPTIDE
public static final java.lang.String TYPE_CDX
public static final java.lang.String TYPE_MRV
public static final java.lang.String TYPE_SYMYX
public static final java.lang.String TYPE_OSR
public static MolImporter process(java.lang.String text)
Creates a MolImporter instance to import structures from a given text using the default format options.
A shorthand for process(text, null).
public static MolImporter process(java.lang.String text, java.lang.String options)
Creates a MolImporter instance to import structures from a given text.
Generally, the text is treated as plain text. However, for convenience, text that starts immediately with an XML or HTML prologue is recognized as such instead of plain text. For complete documents, a direct call to a MolImporter constructor is often more appropriate than loading the whole document into a String object.
The returned MolImporter instance does no actual resource management, so closing it is not necessary.
Parameters:
text - the plain text or HTML/XML to process
options - the "d2s" format options passed to MolImporter, or null if the default options should be used. Starting the String with "d2s:" is optional.
Returns:
a MolImporter that can be used to read the structures found in the text.
public static boolean isMetadataMol(Molecule m)
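Since the "d2s:" prefix on the `options` argument of process is optional, application code that stores or compares option strings may want to treat both spellings as the same. The helper below is a sketch of such a normalization on the caller's side. The class `D2sOptions`, the method `withoutPrefix`, and the placeholder option text are hypothetical, and nothing here describes how the library itself handles the prefix.

```java
class D2sOptions {

    private static final String PREFIX = "d2s:";

    /** Returns the options without a leading "d2s:" prefix; null stays null (default options). */
    static String withoutPrefix(String options) {
        if (options == null) {
            return null;
        }
        return options.startsWith(PREFIX)
                ? options.substring(PREFIX.length())
                : options;
    }

    public static void main(String[] args) {
        // "someOption" is a placeholder, not a real d2s option name.
        System.out.println(withoutPrefix("d2s:someOption"));
        System.out.println(withoutPrefix("someOption"));
    }
}
```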