GdPictureTextExtraction Class Members
The following tables list the members exposed by GdPictureTextExtraction.
Public Constructors
Name | Description |
---|---|
GdPictureTextExtraction Constructor | GdPictureTextExtraction is a streamlined class that converts any document format supported by GdPicture into plain text. It covers scenarios such as indexing and improving the performance of LLM inference. Internally, it optimizes extraction accuracy and minimizes processing time by dynamically applying page layout analysis, encoding detection, and OCR components. The same API processes raster images, PDFs, CAD files, email files, and office formats. Documents can be loaded from file paths, Stream objects, or remote URIs. |
Public Properties
Name | Description |
---|---|
Dictionary | Specifies the dictionary to be used during the optional OCR process. |
EnableKeyValuePairsExtraction | Specifies whether key-value pair extraction is enabled. |
EnableOCR | Specifies whether OCR is enabled. |
EnableOrientationDetection | Specifies whether document orientation detection is enabled. |
EnableTablesExtraction | Specifies whether table extraction is enabled. |
PageRange | Specifies the range of pages to process. Set this property before the loading step; restricting the range speeds up loading. |
ParagraphSeparator | Specifies the separator used to split paragraphs. It takes effect only when the PreserveParagraphs property is set to true. |
PreserveParagraphs | Specifies that the text extraction engine must preserve text paragraphs. This is particularly useful for improving the accuracy of NLP engines. |
ResourcesFolder | Specifies the path to the directory containing the engine resources (mostly dictionaries). |
TimeoutMilliseconds | Specifies the timeout for any subsequent process, in milliseconds. |
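As a rough illustration of how these properties might be set before a document is loaded, here is a minimal C# sketch. Only the member names come from the table above. The `GdPicture14` namespace, the property types (bool, string, int), the use of a `using` block, and the paths are assumptions that should be checked against the SDK reference. License registration and error handling are omitted.

```csharp
using GdPicture14; // assumed namespace of the GdPicture.NET SDK

class ConfigureExtraction
{
    static void Main()
    {
        // The class lists a Dispose method, so a using block is assumed to be valid here.
        using (GdPictureTextExtraction extractor = new GdPictureTextExtraction())
        {
            // Placeholder path: point it at the folder that holds the engine resources.
            extractor.ResourcesFolder = @"C:\path\to\ocr\resources";

            extractor.EnableOCR = true;                  // assumed bool
            extractor.EnableOrientationDetection = true; // assumed bool

            // ParagraphSeparator only takes effect when PreserveParagraphs is true.
            extractor.PreserveParagraphs = true;
            extractor.ParagraphSeparator = "\n\n";       // assumed string

            extractor.TimeoutMilliseconds = 30000;       // assumed int, 30 seconds

            // PageRange, if needed, would also be set at this point,
            // before any of the Load* methods is called.
        }
    }
}
```

Setting PreserveParagraphs and ParagraphSeparator together keeps the dependency between the two properties visible in one place.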
Public Methods
Name | Description |
---|---|
CloseDocument | Closes the currently loaded document. |
Dispose | |
GetDocumentFormat | Returns the format of the currently loaded document. |
GetFormFieldCount | Returns the number of form fields extracted during the extraction process. Form field extraction is automatically performed during each extraction process. |
GetFormFieldKeyRect | Returns the location of the key part of a specified form field. |
GetFormFieldKeyText | Returns the text of the key part of a specified form field. |
GetFormFieldType | Returns the type of a specified form field. |
GetFormFieldValueRect | Returns the location of the value part of a specified form field. |
GetFormFieldValueText | Returns the text of the value part of a specified form field. |
GetKeyValuePairConfidence | Returns the detection confidence of a specified key-value pair. |
GetKeyValuePairCount | Returns the number of key-value pairs extracted during the extraction process. Key-value pair extraction is automatically performed during each extraction process. |
GetKeyValuePairDataType | Returns the data type of a specified key-value pair. |
GetKeyValuePairIsStrong | Returns whether a specified key-value pair is strong. A pair is marked as strong when a semantic relationship has been established during the detection process. |
GetKeyValuePairKeyRect | Returns the location of the key part of a specified key-value pair. |
GetKeyValuePairKeyString | Returns the string representation of the key part of a specified key-value pair. |
GetKeyValuePairPublicName | |
GetKeyValuePairValueRect | Returns the location of the value part of a specified key-value pair. |
GetKeyValuePairValueString | Returns the string representation of the value part of a specified key-value pair. |
GetPageCount | Returns the number of pages in the currently loaded document, or 0 if no document is loaded. |
GetPageText | Retrieves the text from a particular page of the currently loaded document. |
GetStat | Returns the status of the last operation performed with the current GdPictureTextExtraction object. |
GetTableCellRect | Returns the location of a cell in a specified table. |
GetTableCellText | Returns the text content of a cell in a specified table. |
GetTableColumnCount | Returns the number of columns in a specified table. |
GetTableColumnRect | Returns the location of a column in a specified table. |
GetTableCount | Returns the number of tables detected during the extraction process. |
GetTableRect | Returns the location of a specified table. |
GetTableRowCount | Returns the number of rows in a specified table. |
GetTableRowRect | Returns the location of a row in a specified table. |
IsHeaderCell | Indicates whether the specified cell is located in the table's header. |
LoadFromFile | Loads a document from a file path. |
LoadFromHttp | Loads a document from a remote URI. |
LoadFromStream | Loads a document from a stream object. |
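The load, extract, and close methods listed above could be combined along the following lines. This is a sketch under stated assumptions, not the SDK's documented usage: the member names are taken from the table, while the `GdPicture14` namespace, the parameter and return types, the 1-based page numbering, `GdPictureStatus.OK` as the success value, and the file path are all assumptions. License registration is omitted.

```csharp
using System;
using GdPicture14; // assumed namespace of the GdPicture.NET SDK

class ExtractPageText
{
    static void Main()
    {
        using (GdPictureTextExtraction extractor = new GdPictureTextExtraction())
        {
            // Placeholder path; a single string parameter is assumed.
            extractor.LoadFromFile(@"C:\path\to\input.pdf");

            // GetStat reports the status of the last operation (here, the load).
            var loadStatus = extractor.GetStat();
            if (loadStatus != GdPictureStatus.OK)
            {
                Console.WriteLine("Load failed: " + loadStatus);
                return;
            }

            // GetPageCount returns 0 when no document is loaded.
            int pageCount = extractor.GetPageCount();
            for (int page = 1; page <= pageCount; page++) // 1-based numbering assumed
            {
                string text = extractor.GetPageText(page); // assumed to return a string
                var pageStatus = extractor.GetStat();
                if (pageStatus == GdPictureStatus.OK)
                    Console.WriteLine(text);
                else
                    Console.WriteLine("Page " + page + " failed: " + pageStatus);
            }

            extractor.CloseDocument();
        }
    }
}
```

Checking GetStat after each call avoids relying on the return types of the load and extraction methods, which the table does not state.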