GdPicture.NET.14~GdPicture14.GdPictureTextExtraction - GdPictureTextExtraction Class Members

	Name	Description
	GdPictureTextExtraction Constructor	GdPictureTextExtraction is a streamlined class that converts any document format supported by GdPicture into plain text. Its capabilities address scenarios such as indexing and improving the performance of LLM inference. Internal logic optimizes extraction accuracy and minimizes processing time by dynamically combining page layout analysis, encoding detection, and OCR components. The same API processes raster images, PDFs, CAD files, email files, and office formats. Documents can be loaded from file paths, Stream objects, or remote URIs.

	Name	Description
	Dictionary	Specifies the dictionary to be used during the optional OCR process.
	EnableKeyValuePairsExtraction	Specifies whether key-value pair extraction is enabled.
	EnableOCR	Specifies whether OCR is enabled.
	EnableOrientationDetection	Specifies whether document orientation detection is activated.
	EnableTablesExtraction	Specifies whether table extraction is enabled.
	PageRange	Specifies the range of pages to process. Set this property before loading the document; restricting the range speeds up loading.
	ParagraphSeparator	Specifies the separator used to split paragraphs. It takes effect only when the PreserveParagraphs property is set to true.
	PreserveParagraphs	Specifies whether the text extraction engine preserves text paragraphs. This is particularly useful for improving the accuracy of NLP engines.
	ResourcesFolder	Specifies the path to the directory containing the engine resources (mostly dictionaries).
	TimeoutMilliseconds	Specifies the timeout for any subsequent process, in milliseconds. The default value is -1, which means there is no timeout.
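
Example (sketch): when PreserveParagraphs is true, the extracted text can be cut into paragraph chunks, for instance before passing them to an NLP engine. The C# helper below is a minimal illustration, not part of GdPicture: ParagraphSplitter is a hypothetical name, and the code assumes that the value set in ParagraphSeparator appears verbatim between paragraphs in the returned text.

```csharp
using System;

// Hypothetical helper, not a GdPicture type.
static class ParagraphSplitter
{
    // Splits extracted text on the separator that was configured through
    // the ParagraphSeparator property.
    // Assumption: the separator appears verbatim between paragraphs.
    public static string[] Split(string extractedText, string paragraphSeparator)
    {
        if (string.IsNullOrEmpty(extractedText))
        {
            return Array.Empty<string>();
        }
        if (string.IsNullOrEmpty(paragraphSeparator))
        {
            // Without a separator the text is treated as a single paragraph.
            return new[] { extractedText };
        }
        // Empty entries (for example from a trailing separator) are dropped.
        return extractedText.Split(
            new[] { paragraphSeparator },
            StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Passing the same string to both ParagraphSeparator and this helper keeps the two in sync; a separator that cannot occur inside ordinary text is likely the safer choice.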

	Name	Description
	CloseDocument	Closes the currently loaded document.
	Dispose	Releases the resources used by the current GdPictureTextExtraction object.
	GetDocumentFormat	Returns the format of the currently loaded document.
	GetFormFieldCount	Returns the number of form fields extracted during the extraction process. Form field extraction is performed automatically during each extraction process.
	GetFormFieldKeyRect	Returns the location of the key part of a specified form field.
	GetFormFieldKeyText	Returns the text of the key of a specified form field.
	GetFormFieldType	Returns the type of a specified form field.
	GetFormFieldValueRect	Returns the location of the value part of a specified form field.
	GetFormFieldValueText	Returns the text of the value of a specified form field.
	GetKeyValuePairConfidence	Returns the detection confidence of a specified key-value pair.
	GetKeyValuePairCount	Returns the number of key-value pairs extracted during the extraction process. Key-value pair extraction is performed automatically during each extraction process.
	GetKeyValuePairDataType	Returns the data type of a specified key-value pair.
	GetKeyValuePairIsStrong	Returns whether a specified key-value pair is strong. A pair is marked as strong when a semantic relationship has been established during the detection process.
	GetKeyValuePairKeyRect	Returns the location of the key part of a specified key-value pair.
	GetKeyValuePairKeyString	Returns the string representation of the key part of a specified key-value pair.
	GetKeyValuePairPublicName
	GetKeyValuePairValueRect	Returns the location of the value part of a specified key-value pair.
	GetKeyValuePairValueString	Returns the string representation of the value part of a specified key-value pair.
	GetPageCount	Returns the number of pages in the currently loaded document, or 0 if no document is loaded.
	GetPageText	Retrieves the text from a particular page of the currently loaded document.
	GetStat	Returns the status of the last executed operation with the current GdPictureTextExtraction object.
	GetTableCellRect	Returns the location of a cell in a specified table.
	GetTableCellText	Returns the text content of a cell in a specified table.
	GetTableColumnCount	Returns the number of columns in a specified table.
	GetTableColumnRect	Returns the location of a column in a specified table.
	GetTableCount	Returns the number of tables detected during the extraction process.
	GetTableRect	Returns the location of a specified table.
	GetTableRowCount	Returns the number of rows in a specified table.
	GetTableRowRect	Returns the location of a row in a specified table.
	IsHeaderCell	Specifies whether the cell at the given coordinates is located in the table's header.
	LoadFromFile	Loads a document from a file path.
	LoadFromHttp	Loads a document from a remote URI.
	LoadFromStream	Loads a document from a stream object.
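
Example (sketch): after a document is loaded, GetPageCount and GetPageText can be combined to collect the text page by page. The C# sketch below shows only the loop. Because the table above lists member names without signatures, it takes the two calls as delegates instead of calling GdPicture directly. PageTextCollector is a hypothetical name, and both the 1-based page numbering and the delegate signatures (no arguments returning int, int returning string) are assumptions to check against the method pages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not a GdPicture type.
static class PageTextCollector
{
    // getPageCount stands in for GetPageCount, getPageText for GetPageText.
    // Assumption: pages are numbered starting at 1.
    public static List<string> CollectPages(
        Func<int> getPageCount,
        Func<int, string> getPageText)
    {
        var pages = new List<string>();

        // A count of 0 (no document loaded) skips the loop entirely.
        int pageCount = getPageCount();
        for (int pageNo = 1; pageNo <= pageCount; pageNo++)
        {
            // Store an empty string if a page yields no text.
            pages.Add(getPageText(pageNo) ?? string.Empty);
        }
        return pages;
    }
}
```

If the real signatures match, the method groups of a loaded instance could be passed directly as the two arguments. Checking GetStat after the load call and after each GetPageText call would show whether an operation failed, and CloseDocument followed by Dispose releases the document once the text has been collected.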

Reference

GdPictureTextExtraction Class
GdPicture14 Namespace