Missing leading spaces in extracted text
Posted: Fri Feb 22, 2013 10:52 am
Hi,
The problem is that the leading spaces are missing in the extracted OCR text: every line is left-trimmed, so the position of the text in the extracted text does not match its position in the PDF document.
I store the OCR text in a text file:
Code: Select all
// Reset the OCR engine and run it on the page using the "deu" (German) language data.
oGdPictureImagingSource.OCRTesseractReinit();
oGdPictureImagingSource.OCRTesseractSetPassCount(5);
sOCR = oGdPictureImagingSource.OCRTesseractDoOCR(iImagePage, "deu", Application.StartupPath + "\\OCR", "");
if (oGdPictureImagingSource.GetStat() == GdPictureStatus.OK)
{
    oGdPictureImagingSource.OCRTesseractClear();
    // Write the recognized text to Text.OCR as UTF-8; "using" closes the stream even if Write throws.
    using (System.IO.Stream fs = new System.IO.FileStream("Text.OCR", System.IO.FileMode.Create))
    {
        byte[] data = System.Text.Encoding.UTF8.GetBytes(sOCR);
        fs.Write(data, 0, data.Length);
    }
}
What is the problem? When I use the method GetPageTextWithCoords(...), the coordinates are correct.
Regards
Steffen