New methods

Discussions about machine vision support in GdPicture.
versilej
Posts: 6
Joined: Thu Jun 09, 2011 8:17 am

New methods

Post by versilej » Mon Jun 27, 2011 5:53 pm

Hi,

I need the new syntax to convert a PDF into a searchable PDF. Here is my previous code:

                GdPictureImaging gdImaging = new GdPictureImaging();
                GdViewer gdViewer = new GdViewer();
                lock (licenseLock)
                {                   
                    gdImaging.SetLicenseNumber(Properties.Settings.Default.GDPictureLicense);
                    gdImaging.SetLicenseNumberOCRTesseract(Properties.Settings.Default.GDTesserectLicense);                    
                    gdViewer.SetLicenseNumber(Properties.Settings.Default.GDPictureLicense);
                    gdViewer.DisplayFromFile(fileName);
                }
                // seems to take a second to get access to the gd library
                Thread.Sleep(500);
                if (Path.GetExtension(fileName).Equals(".pdf", StringComparison.OrdinalIgnoreCase))
                {
                    int newPDFID = gdImaging.PdfOCRStart(defPdfFilePath + "working" + core.ToString() + ".pdf", true, string.Empty, string.Empty, string.Empty, string.Empty, string.Empty);
                    for (int y = 1; y <= gdViewer.PageCount; y++)
                    {
                        if (stopRunning)
                        {
                            break; // still run PdfOCRStop below; the stopRunning check after this block returns
                        }
                        imageID = gdViewer.PdfRenderPageToGdPictureImage(400, y);
                        gdImaging.ConvertTo1Bpp(imageID);
                        gdImaging.PdfAddGdPictureImageToPdfOCR(newPDFID, imageID, TesseractDictionary.TesseractDictionaryEnglish, defPdfFilePath + "OCR\\", string.Empty);
                        gdImaging.ReleaseGdPictureImage(imageID);
                    }
                    gdImaging.PdfOCRStop(newPDFID);
                }
                else if (Path.GetExtension(fileName).StartsWith(".tif", StringComparison.OrdinalIgnoreCase))
                {
                    gdImaging.TiffOpenMultiPageForWrite(false);
                    imageID = gdImaging.TiffCreateMultiPageFromFile(fileName);
                    gdImaging.PdfOCRCreateFromMultipageTIFF(imageID, TesseractDictionary.TesseractDictionaryEnglish, defPdfFilePath + "OCR\\", string.Empty, defPdfFilePath + "working" + core.ToString() + ".pdf", true, fileName, string.Empty, fileName, string.Empty, string.Empty);
                    gdImaging.ReleaseGdPictureImage(imageID);
                }
                if (stopRunning)
                {
                    return;
                }

                // determine if PDF is searchable now          
                gdViewer.DisplayFromFile(defPdfFilePath + "working" + core.ToString() + ".pdf");
                string pdfText = string.Empty;
                for (int y = 1; y <= Math.Min(gdViewer.PageCount, defPages); y++)
                {
                    pdfText += gdViewer.PdfGetPageText(y);
                }
                gdViewer.CloseDocument();
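
The check above gathers the text of at most defPages pages to decide whether OCR produced a searchable file; the follow-up version compares that text's length against defLength. The helper below is a minimal sketch of that check in isolation, assuming the page text can be fetched one page at a time. SearchabilityCheck, LooksSearchable and the getPageText delegate are illustrative names, not GdPicture.NET APIs. It uses a StringBuilder instead of repeated += and stops as soon as the threshold is passed.

```csharp
using System;
using System.Text;

public static class SearchabilityCheck
{
    // Collects text from at most maxPages pages (1-based, like the loop above)
    // and reports whether it is longer than minLength characters.
    // getPageText is a hypothetical stand-in for a per-page text call.
    public static bool LooksSearchable(int pageCount, int maxPages, int minLength,
                                       Func<int, string> getPageText)
    {
        var text = new StringBuilder();
        int lastPage = Math.Min(pageCount, maxPages);
        for (int page = 1; page <= lastPage; page++)
        {
            text.Append(getPageText(page));
            // The length only grows, so stop once the threshold is passed.
            if (text.Length > minLength)
            {
                return true;
            }
        }
        // Same comparison as the original, applied to whatever was collected.
        return text.Length > minLength;
    }
}
```

With the GdPicturePDF object from the follow-up post, the delegate could be page => { gdPDF.SelectPage(page); return gdPDF.GetPageText(); }, mirroring the calls used there.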

Re: New methods

Post by versilej » Tue Jun 28, 2011 3:12 am

I finally found the right demo to replace my code, but now I keep running into another problem: a System.OutOfMemoryException in the PdfAddGdPictureImageToPdfOCR function. Any hints? I've tried it both with and without loading the document into memory.

                GdPictureImaging gdImaging = new GdPictureImaging();
                GdPicturePDF gdPDF = new GdPicturePDF();
                lock (licenseLock)
                {                   
                    gdImaging.SetLicenseNumberUpgrade(Properties.Settings.Default.GD7PictureLicense, Properties.Settings.Default.GD8PictureLicense);
                    gdImaging.SetLicenseNumberOCRTesseract(Properties.Settings.Default.GD8TesserectLicense);                    
                    gdPDF.SetLicenseNumber(Properties.Settings.Default.GD8PictureLicense);
                }
                if (Path.GetExtension(fileName).Equals(".pdf", StringComparison.OrdinalIgnoreCase))
                {
                    if (gdPDF.LoadFromFile(fileName, true) == GdPictureStatus.OK)
                    {
                        int newPDFID = gdImaging.PdfOCRStart(defPdfFilePath + "working" + core.ToString() + ".pdf", true, string.Empty, string.Empty, string.Empty, string.Empty, string.Empty);
                        for (int y = 1; y <= gdPDF.GetPageCount(); y++)
                        {
                            if (stopRunning)
                            {
                                break; // still run PdfOCRStop/CloseDocument; the stopRunning check below returns
                            }
                            gdPDF.SelectPage(y);
                            imageID = gdPDF.RenderPageToGdPictureImage(res, true);
                            gdImaging.ConvertTo1Bpp(imageID);
                            gdImaging.PdfAddGdPictureImageToPdfOCR(newPDFID, imageID, "eng", defPdfFilePath + "OCR\\", string.Empty);
                            gdImaging.ReleaseGdPictureImage(imageID);                           
                            Application.DoEvents();
                        }
                        gdImaging.PdfOCRStop(newPDFID);
                        gdPDF.CloseDocument();
                    }
                    else
                        throw new Exception("Failed to Load From File with gdPDF");
                }
                else if (Path.GetExtension(fileName).StartsWith(".tif", StringComparison.OrdinalIgnoreCase))
                {
                    gdImaging.TiffOpenMultiPageForWrite(false);
                    imageID = gdImaging.TiffCreateMultiPageFromFile(fileName);
                    gdImaging.PdfOCRCreateFromMultipageTIFF(imageID, "eng", defPdfFilePath + "OCR\\", string.Empty, defPdfFilePath + "working" + core.ToString() + ".pdf", true, fileName, string.Empty, fileName, string.Empty, string.Empty);
                    gdImaging.ReleaseGdPictureImage(imageID);
                }
                if (stopRunning)
                {
                    return;
                }
                // determine if PDF is searchable now  
                string pdfText = string.Empty;
                if (gdPDF.LoadFromFile(defPdfFilePath + "working" + core.ToString() + ".pdf", false) == GdPictureStatus.OK)
                {                   
                    for (int y = 1; y <= Math.Min(gdPDF.GetPageCount(), defPages); y++)
                    {
                        gdPDF.SelectPage(y);
                        pdfText += gdPDF.GetPageText();
                    }
                    gdPDF.CloseDocument();
                }
                if (pdfText.Length > defLength)
                {
                    // PDFfilepath + working.pdf needs to replace old pdf
                    string newFileName = Path.ChangeExtension(fileName, ".pdf"); // change only the extension, not folder names
                    File.Copy(defPdfFilePath + "working" + core.ToString() + ".pdf", newFileName, true);
                    File.Delete(defPdfFilePath + "working" + core.ToString() + ".pdf");
                    if (delTiff &&
                        Path.GetExtension(fileName).StartsWith(".tif", StringComparison.OrdinalIgnoreCase))
                    {
                        File.Delete(fileName);
                    }
                }                
                // look for any text to determine searchability
                success = pdfText.Length > defLength;
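
Substring tests on a whole path, such as fileName.IndexOf(".tif") > -1 or fileName.Replace("tif", "pdf"), are case-sensitive and also match directory names: Replace turns tifscans/page.tif into pdfscans/page.pdf and report.tiff into report.pdff. A small sketch of extension handling built only on System.IO.Path follows; ScanPaths and its methods are illustrative helpers, not part of GdPicture.NET.

```csharp
using System;
using System.IO;

public static class ScanPaths
{
    // True when the file's extension (not any substring of the path) matches
    // one of the candidates, ignoring case.
    public static bool HasExtension(string path, params string[] extensions)
    {
        string ext = Path.GetExtension(path);
        foreach (string candidate in extensions)
        {
            if (string.Equals(ext, candidate, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }
        return false;
    }

    public static bool IsPdf(string path) => HasExtension(path, ".pdf");

    public static bool IsTiff(string path) => HasExtension(path, ".tif", ".tiff");

    // Swaps only the extension, leaving directory and file names untouched.
    public static string ToPdfPath(string path) => Path.ChangeExtension(path, ".pdf");
}
```

For example, ScanPaths.ToPdfPath("tifscans/page.tif") returns "tifscans/page.pdf", and IsPdf("SCAN.PDF") is true where IndexOf(".pdf") would return -1.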

Re: New methods

Post by versilej » Tue Jun 28, 2011 3:37 am

Here is a screenshot. The failure is random: I have it processing the same PDF file over and over, and 90% of the time it does not happen. It never happens with the multipage TIFF path, and it always fails on the same function call.

[Screenshot not preserved]
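
Whatever the root cause (a later reply in this thread reports a fix in GdPicture.NET 8.1.1), in the loop posted above an exception from PdfAddGdPictureImageToPdfOCR skips the ReleaseGdPictureImage call on the next line, and PdfOCRStop and CloseDocument are never reached either. A minimal sketch of the same loop with the release in a finally block is below. It is written against hypothetical delegates so it runs without the library: renderPage, addImageToOcrPdf and releaseImage stand in for RenderPageToGdPictureImage, PdfAddGdPictureImageToPdfOCR and ReleaseGdPictureImage, and this is not GdPicture.NET's API.

```csharp
using System;

public static class OcrPageLoop
{
    // Renders, OCRs and releases pages 1..pageCount. The release runs in a
    // finally block, so an exception from the OCR step does not also leak
    // the rendered image; the exception still propagates to the caller.
    public static void AddPages(int pageCount,
                                Func<int, int> renderPage,
                                Action<int> addImageToOcrPdf,
                                Action<int> releaseImage,
                                Func<bool> stopRequested)
    {
        for (int page = 1; page <= pageCount; page++)
        {
            if (stopRequested())
            {
                break; // let the caller still finalize the OCR PDF
            }
            int imageId = renderPage(page);
            try
            {
                addImageToOcrPdf(imageId);
            }
            finally
            {
                releaseImage(imageId);
            }
        }
    }
}
```

The caller would put the PdfOCRStop and CloseDocument calls in its own finally block around this loop, so the output document is closed even when a page fails.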

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: New methods

Post by Loïc » Tue Jun 28, 2011 11:21 am

Hi,

We are investigating the issue. Could you send us the document you are converting? You can submit it at https://www.gdpicture.com/support/getting-support-from-our-team
Also, what value are you using for res?

Kind regards,

Loïc

Re: New methods

Post by versilej » Tue Jun 28, 2011 6:03 pm

It happens on many different documents; I tested quite a few of them and it keeps recurring. One thing I am not sure about is whether I should create a new gdImaging object for each thread or use a single global object. These documents are typically 10-30 pages and 10-20 MB.
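
Nothing in this thread settles whether GdPicture.NET objects may be shared across threads, so the documentation for your version is the authority here. If they turn out not to be safe to share, one conservative pattern is one instance per worker thread, sketched below with ThreadLocal<T> and Parallel.ForEach. OcrWorkerState is a hypothetical stand-in for the per-thread objects (for example a GdPictureImaging and a GdPicturePDF), not a library type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the per-worker objects an OCR job needs.
public sealed class OcrWorkerState
{
    public int CreatedOnThread { get; } = Environment.CurrentManagedThreadId;
}

public static class PerThreadState
{
    // Each thread lazily creates its own instance; instances are never shared.
    private static readonly ThreadLocal<OcrWorkerState> State =
        new ThreadLocal<OcrWorkerState>(() => new OcrWorkerState());

    // Processes files in parallel; each callback receives the state owned by
    // the thread it runs on.
    public static void ProcessAll(string[] files, Action<OcrWorkerState, string> work)
    {
        Parallel.ForEach(files, file => work(State.Value, file));
    }
}
```

ThreadLocal keeps each instance alive as long as its thread, and thread-pool threads are long-lived. If the objects hold native memory, creating them per job and releasing them when the job ends, as the posted code already does with local variables, is the simpler choice.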

Re: New methods

Post by versilej » Sat Jul 02, 2011 9:17 pm

Hello? I hate to be a bother, but I purchased these controls, and when I emailed support they told me to post here.

Re: New methods

Post by Loïc » Sat Jul 02, 2011 9:26 pm

Hi,

Sorry, we are running behind on support. Also, you did not reply to this question: "what is the value used in res?"

We have tested multi-process PDF/OCR creation without encountering any problem. It would be better for you to reproduce the problem in a standalone application and send it to https://www.gdpicture.com/support/getting-support-from-our-team for further investigation. With only the code snippet and the information provided, we are unable to help further.

Kind regards,

Loïc

Re: New methods

Post by versilej » Sun Jul 03, 2011 8:12 pm

I sent both the PDF and the full application to support. The resolution is user-settable; I have tried 200, 300 and 400, and the failure is purely random.
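
Even if the failure looks random across resolutions, the size of each rendered page grows with the square of the resolution. The sketch below estimates the uncompressed bitmap size of one page, assuming a US Letter page (8.5 x 11 in) and, hypothetically, 32 bits per pixel before ConvertTo1Bpp; the thread does not say what bit depth RenderPageToGdPictureImage produces, and RenderMemory is an illustrative name.

```csharp
using System;

public static class RenderMemory
{
    // Approximate bytes for an uncompressed bitmap of a page rendered at
    // 'dpi', with each row rounded up to whole bytes (real bitmaps may pad
    // rows further, e.g. to 4-byte boundaries).
    public static long BitmapBytes(double widthInches, double heightInches,
                                   int dpi, int bitsPerPixel)
    {
        long width = (long)Math.Round(widthInches * dpi);
        long height = (long)Math.Round(heightInches * dpi);
        long bytesPerRow = (width * bitsPerPixel + 7) / 8;
        return bytesPerRow * height;
    }
}
```

Under those assumptions, BitmapBytes(8.5, 11, 400, 32) is 59,840,000 bytes (about 57 MiB) per page, BitmapBytes(8.5, 11, 200, 32) is 14,960,000 bytes, and the 1 bpp size at 400 DPI is 1,870,000 bytes. Single allocations as large as the 400 DPI one are more exposed to address-space fragmentation in a 32-bit process.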

karlie
Posts: 4
Joined: Tue Mar 09, 2010 2:47 am

Re: New methods

Post by karlie » Thu Jul 28, 2011 10:14 pm

Any updates on the out-of-memory exceptions when creating searchable PDF files? After upgrading our product to version 8 of GdPicture.NET, we have been getting regular error reports from our customers with this exception:

System.Exception: Exception of type 'System.OutOfMemoryException' was thrown.
at System.String.CtorCharCount(Char c, Int32 count)
at Microsoft.VisualBasic.Strings.Space(Int32 Number)
at aq.a(c[]& A_0, Int32 A_1)
at GdPicture.GdPictureImaging.PdfAddGdPictureImageToPdfOCR(Int32 PdfID, Int32 ImageID, String Dictionary, String DictionaryPath, String CharWhiteList)

Are you still testing GdPicture.NET in the x86 configuration? I ask because even though GdPicture.NET version 8 is now AnyCPU, our program is still compiled as x86.
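
An AnyCPU assembly runs with the bitness of the process that loads it, so in an application compiled as x86 the library runs in a 32-bit process, limited to 2 GB of user address space by default (up to 4 GB on 64-bit Windows when the executable is large-address-aware). A small sketch for logging the bitness alongside error reports, using only standard .NET properties available since .NET Framework 4.0 (ProcessInfo is an illustrative name):

```csharp
using System;

public static class ProcessInfo
{
    // Describes the bitness of the current process and OS. An AnyCPU library
    // loaded by an x86-compiled host reports "32-bit process" here.
    public static string Describe()
    {
        string bits = Environment.Is64BitProcess ? "64-bit" : "32-bit";
        return $"{bits} process (IntPtr.Size = {IntPtr.Size}), " +
               $"64-bit OS: {Environment.Is64BitOperatingSystem}";
    }
}
```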

Re: New methods

Post by Loïc » Thu Jul 28, 2011 11:47 pm

Hi Karlie,

This problem has been fixed in GdPicture.NET 8.1.1. Please update!

Kind regards,

Loïc

vrtacic
Posts: 1
Joined: Mon Aug 01, 2011 10:07 am

Re: New methods

Post by vrtacic » Mon Aug 01, 2011 10:29 am

Hello,

At first I had the same error as karlie and versilej. I have updated, and the out-of-memory exceptions are gone. Now I have another problem with the same method, PdfAddGdPictureImageToPdfOCR:

"Attempted to read or write protected memory. This is often an indication that other memory is corrupt."

Kind Regards

David


The .NET Framework returned the following error:
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.Exception: OCR exception: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Source: GdPicture.NET
StackTrace: at j.c(IntPtr A_0, String A_1, String A_2, String A_3, IntPtr& A_4, Int32& A_5, Int32 A_6, Int32 A_7, IntPtr A_8, Int32 A_9, Int32 A_10, Int32& A_11, Int32 A_12, Int32 A_13, Int32 A_14, Int32 A_15, Int32 A_16, Int32 A_17, Int32 A_18, Int32 A_19, Int32 A_20, Int32 A_21, Int32 A_22, Int32 A_23)
at j.a(IntPtr A_0, String A_1, String A_2, String A_3, IntPtr& A_4, Int32& A_5, Int32 A_6, Int32 A_7, IntPtr A_8, Int32 A_9, Int32 A_10, Int32& A_11, Int32 A_12, Int32 A_13, Int32 A_14, Int32 A_15, Int32 A_16, Int32 A_17, Int32 A_18, Int32 A_19, Int32 A_20, Int32 A_21, Int32 A_22, Int32 A_23)
at aq.a(Int32 A_0, Int32 A_1, Int32 A_2, Int32 A_3, Int32 A_4, String A_5, String A_6, String A_7, IntPtr& A_8, Int32& A_9, Int32 A_10)
at aq.a(Int32 A_0, Int32 A_1, Int32 A_2, Int32 A_3, Int32 A_4, String A_5, String A_6, String A_7, IntPtr& A_8, Int32& A_9, Int32 A_10)
at GdPicture.GdPictureImaging.PdfAddGdPictureImageToPdfOCR(Int32 PdfID, Int32 ImageID, String Dictionary, String DictionaryPath, String CharWhiteList)
--- End of inner exception stack trace ---
at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
at System.Type.InvokeMember(String name, BindingFlags invokeAttr, Binder binder, Object target, Object[] args)
at CDotNetType.bInvoke(CDotNetType* , Object gcrObj, SByte* pszNomMethode, CSLevel* pclPile, Int32 nNbParamPile, Int32 bValeurRetour, STOperationDotNet* pstOperation)
at CDotNetType.bInvoke(CDotNetType* , Object gcrObj, STMethodeDotNet* pstMethode, UInt32* pdwIdentifiant, CSLevel* pclPile, Int32 nNbParamPile, Int32 bValeurRetour, STOperationDotNet* pstOperation)
at CDotNetInstance.bAppelleMethode(CDotNetInstance* , STMethodeDotNet* pstMethode, UInt32* pdwIdentifiant, CSLevel* pclPile, Int32 nNbParamPile, Int32 bValeurRetour, STOperationDotNet* pstOperation)

Re: New methods

Post by Loïc » Tue Aug 02, 2011 1:52 am

Hi,

Please open a ticket at https://www.gdpicture.com/support/getting-support-from-our-team with instructions to reproduce the problem. We especially need a code snippet and the document causing the problem.

Kind regards,

Loïc
