Hope you are well!
I am having some difficulty running the Tesseract OCR on a multipage PDF using threading in ASP.NET. I want to run the OCR process in the background so that the user doesn't need to wait for it to complete. Instead, they can continue doing other tasks while the OCR runs.
When I run my code with a single-page PDF it works perfectly! But when I try a multipage PDF, I get the following error:
System.ArgumentNullException was unhandled
Message="Value cannot be null. Parameter name: ptr"
ParamName="ptr"
Source="mscorlib"
StackTrace:
at System.Runtime.InteropServices.Marshal.GetDelegateForFunctionPointer(IntPtr ptr, Type t)
at Ꮑ.ᢤ.ᢲ(Int32 ᢳ, Int32 ᢴ, Int32 ᢵ, Int32 ᢶ, Int32 ᢷ, TesseractDictionary ᢸ, String ᢹ, String ᢺ, IntPtr& ᢻ, Int32& ᢼ, Int32 ᢽ)
at GdPicture.GdPictureImaging.PdfAddGdPictureImageToPdfOCR(Int32 PdfID, Int32 ImageID, TesseractDictionary Dictionary, String DictionaryPath, String CharWhiteList)
at _Default.DoOCR_Multi() in C:\Projects\TesseractTest\Default.aspx.vb:line 84
at _Default._Lambda$__2() in C:\Projects\TesseractTest\Default.aspx.vb:line 20
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.runTryCode(Object userData)
at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException:
' Requires: Imports System.Threading
Dim licensenumber As String = "LICENCENUMBER"

Protected Sub Button2_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles Button2.Click
    ' Run the OCR on a separate low-priority thread so the request returns immediately.
    Dim NewThread As Thread = New Thread(AddressOf DoOCR_Multi)
    NewThread.Priority = ThreadPriority.Lowest
    NewThread.Start()
    Button2.Text = "Started Multi OCR... Wait for a few moments then check multipage_out.pdf"
End Sub
Public Function DoOCR_Multi() As Boolean
    ' Source and output PDFs both live in the site root.
    Dim sourcedoc As String = Server.MapPath("./") & "multipage.pdf"
    Dim outdoc As String = Server.MapPath("./") & "multipage_out.pdf"
    Dim randomgen As New Random()
    Dim randomnum As Integer = randomgen.Next()
    ' MM = month, HH = hour (24-hour clock), mm = minutes
    Dim thedate As String = DateTime.Now.ToString("yyyyMMddHHmmss")
    Dim tempfilename As String = sourcedoc
    Dim tempfilename2 As String = outdoc
    Dim ImageID As Integer
    Dim oGdViewer As New GdPicture.GdViewer
    Dim oGdPictureImaging As New GdPicture.GdPictureImaging
    Dim PdfID As Integer

    oGdViewer.SetLicenseNumber(licensenumber)
    oGdPictureImaging.SetLicenseNumber(licensenumber)
    oGdPictureImaging.SetLicenseNumberOCRTesseract(licensenumber)

    ' Open the source PDF and start the searchable output PDF.
    oGdViewer.DisplayFromFile(tempfilename)
    PdfID = oGdPictureImaging.PdfOCRStart(tempfilename2, True, "", "", "", "", "")

    ' Render each page at 300 DPI, convert it to 1 bpp, OCR it, and append it to the output.
    For i As Integer = 1 To oGdViewer.PageCount
        ImageID = oGdViewer.PdfRenderPageToGdPictureImage(300, i)
        oGdPictureImaging.ConvertTo1Bpp(ImageID)
        ' The exception in the stack trace above is thrown by this call.
        oGdPictureImaging.PdfAddGdPictureImageToPdfOCR(PdfID, ImageID, TesseractDictionary.TesseractDictionaryEnglish, Server.MapPath("./") & "App_Data\Dictionary", "")
        oGdViewer.ReleaseGdPictureImage(ImageID)
    Next

    oGdPictureImaging.PdfOCRStop(PdfID)
    oGdViewer.CloseDocument()
    Return True
End Function
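Because the exception above is reported as unhandled on the worker thread, one thing that may help while debugging is a thread entry point that catches and records any failure instead of letting it escape. The following is only a minimal sketch under stated assumptions, not a fix for the null-pointer error itself: `SimulatedOcr`, `SafeWorker`, `StartWorker` and `LastError` are hypothetical names, and `SimulatedOcr` merely stands in for `DoOCR_Multi` by throwing the same exception type. No GdPicture calls are used.

```vb
Imports System
Imports System.Threading

Module BackgroundOcrSketch
    ' Holds the last failure seen on the worker thread (Nothing if none).
    Public LastError As Exception = Nothing

    ' Hypothetical stand-in for DoOCR_Multi; it only simulates the failure.
    Private Sub SimulatedOcr(ByVal sourcePath As String)
        Throw New ArgumentNullException("ptr")
    End Sub

    ' Thread entry point: every exception is caught and recorded here,
    ' so nothing escapes the worker thread unhandled.
    Private Sub SafeWorker(ByVal state As Object)
        Try
            SimulatedOcr(CStr(state))
        Catch ex As Exception
            LastError = ex
        End Try
    End Sub

    ' Starts the worker and returns the thread so a caller can Join if needed.
    Public Function StartWorker(ByVal sourcePath As String) As Thread
        Dim worker As New Thread(AddressOf SafeWorker)
        worker.IsBackground = True
        ' The path is resolved on the calling thread and passed in as plain data.
        worker.Start(sourcePath)
        Return worker
    End Function

    Sub Main()
        StartWorker("multipage.pdf").Join()
        If LastError IsNot Nothing Then
            Console.WriteLine("Worker failed: " & LastError.GetType().Name)
        End If
    End Sub
End Module
```

In the page itself, the `Join` call would be dropped so the request returns immediately, and the recorded exception would presumably be written to a log rather than kept in a shared field.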
Would you mind trying the sample project that I've created? You just need to add the dictionary files to the /App_Data/Dictionary/ folder and the GdPicture.NET DLL files to the /Bin/ folder. I must be doing something incorrectly, but I just can't locate the problem. I'm hoping that you'll be able to point me in the right direction!
Thank you,
Chris