some magic in .GetPageText?
Posted: Wed Jan 06, 2021 3:59 pm
Hi,
We have to extract text from PDFs, and we use .GetPageText from GdPicturePDF.
But something funny happens:
what I see is different from what I get.
I see: COC TELEMATICO
but I get: cac TELEMATiCa
I see: Q1J7N1LVL
but I get: Q1J7N1 LVL <-- there is a space between 1 and L
I have attached a PDF with this problem, but the problem occurs with many PDFs of the same kind.
I should also say that 'select and paste' from Acrobat produces the same problem as using .GetPageText.
Maybe it is not a GdPicture problem, but can someone help me solve it?
Alberto