Identifying a tif or pdf when it has the wrong extension

neojakey

Post by neojakey » Tue Sep 09, 2014 8:29 pm

Hello all,

I have a scenario where a TIF file has been mislabelled as a PDF (hello.pdf) when it is actually a TIF (hello.tif), and vice versa. Is there a way, using GdPicture, to validate the file's binary content and verify that the file really is what its extension says it is?

We are using GdPicture 9 in .NET.

Many thanks in advance,
Paul

bmarkovic

Re: Identifying a tif or pdf when it has the wrong extension

Post by bmarkovic » Thu Jun 11, 2015 4:36 pm

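You can just read the file content and it will tell you: the first few bytes of each format are a fixed signature (a "magic number"), whatever the extension says.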

using System;

// Minimal definition of the enum so the snippet compiles (the original post did not include it).
public static class Enums
{
    public enum FileIdentifierTypes { Unknown, Bmp, Gif, Jpeg, Pdf, Png, Tif }
}

public static class FileIdentifier
{
    // Identifies the format from the file's leading "magic number" bytes; the extension is ignored.
    public static Enums.FileIdentifierTypes GetImageFileType(byte[] fileContent)
    {
        if (fileContent == null) throw new ArgumentNullException(nameof(fileContent));

        // bmp: "BM"
        if (fileContent.Length > 1 && fileContent[0] == 0x42 && fileContent[1] == 0x4d)
            return Enums.FileIdentifierTypes.Bmp;

        // gif: "GIF89a" or "GIF87a"
        if (fileContent.Length > 5 && fileContent[0] == 0x47 && fileContent[1] == 0x49 && fileContent[2] == 0x46 &&
            fileContent[3] == 0x38 && (fileContent[4] == 0x39 || fileContent[4] == 0x37) && fileContent[5] == 0x61)
            return Enums.FileIdentifierTypes.Gif;

        // jpeg: SOI marker FF D8 followed by the FF that starts the next marker.
        // (Requiring the FF D9 end marker as the last two bytes would reject JPEGs with trailing data.)
        if (fileContent.Length > 2 && fileContent[0] == 0xff && fileContent[1] == 0xd8 && fileContent[2] == 0xff)
            return Enums.FileIdentifierTypes.Jpeg;

        // pdf: "%PDF"
        if (fileContent.Length > 3 && fileContent[0] == 0x25 && fileContent[1] == 0x50 && fileContent[2] == 0x44 &&
            fileContent[3] == 0x46)
            return Enums.FileIdentifierTypes.Pdf;

        // png: 89 "PNG" 0D 0A 1A 0A
        if (fileContent.Length > 7 && fileContent[0] == 0x89 && fileContent[1] == 0x50 && fileContent[2] == 0x4e &&
            fileContent[3] == 0x47 && fileContent[4] == 0x0d && fileContent[5] == 0x0a && fileContent[6] == 0x1a &&
            fileContent[7] == 0x0a)
            return Enums.FileIdentifierTypes.Png;

        // tif: "II*\0" (little-endian) or "MM\0*" (big-endian)
        if (fileContent.Length > 3 && fileContent[0] == 0x49 && fileContent[1] == 0x49 && fileContent[2] == 0x2a &&
            fileContent[3] == 0x00)
            return Enums.FileIdentifierTypes.Tif;
        if (fileContent.Length > 3 && fileContent[0] == 0x4d && fileContent[1] == 0x4d && fileContent[2] == 0x00 &&
            fileContent[3] == 0x2a)
            return Enums.FileIdentifierTypes.Tif;

        return Enums.FileIdentifierTypes.Unknown;
    }
}
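The method above receives the whole file as a byte array, but the PDF and TIFF signatures it tests both sit in the first four bytes, so for the mix-up in the question there is no need to load a large document into memory. A minimal sketch, assuming only PDF and TIFF matter and that `.pdf` and `.tif`/`.tiff` are the expected extensions: read just the header with a FileStream and compare the detected type with the extension. `ExtensionCheck`, `HeaderKind`, `DetectFromHeader`, `DetectFromFile` and `ExtensionMatchesContent` are hypothetical names for illustration, not part of GdPicture.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a GdPicture API): classifies a file as PDF or TIFF
// from its first four bytes and checks that against the file extension.
public static class ExtensionCheck
{
    public enum HeaderKind { Unknown, Pdf, Tif }

    // Classifies a header buffer: "%PDF" for PDF, "II*\0" or "MM\0*" for TIFF.
    public static HeaderKind DetectFromHeader(byte[] header, int count)
    {
        if (count < 4) return HeaderKind.Unknown;
        if (header[0] == 0x25 && header[1] == 0x50 && header[2] == 0x44 && header[3] == 0x46)
            return HeaderKind.Pdf;
        if ((header[0] == 0x49 && header[1] == 0x49 && header[2] == 0x2a && header[3] == 0x00) ||
            (header[0] == 0x4d && header[1] == 0x4d && header[2] == 0x00 && header[3] == 0x2a))
            return HeaderKind.Tif;
        return HeaderKind.Unknown;
    }

    // Reads only the first four bytes of the file instead of the whole document.
    public static HeaderKind DetectFromFile(string path)
    {
        var header = new byte[4];
        int read = 0;
        using (var stream = File.OpenRead(path))
        {
            // Stream.Read may return fewer bytes than requested, so loop until 4 bytes or end of file.
            while (read < header.Length)
            {
                int n = stream.Read(header, read, header.Length - read);
                if (n == 0) break;
                read += n;
            }
        }
        return DetectFromHeader(header, read);
    }

    // True only when the content is PDF or TIFF and the extension agrees;
    // false on a mismatch or when the content is neither format.
    public static bool ExtensionMatchesContent(string path)
    {
        string ext = Path.GetExtension(path).ToLowerInvariant();
        switch (DetectFromFile(path))
        {
            case HeaderKind.Pdf: return ext == ".pdf";
            case HeaderKind.Tif: return ext == ".tif" || ext == ".tiff";
            default: return false;
        }
    }
}
```

For example, `ExtensionCheck.ExtensionMatchesContent("hello.pdf")` returns false when hello.pdf actually holds a TIFF. A signature check is only a cheap first pass: it says what the file claims to be, not that the rest of the file is intact.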

Hope this helps.

Loïc
Site Admin

Re: Identifying a tif or pdf when it has the wrong extension

Post by Loïc » Thu Jun 11, 2015 4:56 pm

Also, the latest GdPicture release can detect the file format by analyzing its content.

See GdPictureImaging::GetDocumentFormatFromStream
