I'm deep in the weeds reverse engineering a very old proprietary document storage format (Keyfile). Embedded in the middle of a larger file is a block of image data (the scan of a single document page) that is encoded with CCITT4. I've learned enough about the file and the TIFF spec so far to write a filter that extracts the data from the source file and writes a new file that is supposed to be a plain TIFF, but it's not quite there yet, and I can't figure out what I'm still missing.
Encouragingly Adobe Photoshop opens my newly minted TIFF file and displays the document just fine (no errors, no warnings). Unfortunately, none of the other common tools will. I'm on a mac and have access to linux so I've tried:
Gimp
Preview (OSX)
ImageMagick
some of the libtiff utilities like fax2pdf
I suspect there's something wrong still with my TIFF file, that Photoshop is silently overlooking. I hope it's not in the raw CCITT4 image data, because I would rather not have to write code to decode that completely.
I can't post the files I'm working with because they contain sensitive data. However, I'm hoping that I'm just doing something wrong with my tiff header block that someone can point out. To that end. here's some basic information about my test file (the one that opens fine in Photoshop).
Keyfile.tiff 31K (32300 bytes)
Keyfile TIFF Version 1.01
0100.0004.00000001.000009f0 ImageWidth
0101.0004.00000001.00000ce0 ImageLength
0102.0003.00000001.00000001 BitsPerSample
0103.0003.00000001.00000004 Compression
0106.0003.00000001.00000000 PhotometricInterpolation
0111.0004.00000001.00000200 StripOffsets
0115.0003.00000001.00000001 SamplesPerPixel
0116.0004.00000001.00000ce0 RowsPerStrip
0117.0004.00000001.00007c2c StripByteCounts
011a.0005.00000001.000001d6 XResolution
011b.0005.00000001.000001de YResolution
0128.0004.00000001.00000002 ResolutionUnit
0131.0002.0000001a.000001e6 Software
This decode of the TIFF header block comes from code that I've written. Here's a hex dump of the header portion of the file to address 0x200.
49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C000017010400010000002C7C00001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E303100
What follows is exactly 0x7c2c bytes of compressed image data. I say this based on the tiff compression tag (4), which is copied over intact form the original file, and from looking at dozens and dozens of files with a hex editor and learning to recognize the image data block. Also the fact that Photoshop opens this file would seem to indicate I am correct.
Any help figuring out what I still need to do to make this file compatible with the rest of the utilities would be much appreciated.
For what it's worth here's the error produced by imagemagick:
>convert Keyfile.tiff Keyfile.pdf
convert: Premature EOL at line 0 of strip 0 (got 0, expected 2544). `Fax4Decode' # warning/tiff.c/TIFFWarnings/881.
I'm new to coding for TIFF and so any utilities or hints that would allow me to gather more detailed information about what's going on would also be appreciated.
Update:
Here are the first 0x318 bytes of the file. There's nothing sensitive here and you have the first 0x118 bytes of the image data. I can probably provide a bit more of the file if needed.
49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C000017010400010000002C7C00001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E3031000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FFFFFFFC8085B51FFFFFFFFFFFFFFFFFFFFFFFF90154E0C4221836AC80A900F04142050814204679705E823C0D3089900E92D641B9B1D2907364E94886C112854118E6208686E6492B47D11C1A29289806DC25083A41427495102E6D349641736AA96439B08496113867960B314A08CC1A2102141410221AADC28102123E918508E02AC41143D2C5131C3C68B1620B8CCB02A8238F564536394D16F11AA050CEA8A9944105DB92591D12D04513E195B23E1252561A742191D11B0628110DA6E5259A6881891832C74B704A0C8F1B4618450E2AA4087391D17988888EA41CDAD8A2B0AAA4436A2647D94CC585
Update 2:
OK, I found a file that I can post. It's a mostly white page, but if rendered correctly, you will see the two darkish crescent moons which are the reflection of the holes on the original scanned page. There's also a bit of noise over to the right and along the top. Here's what it looks like (image):
I used Photoshop to convert/save a file I could upload. Here's a hex dump of the file my code generated, which opens fine in Photoshop, but not with anything else.
49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C00001701040001000000530300001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E3031002C19461170350282E88E8AF52889A91024623806A1C8F97C8E8D111D1847115B44CF3A2388DA2E8C2388122F98C868E23451112508B88600D4297C8E88E44788F91E308BC4745CC8F91E23A2EC8E88F11E23B36447C8F11CC8E611020711111A6888390E39C738E0848E8BA23A388D4A224111B03681C206478DA892946E2E06D06B51121718036032092844E0AE470350604AA229C88E0680CC224511803402E24A11F88E0660D8224A40CD1016ACC8E0B606048906482C101752460C8E19006E224AC3203901D091B03C08122D9C0DA12141BFFFFFFFFFFFFFFFFFFFFFFFFFFFFF2D2125082123F1A2EA08124122EB6820A475E2105130A8209826474388886475612449543B295550C8E88224EC591D1174295B23A48C0EC591E08762111E23A2F9F46D11D02E22323E088A3870447542223EE35BDF56AD5856AD430A1856AC2879692C06C2FC304259A688BA23D2D23211A4088FC504162A5373447C20A2396062188A891F23F7C48E89502F41A46D11B417126E51328709EDE4747D04171D8B23A650E5714E13158921F111588AB0AF72CA6AB50ED27690664750C286B6B1B29D351609F21976B8685A8613C309A96014631FFFFFFFFFFFFF2039C720383A5C5DFEB56B0B51FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF9601A8FFFFFFFFFFFFFFFFFFFFFFFFFFCEC6947FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF95CEA3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE5A852A3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFC004004
Here are it's specs.
Keyfile_66.tiff 1K (1363 bytes)
Keyfile TIFF Version 1.01
0100.0004.00000001.000009f0 ImageWidth
0101.0004.00000001.00000ce0 ImageLength
0102.0003.00000001.00000001 BitsPerSample
0103.0003.00000001.00000004 Compression
0106.0003.00000001.00000000 PhotometricInterpolation
0111.0004.00000001.00000200 StripOffsets
0115.0003.00000001.00000001 SamplesPerPixel
0116.0004.00000001.00000ce0 RowsPerStrip
0117.0004.00000001.00000353 StripByteCounts
011a.0005.00000001.000001d6 XResolution
011b.0005.00000001.000001de YResolution
0128.0004.00000001.00000002 ResolutionUnit
0131.0002.0000001a.000001e6 Software
Here's a link to download the file.
Any idea why this is would be much appreciated.
Related
I am trying to build a native byte parser that given an HEIF image it returns back its metadata (mainly width and height of the image).
I am struggling a lot at the moment finding the right documentation and specs to use for parsing such info. I have to do such thing for both XMP and EXIF metadata, but let's focus only on XMP for now.
What I need is the exact byte structure of where to find what. According to the HEIF international standard doc (here):
For image items, XMP metadata shall be stored as an item of item_type value 'mime' and content type'application/rdf+xml'. The body of the item shall be a valid XMP document, in XML form.
Perfect, if I analyse a sample image I can find such marker:
From now on I can't find anywhere how to get the info I need. I would expect something saying "the first 2 bytes are the header, with marker 0xFF 0xCE (just an example), the next 2 bytes are the width, and following 2 bytes the height...etc".
In my case I am going by intuition. My sample image is of dimensions 8736x5856. If in the tool I look for Big-Endian 2 byte integer 8736, I can find it:
And hey, 2 bytes later there is the 5856 height as well:
But again, I arrived here by luck and intuition. I need a proper schema that tells me where to find what in such a way that I can traslate it to code.
What I think you'r seeing is a "mime" and "ispe" mp4 box as HEIF is ISOBMFF based. I would recommend looking at the file using a mp4 capable tool like mp4dump, HexFiend or fq (note: my tool). The "ispe" (Image Spatial Extents) box i probably what you want to read.
fq does no support ispe box yet but you could read it like this:
$ fq 'grep_by(.type=="ispe").data | tobytes | [.[-8:-4], .[-4:] | tonumber]' file.heif
[
8736,
5856
]
So what you need is probably a basic ISOBMFF reader and then look for the "ispe" box and decode it. If you'r only looking for the first of a specific box you can probably ignore that ISOBMFF is a tree structure.
I'm trying to read(preview) AI (adobe illustrator) file in my web application. my web app is on Linux machine and mainly uses Python.
I couldn't find any native python code that can preview AI file, so I continued to search for solution and found ghostscript, which gives the option to convert AI to JPG/PNG and I these format I have no problem previewing.
The issue I have is that I need the preview to include the whole document and not just the artboard, in illustrator it's possible when removing the checkbox from "use artboards" when saving, see screenshot: https://helpx.adobe.com/content/dam/help/en/illustrator/how-to/export-svg/_jcr_content/main-pars/image0/5286-export-svg-fig1.jpg
but when I try to export from ghostscript, I can't make it work...
from my understanding, it's best to try and first convert to EPS and then from that to JPG/PNG, but I failed doing that as well and the items that are outside the artboard are not showing.
on linux, these are the commands I basically tried , after installing ghostscript:
gs -dNOPAUSE -dBATCH -sDEVICE=eps2write -sOutputFile=out.eps input.ai
gs -dNOPAUSE -dBATCH -sDEVICE=jpeg -r300 -sOutputFile=out.jpeg input.ai
gs -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r300 -sOutputFile=out.png input.ai
if it's not possible with ghostscript and I need imagemagick instead, I don't mind using it... I tried it for 10 minute and just got bunch of errors so I left it....
AI file for example: https://drive.google.com/open?id=1UgyLG_-nEUL5FLTtD3Dl281YVYzv0mUy
Jpeg example of the output I want: https://drive.google.com/open?id=1tLT2Uj1pp1gKRnJ8BojPZJxMFRn6LJoM
Thank you
Some updates on the topic: I've found this:
https://gist.github.com/moluapple/2059569
This is AI PGF extractor which should theoretically help to extract the additional data from the PDF. Currently, it seems quite old and written for win32, so I cannot test it at the moment, but it's at least some kind of lead.
Firstly, Adobe Illustrator native files are not technically supported by Ghostscript at all. They might work, because they are normally either PostScript or PDF files with custom bits that can be ignored for the purposes of drawing the content. But it's not a guarantee.
Secondly; no, do not multiply convert the files! That's a piece of cargo-cult mythology that's been doing the rounds for ages. There are sometimes reasons for doing so but in general this will simply magnify problems, not solve them. Really, don't do that.
You haven't quoted the errors you are getting and you haven't supplied any files to look at, so it's not really possible to tell what your problem is. I have no clue what an 'artboard' is, and a picture of the Illustrator dialog doesn't help.
Perhaps if you could supply an example file, and maybe a picture of what you expect, it might be possible to figure it out. My guess is that your '.ai' file is a PDF file, and that it has a MediaBox (which is what Ghostscript uses by default) and an ArtBox which is what you actually want to use. Or something like that. Hard to say without more information.
Edit
Well, I'm afraid the answer here is that you can't easily get what you want from that file without using Illustrator.
The file is a PDF file (if you rename input.ai to input.pdf then you can open it with a PDF reader). But Illustrator doesn't use most of the PDF file when it opens it. Instead the PDF file contains a '/PieceInfo' key, which is a key in the Page dictionary. That points to a dictionary which has a /Private key, which (finally!) points to a dictionary with a bunch of Illustrator stuff:
52 0 obj
<<
/AIMetaData 53 0 R
/AIPrivateData1 54 0 R
/AIPrivateData10 55 0 R
/AIPrivateData11 56 0 R
/AIPrivateData2 57 0 R
/AIPrivateData3 58 0 R
/AIPrivateData4 59 0 R
/AIPrivateData5 60 0 R
/AIPrivateData6 61 0 R
/AIPrivateData7 62 0 R
/AIPrivateData8 63 0 R
/AIPrivateData9 64 0 R
/ContainerVersion 11
/CreatorVersion 23
/NumBlock 11
/RoundtripStreamType 1
/RoundtripVersion 17
>>
endobj
That's the actual saved file format of the Illustrator file. You can think of the PDF file as a 'preview' wrapped around the Illustrator native file. Illustrator reads the PDF file to find its own data, then throws the PDF file away and uses the native file format stored within it instead.
The problem is that the PDF part of the file simply doesn't contain the content you want to see. That's stored in the Illustrator native data. Ghostscript just renders what's in the PDF file, it doesn't look at the Illustrator native file.
Looking at the Illustrator private data, some of it is uncompressed, but most is compressed, it doesn't say how it is compressed but applying the FlateDecode filter produces a good old-fashioned Illustrator PostScript file, one that will work with Ghostscript.
But you would have to manually parse the PDF file, extract all the compressed AIPrivateData streams, concatenate them together, apply the FlateDecode filter to decompress them, and only then send the resulting output to Ghostscript with the -dEPSCrop switch set. That will result in the output you want.
But neither Ghostscript nor ImageMagick (which generally uses Ghostscript to render PDF files) will do any of that for you, you would have to do it all yourself.
Is there anyway (commandline tools) to calculate MD5 hash for .NEF (also .CR2, .TIFF) regardless any metadata, e.g. EXIF, IPTC, XMP and so on?
The MD5 hash should be same once we update any metadata inside the image file.
I searched for a while, the closest solution is:
exiftool test.nef -all= -o - -m | md5
but 'exiftool -all=' still keeps a set of EXIF tags in the output file. The MD5 hash can be changed if I update remaining tags.
ImageMagick has a method for doing exactly this. It is installed on most Linux distros and is available for OSX (ideally via homebrew) and also Windows. There is an escape for the image signature which includes only pixel data and not metadata - you use it like this:
identify -format %# _DSC2007.NEF
feb37d5e9cd16879ee361e7987be7cf018a70dd466d938772dd29bdbb9d16610
I know it does what you want and that the calculated checksum does not change when you modify the metadata on PNG files for example, and I know it does calculate the checksum correctly for CR2 and NEF files. However, I am not in the habit of modifying RAW files such as you have and have not tested it does the right thing in that case - though I would be startled if it didn't! So please test before use.
The reason that there is still some Exif data left is because the image data for a NEF file (and similar TIFF based filetypes) is located within that Exif block. Remove that and you have removed the image data. See ExifTool FAQ 7, which has an example shortcut tag that may help you out.
I assume your intention is to verify the actual image data has not been tampered with.
An alternate approach to stripping the meta-data can be to convert the image to a format that has no metadata.
ImageMagick is a well known open source (Apache 2 license) for image manipulation and conversion. It provides libraries with various language bindings as well as command line tools for various operating systems.
You could try:
convert test.nef bmp:- | md5
This converts test.nef to bmp on stdout and pipes it to md5.
AFAIR bmp has no support for metadata and I'm not sure if ImageMagick even preserves metadata across conversions.
This will only work with single image files (i.e. not multi-image tiff or gif animations). There is also the slight possibility some changes can be made to the image which result in the same conversion because of color space conversions, but these changes would not be visible.
I am doing image manipulation on the png images. I have the following problem. After saving an image with imwrite() function, the size of the image is increased. For example previously image is 847KB, after saving it becomes 1.20 MB. Here is a code. I just read an image and then save it, but the size is increased. I tried to set compression params but it doesn't help.
Mat image;
image = imread("5.png", -1);
vector<int> compression_params;
compression_params.push_back(CV_IMWRITE_PNG_COMPRESSION);
compression_params.push_back(9);
compression_params.push_back(0);
imwrite("output.png",image,compression_params);
What could be a problem? Any help please.
Thanks.
PNG has several options that influence the compression: deflate compression level (0-9), deflate strategy (HUFFMAN/FILTERED), and the choice (or strategy for dynamically chosing) for the internal prediction error filter (AVERAGE, PAETH...).
It seems OpenCV only lets you change the first one, and it hasn't a good default value for the second. So, it seems you must live with that.
Update: looking into the sources, it seems that compression strategy setting has been added (after complaints), but it isn't documented. I wonder if that source is released. Try to set the option CV_IMWRITE_PNG_STRATEGY with Z_FILTERED and see what happens
See the linked source code for more details about the params.
#Karmar, It's been many years since your last edit.
I had similar confuse to yours in June, 2021. And I found out sth which might benefit others like us.
PNG files seem to have this thing called mode. Here, let's focus only on three modes: RGB, P and L.
To quickly check an image's mode, you can use Python:
from PIL import Image
print(Image.open("5.png").mode)
Basically, when using P and L you are attributing 8 bits/pixel while RGB uses 3*8 bits/pixel.
For more detailed explanation, one can refer to this fine stackoverflow post: What is the difference between images in 'P' and 'L' mode in PIL?
Now, when we use OpenCV to open a PNG file, what we get will be an array of three channels, regardless which mode that
file was saved into. Three channels with data type uint8, that means when we imwrite this array into a file, no matter
how hard you compress it, it will be hard to beat the original file if it was saved in P or L mode.
I guess #Karmar might have already had this question solved. For future readers, check the mode of your own 5.png.
I'm having a proprietary image format SNG( a proprietary format) which is having a countinous array of Image data along with Image meta information in seperate HDR file.
Now I need to convert this SNG format to a Standard TIFF 6.0 Format. So I studied the TIFF format i.e. about its Header, Image File Directories( IFD's) and Stripped Image Data.
Now I have few concerns about this conversion. Please assist me.
SNG Continous Data vs TIFF Stripped Data: Should I convert SNG Data to TIFF as a continous data in one Strip( data load/edit time problem?) OR make logical StripOffsets of the SNG Image data.
SNG Data Header uses only necessary Meta Information, thus while converting the SNG to TIFF, some information can’t be retrieved such as NewSubFileType, Software Tag etc.
So this raises a concern that after conversion whether any missing directory information such as NewSubFileType, Software Tag etc is necessary and sufficient condition for TIFF File.
Encoding of each pixel component of RGB Sample in SNG data:
Here each SNG Image Data Strip per Pixel component is encoded as:
Out^[i] := round( LineBuffer^[i * 3] * **0.072169** + LineBuffer^[i * 3 + 1] * **0.715160** + LineBuffer^[i * 3+ 2]* **0.212671**);
Only way I deduce from it is that each Pixel is represented with 3 RGB component and some coefficient is multiplied with each component to make the SNG Viewer work RGB color information of SNG Image Data. (Developer who earlier work on this left, now i am following the trace :))
Thus while converting this to TIFF, the decoding the same needs to be done. This raises a concern that the how RBG information in TIFF is produced, or better do we need this information?.
Please assist...
Are you able to load this into a standard windows bitmap handle? If so, there are probably a bunch of free and commercial libraries for saving it as TIFF.
The standard for TIFF is libtiff -- it's a C library. Here's a version for Delphi made by an expert in the TIFF format:
http://www.awaresystems.be/imaging/tiff/delphi.html
There seems to be a lot of choices.
I think the approach of
Loading your format into an in-memory standard bitmap (which you need to do to show it, right?)
Using a pre-existing TIFF encoding library to save as TIFF
Will be a lot easier than trying to do a direct format-to-format conversion. The only reasons I wouldn't do it this way are:
The bitmap is too big to keep in memory
The original format is lossy and I will lose more quality in the re-encoding -- but you'd have to be saving in a standard lossy format (JPEG) to save quality.
Disclaimer: I work for Atalasoft.
We make .NET imaging codecs (including TIFF) -- that are a lot easier to use than LibTiff -- you can call them in Delphi through COM. We can convert standard windows bitmaps to TIFF or PDF (or other formats) with a couple of lines of code.
One approach, if you have a Windows application which handles and can print this format, would be to let it do the work for you, and call it to print the file to one of the many available 'printer drivers' which support direct output to TIFF.