Reading HEIF/HEIC image XMP metadata byte by byte - image-processing

I am trying to build a native byte parser that, given an HEIF image, returns its metadata (mainly the width and height of the image).
At the moment I am struggling to find the right documentation and specs for parsing this information. I have to do this for both XMP and EXIF metadata, but let's focus only on XMP for now.
What I need is the exact byte structure of where to find what. According to the HEIF international standard doc (here):
For image items, XMP metadata shall be stored as an item of item_type value 'mime' and content type 'application/rdf+xml'. The body of the item shall be a valid XMP document, in XML form.
Perfect, and if I analyse a sample image I can find such a marker:
From here on, though, I can't find anywhere how to get the info I need. I would expect something saying "the first 2 bytes are the header, with marker 0xFF 0xCE (just an example), the next 2 bytes are the width, the following 2 bytes the height, etc.".
So far I have been going by intuition. My sample image has dimensions 8736x5856. If I search in the tool for the big-endian 2-byte integer 8736, I can find it:
And hey, 2 bytes later there is the 5856 height as well:
But again, I arrived here by luck and intuition. I need a proper schema that tells me where to find what, in such a way that I can translate it to code.

What I think you're seeing is a "mime" item and an "ispe" mp4 box, as HEIF is ISOBMFF based. I would recommend looking at the file using an mp4-capable tool like mp4dump, HexFiend or fq (note: my tool). The "ispe" (Image Spatial Extents) box is probably what you want to read.
fq does not support the ispe box yet, but you could read it like this:
$ fq 'grep_by(.type=="ispe").data | tobytes | [.[-8:-4], .[-4:] | tonumber]' file.heif
[
8736,
5856
]
So what you need is probably a basic ISOBMFF reader; then look for the "ispe" box and decode it. If you're only looking for the first instance of a specific box, you can probably ignore that ISOBMFF is a tree structure.
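To make that concrete, here is a minimal C++ sketch of that shortcut: it simply scans for the first "ispe" fourcc instead of walking the box tree (meta > iprp > ipco > ispe). Keep in mind a file can contain several ispe boxes (thumbnails get their own), so a proper reader should resolve which one belongs to the primary item; the names below are illustrative.

#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Read a 32-bit big-endian integer, the byte order used for ISOBMFF fields.
static uint32_t read_be32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: ispe_dump <file.heic>\n"; return 1; }

    std::ifstream f(argv[1], std::ios::binary);
    std::vector<uint8_t> data((std::istreambuf_iterator<char>(f)),
                              std::istreambuf_iterator<char>());

    // ispe box layout: [4-byte size]['ispe'][1-byte version][3-byte flags]
    //                  [4-byte image_width][4-byte image_height]
    for (std::size_t i = 0; i + 16 <= data.size(); ++i) {
        if (data[i] == 'i' && data[i+1] == 's' && data[i+2] == 'p' && data[i+3] == 'e') {
            uint32_t width  = read_be32(&data[i + 8]);   // skip fourcc + version/flags
            uint32_t height = read_be32(&data[i + 12]);
            std::cout << width << "x" << height << "\n";
            return 0;
        }
    }
    std::cerr << "no ispe box found\n";
    return 1;
}

On the sample image above this should print 8736x5856, matching the fq output.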

Related

NVIDIA NVENC (Media Foundation) encoded h.264 frames not decoded properly using VideoToolbox

I am facing the same problem as described here when trying to decode a frame on iPad Pro OS v14.3 (I am also using Olivia Stork's example):
25% of the picture data is decoded correctly, the rest of the picture is just green.
The decoded image on iPad Pro OS v14.3 looks like this (the image was converted and saved in the decoder callback as described here, so it's not just a displaying problem).
The original image looks like this.
The image is encoded with NVIDIA NVENC (Media Foundation) on Windows 10.
I searched the frame picture data for additional 4-byte NALU start codes as described in the link, but there are only the three expected ones for SPS, PPS and IDR picture data.
I have another Media Foundation decoder application running on Windows 10 which can decode the frames from exactly the same source correctly.
I have been struggling for days now to find the cause of the problem... does anyone have any ideas?
Thanks in advance. Rob
-
EDIT 2021-01-11:
I figured out that there are actually three additional 3-byte start codes (0x000001) within the IDR picture data block of NALU type 5.
I tried to replace these start codes with the length of the following data block (big endian), as described here, but with the same result.
I also tried adding emulation prevention bytes (0x000001 => 0x00000301) as described here, but that also made no difference.
Maybe I am misled and these start codes have nothing to do with the issue... at least they are not just random image data, because they always appear at the same position (index) in the picture data block. Currently I am running out of ideas... does anybody have a hint?
-
EDIT 2021-01-14:
I figured out a few more things:
Out of a sheer lack of ideas, I copied the picture data that follows the last start code to the beginning of the block (right after the 4-byte NALU start code).
I had expected, if it worked at all, to see the last quarter of the original image at the top of the decoded image, but to my surprise the decoded image looked like this.
I tried the same with the picture data coming after the second and third start code, and the decoded image looked like this and this:
The image data is decoded correctly and it is even at the correct position (compare to original image).
Even if I strip off all 3-byte start codes and copy the picture data concatenated after the 4-byte start code, the result is the same: only 25% of the image is decoded. So the additional 3-byte start codes are apparently not the problem. There must be some setting somewhere that tells the decoder to only decode 25% of the image. My guess would be the CMVideoFormatDescription, but as far as I can see it looks okay.
I am also wondering how the decoder knows where to display the different picture data blocks. Either there is an offset defined somewhere within the picture data, or the xy-position of every pixel is added by the encoder somehow...
I managed to find the cause of the problem: the 3-byte start codes in the IDR picture data block must be replaced by 4-byte start codes.
So first replace all 3-Byte start codes by 4-Byte start codes.
Then replace the 4-Byte start codes with the length of the following data block (big endian). The slices should be arranged like this (as mentioned here by 'Blackie'):
[4byte slice1 size][slice1 data][4byte slice2 size][slice2 data]...[4byte slice4 size][slice4 data]
Remember not to include the start code length in the slice size.
After changing that, my frame was completely displayed.
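In code, the Annex B to length-prefixed (AVCC-style) rewrite described above can be sketched roughly like this in C++; annexb_to_avcc and start_code_len are illustrative names, not an existing API.

#include <cstddef>
#include <cstdint>
#include <vector>

// Return the length of the start code (3 or 4) at 'pos', or 0 if there is none.
static std::size_t start_code_len(const std::vector<uint8_t>& b, std::size_t pos) {
    if (pos + 4 <= b.size() && b[pos] == 0 && b[pos+1] == 0 && b[pos+2] == 0 && b[pos+3] == 1)
        return 4;
    if (pos + 3 <= b.size() && b[pos] == 0 && b[pos+1] == 0 && b[pos+2] == 1)
        return 3;
    return 0;
}

// Rewrite an Annex B access unit into AVCC layout: every NAL unit gets a
// 4-byte big-endian length prefix, and the length excludes the start code.
std::vector<uint8_t> annexb_to_avcc(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t sc = start_code_len(in, pos);
        if (sc == 0) { ++pos; continue; }            // not at a start code, keep scanning
        std::size_t nal_start = pos + sc;
        std::size_t next = nal_start;                // find the next start code or end of buffer
        while (next < in.size() && start_code_len(in, next) == 0) ++next;
        uint32_t len = uint32_t(next - nal_start);
        out.push_back(uint8_t(len >> 24));
        out.push_back(uint8_t(len >> 16));
        out.push_back(uint8_t(len >> 8));
        out.push_back(uint8_t(len));
        out.insert(out.end(), in.begin() + nal_start, in.begin() + next);
        pos = next;
    }
    return out;
}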
By the way:
The information where to display the different picture data blocks is specified in the header data of each NALU (parameter 'first_mb_in_slice').
There is a very good C# example here of how to extract the NALU header data. You can almost copy it 1:1.
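For reference, reading first_mb_in_slice boils down to decoding the first Exp-Golomb (ue(v)) value after the 1-byte NAL header. Here is a rough C++ sketch with illustrative names; it assumes the emulation prevention bytes have already been stripped from the payload.

#include <cstddef>
#include <cstdint>

// Very small MSB-first bit reader over a byte buffer.
struct BitReader {
    const uint8_t* data;
    std::size_t bit;
    uint32_t read_bit() {
        uint32_t b = (data[bit / 8] >> (7 - bit % 8)) & 1;
        ++bit;
        return b;
    }
    // Unsigned Exp-Golomb: count leading zero bits, then read that many more bits.
    uint32_t read_ue() {
        int zeros = 0;
        while (read_bit() == 0) ++zeros;
        uint32_t v = 1;
        for (int i = 0; i < zeros; ++i) v = (v << 1) | read_bit();
        return v - 1;
    }
};

// 'nal' points at a slice NAL unit, starting with its 1-byte NAL header.
uint32_t first_mb_in_slice(const uint8_t* nal) {
    BitReader br{nal, 8};    // skip forbidden_zero_bit, nal_ref_idc, nal_unit_type
    return br.read_ue();     // first_mb_in_slice is the first field of the slice header
}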

Reverse Engineering proprietary TIFF format

I'm deep in the weeds reverse engineering a very old proprietary document storage format (Keyfile). Embedded in the middle of a larger file is a block of image data (the scan of a single document page) that is encoded with CCITT4. I've learned enough about the file and the TIFF spec so far to write a filter that extracts the data from the source file and writes a new file that is supposed to be a plain TIFF, but it's not quite there yet, and I can't figure out what I'm still missing.
Encouragingly, Adobe Photoshop opens my newly minted TIFF file and displays the document just fine (no errors, no warnings). Unfortunately, none of the other common tools will. I'm on a Mac and have access to Linux, so I've tried:
Gimp
Preview (OSX)
ImageMagick
some of the libtiff utilities like fax2pdf
I suspect there's something wrong still with my TIFF file, that Photoshop is silently overlooking. I hope it's not in the raw CCITT4 image data, because I would rather not have to write code to decode that completely.
I can't post the files I'm working with because they contain sensitive data. However, I'm hoping that I'm just doing something wrong with my TIFF header block that someone can point out. To that end, here's some basic information about my test file (the one that opens fine in Photoshop).
Keyfile.tiff 31K (32300 bytes)
Keyfile TIFF Version 1.01
0100.0004.00000001.000009f0 ImageWidth
0101.0004.00000001.00000ce0 ImageLength
0102.0003.00000001.00000001 BitsPerSample
0103.0003.00000001.00000004 Compression
0106.0003.00000001.00000000 PhotometricInterpretation
0111.0004.00000001.00000200 StripOffsets
0115.0003.00000001.00000001 SamplesPerPixel
0116.0004.00000001.00000ce0 RowsPerStrip
0117.0004.00000001.00007c2c StripByteCounts
011a.0005.00000001.000001d6 XResolution
011b.0005.00000001.000001de YResolution
0128.0004.00000001.00000002 ResolutionUnit
0131.0002.0000001a.000001e6 Software
This decode of the TIFF header block comes from code that I've written. Here's a hex dump of the header portion of the file to address 0x200.
49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C000017010400010000002C7C00001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E303100
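For anyone following along, decoding entries in that tag.type.count.value form from a little-endian ("II") TIFF boils down to something like the following C++ sketch (an illustration, not my actual code):

#include <cstdint>
#include <cstdio>
#include <vector>

static uint16_t le16(const uint8_t* p) { return uint16_t(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Print each entry of the first IFD as tag.type.count.value (12 bytes per entry).
void dump_first_ifd(const std::vector<uint8_t>& tiff) {
    if (tiff.size() < 8 || tiff[0] != 'I' || tiff[1] != 'I' || le16(&tiff[2]) != 42) {
        std::fprintf(stderr, "not a little-endian TIFF\n");
        return;
    }
    uint32_t ifd_offset  = le32(&tiff[4]);            // offset of the first IFD
    uint16_t entry_count = le16(&tiff[ifd_offset]);
    for (uint16_t i = 0; i < entry_count; ++i) {
        const uint8_t* e = &tiff[ifd_offset + 2 + i * 12];
        std::printf("%04x.%04x.%08x.%08x\n",
                    le16(e), le16(e + 2), le32(e + 4), le32(e + 8));
    }
}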
What follows is exactly 0x7c2c bytes of compressed image data. I say this based on the TIFF compression tag (4), which is copied over intact from the original file, and from looking at dozens and dozens of files with a hex editor and learning to recognize the image data block. The fact that Photoshop opens this file would also seem to indicate I am correct.
Any help figuring out what I still need to do to make this file compatible with the rest of the utilities would be much appreciated.
For what it's worth here's the error produced by imagemagick:
>convert Keyfile.tiff Keyfile.pdf
convert: Premature EOL at line 0 of strip 0 (got 0, expected 2544). `Fax4Decode' # warning/tiff.c/TIFFWarnings/881.
I'm new to coding for TIFF and so any utilities or hints that would allow me to gather more detailed information about what's going on would also be appreciated.
Update:
Here are the first 0x318 bytes of the file. There's nothing sensitive here and you have the first 0x118 bytes of the image data. I can probably provide a bit more of the file if needed.
49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C000017010400010000002C7C00001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E3031000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FFFFFFFC8085B51FFFFFFFFFFFFFFFFFFFFFFFF90154E0C4221836AC80A900F04142050814204679705E823C0D3089900E92D641B9B1D2907364E94886C112854118E6208686E6492B47D11C1A29289806DC25083A41427495102E6D349641736AA96439B08496113867960B314A08CC1A2102141410221AADC28102123E918508E02AC41143D2C5131C3C68B1620B8CCB02A8238F564536394D16F11AA050CEA8A9944105DB92591D12D04513E195B23E1252561A742191D11B0628110DA6E5259A6881891832C74B704A0C8F1B4618450E2AA4087391D17988888EA41CDAD8A2B0AAA4436A2647D94CC585
Update 2:
OK, I found a file that I can post. It's a mostly white page, but if rendered correctly, you will see the two darkish crescent moons which are the reflection of the holes on the original scanned page. There's also a bit of noise over to the right and along the top. Here's what it looks like (image):
I used Photoshop to convert/save a file I could upload. Here's a hex dump of the file my code generated, which opens fine in Photoshop, but not with anything else.
49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C00001701040001000000530300001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E3031002C19461170350282E88E8AF52889A91024623806A1C8F97C8E8D111D1847115B44CF3A2388DA2E8C2388122F98C868E23451112508B88600D4297C8E88E44788F91E308BC4745CC8F91E23A2EC8E88F11E23B36447C8F11CC8E611020711111A6888390E39C738E0848E8BA23A388D4A224111B03681C206478DA892946E2E06D06B51121718036032092844E0AE470350604AA229C88E0680CC224511803402E24A11F88E0660D8224A40CD1016ACC8E0B606048906482C101752460C8E19006E224AC3203901D091B03C08122D9C0DA12141BFFFFFFFFFFFFFFFFFFFFFFFFFFFFF2D2125082123F1A2EA08124122EB6820A475E2105130A8209826474388886475612449543B295550C8E88224EC591D1174295B23A48C0EC591E08762111E23A2F9F46D11D02E22323E088A3870447542223EE35BDF56AD5856AD430A1856AC2879692C06C2FC304259A688BA23D2D23211A4088FC504162A5373447C20A2396062188A891F23F7C48E89502F41A46D11B417126E51328709EDE4747D04171D8B23A650E5714E13158921F111588AB0AF72CA6AB50ED27690664750C286B6B1B29D351609F21976B8685A8613C309A96014631FFFFFFFFFFFFF2039C720383A5C5DFEB56B0B51FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF9601A8FFFFFFFFFFFFFFFFFFFFFFFFFFCEC6947FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF95CEA3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE5A852A3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFC004004
Here are its specs.
Keyfile_66.tiff 1K (1363 bytes)
Keyfile TIFF Version 1.01
0100.0004.00000001.000009f0 ImageWidth
0101.0004.00000001.00000ce0 ImageLength
0102.0003.00000001.00000001 BitsPerSample
0103.0003.00000001.00000004 Compression
0106.0003.00000001.00000000 PhotometricInterpretation
0111.0004.00000001.00000200 StripOffsets
0115.0003.00000001.00000001 SamplesPerPixel
0116.0004.00000001.00000ce0 RowsPerStrip
0117.0004.00000001.00000353 StripByteCounts
011a.0005.00000001.000001d6 XResolution
011b.0005.00000001.000001de YResolution
0128.0004.00000001.00000002 ResolutionUnit
0131.0002.0000001a.000001e6 Software
Here's a link to download the file.
Any idea why this happens would be much appreciated.

Is there any way (command-line tools) to calculate an MD5 hash for .NEF (also .CR2, .TIFF) files regardless of any metadata?

Is there any way (command-line tools) to calculate an MD5 hash for .NEF (also .CR2, .TIFF) files regardless of any metadata, e.g. EXIF, IPTC, XMP and so on?
The MD5 hash should stay the same even after we update any metadata inside the image file.
I searched for a while, the closest solution is:
exiftool test.nef -all= -o - -m | md5
but 'exiftool -all=' still keeps a set of EXIF tags in the output file, and the MD5 hash changes if I update the remaining tags.
ImageMagick has a method for doing exactly this. It is installed on most Linux distros and is available for OSX (ideally via homebrew) and also Windows. There is an escape for the image signature which includes only pixel data and not metadata - you use it like this:
identify -format %# _DSC2007.NEF
feb37d5e9cd16879ee361e7987be7cf018a70dd466d938772dd29bdbb9d16610
I know it does what you want and that the calculated checksum does not change when you modify the metadata on PNG files for example, and I know it calculates the checksum correctly for CR2 and NEF files. However, I am not in the habit of modifying RAW files such as yours and have not tested that it does the right thing in that case - though I would be startled if it didn't! So please test before use.
The reason there is still some Exif data left is that the image data for a NEF file (and similar TIFF-based filetypes) is located within that Exif block. Remove that and you have removed the image data. See ExifTool FAQ 7, which has an example shortcut tag that may help you out.
I assume your intention is to verify the actual image data has not been tampered with.
An alternate approach to stripping the meta-data can be to convert the image to a format that has no metadata.
ImageMagick is a well-known open source suite (Apache 2 license) for image manipulation and conversion. It provides libraries with various language bindings, as well as command-line tools for various operating systems.
You could try:
convert test.nef bmp:- | md5
This converts test.nef to bmp on stdout and pipes it to md5.
AFAIR BMP has no support for metadata, and I'm not sure if ImageMagick even preserves metadata across conversions.
This will only work with single image files (i.e. not multi-image tiff or gif animations). There is also the slight possibility some changes can be made to the image which result in the same conversion because of color space conversions, but these changes would not be visible.

Reading TIFF files

I need to read and interpret a binary file containing a TIFF image. I know readers for this exist, but I want to go the hard way. I found the TIFF format description and need to parse the binary file in small chunks. Assume I was able to read the complete binary file into memory, so I have a variable containing one long list of bytes.
I know via the format definition what the meaning is of the different groups of n bytes.
How can one define character variables with different lengths (sometimes 2, sometimes 3, sometimes 4 etc.) so that the variable address points to the right position in the image variable array?
In other words, assume my image is loaded into an array Image containing all bytes of the file.
I want to load the first 2 bytes into a string of length 2 bytes, so that I can just point its address at the first position of the Image array and the first 2 bytes are automatically associated with the first character string. A second string of 4 bytes would have another meaning, so I make the address of that 4-byte string point to the 3rd position of the Image array.
Is this feasible in C++? I remember that this was a normal way of working for dynamic memory allocation in Fortran 77 in a simulation code I analysed a long time ago.
Thanks in advance for the hints!
Regards,
Stefan
The C++ language is easily capable of processing TIFF files from a byte array. The idea you have in mind is basically correct, but there are a few problems with it. C strings are zero-terminated, and the strings which appear in TIFF files are not necessarily zero-terminated since their length is specified explicitly. It really is simpler to create a dedicated data structure to hold the TIFF-specific data fields and then parse the binary data into that structure. Your method will also immediately run into trouble with the Motorola/Intel byte-order issue if your machine has the opposite endianness.
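As an illustration of that suggestion (a sketch only, with made-up names), the 8-byte header could be parsed out of your Image byte array into a dedicated struct like this, handling the "II"/"MM" (Intel/Motorola) byte order explicitly instead of pointing character variables at the raw bytes:

#include <cstdint>
#include <stdexcept>
#include <vector>

struct TiffHeader {
    bool big_endian;     // true for "MM" (Motorola), false for "II" (Intel)
    uint16_t magic;      // always 42 for TIFF
    uint32_t first_ifd;  // byte offset of the first IFD
};

static uint16_t read_u16(const uint8_t* p, bool be) {
    return be ? uint16_t((p[0] << 8) | p[1]) : uint16_t(p[0] | (p[1] << 8));
}
static uint32_t read_u32(const uint8_t* p, bool be) {
    return be ? (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) | (uint32_t(p[2]) << 8) | p[3]
              : uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Parse the TIFF header from the in-memory byte array described above.
TiffHeader parse_header(const std::vector<uint8_t>& image) {
    if (image.size() < 8) throw std::runtime_error("file too small");
    TiffHeader h{};
    if (image[0] == 'M' && image[1] == 'M')      h.big_endian = true;
    else if (image[0] == 'I' && image[1] == 'I') h.big_endian = false;
    else throw std::runtime_error("not a TIFF file");
    h.magic     = read_u16(&image[2], h.big_endian);
    h.first_ifd = read_u32(&image[4], h.big_endian);
    if (h.magic != 42) throw std::runtime_error("bad TIFF magic number");
    return h;
}

The same per-field approach extends naturally to the 12-byte IFD entries, and it sidesteps both the zero-termination and the endianness problems.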

Is there a way to pad files with a few extra bytes to get a different md5 checksum?

I have video files that I want to pad with a random number of extra bytes in order to create a different MD5 checksum. Is there a way to do that and keep them playable?
It depends on the video file format, but you should be able to just add the extra bytes to the end, and most video players should ignore them. Most video formats contain a lot of metadata about the video data (such as "the total video size is X bytes"), so they're robust against this sort of change.
One simple way to do this is to use the >> shell redirection operator to append data, e.g.:
# Add 32 random bytes to the end of the movie.avi
head -c 32 /dev/urandom >> movie.avi
Metadata would be a good thing to change. If the file has metadata about the time the film was made or the software used for encoding, changes to those values should not have any effect on the final result. You'd need to specify the format.
Yegor,
It depends entirely on the video format. Look it up on Wikipedia; some have an end-of-file flag byte sequence, and simply adding bytes after it will achieve your effect. Others will not work out so simply.
