Why can't I get a manually modified MPEG-4 extended box (chunk) size to work? - parsing

Overview
As part of a project to write an MPEG-4 (MP4) file parser, I need to understand how an extended box (or chunk) size is processed within an MP4 file. When I tried to manually simulate an MP4 file with an extended box size, media players report that the file is invalid.
Technical Information
Paraphrasing the MPEG-4 specification:
An MP4 file is formed as a series of objects called 'boxes'. All data is contained in boxes, there is no other data within the file.
Here is a screen capture of Section 4.2: Object Structure, which describes the box header and its size and type fields:
Most MP4 box headers contain two fields: a 32-bit compact box size and a 32-bit box type. The compact box size supports boxes of up to 4 GB. Occasionally an MP4 box may hold more data than that (e.g., the media data of a large video file). In this case, the compact box size is set to 1, and eight (8) octets are added immediately following the box type. This 64-bit number is known as the 'extended box size', and supports box sizes up to 2^64 - 1 octets.
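As a rough illustration of the header layout just described, here is a minimal Python sketch that reads one box header from a byte buffer. The function name read_box_header is made up for this example. The handling of a compact size of 0 (box runs to the end of the file) comes from the ISO base media file format rather than from the text above.

```python
import struct

def read_box_header(buf, offset=0):
    """Return (box_type, box_size, header_len) for the box starting at offset.

    Hypothetical helper for illustration; box_size counts the header too.
    """
    # Compact header: 32-bit big-endian size followed by a 4-character type.
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    header_len = 8
    if size == 1:
        # A compact size of 1 means a 64-bit extended size follows the type.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_len = 16
    elif size == 0:
        # A compact size of 0 means the box extends to the end of the file.
        size = len(buf) - offset
    # latin-1 never fails to decode, even for types with non-ASCII bytes.
    return box_type.decode("latin-1"), size, header_len
```

A parser built on this would skip header_len octets to reach the box payload and box_size octets to reach the next sibling box.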
To understand the extended box size better, I took a simple MP4 file and wanted to modify the moov/trak/mdia box to use the extended box size, rather than the compact size.
Here is what the MP4 file looks like before modifying it. The three box headers are highlighted in RED:
My plan was as follows:
1. Modify the moov/trak/mdia box
   a. In the moov/trak/mdia, insert eight (8) octets immediately following the box type ('mdia'). This will eventually be our extended box size.
   b. Copy the compact box size to the newly-inserted extended box size, adding 8 to the size to compensate for the newly inserted octets. The size is inserted in big-endian order.
   c. Set the compact size to 1.
2. Modify the moov/trak box
   a. Add 8 to the existing compact box size (to compensate for the eight octets added to mdia).
3. Modify the moov box
   a. Add 8 to the existing compact box size (again, to compensate for the eight octets in mdia).
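The plan above can be sketched in Python roughly as follows. The function name to_extended_size and the box_offsets parameter are invented for this example. It assumes that every box on the path currently uses a plain compact size (neither 0 nor 1), and that the caller already knows the file offset of each box. It performs only the steps listed and touches nothing else in the file.

```python
import struct

def to_extended_size(buf, box_offsets):
    """Rewrite the innermost box on a path to use an extended size.

    box_offsets holds the file offset of each box on the path, outermost
    first, e.g. [moov_offset, trak_offset, mdia_offset] (hypothetical names).
    Returns the modified file contents as a new bytes object.
    """
    data = bytearray(buf)
    *parents, target = box_offsets

    size, box_type = struct.unpack_from(">I4s", data, target)
    # Replace the 8-octet header with a 16-octet one: compact size 1,
    # the unchanged type, then the old size plus 8 as a big-endian 64-bit value.
    data[target:target + 8] = struct.pack(">I4sQ", 1, box_type, size + 8)

    # Parent headers start before the insertion point, so their offsets
    # are still valid; each parent grows by the eight inserted octets.
    for parent in parents:
        (parent_size,) = struct.unpack_from(">I", data, parent)
        struct.pack_into(">I", data, parent, parent_size + 8)
    return bytes(data)
```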
Here's what the MP4 file looks like now, with the modified octets in RED:
What have we done?
We have told the MP4 parser/player to take the moov/trak/mdia box size from the extended field rather than the compact size field, and have increased all parent boxes by eight (8) to compensate for the newly-inserted extended box size in the mdia box.
What's the problem?
When I attempt to play the modified MP4 file I receive error messages from different media players:
Why do the media players see the modified file as invalid MP4?
Did I need to alter any other fields?
Does the extended box size have to be greater than 2^32?
Can it be that only specific box types support extended box size (e.g., Media Data)?

A tip of the hat to @Alan Birtles for pointing out that the chunk offsets would also need to be modified. Indeed, the stco (chunk offset) box contains absolute file offsets to the data chunks in the mdat box (rather than relative offsets within a box). This can be seen in the specification document:
The chunk offsets need to be increased by the number of octets we added to the file before the mdat box. In our case, this is the eight (8) octet extended box size inserted in the mdia box.
All that remained was to manually change the chunk offsets found in the two stco boxes (both video and audio tracks), adding eight (8) to each chunk offset. Here are the stco boxes before adding 8 to their chunk offsets:
Now the file passes validity tests of both the ffmpeg and ffprobe tools. Interestingly, although VLC succeeds in playing the modified file, other media players (e.g., Windows Media Player, MS Photos, MS Movies & TV, MS MovieMaker) report the file as corrupted. It is not clear why they fail to play the file. Unverified possibilities include:
Not supporting the extended box size for any box other than mdat
Balking if the extended box size is less than 2^32
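In summary, if any octets are inserted into boxes (e.g., an extended box size), the chunk offsets in every stco box need to be incremented by the number of octets inserted ahead of the chunk data they point to. In this file the mdat box follows the moov box, so every offset grows by the eight octets added to mdia.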
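For anyone who would rather not patch the offsets by hand, here is a minimal Python sketch of that adjustment. The function name shift_stco_offsets and the CONTAINERS set are made up for this example. It assumes that all inserted octets sit ahead of all chunk data (as in the file above), so every offset is shifted by the same amount. It handles only 32-bit stco boxes, not the 64-bit co64 variant.

```python
import struct

# Boxes on the way down to stco whose payload is simply a list of child boxes.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}

def shift_stco_offsets(data, delta, start=0, end=None):
    """Add delta to every chunk offset in every stco box, in place.

    data must be a bytearray holding the file (or a run of sibling boxes).
    """
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:            # extended 64-bit size follows the type
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:          # box runs to the end of the enclosing range
            size = end - pos
        if size < header:
            raise ValueError(f"bad box size at offset {pos}")

        if box_type in CONTAINERS:
            shift_stco_offsets(data, delta, pos + header, pos + size)
        elif box_type == b"stco":
            body = pos + header
            # Payload: version/flags (4 octets), entry count (4), then offsets.
            (count,) = struct.unpack_from(">I", data, body + 4)
            for i in range(count):
                at = body + 8 + 4 * i
                (chunk_offset,) = struct.unpack_from(">I", data, at)
                struct.pack_into(">I", data, at, chunk_offset + delta)
        pos += size
```

Because the walker reads extended sizes as well as compact ones, it can be run on the file after the mdia header has been widened, with delta set to 8.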

Related

iOS - JPG "open as" hex in Xcode

In Xcode, there is option to open a JPG file as hex. What is this actually showing?
On opening, we see something like this.
What is the data that is available on the left side and what is on the right side?
In iOS, how can we read the data which is on the right side starting with ˇÿˇ·NÿExifII*?
The hex view is showing the bytes of the file. The left side shows the byte offset down the left column and the hex value for each byte in the grid. The right side is simply the character for each byte. A little bullet is shown for non-printable characters.
Use the Data class to load the file into memory. Then you can do whatever you need with the data.
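To make the columns concrete, here is a small Python sketch (Python only for illustration, not what Xcode itself runs) that formats bytes the same general way: offset on the left, hex values in the middle, printable characters on the right. The function name hex_dump is made up, and a period stands in for non-printable bytes, where Xcode may draw a different placeholder.

```python
def hex_dump(data, width=16):
    """Return a list of text lines: offset, hex bytes, printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        # Middle column: each byte as two hex digits.
        hex_part = " ".join(f"{b:02X}" for b in row)
        # Right column: the ASCII character where printable, else a period.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines

# Sample bytes shaped like the start of a JPEG with an Exif segment
# (made-up values for the length field).
sample = b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00II*\x00"
for line in hex_dump(sample):
    print(line)
```

The readable "Exif" and "II*" fragments in the right column are the same bytes that appear in hex in the middle. The characters around them are just how the viewer chooses to draw bytes that are not text.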
It's actually showing you (in the middle pane) each byte in the file displayed in hexadecimal. In the right pane, you see the ASCII equivalent (where available) of each hexadecimal byte. Sounds like what you really want is to get the EXIF data, a question which is answered here: How to get Exif data from downloaded image
The hex editor is showing the raw bytes of the file in hexadecimal.
(Personally, I didn't realize Xcode had a hex viewer, so thanks!)
The column on the right attempts to render each byte as an ASCII character where it can. Control codes and the like render as periods. This column is useful for finding strings or text in a binary file.
The data itself (in this example, a JPEG file) follows the JPEG file format. You can see more info on that on the Wikipedia JPEG page.
In general, one shouldn't need to "read" a JPEG file manually. There are several APIs for reading in a graphics file so that it is ready to use. Most graphic assets should now live in .xcassets catalogs, which allow better resolution scaling across multiple devices.

What is the iDOT chunk

Looking at a screenshot I took on a MacBook Pro with Retina display and running OS X 10.11, I found that it contained these chunks:
IHDR, iCCP, pHYs, iTXt, iDOT, IDAT…, IEND
All of these are part of the 2003 spec, except for iDOT which is a small (28 bytes) chunk. According to the chunk naming conventions, the fact that its second letter is capital should indicate that it's a chunk with a public specification. I couldn't find its specification anywhere yet though. It's not listed in the Register of public PNG chunks and keywords, Version 1.4.6 either, although that appears to be the latest version.
There are many sites on the web mentioning that chunk, including many on Stack Overflow. Most are describing error messages along the lines of
ImageIO: PNG invalid PNG file: iDOT doesn't point to valid IDAT chunk
and those which got resolved found out some kind of image corruption not necessarily due to this chunk, or applied some conversion which presumably also deleted this chunk.
Many pages also mention Retina displays. It is my guess and hope that this chunk somehow indicates the display density in effect when the screenshot was taken. That would be massively useful for automatic scaling of screenshots.
Edit: Taking some more screenshots, I find that indeed the pixel density seems to play a part: running the display at native resolution I get no such chunk and the image dimensions as shown while taking the screenshot. Only at higher density do I get the chunk and a PNG image size which is an integer multiple of the displayed one. The 28 bytes of data seem to represent 7 little-endian 32-bit integers. For me these were (2, 0, h, 40, h, h, x) where 2 presumably indicates the pixel density, h is the apparent image height (i.e. half the one actually stored) and x is some number I don't understand at all. I don't know how fractional pixel densities would enter this game.
Where can one find details and perhaps even a specification for this chunk? Do I have to contact Apple or the registry, or is there someone here who can provide more details?
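For anyone wanting to reproduce these observations, here is a minimal Python sketch that lists the chunks of a PNG and unpacks a 28-byte iDOT payload into seven 32-bit integers. The names iter_png_chunks and decode_idot are made up for this example. The little-endian default only mirrors the observation above and is an assumption, not a documented fact, so the byte order is left as a parameter. CRCs are skipped rather than verified.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_png_chunks(data):
    """Yield (chunk_type, payload) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 12 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        yield chunk_type.decode("latin-1"), data[pos + 8:pos + 8 + length]
        pos += 12 + length

def decode_idot(payload, byte_order="<"):
    """Unpack an iDOT payload as seven unsigned 32-bit integers.

    byte_order is "<" (little-endian, as observed above) or ">" (big-endian).
    """
    if len(payload) != 28:
        raise ValueError("expected a 28-byte iDOT payload")
    return struct.unpack(byte_order + "7I", payload)
```

Typical use would be to read the screenshot with open(path, "rb").read(), loop over iter_png_chunks, and pass the payload of the chunk named "iDOT" to decode_idot.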
As Hendrik already wrote in a comment, https://www.hackerfactor.com/blog/index.php?/archives/895-Connecting-the-iDOTs.html has reverse-engineered the purpose of this chunk. It contains an offset to a well-defined position in the encoded image data: right after a flush, so the decompressor state is reset, right at an IDAT chunk boundary, and at a known pixel position at half the height of the image.
The purpose of that is to allow decoding the image data in parallel on multi-core processors. This goal has recently (2021-12-01) been confirmed in a comment on that page from someone involved with the implementation. That comment, albeit still far from official, is the most authoritative source I could find on this issue so far.

How to localize bitmap font?

I usually have 4 files per font. For instance
menu-ipadhd.fnt
menu-ipadhd.png
menu.fnt
menu.png
Question is, should I localize all 4 files or .fnt localization is enough? Can Cocos2D find png file from corresponding .lproj? What is the proper way to localize bitmap fonts?
Whether cocos2d finds files according to language is easy to test: put two different files, one in en.lproj and the other in some other language folder, then change the device's or simulator's language and see whether it picks the correct one for each language.
If you mean to localize the text (strings displayed with a bitmap font), you really just have to localize the strings and make sure that all language-specific characters (äéøß and such) for all supported languages are included in the bitmap font's image, or at least all of those actually used by your localized strings.
If you are localizing to a completely different character set, for example from English to Cyrillic, Asian or Arabic languages, you need to provide different bitmap fonts altogether, i.e. both fnt and png files, and load them according to the language. This is in addition to localizing the text itself.
The reason is that these non-Latin character sets contain hundreds if not thousands of different characters, so including them in a single bitmap font is not desirable, and it may not even be possible to fit them all in 4096x4096 (or less) texture space.
Some cocos2d versions' bitmap font class limits the number of characters in a bitmap font to 2048. This may apply only to v1.x; I haven't checked whether the limitation still exists in 2.x in general, I just can't locate it in 2.1 anymore. For character sets that exceed this number of characters you will have to increase that number in the header of CCLabelBMFont, if available.
Be careful with large bitmap fonts, though, as the storage per letter can add up quite significantly: an entire font can use several megabytes, not counting the texture itself!

png snapshot of a specific swf frame

I have two swfs. The first is my AS3 application. The second one contains 50-odd frames. The first swf has only two elements: a text box wherein the user types an integer (the frame number of the second swf, actually) and a border container of 100 x 300. When the user keys in an integer in the first swf's text box, I need to access the second swf from the first one, take a bitmap snapshot of the frame specified by the user, convert it to PNG (minimum resolution 300 dpi) and display it inside the first swf's border container.
Can anyone guide me on how to take a bitmap snapshot (or convert) a specific frame of an external swf and pass it back to the master (controlling) swf?
Thanks

Maximum image dimensions in a browser/CSS spec?

I want to display a page containing about 6000 tiny image thumbnails (40x40 each). To avoid having to make 6000 HTTP requests, I am exploring CSS sprites, i.e. concatenating all these thumbnails into one long strip and using CSS to crop the required images out. Unfortunately, I have discovered that JPEG files cannot be larger than 65500 pixels in any one dimension. Wary of further limits in the web stack, I am wondering: are any of the following unable to cope with an image with dimensions of 40x240000?
Internet Explorer
Opera
WebKit
Any CSS spec
Any HTML spec
The PNG spec
Edit: the purpose of this is simply to display an entire image collection at once, requiring that the user at most has to scroll. I want the "micro-thumbnails" to flow into an existing CSS layout, so I can't just use a big rectangular image. I don't want the user to have to click through multiple pages to see everything. The total number of pixels is not that great - only twice what would fit on a 2560x1600 display. The total file size of all the micro-thumbnails is only a couple of megabytes. Assuming every image is manipulated uncompressed in the browser's memory, taking 8 bytes of storage per pixel (RGBA plus 100% overhead fudge factor), we are talking RAM usage in the low hundreds of megabytes; not unreasonable for a specialized application in the year 2010. The only unreasonable thing is the volume of HTTP requests that would be generated if all micro-thumbnails were sent individually.
Well, Safari/iOS lists these limits:
The maximum size for decoded GIF, PNG, and TIFF images is 3 megapixels.
That is, ensure that width * height ≤ 3 * 1024 * 1024. Note that the decoded size is far larger than the encoded size of an image.
The maximum decoded image size for JPEG is 32 megapixels using subsampling.
JPEG images can be up to 32 megapixels due to subsampling, which allows JPEG images to decode to a size that has one sixteenth the number of pixels. JPEG images larger than 2 megapixels are subsampled—that is, decoded to a reduced size. JPEG subsampling allows the user to view images from the latest digital cameras.
Individual resource files must be less than 10 MB.
This limit applies to HTML, CSS, JavaScript, or nonstreamed media.
http://developer.apple.com/library/safari/#documentation/AppleApplications/Reference/SafariWebContent/CreatingContentforSafarioniPhone/CreatingContentforSafarioniPhone.html
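As a quick sanity check, the numbers from the question can be run against the two limits quoted so far. This is only a back-of-the-envelope Python sketch. The function name sprite_strip is made up, the 65500-pixel JPEG limit and the 8 bytes per pixel come from the question, and the 3-megapixel figure comes from the Safari/iOS documentation quoted above.

```python
def sprite_strip(count, thumb_w, thumb_h, bytes_per_pixel=8):
    """Dimensions and decoded size of a vertical strip of equal thumbnails."""
    width, height = thumb_w, thumb_h * count
    pixels = width * height
    return {
        "width": width,
        "height": height,
        "pixels": pixels,
        "decoded_bytes": pixels * bytes_per_pixel,
    }

JPEG_MAX_DIMENSION = 65500          # per-dimension limit cited in the question
IOS_MAX_PIXELS = 3 * 1024 * 1024    # decoded GIF/PNG/TIFF limit quoted above

strip = sprite_strip(count=6000, thumb_w=40, thumb_h=40)
fits_jpeg = max(strip["width"], strip["height"]) <= JPEG_MAX_DIMENSION
fits_ios = strip["pixels"] <= IOS_MAX_PIXELS
print(strip, fits_jpeg, fits_ios)
```

For the 6000-thumbnail case this gives a 40x240000 strip of 9.6 million pixels, which exceeds both the JPEG dimension limit and the 3-megapixel Safari/iOS limit.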
Based on your update, I'd still really recommend not using this approach. Don't you think there's a reason that Google's image search doesn't work like this?
As such, I'd recommend simply loading images as required via Ajax. (i.e.: When the user scrolls below the currently visible set of images.) Whilst this will use more connections, it'll mean that you can have sensibly sized thumbnails and as a general approach is much more manageable than having to re-generate pre-generated thumbnail image "sheets" on the back-end when a new image is added, etc.
