NVIDIA NVENC (Media Foundation) encoded h.264 frames not decoded properly using VideoToolbox

NVIDIA NVENC (Media Foundation) encoded h.264 frames not decoded properly using VideoToolbox - ios

I am facing the same problem as described here when trying to decode a frame on iPad Pro OS v14.3 (I am also using Olivia Stork's example):
25% of the picture data is decoded correctly, the rest of the picture is just green.
The decoded image on iPad Pro OS v14.3 looks like this (the image was converted and saved in the decoder callback as described here, so it's not just a displaying problem).
The original image looks like this.
The image is encoded with NVIDIA NVENC (Media Foundation) on Windows10.
I searched the frame picture data for additional 4-Byte NALU start codes as described in the link, but there are only the three expected ones for SPS, PPS and IDR picture data.
I have another Media Foundation decoder application running on Windows10 which can decode the frames from exactly the same source correctly.
I am struggling for days now finding the cause of the problem.. anyone any ideas?
Thanks in advance. Rob
-
EDIT 2021-01-11:
I figured out that there are actually three additional 3-byte start codes (0x000001) within in the IDR picture data block of NALU type 5.
I tried to replace these start codes with the length of the following data block (big endian), as described here, but with the same result.
I also tried adding Emulation Prevention Bytes (0x000001 => 0x000301) as described here, but that also made no difference.
Maybe I am mislead and these start codes have nothing to do with the issue.. at least they are not just random image data, because they always appear at the same position (index) in the picture data block. Currently I am running out of ideas.. any hint anybody?
-
EDIT 2021-01-14:
I figured out a few more things:
Out of sheer lack of ideas I copied the picture data followed after the last start code at the beginning of the block (right after after the 4-Byte NALU start code).
I had expected - if that would work at all - to see the last quarter of the original image at the top of the decoded image, but to my surprise the decoded image looked like this.
I tried the same with the picture data coming after the second and third start code, and the decoded image looked like this and this:
The image data is decoded correctly and it is even at the correct position (compare to original image).
Even if I strip off all 3-Byte start codes and copy the picture data concatenated after the 4-Byte start code, it's the same result, only 25% of the image is decoded. So the additional 3-Byte start codes are apparently not the problem. There must be some setting somewhere which tells the decoder to only decode 25% of the image. I would tip on the CMVideoFormatDescription, but as far as I can see it looks okay.
I am also wondering how the decoder knows where to display the different picture data blocks. Either there is an offset defined somewhere within the picture data or the xy-position of every pixel is added by the encoder somehow..

I managed to find the cause of the problem: The 3-Byte start codes in the IDE picture data block must be replaced by 4-Byte start codes.
So first replace all 3-Byte start codes by 4-Byte start codes.
Then replace the 4-Byte start codes with the length of the following data block (big endian). The slices should be arranged like this (as mentioned here by 'Blackie'):
[4byte slice1 size][slice1 data][4byte slice2 size][slice2 data]...[4byte slice4 size][slice4 data]
Remember to not include the start code length in slice size.
After changing that, my frame was completely displayed.
By the way:
The information where to display the different picture data blocks is specified in the header data of each NALU (parameter 'first_mb_in_slice').
There is a very good c# example here how to extract the NALU header data. You can almost copy it 1:1.

Related

Reading byte by byte HEIF/HEIC images XMP metadata

I am trying to build a native byte parser that given an HEIF image it returns back its metadata (mainly width and height of the image).
I am struggling a lot at the moment finding the right documentation and specs to use for parsing such info. I have to do such thing for both XMP and EXIF metadata, but let's focus only on XMP for now.
What I need is the exact byte structure of where to find what. According to the HEIF international standard doc (here):
For image items, XMP metadata shall be stored as an item of item_type value 'mime' and content type'application/rdf+xml'. The body of the item shall be a valid XMP document, in XML form.
Perfect, if I analyse a sample image I can find such marker:
From now on I can't find anywhere how to get the info I need. I would expect something saying "the first 2 bytes are the header, with marker 0xFF 0xCE (just an example), the next 2 bytes are the width, and following 2 bytes the height...etc".
In my case I am going by intuition. My sample image is of dimensions 8736x5856. If in the tool I look for Big-Endian 2 byte integer 8736, I can find it:
And hey, 2 bytes later there is the 5856 height as well:
But again, I arrived here by luck and intuition. I need a proper schema that tells me where to find what in such a way that I can traslate it to code.

What I think you'r seeing is a "mime" and "ispe" mp4 box as HEIF is ISOBMFF based. I would recommend looking at the file using a mp4 capable tool like mp4dump, HexFiend or fq (note: my tool). The "ispe" (Image Spatial Extents) box i probably what you want to read.
fq does no support ispe box yet but you could read it like this:
$ fq 'grep_by(.type=="ispe").data | tobytes | [.[-8:-4], .[-4:] | tonumber]' file.heif
[
8736,
5856
]
So what you need is probably a basic ISOBMFF reader and then look for the "ispe" box and decode it. If you'r only looking for the first of a specific box you can probably ignore that ISOBMFF is a tree structure.

Catching errors for CIFilter.outputImage when using CIAztecCodeGenerator/CIQRCodeGenerator

I'm trying to create 2D barcode images with iOS and CoreImage using a CIFilter for CIAztecCodeGenerator. Depending on the length of the text and the error correction level settings, CIFilter.outputImage sometimes returns nil.
The following message is printed on the console only:
Unable to create barcode. The message is too large for an Aztec
barcode.
A similar message will be printed when using CIQRCodeGenerator.
Is there a way to catch this kind of error for CIFilter in code or find out in advance if the text will be too long to be processed?
Thank you very much for any suggestions!

The maximum size of the data depends on the correction level and also on the type of message that you want to encode. For example if your text only have numbers then a different encoding is used to put that data in the barcode and more characters will be allowed. You cannot really detect that specific error but you can always just check if the result was nil. If it was nil then probably beacuse message was too long.
According to documentation at https://developer.apple.com/library/archive/documentation/GraphicsImaging/Reference/CoreImageFilterReference/index.html#//apple_ref/doc/filter/ci/CIAztecCodeGenerator :
The full-size format can store up to 1914 bytes of message data
(including correction) in up to 32 layers, producing a barcode image
sized no larger than 151 x 151 pixels.
Also at https://help.accusoft.com/BarcodeXpress/v13.2/BxNodeJs/aztec.html you will find this 1914 bytes value. So probably you could assume that anything that is not larger than 1914 bytes should be successfully encoded by CIAztecCodeGenerator, but encoding anything above that size could fail.

Reverse Engineering proprietary TIFF format

I'm deep in the weeds reverse engineering a very old proprietary document storage format (Keyfile). Embedded in the middle of a larger file is a block of image data (the scan of a single document page) that is encoded with CCITT4. I've learned enough about the file and the TIFF spec so far to write a filter that extracts the data from the source file and writes a new file that is supposed to be a plain TIFF, but it's not quite there yet, and I can't figure out what I'm still missing.
Encouragingly Adobe Photoshop opens my newly minted TIFF file and displays the document just fine (no errors, no warnings). Unfortunately, none of the other common tools will. I'm on a mac and have access to linux so I've tried:
Gimp
Preview (OSX)
ImageMagick
some of the libtiff utilities like fax2pdf
I suspect there's something wrong still with my TIFF file, that Photoshop is silently overlooking. I hope it's not in the raw CCITT4 image data, because I would rather not have to write code to decode that completely.
I can't post the files I'm working with because they contain sensitive data. However, I'm hoping that I'm just doing something wrong with my tiff header block that someone can point out. To that end. here's some basic information about my test file (the one that opens fine in Photoshop).
Keyfile.tiff 31K (32300 bytes)
Keyfile TIFF Version 1.01
0100.0004.00000001.000009f0 ImageWidth
0101.0004.00000001.00000ce0 ImageLength
0102.0003.00000001.00000001 BitsPerSample
0103.0003.00000001.00000004 Compression
0106.0003.00000001.00000000 PhotometricInterpolation
0111.0004.00000001.00000200 StripOffsets
0115.0003.00000001.00000001 SamplesPerPixel
0116.0004.00000001.00000ce0 RowsPerStrip
0117.0004.00000001.00007c2c StripByteCounts
011a.0005.00000001.000001d6 XResolution
011b.0005.00000001.000001de YResolution
0128.0004.00000001.00000002 ResolutionUnit
0131.0002.0000001a.000001e6 Software
This decode of the TIFF header block comes from code that I've written. Here's a hex dump of the header portion of the file to address 0x200.
49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C000017010400010000002C7C00001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E303100
What follows is exactly 0x7c2c bytes of compressed image data. I say this based on the tiff compression tag (4), which is copied over intact form the original file, and from looking at dozens and dozens of files with a hex editor and learning to recognize the image data block. Also the fact that Photoshop opens this file would seem to indicate I am correct.
Any help figuring out what I still need to do to make this file compatible with the rest of the utilities would be much appreciated.
For what it's worth here's the error produced by imagemagick:
>convert Keyfile.tiff Keyfile.pdf
convert: Premature EOL at line 0 of strip 0 (got 0, expected 2544). `Fax4Decode' # warning/tiff.c/TIFFWarnings/881.
I'm new to coding for TIFF and so any utilities or hints that would allow me to gather more detailed information about what's going on would also be appreciated.
Update:
Here are the first 0x318 bytes of the file. There's nothing sensitive here and you have the first 0x118 bytes of the image data. I can probably provide a bit more of the file if needed.
49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C000017010400010000002C7C00001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E3031000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FFFFFFFC8085B51FFFFFFFFFFFFFFFFFFFFFFFF90154E0C4221836AC80A900F04142050814204679705E823C0D3089900E92D641B9B1D2907364E94886C112854118E6208686E6492B47D11C1A29289806DC25083A41427495102E6D349641736AA96439B08496113867960B314A08CC1A2102141410221AADC28102123E918508E02AC41143D2C5131C3C68B1620B8CCB02A8238F564536394D16F11AA050CEA8A9944105DB92591D12D04513E195B23E1252561A742191D11B0628110DA6E5259A6881891832C74B704A0C8F1B4618450E2AA4087391D17988888EA41CDAD8A2B0AAA4436A2647D94CC585
Update 2:
OK, I found a file that I can post. It's a mostly white page, but if rendered correctly, you will see the two darkish crescent moons which are the reflection of the holes on the original scanned page. There's also a bit of noise over to the right and along the top. Here's what it looks like (image):
I used Photoshop to convert/save a file I could upload. Here's a hex dump of the file my code generated, which opens fine in Photoshop, but not with anything else.
49492A00080000000D000001040001000000F00900000101040001000000E00C00000201030001000000010000000301030001000000040000000601030001000000000000001101040001000000000200001501030001000000010000001601040001000000E00C00001701040001000000530300001A01050001000000D60100001B01050001000000DE010000280104000100000002000000310102001A000000E6010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002C010000010000002C010000010000004B657966696C6520544946462056657273696F6E20312E3031002C19461170350282E88E8AF52889A91024623806A1C8F97C8E8D111D1847115B44CF3A2388DA2E8C2388122F98C868E23451112508B88600D4297C8E88E44788F91E308BC4745CC8F91E23A2EC8E88F11E23B36447C8F11CC8E611020711111A6888390E39C738E0848E8BA23A388D4A224111B03681C206478DA892946E2E06D06B51121718036032092844E0AE470350604AA229C88E0680CC224511803402E24A11F88E0660D8224A40CD1016ACC8E0B606048906482C101752460C8E19006E224AC3203901D091B03C08122D9C0DA12141BFFFFFFFFFFFFFFFFFFFFFFFFFFFFF2D2125082123F1A2EA08124122EB6820A475E2105130A8209826474388886475612449543B295550C8E88224EC591D1174295B23A48C0EC591E08762111E23A2F9F46D11D02E22323E088A3870447542223EE35BDF56AD5856AD430A1856AC2879692C06C2FC304259A688BA23D2D23211A4088FC504162A5373447C20A2396062188A891F23F7C48E89502F41A46D11B417126E51328709EDE4747D04171D8B23A650E5714E13158921F111588AB0AF72CA6AB50ED27690664750C286B6B1B29D351609F21976B8685A8613C309A96014631FFFFFFFFFFFFF2039C720383A5C5DFEB56B0B51FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF9601A8FFFFFFFFFFFFFFFFFFFFFFFFFFCEC6947FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF95CEA3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE5A852A3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFC004004
Here are it's specs.
Keyfile_66.tiff 1K (1363 bytes)
Keyfile TIFF Version 1.01
0100.0004.00000001.000009f0 ImageWidth
0101.0004.00000001.00000ce0 ImageLength
0102.0003.00000001.00000001 BitsPerSample
0103.0003.00000001.00000004 Compression
0106.0003.00000001.00000000 PhotometricInterpolation
0111.0004.00000001.00000200 StripOffsets
0115.0003.00000001.00000001 SamplesPerPixel
0116.0004.00000001.00000ce0 RowsPerStrip
0117.0004.00000001.00000353 StripByteCounts
011a.0005.00000001.000001d6 XResolution
011b.0005.00000001.000001de YResolution
0128.0004.00000001.00000002 ResolutionUnit
0131.0002.0000001a.000001e6 Software
Here's a link to download the file.
Any idea why this is would be much appreciated.

OpenCV imwrite increases the size of png image

I am doing image manipulation on the png images. I have the following problem. After saving an image with imwrite() function, the size of the image is increased. For example previously image is 847KB, after saving it becomes 1.20 MB. Here is a code. I just read an image and then save it, but the size is increased. I tried to set compression params but it doesn't help.
Mat image;
image = imread("5.png", -1);
vector<int> compression_params;
compression_params.push_back(CV_IMWRITE_PNG_COMPRESSION);
compression_params.push_back(9);
compression_params.push_back(0);
imwrite("output.png",image,compression_params);
What could be a problem? Any help please.
Thanks.

PNG has several options that influence the compression: deflate compression level (0-9), deflate strategy (HUFFMAN/FILTERED), and the choice (or strategy for dynamically chosing) for the internal prediction error filter (AVERAGE, PAETH...).
It seems OpenCV only lets you change the first one, and it hasn't a good default value for the second. So, it seems you must live with that.
Update: looking into the sources, it seems that compression strategy setting has been added (after complaints), but it isn't documented. I wonder if that source is released. Try to set the option CV_IMWRITE_PNG_STRATEGY with Z_FILTERED and see what happens
See the linked source code for more details about the params.

#Karmar, It's been many years since your last edit.
I had similar confuse to yours in June, 2021. And I found out sth which might benefit others like us.
PNG files seem to have this thing called mode. Here, let's focus only on three modes: RGB, P and L.
To quickly check an image's mode, you can use Python:
from PIL import Image
print(Image.open("5.png").mode)
Basically, when using P and L you are attributing 8 bits/pixel while RGB uses 3*8 bits/pixel.
For more detailed explanation, one can refer to this fine stackoverflow post: What is the difference between images in 'P' and 'L' mode in PIL?
Now, when we use OpenCV to open a PNG file, what we get will be an array of three channels, regardless which mode that
file was saved into. Three channels with data type uint8, that means when we imwrite this array into a file, no matter
how hard you compress it, it will be hard to beat the original file if it was saved in P or L mode.
I guess #Karmar might have already had this question solved. For future readers, check the mode of your own 5.png.

ios endless video recording

I'm trying to develop an iPhone app that will use the camera to record only the last few minutes/seconds.
For example, you record some movie for 5 minutes click "save", and only the last 30s will be saved. I don't want to actually record five minutes and then chop last 30s (this wont work for me). This idea is called "Loop recording".
This results in an endless video recording, but you remember only last part.
Precorder app do what I want to do. (I want use this feature in other context)
I think this should be easily simulated with a Circular buffer.
I started a project with AVFoundation. It would be awesome if I could somehow redirect video data to a circular buffer (which I will implement). I found information only on how to write it to a file.
I know I can chop video into intervals and save them, but saving it and restarting camera to record another part will take time and it is possible to lose some important moments in the movie.
Any clues how to redirect data from camera would be appreciated.

Important! As of iOS 8 you can use VTCompressionSession and have direct access to the NAL units instead of having to dig through the container.
Well luckily you can do this and I'll tell you how, but you're going to have to get your hands dirty with either the MP4 or MOV container. A helpful resource for this (though, more MOV-specific) is Apple's Quicktime File Format Introduction manual
http://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFPreface/qtffPreface.html#//apple_ref/doc/uid/TP40000939-CH202-TPXREF101
First thing's first, you're not going to be able to start your saved movie from an arbitrary point 30 seconds before the end of the recording, you'll have to use some I-Frame at approximately 30 seconds. Depending on what your Keyframe Interval is, it may be several seconds before or after that 30 second mark. You could use all I-frames and start from an arbitrary point, but then you'll probably want to re-encode the video afterward because it will be quite large.
SO knowing that, let's move on.
First step is when you set up your AVAssetWriter, you will want to set its AVAssetWriterInput's expectsMediaDataInRealTime property to YES.
In the captureOutput callback you'll be able to do an fread from the file you are writing to. The first fread will get you a little bit of MP4/MOV (whatever format you're using) header (i.e. 'ftyp' atom, 'wide' atom, and the beginning of the 'mdat' atom). You want what's inside the 'mdat' section. So the offset you'll start saving data from will be 36 or so.
Each read will get you 0 or more AVC NAL Units. You can find a listing of NAL unit types from ISO/IEC 14496-10 Table 7-1. They will be in a slightly different format than specified in Annex B, but it's fine. Additionally, there will only be IDR slices and non-IDR slices in the MP4/MOV file. IDR will be the I-Frame you're looking to hang onto.
The NAL unit format in the MP4/MOV container is as follows:
4 bytes - Size
[Size] bytes - NALU Data
data[0] & 0x1F - NALU Type
So now you have the data you're looking for. When you go to save this file, you'll have to update the MPV/MOV container with the correct length, sample count, you'll have to update the 'stsz' atom with the correct sizes for each sample and things like updating the media headers and track headers with the correct duration of the movie and so on. What I would probably recommend doing is creating a sample container on first run that you can more or less just overwrite/augment with the appropriate data for that particular movie. You'll want to do this because the encoders on the various iDevices don't all have the same settings and the 'avcC' atom contains encoder information.
You don't really need to know much about the AVC stream in this case, so you'll probably want to concentrate your experimenting around updating the container format you choose correctly. Good luck.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart