I really want to understand how to flip a YUV420P (IYUV) byte stream.
I know the flip algorithm for RGB and RGBA byte streams, but not for the YUV420 format.
I tried to find an example, but I couldn't find one.
How can I flip a YUV420P stream?
[Edit]
My project uses C/C++.
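For what it's worth, a vertical flip of YUV420P can be sketched as three independent plane flips, because the format stores a full-resolution Y plane followed by quarter-size U and V planes. The sketch below assumes a tightly packed frame (no row padding) with even width and height; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Flip one 8-bit plane upside down by copying its rows in reverse order.
static void flipPlaneVertical(const uint8_t* src, uint8_t* dst,
                              int width, int height) {
    for (int row = 0; row < height; ++row) {
        std::memcpy(dst + static_cast<std::size_t>(height - 1 - row) * width,
                    src + static_cast<std::size_t>(row) * width,
                    static_cast<std::size_t>(width));
    }
}

// Vertically flip a tightly packed I420 frame (Y plane, then U, then V).
// Assumption: even width and height, and no padding at the end of rows.
std::vector<uint8_t> flipI420Vertical(const std::vector<uint8_t>& frame,
                                      int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t ySize = static_cast<std::size_t>(width) * height;
    const std::size_t cSize = ySize / 4;  // each chroma plane is (w/2) x (h/2)
    assert(frame.size() == ySize + 2 * cSize);

    std::vector<uint8_t> out(frame.size());
    const int cw = width / 2;
    const int ch = height / 2;
    flipPlaneVertical(frame.data(), out.data(), width, height);
    flipPlaneVertical(frame.data() + ySize, out.data() + ySize, cw, ch);
    flipPlaneVertical(frame.data() + ySize + cSize,
                      out.data() + ySize + cSize, cw, ch);
    return out;
}
```

A horizontal flip follows the same pattern, except that each row is reversed in place instead of the row order being reversed.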
I wonder which of the methods below preserves more image detail:
Down scaling BGRA images and then converting them to NV12/YV12.
Converting BGRA images to NV12/YV12 images and then down scaling them.
Thanks for your recommendation.
Updated 2020-02-04:
To make my question clearer, I want to describe the setup a little more.
The images come from a video stream, processed like this:
Video Stream
-> decoded to YV12.
-> converted to BGRA.
-> stamped texts.
-> scaling down (or YV12/NV12).
-> YV12/NV12 (or scaling down).
-> H264 encoder.
-> video stream.
The whole sequence of tasks takes 300 to 500 ms.
The issue I have is that the text stamped over the images does not look clear after conversion
and scaling. I wonder which order is better for steps 4 and 5: scaling down and then converting to YV12/NV12, or converting first and then scaling down.
Noting that the RGB data is very likely to be non-linear (e.g. in an sRGB format), ideally you need to:
Convert from the non-linear "R'G'B'" data to linear RGB (Note this needs higher bit precision per channel) (see function spec on wikipedia)
Apply your downscaling filter
Convert the linear result back to non-linear R'G'B' (ie. sRGB)
Convert this to YCbCr/NV12
Ideally you should always do filtering/blending/shading in linear space. To give you an intuitive justification for this, the average of black (0) and white (255) in linear colour space is ~128, but in sRGB that mid grey is encoded as about 188 (186 if you approximate sRGB with a plain 2.2 gamma). If you instead average the sRGB code values directly you get 128, which is far darker than the true mid grey, so results computed in sRGB space look unnaturally dark/murky.
(If you are in a hurry, you can sometimes get away with just using squaring (and sqrt()) as a kludge/hack to convert from sRGB to linear (and vice versa))
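As a rough illustration of the linear-light idea, here is a minimal sketch of the standard sRGB transfer functions and of averaging two samples in linear space. The function names are hypothetical, and a real downscaler would apply the same conversion around whatever filter it uses rather than a two-sample average.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// sRGB decoding: 8-bit code value -> linear light in [0, 1].
double srgbToLinear(uint8_t v) {
    const double c = v / 255.0;
    return c <= 0.04045 ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
}

// sRGB encoding: linear light -> 8-bit code value (input clamped to [0, 1]).
uint8_t linearToSrgb(double l) {
    assert(std::isfinite(l));
    l = std::min(1.0, std::max(0.0, l));
    const double c = l <= 0.0031308 ? l * 12.92
                                    : 1.055 * std::pow(l, 1.0 / 2.4) - 0.055;
    return static_cast<uint8_t>(std::lround(c * 255.0));
}

// Average two sRGB samples in linear light, as a downscaling filter should.
uint8_t averageSrgb(uint8_t a, uint8_t b) {
    return linearToSrgb(0.5 * (srgbToLinear(a) + srgbToLinear(b)));
}
```

Averaging black and white this way gives a code value in the high 180s rather than 128, which is the brightness difference described above.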
To avoid two phases of spatial interpolation, the following order is recommended:
Convert RGBA to YUV444 (YCbCr) without resizing.
Resize Y channel to your destination resolution.
Resize U (Cb) and V (Cr) channels to half resolution in each axis.
The result format is YUV420 in the resolution of the output image.
Pack the data as NV12 (NV12 is YUV420 in specific data ordering).
It is possible to do the resize and NV12 packing in a single pass (if efficiency is a concern).
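The steps above could look roughly like the following sketch, which takes YUV444 planes and produces NV12 in one pass. It assumes an integer downscale factor and a plain box filter (which also serves as a crude anti-aliasing blur); both are simplifications chosen for brevity, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average one factor x factor block of an 8-bit plane (box filter).
static uint8_t boxAverage(const std::vector<uint8_t>& plane, int stride,
                          int x0, int y0, int factor) {
    unsigned sum = 0;
    for (int y = 0; y < factor; ++y)
        for (int x = 0; x < factor; ++x)
            sum += plane[static_cast<std::size_t>(y0 + y) * stride + (x0 + x)];
    const unsigned n = static_cast<unsigned>(factor * factor);
    return static_cast<uint8_t>((sum + n / 2) / n);  // rounded mean
}

// Convert YUV444 planes (srcW x srcH) to NV12 at (srcW/factor x srcH/factor).
// Y is filtered once by `factor`; U and V are filtered once by 2 * factor,
// so chroma is only interpolated a single time.
std::vector<uint8_t> yuv444ToNv12Downscaled(const std::vector<uint8_t>& Y,
                                            const std::vector<uint8_t>& U,
                                            const std::vector<uint8_t>& V,
                                            int srcW, int srcH, int factor) {
    assert(factor >= 1 && srcW % (2 * factor) == 0 && srcH % (2 * factor) == 0);
    const int dstW = srcW / factor;
    const int dstH = srcH / factor;
    const std::size_t lumaSize = static_cast<std::size_t>(dstW) * dstH;
    std::vector<uint8_t> nv12(lumaSize * 3 / 2);

    for (int y = 0; y < dstH; ++y)
        for (int x = 0; x < dstW; ++x)
            nv12[static_cast<std::size_t>(y) * dstW + x] =
                boxAverage(Y, srcW, x * factor, y * factor, factor);

    // NV12 chroma plane: interleaved U,V pairs, dstW bytes per row.
    uint8_t* uv = nv12.data() + lumaSize;
    const int cf = 2 * factor;
    for (int y = 0; y < dstH / 2; ++y)
        for (int x = 0; x < dstW / 2; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * dstW + 2 * x;
            uv[i]     = boxAverage(U, srcW, x * cf, y * cf, cf);
            uv[i + 1] = boxAverage(V, srcW, x * cf, y * cf, cf);
        }
    return nv12;
}
```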
In case you don't do the conversion to YUV444, U and V channels are going to be interpolated twice:
First interpolation when downscaling RGBA.
Second interpolation when U and V are downscaled by half when converting to 420 format.
When downscaling the image, it's recommended to blur it first (this is sometimes referred to as an "anti-aliasing" filter).
Remark: since the eye is less sensitive to chromatic resolution, you are probably not going to see any visible difference (unless image has fine resolution graphics like colored text).
Remarks:
Simon's answer is more accurate in terms of color accuracy.
In most cases you are not going to see the difference.
The gamma information is lost when converting to NV12.
Update: Regarding "Text stamped over the images after converted and scaled looks not so clear":
In case getting clear text is the main issue, the following stages are suggested:
Downscale BGRA.
Stamp text (using smaller font).
Convert to NV12.
Downsampling an image with stamped text is going to result in unclear text.
A better solution is to stamp the text with a smaller font, after downscaling.
Modern fonts use vector graphics rather than raster graphics, so stamping text with a smaller font gives a better result than downscaling an image that already has text stamped on it.
NV12 format is YUV420: the U and V channels are downscaled by a factor of 2 in each axis, so the text quality will be lower compared to RGB or YUV444 format.
Encoding image with text is also going to damage the text.
For subtitles the solution is attaching the subtitles in a separate stream, and adding the text after decoding the video.
ARKit runs at 60 frames/sec, which equates to 16.6ms per frame.
My current code to convert the CVPixelBufferRef (kCVPixelFormatType_420YpCbCr8BiPlanarFullRange format) to a cv::Mat (YCrCb) runs in 30ms, which causes ARKit to stall and everything to lag.
Does anyone have any ideas on how to do a quicker conversion, or do I need to drop the frame rate?
There is a suggestion by Apple to use Metal, but I'm not sure how to do that.
Also I could just take the grayscale plane, which is the first channel, which runs in <1ms, but ideally I need the colour information as well.
In order to process an image in a pixel buffer using Metal, you need to do the following:
Call CVMetalTextureCacheCreateTextureFromImage to create CVMetalTexture object on top of the pixel buffer.
Call CVMetalTextureGetTexture to create a MTLTexture object, which Metal code (GPU) can read and write.
Write some Metal code to convert the color format.
I have an open source project (https://github.com/snakajima/vs-metal), which processes pixel buffers (from camera, not ARKit) using Metal. Feel free to copy any code from this project.
I tried converting YCbCr to RGB, doing the image processing on an RGB cv::Mat and converting back to YCbCr, but it was very slow. I suggest doing that only with a static image. For real-time processing, work directly on the YCbCr data in cv::Mat. ARFrame.capturedImage is a YCbCr buffer, so the solution is:
Separate the buffer into two cv::Mat objects (yPlane and cbcrPlane). Keep in mind that no memory is cloned: the two cv::Mat headers simply use the Y plane address and the CbCr plane address as their base addresses.
Do the image processing on yPlane and cbcrPlane, where cbcrPlane has half the width and half the height of yPlane (with two interleaved channels, Cb and Cr).
You can check out my code here: https://gist.github.com/ttruongatl/bb6c69659c48bac67826be7368560216
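To show the zero-copy idea without depending on OpenCV, here is a small sketch using a hypothetical `PlaneView` struct in place of cv::Mat. In the real setup the base addresses and row strides would presumably come from `CVPixelBufferGetBaseAddressOfPlane` and `CVPixelBufferGetBytesPerRowOfPlane`, and the views would be cv::Mat headers constructed over that external memory; this stand-in only illustrates the layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Non-owning view of one plane inside a larger pixel buffer (hypothetical type).
struct PlaneView {
    uint8_t* data;            // first byte of the plane, not copied
    int width;                // in samples (CbCr pairs for the chroma plane)
    int height;               // in rows
    int channels;             // 1 for Y, 2 for interleaved CbCr
    std::size_t bytesPerRow;  // row stride, may include padding

    // Address of channel c of the sample at (x, y).
    uint8_t* at(int x, int y, int c = 0) const {
        return data + static_cast<std::size_t>(y) * bytesPerRow +
               static_cast<std::size_t>(x) * channels + c;
    }
};

// View over the full-resolution luma plane.
PlaneView makeLumaView(uint8_t* yBase, int width, int height,
                       std::size_t bytesPerRow) {
    assert(yBase != nullptr && bytesPerRow >= static_cast<std::size_t>(width));
    return PlaneView{yBase, width, height, 1, bytesPerRow};
}

// View over the half-width, half-height interleaved CbCr plane.
PlaneView makeChromaView(uint8_t* cbcrBase, int lumaWidth, int lumaHeight,
                         std::size_t bytesPerRow) {
    assert(cbcrBase != nullptr && lumaWidth % 2 == 0 && lumaHeight % 2 == 0);
    return PlaneView{cbcrBase, lumaWidth / 2, lumaHeight / 2, 2, bytesPerRow};
}
```

Writing through either view modifies the original buffer directly, which is what avoids the expensive conversion and copy.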
For example I can give one of following video formats:
400
411
420
422
444
Each video format I select gives a different PSNR value for the same video sequence.
Or is there any way I can determine the YUV data format of my input YUV video sequence?
According to Wikipedia, PSNR is reported against each channel of color space.
Alternately, for color images the image is converted to a different color space and PSNR is reported against each channel of that color space, e.g., YCbCr or HSL.
See: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
For computing PSNR of video, you must have the source video, and the same video after some kind of processing stage.
PSNR is most commonly used to measure the quality of reconstruction of lossy compression codecs.
In case color sub-sampling (e.g converting YUV 444 to YUV 420), is part of the lossy compression pipeline, it's recommended to include the sub-sampling in the PSNR computation.
Note: There is no strict answer; it depends on what you need to measure.
Example:
Assume the input video is YUV 444, an H.264 codec is used for lossy compression, and the pre-processing stage converts YUV 444 to YUV 420.
Video Compression: YUV444 --> YUV420 --> H264 Encoder.
You need to reverse the process, and then compute PSNR.
Video Reconstruction: H264 Decoder --> YUV420 --> YUV444.
Now you have the input video in YUV 444 format and the reconstructed video in YUV 444 format, so you can compute the PSNR between the two videos.
Determine YUV video data format of input YUV video:
I recommend using ffprobe tool.
You can download it from here: https://ffmpeg.org/download.html (select "Static Linking").
I found the solution here: https://trac.ffmpeg.org/wiki/FFprobeTips.
You can use the following example:
ffprobe -v error -show_entries stream=pix_fmt -of default=noprint_wrappers=1:nokey=1 input.mp4
Y-PSNR: you can simply extract the Y component of the original and the reference images, and calculate the PSNR value for each image/video frame.
For video: take the mean of the per-frame PSNR values.
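A minimal sketch of that Y-PSNR computation is shown below. It assumes both sequences are raw 8-bit 4:2:0 planar files (Y plane followed by U and V for each frame) of known width and height already loaded into memory; other sampling formats would need a different frame size, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in dB between two 8-bit planes of `count` samples.
double planePsnr(const uint8_t* a, const uint8_t* b, std::size_t count) {
    assert(count > 0);
    double sse = 0.0;  // sum of squared errors
    for (std::size_t i = 0; i < count; ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sse += d * d;
    }
    // Identical planes have infinite PSNR.
    if (sse == 0.0) return std::numeric_limits<double>::infinity();
    const double mse = sse / static_cast<double>(count);
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}

// Mean of the per-frame Y-PSNR values of two raw 4:2:0 planar sequences.
// Note: a single identical frame makes the mean infinite.
double meanYPsnrI420(const std::vector<uint8_t>& ref,
                     const std::vector<uint8_t>& test,
                     int width, int height) {
    const std::size_t ySize = static_cast<std::size_t>(width) * height;
    const std::size_t frameSize = ySize * 3 / 2;  // Y + U/4 + V/4
    assert(!ref.empty() && ref.size() == test.size());
    assert(ref.size() % frameSize == 0);

    const std::size_t frames = ref.size() / frameSize;
    double sum = 0.0;
    for (std::size_t f = 0; f < frames; ++f) {
        // Only the Y plane at the start of each frame is compared.
        sum += planePsnr(ref.data() + f * frameSize,
                         test.data() + f * frameSize, ySize);
    }
    return sum / static_cast<double>(frames);
}
```

Reading the file with the wrong assumed format shifts the plane boundaries, which would explain why the reported PSNR changes with the selected format.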
How do I convert a one-channel YUV image (only the first channel, Y, is used) to a 24-bit RGB image? I ask because I must display it using a GTK+ interface, and GTK supports only 24-bit RGB images.
I'm not sure what you are actually starting from, a single-channel grayvalue image or a three-channel YUV image of which the second and third channel are full of zeros. If you have a single-channel 8-bit image to start with, you can use cvtColor(source_mat,destination_mat,CV_GRAY2RGB) to convert to 24-bit RGB. If you are starting from a 3-channel 24-bit YUV image with two channels full of zeros, you can use the split() function to get the Y channel out of it, then convert that as described above.
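If OpenCV is not available, the gray-to-RGB step amounts to copying each Y sample into all three channels. Here is a minimal sketch, assuming a tightly packed 8-bit Y plane and full-range values (limited-range video luma would also need rescaling); the function name is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand an 8-bit single-channel image into packed 24-bit RGB
// by replicating the gray value into R, G and B.
std::vector<uint8_t> grayToRgb24(const std::vector<uint8_t>& gray,
                                 int width, int height) {
    assert(gray.size() == static_cast<std::size_t>(width) * height);
    std::vector<uint8_t> rgb;
    rgb.reserve(gray.size() * 3);
    for (uint8_t v : gray) {
        rgb.push_back(v);  // R
        rgb.push_back(v);  // G
        rgb.push_back(v);  // B
    }
    return rgb;
}
```

The result has a row stride of exactly width * 3 bytes, which is worth checking against whatever stride the display toolkit expects.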
I want to record images, rendered with OpenGL, into a movie-file with the help of AVAssetWriter. The problem arises, that the only way to access pixels from an OpenGL framebuffer is by using glReadPixels, which only supports the RGBA-pixel format on iOS. But AVAssetWriter doesn't support this format. Here I can either use ARGB or BGRA. As the alpha-values can be ignored, I came to the conclusion, that the fastest way to convert RGBA to ARGB would be to give glReadPixels the buffer shifted by one byte:
UInt8 *buffer = malloc(width*height*4+1);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, buffer+1);
The problem is that the glReadPixels call leads to an EXC_BAD_ACCESS crash. If I don't shift the buffer by one byte, it works perfectly (but obviously with wrong colors in the video file). What's the problem here?
I came to the conclusion, that the fastest way to convert RGBA to ARGB would be to give glReadPixels the buffer shifted by one byte
This will however shift your alpha values by 1 pixel as well. Here's another suggestion:
Render the picture to a texture (using a FBO with that texture as color attachment). Next render that texture to another framebuffer, with a swizzling fragment shader:
#version ...
uniform sampler2D image;
uniform vec2 image_dim;
void main()
{
// gl_FragCoord.xy already points at the fragment center (x + 0.5, y + 0.5),
// so dividing by the image size addresses the matching texel center
// (OpenGL-ES SL doesn't provide texelFetch function).
gl_FragColor.rgba =
texture2D(image, gl_FragCoord.xy / image_dim).argb; // this swizzles RGBA into ARGB order if read into a RGBA buffer
}
What happens if you put an extra 128 bytes of slack on the end of your buffer? It might be that OpenGL is trying to fill 4/8/16/etc bytes at a time for performance, and has a bug when the buffer is non-aligned or something. It wouldn't be the first time a performance optimization in OpenGL had issues on an edge case :)
Try calling
glPixelStorei(GL_PACK_ALIGNMENT,1)
before glReadPixels.
From the docs:
GL_PACK_ALIGNMENT
Specifies the alignment requirements for the start of each pixel row in memory.
The allowable values are
1 (byte-alignment),
2 (rows aligned to even-numbered bytes),
4 (word-alignment), and
8 (rows start on double-word boundaries).
The default value is 4 (see glGet). This often gets mentioned as a troublemaker in various "OpenGL pitfalls" type lists, although this is generally more to do with its row padding effects than buffer alignment.
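To make the row padding effect concrete, here is a small sketch of how many bytes each row occupies for a given GL_PACK_ALIGNMENT, assuming one-byte components such as GL_UNSIGNED_BYTE. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one row of pixels written by glReadPixels, assuming
// one-byte components and the given GL_PACK_ALIGNMENT (1, 2, 4 or 8).
std::size_t packedRowBytes(int width, int bytesPerPixel, int packAlignment) {
    assert(width >= 0 && bytesPerPixel > 0);
    assert(packAlignment == 1 || packAlignment == 2 ||
           packAlignment == 4 || packAlignment == 8);
    const std::size_t raw = static_cast<std::size_t>(width) * bytesPerPixel;
    const std::size_t a = static_cast<std::size_t>(packAlignment);
    return (raw + a - 1) / a * a;  // round up to a multiple of the alignment
}
```

With 4-byte RGBA pixels and the default alignment of 4, every row is already a multiple of 4 bytes, so no padding is added; a 3-pixel-wide RGB row, on the other hand, grows from 9 to 12 bytes.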
As an alternative approach, what happens if you malloc 4 extra bytes, do the glReadPixels as 4-byte aligned starting at buffer+4, and then pass your AVAssetWriter buffer+3 (although I've no idea whether AVAssetWriter is more tolerant of alignment issues) ?
You will need to shift bytes by doing a memcpy or other copy operation. Modifying the pointers will leave them unaligned, which may or may not be within the capabilities of any underlying hardware (DMA bus widths, tile granularity, etc.)
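A CPU-side copy that reorders the channels could look like the sketch below. It assumes tightly packed 8-bit RGBA input and separate source and destination buffers; the function name is made up, and on a real device a vectorized or GPU-based reorder would likely be faster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy `pixelCount` RGBA pixels into `argb`, moving alpha to the front.
// Both buffers stay naturally aligned because no pointer is offset.
void rgbaToArgb(const uint8_t* rgba, uint8_t* argb, std::size_t pixelCount) {
    assert(rgba != nullptr && argb != nullptr && rgba != argb);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        const uint8_t* s = rgba + 4 * i;
        uint8_t* d = argb + 4 * i;
        d[0] = s[3];  // A
        d[1] = s[0];  // R
        d[2] = s[1];  // G
        d[3] = s[2];  // B
    }
}
```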
Using buffer+1 will mean the data is not written at the start of your malloc'd memory, but rather one byte in, so it will be writing over the end of your malloc'd memory, causing the crash.
If iOS's glReadPixels will only accept GL_RGBA then you'll have to go through and re-arrange them yourself I think.
UPDATE, sorry I missed the +1 in your malloc, StilesCrisis is probably right about the cause of the crash.