Video4Linux Y12 Pixel Format for use with OpenCV issues - opencv

I have an AMG88xx infrared camera attached to a Raspberry Pi 4.
I am using the Linux video-i2c driver.
The driver appears to work correctly:
v4l2-ctl -d /dev/video0 --all
Driver Info:
Driver name : video-i2c
Card type : I2C 1-104 Transport Video
Bus info : I2C:1-104
Driver version : 4.19.102
Capabilities : 0x85200001
Video Capture
Read/Write
Streaming
Extended Pix Format
Device Capabilities
Device Caps : 0x05200001
Video Capture
Read/Write
Streaming
Extended Pix Format
Priority: 2
Video input : 0 (Camera: ok)
Format Video Capture:
Width/Height : 8/8
Pixel Format : 'Y12 ' (12-bit Greyscale)
Field : None
Bytes per Line : 16
Size Image : 128
Colorspace : Raw
Transfer Function : Default (maps to None)
YCbCr/HSV Encoding: Default (maps to ITU-R 601)
Quantization : Default (maps to Full Range)
Flags :
Streaming Parameters Video Capture:
Capabilities : timeperframe
Frames per second: 10.000 (10/1)
Read buffers : 1
However, the output pixel format (Y12) appears to be unsupported by OpenCV:
>>> import cv2
>>> capture = cv2.VideoCapture(0)
VIDEOIO ERROR: V4L2: Pixel format of incoming image is unsupported by OpenCV
VIDEOIO ERROR: V4L: can't open camera by index 0
Do I need to build OpenCV with additional support, or somehow convert the pixel format?

You don't need OpenCV and cv2.VideoCapture() to read that camera. It is just a relatively slow I2C device that you can read directly or using the Adafruit library as in this example.
By all means, you could read it as above, convert the 12-bit data to an 8-bit or 16-bit NumPy array, and process it with OpenCV afterwards, but it is not necessary.
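The 12-bit to 8-bit or 16-bit conversion mentioned above could look roughly like the following. This is a minimal sketch, not tested against the actual sensor: it assumes the raw frame uses the V4L2 Y12 layout (one little-endian 16-bit word per pixel, data in the low 12 bits), which matches the 16 bytes per line reported for the 8-pixel-wide image. The function name `y12_to_arrays` and the synthetic input are illustrative only.

```python
import numpy as np

def y12_to_arrays(raw, width=8, height=8):
    """Unpack one raw Y12 frame into a 16-bit and an 8-bit array."""
    # Each pixel occupies a little-endian 16-bit word; mask off the unused high bits.
    frame16 = np.frombuffer(raw, dtype="<u2").reshape(height, width) & 0x0FFF
    # Drop the 4 least significant bits to map 0..4095 onto 0..255.
    frame8 = (frame16 >> 4).astype(np.uint8)
    return frame16, frame8

# Synthetic 128-byte buffer standing in for one frame read from the device
# (assumption: real data would come from the driver or the I2C library).
raw = (np.arange(64, dtype="<u2") * 65).tobytes()
frame16, frame8 = y12_to_arrays(raw)
```

The plain bit shift is the simplest possible scaling. For a thermal image, stretching the observed minimum and maximum to the full 8-bit range may give a more useful picture.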
Alternatively, you could embed a subprocess call to ffmpeg like I did in the second part of this answer.

The issue was caused by a pixel format missing from OpenCV (see Issue #16620), fixed by #16626.
I found it by comparing the Video4Linux pixel formats with those supported by OpenCV in modules/videoio/src/cap_v4l.cpp.
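When comparing pixel formats between v4l2-ctl output and a source file, it can help to convert between the four-character code and its integer value. The sketch below mirrors the packing used by the V4L2 `v4l2_fourcc()` macro. The Python helper names are made up for this example.

```python
def v4l2_fourcc(code):
    """Pack a four-character code into an integer, first character in the low byte."""
    a, b, c, d = code.encode("ascii")
    return a | (b << 8) | (c << 16) | (d << 24)

def fourcc_to_str(value):
    """Unpack an integer pixel format back into its four-character code."""
    return bytes((value >> shift) & 0xFF for shift in (0, 8, 16, 24)).decode("ascii")

y12 = v4l2_fourcc("Y12 ")  # note the trailing space, as printed by v4l2-ctl
print(hex(y12), repr(fourcc_to_str(y12)))
```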

Related

How to extract a fixed set of frames from a live video stream for machine learning prediction in PyTorch?

I recently created a Video Swin Transformer model that takes in a ([batch_size], 3, 32, 224, 224) [batch_size, channel, temporal_dim, height, width] tensor for video and outputs logits. The goal is to have the model predict on a live stream from a camera. Is there any way to repeatedly capture a fixed sequence of 32 frames and have the model predict on the live stream? If prediction takes longer than the duration of 32 frames, can I stretch the frames out over a longer time period, such as a minute? Thanks.
You can try to use my library ffmpegio, which suits your need:
To install:
pip install ffmpegio
To get blocks of 32 frames from your input URL:
import ffmpegio
url = 'input stream url'
temporal_dim = 32
height = 224
width = 224
size = [width,height]
pix_fmt = 'rgb24'
with ffmpegio.open(url,'rv',blocksize=temporal_dim,s=size,pix_fmt=pix_fmt) as stream:
    for frames in stream: # frames in [time,height,width,ch] ndarray
        vswim_in = frames.transpose(3,0,1,2) # reorg for your library
You can specify any other ffmpeg options as you wish to add (like a scaling/cropping filter to make input frame 224px square or input stream options).
Caveat: I haven't tested live stream buffering extensively. If you encounter any issues, please post an issue on GitHub.
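For the second part of the question (spreading the 32 frames over a longer period), one option is to buffer more frames than needed and pick 32 evenly spaced ones. The following is a minimal NumPy sketch under that assumption. The helper names and the 600-frame buffer are illustrative and not part of ffmpegio.

```python
import numpy as np

def evenly_spaced_indices(n_frames, temporal_dim=32):
    """Return temporal_dim indices spread evenly over n_frames buffered frames."""
    return np.linspace(0, n_frames - 1, temporal_dim).round().astype(int)

def sample_clip(frames, temporal_dim=32):
    """Pick evenly spaced frames and reorder to [1, ch, time, height, width]."""
    clip = frames[evenly_spaced_indices(len(frames), temporal_dim)]  # [time, height, width, ch]
    return clip.transpose(3, 0, 1, 2)[np.newaxis]  # add the batch dimension

# Example: 600 buffered frames (e.g. one minute at 10 fps), kept tiny here.
buffered = np.zeros((600, 4, 4, 3), dtype=np.uint8)
model_in = sample_clip(buffered)
idx = evenly_spaced_indices(600)
```

Lowering the capture frame rate on the ffmpeg side would achieve a similar effect without buffering the extra frames.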

How to let FFMPEG fetch frames from OpenCV and stream them to HTTP server

There is a camera that shoots at 20 frames per second. Each frame is 4000x3000 pixels.
The frames are sent to software that uses OpenCV. OpenCV resizes the frames to 1920x1080, then they must be sent to FFmpeg to be encoded to H.264 or H.265 using NVIDIA NVENC.
The encoded video is then streamed over HTTP to a maximum of 10 devices.
The infrastructure is crazy good (10 GB LAN) with state-of-the-art switches, routers, etc.
Right now, I can get 90 FPS when encoding the images from an NVMe SSD, which means the required encoding speed is achieved.
The question is how to get the images from OpenCV to FFmpeg.
The stream will be watched in a web app built with the MERN stack (assuming that this is relevant).
For cv::Mat you have cv::VideoWriter. If you wish to use FFMpeg, assuming Mat is continuous, which can be enforced:
if (!mat.isContinuous())
{
    mat = mat.clone();
}
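you can simply feed mat.data into sws_scale, which expects arrays of plane pointers and strides:
const uint8_t* srcData[1] = { mat.data };           // single packed plane
int srcStride[1] = { static_cast<int>(mat.step) };  // bytes per row
sws_scale(videoSampler, srcData, srcStride, 0, mat.rows, videoFrame->data, videoFrame->linesize);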
or directly into AVFrame
For cv::cuda::GpuMat, VideoWriter implementation is not available, but you can use NVIDIA Video Codec SDK and similarly feed cv::cuda::GpuMat::data into NvEncoderCuda, just make sure your GpuMat has 4 channels (BGRA):
NV_ENC_BUFFER_FORMAT eFormat = NV_ENC_BUFFER_FORMAT_ABGR;
std::unique_ptr<NvEncoderCuda> pEnc(new NvEncoderCuda(cuContext, nWidth, nHeight, eFormat));
...
cv::cuda::cvtColor(srcIn, srcIn, cv::ColorConversionCodes::COLOR_BGR2BGRA);
NvEncoderCuda::CopyToDeviceFrame(cuContext, srcIn.data, 0, (CUdeviceptr)encoderInputFrame->inputPtr,
(int)encoderInputFrame->pitch,
pEnc->GetEncodeWidth(),
pEnc->GetEncodeHeight(),
CU_MEMORYTYPE_HOST,
encoderInputFrame->bufferFormat,
encoderInputFrame->chromaOffsets,
encoderInputFrame->numChromaPlanes);
Here's my complete sample of using GpuMat with NVIDIA Video Codec SDK
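If the frames are produced in Python rather than C++, a common alternative is to pipe raw frames into an ffmpeg process through stdin. The sketch below only illustrates that idea: the function names, the output URL and the choice of an MPEG-TS container are assumptions, and it presumes an ffmpeg build that includes the h264_nvenc encoder. Serving the result over HTTP to several clients would need an additional server in front of it.

```python
import subprocess

def build_ffmpeg_cmd(width, height, fps, output_url, codec="h264_nvenc"):
    """Build an ffmpeg command that reads raw BGR frames from stdin."""
    return [
        "ffmpeg",
        "-f", "rawvideo",           # input is headerless raw video
        "-pix_fmt", "bgr24",        # OpenCV's default channel order
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",                  # read from stdin
        "-c:v", codec,
        "-f", "mpegts",
        output_url,
    ]

def stream_frames(frames, width, height, fps, output_url):
    """Write each frame (a NumPy array as returned by OpenCV) to ffmpeg's stdin."""
    proc = subprocess.Popen(build_ffmpeg_cmd(width, height, fps, output_url),
                            stdin=subprocess.PIPE)
    try:
        for frame in frames:
            proc.stdin.write(frame.tobytes())
    finally:
        proc.stdin.close()
        proc.wait()

# Hypothetical destination, for illustration only.
cmd = build_ffmpeg_cmd(1920, 1080, 20, "udp://127.0.0.1:5000")
```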

Python and ffmpeg create different tiff stacks

Hello everybody out there with an interest in image processing,
Creating a multipage tiff file (tiff stack) out of a grayscale movie can be achieved without programming using ffmpeg and tiffcp (the latter being part of Debian's libtiff-tools):
ffmpeg -i movie.avi frame%03d.tif
tiffcp frame*.tif stack.tif
Programming it in Python also seemed to be feasible to me using the OpenCV and tifffile libraries:
import numpy as np
import cv2
import tifffile
cap = cv2.VideoCapture('movie.avi')
success, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
image = np.zeros((300, 400, 500), 'uint8') # pre-allocate some space
i = 0
while success:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    image[i,:,:] = gray[80:480,0:500]
    i += 1 # advance to the next page of the stack
    success, frame = cap.read()
cap.release()
tifffile.imsave('image.tif',image,photometric='minisblack')
However, the results differ in size. Looking at the histogram of the Python solution, I realized that it differs from that of the ffmpeg solution.
Thanks to the answer below, I compared the output files with the file command:
user@ubuntu:~$ file ffmpeg.tif tifffile.tif
ffmpeg.tif: TIFF image data, little-endian
tifffile.tif: TIFF image data, little-endian, direntries=17, height=400, bps=8, compression=none, PhotometricIntepretation=BlackIsZero, description={"shape": [300, 400, 500]}, width=500
In addition, I compared the files with ffmpeg:
user@ubuntu:~$ ffmpeg -i ffmpeg.tif -i tifffile.tif
[tiff_pipe @ 0x556cfec95d80] Stream #0: not enough frames to estimate rate; consider increasing probesize
Input #0, tiff_pipe, from 'ffmpeg.tif':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: tiff, gray, 500x400 [SAR 1:1 DAR 5:4], 25 tbr, 25 tbn, 25 tbc
[tiff_pipe @ 0x556cfeca6b40] Stream #0: not enough frames to estimate rate; consider increasing probesize
Input #1, tiff_pipe, from 'tifffile.tif':
Duration: N/A, bitrate: N/A
Stream #1:0: Video: tiff, gray, 500x400 [SAR 1:1 DAR 5:4], 25 tbr, 25 tbn, 25 tbc
Which additional diagnostics could I use in order to pin down the problem?
compression algorithm
By default ffmpeg uses the packbits compression algorithm for TIFF output. This can be changed with the -compression_algo option, and other accepted values are raw, lzw, and deflate:
ffmpeg -i input.avi -compression_algo lzw output_%04d.tif
pixel format
Another difference may be caused by the pixel format (color space and chroma subsampling). See ffmpeg -h encoder=tiff for a list of supported pixel formats.
Which pixel format gets used depends on your input, and the log/console output will indicate the selected pixel format.
comparing outputs
I don't know what defaults are used by tifffile, but you can run ffmpeg -i ffmpeg.tif -i tifffile.tif and file ffmpeg.tif tifffile.tif to view details which may explain the discrepancy.

mp4 codec in Raspberry Pi 4: not writing frames to video

I'm not able to write an mp4 video file with cv2 on Rpi4.
All I'm getting in feedback is VIDIOC_DQBUF: Invalid argument
writer = cv2.VideoWriter('test.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps, (640, 480), True)
stream = cv2.VideoCapture(0)
ret, frame = stream.read()
while ret:
    writer.write(frame)
    cv2.imshow('Video', frame)
    ret, frame = stream.read()
    if cv2.waitKey(1) & 0xFF==27:
        break
stream.release()
writer.release()
cv2.destroyAllWindows()
The video is displayed by cv2.imshow() and the output file is created; however, no frames are actually written to it, so the video file appears corrupted.
I am assuming this is a codec error. I've tried displaying the codecs using fourcc=-1 in VideoWriter() though the other fourcc's I've tried didn't work either. Has anyone had success using opencv writing videos on rpi4?
I've tested your code and it worked well on my Raspberry Pi 4. I'm using the latest Raspberry Pi OS and OpenCV 4.3.0. I can also write an AVI file using the XVID codec:
out = cv2.VideoWriter('output.avi', cv2.VideoWriter_fourcc(*'XVID'), 30.0, (640,480))
If neither of them works for you, try updating the software on your Raspberry Pi 4.

FDK AAC encoder/decoder : Access Huffman encoded and decoded data

For the FDK AAC,
I want to access the spectral data before and after Huffman encoding/decoding in the encoder and in the decoder.
To access the spectral data before Huffman encoding, I am using the pSpectralCoefficient pointer and dumping 1024 samples (on the decoder side), and using qcOutChannel[ch]->quantSpec and dumping 1024 samples (on the encoder side). Is this correct?
Secondly, how do I access the Huffman-encoded signal in the encoder and decoder? If someone can tell me the location in the code, the name of the pointer to use, and the length of this data, I will be extremely thankful.
Thirdly,
I want to know what the frame size is in the frequency domain (before Huffman encoding).
I am dumping 1024 samples of *pSpectralCoefficient. Is that correct?
Is it possible that some frames are 1024 in length and others are a set of 8 frames with 128 frequency bins each? If so, is there a flag that gives me this information?
Thank you for your time. I would appreciate your help with this as soon as possible.
Regards,
Akshay
To pull that specific data out of the bitstream, you will need to step through the decoder and find the desired pieces of the stream. To do that, you need the AAC bitstream specification. The current AAC specification is:
ISO/IEC 14496-3:2009 "Information technology -- Coding of audio-visual objects -- Part 3: Audio"
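On the question of 1024 versus 8 x 128 coefficients: in the specification, each channel's ics_info() carries a 2-bit window_sequence field that distinguishes long blocks from a group of eight short blocks. The sketch below only illustrates that mapping for a 1024-sample frame length. The Python names are made up for this example and are not FDK identifiers; where the field is stored inside the FDK decoder has to be located by stepping through the code.

```python
# window_sequence values as defined in the AAC bitstream syntax.
WINDOW_SEQUENCES = {
    0: "ONLY_LONG_SEQUENCE",
    1: "LONG_START_SEQUENCE",
    2: "EIGHT_SHORT_SEQUENCE",
    3: "LONG_STOP_SEQUENCE",
}

def spectral_layout(window_sequence, frame_length=1024):
    """Return (number of windows, coefficients per window) for one channel frame."""
    if window_sequence not in WINDOW_SEQUENCES:
        raise ValueError(f"invalid window_sequence: {window_sequence}")
    if WINDOW_SEQUENCES[window_sequence] == "EIGHT_SHORT_SEQUENCE":
        return 8, frame_length // 8  # eight short windows of 128 coefficients
    return 1, frame_length           # one long window of 1024 coefficients

long_layout = spectral_layout(0)
short_layout = spectral_layout(2)
```

Either way, one frame holds 1024 coefficients per channel in total, so dumping 1024 values covers both cases. Only the interpretation of the buffer changes.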
