Python and ffmpeg create different tiff stacks - image-processing

Hello everybody out there with an interest in image processing,
Creating a multipage tiff file (tiff stack) out of a grayscale movie can be achieved without programming using ffmpeg and tiffcp (the latter being part of Debian's libtiff-tools):
ffmpeg -i movie.avi frame%03d.tif
tiffcp frame*.tif stack.tif
Programming it in Python also seemed to be feasible to me using the OpenCV and tifffile libraries:
import numpy as np
import cv2
import tifffile

cap = cv2.VideoCapture('movie.avi')
success, frame = cap.read()
image = np.zeros((300, 400, 500), 'uint8')  # pre-allocate space for 300 frames
i = 0
while success:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    image[i, :, :] = gray[80:480, 0:500]
    i += 1  # advance to the next slice; without this every frame overwrites slice 0
    success, frame = cap.read()
cap.release()
tifffile.imsave('image.tif', image, photometric='minisblack')
However, the resulting files differ in size. Looking at the histogram of the Python output, I realized that it differs from the ffmpeg output.
Thanks to the answer below, I compared the output files with the file utility:
user@ubuntu:~$ file ffmpeg.tif tifffile.tif
ffmpeg.tif: TIFF image data, little-endian
tifffile.tif: TIFF image data, little-endian, direntries=17, height=400, bps=8, compression=none, PhotometricIntepretation=BlackIsZero, description={"shape": [300, 400, 500]}, width=500
In addition, I compared the files with ffmpeg:
user@ubuntu:~$ ffmpeg -i ffmpeg.tif -i tifffile.tif
[tiff_pipe @ 0x556cfec95d80] Stream #0: not enough frames to estimate rate; consider increasing probesize
Input #0, tiff_pipe, from 'ffmpeg.tif':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: tiff, gray, 500x400 [SAR 1:1 DAR 5:4], 25 tbr, 25 tbn, 25 tbc
[tiff_pipe @ 0x556cfeca6b40] Stream #0: not enough frames to estimate rate; consider increasing probesize
Input #1, tiff_pipe, from 'tifffile.tif':
Duration: N/A, bitrate: N/A
Stream #1:0: Video: tiff, gray, 500x400 [SAR 1:1 DAR 5:4], 25 tbr, 25 tbn, 25 tbc
Which additional diagnostics could I use in order to pin down the problem?

compression algorithm
By default ffmpeg uses the packbits compression algorithm for TIFF output. This can be changed with the -compression_algo option; the other accepted values are raw, lzw, and deflate:
ffmpeg -i input.avi -compression_algo lzw output_%04d.tif
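For an apples-to-apples comparison, note that the tiffcp side of the pipeline can also be pinned to a specific compression with its -c option (none, lzw, zip, packbits), e.g.:
tiffcp -c lzw frame*.tif stack.tif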
pixel format
Another difference may be caused by the pixel format (color space and chroma subsampling). See ffmpeg -h encoder=tiff for a list of supported pixel formats.
Which pixel format gets used depends on your input, and the log/console output will indicate the selected pixel format.
comparing outputs
I don't know what defaults are used by tifffile, but you can run ffmpeg -i ffmpeg.tif -i tifffile.tif and file ffmpeg.tif tifffile.tif to view details which may explain the discrepancy.
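To dig a little deeper than file, you could also dump the relevant TIFF tags from Python with tifffile itself; a small sketch (file names as in the question):
import tifffile

for name in ('ffmpeg.tif', 'tifffile.tif'):
    with tifffile.TiffFile(name) as tif:
        print(name, '-', len(tif.pages), 'pages')
        page = tif.pages[0]
        print('  shape:', page.shape, 'dtype:', page.dtype)
        print('  compression:', page.compression)
        print('  photometric:', page.photometric)
Comparing the compression fields of the two files should show immediately whether packbits-vs-raw accounts for the size difference.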

Related

How to extract a fixed set of frames from a live video stream for machine learning prediction in PyTorch?

I recently created a Video Swin Transformer model that takes in a ([batch_size], 3, 32, 224, 224) [batch_size, channel, temporal_dim, height, width] tensor of video and outputs logits. The goal is to have the model predict on a live stream from a camera. Is there any way to repeatedly capture a fixed sequence of 32 frames and have the model predict on the live stream? If prediction takes longer than 32 frames, can I stretch the frames out over a longer period, like a minute? Thanks.
You can try my library ffmpegio, which should suit your need:
To install:
pip install ffmpegio
To get blocks of 32 frames from your input URL:
import ffmpegio

url = 'input stream url'
temporal_dim = 32
height = 224
width = 224
size = [width, height]
pix_fmt = 'rgb24'

with ffmpegio.open(url, 'rv', blocksize=temporal_dim, s=size, pix_fmt=pix_fmt) as stream:
    for frames in stream:  # frames is a [time, height, width, ch] ndarray
        vswim_in = frames.transpose(3, 0, 1, 2)  # reorder to [ch, time, height, width]
You can specify any other ffmpeg options you wish (such as a scaling/cropping filter to make the input frames 224 px square, or input stream options).
Caveat: I haven't tested live stream buffering extensively. If you encounter any issues, please open an issue on GitHub.
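From there, a minimal sketch of feeding each block into the model might look like this (model and the 1/255 scaling are placeholders; use whatever preprocessing the network was trained with):
import torch

# vswim_in is the [3, 32, 224, 224] uint8 array from the loop above
clip = torch.from_numpy(vswim_in.copy()).float().div(255.0)  # placeholder normalization
clip = clip.unsqueeze(0)  # add batch dim -> [1, 3, 32, 224, 224]
with torch.no_grad():
    logits = model(clip)  # 'model' is your trained Video Swin Transformer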

Pass ffmpeg Stream to OpenCV

I would like to use the redirection operator to pipe the stream from ffmpeg into cv2, so that I can detect and mark faces in the stream and then redirect it onward as a second stream:
one withoutfacedetect and one withfacedetect.
raspivid -w 1920 -h 1080 -fps 30 -o - -t 0 -vf -hf -b 6000000 | ffmpeg -f h264 -i - -vcodec copy -g 50 -strict experimental -f tee -map 0:v "[f=flv]rtmp://xx.xx.xx.xx/live/withoutfacedetect |[f=h264]pipe:1" > test.mp4
I then read up on cv2 and came across this article:
https://www.bogotobogo.com/python/OpenCV_Python/python_opencv3_Image_Object_Detection_Face_Detection_Haar_Cascade_Classifiers.php
I then ran the script with my picture and was very amazed that there was a square around my face.
But now back to business. What is the best way to do this?
Thanks to @Mark Setchell. I forgot to mention that I'm using a Raspberry Pi 4.
I'm still not 100% certain what you are really trying to do, and have more thoughts than I can express in a comment. I have not tried all of what I think you are trying to do, and I may be over-thinking it, but if I put down my thought-train, maybe others will add in some helpful thoughts/corrections...
Ok, the video stream comes from the camera into the Raspberry Pi initially as RGB or YUV. It seems silly to use ffmpeg to encode that to h264 and pass it to OpenCV on its stdin when, AFAIK, OpenCV cannot easily decode it back into BGR or anything it naturally likes to do face detection with.
So, I think I would alter the parameters to raspivid so that it generates RGB data-frames, and remove all the h264 bitrate stuff i.e.
raspivid -rf rgb -w 1920 -h 1080 -fps 30 -o - | ffmpeg ...
Now we have RGB coming into ffmpeg, so you need to use tee and map similar to what you have already and send RGB to OpenCV on its stdin and h264-encode the second stream to rtmp as you already have.
Then in OpenCV, you just need to read() 1920x1080x3 bytes from stdin to get each frame. The frame will be in RGB order, but you can use:
cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
to re-order the channels to BGR as OpenCV expects.
When you read the data from stdin you need to do:
frame = sys.stdin.buffer.read(1920*1080*3)
rather than:
frame = sys.stdin.read(1920*1080*3)
which mangles binary data such as images.
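Putting the pieces together, the OpenCV side might look roughly like this (a sketch, assuming the 1920x1080 RGB frames from the raspivid line above):
import sys
import numpy as np
import cv2

W, H = 1920, 1080
FRAME_BYTES = W * H * 3

while True:
    raw = sys.stdin.buffer.read(FRAME_BYTES)
    if len(raw) < FRAME_BYTES:
        break  # stream ended or short read
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(H, W, 3)
    bgr = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    # ... run the Haar-cascade face detection from the linked article on bgr here ...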

How to let FFMPEG fetch frames from OpenCV and stream them to HTTP server

There is a camera that shoots at 20 frames per second; each frame is 4000x3000 pixels.
The frames are sent to a piece of software containing OpenCV. OpenCV resizes the frames to 1920x1080, and they must then be sent to FFmpeg to be encoded to H264 or H265 using Nvidia NVENC.
The encoded video is then streamed over HTTP to a maximum of 10 devices.
The infrastructure is crazy good (10 Gb LAN) with state-of-the-art switches, routers, etc.
Right now, I can get 90 FPS when encoding the images from an NVMe SSD, so the required encoding speed is achievable.
The question is: how do I get the images from OpenCV to FFmpeg?
The stream will be watched on a web app built with the MERN stack (assuming that this is relevant).
For cv::Mat you have cv::VideoWriter. If you wish to use FFmpeg directly, then, assuming the Mat is continuous, which can be enforced:
if (!mat.isContinuous())
{
    mat = mat.clone();
}
you can feed mat.data into sws_scale. Note that sws_scale expects arrays of plane pointers and strides, so the single BGR plane needs a one-element wrapper:
const uint8_t* srcSlice[] = { mat.data };
const int srcStride[] = { static_cast<int>(mat.step) };
sws_scale(videoSampler, srcSlice, srcStride, 0, mat.rows, videoFrame->data, videoFrame->linesize);
or directly into AVFrame
For cv::cuda::GpuMat, a VideoWriter implementation is not available, but you can use the NVIDIA Video Codec SDK and similarly feed cv::cuda::GpuMat::data into NvEncoderCuda; just make sure your GpuMat has 4 channels (BGRA):
NV_ENC_BUFFER_FORMAT eFormat = NV_ENC_BUFFER_FORMAT_ABGR;
std::unique_ptr<NvEncoderCuda> pEnc(new NvEncoderCuda(cuContext, nWidth, nHeight, eFormat));
...
cv::cuda::cvtColor(srcIn, srcIn, cv::ColorConversionCodes::COLOR_BGR2BGRA);
NvEncoderCuda::CopyToDeviceFrame(cuContext, srcIn.data, (uint32_t)srcIn.step,
                                 (CUdeviceptr)encoderInputFrame->inputPtr,
                                 (int)encoderInputFrame->pitch,
                                 pEnc->GetEncodeWidth(),
                                 pEnc->GetEncodeHeight(),
                                 CU_MEMORYTYPE_DEVICE,  // GpuMat::data lives in device memory
                                 encoderInputFrame->bufferFormat,
                                 encoderInputFrame->chromaOffsets,
                                 encoderInputFrame->numChromaPlanes);
Here's my complete sample of using GpuMat with NVIDIA Video Codec SDK
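As an alternative to the libav*/SDK route above, if driving the ffmpeg executable is acceptable, a common pattern is to pipe raw BGR frames over stdin; a rough Python sketch (the output URL, encoder settings, and camera source are placeholders):
import subprocess
import cv2

W, H, FPS = 1920, 1080, 20
cmd = [
    'ffmpeg',
    '-f', 'rawvideo', '-pix_fmt', 'bgr24',
    '-s', f'{W}x{H}', '-r', str(FPS), '-i', '-',  # raw frames arrive on stdin
    '-c:v', 'h264_nvenc',                         # NVENC hardware encoder
    '-listen', '1', '-f', 'mpegts',
    'http://0.0.0.0:8080/stream',                 # placeholder URL
]
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)

cap = cv2.VideoCapture(0)  # placeholder camera source
while True:
    ok, frame = cap.read()
    if not ok:
        break
    proc.stdin.write(cv2.resize(frame, (W, H)).tobytes())
For fan-out to ten devices you would typically point this at a proper media server rather than serving directly from ffmpeg's single-client HTTP listener.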

Video4Linux Y12 Pixel Format for use with OpenCV issues

I have an AMG88xx infrared camera attached to a Raspberry Pi 4.
I am using the Linux video-i2c driver, and the driver appears to work correctly:
v4l2-ctl -d /dev/video0 --all
Driver Info:
Driver name : video-i2c
Card type : I2C 1-104 Transport Video
Bus info : I2C:1-104
Driver version : 4.19.102
Capabilities : 0x85200001
Video Capture
Read/Write
Streaming
Extended Pix Format
Device Capabilities
Device Caps : 0x05200001
Video Capture
Read/Write
Streaming
Extended Pix Format
Priority: 2
Video input : 0 (Camera: ok)
Format Video Capture:
Width/Height : 8/8
Pixel Format : 'Y12 ' (12-bit Greyscale)
Field : None
Bytes per Line : 16
Size Image : 128
Colorspace : Raw
Transfer Function : Default (maps to None)
YCbCr/HSV Encoding: Default (maps to ITU-R 601)
Quantization : Default (maps to Full Range)
Flags :
Streaming Parameters Video Capture:
Capabilities : timeperframe
Frames per second: 10.000 (10/1)
Read buffers : 1
However, the output pixel format (Y12) appears to be unsupported by OpenCV:
>>> import cv2
>>> capture = cv2.VideoCapture(0)
VIDEOIO ERROR: V4L2: Pixel format of incoming image is unsupported by OpenCV
VIDEOIO ERROR: V4L: can't open camera by index 0
Do I need to build OpenCV with additional support, or somehow convert the pixel format?
You don't need OpenCV and cv2.VideoCapture() to read that camera. It is just a relatively slow I2C device that you can read directly or using the Adafruit library as in this example.
By all means, you could read it as above, convert from 12-bit to an 8-bit or 16-bit Numpy array, and process with OpenCV afterwards, but it is not necessary.
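For example, a sketch along those lines using the Adafruit CircuitPython driver (adafruit-circuitpython-amg88xx; the scaling and colormap are just for visualisation):
import numpy as np
import cv2
import board
import busio
import adafruit_amg88xx

i2c = busio.I2C(board.SCL, board.SDA)
sensor = adafruit_amg88xx.AMG88XX(i2c)

raw = np.array(sensor.pixels, dtype=np.float32)  # 8x8 grid of temperatures in deg C
img8 = cv2.normalize(raw, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
big = cv2.resize(img8, (320, 320), interpolation=cv2.INTER_CUBIC)
cv2.imwrite('thermal.png', cv2.applyColorMap(big, cv2.COLORMAP_JET))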
Alternatively, you could embed a subprocess call to ffmpeg like I did in the second part of this answer.
The issue was related to a missing pixel format in OpenCV (see issue #16620), fixed by #16626.
Found by comparing the Video4Linux pixel formats with those supported by OpenCV in modules/videoio/src/cap_v4l.cpp.

ImageMagick's Stream can't read TIFF64?

I am trying to extract a subregion of a large BigTIFF image (TIFF64). If the images are not too big, I can just run convert src.tif dst.jpg. If the images are really big, though, convert doesn't work. I was trying to use stream to extract the region of interest without loading the complete image into memory. However, the result is a 0-byte file. I uploaded one of my BigTIFFs here:
https://mfr.osf.io/render?url=https://osf.io/kgeqs/?action=download%26mode=render
This one is small enough to work with convert, and it produces the 0 byte image with stream:
stream -map rgb -storage-type char '20-07-2017_RecognizedCode-10685.tif[1000x1000+10000+10000]' 1k-crop.dat
Is there a way of getting stream to work? Is this a come-back of this old bug in stream with TIFF64? http://imagemagick.org/discourse-server/viewtopic.php?t=22046
I am using ImageMagick 6.9.2-4 Q16 x86_64 2016-03-17
I can't download your image to do any tests, but you could consider using vips which is very fast and frugal with memory, especially for large images - which I presume yours are, else you would probably not use BigTIFF.
So, if we make a large 10,000 x 10,000 TIF with ImageMagick for testing:
convert -size 10000x10000 gradient:cyan-magenta -compress lzw test.tif
You could extract the top-left corner with vips like this, and also show the maximum memory usage (with --vips-leak):
vips crop test.tif a.jpg 0 0 100 100 --vips-leak
Output
memory: high-water mark 5.76 MB
And you could extract the bottom-right corner like this:
vips crop test.tif a.jpg 9000 9000 1000 1000 --vips-leak
Output
memory: high-water mark 517.01 MB
Using ImageMagick, that same operation requires 1.2GB of RAM:
/usr/bin/time -l convert test.tif -crop 1000x1000+9000+9000 a.jpg
2.46 real 2.00 user 0.45 sys
1216008192 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
298598 page reclaims
I agree with Mark's excellent answer, but just wanted to also say that the TIFF format you use can make a big difference.
Regular strip TIFFs don't really support random access, but tiled TIFFs do. For example, here's a 10k x 10k pixel strip TIFF:
$ vips copy wtc.jpg wtc.tif
$ time vips crop wtc.tif x.tif 8000 8000 100 100 --vips-leak
real 0m0.323s
user 0m0.083s
sys 0m0.185s
memory: high-water mark 230.80 MB
Here the TIFF reader has to scan almost the whole image to get to the bit it needs, causing relatively high memory use.
If you try again with a tiled image:
$ vips copy wtc.jpg wtc.tif[tile]
$ time vips crop wtc.tif x.tif 8000 8000 100 100 --vips-leak
real 0m0.032s
user 0m0.017s
sys 0m0.014s
memory: high-water mark 254.39 KB
Now it can just seek and read out the part it needs.
You may not have control over the details of the image format, of course, but if you do, you'll find that for this kind of operation tiled images are dramatically faster and need much less memory.
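If you would rather do this from Python than the vips command line, the same crop is available through the pyvips binding; a small sketch (file names as in the tiled example above):
import pyvips

image = pyvips.Image.new_from_file('wtc.tif', access='random')  # let the loader seek
tile = image.crop(8000, 8000, 100, 100)  # left, top, width, height
tile.write_to_file('x.tif')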
