Change in NAL_REF_IDC value for P frames in x264 encoding

After changing the nal_ref_idc value from 2 to 0 for all P frames in x264 encoding, I observed that the size of the P frames changes. Is this because all P frames now reference only the single I frame in the GOP?
However, the size of the first P frame after the I frame should be independent of the two situations (direct encoding with nal_ref_idc = 0, or cumulative encoding with nal_ref_idc = 2), yet I see a change in the size of that first P frame as well. Can anyone explain the possible reason behind this?
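For reference, nal_ref_idc occupies the two bits after the forbidden bit in every NAL unit header, so it is easy to check what x264 actually wrote for each slice. A minimal sketch (assuming a raw Annex B .264 file with byte-aligned start codes; the file path handling and helper name are illustrative):

import sys

def iter_nal_headers(data):
    # scan for 00 00 01 start codes; emulation prevention guarantees the
    # pattern cannot occur inside a NAL payload
    i = 0
    while i < len(data) - 3:
        if data[i] == 0 and data[i + 1] == 0 and data[i + 2] == 1:
            header = data[i + 3]
            # header byte: forbidden_zero_bit (1) | nal_ref_idc (2) | nal_unit_type (5)
            yield (header >> 5) & 0x3, header & 0x1F
            i += 3
        else:
            i += 1

with open(sys.argv[1], 'rb') as f:
    for ref_idc, nal_type in iter_nal_headers(f.read()):
        # nal_unit_type 1 = non-IDR slice (P/B), 5 = IDR slice
        print('nal_ref_idc=%d nal_unit_type=%d' % (ref_idc, nal_type))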

Related

OpenCV Encoding to H264 changing original RGB pixel values for gray images

I have the following issue:
I'm creating a uniform gray color video (for testing) using OpenCV VideoWriter. The output video should reproduce a constant image where all pixels have the same value x (25, 51, 76, and so on).
When I generate the video using MJPG Encoder:
vw = cv2.VideoWriter('./videos/input/gray1.mp4',
                     cv2.VideoWriter_fourcc(*'MJPG'),
                     fps, (resolution[1], resolution[0]))
and read the output using the VideoCapture class, everything works fine. I get a frame array with all pixel values set to (25, 51, 76, and so on).
However, when I generate the video using HEV1 (H.265) or H264:
vw = cv2.VideoWriter('./videos/input/gray1.mp4',
                     cv2.VideoWriter_fourcc(*'HEV1'),
                     fps, (resolution[1], resolution[0]))
I run into the following issue. The frame I get back in BGR format has the following configuration:
The blue channel value is the expected value (x) minus 4 (25-4=21, 51-4=47, 76-4=72, and so on).
The green channel is the expected value (x) minus 1 (25-1=24, 51-1=50, 76-1=75).
The red channel is the expected value (x) minus 3 (25-3=22, 51-3=48, 76-3=73).
Notice that each value is reduced by a constant amount of 4, 1, or 3, independently of the pixel value (so the effect is constant).
An offset that depends on the pixel value I could have explained, but not a fixed one.
What is worse, if I generate a video whose frames consist of each pure color (pixel values [255 0 0], [0 255 0] and [0 0 255]), I get the corresponding output values ([251 0 0], [0 254 0] and [0 0 252]).
I thought this might be related to the grayscale Y value, where:
Y = 76/256 * RED + 150/256 * GREEN + 29/256 * BLUE
But these coefficients do not match the output obtained. Maybe the problem is in the reading with VideoCapture?
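One sanity check on those coefficients: for a gray pixel with R = G = B = x the quoted formula gives Y = x * 255/256, i.e. almost exactly x, so by itself it cannot produce three different per-channel offsets. Reproducing the arithmetic in plain Python with the test values from above:

for x in (25, 51, 76):
    y = (76 * x + 150 * x + 29 * x) / 256  # = x * 255/256
    print(x, round(y, 2))  # 25 -> 24.9, 51 -> 50.8, 76 -> 75.7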
EDIT:
If I want the pixels to come back with the same value in all three channels (e.g. [10, 10, 10]), experimentally I have to create an image where the red and blue channels have the green channel value plus 2:
import numpy as np

value = 10
img = np.zeros((resolution[0], resolution[1], 3), dtype=np.uint8) + value
img[:, :, 2] = img[:, :, 2] + 2  # red channel (BGR order)
img[:, :, 1] = img[:, :, 1] + 0  # green channel unchanged
img[:, :, 0] = img[:, :, 0] + 2  # blue channel
Has anyone experienced this issue? Is it related to the encoding process itself, or does OpenCV treat the image differently before encoding depending on the fourcc parameter value?
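One way to narrow this down (a diagnostic sketch, not a fix): run the same frame through OpenCV's BGR/YCrCb round trip with no encoder in the loop. If the conversion alone reproduces the per-channel offsets, the loss comes from colorspace conversion and rounding rather than from the H.264/H.265 compression itself. The resolution value here is an assumption:

import cv2
import numpy as np

resolution = (480, 640)  # assumed (height, width)

for value in (25, 51, 76):
    img = np.full((resolution[0], resolution[1], 3), value, dtype=np.uint8)
    # BGR -> YCrCb -> BGR, no encoder involved
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    back = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    print(value, back[0, 0])  # recovered [B G R] at one pixel

Note that this only exercises the full-range conversion; the encoders typically also subsample to 4:2:0 and may use the limited (16-235) range, which is another place constant offsets could come from.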

How can I encode a real-time video from dynamically spaced frames?

I'm trying to create a video from a series of screenshots. The screenshots are in a database and were captured at a varying rate (1-3 FPS). How can I create a video file with a constant FPS?
Before performing av_packet_rescale_ts I tried to change the st^.codec^.time_base.den value on the fly between 1 and 3.
This is the basic loop for encoding one picture:
repeat
  fillchar(pkt, sizeof(TAVPacket), #0);
  av_init_packet(@pkt);
  (* encode the image *)
  ret := avcodec_encode_video2(st^.codec, @pkt, frame, got_packet);
  if (ret < 0) then
  begin
    writeln(format('Error encoding video frame: %s', [av_err2str(ret)]));
    exit;
  end;
  if (got_packet > 0) then
  begin
    (* rescale output packet timestamp values from codec to stream timebase *)
    av_packet_rescale_ts(@pkt, st^.codec^.time_base, st^.time_base);
    pkt.stream_index := st^.index;
    log_packet(oc, @pkt);
    (* Write the compressed frame to the media file. *)
    av_interleaved_write_frame(oc, @pkt);
  end;
  inc(frame^.pts);
until (av_compare_ts(frame^.pts, st^.codec^.time_base, 1, av_make_q(1, 1)) >= 0);
Changing the FPS on the fly causes the video output to fail. If I don't change the st^.codec^.time_base.den value, the video speeds up and slows down.
There is no notion of a dynamic timebase in ffmpeg, so changing it during encoding is forbidden. But you are free to set the PTS of your frames before encoding to anything monotonically increasing.
You are not showing how you set the PTS in your example code. If you want a constant framerate, just ignore the timestamps from your database, count the frames, and calculate a PTS from the frame number (this is probably what ffmpeg does when you don't give it any PTS). A sketch of both options follows below.
If your frames were recorded at a varying framerate but you didn't record any timestamps for them, you cannot recover a smooth-looking video anymore.
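Either way, the key step is the PTS arithmetic, not the timebase. A small sketch of both options (plain Python just to show the arithmetic; the 1/90000 timebase and the timestamp list are assumptions):

from fractions import Fraction

time_base = Fraction(1, 90000)  # assumed stream timebase
fps = 25                        # assumed constant output rate

# Option 1: constant framerate -- ignore recorded times, use the frame index.
def pts_constant(frame_index):
    return round(frame_index * Fraction(1, fps) / time_base)

# Option 2: keep the recorded pacing -- derive PTS from the capture times.
def pts_from_timestamp(t, t0):
    return round(Fraction(t - t0) / time_base)

timestamps = [0.0, 0.4, 1.1, 1.5]  # assumed capture times from the database
print([pts_constant(i) for i in range(4)])                         # [0, 3600, 7200, 10800]
print([pts_from_timestamp(t, timestamps[0]) for t in timestamps])  # [0, 36000, 99000, 135000]

In both cases the resulting values are monotonically increasing, which is all the encoder requires.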

How to calculate the output size of a convolutional layer in YOLO?

This is the architecture of YOLO. I am trying to calculate the output size of each layer myself, but I can't reproduce the sizes described in the paper.
For example, the first Conv Layer has a 448x448 input and uses a 7x7 filter with stride 2, but according to the equation W2 = (W1 - F + 2P)/S + 1 = (448 - 7 + 0)/2 + 1 I don't get an integer result, so the filter size seems unsuitable for the input size.
Can anyone explain this problem? Did I miss something or misunderstand the YOLO architecture?
As Hawx Won said, the input image gets 3 extra pixels of padding, and here is how that works in the source code.
For convolutional layers, if pad is enabled, the padding of each layer is calculated by:
// In parser.c
if(pad) padding = size/2;
// In convolutional_layer.c
l.pad = padding;
where size is the filter size. So, for the first layer: padding = size/2 = 7/2 = 3 (integer division).
Then the output of the first convolutional layer is (again with integer division):
output_w = (input_w + 2*pad - size)/stride + 1 = (448 + 6 - 7)/2 + 1 = 224
output_h = (input_h + 2*pad - size)/stride + 1 = (448 + 6 - 7)/2 + 1 = 224
Well, I spent some time studying the source code and learned that the input image gets 3 extra pixels of padding on the top, bottom, left, and right sides, so the image size becomes 448 + 2*3 = 454. The output size with valid padding is then calculated as:
output_size = ceil((W - F + 1)/S) = ceil((454 - 7 + 1)/2) = 224, therefore the output size is 224x224x64.
I hope this is helpful.
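Both answers boil down to the same integer arithmetic; here is a small helper that reproduces darknet's rule (the layer parameters are the ones from the question):

def conv_output(w, size, stride, pad_enabled=True):
    pad = size // 2 if pad_enabled else 0  # darknet: if(pad) padding = size/2;
    return (w + 2 * pad - size) // stride + 1

# first YOLO conv layer: 448x448 input, 7x7 filter, stride 2
print(conv_output(448, 7, 2))  # 224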

JPEG2000: Can number of tiles in X direction be zero?

According to the JPEG 2000 specs, the number of tiles in the X and Y directions is calculated by the following formulas:
numXtiles = ceil((Xsiz - XTOsiz) / XTsiz)
and
numYtiles = ceil((Ysiz - YTOsiz) / YTsiz)
But nothing is said about the range of numXtiles or numYtiles.
Can we have numXtiles = 0 while numYtiles = 250 (or any other value)?
In short, no. You will always need at least one row and one column of tiles to place your image on the canvas.
In particular, the SIZ marker of the JPEG 2000 codestream syntax does not directly define the number of tiles, but rather the size of each tile. Since the tile width and height are defined to be larger than 0 (see page 453 of "JPEG 2000 Image Compression Fundamentals, Standards and Practice" by David Taubman and Michael Marcellin), you will always have at least one tile.
That said, depending on the particular implementation that you are using, there may be a parameter numXtiles that you can set to 0 without crashing your program. In that case, the parameter is most likely being ignored or interpreted differently.
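For completeness, the ceiling in the formula makes the "at least one tile" property easy to see: whenever Xsiz > XTOsiz and XTsiz >= 1 the result is at least 1. A quick sketch with made-up canvas numbers:

import math

def num_tiles(siz, t_osiz, t_siz):
    # numXtiles = ceil((Xsiz - XTOsiz) / XTsiz); same formula for Y
    return math.ceil((siz - t_osiz) / t_siz)

print(num_tiles(1024, 0, 256))  # 4 full tiles across
print(num_tiles(1000, 0, 256))  # 4 (the last tile is partial)
print(num_tiles(100, 0, 256))   # 1 -- never 0 for a non-empty image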

Buffer data in Simulink in continuous time

I need to buffer some signals for a fixed duration for use within the simulation. The Buffer block in Simulink requires the frame rate to be known. However, I am using a continuous-time solver (with a defined maximum step size), so I don't know what value to use for the buffer size. There does not seem to be any option to trigger based on time. Can someone suggest how this can be done?
A simple buffer, made using a MATLAB Function Block, that always has the most recent element at the top, would be the following. Note the persistent variable: without it the buffer would be re-initialized to zeros on every call and nothing would accumulate.
function y = buffer(x)
% the buffer contents must survive between calls
persistent buf
if isempty(buf)
    % initialize the buffer on the first call
    buf = zeros(100,1);
end
% shuffle the elements down
buf(2:end) = buf(1:end-1);
% add the new element
buf(1) = x;
y = buf;
