I have a .webm recording of a game, nominally at 16 fps. However, when I try to process the video with OpenCV, it turns out the video was recorded with a variable frame rate, so when I try to use OpenCV to get one frame every second by grabbing every 16th frame, it fails: the video stream ends prematurely.
Therefore, I'm trying to convert this variable-frame-rate .webm video, which claims a frame rate of 16 fps, to a constant-frame-rate video, so I can extract one frame for every second. I've tried the following ffmpeg command from https://ffmpeg.zeranoe.com/forum/viewtopic.php?t=5518:
ffmpeg -i input.webm -c:v copy -b:v copy -r 16 output.webm
However, the following error occurs:
[NULL @ 00000272ccbc0c40] [Eval @ 000000bc11bfe2f0] Undefined constant or missing '(' in 'copy'
[NULL @ 00000272ccbc0c40] Unable to parse option value "copy"
[NULL @ 00000272ccbc0c40] Error setting option b to value copy.
Error setting up codec context options.
Here's the code I'm using to process one frame per second:
import cv2

video = cv2.VideoCapture(test_mp4_vod_path)
print("Opened ", test_mp4_vod_path)
print("Processing MP4 frame by frame")

# Seek to the frame you want to start reading from:
# manually set this to fps * start time in seconds.
video.set(cv2.CAP_PROP_POS_FRAMES, 0)
success, frame = video.read()
# fps = int(video.get(cv2.CAP_PROP_FPS))  # this will return 0!
fps = 16  # hardcode fps
total_frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
print("Loading video %d seconds long with FPS %d and total frame count %d" %
      (total_frame_count / fps, fps, total_frame_count))

count = 1
while video.isOpened():
    success, frame = video.read()
    if not success:
        break
    if count % fps == 0:
        print("%dth frame is %d seconds on video" % (count, count / fps))
    count += 1
The code will finish before it gets near the end of the video, since the video isn't at a constant FPS.
How can I convert a variable-FPS video to a constant FPS video?
For WebM options in FFmpeg, read https://trac.ffmpeg.org/wiki/Encode/VP9.
Don't use a codec copy option when converting frame rates.
Possible solution (the 2M is a testing value; adjust for your video):
ffmpeg -i input.webm -c:v libvpx-vp9 -minrate 2M -maxrate 2M -b:v 2M -pix_fmt yuv420p -r 16 output.webm
First of all, the other answer from VC.One is very much the answer you need. It is not an exact answer to your question, however.
Your command has a small mistake, which is why the error is thrown. -b:v tells ffmpeg to set the video bitrate to a given value, and in your command you set it to copy. That isn't a valid value for this option: the bitrate options expect a number, possibly with an order of magnitude, like 320k or 320000.
Either the intention was to copy the audio codec, in which case it should be -c:a copy, or the intention was to copy the video bitrate. For the latter, just remove the parameter altogether; -c:v copy produces an exact copy of the (selected part of the) video stream, which includes the bitrate, frame count, frame rate and timestamps, as well as all other video data.
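For example (my illustration, not from the original post), the corrected copy variant, keeping both streams untouched, would be:
ffmpeg -i input.webm -c:v copy -c:a copy output.webm
Note that a stream copy cannot change the frame rate; to get a true constant-frame-rate output you must re-encode, as in the other answer.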
To set up the output to have the same video bitrate as the input without copying, use ffprobe to check the stream's bitrate first.
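For instance, a check with ffprobe could look like this (WebM files often store no per-stream bitrate, in which case this prints N/A):
ffprobe -v error -select_streams v:0 -show_entries stream=bit_rate -of default=noprint_wrappers=1:nokey=1 input.webm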
I'm trying out the sample applications in the Nvidia Video Codec SDK, and am having trouble getting a usable decoded result.
My input file is YUV 4:2:0, taken from here, which is 352x288px.
I'm encoding using the AppEncD3D12.exe sample, with the following command:
.\AppEncD3D12.exe -i D:\akiyo_cif.y4m -s 352x288 -o D:\akiyo_out.mp4
This gives the following output:
GPU in use: NVIDIA GeForce RTX 2080 Super with Max-Q Design
[INFO ][17:46:39] Encoding Parameters:
codec : h264
preset : p3
tuningInfo : hq
profile : (default)
chroma : yuv420
bitdepth : 8
rc : vbr
fps : 30/1
gop : 250
bf : 1
multipass : 0
size : 352x288
bitrate : 0
maxbitrate : 0
vbvbufsize : 0
vbvinit : 0
aq : disabled
temporalaq : disabled
lookahead : disabled
cq : 0
qmin : P,B,I=0,0,0
qmax : P,B,I=0,0,0
initqp : P,B,I=0,0,0
Total frames encoded: 112
Saved in file D:\akiyo_out.mp4
This looks promising. However, using the decode sample, a single frame of the output contains what look like 12 smaller frames of the input, in monochrome.
I'm running the decode sample like this:
PS D:\Nvidia\Video_Codec_SDK_11.1.5\Samples\build\Debug> .\AppDecD3D.exe -i D:\akiyo_out.mp4
GPU in use: NVIDIA GeForce RTX 2080 Super with Max-Q Design
Display with D3D9.
[INFO ][17:58:58] Media format: raw H.264 video (h264)
Session Initialization Time: 23 ms
[INFO ][17:58:58] Video Input Information
Codec : AVC/H.264
Frame rate : 30000/1000 = 30 fps
Sequence : Progressive
Coded size : [352, 288]
Display area : [0, 0, 352, 288]
Chroma : YUV 420
Bit depth : 8
Video Decoding Params:
Num Surfaces : 7
Crop : [0, 0, 0, 0]
Resize : 352x288
Deinterlace : Weave
Total frame decoded: 112
Session Deinitialization Time: 8 ms
I'm quite new to this, so I could be doing something stupid. Right now I don't know whether to look at encode or decode! Any ideas or tips are most appreciated.
I've tried other YUV files with the same result. I read that 4:2:2 is not supported; the above is 4:2:0.
Using the AppEncCuda sample, the decoded video (played with AppDecD3D.exe) is the correct size and in colour, but the video appears to scroll to the right as it is played, with the colour information not scrolling at the same rate as the image.
You have two problems:
According to the code and remarks in the AppEncD3D12 sample, it expects the input frames to be in ARGB format, but your input file is YUV, so the sample reads data from the YUV file and treats it as ARGB. If you want AppEncD3D12 to work with this file, you need to either convert each YUV frame to ARGB or change the code to work with YUV input. The AppEncCuda sample expects YUV as input, and that is the reason it gives you better results. You can also see that AppEncD3D12 encoded a total of 112 frames while AppEncCuda encoded a total of 300 frames; this is because YUV frames are smaller than ARGB frames, so the same file holds more of them.
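As an illustration of the first fix (my suggestion, not part of the SDK docs), ffmpeg can dump the Y4M input to raw ARGB frames that the D3D12 sample could then read; whether the sample wants argb or bgra byte order is worth confirming in its source:
ffmpeg -i akiyo_cif.y4m -f rawvideo -pix_fmt argb akiyo_argb.raw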
The second problem is that both samples save the output as raw H.264; the file is not really an MP4, despite the name you gave it. There are a few players that can play a file of raw H.264 data, and you can try using one of them to play the output file. Another option is to use FFmpeg to create a valid MP4 file from the raw H.264 output. The NVIDIA encoder encodes the video, but it does not handle the creation of video container files (there are too many file types: avi, mpg, mp4, mkv, ts, etc.); you should use FFmpeg or another solution for that. The SDK samples include a file FFmpegStreamer.h under the Utils folder that shows how to use FFmpeg to output H.264 video in MPEG-2 transport stream format, either to a file (*.ts) or over the network.
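For example (my suggestion, not from the SDK), wrapping the raw H.264 output into a real MP4 container without re-encoding could look like this; -f h264 forces the raw-bitstream demuxer, since the .mp4 extension on the input is misleading:
ffmpeg -f h264 -framerate 30 -i akiyo_out.mp4 -c:v copy akiyo_real.mp4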
I have an RTP/RTSP stream that's running at 25fps, as verified by ffprobe -i <URI>. Also, VLC plays back the RTSP stream at a real-time rate, but doesn't show me the FPS in the Media Information window.
However, when I use OpenCV 4.1.1.26 to retrieve the input stream's frame rate, it is giving me a response of 90000.0.
Question: How can I use OpenCV to probe for the correct frame rate of the RTSP stream? What would cause it to report 90000.0 instead of 25?
Here's my Python function to retrieve the frame rate:
import cv2

vid: cv2.VideoCapture = cv2.VideoCapture('rtsp://192.168.1.10/cam1/mpeg4')

def get_framerate(video: cv2.VideoCapture):
    fps = video.get(cv2.CAP_PROP_FPS)
    print('FPS is {0}'.format(fps))

get_framerate(vid)
MacOS Catalina
Python 3.7.4
I hope this helps you somehow. It is a simple calculator that takes cont captures and measures the start and end times; then, with a rule of three, I convert that to fps.
Related to your second question, I read here that it could be due to a bad installation. Also, you can check that your camera is working properly by printing the ret variable: if it is True, you should be able to see the fps; if it is False, you can get an unpredictable result.
cv2.imshow() and key = cv2.waitKey(1) should be commented out, as they add delay and result in a bad measurement.
I'm posting this as an answer because I do not have enough reputation points to comment.
import cv2
from datetime import datetime

img = cv2.VideoCapture('rtsp://192.168.1.10/cam1/mpeg4')

cont = 0
start = datetime.now()
while True:
    if cont == 50:
        a = datetime.now() - start
        b = a.seconds * 1e6 + a.microseconds  # elapsed time in microseconds
        print("elapsed microseconds =", b, "fps =", (50 * 1e6) / b)
        break
    ret, frame = img.read()
    # Comment these two lines out for the best measurement:
    cv2.imshow('fer', frame)
    key = cv2.waitKey(1)
    if key == ord('q'):
        break
    cont += 1
img.release()
cv2.destroyAllWindows()
I want to save the images from a running video for the command ./darknet detector demo cfg/coco.data cfg/yolo.cfg yolo.weights <video_file>. So I changed the value of the -prefix parameter to "a" in examples/detector.c to save images instead of streaming the video:
Loading weights from yolo.weights...Done!
video file: /home/ubuntu/VID_20170602_164011.3gp
save_image_jpg(): a_00000000.jpg
FPS:0.0
Objects:
save_image_jpg(): a_00000001.jpg
FPS:0.0
Objects:
refrigerator: 26%
person: 40%
save_image_jpg(): a_00000002.jpg
FPS:0.0
Objects:
refrigerator: 36%
person: 88%
refrigerator: 26%
save_image_jpg(): a_00000003.jpg
But the jpg images are not saved on my system. I am using Ubuntu as the OS. Help is appreciated.
Changing the prefix value from 0 to "a" actually works; some file permissions just needed to be granted. Also, you can change the image directory path in the function save_image_jpg:
sprintf(buff, "your_img_path/%s.jpg", name);
Later, run the following command to convert the images into a video:
ffmpeg -framerate 25 -i your_img_path/a_%08d.jpg -c:v libx264 -profile:v high -crf 20 -pix_fmt yuv420p video_name.mp4
So I set
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FPS, 60)
I also tried the integer 5 instead of cv2.CAP_PROP_FPS. Nevertheless, the frame rate doesn't change. I still get 30 when I run
print(cap.get(cv2.CAP_PROP_FPS))
Why?
The problem may be with the codec of the camera stream rather than with the FPS itself. For example, if your camera only supports YUYV, it probably only delivers certain FPS values; try the app guvcview to check this in a GUI.
Try changing the codec to MJPG and then changing the FPS using CAP_PROP_FPS. I'm using a Logitech C922 Pro, and this works for me to configure 1080p and 30 fps; if you have another camera, you probably need to use a lower resolution to achieve 30 fps:
import cv2 as cv

def decode_fourcc(v):
    # Unpack the FOURCC integer into its four characters.
    v = int(v)
    return "".join([chr((v >> 8 * i) & 0xFF) for i in range(4)])

def setfourccmjpg(cap):
    oldfourcc = decode_fourcc(cap.get(cv.CAP_PROP_FOURCC))
    codec = cv.VideoWriter_fourcc(*'MJPG')
    res = cap.set(cv.CAP_PROP_FOURCC, codec)
    if res:
        print("codec in", decode_fourcc(cap.get(cv.CAP_PROP_FOURCC)))
    else:
        print("error, codec in", decode_fourcc(cap.get(cv.CAP_PROP_FOURCC)))

CAMERANUM = 0  # camera index on your system
cap = cv.VideoCapture(CAMERANUM)
setfourccmjpg(cap)

w = 1920
h = 1080
fps = 30
res1 = cap.set(cv.CAP_PROP_FRAME_WIDTH, w)
res2 = cap.set(cv.CAP_PROP_FRAME_HEIGHT, h)
res3 = cap.set(cv.CAP_PROP_FPS, fps)
then resume your normal video capture polling loop.
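A minimal sketch of such a polling loop (my illustration, continuing the variables above):
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv.imshow("preview", frame)
    if cv.waitKey(1) == ord('q'):
        break
cap.release()
cv.destroyAllWindows()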
Not all OpenCV parameters are supported by all cameras from an OpenCV standpoint. Each camera has a different set of parameters that can be set. You need to find out which parameters your camera supports.
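On Linux, for example, one way to list the pixel formats, resolutions and frame intervals a camera actually supports (assuming the v4l-utils package is installed) is:
v4l2-ctl -d /dev/video0 --list-formats-ext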
I know this question has been asked a hundred times... but I am getting frustrated with my results, so I wanted to ask again. Before I dive deep into FFT, I need to figure this simple task out.
I need to detect a 20 Hz tone in an audio file. I insert the 20 Hz tone myself, like in the picture. (It can be any frequency, as long as the listener can't hear it, so I thought I should choose a frequency around 20 Hz to 50 Hz.)
Info about the audio file:
afinfo 1.m4a
File: 1.m4a
File type ID: adts
Num Tracks: 1
----
Data format: 1 ch, 22050 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Mono
estimated duration: 8.634043 sec
audio bytes: 42416
audio packets: 219
bit rate: 33364 bits per second
packet size upper bound: 768
maximum packet size: 319
audio data file offset: 0
optimized
format list:
[ 0] format: 1 ch, 22050 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Mono
----
I followed these three tutorials and came up with working code that reads the audio buffer and gives me the FFT doubles:
http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
https://github.com/alexbw/iPhoneFFT
How do I obtain the frequencies of each value in an FFT?
I read the data as follows:
// If there's more packets, read them
inCompleteAQBuffer->mAudioDataByteSize = numBytes;
CheckError(AudioQueueEnqueueBuffer(inAQ,
                                   inCompleteAQBuffer,
                                   (sound->packetDescs ? nPackets : 0),
                                   sound->packetDescs),
           "couldn't enqueue buffer");
sound->packetPosition += nPackets;

int numFrequencies = 2048;
int kNumFFTWindows = 10;

SInt16 *testBuffer = (SInt16 *)inCompleteAQBuffer->mAudioData; // Read data from buffer...!
OouraFFT *myFFT = [[OouraFFT alloc] initForSignalsOfLength:numFrequencies*2 andNumWindows:kNumFFTWindows];
for (long i = 0; i < myFFT.dataLength; i++)
{
    myFFT.inputData[i] = (double)testBuffer[i];
}
[myFFT calculateWelchPeriodogramWithNewSignalSegment];
for (int i = 0; i < myFFT.dataLength/2; i++) {
    NSLog(@"the spectrum data %d is %f", i, myFFT.spectrumData[i]);
}
and my output log looks something like:
Everything checks out for 4096 samples of data
Set up all values, about to init window type 2
the spectrum data 0 is 42449.823771
the spectrum data 1 is 39561.024361
.
.
.
.
the spectrum data 2047 is -42859933071799162597786649755206634193030992632381393031503716729604050285238471034480950745056828418192654328314899253768124076782117157451993697900895932215179138987660717342012863875797337184571512678648234639360.000000
I know I am not calculating the magnitude yet, but how can I detect that the sound has a 20 Hz component in it? Do I need to learn the Goertzel algorithm?
There are many ways to convey information which gets inserted into, then retrieved from, some preexisting wave pattern. The information going in can vary things like the amplitude (amplitude modulation) or frequency (frequency modulation), etc. Do you have a strategy here? Note that the density of information you wish to convey can be influenced by such factors as the modulating frequency (higher frequencies naturally convey more information, as they can resolve changes more times per second).
Another approach is possible if both the sender and receiver have the source audio (a reference). In this case the receiver could do a diff between the reference and the actual received audio to resolve out the transmitted extra information. A variation on this would be to have the sender send the "same" audio twice: first the untouched reference audio, followed by a modulated version of that same reference audio. That way the receiver just does a diff between these two audibly "same" clips to resolve out the embedded information.
Going back to your original question: if the sender and receiver have an agreement, say that for some time period X a reference pure 20 Hz tone is sent, followed by another period X where that 20 Hz tone is modulated by your input information to alter its amplitude or frequency, then just repeat this pattern. On the receiving side, you do a diff between each such pair of time periods to resolve out your modulated information. For this to work, the source audio cannot have any tones below some frequency, say 100 Hz (remove that frequency band if needed), just to eliminate interference from the source audio. You have not mentioned what kind of data you wish to transmit; if it's voice, you would first need to stretch it out, in effect lowering its frequency range from the 1 kHz range down to your low 20 Hz range, and once the result of the diff is available on the receiving side, you squeeze this curve to restore it back to the normal 1 kHz voice range. Maybe more work than you have time for, but it might just work; real AM/FM radio uses modulation to send voice over megahertz carrier frequencies, so it can work.
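As for the Goertzel algorithm the question mentions: it is a cheap way to measure the energy at a single target frequency without computing a full FFT, which fits this 20 Hz detection task. A minimal sketch in Python (my illustration; it assumes the AAC audio has already been decoded to raw mono samples at 22050 Hz, e.g. with ffmpeg):
import math

def goertzel_power(samples, sample_rate, target_hz):
    # Power of `samples` at `target_hz` via the Goertzel recurrence.
    n = len(samples)
    k = int(0.5 + n * target_hz / sample_rate)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Magnitude squared of the selected bin; phase is not needed for detection.
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

# A pure 20 Hz test tone sampled at 22050 Hz scores far higher at 20 Hz
# than at a nearby control frequency, so thresholding the ratio detects it.
rate = 22050
tone = [math.sin(2.0 * math.pi * 20.0 * t / rate) for t in range(rate)]
print(goertzel_power(tone, rate, 20.0))   # large
print(goertzel_power(tone, rate, 100.0))  # near zero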