Nvidia codec SDK samples: can't decode an encoded file correctly

I'm trying out the sample applications in the Nvidia Video Codec SDK, and am having trouble getting a usable decoded result.
My input file is YUV 4:2:0, taken from here, which is 352x288px.
I'm encoding using the AppEncD3D12.exe sample, with the following command:
.\AppEncD3D12.exe -i D:\akiyo_cif.y4m -s 352x288 -o D:\akiyo_out.mp4
This gives the following output:
GPU in use: NVIDIA GeForce RTX 2080 Super with Max-Q Design
[INFO ][17:46:39] Encoding Parameters:
codec : h264
preset : p3
tuningInfo : hq
profile : (default)
chroma : yuv420
bitdepth : 8
rc : vbr
fps : 30/1
gop : 250
bf : 1
multipass : 0
size : 352x288
bitrate : 0
maxbitrate : 0
vbvbufsize : 0
vbvinit : 0
aq : disabled
temporalaq : disabled
lookahead : disabled
cq : 0
qmin : P,B,I=0,0,0
qmax : P,B,I=0,0,0
initqp : P,B,I=0,0,0
Total frames encoded: 112
Saved in file D:\akiyo_out.mp4
Which looks promising. However, using the decode sample, a single frame of the output contains what look like 12 smaller frames of the input, in monochrome.
I'm running the decode sample like this:
PS D:\Nvidia\Video_Codec_SDK_11.1.5\Samples\build\Debug> .\AppDecD3D.exe -i D:\akiyo_out.mp4
GPU in use: NVIDIA GeForce RTX 2080 Super with Max-Q Design
Display with D3D9.
[INFO ][17:58:58] Media format: raw H.264 video (h264)
Session Initialization Time: 23 ms
[INFO ][17:58:58] Video Input Information
Codec : AVC/H.264
Frame rate : 30000/1000 = 30 fps
Sequence : Progressive
Coded size : [352, 288]
Display area : [0, 0, 352, 288]
Chroma : YUV 420
Bit depth : 8
Video Decoding Params:
Num Surfaces : 7
Crop : [0, 0, 0, 0]
Resize : 352x288
Deinterlace : Weave
Total frame decoded: 112
Session Deinitialization Time: 8 ms
I'm quite new to this so could be doing something stupid. Right now I don't know whether to look at encode or decode! Any ideas or tips most appreciated.
I've tried other YUV files with the same result. I read that 4:2:2 is not supported; the file above is 4:2:0.
Using the AppEncCuda sample, the decoded video (played with AppDecD3D.exe) is the correct size and in colour, but the video appears to scroll to the right as it plays, with the colour information not scrolling at the same rate as the image.

You have two problems:
According to the code and comments in the AppEncD3D12 sample, it expects the input frames to be in ARGB format, but your input file is YUV, so the sample reads data from the YUV file and treats it as ARGB. If you want AppEncD3D12 to work with this file, you need to either convert each YUV frame to ARGB or change the code to accept YUV input. The AppEncCuda sample expects YUV input, which is why it gives you better results. You can also see that AppEncD3D12 encoded a total of 112 frames while AppEncCuda encoded a total of 300; this is because YUV frames are smaller than ARGB frames, so the same file holds more of them.
The second problem is that both samples save the output as raw H.264. The file is not really an MP4, despite the name you gave it. A few players can play raw H.264 data and you can try one of them on the output file. Another option is to pass the raw H.264 to FFmpeg to create a valid MP4 file: the NVIDIA encoder encodes the video, but it does not handle the creation of container files (there are too many container types: AVI, MPG, MP4, MKV, TS, etc.), so you should use FFmpeg or another solution for that. The SDK samples include a file FFmpegStreamer.h under the Utils folder that shows how to use FFmpeg to write the H.264 video as an MPEG-2 transport stream (*.ts), either to a file or to the network.
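If you want to try the first route (feeding AppEncD3D12 frames in the format it expects), here is a minimal Python/OpenCV sketch of the YUV-to-RGBA conversion. It assumes a headerless raw .yuv file at 352x288 (a .y4m file additionally carries a text header and per-frame FRAME markers that would have to be stripped first), and the exact channel order the sample wants (ARGB vs. BGRA) may need adjusting:
import cv2
import numpy as np

width, height = 352, 288
frame_size = width * height * 3 // 2  # planar YUV 4:2:0 stores 1.5 bytes per pixel

with open("akiyo_cif.yuv", "rb") as src, open("akiyo_cif_rgba.raw", "wb") as dst:
    while True:
        raw = src.read(frame_size)
        if len(raw) < frame_size:
            break
        # one I420 frame laid out as (height * 3/2) rows of `width` bytes
        yuv = np.frombuffer(raw, dtype=np.uint8).reshape(height * 3 // 2, width)
        # expand to 4 bytes per pixel; switch to COLOR_YUV2RGBA_I420 if the
        # encoder sample expects the channels in a different order
        rgba = cv2.cvtColor(yuv, cv2.COLOR_YUV2BGRA_I420)
        dst.write(rgba.tobytes())
For the second problem, once you have the raw H.264 stream you can wrap it in a real container on the command line with something like ffmpeg -f h264 -framerate 30 -i D:\akiyo_out.mp4 -c copy D:\akiyo_out_real.mp4 (forcing -f h264 because the contents are an elementary stream despite the .mp4 name), or use the FFmpegStreamer.h helper mentioned above to write a .ts.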

Related

ffmpeg convert variable framerate .webm to constant framerate video

I have a .webm file of a recording of a game at 16fps. However, upon trying to process the video with OpenCV, it seems the video was recorded with a variable framerate, so when I try to use OpenCV to get a frame every second by grabbing every 16th frame, it won't work, since the video stream ends prematurely.
Therefore, I'm trying to convert a variable-framerate .webm video, which claims it has a framerate of 16 fps, to a constant-framerate video, so I can extract one frame for every second. I've tried the following ffmpeg command from https://ffmpeg.zeranoe.com/forum/viewtopic.php?t=5518:
ffmpeg -i input.webm -c:v copy -b:v copy -r 16 output.webm
However, the following error will occur:
[NULL # 00000272ccbc0c40] [Eval # 000000bc11bfe2f0] Undefined constant or missing '(' in 'copy'
[NULL # 00000272ccbc0c40] Unable to parse option value "copy"
[NULL # 00000272ccbc0c40] Error setting option b to value copy.
Error setting up codec context options.
Here is the code I'm trying to use to process a frame every second:
video = cv2.VideoCapture(test_mp4_vod_path)
print("Opened ", test_mp4_vod_path)
print("Processing MP4 frame by frame")
# forward over to the frames you want to start reading from.
# manually set this, fps * time in seconds you wanna start from
video.set(1, 0)
success, frame = video.read()
#fps = int(video.get(cv2.CAP_PROP_FPS)) # this will return 0!
fps = 16 # hardcode fps
total_frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
print("Loading video %d seconds long with FPS %d and total frame count %d " % (total_frame_count/fps, fps, total_frame_count))
count = 1
while video.isOpened():
    success, frame = video.read()
    if not success:
        break
    if count % fps == 0:
        print("%dth frame is %d seconds on video" % (count, count/fps))
    count += 1
The code will finish before it gets near the end of the video, since the video isn't at a constant FPS.
How can I convert a variable-FPS video to a constant FPS video?
For webM options in FFmpeg, read: https://trac.ffmpeg.org/wiki/Encode/VP9.
Don't use a codec copy option if converting frame rates.
Possible solution (the 2M is a testing value, adjust for your video):
ffmpeg -i input.webm -c:v libvpx-vp9 -minrate 2M -maxrate 2M -b:v 2M -pix_fmt yuv420p -r 16 output.webm
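As a quick sanity check (a sketch assuming the re-encoded output.webm from the command above), the container should now report a constant rate that your original OpenCV loop can rely on:
import cv2

video = cv2.VideoCapture("output.webm")
fps = video.get(cv2.CAP_PROP_FPS)                  # should now report 16 instead of 0
frame_count = video.get(cv2.CAP_PROP_FRAME_COUNT)  # may still be unreliable for some WebM files
print("fps:", fps, "frames:", frame_count)
if fps:
    print("approx duration (s):", frame_count / fps)
video.release()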
First of all, the other answer from VC.One is very much the answer you need. It is not an exact answer to your question, however.
Your command has a small mistake, which is why the error is thrown. -b:v tells ffmpeg to set the video bitrate to a given value. In your input, you set it to copy, which is not a valid value for this option. The bitrate options expect a number, optionally with a suffix, like 320k or 320000.
Either the intention was to copy the audio codec, in which case it should be -c:a copy, or the intention was to copy the video bitrate. For the latter, just remove the parameter altogether; -c:v copy produces an exact copy of the (selected part of the) video stream, which includes the bitrate, frame count, frame rate and timestamps as well as all other video data.
To set up the output to have the same video bitrate as the input without copying, use ffprobe to check the stream's bitrate first.
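A sketch of that last step (hypothetical file names, assuming ffprobe/ffmpeg are on the PATH and that the input actually reports a bitrate, which some WebM files do not):
import subprocess

src, dst = "input.webm", "output.webm"

# ask ffprobe for the first video stream's bitrate (in bits per second)
probe = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0",
     "-show_entries", "stream=bit_rate",
     "-of", "default=noprint_wrappers=1:nokey=1", src],
    capture_output=True, text=True, check=True)
bitrate = probe.stdout.strip()

if bitrate and bitrate != "N/A":
    # re-encode at roughly the same bitrate with a constant 16 fps
    subprocess.run(
        ["ffmpeg", "-i", src, "-c:v", "libvpx-vp9", "-b:v", bitrate,
         "-pix_fmt", "yuv420p", "-r", "16", dst],
        check=True)
else:
    print("ffprobe did not report a stream bitrate; choose one manually")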

Extract Mpeg TS from Wireshark

I need to extract an MPEG-TS stream from a Wireshark capture. I have managed to do this, but when I play it back using VLC the output is garbage: just a green window with some jitter on the top rows.
Here is how I did it:
Captured using an ip.dst filter for the multicast stream.
Analyze -> Decode As -> UDP port (field), port number (value), MP2T (current).
Tools -> Dump MPEG TS Packets.
It does not play back correctly. Is there any other way of doing this?
When I need to dump a TS from a pcap file I do the following:
1. If the TS is in plain UDP (the protocol column shows MPEG TS for each packet), jump to step 3.
2. If the TS is packed in RTP, right-click on any packet -> Decode As -> choose RTP under the "Current" field.
3. Use the MPEG dump tool: Tools -> Dump MPEG TS Packets.
I do not use MP2T packet decoding; it usually doesn't work.
If the TS is in plain UDP, it can happen that TS packets arrive out of order, and the 4-bit continuity counter field in the TS packet header is not long enough to reorder them correctly. This can result in corrupted playback of the dumped TS.
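If the TS really is plain UDP, the same dump can also be scripted outside Wireshark. A minimal sketch, assuming the third-party dpkt package, an Ethernet capture without RTP, and hypothetical file/group/port values:
import socket
import dpkt

PCAP_IN, TS_OUT = "capture.pcap", "dump.ts"  # hypothetical file names
GROUP, PORT = "239.100.0.1", 2000            # multicast group and UDP port to extract

with open(PCAP_IN, "rb") as f, open(TS_OUT, "wb") as out:
    for _, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.udp.UDP):
            continue
        udp = ip.data
        if socket.inet_ntoa(ip.dst) != GROUP or udp.dport != PORT:
            continue
        payload = bytes(udp.data)
        # a TS-over-UDP datagram is normally 7 x 188-byte packets, each starting with 0x47
        if payload and payload[0] == 0x47:
            out.write(payload)
This simply concatenates payloads in capture order, so the reordering caveat above still applies.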
I've added two filtering options to the original pcap2mpeg.
You can find it on: https://github.com/bugre/pcap2mpegts
So you can:
filter by UDP destination port, or
filter by multicast group IP and destination port,
for the cases where the captured file has multiple TS streams on the same IP but on different ports, or on different multicast IPs.
you would run it as:
pcap2mpegts.pl -y -i 239.100.0.1 -p 2000 -l multi_ts_capture.pcap -o single-stream-output.ts
If you don't want to use Wireshark, you can use pcap2mpeg.pl. I tested it and it works well if there is a single MPEG stream in the PCAP.
Here is the output of ffprobe on an MPEG-TS file with 2 streams that was successfully extracted:
Input #0, mpegts, from 'test.mpeg':
Duration: 00:27:59.90, start: 4171.400000, bitrate: 8665 kb/s
Program 1
Metadata:
service_name : Service01
service_provider: FFmpeg
Stream #0:0[0x100]: Video: h264 (Main) ([27][0][0][0] / 0x001B), yuv420p(progressive), 4096x2176 [SAR 1:1 DAR 32:17], 10 fps, 10 tbr, 90k tbn, 20 tbc
Stream #0:1[0x1001]: Data: bin_data ([6][0][0][0] / 0x0006)

Why isn't Octave reading in the entire 14 bits of my .NEF raw files?

I am using a Nikon D5200. I intend to do some image processing on the raw images shot with the camera, but I am encountering a problem when I read the raw images using GNU Octave. Rather than reporting a bit depth of 16 (since the .NEF files are shot at 14-bit depth), the result is just an 8-bit array. What might be the problem?
imfinfo("/media/karthikeyan/3434-3531/DCIM/100D5200/DSC_1094.NEF")
ans =
scalar structure containing the fields:
Filename = /media/karthikeyan/3434-3531/DCIM/100D5200/DSC_1094.NEF
FileModDate = 10-Oct-2016 18:10:02
FileSize = 26735420
Format = DCRAW
FormatVersion =
Width = 6036
Height = 4020
BitDepth = 8
ColorType = truecolor
The result from exiftool is as follows:
exiftool DSC_1094.NEF | grep -i bit
Bits Per Sample : 14
I am using Ubuntu 14.04, Octave 4.0.3.

What is the supported format for compressed 4-channel audio file in iOS?

First of all I'm a noob in both iOS and audio programming, so bear with me if I don't use the correct technical terms, but I'll do my best!
What we want to do:
In an iOS app we are developing, we want to be able to play sounds through 4 different outputs to have a mini surround system. That is, we want the Left and Right channels to play through the headphones, while the Center and Center-surround channels play through audio hardware connected to the Lightning port. Since the audio files will be streamed/downloaded from a remote server, using raw (PCM) audio files is not an option.
The problem:
Apple has, since iOS 6, made it possible to play an audio file using a multi-route configuration, which is great and exactly what we need, but whenever we try to play a 4-channel audio file, AAC-encoded and encapsulated in an m4a (or CAF) container, we get the following error:
ERROR: [0x19deee000] AVAudioFile.mm:86: AVAudioFileImpl: error 1718449215
(Which is the status code for "kAudioFileUnsupportedDataFormatError" )
We get the same error when we use the same audio encoded as lossless (ALAC) instead, but we don't get this error when playing the same audio before encoding (PCM format).
Nor do we get the error when we use a stereo audio file, or a 5.1 audio file, encoded the same way as the 4-channel one, in both AAC and ALAC.
What we tried:
The encoding
The file was encoded using Apple's audio tools provided with Mac OS X: afconvert using this command:
afconvert -v -f 'm4af' -d "aac#44100" 4ch_master.caf 4ch_44100_AAC.m4a
and
afconvert -v -f 'caff' -d "alac#44100" 4ch_master.caf 4ch_44100_ALAC.caf
in the case of lossless encoding.
The audio format, as given by afinfo for the master (PCM) audio file:
File: 4ch_master.caf
File type ID: caff
Num Tracks: 1
----
Data format: 4 ch, 44100 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
no channel layout.
estimated duration: 582.741338 sec
audio bytes: 205591144
audio packets: 25698893
bit rate: 2822400 bits per second
packet size upper bound: 8
maximum packet size: 8
audio data file offset: 4096
optimized
audio 25698893 valid frames + 0 priming + 0 remainder = 25698893
source bit depth: I16
The AAC-encoded format info:
File: 4ch_44100_AAC.m4a
File type ID: m4af
Num Tracks: 1
----
Data format: 4 ch, 44100 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Quadraphonic
estimated duration: 582.741338 sec
audio bytes: 18338514
audio packets: 25099
bit rate: 251730 bits per second
packet size upper bound: 1039
maximum packet size: 1039
audio data file offset: 106496
optimized
audio 25698893 valid frames + 2112 priming + 371 remainder = 25701376
source bit depth: I16
format list:
[ 0] format: 4 ch, 44100 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Quadraphonic
----
And for the lossless encoded audio file:
File: 4ch_44100_ALAC.caf
File type ID: caff
Num Tracks: 1
----
Data format: 4 ch, 44100 Hz, 'alac' (0x00000001) from 16-bit source, 4096 frames/packet
Channel layout: 4.0 (C L R Cs)
estimated duration: 582.741338 sec
audio bytes: 83333400
audio packets: 6275
bit rate: 1143862 bits per second
packet size upper bound: 16777
maximum packet size: 16777
audio data file offset: 20480
optimized
audio 25698893 valid frames + 0 priming + 3507 remainder = 25702400
source bit depth: I16
----
The code
In the code, we initially followed the implementation presented in session 505 of WWDC12 using the AVAudioPlayer API. At that level, multirouting didn't seem to work reliably. We didn't suspect that this might be related to the audio format, so we moved on to experimenting with the AVAudioEngine API, presented in session 502 of WWDC14, and its associated sample code. We made multirouting work for the master 4-channel audio file (after some adaptations), but then we hit the error mentioned above when calling scheduleFile, as shown in the code snippet below (note: we are using Swift, and all the necessary audio graph setup is done but not shown here):
var playerNode: AVAudioPlayerNode!
...
...
let audioFileToPlay = try AVAudioFile(forReading: URLOfTheAudioFile)
playerNode.scheduleFile(audioFileToPlay, atTime: nil, completionHandler: nil)
Does anyone have a hint on what could be wrong with the audio data format?
After contacting Apple Support, the answer was that this is not possible for the currently shipping system configurations:
"Thank you for contacting Apple Developer Technical Support (DTS). Our engineers have reviewed your request and have concluded that there is no supported way to achieve the desired functionality given the currently shipping system configurations."

Why doesn't my MPEG-TS play on iOS?

My MPEG-TS video isn't playing on iOS via HTTP Live Streaming and I am not sure why. I know my iOS code and m3u8 format are correct, because if I replace my .ts file with a sample one from Apple (bipbop), it works. I've provided information below on my video (which doesn't work) and on the one that works.
Mine (not working)
General
ID : 1 (0x1)
Format : MPEG-TS
File size : 9.57 MiB
Duration : 3s 265ms
Overall bit rate mode : Variable
Overall bit rate : 24.3 Mbps
Video
ID : 769 (0x301)
Menu ID : 1 (0x1)
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High#L4.2
Format settings, CABAC : No
Format settings, ReFrames : 1 frame
Codec ID : 27
Duration : 3s 279ms
Bit rate : 23.1 Mbps
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Stream size : 9.01 MiB (94%)
Apple's (working)
General
ID : 1 (0x1)
Format : MPEG-TS
File size : 281 KiB
Duration : 9s 943ms
Overall bit rate mode : Variable
Overall bit rate : 231 Kbps
Video
ID : 257 (0x101)
Menu ID : 1 (0x1)
Format : AVC
Format/Info : Advanced Video Codec
Format profile : Main#L2.1
Format settings, CABAC : No
Format settings, ReFrames : 2 frames
Format settings, GOP : M=2, N=24
Codec ID : 27
Duration : 9s 542ms
Width : 400 pixels
Height : 300 pixels
Display aspect ratio : 4:3
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Color primaries : BT.601 NTSC
Transfer characteristics : BT.709
Matrix coefficients : BT.601
Audio
ID : 258 (0x102)
Menu ID : 1 (0x1)
Format : AAC
Format/Info : Advanced Audio Codec
Format version : Version 4
Format profile : LC
Muxing mode : ADTS
Codec ID : 15
Duration : 9s 380ms
Bit rate mode : Variable
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 22.05 KHz
Compression mode : Lossy
Delay relative to video : -121ms
My video doesn't have an audio stream, but that shouldn't matter.
What is it about my video that makes it not work via HTTP Live Streaming?
Your video is High profile, level 4.2. The iPhone 5 only supports up to level 4.1, and the iPhone 4 only supports up to Main profile level 3.1. Also, 23.1 Mbps is really high; 3 or 4 Mbps is probably the maximum.
Edit:
Here is a compiled list I have made for iOS devices.
The problem is not the operating system. iOS just passes the encoded H.264 stream to the SoC's video decode block. The hardware decoding blocks are limited, and each SoC iteration has different limitations.
Generally the limits are on the profile and the macroblock rate. You will need to severely cut back the bitrate of your video if you want it to play on any iOS device.
Szatmary's table looks like a great resource for choosing your target encoding parameters.
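As a rough illustration of the kind of re-encode that tends to fall within those hardware limits, here is a sketch that drives ffmpeg from Python (hypothetical file names; pick the exact profile, level and bitrate from a device table such as the one linked above):
import subprocess

# re-encode to Main profile, level 3.1, at a much lower bitrate so that older
# iPhone hardware decoders can handle it; the result can then be segmented for HLS
subprocess.run([
    "ffmpeg", "-i", "input.ts",
    "-c:v", "libx264", "-profile:v", "main", "-level", "3.1",
    "-b:v", "3M", "-maxrate", "3M", "-bufsize", "6M",
    "-pix_fmt", "yuv420p",
    "-f", "mpegts", "output.ts",
], check=True)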
