NVidia RTX Voice, input a wav file instead of a microphone - nvidia

I've downloaded this AI software. I've tried it, and it does a beautiful job of noise cancellation when the input is a microphone.
My question is: how can I input a WAV file instead of a microphone? Is there any way to use this magical noise cancellation tool with a simple WAV file?

Related

OpenCV with FFMPEG back-end and h264_v4l2m2m codec

I'm trying to figure out whether there is a way to configure OpenCV 4.5.4, which uses the FFMPEG back-end, to write a video file (via VideoWriter) with the h264_v4l2m2m codec instead of h264. The difference between the two codecs on the ffmpeg side is that h264_v4l2m2m uses hardware support to encode frames into the video file.
When using the ffmpeg tool directly from the command line (Linux), the codec can be chosen with the -vcodec argument. However, I don't see a way to accomplish the same in OpenCV, and it seems to me that it just uses h264.
I can tell from the CPU usage: the h264 codec uses all CPU cores, while h264_v4l2m2m takes only a small amount of CPU resources because it offloads encoding to hardware.
So ffmpeg by itself works fine. The question is: how do I achieve the same via OpenCV?
EDIT (Feb 2022): At this point in time, this is not supported / tested on RPI4, as stated by the dev team in this comment.
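One possible workaround, given that the ffmpeg command-line tool already works with h264_v4l2m2m, is to skip VideoWriter and pipe raw frames into an ffmpeg subprocess. The sketch below is only an illustration under assumptions: the helper names build_ffmpeg_cmd and write_frames, the 4M bitrate, and the BGR frame layout (what OpenCV normally produces) are not from the question, and it has not been verified on an RPI4.

```python
import subprocess
from typing import Iterable, List


def build_ffmpeg_cmd(width: int, height: int, fps: int, out_path: str,
                     codec: str = "h264_v4l2m2m",
                     bitrate: str = "4M") -> List[str]:
    """Build an ffmpeg command that reads raw BGR frames from stdin.

    The helper name and the default bitrate are assumptions for illustration.
    """
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo",          # input is headerless raw video
        "-pix_fmt", "bgr24",       # assumed layout: 8-bit BGR, as OpenCV frames usually are
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",                 # read frames from stdin
        "-c:v", codec,             # same role as -vcodec on the command line
        "-b:v", bitrate,
        "-pix_fmt", "yuv420p",     # convert to a format H.264 encoders commonly accept
        out_path,
    ]


def write_frames(frames: Iterable[bytes], width: int, height: int,
                 fps: int, out_path: str) -> None:
    """Feed raw frame bytes to ffmpeg; each item must be width*height*3 bytes."""
    proc = subprocess.Popen(build_ffmpeg_cmd(width, height, fps, out_path),
                            stdin=subprocess.PIPE)
    try:
        for frame in frames:
            proc.stdin.write(frame)  # e.g. frame.tobytes() for a NumPy image
    finally:
        proc.stdin.close()
        proc.wait()
```

With OpenCV, each frame read from a capture could be passed as frame.tobytes(). This trades the VideoWriter API for an extra process and a pipe, so it is a workaround rather than a way to make OpenCV itself select the codec.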

How to stream from an .avi container without encoding it in H264 or H265

I would like to stream an .avi container without using any codec in the encoding process. That is, I do not want it encoded in H264 or H265; I just want to upload the video without encoding it. I am using the Azure Media Services SDK for .NET.
The presets that Azure Media Services provides in its SDK all use H264 or H265 to encode and return an mp4. I just want to upload an .avi, see whether it is possible to avoid any compression, and then download the .avi.
Thanks!
Adding the answer here. It looks like you were wanting to do a lossless, or near lossless encoding pass using CRF (constant rate factor encoding). There is currently no support for setting CRF encoding in the standard encoder in AMS, but there is work going on to add CRF encoding settings to the SDK in the near future.
For now, you are limited to the settings available in the Transform preset in the H264 or H265 Layers.
You can see all of the available encoding settings most easily in the REST API:
https://github.com/Azure/azure-rest-api-specs/blob/main/specification/mediaservices/resource-manager/Microsoft.Media/stable/2021-06-01/Encoding.json
Alternatively, look at the Transform object in your favorite SDK: the H264Video and H264Layer classes in the model, as well as their H265 equivalents, show the settings you can control in your code.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.media.models.h264video?view=azure-dotnet
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.media.models.h264layer?view=azure-dotnet
UPDATE: The SDK for .NET is now available with RateControlMode exposed for H264 encoding, enabling two new rate control modes: CBR (Constant Bit Rate) and CRF (Constant Rate Factor).
See https://www.nuget.org/packages/Microsoft.Azure.Management.Media

Audio bitrate in YouTube videos?

This video clip:
https://www.youtube.com/watch?v=wc0PB6Azwn0
What is the max audio bitrate, and how can I detect the real audio bitrate? Are they the same? Please, no rumors and no guessing.
Does it depend on the video quality I am watching (1080p, 720p, etc)?
If you say yes, that makes no sense, because the clip was uploaded at one bitrate only.
I hope someone who knows the subject can answer these questions.
Is it possible to detect the audio bitrate of a YouTube video at all?
Stats for nerds is useless; it shows no audio bitrate.
Considering that the max for lossy (vs lossless) audio is 320 Kbps, it cannot be higher than that by definition.
Anyone who knows?
You can use the command line tool youtube-dl to list all available transcodings for a given YouTube video by running youtube-dl -F <url-to-your-video>.
Example output using the URL mentioned in your question:
Studying the output we can see that the audio transcoding with the highest bitrate is "format 251" using the opus codec at an average bitrate of around 145k. Note that YouTube is not using a fixed bitrate but rather a variable bitrate with target of ~160k.
The opus codec is currently supported in most modern browsers (but not Safari). Browsers without support for opus will fall back to the m4a stream at a variable bitrate targeting ~128k.
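If you would rather pick out the highest-bitrate audio stream programmatically than read the listing by eye, a minimal sketch could look like this. It assumes youtube-dl is installed and on the PATH, and it relies on the JSON metadata printed by youtube-dl -j, where audio-only formats have vcodec set to "none" and abr holds the average audio bitrate in kbit/s. The function names fetch_info and best_audio_format are made up for illustration.

```python
import json
import subprocess
from typing import Optional


def fetch_info(url: str) -> dict:
    """Run `youtube-dl -j <url>` and parse the JSON metadata it prints."""
    result = subprocess.run(["youtube-dl", "-j", url],
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def best_audio_format(info: dict) -> Optional[dict]:
    """Return the audio-only format with the highest average bitrate, if any."""
    audio_only = [
        f for f in info.get("formats", [])
        if f.get("vcodec") == "none" and f.get("abr")  # skip video and unknown bitrates
    ]
    return max(audio_only, key=lambda f: f["abr"], default=None)
```

Calling best_audio_format(fetch_info(url)) returns a dictionary whose format_id, acodec and abr fields can be compared with the youtube-dl -F listing. The reported abr is an average, so it will not exactly match the instantaneous rate of a variable-bitrate stream.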
If you want to make 100% sure which audio transcoding you're currently listening to, you can right click the YouTube video player and select "Stats for nerds" and look for the number mentioned in the codecs section and cross-reference that with the output given by youtube-dl:
Does it depend on the video quality I am watching (1080p, 720p, etc)? If you say yes, that makes no sense, because the clip was uploaded at one bitrate only.
Yes, it depends on video quality. When you're choosing a video quality, you're not just choosing the video quality... you're choosing the audio quality as well. YouTube isn't giving you the option, but it's part of the package.
Videos aren't served as-is, they're transcoded. You upload your video and it's re-compressed at a variety of different bitrates with different settings.
Your audio bitrate depends on what YouTube decided to encode it as. Each video may have many versions of the stream.
The best thing you can do is get a build of FFmpeg with libquvi enabled, and let it parse the page, find the streams, download the stream, demux, and figure things out for you from there.
Considering that the max for lossy (vs lossless) audio is 320 Kbps, it cannot be higher than that by definition.
Your definition is wrong. There are all kinds of lossy audio codecs, and they can be run at a variety of bitrates.

Audio for spectrogram or audio for text

I need some API or routine in Delphi (any version) to convert audio to text or to convert audio to a spectrogram.
Sorry for any translation errors.

Can I use ffmpeg to create multi-bitrate (MBR) MPEG-4 videos?

I am currently working on a webcam streaming server project that requires dynamically adjusting the stream's bitrate according to the client's settings (screen size, processing power...) or the network bandwidth. The encoder is ffmpeg, since it's free and open source, and the codec is MPEG-4 Part 2. We use live555 for the server part.
How can I encode MBR MPEG-4 videos using ffmpeg to achieve this?
The multi-bitrate video you are describing is called a "Scalable Video Codec". See this wiki link for a basic understanding.
Basically, in a scalable video codec, the base layer stream is itself completely decodable; additional information is represented in the form of one or more enhancement streams. There are a couple of techniques for doing this, including lower/higher resolution, frame rate, and changes in quantization. The following papers explain in detail Scalable Video Coding for MPEG-4 and H.264 respectively. Here is another good paper that explains what you intend to do.
Unfortunately, this is broadly a research topic, and to date no open-source encoder (ffmpeg, xvid) supports such multi-layer encoding. I guess commercial encoders don't support it either. It is significantly complex. You could check whether the reference encoder for H.264 supports it.
The alternative (but CPU-expensive) way could be to transcode in real time while transmitting the packets. In this case, you should start off with a reasonably good quality source. If you are using FFMPEG as an API, this should not be a problem. Multiple resolutions could still be messy in general, but you can keep changing the target encoding rate.
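As a rough illustration of the transcoding route, the sketch below builds a single ffmpeg command line that encodes one input into several MPEG-4 Part 2 renditions at different target bitrates. Note that this is plain simulcast (independent streams the server can switch between), not scalable layered coding. The function name build_multi_bitrate_cmd, the output file naming, and the choice to drop audio with -an are assumptions made for the example.

```python
from typing import List


def build_multi_bitrate_cmd(src: str, bitrates_kbps: List[int],
                            prefix: str = "out") -> List[str]:
    """Build one ffmpeg command with one MPEG-4 Part 2 output per bitrate.

    ffmpeg accepts several outputs in a single invocation; the options
    placed before each output file apply to that output only.
    """
    cmd = ["ffmpeg", "-y", "-i", src]
    for kbps in bitrates_kbps:
        cmd += [
            "-c:v", "mpeg4",        # ffmpeg's native MPEG-4 Part 2 encoder
            "-b:v", f"{kbps}k",     # target video bitrate for this rendition
            "-an",                  # assumption: video only, audio dropped
            f"{prefix}_{kbps}k.mp4",
        ]
    return cmd
```

For example, build_multi_bitrate_cmd("webcam.avi", [800, 400, 200]) yields a command that can be passed to subprocess.run. The input is decoded once and encoded once per listed bitrate, so CPU cost grows with the number of renditions.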

Resources