Audio to spectrogram or audio to text - Delphi

I need an API or routine in Delphi (any version) to convert audio to text or to convert audio to a spectrogram.
Sorry for any translation errors.
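For the spectrogram half of the question, the underlying computation is a short-time Fourier transform: cut the samples into overlapping frames, window each frame, and take the magnitude of its Fourier transform. The sketch below is in Python rather than Delphi and uses a naive DFT so that it depends only on the standard library; the frame size, hop, and test tone are assumptions chosen for illustration, and real code would use an FFT routine instead.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of magnitude spectra, one per overlapping frame.

    Each frame is Hann-windowed and transformed with a naive DFT, which is
    O(n^2) per frame: fine for illustration, too slow for real audio.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):  # keep non-negative frequencies
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz, which falls
# exactly on frequency bin 8 when the frame size is 64.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
spec = spectrogram(tone)
```

Each row of `spec` is one time slice and each column one frequency bin of width `SAMPLE_RATE / frame_size` Hz; plotting the rows side by side as an image gives the spectrogram. The same loops should port to Delphi directly.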

Related

How to stream an .avi container without encoding it in H264 or H265

I would like to stream an .avi container without using any codec in the encoding process. That is, I do not want it encoded in H264 or H265; I just want to upload the video as-is. I am using the Azure Media Services SDK in .NET.
The presets that Azure Media Services offers in its SDK all use H264 or H265 to encode and return an MP4. I just want to upload an .avi, see whether it is possible to apply no compression, and then download the .avi.
Thanks!
Adding the answer here. It looks like you wanted to do a lossless, or near-lossless, encoding pass using CRF (constant rate factor) encoding. There is currently no support for setting CRF encoding in the standard encoder in AMS, but work is under way to add CRF encoding settings to the SDK in the near future.
For now, you are limited to the settings available in the Transform preset in the H264 or H265 layers.
You can see all of the available encoding settings most easily in the REST API:
https://github.com/Azure/azure-rest-api-specs/blob/main/specification/mediaservices/resource-manager/Microsoft.Media/stable/2021-06-01/Encoding.json
Alternatively, look at the Transform object in your favorite SDK. The H264Video and H264Layer classes in the model, and their H265 equivalents, list the settings you can control in your code.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.media.models.h264video?view=azure-dotnet
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.media.models.h264layer?view=azure-dotnet
UPDATE: The SDK for .NET now exposes RateControlMode for H264 encoding, enabling two new rate control modes: CBR (Constant Bit Rate) and CRF (Constant Rate Factor).
See https://www.nuget.org/packages/Microsoft.Azure.Management.Media

NVIDIA RTX Voice: input a WAV file instead of a microphone

I've downloaded this AI software. I've tried it, and it does a beautiful job of noise cancellation when the input is the microphone, as intended.
My question is: how can I input a WAV file instead of a microphone? Is there any way to use this magical noise cancellation tool with a simple WAV file?

Google Speech API word offset timestamps are inaccurate

I have some audio files (25 GB) for which I want to let the user see each word highlighted in sync with the audio as it plays. I was looking at the Google Speech API to transcribe the files and provide the word offsets so that I wouldn't have to do this manually. However, I've noticed that the offsets are inconsistent in accuracy, even when the API transcribes the audio properly and accurately (over 90% confidence per word).
What can affect the accuracy of these word timings?
Some observations:
I created an audio file of "The quick brown fox jumped over the sleeping lazy dog." using Audacity, as a 16-bit WAV with a sampling rate of 44100 Hz. The API transcribed it properly, but the word timings were way off, and entire words were missing from them.
I created a WAV file from Audible (via mic), and the word offsets were quite accurate.
I tried a professionally recorded Arabic file, and although the API transcribed it accurately, the word timings were way off.
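Because the three files above were produced in different ways, one low-cost check is to record each file's actual format and confirm that it matches what the recognition request declares. The sketch below uses only Python's standard `wave` module; the helper names are hypothetical, and nothing here claims that a format mismatch is the cause of the timing errors.

```python
import io
import wave


def describe_wav(source):
    """Return the basic format of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silent_wav(seconds=1, rate=44100):
    """Build an in-memory 16-bit mono WAV of silence, used here as test input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav())
```

Comparing `info` across the Audacity, Audible, and Arabic files would at least show whether sample rate or channel count differs between the accurate and inaccurate cases.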

Which format is best for uploading video and audio?

I am working on uploading video and audio to a server, and I want to know which format is best for upload (considering quality and file size).
Video formats are just containers. If you care about quality and file size, you should look at how the video is encoded. For iOS-based devices, the H.264 encoder with high efficiency level 4 provides good compression, so you get good quality at a smaller file size.
If you want to learn about converting video data from one format to another, look into ffmpeg.
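As a rough illustration of the ffmpeg route, the sketch below builds a command line that re-encodes a file to H.264 video and AAC audio in an MP4 container. Reading the answer's "high efficiency level 4" as the High profile at level 4.0 is an assumption, as are the file names and the CRF value of 23 (libx264's default); the helper name is hypothetical.

```python
import shlex


def build_h264_command(src, dst, crf=23):
    """Return an ffmpeg argument list for an H.264/AAC re-encode.

    Lower CRF values mean higher quality and larger files.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libx264",     # H.264 video encoder
        "-profile:v", "high",  # assumed reading of "high efficiency"
        "-level", "4.0",
        "-crf", str(crf),
        "-c:a", "aac",         # AAC audio
        dst,
    ]


command = build_h264_command("input.mov", "upload.mp4")
printable = shlex.join(command)
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed; `printable` is the same command as a single shell-safe string for logging.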

Library for decoding H.264 RTSP stream

I was planning to decode an H.264-based RTSP stream using FFMPEG in OpenCV, but when I tried, it gave some errors. Later, I found that many people have faced issues while decoding H.264 streams using ffmpeg (libavcodec). Typically, error messages like the one below pop up while using libavcodec:
"[h264 # 0xa766dd0]concealing 1200 DC, 1200 AC, 1200 MV errors"
Has anyone used any other library successfully for decoding H.264-based RTSP? If so, which library is it? (I have heard of live555, which is used within the VLC player for decoding such streams.) I would also like to know the output format and how it can be made compatible with OpenCV. Typically, within OpenCV we can use cvQueryFrame to extract a frame directly from a video stream, but how do we go about it if we are using a library other than ffmpeg?
Thanks in advance.
Regards,
Saurabh Gandhi
VLC uses ffmpeg to decode H.264.
The problem can occur when the SPS and PPS are wrong or missing.
You need to extract them from the RTSP protocol and pass them to ffmpeg before trying to decode the video.
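One place the SPS and PPS commonly appear in an RTSP session is the `sprop-parameter-sets` attribute of the SDP returned by DESCRIBE, where each parameter set is base64-encoded and the sets are separated by commas. The sketch below parses such a line in Python; the helper name and the sample `fmtp` line are illustrative assumptions, and whether a given camera populates this attribute at all varies.

```python
import base64
import re

ANNEXB_START_CODE = b"\x00\x00\x00\x01"


def parse_sprop_parameter_sets(fmtp_line):
    """Return the raw NAL units listed in an SDP fmtp line, in order."""
    match = re.search(r"sprop-parameter-sets=([^;\s]+)", fmtp_line)
    if not match:
        return []
    return [base64.b64decode(part)
            for part in match.group(1).split(",") if part]


# Illustrative SDP attribute line, not taken from the question.
fmtp = ("a=fmtp:96 packetization-mode=1; profile-level-id=42001f; "
        "sprop-parameter-sets=Z0IAH5WoFAFuQA==,aM48gA==")
nal_units = parse_sprop_parameter_sets(fmtp)

# The low five bits of the first byte give the NAL unit type
# (7 = SPS, 8 = PPS).
nal_types = [unit[0] & 0x1F for unit in nal_units]

# Annex B form: each parameter set prefixed with a start code.
extradata = b"".join(ANNEXB_START_CODE + unit for unit in nal_units)
```

A buffer like `extradata` is what would typically be handed to the decoder ahead of the first frame, so that it has the SPS and PPS before any slice data arrives.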
To decode your RTSP stream, the best libraries are FFMPEG and GStreamer.
To decode the stream, you need to feed the decoder the right buffer. For that, you have to understand your H.264 stream well enough to arrange your SPS, PPS, and NAL data before feeding it to the library's decoder.
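To see what "arranging SPS, PPS and NAL data" involves, it can help to split a buffer into NAL units and look at their types. The Python sketch below assumes the data is in Annex B byte-stream form (units separated by `00 00 01` or `00 00 00 01` start codes); the function name and the sample bytes are made up for illustration and are not a decodable stream.

```python
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI",
                  7: "SPS", 8: "PPS", 9: "AUD"}


def split_annexb(stream):
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    units = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        # Trailing zero bytes are padding or the first byte of a 4-byte
        # start code; a NAL unit payload always ends in a non-zero byte.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        i = nxt
    return units


def nal_type(unit):
    """The NAL unit type is the low five bits of the first byte."""
    return unit[0] & 0x1F


# Made-up sample: SPS, PPS, then the start of an IDR slice.
sample = (b"\x00\x00\x00\x01\x67\x42\x00\x1f"
          b"\x00\x00\x00\x01\x68\xce\x3c\x80"
          b"\x00\x00\x01\x65\x88\x84")
units = split_annexb(sample)
names = [NAL_TYPE_NAMES.get(nal_type(u), "other") for u in units]
```

If the first units a decoder receives are slices rather than an SPS followed by a PPS, that ordering is one plausible source of the concealment errors quoted in the question.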

Resources