RTPEngine transcode Opus to PCMU - unable to change Opus settings from defaults

I am invoking RTPEngine from OpenSIPS to transcode an incoming Opus call to PCMU. I'm passing these parameters to the RTPEngine daemon:
codec-mask-opus codec-set-opus/48000/2/32000//maxplaybackrate=32000;stereo=0;useinbandfec=1;maxaveragebitrate=32000 transcode-PCMU
When the call is established, I can see (in the fmtp field of the SDP in the SIP 200 OK) that Opus is being told to use the default values; my changes haven't been applied. If I try to change the Opus parameters when transcoding from PCMU to Opus, it works as expected. My question is simply this - is my syntax correct, or am I doing something wrong?
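For reference, this is what I would expect the Opus lines in the 200 OK SDP to look like if the overrides were applied (payload type 96 is just an illustrative choice):
a=rtpmap:96 opus/48000/2
a=fmtp:96 maxplaybackrate=32000;stereo=0;useinbandfec=1;maxaveragebitrate=32000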

Related

Record audio (FLAC or WAV) in ReactJS and use Google Speech-to-Text from a Ruby backend

I need to record audio through the front-end. I'm using the React Mic and mic-recorder-to-mp3 ReactJS libraries. Everything is fine: I'm managing to download the blob of the audio and hear it. However, I need to upload it to my back-end so I can use the Google Speech-to-Text API to extract text from the audio.
This is the script I'm using, but it's not returning any results; I think that's because the recorded audio doesn't have the right encoding.
require "google/cloud/speech"
require 'json'
# Instantiates a client
speech = Google::Cloud::Speech.new
# The name of the audio file to transcribe
file_name = "./newmp3.mp3"
# The raw audio
audio_file = File.binread file_name
encoding = :LINEAR16
# The audio file's encoding and sample rate
config = {
encoding: "LINEAR16",
language_code: "pt-BR",
model: "default",
sample_rate_hertz: 16000
}
audio = { content: audio_file }
# Detects speech in the audio file
response = speech.recognize(config, audio)
results = response.results
puts response
You are sending an MP3 file to the API, but you are telling it that the file is encoded as LINEAR16 (PCM data). This will not work.
According to the Speech API docs, MP3 is only supported through the beta API.
One easy way to resolve this is to use a simple external audio encoder like ffmpeg and convert the file to something like FLAC before sending it over:
ffmpeg -i input.mp3 output.flac
Then set FLAC as the audio type in your encoding setting (encoding: "FLAC"). But remember that you cannot upload more than 1 minute of audio using this method; longer files have to be recognized asynchronously, with the audio stored on Google Cloud Storage.
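If you keep sample_rate_hertz: 16000 in your config, you can also resample and downmix to mono in the same conversion; a variant of the command above using ffmpeg's standard -ar and -ac flags:
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.flac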

Can't get stats while using sout with libvlc

I want to display a stream and monitor the video stats of the displayed stream while recording it using libvlc. When I use sout + duplicate to record the stream while displaying it, I can only get the demux_bitrate stat from the displayed stream using the libvlc_media_get_stats function. I am looking to get decoded_video, displayed_pictures, etc., as well.
I've tried using the duplicate module to make this happen, but I can't seem to make it work; I'm not sure whether what I want to do is supported. The code below is tweaked from the example at https://wiki.videolan.org/Documentation:Modules/display/ for transcoding a stream while displaying the original version.
:sout=#duplicate{dst='transcode{vcodec=h264}:std{access=file,mux=ts,dst=c:\junk\test.mp4}',dst=display}
The stream displays and the file is generated, but the only valid stat is demux_bitrate, which seems like the stat that would be accessible from the non-display branch rather than the displayed version.
Display and save with Transcoding
:sout=#duplicate{dst=display,dst="transcode{vcodec=h264}:standard{access=file,mux=mp4,dst=c:\junk\test.mp4}"}
Display and save without Transcoding
:sout=#duplicate{dst=display,dst=standard{access=file,mux=mp4,dst=c:\junk\test.mp4}}
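For reference, here is a minimal sketch of how those stats are read through the libvlc C API, assuming libvlc 3.x (where libvlc_media_new_location still takes the instance) and a hypothetical stream URL; the fields beyond f_demux_bitrate are the ones the question is after:

#include <vlc/vlc.h>
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    libvlc_instance_t *vlc = libvlc_new(0, nullptr);
    // Hypothetical source; substitute your actual stream MRL.
    libvlc_media_t *media = libvlc_media_new_location(vlc, "rtsp://example.com/stream");
    // Same duplicate chain as above: display one branch, transcode the other to a file.
    libvlc_media_add_option(media,
        ":sout=#duplicate{dst=display,dst=\"transcode{vcodec=h264}:standard{access=file,mux=mp4,dst=c:\\\\junk\\\\test.mp4}\"}");

    libvlc_media_player_t *player = libvlc_media_player_new_from_media(media);
    libvlc_media_player_play(player);
    std::this_thread::sleep_for(std::chrono::seconds(5)); // let playback start before polling

    libvlc_media_stats_t stats;
    if (libvlc_media_get_stats(media, &stats)) {
        std::printf("demux_bitrate=%f decoded_video=%d displayed_pictures=%d\n",
                    stats.f_demux_bitrate, stats.i_decoded_video, stats.i_displayed_pictures);
    }

    libvlc_media_player_stop(player);
    libvlc_media_player_release(player);
    libvlc_media_release(media);
    libvlc_release(vlc);
    return 0;
}

Whether the decode/display counters get populated appears to depend on which branch of the duplicate chain the stats are collected from, which is exactly the problem described above.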

How to save just raw PCM to file with iOS SDK (Core Audio)?

I'm converting an MP3 file into raw PCM, and I need to save it as just raw PCM. (Note: I'm using Java/RoboVM to port to iOS.)
I'm using the coreaudio package, and the relevant part of my code looks like this:
// Define the output PCM format.
AudioStreamBasicDescription outputFormat = new AudioStreamBasicDescription();
outputFormat.setFormat(AudioFormat.LinearPCM);
outputFormat.setFormatFlags(AudioFormatFlags.Canonical);
outputFormat.setBitsPerChannel(16);
outputFormat.setChannelsPerFrame(1);
outputFormat.setFramesPerPacket(1);
outputFormat.setBytesPerFrame(2);
outputFormat.setBytesPerPacket(2);
outputFormat.setSampleRate(22050);
// ...
outputFile = ExtAudioFile.create(outputFileURL, AudioFileType.CAF, outputFormat, null, AudioFileFlags.EraseFile);
I then run through a loop, reading from the MP3 file and writing to the output file.
Upon importing this raw file into Audacity, I notice it always has a spike at the start, indicating that it's not actually a raw PCM file but is instead wrapped in a container with a header (WAV or CAF headers, etc.).
I understand I can just take the file and strip the header off to get the raw PCM data, but in terms of the space/performance of this part of my app, I'd love to keep it simple and save the raw PCM data as-is, without a wrapper; I just don't know how to go about doing that.
The issue arises here:
outputFile = ExtAudioFile.create(outputFileURL, AudioFileType.CAF, outputFormat, null, AudioFileFlags.EraseFile);
There aren't many choices for AudioFileType; I've tried WAVE and CAF. Ideally there would be a PCM or RAW option, but there isn't. Is there a specific AudioFileType I should choose, or do I need to go about this another way?
The extended audio file services framework doesn't support a "raw" PCM format.
For an application to understand a PCM format, it needs to know things like:
How many channels are there
Are they interleaved or not
What is the sample rate
Is the data floating point or not
What is the bit depth
etc...
In fact, on iOS and OS X the AudioStreamBasicDescription is a struct which tells you what is required to interpret a PCM stream. For this reason, a "raw PCM" format doesn't really work, it needs at least some metadata. The closest formats to raw PCM are WAV, AIFF and CAF. If these don't serve your purposes you'll have to create a custom file format. But this doesn't need to be difficult.
The extended audio file services APIs are quite configurable. After opening an audio file to read (ExtAudioFileOpenURL) you can set various properties on the ExtAudioFileRef handle.
In your case consider setting kExtAudioFileProperty_ClientDataFormat. This property controls the format of the PCM data read from the file. As ExtAudioFileRead decodes the input file, it will convert the data it sends back to the format you specify. There are some limitations to this method. IIRC, it does not support doing sample rate conversion and things like that.
As you read the properly decoded data, you can then use something like NSOutputStream to write the "raw PCM" format of your choice directly to a file with no metadata at all.
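A minimal sketch of that approach in C-style Core Audio (the question's RoboVM bindings mirror these calls); the client format matches the 16-bit/mono/22050 Hz layout from the question, the output path is hypothetical, and error checking is omitted for brevity:

#include <AudioToolbox/AudioToolbox.h>
#include <cstdio>

void decodeToRawPcm(CFURLRef inputUrl) {
    ExtAudioFileRef audioFile = nullptr;
    ExtAudioFileOpenURL(inputUrl, &audioFile);

    // Ask ExtAudioFile to hand back decoded PCM in this exact layout.
    AudioStreamBasicDescription clientFormat = {};
    clientFormat.mSampleRate       = 22050;
    clientFormat.mFormatID         = kAudioFormatLinearPCM;
    clientFormat.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    clientFormat.mBitsPerChannel   = 16;
    clientFormat.mChannelsPerFrame = 1;
    clientFormat.mFramesPerPacket  = 1;
    clientFormat.mBytesPerFrame    = 2;
    clientFormat.mBytesPerPacket   = 2;
    ExtAudioFileSetProperty(audioFile, kExtAudioFileProperty_ClientDataFormat,
                            sizeof(clientFormat), &clientFormat);

    // Write the decoded samples straight to disk: no header, no wrapper.
    FILE *out = std::fopen("output.pcm", "wb"); // hypothetical path
    SInt16 buffer[4096];
    for (;;) {
        AudioBufferList bufferList;
        bufferList.mNumberBuffers = 1;
        bufferList.mBuffers[0].mNumberChannels = 1;
        bufferList.mBuffers[0].mDataByteSize   = sizeof(buffer);
        bufferList.mBuffers[0].mData           = buffer;

        UInt32 frames = sizeof(buffer) / sizeof(SInt16); // mono, so 1 frame = 1 sample
        ExtAudioFileRead(audioFile, &frames, &bufferList);
        if (frames == 0) break; // end of file
        std::fwrite(buffer, sizeof(SInt16), frames, out);
    }
    std::fclose(out);
    ExtAudioFileDispose(audioFile);
}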

Capture H.264 to a buffer with OpenCV

I receive an RTP H.264 stream. I process it like this:
- get UDP packet
- remove the RTP header and parse the packet to get an image
- record/append the image to a file
- open this file with OpenCV (bool VideoCapture::open(const string& filename))
and it all works fine.
Now I want to skip the record-to-file step and send images from the UDP process directly to OpenCV, but I don't know how to initialise OpenCV with an input buffer; VideoCapture::open only accepts a const string& filename.
Could somebody help me?
Thanks
If it's single images you've got in memory, imdecode should do the trick.
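A minimal sketch, assuming each buffer holds one complete compressed image (e.g. a JPEG) rather than raw H.264 NAL units, since cv::imdecode decodes still-image formats:

#include <opencv2/opencv.hpp>
#include <vector>

// `payload` is the depacketized image from the UDP/RTP step (hypothetical name).
cv::Mat decodeFrame(const std::vector<unsigned char> &payload) {
    // imdecode sniffs the format (JPEG, PNG, ...) from the bytes themselves,
    // so no filename is needed.
    cv::Mat image = cv::imdecode(payload, cv::IMREAD_COLOR);
    // image.empty() signals that the buffer was not a decodable image.
    return image;
}

For a true H.264 elementary stream this won't work: VideoCapture has no in-memory input, so the usual route is to hand the buffers to a decoder such as FFmpeg's libavcodec directly.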

Other causes for DirectShow "no combination of intermediate filters could be found" errors?

I have a Delphi 6 application that uses the DSPACK DirectShow component library. Currently I am getting the error "no combination of intermediate filters could be found" when I attempt to connect the Capture pin on an audio capture device to the Input pin of another filter. I believe I am setting the media formats correctly. I have an error trap, and in that trap I explicitly query both pins for the exact media format they are set to, in case there is an incongruity. When I do this, both pins come back with the exact same WAV format:
format tag: 1
number of channels: 1
bits per sample: 16
sample rate: 8000
That matches what I set both filters to, yet I am getting an error that (as far as I know) usually indicates a format incompatibility. Has anyone run into this error before and knows what I might be doing wrong, or what other kinds of tests/inspections I can do?
It turns out the error was being caused by the media format I was returning from my push source audio filter. I had the wrong sub-type, which made the filter incompatible with the other filters in my graph (such as the Capture filter) and triggered the "no combination of intermediate filters could be found" error from DirectShow. See the "UPDATE" note in my thread on media formats for full details:
Correct Media Type settings for a DirectShow filter that delivers Wav audio data?
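For comparison, this is roughly how a media type for the 8 kHz/16-bit/mono PCM format above would be built by hand; C++ shown here, but the Delphi/DSPACK record layout is the same. The detail that mattered in my case was the subtype, which must be MEDIASUBTYPE_PCM for WAVE_FORMAT_PCM data:

#include <dshow.h>

// Fill an AM_MEDIA_TYPE describing 16-bit mono PCM at 8000 Hz.
// pbFormat must come from CoTaskMemAlloc, since DirectShow frees it with CoTaskMemFree.
bool BuildPcmMediaType(AM_MEDIA_TYPE &mt) {
    WAVEFORMATEX *wfx = static_cast<WAVEFORMATEX *>(CoTaskMemAlloc(sizeof(WAVEFORMATEX)));
    if (!wfx) return false;
    wfx->wFormatTag      = WAVE_FORMAT_PCM;   // format tag 1, as in the question
    wfx->nChannels       = 1;
    wfx->nSamplesPerSec  = 8000;
    wfx->wBitsPerSample  = 16;
    wfx->nBlockAlign     = wfx->nChannels * wfx->wBitsPerSample / 8;
    wfx->nAvgBytesPerSec = wfx->nSamplesPerSec * wfx->nBlockAlign;
    wfx->cbSize          = 0;

    ZeroMemory(&mt, sizeof(mt));
    mt.majortype         = MEDIATYPE_Audio;
    mt.subtype           = MEDIASUBTYPE_PCM;  // the field that was wrong in my push source filter
    mt.formattype        = FORMAT_WaveFormatEx;
    mt.bFixedSizeSamples = TRUE;
    mt.lSampleSize       = wfx->nBlockAlign;
    mt.cbFormat          = sizeof(WAVEFORMATEX);
    mt.pbFormat          = reinterpret_cast<BYTE *>(wfx);
    return true;
}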
