Multiple StreamingRecognizeRequest - google-cloud-speech

I'm trying to set up a StreamingRecognize call with multiple requests. Is that possible?
The point is that I want to send an audio stream from the mic for an unknown length of time, so I think I have to implement multiple requests (considering that a streaming session has a max time of about 65 seconds).
Can anyone help me with this?
Thanks a lot ;)
Google sample code:
static async Task<object> StreamingMicRecognizeAsync(int seconds)
{
    if (NAudio.Wave.WaveIn.DeviceCount < 1)
    {
        Console.WriteLine("No microphone!");
        return -1;
    }
    var speech = SpeechClient.Create();
    var streamingCall = speech.StreamingRecognize();
    // Write the initial request with the config.
    await streamingCall.WriteAsync(
        new StreamingRecognizeRequest()
        {
            StreamingConfig = new StreamingRecognitionConfig()
            {
                Config = new RecognitionConfig()
                {
                    Encoding =
                        RecognitionConfig.Types.AudioEncoding.Linear16,
                    SampleRateHertz = 16000,
                    LanguageCode = "en",
                },
                InterimResults = true,
            }
        });
    // Print responses as they arrive.
    Task printResponses = Task.Run(async () =>
    {
        while (await streamingCall.ResponseStream.MoveNext(
            default(CancellationToken)))
        {
            foreach (var result in streamingCall.ResponseStream
                .Current.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine(alternative.Transcript);
                }
            }
        }
    });
    // Read from the microphone and stream to API.
    object writeLock = new object();
    bool writeMore = true;
    var waveIn = new NAudio.Wave.WaveInEvent();
    waveIn.DeviceNumber = 0;
    waveIn.WaveFormat = new NAudio.Wave.WaveFormat(16000, 1);
    waveIn.DataAvailable +=
        (object sender, NAudio.Wave.WaveInEventArgs args) =>
        {
            lock (writeLock)
            {
                if (!writeMore) return;
                streamingCall.WriteAsync(
                    new StreamingRecognizeRequest()
                    {
                        AudioContent = Google.Protobuf.ByteString
                            .CopyFrom(args.Buffer, 0, args.BytesRecorded)
                    }).Wait();
            }
        };
    waveIn.StartRecording();
    Console.WriteLine("Speak now.");
    await Task.Delay(TimeSpan.FromSeconds(seconds));
    // Stop recording and shut down.
    waveIn.StopRecording();
    lock (writeLock) writeMore = false;
    await streamingCall.WriteCompleteAsync();
    await printResponses;
    return 0;
}

In Cloud Speech-to-Text, the audio length limit for each streaming request is around 1 minute [1]. You can either use asynchronous speech recognition [2] for audio files up to 180 minutes, or renew the streaming request before it reaches the time limit for streaming speech recognition [3].
Here is a Python example of how to renew the streaming request and stream audio for more than 1 minute [4].
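As a rough illustration of that renewal idea, here is a minimal Node.js sketch (the question's sample is C#, but the pattern is the same): end the current streaming request shortly before the limit, open a fresh one, and keep writing microphone data to whichever stream is current. micStream is a placeholder for whatever readable microphone stream you use (for example from node-record-lpcm16):
const speech = require('@google-cloud/speech');

const client = new speech.SpeechClient();
const request = {
  config: {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  },
  interimResults: true,
};

let recognizeStream = null;

function startStream() {
  recognizeStream = client
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', data => {
      if (data.results[0] && data.results[0].alternatives[0]) {
        console.log(data.results[0].alternatives[0].transcript);
      }
    });

  // End this stream just before the ~1 minute limit and start a new one.
  setTimeout(() => {
    recognizeStream.end();
    startStream();
  }, 55 * 1000);
}

startStream();

// micStream is assumed to be a readable stream of LINEAR16 audio chunks;
// always write to the *current* recognize stream.
micStream.on('data', chunk => {
  if (recognizeStream && !recognizeStream.destroyed) {
    recognizeStream.write(chunk);
  }
});
Note that anything spoken right at the restart boundary can be cut off, so in practice you may want to restart after a final result arrives rather than on a hard timer.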

Related

WebRTC connection does not resume after mobile browser is backgrounded

I have a web application running on Safari on an iPad displaying a live WebRTC video stream. When the user switches away from Safari for a few seconds, and then switches back, the <video> element just shows a black rectangle.
I have added logging to the onsignalingstatechange handler, and checked the console logs for any apparent errors after resuming Safari, but there is nothing obvious indicating the failure.
How can I recover/resume/restart the stream after the user switches back to Safari?
Here is my cargo cult WebRTC code, for reference:
export default class WebRtcPlayer {
  static server = "http://127.0.0.1:8083";

  server = null;
  stream = null;
  channel = null;
  webrtc = null;
  mediastream = null;
  video = null;

  constructor(id, stream, channel) {
    this.server = WebRtcPlayer.server;
    this.video = document.getElementById(id);
    this.stream = stream;
    this.channel = channel;
    this.video.addEventListener("loadeddata", () => {
      this.video.play();
    });
    this.video.addEventListener("error", () => {
      console.error("video error");
    });
    this.play();
  }

  getStreamUrl() {
    // RTSPtoWeb only, not RTSPtoWebRTC
    return `${this.server}/stream/${this.stream}/channel/${this.channel}/webrtc`;
  }

  async play() {
    console.log("webrtc play");
    this.mediastream = new MediaStream();
    this.video.srcObject = this.mediastream;
    this.webrtc = new RTCPeerConnection({
      iceServers: [{
        urls: ["stun:stun.l.google.com:19302"],
      }],
      sdpSemantics: "unified-plan"
    });
    this.webrtc.onnegotiationneeded = this.handleNegotiationNeeded.bind(this);
    this.webrtc.onsignalingstatechange = this.handleSignalingStateChange.bind(this);
    this.webrtc.ontrack = this.handleTrack.bind(this);
    this.webrtc.addTransceiver("video", {
      "direction": "sendrecv",
    });
  }

  async handleNegotiationNeeded() {
    console.log("handleNegotiationNeeded");
    let offer = await this.webrtc.createOffer({
      offerToReceiveAudio: false,
      offerToReceiveVideo: true
    });
    await this.webrtc.setLocalDescription(offer);
  }

  async handleSignalingStateChange() {
    console.log(`handleSignalingStateChange ${this.webrtc.signalingState}`);
    switch (this.webrtc.signalingState) {
      case "have-local-offer":
        let formData = new FormData();
        formData.append("data", btoa(this.webrtc.localDescription.sdp));
        const response = await fetch(this.getStreamUrl(), {
          method: "POST",
          body: formData,
        });
        this.webrtc.setRemoteDescription(new RTCSessionDescription({
          type: "answer",
          sdp: atob(await response.text()),
        }));
        break;
      case "stable":
        /*
         * There is no ongoing exchange of offer and answer underway.
         * This may mean that the RTCPeerConnection object is new, in which case both the localDescription and remoteDescription are null;
         * it may also mean that negotiation is complete and a connection has been established.
         */
        break;
      case "closed":
        /*
         * The RTCPeerConnection has been closed.
         */
        break;
      default:
        console.log(`unhandled signalingState is ${this.webrtc.signalingState}`);
        break;
    }
  }

  handleTrack(event) {
    console.log("handle track");
    this.mediastream.addTrack(event.track);
  }

  static setServer(serv) {
    this.server = serv;
  }
}
I'm not sure if it's the best way, but I used the Page Visibility API to subscribe to the visibilitychange event:
constructor(id, stream, channel) {
  // ...
  document.addEventListener("visibilitychange", () => {
    if (document.visibilityState === "visible") {
      console.log("Document became visible, restarting WebRTC stream.");
      this.play();
    }
  });
  // ...
}
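One thing that may be worth adding, since play() builds a brand-new RTCPeerConnection each time: close the previous connection before restarting, so stale connections don't pile up. A minimal sketch, assuming the same class as above:
async play() {
  // Tear down any previous peer connection before negotiating a new one.
  if (this.webrtc) {
    this.webrtc.close();
    this.webrtc = null;
  }
  this.mediastream = new MediaStream();
  this.video.srcObject = this.mediastream;
  // ... then create the new RTCPeerConnection exactly as before ...
}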

Is there a limitation on the number of contexts that can be opened in Playwright?

We are trying to web-crawl and get content from multiple pages. I am taking advantage of the async API with Promise.all, which can execute requests in parallel.
Is there a limitation on the number of contexts that can be opened in parallel?
const fs = require('fs');
const { chromium } = require('playwright');

let browser;
const batch_size = 4; // control the number of async parallel calls

(async () => { // main function
  let urls = [];
  urls = fs.readFileSync('./resources/input_selenium_urls.csv').toString().split("\n");
  browser = await chromium.launch();
  let context_size = 0;
  let processUrls = [];
  let total_length = 0;
  for (let i = 0; i < urls.length; i++, total_length++) {
    if ((context_size == batch_size) || (i == urls.length - 1)) {
      await Promise.all(processUrls.map(x => getHTMLPageSource(x)));
      context_size = 0;
      processUrls = [];
    } else {
      processUrls.push(urls[i]);
      context_size++;
    }
  }
  await browser.close();
})();

async function getHTMLPageSource(url) {
  const context = await browser.newContext();
  const page = await context.newPage();
  let response = {};
  try {
    await page.goto(url, { waitUntil: 'networkidle' });
    response = {
      url: url,
      content: await page.title(),
      error: null
    };
    console.log(response);
  } catch {
    response = {
      error: "Timeout error"
    };
  }
  await context.close();
  return response;
}
Browser contexts are cheap to create, but it's not clear from the docs whether there is a hard-coded limit on them; the limit probably depends on the browser you chose and your OS resources. I think you might only be able to find out by creating a lot of contexts.
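If you want a rough empirical answer, a throwaway probe along these lines keeps opening contexts until something fails (just a sketch; the number it reports will be specific to your browser and machine):
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const contexts = [];
  try {
    // Keep opening contexts until the browser or the OS pushes back.
    for (let i = 0; i < 1000; i++) {
      const context = await browser.newContext();
      const page = await context.newPage();
      await page.goto('about:blank');
      contexts.push(context);
      console.log(`open contexts: ${contexts.length}`);
    }
  } catch (err) {
    console.log(`failed after ${contexts.length} contexts: ${err.message}`);
  } finally {
    await Promise.all(contexts.map(c => c.close()));
    await browser.close();
  }
})();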

Reading From Socket Multiple Times

I'm having some trouble writing to and reading from a socket multiple times. I'm writing a speed test: I essentially want to send an HTTP request over a socket, which returns garbage data that I then read to measure the bandwidth. I've made the HTTP request successfully, but I'm unable to flip-flop between writing to the socket and reading the response. I can read the data from the socket's stream once, and then it doesn't seem to have any more data to read, even after I've made another request.
The salient part of the code is in the start method where the loop runs:
import 'dart:io';
import 'dart:typed_data';

class DownloadTest {
  late Socket socket;
  late Stream<Uint8List> dataStream;
  bool graceTimeOver = false;
  int totalBytesDownloaded = 0;
  late DateTime startTime;
  final ckSize = 10;
  final graceTime = 2;
  final dlTime = 5;
  final client = HttpClient();
  final String _serverAddress;
  final void Function(double mbps) onProgress;

  DownloadTest({required serverAddress, required this.onProgress})
      : _serverAddress = serverAddress;

  Future<void> start() async {
    await resetTest();
    while (true) {
      writeDlRequest();
      var bytesDownloaded = await downloadData();
      print(graceTimeOver);
      if (!graceTimeOver) {
        checkGraceTime();
      } else {
        print(bytesDownloaded);
        totalBytesDownloaded += bytesDownloaded;
        if (testFinished()) {
          break;
        }
      }
    }
    print('test over');
    await socket.close();
  }

  Future<int> downloadData() async {
    var bytes = 0;
    await for (var data in dataStream) {
      bytes += data.length;
    }
    return bytes;
  }

  Future<void> resetTest() async {
    socket = await Socket.connect(_serverAddress, 80);
    dataStream = socket.asBroadcastStream();
    graceTimeOver = false;
    startTime = DateTime.now();
    totalBytesDownloaded = 0;
  }

  void checkGraceTime() {
    if (!graceTimeOver) {
      var elapsedSeconds =
          DateTime.now().difference(startTime).inMilliseconds / 1000;
      if (elapsedSeconds >= graceTime) {
        graceTimeOver = true;
        startTime = DateTime.now();
        print('grace time over');
      }
    }
  }

  void writeDlRequest() {
    socket.write('GET /garbage.php?ckSize=$ckSize HTTP/1.1\r\n');
    socket.write('Host: speedtest.somethingsomething.com:80\r\n');
    socket.write('Connection: keep-alive\r\n');
    socket.write('\r\n');
  }

  bool testFinished() {
    var elapsedSeconds =
        DateTime.now().difference(startTime).inMilliseconds / 1000;
    return elapsedSeconds >= dlTime;
  }
}
I've made the socket's stream into a broadcast stream, so I should be able to listen to it multiple times, but after reading from it the first time it fails to read any more data and the loop just spins. Any ideas where I'm going wrong?

Is it possible to do voice pitch shifting in Twilio group video?

We have built a web application. The application's core is arranging meetings/sessions on the web. User A (the meeting co-ordinator) arranges a meeting/session, and all other participants B, C, D, etc. join that meeting/session. I have used a Twilio group video call to achieve this.
I have the below use case.
We want to pitch-shift User A's (the meeting co-ordinator's) voice, so that all other participants receive the pitch-shifted voice in the group video. We have looked at AWS Polly in Twilio, but it doesn't match our use case.
So please advise: is there any service in Twilio to achieve this scenario?
(or)
Would it be possible to intercept the Twilio group call and pass the pitch-shifted voice to the other participants?
Sample Code Used
initAudio();

function initAudio() {
  analyser1 = audioContext.createAnalyser();
  analyser1.fftSize = 1024;
  analyser2 = audioContext.createAnalyser();
  analyser2.fftSize = 1024;
  if (!navigator.getUserMedia)
    navigator.getUserMedia = navigator.webkitGetUserMedia || navigator.mozGetUserMedia;
  if (!navigator.getUserMedia)
    return (alert("Error: getUserMedia not supported!"));
  navigator.getUserMedia({ audio: true }, function (stream) {
    gotStream(stream);
  }, function () { console.log('Error getting Microphone stream'); });
  if ((typeof MediaStreamTrack === 'undefined') || (!MediaStreamTrack.getSources)) {
    console.log("This browser does not support MediaStreamTrack, so doesn't support selecting sources.\n\nTry Chrome Canary.");
  } else {
    MediaStreamTrack.getSources(gotSources);
  }
}

function gotStream(stream) {
  audioInput = audioContext.createMediaStreamSource(stream);
  outputMix = audioContext.createGain();
  dryGain = audioContext.createGain();
  wetGain = audioContext.createGain();
  effectInput = audioContext.createGain();
  audioInput.connect(dryGain);
  audioInput.connect(effectInput);
  dryGain.connect(outputMix);
  wetGain.connect(outputMix);
  audioOutput = audioContext.createMediaStreamDestination();
  outputMix.connect(audioOutput);
  outputMix.connect(analyser2);
  crossfade(1.0);
  changeEffect();
}

function crossfade(value) {
  var gain1 = Math.cos(value * 0.5 * Math.PI);
  var gain2 = Math.cos((1.0 - value) * 0.5 * Math.PI);
  dryGain.gain.value = gain1;
  wetGain.gain.value = gain2;
}

function createPitchShifter() {
  effect = new Jungle(audioContext);
  effect.output.connect(wetGain);
  effect.setPitchOffset(1);
  return effect.input;
}

function changeEffect() {
  if (currentEffectNode)
    currentEffectNode.disconnect();
  if (effectInput)
    effectInput.disconnect();
  var effect = 'pitch';
  switch (effect) {
    case 'pitch':
      currentEffectNode = createPitchShifter();
      break;
  }
  audioInput.connect(currentEffectNode);
}
I am facing the below error while adding the LocalAudioTrack to a room:
var mediaStream = new Twilio.Video.LocalAudioTrack(audioOutput.stream);
room.localParticipant.publishTrack(mediaStream, {
  name: 'adminaudio'
});
ERROR:
Uncaught (in promise) TypeError: Failed to execute 'addTrack' on 'MediaStream': parameter 1 is not of type 'MediaStreamTrack'.
Twilio developer evangelist here.
There is nothing within Twilio itself that pitch shifts voices.
If you are building this in a browser, then you could use the Web Audio API to take the input from the user's microphone and pitch shift it, then provide the resultant audio stream to the Video API instead of the original mic stream.
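To make that concrete, here is a rough sketch of the wiring (pitchShiftNode is a placeholder for whatever pitch-shifting AudioNode you use, such as the Jungle effect above, and room is your connected Twilio room): route the mic through the effect into a MediaStreamAudioDestinationNode, then publish a track built from its output.
// Sketch: mic -> pitch shifter -> MediaStreamAudioDestinationNode -> Twilio track.
async function publishShiftedMic(room, pitchShiftNode) {
  const audioContext = new AudioContext();
  const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });

  const micSource = audioContext.createMediaStreamSource(micStream);
  const destination = audioContext.createMediaStreamDestination();

  micSource.connect(pitchShiftNode);
  pitchShiftNode.connect(destination);

  // LocalAudioTrack expects a MediaStreamTrack, not a MediaStream;
  // that mismatch is what the error in the question is complaining about.
  const shiftedTrack = new Twilio.Video.LocalAudioTrack(
    destination.stream.getAudioTracks()[0]
  );
  await room.localParticipant.publishTrack(shiftedTrack);
}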
The comments in the above answer are SO helpful! I've been researching this for a couple of weeks, posted to twilio-video.js to no avail, and finally just the right phrasing pulled this up on S.O.!
To summarize, and to add what I've found to work (since it's hard to follow all 27 questions/comments/code excerpts):
when connecting to Twilio:
const room = await Video.connect(twilioToken, {
  name: roomName,
  tracks: localTracks,
  audio: false, // if you don't want to hear the normal voice at all, you can hide this and add the shifted track upon participant connections
  video: true,
  logLevel: "debug",
}).then((room) => {
  return room;
});
upon a new (remote) participant connection:
// Note: source, audioOutput, pitchVal, freqVal and gainVal are defined elsewhere in this setup.
const stream = new MediaStream([audioTrack.mediaStreamTrack]);
const audioContext = new AudioContext();
const audioInput = audioContext.createMediaStreamSource(stream);
source.disconnect(audioOutput);
console.log("using PitchShift.js");
var pitchShift = PitchShift(audioContext);
if (isFinite(pitchVal)) {
  pitchShift.transpose = pitchVal;
  console.log("gain is " + pitchVal);
}
pitchShift.wet.value = 1;
pitchShift.dry.value = 0.5;
try {
  audioOutput.stream.getAudioTracks()[0]?.applyConstraints({
    echoCancellation: true,
    noiseSuppression: true,
  });
} catch (e) {
  console.log("tried to constrain audio track " + e);
}
var biquadFilter = audioContext.createBiquadFilter();
// Create a compressor node
var compressor = audioContext.createDynamicsCompressor();
compressor.threshold.setValueAtTime(-50, audioContext.currentTime);
compressor.knee.setValueAtTime(40, audioContext.currentTime);
compressor.ratio.setValueAtTime(12, audioContext.currentTime);
compressor.attack.setValueAtTime(0, audioContext.currentTime);
compressor.release.setValueAtTime(0.25, audioContext.currentTime);
//biquadFilter.type = "lowpass";
if (isFinite(freqVal)) {
  biquadFilter.frequency.value = freqVal;
  console.log("gain is " + freqVal);
}
if (isFinite(gainVal)) {
  biquadFilter.gain.value = gainVal;
  console.log("gain is " + gainVal);
}
source.connect(compressor);
compressor.connect(biquadFilter);
biquadFilter.connect(pitchShift);
pitchShift.connect(audioOutput);
const localAudioWarpedTracks = new Video.LocalAudioTrack(audioOutput.stream.getAudioTracks()[0]);
const audioElement2 = document.createElement("audio");
document.getElementById("audio_div").appendChild(audioElement2);
localAudioWarpedTracks.attach(audioElement2);

Play multiple Audio files on Safari at once

I want to play multiple audio files simultaneously on iOS.
On the click of a button, I create multiple instances of an Audio file and put them into an array.
let audio = new Audio('path.wav');
audio.play().then(() => {
  audio.pause();
  possibleAudiosToPlay.push(audio);
});
After a while I play them all:
possibleAudiosToPlay.forEach(el => {
  el.currentTime = 0;
  el.play();
});
While this plays all the audio files, when a new one begins it stops the old one (on iOS).
Apple's developer guide says this isn't possible at all with HTML5 Audio:
Playing multiple simultaneous audio streams is also not supported.
But can this be achieved with the Web Audio API?
There isn't anything written about it in Apple's developer guide.
Yes, you can with the Web Audio API. You have to create an AudioBufferSourceNode for each one of your audio sources, since each source can be played only once (you can't stop it and play it again).
const AudioContext = window.AudioContext || window.webkitAudioContext;
const ctx = new AudioContext();

const audioPaths = [
  "path/to/audio_file1.wav",
  "path/to/audio_file2.wav",
  "path/to/audio_file3.wav"
];

let promises = [];

// utility function to load an audio file and resolve it as a decoded audio buffer
function getBuffer(url, audioCtx) {
  return new Promise((resolve, reject) => {
    if (!url) {
      reject("Missing url!");
      return;
    }
    if (!audioCtx) {
      reject("Missing audio context!");
      return;
    }
    let xhr = new XMLHttpRequest();
    xhr.open("GET", url);
    xhr.responseType = "arraybuffer";
    xhr.onload = function () {
      let arrayBuffer = xhr.response;
      audioCtx.decodeAudioData(arrayBuffer, decodedBuffer => {
        resolve(decodedBuffer);
      });
    };
    xhr.onerror = function () {
      reject("An error occurred.");
    };
    xhr.send();
  });
}

audioPaths.forEach(p => {
  promises.push(getBuffer(p, ctx));
});

// Once all your sounds are loaded, create an AudioBufferSource for each one and start sound
Promise.all(promises).then(buffers => {
  buffers.forEach(b => {
    let source = ctx.createBufferSource();
    source.buffer = b;
    source.connect(ctx.destination);
    source.start();
  });
});
