I'm using Google STT with PyAudio for a smart speaker (in Python).
When single_utterance is set to True, STT finalizes the moment a sentence ends; when it is set to False, input seems to be accepted for 5 minutes, which is the maximum streaming length.
However, I want to limit the streaming length to 1 minute. How can I do that?
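One approach (a sketch only; the request-generator pattern follows the google-cloud-speech streaming examples, and the chunk source here is hypothetical) is to stop yielding audio once a deadline passes; streaming_recognize ends as soon as its request generator is exhausted:

```python
import time

def limit_duration(chunks, max_seconds=60.0, clock=time.monotonic):
    """Wrap an audio-chunk iterator so it stops after max_seconds.

    Feeding this wrapped iterator into the streaming request generator
    ends the stream at the 1-minute mark instead of the 5-minute cap.
    """
    deadline = clock() + max_seconds
    for chunk in chunks:
        if clock() >= deadline:
            return
        yield chunk
```

You would then build your streaming requests from `limit_duration(mic_chunks(), 60.0)` instead of the raw microphone generator.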
I'm trying to demodulate a signal using GNU Radio Companion. The signal is FSK (frequency-shift keying), with mark and space frequencies at 1200 and 2200 Hz, respectively.
The data in the signal is text data generated by a device called GeoStamp Audio. The device generates audio from GPS data fed into it in real time, and it can also decode that audio. I have the decoded text version of the audio for reference.
I have set up a flow graph in GNU Radio (see below), and it runs without error, but with all the variations I've tried, I still can't get the data.
The output of the flow graph should be binary (1s and 0s) that I can later convert to normal text, right?
Is it correct to feed in a wav audio file the way I am?
How can I recover the data from the demodulated signal -- am I missing something in my flow graph?
This is an FFT plot of the wav audio file before demodulation:
This is the result of the scope sink after demodulation (maybe looks promising?):
UPDATE (August 2, 2016): I'm still working on this problem (occasionally), and unfortunately still cannot retrieve the data. The result is a promising-looking string of 1's and 0's, but nothing intelligible.
If anyone has suggestions for figuring out the settings on the Polyphase Clock Sync or Clock Recovery MM blocks, or the gain on the Quad Demod block, I would greatly appreciate it.
Here is one version of an updated flow graph based on Marcus's answer (also trying other versions with polyphase clock recovery):
However, I'm still unable to recover data that makes any sense. The result is a long string of 1's and 0's, but not the right ones. I've tried tweaking nearly all the settings in all the blocks. I thought maybe the clock recovery was off, but I've tried a wide range of values with no improvement.
So, at first sight, my approach here would look something like:
What happens here is that we take the input, shift it in the frequency domain so that mark and space sit at ±500 Hz, and then use quadrature demod.
"Logically", we can then just make a "sign decision". I'll share the configuration of the Xlating FIR here:
Notice that the signal is first shifted so that the center frequency (middle between 2200 and 1200 Hz) ends up at 0Hz, and then filtered by a low pass (gain = 1.0, Stopband starts at 1 kHz, Passband ends at 1 kHz - 400 Hz = 600 Hz). At this point, the actual bandwidth that's still present in the signal is much lower than the sample rate, so you might also just downsample without losses (set decimation to something higher, e.g. 16), but for the sake of analysis, we won't do that.
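That shift-then-demod step can be sketched numerically (plain Python, not GNU Radio; the sample rate is an assumption, and complex tones stand in for the filtered analytic signal after the low-pass):

```python
import cmath
import math

SR = 48000                      # assumed sample rate (Hz)
MARK, SPACE = 1200.0, 2200.0    # FSK tone frequencies
CENTER = (MARK + SPACE) / 2.0   # 1700 Hz, mixed down to 0 Hz

def tone(freq, n, sr=SR):
    """n samples of a complex tone at freq Hz."""
    return [cmath.exp(2j * math.pi * freq * i / sr) for i in range(n)]

def quad_demod(samples):
    """Mix down by CENTER, then take the phase step between consecutive
    samples: the instantaneous frequency, as in Quadrature Demod."""
    shifted = [s * cmath.exp(-2j * math.pi * CENTER * i / SR)
               for i, s in enumerate(samples)]
    return [cmath.phase(shifted[i] * shifted[i - 1].conjugate())
            for i in range(1, len(shifted))]

# Sign decision: mark (1200 Hz) lands at -500 Hz (negative phase step),
# space (2200 Hz) at +500 Hz (positive phase step).
```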
The time sink should now show better values. Have a look at the edges; they are probably not extremely steep. For clock sync I'd hence recommend trying the polyphase clock recovery instead of Mueller & Müller; choosing just about any "somewhat round" pulse shape could work.
For fun and giggles, I clicked together a quick demo demod (GRC here):
which shows:
I know YouTube is very closed and doesn't publish any detailed statistics, but I have a specific research interest to find out the length of arbitrary How-To videos.
When I search for that term I get a few million results. Would it be possible to determine the playback duration for portions of the search results? Since usage of the YouTube API is limited, one could grab a few videos per day, perhaps with multiple API keys.
Besides the API, there might be powerful scrapers I could use.
JS browser utility
I'd recommend a simple JS utility you can run in a browser's dev tools. Read here how to use it for counting; I've modified it to total video durations.
The JavaScript code
So open a YouTube search page and open your browser's dev tools (F12 on PCs; Preferences -> Advanced -> Show Develop menu on Mac). Once they are open, go to the Console tab and enter the following code:
function domCounter(selector){
    var nodes = document.querySelectorAll(selector);
    var totalSeconds = 0;
    for (var i = 0; i < nodes.length; i++) {
        // "4:32" is min:sec, "1:02:33" is h:min:sec -- parse from the right
        var parts = nodes[i].innerHTML.trim().split(':');
        var seconds = 0;
        for (var j = 0; j < parts.length; j++) {
            seconds = seconds * 60 + parseInt(parts[j], 10);
        }
        totalSeconds += seconds;
    }
    // total duration in hours, rounded
    return Math.round(totalSeconds / 3600);
}
How to find the CSS selector
To call it in the browser console, just run:
domCounter('span.video-time')
Disclaimer
This utility only works on a single page of search results, though. You might improve it to traverse pagination.
You won't be able to get the durations of videos returned from the search endpoint without looking up the duration of each one.
The search endpoint does, however, provide a videoDuration parameter you can pass in your request to only return videos of a specific duration range:
The videoDuration parameter filters video search results based on
their duration. If you specify a value for this parameter, you must
also set the type parameter's value to video.
Acceptable values are:
any – Do not filter video search results based on their duration. This is the default value.
long – Only include videos longer than 20 minutes.
medium – Only include videos that are between four and 20 minutes long (inclusive).
short – Only include videos that are less than four minutes long.
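To get exact lengths you still have to look up each result via the videos endpoint with part=contentDetails; the duration comes back as an ISO 8601 string such as "PT4M13S". A minimal parser for that format (a sketch assuming only hour/minute/second components, which is what YouTube returns for ordinary videos):

```python
import re

def iso8601_duration_to_seconds(dur):
    """Convert a contentDetails.duration like 'PT1H2M10S' to seconds.

    Days/weeks components are not handled here.
    """
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", dur)
    if not m:
        raise ValueError("unsupported duration: %r" % dur)
    h, mnt, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mnt * 60 + s
```

Summing these per batch of API results gives the totals you're after without relying on scraping.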
I am working on a Chromecast-based application using the Google Cast API. Initially, I join the Chromecast session from YouTube and play a video. Later, I join the same session from my application.
My application needs to mute the audio during a particular interval of time.
I need to mute the audio from 00:01:34:03 (hh:mm:ss:ms) to 00:01:34:15 (hh:mm:ss:ms).
I convert the time to seconds in the following way.
Time to seconds conversion: (00*60*60)+(01*60)+34+(03/1000) = 94.003 -> Mute start time
I then call the mute method after an interval of: mute start time - current streaming position.
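The conversion above, written as a small helper (Python standing in for the arithmetic only; the Cast calls themselves are Objective-C):

```python
def timestamp_to_seconds(ts):
    """'hh:mm:ss:ms' -> seconds, e.g. '00:01:34:03' -> 94.003."""
    hh, mm, ss, ms = (int(p) for p in ts.split(":"))
    return hh * 3600 + mm * 60 + ss + ms / 1000.0

# delay before muting = timestamp_to_seconds(start) - approximateStreamPosition
```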
I am using the approximateStreamPosition value (in the GCKMediaControlChannel header file) to know the stream position of the casting video. It returns a double value, e.g. 94.70801001358.
Here 94 is the whole number of seconds; what does the value after the decimal point (.70801001358) indicate? Is it milliseconds? If so, can I round it to three digits?
Since I need to mute the audio with millisecond precision, will rounding off the value cause the mute to happen early or late?
The 0.70801001358 is in seconds; I am not sure what you mean by asking if that is in milliseconds. In milliseconds, that number would be 708.01001358.
You won't be able to get millisecond accuracy in controlling mute (or any other control command, for that matter); just setting up a command, plus the transfer time from your iOS device to the Chromecast, will throw your calculations off by a good number of milliseconds.
I would like to create a DSP plugin which takes an input of 8 channels (the 7.1 speaker mode), does some processing, then writes the result to 2 output channels. My plan was to use setSpeakerMode with FMOD_SPEAKERMODE_7POINT1 and set FMOD_DSP_DESCRIPTION.channels to 2, but that didn't work; both in and out channel counts showed as 2 in my FMOD_DSP_READCALLBACK function.
How can I do this?
You cannot perform a true downmix in FMOD Ex using the DSP plugin interface. The best you can do is process the incoming 8-channel data, then fill just the front-left and front-right parts of the output buffer, leaving the rest silent.
Setting the channel count to 2 tells FMOD your DSP can only handle stereo signal, setting the count to 0 means any channel count.
Is there an API in one of the iOS layers that I can use to generate a tone by just specifying its frequency in hertz? What I'm looking to do is generate a DTMF tone. This link explains how DTMF tones consist of 2 tones:
http://en.wikipedia.org/wiki/Telephone_keypad
Which basically means that I should need playback of 2 tones at the same time...
So, does something like this exist:
SomeCleverPlayerAPI(697, 1336);
I've spent the whole morning searching for this, and have found a number of ways to play back a sound file, but nothing on how to generate a specific tone. Does anyone know, please...
Check out the AU (AudioUnit) API. It's pretty low-level, but it can do what you want. A good intro (that probably already gives you what you need) can be found here:
http://cocoawithlove.com/2010/10/ios-tone-generator-introduction-to.html
There is no iOS API to do this audio synthesis for you.
But you can use the Audio Queue or Audio Unit RemoteIO APIs to play raw audio samples, generate an array of samples of 2 sine waves summed (say 44100 samples for 1 seconds worth), and then copy the results in the audio callback (1024 samples, or whatever the callback requests, at a time).
See Apple's aurioTouch and SpeakHere sample apps for how to use these audio APIs.
The samples can be generated by something as simple as:
sample[i] = (short int)(v1*sinf(2.0*pi*i*f1/sr) + v2*sinf(2.0*pi*i*f2/sr));
where sr is the sample rate, f1 and f2 are the 2 frequencies, and v1 + v2 sum to less than 32767.0. You can add rounding or noise dithering to this for cleaner results.
Beware of clicking if your generated waveforms don't taper to zero at the ends.
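The summing and end-tapering above can be sketched like this (Python standing in for the C callback code; the amplitudes and ramp length are assumptions, and 697 + 1336 Hz is the DTMF pair for the "1" key):

```python
import math

def dtmf_samples(f1, f2, n, sr=44100, v1=16000.0, v2=16000.0, ramp=256):
    """n 16-bit samples of two summed sine waves, with a linear
    fade-in/fade-out over `ramp` samples to avoid clicks at the ends."""
    out = []
    for i in range(n):
        s = v1 * math.sin(2.0 * math.pi * i * f1 / sr) \
          + v2 * math.sin(2.0 * math.pi * i * f2 / sr)
        # taper toward zero at both ends of the buffer
        env = min(1.0, i / ramp, (n - 1 - i) / ramp)
        out.append(int(s * env))
    return out

# One second of the DTMF "1" tone:
buf = dtmf_samples(697, 1336, 44100)
```

You would copy slices of `buf` into the audio callback's buffer as it requests them.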