Is it possible to get a best-guess for the entire input-audio from Google Cloud Speech? - google-cloud-speech

We're running into an issue trying to use google cloud speech (GCS) for audio-indexing purposes. We've tried two different setups:
A single audio-file containing multiple speakers (high SNR, only
speech + silence) is sent to GCS.
The audio-file is split into separate speakers, the segments concatenated,
and one audio-file per speaker is sent to GCS.
The problem is that large parts (~22%) of the speech doesn't get any output hypotheses regardless of setup (1 or 2 above).
The documentation states that "If the Speech API determines that an alternative has a sufficient Confidence Value, then that alternative is included in the response." Is this also true for the best hypothesis (that it's only included if the confidence is high enough) – and is that why parts of the speech is missing?
And the actual question as per the title: Is it possible to get a best-guess for the entire input-audio from Google Cloud Speech?

Related

What triggers an intermediate result in the streaming API of Google Cloud Speech?

It seems to be a specific number of seconds of silence, but I haven't found a definitive answer on this.
This detail is not publicly available in the documentation. Since the interim_results are used on streaming requests to get partial translation during the streaming, my guess is that it is not a straight-forward rule but most likely is related to the ML model used, so it would be hard to identify the exact triggers for every different situation.
Nonetheless, I'm wondering why you would need to know the relation between what's ocurring with the audio and interim result during a streaming request. For example, if you want to split the requests and translations for every time the speaker pause, you could use single_utterance=true for every request.

Applying for Additional Quota for YouTube API as an Individual (without business info)

I recently began using the Youtube Data v3 API for a program that I'm writing which is purely for personal use. To give a brief summary of what it does, it checks the the live chat from my most recent (usually ongoing) livestream and performs actions based on certain keywords entered in chat (essentially commands for people to use from live chat). In order to do that, however, I have to constantly send requests to get a refreshed livechat. As it is now, it sends requests on 1 second intervals. I recently did a livestream to test out my program and it only took about 25 minutes for me to reach the daily quota limit of 10,000 units/day.
The request is:youtube.liveChatMessages().list(liveChatId=liveChatId,part="snippet")
It seems like every request I make costs 6 units, according to the math. I want to be able to host livestreams at lengths of up to 3 hours, which would require a significant quota increase. I'm aware that there is an option to fill out a form to request additional quota. However, it asks for business information such as a business name, business website, business mailing address, etc. Like I said before, I'm doing this for my own use only. I'm in no way part of a business, and just made my program as a personal project. Does anyone know if there's any way to apply for additional quota as an individual/hobbyist? If not, do you think just putting n/a in those fields would be acceptable? I did find another post where someone else had the exact same problem, but no one was able to give a helpful answer. Any advice would be greatly appreciated.
Unfortunately, and although only related, it seems as Google is for the money here. I also tried to do something similar myself (a very basic chat bot just reading the chat messages), and, although some other users on the net got some different results, they all have in common that, according to the doc how it should be done, all poll at this interval of about once a second (that's the timeout one get as part of the answer to a poll for new messages). I, along with a few others, got as most as about 5 minutes with polling once a second, some others, like you, got a few more minutes out of it. I changed the interval by hand in incrementing intervals of 5 seconds each: 5, 10, 15, etc... you get the picture. I can't remember on which value I finally tuned in, but I was only able to get about 2 1/2 hours worth with a rather long polling interval of just once every 10 seconds or so - still way enough for a simple chat bot just reading the chat. But also replying would had at least doubled the usage and hence halfed the time.
It's already a pain to get it working as an idividual as just setting up the required OAuth authentication requires one to at least provide basic information like providing a fixed callback and some legal and policy information. I always ended up in had it rejected with this standard reply "Your project seem to be for internal use only.". I even was able to got this G suite working (before it required payment) to set up an "internal" project (only possible if account belongs to a G suite organization account), but after I set up the OAuth login I got the error that my private account I wanted to use the bot on was not part of the organization and hence can't be used. TLDR: Just useless waste of time.
As far as I'm in for this for several months now there's just no way to get it done as a private individual for personal use. Yes, one can just set it up and have the required check rejected (as it uses the YouTube data API scopes), but one still stuck with that 10.000 units / day quota. Building your own powerful tool capable of doing more than just polling once every 10 to 30 seconds with just a minimum of interaction doesn't get you any further than just a few minuts, maybe one or two hours if you're lucky. If you want more you have to set up a business and pay for it - simple and short: Google wants you to pay for that service.
As Mixer got officially announced to be shut down on July 22nd you have exactly these two options:
Use one of the public available services like Streamlabs, Nightbot, etc ... They're backed by their respective "businesses" and by it don't seem to have those quota limits (although I just found some complaints on Streamlabs just from April - so about one month prior to when you posted this question where they admitted to had reached their limits - don't know if they already got it solved).
Don't use YouTube for streaming but rather Twitch - as Twitch doesn't have these limits and anybody is free to set up an API token either on the main account or on a second bot account (which is also explicitly explained in their docs). The downside of this are of course the objective sacrifices one has to suffer: a) viewers only have the quality of the streamer until one reaches at least affiliate b) caped at max 1080p60 with only 6.000kBit/s c) only short time of VOD storage
I myself wanted to use YouTube as my main platform (and currently do, but without my own stuff at the moment) and my own bot stuff and such as streaming on YouTube has some advantages over Twitch, but as YouTube wants me to pay what others (namely: Twitch) offer me for free (although overall not as good quality) it's an easy decision to make. Mixer looked promissing, as it also offered quite some neat features (overall better quality than Twitch, lower latency), but the requirements to get partner status were so high (2.000 followers along with another insane high number to reach) and Mixer itself just so little of a platform (I made the fun to count all the streamers and viewers - only a few hundred streamers with just a few 10.000s viewers the whole platform had less than some big Twitch channels on their own) - and now it's announced soon to be dead anyway.
Hope this may give you some input into what a small streamer has to consider and suffer from when chosing a platform - but after all what I experienced I have these information: Either do it like all the others: Stream on Twitch and use YouTube as an archive to export to from Twitch (although Twitch STILL doesn't have an auto-export of the latest VOD implemented - but I guess that could be done by some small script) - or if you want to stay on YouTube use some existing bot like Nightbot or any of the other services like Streamlabs.
If you get any other information on how to convince Google to increase the limit as an individual please let us know.

Throttling of OneNote (Graph) API

We have developed an importing solution for one of our clients. It parses and converts data contained in many OneNote notebooks, to required proprietary data structures, for the client to store and use within another information system.
There is substantial amount of data across many notebooks, requiring a considerable amount of Graph API queries to be performed, in order to retrieve all of the data.
In essence, we built a bulk-importing (batch process, essentially) solution, which goes through all OneNote notebooks under a client's account, parses sections and pages data of each, as well as downloads and stores all page content - including linked documents and images. The linked documents and images require the most amount of Graph API queries.
When performing these imports, the Graph API throttling issue arises. After certain time, even though we are sending queries at a relatively low rate, we start getting the 429 errors.
Regarding data volume, average section size of a client notebook is 50-70 pages. Each page contains links to about 5 documents for download, on average. Thus, it requires up to 70+350 requests to retrieve all the pages content and files of a single notebook section. And our client has many such sections in a notebook. In turn, there are many notebooks.
In total, there are approximately 150 such sections across several notebooks that we need to import for our client. Considering the stats above, this means that our import needs to make a total of 60000-65000 Graph API queries, estimated.
To not flood the Graph API service and keep within the throttling limits, we have experimented a lot and gradually decreased our request rate to be just 1 query for every 4 seconds. That is, at max 900 Graph API requests are made per hour.
This already makes each section import noticeably slow - but it is endurable, even though it means that our full import would take up to 72 continuous hours to complete.
However - even with our throttling logic at this rate implemented and proven working, we still get 429 "too many requests" errors from the Graph API, after about 1hr 10mins, about 1100 consequtive queries. As a result, we are unable to proceed our import on all remaining, unfinished notebook sections. This enables us to only import a few sections consequtively, having then to wait for some random while before we can manually attempt to continue the importing again.
So this is our problem that we seek help with - especially from Microsoft representatives. Can Microsoft provide a way for us to be able to perform this importing of these 60...65K pages+documents, at a reasonably fast query rate, without getting throttled, so we could just get the job done in a continuous batch process, for our client? In example, as either a separate access point (dedicated service endpoint), perhaps time-constrained eg configured for our use within a certain period - so we could within that period, perform all the necessary imports?
For additional information - we currently load the data using the following Graph API URL-s (placeholders of actual different values are brought in uppercase letters between curly braces):
Pages under the notebook section:
https://graph.microsoft.com/v1.0/users/{USER}/onenote/sections/{SECTION_ID}/pages?...
Content of a page:
https://graph.microsoft.com/v1.0/users/{USER}/onenote/pages/{PAGE_ID}/content
A file (document or image) eg link from the page content:
https://graph.microsoft.com/v1.0/{USER}/onenote/resources/{RESOURCE_ID}/$value
which call is most likely to cause the throttling?
What can you retrieve before throttling - just pageids (150 calls total) or pageids+content (10000 calls)? If the latter can you store the results (eg sql database) so that you don't have to call these again.
If you can get pageids+content can you then access the resources using preAuthenticated=true (maybe this is less likely to be throttled). I don't actually offline images as I usually deal with ink or print.
I find the onenote API is very sensitive to multiple calls without waiting for them to complete, I find more than 12 simultaneous calls via a curl multi technique problematic. Once you get throttled if you don't back off immediately you can be throttled for a long, long time. I usually have my scripts bail if I get too many 429 in a row (I have it set for 10 simultaneous 429s and it bails for 10 minutes).
We now have the solution released & working in production. Turns out that indeed adding ?preAuthenticated=true to the page requests returns the page content having resource links (for contained documents, images) in a different format. Then, as it seems, querying these resource links will not impact the API throttling counters - as we've had no 429 errors since.
We even managed to bring the call rate down to 2 seconds from 4, without any problems. So I have marked codeeye's answer as the accepted one.

Mixing streams concept in kurento media server

Can anybody explain what is the basic concept in mixing in Kurento media server?
As it is mentioned in what kurento provides, there is a term mixing. So, I would like to know what kurento Media server mixes. As,
Do it mix multi stream generated by a user into one stream and broadcast that stream to other receiving user? If it does this how to use this concept
Do kurento able to receive multi-streams through one PeerConnection object with user, i.e., at one WebRtcEndPoint Kurento can receive or send multi stream by mixing those streams into one stream?
Edit Regarding Answer Update
So, I can use mixing concept by using Hubport.
Now, do this HubPort supports different MediaTypes. As, if one user is streaming its screen sharing and at the same time he is streaming its audio also. So, do this composite element mix both the streams to one and stream one single stream to all other users?
The concept of mixing refers to combining several media streams into one. This can be better understood with a conference room. In other setups, every user would have one stream going out, and another coming in for each other participant (except himself). That leaves you with 1 + (n -1) = n streams per participant. This results in n * n streams total, where n is the number of participants.
Mixing all streams in the media server allows you to save bandwidth, ideal in scenarios like mobile devices connected through 3G, for instance. What the mixer does, it combines all the streams into one, so each user is sending one stream, and receiving one stream that has all the combined participant's media (except his own). So just two streams per user saves a lot of bandwidth.
This, however, has a toll on CPU consumption, as it's necessary to adapt the videos to the new resolution, combine them... there is some processing involved.
On the other hand, the concept you are referring to is multicast, which is the ability to send several streams through one WebRTC connection. This doesn't save bandwidth, nor combines all the streams into one, but helps you reduce the number of endpoints present in your deployment. this is in our roadmap, but can't tell you when that'll be.
EDIT
Mixing can be achieved in the media server through the Composite media element. You can check this other SO answer for more info on how to use that media element.

iOS streaming audio from network -- random access of a 6-hour file

A potential client has come to me asking for a an app which will stream a six hour audio file. The user needs to be able to set the "playback head" to any position along the file. Presumably, this means that the app must not be forced to download the entire file before it beings playing back starting at an arbitrary
An added complication -- there are actually four files which need to be streamed and mixed simultaneously.
My questions are:
1) Is there an out-of-the box technology which will allow me random access of streaming audio, on iOS? Can this be done with standard server technology and a single long file, or will it involve some fancy server tech?
2) Which iOS framework is best suited for this. Is there anything high-level that would allow me to easily mix these four audio files?
3) Can this be done entirely with standard browser technology on the client side? (i.e. HTML5)
Have a close look at the MP3 format. It is remarkably easy and efficient to parse, chop up into little bits, and reassemble into a custom stream.
Hence rolling your own server-side code to grab what you want and send to the client will not be as crazy or difficult as it may sound.
MP3 is also widely supported by various clients. I strongly suspect any HTML5 capable browser will be able of play the stream you generate via a long-lived bit-rate regulated HTTP request.

Resources