Google cloud speech API streaming: RequestObserver.onCompleted() still causes "OUT_OF_RANGE: Exceeded maximum allowed stream duration" error - google-cloud-speech

Using the Google Cloud Speech API's streaming recognition requires the length of a streaming session to be less than 60 seconds. To handle streaming beyond this limitation, we need to split the streaming data into several chunks, using something like single_utterance, for example. When submitting such a chunk, we use RequestObserver.onCompleted() to mark the end of the streaming session. However, it seems like the gRPC threads that were created to handle streaming keep running even after the final results are received, causing the error "io.grpc.StatusRuntimeException: OUT_OF_RANGE: Exceeded maximum allowed stream duration of 65 seconds".
Are there any other mechanisms that could be used to properly terminate the gRPC threads, so that they won't run until the allowed 60-second limit? (It seems we could use SpeechClient.close() or SpeechClient.shutdown() to release all background resources, but that would require recreating the SpeechClient instances again, which would be somewhat heavy.)
Are there any other recommended ways to stream data beyond the 60-second limit, without splitting the stream into several chunks?
Parameters: [Encoding=LINEAR16, Rate=44100]
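For LINEAR16 audio the byte rate is fixed, so the chunk boundaries needed to stay under the 60-second session cap can be computed up front. A minimal sketch of the chunking side of this approach (the 55-second safety margin is my own assumption, not an API requirement; the actual send/onCompleted calls are out of scope here):

```python
# Split a LINEAR16 PCM byte stream into chunks short enough to stay
# under the streaming API's ~60-second session limit.
SAMPLE_RATE = 44100      # Hz, from the question's parameters
BYTES_PER_SAMPLE = 2     # LINEAR16 = 16-bit samples
CHUNK_SECONDS = 55       # assumed safety margin below the 60 s cap

def split_audio(pcm: bytes, chunk_seconds: int = CHUNK_SECONDS):
    """Yield successive byte chunks, each at most chunk_seconds long."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_seconds
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]

# Each chunk would then get its own streaming session, with
# onCompleted() called at the chunk boundary to close that session.
```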

Related

What's the most efficient way to handle quota for the YouTube Data API when developing a chat bot?

I'm currently developing a chat bot for one specific YouTube channel, which can already fetch messages from the currently active livechat. However I noticed my quota usage shooting up, so I took the "liberty" to calculate my quota cost.
My API call currently looks like this: https://www.googleapis.com/youtube/v3/liveChat/messages?liveChatId=some_livechat_id&part=snippet,authorDetails&pageToken=pageTokenIfProvided, which uses up 5 units. I checked this by running one API call and comparing the quota usage before and after (so apologies if this is inaccurate). The response contains pollingIntervalMillis set to 5086 milliseconds. My bot adds that interval to the current datetime and schedules the next fetch at that time (using Celery), so it currently fetches messages at a rate of one every 4-6 seconds. I'm going to take the liberty of always waiting 6 seconds.
Calculating my API quota would result in a usage of 72,000 units per day:
10 requests per minute * 60 minutes * 24 hours = 14,400 requests per day
14,400 requests * 5 units per request = 72,000 units per day
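The arithmetic above can be double-checked in a few lines (all figures taken from the question):

```python
REQUESTS_PER_MINUTE = 10   # one fetch every ~6 seconds
UNITS_PER_REQUEST = 5      # observed cost of liveChat/messages
DAILY_QUOTA = 10_000       # default daily quota cited in the question

requests_per_day = REQUESTS_PER_MINUTE * 60 * 24       # 14,400
units_per_day = requests_per_day * UNITS_PER_REQUEST   # 72,000

# How long until the default quota is exhausted at this rate?
minutes_to_exhaust = DAILY_QUOTA / (REQUESTS_PER_MINUTE * UNITS_PER_REQUEST)
print(units_per_day)       # 72000
print(minutes_to_exhaust)  # 200.0 minutes = 3 h 20 min
```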
This means that if I used pollingIntervalMillis as a guideline for how often to request, I'd reach the maximum quota of 10,000 units after running the bot for just 3 hours and 20 minutes. In order to not use up the quota just by fetching chat messages, I could make only about 1 API call per minute (approximately 1.3889). That is completely infeasible for a chatbot, since this is only for fetching messages, not even sending any messages to the chat.
So my question is: Is there maybe a more efficient way to fetch chat messages which won't use up the quota so much? Or will I only get this resolved by applying for a quota extension? And if this is only resolved by a quota extension, how much would I need to ask for reliably? Around 100k units? Even more?
I am also asking myself how something like Streamlabs Chatbot (previously known as AnkhBot) accomplishes this without hitting the quota limit, despite thousands of users using their API client. Their quota must be in the millions or billions.
And another question would be how I'd actually fill out the form, if the bot is still in this "early" state of development?
You pretty much hit the nail on the head. Services like Streamlabs are owned by larger companies, in their case Logitech. They not only have the money to throw around for things like increasing their API quota, but they also have professional relationships with companies like Google to decrease their per unit cost.
As for efficiency, the API costs are easily found in the documentation, but for live chat, as you've found, you're going to be paying 5 units per hit. The only way to reduce your overall daily cost is to perform the calls less frequently. While once per minute is clearly too long an interval, once every 15-18 seconds could reduce your overall quota consumption while keeping the chat bot adequately responsive.
Of course that all depends on your desired usage of the data, but it's still a reasonable approach if the bot remains in the realm of hobbyist usage.
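One way to reconcile the server's pollingIntervalMillis with a lower quota burn is to take the larger of the suggested interval and a budget-driven floor. A small sketch (the 15-second floor follows the answer's suggestion; the function name is my own):

```python
def next_poll_delay(polling_interval_millis: int, floor_seconds: float = 15.0) -> float:
    """Return the delay in seconds before the next liveChat/messages fetch.

    Honors the server-suggested pollingIntervalMillis, but never polls
    faster than the quota-driven floor.
    """
    suggested = polling_interval_millis / 1000.0
    return max(suggested, floor_seconds)

# The server suggests ~5 s, but the 15 s quota floor wins:
print(next_poll_delay(5086))   # 15.0
# If the server ever asked for a longer wait, we'd honor it:
print(next_poll_delay(20000))  # 20.0
```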

Youtube API liveChat messages Quota Limit way too low

I am implementing an application that should react to live chat messages.
In a first test I simply called https://www.googleapis.com/youtube/v3/liveChat/messages to get the chats and author details (parameter part=snippet,authorDetails). The response indicates pollingIntervalMillis: 1000, so in theory I could call the API every second. However, my little manual testing (without automated polling) already used a stunning 120 units out of the 10,000 daily limit.
Am I doing something wrong?
Will I get lower usage if I specify something like a fields parameter to only ask for the needed data (message text + author name)?
I have the feeling I've missed some important way to cut down the quota usage. Or is this simply not a supported use case?
Is there something like a streaming API that pushes new messages to my server?
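On the fields question: as far as I know, the fields parameter only trims the response payload and does not change the per-call quota cost, but building the request with it is straightforward. A sketch (the chat ID is a placeholder, and the exact fields expression is my own choice):

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://www.googleapis.com/youtube/v3/liveChat/messages"

def live_chat_url(live_chat_id: str, page_token: Optional[str] = None) -> str:
    """Build a liveChat/messages URL restricted to message text + author name."""
    params = {
        "liveChatId": live_chat_id,
        "part": "snippet,authorDetails",
        # fields trims the payload to just what we need; to my knowledge
        # it does not lower the 5-unit quota cost of the call.
        "fields": "items(snippet/displayMessage,authorDetails/displayName),"
                  "pollingIntervalMillis,nextPageToken",
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{BASE}?{urlencode(params)}"
```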

Google Speech API Won't Accept Large Audio Files

I'm receiving a server error when trying to process large audio files. The audio files are originally audio/m4a @ 32 kHz and, per the recommendations of the documentation, I am converting/compressing them to audio/amr_wb @ 16 kHz. These files are well below the 180-minute audio limit, yet I'm still receiving a server error when processing them.
GaxError Exception occurred in retry method that was not classified as transient, caused by 8:Received message larger than max (5371623 vs. 4194304)
I'm using version V1p1beta and the method long_running_recognize to transcribe these audio files. My files are hosted on Google Cloud Storage and I'm providing the uri in my api call.
How can I send large audio files to the API without the server enforcing a size restriction? It seems wrong to recommend using FLAC or WAV and advertise a 180-minute audio length limit if the server can't even handle my hour-long audio file that has been encoded to AMR_WB.
Thanks for any help.
The Speech-to-Text API has now released the v1 endpoint; I suggest trying that version. I was able to get a proper response using a 90-minute audio file.

Throttling of OneNote (Graph) API

We have developed an importing solution for one of our clients. It parses and converts data contained in many OneNote notebooks, to required proprietary data structures, for the client to store and use within another information system.
There is a substantial amount of data across many notebooks, requiring a considerable number of Graph API queries to retrieve it all.
In essence, we built a bulk-importing (batch process, essentially) solution, which goes through all OneNote notebooks under a client's account, parses sections and pages data of each, as well as downloads and stores all page content - including linked documents and images. The linked documents and images require the most amount of Graph API queries.
When performing these imports, the Graph API throttling issue arises. After a certain time, even though we are sending queries at a relatively low rate, we start getting 429 errors.
Regarding data volume, the average section size of a client notebook is 50-70 pages. Each page contains links to about 5 documents for download, on average. Thus, it requires up to 70 + 350 = 420 requests to retrieve all the page content and files of a single notebook section. Our client has many such sections in a notebook, and in turn there are many notebooks.
In total, there are approximately 150 such sections across several notebooks that we need to import for our client. Considering the stats above, this means that our import needs to make a total of 60000-65000 Graph API queries, estimated.
To not flood the Graph API service and keep within the throttling limits, we have experimented a lot and gradually decreased our request rate to be just 1 query for every 4 seconds. That is, at max 900 Graph API requests are made per hour.
This already makes each section import noticeably slow - but it is endurable, even though it means that our full import would take up to 72 continuous hours to complete.
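Plugging the numbers above into a quick calculation (all figures from the question) confirms the estimate:

```python
PAGES_PER_SECTION = 70    # upper bound from the question
DOCS_PER_PAGE = 5         # average linked documents per page
SECTIONS = 150
SECONDS_PER_REQUEST = 4   # the self-imposed 1-query-per-4-seconds throttle

requests_per_section = PAGES_PER_SECTION + PAGES_PER_SECTION * DOCS_PER_PAGE  # 70 + 350 = 420
total_requests = requests_per_section * SECTIONS                              # 63,000
total_hours = total_requests * SECONDS_PER_REQUEST / 3600                     # 70 hours

print(total_requests, total_hours)  # 63000 70.0
```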
However, even with our throttling logic at this rate implemented and proven working, we still get 429 "too many requests" errors from the Graph API after about 1 hr 10 min, roughly 1100 consecutive queries. As a result, we are unable to proceed with our import on all remaining, unfinished notebook sections. This leaves us able to import only a few sections consecutively, then having to wait some random while before we can manually attempt to continue the import.
So this is the problem we seek help with, especially from Microsoft representatives. Can Microsoft provide a way for us to perform this import of 60-65K pages plus documents at a reasonably fast query rate, without getting throttled, so we could just get the job done in a continuous batch process for our client? For example, as a separate access point (dedicated service endpoint), perhaps time-constrained, e.g. configured for our use within a certain period, so that we could perform all the necessary imports within that period?
For additional information - we currently load the data using the following Graph API URL-s (placeholders of actual different values are brought in uppercase letters between curly braces):
Pages under the notebook section:
https://graph.microsoft.com/v1.0/users/{USER}/onenote/sections/{SECTION_ID}/pages?...
Content of a page:
https://graph.microsoft.com/v1.0/users/{USER}/onenote/pages/{PAGE_ID}/content
A file (document or image), e.g. a link from the page content:
https://graph.microsoft.com/v1.0/{USER}/onenote/resources/{RESOURCE_ID}/$value
Which call is most likely to cause the throttling?
What can you retrieve before being throttled: just page IDs (150 calls total), or page IDs + content (10,000 calls)? If the latter, can you store the results (e.g. in an SQL database) so that you don't have to make those calls again?
If you can get page IDs + content, can you then access the resources using preAuthenticated=true? (Maybe this is less likely to be throttled.) I don't actually download images for offline use, as I usually deal with ink or print.
I find the OneNote API is very sensitive to multiple calls made without waiting for them to complete; more than 12 simultaneous calls via a curl multi technique becomes problematic. Once you get throttled, if you don't back off immediately you can be throttled for a long, long time. I usually have my scripts bail if I get too many 429s in a row (mine are set to bail for 10 minutes after 10 consecutive 429s).
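The back-off discipline described above can be sketched as a small retry wrapper. The fetch function is injected, so the logic is independent of any particular HTTP client; the names and limits here are my own, not Graph API constants:

```python
import time

def fetch_with_backoff(fetch, url, max_429s=10, cooldown_seconds=600, sleep=time.sleep):
    """Call fetch(url) until it succeeds, backing off immediately on 429.

    fetch must return a (status_code, body) pair. After max_429s consecutive
    429 responses, sleep for a long cooldown before trying again, mirroring
    the "bail for 10 minutes" approach described above.
    """
    consecutive_429s = 0
    while True:
        status, body = fetch(url)
        if status != 429:
            return body
        consecutive_429s += 1
        if consecutive_429s >= max_429s:
            sleep(cooldown_seconds)  # bail for a long while, then resume
            consecutive_429s = 0
        else:
            # Back off right away; the wait grows with each repeated 429.
            sleep(min(2 ** consecutive_429s, 60))
```

The `sleep` parameter exists so the wrapper can be exercised without real delays; in production the default `time.sleep` applies.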
We now have the solution released and working in production. It turns out that adding ?preAuthenticated=true to the page requests does indeed return the page content with resource links (for contained documents and images) in a different format. Querying these resource links then appears not to impact the API throttling counters, as we've had no 429 errors since.
We even managed to bring the call interval down from 4 seconds to 2 without any problems, so I have marked codeeye's answer as the accepted one.

how to process a HTTP stream with Dart

cross posted from dartisans G+ where I got no response so far:
Q: how to do (async) simultaneous stream downloads.
Hi, I'm still learning Dart and, for training, I'd like to create a web page from which I can fetch data from 1 to 10 URLs that are HTTP streams of binary data. Once I've got a chunk of data from each stream, simultaneously, I perform a computation and proceed to the next chunks, and so on, ad lib. I need parallelism because the client has much more network bandwidth than the servers.
Also, I do not want to fully download each URL; they're too big to fit in memory or even on local storage. Actually, it's pretty similar to video streaming, but it's binary data rather than video, and instead of displaying the data I just do some computation, on many streams at a time.
Can I do that with Dart, and how? Do dart:io or dart:async have the classes I can use to do that? Do I need to use web workers to spawn 1 to 10 simultaneous HTTP requests?
Any pointers/advice/similar samples would be greatly appreciated.
tl;dr: how to process an HTTP stream of data chunk by chunk, and how to parallelize this to process many streams at the same time.
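I can't speak to the exact Dart classes, but the chunk-by-chunk, many-streams-at-once pattern itself looks like this (sketched in Python with asyncio purely for illustration; the simulated streams stand in for real HTTP response bodies):

```python
import asyncio

async def fake_stream(chunks):
    """Stand-in for an HTTP response body delivering binary chunks."""
    for chunk in chunks:
        await asyncio.sleep(0)   # yield control, as real network I/O would
        yield chunk

async def consume(name, stream, results):
    """Process one stream chunk by chunk, never buffering the whole body."""
    total = 0
    async for chunk in stream:
        total += sum(chunk)      # placeholder for the real computation
    results[name] = total

async def main():
    results = {}
    streams = {
        "a": fake_stream([b"\x01\x02", b"\x03"]),
        "b": fake_stream([b"\x04", b"\x05\x06"]),
    }
    # All streams are consumed concurrently, one chunk at a time each.
    await asyncio.gather(*(consume(n, s, results) for n, s in streams.items()))
    return results

print(asyncio.run(main()))   # {'a': 6, 'b': 15}
```

Dart's Stream/StreamSubscription and Future.wait play roughly the roles asyncio's async iterators and gather play here, so the same shape should translate.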
