Need help with Speech-to-Text, always fails with too many retries - google-cloud-speech

I use the Google Speech-to-Text API to get subtitles from audio, but when the audio is too long, typically longer than 60 minutes, it fails with too many retries. It says: google.api_core.exceptions.GoogleAPICallError: None Too many retries, giving up.
Can someone help me?
I have tried many times; when the audio file is shorter than about 60 minutes, it works fine.
import time

from google.cloud import speech
from google.cloud.speech import enums, types

client = speech.SpeechClient()

# The audio lives in Cloud Storage and is referenced by its gs:// URI.
audio = types.RecognitionAudio(uri=gcs_uri)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
    sample_rate_hertz=48000,
    language_code='en-US',
    enable_word_time_offsets=True,
    enable_automatic_punctuation=True)

# Detect speech in the audio file (long-running operation).
operation = client.long_running_recognize(config, audio)
print('Waiting for operation to complete...')

# Get the feedback from the Google Cloud API (callback is defined elsewhere).
operation.add_done_callback(callback)
time.sleep(30)
# metadata = operation.metadata

# Every 30 seconds, get back one response (percentile is my own helper).
percentile(operation, 30)
response = operation.result(timeout=None)

This exception is thrown by the operation.result() call, which has an internal retry counter that overflows on long operations.
Try polling operation.done() before calling operation.result(); operation.done() is a non-blocking call.
Hopefully this will be fixed in future releases of the google.cloud.speech library.
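For illustration, a minimal polling sketch based on that suggestion; the 30-second interval is an arbitrary choice, and the metadata comment assumes the operation exposes progress_percent while it is running:

import time

while not operation.done():
    # Optionally inspect progress via the operation metadata, if present:
    # print(operation.metadata.progress_percent)
    time.sleep(30)

response = operation.result()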

Related

Twitter stream APIs, allotted requests for search stream APIs v2

I'm new to the Twitter APIs (this is my first experience), and I'm playing with them to monitor an account for new tweets, opening a web page when that happens, but I'm having some doubts about how the request allotment works.
Not knowing much, the Twitter stream v2 APIs seem to be the ones fitting my use case, and in the Twitter-API-v2-sample-code git repository there is also a very clear filtered stream Node.js example. In fairness, I had little hassle implementing everything, and my code is not much different from the filtered_stream.js source code. Given the provided example, implementation is straightforward: I use https://api.twitter.com/2/tweets/search/stream/rules to set up my rules (an array like [ { 'value': 'from:<myAccount>' } ]) and then I start listening at https://api.twitter.com/2/tweets/search/stream, easy peasy.
What I don't understand is the allotted request count, because per the Twitter documentation I should be able to make 50 requests every 15 minutes, but I can barely make a couple, so every time I test my script I have to wait a few minutes before restarting.
These are the relevant headers I received after restarting a script that had been running for an hour (the status code at restart was 429):
x-rate-limit-limit: 50
x-rate-limit-remaining: 49
Reset time: +15 minutes from current time
I usually don't have to wait 15 minutes; a couple is usually enough. My other note is that I managed to get down to 45 x-rate-limit-remaining once or twice, but never lower than that (usually I'm locked out at 47 / 48).
What I don't understand is: I opened one stream, I closed that one stream, and now I'm locked out for a couple of minutes. Shouldn't I be able to open up to 50 connections in 15 minutes (which is actually plenty if I'm just debugging a portion of code)? Even the headers say that I have 49 attempts remaining out of 50, so the 429 status code seems in plain contradiction with the x-rate-limits... Sometimes I cannot even reset the rules and start the stream in the same run, because the stream returns a backoff (status 429) when the rules resetting finishes (get -> set -> delete)...
I could add my code, but it's literally the Node.js example I cited above, and my problem is not about querying the APIs but rather about not being able to connect for no apparent reason (at least to me). The only thing I can think of is that I use the same Bearer token for all requests (as per their example), but I don't see it written anywhere that this is a problem (I generated it in the developer dashboard; I'm not sure there is an API for that as well).
Edit - adding details
Just to describe my issue, this is the output I get when I start the script the first time:
Headers for stream received (status 200):
- [x-rate-limit-limit]: 50
- [x-rate-limit-remaining]: 49
- [x-rate-limit-reset]: 20/03/2021, 11:05:35
Which makes sense: I made one request, and the remaining count went down by one.
Now I stopped it and ran it again immediately after (Ctrl + C, run again, let's say a two-second delay), and this is the new output:
Headers for stream received (status 429):
- [x-rate-limit-limit]: 50
- [x-rate-limit-remaining]: 48
- [x-rate-limit-reset]: 20/03/2021, 11:05:35
With the following exception being returned in the body:
{
  title: 'ConnectionException',
  detail: 'This stream is currently at the maximum allowed connection limit.',
  connection_issue: 'TooManyConnections',
  type: 'https://api.twitter.com/2/problems/streaming-connection'
}
I understand the server takes a bit to realise I disconnected, but don't I have 50 connections available in a 15-minute timeframe? I only opened one connection.
Actually, after the time it took to write all of the above (let's say ten minutes), I was able to connect again, receiving this output:
Headers for stream received (status 200):
- [x-rate-limit-limit]: 50
- [x-rate-limit-remaining]: 47
- [x-rate-limit-reset]: 20/03/2021, 11:05:35
Maybe I'm only realising this now and I wrote a useless question, but can I only have one active connection, which I can close and reopen 50 times in 15 minutes? I understood I could have 50 active connections, but maybe at this point I'm wrong (and the Twitter server indeed takes a few minutes to realise I disconnected).
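For reference, here is a rough sketch of that "wait and reconnect on 429" behaviour in Python (not the asker's Node.js code); the requests library, the placeholder Bearer token, and the 30-second starting wait are my own assumptions:

import time
import requests

STREAM_URL = "https://api.twitter.com/2/tweets/search/stream"
HEADERS = {"Authorization": "Bearer <YOUR_BEARER_TOKEN>"}  # placeholder token

def connect_with_backoff(max_attempts=5):
    # Open the filtered stream, waiting and retrying when the API answers 429
    # because the previous connection has not been released yet.
    wait = 30  # seconds
    for attempt in range(max_attempts):
        resp = requests.get(STREAM_URL, headers=HEADERS, stream=True)
        print("status:", resp.status_code,
              "remaining:", resp.headers.get("x-rate-limit-remaining"))
        if resp.status_code == 200:
            return resp
        resp.close()
        time.sleep(wait)
        wait *= 2
    raise RuntimeError("could not connect to the stream")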

withSessionContinueSeconds method doesn't work in Flurry

I have the following code:
let builder = FlurrySessionBuilder.init()
    .withAppVersion(appVersion)
    .withLogLevel(FlurryLogLevelAll)
    .withCrashReporting(true)
    .withSessionContinueSeconds(20)
I wanted to increase the number of seconds before the session times out because of the nature of the app I am writing. However, the session still times out after 10 seconds rather than the 20 I specified in my code.
How can I fix this?

Get YouTube Live Stream total duration

I want to get the total duration of a current live stream on YouTube (not ended!). This is straightforward for normal videos, e.g.:
https://www.googleapis.com/youtube/v3/videos?part=contentDetails&id={VIDEO_ID}&key={KEY}
However, for live streams such as Earth live from ISS, this just returns a duration of "PT0S", which essentially means it's 0 seconds long.
There should be a way to get it. One way is via JavaScript, if you have the live stream open in your browser:
ytplayer = document.getElementById("movie_player");
console.log("Duration: " + ytplayer.getDuration());
Is it possible to get it server-side? Other people have asked the same but have not received an answer yet.
Edit: I've just noticed the JavaScript way of getting the duration doesn't make any sense. For a live stream that started 2 hours ago, getCurrentTime() correctly returns 2*60*60 = 7200 when at the current time, but getDuration() returns 10245 for whatever reason.
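One server-side workaround worth trying (not confirmed by an answer here) is to request part=liveStreamingDetails instead of contentDetails and compute the elapsed time from actualStartTime yourself. A rough Python sketch, assuming the requests library and your own VIDEO_ID / KEY values:

from datetime import datetime, timezone
import requests

VIDEO_ID = "<VIDEO_ID>"  # placeholder
KEY = "<KEY>"            # placeholder

url = ("https://www.googleapis.com/youtube/v3/videos"
       "?part=liveStreamingDetails&id=" + VIDEO_ID + "&key=" + KEY)
item = requests.get(url).json()["items"][0]

# actualStartTime is an RFC 3339 timestamp, e.g. "2021-03-20T09:00:00Z".
start = item["liveStreamingDetails"]["actualStartTime"]
start = datetime.fromisoformat(start.replace("Z", "+00:00"))
print("Stream has been live for", datetime.now(timezone.utc) - start)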

Is it possible to get audio from an ICY stream with percentage and seek function

I'm trying to play audio from an ICY stream. I'm able to play it with AVPlayer and some good open source libraries, but I'm not able to control the stream. I have no idea how to get the percentage played or how to seek to a specific time in the stream. Is that possible? Is there a good library that can help me?
Currently I'm using AFSoundManager, but I always receive negative numbers for the percentage and an invalid time when trying to seek the stream to a specific time.
This is the code I'm using:
AFSoundManager.sharedManager().startStreamingRemoteAudioFromURL("http://www.abstractpath.com/files/audiosamples/sample.mp3") { (percentage, elapsedTime, timeRemaining, error, poppi) in
    if error == nil {
        // This block will be fired when the audio progress increases by 1%
        if elapsedTime > 0 {
            println(elapsedTime)
            self.slider.value = Float(elapsedTime * 1000)
        }
    } else {
        // Handle the error
        println(error)
    }
}
Of course I'm able to get the elapsedTime, but not the percentage or the timeRemaining; I always get negative numbers.
This code works perfectly with a remote or local audio file, but not with the stream.
This isn't possible.
These streams are live. There is nothing to seek to because what you haven't heard hasn't happened yet. Even streams that play back music end-to-end are still "live" in the sense that the audio you haven't received hasn't been encoded yet. (Small codec and transit buffers aside, of course.)

Can a Google App Engine App get blocked from accessing the Google Docs API

I have implemented a Google App Engine application which uploads documents to specific folders in Google Docs. A month ago I started having response time issues (deadline exceeded on GdataClient.GetDocList, a fetch-url call, in the Gdata client) when querying for a specific folder in Google Docs. This caused a lot of tasks to queue up in the Task Queue.
When I saw this, I paused the queues for a while - about 24 hours. When I restarted the queue, nearly all of the files were uploaded again, except for 10 of the files / tasks.
When I implemented the GetDocList call, I added retry / sleep functionality to avoid the intermittent "DeadlineExceeded" errors I got during my .GetNextLink().href loop. I know that this is not a good "Cloud" design, but I was forced to do it to get things stable enough for production. For every sleep I extend the wait time, and I only retry 5 times. The last time I wait about 25 seconds before retrying.
What I think is that all the tasks in the queues retried so many times (even though I have limited the tasks to running in serial mode, one at a time, at most 5 per minute) that the App Engine app was blacklisted from the Google Docs API.
Can this happen?
What do I need to do to be able to query the Google Docs API from the same App Engine instance again?
Do I need to migrate the App Engine app to a new Application ID?
When I try this from my development environment, the code works; it queries the folder structure and returns a result within the time limit.
The folder structure I'm querying is rather big, which means that I need to fetch the entries via .GetNextLink().href. In my development environment, the folder structure contains far fewer folders.
Anyway, this had been working very well for about a year in the production App Engine instance, but it stopped working around the 4th - 5th of March.
The user account being queried is currently using 7000 MB (3%) of the available 205824 MB.
When I use the code from the dev environment but with a completely different Google Apps domain / app ID / Google account, I cannot reproduce the error.
When I change max-results to 1 (instead of 100, 50, or 20), I succeed intermittently. But with max-results at 1 I need to query many thousands of times, and since I only succeed with at most 3 in a row before my exponential back-off quits, I never get my whole result set. The folder I query consists of between 300 and 400 folders, which in turn consist of at least 2 - 6 subfolders with PDF files in them.
I have tried max-results 2, and then the fetch fails on every occasion. If I change back to max-results 1, it succeeds on one or two fetches in a row, but this is not sufficient, since I need the whole folder structure to be able to find the correct folder to store the file in.
I have tried this from my local environment - i.e. from a completely different IP address - and it still fails. This means that the App Engine app is not blocked from accessing Google Docs. The max-results change from 2 to 1 also proves that.
Conclusion:
The slow response time from the Google Docs API must be due to the extensive number of files and collections inside the collection I'm looping through. Keep in mind that this collection contains about 3500 MB. Is this an issue?
Log:
DocListUrl to get entries from = https://docs.google.com/feeds/default/private/full/folder:XXXXXXX/contents?max-results=1.
Retrying RetryGetDocList, wait for 1 seconds.
Retrying RetryGetDocList, wait for 1 seconds.
Retrying RetryGetDocList, wait for 4 seconds.
Retrying RetryGetDocList, wait for 9 seconds.
Retrying RetryGetDocList, wait for 16 seconds.
Retrying RetryGetDocList, wait for 25 seconds.
ApplicationError: 5
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 703, in call
handler.post(*groups)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/DocsHandler.py", line 418, in post
success = uploader.Upload(blob_reader, fileToUpload.uploadSize, fileToUpload.MainFolder, fileToUpload.ruleTypeReadableId ,fileToUpload.rootFolderId,fileToUpload.salesforceLink,fileToUpload.rootFolder, fileToUpload.type_folder_name, fileToUpload.file_name, currentUser, client, logObj)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/DocsClasses.py", line 404, in Upload
collections = GetAllEntries('https://docs.google.com/feeds/default/private/full/%s/contents?max-results=1' % (ruleTypeFolderResourceId), client)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/DocsClasses.py", line 351, in GetAllEntries
chunk = RetryGetDocList(client.GetDocList , chunk.GetNextLink().href)
File "/base/data/home/apps/XXX/prod-43.358023265943651014/DocsClasses.py", line 202, in RetryGetDocList
return functionCall(uri)
File "/base/data/home/apps/XXX/prod-43.358023265943651014/gdata/docs/client.py", line 142, in get_doclist
auth_token=auth_token, **kwargs)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/gdata/client.py", line 635, in get_feed
**kwargs)
File "/base/data/home/apps/XXXXX/prod-43.358023265943651014/gdata/client.py", line 265, in request
uri=uri, auth_token=auth_token, http_request=http_request, **kwargs)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/atom/client.py", line 117, in request
return self.http_client.request(http_request)
File "/base/data/home/apps/XXXXX/prod-43.358023265943651014/atom/http_core.py", line 420, in request
http_request.headers, http_request._body_parts)
File "/base/data/home/apps/XXXXX/prod-43.358023265943651014/atom/http_core.py", line 497, in _http_request
return connection.getresponse()
File "/base/python_runtime/python_dist/lib/python2.5/httplib.py", line 206, in getresponse
deadline=self.timeout)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 263, in fetch
return rpc.get_result()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 592, in get_result
return self.__get_result_hook(self)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 371, in _get_fetch_result
raise DeadlineExceededError(str(err))
DeadlineExceededError: ApplicationError: 5
Regards
/Jens
On occasion, responses from the Google Documents List API exceed the deadline for App Engine HTTP requests. This can happen with extremely large corpuses of documents being returned by the API.
To work around this, set the max-results parameter to a number smaller than 1000.
Also, retry the request using exponential back-off.
To work around failing uploads, use the Task Queue in App Engine to complete uploads, as well as the resumable upload feature of the API.
You can request the App Engine team increase the size of the HTTP timeout of your application to a large number of seconds that would allow this request to succeed. However, it is rare that the team approves such a request without a strong need.
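As a sketch of the back-off advice, here is a minimal retry wrapper in the spirit of the asker's RetryGetDocList; the function name, retry count, and wait schedule are illustrative, and the DeadlineExceededError import path follows the traceback above (adjust it if your SDK differs):

import time

from google.appengine.api.urlfetch import DeadlineExceededError

def retry_with_backoff(fetch, uri, max_retries=5):
    # Call fetch(uri), waiting progressively longer after each
    # DeadlineExceededError before trying again.
    for attempt in range(max_retries):
        try:
            return fetch(uri)
        except DeadlineExceededError:
            wait = (attempt + 1) ** 2  # 1, 4, 9, 16, 25 seconds
            print('Retrying, wait for %d seconds.' % wait)
            time.sleep(wait)
    return fetch(uri)  # final attempt; let any exception propagate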
