Celery task that communicates with Twitter

What is the right approach when writing Celery tasks that communicate with a service that has rate limits and is sometimes unavailable (not responding) for long periods of time?
Do I have to use task retry? What if the service is unavailable for too long? Is there a way to store these tasks for execution much later?
What if this is a subtask in a long task?

First, I suggest you set a socket timeout to avoid waiting a long time for a response.
Then you can catch the socket timeout exception and, in this particular case, retry with a large countdown, like 15 minutes.
Normally, though, I use an incremental retry with a percentage increase: the delay grows every time the task retries, which is useful for tasks that depend on external services that can be unavailable for a long time.
You can give the task a high retry limit, like 50, and set the standard retry delay with:
# 20 seconds
self.default_retry_delay = 20
Then you can implement a method like this for your task:
def incrementalRetry(self, exc, perc=20, args=None):
    """By default the retry delay is increased by 20 percent."""
    if args:
        self.request.args = args
    # Reuse the delay from the previous retry, if any.
    delay = self.request.kwargs.get('retry_delay', self.default_retry_delay)
    retry_delay = delay + round((delay * perc) / 100, 2)
    # dict.update() returns None, so update the kwargs first
    # and then pass them to retry().
    self.request.kwargs['retry_delay'] = retry_delay
    self.retry(args=self.request.args, kwargs=self.request.kwargs,
               exc=exc, countdown=retry_delay,
               max_retries=self.max_retries)
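For context, here is a minimal sketch of how a task might combine the socket timeout with this helper. It assumes the class-based task style used above; the task class, endpoint, and timeout values are illustrative, not from the original:
import socket
import requests
from celery import Task

class FetchTweetsTask(Task):
    """Hypothetical task; incrementalRetry is the method defined above."""
    max_retries = 50
    default_retry_delay = 20  # seconds

    def run(self, user_id):
        try:
            # The timeout makes a silent service fail fast instead of
            # blocking the worker indefinitely.
            resp = requests.get(
                'https://api.twitter.com/1.1/statuses/user_timeline.json',
                params={'user_id': user_id}, timeout=10)
            return resp.json()
        except (socket.timeout, requests.exceptions.Timeout) as exc:
            self.incrementalRetry(exc)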
What if this is a subtask in a long task?
If you don't need the result, you can launch it asynchronously with task.delay().
Another nice feature is the task group, which lets you launch several tasks and, once they have all finished, do something else in your workflow.
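As an illustration, here is a minimal sketch of a group with a completion callback (a chord, in Celery terms); the fetch_tweets and summarize tasks are hypothetical:
from celery import chord, group

# Run one fetch per user in parallel...
header = group(fetch_tweets.s(user_id) for user_id in user_ids)
# ...and run summarize() once every task in the group has finished.
chord(header)(summarize.s())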

Related

Serverless SQS consumer skips messages

I am using the Serverless Framework to consume messages from SQS. Some of the messages sent to the queue do not get consumed. They go straight to the in-flight SQS status and from there to my dead letter queue. When I look at my log of the consumer, I can see that it consumed and successfully processed 9/10 messages. One is always not consumed and ends up in the dead letter queue. I am setting reservedConcurrency to 1 so that only one consumer can run at a time. The function consumer timeout is set to 30 seconds. This is the consumer code:
module.exports.mySQSConsumer = async (event, context) => {
  context.callbackWaitsForEmptyEventLoop = false;
  console.log(event.Records);
  await new Promise((res, rej) => {
    setTimeout(() => {
      res();
    }, 100);
  });
  console.log('DONE');
  return true;
}
The consumer function configuration follows:
functions:
  mySQSConsumer:
    handler: handler.mySQSConsumer
    timeout: 30 # seconds
    reservedConcurrency: 1
    events:
      - sqs:
          arn: arn:aws:sqs:us-east-1:xyz:my-test-queue
          batchSize: 1
          enabled: true
If I remove the await function, it will process all messages. If I increase the timeout to 200ms, even more messages go straight to the in-flight status and from there to the dead letter queue. This code is very simple. Any ideas why it's skipping some messages? The messages that don't get consumed don't even show up in the log in the first console.log() statement. They seem entirely ignored.
I figured out the problem. The SQS queue Lambda function event triggering works differently than I thought. The messages get pushed into the Lambda function, not pulled by it. I think this could be engineered better by AWS, but it is what it is.
The issue was the Default Visibility Timeout set to 30 seconds together with Reserved Concurrency set to 1. When the SQS queue fills up quickly with thousands of records, AWS starts pushing messages to the Lambda function at a rate faster than the single function instance can process them. AWS "assumes" that it can simply spin up more instances of the Lambda to keep up with the backpressure. However, the concurrency limit doesn't let it spin up more instances - the Lambda function is throttled. As a result, the function starts returning failure to the AWS backend for some messages, which, consequently, hides the failed messages for 30 seconds (the default setting) and puts them back into the queue after this period for reprocessing. Since there are so many records to process by the single instance, 30 seconds later the Lambda function is still busy and can't process those messages again. So the situation repeats itself and the messages go back to invisibility for 30 seconds. This repeats a total of three times. After the third attempt, the messages go to the dead letter queue (we configured our SQS queue that way).
To resolve this issue, we increased the Default Visibility Timeout to 5 minutes. That's enough time for the Lambda function to process through most of the messages in the queue while the failed ones wait in invisibility. After 5 minutes, they get pushed back into the queue and since the Lambda function is no longer busy, it will process most of them. Some of them have to go to invisibility twice before being successfully processed.
So the remedy to this problem is either increasing the Default Visibility Timeout, as we did, or increasing the number of failures necessary before a message goes to the dead letter queue.
I hope this helps someone.
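For reference, here is a minimal sketch of what those two remedies might look like if the queue were defined as a CloudFormation resource in serverless.yml; the resource names and values are illustrative:
resources:
  Resources:
    MyTestQueue:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: my-test-queue
        VisibilityTimeout: 300  # 5 minutes instead of the 30-second default
        RedrivePolicy:
          deadLetterTargetArn:
            Fn::GetAtt: [MyTestDLQ, Arn]  # hypothetical DLQ resource
          maxReceiveCount: 5  # more attempts before the dead letter queue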

How to protect against Redis::TimeoutError: Connection timed out on Heroku

This might seem like a silly question, but I figured someone here on StackOverflow might have some ideas around this. What the hell, right?
I'm using Heroku workers with 1X Dynos to run Resque. Sometimes I get this error: Redis::TimeoutError: Connection timed out. It happens in the redis gem; here's the stacktrace:
Redis::TimeoutError Connection timed out
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/connection/ruby.rb:55 rescue in _read_from_socket
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/connection/ruby.rb:48 _read_from_socket
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/connection/ruby.rb:41 gets
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/connection/ruby.rb:273 read
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:245 block in read
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:233 io
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:244 read
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:175 block in call_pipelined
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:214 block (2 levels) in process
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:340 ensure_connected
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:204 block in process
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:286 logging
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:203 process
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:174 call_pipelined
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:146 block in call_pipeline
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:273 with_reconnect
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis/client.rb:144 call_pipeline
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis.rb:2101 block in pipelined
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis.rb:37 block in synchronize
vendor/ruby-2.1.5/lib/ruby/2.1.0/monitor.rb:211 mon_synchronize
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis.rb:37 synchronize
vendor/bundle/ruby/2.1.0/gems/redis-3.2.0/lib/redis.rb:2097 pipelined
vendor/bundle/ruby/2.1.0/gems/redis-namespace-1.5.1/lib/redis/namespace.rb:413 namespaced_block
vendor/bundle/ruby/2.1.0/gems/redis-namespace-1.5.1/lib/redis/namespace.rb:265 pipelined
vendor/bundle/ruby/2.1.0/gems/resque-1.25.2/lib/resque.rb:214 push
vendor/bundle/ruby/2.1.0/gems/resque-1.25.2/lib/resque/job.rb:153 create
vendor/bundle/ruby/2.1.0/gems/resque_solo-0.1.0/lib/resque_ext/job.rb:7 create_solo
vendor/bundle/ruby/2.1.0/gems/resque-1.25.2/lib/resque.rb:317 enqueue_to
vendor/bundle/ruby/2.1.0/gems/resque-1.25.2/lib/resque.rb:298 enqueue
We have resque-retry set up, but I guess it doesn't matter if the enqueue call can't even connect to redis.
My current solution is to wrap every Resque.enqueue call with begin/rescue so that we can re-try the enqueue call (as per https://github.com/resque/resque/issues/840). But isn't there a better way?
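For what it's worth, that begin/rescue wrapper might look something like this minimal sketch (the method name and retry count are illustrative):
def enqueue_with_retry(klass, *args)
  attempts = 0
  begin
    Resque.enqueue(klass, *args)
  rescue Redis::TimeoutError
    # Re-try the enqueue a couple of times before giving up.
    attempts += 1
    retry if attempts < 3
    raise
  end
end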
Heroku Redis allows you to change your instance's timeout and maxmemory-policy settings. These settings are kept across upgrades and HA failover.
From documentation:
The timeout setting sets the number of seconds Redis waits before killing idle connections. A value of zero means that connections will not be closed. The default value is 300 seconds (5 minutes). You can change this value using the CLI:
$ heroku redis:timeout maturing-deeply-2628 --seconds 60
Timeout for maturing-deeply-2628 (REDIS_URL) set to 60 seconds.
Connections to the redis instance will be stopped after idling for 60 seconds.
It looks like in the case of Resque, 0 (do not time out idle connections) would be the best choice (--seconds 0).
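Following the documentation's example above, that would be (using the example instance name):
$ heroku redis:timeout maturing-deeply-2628 --seconds 0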

Why doesn't gevent.Timeout() raise an exception?

with gevent.Timeout(0.1) as tt:
    time.sleep(1)
The above does not raise an exception.
with gevent.Timeout(0.1) as tt:
    gevent.sleep(1)
This throws gevent.timeout.Timeout: 0.1 seconds.
The difference is time.sleep() vs. gevent.sleep()!
time.sleep() actually pauses all execution of code and does not allow any other code to run. Gevent is an event-loop, which means that it allows other "threads" (greenlets) to run while blocking.
Essentially gevent has a list of tasks it is executing. It only allows 1 task to run at a time. If you say time.sleep(1), that task is still running, but not doing anything. If you say gevent.sleep(1), it pauses the current task and allows the other tasks to run.
gevent.Timeout() actually launches a second task to monitor the amount of time that has elapsed. Since time.sleep() never yields, that second task never gets a chance to throw the error.
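One way to make blocking calls like time.sleep() cooperate with the timeout is gevent's monkey patching, which swaps blocking standard-library functions for gevent-aware ones; a minimal sketch:
from gevent import monkey
monkey.patch_all()  # replaces time.sleep with a yielding version

import time
import gevent

try:
    with gevent.Timeout(0.1):
        time.sleep(1)  # now yields to the event loop, so the timeout can fire
except gevent.Timeout:
    print('timed out')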

Why does my Quartz trigger fire before the specified interval?

My objective is to have Quartz.NET execute a job at precisely 25Hz or every 40ms.
I'm using the following trigger:
ITrigger MyTrigger = TriggerBuilder.Create()
    .WithIdentity("T1")
    .ForJob("MyJob")
    .WithSimpleSchedule(x => x
        .WithInterval(TimeSpan.FromMilliseconds(40))
        .RepeatForever())
    .Build();
and the following job:
[DisallowConcurrentExecution]
private class MyJob : Quartz.IJob
{
    public void Execute(IJobExecutionContext context)
    {
        Idx++;
        Console.WriteLine("Job {0} fired at {1}ms", Idx, MyStopWatch.ElapsedMilliseconds);
    }
}
The problem is that the first 150 executions or so fire too quickly. For example the first 60 iterations all fire at either 20ms or 21ms on the stopwatch. Afterward they fire in bunches every 200ms until it becomes stable around 1000ms, and then starts firing every 40-42ms as intended.
How can I prevent Quartz from triggering a job if the previous job was fired within 40ms?
What is the source of this behavior?
You should not rely on Quartz for millisecond-precision scheduling. Quartz has a lot of infrastructure (misfire checks, compensation, etc.) that makes it unsuitable for this kind of precision.
I'd argue that stable firing at this precision could be a challenge even at the bare virtual machine level, because things like GC pauses can stall execution for a moment.
Quartz can do a lot for you, but with these requirements I'd just go with a custom Thread or Timer implementation.
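For illustration, a minimal sketch of the Timer route using System.Threading.Timer; note that it is still subject to OS scheduling and GC jitter, so it is not hard real time:
using System;
using System.Diagnostics;
using System.Threading;

class FixedRateDemo
{
    static void Main()
    {
        var stopwatch = Stopwatch.StartNew();
        var idx = 0;
        // Fire the callback immediately (dueTime 0), then roughly every 40 ms.
        using (var timer = new Timer(_ =>
        {
            var i = Interlocked.Increment(ref idx);
            Console.WriteLine("Job {0} fired at {1}ms", i, stopwatch.ElapsedMilliseconds);
        }, null, 0, 40))
        {
            Console.ReadLine(); // keep the process alive while the timer runs
        }
    }
}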

QTP recovery scenario used to "skip" consecutive FAILED steps with 0 timeout -- how can I restore the original timeout value?

Suppose I use QTP's recovery scenario manager to set the playback synchronization timeout to 0. The handler would return with "continue with next statement".
I'd do that to make sure that any following playback statements don't waste their time waiting for the next non-existing/non-matching step before failing:
I have a lot of GUI tests that kind of get stuck: say 10 controls are missing, so their (consecutive) playback steps produce 10 timeout waits before failing. If the playback timeout is 30 seconds, I lose 10 x 30 seconds = 5 minutes of execution time, while it would really be sufficient to wait for 30 seconds ONCE (because the app does not change anymore -- we have already waited a full timeout period).
Now if I have 100 test cases (= action iterations), this could happen 100 times, wasting 500 minutes of my test execution window.
That's why I came up with the idea of a recovery scenario function that sets the timeout to 0 after/upon the first failed playback step. This would speed things up while skipping the rightly-FAILED steps, yet would not compromise the precision/reliability of identifying the next matching GUI context (which creates a PASSED step).
Then, of course, upon the next passed playback step, I would want to restore the original timeout value. How could I do that? This is my question.
One cannot define a recovery scenario function that is called for PASSED steps.
I am currently thinking about setting a method function for Reporter.ReportEvent and "sniffing" for PASSED log entries there. I'd install that method function in the recovery scenario function that sets the timeout to 0. Then, when the "sniffer" function senses a ReportEvent call with PASSED status during one of the following playback steps, I'd reset everything (i.e. restore the original timeout and uninstall the method function). (I am 99% sure, however, that .Click and .Set methods do not call ReportEvent to write their result status... so this option will probably not work.)
Better ideas? This really bugs me.
It sounds to me like your tests aren't designed correctly; if you fail to find an object, why do you continue?
One possible (non-recovery-scenario) solution would be to use RegisterUserFunc to override the methods you are using, in order to check obj.Exist with a short timeout before running the required method.
Function MyClick(obj)
    If obj.Exist(1) Then
        obj.Click
    Else
        Reporter.ReportEvent micFail, "Click failed, no object", "Object does not exist"
    End If
End Function
RegisterUserFunc "Link", "Click", "MyClick"
RegisterUserFunc "WebButton", "Click", "MyClick"
''# etc
If you have many controls, some of which may be missing, and you know that after the 10 seconds you mentioned (when the first timeout occurs) nothing more will show up, then you can use the Exist method with a timeout parameter.
Something like this:
timeout = 10
For Each control In controls
    If control.Exist(timeout) Then
        ' do something with the control
    Else
        timeout = 0
    End If
Next
Now only the first missing control costs the full 10-second wait; every subsequent check in your collection of controls uses a timeout of 0, which saves time.
