Tweepy error 104: Connection aborted

I am trying to scrape some tweets using Tweepy, but the connection crashes after a few hundred requests with the following error:
tweepy.error.TweepError:
Failed to send request: ('Connection aborted.', error("(104, 'ECONNRESET')",))
My code is like this:
for status in tweepy.Cursor(api.search,
                            q="",
                            count=100,
                            include_entities=True,
                            monitor_rate_limit=True,
                            wait_on_rate_limit=True,
                            wait_on_rate_limit_notify=True,
                            retry_count=5,  # retry 5 times
                            retry_delay=5,  # seconds to wait for retry
                            geocode="34.0207489,-118.6926066,100mi",  # los angeles
                            until=until_date,
                            lang="en").items():
    try:
        towrite = json.dumps(status._json)
        output.write(towrite + "\n")
    except Exception as e:
        log.error(e)
    c += 1
    if c % 10000 == 0:  # 100 requests, sleep
        time.sleep(900)  # sleep 15 min
I can capture the error with try/except, but I am not able to restart the cursor from the point where it crashed.
Does anyone know how to solve this error, or restart the cursor from last known status?
Thanks!

The Tweepy documentation says the limit is 180 requests per 15-minute window (user auth), but apparently sleeping for too long affects connection reliability after some requests, so if you run a request every 5 seconds everything seems to work just fine:
for status in tweepy.Cursor(api.search,
                            q="",
                            count=100,
                            include_entities=True,
                            monitor_rate_limit=True,
                            wait_on_rate_limit=True,
                            wait_on_rate_limit_notify=True,
                            retry_count=5,  # retry 5 times
                            retry_delay=5,  # seconds to wait for retry
                            geocode="34.0207489,-118.6926066,100mi",  # los angeles
                            until=until_date,
                            lang="en").items():
    try:
        towrite = json.dumps(status._json)
        output.write(towrite + "\n")
    except Exception as e:
        log.error(e)
    c += 1
    if c % 100 == 0:  # one full request (100 statuses) completed, sleep 5 sec
        time.sleep(5)

It seems to me that the tweepy call should be inside the try block. Also, you have arguments in api.search that are not in the Tweepy API (http://docs.tweepy.org/en/v3.5.0/api.html#help-methods). Anyway, this worked for me:
backoff_counter = 1
while True:
    try:
        for my_item in tweepy.Cursor(api.search, q="test").items():
            pass  # do something with my_item
        break
    except tweepy.TweepError as e:
        print(e.reason)
        sleep(60 * backoff_counter)
        backoff_counter += 1
        continue
Basically, when you get the error you sleep for a while, and then try again. I used an incremental backoff to make sure that the sleeping time was enough for re-establishing the connection.
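To address the other half of the question (resuming rather than restarting from scratch): since the search API returns tweets from newest to oldest, you can remember the lowest tweet id you have already written and feed it back as max_id on the next attempt. The following is only a rough sketch of that idea, not part of the original answer; it reuses api and output from the question, and last_id is a made-up resume marker:
import json
from time import sleep
import tweepy
# `api` and `output` are assumed to already exist, as in the question above
backoff_counter = 1
last_id = None  # lowest tweet id written so far (hypothetical resume marker)
while True:
    try:
        kwargs = {"q": "test"}
        if last_id is not None:
            # search returns tweets with id <= max_id, so this resumes just below
            # the oldest tweet already saved
            kwargs["max_id"] = last_id - 1
        for status in tweepy.Cursor(api.search, **kwargs).items():
            output.write(json.dumps(status._json) + "\n")
            last_id = status.id if last_id is None else min(last_id, status.id)
        break
    except tweepy.TweepError as e:
        print(e.reason)
        sleep(60 * backoff_counter)
        backoff_counter += 1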

Related

Waiting for 20 seconds before continuing (permanent error)

The get_chat_history and get_chat_members methods throw a permanent waiting error: "Waiting for 20 (23, 21, 22, 18) seconds before continuing". get_chat works fine. This error appeared a couple of days ago.
async with tg_cl:
    while True:
        try:
            async for members in tg_cl.get_chat_members(target):
                members_chat.append(members)
            break
        except FloodWait as Err:
            print("Flood wait: {} seconds".format(Err.value))
            sleep(Err.value)
            continue
...............
async with tg_cl:
    while True:
        try:
            if 'join' in chat:
                info_chat = await tg_cl.join_chat(chat)
            else:
                info_chat = await tg_cl.get_chat(chat)
            async for messages in tg_cl.get_chat_history(chat, limit=1, offset_id=-1):
                count_messages = messages.id
            break
        except FloodWait as Err:
            print("Flood wait: {} seconds".format(Err.value))
            sleep(Err.value)
            continue
Pyrogram already handles FloodWait errors on its own, you don't need to apply any logic yourself.
When setting up your Client instance, you can set the sleep_threshold. This is the maximum FloodWait duration (in seconds) that Pyrogram will handle on its own, without any logic needed from you. You can set it to an arbitrarily high value so that you no longer see these errors at all. Keep in mind that Pyrogram will then silently handle them itself and only print something like "waiting x seconds" in your output.
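For example, a minimal sketch of raising the threshold when creating the client (the session name, api_id, and api_hash below are placeholders, not values from this question):
from pyrogram import Client
# FloodWait errors asking for a wait shorter than sleep_threshold (in seconds)
# are slept through automatically instead of being raised
app = Client(
    "my_account",                 # placeholder session name
    api_id=12345,                 # placeholder
    api_hash="0123456789abcdef",  # placeholder
    sleep_threshold=60            # absorb any FloodWait of up to 60 seconds
)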
list_of_members = []
for member in app.get_chat_members(chat_id):
    list_of_members.append(member.id)
print(list_of_members)
[123, 456, 789, ...]
Please note that in channels you can only retrieve 200 members at a time, and in chats only 10 000 (ten thousand); this is a hard limit imposed by the server.
See Pyrogram's documentation on the available arguments, as well as some examples:
https://docs.pyrogram.org/api/methods/get_chat_members

How can I keep the Tempfile contents from being empty in a separate (Ruby) thread?

In a Rails 6.x app, I have a controller method which backgrounds queries that take longer than 2 minutes (to avoid a browser timeout), advises the user, stores the results, and sends a link that can retrieve them to generate a live page (with Highcharts charts). This works fine.
Now, I'm trying to implement the same logic with a method that backgrounds the creation of a report, via a Tempfile, and attaches the contents to an email, if the query runs too long. This code works just fine if the 2-minute timeout is NOT reached, but the Tempfile is empty at the commented line if the timeout IS reached.
I've tried wrapping the second part in another thread, and wrapping the internals of each thread with a mutex, but this is all getting above my head. I haven't done a lot of multithreading, and every time I do, I feel like I stumble around till I get it. This time, I can't even seem to stumble into it.
I don't know if the problem is with my thread(s), or a race condition with the Tempfile object. I've had trouble using Tempfiles before, because they seem to disappear quicker than I can close them. Is this one getting cleaned up before it can be sent? The file handle actually still exists on the file system at the commented point, even though it's empty, so I'm not clear on what's happening.
def report
  queue = Queue.new
  file = Tempfile.new('report')
  thr = Thread.new do
    query = %Q(blah blah blah)
    @calibrations = ActiveRecord::Base.connection.exec_query query
    query = %Q(blah blah blah)
    @tunings = ActiveRecord::Base.connection.exec_query query
    if queue.empty?
      unless @tunings.empty?
        CSV.open(file.path, 'wb') do |csv|
          csv << ["headers...", @parameters].flatten
          @calibrations.each do |c|
            line = [c["h1"], c["h2"], c["h3"], c["h4"], c["h5"], c["h6"], c["h7"], c["h8"]]
            t = @tunings.select { |t| t["code"] == c["code"] }.first
            @parameters.each do |parameter|
              line << t[parameter.downcase]
            end
            csv << line
          end
        end
        send_data file.read, :type => 'text/csv; charset=iso-8859-1; header=present', :disposition => "attachment; filename=\"report.csv\""
      end
    else
      # When "timed out", `file` is empty here
      NotificationMailer.report_ready(current_user, file.read).deliver_later
    end
  end
  give_up_at = Time.now + 120.seconds
  while Time.now < give_up_at do
    if !thr.alive?
      break
    end
    sleep 1
  end
  if thr.alive?
    queue << "Timeout"
    render html: "Your report is taking longer than 2 minutes to generate. To avoid a browser timeout, it will finish in the background, and the report will be sent to you in email."
  end
end
The reason the file is empty is that you are giving the query 120 seconds to complete. If after 120 seconds that has not happened, you add "Timeout" to the queue. The query is still running inside the thread and has not yet reached the point where you check whether the queue is empty. When the query does complete, since the queue is now not empty, you skip the part where you write the CSV file and go straight to the NotificationMailer.report_ready line. At that point the file is still empty because you never wrote anything into it.
In the end I think you need to rethink the overall logic of what you are trying to accomplish, and there needs to be more communication between the threads and the top level.
Each thread needs to tell the top level whether it has already sent the result, and the top level needs to let the thread know when it is past time to send the result directly, so that it should email the result instead.
Here is some code that I think / hope will give some insight into how to approach this problem.
timeout_limit = 10
query_times = [5, 15, 1, 15]
timeout = []
sent_response = []
send_via_email = []
puts "time out is set to #{timeout_limit} seconds"
query_times.each_with_index do |query_time, query_id|
  puts "starting query #{query_id} that will take #{query_time} seconds"
  timeout[query_id] = false
  sent_response[query_id] = false
  send_via_email[query_id] = false
  Thread.new do
    ## do query
    sleep query_time
    unless timeout[query_id]
      puts "query #{query_id} has completed, displaying results now"
      sent_response[query_id] = true
    else
      puts "query #{query_id} has completed, emailing result now"
      send_via_email[query_id] = true
    end
  end
  give_up_at = Time.now + timeout_limit
  while Time.now < give_up_at
    break if sent_response[query_id]
    sleep 1
  end
  unless sent_response[query_id]
    puts "query #{query_id} timed out, we will email the result of your query when it is completed"
    timeout[query_id] = true
  end
end
# simulate server environment
loop { }
=>
time out is set to 10 seconds
starting query 0 that will take 5 seconds
query 0 has completed, displaying results now
starting query 1 that will take 15 seconds
query 1 timed out, we will email the result of your query when it is completed
starting query 2 that will take 1 seconds
query 2 has completed, displaying results now
starting query 3 that will take 15 seconds
query 1 has completed, emailing result now
query 3 timed out, we will email the result of your query when it is completed
query 3 has completed, emailing result now

specifying different retry limit and retry delay for different jobs in backburner

I am using beaneater/beanstalk in my app for maintaining the job queues.
https://github.com/nesquena/backburner
My global config file for Backburner looks like this:
Backburner.configure do |config|
  config.beanstalk_url = ["beanstalk://#{CONFIG['beanstalk']['host']}:#{CONFIG['beanstalk']['port']}"]
  config.tube_namespace = CONFIG['beanstalk']['tube_name']
  config.on_error = lambda { |e| puts e }
  config.max_job_retries = 5 # default 0 retries
  config.retry_delay = 30 # default 5 seconds
  config.default_priority = 65536
  config.respond_timeout = 120
  config.default_worker = Backburner::Workers::Simple
  config.logger = Logger.new('log/backburner.log')
  config.priority_labels = { :custom => 50, :useless => 1000 }
  config.reserve_timeout = nil
end
I want to set a different retry limit and retry delay for different jobs.
I was looking at the rubydoc for the corresponding variable/function. As per this rubydoc link, I tried configuring retry_limit locally in a worker.
One specific worker looks like this:
class AbcJob
  include Backburner::Queue
  queue "abc_job"           # defaults to 'backburner-jobs' tube
  queue_priority 10         # most urgent priority is 0
  queue_respond_timeout 300 # number of seconds before job times out
  queue_retry_limit 2
  def self.perform(abc_id)
    .....Task to be done.....
  end
end
However, it is still picking up the retry limit from the global config file and retrying 5 times instead of 2. Is there anything that I am missing here?
How can I override the retry limit and retry delay locally?
I could not find the right way to do it, but I found a workaround.
I am putting the entire body of perform in a begin-rescue block, and in case of failure I re-enqueue the job with a custom delay. Also, to keep track of the number of retries, I made the attempt count an argument that I pass along when enqueueing.
class AbcJob
  include Backburner::Queue
  queue "abc_job"           # defaults to 'backburner-jobs' tube
  queue_priority 10         # most urgent priority is 0
  queue_respond_timeout 300 # number of seconds before job times out
  def self.perform(abc_id, attempt = 1)
    begin
      .....Task to be done.....
    rescue StandardError => e
      # Any notification method, so that you can know about the failure reason and fix it before the next retry
      # I am using NotificationMailer with e.message as the body to debug
      # Use any function you want for the retry delay; I am using a quadratic one
      delay = attempt * attempt
      if attempt + 1 < GlobalConstant::MaxRetryCount
        Backburner::Worker.enqueue(AbcJob, [abc_id, attempt + 1], delay: delay.minute)
      else
        raise # if you want your jobs to be buried eventually
      end
    end
  end
end
I have kept the default value of attempt at 1 so that the magic number 1 does not appear in the code, which might raise questions about why we are passing a constant. For enqueueing from other places in the code you can use a simple enqueue:
Backburner::Worker.enqueue(AbcJob, abc_id)

open url raises Timeout::Error

uniqUsers = User.find(params[:userid]).events.where("comingfrom != ''").uniq_by { |obj| obj.comingfrom }
uniqUsers.map do |elem|
  begin
    @tag = nil
    open('http://localhost:3000/search_dbs/show?userid=' + params[:userid] + '&fromnumber=' + elem.comingfrom + '&format=json', 'r', :read_timeout => 1) do |http|
      @tag = http.read
    end
  rescue Exception => e
    puts "failes"
    puts e
  end
end
Hi, this is driving me crazy. For some reason the open call keeps running out of time. When I try the same URL in Chrome everything works like a charm, but when I do this from the code I get Timeout::Error.
One second is optimistic.
When I was writing spiders, I'd create a retry queue containing sub-arrays or objects that held the number of retries previously attempted, the URL, and maybe the last timeout value. Using an incrementing timeout value, the first time I'd try one second, the second try two seconds, then four, eight, sixteen, etc., until I determined the site wasn't going to respond.

Resque retry retries without delay

This is the code that I have
class ExampleTask
  extend Resque::Plugins::ExponentialBackoff
  @backoff_strategy = [0, 20, 3600]
  @queue = :example_tasks
  def self.perform
    raise
  end
end
I am running into a problem where, whenever I enqueue this task locally, Resque seems to retry the task immediately without respecting the backoff strategy. Has anyone ever experienced this problem before?
Upgrading to 1.0.0 actually solves this problem.
For any future readers, the first integer in the @backoff_strategy array is how long resque-retry will wait before retrying the first time. From the GitHub readme:
# key: m = minutes, h = hours
#     no delay, 1m, 10m, 1h, 3h, 6h
@backoff_strategy = [0, 60, 600, 3600, 10800, 21600]
@retry_delay_multiplicand_min = 1.0
@retry_delay_multiplicand_max = 1.0
The first delay will be 0 seconds, the 2nd will be 60 seconds, etc... Again, tweak to your own needs.
