Sidekiq stacktrace in logs

Sidekiq stacktrace in logs - ruby-on-rails

I'm running a sidekiq application on heroku with papertrails addon and I use exceptions to fail jobs. For each exception full stacktrace is stored in papertrail logs which is definitely not what I want.
I didn't find a way how to turn off that feature. Could you give me a hint what I could do with that?
Maybe I should handle job failing in a different way?
Thanks!

Here's a modification of the standard error logger that limits the backtrace logging to the lines unique to the application:
class ExceptionHandlerLogger
def call(ex, ctxHash)
Sidekiq.logger.warn(Sidekiq.dump_json(ctxHash)) if !ctxHash.empty?
Sidekiq.logger.warn "#{ex.class.name}: #{ex.message}"
unless ex.backtrace.nil?
Sidekiq.logger.warn filter_backtrace(ex.backtrace).join("\n")
end
end
def filter_backtrace(backtrace)
index = backtrace.index { |item| item.include?('/lib/sidekiq/processor.rb') }
backtrace.first(index.to_i)
end
end
if !Sidekiq.error_handlers.delete_if { |h| h.class == Sidekiq::ExceptionHandler::Logger }
fail "default sidekiq logger class changed!"
end
Sidekiq.error_handlers << ExceptionHandlerLogger.new

Related

How to find an event in sentry by job arguments?

I have a rails app, sidekiq and sentry.
I want to find event in sentry by job arguments.
Sample:
I have SomeJob which executed with arguments [{some_arg: 'Arg1'}]
Job failed with error and send event to sentry.
How I can find event by job arguments?
I try full-text search, but it doesn't work

Search in sentry is limited by what they allow you to search by.
From reading their Search docs briefly you can either use:
sentry tags
messages
Either way, you would want to enrich your sentry events.
For example, let's assume you will rescue from the error raised in your job
class SomeJob
include Sidekiq::Worker
def perform(args)
# do stuff with args
rescue StandardError
SentryError.new(args: args)
end
end
SentryJobError is really just a PORO that would be called by your job classes.
class SentryJobError
def initialize(args:)
return if Rails.env.development?
Sentry.configure_scope do |scope|
scope.set_context('job_args', { args: args })
scope.set_context('message', 'job ${args[:some_arg]} failed')
end
end
end

How do I suppress Rails "Expire fragment" log messages?

I'm working on a Rails site that I did not create, and some of my log messages are hard to find in the sea of "Expire fragment" messages in log/production.log. It does not seem to indicate any sort of problem, so I'd like to suppress those messages and make it easy to find the messages my code is generating. Is there any way to suppress them?

You can try overriding the logger in an initializer, like here.
# Create logger that ignores messages containing “CACHE”
class CacheFreeLogger < ::Logger
def debug(message, *args, &block)
super unless message.include? 'Expire fragment'
end
end
# Overwrite ActiveRecord’s logger
ActiveRecord::Base.logger = ActiveSupport::TaggedLogging.new(
CacheFreeLogger.new(STDOUT)) unless Rails.env.test?

Sidekiq Error Handling and stop worker

To avoid running unnecessary, I'd like my Sidekiq worker to make a check at each stage for a certain condition. If that condition is not met, then Sidekiq should stop and report the error.
Currently I have:
class BotWorker
include Sidekiq::Worker
def perform(id)
user = User.find(id)
if user.nil?
# report the error? Thinking of using UserMailer
return false # stop the worker
end
# other processing here
end
This seems like a naive way to handle Sidekiq errors. The app needs to immediately notify admin if something breaks in the worker.
Am I missing something? What is a better way to handle errors in Sidekiq?

You can create your own error handler
Sidekiq.configure_server do |config|
config.error_handlers << Proc.new {|exception,context_hash| MyErrorService.notify(exception,context_hash) }
end

How should my scraping "stack" handle 404 errors?

I have a rake task that is responsible for doing batch processing on millions of URLs. Because this process takes so long I sometimes find that URLs I'm trying to process are no longer valid -- 404s, site's down, whatever.
When I initially wrote this there was basically just one site that would continually go down while processing so my solution was to use open-uri, rescue any exceptions produced, wait a bit, and then retry.
This worked fine when the dataset was smaller but now so much time goes by that I'm finding URLs are no longer there anymore and produce a 404.
Using the case of a 404, when this happens my script just sits there and loops infinitely -- obviously bad.
How should I handle cases where a page doesn't load successfully, and more importantly how does this fit into the "stack" I've built?
I'm pretty new to this, and Rails, so any opinions on where I might have gone wrong in this design are welcome!
Here is some anonymized code that shows what I have:
The rake task that makes a call to MyHelperModule:
# lib/tasks/my_app_tasks.rake
namespace :my_app do
desc "Batch processes some stuff # a later time."
task :process_the_batch => :environment do
# The dataset being processed
# is millions of rows so this is a big job
# and should be done in batches!
MyModel.where(some_thing: nil).find_in_batches do |my_models|
MyHelperModule.do_the_process my_models: my_models
end
end
end
end
MyHelperModule accepts my_models and does further stuff with ActiveRecord. It calls SomeClass:
# lib/my_helper_module.rb
module MyHelperModule
def self.do_the_process(args = {})
my_models = args[:my_models]
# Parallel.each(my_models, :in_processes => 5) do |my_model|
my_models.each do |my_model|
# Reconnect to prevent errors with Postgres
ActiveRecord::Base.connection.reconnect!
# Do some active record stuff
some_var = SomeClass.new(my_model.id)
# Do something super interesting,
# fun,
# AND sexy with my_model
end
end
end
SomeClass will go out to the web via WebpageHelper and process a page:
# lib/some_class.rb
require_relative 'webpage_helper'
class SomeClass
attr_accessor :some_data
def initialize(arg)
doc = WebpageHelper.get_doc("http://somesite.com/#{arg}")
# do more stuff
end
end
WebpageHelper is where the exception is caught and an infinite loop is started in the case of 404:
# lib/webpage_helper.rb
require 'nokogiri'
require 'open-uri'
class WebpageHelper
def self.get_doc(url)
begin
page_content = open(url).read
# do more stuff
rescue Exception => ex
puts "Failed at #{Time.now}"
puts "Error: #{ex}"
puts "URL: " + url
puts "Retrying... Attempt #: #{attempts.to_s}"
attempts = attempts + 1
sleep(10)
retry
end
end
end

TL;DR
Use out-of-band error handling and a different conceptual scraping model to speed up operations.
Exceptions Are Not for Common Conditions
There are a number of other answers that address how to handle exceptions for your use case. I'm taking a different approach by saying that handling exceptions is fundamentally the wrong approach here for a number of reasons.
In his book Exceptional Ruby, Avdi Grimm provides some benchmarks showing the performance of exceptions as ~156% slower than using alternative coding techniques such as early returns.
In The Pragmatic Programmer: From Journeyman to Master, the authors state "[E]xceptions should be reserved for unexpected events." In your case, 404 errors are undesirable, but are not at all unexpected--in fact, handling 404 errors is a core consideration!
In short, you need a different approach. Preferably, the alternative approach should provide out-of-band error handling and prevent your process from blocking on retries.
One Alternative: A Faster, More Atomic Process
You have a lot of options here, but the one I'm going to recommend is to handle 404 status codes as a normal result. This allows you to "fail fast," but also allows you to retry pages or remove URLs from your queue at a later time.
Consider this example schema:
ActiveRecord::Schema.define(:version => 20120718124422) do
create_table "webcrawls", :force => true do |t|
t.text "raw_html"
t.integer "retries"
t.integer "status_code"
t.text "parsed_data"
t.datetime "created_at", :null => false
t.datetime "updated_at", :null => false
end
end
The idea here is that you would simply treat the entire scrape as an atomic process. For example:
Did you get the page?
Great, store the raw page and the successful status code. You can even parse the raw HTML later, in order to complete your scrapes as fast as possible.
Did you get a 404?
Fine, store the error page and the status code. Move on quickly!
When your process is done crawling URLs, you can then use an ActiveRecord lookup to find all the URLs that recently returned a 404 status so that you can take appropriate action. Perhaps you want to retry the page, log a message, or simply remove the URL from your list of URLs to scrape--"appropriate action" is up to you.
By keeping track of your retry counts, you could even differentiate between transient errors and more permanent errors. This allows you to set thresholds for different actions, depending on the frequency of scraping failures for a given URL.
This approach also has the added benefit of leveraging the database to manage concurrent writes and share results between processes. This would allow you to parcel out work (perhaps with a message queue or chunked data files) among multiple systems or processes.
Final Thoughts: Scaling Up and Out
Spending less time on retries or error handling during the initial scrape should speed up your process significantly. However, some tasks are just too big for a single-machine or single-process approach. If your process speedup is still insufficient for your needs, you may want to consider a less linear approach using one or more of the following:
Forking background processes.
Using dRuby to split work among multiple processes or machines.
Maximizing core usage by spawning multiple external processes using GNU parallel.
Something else that isn't a monolithic, sequential process.
Optimizing the application logic should suffice for the common case, but if not, scaling up to more processes or out to more servers. Scaling out will certainly be more work, but will also expand the processing options available to you.

Curb has an easier way of doing this and can be a better (and faster) option instead of open-uri.
Errors Curb reports (and that you can rescue from and do something:
http://curb.rubyforge.org/classes/Curl/Err.html
Curb gem:
https://github.com/taf2/curb
Sample code:
def browse(url)
c = Curl::Easy.new(url)
begin
c.connect_timeout = 3
c.perform
return c.body_str
rescue Curl::Err::NotFoundError
handle_not_found_error(url)
end
end
def handle_not_found_error(url)
puts "This is a 404!"
end

You could just raise the 404's:
rescue Exception => ex
raise ex if ex.message['404']
# retry for non-404s
end

It all just depends on what you want to do with 404's.
Lets assume that you just want to swallow them. Part of pguardiario's response is a good start: You can raise an error, and retry a few times...
# lib/webpage_helper.rb
require 'nokogiri'
require 'open-uri'
class WebpageHelper
def self.get_doc(url)
attempt_number = 0
begin
attempt_number = attempt_number + 1
page_content = open(url).read
# do more stuff
rescue Exception => ex
puts "Failed at #{Time.now}"
puts "Error: #{ex}"
puts "URL: " + url
puts "Retrying... Attempt #: #{attempts.to_s}"
sleep(10)
retry if attempt_number < 10 # Try ten times.
end
end
end
If you followed this pattern, it would just fail silently. Nothing would happen, and it would move on after ten attempts. I would generally consider that a Bad Plan(tm). Instead of just failing out silently, I would go for something like this in the rescue clause:
rescue Exception => ex
if attempt_number < 10 # Try ten times.
retry
else
raise "Unable to contact #{url} after ten tries."
end
end
and then throw something like this in MyHelperModule#do_the_process (you'd have to update your database to have an errors and error_message column):
my_models.each do |my_model|
# ... cut ...
begin
some_var = SomeClass.new(my_model.id)
rescue Exception => e
my_model.update_attributes(errors: true, error_message: e.message)
next
end
# ... cut ...
end
That's probably the easiest and most graceful way to do it with what you currently have. That said, if you're handling that many request in one massive rake tasks, that's not very elegant. You can't restart it if something goes wrong, it's tying up a single process on your system for a long time, etc. If you end up with any memory leaks (or infinite loops!), you find yourself in a place where you can't just say 'move on'. You probably should be using some kind of queueing system like Resque or Sidekiq, or Delayed Job (though it sounds like you have more items that you'd end up queueing than Delayed Job would happily handle). I'd recommend digging in to those if you're looking for a more eloquent approach.

I actually have a rake task that does something remarkably similar. Here is the gist of what I did to deal with 404's and you could apply it pretty easy.
Basically what you want to do is to use the following code as a filter and create a logfile to store your errors. So before you grab the website and process it you first do the following:
So create/instantiate a logfile in your file:
#logfile = File.open("404_log_#{Time.now.strftime("%m/%d/%Y")}.txt","w")
# #{Time.now.strftime("%m/%d/%Y")} Just includes the date into the log in case you want
# to run diffs on your log files.
Then change your WebpageHelper class to something like this:
class WebpageHelper
def self.get_doc(url)
response = Net::HTTP.get_response(URI.parse(url))
if (response.code.to_i == 404) notify_me(url)
else
page_content = open(url).read
# do more stuff
end
end
end
What this is doing is pinging the page for a response code. The if statement I included is checking if the response code is a 404 and if it is run the notify_me method otherwise run your commands as usual. I just arbitrarily created that notify_me method as an example. On my system I have it writing to txt file that it emails me upon completion. You could use a similar method to look at other response codes.
Generic logging method:
def notify_me(url)
puts "Failed at #{Time.now}"
puts "URL: " + url
#logfile.puts("There was a 404 error for the site #{url} at #{Time.now}.")
end

Regarding the problem you're experiencing, you can do the following:
class WebpageHelper
def self.get_doc(url)
retried = false
begin
page_content = open(url).read
# do more stuff
rescue OpenURI::HTTPError => ex
unless ex.io.status.first.to_i == 404
log_error ex.message
sleep(10)
unless retried
retried = true
retry
end
end
# FIXME: needs some refactoring
rescue Exception => ex
puts "Failed at #{Time.now}"
puts "Error: #{ex}"
puts "URL: " + url
puts "Retrying... Attempt #: #{attempts.to_s}"
attempts = attempts + 1
sleep(10)
retry
end
end
end
But I'd rewrite the whole thing in order to do parallel processing with Typhoeus:
https://github.com/typhoeus/typhoeus
where I'd assign a callback block which would do the handling of the returned data, thus decoupling the fetching of the page and the processing.
Something along the lines:
def on_complete(response)
end
def on_failure(response)
end
def run
hydra = Typhoeus::Hydra.new
reqs = urls.collect do |url|
Typhoeus::Request.new(url).tap { |req|
req.on_complete = method(:on_complete).to_proc }
hydra.queue(req)
}
end
hydra.run
# do something with all requests after all requests were performed, if needed
end

I think everyone's comments on this question are spot on and correct. There is alot of good info on this page. Here is my attempt at collecting this very hefty bounty. That being said +1 to all answers.
If you are only concerned with 404 using OpenURI you can handle just those types of exceptions
# lib/webpage_helper.rb
rescue OpenURI::HTTPError => ex
# handle OpenURI HTTP Error!
rescue Exception => e
# similar to the original
case e.message
when /404/ then puts '404!'
when /500/ then puts '500!'
# etc ...
end
end
If you want a bit more you can do different Execption handling per type of error.
# lib/webpage_helper.rb
rescue OpenURI::HTTPError => ex
# do OpenURI HTTP ERRORS
rescue Exception::SyntaxError => ex
# do Syntax Errors
rescue Exception => ex
# do what we were doing before
Also I like what is said in the other posts about number of attempts. Makes sure it isn't an infinite loop.
I think the rails thing to do after a number of attempts would be to log, queue, and or email.
To log you can use
webpage_logger = Log4r::Logger.new("webpage_helper_logger")
# somewhere later
# ie 404
case e.message
when /404/
then
webpage_logger.debug "debug level error #{attempts.to_s}"
webpage_logger.info "info level error #{attempts.to_s}"
webpage_logger.fatal "fatal level error #{attempts.to_s}"
There are many ways to queue.
I think some of the best are faye and resque. Here is a link to both:
http://faye.jcoglan.com/
https://github.com/defunkt/resque/
Queues work just like a line. Believe it or not the Brits call lines, "queues" (The more you know). So, using a queuing server then you can line up many requests and when the server you are trying to send the request comes back, you can hammer that server with your requests in the queue. Thus forcing their server to go down again, but hopefully over time they will upgrade their machines because they keep crashing.
And finally to email, rails also to the rescue (not resque)...
Here is the link to rails guide on ActionMailer: http://guides.rubyonrails.org/action_mailer_basics.html
You could have a mailer like this
class SomeClassMailer < ActionMailer::Base
default :from => "notifications#example.com"
def self.mail(*args)
...
# then later
rescue Exception => e
case e.message
when /404/ && attempts == 3
SomeClassMailer.mail(:to => "broken#example.com", :subject => "Failure ! #{attempts}")

Instead of using initialize, which always returns a new instance of an object, when creating a new SomeClass from a scraping, I'd use a class method to create the instance. I'm not using exceptions here beyond what nokogiri is throwing because it sounds like nothing else should bubble up further since you just want these to be logged, but otherwise be ignored. You mentioned logging the exceptions--are you just logging what goes to stdout? I'll answer as if you are...
# lib/my_helper_module.rb
module MyHelperModule
def self.do_the_process(args = {})
my_models = args[:my_models]
# Parallel.each(my_models, :in_processes => 5) do |my_model|
my_models.each do |my_model|
# Reconnect to prevent errors with Postgres
ActiveRecord::Base.connection.reconnect!
some_object = SomeClass.create_from_scrape(my_model.id)
if some_object
# Do something super interesting if you were able to get a scraping
# otherwise nothing happens (except it is noted in our logging elsewhere)
end
end
end
Your SomeClass:
# lib/some_class.rb
require_relative 'webpage_helper'
class SomeClass
attr_accessor :some_data
def initialize(doc)
#doc = doc
end
# could shorten this, but you get the idea...
def self.create_from_scrape(arg)
doc = WebpageHelper.get_doc("http://somesite.com/#{arg}")
if doc
return SomeClass.new(doc)
else
return nil
end
end
end
Your WebPageHelper:
# lib/webpage_helper.rb
require 'nokogiri'
require 'open-uri'
class WebpageHelper
def self.get_doc(url)
attempts = 0 # define attempts first in non-block local scope before using it
begin
page_content = open(url).read
# do more stuff
rescue Exception => ex
attempts += 1
puts "Failed at #{Time.now}"
puts "Error: #{ex}"
puts "URL: " + url
if attempts < 3
puts "Retrying... Attempt #: #{attempts.to_s}"
sleep(10)
retry
else
return nil
end
end
end
end

perform not being called for Delayed Jobs

I'm using delayed_job 2.1.4 from collectiveidea, and it seems the perform method is never called even though the jobs are processed and removed from the queue. Am I missing something?
I'm using Rails 3.0.5 on Heroku
In the Controller:
Delayed::Job.enqueue FacebookJob.new
In the Job class:
class FacebookJob
def initialize
end
def perform
fb_auths = Authentication.where(:provider => 'facebook')
fb_auths.each do |auth|
checkins = FbGraph::User.new('me', :access_token => URI.encode(auth.token)).checkins
if checkins != nil
checkins.each do |checkin|
[...]
end
end
end
end
end
(the whole code: https://gist.github.com/966509)

The simple answer: does DelayedJob know about the Authentication and FBGraph::User classes? If not, you'll see exactly the behavior you describe: the items will be silently removed from the queue.
See this entry in the Delayed Job Wiki in the Delayed Job Wiki.
Try adding 'require authentication' and 'require fb_graph' (or whatever) in your facebook_job.rb file.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart