How to debug Cron Jobs on Google App Engine? - ruby-on-rails

Situation
Cron is deployed (alongside the Rails app) to GCP with cron.yaml:
cron:
- description: count things regularly
  url: /api/v1/cron/rake_task
  schedule: every 30 minutes
  timezone: Europe/Berlin
Question
How can I see the cron log? The View link reveals nothing at all, but clearly there has to be a sensible way to debug the failure.
In a standard environment one could check /var/log/syslog or /var/log/cron.log, but here there is nothing, even if I log in to the VM or inspect the main gaeapp container.
Any leads would be welcome!

The cron log gets written into the log of the default service.
Instead of using the View link, filter the default service's log for the Rake task route that the cron job is calling.
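For example, with the gcloud CLI, something like this should surface the cron requests (a sketch; the default service name and the route come from the cron.yaml above):
gcloud app logs read --service=default --limit=200 | grep "/api/v1/cron/rake_task"
Any error responses or stack traces logged by the Rake task route should show up there.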

I have a base HTTP request handler that all the other request handlers inherit from. I wrapped its dispatch in a try/except so that I can email myself the stack trace every time an exception occurs.
Here is how I did it in Python/webapp2; I'm sure you could set up something similar in Ruby (alternatively you could use a product like https://rollbar.com/). A rough Rails sketch follows the Python code below.
import traceback

import webapp2
from google.appengine.api import mail, app_identity

def email_dev(subject, body):
    try:
        app_id = app_identity.get_application_id()
        mail.AdminEmailMessage(
            sender="%s <no-reply@%s.appspotmail.com>" % (app_id, app_id),
            subject=subject,
            body=body,
        ).send()
    except Exception as e:
        print("Error sending email: " + str(e))

class BaseHandler(webapp2.RequestHandler):
    def dispatch(self, *args, **kwargs):
        try:
            return super(BaseHandler, self).dispatch(*args, **kwargs)
        except Exception:
            email_dev(
                "Handler: " + self.__class__.__name__ + " failed",
                traceback.format_exc())
            raise
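A rough Rails equivalent of the same idea, as a sketch (ErrorMailer and its crash_report method are assumptions, not part of the original answer):

class ApplicationController < ActionController::Base
  # Mail ourselves the backtrace on any unhandled exception, then re-raise
  # so the request still fails visibly (ErrorMailer is a hypothetical mailer).
  rescue_from StandardError do |e|
    ErrorMailer.crash_report(
      "Handler: #{self.class.name} failed",
      ([e.message] + e.backtrace).join("\n")
    ).deliver_now
    raise e
  end
end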

Related

How to mark a sidekiq task/job for retry without raising an error?

I use a Sidekiq queue to process communications with an unreliable, 3rd party API. Since this API is often down for a couple minutes at a time and then back up again, Sidekiq has been handy. When a connection issue happens, an error is raised and Sidekiq throws the job back in the queue to be retried again later, after some time has passed.
I use NewRelic to not only help debug crashes, but also for monitoring. My problem is that this current methodology above creates errors in NewRelic. If the 3rd party API is down for more than a couple of minutes, the error count accumulates enough to cause notifications to send out through NewRelic.
What I'd like to do is only raise an error from my worker when a certain number of retries have occurred for a job. I'm using sidekiq_retries_exhausted to do this. My problem is that I'm not quite sure how to put jobs back in the queue after they have an error without raising an error.
Does Sidekiq provide any facilities to return a job to a queue, increment the number of retries for the job, and have it sit there until it's due to run again, as if an exception was raised in the worker class?
You raise a specific error and tell the error service to ignore errors of that type. For NewRelic:
https://docs.newrelic.com/docs/agents/ruby-agent/installation-configuration/ruby-agent-configuration#error_collector.ignore_errors
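In newrelic.yml that looks roughly like this (a sketch based on the docs linked above; RetryNotAnError matches the class defined in the snippet below):

error_collector:
  enabled: true
  ignore_errors: "ActionController::RoutingError,RetryNotAnError"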
Here is what I did to keep intentional retry errors out of AirBrake:
class TaskWorker
  include Sidekiq::Worker

  class RetryNotAnError < RuntimeError
  end

  def perform(task_id)
    task = Task.find(task_id)
    task.do_cool_stuff

    if task.finished?
      Sidekiq.logger.debug "Task #{task_id} was successful."
      return false
    else
      Sidekiq.logger.debug "Task #{task_id} will try again later."
      raise RetryNotAnError, task_id
    end
  end
end
Tell Airbrake to ignore it:
Airbrake.configure do |config|
  config.ignore << 'RetryNotAnError'
end
It's good to make your exception name OBVIOUSLY not an error (e.g. RetryLaterNotAnError), as it will still show up in logs and such, and you don't want to freak people out when they see a bunch of them.
P.S. That said, I would really like to see Sidekiq provide an explicit, errorless retry mechanism.
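Until it does, one DIY approach is to reschedule the job yourself and swallow the error; a sketch (ApiDown, the delays, and the 10-attempt cap are assumptions, and note this bypasses Sidekiq's built-in retry counter):

class TaskWorker
  include Sidekiq::Worker

  def perform(task_id, attempt = 0)
    Task.find(task_id).do_cool_stuff
  rescue ApiDown
    # Re-enqueue with a growing delay instead of raising; give up after 10 tries.
    self.class.perform_in(60 * (attempt + 1), task_id, attempt + 1) if attempt < 10
  end
end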
If using Sidekiq Enterprise, one other option might be to utilize the optional set of additional error types that will then get treated as Sidekiq::Limiter::OverLimit violations.
For my purposes, I've used a new error class and then added it to the list in the config. Here are the notes from the sidekiq-ent code (not in the public sidekiq repo) on how to modify your config file:
# An optional set of additional error types which would be
# treated as a rate limit violation, so the job would automatically
# be rescheduled as with Sidekiq::Limiter::OverLimit.
#
# Sidekiq::Limiter.errors << MyApp::TooMuch
# Sidekiq::Limiter.errors = [Foo::Error, MyApp::Limited]
Inside the specific job you can specify the max_retries, or it will default to 20:
sidekiq_options max_limiter_retries: 10
Inside the job, I'll rescue the "expected" intermittent error that I'd rather not ignore completely and then raise the error I've added to the list, something like this:
def perform(task_id)
  # ... work that may hit the flaky API ...
rescue RestClient::RequestTimeout => e
  raise SidekiqSoftRetry.new(e.inspect)
end
Here's what that looks like in my initialization file -- and Mike Perham was kind enough to respond with the option to update the global retry limit.
class SidekiqSoftRetry < RuntimeError
end

Sidekiq::Limiter::DEFAULT_OPTIONS[:reschedule] = 10

Sidekiq::Limiter.configure do |config|
  config.errors.concat(
    [
      SidekiqSoftRetry,
    ]
  )
end

Write to the system's standard error in Progress

I am writing a small program in Progress that needs to write an error message to the system's standard error. What ways, ideally simple ones, can I use to print to standard error?
I am using OpenEdge 11.3.
When on Windows (10.2B+) you can use .NET:
System.Console:Error:WriteLine ("This is an error message") .
together with
prowin32 2> stderr.out
Progress doesn't provide a way to write to stderr - the easiest way I can think of is to output-through an external program that takes stdin and echoes it to stderr.
You could look into LOG-MANAGER:WRITE-MESSAGE. It won't log to standard output or standard error, but to a client-specific log. This log should be monitored in any case (specifically if the client is an application server).
From the documentation:
For an interactive or batch client, the WRITE-MESSAGE( ) method writes the log entries to the log file specified by the LOGFILE-NAME attribute or the Client Logging (-clientlog) startup parameter. For WebSpeed agents and AppServer servers, the WRITE-MESSAGE() method writes the log entries to the server log file. For DataServers, the WRITE-MESSAGE() method writes the log entries to the log file specified by the DataServer Logging (-dslog) startup parameter.
LOG-MANAGER:WRITE-MESSAGE("Got here, x=" + STRING(x), "DEBUG1").
Will write this in the log:
[04/12/05@13:19:19.742-0500] P-003616 T-001984 1 4GL DEBUG1 Got here, x=5
There are quite a lot of options regarding the LOG-MANAGER system, what messages to display, where the file is placed, etc.
There is no easy way, but in Unixen you can always do something like this using OUTPUT THROUGH (untested):
output through "cat >&2" no-echo unbuffered.
Alternatively -- and this is tested -- if you just want error messages from a batch-mode program to go to standard out then
output through "tee" ...
...definitely works.

Why is my Ruby script utilizing 90% of my CPU?

I wrote an admin script that tails a Heroku log and, every n seconds, summarizes averages and notifies me if I cross a certain threshold (yes, I know and love New Relic -- but I want to do custom stuff).
Here is the entire script.
I have never been a master of IO and threads, so I wonder if I am making a silly mistake. I have a couple of daemon threads with while(true){} loops, which could be the culprit. For example:
# read new lines
f = File.open(file, "r")
f.seek(0, IO::SEEK_END)
while true do
  select([f])
  line = f.gets
  parse_heroku_line(line)
end
I use one daemon to watch for new lines of a log, and the other to periodically summarize.
Does someone see a way to make it less processor-intensive?
This probably runs hot because you never really block while reading from the temporary file. IO::select is a thin layer over POSIX select(2). It looks like you're trying to block until the file is ready for reading, but select(2) considers EOF to be ready ("a file descriptor is also ready on end-of-file"), so you always return right away from select then call gets which returns nil at EOF.
You can get a truer EOF reading and nice blocking behavior by avoiding the thread which writes to the temp file and instead using IO::popen to fork the %x[heroku logs --ps router --tail --app pipewave-cedar] log tailer, connected to a ruby IO object on which you can loop over gets, exiting when gets returns nil (indicating the log tailer finished). gets on the pipe from the tailer will block when there's nothing to read and your script will only run as hot as it takes to do your line parsing and reporting.
EDIT: I'm not set up to actually try your code, but you should be able to replace the log tailer thread and your temp file read loop with this code to get the behavior described above:
IO.popen( %w{ heroku logs --ps router --tail --app my-heroku-app } ) do |logf|
  while line = logf.gets
    parse_heroku_line(line) if line =~ /^/
  end
end
I also notice your reporting thread does not do anything to synchronize access to @total_lines, @total_errors, etc. So you have some minor race conditions where you can get inconsistent values from the instance vars that parse_heroku_line updates. A Mutex around the updates and reads fixes that, as sketched below.
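A minimal sketch of that synchronization (the counter names are taken from the question; record_line and snapshot are illustrative helpers):

require 'thread'

@stats_lock = Mutex.new

# Called from the parsing thread for each log line.
def record_line(error_line)
  @stats_lock.synchronize do
    @total_lines += 1
    @total_errors += 1 if error_line
  end
end

# Called from the reporting thread to get a consistent pair of values.
def snapshot
  @stats_lock.synchronize { [@total_lines, @total_errors] }
end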
select is about whether a read would block. f is just a plain old file, so when you get to the end, reads don't block; they just return nil instantly. As a result, select returns instantly rather than waiting for something to be appended to the file, as I assume you're expecting. Because of this you're sitting in a tight busy loop, so high CPU is to be expected.
If you are at EOF (you could either check f.eof? or whether gets returns nil), then you could either start sleeping (perhaps with some sort of back-off, as sketched below) or use something like listen to be notified of filesystem changes.
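A minimal sketch of the sleep-on-EOF variant (the 0.5-second delay is an arbitrary choice):

f = File.open(file, "r")
f.seek(0, IO::SEEK_END)
loop do
  line = f.gets
  if line
    parse_heroku_line(line)
  else
    sleep 0.5  # at EOF: back off instead of spinning
  end
end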

Symfony mailer: Swift_TransportException between message sending

On a project I'm currently working on, I have a symfony task that runs some mass data insertion into the database and takes at least half an hour.
When the task starts, a mail notification is sent correctly; the problem is that at the end of the task execution we can't send another mail to notify about the end of processing.
The mailer factory is currently configured with the spool delivery strategy but, in this specific situation, we want to fire a notification immediately, using the sendNextImmediately() method.
I'm getting the exception:
[Swift_TransportException]
Expected response code 250 but got code "451", with message "451 4.4.2 Timeout - closing connection. 74sm1186065wem.17
"
and the following error in the PHP log file:
Warning: fwrite(): SSL: Broken pipe in /var/www/project/lib/vendor/symfony/lib/vendor/swiftmailer/classes/Swift/Transport/StreamBuffer.php on line 209
Can anyone give some help?
Is there any way that I can perhaps refresh the symfony mailer to establish a new connection?
Doing a Symfony2 project, I ran across this failure too. We were using a permanently running PHP script, which produced the error.
We figured out that the following code does the job:
private function sendEmailMessage($renderedTemplate, $subject, $toEmail)
{
    /* @var $mailer \Swift_Mailer */
    $mailer = $this->getContainer()->get('mailer');
    if (!$mailer->getTransport()->isStarted()) {
        $mailer->getTransport()->start();
    }

    /* @var $message \Swift_Message */
    $message = \Swift_Message::newInstance()
        ->setSubject($subject)
        ->setFrom($this->getContainer()->getParameter('email_from'))
        ->setTo($toEmail)
        ->setBody($renderedTemplate);

    $mailer->send($message);
    $mailer->getTransport()->stop();
}
For Symfony1 Users
My guess was that the connection was being held open for too long (with no activity at all), causing an SSL connection timeout.
For now, the problem can be solved by stopping the Swift_Transport instance and starting it again explicitly, just before sending the second message.
Here is the code:
$this->getMailer()->getRealtimeTransport()->stop();
$this->getMailer()->getRealtimeTransport()->start();
$this->getMailer()->sendNextImmediately()->send($message);
I had exactly the same problem and the above solutions were very helpful, but there is one thing I had to do differently: the order.
$this->getMailer()->sendNextImmediately()->send($message);
$this->getMailer()->getRealtimeTransport()->stop();
It didn't work for me when I tried to stop the transport before sending the message (the connection timeout was already hanging). Also, you don't need to run getRealtimeTransport()->start() - it will be started automatically.

how can we tell delayed_job when a delayed task fails so it will auto-retry?

Our app is hosted on Heroku and we use delayed_job when sending info to a remote system (via a GET to a URL with some URL params).
The remote system usually returns a success code, but if it's really busy it returns a try-again code.
Suppose our method is
require 'open-uri'

def send_info
  the_url = "http://mydomain.com/dosomething?arg=#{self.someval}"
  the_result = open(the_url).read
  successflag = get_success_flag_from(the_result)
end
and so somewhere in our code we do
@widget.delay.send_info
and that all works fine.
Except it does not automatically handle the case where the remote said to try back later.
Is there any way for the send_info method (which is what delayed job will execute) to "tell" delayed_job "retry me again"? Do we need to throw some custom exception or something?
Raising any kind of exception ought to cause delayed_job to requeue it (subject to only-trying-so-many-times); if you don't especially need a custom exception you can just raise a RuntimeError.
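Concretely, something like this (a sketch; TryAgainLater and the try-again check are illustrative, not from the original code):

class TryAgainLater < RuntimeError; end

def send_info
  the_url = "http://mydomain.com/dosomething?arg=#{self.someval}"
  the_result = open(the_url).read
  # Any uncaught exception makes delayed_job reschedule the job with back-off,
  # up to its max_attempts limit.
  raise TryAgainLater, "remote busy" unless get_success_flag_from(the_result)
end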
