Sidekiq worker shutdown - ruby-on-rails

I have got strange errors
2016-07-15T14:34:09.334Z 16484 TID-33ld0 WARN: Terminating 1 busy worker threads
2016-07-15T14:34:09.334Z 16484 TID-33ld0 WARN: Work still in progress [#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:load_xml", job="{\"class\":\"GuitarmaniaWorker\",\"args\":[],\"retry\":false,\"queue\":\"load_xml\",\"jid\":\"56c01b371c3ee077c2ccf440\",\"created_at\":1468590072.35382,\"enqueued_at\":1468590072.3539252}">]
2016-07-15T14:34:09.334Z 16484 TID-33ld0 DEBUG: Re-queueing terminated jobs
2016-07-15T14:34:09.335Z 16484 TID-33ld0 INFO: Pushed 1 jobs back to Redis
2016-07-15T14:34:09.336Z 16484 TID-33ld0 INFO: Bye!
What can cause it?
Locally all work great but after deployment on production server this errors appeared.
Any suggestions.

This means the Sidekiq is being shut down. In the event of a normal "kill" operation (TERM signal), Sidekiq server will attempt to shutdown gracefully by waiting 8 seconds for jobs to be complete. It will then stop the job execution and re-queue for the next time the server starts.
That begs the question, why is your Sidekiq shutting down. Possible reasons are: you on command-line or a script killed the process; you or your data host shut down the machine; your OS ran out of memory. The last reason is the most likely if you're not sure of the cause.
If the memory leak occurs soon after startup, you may be running too many Sidekiq processes and/or your machine may not have enough memory to load the app. If the message happens after some time, it's possible you have a memory leak - run free periodically after starting Sidekiq to see if memory usage scales up, which would indicate a leak. Sometimes a memory leak is due to a library, sometimes it's your own application. More about tracking down leaks in Ruby here.

Related

Docker container stopping in the Jelastic environment

When stopping a Docker container in native Docker environment, by default it sends the SIGTERM signal to the container's init process (PID 1) which should be the actual application, which should then handle the shutdown properly. However when running container in the Jelastic, this does not seem to be case, and instead of gracefully terminating the SQL server, it seems that the server crashes every time.
I did try writing and enabling a Systemd service that gets the SQL PID and then send SIGTERM to it, but it doesn't seem to run, and judging from the logs there's no service shutdown messages at all, just startup messages.
So what changes would be required to the container or the environment to get the server to get the SIGTERM signal and have enough time, maybe few seconds, to do the graceful shutdown?
thank you for reporting the issue, we tried to reproduce the problem on our test lab and were able to get exactly same result. We agree that issue is really serious so we are going to fix it with highest priority now. Please accept our apologies for that inconvenience. I want to notice that due to our primary design we also expect the process to be terminated first with "sigterm" signal and only after not receiving a termination result for some period of time the system had to send "sigkill", only after considering that process cannot be terminated gracefully. Our engineers will work on this to explore the issue deeper and will deliver a fix shortly.
Thank you!

sidekiq memory usage reset

I have Rails app which uses Sidekiq for background process. To deploy this application I use capistrano, ubuntu server and apache passenger. To start and restart Sidekiq I use capistrano-sidekiq gem.
My problem is - when Sidekiq is running, amount of memory (RAM) used by Sidekiq is growing up. And when Sidekiq finished all processes (workers) it keeps holding a large amount of RAM and not reseting it.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ubuntu 2035 67.6 45.4 3630724 1838232 ? Sl 10:03 133:59 sidekiq 3.5.0 my_app [0 of 25 busy]
How to make Sidekiq to reset used memory after workers finished their work?
Sidekiq uses thread to execute jobs.
And threads share the same memory as the parent process.
So if one job uses a lot of memory the Sidekiq process memory usage will grow up and won't be released by Ruby.
Resque uses another technique it executes every jobs in another process therefore when the job is done, the job's process exits and the memory is released.
One way to prevent your Sidekiq process from using too much memory is to use Resque's forking method.
You could have your job main method executed in another process and wait until that new process exits
ex:
class Job
include Process
def perform
pid = fork do
# your code
end
waitpid(pid)
end
end

Delayed Jobs workers getting timed out on engineyard

I think i'm having a problem where engineyard is adding a timeout to some of my delayed job workers, (seems to be 10 minutes). I have a copy process that can run for > 10 minutes and everytime it gets to that 10 minutes threshold the job is killed. Is there anyway to configure the engineyard timeout for worker instances?? I'm looking through and all I see is timeouts regarding nginx/apache
There isn't a timeout set for the Delayed Job workers, so this is more likely a memory usage issue. Monit tracks the memory consumed by the workers and will restart those that reach a set threshold. Monit's actions will be logged in /var/log/syslog, so this can be checked to confirm if Monit is terminating the workers. The memory threshold is set in the /etc/monit.d/delayed_job.monitrc file(s) and can be increased to fit the workers' requirements. After alteration of the configuration Monit must be reloaded using sudo monit reload.
If you submit a ticket at https://support.cloud.engineyard.com the support staff will be more than happy to help you further diagnose this issue.

Sidekiq worker is leaking memory

Using the sidekiq gem - I have sidekiq worker that runs a process (git-tf clone of big repository) using IO.popen and tracks the stdout to check the progress of the clone.
When I am running the worker, I see that sidekiq memory is getting larger over the time until I get kernel OOM and the process get killed. the subprocess (java process) is taking only 5% of the total memory.
How I can debug/check the memory leak I have in my code? and does the sidekiq memory is the total of my workers memory with the popen process?
And does anyone have any idea how to fix it?
EDIT
This is the code of my worker -
https://gist.github.com/yosy/5227250
EDIT 2
I ran the code without sidekiq, and I have no memory leaks.. this is something strange with sidekiq and big repositories in tfs
I didn't find the cause for the memory leak in sidekiq, but I found a away to get a way from sidekiq.
I have modified git-tf to have server command that accepts command from redis queue, it removes lot of complexity from my code.
The modified version of git-tf is here:
https://github.com/yosy/gittf
I will add documentation later about the sever command when I will fix some bugs.

Possible memory issue crashing Hbase Thrift Server

I'm running Cloudera CDH4 with Hbase and Hbase Thrift Server. Several times a day, the Thrift Server crashes.
In /var/log/hbase/hbase-hbase-thrift-myserver.out, there is this:
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 8151"...
In /var/log/hbase/hbase-hbase-thrift-myserver.log, there are no error messages at the end of the file. There are only a lot of DEBUG messages stating that one of the nodes is caching a particular file.
I can't figure out any configuration options for the Hbase Thrift Server. There are no obvious files in /etc/. Just /etc/hbase/conf and its Hbase files.
Any ideas on debugging?
We had this exact same problem with our HBase Thrift setup, and ended up using a watchdog script that restarts Thrift if its not running.
Are you hitting your HBase server hard, several times a day? That could result in this. No way around this, Thrift does seem to take up (or leak) a lot of memory every time its used, so you need a watchdog script.
If a watchdog script is too heavy-duty, you could use a simple cron job to restart Thrift during frequent intervals to make sure it stays up.
The following cron restarts Thrift every two hours.
0 */2 * * * hbase-daemon.sh restart thrift
Using /etc/hbase/conf/hbase-env.sh, I increased my heap size, and this addressed the crashing issue.
# The maximum amount of heap to use, in MB. Default is 1000.
export HBASE_HEAPSIZE=8000
Thanks to Harsh J on the CDH Users mailing list for helping me figure out. As he pointed out, my lack of log messages indicates a kill -9 is probably taking place:
Indeed if a shutdown handler message is missing in the log tail
pre-crash, there may have been a kill -9 passed to the process via the
OOM handler.
increasing heap size may not be the solution always.
as per this cloudera blog,
Thrift server might be receiving invalid data.
i would suggest to enable the Framed transport and compact protocol.
there's a catch if you enable these protocols on server, client should be using the same protocol.

Resources