How to check the health of a docker image running sidekiq

I'm using Kubernetes on my cluster with several Rails / Node docker images. Most of them have a :3000/healthz health check that simply returns status 200 with OK in the body.
Now I'm trying to work out the best way to run this kind of health check on a docker image running Sidekiq. How can I verify that the worker is running?

If your image is unix-like, you can check whether the process is running with
$ ps aux | grep '[s]idekiq'
But this doesn't guarantee that everything is working inside Sidekiq and Redis.
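If you want to wire this into a Kubernetes exec liveness probe, the check only needs to exit non-zero when the process is missing. A minimal sketch, assuming a standard Linux image with ps available (the bracketed [s] keeps grep from matching its own command line):

#!/bin/sh
# exits 0 if a sidekiq process is found, 1 otherwise;
# the pipeline's exit status is grep's, so no explicit if/else is needed
ps aux | grep '[s]idekiq' > /dev/null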
A better approach is described/developed in this sidekiq plugin https://github.com/arturictus/sidekiq_alive
I'm facing problems with the livenessProbe in k8s myself and am trying to solve this without using this lib, but I haven't succeeded yet.

Sidekiq 6.0 ships with a new sidekiqmon binary, which you can use to validate that a process is running on your current machine with Redis:
REDIS_URL=redis://redis.example.com:6380/5 sidekiqmon | grep $(hostname)
Documentation: https://github.com/mperham/sidekiq/wiki/Monitoring#sidekiqmon
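To turn this into a probe rather than an interactive check, grep -q gives you a clean exit status. A sketch, where the REDIS_URL value and the hostname-matching convention are assumptions about your setup:

#!/bin/sh
# exits 0 only if a Sidekiq process registered under this host's name
# is visible in Redis, 1 otherwise
REDIS_URL=redis://redis.example.com:6380/5 sidekiqmon processes | grep -q "$(hostname)"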

You have different approaches available, depending mostly on how you deployed Sidekiq inside the Kubernetes pod.
Here are some of them.
SIDEKIQMON
Since Sidekiq v6 we have sidekiqmon.
You can try something like:
bundle exec sidekiqmon processes | grep $(hostname)
To make it work you must be able to grep the running process that belongs to the current pod/container.
PS AUX (I used in the past)
Something like this could work:
ps aux | grep '[s]idekiq 6'
Note the 6 (the Sidekiq major version) at the end, to differentiate it from other processes you might have.
Either way, the idea here is simply to inspect your processes with ps aux and grep for the Sidekiq one you need to monitor.
SIDEKIQ_ALIVE
Another approach is to use sidekiq_alive.
It requires more effort to configure and manage, and I never really tried it, but it is the only option that really checks the full functionality, including the Redis status.
SYSTEMCTL
If you deployed sidekiq using systemd you can just do:
systemctl status sidekiq | grep '[r]unning'
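A simpler variant is to rely on systemctl's own exit status instead of grepping the human-readable output; a sketch, assuming your unit is named sidekiq:

# exits 0 if the unit is active, non-zero otherwise; --quiet suppresses output
systemctl is-active --quiet sidekiq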
So in the end my YAML manifest for the Sidekiq deployment looks like this:
livenessProbe:
  exec:
    command: ["/bin/bash", "-l", "-c", "bundle exec sidekiqmon processes | grep $(hostname)"]
  initialDelaySeconds: 120
  periodSeconds: 30
  successThreshold: 1
  failureThreshold: 3
  timeoutSeconds: 30
Note that only the sidekiq_alive approach performs a full end-to-end test, checking the Redis status as well. That may be overkill: you may already have other monitoring for Redis, so with the Kubernetes liveness probe you may not need to verify the whole flow, just whether the Sidekiq process is still alive or has crashed due to a memory leak. This is up to you, based on your needs.
"Why you need this kind of LivenessProbes?"
Because if you are launching Sidekiq via systemd or similar, you will have the main process (PID 1) as /usr/bin/python3 /usr/bin/systemctl start sidekiq and the related process launched with ExecStart as sidekiq 6.1.3 your_app_name [0 of 5 busy]. If you have a memory leak, the last one will crash and not the PID 1.
For this reason you need a sort of livenessProbes for Sidekiq.
Instead if you are not using systemd and your PID 1 is sidekiq 6.1.3 your_app_name [0 of 5 busy] because you are launching it with bundle exec sidekiq you don't need it.
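To check which situation you are in, inspect what PID 1 actually is inside the container; a quick sketch:

# print the command behind PID 1: if it's the sidekiq process itself,
# a crash will kill the container anyway and the probe matters less
ps -o pid,args -p 1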

Related

Pulumi does not perform graceful shutdown of kubernetes pods

I'm using pulumi to manage kubernetes deployments. One of the deployments runs an image which intercepts SIGINT and SIGTERM signals to perform a graceful shutdown like so (this example is running in my IDE):
{"level":"info","msg":"trying to activate gcloud service account","time":"2021-06-17T12:19:25-05:00"}
{"level":"info","msg":"env var not found","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Started Worker","time":"2021-06-17T12:19:25-05:00"}
{"Namespace":"default","Signal":"interrupt","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Worker has been stopped.","time":"2021-06-17T12:19:27-05:00"}
{"Namespace":"default","TaskQueue":"main-task-queue","WorkerID":"37574#Paymahns-Air#","level":"error","msg":"Stopped Worker","time":"2021-06-17T12:19:27-05:00"}
Notice the "Signal":"interrupt" with a message of Worker has been stopped.
I find that when I alter the source code (which alters the docker image) and run pulumi up, the pod doesn't terminate gracefully in the way described in this blog post. Here's a screenshot of logs from GCP:
The highlighted log line in the screenshot is the first log line emitted by the app. Note that the shutdown messages don't appear above the highlighted line, which suggests to me that the pod isn't given a chance to shut down gracefully.
Why might the pod not go through the graceful shutdown mechanisms that kubernetes offers? Could this be a bug with how pulumi performs updates to deployments?
EDIT: after doing more investigation I found that this problem happens because starting a docker container with go run /path/to/main.go actually ends up creating two processes, like so (after exec-ing into the container):
root#worker-ffzpxpdm-78b9797dcd-xsfwr:/gadic# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.3 0.3 2046200 30828 ? Ssl 18:04 0:12 go run src/cmd/worker/main.go --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --grpc-hos
root 3782 0.0 0.5 1640772 43232 ? Sl 18:06 0:00 /tmp/go-build2661472711/b001/exe/main --temporal-host temporal-server.temporal.svc.cluster.local --temporal-port 7233 --grpc-port 6789 --
root 3808 0.1 0.0 4244 3468 pts/0 Ss 19:07 0:00 /bin/bash
root 3817 0.0 0.0 5900 2792 pts/0 R+ 19:07 0:00 ps aux
If I run kill -TERM 1, the signal isn't forwarded to the underlying binary, /tmp/go-build2661472711/b001/exe/main, which means the application's graceful shutdown logic isn't executed. However, if I run kill -TERM 3782, the graceful shutdown logic is executed.
It seems go run spawns a subprocess, and this blog post suggests signals are only delivered to PID 1. On top of that, it's unfortunate that go run doesn't forward signals to the subprocess it spawns.
The solution I found is to add RUN go build -o worker /path/to/main.go to my dockerfile and then start the docker container with ./worker --arg1 --arg2 instead of go run /path/to/main.go --arg1 --arg2.
Doing it this way ensures there are no subprocesses spawned by go, which in turn ensures signals are handled properly within the docker container.
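An alternative that achieves the same effect is to exec the compiled binary from a shell entrypoint, so it becomes PID 1 and receives SIGTERM directly; a minimal sketch, reusing the flags from the ps output above:

#!/bin/sh
# exec replaces the shell with the worker binary, so no intermediate
# process sits at PID 1 swallowing signals
exec ./worker \
  --temporal-host temporal-server.temporal.svc.cluster.local \
  --temporal-port 7233 --grpc-port 6789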

Issue accessing vespa outside docker container

I installed Docker on Mac and am trying to run Vespa on Docker following the steps in this link:
https://docs.vespa.ai/documentation/vespa-quick-start.html
I didn't have any issues until step 4. I see the vespa container running after step 2, and step 3 returned a 200 OK response.
But step 5 failed to return a 200 OK response. Below is the command I ran in my terminal:
curl -s --head http://localhost:8080/ApplicationStatus
I keep getting
curl: (52) Empty reply from server
whenever I run it without the -s option.
So I tried to see the listening ports inside my vespa container and don't see anything for 8080, but I can see 19071 (used in step 3):
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 8080'
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 19071'
tcp 0 0 0.0.0.0:19071 0.0.0.0:* LISTEN
Below doc has info related to vespa ports
https://docs.vespa.ai/documentation/reference/files-processes-and-ports.html
I'm assuming port 8080 should be active after docker run (step 2 of the quick-start link) and accessible outside the container, since port mapping is done.
But I don't see port 8080 active inside the container in the first place.
Am I missing something? Do I need to perform any steps beyond those mentioned in the quick start? FYI, I installed Jenkins inside Docker and was able to access it outside the container via port mapping, but I'm not sure why it's not working with Vespa. I have been trying for quite some time but have made no progress. Please advise if I'm missing something here.
Your Docker container has too little memory: "Minimum 6GB memory dedicated to Docker (the default is 2GB on Macs)." See https://docs.vespa.ai/documentation/vespa-quick-start.html
The deadlock-detector warnings and the failure to get configuration from the configuration server (which was likely OOM-killed) indicate that you are running low on memory.
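One quick way to confirm how much memory the Docker VM actually has available (a sketch; the exact output format can vary between Docker versions):

# prints the total memory available to the Docker daemon / VM
docker info | grep "Total Memory"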
My guess is that your jdisc container had not finished initializing, or did not initialize properly. Did you check the log?
docker exec vespa bash -c '/opt/vespa/bin/vespa-logfmt /opt/vespa/logs/vespa/vespa.log'
This should tell you if there was something wrong. When it is ready to receive requests you would see something like this:
[2018-12-10 06:30:37.854] INFO : container Container.org.eclipse.jetty.server.AbstractConnector Started SearchServer#79afa369{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
[2018-12-10 06:30:37.857] INFO : container Container.org.eclipse.jetty.server.Server Started #10280ms
[2018-12-10 06:30:37.857] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components. Application switch number: 0
[2018-12-10 06:30:37.859] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Initializing new set of configurations and components. Application switch number: 1

solr 6.3.0 not starting Ubuntu 14.04

I am trying to run Solr on my machine and have made everything needed available for it.
For example, the Java and Ruby versions are the same as those asked for in the tutorials around.
This is how I am doing it:
solr_wrapper -d solr/config/ --collection_name hydra-development --version 6.3.0
This throws the following error:
`exec': Failed to execute solr start: (RuntimeError)
Port 8983 is already being used by another process (pid: 1814)
Please choose a different port using the -p option.
The error message clearly indicates that some other process is using port 8983.
You need to find out which process that is and kill it.
First run
$ lsof -i :8983
This will list the applications running on port 8983. Let's say the pid of the process is 1814.
Then run
$ sudo kill 1814
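The lookup and the kill can also be combined into one line; a sketch relying on lsof's -t (terse) flag, which prints only the PIDs:

$ sudo kill $(lsof -t -i :8983)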
If you run into Error CREATEing SolrCore, it is usually because of permission issues caused by installing as root.
First clean up the broken core:
bin/solr delete -c mycore
and recreate the core as the solr user:
su - solr -c "/opt/solr/bin/solr create_core -c mycore"

Unicorn failing to spawn workers on USR2 signal

I'm sending a USR2 signal to the master process in order to achieve a zero-downtime deploy with Unicorn. After the old master is dead, I'm getting the following error:
adding listener failed addr=/path/to/unix_socket (in use)
unicorn-4.3.1/lib/unicorn/socket_helper.rb:140:in `initialize':
Address already in use - /path/to/unix_socket (Errno::EADDRINUSE)
The old master is killed in the before_fork block in the unicorn.rb config file. The process is started via upstart without the daemon (-D) option.
Any idea what's going on?
Well, it turns out you have to run in daemonized mode (-D) if you want to be able to do zero-downtime deployment. I changed a few things in my upstart script and now it works fine:
setuid username
pre-start exec unicorn_rails -E production -c /path/to/app/config/unicorn.rb -D
post-stop exec kill `cat /path/to/app/tmp/pids/unicorn.pid`
respawn
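For reference, the zero-downtime reload itself is triggered by sending USR2 to the master process, whose pid Unicorn writes to the pid file; a sketch using the pid file path from the script above:

# the master re-executes itself with the new code; the old master
# is then killed in the before_fork hook
kill -USR2 `cat /path/to/app/tmp/pids/unicorn.pid`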

What is the best way to stop a Unicorn Server process from running?

What is the best way to stop a Unicorn server process from running? Whenever I try to stop it using kill -p 90234, it does not work. It is most likely something I am doing wrong.
Thanks.
Have a look at the Unicorn SIGNALS page. If the master is behaving correctly and you just want to turn it off, you should send a QUIT signal:
kill -QUIT 1234 # where 1234 is the real process id, of course
That gracefully stops all the workers, letting them finish any requests that they're in the middle of serving.
I use this:
ps aux | grep 'unicorn' | awk '{print $2}' | xargs sudo kill -9
I just looked back at this two months later. This is craziness; don't use it if you have more than one Unicorn master and you only want to kill one of them.
Interesting that no one considered the pid file that Unicorn creates. My usual config puts it in ./tmp/unicorn.pid, so perhaps the safest way is:
kill -QUIT `cat tmp/unicorn.pid`
and the pid file is then properly deleted by the departing process. I always put the pid file in the same relative place, so I guess I could alias that for convenience, although when I am developing I don't usually daemonize unicorn.
I would probably go with:
sudo pkill unicorn_rails
ps aux | grep unicorn
#=> root 4393 2.0 0.9 65448 20764 ? S 20:06 0:35 unicorn_rails m
kill 4393
Ultimately, the key is the following line, which targets the master Unicorn process and kills it:
kill $(ps aux | grep '[u]nicorn_rails master' | awk '{print $2}')
Usually I'm lazy and I just kill by name:
$ killall processname
Simple thing here: in Terminal, type "ps" and look for the master Unicorn process. Copy its PID and then type "kill -9 90234" (where 90234 is the PID of the master Unicorn process). After that, the worker processes should disappear on their own.
For those using chef and seeing that none of the above works (because the processes are respawned as soon as you kill them):
sudo sv stop APP_NAME
sv is the control for runit.
To quit a specific Unicorn server you can use something like the following:
pkill -QUIT --pidfile /path/to/app/shared/tmp/pids/unicorn.pid
This way you can selectively kill any process, and you don't have to rely on shell evaluation/expansion, which may not be available.
