Docker with a Rails app - workers not running

So I have a Rails Application that has multiple types of workers. I decided to try and run the rails app with Docker, with a separate image for each type of worker (Resque, DelayedJob, a scheduler, different configurations). The problem is that the workers with queues (DelayedJob + Resque) are not picking up jobs (using both to rule out the queuing system itself). I can see the jobs enqueued, they're there, but the workers never pick up anything off the queue. If I run a worker off the console, it works just fine.
The images are based on Cedarish: https://github.com/progrium/cedarish
The web workers sitting behind NGINX seem to be doing fine, though I have noticed them sometimes becoming non-responsive after a while - not sure if that's related.
Any idea as to what could cause a worker, run under Docker and successfully connecting to Redis + MySQL, to just ignore the job queue and not pick anything up?
Guessing this has something to do with my Docker configuration...

Turns out this was an operating system problem: Docker was running at up to 100% CPU and generally misbehaving.
This was on a GCE instance running Debian 7 with backports.
The following fixed the problem:
sudo aptitude install bridge-utils libvirt-bin debootstrap
vi /etc/default/grub    # set the kernel flags below to enable the memory cgroup and swap accounting
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
sudo update-grub        # regenerate the GRUB config so the change is applied on the next boot
sudo reboot
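After the reboot it's worth double checking that the kernel actually picked up the new flags, for example:
cat /proc/cmdline    # should now include cgroup_enable=memory swapaccount=1
docker info          # the warnings about missing memory/swap limit support should be gone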

Related

Logging from multiple processes in a single docker container

I have an application (let's call it Master) which runs on linux and starts several processes (let's call them Workers) using fork/exec. Therefore each Worker has its own PID and writes its own logs.
When running directly on a host machine (without docker), each process uses syslog for logging, and rsyslog writes the output from each Worker to a separate file, using a config like this:
$template workerfile,"/var/log/%programname%.log"
:programname, startswith, "worker" ?workerfile
:programname, isequal, "master" "/var/log/master"
Now I want to run my application inside a docker container. Docker starts the Master process as the main container process (in the CMD section of the Dockerfile), and the Master then forks the Workers at runtime (not sure if this is a canonical way to use docker, but that's what I have). Of course I'm only getting the stdout of the Master process from docker, and the logs of the Workers get lost.
So my question is: is there any way I could get the logs from the forked processes?
To be precise, I want the logs from different processes to appear in individual files on the host machine eventually.
I tried running the rsyslog daemon inside the docker container (just like I do when running without docker), writing logs to a mounted volume, but it doesn't seem to work. I guess it requires a workaround like supervisord to run the Master process and rsyslogd at the same time, which looks like overkill to me.
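To illustrate, the kind of workaround I have in mind is roughly this (an untested sketch; the path to the Master binary is made up):
#!/bin/sh
# entrypoint.sh - start rsyslogd in the background, then run Master in the foreground
rsyslogd                      # picks up /etc/rsyslog.conf with the template/filters shown above
exec /usr/local/bin/master    # hypothetical path; Master then forks the Workers itself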
I couldn't find any simple solution for that, though my problem seems to be trivial.
Any help is appreciated, thanks

How to interact with an already running instance via terminal in MongooseIM?

I am using MongooseIM 3.2.0, built from source, on an Ubuntu server. Below are my concerns:
What is the best way to run MongooseIM as a service, so that it automatically restarts if MongooseIM crashes or the system restarts?
How can I interact via the terminal with an already running MongooseIM instance on the Ubuntu server, like "mongooseimctl live" does? My guess is that running "mongooseimctl live" will try to create another instance. I just want to see the live logs and interactions, and I don't want to keep scrolling through long log files for this.
I apologize if the answer to above is obvious but just want to follow the best guidance.
mongooseimctl live or mongooseimctl foreground is mostly useful for development or smoke testing a deployment (unless you're running inside a container). For real world use cases you should start the server in the background with mongooseimctl start.
Back to the container: the best approach for containerised applications is to run them in the foreground, so in a container startup script use mongooseimctl foreground.
Once the server is running (no matter how it was started), attaching a shell to troubleshoot issues can be done with mongooseimctl debug. This is the command to use when you get the Protocol 'inet_tcp': the name mongooseim@localhost seems to be in use by another Erlang node error. Be careful if it's a production environment - you can easily take the server down with access to this shell.
If you're just interested in watching logs, with no interactive access to the server internals that the shell offers, a simple tail -f /your-configured-mongooseim-log-dir/* should be enough.
Ubuntu nowadays uses systemd for managing its services' lifetimes. A systemd .service file can be found at https://github.com/esl/MongooseIM/blob/master/tools/pkg/platforms/debian_stretch/files/build/mongooseim.service - we use it for packaging into Debian/Ubuntu .deb packages.
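For a rough idea of what such a unit looks like (only a sketch with assumed paths and user - use the linked file for the real thing): under systemd the foreground mode is the natural fit, since systemd itself does the supervision and restarting.
# /etc/systemd/system/mongooseim.service (sketch)
[Unit]
Description=MongooseIM XMPP server
After=network.target

[Service]
User=mongooseim                              # assumed dedicated user
ExecStart=/usr/bin/mongooseimctl foreground  # assumed install location
Restart=on-failure

[Install]
WantedBy=multi-user.target
Enabling it with systemctl enable --now mongooseim then covers both restart-on-crash and start-on-boot.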

Best Practices for Cron on Docker

I've been using docker with cron for some time now, but I'm not sure my setup is optimal. I have one cron container that runs about 12 different scripts. I can edit the schedule of the scripts, but in order to deploy a new version of the software (some of the scripts run for about half a day), I have to create a new container to run some of the scripts while the others finish.
One option I'm considering is running one container per script (the containers would share everything in the image except the crontab), but this would still make it hard to coordinate updates to multiple containers that share some of the same code.
The other alternative I'm considering is running cron on the host machine, with each command being a docker run command. This would let me choose the image used by the next run via an environment variable in the crontab.
Does anybody have any experience with either of these two solutions? Are there any other solutions that could help?
If you are just running docker standalone (single host) and need to run a bunch of cron jobs without thinking too much about their impact on the host, then keeping it simple and running them on the host works just fine.
It would make sense to run them in docker if you benefit from docker features like limiting memory and CPU usage (so they can't do anything disruptive). Using a log driver that ships container logs to an external logging service, so you can easily monitor the jobs, is another good reason. The last (but obvious) advantage is that deploying new software as a docker image, instead of messing around on the host, is often a winner.
It's a lot cleaner to build one single image containing all the code you need. Then you trigger docker run commands from the host's cron daemon and override the command/entrypoint. The container will then exit and remove itself after the job is done (assuming you run it with --rm; you might also need to capture the container output to logs on the host, depending on which logging driver is configured). Try not to pass in config values or parameters you change often, so you keep your cron setup as static as possible. It can get messy if a new image also means you have to edit your cron data on the host.
When you use docker run like this you don't have to worry about updating images while jobs are running. Just make sure you tag them with, for example, latest so that the next job picks up the new image.
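For example, a host crontab entry for this pattern could look something like this (image, script name and resource limits are made up; --rm removes the container when the job finishes):
# /etc/cron.d/jobs - hypothetical nightly job run via docker
30 2 * * * deploy docker run --rm --memory=512m --cpus=1 --log-driver=syslog myapp/jobs:latest bin/nightly_report.sh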
Having 12 containers running in the background, each with its own cron daemon, also wastes some memory, but the worst part is that cron doesn't use the environment variables from the parent process, so if you are injecting config with env vars you'll have to hack around that mess (write them to disk when the container starts and such).
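To illustrate that hack (just a sketch, not a recommendation): have the entrypoint dump the environment to a file and make every cron line source it first.
#!/bin/sh
# entrypoint.sh (sketch): expose the container's env vars to cron jobs, which don't inherit them
export -p > /etc/container-env.sh
exec cron -f                   # run the cron daemon in the foreground as the container's main process
Each crontab line inside the container then starts with ". /etc/container-env.sh && " before the actual command.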
If you worry about jobs running in parallel there are tons of task scheduling services out there you can use, but that might be overkill for a single docker standalone host.

Is it best practice to daemonize a process within docker?

Many best practice guides emphasize making your process a daemon and having something watch it to restart it in case of failure. This made sense for a while. A specific example is Sidekiq.
bundle exec sidekiq -d
However, as I build with Docker I've found myself simply executing the command in the foreground: if the process stops or exits abruptly, the entire docker container goes poof and a new one is automatically spun up - basically the entire point of daemonizing a process and having something watch it (all STDOUT is sent to CloudWatch / Elasticsearch for monitoring).
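Concretely, the container just runs the process in the foreground as its only command, e.g. a Dockerfile line like this (simplified):
# run sidekiq in the foreground as the container's main process
CMD ["bundle", "exec", "sidekiq"]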
I feel like this also reinforces the idea of a single process per docker container, whereas daemonizing would, in my opinion, tend to encourage violating that general standard.
Is there any best practice documentation on this even if you're running only a single process within the container?
You don't daemonize a process inside a container.
The -d is usually seen in the docker run -d command, which uses detached (not daemonized) mode, where the docker container runs in the background, completely detached from your current shell.
For running multiple processes in a container, the process managing them in the background would be a supervisor.
See "Use of Supervisor in docker" (or the more recent docker --init).
Some relevant 12 Factor app recommendations:
An app is executed in the execution environment as one or more processes
Concurrency is implemented by running additional processes (rather than threads)
Website:
https://12factor.net/
Docker was open sourced by a PaaS operator (dotCloud), so it's entirely possible the authors were influenced by this architectural recommendation. That would explain why Docker is designed to normally run a single process per container.
The thing to remember here is that a Docker container is not a virtual machine, although it's entirely possible to make it quack like one. In practice a docker container is a jailed process running on the host server. Container orchestration engines like Kubernetes (Mesos, Docker Swarm mode) have features that will ensure containers stay running, replacing them should the need arise.
Remember my mention of duck vocalization? :-) If you want your container to run multiple processes, then it's possible to run a supervisor process that keeps everything healthy and running inside (a container dies when its main process stops).
https://docs.docker.com/engine/admin/using_supervisord/
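Roughly, that means a supervisord config that lists each process and keeps supervisord itself in the foreground (a sketch with made-up program names, not the exact config from the page above):
# /etc/supervisor/conf.d/app.conf (sketch)
[supervisord]
nodaemon=true                       ; keep supervisord in the foreground so the container stays alive

[program:web]
command=/usr/local/bin/run-web      ; hypothetical
autorestart=true

[program:worker]
command=/usr/local/bin/run-worker   ; hypothetical
autorestart=true
The Dockerfile's CMD then points at supervisord with that config file, making it the container's single top-level process.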
The ultimate expression of this VM envy would be LXD from Ubuntu, where an entire set of VM-style services gets bootstrapped within LXC containers:
https://www.ubuntu.com/cloud/lxd
In conclusion, is it a best practice? I think there is no clear answer. Personally I'd say no, for two reasons:
I'm fixated on deploying 12 factor compliant applications, so I'm married to the single-process model.
If I need to run two processes on the same set of data, then in Kubernetes I can run the containers within the same pod, which means Kubernetes manages the processes (running as separate containers with a common data volume).
Clearly my reasons are implementation specific.
There are several process supervisors that can take a foreground process (or several of them), run them monitored, and restart them on failure (or exit the container).
One is runit (http://smarden.org/runit/), which I have not used myself.
My choice is s6 (http://skarnet.org/software/s6/). Someone already built a container envelope for it, named s6-overlay (https://github.com/just-containers/s6-overlay), which is what I usually use if/when I need to have a user-space process run as a daemon. It also has facilities for doing prep work on container start, changing permissions and more, at runtime.
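The general pattern (from memory - check the s6-overlay README for the exact install steps for your version) is to unpack the overlay into the image, drop a run script per service under /etc/services.d, and use /init as the entrypoint:
# Dockerfile (sketch) - release URL/version are illustrative only
ADD https://github.com/just-containers/s6-overlay/releases/download/v2.2.0.3/s6-overlay-amd64.tar.gz /tmp/
RUN tar xzf /tmp/s6-overlay-amd64.tar.gz -C /
COPY services.d/ /etc/services.d/   # each service is a <name>/run script
ENTRYPOINT ["/init"]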
tl;dr: I can't find a best practices document that relates directly to this for docker, but I agree with you.
The only "best practices" guidance for docker I could find was on Docker's own site, which states that a container should run one process. In my mind, that means a foregrounded process as well. So basically, I've drawn the same conclusion as you. (You've probably read that too, but this is for anyone else reading this.)
Honestly, I think we are still in (relatively) new territory with best practices for docker. Anecdotally, it has been a best practice in the organizations I've worked with. The number of times I've felt more satisfied with a foregrounded process has been significantly greater than the times I've said to myself "Boy, I sure wish I'd backgrounded that one." In fact, I don't think I've ever said that.
The only exception I can think of is when you are trying to evaluate software and need a quick and dirty way to ship infrastructure off to someone. E.g.: "Hey, there is this new thing called LAMP stacks I just heard of, here is a docker container that has all the components for you to play around with." Again, though, that's an outlier and I would shudder if something like that ever made it to production or even any sort of serious development environment.
Additionally, it certainly forces a micro-architecture style, which I think is ultimately a good thing.

What is the correct way to set up Puma, Rails and nginx with Docker?

I have a Rails application which is using Puma. I'm using nginx for load balancing. I would like to dockerize and deploy to a DigitalOcean (Docker) droplet.
After reading lots of blogs and examples (most of which are a year old and that's a long time in the Docker world), I'm still confused about 2 things. Let's say that I select a DigitalOcean box with 4 CPUs. How am I supposed to set up the Rails containers? Should I set up 4 different containers, where Puma is configured with 1 worker process? Or should I set up 1 container where Puma is configured with 4 worker processes?
And the second thing I'm confused about: should I run nginx inside the Rails container, or should I run them in separate containers?
These 2 questions allow 4 permutations that I diagrammed below.
[diagrams: option 1, option 2, option 3, option 4]
Docker likes to push the single-process-per-container style of design. When running multiple processes in a single container, there is the extra layer of a service manager between Docker and the underlying processes, which causes Docker to lose visibility of the real service status. This more often than not makes services harder to manage with Docker and its associated tools. Puma managing workers will not be as bad as a generic service manager running multiple processes.
You may also need to consider the next step for the application, hosting across multiple droplets/hosts, and how easy it will be to move to that next step.
Options 1 and 3 follow Docker's preferred design. If you are using MRI, Puma can run in clustered mode, so it just depends on whether you want to manage the Ruby processes yourself (1) or have Puma do the worker management (3). There will be differences between how nginx and Puma distribute requests between workers. Puma can also schedule zero-downtime updates, which would require a bit of effort to get working via Docker. If you are using Rubinius or JRuby you would probably lean towards option 3 and let threads do the work.
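For option 3, the Puma side of this would be a clustered config roughly like the following (a sketch; worker and thread counts are just an example for the 4-CPU droplet):
# config/puma.rb (sketch) - one container, Puma manages the workers
workers 4                    # one worker per CPU
threads 1, 5                 # min, max threads per worker
preload_app!                 # load the app before forking so workers share memory on MRI
bind "tcp://0.0.0.0:3000"    # nginx, in its own container, proxies to this port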
Option 1 may allow you to more easily scale across different sized hosts with Docker tools.
Option 2 looks like it adds an unnecessary application hop, and Docker no longer maintains the service state in your app tier, as you need something else in the container to launch both nginx and Puma.
Option 4 might perform a bit better than others due to the local sockets, but again Docker is no longer aware of the service state.
In any case, try a couple of solutions and benchmark them with something like JMeter. You will quickly get an idea of what works and what doesn't, both in performance and management.
