Access rails active storage objects from sidekiq in different docker container - ruby-on-rails

we're having a rails 6 app holding files via ActiveStorage. So far, requests dealing with uploading files etc. last some time because we're currently working non-asynchronous. Now we're switching to background processing. Everything related to files specifically is causing headache, because the files reside in one docker container in the realm of our rails service but need to be accessed from a separate sidekiq container where unfortunately ActiveJob tasks are not able to find the ActiveStorage objects (ActiveStorage::FileNotFoundError).
So - if any - what would be the best approach to make ActiveStorage ressources accessible to sidekiq in a multi-container docker setup?
Thanks in advance!

I guess you use disk service on active storage(I should point out that if you do not store the rails app or the storage directory on the host, then you will lose your files when the container is restarted.). If your files are in disk service, if you want to use them in the background processor, then you can link both the rails app and the app directory in the background processor from your host, or you can link the storage directory to both of them from the host. As an example, you can look at the following article;
https://nickjanetakis.com/blog/dockerize-a-rails-5-postgres-redis-sidekiq-action-cable-app-with-docker-compose

Related

Hosting multiple Single Page Apps with Docker

We have a couple of single page apps that we want to host on a single web server. I'm only talking about the frontend part (Angular, React). The APIs run elsewhere. Each app is basically just a directory with a collection of static files (js, html, css, etc.) generated by the CI process. In fact, the build process creates one Docker image per app. Each image basically just contains a directory that contains the build artifacts.
All apps should appear in different folders on the same website:
/app1
/app2
/app3
What would be the best practice for deploying the apps? We've come up with a few strategies.
1. A single image / container
We could build a final web server image (e.g. Apache) and merge all the directories from the app images into it.
Cons: Versioning sounds like hell. Each new version of an app causes a new version of the final image. What if we want to revert to an older version of an app while a newer version of another app has already been deployed?
2. Multiple containers with a front-end reverse proxy
We could build each app image with its own built-in web server. And then route them all together with a front-end reverse proxy (nginx, traefik, etc.).
Cons: Waste of resources running multiple web servers.
3. One web server container and multiple data-only containers for the apps
Deploy each app in a separate container that provides it's app directory as a volume but does nothing else. Then there is a separate web server container that shares the same volumes in order to have access to all the files.
So far I like the 3rd variant best. Whenever a new version of an app needs to be deployed, we simply do a Docker pull on a new version of its image. But it still seems hacky. Volumes must be deleted manually. Otherwise the volume will not be seeded with the new content. Also having containers that do nothing isn't the Docker way, isn't it?
A Docker container wraps a process, but your compiled front-end applications are static files. That is, the setup you're describing here doesn't really match Docker's model.
Without Docker you could imagine deploying these to a single directory
/var/www/
app1/
index.html
css/app.css
app2/
index.html
css/app2.css
js/main.js
and serve these with a single HTTP server; you would not typically run a separate server for each front-end application.
A totally reasonable option, in fact, is to completely ignore Docker here. Even if your back-end applications are being served from containers, you can publish your front-end code (again, compiled to static files) via whatever hosting service you have conveniently available. Things like Webpack's file hashing can help support deploying updated versions of the application without breaking existing clients.
If I was using Docker I'd use either of your first two options but not the third. Running a combined all-the-front-ends HTTP server is the same pattern already discussed, except the HTTP server is in a container instead of the host. Running a dedicated HTTP server for each front-end application lets you use Docker's image versioning, and the incremental cost of an additional HTTP server isn't that expensive.
I would avoid any approach that involves named volumes or "data-only containers". Nothing ever automatically copies content into a volume, except for one specific corner case (on native Docker only, using named volumes but not any other kind of mount, only the first time you use a volume but never updating the volume content), and so you'd have to manually write code to copy content out of an image into a shared hosting location; that's more complicated and doesn't really gain you anything over directly running Webpack on the host.

Kubernetes - from Minikube to production

I have created a simple PHP api application that works with a mysql database to store data. I have been experimenting with Kubernetes on my Windows 10 machine through Minikube.
I have just about got my head round the ideas involved, yet I’m not sure about how to implement this properly. So far I have used Kompose to create a set of yaml files from an existing docker-compose file. This has been half successful.
To get my application code into a pod hosting PHP, I have been using hostPath to share from my local machine. I mount to the minikube machine and share from there. I was having trouble sharing by other means. The application code is hosted in a github repo.
My questions are:
Is mounting my application code into a pod (assuming this is similar to what happens in docker) the correct way to do this? I’m not clear exactly what information is held on an image retrieved from the docker hub. Although I have read up on containers isolating the build environment from your machine.
How does this approach to translate into a production environment hosted on a cloud? I see there are various storage types. I had for example, wanted to try deploying on AWS just to see how this would work in practice.
I’m really looking for guidance to go from the tutorials found on the web working on my machine, to something that could be done for a customer hosted on the cloud. This might scale up to a more microservices style architecture over time.
The approach you are describing is mostly for development setups, where you want to mount your code into the container as a volume so you don't have to rebuild every time your code changes. Typically done with a docker-compose file.
For production setups, you want the docker image to correctly work and only mount volumes to data you want to persist, typically databases are the core example. For this EKS is deeply integrated into the AWS infrastructure and will create EBS volumes on demand. You don't need to provision any volume or even care for most cases (unless you need multiple read-write volumes needed for scaling).
For a PHP application you really should not persist any data in the pod, because it will create other issues when you need to scale the application. Also, a good approach for managing files that need to persist is S3 (AWS simple storage service).
So generally speaking, you need a deployment per application a service to access each pod on that application and then an ingress object to route traffic from the internet to each pod.
Your application docker image is really the core. You just build it with your code inside. Make sure to pass configuration using environment variable or configuration file so you can connect to the database.
Now for kubernetes, for each compoment (e.g. PHP application, MySQL) you will most likely create a deployment k8s manifest that points to the docker image and add some configuration environment variables.
For production, you will need persistence volume. On aws you can simply use EBS-backed volumes
To get traffic from Internet to your PHP application, you will need to add one or more k8s components:
K8s Service manifest that exposes your PHP deployment/pod on a stable address. If you only have q or very few services, you can use LoadBalancer which on cloud like AWS will create an ALB/ELB (might need to add annotation to your service)
An ingress which is just a reverse proxy (contour, nginx, traefik). On cloud environment it will map to an ALB/ELB. The advantage of this is that you can have a single ALB for all your services i.e. save money. Also you can configure routing path or TLS termination in one place.

How do I use Docker on cloud or datacenter

I couldn't have enough courage to start using docker now I'm feel like came from last century. I want to clear my doubts about docker before get started. My question is mainly for deploying/running docker images on cloud or hosting environment.
Can I build a docker image with any type of server (eg. wildfly, payara) and/or database server (eg. mysql, oracle) and will it work on docker enabled cloud/datacenter?
If it's yes how about persistent datas like database files and static storages (eg. images, uploaded documents, logs) those are stored in docker images or somewhere else? What will happen to those files when I update my application and redeploy new image?
I read posts about what is docker but I couln't find specific answer. Forgive me for not doing enough googling.
I have run docker on AWS and other cloud providers. It is really not that hard if you have some experience with system administration and or devops. Regarding cloud hosters and getting started, most providers have some sort of tutorial on how to get started using docker with their infrastructure:
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-dockerextension/
Can I build a docker image with any type of server (eg. wildfly,
payara) and/or database server (eg. mysql, oracle) and will it work on
docker enabled cloud/datacenter?
To get a server up and running, you just need the docker engine installed on the host, there are packages for many distros:
https://docs.docker.com/engine/installation/
After docker engine is installed, you can create dockerfiles for basically any server or service. Hopefully you do not need to, in most cases, since there are countless docker files and pre-configured, vendor maintained images already available on dockerhub (I use wildfly, elk-stack, and mysql for example). Be careful about selecting images are maintained, otherwise you end up with security issues in your images that might never get fixed! Or you have to do it yourself!
Example images:
https://hub.docker.com/r/jboss/wildfly/
https://hub.docker.com/_/mysql/
https://hub.docker.com/_/oraclelinux/
https://hub.docker.com/u/payara/
If it's yes how about persistent datas like database files and static
storages (eg. images, uploaded documents, logs) those are stored in
docker images or somewhere else? What will happen to those files when
I update my application and redeploy new image?
In general, you will want to store persistent data external to the docker image and mount it into the image as a volume:
https://docs.docker.com/engine/tutorials/dockervolumes/
Some cloud based storage providers might be easier to mount or connect to in other ways, but this volume approach is standard, IMO.
For logfiles, I actually push them to an ELK server, so having a volume for the logs is not necessarily required. However, since the ELK server is also a docker image, it does have a volume where the data is persisted.
So you have:
documentation from your cloud hoster (or docker themselves)
a host in your cloud running docker engine
0..n images that you can either grab from dockerhub or build yourself.
storage for persistent data on this host or mounted from elsewhere that you mount into your docker images on startup. this is where e.g. mysql data folders live, or where you can persist logs, etc.
Of course, it can get much more complex from there, e.g. how to transparently scale and update your environment etc., but that is something for e.g. kubernetes or docker swarm or some other solution (I've scripted a bit on my own but do not need the robustness or elastic scalability of large systems).
Regarding cluster management, it should be noted that Swarm is now included in the Docker Core. This has created some controversy in the community and even talks of a fork of the core:
https://technologyconversations.com/2015/11/04/docker-clustering-tools-compared-kubernetes-vs-docker-swarm/
https://jaxenter.com/docker-1-12-is-probably-the-most-important-release-since-1-0-129080.html
http://searchitoperations.techtarget.com/news/450303918/Docker-fork-talk-prompts-container-standardization-brawl
http://www.infoworld.com/article/3118345/cloud-computing/why-kubernetes-is-winning-the-container-war.html
I have experience running docker on Alibaba cloud and AWS as well. I did not see any difference in working with docker on both cloud providers. Docker images can be build same way on all linux platform regardless of the cloud provider. However, persistence of data need to be taken care using docker volumes. However, it is recommended to use managed service such as RDS in Alibaba cloud for databases instead of using docker.
Can I build a docker image with any type of server (eg. wildfly,
payara) and/or database server (eg. mysql, oracle) and will it work on
docker enabled cloud/datacenter?
You can build your own Docker images or use solutions that are already pre-packaged and proven by cloud providers. For example, here is an auto-clustering Docker-based implementation of GlassFish that can be run and managed on Jelastic PaaS.
If it's yes how about persistent datas like database files and static
storages (eg. images, uploaded documents, logs) those are stored in
docker images or somewhere else? What will happen to those files when
I update my application and redeploy new image?
With the above mentioned cluster, all data is kept inside containers and stays without changes after restart. As an option, you can also connect a separate data storage container if you wish to share it across other containers.

amazon ecs Container for Configuration files

at the moment I try to figure out a good setup for my application in amazon ecs.
My application needs a config file. Now I want to have a container to hold my config file so when I want to change something I don't need to redeploy my application.
I can't find any best practice method for this. What I found out is that the ecs tasks just make a docker run and you can't make a docker create.
Does anyone have an idea how I can manage my config files for my applications?
Most likely using Docker for this is overkill. How complex is the data? If it's simple key-value pairs I would use DynamoDB and get rid of the file completely. Another option would be using EFS for the file, or attaching/detaching an EBS volume.
You should not do that, it makes it fragile and you're not guaranteed to be able to access it from all containers across a cluster (or you end up having that on all instances which wastes resources). Why not package it up with the container as-is or package it as much as possible and provide environment variables to fill in the gap? If you really want to go this route I highly suggest something like S3

how to use sidekiq worker to access uploaded file in a rails app on Dokku server

I'm using sidekiq plugin:
https://github.com/bigboxsoftware/dokku-sidekiq to enable sidekiq on the server. And the problem is the worker cannot access file in any of app folders.
As the plugin README said
Adds a post-deploy hook to Dokku to automatically deploy a container running a Sidekiq worker
I'm not sure this means my app and sidekiq are running in the different container?
I have tried to use mounted folder to store uploaded file
cat /home/dokku/home/PERSISTENT_STORAGE
/home/dokku/shared/temp:/app/public/temp
But it still cannot work.
In rails console, I can use File.open('public/temp/file') to open the file, but once use Sidekiq::Worker.new('public/temp/file'), it raise Errno::ENOENT: No such file or directory error.
What I can do now?
In this situation Sidekiq is running it's own server (container) thus making it sandboxed from the application server (i.e. might or might not be running on the same physical hardware).
In the situation when you want to "share" anything between application server and sidekiq you need to share with a Database / Datastore. This could be S3, Redis, Postgres or the arguments to the sidekiq job itself.
I have have all of these in production for different purposes.

Resources