Run Docker container through Oozie

I'm trying to build an Oozie workflow that executes a Python script every day; the script needs specific libraries to run.
At the moment I have created a Python virtual environment (using venv) on one node of my cluster (which consists of 11 nodes).
Looking through Oozie, I saw that it is possible to run the script using an SSH Action, specifying the node that contains the virtual environment. Alternatively, it is possible to use a Shell Action to run the Python script, but this requires creating a virtual environment with the same library dependencies on every node where the shell might be executed (any of the cluster nodes).
I would like to avoid sharing keys or configuring all the cluster nodes to make this possible. Looking in the docs, I found a section about launching applications using Docker containers, but in the Hadoop version my cluster runs (Hadoop 3.0.0) this feature is experimental and incomplete. I suppose that if you can launch Docker containers from a shell, you should be able to launch them from Oozie.
So my question is: has anyone tried to do it? Is it a hack to use Docker this way?
I came across this question, but as of 2019/09/30 there are no specific answers.
UPDATE: I tried it, and it works (you can find more info in my answer to this question). I'm still wondering whether it's a correct way to do it.
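Since a Shell Action simply runs a script on whichever NodeManager node the launcher lands on, one way to sidestep per-node virtual environments is to have that script start a container. A minimal sketch, assuming Docker is installed on every node and the image is available from a reachable registry (the script, image, and paths below are hypothetical):

    #!/bin/bash
    # run_daily_job.sh - invoked by an Oozie Shell Action on some node.
    # Assumes the Docker daemon is running there and the user executing
    # the action is allowed to talk to it.
    docker run --rm \
        -v "$(pwd)":/data \
        registry.example.com/python-env:1.0 \
        python /app/daily_job.py --output /data/results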

Related

Docker namespace, docker on virtualbox, mirror environment

Let's assume a scenario in which I'm using a set of CLI docker run commands to create a whole environment of containers and networks (bridge type in my case), and to connect the containers to particular networks.
Everything works well as long as I only want one such environment on a single machine.
But what if I want to have, on the same machine, an environment similar to the one I've just created, but for a different purpose (testing)? I run into name collisions, since I can't create and start containers and networks with the same names.
So far I have tried to start the second environment the same way I did the first, but prefixing all container and network names. That worked, but had a flaw: in the running application, all requests to URIs were broken, since they had the structure
<scheme>://<container-name>:<port-number>
and the application was not able to reach <prefix-container-name>.
What I want to achieve is to have an exact copy of the first environment running on the same machine, as a second environment that I could use to perform the application tests etc.
Is there any concept of namespaces or something similar to it in Docker?
Is there a command that I could put before all the docker run etc. commands I use to create the environment, so that I could have just two bash scripts that differ only by the namespace command at their beginning?
Can using a virtual machine, i.e. Oracle VirtualBox, be the solution to my problem? Create a VM for the second environment? Isn't that overkill, and will it add an additional set of troubles?
Perhaps there is a kind of --hostname for the docker run command that would allow a container to be accessed from other containers by that name? Unluckily, --hostname only makes the container accessible by that name from the container itself, not from any other. Perhaps there is an option or command that can create an alias, virtual host, or whatever magic common name I could put into the apps' URIs (<scheme>://<magic-name>:<port-number>), so that creating a second environment with different container and network names would cause no problem, as long as that magic name is available in the environment's network.
My need for an exact copy of the environment comes from the tests I want to run, to check whether failures also occur at the dependency level; I think this is quite a simple scenario in a continuous integration process. Are there any dedicated open source solutions to what I want to achieve? I don't use Docker Compose, but a bash script with all the docker CLI commands to get the whole environment up and running.
Thank you for your help.
Is there any concept of namespaces or something similar to it in Docker?
Not really, no (but keep reading).
Can using a virtual machine [...] be the solution to my problem? ... Isn't that overkill, and will it add an additional set of troubles?
That's a pretty reasonable solution. That's especially true if you want to further automate the deployment: you should be able to simulate starting up a clean VM and then running your provisioning script on it, then transplant that into your real production environment. Vagrant is a pretty typical tool for trying this out. The biggest issue will be network connectivity to reach the individual VMs, and that's not that big a deal.
Perhaps there is a kind of --hostname for the docker run command that would allow a container to be accessed from other containers by that name?
docker run --network-alias is very briefly mentioned in the docker run documentation and has this effect. docker network connect --alias is slightly more documented and affects a container that's already been created.
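For example, a quick sketch (network, container, and image names are hypothetical) where two copies of the same service each answer to the name db, but only on their own network:

    # Each container gets the alias "db" on its own user-defined network:
    docker network create prod-net
    docker network create test-net
    docker run -d --name prod-db --network prod-net --network-alias db postgres:11
    docker run -d --name test-db --network test-net --network-alias db postgres:11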
Are there any dedicated open source solutions to what I want to achieve?
Docker Compose mostly manages this for you, if you want to move off of your existing shell-script solution: it puts a name prefix on all of the networks and volumes it creates, and creates network aliases for each container matching its name in the YAML file. If your host volume mounts are relative to the current directory, then that content is fairly isolated too. The one thing you can't easily do is launch each copy of the stack on a separate set of host ports, so you have to resolve those conflicts yourself.
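The prefix in question is the Compose project name, which you can set explicitly, so two copies of the same stack can coexist (project names here are hypothetical):

    # Same compose file, two isolated sets of containers and networks:
    docker-compose -p prod up -d
    docker-compose -p test up -d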
Kubernetes has a concept of a namespace which is in fact exactly what you're asking for, but adopting it is a substantial investment and would involve rewriting your deployment sequence even more than Docker Compose would.
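For a sense of what that buys you, the same manifests applied into two namespaces do not collide (namespace and file names hypothetical):

    # Same manifests, isolated by namespace:
    kubectl create namespace test
    kubectl apply -f stack.yaml --namespace test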

Self-updating docker stack

I have a docker stack deployed with 20+ services which comprise my application. I would like to know whether there is a way to update this stack with the latest changes to the software from within one of the containers running as a part of the stack.
The approach I have tried:
In one of the containers for a service, I mounted the Docker socket and the /usr/bin/docker binary, and downloaded the latest compose file from the server.
I ran a script which downloads the latest images.
I initiated a docker stack deploy with the new compose file.
Everything works fine this way, but if the service which is running this process itself has an update, and that docker stack deploy tries to recreate this service before any other service in the stack, then the stack update fails.
Any suggestions or alternative approaches for this?
There is no out-of-the-box solution for Docker swarm mode (something like Watchtower for standalone Docker). I think you have already found the best solution for doing this automatically. I would suggest you put the update container (the one that is updating the services) on an ignore list. Then, on one of your manager nodes, create a cron job that updates that one container. I know this is not a perfect solution, but it should work.
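As a sketch, the cron entry on a manager node could update just the updater service itself (service and image names hypothetical):

    # Nightly: pull the new image and roll the updater service forward.
    0 3 * * * docker service update --with-registry-auth --image registry.example.com/updater:latest mystack_updater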
The standard way to do this is to build a new Docker image that contains your new application code. Tag it (as in the docker build -t argument) with some unique version, like a source control tag or date stamp. Start a new container with the new application code, then stop and delete the old container.
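In swarm terms, that sequence is roughly the following (registry, tag, and service names hypothetical):

    # Build and publish an image with a unique version tag, then roll
    # the running service forward to it:
    docker build -t registry.example.com/my-app:2019-09-30 .
    docker push registry.example.com/my-app:2019-09-30
    docker service update --image registry.example.com/my-app:2019-09-30 mystack_my-app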
As a general rule you do not upgrade the software inside a running container. Delete the old container and start a new container with the software and version you want. Also, this is generally managed by an operator, a continuous deployment system, or an orchestration system, not by the container itself. (Mounting the Docker socket into a container is a significant security exposure.)
(Imagine setting up a second copy of your cluster that works exactly the same way as your production cluster, except that it has the software you want to deploy tomorrow. You don't want your production cluster picking that up on its own until you've tested it. This scheme should give you a reproducible deployment setup so that it's easy to start that pre-production cluster, but also give you control over which specific versions are running where.)

CI testing with docker-compose on Jenkins with Kubernetes

I have tests that I run locally using a docker-compose environment.
I would like to implement these tests as part of our CI using Jenkins with Kubernetes on Google Cloud (following this setup).
I have been unsuccessful because docker-in-docker does not work.
It seems that right now there is no solution for this use-case. I have found other questions related to this issue: here and here.
I am looking for solutions that will let me run docker-compose. I have found solutions for running docker, but not for running docker-compose.
I am hoping someone else has had this use-case and found a solution.
Edit: Let me clarify my use-case:
1. When I detect a valid trigger (i.e. a push to the repo) I need to start a new job.
2. I need to set up an environment with multiple Docker instances (docker-compose).
3. The instances in this environment need access to code from git (mount volumes / create new images with the data).
4. I need to run tests in this environment.
5. I need to then retrieve results from these instances (JUnit test results for Jenkins to parse).
The problems I am having are with 2 and 3.
For 2, there is a problem running this in parallel (more than one job), since the Docker context is shared (docker-in-docker issues). If this is running on more than one node, then I get clashes because of shared resources (ports, for example). My workaround is to limit it to one running instance and queue the rest (not ideal for CI).
For 3, there is a problem mounting volumes, since the Docker context is shared (docker-in-docker issues). I cannot mount the code that I check out in the job, because it is not present on the host that is responsible for running the Docker instances that I trigger. My workaround is to build a new image from my template and just copy the code into the new image, and then use that for the test (this works, but means I need to use docker cp tricks to get data back out, which is also not ideal).
I think a better way is to use pure Kubernetes resources to run the tests directly on Kubernetes, not through docker-compose.
You can convert your docker-compose files into Kubernetes resources using kompose utility.
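A sketch of that conversion (the compose file name is hypothetical):

    # Generate Kubernetes manifests from an existing compose file:
    kompose convert -f docker-compose.yml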
Probably, you will need some adaptation of the conversion result, or maybe you should manually convert your docker-compose objects into Kubernetes objects. Possibly, you can just use Jobs with multiple containers instead of a combination of deployments + services.
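For instance, a minimal sketch of such a Job, with the app and the test runner side by side in one pod (all names and images are hypothetical):

    # Caveat: every container must exit for the Job to complete, so the
    # app container has to shut down once the tests are done.
    kubectl apply -f - <<'EOF'
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: integration-tests
    spec:
      backoffLimit: 0
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: app
            image: registry.example.com/app:latest
          - name: tests
            image: registry.example.com/tests:latest
    EOF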
Anyway, I definitely recommend using Kubernetes abstractions instead of running tools like docker-compose inside Kubernetes.
Moreover, you will still be able to run tests locally, using Minikube to spawn a small all-in-one cluster right on your PC.

Using docker to run a distributed computation

I just came across Docker, and was looking through its docs to figure out how to use it to distribute a Java project across multiple nodes, while keeping the distribution platform-independent, i.e. the nodes can be running any platform. Currently I'm sending classes to different nodes and running them there, with the assumption that those nodes have the same environment as the client. I couldn't quite figure out how to do this; any suggestions would be greatly appreciated.
I do something similar. In my humble opinion, whether or not you use Docker is not your biggest problem. However, using Docker images for this purpose can and will save you a lot of headaches.
We have a build pipeline where a very large Java project is built using Maven. The outcome of this is a single large JAR file that contains the software we need to run on our nodes.
But some of our nodes also need to run 3rd-party software such as ZooKeeper and Cassandra. So after the Maven build we use packer.io to create a Docker image that contains all the needed components, which ends up on a web server that can be reached only from within our private cloud infrastructure.
If we want to roll out our system, we use a combination of Python scripts that talk to the OpenStack API and create virtual machines on our cloud, and Puppet, which performs the actual software provisioning inside the VMs. Our VMs are CentOS 7 images, so what Puppet actually does is add the Docker yum repos, install Docker through yum, pull in the Docker image from our repository server, and finally use a custom bash script to launch our Docker image.
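Condensed into shell, the steps Puppet performs look roughly like this (the repository host and image names are hypothetical):

    # Install and start Docker, fetch our image, and launch it:
    yum install -y docker
    systemctl enable --now docker
    docker pull repo.internal.example.com/compute-node:latest
    docker run -d --name compute-node repo.internal.example.com/compute-node:latest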
For each of these steps there are certainly even more elegant ways of doing it.

Moving from Docker Containers to Cloud Foundry containers

Recently I started practicing with Docker. Basically, I am running a C application in a Docker container. Now I want to try Cloud Foundry, and am therefore trying to understand the difference between the two.
I'll describe the application as a novice because I am.
I start the application as a service (from /etc/init.d); it reads a config file during startup which specifies which modules to load and the IPs of the other services and of itself (0.0.0.0 does not work, so I have to give the actual IP).
I had to manually update the IP and some details in the config file when the container started. So I wrote a startup script which makes all the changes when the container starts and then runs the service start command.
Now, moving on to Cloud Foundry, the first thing I was not able to find is how to deploy a C application. I then found a C buildpack and a binary buildpack option. I still have to try those, but what I am not able to understand is how I can provide a startup script to a Cloud Foundry container, or, in brief, how to achieve what I was doing with Docker.
The last option I have is to use Docker containers in Cloud Foundry, but I want to understand whether I can achieve what I described above.
I hope I was clear enough to explain my doubt.
Help appreciated.
An old question, but a lot has changed since this was posted:
Recently I started practicing with Docker. Basically, I am running a C application in a Docker container. Now I want to try Cloud Foundry, and am therefore trying to understand the difference between the two.
...
The last option I have is to use Docker containers in Cloud Foundry, but I want to understand whether I can achieve what I described above.
There's nothing wrong with using Docker containers on CF. If you've already got everything set up to run inside a Docker container, being able to run that on CF gives you yet another place you can easily deploy your workload.
While these are pretty minor, there are a couple of requirements for your Docker container, so it's worth checking them to make sure it's possible to run on CF.
https://docs.cloudfoundry.org/devguide/deploy-apps/push-docker.html#requirements
Anyways, I am not working on this now as CF is not suitable for the project. It's an SIP application and CF only accepts HTTP/S requests.
OK, the elephant in the room. This is no longer true. CF has support for TCP routes, which allow you to receive TCP traffic directly to your application. This means it's no longer just HTTP/S apps that are suitable for running on CF.
Instructions to set up your CF environment with TCP routing: https://docs.cloudfoundry.org/adminguide/enabling-tcp-routing.html
Instructions to use TCP routes as a developer: https://docs.cloudfoundry.org/devguide/deploy-apps/routes-domains.html#create-route-with-port
Now, moving on to Cloud Foundry, the first thing I was not able to find is how to deploy a C application. I then found a C buildpack and a binary buildpack option.
Picking a buildpack is an important step. The buildpack takes your app and prepares it to run on CF. A C buildpack might sound nice, as it would take your source code and build and run it, but it's going to get tricky, because your C app likely depends on libraries; libraries that may or may not be installed.
If you're going to go this route, you'll probably need to use CF's multi-buildpack support. This lets you run multiple buildpacks. If you pair this with the Apt buildpack, you can install the packages that you need so that any required libraries are available for your app as it's compiled.
https://docs.cloudfoundry.org/buildpacks/use-multiple-buildpacks.html
https://github.com/cloudfoundry/apt-buildpack
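A sketch of such a push (the app name is hypothetical; the last buildpack listed is the final one, and the Apt buildpack expects an apt.yml at the app root listing the packages to install):

    cf push my-app -b https://github.com/cloudfoundry/apt-buildpack -b binary_buildpack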
Using the binary buildpack is another option. In this case, you'd build your app locally. Perhaps in a docker container or on an Ubuntu VM (it needs to match the stack being used by your CF provider, i.e. cf stacks, currently Ubuntu Trusty or Ubuntu Bionic). Once you have a binary or binary + set of libraries, you can simply cf push the compiled artifacts. The binary buildpack will "run" (it actually does nothing) and then your app will be started with the command you specify.
My $0.02 only, but the binary buildpack is probably the easier of the two options.
what I am not able to understand is how I can provide a startup script to a Cloud Foundry container, or, in brief, how to achieve what I was doing with Docker.
There are a few ways you can do this. The first is to specify a custom start command. You do this with cf push -c 'command'. This would normally be used just to start your app, like './my-app', but you could also use it to do other things.
Ex: cf push -c './prep-my-app.sh && ./my-app'
Or even just call your start script:
Ex: cf push -c './start-my-app.sh'.
CF also has support for a .profile script. This can be pushed with your app (at the root of the files you push), and it will be executed by the platform prior to your application starting up.
https://docs.cloudfoundry.org/devguide/deploy-apps/deploy-app.html#profile
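As a sketch, a .profile at the app root could do the config patching described in the question before the app starts (the config file format and path are hypothetical):

    # .profile - sourced by the platform before the start command runs.
    # Replace the placeholder IP in the config with this container's IP:
    APP_IP="$(hostname -i)"
    sed -i "s/^ip=.*/ip=${APP_IP}/" config/app.conf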
Normally, you'd want to use a .profile script, as you'd want to let the buildpack decide how to start your app (setting -c will override the buildpack), but in your case, with the C or binary buildpacks, it's unlikely the buildpack will be able to do that, so you'll end up having to set a custom start command anyway.
For this specific case, I'd suggest using cf push -c as it's slightly easier, but for all other cases and apps deployed with other buildpacks, I'd suggest a .profile script.
Hope that helps!
