Can a docker image use hadoop? - docker

Can a docker image access hadoop resources? Eg. submit YARN jobs and access HDFS; something like MapR's Datasci. Refinery, but for Hortonworks HDP 3.1. (May assume that the image will be launched on a hadoop cluster node).
Saw the hadoop docs for launching docker applications from hadoop nodes, but was interested in whether could go the "other way" (ie. being able to start a docker image with the conventional docker -ti ... command and have that application be able to run hadoop jars etc. (assuming that the docker image host is a hadoop node itself)). I understand that MapR hadoop has docker images for doing this, but am interested in using Hortonworks HDP 3.1. Ultimately trying to run h2o hadoop in a docker container.
Anyone know if this is possible or can confirm that this is not possible?

Yes. As long as you have the client jars and appropriate configs similar to an edge node for your container it should work.

Related

How to create a marklogic most basic 3-node-cluster from dockerhub marklogic images via gradle

I am looking for any sample project to create MarkLogic 3 Nodes Cluster directly from ML Docker Hub image (https://hub.docker.com/_/marklogic) via Gradle on a single machine.
The idea is to auto spin off different ML versions for the dev enviroment set up.
Current three node cluster example in gitbub ml-gradle is to install ML from rpm installation package.
I would like to directly use the ML docker hub image instead.
MarkLogic on Docker Hub includes instructions for spinning up a cluster with this simple command:
docker-compose -f cluster.yml up -d --scale dnode=2
To run this, pull down the Docker image (you'll need an account on Docker Hub (free) and you'll need to do the checkout process to get access to the MarkLogic image (also free)). Then you can create the cluster.yml file using the example given on the setup instructions page on Docker Hub.
As #rjrudin points out, you can set up a gradle task to do this.
ml-gradle is typically used to deploy an application to an existing ML cluster. To actually create the ML cluster, you use the "docker" executable. You can automate this via Gradle's Exec task if you wish, but doing so is outside the scope of ml-gradle which assumes that you already have an ML cluster setup.

what is "docker container"?

I understand docker engine sits on top of docker host (which is OS) and docker engine pull docker/container images from docker hub (or any other repo). Docker engine interact with OS to configure and set up container out of image pulled as part of "Docker Run" command.
However I quite often also came across term "Docker Container". Is this some another tool and what is its role in entire architecture ? I know there is windows container or linux containers for respective docker host..but what is it Docker Container itself ? Is it something people use loosely to simply refer to container in general ?
In simple words, when you execute a docker image, it will spawn a docker container.
You can relate it to Java class(as docker image), and when we initialize a class it will create an object(docker container).
So docker container is an executable form of a docker image. You can have multiple Docker containers from a single docker image.
A docker container is an image that is an (think of it as a tarball, or archive) executable package that can stand on its own. The image has everything it needs to run such as software, runtimes, tools, libraries, etc. Check out Docker for more information.
Docker container are nothing but processes which are spawned using image as a source.
The processes are sandboxed(isolated) from other processes in terms of namespaces and controlled in terms of memory, cpu, etc. using control groups. Control groups and namespaces are Linux kernel features which help in creating a sandboxed environment to run processes in isolation.
Container is a name docker uses to indicate these sandboxed processes.
Some trivia - the concept sandboxing process is also present in FreeBSD and it is called Jails.
While the concept isn’t new in terms on core technology. Docker were innovative to imagine entire ecosystem in terms of containers and provide excellent tools on top of kernel features.
First of all you (generally) start with a Dockerfile which is a script where you setup the docker environment in which you are going to work (the OS, the extra packages etc). If you want is like the source code in typical programming languages.
Dockerfiles are built (with the command sudo docker build pathToDockerfile/ and the result is an image. It is basically a built (or compiled if you prefer) and executable version of the environment described in you Dockerfile.
Actually you can download docker images directly from dockerhub.
Continuing the simile it is like the compiled executable.
Now you can run the image assigning to it a name or setting different attributes. This is a container. Think for example to a server environment where you might need the same service to be instantiated the same time more than once.
Continuing again the simile this is like having the same executable program being launched many times at the same time.

Best practice using docker inside Jenkins?

Hi I'm learning how to use Jenkins integrated with Docker and I don't understand what should I do to communicate them.
I'm running Jenkins inside a Docker container and I want to build an image in a pipeline. So I need to execute some docker commands inside the Jenkins container.
So the thing here is where docker come from. I understand that we need to bind mount the docker host daemon (socket) to the Jenkins container but this container still needs the binaries to execute Docker.
I have seen some approaches to achieve this and I'm confused what should I do. I have seen:
bind mount the docker binary (/usr/local/bin/docker:/usr/bin/docker)
installing docker in the image
if I'm not wrong the blue ocean image comes with Docker pre-installed (I have not found any documentation of this)
Also I don't understand what Docker plugins for Jenkins can do for me.
Thanks!
Docker has a client server architecture. The server is the docker deamon and the client is basically the command line interface that allows you to execute docker ... from the command line.
Thus when running Jenkins inside Docker you will need access to connect to the deamon. This is acheieved by binding the /var/run/docker.sock into the container.
At this point you need something to communicate with the Deamon which is the server. You can either do that by providing access to docker binaries. This can be achived by either mounting the docker binaries, or installing the
client binaries inside the Jenkins container.
Alternatively, you can communicate with the deamon using the Docker Rest API without having the docker client binaries inside the Jenkins container. You can for instance build an image using the API.
Also I don't understand what Docker plugins for Jenkins can do for me
The Docker plugin for Jenkins isn't useful for the use case that you described. This plugin allows you to provision Jenkins slaves using Docker. You can for instance run a compilation inside a Docker container that gets automatically provisioned by Jenkins
It is not best practice to use Docker with Jenkins. It is also not a bad practice. The relationship between Jenkins and Docker is not determined in such a manner that having Docker is good or bad.
Jenkins is a Continuous Integration Server, which is a fancy way of saying "a service that builds stuff at various times, according to predefined rules"
If your end result is a docker image to be distributed, you have Jenkins call your docker build command, collect the output, and report on the success / failure of the docker build command.
If your end result is not a docker image, you have Jenkins call your non-docker build command, collect the output, and report on the success / failure of the non-docker build.
How you have the build launched depends on how you would build the product. Makefiles are launched with make, Apache Ant with ant, Apache Maven with mvn package, docker with docker build and so on. From Jenkin's perspective, it doesn't matter, provided you provide a complete set of rules to launch the build, collect the output, and report the success or failure.
Now, for the 'Docker plugin for Jenkins'. As #yamenk stated, Jenkins uses build slaves to perform the build. That plugin will launch the build slave within a Docker container. The thing built within that container may or may not be a docker image.
Finally, running Jenkins inside a docker container just means you need to bind your Docker-ized Jenkins to the external world, as #yamenk indicates, or you'll have trouble launching builds.
Bind mounting the docker binary into the jenkins image only works if the jenkins images is "close enough" - it has to contain the required shared libraries!
So when sing a standard jenkins/jenkins:2.150.1 within an ubuntu 18.04 this is not working unfortunately. (it looked so nice and slim ;)
So the the requirement is to build or find a docker image which contains a compatible docker client for the host docker service is.
Many people seem to install docker in their jenkins image....

How are Packer and Docker different? Which one should I prefer when provisioning images?

How are Packer and Docker different? Which one is easier/quickest to provision/maintain and why? What is the pros and cons of having a dockerfile?
Docker is a system for building, distributing and running OCI images as containers. Containers can be run on Linux and Windows.
Packer is an automated build system to manage the creation of images for containers and virtual machines. It outputs an image that you can then take and run on the platform you require.
For v1.8 this includes - Alicloud ECS, Amazon EC2, Azure, CloudStack, DigitalOcean, Docker, Google Cloud, Hetzner, Hyper-V, Libvirt, LXC, LXD, 1&1, OpenStack, Oracle OCI, Parallels, ProfitBricks, Proxmox, QEMU, Scaleway, Triton, Vagrant, VirtualBox, VMware, Vultr
Docker's Dockerfile
Docker uses a Dockerfile to manage builds which has a specific set of instructions and rules about how you build a container.
Images are built in layers. Each FROM RUN ADD COPY commands modify the layers included in an OCI image. These layers can be cached which helps speed up builds. Each layer can also be addressed individually which helps with disk usage and download usage when multiple images share layers.
Dockerfiles have a bit of a learning curve, It's best to look at some of the official Docker images for practices to follow.
Packer's Docker builder
Packer does not require a Dockerfile to build a container image. The docker plugin has a HCL or JSON config file which start the image build from a specified base image (like FROM).
Packer then allows you to run standard system config tools called "Provisioners" on top of that image. Tools like Ansible, Chef, Salt, shell scripts etc.
This image will then be exported as a single layer, so you lose the layer caching/addressing benefits compared to a Dockerfile build.
Packer allows some modifications to the build container environment, like running as --privileged or mounting a volume at build time, that Docker builds will not allow.
Times you might want to use Packer are if you want to build images for multiple platforms and use the same setup. It also makes it easy to use existing build scripts if there is a provisioner for it.
Expanding on the Which one is easier/quickest to provision/maintain and why? What are the pros and cons of having a docker file?`
From personal experience learning and using both, I found: (YMMV)
docker configuration was easier to learn than packer
docker configuration was harder to coerce into doing what I wanted than packer
speed difference in creating the image was negligible, after development
docker was faster during development, because of the caching
the docker daemon consumed some system resources even when not using docker
there are a handful of processes running as the daemon
I did my development on Windows, though I was targeting LINUX servers for running the images.
That isn't an issue during development, except for a foible of running Docker on Windows.
The docker daemon reserves various TCP port ranges for itself
The ranges might change every time you reboot your system or restart the daemon
The only error message is to the effect: can't use that port! but not why it can't
BTW, The workaround is to:
turn off Hypervisor
reboot
reserve the public ports you want your host system to see
turn on hypervisor
reboot
Running packer on Windows, however, the issue I found is that the provisioner I wanted to use, ansible, doesn't run on Windows.
Sigh.
So I end up having to run packer on a LINUX system after all.
Just because I was feeling perverse, I wrote a Dockerfile so I could run both packer and ansible from my Windows station in a docker container using that image.
Docker builds images using a Dockerfile.
These can be run (Docker containers).
Packer also builds images. But you don't need a Dockerfile. And you get the option of using Provisioners such as Ansible which lets you create vastly more customisable images. It isn't used for running these images.

combining cloudbees ec2 docker image with docker in docker

Am trying to combine docker in docker feature with in cloudbees ecs image.
Both the images are build using different linux based distribution.
Cloudbees ECS slave image is build use base ubuntu 14.04 and docker:1.8-dind is build from base debian:jessie. What is the best way to combine both into one docker image with both features using debian:jessie as the base.
I've done something similar in the past and it usually comes down to walking the Dockerfile dependency chain and building up the image that way. In your instance, you'd probably want to start at https://hub.docker.com/r/cloudbees/java-build-tools/~/dockerfile/ and swap out
FROM ubuntu:15.04
with
FROM debian:jessie
And build it to see what works and what doesn't work. Typically it's a package manager or something that needs to be updated/replaced.
The downside of this approach is that it can be a lot of trial and error and you end up with giant Dockerfiles, but the upside is that you can usually streamline your image to do exactly what you want without a lot of the Ubuntu extras.

Resources