Which Docker container should I use for Hadoop? - docker

I'm trying to find a recent version of Hadoop available on Docker.
Has an official Hadoop repository been created since 2016 (see the earlier question "Is there any official Docker images for Hadoop?")?
I found some repositories, such as:
https://hub.docker.com/r/harisekhon/hadoop/
https://hub.docker.com/r/sequenceiq/hadoop-docker/
https://hub.docker.com/r/uhopper/hadoop/
https://hub.docker.com/r/cloudera/quickstart/
https://hub.docker.com/r/mcapitanio/hadoop/
But I don't know whether they are good and up to date.
Can you help me find the best image, please?
Thanks

The Cloudera images include much more than just Hadoop, so I wouldn't suggest them: a Docker image should do one thing.
I've had success with the SequenceIQ and uhopper images. The last one in your list is deprecated, as its description says. In truth, though, they will all probably work for your purposes unless you specifically need a Hadoop 3 feature.
The ones I've used recently are by bde2020.

Related

Why use another person's Docker image?

I'm learning about Docker architecture.
I know that images are made to run applications in containers (virtualization). One thing I stumbled on is that there is an entire community hub for posting images. But what is actually the point of doing that?
Isn't the idea of an image to contain a very specific environment, with very specific configuration, that runs very specific applications?
The idea of images is to have a well-defined environment. The community images serve mostly as building blocks or base images for your own, more specific, images. For some applications you can use an image as-is, with maybe a few configuration parameters, but I would guess the more common use case is to build your specific image on top of an existing, more general one.
Examples:
You want to create an image with a certain Java application. So you look for an image that already has the Java version you want, and create an image based on that more general image.
You want to test your application on different OS versions (maybe different Linux versions). So you create a couple of images, each based on a different base image that already has the OS installed that you are interested in.
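The second example above can be sketched as a small Python script that writes one Dockerfile per base image. This is only an illustration under stated assumptions: the image tags in `BASE_IMAGES`, the `app` directory, and the `run-tests.sh` script are placeholders, not anything from the original answer.

```python
# Placeholder base images: swap in whichever OS versions you want to test against.
BASE_IMAGES = ["ubuntu:22.04", "ubuntu:24.04", "debian:12"]


def render_dockerfile(base_image: str, app_dir: str = "app") -> str:
    """Return Dockerfile text that puts the same application on top of base_image."""
    lines = [
        f"FROM {base_image}",
        "WORKDIR /opt/app",
        f"COPY {app_dir}/ /opt/app/",
        # run-tests.sh is a hypothetical script shipped inside app_dir.
        'CMD ["sh", "/opt/app/run-tests.sh"]',
    ]
    return "\n".join(lines) + "\n"


def render_all(base_images=BASE_IMAGES) -> dict:
    """Map each base image to the Dockerfile text built on it."""
    return {base: render_dockerfile(base) for base in base_images}


if __name__ == "__main__":
    for base, text in render_all().items():
        print(f"# Dockerfile based on {base}")
        print(text)
```

Only the `FROM` line differs between the generated files, which is the point of the answer: the general image supplies the OS, and your own image adds the specific part.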

Do OS providers make special / custom-made OSes for Docker?

I am trying to understand Docker and its core concepts. I learned that there is a concept of images, which form the basis of the containers in which applications run in isolation.
I also learned that we can download the official images from Docker Hub, https://hub.docker.com (part of a screenshot below):
My question is:
Do the respective companies create a special/custom-made OS (a minimal one; for example, we can see the ubuntu image) for Docker? If so, what benefit do these companies get from creating these custom-made images for Docker?
One could call them custom images; however, they are just bare base images meant to be used as a starting point for your application.
They are mostly built by people who work at Docker, who try to ensure some level of quality.
They are stripped of unnecessary packages in order to keep the image size to a minimum.
To find out more you could read this Docker documentation page or this blog post.

What is a Docker image? What is it trying to solve?

I understand that it is software shipped in some sort of binary format. In simple terms, what exactly is a Docker image? What problem is it trying to solve? And how is it better than other distribution formats?
As a student, I don't completely get the picture of how it works and why so many companies are opting for it. Even many open source libraries are shipped as Docker images.
To understand Docker images, you should first understand a main element of the Docker mechanism: UnionFS.
Unionfs is a filesystem service for Linux, FreeBSD and NetBSD which
implements a union mount for other file systems. It allows files and
directories of separate file systems, known as branches, to be
transparently overlaid, forming a single coherent file system.
Docker images consist of several layers. Each layer is a read-only filesystem: every instruction in the Dockerfile that changes the filesystem (such as RUN, COPY, or ADD) creates its own layer, which is placed on top of the layers already created. When docker run or docker create is invoked, it adds a writable layer on top (it also does a lot of other things). This approach to distributing containers is very good, in my opinion.
Disclaimer:
This is my understanding, which I picked up somewhere; feel free to correct me if I'm wrong.
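The layering described in this answer can be mimicked with a toy model in Python. This is a rough sketch, not how Docker's storage drivers are implemented: each layer is a plain dict from path to contents, `collections.ChainMap` plays the role of the union mount, and all paths and file contents are made up for illustration. Deletions (which real union filesystems handle with whiteout entries) are not modeled.

```python
from collections import ChainMap

# Read-only image layers, ordered bottom to top. Paths and contents are invented.
base_layer = {"/etc/os-release": "debian", "/bin/sh": "shell binary"}
app_layer = {"/opt/app/app.jar": "application v1"}
config_layer = {"/etc/os-release": "debian (patched)"}
image_layers = [base_layer, app_layer, config_layer]


def start_container(layers):
    """Return (view, writable_layer) for a new container on top of the image layers.

    ChainMap looks keys up in its first mapping first, so the writable layer
    goes first, followed by the image layers from top to bottom.
    """
    writable_layer = {}
    view = ChainMap(writable_layer, *reversed(layers))
    return view, writable_layer


view_a, writable_a = start_container(image_layers)
view_b, writable_b = start_container(image_layers)

# Upper layers shadow lower ones when the same path appears twice.
patched = view_a["/etc/os-release"]

# ChainMap sends writes to its first mapping only, i.e. the container's own layer.
view_a["/tmp/scratch"] = "written at runtime"

if __name__ == "__main__":
    print(patched)
    print(sorted(view_a))
    print("/tmp/scratch" in view_b)
```

Both containers share the same three image dicts, and a write in one container never touches them or the other container, which mirrors why layered images are cheap to distribute and reuse.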

What's the best way to compose multiple Docker images with different bases?

I would like to compose multiple Docker images that start with different bases. However, many of the installation scripts afterward are similar.
What's the best way to source a sub-Dockerfile?
It sounds like what you're looking for is the ability to include Dockerfiles in other Dockerfiles. There was a proposal for such a feature, but currently nothing supports this out of the box. The discussion is worth reading through because it includes links to tools like harbor and dfpp that people built to support a subset of the functionality.
One problem with tools like this is that you can't easily make the same include file work for Debian, CentOS, and Alpine Linux (for example). The way this is currently addressed (as with the redis and redis-alpine images, which package essentially the same software) is to have duplicate Dockerfiles.
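One way to cut down on the duplication, short of a real include feature, is to generate the Dockerfiles from a shared template. The Python sketch below is only an illustration of that idea, not one of the tools mentioned above: the image tags, the `setup.sh` script, and the package list are assumptions, and package names can differ between distributions in practice.

```python
# Install command per distribution family. This is the part a plain include
# file cannot share, because each family uses a different package manager.
INSTALL_COMMANDS = {
    "debian": "apt-get update && apt-get install -y --no-install-recommends {packages}",
    "centos": "yum install -y {packages}",
    "alpine": "apk add --no-cache {packages}",
}

# Steps that are identical for every base. setup.sh is a hypothetical script.
SHARED_STEPS = [
    "COPY setup.sh /usr/local/bin/setup.sh",
    "RUN sh /usr/local/bin/setup.sh",
]


def render(base_image: str, family: str, packages: list) -> str:
    """Return Dockerfile text: base-specific install step, then the shared steps."""
    install = INSTALL_COMMANDS[family].format(packages=" ".join(packages))
    lines = [f"FROM {base_image}", f"RUN {install}", *SHARED_STEPS]
    return "\n".join(lines) + "\n"


# Placeholder targets: (base image tag, distribution family).
TARGETS = [("debian:12", "debian"), ("centos:7", "centos"), ("alpine:3.19", "alpine")]

if __name__ == "__main__":
    for image, family in TARGETS:
        print(f"# Dockerfile for {image}")
        print(render(image, family, ["curl", "ca-certificates"]))
```

The generated files are still separate Dockerfiles, as with redis and redis-alpine, but the shared steps live in one place.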

What is the essential difference between docker and rkt?

How are they functioning differently?
Which features of the kernel are they using?
You can read all about it in this link.
Basically, my impression is that rkt takes pride in being image-agnostic (meaning you can run images that were built using Docker or other container engines) and in having less overhead than Docker does. The link I've attached has a nice picture describing the differences between the two.
