Where is hadoop installed in datamechanics - docker

I am using this spark image from datamechanics, and am assuming that the image has hadoop installed because the name says so. But I can find it in the usual locations (/usr/local, /opt/, etc). Also the docs are not easy to read to understand how the image was built (mostly because I couldn't find a code like file which I can read). Does anyone know if hadoop is actually installed in the datamechanics images. If not, is there an alternate image that is recommended for spark and hadoop? Thanks!

The name of the image is a reference to the Spark downloadable package that includes Spark 3.1.1 with Hadoop 3.2.0 client libraries, as compared to the "bring your own" version...
It does not "come with" scripts to run HDFS and YARN, therefore does have have either "installed". You can search Dockerhub for datanode/namenode images, but neither are necessary to run Spark from the image you've found.

Related

Is Docker-ized dev envoirment good for maintaining legacy software?

Let's say I have old, unmaintained application that lives on a VPS (i.e. Symfony 3 PHP app that relies on PHP 5).
If some changes are needed I have to clone this app to my desktop, build it, change and re-deploy. As time goes, recreating desktop dev environment gets harder - in this example I can't simply build the app as I use PHP7 in my CLI that breaks building process.
I tried to dockerize the app, so I added Ubuntu 18 to my docker-compose file... and it doesn't work as latest Ubuntu that has PHP5 support is 14.04. 14.04 is also the oldest (official) version available on DockerHub. But will it be still available in 3 years? If not, Docker won't build a container.
So, my question is: is Docker a right tool to solve described problem at all?
If so, should I backup docker images described that my build relies on?
If not, beside proper maintenance, what tool is better?
You can install PHP5 in newer ubuntu versions, but it means adding an external repository.
You could also create your own docker image, containing only the libraries you want. If so, I'd advise to try and use alpine as a base image. There is a bit of a learning curve to adapt, but once you do it you'll have a small image tailored to your needs.
Given that containers allow you to isolate processus and conf with minimal footprint compared to a VM, I think it is the best option. Tailoring and maintaining your own image is not that expensive in terms of maintenance if you document it correctly, and it will allow you to always have a system 'maintaining' all your precise requirements.

Use multiple docker images at once

I am trying to containerize my ROS + tensorflow application. The problem is that I want to use GPU. I can either use GPU and forget about ROS, or use ROS and forget about GPU, but I don't know how I can enable both of them.
I have tried starting FROM a cuda image and installing ROS as described here, but the ros couldn't be installed, not being able to find the package.
I also tried building them one by one in the same Dockerfile and copying all the ROS-related stuff from a ROS build, but that failed too.
Ideally, I want to make it work as if I "include" a Cuda image and a ROS image, but resource on building from multiple images like this could hardly be found. Any help or pointer would be appreciated.

Docker query on containerizing

Our requirement is to create a container for legacy apps over docker.
We don't have the operating system support/application server support available, nor do we have knowledge to build them from scratch.
But we have a physical instance of the legacy app running in our farm.
We could get an ISO image from our server team if required, our question is if we get this ISO image can we export this as a docker image?
if yes, please let me know if there is any specific procedure or steps associated with it.
if no, please tell me why? and the possible workarounds for the same.
if we get this ISO image can we export this as a docker image?
I don't think there is an easy way (like push-the-export-button) to do this. Explanation follows...
You are describing a procedure taking place in the Virtual Machine world. You take a snapshot of a server, move the .iso file somewhere else and create a new VM that will run on a Hypervisor.
Containers are not VMs. They "contain" all the bytes that a service needs to run but not a whole operating system. They are supposed to run as processes on the host.
Workarounds:
You will have to get your hands dirty. This means that you will have to find out what the legacy app uses (for example Apache + PHP + MySql + app code) and build it from scratch with Docker.
Some thoughts:
containers are supposed to be lightweight. For example one might use one container for the database, another one for the Apache etc... Your case looks like you are moving towards a fat container that has everything inside.
Depending on what the legacy technology is, you might hit a wall... For example, if we are talking about something working with old php, mysql you might find ready-to-use images on hub.docker.com. But if the legacy app is a financial system written in cobol, I don't know what your starting point might be...
You will need to reverse engineer the application dependencies from the artifacts that you have in access to. This means recovering the language specific dependencies (whether python, java, php, node, etc). And any operating system level packages/dependencies that are required.
Essentially you are rebuilding the contents of that ISO image inside your docker file using OS package installation tools like apt, language level tools like pip, PECL, PEAR, composer, or maven, and finally the files that make up the app code.
So, for example: a PHP application might be dependent on having build-essential and php-mysql installed in the OS. Then the app may be dependent on packages like twig and monolog loaded through composer. If you are using SASS you may need to install ruby as well.
Your job is to track all these down and create a docker file that reproduces the iso image. If you are using a common stack like a J2EE app in tomcat, or a php app fronted by apache or ngnix, there will be base docker images that will get you most of the way to where you need to go.
It does look like there are some tools that can do this for you automatically: Dependency Walker equivalent for Linux?. I can't vouch for any of them. But you can also use command line tools. For example this will give you a list of all the user installed packages on a fedora system:
sudo dnf history userinstalled
When an app is using a dependency manager like composer or pip, there is usually a file that lists all the language specific dependencies.
At the end of the process you'll have a portable legacy app that can be easily deployed anywhere with a minimal footprint.
As one of the comments rightly points out, creating a VM from the ISO image is another way forward that will be much easier to accomplish. The application dependencies won't be explicit, but maybe that's ok for your use case.

Compile Tensorflow from source with Docker to get CPU speed up

I am looking for a way to set up or modify an existing Docker image for installing tensorflow that will install it such that the SSE4, AVX, AVX2, and FMA instructions can be utilized for CPU speed up. So far I have found how to install from source using bazel How to Compile Tensorflow... and CPU instructions not compiled.... Neither of these explain how to do this within Docker. So I think what I am looking for is what you need to add to an existing docker image that installs without these options so that you can get a compile version of tensorflow with the CPU options enabled. The existing docker images do not do this because they want the image to run on as many machines as possible. I am using Ubuntu 14.04 on linux PC. I am new to docker but have installed tensorflow and have it working without getting the CPU warnings I get when I use the docker images. I may not need this for speed, but I have seen posts that claim the speed up can be significant. I searched for existing docker images that do this and could not find anything. I need this to work with gpu so needs to be compatible with nvidia-docker.
I just found this docker support for bazel and it might provide an answer, however I do not understand it well enough to know for sure. I believe this is saying that you can not build tensorflow with bazel inside a Dockerfile. You have to build a Dockerfile using bazel. Is my understanding correct and is this the only way to get a docker image with tensorflow compiled from source? If so, I could still use help in how to do it and still get the other dependencies that I would get if using an existing docker image for tensorflow.
Dockerfiles that build with CPU support can be found here.
Hope that helps! Spent many a late night here on Stack Overflow and Github Issues and stuff. Now it's my turn to give back! :)
The GPU stuff in particular is really hairy - especially when enabling the XLA/JIT/AOT stuff as well as the Graph Transform Tools.
Lots of hacks embedded in my Dockerfiles. Feel free to review and ask me questions!
The contributing guidelines mention building TensorFlow from source with Docker to run the unit tests:
Refer to the
CPU-only developer Dockerfile and
GPU developer Dockerfile
for the required packages. Alternatively, use the said
Docker images, e.g.,
tensorflow/tensorflow:nightly-devel and tensorflow/tensorflow:nightly-devel-gpu
for development to avoid installing the packages directly on your system.

Is it a bad practice to use version managers like RVM inside docker containers?

I'm new to using docker and so far I'm unable to find many ruby/rails images that contain RVM or rbenv.
The most common thing I see is that each container has multiple tags and each tagged image version has only one version of Ruby installed. See this image for example.
The only way to use another version is to use another tag for the image you are using as you can not install a new version with RVM nor with rbenv.
Is this done on purpose?
Is it a bad practice to use version managers for programming languages inside docker containers?
Why?
This would be considered a bad practice or anti-pattern in docker. RVM is trying to solve a similar problem that docker is solving, but with a very different approach. RVM is designed for a host or VM with all the tools installed in one place. Docker creates an isolated environment where only the tools you need to run your single application are included.
Containers are ideally minimalistic, only containing the prerequisites needed for your application, making them more portable. Docker also uses layers and a union filesystem to reuse common base images for each image, so any copy of something like Ruby version X is only downloaded and written to disk once, ever (ignoring updates to that image).
It depends on how you are going to use it. If you need to install about any version of ruby on the custom docker image without messing around with downloading tarballs and applying patches RVM can be just perfect. RVM is basically a bash script so using it inside of docker container is as bad as using any other bash script inside of docker container.

Resources