How can I preserve the RPM database for RPM installations that happen after the container is spun up? I know installing RPMs inside a running container is an anti-pattern, but we need this.
RPMs should be bundled with the image itself, but in our case the requirement is to preserve installations that happen after the container is spun up.
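One sketch of how this is sometimes handled (my assumption, not something stated in the question) is to mount a named volume over the RPM database path so the database survives container restarts. This keeps only the database, not the installed files themselves, and it assumes the classic /var/lib/rpm location:
# create a named volume; on first use Docker copies the image's /var/lib/rpm into it
docker volume create rpmdb
docker run -d --name myapp -v rpmdb:/var/lib/rpm <your_image>
# install something inside the running container, then verify it is still recorded on later runs
docker exec myapp rpm -qa
# an alternative is to snapshot the whole container instead: docker commit myapp myapp:patched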
Related
Our org builds apps based on some Red Hat base images, and we find that rpm -qa can be used to check installed system packages and run security checks. Here "packages" means system libraries like libxml2 or expat (for software dependencies we have Maven). From the Dockerfiles we know that nothing is downloaded and built from source, so in theory all packages are detectable by rpm.
Some of the base images always point to latest, so I have to check regularly which package is at which level. I need to automate this.
When it's impossible to run the image as-is (for example, when the entrypoint is defined as java -jar), I cannot easily start the container and run rpm -qa. I have to create a project, write a main class that does some long-running job, configure everything in the Maven Jib plugin, build and run it, and then docker exec xxx rpm -qa. It's inconvenient.
I have tried dive; I can see the files but not view their contents. I can docker save -o foo.tar and try to extract files from there, but it's inconvenient. Besides, I am not aware of any file containing a list of installed packages. Is there one?
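For what it's worth, the RPM database itself (usually under /var/lib/rpm) is the "file" in question, and it can be pulled out of the docker save tarball and queried offline. A rough sketch, assuming the older per-layer layer.tar layout and a host rpm that can read the image's database format:
docker save -o foo.tar <your_image>
mkdir foo && tar -xf foo.tar -C foo
# each layer is its own tarball; extract the RPM database from whichever layer carries it
for layer in foo/*/layer.tar; do
  tar -tf "$layer" | grep -q '^var/lib/rpm/' && tar -xf "$layer" -C foo var/lib/rpm
done
# query the extracted database without ever running the image
rpm -qa --dbpath "$PWD/foo/var/lib/rpm"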
I tried docker history; it was not very helpful.
I would like a Docker feature that lists all packages and versions for vulnerability checks, delegating the listing to rpm or dpkg (for Ubuntu-based images, maybe in the future) depending on which one is available.
The Dockerfile can be inaccessible if the image comes from some remote registry, so the image itself needs to be analyzed.
If the default ENTRYPOINT of a given image is in your way for a particular operation, you can unilaterally override it at run time with whatever you want, or even drop it entirely.
In your particular case, this should do the trick:
docker run -it --rm --entrypoint '' <your_image> rpm -qa
Change the final command to the one you need. You can even run bash and inspect the image interactively with your own commands.
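For example, to poke around interactively instead (assuming the image ships bash, or at least sh):
docker run -it --rm --entrypoint '' <your_image> bash
# if bash is not present in the image, fall back to sh
docker run -it --rm --entrypoint '' <your_image> sh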
I deployed a Node.js app using Docker, and I don't know how to update the deployment after my Node.js app is updated.
Currently, I have to remove the old Docker container and image each time I update the Node.js app.
I was hoping I wouldn't need to remove the old image and container every time the Node.js app is updated.
You tagged this "production". The standard way I've done this is like so:
1. Develop locally without Docker. Make all of your unit tests pass. Build and run the container locally and run integration tests.
2. Build an "official" version of the container. Tag it with a time stamp, version number, or source control tag; but do not tag it with :latest or a branch name or anything else that would change over time.
3. docker push the built image to a registry.
4. On the production system, change your deployment configuration to reference the version tag you just built. In some order, docker run a container (or more) with the new image, and docker stop the container(s) with the old image. (Steps 2-4 are sketched as commands after this list.)
5. When it all goes wrong, change your deployment configuration back to the previous version and redeploy. (...oops.) If the old versions of the images aren't still on the local system, they can be fetched from the registry.
6. As needed, docker rm old containers and docker rmi old images.
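A command-level sketch of steps 2-4; the registry, image name, and tag here are placeholders rather than anything prescribed above:
# step 2: build an immutable, uniquely tagged image
docker build -t registry.example.com/myapp:2024-01-15-r3 .
# step 3: push it to the registry
docker push registry.example.com/myapp:2024-01-15-r3
# step 4: on the production host, start the new version, then stop the old one
docker run -d --name myapp-new registry.example.com/myapp:2024-01-15-r3
docker stop myapp-old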
Typically much of this can be automated. A continuous integration system can build software, run tests, and push built artifacts to a registry; cluster managers like Kubernetes or Docker Swarm are good at keeping some number of copies of some version of a container running somewhere and managing the version upgrade process for you. (Kubernetes Deployments in particular will start a copy of the new image before starting to shut down old ones; Kubernetes Services provide load balancers to make this work.)
None of this is at all specific to Node. As far as the deployment system is concerned there aren't any .js files anywhere, only Docker images. (You don't copy your source files around separately from the images, or bind-mount a source tree over the image contents, and you definitely don't try to live-patch a running container.) After your unfortunate revert in step 5, you can run exactly the failing configuration in a non-production environment to see what went wrong.
But yes, fundamentally, you need to delete the old container with the old image and start a new container with the new image.
Copy the new version into your container with docker cp, then restart it with docker restart <name>.
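For example (the container name and paths here are hypothetical):
# copy the updated build output into the running container, then bounce it
docker cp ./dist/. mycontainer:/usr/src/app/
docker restart mycontainer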
Let's say I create a Docker image called foo that contains the apt package foo. foo is a long-running service inside the image, so the container isn't restarted very often. What's the best way to go about updating the package inside the container?
I could tag my images with the version of foo that they're running and install a specific version of the package inside the container (i.e. apt-get install foo=0.1.0 and tag my container foo:0.1.0) but this means keeping track of the version of the package and creating a new image/tag every time the package updates. I would be perfectly happy with this if there was some way to automate it but I haven't seen anything like this yet.
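A sketch of that pinning approach, with the version fed in as a build argument so the image tag and the installed package version stay in sync (the base image, package name, and version are placeholders, as in the question):
# Dockerfile
FROM ubuntu:22.04
ARG FOO_VERSION=0.1.0
RUN apt-get update && apt-get install -y foo=${FOO_VERSION} && rm -rf /var/lib/apt/lists/*

# build and tag with the matching version
docker build --build-arg FOO_VERSION=0.1.0 -t foo:0.1.0 .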
The alternative is to install (and update) the package on container startup, however that means a varying delay on container startup depending on whether it's a new container from the image or we're starting up an existing container. I'm currently using this method but the delay can be rather annoying for bigger packages.
What's the (objectively) best way to go about handling this? Having to wait for a container to start up and update itself is not really ideal.
If you need to update something in your container, you need to build a new container. Think of the container as a statically compiled binary, just like you would with C or Java. Everything inside your container is a dependency. If you have to update a dependency, you recompile and release a new version.
If you tamper with the contents of the container at startup time you lose all the benefits of Docker: That you have a traceable build process and each container is verifiably bit-for-bit identical everywhere and every time you copy it.
Now let's address why you need to update foo. The only reason you should have to update a dependency outside of the normal application delivery cycle is to patch a security vulnerability. If you get a CVE notice that Ubuntu just released a security patch then, yep, you have to rebuild every container based on Ubuntu.
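In command terms, that rebuild is usually just running the build again without the layer cache, so the package-install layer re-runs and picks up the patched version (the image name and tag are placeholders; this assumes the Dockerfile does its apt-get update and install in the same RUN step):
docker build --no-cache -t registry.example.com/myapp:1.4.2 .
docker push registry.example.com/myapp:1.4.2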
There are several services that scan and tell you when your containers are vulnerable to published CVEs. For example, Quay.io and Docker Hub scan containers in your registry. You can also do this yourself using Clair, which Quay uses under the hood.
For any other type of update, just don't do it. Docker is a 100% fossilization strategy for your application and the OS it runs on.
Because of this your Docker container will work even if you copy it to 1000 hosts with slightly different library versions installed, or run it alongside other containers with different library versions installed. Your container will continue to work 2 years from now, even if its dependencies can no longer be downloaded from the internet.
If for some reason you can't rebuild the container from scratch (e.g. it's already 2 years old and all the dependencies went missing) then yes, you can download the container, run it interactively, and update dependencies. Do this in a shell and then publish a new version of your container back into your registry and redeploy. Don't do this at startup time.
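A sketch of that emergency path, with hypothetical names throughout; docker commit snapshots the patched container as a new image:
# run the old image with a shell instead of its normal entrypoint
docker run -it --name patched --entrypoint bash old-registry/myapp:1.0.0
# (inside the container) update the dependency you need, then exit
# back on the host: snapshot, retag, and push the result
docker commit patched registry.example.com/myapp:1.0.1
docker push registry.example.com/myapp:1.0.1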
I'm currently running a project with Docker, and I have this question: is there any way to know exactly which files are contained in a Docker image, without running that image?
To put it differently: I'm scheduling Docker containers to run in the cloud based on the binaries/libraries they use, so that containers with common dependencies (common binaries and libraries) are scheduled on the same host and thus share those dependencies through the host OS. Is there a way to identify the dependencies and do that?
You could run docker diff on each layer to see what files were added. It is not very succinct, but it should be complete.
Alternatively, your distro and programming language may have tools that help identify which dependencies have been added. For example, Python users can check the output of pip freeze and Debian users can check the output of dpkg --get-selections to see which system packages have been installed.
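If briefly starting a container is acceptable, those commands can be run directly by overriding the image's entrypoint (the image names are placeholders, and the tool has to exist inside the image):
docker run --rm --entrypoint '' <debian_based_image> dpkg --get-selections
docker run --rm --entrypoint '' <rpm_based_image> rpm -qa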
I am new to docker.io and not sure if this is beyond the scope of Docker. I have an existing CentOS 6.5 system, and I am trying to figure out how to create a Docker image from a CentOS Linux system I already have running. I would like to basically clone this existing system so I can port it to another cloud provider. I was able to create a Docker image from a base CentOS image, but I want to clone my existing system and use docker.io going forward.
Am I stuck with starting from a base CentOS image and configuring it for Docker from there? This might be more of a VirtualBox/Vagrant thing, but I am interested in docker.io.
Looks like I need to start with a base CentOS image and create a Dockerfile with all the add-ons I need... I think I'm getting there now.
Cloning a system that is up and running is certainly not what Docker is intended for. Instead, Docker is meant to develop your OS and server installation together with the app or service, making DevOps even more DevOpsy. By starting with a clean CentOS image, you can be sure to install only what you need for the service and keep everything under control. You actually don't want all the other stuff that might produce incompatibilities. So the answer here is that you should definitely approach the problem the other way around.
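A minimal sketch of that clean-base approach; the packages and paths here are placeholders for whatever the existing CentOS system actually needs:
FROM centos:6
# install only the packages the service really needs
RUN yum install -y httpd php && yum clean all
# add the application itself
COPY app/ /opt/app/
CMD ["/opt/app/run.sh"]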