Finding what takes space in a docker image

I'm looking for a visual tool that would take a Docker image and show (in some kind of chart) what makes the image, say, 1.2 GB large.
In addition to what ordinary disk-usage tools would do, it would also tell me which Docker overlay (layer) brought each file in.
Is there such a tool or shall I dream on?

Dive does it.
Mentioned here.
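If you only need a quick textual breakdown, the per-layer sizes are also available from plain docker history; Dive adds the file-level view on top of that (which files each layer brought in). A minimal sketch, assuming Python 3.7+ and a locally available image (the image name is a placeholder):

import subprocess

IMAGE = "myuser/myapp:latest"  # placeholder: put your own image here

# docker history prints one row per layer; --human=false gives raw byte counts
out = subprocess.run(
    ["docker", "history", "--no-trunc", "--human=false",
     "--format", "{{.Size}}\t{{.CreatedBy}}", IMAGE],
    capture_output=True, text=True, check=True,
).stdout

for line in out.strip().splitlines():
    size, created_by = line.split("\t", 1)
    print("%8.1f MB  %s" % (int(size) / 1024.0 / 1024.0, created_by[:100]))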

Related

Create a docker image so it can be used as base image

This question is a little vague; sorry I do not have more details. I took an exam a few days ago that involved containers. One question was about optimizing an existing Dockerfile, creating the image, and pushing it to Docker Hub. The Dockerfile included EXPOSE, CMD, LABEL, FROM, RUN, and ADD instructions.
One of the characteristics of the image was that one should be able to use it as base/parent image for creating other images.
I can't believe this mention was added for no reason, yet I am not able to understand why this characteristic had to be listed. Is there something specific that one needs to add to a Dockerfile for base images? Or are base images stored differently in the registry?
What makes a base docker image different than a normal runnable image?
There is nothing special that you need to do for an image to be reusable. Any image can be referenced in a FROM clause and thus used as a base image (see the sketch at the end of this answer). The OS needs to match the commands in the file, of course. Also, images are made of layers, and layers are usually reused between images.
If I had to look hard for a reason why that was specified as a requirement, I would say that they wanted you to pay attention to:
the size of the final image (make it as small as possible)
documentation (make it easy for others to understand what your image already does)
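For illustration only, a small sketch (with hypothetical image names) of the point above: any image you have already built or pushed can simply be named in the FROM line of another Dockerfile, and Docker reuses its layers when building on top of it.

import os
import subprocess
import tempfile

# Hypothetical names: "myuser/myapp:1.0" acts as the base image,
# "myuser/derived:1.0" is the new image built on top of it.
child_dockerfile = """\
# Any previously built/pushed image can act as the base image here.
FROM myuser/myapp:1.0
LABEL description="derived image reusing all layers of myuser/myapp:1.0"
RUN echo "one extra layer on top of the base" > /extra.txt
"""

with tempfile.TemporaryDirectory() as ctx:
    with open(os.path.join(ctx, "Dockerfile"), "w") as f:
        f.write(child_dockerfile)
    # Docker reuses the base image's layers; only the new RUN layer is added.
    subprocess.run(["docker", "build", "-t", "myuser/derived:1.0", ctx], check=True)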

Datamechanics - spark docker image - example of how to use the connector that comes inbuilt with the image

I came across the Docker image below for Spark. The image also comes with connectors to some popular cloud services. An example of how to use the built-in connectors (say, Azure Storage Gen2) in a PySpark application would be of great help.
Link to the Docker Hub image: https://hub.docker.com/r/datamechanics/spark
I looked into the example below that was provided, but it didn't help much in understanding how to use the connector that comes with the default image:
https://github.com/datamechanics/examples/blob/main/pyspark-example/main.py
There is some more documentation at https://docs.datamechanics.co/docs/docker-images, but it is not very helpful for understanding how to use the images either.
The fact that there is no Dockerfile, and no response to reported issues, makes it very difficult.
It looks like https://g1thubhub.github.io/docker.html is helpful, although the image versions used there are older.
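In case it helps the next reader, here is a minimal PySpark sketch for Azure Data Lake Storage Gen2, assuming the hadoop-azure (ABFS) connector shipped with the image is on the classpath as the docs claim. The storage account, container, paths and key below are placeholders.

from pyspark.sql import SparkSession

ACCOUNT = "mystorageaccount"   # placeholder storage account
CONTAINER = "mycontainer"      # placeholder container (filesystem)

spark = (
    SparkSession.builder
    .appName("abfs-connector-example")
    # Account-key auth is the simplest option; service-principal/OAuth auth
    # uses the fs.azure.account.oauth* properties instead.
    .config(
        "spark.hadoop.fs.azure.account.key.%s.dfs.core.windows.net" % ACCOUNT,
        "<storage-account-key>",
    )
    .getOrCreate()
)

base = "abfss://%s@%s.dfs.core.windows.net" % (CONTAINER, ACCOUNT)

# Read and write using abfss:// URIs once the account key is configured.
df = spark.read.option("header", "true").csv(base + "/input/")
df.show()
df.write.mode("overwrite").parquet(base + "/output/")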

Recognize Logo in a full image

First, you need to know that I'm a beginner in this subject. I'm an embedded systems developer by background, but I have never worked with image recognition.
Let me explain my main goal:
I would like to create my own database of logos and be able to recognize them in a larger image. A typical application would be, for example, to build a database of Pepsi and Coca-Cola logos, and when I take a photo of a bottle of soda, it tells me whether it is one of them or another.
So, here is my problem:
I first wanted to use Google's AutoML Kit. I gave it my database so it could train on it. My first attempt was to take photos of entire bottles and then compare; it was OK but not very accurate. I then tried to give it only the logos, but after training it couldn't recognize anything in a whole image of a bottle.
I think I didn't give it enough images in the first case. But I'd prefer the second approach (giving only the logo), so that the model would search for something similar in the image.
Finally, my questions:
If you've worked with ML Kit from Google, were you able to train a model by giving it images that should then be recognized inside a larger image? If so, do you have any hints for me?
Do you know of reliable software that could help me perform tests of this kind? I thought about Azure Machine Learning Studio from Microsoft (since I develop in Visual Studio).
At first, I'd like to write as little code as possible, just for testing. Maybe later I could try to build my own machine learning system, but I think that's a big challenge.
I also thought that I would need to split my image into smaller images and then send each of them to the model, but that would be time consuming, and I need a fast response (less than 2 seconds).
Thanks in advance for your answers. I don't need a complete answer with a full tutorial (Stack Overflow is not intended for that anyway ^^); some advice would already be good.
Have a good day!
Azure's Custom Vision is great for this: https://www.customvision.ai
Let's say you want to detect a Pepsi logo. Upload 70 images of products with the logo on them. Use Custom Vision to draw a box around the logo in each photo. Click "Train", and you get a TensorFlow model with code.
Look up any tutorial for it; it's pretty incredible and really easy to use.
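To go with that answer, a minimal sketch of calling a published Custom Vision object-detection iteration from Python. The prediction URL and key come from the "Prediction URL" dialog in the portal; the values below are placeholders. If the cloud round trip is too slow for the under-2-seconds requirement, the exported (compact-domain) TensorFlow model can be run locally instead.

import requests

# Placeholders: copy the real values from the Custom Vision portal.
PREDICTION_URL = ("https://<region>.api.cognitive.microsoft.com/customvision/v3.0/"
                  "Prediction/<project-id>/detect/iterations/<iteration-name>/image")
PREDICTION_KEY = "<prediction-key>"

with open("bottle.jpg", "rb") as f:
    resp = requests.post(
        PREDICTION_URL,
        headers={"Prediction-Key": PREDICTION_KEY,
                 "Content-Type": "application/octet-stream"},
        data=f.read(),
    )
resp.raise_for_status()

# Each prediction carries a tag name (e.g. "pepsi"), a probability and a
# bounding box in relative coordinates.
for p in resp.json()["predictions"]:
    print(p["tagName"], round(p["probability"], 2), p["boundingBox"])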

GIMP How to use python console (or make plugin) to navigate between images

Note: This applies to Windows only, due to the inherent problems of using standard, free automation tools with (EDIT: GDK) applications on Windows. (If there is a better way to automate GDK applications on Windows than AutoIt, please let me know.)
I am trying to automate some tasks in GIMP, mostly using AutoIt, which is simple enough for the most part; however, moving from open image to open image is problematic. There are two ways I have done this (using AutoIt) so far: 1) automatically clicking the arrows (in single-window mode) to scroll from image to image, and 2) using the right and left arrow keys to do the same.
There are problems with both approaches. With the first approach, I cannot do anything else with my computer until the AutoIt script has finished, because it is occupying my mouse, and we are talking about 100 or so images each time.
The second approach requires that the 'thumbnail bar' at the top (I do not know the proper word for it) be 'active', or else the arrow keys do not work. You CAN make it active by clicking on an image inside it, once or twice depending on its state, but then I have to do all sorts of gymnastics to know which ACTUAL image I am supposed to be on in order to proceed.
That brings me to conclude that the best solution is to have some programmatic way, in a Python or Scheme plugin (or a simple command or two that I can have AutoIt paste into the console), to move between images in a reliable, unobtrusive, simple way.
Thank you in advance.
Dev
Normally, you can't. By design, the Script-Fu and Python-Fu APIs deal with image processing and cannot interfere with the UI. You can only open windows (a.k.a. "displays") on images and delete them (and only those your script created).
But you can write Python/Scheme scripts to do the tasks directly, instead of trying to do them by emulating a human...
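Concretely, something along these lines can be pasted into the Python-Fu console (GIMP 2.x, so Python 2 syntax): it walks over every open image and applies the task to each one directly, with no UI navigation at all. The output folder and the flatten-and-export step are placeholders for whatever your real task is.

import os

out_dir = "C:/temp/gimp-out"            # placeholder output folder
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)

for img in gimp.image_list():           # every image currently open in GIMP
    drawable = pdb.gimp_image_flatten(img)    # placeholder operation
    name = os.path.splitext(os.path.basename(img.filename or "untitled"))[0]
    out_path = os.path.join(out_dir, name + ".png")
    pdb.file_png_save(img, drawable, out_path, out_path, 0, 9, 1, 1, 1, 1, 1)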

Docker find all layers in an image

Docker has changed its backend, and the "docker history" command no longer shows the layer IDs of all layers in an image. More details here: https://github.com/docker/docker/issues/20131
Although I understand why it shows "missing", I still haven't found a new way to extract the information I'm looking for. Does anyone know how I can find all the layers in an image? I'd like to be able to cross-reference layers across different images so that I know when a certain layer is used by more than one service, which is why I used the ID shown in the history command up until recently.
Any help is appreciated, thank you.
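One approach that still works: docker image inspect exposes the content-addressable layer digests under .RootFS.Layers (these are diff IDs, not the old layer IDs from docker history), and the digests are identical when two images share a layer, so they can be cross-referenced. A sketch, with placeholder image names:

import json
import subprocess
from collections import defaultdict

IMAGES = ["service-a:latest", "service-b:latest"]   # placeholders

layer_users = defaultdict(list)
for image in IMAGES:
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .RootFS.Layers}}", image],
        capture_output=True, text=True, check=True,
    ).stdout
    for layer in json.loads(out):                   # list of sha256 diff IDs
        layer_users[layer].append(image)

for layer, users in layer_users.items():
    if len(users) > 1:
        print(layer, "is shared by:", ", ".join(users))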
