kubernetes not able to pull image from spark master host - docker

I have a 3-node Kubernetes cluster (host a, host b, host c; version 1.12.2). I am trying to run the spark-pi example jar as described in the Kubernetes documentation.
Host a is my Kubernetes master. >> kubectl get nodes lists all three nodes.
I built the Spark Docker image using the tooling provided in the Spark 2.3.0 binary folder.
>> sudo ./bin/docker-image-tool.sh -r docker.io/spark/spark -t spark230 build
I got a message saying the image was built successfully.
>> docker images ls
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/spark/spark spark230 6a2b645d7efe About an hour ago 346 MB
docker.io/weaveworks/weave-npc 2.5.0 d499500e93d3 7 days ago 49.5 MB
docker.io/weaveworks/weave-kube 2.5.0 a5103f96993a 7 days ago 148 MB
docker.io/openjdk 8-alpine 97bc1352afde 2 weeks ago 103 MB
k8s.gcr.io/kube-proxy v1.12.2 15e9da1ca195 2 weeks ago 96.5 MB
k8s.gcr.io/kube-apiserver v1.12.2 51a9c329b7c5 2 weeks ago 194 MB
k8s.gcr.io/kube-controller-manager v1.12.2 15548c720a70 2 weeks ago 164 MB
k8s.gcr.io/kube-scheduler v1.12.2 d6d57c76136c 2 weeks ago 58.3 MB
k8s.gcr.io/etcd 3.2.24 3cab8e1b9802 7 weeks ago 220 MB
k8s.gcr.io/coredns 1.2.2 367cdc8433a4 2 months ago 39.2 MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 10 months ago 742 kB
./bin/spark-submit \
  --master k8s://https://<api-server>:<api server port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=spark/spark:spark230 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
When I submit the above command, it sometimes produces the correct output. Other times it fails with the error below.
> code = Unknown desc = repository docker.io/spark/spark not found: does not exist or no pull access, reason=ErrImagePull
When I debugged further, I found that the error occurs whenever the node name is host b or host c.
When the node name is host a, it runs fine. It looks like the other nodes are unable to locate the image.
Questions:
Should I install Spark on all nodes and build the Docker image on each of them?
Is it possible to share the image from a single node (host a) with the other nodes, i.e. how can I make the other nodes refer to the same image on host a?

Yes, you need to build the Spark image on all the nodes. You can write a wrapper script that invokes 'rebuild-image.sh' on every node, as below:
# hostnames: space-separated list of the other nodes
for h in $hostnames; do
  rsync -av /opt/spark ${h}:/opt
  ssh ${h} /opt/spark/rebuild-image.sh
done

You can also save the Docker image as a tar file, copy that file to the other hosts, and load the image there.
To save the Docker image as a tar file:
sudo docker save -o <path for generated tar file> <image name>
Now copy the tar file to the other host using scp or another copy tool, and load the Docker image with:
sudo docker load -i <path to image tar file>
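The save, copy, and load steps above can be scripted for several nodes at once. The following is only a minimal sketch: the function names, the tar path, and the host names are hypothetical, and it assumes passwordless ssh to each node and that the remote user may run docker. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: copy a locally built image to other nodes via docker save / docker load.
# The function names, tar path and host names are illustrative assumptions.

# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# usage: distribute_image IMAGE TARFILE HOST...
distribute_image() {
  image=$1
  tarfile=$2
  shift 2
  # Export the image once on the build host.
  run docker save -o "$tarfile" "$image"
  for host in "$@"; do
    # Copy the archive to the node, then import it into that node's Docker.
    run scp "$tarfile" "$host:$tarfile"
    run ssh "$host" docker load -i "$tarfile"
  done
}
```

For the cluster in the question this might be called as distribute_image spark/spark:spark230 /tmp/spark230.tar hostb hostc, with hostb and hostc standing in for the real node names.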
Hope this helps

Related

Why isn't this cron job restarting my docker containers?

I currently have two docker containers running with the names fe and be, which I can see with docker container list, which gives the output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2f342ab21447 dockerimage1/blahblah "docker-entrypoint.s…" 3 days ago Up About an hour 0.0.0.0:80->3000/tcp, :::80->3000/tcp fe
12ea03925500 dockerimage2/blahblah "docker-entrypoint.s…" 3 days ago Up About an hour 0.0.0.0:2053->3001/tcp, :::2053->3001/tcp be
When I do crontab -e I can see my cron file:
# Jobs
10 2 * * * docker container restart fe && docker container restart be && echo 'Restarting Docker Container" > /root/clogs.txt
But it never runs. I tried checking the logs in syslog with grep CRON /var/log/syslog, but I only see this repeated many times:
Feb 3 04:15:01 ubuntu CRON[106860]: (CRON) info (No MTA installed, discarding output)
Feb 3 04:16:01 ubuntu CRON[106869]: (user) CMD (/var/tmp/.update-logs/./History >/dev/null 2>&1 & disown)
I can tell that the cron job didn't work because, when I run docker container list again, the status still shows about an hour ago, even though the job was supposed to run 3 minutes ago.

How to mount host data to different containers

I need to mount the host directory /data into 6 containers: h1, h2, h3, h4, h5, h6.
/data is an external hard disk mounted on the host. The 6 containers can be started and stopped easily.
The 6 containers will go into their own sub-directories of /data to analyze data independently and produce new data locally. All sub-directories have nothing to do with each other.
A relevant question is here, but no preferred answer is given.
How to do that? Below are the containers and images I have now.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d9bd9334a1e7 ubuntu "/usr/bin/bash" 19 hours ago Up 18 hours h6
23679fe7252b ubuntu "/usr/bin/bash" 19 hours ago Up 18 hours h5
e2864e38e746 ubuntu "/usr/bin/bash" 19 hours ago Up 18 hours h4
c8996a304638 ubuntu "/usr/bin/bash" 19 hours ago Up 18 hours h3
9acd2a223d86 ubuntu "/usr/bin/bash" 19 hours ago Up 18 hours h2
5690b8c7b6da ubuntu "/usr/bin/bash" 2 days ago Up 12 hours h1
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/hello-world latest f2a91732366c 2 months ago 1.85 kB
docker.io/ubuntu 27 422dc563ca32 2 months ago 252 MB
docker.io/ubuntu latest 422dc563ca32 2 months ago 252 MB
If those containers are already running, you cannot easily add /data to them, except maybe with docker cp.
But the best practice remains either:
make images with /data already in it (Dockerfile ADD)
or use the existing image and launch your container, but with the -v (volume) option. See Use volumes.
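As a rough illustration of the -v option, the sketch below starts one container per name and bind-mounts a matching sub-directory of /data into it. The sub-directory layout (/data/h1, /data/h2, ...) and the function names are assumptions, and the existing h1..h6 containers would have to be removed first (or new names chosen), since container names must be unique. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: launch containers with a per-container bind mount under /data.
# The /data/<name> layout and the function names are illustrative assumptions.

# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# usage: start_with_data NAME...
start_with_data() {
  for name in "$@"; do
    # -v HOST_DIR:CONTAINER_DIR mounts the host sub-directory at /data
    # inside the container, so each container only sees its own data.
    run docker run -dit --name "$name" -v "/data/$name:/data" ubuntu /usr/bin/bash
  done
}
```

Calling start_with_data h1 h2 h3 h4 h5 h6 would then give each container its own view of the external disk without sharing the other sub-directories.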

deleting old images in Docker - OSX

I've been toying with a docker image for Tensorflow.
To summarize: I first installed the standard image, then realized I needed nodejs, so I added it and did a docker commit. Then I realized I needed expressJS, added it, and did another commit.
I am running docker v1.12.5 (so the new gc/prune commands are not available).
At this stage, docker images -a shows:
REPOSITORY TAG IMAGE ID CREATED SIZE
tensor-node-express latest f2f59eb61aae 15 hours ago 2.104 GB
gcr.io/tensorflow/tensorflow latest-devel 308238445d5c 2 days ago 1.995 GB
gcr.io/tensorflow/tensorflow <none> 74435614a991 9 days ago 1.52 GB
I only want to keep tensor-node-express and delete the older images.
$ docker rmi 308238445d5c
Error response from daemon: conflict: unable to delete 308238445d5c (cannot be forced) - image has dependent child images
$ docker rmi gcr.io/tensorflow/tensorflow:latest-devel
Error response from daemon: conflict: unable to remove repository reference "gcr.io/tensorflow/tensorflow:latest-devel" (must force) - container 03de9d864e31 is using its referenced image 308238445d5c
I assumed that this means docker commits store differential images, but when I go to ~/.docker/machine/machines/default, I see:
40894464 Mar 13 13:57 boot2docker.iso
5043847168 Mar 16 08:34 disk.vmdk
I suppose the 5G file is a composite of my images, which seems to show each docker commit is the full image!
Any thoughts on how I can only use the latest docker image (tensor-node-express) and free my HD of the invasion of docker?
Supplementary info - here is the output of docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e6dcd2915991 tensor-node-express "/bin/bash" 15 hours ago Exited (130) 15 hours ago flamboyant_bose
fb44b19a21c2 gcr.io/tensorflow/tensorflow:latest-devel "/bin/bash" 18 hours ago Exited (130) 15 hours ago compassionate_bose
075001a687e3 gcr.io/tensorflow/tensorflow:latest-devel "/bin/bash" 18 hours ago Exited (0) 18 hours ago nervous_sinoussi
a80ce2d2e688 gcr.io/tensorflow/tensorflow:latest-devel "/bin/bash" 19 hours ago Exited (130) 18 hours ago happy_euclid
f493bd3c8712 gcr.io/tensorflow/tensorflow:latest-devel "/bin/bash" 19 hours ago Exited (1) 19 hours ago friendly_cori
03de9d864e31 gcr.io/tensorflow/tensorflow:latest-devel "/bin/bash" 2 days ago Exited (255) 23 minutes ago 6006/tcp, 8888/tcp tender_hopper
2dd1e83d62d3 gcr.io/tensorflow/tensorflow:latest-devel "/bin/bash" 2 days ago Exited (0) 15 hours ago modest_einstein
3067ed171b1c gcr.io/tensorflow/tensorflow:latest-devel "/bin/bash" 2 days ago Exited (0) 2 days ago dazzling_bhabha
62c699afd3fd 74435614a991 "/bin/bash" 2 days ago Exited (127) 2 days ago inspiring_austin
9523ffe2945c 74435614a991 "/bin/bash" 2 days ago Exited (0) 2 days ago kickass_leakey
e06958ea517c 74435614a991 "/bin/bash" 2 days ago Exited (0) 2 days ago objective_euler
ccf922954667 74435614a991 "/bin/bash" 2 days ago Exited (255) 2 days ago dreamy_bartik
fad0d92a07a3 74435614a991 "/bin/bash" 2 days ago Exited (130) 2 days ago elastic_dubinsky
f2a98d4e11ea 74435614a991 "/bin/bash" 2 days ago Exited (0) 2 days ago heuristic_kilby
f07e46367b17 74435614a991 "/bin/bash" 2 days ago Exited (130) 2 days ago trusting_darwin
5bbf9cf992b8 74435614a991 "/bin/bash" 2 days ago Exited (0) 2 days ago flamboyant_knuth
I tried
docker ps --filter "status=exited" | grep "days ago" | awk '{print $1}' | xargs docker rm
I also ran docker rm manually for some of the containers it missed.
That pruned the ps list to:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e6dcd2915991 tensor-node-express "/bin/bash" 15 hours ago Exited (130) 15 hours ago flamboyant_bose
But even then I can't delete old images - same error.
Further update: I tried to list the dependencies between images using this script:
for i in $(docker images -q)
do
docker history $i | grep -q 74435614a991 && echo $i
done | sort -u
And it told me:
308238445d5c
74435614a991
f2f59eb61aae
This means my new images are child images of the old image. But judging by the disk size, they do not appear to be stored as differentials.
Thoughts?
docker-machine uses a Linux VM
When you looked at the docker-machine .vmdk and .iso files, what you were looking at are the files for a Linux VM running on your Mac. This is needed because Docker requires Linux kernel features to run; it cannot run directly on the macOS kernel.
So your Mac is running a Linux virtual machine, and inside that virtual machine are the Docker daemon and all of your containers.
Therefore the file sizes of the .vmdk and .iso tell you nothing about any one image.
docker images have parent/child relationships
As you may already know, docker images have parents and/or children. For instance when you build an image with a Dockerfile like this:
FROM ubuntu:latest
RUN apt-get update && apt-get install -y nginx
You will end up with a new image that you have perhaps tagged my-nginx. But it requires the ubuntu:latest image as its parent; you cannot delete ubuntu:latest with this image still around, as it requires its parent.
docker commit creates those relationships
When you use docker commit, you are basically doing a dynamic snapshot build. It is similar to the above, except there's no Dockerfile involved.
The above example has a FROM line which indicates the image to use as a base. When using commit, there is a base implied - whatever image was used to launch the running container that you are committing.
The above example has a RUN command which will create new contents in the built image, above and beyond the base image. In a real Dockerfile there are usually multiple commands that do various things which build on the base image. When you use commit, you don't have that. Instead, anything that has been written to the container on top of the base image is your new content. It exists in a read-write filesystem layer in the container. That is the thing you are committing; it is written as a new read-only layer and you get that back as a new (immutable, read-only) docker image. With a parent.
Based on your comments, and the question itself, you appear to have believed that using docker commit would create a new full image that had no dependencies on other images. That is not true. You can craft images like that if you build them yourself from scratch, just not this way.
You can untag the image
If what you want is for the image to not show up in your list, that's easy. Just untag it.
docker rmi gcr.io/tensorflow/tensorflow:latest-devel
However, this is more or less cosmetic. The image will still be there, as another image requires it. All this does is remove the tag, so it doesn't appear in the docker images list anymore without the -a flag.
The reason trying this did not work for you is you tried to rmi the image using its ID, not using its tag.
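The error in the question also shows that a tag cannot be removed while a container still references it, so the containers have to go first. A minimal sketch of that order of operations is below; the function names are hypothetical, the container IDs are whatever docker ps -a reports for the image, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: remove the stopped containers that reference a tag, then untag it.
# The function names are illustrative assumptions.

# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# usage: remove_containers_then_untag TAG CONTAINER_ID...
remove_containers_then_untag() {
  tag=$1
  shift
  for id in "$@"; do
    # Stopped containers still hold a reference to their image.
    run docker rm "$id"
  done
  # With no containers left, rmi on the tag only removes the reference;
  # layers needed by child images stay on disk.
  run docker rmi "$tag"
}
```

With the IDs from the question, this could be invoked as remove_containers_then_untag gcr.io/tensorflow/tensorflow:latest-devel 03de9d864e31.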

Push in docker private registry

I'm following https://docs.docker.com/registry/deploying/ and I have installed on docker.tp.cselt.it a private docker registry
> sudo docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
registry 2 65b0a3f42eef 7 days ago 165.8 MB
dockerui/dockerui latest 95c8b9dc91e0 6 weeks ago 6.13 MB
> sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e142b5f0933e registry:2 "/bin/registry /etc/ 7 minutes ago Up 7 minutes 0.0.0.0:5000->5000/tcp registry
1d5c9e515118 registry:2 "htpasswd -Bbn testu 7 minutes ago Exited (0) 7 minutes ago romantic_jang
ae7b5d62628f dockerui/dockerui:latest "/dockerui" About an hour ago Up About an hour 0.0.0.0:9000->9000/tcp goofy_meitner
On another machine, I'm trying to push an image (hello-world) on that registry:
> docker login docker.tp.cselt.it:5000
Username (testuser):
WARNING: login credentials saved in /home/administrator/.docker/config.json
Login Succeeded
> docker pull hello-world
Using default tag: latest
latest: Pulling from library/hello-world
b901d36b6f2f: Pull complete
0a6ba66e537a: Pull complete
Digest: sha256:8be990ef2aeb16dbcb9271ddfe2610fa6658d13f6dfb8bc72074cc1ca36966a7
Status: Downloaded newer image for hello-world:latest
> docker tag hello-world docker.tp.cselt.it:5000/hello-world
> docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
docker.tp.cselt.it:5000/hello-world latest 0a6ba66e537a 5 months ago 960 B
hello-world latest 0a6ba66e537a 5 months ago 960 B
> docker push docker.tp.cselt.it:5000/hello-world
The push refers to a repository [docker.tp.cselt.it:5000/hello-world] (len: 1)
0a6ba66e537a: Image already exists
b901d36b6f2f: Image already exists
latest: digest: sha256:1c7adb1ac65df0bebb40cd4a84533f787148b102684b74cb27a1982967008e4b size: 2744
Now, on the first machine (docker.tp.cselt.it):
> sudo docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
registry 2 65b0a3f42eef 7 days ago 165.8 MB
dockerui/dockerui latest 95c8b9dc91e0 6 weeks ago 6.13 MB
> sudo docker exec -it 65b0a3f42eef bash
> ls /var/lib/registry/docker/registry/v2/repositories/
centos hello-world ubuntu
But when I run:
> curl -u testuser:testpassword -X GET http://docker.tp.cselt.it:5000/v2/_catalog --noproxy docker.tp.cselt.it
I receive ""
What's wrong?
Riccardo
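The same request works over https instead of http (with --insecure to skip certificate verification):
curl --noproxy docker.tp.cselt.it -u testuser:testpassword --insecure -X GET https://docker.tp.cselt.it:5000/v2/_catalog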
{"repositories":["hello-world"]}
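If you want the repository names one per line rather than as JSON, the response can be post-processed. This is only a rough sketch: catalog_repos is a hypothetical helper, and it assumes the compact single-line form shown above; a real JSON parser such as jq would be more robust.

```shell
#!/bin/sh
# Sketch: turn {"repositories":["a","b"]} into one repository name per line.
# catalog_repos is a hypothetical helper; it assumes compact single-line JSON.

# usage: curl ... /v2/_catalog | catalog_repos
catalog_repos() {
  # Keep only the contents of the repositories array, then split on commas
  # and strip the surrounding double quotes.
  sed -n 's/.*"repositories":\[\([^]]*\)].*/\1/p' | tr ',' '\n' | tr -d '"'
}
```

The curl command above can be piped straight into catalog_repos.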

Docker run -d <private image> gives fatal. On other hosts it's ok?

Let the commands speak for themselves:
on a host called: coreworker
core#coreworker-1 ~ $ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
hub-docker-repo:5000/485d5874-c786-4b90-93ac-8db5342a6059 1 bbd5d4d98156 31 minutes ago 139.3 MB
ec2-54-169-239-164.ap-southeast-1.compute.amazonaws.com:5000/hub-action-repository latest 66ecb895d185 14 hours ago 856.4 MB
ec2-54-169-239-164.ap-southeast-1.compute.amazonaws.com:5000/hub-ext-node-base 0.12.7 f2f1afc202e4 8 days ago 136.6 MB
...
core#coreworker-1 ~ $
on a host called devhost
core#devhost ~ $ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
<none> <none> 67e45ce93dee 48 minutes ago 725 MB
node 0.12.7 a4b45afffe4a 5 days ago 642.2 MB
jwnintex/mesos-worker latest 42f1b41b0089 5 weeks ago 504 MB
jwnintex/nginx-port-router latest 11edcdf1a5fc 9 weeks ago 126.4 MB
jwnintex/consul latest e66fb6787628 10 weeks ago 69.4 MB
jwnintex/mesos-master latest 187d84106a3e 3 months ago 561.8 MB
jwnintex/marathon latest b1d8dd91146a 3 months ago 699.3 MB
jwnintex/zookeeper latest 9b72d56707c9 4 months ago 304.3 MB
jwnintex/registrator latest b1c29d1a74a9 6 months ago 11.79 MB
but when I do:
core#devhost ~ $ docker images -a hub-docker-repo:5000
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
Now run the image that resides in hub-docker-repo on devhost.
First verify that the registry host is up:
core#devhost ~ $ ping hub-docker-repo
PING hub-docker-repo.service.consul (172.17.8.150) 56(84) bytes of data.
64 bytes from coreworker-1.node.dc1.consul (172.17.8.150): icmp_seq=1 ttl=63 time=8.27 ms
64 bytes from coreworker-1.node.dc1.consul (172.17.8.150): icmp_seq=2 ttl=63 time=5.19 ms
now try it
core#devhost ~ $ docker run -d hub-docker-repo:5000/485d5874-c786-4b90-93ac-8db5342a6059:1
Unable to find image 'hub-docker-repo:5000/485d5874-c786-4b90-93ac-8db5342a6059:1' locally
Pulling repository hub-docker-repo:5000/485d5874-c786-4b90-93ac-8db5342a6059
FATA[0000] Error: image 485d5874-c786-4b90-93ac-8db5342a6059:1 not found
Now try it on coreworker which is actually hosting the registry (as a docker image)
core#coreworker-1 ~ $ docker run -d hub-docker-repo:5000/485d5874-c786-4b90-93ac-8db5342a6059:1
98d642c1bafd30569d853e92167c7b4fe720bd67f65ec0d0719ec5a36bb6616f
Why am I not able to run that remote image? Or better, even discover it?
It turns out the image had never been pushed and so couldn't be seen by the other host.
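In other words, the image has to be pushed from the host that built it before any other host can pull it. A minimal sketch of the two halves follows; the function names are hypothetical, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: push from the host that has the image, then pull and run elsewhere.
# The function names are illustrative assumptions.

# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# usage (on the host that has the image locally): publish_image IMAGE
publish_image() {
  run docker push "$1"
}

# usage (on any other host): fetch_and_run IMAGE
fetch_and_run() {
  # An explicit pull fails early if the image is still missing from the registry.
  run docker pull "$1"
  run docker run -d "$1"
}
```

Here that would mean publish_image hub-docker-repo:5000/485d5874-c786-4b90-93ac-8db5342a6059:1 on coreworker-1, followed by fetch_and_run with the same name on devhost.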
