Marathon won't launch Docker container

I have a 1/1 master/slave setup, with the slave having 8 GB RAM and 8 CPUs. I am trying to use Marathon to deploy a Docker container with 1 GB memory and 1 CPU, but it just hangs in waiting status.
I believe this is usually caused by Marathon not getting the resources it wants for the task.
When I look at my logs I see:
Sending 1 offers to framework 8bb1a298-cc23-426e-ad43-d440a2a560c4-0000 (marathon) at scheduler-d4a993b4-69ea-4ac3-9e98-b54afe1e790b@127.0.0.1:52016
I0127 23:07:37.396546 2471 master.cpp:3297] Processing DECLINE call for offers: [ 5271fcb3-4d77-4b12-af85-d94fd9172514-O127 ] for framework 8bb1a298-cc23-426e-ad43-d440a2a560c4-0000 (marathon) at scheduler-d4a993b4-69ea-4ac3-9e98-b54afe1e790b@127.0.0.1:52016
I0127 23:07:37.396917 2466 hierarchical.cpp:744] Recovered cpus(*):6; mem(*):5968; disk(*):156020; ports(*):[31000-31056, 31058-32000] (total: cpus(*):8; mem(*):6992; disk(*):156020; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; ports(*):[31057-31057]) on slave 8bb1a298-cc23-426e-ad43-d440a2a560c4-S0 from framework 8bb1a298-cc23-426e-ad43-d440a2a560c4-0000
So it looks like Marathon is declining the offer it gets? The next lines in the logs say that Mesos is reclaiming the offered resources, and what it's reclaiming looks like plenty for my task.
Any ideas on how to troubleshoot this further?
Edit: I got to dig into this a bit further and found the Marathon logs.
Basically, the deployment works if we do not enter any information for port mapping in the Marathon docker section. The Docker container deploys successfully, and I can ping it from its host, but I cannot access it from anywhere else.
If we set the container port to 8081 (which is the port the Docker container exposes and that its application listens on), we get further in the deployment process, but the app within the container fails to start with the error:
Error: listen EADDRINUSE :::8081
at Object.exports._errnoException (util.js:856:11)
at exports._exceptionWithHostPort (util.js:879:20)
at Server._listen2 (net.js:1234:14)
at listen (net.js:1270:10)
at Server.listen (net.js:1366:5)
at EventEmitter.listen (/usr/src/app/node_modules/express/lib/application.js:617:24)
at Object.<anonymous> (/usr/src/app/index.js:16:18)
at Module._compile (module.js:425:26)
at Object.Module._extensions..js (module.js:432:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:313:12)
at Function.Module.runMain (module.js:457:10)
at startup (node.js:138:18)
at node.js:974:3
So I think we are further along than we were, but we are still having some port issues. I don't know why the container builds successfully on its own, and under Marathon with no port settings, but not under Marathon with port settings.

There are a few things to check:
On your slave, ps aux | grep sbin/mesos-slave should contain something like:
--containerizers=docker,mesos --executor_registration_timeout=5mins
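If those flags are missing and you installed from Mesosphere's packages, here is a minimal sketch of setting them, assuming the /etc/mesos-slave flag-file convention those packages use (each filename becomes a --flag):
echo 'docker,mesos' | sudo tee /etc/mesos-slave/containerizers
echo '5mins' | sudo tee /etc/mesos-slave/executor_registration_timeout
sudo service mesos-slave restart
# verify the flags took effect
ps aux | grep sbin/mesos-slave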
Again on the slave, check that a Docker daemon is running:
ps aux | grep "docker daemon"
Make sure you've configured the Docker network (in Marathon) as BRIDGE. With HOST mode you might collide with ports already used on the host. BRIDGE mode allows mappings like slave:32001 -> docker:8080.
...
"network": "BRIDGE",
"portMappings": [
  {
    "containerPort": 8080,
    "hostPort": 0,
    "protocol": "tcp"
  }
],
...
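Applied to the question's app (which listens on 8081), a complete definition might look like this sketch; the app ID and image name are placeholders, and "hostPort": 0 asks Mesos to assign a port from the offered range:
cat > app.json <<'EOF'
{
  "id": "my-node-app",
  "cpus": 1,
  "mem": 1024,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "my-registry/my-node-app",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8081, "hostPort": 0, "protocol": "tcp" }
      ]
    }
  }
}
EOF
curl -X POST -H "Content-Type: application/json" http://<marathon-host>:8080/v2/apps -d @app.json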
When the task starts in Marathon you'll see an app ID like myapp.a72db5b0-ca16-11e5-ba5f-fea9945fabaf. Use the Mesos CLI (pip install mesos.cli mesos.interface) to fetch the logs. There's a command similar to Unix tail for fetching the stdout logs (-f follows the log):
mesos tail -f -i myapp.a72db5b0-ca16-11e5-ba5f-fea9945fabaf
and stderr:
mesos tail -f -i myapp.a72db5b0-ca16-11e5-ba5f-fea9945fabaf stderr
-i lets you fetch logs from inactive tasks (useful when the task is crashing quickly). If you don't catch the ID in Marathon, use mesos ps -i.
If the task is not starting at all, there are either not enough resources or some problem with Marathon. Navigate your browser to http://<marathon-host>:8080/logging and increase the verbosity for task allocation, then check the Marathon logs.
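To see what is being offered and declined without raising verbosity, you can also inspect the master's state endpoint directly; a sketch, assuming jq is installed and the master listens on the default port 5050:
# total vs. used resources per slave, from the master's state endpoint
curl -s http://<master-host>:5050/master/state.json | jq '.slaves[] | {hostname, resources, used_resources}'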

Related

Issue accessing vespa outside docker container

Installed Docker on Mac and trying to run Vespa on Docker following the steps specified in the following link:
https://docs.vespa.ai/documentation/vespa-quick-start.html
I didn't have any issues until step 4: I see the vespa container running after step 2, and step 3 returned a 200 OK response.
But step 5 failed to return a 200 OK response. Below is the command I ran in my terminal:
curl -s --head http://localhost:8080/ApplicationStatus
I keep getting curl: (52) Empty reply from server whenever I run it without the -s option.
So I tried to see the listening ports inside my vespa container, and I don't see anything for 8080, but I can see 19071 (used in step 3):
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 8080'
➜ ~ docker exec vespa bash -c 'netstat -vatn| grep 19071'
tcp 0 0 0.0.0.0:19071 0.0.0.0:* LISTEN
Below doc has info related to vespa ports
https://docs.vespa.ai/documentation/reference/files-processes-and-ports.html
I'm assuming port 8080 should be active after docker run (step 2 of the quick-start link) and accessible from outside the container, since the port mapping is done. But I don't see port 8080 active inside the container in the first place.
Am I missing something? Do I need to perform any additional steps beyond those mentioned in the quick start? FYI, I installed Jenkins inside Docker and was able to access it outside the container via port mapping, but I'm not sure why it's not working with Vespa. I have been trying for quite some time with no progress. Please advise if I'm missing something here.
You have too little memory for your Docker container: "Minimum 6GB memory dedicated to Docker (the default is 2GB on Macs)." See https://docs.vespa.ai/documentation/vespa-quick-start.html
The deadlock-detector warnings and the failure to get configuration from the configuration server (which was likely OOM-killed) indicate that you are too low on memory.
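You can check how much memory the Docker daemon actually has from the command line; a quick sketch (MemTotal is reported in bytes and should be at least ~6 GB for Vespa):
docker info --format '{{.MemTotal}}'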
My guess is that your jdisc container hasn't finished initializing or didn't initialize properly. Did you try checking the log?
docker exec vespa bash -c '/opt/vespa/bin/vespa-logfmt /opt/vespa/logs/vespa/vespa.log'
This should tell you if there was something wrong. When it is ready to receive requests you would see something like this:
[2018-12-10 06:30:37.854] INFO : container Container.org.eclipse.jetty.server.AbstractConnector Started SearchServer@79afa369{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
[2018-12-10 06:30:37.857] INFO : container Container.org.eclipse.jetty.server.Server Started @10280ms
[2018-12-10 06:30:37.857] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components. Application switch number: 0
[2018-12-10 06:30:37.859] INFO : container Container.com.yahoo.container.jdisc.ConfiguredApplication Initializing new set of configurations and components. Application switch number: 1

How to get host's udev events from a Docker container?

In a Docker container, I am looking for a way to get the udev events from the host.
Inside a container, udevadm monitor reports only the kernel events, not the host's udev events.
The question is whether there is a way to detect the host's udev events, or to forward the host's events into containers.
This is how I made my container receive host events by udev:
docker run --net=host -v /run/udev/control:/run/udev/control
--net=host allows the container and host to communicate through PF_NETLINK sockets, which udev monitor uses to receive kernel events.
/run/udev/control is the file udev monitor uses to check whether udevd is already running; if it doesn't exist, monitoring is disabled.
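Putting both pieces together, a minimal end-to-end sketch; the ubuntu:16.04 image is a placeholder, and udev has to be installed inside the container for udevadm to exist:
docker run --rm -it --net=host -v /run/udev/control:/run/udev/control ubuntu:16.04 \
  bash -c 'apt-get update && apt-get install -y udev && udevadm monitor'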
As the answer above points out, you could enable --net=host, but host networking is not recommended for several well-known reasons.
This issue arises because udev needs NETLINK to communicate between kernel and user space. Without host networking, the host and the container are in different network namespaces; running udevd inside the container puts the udev monitor in the same namespace as the daemon, so host networking is no longer needed.
When we ran into this issue, we did the following:
# apt-get install udev
# vim /etc/init.d/udev   (to comment out some special settings)
1) Comment out the following:
#if [ ! -e "/run/udev/" ]; then
#    warn_if_interactive
#fi
2) Comment out the following:
#if ! ps --no-headers --format args ax | egrep -q '^\['; then
#    log_warning_msg "udev does not support containers, not started"
#    exit 0
#fi
root@e751e437a8ba:~# service udev start
[ ok ] Starting hotplug events dispatcher: systemd-udevd.
[ ok ] Synthesizing the initial hotplug events (subsystems)...done.
[ ok ] Synthesizing the initial hotplug events (devices)...done.
[ ok ] Waiting for /dev to be fully populated...done.
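To verify the setup, you can watch inside the container while synthesizing events from the host; a sketch (whether synthesized host events reach the container depends on your kernel's uevent delivery):
# inside the container
udevadm monitor
# on the host, in a separate shell: replay "add" events for all block devices
sudo udevadm trigger --subsystem-match=block --action=add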

docker - driver "devicemapper" failed to remove root filesystem after process in container killed

I am using Docker version 17.06.0-ce on Red Hat with devicemapper storage. I am launching a container running a long-running service. The master process inside the container sometimes dies for whatever reason, and I get the following error message:
/bin/bash: line 1: 40 Killed python -u scripts/server.py start go
I would like the container to exit and be restarted by Docker. However, the container never exits. If I remove it manually I get the following error:
Error response from daemon: driver "devicemapper" failed to remove root filesystem.
After googling, I tried a bunch of things:
docker rm -f <container>
rm -f <path to mount>
umount <path to mount>
All result in "device is busy". The only remedy right now is to reboot the host system, which is obviously not a long-term solution.
Any ideas?
I had the same problem and the solution was a real surprise.
So here is the error on docker rm:
$ docker rm 08d51aad0e74
Error response from daemon: driver "devicemapper" failed to remove root filesystem for 08d51aad0e74060f54bba36268386fe991eff74570e7ee29b7c4d74047d809aa: remove /var/lib/docker/devicemapper/mnt/670cdbd30a3627ae4801044d32a423284b540c5057002dd010186c69b6cc7eea: device or resource busy
Then I did the following (basically going through all processes and looking for the Docker mount in their mountinfo):
$ grep docker /proc/*/mountinfo | grep 958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac
/proc/20416/mountinfo:629 574 253:15 / /var/lib/docker/devicemapper/mnt/958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,relatime shared:288 - xfs /dev/mapper/docker-253:5-786536-958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota
This gave me the PID of the offending process keeping it busy: 20416 (the item after /proc/).
So I ran ps -p on it, and to my surprise found:
[devops@dp01app5030 SeGrid]$ ps -p 20416
PID TTY TIME CMD
20416 ? 00:00:19 ntpd
A true WTF moment. So I paired the problem with Google and found https://github.com/docker/for-linux/issues/124
Turns out I had to restart the ntp daemon, and that fixed the issue!!!
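Condensed into one sequence, the diagnosis and fix look like this sketch; the hash is whatever ID appears in your "failed to remove root filesystem" error, and <pid>/<container> are placeholders:
# 1. find which process still holds the container's devicemapper mount
MNT_ID=958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac
grep -l "$MNT_ID" /proc/*/mountinfo | cut -d/ -f3
# 2. identify the offender and restart it (ntpd in this case)
ps -p <pid>
sudo service ntpd restart
# 3. the container can now be removed
docker rm <container>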

Docker container on Marathon doesn't finish

I have a Mesos cluster consisting of three CentOS 6.5 machines.
ZooKeeper and Mesos-Master are running on one of the machines, and Mesos-Slave is running on each machine.
Also, Marathon is running on the master node.
I am trying to run Docker containers on Marathon, following this instruction from Mesosphere.
job.json is as follows:
{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "libmesos/ubuntu"
    }
  },
  "id": "ubuntu",
  "instances": 1,
  "cpus": 0.5,
  "mem": 512,
  "uris": [],
  "cmd": "date -u +%T"
}
Then I run the following command:
curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" <master-hostname>:8080/v2/apps -d#job.json
On the Marathon web UI, I can see the Docker container stuck in "Deploying" status even after a long time.
And on the Mesos-Master web UI, I can see the task stuck in "STAGING" status after a long time.
On the Sandbox pane, I can see stdout, and the command seems to have completed successfully. No problem there.
stderr is like this,
I0416 19:19:49.254998 29178 exec.cpp:132] Version: 0.22.0
I0416 19:19:49.257824 29193 exec.cpp:206] Executor registered on slave 20150416-160950-109643786-5050-30728-S0
stdout is like this,
Registered executor on master-hostname
10:19:49
But I expect the container (task) to finish once the command has completed.
Is that possible, and if so, how?
Thank you.
The task will finish (you should be able to see it in the Mesos completed tasks), but the container will be restarted by Marathon, because Marathon is for long-running apps.
If you don't want your application to run continuously, you should take a look at another framework, like Chronos.
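For reference, a scheduled job in Chronos might look like this sketch; the port, endpoint, and required fields vary by Chronos version, so treat these as assumptions (the ISO 8601 schedule R/.../PT24H means "repeat daily"):
cat > job.json <<'EOF'
{
  "name": "ubuntu-date",
  "command": "date -u +%T",
  "schedule": "R/2015-04-17T00:00:00Z/PT24H",
  "cpus": 0.5,
  "mem": 512,
  "container": {
    "type": "DOCKER",
    "image": "libmesos/ubuntu"
  }
}
EOF
curl -X POST -H "Content-Type: application/json" http://<chronos-host>:4400/scheduler/iso8601 -d @job.json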
Marathon is for long-running processes: even if you remove the containers, Marathon will try to restart them. One more thing I observed is that Marathon keeps trying to launch containers until you are left with no memory or CPU; when you are out of resources, your task will go into the STAGING state.

Docker on RHEL 6 Cgroup mounting failing

I'm trying to get my head around something that's been working on CentOS + Vagrant, but not on our provider's RHEL (Red Hat Enterprise Linux Server release 6.5 (Santiago)). A sudo service docker restart gives this:
Stopping docker: [ OK ]
Starting cgconfig service: Error: cannot mount cpuset to /cgroup/cpuset: Device or resource busy
/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf [FAILED]
Starting docker: [ OK ]
The service starts okay, but images cannot run; a "mounting failed" error is shown when I try, and the startup log also gives a warning or two. Regarding the kernel warning, CentOS gives the same one and has no problems, as EPEL should resolve this:
WARNING: You are running linux kernel version 2.6.32-431.17.1.el6.x86_64, which might be unstable running docker. Please upgrade your kernel to 3.8.0.
2014/08/07 08:58:29 docker daemon: 1.1.2 d84a070; execdriver: native; graphdriver:
[1233d0af] +job serveapi(unix:///var/run/docker.sock)
[1233d0af] +job initserver()
[1233d0af.initserver()] Creating server
2014/08/07 08:58:29 Listening for HTTP on unix (/var/run/docker.sock)
[1233d0af] +job init_networkdriver()
[1233d0af] -job init_networkdriver() = OK (0)
2014/08/07 08:58:29 WARNING: mountpoint not found
Has anyone had any success overcoming this problem, or should I throw in the towel and wait for the provider to update to RHEL 7?
I had the same issue.
(1) Check the cgconfig status:
# /etc/init.d/cgconfig status
If it has stopped, restart it:
# /etc/init.d/cgconfig restart
and check that cgconfig is running.
(2) Check that cgconfig is on:
# chkconfig --list cgconfig
cgconfig 0:off 1:off 2:off 3:off 4:off 5:off 6:off
If cgconfig is off, turn it on.
(3) If it still does not work, some cgroup modules may be missing: in the kernel .config, via make menuconfig, add those modules to the kernel, then recompile and reboot (a quick pre-check is sketched below).
After that, it should be OK.
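Before recompiling anything, a quick way to check what the running kernel already supports, as a sketch:
# controllers compiled into the running kernel
grep CGROUP /boot/config-$(uname -r)
# controllers the kernel actually exposes right now
cat /proc/cgroups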
I ended up asking the same question at Google Groups and in the end found a solution with some help. What worked for me was this:
umount cgroup
sudo service cgconfig start
The project of making Docker work was put on hold all the same: later, a problem with network connectivity for the containers took too much time to solve, and we had to give up.
So I spent the whole day trying to rig Docker to work on my VPS and was running into this same error. Basically, what it came down to was the fact that OpenVZ didn't support Docker containers until a couple of months ago, specifically this RHEL update:
https://openvz.org/Download/kernel/rhel6/042stab105.14
Assuming this is your problem, or some variation of it, the burden of solving it is on your host. They will need to follow these steps:
https://openvz.org/Docker_inside_CT
In my case,
/etc/rc.d/rc.cgconfig start
was generating:
Starting cgconfig service: Error: cannot mount cpu,cpuacct,memory to /cgroup/cpu_and_mem: Device or resource busy /usr/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed Failed to parse /etc/cgconfig.conf
I had to use:
/etc/rc.d/rc.cgconfig restart
and it automagically unmounted and remounted the groups:
Stopping cgconfig service: Starting cgconfig service:
It seems like the cgconfig service is not running, so check it!
# /etc/init.d/cgconfig status
# mkdir -p /cgroup/cpuacct /cgroup/memory /cgroup/devices /cgroup/freezer /cgroup/net_cls /cgroup/blkio
# cat /etc/cgconfig.conf | tail | grep "=" | awk '{print "mount -t cgroup -o",$1,$1,$NF}' > cgroup_mount.sh
# sh ./cgroup_mount.sh
# /etc/init.d/cgconfig restart
# /etc/init.d/docker restart
This situation occurs when the kernel is booted with cgroup_disable=memory and /etc/cgconfig.conf contains memory = /cgroup/memory;. This causes only /cgroup/cpuset to be mounted instead of the full set.
Solution: either remove cgroup_disable=memory from your kernel boot options, or comment out memory = /cgroup/memory; in cgconfig.conf.
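A quick sketch to check which of the two situations applies:
# was the kernel booted with cgroup_disable=... ?
grep -o 'cgroup_disable=[^ ]*' /proc/cmdline
# is cgconfig.conf trying to mount the memory controller anyway?
grep memory /etc/cgconfig.conf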
The cgconfig service startup uses mount and umount, which requires an extra privilege bump from Docker; see the --privileged=true flag for more info.
I was able to overcome this issue by starting my container with:
docker run -it --privileged=true my-image
Tested on CentOS 6 and CentOS 6.5.
