Live migration of a jboss/wildfly container with CRIU failed

Live migration of a jboss/wildfly container with CRIU failed - docker

I've tried to live migrate a wildfly-container to another host like described here. The example with the np container works well. When I replace the example with a simple jboss/wildfly container, I just received this error when criu tries to restore the container on the other host :
Error response from daemon: Cannot restore container <CONTAINER-ID>: criu failed: type NOTIFY errno 0
Error: failed to restore one or more containers
Because I didn't found a solution to this error, I've compiled the linux kernel like described on the criu website and here.
After that sudo criu check prints:
Warn (criu/libnetlink.c:54): ERROR -2 reported by netlink
Warn (criu/libnetlink.c:54): ERROR -2 reported by netlink
Warn (criu/sockets.c:711): The current kernel doesn't support packet_diag
Warn (criu/libnetlink.c:54): ERROR -2 reported by netlink
Warn (criu/sockets.c:721): The current kernel doesn't support netlink_diag
Info prctl: PR_SET_MM_MAP_SIZE is not supported
Looks good.
criu --version
Version: 2.11
docker --version
Docker version 1.6.2, build 7c8fca2
Checkpoint/Restore for an example shell script example worked very well. But when I want to checkpoint a container
docker run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
with
criu dump -t $PID --images-dir /tmp/looper
I receive this output
Error (criu/sockets.c:132): Diag module missing (-2)
Error (criu/sockets.c:132): Diag module missing (-2)
Error (criu/sockets.c:132): Diag module missing (-2)
Error (criu/mount.c:701): mnt: 87:./etc/hosts doesn't have a proper root mount
Error (criu/cr-dump.c:1641): Dumping FAILED.`
I can't find some solutions with these errors. Is there any known solution to live migrate a wildfly-container?
Thanks in advance

Related

Can I run k8s master INSIDE a docker container? Getting errors about k8s looking for host's kernel details

In a docker container I want to run k8s.
When I run kubeadm join ... or kubeadm init commands I see sometimes errors like
\"modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could
not open moddep file
'/lib/modules/3.10.0-1062.1.2.el7.x86_64/modules.dep.bin'.
nmodprobe:
FATAL: Module configs not found in directory
/lib/modules/3.10.0-1062.1.2.el7.x86_64",
err: exit status 1
because (I think) my container does not have the expected kernel header files.
I realise that the container reports its kernel based on the host that is running the container; and looking at k8s code I see
// getKernelConfigReader search kernel config file in a predefined list. Once the kernel config
// file is found it will read the configurations into a byte buffer and return. If the kernel
// config file is not found, it will try to load kernel config module and retry again.
func (k *KernelValidator) getKernelConfigReader() (io.Reader, error) {
possibePaths := []string{
"/proc/config.gz",
"/boot/config-" + k.kernelRelease,
"/usr/src/linux-" + k.kernelRelease + "/.config",
"/usr/src/linux/.config",
}
so I am bit confused what is simplest way to run k8s inside a container such that it consistently past this getting the kernel info.
I note that running docker run -it solita/centos-systemd:7 /bin/bash on a macOS host I see :
# uname -r
4.9.184-linuxkit
# ls -l /proc/config.gz
-r--r--r-- 1 root root 23834 Nov 20 16:40 /proc/config.gz
but running exact same on a Ubuntu VM I see :
# uname -r
4.4.0-142-generic
# ls -l /proc/config.gz
ls: cannot access /proc/config.gz
[Weirdly I don't see this FATAL: Module configs not found in directory error every time, but I guess that is a separate question!]
UPDATE 22/November/2019. I see now that k8s DOES run okay in a container. Real problem was weird/misleading logs. I have added an answer to clarify.

I do not believe that is possible given the nature of containers.
You should instead test your app in a docker container then deploy that image to k8s either in the cloud or locally using minikube.
Another solution is to run it under kind which uses docker driver instead of VirtualBox
https://kind.sigs.k8s.io/docs/user/quick-start/

It seems the FATAL error part was a bit misleading.
It was badly formatted by my test environment (all on one line.
When k8s was failing I saw the FATAL and assumed (incorrectly) that was root cause.
When I format the logs nicely I see ...
kubeadm join 172.17.0.2:6443 --token 21e8ab.1e1666a25fd37338 --discovery-token-unsafe-skip-ca-verification --experimental-control-plane --ignore-preflight-errors=all --node-name 172.17.0.3
[preflight] Running pre-flight checks
[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.4.0-142-generic
DOCKER_VERSION: 18.09.3
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-142-generic/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/4.4.0-142-generic\n", err: exit status 1
[discovery] Trying to connect to API Server "172.17.0.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.17.0.2:6443"
[discovery] Failed to request cluster info, will try again: [the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps cluster-info)]
There are other errors later, which I originally though were a side-effect of the nasty looking FATAL error e.g. .... "[util/etcd] Attempt timed out"]} but I now think root cause is Etcd part times out sometimes.
Adding this answer in case someone else puzzled like I was.

Running 'docker-compose up' throws permission denied when trying official samaple of Docker

I am using Docker 1.13 community edition on a CentOS 7 x64 machine. When I was following a Docker Compose sample from Docker official tutorial, all things were OK until I added these lines to the docker-compose.yml file:
volumes:
- .:/code
After adding it, I faced the following error:
can't open file 'app.py': [Errno 13] Permission denied. It seems that the problem is due to a SELinux limit. Using this post I ran the following command:
su -c "setenforce 0"
to solve the problem temporarily, but running this command:
chcon -Rt svirt_sandbox_file_t /path/to/volume
couldn't help me.

Finally I found the correct rule to add to SELinux:
# ausearch -c 'python' --raw | audit2allow -M my-python
# semodule -i my-python.pp
I found it when I opened the SELinux Alert Browser and clicked on 'Details' button on the row related to this error. The more detailed information from SELinux:
SELinux is preventing /usr/local/bin/python3.4 from read access on the
file app.py.
***** Plugin catchall (100. confidence) suggests **************************
If you believe that python3.4 should be allowed read access on the
app.py file by default. Then you should report this as a bug. You can
generate a local policy module to allow this access. Do allow this
access for now by executing:
ausearch -c 'python' --raw | audit2allow -M my-python
semodule -i my-python.pp

docker - driver "devicemapper" failed to remove root filesystem after process in container killed

I am using Docker version 17.06.0-ce on Redhat with devicemapper storage. I am launching a container running a long-running service. The master process inside the container sometimes dies for whatever reason. I get the following error message.
/bin/bash: line 1: 40 Killed python -u scripts/server.py start go
I would like the container to exit and to be restarted by docker. However docker never exits. If I do it manually I get the following error:
Error response from daemon: driver "devicemapper" failed to remove root filesystem.
After googling, I tried a bunch of things:
docker rm -f <container>
rm -f <pth to mount>
umount <pth to mount>
All result in device is busy. The only remedy right now is to reboot the host system which is obviously not a long-term solution.
Any ideas?

I had the same problem and the solution was a real surprise.
So here is the error om docker rm:
$ docker rm 08d51aad0e74
Error response from daemon: driver "devicemapper" failed to remove root filesystem for 08d51aad0e74060f54bba36268386fe991eff74570e7ee29b7c4d74047d809aa: remove /var/lib/docker/devicemapper/mnt/670cdbd30a3627ae4801044d32a423284b540c5057002dd010186c69b6cc7eea: device or resource busy
Then I did the following (basically go through all processes and look for docker in mountinfo):
$ grep docker /proc/*/mountinfo | grep 958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac
/proc/20416/mountinfo:629 574 253:15 / /var/lib/docker/devicemapper/mnt/958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,relatime shared:288 - xfs /dev/mapper/docker-253:5-786536-958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota
This got be the PID of the offending process keeping it busy - 20416 (the item after /proc/)
So I did a ps -p and to my surprise find:
[devops#dp01app5030 SeGrid]$ ps -p 20416
PID TTY TIME CMD
20416 ? 00:00:19 ntpd
A true WTF moment. So I pair problem solved with Google and found this:
Then found this https://github.com/docker/for-linux/issues/124
Turns out I had to restart ntp daemon and that fixed the issue!!!

Packer docker build exits code 137 when running runit cookbook

I'm trying to use Packer to build a docker image of the webapp I'm working on. Whenever I run packer build, when it gets to the step that it runs the runit recipe, I would get Build 'docker' errored: Error executing Chef: Non-zero exit status: 137
I looked into 137, and found out this is the exit code commonly associated with a kill -9. In most cases this would imply that the system is running critically low on memory, and the system is attempting to compensate.
I tried to find the smallest possible reproduction, and I came up with this packer configuration:
{
"builders":[{
"type": "docker",
"pull": false,
"image": "silkstart/basic_server",
"export_path": "image.tar",
"run_command":[
"-d",
"-i",
"-t",
"--memory-reservation",
"1G",
"{{.Image}}",
"/bin/bash"
]
}],
"provisioners":[
{
"type": "chef-solo",
"cookbook_paths": ["cookbooks", "vendor/cookbooks"],
"data_bags_path": "data_bags",
"roles_path": "roles",
"environments_path": "environments",
"run_list": [
"recipe[runit]"
]
}
],
"post-processors": [
{
"type": "docker-import",
"repository": "silkstart/docker_test",
"tag": "0.1"
}
]
}
When I run packer build on this configuration, this is my output:
TMPDIR=/opt/shared packer build packer_files/docker_test.json
docker output will be in this color.
==> docker: Creating a temporary directory for sharing data...
==> docker: Starting docker container...
docker: Run command: docker run -v /opt/shared/packer-docker484290992:/packer-files -d -i -t --memory-reservation 1G silkstart/basic_server /bin/bash
docker: Container ID: 1f87b0cf1fe71f07b580ae6b18415a79c23a1a32a40f5f0366be90f160977a50
==> docker: Provisioning with chef-solo
docker: Installing Chef...
docker: % Total % Received % Xferd Average Speed Time Time Time Current
docker: Dload Upload Total Spent Left Speed
docker: 100 20022 100 20022 0 0 45092 0 --:--:-- --:--:-- --:--:-- 45196
docker: Getting information for chef stable for ubuntu...
docker: downloading https://omnitruck-direct.chef.io/stable/chef/metadata?v=&p=ubuntu&pv=14.04&m=x86_64
docker: to file /tmp/install.sh.23/metadata.txt
docker: trying curl...
docker: url https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/14.04/x86_64/chef_12.6.0-1_amd64.deb
docker: md5 5cfc19d5a036b3f7860716bc9795a85e
docker: sha256 e0b42748daf55b5dab815a8ace1de06385db98e29a27ca916cb44f375ef65453
docker: version 12.6.0downloaded metadata file looks valid...
docker: downloading https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/14.04/x86_64/chef_12.6.0-1_amd64.deb
docker: to file /tmp/install.sh.23/chef_12.6.0-1_amd64.deb
docker: trying curl...
docker: Comparing checksum with sha256sum...
docker:
docker: WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
docker:
docker: You are installing an omnibus package without a version pin. If you are installing
docker: on production servers via an automated process this is DANGEROUS and you will
docker: be upgraded without warning on new releases, even to new major releases.
docker: Letting the version float is only appropriate in desktop, test, development or
docker: CI/CD environments.
docker:
docker: WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
docker:
docker: Installing chef
docker: installing with dpkg...
docker: Selecting previously unselected package chef.
docker: (Reading database ... 17195 files and directories currently installed.)
docker: Preparing to unpack .../chef_12.6.0-1_amd64.deb ...
docker: Unpacking chef (12.6.0-1) ...
docker: Setting up chef (12.6.0-1) ...
docker: Thank you for installing Chef!
docker: Creating directory: /tmp/packer-chef-solo
docker: Creating directory: /tmp/packer-chef-solo/cookbooks-0
docker: Creating directory: /tmp/packer-chef-solo/cookbooks-1
docker: Creating directory: /tmp/packer-chef-solo/roles
docker: Creating directory: /tmp/packer-chef-solo/data_bags
docker: Creating directory: /tmp/packer-chef-solo/environments
docker: Creating configuration file 'solo.rb'
docker: Creating JSON attribute file
docker: Executing Chef: sudo chef-solo --no-color -c /tmp/packer-chef-solo/solo.rb -j /tmp/packer-chef-solo/node.json
docker: [2016-01-29T06:42:48+00:00] INFO: Forking chef instance to converge...
docker: [2016-01-29T06:42:48+00:00] INFO: *** Chef 12.6.0 ***
docker: [2016-01-29T06:42:48+00:00] INFO: Chef-client pid: 207
docker: [2016-01-29T06:42:50+00:00] INFO: Setting the run_list to ["recipe[runit]"] from CLI options
docker: [2016-01-29T06:42:50+00:00] INFO: Run List is [recipe[runit]]
docker: [2016-01-29T06:42:50+00:00] INFO: Run List expands to [runit]
docker: [2016-01-29T06:42:50+00:00] INFO: Starting Chef Run for 1f87b0cf1fe7
docker: [2016-01-29T06:42:50+00:00] INFO: Running start handlers
docker: [2016-01-29T06:42:50+00:00] INFO: Start handlers complete.
docker: [2016-01-29T06:42:52+00:00] INFO: Processing service[runit] action nothing (runit::default line 20)
docker: [2016-01-29T06:42:52+00:00] INFO: Processing execute[start-runsvdir] action nothing (runit::default line 24)
docker: [2016-01-29T06:42:52+00:00] INFO: Processing execute[runit-hup-init] action nothing (runit::default line 33)
docker: [2016-01-29T06:42:52+00:00] INFO: Processing apt_package[runit] action install (runit::default line 64)
docker: [2016-01-29T06:42:55+00:00] INFO: Processing cookbook_file[/var/chef/cache/preseed/runit/runit-2.1.1-6.2ubuntu3.seed] action create (dynamically defined)
docker: [2016-01-29T06:42:55+00:00] INFO: cookbook_file[/var/chef/cache/preseed/runit/runit-2.1.1-6.2ubuntu3.seed] created file /var/chef/cache/preseed/runit/runit-2.1.1-6.2ubuntu3.seed
docker: [2016-01-29T06:42:55+00:00] INFO: cookbook_file[/var/chef/cache/preseed/runit/runit-2.1.1-6.2ubuntu3.seed] updated file contents /var/chef/cache/preseed/runit/runit-2.1.1-6.2ubuntu3.seed
docker: [2016-01-29T06:42:55+00:00] INFO: apt_package[runit] pre-seeding package installation instructions
==> docker: Killing the container: 1f87b0cf1fe71f07b580ae6b18415a79c23a1a32a40f5f0366be90f160977a50
Build 'docker' errored: Error executing Chef: Non-zero exit status: 137
I'm not entirely sure what is causing the code 137, and any help would be appreciated.
Update 1
I'm including a gist of the full debug output from Chef. It's much more verbose, mainly due it would seem to all of the attempts Ohai makes to get information.
https://gist.github.com/jrstarke/4c5f3b432aaee70c7f77
No references in here seem to suggest an out of memory error, at least on the docker host.

After much much digging, I found the problem. The underlying problem, and the solution were both found on an issue in cloudfoundry-incubator/garden-linux.
Apparently as part of the setup process one of the post init scripts for runit executes a kill -s HUP 1. Why I'm not entirely sure, but as they noted there, doing a trap '' HUP right before my apt-get install runit and a trap HUP afterwards totally solved my problem.

Check the OOM log on the host machine. Also you can use the execute_command configuration value to turn the log level to debug.

This answer seemed to work for me: https://stackoverflow.com/a/42398166/2878244
I had to increase the memory resources assigned to docker by going to the Docker Tab > Preferences > Advanced

error while executing the following commands

When I run the following commands I am getting the below output:
sudo docker run ubuntu /bin/echo hello world
WARNING: WARNING: Local (127.0.0.1) DNS resolver found in resolv.conf and containers can't use it. Using default external servers : [8.8.8.8 8.8.4.4]
And when I run docker version, the output is:
mkdir /var/lib/docker/containers: permission denied[/var/lib/docker|a0f30ece] -job initserver() = ERR (1)
2014/03/03 21:49:51 initserver: mkdir /var/lib/docker/containers: permission denied
What is the problem?
My Problem solved by following :
Try modify the /etc/default/docker file, un-comment the OPTS line:
6 # Use DOCKER_OPTS to modify the daemon startup options.
7 #DOCKER_OPTS="-dns 8.8.8.8"

Categories

HOME

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart