Unable to mount hugetlbfs in non-root docker container - docker

I'm trying to run a DPDK app in a container without the --privileged option to docker run.
I've created the hugetlbfs mounts on the host machine (e.g., mount -t hugetlbfs nodev /tmp/mnt/huge):
# mount | grep huge
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
nodev on /tmp/mnt/huge type hugetlbfs (rw,relatime,seclabel)
However, within the container I get:
EAL: Selected IOVA mode 'VA'
EAL: 64 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Failed to get current mempolicy: Operation not permitted. Assuming MPOL_DEFAULT.
EAL: set_mempolicy failed: Operation not permitted
EAL: set_mempolicy failed: Operation not permitted
EAL: Failed to get current mempolicy: Operation not permitted. Assuming MPOL_DEFAULT.
EAL: set_mempolicy failed: Operation not permitted
EAL: set_mempolicy failed: Operation not permitted
EAL: error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
I've tried both setting the ownership of the mount point to the non-root user and bind-mounting the mount point in various ways (e.g., -v /mnt/huge:/mnt/huge, -v /dev/hugepages:/dev/hugepages). I've even tried non-standard mount points and passing --huge-dir non/standard/mountpoint in the eal_args.
One stackoverflow answer suggested using the --device option (e.g., --device /dev/hugepages), but for /dev/hugepages or /dev/hugetlbfs I get the following error:
/usr/bin/docker-current: Error response from daemon: linux runtime spec devices: error gathering device information while adding custom device "/dev/hugepages": not a device node.
I've also passed in a lot of capabilities (sys_rawio, sys_admin, sys_resource, and so on), but I doubt those are the issue.
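For reference, the attempts above add up to an invocation roughly like the following. The image name, binary name, and UID are placeholders; this is only the shape of what was tried, not a known-working recipe:
docker run -it --user 1000:1000 \
    --cap-add SYS_ADMIN --cap-add SYS_RAWIO --cap-add SYS_RESOURCE \
    -v /tmp/mnt/huge:/mnt/huge \
    -v /dev/hugepages:/dev/hugepages \
    my-dpdk-image \
    ./my-dpdk-app --huge-dir /mnt/huge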
It seems like people have been able to do it in the past. Not sure if there are new features in docker that lock down memory further in containers preventing what I'm trying to do. Any suggestions?

Related

Can I run k8s master INSIDE a docker container? Getting errors about k8s looking for host's kernel details

In a docker container I want to run k8s.
When I run kubeadm join ... or kubeadm init commands I sometimes see errors like
\"modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could
not open moddep file
'/lib/modules/3.10.0-1062.1.2.el7.x86_64/modules.dep.bin'.
nmodprobe:
FATAL: Module configs not found in directory
/lib/modules/3.10.0-1062.1.2.el7.x86_64",
err: exit status 1
because (I think) my container does not have the expected kernel header files.
I realise that the container reports its kernel based on the host that is running the container; and looking at k8s code I see
// getKernelConfigReader search kernel config file in a predefined list. Once the kernel config
// file is found it will read the configurations into a byte buffer and return. If the kernel
// config file is not found, it will try to load kernel config module and retry again.
func (k *KernelValidator) getKernelConfigReader() (io.Reader, error) {
	possibePaths := []string{
		"/proc/config.gz",
		"/boot/config-" + k.kernelRelease,
		"/usr/src/linux-" + k.kernelRelease + "/.config",
		"/usr/src/linux/.config",
	}
so I am a bit confused about the simplest way to run k8s inside a container such that it consistently gets past this kernel check.
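A quick way to see which of those predefined paths a given container image actually has (just a diagnostic sketch, using the image that appears below):
docker run --rm solita/centos-systemd:7 sh -c 'uname -r; ls -l /proc/config.gz /boot/config-$(uname -r)'
If neither path exists, the validator falls back to loading the "configs" kernel module, which is where the modprobe error above comes from.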
I note that running docker run -it solita/centos-systemd:7 /bin/bash on a macOS host I see:
# uname -r
4.9.184-linuxkit
# ls -l /proc/config.gz
-r--r--r-- 1 root root 23834 Nov 20 16:40 /proc/config.gz
but running the exact same command on an Ubuntu VM I see:
# uname -r
4.4.0-142-generic
# ls -l /proc/config.gz
ls: cannot access /proc/config.gz
[Weirdly I don't see this FATAL: Module configs not found in directory error every time, but I guess that is a separate question!]
UPDATE 22/November/2019. I see now that k8s DOES run okay in a container. The real problem was weird/misleading logs. I have added an answer to clarify.
I do not believe that is possible given the nature of containers.
You should instead test your app in a docker container, then deploy that image to k8s either in the cloud or locally using minikube.
Another solution is to run it under kind, which uses the docker driver instead of VirtualBox:
https://kind.sigs.k8s.io/docs/user/quick-start/
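A minimal sketch of that route, assuming Docker and the kind binary are already installed:
kind create cluster                        # boots a single-node cluster inside a Docker container
kubectl cluster-info --context kind-kind   # the default cluster name is "kind", hence this context
kind delete cluster                        # tear it down again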
It seems the FATAL error part was a bit misleading.
It was badly formatted by my test environment (all on one line).
When k8s was failing I saw the FATAL and assumed (incorrectly) that it was the root cause.
When I format the logs nicely I see:
kubeadm join 172.17.0.2:6443 --token 21e8ab.1e1666a25fd37338 --discovery-token-unsafe-skip-ca-verification --experimental-control-plane --ignore-preflight-errors=all --node-name 172.17.0.3
[preflight] Running pre-flight checks
[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.4.0-142-generic
DOCKER_VERSION: 18.09.3
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-142-generic/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/4.4.0-142-generic\n", err: exit status 1
[discovery] Trying to connect to API Server "172.17.0.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.17.0.2:6443"
[discovery] Failed to request cluster info, will try again: [the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps cluster-info)]
There are other errors later, which I originally thought were a side-effect of the nasty-looking FATAL error, e.g. .... "[util/etcd] Attempt timed out"]}, but I now think the root cause is that the etcd part sometimes times out.
Adding this answer in case someone else puzzled like I was.

Headless Chromium on Docker fails

With some sites, headless Chromium fails when it is running inside a Docker container:
[0520/093103.024239:ERROR:platform_shared_memory_region_posix.cc(268)] Failed to reserve 16728064 bytes for shared memory.: No space left on device (28)
[0520/093103.024591:ERROR:validation_errors.cc(76)] Invalid message: VALIDATION_ERROR_UNEXPECTED_NULL_POINTER (null field 1)
[0520/093103.024946:FATAL:memory.cc(22)] Out of memory. size=16723968
How should I tune Docker to fix this?
You're running out of shared memory, as the first error line says:
[0520/093103.024239:ERROR:platform_shared_memory_region_posix.cc(268)] Failed to reserve 16728064 bytes for shared memory.: No space left on device (28)
This is backed by /dev/shm, which defaults to 64 MB in Docker and isn't that much for modern web applications.
For context on /dev/shm see here https://superuser.com/questions/45342/when-should-i-use-dev-shm-and-when-should-i-use-tmp
Option 1:
Run chrome with --disable-dev-shm-usage
Option 2:
Set the /dev/shm size to a reasonable amount: docker run -it --shm-size=1g, replacing 1g with whatever amount you want.
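A sketch of both options (the image name and URL are placeholders, and the Chromium binary name varies by distribution):
# Option 1: keep Docker's default 64 MB /dev/shm and tell Chromium not to use it
docker run --rm my-chromium-image \
    chromium-browser --headless --disable-gpu --disable-dev-shm-usage --dump-dom https://example.com
# Option 2: enlarge /dev/shm for the container instead
docker run --rm --shm-size=1g my-chromium-image \
    chromium-browser --headless --disable-gpu --dump-dom https://example.com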

How to run replicas on memory tmpfs on host in OpenEBS?

Is there a way to run replicas on an in-memory tmpfs on the host? I got this problem (infinite restarts):
time="2018-11-02T21:55:05Z" level=fatal msg="Error running start replica command: failed to find extents, error: invalid argument"
Is the service able to work on disks mounted in memory?
Currently the OpenEBS Jiva storage engine supports only file systems that support extent mapping (ext4, XFS, etc.),
whereas tmpfs does not support extent mapping, hence it fails.
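One way to check whether a given mount supports extent mapping is filefrag, which relies on the FIEMAP/FIBMAP ioctls that tmpfs does not implement (a diagnostic sketch; the ext4 path is a placeholder):
# on an ext4/XFS mount this prints the file's extent layout
touch /mnt/ext4-volume/probe && filefrag -v /mnt/ext4-volume/probe
# on a tmpfs mount such as /dev/shm the same probe reports the mapping ioctl as unsupported
touch /dev/shm/probe && filefrag -v /dev/shm/probe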

ceph mount failed with (95) Operation not supported

I installed ceph on servers "A" and "B" and I would like to mount it from servers "C" or "D".
But I faced the error below.
ceph-fuse[4628]: ceph mount failed with (95) Operation not supported
My server configuration is as follows:
A Server: ubuntu16.04 (ceph-server) 10.1.1.54
B Server: ubuntu16.04 (ceph-server) 10.1.1.138
C Server: AmazonLinux (client)
D Server: ubuntu16.04 (client)
and ceph.conf:
[global]
fsid = 44f299ac-ff11-41c8-ab96-225d62cb3226
mon_initial_members = node01, node02
mon_host = 10.1.1.54,10.1.1.138
auth cluster required = none
auth service required = none
auth client required = none
auth supported = none
osd pool default size = 2
public network = 10.1.1.0/24
Ceph is also installed correctly.
ceph health
HEALTH_OK
ceph -s
cluster 44f299ac-ff11-41c8-ab96-225d62cb3226
health HEALTH_OK
monmap e1: 2 mons at {node01=10.1.1.54:6789/0,node02=10.1.1.138:6789/0}
election epoch 12, quorum 0,1 node01,node02
osdmap e41: 2 osds: 2 up, 2 in
flags sortbitwise,require_jewel_osds
pgmap v100: 64 pgs, 1 pools, 306 bytes data, 4 objects
69692 kB used, 30629 MB / 30697 MB avail
64 active+clean
An error occurred when using the ceph-fuse command.
sudo ceph-fuse -m 10.1.1.138:6789 /mnt/mycephfs/ --debug-auth=10 --debug-ms=10
ceph-fuse[4628]: starting ceph client
2017-11-02 08:57:22.905630 7f8cfdd60f00 -1 init, newargv = 0x55779de6af60 newargc=11
ceph-fuse[4628]: ceph mount failed with (95) Operation not supported
ceph-fuse[4626]: mount failed: (95) Operation not supported
I got an error saying "ceph mount failed with (95) Operation not supported"
I added an option "--auth-client-required=none"
sudo ceph-fuse -m 10.1.1.138:6789 /mnt/mycephfs/ --debug-auth=10 --debug-ms=10 --auth-client-required=none
ceph-fuse[4649]: starting ceph client
2017-11-02 09:03:47.501363 7f1239858f00 -1 init, newargv = 0x5597621eaf60 newargc=11
The behavior changed: there is no response here.
I got the error below if the ceph-fuse command is not used.
sudo mount -t ceph 10.1.1.138:6789:/ /mnt/mycephfs
can't read superblock
Somehow, it seems necessary to authenticate the client even with "auth supported = none".
In that case, how can I pass authentication from servers "C" or "D"?
Please let me know if there is a possible cause other than authentication.
I think you need more steps, such as creating the file system, so you should check your installation steps again for your purposes. Ceph has multiple components for its services, such as object storage, block storage, file system, and API, and each service requires its own configuration steps.
This installation guide is helpful for your case:
https://github.com/infn-bari-school/cloud-storage-tutorials/wiki/Ceph-cluster-installation-(jewel-on-CentOS)
If you want to build the Ceph file system for testing, you can build a small CephFS with the following installation steps.
I'll skip the details of the steps and CLI usage; you can get more information from the official documents.
Environment information
Ceph version: Jewel, 10.2.9
OS: CentOS 7.4
Prerequisites before installation of the Ceph file system.
This configuration requires 4 nodes:
ceph-admin node: deploy monitor, metadata server
ceph-osd0: osd service
ceph-osd1: osd service
ceph-osd2: osd service
Enabling NTP - all nodes
The OS user for deploying Ceph components requires privilege escalation settings (e.g. sudoers)
SSH public key configuration (directions: ceph-admin -> OSD nodes)
Installation of the ceph-deploy tool on the ceph-admin (admin) node:
# yum install -y ceph-deploy
Deploying the required Ceph components for the Ceph file system
Create the cluster on the ceph-admin (admin) node using a normal OS user (for deploying ceph components):
$ mkdir ./cluster
$ cd ./cluster
$ ceph-deploy new ceph-admin
Modify the ceph.conf in the cluster directory:
$ vim ceph.conf
[global]
..snip...
mon_initial_members = ceph-admin
mon_host = $MONITORSERVER_IP_OR_HOSTNAME
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
# the number of replicas for objects in the pool, default value is 3
osd pool default size = 3
public network = $YOUR_SERVICE_NETWORK_CIDR
Install the monitor and OSD services on the related nodes:
$ ceph-deploy install --release jewel ceph-admin ceph-osd0 ceph-osd1 ceph-osd2
Initiate the monitor service:
$ ceph-deploy mon create-initial
Create the OSD devices:
ceph-deploy osd create ceph-osd{0..2}:vdb
Adding the metadata server component for the Ceph file system service.
Add the metadata server (this service is required only for the Ceph file system):
ceph-deploy mds create ceph-admin
check the status
ceph mds stat
create the pools for cephFS
ceph osd pool create cephfs_data_pool 64
ceph osd pool create cephfs_meta_pool 64
Create the Ceph file system:
ceph fs new cephfs cephfs_meta_pool cephfs_data_pool
Mount the Ceph file system
The ceph-fuse package is required on the node doing the mounting.
Mount the CephFS:
ceph-fuse -m MONITOR_SERVER_IP_OR_HOSTNAME:PORT_NUMBER <LOCAL_MOUNTPOINT>
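For example, against the monitor deployed on ceph-admin above (the local mount point is arbitrary):
sudo mkdir -p /mnt/cephfs
sudo ceph-fuse -m ceph-admin:6789 /mnt/cephfs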
End...
I solved this problem by fixing three settings.
1.
I changed the auth settings in ceph.conf back as follows:
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
2.
The public network setting was wrong:
public network = 10.1.1.0/24
↓
public network = 10.0.0.0/8
My client IP address was 10.1.0.238, which is outside 10.1.1.0/24.
It was a stupid mistake.
3.
I changed the secret option to the secretfile option and everything was fine.
In this case, it failed:
sudo mount -t ceph 10.1.1.138:6789:/ /mnt/mycephfs -o name=client.admin,secret=`sudo ceph-authtool -p /etc/ceph/ceph.client.admin.keyring`
output:
mount error 1 = Operation not permitted
but in this case, it succeeded:
sudo mount -vvvv -t ceph 10.1.1.138:6789:/ /mnt/mycephfs -o name=admin,secretfile=admin.secret
output:
parsing options: rw,name=admin,secretfile=admin.secret
mount: error writing /etc/mtab: Invalid argument
※ The "Invalid argument" error seems to be safe to ignore.
Apparently, both are the same key:
sudo ceph-authtool -p /etc/ceph/ceph.client.admin.keyring
AQBd9f9ZSL46MBAAqwepJDC5tuIL/uYp0MXjCA==
cat admin.secret
AQBd9f9ZSL46MBAAqwepJDC5tuIL/uYp0MXjCA==
I don't know the reason, but I could mount using the secretfile option.
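For anyone reproducing this: the secret file is just the bare key extracted from the keyring, so a sketch of the working path (assuming the default admin keyring location) is:
sudo ceph-authtool -p /etc/ceph/ceph.client.admin.keyring | sudo tee admin.secret
sudo chmod 600 admin.secret
sudo mount -t ceph 10.1.1.138:6789:/ /mnt/mycephfs -o name=admin,secretfile=admin.secret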

Docker on RHEL 6 Cgroup mounting failing

I'm trying to get my head around something that's been working on a CentOS + Vagrant setup, but not on our provider's RHEL (Red Hat Enterprise Linux Server release 6.5 (Santiago)). A sudo service docker restart gives this:
Stopping docker: [ OK ]
Starting cgconfig service: Error: cannot mount cpuset to /cgroup/cpuset: Device or resource busy
/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf [FAILED]
Starting docker: [ OK ]
The service starts okay enough, but images cannot run; a mounting-failed error is shown when I try, and the startup log also gives a warning or two. Regarding the kernel warning, CentOS gives the same and has no problems, as EPEL should resolve this:
WARNING: You are running linux kernel version 2.6.32-431.17.1.el6.x86_64, which might be unstable running docker. Please upgrade your kernel to 3.8.0.
2014/08/07 08:58:29 docker daemon: 1.1.2 d84a070; execdriver: native; graphdriver:
[1233d0af] +job serveapi(unix:///var/run/docker.sock)
[1233d0af] +job initserver()
[1233d0af.initserver()] Creating server
2014/08/07 08:58:29 Listening for HTTP on unix (/var/run/docker.sock)
[1233d0af] +job init_networkdriver()
[1233d0af] -job init_networkdriver() = OK (0)
2014/08/07 08:58:29 WARNING: mountpoint not found
Has anyone had any success overcoming this problem, or should I throw in the towel and wait for the provider to update to RHEL 7?
I have the same issue.
(1) check cgconfig status
# /etc/init.d/cgconfig status
if it has stopped, restart it
# /etc/init.d/cgconfig restart
check cgconfig is running
(2) check cgconfig is on
# chkconfig --list cgconfig
cgconfig 0:off 1:off 2:off 3:off 4:off 5:off 6:off
if cgconfig is off, turn it on (see the sketch after these steps)
(3) If it still does not work, maybe some cgroup modules are missing: in the kernel .config, run make menuconfig, add those modules to the kernel, then recompile and reboot.
After that, it should be OK.
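A compact version of checks (1) and (2), including the enable command, on a SysV-init system:
/etc/init.d/cgconfig status || /etc/init.d/cgconfig restart   # (1) make sure the service is running
chkconfig --list cgconfig                                     # (2) see which runlevels it is enabled for
chkconfig cgconfig on                                         #     enable it if every runlevel shows "off"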
I ended up asking the same question at Google Groups and in the end finding a solution with some help. What worked for me was this:
umount cgroup
sudo service cgconfig start
The project of making Docker work was put on hold all the same: later there was a problem with network connectivity for the containers, which took too much time to solve, and I had to give up.
So I spent the whole day trying to rig Docker to work on my VPS and was running into this same error. Basically, what it came down to was the fact that OpenVZ didn't support Docker containers up until a couple of months ago. Specifically, this RHEL update:
https://openvz.org/Download/kernel/rhel6/042stab105.14
Assuming this is your problem, or some variation of it, the burden of solving it is on your host. They will need to follow these steps:
https://openvz.org/Docker_inside_CT
In my case
/etc/rc.d/rc.cgconfig start
was generating
Starting cgconfig service: Error: cannot mount cpu,cpuacct,memory to /cgroup/cpu_and_mem: Device or resource busy
/usr/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf
I had to use:
/etc/rc.d/rc.cgconfig restart
and it automagically unmounted and remounted the cgroups:
Stopping cgconfig service: Starting cgconfig service:
It seems like the cgconfig service is not running, so check it:
# /etc/init.d/cgconfig status
# mkdir -p /cgroup/cpuacct /cgroup/memory /cgroup/devices /cgroup/freezer /cgroup/net_cls /cgroup/blkio
# cat /etc/cgconfig.conf | tail | grep "=" | tr -d ';' | awk '{print "mount -t cgroup -o",$1,$1,$NF}' > cgroup_mount.sh
# sh ./cgroup_mount.sh
# /etc/init.d/cgconfig restart
# /etc/init.d/docker restart
This situation occurs when the kernel is booted with cgroup_disable=memory and /etc/cgconfig.conf contains memory = /cgroup/memory;
This causes only /cgroup/cpuset to be mounted instead of the full set.
Solution: either remove cgroup_disable=memory from your kernel boot options or comment out memory = /cgroup/memory; from cgconfig.conf.
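To confirm you are in this situation, a quick check of the boot options and the cgconfig.conf line in question:
grep -o 'cgroup_disable=[^ ]*' /proc/cmdline   # prints cgroup_disable=memory if the boot option is set
grep memory /etc/cgconfig.conf                 # shows the "memory = /cgroup/memory;" line to comment out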
The cgconfig service startup uses mount and umount, which requires an extra privilege bump from Docker.
See the --privileged=true flag in the docker run reference for more info.
I was able to overcome this issue by starting my container with:
docker run -it --privileged=true my-image.
Tested on CentOS 6 and CentOS 6.5.
