How to mount HostPath Volume in Kubernetes with SELinux - docker

I am trying to mount a hostPath volume into a Kubernetes Pod. An example of a hostPath volume specification is shown below, which is taken from the docs. I am deploying to hosts that are running RHEL 7 with SELinux enabled.
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /data
      # this field is optional
      type: Directory
When my Pod tries to read from a file that has been mounted from the underlying host, I get a "Permission Denied" error. When I run setenforce 0 to turn off SELinux, the error goes away and I can access the file. I get the same error when I bind mount a directory into a Docker container.
The issue is described here and, when using Docker, can be fixed by using the z or Z bind mount flag, described in the Docker docs here.
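For reference, with plain Docker that looks something like this (the image name and paths are just placeholders):
# :Z relabels the content with a private (per-container) label; :z uses a shared label
docker run -v /path/on/host:/path/in/container:Z some-image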
Whilst I can fix the issue by running
chcon -Rt svirt_sandbox_file_t /path/to/my/host/dir/to/mount
I see this as a nasty hack, as I would need to do this on every host in my Kubernetes cluster, and because the YAML spec for my deployment would then no longer be a complete description of what is needed to get it running correctly. Turning off SELinux is not an option.
I can see that Kubernetes mentions SELinux security contexts in the docs here, but I haven't been able to successfully mount a hostPath volume into a pod without getting the permission denied error.
What does the YAML need to look like to successfully enable a container to mount a HostPath volume from an underlying host that is running SELinux?
Update:
The file I am accessing is a CA certificate that has these labels:
system_u:object_r:cert_t:s0
When I use the following options:
securityContext:
  seLinuxOptions:
    level: "s0:c123,c456"
and then check the access control audit errors via ausearch -m avc -ts recent, I can see a permission denied error where the container has a level label of s0:c123,c456, so the level label is clearly being applied. I have since set the level to s0.
However, if I try to change the type label to cert_t, the container doesn't even start; it fails with this error:
container_linux.go:247: starting container process caused "process_linux.go:364: container init caused \"write /proc/self/task/1/attr/exec: invalid argument\""
I don't seem to be able to change the type label of the container.

Expanding on the answer from VAS as it is partially correct:
You can only specify the level portion of an SELinux label when relabeling the path that a hostPath volume points to. The relabeling is driven by the seLinuxOptions.level attribute specified in your securityContext.
However, attributes such as seLinuxOptions.type currently have no effect on volume relabeling. As of this writing, this is still an open issue within Kubernetes.
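If relabeling the host directory is acceptable after all, a more persistent variant of the chcon workaround from the question is to register the context with semanage so it survives a filesystem relabel (a sketch; /data is the path from the example, and container_file_t is the modern alias for svirt_sandbox_file_t):
# Record the desired context for the path and everything beneath it ...
sudo semanage fcontext -a -t container_file_t "/data(/.*)?"
# ... then apply it
sudo restorecon -Rv /data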

You can assign SELinux labels using seLinuxOptions:
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
    securityContext:
      seLinuxOptions: # container-level; may not have the desired effect
        level: "s0:c123,c456"
  securityContext:
    seLinuxOptions: # pod-level
      level: "s0:c123,c456"
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /data
      # this field is optional
      type: Directory
According to the documentation:
seLinuxOptions: Volumes that support SELinux labeling are relabeled to be accessible by the label specified under seLinuxOptions. Usually you only need to set the level section. This sets the Multi-Category Security (MCS) label given to all Containers in the Pod as well as the Volumes.
Thanks to Phil for pointing out that this appears to work only in Pod.spec.securityContext, according to the issue comment.
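Once the pod is running you can check on the host whether the relabeling actually happened (a quick verification step; /data is the path from the example):
# Look for the MCS level from seLinuxOptions (e.g. s0:c123,c456) in the output
ls -dZ /data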

You could try with full permissions:
...
  image: k8s.gcr.io/test-webserver
  securityContext:
    privileged: true
...

Writing an SELinux policy module can solve this problem. Reference article:
https://zhimin-wen.medium.com/selinux-policy-for-openshift-containers-40baa1c86aa5
In addition, you can refer to the SELinux object classes and permissions reference to control create, delete, and modify access to the mounted directory:
https://selinuxproject.org/page/ObjectClassesPerms
My setting: if a directory's SELinux label is unconfined_u:object_r:kubernetes_file_t:s0, you can define an SELinux policy module like this:
module myapp 1.0;

require {
    type kubernetes_file_t;
    type container_t;
    class file { create open read unlink write getattr execute setattr link };
    class dir { add_name create read remove_name write };
}

#============= container_t ==============
#!!!! This avc is allowed in the current policy
allow container_t kubernetes_file_t:dir { add_name create read remove_name write };
#!!!! This avc is allowed in the current policy
allow container_t kubernetes_file_t:file { create open read unlink write getattr execute setattr link };
Then run these commands on the node:
sudo checkmodule -M -m -o myapp.mod myapp.te
sudo semodule_package -o myapp.pp -m myapp.mod
sudo semodule -i myapp.pp
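If you prefer to generate the module from the actual denials instead of writing the .te file by hand, audit2allow can do it for you (the module name myapp is just an example):
# Collect recent AVC denials and turn them into a loadable policy module
sudo ausearch -m avc -ts recent | audit2allow -M myapp
sudo semodule -i myapp.pp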

Related

Waiting for volume to be created either by external provisioner "pd.csi.storage.gke.io" or manually created by system administrator. Windows minikube

I created a PVC and then tried to expand the size of the volume claim.
Volume expansion is set to true as below:
minikube kubectl -- get sc
NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
fast                 kubernetes.io/gce-pd       Delete          Immediate           true                   55m
standard (default)   k8s.io/minikube-hostpath   Delete          Immediate           true                   156m
I patched the PVC using kubectl edit.
When I described the PVC I get the below message:
Normal ExternalProvisioning 93s (x177 over 61m) persistent volume-controller waiting for a volume to be created, either by external provisioner "pd.csi.storage.gke.io" or manually created by the system administrator.
Should I create a volume here? Please Help.
Please refer to this code to reproduce the issue.
It seems you are creating a PVC using a GKE provisioner. In this case you don't need to create a PV, only a PVC. Example:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
Google only allows ReadWriteOnce and ReadOnlyMany for this type of dynamic provisioning at the moment. Basically, after applying this config you still need to create a pod that consumes the volume; only then are the creation and binding completed.
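For completeness, a minimal pod that consumes the claim above (a sketch; the image and mount path are arbitrary, the claim name matches the PVC example):
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer
spec:
  containers:
  - name: test-container
    image: k8s.gcr.io/test-webserver
    volumeMounts:
    - mountPath: /data          # arbitrary mount path inside the container
      name: pvc-volume
  volumes:
  - name: pvc-volume
    persistentVolumeClaim:
      claimName: volume-name    # must match the PVC name from the example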

Why can't a mountpoint created in the container be seen on the host?

I want to do some device initialization using a DaemonSet (a K8s resource).
The device has actually been formatted (inside the container) and mounted (inside the container) successfully at the container path /hostmnt/lvpmem/, which is mapped from the host path /mnt/.
mountpoint works fine in the container:
[root@driver-hm4ll /]# mountpoint /hostmnt/lvpmem/
/hostmnt/lvpmem/ is a mountpoint
but mountpoint reports otherwise in the host env:
[root@host ~]# mountpoint /mnt/lvpmem/
/mnt/lvpmem/ is not a mountpoint
Also, the data I write in the container under /hostmnt/lvpmem/ can't be seen under /mnt/lvpmem/ in the host env.
How can I mount the device so that both host and container can see it?
Also, if the container is destroyed, is the mount relation destroyed too? I have no idea how to unmount the device in the host env if the mount relation can't be seen there.
Some open-source projects use nsenter in the container to run such format/mount commands. Does that help?
Add /mnt as a volume to the pod at directory /hostmnt, so that whatever is written under the /hostmnt directory (inside the container) will be seen on the host under the /mnt directory.
Example of a pod with hostPath:
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /hostmnt
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /mnt
      # this field is optional
      type: Directory
The mount points a container sees are just a snapshot taken when the container starts. My solution:
1. Enable hostPID for the pod and use nsenter to perform the mount on the host (see the sketch after this list).
2. Restart the container itself (e.g. by calling exit); the mount point can then be seen in the new container.
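A minimal sketch of option 1 (assumptions: the device /dev/pmem0 and target /mnt/lvpmem are illustrative, and the image must ship nsenter from util-linux):
apiVersion: v1
kind: Pod
metadata:
  name: host-mount-helper
spec:
  hostPID: true                     # share the host's PID namespace so PID 1 is the host's init
  containers:
  - name: mounter
    image: alpine                   # assumption: any image providing nsenter (util-linux)
    securityContext:
      privileged: true              # needed to enter the host's mount namespace
    command: ["sh", "-c"]
    args:
    # Enter the mount namespace of PID 1 (the host) and run mount there,
    # so the mount point is visible on the host and to other containers.
    - nsenter --target 1 --mount -- mount /dev/pmem0 /mnt/lvpmem && sleep infinity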

How to grant a non-root user write permissions on a Kubernetes SMB flexVolume mount?

Please help! I've been struggling with this for a few days now...
I'm trying to write to a mount in a Kubernetes pod with a non-root user and getting access denied.
In the Kubernetes manifest, I am mounting a windows shared folder like this:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: centos-deployment
spec:
  template:
    spec:
      volumes:
      - name: windows-mount
        flexVolume:
          driver: microsoft.com/smb
          secretRef:
            name: centos-credentials
          options:
            mountOptions: 'cifs,dir_mode=0777,file_mode=0777'
            source: //100.200.300.400/windows-share
      containers:
      - name: centos-pod
        image: 'centos:latest'
        command:
        - sh
        - '-c'
        - sleep 1000000
        volumeMounts:
        - name: windows-mount
          mountPath: /var/windows-share
and in the Dockerfile I'm switching to the application user like so:
# Drop from 'root' user to 'nobody' (user with no privileges).
USER nobody:nobody
But now, the mount is owned by "root". The "root" user can write to the path but the user "nobody" cannot.
I tried an init container to run chmod -R 775 on the folder, but it looks like even the root user cannot change the permissions or ownership of the mount (the umask command returned 022).
If I exec into the pod, I can see the mount is set with 755 permissions instead of 777:
"file_mode=0755,dir_mode=0755"
[root@centos-deployment-5d46bd8b89-tzghs /]# mount | grep windows-share
//100.200.300.400/windows-share on /var/windows-share type cifs (rw,relatime,vers=default,cache=strict,username=*******,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=100.200.300.400,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
Any idea how to mount a Windows share so that it is writable by non-root user?
Thanks! Any help will be very much appreciated.
Full reference: https://linux.die.net/man/8/mount.cifs
Try playing with the mountOptions. For example:
uid=arg - sets the uid that will own all files or directories on the mounted filesystem when the server does not provide ownership information. It may be specified as either a username or a numeric uid. When not specified, the default is uid 0. The mount.cifs helper must be at version 1.10 or higher to support specifying the uid in non-numeric form. See the section on FILE AND DIRECTORY OWNERSHIP AND PERMISSIONS below for more information.
volumes:
- name: windows-mount
  ...
  options:
    mountOptions: 'cifs,uid=<YOUR_USERNAME>,dir_mode=0777,file_mode=0777'
If this doesn't work, you can also try adding the noperm mount option.
noperm - Client does not do permission checks. This can expose files on this mount to access by other users on the local client system. It is typically only needed when the server supports the CIFS Unix Extensions but the UIDs/GIDs on the client and server system do not match closely enough to allow access by the user doing the mount. Note that this does not affect the normal ACL check on the target machine done by the server software (of the server ACL against the user name provided at mount time).
volumes:
- name: windows-mount
  ...
  options:
    mountOptions: 'cifs,dir_mode=0777,file_mode=0777,noperm'
About using chmod/chown on CIFS
The core CIFS protocol does not provide unix ownership information or mode for files and directories. Because of this, files and directories will generally appear to be owned by whatever values the uid= or gid= options are set, and will have permissions set to the default file_mode and dir_mode for the mount. Attempting to change these values via chmod/chown will return success but have no effect.
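Putting that together, matching uid/gid to the user the container runs as is usually the cleanest fix (a sketch; the numeric ids below are an assumption, check the real ones with id nobody inside the container):
options:
  # uid/gid must match the container user ('nobody' here); 65534 is only an
  # assumption - verify with `id nobody` in the running container
  mountOptions: 'cifs,uid=65534,gid=65534,dir_mode=0775,file_mode=0775'
  source: //100.200.300.400/windows-share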

How to expose low-numbered ports in the kubernetes mini-cluster that comes with Docker Desktop

I'm using the kubernetes cluster built in to Docker Desktop to develop my application.
I would like to expose services inside the cluster as ports on localhost.
I can do so using kubectl expose deployment foobar --type=NodePort --port=30088, which creates a service like this:
apiVersion: v1
kind: Service
metadata:
  labels:
    role: web
  name: foobar
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30088
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    role: web
  type: NodePort
But it only works for very high numbered ports. If I try something lower I get:
The Service "kafka-external" is invalid: spec.ports[0].nodePort: Invalid value: 9092: provided port is not in the valid range. The range of valid ports is 30000-32767
It seems there is a kubernetes apiserver setting called ServiceNodePortRange which would allow me to override this restriction, but I can't figure out how to set it on Docker's builtin cluster.
So my question is: how do I expose a specific, low-numbered port (like 9092) on Docker's kubernetes cluster? Is there a way to override that setting? Or a better way to expose the service than NodePort?
NodePort is intended to be a building block for load-balancers or other ingress modes. This means it didn't matter which port you got as long as you got one. This makes it a little clunky to use directly - you can't have just any port. You can change the port range, but you run the risk of conflicts with real things running on your nodes and with any pod HostPorts.
The default range is indeed 30000-32767, but it can be changed by setting the --service-node-port-range flag: update the file /etc/kubernetes/manifests/kube-apiserver.yaml and add the line --service-node-port-range=xxxxx-yyyyy.
The kube-apiserver.yaml file lives at /etc/kubernetes/manifests/kube-apiserver.yaml, not inside the kube-apiserver container/pod but on the master (here, the Docker Desktop VM) itself.
Log in to the Docker VM:
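Two commonly used ways to get a shell in the Docker Desktop VM (these are assumptions about your setup rather than official steps; the screen path is the macOS one):
# Option 1 (macOS): attach screen to the VM's tty
screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty

# Option 2: run a privileged container that nsenters into the VM
docker run -it --rm --privileged --pid=host justincormack/nsenter1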
Add the following line to the pod spec:
spec:
  containers:
  - command:
    - kube-apiserver
    ...
    - --service-node-port-range=xxxxx-yyyyy # <-- add this line
    ...
Save and exit. The kube-apiserver pod will be restarted with the new parameters.
Exit the Docker VM (for screen: Ctrl-a, k; for the container: Ctrl-d).
Check the results:
$ kubectl get pod kube-apiserver-docker-desktop -o yaml -n kube-system | less
Take a look: service-pod-range, changing pod range, changing-nodeport-range.

container labels in kubernetes

I am building my docker image with jenkins using:
docker build --build-arg VCS_REF=$GIT_COMMIT \
--build-arg BUILD_DATE=`date -u +"%Y-%m-%dT%H:%M:%SZ"` \
--build-arg BUILD_NUMBER=$BUILD_NUMBER -t $IMAGE_NAME\
I was using Docker but I am migrating to k8s.
With docker I could access those labels via:
docker inspect --format "{{ index .Config.Labels \"$label\"}}" $container
How can I access those labels with Kubernetes ?
I am aware that I could add those labels in .metadata.labels of my YAML files, but I don't like it that much because:
- it ties that information to the deployment and not to the container itself
- it can be modified anytime
...
kubectl describe pods
Thank you
Kubernetes doesn't expose that data. If it did, it would be part of the PodStatus API object (and its embedded ContainerStatus), which is one part of the Pod data that would get dumped out by kubectl get pod deployment-name-12345-abcde -o yaml.
You might consider encoding some of that data in the Docker image tag; for instance, if the CI system is building a tagged commit then use the source control tag name as the image tag, otherwise use a commit hash or sequence number. Another typical path is to use a deployment manager like Helm as the principal source of truth about deployments, and if you do that there can be a path from your CD system to Helm to Kubernetes that can pass along labels or annotations. You can also often set up software to know its own build date and source control commit ID at build time, and then expose that information via an informational-only API (like an HTTP GET /_version call or some such).
I'll add another option.
I would suggest reading about the Recommended Labels by K8S:
Key                            Description
app.kubernetes.io/name         The name of the application
app.kubernetes.io/instance     A unique name identifying the instance of an application
app.kubernetes.io/version      The current version of the application (e.g., a semantic version, revision hash, etc.)
app.kubernetes.io/component    The component within the architecture
app.kubernetes.io/part-of      The name of a higher level application this one is part of
app.kubernetes.io/managed-by   The tool being used to manage the operation of an application
So you can use the labels to describe a pod:
apiVersion: v1 # (use apps/v1 for a Deployment)
kind: Pod      # Or via Deployment
metadata:
  labels:
    app.kubernetes.io/name: wordpress
    app.kubernetes.io/instance: wordpress-abcxzy
    app.kubernetes.io/version: "4.9.4"
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/component: server
    app.kubernetes.io/part-of: wordpress
And use the downward api (which works in a similar way to reflection in programming languages).
There are two ways to expose Pod and Container fields to a running Container:
1) Environment variables (a short sketch follows this list).
2) Volume files.
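For the environment-variable route, here is a minimal sketch (the pod name, variable names, and the specific label key being read are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: downwardapi-env-example
  labels:
    app.kubernetes.io/version: "4.9.4"
spec:
  containers:
  - name: client-container
    image: k8s.gcr.io/busybox
    command: ["sh", "-c", "echo version=$APP_VERSION pod=$MY_POD_NAME; sleep 3600"]
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: APP_VERSION
      valueFrom:
        fieldRef:
          # a single label can be read by key
          fieldPath: metadata.labels['app.kubernetes.io/version']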
Below is an example of using volume files:
apiVersion: v1
kind: Pod
metadata:
  name: kubernetes-downwardapi-volume-example
  labels:
    version: 4.5.6
    component: database
    part-of: etl-engine
  annotations:
    build: two
    builder: john-doe
spec:
  containers:
  - name: client-container
    image: k8s.gcr.io/busybox
    command: ["sh", "-c"]
    args: # < ------ We're using the mounted volumes inside the container
    - while true; do
        if [[ -e /etc/podinfo/labels ]]; then
          echo -en '\n\n'; cat /etc/podinfo/labels; fi;
        if [[ -e /etc/podinfo/annotations ]]; then
          echo -en '\n\n'; cat /etc/podinfo/annotations; fi;
        sleep 5;
      done;
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
  volumes: # < -------- We're mounting in our example the pod's labels and annotations
  - name: podinfo
    downwardAPI:
      items:
      - path: "labels"
        fieldRef:
          fieldPath: metadata.labels
      - path: "annotations"
        fieldRef:
          fieldPath: metadata.annotations
Notice that in the example we accessed the labels and annotations that were passed and mounted to the /etc/podinfo path.
Besides labels and annotations, the downward API exposes multiple additional fields, such as:
The pod's IP address.
The pod's service account name.
The node's name and IP.
A Container's CPU limit, CPU request, memory limit, memory request.
See full list in here.
(*) A nice blog discussing the downward API.
(**) You can view all your pods' labels with:
$ kubectl get pods --show-labels
NAME                                     ...   LABELS
my-app-xxx-aaa                                 pod-template-hash=...,run=my-app
my-app-xxx-bbb                                 pod-template-hash=...,run=my-app
my-app-xxx-ccc                                 pod-template-hash=...,run=my-app
fluentd-8ft5r                                  app=fluentd,controller-revision-hash=...,pod-template-generation=2
fluentd-fl459                                  app=fluentd,controller-revision-hash=...,pod-template-generation=2
kibana-xyz-adty4f                              app=kibana,pod-template-hash=...
recurrent-tasks-executor-xaybyzr-13456         pod-template-hash=...,run=recurrent-tasks-executor
serviceproxy-1356yh6-2mkrw                     app=serviceproxy,pod-template-hash=...
Or viewing only specific label with $ kubectl get pods -L <label_name>.
