Serving assets using nginx, Kubernetes and Docker

There is a need to serve assets such as images/video/blobs that are going to be used by a website and a set of mobile apps. I am thinking of running the following setup:
Run nginx in a Docker container to serve the assets.
Run a sidecar container with a custom app which will pull these assets from a remote location and put them into 'local storage'. Nginx serves assets from this local storage, and the custom app keeps it up to date.
To run this setup I need to make sure that the pods that run these two containers have local storage which is accessible from both containers. To achieve this, I am thinking of restricting these pods to a set of nodes in the Kubernetes cluster and provisioning local persistent volumes on these nodes. Does this make sense?

To achieve this, I am thinking of restricting these pods to a set of nodes in the Kubernetes cluster and provisioning local persistent volumes on these nodes.
Why go for a Persistent Volume when the sidecar container can pull the assets at any time from the remote location? Create a volume with an EmptyDirVolumeSource and mount it in both containers in the Pod. The sidecar container gets write access to the volume and the main container gets read-only access.
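A minimal Pod sketch of that layout, assuming placeholder names and images (only the emptyDir wiring matters here):

apiVersion: v1
kind: Pod
metadata:
  name: asset-server                  # hypothetical name
spec:
  volumes:
    - name: assets
      emptyDir: {}                    # shared scratch space, lives as long as the Pod
  containers:
    - name: nginx
      image: nginx:1.25
      volumeMounts:
        - name: assets
          mountPath: /usr/share/nginx/html
          readOnly: true              # nginx only reads
    - name: asset-sync                # sidecar pulling assets from the remote location
      image: registry.example.com/asset-sync:latest   # placeholder image
      volumeMounts:
        - name: assets
          mountPath: /data            # sidecar writes the assets here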

From the description of your issue, it looks like a distributed file system might be what you are looking for.
For example, CephFS and GlusterFS are supported in Kubernetes (Volumes, PersistentVolumes) and have a good set of capabilities, such as concurrent access (both) and PVC expansion (GlusterFS):
cephfs
A cephfs volume allows an existing CephFS volume to be mounted into your Pod. Unlike emptyDir, which is erased when a Pod is removed, the contents of a cephfs volume are preserved and the volume is merely unmounted. This means that a CephFS volume can be pre-populated with data, and that data can be “handed off” between Pods. CephFS can be mounted by multiple writers simultaneously.
Important: You must have your own Ceph server running with the share exported before you can use it.
See the CephFS example for more details.
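For reference, an inline cephfs volume in a Pod spec looks roughly like this (the monitor address and Secret name below are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: cephfs-demo                   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: cephfs-vol
          mountPath: /mnt/cephfs
  volumes:
    - name: cephfs-vol
      cephfs:
        monitors:
          - "10.16.154.78:6789"       # placeholder Ceph monitor address
        user: admin
        secretRef:
          name: ceph-secret           # Secret holding the Ceph client key
        readOnly: true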
glusterfs
A glusterfs volume allows a Glusterfs (an open source networked filesystem) volume to be mounted into your Pod. Unlike emptyDir, which is erased when a Pod is removed, the contents of a glusterfs volume are preserved and the volume is merely unmounted. This means that a glusterfs volume can be pre-populated with data, and that data can be “handed off” between Pods. GlusterFS can be mounted by multiple writers simultaneously.
Important: You must have your own GlusterFS installation running before you can use it.
See the GlusterFS example for more details.
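And a GlusterFS-backed PersistentVolume can be declared along these lines (the Endpoints object, Gluster volume name and size are assumptions; ReadWriteMany is what allows the concurrent access mentioned above):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-pv                    # hypothetical name
spec:
  capacity:
    storage: 50Gi                     # example size
  accessModes:
    - ReadWriteMany                   # many writers at once
  glusterfs:
    endpoints: glusterfs-cluster      # Endpoints object listing the Gluster nodes
    path: assets-volume               # name of the Gluster volume
    readOnly: false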
For more information about these topics check out the following links:
Using Existing Ceph Cluster for Kubernetes Persistent Storage
GlusterFS Native Storage Service for Kubernetes
TWO DAYS OF PAIN OR HOW I DEPLOYED GLUSTERFS CLUSTER TO KUBERNETES


Binding of volume to Docker Container via Kubernetes

I'm new to the area, sorry if my question sounds dumb.
What I'm looking for: I have a pod of containers, where one of the containers (Alpine-based) should read/write from/to a customer-provided file. I don't want to limit the customer on how to provide the file (or at least I want to support the most common ways).
Also, the file's size might sometimes be huge (not sure if that requirement makes any difference).
The more flexibility here the better.
From my initial search I found there are multiple ways to bind a volume/directory to a Docker container:
Docker bind mount - sharing a directory between host and container (nice to have)
Add a Docker volume to the pod (must have)
Mount an AWS S3 bucket into the Docker container (must have)
Are there any other ways of supplying file access to the container? Let's say from a remote machine via SFTP access?
But the main question - is it all possible to configure via Kubernetes?
Ideally in the same yaml file that starts the containers?
Any hints/examples are very welcome!
It surely is possible!
Just as there are volume mounts for a Docker container, there are volume mounts in Kubernetes as well.
This is achieved using a PersistentVolumeClaim (PVC). These are storage resources whose lifecycle is independent of the Pod, used to store the data behind the volume mount.
Understand more about the concept here: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
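A minimal sketch of that pattern, with made-up names and sizes (how the customer's file actually lands on the volume, e.g. an S3 sync job or an SFTP copy, is up to whatever process fills it):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: customer-files                # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                   # example size
---
apiVersion: v1
kind: Pod
metadata:
  name: file-reader
spec:
  containers:
    - name: app
      image: alpine:3.19
      command: ["sleep", "3600"]
      volumeMounts:
        - name: customer-files
          mountPath: /data            # the container reads/writes the file here
  volumes:
    - name: customer-files
      persistentVolumeClaim:
        claimName: customer-files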

Is it practical to mount a block storage to multiple VPS running Docker Swarm for shared storage?

Looking at multiple options to implement shared storage for a Docker Swarm, I can see most of them require a special Docker plugin:
sshFs
CephFS
glusterFS
S3
and others
... but one thing that is not mentioned anywhere is simply mounting a typical block storage volume on all VPS nodes running the Docker Swarm. Is this option impractical and thus not mentioned on the Internet? Am I missing something?
My idea is as follows:
Create a typical block storage volume (e.g. one offered by DigitalOcean or Vultr).
Mount it into your VPS filesystem.
Mount a folder from that block storage as a volume in the Docker container / Docker worker using the "local" driver.
This sounds the simplest and most obvious to me. Why are people using more complicated setups like sshFs, CephFS, etc.? And most importantly, is the implementation I described viable, and if so, what are the drawbacks of doing it this way?
The principal advantage of using a volume plugin over mounted storage comes down to the ability to create storage volumes dynamically, and associate them with namespaces.
i.e. with docker managing storage for a volume via a "volumes: data:" directive in a compose file, a volume will be created for each named stack that is deployed.
Using the local driver and mounts, you, the swarm admin, now need to ensure that no two stacks are trying to use /mnt/data.
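To make that contrast concrete, a rough compose sketch (stack, service and path names are made up): the first volume is created and namespaced per stack by Docker, while the second pins every stack that uses it to the same path on the mounted block storage.

services:
  db:
    image: mysql:8.0
    volumes:
      - data:/var/lib/mysql              # named volume: Docker creates one per deployed stack

volumes:
  data: {}                               # docker-managed, namespaced by stack name
  # Alternative: a "local" volume bound to the mounted block storage.
  # Every stack pointing at this device path shares (and can clobber) the same data.
  shared-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/blockstorage/data     # placeholder mount point of the block device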
Once you pass that hurdle, some platforms have limits on the number of hosts a block storage volume can be mounted onto.
There's also the security angle to consider - with the volume mapped like that, a compromise of any service on any host can potentially expose all your data to an attacker, whereas a volume plugin will expose only the data mounted into that container.
All that said - Docker Swarm is awesome and the current plugin space is lacking - if mounting block storage is what it takes to get a workable storage solution, I say do it. Hopefully the CSI support will be ready before year end, however.

Share volume in docker swarm for many nodes

I'm facing a big challenge. I'm trying to run my app on 2 VPSes in Docker Swarm. Containers that use volumes should use a volume shared between the nodes.
My solutions so far:
Use the GlusterFS plugin and mount the volume on every node using NFS. NFS introduces a single point of failure, so when something goes wrong my data is gone. (That doesn't look good; maybe I'm wrong.)
Use Azure Storage - store data as blobs (Azure Data Lake Storage Gen2). But my main problem is how to connect to Azure Storage using docker-compose.yaml. I would have to declare the volume in every service that uses it and declare it in the volumes section, and I have no idea how to do that.
The Docker documentation about it is gone. It should be here: https://docs.docker.com/docker-for-azure/persistent-data-volumes/.
Another option is to use https://hub.docker.com/r/docker4x/cloudstor/tags?page=1&ordering=last_updated but the last update was 2 years ago, so it's probably not supported anymore.
Do I have any other options, and which way of sharing a volume between nodes is the best solution?
There are a number of ways of dealing with creating persistent volumes in docker swarm, none of them particularly satisfactory:
First up, a simple way is to use nfs, glusterfs, iscsi, or vmware to multimount the same SAN storage volume onto each docker swarm node. Services just mount volumes as /mnt/volumes/my-sql-workload
On the one hand it's really simple, but on the other hand there is literally no access control, and you can easily accidentally deploy services pointing at each other's data.
Next, commercial docker volume plugins for SANs. If you are lucky and possess a Pure Storage, NetApp or other such SAN array, some of them still offer docker volume plugins. Trident for example if you have a NetApp.
Third, if you are in the cloud, the legacy swarm offerings on Azure and AWS included a built-in "cloudstor" volume driver, but you need to dig really deep to find it in their legacy offerings.
Fourth, there are a number of open-source or free volume plugins that will mount volumes from nfs, glusterfs or other sources. But most are abandoned or very quiet. The most active I know of is marcelo-ochoa/docker-volume-plugins.
I wasn't particularly happy with how those plugins mounted pre-existing volumes but made operations like docker volume create hard, so I made my own. But really:
Swarm Cluster Volume Support with CSI Plugins is hopefully going to drop in 2021¹, which hopefully is a solid rebuttal to all the problems above.
¹It's now 2022 and the next version of Docker has not yet gone live with CSI support. Still we wait.
In my opinion, a good solution could be to create a GlusterFS cluster, configure a single volume and mount it in every Docker Swarm node (i.e. in /mnt/swarm-storage).
Then, for every Container that needs persistent storage, bind-mount a subdirectory of the GlusterFS volume inside the container.
Example:
services:
  my-container:
    ...
    volumes:
      - type: bind
        source: /mnt/swarm-storage/my-container
        target: /a/path/inside/the/container
This way, every node shares the same storage, so a given container can be started on any cluster node.
You don't need any Docker plugin for a particular storage driver, because the distributed storage is transparent to the Swarm cluster.
Lastly, GlusterFS is a distributed filesystem, designed to not have a single point of failure, and you can cluster it on as many nodes as you like (contrary to NFS).

Same data volume attached to multiple containers, even on different hosts

I'm able to bind a Docker volume to a specific container in a swarm thanks to Flocker, but now I want multiple replicas of my server (for load balancing), and so I'm looking for a way to bind the same data volume to multiple replicas of a Docker service.
In the Flocker documentation I found the following:
Can more than one container access the same volume? Flocker works by creating a 1 to 1 relationship of a volume and a container. This means you can have multiple volumes for one container, and those volumes will always follow that container.
Flocker attaches volumes to the individual agent host (docker host) and this can only be one host at a time because Flocker attaches block-based storage. Nodes on different hosts cannot access the same volume, because it can only be attached to one node at a time.
If multiple containers on the same host want to use the same volume, they can, but be careful because multiple containers accessing the same storage volume can cause corruption.
Can I attach a single volume to multiple hosts? Not currently; support for multi-attach backends like GCE in read-only mode, or NFS-like backends, or distributed filesystems like GlusterFS would need to be integrated. Flocker focuses mainly on block-storage use cases that attach a volume to a single node at a time.
So I think it is not possible to do what I want with Flocker.
I could use a different orchestrator (Kubernetes) if that would help, even though I have no experience with it.
I would rather not use NAS/NFS or any distributed filesystem.
Any suggestions?
Thanks in advance.
In k8s, you can mount a volume into different Pods at the same time if the technology that backs the volume supports shared access.
As mentioned in Kubernetes Persistent Volumes:
Access Modes
A PersistentVolume can be mounted on a host in any way supported by the resource provider. As shown below, providers will have different capabilities and each PV’s access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV’s capabilities.
The access modes are:
ReadWriteOnce – the volume can be mounted as read-write by a single node
ReadOnlyMany – the volume can be mounted read-only by many nodes
ReadWriteMany – the volume can be mounted as read-write by many nodes
Types of volumes that support ReadOnlyMany mode:
AzureFile
CephFS
FC
FlexVolume
GCEPersistentDisk
Glusterfs
iSCSI
Quobyte
NFS
RBD
ScaleIO
Types of volumes that support ReadWriteMany mode:
AzureFile
CephFS
Glusterfs
Quobyte
RBD
PortworxVolume
VsphereVolume (works when Pods are collocated)
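For illustration, a claim requesting shared read-write access looks something like this (name and size are placeholders; it only binds if a volume type from the ReadWriteMany list above, such as NFS, CephFS or Glusterfs, backs it):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data                   # hypothetical name
spec:
  accessModes:
    - ReadWriteMany                   # mountable read-write by many nodes
  resources:
    requests:
      storage: 20Gi                   # example size
# The same claim can then be referenced from several Pods:
#   volumes:
#     - name: shared
#       persistentVolumeClaim:
#         claimName: shared-data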

Bind mount volume between host and pod containers in Kubernetes

I have a legacy application that stores some config/stats in a directory on the OS partition (e.g. /config/), and I am trying to run this as a stateful container in a Kubernetes cluster.
I am able to run it as a container, but due to the inherently ephemeral nature of containers, whatever data my container writes to the OS partition directory /config/ is lost when the container goes down or is destroyed.
I have the Kubernetes deployment file written in such a way that the container is brought back to life, albeit as a new instance either on same host or on another host, but this new container has no access to the data written by previous instance of the container.
If it were a Docker container, I could get this working using bind mounts, so that whatever data the container writes to its OS partition directory is saved in a host directory, and any new instance would have access to the data written by the previous instance.
But I could not find any alternative for this in Kubernetes.
I could use hostPath provisioning, but hostPath provisioning right now works only for single-node Kubernetes clusters.
Is there a way I could get this working in a multi-node Kubernetes cluster? Any option other than hostPath provisioning? I can get the containers to talk to each other and sync the data between nodes, but how do we bind-mount a host directory into a container?
Thanks for your help in advance!
This is what you have Volumes and VolumeMounts for in your Pod definition. Your lead about hostPath is the right direction, but you need a different volume type when you host data in a cluster (as you have seen yourself).
Take a look at https://kubernetes.io/docs/concepts/storage/volumes/ for a list of supported storage backends. Depending on your infrastructure you might find one that suits your needs, or you might need to actually create a backing service for one (i.e. an NFS server, Gluster, Ceph and so on).
If you want to add another abstraction layer to make a universal manifest that can work in different environments (i.e. with storage based on the cloud provider, or just manually provisioned depending on particular needs), you will want to get familiar with PV and PVC (https://kubernetes.io/docs/concepts/storage/persistent-volumes/), but as I said they are essentially an abstraction over the basic volumes, so you need to crack that first issue anyway.
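As a hedged illustration of that PV/PVC pattern with an NFS backing service (server address, export path and sizes are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: config-pv                     # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.5                  # placeholder NFS server reachable from all nodes
    path: /exports/config
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: config-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
# In the Deployment's Pod template, mount the claim at the legacy path:
#   volumeMounts:
#     - name: config
#       mountPath: /config
#   volumes:
#     - name: config
#       persistentVolumeClaim:
#         claimName: config-pvc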
