GCP: how to access cloud storage bucket from a VM instance - docker

I'm trying to deploy and run a docker image in a GCP VM instance.
I need it to access a certain Cloud Storage Bucket (read and write).
How do I mount a bucket inside the VM? How do I mount a bucket inside the Docker container running in my VM?
I've been reading the Google Cloud documentation for a while, but I'm still confused. All the guides show how to access a bucket from a local machine, not how to mount it to a VM.
https://cloud.google.com/storage/docs/quickstart-gsutil
I found something about FUSE, but it looks overly complicated for just mounting a single bucket to the VM filesystem.

Google Cloud Storage is an object storage API, not a filesystem. As a result, it isn't really designed to be "mounted" within a VM. It is designed to be highly durable and to scale to extraordinarily large objects (and large numbers of objects).
Though you can use gcsfuse to mount it as a filesystem, that method has pretty significant drawbacks. For example, operations that are trivial on a normal filesystem can require a surprisingly large number of object-store operations.
Likewise, there are many surprising behaviors that are a result of the fact that it is an object store. For example, you can't edit objects -- they are immutable. To give the illusion of writing to the middle of an object, the object is, in effect, deleted and recreated whenever a call to close() or fsync() happens.
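That said, if you decide gcsfuse is good enough for your case, the mount itself is short. A minimal sketch, assuming a bucket named my-bucket (a placeholder), gcsfuse already installed per its own docs, and a VM service account/access scope that allows read/write on the bucket:
mkdir -p /mnt/my-bucket
gcsfuse my-bucket /mnt/my-bucket        # mount the bucket at /mnt/my-bucket
ls /mnt/my-bucket                       # objects now appear as files
fusermount -u /mnt/my-bucket            # unmount when done
To expose that mount inside a container you would either bind-mount /mnt/my-bucket into the container, or run gcsfuse inside the container itself, which typically needs --device /dev/fuse and --cap-add SYS_ADMIN.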
The best way to use GCS is to design your application to use the API (or the S3-compatible API) directly. That way the semantics are well understood by the application, and you can optimize for them to get better performance and control your costs. Thus, to access it from your docker container, ensure the container has a way to authenticate to GCS (either through the credentials on the instance, or by deploying a key for a service account with the necessary permissions on the bucket), then have the application call the API directly.
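For example, a rough sketch of the service-account-key variant (sa-key.json, /secrets, and gcr.io/my-project/my-app are placeholders; on a GCE VM you can usually skip the key entirely and let the client library pick up the instance's service account from the metadata server):
docker run -d \
  -v /path/to/sa-key.json:/secrets/sa-key.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json \
  gcr.io/my-project/my-app
GOOGLE_APPLICATION_CREDENTIALS is the standard environment variable the Google Cloud client libraries check for application default credentials.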
Finally, if what you need is actually a filesystem, and not specifically GCS, Google Cloud does offer at least two other mountable-filesystem options designed for that use case:
Persistent Disk, which is the baseline block storage that you get with a VM; you can attach many of these devices to a single VM. However, you can't mount them read/write to multiple VMs at once -- if you need to mount a disk to multiple VMs, it must be read-only for all the instances it is attached to.
Cloud Filestore is a managed service that provides an NFS server in front of a persistent disk. Thus, the filesystem can be mounted read/write and shared across many VMs. However, it is significantly more expensive than PD (as of this writing, about $0.20/GB/month vs $0.04/GB/month in us-central1), and there is a minimum size requirement (1 TB).

Google Cloud Storage buckets cannot be mounted in Google Compute Engine instances or containers without third-party software such as FUSE. Neither Linux nor Windows has a built-in driver for Cloud Storage.

GCE VMs come with the Google Cloud SDK installed, so even without mounting anything you can copy files in and out of a bucket using the gsutil commands.
gsutil ls gs://
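For example (my-bucket is a placeholder):
gsutil cp /local/path/report.csv gs://my-bucket/reports/     # upload a file
gsutil cp gs://my-bucket/reports/report.csv /local/path/     # download a file
gsutil -m rsync -r /local/dir gs://my-bucket/dir             # sync a whole directory in parallel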

Related

Share large file with all nodes in docker swarm

Currently, I am migrating to Docker Swarm and have begun to use docker configs to offload most of the configuration files, but I have one file remaining, several GB in size, that is used by my tileserver. Right now I have a 1-master / 4-worker swarm, and I am looking for a way to share that file with all nodes in the swarm to prepare for a time when the tileserver goes down.
Any ideas?
If you want highly available data, then you need a solution that distributes the data amongst the nodes (or servers).
One approach would be deploying an object storage solution onto the swarm - something like minio gives you an S3-compatible REST API and, when deployed with a minimum of 4 disks in erasure-coding mode, tolerates 1 disk down for writing and 2 disks down for reading (assuming you have a node per disk).
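As a very rough starting point (this is not the 4-disk erasure-coded layout described above, which needs one minio endpoint per node/disk -- see the minio docs for that), a single-replica service might look like the sketch below; the credentials and volume name are placeholders, the env var names vary between minio releases, and a plain local volume is node-local:
docker service create --name minio \
  --publish 9000:9000 \
  --mount type=volume,source=minio-data,target=/data \
  -e MINIO_ROOT_USER=admin \
  -e MINIO_ROOT_PASSWORD=change-me \
  minio/minio server /data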
If re-jigging your app to work with object storage isn't in scope, then investigate something like glusterfs, which you will want to install on the metal rather than in Docker. glusterfs will give you a unified filesystem with decent HA on 3 nodes, and you can add disks on the fly.
Obviously with minio it's expected that your app would use the S3 API to access its files. With glusterfs you would need to mount gfs volumes on host locations, which containers then bind-mount to gain access to that network storage -- unless you are willing to go wandering through the world of rex-ray and other community-supported Docker volume drivers, which either haven't seen an update in years or are literally maintained by one person for fun, and which can bring some first-class support for glusterfs-based Docker volumes to your hopefully non-production Docker swarm.
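A rough sketch of the glusterfs route (gfs1, the volume name shared, and my-tileserver are placeholders; the glusterfs client packages must be installed on each Docker host):
mount -t glusterfs gfs1:/shared /mnt/shared                    # on every docker host
docker run -d -v /mnt/shared/tiles:/data/tiles my-tileserver   # containers bind-mount the host path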

Dask + SLURM over FTP mount (CurlFtpFS)

So I have a working Dask/SLURM cluster of 4 Raspberry Pis with a common NFS share, on which I can run Python jobs successfully.
However, I want to add some more ARM devices to my cluster that do not support NFS mounts (kernel module missing), so I wish to move to FUSE-based FTP mounts with CurlFtpFS.
I have set up the mounts successfully with an anonymous username and without any passwords, and the common FTP share can be seen by all the nodes (just as before when it was an NFS share).
I can still run SLURM jobs (since they do not use the share), but when I try to run a Dask job, the master node times out, complaining that no worker nodes could be started.
I am not sure what exactly the problem is, since the share is open to anyone for read/write access (e.g. logs and Dask queue intermediate files).
Any ideas how I can troubleshoot this?
I don't believe anyone has a cluster like yours!
At a guess, the filesystem access via FUSE, FTP and the Pi is much slower than the OS is expecting, and you are seeing the effects of low-level timeouts, i.e., from Dask's point of view it appears that file reads are failing. Dask needs access to storage for configuration and sometimes temporary files. You would want to make sure that these locations are on local storage or turned off. However, if this is happening during import of modules, which you have on the shared drive by design, there may be no fixing it (Python loads many small files during import). Why not use rsync to move the files to the nodes?
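If it is the scratch/temporary directory that sits on the slow FTP mount, one thing worth trying is pointing the workers at local storage explicitly; a sketch using the standard dask-worker flag (the scheduler address is a placeholder):
dask-worker tcp://192.168.1.10:8786 --local-directory /tmp/dask-worker-space
The --local-directory option controls where the worker spills temporary data, so putting it on local disk keeps that traffic off the FUSE/FTP share.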

is docker storage driver a persistent storage

I'm new to Docker and I'm trying to understand persistent storage in Docker.
In the section Manage Application data > Store data within containers > About storage drivers
https://docs.docker.com/storage/storagedriver/
"Storage drivers allow you to create data in the writable layer of your container. The files won't be persisted after the container is deleted, and both read and write speeds are lower than native file system performance."
But later, in the section Manage Application data > Store data within containers > Use the Device Mapper storage driver
https://docs.docker.com/storage/storagedriver/device-mapper-driver/
they use direct-lvm, which creates logical volumes that persist data.
My question: does using a storage driver mean that:
the container-generated data is ephemeral?
the container-generated data is ephemeral if we are using a logical volume on a loopback device?
the container-generated data is persistent if we are using a logical volume on a block device?
The storage driver configuration is essentially an install-time setting that's not really relevant once you've gotten it set up correctly. In particular if you run docker info and it says it's using an overlay2 driver I would recommend closing this particular browser tab and not changing anything.
Of the paragraph you quoted, the important thing to take away is that files you create inside a container, that aren't inside a volume directory, will be lost as soon as the container is deleted. It doesn't matter what underlying storage driver you're using. The performance differences between the container filesystem, named volumes, and bind-mounts almost never matter (except on MacOS hosts where bind mounts are very very slow).
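A quick way to see the difference for yourself (using a throwaway alpine image and a hypothetical volume name mydata):
docker run --rm alpine sh -c 'echo hello > /tmp/file'        # written to the container layer, gone when the container is removed
docker volume create mydata
docker run --rm -v mydata:/data alpine sh -c 'echo hello > /data/file'
docker run --rm -v mydata:/data alpine cat /data/file        # prints "hello" -- the volume outlives the containers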
The data the storage driver persists includes both the temporary container filesystems (they get persisted until the container is deleted) and the underlying image data. It does not include named Docker volumes or other bind-mounted host directories.
If you're using devicemapper, you might see if you can upgrade your host to a newer Linux distribution that can use the overlay2 driver. In particular that avoids the fixed space limit of the devicemapper driver. If you must use devicemapper, general wisdom has been that using a dedicated partition for it is better than using a file. As I said up front, though, this is essentially install-time configuration and has no bearing on your application or docker run commands.

Best practice to automatically backup remotely hosted server

I am trying to set up a server for team note-taking, and I am wondering what the best way is to back up its data (a.k.a. my notes) automatically.
Currently I plan to run the server in a docker image.
The docker image will be hosted by a hosting service (such as Google).
I found a free hosting service that fits my need, but it does not allow mounting volumes to a docker image.
Therefore, I think the only way for me to back up my data is to transfer it to some other cloud service.
However, this requires storing some sort of sensitive authentication data in my docker image, which apparently is not cool.
So:
Is it possible to transfer data from a docker image to a cloud service without risking leaking a password/private key?
Is there any other way to backup my data?
I don't have to use docker as all I need is actually Node.js.
But the server must be hosted on some remote machines because I don't have the ability/time/money to host a machine on my own...
I use borg backup to back up our servers (including docker volumes) ... and it's saved the day many times due to failure and stupidity.
It transfers over SSH so comms are encrypted. The repositories it uses are also encrypted on disk so that makes all your data safe. It de-duplicates, snapshots, prunes, compresses ... the feature list is quite large.
After the first backup, subsequent backups are much faster because it only submits the changes since the previous backup.
You can also mount the snapshots as filesystems so you can hunt down the single file you deleted or just restore the whole lot. The mounts can also be done remotely.
I've configured ours to back up /home, /etc and the /var/lib/docker/volumes directories (among others).
We rent a few cheap storage VPSs and send the data up to them nightly. They're in different geographic locations with different hosting providers, you know, because we're paranoid.
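To give an idea of the shape of it, a sketch of the commands involved (the repository URL, user, and retention policy are placeholders):
borg init --encryption=repokey ssh://backup@storage.example.com/./backups    # one-time repository setup
borg create --compression lz4 --stats \
    ssh://backup@storage.example.com/./backups::'{hostname}-{now}' \
    /home /etc /var/lib/docker/volumes                                        # nightly, e.g. from cron
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
    ssh://backup@storage.example.com/./backups                                # thin out old archives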
Besides Docker swarm secrets, don't forget bind-mount strategies: you could have your data in a volume.
In that case, you can have a backup strategy done on the host (instead of in the container at runtime), which would take that volume, compress it, and save it elsewhere. See for instance this answer or this one.
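For example, a common host-side pattern is to archive the volume through a short-lived container (notes-data is a hypothetical volume name):
docker run --rm \
    -v notes-data:/source:ro \
    -v "$(pwd)":/backup \
    alpine tar czf /backup/notes-data.tar.gz -C /source .
The resulting tarball can then be shipped to wherever you like (scp, rclone, a cloud bucket) without the application container ever holding the backup credentials.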

How do I use Docker on cloud or datacenter

I haven't had the courage to start using Docker until now, and I feel like I came from the last century. I want to clear up my doubts about Docker before getting started. My question is mainly about deploying/running Docker images on a cloud or hosting environment.
Can I build a Docker image with any type of server (e.g. WildFly, Payara) and/or database server (e.g. MySQL, Oracle), and will it work on a Docker-enabled cloud/datacenter?
If yes, how about persistent data like database files and static storage (e.g. images, uploaded documents, logs)? Are those stored in Docker images or somewhere else? What will happen to those files when I update my application and redeploy a new image?
I read posts about what Docker is, but I couldn't find a specific answer. Forgive me for not doing enough googling.
I have run docker on AWS and other cloud providers. It is really not that hard if you have some experience with system administration and/or devops. Regarding cloud hosters and getting started, most providers have some sort of tutorial on how to get started using docker with their infrastructure:
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-dockerextension/
Can I build a Docker image with any type of server (e.g. WildFly, Payara) and/or database server (e.g. MySQL, Oracle), and will it work on a Docker-enabled cloud/datacenter?
To get a server up and running, you just need the docker engine installed on the host; there are packages for many distros:
https://docs.docker.com/engine/installation/
After docker engine is installed, you can create dockerfiles for basically any server or service. Hopefully you do not need to, in most cases, since there are countless dockerfiles and pre-configured, vendor-maintained images already available on dockerhub (I use wildfly, elk-stack, and mysql, for example). Be careful to select images that are maintained, otherwise you end up with security issues in your images that might never get fixed! Or you have to fix them yourself!
Example images:
https://hub.docker.com/r/jboss/wildfly/
https://hub.docker.com/_/mysql/
https://hub.docker.com/_/oraclelinux/
https://hub.docker.com/u/payara/
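To try one of the images listed above, running it is roughly a one-liner (the container name and published port are up to you):
docker pull jboss/wildfly
docker run -d --name wildfly -p 8080:8080 jboss/wildfly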
If yes, how about persistent data like database files and static storage (e.g. images, uploaded documents, logs)? Are those stored in Docker images or somewhere else? What will happen to those files when I update my application and redeploy a new image?
In general, you will want to store persistent data external to the docker image and mount it into the container as a volume:
https://docs.docker.com/engine/tutorials/dockervolumes/
Some cloud based storage providers might be easier to mount or connect to in other ways, but this volume approach is standard, IMO.
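For example, the official mysql image keeps its data in /var/lib/mysql, so a named volume mounted there survives image updates and redeploys (mysql-data and the password below are placeholders):
docker volume create mysql-data
docker run -d --name db \
    -e MYSQL_ROOT_PASSWORD=change-me \
    -v mysql-data:/var/lib/mysql \
    mysql:8
When you later pull a newer image and recreate the container with the same -v flag, the data in the volume is reused.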
For logfiles, I actually push them to an ELK server, so having a volume for the logs is not necessarily required. However, since the ELK server is also a docker image, it does have a volume where the data is persisted.
So you have:
documentation from your cloud hoster (or docker themselves)
a host in your cloud running docker engine
0..n images that you can either grab from dockerhub or build yourself.
storage for persistent data on this host, or mounted from elsewhere, that you mount into your docker containers on startup. This is where e.g. mysql data folders live, or where you can persist logs, etc.
Of course, it can get much more complex from there, e.g. how to transparently scale and update your environment etc., but that is something for e.g. kubernetes or docker swarm or some other solution (I've scripted a bit on my own but do not need the robustness or elastic scalability of large systems).
Regarding cluster management, it should be noted that Swarm is now included in the Docker core. This has created some controversy in the community and even talk of a fork of the core:
https://technologyconversations.com/2015/11/04/docker-clustering-tools-compared-kubernetes-vs-docker-swarm/
https://jaxenter.com/docker-1-12-is-probably-the-most-important-release-since-1-0-129080.html
http://searchitoperations.techtarget.com/news/450303918/Docker-fork-talk-prompts-container-standardization-brawl
http://www.infoworld.com/article/3118345/cloud-computing/why-kubernetes-is-winning-the-container-war.html
I have experience running Docker on Alibaba Cloud and AWS as well. I did not see any difference in working with Docker between the two cloud providers. Docker images can be built the same way on any Linux platform, regardless of the cloud provider. However, persistence of data needs to be taken care of using Docker volumes. For databases, it is recommended to use a managed service such as RDS in Alibaba Cloud instead of running them in Docker.
Can I build a Docker image with any type of server (e.g. WildFly, Payara) and/or database server (e.g. MySQL, Oracle), and will it work on a Docker-enabled cloud/datacenter?
You can build your own Docker images or use solutions that are already pre-packaged and proven by cloud providers. For example, here is an auto-clustering Docker-based implementation of GlassFish that can be run and managed on Jelastic PaaS.
If yes, how about persistent data like database files and static storage (e.g. images, uploaded documents, logs)? Are those stored in Docker images or somewhere else? What will happen to those files when I update my application and redeploy a new image?
With the above-mentioned cluster, all data is kept inside the containers and survives restarts unchanged. As an option, you can also connect a separate data storage container if you wish to share data across other containers.
