When an ML model gets trained, it should be moved automatically from on-premises to Azure Storage.
How can I automate storing an on-premises-trained ML model in an Azure storage account? The goal is that when the model finishes training, it is automatically stored inside a storage account container.
There are several solutions that can help copy the trained model files from on-premises to Azure Storage.
Use the azcopy sync command to replicate the source location to the destination location. Even if your on-premises OS is Linux, you can run it via crontab at an interval (a short sketch follows below).
Use Azure/azure-storage-fuse to mount an Azure Blob Storage container into the Linux filesystem, then save the trained model files directly to the mounted path, if the on-premises training machine is Linux.
Use an Azure File Share, mounted on Windows, Linux, or macOS via SMB 3.0, as a directory in your on-premises filesystem, then save the trained model files into it.
At the end of the Python training script, add some code that uses the Azure Storage SDK for Python to upload the trained model files directly to Azure Storage.
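For the azcopy route, a minimal sketch on Linux (azcopy v10 assumed; the paths, storage account, container, and SAS token below are placeholders):

# sync_models.sh - replicate newly trained model files to Azure Blob Storage
SRC="/opt/ml/models"   # local directory the training job writes to
DST="https://<storage-account>.blob.core.windows.net/<container>?<SAS-token>"
azcopy sync "$SRC" "$DST" --recursive

# crontab -e entry to run it every 15 minutes:
# */15 * * * * /opt/ml/sync_models.sh >> /var/log/model_sync.log 2>&1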
Hope it helps.
We are planning to host our Artifactory on ECS (Fargate) and mount the data to EFS. We will use an ALB in front of the containers (ports 8081 and 8082). We still have some open issues:
Can we use multiple containers at the same time or will there be upload/write issues to EFS?
Is EFS a good solution or is S3 better?
What about the metadata? I read that Artifactory hosts this in some Derby database. What if we redeploy a new container? Will the data be gone? Can this data be persisted on EFS, or do we need RDS?
Can we use multiple containers at the same time or will there be upload/write issues to EFS?
Ans: Yes, you can use multiple containers to host Artifactory instances on a single host. However, it is generally recommended to use multiple hosts to avoid the 'single point of failure' scenario. I don't anticipate any read/write issues with EFS/S3.
Is EFS a good solution or is S3 better?
Ans: In my opinion, both S3 and EFS are better known as scalable solutions than as high-performance ones, and it completely depends on the use case. You can overcome this by enabling cache-fs in Artifactory, which will store the frequently used binaries in a defined place (like a local disk with higher read/write speeds). You can read more about cache-fs here: https://jfrog.com/knowledge-base/what-is-cache-fs-video/
What about the metadata? I read that Artifactory hosts this in some Derby database. What if we redeploy a new container? Will the data be gone? Can this data be persisted on EFS, or do we need RDS?
Ans: When you are configuring more than one Artifactory node, it is mandatory to have an external database (such as RDS) to store the configuration/references. On a side note: Artifactory generates the metadata for packages/artifacts and stores it in the filesystem only; however, the references will be stored in the DB.
I can't find much information on what the differences are between running Airflow on Google Cloud Composer and running it in Docker. I am trying to switch our data pipelines that are currently on Google Cloud Composer over to Docker to just run locally, but I'm trying to conceptualize what the difference is.
Cloud Composer is a GCP managed service for Airflow. Composer runs in something known as a Composer environment, which runs on a Google Kubernetes Engine cluster. It also makes use of various other GCP services, such as:
Cloud SQL - stores the metadata associated with Airflow,
App Engine Flex - Airflow web server runs as an App Engine Flex application, which is protected using an Identity-Aware Proxy,
GCS bucket - in order to submit a pipeline to be scheduled and run on Composer, all we need to do is copy our Python code into a GCS bucket. Within that, it'll have a folder called dags. Any Python code uploaded into that folder is automatically going to be picked up and processed by Composer.
How does Cloud Composer benefit you?
Focus on your workflows, and let Composer manage the infrastructure (creating the workers, setting up the web server, the message brokers),
One-click to create a new Airflow environment,
Easy and controlled access to the Airflow Web UI,
Provide logging and monitoring metrics, and alert when your workflow is not running,
Integrate with Google Cloud services: Big Data, Machine Learning, and so on. Run jobs elsewhere, e.g. on another cloud provider (Amazon).
Of course you have to pay for the hosting service, but the cost is low compared to hosting a production Airflow server on your own.
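As a rough illustration of how little setup is involved, creating an environment and uploading a DAG can be done with a couple of gcloud commands (the environment name, location, and file name below are placeholders):

# Create a Composer environment (provisions the GKE cluster, web server, etc.)
gcloud composer environments create my-composer-env --location us-central1

# Upload a DAG file into the environment's GCS dags/ folder
gcloud composer environments storage dags import \
    --environment my-composer-env --location us-central1 \
    --source my_dag.py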
Airflow on-premise
DevOps work that needs to be done: create a new server, manage the Airflow installation, take care of dependency and package management, check server health, and handle scaling and security. Running Airflow in Docker on that machine typically involves the following:
pull an Airflow image from a registry and create the container,
create a volume that maps the directory on the local machine where DAGs are held to the location where Airflow reads them in the container,
whenever you want to submit a DAG that needs to access a GCP service, you need to take care of setting up credentials. The application's service account key should be created and downloaded as a JSON file that contains the credentials. This JSON file must be mounted into your Docker container, and the GOOGLE_APPLICATION_CREDENTIALS environment variable must contain the path to the JSON file inside the container.
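A minimal sketch of that local setup, assuming the official apache/airflow image (the tag, port, and local paths are assumptions to adjust for your setup):

# Run Airflow locally in standalone mode, mapping a local dags/ folder and a
# service-account key file; GOOGLE_APPLICATION_CREDENTIALS points at the key
# inside the container.
docker run -d --name airflow-local \
    -p 8080:8080 \
    -v "$(pwd)/dags:/opt/airflow/dags" \
    -v "$(pwd)/keys/sa-key.json:/opt/airflow/keys/sa-key.json:ro" \
    -e GOOGLE_APPLICATION_CREDENTIALS=/opt/airflow/keys/sa-key.json \
    apache/airflow:2.7.3 standalone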
To sum up, if you don’t want to deal with all of those DevOps problems, and instead just want to focus on your workflow, then Google Cloud Composer is a great solution for you.
Additionally, I would like to share with you tutorials that set up Airflow with Docker and on GCP Cloud Composer.
I'm working on an application that obtains some data from a web service, creates a text file in the local filesystem, sends a command to a command-line application, obtains the result, and then sends the results back via the web service.
I need to be able to write to the local filesystem, read from it, and then delete the temporary file. I was reading about bind mounts and volumes, but this folder can be deleted if a new version of the image is uploaded; it is just a staging area.
Any ideas how this can be done? Thanks.
When using containers in App Service, I believe you will have to link a storage account and mount file shares accordingly. Depending on the OS (Windows/Linux), the steps vary a bit.
If you are not using containers, then you should be able to access the temporary file locations for file-based requirements. Do note that the storage available this way is limited and not shared across site instances.
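For the container case, linking an Azure Files share to the app can be done with the Azure CLI; a hedged sketch with placeholder resource names:

# Mount an Azure Files share into the App Service app at /mounts/temp
az webapp config storage-account add \
    --resource-group my-rg \
    --name my-webapp \
    --custom-id tempfiles \
    --storage-type AzureFiles \
    --account-name mystorageacct \
    --share-name temp-share \
    --access-key "<storage-account-key>" \
    --mount-path /mounts/temp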
I'm trying to deploy and run a docker image in a GCP VM instance.
I need it to access a certain Cloud Storage Bucket (read and write).
How do I mount a bucket inside the VM? How do I mount a bucket inside the Docker container running in my VM?
I've been reading Google Cloud documentation for a while, but I'm still confused. All guides show how to access a bucket from a local machine, not how to mount it to a VM.
https://cloud.google.com/storage/docs/quickstart-gsutil
Found something about FUSE, but it looks overly complicated for just mounting a single bucket to the VM filesystem.
Google Cloud Storage is an object storage API; it is not a filesystem. As a result, it isn't really designed to be "mounted" within a VM. It is designed to be highly durable and scalable to extraordinarily large objects (and large numbers of objects).
Though you can use gcsfuse to mount it as a filesystem, that method has pretty significant drawbacks. For example, it can be expensive in operation count to do even simple operations for a normal filesystem.
Likewise, there are many surprising behaviors that are a result of the fact that it is an object store. For example, you can't edit objects -- they are immutable. To give the illusion of writing to the middle of an object, the object is, in effect, deleted and recreated whenever a call to close() or fsync() happens.
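If you do go the gcsfuse route anyway, the mount itself is short (the bucket name and mount point below are placeholders):

# Mount a bucket with gcsfuse, use it, then unmount
sudo mkdir -p /mnt/my-bucket
gcsfuse my-bucket /mnt/my-bucket
# ... read/write files under /mnt/my-bucket ...
fusermount -u /mnt/my-bucket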
The best way to use GCS is to design your application to use the API (or the S3 compatible API) directly. That way the semantics are well understood by the application, and you can optimize for them to get better performance and control your costs. Thus, to access it from your docker container, ensure your container has a way to authenticate through GCS (either through the credentials on the instance, or by deploying a key for a service account with the necessary permissions to access the bucket), then have the application call the API directly.
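One hedged sketch of wiring up authentication for a container, using the Cloud SDK image for brevity (image tag, key path, and bucket name are placeholders; a real application would normally call the API through a client library instead of gsutil):

# Pass a service-account key into the container and list the bucket with it
docker run --rm \
    -v /opt/keys/sa-key.json:/secrets/sa-key.json:ro \
    -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json \
    google/cloud-sdk:slim bash -c '
      gcloud auth activate-service-account --key-file="$GOOGLE_APPLICATION_CREDENTIALS" &&
      gsutil ls gs://my-bucket'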
Finally, if what you need is actually a filesystem, and not specifically GCS, Google Cloud does offer at least 2 other options if you need a large mountable filesystem that is designed for that specific use case:
Persistent Disk, which is the baseline filesystem that you get with a VM, but you can mount many of these devices on a single VM. However, you can't mount them read/write to multiple VMs at once -- if you need to mount to multiple VMs, the persistent disk must be read only for all instances they are mounted to.
Cloud Filestore is a managed service that provides an NFS server in front of a persistent disk. Thus, the filesystem can be mounted read/write and shared across many VMs. However it is significantly more expensive (as of this writing, about $0.20/GB/month vs $0.04/GB/month in us-central1) than PD, and there are minimum size requirements (1TB).
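For reference, mounting a Filestore share is a standard NFS mount (the server IP and share name below are placeholders):

# Mount a Cloud Filestore share over NFS on a client VM
sudo apt-get install -y nfs-common
sudo mkdir -p /mnt/filestore
sudo mount -t nfs 10.0.0.2:/my_share /mnt/filestore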
Google Cloud Storage buckets cannot be mounted in Google Compute Engine instances or containers without third-party software such as FUSE. Neither Linux nor Windows has a built-in driver for Cloud Storage.
GCE VMs come with the Google Cloud SDK installed, so without mounting anything you can copy files in and out using its commands, for example:
gsutil ls gs://
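A couple more hedged one-liners for copying files (the bucket name and paths are placeholders):

# List a bucket and copy a file into and out of it
gsutil ls gs://my-bucket
gsutil cp /tmp/results.csv gs://my-bucket/results/results.csv
gsutil cp gs://my-bucket/results/results.csv /tmp/results.csv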
I am trying to set up a server for team note-taking, and I am wondering what is the best way to back up its data (a.k.a. my notes) automatically.
Currently I plan to run the server in a docker image.
The docker image will be hosted by a hosting service (such as Google).
I found a free hosting service that fits my need, but it does not allow mounting volumes to a docker image.
Therefore, I think the only way for me to backup my data is to transfer them to some other cloud services.
However, this requires that I store some sort of sensitive data for authentication in my Docker image, and apparently this is not cool.
So:
Is it possible to transfer data from a docker image to a cloud service without taking the risk of leaking password/private key?
Is there any other way to backup my data?
I don't have to use docker as all I need is actually Node.js.
But the server must be hosted on some remote machines because I don't have the ability/time/money to host a machine on my own...
I use borg backup to back up our servers (including docker volumes) ... and it's saved the day many times due to failure and stupidity.
It transfers over SSH so comms are encrypted. The repositories it uses are also encrypted on disk so that makes all your data safe. It de-duplicates, snapshots, prunes, compresses ... the feature list is quite large.
After the first backup, subsequent backups are much faster because it only submits the changes since the previous backup.
You can also mount the snapshots as filesystems so you can hunt down the single file you deleted or just restore the whole lot. The mounts can also be done remotely.
I've configured ours to backup /home, /etc and the /var/lib/docker/volumes directories (among others).
We rent a few cheap storage VPSs and send the data up to them nightly. They're in different geographic locations with different hosting providers, you know, because we're paranoid.
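To make that concrete, a hedged sketch of the nightly job (the repository URL, passphrase handling, and retention policy below are placeholders, not our exact setup):

# Encrypted, deduplicated backup of selected directories to a remote repo over SSH
export BORG_REPO='ssh://backup@storage-vps.example.com/./backups/notes'
export BORG_PASSPHRASE='change-me'            # better: read it from a protected file

borg init --encryption=repokey "$BORG_REPO"   # one-time repository setup
borg create --stats --compression lz4 \
    "$BORG_REPO::{hostname}-{now:%Y-%m-%d}" \
    /home /etc /var/lib/docker/volumes
borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6 "$BORG_REPO"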
Besides Docker swarm secrets, don't forget bind mount strategies: you could have your data in a volume.
In that case, you can have a backup strategy done on the host (instead of the container at runtime), which would take that volume, compress it and save it elsewhere. See for instance this answer or this one.
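A minimal sketch of that host-side approach, assuming a named volume called notes_data and a placeholder remote host:

# Archive the volume's contents from the host, then ship the archive off-site
docker run --rm \
    -v notes_data:/data:ro \
    -v "$(pwd)":/backup \
    alpine tar czf /backup/notes_data.tar.gz -C /data .
scp notes_data.tar.gz backup@storage-vps.example.com:backups/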