I want to automate the deployment of DAGs that are kept in a certain repository.
To achieve that I use the gcloud tool, which simply imports the DAGs and plugins as described in the documentation.
The problem is that after changing the structure of a DAG, I cannot get it to load/run correctly in the web interface. When I use Airflow locally I just restart the webserver and everything is fine; with Cloud Composer, however, I cannot figure out how to restart the webserver.
We only support uploading DAGs through GCS currently: https://cloud.google.com/composer/docs/how-to/using/managing-dags
The webserver, which is hosted through GAE, can't be restarted.
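If you are automating uploads, here is a minimal sketch of what that could look like with the google-cloud-storage Python client (the bucket name and file path below are placeholders; the environment's actual bucket is shown in its configuration):

    # Sketch: push a DAG file into the Composer environment's GCS bucket so the
    # scheduler and web server pick it up. Bucket name and path are placeholders.
    from google.cloud import storage

    def upload_dag(local_path="my_dag.py",
                   bucket_name="us-central1-myenv-12345678-bucket"):
        client = storage.Client()
        bucket = client.bucket(bucket_name)
        # Composer only reads DAG files from the dags/ prefix of this bucket.
        blob = bucket.blob("dags/" + local_path.rsplit("/", 1)[-1])
        blob.upload_from_filename(local_path)

    if __name__ == "__main__":
        upload_dag()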
I deployed a doccano Docker container to Cloud Run and I am able to reach the web app successfully.
Everything works fine, such as log in, data import and annotation.
Now I would like to connect the container to Google Cloud Storage in order to save all annotations in a bucket. Currently, all data is lost after the container restarts.
Any hints on how to accomplish that are highly appreciated!
What I (kind of) tried:
The container is up and running and some environment variables are set, but I don't know how to set a bucket URI within the doccano Docker container (doccano's documentation is a bit sparse in that regard).
Maybe this can be helpful for anyone with a similar use case:
My solution/workaround for deploying doccano on GCP was to deploy the Docker container to a Compute Engine VM (and open a port to the app) instead of Cloud Run. Cloud Run does indeed seem to be the wrong service for this use case. Compute Engine has persistent disk storage, which keeps all of the data even if the container has to restart.
I know this may seem like an opinion-based question, but I can't find any answers anywhere. I'm having trouble figuring out how to deploy my Flask backend and React frontend on Google Cloud. I am using docker-compose on my local machine, but I can't seem to find a way to deploy that on Google Cloud.
My question is: is there a way to deploy them from a docker-compose file using Cloud Build and Cloud Run? Or do I have to create two different Cloud Run services to run the frontend and backend? Or is it better to create a VM instance and run the docker-compose setup on there (and how would one even do this)? I am very new to deployment, so any help is appreciated.
For reference, I saw this but it didn't exactly answer my question. Thanks in advance!
You use docker-compose for running multi-container applications on a single machine. In your case it wouldn't make much sense.
You have a Python backend. You can containerize it and deploy it to Cloud Run, Cloud Functions, App Engine, Google Kubernetes Engine or even a Compute Engine VM. In my opinion the most convenient option would be Cloud Run.
If your React frontend is a single-page app, it communicates with your Python backend via HTTP requests. You build the HTML/CSS/JS files and host them somewhere, like a Cloud Storage bucket (optionally behind Cloud CDN).
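For the backend half, a minimal sketch of what the containerized Flask service could look like (the route and names are placeholders, not your actual app); the important Cloud Run detail is listening on the port given by the PORT environment variable:

    # Sketch of a Cloud Run-friendly Flask backend; names are placeholders.
    import os
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/health")
    def health():
        return jsonify(status="ok")

    if __name__ == "__main__":
        # Cloud Run injects PORT; bind to 0.0.0.0 so the container accepts traffic.
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))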
I notice that it is possible to trigger a DAG using gcloud by issuing
gcloud composer environments run myenv trigger_dag -- some_dag --run_id=foo
It is my understanding that gcloud uses the client libraries for everything it does, so I am assuming that I can perform the same operation (i.e. trigger a Composer DAG) using the Python client for Cloud Composer. Unfortunately I've browsed through that client's documentation, specifically https://googleapis.dev/python/composer/latest/service_v1beta1/environments.html, and I don't see anything there that lets me do the same as gcloud composer environments run.
Can someone please explain whether it is possible to trigger a DAG using the Python client for Cloud Composer?
Unfortunately, the Python client library for Cloud Composer does not currently support triggering DAGs. A possible workaround is to send an HTTP request from Python directly to the Airflow instance in your Cloud Composer environment. See Trigger a DAG from Cloud Functions for more details, including the Python code that triggers the DAG from the Cloud Function.
In that document, the Cloud Function is configured to trigger a DAG when a new file is uploaded to a bucket. If that doesn't fit your use case, you can always change the Cloud Function's trigger type to one that does. A stripped-down sketch of the HTTP approach is shown below.
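A minimal sketch, assuming an Airflow 1.x Composer environment with the experimental REST API enabled; the webserver URL, IAP client ID and DAG id below are placeholders you have to replace with your environment's values:

    # Sketch only: trigger a Composer-hosted DAG over HTTP through the
    # IAP-protected Airflow webserver (Airflow 1.x experimental API).
    import requests
    from google.auth.transport.requests import Request
    from google.oauth2 import id_token

    WEBSERVER_URL = "https://<your-webserver-id>.appspot.com"    # placeholder
    IAP_CLIENT_ID = "<iap-client-id>.apps.googleusercontent.com" # placeholder
    DAG_ID = "some_dag"                                          # placeholder

    def trigger_dag(conf=None):
        # Fetch an OpenID Connect token for the IAP audience using the
        # ambient service-account credentials.
        token = id_token.fetch_id_token(Request(), IAP_CLIENT_ID)
        resp = requests.post(
            f"{WEBSERVER_URL}/api/experimental/dags/{DAG_ID}/dag_runs",
            headers={"Authorization": f"Bearer {token}"},
            json={"conf": conf or {}},
        )
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        print(trigger_dag())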
Is it possible to import the hooks that Airflow provides (Snowflake hook, AWS hook, etc.) into a Python script that runs inside a Kubernetes operator?
I may have the wrong idea of how to work with Airflow.
For example: I have Airflow running with a Kubernetes cluster, and the tasks of the DAG run inside containers. One task takes some data from SQL and uploads it to S3. If I have to code all of this myself in a Python script inside the container, I spend time and risk errors; if instead I can reuse the libraries (hooks) that Airflow provides, I save time and also benefit from code maintained by a lot of developers. Roughly, what I have in mind is something like the sketch below.
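For illustration only (assuming apache-airflow plus the Amazon and Snowflake provider packages are installed in the container image, and that the connection IDs exist, e.g. supplied via AIRFLOW_CONN_* environment variables; the query, key and bucket are placeholders):

    # Sketch: reusing Airflow provider hooks in a standalone script inside the
    # task container, instead of hand-rolling Snowflake/S3 clients.
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook
    from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

    def export_query_to_s3():
        # Pull rows from Snowflake using the hook.
        sf_hook = SnowflakeHook(snowflake_conn_id="snowflake_default")
        rows = sf_hook.get_records("SELECT * FROM my_table")  # placeholder query

        # Upload the result to S3, again via a hook.
        s3_hook = S3Hook(aws_conn_id="aws_default")
        s3_hook.load_string(
            "\n".join(str(r) for r in rows),
            key="exports/my_table.csv",   # placeholder key
            bucket_name="my-bucket",      # placeholder bucket
            replace=True,
        )

    if __name__ == "__main__":
        export_query_to_s3()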
Thank you
I can't find much information on the differences between running Airflow on Google Cloud Composer and running it in Docker. I am trying to move our data pipelines, which currently run on Google Cloud Composer, to Docker so that they just run locally, but I'm trying to conceptualize what the difference is.
Cloud Composer is a GCP managed service for Airflow. Composer runs in something known as a Composer environment, which runs on a Google Kubernetes Engine cluster. It also makes use of various other GCP services, such as:
Cloud SQL - stores the metadata associated with Airflow,
App Engine Flex - the Airflow web server runs as an App Engine Flex application, which is protected using an Identity-Aware Proxy,
GCS bucket - in order to submit a pipeline to be scheduled and run on Composer, all we need to do is copy our Python code into a GCS bucket. Within that bucket there is a folder called dags; any Python code uploaded into that folder is automatically picked up and processed by Composer.
What are the benefits of Cloud Composer?
Focus on your workflows and let Composer manage the infrastructure (creating the workers, setting up the web server, the message brokers),
One-click creation of a new Airflow environment,
Easy and controlled access to the Airflow web UI,
Logging and monitoring metrics, plus alerting when your workflow is not running,
Integration with all Google Cloud services: Big Data, Machine Learning and so on. You can also run jobs elsewhere, e.g. on another cloud provider (Amazon).
Of course you have to pay for the hosting service, but the cost is low compared to hosting a production Airflow server on your own.
Airflow on-premise
DevOps work that needs to be done: create a new server, manage the Airflow installation, take care of dependency and package management, check server health, scaling and security.
Pull an Airflow image from a registry and create the container,
Create a volume that maps the directory on your local machine where DAGs are held to the location where Airflow reads them in the container,
Whenever you want to submit a DAG that needs to access a GCP service, you need to take care of setting up credentials. The application's service account should be created and its key downloaded as a JSON file that contains the credentials. This JSON file must be mounted into your Docker container, and the GOOGLE_APPLICATION_CREDENTIALS environment variable must contain the path to the JSON file inside the container (see the sketch after this list).
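To illustrate that last point, a minimal sketch (Airflow 2.x imports; the DAG id and bucket name are placeholders): the google-cloud-storage client inside the task only works because GOOGLE_APPLICATION_CREDENTIALS points to a key file that is actually present inside the container.

    # Sketch: a locally run DAG that touches GCS and relies entirely on the
    # GOOGLE_APPLICATION_CREDENTIALS variable pointing to the mounted JSON key.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from google.cloud import storage

    def list_bucket_objects():
        # storage.Client() picks up credentials from GOOGLE_APPLICATION_CREDENTIALS.
        client = storage.Client()
        for blob in client.list_blobs("my-example-bucket"):  # placeholder bucket
            print(blob.name)

    with DAG(
        dag_id="gcs_credentials_demo",   # placeholder DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        PythonOperator(task_id="list_objects", python_callable=list_bucket_objects)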
To sum up, if you don't want to deal with all of those DevOps problems, and instead just want to focus on your workflows, then Google Cloud Composer is a great solution for you.
Additionally, I would like to share with you tutorials that set up Airflow with Docker and on GCP Cloud Composer.