Airflow on Google Cloud Composer vs Docker

I can't find much information on the differences between running Airflow on Google Cloud Composer and running it in Docker. I am trying to move our data pipelines, which are currently on Google Cloud Composer, into Docker so they run locally, and I'm trying to conceptualize the difference.

Cloud Composer is a GCP managed service for Airflow. Composer runs in something known as a Composer environment, which runs on a Google Kubernetes Engine cluster. It also makes use of various other GCP services, such as:
Cloud SQL - stores the metadata associated with Airflow,
App Engine Flex - the Airflow web server runs as an App Engine Flex application, which is protected using an Identity-Aware Proxy,
GCS bucket - in order to submit a pipeline to be scheduled and run on Composer, all we need to do is copy our Python code into a GCS bucket. Within that bucket, there is a folder called dags; any Python code uploaded into that folder is automatically picked up and processed by Composer.
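As a minimal sketch of what such a file can look like (the DAG id, schedule, and command below are placeholders, not anything Composer requires):

    # minimal_dag.py - drop this into the dags/ folder of the Composer bucket
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="example_composer_dag",   # placeholder name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # a trivial task, just to show that anything in dags/ gets scheduled
        hello = BashOperator(
            task_id="say_hello",
            bash_command="echo 'picked up from the dags/ folder'",
        )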
How does Cloud Composer benefit you?
Focus on your workflows, and let Composer manage the infrastructure (creating the workers, setting up the web server, the message brokers),
One-click creation of a new Airflow environment,
Easy and controlled access to the Airflow web UI,
Logging and monitoring metrics, with alerts when your workflow is not running,
Integration with all Google Cloud services: Big Data, Machine Learning, and so on. You can also run jobs elsewhere, i.e. on another cloud provider (Amazon).
Of course you have to pay for the hosting service, but the cost is low compared to hosting a production Airflow server on your own.
Airflow on-premise
DevOps work that needs to be done: create a new server, manage the Airflow installation, take care of dependency and package management, check server health, and handle scaling and security,
pull an Airflow image from a registry and create the container,
create a volume that maps the directory on the local machine where DAGs are held to the location where Airflow reads them in the container,
whenever you want to submit a DAG that needs to access a GCP service, you need to take care of setting up credentials yourself. The application's service account should be created and its key downloaded as a JSON file that contains the credentials. This JSON file must be mounted into your Docker container, and the GOOGLE_APPLICATION_CREDENTIALS environment variable must contain the path to the JSON file inside the container (see the sketch after this list).
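As a rough sketch of that last point, assuming the variable is set and the google-cloud-storage package is installed, client code inside the container picks the key up implicitly (the bucket name is a placeholder):

    # check_credentials.py - verify the mounted key is visible in the container
    import os

    import google.auth
    from google.cloud import storage

    # path to the JSON key, as set when the container was started
    print("key file:", os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))

    # google.auth.default() reads GOOGLE_APPLICATION_CREDENTIALS automatically
    credentials, project_id = google.auth.default()
    print("authenticated against project:", project_id)

    # every client created afterwards reuses those credentials, e.g.:
    client = storage.Client(credentials=credentials, project=project_id)
    for blob in client.list_blobs("my-placeholder-bucket"):
        print(blob.name)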
To sum up, if you don't want to deal with all of those DevOps problems, and instead just want to focus on your workflows, then Google Cloud Composer is a great solution for you.
Additionally, I would like to share tutorials on setting up Airflow with Docker and on GCP Cloud Composer.

Related

Trigger deployment of Docker container on demand

I have a web application that helps my client launch an API with a button "Launch my API".
Under the hood, I have a Docker image that is run on two Google Cloud Run services (one for a debug environment and one for production).
My challenge is the following: how can I trigger the deployment of new Docker containers on demand?
Naively, I would like this button to call an API that triggers the launch of these services based on my Docker image (which is already in Google Cloud or available to download at a certain URL).
Ultimately, I'll need to use Kubernetes to manage all of my clients' containers. Maybe I should look into that for triggering new container deployments?
I tried to glue together (I'm very new to the cloud) a Google Cloud Function that triggers a new service on Google Cloud Run based on my Docker image, but with no success.

Create service or container from another container, on Google Cloud Run or Cloud Run on GKE

Can I create a service or container from another container on Google Cloud Run or Cloud Run on GKE?
I basically want to manage my containers/services dynamically from another container and am not sure how to go about this.
Adding more details:
One of my microservices needs to create new isolated containers that will run some user-land code. I would like to have full life-cycle control of these containers, run the code, and then destroy as needed.
I also looked at the Cloud Run APIs, but I'm not sure how to run something like 'kubectl create ...' through the APIs. Is that the right approach?
Yes, you should be able to deploy Cloud Run services from Cloud Run services.
on Cloud Run (hosted): services by default run with Editor permissions, so this should be possible without any extra configuration,
note that if you deploy apps with --allow-unauthenticated, which requires setting IAM permissions, the Editor role will not be enough; you need the Owner role on the GCP project for that.
on Cloud Run on GKE: services by default run with limited scopes (as they inherit the GKE node's permissions/scopes by default). You should add a service account to the Kubernetes Pod and use it to authenticate.
From there, you have several options:
Use the REST API directly: Since run.googleapis.com behaves like a Kubernetes API server, you can directly apply JSON objects of Knative Services. (You can use gcloud ... --log-http to learn how deployments are made using REST API requests).
Use gcloud: you can ship gcloud inside your container image and invoke it from your process.
Use Google Cloud Client Libraries: you can use the client libraries that are available for Cloud Run (for example this Go library) to construct in-memory Service objects and send them to the API (recommended approach).
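To make the REST/client-library route concrete, here is a rough Python sketch against the Cloud Run Admin API (using the google-api-python-client package; the project, region, service name, and image are placeholders, and the exact API surface may differ between versions):

    # deploy_service.py - sketch of creating a Cloud Run (hosted) service
    # through the Admin API, which speaks the Knative Service schema
    from googleapiclient.discovery import build

    PROJECT = "my-project"                       # placeholder
    REGION = "us-central1"                       # placeholder
    IMAGE = "gcr.io/my-project/my-image:latest"  # placeholder

    # the Admin API is regional, so point the client at the region's endpoint
    run = build(
        "run", "v1",
        client_options={"api_endpoint": f"https://{REGION}-run.googleapis.com"},
    )

    # a minimal Knative Service object - the same shape that
    # `gcloud ... --log-http` shows being POSTed
    body = {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": "my-service", "namespace": PROJECT},
        "spec": {"template": {"spec": {"containers": [{"image": IMAGE}]}}},
    }

    response = run.namespaces().services().create(
        parent=f"namespaces/{PROJECT}", body=body
    ).execute()
    print(response.get("status"))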

How can I define additional services in a Divio Cloud project using docker-compose?

I defined an additional service in my docker-compose.yml file in my Divio Cloud project.
Locally, it works just as expected. As well as the default web and db containers, I get my new container.
However, when I push this configuration to the Divio Cloud server, it's clearly not working at all, and I can't connect to the custom container.
In short
If you need an additional service in your project, you should configure it on the Divio Cloud, not in docker-compose.yml. docker-compose.yml is only used for local development purposes, and is ignored in deployment.
The longer answer
In Divio Cloud projects, docker-compose.yml is used to orchestrate all the services and containers in the local development environment only.
In the actual hosting environment, it's not used at all, and is simply ignored. Locally, your project has all the containers defined in the docker-compose.yml file - web, db and whatever else you define.
When your project is deployed on the hosting environment, only the web container is used.
The other containers are used locally for convenience, to replicate services that are part of the infrastructure.
For example, locally you have a db container running the Postgres database. In the cloud infrastructure, the web container connects to a Postgres cluster.
Similarly, if you have Celery in your cloud project, it will use backing services provided as part of the cloud infrastructure, but when you set up the same project locally, it will build them in new Docker containers.
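That is also why application code should read its connection settings from the environment rather than hard-coding the local container names. A minimal sketch, assuming the connection string is exposed as a DATABASE_URL environment variable (check your project for the actual variable name):

    # settings_snippet.py - the same code works in both environments because
    # the environment, not the code, decides where Postgres lives
    import os

    # locally this resolves to the db container from docker-compose.yml;
    # on the cloud it points at the managed Postgres cluster
    database_url = os.environ.get(
        "DATABASE_URL", "postgres://postgres@db:5432/db"  # local fallback
    )
    print("connecting to:", database_url)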
More information at docker-compose.yml reference in the Divio Cloud Developer Handbook.
Note: I am a member of the Divio team. This question is one that we see quite regularly via our support channels.

How to update DAGs in Google Cloud Composer

I want to automate the deployment of DAGs written in a certain repository.
To achieve that, I use the gcloud tool, which just imports the DAGs and Plugins according to the documentation.
Now the problem is that when I change the structure of a DAG, it is just not possible to get it to load/run correctly in the web interface. When I use Airflow locally, I just restart the webserver and everything is fine; with Cloud Composer, however, I cannot find out how to restart the webserver.
We only support uploading DAGs through GCS currently: https://cloud.google.com/composer/docs/how-to/using/managing-dags
The webserver, which is hosted through GAE, can't be restarted.
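Since deployment is just a copy into the environment's bucket, the gcloud step can also be scripted directly, e.g. with the google-cloud-storage client. A minimal sketch (the bucket name is a placeholder; the real one is shown in your environment's details):

    # upload_dag.py - push an updated DAG file into the Composer bucket;
    # Composer picks up changes in dags/ automatically
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("us-central1-my-env-bucket")  # placeholder name
    blob = bucket.blob("dags/my_dag.py")
    blob.upload_from_filename("dags/my_dag.py")
    print("uploaded", blob.name)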

How to deploy docker app using docker-compose.yml in cloud foundry

I have a docker-compose.yml file which has environment variables and certificates. I would like to deploy these on a Cloud Foundry dev environment.
I want to deploy Microgateway on Cloud Foundry; the link for Microgateway is below:
https://github.com/CAAPIM/Microgateway
In the cloud-native world, you instantiate services on your foundation beforehand. You can use prebuilt services (e.g. the auto-scaler) available from the marketplace.
If the service you want is not available, you can install a tile (e.g. Redis, MySQL, RabbitMQ), which will add services to the marketplace. Lots of vendors provide tiles that can be installed on PCF (check network.pivotal.io for the full list).
If you have services that are outside of Cloud Foundry (e.g. Oracle, Mongo, or MS SQL Server) and you wish to inject them into your Cloud Foundry foundation, you can do that by creating User Provided Services (CUPS).
Once you have a service, you have to create a service instance. Think of it as provisioning that service for you. After you have provisioned it, i.e. created a service instance, you can bind it to one or more apps.
A service instance is scoped to an org and a space. All apps within that org and space can be bound to that service instance.
You deploy your app individually, by itself, to Cloud Foundry (jar, war, zip). You then bind any needed services to your app (e.g. db, scaling, caching, etc.).
Use a manifest file to do all these steps in one deployment.
PCF 2.0 is introducing PKS - the Pivotal Container Service. It is an implementation of Kubo within PCF. It is still not GA.
Kubo, Kubernetes, and PKS allow you to deploy your containerized applications.
I have played with MiniKube and little bit of Kubo. Still getting my hands wet on PKS.
Hope this helps!
