Create a Google BigQuery connection from the Airflow UI (Dockerized) - docker

I am running an Airflow instance using Docker. I can access the Airflow UI at http://localhost:8080/ and execute a sample DAG using PythonOperator. Using PythonOperator I am also able to query a BigQuery table in my GCP environment. The service account key JSON file is added in my docker-compose YAML file.
This works perfectly.
Now I want to use BigQueryOperator and BigQueryCheckOperator, for which I need a connection ID. This connection ID comes from Airflow connections, which are created through the Airflow UI.
But when I try to create a new Google BigQuery connection, I get errors. Could anyone please help me fix this?

In your docker compose file, can you set the environment variable GOOGLE_APPLICATION_CREDENTIALS to /opt/airflow/configs/kairos-aggs-airflow-local-bq-connection.json? This might be enough to fix your first screenshot.
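For example, a minimal sketch of that docker-compose change (the service name airflow-webserver and the host path ./configs/... are assumptions; adjust them to your setup):

services:
  airflow-webserver:
    environment:
      # Point the Google client libraries at the mounted service account key.
      GOOGLE_APPLICATION_CREDENTIALS: /opt/airflow/configs/kairos-aggs-airflow-local-bq-connection.json
    volumes:
      # Mount the key file read-only into the container at that path.
      - ./configs/kairos-aggs-airflow-local-bq-connection.json:/opt/airflow/configs/kairos-aggs-airflow-local-bq-connection.json:ro

The same environment and volumes entries would need to be repeated on (or shared with) the scheduler and worker services so every component can reach the key.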
Looking at the docs and comparing your second screenshot, I think you could try selecting 'Google Cloud Platform' as the connection type and adding a project ID and scopes to the form.
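If you would rather not create the connection by hand in the UI, Airflow can also read connections from environment variables named AIRFLOW_CONN_<CONN_ID>, which you could set in the same docker-compose file. A rough sketch, assuming a connection ID of google_cloud_default; the exact extra field names depend on your Airflow and Google provider versions, so treat this as a starting point rather than the definitive syntax:

    environment:
      # Connection expressed as a URI; the scope value is URL-encoded.
      AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT: >-
        google-cloud-platform://?extra__google_cloud_platform__project=my-gcp-project&extra__google_cloud_platform__key_path=/opt/airflow/configs/kairos-aggs-airflow-local-bq-connection.json&extra__google_cloud_platform__scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform

Here my-gcp-project is a placeholder for your project ID, and the connection ID you choose is what you would pass as the connection argument (gcp_conn_id or bigquery_conn_id, depending on version) to BigQueryOperator and BigQueryCheckOperator.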
The answers to this question may also be helpful.

Related

How to save doccano database to Google Cloud Storage after deploying to Cloud Run?

I deployed a doccano docker container to Cloud Run and I am successfully able to reach the WebApp.
Everything works fine, such as log in, data import and annotation.
Now I would like to connect the container to Google Cloud Storage in order to save all annotations in a bucket. Currently, all data is lost after the container restarts.
Any hints on how to accomplish that are highly appreciated!
What I (kind of) tried:
The container is up and running and some environment variables are set, but I don't know how I can set a bucket URI within the doccano docker container (doccano's documentation is a bit sparse in that regard).
Maybe this can be helpful for anyone with a similar use case:
My solution/workaround for deploying doccano on GCP was deploying a docker container to Compute Engine (and opening a port to the app) instead of Cloud Run. Cloud Run does indeed seem to be the wrong service for this use case. Compute Engine has persistent storage, which keeps all of the data even if the container has to restart.
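As a hedged illustration of that Compute Engine setup, here is a docker-compose sketch that keeps doccano's data on the VM's disk rather than inside the container (the /data container path and the host mount point are assumptions; check the doccano image documentation for where it actually stores its database):

services:
  doccano:
    image: doccano/doccano
    ports:
      - "8000:8000"
    volumes:
      # Persist annotations on the host (or an attached persistent disk) so a restart keeps them.
      - /mnt/disks/doccano-data:/data

With Cloud Run, by contrast, the container filesystem is ephemeral, which is why the data disappeared on restart.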

Can a docker container get access to (not local) DynamoDB?

I am learning about microservices and Docker, and I have made a small application in Visual Studio 2022 that can perform basic CRUD operations on DynamoDB (with ASP.NET 6.0).
When I run the project on localhost everything works, but as soon as I build a Docker container and try to perform CRUD operations from it, I get an error that states:
unable to get iam security credentials from ec2 instance metadata service
I tried a bunch of things like changing my appsettings.json, but came to the conclusion that that is not the problem since it works when I run the solution locally.
When I google this problem, I am flooded with information about running DynamoDB locally. I get that that is good for development purposes, but I still want to perform CRUD operations on my DynamoDB from the Docker container (and think it must be possible).
So my question is: is it possible to access my DynamoDB table from a Docker container?
I have found the answer. The problem was in my docker-compose file where I needed the following line:
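# Mount the host's AWS credentials into the container read-only so the SDK can find them: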
volumes:
- ~/.aws/:/root/.aws:ro
I found it on this post:
AWS DotNet SDK Error: Unable to get IAM security credentials from EC2 Instance Metadata Service
by user #smcg

Unable to pull logs from Airflow Worker

I've got a simple docker development setup for Airflow that includes separate containers for the Airflow UI and Worker. I'm encountering a 403 Forbidden error whenever I attempt to view the log for a task in the Airflow UI.
So far I've ensured they all have the same secret key (in fact, using Docker Volumes they're all reading the exact same configuration file) but this doesn't seem to help. I haven't done anything about time sync, but I'd expect that docker containers would effectively be sharing the system clock anyway so I don't see how they'd get out of sync in the first place.
I can find the log file on the Airflow worker, and the task has run successfully - but something is obviously missing that should allow the Airflow UI to display it (and it would be much more convenient for my workflow to be able to see the logs in the UI rather than having to rummage around on the worker).
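For reference, a minimal sketch of the shared-secret setup described above, assuming a docker-compose file with separate webserver and worker services (the service names and the key value are placeholders):

# Both containers must see the same webserver secret key, otherwise the
# webserver's requests for worker logs are rejected with 403 Forbidden.
x-airflow-env: &airflow-env
  AIRFLOW__WEBSERVER__SECRET_KEY: "use-one-shared-value-here"

services:
  airflow-webserver:
    environment: *airflow-env
  airflow-worker:
    environment: *airflow-env

If the keys really do match, significant clock skew between the containers is the other commonly reported cause of this 403.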

How to properly use DynamoDB in a Docker container?

I am new to Docker and trying to figure out how to use DynamoDB and boto3 within my Docker image. I have followed many tutorials and read many articles. From what I have seen, the basic setup of most dockerized applications is a docker-compose file with two images: the service you have built and an image of the database. Here is where I am confused: the only DynamoDB image I can find is dynamodb-local, and to my understanding that image is only used to create a local database on your own machine. I need the ability to connect to an actual DynamoDB table on my AWS account. I currently just have instructions in my Dockerfile to install boto3 at build time. Am I doing anything wrong? Could anyone give some clarity, or some good resources to read?
If you need to connect to an external DynamoDB instance then you don't have to create a container for it.
You can just pass the required credentials to access the AWS hosted instance through environment variables to the other service container.
Although I do recommend spinning up a local database for development purposes.
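For example, a rough docker-compose sketch of both points: passing AWS credentials to your own service through environment variables, plus an optional dynamodb-local container for development only (the service name my-crud-api and the region are placeholders):

services:
  my-crud-api:
    build: .
    environment:
      AWS_REGION: eu-west-1
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
    # Alternatively, mount the host's credentials instead of passing keys:
    # volumes:
    #   - ~/.aws/:/root/.aws:ro

  # Optional: local DynamoDB for development; point the SDK at http://dynamodb-local:8000 to use it.
  dynamodb-local:
    image: amazon/dynamodb-local
    ports:
      - "8000:8000"

The real AWS-hosted table needs no extra container at all; the application container only needs credentials and a region.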

Google Composer - Airflow Configuration Overrides broke Airflow after update of killed_task_cleanup_time variable

I needed to increase a core Airflow variable: killed_task_cleanup_time. I did it using the Google Composer console. After the update, most DAGs disappeared from the UI. I checked Composer monitoring and noticed that there were no workers at all. In the logs I found the following error messages:
According to the Google Composer docs, I can update Airflow config overrides. There is also information that changes take effect on all Airflow instances (worker, web server, scheduler) approximately 5 minutes after submitting the update request. I have already been waiting 7 hours with a broken Airflow and missing DAGs.
What I tried to do:
restart Google Composer
revert to the previous Airflow configuration
Nothing helped...
Does anybody have an idea what caused this behaviour?
As LEC mentioned in their comment, I would encourage you to open a ticket with Google Cloud Support, so they can look into your project and logs to further troubleshoot this behaviour. Please don't share private data here.
