Adding BigQuery connection with gdrive scopes? - google-cloud-composer

I have an external Sheets table that I want to query via the BigQueryOperator in Airflow.
I would prefer to use the Cloud Composer service account.
I've created a new connection via the Airflow UI with the following parameters:
Conn Id: bigquery_with_gdrive_scope
Conn Type: google_cloud_platform
Project Id: <my project id>
Keyfile path: <none>
Keyfile JSON: <none>
Scopes: https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive
In my DAG, I use: BigQueryOperator(..., bigquery_conn_id='bigquery_with_gdrive_scope')
The log reports: Access Denied: BigQuery BigQuery: No OAuth token with Google Drive scope was found.
The task attributes show: bigquery_conn_id bigquery_with_gdrive_scope
It's almost as though the bigquery_conn_id parameter is being ignored.

Adding GCP API scopes (as in the accepted answer) did not work for us. After a lot of debugging, it appeared that GCP has "root" scopes that are assigned to the environment at creation time and cannot be overridden via Airflow connections. This seems to affect only GCP API scopes.
For reference, we were using composer 1.4.0 and airflow 1.10.0
If you want to add a scope pertaining to GCP on Cloud Composer, you MUST do so when you create the environment. It cannot be modified after the fact.
When creating your environment, be sure to add https://www.googleapis.com/auth/drive. Specifically, you can add the following flag to your gcloud composer environments create command:
--oauth-scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive
Lastly, do not forget to share the document with the service account email (unless you have granted the service account domain-wide access).

In case anyone runs up against the same problem: this version (Composer 1.0.0, Airflow 1.9.0) falls back to gcloud auth unless a Keyfile path or Keyfile JSON is provided, and that fallback ignores any scope arguments.
The master branch of Airflow fixes this, but for now you have to generate a credential file for the service account and tell Airflow where it is located.
There are step-by-step directions here.
For my use-case I created a key for airflow's service account and set up a connection as follows:
Conn Id: bigquery_with_gdrive_scope
Conn Type: google_cloud_platform
Project Id: <my project id>
Keyfile path: <none>
Keyfile JSON: <contents of keyfile for airflow service account>
Scopes: https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive
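For reference, a minimal DAG sketch that uses this connection might look like the following (assuming the Airflow 1.10 contrib import path; the DAG id, dataset, and table names are placeholders):
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

dag = DAG('sheets_query_example', start_date=datetime(2019, 1, 1), schedule_interval=None)

query_sheet = BigQueryOperator(
    task_id='query_external_sheet',
    sql='SELECT * FROM `my_dataset.my_sheets_table`',  # Sheets-backed external table (placeholder)
    use_legacy_sql=False,
    bigquery_conn_id='bigquery_with_gdrive_scope',  # the connection defined above
    dag=dag,
)
On Airflow 1.9 the query argument is named bql rather than sql.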

Related

Berglas not finding my google cloud credentials

I am trying to read my google cloud default credentials with berglas, and it says that:
failed to create berglas client: failed to create kms client: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
And I am passing the right path; I have tried many paths, but none of them work.
$HOME/.config/gcloud:/root/.config/gcloud
I'm unfamiliar with Berglas (please include references) but the error is clear. Google's client libraries attempt to find credentials automatically. The documentation describes the process by which credentials are sought.
Since the credentials aren't being found, you're evidently not running on a Google Cloud compute service (where credentials are found automatically). Have you set the GOOGLE_APPLICATION_CREDENTIALS environment variable, and does it point to a valid service account key file?
The Berglas README suggests using the following command to use your user credentials as Application Default Credentials. You may not have completed this step:
gcloud auth application-default login
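A quick way to check whether Application Default Credentials resolve at all, independent of Berglas, is a short google-auth snippet (purely a diagnostic sketch):
import google.auth

# Raises DefaultCredentialsError if no ADC can be found; otherwise reports
# the credential type and the project the credentials are associated with.
credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
print("Credential type:", type(credentials).__name__)
print("Project:", project)
If this fails as well, either run the gcloud command above or point GOOGLE_APPLICATION_CREDENTIALS at a service account key file.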

How to authorize Google API inside of Docker

I am running an application inside Docker that requires me to use google-bigquery. When I run it outside of Docker, I just have to go to the link below (redacted) and authorize. However, the link doesn't work when I copy-paste it from the Docker terminal. I have tried port mapping as well, with no luck either.
Code:
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
# Make clients.
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
Response:
requests_oauthlib.oauth2_session - DEBUG - Generated new state
Please visit this URL to authorize this application:
Please see the available solutions on this page; it is constantly updated.
gcloud credential helper
Standalone Docker credential helper
Access token
Service account key
In short, you need to use a service account key file. Either keep the key in a secret manager or issue a dedicated service account key for the Docker image.
You need to place the service account key file into the Docker container either at build time or at runtime.
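For example, a minimal sketch of the runtime variant, assuming you mount the key file yourself (the /secrets/key.json default is illustrative; GOOGLE_APPLICATION_CREDENTIALS is the standard variable that google-auth also honours):
import os

from google.cloud import bigquery
from google.oauth2 import service_account

# Wherever the key file was mounted or copied into the container.
key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "/secrets/key.json")

credentials = service_account.Credentials.from_service_account_file(
    key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"])
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
At runtime the file can be supplied with a bind mount such as docker run -v /local/key.json:/secrets/key.json ..., which avoids baking the key into the image.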

GCP docker authentication: How is using gcloud more secure than just using a JSON keyfile?

Setting up authentication for Docker | Artifact Registry Documentation suggests that gcloud is more secure than using a JSON file with credentials. I disagree. In fact I'll argue the exact opposite is true. What am I misunderstanding?
Setting up authentication for Docker | Artifact Registry Documentation says:
gcloud as credential helper (Recommended)
Configure your Artifact Registry credentials for use with Docker directly in gcloud. Use this method when possible for secure, short-lived access to your project resources. This option only supports Docker versions 18.03 or above.
followed by:
JSON key file
A user-managed key-pair that you can use as a credential for a service account. Because the credential is long-lived, it is the least secure option of all the available authentication methods
The JSON key file contains a private key and other goodies giving a hacker long-lived access. The keys to the kingdom. But only to the Artifact Registry in this instance, because the service account that the JSON file belongs to has only those specific rights.
Now gcloud has two auth options:
gcloud auth activate-service-account ACCOUNT --key-file=KEYFILE
gcloud auth login
Let's start with gcloud and a service account: here it stores the KEYFILE unencrypted in ~/.config/gcloud/credentials.db. Using the JSON file directly boils down to docker login -u _json_key --password-stdin https://some.server < KEYFILE, which stores the KEYFILE contents in ~/.docker/config.json. So using gcloud with a service account or using the JSON file directly should be equivalent, security-wise. They both store the same KEYFILE unencrypted in a file.
gcloud auth login requires logging in with a browser, where I consent to giving gcloud access to my user account in its entirety. It is not limited to the Artifact Registry like the service account is. Looking with sqlite3 ~/.config/gcloud/credentials.db .dump I can see that it stores an access_token but also a refresh_token. If a hacker has access to ~/.config/gcloud/credentials.db with access and refresh tokens, doesn't he own the system just as much as if he had access to the JSON file? Actually, this is worse, because my user account is not limited to just accessing the Artifact Registry - now the attacker has access to everything my user has access to.
So all in all: gcloud auth login is at best security-wise equivalent to using the JSON file. But because the access is not limited to the Artifact Registry, it is in fact worse.
Do you disagree?

Run a Docker container under a different service account when using Cloud Build

I am using Cloud Build and would like to run a Docker container under a different service account than the standard Cloud Build service account (A).
The service account I would like to use (B) is from a different project.
One way to do it would be to put the json key on Cloud Storage and then mount it in the Docker container, but I think it should be possible with IAM policies too.
My cloubuild.yaml now contains the following steps:
steps:
- name: 'gcr.io/kaniko-project/executor:v0.20.0'
  args:
  - --destination=gcr.io/$PROJECT_ID/namu-app:latest
  - --cache=true
  - --cache-ttl=168h
- name: 'docker'
  args: ['run', '--network=cloudbuild', 'gcr.io/$PROJECT_ID/namu-app:latest']
The network is set so that the Cloud Build service account is accessible to the Docker container - see https://cloud.google.com/cloud-build/docs/build-config#network.
So I think my container should have access to the Cloud Build service account.
Then I run the following code inside the Docker container:
import socket

from googleapiclient.discovery import build
from google.auth import impersonated_credentials, default

default_credentials, _ = default()
print("Token: {}".format(default_credentials.token))

play_credentials = impersonated_credentials.Credentials(
    source_credentials=default_credentials,
    target_principal='google-play-api@api-0000000000000000-0000000.iam.gserviceaccount.com',
    target_scopes=[],
    lifetime=3600)

TRACK = "internal"
PACKAGE_NAME = 'x.y.z'
APPBUNDLE_FILE = "./build/app/outputs/bundle/release/app.aab"

socket.setdefaulttimeout(300)

service = build('androidpublisher', 'v3')
edits = service.edits()
edit_id = edits.insert(body={}, packageName=PACKAGE_NAME).execute()['id']
However, this fails with:
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/androidpublisher/v3/applications/x.y.z/edits?alt=json returned "Request had insufficient authentication scopes.">
I tried several ways of assigning service account roles, but no luck so far. I thought at first that explicitly 'impersonating' credentials might not be necessary (maybe it can be implicit?).
In summary, I want service account A from project P1 to run as service account B from project P2.
Any ideas?
You might try two alternatives to troubleshoot this issue:
Give the Cloud Build service account the same permissions that the service account you use in your local environment has.
Authenticate with a different approach using a credentials file, as shown in this code snippet:
from apiclient.discovery import build
import httplib2
from oauth2client import client

SERVICE_ACCOUNT_EMAIL = (
    'ENTER_YOUR_SERVICE_ACCOUNT_EMAIL_HERE@developer.gserviceaccount.com')

# Load the key in PKCS 12 format that you downloaded from the Google APIs
# Console when you created your Service account.
f = open('key.p12', 'rb')
key = f.read()
f.close()

# Create an httplib2.Http object to handle our HTTP requests and authorize it
# with the Credentials. Note that the first parameter, service_account_name,
# is the Email address created for the Service account. It must be the email
# address associated with the key that was created.
credentials = client.SignedJwtAssertionCredentials(
    SERVICE_ACCOUNT_EMAIL,
    key,
    scope='https://www.googleapis.com/auth/androidpublisher')
http = httplib2.Http()
http = credentials.authorize(http)
service = build('androidpublisher', 'v3', http=http)
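Note that oauth2client and its SignedJwtAssertionCredentials / P12-key flow are long deprecated; a roughly equivalent sketch with the current google-auth library and a JSON key (file name and scope shown for illustration) would be:
from google.oauth2 import service_account
from googleapiclient.discovery import build

# JSON key downloaded for the service account (illustrative path).
credentials = service_account.Credentials.from_service_account_file(
    'key.json', scopes=['https://www.googleapis.com/auth/androidpublisher'])
service = build('androidpublisher', 'v3', credentials=credentials)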
Using gcloud you can do gcloud run services update SERVICE --service-account SERVICE_ACCOUNT_EMAIL. Documentation also says that
In order to deploy a service with a non-default service account, the deployer must have the iam.serviceAccounts.actAs permission on the service account being deployed.
See https://cloud.google.com/run/docs/securing/service-identity#gcloud for more details.
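As for the 403 in the original snippet: it builds play_credentials but never passes them to build(), and it requests no target scopes, so the discovery client falls back to default credentials that lack the androidpublisher scope. A sketch of the intended impersonation flow (assuming service account A has been granted roles/iam.serviceAccountTokenCreator on service account B; the target principal is the question's placeholder):
from google.auth import default, impersonated_credentials
from googleapiclient.discovery import build

source_credentials, _ = default()

# Request the Android Publisher scope on the impersonated credentials and
# hand those credentials to the discovery client explicitly.
play_credentials = impersonated_credentials.Credentials(
    source_credentials=source_credentials,
    target_principal='google-play-api@api-0000000000000000-0000000.iam.gserviceaccount.com',
    target_scopes=['https://www.googleapis.com/auth/androidpublisher'],
    lifetime=3600)

service = build('androidpublisher', 'v3', credentials=play_credentials)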

Can I run oc commands in openshift pod terminals?

Is there any way that I can run the oc commands on pod terminals? What I am trying to do is let the user login using
oc login
Then run the command to get the token.
oc whoami -t
And then use that token to call the OpenShift REST APIs. This works in my local environment, but on OpenShift there are permission issues; I guess OpenShift doesn't give root permissions to the user, and it says permission denied.
EDIT
So basically I want to be able to get that bearer token I can send in the headers of the REST API calls to create pods, services, routes, etc. And I want that token before any pod is made, because I am going to use that token to create pods. It might sound silly, I know, but what I want to know is whether the way we do it on the command line with oc commands is also possible on OpenShift.
The other possible way could be to call an API that gives me a token and then use that token in other API calls.
@gshipley It does sound like a chicken-and-egg problem to me. But to explain what I do on my local machine: all I would want is to replicate that on OpenShift, if possible. I run the oc commands from Node.js; the oc.exe file is in my repository. I run oc login and oc whoami -t, read the token I get, and store it. Then I send that token as a bearer token in the API headers. That's what works on my local machine. I just want to replicate this scenario on OpenShift. Is it possible?
As a cluster admin, create a new ClusterRole, e.g. as role.yml:
apiVersion: authorization.openshift.io/v1
kind: ClusterRole
metadata:
  name: mysudoer
rules:
- apiGroups: ['']
  resources: ['users']
  verbs: ['impersonate']
  resourceNames: ["<your user name>"]
and run
oc create -f role.yml
or, instead of creating a raw role.yml file, use:
oc create clusterrole mysudoer --verb impersonate --resource users --resource-name "<your user name>"
Then give your ServiceAccount the new role:
oc adm policy add-cluster-role-to-user mysudoer system:serviceaccount:<project>:default
Download the oc tool into your container. Now whenever you execute a command you need to add --as=<user name>, or, to hide that, create a shell alias inside your container:
alias oc="oc --as=<user name>"
oc should now behave exactly as it does on your machine, with the exact same privileges, since the ServiceAccount only functions as an entry point to the API; the real tasks are done as your user.
In case you want something simpler, just add the proper permissions to your ServiceAccount, e.g.
oc policy add-role-to-user admin -z default --namespace=<your project>
If you run that command, any container in your project that has the oc tool will automatically be able to do tasks inside the project. However, this way the permissions are not inherited from the user as in the first approach, so you always have to add them to the ServiceAccount manually as needed.
Explanation: there is always a ServiceAccount in your project called default. It has no privileges and thus cannot do anything; however, all the credentials necessary for authenticating as the ServiceAccount are present by default in every single container. The cool thing is that if you do not provide any credentials and just run oc inside a container in OpenShift, it will automatically try to log in using this account. The steps above simply show how to get the proper permissions onto the account, so that oc can use it to do something meaningful.
In case you simply want to access the REST API, use the token provided in
/var/run/secrets/kubernetes.io/serviceaccount/token
and set up the permissions for the ServiceAccount as described above. With that, you will not even need the oc command line tool.
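For example, a minimal sketch of calling the API with that token from inside a pod (the namespace, resource, and use of the requests library are illustrative; the CA bundle sits next to the token):
import requests

TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"

with open(TOKEN_PATH) as f:
    token = f.read().strip()

# List pods in the project via the in-cluster API server, which is reachable
# at the well-known kubernetes.default.svc address.
resp = requests.get(
    "https://kubernetes.default.svc/api/v1/namespaces/<your project>/pods",
    headers={"Authorization": "Bearer {}".format(token)},
    verify=CA_PATH)
print(resp.status_code)
The same bearer token works for the OpenShift-specific endpoints (routes, projects, and so on) as long as the ServiceAccount has the corresponding permissions.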
