Airflow fernet key does not mask credentials - docker

I am using Apache Airflow 2.2.3 with Python 3.9 and run everything in docker containers.
When I add connections to Airflow I do it via the GUI, because this way the passwords are supposed to be encrypted. In order for the encryption to work I installed the Python package "apache-airflow[crypto]" on my local machine and generated a Fernet key that I then put into my docker-compose.yaml as the variable "AIRFLOW__CORE__FERNET_KEY: 'MY_KEY'".
I also added the package "apache-airflow[crypto]" to my Airflow repository's requirements.txt so that Airflow can handle Fernet keys.
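For reference, the key was generated more or less like this (a minimal sketch; Fernet comes from the "cryptography" package that apache-airflow[crypto] pulls in):

    from cryptography.fernet import Fernet

    # Generates a url-safe, base64-encoded 32-byte key suitable for
    # AIRFLOW__CORE__FERNET_KEY in docker-compose.yaml.
    fernet_key = Fernet.generate_key()
    print(fernet_key.decode())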
My questions are the following:
When I add the Fernet key as an environment variable as described, I can see it in the docker-compose.yaml, and when I enter the container os.environ["AIRFLOW__CORE__FERNET_KEY"] shows it as well - isn't that unsafe? As far as I understand it, credentials can be decrypted using this Fernet key.
When I add connections to Airflow I can get their properties via the container CLI by using "airflow connections get CONNECTION_NAME". Although I added the Fernet key, I see the password in plain text here - isn't that supposed to be hidden?
Unlike passwords, the values (e.g. connection strings) in the GUI's "Extra" field do not disappear and remain readable in the GUI. How can I hide those credentials from the GUI and from the CLI?
The Airflow GUI tells me that my connections are encrypted, so the encryption does seem to have worked somehow. But what does that statement mean when I can clearly see the passwords?

I think you are making wrong assumptions about "encryption" and "security". The assumption that you can keep secrets away from a user who has access to the running software (which the Airflow CLI gives you) is unrealistic and not really "physically achievable".
The Fernet key is used to encrypt data "at rest" in the database. If your database content is stolen (but not your Airflow program/configuration), your data is protected. This is the ONLY purpose of the Fernet key: it protects your data stored in the database "at rest". But once you have the key (from the Airflow runtime), you can decrypt it. Usually the database sits on some remote server and has backups. As long as the backups are not kept together with the key, then even if your database or a backup gets "stolen" while your Airflow instance stays "safe", no one will be able to use that data.
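To make that concrete, here is a minimal sketch of the primitive Airflow builds on (not Airflow's actual code; the example password is made up):

    import os
    from cryptography.fernet import Fernet

    fernet = Fernet(os.environ["AIRFLOW__CORE__FERNET_KEY"])

    # What ends up in the metadata database is an opaque Fernet token ...
    token = fernet.encrypt(b"my-connection-password")
    print(token)                   # gAAAAAB... - useless to whoever steals only the DB

    # ... but anyone who also holds the key can trivially reverse it.
    print(fernet.decrypt(token))   # b'my-connection-password'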
Yes. If you have access to a running Airflow instance, you are supposed to be able to read passwords in clear text. How else would you expect Airflow to work? It needs to read the passwords to authenticate. If you can run the Airflow program, the data needs to be accessible to it, and by extension to you; there is no way around that, and it cannot be done differently - that is impossible by design. What you CAN do to protect your data better is use a Secrets Manager: https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html - but at most that gives you the possibility of rotating secrets frequently. Airflow, when running, needs access to those passwords, otherwise it could not, well, authenticate. And once you have access to the Airflow runtime (for example via the CLI), there is no way to prevent access to the passwords Airflow has to know at runtime. This is a basic property of any system that needs to authenticate with external systems and is accessible at runtime. Airflow is written in Python and you can easily write code that uses its runtime, so there is no way to physically protect the runtime passwords that "Airflow core" needs in order to connect and communicate with external systems. Once you have access to the system, you have - by definition - access to all secrets that system uses at runtime. No system in the world can do it differently; that's just the nature of it. Frequent rotation and temporary credentials are the only way to deal with this, so that a potentially leaked credential cannot be used for long.
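For example, pointing Airflow at AWS Secrets Manager needs only two more environment variables in the same docker-compose.yaml (a sketch based on the linked docs; verify the class path and kwargs against your Airflow/provider version):

    environment:
      AIRFLOW__SECRETS__BACKEND: 'airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend'
      AIRFLOW__SECRETS__BACKEND_KWARGS: '{"connections_prefix": "airflow/connections"}'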
Modern Airflow (2.1+, I believe) has a secret masker that also masks sensitive data in extras when you specify it: https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/mask-sensitive-values.html. The secret masker also masks sensitive data in logs, because logs can be archived and backed up as well, so - similarly to the database - it makes sense to protect them. The UI - unlike the CLI (which gives you access to the runtime of the Airflow "core") - is just a front-end and does not give you access to the running core, so masking sensitive data there also makes sense.
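By default, fields whose names look sensitive (such as "password", "secret", "api_key" or "access_token") are masked in the UI and logs; additional field names can be declared via configuration, e.g. in docker-compose.yaml (a sketch - the extra names listed are just examples):

    environment:
      # comma-separated list of additional field names to treat as sensitive
      AIRFLOW__CORE__SENSITIVE_VAR_CONN_NAMES: 'connection_string,sas_token'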

Related

Is it safe to store secrets in serverless.yml?

I am working with an AWS free tier account, and Amazon charges for custom secrets. I am creating a Lambda function that needs access to secrets. I came across this post on how to manage secrets in serverless. Can someone please help me understand if approach 1 of storing them locally is safe? Further, is it safe to just put them in the yml file if you are not going to check it in anywhere?
It is a valid approach; however, it is not recommended in the long run for production systems, as it has a few potential issues:
secrets need to be stored on your local machine; if your machine gets compromised, so are your secrets
they are stored in plaintext in the generated CloudFormation template, and if someone gets access to that template, they will be able to use them. Please keep in mind that the generated CF template gets stored in an S3 bucket in plaintext, which means that in the end you'll be storing your secrets unencrypted in an S3 bucket
Though, if it's just your personal project, that approach should work just fine for you and will be relatively safe.
The recommended way to do it though is to fetch and decrypt the secrets at runtime, as described in #4 of the cited article.
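A minimal sketch of that runtime approach for a Lambda handler, using boto3 and SSM Parameter Store (standard parameters are free, unlike Secrets Manager secrets; the parameter name is hypothetical):

    import boto3

    ssm = boto3.client("ssm")

    def handler(event, context):
        # Fetch and decrypt the SecureString parameter at runtime - nothing
        # secret ever lands in serverless.yml or the CloudFormation template.
        resp = ssm.get_parameter(Name="/my-app/api-key", WithDecryption=True)
        api_key = resp["Parameter"]["Value"]
        # ... use api_key ...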

ec2 roles vs ec2 roles with temporary keys for s3 access

So I have a standard Rails app running on ec2 that needs access to s3. I am currently doing it with long-term access keys, but rotating keys is a pain, and I would like to move away from this. It seems I have two alternative options:
One, tagging the EC2 instance with a role that has the proper permissions to access the S3 bucket. This seems easy to set up, yet not having any access keys seems like a bit of a security threat. If someone were able to access the server, it would be very difficult to stop access to S3. Example
Two, I can 'assume the role' using the Ruby SDK and the STS classes to get temporary access keys from the role, and use them in the Rails application. I am pretty confused about how to set this up, but could probably figure it out. It seems like a very secure method, however, as even if someone gets access to your server, the temporary access keys make it considerably harder to access your S3 data over the long term. General methodology of this setup.
I guess my main question is which should I go with? Which is the industry standard nowadays? Does anyone have experience setting up STS?
Sincere thanks for the help and any further understanding on this issue!
All of the methods in your question require AWS Access Keys. These keys may not be obvious but they are there. There is not much that you can do to stop someone once they have access inside the EC2 instance other than terminating the instance. (There are other options, but that is for forensics)
You are currently storing long term keys on your instance. This is strongly NOT recommended. The recommended "best practices" method is to use IAM Roles and assign a role with only required permissions. The AWS SDKs will get the credentials from the instance's metadata.
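To illustrate the role-based setup (the question uses the Ruby SDK, which resolves credentials the same way; here is a boto3 sketch with a made-up bucket name):

    import boto3

    # No access key or secret key anywhere in code or config: the SDK pulls
    # temporary credentials from the EC2 instance metadata service and
    # refreshes them automatically as they expire.
    s3 = boto3.client("s3")
    s3.download_file("my-bucket", "reports/latest.csv", "/tmp/latest.csv")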
You are giving some thought to using STS. However, you need credentials to call STS to obtain temporary credentials. STS is an excellent service, but it is designed for handing out short-term temporary credentials to others - such as the case where your web server creates credentials via STS to hand to your users for limited use cases, such as accessing files on S3 or sending an email. The fault in your thinking about STS is that once the bad guy has access to your server, he will just steal the keys that you call STS with, thereby defeating the purpose of calling STS.
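For completeness, a sketch of what an STS call actually returns - note the call itself must still be signed with some credentials, which is exactly the catch described above (role ARN and session name are hypothetical):

    import boto3

    sts = boto3.client("sts")
    resp = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/s3-read-only",
        RoleSessionName="rails-app",
        DurationSeconds=3600,
    )
    # Short-lived credentials: AccessKeyId, SecretAccessKey, SessionToken, Expiration
    creds = resp["Credentials"]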
In summary, follow best practices for securing your server such as NACLs, security groups, least privilege, minimum installed software, etc. Then use IAM Roles and assign the minimum privileges to your EC2 instance. Don't forget the value of always backing up your data to a location that your access keys CANNOT access.

Can libgit2sharp rely on the installed git global configuration provider?

I'm wiring up some LibGit2Sharp code to VSO, so I need to use alternate credentials to access it. (NTLM won't work) I don't want to have to manage these cleartext credentials - I'm already using git-credential-winstore to manage them, and I'm happy logging onto the box if I ever need to update those creds.
I see that I can pass in DefaultCredentials and UsernamePassword credentials - is there any way I can get it to fetch the creds from the global git cred store that's already configured on the machine?
Talking to external programs is outside of the scope of libgit2, so it won't talk to git's credential helper. It's considered to be the tool writer's responsibility to retrieve the credentials from the user, wherever they may be.
The credential store is a helper for the git command-line tool to integrate with whatever credential storage you have in your environment while keeping that logic outside of the main tool, which needs to run in many different places. It is not something that's core to a repository, but a helper for the user interface.
When using libgit2, you are the one writing the tool which users interact with, and thus you know best how to get to the environment-specific storage. What libgit2 wants to know is exactly what it should answer to the authentication challenge, as any kind of guessing on its part is going to make everyone's lives harder.
Since the Windows credential storage is accessed through an API, it's not out of the question to support some convenience functions to transform from that credential storage into what libgit2's callback wants, but it's not something where libgit2 can easily take the initiative.
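To illustrate the division of labour (the question is about LibGit2Sharp, but the shape is the same in any libgit2 binding): your code looks the secret up wherever it lives and answers the challenge through the credentials callback. A sketch in Python with pygit2 and keyring (which reads the Windows Credential Manager on Windows); the service and user names are hypothetical:

    import keyring
    import pygit2

    # Look the password up in the OS credential store ourselves ...
    username = "me@example.com"
    password = keyring.get_password("vso-git", username)

    # ... and hand it to libgit2 as the answer to the auth challenge.
    callbacks = pygit2.RemoteCallbacks(
        credentials=pygit2.UserPass(username, password)
    )

    repo = pygit2.Repository(".")
    repo.remotes["origin"].fetch(callbacks=callbacks)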

Only allow an Openshift app to be connected with another one

I am currently using the free version of OpenShift. I have a scalable Ruby on Rails + Postgres app using 2 of my gears and a separate (potentially scalable) Elasticsearch app using the 3rd gear.
The elasticsearch app was generated using https://github.com/rbrower3/openshift-elasticsearch-cartridge
Since Elasticsearch runs as an app on its own URL, it is open to attack from the outside world if someone finds out its web address.
I have considered the elasticsearch-jetty plugin, although I've not yet managed to lock it down with a username and password successfully. I was wondering whether there are any other options for limiting access to my Elasticsearch OpenShift app, e.g. using Apache somehow, so that only my other app can make connections to it (which would need to be read and write - updating the Elasticsearch index as well as selecting data from it).
Thanks
The most basic answer is that we support .htaccess for Apache, where you can specify a username and password. The other option is to add some other auth mechanism in front of your Elasticsearch by modifying the code in the repo. I am not familiar enough with a default Elasticsearch install to know which specific mechanism you could use.
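A minimal sketch of the .htaccess approach (paths are placeholders; the .htpasswd file is created with the htpasswd utility):

    AuthType Basic
    AuthName "Restricted Elasticsearch"
    AuthUserFile /var/lib/openshift/APP_ID/app-root/data/.htpasswd
    Require valid-user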

API keys and secrets used in iOS app - where to store them?

I'm developing for iOS and I need to make requests to certain APIs using an API key and a secret. However, I wouldn't like for it to be exposed in my source code and have the secret compromised when I push to my repository.
What is the best practice for this case? Write it in a separate file which I'll include in .gitignore?
Thanks
Write it in a separate file which I'll include in .gitignore?
No, don't write it ever.
That means:
you don't write that secret within your repo (no need to gitignore it, or to worry about adding/committing/pushing it by mistake)
you don't write it anywhere on your local drive (no need to worry about your computer stolen with that "secret" on it)
Store in your repo a script able to fetch that secret from an external source (outside of the git repo) and load it into memory.
This is similar to a git credential-helper process: that script would launch a process listening on localhost:port in order to serve that "secret" to you whenever you need it, in the current session only.
Once the session is done, there is no trace left.
And that is the best practice to manage secret data.
You can trigger that script automatically on git checkout if you declare it in a .gitattributes file as a content filter:
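For example (the filter name, scripts and target file below are placeholders; the smudge script fetches the secret from the external source on checkout, and the clean script strips it again on add/commit so it never enters the repo):

    # .gitattributes
    config/secrets.yml filter=injectSecret

    # one-time local configuration of the filter driver
    git config filter.injectSecret.smudge ./inject_secret.sh
    git config filter.injectSecret.clean  ./strip_secret.sh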
This is a very old question, but if anyone is seeing this on Google I would suggest you try CloudKit for storing any app secrets (API keys, OAuth secrets). Only your app can access its app container, and communication between Apple and your app is secure.
You can check it out here.
