I am migrating a micro-service system to Helm. The system has roughly 30 distinct deployments; which of them are required depends on the installation context. We are using Helm 3. Our current layout is a three-tier chart/subchart hierarchy organised by functionality that may or may not be required in a given context. Subcharts grouped under a second-level subchart usually need to be enabled/disabled together, which is easy to do by disabling their parent in the top-level values file. However, there are some scenarios where grand-child charts depend on an "uncle" chart, and I'm having difficulty finding an elegant solution to these situations.
What are strategies that have been used successfully in other charts?
Two scenarios that currently fall into this category for me are:
I would like to have a global "feature flag" that lets the installer decide whether a PVC should be created and mounted on the applicable pods so that they can log to a central place for retrieval later (ELK, I know, I know...). If the flag is set, the PVC is created and the deployments mount it; if not, no PVC is created and an emptyDir is used instead (see the first sketch below).
Some of the deployments use a technical "account" to communicate with each other. When these services are enabled, I'd like to create a secret with the username/password and run a Job to create the user in our identity provider. That same secret would then be injected into the applicable deployments' environment variables. There are a handful of these technical accounts, each reused by multiple deployments, so I'd like to create their secret and run the user-creation job only once (see the second sketch below).
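To make scenario 1 concrete, here is a rough sketch of what I'm imagining, relying on the fact that Helm 3 exposes `.Values.global.*` to every subchart (the flag name `global.centralLogging` is just a placeholder, and it would need to be declared with a default in the top-level values.yaml):

```yaml
# templates/central-logs-pvc.yaml in whichever chart owns the shared volume;
# only rendered when the global flag is on (assumes global.centralLogging is
# declared, default false, in the top-level values.yaml; ReadWriteMany needs
# a storage class that supports it)
{{- if .Values.global.centralLogging }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ .Release.Name }}-central-logs
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 10Gi
{{- end }}
```

and each applicable deployment template switching its log volume on the same flag:

```yaml
      volumes:
        - name: app-logs
          {{- if .Values.global.centralLogging }}
          persistentVolumeClaim:
            claimName: {{ .Release.Name }}-central-logs
          {{- else }}
          emptyDir: {}
          {{- end }}
```

For scenario 2, the shape I'm picturing is a single shared chart that owns the Secret and the user-creation Job behind one global flag, while the consuming deployments only reference the Secret by name via `envFrom`/`secretRef` (again, all names here are placeholders):

```yaml
# shared "accounts" chart, templates/account-a.yaml
# (assumes global.accounts.* is declared in the top-level values.yaml)
{{- if .Values.global.accounts.serviceA }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ .Release.Name }}-account-a
type: Opaque
stringData:
  username: account-a
  password: {{ .Values.global.accounts.serviceAPassword | quote }}
{{- end }}
```

I'm not sure this is the idiomatic way to handle the "grand-child depends on an uncle" case, hence the question.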
Many thanks for any hints or helpful experience that you can send my way.
I have a Docker image that will execute different logic depending on the environment variables that it is run with. You can imagine that running it with VAR=A will produce slightly different logic compared to running it with VAR=B.
We have an application that is meant to allow users to kick off tasks that are defined within this Docker image. Depending on the user attributes, different environment variables will need to be passed into the Docker container when it is run. The task is meant to run each time an event is generated through user action and then the container should shut down/be removed.
I'm trying to determine if GCP has any container services that best match what I'm looking for. My understanding of some of the services is:
Cloud Functions - can work well for consuming events and taking specific actions each time an event is triggered, but it is not suited for containerized workloads.
Cloud Run - a serverless way of deploying containers. As I understand it, a deployment on Cloud Run spins up a "service", and the environment variables must be passed in as part of the service definition. Because we may have a large number of combinations of environment variables (many of which may need to be running at once), it seems this would end up creating a large number of services, which feels potentially clunky. This approach seems better for deploying a single service with static environment variables that needs to be auto-scaled by GCP.
GKE - another container orchestration platform. This is what I'm considering at the moment. The idea is that we would define a single Job definition that can vary according to the environment variables passed into it (a sketch of what I mean is below). The problem is that these Jobs would need to be kicked off dynamically via code. This is fairly straightforward with kubectl, but the Kubernetes REST API seems fairly underdeveloped (or at least not that well documented), and the lack of information online on how to start Jobs on-demand through the Kubernetes API makes me question whether this is the best approach.
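To make the GKE idea concrete, this is roughly the single Job definition I would want to submit programmatically for each event, with only the `env` block varying per user (image name and variables are hypothetical):

```yaml
# job.yaml - created per event via the Kubernetes API (or `kubectl create -f job.yaml`;
# generateName does not work with `kubectl apply`)
apiVersion: batch/v1
kind: Job
metadata:
  generateName: user-task-        # the API server appends a random suffix per run
spec:
  ttlSecondsAfterFinished: 300    # auto-clean the Job and its pod after completion
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: task
          image: gcr.io/my-project/task-image:latest
          env:
            - name: VAR
              value: "A"          # swapped out per user/event before submission
```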
Are there any tools that I'm missing that would be useful in spinning up containers on-demand with dynamic sets of environment variables and removing them when done?
I have a Jenkins instance that has users on different projects. I also have a Bitbucket instance where the users store their code. Push/pull access to the code repositories is authenticated by SSH public/private keys. Not all users in Bitbucket have access to all repositories.
I want users to be able to configure their jobs to use their own private keys to pull source from Bitbucket, but users should not be able to access other users' private keys. Is this possible with a combination of the Jenkins Credentials plugin and the Git plugin? If so, how do I configure this?
If this is not possible with the Jenkins Credentials plugin/Git plugin, how can I implement this in Jenkins? I figure enterprises that use Jenkins must have resolved this problem but I can't find a solution anywhere.
Jenkins has a lot of inherent security issues in this respect and allowing users to have configuration permissions in a multi-tenant environment is very difficult to lock down.
Even if you were able to set permissions on a per-credential basis, a user with configuration permissions who can set up their own freestyle job can easily run processes on the nodes that scrape passwords from the environment of other tenants. This can even be done with background processes, even if you limit each node to one executor slot.
From what I've seen in industry and have used myself at enterprise scale, there are two high-level recommendations I would suggest:
1. Break up the single instance into multiple instances so each set of users or teams can have their own instance to work with.
2. Evaluate what users' functional needs are and provide a capability to request jobs, so that configuration permissions do not need to be given to users, only build/read permissions.
For Item 1:
Breaking up the instances not only helps Jenkins management from a security perspective, but also from a scaling perspective, since there are several issues you can run into with Jenkins once it reaches a certain size (e.g. users that are resource hogs, unstashing bottlenecks, archiving bottlenecks, poorly written pipelines, etc.). These scaling problems typically lead to a need to vertically scale the Jenkins master.
However, this approach has its own set of issues to solve, since you now have multiple instances to maintain, but that is typically a bit easier to manage, and there are some off-the-shelf solutions available if you're willing to pay the price (e.g. Cloudbees CI). Managing multiple instances can also be solved in-house if you're willing to write some scripts or set up a service to handle it. Personally I'm more a fan of the in-house solution than the paid one, since I lean towards controlling one's own destiny, and off-the-shelf solutions aren't always the one-size-fits-all they claim to be.
For Item 2:
If you really want to keep a single instance, the best way to secure it is to not let users have configuration permissions. As mentioned above, Jenkins has a lot of inherent security issues that make it poorly suited to letting users configure jobs in a multi-tenant setting. By evaluating users' needs, you often find a lot of common requirements that could be met by shared job templates without handing out configuration permissions.
Leveraging the Job DSL plugin to parameterize job creation is one way to do this. Parameters could then be provided either through a custom service or through configuration files committed to a git repo. Another approach is to leverage the Jenkins REST API directly, with a custom service that posts new job configurations generated from common job templates.
However, this approach could still run into scaling problems in the long term if utilization of the Jenkins instance is expected to increase. These scaling problems are not insurmountable and can be mitigated with vertical scaling or by offloading some stashing/archiving activity, but at a certain point it might make sense to re-evaluate going with Item 1, or even a combination of Item 1 and Item 2.
Conclusion:
I know this is not likely the answer you were hoping for, but if security is a major concern, then a multi-tenant Jenkins instance that gives users configuration permissions is not the way to go.
So I have been building my application mostly as a 12-factor app and am now looking at the config part.
Right now I have separate config files for dev and production, and through the build process we build either a dev or a production image. The code is 100% the same; the only thing that changes is the config.
Now I 100% understand that in a 12-factor app the config should come from an external source such as environment variables, or maybe a safe store like Vault, etc.
What the various articles and blogs fail to mention about the config is how it is stored and processed. If the code is separated into its own git repo and has no config stored with it, how do we handle the config?
Do we store the actual config values in a separate git repo and then somehow merge/push/apply them to the target environment (Kubernetes ConfigMap, Marathon JSON config, Vault, etc.) through the build process using some kind of trigger?
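For example (Kubernetes flavour, names hypothetical): I imagine the config repo holding a ConfigMap per environment that some pipeline applies to the target cluster, while the application's deployment manifest only references it by name via `envFrom`/`configMapRef`:

```yaml
# config repo: environments/production/app-config.yaml (non-sensitive values only)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  API_ENDPOINT: "https://api.example.com"
  LOG_LEVEL: "info"
```

Is that the general idea, or is there a better-established pattern?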
There is no standard, but I've been observing some common behaviors, like:
Sensitive information never goes into the version control system, especially git, which is a DVCS (the repo can be cloned to other locations). If that doesn't convince you, remember that our existing "security system" is based on the inability to read encrypted info within a certain time, but at some point you might become able to read it. On Kubernetes I usually see operators managing the service account across multiple namespaces, with everything else only referencing that service account; tools like KMS, cert-manager, Vault, etc. are welcome here.
Configuration like env vars and endpoints is stored and versioned with its own "lifecycle".
12-factor does not mean you must separate your app's configuration from your repository; instead it suggests not putting it into your app (i.e. into your container image or even your binary distribution).
In fact, if you want to use a separate repo only for config you can do that, but you can equally keep the configuration alongside your project's source code. It is more a decision based on the size of the project, its complexity, segregation of duties, and team context (IMHO).
In my case, for instance, it makes sense to keep config in a dedicated repository: the production environment has more than 50 clusters, each with its own isolated stack, and there are different teams managing their own services while using common backing services (DB, API, streams...). In my opinion, the more complex and cross-shared things get, the more sense it makes to keep config in an independent repository, since there are several teams and resources spread over multiple clusters.
Building an AWS serverless solution (Lambda, S3, CloudFormation, etc.) I need an automated build solution. The application should be stored in a Git repository (preferably Bitbucket or CodeCommit). I looked at Bitbucket Pipelines, AWS CodePipeline, CodeDeploy, and hosted CI/CD solutions, but it seems that all of these do something static: they receive a dumb signal that something changed and rebuild the whole environment... as if it were one app, not a distributed application.
I want to define ordered steps of what to do per change, depending on the file type.
E.g.
1. every updated .js file containing lambda code should first be used to update the existing lambda
2. after that, every new or changed CloudFormation file/stack should be used to update or create existing ones; there may be a required order (they import values from each other)
3. after that, the code for new lambdas in .js files should be used to update the code of the lambdas created in the previous step.
Resources that did not change should NOT be updated or recreated!
It seems that my pipelines should be ordered AND have the ability to filter input (e.g. only .js files from a certain path), and also receive the name(s) of the changed resource(s) as input.
I don't seem to find this functionality within AWS, hosted git solutions like Bitbucket, or CI/CD pipelines like CircleCI, Codeship, AWS CodePipeline, CodeDeploy, etc.
How come? Doesn't anyone need this? Seems like a basic requirement in my eyes....
I looked again at available AWS tooling and got to the following conclusion:
When coupling CodePipeline to a CodeCommit repository, every commit puts a whole snapshot of the repository on S3 as input for the pipeline. So not only the changes, but everything.
In CodePipeline there is the orchestration functionality I was looking for. You can have actions for every component, like create-change-set for a SAM component and execute-change-set, etc., and have control over the order of all of them (see the sketch below).
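To sketch what I mean by controlling the order (stack, role, and artifact names here are placeholders), a deploy stage in the pipeline definition can chain a create-change-set action and an execute-change-set action via RunOrder:

```yaml
# excerpt of an AWS::CodePipeline::Pipeline stage (CloudFormation YAML);
# the first action creates a change set for the SAM stack, the second executes it
- Name: Deploy
  Actions:
    - Name: CreateChangeSet
      RunOrder: 1
      ActionTypeId:
        Category: Deploy
        Owner: AWS
        Provider: CloudFormation
        Version: "1"
      InputArtifacts:
        - Name: BuildOutput
      Configuration:
        ActionMode: CHANGE_SET_REPLACE
        StackName: my-sam-stack
        ChangeSetName: my-sam-stack-changes
        TemplatePath: BuildOutput::packaged.yaml
        Capabilities: CAPABILITY_IAM
        RoleArn: arn:aws:iam::123456789012:role/cfn-deploy-role
    - Name: ExecuteChangeSet
      RunOrder: 2
      ActionTypeId:
        Category: Deploy
        Owner: AWS
        Provider: CloudFormation
        Version: "1"
      Configuration:
        ActionMode: CHANGE_SET_EXECUTE
        StackName: my-sam-stack
        ChangeSetName: my-sam-stack-changes
```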
But:
Since all the code is given as input, I assume all actions in the pipeline will be triggered even for a small change that does not affect 99% of the resources. Under the hood, SAM or CloudFormation will determine for themselves what did or did not change, but it is not very efficient. See my post here.
I cannot see in the pipeline overview which pipeline was run last and what its status was...
I cannot temporarily disable a pipeline or trigger one with custom input.
In the end I think I will make a main pipeline with custom Lambda code that determines what actually changed using the CodeCommit API, and split all actions into sub-pipelines. From the main pipeline I will push the input they need to S3 and execute them.
(I'm not allowed to comment, so I'll try and provide an answer instead - probably not the one you were hoping for :) )
There is definitely a need, and at Codeship we're looking into how best to support FaaS/serverless workflows. It's been a bit of a moving target over the last few years, but common practices are starting to emerge and mature to a point where it makes more sense to start codifying them.
For now, it seems most people working in this space have resorted to scripting (either the Serverless framework, or directly against the FaaS providers), but everyone is struggling with the issue of deploying just what has changed vs. deploying everything, as you point out. Adding further complexity with sequencing obviously just makes things harder.
Most services (Codeship included) will allow you some form of sequenced/stepped approach to deploying, but you'll have to do all the heavy lifting of working out what has changed etc.
As to your question of "How come?", I think it's purely down to how fast the tooling has been changing lately, combined with how few are really doing it. There's a huge push for larger companies to move to K8s, and I think they've basically just drowned out the FaaS adopters. Not that it should be like that, or that we at Codeship don't want to change it; it's just how I personally see things.
Just a quick question about best practices for creating Docker images for critical environments. As we know, in the real world the team/company deploying to internal test is often not the same as the one deploying to client test environments and production. This becomes a problem because not all app configuration info may be available when the UAT/production Docker image is created, e.g. with Jenkins. And then there is the question of passwords that are stored in app configuration.
So my question is, how "fully configured" should the Docker image be? The way I see it, it is in practice not possible to fully configure the Docker image; some app passwords etc. must be left out. But then again, doesn't this slightly defeat the purpose of a Docker image?
how "fully configured" should the Docker image be? The way I see it, it is in practice not possible to fully configure the Docker image; some app passwords etc. must be left out. But then again, doesn't this slightly defeat the purpose of a Docker image?
There will always be tradeoffs between convenience, security, and flexibility.
An image that works with zero runtime configuration is very convenient to run, but it is not very flexible, and sensitive config like passwords ends up baked in and exposed.
An image that takes all configuration at runtime is very flexible and doesn't expose sensitive info, but can be inconvenient to use if default values aren't provided. If a user doesn't know some values they may not be able to use the image at all.
Sensitive info like passwords usually lands on the runtime side when deciding what configuration to bake into images and what to require at runtime. However, this isn't always the case. As an example, you may want to build test images with zero runtime configuration that only point to test environments. Everyone has access to the test environment credentials anyway, zero configuration is more convenient for testers, and no one can accidentally run a build against the wrong database.
For configuration other than credentials (e.g. app properties, loglevel, logfile location) the organizational structure and team dynamics may dictate how much configuration you bake in. In a devops environment making changes and building a new image may be painless. In this case it makes sense to bake in as much configuration as you want to. If ops and development are separate it may take days to make minor changes to the image. In this case it makes sense to allow more runtime configuration.
Back to the original question, I'm personally in favor of choosing reasonable defaults for everything except credentials and allowing runtime overrides only as needed (convention with reluctant configuration). Runtime configuration is convenient for ops, but it can make tracking down issues difficult for the development team.
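As an illustrative sketch (image name and variables are made up): the image ships with sensible defaults baked in, and only the credential and a deliberate override are supplied at run time:

```yaml
# docker-compose.yml -- only credentials and explicit overrides come from the environment
services:
  app:
    image: myorg/myapp:1.4.2
    environment:
      DB_PASSWORD: ${DB_PASSWORD}   # injected from the host environment or a secret store, never baked in
      LOG_LEVEL: debug              # overrides the default (e.g. "info") baked into the image
```

Everything not listed here falls back to the defaults built into the image, which keeps day-to-day runs simple while still letting ops override things when they really need to.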