As far as I can see, there are three ways to make Terraform use prepopulated plugins (to prevent downloads from the web on the init command):
terraform providers mirror command + provider_installation in .terraformrc (or terraform.rc)
terraform init -plugin-dir command
warming up the provider plugin cache
Are they all equivalent? Which one is recommended? My use case is building a "deployer" Docker image for a CI/CD pipeline, and I am also considering the possibility of using Terraform under Terraspace.
The first two of these are connected in that they both share the same underlying mechanism: the "filesystem mirror" plugin installation method.
Using terraform init -plugin-dir makes Terraform in effect construct a one-off provider_installation block which contains only a single filesystem_mirror block referring to the given directory. It allows you to get that effect for just one installation operation, rather than configuring it in a central place for all future commands.
Specifically, if you run terraform init -plugin-dir=/example then that's functionally equivalent to the following CLI configuration:
provider_installation {
  filesystem_mirror {
    path = "/example"
  }
}
The plugin cache directory is different because Terraform will still access the configured installation methods (by default, the origin registry for each provider) but will skip downloading the plugin package file (the file actually containing the plugin code, as opposed to the metadata about the release) if it's already in the cache. Similarly, it'll save any new plugin package it downloads into the cache for future use.
This therefore won't stop Terraform from trying to install any new plugins it encounters via network access to the origin registry. It is just an optimization to avoid re-downloading the same package repeatedly.
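For reference, the cache is enabled via the plugin_cache_dir setting in the CLI configuration (or the TF_PLUGIN_CACHE_DIR environment variable); the path below is just an illustration:

plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"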
There is a final approach which is similar to the first one but with a slight difference: Implied Local Mirror Directories.
If you don't have a provider_installation block in your configuration then Terraform will construct one for itself by searching the implied mirror directories and treating any provider it finds there as a local-only one. For example, if /usr/share/terraform/plugins contains any version of registry.terraform.io/hashicorp/aws (the official AWS provider) then Terraform will behave as if it were configured as follows:
provider_installation {
  filesystem_mirror {
    path    = "/usr/share/terraform/plugins"
    include = ["registry.terraform.io/hashicorp/aws"]
  }
  direct {
    exclude = ["registry.terraform.io/hashicorp/aws"]
  }
}
This therefore makes Terraform treat the local directory as the only possible installation source for that particular provider, but still allows Terraform to fetch any other providers from upstream if requested.
If your requirement is for terraform init to not consult any remote services at all for plugin installation, the approach directly intended for that case is to write a provider_installation block that contains only a filesystem_mirror block. That disables the direct {} installation method and thus prevents Terraform from trying to access the origin registry for any provider.
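For example, a minimal CLI configuration of that shape might look like the following (the directory is just an illustration; you could populate it ahead of time with terraform providers mirror):

provider_installation {
  filesystem_mirror {
    path = "/usr/share/terraform/plugins"
  }
}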
I'm not sure about Terraspace, so this is only about the plugins:
terraform providers mirror command + provider_installation in .terraformrc (or terraform.rc): this seems the more secure option, but it requires updating the local mirror whenever you change plugin versions. It's not very clear whether you can reuse the same mirror location for different configurations that require different sets or versions of plugins.
terraform init -plugin-dir command: terraform commands will fail if the required plugins and specific versions are not preinstalled. This approach seems to be the most time-consuming and the most restrictive about which plugins are available: when this option is used, only the plugins in the given directory are available for use.
warming up the provider plugin cache: this one can reuse pre-downloaded plugin versions and will also try to download new versions when you update the constraints. This approach works if your cache path is writable; if it is not, Terraform will probably fail as in the second option. It seems to be the least time-consuming and the closest to local development (see the sketch below). The cache is not cleaned up automatically and will need some cleaning automation.
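As a rough sketch of the cache-warming idea for a "deployer" image (the paths and the throwaway warmup configuration are assumptions, and -chdir needs Terraform 0.15+):

# Fragment of a hypothetical deployer Dockerfile
ENV TF_PLUGIN_CACHE_DIR=/opt/terraform-plugin-cache
# warmup/ holds a minimal .tf file declaring the provider version constraints you need
COPY warmup/ /tmp/warmup/
RUN mkdir -p "$TF_PLUGIN_CACHE_DIR" && terraform -chdir=/tmp/warmup init -backend=false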
Depending on whether you have many different configurations, what level of security is required, and whether you have the capacity to update caches/mirrors frequently enough to keep up with the required versions, the choice could differ as well.
Is it possible to manage Dockerfile for a project externally
So instead of ProjectA/Dockerfile and ProjectB/Dockerfile, can we do something like project-deploy/Dockerfile.ProjectA and project-deploy/Dockerfile.ProjectB, which would somehow know how to build the ProjectA and ProjectB Docker images?
We would like to allow separation of the developer and DevOps roles.
Yes, this is possible, though not recommended (I'll explain why in a second). First, here's how you would accomplish what you asked:
Docker Build
The command to build an image in its simplest form is docker build . which performs a build with a build context pulled from the current directory. That means the entire current directory is sent to the docker service, and the service will use it to build an image. This build context should contain all of the local resources docker needs to build your image. In this case, docker will also assume the existence of a file called Dockerfile inside of this context, and use it for the actual build.
However, we can override the default behavior by specifying the -f flag in our build command, e.g. docker build -f /path/to/some.dockerfile . This command uses your current directory as the build context, but uses a Dockerfile that can be located elsewhere.
So in your case, let's assume the code for ProjectA is housed in the directory project-a and project-deploy in project-deploy. You can build and tag your Docker image as project-a:latest like so:
docker build -f project-deploy/Dockerfile.ProjectA -t project-a:latest project-a/
Why this is a bad idea
There are many benefits to using containers over traditional application packaging strategies. These benefits stem from the extra layer of abstraction that a container provides. It enables operators to use a simple and consistent interface for deploying applications, and it empowers developers with greater control and ownership of the environment their application runs in.
This aligns well with the DevOps philosophy, increases your team's agility, and greatly alleviates operational complexity.
However, to enjoy the advantages containers bring, you must make the organizational changes to reflect them, or all you're doing is making things more complex and further separating operations and development:
If your operators are writing your dockerfiles instead of your developers, then you're just adding more complexity to their job with few tangible benefits;
If your developers are not in charge of their application environments, there will continue to be conflict between operations and development, accomplishing basically nothing for them either.
In short, Docker is a tool, not a solution. The real solution is to make organizational changes that empower and accelerate the individual with logically consistent abstractions, and docker is a great tool designed to complement that organizational change.
So yes, while you could separate the application's environment (the Dockerfile) from its code, it would be in direct opposition to the DevOps philosophy. The best solution would be to treat the docker image as an application resource and keep it in the application project, and allow operational configuration (like environment variables and secrets) to be accomplished with docker's support for volumes and variables.
I'm using Django but I guess the question is applicable to any web project.
In our case, there are two types of code: Python code (run by Django) and static files (HTML/JS/CSS).
I could publish a new image whenever there is a change in any of the code.
Or I could use bind mounts for the code. (For Django, we could bind-mount the project root and the static directory.)
If I use bind mounts for the code, I could just update the production machine (probably with git pull) when there's a code change.
Then, the Docker image would handle updates that are not strictly our own code changes (such as a library update or a new setup like setting up Elasticsearch).
Does this approach imply any obvious drawback?
For security reasons it is advised to keep an operating system up to date with the latest security patches, but Docker images are meant to be released in an immutable fashion so that we are always able to reproduce production issues outside production; thus the OS will not update itself as security patches are released. This means we need to rebuild and deploy our Docker image frequently in order to stay on the safe side.
So I would prefer to release a new Docker image with my code and static files, because they are bound to change more often and thus require frequent releases. That way you keep the OS more up to date in terms of security patches, without needing to rebuild Docker images in production just to keep the OS current.
Note that I assume here that you release new code or static files at least on a weekly basis; otherwise I still recommend updating the Docker images at least once a week in order to get the latest security patches for all the software being used.
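As a rough sketch of the image-per-release approach (the base image, project layout, and the use of collectstatic and gunicorn are all assumptions about your setup, not a prescription):

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Bake both the application code and the static files into the image
COPY . .
RUN python manage.py collectstatic --noinput
# gunicorn is assumed to be listed in requirements.txt; "myproject" is a placeholder
CMD ["gunicorn", "myproject.wsgi:application", "--bind", "0.0.0.0:8000"]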
Generally, the more Docker-oriented solutions I've seen to this problem lean towards packaging the entire application in the Docker image. That especially includes application code.
I'd suggest three good reasons to do it this way:
If you have a reproducible path to docker build a self-contained image, anyone can build and reproduce it. That includes your developers, who can test a near-exact copy of the production system before it actually goes to production. If it's a Docker image, plus this code from this place, plus these static files from this other place, it's harder to be sure you've got a perfect setup matching what goes to production.
Some of the more advanced Docker-oriented tools (Kubernetes, Amazon ECS, Docker Swarm, Hashicorp Nomad, ...) make it fairly straightforward to deal with containers and images as first-class objects, but trickier to say "this image plus this glop of additional files".
If you're using a server automation tool (Ansible, Salt Stack, Chef, ...) to push your code out, then it's straightforward to also use those to push out the correct runtime environment. Using Docker to just package the runtime environment doesn't really give you much beyond a layer of complexity and some security risks. (You could use Packer or Vagrant with this tool set to simulate the deploy sequence in a VM for pre-production testing.)
You'll also see a sequence in many SO questions where a Dockerfile COPYs application code to some directory, and then a docker-compose.yml bind-mounts the current host directory over that same directory. In this setup the container environment reflects the developer's desktop environment and doesn't really test what's getting built into the Docker image.
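For illustration, that pattern typically looks something like this (paths are placeholders):

# Dockerfile (excerpt)
COPY . /app

# docker-compose.yml (excerpt) - the bind mount hides whatever COPY baked into the image
services:
  web:
    build: .
    volumes:
      - .:/app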
("Static files" wind up in a gray zone between "is it the application or is it data?" Within the context of this question I'd lean towards packaging them into the image, especially if they come out of your normal build process. That especially includes the primary UI to the application you're running. If it's things like large image or video assets that you could reasonably host on a totally separate server, it may make more sense to serve those separately.)
I am starting to doubt whether I might be missing the whole point of cfn-init. I started thinking that I should bake the AMI used in my cfn template to save time, so it doesn't waste time reinstalling all the packages and I can quickly test the next bootstrapping steps. But if my cfn-init commands download awslogs and stream my logs, executed via the cfn-init call in my user data, and I bake that in, my log group will be created, but doesn't the awslogs program need to run a fresh command to start streaming logs? It just does not make sense if that command is baked in. Which brings me to my next question: is cfn-init bootstrapping designed (or at least best practice) to be run every time a new EC2 instance is spun up, i.e. you cannot or should not bake in the cfn-init part?
Your doubt is valid, and it comes down purely to the design approach and working style of the DevOps engineer.
If your cfn-init just installs a few packages, that can very well be baked into the AMI. As you rightly pointed out, it would save time and ensure faster stack creation.
However, what if you would like to install the latest version of the packages? In that case you can just add the latest flag / keyword to the cfn-init packages section. I have also used cfn-init to dynamically accept the NetBIOS name of the Active Directory Domain Controller; in that case I wouldn't be able to bake it into the AMI.
Another place where cfn-init helps: assume you have configured 4 packages to be installed, and then yet another package needs to be installed as well. With CloudFormation cfn-init, that is a single extra line of code. With an AMI, a new AMI has to be baked.
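For illustration, a minimal AWS::CloudFormation::Init packages section might look like this (package names are placeholders; an empty version list means "latest available"):

Metadata:
  AWS::CloudFormation::Init:
    config:
      packages:
        yum:
          httpd: []      # empty list installs the latest version
          awslogs: []    # adding one more package is a single extra line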
This is purely a trade off.
Is it possible to provision Dataflow workers with custom packages?
I'd like to shell out to a Debian-packaged binary from inside a computation.
Edit: To be clear, the package configuration is sufficiently complex that it's not feasible to just bundle the files in --filesToStage.
The solution should involve installing the Debian package at some point.
This is not something Dataflow explicitly supports. However, below are some suggestions on how you could accomplish this. Please keep in mind that things could change in the service that could break this in the future.
There are two separate problems:
Getting the debian package onto the worker.
Installing the debian package.
For the first problem you can use --filesToStage and specify the path to your debian package. This will cause the package to be uploaded to GCS and then downloaded to the worker on startup. If you use this option you must include in the value of --filesToStage all your jars as well since they will not be included by default if you explicitly set --filesToStage.
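For example, an invocation along these lines (the jar names, main class, and paths are all placeholders, and the runner name follows the 1.x Dataflow SDK):

java -cp target/my-pipeline-bundled.jar com.example.MyPipeline \
  --runner=DataflowPipelineRunner \
  --filesToStage=target/my-pipeline-bundled.jar,lib/some-dependency.jar,debs/mytool.deb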
On the Java worker, any files passed in --filesToStage will be available in the following directories (or a subdirectory of them):
/var/opt/google/dataflow
or
/dataflow/packages
You would need to check both locations in order to be guaranteed of finding the file.
We provide no guarantee that these directories won't change in the future. These are simply the locations used today.
To solve the second problem you can override StartBundle in your DoFn. From here you could shell out to the command line and install your debian package after finding it in /dataflow/packages.
There could be multiple instances of your DoFn running side by side, so you could get contention issues if two processes try to install your package simultaneously. I'm not sure whether the Debian package system can handle this or whether you need to handle it explicitly in your code.
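As a rough, non-Dataflow-specific sketch of that idea, a helper like the one below could be invoked from startBundle; the paths, the use of sudo/dpkg, and the single-JVM locking are all assumptions:

import java.io.File;
import java.io.IOException;

public class DebInstaller {
    private static boolean installed = false;

    // Synchronized so parallel DoFn instances in the same JVM don't race to install.
    public static synchronized void ensureInstalled(String debFileName)
            throws IOException, InterruptedException {
        if (installed) {
            return;
        }
        File deb = findPackage(debFileName);
        // Install the staged .deb; whether sudo is needed depends on the worker setup.
        Process p = new ProcessBuilder("sudo", "dpkg", "-i", deb.getAbsolutePath())
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("dpkg exited with a non-zero status");
        }
        installed = true;
    }

    private static File findPackage(String name) throws IOException {
        // Check both locations mentioned above; the file may also land in a
        // subdirectory, in which case a recursive search would be needed.
        String[] roots = {"/dataflow/packages", "/var/opt/google/dataflow"};
        for (String root : roots) {
            File candidate = new File(root, name);
            if (candidate.exists()) {
                return candidate;
            }
        }
        throw new IOException("Could not find " + name + " in the staged files");
    }
}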
A slight variant of this approach is to not use --filesToStage to distribute the package to your workers but instead add code to your startBundle to fetch it from some location.
If you are making a service with a Dockerfile is it preferred for you to build an image with the Dockerfile and push it to the registry -- rather than distribute the Dockerfile (and repo) for people to build their images?
What use cases favour Dockerfile+repo distribution, and what use cases favour registry distribution?
I'd imagine the same question could be applied to source code versus binary package installs.
Pushing to a central shared registry allows you to freeze and certify a particular configuration and then make it available to others in your organisation.
At DevTable we were initially using a Dockerfile that was run when we deployed our servers in order to generate our Docker images. As our Docker image became more complex and had more dependencies, it was taking longer and longer to generate the image from the Dockerfile. What we really needed was a way to generate the image once and then pull the finished product to our servers.
Normally, one would accomplish this by pushing their image to index.docker.io, however we have proprietary code that we couldn't publish to the world. You may also end up in such a situation if you're planning to build a hosted product around Docker.
To address this need in the community, we built Quay, which aims to be the GitHub of Docker images. Check it out and let us know if it solves a need for you.
Private repositories on your own server are also an option.
To run the server, clone the https://github.com/dotcloud/docker-registry to your own server.
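(As a side note, a private registry is nowadays more commonly run from the official registry image than from a clone of that repository; a minimal example, with the port and name being illustrative:)

$ sudo docker run -d -p 5000:5000 --name registry registry:2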
To use your own server, prefix the tag with the address of the registry's host. For example:
# Tag to create a repository with the full registry location.
# The location (e.g. localhost.localdomain:5000) becomes
# a permanent part of the repository name
$ sudo docker tag 0u812deadbeef your_server.example.com:5000/repo_name
# Push the new repository to its home location on your server
$ sudo docker push your_server.example.com:5000/repo_name
(see http://docs.docker.io.s3-website-us-west-2.amazonaws.com/use/workingwithrepository/#private-registry)
I think it depends a little bit on your application, but I would prefer the Dockerfile:
A Dockerfile...
... in the root of a project makes it super easy to build and run it: it is just one command.
... can be changed by a developer if needed.
... is documentation about how to build your project
... is very small compared with an image, which could be useful for people with a slow internet connection.
... is in the same location as the code, so when people checkout the code, they will find it.
An Image in a registry...
... is already built and ready!
... must be maintained. If you commit new code or update your application you must also update the image.
... must be crafted carefully: Can the configuration be changed? How do you handle the logs? How big is it? Do you package an NGINX within the image, or is that part of the outside world? As @Mark O'Connor said, you will freeze a certain configuration, but that's maybe not the configuration someone else wants to use.
This is why I would prefer the Dockerfile. It is the same with a Vagrantfile: I would prefer the Vagrantfile instead of the VM image. And it is the same with an Ant or Maven script: I would prefer the build script instead of the packaged artifact (at least if I want to contribute code to the project).