Why does gitlab-ci default to using git clone for each job instead of building a docker image first?

Gitlab-ci's default mode is to use git clone in every job in a pipeline.
This is time-consuming, especially since after cloning we need to install/update all dependencies.
I'd like to flip the order of our pipelines, and start with git clone + docker build, then run all subsequent jobs using that image without cloning and rebuilding for each job.
Am I missing something?
Is there any reason that I'd want to clone the repo separately for each job if I already have an image with the current code?

You are not missing anything. If you know what you are doing, you don't need to clone your repo for each stage in your pipeline. If you set the GIT_STRATEGY variable to none, your test jobs, or whatever they are, will run faster and you can simply run your docker pull commands and the tests that you need. Just make sure that you use the correct docker images, even if you start many parallel jobs. You could for example use CI_COMMIT_REF_NAME as part of the name of the docker image.
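As a rough sketch of that pattern in .gitlab-ci.yml (the CI_REGISTRY* variables are GitLab's predefined ones, but the Docker-in-Docker setup and the run-tests command are assumptions about your project; CI_COMMIT_REF_SLUG is a safer tag if your branch names contain slashes):

stages:
  - build
  - test

build-image:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    # Build an image containing the current code, tagged per branch so
    # parallel pipelines don't overwrite each other's images.
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME"

test:
  stage: test
  variables:
    GIT_STRATEGY: none                  # no clone; the code is already in the image
  image: "$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME"
  script:
    - run-tests                         # placeholder for your real test command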
As to why GitLab defaults to using git clone, my guess is that this is the least surprising behavior. If you imagine someone new to GitLab and new to CI, it will be much easier for them to get up and running if each job simply clones the whole repo. You also have to remember that not everyone builds docker images in their jobs. I would guess that the most common setup is either a programming language that doesn't need to be compiled, for example Python, or a build job that produces binaries and then a test job that runs the binaries. They can then use artifacts to send the binaries from the build job to the test job.
This is easy and it works. When people then realize that a lot of the time of their test jobs is spent just cloning the repository, they might look into how to change the GIT_STRATEGY, and to do other things to optimize their particular build.
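For the compiled-binaries case mentioned above, a minimal sketch of that artifacts pattern could look like this (the commands and paths are placeholders):

build:
  stage: build
  script:
    - make build                   # placeholder compile step
  artifacts:
    paths:
      - bin/                       # binaries to hand over to later jobs

test:
  stage: test
  script:
    - ./bin/run-tests              # placeholder; the bin/ artifacts are downloaded automatically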

One of the reasons for using CI is to run against your repo in a fresh state. This cannot be done if you skip the git clone process in certain jobs. A job may modify the repo's state by deleting files or generating new ones; only the artifacts that are explicitly declared in the pipeline should be shared between jobs, nothing else.

Related

Jenkins how to deploy same code to different servers (that I can specify)?

At the software company where I work we use Jenkins to deploy to different servers. The way we do it today is that every single branch from the git repository deploys to a specific server, based on the name of the branch and the specifications in the Jenkinsfile. But we are in the process of unifying these branches into just one, master. How can we configure Jenkins to take the same code and deploy it to the servers we are interested in, without changing the code? I think we should separate the code from the deployment, but the pipeline still has to exist in some way.
Two solutions come to mind:
I believe you may be using SCM polling to get the builds started. With git diff you can check what was changed and, based on that, start the specific deployment.
If you are running the builds manually, you can parameterize the build and specify that way which environment you want to deploy to.
From experience, you might want to set up the pipeline so that a commit to the repository only triggers testing and building (or at most a deploy to a test environment), and the proper production deploy is done only manually (and can be parameterized).
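A minimal declarative Jenkinsfile sketch of that manual, parameterized deploy (the environment list, build and deploy scripts are placeholders, not your real setup):

// One pipeline on master; the target environment is chosen at build time.
pipeline {
    agent any
    parameters {
        choice(name: 'TARGET_ENV', choices: ['test', 'staging', 'prod'],
               description: 'Environment to deploy to')
    }
    stages {
        stage('Build & Test') {
            steps {
                sh './build.sh'                        // placeholder build/test command
            }
        }
        stage('Deploy') {
            steps {
                // deploy.sh is a placeholder for whatever your deployment actually does.
                sh "./deploy.sh ${params.TARGET_ENV}"
            }
        }
    }
}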

In Jenkins, how do I set SCM behavior for the master node rather than the build nodes?

I'm aware I'm lacking basic Jenkins concepts, but with my current knowledge it's hard for me to research this successfully - maybe you can give me some hints I can use to re-word my question if needed.
Currently I'm facing a situation in which, in a setup with several build nodes, the Jenkins master machine is running out of disk space because Jenkins clones git repositories on both the master and the build nodes (and the master only has limited space). This question explains why.
Note: the master node itself does not build anything - it just clones the repo to a local workspace folder (I guess it just needs the Jenkinsfiles).
Going through the job configurations and googling this issue, I find options regarding shallow and sparse clones, or cleaning up the workspace before or after the build using the Cleanup Plugin. But those settings and plugins only affect the checkout done with checkout scm on the build nodes, not the master.
But in case I want to leave the situation as is on the build nodes but keep the workspace folders on the Jenkins master machine slim, how do I approach this? What do I have to search for?
And as a side question - isn't it possible to have something like "git exports"? I.e. having the .git folders removed after checking out the commit I need?
In case it depends on the kind of job I use, I'm using scripted pipeline jobs.
I've got a similar setup: A master node, multiple build nodes.
I simply set the number of executors to 0 on the master node (from Manage Jenkins -> Manage Nodes), so every job lands on the build nodes.
The only repo cloned on the master is the shared library.
Running Jenkins builds in the master node is discouraged for two main reasons:
First of all, the usability of the Jenkins platform might be affected by many ongoing builds, for example showing delays on certain operations.
Second, it is a well-known security problem, as pointed out by the documentation:
Any builds running on the built-in node have the same level of access to the controller file system as the Jenkins process.
It is therefore highly advisable to not run any builds on the built-in node, instead using agents (statically configured or provided by clouds) to run builds.
That same page gives details on these security problems, such as what an attacker can do, and describes an alternative that lets you use the master node to build while mitigating some of the listed issues. The solution is based on a plugin called the Job Restrictions Plugin.
In any case, the most common choice is to let the agent (slave) nodes do the builds:
To prevent builds from running on the built-in node directly, navigate to Manage Jenkins » Manage Nodes and Clouds. Select master in the list, then select Configure in the menu. Set the number of executors to 0 and save. Make sure to also set up clouds or build agents to run builds on, otherwise builds won’t be able to start.
If you really have strong reasons to build on the master node, you can always apply a different git clone strategy based on the value of the env.NODE_NAME environment variable. It is set to master if the pipeline job is run on the master node, otherwise it is filled with the node name (of course). Nonetheless, I have never seen anyone customizing the git clone command based on the node used, so... Don't do it 😉
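Just to make the idea concrete (and keeping the warning above in mind), a scripted-pipeline sketch of such an env.NODE_NAME-dependent checkout could look like this; the repository URL is a placeholder:

node {
    if (env.NODE_NAME == 'master') {
        // Shallow clone only when the job happens to run on the master node.
        checkout([$class: 'GitSCM',
                  branches: [[name: '*/master']],
                  extensions: [[$class: 'CloneOption', shallow: true, depth: 1]],
                  userRemoteConfigs: [[url: 'https://example.com/your/repo.git']]])
    } else {
        checkout scm       // the usual full checkout on the build agents
    }
}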
About the sparse checkout and the sparse/shallow clone:
The former creates an incomplete working directory, checking out not all the trees and blobs present in the current commit but only those you specify. Do you really save that much space? Or rather, is your project tree so heavy that you would need to do something like this? Sparse checkout is generally used when you want a clean working tree, without unnecessary files.
A shallow clone can sometimes be useful to reduce the download time, especially when you have a huge history. The most common option is --depth=1, which instructs git to retrieve only the most recent commit. As far as I know, Jenkins already applies some optimizations to speed up the clone process, but it generally keeps the entire history. Again, I am not sure you would gain that much space.
A valid (at least for me) alternative to space optimizations on the git files is to build in Docker containers. Jenkins has reached a good level of integration with Docker and there are a lot of advantages to using it, among which is the disposal of the workspace after the job finishes.
I haven't used the pipeline feature myself so far -- but conceptually it is clear that the master requires initial access to the Jenkinsfile. It will therefore be difficult to avoid this step entirely.
If Jenkins itself does not provide an option to fine-tune the clone/checkout behavior on the master side, then I'd see these options:
Create a custom version of Jenkins (or of the corresponding plugin) which hard-codes the behavior that you need (like, shallow/sparse clone). Modifying and building both Jenkins and its plugins is surprisingly simple; often, the most difficult part is to locate the code that you need to touch.
Tune the master's clone in-place. Shallowness and sparse-checkout properties can be set for existing clones. If you set these properties after the initial clone (possibly in the Jenkinsfile itself or in a post-build step), then Jenkins may possibly maintain those properties.
Constantly re-cloning and deleting the repo on master side increases the load both on the Jenkins master and on your Git server, so better be careful with that (especially since your repository has a size where disk space matters already). If you really want to go that way, you could try to force-remove the clone on the master in a post-build step -- this should be relatively easy to implement. You need to check that this hack will not interfere with Jenkins' access to the Jenkinsfile.
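A loose sketch of what options 2 and 3 could look like as a final step in a scripted pipeline; the node label, the git version requirement (sparse-checkout needs git >= 2.25) and whether Jenkins keeps the result between runs are all assumptions to verify against your setup:

node('master') {
    // You may need to wrap this in dir('<path to the master-side clone>') so it
    // operates on the workspace that actually holds the clone.
    sh '''
        git fetch --depth=1 origin            # option 2: shallow the existing history
        git sparse-checkout set Jenkinsfile   # keep only what the master really needs
        git reflog expire --expire=now --all
        git gc --prune=now                    # actually reclaim the disk space
    '''
    // Option 3, more drastic: deleteDir() removes the whole workspace on the master.
}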

Jenkins with Shared jobs

I am working with Jenkins, and we have quite a few projects that all use the same tasks, i.e. we set a few variables, change the version, restore packages, start SonarQube, build the solution, run unit/integration tests, stop SonarQube etc. The only difference would be something like {Solution_Name}; everything else is exactly the same.
My question is: is there a way to create one 'shared' job that does all that work, while the job for building the project passes the variables down to that shared worker job? What I'm looking for is the ability to not have to create all the tasks for all of our services/components. It would be really nice if each of our services/components could have only two tasks: one to set the variables, another to run the shared job.
Is this possible?
Thanks in advance.
You could potentially benefit from looking into the new pipelines as code feature.
https://jenkins.io/doc/book/pipeline/
Using this pattern, you define your build pipeline in a Groovy script rather than in the Jenkins UI. This script is then kept in the codebase of the project it builds, in a file called Jenkinsfile.
By checking this pipeline into a git repository, you can create a minimal configuration on the Jenkins side and simply tell it to look at a specific repo and do the things that pipeline says to do.
There are a few benefits to this approach if it works for your setup. The big one is that your build pipeline is fully versioned, just like the project it builds. The repository also becomes portable: it can easily be built on any Jenkins installation, across as many jobs as you like, as long as the pipeline plugins are installed.
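As a rough illustration of how small each per-service Jenkinsfile can get, here is a sketch that assumes you also create a Jenkins shared library; the library name, the buildSolution step and the commands inside it are hypothetical, not an existing plugin:

// Per-project Jenkinsfile: only the solution name differs between services.
@Library('ci-shared-library') _
buildSolution(solutionName: 'MyService.sln')

// vars/buildSolution.groovy inside the shared library (sketch):
def call(Map cfg) {
    node {
        checkout scm
        stage('Restore') { bat "nuget restore ${cfg.solutionName}" }                     // placeholder commands;
        stage('Build')   { bat "msbuild ${cfg.solutionName} /p:Configuration=Release" }  // adjust them to
        stage('Test')    { bat 'run-tests.cmd' }                                         // your own toolchain
    }
}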

Deploy web app via Jenkins

I have recently started to mess about with Jenkins and am unsure how to deploy my web app to a basic server. I've gotten into the Pipeline (https://jenkins.io/doc/book/pipeline/) and it seems like a fantastic way to work.
Where I'm a bit stuck is in two spots:
Once my repo is in my workspace within Jenkins, how do I prep it so I am only deploying the files necessary for the application? For example, I don't need my src/ directory or my Vagrantfile when I'm deploying things.
How do I deploy my app to the server? I see examples all over the place, but I am getting a bit lost since there seems to be so many ways to do this. I'm assuming scp or something like that...?
To build off of #2, is there a way to deploy web apps as transactions (in one shot) rather than file-by-file?
Please let me know if I can provide any information for potential answers!
I can't speak to your specific use case but a common way to do this is the build-and-deploy model, where you will have 2 Jenkins jobs. The "build" job will check out from source, run build commands such as maven or make, and lastly will "archive" the build artifacts. The latter is an option under the 'post-build actions' tab at the bottom.
In the "deploy" job, you will grab the artifacts of your choice. You can fetch a single file, all of them, and everything in between. This requires use of the 'Copy Artifact' plug-in and it allows you to copy files generated by other jobs. Now you can run your usual deploy script in the 'Execute Command' box. Most command line paradigms are supported out of the box such as setting environment variables.
The instructions above assume that you want to run your application off of a host that you've provisioned as a Jenkins slave.
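If you prefer to express the same two-job model as pipelines rather than freestyle jobs, a sketch could look like this; the job names, paths and scripts are placeholders, and the Copy Artifact plugin must be installed:

// Pipeline of the "build" job (e.g. named myapp-build):
node {
    checkout scm
    sh './build.sh'                                   // placeholder build command
    archiveArtifacts artifacts: 'dist/**'             // keep the build outputs
}

// Pipeline of the "deploy" job:
node {
    copyArtifacts projectName: 'myapp-build', filter: 'dist/**'   // Copy Artifact plugin step
    sh './deploy.sh dist/'                            // placeholder deploy script (scp, rsync, ...)
}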
Use artifacts as mentioned by Paul Back, or a 3rd-party artifact server such as Artifactory, as in the video.
This is always tricky and error-prone. Why not spin up a fresh server with the new release (manually verified once)?
Jenkins & Ansible are the answer here. This is how I deploy to production: I have no need for anything like Docker (too many issues with this particular app), so I have to run the app natively. A quick example would be:
You monitor a specific branch in GitLab / GitHub or whatever else, and a webhook is called on push / merge etc. on that branch. At that point you handle whatever you need to do by running a playbook from the Jenkins job that monitors that branch.
In my case Jenkins and Ansible run on the same server; Jenkins runs the Ansible playbook that does whatever I need to do.
For example, with Ansible I copy the files that need to be there, apply configs / change filenames etc., set up nginx, run Composer...
You get the point.
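A loose sketch of the Jenkins side of such a job (the playbook, inventory and branch wiring are placeholders):

// Triggered by the webhook on the monitored branch.
node {
    checkout scm
    // Run the playbook that copies files, applies configs, sets up nginx, runs composer, etc.
    sh 'ansible-playbook -i inventories/production deploy.yml'
}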

Adding Jenkins Pipelines on Build

Does anyone know if it's possible to add a Jenkins pipeline build into a Jenkins docker image? For example, I may have a Jenkinsfile that defines my pipeline in Groovy, and would like to ADD that into my image when building from the Jenkins image.
something like:
FROM jenkins:latest
ADD ./jobs/Jenkinsfile-pipeline-example $JENKINS_HOME/${someplace}
And have that pipeline ready to go when I run it.
Thanks.
It's a lot cleaner to keep the Jenkinsfile in the project's repository instead. This way, as your repositories develop you can change the build process without needing to rebuild and redeploy your Jenkins image every time (less work, and less CI downtime). Also, having the Jenkinsfile in source control allows a simpler decoupling.
If you have any questions about extending Jenkins on Docker further to handle building NodeJS, Ruby or something else I go into how to do all that in an article.
You can create any job in Jenkins by passing in an XML file that describes the job. See https://support.cloudbees.com/hc/en-us/articles/220857567-How-to-create-a-job-using-the-REST-API-and-cURL
The way I've done this is to manually create the job I want in Jenkins, then append config.xml to the URL and it shows you the XML content needed to generate the pipeline job. Save that XML and you can deliver it to your newly deployed Jenkins instance.
I use a system similar to this to generate several hundred jobs based on our external build specifications.
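For reference, a sketch of the cURL calls involved; the host, job name and credentials are placeholders:

# Grab the XML of an existing, manually created job:
curl -u user:api-token 'https://jenkins.example.com/job/my-pipeline/config.xml' -o config.xml

# Recreate the same job on a freshly deployed Jenkins instance:
curl -u user:api-token -X POST 'https://jenkins.example.com/createItem?name=my-pipeline' \
     --header 'Content-Type: text/xml' --data-binary @config.xml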
