I am using Jenkins to run some Ansible playbooks. One of the simple tests I did was to have the playbook cat the /etc/fstab file on a remote server.
The playbook looks like this:
---
- hosts: "test-1-server"
  tasks:
    - name: display /etc/fstab
      shell: cat /etc/fstab
      register: fstab_reg
    - debug: msg="{{ fstab_reg.stdout }}"
In Jenkins I have a freestyle project that uses the Invoke Ansible Playbook build step to call the above playbook, and the project credentials are set up with a different user: ansible-user. This is different from the default jenkins user that runs Jenkins. ansible-user can ssh to all my servers. I have ansible-user set up in Jenkins Credentials with its private key and passphrase. But when I run the project, I get an error:
[update_fstab] $ /usr/bin/ansible-playbook google/ansible/test-scripts/test/sub_book.yml -i /etc/ansible/hosts -f 5 --private-key /tmp/ssh14117407503194058572.key -u ansible-user
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
fatal: [test-1-server]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ansible-user@test-1-server: Permission denied (publickey).", "unreachable": true}
I am not quite sure what exactly the error is saying, as I have set the private key and passphrase in ansible-user's credentials. What does the group-names warning in the message mean? And because this is done through Jenkins, I am not sure how to pass -vvvv as it suggests.
How can I make Jenkins pass the private key and passphrase to the Ansible playbook?
Thanks!
I think I have found the "issue". After I switched to a different user than ansible-user, the playbook worked. The interesting thing is that when I created the key pair for ansible-user, I used "-m PEM", so it should have been fine for Jenkins.
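For reference, this is roughly how a key pair in PEM format can be generated and installed (a sketch; the file name is a placeholder, and test-1-server is the host from the question):
# generate an RSA key pair in PEM format (the "-m PEM" mentioned above)
ssh-keygen -t rsa -b 4096 -m PEM -C "ansible-user" -f ~/.ssh/ansible-user_pem
# install the public key on the target server for ansible-user
ssh-copy-id -i ~/.ssh/ansible-user_pem.pub ansible-user@test-1-server
The private key and its passphrase then go into the Jenkins credential used by the Invoke Ansible Playbook step.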
I am trying to build a web application with GitLab CI.
I created a runner with this configuration:
name = "REDACTED"
url = "REDACTED"
token = REDACTED
executor = "docker-windows"
[runners.custom_build_dir]
[runners.cache]
  [runners.cache.s3]
  [runners.cache.gcs]
  [runners.cache.azure]
[runners.docker]
  tls_verify = false
  image = "mcr.microsoft.com/powershell"
  privileged = false
  disable_entrypoint_overwrite = false
  oom_kill_disable = false
  disable_cache = false
  volumes = ["c:\\cache"]
  shm_size = 0
Then my .gitlab-ci.yml looks like this:
image: microsoft/dotnet:latest

stages:
  - build
  - test

before_script:
  - "dotnet restore"

node_build:
  stage: build
  only:
    - master
  script:
    - "echo Stage - Build started"
    - "cd ./WebApplication"
    - dir
    - dotnet build

node_test:
  stage: test
  only:
    - master
  script:
    - "echo Stage - Test started"
    - "cd ./WebApplication"
    - dir
    - dotnet build
When the pipeline is run, the output looks like this:
Running with gitlab-runner 13.11.0 (7f7a4bb0)
on REDACTED REDACTED
Preparing the "docker-windows" executor
Using Docker executor with image microsoft/dotnet:latest ...
Pulling docker image microsoft/dotnet:latest ...
Using docker image sha256:34f6f2295334d34567c67059f7c28836c79e014d0c4fadf54de3978798640003 for microsoft/dotnet:latest with digest microsoft/dotnet@sha256:61d86fc52893087df54b0579fcd9c33e144a4b3d34c543a94e6a6b376c74285d ...
Preparing environment
Running on REDACTED via
REDACTED ...
Getting source from Git repository
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in C:/builds/REDACTED /c-sharp-ci-test/.git/
Checking out bbb22919 as master...
git-lfs/2.11.0 (GitHub; windows amd64; go 1.14.2; git 48b28d97)
Skipping Git submodules setup
Executing "step_script" stage of the job script
Using docker image sha256:34f6f2295334d34567c67059f7c28836c79e014d0c4fadf54de3978798640003 for microsoft/dotnet:latest with digest microsoft/dotnet@sha256:61d86fc52893087df54b0579fcd9c33e144a4b3d34c543a94e6a6b376c74285d ...
Cleaning up file based variables
ERROR: Job failed (system failure): Error response from daemon: container e144f05bdd00b4e744554345666afbc008ee2437c7d56bf4a98fbd949a88b1b2 encountered an error during hcsshim::System::CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2)
[Event Detail: Provider: 00000000-0000-0000-0000-000000000000]
[Event Detail: Provider: 00000000-0000-0000-0000-000000000000]
[Event Detail: onecore\vm\compute\management\orchestration\vmhostedcontainer\processmanagement.cpp(173)\vmcomputeagent.exe!00007FF7D970B1D7: (caller: 00007FF7D96BE70B) Exception(6) tid(37c) 80070002 The system cannot find the file specified.
CallContext:[\Bridge_ProcessMessage\VmHostedContainer_ExecuteProcess]
Provider: 00000000-0000-0000-0000-000000000000] extra info: {"CommandLine":"powershell -NoProfile -NoLogo -InputFormat text -OutputFormat text -NonInteractive -ExecutionPolicy Bypass -Command -","User":"ContainerUser","WorkingDirectory":"C:\\","Environment"
When I look at the log, it says it tried to run the step_script stage of the job, which I never specified, and that it tries to run PowerShell. Why is that happening and how can I get rid of it? I suppose dotnet:latest does not have PowerShell in it, as it is not needed for building.
First, it is always best to use a fixed tag instead of the shifting "latest": from one build to the next, "latest" can reference a new image version.
Second, try a specific dotnet image like mcr.microsoft.com/dotnet/core/sdk:3.1 instead of microsoft/dotnet:xxx. Note, though, that those images are likely to use PowerShell, as seen in their Dockerfile.
Try one of the .NET samples outside of GitLab to see if you can make it work manually, then include it in your gitlab-ci.yml.
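For illustration, a minimal .gitlab-ci.yml sketch with a pinned image (the sdk:3.1 tag is an assumption; pick the tag matching your target framework and executor):
# pinned SDK image instead of the shifting "latest"
image: mcr.microsoft.com/dotnet/core/sdk:3.1

stages:
  - build

build_job:
  stage: build
  script:
    - dotnet restore
    - dotnet build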
Note: from gitlab-org/gitlab-runner issue 26418, step_script would be equivalent to build_script.
I'm using GitLab CI/CD to build some projects using a single runner (for now) on Docker (the runner itself is a Docker container, so I guess this is Docker-in-Docker...).
My problem is that I can't use my own nexus/npm repository while building...
npm install --registry=http://153.89.23.53:8082/repository/npm-all
npm ERR! code EHOSTUNREACH
npm ERR! errno EHOSTUNREACH
npm ERR! request to http://153.89.23.53:8082/repository/npm-all/typescript/-/typescript-3.6.5.tgz failed, reason: connect EHOSTUNREACH 153.89.23.53:8082
The same runner on another server works perfectly, but it doesn't work when running on the same server that hosts the Nexus (everything is container-based).
The Gitlab runner is using the host network.
If I connect to the runner and try to reach 153.89.23.53:8082 (Nexus), it works:
root@62591008a000:/# wget http://153.89.23.53:8082
--2020-07-13 09:56:16-- http://153.89.23.53:8082/
Connecting to 153.89.23.53:8082... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7952 (7.8K) [text/html]
Saving to: 'index.html'
index.html 100%[===========================================================================================>] 7.77K --.-KB/s in 0s
2020-07-13 09:56:16 (742 MB/s) - 'index.html' saved [7952/7952]
So I guess the problem occurs in the "second docker container", the one used inside the runner... but I have no idea what I should change.
Note: I could probably set the GitLab runner to join the Nexus network and use internal IPs, but this would break the scripts if the runner is started on other servers...
OK, I found the solution.
There is a network_mode setting that can be set in the runner configuration. The default value is bridge, not host.
**config.toml**
[runners.docker]
...
volumes = ["/cache"]
network_mode = "host"
I want to run k8s inside a Docker container.
When I run kubeadm join ... or kubeadm init, I sometimes see errors like:
\"modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could
not open moddep file
'/lib/modules/3.10.0-1062.1.2.el7.x86_64/modules.dep.bin'.
nmodprobe:
FATAL: Module configs not found in directory
/lib/modules/3.10.0-1062.1.2.el7.x86_64",
err: exit status 1
because (I think) my container does not have the expected kernel header files.
I realise that the container reports its kernel based on the host that is running the container; and looking at the k8s code I see:
// getKernelConfigReader search kernel config file in a predefined list. Once the kernel config
// file is found it will read the configurations into a byte buffer and return. If the kernel
// config file is not found, it will try to load kernel config module and retry again.
func (k *KernelValidator) getKernelConfigReader() (io.Reader, error) {
	possibePaths := []string{
		"/proc/config.gz",
		"/boot/config-" + k.kernelRelease,
		"/usr/src/linux-" + k.kernelRelease + "/.config",
		"/usr/src/linux/.config",
	}
so I am a bit confused about the simplest way to run k8s inside a container such that it consistently gets past this kernel-info check.
I note that when running docker run -it solita/centos-systemd:7 /bin/bash on a macOS host I see:
# uname -r
4.9.184-linuxkit
# ls -l /proc/config.gz
-r--r--r-- 1 root root 23834 Nov 20 16:40 /proc/config.gz
but running the exact same thing on an Ubuntu VM I see:
# uname -r
4.4.0-142-generic
# ls -l /proc/config.gz
ls: cannot access /proc/config.gz
[Weirdly I don't see this FATAL: Module configs not found in directory error every time, but I guess that is a separate question!]
UPDATE 22/November/2019: I see now that k8s DOES run okay in a container. The real problem was weird/misleading logs. I have added an answer to clarify.
I do not believe that is possible given the nature of containers.
You should instead test your app in a docker container then deploy that image to k8s either in the cloud or locally using minikube.
Another solution is to run it under kind, which uses the Docker driver instead of VirtualBox:
https://kind.sigs.k8s.io/docs/user/quick-start/
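From that quick-start, the basic flow looks roughly like this (a sketch; installation of the kind binary depends on your platform):
# create a single-node cluster that runs inside a Docker container
kind create cluster
# point kubectl at it and verify it is up
kubectl cluster-info --context kind-kind
# tear the cluster down when finished
kind delete cluster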
It seems the FATAL error part was a bit misleading.
It was badly formatted by my test environment (all on one line).
When k8s was failing, I saw the FATAL and assumed (incorrectly) that it was the root cause.
When I format the logs nicely I see ...
kubeadm join 172.17.0.2:6443 --token 21e8ab.1e1666a25fd37338 --discovery-token-unsafe-skip-ca-verification --experimental-control-plane --ignore-preflight-errors=all --node-name 172.17.0.3
[preflight] Running pre-flight checks
[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.4.0-142-generic
DOCKER_VERSION: 18.09.3
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-142-generic/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/4.4.0-142-generic\n", err: exit status 1
[discovery] Trying to connect to API Server "172.17.0.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.17.0.2:6443"
[discovery] Failed to request cluster info, will try again: [the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps cluster-info)]
There are other errors later, which I originally thought were a side effect of the nasty-looking FATAL error, e.g. "[util/etcd] Attempt timed out", but I now think the root cause is that the etcd part sometimes times out.
Adding this answer in case someone else is puzzled like I was.
I am using the https://github.com/puckel/docker-airflow image to run Airflow. I had to add pip install docker in order for it to support DockerOperator.
Everything seems ok, but I can't figure out how to pull an image from a private google docker container repository.
I tried adding the connection in the Admin section, with the connection type set to Google Cloud, and running the DockerOperator like this:
t2 = DockerOperator(
    task_id='docker_command',
    image='eu.gcr.io/project/image',
    api_version='2.3',
    auto_remove=True,
    command="/bin/sleep 30",
    docker_url="unix://var/run/docker.sock",
    network_mode="bridge",
    docker_conn_id="google_con"
)
But I always get an error:
[2019-11-05 14:12:51,162] {{taskinstance.py:1047}} ERROR - No Docker registry URL provided
I also tried the dockercfg_path option:
t2 = DockerOperator(
    task_id='docker_command',
    image='eu.gcr.io/project/image',
    api_version='2.3',
    auto_remove=True,
    command="/bin/sleep 30",
    docker_url="unix://var/run/docker.sock",
    network_mode="bridge",
    dockercfg_path="/usr/local/airflow/config.json",
)
I get the following error:
[2019-11-06 13:59:40,522] {{docker_operator.py:194}} INFO - Starting docker container from image eu.gcr.io/project/image
[2019-11-06 13:59:40,524] {{taskinstance.py:1047}} ERROR - ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
I also tried using only dockercfg_path="config.json" and got the same error.
I can't really use a BashOperator to try docker login, as it does not recognize the docker command (see the error and operator below).
What am I missing?
line 1: docker: command not found
t3 = BashOperator(
    task_id='print_hello',
    bash_command='docker login -u _json_key -p /usr/local/airflow/config.json eu.gcr.io'
)
airflow.hooks.docker_hook.DockerHook is using the docker_default connection, which isn't configured.
In your first attempt, you set google_con for docker_conn_id, and the error thrown shows that the host (i.e. the registry name) isn't configured.
Here are a couple of changes to make:
The image argument passed to DockerOperator should be set to the image tag, without the registry name prefixing it:
DockerOperator(
    api_version='1.21',
    # docker_url='tcp://localhost:2375', # Set your docker URL
    command='/bin/ls',
    image='image',
    network_mode='bridge',
    task_id='docker_op_tester',
    docker_conn_id='google_con',
    dag=dag,
    # added this to map to host path in MacOS
    host_tmp_dir='/tmp',
    tmp_dir='/tmp',
)
Provide the registry name, username and password in your google_con connection so the underlying DockerHook can authenticate to the registry.
You can obtain long-lived credentials for authentication from a service account key. For the username, use _json_key, and in the password field paste the contents of the JSON key file.
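For example, on Airflow 1.10 the connection can also be created from the CLI, roughly like this (a sketch; the key-file path is a placeholder, the host is the registry plus project prefix as in the Host value shown in the logs below, and Airflow 2 uses airflow connections add with dashed flags instead):
airflow connections --add \
    --conn_id google_con \
    --conn_type docker \
    --conn_host eu.gcr.io/project \
    --conn_login _json_key \
    --conn_password "$(cat /path/to/service-account-key.json)"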
Here are logs from running my task:
[2019-11-16 20:20:46,874] {base_task_runner.py:110} INFO - Job 443: Subtask docker_op_tester [2019-11-16 20:20:46,874] {dagbag.py:88} INFO - Filling up the DagBag from /Users/r7/OSS/airflow/airflow/example_dags/example_docker_operator.py
[2019-11-16 20:20:47,054] {base_task_runner.py:110} INFO - Job 443: Subtask docker_op_tester [2019-11-16 20:20:47,054] {cli.py:592} INFO - Running <TaskInstance: docker_sample.docker_op_tester 2019-11-14T00:00:00+00:00 [running]> on host 1.0.0.127.in-addr.arpa
[2019-11-16 20:20:47,074] {logging_mixin.py:89} INFO - [2019-11-16 20:20:47,074] {local_task_job.py:120} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.989537 s
[2019-11-16 20:20:47,088] {logging_mixin.py:89} INFO - [2019-11-16 20:20:47,088] {base_hook.py:89} INFO - Using connection to: id: google_con. Host: gcr.io/<redacted-project-id>, Port: None, Schema: , Login: _json_key, Password: XXXXXXXX, extra: {}
[2019-11-16 20:20:48,404] {docker_operator.py:209} INFO - Starting docker container from image alpine
[2019-11-16 20:20:52,066] {logging_mixin.py:89} INFO - [2019-11-16 20:20:52,066] {local_task_job.py:99} INFO - Task exited with return code 0
I know the question is about GCR but it's worth noting that other container registries may expect the config in a different format.
For example, GitLab expects you to pass the fully qualified image name to the DAG and only put the GitLab container registry host name in the connection:
DockerOperator(
    task_id='docker_command',
    image='registry.gitlab.com/group/project/image:tag',
    api_version='auto',
    docker_conn_id='gitlab_registry',
)
Then set up your gitlab_registry connection like:
docker://gitlab+deploy-token-1234:ABDCtoken1234@registry.gitlab.com
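Equivalently, since Airflow can read connections from AIRFLOW_CONN_<CONN_ID> environment variables, the same connection can be supplied without touching the UI (a sketch; the deploy-token values are the placeholders from above):
# connection id gitlab_registry, uppercased in the variable name, set in the Airflow worker's environment
export AIRFLOW_CONN_GITLAB_REGISTRY='docker://gitlab+deploy-token-1234:ABDCtoken1234@registry.gitlab.com'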
Based on recent Cloud Composer documentation, it's recommended to use KubernetesPodOperator instead, like this:
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
KubernetesPodOperator(
    task_id='docker_op_tester',
    name='docker_op_tester',
    dag=dag,
    namespace="default",
    image="eu.gcr.io/project/image",
    cmds=["ls"]
)
Further to @Tamlyn's answer, we can also skip the creation of the connection (docker_conn_id) in Airflow and use it with GitLab as follows:
On your development machine:
https://gitlab.com/yourgroup/yourproject/-/settings/repository (create a token here and get the details for logging in)
docker login registry.gitlab.com (log in to Docker on the machine you push the image from; enter your GitLab credentials when prompted)
docker build -t registry.gitlab.com/yourgroup/yourproject . && docker push registry.gitlab.com/yourgroup/yourproject (builds and pushes the image to your project repo's container registry)
On your Airflow machine:
https://gitlab.com/yourgroup/yourproject/-/settings/repository (you can use the token created above for logging in)
docker login registry.gitlab.com (log in to Docker on the machine that pulls the image; this skips the need for creating a Docker registry connection. Enter your GitLab credentials when prompted; this generates ~/.docker/config.json, which is required. Reference: Docker docs)
In your DAG:
dag = DAG(
    "dag_id",
    default_args=default_args,
    schedule_interval="15 1 * * *"
)

docker_trigger = DockerOperator(
    task_id="task_id",
    api_version="auto",
    network_mode="bridge",
    image="registry.gitlab.com/yourgroup/yourproject",
    auto_remove=True,  # use if required
    force_pull=True,   # use if required
    xcom_all=True,     # use if required
    # tty=True,        # turning this on screws up the log rendering
    # command="",      # use if required
    environment={      # use if required
        "envvar1": "envvar1value",
        "envvar2": "envvar2value",
    },
    dag=dag,
)
This works with Ubuntu 20.04.2 LTS (tried and tested), with Airflow installed on the instance.
You will need to install the Cloud SDK on your workstation, which includes the gcloud command-line tool.
After installing the Cloud SDK and Docker version 18.03 or newer, according to the documentation you can pull from Container Registry with the command:
docker pull [HOSTNAME]/[PROJECT-ID]/[IMAGE]:[TAG]
or
docker pull [HOSTNAME]/[PROJECT-ID]/[IMAGE]@[IMAGE_DIGEST]
where:
[HOSTNAME] is listed under Location in the console. It's one of four options: gcr.io, us.gcr.io, eu.gcr.io, or asia.gcr.io.
[PROJECT-ID] is your Google Cloud Platform Console project ID.
[IMAGE] is the image's name in Container Registry.
[TAG] is the tag applied to the image. In a registry, tags are unique to an image.
[IMAGE_DIGEST] is the sha256 hash value of the image contents. In the console, click on the specific image to see its metadata. The digest is listed as the Image digest.
To get the pull command for a specific image:
Click on the name of an image to go to the specific registry.
In the registry, check the box next to the version of the image that you want to pull.
Click SHOW PULL COMMAND on the top of the page.
Copy the pull command, which identifies the image using either the tag or the digest.
Also check that you have push and pull permissions for the registry, and that you have configured Docker to use gcloud as a credential helper (or are using another authentication method). To use gcloud as the credential helper, run the command:
gcloud auth configure-docker
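For example (a sketch; the project ID and image name are hypothetical):
# one-time: route gcr.io authentication through gcloud
gcloud auth configure-docker
# then pull by tag from the EU registry
docker pull eu.gcr.io/my-project/my-image:latest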