Is it possible to delay Jenkins from connecting to a slave until the cloud-init script has run?

I have a cloud init script
#cloud-config
package_upgrade: true
packages:
  - openjdk-8-jdk
  - apt-transport-https
  - git
  - jq
groups:
  - docker
users:
  - default
  - name: jenkins
    groups: docker
    homedir: /var/lib/jenkins
    lock_passwd: true
    ssh_authorized_keys:
      - ssh-rsa xyz
This is passed to the Jenkins ec2-plugin when starting an Ubuntu 18.04 AMI.
When Jenkins tries to connect to the instance, the logs show:
INFO: Verifying: java -fullversion
sh: 1: java: not found
Nov 01, 2018 8:22:10 PM null
INFO: Installing: sudo yum install -y java-1.8.0-openjdk.x86_64
sudo: no tty present and no askpass program specified
Nov 01, 2018 8:22:10 PM null
WARNING: Failed to install: sudo yum install -y java-1.8.0-openjdk.x86_64
sh: 1: java: not found
ERROR: Unable to launch the agent for Ubuntu 18.04 (i-xxx)
java.io.EOFException: unexpected stream termination
If I try to connect to the agent manually again after some time has elapsed (2-3 minutes), all is fine:
Agent successfully connected and online
Should the cloud-init script have run before the SSH connection?
I have never had this trouble when using Amazon Linux AMIs, where I install Java 8 in the same way (via a cloud-init script). Is this something specific to the way Amazon Linux runs cloud-init scripts compared with Ubuntu?

In the end I decided it was easier to install Java and create a new AMI to avoid this issue entirely.
I think my expectation that cloud-init would finish before Jenkins connects may be incorrect, mainly because of this comment in the documentation:
Allow enough time for the instance to launch and execute the directives in your user data, and then check to see that your directives have completed the tasks you intended.
One approach that might help is to stop sshd in the run commands while things install and start it again once everything is done. Hopefully Jenkins would then connect only when everything is ready.
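If the agent has to be launched from the stock AMI, another option is to make the connecting side wait instead. Below is a minimal sketch, assuming a POSIX shell on the agent and that the script runs before Jenkins checks for Java (for example from the plugin's init script field; whether that fits your setup is an assumption). wait_for_command is a hypothetical helper name, not part of Jenkins or cloud-init.

```shell
#!/bin/sh
# Poll until a command is on PATH, or give up after a timeout in seconds.
# wait_for_command is a hypothetical helper, not a Jenkins or cloud-init tool.
wait_for_command() {
    wfc_cmd=$1
    wfc_timeout=${2:-300}
    wfc_elapsed=0
    while ! command -v "$wfc_cmd" >/dev/null 2>&1; do
        if [ "$wfc_elapsed" -ge "$wfc_timeout" ]; then
            echo "timed out waiting for $wfc_cmd" >&2
            return 1
        fi
        sleep 1
        wfc_elapsed=$((wfc_elapsed + 1))
    done
    return 0
}

# Example use at the start of an agent init script:
#   cloud-init status --wait    # blocks until cloud-init finishes, where supported
#   wait_for_command java 300   # fallback: wait up to 5 minutes for java
```

On cloud-init releases that ship the status subcommand, the first commented line alone may be enough; the polling helper is only a fallback for images where it is not available.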

Related

Microk8s Error When Trying to create local cluster

I'm following the installation documentation (https://charmed-kubeflow.io/docs/quickstart) to get MicroK8s installed and configured with Kubeflow, but I'm hitting the problem shown below.
To be sure I start clean:
juju unregister -y microk8s-localhost
sudo snap remove microk8s --purge
sudo snap remove juju --purge
rm -rf ~/.local/share/juju
Ubuntu version:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy
(If I do the same on Ubuntu 20.04, it works.)
Then I try to create a local cluster:
sudo snap install microk8s --classic --channel=1.22/stable
snap install juju --classic --channel=2.9/stable
microk8s enable dns storage ingress metallb:10.64.140.43-10.64.140.49
microk8s status --wait-ready
juju bootstrap microk8s
The bootstrap never finishes:
Creating Juju controller "microk8s-localhost" on microk8s/localhost
Bootstrap to Kubernetes cluster identified as microk8s/localhost
Fetching Juju Dashboard 0.8.1
Creating k8s resources for controller "controller-microk8s-localhost"
.........<waiting>.....
I tried on a freshly installed Ubuntu VM and hit the same problem.
The --debug output:
...
17:17:44 INFO cmd bootstrap.go:395 Creating k8s resources for controller "controller-microk8s-localhost"
17:17:44 DEBUG juju.kubernetes.provider bootstrap.go:628 creating controller service:
&Service{ObjectMeta:{controller-service controller-microk8s-localhost 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:juju app.kubernetes.io/name:controller] map[controller.juju.is/id:d99735d9-21f7-435e-8a76-65d05ac60830] [] [] []},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:api-server,Protocol:,Port:17070,TargetPort:{0 17070 },NodePort:0,AppProtocol:nil,},},Selector:map[string]string{app.kubernetes.io/name: controller,},ClusterIP:,Type:ClusterIP,ExternalIPs:[],SessionAffinity:,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:,HealthCheckNodePort:0,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,IPFamilyPolicy:nil,ClusterIPs:[],IPFamilies:[],AllocateLoadBalancerNodePorts:nil,LoadBalancerClass:nil,InternalTrafficPolicy:nil,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{},},Conditions:[]Condition{},},}
17:17:44 DEBUG juju.caas.kubernetes.provider.proxy setup.go:177 polling caas credential rbac secret, in 1 attempt, token for secret "controller-proxy" not found
17:17:46 DEBUG juju.kubernetes.provider configmap.go:84 updating configmap "controller-configmap"
17:17:47 DEBUG juju.kubernetes.provider configmap.go:84 updating configmap "controller-configmap"
17:17:48 DEBUG juju.kubernetes.provider bootstrap.go:1209 mongodb container args:
printf 'args="--dbpath=/var/lib/juju/db --sslPEMKeyFile=/var/lib/juju/server.pem --sslPEMKeyPassword=ignored --sslMode=requireSSL --port=37017 --journal --replSet=juju --quiet --oplogSize=1024 --auth --keyFile=/var/lib/juju/shared-secret --storageEngine=wiredTiger --bind_ip_all"\nipv6Disabled=$(sysctl net.ipv6.conf.all.disable_ipv6 -n)\nif [ $ipv6Disabled -eq 0 ]; then\n args="${args} --ipv6"\nfi\nexec mongod ${args}\n'>/root/mongo.sh && chmod a+x /root/mongo.sh && /root/mongo.sh
17:17:48 DEBUG juju.kubernetes.provider k8s.go:2008 selecting units "app.kubernetes.io/name=controller" to watch
17:17:48 DEBUG juju.kubernetes.provider.watcher k8swatcher.go:114 fire notify watcher for controller-0
17:17:48 DEBUG juju.kubernetes.provider.watcher k8swatcher.go:114 fire notify watcher for controller
17:17:49 DEBUG juju.kubernetes.provider.watcher k8swatcher.go:114 fire notify watcher for controller-0
17:17:50 DEBUG juju.kubernetes.provider.watcher k8swatcher.go:114 fire notify watcher for controller-0
.........<waiting>.....
and it never ends.
Thanks

CentOS: yum update error “HTTP Error 403 - Forbidden”

I’ve recently spun up a Docker container running CentOS Linux version 7. In my office, we have a proxy server, so once the container was up, I consoled in and set the proxy manually:
[me@8adfa83bb9e2 /home/me]#
[me@8adfa83bb9e2 /home/me]# export http_proxy="http://10.10.10.101:8888"
[me@8adfa83bb9e2 /home/me]#
On a separate SO post, I learned about setting the proxy in the /etc/yum.conf file. So I added the following line to my /etc/yum.conf file:
proxy=http://10.10.10.101:8888
And then I did a “yum clean metadata”:
[me@8adfa83bb9e2 /home/me]# yum clean metadata
Loaded plugins: fastestmirror, ovl
Cleaning repos: base extras updates
0 metadata files removed
0 sqlite files removed
0 metadata files removed
[me@8adfa83bb9e2 /home/me]#
At this point, I figured I was home free. I did a “yum update”:
[me@8adfa83bb9e2 /home/me]#
[me@8adfa83bb9e2 /home/me]# yum update
Loaded plugins: fastestmirror, ovl
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container error was
14: HTTP Error 403 – Forbidden
...and then a lot more stuff here...
Hmm. “HTTP Error 403”. That’s a new one for me; I’m used to running “yum update” and it just automagically works.
This isn’t a DNS problem; the Docker container can resolve and ping mirrorlist.centos.org. I tried to use wget to pull down that URL, but the container doesn’t have wget installed. When I try the same thing from the host machine:
me@hostmachine:/home/me$
me@hostmachine:/home/me$
me@hostmachine:/home/me$ sudo wget http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container
[1] 7039
[2] 7040
[3] 7041
[2] Done arch=x86_64
me@hostmachine:/home/me$
Redirecting output to ‘wget-log’.
[1]- Exit 8 sudo wget http://mirrorlist.centos.org/?release=7
[3]+ Done repo=os
me@hostmachine:/home/me$
me@hostmachine:/home/me$
me@hostmachine:/home/me$ ls -l
total 4
-rw-r--r-- 1 root root 382 Jan 21 19:55 wget-log
me@hostmachine:/home/me$
me@hostmachine:/home/me$
me@hostmachine:/home/me$ more wget-log
--2021-01-21 19:55:31-- http://mirrorlist.centos.org/?release=7
Resolving mirrorlist.centos.org (mirrorlist.centos.org)... 147.75.69.225, 18.225.36.18, 67.219.148.138, ...
Connecting to mirrorlist.centos.org (mirrorlist.centos.org)|147.75.69.225|:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2021-01-21 19:55:31 ERROR 503: Service Unavailable.
me@hostmachine:/home/me$
me@hostmachine:/home/me$
(Yes, the host machine has the correct proxy settings. It is not a CentOS machine.)
Soooooooo… It looks like the yum service is “unavailable” from my host system. But I’ve run “yum update” on many, many other CentOS machines in my environment. No idea what might be different here. Has anyone seen this before? Thank you.
FYI for anyone who may be looking at this post: the issue was a proxy server problem. Once I set the proxy server settings on both the host and the container, the issue went away. I think that in the post above I had set the proxy in the container but not on the host, and the host was NAT'ing the container's IP address when doing a "yum update".
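A side note on the wget test from the host: the URL was not quoted, so the shell treated each & as a background operator and wget only requested the URL up to ?release=7 (that is what the [1], [2], [3] job lines and the wget-log show). Here is a small sketch of the same check with the URL quoted. mirror_url is just an illustrative variable name, and the proxy address in the comment is the one from the question.

```shell
#!/bin/sh
# Single quotes keep & and ? literal, so the whole query string is passed
# to wget as one argument instead of being split into background jobs.
mirror_url='http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container'

# Print the command that would be run; remove "echo" to actually fetch.
echo wget -O - "$mirror_url"

# To send the request through the office proxy for a single invocation:
#   http_proxy="http://10.10.10.101:8888" wget -O - "$mirror_url"
```

This does not explain the 403 inside the container by itself, but it makes the host-side test request the same URL that yum was asking for.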

Time in Docker container out of sync with host machine

I'm trying to connect to CosmosDB through my Spring Boot app. I have all of this working if I run the app with Spring or via IntelliJ. But when I run the app in Docker, I get the following error message:
com.azure.data.cosmos.CosmosClientException: The authorization token is not valid at the current time.
Please create another token and retry
(token start time: Thu, 26 Mar 2020 04:32:10 GMT,
token expiry time: Thu, 26 Mar 2020 04:47:10 GMT, current server time: Tue, 31 Mar 2020 20:12:42 GMT).
Note that in the above error message the current server time is correct but the other times are 5 days behind.
What I find interesting is that I only ever receive this in the docker container.
FROM {copy of zulu-jdk11}
ARG JAR_FILE
#.crt file in the same folder as your Dockerfile
ARG CERT="cosmos.cer"
ARG ALIAS="cosmos2"
#import cert into java
COPY $CERT /
RUN chmod +x /$CERT
WORKDIR $JAVA_HOME/lib/security
RUN keytool -importcert -file /$CERT -alias $ALIAS -cacerts -storepass changeit -noprompt
WORKDIR /
COPY /target/${JAR_FILE} app.jar
COPY run-java.sh /
RUN chmod +x /run-java.sh
ENV JAVA_OPTIONS "-Duser.timezone=UTC"
ENV JAVA_APP_JAR "/app.jar"
# run as non-root to mitigate some security risks
RUN addgroup -S nonroot && adduser -S nonroot -G nonroot
USER nonroot:nonroot
ENTRYPOINT ["/run-java.sh"]
One thing to note is ENV JAVA_OPTIONS "-Duser.timezone=UTC", but removing this didn't help at all.
I run basically the same steps from IntelliJ and have no issues, but in Docker the expiry date seems to be 5 days behind.
version: "3.7"
services:
  orchestration-agent:
    image: {image-name}
    ports:
      - "8080:8080"
    network_mode: host
    environment:
      - COSMOSDB_URI=https://host.docker.internal:8081/
      - COSMOSDB_KEY={key}
      - COSMOSDB_DATABASE={database}
      - COSMOSDB_POPULATEQUERYMETRICS=true
      - COSMOSDB_ITEMLEVELTTL=60
I think it should also be mentioned that I changed the network_mode to host. And I also changed the CosmosDB URI from https://localhost:8081 to https://host.docker.internal:8081/
I would also like to mention that I built my dockerfile with the help of:
Importing self-signed cert into Docker's JRE cacert is not recognized by the service
How to add a SSL self-signed cert to Jenkins for LDAPS within Dockerfile?
Docker containers don't maintain a separate clock. The container's time is identical to the Linux host's, since time is not a namespaced value. This is also why Docker removes the permission to change the time inside the container: doing so would affect the host and other containers, breaking the isolation model.
However, on Docker Desktop, Docker runs inside a VM (allowing you to run Linux containers on non-Linux desktops), and that VM's time can get out of sync when the laptop is suspended. This is currently being tracked in an issue on GitHub, which you can follow to see the progress: https://github.com/docker/for-win/issues/4526
Potential solutions include restarting your computer, restarting Docker's VM, running NTP as a privileged container, or resetting the time sync in the Windows VM with the following PowerShell:
Get-VMIntegrationService -VMName DockerDesktopVM -Name "Time Synchronization" | Disable-VMIntegrationService
Get-VMIntegrationService -VMName DockerDesktopVM -Name "Time Synchronization" | Enable-VMIntegrationService
With WSL 2, restarting the VM involves:
wsl --shutdown
wsl
There is a known problem with WSL 2 where the clock drifts after sleep. It has been fixed in the 5.10.16.3 WSL 2 Linux kernel, which is still not included in the Windows 10 version 21H1 update but can be installed manually.
How to check the WSL kernel version:
> wsl uname -r
Temporary workaround for the old kernel, which helps until the next sleep:
> wsl hwclock -s
Here's an alternative that worked for me on WSL2 with Docker Desktop on Windows:
Since it's not possible to set the date inside a Docker container, I just opened Ubuntu in WSL2 and ran the following command to synchronize the clock:
sudo date -s "$(wget -qSO- --max-redirect=0 google.com 2>&1 | grep Date: | cut -d' ' -f5-8)Z"
It worked well, so I added the following line in my root user's crontab:
# Edit root user's crontab
sudo crontab -e
# Add the following line to run it every minute of every day:
* * * * * sudo date -s "$(wget -qSO- --max-redirect=0 google.com 2>&1 | grep Date: | cut -d' ' -f5-8)Z"
After that, I just restarted my Docker containers, and the dates were correct since they seemed to use the WSL2 Ubuntu dates.
Date before (incorrect):
date
Thu Feb 4 21:50:35 UTC 2021
Date after (correct):
date
Fri Feb 5 19:01:05 UTC 2021
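Before resetting the clock, it can help to measure how far it has drifted. The sketch below reuses the idea from the one-liner above (read the Date header of an HTTP response) but only reports the difference instead of setting the time. Both function names are hypothetical, and clock_skew_seconds assumes GNU date, which is what Ubuntu under WSL 2 provides.

```shell
#!/bin/sh
# extract_http_date: read HTTP response headers on stdin and print the value
# of the first Date header (hypothetical helper name).
extract_http_date() {
    sed -n 's/^[[:space:]]*[Dd]ate:[[:space:]]*//p' | head -n 1 | tr -d '\r'
}

# clock_skew_seconds: local clock minus the given date, in seconds.
# Assumes GNU date (the -d option); BSD/macOS date uses different flags.
clock_skew_seconds() {
    remote_epoch=$(date -u -d "$1" +%s) || return 1
    echo $(( $(date -u +%s) - remote_epoch ))
}

# Example (needs network access):
#   remote=$(wget -qSO- --max-redirect=0 google.com 2>&1 | extract_http_date)
#   clock_skew_seconds "$remote"
```

A result close to zero means the WSL 2 clock is fine; a large negative number means the local clock is behind, as in the "date before" output above.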

Hyperledger fabric first network error executing ./byfn.sh: line 175: docker-compose: command not found

I have the following prerequisites: node v10.16.3, go v1.8.1 linux/amd64, Docker Podman v1.0.5, docker-compose 1.24.1. All are included in the PATH. When I execute "sudo ./byfn.sh up" I'm able to generate the crypto certificates without issues. However, the error "docker-compose: command not found" comes up at line 175, when docker-compose is executed in networkUp. I tried uninstalling and reinstalling Docker and couldn't find anything relevant on Stack Overflow either. Can someone please help me out?
Error:
./byfn.sh: line 175: docker-compose: command not found
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Additional info
host:
  BuildahVersion: 1.6-dev
  Conmon:
    package: podman-1.0.5-1.gitf604175.module+el8.0.0+4017+bbba319f.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.14.0-dev, commit: 6ee5ad0285ca12e6c8d0b663d7a8db5323812ef6-dirty'
  Distribution:
    distribution: '"rhel"'
    version: "8.0"
  kernel: 4.18.0-80.4.2.el8_0.x86_64
  os: linux
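One thing worth checking, which the post does not confirm as the cause: the script is run with sudo, and sudo can replace PATH (the secure_path setting in sudoers), so a docker-compose that your own user can find may not be visible to root. A minimal sketch for comparing the two; find_on_path is a hypothetical helper name.

```shell
#!/bin/sh
# find_on_path: print where a command resolves, or say that it does not.
find_on_path() {
    if fop_path=$(command -v "$1" 2>/dev/null); then
        echo "$1 -> $fop_path"
    else
        echo "$1 not found in PATH=$PATH"
        return 1
    fi
}

# As the normal user:
#   find_on_path docker-compose
# Under sudo, which may use a different PATH:
#   sudo sh -c 'command -v docker-compose || echo "not found in $PATH"'
```

If the two results differ, calling docker-compose by its full path in the script or adjusting the sudo configuration would be the things to try.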

docker-machine create with digitalocean driver and Ubuntu 16.04 x64 fails

I am attempting to create a docker machine on Digital Ocean, but with the 16.04 LTS instead of the default 15.10. The do-access-token file contains my token.
Here's the script (create-do):
#!/usr/bin/env bash
# Creates a DigitalOcean server with Ubuntu 16.04 instead of the default
if [ "$1" != "" ]; then
    echo "Creating: $1"
    # Quote the token and the name so spaces or glob characters are not expanded
    docker-machine \
        create \
        --driver digitalocean \
        --digitalocean-access-token="$(cat do-access-token)" \
        --digitalocean-image=ubuntu-16-04-x64 \
        --digitalocean-ipv6=true \
        "$1"
else
    echo "Must have server name!"
fi
When I run the script like this:
$ ./create-do ps-server
It successfully allocates the machine at Digital Ocean, then craps out with this:
Creating: ps-server
Running pre-create checks...
Creating machine...
(ps-server) Creating SSH key...
(ps-server) Creating Digital Ocean droplet...
(ps-server) Waiting for IP address to be assigned to the Droplet...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Error creating machine: Error running provisioning: Something went wrong
running an SSH command!
command : sudo apt-get update
err : exit status 100
output : Reading package lists...
E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
E: Unable to lock directory /var/lib/apt/lists/
The machine is running, but I can't get to it since the SSH key was apparently not set before things started going wrong.
Anyone seen this before and/or have a work-around?
Update: May 21, 2016
Broken again with same error this morning. Tried 4 times, failed same way each time.
Update: May 20, 2016
This was, according to Digital Ocean's support, due to an issue with their Ubuntu 16.04 image, which has now been corrected, and I have confirmed that this now works.
Related GitHub issue (not yet closed):
https://github.com/docker/machine/issues/3358
This worked for me:
docker-machine provision your-node
I've taken this solution from here: https://github.com/docker/machine/issues/3358
I hope this helps!
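The "Could not get lock" error in the question means something else on the new droplet was holding the apt lock when provisioning ran, so re-running provisioning a little later, as suggested above, can succeed. A minimal sketch of automating that retry; retry is a hypothetical helper, and the attempt count and delay are arbitrary assumptions.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, at most
# ATTEMPTS times, sleeping DELAY seconds between tries (hypothetical helper).
retry() {
    retry_attempts=$1
    retry_delay=$2
    shift 2
    retry_n=1
    until "$@"; do
        if [ "$retry_n" -ge "$retry_attempts" ]; then
            echo "giving up after $retry_n attempts: $*" >&2
            return 1
        fi
        retry_n=$((retry_n + 1))
        sleep "$retry_delay"
    done
}

# Example: re-run provisioning up to 5 times, 30 seconds apart.
#   retry 5 30 docker-machine provision ps-server
```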

Resources