Docker image deployed to Google Compute Engine keeps restarting - docker

I built an image with Google Cloud Build using Docker Compose. In my cloudbuild.yml file I have the following steps:
Build the docker image using docker compose
Tag the built image
Create an instance template
Create instance group
Here is the problem: every time a new instance is created, the container created from the image keeps restarting and never actually boots. Yet I can build the image and start it as a container on the instance myself, independently of the image from Cloud Build.
I managed to find some clues from the logs:
E1219 19:13:52 7f28dce6d700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
oauth2.cc:289 Getting auth token from metadata server docker
I also got a clue by running the following on the instance:
docker start -a -i <container_id>
Output: Unrecognized input header: 99
The cloudbuild.yml file looks like (I've replaced some variables with ...):
#cloudbuild.yaml
steps:
- name: 'docker/compose:1.22.0'
  args: ['-f', 'docker/docker-compose.tb.prod.yml', 'up', '-d']
- name: 'gcr.io/cloud-builders/docker'
  args: ['tag', 'tb:latest', '...']
- name: 'gcr.io/cloud-builders/gcloud'
  args: [
    'beta', 'compute', '--project=...', 'instance-templates', 'create-with-container',
    'tb-app-staging-${COMMIT_SHA}',
    '--machine-type=n1-standard-2', '--network=...', '--network-tier=PREMIUM', '--metadata=google-logging-enabled=true',
    '--maintenance-policy=MIGRATE', '--service-account=...',
    '--scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append',
    '--tags=http-server,https-server', '--image=cos-stable-69-10895-62-0', '--image-project=cos-cloud', '--boot-disk-size=20GB', '--boot-disk-type=pd-standard',
    '--container-restart-policy=always', '--labels=container-vm=cos-stable-69-10895-62-0',
    '--boot-disk-device-name=...',
    '--container-image=...',
  ]
- name: 'gcr.io/cloud-builders/gcloud'
  args: [
    'beta', 'compute', '--project=...', 'instance-groups',
    'managed', 'rolling-action', 'start-update',
    'tb-app-staging',
    '--version',
    'template=...',
    '--zone=europe-west1-b',
    '--max-surge=20',
    '--max-unavailable=9999'
  ]
images: ['...']
timeout: 1200s

I found the issue, and I'll answer this question myself in case someone else runs into the same problem.
The problem was that my docker-compose.yml sets stdin_open and tty to true, but the instance template created from my cloudbuild.yml did not carry those settings over, and the container was failing silently (annoying!).
To fix the issue, pass the flags --container-stdin and --container-tty to the create-with-container command.
More details can be found in the Google docs: https://cloud.google.com/compute/docs/containers/configuring-options-to-run-containers
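As a rough sketch of the fix described above, the instance-template step from the question could carry the two extra flags as shown here. The elided values ('...') are left exactly as in the question, and the comment marking omitted flags is a placeholder rather than real syntax to copy.

```yaml
steps:
- name: 'gcr.io/cloud-builders/gcloud'
  args: [
    'beta', 'compute', '--project=...', 'instance-templates', 'create-with-container',
    'tb-app-staging-${COMMIT_SHA}',
    # (other flags from the original step stay unchanged here)
    '--container-restart-policy=always',
    '--container-image=...',
    '--container-stdin',   # keep STDIN open, mirroring stdin_open: true in docker-compose.yml
    '--container-tty',     # allocate a pseudo-TTY, mirroring tty: true in docker-compose.yml
  ]
```

The idea is that the container declaration on the VM should match what the image expects at runtime; settings in docker-compose.yml are not picked up by create-with-container.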

I had a similar issue; the reason was setting USER in the Dockerfile. I was changing the user to 'node', a user that is available in the official Node.js images, but this does not work on Google Cloud containers.
FROM node:current-buster-slim
USER node

Related

How to use RABBITMQ_DEFAULT_USER and RABBITMQ_DEFAULT_PASS with defined rabbitmq-conf.json

I am facing an issue while trying to run a RabbitMQ Docker container.
It says the user does not exist (https://i.stack.imgur.com/SOeqq.png).
I am passing the user ID and password to the RabbitMQ docker-compose file as environment variables at runtime (the user and password won't be fixed).
docker-compose file
I have created rabbitmq-conf.json because I need some predefined queues.
rabbitmq-conf.json file
It was working fine with the rabbitmq:3.8.14-management-alpine image but does not work with rabbitmq:3.11.3-management-alpine.
The rabbitmq.conf file contains:
management.load_definitions = /etc/rabbitmq/rabbitmq-conf.json

Deploying Cloud Run via YAML gives error spec.template.spec.containers should contain exactly 1 container

When deploying a Cloud Run service via a YAML file from the command line, it fails with this error.
ERROR: (gcloud.run.services.replace) spec.template.spec.containers should contain exactly 1 container
This is because the documentation for adding an environment variable is wrong, or confusing at best.
The env node should be part of the same list item as image (aligned with the image key), not a separate child of the containers node as it says here.
https://cloud.google.com/run/docs/configuring/environment-variables#yaml
This is correct:
- image: us-east1-docker.pkg.dev/proj/repo/image:r1
  env:
  - name: SOMETHING
    value: Xyz
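For context, here is a minimal sketch of how that fragment might sit inside a full service definition. The service name my-service is hypothetical, and the explanation in the comments is my reading of the error rather than something confirmed by the documentation.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service              # hypothetical service name
spec:
  template:
    spec:
      containers:
      - image: us-east1-docker.pkg.dev/proj/repo/image:r1
        env:                    # same list item as image, so still one container
        - name: SOMETHING
          value: Xyz
      # Writing "- env:" at the list level instead would start a second
      # list item, which is presumably what triggers the
      # "should contain exactly 1 container" error.
```

A file like this would then be deployed with gcloud run services replace, the command named in the error message.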

Kubernetes Pod fails to start - Google Composer

I'm trying to set up a task in Airflow that creates a pod in Google Kubernetes from a Docker image in Registry. The task fails and returns:
Pod Launching failed: Pod took too long to start
Which does not say much. In a similar question, someone suggested adding a tag to the image version when using a private repository, but it did not solve the problem.
The operator is defined as:
start_pod_job = GKEStartPodOperator(
    task_id="start_pod_job",
    project_id="my-project",
    gcp_conn_id=GCP_CONN_ID,
    location="us-east1-a",
    cluster_name="us-east-cluster-name",
    is_delete_operator_pod=True,
    namespace="default",
    image="us.gcr.io/my-project/my-image:v1",
    cmds=['gs://my-bucket/my-file.csv'],
    name="my-name",
)
The entrypoint of the image is [ "python", "./myScript.py"]. It takes the URI of a file in Google Storage as a parameter. I was able to run the image successfully on my local machine.
I'm currently using Airflow 1.10.15 and Composer 1.16.14.
Any suggestions?

How to get resolved sha digest for all images within Kubernetes yaml?

Docker image tags are mutable: image:latest and image:1.0 can both point to image@sha256:....., but when version 1.1 is released, image:latest in the registry can be repointed to an image with a different sha digest. Pulling an image with a particular tag now does not guarantee that an identical image will be pulled next time.
If a Kubernetes YAML resource definition refers to an image by tag (not by digest), is there a means of determining what sha digest each image will actually resolve to, before the resource definition is deployed? Is this functionality supported using kustomize or kubectl?
Use case is wanting to determine what has actually been deployed in one environment before deploying to another (I'd like to take a hash of the resolved resource definition and could then use this to understand whether image:1.0 to be deployed to PROD refers to the same image:1.0 that was deployed to UAT).
Are there any tools that can be used to support this functionality?
For example, given the following YAML, is there a way of replacing all images with their resolved digests?
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: image1
    image: image1:1.1
    command:
    - /bin/sh -c some command
  - name: image2
    image: image2:2.2
    command:
    - /bin/sh -c some other command
To get something like this:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: image1
    image: image1@sha:....
    command:
    - /bin/sh -c some command
  - name: image2
    image: image2@sha:....
    command:
    - /bin/sh -c some other command
I'd like to be able to do something like pipe yaml (that might come from cat, kustomize or kubectl ... --dry-run) through a tool and then pass to kubectl apply -f:
cat mydeployment.yaml | some-tool | kubectl apply -f -
EDIT:
The background to this is the need to be able to prove to auditors/regulators that what is about to be deployed to one env (PROD) is exactly what has been successfully deployed to another env (UAT). I'd like to use normal tags in the deployment template and at the time of deploying to UAT, take a snapshot of the template with the tags replaced with the digests of the resolved images. That snapshot will be what is deployed (via kubectl or similar). When deploying to PROD, that same snapshot will be used.
This tool supports exactly what you need:
kbld: https://get-kbld.io/
It resolves a name-tag pair reference (nginx:1.17) into a digest reference
(index.docker.io/library/nginx@sha256:2539d4344...)
It looks like it integrates quite well with templating tools like Kustomize or even Helm.
You can list all the containers currently in use with this command. It covers all namespaces and prints the pod name, the container image name, and the sha256 of the image.
kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{"\n"}{.metadata.namespace}{","}{.metadata.name}{","}{range .status.containerStatuses[*]}{.image}{", "}{.imageID}{", "}{end}{end}' | sort
"is there a means of determining what sha digest each image will actually resolve to, before the resource definition is deployed?"
No, and in the case you describe, it can vary by node. The Deployment will create some number of Pods, each Pod will get scheduled on some Node, and the Kubelet there will only pull the image if it doesn’t have something with that tag already. If you have two replicas, and you’ve changed the image a tag points to, then on node A it could use the older image that was already there, but on node B where there isn’t an image, it will pull and get the newer version.
The best practice here is to avoid changing the image a tag points to. Give each build coming out of your CI system a unique tag (a datestamp or source control commit ID for example) and use that in your Kubernetes object specifications. That avoids this problem entirely.
A workaround is to set
imagePullPolicy: Always
in your pod specs, which will force the node to pull a newer version, but this is unnecessary overhead in most cases.
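For illustration, a minimal sketch of where that field goes, reusing the example Pod from the question (the image name and tag are the question's placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: image1
    image: image1:1.1
    # Ask the kubelet to re-resolve the tag against the registry every time
    # the container starts, instead of trusting a locally cached image.
    imagePullPolicy: Always
```

This only narrows the window in which two nodes can run different images for the same tag; it does not make the tag immutable, so unique tags per build remain the safer option.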
Here's another one: k8s-digester, from Google folks. It's a bit different in that its main focus is on cluster-side changes (via an admission controller), even though client-side use as a KRM function also seems to be possible.
Overall, kbld seems to be more about the development experience and adoption in your CLI/CI/CD/orchestration, while k8s-digester is more about administration on the cluster side.

Getting an error while trying to use a command under the lifecycle tag on kubernetes

I'm successfully running Kubernetes, gcloud, and Postgres, but I want to make some modifications after pod startup. I'm trying to move some files, so I tried these 3 options:
1
image: paunin/postgresql-cluster-pgsql
lifecycle:
  postStart:
    exec:
      command: [/bin/cp /var/lib/postgres/data /tmpdatavolume/]
2
image: paunin/postgresql-cluster-pgsql
lifecycle:
  postStart:
    exec:
      command:
      - "cp"
      - "/var/lib/postgres/data"
      - "/tmpdatavolume/"
3
image: paunin/postgresql-cluster-pgsql
lifecycle:
  postStart:
    exec:
      command: ["/bin/cp "]
      args: ["/var/lib/postgres/data","/tmpdatavolume/"]
With options 1 and 2, I'm getting the same error (from kubectl get events):
Killing container with docker id f436e40f5df2: PostStart handler: Error executing in Docker Container: -1
With option 3, it won't even let me upload the YAML file, giving me this error:
error validating "postgres-master.yaml": error validating data: found invalid field args for v1.ExecAction; if you choose to ignore these errors, turn validation off with --validate=false
Any help would be appreciated! Thanks.
P.S.: I only pasted part of my YAML file, since I wasn't getting any errors before I added those new lines.
Here's the document about lifecycle hooks you might find useful.
Your option 1 won't work and should give you the error you saw, because the whole command line is passed as a single array element; it should be ["/bin/cp","/var/lib/postgres/data","/tmpdatavolume/"] instead. Option 2 is also a valid way to specify it. Can you kubectl exec into your pod and type those commands to see what error messages they generate? Do something like kubectl exec <pod-name> -i -t -- bash -il
The error message shown for option 3 means that you're not passing a valid configuration to the API server. To learn the API definition, see v1.Lifecycle; after a few clicks into its child fields you'll find that args isn't valid under lifecycle.postStart.exec.
Alternatively, you can find those API definitions using kubectl explain, e.g. kubectl explain pods.spec.containers.lifecycle.postStart.exec in this case.
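Putting the advice above together, a hook along these lines might be worth trying. This is only a sketch: the -r flag is my assumption that /var/lib/postgres/data is a directory (plain cp refuses to copy directories), and it also assumes /tmpdatavolume/ is already mounted in the container.

```yaml
image: paunin/postgresql-cluster-pgsql
lifecycle:
  postStart:
    exec:
      # One array element per argument; exec has no separate "args" field.
      # "-r" is an assumption: it is needed only if the source is a directory.
      command: ["/bin/cp", "-r", "/var/lib/postgres/data", "/tmpdatavolume/"]
```

If the hook still fails, running the same cp command through kubectl exec should surface the underlying error message.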
