Retrying after a settable delay in Argo Workflows - rate-limiting

One of our Argo Workflow steps may hit a rate limit, and I want to be able to tell Argo how long it should wait until the next retry.
Is there a way to do it?
I've seen Retries in the documentation, but it only covers retry count and backoff strategies, and it doesn't look like those can be parameterized.
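For reference, the retryStrategy block the docs describe looks roughly like this (field names are from the Argo docs; the values are illustrative, and note they are fixed in the spec rather than computed at runtime):
retryStrategy:
  limit: "5"
  retryPolicy: OnFailure
  backoff:
    duration: "30s"     # delay before the first retry
    factor: "2"         # multiplier applied to the delay on each subsequent retry
    maxDuration: "10m"  # hard cap on total time spent retrying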

As far as I know, there's no built-in way to add a pause before the next retry.
However, you could build your own with Argo's exit handler feature:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: exit-handler-with-pause-
spec:
  arguments:
    parameters:
    - name: pause-before-retry-seconds
      value: "60"
  entrypoint: intentional-fail
  onExit: exit-handler
  templates:
  - name: intentional-fail
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo intentional failure; exit 1"]
  - name: exit-handler
    steps:
    - - name: pause
        template: pause
        when: "{{workflow.status}} != Succeeded"
  - name: pause
    container:
      image: alpine:latest
      env:
      - name: SECONDS
        value: "{{workflow.parameters.pause-before-retry-seconds}}"
      command: [sh, -c]
      args:
      - >-
        echo "Pausing before retry...";
        sleep "$SECONDS"
If the retry pause needs to be calculated within the workflow, check out the exit handler with params example.
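A hedged sketch of that pattern (names here are illustrative, not from the example itself): the failing step exports the computed delay as a global output parameter, and the exit handler's pause template reads it back:
# In the failing template: export the computed delay
outputs:
  parameters:
  - name: retry-after
    globalName: retry-after        # exposed as workflow.outputs.parameters.retry-after
    valueFrom:
      path: /tmp/retry-after.txt   # e.g. parsed from the API's Retry-After header

# In the pause template: consume it instead of the workflow parameter
env:
- name: SECONDS
  value: "{{workflow.outputs.parameters.retry-after}}"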

Related

Kubernetes /bin/bash with -c argument returns - : invalid option

I have this definition in my values.yaml which is supplied to job.yaml
command: ["/bin/bash"]
args: ["-c", "cd /opt/nonrtric/ric-common/ansible/; cat group_vars/all"]
However, after the pod initializes, I get this error:
/bin/bash: - : invalid option
If I try this syntax:
command: ["/bin/sh", "-c"]
args:
  - >
    cd /opt/nonrtric/ric-common/ansible/ &&
    cat group_vars/all
I get this error: Error: failed to start container "ric-register-avro": Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "/bin/sh -c": stat /bin/sh -c: no such file or directory: unknown
Both sh and bash are supplied in the image, which is CentOS 7
job.yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: ric-register-avro
spec:
  backoffLimit: 0
  template:
    spec:
      containers:
      - image: "{{ .Values.ric_register_avro_job.image }}"
        name: "{{ .Values.ric_register_avro_job.name }}"
        command: {{ .Values.ric_register_avro_job.containers.command }}
        args: {{ .Values.ric_register_avro_job.containers.args }}
        volumeMounts:
        - name: all-file
          mountPath: "/opt/nonrtric/ric-common/ansible/group_vars/"
          readOnly: true
          subPath: all
      volumes:
      - name: all-file
        configMap:
          name: ric-register-avro--configmap
      restartPolicy: Never
values.yaml
global:
  name: ric-register-avro
  namespace: foo-bar
ric_register_avro_job:
  name: ric-register-avro
  all_file:
    rest_api_url: http://10.230.227.13/foo
    auth_username: foo
    auth_password: bar
  backoffLimit: 0
  completions: 1
  image: 10.0.0.1:5000/5gc/ric-app
  containers:
    name: ric-register-avro
    command: ["/bin/bash"]
    args: ["-c cd /opt/nonrtric/ric-common/ansible/; cat group_vars/all"]
  restartPolicy: Never
In your Helm chart, you directly specify command: and args: using template syntax
command: {{ .Values.ric_register_avro_job.containers.command }}
args: {{ .Values.ric_register_avro_job.containers.args }}
However, the output of a {{ ... }} block is always a string. If the value you have inside the template is some other type, like a list, it will be converted to a string using some default Go rules, which aren't especially useful in a Kubernetes context.
Helm includes two lightly-documented conversion functions toJson and toYaml that can help here. Valid JSON is also valid YAML, so one easy approach is just to convert both parts to JSON
command: {{ toJson .Values.ric_register_avro_job.containers.command }}
args: {{ toJson .Values.ric_register_avro_job.containers.args }}
or, if you want it to look a little more like normal YAML,
command:
{{ .Values.ric_register_avro_job.containers.command | toYaml | indent 12 }}
args:
{{ .Values.ric_register_avro_job.containers.args | toYaml | indent 12 }}
or, for that matter, if you're passing a complete container description via Helm values, it could be enough to
containers:
  - name: ric_register_avro_job
{{ .Values.ric_register_avro_job.containers | toYaml | indent 10 }}
In all of these cases, I've put the templating construct starting at the first column, then used the indent function to indent the YAML block correctly. Double-check the resulting indentation and adjust the indent parameter to match where the block sits in your template.
You can also double-check that what's coming out looks correct using helm template, using the same -f option(s) as when you install the chart.
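For example, assuming the chart is in the current directory and your overrides live in values.yaml, something like this (release name is arbitrary) renders the manifests locally so you can inspect the generated command: and args::
helm template my-release . -f values.yaml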
(In practice, I might put many of the options you show directly into the chart template, rather than making them configurable as values. The container name, for example, doesn't need to be configurable, and I'd usually fix the command. For this very specific example you can also set the container's workingDir: rather than running cd inside a shell wrapper.)
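A minimal sketch of that last suggestion, with the command fixed in the template and workingDir: replacing the cd (the image reference is taken from your values; everything else is illustrative):
containers:
- name: ric-register-avro
  image: "{{ .Values.ric_register_avro_job.image }}"
  workingDir: /opt/nonrtric/ric-common/ansible
  command: ["cat", "group_vars/all"]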
I use this:
command: ["/bin/sh"]
args: ["-c", "my-command"]
Trying this simple Job, I have no issue:
apiVersion: batch/v1
kind: Job
metadata:
  name: foo
spec:
  template:
    spec:
      containers:
      - name: foo
        image: centos:7
        command: ["/bin/sh"]
        args: ["-c", "echo 'hello world'"]
      restartPolicy: Never

Cloudbuild does not trigger new pod deployment, No resources found in namespace GKE

I've been playing around with GCP triggers to deploy a new pod every time a push is made to a GitHub repo. I've got everything set up: the Docker image is pushed to the GCP Container Registry, and the trigger completes successfully without any errors. I use the $SHORT_SHA tags generated by the build pipeline as my image tags. However, the new pod deployment does not work. I am not sure what the issue is, because I am also modifying the codebase with every new push just to test the deployment. I've followed a couple of Google tutorials on triggers, but I am unable to understand what exactly the issue is and why the newly pushed image does not get deployed.
cloudbuild.yaml
steps:
- name: maven:3-jdk-8
  id: Maven Compile
  entrypoint: mvn
  args: ["package", "-Dmaven.test.skip=true"]
- name: 'gcr.io/cloud-builders/docker'
  id: Build
  args:
  - 'build'
  - '-t'
  - 'us.gcr.io/$PROJECT_ID/<image_name>:$SHORT_SHA'
  - '.'
- name: 'gcr.io/cloud-builders/docker'
  id: Push
  args:
  - 'push'
  - 'us.gcr.io/$PROJECT_ID/<image_name>:$SHORT_SHA'
- name: 'gcr.io/cloud-builders/gcloud'
  id: Generate manifest
  entrypoint: /bin/sh
  args:
  - '-c'
  - |
    sed "s/GOOGLE_CLOUD_PROJECT/$SHORT_SHA/g" kubernetes.yaml
- name: "gcr.io/cloud-builders/gke-deploy"
  args:
  - run
  - --filename=kubernetes.yaml
  - --image=us.gcr.io/$PROJECT_ID/<image_name>:$SHORT_SHA
  - --location=us-central1-c
  - --cluster=cluster-1
kubernetes.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <deployment_name>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <container_label>
  template:
    metadata:
      labels:
        app: <container_label>
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: default-pool
      containers:
      - name: <container_name>
        image: us.gcr.io/<project_id>/<image_name>:GOOGLE_CLOUD_PROJECT
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: <service-name>
spec:
  selector:
    app: <selector_name>
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
I would recommend a few changes to get your Cloud Build pipeline to deploy the application to the GKE cluster.
cloudbuild.yaml
In the Build and Push stages, instead of tagging the image as us.gcr.io/$PROJECT_ID/<image_name>:$SHORT_SHA, use gcr.io/$PROJECT_ID/sample-image:latest.
Generate manifest stage - you can skip/delete this stage.
gke-deploy stage - remove the --image argument.
kubernetes.yaml
In the spec, reference the image as gcr.io/$PROJECT_ID/sample-image:latest; it will then always take the latest image on each deployment.
The rest all seems good.
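A hedged sketch of the container spec under that advice (placeholders as in your manifest; note that a pod spec won't expand $PROJECT_ID itself, so the project ID is written literally, and that with a :latest tag the default imagePullPolicy is already Always):
containers:
- name: <container_name>
  image: gcr.io/<project_id>/sample-image:latest
  imagePullPolicy: Always   # explicit here; :latest defaults to Always anyway
  ports:
  - containerPort: 8080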

Kubernetes preStop hook doesn't work with env variables

I have a deployment template as below, but the preStop hook is never executed at all.
The idea here is to set the ZooKeeper node offline before the pod is terminated.
I am running kubectl rollout to restart the pods, and when the old pod terminates, the preStop hook is not run. Could someone please check what's wrong?
Basically, how does preStop get executed in the case of a successful stop? I need this feature because ZooKeeper is involved here and the API connects to ZooKeeper to send requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: abcd
  labels:
    app: abcd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: abcd
  template:
    metadata:
      labels:
        app: abcd
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      # terminationGracePeriodSeconds: 1
      containers:
      - name: se
        image: "xxx"
        lifecycle:
          preStop:
            exec:
              command: ["zookeepercli","--servers","zk-hs", "-c", "set", "$HOSTNAME", "offline"]
        ports:
        - containerPort: 2345
      - name: pe-1
        image: "xxx"
        lifecycle:
          preStop:
            exec:
              command: ["zookeepercli","--servers","zk-hs", "-c", "set", "$HOSTNAME", "offline"]
        ports:
        - containerPort: 2313
As user2511126 mentioned in their comment:
the preStop hook doesn't use env variables. I moved to a bash script and it works now
According to the Kubernetes documentation:
PreStop
This hook is called immediately before a container is terminated due to an API request or management event such as liveness probe failure, preemption, resource contention and others. A call to the preStop hook fails if the container is already in terminated or completed state. It is blocking, meaning it is synchronous, so it must complete before the call to delete the container can be sent. No parameters are passed to the handler.
A more detailed description of the termination behavior can be found in Termination of Pods.
No parameters can be passed to the handler; this includes environment variables.
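Since exec commands are not run in a shell, $HOSTNAME in your command array is passed to zookeepercli literally rather than expanded. A common workaround (a sketch, assuming the image ships /bin/sh) is to wrap the command in a shell so the variable expands:
lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      # The shell expands $HOSTNAME; a bare exec would pass it through verbatim
      - zookeepercli --servers zk-hs -c set "$HOSTNAME" offline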

Is there a sneaky way to run a command before the entrypoint (in a k8s deployment manifest) without having to modify the dockerfile/image? [duplicate]

This official document shows that you can run a command from a YAML config file:
https://kubernetes.io/docs/tasks/configure-pod-container/
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
spec:  # specification of the pod's contents
  restartPolicy: Never
  containers:
  - name: hello
    image: "ubuntu:14.04"
    env:
    - name: MESSAGE
      value: "hello world"
    command: ["/bin/sh","-c"]
    args: ["/bin/echo \"${MESSAGE}\""]
If I want to run more than one command, how do I do that?
command: ["/bin/sh","-c"]
args: ["command one; command two && command three"]
Explanation: The command ["/bin/sh", "-c"] says "run a shell, and execute the following instructions". The args are then passed as commands to the shell. In shell scripting, a semicolon separates commands, and && conditionally runs the following command only if the preceding one succeeds. In the above example, it always runs command one followed by command two, and only runs command three if command two succeeded.
Alternative: In many cases, some of the commands you want to run are probably setting up the final command to run. In this case, building your own Dockerfile is the way to go. Look at the RUN directive in particular.
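For instance, a hypothetical Dockerfile that bakes the setup into the image so the pod spec only runs the final command (the package installed here is purely illustrative):
FROM ubuntu:14.04
# Setup that would otherwise clutter the pod's command/args
RUN apt-get update && apt-get install -y curl
# The single final command the container runs
CMD ["/bin/echo", "hello world"]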
My preference is to multi-line the args; it's the simplest and easiest to read. Also, the script can be changed without affecting the image; you just need to restart the pod. For example, for a mysql dump, the container spec could be something like this:
containers:
- name: mysqldump
  image: mysql
  command: ["/bin/sh", "-c"]
  args:
  - echo starting;
    ls -la /backups;
    mysqldump --host=... -r /backups/file.sql db_name;
    ls -la /backups;
    echo done;
  volumeMounts:
  - ...
The reason this works is that YAML concatenates all the lines after the "-" into one, and sh runs the resulting long string "echo starting; ls ... ; echo done;".
If you're willing to use a Volume and a ConfigMap, you can mount ConfigMap data as a script, and then run that script:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-configmap
data:
  entrypoint.sh: |-
    #!/bin/bash
    echo "Do this"
    echo "Do that"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: "ubuntu:14.04"
    command:
    - /bin/entrypoint.sh
    volumeMounts:
    - name: configmap-volume
      mountPath: /bin/entrypoint.sh
      readOnly: true
      subPath: entrypoint.sh
  volumes:
  - name: configmap-volume
    configMap:
      defaultMode: 0700
      name: my-configmap
This cleans up your pod spec a little and allows for more complex scripting.
$ kubectl logs my-pod
Do this
Do that
If you want to avoid concatenating all commands into a single command with ; or && you can also get true multi-line scripts using a heredoc:
command:
- sh
- "-c"
- |
  /bin/bash <<'EOF'
  # Normal script content possible here
  echo "Hello world"
  ls -l
  exit 123
  EOF
This is handy for running existing bash scripts, but has the downside of requiring both an inner and an outer shell instance for setting up the heredoc.
I am not sure if the question is still active, but since I did not find this solution in the above answers, I decided to write it down.
I use the following approach:
readinessProbe:
  exec:
    command:
    - sh
    - -c
    - |
      command1
      command2 && command3
I know my example relates to readinessProbe, livenessProbe, etc., but I suspect the same holds for container commands. This provides flexibility, as it mirrors standard script writing in Bash.
IMHO the best option is to use YAML's native block scalars. Specifically in this case, the folded style block.
By invoking sh -c you can pass arguments to your container as commands, but if you want to elegantly separate them with newlines, you'd want to use the folded style block, so that YAML will know to convert newlines to whitespaces, effectively concatenating the commands.
A full working example:
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  containers:
  - name: busy
    image: busybox:1.28
    command: ["/bin/sh", "-c"]
    args:
    - >
      command_1 &&
      command_2 &&
      ...
      command_n
Here is my successful run
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: busybox
  name: busybox
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - |
      echo "running below scripts"
      i=0;
      while true;
      do
        echo "$i: $(date)";
        i=$((i+1));
        sleep 1;
      done
    name: busybox
    image: busybox
Here is one more way to do it, with output logging.
apiVersion: v1
kind: Pod
metadata:
  labels:
    type: test
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - name: log-vol
      mountPath: /var/mylog
    command:
    - /bin/sh
    - -c
    - >
      i=0;
      while [ $i -lt 100 ];
      do
      echo "hello $i";
      echo "$i : $(date)" >> /var/mylog/1.log;
      echo "$(date)" >> /var/mylog/2.log;
      i=$((i+1));
      sleep 1;
      done
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  volumes:
  - name: log-vol
    emptyDir: {}
Here is another way to run multi-line commands.
apiVersion: batch/v1
kind: Job
metadata:
  name: multiline
spec:
  template:
    spec:
      containers:
      - command:
        - /bin/bash
        - -exc
        - |
          set +x
          echo "running below scripts"
          if [[ -f "if-condition.sh" ]]; then
            echo "Running if success"
          else
            echo "Running if failed"
          fi
        name: ubuntu
        image: ubuntu
      restartPolicy: Never
  backoffLimit: 1
Just to bring another possible option, secrets can be used as they are presented to the pod as volumes:
Secret example:
apiVersion: v1
kind: Secret
metadata:
  name: secret-script
type: Opaque
data:
  script_text: <<your script in b64>>
Yaml extract:
....
containers:
- name: container-name
  image: image-name
  command: ["/bin/bash", "/your_script.sh"]
  volumeMounts:
  - name: vsecret-script
    mountPath: /your_script.sh
    subPath: script_text
....
volumes:
- name: vsecret-script
  secret:
    secretName: secret-script
I know many will argue this is not what Secrets should be used for, but it is an option.

Liveness probe gets timed out

I'm using Jenkins and Kubernetes to perform these actions.
Since my load balancer needs a healthy pod, I had to add a livenessProbe to my pod.
My configuration for the pod:
apiVersion: v1
kind: Pod
metadata:
  labels:
    component: ci
spec:
  # Use service account that can deploy to all namespaces
  serviceAccountName: default
  # Use the persistent volume
  containers:
  - name: gcloud
    image: gcr.io/cloud-builders/gcloud
    command:
    - cat
    tty: true
  - name: kubectl
    image: gcr.io/cloud-builders/kubectl
    command:
    - cat
    tty: true
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
The issue is that when I want to deploy the code (CD over Jenkins), it gets to the
touch /tmp/healthy;
command and times out.
The error response I get looks like this:
java.io.IOException: Failed to execute shell script inside container [kubectl] of pod [wobbl-mobile-label-qcd6x-13mtj]. Timed out waiting for the container to become ready!
When I type kubectl get events
I get the following response:
Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Any hints on how to solve this?
I have read this documentation for the liveness probe and took the config from there.
As can be seen from the link you are referring to, the example is meant to help you understand how the liveness probe works. In the example below, taken from that link, they purposely remove the /tmp/healthy file after 30 seconds:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
What this does is create the /tmp/healthy file when the container starts. After 5 seconds the liveness probe kicks in and checks for the /tmp/healthy file; at this moment the container does have a /tmp/healthy file present. After 30 seconds the script deletes the file, the liveness probe fails to find it, and the container is restarted. This process repeats, with the liveness probe failing the health check 30 seconds into each run.
If you only run
touch /tmp/healthy
(without removing the file afterwards), the liveness probe should work well.
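In other words (a sketch keeping the rest of the pod spec unchanged):
args:
- /bin/sh
- -c
# Create the probe file once and keep it; the probe's cat will always succeed
- touch /tmp/healthy; sleep 600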
