I have 5 tasks in my project that need to be run periodically. Some of these tasks are run on a daily basis, some on a weekly basis.
I am trying to containerize each task in a Docker image. Here is one illustrative example:
FROM tensorflow/tensorflow:2.7.0
RUN mkdir /home/MyProject
COPY . /home/MyProject
WORKDIR /home/MyProject/M1/src/
RUN pip install pandas numpy
CMD ./task1.sh
There is a list of Python scripts that need to be run by the task1.sh file referenced above. This is not a server application or anything similar: the container runs task1.sh, which runs all the Python scripts defined in it one by one, and the entire process finishes within minutes. The same process is then supposed to be repeated 24 hours later.
How can I schedule such Docker containers in GCP? Are there different ways of doing it? Which one is comparatively simpler if there are multiple solutions?
I am not a DevOps expert by any means. All the examples I find in the documentation are written for server applications that run all the time, not for a case like mine where the image only needs to be run once periodically. This topic is quite daunting for a beginner in this domain like myself.
ADDENDUM:
Looking at Google's documentation for cronjobs in GKE on the following page:
https://cloud.google.com/kubernetes-engine/docs/how-to/cronjobs
I find the following cronjob.yaml file:
# cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Allow
  startingDeadlineSeconds: 100
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo "Hello, World!"
          restartPolicy: OnFailure
It is stated that this cronjob prints the current time and a string once every minute.
But it is documented with the assumption that you already deeply understand what is going on, in which case you would not need to read the documentation in the first place!
Let's say that I have an image that I would like to run once every day, and the name of my image is, say, my_image.
I assume that I am supposed to change the following part for my own image:
containers:
- name: hello
  image: busybox
  args:
  - /bin/sh
  - -c
  - date; echo "Hello, World!"
It is a total mystery what these names and arguments mean.
name: hello
I suppose it is just a user-selected name and does not have any practical importance.
image: busybox
Is this busybox the base image? If not, what is it? The page says NOTHING about what this busybox thing is or where it comes from!
args:
- /bin/sh
- -c
- date; echo "Hello, World!"
And based on the explanation on the page, this is the part that prints the date and the "Hello, World!" string to the screen.
Ok... So, how do I modify this template to create a cronjob out of my own image my_image? This documentation does not help at all!
I will answer your comment here, because the second part of your question is too long for a comment.
Don't be afraid, it's the Kubernetes API definition. You declare what you want to the control plane, and it is in charge of making your wishes happen!
# cronjob.yaml
apiVersion: batch/v1 # The API that you call
kind: CronJob # The type of object/endpoint in that API
metadata:
  name: hello # The name of your job definition
spec:
  schedule: "*/1 * * * *" # Your schedule; change it to "0 10 * * *" to run your job every day at 10:00 am
  concurrencyPolicy: Allow # config stuff, deep dive later
  startingDeadlineSeconds: 100 # config stuff, deep dive later
  suspend: false # config stuff, deep dive later
  successfulJobsHistoryLimit: 3 # config stuff, deep dive later
  failedJobsHistoryLimit: 1 # config stuff, deep dive later
  jobTemplate: # Your execution definition
    spec:
      template:
        spec:
          containers:
          - name: hello # Custom name of your container. Only there to help you with debugging, logs, ...
            image: busybox # Image of your container, can be gcr.io/projectID/myContainer for example
            args: # Args to pass to your container. You can also change the "entrypoint" definition if you want. The entrypoint is the binary to run, and it receives the args
            - /bin/sh
            - -c
            - date; echo "Hello, World!"
            # You can also use "command" to run the command with the args directly. In fact it's WHAT you start in your container to perform the job.
          restartPolicy: OnFailure # Config in case of failure
You can find more details on the API definition here.
Here is the API definition of a container, with all the possible values to customize it.
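To make this concrete: below is a minimal sketch of the manifest adapted to the question, assuming my_image has been pushed somewhere the cluster can pull it from. The registry path gcr.io/PROJECT_ID/my_image and the 10:00 am schedule are assumptions for illustration, not something from the question.
# my-cronjob.yaml (sketch)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-daily-task
spec:
  schedule: "0 10 * * *"   # once a day at 10:00 am
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: my-task                                # user-selected name, shows up in logs/debugging
            image: gcr.io/PROJECT_ID/my_image:latest     # assumed registry path for my_image
            # no command/args needed: the image's own CMD (./task1.sh) runs as-is
          restartPolicy: OnFailure
Applying it with kubectl apply -f my-cronjob.yaml registers the schedule with the cluster's control plane.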
Related
I have a use case where my "./main" binary should run inside the pod and stop after some time (90 seconds) before the CronJob object launches a new pod.
But I am confused about how to combine the sleep with running my binary in the background. Please suggest a good approach to this, and excuse me for any wrong syntax.
Dockerfile
FROM golang:alpine
WORKDIR /app
COPY main /app
RUN apk update && apk add bash
CMD ["./main &"]
---
cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cron
  namespace: test-cron
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          volumes:
          - name: log
            hostPath:
              path: /data/log/test-cron/
          containers:
          - name: test-cron
            image: test-kafka-0.5
            command: ["sleep", "90"] # By adding this, the sleep command works, but my binary does not run inside the container.
Kubernetes has built-in support for killing a Pod after a deadline; you do not need to try to implement this manually.
In your Dockerfile, set up your image to run the program normally. Do not try to include any sort of "sleep" or "kill" option, and do not try to run the program in the background.
CMD ["./main"]
In the Job spec, you need to set an activeDeadlineSeconds: field.
apiVersion: batch/v1
kind: CronJob
spec:
  jobTemplate:
    spec:
      activeDeadlineSeconds: 90 # <-- add
      template:
        spec:
          containers:
          - name: test-cron
            image: test-kafka-0.5
            # do not override `command:`
An identical option exists at the Pod level. For your use case this doesn't especially matter, but if the Job launches multiple Pods then there's a difference of whether each individual Pod has the deadline or whether the Job as a whole does.
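For reference, a minimal sketch of where the Pod-level equivalent would sit, with the same 90-second value carried over and everything else unchanged from the spec above:
apiVersion: batch/v1
kind: CronJob
spec:
  jobTemplate:
    spec:
      template:
        spec:
          activeDeadlineSeconds: 90 # Pod-level deadline instead of (or alongside) the Job-level one
          containers:
          - name: test-cron
            image: test-kafka-0.5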
It looks from the question like you're trying to run this job every two minutes and not have two concurrent copies of it. Assuming the Pod is created and starts promptly, this should accomplish that. If you had a reason to believe the Job would run faster the second time and just want to restart it, the CronJob might not be the setup you're after.
I have 2 programs that need to be run periodically.
simple_job_1.py:
from datetime import datetime
import time
print("Starting job 1 ... ", datetime.now())
print("Doing stuff in job 1 for 20 seconds .........")
time.sleep(20)
print("Stopping job 1 ... ", datetime.now())
simple_job_2.py:
from datetime import datetime
import time
print("Starting job 2 ... ", datetime.now())
print("Doing stuff in job 2 for 5 seconds .........")
time.sleep(5)
print("Stopping job 2 ... ", datetime.now())
And I have created 2 Docker images by building the following Dockerfiles:
For job1:
FROM python:3
# Create a folder inside the container
RUN mkdir /home/TestProj
# Copy everything from the current folder in host to the dest folder in container
COPY . /home/TestProj
WORKDIR /home/TestProj
COPY . .
CMD ["python", "simple_job_1.py"]
For job2:
FROM python:3
# Create a folder inside the container
RUN mkdir /home/TestProj
# Copy everything from the current folder in host to the dest folder in container
COPY . /home/TestProj
WORKDIR /home/TestProj
COPY . .
CMD ["python", "simple_job_2.py"]
Here is how I build these container images:
docker build -t simple_job_1:1.0 .
docker build -t simple_job_2:1.0 .
And here is my docker-compose yaml file:
simple_compose.yaml:
version: "3.9"
services:
  job1:
    image: simple_job_1:1.0
  job2:
    image: simple_job_2:1.0
Q1) I need to run this compose file, say, every 10th minute as a cronjob. How can I achieve that? I know that it is possible to run containers as cronjobs, but is it possible to do that with docker-compose?
Q2) Is it possible to run docker-compose as a cronjob in Google Cloud GKE?
Q3) How can I make sure that job2 only starts after job1 completes?
ADDENDUM:
Here is an example of a cronjob spec for running containers as cronjobs in GKE:
# cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: simple-job-cj-1
spec:
  schedule: "0 8 * * *"
  concurrencyPolicy: Allow
  startingDeadlineSeconds: 100
  suspend: false
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: simple-job-cj-1
            image: simple_job
          restartPolicy: OnFailure
But please note that this runs one given container. Not being an expert in this field, I guess I can define multiple containers under the containers: section in the spec: above, which probably(?) means I do not need to use docker-compose then?? But if that is the case, how can I make sure that job2 only starts after job1 completes running?
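For illustration only, here is a hedged sketch of that guess, assuming simple_job_1:1.0 and simple_job_2:1.0 have been pushed to a registry the cluster can pull from. Note that regular containers in a Pod start in parallel, so listing both under containers: would not order them; initContainers, which run one after another and must finish before the regular containers start, are one way to get "job1 before job2":
# sketch only, adapted from the spec above
apiVersion: batch/v1
kind: CronJob
metadata:
  name: simple-jobs
spec:
  schedule: "*/10 * * * *"   # every 10th minute, as in Q1
  jobTemplate:
    spec:
      template:
        spec:
          # initContainers run sequentially and must complete before the
          # regular containers start, so job1 is guaranteed to finish first
          initContainers:
          - name: job1
            image: simple_job_1:1.0
          containers:
          - name: job2
            image: simple_job_2:1.0
          restartPolicy: OnFailure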
I'm developing an app whose logs contain custom fields for metric purposes.
Therefore, we produce the logs in JSON format and send them to an Elasticsearch cluster.
We're currently working on migrating the app from a local Docker node to our organization's Kubernetes cluster.
Our cluster uses Fluentd as a DaemonSet, to output logs from all pods to our Elasticsearch cluster.
The setup is similar to this: https://medium.com/kubernetes-tutorials/cluster-level-logging-in-kubernetes-with-fluentd-e59aa2b6093a
I'm trying to figure out what's the best practice to send logs from our app. My two requirements are:
That the logs are formatted correctly in JSON format. I don't want them to be nested in the msg field of the persisted document.
That I can run kubectl logs -f <pod> and view the logs in readable text format.
Currently, if I don't do anything and let the DaemonSet send the logs, it'll fail both requirements.
The best solution I thought about is to ask the administrators of our Kubernetes cluster to replace the Fluentd logging with Fluentbit.
Then I can configure my deployment like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  labels:
    app: example-app
  annotations:
    fluentbit.io/parser-example-app: json
    fluentbit.io/exclude-send-logs: "true"
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: myapp:1.0.0
        volumeMounts:
        - name: app-logs
          mountPath: "/var/log/app"
      - name: tail-logs
        image: busybox
        args: [/bin/sh, -c, 'tail -f /var/log/example-app.log']
        volumeMounts:
        - name: app-logs
          mountPath: "/var/log/app"
      volumes:
      - name: app-logs
        emptyDir: {}
Then the logs are sent to the Elasticsearch in correct JSON format, and I can run kubectl logs -f example-app -c tail-logs to view them in a readable format.
Is this the best practice though? Am I missing a simpler solution?
Is there an alternative supported by Fluentd?
I'll be glad to hear your opinion :)
There isn't really a good option here that isn't going to chew massive amounts of CPU. The closest thing I can suggest, other than the solution you mentioned above, is inverting it: the main output stream is unformatted, and you run Fluent* (usually Bit) as a sidecar on a secondary file stream. That's no better, though.
Really most of us just make the output be in JSON format and on the rare occasions we need to manually poke at logs outside of the normal UI (Kibana, Grafana, whatever), we just deal with the annoyance.
You could also theoretically make your "human" format sufficiently machine parsable to allow for querying. The usual choice there is "logfmt", aka key=value pairs. So my log lines on logfmt-y services look like timestamp=2021-05-15T03:48:05.171973Z level=info event="some message" otherkey=1 foo="bar baz". That's simple enough to read by hand but also can be parsed efficiently.
First time deploying to OpenShift (actually Minishift on my Windows 10 Pro). Every sample application I deployed resulted in the following:
From the Web Console I see a weird message, "Build #1 is pending", although from PowerShell it looked successful.
I found someone fixing a similar issue by changing to 0.0.0.0, but I gave it a try and it isn't the solution in my case.
Here are the full logs and how I am deploying:
PS C:\to_learn\docker-compose-to-minishift\first-try> oc new-app https://github.com/openshift/nodejs-ex
warning: Cannot check if git requires authentication.
--> Found image 93de123 (16 months old) in image stream "openshift/nodejs" under tag "10" for "nodejs"
Node.js 10.12.0
---------------
Node.js available as docker container is a base platform for building and running various Node.js applications and frameworks. Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.
Tags: builder, nodejs, nodejs-10.12.0
* The source repository appears to match: nodejs
* A source build using source code from https://github.com/openshift/nodejs-ex will be created
* The resulting image will be pushed to image stream tag "nodejs-ex:latest"
* Use 'start-build' to trigger a new build
* WARNING: this source repository may require credentials.
Create a secret with your git credentials and use 'set build-secret' to assign it to the build config.
* This image will be deployed in deployment config "nodejs-ex"
* Port 8080/tcp will be load balanced by service "nodejs-ex"
* Other containers can access this service through the hostname "nodejs-ex"
--> Creating resources ...
imagestream.image.openshift.io "nodejs-ex" created
buildconfig.build.openshift.io "nodejs-ex" created
deploymentconfig.apps.openshift.io "nodejs-ex" created
service "nodejs-ex" created
--> Success
Build scheduled, use 'oc logs -f bc/nodejs-ex' to track its progress.
Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
'oc expose svc/nodejs-ex'
Run 'oc status' to view your app.
PS C:\to_learn\docker-compose-to-minishift\first-try> oc get bc/nodejs-ex -o yaml
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2020-02-20T20:10:38Z
  labels:
    app: nodejs-ex
  name: nodejs-ex
  namespace: samplepipeline
  resourceVersion: "1123211"
  selfLink: /apis/build.openshift.io/v1/namespaces/samplepipeline/buildconfigs/nodejs-ex
  uid: 1003675e-541d-11ea-9577-080027aefe4e
spec:
  failedBuildsHistoryLimit: 5
  nodeSelector: null
  output:
    to:
      kind: ImageStreamTag
      name: nodejs-ex:latest
  postCommit: {}
  resources: {}
  runPolicy: Serial
  source:
    git:
      uri: https://github.com/openshift/nodejs-ex
    type: Git
  strategy:
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: nodejs:10
        namespace: openshift
    type: Source
  successfulBuildsHistoryLimit: 5
  triggers:
  - github:
      secret: c3FoC0RRfTy_76WEOTNg
    type: GitHub
  - generic:
      secret: vlKqJQ3ZBxfP4HWce_Oz
    type: Generic
  - type: ConfigChange
  - imageChange:
      lastTriggeredImageID: 172.30.1.1:5000/openshift/nodejs@sha256:3cc041334eef8d5853078a0190e46a2998a70ad98320db512968f1de0561705e
    type: ImageChange
status:
  lastVersion: 1
I am building my Docker image with Jenkins using:
docker build --build-arg VCS_REF=$GIT_COMMIT \
--build-arg BUILD_DATE=`date -u +"%Y-%m-%dT%H:%M:%SZ"` \
--build-arg BUILD_NUMBER=$BUILD_NUMBER -t $IMAGE_NAME\
I was using Docker but I am migrating to k8s.
With docker I could access those labels via:
docker inspect --format "{{ index .Config.Labels \"$label\"}}" $container
How can I access those labels with Kubernetes ?
I am aware of adding those labels in .Metadata.labels of my YAML files, but I don't like that much because
- it links that information to the deployment and not the container itself
- it can be modified anytime
...
kubectl describe pods
Thank you
Kubernetes doesn't expose that data. If it did, it would be part of the PodStatus API object (and its embedded ContainerStatus), which is one part of the Pod data that would get dumped out by kubectl get pod deployment-name-12345-abcde -o yaml.
You might consider encoding some of that data in the Docker image tag; for instance, if the CI system is building a tagged commit then use the source control tag name as the image tag, otherwise use a commit hash or sequence number. Another typical path is to use a deployment manager like Helm as the principal source of truth about deployments, and if you do that there can be a path from your CD system to Helm to Kubernetes that can pass along labels or annotations. You can also often set up software to know its own build date and source control commit ID at build time, and then expose that information via an informational-only API (like an HTTP GET /_version call or some such).
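For instance, a hedged sketch of the "pass it along at deploy time" idea: have the CI/CD system template the build metadata into Pod annotations (all names and values below are placeholders, not anything from the question), which can then be read back from inside the container as the next answer shows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app   # placeholder name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        build-date: "2021-05-15T03:48:05Z"   # injected by CI/CD, placeholder value
        vcs-ref: "abc1234"                   # injected by CI/CD, placeholder value
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:abc1234   # image tag doubling as the commit ID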
I'll add another option.
I would suggest reading about the Recommended Labels by K8S:
Key                            Description
app.kubernetes.io/name         The name of the application
app.kubernetes.io/instance     A unique name identifying the instance of an application
app.kubernetes.io/version      The current version of the application (e.g., a semantic version, revision hash, etc.)
app.kubernetes.io/component    The component within the architecture
app.kubernetes.io/part-of      The name of a higher level application this one is part of
app.kubernetes.io/managed-by   The tool being used to manage the operation of an application
So you can use the labels to describe a pod:
apiVersion: v1
kind: Pod # Or via a Deployment (apps/v1)
metadata:
  labels:
    app.kubernetes.io/name: wordpress
    app.kubernetes.io/instance: wordpress-abcxzy
    app.kubernetes.io/version: "4.9.4"
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/component: server
    app.kubernetes.io/part-of: wordpress
And use the downward API (which works in a similar way to reflection in programming languages).
There are two ways to expose Pod and Container fields to a running Container:
1) Environment variables.
2) Volume files.
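For option 1, here is a minimal sketch using environment variables; the Pod name is made up, and the single label key component is just borrowed from the volume-file example below:
apiVersion: v1
kind: Pod
metadata:
  name: downwardapi-env-example   # made-up name for the sketch
  labels:
    component: database
spec:
  containers:
  - name: client-container
    image: k8s.gcr.io/busybox
    # the shell prints the label value exposed through the env var, then idles
    command: ["sh", "-c", "echo component=$MY_COMPONENT; sleep 3600"]
    env:
    - name: MY_COMPONENT
      valueFrom:
        fieldRef:
          fieldPath: metadata.labels['component']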
And below is an example of using volume files (option 2):
apiVersion: v1
kind: Pod
metadata:
  name: kubernetes-downwardapi-volume-example
  labels:
    version: 4.5.6
    component: database
    part-of: etl-engine
  annotations:
    build: two
    builder: john-doe
spec:
  containers:
  - name: client-container
    image: k8s.gcr.io/busybox
    command: ["sh", "-c"]
    args: # < ------ We're using the mounted volumes inside the container
    - while true; do
        if [[ -e /etc/podinfo/labels ]]; then
          echo -en '\n\n'; cat /etc/podinfo/labels; fi;
        if [[ -e /etc/podinfo/annotations ]]; then
          echo -en '\n\n'; cat /etc/podinfo/annotations; fi;
        sleep 5;
      done;
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
  volumes: # < -------- We're mounting the pod's labels and annotations in our example
  - name: podinfo
    downwardAPI:
      items:
      - path: "labels"
        fieldRef:
          fieldPath: metadata.labels
      - path: "annotations"
        fieldRef:
          fieldPath: metadata.annotations
Notice that in the example we accessed the labels and annotations that were passed and mounted to the /etc/podinfo path.
Beside labels and annotations, the downward API exposed multiple additional options like:
The pod's IP address.
The pod's service account name.
The node's name and IP.
A container's CPU limit, CPU request, memory limit, and memory request.
See full list in here.
(*) A nice blog discussing the downward API.
(**) You can view all your pods' labels with:
$ kubectl get pods --show-labels
NAME ... LABELS
my-app-xxx-aaa pod-template-hash=...,run=my-app
my-app-xxx-bbb pod-template-hash=...,run=my-app
my-app-xxx-ccc pod-template-hash=...,run=my-app
fluentd-8ft5r app=fluentd,controller-revision-hash=...,pod-template-generation=2
fluentd-fl459 app=fluentd,controller-revision-hash=...,pod-template-generation=2
kibana-xyz-adty4f app=kibana,pod-template-hash=...
recurrent-tasks-executor-xaybyzr-13456 pod-template-hash=...,run=recurrent-tasks-executor
serviceproxy-1356yh6-2mkrw app=serviceproxy,pod-template-hash=...
Or view only a specific label with $ kubectl get pods -L <label_name>.