Keeping a POD alive in Kubernetes - docker

I am in the process of migrating an application from a Windows server to Kubernetes running in the cloud. I was able to launch the application successfully in Kubernetes. Next is restoration: there is a script provided to restore the data to the new instance, but it should be run while the application instance is shut down. There is a script called stop.sh to shut down the instance of the application. Both the restore and shutdown scripts have to be run inside the Pod. I got into the Pod using "kubectl exec" and then tried to shut down the instance using stop.sh. The instance shuts down, but the Pod exits along with it, so I am not able to run the restore.sh script inside the Pod. Is there a way to keep the Pod alive even after my application instance is shut down, so I can run the restore script?

A Pod is meant to stay alive while its main process is running; when the main process shuts down, the Pod is terminated.
What you want is not a way to keep the Pod alive, but a way to refactor your application to work with Kubernetes properly.
The restore phase you previously had in a script will likely run as an init container.
Init containers are specialized containers that run before the other containers in a Pod, so you could run your restore logic before the main application starts.
The next phase is starting the application. The application should start by itself inside the container when the container is created, so the init container shouldn't affect its lifecycle. The application can assume the init container has done its work and the environment is set up properly.
The next phase is termination. When a container is started or stopped, Kubernetes triggers the PostStart and PreStop container hooks. You likely need a PreStop hook to execute your custom shutdown script.
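Roughly, the restore.sh and stop.sh scripts from your question could be wired in like this (the image name, script paths, and volume are placeholders I'm assuming, not details from your setup):

apiVersion: v1
kind: Pod
metadata:
  name: legacy-app                       # placeholder name
spec:
  volumes:
    - name: app-data
      emptyDir: {}                       # could also be a PersistentVolumeClaim, see the persistent volume note below
  initContainers:
    - name: restore
      image: legacy-app:latest           # assumed image that contains restore.sh
      command: ["/bin/sh", "-c", "/opt/app/restore.sh"]     # assumed script path
      volumeMounts:
        - name: app-data
          mountPath: /opt/app/data       # assumed data directory
  containers:
    - name: app
      image: legacy-app:latest
      volumeMounts:
        - name: app-data
          mountPath: /opt/app/data
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "/opt/app/stop.sh"]  # assumed script path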
The scenario above assumes minimal changes to the application. If you are willing to refactor the application for Kubernetes, there are other ways to achieve this, like using persistent volumes to store the data and reuse it when the container starts, so you won't need to do a backup and restore every time.
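If you go the persistent volume route, a minimal sketch would be to back the volume above with a claim instead of an emptyDir (the claim name and size are assumptions, and the cluster's default storage class is used):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-claim                   # placeholder name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi                      # assumed size
---
# In the Pod above, the volume would then reference the claim instead of an emptyDir:
# volumes:
#   - name: app-data
#     persistentVolumeClaim:
#       claimName: app-data-claim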

Related

Jenkins build process running forever

I want to stop the Jenkins process after the build and start the server, but it's running forever like this... What should I do?
The application you are starting does not seem to have an end, because it looks like you are running a Spring Boot application. After the successful build you start it right away inside the Jenkins job, so the job will never terminate, because the application does not terminate.
So I think you want to deploy the application somewhere and let it run, maybe on a target VM?

How to kill a multi-container pod if one container fails?

I'm using the Jenkins Kubernetes Plugin, which starts pods in a Kubernetes cluster that serve as Jenkins agents. The pods contain 3 containers in order to provide the slave logic, a Docker socket, and the gcloud command line tool.
The usual workflow is that the slave does its job and notifies the master that it has completed, and the master then terminates the pod. However, if the slave container crashes due to a lost network connection, it terminates with error code 255 while the other two containers keep running, and so does the pod. This is a problem because the pods have large CPU requests; the setup is cheap when the slaves run only as long as they have to, but having multiple machines running for 24h or over the weekend causes noticeable financial damage.
I'm aware that starting multiple containers in the same pod is not fine Kubernetes art, but it's OK if I know what I'm doing, and I assume I do. I'm sure it's hard to solve this differently given the way the Jenkins Kubernetes Plugin works.
Can I make the pod terminate if one container fails, without it being respawned? A solution with a timeout is acceptable as well, though less preferred.
Disclaimer: I have rather limited knowledge of Kubernetes, but given the question:
Maybe you can run a fourth container that exposes one simple "liveness" endpoint.
It can run ps -ef or use any other way to check on the 3 existing containers, just to make sure they're alive.
This endpoint would return "OK" only if all the containers are running, and "ERROR" if at least one of them was detected as crashed.
Then you could set up a Kubernetes liveness probe so that it would stop the pod upon the error returned from that fourth container.
Of course, if this fourth container crashes by itself for any reason (it shouldn't, unless there is a bug or something), the liveness probe won't respond and Kubernetes is supposed to stop the pod anyway, which is probably what you really want to achieve.
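For reference, a sketch of what the probe on that fourth container could look like (the container name, image, port, and endpoint path are assumptions, not part of the plugin):

# added to the agent pod template used by the Jenkins Kubernetes Plugin
containers:
  - name: watchdog                       # the suggested fourth container (assumed name)
    image: registry.example.com/watchdog:latest   # assumed image serving the health endpoint
    ports:
      - containerPort: 8080              # assumed port
    livenessProbe:
      httpGet:
        path: /healthz                   # returns 200 only while the other three containers are alive
        port: 8080
      periodSeconds: 30
      failureThreshold: 2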

How to add containers to a Kubernetes pod on runtime

I have a number of Jobs running on k8s.
These jobs run a custom agent that copies some files and sets up the environment for a user-provided (trusted) container to run.
This agent runs alongside the user container, captures the logs, waits for the container to exit, and processes the generated results.
To achieve this, we mount Docker's socket (/var/run/docker.sock), run as a privileged container, and from within the agent use docker-py to interact with the user container (setup, run, capture logs, terminate).
This works almost fine, but I'd consider it a hack. Since the user container is created by calling Docker directly on the node, k8s is not aware of its existence. This has been causing trouble, since our monitoring tools interact with k8s and don't get visibility into these stand-alone user containers. It also makes pod scheduling harder to manage, since the limits (CPU/memory) for the user container are not accounted for in the pod's requests.
I'm aware of init containers, but these don't quite fit this use case, since we want to keep the agent running and monitoring the user container until it completes.
Is it possible for a container running in a pod to request Kubernetes to add additional containers to the same pod the agent is running in? And if so, can the agent also request Kubernetes to remove the user container at will (e.g. when a certain custom condition is met)?
From this GitHub issue, it seems that the answer is that adding or removing containers to a pod is not possible, since the container list in the pod spec is immutable.
In Kubernetes 1.16 there is an alpha feature that allows for the creation of ephemeral containers, which can be "added" to running pods. Note that this requires a feature gate to be enabled on the relevant components, e.g. the kubelet. This may be hard to enable on the control plane of cloud-provider-managed services such as EKS.
API Reference 1.16
Simple tutorial
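For illustration only, an entry in the pod's ephemeralContainers list looks roughly like this (the names and image are placeholders); it is applied through the pod's ephemeralcontainers subresource rather than by editing the pod spec:

# applied via the pod's ephemeralcontainers subresource, not the regular pod spec
ephemeralContainers:
  - name: debugger                       # placeholder name
    image: busybox                       # placeholder image
    command: ["sh"]
    stdin: true
    tty: true
    targetContainerName: user-container  # optional; assumed name of the container to attach to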
I don't think you can alter a running pod like that, but you can certainly define your own pod and run it programmatically using the API.
What I mean is you should define a pod with the user container and whatever other containers you wish, and run it as a unit. It's possible you'll need to play around with liveness checks to have post-processing completed after your user container dies.
You can share data between multiple containers in a pod using shared volumes. This would let your agent container read from log files written by the user container, and drop config files into the shared volume for setup.
This way you could run the user container and the agent container as a Job, with both containers in the pod. When both containers exit, the Job will be completed.
You seem to indicate above that you are manually terminating the user container. That wouldn't be supported via a shared volume unless you did something like forcing users to terminate their execution on the presence of a file in the shared volume.
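A rough sketch of that Job layout, with both containers sharing an emptyDir volume (the names, images, and mount paths are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: user-job                         # placeholder name
spec:
  template:
    spec:
      restartPolicy: Never               # don't restart the user container when it exits
      volumes:
        - name: shared
          emptyDir: {}
      containers:
        - name: agent
          image: registry.example.com/agent:latest       # assumed agent image
          volumeMounts:
            - name: shared
              mountPath: /shared         # agent reads logs and drops config files here
        - name: user-container
          image: registry.example.com/user-image:latest  # assumed user-provided image
          volumeMounts:
            - name: shared
              mountPath: /shared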
Is it possible for a container running in a pod to request Kubernetes to add additional containers to the same pod the agent is running in? And if so, can the agent also request Kubernetes to remove the user container at will (e.g. when a certain custom condition is met)?
I'm not aware of any way to add containers to an existing Job pod definition. There's no replicas option for Jobs, so you couldn't hack it by changing replicas from 0 to 1 like you potentially could on a Deployment.
I'm not aware of any way to use kubectl to delete a container but not the whole pod. See kubectl delete.
If you want to kill the user container (rather than having it run to completion), you'll have to get on the host and use docker kill <sha> on the user container. Make sure to set .spec.template.spec.restartPolicy = "Never" in the Job's pod template, or k8s will restart the user container.
I'd recommend:
Having a shared volume to transfer logs to the agent and let the agent set up the user container
Making user containers expect to exit on their own and read their config from the shared volume
I don't know what workloads you are running or how users build their containers, so that may not be possible. If you're not able to dictate how users build their containers, the above may not work.
Another option is providing a binary that acts as a command API in the user container. This binary could accept commands like "setup", "run", "terminate", "transfer logs" via RPC, and it would be the main process in their Docker container.
Then you could make the build process for users something like:
FROM your-container-with-binary:latest
# put whatever you want in this container and set
# ENV JOB_PATH=/path/to/executable/code (or put code in a specific location)
Lots of moving parts to this whichever way you make it happen.
You can inject containers into pods dynamically via: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/
An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. The controllers consist of the list below, are compiled into the kube-apiserver binary, and may only be configured by the cluster administrator. In that list, there are two special controllers: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. These execute the mutating and validating (respectively) admission control webhooks which are configured in the API.
Admission controllers may be “validating”, “mutating”, or both. Mutating controllers may modify the objects they admit; validating controllers may not.
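For context, a mutating webhook is registered with a configuration roughly like the following; the webhook server that actually rewrites the pod spec to add the extra container is not shown, and the names, namespace, path, and rules here are assumptions:

apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: sidecar-injector                 # placeholder name
webhooks:
  - name: sidecar-injector.example.com   # placeholder webhook name
    clientConfig:
      service:
        name: sidecar-injector           # assumed Service in front of the webhook server
        namespace: default
        path: /mutate
      caBundle: "<base64-encoded CA certificate>"   # placeholder
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
    failurePolicy: Ignore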
And you can inject additional runtime requirements into pods via a Pod Preset: https://kubernetes.io/docs/concepts/workloads/pods/podpreset/
A Pod Preset is an API resource for injecting additional runtime requirements into a Pod at creation time. You use label selectors to specify the Pods to which a given Pod Preset applies.
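A minimal PodPreset sketch (the label selector, environment variable, and volume are placeholders; the PodPreset admission controller and the settings.k8s.io/v1alpha1 API have to be enabled on the cluster):

apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: inject-shared-volume             # placeholder name
spec:
  selector:
    matchLabels:
      role: user-job                     # applied to pods carrying this label (assumed)
  env:
    - name: JOB_PATH                     # example variable from the Dockerfile snippet above
      value: /path/to/executable/code
  volumeMounts:
    - name: shared
      mountPath: /shared
  volumes:
    - name: shared
      emptyDir: {}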

How to delay Docker Swarm updating a stateful container until it's ready?

Problem domain
Imagine that a stateful container is being managed by Swarm, e.g. a database, and another container relies on it, e.g. a service that is executing a long-running job (minutes, sometimes hours) and cannot tolerate the database (or even itself) going down while it's executing.
To give an example, a database importing a multi-GB dump.
There's also a CI/CD system in place which takes care of building new versions of the containers and deploying them to the Swarm, or pushing the image to Docker Hub, which then calls a defined webhook that fires off the deployment event.
Question
Is there any way I can build my containers so that Swarm knows whether it's OK to update them or not? Similarly to how HEALTHCHECK reports whether a container needs to be restarted, something that would let Swarm know that "it's safe to restart this container now".
Or is it the CI/CD system's responsibility to check whether the stateful containers are safe to restart, and only then issue the update command to Swarm?
Docker will not check with a container whether it is ready to be stopped; once you give Docker the command to stop a container, it will perform that action. However, it performs the stop in two steps. The first step is a SIGTERM that your container can trap and handle gracefully. By default, after 10 seconds, a SIGKILL is sent, which the Linux kernel applies immediately and which cannot be trapped by the container. For your goals, you'll want to make sure your app knows when it's safe to exit after receiving the first signal, and you'll probably want to extend the time between the signals to much longer than 10 seconds.
The healthcheck won't tell Docker that your container is at a safe point to stop. It tells Swarm when your container has finished starting, or when it's misbehaving and needs to be stopped and replaced. The healthcheck defines a command to run inside your container, and the exit code is checked for whether it's 0 (healthy) or 1 (unhealthy). No other exit codes are currently valid.
If you need more than simple signal handling inside the container, then yes, you're likely moving up the stack to a CI/CD tool to manage the deployment.
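In a stack file, the longer grace period and the healthcheck described above would look roughly like this (the service, image, healthcheck command, and timings are placeholders):

version: "3.7"
services:
  db:
    image: postgres:12                   # placeholder image
    stop_grace_period: 1h                # time allowed between SIGTERM and SIGKILL
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]   # exit 0 = healthy, 1 = unhealthy
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s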

Kubernetes: How do I deploy container from saved checkpoint?

I am using the experimental checkpoint feature to start up my app in a container and save its state.
I do so because the tests on the app cannot be run in parallel and startup takes a long time.
I want to migrate to Kubernetes to manage the test containers:
Build and start up the app in a container
Save its state
Spin up X instances from the saved container
Run one test on each container
How do I use Kubernetes to do that?
I use GCP.
Container state migration (CRIU) is a feature that Docker has in an experimental state. According to the Kubernetes devs (https://github.com/kubernetes/kubernetes/issues/3949), it looks like it is not something Kubernetes will support in the short term. Therefore, you currently cannot migrate pods with checkpoints (i.e. the app will need to start again). Not sure if creating a container image of your started application could help; that would depend on how the container image was created.
