How to handle Flink management and K8s management


I'm considering deploying Flink with K8s. I'm a newbie to Flink and have a simple question:
Say I use K8s to manage Docker containers and deploy the TaskManager into those containers.
As I understand it, a container can be restarted by K8s when it fails, and a Task can be restarted by Flink when it fails.
If a Task is running in a Docker container and the container suddenly fails for some reason, then from Flink's point of view a Task failed, so the Task should be restarted. From K8s' point of view a container failed, so the container should be restarted. In this case, should we worry about a conflict between these two kinds of restart?

I think you want to read up on the official Kubernetes setup guide here: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/kubernetes.html
It describes 3 ways of getting it to work:
Session Cluster: This involves spinning up the 2 Deployments listed in the guide's appendix and requires you to submit your Flink job manually or via a script at startup. This is very similar to a local standalone cluster when you are developing, except that it now runs in your Kubernetes cluster.
Job Cluster: By deploying Flink as a Kubernetes Job, you can eliminate the job submission step.
Helm chart: By the look of it, the project has not updated this for 2 years, so your mileage may vary.
I have had success with a Session Cluster, but I would eventually like to try the "proper" way, which, by the looks of it, is to deploy it as a Kubernetes Job using the 2nd method.
Depending on your Flink source and the kind of failure, your Flink job will fail differently. You shouldn't worry about the "conflict": either Kubernetes restarts the container, or Flink handles the errors it is able to handle. After a certain number of retries Flink will stop trying, depending on how you configured this. See Configuration for more details. If the container exits with a non-zero code, Kubernetes will try to restart it. However, the job may or may not be resubmitted, depending on whether you deployed it as a Job Cluster or whether the image you used has an initialization script that submits it. In a Session Cluster, this can be problematic depending on whether the job submission is done through the task manager or the job manager. If the job was submitted through the task manager, then you need to cancel the existing failed job so the resubmitted job can start.
Note: if you go with the Session Cluster and use a filesystem-based state backend (i.e. not RocksDB) for checkpoints, you need to find a way for the job manager and task manager to share a checkpoint directory.
If the task manager uses a checkpoint directory that is inaccessible to the job manager, the task manager's persistence layer will build up and eventually cause an out-of-disk-space error. This may not be a problem if you go with RocksDB and enable incremental checkpoints.
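The following minimal Python model makes the "no conflict" argument concrete. It assumes the standard Kubernetes pod `restartPolicy` values (`Always`, `OnFailure`, `Never`) and a Flink fixed-delay restart strategy, which flink-conf.yaml configures through `restart-strategy: fixed-delay` and `restart-strategy.fixed-delay.attempts`. The function and class names are hypothetical. The model is meant to show that Flink reacts to task failures inside a process that is still running, while Kubernetes only reacts when a container process exits.

```python
from dataclasses import dataclass


def kubelet_restarts(restart_policy: str, exit_code: int) -> bool:
    """Sketch: would Kubernetes restart a container that exited?

    Uses the real pod restartPolicy values; Deployment pods must use
    "Always", while Job pods use "OnFailure" or "Never".
    """
    if restart_policy == "Always":
        return True
    if restart_policy == "OnFailure":
        return exit_code != 0
    if restart_policy == "Never":
        return False
    raise ValueError(f"unknown restartPolicy: {restart_policy!r}")


@dataclass
class FixedDelayBudget:
    """Hypothetical model of Flink's fixed-delay restart budget.

    Each task failure consumes one restart attempt. Once the budget is
    used up, the job is reported as failed instead of being restarted.
    """

    max_attempts: int
    used: int = 0

    def on_task_failure(self) -> str:
        if self.used < self.max_attempts:
            self.used += 1
            return "RESTARTING"
        return "FAILED"


if __name__ == "__main__":
    budget = FixedDelayBudget(max_attempts=2)
    print([budget.on_task_failure() for _ in range(3)])
    # A process that exits non-zero under OnFailure gets restarted by Kubernetes.
    print(kubelet_restarts("OnFailure", exit_code=1))
```

In this model the two decisions are independent. Kubernetes never consults Flink's attempt counter, and Flink's counter never sees the container's exit code.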
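One way to give both processes the same checkpoint directory is to point `state.checkpoints.dir` at storage that both pods can reach, such as an object store or a shared PersistentVolume mounted at the same path in each pod. The sketch below renders the relevant flink-conf.yaml lines from Python. The keys `state.checkpoints.dir`, `state.backend` and `state.backend.incremental` are Flink configuration options. The helper name, the scheme check and the example paths are assumptions.

```python
def render_checkpoint_conf(checkpoint_dir: str, use_rocksdb: bool = True) -> str:
    """Render flink-conf.yaml lines for a checkpoint directory shared by
    the job manager and task manager (hypothetical helper)."""
    if "://" not in checkpoint_dir:
        # This sketch insists on an explicit scheme (file://, s3://, hdfs://)
        # so that it is unambiguous which storage both processes will use.
        raise ValueError("checkpoint_dir should be a URI with a scheme")
    lines = [f"state.checkpoints.dir: {checkpoint_dir}"]
    if use_rocksdb:
        # Incremental checkpoints only persist the RocksDB files that changed.
        lines += ["state.backend: rocksdb", "state.backend.incremental: true"]
    else:
        lines.append("state.backend: filesystem")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example path is an assumption: a shared volume mounted in both pods.
    print(render_checkpoint_conf("file:///mnt/flink-shared/checkpoints", use_rocksdb=False))
```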

Related

Call jobs on demand

We have a Docker container that runs a CLI application: it starts, does its thing, and exits.
I got the assignment to put this into Kubernetes, but the container can't simply be deployed, because it exits and is then considered to be in a crash loop.
So the next question is whether it can be put in a Job. The Job would run and be restarted every time a request comes in over the proxy. Is that possible? Can a Job be restarted externally with different parameters in Kubernetes?

So the next question is whether it can be put in a Job.
If it is supposed to just run once, a Kubernetes Job is a good fit.
The Job would run and be restarted every time a request comes in over the proxy. Is that possible?
This cannot easily be done without external add-ons. Consider using Knative for this.
Can a Job be restarted externally with different parameters in Kubernetes?
Not easily. If I understand you correctly, you need to interact with the Kubernetes API to create a new Job for this. One way to do this is to have a Job with a kubectl image and the proper RBAC permissions on its ServiceAccount to create new Jobs, but this will involve some latency since it is two Jobs.
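Because a finished Job is not restarted with new arguments, each request means creating a new Job object. The Python sketch below builds such a `batch/v1` Job manifest as plain data. The image name, the arguments and the `cli` container name are hypothetical, and the component that submits the manifest (kubectl, a client library, or the second Job mentioned above) is left out.

```python
import json
from typing import List


def make_job_manifest(image: str, args: List[str], name_prefix: str = "cli-run-") -> dict:
    """Build a batch/v1 Job manifest that runs `image` once with `args`."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        # generateName lets the API server append a unique suffix per request.
        "metadata": {"generateName": name_prefix},
        "spec": {
            "backoffLimit": 2,  # retry a failing pod at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Job pods may not use "Always"
                    "containers": [
                        {"name": "cli", "image": image, "args": list(args)}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = make_job_manifest(
        "registry.example.com/cli-tool:1.0", ["--input", "request-42.json"]
    )
    print(json.dumps(manifest, indent=2))
```

Because the manifest uses `generateName`, submit it with `kubectl create -f -` rather than `kubectl apply`, which expects a fixed name.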

How to restart interrupted Jenkins jobs after a server or node failure/restart?

I'm running a Jenkins server and some slaves on a Docker swarm that's hosted on preemptible Google instances (akin to AWS spot instances). I've got everything set up so that at any given moment there is a Jenkins master running on a single server and slaves running on every other server in the swarm. When one server gets terminated, another is spun up to replace it, and eventually Jenkins is back up and running on another machine even if its server was stopped, and slaves get replaced as they die.
I'm facing two problems:
My first one is when the Jenkins master dies and comes back online it tries to resume the jobs that were previously running and they end up getting stuck trying to be built. Is there any way to automatically have Jenkins restart jobs that were interrupted instead of trying to resume them?
The second is when a slave dies I'd like to automatically restart any jobs that were running on it elsewhere. Is there any way to do that?
Currently I'm dealing with both situations by having an external application retry the failed build jobs, but that's not really optimal.
Thanks!

Starting up dependent services in Jenkins

Our test suite relies on a number of subsidiary services being present - database, message queue, redis, and so on. I would like to set up a Jenkins build that spins up all the correct services (docker containers, most likely) and then runs the correct tests, followed by some other steps.
Can someone point me to a good example for doing such a thing? I've seen a plug-in for mongo, and some general guides on spinning up agents, but their relationship to what I'm trying to do is unclear.

One possibility is to use the Jenkins Kubernetes plugin and the Jenkins Kubernetes pipeline plugin. They will allow you to
launch Docker slaves automatically,
with container group support through podTemplate and containerTemplate.
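The "container group" idea maps onto a single Kubernetes pod. The build container and the service containers share one network namespace, so tests can reach Postgres on localhost:5432 and Redis on localhost:6379. The Python sketch below builds such a pod spec as plain data to show its shape. The images, the password value and the helper names are assumptions. In practice the Kubernetes plugin generates the pod from your podTemplate/containerTemplate definitions, not from code like this.

```python
from typing import Dict, List, Optional


def container(name: str, image: str, env: Optional[Dict[str, str]] = None) -> dict:
    """One entry of a pod's `containers` list (hypothetical helper)."""
    spec: dict = {"name": name, "image": image}
    if env:
        spec["env"] = [{"name": k, "value": v} for k, v in env.items()]
    return spec


def make_ci_pod(build_image: str, services: List[dict]) -> dict:
    """Pod with a build container plus service containers alongside it."""
    # `cat` with a TTY keeps the build container idle so pipeline steps can
    # run inside it; this mirrors a pattern common in Kubernetes plugin examples.
    build = {"name": "build", "image": build_image, "command": ["cat"], "tty": True}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"generateName": "ci-"},
        "spec": {"restartPolicy": "Never", "containers": [build, *services]},
    }


if __name__ == "__main__":
    pod = make_ci_pod(
        "maven:3-jdk-8",
        [
            container("postgres", "postgres:12", {"POSTGRES_PASSWORD": "ci-only"}),
            container("redis", "redis:5"),
        ],
    )
    print([c["name"] for c in pod["spec"]["containers"]])
```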

Advantages/Disadvantages of Running Jenkins Slaves for Dev/Test/Prod?

Let's start by agreeing that we want to adhere to typical Docker/DevOps principles. Therefore, we want to keep tasks isolated, configurations version-controlled, and overall customization to a minimum.
The Landscape:
Jenkins is being used as the CI/CD tool on your cloud instance of choice.
The Plan:
Create separate instances for test/staging/prod, each with Docker installed
Spin up Jenkins slave containers on each instance, which are controlled by Jenkins master
When a commit is sent to the 'test' branch, the Jenkins master sends the task to the 'Test' slave, which ultimately spins up that version of the application
Similarly, after tests run successfully and code is pushed to the staging or prod branch, Jenkins will have the slave for that branch build the application.
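The routing step of this plan amounts to a lookup from branch name to agent label. In a Jenkinsfile you would express this with agent labels. The Python sketch below only models the routing decision, and the label names are hypothetical.

```python
from typing import Optional

# Hypothetical labels assigned to the slave containers on each instance.
BRANCH_TO_AGENT_LABEL = {
    "test": "test-agent",
    "staging": "staging-agent",
    "prod": "prod-agent",
}


def agent_label_for(branch: str) -> Optional[str]:
    """Return the agent label that should build `branch`, or None to skip."""
    # Accept either a short branch name or a full "refs/heads/..." ref.
    prefix = "refs/heads/"
    short = branch[len(prefix):] if branch.startswith(prefix) else branch
    return BRANCH_TO_AGENT_LABEL.get(short)


if __name__ == "__main__":
    for ref in ["test", "refs/heads/prod", "feature/login"]:
        print(ref, "->", agent_label_for(ref))
```

Keeping the mapping in one version-controlled place fits the stated goal of keeping configuration under version control.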
The Question(s):
What is wrong with this approach?
What can be improved by this approach?

There are a few questions you should ask yourself when taking this approach, many of which are covered in this blog post.
The final paragraph suggests exposing the docker socket to the CI container, allowing you to build images on the host machine, instead of inside the CI container, saving you from a lot of pains that come from running Docker in Docker.
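Exposing the Docker socket usually means bind-mounting the host's /var/run/docker.sock into the CI container. The docker CLI inside the container then talks to the host daemon, so images are built on the host. The Python helper below assembles such a `docker run` command. The image name is hypothetical, and the CI image must include its own docker CLI. Anything with access to this socket effectively has root on the host, so limit which jobs can use it.

```python
import shlex
from typing import List


def ci_container_command(image: str, command: List[str]) -> List[str]:
    """docker run argv for a CI container that drives the host's daemon."""
    return [
        "docker", "run", "--rm",
        # Share the host daemon's socket instead of running Docker in Docker.
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        image,
        *command,
    ]


if __name__ == "__main__":
    # Inside the container, `docker ps` lists the host's containers.
    print(shlex.join(ci_container_command("example/ci-agent:latest", ["docker", "ps"])))
```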
Another question you should probably ask is which orchestration service will control the master and slave containers. I had a great time following this blog post by Stelligent to quickly create everything I needed on AWS ECS using a CloudFormation stack, but other solutions are obviously an option.
So all in all, I don't see anything wrong with your approach, as long as you exercise caution and follow best practices.
Good luck.

How would i go about creating docker environment in CI with lots of services

Suppose I want to move my current acceptance-test CI environment to Docker, so I can benefit from the performance improvements and also quickly set up multiple clones for slow acceptance tests.
I would have a lot of services.
The easy ones would be Postgres, MongoDB, Redis and such, which are updated rarely.
However, how would I go about it if my own product has lots of services as well, over 10-20 services that all need to work together for tests? Is it even feasible to handle this with Docker, i.e., how can CI efficiently control so many containers automatically AND make clones of them to run acceptance tests in parallel?
Also, how would I easily update the containers automatically for the CI? Would the CI simply need to rebuild every container at the start of every run with the HEAD of every service branch? Or would the CI run git pull and some update/migrate command on every service?
With VMs it's easy to control these services, but I would like to be convinced that Docker is as good or better for it as well.

I'm in the same position as you and have recently gotten this all working to my liking.
First of all, while Docker is generally intended to run a single process, for testing I've found it works better for the Docker container to run all the services needed. There is some duplication in going this route, but you don't have to worry about shared services, like Mongo or PostgreSQL. This can be accomplished by using something like Supervisor: http://docs.docker.com/articles/using_supervisord/
The idea is to configure supervisor to start all necessary services inside the container, so they are completely isolated from other containers. In my environment, I have mongo, xvfb, chrome and firefox all running in a single container. So really, you still are running a single process (supervisor) but it starts many others.
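A supervisord configuration for this pattern has one [program:...] section per service and runs supervisord in the foreground (nodaemon=true), so that it can serve as the container's main process. The Python sketch below renders such a file. The program names and command lines are examples, not the answerer's actual setup.

```python
from typing import Dict


def render_supervisord_conf(programs: Dict[str, str]) -> str:
    """Render a supervisord.conf that starts every program in `programs`."""
    # nodaemon=true keeps supervisord in the foreground as the container's main process.
    parts = ["[supervisord]", "nodaemon=true", ""]
    for name, command in programs.items():
        parts += [
            f"[program:{name}]",
            f"command={command}",
            "autorestart=true",  # restart the service if it crashes mid-run
            "",
        ]
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_supervisord_conf({
        "mongod": "mongod --bind_ip 127.0.0.1",
        "xvfb": "Xvfb :99 -screen 0 1280x1024x24",
    }))
```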
As for adding repositories to your container, I just have the host machine checkout the code and then when I run docker, I use the -v flag to add the repo to the container. This way you don't need to rebuild the container each time. I build containers nightly with the latest code to be able to add all necessary gems for a faster 'gem install' at testing time.
Lastly I have a script as the entrypoint of the container that allows me to pass in what test I want to run.
Jenkins then just runs the docker commands and passes in the tests to run. These can be done in parallel, sequentially or any other way you like. I'm currently looking into having these tests run on slave Jenkins instances in an auto-scaling group in AWS.
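The workflow described above can be sketched in Python: mount the host checkout with -v, pass the test name to the entrypoint script, and fan out one container per test. The sketch assumes an image whose entrypoint takes the test name as its first argument, as the answer describes. The /src mount point and the function names are hypothetical. The runner is a parameter, so the fan-out logic can be tried without Docker.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def docker_test_command(image: str, repo_dir: str, test_name: str) -> List[str]:
    """One isolated run: a fresh container with the host checkout mounted."""
    return ["docker", "run", "--rm", "-v", f"{repo_dir}:/src", image, test_name]


def run_tests_in_parallel(
    image: str,
    repo_dir: str,
    tests: Sequence[str],
    max_workers: int = 4,
    runner: Callable[[List[str]], int] = subprocess.call,
) -> Dict[str, int]:
    """Start one container per test and return each test's exit code."""
    commands = [docker_test_command(image, repo_dir, t) for t in tests]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        exit_codes = list(pool.map(runner, commands))
    return dict(zip(tests, exit_codes))
```

A Jenkins step could call `run_tests_in_parallel("example/acceptance:nightly", "/srv/checkout", ["login", "checkout"])` and fail the build if any returned exit code is non-zero. Threads are sufficient here because each worker mostly waits on a `docker run` subprocess.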
Hope that helps.

Drone is a Docker-based open-source CI tool that is also available as an online service: https://drone.io
Generally it runs the build and tests in Docker containers and removes all containers after the build. You just need to provide a file named .drone.yml, with configuration similar to .travis.yml, to configure your build.
It will manage your services, like a database or cache, as linked containers.
For your build environment, you can use existing Docker images as templates for dependencies.
So far, it supports github.com and GitLab. For your own CI system, you can use just the drone CLI or its web interface.

I recommend using the Jenkins Docker plugin. Though it is new, it starts to expose the power of Docker inside Jenkins, and the configuration is well documented there. (Let me know if you have problems.)
The strategy I plan to use:
Create different app images to serve different services like Postgres, MongoDB, Redis and such. Since they are rarely updated, they will be configured globally in advance as "cloud" templates, and each VM will have a label to indicate the service.
In each Jenkins job, each image will be selected as a slave node (use that label as the name).
When the job is triggered, it will automatically start the Docker container as a slave in seconds.
It should work for you.
BTW: At the time I answered (2014.5), the plugin was not mature enough, but it is the right direction.
