Spring Cloud Data Flow - Is there a way to store the Arguments as part of a Task?

I am using SCDF on k8s. When I run a task, I have to enter the arguments and parameters every time. Is there a way to save them along with the task definition so that I don't need to enter them every time?

By design, the task definition doesn't include the arguments or batch job parameters (if your task is a batch job). This is because task arguments and parameters are bound to vary and are typically provided externally. But if you want them to be part of the task application, you can configure them as task application properties instead of arguments.
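As a rough illustration in the SCDF shell, using the sample timestamp task (the property value here is just an example):

# The property is stored with the definition, so later launches don't need to repeat it.
dataflow:> task create my-timestamp --definition "timestamp --format='yyyy-MM-dd'"
dataflow:> task launch my-timestamp

Anything that genuinely varies per run can still be supplied at launch time via the --arguments or --properties options of task launch.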

Related

Prevent multiple cron running in nest.js on docker

In Docker we have used deploy: replicas: 3 for our microservice. We have some cron jobs, and the problem is that every cron job gets called 3 times, which is not what we want. We want it to run only once. Sample of a cron in NestJS:
@Cron(CronExpression.EVERY_5_MINUTES)
async runBiEventProcessor() {
  const calculationDate = new Date();
  Logger.log(`Bi Event Processor started at ${calculationDate}`);
}
How can I run this cron only once without changing the replicas to 1?
This is quite a generic problem when a cron or background job is part of an application that has multiple instances running concurrently.
There are multiple ways to deal with this kind of scenario. The following are some workarounds if you don't have a concrete solution:
Create a separate service only for the background processing and ensure only one instance is running at a time.
Expose the cron job as an API and trigger the API to start background processing. In this scenario, the load balancer will hand over the request to only one instance. This approach will ensure that only one instance will handle the job. You will still need an external entity to hit the API, which can be in-house or third-party.
Use the repeatable jobs feature from Bull Queue, or any other tool or library that provides similar features.
Bull will hand over the job to any active processor. That way, it ensures the job is processed only once by only one active processor.
NestJS has a wrapper for the same. Read more about Bull queue repeatable jobs here.
Implement a custom locking mechanism
It is not as difficult as it sounds. Many schedulers in other frameworks work on similar principles to handle concurrency.
If you are using an RDBMS, make use of transactions and locking. Create cron records in the database and acquire a lock as soon as the first cron instance enters and starts processing. Other concurrent jobs will either fail or time out because they cannot acquire the lock. You will need to handle a few edge cases in this approach to make it bug-free.
If you are using MongoDB, or any similar database that supports TTL (time-to-live) and unique indexes, insert a lock document where one of the fields has a unique constraint; any other job trying to insert the same document will fail due to the database-level unique constraint. Also put a TTL index on the document so it is deleted automatically after a configured time (see the sketch after this list).
These are workarounds if you don't have any other concrete options.
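A minimal sketch of the MongoDB variant, assuming a database named jobs and a collection named cron_locks (both hypothetical), driven from the shell with mongosh:

# One-time setup: a unique index on the job name plus a TTL index that expires the lock.
mongosh jobs --eval 'db.cron_locks.createIndex({ jobName: 1 }, { unique: true })'
mongosh jobs --eval 'db.cron_locks.createIndex({ createdAt: 1 }, { expireAfterSeconds: 300 })'

# Each replica tries this insert before running the cron body; only the first one succeeds,
# the others fail on the unique index and simply skip this run.
mongosh jobs --eval 'db.cron_locks.insertOne({ jobName: "biEventProcessor", createdAt: new Date() })'

The replica that wins the insert runs the job, and the TTL index clears the lock so the next scheduled run can acquire it again.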
There are quite a few options for solving this, but I would suggest creating a NestJS microservice (or a plain Node.js one) that runs only the cron job and writes its result to a shared store, for example Redis.
The microservice that runs the cron job does not expose anything; it only starts your cron job:
import { NestFactory } from '@nestjs/core';
import { WorkerModule } from './worker.module'; // adjust the path to wherever your WorkerModule lives

// No HTTP listener is needed; init() is enough to start the registered cron jobs.
const app = await NestFactory.create(WorkerModule);
await app.init();
Your WorkerModule imports and configures the scheduler. The result of the cron job can be written to a shared store such as Redis.
Now you can still run 3 replicas of your main service while the cron jobs are registered in only one place.

Get Jenkins build job duration once finished

I am using Jenkins integration server for my CI/CD.
I am working with freestyle projects.
I want to get the build duration (in seconds) once it has finished, using the REST API (JSON).
This is what I tried:
duration=$(curl -g -u login:token --silent "$BUILD_URL/api/json?pretty=true&tree=duration" | jq -r '.duration')
The duration is always equal to 0, even though I ran this shell script in a post-build task.
There are a few reasons why this may not work for you, but the most probable reason is that the build hasn't finished when you make the API call.
I tried it on our instance and for finished jobs it works fine and for running jobs it always returns 0. If your post build task is executed as part of the job, then the job has probably not finished executing yet, which is why you are always getting 0.
The API response for a build does not contain a meaningful duration attribute as long as the build is running, so you cannot use that mechanism while the build is still executing.
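For example, once a build has completed you can confirm this from outside the job (the host, job name, and build number below are placeholders):

# 'building' turns false and 'duration' (in milliseconds) is populated only after the build finishes.
curl -s -u login:token "https://jenkins.example.com/job/my-job/42/api/json?tree=duration,building" | jq '.'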
However, you have a nice alternative for achieving what you want with freestyle jobs.
The solution, which still uses the API method, is to create a separate generic job for updating your database with the results. This job will receive the project name and the build number as parameters, run the curl command to retrieve the duration, update your database, and run any other logic you need.
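A minimal sketch of what that updater job's shell step could look like, assuming it receives Project and Build as parameters (the database command at the end is purely illustrative):

# Fetch the duration (milliseconds) of the finished build and convert it to seconds.
duration_ms=$(curl -s -g -u login:token "$JENKINS_URL/job/$Project/$Build/api/json?tree=duration" | jq -r '.duration')
duration_s=$((duration_ms / 1000))

# Update your database with the value; replace this with whatever persistence you use.
echo "INSERT INTO build_durations(project, build, seconds) VALUES ('$Project', $Build, $duration_s);" | psql "$DB_URL"

$JENKINS_URL is provided by Jenkins itself; $DB_URL stands in for your own database connection string.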
This job can now be called from any freestyle job using a Parameterized Trigger plugin post-build step with the relevant build environment parameters.
This has the additional benefit that the duration-update mechanism is controlled in a single job; if changes are needed, they can be made in one place instead of updating every separate job.
Assuming your job is called Update-Duration and it receives two parameters, Project and Build, add a Parameterized Trigger post-build action in the calling job that passes the current project name and build number.
And that's it: just add this trigger to any job that needs it, and in the future you can update the logic without changing the calling jobs.
One small thing: to avoid a race condition when the calling job has not yet finished executing, you can increase the quiet period of your database updater job to allow enough time for the caller to finish, so the duration will be populated.

Spring Cloud Data Flow preconfigured tasks

Spring Cloud Data Flow is a great solution, and currently I'm trying to find a way to preconfigure tasks so that they can be triggered manually.
The use case is very simple:
as a DevOps engineer, I should be able to preconfigure the tasks, which includes creating the execution graph and the application and deployment parameters, and saving the task with all parameters needed for execution.
as a user with the ROLE_DEPLOY role, I should be able to start, stop, restart, and monitor the execution of the preconfigured tasks.
A preconfigured task is a task with all parameters needed for execution.
Is it possible to have such functionality?
Thank you.
You may want to review the continuous deployment section from the reference guide. There are built-in lifecycle semantics for orchestrating tasks in SCDF.
Assuming you have the Task applications already built and that the DevOps persona is familiar with how the applications work together, they can either interactively or programmatically build a composed-task definition and task/job parameters in SCDF.
That step persists the task definition in SCDF's database. Now the same definition can be launched manually or via a cron schedule.
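As a rough sketch in the SCDF shell (the app names, property, and cron expression below are placeholders):

# Persist a composed-task definition with its application properties baked in.
dataflow:> task create my-composed-task --definition "ingest --input.path=/data/in && transform && load"
# Launch it manually ...
dataflow:> task launch my-composed-task
# ... or put it on a schedule (scheduling is supported on Kubernetes and Cloud Foundry).
dataflow:> task schedule create --name nightly --definitionName my-composed-task --expression "0 0 * * *"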
If nothing really changes to the definition, app version, or the task/parameters, yes, anyone with the ROLE_DEPLOY role can interact with it.
You may also find the CD for Tasks guide useful reference material. Perhaps repeat it locally (with the desired security configuration) in your environment to get a feel for how it works.

How to create multiple tasks in one jar(Spring Cloud Task)?

Use case (Spring Cloud Task):
I have different tasks that are independent of each other. I want to build those tasks into one jar and trigger a task from the command line. Is that possible?
I also want to schedule them using crontab. Please advise.
I am not sure why you would need to have all those independent tasks inside the same jar but as long as you have a way to invoke the appropriate task based on your criteria, you could use Spring Cloud Data Flow to set this up.
You register your single jar with all the independent tasks as a task application (let's say mytask)
You create a schedule to trigger this task with the command line arguments that specify the criteria for launching the appropriate functionality inside the task jar (see the sketch below)
(Note that the scheduler support is only available on Kubernetes and Cloud Foundry)
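A rough sketch in the SCDF shell, assuming the jar is published as a Docker image and that it picks which functionality to run from a property such as task.selector (the URI and names here are placeholders; the selector is baked into the definition as an application property so it doesn't have to be passed as an argument on every launch):

dataflow:> app register --name mytask --type task --uri docker://myorg/mytask:1.0.0
# One definition per independent task, each with the property that selects what to run.
dataflow:> task create cleanup-task --definition "mytask --task.selector=cleanup"
dataflow:> task launch cleanup-task
# Or put it on a schedule (Kubernetes / Cloud Foundry only).
dataflow:> task schedule create --name nightly-cleanup --definitionName cleanup-task --expression "0 2 * * *"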
Depending on the context you might want to consider using composed tasks as well: https://docs.spring.io/spring-cloud-dataflow/docs/2.5.2.RELEASE/reference/htmlsingle/#spring-cloud-dataflow-composed-tasks

How can I coordinate integration tests in a multi-container (Docker) system?

I have inherited a system that consists of a couple daemons that asynchronously process messages. I am trying to find a clean way to introduce integration testing into this system with minimal impact/risk on the existing programs. Here is a very simplified overview of their responsibilities:
Process 1 polls a queue for messages, and inserts a row into a DB for each one it dequeues.
Process 2 polls the DB for rows inserted by Process 1, does some calculations, and then deposits a file into a directory on the host and sends an email.
These processes are quite old and complex, and I am strongly inclined to avoid modifying them in any way. What I would like to do is put each of them in a container, and also stand up the dependencies (queue, DB, mail server) in other containers. This part is straightforward, but what I'm unsure about is the best way to orchestrate these tests. Since these processes consume and generate output asynchronously I will need to poll or wait for the expected outcome (mail sent, file created).
Normally I would just write a series of tests in a single test suite of my language of choice (Java, Go, etc), and make the setUp / tearDown hooks responsible for resetting the environment to the desired state. But because these processes have a lot of internal state I am afraid I cannot successfully "clean up" properly after each distinct test. This would be a problem if, for example, one test failed to generate the desired output in a specific period of time so I marked it as failed, but a subsequent test falsely got marked as passed because the original test case actually did output something (albeit much slower than anticipated) that was mistakenly attributed to the subsequent test. For these reasons I feel I need to recreate the world between each test.
In order to do this the only options I can see are:
Use a shell script to actually run my tests -- having it bring up the containers, execute a single test file, and then terminate my containers for each test (a rough sketch of this appears below).
Follow my usual pattern of setUp / tearDown in my existing test framework but call out to docker to terminate and start up the containers between each test.
Am I missing another option? Is there some kind of existing framework or pattern used for this sort of testing?
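For what it's worth, a minimal sketch of the first option, assuming the containers are defined in a docker-compose.yml and that run-single-test is a placeholder for whatever invokes one test file and polls for the expected file or email:

#!/usr/bin/env bash
set -euo pipefail
status=0

# Recreate the whole environment for every test file so no state leaks between tests.
for test_file in tests/*.test; do                 # placeholder test layout
  docker compose up -d --build                    # queue, DB, mail server, and both daemons
  ./run-single-test "$test_file" || status=$?     # placeholder runner
  docker compose down -v                          # tear everything down, including volumes
done
exit "$status"

This keeps the "recreate the world" logic in one place at the cost of slower runs; the second option trades that for keeping everything inside the test framework's setUp / tearDown hooks.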
