Flume doesn't call the stop method of the last Agent

I'm working with Flume. I have an external program which modifies flume.conf.
When I modify the configuration file, Flume reloads the new configuration, changing, creating or removing agents according to the new flume.conf.
Basically, when you modify the file, Flume calls the stop methods of all your agents and after that the start methods of the agents that are present in the new flume.conf. But if you remove all the agents from flume.conf (or you only had one), it doesn't work. Flume detects that the file has changed, but since there is nothing left in the file (it's empty), it never calls the stop methods, so any agents that were running will never stop.
To be clear, I'm talking about the agents, not about Flume itself. Flume shouldn't stop, because you might want to add new agents in the future.
I'm not sure whether this is a bug or the intended behaviour.

Related

One-time (per worker) setup for python dataflow?

My Dataflow job has to download some files from a remote server. I want to save the files on the worker machine so the job doesn't have to keep downloading the same file.
I tried to do this with the setup method; however, it seems setup is called for each thread, and multiple threads can call setup in parallel (I cannot find documentation on this, but based on my experience the job tries to write the file data in parallel, which causes malformed data).
Is there a way to perform one-time setup whenever a worker machine is launched?
I also checked Apache Beam: DoFn.Setup equivalent in Python SDK, but I believe it focuses on per-thread setup.
The Beam model doesn't include a specific callback for when a VM is created, because the model doesn't guarantee the runtime environment. However, because you are using Dataflow, which runs your code in containers, you have two options:
Modify the container image
Modify the setup.py
The first gives you direct control over the container image and works for all languages. The second only works for Python.
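For the setup.py route, one pattern (borrowed from the Beam example pipelines) is to hook custom commands into the package build that each worker runs when it installs your pipeline package, so the download happens once per worker rather than once per DoFn thread. The sketch below assumes that approach; the download URL, the target path /tmp/model.bin, and the package name are placeholders:

# setup.py -- a sketch of the "custom commands" pattern from the Beam examples.
import subprocess
from distutils.command.build import build as _build

import setuptools

# Hypothetical one-time step: fetch the file once per worker at install time.
CUSTOM_COMMANDS = [
    ["curl", "-sSf", "-o", "/tmp/model.bin", "https://example.com/model.bin"],
]


class build(_build):
    """Extend the standard build step so it also runs our custom commands."""
    sub_commands = _build.sub_commands + [("CustomCommands", None)]


class CustomCommands(setuptools.Command):
    """Run each command in CUSTOM_COMMANDS on the worker."""

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        for command in CUSTOM_COMMANDS:
            # Fail loudly if the download does not succeed.
            subprocess.run(command, check=True)


setuptools.setup(
    name="my_pipeline",  # hypothetical package name
    version="0.0.1",
    packages=setuptools.find_packages(),
    cmdclass={"build": build, "CustomCommands": CustomCommands},
)

You would then launch the pipeline with --setup_file=./setup.py so that Dataflow builds and installs this package on every worker.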

jenkins change label as requested

We use Jenkins for automation of our test infrastructure. The requirement is to give users the ability to take a Jenkins node for their private tests or debugging via private Jenkins jobs, and then put it back in the pool of nodes marked with labels, so that other jobs that are restricted to particular labels can run without interference.
We can achieve this by letting users alter labels, but that didn't work out: users (nearly 50 of them) make up their own label names, it takes time for an admin to reassign the nodes (even with a process in place), and precious test time is affected.
We are looking for a solution such as a button to take a node offline (we can't use that option as-is, since Jenkins then cannot see the node anymore and users cannot run Jenkins jobs on it), but perhaps combined with the ability to run scripts.
I have done some research on this but would have to compromise on some requirements, so I decided to seek help from the community... suggestions?
Did you have a look at this question:
How to take Jenkins master node offline using CLI?
In that question, there are some CLI commands to take a node offline.
Maybe you can create a dedicated job on the master with one parameter (the node name). This job would call the Jenkins CLI to take your node offline.
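As an illustration, the dedicated job could run a small script along these lines (a sketch only; the Jenkins URL, the CLI jar location, authentication, and the NODE_NAME parameter are assumptions you would adapt to your installation):

import os
import subprocess

# Hypothetical values -- adjust to your installation.
JENKINS_URL = "http://jenkins.example.com:8080"
CLI_JAR = "/opt/jenkins/jenkins-cli.jar"


def set_node_offline(node_name: str, reason: str = "reserved for private testing") -> None:
    """Take a node offline using the Jenkins CLI offline-node command."""
    subprocess.run(
        ["java", "-jar", CLI_JAR, "-s", JENKINS_URL,
         "offline-node", node_name, "-m", reason],
        check=True,  # raise if the CLI call fails
    )


def set_node_online(node_name: str) -> None:
    """Put the node back in the pool with online-node once the user is done."""
    subprocess.run(
        ["java", "-jar", CLI_JAR, "-s", JENKINS_URL, "online-node", node_name],
        check=True,
    )


if __name__ == "__main__":
    # NODE_NAME would be the job's single parameter, exposed as an environment variable.
    set_node_offline(os.environ["NODE_NAME"])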

jenkins job on two slaves?

We need to be able to run a Jenkins job that consumes two slaves. (Or two jobs, if we can guarantee that they run at the same time and at least one of them can find out which node the other is on.) The situation is that we have a heavyweight application that we need to run tests against. The tests run on one machine, the application runs on another. It's not practical to have them on the same host.
Right now, we have a Jenkins job that uses a script to bring a dedicated application server up, install the correct version and the correct data, and then run the tests against it. That means I can't use the dedicated application server for other tasks when there is no heavyweight testing going on. It also pretty much limits us to one loop; being able to assign the app server dynamically would allow more of them.
There's clearly no way to do this in core Jenkins, but I'm hoping there's some plugin or hackery to make this possible. The current test build is a Maven 2 job, but that's configurable if we have to wrap it in something else. It's kicked off by the successful completion of another job, which could be changed to start two jobs, or whatever else is required.
I just learned that the simultaneous allocation of multiple slaves can be done nicely in a pipeline job by nesting node clauses:
node('label1') {
    node('label2') {
        // your code here
    }
}
See this question where Mateusz suggested that solution for a similar problem.
Let me see if I understood the problem.
You want to dynamically choose a slave and start the app server on it.
When the app server is running on a slave, you do not want that slave to run any other job.
But when the app server is not running, you want to use that slave like any other slave for other jobs.
One way out would be to label the slaves, and use "Restrict where this project can be run" to make the app server and the test suite run only on machines with that label.
Then, on those slave nodes, set "# of executors" to 1. This will make sure that at any time only one job runs on each of them.
The next step would be to create a job that starts the app server and then kicks off the test job once the app-server start job is successful.
If your test job needs to know the server details of the machine where your app server is running, then it becomes interesting.

Why aren't JobListeners Durable in Quartz.NET?

I'm trying to chain a few jobs in Quartz.NET through JobChainingJobListener. I first create a couple of durable jobs (while using ADO JobStore with SQL Server) and this part works well - the jobs are visible across Quartz.NET restarts.
When I chain my jobs with Scheduler.ListenerManager.AddJobListener(listener, matchers), the listener fires correctly, but its definition cannot be made durable in the database. After every server restart, I have to define all listeners again.
Looking at the DB tables, there are no tables for listeners, nor does the code for ListenerManagerImpl contain any hints of listener persistence.
I'm planning to add listener durability and reload the global listener dictionary on server restart. Before I do that, I'm wondering if there are any reasons why the project does not already do so. Considering how mature Quartz.NET is, someone would already have run into this, so it seems I'm missing something.
Can anyone please point to any pitfalls in implementing listener durability?
From Quartz's perspective, listeners are just a configuration issue, just like the job store type or other settings of the library. Commonly, listeners are stateless and thus need no persistence services, unlike triggers and jobs, which hold state that needs to be persisted between invocations and possibly across job-processing nodes.
If you have a sound configuration-management plan, this shouldn't be an issue: just handle the listener configuration like you would the other aspects of your setup. If you have state in your listeners that would need to survive restarts, that's a different story; then you'd naturally need custom persistence.

scheduled task or windows service

My team is having a debate about which is better: a Windows service or scheduled tasks. We have a server dedicated to running jobs, and currently they are all scheduled tasks. Some jobs take files, rename them and place them in other directories on the network. Other jobs extract data from SQL, modify it, and ship it elsewhere. Other jobs FTP files out. There is a lot of variety, but all in all, they are fairly straightforward.
I am partial to having each of these run as a Windows service instead of a scheduled task, because it is so much easier to monitor a Windows service than a scheduled task. Some of us are diametrically opposed. In the end, none of us has enough experience to provide actual factual comparisons between the two methods. I am looking for some feedback on what others have experienced.
If it runs constantly - windows service.
If it needs to be run at various intervals - scheduled task.
Scheduled task - when the activity is to be carried out on some fixed/predefined schedule. It takes less memory and fewer OS resources. No installation is required. It can have a UI (e.g. sending reminder mail to defaulters).
Windows service - when continuous monitoring is required. It keeps the OS busier by consuming more resources. It requires installation/uninstallation when changing versions. No UI at all (e.g. processing a mail as soon as it arrives).
Use them wisely.
Scheduling jobs with the built-in functionality is a perfectly valid use. You would have to recreate the full functionality in order to create a good service, and unless you want to react to specific events, I see no reason to move a nightly job into a service.
It's different when you want to process a file after it has been posted to a folder; that's something I would create a service for, using a FileSystemWatcher to monitor the folder.
Otherwise I think it's reinventing the wheel.
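For the folder-watching case mentioned above, here is a minimal sketch of the idea in Python using the watchdog package as a stand-in for .NET's FileSystemWatcher (the watched path and the processing step are placeholders):

import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

WATCHED_FOLDER = "/data/incoming"  # hypothetical drop folder


class NewFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        # React to a file being posted to the folder.
        if not event.is_directory:
            print(f"Processing new file: {event.src_path}")


if __name__ == "__main__":
    observer = Observer()
    observer.schedule(NewFileHandler(), WATCHED_FOLDER, recursive=False)
    observer.start()
    try:
        while True:  # a real service would run until it is stopped
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()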
While there is nothing wrong with using the Task Scheduler (it is, itself, a service), we have the same requirements where I work, and we have a general-purpose program that does several of these jobs. I interpreted your post to say that you would run an individual service for each task; I would instead consider writing a single, database-driven service program to do all your tasks. That way, when you add a new one, it is simply a data-entry chore and not a whole new program to write. If you practice change control, this difference can be significant. If you have more than a few tasks, the effort may be comparable. This approach will also allow you to craft a logging mechanism best suited to your operations.
This is a portion of our requirements document for our task program, to give you an idea of where to start:
This program needs to be database driven.
It needs to run as a windows service.
The program needs to be able to process "jobs" in the following manner:
Jobs need to be able to check for the existence of a source file and take action based on whether or not the source file exists (i.e. proceed with processing, vs. report that the file isn't there, vs. ignore it because it is not critical that the file is there).
Jobs need to be able to copy a file from a source to a target location or
Copy a file from source, to a staging location, perform "processing", and then copy either the original file or a result of the "processing" to the target location or
Copy a file from source, to a staging location, perform "processing", and the processing is the end result.
The sources and destinations that jobs might copy to and from can be disparate: UNC, SFTP, FTP, etc.
The "processing" can be encrypting/decrypting a file, parsing a data file for correct format, feeding the file to the mainframe via terminal emulation, etc., and is usually implemented by calling a command line, passing parameters to an .exe.
Jobs need to be able to clean up after themselves as required, i.e. delete intermediate or original files, copy files to an archive location, etc.
The program needs to be able to determine the success or failure of each phase of a job and take appropriate action: logging, possibly other notification, aborting further processing on failure, etc.
Jobs need to be configurable to activate at certain set times, or at certain intervals (optionally during certain set hours), e.g. every 15 mins from 9:00 - 5:00.
There needs to be a UI to add new jobs.
There needs to be a button to push to fire off a job as if a timer event had activated it.
The standard display of the program should show an operator what is going on and whether the program is functioning properly.
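To make the list above a bit more concrete, here is a minimal sketch of the core dispatch loop such a database-driven job runner might use (the jobs table, its columns, and the SQLite back end are all hypothetical; a real implementation would wrap this in a Windows service and add proper logging and error handling):

import sqlite3
import subprocess
import time
from datetime import datetime, timedelta

# Hypothetical schema: jobs(id, name, command, interval_minutes, next_run)
DB_PATH = "jobs.db"


def due_jobs(conn):
    """Return jobs whose next_run time has passed."""
    now = datetime.now().isoformat(timespec="seconds")
    return conn.execute(
        "SELECT id, name, command, interval_minutes FROM jobs WHERE next_run <= ?",
        (now,),
    ).fetchall()


def run_job(conn, job_id, name, command, interval_minutes):
    """Run one job as a command line, record the outcome, and reschedule it."""
    result = subprocess.run(command, shell=True)
    status = "OK" if result.returncode == 0 else f"FAILED ({result.returncode})"
    print(f"{datetime.now():%Y-%m-%d %H:%M:%S} {name}: {status}")
    next_run = (datetime.now() + timedelta(minutes=interval_minutes)).isoformat(timespec="seconds")
    conn.execute("UPDATE jobs SET next_run = ? WHERE id = ?", (next_run, job_id))
    conn.commit()


def main():
    conn = sqlite3.connect(DB_PATH)
    while True:  # in production this loop would live inside a Windows service
        for job_id, name, command, interval_minutes in due_jobs(conn):
            run_job(conn, job_id, name, command, interval_minutes)
        time.sleep(30)


if __name__ == "__main__":
    main()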
All of this is predicated on the premise that you write your own software. There are also several enterprise task-scheduler programs available on the market. Buying off the shelf may be a better solution for you.
