Combination of remoting and clustering - quartz.net

I am quite new to Quartz.NET, but was able to create a running solution for my problem.
There are remote server instances, which are executed as Windows services. The job store for these instances is an AdoJobStore with a SQLite backend.
The client application is able to run jobs remotely through remote scheduler proxies.
Now I have to combine the remote execution with clustering, and right here I am struggling with the instantiation of scheduler proxies for the remote servers. When a scheduler proxy is created on the client side, addresses and ports are configured explicitly via the properties of the scheduler factory.
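For illustration, this is roughly how such a proxy is created today (a sketch; scheduler name, server address, and port are placeholders):

using System.Collections.Specialized;
using Quartz;
using Quartz.Impl;

// Client-side proxy bound to ONE explicit remote scheduler instance
var properties = new NameValueCollection();
properties["quartz.scheduler.instanceName"] = "RemoteScheduler";
properties["quartz.scheduler.proxy"] = "true";
properties["quartz.scheduler.proxy.address"] = "tcp://server1:555/QuartzScheduler";
IScheduler remoteScheduler = new StdSchedulerFactory(properties).GetScheduler();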
In an architecture with a cluster of several remote services and one client that starts jobs on those servers using Quartz.NET's load balancing, explicitly directing each job to a specific server address makes no sense to me.
So, how should the client app hand jobs to the cluster, and how does the cluster have to be configured (for example, a list of server IP addresses and ports to be used)?
In addition: how do the Quartz.NET server instances share the database, and how would this work with serverless SQLite?
Thanks for any tips useful for the further reading I have to do,
Mario

Meanwhile I was able to get my system to work. The answer to my question “Combination of remoting & clustering” is: Do not combine these features, as it is not necessary.
For the implementation of a distributed cluster, don't use remoting at all (hard to see when your first development step was creating a client with a single remote server).
Distribution of jobs, and therefore all “connecting” of instances, is done by using the same database, which has to be centralized for that reason (I am using SQL Server Express now); see the config sketch after these points.
Don’t start your local (client) scheduler instance.
Don't worry about all the local worker threads that appear even though all the work is carried out by the remote servers in the cluster. I would have expected to use a scheduler with 0 threads in the local application, as you do not want to start any job within this app.
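For reference, a minimal sketch of the configuration I ended up with on each server node (scheduler name and connection string are placeholders; the delegate and provider names depend on your Quartz.NET version and database):

using System.Collections.Specialized;
using Quartz;
using Quartz.Impl;

// Identical on every node; "AUTO" generates a unique instance id per node
var properties = new NameValueCollection();
properties["quartz.scheduler.instanceName"] = "ClusteredScheduler";  // same on all nodes
properties["quartz.scheduler.instanceId"] = "AUTO";                  // unique per node
properties["quartz.jobStore.type"] = "Quartz.Impl.AdoJobStore.JobStoreTX, Quartz";
properties["quartz.jobStore.driverDelegateType"] = "Quartz.Impl.AdoJobStore.SqlServerDelegate, Quartz";
properties["quartz.jobStore.tablePrefix"] = "QRTZ_";
properties["quartz.jobStore.clustered"] = "true";                    // enables cluster mode
properties["quartz.jobStore.dataSource"] = "default";
properties["quartz.dataSource.default.connectionString"] = "Server=central-db;Database=Quartz;Integrated Security=SSPI";
properties["quartz.dataSource.default.provider"] = "SqlServer-20";   // version-dependent name

IScheduler scheduler = new StdSchedulerFactory(properties).GetScheduler();
scheduler.Start();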
Unsolved problem: there seems to be no way to register a listener that is called when a job is executed somewhere in the cluster. So I have to build my own feedback channel from the job back to the starting app in order to track the status of jobs (start time, finish time, node where execution has taken place, ...).
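A sketch of what that channel could look like (the JobRunLog table and the WriteLog body are placeholders of mine, not Quartz functionality): a job listener registered on every server node fires locally on whichever node executes the job and can write the status to the shared database, where the client app can poll it.

using System;
using Quartz;
using Quartz.Impl.Matchers;

public class FeedbackJobListener : IJobListener
{
    public string Name { get { return "FeedbackJobListener"; } }

    public void JobToBeExecuted(IJobExecutionContext context)
    {
        // Record job start and the executing node in the central database
        WriteLog(context.JobDetail.Key, "started", Environment.MachineName);
    }

    public void JobWasExecuted(IJobExecutionContext context, JobExecutionException jobException)
    {
        WriteLog(context.JobDetail.Key, jobException == null ? "finished" : "failed", Environment.MachineName);
    }

    public void JobExecutionVetoed(IJobExecutionContext context) { }

    private void WriteLog(JobKey key, string status, string node)
    {
        // Placeholder: INSERT INTO JobRunLog (JobName, Status, Node, Time) against the shared DB
    }
}

// On each server node, after creating the scheduler:
// scheduler.ListenerManager.AddJobListener(new FeedbackJobListener(), GroupMatcher<JobKey>.AnyGroup());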
Unsolved problem: when the user closes the local (WPF) application, an endless loop in SimpleThreadPool
while (runnable == null && run)
{
Monitor.Wait(lockObject, 500);
}
prevents the process from exiting.
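Two candidate workarounds I have not fully verified (a sketch for the synchronous Quartz.NET 2.x API; 'scheduler' is assumed to be the local instance created at startup):

// In the WPF App class: shut the scheduler down explicitly before exit,
// which stops the SimpleThreadPool worker threads shown above
protected override void OnExit(ExitEventArgs e)
{
    scheduler.Shutdown(waitForJobsToComplete: false);
    base.OnExit(e);
}

// Alternative: make the pool threads background threads so they cannot
// keep the process alive:
// properties["quartz.threadPool.makeThreadsDaemons"] = "true";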

Related

Upgrade micro-service without breaking current execution

Suppose you have a micro-service architecture with a topology of two services, A and B, each of which has 3 instances running.
A is a web service receiving web requests, and B is a CLI-based application listening for events from a queue.
Now you want to deploy a new version of B, but the instances of B may be processing information at that moment.
How can it be deployed, replacing the old instances with new ones, without breaking work that is currently executing?
Is there any tool, pattern, or strategy that handles this scenario?
You need a simple strategy where you stop routing new work to the instance of B that is about to go under deployment.
If it consumes events over REST, you can use a load balancer; with a load balancer in place, you can detach that instance from it using Consul and consul-template. Then wait some approximate time, say 5 minutes (a value you need to evaluate for your workload), and start the deployment.
This wait-based approach is necessary if you are not sure how to find out whether the current instance has finished processing the events it already accepted.
If the events are consumed from a message queue, you can expose an endpoint which, when called, disables consumption of new events, and then use the same wait-and-deploy strategy; see the sketch below.
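A minimal sketch of such a drain endpoint, using only the JDK's built-in HTTP server (the /drain path, the port, and the consume/in-flight checks are illustrative assumptions, not any specific framework's API):

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

public class DrainableConsumer {
    // Flipped via HTTP by the deployment tooling before replacing this instance
    private static final AtomicBoolean accepting = new AtomicBoolean(true);

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8099), 0);
        server.createContext("/drain", exchange -> {
            accepting.set(false);                       // stop consuming new events
            byte[] body = "draining".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        while (accepting.get() || !inFlightWorkDone()) {
            if (accepting.get()) {
                // placeholder: poll the queue and process one event
            }
            Thread.sleep(100);
        }
        server.stop(0);                                 // now safe to deploy the new version
    }

    private static boolean inFlightWorkDone() {
        return true;                                    // placeholder: check in-flight counters
    }
}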

How to scale the spring cloud data flow server

Normally we run the Spring Cloud Data Flow server jar on one machine. But what if, over time, we create many flows on that machine, and the server gets overloaded and becomes a single point of failure? Is there something that lets us run the Spring Cloud Data Flow server jar on another machine and shift flows onto it, so that we can avoid such failures and make our complete system more resilient and robust? Or does this expansion happen automatically when we deploy the complete system on PCF/Cloud Foundry?
SCDF is a simple Boot application. It doesn't retain any state about the stream/task applications itself, but it does keep track of the DSL definitions in the database.
It is common to provision multiple instances of SCDF-server and a load balancer in front for resiliency.
In PCF specifically, if you scale the SCDF-server to >1, PCF will automatically load-balance the incoming traffic (from SCDF Shell/GUI). It is also important to note that PCF will automatically restart the application instance, if it goes down for any reason. You will be set up for multiple levels of resiliency this way.
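For example, scaling to two instances with the CF CLI (the application name scdf-server is a placeholder):

cf scale scdf-server -i 2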

Second and Third Distributed Kafka Connector workers failing to work correctly

With a Kafka cluster of 3 and a Zookeeper cluster of the same size, I brought up one distributed connector node. This node ran successfully with a single task. I then brought up a second connector; this seemed to run, as some of the code in the task definitely executed. However, it then didn't seem to stay alive (though with no errors thrown; the not staying alive was observed by a lack of expected activity, while the first connector continued to function correctly).
When I call the URL http://localhost:8083/connectors/mqtt/tasks on each connector node, it tells me the connector has one task. I would expect this to be two tasks, one for each node/worker. (Currently the worker configuration says tasks.max = 1, but I've also tried setting it to 3.)
When I try and bring up a third connector, I get the error:
"POST /connectors HTTP/1.1" 500 90 5
(org.apache.kafka.connect.runtime.rest.RestServer:60)
ERROR IO error forwarding REST request:
(org.apache.kafka.connect.runtime.rest.RestServer:241)
java.net.ConnectException: Connection refused
Trying to call the connector POST method again from the shell returns the error:
{"error_code":500,"message":"IO Error trying to forward REST request:
Connection refused"}
I also tried upgrading to Apache Kafka 0.10.1.1 that was released today. I'm still seeing the problems. The connectors are each running on isolated Docker containers defined by a single image. They should be identical.
The problem could be that I'm trying to run the POST request to http://localhost:8083/connectors on each worker, when I only need to run it once on a single worker and then the tasks for that connector will automatically distribute to the other workers. If this is the case, how do I get the tasks to distribute? I currently have the max set to three, but only one appears to be running on a single worker.
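For reference, this is the shape of the one-time submission to a single worker (the connector class here is an illustrative placeholder, not a tested MQTT config); note that tasks.max belongs in this per-connector config, not in the worker properties:

curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "mqtt",
  "config": {
    "connector.class": "com.example.MqttSourceConnector",
    "tasks.max": "3"
  }
}'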
Update
I ultimately got things running using essentially the same approach that Yuri suggested. I gave each worker a unique group ID, then gave each connector task the same name. This allowed the three connectors and their single tasks to share a single offset, so that in the case of sink connectors the messages they consumed from Kafka were not duplicated. They are basically running as standalone connectors since the workers have different group ids and thus won't communicate with each other.
If the connector workers have the same group ID, you can't add more than one connector with the same name. If you give the connectors different names, they will have different offsets and consume duplicate messages. If you have three workers in the same group, one connector and three tasks, you would theoretically have the ideal situation: the tasks share an offset, and the workers make sure the tasks are always running and well distributed (with each task consuming a unique set of partitions). In practice, the connector framework doesn't create more than one task, even with tasks.max set to 3 and even though the topic the tasks are consuming has 25 partitions.
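Concretely, the only difference between the workers' properties files in this workaround is the group id (the names are mine); sink tasks join the consumer group connect-<connector name>, which as far as I can tell is why identical connector names end up sharing offsets:

# worker-1.properties
group.id=connect-worker-1

# worker-2.properties
group.id=connect-worker-2

# worker-3.properties
group.id=connect-worker-3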
If anyone knows why I'm seeing this behaviour, please let me know.
I've encountered a similar issue in the same situation as yours.
tasks.max is configured per connector, and the distributed workers automatically decide which nodes handle the topic. So, if you have 3 workers in a cluster and your connector configuration says tasks.max=2, then only 2 of the 3 workers will process the topic. In theory, if one of the workers fails, the third should pick up the workload. But...
The distributed connector turned out to be very unreliable: once you add/remove some nodes, the cluster broke down, and all the workers did nothing but try to elect a leader, and fail. The only way to fix it was to restart the whole cluster, preferably all workers simultaneously.
I chose another way: I used standalone workers, and it works like a charm for me, because distribution of load is implemented at the Kafka client level; once a worker dropped, the cluster re-balanced automatically and the clients connected to the unoccupied topics.
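For reference, a standalone worker is launched with the worker and connector properties files side by side (paths relative to the Kafka distribution; the connector file name is mine):

bin/connect-standalone.sh config/connect-standalone.properties config/my-connector.properties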
PS: Maybe this will be useful for you too. The Confluent connector is not tolerant of an invalid payload that does not match the topic's schema. Once the connector gets an invalid message, it silently dies. The only way to find out is to analyze the metrics.
I'm posting an answer to an old question, since Kafka Connect has moved on a lot in three years.
In the latest version (2.3.1) there is incremental rebalancing which massively improves the behaviour of Kafka Connect.
It's also worth noting that when configuring Kafka Connect, rest.advertised.host.name must be set correctly; if it's not, you will see errors including the one quoted:
{"error_code":500,"message":"IO Error trying to forward REST request: Connection refused"}
See this post for more details.

Any additional considerations when using Faye in a Node.js cluster?

We're planning to run an Express-based server on Node.js in "cluster mode" using Node.js' cluster support. So there will be 1 master process and 'n' (where 'n' is calculated based on the number of CPUs) child processes running on a single machine. We already have a testbed set up using Faye for pubsub in non-cluster mode and it works great.
Are there any additional considerations we need to be aware of when using Faye on top of a Node cluster? For example, since there will be 'n' HTTP server instances, will it be a problem creating a Faye NodeAdapter in each Node process and attaching it to the HTTP server instance in that process?
Thanks.
-brian
I just realized that the answer to my question is fairly obvious. One thing to be aware of is that Faye will need to access shared state across multiple server instances (processes). In a single-server config, you could probably get away with using Faye's memory engine. In a clustered config, you'd need to use Faye's redis engine or some other engine that allows state to be shared by different processes. I'd prefer not to introduce another persistence component just for this purpose so I may look into implementing my own on top of my current persistent store (Neo4j).
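For the record, a sketch of the per-process setup with the redis engine (assuming the faye and faye-redis packages; mount path and ports are placeholders):

// Runs in every cluster worker; shared state lives in Redis, not in-process
var http  = require('http');
var faye  = require('faye');
var redis = require('faye-redis');

var server = http.createServer(/* the Express app goes here */);

var bayeux = new faye.NodeAdapter({
  mount:   '/faye',
  timeout: 45,
  engine:  {
    type: redis,        // replaces the default memory engine, which is per-process
    host: 'localhost',
    port: 6379
  }
});

bayeux.attach(server);
server.listen(8000);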

Rails: Can I run background jobs on a different server?

Is it possible to host the application on one server and queue jobs on another server?
Possible examples:
Two different EC2 instances, one with the main server and the second with the queueing service.
Host the app in Heroku and use an EC2 instance with the queueing service
Is that possible?
Thanks
Yes, definitely. We have delayed_job set up that way where I work.
There are a couple of requirements for it to work:
The servers have to have synced clocks. This is usually not a problem as long as the servers' timezones are all set the same.
The servers all have to access the same database.
To do it, you simply have the same application on both (or all, if more than two) servers, and start workers on whichever server you want to process jobs. Either server can still queue jobs, but only the one(s) with workers running will actually process them.
For example, we have one interface server, a db server and several worker servers. The interface server serves the application via Apache/Passenger, connecting the Rails application to the db server. The workers have the same application, though Apache isn't running and you can't access the application through http. They do, on the other hand, have delayed_jobs workers running. In a common scenario, the interface server queues up jobs in the db, and the worker servers process them.
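To make the division of labour concrete (a sketch; the method being delayed is a placeholder): any server can enqueue, but only machines where workers are started will process:

# On any server -- .delay serializes the call into the shared delayed_jobs table
user.delay.resize_profile_picture

# On the worker servers only -- start the processing daemons
# RAILS_ENV=production script/delayed_job start
# (or in the foreground: rake jobs:work)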
One word of caution: if you're relying on physical files in your application (attachments, log files, downloaded XML or anything else), you'll most likely need a solution like S3 to keep those files, because the individual servers might not have the actual files. For example, if a user uploaded their profile picture on your web-facing server, the file would likely be stored on that server. If you then have another server resize the profile pictures, the image wouldn't exist on that worker server.
Just to offer another viable option: you can use a new worker service like IronWorker that relies completely on an elastic farm of cloud servers inside EC2.
This way you can queue/schedule jobs to run and they will parallelize across tons of threads spanning multiple servers - all without worrying about the infrastructure.
Same deal with the database though - it needs to be accessible from the outside.
Full disclosure: I helped build IW.
