I'm using Neo4j 1.9.M01 in a Spring MVC application that exposes some domain specific REST services (read, update). The web application is deployed three times into the same web container (Tomcat 6) and each "node" has it's own embedded Neo4j HA instance part of the same cluster.
the three Neo4j config:
#node 1
ha.server_id=1
ha.server=localhost:6361
ha.cluster_server=localhost:5001
ha.initial_hosts=localhost:5001,localhost:5002,localhost:5003
#node 2
ha.server_id=2
ha.server=localhost:6362
ha.cluster_server=localhost:5002
ha.initial_hosts=localhost:5001,localhost:5002,localhost:5003
#node 3
ha.server_id=3
ha.server=localhost:6363
ha.cluster_server=localhost:5003
ha.initial_hosts=localhost:5001,localhost:5002,localhost:5003
Problem: when performing an update on one of the nodes the change is replicated to only ONE other node and the third node stays in the old state corrupting the consistency of the cluster.
I'm using the milestone because it's not allowed to run anything outside of the web container so I cannot rely on the old ZooKeeper based coordination in pre-1.9 versions.
Do I miss some configuration here or can it be an issue with the new coordination mechanism introduced in 1.9?
This behaviour (replication only to ONE other instance) is the same default as in 1.8. This is controlled by:
ha.tx_push_factor=1
which is the default.
Slaves get updates from master in a couple of ways:
By configuring a higher push factor, for example:
ha.tx_push_factor=2
(on every instance rather, because the one in use is the one on the current master).
By configuring pull interval for slaves to fetch updates from its master, for example:
ha.pull_interval=1s
By manually pulling updates using the Java API
By issuing a write transaction from the slave
See further at http://docs.neo4j.org/chunked/milestone/ha-configuration.html
A first guess would be to set
ha.discovery.enabled = false
see http://docs.neo4j.org/chunked/milestone/ha-configuration.html#_different_methods_for_participating_in_a_cluster for an explanation.
For a full analysis could you please provide data/graph.db/messages.log from all three cluster members.
Side note: It should be possible to use 1.8 also for your requirements. You could also spawn zookeeper directly from tomcat, just mimic what bin/neo4j-coordinator does: run class org.apache.zookeeper.server.quorum.QuorumPeerMain in a seperate thread upon startup of the web application.
Related
I have the following situation:
Two times a day for about 1h we receive a huge inflow in messages which are currently running through RabbitMQ. The current Rabbit cluster with 3 nodes can't handle the spikes, otherwise runs smoothly. It's currently setup on pure EC2 instances. The instance type is currenty t3.medium, which is very low, unless for the other 22h per day, where we receive ~5 msg/s. It's also setup currently has ha-mode=all.
After a rather lengthy and revealing read in the rabbitmq docs, I decided to just try and setup an ECS EC2 Cluster and scale out when cpu load rises. So, create a service on it and add that service to the service discovery. For example discovery.rabbitmq. If there are three instances then all of them would run on the same name, but it would resolve to all three IPs. Joining the cluster would work based on this:
That would be the rabbitmq.conf part:
cluster_formation.peer_discovery_backend = dns
# the backend can also be specified using its module name
# cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns
cluster_formation.dns.hostname = discovery.rabbitmq
I use a policy ha_mode=exact with 2 replicas.
Our exchanges and queues are created manually upfront for reasons I cannot discuss any further, but that's a given. They can't be removed and they won't be re-created on the fly. We have 3 exchanges with each 4 queues.
So, the idea: during times of high load - add more instances, during times of no load, run with three instances (or even less).
The setup with scale-out/in works fine, until I started to use the benchmarking tool and discovered that queues are always created on one single node which becomes the queue master. Which is fine considering the benchmarking tool is connected to one single node. Problem is, after scale-in/out, also our manually created queues are not moved to other nodes. This is also in line with what I read on the rabbit 3.8 release page:
One of the pain points of performing a rolling upgrade to the servers of a RabbitMQ cluster was that queue masters would end up concentrated on one or two servers. The new rebalance command will automatically rebalance masters across the cluster.
Here's the problems I ran into, I'm seeking some advise:
If I interpret the docs correctly, scaling out wouldn't help at all, because those nodes would sit there idling until someone would manually call rabbitmq-queues rebalance all.
What would be the preferred way of scaling out?
With a Kafka cluster of 3 and a Zookeeper cluster of the same I brought up one distributed connector node. This node ran successfully with a single task. I then brought up a second connector, this seemed to run as some of the code in the task definitely ran. However it then didn't seem to stay alive (though with no errors thrown, the not staying alive was observed by a lack of expected activity, while the first connector continued to function correctly). When I call the URL http://localhost:8083/connectors/mqtt/tasks, on each connector node, it tells me the connector has one task. I would expect this to be two tasks, one for each node/worker. (Currently the worker configuration says tasks.max = 1 but I've also tried setting it to 3.
When I try and bring up a third connector, I get the error:
"POST /connectors HTTP/1.1" 500 90 5
(org.apache.kafka.connect.runtime.rest.RestServer:60)
ERROR IO error forwarding REST request:
(org.apache.kafka.connect.runtime.rest.RestServer:241)
java.net.ConnectException: Connection refused
Trying to call the connector POST method again from the shell returns the error:
{"error_code":500,"message":"IO Error trying to forward REST request:
Connection refused"}
I also tried upgrading to Apache Kafka 0.10.1.1 that was released today. I'm still seeing the problems. The connectors are each running on isolated Docker containers defined by a single image. They should be identical.
The problem could be that I'm trying to run the POST request to http://localhost:8083/connectors on each worker, when I only need to run it once on a single worker and then the tasks for that connector will automatically distribute to the other workers. If this is the case, how do I get the tasks to distribute? I currently have the max set to three, but only one appears to be running on a single worker.
Update
I ultimately got things running using essentially the same approach that Yuri suggested. I gave each worker a unique group ID, then gave each connector task the same name. This allowed the three connectors and their single tasks to share a single offset, so that in the case of sink connectors the messages they consumed from Kafka were not duplicated. They are basically running as standalone connectors since the workers have different group ids and thus won't communicate with each other.
If the connector workers have the same group ID, you can't add more than one connector with the same name. If you give the connectors different names, they will have different offsets and consume duplicate messages. If you have three workers in the same group, one connector and three tasks, you would theoretically have an ideal situation where the tasks share an offset and the workers make sure the tasks are always running and well distributed (with each task consuming a unique set of partitions). In practice the connector framework doesn't create more than one task, even with tasks.max set to 3 and when the topic tasks are consuming has 25 partitions.
If anyone knows why I'm seeing this behaviour, please let me know.
I've encountered with similar issue in the same situation as yours.
Task.max is configured for a topic and distributed workers automatically decide what nodes handle topic. So, if you have 3 workers in a cluster and your topic configuration says task.max=2 then only 2 of 3 workers will process the topic. In theory, if one of workers fails, 3rd should pick up workload. But..
The distributed connector turned out to be very unreliable: once you add\remove some nodes, the cluster broke down and all workers did nothing but tried to choose leader and failed. The only way to fix was to restart whole cluster and preferably all workers simultaneously.
I chose another way - I used standalone worker and it works like a charm to me because distribution of load is implemented on Kafka client level and once some worker dropped, the cluster re-balances automatically and clients connected to unoccupied topics.
PS. Maybe it will be useful for you too. Confluent connector is not tolerate to invalid payload that does not match topic's schema. Once the connector get some invalid message it silently dies. The only way to find out is to analyze metrics.
I'm posting an answer to an old question, since Kafka Connect has moved on a lot in three years.
In the latest version (2.3.1) there is incremental rebalancing which massively improves the behaviour of Kafka Connect.
It's also worth noting that when configuring Kafka Connect rest.advertised.host.name must be set correctly, as if it's not you will see errors including the one quoted
{"error_code":500,"message":"IO Error trying to forward REST request: Connection refused"}
See this post for more details.
I am quite new to Quartz.NET, but was able to create a running solution for my problem.
There are remote server instances, which are executed as windows services. The jobstore for these instances is an AdoJobStore with SQLLite backend.
The client application is able to run jobs remotely through remote scheduler proxies.
Now i have to combine the remote execution with clustering. Right here I am struggling with the instantiating of scheduler proxies for remote servers. When a scheduler is created on client, side addresses and ports are configured explicit with the properties of the scheduler factory.
In architecture with a cluster consisting of several remote services and one client, which has to start jobs on these servers with the Quartz.NET feature load balancing, an explicit start of each of the jobs to a specific server address makes no sense to me.
So, how should the client app give the jobs to the cluster and how has the cluster to be configured (for example a list of server ip addresses and port to be used)?
In addition: how have the Quartz.NET server instances to share the database and how will this work for server less SQLLite?
Thanks for any tip useful for further reading I have to do,
Mario
Meanwhile I was able to get my system to work. The answer to my question “Combination of remoting & clustering” is: Do not combine these features, as it is not necessary.
For implementation of a distributed cluster, don’t use remoting at all (hard to find when your first development step was creating a client with a single remote server).
Distribution of jobs and therefore all “connecting” of instances is done by using the same database, which has to be centralized for that reason (using SQL Express now).
Don’t start your local (client) scheduler instance.
Don’t care about all the local working threads appearing even when all the work should be carried out by the remoted servers in the cluster. My expectation would have been to use a scheduler with 0 threads in the local application, as you do not want to start any job within this app.
Problem unsolved: There seems to be no way to register a listener which will be called when a job is executed in the cluster. So I have to build my own feedback channel job --> starting app in order to track the status of jobs (start time, finish time, node where execution has taken place, ..).
Unsolved problem: When the local (WPF) application is closed by the user an endless loop in SimpleThreadPool
while (runnable == null && run)
{
Monitor.Wait(lockObject, 500);
}
prevents the process from being exited.
I'm trying to understand how neo4j cluster creation/joining works as it is not behaving properly in our application.
So I'm starting from scratch and creating a 3 box cluster as per the tutorial: http://neo4j.com/docs/2.3.4/ha-setup-tutorial.html
The following note is copy/pasted from the tutorial:
Startup Time When running in HA mode, the startup script returns
immediately instead of waiting for the server to become available.
This is because the instance does not accept any requests until a
cluster has been formed. In the example above this happens when you
start the second instance. To keep track of the startup state you can
follow the messages in console.log — the path is printed before the
startup script returns.
However when I startup the second instance, my cluster is still not formed... I need to startup the 3rd one for the cluster to start.
Is this an error in the neo4j docs?
Furthermore, is there a way to "force" an instance to become a master on cluster startup? For example, if I have 3 nodes and 2 of them fail and need to be re-installed, when I restart the cluster, how can I force the one with the valid database to become master? Isn't there a chance the 2nd or 3rd one with a blank database would become master?
When you start a cluster for the first time, or stop all instances and then start them again, the initial cluster MUST consist of all members listed in ha.initial_hosts. In addition, all instances in the cluster should have the exact same entries in ha.initial_hosts for the cluster to come up quickly and cleanly. The cluster will not form until all of the instances are up and running.
We're planning to run an Express-based server on Node.js in "cluster mode" using Node.js' cluster support. So there will be 1 master process and 'n' (where 'n' is calculated based on the number of CPUs) child processes running on a single machine. We already have a testbed set up using Faye for pubsub in non-cluster mode and it works great.
Are there any additional considerations we need to be aware of when using Faye on top of a Node cluster? For example, since there will be 'n' HTTP server instances, will it be a problem creating a Faye NodeAdapter in each Node process and attaching it to the HTTP server instance in that process?
Thanks.
-brian
I just realized that the answer to my question is fairly obvious. One thing to be aware of is that Faye will need to access shared state across multiple server instances (processes). In a single-server config, you could probably get away with using Faye's memory engine. In a clustered config, you'd need to use Faye's redis engine or some other engine that allows state to be shared by different processes. I'd prefer not to introduce another persistence component just for this purpose so I may look into implementing my own on top of my current persistent store (Neo4j).