Neo4j - Difference between High Availability and Distributed Mechanism? - neo4j

Neo4j explain about the clustering through a concept called High Availability. And, What I know about clustering with hadoop knowledge is distributed computing.
What are the difference between these two concepts?
Thanks

Neo4j High Availability refers to an approach for scaling the number of requests to which Neo4j can respond. Neo4j HA implements a master slave with replication clustering model for high availability scaling. This means that all writes go to the "master" server (or are proxied to master from the slaves) and the update is synchronized among the slave servers. Reads can be sent to any server in the cluster, scaling out the number of requests to which the database can respond.
Compare this to distributed computing, which is a general term to describe how computation operations can be done in parallel across a large number of machines. One key difference is the concept of data sharding. With Neo4j each server in the cluster contains a full copy of the graph, whereas with a distributed filesystem such as HDFS, the data is sharded and each machine stores only a small piece of the entire dataset.
Part of the reason Neo4j does not shard the graph is that since a graph is a highly connected data structure, traversing through a distributed/sharded graph would involve lots of network latency as the traversal "hops" from machine to machine.

Related

Bosun HA and scalability

I have a minor bosun setup, and its collecting metrics from numerous services, and we are planning to scale these services on the cloud.
This will mean more data coming into bosun and hence, the load/efficiency/scale of bosun is affected.
I am afraid of losing data, due to network overhead, and in case of failures.
I am looking for any performance benchmark reports for bosun, or any inputs on benchmarking/testing bosun for scale and HA.
Also, any inputs on good practices to be followed to scale bosun will be helpful.
My current thinking is to run numerous bosun binaries as a cluster, backed by a distributed opentsdb setup.
Also, I am thinking is it worthwhile to run some bosun executors as plain 'collectors' of scollector data (with bosun -n command), and some to just calculate the alerts.
The problem with this approach is it that same alerts might be triggered from multiple bosun instances (running without option -n). Is there a better way to de-duplicate the alerts?
The current best practices are:
Use https://godoc.org/bosun.org/cmd/tsdbrelay to forward metrics to opentsdb. This gets the bosun binary out of the "critical path". It should also forward the metrics to bosun for indexing, and can duplicate the metric stream to multiple data centers for DR/Backups.
Make sure your hadoop/opentsdb cluster has at least 5 nodes. You can't do live maintenance on a 3 node cluster, and hadoop usually runs on a dozen or more nodes. We use Cloudera Manager to manage the hadoop cluster, and others have recommended Apache Ambari.
Use a load balancer like HAProxy to split the /api/put write traffic across multiple instances of tsdbrelay in an active/passive mode. We run one instance on each node (with tsdbrelay forwarding to the local opentsdb instance) and direct all write traffic at a primary write node (with multiple secondary/backup nodes).
Split the /api/query traffic across the remaining nodes pointed directly at opentsdb (no need to go thru the relay) in an active/active mode (aka round robin or hash based routing). This improves query performance by balancing them across the non-write nodes.
We only run a single bosun instance in each datacenter, with the DR site using the read only flag (any failover would be manual). It really isn't designed for HA yet, but in the future may allow two nodes to share a redis instance and allow active/active or active/passive HA.
By using tsdbrelay to duplicate the metric streams you don't have to deal with opentsdb/hbase replication and instead can setup multiple isolated monitoring systems in each datacenter and duplicate the metrics to whichever sites are appropriate. We have a primary and a DR site, and choose to duplicate all metrics to both data centers. I actually use the DR site daily for Grafana queries since it is closer to where I live.
You can find more details about production setups at http://bosun.org/resources including copies of all of the haproxy/tsdbrelay/etc configuration files we use at Stack Overflow.

Scaling HBase writes on a cluster using Thrift

We're trying to scale up HBase writes on a cluster using Thrift. (Our HBase application is in Python, and hence needs Thrift.)
Despite increasing the number of nodes in the cluster, we are seeing the same write speeds.
First off, is the recommended strategy to run Thrift on:
1. The client?
2. The HBase master?
3. HBase region servers?
If on #1 or #2, will the client or HBase master take care of splitting the requests to the various region servers? It doesn't appear to in our case.
If #3, then I have to modify the client to write to the specific region servers, and randomize the writes. I can do this, but it seems to defeat the purpose of using HBase.
Any other tips on read/write scaling (especially with Thrift) are greatly appreciated.
In HBase to gain performance with node increase you should have a decent "rowkey" distribution. As long as you have "hot spots" (a very busy region server) in your cluster you would not gain anything from increasing your cluster size. checkout the article on row key design to start with.
If you don't need to read right away (if you are comfortable with async writes) you can check asynch hbase client from stumbleupon for performance gain.
I found the answer at these two questions, it looks like we'll go with #3 (write to the specific region servers, and randomize the writes):
Is it better to send data to hbase via one stream or via several servers concurrently?
HBase Thrift: how to connect to remote HBase master/cluster?

How scalable is distributed Erlang?

Part A:
Erlang has a lot of success stories about running concurrent agents e.g. the millions of simultaneous Facebook chats. That's millions of agents, but of course it's not millions of CPUs across a network. I'm having trouble finding metrics on how well Erlang scales when scaling is "horizontal" across a LAN/WAN.
Let's assume that I have many (tens of thousands) physical nodes (running Erlang on Linux) that need to communicate and synchronize small infrequent amounts of data across the LAN/WAN. At what point will I have communications bottlenecks, not between agents, but between physical nodes? (Or will this just work, assuming a stable network?)
Part B:
I understand (as an Erlang newbie, meaning I could be totally wrong) that Erlang nodes attempt to all connect to and be aware of each other, resulting in an N^2 connection point-to-point network. Assuming that part A won't just work with N = 10K's, can Erlang be configured easily (using out-of-the-box config or trivial boilerplate, not writing a full implementation of grouping/routing algorithms myself) to cluster nodes into manageable groups and route system -wide messages through the cluster/group hierarchy?
We should specify that we talk about horizontal scalability of physical machines -- that's the only problem. CPUs on one machine will be handled by one VM, no matter what the number of those is.
node = machine.
To begin, I can say that 30-60 nodes you get out of the box (vanilla OTP installation) with any custom application written on the top of that (in Erlang). Proof: ejabberd.
~100-150 is possible with optimized custom application. I means, it has to be good code, written with knowledge about GC, characteristic of data types, message passing etc.
over +150 is all right but when we talk about numbers like 300, 500 it will require optimizations & customizations of TCP layer. Also, our app has to be aware of cost of e.g. sync calls across the cluster.
The other thing is DB layer. Mnesia (built-in) due its features will not be effective over 20 nodes (my experience - I may be wrong). Solution: just use something else: dynamo DBs, separate cluster of MySQLs, HBase etc.
The most common technique to leverage cost of creating high quality application and scalability are federations of ~20-50 nodes clusters. So internally its an efficient mesh of ~50 erlang nodes and its connected via any suitable protocol with N another 50 nodes clusters. To sum up, such a system is federation of N erlang clusters.
Distributed erlang is designed to run in one data center. If you need more, geographically distant nodes, then use federations.
There are lots of config options e.g. which do not connect all nodes to each other. It may be helpful, however in ~50 cluster erlang overhead is not significant. Also you can create a graph of erlang nodes using 'hidden' connection, which doesn't join this full mesh, but also it cannot benefit from connection to all nodes.
The biggest problem I see, in this kind of systems, is designing it as master-less system. If you do not need that, everything should be ok.

Berkeley DB replication: upper limit on number of replicants?

I'm considering using Berkeley DB to cache some data on an application cluster. What's a reasonable upper limit on the number of nodes I can plan on Berkeley DB handling? Writing to the database will be from a single node.
Mark,
Most of our customers are using replication groups of 5-20 nodes, although we have some large customers running with much larger replication groups. There is no inherent limit built into Berkeley DB.
The real world limit will depend on your read/write workload mix, how you configure your replication system and the amount of CPU cycles available on the master system. Basically, the master needs to communicate with each replica (send log records, process acknowledgements, respond to requests, etc.). Each replica that communicates with the master adds a small amount of overhead. For a mostly read/occasionally write workload, the master doesn't have to communicate that often and communicating with the replicas requires minimal processing. On a predominantly write workload, the master is actively communicating with the replicas and incurs a more significant workload per replica. You can reduce the workload on the master by funneling read operations to the replicas and by utilizing the Berkeley DB HA client-to-client synchronization feature.
Your mileage will vary, so the best approach is to test a prototype of your application and evaluate the balance of throughput, application requirements and available CPU cycles. Do you have a sense of how many nodes you expect to need in your replication groups?
Regards,
Dave
PS: The Getting Started with Replication Guide is a good place to start.

Scaling with a cluster- best strategy

I am thinking about the best strategy to scale with a cluster of servers. I know there is no hard and fast rules, but I am curious what people think about these scenarios:
cluster of combination app/db servers that are round robin (with failover) balanced using dnsmadeeasy. the db's are synced using replication. Has the advantage that capacity can be augmented easily by adding another server to the cluster, and it is naturally failsafe.
cluster of app servers, again round robin load balanced (with failover) using dnsmadeeasy, all reporting to a big DB server in the back. easy to add app servers, but the single db server creates a single failure point. Could possible add a hot standby with replication.
cluster of app servers (as above) using two databases, one handling reads only, and one handling writes only.
Also, if you have additional ideas, please make suggestions. The data is mostly denormalized and non relational, and the DBs are 50/50 read-write.
Take 2 physical machines and make them Xen servers
A. Xen Base alpha
B. Xen Base beta
In each one do three virtual machines:
"web" server for statics(css,jpg,js...) + load balanced proxy for dynamic request (apache+mod-proxy-balancer,nginx+fair)
"app" server (mongrel,thin,passenger) for dynamic requests
"db" server (mySQL, PostgreSQL...)
Then your distribution of functions can be like this:
A1 owns your public ip and handle requests to A2 and B2
B1 pings A1 and takes over if ping fails
A2 and B2 take dynamic request querying A3 for data
A3 is your dedicated data server
B3 backups A3 second to second and offer readonly access to make copies, backups etc.
B3 pings A3 and become master if A3 becomes unreachable
Hope this can help you some way, or at least give you some ideas.
It really depends on your application.
I've spent a bit of time with various techniques for my company and what we've settled on (for now) is to run a reverse proxy/loadbalancer in front of a cluster of web servers that all point to a single master DB. Ideally, we'd like a solution where the DB is setup in a master/slave config and we can promote the slave to master if there are any issues.
So option 2, but with a slave DB. Also for high availability, two reverse proxies that are DNS round robin would be good. I recommend using a load balancer that has a "fair" algorithm instead of simple round robin; you will get better throughput.
There are even solutions to load balance your DB but those can get somewhat complicated and I would avoid them until you need it.
Rightscale has some good documentation about this sort of stuff available here: http://wiki.rightscale.com/
They provide these types of services for the cloud hosting solutions.
Particularly useful I think are these two entries with the pictures to give you a nice visual representation.
The "simple" setup:
http://wiki.rightscale.com/1._Tutorials/02-AWS/02-Website_Edition/2._Deployment_Setup
The "advanced" setup:
http://wiki.rightscale.com/1._Tutorials/02-AWS/02-Website_Edition/How_do_I_set_up_Autoscaling%3f
I'm only going to comment on the database side:
With a normal RDBMS a 50/50 read/write load for the DB will make replication "expensive" in terms of overhead. For almost all cases having a simple failover solution is less costly than implementing a replicating active/active DB setup. Both in terms of administration/maintenance and licensing cost (if applicable).
Since your data is "mostly denormalized and non relational" you could take a look at HBase which is an OSS implementation of Google Bigtable, a column based key/value database system. HBase again is built on top of Hadoop which is an OSS implementation of Google GFS.
Which solution to go with depends on your expected capacity growth where Hadoop is meant to scale to potentially 1000s of nodes, but should run on a lot less as well.
I've managed active/active replicated DBs, single-write/many-read DBs and simple failover clusters. Going beyond a simple failover cluster opens up a new dimension of potential issues you'll never see in a failover setup.
If you are going for a traditional SQL RDBMS I would suggest a relatively "big iron" server with lots of memory and make it a failover cluster. If your write ratio shrinks you could go with a failover write cluster and a farm of read-only servers.
The answer lies in the details. Is your application CPU or I/O bound? Will you require terabytes of storage or only a few GB?

Resources