How large is the Neo4j read replica delay?

In the clustering / causal consistency docs, Neo4j describes how asynchronous replication is used between the primaries (which form a Raft group and replicate each write to a majority of the primaries) and the secondaries (read replicas). As I understand it, you can guarantee read-after-write consistency by passing a token (a bookmark) from the writer (after commit) to the reader; the reader then queries with that token, and the read replica only responds once it has replicated up to that point.
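For reference, this is the pattern I mean, sketched with the 4.x Java driver (the URI, credentials, label and property names are just placeholders):

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Bookmark;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.SessionConfig;
import org.neo4j.driver.Values;

public class CausalChainingSketch {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("neo4j://cluster-host:7687",
                AuthTokens.basic("neo4j", "secret"))) {

            Bookmark bookmark;
            // 1. The write goes to a primary; after commit the session holds a
            //    bookmark identifying the transaction that was just committed.
            try (Session write = driver.session()) {
                write.writeTransaction(tx -> {
                    tx.run("CREATE (:Order {id: $id})", Values.parameters("id", 42));
                    return null;
                });
                bookmark = write.lastBookmark();
            }

            // 2. The read session is seeded with that bookmark; a read replica
            //    serving it holds the query until it has replicated at least
            //    that far (or the transaction times out).
            try (Session read = driver.session(
                    SessionConfig.builder().withBookmarks(bookmark).build())) {
                read.readTransaction(tx ->
                        tx.run("MATCH (o:Order {id: $id}) RETURN o",
                               Values.parameters("id", 42)).list());
            }
        }
    }
}
```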
That seems great, but what is the minimum replication latency I would have to wait for? The docs are understandably cagey about committing to a maximum (since an overwhelmed or misconfigured replica might not make progress at all!), but does log shipping after a commit start almost immediately, or is the transaction log polled once a second, or once a minute?
The reason I care is that I want to bound my write-to-read latency to less than a second (except in rarer tail cases where read replicas are unhealthy). If log shipping is slower than that even in the happy case, I can't use read replicas and would have to scale out the primaries instead (which has its own latency impact, of course, but is probably a bit better).

Related

Storm process increasing memory

I am implementing a distributed algorithm for pagerank estimation using Storm. I have been having memory problems, so I decided to create a dummy implementation that does not explicitly save anything in memory, to determine whether the problem lies in my algorithm or my Storm structure.
Indeed, while the only thing the dummy implementation does is message-passing (a lot of it), the memory of each worker process keeps rising until the pipeline is clogged. I do not understand why this might be happening.
My cluster has 18 machines (some with 8g, some 16g and some 32g of memory). I have set the worker heap size to 6g (-Xmx6g).
My topology is very very simple:
One spout
One bolt (with parallelism).
The bolt receives data from the spout (fieldsGrouping) and also from other tasks of itself.
My message-passing pattern is based on random walks with a certain stopping probability. More specifically:
The spout generates a tuple.
One specific task from the bolt receives this tuple.
Based on a certain probability, this task generates another tuple and emits it again to another task of the same bolt.
I have been stuck on this problem for quite a while, so it would be very helpful if someone could help.
Best Regards,
Nick
It seems you have a bottleneck in your topology, i.e., a bolt receives more data than it can process. Thus, the bolt's input queue grows over time, consuming more and more memory.
You can either increase the parallelism of the "bottleneck bolt", or enable the fault-tolerance mechanism (acking), which also enables flow control via a limited number of in-flight tuples (https://storm.apache.org/documentation/Guaranteeing-message-processing.html). For this, you also need to set the "max spout pending" parameter.
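For illustration, a rough sketch of what that configuration could look like (WalkSpout, WalkBolt and the "node" field stand in for your existing spout, bolt and grouping field; the numbers are placeholders, not tuned recommendations):

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class RandomWalkTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("walk-spout", new WalkSpout(), 1);

        // Give the bolt enough executors to absorb the spout's tuples plus the
        // tuples it re-emits to itself during the random walk.
        builder.setBolt("walk-bolt", new WalkBolt(), 16)
               .fieldsGrouping("walk-spout", new Fields("node"))
               .fieldsGrouping("walk-bolt", new Fields("node"));

        Config conf = new Config();
        conf.setNumWorkers(18);
        conf.setNumAckers(4);           // acking must be on for flow control to work
        conf.setMaxSpoutPending(5000);  // cap on un-acked spout tuples = back-pressure
        // Note: this only throttles the spout if it emits tuples with message IDs
        // and the bolt anchors and acks the tuples it processes.

        StormSubmitter.submitTopology("random-walk", conf, builder.createTopology());
    }
}
```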

Hadoop namenode memory usage

I am confused by a Hadoop namenode memory problem.
When namenode memory usage rises above a certain percentage (say 75%), reading and writing HDFS files through the Hadoop API starts to fail (for example, some open() calls throw exceptions). What is the reason? Has anyone seen the same thing?
PS: This time the namenode disk I/O is not high and the CPU is relatively idle.
What determines the namenode's QPS (queries per second)?
Thanks very much!
Since the namenode is basically just an RPC server managing a HashMap with the blocks, you have two major memory problems:
Java's HashMap is quite costly, and its collision resolution (separate chaining) is costly as well, because it stores the colliding elements in a linked list.
The RPC server needs threads to handle requests. Hadoop ships with its own RPC framework, and you can configure the thread pool with dfs.namenode.service.handler.count for the datanodes (it defaults to 10), and with dfs.namenode.handler.count for other clients, such as MapReduce jobs and JobClients that want to run a job. When a request comes in and a new handler has to be created, the namenode may go out of memory (new threads also allocate a good chunk of stack space; maybe you need to increase that).
So these are the reasons why your namenode needs so much memory.
What determines the namenode's QPS (queries per second)?
I haven't benchmarked it yet, so I can't give you very good tips on that. Certainly it helps to tune the handler counts higher than the number of tasks that can run in parallel, plus speculative execution.
Depending on how you submit your jobs, you have to fine-tune the other property as well.
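For reference, both properties go in hdfs-site.xml on the namenode; the values below are purely illustrative, not recommendations:

```xml
<!-- hdfs-site.xml (illustrative values only) -->
<property>
  <name>dfs.namenode.handler.count</name>
  <!-- RPC handler threads for clients (MapReduce jobs, JobClients, ...) -->
  <value>64</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <!-- RPC handler threads reserved for datanode traffic -->
  <value>20</value>
</property>
```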
Of course you should always give the namenode enough memory, so it has headroom and does not fall into full garbage collection cycles.

Defining Scaling Threshold for Azure Web Roles

Azure embraces the notion of elastic scaling and I've been able to achieve this with my Worker Roles. However, when it comes to my Web Roles (e.g. MVC apps) I am not sure what to monitor (or how) to determine when it's a good time to increase (or decrease) the number of running instances. I'm assuming I need to monitor one or many Performance Counters but am not sure where to start.
Can anyone recommend a best practice for assessing an MVC Web Role instance's load relative to scaling decisions?
This question is a bit open-ended, as monitoring is typically app-specific. Having said that:
Start with the simple measurements that you'd look at on a local server, representing KPIs for your app; network utilization, for instance. This TechNet article describes performance counters collected by System Center for Windows Azure, among them:
ASP.NET Applications Requests/sec
Network Interface Bytes Received/sec
Network Interface Bytes Sent/sec
Processor % Processor Time Total
LogicalDisk Free Megabytes
LogicalDisk % Free Space
Memory Available Megabytes
You may also want to watch # of requests queued and request wait time.
Network utilization is interesting, since your NIC provides approx. 100Mbps per core and could end up being a bottleneck even when CPU and other resources are underutilized. You may need to scale out to more instances to handle high-bandwidth scenarios.
Also: I tend to give less importance to CPU utilization, even though it's so easy to measure (and shows up so frequently in examples). Running a CPU at near capacity is a good thing usually, since you're paying for it and might as well use as much as possible.
As far as decreasing: this needs to be handled a bit more carefully. Windows Azure compute is billed by the hour. If, say, you scale out to an extra instance at 11:50 and scale in again at 12:10, you've just incurred two CPU-hours. Also: you don't want to scale out, take new measurements and then decide you can scale back again, effectively creating a constant pulse of adding and removing instances. To make things easier, consider the Autoscaling Application Block (WASABi), found in the Enterprise Library. This has all the scale rules baked in (such as the ones I just mentioned) and is very straightforward to use.
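To illustrate that last point with a rough, platform-agnostic sketch (in Java; the thresholds, cool-down and minimum instance count are made up and this is not an Azure API):

```java
import java.time.Duration;
import java.time.Instant;

/** Toy scale controller: scale out fast, scale in slowly, never flap. */
public class ScaleController {
    private static final double SCALE_OUT_CPU = 0.75;   // made-up thresholds
    private static final double SCALE_IN_CPU  = 0.40;
    private static final Duration COOL_DOWN   = Duration.ofMinutes(20);

    private Instant lastChange = Instant.MIN;

    /** @return +1 to add an instance, -1 to remove one, 0 to do nothing. */
    public int decide(double avgCpuLast10Min, int instances, Instant now) {
        if (Duration.between(lastChange, now).compareTo(COOL_DOWN) < 0) {
            return 0; // still inside the cool-down window: ignore new measurements
        }
        if (avgCpuLast10Min > SCALE_OUT_CPU) {
            lastChange = now;
            return +1;
        }
        // Only scale in when load has been low for a sustained period and a
        // minimum footprint remains; with hourly billing there is little benefit
        // in dropping an instance mid-hour only to add it back shortly after.
        if (avgCpuLast10Min < SCALE_IN_CPU && instances > 2) {
            lastChange = now;
            return -1;
        }
        return 0;
    }
}
```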

Erlang fault-tolerant application: PA or CA of CAP?

I have already asked a question regarding a simple fault-tolerant soft real-time web application for a pizza delivery shop.
I have gotten really nice comments and answers there, but I disagree that it is a true web service. Rather than a web service, it is more of a real-time system to accept orders from customers, control the dispatching of these orders and control the vehicles that deliver those orders in real time.
Moreover, unlike a 'true' web service this system is not intended to have many users - it is just a few dispatchers (telephone operators) and a few delivery drivers that will use it (as for now I have no requirement to provide direct access to the service to the actual customers; only the dispatchers and delivery drivers will have the direct access).
Hence this question is a bit more general.
I have found that in order to make the right choice of a NoSQL data storage option for this application, the first thing I have to do is choose between CA, PA and CP according to the CAP theorem.
Now, the Building Web Applications with Erlang book says that "while it [Mnesia] is not a SQL database, it is a CA database like a SQL database. It will not handle network partition". The same book says that the CouchDB database is a PA database.
Having that in mind, I think the very first thing I need to do with my application is decide what the term 'fault tolerance' means with regard to CAP.
The simple requirement that I have is that the application is available 24/7 (R1). The other one is that there is no need to scale: the application will have a very modest number of users (it is probably not possible to have thousands of dispatchers) (R2).
Now, does R1 require the application to provide Consistency, Availability and Partition Tolerance and with what priorities?
What type of data storage option will better handle the following issues:
Providing 24/7 availability for a dispatcher (a person who accepts phone calls from customers and who uses a CRM) to look up customer records and put orders into the system;
Looking up current ongoing served orders and their status (placed, baking, dispatched, delivering, delivered) in real time;
Keep track of all working vehicles' locations and their payloads in real time;
Recover any part of the system after a system crash or a network failure, so that it can continue providing 1, 2 and 3;
To sum it up: What kind of Data Storage (CA, PA or CP) will suite the system described above better? What kind of Data Storage will better satisfy the R1 requirement?
For your 24/7 requirement you are looking for a database with (high) availability, because you want your requests to succeed every time (even if they only return error results).
A netsplit would bring your whole system down if you have no partition tolerance.
Consistency is nice to have, but you can only have two of the three.
Your best bet will be a PA solution. I highly recommend a solution that has been inspired by Amazon Dynamo. The best-known Dynamo implementations are Riak and CouchDB. Riak even lets you move away from pure PA by tuning the read and write quorums per request.
First, don't confuse CAP "Availability" with "High Availability". They have nothing to do with each other. The A in CAP simply means "All DB nodes can answer queries". To get High Availability, you must be in multiple data centers, you must have robust documented procedures for maintenance, expansion, etc. None of that depends on your CAP choice.
Second, be realistic about your requirements. A stock-trading application might have a requirement for 100% uptime, because every second of downtime could lose millions of dollars. On the other hand, I'm guessing your pizza joint might lose tens of dollars for every minute it's down. So it doesn't make sense to spend millions trying to keep it up. Try to compute your actual costs.
Third, always evaluate your choice against the mainstream option. You could just go CA (MySQL) and quickly fail over to the slaves when problems happen. Be realistic about the costs (and risks) of building on new technology. If you really expect your system to run for 5 years without downtime, ask for proof that someone else has run that database for 5 years without downtime.
If you go "AP" and have remote people (drivers, etc.), then you'll need to write an app that stores their data on their phone and sends it in the background (with retries). Of course, you could do this regardless of whether your database is CA or AP.
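That store-and-forward pattern is simple enough to sketch (generic Java, not tied to any particular mobile platform or database client; all names here are made up):

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy store-and-forward sender: writes are kept locally until the server acks them. */
public class OfflineWriteQueue {
    interface Uplink { boolean trySend(String payload); } // returns false on failure

    private final Queue<String> pending = new ArrayDeque<>(); // on-disk in a real app
    private final Uplink uplink;

    public OfflineWriteQueue(Uplink uplink) { this.uplink = uplink; }

    /** Called by the UI: always succeeds locally, regardless of connectivity. */
    public synchronized void record(String payload) {
        pending.add(payload);
    }

    /** Called periodically by a background task: drains whatever the network allows. */
    public synchronized void flush() {
        while (!pending.isEmpty()) {
            if (!uplink.trySend(pending.peek())) {
                return; // still offline or server unreachable; retry on the next run
            }
            pending.remove();
        }
    }
}
```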
If you want high uptimes, you can either:
Increase MTBF (Mean Time Between Failures) - buy redundant power supplies, dual Ethernet cards, etc.
Decrease MTTR (Mean Time To Recovery) - Just make sure when failure happens you can recover quickly. (Fail over to slave)
I've seen people spend tens of thousands of dollars on MTBF, only to be down for 8 hours while they restore their backup. It makes more sense to ensure MTTR is low before attacking MTBF.

Why is membase server so slow in response time?

I have a problem: Membase is very slow in my environment.
I am running several production servers (Passenger) on Rails 2.3.10 / Ruby 1.8.7.
Those servers communicate with 2 Membase machines in a cluster.
The Membase machines each have 64 GB of memory, a 100 GB EBS volume attached, and 1 GB of swap.
My problem is that Membase is VERY slow in response time and is actually the slowest part of the whole application lifecycle right now.
My question is: why?
The Rails gem I am using is memcache-northscale.
The Membase server is 1.7.1 (latest).
The server is doing between 2K-7K ops per second (for the cluster).
The response time from Membase (based on NewRelic) is 250ms on average, which is huge and unreasonable.
Does anybody know why this is happening?
What can I do in order to improve this time?
It's hard to immediately say with the data at hand, but I think I have a few things you may wish to dig into to narrow down where the issue may be.
First of all, do your stats with membase show a significant number of background fetches? This is in the Web UI statistics for "disk reads per second". If so, that's the likely culprit for the higher latencies.
You can read more about the statistics and sizing in the manual, particularly the sections on statistics and cluster design considerations.
Second, you're reporting 250ms on average. Is this a sliding average, or overall? Do you have 90th or 99th percentile latencies as well? A few outlying disk fetches can give you a large average, when most requests (for example, those served from RAM that don't need disk fetches) are actually quite speedy.
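If your monitoring only reports the mean, a quick client-side check makes the difference obvious. A rough sketch with made-up timings, not tied to any particular Membase client:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Compares mean vs. tail latency for a list of per-operation timings (in ms). */
public class LatencyStats {
    public static void main(String[] args) {
        List<Long> timingsMs = new ArrayList<>(); // fill with measured GET/SET durations
        Collections.addAll(timingsMs, 2L, 3L, 2L, 4L, 2L, 3L, 950L, 2L, 3L, 880L);

        Collections.sort(timingsMs);
        double mean = timingsMs.stream().mapToLong(Long::longValue).average().orElse(0);
        long p50 = timingsMs.get((int) (timingsMs.size() * 0.50));
        long p99 = timingsMs.get(Math.min(timingsMs.size() - 1, (int) (timingsMs.size() * 0.99)));

        System.out.printf("mean=%.1fms p50=%dms p99=%dms%n", mean, p50, p99);
        // A handful of slow disk fetches (the 880/950ms samples here) drag the mean
        // way up even though the median stays in single-digit milliseconds.
    }
}
```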
Are your systems spread throughout availability zones? What kind of instances are you using? Are the clients and servers in the same Amazon AWS region? I suspect the answer may be "yes" to the first, which means about 1.5ms overhead when using xlarge instances from recent measurements. This can matter if you're doing a lot of fetches synchronously and in serial in a given method.
I expect it's all in one region, but it's worth double checking since those latencies sound like WAN latencies.
Finally, there is an updated Ruby gem, backwards compatible with Fauna, which Couchbase, Inc. has been working to contribute back to Fauna upstream. If possible, you may want to try the gem referenced here:
http://www.couchbase.org/code/couchbase/ruby/2.0.0
You will also want to look at running Moxi on the client side. To access Membase you need to go through a proxy (called Moxi). By default it is installed on the servers, which means you might send a request to a server that doesn't actually own the key; Moxi will go fetch it for you, but then you're doubling the network traffic.
Installing Moxi on the client-side will eliminate this extra network traffic: http://www.couchbase.org/wiki/display/membase/Moxi
Perry
