Low-Latency Big Data on Couchbase (Membase)

Can Couchbase serve GroupBy-based reads and writes over 4 TB of data with low latency? If not, what data size is Couchbase good for when low-latency access is required?

Couchbase can definitely handle 4 TB of data. It will be fast to the degree that you can keep your working set in RAM. You can have more data on disk than fits in memory, but you want a very low cache-miss rate, which we let you monitor. If that percentage gets too high, it is time to grow your cluster so that more RAM becomes available.
4 TB should take a few tens of nodes. At that scale, disk throughput starts to become the limiting factor (e.g., slow disks take too long to warm up a lot of RAM). So for really hot data, people use SSDs, but for the majority of apps, EC2 is perfectly fine.
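As a rough illustration of the sizing reasoning above, here is a minimal Python sketch. Everything in it is an assumption for illustration: the working-set fraction, the RAM per node, the share of RAM usable for caching, the number of in-RAM copies of each item, and the miss-rate threshold are hypothetical inputs, not Couchbase defaults, and the counters passed to cache_miss_ratio stand in for whatever cache-miss statistics your monitoring exposes.

```python
import math


def cache_miss_ratio(gets_from_ram: int, gets_from_disk: int) -> float:
    """Fraction of reads that missed the RAM cache and had to go to disk."""
    total = gets_from_ram + gets_from_disk
    return gets_from_disk / total if total else 0.0


def needs_more_ram(miss_ratio: float, threshold: float = 0.01) -> bool:
    """Hypothetical rule: grow the cluster once misses exceed the threshold."""
    return miss_ratio > threshold


def nodes_needed(total_data_gb: float, working_set_fraction: float,
                 ram_per_node_gb: float, usable_ram_fraction: float = 0.8,
                 copies_in_ram: int = 1) -> int:
    """Rough node count so that the hot data (times its copies) fits in RAM."""
    working_set_gb = total_data_gb * working_set_fraction * copies_in_ram
    usable_per_node_gb = ram_per_node_gb * usable_ram_fraction
    return math.ceil(working_set_gb / usable_per_node_gb)


if __name__ == "__main__":
    # Hypothetical inputs: 4 TB of data, 20% of it hot, 2 copies, 64 GB nodes.
    print(nodes_needed(4096, 0.2, 64, copies_in_ram=2))
    print(needs_more_ram(cache_miss_ratio(9_900, 100)))
```

With 4 TB (4096 GB) of data, a 20% working set, two in-RAM copies, and 64 GB nodes of which 80% is usable for caching, this estimate comes out at 32 nodes, in line with "a few tens of nodes". The point of the sketch is the shape of the calculation: node count scales with the working set, not with the total data on disk.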

Related

Does the access speed of RAM/disk memory depend on its capacity?

As the image shows, access time increases as memory capacity increases.
Does it make sense that access time depends on memory capacity?
No. The images show that technologies with a lower cost in $/GB are slower. Within a given level (tier) of the memory hierarchy, performance does not depend on size. You can build systems with wider buses and so on to get more bandwidth out of a given tier, but having more capacity is not inherently slower.
Having more disks or larger disks doesn't make disk access slower; latency stays close to constant, determined by the nature of the technology (rotating platters).
In fact, larger-capacity disks tend to have better bandwidth once they do seek to the right place, because more bits per second are flying under the read / write heads. And with multiple disks you can run RAID to utilize multiple disks in parallel.
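A back-of-the-envelope model makes the latency-versus-bandwidth point concrete. This Python sketch uses the standard approximation that a random access costs a seek, plus on average half a rotation, plus the transfer time; the seek time, RPM, and media transfer rates below are hypothetical example values, not measurements of any particular drive.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the platter must turn half a revolution to reach the data."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def access_time_ms(seek_ms: float, rpm: float,
                   transfer_bytes: int, media_rate_mb_s: float) -> float:
    """Seek + rotational latency + time to stream the bytes off the platter."""
    transfer_ms = transfer_bytes / (media_rate_mb_s * 1_000_000) * 1000
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


if __name__ == "__main__":
    # Two hypothetical 7200 RPM drives; the denser one streams faster.
    for rate in (150, 200):
        small = access_time_ms(8.5, 7200, 4096, rate)           # 4 KiB random read
        large = access_time_ms(8.5, 7200, 1_000_000_000, rate)  # 1 GB sequential
        print(f"{rate} MB/s: 4 KiB -> {small:.2f} ms, 1 GB -> {large:.0f} ms")
```

For a 4 KiB random read both drives land at roughly 12.7 ms, dominated by seek and rotation, while for a 1 GB sequential read the faster media rate saves well over a second.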
Similarly for RAM, having multiple channels of RAM on a big many-core Xeon increases aggregate bandwidth. (But unfortunately hurts latency due to a more complicated interconnect vs. simpler quad-core "client" CPUs: Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?) But that's a sort of secondary effect, and just using RAM with more bits per DIMM doesn't change latency or bandwidth, assuming you use the same number of DIMMs in the same system.
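The aggregate-bandwidth point can be written as a formula: theoretical peak bandwidth is roughly channels times transfers per second times bytes per transfer, with each DDR channel 64 bits (8 bytes) wide. The Python sketch below computes that peak for hypothetical DDR4-2400 configurations; real sustained bandwidth is lower, and DIMM capacity does not appear anywhere in the formula.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second).

    Each DDR channel is assumed to be 64 bits (8 bytes) wide. DIMM capacity
    is deliberately not a parameter: it does not change this number.
    """
    bytes_per_s = channels * mega_transfers_per_s * 1_000_000 * bytes_per_transfer
    return bytes_per_s / 1e9


if __name__ == "__main__":
    # Hypothetical DDR4-2400 setups: a dual-channel client CPU vs. a
    # quad-channel many-core Xeon.
    print(peak_bandwidth_gb_s(2, 2400))
    print(peak_bandwidth_gb_s(4, 2400))
```

Doubling the channel count doubles the theoretical peak (38.4 GB/s versus 76.8 GB/s here), while swapping in larger DIMMs on the same channels leaves it unchanged.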

What are the minimum requirements of neo4j?

I'd like to run a Neo4j database in a Docker container on an Odroid XU4. The database is not big; it will hold approximately 20,000 nodes. The Odroid has only 2 GB of memory, and I'd also like to run a Samba server, some Node.js applications, and at least one PostgreSQL database, so the system is short on memory. The Neo4j manual says 2 GB of memory is the minimum, but Docker examples run it with 512 MB, so I am a little confused. What is the minimum memory I can run the Neo4j Docker image with?
I have a similar problem with disk space. The system is on a 32 GB SD card. I'd like to store the database data there and back it up to an external hard drive, so I could spare at most 16 GB for Neo4j. The data certainly does not require that much space, and I am not sure why Neo4j needs it (again, according to the manual).
First, you can use http://neo4j.com/hardware-sizing-calculator/ to get a rough estimate of memory and disk usage.
The second option is to do the math yourself, using the information on page 12 of http://graphaware.com/assets/bachman-msc-thesis.pdf
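As a sketch of that math, assuming the fixed record sizes commonly cited for Neo4j 3.x store files (roughly 15 bytes per node, 34 per relationship, and 41 per property record; other versions use different formats, so check the documentation for yours), the raw store for the graph in the question is tiny. The relationship and property counts below are hypothetical, since the question only gives the node count, and the estimate ignores indexes, string/array stores, and transaction logs.

```python
# Assumed record sizes in bytes (Neo4j 3.x-era figures; verify for your version).
NODE_RECORD_BYTES = 15
RELATIONSHIP_RECORD_BYTES = 34
PROPERTY_RECORD_BYTES = 41


def store_size_bytes(nodes: int, relationships: int, property_records: int) -> int:
    """Raw size of the node, relationship, and property record stores."""
    return (nodes * NODE_RECORD_BYTES
            + relationships * RELATIONSHIP_RECORD_BYTES
            + property_records * PROPERTY_RECORD_BYTES)


if __name__ == "__main__":
    # 20,000 nodes from the question; the other two counts are made up.
    size = store_size_bytes(20_000, 100_000, 60_000)
    print(f"{size:,} bytes = {size / 2**20:.1f} MiB")
```

Under these assumptions the record stores come to a few megabytes, so the store itself should fit comfortably in a small page cache; on a 2 GB board the JVM heap and the other services are the more likely source of memory pressure.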
Keep in mind that, for performance reasons, it's best to have all the data in memory.
In my view you shouldn't have a problem with memory, but you can't expect great performance.
It's better to try it yourself before asking here ;)

Passenger server upgrade: CPU cores vs. RAM?

I went through the Passenger documentation to find out how many application instances it can run for a given hardware configuration. The documentation only talks about RAM:
The optimal value depends on your system’s hardware and the server’s average load. You should experiment with different values. But generally speaking, the value should be at least equal to the number of CPUs (or CPU cores) that you have. If your system has 2 GB of RAM, then we recommend a value of 30. If your system is a Virtual Private Server (VPS) and has about 256 MB RAM, and is also running other services such as MySQL, then we recommend a value of 2.
It says the minimum value can be the number of CPUs/CPU cores we have. I have a VPS with one vCPU and 1 GB of RAM, and my service provider offers an option to upgrade only the RAM. How far can I keep upgrading just the RAM? How important is it to upgrade the number of CPUs?
Quick Answer
Depends on what resources are the bottleneck for your app.
Long answer
You'll need to factor in a few things:
How much CPU time does your app need?
How much RAM does any given instance of your app use at peak load?
Does your app spend a lot of time on I/O-intensive tasks (i.e., database and file reads/writes, network communication)?
There may be other things to factor in, but your bottleneck will probably be one of the above. If RAM is your main bottleneck, by all means use the newly available RAM. However, if it turns out that your app is being slowed down by limited CPU or saturated I/O, no amount of RAM is going to speed things up.
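To make the trade-off concrete, here is a minimal Python sketch of a process-count estimate. It assumes a hypothetical rule of thumb: reserve a share of RAM for the OS and other services (75% usable here, an assumed figure), divide the rest by the measured peak RAM per app process, and, if the app is CPU-bound, cap the count at the number of cores, since extra processes would only queue for the CPU. The parameter names are illustrative and are not Passenger configuration options.

```python
def suggested_pool_size(total_ram_mb: int, ram_per_process_mb: int,
                        cpu_cores: int, cpu_bound: bool,
                        usable_ram_fraction: float = 0.75) -> int:
    """Rough upper bound on app processes for one server (a heuristic sketch)."""
    # How many processes fit in the RAM we are willing to give the app.
    ram_limit = int(total_ram_mb * usable_ram_fraction // ram_per_process_mb)
    if cpu_bound:
        # Beyond one busy process per core, more processes just wait for CPU.
        return max(1, min(ram_limit, cpu_cores))
    # I/O-bound apps spend time waiting, so RAM is the binding constraint.
    return max(1, ram_limit)


if __name__ == "__main__":
    # Hypothetical app using 150 MB per process on a 1-vCPU VPS.
    for ram in (1024, 4096):
        print(ram, "MB:",
              "cpu-bound", suggested_pool_size(ram, 150, 1, cpu_bound=True),
              "io-bound", suggested_pool_size(ram, 150, 1, cpu_bound=False))
```

Quadrupling RAM from 1 GB to 4 GB raises the I/O-bound estimate from 5 to 20 processes but leaves the CPU-bound estimate at 1, which is the answer's point that RAM alone cannot fix a CPU bottleneck.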
On the topic of CPU cores: my understanding is that the main Apache process that runs Passenger is single-threaded, and Apache spawns new threads to handle concurrency as needed. More CPU cores theoretically let you run x*n threads, where x is the number of threads you can optimally run on a single CPU core and n is the number of CPU cores available to Apache.
Disclaimer: I'm not very well read on Passenger internals; though this logic usually holds true for other kinds of Apache configurations.

Does every server in a MongoDB replica set need to have exactly the same RAM?

Can I set up a replica set in MongoDB 1.8 using servers with different amounts of RAM?
server1: 5 GB
server2: 2 GB
server3: 4 GB
If yes, what are the pros and cons?
No, you do not need equal RAM. (Yes, you could set up a replica set as described.)
MongoDB uses memory-mapped files for all caching, which means that cache paging is handled by the operating system. The replicas with more memory will keep more of the database in memory; those with less will page more to disk.
MongoDB will eventually bring the entire database into memory if it can. If you're using two replicas for reads and one for writes, you might want to use the 5 GB and 4 GB machines for reads, so they are more likely to be hitting RAM.
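A small Python sketch of that reasoning: estimate what fraction of the working set each member can keep in RAM, then prefer the best-cached members for reads. The 4 GB working set and the 0.5 GB reserved for the OS are hypothetical numbers, and real caching under memory-mapped files depends on access patterns, so treat the output as a rough ranking only.

```python
def cached_fraction(ram_gb: float, working_set_gb: float,
                    reserved_gb: float = 0.5) -> float:
    """Share of the working set that fits in the RAM left over after the OS."""
    usable_gb = max(0.0, ram_gb - reserved_gb)
    return min(1.0, usable_gb / working_set_gb)


def pick_read_members(ram_by_server: dict[str, float], working_set_gb: float,
                      count: int = 2) -> list[str]:
    """Members most likely to serve reads from RAM, best first."""
    return sorted(ram_by_server,
                  key=lambda name: cached_fraction(ram_by_server[name], working_set_gb),
                  reverse=True)[:count]


if __name__ == "__main__":
    servers = {"server1": 5, "server2": 2, "server3": 4}  # GB of RAM, from the question
    for name, ram in servers.items():
        print(name, f"{cached_fraction(ram, 4.0):.0%}")
    print("read from:", pick_read_members(servers, 4.0))
```

With these assumptions the 5 GB and 4 GB members come out on top for reads, while the 2 GB member can cache well under half of the working set.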
Yes, you can configure a replica set this way.
If yes, what are the pros and cons?
Here's a doc explaining the major features of replica sets. Let's take a look at these in light of the RAM differences.
Pros:
More computers means better data redundancy. Having that 2GB node at least means that you have one more copy of the data.
Having a full 3 nodes on a replica set makes it easier to take one down for maintenance.
Cons:
Having servers of different sizes isn't great for automated failover. Let's say that your 5GB server is the primary. What happens when it goes down and the 2GB server wins the election? You still have automated failover, but your performance has probably dropped dramatically.
Read scaling may not work very well. Depending on your read patterns, sending reads to the 2GB server may result in lots of extra disk hits and slower performance.
So the big problem here is really one of performance. If you're just doing this for a dev setup, then it will basically work. But in production you run the risk of completely tanking your app. If your app is used to living on 4GB+ of RAM and then suddenly drops to 2GB, it may become unusable.
Most production setups want to fail over to another "equally-powered" computer.

DataSet size best practices - are there any general rules?

I'm working on a desktop application that will produce several in-memory datasets as an intermediary before being committed to a database.
Obviously I'm going to try to keep the size of these to a minimum, but are there any guidelines on thresholds I shouldn't cross for good functionality on an 'average' machine?
Thanks for any help.
There is no "average" machine. There is a wide range of still-in-use computers, including those that run DOS/Win3.1/Win9x and have less than 64MB of installed RAM.
If you don't set any minimum hardware requirements for your application, at least consider the oldest OS you're planning to support, and use that OS's official minimum hardware requirements as a lower-bound assessment.
Generally, if your application is going to consume a considerable amount of RAM, you may want to let the user configure the upper bounds of the application's memory management mechanism.
That said, if you decide to manage the upper bounds dynamically based on real-time data, there are quite a few things you can do.
If you're developing a Windows application, you can use WMI to get the system's total amount of memory and base your limits on that value (say, use up to 5% of total memory).
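As a sketch of the "use up to 5% of total memory" idea: the answer suggests WMI on Windows, but this Python example instead reads physical memory through os.sysconf, which is a POSIX-only stand-in (it is not available on Windows). The 5% fraction is the answer's example figure, not a recommendation.

```python
import os
from typing import Optional


def total_physical_memory_bytes() -> Optional[int]:
    """Physical RAM via POSIX sysconf; returns None where that is unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (e.g. Windows); ValueError: name unknown.
        return None


def memory_budget_bytes(total_bytes: int, fraction: float = 0.05) -> int:
    """Upper bound for the app's in-memory datasets as a share of total RAM."""
    return int(total_bytes * fraction)


if __name__ == "__main__":
    total = total_physical_memory_bytes()
    if total is None:
        print("total memory unknown; fall back to a user-configured limit")
    else:
        print(f"budget: {memory_budget_bytes(total) / 2**20:.0f} MiB")
```

Falling back to a user-configured limit when the total cannot be read keeps the dynamic approach compatible with letting the user set the upper bound.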
In .NET, if your data structures are complex and you find it hard to assess the amount of memory you consume, you can query the Garbage Collector for the amount of allocated memory using GC.GetTotalMemory(false), or use a System.Diagnostics.Process object.
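Python has no direct equivalent of GC.GetTotalMemory, but its standard tracemalloc module gives a comparable rough measure for code you control. This sketch wraps the construction of a dataset and reports the bytes of Python allocations still live afterwards and the peak during construction; it only sees memory allocated through Python's allocators, so treat it as an approximation. The example dataset is made up.

```python
import tracemalloc
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def measure_allocation(build: Callable[[], T]) -> Tuple[T, int, int]:
    """Run build() and return (result, bytes still allocated, peak bytes)."""
    tracemalloc.start()
    try:
        result = build()
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


if __name__ == "__main__":
    # Hypothetical intermediate dataset: 100,000 (id, label) rows.
    rows, current, peak = measure_allocation(
        lambda: [(i, f"item-{i}") for i in range(100_000)])
    print(f"{len(rows)} rows, ~{current / 2**20:.1f} MiB live, "
          f"peak {peak / 2**20:.1f} MiB")
```

The "live" figure can then be compared against a memory budget before deciding whether to keep the dataset in memory or commit it to the database early.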
