CouchDB on Windows is paging heavily while it has ample RAM available

I have a three-node CouchDB cluster running on Windows. Each node has 16 vCPUs and 64 GB of RAM. I am fairly new to CouchDB and to non-relational databases in general.
What I am struggling with is that one of the nodes (which I assume is acting as the coordinator) is using about 120 GB of page file on disk while it has about 48 GB of free RAM available to it.
We increased the RAM from 32 GB to 64 GB to help with the paging, only to find that it is now using even more of the page file, since the page file size is currently managed by the Windows OS.
I would expect it to start paging only once it had used all the available RAM, but what we have is a 120 GB page file in use alongside roughly 50 GB of free RAM.
Why is it using the page file, which is far slower to access, while it has free RAM available to it?
Wasn't the unreserved RAM supposed to be used for disk caching of frequently accessed database file blocks to speed up access? Why is it behaving this way?
Is there a CouchDB or Erlang BEAM configuration setting that I should be looking at?
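For what it's worth, I know Windows can be told to stop managing the page file and use a fixed size instead; a rough sketch from an elevated prompt (the sizes are purely illustrative, a reboot is needed, and this only treats the symptom rather than the cause):
wmic computersystem where name="%computername%" set AutomaticManagedPagefile=False
wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=16384,MaximumSize=16384
But I would still like to understand why the node is paging at all with ~50 GB of RAM free.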

Related

Does it make sense to run a multi-node Elasticsearch cluster on a single host?

What do I get by running multiple nodes on a single host? I am not getting availability, because if the host goes down, the whole cluster goes with it. Does it make sense in terms of performance? Doesn't a single ES instance take as many resources from the host as it needs?
Generally no, but if you have machines with ridiculous amounts of CPU and memory, you might want to do that to properly utilize the available resources. Avoiding big heaps with Elasticsearch is generally a good thing, since garbage collection on bigger heaps can become a problem, and in any case above 32 GB you lose the benefit of pointer compression. Mostly you should not need big heaps with ES. Most of the memory that ES uses is through memory-mapped files, which rely on the OS cache. So just because you aren't assigning memory to the heap doesn't mean it is not being used: more memory available for caching means you'll be able to handle bigger shards or more shards.
So if you run more nodes, that advantage goes away, you waste memory on redundant heaps, and you'll have nodes competing for resources. Mostly, you should of course base these decisions on actual memory, cache, and CPU usage.
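To make the heap sizing concrete, a minimal sketch (the exact mechanism depends on the Elasticsearch version, and 30g is just an example comfortably below the compressed-oops cutoff):
set ES_HEAP_SIZE=30g
On older 1.x/2.x installs on Windows you set that environment variable before starting elasticsearch.bat; newer versions instead take -Xms30g and -Xmx30g in config/jvm.options. Keeping the minimum and maximum heap equal avoids resize pauses, and everything above the heap is left to the OS for filesystem caching.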
It depends on your host and how you configure your nodes.
For example, Elastic recommends allocating no more than about 32 GB of RAM to Elasticsearch (because of how Java compresses pointers) and keeping another 32 GB for the operating system (mostly for disk caching).
Assuming you have more than 64 GB of RAM on your host, say 128 GB, it can make sense to run two nodes on the same machine, each configured with a 32 GB heap, leaving the remaining 64 GB for the operating system.
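A rough sketch of what that can look like in practice (flag syntax varies by version: 1.x/2.x accept -Des.* system properties while 5.x+ uses -E; the names, paths and ports here are only examples):
bin\elasticsearch.bat -Des.node.name=node-1 -Des.path.data=D:\es\node1\data -Des.http.port=9200
bin\elasticsearch.bat -Des.node.name=node-2 -Des.path.data=D:\es\node2\data -Des.http.port=9201
It is also worth setting cluster.routing.allocation.same_shard.host: true in elasticsearch.yml so that a shard and its replica are never both placed on the same physical host.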

What are the minimum requirements of Neo4j?

I'd like to use a Neo4j database in a Docker container on an Odroid XU4. The database is not big; it will hold approximately 20,000 nodes. The Odroid has only 2 GB of memory, and I'd like to have a Samba server, some Node.js applications and at least one PostgreSQL database too, so the system is short on memory. I read in the Neo4j manual that 2 GB of memory is the minimum, but I have seen Docker examples running it with 512 MB, so I am a little confused. What is the minimum memory I can use the Neo4j Docker image with?
I have similar trouble with disk space. The system is on a 32 GB SD card. I'd like to store the database data there and back it up to an external hard drive, so I can spend at most 16 GB on Neo4j. The data certainly does not require that kind of space; I am not sure why Neo4j needs it (according to the manual, again).
First, you can use http://neo4j.com/hardware-sizing-calculator/ to get a rough estimate of memory and disk usage.
The second option is to do some math yourself; you can use the information on page 12 of http://graphaware.com/assets/bachman-msc-thesis.pdf.
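As a back-of-the-envelope example (the record sizes are approximate, roughly 15 B per node, 34 B per relationship and 41 B per property in the Neo4j 2.x store format, and the relationship/property counts below are assumptions for illustration):
20,000 nodes x 15 B ≈ 0.3 MB
100,000 relationships x 34 B ≈ 3.4 MB
200,000 properties x 41 B ≈ 8.2 MB
That puts the store files in the tens of megabytes, so both the 16 GB disk budget and a heap of a few hundred MB plus a small page cache look plausible for a graph of this size.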
You should keep in mind that, for performance reasons, it's good to have all the data in memory.
From my point of view you shouldn't have a problem with the memory, but you can't expect great performance.
It's better to try it yourself before you ask here ;)

Solr on Tomcat, Windows OS consumes all memory

Update
I've configured both the -Xms (initial heap) and -Xmx (maximum heap) JVM parameters, and after a restart I hooked up VisualVM to monitor the Tomcat memory usage. While the indexing process is running, the memory usage of Tomcat seems OK; consumption stays within the range defined by the JVM parameters.
So it seems that filesystem buffers are consuming all the leftover memory and never releasing it? Is there a way to handle this behaviour, for example by changing the nGram size or the directoryFactory?
I'm pretty new to Solr and Tomcat, but here we go:
OS: Windows Server 2008
4 CPUs
8 GB RAM
Tomcat Service version 7.0 (64 bit)
Only running Solr
No optional JVM parameters set, but Solr config through GUI
Solr version 4.5.0.
One Core instance (both for querying and indexing)
Schema config:
minGramSize="2" maxGramSize="20"
most of the fields are stored = "true" (required)
Solr config:
ramBufferSizeMB: 100
maxIndexingThreads: 8
directoryFactory: MMapDirectory
autocommit: maxdocs 10000, maxtime 15000, opensearcher false
cache (defaults): filtercache initialsize:512 size: 512 autowarm: 0
queryresultcache initialsize:512 size: 512 autowarm: 0
documentcache initialsize:512 size: 512 autowarm: 0
We're using a .Net service (based on Solr.Net) for updating and inserting documents on a single Solr core instance. The size of the documents sent to Solr varies from 1 KB up to 8 MB; we're sending the documents in batches, using one or multiple threads. The current size of the Solr index is about 15 GB.
The indexing service runs for around 3 to 4 hours to complete all inserts and updates to Solr. While the indexing process is running, the Tomcat process memory usage keeps growing to more than 7 GB of RAM and does not come down, even after 24 hours.
After a restart of Tomcat, or a Reload Core in the Solr Admin, the memory drops back to 1 to 2 GB of RAM. Memory leak?
Is it possible to configure the max memory usage for the Solr process on Tomcat?
Are there other alternatives? Best practices?
Thanks
You can set up the JVM memory settings for Tomcat. I usually do this with a setenv.bat file in Tomcat's bin directory (the same directory as the catalina.bat/.sh files).
Adjust the following values to your needs:
set JAVA_OPTS=%JAVA_OPTS% -Xms256m -Xmx512m
Here are clear instructions on it:
http://wiki.razuna.com/display/ecp/Adjusting+Memory+Settings+for+Tomcat
First of all, you have to set the -Xmx parameter to limit the maximum memory that can be used by Tomcat. But in the case of Solr you have to remember that it uses a lot of memory outside the JVM to handle filesystem buffers (the OS cache backing the memory-mapped index), so never give more than about 50% of the available memory to Tomcat in this case.
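On the 8 GB machine from the question that would mean something like the following in setenv.bat (the numbers are a starting point to experiment with, not a recommendation):
set JAVA_OPTS=%JAVA_OPTS% -Xms2g -Xmx4g
The remaining ~4 GB is then left for Windows and the MMapDirectory page cache, which is exactly the memory that shows up as being consumed outside the JVM.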
I have the following setup (albeit a much smaller problem)...
5000 documents, document sizes range from 1MB to 30MB.
We have a requirement to run under 1GB for the Tomcat process on a 2 CPU / 2GB system
After a bit of experimentation I came up with these JVM settings:
-Xms448m
-Xmx768m
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:ParallelCMSThreads=4
-XX:PermSize=64m
-XX:MaxPermSize=64m
-XX:NewSize=384m
-XX:MaxNewSize=384m
-XX:TargetSurvivorRatio=90
-XX:SurvivorRatio=6
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=55
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+OptimizeStringConcat
-XX:+UseCompressedOops
-XX:MinHeapFreeRatio=5
-XX:MaxHeapFreeRatio=5
These helped, but I still ran into OutOfMemoryError issues and Tomcat using too much memory, even with such a small dataset.
Solution, or rather the things/configuration I have set so far that seem to hold up well:
Disable all caches other than QueryResultCache
Do not include text/content fields in your query; only include the id.
Do not use a row size greater than 10, and do not include highlighting.
If you are using highlighting (this is the biggest culprit), get the document identifiers from the query first, then run the query again with highlighting and the search terms, restricted to those ids (see the example requests below).
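Roughly, the two requests look like this (the handler, field names and ids are placeholders for your own schema, and the parameters would be URL-encoded in practice):
/select?q=your search terms&fl=id&rows=10
/select?q=your search terms&fq=id:(101 OR 102 OR 103)&fl=id&rows=10&hl=true&hl.fl=content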
Finally, for the memory problem: I grudgingly had to implement an unorthodox approach to solve the Tomcat/Java memory-hogging issue (as Java never gives memory back to the OS).
I created a memory-governor service that runs with debug privilege and calls the Windows API to force the Tomcat process to release memory. I also have a global mutex to prevent access to Tomcat while this happens when a call comes in.
Surprisingly, this approach is working out well, but it is not without its perils if you do not have the option to control access to Tomcat.
If you find a better solution/configuration changes please let us know.

Does every server in a MongoDB replica set need to have exactly the same RAM?

Can I set up a replica set in MongoDB 1.8 using servers with different amounts of RAM?
server1: 5gb
server2: 2gb
server3: 4gb
If yes, what are the pros and cons?
No, you do not need equal RAM. (Yes, you could set up a replica set as described.)
MongoDB uses memory-mapped files for all caching, which means that cache paging is handled by the operating system. The replicas with more memory will keep more of the database in memory; those with less will page more to disk.
MongoDB will eventually bring the entire database into memory if it can. If you're using two replicas for reads and one for writes, you might want to use the 5gb and 4gb machines for reads, so they are more likely to be hitting RAM.
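In the 1.8 era that routing happens at the driver/connection level; from the shell, the equivalent is to connect to one of the bigger members directly and allow reads on it (the host and collection names are placeholders, and newer versions call this secondaryOk / read preferences instead):
mongo server1:27017
rs.slaveOk()
db.mycollection.find({ status: "active" })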
Yes, you can configure a replica set this way.
If yes, what are the pros and cons?
Here's a doc explaining the major features of replica sets. Let's take a look at these in light of the RAM differences.
Pros:
More computers means better data redundancy. Having that 2GB node at least means that you have one more copy of the data.
Having a full 3 nodes on a replica set makes it easier to take one down for maintenance.
Cons:
Having servers of different sizes isn't great for automated failover. Let's say that your 5GB server is the primary. What happens when it goes down and the 2GB server wins the election? You still have automated fail-over, but your performance has probably dropped dramatically.
Read scaling may not work very well. Depending on your read patterns, sending reads to the 2GB server may result in lots of extra disk hits and slower performance.
So the big problem here is really one of performance. If you're just doing this for a dev setup, then it will basically work. But in production you run the risk of completely tanking your app. If your app is used to living on 4 GB+ of RAM and then suddenly drops to 2 GB, it may become unusable.
Most production setups want to fail over to another "equally-powered" computer.
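One way to soften the failover concern is to make the 2 GB box ineligible to become primary by giving it priority 0 (the member index below is illustrative; fractional priorities need MongoDB 2.0+, but priority-0 passive members exist in the 1.8 line as well):
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)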

Tomcat Minimum Memory :: Virtual Hosts vs. Multiple Instance

I am trying to determine memory usage for a vanilla web app run through Tomcat.
I assume that a virtual-hosts setup will use significantly less memory than host-per-instance. What is the minimum memory footprint of a single-host Tomcat 7 instance? Does the memory footprint grow linearly with each instance added, or can common resources be shared among instances?
I would prefer a multi-instance setup, so as to isolate client sites (i.e. not affect other sites on redeploy or restart), but memory usage is the key. If each instance requires 512 MB of RAM (like Grails, for example), then I may have to take the virtual-host route, as I was not intending to spend the 16 GB of RAM available on Tomcat alone!
Suggestions appreciated. By the way, only a handful of sites will incur significant load; the majority are small and draw on a client CMS (perhaps I can virtual-host these sites and reserve host-per-instance for the "important" client sites).
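For reference, the multi-instance layout I have in mind is one shared CATALINA_HOME (the Tomcat binaries) plus one CATALINA_BASE per client site, each with its own conf\server.xml (distinct ports), logs and a small heap; roughly (paths and heap sizes are only illustrative):
set CATALINA_HOME=C:\tomcat7
set CATALINA_BASE=C:\sites\clientA
set JAVA_OPTS=-Xms64m -Xmx192m
call "%CATALINA_HOME%\bin\startup.bat"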
