Hazelcast Client Memory Leak

We have a Spring Boot 2.0.4 application that uses a distributed Hazelcast 3.11 cache. In the application we configured a HazelcastClient which connects to a Hazelcast server running in a Docker container.
In the cache we store "persons" in one map, and the same "persons" as a list in another map (~900 persons in one list under one key). The persons in the two maps are not 100% identical: both describe the same real-life person, but the ones in the list have fewer properties. Both maps use the BINARY in-memory format.
When we ran stress tests getting a person by random id from the cache (the 1st map), everything went excellently: 5000 concurrent requests didn't affect our application heap at all, and 10000 only slightly. In JSON format, one person's details are about 10 kB.
When we ran stress tests getting the list of persons from the cache (the 2nd map), we hit problems with the heap of the application where the client is configured. With just 500 concurrent requests the heap grew to 4 GB! In JSON format the list is about 800 kB. It is stored in the 2nd map and was requested by the same key 500 times.
Does anybody know what is going on?
Attached: the DTO, the Controller, the Facade method called from the Controller (where caching takes place via the @Cacheable annotation), the HazelcastInstance configuration, the hazelcast.xml configuration for the server side, and telemetry from 500 concurrent requests (3 times in a row) showing Heap and Classes.
UPDATED:
I ran 500 concurrent requests sequentially, 23 times in a row. Below we can see the final minutes of the test.
Telemetries Overview

@Nicolay, correct me if I'm wrong:
The second map contains lists of people, ~900 people, as a single entry. You mentioned each person is ~10 KB, so each entry in the second map is ~9 MB, even though you're saying it's 800 KB in JSON format. Can you please check the size of the entries in the second map through Hazelcast, e.g.: client.getMap(map_name).getEntryView(key).getCost(). This will give you the entry's memory cost in bytes.
500 concurrent requests, if each entry is ~9 MB, will require ~4.5 GB of additional heap, which matches what you observed.
Looking at the numbers, everything seems fine, other than the JSON size being 800 KB.
Can you check those numbers?
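For reference, a minimal sketch of that check with the Hazelcast 3.x Java client (the map name and key are hypothetical placeholders):
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.EntryView;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class EntryCostCheck {
    public static void main(String[] args) {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        IMap<String, Object> personLists = client.getMap("personLists"); // placeholder map name
        // getEntryView reads entry metadata without deserializing the value
        EntryView<String, Object> view = personLists.getEntryView("some-key"); // placeholder key
        // getCost() reports the entry's in-memory cost in bytes
        System.out.printf("Entry cost: %d bytes (%.1f MB)%n",
                view.getCost(), view.getCost() / (1024.0 * 1024.0));
        client.shutdown();
    }
}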

Related

Apache Beam: do all keys have to fit into memory on a worker

Assuming I have an unbounded dataset with extremely high cardinality (> 1,000,000,000 unique keys), let's say I want to count by key over fixed windows.
My understanding is that the combine function will essentially maintain an in-memory accumulator on each machine for each key.
Question 1
Is the above assumption correct, or can workers flush keys and accumulators out to disk when under memory pressure?
Question 2 (assuming the above is correct)
Assuming the data is not naturally partitioned (e.g. reading from Pub/Sub), would we run out of memory on each worker, since every machine may in theory see every key and have to maintain an in-memory structure for each one?
Question 3 (assuming the above is correct)
If we store the data in Kafka and split it into partitions based on the key we are counting on, and assume one Beam worker reads from one partition, then each worker only sees a consistent subset of the keyspace. In this scenario, would the memory use of the workers be any different?
Beam is meant to be highly scalable; there are Beam pipelines that run on Dataflow with many trillions of unique keys.
When running a combining operation in Beam, a table of keys and aggregated values is kept in memory, but when the table becomes full it is flushed to disk (well, technically, to shuffle), so it will not run out of memory. Another worker will read this data out of shuffle, one value at a time, to compute the final aggregate over all upstream worker outputs.
As for your other two questions: if your input is naturally partitioned by key such that each worker only sees a subset of keys, it is possible that more combining could happen before the shuffle, leading to less data being shuffled, but this is by no means certain and the effects would likely be small. In particular, memory considerations won't change.
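As a rough illustration, here is a minimal sketch with the Beam Java SDK (names and the window size are placeholders) of the count-per-key pipeline discussed above; the flush-to-shuffle behavior happens inside the combining transform, not in user code:
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class CountByKey {
    // Counts occurrences of each key over fixed one-minute windows.
    public static PCollection<KV<String, Long>> countPerKey(PCollection<String> keys) {
        return keys
                .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
                // Count.perElement() is a combining transform: each worker keeps
                // a bounded in-memory table of partial counts and spills it to
                // shuffle when full, so the keyspace need not fit in memory.
                .apply(Count.perElement());
    }
}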

Getting actual memory usage per user session in SSAS tabular model

I'm trying to build a report which would show actual memory usage per user session when working with a particular SSAS Tabular in-memory model. The model itself is relatively big (~100 GB in memory) and the test queries are relatively heavy: no filters, lowest granularity level, a couple of SUM measures, plus exporting 30k rows to CSV.
First, I tried querying the following DMV:
select SESSION_SPID
,SESSION_CONNECTION_ID
,SESSION_USER_NAME
,SESSION_CURRENT_DATABASE
,SESSION_USED_MEMORY
,SESSION_WRITES
,SESSION_WRITE_KB
,SESSION_READS
,SESSION_READ_KB
from $system.discover_sessions
where SESSION_USER_NAME='username'
and SESSION_SPID=29445
and got the following results:
$system.discover_sessions result
I was expecting SESSION_USED_MEMORY to show at least several hundred MB, but the biggest value I got was 11 KB (the official MS documentation for this DMV states that SESSION_USED_MEMORY is in kilobytes).
I've also tried querying 2 more DMVs:
SELECT SESSION_SPID
,SESSION_COMMAND_COUNT
,COMMAND_READS
,COMMAND_READ_KB
,COMMAND_WRITES
,COMMAND_WRITE_KB
,COMMAND_TEXT
from $system.discover_commands
where SESSION_SPID=29445
and
select CONNECTION_ID
,CONNECTION_USER_NAME
,CONNECTION_BYTES_SENT
,CONNECTION_DATA_BYTES_SENT
,CONNECTION_BYTES_RECEIVED
,CONNECTION_DATA_BYTES_RECEIVED
from $system.discover_connections
where CONNECTION_USER_NAME='username'
and CONNECTION_ID=2047
But I also got quite underwhelming results: 0 used memory from $system.discover_commands, and 4.8 MB from $system.discover_connections for CONNECTION_DATA_BYTES_SENT, which still seems smaller than what the actual session would use.
These results don't seem to correspond to a very blunt test, in which users sent similar queries via Power BI and we observed a ~40 GB spike in RAM allocation on the SSAS server for 4 users (so roughly 10 GB per user session).
Has anyone used these (or any other DMVs or methods) to get actual user-session memory consumption? Using a SQL trace dump would be the last resort, since it would require parsing and loading the results into a DB, and my goal is to have a real-time report showing active user sessions.
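One more rowset that might be worth probing (I have not verified how well it maps to per-session usage) is DISCOVER_MEMORYUSAGE, which exposes the server's internal memory allocations; treat the available columns as an assumption, check them with a SELECT *, and then filter on the session's SPID if your version exposes it:
select *
from $system.discover_memoryusage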

neo4j not creating index on large dataset

I am creating an index on a very large (8.2M node, 63M property) neo4j db instance.
CREATE INDEX ON :Article(lowerTitle)
It takes a negligible amount of time to issue the command, and the index (presumably) begins to process.
I have a max Java heap of 100 GB and 40 cores (it's a large server). Stupidly, however, the storage is an HDD.
Right after issuing the index command, my core usage spikes to very heavy utilization. After about 20 seconds, it drops to almost no processor usage but about 90% memory usage.
I have left it running for 3 hours, and the index has still not been created (or at least, there is no improvement for simple MATCH queries on a single property, which average about 16 seconds).
MATCH (arti {lowerTitle: "quantum mechanics"}) RETURN arti
Is this reasonable? What is taking so long? Am I doing something wrong?
NOTE: I have also noticed that my total database size (38.02 GB) has not increased over the 3 hours.
To verify that your index is online, issue the :schema command in the browser.
You should see your index status:
ONLINE means OK
POPULATING means the index is still being populated
FAILED means, well, failed
Your query will never be fast as written, because you are not using a label, so no index will be used. Change it to:
MATCH (arti:Article {lowerTitle: "quantum mechanics"}) RETURN arti

Neo4j In memory configurations, multithreading, and slow writes

How do I improve performance when writing to neo4j? I currently have neo4j set up on a server and am running it in embedded mode. Based on configurations I've found online, I believe my setup stores the entire content of my graph database in memory:
neostore.nodestore.db.mapped_memory=0
neostore.relationship.db.mapped_memory=0
neostore.propertystore.db.mapped_memory=0
neostore.propertystore.db.strings.mapped_memory=0
neostore.propertystore.db.arrays.mapped_memory=0
neostore.propertystore.db.index.keys.mapped_memory=0
neostore.propertystore.db.index.mapped_memory=0
node_auto_indexing=true
node_keys_indexable=type,id
cache_type=strong
use_memory_mapped_buffers=false
node_cache_size=12G
relationship_cache_size=12G
node_cache_array_fraction=10
relationship_cache_array_fraction=10
Please let me know if this is incorrect. The problem I am encountering is that persisting information to the graph database is not very quick compared to our MySQL times for the same thing (e.g. adding 250 items takes about 3 seconds, versus 1 second in MySQL). I read online that having multiple indexes can slow down performance when persisting data, so I am looking into that now to see if it is the culprit. But I just wanted to make sure that my configuration is in line for running the graph database in memory.
Second question on this topic: if my configuration is good and my database is indeed in memory, is there a way to optimize persisting data, in case this isn't the silver bullet? If we run our test with 10 threads, as opposed to one, the execution times seem to stack up (e.g. thread 1 finishes in 1s, thread 2 in 2s, thread 3 in 3s, etc.). Is there some special multithreaded configuration that I am missing to improve performance when multiple threads are hitting the database at once?
Neo4J version
1.9.1-enterprise
My JVM configs are:
-Xms25G -Xmx25G -XX:+UseNUMA -XX:+UseSerialGC
My Machine Specs:
File system type ext3
Your cache arguments are invalid.
node_cache_size=12G
relationship_cache_size=12G
node_cache_array_fraction=10
relationship_cache_array_fraction=10
These can only be used with the GCR cache. Also, setting the cache type isn't going to put everything in memory for you at startup; you will have to write code to do that yourself. Something like this:
GlobalGraphOperations ggo = GlobalGraphOperations.at(graphDb); // pass the GraphDatabaseService instance, not a factory
for (Node n : ggo.getAllNodes()) {
    // Reading every property pulls the node into the object cache
    for (String propertyKey : n.getPropertyKeys()) {
        n.getProperty(propertyKey);
    }
    // Iterating the relationships loads them into the cache as well
    for (Relationship relationship : n.getRelationships()) {
    }
}
Beware with the strong cache: if you have a lot of nodes/relationships, your cache will eventually become very large, and performing GC against it will cause long pauses in your system.
My recommendation would be to use the memory-mapped files, as these are handled by the OS and live outside of heap space. They don't provide anywhere near the speed of the cache, but they will provide a speed-up when you have to read from the neo store.
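For example, a minimal sketch with the embedded API of that era (the store path and sizes are placeholder assumptions to tune against your actual store files; setConfig with string keys was available on the builder):
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class MappedMemoryConfig {
    public static GraphDatabaseService open() {
        return new GraphDatabaseFactory()
                .newEmbeddedDatabaseBuilder("/var/lib/neo4j/data/graph.db")
                // Let the OS memory-map the store files outside the heap
                .setConfig("use_memory_mapped_buffers", "true")
                // Placeholder sizes; size these to your actual store files
                .setConfig("neostore.nodestore.db.mapped_memory", "1G")
                .setConfig("neostore.relationshipstore.db.mapped_memory", "4G")
                .setConfig("neostore.propertystore.db.mapped_memory", "4G")
                .setConfig("neostore.propertystore.db.strings.mapped_memory", "2G")
                .newGraphDatabase();
    }
}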

Cache (large and static) data with class variables

First, let me explain the situation. I've got the following:
A "Node" class with the following attributes:
node_id (unique)
node_name (unique)
And a "NodeConnection" Class with following attributes:
node_from
node_to
We'll have around 1 to 3 million nodes and something around 3 to 10 million NodeConnections.
After the nodes and connections are imported once, they won't change.
On each request to the Rails application, we have to look up around 10 to 100 node_ids by possible node_names, and we have to look up a few hundred to a few thousand node_connections.
We initially prototyped this without any caching (so, a LOT of database queries) and response times were horrible (like 2 minutes).
So we switched to caching the nodes and connections via memcached.
That gave us a performance boost, but performance was still lacking (because we call Cache.read for every NodeConnection, that's a few thousand calls per request).
Now we have tried caching via class variables and got a huge performance boost (response times within a few hundred ms).
# Pseudocode below
class Node
  def self.nodes
    @@nodes ||= get_nodes
  end

  def self.node_connections
    @@node_connections ||= get_node_connections
  end
end
So, I'd like to ask about the pros and cons of this solution.
Cons I've got so far:
Every Rails instance has to build up its own cache (its own class variables) -> higher total memory usage
Initializing the cache is time-consuming (1-3 minutes), so we can't do it within a request
Are there any other solutions out there to cache large (>100 MB) and static (the data won't change during the application's lifetime) data efficiently, so that all Rails instances on the same machine can access this cache very fast?
It sounds like a very specific situation, but in order to avoid the need for a per-process in-memory cache (i.e. your class variables) to warm up naturally, I'd investigate the feasibility of scripting the warm-up process and running it from inside an initializer... your app may take longer to start up, but your users would not have to wait.
EDIT | Note that if you were using something like Unicorn, which supports pre-loading application code before forking worker processes, you could minimize the impact of such initialization.
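A minimal sketch of what such an initializer might look like (the file name and warm-up calls are assumptions based on the pseudocode above):
# config/initializers/warm_node_cache.rb
# Hypothetical warm-up: populate the class-variable caches once at boot,
# so no request has to pay the 1-3 minute initialization cost.
Rails.application.config.after_initialize do
  Node.nodes            # fills @@nodes
  Node.node_connections # fills @@node_connections
end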
