Redis performance comparison: using TTL vs allkeys-lru policy - memory

In Redis, the allkeys-lru eviction policy may evict any key, whether or not it has an expiration set.
Setting a TTL, i.e. an expiration, on a key costs extra memory.
Quoting from Redis.io:
It is also worth noting that setting an expire to a key costs memory,
so using a policy like allkeys-lru is more memory efficient since
there is no need to set an expire for the key to be evicted under
memory pressure.
Is it really more efficient overall NOT to put a TTL on the key and let the allkeys-lru policy handle it?
Aren't there any tradeoffs in this situation? For example, does allkeys-lru block writes until eviction completes? That would make me use a TTL if eviction can take a long time.
I would love to discuss this. Thanks for everybody's input!

allkeys-lru is triggered when Redis hits its configured memory limit (maxmemory). It's a safety feature to avoid crashing Redis entirely.
If you rely only on allkeys-lru to clean up your data, Redis will run slower because every operation is applied to a bigger DB, and your Redis DB will always sit at its maximum size.
It also makes it harder to monitor your resources as your business grows.
Using a TTL on your values is more of a technical decision based on your use case. It gives you more control over which events you no longer need.
A TTL uses more memory because Redis has to store the expiration value for each record, which makes sense.
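For example, here is roughly how the two approaches look from an Erlang client (a sketch assuming the eredis library; key names and values are made up):

{ok, C} = eredis:start_link(),
%% With a TTL: Redis stores an expiration alongside the key,
%% costing a few extra bytes per key.
{ok, <<"OK">>} = eredis:q(C, ["SET", "session:42", "data", "EX", "3600"]),
%% Without a TTL: configure eviction and let memory pressure decide.
{ok, <<"OK">>} = eredis:q(C, ["CONFIG", "SET", "maxmemory-policy", "allkeys-lru"]).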
For Redis Streams, you can use the MAXLEN option to keep your streams from growing too much, especially when you don't need older data. It applies per stream, so it will not increase Redis memory that much.
Redis streams are expired as a whole (by key), not by record, so it is not possible to expire old records from a stream with a per-record TTL if you continuously receive new data.
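A sketch of capping a stream with MAXLEN (same assumed eredis connection C as above; the stream name is made up):

%% Keep roughly the latest 1000 entries; "~" requests approximate
%% trimming, which is cheaper than exact trimming.
{ok, _Id} = eredis:q(C, ["XADD", "events", "MAXLEN", "~", "1000",
                         "*", "field", "value"]).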
Main conclusion: use TTL and MAXLEN where possible to clean up unnecessary data sooner, so Redis does not have to do it all at once and you keep more control over your data and resources.

Related

InfluxDB 2 storage retention/max. size

I am using InfluxDB 2.2 to store and aggregate data on a gateway device. The environment is quite limited in terms of disk space. I do not know at what interval, or how large, the data that gets ingested will be. Retention is not much of a requirement; all I want is to make sure that the InfluxDB does not grow larger than, let's say, 5 GB.
I know that I could just set restrictive bounds on the retention, but this does not feel like an ideal solution. Do you see any way to achieve this?
It seems that you are mostly concerned about disk space. If so, there are several workarounds you could try:
Retention policy: this is similar to a TTL in other NoSQL databases, and it helps you delete obsolete data automatically. How long to set the retention period really depends on the business you are running. You could run the instance for a few days, see how the disk usage grows, and then adjust your retention policy.
Downsampling: "Downsampling is the process of aggregating high-resolution time series within windows of time and then storing the lower resolution aggregation to a new bucket." Not all data needs to be retrievable at full resolution at all times. Most of the time, the fresher the data (i.e. hot data), the more frequently it is fetched. What's more, for historical data you often just need the big picture, i.e. less granularity. For example, if you are collecting data at second-level granularity, you could run a downsampling task that retains only the mean of the values at minute or even hour precision. That will save you a lot of space while barely affecting your trend views.
See more details here.
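To illustrate the downsampling idea itself, independent of InfluxDB (a minimal Erlang sketch; the {UnixSeconds, Value} sample shape is made up):

per_minute_mean(Samples) ->
    %% Group samples into one-minute buckets and sum/count each bucket.
    Buckets = lists:foldl(
                  fun({Ts, V}, Acc) ->
                      Minute = Ts div 60,
                      maps:update_with(Minute,
                                       fun({Sum, N}) -> {Sum + V, N + 1} end,
                                       {V, 1},
                                       Acc)
                  end, #{}, Samples),
    %% One {BucketStart, Mean} pair per minute, much cheaper to store.
    [{Minute * 60, Sum / N} || {Minute, {Sum, N}} <- maps:to_list(Buckets)].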

Erlang SSL TCP Server And Garbage Collection

Edit: The issue seems to be with SSL accepting and a memory leak.
I have noticed that if you have long-lived processes (it's a server) and clients send data to the server (recv), Erlang garbage collection never (or only rarely) gets triggered.
Servers need data (to perform actions), and the data can be of variable length (a message like "Hello" or "How are you doing"). Because of this, it seems the Erlang process will accumulate garbage.
How can you handle this properly? The Erlang process has to touch the received data, so is it unavoidable? Or do you have to come up with designs that touch the variable-length data as few times as possible (like immediately passing it to a port driver)?
Spawning a worker to process the data is a bad solution (millions of connections ...), and using workers would basically be the same thing, right? So that leaves me with very few options.
Thanks ...
If the server holds on to the received message longer than it needs to, it's a bug in the server implementation. Normally, the server should forget all or most references to the data in a request when that request has finished processing, and the data will then become garbage and will eventually get collected. But if you stick the data from each request in a list in the process state, or in an ets table or similar, you will get a memory leak.
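A sketch of the difference, as gen_server callbacks (do_work/1 and the state record are made up):

%% Fine: Data is unreferenced once the reply is sent, so it becomes
%% garbage and is reclaimed on the process's next collection.
handle_call({process, Data}, _From, State) ->
    {reply, do_work(Data), State}.

%% Leak: every request body is kept alive in the process state.
handle_call({process, Data}, _From, State = #state{seen = Seen}) ->
    {reply, do_work(Data), State#state{seen = [Data | Seen]}}.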
There is a bit of an exception with binaries larger than 64 bytes, because they are handled by reference counting, and to the memory allocator it can look like there's no need to perform a collection yet, although the number of bytes used off-heap by such binaries can be quite large.
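One common gotcha with such binaries: keeping a small slice of a large received binary pins the whole off-heap binary. Copying the slice releases the large one (a sketch; the 16-byte header is made up):

handle_packet(<<Header:16/binary, _Rest/binary>>) ->
    %% Header is a sub-binary that still references the full packet;
    %% binary:copy/1 detaches it so the large binary can be reclaimed.
    binary:copy(Header).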
Just in case anyone finds themselves in the same boat: this is known to happen when hammering the server with many connections at once, I think.
Without SSL, my sessions are roughly ~8KB and resources are triggered for GC as expected. With SSL each session grows to ~150KB, and the memory keeps growing and growing and growing.
http://erlang.org/pipermail/erlang-questions/2017-August/093037.html
You might be having a problem with large SSL/TLS session tables. We (OTP team) have historically had some issues with those tables as they may grow very large if you do not have some limiting mechanisms. Alas in the latest ssl version one of the limiting mechanism was broken, however it is easily fixed by this patch. As a workaround you can also sacrifice some efficiency and disable session reuse.
diff --git a/lib/ssl/src/ssl_manager.erl b/lib/ssl/src/ssl_manager.erl
index ca9aaf4..ef7c3de 100644
--- a/lib/ssl/src/ssl_manager.erl
+++ b/lib/ssl/src/ssl_manager.erl
@@ -563,7 +563,7 @@ server_register_session(Port, Session, #state{session_cache_server_max = Max,
do_register_session(Key, Session, Max, Pid, Cache, CacheCb) ->
try CacheCb:size(Cache) of
- Max ->
+ Size when Size >= Max ->
invalidate_session_cache(Pid, CacheCb, Cache);
_ ->
CacheCb:update(Cache, Key, Session),
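The other workaround mentioned above, disabling session reuse, would look roughly like this on the server side (a sketch; I am assuming the reuse_sessions listen option, and the certificate paths are placeholders):

%% Refuse TLS session reuse so the session table stays small,
%% trading some handshake CPU for memory.
{ok, ListenSocket} =
    ssl:listen(8443, [{certfile, "cert.pem"},
                      {keyfile,  "key.pem"},
                      {reuse_sessions, false}]).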

Data Retrieval Throughput - ETS lookup vs inter-process Messaging

Suppose we have an Erlang application which involves thousands of processes. Suppose there is a single resource X, which may be a tuple, a list, or any Erlang term, which all these processes may need to read from, or pick something out of, at any moment in time.
An example of such a situation is, say, an API system in which client processes need to read and write on a remote machine. And it happens that you do not want a new connection to be created for each read/write request. So what you do is create a pool of connections; consider them a pool of open pipes/sockets/channels.
Now, this pool of resources is to be shared by thousands of processes, such that for each read or write demand you want a process to retrieve any available open channel/resource.
The question is: what if I have a single process hold this information, whether in its process dictionary or in its receive loop? It would mean that all the processes would have to send a message to this process whenever they need a free resource. This single process would have a huge mailbox at any time because of the high demand for this single resource. Or I could use an ETS table with only one row, say #resources{key=pool, value=List_of_openSockets_or_channels}. But this would mean that all our processes would attempt to read the same row of the ETS table at (with high probability) the same instant.
How would the ETS table cope if 10,000 processes attempted to read the same row/record at (almost) the same time? And with the single process, what happens to its mailbox if 10,000 processes message it at the same time for the same resource (and it has to reply to each requester)? Remember, this may occur very frequently. Which option (disregarding availability issues like the process going down) would provide higher throughput, so that processes get what they need faster? Is there any better way of handling high-demand data structures in the Erlang VM that provides very fast access for millions of processes, even if they all need the resource at the same time?
Short answer: profile. Try different approaches and verify how your system behaves.
Firstly, I would look at ETS' {read_concurrency, true} option (see the sketch below). From the documentation:
{read_concurrency,boolean()} Performance tuning. Default is false.
When set to true, the table is optimized for concurrent read
operations. When this option is enabled on a runtime system with SMP
support, read operations become much cheaper; especially on systems
with multiple physical processors. However, switching between read and
write operations becomes more expensive. You typically want to enable
this option when concurrent read operations are much more frequent
than write operations, or when concurrent reads and writes comes in
large read and write bursts (i.e., lots of reads not interrupted by
writes, and lots of writes not interrupted by reads). You typically do
not want to enable this option when the common access pattern is a few
read operations interleaved with a few write operations repeatedly. In
this case you will get a performance degradation by enabling this
option. The read_concurrency option can be combined with the
write_concurrency option. You typically want to combine these when
large concurrent read bursts and large concurrent write bursts are
common.
Secondly, I would look at caching possibilities. Are the processes reading that information only once or multiple times? If they're accessing it multiple times, you could read it once and store it in your process state.
Thirdly, you could try to replicate and distribute that piece of information across your system. Divide et impera.
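A minimal sketch of the ETS approach from the first point (table and record names are made up; OpenSockets is assumed to be bound to your list of open connections):

%% One public table, optimized for concurrent reads; every process
%% can look up the pool row directly instead of messaging a server.
Pool = ets:new(connection_pool, [set, public, {read_concurrency, true}]),
true = ets:insert(Pool, {pool, OpenSockets}),
[{pool, Sockets}] = ets:lookup(Pool, pool).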
If you use the process approach, in order to avoid having all the read requests serialized on the message queue of the 'server' process you must replicate.
Using an ETS table with read_concurrency feels more natural and it is something that I used when developing the parallel version of Dialyzer. However, ETS access was never a bottleneck in that case.

Can a Memcached daemon ever free() unused memory, without terminating the process?

I believe that you can't force a running Memcached instance to de-allocate memory, short of terminating that Memcached instance (and freeing all of the memory it held). Does anyone know of a definitive piece of documentation, or even a mailing list or blog posting from a reliable source, that can confirm or deny this impression?
As I understand it, a Memcached process initially allocates a chunk of memory (the exact initial allocation size is configurable), and then monotonically increases its memory utilization over its lifetime, limited by the daemon's maximum memory allocation size (also configurable). At no point does the Memcached daemon ever free any memory, regardless of whether the daemon has any ongoing need for the memory it holds.
I know that this question might sound a little whiny, with a tone of "I DEMAND that open source project X support my specific need!" That's not it, at all--I'm purely interested in the exact technical answer, here, and I swear I'm not harshing on Memcached. For the curious, this question came out of a discussion about possible methods for gracefully juggling multiple Memcached instances on a single server, given an application where the cost of a cache flush can be quite high.
However, I'd appreciate it if you save your application suggestions/advice for a different question (re-architecting my application, using a different caching implementation, etc.). I do appreciate a good brainstorm, but I think this question will be most valuable if it stays focused on the technical specifics of how Memcached does and does not work. If you don't have the answer to this specific question, there is probably still value in what you have to say, but I'd guess that there's a different, better place to post the more speculative comments/suggestions/advice.
This is probably the hardest problem we have to solve for memcached currently (well, a variation of it, anyway).
Freeing a chunk of memory requires us to know that a) nothing within the chunk is in use and b) nothing will start using it while we're in the process of purging it for reuse/freeing. I've heard some really good ideas for how we might solve our slab rebalancing problems, which is basically the same problem, except we're not trying to free the memory but to give it to something else (a common need in a few large installations).
Also, whether free actually reduces the RSS of your process is implementation dependent. In many cases, a malloc/fill/free will leave the memory mapped in (unless your allocator uses mmap instead of sbrk).
I'm pretty sure this isn't possible with memcached. I don't see any technical reason why it couldn't be implemented, though: lock cache operations, expire enough keys to reach the desired size, update the size, unlock. (I'm sure there are nicer ways to avoid blocking the server during that time.)
The standard and default mechanism of memory management in memcached is the slab allocator. This means that memory is allocated to the process and never released back to the operating system: when memory is no longer used to store some data, it is held by the process in order to be reused later, when needed. The operating system only reclaims a process's memory when the process terminates; that is why memory is released when you kill/stop memcached.
There is a compile-time option in memcached to enable a malloc/free mechanism, so that when free() is called, memory might be released to the operating system (this depends on the C standard library implementation). But doing so might hurt fragmentation and performance.
Please read more about the issue here:
Why not use malloc/free
Memcached memory management

Determine whether memory location is in CPU cache

It is possible for an operating system to determine whether a page of memory is in DRAM or in swap; for example, simply try to access it and if a page fault occurs, it wasn't.
However, is the same thing possible with CPU cache?
Is there any efficient way to tell whether a given memory location has been loaded into a cache line, or to know when it does so?
In general, I don't think this is possible. It works for DRAM and the pagefile because those are OS-managed resources; the cache is managed by the CPU itself.
The OS could run a tight timing loop around a memory read and check whether it completes fast enough to have come from the cache or whether it had to go out to main memory; this would be very error-prone.
On multi-core/multi-processor systems, cache-coherency protocols are used between processors to determine when they need to invalidate each other's caches. I suppose you could have a custom device that snoops this protocol and that the OS could query.
What are you trying to do? If you want to force something into the cache, current x86 processors support prefetching memory in a non-blocking way; for instance, with Visual C++ you could use _mm_prefetch to fetch a line into the cache.
EDIT:
I haven't done this myself, so use at your own risk. To determine cache misses for profiling, you may be able to use some architecture-specific registers. http://download.intel.com/design/processor/manuals/253669.pdf, Appendix A, gives "Performance Tuning Events". These can't be used to determine whether an individual address is in the cache or when it is loaded into the cache, but they can be used for overall statistics. I believe this is what VTune (a phenomenal profiler for this level) uses.
If you try to determine this yourself then the very act of running your program could invalidate the relevant cache lines, hence rendering your measurements useless.
This is one of those cases that mirrors the scientific principle that you cannot measure something without affecting that which you are measuring.
X86
I don't know how to tell whether an address IS in the cache,
but here is how to tell whether an address WAS in the cache:

rdtsc                 ; read the time-stamp counter into EDX:EAX
mov esi, eax          ; save the low 32 bits of the first timestamp
mov ebx, [address]    ; touch the memory location under test
rdtsc                 ; read the time-stamp counter again
sub eax, esi          ; elapsed cycles for the load
cmp eax, threshold    ; below the threshold => the address was cached

The threshold has to be determined from documentation or empirically.
Some machines have cache hit/miss counters which would serve equally well.
