Redis - monitoring memory usage

I am currently testing insertion of keys into a local Redis database.
I have more than 5 million keys and only 4GB of RAM, so at some point I hit the limit of RAM and the machine starts swapping (and my PC grinds to a halt)...
My problem: how can I monitor memory usage on the machine that hosts the Redis database, and use that to raise an alert and stop inserting keys into Redis?
Thanks.

Memory is a critical resource for Redis performance. Used memory is the total number of bytes allocated by Redis using its allocator (standard libc malloc, jemalloc, or an alternative allocator such as tcmalloc).
You can collect all memory utilization metrics for a Redis instance by running "info memory":
127.0.0.1:6379> info memory
# Memory
used_memory:1007280
used_memory_human:983.67K
used_memory_rss:2002944
used_memory_rss_human:1.91M
used_memory_peak:1008128
used_memory_peak_human:984.50K
Sometimes, when Redis is configured with no max memory limit, memory usage will eventually reach system memory and the server will start throwing "Out of Memory" errors. At other times, Redis is configured with a max memory limit but with the noeviction policy. This causes the server not to evict any keys, thus preventing any writes until memory is freed. The solution to such problems is to configure Redis with a max memory limit and some eviction policy; the server then starts evicting keys according to that policy as memory usage approaches the limit.
Memory RSS (Resident Set Size) is the number of bytes that the operating system has allocated to Redis. If the ratio of used_memory_rss to used_memory is greater than ~1.5 (reported as mem_fragmentation_ratio in INFO), it signifies memory fragmentation. The fragmented memory can be recovered by restarting the server.
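As a rough sketch of the kind of check the question asks for (the 3GB threshold and the exit-on-threshold behaviour are illustrative assumptions, not a prescription), you could poll used_memory before each batch of inserts and pause once it crosses a threshold:

#!/bin/bash
# Hypothetical guard: stop inserting once Redis' used_memory crosses a threshold.
THRESHOLD=$((3 * 1024 * 1024 * 1024))   # 3GB, adjust for your machine

used=$(redis-cli INFO memory | awk -F: '/^used_memory:/ {print $2}' | tr -d '\r')
if [ "$used" -ge "$THRESHOLD" ]; then
    echo "Redis used_memory ($used bytes) is above the threshold; pausing inserts" >&2
    exit 1
fi
# ...otherwise continue with the next batch of inserts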

Concerning memory usage, I'd advise you to look at the redis.io FAQ and this article about using Redis as an LRU cache.
You can either cap the memory usage via the maxmemory configuration setting, in which case once the memory limit is reached all write requests will fail with an error, or you can set maxmemory-policy to allkeys-lru, for example, to start evicting the least recently used data on the server to make room for the data you currently need. For most use cases this gives you enough flexibility to handle such problems through proper configuration.
My advice is to keep things simple and manage this issue through configuration of the Redis server rather than introducing additional complexity through OS-level monitoring or the like.
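For example, a minimal configuration along those lines might look like this (the 3gb value is only illustrative for a 4GB machine):

# redis.conf - illustrative values
maxmemory 3gb
maxmemory-policy allkeys-lru

# or applied at runtime, without a restart:
redis-cli CONFIG SET maxmemory 3gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru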

There is a good Unix utility named vmstat. Unlike top it is non-interactive, so you can log memory usage and react before your system grinds to a halt. You can also use ps v PID to get this information for a specific process. Redis's PID can be retrieved this way: pidof redis-server
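For example (the 5-second interval is arbitrary):

vmstat 5                      # memory, swap and CPU stats every 5 seconds
pidof redis-server            # prints the PID of the Redis server
ps v $(pidof redis-server)    # RSS and %MEM for that PID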

Related

Redis memory stats: what is the difference between these 3?

When a Redis process is running on the machine, redis-cli is available, so we can get some information about the process:
❯ redis-cli
127.0.0.1:6379> info memory
There are three fields there I'm asking about: used_memory_rss, used_memory_peak, and maxmemory. As far as I know, used_memory_rss is the actual memory that Redis is consuming. I also understood that once Redis takes memory, it doesn't release (free) it back to the OS (there is no GC) unless you restart the process.
Then how is it possible that used_memory_peak is bigger than used_memory_rss?
And what is maxmemory?
maxmemory is the value of the corresponding configuration directive; it is well described in the INFO command documentation, along with the memory optimization tips.
Regarding Redis never releasing memory - what makes you think so? The documentation says something rather different:
Redis will not always free up (return) memory to the OS when keys are removed... This happens because the underlying allocator can't easily release the memory. For example often most of the removed keys were allocated in the same pages as the other keys that still exist.
"Not always" is not the same as "never" :) Whether it releases memory or not strongly depends on what allocator is being used and what data access patterns you have.
For example, there is the MEMORY PURGE command (it works only with jemalloc) that can trigger some memory to be released to the OS:
127.0.0.1:6379> info memory
# Memory
used_memory:1312328
used_memory_human:1.25M
used_memory_rss:7118848
used_memory_rss_human:6.79M
...
127.0.0.1:6379> memory purge
OK
127.0.0.1:6379> info memory
# Memory
used_memory:1312328
used_memory_human:1.25M
used_memory_rss:6041600
used_memory_rss_human:5.76M
...
(note that used_memory_rss has slightly decreased - this means it can go below peak usage under certain lucky circumstances)

AWS server became slow after traffic increase

I have a single page Angular app that makes requests to a Rails API service. Both are running on a t2xlarge Ubuntu instance. I am using a Postgres database.
We had an increase in traffic, and my Rails API became slow. Sometimes I get an error saying the Passenger queue is full for the Rails application.
Auto scaling on the server is working; three more instances are created. But I cannot trace this issue. I need root access to upgrade, which I do not have. Please help me with this.
You mentioned that you are using the t2.2xlarge instance type. First, I want to point out that you should not use the T2 instance family for a production environment, because T2 instances use CPU credits. Let's take a look at this:
What happens if I use all of my credits?
If your instance uses all of its CPU credit balance, performance remains at the baseline performance level. If your instance is running low on credits, your instance's CPU credit consumption (and therefore CPU performance) is gradually lowered to the base performance level over a 15-minute interval, so you will not experience a sharp performance drop-off when your CPU credits are depleted. If your instance consistently uses all of its CPU credit balance, we recommend a larger T2 size or a fixed performance instance type such as M3 or C3.
I'm not sure whether you will actually run out of CPU credits, since you are using an xlarge size, but I think you should consider a fixed-performance instance type instead. The instance's performance may be one part of your problem. You should use CloudWatch to monitor two metrics, CPUCreditUsage and CPUCreditBalance, to confirm whether this is the issue.
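As a sketch, you could pull those metrics with the AWS CLI (the instance ID and time range below are placeholders):

aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUCreditBalance \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --start-time 2018-01-01T00:00:00Z \
    --end-time 2018-01-02T00:00:00Z \
    --period 300 \
    --statistics Average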
Secondly, how about your ASG? After the scale-out, did your service become stable? If so, I don't think you need to worry about this any more, because the ASG did its job.
Please check the following
If you are opening a connection to Database, make sure you close it.
If you are using jQuery, Bootstrap, DataTables, or other CSS/JS libraries, use CDN links like
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap-select/1.12.4/css/bootstrap-select.min.css">
This will take a significant amount of load off your server. Do not host copies of jQuery or other external libraries on your own server when you can fetch them directly from a CDN.
There are a number of factors that can cause an EC2 instance (or any system) to appear to run slowly.
CPU Usage. The higher the CPU usage, the longer it takes to process new threads and processes.
Free Memory. Your system needs free memory to process threads, create new processes, etc. How much free memory do you have?
Free Disk Space. Operating systems tend to thrash when the file systems on system drives run low on free disk space. How much free disk space do you have?
Network Bandwidth. What is the average bytes in / out for your instance?
Database. Monitor connections, free memory, disk bandwidth, etc.
Amazon has CloudWatch, which can provide you with monitoring for everything except free disk space (you can add an agent to your instance for that metric). This will also help you quickly see what is happening with your instances.
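For reference, a minimal CloudWatch agent configuration that adds memory and disk usage metrics might look like the sketch below; treat it as a starting point rather than a complete config:

{
  "metrics": {
    "metrics_collected": {
      "mem":  { "measurement": ["mem_used_percent"] },
      "disk": { "measurement": ["used_percent"], "resources": ["/"] }
    }
  }
}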
Monitor your EC2 instances and your database.
You mention T2 instances. These have burstable CPUs, which means that if you have consistently high CPU usage, you will want to switch to fixed-performance EC2 instances. CloudWatch should help you figure out what you need (CPU, memory, disk, or network performance).
This is largely independent of the AWS server. It looks like your software needs more resources (RAM, storage I/O, network) than a single machine can provide. You need to evaluate the metrics using CloudWatch and adjust the resources based on what the software actually requires.
Memory leaks or processing leaks could also lead to this. You may need to create a cluster or server farm to handle the load.
Hope it helps.

EC2 CloudWatch memory metrics don't match what Top shows

I have a t2.micro EC2 instance, running at about 2% CPU. I know from other posts that the CPU usage shown in top is different from the CPU usage reported in CloudWatch, and that the CloudWatch value should be trusted.
However, I'm seeing very different values for memory usage between top, CloudWatch, and New Relic.
There's 1GB of RAM on the instance, and top shows ~300MB of Apache processes plus ~100MB of other processes. The overall memory usage reported by top is 800MB. I guess there's 400MB of OS/system overhead?
However, CloudWatch reports 700MB of usage, and New Relic reports 200MB of usage (even though New Relic reports 300MB of Apache processes elsewhere, so I'm ignoring that figure).
The CloudWatch memory metric often goes over 80%, and I'd like to know what the actual value is, so I know when to scale if necessary, or how to reduce memory usage.
Here's the recent memory profile; it seems something is using more memory over time (the big dips are either Apache restarts, or perhaps GC?):
Screenshot of memory usage over last 12 days
AWS doesn't provide memory metrics for EC2 instances. Because Amazon does all of its monitoring from outside the EC2 instance, it cannot capture memory metrics from inside the instance. But for complete monitoring of an instance you really want memory utilisation statistics along with CPU utilisation and network I/O.
However, you can use the custom metrics feature of CloudWatch to send app-level data to CloudWatch and monitor it using Amazon's tools.
You can follow this blog for more details: http://upaang-saxena.strikingly.com/blog/adding-ec2-memory-metrics-to-aws-cloudwatch
You can set up a cron job with a 5-minute interval on that instance, and all the data points can then be seen in CloudWatch.
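For example, a small script run from cron every five minutes could compute memory utilisation from free and push it as a custom metric (the script name, namespace and metric name here are made up for illustration):

#!/bin/bash
# push-mem-metric.sh - hypothetical example of publishing memory usage to CloudWatch
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
MEM_USED_PCT=$(free | awk '/^Mem:/ {print int($3/$2*100)}')

aws cloudwatch put-metric-data \
    --namespace "Custom/EC2" \
    --metric-name MemoryUtilization \
    --dimensions InstanceId="$INSTANCE_ID" \
    --unit Percent \
    --value "$MEM_USED_PCT"

# crontab entry to run it every 5 minutes:
# */5 * * * * /usr/local/bin/push-mem-metric.sh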
CloudWatch doesn't actually provide metrics regarding memory usage for EC2 instances; you can confirm this here.
As a result, the MemoryUtilization metric that you are referring to is evidently a custom metric being pushed by something you have configured or by some application running on your instance.
So you need to determine what is actually pushing the data for this metric. That data source is apparently pushing the wrong thing, or is unreliable.
The behavior you are seeing is not a CloudWatch problem.

Choosing redis maxmemory size and BGSAVE memory usage

I am trying to find out what a safe setting for 'maxmemory' would be in the following situation:
write-heavy application
8GB RAM
let's assume other processes take up about 1GB
this means that the redis process' memory usage may never exceed 7GB
memory usage doubles on every BGSAVE event, because:
In the redis docs the following is said about the memory usage increasing on BGSAVE events:
If you are using Redis in a very write-heavy application, while saving an RDB file on disk or rewriting the AOF log Redis may use up to 2 times the memory normally used.
the maxmemory limit is roughly compared to 'used_memory' from redis-cli INFO (as is explained here) and does not take other memory used by redis into account
Am I correct that this means that the maxmemory setting should, in this situation, be set no higher than (8GB - 1GB) / 2 = 3.5GB?
If so, I will create a pull request for the redis docs to reflect this more clearly.
In this case I would recommend a limit of 3GB. Yes, the docs are pretty much correct: running a BGSAVE can, for a short time, double the memory requirements. However, I prefer to reserve 2GB of memory for the system, or, for a persisting master, to cap maxmemory at 40% of total memory.
You indicate you have a very write-heavy application. In this case I would highly recommend having a second server do the save operations. I've found that during heavy writes plus a BGSAVE, the response time to the client(s) can get high. It isn't Redis per se causing it, but the responsiveness of the server itself. This is especially true for virtual machines. Under this setup you would use the second server to replicate from the primary and save to disk, while the first remains responsive.
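A sketch of that layout in configuration terms (hostnames and the 3gb value are placeholders; on older Redis versions the directive is slaveof rather than replicaof):

# master (redis.conf) - serves traffic, does not persist to disk
maxmemory 3gb
save ""

# replica (redis.conf) - replicates from the master and handles persistence
replicaof master-host 6379
save 900 1
save 300 10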

Using the web console, is it possible to change the Membase memory quota?

Let's say I have a Membase server with a memory quota of 1GB. Is it possible to change it and make it larger, say 8GB, assuming the server gets a hardware upgrade?
Currently, I have the impression that Membase memory quota is unchangeable once set, which is very frustrating.
For 1.6.5 at least, you can change it from the web console: under Manage | Data Buckets, you can reset your RAM quota.
But for a cluster it won't be that easy, because the same memory quota is required across all cluster nodes.
