How to economize on memory use with the Xmx JVM option

How do I determine the lower bound for the JVM option Xmx or otherwise economize on memory without a trial and error process? I happen to set Xms and Xmx to be the same amount, which I assume helps to economize on execution time. If I set Xmx to 7G, and likewise Xms, it will happily report that all of it is being used. I use the following query:
Runtime.getRuntime().totalMemory()
If I set it to less than that, say 5 GB, likewise all of it will be used. Only when I provide much less, say 1 GB, is there an OutOfMemoryError. Since my execution times are typically 10 hours or more, I need to avoid trial and error.

I'd execute the program with plenty of heap while monitoring heap usage with JConsole. Take note of the highest memory use after a major garbage collection, and set the maximum heap size about 50% to 100% higher than that amount to avoid frequent garbage collection.
As an aside, totalMemory reports the size of the heap, not how much of it is presently used (that is totalMemory() - freeMemory()). If you set the minimum and maximum heap size to the same number, totalMemory will be the same irrespective of what your program does.
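The 50% to 100% rule of thumb is plain arithmetic on one number, the peak heap usage right after a major GC. A minimal sketch in Go follows; the 3000 MB live-set figure is a hypothetical JConsole reading, not a measured value, and `heapRange` is an illustrative helper, not part of any JVM tool.

```go
package main

import "fmt"

// heapRange suggests a max-heap range 50% to 100% above the observed
// live set (peak heap usage right after a major GC), per the rule of thumb.
func heapRange(liveSetMB int) (lowMB, highMB int) {
	return liveSetMB * 3 / 2, liveSetMB * 2
}

func main() {
	live := 3000 // hypothetical post-GC peak in MB, read off JConsole
	lo, hi := heapRange(live)
	fmt.Printf("try -Xmx between %dm and %dm\n", lo, hi)
}
```

Starting near the low end economizes on memory; the high end buys fewer collections.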

Using Xms256M and Xmx512M and a trivial program, freeMemory is 244M, totalMemory is 245M, and maxMemory is 455M. Using Xms512M and Xmx512M, the amounts are 488M, 490M, and 490M. This suggests that totalMemory can grow when Xms is less than Xmx. That in turn suggests the answer to the question: set Xms to a small amount and monitor the high-water mark of totalMemory. It also suggests that maxMemory is the ultimate heap size, which the total of current and future objects cannot exceed.
Once the high-water mark is known, set Xmx somewhat higher than that to be prudent (but not excessively higher, because this is an economization effort), and set Xms to the same amount to get the time efficiency that is evidently preferred.
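Tracking the high-water mark amounts to keeping a running maximum over periodic readings. The sketch below is written in Go for consistency with the other sketches here; in Java the readings would come from `Runtime.getRuntime().totalMemory()` and `freeMemory()`. The sample values are hypothetical, and `sample`/`highwater` are illustrative names.

```go
package main

import "fmt"

// sample is one reading of totalMemory() and freeMemory(), in MB.
type sample struct{ totalMB, freeMB int }

// highwater returns the largest committed heap (totalMemory) and the
// largest used heap (totalMemory - freeMemory) seen across samples.
func highwater(samples []sample) (maxTotal, maxUsed int) {
	for _, s := range samples {
		if s.totalMB > maxTotal {
			maxTotal = s.totalMB
		}
		if used := s.totalMB - s.freeMB; used > maxUsed {
			maxUsed = used
		}
	}
	return maxTotal, maxUsed
}

func main() {
	// Hypothetical readings from one run started with a small Xms.
	readings := []sample{{245, 244}, {700, 310}, {1200, 450}, {1100, 900}}
	total, used := highwater(readings)
	fmt.Printf("committed high-water: %d MB, used high-water: %d MB\n", total, used)
}
```

Readings of used memory include garbage that has not been collected yet, so peaks taken right after a major GC (as the first answer suggests) give the tighter bound.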

Related

Why does Prometheus consume so much memory?

I'm using Prometheus 2.9.2 for monitoring a large environment of nodes.
As part of testing the maximum scale of Prometheus in our environment, I simulated a large amount of metrics on our test environment.
My management server has 16GB ram and 100GB disk space.
During the scale testing, I've noticed that the Prometheus process consumes more and more memory until the process crashes.
I've noticed that the WAL directory is getting filled fast with a lot of data files while the memory usage of Prometheus rises.
The management server scrapes its nodes every 15 seconds and the storage parameters are all set to default.
I would like to know why this happens, and how/if it is possible to prevent the process from crashing.
Thank you!
The out-of-memory crash is usually the result of an excessively heavy query. This may be set in one of your rules (the query may even be coming from a Grafana dashboard rather than from Prometheus itself).
If you have a very large number of metrics, it is possible the rule is querying all of them. A quick fix is to specify exactly which metrics to query, using specific labels instead of regex matchers.
This article explains why Prometheus may use large amounts of memory during data ingestion. If you need to reduce memory usage for Prometheus, the following actions can help:
Increasing scrape_interval in Prometheus configs.
Reducing the number of scrape targets and/or scraped metrics per target.
P.S. Take a look also at the project I work on - VictoriaMetrics. It can use lower amounts of memory compared to Prometheus. See this benchmark for details.
Because the combination of labels depends on your business, the number of combinations and blocks may be unlimited, and there's no way to solve the memory problem with the current design of Prometheus! But I suggest you compact small blocks into big ones, which will reduce the number of blocks.
Memory consumption is huge for TWO reasons:
The Prometheus TSDB has an in-memory block named the "head". Because the head stores all the series from the latest hours, it eats a lot of memory.
Each block on disk also eats memory, because each block on disk has an index reader in memory. Dismayingly, all labels, postings, and symbols of a block are cached in the index reader struct, so the more blocks on disk, the more memory is occupied.
In index/index.go, you will see:
type Reader struct {
	b ByteSlice
	// Close that releases the underlying resources of the byte slice.
	c io.Closer
	// Cached hashmaps of section offsets.
	labels map[string]uint64
	// LabelName to LabelValue to offset map.
	postings map[string]map[string]uint64
	// Cache of read symbols. Strings that are returned when reading from the
	// block are always backed by true strings held in here rather than
	// strings that are backed by byte slices from the mmap'd index file. This
	// prevents memory faults when applications work with read symbols after
	// the block has been unmapped. The older format has sparse indexes so a map
	// must be used, but the new format is not so we can use a slice.
	symbolsV1 map[uint32]string
	symbolsV2 []string
	symbolsTableSize uint64
	dec *Decoder
	version int
}
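The comment about copying symbols out of the mmap'd file rests on a Go property: converting a `[]byte` to a `string` copies the bytes. The minimal illustration below is not Prometheus code; the buffer and `symbolFromBuffer` are hypothetical, and zeroing the slice stands in for the file being unmapped. It also hints at the cost: every cached symbol is a separate heap copy, repeated for each block.

```go
package main

import "fmt"

// symbolFromBuffer copies bytes out of buf into a new string. string(b)
// allocates and copies, so the result stays valid even if buf is later
// overwritten or its backing memory goes away.
func symbolFromBuffer(buf []byte, start, end int) string {
	return string(buf[start:end])
}

func main() {
	buf := []byte("job=node") // hypothetical stand-in for an mmap'd index section
	sym := symbolFromBuffer(buf, 0, 3)

	// Simulate the backing memory going away by clobbering it.
	for i := range buf {
		buf[i] = 0
	}
	fmt.Println(sym) // the copied string is unaffected
}
```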
We moved to Prometheus 2.19 and saw significantly better memory performance. This blog post highlights how that release tackles memory problems. I strongly recommend upgrading to improve your instance's resource consumption.

Golang. Zero Garbage propagation or efficient use of memory

From time to time I come across concepts like zero garbage or efficient use of memory. For example, in the Features section of the well-known httprouter package you can see the following:
Zero Garbage: The matching and dispatching process generates zero bytes of garbage. In fact, the only heap allocations that are made, is by building the slice of the key-value pairs for path parameters. If the request path contains no parameters, not a single heap allocation is necessary.
This package also shows very good benchmark results compared to the standard library's http.ServeMux:
BenchmarkHttpServeMux     5000   706222 ns/op   96 B/op   6 allocs/op
BenchmarkHttpRouter     100000    15010 ns/op    0 B/op   0 allocs/op
As far as I understand, the second one (per the table) makes no heap allocations: zero bytes and zero allocations on average per iteration.
The question: I want a basic understanding of memory management. When does the garbage collector allocate/deallocate memory? What do the benchmark numbers (the last two columns of the table) mean, and how do people know when the heap is being allocated?
I'm completely new to memory management, so it's really difficult to understand what's going on "under the hood". The articles I've read:
https://golang.org/ref/mem
https://golang.org/doc/effective_go.html
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
The garbage collector doesn't allocate memory :-), it just deallocates. Go's garbage collector is evolving, for the details have a look at the design document https://docs.google.com/document/d/16Y4IsnNRCN43Mx0NZc5YXZLovrHvvLhK_h0KN8woTO4/preview?sle=true and follow the discussion on the golang mailing lists.
The last two columns in the benchmark output are dead simple: How many bytes have been allocated in total and how many allocations have happened during one iteration of the benchmark code. (This allocation is done by your code, not by the garbage collector). As any allocation is a potential creation of garbage reducing these numbers might be a design goal.
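Those two columns can be approximated outside `go test` by reading `runtime.MemStats` before and after a loop: `TotalAlloc` gives bytes and `Mallocs` gives allocation count, each divided by the iteration count. To my understanding this mirrors what `-benchmem` does, but the sketch below is an approximation; `perOp` and `sink` are illustrative names.

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // package-level sink forces the allocation onto the heap

// perOp runs f n times and returns the average heap bytes and number of
// allocations per call, from runtime.MemStats deltas.
func perOp(n int, f func()) (bytesPerOp, allocsPerOp uint64) {
	f() // warm up so one-time setup is not counted
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		f()
	}
	runtime.ReadMemStats(&after)
	return (after.TotalAlloc - before.TotalAlloc) / uint64(n),
		(after.Mallocs - before.Mallocs) / uint64(n)
}

func main() {
	b, a := perOp(10000, func() { sink = make([]byte, 64) })
	fmt.Printf("%d B/op\t%d allocs/op\n", b, a)
}
```

A handler that, like httprouter's matcher, touches no heap would report 0 for both.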
When are things allocated on the heap? Whenever the Go compiler decides to! The compiler tries to allocate on the stack, but sometimes it must use the heap, especially if a value escapes from the local stack-based scopes. This escape analysis is currently undergoing rework, so it is not easy to tell which value will be heap- or stack-allocated, especially as this changes from compiler version to version.
I wouldn't be too obsessed with avoiding allocations until your benchmarks show too much GC overhead.
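A small experiment makes the escape rule visible. `testing.AllocsPerRun` reports average allocations per call; below, a pointer stored in a package-level variable escapes, while a local struct that never leaves its function can stay on the stack. The types and function names are hypothetical; `go build -gcflags=-m` prints the compiler's escape decisions, though the exact wording varies between compiler versions.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

var sinkPtr *point // storing a pointer here makes the value escape
var total int

// newPoint returns a pointer; because the caller stores it in a
// package-level variable, the point must live on the heap.
func newPoint() *point { return &point{x: 1, y: 2} }

// sumLocal keeps its point in a local variable that never leaves the
// function, so the compiler can keep it on the stack.
func sumLocal() int {
	p := point{x: 1, y: 2}
	return p.x + p.y
}

func main() {
	heap := testing.AllocsPerRun(1000, func() { sinkPtr = newPoint() })
	stack := testing.AllocsPerRun(1000, func() { total += sumLocal() })
	fmt.Println("escaping:", heap, "allocs/run; non-escaping:", stack, "allocs/run")
}
```

On a typical build this should show one allocation for the escaping case and none for the local one.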

Maximum memory allocation on openCL CPU

I have read that the maximum memory allocation is limited to around 60% of device memory, and that this can be changed by modifying the GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_SIZE environment variables for the GPU.
I am wondering whether the AMD SDK has something similar for the CPU, if I want to raise the memory allocation limit.
For my current configuration, it returns the following:
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 2973.37MB
CL_DEVICE_GLOBAL_MEM_SIZE = 11893.5MB
Thanks.
I was able to change this on my system. I don't know if this method was possible when you originally asked the question.
Set the environment variable 'CPU_MAX_ALLOC_PERCENT' to the percentage of total memory you want to be able to allocate for a single global buffer. I have 8 GB of system memory, and after setting CPU_MAX_ALLOC_PERCENT to 80, clinfo reports the following:
Max memory allocation: 6871207116
Success! 6.399GB
You can also use GPU_MAX_ALLOC_PERCENT in the same way for your GPU devices.
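The numbers in this thread fit a straight percentage: 2973.37 MB is almost exactly 25% of 11893.5 MB, and 6871207116 bytes is 80% of just under 8 GiB. The Go sketch below checks that arithmetic. It assumes, as an inference from these figures rather than from AMD documentation, that the limit is a simple percentage of total memory; `impliedPercent` is an illustrative helper.

```go
package main

import "fmt"

// impliedPercent returns maxAlloc as a percentage of globalMem.
func impliedPercent(maxAlloc, globalMem float64) float64 {
	return maxAlloc / globalMem * 100
}

func main() {
	// Values from the question, in MB.
	fmt.Printf("default CPU limit: %.2f%% of global memory\n", impliedPercent(2973.37, 11893.5))

	// With CPU_MAX_ALLOC_PERCENT=80, clinfo reported this many bytes.
	const reported = 6871207116.0
	fmt.Printf("reported limit: %.3f GiB; implied total: %.3f GiB\n",
		reported/(1<<30), reported/0.80/(1<<30))
}
```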

memcached memory consumption

I set up memcached on 3 machines, each allocated 4 GB. I have 200M items in total, so on average each item has about 60 bytes available (12 GB / 200M). The hash key is a 10-character string and the value is a boolean. The memory looks more than enough. However, I still see the "evictions" count increasing. How much memory does memcached actually consume?
Memcached allocates storage in slabs of fixed-size chunks, and may therefore waste a lot of memory. If you have very small objects, it will select the smallest chunk size that is larger than your object, but depending on configuration that may be quite large.
I think you can configure this to some extent, and it may be worthwhile testing with different values for the -n parameter to see if this makes a difference.
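To see why evictions can start even though 12 GB looks ample, here is a rough Go model of slab-class selection. The starting chunk size (96 bytes), growth factor (1.25; memcached's -f option sets it), 8-byte alignment, and 70-byte item size are assumptions for illustration, not values from the poster's servers; `memcached -vv` at startup or the `stats slabs` command shows the real classes.

```go
package main

import "fmt"

// slabClasses builds chunk sizes starting at minChunk, growing by factor,
// and rounded up to 8 bytes, up to maxChunk.
func slabClasses(minChunk int, factor float64, maxChunk int) []int {
	var sizes []int
	for size := minChunk; size <= maxChunk; {
		sizes = append(sizes, size)
		next := int(float64(size) * factor)
		if rem := next % 8; rem != 0 {
			next += 8 - rem // align to 8 bytes
		}
		if next <= size {
			next = size + 8 // guard against factors too small to grow
		}
		size = next
	}
	return sizes
}

// chunkFor returns the smallest chunk that fits itemSize, or -1 if none does.
func chunkFor(classes []int, itemSize int) int {
	for _, c := range classes {
		if c >= itemSize {
			return c
		}
	}
	return -1
}

func main() {
	classes := slabClasses(96, 1.25, 1024)
	const itemSize = 70 // hypothetical: item header + 10-byte key + tiny value
	chunk := chunkFor(classes, itemSize)

	const totalMem = 3 * 4e9 // 3 machines x 4 GB, as in the question
	const items = 200e6
	fmt.Println("classes:", classes)
	fmt.Printf("item %d B -> chunk %d B (%d B wasted)\n", itemSize, chunk, chunk-itemSize)
	fmt.Printf("budget per item: %.0f B; needed: %.1f GB\n", totalMem/items, float64(chunk)*items/1e9)
}
```

Under these assumptions each item occupies a 96-byte chunk against a 60-byte budget, so 200M items would need roughly 19 GB, and evictions follow.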

GC and memory limit issues with R

I am using R on some relatively big data and am hitting some memory issues. This is on Linux. I have significantly less data than the available memory on the system so it's an issue of managing transient allocation.
When I run gc(), I get the following listing
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   2147186  114.7    3215540  171.8   2945794  157.4
Vcells 251427223 1918.3  592488509 4520.4 592482377 4520.3
yet R appears to have 4 GB allocated in resident memory and 2 GB in swap. I'm assuming this is OS-allocated memory that R's memory management system will allocate and GC as needed. However, let's say that I don't want to let R OS-allocate more than 4 GB, to prevent swap thrashing. I could always ulimit, but then it would just crash instead of working within the reduced space and GCing more often. Is there a way to specify an arbitrary maximum for the gc trigger and make sure that R never OS-allocates more? Or is there something else I could do to manage memory usage?
In short: no. I found that you simply cannot micromanage memory management and gc().
On the other hand, you could try to keep your data in memory, but 'outside' of R. The bigmemory package makes that fairly easy. Of course, using a 64-bit version of R and ample RAM may make the problem go away too.
