We have an Erlang application that crashed due to a memory issue:
Thu Jun 9 13:22:19 202
Slogan: eheap_alloc: Cannot allocate 3936326656 bytes of memory (of type "heap").
System version: Erlang/OTP 23 [erts-11.1.1] [source] [64-bit] [smp:48:48] [ds:48:48:10] [async-threads:1] [hipe]
Compiled: Mon Oct 12 07:40:16 2020
The =memory section of the crash dump says:
=memory
total: 126190588240
processes: 99882909648
processes_used: 99879758664
system: 26307678592
atom: 6308969
atom_used: 6281592
binary: 221919896
But when I total the per-process memory in the =proc section, it comes to 24660309644 bytes, which is far less than the processes_used value above (99879758664 bytes).
Can anyone help me understand why, and where this difference could be?
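A minimal sketch of how one might total the per-process memory from the dump, assuming a crash dump format where each =proc:<pid> section carries a "Memory:" line (the module and function names here are made up for illustration):

-module(sum_proc_memory).
-export([run/1]).

%% Sum the "Memory:" lines that appear inside =proc sections of a crash dump,
%% for comparison against the processes_used figure in the =memory section.
run(DumpFile) ->
    {ok, Bin} = file:read_file(DumpFile),
    Lines = binary:split(Bin, <<"\n">>, [global]),
    io:format("total =proc memory: ~p bytes~n", [sum(Lines, false, 0)]).

%% The boolean tracks whether we are inside a =proc:<pid> section;
%% any other "=" header closes the current section.
sum([], _InProc, Acc) ->
    Acc;
sum([<<"=proc:", _/binary>> | Rest], _InProc, Acc) ->
    sum(Rest, true, Acc);
sum([<<"=", _/binary>> | Rest], _InProc, Acc) ->
    sum(Rest, false, Acc);
sum([<<"Memory: ", N/binary>> | Rest], true, Acc) ->
    sum(Rest, true, Acc + binary_to_integer(string:trim(N)));
sum([_ | Rest], InProc, Acc) ->
    sum(Rest, InProc, Acc).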
I have CentOS 6.8, Cassandra 3.9, and 32 GB of RAM. Once Cassandra is started, it keeps consuming memory, and the 'Cached' value keeps growing as I query from CQLSH or Apache Spark, so very little memory remains for other work such as cron jobs.
Here are some details from my system:
free -m
             total       used       free     shared    buffers     cached
Mem:         32240      32003        237          0         41      24010
-/+ buffers/cache:       7950      24290
Swap:         2047         25       2022
And here is the output of the top -M command:
top - 08:54:39 up 5 days, 16:24, 4 users, load average: 1.22, 1.20, 1.29
Tasks: 205 total, 2 running, 203 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.5%us, 1.2%sy, 19.8%ni, 75.3%id, 0.1%wa, 0.1%hi, 0.0%si, 0.0%st
Mem: 31.485G total, 31.271G used, 219.410M free, 42.289M buffers
Swap: 2047.996M total, 25.867M used, 2022.129M free, 23.461G cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14313 cassandr 20 0 595g 28g 22g S 144.5 91.3 300:56.34 java
You can see only 220 MB is free and 23.46 GB is cached.
My question is: how do I configure Cassandra so that it only uses 'cached' memory up to a certain value and leaves more RAM available for other processes?
Thanks in advance.
In Linux, in general, cached memory like your 23 GB is perfectly fine. This memory is used as filesystem cache and so on; it is not held by Cassandra itself. Linux systems tend to use all available memory.
This speeds up your system in many ways by avoiding disk reads.
You can still use the cached memory: just start processes and use your RAM, and the kernel will free the cache immediately.
You can set the sizes in cassandra-env.sh under the conf folder. This article should help: http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
I have 4 GB of RAM in my system, and 2 GB of it was used before the insertion completed when using disc_copies. I was wondering what would happen if 100 percent of the RAM was consumed. Is there any option to limit the RAM consumed with disc_copies, such as capping usage at 2 GB?
If you are looking for a way to limit Erlang VM memory usage, you should use control groups. But if you want to monitor memory usage, you should use the memory monitor memsup from the os_mon application.
$ erl -boot start_sasl
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
...
=PROGRESS REPORT==== 22-Oct-2015::22:39:46 ===
application: sasl
started_at: nonode@nohost
Eshell V7.0 (abort with ^G)
1> application:start(os_mon).
...
=PROGRESS REPORT==== 22-Oct-2015::22:40:03 ===
application: os_mon
started_at: nonode@nohost
ok
2>
...
2> memsup:get_memory_data().
{8162500608,6514708480,{<0.7.0>,426616}}
3> memsup:get_system_memory_data().
[{system_total_memory,8162500608},
{free_swap,5996748800},
{total_swap,5997850624},
{cached_memory,3290759168},
{buffered_memory,444370944},
{free_memory,1647222784},
{total_memory,8162500608}]
4>
Read the os_mon documentation about usage and alarms.
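A minimal sketch of the alarm side, assuming you want the VM to warn rather than crash: set memsup's watermarks and inspect the alarms it raises (the 0.80 and 0.05 thresholds are illustrative only):

%% Start os_mon and whatever it depends on, then configure the watermarks.
{ok, _} = application:ensure_all_started(os_mon),
%% Raise the system_memory_high_watermark alarm when more than 80% of memory is allocated.
memsup:set_sysmem_high_watermark(0.80),
%% Raise the process_memory_high_watermark alarm when one process uses more than 5% of it.
memsup:set_procmem_high_watermark(0.05),
%% List the alarms currently set (memsup reports through alarm_handler).
alarm_handler:get_alarms().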
My server keeps getting filled up with ejabberd crash logs every few hours. It seems the ejabberd server keeps crashing, and the crash logs fill the server's free space until nothing is left (GBs of crash logs). The crash logs start with something like this:
=erl_crash_dump:0.1
Tue Feb 4 23:44:02 2014
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel, {shutdown,{kernel,start,[normal,[]]}}})
System version: Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
Compiled: Fri Dec 16 03:22:15 2011
Taints:
Atoms: 4574
Can anyone see something from the crash log and let me know what's happening?
In this case, the crash dump is unlikely to tell you very much - it tells you that the kernel application shut down ({shutdown,{kernel,start,[normal,[]]}}), but it doesn't say why. In the error log you should find a number of crash reports and error messages that led to the node shutting down.
Crash dumps are much more useful if the node crashes because it runs out of memory. In that case, you can usually see which process behaved badly.
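For the out-of-memory case, one way to dig into the dump (a sketch, assuming an OTP release whose observer application ships crashdump_viewer):

%% Opens the crash dump viewer; point it at the erl_crash.dump file and
%% sort the process list by memory to find the process that behaved badly.
crashdump_viewer:start().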
I get this message when launching RabbitMQ:
=WARNING REPORT==== 8-Feb-2014::10:43:42 ===
Only 2048MB of 23482MB memory usable due to limited address space.
Crashes due to memory exhaustion are possible - see
http://www.rabbitmq.com/memory.html#address-space
When I follow that link, I read about how I should be using a 64-bit Erlang VM. But:
ajax:~ maxvitek$ erl
Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Eshell V5.10.4 (abort with ^G)
1>
...which certainly appears to be a 64-bit build. This is with vm_memory_high_watermark set to 1. If I can get rid of the address-space problem so that RabbitMQ can use more of the system's memory, I will set that back to 0.4. Any idea where to look to fix this?
Both Erlang and RabbitMQ are installed via Homebrew, running on Mavericks.
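As an aside (not from the original post), one quick way to confirm that the emulator in the shell above really is 64-bit is to check its word size, which is 8 bytes on a 64-bit build:

%% Returns 8 on a 64-bit emulator, 4 on a 32-bit one.
erlang:system_info(wordsize).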
I am not able to understand why Erlang crashes and restarts. I am running an ejabberd server, and its log folder is always full of erl_crash_xxxx.dump files. How can I debug this problem?
Here is a small part of the erlang.log file:
=CRASH REPORT==== 4-Sep-2013::19:44:51 ===
crasher:
initial call: ejabberd_http:init/2
pid: <0.15614.15>
registered_name: []
exception exit: {normal,
{gen_fsm,sync_send_all_state_event,
[<0.15454.15>,
{http_put,2020093061,
[{"xmlns",
"http://jabber.org/protocol/httpbind"},
{"rid","2020093061"},
{"sid",
"26820e4cd7d331de864b857d1ef3351caf7dbac5"}],
[],115,1,[],
{{49,205,148,16},56132}},
30000]}}
in function gen_fsm:sync_send_all_state_event/3
in call from ejabberd_http_bind:http_put/7
in call from ejabberd_http_bind:handle_http_put/7
in call from ejabberd_http:process/2
in call from ejabberd_http:process_request/1
in call from ejabberd_http:process_header/2
in call from ejabberd_http:receive_headers/1
ancestors: [ejabberd_http_sup,ejabberd_sup,<0.37.0>]
messages: []
links: [<0.274.0>,#Port<0.1519795>]
dictionary: []
trap_exit: false
status: running
heap_size: 2584
stack_size: 24
reductions: 1082
neighbours:
These are the top few lines of a typical crash dump file:
=erl_crash_dump:0.1
Tue Sep 3 16:31:47 2013
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
System version: Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1] [async-threads:0] [hipe] [kernel-poll:false]
Compiled: Wed Oct 5 17:25:18 2011
Taints:
Atoms: 4699
=memory
total: 21498768
processes: 556368
processes_used: 541208
system: 20942400
atom: 322177
atom_used: 302233
binary: 18216
code: 2165726
ets: 53736
=hash_table:atom_tab
size: 3203
used: 2471
Those 'crashes' are probably totally unrelated. What you see as '=CRASH REPORT=' in your log files are more or less 'normal', expected crashes, and as such they are handled by a supervisor. The one you posted is a crash inside a handler for an HTTP call while creating or sending the response, which ended up in some unforeseen condition (perhaps the sending process no longer existed - not sure about that though). Erlang is built and designed to handle such crashes gracefully. In other words, your ejabberd is still running and serving requests happily.
The other one is a crash dump, which results from a real crash of the Erlang runtime. This might happen if, for example, the machine hosting the runtime runs out of memory. In your case it seems like a configuration error: the node could not boot correctly at all. See http://www.ejabberd.im/node/872.