Understanding Erlang crash logs generated by ejabberd

I am not able to understand why Erlang crashes and restarts. I am running an ejabberd server, and its log folder is always full of erl_crash_xxxx.dump files. How can I debug this problem?
Here is a small part of the erlang.log file:
=CRASH REPORT==== 4-Sep-2013::19:44:51 ===
crasher:
initial call: ejabberd_http:init/2
pid: <0.15614.15>
registered_name: []
exception exit: {normal,
{gen_fsm,sync_send_all_state_event,
[<0.15454.15>,
{http_put,2020093061,
[{"xmlns",
"http://jabber.org/protocol/httpbind"},
{"rid","2020093061"},
{"sid",
"26820e4cd7d331de864b857d1ef3351caf7dbac5"}],
[],115,1,[],
{{49,205,148,16},56132}},
30000]}}
in function gen_fsm:sync_send_all_state_event/3
in call from ejabberd_http_bind:http_put/7
in call from ejabberd_http_bind:handle_http_put/7
in call from ejabberd_http:process/2
in call from ejabberd_http:process_request/1
in call from ejabberd_http:process_header/2
in call from ejabberd_http:receive_headers/1
ancestors: [ejabberd_http_sup,ejabberd_sup,<0.37.0>]
messages: []
links: [<0.274.0>,#Port<0.1519795>]
dictionary: []
trap_exit: false
status: running
heap_size: 2584
stack_size: 24
reductions: 1082
neighbours:
These are the top few lines of a typical crash dump file:
=erl_crash_dump:0.1
Tue Sep 3 16:31:47 2013
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
System version: Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1] [async-threads:0] [hipe] [kernel-poll:false]
Compiled: Wed Oct 5 17:25:18 2011
Taints:
Atoms: 4699
=memory
total: 21498768
processes: 556368
processes_used: 541208
system: 20942400
atom: 322177
atom_used: 302233
binary: 18216
code: 2165726
ets: 53736
=hash_table:atom_tab
size: 3203
used: 2471

Those 'crashes' are probably totally unrelated. What you see as '=CRASH REPORT=' entries in your log files are 'normal', expected crashes, and as such are handled by a supervisor. The one you posted is a crash inside a handler for an HTTP call while creating or sending a response, which ran into some unforeseen condition (such as the receiving process no longer existing - not sure about that, though). Erlang is built and designed to handle such crashes gracefully. In other words, your ejabberd is still running and serving requests happily.
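To make that concrete, here is a minimal supervisor sketch (module names are hypothetical, not taken from ejabberd): any crash in the worker below produces exactly this kind of '=CRASH REPORT=' entry, after which the supervisor restarts the worker and the node carries on serving requests.

-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: restart only the child that crashed, allowing at
    %% most 5 restarts within 10 seconds before giving up entirely
    {ok, {{one_for_one, 5, 10},
          [{demo_worker,
            {demo_worker, start_link, []},   % hypothetical worker module
            permanent, 5000, worker, [demo_worker]}]}}.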
The erl_crash_xxxx.dump files are a different matter: a crash dump results from a real crash of the Erlang runtime. This can happen if, for example, the machine hosting the runtime runs out of memory. In your case it looks like a configuration error; the node could not boot correctly at all. See http://www.ejabberd.im/node/872.
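If you want to inspect a dump without reading it raw, recent Erlang/OTP releases ship a graphical Crashdump Viewer with the observer application (older releases such as the R14 shown above expose it through webtool instead). From an Erlang shell, using the file naming from the question:

1> crashdump_viewer:start("erl_crash_xxxx.dump").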

Related

Erlang crash dump memory data inconsistency in processes

We have an Erlang application that crashed due to a memory issue:
Thu Jun 9 13:22:19 202
Slogan: eheap_alloc: Cannot allocate 3936326656 bytes of memory (of type "heap").
System version: Erlang/OTP 23 [erts-11.1.1] [source] [64-bit] [smp:48:48] [ds:48:48:10] [async-threads:1] [hipe]
Compiled: Mon Oct 12 07:40:16 2020
The =memory section of the crash dump says:
=memory
total: 126190588240
processes: 99882909648
processes_used: 99879758664
system: 26307678592
atom: 6308969
atom_used: 6281592
binary: 221919896
but when I total the processes' memory in the =proc sections, it comes to 24660309644 bytes, which is far less than the processes_used figure above, i.e. 99879758664 bytes.
Can anyone help me understand why, and where this difference could come from?
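If you want to redo that arithmetic yourself, here is a rough escript sketch that sums the per-process "Memory:" fields of a dump (assumptions: the OTP 23 dump format quoted above, with one such line per =proc section; the naive match could also catch similar lines outside =proc):

#!/usr/bin/env escript
%% sum_proc_memory - a sketch, not a polished tool
main([DumpFile]) ->
    {ok, Bin} = file:read_file(DumpFile),
    Lines = binary:split(Bin, <<"\n">>, [global]),
    Sum = lists:foldl(fun(<<"Memory: ", N/binary>>, Acc) ->
                              Acc + binary_to_integer(N);
                         (_, Acc) -> Acc
                      end, 0, Lines),
    io:format("sum of per-process Memory fields: ~B bytes~n", [Sum]).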

Erlang/OTP app using rebar returns crash dump when releasing

I followed the tutorial many times, created a small app using rebar, and released it successfully. But when I come to the main application (calendarApp), a crash dump is produced. I made some changes, but the result is the same. I would like to know what I am doing wrong and how I can fix it.
In the end, when I do start and attach, it returns "Node is not running!".
I will show you:
sonu@sonu-Ideapad-Z570:~/calendarApp/rel$ ./calendarApp/bin/calendarApp console
Exec: /home/sonu/release/calendarApp/rel/calendarApp/erts-5.10.4/bin/erlexec -boot /home/sonu/release/calendarApp/rel/calendarApp/releases/1/calendarApp -mode embedded -config /home/sonu/release/calendarApp/rel/calendarApp/releases/1/sys.config -args_file /home/sonu/release/calendarApp/rel/calendarApp/releases/1/vm.args -- console
Root: /home/sonu/release/calendarApp/rel/calendarApp
Erlang R16B03 (erts-5.10.4) [source] [smp:4:4] [async-threads:10] [kernel-poll:false]
{global,calendarApp_sup} (<0.216.0>) starting....
{global,calendarApp_server} (<0.217.0>) starting ......
Eshell V5.10.4 (abort with ^G)
(calendarApp@127.0.0.1)1>
=INFO REPORT==== 19-Jan-2016::09:50:45 ===
application: calendarApp
exited: {{shutdown,
{failed_to_start_child,calendarApp_server,
{undef,
[{mnesia,create_schema,
[['calendarApp@127.0.0.1']],
[]},
{calendarApp_db,init,0,
[{file,"src/calendarApp_db.erl"},{line,7}]},
{calendarApp_server,init,1,
[{file,"src/calendarApp_server.erl"},{line,49}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,304}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}}},
{calendarApp_app,start,[normal,[]]}}
type: permanent
{"Kernel pid terminated",application_controller," {application_start_failure,calendarApp,{{shutdown,{failed_to_start_child,calendarApp_server,{undef,[{mnesia,create_schema,[['calendarApp#127.0.0.1']],[]},{calendarApp_db,init,0,[{file,\"src/calendarApp_db.erl\"},{line,7}]},{calendarApp_server,init,1,[{file,\"src/calendarApp_server.erl\"},{line,49}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}},{calendarApp_app,start,[normal,[]]}}}"}
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,calendarApp,{{shutdown,{failed_to_start_child,calendarApp_server,{undef,[{mnesia,create_schema, [['calendarApp@127.0.0.1']],
sonu@sonu-Ideapad-Z570:~/calendarApp/rel$ ./calendarApp/bin/calendarApp console
If anyone knows what the problem is, please help me fix it.
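For context, the undef in the report above means the node could not load mnesia:create_schema/1 at runtime. Since the release boots in embedded mode (see the -mode embedded flag in the Exec line), only applications packaged into the release are loadable, so a likely cause is mnesia missing from the application list. A hedged sketch of the fix, assuming standard rebar conventions:

%% src/calendarApp.app.src (sketch) - listing mnesia here gets it
%% packaged into the release, so its modules can be loaded:
{application, calendarApp,
 [{description, "calendar application"},
  {vsn, "1"},
  {registered, []},
  {applications, [kernel, stdlib, mnesia]},
  {mod, {calendarApp_app, []}}]}.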

RabbitMQ (beam.smp) and high CPU/memory load issue

I have a Debian box that has been running tasks with Celery and RabbitMQ for about a year. Recently I noticed tasks were not being processed, so I logged into the system and noticed that Celery could not connect to RabbitMQ. I restarted rabbitmq-server, and even though Celery was no longer complaining, it was not executing new tasks. The odd thing was that RabbitMQ was devouring CPU and memory like crazy. Restarting the server did not solve the problem. After spending a couple of hours looking for a solution online to no avail, I decided to rebuild the server.
I rebuilt the server with Debian 7.5, RabbitMQ 2.8.4, and Celery 3.1.13 (Cipater). For an hour or so everything worked beautifully, until Celery started complaining again that it couldn't connect to RabbitMQ:
[2014-08-06 05:17:21,036: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@127.0.0.1:5672//: [Errno 111] Connection refused.
Trying again in 6.00 seconds...
I restarted RabbitMQ (service rabbitmq-server start) and hit the same issue again:
RabbitMQ started swelling up again, constantly pounding the CPU and slowly taking over all RAM and swap:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21823 rabbitmq 20 0 908m 488m 3900 S 731.2 49.4 9:44.74 beam.smp
Here's the output of rabbitmqctl status:
Status of node 'rabbit@li370-61' ...
[{pid,21823},
{running_applications,[{rabbit,"RabbitMQ","2.8.4"},
{os_mon,"CPO CXC 138 46","2.2.9"},
{sasl,"SASL CXC 138 11","2.2.1"},
{mnesia,"MNESIA CXC 138 12","4.7"},
{stdlib,"ERTS CXC 138 10","1.18.1"},
{kernel,"ERTS CXC 138 10","2.15.1"}]},
{os,{unix,linux}},
{erlang_version,"Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:30] [kernel-poll:true]\n"},
{memory,[{total,489341272},
{processes,462841967},
{processes_used,462685207},
{system,26499305},
{atom,504409},
{atom_used,473810},
{binary,98752},
{code,11874771},
{ets,6695040}]},
{vm_memory_high_watermark,0.3999999992280962},
{vm_memory_limit,414559436},
{disk_free_limit,1000000000},
{disk_free,48346546176},
{file_descriptors,[{total_limit,924},
{total_used,924},
{sockets_limit,829},
{sockets_used,3}]},
{processes,[{limit,1048576},{used,1354}]},
{run_queue,0},
Some entries from /var/log/rabbitmq:
=WARNING REPORT==== 8-Aug-2014::00:11:35 ===
Mnesia('rabbit@li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=WARNING REPORT==== 8-Aug-2014::00:11:35 ===
Mnesia('rabbit@li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=WARNING REPORT==== 8-Aug-2014::00:11:35 ===
Mnesia('rabbit@li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=WARNING REPORT==== 8-Aug-2014::00:11:35 ===
Mnesia('rabbit@li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=WARNING REPORT==== 8-Aug-2014::00:11:36 ===
Mnesia('rabbit@li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=INFO REPORT==== 8-Aug-2014::00:11:36 ===
vm_memory_high_watermark set. Memory used:422283840 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:11:36 ===
memory resource limit alarm set on node 'rabbit@li370-61'.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
=INFO REPORT==== 8-Aug-2014::00:11:43 ===
started TCP Listener on [::]:5672
=INFO REPORT==== 8-Aug-2014::00:11:44 ===
vm_memory_high_watermark clear. Memory used:290424384 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:11:44 ===
memory resource limit alarm cleared on node 'rabbit@li370-61'
=INFO REPORT==== 8-Aug-2014::00:11:59 ===
vm_memory_high_watermark set. Memory used:414584504 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:11:59 ===
memory resource limit alarm set on node 'rabbit@li370-61'.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
=INFO REPORT==== 8-Aug-2014::00:12:00 ===
vm_memory_high_watermark clear. Memory used:411143496 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:12:00 ===
memory resource limit alarm cleared on node 'rabbit@li370-61'
=INFO REPORT==== 8-Aug-2014::00:12:01 ===
vm_memory_high_watermark set. Memory used:415563120 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:12:01 ===
memory resource limit alarm set on node 'rabbit@li370-61'.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
=INFO REPORT==== 8-Aug-2014::00:12:07 ===
Server startup complete; 0 plugins started.
=ERROR REPORT==== 8-Aug-2014::00:15:32 ===
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"/var/lib/rabbitmq/mnesia/rabbit@li370-61",
50000000,46946492416,100,10000,
#Ref<0.0.1.79456>,false}
** Reason for termination ==
** {unparseable,[]}
=INFO REPORT==== 8-Aug-2014::00:15:37 ===
Disk free limit set to 50MB
=ERROR REPORT==== 8-Aug-2014::00:16:03 ===
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"/var/lib/rabbitmq/mnesia/rabbit@li370-61",
50000000,46946426880,100,10000,
#Ref<0.0.1.80930>,false}
** Reason for termination ==
** {unparseable,[]}
=INFO REPORT==== 8-Aug-2014::00:16:05 ===
Disk free limit set to 50MB
UPDATE:
It seems the problem was solved by installing the newest version of RabbitMQ (3.3.4-1) from the rabbitmq.com repository. Originally I had the version (2.8.4) from the Debian repositories installed. So far rabbitmq-server is working smoothly. I will update this post if the issue comes back.
UPDATE:
Unfortunately, after about 24 hours the issue reappeared: RabbitMQ shut down, and restarting the process only made it consume resources until it shut down again within minutes.
Finally I found the solution. These posts helped me figure it out:
RabbitMQ on EC2 Consuming Tons of CPU
and
https://serverfault.com/questions/337982/how-do-i-restart-rabbitmq-after-switching-machines
What happened was that RabbitMQ was holding on to all the task results, which were never freed, to the point where it became overloaded. I cleared all the stale data in /var/lib/rabbitmq/mnesia/rabbit/, restarted RabbitMQ, and it works fine now.
My solution was to disable storing results altogether with CELERY_IGNORE_RESULT = True in the Celery config file, to ensure this does not happen again.
You can also reset the queue:
Warning: This clears all data and configuration! Use with caution.
sudo service rabbitmq-server start
sudo rabbitmqctl stop_app
sudo rabbitmqctl reset
sudo rabbitmqctl start_app
You might need to run these commands right after rebooting if your system is not responding.
You are running out of memory because of Celery. I had a similar issue, and it was a problem with the queues used by the Celery result backend.
You can check how many queues there are with the rabbitmqctl list_queues command; pay attention to whether that number grows forever. In that case, check how you are using Celery.
As for Celery: if you are not consuming the results as asynchronous events, don't configure a result backend to store those unused results. An example query is shown below.
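For example, to see each queue's backlog (name, messages, and consumers are standard rabbitmqctl queue info items):

sudo rabbitmqctl list_queues name messages consumers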
I experienced a similar issue and it turned out to be due to some rogue RabbitMQ client applications.
The issue seems to have been that, due to some unhandled error, the rogue application was continuously trying to connect to the RabbitMQ broker.
Once the client applications were restarted, everything went back to normal (the applications stopped malfunctioning, and therefore stopped trying to connect to RabbitMQ in an endless loop).
Another possible cause: the management plugin.
I'm running RabbitMQ 3.8.1 with the management plugin enabled.
On a 10-core server I had up to 1000% CPU usage with three idle consumers, a single queue, and not a single message being sent.
When I disabled the management plugin by executing rabbitmq-plugins disable rabbitmq_management, usage dropped to 0% with occasional spikes of 200%.

How to set up Yaws 1.89 on Ubuntu

When I try to install Yaws 1.89, the error below is raised. Please help me overcome it.
~/yaws$ sudo yaws
Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:true]
Eshell V5.10.4 (abort with ^G)
1> =ERROR REPORT==== 14-Mar-2014::15:05:09 ===
Failed to load setuid_drv (from "/usr/local/lib/yaws/priv/lib") : "Driver compiled with incorrect version of erl_driver.h"
=ERROR REPORT==== 14-Mar-2014::15:05:09 ===
FATAL {'EXIT',normal}
=INFO REPORT==== 14-Mar-2014::15:05:09 ===
application: yaws
exited: {{shutdown,
{failed_to_start_child,yaws_server,
{badconf,
[{yaws_server,init,1,
[{file,"yaws_server.erl"},{line,159}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,304}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}}},
{yaws_app,start,[normal,[]]}}
type: permanent
{"Kernel pid terminated",application_controller,"{application_start_failure,yaws,{{shutdown,{failed_to_start_child,yaws_server,{badconf,[{yaws_server,init,1,[{file,\"yaws_server.erl\"},{line,159}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}},{yaws_app,start,[normal,[]]}}}"}
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,yaws,{{shutdown,{failed_to_start_child,yaws_server,{badconf,[{yaws_server,init,1,[{file,"yaws_server.erl"},{line,159}]},{ge
Yaws 1.89 was released in September 2010, but you're trying to run it on a newer version of Erlang, R16B03, which was released in December 2013. The drivers Yaws uses, which are compiled native code, were compiled using a version of the Erlang driver interface that doesn't match that of the Erlang version you're running, which is what causes the error messages you're seeing.
Your comment above hints that you were able to get it working by downloading yaws-1.89.tar.gz and building it yourself; if so, then yes, that's a good way to get it working with your current version of Erlang. But a better way would be to use a newer version of Yaws; the latest as of this writing is Yaws 1.98, released in November 2013.
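If you do build from source, the usual autoconf steps apply (a sketch; the tarball name is from your comment, while the unpack directory and default /usr/local prefix are assumptions):

tar xzf yaws-1.89.tar.gz
cd yaws-1.89
./configure
make && sudo make install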

How to interpret ejabberd crash dumps?

My server keeps filling up with ejabberd crash logs every few hours - it seems the ejabberd server keeps crashing, and the crash logs fill the server's free space until none is left (GBs of crash logs). The crash logs start with something like this:
=erl_crash_dump:0.1
Tue Feb 4 23:44:02 2014
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel, {shutdown,{kernel,start,[normal,[]]}}})
System version: Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
Compiled: Fri Dec 16 03:22:15 2011
Taints:
Atoms: 4574
Can anyone see something in the crash log and let me know what's happening?
In this case, the crash dump is unlikely to tell you very much - it tells you that the kernel application shut down ({shutdown,{kernel,start,[normal,[]]}}), but it doesn't say why. In the error log you should find a number of crash reports and error messages that led to the node shutting down.
Crash dumps are much more useful if the node crashes because it runs out of memory. In that case, you can usually see which process behaved badly.
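As a preventive measure on OTP 19 and later (much newer than the R14 node shown above), you can cap a process's heap so that a runaway process is killed, and logged, before it takes the whole node down:

%% inside the process you want to guard:
erlang:process_flag(max_heap_size,
                    #{size => 100 * 1024 * 1024,   % limit in words, not bytes
                      kill => true,                % kill the process on breach...
                      error_logger => true}).      % ...and log the event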
