One of our RabbitMQ nodes went down but remained in a hung state, which prevented the other node from becoming the active node. Applications connect to RabbitMQ through a VIP; when one node goes down, the VIP is supposed to switch to the other node, but that did not happen here because the first node, the one that went down, was hung rather than cleanly dead.
Below are the logs from that time.
[The then-active node, which hung]
=CRASH REPORT==== 11-Jun-2017::05:15:09 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.253.0>
registered_name: rabbit_disk_monitor
exception exit: {eagain,[{erlang,open_port,
[{spawn,"/bin/sh -s unix:cmd 2>&1"},
[stream]],
[]},
{os,start_port_srv_handle,1,
[{file,"os.erl"},{line,313}]},
{os,start_port_srv_loop,0,
[{file,"os.erl"},{line,329}]}]}
in function gen_server:terminate/7 (gen_server.erl, line 826)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.243.0>]
messages: []
links: [<0.252.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 2036096302
neighbours:
[The other node which should have taken over]
=CRASH REPORT==== 11-Jun-2017::05:15:09 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.253.0>
registered_name: rabbit_disk_monitor
exception exit: {eagain,[{erlang,open_port,
[{spawn,"/bin/sh -s unix:cmd 2>&1"},
[stream]],
[]},
{os,start_port_srv_handle,1,
[{file,"os.erl"},{line,313}]},
{os,start_port_srv_loop,0,
[{file,"os.erl"},{line,329}]}]}
in function gen_server:terminate/7 (gen_server.erl, line 826)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.243.0>]
messages: []
links: [<0.252.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 2036096302
neighbours:
Could use of the delayed message plugin together with Mnesia have caused this?
eagain means "Resource temporarily unavailable".
Most likely you ran out of file descriptors.
Check your current file descriptor configuration and try to increase it.
Check these links for more details:
debian: http://www.rabbitmq.com/install-debian.html#kernel-resource-limits
RPM http://www.rabbitmq.com/install-rpm.html#kernel-resource-limits
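As a rough illustration only (not the official procedure; the service manager, paths, and the value 65536 are assumptions that vary by distribution), raising the limit on a systemd-based Linux host might look like this:
# Check the limit the running node actually sees (output format differs between RabbitMQ versions)
rabbitmqctl status | grep -A 3 file_descriptors
# Raise the limit for the rabbitmq-server unit; systemd services ignore /etc/security/limits.conf
sudo mkdir -p /etc/systemd/system/rabbitmq-server.service.d
printf '[Service]\nLimitNOFILE=65536\n' | sudo tee /etc/systemd/system/rabbitmq-server.service.d/limits.conf
sudo systemctl daemon-reload
sudo systemctl restart rabbitmq-server
On older Debian installs without systemd, the same effect is usually achieved with a ulimit -n line in /etc/default/rabbitmq-server, as described in the Debian link above.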
RabbitMQ crashed.
RabbitMQ had been working correctly for many days (10-15 days).
I do not understand why it crashed.
I am using RabbitMQ 3.4.0 on Erlang 17.0.
Erlang created a crash dump file, which shows:
eheap_alloc: Cannot allocate 229520 bytes of memory (of type "old_heap").
Also note that the RabbitMQ publish-subscribe message load is very low (max 1-2 messages/second), and messages are processed as they arrive, so the queues are almost empty all the time. Disk space and memory are also sufficient.
More system info:
Limiting to approx 8092 file handles (7280 sockets)
Memory limit set to 6553MB of 16383MB total.
Disk free limit set to 50MB.
The RabbitMQ logs are as below.
=ERROR REPORT==== 18-Jul-2015::04:29:31 ===
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia",
50000000,28358258688,100,10000,
#Ref<0.0.106.70488>,false}
** Reason for termination ==
** {eacces,[{erlang,open_port,
[{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,internal_update,1,[]},
{rabbit_disk_monitor,handle_info,2,[]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
[{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,init,1,[]},
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}},
17179336704}
=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
[{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,init,1,[]},
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}},
17179336704}
=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.167.0>
registered_name: rabbit_disk_monitor
exception exit: {eacces,
[{erlang,open_port,
[{spawn,
"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,internal_update,1,[]},
{rabbit_disk_monitor,handle_info,2,[]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,599}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}
in function gen_server:terminate/6 (gen_server.erl, line 746)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.166.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 4185
stack_size: 27
reductions: 481081978
neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: child_terminated
Reason: {eacces,
[{erlang,open_port,
[{spawn,
"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,internal_update,1,[]},
{rabbit_disk_monitor,handle_info,2,[]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,599}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}
Offender: [{pid,<0.167.0>},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.24989.51>
registered_name: []
exception exit: unsupported_platform
in function gen_server:init_it/6 (gen_server.erl, line 322)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.166.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 650
neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: start_error
Reason: unsupported_platform
Offender: [{pid,<0.167.0>},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.24991.51>
registered_name: []
exception exit: unsupported_platform
in function gen_server:init_it/6 (gen_server.erl, line 322)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.166.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 650
neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: start_error
Reason: unsupported_platform
Offender: [{pid,{restarting,<0.167.0>}},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
From the error message, RabbitMQ can't open more files due to system limits.
You can raise the maximum number of open files to avoid the problem.
https://serverfault.com/questions/249477/windows-server-2008-r2-max-open-files-limit
There are two unrelated errors here: one is the VM's failure to allocate memory; the other is the disk space monitor terminating. The disk space monitor is optional, and on some less common platforms, or with specific security restrictions, it is known to fail. That does not bring the VM down, and it certainly has nothing to do with heap allocation failures.
The heap allocation failure typically comes down to one of two common cases:
A known bug fixed in Erlang 17.x (don't recall which specific patch release, so use 17.5)
You are running 32-bit Erlang/OTP on a 64-bit OS.
Chen Yu's comment about the EACCES system call error is correct.
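If you are not sure which build you are running, a quick check is to ask the VM for its word size (a sketch only; this tells you about the Erlang emulator itself, not how RabbitMQ was packaged):
# Prints 8 on a 64-bit Erlang VM and 4 on a 32-bit one
erl -noshell -eval 'io:format("~p~n", [erlang:system_info(wordsize)]), halt().'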
I get an analogous error:
systemd unit for activation check: "rabbitmq-server.service"
eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").
Crash dump is being written to: erl_crash.dump...done
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 514979
max locked memory (kbytes, -l) 65536
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 514979
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This is the crash dump:
=erl_crash_dump:0.5
Wed Dec 2 17:16:31 2020
Slogan: eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").
System version: Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:512] [kernel-poll:true]
Compiled: Mon Feb 5 17:34:00 2018
Taints: crypto,asn1rt_nif,erl_tracer,zlib
Atoms: 34136
Calling Thread: scheduler:0
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:2
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process:
=scheduler:3
Scheduler Sleep Info Flags:
Scheduler Sleep Info Aux Work: DELAYED_AW_WAKEUP | DD | THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process: <0.12306.0>
Current Process State: Running
Current Process Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING | TRAP_EXIT | ON_HEAP_MSGQ
Current Process Program counter: 0x00007f2f3ab3a060 (unknown function)
Current Process CP: 0x0000000000000000 (invalid)
Current Process Limited Stack Trace:
0x00007f2b50252d68:SReturn addr 0x32A6EC98 (rabbit_channel:handle_method/3 + 6712)
0x00007f2b50252d78:SReturn addr 0x32A69630 (rabbit_channel:handle_cast/2 + 4160)
0x00007f2b50252df8:SReturn addr 0x51102708 (gen_server2:handle_msg/2 + 1808)
0x00007f2b50252e28:SReturn addr 0x3FD85E70 (proc_lib:init_p_do_apply/3 + 72)
0x00007f2b50252e48:SReturn addr 0x7FFB4948 (<terminate process normally>)
=scheduler:4
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
I have a Debian box that has been running tasks with Celery and RabbitMQ for about a year. Recently I noticed tasks were not being processed, so I logged into the system and found that Celery could not connect to RabbitMQ. I restarted rabbitmq-server, and even though Celery no longer complained, it still did not execute new tasks. The odd thing was that RabbitMQ was devouring CPU and memory like crazy. Restarting the server did not solve the problem. After spending a couple of hours looking for a solution online to no avail, I decided to rebuild the server.
I rebuilt a new server with Debian 7.5, RabbitMQ 2.8.4, and Celery 3.1.13 (Cipater). For an hour or so everything worked beautifully again, until Celery started complaining that it could not connect to RabbitMQ:
[2014-08-06 05:17:21,036: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**#127.0.0.1:5672//: [Errno 111] Connection refused.
Trying again in 6.00 seconds...
I restarted RabbitMQ with service rabbitmq-server start and hit the same issue again:
RabbitMQ started, then began swelling up again, constantly pounding the CPU and slowly taking over all RAM and swap:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21823 rabbitmq 20 0 908m 488m 3900 S 731.2 49.4 9:44.74 beam.smp
Here's the result of rabbitmqctl status:
Status of node 'rabbit#li370-61' ...
[{pid,21823},
{running_applications,[{rabbit,"RabbitMQ","2.8.4"},
{os_mon,"CPO CXC 138 46","2.2.9"},
{sasl,"SASL CXC 138 11","2.2.1"},
{mnesia,"MNESIA CXC 138 12","4.7"},
{stdlib,"ERTS CXC 138 10","1.18.1"},
{kernel,"ERTS CXC 138 10","2.15.1"}]},
{os,{unix,linux}},
{erlang_version,"Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:30] [kernel-poll:true]\n"},
{memory,[{total,489341272},
{processes,462841967},
{processes_used,462685207},
{system,26499305},
{atom,504409},
{atom_used,473810},
{binary,98752},
{code,11874771},
{ets,6695040}]},
{vm_memory_high_watermark,0.3999999992280962},
{vm_memory_limit,414559436},
{disk_free_limit,1000000000},
{disk_free,48346546176},
{file_descriptors,[{total_limit,924},
{total_used,924},
{sockets_limit,829},
{sockets_used,3}]},
{processes,[{limit,1048576},{used,1354}]},
{run_queue,0},
Some entries from /var/log/rabbitmq:
=WARNING REPORT==== 8-Aug-2014::00:11:35 ===
Mnesia('rabbit#li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=WARNING REPORT==== 8-Aug-2014::00:11:35 ===
Mnesia('rabbit#li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=WARNING REPORT==== 8-Aug-2014::00:11:35 ===
Mnesia('rabbit#li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=WARNING REPORT==== 8-Aug-2014::00:11:35 ===
Mnesia('rabbit#li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=WARNING REPORT==== 8-Aug-2014::00:11:36 ===
Mnesia('rabbit#li370-61'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
=INFO REPORT==== 8-Aug-2014::00:11:36 ===
vm_memory_high_watermark set. Memory used:422283840 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:11:36 ===
memory resource limit alarm set on node 'rabbit#li370-61'.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
=INFO REPORT==== 8-Aug-2014::00:11:43 ===
started TCP Listener on [::]:5672
=INFO REPORT==== 8-Aug-2014::00:11:44 ===
vm_memory_high_watermark clear. Memory used:290424384 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:11:44 ===
memory resource limit alarm cleared on node 'rabbit#li370-61'
=INFO REPORT==== 8-Aug-2014::00:11:59 ===
vm_memory_high_watermark set. Memory used:414584504 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:11:59 ===
memory resource limit alarm set on node 'rabbit#li370-61'.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
=INFO REPORT==== 8-Aug-2014::00:12:00 ===
vm_memory_high_watermark clear. Memory used:411143496 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:12:00 ===
memory resource limit alarm cleared on node 'rabbit#li370-61'
=INFO REPORT==== 8-Aug-2014::00:12:01 ===
vm_memory_high_watermark set. Memory used:415563120 allowed:414559436
=WARNING REPORT==== 8-Aug-2014::00:12:01 ===
memory resource limit alarm set on node 'rabbit#li370-61'.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
=INFO REPORT==== 8-Aug-2014::00:12:07 ===
Server startup complete; 0 plugins started.
=ERROR REPORT==== 8-Aug-2014::00:15:32 ===
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"/var/lib/rabbitmq/mnesia/rabbit#li370-61",
50000000,46946492416,100,10000,
#Ref<0.0.1.79456>,false}
** Reason for termination ==
** {unparseable,[]}
=INFO REPORT==== 8-Aug-2014::00:15:37 ===
Disk free limit set to 50MB
=ERROR REPORT==== 8-Aug-2014::00:16:03 ===
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"/var/lib/rabbitmq/mnesia/rabbit#li370-61",
50000000,46946426880,100,10000,
#Ref<0.0.1.80930>,false}
** Reason for termination ==
** {unparseable,[]}
=INFO REPORT==== 8-Aug-2014::00:16:05 ===
Disk free limit set to 50MB
UPDATE:
The problem seemed to be solved by installing the newest version of RabbitMQ (3.3.4-1) from the rabbitmq.com repository. Originally I had the version (2.8.4) from the Debian repositories installed. So far rabbitmq-server has been working smoothly. I will update this post if the issue comes back.
UPDATE:
Unfortunately, after about 24 hours the issue reappeared: RabbitMQ shut down, and restarting the process just made it consume resources until it shut down again within minutes.
Finally I found the solution. These posts helped me figure it out:
RabbitMQ on EC2 Consuming Tons of CPU
and
https://serverfault.com/questions/337982/how-do-i-restart-rabbitmq-after-switching-machines
What happened was that RabbitMQ was holding on to all the results, which were never freed, to the point where it became overloaded. I cleared all the stale data in /var/lib/rabbitmq/mnesia/rabbit/, restarted RabbitMQ, and it works fine now.
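For reference, that clean-up can be scripted roughly like this (an assumption-laden sketch: the paths match the Debian packaging used above, and removing the Mnesia directory deletes every user, vhost, queue and message on the node, so treat it as a last resort):
sudo service rabbitmq-server stop
# Move the old database aside instead of deleting it, in case anything needs to be recovered later
sudo mv /var/lib/rabbitmq/mnesia /var/lib/rabbitmq/mnesia.bak
sudo service rabbitmq-server start
# The node comes back with a fresh database; re-create any users, vhosts and permissions afterwards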
My longer-term fix was to disable storing results altogether with CELERY_IGNORE_RESULT = True in the Celery config file, to ensure this does not happen again.
You can also reset the queue:
Warning: This clears all data and configuration! Use with caution.
sudo service rabbitmq-server start   # make sure the node is running first
sudo rabbitmqctl stop_app            # stop the RabbitMQ application but keep the Erlang VM up
sudo rabbitmqctl reset               # wipe all data: queues, exchanges, users, vhosts, cluster membership
sudo rabbitmqctl start_app           # start the application again with a clean database
You might need to run these commands right after rebooting if your system is not responding.
You are running out of memory because of Celery; I hit a similar issue and it was a problem with the queues used by the Celery result backend.
You can check how many queues there are using the rabbitmqctl list_queues command; pay attention to whether that number grows forever. In that case, look at how you are using Celery.
Regarding Celery: if you are not consuming the results as asynchronous events, don't configure a result backend to store those unused results.
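To make that concrete, something along these lines (a sketch only; it assumes the default vhost and that the Celery AMQP result backend creates one queue per result) shows whether result queues are piling up:
# Count the queues; a number that only ever grows is the warning sign
sudo rabbitmqctl list_queues name messages | wc -l
# List the queues sorted by message count so stale result queues are easy to spot
sudo rabbitmqctl list_queues name messages | sort -k2 -n | tail -20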
I experienced a similar issue and it turned out to be due to some rogue RabbitMQ client applications.
The issue seems to have been that, due to some unhandled error, the rogue application was continuously trying to open connections to the RabbitMQ broker.
Once the client applications were restarted, everything went back to normal (the applications stopped malfunctioning and therefore stopped trying to connect to RabbitMQ in an endless loop).
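If you suspect something similar, a rough way to find the offender (a sketch; the available columns vary a little between RabbitMQ versions) is to look at where the connections come from:
# Show who is connected, the connection state and how many channels each connection holds
sudo rabbitmqctl list_connections peer_host peer_port state channels
# A client stuck in a connect loop also shows up as a stream of accepted/closed connections in
# the node log, typically /var/log/rabbitmq/rabbit@<hostname>.log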
Another possible cause: the management plugin.
I'm running RabbitMQ 3.8.1 with the management plugin enabled.
On a 10-core server I saw up to 1000% CPU usage with three idle consumers, a single queue, and not a single message being sent.
When I disabled the management plugin by executing rabbitmq-plugins disable rabbitmq_management, the usage dropped to 0% with occasional spikes of 200%.
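For completeness, the steps look roughly like this (a sketch; on 3.8.x a plugin can usually be disabled without a full node restart, but verify on your own setup):
# See which plugins are currently enabled (marked [E*] if explicitly enabled, [e*] if as a dependency)
sudo rabbitmq-plugins list
# Disable the management UI and its metrics collection
sudo rabbitmq-plugins disable rabbitmq_management
# Watch the Erlang VM process to confirm that CPU usage drops
top -p "$(pgrep -o beam.smp)"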
I am not able to understand why Erlang crashes and restarts. I am running an ejabberd server, and its log folder is always full of erl_crash_xxxx.dump files. How can I debug this problem?
Here is a small part of the erlang.log file:
=CRASH REPORT==== 4-Sep-2013::19:44:51 ===
crasher:
initial call: ejabberd_http:init/2
pid: <0.15614.15>
registered_name: []
exception exit: {normal,
{gen_fsm,sync_send_all_state_event,
[<0.15454.15>,
{http_put,2020093061,
[{"xmlns",
"http://jabber.org/protocol/httpbind"},
{"rid","2020093061"},
{"sid",
"26820e4cd7d331de864b857d1ef3351caf7dbac5"}],
[],115,1,[],
{{49,205,148,16},56132}},
30000]}}
in function gen_fsm:sync_send_all_state_event/3
in call from ejabberd_http_bind:http_put/7
in call from ejabberd_http_bind:handle_http_put/7
in call from ejabberd_http:process/2
in call from ejabberd_http:process_request/1
in call from ejabberd_http:process_header/2
in call from ejabberd_http:receive_headers/1
ancestors: [ejabberd_http_sup,ejabberd_sup,<0.37.0>]
messages: []
links: [<0.274.0>,#Port<0.1519795>]
dictionary: []
trap_exit: false
status: running
heap_size: 2584
stack_size: 24
reductions: 1082
neighbours:
These are the top few lines of a typical crash dump file:
=erl_crash_dump:0.1
Tue Sep 3 16:31:47 2013
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
System version: Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1] [async-threads:0] [hipe] [kernel-poll:false]
Compiled: Wed Oct 5 17:25:18 2011
Taints:
Atoms: 4699
=memory
total: 21498768
processes: 556368
processes_used: 541208
system: 20942400
atom: 322177
atom_used: 302233
binary: 18216
code: 2165726
ets: 53736
=hash_table:atom_tab
size: 3203
used: 2471
Those 'crashes' are probably totally unrelated. What you see as '=CRASH REPORT=' in your log files are, in a sense, 'normal' or expected crashes, and as such are handled by a supervisor. The one you posted is a crash inside a handler for an HTTP call while creating or sending the response, which may have run into some unforeseen condition (such as the sending process not existing anymore, though I am not sure about that). Erlang is built and designed to handle such crashes gracefully. In other words, your ejabberd is still running and serving requests happily.
The other one is a crash dump, which results from a real crash of the Erlang runtime. This might happen if, for example, the machine hosting the runtime runs out of memory. In your case it looks like a configuration error; the node could not boot correctly at all. See http://www.ejabberd.im/node/872.
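When a new dump appears, the quickest triage (just a sketch; the dump location depends on how ejabberd was started) is to read the Slogan line near the top, which states why the runtime died:
# The Slogan line gives the reason the VM terminated, e.g. out of memory or a failed boot
grep -m1 '^Slogan:' erl_crash_*.dump
# For deeper inspection, OTP ships a crash dump viewer that can be started from an Erlang shell:
# crashdump_viewer:start().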