nginx + passenger + rails 3.1 = 502 bad gateway?

I have the latest Nginx running with Passenger, SQLite and Rails 3.1. Somehow, after Passenger has been running for a while, I start getting "502 Bad Gateway" errors when visiting my website.
Here is a snippet from my Nginx error log:
2011/06/27 08:55:33 [error] 20331#0: *11270 upstream prematurely closed connection while reading response header from upstream, client: xxx.xxx.xx.x, server: www.example.com, request: "GET / HTTP/1.1", upstream: "passenger:unix:/passenger_helper_server:", host: "example.com"
2011/06/27 08:55:47 [info] 20331#0: *11273 client closed prematurely connection, so upstream connection is closed too while sending request to upstream, client: xxx.xxx.xx.x, server: www.example.com, request: "GET / HTTP/1.1", upstream: "passenger:unix:/passenger_helper_server:", host: "example.com"
Here is my passenger-status --show=backtraces output:
Thread 'Client thread 7':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 10':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 11':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 12':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 13':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 14':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 15':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 16':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 17':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 18':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 19':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 20':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 21':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 22':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 23':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'Client thread 24':
in 'Passenger::FileDescriptor Client::acceptConnection()' (HelperAgent.cpp:160)
in 'void Client::threadMain()' (HelperAgent.cpp:603)
Thread 'MessageServer thread':
in 'void Passenger::MessageServer::mainLoop()' (MessageServer.h:537)
Thread 'MessageServer client thread 35':
in 'virtual bool Passenger::BacktracesServer::processMessage(Passenger::MessageServer::CommonClientContext&, boost::shared_ptr<Passenger::MessageServer::ClientContext>&, const std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&)' (BacktracesServer.h:47)
in 'void Passenger::MessageServer::clientHandlingMainLoop(Passenger::FileDescriptor&)' (MessageServer.h:470)
This is what my passenger-memory-stats shows:
---------- Nginx processes ----------
PID PPID VMSize Private Name
-------------------------------------
16291 1 35.4 MB 0.1 MB nginx: master process /home/apps/.nginx/sbin/nginx
16292 16291 36.0 MB 0.8 MB nginx: worker process
16293 16291 35.8 MB 0.5 MB nginx: worker process
16294 16291 35.8 MB 0.5 MB nginx: worker process
16295 16291 35.8 MB 0.5 MB nginx: worker process
### Processes: 5
### Total private dirty RSS: 2.46 MB
----- Passenger processes ------
PID VMSize Private Name
--------------------------------
16251 87.0 MB 0.3 MB PassengerWatchdog
16254 100.4 MB 1.3 MB PassengerHelperAgent
16256 41.6 MB 5.7 MB Passenger spawn server
16259 134.8 MB 0.8 MB PassengerLoggingAgent
18390 770.4 MB 17.1 MB Passenger ApplicationSpawner: /home/apps/manager/current
18415 853.3 MB 147.7 MB Rack: /home/apps/manager/current
18424 790.5 MB 57.2 MB Rack: /home/apps/manager/current
18431 774.7 MB 18.7 MB Rack: /home/apps/manager/current
### Processes: 8
### Total private dirty RSS: 248.85 MB
It seems there is an issue with the communication between Passenger and Nginx?
Also, looking at the Rails logs, it is clear that these requests never reach Rails at all: there are no log entries for the visits that get the 502 error. So my initial suspicion of something being wrong with a Rack middleware can be ruled out.

The "V" in VM is for Virtual. See also the answers on other SO questions, e.g. Virtual Memory Usage from Java under Linux, too much memory used.
That top 147 MB does not hint at anything unusual whatsoever. Your 502 errors mean something else is wrong with the worker processes from Passenger's point of view. You should check your Rails and Nginx log files for clues, and perhaps passenger-status --show=backtraces.

Try setting passenger_spawn_method to conservative; apparently there are issues between Passenger's default forking spawn method and Rails 3.1.
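For reference, here is roughly where that directive goes (a sketch; the paths and server name are illustrative, not taken from the question):

```nginx
server {
    listen 80;
    server_name www.example.com;
    root /home/apps/manager/current/public;
    passenger_enabled on;
    # Disable smart spawning: every app process boots from scratch,
    # sidestepping the fork-related issues with Rails 3.1.
    passenger_spawn_method conservative;
}
```

Reload Nginx afterwards so the setting takes effect.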

I just ran into this deadly "502 Bad Gateway" error reported by Nginx. The web stack was Ubuntu 12.04 + Rails 3.2.9 + Passenger 3.0.18 + nginx 1.2.4, and it took me two hours to find the root cause:
My Rails application does not need database support, so I had removed gem 'sqlite3' from the Gemfile. That works fine in development mode, but leads to a 502 Bad Gateway in production mode.
After adding gem 'sqlite3' back to the Gemfile, the 502 Bad Gateway error disappeared.
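For anyone hitting the same thing, the relevant Gemfile line is just this (a sketch; the Rails version matches the stack above, the rest is illustrative):

```ruby
# Gemfile
source 'https://rubygems.org'

gem 'rails', '3.2.9'
# Rails 3.x loads ActiveRecord by default, and in production it needs
# its database adapter gem even if the app never touches a database:
gem 'sqlite3'
```

The cleaner long-term fix is to stop loading ActiveRecord altogether, but keeping the adapter gem in the Gemfile is the minimal change.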

I had the same problem, and in my case it helped to increase the passenger_max_pool_size setting in the Nginx configuration file.
You can also take a look at the following postings, which helped me find this solution:
http://whowish-programming.blogspot.com/2011/10/nginx-502-bad-gateway.html
https://groups.google.com/forum/?fromgroups#!topic/phusion-passenger/fgj6_4VdlLo and
https://groups.google.com/forum/?fromgroups#!topic/phusion-passenger/wAQDCrFHHgE
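For reference, the directive lives in the http block of nginx.conf (a sketch; the paths and the value 15 are illustrative; size the pool to your RAM, since each Rack process in the memory stats above uses tens to ~150 MB of private RSS):

```nginx
http {
    passenger_root /path/to/passenger;   # illustrative path
    passenger_ruby /usr/local/bin/ruby;  # illustrative path
    # Default is 6. A pool that is too small for the incoming load can
    # surface as 502s when no application process is available in time.
    passenger_max_pool_size 15;
}
```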

It was the same for me on Rails 4, and it was fixed by adding a secret_key_base entry in config/secrets.yml:
production:
  secret_key_base: # add yours here
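Any sufficiently long random string works as the value; one way to generate one (a sketch, assuming the openssl CLI is installed; inside a Rails 4 app, rake secret does the same job):

```shell
# Print a 128-character hex string suitable for secret_key_base
openssl rand -hex 64
```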

Related

Keycloak docker image not running any more Failed to start service org.wildfly.undertow.listener.ajp

I'm new to Keycloak, sorry for the newbie post.
I have a Docker image that was working perfectly. The same image is deployed on AWS and everything works fine there, but suddenly it fails to run locally and refuses to start after a system reboot. Here are the logs, in case anyone has hit the same error:
KC version 8.2.0 Config:
{
"KC_DOMAIN": "http://keycloak:8080",
"KC_PRIVATE_DOMAIN": "http://keycloak:8080",
"KC_LIFE_SPAN_DISTRIBUTOR": "1200",
"KC_LIFE_SPAN_IMPLICIT_FLOW_DISTRIBUTOR": "1200",
"KC_SESSION_IDLE_TIMEOUT_DISTRIBUTOR": "3600",
"KC_SESSION_MAX_LIFE_SPAN_DISTRIBUTOR": "43200",
"KC_XFRAME_OPTIONS_DISTRIBUTOR": "ALLOW ORIGIN *",
"KC_CONTENT_SECURITY_POLICY_DISTRIBUTOR": "frame-src 'self' *; frame-ancestors 'self' *; object-src 'none';",
"KEYCLOAK_LOGLEVEL": "INFO",
"EVENT_LOGLEVEL": "DEBUG",
"ROOT_LOGLEVEL": "INFO",
"JGROUP_LOGLEVEL": "INFO",
"KEYCLOAK_LOG_ROTATE_SIZE":"10240k",
"KEYCLOAK_LOG_MAX_BACKUP_INDEX":"50",
"ENABLE_INFINISPAN_STATISTICS": "false",
"DB_USER": "keycloak",
"DB_PASSWORD": "******",
"DB_ADDR": "mysql",
"DB_PORT": "3306",
"DB_DATABASE": "keycloak",
"JDBC_PARAMS": "",
"PROXY_ADDRESS_FORWARDING": "false",
"KC_SSL_REQUIRED": "none",
"MAX_CONCURRENT_REQUESTS": "25",
"QUEUE_SIZE": "100"
}
Error when i execute the docker image:
...
13:35:42,295 INFO [org.jboss.as.ejb3] (MSC service thread 1-8) WFLYEJB0481: Strict pool slsb-strict-max-pool is using a max instance size of 128 (per class), which is derived from thread worker pool sizing.
13:35:42,366 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 58) WFLYUT0014: Creating file handler for path '/opt/keycloak/welcome-content' with options [directory-listing: 'false', follow-symlink: 'false', case-sensitive: 'true', safe-symlink-paths: '[]']
13:35:42,448 INFO [org.wildfly.extension.undertow] (MSC service thread 1-3) WFLYUT0012: Started server default-server.
13:35:42,486 INFO [org.wildfly.extension.undertow] (MSC service thread 1-1) WFLYUT0018: Host default-host starting
13:35:42,674 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-8) MSC000001: Failed to start service org.wildfly.undertow.listener.ajp: org.jboss.msc.service.StartException in service org.wildfly.undertow.listener.ajp: WFLYUT0082: Could not start 'ajp' listener.
at org.wildfly.extension.undertow.ListenerService.start(ListenerService.java:211)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1739)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1701)
at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1559)
at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Protocol family unavailable
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:461)
at sun.nio.ch.Net.bind(Net.java:453)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
at org.xnio.nio.NioXnioWorker.createTcpConnectionServer(NioXnioWorker.java:178)
at org.xnio.XnioWorker.createStreamConnectionServer(XnioWorker.java:310)
at org.wildfly.extension.undertow.AjpListenerService.startListening(AjpListenerService.java:64)
at org.wildfly.extension.undertow.ListenerService.start(ListenerService.java:199)
... 8 more
13:35:42,674 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-6) MSC000001: Failed to start service org.wildfly.undertow.listener.default: org.jboss.msc.service.StartException in service org.wildfly.undertow.listener.default: WFLYUT0082: Could not start 'default' listener.
at org.wildfly.extension.undertow.ListenerService.start(ListenerService.java:211)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1739)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1701)
at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1559)
at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Protocol family unavailable
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:461)
at sun.nio.ch.Net.bind(Net.java:453)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
at org.xnio.nio.NioXnioWorker.createTcpConnectionServer(NioXnioWorker.java:178)
at org.xnio.XnioWorker.createStreamConnectionServer(XnioWorker.java:310)
at org.wildfly.extension.undertow.HttpListenerService.startListening(HttpListenerService.java:106)
at org.wildfly.extension.undertow.ListenerService.start(ListenerService.java:199)
... 8 more
2021-11-19T13:35:43.686Z WARN [org.jboss.as.dependency.private] (MSC service thread 1-7) WFLYSRV0018: Deployment "deployment.keycloak-server.war" is using a private module ("org.kie") which may be changed or removed in future versions without notice.
2021-11-19T13:35:45.327Z ERROR [org.jgroups.protocols.JDBC_PING] (ServerService Thread Pool -- 60) JGRP000138: Error reading JDBC_PING table: org.h2.jdbc.JdbcSQLException: Table "JGROUPSPING" not found; SQL statement:
SELECT ping_data, own_addr, cluster_name FROM JGROUPSPING WHERE cluster_name=? [42102-193]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.command.Parser.readTableOrView(Parser.java:5389)
at org.h2.command.Parser.readTableFilter(Parser.java:1257)
at org.h2.command.Parser.parseSelectSimpleFromPart(Parser.java:1897)
at org.h2.command.Parser.parseSelectSimple(Parser.java:2045)
at org.h2.command.Parser.parseSelectSub(Parser.java:1891)
at org.h2.command.Parser.parseSelectUnion(Parser.java:1709)
at org.h2.command.Parser.parseSelect(Parser.java:1697)
at org.h2.command.Parser.parsePrepared(Parser.java:445)
at org.h2.command.Parser.parse(Parser.java:317)
at org.h2.command.Parser.parse(Parser.java:289)
at org.h2.command.Parser.prepareCommand(Parser.java:254)
at org.h2.engine.Session.prepareLocal(Session.java:561)
at org.h2.engine.Session.prepareCommand(Session.java:502)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1203)
at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:73)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:676)
at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.doPrepareStatement(BaseWrapperManagedConnection.java:758)
at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.prepareStatement(BaseWrapperManagedConnection.java:744)
at org.jboss.jca.adapters.jdbc.WrappedConnection$5.produce(WrappedConnection.java:516)
at org.jboss.jca.adapters.jdbc.WrappedConnection$5.produce(WrappedConnection.java:514)
at org.jboss.jca.adapters.jdbc.SecurityActions.executeInTccl(SecurityActions.java:97)
at org.jboss.jca.adapters.jdbc.WrappedConnection.prepareStatement(WrappedConnection.java:514)
at org.jgroups.protocols.JDBC_PING.prepareStatement(JDBC_PING.java:209)
at org.jgroups.protocols.JDBC_PING.readAll(JDBC_PING.java:221)
at org.jgroups.protocols.JDBC_PING.readAll(JDBC_PING.java:197)
at org.jgroups.protocols.FILE_PING.findMembers(FILE_PING.java:124)
at org.jgroups.protocols.Discovery.invokeFindMembers(Discovery.java:216)
at org.jgroups.protocols.Discovery.findMembers(Discovery.java:241)
at org.jgroups.protocols.Discovery.down(Discovery.java:380)
at org.jgroups.protocols.FILE_PING.down(FILE_PING.java:119)
at org.jgroups.protocols.MERGE3.down(MERGE3.java:278)
at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:377)
at org.jgroups.protocols.FD.down(FD.java:320)
at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:102)
at org.jgroups.protocols.pbcast.NAKACK2.down(NAKACK2.java:553)
at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:581)
at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:347)
at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:72)
at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:40)
at org.jgroups.protocols.pbcast.GMS.down(GMS.java:1044)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:295)
at org.jgroups.protocols.FRAG3.down(FRAG3.java:135)
at org.jgroups.protocols.FORK.down(FORK.java:109)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:928)
at org.jgroups.JChannel.down(JChannel.java:627)
at org.jgroups.JChannel._connect(JChannel.java:855)
at org.jgroups.JChannel.connect(JChannel.java:352)
at org.jgroups.JChannel.connect(JChannel.java:343)
at org.jboss.as.clustering.jgroups.subsystem.ChannelServiceConfigurator.get(ChannelServiceConfigurator.java:112) at org.jboss.as.clustering.jgroups.subsystem.ChannelServiceConfigurator.get(ChannelServiceConfigurator.java:58)
at org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:67)
at org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$start$0(AsyncServiceConfigurator.java:117)
at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
at java.lang.Thread.run(Thread.java:748)
at org.jboss.threads.JBossThread.run(JBossThread.java:485)
Suppressed: org.h2.jdbc.JdbcSQLException: Table "JGROUPSPING" not found; SQL statement:
SELECT ping_data, own_addr, cluster_name FROM JGROUPSPING WHERE cluster_name=? [42102-193]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.command.Parser.readTableOrView(Parser.java:5389)
at org.h2.command.Parser.readTableFilter(Parser.java:1257)
at org.h2.command.Parser.parseSelectSimpleFromPart(Parser.java:1897)
at org.h2.command.Parser.parseSelectSimple(Parser.java:2045)
at org.h2.command.Parser.parseSelectSub(Parser.java:1891)
at org.h2.command.Parser.parseSelectUnion(Parser.java:1709)
at org.h2.command.Parser.parseSelect(Parser.java:1697)
at org.h2.command.Parser.parsePrepared(Parser.java:445)
at org.h2.command.Parser.parse(Parser.java:317)
at org.h2.command.Parser.parse(Parser.java:289)
at org.h2.command.Parser.prepareCommand(Parser.java:254)
at org.h2.engine.Session.prepareLocal(Session.java:561)
at org.h2.engine.Session.prepareCommand(Session.java:502)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1203)
at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:73)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:676)
at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.doPrepareStatement(BaseWrapperManagedConnection.java:758)
at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.prepareStatement(BaseWrapperManagedConnection.java:744)
at org.jboss.jca.adapters.jdbc.WrappedConnection$4.produce(WrappedConnection.java:478)
at org.jboss.jca.adapters.jdbc.WrappedConnection$4.produce(WrappedConnection.java:476)
at org.jboss.jca.adapters.jdbc.SecurityActions.executeInTccl(SecurityActions.java:97)
at org.jboss.jca.adapters.jdbc.WrappedConnection.prepareStatement(WrappedConnection.java:476)
at org.jgroups.protocols.JDBC_PING.prepareStatement(JDBC_PING.java:212)
... 35 more
2021-11-19T13:35:45.330Z ERROR [org.jgroups.protocols.JDBC_PING] (ServerService Thread Pool -- 60) JGRP000145: Error updating JDBC_PING table: org.h2.jdbc.JdbcSQLException: Table "JGROUPSPING" not found; SQL statement:
DELETE FROM JGROUPSPING WHERE own_addr=? AND cluster_name=? [42102-193]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
...
The main problem was this error:
13:35:42,674 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-8) MSC000001: Failed to start service org.wildfly.undertow.listener.ajp: org.jboss.msc.service.StartException in service org.wildfly.undertow.listener.ajp: WFLYUT0082: Could not start 'ajp' listener.
This is an error from the JBoss application server (WildFly): it failed to open a socket to listen for incoming connections because it attempted to bind an IPv6 socket.
The solution that worked for me was to make sure the JAVA_OPTS environment variable contains -Djava.net.preferIPv4Stack=true.
Here is how I applied this to change the Keycloak JVM arguments in a standalone configuration: I modified the commons.sh script, which is executed by standalone.sh, by adding this entry:
if [[ "$JAVA_OPTS" != *"-Djava.net.preferIPv4Stack=true"* ]]; then
export JAVA_OPTS="$JAVA_OPTS -Djava.net.preferIPv4Stack=true"
fi
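The guard above is idempotent, which is useful if commons.sh happens to be sourced more than once. A quick stand-alone check (plain bash, no Keycloak involved; the starting options are illustrative):

```shell
#!/usr/bin/env bash
# Simulate the commons.sh snippet: append the flag only when missing.
JAVA_OPTS="-Xms64m -Xmx512m"   # illustrative starting options
for i in 1 2; do               # run twice to show no duplication
  if [[ "$JAVA_OPTS" != *"-Djava.net.preferIPv4Stack=true"* ]]; then
    export JAVA_OPTS="$JAVA_OPTS -Djava.net.preferIPv4Stack=true"
  fi
done
echo "$JAVA_OPTS"   # the flag appears exactly once
```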
You can find more information at this link.

RabbitMQ Generic server rabbit_disk_monitor terminating / eheap_alloc: Cannot allocate 229520 bytes of memory (of type "old_heap")

RabbitMQ crashed.
It had been working correctly for many days (10-15 days), and I don't understand why it crashed.
I am using RabbitMQ 3.4.0 on Erlang 17.0.
Erlang created a dump file for the crash, which shows:
eheap_alloc: Cannot allocate 229520 bytes of memory (of type "old_heap").
Also note that the publish-subscribe message load is very low (max 1-2 messages/second), and messages are processed as they arrive, so RabbitMQ is almost empty all the time. Disk space and memory are also sufficient.
More system info:
Limiting to approx 8092 file handles (7280 sockets)
Memory limit set to 6553MB of 16383MB total.
Disk free limit set to 50MB.
The RabbitMQ logs are as below.
=ERROR REPORT==== 18-Jul-2015::04:29:31 ===
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia",
50000000,28358258688,100,10000,
#Ref<0.0.106.70488>,false}
** Reason for termination ==
** {eacces,[{erlang,open_port,
[{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,internal_update,1,[]},
{rabbit_disk_monitor,handle_info,2,[]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
[{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,init,1,[]},
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}},
17179336704}
=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
[{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,init,1,[]},
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}},
17179336704}
=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.167.0>
registered_name: rabbit_disk_monitor
exception exit: {eacces,
[{erlang,open_port,
[{spawn,
"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,internal_update,1,[]},
{rabbit_disk_monitor,handle_info,2,[]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,599}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}
in function gen_server:terminate/6 (gen_server.erl, line 746)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.166.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 4185
stack_size: 27
reductions: 481081978
neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: child_terminated
Reason: {eacces,
[{erlang,open_port,
[{spawn,
"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,internal_update,1,[]},
{rabbit_disk_monitor,handle_info,2,[]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,599}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}
Offender: [{pid,<0.167.0>},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.24989.51>
registered_name: []
exception exit: unsupported_platform
in function gen_server:init_it/6 (gen_server.erl, line 322)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.166.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 650
neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: start_error
Reason: unsupported_platform
Offender: [{pid,<0.167.0>},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.24991.51>
registered_name: []
exception exit: unsupported_platform
in function gen_server:init_it/6 (gen_server.erl, line 322)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.166.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 650
neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: start_error
Reason: unsupported_platform
Offender: [{pid,{restarting,<0.167.0>}},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
From the error message, RabbitMQ can't open more files due to system limits.
You can raise the maximum number of open files to avoid the problem.
https://serverfault.com/questions/249477/windows-server-2008-r2-max-open-files-limit
There are two unrelated errors here: one is the VM's failure to allocate memory; the other is the disk space monitor terminating. The disk space monitor is optional, and on some less common platforms, or under specific security restrictions, it is known to fail. That does not bring the VM down, and it certainly has nothing to do with heap allocation failures.
The heap allocation failure typically comes down to one of the two most common cases:
A known bug fixed in Erlang 17.x (I don't recall which specific patch release, so use 17.5)
You run 32-bit Erlang/OTP on a 64-bit OS.
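A quick way to check the second case (a sketch: getconf reports the userland word size; inside an Erlang shell, erlang:system_info(wordsize) returns 8 on a 64-bit VM):

```shell
# Prints 64 on a 64-bit userland, 32 on a 32-bit one. A 32-bit Erlang VM
# is limited to roughly 4 GB of address space, so large heap allocations
# like the one in the crash dump can fail even when free RAM exists.
getconf LONG_BIT
```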
Chen Yu's comment about the EACCES system call error is correct.
I get an analogous error:
systemd unit for activation check: "rabbitmq-server.service"
eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").^M
^M
Crash dump is being written to: erl_crash.dump...done^M
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 514979
max locked memory (kbytes, -l) 65536
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 514979
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This is the crash dump:
=erl_crash_dump:0.5
Wed Dec 2 17:16:31 2020
Slogan: eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").
System version: Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:512] [kernel-poll:true]
Compiled: Mon Feb 5 17:34:00 2018
Taints: crypto,asn1rt_nif,erl_tracer,zlib
Atoms: 34136
Calling Thread: scheduler:0
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:2
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process:
=scheduler:3
Scheduler Sleep Info Flags:
Scheduler Sleep Info Aux Work: DELAYED_AW_WAKEUP | DD | THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process: <0.12306.0>
Current Process State: Running
Current Process Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING | TRAP_EXIT | ON_HEAP_MSGQ
Current Process Program counter: 0x00007f2f3ab3a060 (unknown function)
Current Process CP: 0x0000000000000000 (invalid)
Current Process Limited Stack Trace:
0x00007f2b50252d68:SReturn addr 0x32A6EC98 (rabbit_channel:handle_method/3 + 6712)
0x00007f2b50252d78:SReturn addr 0x32A69630 (rabbit_channel:handle_cast/2 + 4160)
0x00007f2b50252df8:SReturn addr 0x51102708 (gen_server2:handle_msg/2 + 1808)
0x00007f2b50252e28:SReturn addr 0x3FD85E70 (proc_lib:init_p_do_apply/3 + 72)
0x00007f2b50252e48:SReturn addr 0x7FFB4948 (<terminate process normally>)
=scheduler:4
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING

WSO2: message broker, startup takes long (v 2.2.0)

We have installed WSO2 Message Broker v2.2.0 on a SUSE 64-bit OS (single core). We have configured master-datasources.xml to point to an Oracle database. The startup of the MB takes minutes, especially here:
TID: [0] [MB] [2014-06-11 15:57:53,039] INFO {org.apache.cassandra.thrift.ThriftServer} - Listening for thrift clients... {org.apache.cassandra.thrift.ThriftServer}
TID: [0] [MB] [2014-06-11 15:57:53,219] INFO {org.apache.cassandra.service.GCInspector} - GC for MarkSweepCompact: 407 ms for 1 collections, 60663688 used; max is 1037959168 {org.apache.cassandra.service.GCInspector}
TID: [0] [MB] [2014-06-11 15:58:39,137] WARN {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} - Waiting for required OSGi services: org.wso2.carbon.server.admin.common.IServerAdmin,org.wso2.carbon.throttling.agent.ThrottlingAgent, {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}
TID: [0] [MB] [2014-06-11 15:59:39,136] WARN {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} - Waiting for required OSGi services: org.wso2.carbon.server.admin.common.IServerAdmin,org.wso2.carbon.throttling.agent.ThrottlingAgent, {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}
TID: [0] [MB] [2014-06-11 16:00:39,136] WARN {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent} - Waiting for required OSGi services: org.wso2.carbon.server.admin.common.IServerAdmin,org.wso2.carbon.throttling.agent.ThrottlingAgent, {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}
Is there a reason for this?
With WSO2 MB 2.2.0 we get these kinds of errors when the ZooKeeper/Cassandra server does not start properly. Ideally, when clustering is enabled, the ZooKeeper server (internal or external) should be fully started before MB starts.
Furthermore, if you are trying to run an MB cluster on a single machine with two ZooKeeper nodes, you will most probably end up with these OSGi-level errors. Please follow the blog post at http://indikasampath.blogspot.com/2014/05/wso2-message-broker-cluster-setup-in.html for configuration details on setting up a WSO2 Message Broker cluster on a single machine.
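For reference, running two ZooKeeper nodes on one machine generally means giving each node its own data directory and client port while sharing the same server list; the paths and ports below are illustrative, not from this thread (see the linked blog post for the MB-specific settings):

```ini
# zoo1.cfg — first ZooKeeper node (hypothetical paths/ports)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper/node1
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889

# zoo2.cfg — second node: same server list, different dataDir,
# clientPort, and peer ports so the two nodes do not collide
dataDir=/var/zookeeper/node2
clientPort=2182
server.1=localhost:2888:3888
server.2=localhost:2889:3889
```

Each dataDir also needs a `myid` file containing that node's id (`1` and `2` respectively), and both nodes should report as started before MB is brought up.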

Passenger 4 + Nginx Won't Start Rails App

I've just installed passenger + nginx on a brand new Ubuntu 13.04 box, and am trying to run an app that runs great on several other servers.
Every time I try to run the app in production, I get:
An error occurred while starting up the preloader: it did not write a startup response in time.
Application root
/home/avishai/apps/XXX/current
Environment (value of RAILS_ENV, RACK_ENV, WSGI_ENV and PASSENGER_ENV)
production
Ruby interpreter command
/home/avishai/.rvm/wrappers/ruby-1.9.3-p448/ruby
User and groups
Unknown
Environment variables
Unknown
Ulimits
Unknown
Passenger is able to load the app in development mode, but not production. The nginx error log shows the following:
[ 2013-08-14 17:09:14.8321 17810/7f99daebd700 Pool2/Spawner.h:738 ]: [App 17843 stdout]
[ 2013-08-14 17:10:44.8406 17810/7f99daebd700 Pool2/Implementation.cpp:774 ]: Could not spawn process for group /home/avishai/apps/XXX/current#default: An error occurred while starting up the preloader: it did not write a startup response in time.
in 'void Passenger::ApplicationPool2::SmartSpawner::throwPreloaderSpawnException(const string&, Passenger::SpawnException::ErrorKind, Passenger::ApplicationPool2::Spawner::BackgroundIOCapturerPtr&, const DebugDirPtr&)' (SmartSpawner.h:150)
in 'std::string Passenger::ApplicationPool2::SmartSpawner::negotiatePreloaderStartup(Passenger::ApplicationPool2::SmartSpawner::StartupDetails&)' (SmartSpawner.h:558)
in 'void Passenger::ApplicationPool2::SmartSpawner::startPreloader()' (SmartSpawner.h:206)
in 'virtual Passenger::ApplicationPool2::ProcessPtr Passenger::ApplicationPool2::SmartSpawner::spawn(const Passenger::ApplicationPool2::Options&)' (SmartSpawner.h:744)
in 'void Passenger::ApplicationPool2::Group::spawnThreadRealMain(const SpawnerPtr&, const Passenger::ApplicationPool2::Options&, unsigned int)' (Implementation.cpp:707)
[ 2013-08-14 17:10:44.8409 17810/7f99d8f9e700 agents/HelperAgent/RequestHandler.h:1888 ]: [Client 20] Cannot checkout session. An error occurred while starting up the preloader: it did not write a startup response in time.
[ 2013-08-14 17:10:44.8412 17810/7f99d8f9e700 agents/HelperAgent/RequestHandler.h:1888 ]: [Client 21] Cannot checkout session. An error occurred while starting up the preloader: it did not write a startup response in time.
I'm at a dead end. How can I get the app to load?
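For context (not an answer from this thread): "did not write a startup response in time" means the preloader exceeded Passenger's spawn timeout. A commonly suggested first step is raising `passenger_start_timeout` in the nginx vhost while investigating why the production boot is slow; the server block below is a hypothetical sketch using the paths from the question:

```nginx
# Hypothetical vhost for illustration only
server {
    listen 80;
    server_name example.com;
    root /home/avishai/apps/XXX/current/public;

    passenger_enabled on;
    passenger_ruby /home/avishai/.rvm/wrappers/ruby-1.9.3-p448/ruby;

    # The spawn timeout defaults to 90 seconds; raise it while
    # diagnosing what makes the production boot slow (initializers,
    # asset work, unreachable external services, etc.).
    passenger_start_timeout 300;
}
```

A longer timeout only buys time; if the preloader never responds at all, the underlying cause (often something blocking in an initializer under `RAILS_ENV=production`) still has to be found.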

Nginx + Passenger - Uncaught exception in PassengerServer client thread

I've installed Passenger with Nginx for testing here, and I keep getting this error after a few thousand requests:
[ pid=57259 thr=0x40f07780 file=ext/nginx/HelperAgent.cpp:576 time=2010-12-15 14:04:25.876 ]: Uncaught exception in PassengerServer client thread:
exception: write() failed: Socket is not connected (57)
backtrace:
in 'void Client::forwardResponse(Passenger::SessionPtr&, Passenger::FileDescriptor&)' (HelperAgent.cpp:368)
in 'void Client::handleRequest(Passenger::FileDescriptor&)' (HelperAgent.cpp:502)
in 'void Client::threadMain()' (HelperAgent.cpp:595)
[ pid=57259 thr=0x40f07080 file=ext/nginx/HelperAgent.cpp:566 time=2010-12-15 14:04:26.416 ]: Couldn't forward the HTTP response back to the HTTP client: It seems the user clicked on the 'Stop' button in his browser.
I have two servers that were running haproxy+apache+mongrel. I switched one of them to haproxy+nginx+passenger (haproxy is only a fallback for my testing, so I can redirect to the old setup quickly in case of fire).
I noticed that Passenger dies after this message.
I'm using Ruby Enterprise Edition 1.8.7, Rails 2.3.5 and FreeBSD.
This turns out to be a FreeBSD kernel bug. We're gradually adding workarounds for it to the Phusion Passenger codebase.

Resources