Trying out Scrapy + Splash - lua
So I'm playing around with Scrapy & Splash and I'm running into some issues.
I tried running my spiders, and kept getting HTTP 502 & 504 errors. Okay, so I tried to check out Splash in my browser.
First I ran "sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash --max-timeout 3600 -v3" to start Splash, then went to localhost:8050. The Web UI opens up properly, and I'm able to enter code.
Here is the basic function I'm trying to run:
function main(splash, args)
  assert(splash:autoload("https://code.jquery.com/jquery-3.1.1.min.js"))
  splash.resource_timeout = 30.0
  splash.images_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    --png = splash:png(),
    --har = splash:har(),
  }
end
I try to render http://boingboing.net/blog using this function and get an 'invalid hostname' Lua error; here are the logs:
2017-08-01 18:26:28+0000 [-] Log opened.
2017-08-01 18:26:28.077457 [-] Splash version: 3.0
2017-08-01 18:26:28.077838 [-] Qt 5.9.1, PyQt 5.9, WebKit 602.1, sip 4.19.3, Twisted 16.1.1, Lua 5.2
2017-08-01 18:26:28.077900 [-] Python 3.5.2 (default, Nov 17 2016, 17:05:23) [GCC 5.4.0 20160609]
2017-08-01 18:26:28.077984 [-] Open files limit: 65536
2017-08-01 18:26:28.078046 [-] Can't bump open files limit
2017-08-01 18:26:28.180376 [-] Xvfb is started: ['Xvfb', ':1937726875', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2017-08-01 18:26:28.226937 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2017-08-01 18:26:28.301002 [-] verbosity=3
2017-08-01 18:26:28.301116 [-] slots=50
2017-08-01 18:26:28.301202 [-] argument_cache_max_entries=500
2017-08-01 18:26:28.301530 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2017-08-01 18:26:28.302122 [-] Site starting on 8050
2017-08-01 18:26:28.302219 [-] Starting factory <twisted.web.server.Site object at 0x7ffa08390dd8>
2017-08-01 18:26:32.660457 [-] "172.17.0.1" - - [01/Aug/2017:18:26:32 +0000] "GET / HTTP/1.1" 200 7677 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
2017-08-01 18:27:18.860020 [-] "172.17.0.1" - - [01/Aug/2017:18:27:18 +0000] "GET /info?wait=0.5&images=1&expand=1&timeout=3600.0&url=http%3A%2F%2Fboingboing.net%2Fblog&lua_source=function+main%28splash%2C+args%29%0D%0A++assert%28splash%3Aautoload%28%22https%3A%2F%2Fcode.jquery.com%2Fjquery-3.1.1.min.js%22%29%29%0D%0A++splash.resource_timeout+%3D+30.0%0D%0A++splash.images_enabled+%3D+false%0D%0A++assert%28splash%3Ago%28args.url%29%29%0D%0A++assert%28splash%3Await%280.5%29%29%0D%0A++return+%7B%0D%0A++++html+%3D+splash%3Ahtml%28%29%2C%0D%0A++++--png+%3D+splash%3Apng%28%29%2C%0D%0A++++--har+%3D+splash%3Ahar%28%29%2C%0D%0A++%7D%0D%0Aend HTTP/1.1" 200 5656 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
2017-08-01 18:27:19.038565 [pool] initializing SLOT 0
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
process 1: D-Bus library appears to be incorrectly set up; failed to read machine uuid: UUID file '/etc/machine-id' should contain a hex string of length 32, not length 0, with no other text
See the manual page for dbus-uuidgen to correct this issue.
2017-08-01 18:27:19.066765 [render] [140711856519656] viewport size is set to 1024x768
2017-08-01 18:27:19.066964 [pool] [140711856519656] SLOT 0 is starting
2017-08-01 18:27:19.067071 [render] [140711856519656] function main(splash, args)\r\n assert(splash:autoload("https://code.jquery.com/jquery-3.1.1.min.js"))\r\n splash.resource_timeout = 30.0\r\n splash.images_enabled = false\r\n assert(splash:go(args.url))\r\n assert(splash:wait(0.5))\r\n return {\r\n html = splash:html(),\r\n --png = splash:png(),\r\n --har = splash:har(),\r\n }\r\nend
2017-08-01 18:27:19.070107 [render] [140711856519656] [lua_runner] dispatch cmd_id=__START__
2017-08-01 18:27:19.070270 [render] [140711856519656] [lua_runner] arguments are for command __START__, waiting for result of __START__
2017-08-01 18:27:19.070352 [render] [140711856519656] [lua_runner] entering dispatch/loop body, args=()
2017-08-01 18:27:19.070424 [render] [140711856519656] [lua_runner] send None
2017-08-01 18:27:19.070496 [render] [140711856519656] [lua_runner] send (lua) None
2017-08-01 18:27:19.070657 [render] [140711856519656] [lua_runner] got AsyncBrowserCommand(id=None, name='http_get', kwargs={'url': 'https://code.jquery.com/jquery-3.1.1.min.js', 'callback': '<a callback>'})
2017-08-01 18:27:19.070755 [render] [140711856519656] [lua_runner] instructions used: 70
2017-08-01 18:27:19.070834 [render] [140711856519656] [lua_runner] executing AsyncBrowserCommand(id=0, name='http_get', kwargs={'url': 'https://code.jquery.com/jquery-3.1.1.min.js', 'callback': '<a callback>'})
2017-08-01 18:27:19.071141 [network] [140711856519656] GET https://code.jquery.com/jquery-3.1.1.min.js
qt.network.ssl: QSslSocket: cannot resolve SSLv2_client_method
qt.network.ssl: QSslSocket: cannot resolve SSLv2_server_method
2017-08-01 18:27:19.082150 [pool] [140711856519656] SLOT 0 is working
2017-08-01 18:27:19.082298 [pool] [140711856519656] queued
2017-08-01 18:28:39.151814 [network-manager] Download error 3: the remote host name was not found (invalid hostname) (https://code.jquery.com/jquery-3.1.1.min.js)
2017-08-01 18:28:39.152087 [network-manager] Finished downloading https://code.jquery.com/jquery-3.1.1.min.js
2017-08-01 18:28:39.152202 [render] [140711856519656] [lua_runner] dispatch cmd_id=0
2017-08-01 18:28:39.152268 [render] [140711856519656] [lua_runner] arguments are for command 0, waiting for result of 0
2017-08-01 18:28:39.152339 [render] [140711856519656] [lua_runner] entering dispatch/loop body, args=(PyResult('return', None, 'invalid_hostname'),)
2017-08-01 18:28:39.152400 [render] [140711856519656] [lua_runner] send PyResult('return', None, 'invalid_hostname')
2017-08-01 18:28:39.152468 [render] [140711856519656] [lua_runner] send (lua) (b'return', None, b'invalid_hostname')
2017-08-01 18:28:39.152582 [render] [140711856519656] [lua_runner] instructions used: 79
2017-08-01 18:28:39.152642 [render] [140711856519656] [lua_runner] caught LuaError LuaError('[string "function main(splash, args)\\r..."]:2: invalid_hostname',)
2017-08-01 18:28:39.152816 [pool] [140711856519656] SLOT 0 finished with an error <splash.qtrender_lua.LuaRender object at 0x7ffa08477e48>: [Failure instance: Traceback: <class 'splash.exceptions.ScriptError'>: {'error': 'invalid_hostname', 'type': 'LUA_ERROR', 'source': '[string "function main(splash, args)\r..."]', 'message': 'Lua error: [string "function main(splash, args)\r..."]:2: invalid_hostname', 'line_number': 2}
/app/splash/browser_tab.py:1180:_return_reply
/app/splash/qtrender_lua.py:901:callback
/app/splash/lua_runner.py:27:return_result
/app/splash/qtrender.py:17:stop_on_error_wrapper
--- <exception caught here> ---
/app/splash/qtrender.py:15:stop_on_error_wrapper
/app/splash/qtrender_lua.py:2257:dispatch
/app/splash/lua_runner.py:195:dispatch
]
2017-08-01 18:28:39.152883 [pool] [140711856519656] SLOT 0 is closing <splash.qtrender_lua.LuaRender object at 0x7ffa08477e48>
2017-08-01 18:28:39.152944 [render] [140711856519656] [splash] clearing 0 objects
2017-08-01 18:28:39.153026 [render] [140711856519656] close is requested by a script
2017-08-01 18:28:39.153304 [render] [140711856519656] cancelling 0 remaining timers
2017-08-01 18:28:39.153374 [pool] [140711856519656] SLOT 0 done with <splash.qtrender_lua.LuaRender object at 0x7ffa08477e48>
2017-08-01 18:28:39.153997 [events] {"user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0", "error": {"error": 400, "info": {"error": "invalid_hostname", "type": "LUA_ERROR", "source": "[string \"function main(splash, args)\r...\"]", "message": "Lua error: [string \"function main(splash, args)\r...\"]:2: invalid_hostname", "line_number": 2}, "type": "ScriptError", "description": "Error happened while executing Lua script"}, "active": 0, "status_code": 400, "maxrss": 107916, "qsize": 0, "path": "/execute", "timestamp": 1501612119, "fds": 18, "args": {"render_all": false, "http_method": "GET", "png": 1, "url": "http://boingboing.net/blog", "wait": 0.5, "html": 1, "response_body": false, "har": 1, "load_args": {}, "lua_source": "function main(splash, args)\r\n assert(splash:autoload(\"https://code.jquery.com/jquery-3.1.1.min.js\"))\r\n splash.resource_timeout = 30.0\r\n splash.images_enabled = false\r\n assert(splash:go(args.url))\r\n assert(splash:wait(0.5))\r\n return {\r\n html = splash:html(),\r\n --png = splash:png(),\r\n --har = splash:har(),\r\n }\r\nend", "resource_timeout": 0, "uid": 140711856519656, "save_args": [], "viewport": "1024x768", "timeout": 3600, "images": 1}, "client_ip": "172.17.0.1", "rendertime": 80.11527562141418, "method": "POST", "_id": 140711856519656, "load": [0.46, 0.51, 0.54]}
2017-08-01 18:28:39.154127 [-] "172.17.0.1" - - [01/Aug/2017:18:28:38 +0000] "POST /execute HTTP/1.1" 400 325 "http://localhost:8050/info?wait=0.5&images=1&expand=1&timeout=3600.0&url=http%3A%2F%2Fboingboing.net%2Fblog&lua_source=function+main%28splash%2C+args%29%0D%0A++assert%28splash%3Aautoload%28%22https%3A%2F%2Fcode.jquery.com%2Fjquery-3.1.1.min.js%22%29%29%0D%0A++splash.resource_timeout+%3D+30.0%0D%0A++splash.images_enabled+%3D+false%0D%0A++assert%28splash%3Ago%28args.url%29%29%0D%0A++assert%28splash%3Await%280.5%29%29%0D%0A++return+%7B%0D%0A++++html+%3D+splash%3Ahtml%28%29%2C%0D%0A++++--png+%3D+splash%3Apng%28%29%2C%0D%0A++++--har+%3D+splash%3Ahar%28%29%2C%0D%0A++%7D%0D%0Aend" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
2017-08-01 18:28:39.154237 [pool] SLOT 0 is available
If I try it without loading jQuery first, I get a 'network5' Lua error (which is some species of timeout). Logs for that are as follows:
2017-08-01 18:31:07.110255 [-] "172.17.0.1" - - [01/Aug/2017:18:31:06 +0000] "GET /info?wait=0.5&images=1&expand=1&timeout=3600.0&url=http%3A%2F%2Fboingboing.net%2Fblog&lua_source=function+main%28splash%2C+args%29%0D%0A++--assert%28splash%3Aautoload%28%22https%3A%2F%2Fcode.jquery.com%2Fjquery-3.1.1.min.js%22%29%29%0D%0A++splash.resource_timeout+%3D+30.0%0D%0A++splash.images_enabled+%3D+false%0D%0A++assert%28splash%3Ago%28args.url%29%29%0D%0A++assert%28splash%3Await%280.5%29%29%0D%0A++return+%7B%0D%0A++++html+%3D+splash%3Ahtml%28%29%2C%0D%0A++++--png+%3D+splash%3Apng%28%29%2C%0D%0A++++--har+%3D+splash%3Ahar%28%29%2C%0D%0A++%7D%0D%0Aend HTTP/1.1" 200 5658 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
2017-08-01 18:31:07.489653 [pool] initializing SLOT 1
2017-08-01 18:31:07.490576 [render] [140711856961016] viewport size is set to 1024x768
2017-08-01 18:31:07.490692 [pool] [140711856961016] SLOT 1 is starting
2017-08-01 18:31:07.490829 [render] [140711856961016] function main(splash, args)\r\n --assert(splash:autoload("https://code.jquery.com/jquery-3.1.1.min.js"))\r\n splash.resource_timeout = 30.0\r\n splash.images_enabled = false\r\n assert(splash:go(args.url))\r\n assert(splash:wait(0.5))\r\n return {\r\n html = splash:html(),\r\n --png = splash:png(),\r\n --har = splash:har(),\r\n }\r\nend
2017-08-01 18:31:07.493641 [render] [140711856961016] [lua_runner] dispatch cmd_id=__START__
2017-08-01 18:31:07.493782 [render] [140711856961016] [lua_runner] arguments are for command __START__, waiting for result of __START__
2017-08-01 18:31:07.493865 [render] [140711856961016] [lua_runner] entering dispatch/loop body, args=()
2017-08-01 18:31:07.493937 [render] [140711856961016] [lua_runner] send None
2017-08-01 18:31:07.494010 [render] [140711856961016] [lua_runner] send (lua) None
2017-08-01 18:31:07.494270 [render] [140711856961016] [lua_runner] got AsyncBrowserCommand(id=None, name='go', kwargs={'baseurl': None, 'http_method': 'GET', 'headers': None, 'body': None, 'url': 'http://boingboing.net/blog', 'errback': '<an errback>', 'callback': '<a callback>'})
2017-08-01 18:31:07.494416 [render] [140711856961016] [lua_runner] instructions used: 166
2017-08-01 18:31:07.494502 [render] [140711856961016] [lua_runner] executing AsyncBrowserCommand(id=0, name='go', kwargs={'baseurl': None, 'http_method': 'GET', 'headers': None, 'body': None, 'url': 'http://boingboing.net/blog', 'errback': '<an errback>', 'callback': '<a callback>'})
2017-08-01 18:31:07.494576 [render] [140711856961016] HAR event: _onStarted
2017-08-01 18:31:07.494697 [render] [140711856961016] callback 0 is connected to loadFinished
2017-08-01 18:31:07.495031 [network] [140711856961016] GET http://boingboing.net/blog
2017-08-01 18:31:07.495617 [pool] [140711856961016] SLOT 1 is working
2017-08-01 18:31:07.495741 [pool] [140711856961016] queued
2017-08-01 18:31:37.789845 [network-manager] timed out, aborting: http://boingboing.net/blog
2017-08-01 18:31:37.790154 [network-manager] Finished downloading http://boingboing.net/blog
2017-08-01 18:31:37.791064 [render] [140711856961016] mainFrame().urlChanged http://boingboing.net/blog
2017-08-01 18:31:37.796078 [render] [140711856961016] mainFrame().initialLayoutCompleted
2017-08-01 18:31:37.796343 [render] [140711856961016] loadFinished: RenderErrorInfo(type='Network', code=5, text='Operation canceled', url='http://boingboing.net/blog')
2017-08-01 18:31:37.796420 [render] [140711856961016] loadFinished: disconnecting callback 0
2017-08-01 18:31:37.796518 [render] [140711856961016] [lua_runner] dispatch cmd_id=0
2017-08-01 18:31:37.796576 [render] [140711856961016] [lua_runner] arguments are for command 0, waiting for result of 0
2017-08-01 18:31:37.796640 [render] [140711856961016] [lua_runner] entering dispatch/loop body, args=(PyResult('return', None, 'network5'),)
2017-08-01 18:31:37.796699 [render] [140711856961016] [lua_runner] send PyResult('return', None, 'network5')
2017-08-01 18:31:37.796765 [render] [140711856961016] [lua_runner] send (lua) (b'return', None, b'network5')
2017-08-01 18:31:37.796883 [render] [140711856961016] [lua_runner] instructions used: 175
2017-08-01 18:31:37.796943 [render] [140711856961016] [lua_runner] caught LuaError LuaError('[string "function main(splash, args)\\r..."]:5: network5',)
2017-08-01 18:31:37.797093 [pool] [140711856961016] SLOT 1 finished with an error <splash.qtrender_lua.LuaRender object at 0x7ffa083ff828>: [Failure instance: Traceback: <class 'splash.exceptions.ScriptError'>: {'error': 'network5', 'type': 'LUA_ERROR', 'source': '[string "function main(splash, args)\r..."]', 'message': 'Lua error: [string "function main(splash, args)\r..."]:5: network5', 'line_number': 5}
/app/splash/browser_tab.py:533:_on_content_ready
/app/splash/qtrender_lua.py:702:error
/app/splash/lua_runner.py:27:return_result
/app/splash/qtrender.py:17:stop_on_error_wrapper
--- <exception caught here> ---
/app/splash/qtrender.py:15:stop_on_error_wrapper
/app/splash/qtrender_lua.py:2257:dispatch
/app/splash/lua_runner.py:195:dispatch
]
2017-08-01 18:31:37.797158 [pool] [140711856961016] SLOT 1 is closing <splash.qtrender_lua.LuaRender object at 0x7ffa083ff828>
2017-08-01 18:31:37.797217 [render] [140711856961016] [splash] clearing 0 objects
2017-08-01 18:31:37.797310 [render] [140711856961016] close is requested by a script
2017-08-01 18:31:37.797430 [render] [140711856961016] cancelling 0 remaining timers
2017-08-01 18:31:37.797491 [pool] [140711856961016] SLOT 1 done with <splash.qtrender_lua.LuaRender object at 0x7ffa083ff828>
2017-08-01 18:31:37.798067 [events] {"user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0", "error": {"error": 400, "info": {"error": "network5", "type": "LUA_ERROR", "source": "[string \"function main(splash, args)\r...\"]", "message": "Lua error: [string \"function main(splash, args)\r...\"]:5: network5", "line_number": 5}, "type": "ScriptError", "description": "Error happened while executing Lua script"}, "active": 0, "status_code": 400, "maxrss": 113372, "qsize": 0, "path": "/execute", "timestamp": 1501612297, "fds": 21, "args": {"render_all": false, "http_method": "GET", "png": 1, "url": "http://boingboing.net/blog", "wait": 0.5, "html": 1, "response_body": false, "har": 1, "load_args": {}, "lua_source": "function main(splash, args)\r\n --assert(splash:autoload(\"https://code.jquery.com/jquery-3.1.1.min.js\"))\r\n splash.resource_timeout = 30.0\r\n splash.images_enabled = false\r\n assert(splash:go(args.url))\r\n assert(splash:wait(0.5))\r\n return {\r\n html = splash:html(),\r\n --png = splash:png(),\r\n --har = splash:har(),\r\n }\r\nend", "resource_timeout": 0, "uid": 140711856961016, "save_args": [], "viewport": "1024x768", "timeout": 3600, "images": 1}, "client_ip": "172.17.0.1", "rendertime": 30.308406591415405, "method": "POST", "_id": 140711856961016, "load": [0.39, 0.42, 0.49]}
2017-08-01 18:31:37.798190 [-] "172.17.0.1" - - [01/Aug/2017:18:31:37 +0000] "POST /execute HTTP/1.1" 400 309 "http://localhost:8050/info?wait=0.5&images=1&expand=1&timeout=3600.0&url=http%3A%2F%2Fboingboing.net%2Fblog&lua_source=function+main%28splash%2C+args%29%0D%0A++--assert%28splash%3Aautoload%28%22https%3A%2F%2Fcode.jquery.com%2Fjquery-3.1.1.min.js%22%29%29%0D%0A++splash.resource_timeout+%3D+30.0%0D%0A++splash.images_enabled+%3D+false%0D%0A++assert%28splash%3Ago%28args.url%29%29%0D%0A++assert%28splash%3Await%280.5%29%29%0D%0A++return+%7B%0D%0A++++html+%3D+splash%3Ahtml%28%29%2C%0D%0A++++--png+%3D+splash%3Apng%28%29%2C%0D%0A++++--har+%3D+splash%3Ahar%28%29%2C%0D%0A++%7D%0D%0Aend" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0"
2017-08-01 18:31:37.798294 [pool] SLOT 1 is available
If I additionally comment out the resource_timeout line, I get a 'network3' Lua error (again an invalid hostname, but presenting differently this time).
Any idea what I'm doing wrong?
As it turns out, it wasn't a Scrapy/Splash issue at all -- it was a Docker / IP route / network admin issue. The network admins set it up so that I can only make HTTP requests through a particular destination; adding "--net=host" to my docker start-up seems to have fixed this.
This webpage was very helpful.
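For anyone hitting the same thing, a minimal sketch of the adjusted start-up command (same image and flags as above; note that with host networking the -p port mappings become redundant, since the container shares the host's network stack):

sudo docker run --net=host scrapinghub/splash --max-timeout 3600 -v3

After that, Splash is reachable on the host's own localhost:8050 as before.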
Try changing

function main(splash, args)
  ...
  assert(splash:go(args.url))
  ...

to

function main(splash)
  ...
  assert(splash:go(splash.args.url))
  ...
At least, that's how the default script reads when I open Splash on port 8050. With that change, your script works for me.
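For reference, here is roughly how the question's script looks with that change applied (a sketch; only the signature and the splash.args.url lookup differ from the original):

function main(splash)
  assert(splash:autoload("https://code.jquery.com/jquery-3.1.1.min.js"))
  splash.resource_timeout = 30.0
  splash.images_enabled = false
  -- args is no longer a parameter, so read the URL from splash.args
  assert(splash:go(splash.args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    --png = splash:png(),
    --har = splash:har(),
  }
end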
Related
How to write fluent bit input logs to localhost syslog server
I'm working on collecting logs from a Docker containerized application. I can get the logs to the stdout output plugin, but when I try the syslog output plugin, nothing is written to the syslog server. Below is the configuration file:

[SERVICE]
    Parsers_File /etc/td-agent-bit/parsers.conf

[INPUT]
    Name forward

[OUTPUT]
    name                syslog
    match               *
    host                127.0.0.1
    port                514
    mode                udp
    syslog_format       rfc5424
    syslog_hostname_key hostname
    syslog_appname_key  appname
    syslog_procid_key   procid
    syslog_message_key  log

The logging property for the container application is set to:

logging:
  driver: fluentd
  options:
    fluentd-address: localhost:24224
    tag: logs

After running Fluent Bit with /opt/td-agent-bit/bin/td-agent-bit -c fluent.conf:

[2021/09/20 08:47:16] [ warn] [engine] failed to flush chunk '8481-1632152835.361162854.flb', retry in 7 seconds: task_id=0, input=forward.0 > output=syslog.0 (out_id=0)
[2021/09/20 08:47:23] [ warn] [engine] chunk '8481-1632152835.361162854.flb' cannot be retried: task_id=0, input=forward.0 > output=syslog.0
[2021/09/20 08:47:26] [ warn] [engine] failed to flush chunk '8481-1632152845.361118393.flb', retry in 6 seconds: task_id=0, input=forward.0 > output=syslog.0 (out_id=0)
[2021/09/20 08:47:32] [ warn] [engine] chunk '8481-1632152845.361118393.flb' cannot be retried: task_id=0, input=forward.0 > output=syslog.0
[2021/09/20 08:47:36] [ warn] [engine] failed to flush chunk '8481-1632152855.361556013.flb', retry in 8 seconds: task_id=0, input=forward.0 > output=syslog.0 (out_id=0)

Can anyone tell me what is going wrong here? Even the simple cpu input plugin does not work with the syslog output. For example, /opt/td-agent-bit/bin/td-agent-bit -i cpu -o syslog outputs:

[2021/09/20 08:53:43] [ info] [cmetrics] version=0.2.1
[2021/09/20 08:53:43] [ info] [output:syslog:syslog.0] setup done for 127.0.0.1:514
[2021/09/20 08:53:43] [ info] [sp] stream processor started
[2021/09/20 08:53:48] [ warn] [engine] failed to flush chunk '8765-1632153224.515974981.flb', retry in 10 seconds: task_id=0, input=cpu.0 > output=syslog.0 (out_id=0)
[2021/09/20 08:53:53] [ warn] [engine] failed to flush chunk '8765-1632153228.516869744.flb', retry in 6 seconds: task_id=1, input=cpu.0 > output=syslog.0 (out_id=0)
[2021/09/20 08:53:58] [ warn] [engine] chunk '8765-1632153224.515974981.flb' cannot be retried: task_id=0, input=cpu.0 > output=syslog.0
I faced a similar issue; it happens because 127.0.0.1 resolves to the Fluent Bit pod itself. I created a separate rsyslog pod and used its IP to receive the logs from Fluent Bit. https://artifacthub.io/packages/helm/rsyslog/rsyslog
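To make that concrete, a minimal sketch of the adjusted [OUTPUT] section under that setup; the address 10.96.0.50 is a hypothetical placeholder for the rsyslog pod or service IP, so substitute whatever yours actually gets:

[OUTPUT]
    name                syslog
    match               *
    # must point at the rsyslog pod's address; 127.0.0.1 resolves
    # to the Fluent Bit pod itself
    host                10.96.0.50
    port                514
    mode                udp
    syslog_format       rfc5424
    syslog_message_key  log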
Uvicorn server shutting down unexpectedly
I'm working with the FastAPI framework, served by the Uvicorn server. My application should run some time-consuming numerical computation at a given endpoint (/run). For this I am using BackgroundTasks from FastAPI (which is basically BackgroundTasks from Starlette). When running the application, after some time of nominal behaviour, the server shuts down for some reason. The logs from the application look like this:

INFO: Started server process [922]
INFO: Waiting for application startup.
DEBUG: None - ASGI [1] Started
DEBUG: None - ASGI [1] Sent {'type': 'lifespan.startup'}
DEBUG: None - ASGI [1] Received {'type': 'lifespan.startup.complete'}
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
DEBUG: ('10.0.2.111', 57396) - Connected
DEBUG: ('10.0.2.111', 57397) - Connected
DEBUG: ('10.0.2.111', 57396) - ASGI [2] Started
DEBUG: ('10.0.2.111', 57396) - ASGI [2] Received {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}
INFO: ('10.0.2.111', 57396) - "GET /run HTTP/1.1" 200
DEBUG: ('10.0.2.111', 57396) - ASGI [2] Received {'type': 'http.response.body', 'body': '<32 bytes>'}
DEBUG: ('10.0.2.111', 57396) - ASGI [3] Started
DEBUG: ('10.0.2.111', 57396) - ASGI [3] Received {'type': 'http.response.start', 'status': 404, 'headers': '<...>'}
INFO: ('10.0.2.111', 57396) - "GET /favicon.ico HTTP/1.1" 404
DEBUG: ('10.0.2.111', 57396) - ASGI [3] Received {'type': 'http.response.body', 'body': '<22 bytes>'}
DEBUG: ('10.0.2.111', 57396) - ASGI [3] Completed
...
DEBUG: ('10.0.2.111', 57396) - Disconnected
...

The background task is completed:

DEBUG: ('10.0.2.111', 57396) - ASGI [2] Completed
DEBUG: ('10.0.2.111', 57397) - Disconnected
DEBUG: ('10.0.2.111', 57405) - Connected
...

The application goes on, with requests and completed background tasks. At some point, during the execution of a background task:

INFO: Shutting down
DEBUG: ('10.0.2.111', 57568) - Disconnected
DEBUG: ('10.0.2.111', 57567) - Disconnected
INFO: Waiting for background tasks to complete. (CTRL+C to force quit)
DEBUG: ('10.0.2.111', 57567) - ASGI [6] Completed
INFO: Waiting for application shutdown.
DEBUG: None - ASGI [1] Sent {'type': 'lifespan.shutdown'}
DEBUG: None - ASGI [1] Received {'type': 'lifespan.shutdown.complete'}
DEBUG: None - ASGI [1] Completed
INFO: Finished server process [922]

I really don't get why this happens, and I have no idea what to try in order to fix it. My code looks like this:

#!/usr/bin/env python3.7
import time

from fastapi import FastAPI, BackgroundTasks
import uvicorn
from starlette.responses import JSONResponse

import my_imports_from_project

analysis_api = FastAPI()

@analysis_api.get("/")
def root():
    return {"message": "root"}

@analysis_api.get("/test")
def test():
    return {"message": "test"}

@analysis_api.get("/run")
def run(name: str, background_task: BackgroundTasks):
    try:
        some_checks(name)
    except RaisedExceptions:
        body = {"running": False, "name": name,
                "cause": "Not found in database"}
        return JSONResponse(status_code=400, content=body)

    body = {"running": True, "name": name}
    background_task.add_task(run_analysis, name)
    return JSONResponse(status_code=200, content=body)

if __name__ == "__main__":
    uvicorn.run("api:analysis_api", host="0.0.0.0", log_level="debug")
This is how I solved the whole problem. I think the issue was that my tasks spawn some processes in order to perform computations. So, instead of using FastAPI's background tasks, I am now using multiprocessing.Process(), and that solves it. As the FastAPI folks point out, this solution might not scale well if the project becomes big and complex; in that case it is highly suggested to use something like a message queue plus a task runner (as suggested on the FastAPI site). However, for small projects the solution with multiprocessing.Process or subprocess.Popen is totally fine.
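A minimal sketch of that approach, assuming run_analysis, some_checks, and RaisedExceptions are the same project-specific names as in the question's code:

import multiprocessing

from fastapi import FastAPI
from starlette.responses import JSONResponse

analysis_api = FastAPI()

@analysis_api.get("/run")
def run(name: str):
    try:
        some_checks(name)  # project function, as in the question
    except RaisedExceptions:
        body = {"running": False, "name": name,
                "cause": "Not found in database"}
        return JSONResponse(status_code=400, content=body)

    # Hand the long computation to a separate OS process instead of a
    # Starlette background task: the request returns immediately, and the
    # work is no longer tied to the worker's shutdown lifecycle.
    proc = multiprocessing.Process(target=run_analysis, args=(name,))
    proc.start()

    return JSONResponse(status_code=200,
                        content={"running": True, "name": name})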
Cassandra connection idling and timing out
I'm trying to load and delete data from Cassandra using the Python driver. I have tried this both with Cassandra running in a Docker container and, after the Docker version gave me problems, locally. Here's an example of what I'm doing:

class Controller(object):
    def __init__(self):
        self.cluster = Cluster()
        self.session = self.cluster.connect('mykeyspace')

    def insert_into_cassandra(self, fname):
        query = ('INSERT INTO mytable (mykey, indexed_key) VALUES (?, ?)')
        prepared = self.session.prepare(query)
        prepared.consistency_level = ConsistencyLevel.QUORUM
        params_gen = self.params_generator(fname)
        execute_concurrent_with_args(self.session, prepared, params_gen, concurrency=50)

    def delete_param_gen(self, results):
        for r in results:
            yield [r.mykey]

    def delete_by_index(self, value):
        query = "SELECT mykey from mytable where indexed_key = '%s'" % value
        res = self.session.execute(query)

        delete_query = "DELETE from mytable where mykey = ?"
        prepared = self.session.prepare(delete_query)
        prepared.consistency_level = ConsistencyLevel.QUORUM
        params_gen = self.delete_param_gen(res)
        execute_concurrent_with_args(self.session, prepared, params_gen, concurrency=50)

Nothing crazy. When loading/deleting data, I frequently see the following messages:

Sending options message heartbeat on idle connection (4422117360) 127.0.0.1
Heartbeat failed for connection (4422117360) to 127.0.0.1

Here are some logs from deleting data:

[2017-02-28 08:37:20,562] [DEBUG] [cassandra.connection] Defuncting connection (4422117360) to 127.0.0.1: errors=Connection heartbeat timeout after 30 seconds, last_host=127.0.0.1
[2017-02-28 08:37:20,563] [DEBUG] [cassandra.io.libevreactor] Closing connection (4422117360) to 127.0.0.1
[2017-02-28 08:37:20,563] [DEBUG] [cassandra.io.libevreactor] Closed socket to 127.0.0.1
[2017-02-28 08:37:20,564] [DEBUG] [cassandra.pool] Defunct or closed connection (4422117360) returned to pool, potentially marking host 127.0.0.1 as down
[2017-02-28 08:37:20,566] [DEBUG] [cassandra.pool] Replacing connection (4422117360) to 127.0.0.1
[2017-02-28 08:37:20,567] [DEBUG] [cassandra.connection] Defuncting connection (4426057600) to 127.0.0.1: errors=Connection heartbeat timeout after 30 seconds, last_host=127.0.0.1
[2017-02-28 08:37:20,567] [DEBUG] [cassandra.io.libevreactor] Closing connection (4426057600) to 127.0.0.1
[2017-02-28 08:37:20,567] [DEBUG] [cassandra.io.libevreactor] Closed socket to 127.0.0.1
[2017-02-28 08:37:20,568] [ERROR] [cassandra.cluster] Unexpected exception while handling result in ResponseFuture:
Traceback (most recent call last):
  File "cassandra/cluster.py", line 3536, in cassandra.cluster.ResponseFuture._set_result (cassandra/cluster.c:67556)
  File "cassandra/cluster.py", line 3711, in cassandra.cluster.ResponseFuture._set_final_result (cassandra/cluster.c:71769)
  File "cassandra/concurrent.py", line 154, in cassandra.concurrent._ConcurrentExecutor._on_success (cassandra/concurrent.c:3357)
  File "cassandra/concurrent.py", line 203, in cassandra.concurrent.ConcurrentExecutorListResults._put_result (cassandra/concurrent.c:5539)
  File "cassandra/concurrent.py", line 209, in cassandra.concurrent.ConcurrentExecutorListResults._put_result (cassandra/concurrent.c:5427)
  File "cassandra/concurrent.py", line 123, in cassandra.concurrent._ConcurrentExecutor._execute_next (cassandra/concurrent.c:2369)
  File "load_cassandra.py", line 148, in delete_param_gen
    for r in rows:
  File "cassandra/cluster.py", line 3991, in cassandra.cluster.ResultSet.next (cassandra/cluster.c:76025)
  File "cassandra/cluster.py", line 4006, in cassandra.cluster.ResultSet.fetch_next_page (cassandra/cluster.c:76193)
  File "cassandra/cluster.py", line 3781, in cassandra.cluster.ResponseFuture.result (cassandra/cluster.c:73073)
cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})

And here are some from inserting data:

[2017-02-28 16:50:25,594] [DEBUG] [cassandra.connection] Sending options message heartbeat on idle connection (140301574604448) 127.0.0.1
[2017-02-28 16:50:25,595] [DEBUG] [cassandra.cluster] [control connection] Attempting to reconnect
[2017-02-28 16:50:25,596] [DEBUG] [cassandra.cluster] [control connection] Opening new connection to 127.0.0.1
[2017-02-28 16:50:25,596] [DEBUG] [cassandra.connection] Not sending options message for new connection(140301347717016) to 127.0.0.1 because compression is disabled and a cql version was not specified
[2017-02-28 16:50:25,596] [DEBUG] [cassandra.connection] Sending StartupMessage on <AsyncoreConnection(140301347717016) 127.0.0.1:9042>
[2017-02-28 16:50:25,596] [DEBUG] [cassandra.connection] Sent StartupMessage on <AsyncoreConnection(140301347717016) 127.0.0.1:9042>
[2017-02-28 16:50:30,596] [DEBUG] [cassandra.io.asyncorereactor] Closing connection (140301347717016) to 127.0.0.1
[2017-02-28 16:50:30,596] [DEBUG] [cassandra.io.asyncorereactor] Closed socket to 127.0.0.1
[2017-02-28 16:50:30,596] [DEBUG] [cassandra.connection] Connection to 127.0.0.1 was closed during the startup handshake
[2017-02-28 16:50:30,597] [WARNING] [cassandra.cluster] [control connection] Error connecting to 127.0.0.1:
Traceback (most recent call last):
  File "cassandra/cluster.py", line 2623, in cassandra.cluster.ControlConnection._reconnect_internal (cassandra/cluster.c:47899)
  File "cassandra/cluster.py", line 2645, in cassandra.cluster.ControlConnection._try_connect (cassandra/cluster.c:48416)
  File "cassandra/cluster.py", line 1119, in cassandra.cluster.Cluster.connection_factory (cassandra/cluster.c:15085)
  File "cassandra/connection.py", line 333, in cassandra.connection.Connection.factory (cassandra/connection.c:5790)
cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
[2017-02-28 16:50:39,309] [ERROR] [root] Exception inserting data into cassandra
Traceback (most recent call last):
  File "load_cassandra.py", line 54, in run
    controller.insert_into_cassandra(filename)
  File "extract_to_cassandra.py", line 141, in insert_into_cassandra
    for success, result in results:
  File "cassandra/concurrent.py", line 177, in _results (cassandra/concurrent.c:4856)
  File "cassandra/concurrent.py", line 186, in cassandra.concurrent.ConcurrentExecutorGenResults._results (cassandra/concurrent.c:4622)
  File "cassandra/concurrent.py", line 165, in cassandra.concurrent._ConcurrentExecutor._raise (cassandra/concurrent.c:3745)
cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'QUORUM', 'required_responses': 1, 'received_responses': 0}
[2017-02-28 16:50:39,465] [DEBUG] [cassandra.connection] Received options response on connection (140301574604448) from 127.0.0.1
[2017-02-28 16:50:39,466] [DEBUG] [cassandra.cluster] Shutting down Cluster Scheduler
[2017-02-28 16:50:39,467] [DEBUG] [cassandra.cluster] Shutting down control connection
[2017-02-28 16:50:39,467] [DEBUG] [cassandra.io.asyncorereactor] Closing connection (140301574604448) to 127.0.0.1
[2017-02-28 16:50:39,467] [DEBUG] [cassandra.io.asyncorereactor] Closed socket to 127.0.0.1
[2017-02-28 16:50:39,468] [DEBUG] [cassandra.pool] Defunct or closed connection (140301574604448) returned to pool, potentially marking host 127.0.0.1 as down

I tweaked the consistency and even set it to 1, but that didn't work. Inserts tend to work better when running Cassandra locally as opposed to Docker, but they still time out. Deletes usually work for a couple of seconds and then hang/timeout.

edit: Here are the logs from Cassandra when things fail:

INFO 18:39:11 MUTATION messages were dropped in last 5000 ms: 4 for internal timeout and 0 for cross node timeout. Mean internal dropped latency: 2933809 ms and Mean cross-node dropped latency: 0 ms
INFO 18:39:11 Pool Name                     Active   Pending   Completed   Blocked   All Time Blocked
INFO 18:39:11 MutationStage                     32        15         470         0                  0
INFO 18:39:11 ViewMutationStage                  0         0           0         0                  0
INFO 18:39:11 ReadStage                          0         0          59         0                  0
INFO 18:39:11 RequestResponseStage               0         0           0         0                  0
INFO 18:39:11 ReadRepairStage                    0         0           0         0                  0
INFO 18:39:11 CounterMutationStage               0         0           0         0                  0
INFO 18:39:11 MiscStage                          0         0           0         0                  0
INFO 18:39:11 CompactionExecutor                 0         0        6399         0                  0
INFO 18:39:11 MemtableReclaimMemory              0         0          36         0                  0
INFO 18:39:11 PendingRangeCalculator             0         0           1         0                  0
INFO 18:39:11 GossipStage                        0         0           0         0                  0
INFO 18:39:11 SecondaryIndexManagement           0         0           0         0                  0
INFO 18:39:11 HintsDispatcher                    0         0           0         0                  0
INFO 18:39:11 MigrationStage                     0         0           2         0                  0
INFO 18:39:11 MemtablePostFlush                  0         0          62         0                  0
INFO 18:39:11 PerDiskMemtableFlushWriter_0       0         0          36         0                  0
INFO 18:39:11 ValidationExecutor                 0         0           0         0                  0
INFO 18:39:11 Sampler                            0         0           0         0                  0
INFO 18:39:11 MemtableFlushWriter                0         0          36         0                  0
INFO 18:39:11 InternalResponseStage              0         0           0         0                  0
INFO 18:39:11 AntiEntropyStage                   0         0           0         0                  0
INFO 18:39:11 CacheCleanupExecutor               0         0           0         0                  0
INFO 18:39:11 Native-Transport-Requests         33         0         727         0                  0
INFO 18:39:11 CompactionManager                  0         0
INFO 18:39:11 MessagingService                 n/a       0/0
INFO 18:39:11 Cache Type                     Size                 Capacity               KeysToSave
INFO 18:39:11 KeyCache                       1368                 51380224                      all
INFO 18:39:11 RowCache                          0                        0                      all
INFO 18:39:11 Table                                       Memtable ops,data
INFO 18:39:11 system_distributed.parent_repair_history                  0,0
INFO 18:39:11 system_distributed.repair_history                         0,0
INFO 18:39:11 system_distributed.view_build_status                      0,0
INFO 18:39:11 system.compaction_history                               1,231
INFO 18:39:11 system.hints                                              0,0
INFO 18:39:11 system.schema_aggregates                                  0,0
INFO 18:39:11 system.IndexInfo                                          0,0
INFO 18:39:11 system.schema_columnfamilies                              0,0
INFO 18:39:11 system.schema_triggers                                    0,0
INFO 18:39:11 system.size_estimates                                 40,1255
INFO 18:39:11 system.schema_functions                                   0,0
INFO 18:39:11 system.paxos                                              0,0
INFO 18:39:11 system.views_builds_in_progress                           0,0
INFO 18:39:11 system.built_views                                        0,0
INFO 18:39:11 system.peer_events                                        0,0
INFO 18:39:11 system.range_xfers                                        0,0
INFO 18:39:11 system.peers                                              0,0
INFO 18:39:11 system.batches                                            0,0
INFO 18:39:11 system.schema_keyspaces                                   0,0
INFO 18:39:11 system.schema_usertypes                                   0,0
INFO 18:39:11 system.local                                              0,0
INFO 18:39:11 system.sstable_activity                                 6,117
INFO 18:39:11 system.available_ranges                                   0,0
INFO 18:39:11 system.batchlog                                           0,0
INFO 18:39:11 system.schema_columns                                     0,0
INFO 18:39:11 system_schema.columns                                     0,0
INFO 18:39:11 system_schema.types                                       0,0
INFO 18:39:11 system_schema.indexes                                     0,0
INFO 18:39:11 system_schema.keyspaces                                   0,0
INFO 18:39:11 system_schema.dropped_columns                             0,0
INFO 18:39:11 system_schema.aggregates                                  0,0
INFO 18:39:11 system_schema.triggers                                    0,0
INFO 18:39:11 system_schema.tables                                      0,0
INFO 18:39:11 system_schema.views                                       0,0
INFO 18:39:11 system_schema.functions                                   0,0
INFO 18:39:11 system_auth.roles                                         0,0
INFO 18:39:11 system_auth.role_members                                  0,0
INFO 18:39:11 system_auth.resource_role_permissons_index                0,0
INFO 18:39:11 system_auth.role_permissions                              0,0
INFO 18:39:11 mykeyspace.mytable                                430,27163514
INFO 18:39:11 system_traces.sessions                                    0,0
INFO 18:39:11 system_traces.events                                      0,0
INFO 18:39:13 ParNew GC in 261ms. CMS Old Gen: 46106544 -> 74868512; Par Eden Space: 208895224 -> 0; Par Survivor Space: 16012448 -> 26083328

I see messages like this too:

Out of 29 commit log syncs over the past 248s with average duration of 1596.14ms, 1 have exceeded the configured commit interval by an average of 18231.00ms
One thing you could try is reducing the idle_heartbeat_interval setting on your connection. By default it is 30 seconds, but you can configure it when instantiating your Cluster class. In this example, I'll set it to 10 seconds:

def __init__(self):
    self.cluster = Cluster(idle_heartbeat_interval=10)
    self.session = self.cluster.connect('mykeyspace')

If that doesn't help, then it might be time to check your data model for anti-patterns.
Unicorn worker timeout for no reason intermittently
I am running Unicorn with a standard Ruby on Rails application on Ubuntu. The problem I am having is that intermittently my Unicorn processes will time out for no reason. The same request will work 90% of the time and time out the other 10% of the time. The Rails log looks like this:

2015-07-30 14:33:52,519 DEBG 'web' stdout output:
Jul 30 14:33:52 3f48d861780b rails[148]: Started GET "/request_url" for 00.00.00.00 at 2015-07-30 14:33:52 +0000

2015-07-30 14:34:23,257 DEBG 'web' stderr output:
E, [2015-07-30T14:34:23.252900 #18] ERROR -- : worker=4 PID:148 timeout (31s > 30s), killing
E, [2015-07-30T14:34:23.256257 #18] ERROR -- : reaped #<Process::Status: pid 148 SIGKILL (signal 9)> worker=4

2015-07-30 14:34:23,261 DEBG 'web' stderr output:
I, [2015-07-30T14:34:23.261294 #215] INFO -- : worker=4 ready

The nginx error.log shows this error:

2015/07/30 14:20:43 [error] 2368#0: *365 upstream prematurely closed connection while reading response header from upstream, client: 00.00.00.00, server: ~^(www\.)?(.+)$, request: "GET /request_url HTTP/1.1", upstream: "http://00.00.00.00:5000/request_url", host: "myapp.com"

Does anyone know what could be causing this?
RabbitMQ Generic server rabbit_disk_monitor terminating / eheap_alloc: Cannot allocate 229520 bytes of memory (of type "old_heap")
RabbitMQ crashed. It was working correctly for many days (10-15 days), and I don't understand why it crashed. I am using RabbitMQ 3.4.0 on Erlang 17.0.

Erlang created a dump file for the crash, which shows: eheap_alloc: Cannot allocate 229520 bytes of memory (of type "old_heap").

Also note that the RabbitMQ publish-subscribe message load is very low (max 1-2 messages/second), and messages are processed as they arrive, so RabbitMQ is almost empty all the time. The disk space and memory are also sufficient. More system info:

Limiting to approx 8092 file handles (7280 sockets)
Memory limit set to 6553MB of 16383MB total.
Disk free limit set to 50MB.

The RabbitMQ logs are as below.

=ERROR REPORT==== 18-Jul-2015::04:29:31 ===
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia",
                         50000000,28358258688,100,10000,
                         #Ref<0.0.106.70488>,false}
** Reason for termination ==
** {eacces,[{erlang,open_port,
             [{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
              [stream,in,eof,hide]],
             []},
            {os,cmd,1,[{file,"os.erl"},{line,204}]},
            {rabbit_disk_monitor,get_disk_free,2,[]},
            {rabbit_disk_monitor,internal_update,1,[]},
            {rabbit_disk_monitor,handle_info,2,[]},
            {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},
            {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}

=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
                   [{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
                    [stream,in,eof,hide]],
                   []},
                  {os,cmd,1,[{file,"os.erl"},{line,204}]},
                  {rabbit_disk_monitor,get_disk_free,2,[]},
                  {rabbit_disk_monitor,init,1,[]},
                  {gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
                  {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}},
 17179336704}

=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
                   [{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
                    [stream,in,eof,hide]],
                   []},
                  {os,cmd,1,[{file,"os.erl"},{line,204}]},
                  {rabbit_disk_monitor,get_disk_free,2,[]},
                  {rabbit_disk_monitor,init,1,[]},
                  {gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
                  {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}},
 17179336704}

=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.167.0>
    registered_name: rabbit_disk_monitor
    exception exit: {eacces,
                     [{erlang,open_port,
                       [{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
                        [stream,in,eof,hide]],
                       []},
                      {os,cmd,1,[{file,"os.erl"},{line,204}]},
                      {rabbit_disk_monitor,get_disk_free,2,[]},
                      {rabbit_disk_monitor,internal_update,1,[]},
                      {rabbit_disk_monitor,handle_info,2,[]},
                      {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},
                      {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
      in function gen_server:terminate/6 (gen_server.erl, line 746)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
    messages: []
    links: [<0.166.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 4185
    stack_size: 27
    reductions: 481081978
  neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    child_terminated
     Reason:     {eacces,
                  [{erlang,open_port,
                    [{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
                     [stream,in,eof,hide]],
                    []},
                   {os,cmd,1,[{file,"os.erl"},{line,204}]},
                   {rabbit_disk_monitor,get_disk_free,2,[]},
                   {rabbit_disk_monitor,internal_update,1,[]},
                   {rabbit_disk_monitor,handle_info,2,[]},
                   {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},
                   {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
     Offender:   [{pid,<0.167.0>},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,{transient,1}},
                  {shutdown,4294967295},
                  {child_type,worker}]

=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.24989.51>
    registered_name: []
    exception exit: unsupported_platform
      in function gen_server:init_it/6 (gen_server.erl, line 322)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
    messages: []
    links: [<0.166.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 650
  neighbours:

=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    start_error
     Reason:     unsupported_platform
     Offender:   [{pid,<0.167.0>},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,{transient,1}},
                  {shutdown,4294967295},
                  {child_type,worker}]

=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.24991.51>
    registered_name: []
    exception exit: unsupported_platform
      in function gen_server:init_it/6 (gen_server.erl, line 322)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
    messages: []
    links: [<0.166.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 650
  neighbours:

=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    start_error
     Reason:     unsupported_platform
     Offender:   [{pid,{restarting,<0.167.0>}},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,{transient,1}},
                  {shutdown,4294967295},
                  {child_type,worker}]
From the error message, RabbitMQ can't open more files due to system limits. You can raise the max open files limit to avoid the problem. https://serverfault.com/questions/249477/windows-server-2008-r2-max-open-files-limit
There are two unrelated errors here: one is the VM's failure to allocate memory; the other is the disk space monitor terminating.

The disk space monitor is optional, and on some less common platforms, or with specific security restrictions, it is known to fail. That does not bring the VM down, and it certainly has nothing to do with heap allocation failures.

The heap allocation failure typically comes down to the two most common causes:
- A known bug fixed in Erlang 17.x (I don't recall which specific patch release, so use 17.5)
- Running 32-bit Erlang/OTP on a 64-bit OS

Chen Yu's comment about the EACCES system call error is correct.
I get an analogous error from the systemd unit activation check for "rabbitmq-server.service":

eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").

Crash dump is being written to: erl_crash.dump...done

ulimit -a:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514979
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 514979
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

This is the crash dump:

=erl_crash_dump:0.5
Wed Dec 2 17:16:31 2020
Slogan: eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").
System version: Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:512] [kernel-poll:true]
Compiled: Mon Feb 5 17:34:00 2018
Taints: crypto,asn1rt_nif,erl_tracer,zlib
Atoms: 34136
Calling Thread: scheduler:0
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:2
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process:
=scheduler:3
Scheduler Sleep Info Flags:
Scheduler Sleep Info Aux Work: DELAYED_AW_WAKEUP | DD | THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process: <0.12306.0>
Current Process State: Running
Current Process Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING | TRAP_EXIT | ON_HEAP_MSGQ
Current Process Program counter: 0x00007f2f3ab3a060 (unknown function)
Current Process CP: 0x0000000000000000 (invalid)
Current Process Limited Stack Trace:
0x00007f2b50252d68:SReturn addr 0x32A6EC98 (rabbit_channel:handle_method/3 + 6712)
0x00007f2b50252d78:SReturn addr 0x32A69630 (rabbit_channel:handle_cast/2 + 4160)
0x00007f2b50252df8:SReturn addr 0x51102708 (gen_server2:handle_msg/2 + 1808)
0x00007f2b50252e28:SReturn addr 0x3FD85E70 (proc_lib:init_p_do_apply/3 + 72)
0x00007f2b50252e48:SReturn addr 0x7FFB4948 (<terminate process normally>)
=scheduler:4
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING