I have a cluster of 4 nodes (3 servers and 1 management node) that was working properly.
At the beginning of the month we planned to patch the OS and we started from the first server node with this procedure:
Stop service
OS patching
Server restart
Start service
The service on the first patched node, "serverA", fails to restart with this error:
Log entries during cluster join:
serverA:
| INFO | region-dm-12 | ache.geode.internal.tcp.Connection | --> Connection: shared=true ordered=false failed to connect to peer 10.237.110.195( Server serverB:9993):1024 because: java.net.ConnectException: Connection timed out (Connection timed out)
| WARN | region-dm-12 | ache.geode.internal.tcp.Connection | --> Connection: Attempting reconnect to peer 10.237.110.195( Server serverB:9993):1024
ServerMgmt:
| WARN | pool-3-thread-1 | tributed.internal.ReplyProcessor21 | --> 15 seconds have elapsed while waiting for replies: <CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies from [10.237.110.194( Server serverA:632):1024]> on 10.237.110.225( Management:6033):1024 whose current membership list is: [[10.237.110.196( Server serverC:16805):1024, 10.237.110.225( Management:6033):1024, 10.237.110.195( Server serverB:9993):1024, 10.237.110.194( Server serverA:632):1024]]
The connection between the systems was verified with tcpdump; UDP traffic on port 1024 is working fine.
We have redeployed the service and retried many times, but we always get the same error during startup.
Any suggestions? Thank you.
Marco.
For this error message to appear, serverA was probably able to send UDP messages to serverB but is failing to create a TCP connection. It's hard to say why, though: a firewall issue, some TCP configuration issue, or something else.
Check whether serverB has anything interesting in its logs. Since you are using tcpdump, you should be watching for that TCP connection to serverB:9993, since it looks like that is what failed.
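To separate "UDP works" from "TCP works", it can help to attempt a plain TCP connection from serverA to serverB yourself. The following is a minimal Python sketch, not a Geode tool: the function name tcp_port_open and the script name are made up for illustration, and the port you pass in is an assumption (use whichever port the failed connection in your logs or tcpdump capture actually targets).

```python
import socket
import sys


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs a full TCP handshake, unlike a UDP probe.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Hypothetical usage: python tcp_check.py 10.237.110.195 <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    host, port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if tcp_port_open(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state} over TCP")
```

Running it in both directions (A to B and B to A) and comparing with the A to C result would show whether the problem is specific to serverB. A timeout rather than an immediate refusal would suggest packets are being dropped somewhere rather than rejected by the host.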
There is no firewall between the systems. We analyzed the network connection again during startup of node A, and we can see that communication can be established between all systems. What we did detect is that on port 2323, which is configured as the locator port, node A sends packets to nodes B and C but only receives packets back from node C, not from node B. To us this is another sign that node B has an issue. Is there a way to check our assumption from node B?
A node ip .194
B node ip .195
C node ip .196
Management ip .225
I got as far as step 7 of the getting-started guide:
https://coral.ai/docs/dev-board/get-started/#run-demo
After I hit the command edgetpu_demo --stream, the console has the message below and then the system rebooted.
INFO:edgetpuvision.streaming.server:Listening on ports tcp: 4665, web: 4664, annexb: 4666
INFO:edgetpuvision.streaming.server:New web connection from 192.168.1.6:62609
INFO:edgetpuvision.streaming.server:Number of active clients: 1
INFO:edgetpuvision.streaming.server:[192.168.1.6:62609] Rx thread finished
INFO:edgetpuvision.streaming.server:[192.168.1.6:62609] Tx thread finished
INFO:edgetpuvision.streaming.server:New web connection from 192.168.1.6:62610
INFO:edgetpuvision.streaming.server:Number of active clients: 2
INFO:edgetpuvision.streaming.server:[192.168.1.6:62609] Stopping...
INFO:edgetpuvision.streaming.server:[192.168.1.6:62609] Stopped.
INFO:edgetpuvision.streaming.server:Number of active clients: 1
INFO:edgetpuvision.streaming.server:New web connection from 192.168.1.6:62611
INFO:edgetpuvision.streaming.server:Number of active clients: 2
INFO:edgetpuvision.streaming.server:[192.168.1.6:62610] Rx thread finished
INFO:edgetpuvision.streaming.server:New web connection from 192.168.1.6:62612
INFO:edgetpuvision.streaming.server:Number of active clients: 3
INFO:edgetpuvision.streaming.server:[192.168.1.6:62611] Rx thread finished
Do I need to update some modules, such as GStreamer?
The board can reboot if it is not getting enough power. Please make sure to power the board with an adapter rated for at least 2.1 to 3 A.
My English is not good, please bear with me!
I use poolboy for my database connection pool, and I have read the README.md on GitHub: https://github.com/devinus/poolboy
But I cannot work out where I have already started poolboy: when I try to start it, I get the error already_started.
My project's files: http://pastebin.com/zus6dGdz
I use cowboy as my HTTP server, but you can ignore that.
I start the program like this:
1. I compile with rebar:
$ rebar clean & make
2. Then I run my program with erl:
$ erl -pa ebin/ -pa deps/*/ebin -s start server_start
But I get the following errors:
=CRASH REPORT==== 3-Feb-2015::17:47:27 ===
crasher:
initial call: poolboy:init/1
pid: <0.171.0>
registered_name: []
exception exit: {{badmatch,{error,{already_started,<0.173.0>}}},
[{poolboy,new_worker,1,
[{file,"src/poolboy.erl"},{line,260}]},
{poolboy,prepopulate,3,
[{file,"src/poolboy.erl"},{line,281}]},
{poolboy,init,3,[{file,"src/poolboy.erl"},{line,143}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,237}]}]}
in function gen_server:init_it/6 (gen_server.erl, line 330)
ancestors: [hello_erlang_sup,<0.66.0>]
messages: []
links: [<0.172.0>,<0.173.0>,<0.170.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 610
stack_size: 27
reductions: 205
neighbours:
neighbour: [{pid,<0.173.0>},
{registered_name,db_mongo_handler},
{initial_call,{db_mongo_handler,init,['Argument__1']}},
{current_function,{gen_server,loop,6}},
{ancestors,[<0.172.0>,mg_pool1,hello_erlang_sup,<0.66.0>]},
{messages,[]},
{links,[<0.172.0>,<0.174.0>,<0.171.0>]},
{dictionary,[]},
{trap_exit,false},
{status,waiting},
{heap_size,233},
{stack_size,9},
{reductions,86}]
Please help me solve this problem! Thanks!
You are starting a pool of 10 workers with the same registered name. When a process is registered with a name and another process tries to register with the same name, you get the error already_started.
In your example code, the worker module for poolboy is db_mongo_handler. Poolboy tries to start 10 workers by calling db_mongo_handler:start_link/1 which is implemented as
start_link(Args) ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, Args, []).
The first worker can start but when the second worker starts it crashes with already_started.
Normally the workers of a pool of many similar workers should not have a registered name. Instead, only the pool has a name and when you need a worker, you ask poolboy to deliver a pid() of one of the workers using poolboy:checkout(mg_pool1).
To fix the code, change gen_server:start_link({local, ?SERVER}, ?MODULE, Args, []) to gen_server:start_link(?MODULE, Args, []). Then it will not be registered with a name.
I'm getting a lot of errors on shutdown of my Erlang VM related to my cowboy handlers. I've got a simple_one_for_one supervisor running a start_listeners() function that runs cowboy:start_http().
Everything starts with no errors and handles requests normally.
If I shut down the Erlang VM, I get:
[error] Supervisor bitter_rpc_sup had child bitter_rpc_http_id started with bitter_rpc_sup:start_listeners() at undefined exit with reason killed in context shutdown_error
And a bunch of other errors related to the cowboy processes being killed and terminating abnormally. Does cowboy not follow OTP conventions for shutdown? Is there a way for me to intercept the shutdown at the supervisor and manually shut down all of the cowboy processes / ranch pool?
Where should I be looking to try and squash this error?
You can create a ranch child spec and add it to your supervisor:
init([]) ->
    %% define Ref, NbAcceptors, IP, Port, Dispatch
    ...
    WebChild = ranch:child_spec(Ref,
                                NbAcceptors,
                                ranch_tcp,
                                [{ip, IP}, {port, Port}],
                                cowboy_protocol,
                                [{env, [{dispatch, Dispatch}]}]),
    {ok, {{one_for_one, 10, 10}, [WebChild]}}.
Taking a hard look at the included Cowboy examples, the http server isn't supervised directly, but is running under the Cowboy application.
So I changed the supervisor for my rpc daemon to do nothing:
init([]) ->
    Procs = [],
    {ok, {{one_for_one, 10, 10}, Procs}}.
and instantiated the cowboy dispatcher in the main process, returning the empty supervisor from start(_, _).
My app generates some image data on the fly and sends it back to the browser with send_data some_huge_blob, :type => 'image/png'. This works well enough in development mode, but in production with nginx/passenger in the mix, it appears that passenger sometimes just crashes. Here is the debug output in my nginx log:
[ pid=596 thr=140172782794496 file=ext/common/ApplicationPool/Pool.h:1162 time=2011-07-25 23:15:14.965 ]: Exception occurred while connecting to checked out process 1428: Cannot connect to Unix socket '/tmp/passenger.1.0.589/generation-0/backends/ruby.kJRjXYuZteKoogZIufN8a2cDPdpbIlYmIr1hh3G9UV7GhKDB4pqZ5y0jR': Connection refused (111)
[ pid=596 thr=140172782794496 file=ext/common/ApplicationPool/Pool.h:685 time=2011-07-25 23:15:14.965 ]: Detaching process 1428
[ pid=596 thr=140172782794496 file=ext/common/ApplicationPool/../Process.h:138 time=2011-07-25 23:15:14.969 ]: Application process 1428 (0x2676ee0): destroyed.
[ pid=1405 thr=70178806733240 file=abstract_request_handler.rb:466 time=2011-07-25 23:15:14.982 ]: Accepting new request on main socket
2011/07/25 23:15:16 [error] 642#0: *96 upstream prematurely closed connection while reading response header from upstream, client: 173.8.216.57, server: app.somedomain.com, request: "GET /projects/4e2dee4c106a821bf2000008/revisions/1/assets/Layout2.psd/preview HTTP/1.1", upstream: "passenger:unix:/passenger_helper_server:", host: "app.somedomain.com"
Note that there is nothing in my production.log file that indicates the request even makes it to the app!
Any ideas? Or ideas as to how to debug this further? The connection refused bit is interesting...
For what it's worth, this is an Ubuntu image on a micro instance in AWS.