CouchDB/CouchRest Errno::ECONNREFUSED Connection refused - connect(2) error (ruby-on-rails)

At work, we have about 1,500 test cases, and we manually clean the database with the DB.recreate! method before each test (sketched below). When running the whole suite with bundle exec rake spec, all tests rarely pass: a number of tests towards the end of the suite fail with "Errno::ECONNREFUSED Connection refused - connect(2)" errors.
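For context, the per-test cleanup looks roughly like this (a minimal sketch assuming an RSpec suite and a CouchRest::Database handle named DB; the database URL and name are illustrative):

    require 'couchrest'

    # Assumed handle; CouchRest.database! creates the database if it is missing.
    DB = CouchRest.database!("http://127.0.0.1:5984/myapp_test")

    RSpec.configure do |config|
      config.before(:each) do
        # recreate! deletes the database and creates it again from scratch.
        DB.recreate!
      end
    end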
Any help would be much appreciated!
I am using CouchDB 1.3.1, Ubuntu 12.04 LTS, Ruby 1.9.3, and Rails 3.2.12.
Thanks,
EDIT
I looked at the log file more carefully and matched the time the tests started failing against the error messages generated in the CouchDB log:
[Fri, 16 Aug 2013 19:39:46 GMT] [error] [<0.23790.0>] ** Generic server <0.23790.0> terminating
** Last message in was {'EXIT',<0.23789.0>,killed}
** When Server state == {file,{file_descriptor,prim_file,{#Port<0.14445>,20}},
79}
** Reason for termination ==
** killed
[Fri, 16 Aug 2013 19:39:46 GMT] [error] [<0.23790.0>] {error_report,<0.31.0>,
{<0.23790.0>,crash_report,
[[{initial_call,{couch_file,init,['Argument__1']}},
{pid,<0.23790.0>},
{registered_name,[]},
{error_info,
{exit,killed,
[{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}},
{ancestors,[<0.23789.0>]},
{messages,[]},
{links,[]},
{dictionary,[]},
{trap_exit,true},
{status,running},
{heap_size,377},
{stack_size,24},
{reductions,916}],
[]]}}
[Fri, 16 Aug 2013 19:39:46 GMT] [error] [<0.23808.0>] {error_report,<0.31.0>,
{<0.23808.0>,crash_report,
[[{initial_call,
{couch_ref_counter,init,['Argument__1']}},
{pid,<0.23808.0>},
{registered_name,[]},
{error_info,
{exit,
{noproc,
[{erlang,link,[<0.23790.0>]},
{couch_ref_counter,'-init/1-lc$^0/1-0-',1},
{couch_ref_counter,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]},
[{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}},
{ancestors,[<0.23793.0>,<0.23792.0>,<0.23789.0>]},
{messages,[]},
{links,[]},
{dictionary,[]},
{trap_exit,false},
{status,running},
{heap_size,377},
{stack_size,24},
{reductions,114}],
[]]}}
[Fri, 16 Aug 2013 19:39:46 GMT] [error] [<0.103.0>] ** Generic server <0.103.0> terminating
** Last message in was {'EXIT',<0.88.0>,killed}
** When Server state == {db,<0.103.0>,<0.104.0>,nil,<<"1376681645837889">>,
<0.106.0>,<0.102.0>,<0.107.0>,
{db_header,6,1,0,
{1856,{1,0,1777},95},
{1951,1,83},
nil,0,nil,nil,1000},
1,
{btree,<0.102.0>,
{1856,{1,0,1777},95},
#Fun<couch_db_updater.10.55895019>,
#Fun<couch_db_updater.11.100913286>,
#Fun<couch_btree.5.25288484>,
#Fun<couch_db_updater.12.39068440>,snappy},
{btree,<0.102.0>,
{1951,1,83},
#Fun<couch_db_updater.13.114276184>,
#Fun<couch_db_updater.14.2340873>,
#Fun<couch_btree.5.25288484>,
#Fun<couch_db_updater.15.23651859>,snappy},
{btree,<0.102.0>,nil,
#Fun<couch_btree.3.20686015>,
#Fun<couch_btree.4.73514747>,
#Fun<couch_btree.5.25288484>,nil,snappy},
1,<<"_users">>,"/var/lib/couchdb/_users.couch",
[#Fun<couch_doc.8.106888048>],
[],nil,
{user_ctx,null,[],undefined},
nil,1000,
[before_header,after_header,on_file_open],
[create,
{before_doc_update,
#Fun<couch_users_db.before_doc_update.2>},
{after_doc_read,
#Fun<couch_users_db.after_doc_read.2>},
sys_db,
{user_ctx,
{user_ctx,null,[<<"_admin">>],undefined}},
nologifmissing,sys_db],
snappy,#Fun<couch_users_db.before_doc_update.2>,
#Fun<couch_users_db.after_doc_read.2>}
** Reason for termination ==
** killed

Ah, the power of the community. I got the following answer from someone on the CouchDB mailing list.
In short, the solution is to set delayed_commit to false (see the snippet below). It defaults to true, and rapidly recreating multiple databases at the beginning of each test case was creating a race condition (deleting a non-existent db, etc.).
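For CouchDB 1.x this lives in local.ini (on Ubuntu typically /etc/couchdb/local.ini) under the [couchdb] section; restart CouchDB after changing it:

    [couchdb]
    delayed_commit = false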
This definitely solved my problem.
One caveat is that it doubled our test suite's duration. That is another problem to tackle, but for now I am happy with all tests passing.

Related

Jenkins build failed on slave node with `java.io.EOFException`

The stacktrace is below:
Evacuated stdout
Starting Selenium nodes on ci2
March 18, 2019 11:04:00 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARN: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.Git$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
March 18, 2019 11:04:03 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARN: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
Slave JVM has not reported exit code. Is it still running?
[03/18/19 11:04:06] Launch failed - cleaning up connection
ERROR: Connection terminated
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
[03/18/19 11:04:06] [SSH] Connection closed.
I use JDK 8 and Jenkins 2.164.1 on Ubuntu 16.04.6.
How to fix this?
This issue has been bugging us as well. There seems to be a temporary workaround, not a solution: set the remote root directory of the slave node's configuration on the Jenkins server to a new path, and remoting will take care of the rest. But the issue seems to reoccur now and then, and we don't know the root cause yet. Any word is more than welcome.

Why are Google Pipeline VM instances hanging indefinitely?

I am using Dockerflow to run parallel tasks through the Google Pipelines API on Google Cloud Platform. I started a single-step task running 1389 VMs in parallel and found that 233 of the VMs were apparently doing nothing and hanging indefinitely.
I did a spot check of the serial console output and repeatedly saw the VMs running into "Getting controller config failed" errors.
When I tried logging into the VMs I received the error: "Connection Failed. We are unable to connect to the VM on port 22".
I am wondering why my VM instances are hanging, and if there is something I can do to avoid running into these issues.
I've included a snippet of the serial console output below:
startupscript: +++ readlink -f /usr/share/google-genomics/startup.sh
startupscript: ++ dirname /usr/share/google-genomics/startup.sh
startupscript: + cd /usr/share/google-genomics
startupscript: + ./controller --operation_id <id> --validation_token <token> --base_path https://genomics.googleapis.com
create controller[2905]: Getting controller config
create controller[2905]: Getting controller config failed, will retry: Get <link>: Get <service_account_token_link>: net/http: timeout awaiting response headers
create controller[2905]: Getting controller config failed, will retry: Get <link>: dial tcp 74.125.26.95:443: i/o timeout
collectd[2342]: write_gcm: Asking metadata server for auth token
collectd[2342]: write_gcm: curl_easy_perform() failed: Couldn't connect to server
collectd[2342]: write_gcm: Error -1 from wg_curl_get_or_post
collectd[2342]: write_gcm: wg_transmit_unique_segment failed.
collectd[2342]: write_gcm: wg_transmit_unique_segments failed. Flushing.
There was a temporary networking issue in us-east1-b; all three of the VMs above were in us-east1-b. These minor incidents do not appear on https://status.cloud.google.com/.
Serial console output for a successful run looks like:
Feb 21 19:05:06 ggp-5629907348021283130 startupscript: + ./controller --operation_id --validation_token --base_path https://autopush-genomics.sandbox.googleapis.com
Feb 21 19:05:06 ggp-5629907348021283130 create controller[2689]: Getting controller config
Feb 21 19:05:36 ggp-5629907348021283130 create controller[2689]: Getting controller config failed, will retry: Get https://genomics.googleapis.com/v1alpha2/pipelines:getControllerConfig?alt=json&operationId=&validationToken=: dial tcp 173.194.212.81:443: i/o timeout
Feb 21 19:05:43 ggp-5629907348021283130 controller[2689]: Switching to status: pulling-image
Feb 21 19:05:43 ggp-5629907348021283130 controller[2689]: Calling SetOperationStatus(pulling-image)
Feb 21 19:05:44 ggp-5629907348021283130 controller[2689]: SetOperationStatus(pulling-image) succeeded
The "Getting controller config failed, will retry" is fine. It succeeded upon retry. The "SetOperationStatus(pulling-image) succeeded" indicates networking is working.
In theory, you can submit any number of jobs to Pipelines API and the API will take care of queueing.
If these temporary networking hiccups become common, we may consider changing Pipelines API to somehow detect and retry.
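In the meantime, wrapping submission in a client-side retry with exponential backoff is a reasonable stopgap. A minimal Ruby sketch (submit_job is a hypothetical stand-in for whatever call actually submits your pipeline):

    require 'socket'

    # Retry transient network failures with exponential backoff.
    def with_retries(max_attempts: 5, base_delay: 2)
      attempts = 0
      begin
        yield
      rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT, SocketError
        attempts += 1
        raise if attempts >= max_attempts
        sleep(base_delay * 2**(attempts - 1)) # 2s, 4s, 8s, ...
        retry
      end
    end

    with_retries { submit_job } # submit_job is a hypothetical placeholder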
There may have been a temporary networking issue. Can you give me some failed operation IDs (or failed VM names)? Have you tried again since then, and can you reproduce the problem?

Connection reset by peer: FastCGI: comm with server aborted: read failed

I am using FastCGI on my dedicated server (Debian). I now get the following error intermittently (the behavior is totally random), resulting in a white page (error 500):
[Tue May 27 13:02:09 2014] [error] [client 85.68.183.29] (104)Connection reset by peer: FastCGI: comm with server "/var/www/php5.external" aborted: read failed, referer: [...]
[Tue May 27 13:02:09 2014] [error] [client 85.68.183.29] FastCGI: incomplete headers (0 bytes) received from server "/var/www/php5.external", referer: [...]
I cannot find any other errors linked to this (no PHP details, no MySQL error, nothing else!). Any idea how to prevent this ugly bug? Should I go back to mod_php5?
You might try following the suggestion on this page: https://groups.google.com/d/msg/highload-php-en/4F79Pco-2eg/_tfPMiLFzg4J
Copied here for reference: use the -idle-timeout parameter on the FastCgiExternalServer line to solve this problem. My FastCgiExternalServer line:

    FastCgiExternalServer /var/run/fastcgi/USERNAME-fcgi -appConnTimeout 10 -idle-timeout 250 -socket /var/run/fastcgi/USERNAME.socket -pass-header Authorization

More information is in the mod_fastcgi docs: http://www.fastcgi.com/mod_fastcgi/docs/mod_fastcgi.html
I had this issue as well. I figured out that there was a recursive dependency that resulted in not enough memory being available. Resolving the recursive dependency removed the issue.

mongooseIM- not able to use mod_vcard_odbc

I have set up MongooseIM successfully and want to use it with ODBC, but the mod_vcard_odbc module does not work properly. When I set a vCard, the following error occurs:
2014-01-02 19:35:22.192 [error] <0.369.0>#ejabberd_odbc:outer_transaction:400 SQL transaction restarts exceeded
** Restarts: 10
** Last abort reason: "#42S22Unknown column 'server' in 'where clause'"
** Stacktrace: [{ejabberd_odbc,sql_query_t,1,[{file,"src/ejabberd_odbc.erl"}, {line,138}]},{odbc_queries,update_t,4,[{file,"src/odbc_queries.erl"},{line,119}]},{ejabberd_odbc,outer_transaction,3,[{file,"src/ejabberd_odbc.erl"},{line,391}]},{ejabberd_odbc,run_sql_cmd,4,[{file,"src/ejabberd_odbc.erl"},{line,317}]},{p1_fsm,handle_msg,10,[{file,"src/p1_fsm.erl"},{line,542}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]
** When State == {state,<0.371.0>,mysql,30000,<<"localhost">>,1000,{0,{[],[]}}}
I don't know why this error occurs.

Rails errors causing Apache / Passenger internal server errors

OK, I thought I was close to getting Passenger and Apache working. After navigating to the URL to see if my Rails app was working, I noticed that some gems were not installed; Passenger's error page let me know which gems were missing, so I installed them.
Now, going to the URL, I get a 500 Apache internal error page with no helpful info, so I checked the log file on the server, and here is what I see:
Rails Error: Unable to access log file. Please ensure that /home/mydirectory/dev/vb/log/production.log exists and is chmod 0666. $
Rack: /home/mydirectory/dev/vb: symbol lookup error: /usr/local/rvm/gems/ruby-1.9.2-p0#prodset/gems/sqlite3-ruby-1.2.4/lib/sqlite$
[Tue Dec 07 20:12:17 2010] [error] [client 64.58.208.22] Premature end of script headers:
[ pid=20653 thr=140618873321280 file=ext/apache2/Hooks.cpp:816 time=2010-12-07 20:12:17.617 ]: The backend application (proce$
Rack: /home/mydirectory/dev/vb: symbol lookup error: /usr/local/rvm/gems/ruby-1.9.2-p0#prodset/gems/sqlite3-ruby-1.2.4/lib/sqlite$
[Tue Dec 07 20:12:43 2010] [error] [client 64.58.208.22] Premature end of script headers:
Rack: /home/mydirectory/dev/vb: symbol lookup error: /usr/local/rvm/gems/ruby-1.9.2-p0#prodset/gems/sqlite3-ruby-1.2.4/lib/sqlite$
[Tue Dec 07 20:13:25 2010] [error] [client 64.58.208.22] Premature end of script headers:
[ pid=21932 thr=140618873321280 file=ext/apache2/Hooks.cpp:816 time=2010-12-07 20:13:25.168 ]: The backend application (proce$
Rack: /home/mydirectory/dev/vb: symbol lookup error: /usr/local/rvm/gems/ruby-1.9.2-p0#prodset/gems/sqlite3-ruby-1.2.4/lib/sqlite$
[Tue Dec 07 20:13:31 2010] [error] [client 64.58.208.22] Premature end of script headers:
[ pid=20623 thr=140618873321280 file=ext/apache2/Hooks.cpp:816 time=2010-12-07 20:13:31.266 ]: The backend application (proce$
Rails Error: Unable to access log file. Please ensure that /home/mydirectory/dev/vb/log/production.log exists and is chmod 0666. $
Rack: /home/mydirectory/dev/vb: symbol lookup error: /usr/local/rvm/gems/ruby-1.9.2-p0#prodset/gems/sqlite3-ruby-1.2.4/lib/sqlite$
[Tue Dec 07 20:24:56 2010] [error] [client 64.58.208.22] Premature end of script headers:
[ pid=20622 thr=140618873321280 file=ext/apache2/Hooks.cpp:816 time=2010-12-07 20:24:56.442 ]: The backend application (proce$
Rack: /home/mydirectory/dev/vb: symbol lookup error: /usr/local/rvm/gems/ruby-1.9.2-p0#prodset/gems/sqlite3-ruby-1.2.4/lib/sqlite$
Does anyone have suggestions on what I should look at next? I have tried running Bundler and also using RVM to install sqlite3, and I still have the same issue.
Thanks again for any help.
Did you check the suggestion on the first line of the error log?
Rails Error: Unable to access log file. Please ensure that /home/mydirectory/dev/vb/log/production.log exists and is chmod 0666.
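In other words, make sure the file exists and is writable by the application. The paths below are taken from the log line above; adjust them to your app root:

    mkdir -p /home/mydirectory/dev/vb/log
    touch /home/mydirectory/dev/vb/log/production.log
    chmod 0666 /home/mydirectory/dev/vb/log/production.log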
