I have "mariadb" set to 127.0.0.1 in my /etc/hosts file and sidekiq occasionally throws errors such as:
Mysql2::Error::ConnectionError: Unknown MySQL server host 'mariadb' (16)
The VM is not under significant load or anything like that.
Later edit: it seems other gems have trouble resolving hosts too:
WARN -- : Unable to record event with remote Sentry server (Errno::EBUSY - Failed to open TCP connection to XXXX.ingest.sentry.io:443 (Device or resource busy - getaddrinfo)):
Anyone have any idea why that may happen?
I figured this out a couple of weeks ago but wanted to be sure before posting an answer.
I still can't figure out the mechanics of this issue, but it was caused by fail2ban.
I had it running in a container, polling the httpd logs and blocking the tremendous number of bots scraping my sites.
I also increased the maximum number of file handles and inotify watches:
fs.file-max = 131070
fs.inotify.max_user_watches = 65536
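For completeness, a minimal sketch of persisting and applying those settings without a reboot (the drop-in filename is my assumption; any file under /etc/sysctl.d works):
# persist the settings (filename is arbitrary)
printf 'fs.file-max = 131070\nfs.inotify.max_user_watches = 65536\n' | sudo tee /etc/sysctl.d/99-tuning.conf
# load all sysctl config files without rebooting
sudo sysctl --system
# verify the live values
sysctl fs.file-max fs.inotify.max_user_watches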
As soon as I got rid of fail2ban and increased the inotify limits, the errors disappeared.
Obviously fail2ban goes on the "do not touch" list because of this, and we've rolled out a 404/403/500 handler at the application layer that pushes unknown IPs to Cloudflare.
Although this is probably an edge case, I'm leaving this here in the hope that it helps someone at some point.
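For the curious, a rough sketch of the kind of application-layer handler described above, written as Rack middleware. The Cloudflare endpoint follows their IP Access Rules API as I understand it, and the env var names, class name, and status list are illustrative assumptions, not our exact code:
require 'net/http'
require 'json'

# Rack middleware: when a request ends in 403/404/500, report the client IP
# to Cloudflare as a challenge rule (sketch only)
class BotReporter
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    report(env['REMOTE_ADDR']) if [403, 404, 500].include?(status.to_i)
    [status, headers, body]
  end

  private

  def report(ip)
    uri = URI("https://api.cloudflare.com/client/v4/zones/#{ENV['CF_ZONE_ID']}/firewall/access_rules/rules")
    req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json',
                                   'Authorization' => "Bearer #{ENV['CF_API_TOKEN']}")
    req.body = { mode: 'challenge', configuration: { target: 'ip', value: ip } }.to_json
    Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
  end
end
In a real deployment you'd queue these calls (e.g. via Sidekiq) rather than blocking the request thread on an HTTP round-trip.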
I have a Storm cluster with 1 nimbus node and 3 supervisor nodes, running in Docker containers on AWS EC2 instances. I had a topology running with the number of workers equal to 3, and it ran perfectly fine. I stopped and removed this container and started a new one. After this, I see the following error in the supervisor logs:
2016-10-03 21:18:22 b.s.m.n.Client [ERROR] connection attempt 129 to Netty-Client-hostname:6702 failed: java.lang.RuntimeException: Returned channel was actually not established
I have edited "/etc/hosts" to include the hostname as follows:
IP-address hostname
Yet the problem seems to persist, although the same topology runs perfectly fine with the number of workers set to 1. Any pointers on solving this issue are appreciated.
The problem was with the hostname. I changed the hostname to match the DNS name by updating /etc/hostname as well as /etc/hosts, and then rebooted the nimbus instance followed by the supervisor instances. This fixed the problem. Hope this helps anyone who is stuck with the same problem!
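A sketch of those steps, with 10.0.0.1 and node1.example.com standing in as hypothetical values:
# make the hostname match the DNS name
echo 'node1.example.com' | sudo tee /etc/hostname
echo '10.0.0.1 node1.example.com' | sudo tee -a /etc/hosts
sudo reboot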
Please check your supervisor log; sometimes you need to redeploy the app because the supervisor has not started the topology yet.
Sometimes my ArangoDB goes down with the following error:
Error message 'Could not connect to 'tcp://127.0.0.1:8529' 'connect() failed with #10061
I can't understand the reason. It looks like I just turn on my PC and nothing works.
Previously I fixed this problem by reinstalling, but is there a better solution?
OS: Windows
ArangoDB: 2.8.7
The V8 version used in pre-3.0 ArangoDB had occasional trouble with garbage collection, which would in turn make ArangoDB go down.
This is fixed with ArangoDB 3.
Please upgrade your installation, and report back whether the problem still persists.
You can use netstat to check whether ArangoDB is listening to its default port 8529:
netstat -a
Active Connections
Proto  Local Address    Remote Address    State
...
TCP 127.0.0.1:8529 meschenich:0 LISTEN
...
If that's not the case, your client has nothing to connect to.
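On Windows you can narrow the output down to the port in question (findstr ships with Windows):
netstat -ano | findstr :8529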
This could be due to a firewall or an antivirus.
In my case it was Avast antivirus that was blocking connections to that port.
I disabled all the antivirus shields and tried loading the ArangoDB web interface:
http://127.0.0.1:8529
It connected after a few minutes.
Reference: No connection could be made because the target machine actively refused it
I fixed the problem by restarting Windows.
I am totally new to Cassandra and met the following error when using cqlsh:
cqlsh
Connection error: Could not connect to localhost:9160
I read the solutions from the following link and tried them all, but none of them works for me.
How to connect Cassandra to localhost using cqlsh?
I am working on CentOS 6.5 and installed Cassandra 2.0 using yum install dsc20.
I ran into the same issue running the same OS with the same install method. While the cassandra service claimed that it was starting OK, service cassandra status told me the process was dead. Here are the steps I took to fix it:
Viewing the log file at /var/log/cassandra/cassandra.log told me that my heap size was too small. Manually set the heap size in /etc/cassandra/conf/cassandra-env.sh:
MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="256M"
Tips on setting the heap size for your system can be found here
Next, the error log claimed the stack size was too small. Once again in /etc/cassandra/conf/cassandra-env.sh, find the line that looks like JVM_OPTS="$JVM_OPTS -Xss128k" and raise that number, e.g. to JVM_OPTS="$JVM_OPTS -Xss256k".
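If you prefer a one-liner, something like this sed edit should do it (-i.bak keeps a backup of the original file):
sudo sed -i.bak 's/-Xss128k/-Xss256k/' /etc/cassandra/conf/cassandra-env.sh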
Lastly, the log complained that the local URL was malformed and threw a Java exception. I found the answer to the last part here. Basically, you want to manually bind your server's hostname in your /etc/hosts file:
127.0.0.1 localhost localhost.localdomain server1.example.com
Hope this helps~
In /etc/cassandra/cassandra.yaml, change:
# Whether to start the thrift rpc server.
start_rpc: false
to
start_rpc: true
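Then restart Cassandra and check that the Thrift port is listening before retrying cqlsh (9160 is the default Thrift port on Cassandra 2.x):
sudo service cassandra restart
netstat -ant | grep 9160
cqlsh localhost 9160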
I am frequently getting a PG connection timeout error in my application. I tried to fix it by upgrading the system memory and updating the pg gem, but the issue still occurs. Any help will be really appreciated.
(ActiveRecord::StatementInvalid) "PGError: server closed the
connection unexpectedly\n\tThis probably means the server terminated
abnormally\n\tbefore or while processing the request.\n: BEGIN"
Both the client and server think that the other vanished unexpectedly. This suggests that you're having networking problems. Look into the network between client and server:
NAT routers or connection-tracking stateful firewalls with short-lived or undersized connection tables;
Physical connectivity problems with cables, WiFi, etc
Faulty switches, hubs and routers
Buggy host-based software firewalls
... etc
For what it's worth, the easiest solution is ActiveRecord::Base.connection.reconnect! once connectivity is re-established.
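A minimal Ruby sketch of that approach; the wrapper and the Order model are illustrative, not a Rails API:
# retry a statement once after re-opening a dropped connection
def with_reconnect
  yield
rescue ActiveRecord::StatementInvalid => e
  raise unless e.message.include?('server closed the connection')
  ActiveRecord::Base.connection.reconnect!  # re-establish the connection
  yield                                     # retry the block once
end

with_reconnect { Order.count }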
At work we're running some high-traffic sites in Rails. We often get a problem with the following being spammed in the nginx error log:
2011/05/24 11:20:08 [error] 90248#0: *468577825 connect() to unix:/app_path/production/shared/system/unicorn.sock failed (61: Connection refused) while connecting to upstream
Our setup is nginx on the frontend server (load balancing), and unicorn on our 4 app servers. Each unicorn is running with 8 workers. The setup is very similar to the one GitHub uses.
Most of our content is cached, and when the request hits nginx it looks for the page in memcached and serves it if it can find it - otherwise the request goes to Rails.
I can solve the above issue - SOMETIMES - by doing a pkill of the unicorn processes on the servers, followed by:
cap production unicorn:check (removing all the pids)
cap production unicorn:start
Do you guys have any clue how I can debug this issue? We don't have any significantly high load on our database server when these problems occur.
Something killed your unicorn process on one of the servers, or it timed out. Or you have an old app server in your upstream app_server { } block that is no longer valid. Nginx will retry it from time to time. The default is to re-try another upstream if it gets a connection error, so hopefully your clients didn't notice anything.
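For reference, the kind of upstream block meant here, sketched with the socket path from the error above; a stale server entry in this list produces exactly that "Connection refused":
upstream app_server {
    # every entry must point at a live unicorn socket or host
    server unix:/app_path/production/shared/system/unicorn.sock fail_timeout=0;
}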
I don't think this is an nginx issue for me; restarting nginx didn't help. It seems to be gunicorn. A quick-and-dirty way to avoid this is to recycle the gunicorn instances when the system is not being used, say at 1 AM, if that is an acceptable maintenance window. I run gunicorn as a service that will come back up if killed, so a pkill script takes care of the recycle/respawn:
# start at the normal multi-user runlevels, stop on halt/reboot
start on runlevel [2345]
stop on runlevel [06]
# bring the service back up if it dies, but give up
# if it respawns more than 10 times within 5 seconds
respawn
respawn limit 10 5
exec /var/web/proj/server.sh
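The nightly recycle can then be a single cron entry (a sketch; how gunicorn shows up in the process table is an assumption about this setup):
# /etc/cron.d/recycle-gunicorn: kill gunicorn at 1 AM; upstart respawns it
0 1 * * * root pkill -f gunicorn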
I am starting to wonder if this is at all related to memory allocation. I have MongoDB running on the same system and it reserves all the memory for itself but it is supposed to yield if other applications require more memory.
Other things worth trying are getting rid of eventlet or other dependent modules when running gunicorn. uWSGI can also be used as an alternative to gunicorn.