Trouble connecting a Rails 4 app to redis over an ssh tunnel - ruby-on-rails

We're running a redis server in EC2 and, to protect the connections, we established SSH tunnels from our production app machines to the redis instance. We used something like autossh -M 0 -f -NT -L6379:localhost:6379 -i /path/to/ec2.key root@ec2-xxxx.compute-1.amazonaws.com to establish persistent tunnels.
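(With -M 0, autossh relies on ssh's own keepalives to notice a dead tunnel, so it may be worth spelling those out explicitly - a variant of the same command, with the key path and host placeholders as above:)
autossh -M 0 -f -NT -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -L6379:localhost:6379 -i /path/to/ec2.key root@ec2-xxxx.compute-1.amazonaws.com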
Now here's the problem: this seems to work about 90% of the time, but occasionally, for whatever reason, redis throws Redis::CannotConnectError: Error connecting to Redis on localhost:6379 (ECONNREFUSED) when initializing Resque from environment.rb. The given box already has numerous redis connections from other processes (unicorn & resque threads), so I know the tunnel is working. If I rerun the command it's almost always successful and establishes the connection without a hitch.
We're running Rails 4, Resque 1.24.1, the redis gem 3.0.5, and redis-server 2.6.12 on CentOS 6 on an m2.xlarge.
Any ideas? When this happens during deploy it's extremely frustrating, as it kills capistrano; it's also causing sporadic failures in our nightly jobs that spin up their own Rails environments.
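For what it's worth, one way to paper over a tunnel that is momentarily down is to retry the initial connection instead of letting environment.rb fail outright. A minimal sketch, assuming the tunnel setup above (initializer name and retry counts are illustrative):
# config/initializers/resque.rb - retry the first connect while autossh
# re-establishes the tunnel, instead of failing the boot immediately
require "redis"
require "resque"

attempts = 0
begin
  redis = Redis.new(host: "localhost", port: 6379)
  redis.ping                 # forces a real TCP connect through the tunnel
  Resque.redis = redis
rescue Redis::CannotConnectError
  attempts += 1
  raise if attempts >= 5
  sleep(2**attempts)         # back off: 2s, 4s, 8s, 16s
  retry
end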

Related

Rescue from Error connecting to Redis on 127.0.0.1:6379 (Errno::ECONNREFUSED) for RoR

I know this error means the Redis server is down and that things start working again once the server is restarted. However, how do I rescue from this error to avoid a crash in production?
Error connecting to Redis on 127.0.0.1:6379 (Errno::ECONNREFUSED)
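For reference, the kind of rescue being asked about would look roughly like this (the job class and the handling are illustrative):
begin
  Resque.enqueue(HardJob, record.id)
rescue Redis::CannotConnectError => e
  Rails.logger.error("Redis unavailable, enqueue skipped: #{e.message}")
end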
There probably should not be a rescue for this issue, as that would mean that anywhere in your code you rescue, you would need to provide an alternative path for the rescue to take.
Sometimes pain helps us find and fix the source of a problem. Redis should not go down in production; if it does, it should be restarted automatically.
For production loads, consider using an infrastructure tool like Monit to monitor that Redis is up and running on the specified port, and to restart it automatically if it goes down. Or use a script, as in the answer linked below:
How Can I Restart The Redis Server If It Goes Down Automatically using a script?
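Such a script can be as simple as a cron-driven watchdog; a rough sketch (paths and the init script name will vary by setup):
#!/bin/bash
# run from cron every minute; restart redis if it stops answering PING
if ! redis-cli -h 127.0.0.1 -p 6379 ping | grep -q PONG; then
  /etc/init.d/redis restart
fi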

Production server using an outdated release of capistrano

I deployed a new version via capistrano to an Ubuntu 14.04 server, and now the Unicorn + Nginx setup is referring to a nonexistent release. I get ActionView::MissingTemplate errors and also an I18n::InvalidLocaleData error because it failed to load the devise.en.yml file.
I pretty much followed this repo. I already restarted nginx and unicorn, but I still get the same error: it's looking in a releases/<release_timestamp> directory that no longer exists.
You can at least confirm that Nginx is not part of the problem by directly connecting to the port or socket that Unicorn is listening on from within the server.
If Unicorn is running on a socket, see Can Curl send requests to sockets?.
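For example (the socket path and port are whatever your unicorn.rb binds; --unix-socket requires curl 7.40+):
curl -v --unix-socket /app/shared/unicorn.sock http://localhost/
curl -v http://127.0.0.1:8080/   # or, if Unicorn listens on TCP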

After a deploy to EC2 sidekiq now reports SocketError: getaddrinfo: Name or service not known

Application is Rails 4.1.4, Ruby 2.1.2.
Using sidekiq 3.2.6, redis 3.1.0, celluloid 0.15.2. The sidekiq implementation is as default as can be, with the exception of connecting to a remote redis queue (ElastiCache).
When certain events are processed, we use sidekiq to queue up calls to an external API. The API is reachable through curl from the server our application is hosted on. All other functionality seems to still be performing as expected. This functionality has worked for weeks on the current server implementation/architecture.
After a successful deploy (with Capistrano, through Jenkins) to an EC2 instance, which is behind an elastic load balancer and an auto-scaling group, sidekiq will no longer connect to ElastiCache.
SocketError: getaddrinfo: Name or service not known
/gems/redis-3.1.0/lib/redis/connection/ruby.rb:152 in getaddrinfo
/gems/redis-3.1.0/lib/redis/connection/ruby.rb:152 in connect
/gems/redis-3.1.0/lib/redis/connection/ruby.rb:211 in connect
/gems/redis-3.1.0/lib/redis/client.rb:304 in establish_connection
/gems/redis-3.1.0/lib/redis/client.rb:85 in block in connect
/gems/redis-3.1.0/lib/redis/client.rb:266 in with_reconnect
/gems/redis-3.1.0/lib/redis/client.rb:84 in connect
/gems/redis-3.1.0/lib/redis/client.rb:326 in ensure_connected
/gems/redis-3.1.0/lib/redis/client.rb:197 in block in process
/gems/redis-3.1.0/lib/redis/client.rb:279 in logging
/gems/redis-3.1.0/lib/redis/client.rb:196 in process
/gems/redis-3.1.0/lib/redis/client.rb:102 in call
/gems/redis-3.1.0/lib/redis.rb:1315 in block in smembers
/gems/redis-3.1.0/lib/redis.rb:37 in block in synchronize
/usr/local/rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/monitor.rb:211 in mon_synchronize
/gems/redis-3.1.0/lib/redis.rb:37 in synchronize
/gems/redis-3.1.0/lib/redis.rb:1314 in smembers
/gems/sidekiq-3.2.6/lib/sidekiq/api.rb:557 in block in cleanup
/gems/connection_pool-2.0.0/lib/connection_pool.rb:58 in with
/gems/sidekiq-3.2.6/lib/sidekiq.rb:72 in redis
/gems/sidekiq-3.2.6/lib/sidekiq/api.rb:556 in cleanup
/gems/sidekiq-3.2.6/lib/sidekiq/api.rb:549 in initialize
/gems/sidekiq-3.2.6/lib/sidekiq/scheduled.rb:79 in new
/gems/sidekiq-3.2.6/lib/sidekiq/scheduled.rb:79 in poll_interval
/gems/sidekiq-3.2.6/lib/sidekiq/scheduled.rb:58 in block in poll
/gems/sidekiq-3.2.6/lib/sidekiq/util.rb:15 in watchdog
/gems/sidekiq-3.2.6/lib/sidekiq/scheduled.rb:23 in poll
/gems/celluloid-0.15.2/lib/celluloid/calls.rb:25 in public_send
/gems/celluloid-0.15.2/lib/celluloid/calls.rb:25 in dispatch
/gems/celluloid-0.15.2/lib/celluloid/calls.rb:122 in dispatch
/gems/celluloid-0.15.2/lib/celluloid/actor.rb:322 in block in handle_message
/gems/celluloid-0.15.2/lib/celluloid/actor.rb:416 in block in task
/gems/celluloid-0.15.2/lib/celluloid/tasks.rb:55 in block in initialize
/gems/celluloid-0.15.2/lib/celluloid/tasks/task_fiber.rb:13 in block in create
We have restarted sidekiq, restarted ElastiCache, restarted the server, and inspected the redis queue with redis-cli, and seen nothing noteworthy.
As implied, we can connect to ElastiCache using redis-cli; however, using sidekiq/api from the console, we get the same SocketError.
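(The console repro is roughly the following; any Sidekiq::Queue call touches Redis and raises SocketError when the configured host doesn't resolve:)
require "sidekiq/api"
Sidekiq::Queue.new.size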
Any ideas on how to remedy this? The application is nigh unusable at this point.
Thanks!
Yay for embarrassing errors! There was a typo in the ENV var URL. Ten hours later, between me and devops, it turned out to be a copy-and-paste issue.
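A cheap guard against that class of mistake is to resolve the host from the ENV var at boot and fail loudly. A sketch, assuming the URL lives in REDIS_URL (the var name and the abort behavior are illustrative):
# config/initializers/sidekiq.rb
require "uri"
require "resolv"

url  = ENV.fetch("REDIS_URL")
host = URI.parse(url).host
begin
  Resolv.getaddress(host)   # raises Resolv::ResolvError on a typo'd hostname
rescue Resolv::ResolvError => e
  abort "REDIS_URL host #{host.inspect} does not resolve: #{e.message}"
end

Sidekiq.configure_server { |config| config.redis = { url: url } }
Sidekiq.configure_client { |config| config.redis = { url: url } }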

Sporadic Redis TimeoutError from Heroku Rails app

We're running a rails app via heroku that connects to a Windows Azure VM, where I've set up a redis master/slave to act as a cache (slash quick-reference data store). The problem is, we sporadically get redis timeouts. This is with a timeout of 10 seconds (which I know is more than it should need), reestablishing redis connections on fork, and using hiredis as the driver.
Anyone have a clue why this might be happening? I know heroku and the Azure VM are hosted on different coasts, so there's a bit of latency; could there be TCP request drops? I'm fairly out of ideas.
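For reference, the "reestablishing connections on fork" part usually looks something like this with Unicorn (the host is a placeholder; :timeout is in seconds in redis-rb 3.x):
# config/unicorn.rb
after_fork do |server, worker|
  $redis = Redis.new(host: "my-azure-vm.cloudapp.net", port: 6379,
                     driver: :hiredis, timeout: 10)
end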

Nginx + unicorn (rails) often gives "Connection refused" in nginx error log

At work we're running some high traffic sites in rails. We often get a problem with the following being spammed in the nginx error log:
2011/05/24 11:20:08 [error] 90248#0: *468577825 connect() to unix:/app_path/production/shared/system/unicorn.sock failed (61: Connection refused) while connecting to upstream
Our setup is nginx on the frontend server (load balancing), and unicorn on our 4 app servers. Each unicorn is running with 8 workers. The setup is very similar to the one GitHub uses.
Most of our content is cached, and when the request hits nginx it looks for the page in memcached and serves that if it can find it - otherwise the request goes to rails.
I can solve the above issue - SOMETIMES - by doing a pkill of the unicorn processes on the servers, followed by:
cap production unicorn:check (removing all the pids)
cap production unicorn:start
Do you guys have any clue how I can debug this issue? We don't have any significantly high load on our database server when these problems occur.
Something killed your unicorn process on one of the servers, or it timed out. Or you have an old app server in your upstream app_server { } block that is no longer valid. Nginx will retry it from time to time. The default is to retry another upstream if it gets a connection error, so hopefully your clients didn't notice anything.
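The upstream block in question looks something like this; a single stale or dead server entry here is enough to produce the log line above intermittently (the socket path is taken from the log, the commented entry is hypothetical):
upstream app_server {
  server unix:/app_path/production/shared/system/unicorn.sock fail_timeout=0;
  # a dead entry left over from an old topology causes sporadic refusals:
  # server 10.0.0.9:8080;
}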
I don't think this is an nginx issue for me - restarting nginx didn't help. It seems to be gunicorn... A quick and dirty way to avoid this is to recycle the gunicorn instances when the system is not being used, say at 1 AM, if that is an acceptable maintenance window. I run gunicorn as a service that will come back up if killed, so a pkill script takes care of the recycle/respawn:
start on runlevel [2345]
stop on runlevel [06]
respawn
respawn limit 10 5
exec /var/web/proj/server.sh
I am starting to wonder if this is at all related to memory allocation. I have MongoDB running on the same system and it reserves all the memory for itself but it is supposed to yield if other applications require more memory.
Other things worth trying are getting rid of eventlet or other dependent modules when running gunicorn. uWSGI can also be used as an alternative to gunicorn.
