rails, pg exception from sidekiq - ruby-on-rails

please help resolve issue with
Message
PG::UnableToSend: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.
Traceback
PG::UnableToSend: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
full log: https://gist.githubusercontent.com/igorkasyanchuk/cc96dba31b0b0f83b77faadf077433e8/raw/70af94105b0a7649b6ad0dde086ac539902d1d3f/gistfile1.txt
sample code:
class RedZoneCrossedWorker < Worker
def perform(event_id)
Chewy.strategy(:atomic) do
puts "#{self.class.name} Performing: #{event_id} #{jid}"
event = RedZoneCrossedShipmentEvent.find_by(id: event_id)
shipment = event.try(:shipment)
if shipment
# generate driver/oo payments
shipment.generate_red_zone_payments
# send mails and notifications if red zone crossed successfully
notifier = event.successfully_completed? ? 'ShipmentNotifiers::ShipmentRedZoneSuccessNotifier' : 'ShipmentNotifiers::ShipmentRedZoneViolationsNotifier'
ShipmentNotificationWorker.new.perform(event.shipment_id, notifier)
end
end
end
end
we have 2 app servers + 1 db. we have pool 25 and workers 20, rails, sidekiq, pg.
it happen only with sidekiq background workers.
thanks for advices

Looks like problem was solved using:
production:
....
pool: 20
variables:
tcp_keepalives_idle: 60
tcp_keepalives_interval: 60
tcp_keepalives_count: 100

Related

Rails app hangs when there's a server sent event (SSE) that requires database actions

I'm learning & doing SSE for the first time in rails! My controller code:
def update
response.headers['Content-Type'] = 'text/event-stream'
sse = SSE.new(response.stream, event: 'notice')
begin
User.listen_to_creation do |user_id|
sse.write({id: user_id})
end
rescue ClientDisconnected
ensure
sse.close
end
end
Front end:
var source = new EventSource('/site_update');
source.addEventListener('notice', function(event) {
var data = JSON.parse(event.data)
console.log(data)
});
Model pub/sub
class User
after_commit :notify_creation, on: :create
def notify_creation
ActiveRecord::Base.connection_pool.with_connection do |connection|
self.class.execute_query(connection, ["NOTIFY user_created, '?'", id])
end
end
def self.listen_to_creation
ActiveRecord::Base.connection_pool.with_connection do |connection|
begin
execute_query(connection, ["LISTEN user_created"])
connection.raw_connection.wait_for_notify do |event, pid, id|
yield id
end
ensure
execute_query(connection, ["UNLISTEN user_created"])
end
end
end
def self.clean_sql(query)
sanitize_sql(query)
end
private
def self.execute_query(connection, query)
sql = self.clean_sql(query)
connection.execute(sql)
end
end
I've noticed that if I'm writing to SSE, something trivial like in a tutorial like... sse.write({time_now: Time.now}), everything works great. In command line, CTRL+C successfully shuts down the local server.
However, whenever I need to write something that requires some kind of database action, for example when I'm doing a postgres pub/sub as in this tutorial, then CTRL+C doesn't shut down the local server, it's just stuck and hangs and requires me to manually kill the PID.
On the actual spun up server, sometimes a page refresh will hang forever as well. Other times, it will throw a timeout error:
ActiveRecord::ConnectionTimeoutError (could not obtain a connection from the pool within 5.000 seconds (waited 5.001 seconds); all pooled connections were in use):
Unfortunately this issue persists in production as well, where i'm using Heroku. I just get lots of timeout errors. But I think I have Heroku properly configured, and also local settings... my understanding is I just need to have a sizable pool (I have 5) to pull connections from and allow multiple threads. Below you'll find some config code.
THERE ARE NO ENV VARIABLES, DEFAULTS USED!
# config/database.yml
default: &default
adapter: postgresql
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
timeout: 5000
development:
<<: *default
database: proper_development
# config/puma.rb
workers Integer(ENV['WEB_CONCURRENCY'] || 1)
threads_count = Integer(ENV['MAX_THREADS'] || 5)
threads threads_count, threads_count
preload_app!
rackup DefaultRackup
port ENV['PORT'] || 3000
environment ENV['RACK_ENV'] || 'development'
on_worker_boot do
# Worker specific setup for Rails 4.1+
# See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
ActiveRecord::Base.establish_connection
end
If it's helpful here's the output when I run rails s
=> Booting Puma
=> Rails 5.0.2 application starting in development on http://localhost:3000
=> Run `rails server -h` for more startup options
Puma starting in single mode...
* Version 4.3.3 (ruby 2.4.0-p0), codename: Mysterious Traveller
* Min threads: 0, max threads: 16
* Environment: development
* Listening on tcp://127.0.0.1:3000
* Listening on tcp://[::1]:3000
Use Ctrl-C to stop
The issue here seems to be the lack of consistency between the puma threads and the database connections. If some connection was initiated by middleware etc through AR, the code you have written can lead to two connections being held in the same request cycle until you receive a notification and the thread finishes its job. AR caches connections per thread, so if a request was made and connection was checked out from the pool it will be held by that. Look at this issue for more details. If you end up using the connection pool to check out one more connection and make that connection wait till you get a notification from Postgres, potentially two connections can be held by the same Puma thread that is waiting.
To see this in action, start a new Rails server instance in development and make a request to your SSE endpoint. If you were getting timeouts before you might see two connections to Postgres while you made just one request to a newly launched server. So, even though your number of threads and connection pool size was same, you might run out of free connections from the pool. An easier way might be to just add this line in development after you checkout a connection to see how many cached connections are being held right now.
def self.listen_to_creation
ActiveRecord::Base.connection_pool.with_connection do |connection|
# Print or log this array
p ActiveRecord::Base.connection_pool.instance_variable_get(:#thread_cached_conns).keys.map(&:object_id)
begin
execute_query(connection, ["LISTEN user_created"])
.........
.........
Also, the snippets you have posted seem to indicate you are running up to 16 threads on a connection pool of size 5 in development environment, so that is a different issue.
To fix this, you would want to investigate which thread is holding the connection and if you can reuse it for your notification or just increase the DB pool size.
Now, coming to SSE itself here. Since a SSE connection blocks and reserves a thread in your current setup. If you have multiple requests to this endpoint you might quickly starve out of Puma threads itself making requests wait. This might work in case you are not expecting a lot of requests to this endpoint but if you are, you would need more free threads so you might even want to increase the Puma thread count. Ideally, though a non blocking server would work better here.
Edit:
Also, forgot to add that SSE in rails has an issue of keeping alive dead connections if it doesn't know the connection is dead. You might have threads endlessly waiting this way until some data comes and they realize the connection is no longer valid.

Handling Net::ReadTimeout error in HTTParty

I am using httparty (0.13.1) gem. I am making series of API calls using httparty. Some of my initial API calls succeeded, but the later calls fail consecutively. I have added a timeout of 180 seconds. I searched google but I can not find any solution still. I am struggling due to this for a long time.
My code:
response = HTTParty.get("http://pubapi.cryptsy.com/api.php?method=marketdatav2", timeout: 180)
Error:
A Net::ReadTimeout occurred in background at 2014-10-05 11:42:06 UTC :
I don't know whether this time out is working? I feel 180 seconds is more than enough to retrieve the response because the default timeout is 60 seconds. If I want to handle Net read timeout error, Is there a way to do it? I want to return nil if this error occurs.
Or Is there a best solution to avoid happening this error?
Had a similar problem with a different module, and in my case it likely succeeds if retried, so I catch the timeout and retry.
max_retries = 3
times_retried = 0
begin
#thing that errors sometimes
rescue Net::ReadTimeout => error
if times_retried < max_retries
times_retried += 1
puts "Failed to <do the thing>, retry #{times_retried}/#{max_retries}"
retry
else
puts "Exiting script. <explanation of why this is unlikely to recover>"
exit(1)
end
end
Left here in case someone else finds it helpful.
you could use rescue to handling your timeout exception:
begin
HTTParty.get("http://pubapi.cryptsy.com/api.php?method=marketdatav2", timeout: 180)
rescue Net::ReadTimeout
nil
end

Redis + ActionController::Live threads not dying

Background: We've built a chat feature in to one of our existing Rails applications. We're using the new ActionController::Live module and running Puma (with Nginx in production), and subscribing to messages through Redis. We're using EventSource client side to establish the connection asynchronously.
Problem Summary: Threads are never dying when the connection is terminated.
For example, should the user navigate away, close the browser, or even go to a different page within the application, a new thread is spawned (as expected), but the old one continues to live.
The problem as I presently see it is that when any of these situations occur, the server has no way of knowing whether the connection on the browser's end is terminated, until something attempts to write to this broken stream, which would never happen once the browser has moved away from the original page.
This problem seems to be documented on github, and similar questions are asked on StackOverflow here (pretty well exact same question) and here (regarding getting number of active threads).
The only solution I've been able to come up with, based on these posts, is to implement a type of thread / connection poker. Attempting to write to a broken connection generates an IOError which I can catch and properly close the connection, allowing the thread to die. This is the controller code for that solution:
def events
response.headers["Content-Type"] = "text/event-stream"
stream_error = false; # used by flusher thread to determine when to stop
redis = Redis.new
# Subscribe to our events
redis.subscribe("message.create", "message.user_list_update") do |on|
on.message do |event, data| # when message is received, write to stream
response.stream.write("messageType: '#{event}', data: #{data}\n\n")
end
# This is the monitor / connection poker thread
# Periodically poke the connection by attempting to write to the stream
flusher_thread = Thread.new do
while !stream_error
$redis.publish "message.create", "flusher_test"
sleep 2.seconds
end
end
end
rescue IOError
logger.info "Stream closed"
stream_error = true;
ensure
logger.info "Events action is quitting redis and closing stream!"
redis.quit
response.stream.close
end
(Note: the events method seems to get blocked on the subscribe method invocation. Everything else (the streaming) works properly so I assume this is normal.)
(Other note: the flusher thread concept makes more sense as a single long-running background process, a bit like a garbage thread collector. The problem with my implementation above is that a new thread is spawned for each connection, which is pointless. Anyone attempting to implement this concept should do it more like a single process, not so much as I've outlined. I'll update this post when I successfully re-implement this as a single background process.)
The downside of this solution is that we've only delayed or lessened the problem, not completely solved it. We still have 2 threads per user, in addition to other requests such as ajax, which seems terrible from a scaling perspective; it seems completely unattainable and impractical for a larger system with many possible concurrent connections.
I feel like I am missing something vital; I find it somewhat difficult to believe that Rails has a feature that is so obviously broken without implementing a custom connection-checker like I have done.
Question: How do we allow the connections / threads to die without implementing something corny such as a 'connection poker', or garbage thread collector?
As always let me know if I've left anything out.
Update
Just to add a bit of extra info: Huetsch over at github posted this comment pointing out that SSE is based on TCP, which normally sends a FIN packet when the connection is closed, letting the other end (server in this case) know that its safe to close the connection. Huetsch points out that either the browser is not sending that packet (perhaps a bug in the EventSource library?), or Rails is not catching it or doing anything with it (definitely a bug in Rails, if that's the case). The search continues...
Another Update
Using Wireshark, I can indeed see FIN packets being sent. Admittedly, I am not very knowledgeable or experienced with protocol level stuff, however from what I can tell, I definitely detect a FIN packet being sent from the browser when I establish the SSE connection using EventSource from the browser, and NO packet sent if I remove that connection (meaning no SSE). Though I'm not terribly up on my TCP knowledge, this seems to indicate to me that the connection is indeed being properly terminated by the client; perhaps this indicates a bug in Puma or Rails.
Yet another update
#JamesBoutcher / boutcheratwest(github) pointed me to a discussion on the redis website regarding this issue, specifically in regards to the fact that the .(p)subscribe method never shuts down. The poster on that site pointed out the same thing that we've discovered here, that the Rails environment is never notified when the client-side connection is closed, and therefore is unable to execute the .(p)unsubscribe method. He inquires about a timeout for the .(p)subscribe method, which I think would work as well, though I'm not sure which method (the connection poker I've described above, or his timeout suggestion) would be a better solution. Ideally, for the connection poker solution, I'd like to find a way to determine whether the connection is closed on the other end without writing to the stream. As it is right now, as you can see, I have to implement client-side code to handle my "poking" message separately, which I believe is obtrusive and goofy as heck.
A solution I just did (borrowing a lot from #teeg) which seems to work okay (haven't failure tested it, tho)
config/initializers/redis.rb
$redis = Redis.new(:host => "xxxx.com", :port => 6379)
heartbeat_thread = Thread.new do
while true
$redis.publish("heartbeat","thump")
sleep 30.seconds
end
end
at_exit do
# not sure this is needed, but just in case
heartbeat_thread.kill
$redis.quit
end
And then in my controller:
def events
response.headers["Content-Type"] = "text/event-stream"
redis = Redis.new(:host => "xxxxxxx.com", :port => 6379)
logger.info "New stream starting, connecting to redis"
redis.subscribe(['parse.new','heartbeat']) do |on|
on.message do |event, data|
if event == 'parse.new'
response.stream.write("event: parse\ndata: #{data}\n\n")
elsif event == 'heartbeat'
response.stream.write("event: heartbeat\ndata: heartbeat\n\n")
end
end
end
rescue IOError
logger.info "Stream closed"
ensure
logger.info "Stopping stream thread"
redis.quit
response.stream.close
end
I'm currently making an app that revolves around ActionController:Live, EventSource and Puma and for those that are encountering problems closing streams and such, instead of rescuing an IOError, in Rails 4.2 you need to rescue ClientDisconnected. Example:
def stream
#Begin is not required
twitter_client = Twitter::Streaming::Client.new(config_params) do |obj|
# Do something
end
rescue ClientDisconnected
# Do something when disconnected
ensure
# Do something else to ensure the stream is closed
end
I found this handy tip from this forum post (all the way at the bottom): http://railscasts.com/episodes/401-actioncontroller-live?view=comments
Here's a potentially simpler solution which does not use a heartbeat. After much research and experimentation, here's the code I'm using with sinatra + sinatra sse gem (which should be easily adapted to Rails 4):
class EventServer < Sinatra::Base
include Sinatra::SSE
set :connections, []
.
.
.
get '/channel/:channel' do
.
.
.
sse_stream do |out|
settings.connections << out
out.callback {
puts 'Client disconnected from sse';
settings.connections.delete(out);
}
redis.subscribe(channel) do |on|
on.subscribe do |channel, subscriptions|
puts "Subscribed to redis ##{channel}\n"
end
on.message do |channel, message|
puts "Message from redis ##{channel}: #{message}\n"
message = JSON.parse(message)
.
.
.
if settings.connections.include?(out)
out.push(message)
else
puts 'closing orphaned redis connection'
redis.unsubscribe
end
end
end
end
end
The redis connection blocks on.message and only accepts (p)subscribe/(p)unsubscribe commands. Once you unsubscribe, the redis connection is no longer blocked and can be released by the web server object which was instantiated by the initial sse request. It automatically clears when you receive a message on redis and sse connection to the browser no longer exists in the collection array.
Building on #James Boutcher, I used the following in clustered Puma with 2 workers, so that I have only 1 thread created for the heartbeat in config/initializers/redis.rb:
config/puma.rb
on_worker_boot do |index|
puts "worker nb #{index.to_s} booting"
create_heartbeat if index.to_i==0
end
def create_heartbeat
puts "creating heartbeat"
$redis||=Redis.new
heartbeat = Thread.new do
ActiveRecord::Base.connection_pool.release_connection
begin
while true
hash={event: "heartbeat",data: "heartbeat"}
$redis.publish("heartbeat",hash.to_json)
sleep 20.seconds
end
ensure
#no db connection anyway
end
end
end
Here you are solution with timeout that will exit blocking Redis.(p)subscribe call and kill unused connection tread.
class Stream::FixedController < StreamController
def events
# Rails reserve a db connection from connection pool for
# each request, lets put it back into connection pool.
ActiveRecord::Base.clear_active_connections!
# Last time of any (except heartbeat) activity on stream
# it mean last time of any message was send from server to client
# or time of setting new connection
#last_active = Time.zone.now
# Redis (p)subscribe is blocking request so we need do some trick
# to prevent it freeze request forever.
redis.psubscribe("messages:*", 'heartbeat') do |on|
on.pmessage do |pattern, event, data|
# capture heartbeat from Redis pub/sub
if event == 'heartbeat'
# calculate idle time (in secounds) for this stream connection
idle_time = (Time.zone.now - #last_active).to_i
# Now we need to relase connection with Redis.(p)subscribe
# chanel to allow go of any Exception (like connection closed)
if idle_time > 4.minutes
# unsubscribe from Redis because of idle time was to long
# that's all - fix in (almost)one line :)
redis.punsubscribe
end
else
# save time of this (last) activity
#last_active = Time.zone.now
end
# write to stream - even heartbeat - it's sometimes chance to
# capture dissconection error before idle_time
response.stream.write("event: #{event}\ndata: #{data}\n\n")
end
end
# blicking end (no chance to get below this line without unsubscribe)
rescue IOError
Logs::Stream.info "Stream closed"
rescue ClientDisconnected
Logs::Stream.info "ClientDisconnected"
rescue ActionController::Live::ClientDisconnected
Logs::Stream.info "Live::ClientDisconnected"
ensure
Logs::Stream.info "Stream ensure close"
redis.quit
response.stream.close
end
end
You have to use reds.(p)unsubscribe to end this blocking call. No exception can break this.
My simple app with information about this fix: https://github.com/piotr-kedziak/redis-subscribe-stream-puma-fix
Instead of sending a heartbeat to all the clients, it might be easier to just set a watchdog for each connection. [Thanks to #NeilJewers]
class Stream::FixedController < StreamController
def events
# Rails reserve a db connection from connection pool for
# each request, lets put it back into connection pool.
ActiveRecord::Base.clear_active_connections!
redis = Redis.new
watchdog = Doberman::WatchDog.new(:timeout => 20.seconds)
watchdog.start
# Redis (p)subscribe is blocking request so we need do some trick
# to prevent it freeze request forever.
redis.psubscribe("messages:*") do |on|
on.pmessage do |pattern, event, data|
begin
# write to stream - even heartbeat - it's sometimes chance to
response.stream.write("event: #{event}\ndata: #{data}\n\n")
watchdog.ping
rescue Doberman::WatchDog::Timeout => e
raise ClientDisconnected if response.stream.closed?
watchdog.ping
end
end
end
rescue IOError
rescue ClientDisconnected
ensure
response.stream.close
redis.quit
watchdog.stop
end
end
If you can tolerate a small chance of missing a message you can use subscribe_with_timeout:
sse = SSE.new(response.stream)
sse.write("hi", event: "hello")
redis = Redis.new(reconnect_attempts: 0)
loop do
begin
redis.subscribe_with_timeout(5 * 60, 'mycoolchannel') do |on|
on.message do |channel, message|
sse.write(message, event: 'message_posted')
end
end
rescue Redis::TimeoutError
sse.write("ping", event: "ping")
end
end
This code subscribes to a Redis channel, waits for 5 minutes, then closes connection to Redis and subscribes again.

ActiveResource EOFError on "slow" API

I'm seriously struggling to solve this one, any help would be appreciated!
I have two Rails apps, let's call them Client and Service, all very simple, normal REST interface - here's the basic scenario:
Client makes a POST /resources.json request to the Service
The Service runs a process which creates the resource and returns an ID to the Client
Again, all very simple, just that Service processing is very time-intensive and can take several minutes. If that happens, an EOFError is raised on the Client, exactly 60s after the request was made (no matter what the ActiveResource::Base.timeout is set to) while the service correctly processed the request and responds with 200/201. This is what we see in the logs (chronologically):
C 00:00:00: POST /resources.json
S 00:00:00: Received POST /resources.json => resources#create
C 00:01:00: EOFError: end of file reached
/usr/ruby1.8.7/lib/ruby/1.8/net/protocol.rb:135:in `sysread'
/usr/ruby1.8.7/lib/ruby/1.8/net/protocol.rb:135:in `rbuf_fill'
/usr/ruby1.8.7/lib/ruby/1.8/timeout.rb:62:in `timeout'
...
S 00:02:23: Response POST /resources.json, 201, after 143s
Obviously the service response never reached the client. I traced the error down to the socket level and recreated the scenario in a script, where I open a TCPSocket and try to retrieve data. Since I don't request anything, I shouldn't get anything back and my request should time out after 70 seconds (see full script at the bottom):
Timeout::timeout(70) { TCPSocket.open(domain, 80).sysread(16384) }
These were the results for a few domain:
www.amazon.com => Timeout after 70s
github.com => EOFError after 60s
www.nytimes.com => Timeout after 70s
www.mozilla.org => EOFError after 13s
www.googlelabs.com => Timeout after 70s
maps.google.com => Timeout after 70s
As you can see, some servers allowed us to "wait" for the full 70 seconds, while others terminated our connection, raising EOFErrors. When we did this test against our service, we (expectedly) got an EOFError after 60 seconds.
Does anyone know why this happens? Is there any way to prevent these or extend the server-side time-out? Since our service continues "working", even after the socket was closed, I assume it must be terminated on the proxy-level?
Every hint would be greatly appreciated!
PS: The full script:
require 'socket'
require 'benchmark'
require 'timeout'
def test_socket(domain)
puts "Connecting to #{domain}"
message = nil
time = Benchmark.realtime do
begin
Timeout::timeout(70) { TCPSocket.open(domain, 80).sysread(16384) }
message = "Successfully received data" # Should never happen
rescue => e
message = "Server terminated connection: #{e.class} #{e.message}"
rescue Timeout::Error
message = "Controlled client-side timeout"
end
end
puts " #{message} after #{time.round}s"
end
test_socket 'www.amazon.com'
test_socket 'github.com'
test_socket 'www.nytimes.com'
test_socket 'www.mozilla.org'
test_socket 'www.googlelabs.com'
test_socket 'maps.google.com'
I know this is nearly a year old, but in case anyone else finds this, I wanted to add a possible culprit.
Amazon's ELB will terminate idle connections at 60 seconds, so if you are using EC2 behind ELB, then ELB could be the server side problem.
the only "documentation" I could find here is https://forums.aws.amazon.com/thread.jspa?threadID=33427&start=50&tstart=50, but it's better than nothing
Each server decides when to close the connection. It depends on the server side software and its settings. You can't control that.

Mongodb server goes down, how to prevent Rails app from timing out?

I'm using central_logger to store logs from our Rails app in mongodb. When the mongo server went down recently our app started timing out on mongo inserts. How can I prevent Rails from timing out if the mongo server goes down?
The ruby driver supports timeouts like so
#conn = Connection.new("localhost", 27017, :pool_size => 5, :timeout => 5)
But the central_logger gem isn't using that. So you can either fork it to add that in there, or monkey-path the CentralLogger::MongoLogger.connect method
It currently has
def connect
#mongo_connection ||= Mongo::Connection.new(#db_configuration['host'],
#db_configuration['port'],
:auto_reconnect => true).db(#db_configuration['database'])
if #db_configuration['username'] && #db_configuration['password']
# the driver stores credentials in case reconnection is required
#authenticated = #mongo_connection.authenticate(#db_configuration['username'],
#db_configuration['password'])
end
end
You would need to monkey-path in :timeout=>5 (or whatever) to the Mongo::Connection.new
I would bet the author of central-logger would like to have this in there, so a fork and pull request would likely be welcome.
You could use replica sets - so if the master goes down, it can failover automatically to one of the replicas.
Usually the database insert should be fast, so you could work with the ruby timeout:
require 'timeout'
Timeout::timeout(0.2) do
... write to log server
end
this code will timeout and continue after 200 milliseconds in any case.

Resources