I have a fairly simple Rails app. It listens for requests in the form
example.com/items?access_key=my_secret_key
My application controller looks at the secret key to determine which user is making the call, looks up their database credentials, and connects to the appropriate database to get that person's items.
However we need to have this support multiple requests at a time, and Puma seems like everyone's favorite / the fastest server for us to use. We started running into problems when benchmarking it with ApacheBench. FYI, puma is configured to have 3 workers and min=1, max=16 threads.
If I were to run
ab -n 100 -c 10 127.0.0.1:3000/items?access_key=my_key
then this error is thrown with a whole lot of stack trace after it:
/home/user/.gem/ruby/2.0.0/gems/mysql2-0.3.16/lib/mysql2/client.rb:70: [BUG] Segmentation fault
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]
Edit: This line also appears in the enormous amount of info that the error contains:
*** glibc detected *** puma: cluster worker 1: 17088: corrupted double-linked list: 0x00007fb671ddbd60
And it looks to me like that's tripping multiple times. I have been unable to determine exactly when (on which requests) it trips.
The benchmarking seems to still finish, but it seems quite slow (from ab):
Concurrency Level: 10
Time taken for tests: 21.085 seconds
Complete requests: 100
Total transferred: 3620724 bytes
21 seconds for 3 megabytes? Even if mysql was being slow, that's... bad. But I think it's worse than that - the amount of data isn't high enough. There are no segfaults when I run concurrency 1, and the amount of data for -n 10 -c 1 is 17 megabytes. So puma is responding with some error page that I can't see - running 'curl address' gives me the expected data, and I can't manually do concurrency.
It gets worse when I run more requests or higher concurrency.
ab -n 1000 -c 10 127.0.0.1:3000/items?access_key=my_key
yields
apr_socket_recv: Connection reset by peer (104)
Total of 199 requests completed
and
ab -n 100 -c 50 127.0.0.1:3000/items?access_key=my_key
yields
apr_socket_recv: Connection reset by peer (104)
Total of 6 requests completed
Running top in another putty window shows me that very often (most times I try to benchmark) only one of the three workers puma created is performing any work. Rarely, all three do.
Because it seems like the error might be somewhere in here, I'll show you my application_controller. It's short, but the bulk of the application (which, like I said, is fairly simple).
class ApplicationController < ActionController::Base
# Prevent CSRF attacks by raising an exception.
# For APIs, you may want to use :null_session instead.
protect_from_forgery with: :exception
def get_yaml_params
YAML.load(File.read("#{APP_ROOT}/config/ecommerce_config.yml"))
end
def access_key_login
access_key = params[:access_key]
unless access_key
show_error("missing access_key parameter")
return false
end
access_info = get_yaml_params
unless client_login = access_info[access_key]
show_error("invalid access_key")
return false
end
status = ActiveRecord::Base.establish_connection(
:adapter => "mysql2",
:host => client_login["host"],
:username => client_login["username"],
:password => client_login["password"],
:database => client_login["database"]
)
end
def generate_json (columns, footer)
// config/application.rb includes the line
// require 'json'
query = "select"
columns.each do |column, name|
query += " #{column}"
query += " AS #{name}" unless column == name
query += ","
end
query = query[0..-2] # trim ','
query += " #{footer}"
dbh = ActiveRecord::Base.connection
results = dbh.select_all(query).to_hash
data = results.map do |result|
columns.map {|column, name| result[name]}
end
({"fields" => columns.values, "values" => data}).to_json
end
def show_error(msg)
render(:text => "Error: #{msg}\n")
nil
end
end
And an example of a controller that uses it
class CategoriesController < ApplicationController
def index
access_key_login or return
columns = {
"prd_type" => "prd_type",
"prd_type_description" => "description"
}
footer = "from cin_desc;"
json = generate_json(columns, footer)
render(:json => json)
end
end
That's pretty much it as far as custom code goes. I can't find anything making this not threadsafe, so I don't know what the cause of the segfaults is. I don't know why not all of the workers spin up when requests are made. I don't know what error is getting returned to ApacheBench. Thanks for helping, I can post more information as you need it.
It appears that the stable version of mysql2 library, 0.3.17, is NOT threadsafe. Until it is updated to be threadsafe, using it with multithreaded puma will be impossible. An alternative would be to use Unicorn.
Related
I have a rails instance which on average uses about 250MB of memory. Lately I'm having issues with some really heavy spikes of memory usage which results in a response time of about ~25s. I have an endpoint which takes some relative simple params and base64 strings which are being send over to AWS.
See the image below for the correlation between memory/response time.
Now, when I look at some extra logs what's specifically happening during that time, I found something interesting.
First of all, I find the net_http memory allocations extremely high. Secondly, the update operation took about 25 sec in total. When I closely look at the timeline, I noticed some "blank gaps", between ~5 and ~15 seconds. The specific operations that are being done during those HTTP calls is from my perspective nothing special. But I'm a bit confused why those gaps occur, maybe someone could tell me a bit about that?
The code that's handling the requests:
def store_documents
identity_documents.each do |side, content|
is_file = content.is_a?(ActionDispatch::Http::UploadedFile)
file_extension = is_file ? content : content[:name]
file_name = "#{SecureRandom.uuid}_#{side}#{File.extname(file_extension)}"
if is_file
write_to_storage_service(file_name, content.tempfile.path)
else
write_file(file_name, content[:uri])
write_to_storage_service(file_name, file_name)
delete_file(file_name)
end
store_object_key_on_profile(side, file_name)
end
end
# rubocop:enable Metrics/MethodLength
def write_file(file_name, base_64_string)
File.open(file_name, 'wb') do |f|
f.write(
Base64.decode64(
get_content(base_64_string)
)
)
end
end
def delete_file(file_name)
File.delete(file_name)
end
def write_to_storage_service(file_name, path)
S3_IDENTITY_BUCKET
.object(file_name)
.upload_file(path)
rescue Aws::Xml::Parser::ParsingError => e
log_error(e)
add_errors(base: e)
end
def get_content(base_64_string)
base_64_string.sub %r{data:((image|application)/.{3,}),}, ''
end
def store_object_key_on_profile(side, file_name)
profile.update("#{side}_identity_document_object_key": file_name)
end
def identity_documents
{
front: front_identity_document,
back: back_identity_document
}
end
def front_identity_document
#front_identity_document ||= identity_check_params[:front_identity_document]
end
def back_identity_document
#back_identity_document ||= identity_check_params[:back_identity_document]
end
I tend towards some issues with Ruby GC, or perhaps Ruby doesn't have enough pages available to directly store the base64 string in memory? I know that Ruby 2.6 and Ruby 2.7 had some large improvements regarding memory fragmentation, but that didn't change much either (currently running Ruby 2.7.1)
I have my Heroku resources configured to use Standard-2x dynos (1GB ram) x3. WEB_CONCURRENCY(workers) is set to 2, and amount of threads is set to 5.
I understand that my questions are rather broad, I'm more interested in some tooling, or ideas that could help to narrow my scope. Thanks!
I have a rails application running with puma server. Is there any way, we can see how many number of threads used in application currently ?
I was wondering about the same thing a while ago and came upon this issue. The author included the code they ended up using to collect those stats:
module PumaThreadLogger
def initialize *args
ret = super *args
Thread.new do
while true
# Every X seconds, write out what the state of this dyno is in a format that Librato understands.
sleep 5
thread_count = 0
backlog = 0
waiting = 0
# I don't do the logging or string stuff inside of the mutex. I want to get out of there as fast as possible
#mutex.synchronize {
thread_count = #workers.size
backlog = #todo.size
waiting = #waiting
}
# For some reason, even a single Puma server (not clustered) has two booted ThreadPools.
# One of them is empty, and the other is actually doing work
# The check above ignores the empty one
if (thread_count > 0)
# It might be cool if we knew the Puma worker index for this worker, but that didn't look easy to me.
# The good news: By using the PID we can differentiate two different workers on two different dynos with the same name
# (which might happen if one is shutting down and the other is starting)
source_name = "#{Process.pid}"
# If we have a dyno name, prepend it to the source to make it easier to group in the log output
dyno_name = ENV['DYNO']
if (dyno_name)
source_name="#{dyno_name}."+source_name
end
msg = "source=#{source_name} "
msg += "sample#puma.backlog=#{backlog} sample#puma.active_connections=#{thread_count - waiting} sample#puma.total_threads=#{thread_count}"
Rails.logger.info msg
end
end
end
ret
end
end
module Puma
class ThreadPool
prepend PumaThreadLogger
end
end
This code contains logic that is specific to heroku, but the core of collecting the #workers.size and logging it will work in any environment.
Major edit: Since originally finding this issue I have whittled it down to the below. I think this is now a marginally more precise description of the problem. Comments on the OP may therefore not correlate entirely.
Edit lightly modified version posted in rails/puma projects: https://github.com/rails/rails/issues/21209, https://github.com/puma/puma/issues/758
Edit Now reproduced with OS X and Rainbows
Summary: When using Puma and running long-running connections I am consistently receiving errors related to ActiveRecord connections crossing threads. This manifests itself in message like message type 0x## arrived from server while idle and a locked (crashed) server.
The set up:
Ubuntu 15 / OSX Yosemite
PostgreSQL (9.4) / MySQL (mysqld 5.6.25-0ubuntu0.15.04.1)
Ruby - MRI 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux] / Rubinius rbx-2.5.8
Rails (4.2.3, 4.2.1)
Puma (2.12.2, 2.11)
pg (pg-0.18.2) / mysql2
Note, not all combinations of the above versions have been tried. First listed version is what I'm currently testing against.
rails new issue-test
Add a route get 'events' => 'streaming#events'
Add a controller streaming_controller.rb
Set up database stuff (pool: 2, but seen with different pool sizes)
Code:
class StreamingController < ApplicationController
include ActionController::Live
def events
begin
response.headers["Content-Type"] = "text/event-stream"
sse = SSE.new(response.stream)
sse.write( {:data => 'starting'} , {:event => :version_heartbeat})
ActiveRecord::Base.connection_pool.release_connection
while true do
ActiveRecord::Base.connection_pool.with_connection do |conn|
ActiveRecord::Base.connection.query_cache.clear
logger.info 'START'
conn.execute 'SELECT pg_sleep(3)'
logger.info 'FINISH'
sse.write( {:data => 'continuing'}, {:event => :version_heartbeat})
sleep 0.5
end
end
rescue IOError
rescue ClientDisconnected
ensure
logger.info 'Ensuring event stream is closed'
sse.close
end
render nothing: true
end
end
Puma configuration:
workers 1
threads 2, 2
#...
bind "tcp://0.0.0.0:9292"
#...
activate_control_app
on_worker_boot do
require "active_record"
ActiveRecord::Base.connection.disconnect! rescue ActiveRecord::ConnectionNotEstablished
ActiveRecord::Base.establish_connection(YAML.load_file("#{app_dir}/config/database.yml")[rails_env])
end
Run the server puma -e production -C path/to/puma/config/production.rb
Test script:
#!/bin/bash
timeout 30 curl -vS http://0.0.0.0/events &
timeout 5 curl -vS http://0.0.0.0/events &
timeout 30 curl -vS http://0.0.0.0/events
This reasonably consistently results in a complete lock of the application server (in PostgreSQL, see notes). The scary message comes from libpq:
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle
message type 0x54 arrived from server while idle
In the 'real-world' I have quite a few extra elements and the issue presents itself at random. My research indicates that this message comes from libpq and is subtext for 'communication problem, possibly using connection in different threads'. Finally, while writing this up, I had the server lock up without a single message in any log.
So, the question(s):
Is the pattern I'm following not legal in some way? What have I mis[sed|understood]?
What is the 'standard' for working with database connections here that should avoid these problems?
Can you see a way to reliably reproduce this?
or
What is the underlying issue here and how can I solve it?
MySQL
If running MySQL, the message is a bit different, and the application recovers (though I'm not sure if it is then in some undefined state):
F, [2015-07-30T14:12:07.078215 #15606] FATAL -- :
ActiveRecord::StatementInvalid (Mysql2::Error: This connection is in use by: #<Thread:0x007f563b2faa88#/home/dev/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/actionpack-4.2.3/lib/action_controller/metal/live.rb:269 sleep>: SELECT `tasks`.* FROM `tasks` ORDER BY `tasks`.`id` ASC LIMIT 1):
Warning: read 'answer' as 'seems to make a difference'
I don't see the issue happen if I change the controller block to look like:
begin
#...
while true do
t = Thread.new do #<<<<<<<<<<<<<<<<<
ActiveRecord::Base.connection_pool.with_connection do |conn|
#...
end
end
t.join #<<<<<<<<<<<<<<<<<
end
#...
rescue IOError
#...
But I don't know whether this has actually solved the problem or just made it extremely unlikely. Nor can I really fathom why this would make a difference.
Posting this as a solution in case it helps, but still digging on the issue.
I have a Rails app running on 4.1.6 and Ruby 2.1.3. Some times on some requests they take so long, but it doesn't happen all the times. When I check newrelic I still can't identify or trace the slowness lines.
check out perftools - you can use googles perftools, or the ruby specific implementation. There's a nice write-up about it here
It runs through your application and finds bottlenecks by determining time spent in individual method calls. You'll get output like this:
Total: 23 samples
18 78.3% 78.3% 18 78.3% BigDecimal#div
4 17.4% 95.7% 4 17.4% BigDecimal#*
1 4.3% 100.0% 23 100.0% BigMath#PI
0 0.0% 100.0% 23 100.0% BigMath.PI
In this case, you'd need to spend some time looking at the div method in BigDecimal
It may be a slow API call.
Here is the code for Doorkeeper::TokensController#create
module Doorkeeper
class TokensController < Doorkeeper::ApplicationMetalController
def create
response = strategy.authorize
self.headers.merge! response.headers
self.response_body = response.body.to_json
self.status = response.status
rescue Errors::DoorkeeperError => e
handle_token_exception e
end
# ...snip...
private
def strategy
#strategy ||= server.token_request params[:grant_type]
end
end
end
I don't see any heavy lifting done in here.
Read this New Relic blog post about adding custom metrics to your code. This will replace "Application code" entries in your trace with greater detail about what is going on inside your code.
I cannot copy the linked information here due to copyright.
I understand that when we fork a process the child process inherits a copy of the parents open file descriptors and offsets. According to the man pages this refers to the same file descriptors used by the parent. Based on that theory in the following program
puts "Process #{Process.pid}"
file = File.open('sample', 'w')
forked_pid = fork do
sleep(10)
puts "Writing to file now..."
file.puts("Hello World. #{Time.now}")
end
file.puts("Welcome to winter of my discontent #{Time.now}")
file.close
file = nil
Question 1:
Shouldn't the forked process which is sleeping for 10 seconds lose its file descriptor and not be able to write to the file as the parent process completes and closes the file and exits.
Question 2: But for whatever reason if this works then how does ActiveRecord lose its connection in this scenario. It only works if I set :reconnect => true on ActiveRecord connect can it actually connect, which means its losing connection.
require "rubygems"
require "redis"
require 'active_record'
require 'mysql2'
connection = ActiveRecord::Base.establish_connection({
:adapter => 'mysql2',
:username => 'root_user',
:password => 'Pi',
:host => 'localhost',
:database => 'list_development',
:socket => '/var/lib/mysql/mysql.sock'
})
class User < ActiveRecord::Base
end
u = User.first
puts u.inspect
fork do
sleep 3
puts "*" * 50
puts User.first.inspect
puts "*" * 50
end
puts User.first.inspect
However, the same is not true with Redis (v2.4.8) which does not lose connection on a fork, again. Does the it try to reconnect internally on a fork?
If thats the case then why isn't the write file program not throwing an error.
Could somebody explain whats going on here. Thanks
If you close a file descriptor in one process it stays valid in the other process, this is why your file example works fine.
The mysql case is different because it's a socket with another process at the end. When you call close on the mysql adapter (or when the adapter gets garbage collected when ruby exits) it actually sends a "QUIT" command to the server saying that you're disconnecting, so the server tears down its side of the socket. In general you really don't want to share a mysql connection between two processes - you'll get weird errors depending on whether the two processes are trying to use the socket at the same time.
If closing a redis connection just closes the socket (as opposed to sending a "I'm going away " message to the server) then the child connection should continue to work because the socket won't actually have been closed