I have a Rails system in which every half hour, the following is done:
- There are 15 clients somewhere else on the network.
- The server creates a record called Measurement for each of these clients.
- The measurement records are configured, and then they are run asynchronously via Sidekiq, using MeasurementWorker.perform_async(m.id) (the worker is sketched just below this list).
- The connection to the client is done with Celluloid actors and a WebSocket client.
- Each measurement, when run, creates a number of event records that are stored in the database.
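For reference, the worker is roughly shaped like this (a sketch; the Measurement#run! method name is an assumption, and the options mirror what shows up in the log below):

class MeasurementWorker
  include Sidekiq::Worker
  sidekiq_options retry: false, backtrace: true   # matches the options visible in the log below

  def perform(measurement_id)
    measurement = Measurement.find(measurement_id) # checks a connection out of the ActiveRecord pool
    measurement.run!                               # assumed method: talks to the client via Celluloid/WebSocket
  end
end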
The system had been running well with 5 clients, but now that I am at 15, many of the measurements no longer run when I start them all at the same time; they fail with the following error:
2015-02-04T07:30:10.410Z 35519 TID-owd4683iw MeasurementWorker JID-15f6b396ae9e3e3cb2ee3f66 INFO: fail: 5.001 sec
2015-02-04T07:30:10.412Z 35519 TID-owd4683iw WARN: {"retry"=>false, "queue"=>"default", "backtrace"=>true, "class"=>"MeasurementWorker", "args"=>[6504], "jid"=>"15f6b396ae9e3e3cb2ee3f66", "enqueued_at"=>1423035005.4078047}
2015-02-04T07:30:10.412Z 35519 TID-owd4683iw WARN: could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)
2015-02-04T07:30:10.412Z 35519 TID-owd4683iw WARN: /home/webtv/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/activerecord-4.1.4/lib/active_record/connection_adapters/abstract/connection_pool.rb:190:in `block in wait_poll'
....
Now, my production environment looks like this:
config/sidekiq.yml
production:
  :verbose: false
  :logfile: ./log/sidekiq.log
  :poll_interval: 5
  :concurrency: 50
config/unicorn.rb
...
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 60
...
config/database.yml
production:
  adapter: postgresql
  database: ***
  username: ***
  password: ***
  host: 127.0.0.1
  pool: 50
postgresql.conf
max_connections = 100 # default
As you see, I've already increased the concurrency of Sidekiq to 50, to cater for a high number of possible concurrent measurements. I've set the database pool to 50, which already looks like overkill to me.
I should add that the server itself is quite powerful, with 8 GB RAM and a quad-core Xeon E5-2403 1.8 GHz.
What should these values ideally be set to? What formula can I use to calculate them? (E.g. number of maximum DB connections = Unicorn workers × Sidekiq concurrency × N)
It looks to me like your pool configuration of 100 is not taking effect. Each process will need a max of 50, so change 100 to 50. I don't know if you are using Heroku, but it is notoriously tough to configure the pool size there.
On the PostgreSQL side, your max connection count should look like this:
((Unicorn processes) * 1) + ((sidekiq processes) * 50)
Unicorn is single threaded and never needs more than one connection unless you are spinning up your own threads in your Rails app for some reason.
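Plugging your numbers into that formula (assuming a single Sidekiq process alongside the 3 Unicorn workers from your unicorn.rb): (3 × 1) + (1 × 50) = 53 connections at peak, which fits comfortably under max_connections = 100. What matters most is that the Sidekiq process gets an ActiveRecord pool at least as large as its concurrency; one hedged way to make that explicit (a sketch, not the only way to do it):

# config/initializers/sidekiq.rb -- sketch; the pool of 50 mirrors :concurrency in sidekiq.yml
Sidekiq.configure_server do |config|
  db_config = Rails.application.config.database_configuration[Rails.env]
  ActiveRecord::Base.establish_connection(db_config.merge('pool' => 50))
end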
I'm sure the creator of Sidekiq, Mike Perham, is more than suited to the task of fixing your Sidekiq issues, but as a Ruby dev two things stand out to me.
If you're doing a lot of database operations via Ruby, can you push some of them into the database as triggers? You could still start them on the app side with a Sidekiq process, of course. :)
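A minimal sketch of that idea, assuming a hypothetical events_count counter column on measurements that you want maintained inside PostgreSQL rather than in Ruby:

# db/migrate/20150204000000_add_events_count_trigger.rb -- sketch; table/column names are hypothetical
class AddEventsCountTrigger < ActiveRecord::Migration
  def up
    execute <<-SQL
      CREATE OR REPLACE FUNCTION bump_measurement_events_count() RETURNS trigger AS $$
      BEGIN
        UPDATE measurements SET events_count = events_count + 1 WHERE id = NEW.measurement_id;
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;

      CREATE TRIGGER events_counter AFTER INSERT ON events
        FOR EACH ROW EXECUTE PROCEDURE bump_measurement_events_count();
    SQL
  end

  def down
    execute 'DROP TRIGGER IF EXISTS events_counter ON events;'
    execute 'DROP FUNCTION IF EXISTS bump_measurement_events_count();'
  end
end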
Second, "every half hour" screams rake task run via cron to me. I hope you're doing that too. FWIW I usually use the Whenever gem to create the cron line I have to drop into the crontab of the user running the app. Note it's designed to auto-create the cron task in a scripted deploy, but in a non-scripted one you can still leverage it, via the whenever command, to give you the lines you have to paste into your crontab.
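For reference, the half-hourly kickoff in Whenever might look like this (the rake task name is made up; running whenever prints the corresponding crontab lines):

# config/schedule.rb -- sketch; measurements:run_all is a made-up rake task
every 30.minutes do
  rake 'measurements:run_all'
end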
Also you mention this is for measurements.
Have you considered leveraging something like Elasticsearch and the searchkick gem? This is a little more of a complex setup; be sure to firewall the server you install ES on. But it might make your code a lot more manageable as you grow. It also gives you a good search mechanism almost for free, and it's distributed and more language agnostic (e.g. Bloodhound, Java). :) Plus Kibana gives you a nice window into the ES records.
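As a rough illustration of the Searchkick side (assuming an Event model; the field names are made up):

class Event < ActiveRecord::Base
  searchkick  # indexes this model into Elasticsearch

  def search_data
    { measurement_id: measurement_id, name: name, recorded_at: created_at }  # made-up fields
  end
end

# Event.reindex            # build/rebuild the index
# Event.search('timeout')  # query it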
I'm trying to process a large amount of data using Python and maintaining processing status in MySQL. However, I'm surprised there is no standard connection pool for python-mysql (like HikariCP in Java).
I initially started with PyMySQL, and things were great until the program had run for the first few hours. After a few hours, things started to fail. I was getting a lot of errors like:
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 99] Cannot assign requested address)")
Moreover, a lot of ports were stuck in the TIME_WAIT state because I was opening and closing connections too frequently, due to the lack of connection pooling:
/d/p/950 ❯❯❯ netstat -nt | wc -l
84752
Per this and this, I tried to set tcp_fin_timeout and ip_local_port_range, but hardly anything improved.
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range
Then I found out that MySQL provides mysql.connector, which comes with pooling functionality. After doing all that, performance actually deteriorated; more processes started to fail. I'm using Python's multiprocessing module to simultaneously run 29 processes (multiprocessing.Pool picked this number by default) on a 24-core machine. Following was the code; of course I was using .my.cnf to pass all the credentials to avoid committing them to git:
import mysql.connector
from mysql.connector import pooling
conn_pool = pooling.MySQLConnectionPool(pool_name="mypool1",
                                        pool_size=pooling.CNX_POOL_MAXSIZE,
                                        option_files=MYSQL_CONFIG,
                                        option_groups=MYSQL_GROUP_NODE1,
                                        allow_local_infile=True)

conn = conn_pool.get_connection()
Finally, I reverted back to the old code. I'm still using PyMySQL, and though errors are less frequent, it is still causing a major problem. I looked at SQLAlchemy and couldn't really find much documentation around pooling.
I'm wondering how everyone else is dealing with the mysql-python connection pooling issue? I really believe there should be something out there so that I don't have to reinvent the wheel.
Any pointers are much appreciated.
DBUtils implements, for MySQL (and it generally claims to support arbitrary DB-API 2 compliant database interfaces), a user-sized connection pool PooledDB, a thread-mapped pool PersistentDB, and SteadyDB (see the functionality section). The latter should fit your case, where multiprocessing.Pool creates worker processes that each hold a managed persistent database connection. It is described as:
DBUtils.SteadyDB is a module implementing "hardened" connections to a database, based on ordinary connections made by any DB-API 2 database module. A "hardened" connection will transparently reopen upon access when it has been closed or the database connection has been lost or when it is used more often than an optional usage limit.
You can use it with PyMySQL like this:
import pymysql
from DBUtils.SteadyDB import connect

db = connect(
    creator=pymysql,  # the remaining keyword arguments are passed through to pymysql
    user='guest', password='', database='name',
    autocommit=True, charset='utf8mb4',
    cursorclass=pymysql.cursors.DictCursor)
Also see this related question for more examples.
I am running Rails 3 with Ruby 2.3.3 on Puma with PostgreSQL. I have an initializer/twitter.rb file that starts a thread on boot with a streaming API for Twitter. When I use rails server to start my application, the Twitter streaming works and I can reach my website as normal. (If I do not put the streaming on a different thread, the streaming works but I cannot view my application in the browser, since the thread is blocked by the Twitter stream.) But when I use puma -C config/puma.rb to start my application, I get the following message telling me that my thread was found on startup and was put to sleep. How can I tell Puma to let me run this thread in the background on startup?
initializer/twitter.rb
### START TWITTER THREAD ### if production
if Rails.env.production?
  puts 'Starting Twitter Stream...'
  Thread.start {
    twitter_stream.user do |object|
      case object
      when Twitter::Tweet
        handle_tweet(object)
      when Twitter::DirectMessage
        handle_direct_message(object)
      when Twitter::Streaming::Event
        puts "Received Event: #{object.to_yaml}"
      when Twitter::Streaming::FriendList
        puts "Received FriendList: #{object.to_yaml}"
      when Twitter::Streaming::DeletedTweet
        puts "Deleted Tweet: #{object.to_yaml}"
      when Twitter::Streaming::StallWarning
        puts "Stall Warning: #{object.to_yaml}"
      else
        puts "It's something else: #{object.to_yaml}"
      end
    end
  }
end
config/puma.rb
workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['RAILS_MAX_THREADS'] || 5)
threads threads_count, threads_count

preload_app!

rackup      DefaultRackup
port        ENV['PORT']     || 3000
environment ENV['RACK_ENV'] || 'development'

on_worker_boot do
  # Valid on Rails up to 4.1: the initializer method of setting `pool` size
  ActiveSupport.on_load(:active_record) do
    config = ActiveRecord::Base.configurations[Rails.env] ||
             Rails.application.config.database_configuration[Rails.env]
    config['pool'] = ENV['RAILS_MAX_THREADS'] || 5
    ActiveRecord::Base.establish_connection(config)
  end
end
Message on startup
2017-04-19T23:52:47.076636+00:00 app[web.1]: Connecting to database specified by DATABASE_URL
2017-04-19T23:52:47.115595+00:00 app[web.1]: Starting Twitter Stream...
2017-04-19T23:52:47.229203+00:00 app[web.1]: Received FriendList: --- !ruby/array:Twitter::Streaming::FriendList []
2017-04-19T23:52:47.865735+00:00 app[web.1]: [4] * Listening on tcp://0.0.0.0:13734
2017-04-19T23:52:47.865830+00:00 app[web.1]: [4] ! WARNING: Detected 1 Thread(s) started in app boot:
2017-04-19T23:52:47.865870+00:00 app[web.1]: [4] ! #<Thread:0x007f4df8bf6240#/app/config/initializers/twitter.rb:135 sleep> - /app/vendor/ruby-2.3.3/lib/ruby/2.3.0/openssl/buffering.rb:125:in `sysread'
2017-04-19T23:52:47.875056+00:00 app[web.1]: [4] - Worker 0 (pid: 7) booted, phase: 0
2017-04-19T23:52:47.865919+00:00 app[web.1]: [4] Use Ctrl-C to stop
2017-04-19T23:52:47.882759+00:00 app[web.1]: [4] - Worker 1 (pid: 11) booted, phase: 0
2017-04-19T23:52:48.148831+00:00 heroku[web.1]: State changed from starting to up
Thanks in advance for the help. I have looked at several other posts mentioning WARNING: Detected 1 Thread(s) started in app boot but the answers say to ignore the warning if the thread is not important. In my case, the thread is very important and I need this thread to not sleep.
From your code I think you have a bigger issue on your hands than a sleeping thread... which I guess might be caused by the fact that some things are misnamed and others are just not often considered when relying on a web framework.
In the world of servers, "workers" refer to forked processes that perform server related tasks, often accepting new connections and handling web requests.
BUT - fork doesn't duplicate threads! - the new process (the worker) starts with only one single thread, which is a copy of the thread that called fork.
This is because processes don't share memory (normally). Whatever global data you have in a process is private to that process (i.e., if you save connected websocket clients in an array, that array is different for each "worker").
This can't be helped, it's part of how the OS and fork are designed.
So, the warning is not something you can circumvent - it's an indication of a design flaw in the app(!).
For example, in your current design (assuming the thread wasn't sleeping), the handle_tweet method will only be called for the original server process and it won't be called for any worker process.
If you're using pub/sub, you only need one twitter_stream connection for the whole app (no matter how many servers or workers your application has) - perhaps a twitter_stream process (or background app) will be better than a thread.
But if you're implementing handle_tweet in a process-specific way - i.e., by sending a message to every connected client saved in an array - you need to make sure every "worker" initiates a twitter_stream thread(!).
When I wrote Iodine (a different server than Puma), I handled these use cases using the Iodine.run method, which defers tasks for later. The "saved" task should be performed only after the workers are initialized and the event loop starts running, so it's performed in each process (allowing you to start a new thread in each process).
i.e.
Iodine.run do
  Thread.start do
    twitter_stream.user do |object|
      # ...
    end
  end
end
I assume Puma has a similar solution. From what I understand of the Puma Clustered-Mode Documentation, adding the following block to your config/puma.rb might help:
# config/puma.rb
on_worker_boot do
  Thread.start do
    twitter_stream.user do |object|
      # ...
    end
  end
end
Good luck!
EDIT: relating to the comment about twitter_stream using ActiveRecord
From the comments I gather that the twitter_stream callbacks store data in the database as well as handle "push" events or notices.
Although these two concerns are connected, they are very different from each other.
For example, the twitter_stream callbacks should only store data in the database once. Even if your application grows to a billion users, you will only need to save the data in the database once.
This means that the twitter_stream callbacks should have their own dedicated process that runs only once, possibly separate from the main application.
At first, and as long as you limit your application to a single server (only one server/application running), you might use fork together with the initializer/twitter.rb script, i.e.:
### START TWITTER PROCESS ### if production
if Rails.env.production?
  puts 'Starting Twitter Stream...'
  Process.fork do
    twitter_stream.user do |object|
      # ...
    end
  end
end
On the other hand, notifications should be addressed to a specific user on a specific connection owned by a specific process.
Hence, notifications should be a separate concern from the twitter_stream DataBase update and they should be running in the background of every process, using the on_worker_boot (or Iodine.run) described above.
To achieve this, you might have on_worker_boot start a background thread that will listen to a pub/sub service such as Redis, while the twitter_stream callbacks "publish" updates to the pub/sub service.
This would allow each process to review the update and check if any of the connections it "owns" belongs to a client that should be notified of the update.
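A rough sketch of that split, assuming Redis as the pub/sub service; the Tweet model and notify_connections_for helper are made-up placeholders:

require 'redis'  # assumes the redis gem

# In the dedicated twitter_stream process: store once, then publish instead of pushing directly
twitter_stream.user do |object|
  if object.is_a?(Twitter::Tweet)
    Tweet.create!(body: object.text)          # hypothetical model -- the single DB write
    Redis.new.publish('tweets', object.text)  # fan the update out to every web worker
  end
end

# config/puma.rb: every worker subscribes and notifies only the clients it "owns"
on_worker_boot do
  Thread.start do
    Redis.new.subscribe('tweets') do |on|
      on.message do |_channel, text|
        notify_connections_for(text)  # hypothetical: push to this worker's connected clients
      end
    end
  end
end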
The way I'm reading your question, this doesn't look like an issue. A sleeping thread is different from a dead thread. Sleep just means that the thread is waiting idle, not consuming any CPU. If all else is hooked up properly, then as soon as the Twitter API detects an event, it should wake the thread up, run whatever handler you've defined, and then go right back to sleep. Sleeping isn't "running in the background," but it is "waiting for something to happen (e.g. someone tweets #me) so I can run in the background."
A quick example to demonstrate this:
2.4.0 :001 > t = Thread.new { TCPServer.new(1234).accept ; puts "Got a connection! Dying..." }
=> #<Thread:0x007fa3941fed90#(irb):1 sleep>
2.4.0 :002 > t
=> #<Thread:0x007fa3941fed90#(irb):1 sleep>
2.4.0 :003 > t
=> #<Thread:0x007fa3941fed90#(irb):1 sleep>
2.4.0 :004 > TCPSocket.new 'localhost', 1234
=> #<TCPSocket:fd 35>
2.4.0 :005 > Got a connection! Dying...
t
=> #<Thread:0x007fa3941fed90#(irb):1 dead>
Sleeping just means "waiting for action."
Puma is a thread-based server, and is very particular about spinning threads up in its boot process, hence the warning about a thread started at app boot.
For what it's worth though, it's kind of weird to have a thread listening for updates from an API like that inside a web server. Maybe you should look into having a worker handle Twitter events using something like Resque? Or maybe ActionCable is relevant to your use case?
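For instance, a minimal sketch of the Resque route (the job class and queue name are made up; the streaming listener would enqueue work instead of handling tweets inline):

# app/jobs/handle_tweet_job.rb -- sketch; class and queue names are hypothetical
class HandleTweetJob
  @queue = :tweets

  def self.perform(tweet_text)
    # do the real work outside the web server process
    Rails.logger.info("Handling tweet: #{tweet_text}")
  end
end

# In the streaming listener:
# Resque.enqueue(HandleTweetJob, object.text)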
I am seeing some very slow cache reads in my rails app. Both redis (redis-rails) and memcached (dalli) produced the same results.
It looks like it is only the first call to Rails.cache that causes the slowness (averaging 500ms).
I am using skylight to instrument my app and see a graph like:
I have a Rails.cache.fetch call in this code, but when I benchmark it I see it average around 8ms, which matches what memcache-top shows for my average call time.
I thought this might be dalli connections opening slowly, but benchmarking that didn't show anything slow either. I'm at a loss for what else to check into.
Does anyone have any good techniques for tracking this sort of thing down in a rails app?
Edit #1
The memcache server list is stored in ENV['MEMCACHE_SERVERS'], and all the servers are in the us-east-1 datacenter.
Cache config looks like:
config.cache_store = :dalli_store, nil, { expires_in: 1.day, compress: true }
I ran something like:
100000.times { Rails.cache.fetch('something') }
and calculated the average timings and got something on the order of 8ms when running on one of my webservers.
To test my theory that the first request is slow, I opened a console on my web server and ran the following as the first command.
irb(main):002:0> Benchmark.ms { Rails.cache.fetch('someth') { 1 } }
Dalli::Server#connect my-cache.begfpc.0001.use1.cache.amazonaws.com:11211
=> 12.043342
Edit #2
Ok, I split out the fetch into a read and write, and tracked them independently with statsd. It looks like the averages sit around what I would expect, but the max times on the read are very spiky and get up into the 500ms range.
http://s16.postimg.org/5xlmihs79/Screen_Shot_2014_12_19_at_6_51_16_PM.png
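For reference, the split read/write instrumentation described above might look roughly like this with the statsd-ruby client (the metric names and the compute step are made up):

require 'statsd'  # from the statsd-ruby gem

statsd = Statsd.new('127.0.0.1', 8125)

value = statsd.time('cache.read') { Rails.cache.read('something') }
if value.nil?
  value = expensive_computation             # made-up stand-in for the real work
  statsd.time('cache.write') { Rails.cache.write('something', value) }
end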
I would like to up/down-scale my dynos automatically dependings on the size of the pending list.
I heard about HireFire, but the scaling is only done once per minute, and I need it to be (almost) real time.
I would like to scale my dynos so that the pending list be ~always empty.
I was thinking about doing it myself (with a scheduler on a ~15 s delay, using the Heroku API), because I'm not sure there is anything out there; and if not, do you know of any monitoring tools which could send an email alert if the queue length exceeds a fixed size? (Similar to Apdex on New Relic.)
A potential custom code solution is included below. There are also two New Relic plugins that do Resque monitoring. I'm not sure if either does email alerts based on exceeding a certain queue size. Using Resque hooks, you could output log messages that could trigger email alerts (or Slack, HipChat, PagerDuty, etc.) via a service like Papertrail or Loggly. This might look something like:
def after_enqueue_pending_check(*args)
  job_count = Resque.info[:pending].to_i
  if job_count > PENDING_THRESHOLD
    Rails.logger.warn('pending queue threshold exceeded')
  end
end
Instead of logging, you could send an email, but without some sort of rate limiting on the emails you could easily get flooded if the pending queue grows rapidly.
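A crude way to rate-limit those alerts (a sketch; it uses a Redis SET NX EX key as a cooldown, and AlertMailer is a made-up mailer):

require 'redis'

ALERT_COOLDOWN_SECONDS = 600  # at most one alert every 10 minutes

def alert_pending_threshold_exceeded(job_count)
  # SET with NX + EX acts as a simple cooldown lock shared across workers
  if Redis.new.set('pending_alert_sent', '1', nx: true, ex: ALERT_COOLDOWN_SECONDS)
    AlertMailer.pending_queue(job_count).deliver  # hypothetical mailer
  end
end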
I don't think there is a Heroku add-on or other service that can do the scaling in real time. There is a gem that will do this using the deprecated Heroku API. You can do it yourself using Resque hooks and the Heroku platform-api. This untested example uses the heroku platform-api to scale the 'worker' dynos up and down. Just as an example, I included 1 worker for every three pending jobs. The downscale will only ever reset the workers to 1 if there are no pending jobs and no working jobs. This is not ideal and should be updated to fit your needs. See here for information about ensuring that when scaling down the workers you don't lose jobs: http://quickleft.com/blog/heroku-s-cedar-stack-will-kill-your-resque-workers
require 'platform-api'

def after_enqueue_upscale(*args)
  heroku = PlatformAPI.connect_oauth('OAUTH_TOKEN')
  worker_count = heroku.formation.info('app-name', 'worker')["quantity"]
  job_count = Resque.info[:pending].to_i

  # one worker for every 3 jobs (minimum of 1)
  new_worker_count = ((job_count / 3) + 1).to_i
  return if new_worker_count <= worker_count
  heroku.formation.update('app-name', 'worker', {"quantity" => new_worker_count})
end

def after_perform_downscale
  heroku = PlatformAPI.connect_oauth('OAUTH_TOKEN')
  if Resque.info[:pending].to_i == 0 && Resque.info[:working].to_i == 0
    heroku.formation.update('app-name', 'worker', {"quantity" => 1})
  end
end
I'm having a similar issue and have run into HireFire:
https://www.hirefire.io/.
For ruby, use:
https://github.com/hirefire/hirefire-resource
It theoretically works like Adept Scale (https://www.adeptscale.com/). However, HireFire can also scale workers and does not limit itself to just dynos. Hope this helps!
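If you go the HireFire route, the hirefire-resource setup is roughly along these lines (a sketch from memory; double-check the macro names against the gem's README):

# config/initializers/hirefire.rb -- sketch; verify against the hirefire-resource README
HireFire::Resource.configure do |config|
  config.dyno(:worker) do
    # return the metric HireFire should scale the worker formation on
    HireFire::Macro::Resque.queue(:default)
  end
end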
I am running a Rails 3.1.0 app and I have an odd problem. On our staging server, with VERY little activity, we have 5 Ruby processes CONSTANTLY pinging MySQL with the following:
poll([{fd=12, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(12, "\f\0\0\0\3SHOW TABLES", 16) = 16
select(13, [12], NULL, NULL, NULL) = 1 (in [12])
read(12, "\1\0\0\1\1D\0\0\2\3def\0\vTABLE_NAMES\0\31Tabl"..., 16384) = 637
poll([{fd=12, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(12, "\f\0\0\0\3SHOW TABLES", 16) = 16
select(13, [12], NULL, NULL, NULL)
That last line is incomplete, but we're talking a few times every single second (x5/6 processes). The server is a beast, it has 32GB of RAM and has been optimised somewhat (the mySQL setup that is) but its killing the server.
Like I say, the server has very little activity, so it's not users or a task.
(For admins thinking of moving this away from this forum: I believe this is a Ruby/Rails issue, and I'm not sure a server forum would be a good fit for finding answerers.)
I would be incredibly grateful for any advice, I fear it might be a bit over my head. I'm not such a Linux/mySQL pro.
Thanks
I would look at the connection pool for your database. Does running this help?
ActiveRecord::Base.clear_active_connections!
Specifically, in your config/database.yml for this environment, try setting pool: 50, restart Rails, and see if this affects the result. If your pool is exhausted, the next question would be to get to the specifics of why the database connection pool is getting used up (this command, or something running in Resque). I think the default pool size is 5.
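To see whether the pool really is the issue, a quick check from a Rails console might look like this (a sketch; APIs as of Rails 3/4):

# From a Rails console:
pool = ActiveRecord::Base.connection_pool
puts "configured size:  #{pool.size}"
puts "open connections: #{pool.connections.size}"

# Release connections held by threads that are no longer alive
ActiveRecord::Base.clear_active_connections!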