6 ruby processes calling SHOW TABLES on mysql, bringing mysql down - ruby-on-rails

I am running a Rails 3.1.0 app and I have an odd problem. On our staging server, with VERY little activity, we have 5 Ruby processes CONSTANTLY pinging MySQL with the following:
poll([{fd=12, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(12, "\f\0\0\0\3SHOW TABLES", 16) = 16
select(13, [12], NULL, NULL, NULL) = 1 (in [12])
read(12, "\1\0\0\1\1D\0\0\2\3def\0\vTABLE_NAMES\0\31Tabl"..., 16384) = 637
poll([{fd=12, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(12, "\f\0\0\0\3SHOW TABLES", 16) = 16
select(13, [12], NULL, NULL, NULL)
That last line is incomplete, but we're talking a few times every single second (across 5 or 6 processes). The server is a beast: it has 32GB of RAM and the MySQL setup has been optimised somewhat, but this is killing the server.
As I say, the server has very little activity, so it's not users or a task.
(For admins thinking of moving this away from this forum: I believe this is a Ruby/Rails issue, and I'm not sure a server forum would have the right people to answer it.)
I would be incredibly grateful for any advice; I fear it might be a bit over my head. I'm not much of a Linux/MySQL pro.
Thanks

I would look at the connection pool for your database. Does running this help?
ActiveRecord::Base.clear_active_connections!
Specifically, in your config/database.yml for this environment, try setting pool: 50 and restart Rails, then see whether this affects the result. If the pool turns out to be exhausted, the next question is why the database connections are getting used up (this command, or something running in Resque). The default pool size is 5.
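To illustrate what "pool exhausted" means in the answer above, here is a minimal sketch of a fixed-size connection pool. It is written in Python rather than Ruby and is not ActiveRecord's implementation; BoundedPool, PoolTimeout, checkout and checkin are hypothetical names, and plain object() placeholders stand in for real database connections.

```python
import queue


class PoolTimeout(Exception):
    """Raised when no connection becomes free within the timeout."""


class BoundedPool:
    """Hypothetical fixed-size pool; real pools also validate and reap connections."""

    def __init__(self, size, connect):
        self._free = queue.Queue(maxsize=size)
        for _ in range(size):
            self._free.put(connect())

    def checkout(self, timeout=5.0):
        # Block until a connection is returned, or give up after `timeout` seconds.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolTimeout(
                f"could not obtain a connection within {timeout} seconds"
            ) from None

    def checkin(self, conn):
        # Hand the connection back so another caller can use it.
        self._free.put(conn)


# Placeholder "connections": object() stands in for a real DB connection.
pool = BoundedPool(size=2, connect=object)
a = pool.checkout()
b = pool.checkout()

try:
    pool.checkout(timeout=0.1)  # both connections are held, so this times out
    exhausted = False
except PoolTimeout:
    exhausted = True

pool.checkin(a)                 # returning one connection unblocks callers
c = pool.checkout(timeout=0.1)
```

A larger pool only delays the timeout if connections are never checked back in, which is why the answer suggests finding out what is holding them.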

Related

MySQL connection pool in python?

I'm trying to process a large amount of data using Python while maintaining the processing status in MySQL. However, I'm surprised there is no standard connection pool for python-mysql (like HikariCP in Java).
I initially started with PyMySQL, and things were great for the first few hours the program ran. After a few hours, things started to fail. I was getting a lot of errors like:
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 99] Cannot assign requested address)")
Moreover, a lot of ports were stuck in the TIME_WAIT state, because without connection pooling I'm opening and closing connections too frequently:
/d/p/950 ❯❯❯ netstat -nt | wc -l
84752
Per this and this, I tried to set tcp_fin_timeout and ip_local_port_range, but hardly anything improved.
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range
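To check how many of those sockets are really in TIME_WAIT (rather than just counting lines), the netstat output can be grouped by state. This is a minimal sketch that assumes the usual Linux `netstat -nt` column layout, with the state in the sixth column; count_tcp_states and the sample text are hypothetical.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP sockets per state from `netstat -nt` text output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp" or "tcp6" and have six columns;
        # header lines are skipped by this check.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Made-up sample in the assumed format, not taken from the question.
sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:50312         127.0.0.1:3306          TIME_WAIT
tcp        0      0 127.0.0.1:50313         127.0.0.1:3306          TIME_WAIT
tcp        0      0 127.0.0.1:50314         127.0.0.1:3306          ESTABLISHED
"""

states = count_tcp_states(sample)
print(states.most_common())
```

On a live machine the same function could be fed the text captured from `netstat -nt`.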
Then I found out that MySQL provides mysql.connector, which comes with pooling functionality. After switching to it, performance actually deteriorated and more processes started to fail. I'm using Python's multiprocessing module to run 29 processes simultaneously (multiprocessing.Pool picked this number by default) on a 24-core machine. The code follows; I was using .my.cnf to pass all the credentials to avoid committing them to git:
import mysql.connector
from mysql.connector import pooling
conn_pool = pooling.MySQLConnectionPool(pool_name="mypool1",
                                        pool_size=pooling.CNX_POOL_MAXSIZE,
                                        option_files=MYSQL_CONFIG,
                                        option_groups=MYSQL_GROUP_NODE1,
                                        allow_local_infile=True)
conn = conn_pool.get_connection()
Finally, I reverted to the old code. I'm still using PyMySQL, and though errors are less frequent, it is still causing a major problem. I looked at SQLAlchemy and couldn't really find much documentation on pooling.
I'm wondering how everyone else is dealing with the mysql-python connection pooling issue. I really believe there should be something out there so that I don't have to reinvent the wheel.
Any pointers are much appreciated.
DBUtils implements a user-sized connection pool (PooledDB), a thread-mapped pool (PersistentDB) and SteadyDB for MySQL, and generally claims to support arbitrary DB-API 2 compliant database interfaces (see the functionality section). The latter should fit your case, where multiprocessing.Pool creates worker processes, each with its own managed persistent database connection. It is described as:
DBUtils.SteadyDB is a module implementing "hardened" connections to a database, based on ordinary connections made by any DB-API 2 database module. A "hardened" connection will transparently reopen upon access when it has been closed or the database connection has been lost or when it is used more often than an optional usage limit.
You can use it with PyMySQL like:
import pymysql
from DBUtils.SteadyDB import connect  # DBUtils 1.x path; in DBUtils 2.0+ the module is dbutils.steady_db
db = connect(
    creator=pymysql,  # the remaining keyword arguments are passed to pymysql
    user='guest', password='', database='name',
    autocommit=True, charset='utf8mb4',
    cursorclass=pymysql.cursors.DictCursor)
Also see this related question for more examples.
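The "one persistent connection per worker process" arrangement described above can be sketched with the initializer argument of multiprocessing.Pool. This is only an illustration under stated assumptions: sqlite3 from the standard library stands in for pymysql or a SteadyDB connection so the example runs without a MySQL server, and init_worker and process_item are hypothetical names.

```python
import multiprocessing
import sqlite3

# One connection per worker process, created once by the initializer.
_conn = None


def init_worker():
    """Runs once in each worker process when the pool starts it."""
    global _conn
    # Stand-in for pymysql.connect(...) or DBUtils' SteadyDB connect(...).
    _conn = sqlite3.connect(":memory:")


def process_item(n):
    """Reuse the worker's connection instead of opening one per task."""
    cur = _conn.cursor()
    cur.execute("SELECT ? * 2", (n,))
    return cur.fetchone()[0]


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4, initializer=init_worker) as pool:
        print(pool.map(process_item, range(5)))
```

Because each worker opens its connection once and keeps it, the number of sockets stays at the number of worker processes instead of growing with the number of tasks.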

How can I prevent database connections from timing out in Rails?

I have a Rails system in which every half hour, the following is done:
There are 15 clients somewhere else on the network
The server creates a record called Measurement for each of these clients
The measurement records are configured, and then they are run asynchronously via Sidekiq, using MeasurementWorker.perform_async(m.id)
The connection to the client is done with Celluloid actors and a WebSocket client
Each measurement, when run, creates a number of event records that are stored in the database
The system has been running well with 5 clients, but now I am at 15, and many of the measurements don't run anymore when I start them at the same time, with the following error:
2015-02-04T07:30:10.410Z 35519 TID-owd4683iw MeasurementWorker JID-15f6b396ae9e3e3cb2ee3f66 INFO: fail: 5.001 sec
2015-02-04T07:30:10.412Z 35519 TID-owd4683iw WARN: {"retry"=>false, "queue"=>"default", "backtrace"=>true, "class"=>"MeasurementWorker", "args"=>[6504], "jid"=>"15f6b396ae9e3e3cb2ee3f66", "enqueued_at"=>1423035005.4078047}
2015-02-04T07:30:10.412Z 35519 TID-owd4683iw WARN: could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)
2015-02-04T07:30:10.412Z 35519 TID-owd4683iw WARN: /home/webtv/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/activerecord-4.1.4/lib/active_record/connection_adapters/abstract/connection_pool.rb:190:in `block in wait_poll'
....
Now, my production environment looks like this:
config/sidekiq.yml
production:
  :verbose: false
  :logfile: ./log/sidekiq.log
  :poll_interval: 5
  :concurrency: 50
config/unicorn.rb
...
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 60
...
config/database.yml
production:
  adapter: postgresql
  database: ***
  username: ***
  password: ***
  host: 127.0.0.1
  pool: 50
postgresql.conf
max_connections = 100 # default
As you see, I've already increased the concurrency of Sidekiq to 50, to cater for a high number of possible concurrent measurements. I've set the database pool to 50, which already looks like overkill to me.
I should add that the server itself is quite powerful, with 8 GB RAM and a quad-core Xeon E5-2403 1.8 GHz.
What should these values ideally be set to? What formula can I use to calculate them? (E.g. number of maximum DB connections = Unicorn workers × Sidekiq concurrency × N)
It looks to me like your pool configuration of 100 is not taking effect. Each process will need a maximum of 50, so change 100 to 50. I don't know if you are using Heroku, but it is notoriously tough to configure the pool size there.
Inside the database server (PostgreSQL in your case), your max connection count should be at least:
((Unicorn processes) * 1) + ((Sidekiq processes) * 50)
Unicorn is single-threaded and never needs more than one connection per process unless you are spinning up your own threads in your Rails app for some reason.
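The formula above can be worked through with the numbers from the question. This is a small sketch in Python; the function name is hypothetical, and the single Sidekiq process is an assumption, since the question does not say how many Sidekiq processes are running.

```python
def required_db_connections(unicorn_workers, sidekiq_processes, sidekiq_concurrency):
    """Upper bound on simultaneous connections, following the formula above."""
    # Each single-threaded Unicorn worker holds at most one connection;
    # each Sidekiq process can hold one connection per worker thread.
    return unicorn_workers * 1 + sidekiq_processes * sidekiq_concurrency


# From the question: 3 Unicorn workers, Sidekiq concurrency 50, max_connections = 100.
# Assumption: exactly one Sidekiq process.
needed = required_db_connections(
    unicorn_workers=3, sidekiq_processes=1, sidekiq_concurrency=50
)
max_connections = 100
fits = needed <= max_connections
print(needed, fits)
```

Under that assumption the total is 53, which fits within the default of 100; a second Sidekiq process with the same concurrency would bring it to 103 and exceed it.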
I'm sure the creator of Sidekiq, @MikePerham, is more than suited to the task of fixing your Sidekiq issues, but as a Ruby dev two things stand out.
First, if you're doing a lot of database operations via Ruby, can you push some of them into the database as triggers? You could still start them on the app side with a Sidekiq process, of course. :)
Second, "every half hour" screams to me of a rake task run via cron. I hope you're doing that too. FWIW, I usually use the Whenever gem to create the cron line I have to drop into the crontab of the user running the app. Note that it's designed to create the cron task automatically in a scripted deploy, but in a non-scripted one you can still use the whenever command to give you the lines to paste into your crontab.
Also, you mention this is for measurements.
Have you considered leveraging something like Elasticsearch and the searchkick gem? This is a somewhat more complex setup, and be sure to firewall the server you install ES on. But it might make your code a lot more manageable as you grow. It also gives you a good search mechanism almost for free, and it's distributed and more language agnostic, e.g. Bloodhound, Java. :) Plus, Kibana gives you a nice window into the ES records.

Isolating cause of Erlang and RabbitMQ crashes

We have been trying to make use of the RabbitMQ Service Bus (v3.3.4), but the central bus keeps crashing. At the moment we are not using any clustering, and it's hosted on Windows Server 2008 R2. We'd like to isolate the root cause, but the error below is the only one we can find. Can anyone shed some light on what, if anything, we can do to find the root cause of this?
Note: There are roughly 20 consumers with roughly the same number of Topic subscriptions. Also, all the clients are .NET 4.5 using the 3.3.4 Rabbit client libraries.
Version=1
EventType=APPCRASH
EventTime=130658038736577295
ReportType=2
Consent=1
ReportIdentifier=7f93ccd8-9cbe-11e4-ae00-000c29c08139
IntegratorReportIdentifier=7f93ccd7-9cbe-11e4-ae00-000c29c08139
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=erl.exe
Sig[1].Name=Application Version
Sig[1].Value=0.0.0.0
Sig[2].Name=Application Timestamp
Sig[2].Value=5343035d
Sig[3].Name=Fault Module Name
Sig[3].Value=MSVCR100.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=10.0.30319.1
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=4ba220dc
Sig[6].Name=Exception Code
Sig[6].Value=40000015
Sig[7].Name=Exception Offset
Sig[7].Value=00000000000760d9
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=6.1.7600.2.0.0.272.7
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=1033
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=8d79
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=8d79a00078e92d9c3d5d79d4324254fe
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=9af5
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=9af5b20633c279dbf44b04a614c6a1f6
UI[2]=C:\Program Files\erl6.0\erts-6.0\bin\erl.exe
UI[5]=Check online for a solution (recommended)
UI[6]=Check for a solution later (recommended)
UI[7]=Close
UI[8]=erl.exe stopped working and was closed
UI[9]=A problem caused the application to stop working correctly. Windows will notify you if a solution is available.
UI[10]=&Close
LoadedModule[0]=C:\Program Files\erl6.0\erts-6.0\bin\erl.exe
LoadedModule[1]=C:\Windows\SYSTEM32\ntdll.dll
LoadedModule[2]=C:\Windows\system32\kernel32.dll
LoadedModule[3]=C:\Windows\system32\KERNELBASE.dll
LoadedModule[4]=C:\Windows\system32\MSVCR100.dll
LoadedModule[5]=C:\Program Files\erl6.0\erts-6.0\bin\erlexec.dll
LoadedModule[6]=C:\Windows\system32\USER32.dll
LoadedModule[7]=C:\Windows\system32\GDI32.dll
LoadedModule[8]=C:\Windows\system32\LPK.dll
LoadedModule[9]=C:\Windows\system32\USP10.dll
LoadedModule[10]=C:\Windows\system32\msvcrt.dll
LoadedModule[11]=C:\Windows\system32\IMM32.DLL
LoadedModule[12]=C:\Windows\system32\MSCTF.dll
LoadedModule[13]=C:\Windows\system32\apphelp.dll
LoadedModule[14]=C:\Program Files\erl6.0\erts-6.0\bin\beam.dll
LoadedModule[15]=C:\Windows\system32\ADVAPI32.dll
LoadedModule[16]=C:\Windows\SYSTEM32\sechost.dll
LoadedModule[17]=C:\Windows\system32\RPCRT4.dll
LoadedModule[18]=C:\Windows\WinSxS\amd64_microsoft.windows.common-controls_6595b64144ccf1df_6.0.7600.16661_none_fa62ad231704eab7\COMCTL32.dll
LoadedModule[19]=C:\Windows\system32\SHLWAPI.dll
LoadedModule[20]=C:\Windows\system32\COMDLG32.dll
LoadedModule[21]=C:\Windows\system32\SHELL32.dll
LoadedModule[22]=C:\Windows\system32\WS2_32.dll
LoadedModule[23]=C:\Windows\system32\NSI.dll
LoadedModule[24]=C:\Windows\system32\IPHLPAPI.DLL
LoadedModule[25]=C:\Windows\system32\WINNSI.DLL
LoadedModule[26]=C:\Windows\system32\mswsock.dll
LoadedModule[27]=C:\Windows\System32\wshtcpip.dll
LoadedModule[28]=C:\Windows\system32\NLAapi.dll
LoadedModule[29]=C:\Windows\system32\DNSAPI.dll
LoadedModule[30]=C:\Windows\System32\winrnr.dll
LoadedModule[31]=C:\Windows\system32\napinsp.dll
LoadedModule[32]=C:\Windows\System32\wship6.dll
FriendlyEventName=Stopped working
ConsentKey=APPCRASH
AppName=erl.exe
AppPath=C:\Program Files\erl6.0\erts-6.0\bin\erl.exe

Neo4j.rb 1.9 HA in development working intermittently, then giving errors

Hullo,
We are attempting to set up a Neo4j HA cluster in our Rails dev environment, much like what is explained here: https://github.com/andreasronge/neo4j/wiki/Neo4j%3A%3ARails-Config
We have two instances in the cluster. Server 1 is the app, Server 2 is the Rails console. They both start fine, but eventually one of them will fall over. Usually, it's one of the following:
1) java.io.FileNotFound: /server_1_path/path/to/some/RailsModel_exact/_2.fxm file. Somehow, the indexes expect a file that does not exist. Sometimes the file does not exist in EITHER server directory, and the only thing that helps is to make both sets of index files identical by copying one to the other.
2) Orphaned index.lock files. The error here says that a certain index is locked, and removing the specific .lock file fixes the issue. Annoying. (Maybe a similar issue.)
3) Data added in one instance never shows up in the other instance. For example, I create a node in the Rails console and it never shows up in the app, or vice versa. In this case, it seems that both instances start up as master and never sync. We usually have to delete one of the dbs and restart to get them working again.
I am not sure whether the new 1.9 HA stuff isn't ready for prime time, or whether we are being too nonchalant with how we quit the app/console and Neo4j is not shutting down cleanly.
This is a highly frustrating issue. We'd appreciate any help/pointers to get it working right.
We are using the 1.9 M03 version of the gem, and here is our config:
server_id = ((defined? Rails::Console)) ? 2 : 1
config.neo4j['enable_ha'] = true
config.neo4j['enable_remote_shell'] = "port=133#{server_id}"
config.neo4j['ha.server_id'] = server_id
config.neo4j['ha.server'] = "localhost:600#{server_id}"
config.neo4j['ha.pull_interval'] = '1s'
config.neo4j['ha.discovery.enabled'] = false
config.neo4j['ha.initial_hosts'] = [1,2,3].map{|id| ":500#{id}"}.join(',')
config.neo4j['ha.cluster_server'] = ":5001-5099" #"#{server_id}"
config.neo4j.storage_path = File.expand_path("db/ha_neo_#{server_id}", Object::Rails.root)
config.neo4j['online_backup_server']= "localhost:636#{server_id}"
config.neo4j['ha.cluster_server'] = "localhost:500#{server_id}"
config.neo4j['webserver.port'] = "747#{server_id}"
config.neo4j['webserver.https.port'] = "748#{server_id}"
config.neo4j['enable_remote_shell'] = "port=933#{server_id}"
config.neo4j['use_adaptive_cache'] = false
puts "Config HA cluster, ha.server_id: #{config.neo4j['ha.server_id']}, db: #{config.neo4j.storage_path}"
Thanks for any/all help/advice.

mod-rails / phusion passenger on apache: Really slow

I installed Redmine on Apache and used mod_ruby first, which was incredibly slow. Now I have switched to Phusion Passenger, but the response time is still really slow (about 5-6 seconds, even using wget to localhost from the server itself).
I just removed the "old" mods from the Apache dir, but it's still slow. Anyway, the logfile at least shows that Passenger is used:
127.0.0.1 - - [15/Nov/2009:10:38:25 +0000] "OPTIONS * HTTP/1.0" 200 - "-" "Apache/2.2.9
(Debian) Phusion_Passenger/2.2.5 PHP/5.2.6-1+lenny3 with Suhosin-Patch mod_ssl/2.2.9
OpenSSL/0.9.8g mod_perl/2.0.4 Perl/v5.10.0 (internal dummy connection)"
I have no idea why this happens; the server should be fast enough, and the Apache log isn't showing anything suspicious.
EDIT:
Thanks for the hint..
The "passenger-status" is "empty":
----------- General information -----------
max = 6
count = 0
active = 0
inactive = 0
Waiting on global queue: 0
Any advice? Thanks!
Try increasing the PoolIdleTime setting (which is 2 minutes by default, I think). Setting it to 0 helped speed up the startup of my Redmine stack a lot. Check out this question on Server Fault on which values to set.
You can use the config option PassengerMinInstances, available since Passenger 3.0.0. This setting lets you tell Apache how many instances of your application must stay alive, even when the application has been idle for longer than the period defined by PoolIdleTime. Have a look at the Phusion Passenger docs; there are some other useful options for improving your deployment's performance.
This answer may be a bit outdated. I'm quite sure almost everyone knows about the new features of Passenger, but I didn't, and this question helped a lot.
I found a tool ( http://www.wekkars.com ) that keeps my application alive. I just updated the PoolIdleTime to 30 minutes and the tool does the rest.
