I'm trying to process a large amount of data using Python while maintaining the processing status in MySQL. However, I'm surprised there is no standard connection pool for python-mysql (like HikariCP in Java).
I initially started with PyMySQL, and things were great for the first few hours of running. After a few hours, things started to fail and I was getting a lot of errors like:
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 99] Cannot assign requested address)")
Moreover, a lot of ports were stuck in the TIME_WAIT state, because without connection pooling I'm opening and closing connections too frequently:
/d/p/950 ❯❯❯ netstat -nt | wc -l
84752
Per this and this, I tried to set tcp_fin_timeout and ip_local_port_range, but hardly anything improved.
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range
Then I found out that MySQL provides mysql.connector, which comes with pooling functionality. After doing all that, performance actually deteriorated; more processes started to fail. I'm using Python's multiprocessing module to run 29 processes simultaneously (multiprocessing.Pool picked this number by default) on a 24-core machine. Following is the code; of course I was using .my.cnf to pass all the credentials, to avoid committing them to git:
import mysql.connector
from mysql.connector import pooling
conn_pool = pooling.MySQLConnectionPool(pool_name="mypool1",
                                        pool_size=pooling.CNX_POOL_MAXSIZE,
                                        option_files=MYSQL_CONFIG,
                                        option_groups=MYSQL_GROUP_NODE1,
                                        allow_local_infile=True)
conn = conn_pool.get_connection()
Finally, I reverted to the old code. I'm still using PyMySQL, and though the errors are less frequent, they are still causing a major problem. I looked at SQLAlchemy, but couldn't really find much documentation around its pooling.
I'm wondering how everyone else is dealing with this mysql-python connection pooling issue. I really believe there should be something out there so that I don't have to reinvent the wheel.
Any pointers are much appreciated.
DBUtils implements a MySQL (and generally claims to support arbitrary DB-API 2 compliant database interfaces) user-sized connection pool PooledDB, a thread-mapped pool PersistentDB, and SteadyDB (see the functionality section). The latter should fit your case, where multiprocessing.Pool creates worker processes, each with its own managed, persistent database connection. It is described as:
DBUtils.SteadyDB is a module implementing "hardened" connections to a database, based on ordinary connections made by any DB-API 2 database module. A "hardened" connection will transparently reopen upon access when it has been closed or the database connection has been lost or when it is used more often than an optional usage limit.
You can use it with PyMySQL like:
import pymysql
from DBUtils.SteadyDB import connect
db = connect(
    creator=pymysql,  # the remaining keyword arguments are passed on to pymysql
    user='guest', password='', database='name',
    autocommit=True, charset='utf8mb4',
    cursorclass=pymysql.cursors.DictCursor)
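If you would rather have an actual pool inside each worker process (instead of one hardened connection per process), DBUtils also provides PooledDB. Here is a minimal sketch with PyMySQL; the pool size, credentials, and query are placeholders:

import pymysql
from DBUtils.PooledDB import PooledDB

# One pool per worker process; connections are handed back on close().
pool = PooledDB(creator=pymysql,   # any DB-API 2 module
                maxconnections=5,  # per-process cap (placeholder)
                blocking=True,     # wait instead of failing when the pool is exhausted
                user='guest', password='', database='name',
                autocommit=True, charset='utf8mb4',
                cursorclass=pymysql.cursors.DictCursor)

conn = pool.connection()
cur = conn.cursor()
cur.execute("SELECT 1")
cur.close()
conn.close()  # returns the connection to the pool rather than closing it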
Also see this related question for more examples.
While this Q/A does not address it, the actual issue is: how do I detect from a client (e.g. redis-py) that Redis is running out of memory, constrained not by the machine but by the maxmemory configuration? Which command should the program use to detect, before inserts start failing, that memory is about to be full?
My first guess is: use INFO and check whether used_memory_peak < the maxmemory setting. Is this correct?
(Besides, for running out of actual machine memory, which setting should be used, given fragmentation? None of the returned INFO fields seem to help here.)
Or should I just try an insert and see if it fails? (But that would be after the fact.)
Trial and error, good enough. Tested by running
while true; do redis-cli lpush mm longstringhere; done
which, once maxmemory - used_memory < 0.1 MB, results in insert failures:
(error) OOM command not allowed when used memory > 'maxmemory'.
So I poll it via the redis-py client, and once the difference drops below a 1 MB threshold I throw up, sorry, raise an error of course. Make sure the used_memory footprint of your largest command is below that threshold too, otherwise you will still run into it on insert.
Until I figure out how to calculate the approximate percentage of used memory, so that I get notified much earlier (e.g. at 90% of maxmemory), this solution is fine.
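For reference, a minimal sketch of that polling check with redis-py (connection details and the 1 MB margin are placeholders):

import redis

r = redis.Redis(host='localhost', port=6379)  # assumed connection details
THRESHOLD_BYTES = 1 * 1024 * 1024             # ~1 MB safety margin

def assert_memory_headroom():
    mem = r.info(section='memory')
    # maxmemory is 0 when no limit is configured; skip the check in that case
    if mem['maxmemory'] and mem['maxmemory'] - mem['used_memory'] < THRESHOLD_BYTES:
        raise RuntimeError('Redis is close to maxmemory, refusing to insert')

assert_memory_headroom()  # call before each batch of inserts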
Info dump:
# Memory
used_memory:3126272
used_memory_human:2.98M
used_memory_rss:5292032
used_memory_rss_human:5.05M
used_memory_peak:4914296
used_memory_peak_human:4.69M
used_memory_peak_perc:63.62%
used_memory_overhead:696654...
Furthermore, maxmemory is not a hard cap: when pushing further, e.g. by adding members to an existing set, used_memory keeps growing past it:
used_memory:3162584
used_memory_human:3.02M
Code to get the percentage (0-100):
import math
rmem_info = r.info(section='memory')  # r is a redis-py client
{'redis_mem_percent': math.ceil(rmem_info['used_memory'] / rmem_info['maxmemory'] * 100)}
I have a Rails system in which every half hour, the following is done:
There are 15 clients somewhere else on the network
The server creates a record called Measurement for each of these clients
The measurement records are configured, and then they are run asynchronously via Sidekiq, using MeasurementWorker.perform_async(m.id)
The connection to the client is done with Celluloid actors and a WebSocket client
Each measurement, when run, creates a number of event records that are stored in the database
The system has been running well with 5 clients, but now I am at 15, and many of the measurements don't run anymore when I start them at the same time, with the following error:
2015-02-04T07:30:10.410Z 35519 TID-owd4683iw MeasurementWorker JID-15f6b396ae9e3e3cb2ee3f66 INFO: fail: 5.001 sec
2015-02-04T07:30:10.412Z 35519 TID-owd4683iw WARN: {"retry"=>false, "queue"=>"default", "backtrace"=>true, "class"=>"MeasurementWorker", "args"=>[6504], "jid"=>"15f6b396ae9e3e3cb2ee3f66", "enqueued_at"=>1423035005.4078047}
2015-02-04T07:30:10.412Z 35519 TID-owd4683iw WARN: could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)
2015-02-04T07:30:10.412Z 35519 TID-owd4683iw WARN: /home/webtv/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/activerecord-4.1.4/lib/active_record/connection_adapters/abstract/connection_pool.rb:190:in `block in wait_poll'
....
Now, my production environment looks like this:
config/sidekiq.yml
production:
  :verbose: false
  :logfile: ./log/sidekiq.log
  :poll_interval: 5
  :concurrency: 50
config/unicorn.rb
...
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 60
...
config/database.yml
production:
  adapter: postgresql
  database: ***
  username: ***
  password: ***
  host: 127.0.0.1
  pool: 50
postgresql.conf
max_connections = 100 # default
As you see, I've already increased the concurrency of Sidekiq to 50, to cater for a high number of possible concurrent measurements. I've set the database pool to 50, which already looks like overkill to me.
I should add that the server itself is quite powerful, with 8 GB RAM and a quad-core Xeon E5-2403 1.8 GHz.
What should these values ideally be set to? What formula can I use to calculate them? (E.g. number of maximum DB connections = Unicorn workers × Sidekiq concurrency × N)
It looks to me like your pool configuration of 100 is not taking effect. Each process will need a max of 50, so change 100 to 50. I don't know if you are using Heroku, but it is notoriously tough to configure the pool size there.
On the database side (PostgreSQL in your case), your max connection count should look like this:
((Unicorn processes) * 1) + ((sidekiq processes) * 50)
Unicorn is single threaded and never needs more than one connection unless you are spinning up your own threads in your Rails app for some reason.
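With your numbers, for example, 3 Unicorn workers (WEB_CONCURRENCY defaulting to 3) and, assuming a single Sidekiq process at concurrency 50, that works out to (3 × 1) + (1 × 50) = 53 connections, comfortably under the default max_connections = 100.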
I'm sure the creator of Sidekiq, @MikePerham, is more than suited to the task of fixing your Sidekiq issues, but as a Ruby dev two things stand out.
If you're doing a lot of database operations via Ruby, can you push some of them into the database as triggers? You could still start them on the app side with a Sidekiq process, of course. :)
Second, "every half hour" screams rake task run via cron to me. I hope you're doing that too. FWIW, I usually use the Whenever gem to create the cron line I have to drop into the crontab of the user running the app. Note it's designed to auto-create the cron task in a scripted deploy, but in a non-scripted one you can still leverage it, via the whenever command, to give you the lines you have to paste into your crontab.
Also, you mention this is for measurements.
Have you considered leveraging something like Elasticsearch and the searchkick gem? This is a little more of a complex setup; be sure to firewall the server you install ES on. But it might make your code a lot more manageable as you grow. It also gives you a good search mechanism almost for free, and it's distributed and more language agnostic (e.g. Bloodhound, Java). :) Plus Kibana gives you a nice window into the ES records.
We have been trying to make use of the RabbitMQ service bus (v3.3.4), but the central bus keeps crashing. At the moment we are not using any clustering, and it's hosted on Windows Server 2008 R2. We'd like to isolate the root cause, but the error below is the only one we can find. Can anyone shed some light on what, if anything, we can do to find the root cause of this?
Note: there are roughly 20 consumers with roughly the same number of topic subscriptions. Also, all the clients are .NET 4.5 using the 3.3.4 RabbitMQ client libraries.
Version=1
EventType=APPCRASH
EventTime=130658038736577295
ReportType=2
Consent=1
ReportIdentifier=7f93ccd8-9cbe-11e4-ae00-000c29c08139
IntegratorReportIdentifier=7f93ccd7-9cbe-11e4-ae00-000c29c08139
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=erl.exe
Sig[1].Name=Application Version
Sig[1].Value=0.0.0.0
Sig[2].Name=Application Timestamp
Sig[2].Value=5343035d
Sig[3].Name=Fault Module Name
Sig[3].Value=MSVCR100.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=10.0.30319.1
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=4ba220dc
Sig[6].Name=Exception Code
Sig[6].Value=40000015
Sig[7].Name=Exception Offset
Sig[7].Value=00000000000760d9
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=6.1.7600.2.0.0.272.7
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=1033
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=8d79
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=8d79a00078e92d9c3d5d79d4324254fe
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=9af5
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=9af5b20633c279dbf44b04a614c6a1f6
UI[2]=C:\Program Files\erl6.0\erts-6.0\bin\erl.exe
UI[5]=Check online for a solution (recommended)
UI[6]=Check for a solution later (recommended)
UI[7]=Close
UI[8]=erl.exe stopped working and was closed
UI[9]=A problem caused the application to stop working correctly. Windows will notify you if a solution is available.
UI[10]=&Close
LoadedModule[0]=C:\Program Files\erl6.0\erts-6.0\bin\erl.exe
LoadedModule[1]=C:\Windows\SYSTEM32\ntdll.dll
LoadedModule[2]=C:\Windows\system32\kernel32.dll
LoadedModule[3]=C:\Windows\system32\KERNELBASE.dll
LoadedModule[4]=C:\Windows\system32\MSVCR100.dll
LoadedModule[5]=C:\Program Files\erl6.0\erts-6.0\bin\erlexec.dll
LoadedModule[6]=C:\Windows\system32\USER32.dll
LoadedModule[7]=C:\Windows\system32\GDI32.dll
LoadedModule[8]=C:\Windows\system32\LPK.dll
LoadedModule[9]=C:\Windows\system32\USP10.dll
LoadedModule[10]=C:\Windows\system32\msvcrt.dll
LoadedModule[11]=C:\Windows\system32\IMM32.DLL
LoadedModule[12]=C:\Windows\system32\MSCTF.dll
LoadedModule[13]=C:\Windows\system32\apphelp.dll
LoadedModule[14]=C:\Program Files\erl6.0\erts-6.0\bin\beam.dll
LoadedModule[15]=C:\Windows\system32\ADVAPI32.dll
LoadedModule[16]=C:\Windows\SYSTEM32\sechost.dll
LoadedModule[17]=C:\Windows\system32\RPCRT4.dll
LoadedModule[18]=C:\Windows\WinSxS\amd64_microsoft.windows.common-controls_6595b64144ccf1df_6.0.7600.16661_none_fa62ad231704eab7\COMCTL32.dll
LoadedModule[19]=C:\Windows\system32\SHLWAPI.dll
LoadedModule[20]=C:\Windows\system32\COMDLG32.dll
LoadedModule[21]=C:\Windows\system32\SHELL32.dll
LoadedModule[22]=C:\Windows\system32\WS2_32.dll
LoadedModule[23]=C:\Windows\system32\NSI.dll
LoadedModule[24]=C:\Windows\system32\IPHLPAPI.DLL
LoadedModule[25]=C:\Windows\system32\WINNSI.DLL
LoadedModule[26]=C:\Windows\system32\mswsock.dll
LoadedModule[27]=C:\Windows\System32\wshtcpip.dll
LoadedModule[28]=C:\Windows\system32\NLAapi.dll
LoadedModule[29]=C:\Windows\system32\DNSAPI.dll
LoadedModule[30]=C:\Windows\System32\winrnr.dll
LoadedModule[31]=C:\Windows\system32\napinsp.dll
LoadedModule[32]=C:\Windows\System32\wship6.dll
FriendlyEventName=Stopped working
ConsentKey=APPCRASH
AppName=erl.exe
AppPath=C:\Program Files\erl6.0\erts-6.0\bin\erl.exe
I would like to monitor elasticsearch using nagios.
Basically, I want to know if Elasticsearch is up.
I think I can use the Elasticsearch Cluster Health API (see here)
and use the 'status' that I get back (green, yellow or red), but I still don't know how to wire that into Nagios (Nagios is on one server and Elasticsearch is on another).
Is there another way to do that?
EDIT :
I just found that - check_http_json. I think I'll try it.
After a while, I've managed to monitor Elasticsearch using NRPE.
I wanted to use the Elasticsearch Cluster Health API, but I couldn't use it from another machine due to security issues...
So, on the monitoring server I created a new service whose check_command is check_nrpe!check_elastic. And on the remote server, where Elasticsearch is, I've edited the nrpe.cfg file with the following:
command[check_elastic]=/usr/local/nagios/libexec/check_http -H localhost -u /_cluster/health -p 9200 -w 2 -c 3 -s green
Which is allowed, since this command is run from the remote server - so no security issues here...
It works!!!
I'll still try the check_http_json command that I posted in my question, but for now, my solution is good enough.
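For reference, the matching objects on the monitoring server might look roughly like this (host name and service template are placeholders; only check_nrpe!check_elastic comes from the setup above):

define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

define service {
    use                     generic-service
    host_name               es-node1
    service_description     Elasticsearch cluster health
    check_command           check_nrpe!check_elastic
}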
After playing around with the suggestions in this post, I wrote a simple check_elasticsearch script. It returns the status as OK, WARNING, and CRITICAL corresponding to the "status" parameter in the cluster health response ("green", "yellow", and "red" respectively).
It also grabs all the other parameters from the health page and dumps them out in the standard Nagios format.
Enjoy!
Shameless plug: https://github.com/jersten/check-es
You can use it with ZenOSS/Nagios to monitor cluster health, data indices, and individual node heap usage.
You can use this cool Python script for monitoring your Elasticsearch cluster. The script checks your IP:port for the Elasticsearch status. This one, and more Python scripts for monitoring Elasticsearch, can be found here.
#!/usr/bin/python
from nagioscheck import NagiosCheck, UsageError
from nagioscheck import PerformanceMetric, Status
import urllib2
import optparse

try:
    import json
except ImportError:
    import simplejson as json


class ESClusterHealthCheck(NagiosCheck):

    def __init__(self):
        NagiosCheck.__init__(self)

        self.add_option('H', 'host', 'host', 'The cluster to check')
        self.add_option('P', 'port', 'port', 'The ES port - defaults to 9200')

    def check(self, opts, args):
        host = opts.host
        port = int(opts.port or '9200')

        try:
            response = urllib2.urlopen(r'http://%s:%d/_cluster/health'
                                       % (host, port))
        except urllib2.HTTPError, e:
            raise Status('unknown', ("API failure", None,
                                     "API failure:\n\n%s" % str(e)))
        except urllib2.URLError, e:
            raise Status('critical', (e.reason))

        response_body = response.read()

        try:
            es_cluster_health = json.loads(response_body)
        except ValueError:
            raise Status('unknown', ("API returned nonsense",))

        cluster_status = es_cluster_health['status'].lower()

        if cluster_status == 'red':
            raise Status("CRITICAL", "Cluster status is currently reporting as Red")
        elif cluster_status == 'yellow':
            raise Status("WARNING", "Cluster status is currently reporting as Yellow")
        else:
            raise Status("OK", "Cluster status is currently reporting as Green")


if __name__ == "__main__":
    ESClusterHealthCheck().run()
I wrote this a million years ago, and it might still be useful: https://github.com/radu-gheorghe/check-es
But it really depends on what you want to monitor. The above measures:
if Elasticsearch responds to HTTP
if ingestion rate drops under the defined levels
if the total number of documents drops below the defined levels
But of course there's much more that might be interesting. From query time to JVM heap usage. We wrote a blog post about the most important ones here: https://sematext.com/blog/top-10-elasticsearch-metrics-to-watch/
Elasticsearch has APIs for all these, so you may be able to use a generic check_http_json to get the needed metrics. Alternatively, you may want to use something like Sematext Monitoring for Elasticsearch, which gets these metrics out of the box, then forward threshold/anomaly alerts to Nagios. (disclosure: I work for Sematext)
What needs to be done to enable pooling in a Delphi 7 app? My connection string is:
Provider=SQLOLEDB.1;Initial Catalog=%s;Data Source=%s;Password=%s;User ID=%s;OLE Db Services=-1
I can tell that connection pooling is not being achieved by looking at the SQLServer:GeneralStatistics UserConnections performance counter - it fluctuates wildly when my application runs. With connection pooling I'd expect it to achieve a steady state. Also, I see that Logins/sec and Logouts/sec counters are both very high - if connection pooling were used Logouts/sec would be at or near zero.
In searching I found this article on resource pooling:
http://www.ddj.com/database/184416942
It suggests that "If you are working at the OLEDB SDK (or COM) level using ATL, you have to write some more code" (aside from adding OLE Db Services=-1 to the connection string) to get connection pooling:
CDataSource db;
CDBPropSet dbinit(DBPROPSET_DBINIT);
dbinit.AddProperty(DBPROP_AUTH_USERID, "MyName");
dbinit.AddProperty(DBPROP_INIT_DATASOURCE, "MyServer");
dbinit.AddProperty(DBPROP_INIT_CATALOG, "MyDb");
dbinit.AddProperty(DBPROP_INIT_PROMPT, (short)4);
dbinit.AddProperty(DBPROP_INIT_LCID, (long)1033);
dbinit.AddProperty(DBPROP_INIT_OLEDBSERVICES, (long)DBPROPVAL_OS_ENABLEALL);
HRESULT hr = db.OpenWithServiceComponents(_T("sqloledb"), &dbinit);
Unfortunately that code is Greek to me, and I'm not sure how to translate it to Delphi (or if it's even necessary).
I'm also careful not to change the connection string at all. Any suggestions on what else I might need to do to enable resource pooling?
You need to keep one instance of the connection open at all times...if it drops to zero, then ADO will re-establish the connection to authenticate the user.
You don't mention it, but are you using Delphi's ADO implementation (dbGo for Delphi 7, IIRC) for your data access? If so, are you connecting everything through the same TADOConnection? If so, it should be doing the pooling for your application (meaning that one running copy of your application is using one connection to the DB server).