Lots of "COMMIT;" in the PostgreSQL log of slow queries - ruby-on-rails

I am trying to optimize the PostgreSQL 9.1 database for a Rails app I am developing. In postgresql.conf I have set
log_min_duration_statement = 200
I then use PgBadger to analyze the log file. The statement which, by far, takes up most of the time is:
COMMIT;
I get no more information than this, and I am confused about what statement this actually is. Does anyone know what I can do to get more detailed information about the COMMIT queries? All other queries show the variables used in the statement (SELECT, UPDATE, etc.), but the COMMIT queries do not.

As @mvp notes, if COMMIT is slow the usual reason is slow fsync()s, because every transaction commit must flush data to disk - usually with the fsync() call. That's not the only possible reason for slow commits, though. You might:
have slow fsync()s as already noted
have slow checkpoints stalling I/O
have a commit_delay set - I haven't verified that delayed commits get logged as long running statements, but it seems reasonable
If fsync() is slow, your best option is to restructure your work so you can run it in fewer, larger transactions. A reasonable alternative can be to use commit_delay to group commits; this will batch commits together to improve overall throughput, but it will actually slow individual transactions down.
Better yet, fix the root of the problem. Upgrade to a RAID controller with battery-backed write-back cache or to high-quality SSDs that are power-fail safe. Ordinary disks can generally do at most one fsync() per rotation, which works out to between 5,400 and 15,000 per minute depending on the hard drive. With lots of transactions and lots of commits, that's going to limit your throughput considerably, and that's the best case, assuming all they're doing is trivial flushes. By contrast, if you have a durable write cache on a RAID controller or SSD, the OS doesn't need to make sure the data is actually on the hard drive, it only needs to make sure it's reached the durable write cache - which is massively faster because that's usually just some power-protected RAM.
It's possible fsync() isn't the real issue; it could be slow checkpoints. The best way to see is to check the logs to see if there are any complaints about checkpoints happening too frequently or taking too long. You can also enable log_checkpoints to record how long and how frequent checkpoints are.
If checkpoints are stalling I/O, consider raising checkpoint_completion_target so each checkpoint's writes are spread over more of the checkpoint interval (see the docs). If checkpoints are too frequent, increase checkpoint_segments.
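As a hedged starting point, the relevant postgresql.conf settings might look something like this (the values are only illustrative, not recommendations for your workload; checkpoint_segments applies to 9.x and was replaced in later versions):
log_checkpoints = on
checkpoint_completion_target = 0.9   # spread each checkpoint's writes over more of the interval
checkpoint_segments = 16             # raise this if checkpoints are happening too frequently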
See Tuning your PostgreSQL server for more information.

COMMIT is a perfectly valid statement whose purpose is to commit the currently pending transaction. Because of the nature of what it really does - making sure that data is actually flushed to disk - it is likely to take most of the time.
How can you make your app work faster?
Right now, it is likely that your code is using so-called auto-commit mode - that is, every statement is implicitly COMMITted.
If you explicitly wrap bigger blocks of work in BEGIN TRANSACTION; ... COMMIT; blocks, you will make your app work much faster and reduce the number of commits.
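In a Rails app, a minimal sketch of this idea (the model and data names here are hypothetical, not from the question) is to wrap the writes in a single ActiveRecord transaction so they share one BEGIN/COMMIT:
# All of these inserts are covered by one COMMIT at the end of the block,
# instead of one COMMIT per row in auto-commit mode.
ActiveRecord::Base.transaction do
  rows.each do |attrs|
    Item.create!(attrs)
  end
end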
Good luck!

Try to log every query for a couple of days and then see what is going on in the transaction before the COMMIT statement.
log_min_duration_statement = 0

Related

How to quickly warmup Neo4J page cache for a set of nodes and relationships

I'm setting up a REST API backed by a Neo4j instance with complex data and about 400 GB of store size, and I need to warm up the page cache for a set of nodes and relationships.
There are specific calls for a set of input data which take a lot of time to query and respond to. So I want a way to warm up the cache only for the nodes and relationships which are accessed by these calls. I tried using APOC.warmup.run(true, true) (it took around 15 minutes, which is acceptable for me), but it loads the whole store into memory, which I can't do since I have a memory constraint. I also tried writing a simple Cypher query which traverses these paths, but it takes a lot of time to execute, and when I check the memory growth of the Neo4j instance it is very slow compared to the APOC warmup.
I am also thinking about whether there is a way to extend/customize APOC warmup to load only specific parts of the store, but I want to see if there are people out there who have already tried something similar.
I expect a quick way of warming up a specific part of the store rather than the whole store.
You should also have enough page cache configured for that (400G).
If you use apoc.warmup.run(true,true,true), the schema indexes will be warmed up as well.
I guess it only took 15 minutes for you because your disk read performance might be lower? And how many CPUs do you have?
You cannot really load just a specific part of the store.
But recent Neo4j versions track page-cache usage and restore it after a restart.
So you can just run your queries, and after a restart the same pages will be in the page cache.
You can also see with PROFILE whether you have page faults in your query.

Background job taking twice the time that the same operation within rails

In my Rails application, I have a long calculation requiring a lot of database access.
To make it short, my calculation took 25 seconds.
When implementing the same calculation within a background job (a big single worker), the same calculation takes twice as long (i.e. 50 seconds). I have tried several techniques to put the job in a background process, but none had a positive impact on my performance: DelayedJob, Sidekiq, and doing the processing within my Rails process but in a thread created for the work. All of them have the same impact on my performance: x2.
This performance difference only exists in the Rails 'production' environment. It looks like there is an optimisation done by Rails that is not done in my background job.
My technical environment is the following:
I am using ruby 2.0 / rails 4
I am using unicorn (but I have the same problem without it).
The job is using Rails.cache to store some partial computations.
I am using postgresql
Does anybody have a clue where this impact might come from?
I'm assuming you're comparing the background job speed to the speed of running the operation during a web request? If so, you're likely benefiting from Rails's QueryCache, which caches db queries during a web request. Try disabling it as described here:
Disabling Rails SQL query caching globally
If that causes the web request version of the job to take as long as the background job, you've found your culprit. You can then enable the query cache on your background job to speed it up (if it makes sense for your application).
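If it does turn out to help, a minimal sketch of enabling the query cache around the job's work might look like this (perform_calculation is a hypothetical stand-in for the job's actual code):
# Repeated identical SELECTs inside this block are served from the query cache,
# which is the same behaviour Rails gives you during a web request.
ActiveRecord::Base.connection.cache do
  perform_calculation
end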
A background job is not something that should be used to speed things up. Its main purpose is 'fire and forget': removing 25 seconds of synchronous calculation from the request and doing that work asynchronously instead. That way you can tell the user that their request is being processed and return with the result later.
You may gain speed from a background job by splitting a big task into smaller ones and running them at the same time. In your case I think that is impossible, because of the dependencies between the operations in your calculation.
So if you want to speed up your calculation, you need to look into denormalizing your data structure: store some pre-calculated values for your big calculation at the moment the source data for that calculation is updated. You will then calculate less when the user requests results, and more when data is stored. That is a good place to use a background job: when you finish updating the data, create a background task to update the caches. If a user requests the calculation before that task has finished, you will still need to wait for the cache to fill up.
Update: I think I still need to answer your main question. Basically, the additional time on background task processing comes from the implementation. Because of the 'fire and forget' approach, nobody wants the background task scheduler to consume a large amount of processor time just monitoring for new jobs. I am not completely sure, but I think that if your calculation were twice as complex, the time difference would still be the same 25 seconds.
My guess is that the extra time is coming from the need for your background worker to load rails and all of your application. My clue is that you said the difference was greatest with Rails in production mode. In production mode, subsequent calls to the app make use of the app and class cache.
How to check this hypothesis:
Change your background job to do the following:
print a log message before you initiate the worker
start the worker
run your calculation. As part of your calculation startup, print a log message
print another log message
run your calculation again
print another log message
Then compare the two times for running your calculation.
Of course, you'll also gain some extra time benefits from database caching, code might remain resident in memory, etc. But if the second run is much much faster, then the fact that the second run didn't restart Rails is more significant.
Also, the time between the log message from steps 1 and 3 will also help you understand the start up times.
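A rough sketch of steps 3-6 inside the worker could look like this (run_calculation is a hypothetical stand-in for your actual calculation):
require 'benchmark'

Rails.logger.info "starting first calculation run"
first = Benchmark.realtime { run_calculation }
Rails.logger.info "starting second calculation run"
second = Benchmark.realtime { run_calculation }
Rails.logger.info "first run: #{first.round(1)}s, second run: #{second.round(1)}s"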
Fixes
Why wait?
Most important: why do you need the results faster? Eg, tell your user that the result will be emailed to them after it is calculated. Or let your user see that the calculation is proceeding in the background, and later, show them the result.
The key for any long running calculation is to do it in the background and encourage the user to not wait for the result. They should be able to do something else until they get the result.
Start the calculation automatically. As soon as the user logs in, or after they do something interesting, start the calculation. That way, when (and if) the user asks for the calculation, the answer will either be already done or will soon be done.
Cache the result and bust the cache as needed. Similar to the above, start the calculation periodically and automatically. If the user changes some data, then restart the calculation by busting the cache. There are also ways to halt any on-going calculation if data is changed during the calculation.
Pre-calculate part of the calculation. Why are you taking 25 seconds or more for a dbms calculation? Could be that you should change the calculation. Investigate adding indexes, summary tables, de-normalizing, splitting the calculation into smaller steps that can be pre-calculated, etc.

Memory constantly increasing in Rails app

I recently launched a new Ruby on Rails application that worked well in development mode. After the launch I have been experiencing the memory being used is constantly increasing:
UPDATED: When this screen dump (the one below) from New Relic was taken, I had scheduled a web dyno restart every hour (one out of two web dynos). Thus, it does not reach the 500MB crash level and it actually shows a bit of a saw-tooth pattern. This does not resolve the problem at all, though, only some of the symptoms. As you can see, the morning is not so busy but the afternoon is busier. I made an upload at 11.30 for a small detail; it could not have affected the problem even though it appears that way in the stats.
It could be noted as well that it is the MIN memory that keeps on increasing even though the graph shows AVG memory. Even when the graph seems to go down temporarily, the MIN memory stays the same or increases. The MIN memory never decreases!
The app would (without dyno restarts) increase in memory until it reached the maximum level at Heroku and the app crashed with execution-expired types of errors.
I am not a great programmer but I have made a few apps before without having this type of problem.
Troubleshooting performed
A. I thought the problem would lie within the before_filter in the application_controller (Will variables in application controller cause a memory leak in Rails?) but that wasn't the problem.
B. I installed oink but it does not give any results (at all). It creates an oink.log but does not give any results when I run "heroku run oink -m log/oink.log", no matter what threshold.
C. I tried bleak_house but it was deprecated and could not be installed
D. I have googled and read most articles in the topic but I am none the wiser.
E. I would love to test memprof but I can't install it (I have Ruby 1.9x and don't really know how to downgrade it to 1.8x)
My questions:
Q1. What I really would love to know is the name(s) of the variable(s) that are increasing for each request, or at least which controller is using the most memory.
Q2. Will a controller with code like the below increase memory?
related_feed_categories = []
@gift.tags.each do |tag|
  tag.category_connections.each do |cc|
    related_feed_categories << cc.category_from_feed
  end
end
Do I need to "kill off" related_feed_categories with "related_feed_categories = nil" afterwards or does the Garbage Collector handle that?
Q3. What would be my major things to look for? Right now I can't narrow it down AT ALL. I don't know which part of the code to look deeper into, and I don't really know what to look for.
Q4. In case I really cannot solve the problem: are there any online consulting services where I can send my code and have them find the problem?
Thanks!
UPDATED. After receiving comments it could have to do with sessions. This is a part of the code that I guess could be bad:
# Create sessions for last generation
friend_data_arr = [@generator.age, @generator.price_low, @generator.price_high]
friend_positive_tags_arr = []
friend_negative_tags_arr = []
friend_positive_tags_arr << @positive_tags
friend_negative_tags_arr << @negative_tags
session["last_generator"] = [friend_data_arr, friend_positive_tags_arr, friend_negative_tags_arr]
# Clean variables
friend_data_arr = nil
friend_positive_tags_arr = nil
friend_negative_tags_arr = nil
It is used in the generator#show controller. When some gifts have been generated through my gift-generating engine, I save the input in a session (in case they want to use that info at a later stage). I never kill or expire these sessions, so I wonder if this could cause the memory increase.
Updated again: I removed this piece of code but the memory still increases, so I guess this part is not it, but similar code might be causing the error?
It's unlikely that your related_feed_categories provoke this.
Are you using a lot of files?
How long do you keep session data? It looks like you have an e-commerce site; are you keeping objects in sessions?
Basically, I think it is files, or sessions, or an increase in temporary data that gets flushed when the server crashes (memcache?).
In the middle of the night, I guess you have fewer customers. Can you post the same memory chart for peak hours?
It may be related to this problem : Memory grows indefinitely in an empty Rails app
UPDATE :
Rails doesn't store all the data on the client side. I don't remember the default store, but unless you choose the cookie store, Rails sends only data like the session_id.
There are a few guidelines about sessions; the ActiveRecord::SessionStore seems to be the best choice for performance purposes. And you shouldn't keep large objects or secret data in sessions. More on sessions here: http://guides.rubyonrails.org/security.html#what-are-sessions
In part 2.9, there is an explanation of how to destroy sessions that have been unused for a certain time.
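For a Rails 3-era app (which the tooling in the question suggests), a hedged sketch of both of those ideas might look like this; the application name and the two-week cutoff are assumptions:
# config/initializers/session_store.rb - switch to the database-backed store
MyApp::Application.config.session_store :active_record_store

# run periodically (cron or a rake task): remove sessions untouched for two weeks
ActiveRecord::SessionStore::Session.where("updated_at < ?", 2.weeks.ago).delete_all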
Instead of storing objects in sessions, I suggest you store the URL giving the search results. You may even store it in the database, offering your customers the possibility of saving a few searches, and/or loading the last one used by default.
But at this stage we are still not totally sure that sessions are the culprit. In order to be sure, you may try, on a test server, to stress test your application with expiring sessions. So basically, you create a large number of sessions, and maybe 20 minutes later Rails has to remove them. If you find any difference in memory consumption, it will narrow things down.
First case: memory drops significantly when sessions expire; you know that it is session related.
Second case: memory increases at a faster rate, but doesn't drop when sessions expire; you know that it is user related, but not session related.
Third case: nothing changes (memory increases as usual), so you know it does not depend on the number of users. But I don't know what could cause this.
When I said stress tests, I meant a significant number of sessions, not really a stress test. The number of sessions you need depends on your average number of users. If you had 50 users before your app crashed, 20-30 sessions may be significant. So if you create them by hand, configure a higher expiry time limit. We are just looking for differences in memory consumption.
Update 2 :
So this is most likely a memory leak. Use ObjectSpace: it has a count_objects method which will show counts of all the objects currently in use. It should narrow things down. Use it when memory has already increased a lot.
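A minimal sketch of that idea (suspect_code is a hypothetical placeholder for the code path you want to measure):
before = ObjectSpace.count_objects.dup
suspect_code.call
GC.start  # collect short-lived garbage so only survivors show up in the diff
after = ObjectSpace.count_objects
after.each do |type, count|
  growth = count - before.fetch(type, 0)
  puts "#{type}: +#{growth}" if growth > 0
end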
Otherwise, there is bleak_house, a gem able to find memory leaks. Ruby tools for memory leaks are still not as efficient as Java ones, but it's worth a try.
Github : https://github.com/evan/bleak_house
Update 3 :
This may be an explanation. It is not really a memory leak, but it does grow memory:
http://www.tricksonrails.com/2010/06/avoid-memory-leaks-in-ruby-rails-code-and-protect-against-denial-of-service/
In short, symbols are kept in memory until you restart Ruby. So if symbols are created with random names, memory will grow until your app crashes. This doesn't happen with Strings; they are GCed.
A bit old, but valid for Ruby 1.9.x. Try this: Symbol.all_symbols.size
Update 4:
So, symbols are probably your memory leak. Now we still have to find where it occurs. Use Symbol.all_symbols; it gives you the list. I guess you may store this somewhere and diff it against a new snapshot later, in order to see what was added (see the sketch below).
It may be i18n, or it may be something else generating symbols in an implicit way like i18n does. Either way, something is probably generating symbols with random data in the name, and those symbols are never used again.
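A small sketch of that diff (where and when you take the snapshots is up to you; the request-exercising step here is just a placeholder comment):
before = Symbol.all_symbols
# ... exercise the app here: hit the suspect pages a few times ...
added = Symbol.all_symbols - before
puts "#{added.size} new symbols created"
puts added.first(50).inspect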
Assuming category_from_feed returns a string (or perhaps a symbol), a 300MB increase from that loop is quite unlikely. You can roughly check this by profiling the following:
4_000_000.times {related_feed_categories << "Loooooooooooooong string" }
This snippet would shoot the memory usage up by about 110MB.
I'd look at DB connections or methods that read a file and then don't close it. I can see that it's related to feeds which probably means you might be using XML. That can be a starting point too.
Posting this as answer because this looks bad in comments :/

Heroku database performance experience needed?

We are experiencing some serious scaling challenges for our intelligent search engine/aggregator. Our database holds around 200k objects. From profiling and newrelic it seems most of our troubles may come from the database. We are using the smallest dedicated database Heroku provide (Ronin).
We have been looking into indexing and caching. So far we managed to solve our problems by reducing database calls and caching content intelligently, but now even this seems to reach an end. We are constantly asking ourselves if our code/configuration is good enough or if we are simply not using enough "hardware".
We suspect that the database solution we buy from Heroku may be performing insufficiently. For example, just doing a simple count (no joins, no nothing) on the 200k items takes around 250ms. This seems like a long time, even though Postgres is known for its relatively poor performance on counts.
We have also started to use geolocation lookups based on latitude/longitude. Both columns are indexed floats. Doing a distance calculation involves pretty complicated math, but we are using the well-recommended geocoder gem, which is supposed to run very optimized queries. Even so, geocoder still takes 4-10 seconds to perform a lookup on, say, 40,000 objects, returning only the nearest 10. This again sounds like a long time, and all the experienced people we have consulted say that it sounds very odd, again hinting at database performance.
So basically we wonder: What can we expect from the database? Might there be a problem? And what can we expect if we decide to upgrade?
An additional question I have is: I read here that we can improve performance by loading the entire database into memory. Are we supposed to configure this ourselves and if so how?
UPDATE ON THE LAST QUESTION:
I got this from the helpful people at Heroku support:
"What this means is having enough memory (a large enough dedicated
database) to store your hot data set in memory. This isn't something
you have to do manually, Postgres is configured automatically use all
available memory on our dedicated databases.
I took a look at your database and it looks like you're currently
using about 1.25 GB of RAM, so you haven't maxed your memory usage
yet."
UPDATE ON THE NUMBERS AND FIGURES
Okay so now I've had time to look into the numbers and figures, and I'll try to answer the questions below as follows:
First of all, the db consists of around 29 tables with a lot of relations. But in reality most queries are done on a single table (some additional resources are joined in, to provide all needed information for the views).
The table has 130 columns.
Currently it holds around 200k records but only 70k are active - hence all indexes are made as partial-indexes on this "state".
All columns we search are indexed correctly and none is of text-type, and many are just booleans.
Answers to questions:
Hmm, the baseline performance is kind of hard to tell; we have so many different selects. The time taken typically varies from 90ms to 250ms when selecting a limit of 20 rows. We have a LOT of counts on the same table, all varying from 250ms to 800ms.
Hmm, well, that's hard to say because they won't give it a shot.
We have around 8-10 users/clients running requests at the same time.
Our query load: In new relic's database reports it says this about the last 24 hours: throughput: 9.0 cpm, total time: 0.234 s, avg time: 25.9 ms
Yes, we have examined the query plans of our long-running queries. The count queries are especially slow, often over 500ms for a pretty simple count over the 70k records, done on indexed columns, with a result of around 300.
I've tuned a few Rails apps hosted on Heroku, and also hosted on other platforms, and usually the problems fall into a few basic categories:
Doing too much in ruby that could be done at the db level (sorting, filtering, join data, etc)
Slow queries
Inefficient use of indexes (not enough, or too many)
Trying too hard to do it all in the db (this is not as common in rails, but does happen)
Not optimizing cacheable data
Not effectively using background processing
Right now it's hard to help you because your question doesn't contain any specifics. I think you'll get a better response if you pinpoint the biggest issue you need help with and then ask about that.
Some info that will help us help you:
What is the average response time of your actions? (from new relic, request-log-analyzer, logs)
What is the slowest request that you want help with?
What are the queries and code in that request?
Is the site's performance different when you run it locally vs. heroku?
In the end I think you'll find that it is not an issue specific to Heroku, and if you had your app deployed on amazon, engineyard, etc you'd have the same performance. The good news is I think that your problems are common, and shouldn't be too hard to fix once you've done some benchmarking and profiling.
-John McCaffrey
We are constantly asking...
...this seems a lot...
...that is suspected...
...What can we expect...
Good news! You can put an end to seeming, suspecting, wondering and expecting through the magic of measurement!!!
Seriously though, you've not mentioned any of the basic points you'd need to get a useful answer:
What's the baseline performance of the DB running a sequential scan and single-row index fetches? You say Heroku say your DB fits in RAM, so you shouldn't see disk I/O issues when you measure.
Does this performance match whatever Heroku say it should be?
How many concurrent clients?
What's your query load - what queries and how often?
Have you checked the query plans for any of your suspiciously long-running queries?
Once you've got this sort of information, maybe someone can say something useful. As it stands anything you read here is just guesswork.
First: you should check your Postgres configuration. (Run show all from within psql or another client, or just look at postgresql.conf in the data directory.) The parameter with the largest impact on performance is effective_cache_size, which should be set to about (total_physical_ram - memory_in_use_by_kernel_and_all_processes). For a 4GB machine, this often works out to around 3GB (4-1). (This is very coarse tuning, but it will give the best results as a first step.)
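For the 4GB example above, that would be a single line in postgresql.conf (illustrative only):
effective_cache_size = 3GB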
Second: why do you want all the counts? Better to use a typical query: just ask for what is needed, not what is available. (Reason: there is no possible optimisation for a COUNT(*): either the whole table or a whole index needs to be scanned.)
Third: start gathering and analysing some query plans (for typical queries that perform badly). You can get a query plan by putting EXPLAIN ANALYZE before the actual query. (Another way is to increase the logging level and obtain the plans from the logfile.) A bad query plan can point you at missing statistics or indexes, or even at bad data modelling.
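From a Rails console, a hedged sketch of grabbing one such plan might look like this (the table and column names are made up; substitute one of your actual slow counts):
result = ActiveRecord::Base.connection.execute(
  "EXPLAIN ANALYZE SELECT COUNT(*) FROM items WHERE state = 'active'"
)
result.each { |row| puts row["QUERY PLAN"] }  # one line of the plan per row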
Newrelic monitoring can be included as an add-on for heroku (http://devcenter.heroku.com/articles/newrelic). At the very least this should give you a lot of insight into what is happening behind the scenes, and may help you pinpoint some issues.

Validating a legacy table with ActiveRecord

Good day all,
We are doing a data migration from one system to a Rails application. Some of the tables we are working with are very large and moving them over 1 record at a time using ActiveRecord takes far too long. Therefore we resorted to copying the table over in SQL and validating after the fact.
The one-by-one validation check is still slow, but the speed increase from the SQL copy more than makes up for it. However, that hasn't quenched our thirst to see if we can get the validation check to happen more quickly. We attempted to split the table into chunks and pass each chunk to a Thread but it actually executed slower.
The question is, large table, currently iterating row-by-row to do the validation, like so
Model.find_each do |m|
  logger.info "M #{m.id} is not valid" unless m.valid?
end
Anyone have any recommendations on how to speed this up?
Thanks
peer
EDIT: I should say, not specifically this code. We are looking for recommendations on how we can run this concurrently, giving each process a chunk of data, without needing a machine per process.
find_each is using find_in_batches, which fetches 1000 rows at a time by default. You could try playing with the batch_size option. The way you have it above seems pretty optimal; it's fetching from the database in batches and iterating over each one, which you need to do. I would monitor your RAM to see if the batch size is optimal, and you could also try using Ruby 1.9.1 to speed things up if you're currently using 1.8.*.
http://api.rubyonrails.org/classes/ActiveRecord/Batches/ClassMethods.html#M001846
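For example, a sketch of the same loop with a bigger batch size to experiment with (5000 is an arbitrary value, not a recommendation):
Model.find_each(:batch_size => 5000) do |m|
  logger.info "M #{m.id} is not valid" unless m.valid?
end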
I like zgchurch's response as a starting point.
What I would add is that threading is definitely not going to help here, especially because Ruby uses green threads (at least in 1.8.x), so there is no opportunity to utilize multiple processors anyway. Even if that weren't the case it's very likely that this operation is IO-heavy enough that you would get IO contention eating into any multi-core benefits.
Now if you really want to speed this up you should take a look at the actual validations and figure out a more efficient way to achieve them. Just loading all the rows and instantiating an ActiveRecord object is going to tend to dominate the performance in most validation situations. You may be spending 90-99.99% of your time just loading and unloading the data from memory.
In these types of situations I tend to go towards raw SQL. You can do things like validating foreign-key integrity tens of thousands of times faster than with ActiveRecord validation callbacks. Of course the viability of this approach depends on the actual ins and outs of your validations. Even if you need something a little richer than SQL to define validity, you could still probably get a 10-100x speed increase just by loading the minimal data with a thinner SQL interface and examining the data directly. If that's the case, Perl or Python might be a better choice for raw performance.
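As one hedged illustration of the raw-SQL approach (the table and column names here are invented; adapt them to your schema), a foreign-key integrity check can be done in a single set-based query instead of per-row callbacks:
# Find child rows whose foreign key points at a missing parent.
orphan_ids = ActiveRecord::Base.connection.select_values(<<-SQL)
  SELECT c.id
  FROM line_items c
  LEFT JOIN orders o ON o.id = c.order_id
  WHERE c.order_id IS NOT NULL
    AND o.id IS NULL
SQL
puts "#{orphan_ids.size} line_items reference a missing order" unless orphan_ids.empty?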
