I maintain a website of events taking place in my city.
My website's homepage is a 5-day calendar showing all the events taking place from today to 4 days in the future.
I have done quite a good job with my ActiveRecord code: only ~120ms is spent in MySQL (double-checked with RackMiniProfiler) to load the ~200-300 events.
However, my response time is very slow (1.5-2s).
Most of the time is spent instantiating ActiveRecord objects from my queries.
Using ObjectSpace, I can see that ~6k ActiveRecord::Base objects are instantiated for the 300+ events that are displayed.
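(For reference, a count like that can be taken with something along these lines; just a sketch:)

GC.start                                              # collect garbage first so only live objects are counted
puts ObjectSpace.each_object(ActiveRecord::Base).count   # instances of AR::Base and all its subclasses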
The reason why there are so many objects is that my Event model has many associated models (e.g. venue, occurrences, categories, etc...), all of which contain bits of information I need to show.
As expected, profiling showed ActiveRecord object instantiation to be the most time-consuming task during the request.
I am not experienced enough with either ActiveRecord or its performance.
Is such speed expected or should my objects be instantiated much faster?
Should I move away from AR and use simple ruby hashes?
Is this simply what to expect from Rails when the data model is this complex?
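For illustration, loading plain hashes instead of AR objects would look roughly like this (the table and column names here are invented, not my actual schema):

# select_all returns plain hash-like rows: no Event/Venue/Occurrence objects
# are instantiated, only strings and primitives.
rows = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT e.id, e.title, v.name AS venue_name, o.starts_at
  FROM events e
  JOIN venues v      ON v.id = e.venue_id
  JOIN occurrences o ON o.event_id = e.id
  WHERE o.starts_at >= CURDATE()
    AND o.starts_at <  CURDATE() + INTERVAL 5 DAY
SQL

rows.each do |row|
  # row["title"], row["venue_name"], row["starts_at"] ...
end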
========= UPDATE 1 =========
This pastebin contains the service class I use to load the events for a single day in the calendar.
I hope it's understandable; I did not have time to document it properly since it's still a work in progress to improve performance.
========= UPDATE 2 =========
Loading all these objects has another drawback: it triggers GC runs while the page is being rendered, each adding ~100ms after every n events rendered, for a total overhead of ~500ms.
To give you an idea of how much data I'm loading (sadly, 99% of it is needed): dumping it to JSON produces a 47K file.
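(For completeness, the GC pauses themselves can be watched with Ruby's built-in GC profiler; a minimal sketch:)

GC::Profiler.enable
# ... render the page ...
puts GC::Profiler.result   # one row per GC run, with its pause time
GC::Profiler.disable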
========= UPDATE 3 =========
As mentioned by @TheSuper, even though it does not improve AR's performance, fragment caching is indeed my friend, since I'm rendering quite a lot of data while GC runs keep interrupting. Applying fragment caching yielded a 1-1.2s improvement, which is HUGE.
However, I still cannot overcome the 600ms wall of AR.
A possible improvement is the "selective includes" discussed in this answer, for the few cases where I need only a small portion of the attributes of an included model, but this is ugly and inflexible.
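As I understand it, that trick boils down to something like this (column names invented here); the extra SELECTed columns show up as attributes on Event, so no Venue objects are built at all:

events = Event.joins(:venue).
               select("events.*, venues.name AS venue_name")

events.first.venue_name   # reads the joined column without instantiating a Venue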
Related
My introduction to Lua has been through building a FiveM role-play server using ESX v1.2. There are many inefficiencies in the code, and I've isolated the majority of the ones that were causing "server thread hitches", but there's one thing I'm a little lost on.
There's an "extended player" object, commonly called xPlayer, which holds a bunch of RP-specific information aggregated from various database calls. Each player has a server id (a number), and there's a Lua table called ESX.Players in which each xPlayer is stored by that id: ESX.Players[source] = xPlayer.
There's a function:
ESX.GetPlayerFromId = function(source)
    return ESX.Players[tonumber(source)]
end
which wraps the fetch by hash, and this is where we see lots of server hitches. In the calling code I've stripped out all processing logic to leave just this function call in place, and the hitches still happen; commenting it out makes the issue go away. The question is: why does a plain hash-table lookup cause lag, when the values stored in the table are already-instantiated tables and the lookup does NOT reach out to touch any IO or do anything fancy? It's just a stored instance.
The one thing that stands out to me is that there are a handful of values on the xPlayer table followed by MANY functions. Is the weight of those functions on a Lua table enough to slow down returning a reference to it? This file shows where the extended player table is created and returned: https://github.com/esx-framework/es_extended/blob/v1-final/server/classes/player.lua
I am in the process of moving all those functions onto a dedicated utility table that exists once, with each function taking an xPlayer argument to operate on. There are thousands of calls throughout the 100+ FiveM resources back onto the xPlayer.() calls, so the weight of code churn and regression testing is epic; right now the codebase is not in a state to release to a production server and test with 40+ players hammering it. Can anyone confirm whether I'm on the right track: is lifting the functions off the xPlayer tables likely to give me any performance improvement when fetching by hashed key? And if so, why?
Mistakenly, many months ago, I used the following logic to essentially implement the same functionality as the PCollectionView's asList() method:
* I assigned a dummy key of “000” to every element in my collection
* I then did a groupBy on this dummy key so that I would essentially get a list of all my elements in a single array list
The above logic worked fine when I only had about 200 elements in my collection. When I ran it on 5000 elements, it ran for hours until I finally killed it. Then, I created a custom “CombineFn” in which I essentially put all the elements into my own local hash table. That worked, and even on my 5000 element situation, it ran within about a minute or less. Similarly, I later learned that I could use the asList() method, and that too ran in less than a minute. However, what concerns me – and what I don't understand – is why the group by took so long to run (even with only 200 elements it would take more than a few seconds) and with 5000 it ran for hours without seeming to accomplish anything.
I took a look at the group by code, and it seems to be doing a lot of steps I don't quite understand… Is it somehow related to the fact that the group by statement is attempting to run things across a cluster? Or could it be related to the coder being used? Or am I missing something else? The reason I ask is that there are certain situations in which I'm forced to use a group by statement, because the data set is way too large to fit in any single computer's RAM. However, I am concerned that I'm not understanding how to properly use a group by statement, since it seems to be so slow…
There are a couple of things that could be contributing to the slow performance. First, if you are using SerializableCoder, that is quite slow, and AvroCoder will likely work better for you. Second, if you are iterating through the Iterable in your ParDo after the GBK, if you have enough elements you will exceed the cache and end up fetching the same data many times. If you need to do that, explicitly putting the data in a container will help. Both the CombineFn and asList approaches do this for you.
I am using ActiveRecord to bulk migrate some data from a table in one database to a different table in another database. About 4 million rows.
I am using find_each to fetch in batches. Then I do a little bit of logic to each record fetched, and write it to a different db. I have tried both directly writing one-by-one, and using the nice activerecord-import gem to batch write.
However, in either case, my Ruby process's memory usage grows quite a bit throughout the life of the export/import. I would think that with find_each I'm getting batches of 1000, so there should only be 1000 of them in memory at a time... but no, each record I fetch seems to keep consuming memory forever, until the process is over.
Any ideas? Is ActiveRecord caching something somewhere that I can turn off?
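For reference, the loop is roughly shaped like this (the model names and the transform step are placeholders, not my real code):

buffer = []
SourceRecord.find_each(:batch_size => 1000) do |record|
  buffer << TargetRecord.new(transform(record))   # the "little bit of logic"
  if buffer.size >= 1000
    TargetRecord.import(buffer)   # activerecord-import bulk insert
    buffer = []                   # fresh array so the old batch can be GC'd
  end
end
TargetRecord.import(buffer) unless buffer.empty?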
update 17 Jan 2012
I think I'm going to give up on this. I have tried:
* Making sure everything is wrapped in an ActiveRecord::Base.uncached do block
* Adding ActiveRecord::IdentityMap.enabled = false (I think that should turn off the identity map for the current thread, although it's not clearly documented, and I think the identity map isn't on by default in current Rails anyhow)
Neither of those seem to have much effect, memory is still leaking.
I then added a periodic explicit:
GC.start
That seems to slow down the rate of memory leak, but the memory leak still happens (eventually exhausting all memory and bombing).
So I think I'm giving up, and deciding it is not currently possible to use AR to read millions of rows from one db and insert them into another. Perhaps there is a memory leak in MySQL-specific code being used (that's my db), or somewhere else in AR, or who knows.
I would suggest queueing each unit of work into a Resque queue. I have found that Ruby has some quirks when iterating over large arrays like these.
Have one main thread that queues up the work by ID, then have multiple Resque workers hitting that queue to get the work done.
I have used this method on approx 300k records, so it would most likely scale to millions.
Change line #86 to bulk_queue = [], since bulk_queue.clear only sets the length of the array to 0, making it impossible for the GC to reclaim it.
So here's the final line of my request:
Completed in 9209ms (View: 358, DB: 1582)
So I need to figure out what's holding up the request. This request can get to be upwards of a minute, so I really need to figure this one out. If the DB and view take about 1.9 seconds, that means roughly 7.3 seconds are unaccounted for in the log. How can I further analyze other parts of the code? It would be great to know if, say, a single callback were largely responsible for the delay.
Generally code that isn't in the DB or the view is in the controller (or it could be in the logic part of the model). The biggest culprits I've had problems with are loops within loops, or gratuitous use of :include => {...stuff...}
Are you materializing a lot of records from the database? Query time might not be much, but if, say, I selected 10,000 records from the database using a very simple and fast query, I'd still expect my page load time to be high, since it would spend a considerable amount of time hydrating the large recordset.
I'd suggest running some profiling tests (rake test:profile). These should give you an indication of where most of the time is being spent. You'll probably find it's just one or two method calls that are being invoked with excessive frequency.
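If you want to narrow it down by hand before setting up full profiling runs, a crude sketch along these lines can help (the method names here are placeholders for whatever your action actually does):

require 'benchmark'

def show
  logger.info "load:   %.3fs" % Benchmark.realtime { @events = load_events }
  logger.info "totals: %.3fs" % Benchmark.realtime { @totals = compute_totals(@events) }
end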
I've got a fancy-schmancy "worksheet" style view in a Rails app that is taking way too long to load. (In dev mode, and yes I know there's no caching there, "Completed in 57893ms (View: 54975, DB: 855)") The worksheet is rendered using helper methods, because I couldn't stand maintaining umpteen teeny little partials for the different sorts of rows in the worksheet. Now I'm wondering whether partials might actually be faster?
I've profiled the page load and identified a few cases where object caching will shave a few seconds off, but the profile output suggests that a large chunk of time is spent simply looping through the Worksheet model's constituent objects and appending the string output from the helper. Here's an example of what I'm talking about:
def header_row(wksht)
  content_tag(:thead, :class => "ioe") do
    content_tag(:tr) do
      html_row = []
      for i in (0...wksht.class::NUM_COLS) do
        html_row << content_tag(:th, h(wksht.column_headings[i].upcase),
                                :class => wksht.column_classes[i])
      end
      html_row.join("\n")
    end
  end
end
OTOH using partials means opening files, spinning up the Ruby interpreter, and in the long run, aggregating a bunch of strings, right? So I'm wondering whether there is another way to speed things up in the helpers. Should I be using something like a stringstream (does that exist in Ruby?), or should I get rid of the content_tag calls in favor of my own string interpolation... I'm willing to write my own performance tests and share the results, if you have any suggested alternatives to the approach I've already taken.
As it's a fairly complex view (and has an editable version as well), I'd rather not rewrite-and-profile the whole thing more than once. :)
Some related reading:
http://www.viget.com/extend/helpers-vs-partials-a-performance-question/ (old)
http://www.breakingpointsystems.com/community/blog/ruby-string-processing-overhead/
http://blog.purepistos.net/index.php/2008/07/14/benchmarking-ruby-string-interpolation-concatenation-and-appending/
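As a starting point for those tests, here's a rough standalone comparison of collect-and-join versus appending to a single String buffer (Ruby has no stringstream as such, but << on a String plays that role); just a sketch:

require 'benchmark'

cells = Array.new(10_000) { |i| "<th>Column #{i}</th>" }

Benchmark.bm(12) do |x|
  # Collect fragments into an array and join once at the end.
  x.report("array/join") do
    row = []
    cells.each { |c| row << c }
    row.join("\n")
  end
  # Append everything to one mutable string buffer.
  x.report("string <<") do
    buf = ""
    cells.each { |c| buf << c << "\n" }
    buf
  end
end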
@tadman:
There are row totals and column totals (and more columnar arithmetic), and since they're not all just totals, but also depend on other "magic numbers" from the database, I implemented them in the Ruby code rather than Javascript. (DRY and unit testable.) Javascript is used only in the edit view, and just to add/delete rows (client side only) and to fetch a sheet with fresh totals when the cell contents change. It fetches the whole table because nearly half of the values get updated when an input cell changes.
The worksheet and its rows are actually virtual models; they don't live in the DB, but rather aggregate a boatload of real AR objects. They get created every time a view renders (but that takes 1.7 secs in dev mode, so I'm not worried about it).
I suppose I could transmit a matrix of numbers, rather than marked-up content, and have JS unpack it into the sheet. But that gets unmaintainable fast.
I ended up reading an excellent article at http://www.infoq.com/articles/Rails-Performance ("A Look At Common Performance Problems In Rails"). Then I followed the author's suggestion to cache computations during request processing:
def estimated_costs
  @estimated_costs ||=
    begin
      # tedious vector math
    end
end
Because my worksheet does stuff like the above over and over, and then builds on those results to calculate some more rows, this resulted in a 90% speedup right off the bat. Should have been plain as day, but it started with just a few totals, then I showed the prototype to the customer, and it snowballed from there :)
I also wondered whether my array-based math might be inefficient, so I replaced the Ruby Arrays of numbers with NArray (http://narray.rubyforge.org/). The speedup was negligible but the code's cleaner, so it's staying that way.
Finally, I put some object caching in place. The "magic numbers" in the database only change a few times a year at most, and some of them are encrypted, but they need to be used in most of the calculations. That's low-hanging fruit ripe for caching, and it shaved off another 1.25 seconds.
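The caching itself is nothing fancy; conceptually it's along these lines (MagicNumber and decrypted_value are placeholder names, not my real model):

def magic_numbers
  # Decrypt once, then serve from the Rails cache until it expires.
  @magic_numbers ||= Rails.cache.fetch("worksheet/magic_numbers", :expires_in => 12.hours) do
    MagicNumber.all.inject({}) { |h, m| h[m.key] = m.decrypted_value; h }
  end
end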
I'll look at eager loading of associations next, as there's probably some time to save there, and I'll do a quick comparison of sending "just the data" vs sending the HTML, as @tadman suggested. About the only partial I can cache is the navigation sidebar. All of the other content depends on the request parameters.
Thanks for your suggestions!
Internally all partials are translated into a block of executable Ruby code and run through exactly the same runtime as any helper methods. Periodically you can see glimpses of this when a malformed template causes the generated code to fail to compile.
Although it stands to reason that helper methods are faster than partials, and a straightforward string interpolation is faster still, it's hard to say if the performance gain from this would make it worth pursuing. Rendering a very large number of partials can be a bottleneck in terms of logging in the development environment, but in a production environment their impact seems less severe.
The only way to figure this one out is to benchmark your pages using two different rendering methods.
As you point out, caching is where you get the big gains. Using Memcached to save large chunks of pre-rendered HTML content can give you dramatically faster load times. Rendering 10,000 rows into HTML will always be slower than retrieving the same snippet from the Rails.cache subsystem.
The content you don't render at all is always the fastest, so anything you can do to reduce the amount of content you generate for each helper call will provide big gains. If you're building a large spreadsheet-style app that's entirely dependent on JavaScript, you may find that bundling up the data as a JSON array and expanding it client-side is significantly faster than unrolling the HTML on the server and shipping it over that way.
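A minimal sketch of that "just the data" approach (the names here are hypothetical, since the worksheet is a virtual model):

# Controller action: send the numbers, not the markup, and let client-side
# JS build the table cells from the JSON matrix.
def worksheet_data
  wksht = build_worksheet(params[:id])   # however the virtual Worksheet gets assembled
  render :json => {
    :headings => wksht.column_headings,
    :rows     => wksht.rows.map { |r| r.cell_values }
  }
end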