Rails app: Trouble shooting frequent Handling RequestTimeOut errors - ruby-on-rails

I have a large webb app of which I have recently been working hard to reduce load times. I have two controllers Generator (some 20.000 items) and Product (some 1.500 items) that have been slow for a while but I have worked with indexes and smart queries. On my dev app the app response time is about 500 ms.
From time to time I still get RequestTimeOut on the app and I need help trouble shooting this error. I understand what it means (a request has taken too much time) and I have installed the 'rack-timeout' gem and set it to 15 seconds (which works fine).
I have gone through the entire app (and especially the two slowest: Generator & Product) in search for time to save. I have had some issues with caching that I am currently trying to fix (caching would help quite a bit).
It seems that these timeouts happens mostly when bots (Yandex.ru especially) spiders through my site and especially goes through one generator after another. They may not be very slow any more but loading so many after another causes a lot of requests.
Now I am out of ideas and need some help in order to know what and how to continue my trouble shooting:
Is there anything else outside of response time that cause this
error? E.g. memory leakage or something? Or is it just a matter of
lots of requests on slow controllers?
I haven't been able to test it on my development platform. Is
there a way to benchmark and see how the app would handle
requests like from the bots? I seem to remember there was an
"Apache-thing" one could use to simulate traffic like this.
Any other ways of looking at the problem or trouble shoot this
issue from a high level point of view? Any ideas and
thoughts are welcome!

Related

Remote Queued Logging of Issues (not exceptions)

We're writing an application which uses our http api, and occasionally it will encounter errors, failed connections, timeouts, etc.
We'd like to, at least in beta, be able to capture these incidents, and forward them to ourselves somehow. Since obviously many of these issues could be due to the actual connection being down, this would need to queue these incidents and send them when a connection is available.
I tried googling for an answer for this but to no avail, came across a bunch of solutions which catch exceptions, but not just random "incidents" (could really just be a string we log somewhere, we'd just include all the details in it).
Short of writing my own coredata (or something) backed queue, I'm at a loss at what a solution for this could be.
Does anyone know of any libs/services which could help with this?
You might want to look into Testflight, or less general purpose, Parse. Not quite sure, but maybe HockeyKit offers a solution for this, too.
You can take a look at Bugfender, it's a product we have built to solve this problem. We have found that while developing an app there are a lot of issues that are not crashes, so we decided to make our own product to help us on this.
It's easy to integrate and you can get the devices logs. Our service works offline and online, we have spent a lot of time to make it reliable and easy to use.
Compared to other products, you don't need any crash to get the logs, you choose when you want them.

How do I get a more detailed transaction trace with the Ruby New Relic agent

I'm running a rails 3.0 application on Heroku and using the New Relic addon/service.
I have been looking at the transaction traces feature (available in the pro version) to understand a little more about the performance characteristics of the application. However, a significant portion of time (30-50%) is "uninstrumented time". After making a few stabs by putting method_tracers in some places and going through the reasonably slow cycle to test whether I get more info, I'm feeling this is going nowhere fast.
It seems in the PHP new relic agent they have a great feature to get very detailed traces without needing to guess where to put method tracers: http://newrelic.com/docs/php/php-agent-faq#top100
Is there anything similar to this for ruby?
Note: I'm already using rpm_contrib to get some more info and have garbage collection stats enabled. Also, this is not about fixing a performance problem, just understanding how to better use the performance tools available and scratch a niggling itch about that uninstrumented time.
There isn't currently anything similar for Ruby. I'll mention it to the Ruby engineer when I get a chance. My guess is unless a lot of requests come in for it, it won't be at the top of the list for a while, though. In the meantime, you can use the method tracers to figure out the uninstrumented time.
Hope that helps.
Method tracers can work well, but if you have a lot of code in your controller, try a binary search using trace_execution_scoped, which records the time spent in a block of code:
http://newrelic.github.com/rpm/NewRelic/Agent/MethodTracer/InstanceMethods/TraceExecutionScoped.html#method-i-trace_execution_scoped
Add a couple calls to this, give each metric a sensible name like "Custom/MySlowControllerAction/block0" (first argument to trace_execution_scoped), and repeat.
The metrics you name will show up not just in Transaction Traces, but also in the Performance Breakdown for the controller action under the Web Transactions tab, so you'll see average time in that block of code across all requests, not just the slow ones.

Anybody using detrusion.com, web application firewall for ruby on rails

PS: I was doing to some random search and then I got detrusion.com.
Whats this web application firewall ?
How it works ?
Any performance hit, if yes then how much?
Should I use this destruction.com or anything else better available.
Anybody??
I quickly glanced at the code and it doesnt appear to be doing all that much. Basically it maintains a white and black list of IPs. While it cannot be that much of a crazy performance hit you'd probably be better off doing this kind of request analyzing in a Rack middleware, that is before it even gets to the Rails request handling.
That being said, I dont like the fact that it will re-sync every 5 minutes DURING processing a given request. That is, it will block the current request while it re-syncs its ruleset / and lists. Which means that you're at the mercy of the Detrusion.com team to keep their site/API up. So when they go down you go down.
While its not as real-timey, I'd feel more comfortable to have the updating process be out of bound. Maybe you store the rules/lists in a flat file or a local DB (Redis would be perfect) which you load on app start. Then you have a frequent cron which reloads the ruleset from Detrusion and writes it locally.
Something like that. Just anything to de-couple your request handling from a Detrusion API check.

Ruby on Rails: what performance can I realistically aim for?

I've been building an application in Ruby on Rails 3, and I'm starting to worry about performance optimization. Now I hope that my question is not too subjective for this site, but I'm interested in facts, not a discussion, so here goes:
While I'm trying to get my views to render faster, there is one thing I simply do not know: What should I aim for? Given a reasonably complex page, what load time is realistic? I simply don't have any reference.
What I'm typically seeing for my application is something like this:
Completed 200 OK in 397ms (Views: 341.1ms | ActiveRecord: 17.7ms)
This is on my production server, running Apache/Passenger.
I am the only one (!) making requests on that server, it's a root server (not virtual), running Ubuntu, AMD Athlon 64 X2 5600+, 4 GB RAM
That is, for most of my more complicated actions (not unusually complicated, just assume it's a paginated listing of 20 objects with 5 computed properties each or something) the ActiveRecord times are almost always fine (<20-30ms), but the "views" number is usually >200 ms.
Now, to my question: When I started using RoR my expectation (maybe unrealistic) was that for most consumer-oriented applications with average complexity (let's say something like Facebook, Twitter, etc. WITHOUT the millions of users) I would get < 20 ms load times as long as I was the only one making requests, and that for a single server load times would only approach 100ms or more if there were lots of people making requests at the same time.
My expectation was also that database requests would be the major bottleneck, since all the rest is just relatively simple computations without any real complexity. I thought that it might take 10ms to get all the objects from the database, and then maybe another 5 ms to run the controller code, build the view, etc.
Since I've never been in charge of any production app, I don't know if this expectation was in any way realistic. So I would like somebody with experience point out to me what my realistic expectation should be.
(e.g. "pretty much everything but really nasty stuff should render in 50 ms tops as long as you are the only one making requests")
or ("actually 300 ms is not unusual for RoR applications, even if you're the only user")
or ("Are you kidding? I get < 10 ms with 150 concurrent requests on a smaller server than yours. There must be something very wrong with your app)
Again, I hope this is not too subjective, but I'm not really interested in an opinion of whether or not RoR is fast, I want facts from someone with more experience on what numbers are average and to be expected from production RoR applications. Otherwise I simply have no clue at what point I should stop optimizing and just accept that I'll never get 10 ms load times.
Gosh, I'm not sure I'm the one to answer this, but since I've been around these waters enough times, I may have an incomplete idea of things to look at.
First of all, the response times is pretty subjective. Meaning, it's good enough if it's good enough for you. From my experience, pages resembling your description seem to take about as much time as what you're describing. So, you're not orders of magnitude off in either direction.
If you want to optimize your view renders with your current architecture, your next step is here, I think. Greg Pollack does a great job breaking this stuff down for you and will make sure you're on track. You'll be sure to get your assets cached and your stack fine-tuned. That'll be your most practical general advice.
If you're willing to look at your deployment architecture, Ilya Grigorik raises some great questions in this article and then answers them with Goliath. If your bottlenecks are speeding up your server-client round trip, that's probably the approach to do.
I try to pay attention to anything Aaron Patterson says about performance, like in this talk. He's going to teach general optimization ideas, most of them for your server-side code. You may catch a few things that relate to your current problem.
I was pulled aside by a former co-worker at MWRC this year and told that I'm absolutely nuts if I'm not building with JRuby these days. It's a bit of a commitment, and I've resisted making major changes like that until I have truly painful response times, which I don't, and it doesn't sound like you're having either. However, JRuby's a very mainstream thing to do now, and you and I will likely embrace this for some projects at some point in the future.
So, bottom line, I think you're in the realm of a spry app as you are. I think I'd work down these resources in the order I presented them.
Not knowing what you're rendering, it's hard to comment on the performance, but I would venture to say that 200ms is very high. Don't forget that the debug information in your logs can be a little misleading: if you're querying your DB or some external resource from within a view, as opposed to preloading that data in your controller, then that time will be attributed to view rendering.
Common culprits: you load Model X in your model, but then access an association in your view which triggers a bunch of selects under the hood. The time to fetch Model x is low, but the associated records will show up as "view time".
In other words, dig into the logs and if its actually your view code, then bring up a profiler.
I'm getting view times < 20ms on a $20/month linode server. That's well-optimized code, for a request of medium complexity, running on JRuby. You haven't hit Rails' performance limits by any means. Time to use a profiler and see what's taking so long.
I don't think your 200 ms view time is abnormal, or even high in any way.
However, you have room for improvement. You say " (not unusually complicated, just assume it's a paginated listing of 20 objects with 5 computed properties each or something)"
To me, that's 100 operations that could be pre-calculated, and would speed up your view rendering time.
Finally -- Rendering time doesn't usually have a direct correlation to number of users. Under most deployments, as a request comes in, it is handled by a process and then responded to. Other requests wait until the first is completed before they are processed.
Use static content where possible. Outside of that, use caching where possible, at the highest possible level, preferably at the page level. When content can't be cached, try to get -something- static or cacheable back to the user quickly. You might, for instance, serve up a static page with the basic layout, and an animated busy-image where the content belongs, and then use JavaScript to load the dynamic content.

How to prepare to be tech crunched

There is a good chance that we will be tech crunched in the next few days. Unfortunately, we have not gone live yet so we don't have a good estimation of how our system handles a production audience.
Our production setup consists of 2 EngineYard slices each with 3 mongrel instances, using Postgres as the database server.
Obviously a huge portion of how our app will hold up is to do with our actual code and queries etc. However, it would be good to see if there are any tips/pointers on what kind of load to expect or experiences from people who have been through it. Does 6 mongrel instances (possibly 8 if the servers can take it) sound like it will handle the load, or are at least most of it?
I have worked on several rails applications that experienced high load due to viral growth on Facebook.
Your mongrel count should be based on several factors. If your mongrels make API calls or deliver email and must wait for responses, then you should run as many as possible. Otherwise, try to maintain one mongrel per CPU core, with maybe a couple extra left over.
Make sure your server is using a Fair Proxy Balancer (not round robin). Here is the nginx module that does this: http://github.com/gnosek/nginx-upstream-fair/tree/master
And here are some other tips on improving and benchmarking your application performance to handle the load:
ActiveRecord
The most common problem Rails applications face is poor usage of ActiveRecord objects. It can be quite easy to make 100's of queries when only one is necessary. The easiest way to determine if this could be a problem with your application is to set up New Relic. After making a request to each major page on your site, take a look at the newrelic SQL overview. If you see a large number of very similar queries sequentially (select * from posts where id = 1, select * from posts where id = 2, select * from posts...) this may be a sign that you need to use a :include in one of your ActiveRecord calls.
Some other basic ActiveRecord tips (These are just the ones I can think of off the top of my head):
If you're not doing it already, make sure to correctly use indexes on your database tables.
Avoid making database calls in views, especially partials, it can be very easy to lose track of how much you are making database queries in views. Push all queries and calculations into your models or controllers.
Avoid making queries in iterators. Usually this can be done by using an :include.
Avoid having rails build ActiveRecord objects for large datasets as much as possible. When you make a call like Post.find(:all).size, a new class is instantiated for every Post in your database (and it could be a large query too). In this case you would want to use Post.count(:all), which will make a single fast query and return an integer without instantiating any objects.
Associations like User..has_many :objects create both a user.objects and user.object_ids method. The latter skips instantiation of ActiveRecord objects and can be much faster. Especially when dealing with large numbers of objects this is a good way to speed things up.
Learn and use named_scope whenever possible. It will help you keep your code tiny and makes it much easier to have efficient queries.
External APIs & ActionMailer
As much as you can, do not make API calls to external services while handling a request. Your server will stop executing code until a response is received. Not only will this add to load times, but your mongrel will not be able to handle new requests.
If you absolutely must make external calls during a request, you will need to run as many mongrels as possible since you may run into a situation where many of them are waiting for an API response and not doing anything else. (This is a very common problem when building Facebook applications)
The same applies to sending emails in some cases. If you expect many users to sign up in a short period of time, be sure to benchmark the time it takes for ActionMailer to deliver a message. If it's not almost instantaneous then you should consider storing emails in your database an using a separate script to deliver them.
Tools like BackgroundRB have been created to solve this problem.
Caching
Here's a good guide on the different methods of caching in rails.
Benchmarking (Locating performance problems)
If you suspect a method may be slow, try benchmarking it in console. Here's an example:
>> Benchmark.measure { User.find(4).pending_invitations }
=> #<Benchmark::Tms:0x77934b4 #cutime=0.0, #label="", #total=0.0, #stime=0.0, #real=0.00199985504150391, #utime=0.0, #cstime=0.0>
Keep track of methods that are slow in your application. Those are the ones you want to avoid executing frequently. In some cases only the first call will be slow since Rails has a query cache. You can also cache the method yourself using Memoization.
NewRelic will also provide a nice overview of how long methods and SQL calls take to execute.
Good luck!
Look into some load testing software like WEBLoad or if you have money, Quick Test Pro. This will help give you some idea. WEBLoad might be the best test in your situation.
You can generate thousands of virtual nodes hitting your site and you can inspect the performance of your servers from that load.
In my experience having watched some of our customers absorb a crunching, the traffic was fairly modest- not the bone crushing spike people seem to expect. Now, if you get syndicated and make on Yahoo's page or something, things may be different.
Search for the experiences of Facestat.com if you want to read about how they handled it (the Yahoo FP.)
My advise is just be prepared to turn off signups or go to a more static version of your site if your servers get too hot. Using a monitoring/profiling tool is a good idea as well, I like FiveRuns Manage tool for ease of setup.
Since you're using EngineYard, you should be able to allocate more machines to handle the load if necessary
Your big problems will probably not be the number of incoming requests, but will be the amount of data in your database showing you where your queries aren't using the indexes your expecting, or are returning too much data, e.g. The User List page works with 10 users, but dies when you try to show 10,000 users on that one page because you didn't add pagination (will_paginate plugin is almost your friend - watch out for 'select count(*)' queries that are generated for you)
So the two things to watch:
Missing indexes
Too much data per page
For #1, there's a plugin that runs an 'explain ...' query after every query so you can check index usage manually
There is a plugin that can generate data for you for various types of data that may help you fill your database up to test these queries too.
For #2, use will_paginate plugin or some other way to reduce data per page.
We've got basically the same setup as you, 2 prod slices and a staging slice at EY. We found ab to be a great load testing tool - just write a bash script with the urls that you expect to get hit and point it at your slice. Watch NewRelic stats and it should give you some idea of the load your app can handle and where you might need to optimise.
We also found query_reviewer to be very useful as well. It is great for finding those un-indexed tables and n+1 queries.

Resources