Why is memory profiling in Ruby so hard?

Or rather, why aren't there better tools for profiling memory in Ruby, specifically in Rails apps?
Recently our Rails app (hosted on Heroku) has started seeing lots of R14 errors in the worker dynos, which means we're running out of memory. Bumping the dynos to 2x (512MB -> 1GB) only alleviated the problem temporarily, leading me to believe there is a memory leak somewhere. Naturally, my next step was to find a good profiling gem to help me discover the source of the leak.
Maybe I'm just ignorant of the tools available, or maybe I just don't know how to use the ones I have. My wish is that I could install a gem and then run reports on the memory usage statistics. Hitting an endpoint to get a report is not really viable as my memory issues are isolated to worker dynos running delayed jobs.
I've looked at memprof, but it's 1.8 only.
I've looked at ruby-prof (awesome), but the memory profiling requires a patched Ruby interpreter.
I've looked at GC::Profiler, but I don't understand how to find memory leaks with it.
So, is it just plain difficult to find memory leaks in ruby? Or am I missing the point somehow?

Depending on your "type" of leak, you can run Valgrind against Ruby, though that might require recompiling Ruby. In general it's hard because Ruby does its own memory allocation without firing any events by default, so leaks are tricky to track. See also the perftools.rb project, which somewhat works around this limitation.
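As an example, a minimal sketch (it assumes a Ruby built with debugging symbols, and a hypothetical script/leaky_job.rb that reproduces the worker's behaviour):

    valgrind --leak-check=full --num-callers=50 ruby script/leaky_job.rb

Bear in mind that Valgrind only sees C-level allocations; objects kept alive by Ruby references show up as "still reachable" rather than as leaks, so you will still need Ruby-level tooling for those.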

Recently I've been having some success with Skylight to profile web and background worker methods and then hunt down opportunities for optimisation. It probably wasn't around when you posted this question. The downside is that it only really helps you debug in staging or production, not development environments, so the dev loop can be very slow.
Make sure you install both skylight-ruby and sidekiq-skylight to get profiling on both your web server and background workers if you're using Sidekiq.
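A minimal Gemfile sketch (the agent gem itself is published as skylight; double-check both names against the current docs):

    # Gemfile
    gem 'skylight'          # web request instrumentation
    gem 'sidekiq-skylight'  # adds Sidekiq background worker instrumentation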
Good luck!

There is a nice way if your application is running on an OS with DTrace or a similar technology such as SystemTap. In my case we use RHEL/CentOS, which has the latter:
https://lukas.zapletalovi.com/2016/08/probing-ruby-20-apps-with-systemtap-in-rhel7.html
You can easily connect to a production application, "inject" profiling code for a moment to track calls, memory, CPU time, or I/O, and then "disconnect" at any time. It's very efficient, so you will not likely notice any drastic slowdown (unless you screw your script up).
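As a rough sketch (assuming a Ruby built with the SystemTap/DTrace probes enabled; probe names and argument layouts can vary by Ruby version), this tallies object allocations by class across running Ruby processes and prints the totals when the session ends:

    stap -e '
      global allocs
      probe process("/usr/bin/ruby").mark("object__create") {
        allocs[user_string($arg1)] <<< 1   # $arg1 is the class name
      }
      probe timer.s(30) { exit() }         # sample for 30 seconds, then print
    '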

I disagree that memory profiling in Ruby is hard. The JVM has some of the best memory profiling tools on the planet, and you can run your Ruby programs on the JVM. Don't reinvent the wheel.
Browsing Memory the JRuby Way
Finding Leaks in Ruby Apps with Eclipse Memory Analyzer
Monitoring Memory with JRuby, Part 1: jhat and VisualVM
Monitoring Memory with JRuby, Part 2: Eclipse Memory Analyzer

Related

Rails Server Memory Leak/Bloating Issue

We are running two Rails applications on a server with 4GB of RAM. Both use Rails 3.2.1, and whether run in development or production mode, they eat away RAM at incredible speed, consuming up to 1.07GB each per day. Keeping the server running for just 4 days triggered all the memory alarms in our monitoring, and we had just 98MB of RAM free.
We tried ActiveRecord optimizations related to bloating, but to no effect. Please help us figure out how to trace which controller is at fault.
We are using a MySQL database and the WEBrick server.
Thanks!
This is incredibly hard to answer without looking into the project itself. Though I am quite sure you won't be using WEBrick in your target production build (right?), so check whether it behaves the same under Passenger or whatever your choice is.
Also, without knowing the details of the project, I would suggest looking at features like PDF generation, CSV parsing, etc. I've seen a case where generating PDF files ate resources in a similar fashion, leaving around 5MB of un-garbage-collected memory after each run.
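If you want to narrow it down yourself, here is a crude sketch that brackets a suspect code path and compares live object counts (generate_pdf is a hypothetical stand-in for your own feature code):

    GC.start
    before = ObjectSpace.count_objects.dup

    generate_pdf   # the code path you suspect

    GC.start
    after = ObjectSpace.count_objects

    # Object types that survive a full GC after the run are leak candidates
    after.each do |type, count|
      delta = count - before.fetch(type, 0)
      puts "#{type}: +#{delta}" if delta > 0
    end

Anything that keeps growing across repeated runs deserves a closer look.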
Good luck.

Rails (3) server - what to use nowadays?

I've been using Ruby Enterprise Edition and Passenger (for Apache, since I run Apache anyway for other things) for some time, but I'm wondering if there's a new trend about what to use on servers nowadays.
For example I've heard about Thin, Unicorn... I also know that 1.9.2 is faster than REE, but I wonder about RAM consumption. I'd rather have it consume less RAM even at the expense of some speed.
Thanks for all advice.
If you want minimal memory you should try Thin.
It does not have a master process like Unicorn or Passenger, and thus uses less memory.
Suppose you have a very small app that needs to run on a small VM; then you can use one Thin worker plus nginx. I ran several Rails 3.2 apps using Thin+nginx+Postgres on 256MB VMs without swapping.
Unicorn is faster, but it needs a master process. It's a good fit if you want to run on Heroku: you can set it to 2 or 3 workers and stay within the 512MB limit.
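For example, a minimal config/unicorn.rb sketch for a 512MB dyno (the worker count is something you would tune against your own memory footprint):

    # config/unicorn.rb
    worker_processes 3      # drop to 2 if you start seeing R14 memory errors
    timeout 30
    preload_app true        # load the app once, share code between forked workers

    before_fork do |server, worker|
      # each forked worker needs its own database connection
      ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
    end

    after_fork do |server, worker|
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
    end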
If your app is very big and you have too many long-running requests, I would check out JRuby and Trinidad/TorqueBox.
I converted a few apps from MRI+Sidekiq to JRuby+Trinidad+Trinidad_Scheduler. I get about 100-200 req/sec using a pool of 50 threads in a Trinidad server!
What I like about JRuby is that you can combine everything on one Rails server. On the same Java VM you can put together the cache store (with EhCache), scheduling, background processing, and real multithreading.
You don't need to run Redis, memcached, Resque, or Sidekiq separately.
I'm not saying they aren't good (I love Sidekiq and Resque), but you can decrease your complexity by combining everything in one process and still have high concurrency.
A more advanced, enterprise-grade solution is TorqueBox; it has support for clustering and is super scalable. But I've had problems with my app crashing on TorqueBox, so I'm sticking with Trinidad for now.
The disadvantage of JRuby? Memory! A Trinidad server will use a minimum of 512MB, and up to 2-3GB of RAM.
Also, on a single-threaded server, a single request from a Rails app running Ruby 1.9.3 is about twice as fast as the same request on JRuby.
Another option is Puma: you can get full multithreading on MRI with it. I myself could not get it stable enough on my apps.
So it all depends on your requirements: memory usage, full threading, and concurrency.
Apart from Passenger, have a look at Unicorn, Trinidad, Puma, and TorqueBox. Those seem to be the top Rails servers right now.
There is a great book with an introduction to converting your Rails app to JRuby and deploying it using several methods such as Trinidad.
http://pragprog.com/book/jkdepj/deploying-with-jruby
The TorqueBox documentation is amazingly good. It's very detailed and explains really well how to use all the TorqueBox features.
http://torquebox.org/documentation/
I hope that sharing my experience has helped.
Passenger is still extremely strong, especially as REE will naturally support 1.9 in the near future. The fact that your application can crash without affecting anything else on your machine is an amazing feature to have. Deploying code is extremely easy because the server will continue to accept connections, which means less frustration/stress for you.
However, in terms of comparisons:
Here is a great resource for checking out various comparisons (including memory consumption) of all the new servers.
It compares Thin, Unicorn, Passenger, TorqueBox, GlassFish, and Trinidad:
http://torquebox.org/news/2011/03/14/benchmarking-torquebox-round2/
Mike Lewis' link does a good job of comparing those different Ruby servers. My personal experience has been with nginx/REE/Passenger, and it's been good. I haven't tried the others, so I can't comment on them.
However, I can speak on RAM usage. Your biggest savings of RAM will come from using 32-bit servers. In my experience (3x 3GB app servers), 64-bit REE/passenger processes took up to 2x as much RAM as their 32-bit counterparts. We saw a significant performance increase moving from 64 to 32 bit servers, everything else staying the same. Unless your application requires 64-bit, I would suggest running your application servers (not database) in 32-bit.
Passenger is still a very good choice to use so you are not behind the times or anything. It is also actively supported and has a very good development team that contributes a lot to the community. We have been using Unicorn and it has been very good. Our favorite functionality is to be able to upgrade apps/ruby/nginx without dropping a connection.

Memory requirements for JRuby + Rails + Mongrel?

Hi,
I am planning to run JRuby (1.5.3, the latest) on Mongrel, but how much memory will it require on an x64 server for a simple web site? And how many instances will be required?
10,000 page views per day.
For the same requirements, what would the numbers be for plain (C) Ruby?
Any reference production data would be welcome.
You probably won't use Mongrel with JRuby; at least I've never heard of it. We run an app using Trinidad, which wraps Tomcat 7, and for performance similar to what you're looking for I use a 1GB heap.
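JRuby forwards -J-prefixed flags straight to the JVM, so you can set that heap cap explicitly on the command line; a sketch (the trinidad invocation is just illustrative):

    jruby -J-Xmx1024m -S trinidad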
Mongrel has really fallen out of favour compared to more robust setups using Passenger, Thin, or Unicorn, for instance.
If you're limited on memory, in my experience CRuby is the way to go. Try REE or Ruby 1.9.2 with Passenger 3 and nginx. It's a super simple setup and very fast.
JRuby definitely takes more memory, but if you have Java requirements you don't have much choice.
With 10,000 page views a day you should get away with a small EC2 instance (if that's what your "instances" refers to).
It's really hard to give a definitive answer, though, as it all depends on what type of app you're running. Is it CPU-intensive calculation, or memory-intensive data? Who knows.
In my experience, CRuby tends to be much simpler than JRuby and easier for local use (i.e. tests run significantly faster in CRuby), and it is also very fast.

What do I need to know about JRuby on Rails after developing RoR Apps?

I have done a few projects using Ruby on Rails. I am going to use JRuby on Rails and host it on GAE. In that case, what do I need to know while developing JRuby apps? I read that:
JRuby has the same syntax
I can access Java libraries
JRuby does not have access to some gems/plugins
A JRuby app would take some time to load the first time, so I have to keep it alive by sending a request every 5 minutes or so
I cannot use ActiveRecord and instead I must use DataMapper
Please correct me if I am wrong about any of the statements I have made. Is there anything else that I must know? Do I need to start reading about JRuby from scratch, or can I go about developing as I usually do for Ruby apps?
I use JRuby everyday.
True:
JRuby has the same syntax
JRuby does not have access to some gems/plugins
I can access Java libraries
Some gems/plugins have JRuby-specific versions, and some don't work at all. In general I have found few problems, and as the libraries and platforms have matured, a lot of the problems have gone away (JRuby has become a lot better).
You can access Java, but in general why would you want to?
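That said, when you do want Java, the interop is mostly seamless; a small sketch:

    require 'java'

    list = java.util.ArrayList.new
    list.add('hello from the JVM')
    puts list.get(0)

    # Java camelCase methods also get snake_case aliases:
    puts java.lang.System.current_time_millis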
False:
A JRuby app would take some time to load the first time, so I have to keep it alive by sending a request every 5 minutes or so
I cannot use ActiveRecord and instead I must use DataMapper
Although I guess it is possible to imagine a server setup where the initial startup/warmup cost of the JVM means you need to ping the server, there is nothing inherent in JRuby that makes this true. If you need to keep the server alive, look at your deployment environment. Something similar happens in shared hosting with Passenger, where an app can be unloaded from memory after a period of inactivity.
Also, we use ActiveRecord with no problems at all.
AFAIK, Rails 3 is 100% compatible with JRuby, so there should be no problem on that path.
Like every new platform, you should make yourself comfortable with it by playing around with JRuby. I recommend using RVM to do that.
As far as your questions go:
JRuby is just another runtime, like MRI or Rubinius
since JRuby runs inside the JVM, using Java is very easy, but you can also use RJB from MRI
some gems are not compatible when they use native C libraries that do not run on JRuby
the JVM and your application container need startup time and some time to load your app, but that is all; there is no need for a keep-alive, that is wrong
you can use whatever you want; most gems have been updated to be compatible with JRuby
#TobyHede mostly covered the issues you thought you might have, so I'll leave it at that.
As for other things to keep in mind: it's simply a different interpreter, and funny discrepancies will crop up that take some adaptation.
some methods are implemented differently; for example, sleep 10.seconds will throw an exception (you have to sleep 10.seconds.to_i), and I remember getting a NoMethodError on the Symbol class when switching from MRI to JRuby (I don't remember which method wasn't implemented). Just bear in mind that slight variations will be there (see the sketch after this list)
you will experience hangs and exceptions in gems that otherwise worked for you (pry, for example, when listing more than one page)
some gems may work differently; pry (again) will exit if you press Ctrl+C, for example, which is pretty annoying
slightly slower load times for everything, and no zeus
you'll get occasional Java exception stack traces with no indication of which line of Ruby code they happened on
Timeout.timeout often will not work as expected when it's wrapped around network code and the stars align badly (this has mostly been fixed in JRuby core, but it seems to still be an issue with gems that do their own network code in pure Java)
hidden problems with thread safety in third-party code (see "How do you choose gems for a high throughput multithreaded Rails app?"); stay away from EventMachine, for example
threads will be awesome (native, and no GIL) and fibers will suck (with no coroutine support in the JVM they're ordinary threads); this is why you often won't get a performance boost with Celluloid compared to MRI
you used to run your Rails apps as MRI processes in an OS: you knew how to track their PIDs, bloat, and run times, and how to kill and monitor them. This is less evident when you switch to JRuby, because everything turns into threads in a single process. The Java world has very good tools to handle these issues, but it's something you'll have to learn
killall -9 ruby doesn't do the trick with JRuby when your console hangs (which it does more often than before); you have to ps -ef and then track down the proper processes without killing your NetBeans, etc. (minor, but annoying)
due to my last point, knowing Java and the JVM will help you get out of tight spots in certain situations (depending on what you intend to do, this may be something you actually need); the choice of deployment server will increase or decrease this need (TorqueBox, for example, is a bit notorious for this; other deployment options might be simpler, see http://thenerdings.blogspot.com/2012/09/pulling-plug-on-torquebox-and-jruby-for.html)
...
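To make the first point concrete, a sketch of the sleep discrepancy mentioned above (it assumes ActiveSupport is loaded, which is what makes 10.seconds available at all):

    require 'active_support/all'

    sleep 10.seconds.to_i  # portable: pass a plain Integer
    # sleep 10.seconds     # reportedly raises on (older) JRuby, per the list above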
Also, see what the JRuby team says about the differences: https://github.com/jruby/jruby/wiki/DifferencesBetweenMriAndJruby
But yeah, otherwise it's "just the same as MRI Ruby" :)

Delayed Jobs leaking memory?

I'm using collectiveidea's delayed_job with my Ruby on Rails app (v2.3.8), and running about 40 background jobs with it on an 8GB RAM Slicehost machine (Ubuntu 10.04 LTS, Apache 2).
Let's say I SSH into my server with no workers running. When I do free -m, I see I'm generally using about 1GB of RAM out of 8. Then, after starting the workers and waiting about a minute for them to be utilized by the code, I'm up to about 4GB. If I come back in an hour or two, I'll be at 8GB and into swap, and my website will be generating 502 errors.
So far I've just been killing the workers and restarting them, but I'd rather fix the root of the problem. Any thoughts? Is this a memory leak? Or, as a friend suggested, do I need to figure out a way to run garbage collection?
Actually, Delayed::Job 3.0 leaks memory in Ruby 1.9.2 if your models have serialized attributes. (I'm in the process of researching a solution.)
Here's someone who seems to have solved it: http://spacevatican.org/2012/1/26/memory-leak-in-yaml-on-ruby-1-9-2
Here's the issue from Delayed::Job: https://github.com/collectiveidea/delayed_job/issues/336
Just about every time someone asks about this, the problem is in their code. Try using one of the available profiling tools to find where your job is leaking (https://github.com/wycats/ruby-prof or similar).
Triggering GC at the end of each job will reduce your maximum memory usage at the cost of thrashing your throughput. It won't stop Ruby from bloating to the maximum size required by any individual job, however, since Ruby can't free memory back to the OS. I don't recommend taking this approach.
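For completeness, here is roughly how you would do it anyway, assuming delayed_job 3.x and its lifecycle plugin API (hook names may differ between versions):

    # e.g. config/initializers/delayed_job.rb
    class GcAfterJobPlugin < Delayed::Plugin
      callbacks do |lifecycle|
        lifecycle.after(:invoke_job) do |job|
          GC.start   # force a full collection after every job
        end
      end
    end

    Delayed::Worker.plugins << GcAfterJobPlugin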
