TimeCop in production on a per-request basis - ruby-on-rails

I've got a rails app using MRI ruby and Unicorn as the app server. The app provides a 10-week fitness program, and I've built a feature for people logged in as admins to be able to "time travel" to different weeks in the program. Basically, it just sets a session variable with the date they're "time travelled" to, and, each request, it time-travels to that point at the beginning, then returns at the end. The code is below.
I'd limited the feature to non-production environments, out of fear that one person's time-travelling might affect other users (since Timecop monkey patches core classes). But given that MRI isn't really multi-threaded, I'm now thinking that's an irrational fear, and that there should be no danger in using the "time travel" feature in production.
For the duration within which a single request is processed (and therefore the time for which the core classes are monkey patched by TimeCop if the user is using "time travel"), there should be no possibility that any other request gets run on the same ruby process.
Is this correct? Or can other users' requests be affected by Timecop's changes to the core classes in a way I'm not aware of?
The code I'm using is as follows:
module TimeTravelFilters
  extend ActiveSupport::Concern

  included do
    if Rails.env.development? || Rails.env.staging?
      around_action :time_travel_for_request
    end
  end

  def time_travel_for_request
    time_travel
    yield
    time_travel_return
  end

  def time_travel
    if session[:timetravel_to_date]
      Timecop.travel(session[:timetravel_to_date])
    end
  end

  def time_travel_return
    Timecop.return
  end
end

MRI's global interpreter lock does mean that two threads won't execute concurrently, but the granularity of that is much, much smaller than the processing of one request.
As it happens, Unicorn doesn't use threads for concurrency, so you'd be OK, but the day you switch to another server (e.g. Puma) you'd be in for a nasty surprise.
This would also affect things like data in logs, created_at/updated_at timestamps for anything updated, and so on. It might also affect monitoring data gathered by services like New Relic or Airbrake, if you use those. Another example of something that might seem completely unrelated is API requests to AWS: the signature that verifies these requests includes a timestamp, and they will fail if you're out of sync by more than a few minutes. There is just too much code (much of which you don't control) that assumes that Time.now is accurate.
You'd be much better off identifying those bits of code that implicitly use the current Time and changing them to allow the desired time to be passed as an argument.
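For example, a minimal sketch of what that might look like (ProgramWeek and current_week are made-up names for illustration, not from your app):

# Hypothetical domain code: accept the reference time as an argument
# instead of reading Time.now deep inside the method.
class ProgramWeek
  def self.current_week(program_start, as_of = Time.current)
    # 1-based week number, capped at the 10-week program length
    week = ((as_of.to_date - program_start.to_date).to_i / 7) + 1
    [[week, 1].max, 10].min
  end
end

# In the controller, an admin's "time travel" date can then be passed
# explicitly, with no monkey patching of core classes:
#   as_of = session[:timetravel_to_date] || Time.current
#   @week = ProgramWeek.current_week(program.starts_on, as_of)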
As an aside, I think your code would leave the altered time in place if the controller raised an exception.
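If you do keep the filter around, a small tweak (sketched below) ensures Timecop.return always runs, even when the action raises:

def time_travel_for_request
  time_travel
  yield
ensure
  # always undo the Timecop monkey patch, even if the action raised
  time_travel_return
end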

Related

rails 3 & activerecord: do I need to take special record locking precautions to update a counter field?

Each time a user's profile gets displayed we update an integer counter in the Users table.
I started thinking about high-concurrency situations and wondered what happens if a bunch of people hit a user's profile page at the exact same time: does rails/activerecord magically handle the record locking and semaphore stuff for me?
Or does my app need to explicitly call some sort of mechanism to avoid missing update events when concurrent calls are made to my update method?
def profile_was_hit
  self.update_attributes :hitcount => self.hitcount + 1
end
And along those lines, when should I use something like User.increment_counter(:hitcount, self.id) instead?
In the default configuration, a single Rails instance is only handling a single request at a time, so you don't have to worry about any concurrency trouble on the application layer.
If you have multiple instances of your application running (which you probably do/will), they will all make requests to the database without talking to one another about it. This is why this is handled at the database layer. MySQL, PostgreSQL, etc. are all going to lock the row on write.
The way you are handling this situation isn't ideal for performance, though, because your application is reading the value, incrementing it, and then writing it. That lag between read and write allows you to miss updates. You can avoid this by pushing the increment responsibility to your database (UPDATE users SET hitcount = hitcount + 1 WHERE id = ...). I believe ActiveRecord has support for this built in; I'll/you'll have to go dig around for it though. Update: Oh, duh, yes, you want to use the increment_counter method to do this. Reference/Documentation.
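As a rough sketch of what that looks like (assuming the model is User and the column is hitcount):

def profile_was_hit
  # Issues a single atomic statement, roughly:
  #   UPDATE users SET hitcount = COALESCE(hitcount, 0) + 1 WHERE id = <id>
  # so concurrent requests cannot lose each other's increments.
  User.increment_counter(:hitcount, id)
end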
Once you update your process to push incrementing responsibility to the database, I wouldn't worry about performance for a while. I once had a PHP app do this on every request and it scaled gloriously for me to 100+ updates/second (mysqld on the same machine, no persistent connections, and always < 10% CPU).

How can I postpone database updates in Rails?

I'm building something akin to Google Analytics and currently I'm doing real time database updates. Here's the workflow for my app:
1. User makes a RESTful API request
2. I find a record in the database and return it as JSON
3. I record the request counter for the user in the database (i.e. if a user makes 2 API calls, I increment the request counter for the user by 2).
1 and 2 are really fast in SQL - they are SELECTs. #3 is really slow, because it's an UPDATE. In the real world, my database (MySQL) is NOT scaling. According to New Relic, #3 is taking most of the time - up to 70%!
My thinking is that I need to stop doing synchronous DB operations. In the short term, I'm trying to reduce DB writes, so I'm thinking about a global hash (say declared in environment.rb) that is accessible from my controllers and models that I can write to in lieu of writing to the DB. Every so often I can have a task write the updates that need to be written to the DB.
Questions:
Does this sound reasonable? Any gotchas?
Will I run into any concurrency problems?
How does this compare with writing logs to the file system and importing later?
Should I be using some message queuing system instead, like Starling? Any recommendations?
PS: Here's the offending query -- all columns of interest are indexed:
UPDATE statistics_api SET count_request = COALESCE(count_request, ?) + ? WHERE (id = ?)
Your hash solution sounds like it's a bit too complex. This set of slides is an insightful and up-to-date resource that addresses your issue head on:
http://www.slideshare.net/mattmatt/the-current-state-of-asynchronous-processing-with-ruby
They say the simplest thing would be:
Thread.new do
  MyModel.do_long_thing
end
But the Ruby mysql driver is blocking, so a mysql request in that thread could still block your request. You could use mysqlplus as a driver and get non-blocking requests, but now we're getting a pretty complex and specialized solution.
If you really just want this out of your request cycle, and can tolerate tying up the server for the duration, you can do something like:
class MyController < ApplicationController
  after_filter :do_jobs

  def index
    @job = Proc.new { MyModel.do_long_thing }
  end

  private

  def do_jobs
    return unless @job
    @job.call
  end
end
I'd abstract it into ApplicationController more, but you get the idea. The proc defers updates until after the request.
If you are serious about asynchronous and background processes, you'll need to look at the various options out there and make a decision about what fits your needs. Matt Grande recommended DelayedJob; that's a very popular pick right now, but if your entire server is bogged down with database writes, I would not suggest it. If this is just a particularly slow update, but your server is not overloaded, then maybe it's a good solution.
I currently use Workling with Starling in my most complex project. Workling has been pretty extensible, but Starling has been a little less than ideal. One of Workling's advantages is the ability to swap backends, so we can move off Starling if it becomes a large problem.
If your server is bogged with writes, you will need to look at scaling it up regardless of your asynchronous task approach.
Good luck! It sounds like your app is growing at an exciting pace :-)
I just asked a similar question over on the EventMachine mailing list, and it was suggested that I try Phat (http://www.mikeperham.com/2010/04/03/introducing-phat-an-asynchronous-rails-app/) to get asynchronous database access.
Maybe you should try it out.
Do it later with DelayedJob.
Edit: If your DB is being hit so much that one UPDATE is noticeably slowing down your requests, maybe you should consider setting up a master-slave database architecture.
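For what it's worth, a sketch of the DelayedJob route might look like this (the HitJob struct and the per-user counter column are illustrative, not your actual schema):

# delayed_job will call #perform on anything you enqueue.
HitJob = Struct.new(:user_id, :count) do
  def perform
    # a single atomic UPDATE, equivalent in spirit to the query in the question
    User.update_counters(user_id, :count_request => count)
  end
end

# In the controller: enqueue instead of updating synchronously.
Delayed::Job.enqueue(HitJob.new(current_user.id, 1))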

Rails classes reloading in production mode

Is there a way to reload ruby model in runtime?
For example I've a model
class Model
  def self.all_models
    @@all_models ||= Model.all
  end
end
Records in this model change very rarely, but when they do, I don't want to reload the whole application, just this one class.
On a development server this is not a problem; on a production server it's a big one.
In reality it's not feasible without restarting the server. The best you could do is add a before filter in ApplicationController to update class variables in each worker thread, but it has to be done on every request. You can't turn this behaviour off and on easily.
If it's a resource-intensive operation, you can settle for a cheaper test, such as comparing a value in the database or the last-modified time of a file against a value captured at boot, to decide whether the full reload should occur. But you would still have to do this as part of every request.
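A sketch of that kind of guard, assuming you keep the memoized records in the class from the question and touch a marker file whenever the data changes (the file path is arbitrary):

class Model < ActiveRecord::Base
  STAMP_FILE = Rails.root.join('tmp', 'models_updated_at')

  def self.all_models
    reload_if_stale
    @@all_models ||= Model.all
  end

  # Cheap enough to run on every request: only drop the memoized
  # records when the marker file's mtime has changed.
  def self.reload_if_stale
    stamp = File.exist?(STAMP_FILE) && File.mtime(STAMP_FILE)
    if !defined?(@@stamp) || @@stamp != stamp
      @@all_models = nil
      @@stamp = stamp
    end
  end
end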
However, to the best of my knowledge modifying routes once the server has been loaded is impossible. Modifying other site wide variables may require a little more effort, such as reading from a file/database and updating in a before filter.
There may be another way, but I haven't tried it at all. So there's no guarantee.
If you're using a Ruby-based server such as Mongrel, in theory you could use hijack to update the model/routes/variables in the control thread from which the worker threads are spawned.

Storing Objects in a Session in Rails

I have always been taught that storing objects in a session was a bad idea. Instead IDs should be stored that retrieve the record when needed.
However, I have an application that I wonder is an exception to this rule. I'm building a flashcard application, and the words being quizzed are in a table in the database whose schema doesn't change. I want to store the words currently being quizzed in a session, so a user can pick up where they left off if they move on to a separate page.
In this case, is it possible to get away with storing these words as objects in the session? If so, why? The reason I ask is that the quiz is designed to move quickly, and I'd hate to waste a database call retrieving a record that never changes in the first place. However, perhaps there are other downsides to a large session that I'm not aware of.
For the record, I have tried caching it with the built-in memcache methods in Rails 2.3, but apparently that has a maximum size per item of 1MB.
The main reason not to store objects in the session is that if the object structure changes, you will get an exception. Consider the following:
class Foo
  attr_accessor :bar
end

class Bar
end

foo = Foo.new
foo.bar = Bar.new
put_in_session(foo)
Then, in a subsequent release of the project, you change Bar's name. You reboot the server, and try to grab foo out of the session. When it tries to deserialize, it fails to find Bar and explodes.
It might seem like it would be easy to avoid this pitfall, but in practice, I've seen it bite a number of people. This is just because serializing an object can sometimes take more along with it than is immediately apparent (this sort of thing is supposed to be transparent) and unless you have rigorous rules about this, things will tend to get flummoxed up.
The reason it's normally frowned upon is that it's extremely common for this to bite people in ActiveRecord, since it's quite common for the structure of your app to shift over time, and sessions can be deserialized a week or longer after they were originally created.
If you understand all that and are willing to put in the energy to be sure that your model does not change and is not serializing anything extra, you're probably fine. But be careful :)
Rails tends to encourage RESTful design, and using sessions isn't very RESTful. I'd probably make a Quiz resource that has a bunch of words, as well as a current_word. This way, when they come back, you'll know where they were.
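A minimal sketch of such a resource (the associations and column names are just a guess):

# Each quiz remembers its word list and where the user left off, so only
# an id needs to live in the session (or none, via current_user.quizzes).
class Quiz < ActiveRecord::Base
  belongs_to :user
  has_many   :quiz_words
  has_many   :words, :through => :quiz_words
  belongs_to :current_word, :class_name => 'Word'
end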
Now, REST isn't everything (depending on who you talk to), but there's a pretty good case against large sessions. Remember that sessions write things to and from disk, and the more data that you're writing, the longer it takes to read back...
Since your app is a Rails app, I would suggest either:
- using your clients' ability to cache, by caching the cards in JavaScript (you'd need a fairly Ajax-heavy app to do this; see the latest RailsCast for some interesting points on JavaScript page caching), or
- using one of the many other Rails-supported server-side caching options (e.g. memcached) to cache this data.
A much more insidious issue you'll encounter storing objects directly in the session is when you're using CookieStore (the default in Rails 2+ I believe). It's very easy to get CookieOverflow errors which are very hard to recover from.

Application Context in Rails

Rails comes with a handy session hash into which we can cram stuff to our heart's content. I would, however, like something like ASP's application context, which instead of sharing data only within a single session, will share it with all sessions in the same application. I'm writing a simple dashboard app, and would like to pull data every 5 minutes, rather than every 5 minutes for each session.
I could, of course, store the cache update times in a database, but so far haven't needed to set up a database for this app, and would love to avoid that dependency if possible.
So, is there any way to get (or simulate) this sort of thing? If there's no way to do it without a database, is there any kind of "fake" database engine that comes with Rails, runs in memory, but doesn't bother persisting data between restarts?
Right answer: memcached. Fast, clean, supports multiple processes, integrates very cleanly with Rails these days. Not even that bad to set up, but it is one more thing to keep running.
90% Answer: There are probably multiple Rails processes running around -- one for each Mongrel you have, for example. Depending on the specifics of your caching needs, it's quite possible that having one cache per Mongrel isn't the worst thing in the world. For example, suppose you were caching the results of a long-running query which:
- gets fresh data every 8 hours
- is used on every page load, 20,000 times a day
- needs to be accessed in 4 processes (Mongrels)
then you can drop those 20,000 queries a day down to 12 with about a single line of code:
@@arbitrary_name ||= Model.find_by_stupidly_long_query(param)
The double at-sign is Ruby syntax you might not be familiar with: it marks a class variable, which here effectively behaves as a process-wide global. ||= is the commonly used Ruby idiom to perform the assignment if and only if the variable is currently nil or otherwise evaluates to false. The value stays good until you explicitly empty it OR until the process stops, for any reason -- server restart, explicitly killed, what have you.
And after you go down from 20k calculations a day to 12 in about 15 seconds (OK, two minutes -- you need to wrap it in a trivial if block which stores the cache update time in a different global), you might find that there is no need to spend additional engineering assets on getting it down to 4 a day.
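That wrapper could look something like this (a sketch; the method name and the 8-hour window are arbitrary, and find_by_stupidly_long_query is the query from above):

# Lives in whatever controller or model runs the query; both class
# variables persist for the life of the process (i.e. per Mongrel).
def self.cached_results(param)
  if !defined?(@@cached_at) || @@cached_at.nil? || @@cached_at < 8.hours.ago
    @@cached_results = Model.find_by_stupidly_long_query(param)
    @@cached_at = Time.now
  end
  @@cached_results
end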
I actually use this in one of my production sites, for caching a few expensive queries which literally only need to be evaluated once in the life of the process (i.e. they change only at deployment time -- I suppose I could precalculate the results and write them to disk or DB but why do that when SQL can do the work for me).
You don't get any magic expiry syntax, reliability is pretty slim, and it can't be shared across processes -- but it's 90% of what you need in a line of code.
You should have a look at memcached: http://wiki.rubyonrails.org/rails/pages/MemCached
There is a helpful Railscast on Rails 2.1 caching. It is very useful if you plan on using memcached with Rails.
Using the stock Rails cache is roughly equivalent to this.
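For example (a sketch; the cache key and expiry are arbitrary):

# With a memcached-backed cache store, the block only runs on a cache
# miss and the cached result is shared by every Rails process.
stats = Rails.cache.fetch('dashboard_stats', :expires_in => 5.minutes) do
  Model.find_by_stupidly_long_query(param)
end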
@p3t0r is right, memcached is probably the best option, but you could also use the SQLite database that comes with Rails. That won't work over multiple machines though, where memcached will. Also, SQLite will persist to disk, though I think you can set it up not to if you want. Rails itself has no application-scoped storage, since it's run as one process per request handler, so it has no shared memory space like ASP.NET or a Java server would.
So what you are asking is quite impossible in Rails because of the way it is designed. What you ask is a shared object and Rails is strictly single threaded. Memcached or similar tool for sharing data between distributed processes is the only way to go.
The Rails.cache freezes the objects it stores. This kind of makes sense for a cache but NOT for an application context. I guess instead of doing a roundtrip to the moon to accomplish that simple task, all you have to do is create a constant inside config/environment.rb
APP_CONTEXT = Hash.new
Pretty simple, eh?
