How to run some tasks in the background and put the results to a global memory hash? - ruby-on-rails

I'm building a website with Rails that lets users check whether some domains have been registered. The logic is designed like this:
There is a text field on a page where users enter some domain names.
When the user clicks the check button, the input is posted to the server.
The server receives the input, creates a background task (which will be executed in another process), and immediately returns some text like "checking now..." to the user.
The background task posts the domain names to another site, gets the response, parses it to extract the useful information, then puts the data into a global in-memory hash (the task takes 3 seconds).
The page containing "checking now..." has a JavaScript function that requests the check result from the server. It runs every 2 seconds until it gets a result.
There is an action on the server side to handle the "check-status" request. It checks the global in-memory hash: if it finds the result, it returns it; otherwise it returns "[]".
(I'm using Rails 2.3.8, delayed_job)
I was a Java developer, and this is what I would do in Java. But I found it difficult to keep a "global in-memory hash" shared between the delayed_job worker and Rails, because they are in different processes.
Is my design impossible in Rails? Must I store the check results in a database or something like memcached?
And can these background tasks run in parallel?

It's not just between DelayedJob's workers and Rails' workers that you won't see the global variables. In production you will be dealing with multiple Rails worker processes as well, so even if you were to set a global variable in one Rails worker, the other Rails workers won't see it.
This is by design. It has the advantage that you can spread the load of Rails and DelayedJob across multiple machines, because they only deal with the stateless request, and look at the database system or other persistent storage to add the statefulness that your web application needs.
From what I've gathered, a Java web application may use a threaded model that would allow you to perform background tasks and set global variables like you want to. But that also limits you to a single machine; what would you do if you had to scale up?
Memcached actually sounds like a really good solution in this case. It's a snap to install, requires very little setup, and is easy to use from within Rails too.
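A minimal sketch of that idea, assuming the job and the status action agree on a cache key per check request. The class FakeSharedCache, the helper names, and the key format are all hypothetical; the class only mimics a read/write cache interface in memory so the example runs on its own. In a real deployment both processes would talk to the same memcached server instead.

```ruby
# Stand-in for a shared cache such as memcached. This in-memory class is an
# assumption made so the sketch is self-contained; it is NOT shared between
# processes the way a real cache server would be.
class FakeSharedCache
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
  end

  def read(key)
    @data[key]
  end
end

# Hypothetical key naming scheme: one cache entry per check request.
def result_key(request_id)
  "domain_check:#{request_id}"
end

# What the background job would do once the remote lookup has finished.
def store_check_result(cache, request_id, results)
  cache.write(result_key(request_id), results)
end

# What the check-status action would do on every poll: return the stored
# results, or an empty list while the job is still running.
def fetch_check_result(cache, request_id)
  cache.read(result_key(request_id)) || []
end

cache = FakeSharedCache.new
pending = fetch_check_result(cache, "abc123")
store_check_result(cache, "abc123", [{ "domain" => "example.com", "registered" => true }])
finished = fetch_check_result(cache, "abc123")
```

Here pending is the empty list the polling JavaScript would see before the job completes, and finished is what it would see afterwards.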

Related

Preventing Rails from connecting to database during initialization

I am quite new at Ruby/Rails. I am building a service that makes an API available to users and ends up with some files created in the local filesystem, without any need to connect to any database. Then, once every few hours, I want to run a piece of Ruby code that takes these local files, uploads them to Amazon S3 and registers their location in a Postgres database.
Right now both pieces of code live together in the same project. I am observing that every time a user does something the system connects to the database. I have seen this answer which recommends eliminating all traces of ActiveRecord from my code, but given that I want my background bookkeeping process to connect to the database, I am stuck on what to do.
Is it possible to define two different profiles (one with a database and one without) and specify which profile a certain function call should run on? Would this work?
I'm a bit confused by this. The app does not magically connect to the database for kicks on every request; it does so because a specific request requires it, generally through ActiveRecord but not exclusively.
If your system is connecting every time you make a request, then that implies you have some sort of user metric or authorisation based code in there. Just killing off the database will cause this to fail, and likely you'll have to find it anyways, to then get your system to work. I'd advise locating it.
Things to look for are before_filters in controllers, or database session management, for example, or look for what is in the logs - the query should appear - and that will tell you what is being loaded, modified or whatnot.
It might even work to stop your database, just before doing a user activity, and see where the error leads you. Rinse and repeat until the user activity works, without the database.

How to display a holding screen whilst ActiveJob retrieves lots of data from an external API

I have an application that makes API requests to salesforce using restforce.
Specifically the application finds a contact object, returns IDs for all related objects and then pulls the full record for every related object based on their ID.
This takes a long time for two reasons:
There are a lot of requests to an external API. Each usually takes a fraction of a second to reply, and for some contacts there can be 500+ individual requests.
There is often a large amount of data being pulled back via each request.
All requests currently fall within the salesforce rest API limits but I'm getting timeout errors from my development server as it can take 5+ minutes for some of these requests to process.
Rails 4.2 - How best to handle this?
My question is how do I best get rails to handle this?
I can fire the API requests either from the controller (which definitely violates the skinny controllers) or from the view (via helper methods, which seems like a dodgy hack).
Ideally I'd like to get it running in a background job, but I'm unsure whether I can include all the authentication and other methods in a job in the same way I can include helper methods.
Even if I could get it to work in a background job, I'm unsure what best practice might be for the user experience. Ideally I'd like to route them to a page telling them to "hang tight, go get a coffee" with a progress bar, and then auto route them to the final page once the request is complete...
But I'm unsure how to generate a temporary display until a job has been completed?
Could anyone recommend any gems or strategies that might help me digest this problem?
You should definitely use a background job for this.
Give a database object to the job, which it will update to signal that it has finished, and maybe from time to time to indicate progress.
On the user side, simply tell them that the background job is working, possibly with a progress indicator, and display the result once the database object given to the job tells you it's ready.
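The idea can be sketched roughly as follows. ImportStatus, its fields, and run_import are hypothetical names, and a plain Struct stands in for the database record so the example runs without Rails; in a real app the job would save the record after each update so that the web process polling for progress can see the change.

```ruby
# Stand-in for a database row (hypothetical model name and columns).
ImportStatus = Struct.new(:total, :done, :finished) do
  # Whole-number percentage of records processed so far.
  def percent
    return 0 if total.zero?
    (done * 100) / total
  end
end

# Body of the background job: processes each id via the given block and
# keeps the status object up to date as it goes.
def run_import(status, record_ids)
  status.total = record_ids.size
  record_ids.each do |id|
    yield id           # the slow external API call would happen here
    status.done += 1   # with a real record, save it here
  end
  status.finished = true
end

status = ImportStatus.new(0, 0, false)
progress_seen = []
run_import(status, [101, 102, 103, 104]) do |_id|
  # Record what a polling request would have seen at this moment.
  progress_seen << status.percent
end
```

The holding page would poll an action that reads the record and either renders the progress value or redirects to the final page once finished is true.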

Ruby on Rails times out. How do I fork a process?

I have a page of a long list of items. Each has a check box next to it. There's a jQuery check-all function, but when I submit all of them at once, the request times out because it's doing a bunch of queries and inserting a bunch of records in the MySQL database for each item. If it were to not timeout, it'd probably take about 20 minutes. Instead, I just submit like 30 at a time.
I want to be able to just check all and submit and then just go on doing other work. My coworker (1) said I should just write a rake task. I did that, but I ended up duplicating code, and I prefer the user interface because what if I want to un-check a few? The rake task just submits them all.
Another coworker (2) recommended I use fork. He said that would spawn a new process that would run on the server but allow the server to respond before it's done. Then, since an item disappears after it's been submitted, I could just refresh the page to check if they're done.
I tried this on my local machine; however, it still seems that Rails waits for the process to finish before it responds to the POST request sent by the HTML form. The code I used looks like this:
def bulk_apply
  pid = fork do
    params[:ids].each do |id|
      Item.find(id).apply # takes a LONG time, esp. x 100
    end
  end
  Process.detach(pid) # reap child process automatically; don't leave running
  flash[:notice] = "Applying... Please wait... Then, refresh page. Only submit once. PID: #{pid}"
  redirect_to :back
end
Coworker 1 said that generally you don't want to fork Rails because fork creates a child process that is basically a copy of the Rails process. He said if you want to do it through the web GUI, use BackgroundJob (Bj) (because we're already using that in our Rails app). So, I'm looking into BackgroundJob now, but what do you recommend?
I've had good success using BackgroundJob. If you need Rails, you will be using script/runner, which still starts up a new process with Rails. The good thing is that BackgroundJob will make sure that there is never more than one running at a time.
You can also use script runner directly, or even run a rake task in the background like so:
system " RAILS_ENV=#{RAILS_ENV} ruby #{RAILS_ROOT}/script/runner 'CompositeGrid.calculate_values(#{self.id})' & " unless RAILS_ENV == "test"
The ampersand tells it to start a new process. Be careful because you probably don't want a bunch of these running at the same time. I would definitely take advantage of background job if it is already available.
You should check out IronWorker. It would be super easy to do what you want, and it doesn't matter how long it takes.
In your action you'd just instantiate a worker which has the code that's doing all your database queries. Example worker:
Item.find(id).apply # takes a LONG time, esp. x 100
And here's how you'd queue up those jobs to run in parallel:
client = IronWorkerNG::Client.new
ids.each do |id|
  client.tasks.create("MyWorker", "id"=>id)
end
That's all you'd need to do and IronWorker takes care of the rest.
Try the delayed_job gem. This is a database-backed background job gem. We used it in an e-commerce website. For example, sending an order confirmation email to the user is an ideal candidate for a delayed job.
Additionally you can try multi-threading, which is supported by Ruby. This could make things run faster. Forking an entire process tends to be expensive due to memory usage.
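A rough sketch of the multi-threading suggestion, using only Ruby's standard Thread and Queue classes. The method name apply_in_threads and the worker count are hypothetical, and the block stands in for the slow Item.find(id).apply call. Note that this version still waits for every thread to finish, so on its own it shortens the request rather than moving the work out of it, and on MRI the gain comes mainly from work that waits on I/O such as database queries.

```ruby
require "thread" # provides Queue on older Rubies; harmless on newer ones

# Runs the given block once per id, spread across a small pool of threads,
# and returns the collected block results (in completion order).
def apply_in_threads(ids, worker_count = 4)
  pending = Queue.new
  ids.each { |id| pending << id }
  finished = Queue.new # thread-safe collection of results

  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        id = begin
          pending.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break             # no work left, stop this worker
        end
        finished << yield(id)
      end
    end
  end

  workers.each(&:join) # wait for all workers before returning
  Array.new(finished.size) { finished.pop }
end

# Usage: the block is a placeholder for the real per-item work.
applied = apply_in_threads([1, 2, 3, 4, 5]) { |id| id * 10 }
```

In a Rails app each thread would also need its own database connection from the connection pool, which this sketch does not show.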

How to Get Results From a Background Process

I am designing a Ruby on Rails application that requests XML feeds, reads them in, and parses them into objects to be used in views. Since requesting an XML feed and receiving it can take several seconds for some sources, I need a way to offload these tasks from my front-line application tier. I do not want my application servers to take more than a few hundred milliseconds to process a request. Currently the application serving processes sit and wait for the XML feed data to be returned so they can parse it and finish returning the user's response. I am aware of DelayedJob; however, given that the result of this action is to be returned to the user in real time, I am unsure of how to offload it to a background task and receive the result.
If I offload this task to a background task how does the result get returned to the user loading the page?
One common model for this sort of thing is to use your preferred background job library (you mention DelayedJob, which seems to be a popular one) to offload the task from the request/response cycle, and then set up AJAX polling on the client to update the page with the results once they become available.
You can have your main returned page fire an AJAX request at a second tier of servers that handle the XML retrieval, and return HTML for the section of the page that will contain that information. That way you aren't running any asynchronous jobs (from the server's point of view) and the retrieval won't start until the AJAX request comes in, which will reduce the bandwidth you waste on bots.
This is a standard use of AJAX, so I'm not sure whether I'm missing something in your problem that makes it inappropriate for you.
The most common approach is to use AJAX and DelayedJob here, but it is only a usability improvement: instead of waiting 5 seconds for the page to load, the user gets an empty or half-empty page with a spinner for 5 seconds. The only way (in my opinion) to really improve the user experience is to load and process those XML feeds periodically and display the cached result to the user.
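The periodic-refresh idea might look something like the sketch below. FeedCache, its methods, and the ten-minute lifetime are assumptions for illustration; the store is an in-process hash so the example is runnable, whereas a real setup would keep the entries in a database or a shared cache that both the periodic task and the web processes can reach.

```ruby
# Keeps already-parsed feed data together with the time it was fetched.
class FeedCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}
  end

  # Called by the periodic background task after it has fetched and parsed
  # a feed. The `now` argument exists so the sketch can be exercised with
  # fixed timestamps.
  def refresh(url, parsed_items, now = Time.now)
    @entries[url] = { :data => parsed_items, :fetched_at => now }
  end

  # Called while serving a page; never touches the network. Returns nil when
  # nothing is cached or the entry is older than the allowed lifetime.
  def read(url, now = Time.now)
    entry = @entries[url]
    return nil if entry.nil?
    return nil if now - entry[:fetched_at] > @ttl
    entry[:data]
  end
end

feed_url = "http://example.com/feed.xml" # placeholder URL
cache = FeedCache.new(600)
t0 = Time.at(1_000_000)
cache.refresh(feed_url, ["item 1", "item 2"], t0)
fresh = cache.read(feed_url, t0 + 60)
stale = cache.read(feed_url, t0 + 601)
```

Whether an expired entry should be hidden, as here, or served anyway with a "last updated" note is a design choice that depends on how fresh the feed data has to be.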
If you are open to Perl code running on your server, I'd lift a piece of LiveJournal infrastructure: Gearman and TheSchwartz
Sounds like you want Gearman - and it has Ruby client bindings.
(see http://www.livejournal.com/doc/server/lj.install.workers_setup_install.html)

Showing status of current request by AJAX

I'm trying to develop an application which modifies a couple of tasks of the famous Online-TODO List RememberTheMilk (rememberthemilk.com) using the REST API.
Unfortunately the modifying takes a lot of time, so I want to give a feedback to the users.
My idea was just to display a couple of text lines (e.g. modifying task 1 of n...).
Therefore I used periodically_call_remote on my page to call an action which reads a Singleton.
In the request I store the text that should be displayed in the same Singleton. But I found out that once I submit a request, periodically_call_remote no longer updates the specified div.
My questions are:
1. Is this a good way to implement this behaviour?
2. If it is, how do I get periodically_call_remote to work during a submit?
Using a Singleton is most definitely a bad idea. In an advanced production setup it isn't guaranteed that subsequent requests will go to the same process or to the same machine (and subsequently will have a different Singleton). Plus, if you have many users, I don't even want to think about what'll happen to those poor Singletons.
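A small demonstration of why in-process state is not shared, assuming a Unix-like system where fork is available. The global $progress stands in for the Singleton: the child process changes its own copy and reports it through a pipe, while the parent's copy stays untouched.

```ruby
# Stand-in for the Singleton's state.
$progress = "not started"

reader, writer = IO.pipe

pid = fork do
  reader.close
  $progress = "modifying task 1 of 3" # changes only the child's copy
  writer.puts $progress               # report what the child sees
  writer.close
end

writer.close
child_view = reader.gets.chomp # what the child process saw
reader.close
Process.wait(pid)

parent_view = $progress        # still the original value in the parent
```

Separate Rails worker processes behave the same way, which is why a polling request served by a different process never sees the text stored by the request that is doing the work.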
Does any of this stuff actually need to go through your Rails app? It seems like you can call the RTM API via Javascript from the page the user is on and then update the page when the XHR request is complete.
