How to share a variable's value between requests in Rails? - ruby-on-rails

I'm using Ruby on Rails 4.2. In a controller I have a method that takes a long time to complete because it makes some heavy calculations. I want to inform the user of the calculation's progress. My idea was to have an @progress variable that is updated during the calculations and read by a different action that processes AJAX requests from the frontend. But this idea fails: I always get the default value 0 in the AJAX action while the variable is being updated in the long method. I've tried @@progress, $progress and session[:progress], but with exactly the same results.
Now I'm considering making a model for storing the progress in the database and reading it from there, but I can't believe it couldn't be done by some simpler means.
Please share your thoughts!

Theoretical:
The usual approach for these cases is to perform the job asynchronously from the HTTP handler process (so the end-user is not waiting too long for a response from the webserver).
This means:
delegate the heavy work to a background job,
somehow make the client-side aware of when the job is done (2 options here).
Practical (application of the theoretical above in a context of a Rails app):
Background job: The Rails community provides a wide variety of gems (plus Rails' built-in ActiveJob interface) for running async jobs (= background tasks). They can be divided into 2 main categories:
persisted state: jobs are stored in the database, so the queue can be resumed if the server reboots (DelayedJob, Que)
in-memory state: jobs live in Redis, which is usually faster, but queued jobs can be lost if the Redis server goes down, depending on its persistence settings (Resque, Sidekiq)
Surfacing the result to the client side:
There are two main options here:
polling: client-side AJAX call to the back-end every X seconds to check if the background job is done
subscribing via WebSocket: the client side connects to the server via a WebSocket and listens for an event triggered when the job is done (e.g. ActionCable, as pointed out by @Vasilisa; note that it ships with Rails 5 and later, not 4.2)
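A minimal, framework-free sketch of a background job plus polling, assuming a single Ruby process. It also illustrates why @progress never changed: Rails builds a new controller instance for every request, so progress has to live in a store that outlives the request. Here a Mutex-guarded hash (the hypothetical `ProgressStore`) stands in for that store and a plain `Thread` stands in for ActiveJob/Resque. With several app server processes you would keep the progress in the database or Redis instead, because globals and class variables exist per process. `start_job` and `progress_for` are illustrative names for the two controller actions.

```ruby
require "securerandom"

# Thread-safe progress store. It stands in for a database row or Redis key:
# unlike @progress on a controller (a fresh instance per request), it
# outlives the request that started the job.
class ProgressStore
  def initialize
    @mutex = Mutex.new
    @progress = {}
  end

  def set(job_id, percent)
    @mutex.synchronize { @progress[job_id] = percent }
  end

  # Unknown jobs report 0, like the default the question kept seeing.
  def get(job_id)
    @mutex.synchronize { @progress.fetch(job_id, 0) }
  end
end

STORE = ProgressStore.new

# "Start" action: hand the heavy calculation to a background thread and
# return a job id at once instead of making the user wait.
def start_job(steps)
  job_id = SecureRandom.uuid
  STORE.set(job_id, 0)
  worker = Thread.new do
    steps.times do |i|
      sleep 0.01 # placeholder for one chunk of the heavy calculation
      STORE.set(job_id, (i + 1) * 100 / steps)
    end
  end
  [job_id, worker]
end

# "Progress" action: what each AJAX poll would render as JSON.
def progress_for(job_id)
  { job_id: job_id, percent: STORE.get(job_id) }
end

job_id, worker = start_job(5)
sleep 0.005 until progress_for(job_id)[:percent] == 100 # client-side polling loop
worker.join
puts progress_for(job_id)[:percent] # 100
```

A thread inside the web process dies with that process, which is exactly the durability trade-off the two job categories above describe.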
Opinion-based:
If you want to keep it simple, I would go with a very simple implementation: Resque for the back-end and a polling system for the front-end.
If you want something complete, capable of resisting server reboots and restoring the queue where it was before the crash, I would use a persisted version (DelayedJob for example) or wrap the in-memory solution with your own persisting logic.

Related

In asp.net-mvc, what is the correct way to do expensive operations without impacting other users?

I asked this question about 5 years ago about how to "offload" expensive operations that users don't need to wait for (such as auditing, etc.) so they get a response on the front end quicker.
I now have a related but different question. In my ASP.NET MVC site, I have built some reporting pages where you can generate Excel reports (I am using EPPlus) and PowerPoint reports (I am using Aspose.Slides). Here is an example controller action:
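public ActionResult GenerateExcelReport(FilterParams args)
{
    // Runs synchronously: this request's thread is busy for the whole ~30 seconds
    byte[] results = GenerateLargeExcelReportThatTake30Seconds(args);
    // MIME type of an .xlsx file (the "...sheet.main+xml" type names a part inside the package)
    return File(results, @"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "MyReport.xlsx");
}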
The functionality is working great, but I am trying to figure out whether these expensive operations (some reports can take up to 30 seconds to return) are impacting other users. In the previous question, I had an expensive operation that the user DIDN'T have to wait for, but in this case he does have to wait, as it's a synchronous activity (click Generate Report, and the expectation is that users get a report when it's finished).
In this case, I don't care that the main user has to wait 30 seconds; I just want to make sure I am not negatively impacting other users because of this expensive operation, generating files, etc.
Is there any best practice here in ASP.NET MVC for this use case?
You can try a combination of Hangfire and SignalR. Use Hangfire to kick off a background job and relinquish the HTTP request. Once report generation is complete, use SignalR to send a push notification.
SignalR notification from server to client
An alternative option is to implement a polling mechanism on the client side.
Send an AJAX call to enqueue a Hangfire job to generate the report.
Then start polling an API that provides the status, using another AJAX call, and as soon as the report is ready, retrieve it. I prefer SignalR to polling.
If the report processing is impacting the performance of the web server, offload that processing to another server. You can use messaging (ActiveMQ, RabbitMQ or some other framework of your choice) or a REST API call to kick off report generation on another server, then again use messaging or a REST API call to notify the web server that report generation is complete, and finally use SignalR to notify the client. This lets the web server stay more responsive.
UPDATE
Regarding your question
Is there any best practice here in ASP.NET MVC for this use case
You have to monitor your application over time. Monitor both the client side and the server side. There are a few tools you can rely on, such as New Relic and AppDynamics. I have used New Relic, and it has features to track issues both in the client browser and on the server side. The product names are "NewRelic Browser" and "NewRelic Server". I am sure there are other tools that capture similar info.
Analyze the metrics over time, and if you see any anomalies, take appropriate action. If you observe server-side CPU and memory spikes, try capturing client-side metrics around the same timeframe. On the client side, timeout issues or connection errors mean your application's users are unable to connect to your app while the server is doing some heavy lifting. Next, try to identify server-side bottlenecks. If there is not enough room to performance-tune the code, go through a server capacity planning exercise and figure out how to further scale your hardware, or move the background jobs off the web servers to reduce load. Just capturing metrics with these tools may not be enough; you may have to instrument your application (log capturing) to collect additional metrics and properly monitor application health.
Here you can find some information from Microsoft about capacity planning for .NET applications.
-Vinod.
These are all great ideas on how to move work out of the request/response cycle. But I think @leora simply wants to know whether a long-running request will adversely impact other users of an ASP.NET application.
The answer is no. ASP.NET is multi-threaded: each request is handled by a separate worker thread.
In general it is considered good practice to run long-running tasks in the background and give the user some kind of notification when the job is done. As you probably know, web request execution time is limited (by the httpRuntime executionTimeout setting, 110 seconds by default in .NET 2.0 and later), so if your long-running task could exceed this, you have no choice but to run it in some other thread or process. If you are using .NET 4.5.2 or later, you can use HostingEnvironment.QueueBackgroundWorkItem to run long-running tasks in the background and use SignalR to notify the user when the task has finished. If you are generating a file, you can store it on the server under some unique ID and send the user a link to download it. You can delete the file later (with a Windows service, for example).
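The store-under-a-unique-ID idea can be sketched in Ruby (the language of the other threads on this page) as three small functions. `REPORT_DIR`, `store_report`, `report_path` and `delete_old_reports` are hypothetical names, the report bytes are fake, and the one-hour age limit is an arbitrary assumption. The points it shows are the unique id, validating the id before touching the file system, and age-based cleanup.

```ruby
require "securerandom"
require "tmpdir"

# Hypothetical storage location; a real app would use a configured directory.
REPORT_DIR = Dir.mktmpdir("reports")
UUID_FORMAT = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

# End of the background job: save the file under a fresh unique id. The id is
# what the "report ready" notification would turn into a download link.
def store_report(bytes)
  id = SecureRandom.uuid
  File.binwrite(File.join(REPORT_DIR, "#{id}.xlsx"), bytes)
  id
end

# Download action: only accept well-formed ids, so a crafted id such as
# "../../secret" cannot escape REPORT_DIR. Returns nil once the file is gone.
def report_path(id)
  return nil unless id.match?(UUID_FORMAT)
  path = File.join(REPORT_DIR, "#{id}.xlsx")
  File.exist?(path) ? path : nil
end

# Periodic cleanup (a Windows service in the answer; cron or a scheduled job
# elsewhere): delete reports last written more than max_age seconds ago.
def delete_old_reports(max_age, now: Time.now)
  Dir.glob(File.join(REPORT_DIR, "*.xlsx")).each do |path|
    File.delete(path) if now - File.mtime(path) > max_age
  end
end

id = store_report("fake report bytes")
puts report_path(id) ? "available" : "gone"    # available
delete_old_reports(3600, now: Time.now + 7200) # pretend two hours have passed
puts report_path(id) ? "available" : "gone"    # gone
```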
As mentioned by others, there are more advanced background task runners, such as Hangfire, Quartz.NET and others, but the general concept is the same: run the task in the background and notify the user when it is done. Here is a nice article about different options for running background tasks.
You should use C#'s async and await.
From your question, I figured that you are just concerned that the request may be taking more resources than it should, rather than with scalability. If that's the case, make your controller actions async, as well as all the operations you call, as long as they involve calls that block threads. For example, if your requests make network or I/O calls, without async the thread sits blocked while it waits for the response before continuing. With async, the thread is released while awaiting the response, so it can potentially serve requests from other users.
I assumed you are not wondering how to scale the requests. If you are, let me know, and I can provide details on that as well (too much to write unless it's needed).
I believe a tool/library such as Hangfire is what you're looking for. First, it allows you to run a task on a background thread (in the same application/process). Then, techniques such as SignalR allow for real-time front-end notification.
However, after using Hangfire for nearly a year, I split our job processing (and implementation) onto another server using this documentation. I use an internal ASP.NET MVC application to process jobs on a different server. The only remaining performance bottleneck is when both servers use the same data store (e.g. a database). If you're locking the database, the only way around it is to minimize locking of that resource, regardless of the methodology you use.
I use interfaces to trigger jobs, stored in a common library:
public interface IMyJob
{
MyJobResult Execute( MyJobSettings settings );
}
And, the trigger, found in the front-end application:
//tell the job to run
var settings = new MyJobSettings();
_backgroundJobClient.Enqueue<IMyJob>( c => c.Execute( settings ) );
Then, on my background server, I write the implementation (and hook it into the Autofac IoC container I'm using):
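public class MyJob : IMyJob
{
    // Implementing IMyJob requires a public Execute with the interface's signature
    public MyJobResult Execute( MyJobSettings settings )
    {
        //do stuff here
        return new MyJobResult(); // placeholder result
    }
}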
I haven't messed too much with trying to get SignalR to work across the two servers, as I haven't run into that specific use case yet, but it's theoretically possible, I imagine.
You need to monitor your application's users to know whether other users are being affected, e.g. by recording response times.
If you find that this is affecting other users, you need to run the task in another process, potentially on another machine. You can use the library Hangfire to achieve this.
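To answer "are other users affected?" with data, here is a minimal sketch of recording per-action response times and comparing the p95 of ordinary pages with that of the report action. It is written in Ruby for consistency with the rest of this page; `timed`, `RESPONSE_TIMES` and `percentile` (nearest-rank method) are hypothetical helpers, and the sleeps simulate work. If the ordinary pages' p95 stays flat while a report is running, the report is not hurting them.

```ruby
# Hypothetical per-action timing log; in production an APM tool such as
# New Relic (mentioned above) records this for you.
RESPONSE_TIMES = Hash.new { |hash, action| hash[action] = [] }
TIMES_LOCK = Mutex.new

# Wrap a request handler and record its wall-clock duration, even if it raises.
# Returns whatever the block returns.
def timed(action)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  TIMES_LOCK.synchronize { RESPONSE_TIMES[action] << elapsed }
end

# Nearest-rank percentile, e.g. percentile(times, 95) for the p95 latency.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Ordinary page views while a slow report runs in another thread.
report = Thread.new { timed(:report) { sleep 0.2 } }
5.times { timed(:index) { sleep 0.01 } }
report.join
puts format("index p95: %.3fs, report p95: %.3fs",
            percentile(RESPONSE_TIMES[:index], 95),
            percentile(RESPONSE_TIMES[:report], 95))
```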
Using that answer, you can run the task with low priority:
lowering priority of Task.Factory.StartNew thread
public ActionResult GenerateExcelReport(FilterParams args)
{
    byte[] result = null;
    // PriorityScheduler is the custom TaskScheduler from the linked answer, not part of .NET
    Task.Factory.StartNew(() =>
    {
        result = GenerateLargeExcelReportThatTake30Seconds(args);
    }, CancellationToken.None, TaskCreationOptions.None, PriorityScheduler.BelowNormal)
    .Wait(); // the request thread still blocks; only the worker thread's priority is lowered
    return File(result, @"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "MyReport.xlsx");
}
Queue the jobs in a table, and have a background process poll that table to decide which Very Large Job needs to run next. Your web client would then need to poll the server to determine when the job is complete (potentially by checking a flag in the database, but there are other methods). This guarantees that you won't have more than one (or however many you decide is appropriate) of these expensive processes running at a time.
Hangfire and SignalR can help you here, but a queueing mechanism is really necessary to avoid major disruption when, say, five users request the same process at the same time. The approaches mentioned that fire off new threads or background processes don't appear to provide any mechanism for limiting processor/memory consumption, so they could still disrupt other users by consuming too many resources.
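The table-plus-poller design caps how many expensive jobs run at once. A minimal in-process sketch of that cap, in Ruby: a `Queue` drained by a fixed number of worker threads stands in for the jobs table and the background process. `ReportQueue` and its methods are hypothetical names, and a real deployment would persist the queue so it survives restarts.

```ruby
# A queue drained by a fixed pool of workers, so at most `max_concurrent`
# expensive reports run at the same time, however many users click "Generate".
class ReportQueue
  def initialize(max_concurrent)
    @jobs = Queue.new
    @lock = Mutex.new
    @results = {}
    @running = 0
    @peak = 0
    @workers = Array.new(max_concurrent) { Thread.new { work } }
  end

  # Web request: enqueue and return immediately.
  def enqueue(job_id, &job)
    @jobs << [job_id, job]
  end

  # Client poll: nil until the job has finished.
  def result(job_id)
    @lock.synchronize { @results[job_id] }
  end

  # Highest number of jobs that were ever running at once.
  def peak
    @lock.synchronize { @peak }
  end

  # Let the workers finish everything already queued, then stop them.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end

  private

  def work
    while (item = @jobs.pop) != :stop
      job_id, job = item
      @lock.synchronize do
        @running += 1
        @peak = [@peak, @running].max
      end
      value = begin
        job.call
      rescue StandardError => e
        e # keep the worker alive and record the failure as the result
      end
      @lock.synchronize do
        @running -= 1
        @results[job_id] = value
      end
    end
  end
end

queue = ReportQueue.new(2)
5.times { |i| queue.enqueue(i) { sleep 0.02; "report #{i}" } }
queue.shutdown
puts queue.peak      # never more than 2
puts queue.result(3) # report 3
```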

How to perform parallel processing on a "single" rails API in server side?

There are a lot of ways to handle "multiple" API requests on the server side; parallel processing can be implemented there.
But I would like to know how a single API request can be processed in parallel.
For example:
Say an API request executes a method, method1:
def method1
  # ...
  # ...
  # ...
end
If method1 is a long method that may take a long time to process (including multiple loops and database queries), is there any scope for processing it in parallel instead of sequentially?
One way would be to use Resque to create background jobs. But is there any other way to do it, and if so, how should the code be written to accommodate the requirement?
And is there any server-side method to do it that is not Ruby-specific?
Note that there is a huge difference between event-based servers and background jobs.
An event-based server often runs on a single thread and uses callbacks for non-blocking IO. The most famous example is Node.js. For Ruby there is the EventMachine library and various frameworks and simple HTTP servers.
An evented server can start processing one request and then switch to another request while the first is waiting for, for example, a database call.
Note that even with an event-based server, you can't afford to be slow at processing requests. The user experience will suffer and clients will drop the connection.
That's where background jobs (workers) come in: they let your web process finish fast so that it can send the response and start dealing with the next request. Slow processes that don't require user feedback or concurrency, like sending out emails or cleanup, are farmed out to workers.
So in conclusion: if your application is slow, parallel processing is not going to save you. It's not a silver bullet. Instead, you should invest in optimising database queries and leveraging caching so that responses are fast.
While you could potentially run database queries or other operations in parallel in Rails, the greatly added complexity is probably not worth the performance gains.
What I mean is that concurrency is not really applicable to what you usually do in Rails: you're fetching something from the DB and using it to render JSON or HTML. You can't really start rendering until you have the results back anyway. While you could potentially do something like fetching data and using it to render partials concurrently, Rails does not support this out of the box, since it would greatly increase the complexity while offering little to the majority of the framework's users.
As always - don't optimise prematurely.
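If some parts of method1 really are independent and spend their time waiting on I/O (database queries, HTTP calls), a minimal sketch of running them concurrently inside one request with plain Ruby threads follows. The three `fetch_*` methods are hypothetical stand-ins that just sleep. Two caveats: MRI's global VM lock means threads only overlap while waiting on I/O, not in CPU-bound loops; and in Rails each thread would need its own database connection, e.g. by wrapping its body in `ActiveRecord::Base.connection_pool.with_connection`. The same fork/join idea exists in most server-side languages (threads, futures, promises).

```ruby
# Hypothetical slow lookups; each sleep stands in for waiting on a DB or HTTP call.
def fetch_user(id)
  sleep 0.1
  { id: id, name: "user#{id}" }
end

def fetch_orders(_id)
  sleep 0.1
  [101, 102]
end

def fetch_recommendations(_id)
  sleep 0.1
  [7, 8, 9]
end

# Fork/join: start the independent lookups together, then wait for all of them.
def method1(id)
  threads = {
    user: Thread.new { fetch_user(id) },
    orders: Thread.new { fetch_orders(id) },
    recommendations: Thread.new { fetch_recommendations(id) }
  }
  # Thread#value waits for the thread and returns its result,
  # re-raising any exception the thread died with.
  threads.transform_values(&:value)
end

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
result = method1(42)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts result[:orders].inspect # [101, 102]
puts format("%.2fs", elapsed) # close to 0.10s, versus about 0.30s one after another
```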

Rails - passing data from background job to main thread

I am using a background job system (Sidekiq) in my app to handle some heavy jobs that should not block the UI.
I would like to pass data from the background job to the main thread when the job is finished, e.g. the status of the job or the data produced by the job.
At the moment I use Redis as middleware between the main thread and the background jobs. It stores the data, status, etc. of the background jobs, so the main thread can see what is happening behind the scenes.
My question is: is this a good practice for passing data between the scheduled job and the main thread (using Redis or a key-value cache)? Are there other approaches? Which is best, and why?
Redis pub/sub is what you are looking for.
Just subscribe the main thread to a channel using the SUBSCRIBE command, and have the worker announce the job status on that channel using the PUBLISH command.
As you already have Redis inside your environment, you don't need anything else to start.
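The shape of this pattern can be sketched without a running Redis server. The hypothetical `Broker` class below mimics SUBSCRIBE/PUBLISH in-process, with one `Queue` per subscriber. With the redis-rb gem the real calls would be along the lines of `redis.subscribe("job:42") { |on| on.message { |channel, message| ... } }` in the listener and `redis.publish("job:42", "status=done")` in the Sidekiq worker; a subscribed connection cannot run other commands, so give it its own connection and thread. One design consequence: pub/sub is fire-and-forget, so a status published before anyone subscribes is lost, which is why also storing the latest status under a key (as the question already does) remains useful.

```ruby
# In-process stand-in for Redis pub/sub: every subscriber gets its own Queue,
# and publish pushes the message onto each current subscriber's Queue.
class Broker
  def initialize
    @lock = Mutex.new
    @channels = Hash.new { |hash, name| hash[name] = [] }
  end

  # Like SUBSCRIBE: returns a Queue the listening thread blocks on.
  def subscribe(channel)
    inbox = Queue.new
    @lock.synchronize { @channels[channel] << inbox }
    inbox
  end

  # Like PUBLISH: delivers to current subscribers and returns how many got it.
  # As with Redis, a message published while nobody is listening is dropped.
  def publish(channel, message)
    inboxes = @lock.synchronize { @channels.fetch(channel, []).dup }
    inboxes.each { |inbox| inbox << message }
    inboxes.size
  end
end

broker = Broker.new
inbox = broker.subscribe("job:42") # main thread subscribes before the job starts
worker = Thread.new do
  broker.publish("job:42", "status=running")
  sleep 0.01 # the heavy job
  broker.publish("job:42", "status=done")
end
first = inbox.pop # blocks until the worker announces something
second = inbox.pop
worker.join
puts first  # status=running
puts second # status=done
```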
Here are two other options that I have used in the past:
Unix sockets. This was extremely fiddly, creating and closing connections was a nuisance, but it does work. Also dealing with cleaning up sockets and interacting with the file system is a bit involved. Would not recommend.
Standard RDBMS. This is very easy to implement, and made sense for my use case, since the heavy job was associated with a specific model, so the status of the process could be stored in columns on that table. It also means that you only have one store to worry about in terms of consistency.
I have used memcached as well, which does the same thing as Redis (here's a discussion comparing their features, if you're interested). I found it to work well.
If Redis is working for you then I would stick with it. As far as I can see it is a reasonable solution to this problem. The only things that might cause issues are generating unique keys (probably not that hard), and also making sure that unused cache entries are cleaned up.

Deferring blocking Rails requests

I found a question that explains how Play Framework's await() mechanism works in 1.2. Essentially, if you need to do something that will block for a measurable amount of time (e.g. make a slow external HTTP request), you can suspend your request and free up that worker to work on a different request while it blocks. I am guessing that once your blocking operation is finished, your request gets rescheduled for continued processing. This is different from scheduling the work on a background processor and then having the browser poll for completion: I want to block the browser but not the worker process.
Regardless of whether or not my assumptions about Play are true to the letter, is there a technique for doing this in a Rails application? I guess one could consider this a form of long polling, but I didn't find much advice on that subject other than "use node".
I had a similar question about long requests that block workers from taking other requests. It's a problem for all web applications. Even Node.js may not be able to solve the problem of a worker spending too much time on one request, or it could simply run out of memory.
A web application I worked on has a web interface that sends requests to a Rails REST API; the Rails controller then has to call a Node REST API that runs a heavy, time-consuming task to get some data back. A request from Rails to Node.js could take 2-3 minutes.
We are still trying different approaches, but maybe the following could work for you, or you can adapt some of the ideas. I would love to get some feedback too:
The frontend makes a request to the Rails API with a generated identifier [A] within the same session (this identifier helps identify previous requests from the same user session).
The Rails API proxies the frontend request and the identifier [A] to the Node.js service.
The Node.js service adds this job to a queue system (e.g. RabbitMQ or Redis); the message contains the identifier [A]. (Adapt this to your own scenario; it also assumes that some system consumes the queued job and saves the results.)
If the same request is sent again, then depending on the requirements you can kill the current job with the same identifier [A] and schedule/queue the latest request, ignore the latest request while waiting for the first one to complete, or make whatever other decision fits your business requirements.
The frontend can send periodic REST requests to check whether processing for identifier [A] has completed; these requests are lightweight and fast.
Once Node.js completes the job, you can either use the message subscription system or wait for the next status-check request and return the result to the frontend.
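A minimal sketch of these steps with an in-memory registry keyed by the client-generated identifier [A], written in Ruby. It picks the "ignore the latest request while the first one is still running" option from the duplicate-request step. `JobRegistry`, `submit` and `status` are hypothetical; in the setup described above the job would go through RabbitMQ or Redis and the result would be saved by whatever consumes the queue.

```ruby
# Tracks one job per client-generated identifier.
class JobRegistry
  def initialize
    @lock = Mutex.new
    @jobs = {} # identifier => { status:, result: }
  end

  # Start a job for this identifier unless one is already running
  # (a duplicate request is ignored). Returns true if a new job was started.
  def submit(identifier, &work)
    @lock.synchronize do
      return false if @jobs.dig(identifier, :status) == :running
      @jobs[identifier] = { status: :running, result: nil }
    end
    Thread.new do
      final = begin
        { status: :done, result: work.call }
      rescue StandardError => e
        { status: :failed, result: e.message }
      end
      @lock.synchronize { @jobs[identifier] = final }
    end
    true
  end

  # The cheap status check the frontend polls at an interval.
  def status(identifier)
    @lock.synchronize { @jobs.fetch(identifier, { status: :unknown, result: nil }).dup }
  end
end

registry = JobRegistry.new
first = registry.submit("A") { sleep 0.05; "heavy result" }
second = registry.submit("A") { "duplicate" } # ignored while "A" is running
sleep 0.01 until registry.status("A")[:status] == :done
puts([first, second].inspect)      # [true, false]
puts registry.status("A")[:result] # heavy result
```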
You can also use a load balancer, e.g. Amazon's load balancer or HAProxy. 37signals has a blog post and video about using HAProxy to offload long-running requests so that they do not block shorter ones.
GitHub uses a similar strategy to handle long requests for generating commit/contribution visualisations. They also set a limit on polling time: if it takes too long, GitHub displays a message saying it's taking too long and has been cancelled.
YouTube has a nice message for longer queued tasks: "This is taking longer than expected. Your video has been queued and will be processed as soon as possible."
I think this is just one solution. You can also take a look at the EventMachine gem, which can help improve performance by handling requests in parallel or asynchronously.
Since this kind of problem may involve one or more services, think about improving performance between those services (e.g. database, network, message protocol, etc.). If caching may help, try caching frequent requests or pre-calculating results.

Executing large numbers of asynchronous IO-bound operations in Rails

I'm working on a Rails application that periodically needs to perform large numbers of IO-bound operations. These operations can be performed asynchronously. For example, once per day, for each user, the system needs to query Salesforce.com to fetch the user's current list of accounts (companies) that he's tracking. This results in huge numbers (potentially > 100k) of small queries.
Our current approach is to use ActiveMQ with ActiveMessaging. Each of our users is pushed onto a queue as a different message. Then the consumer pulls the user off the queue, queries Salesforce.com, and processes the results. But this approach gives us horrible performance. Within a single poller process, we can only process a single user at a time, so the Salesforce.com queries become serialized. Unless we run literally hundreds of poller processes, we can't come anywhere close to saturating the server running the pollers.
We're looking at EventMachine as an alternative. It has the advantage of allowing us to kick off large numbers of Salesforce.com queries concurrently within a single EventMachine process. So we get great parallelism and utilization of our server.
But there are two problems with EventMachine. 1) We lose the reliable message delivery we had with ActiveMQ/ActiveMessaging. 2) We can't easily restart our EventMachine processes periodically to lessen the impact of memory growth. For example, with ActiveMessaging, we have a cron job that restarts the poller once per day, and this can be done without worrying about losing any messages. But with EventMachine, if we restart the process, we could literally lose hundreds of messages that were in progress. The only way I can see around this is to build a persistence/reliable delivery layer on top of EventMachine.
Does anyone have a better approach? What's the best way to reliably execute large numbers of asynchronous IO-bound operations?
I maintain ActiveMessaging and have been thinking about the issues of a multi-threaded poller too, though perhaps not at the same scale as you. I'll give you my thoughts here, but am also happy to discuss further on the ActiveMessaging list, or via email if you like.
One catch is that the poller is not the only serialized part of this. STOMP subscriptions, if you use client ack mode to prevent losing messages on interrupt, will only be sent a new message on a given connection once the prior message has been acked. Basically, you can only have one message being worked on at a time per connection.
So to keep using a broker, the trick will be to have many broker connections/subscriptions open at once. The current poller is pretty heavy for this, as it loads a whole Rails environment per poller, and one poller is one connection. But there is nothing magical about the current poller; I could imagine writing a poller as an EventMachine client that creates new connections to the broker and gets many messages at once.
In my own recent experiments, I have been thinking about using Ruby Enterprise Edition with a master process that forks many poller worker processes, to get the benefit of the reduced memory footprint (much like Passenger does), but I think the EM trick could work as well.
I am also an admirer of the Resque project, though I do not know that it would be any better at scaling to many workers; I think the workers might be lighter weight.
http://github.com/defunkt/resque
I've used AMQP with RabbitMQ in a way that would work for you. Since ActiveMQ implements AMQP, I imagine you can use it in a similar way. I have not used ActiveMessaging; although it seems like an awesome package, I suspect it may not be appropriate for this use case.
Here's how you could do it, using AMQP:
Have the Rails process send a message saying "get info for user i".
The consumer pulls this off the message queue, making sure to specify that the message requires an 'ack' to be permanently removed from the queue. This means that if the message is not acknowledged as processed, it is eventually returned to the queue for another worker.
The worker then spins the message off into thousands of small requests to Salesforce.
When all of these requests have successfully returned, another callback should be fired to ack the original message and return a "summary message" that has all the info germane to the original request. The key is using a message queue that lets you acknowledge successful processing of a given message, and making sure to do so only when the relevant processing is complete.
Another worker pulls that message off the queue and performs whatever synchronous work is appropriate. Since all the latency-inducing bits have already been performed, I imagine this should be fine.
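A minimal sketch of the ack-only-when-everything-succeeded idea from the steps above, in Ruby. It fans one message out into many small I/O-bound calls with a bounded number of threads (so one process can have many Salesforce.com queries in flight at once) and acks only after all of them return; on any failure the message is requeued instead. `FakeBroker`, `fetch_page`, the message shape and the limit of 5 threads are hypothetical; with a real AMQP client you would call its own ack and reject/requeue methods instead of `FakeBroker`'s.

```ruby
# Records acks and requeues instead of talking to a real AMQP broker.
class FakeBroker
  attr_reader :acked, :requeued

  def initialize
    @acked = []
    @requeued = []
  end

  def ack(message)
    @acked << message
  end

  def requeue(message)
    @requeued << message
  end
end

# Hypothetical stand-in for one small Salesforce.com query.
def fetch_page(user, page)
  sleep 0.01 # network latency; threads overlap while waiting
  raise "timeout for #{user}" if page == :bad
  "#{user}-page#{page}"
end

# Run the block for every item with at most `limit` threads, returning results
# in input order. A failure in any item is re-raised by Thread#join.
def fan_out(items, limit, &block)
  work = Queue.new
  items.each_with_index { |item, index| work << [item, index] }
  results = Array.new(items.size)
  threads = Array.new([limit, items.size].min) do
    Thread.new do
      Thread.current.report_on_exception = false # the caller handles failures
      while (job = (work.pop(true) rescue nil)) # non-blocking pop; nil when empty
        item, index = job
        results[index] = block.call(item)
      end
    end
  end
  threads.each(&:join)
  results
end

# Ack only after every small request succeeded; otherwise requeue the message.
def process_message(broker, message, limit: 5)
  pages = fan_out(message[:pages], limit) { |page| fetch_page(message[:user], page) }
  broker.ack(message)
  { user: message[:user], results: pages } # the "summary message" for the next worker
rescue StandardError
  broker.requeue(message)
  nil
end

broker = FakeBroker.new
summary = process_message(broker, { user: "u1", pages: (1..20).to_a })
process_message(broker, { user: "u2", pages: [1, :bad, 3] })
puts summary[:results].size                       # 20
puts broker.acked.map { |m| m[:user] }.inspect    # ["u1"]
puts broker.requeued.map { |m| m[:user] }.inspect # ["u2"]
```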
If you're using (C)Ruby, try never to combine synchronous and asynchronous code in a single process. A process should either do everything via EventMachine, with no code blocking, or only talk to an EventMachine process via a message queue.
Also, asynchronous code is incredibly useful, but also difficult to write, difficult to test, and bug-prone. Be careful. Investigate using another language or tool if appropriate.
Also check out "cramp" and "beanstalk".
Someone sent me the following link: http://github.com/mperham/evented/tree/master/qanat/. This is a system that's somewhat similar to ActiveMessaging except that it is built on top of EventMachine. It's almost exactly what we need. The only problem is that it seems to only work with Amazon's queue, not ActiveMQ.
