Rails process controls to delay/impede multiple submissions


An application depends on client use and therefore on network connections. Some processes take longer than others (e.g. capturing a hand-written line, converting it to JSON, generating an image and uploading it to a static server), and flaky web connections can make the delay worse.
Overanxious users may believe something is wrong and keep submitting, which makes the situation worse.
For specific actions, say update_signature, how could one discard other requests for that particular action and unique identifier?

You can handle rate limiting with Rack::Throttle, and debounce double clicks (collapse them into a single click) with JavaScript.
But don't optimize prematurely.
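If server-side protection does turn out to be needed, one possible shape is a guard that refuses a second request for the same action and identifier while the first is still running. The sketch below is plain Ruby and does not use Rack::Throttle; the class name InFlightGuard and the :duplicate return value are made up for illustration. It only works inside a single process, so an app running on several servers would need a shared store instead of the in-memory Set.

```ruby
require "set"

# Hypothetical helper: remembers which (action, id) pairs are being processed.
class InFlightGuard
  def initialize
    @keys = Set.new
    @lock = Mutex.new
  end

  # Runs the block unless the same action/id pair is already in flight.
  # Returns :duplicate without running the block in that case.
  def run(action, id)
    key = "#{action}:#{id}"
    # Set#add? returns nil when the key was already present.
    acquired = @lock.synchronize { @keys.add?(key) ? true : false }
    return :duplicate unless acquired

    begin
      yield
    ensure
      # Always release the key, even if the block raised.
      @lock.synchronize { @keys.delete(key) }
    end
  end
end
```

A controller action could wrap its slow work in guard.run("update_signature", record_id) { ... } and answer duplicates with a "still working" response.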

Related

Should I POST high-rate user actions to my server on a per-action basis or send the batch of events once the session is closed?

I'm building a site where users can watch a video and click as many times as they want to "like" it. It's a bit like Periscope's "Hearts" function for those who know it.
The viewers watch the video in a web browser for now. Every "Like" is written to a Heroku-hosted Redis instance, so the writes and reads are fairly cheap. However, there could be a high rate of simultaneous input when many users watch a video at the same time.
In this scenario, I'm facing two options:
Send an event to the Redis instance every time the user "likes". Advantage: the "like" is stored right away with all relevant information. Disadvantage: lots of concurrent likes hit the server.
Cache the "likes" locally and only send them to Redis once the session is over. Problem: the user can close the browser at any time (and potentially never return), so the "like" information could be lost permanently.
Any advice on which option is preferable?

Don't cache.
First, it's a really big complication as you won't know when the session is really over.
Second, Redis increment is probably as fast or faster than your cache. I bet your concern is Rails only, not Redis.
You may eventually want to make another endpoint, maybe a simple Sinatra app, that only handles likes. I noticed autosuggest gems sometimes do this, and it saves all the overhead of a Rails request.
If the app is successful, the concern could be someone writing a script to 'like' continually. You may need some throttling that allows only a limited number of requests over time.
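As a rough illustration of the throttle idea, here is a minimal fixed-window limiter in plain Ruby. The class name LikeThrottle, its parameters and the in-memory Hash are assumptions made for this sketch; in the setup described in the question the counters would more likely live in Redis so that every app process sees the same numbers.

```ruby
# Hypothetical fixed-window limiter: at most `limit` likes per user per window.
class LikeThrottle
  def initialize(limit:, window:, clock: -> { Time.now.to_f })
    @limit = limit      # allowed likes per window
    @window = window    # window length in seconds
    @clock = clock      # injectable so the behaviour can be tested
    @counts = Hash.new(0)
    @lock = Mutex.new
  end

  # Returns true and counts the like, or false when the user is over the limit.
  def allow?(user_id)
    bucket = (@clock.call / @window).floor
    @lock.synchronize do
      # Forget counters that belong to earlier windows.
      @counts.delete_if { |(_uid, b), _count| b < bucket }
      key = [user_id, bucket]
      return false if @counts[key] >= @limit

      @counts[key] += 1
      true
    end
  end
end
```

A fixed window is the simplest variant; it allows short bursts around a window boundary, which is usually acceptable for something like a "like" counter.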

Opening and closing streaming clients for specific durations

I'd like to infrequently open a Twitter streaming connection with TweetStream and listen for new statuses for about an hour.
How should I go about opening the connection, keeping it open for an hour, and then closing it gracefully?
Normally for background processes I would use Resque or Sidekiq, but from my understanding those are for completing tasks as quickly as possible, not chilling and keeping a connection open.
I thought about using a global variable like $twitter_client but that wouldn't horizontally scale.
I also thought about building a second application that runs on one box to handle this functionality, but that seems excessive if it can be integrated into the main app somehow.
To clarify, I have no trouble starting a process, capturing tweets, and using them appropriately. I'm just not sure what I should be starting. A new app? A daemon of some sort?
I've never encountered a problem like this, and am completely lost. Any direction would be much appreciated!

Although not a direct fix, this is what I would look at:
Time
You're working with time, so I'd look at which time-centric processes could be used to hold the connection open for an hour.
Specifically, I'd look at running some sort of job on the server, which you could fire at specific times (programmatically if required), to open and close the connection. I only have experience with Resque, but as you say, it's probably not up to the job. If I find any better solutions, I'll certainly update the answer.
Storage
Once you've connected to TweetStream, you'll want to look at how you can capture the tweets for that time period. It seems a waste to create a database table just for this job, so I'd be inclined to use something like Redis to store the tweets that you need.
Redis can then be used to output the tweets you need, letting you capture them temporarily and delete them after the hour-long window has passed.
Delivery
I don't know what context you're using this feature in, so I'll just give you as generic a process idea as possible.
To display the tweets, I'd personally create some sort of record in the DB to show the time you're pinging TweetStream that day (if it changes; if it's constant, just set a constant in an initializer), and then include some logic to try to get the tweets from Redis. If you're able to collect them, show them as you wish; otherwise don't print anything.
Hope that gives you a broader spectrum of ideas?
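To make the "open for an hour, then close" part concrete, here is a minimal sketch of a deadline loop in plain Ruby. It deliberately does not use the TweetStream API: FakeStream, next_status and listen_for are hypothetical names standing in for whatever client is actually used. The point is only the pattern of computing a deadline from a monotonic clock and closing the connection in an ensure block.

```ruby
# Hypothetical stand-in for a streaming client (not the real TweetStream API).
class FakeStream
  attr_reader :closed

  def initialize(items)
    @items = items.dup
    @closed = false
  end

  # Returns the next status, or nil when nothing is available right now.
  def next_status
    @items.shift
  end

  def close
    @closed = true
  end
end

# Collects statuses until `duration` seconds have passed, then closes the stream.
def listen_for(stream, duration,
               clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
               idle_sleep: 0.1)
  deadline = clock.call + duration
  collected = []
  while clock.call < deadline
    status = stream.next_status
    if status
      collected << status
    else
      sleep(idle_sleep) # nothing to read; avoid a busy loop
    end
  end
  collected
ensure
  # Runs on normal exit and on errors, so the connection is always released.
  stream.close
end
```

Something like listen_for(client, 3600) could be started from a scheduled task (cron, or a long-running worker process) rather than from a web request.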

Is using a Web API as dataprovider for a website efficient?

I was thinking about setting up a project with Web API. Basically build the API first and program the web site using this API.
Although it sounds promising, I was wondering:
If I separate the logic in a nice way, I might end up retrieving the data for a web page through multiple API calls, which in turn means multiple connections to the server with all the overhead that implies.
For example, if I use, let's say, 8 different API calls on one page, I can't imagine it won't have an impact on the page's performance.
So, have I misunderstood something? Is this kind of overhead negligible, or does the need for multiple calls indicate that the design is wrong?
Thanks in advance.

Well, we did it. A Web API server provides REST access to all the data, and independent UI clients consume it as the only access point to the underlying persistence.
The first request takes significantly longer. It must initialize all the UI client stuff and get the minimum data needed from the server (menu, user, access rights, metadata, list-view data).
The real advantage shows in the second, the third and later requests. A lot of stuff is already there on the UI client. And even if something is requested again, caching (on the server, the client, or both) can be introduced.
So this means more requests (at least during UI client start-up), but it does not imply a slower application.
The maintenance benefit lies in the Separation of Concerns. On the server, we are no longer deciding where to place the user data handling, in the base controller or a child controller, or whether there should be a master page or a layout controller.
Solved. We take care of single, specific things, published via REST. One method, one business operation. And that's the dream if we'd like to keep the application alive and be the ones who repair and extend it.

One aspect is that you can display the page to the end user very fast. Once the page is loaded, use jQuery async calls and a JavaScript template tool (like AngularJS or Mustache.js) to call the Web API simultaneously and build the client page views.
I have used this approach in multiple projects, and the user experience is very good.

Most modern browsers support 6-8 parallel connections to the same site. So you do have to be careful about that. Unless you are connecting to that many separate systems, I would try to reduce the number of connections. Or ensure the calls are called asynchronously by different events to reduce the chance of parallel connections.

Making a series of HTTP calls to obtain data for your page will have an overhead. Only testing will tell you how that might impact in your scenario.
There is little point using Web API just because you can. You should have a legitimate reason for building a RESTful API. Even then, if it is primarily for your own consumption, design it to deliver a ViewModel for each page in one call.

NSURLConnection and multiple asynchronous requests - is it messing with the data being transmitted?

I have an NSArray of links. I want to parse through them with an online article extractor API (Clear Read), and with the result given back for each article (some HTML) I throw it into an NSString.
My problem arises from the fact that, say my array has 100 URLs in it, I loop through the array shooting each item into the API and getting back some results in JSON. This is firing like 100 NSURLConnection calls at once asynchronously.
I wasn't sure if that'd be a problem, but when I give it 100 URLs (real strings, none are nil) the data that comes back often has either empty values for the JSON keys (when they shouldn't), or the data coming back is nil. There's also a bunch of duplicates.
Should I be handling multiple asynchronous connections better than I am now? If so, how?

A couple of thoughts:
If you're doing concurrent asynchronous requests and are using asynchronous NSURLConnection, then you'll want to define your own class for this download operation to make sure that every connection keeps track of its own properties. That way, everything can be encapsulated within this class where the resulting download objects can keep track of what's downloaded, what's been parsed, etc. If you're not using asynchronous NSURLConnection (e.g. you're just using dataWithContentsOfURL), it's even easier, though you lose some of the progress updates that NSURLConnection provides and/or streaming opportunities.
For best performance, you should do concurrent requests. Having said that, you should not have more than four or five concurrent requests going to any particular server. This is an iOS-imposed constraint, and especially on a slow network connection, you otherwise risk having connections time out.
If you're doing preliminary testing on the simulator, you may want to make sure you try out the "network link conditioner". It's part of the "Hardware IO Tools for Xcode", available at the Downloads for Apple Developers. There are issues (such as the aforementioned timeout problems if you have too many concurrent requests going to a particular server) that only manifest themselves in slow connections.
Having said that, you also want to make sure to test your solution on a device with real-world network speeds. It's easy to run massively parallel tasks successfully on the simulator that are too greedy for the device. Limiting the number of concurrent sessions to five will diminish this resource problem, but testing on a device should still be part of your testing strategy.
I agree with JRG-Developer, that you should look into established frameworks, such as AFNetworking. Make sure to set the maxConcurrentOperationCount for the queue of the AFHTTPClient, though, if queueing 100 plus operations.
I don't know how much data your 100 requests entail, but be forewarned that the app approval process has been known to reject apps that make extraordinary network requests on cellular networks. What constitutes excessive cellular network activity is not explicitly stated in the app review guidelines, though Avoiding iPhone App Rejection From Apple has claimed that you should not exceed 4.5 MB in 5 minutes. You can use Reachability to determine what type of network you are on and perhaps warn the user if they're on cellular (if the amount of data approaches this threshold).
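The two ideas above (cap the number of concurrent requests, and keep every request's result tied to its own URL) are language-independent. The following is a sketch in Ruby rather than Objective-C, and it does not use NSURLConnection or AFNetworking; fetch_all and its limit parameter are hypothetical names, and the block passed in stands for the real network call.

```ruby
# Runs the block once per URL using at most `limit` worker threads and
# returns a hash that maps every URL to its own result.
def fetch_all(urls, limit: 4)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # once closed and drained, pop returns nil and workers stop

  results = {}
  lock = Mutex.new
  workers = Array.new(limit) do
    Thread.new do
      while (url = queue.pop)
        value = yield(url) # the actual request for this one URL
        # Store under the URL so responses cannot be mixed up or overwritten.
        lock.synchronize { results[url] = value }
      end
    end
  end
  workers.each(&:join)
  results
end
```

Keying the results by URL plays the same role as giving each connection its own download object: no shared buffer is written to by more than one response.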

Have you considered using a third party framework - such as AFNetworking - and limiting the number of asynchronous calls happening at once? Perhaps this might help / solve your problem.
In particular, you might consider creating a networking manager class that creates and manages AFHTTPClient(s), which in turn manages AFHTTPRequestOperations, for each endpoint (base URL) you hit.

Should I convert my action method to async action method?

I have a web site where users can upload a PDF and convert it to a Word document.
It works nicely, but sometimes (5-6 times per hour) users have to wait longer than usual for the conversion to take place.
I use ASP.NET MVC and the flow is:
- USER uploads file -> get the stream and convert it to word -> save word file as a temp file -> return the user the url
I am not sure whether I have to convert this flow to be asynchronous. My flow is sequential now, but I have about 3-5 requests per second, and the server has a dual-core CPU and 4 GB of RAM.
As far as I know, maxConcurrentRequestsPerCPU is 5000, and the default value of Threads Per Processor Limit is 25, so these default settings should be more than fine, right?
Then why does my web app still have "waits" sometimes? Are there any IIS settings I need to change from their defaults, or should I just make my synchronous conversion method async?
PS: The conversion itself takes between 1 second and 40-50 seconds, depending on the PDF file size.
UPDATE: What is not clear to me is this: if a user uploads a file and the conversion is long, shouldn't only the current request "suffer" because of it? The next request is independent and runs on a different thread, so there should be no wait there, should there?

There are a couple of things that must be defined clearly here. An async(hronous) method and an asynchronous flow are not the same thing, at least as far as I understand.
An asynchronous method (using Task, usually also leveraging the async/await keywords) works in the following way:
1. The execution starts on thread t1 until it reaches an await.
2. The (potentially) long operation does not take place on thread t1, and sometimes not on an app thread at all, leveraging IOCP (I/O completion ports).
3. Thread t1 is released back to the thread pool and is ready to service other requests if needed.
4. When the (potentially) long operation returns, a thread is taken from the thread pool (it could even be the same t1 or, most probably, another one) and code execution resumes from the last await encountered.
5. The rest of the code executes.
There are a couple of things to note here:
a. The client is blocked during the whole process. The eventual switching of threads and so on happens only on the server
b. This approach is mainly designed to alleviate an unwanted condition called 'thread starvation'. It is not meant to speed up the total client waiting time and it usually doesn't speed up the process.
As far as I understand an asynchronous flow would mean, at least in this case, that after the user's request of converting the document, the client (i.e. the client's browser) would quickly receive a response in which (s)he is informed that this potentially long process has started on the server, the user should be patient and this current response page might provide progress feedback.
In your case I recommend the second approach because the first approach would not help at all.
Of course this will not be easy. You need to emulate a queue, and you need a processing agent and an eviction policy (most probably enforced by the same agent if you don't want a second agent).
This would work along the following lines:
a. The end user submits a file, the web server receives it
b. The web server places it in the queue and receives a job number
c. The web server returns the user a response with the job number (let's say an HTML page with a polling mechanism that would periodically receive progress from the server)
d. The agent would start processing the document when it gets the chance (i.e. finishes other work) and update its status in a common place for the web server to pick this information
e. The web server would receive calls from the HTML response asking for the status of the job and would find out that the job is complete and offer a download link or start downloading it directly.
This can be refined in some ways:
Instead of the client polling the server, websockets or long polling (for example SignalR covers both) could be used.
Many processing agents could be used instead of one if the hardware configuration makes sense.
The queue can be implemented with a simple RDBMS; Remus Rușanu has a nice article about this.
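The queue-based flow in steps a to e can be sketched as follows. This is an illustration in plain Ruby, not ASP.NET code, and everything in it is an assumption made for the sketch: the class name ConversionJobs, the in-memory Queue and status Hash (a real deployment would use a database table or a message queue shared between the web server and the agent), and the converter block that stands for the PDF-to-Word step. The eviction policy is left out.

```ruby
require "securerandom"

# Hypothetical in-memory job store shared by the web side and the agent.
class ConversionJobs
  def initialize(&converter)
    @converter = converter # stands for the real conversion routine
    @queue = Queue.new
    @status = {}
    @lock = Mutex.new
  end

  # Steps a-c: accept the upload, enqueue it, return a job number immediately.
  def submit(payload)
    id = SecureRandom.uuid
    @lock.synchronize { @status[id] = { state: :queued } }
    @queue << [id, payload]
    id
  end

  # Step d: the processing agent handles one job; run this in a loop
  # from a dedicated thread or process. Blocks while the queue is empty.
  def work_one
    id, payload = @queue.pop
    set(id, state: :running)
    set(id, state: :done, result: @converter.call(payload))
  rescue StandardError => e
    set(id, state: :failed, error: e.message) if id
  end

  # Step e: what the polling endpoint would return for a job number.
  def status(id)
    @lock.synchronize { @status[id]&.dup }
  end

  private

  def set(id, **fields)
    @lock.synchronize { @status[id] = fields }
  end
end
```

With this shape the request that uploads the file returns as soon as submit has run, so a 40-50 second conversion no longer holds a request thread; only the agent is busy for that long.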
