Processing uploaded images asynchronously: what to do in the meantime?

I need to accept images uploaded by users and do some processing in the background, like generating different-size thumbnails, computing a checksum for the original image, checking for duplicates, etc. After that, the user should be able to see his submission.
The problem is that the HTTP response will probably be sent before the processing is finished, so what do I tell the user?
I can think of 4 options:
Put a dumb placeholder for the thumbnails, with a sign that reads 'processing' or something, plus an explanation somewhere. The user will have to press F5 until he sees the image, unless he already trusts the system and knows it's going to work.
Put a smart placeholder: something like a JavaScript animation plus recurring AJAX calls that fetch the thumbnails when they are ready. This is great for the user experience, but might generate some overhead on the server.
Do the processing asynchronously to avoid overloading, but block the request until the processing is finished. This looks like a good choice to deliver the product fast, and iterate later if the server starts getting many uploads at the same time.
Web sockets?
Are there other options? Which one looks better to you? Are there any pros/cons I'm not seeing?

I would go with option 2: the smart placeholder with a JavaScript animation plus recurring AJAX calls that fetch the thumbnails when they are ready.
You can then refine this further based on analytics. Over time you will learn how long each image takes to transform based on factors like image size, server load, etc. You can incorporate this knowledge to optimize the JS scripts that poll for results.
Don't try to optimize without data points and profiling.
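
As a rough sketch of that polling approach in TypeScript (the /images/:id/thumbnails endpoint, its response shape, and the element id are assumptions made up for illustration, not a real API):

// Poll until the background processing has produced the thumbnails,
// then resolve with their URLs so the placeholder can be swapped out.
async function pollThumbnails(imageId: string, intervalMs = 2000): Promise<string[]> {
  while (true) {
    const res = await fetch(`/images/${imageId}/thumbnails`);
    const data: { ready: boolean; urls: string[] } = await res.json();
    if (data.ready) return data.urls;                     // processing finished
    await new Promise((r) => setTimeout(r, intervalMs));  // keep showing the placeholder
  }
}

// Usage: replace the animated placeholder once the first thumbnail exists.
pollThumbnails("42").then((urls) => {
  const img = document.querySelector<HTMLImageElement>("#thumb-42");
  if (img) img.src = urls[0];
});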

Related

How to optimise computation intensive request response on rails [duplicate]

This question already has answers here:
How do I handle long requests for a Rails App so other users are not delayed too much?
I have an application which does a lot of computation on a few pages (requests). The web interface sends an AJAX request, and the computation sometimes takes about 2-5 minutes. The problem is that by then the AJAX request times out.
We could certainly increase the timeout on the web portal, but that doesn't sound like the right solution. Also, to improve performance, I have already:
Removed N+1/Duplicate queries
Implemented Caching
What else could be done here to reduce the calculation time?
Also, if it still takes too long, I was thinking of the following solutions:
Do the computation beforehand and store it in the DB, so when the actual request comes there is no need for calculation. (I'm apprehensive about this approach, since we would have to modify or erase and recalculate this data whenever the application logic changes.)
Load the whole dataset into the cache when the application starts or the data gets modified. But the computation still has to be done the first time, and we can't keep the whole dataset in the cache from application start, so it would have to be cached on demand.
Maybe do something like an Angular promise, where the promise is fulfilled when the response comes back from the server.
Do we have any alternative way to do this efficiently?
UPDATE:
Depending on user input, the calculation might take a few seconds, or it might take 2-5 minutes. The scenario: the user imports an Excel file, which is parsed and saved to the DB by a background job. On another page, the user then wants to see a report/analytics graph derived from a few calculations on that imported data. The calculation depends on many factors, so we don't want to store the result in the DB (as noted above). Also, when the user requests the report/analytics graph, it would be a bad experience to tell him the graph will be shown after some time and that he'll get an email/notification, etc.
The extremely typical solution is to enqueue a job for background processing, and return a job ID to the front-end. Your front-end can then poll for completion using that job ID, or you can trigger a notification such as an email to be sent to the user when the job completes.
There are a multitude of gems for this, and it is such a popular and accepted solution that Rails introduced its own ActiveJob for this exact purpose.
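
The front-end half of that pattern looks roughly like the TypeScript sketch below; the /reports and /jobs/:id endpoints, the payload shapes, and the backoff numbers are assumptions for illustration only, not ActiveJob's API:

// Kick off the long calculation, get a job ID back, then poll until it finishes.
async function requestReport(params: object): Promise<unknown> {
  const { jobId } = await (await fetch("/reports", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(params),
  })).json();

  let delay = 1000;
  while (true) {
    const { status, resultUrl } = await (await fetch(`/jobs/${jobId}`)).json();
    if (status === "done") return (await fetch(resultUrl)).json();  // fetch the finished report
    if (status === "failed") throw new Error("report job failed");
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, 15000);  // back off, since the job can take minutes
  }
}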
Here are a few possible solutions:
Optimize your tables with indexes to reduce data fetching time.
Preload all the rows you'll be dealing with at the beginning, so you don't run a query each time you calculate something; it's faster and easier to call #things.select { |r| r.blah } on an already-loaded collection than to hit the database with Thing.where(conditions) each time.
Instead of all that, just do the computation in PL/SQL on the database side. Sure, it's not the same as writing Ruby code, but it could be faster.
And yes, cache the whole result set in memcached or Redis or something (and expire it when something changes).
Run the calculation in the background (crontab?) and store the results as JSON somewhere, or cache the entire HTML file (if you're not localizing or anything).
PS: I'm doing 1, 2, and 3 combined with 5 (caching JSON results into memcached and then pulling the array and formatting/localizing) for a few million records from about 12 tables... sports data, mainly.
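
For point 4, a hedged cache-aside sketch with an expiry; it uses node-redis purely for illustration (the question is Rails, where Rails.cache.fetch with expires_in does the same job), and the key name and TTL are assumptions:

import { createClient } from "redis";

const redis = createClient();

// Cache-aside helper: serve the stored result set if present, otherwise compute,
// store it with a one-hour expiry, and return it.
async function cachedReport<T>(key: string, compute: () => Promise<T>): Promise<T> {
  if (!redis.isOpen) await redis.connect();
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit) as T;
  const result = await compute();                              // the expensive calculation
  await redis.set(key, JSON.stringify(result), { EX: 3600 });  // expire after an hour
  return result;
}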

Should I POST high-rate user actions to my server on a per-action basis or send the batch of events once the session is closed?

I'm building a site where users can watch a video and click as many times as they want to "like" it. It's a bit like Periscope's "Hearts" function for those who know it.
The viewers are watching the video in a web browser for now. Every "like" is written to a Heroku-hosted Redis instance, so the writes/reads are fairly cheap. However, there could potentially be a high rate of simultaneous input when many users watch a video at the same time.
In this scenario, I'm facing two options:
Send an event to the Redis instance every time the user "likes." Convenience: the "like" is stored right away with all relevant information. Inconvenience: lots of concurrent writes hitting the server.
Cache the "likes" locally and only send them to Redis once the session is over. Problem: the user can close his browser at any time (and potentially never return), so the "like" information could be lost permanently.
Any advice on which option is preferable?
Don't cache.
First, it's a really big complication as you won't know when the session is really over.
Second, a Redis increment is probably as fast as or faster than your cache. I bet your concern is Rails only, not Redis.
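To make that concrete, recording a like is a single atomic INCR; a minimal sketch (the key naming is an assumption, and node-redis is used here only to show the Redis call itself, since the app is Rails):

import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });  // e.g. the Heroku Redis URL

// One like = one atomic Redis INCR; returns the new running total for the video.
async function recordLike(videoId: string): Promise<number> {
  if (!redis.isOpen) await redis.connect();
  return redis.incr(`video:${videoId}:likes`);
}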
You may eventually want to make a separate endpoint - maybe a simple Sinatra app - to handle likes on their own. I've noticed autosuggest gems sometimes do this, for example, and it saves all the overhead of a full Rails request.
If the app is successful, the concern could be someone writing a script to "like" continually. You may need to put in some throttling to allow only a limited number of requests over a given time window.

WebGL: is it possible to emulate an asynchronous call to gl.finish()?

WebGL is nice and asynchronous in that you can send off a long list of rendering commands without waiting for them to complete. However, if for some reason you do need to wait for the rendering to complete, you have to do it synchronously with gl.finish(). Surely it would be better if gl.finish accepted a callback and returned immediately?
Question: Is there any way to emulate this reliably?
Usage case: I am rendering a large number of vertices to a large off-screen canvas and then using drawImage to copy sections of this large canvas to small canvases on the page. I don't actually use gl.finish() but drawImage() seems to have the same effect. In my application, re-rendering is only triggered when the user performs an action (e.g. clicking a button), and it may take several hundred milliseconds. It would be nice if during rendering the browser was still responsive allowing scrolling etc. I am looking in particular for a Chrome solution, though something that also works in Firefox and Safari would be good.
Possible (bad) answer: You could try and estimate how long rendering is going to take and then set a timeout that begins with the call to gl.finish(). However, reliably doing this estimation for all sizes of vertex buffer and all users is going to be pretty tricky and inaccurate.
Possible (non-)answer: requestAnimationFrame does what I'm looking for...it doesn't though, does it?
Possible answer in 2018: Perhaps the ImageBitmap API solves this problem - see MDN docs.
You've already partially hit on your answer: drawImage() does indeed have finish-like behavior in that it forces all outstanding drawing commands to complete before it reads back the image data. The problem is that even if gl.finish() did what you wanted it to (wait for rendering to complete), you would still have the same behavior using it as you do now: the main thread would be blocked while the rendering finishes, interrupting the user's ability to interact with the page.
Ideally what you would want in this scenario is some sort of callback that indicates when a set of draw commands have been completed without actually blocking to wait for them. Unfortunately no such callback exists (and it would be surprisingly difficult to provide one, given the way the browser's internals work!)
A decent middle-ground in your case may be to do some intelligent estimations of when you feel the image may be ready. For example, once you have dispatched your draw calls, spin through 3 or 4 requestAnimationFrames before you call drawImage. If you consistently observe it taking longer (10 frames?) then spin for longer. This would allow users to continue interacting with the page normally and either produce no delay when doing the drawImage, because the contents have finished rendering, or much less delay because you do the synchronous step mid-way through the render. Depending on the intended usage of your site, non-realtime rendering could probably even stand to spin for a full second or so before presenting.
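A minimal sketch of that heuristic in TypeScript; renderToOffscreen() and copySectionsToPageCanvases() stand in for your own rendering and drawImage code, and the frame count is a guess to be tuned:

// Assumed to exist elsewhere in the application.
declare function renderToOffscreen(): void;
declare function copySectionsToPageCanvases(): void;

// Wait for n animation frames without blocking the main thread.
function waitFrames(n: number): Promise<void> {
  return new Promise((resolve) => {
    const tick = () => (n-- <= 0 ? resolve() : requestAnimationFrame(tick));
    requestAnimationFrame(tick);
  });
}

async function renderAndPresent(framesToWait = 4): Promise<void> {
  renderToOffscreen();             // issue the WebGL draw calls; this returns immediately
  await waitFrames(framesToWait);  // page stays responsive while the GPU works
  copySectionsToPageCanvases();    // the drawImage() calls now block far less, or not at all
}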
This certainly isn't a perfect solution, and I wish I had a better answer for you. Perhaps WebGL will gain the ability to query this type of status in the future, because it would be valuable in cases like yours, but for now this is likely the best you can do.

How to make UITableView operation queues download concurrently

I am using a UITableView to download data.
The name web service is very fast, so I use it to populate the table initially, then I start an operation queue for the images.
Then I use a separate queue for the rest of the data because it loads very slowly, but that affects the image load time. How can I do the two concurrently?
Can you figure out what's slowing the performance there and help me fix it?
As I assume you know, you can specify how many requests run concurrently by setting maxConcurrentOperationCount when you create your queue. Four is a typical value.
self.imageDownloadingQueue.maxConcurrentOperationCount = 4;
The thing is, you can't go much larger than that (due to both iOS restrictions and some server restrictions). Perhaps 5, but no larger than that.
Personally, I wouldn't use up the limited number of max concurrent operations returning the text values. I'd retrieve all of those up front. You do lazy loading of images because they're so large, but text entries are so small that the overhead of doing separate network requests starts to impose its own performance penalties. If you are going to do lazy loading of the descriptions, I'd download them in batches of 50 or 100 or so.
Looking at your source code, at a very minimum you're making twice as many JSON requests as you should (you're retrieving the same JSON in getAnimalRank and getAnimalType). But you really should just alter that initial JSON request to return everything you need: the name, the rank, the type, and the URL (but not the image itself). Then in a single call you get everything you need (except the images, which you retrieve asynchronously, and which your server is delivering plenty fast for the UX). And if you decide to keep the individual requests for the rank/type/URL, you need to take a look at your server code, because there's no legitimate reason that data shouldn't come back instantaneously, and it's currently really slow. But, as I said, you really should just return all of that in the initial JSON request, and your user interface will be remarkably faster.
One final point: You're using separate queues for details and image downloads. The entire purpose of using NSOperationQueue and setting maxConcurrentOperationCount is that iOS can only execute 5 concurrent requests against a given server. By putting these in two separate queues, you're losing the benefit of maxConcurrentOperationCount. As it turns out, it takes a minute for requests to time out, so you're probably not going to experience a problem, but it still reflects a basic misunderstanding of the purpose of the queues.
Bottom line, you should have only one network queue (because the system limitation is how many network concurrent connections between your device and any given server, not how many image downloads and, separately, how many description downloads).
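The single-queue idea, sketched platform-neutrally in TypeScript rather than Objective-C purely to illustrate the point: cap concurrency in one place for all request types, rather than per resource type.

// One queue for every network request, never more than `limit` in flight at once.
class DownloadQueue {
  private active = 0;
  private pending: Array<() => void> = [];

  constructor(private limit = 4) {}

  async fetch(url: string): Promise<Response> {
    while (this.active >= this.limit) {
      await new Promise<void>((resolve) => this.pending.push(resolve));
    }
    this.active++;
    try {
      return await fetch(url);
    } finally {
      this.active--;
      this.pending.shift()?.();  // wake the next waiter, if any
    }
  }
}

const queue = new DownloadQueue(4);  // mirrors maxConcurrentOperationCount = 4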
Have you thought about just doing this asynchronously? I wrote a class to do something very similar to what you describe using blocks. You can do this two ways:
Just load async whenever cellForRowAtIndexPath fires. This works for lots of situations, but can lead to the wrong image showing for a second until the right one is done loading.
Call the process to load the images when the dragging has stopped. This is generally the way I do things so that the correct image always shows where it should. You can use a placeholder image until the image is loaded from the web.
Look at this SO question for details:
Loading an image into UIImage asynchronously

Storing Data In Memory: Session vs Cache vs Static

A bit of backstory: I am working on a web application that requires quite a bit of time to prep/crunch data before giving it to the user to edit/manipulate. The data request takes ~15-20 seconds to complete and a couple of seconds to process. Once it's there, the user can manipulate values on the fly. Any manipulation of values requires the data to be reprocessed completely.
Update: To avoid confusion, I am only making the data call once (the 15-second hit) and then want to keep the results in memory so that I will not have to call it again until the user is 100% done working with it. So the first pull will take a while, but, using Ajax, I am going to hit the in-memory data to constantly update it and keep the response time to around 2 seconds or so (I hope).
In order to make this efficient, I am moving the initial data into memory and using Ajax calls back to the server so that I can reduce the processing time needed to handle the recalculation that occurs with this user's updates.
Here is my question: with performance in mind, what would be the best way to store this data, assuming that only one user will be working with it at any given moment?
Also, the user could potentially be working in this process for a few hours. While the user is working with the data, I will need some kind of failsafe to save the user's current data (either in a DB or in a serialized binary file) should their session be interrupted in some way. In other words, I will need a solution that has an appropriate hook to allow me to dump the memory object's data in case the user gets disconnected or distracted for too long.
So far, here are my musings:
Session State - Pros: Locked to one user. Has the Session End event, which will meet my failsafe requirements. Cons: Slowest performance of my current options. The Session End event is sometimes tricky to ensure it fires properly.
Caching - Pros: Good performance. Has access to dependencies, which could be a bonus later down the line but isn't really useful in the current scope. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide with each other's work.
Static - Pros: Best performance. Easiest to maintain, as I can directly leverage my current class structures. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide with each other's work.
Does anyone have any suggestions or comments on which option I should choose?
Thanks!
Update: Forgot to mention, I am using VB.NET, ASP.NET, and SQL Server 2005 for this.
I'll vote for secret option #4: use the database for this. If you're talking about a 20+ second turnaround time on the data, you are not going to gain anything by trying to do this in-memory, given the limitations of the options you presented. You might as well set this up in the database (give it a table of its own, or even a separate database if the requirements are that large).
I'd go with the caching method for storing the data across page loads. You can name the cache key you store the data under to avoid conflicts.
For tracking user-made changes, I'd go with a more old-school approach: append to a text file each time the user makes a change and then sweep that file at intervals to save the changes back to the DB. If you name the files based on the user/account or some other session-unique indicator, then there's no issue with conflicts, and the app (or some other support app, which might be a better idea in general) can sweep through all such files and update the DB even if the session is over.
The first part of this can be adjusted to stagger the writes out more: save changes to the Session, then write them to the file at intervals, then sweep the file at larger intervals. You can tune it for performance and choose what level of possible user-change loss is acceptable.
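A sketch of that append-then-sweep pattern; it's written as TypeScript/Node only to keep this page's examples in one language (in the ASP.NET app the equivalents would be File.AppendAllText plus a scheduled sweep), and the file naming and saveToDb callback are assumptions:

import { appendFile, readFile, unlink } from "fs/promises";

// Append each change immediately so nothing is lost if the session dies.
async function recordChange(userId: string, change: object): Promise<void> {
  await appendFile(`changes-${userId}.log`, JSON.stringify(change) + "\n");
}

// Periodic sweep: replay the file into the DB, then remove it.
async function sweep(userId: string, saveToDb: (c: object) => Promise<void>): Promise<void> {
  const path = `changes-${userId}.log`;
  let lines: string[];
  try {
    lines = (await readFile(path, "utf8")).split("\n").filter(Boolean);
  } catch {
    return;  // no changes recorded yet
  }
  for (const line of lines) await saveToDb(JSON.parse(line));
  await unlink(path);  // the changes are now durable in the DB
}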
Use the Session, but don't rely on it.
Simply let the user "name" the dataset, and make a point of actively persisting it for the user, either automatically or through something as simple as a "save" button.
You cannot rely on the session simply because it is (typically) tied to the user's browser instance. If they accidentally close the browser (click the X button, their PC crashes, etc.), they lose all of their work. Which would be nasty.
Once the user has that kind of control over the "persistent" state of the data, you can rely on the Session to keep it in memory and leverage that as a cache.
I think you've pretty much just answered your own question with the pros/cons. But if you are looking for some peer validation, my vote is for the Session. Although its performance is slower (do you know by how much?), your processing is going to take a long time regardless. Do you think the user will know the difference between 15 seconds and 17 seconds? Both are "forever" in web terms, so go with the one that seems easiest to implement.
Perhaps a bit off topic, but I'd recommend putting those long processing calls in asynchronous (not to be confused with AJAX's asynchronous) pages.
Take a look at this article and ping me back if it doesn't make sense.
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx
I suggest creating a copy of the data in a new database table (let's call it EDIT) as you send the initial results to the user. If performance is an issue, do this in a background thread.
As the user edits the data, update the table (also in a background thread if performance becomes an issue). If you have to use threads, you must make sure that the first thread is finished before you start updating the rows.
This allows a user to walk away, come back, even restart the browser and commit whenever she feels satisfied with the result.
One possible alternative to what the others mentioned, is to store the data on the client.
Assuming the dataset is not too large and the code that manipulates it can run client side, you could store the data as an XML data island or JSON object. This data could then be manipulated/processed entirely on the client with no round trips to the server. If you need to persist this data back to the server, the resulting data could be posted via an AJAX call or a standard postback.
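A minimal TypeScript sketch of that client-side approach; the endpoint names, record shape, and save interval are assumptions for illustration:

interface Row { id: number; value: number; }

let working: Row[] = [];

// The one expensive call (the 15-20 second hit); everything after this stays client side.
async function loadDataset(): Promise<void> {
  working = await (await fetch("/dataset/initial")).json();
}

// All manipulation happens in the browser, with no round trip to the server.
function recalc(factor: number): void {
  working.forEach((r) => { r.value = r.value * factor; });
}

// Failsafe: post the current state back every few minutes in case the session is interrupted.
async function persist(): Promise<void> {
  await fetch("/dataset/save", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(working),
  });
}

setInterval(persist, 5 * 60 * 1000);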
If this does not work with your requirements I'd go with just storing it on the SQL server as the other comment suggested.
