I have a website where the user can upload an excel spreadsheet to load data in a table. There can be a few 100k rows in the excel spreadsheet. When he uploads the file the website needs to insert an equal amount of rows in a database table.
What strategy should i take to do this? I was thinking of displaying a "Please wait page" until the operation is completed but i want him to be able to continue browsing the website. Also, since the database at that time will be kind of busy - wouldn't that stop people from working on the website?
My data access layer is in NHibernate.
Thanks,
Y
Displaying a please wait page would be pretty unfriendly as your user could be wating quite a while and would block threads on your web server.
I would upload the file, store it and create an entry in a queue (you'll need anouther table for this) to indicate that there is a file waiting to be processed. You can then have another process (which could even run on it's own server) which picks up tasks from this queue table and processes the xls file in it's own time.
I would create an upload queue that would submit this request to. Then the user could just check in on the queue every once in a while. You could store the progress of the batch operation in the queue as the rows are processed.
Also, database servers are robust, powerful, multi-tasking systems. Unless you have observed a problem with the website while the inserts are happening don't assume it will stop people from working on the website.
However, as far as insert or concurrent read/write performance goes there are mechanisms to deal with this. You could use the "INSERT LOW PRIORITY" syntax in MySQL or have your application throttle the inserts by sleeping a millisecond between each insert. Also, how you craft your insert statements, wether you use bound parameters or not, and wether you use multi-valued inserts can affect the insert performance and how it affects clients to a large degree.
On Submit you could pass the DB Operation to a asynchronous RequestHandler and set a Session Value when its done.
While the asynch process is in progress you can check the Session Value on each request and if it is set (operation = completed) display a message, eg in a modal or whatever message mechanism you have.
Related
I have an application that, at its core, is a sort of data warehouse and report generator. People use it to "mine" through a large amount of data with ad-hoc queries, produce a report page with a bunch of distribution graphs, and click through those graphs to look at specific result sets of the underlying items being "mined." The problem is that the database is now many hundreds of millions of rows of data, and even with indexing, some queries can take longer than a browser is willing to wait for a response.
Ideally, at some arbitrary cutoff, I want to "offline" the user's query, and perform it in the background, save the result set to a new table, and use a job to email a link to the user which could use this as a cached result to skip directly to the browser rendering the graphs. These jobs/results could be saved for a long time in case people wanted to revisit the particular problem they were working on, or emailed to coworkers. I would be tempted to just create a PDF of the result, but it's the interactive clicking of the graphs that I'm trying to preserve here.
None of the standard Rails caching techniques really captures this idea, so maybe I just have to do this all by hand, but I wanted to check to see if I wasn't missing something that I could start with. I could create a keyed model result in the in-memory cache, but I want these results to be preserved on the order of months, and I deploy at least once a week.
Considering Data loading from lots of join tables. That's why it's taking time to load.
Also you are performing calculation/visualization tasks with the data you fetch from DB, then show on UI.
I like to recommend some of the approaches to your problem:
Minimize the number of joins/nested join DB queries
Add some direct tables/columns, ex. If you are showing counts of comments of user the you can add new column in user table to store it in user table itself. You can add scheduled job to update data or add callback to update count
also try to minimize the calculations(if any) performing on UI side
you can also use the concept of lazy loading for fetching the data in chunks
Thanks, hope this will help you to decide where to start 🙂
I wanted to ask for the best solution in synchronizing large amounts of data between the local data storage and a database. (Offline and Online synchronization)
let's say i have two tables and these tables have customers.
Now if i change some data e.g name for the customer, i have to send this to the server before i can pull the server side update, otherwise there would be a conflict.
Let's image the app is offline and has no internet connection so all the changes are made locally.
What i have been doing up till now is save the initial data pull locally and set queue states for the changes. If i change the name for the customer i would set the customer queue state to AWAITING and once the application gets internet and is back online, i would run a query and get all customers that have queue state AWAITING and send that to the database for synchronization, before i can do a data pull.
I believe this model is called Eventual consistency which i think is not very good, and wonder if there is a better way to do this.
I had a different idea where i would create some Queue where every change is stored in JSON and once the app gets internet the queue would be run and all of these API calls would be sent, but this would work weirdly if let's say i change the customers name from John to Matt, then back from Matt to John, and at this point 2 new api calls would be added to the queue. First where i change to Matt and next where i change back to John, which is what we started with and is therefor not needed, but these changes would be very hard to keep track of, and as it is already now it feels like the synchronization is very unstable.
Do you guys know any better approaches for synchronization between (Offline and Online) local changes and the serverside database?
This question already has answers here:
How do I handle long requests for a Rails App so other users are not delayed too much?
(3 answers)
Closed 6 years ago.
I have an application, which does a lot of computation on few pages(requests). The web interface sends an AJAX request. The computation takes sometimes about 2-5 minutes. The problem is, by this time AJAX request times out.
We can certainly increase the timeout on the web portal, but that doesn't sound like right solution. Also, to improve performance:
Removed N+1/Duplicate queries
Implemented Caching
What else could be done here to reduce the calculation time?
Also, if it still takes longer, I was thinking of following solutions:
Do the computation beforehand and store it in DB. So when the actual request comes, there is no need of calculation. (Apprehensive about this approach. Since we will have to modify/Erase-and-recalculate this data, whenever there is some application logic change.)
Load the whole data in cache when application starts/data gets modified. But for the first time computation has to be done. Also, can't keep whole data in the cache when the application starts. So need to store it in the cache as per demand.
Maybe, do something like Angular promise, where promise gets fulfilled when the response comes from the server.
Do we have any alternative to do this efficiently?
UPDATE:
Depending on user input, the calculation might happen in few seconds. And also it might take 2-5 minutes. The scenario is, user imports an excel. The excel has been parsed and saved in DB. Now on another page, user wants to see the report/analytics graph derived with few calculations on the imported data(which has already been saved to db with background job). The calculation has to be done with many factors, so do not want to save it in DB(As pointed above). Also, when user request the report/analytics graph, It'll be bad experience to tell him that graph will be shown after sometime. You'll get email/notification etc.
The extremely typical solution is to enqueue a job for background processing, and return a job ID to the front-end. Your front-end can then poll for completion using that job ID, or you can trigger a notification such as an email to be sent to the user when the job completes.
There are a multitude of gems for this, and it is such a popular and accepted solution that Rails introduced its own ActiveJob for this exact purpose.
Here are a few possible solutions:
Optimize your tables with indexes to reduce data fetching time.
Preload all rows you'll be dealing with at the beginning, so you won't do a query each time you calculate something... it's faster/easier to #things.select { |r| r.blah } than to Thing.where(conditions)
Instead of all that, just do the computing in PLSQL on the database side. Sure, it's not the same as writing Ruby code but it could be faster.
And yes, cache the whole results set into memcache or redis or something (and expire when something change)
Run the calculation in the background (crontab?) and store the results in a JSON somewhere, or cache the entire HTML file (if you're not localizing or anything)
PS: I'm doing 1,2,3 combined with 5 (caching JSON results into memcache and then pulling the array and formatting/localizing) for a few M records from about 12 tables... sports data mainly.
I'm doing a series of background requests to pull the user's data from a cloud API using Restful requests. Each request returns a day of data, and I have a rate limit of 150 per hour. This means that I will be doing a lot of calls, either in series or in parallel. Regardless, the process is expected to take some time.
What is the standard practice of dealing with saving results of such requests to Core Data - do I save each object that comes in, or do I save in batches? My concern with batching is that if connection is lost, rate limit is hit, or the app crashes, I want to gracefully terminate the download and be able to resume it at a later point, from where I left off.
If you save in batches, you do run the risk of losing data you've already downloaded if the app crashes or is forcefully closed before the save. Even if this does happen, you should be able to use CoreData to identify which items you have not yet downloaded so you can gracefully resume where you left off when the app stopped.
A bit of backstory: I am working on an web application that requires quite a bit of time to prep / crunch data before giving it to the user to edit / manipulate. The data request task ~ 15 / 20 secs to complete and a couple secs to process. Once there, the user can manipulate vaules on the fly. Any manipulation of values will require the data to be reprocessed completely.
Update: To avoid confusion, I am only making the data call 1 time (the 15 sec hit) and then wanting to keep the results in memory so that I will not have to call it again until the user is 100% done working with it. So, the first pull will take a while, but, using Ajax, I am going to hit the in-memory data to constantly update and keep the response time to around 2 secs or so (I hope).
In order to make this efficient, I am moving the intial data into memory and using Ajax calls back to the server so that I can reduce processing time to handle the recalculation that occurs w/ this user's updates.
Here is my question, with performance in mind, what would be the best way to storing this data, assuming that only 1 user will be working w/ this data at any given moment.
Also, the user could potentially be working in this process for a few hours. When the user is working w/ the data, I will need some kind of failsafe to save the user's current data (either in a db or in a serialized binary file) should their session be interrupted in some way. In other words, I will need a solution that has an appropriate hook to allow me to dump out the memory object's data in the case that the user gets disconnected / distracted for too long.
So far, here are my musings:
Session State - Pros: Locked to one user. Has the Session End event which will meet my failsafe requirements. Cons: Slowest perf of the my current options. The Session End event is sometimes tricky to ensure it fires properly.
Caching - Pros: Good Perf. Has access to dependencies which could be a bonus later down the line but not really useful in current scope. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Static - Pros: Best Perf. Easies to maintain as I can directly leverage my current class structures. Cons: No easy failsafe step other than a write based on time intervals. Global in scope - will have to ensure that users do not collide w/ each other's work.
Does anyone have any suggestions / comments on what I option I should choose?
Thanks!
Update: Forgot to mention, I am using VB.Net, Asp.Net, and Sql Server 2005 to perform this task.
I'll vote for secret option #4: use the database for this. If you're talking about a 20+ second turnaround time on the data, you are not going to gain anything by trying to do this in-memory, given the limitations of the options you presented. You might as well set this up in the database (give it a table of its own, or even a separate database if the requirements are that large).
I'd go with the caching method of for storing the data across any page loads. You can name the cache you want to store the data in to avoid conflicts.
For tracking user-made changes, I'd go with a more old-school approach: append to a text file each time the user makes a change and then sweep that file at intervals to save changes back to DB. If you name the files based on the user/account or some other session-unique indicator then there's no issue with conflict and the app (or some other support app, which might be a better idea in general) can sweep through all such files and update the DB even if the session is over.
The first part of this can be adjusted to stagger the write out more: save changes to Session, then write that to file at intervals, then sweep the file at larger intervals. you can tune it to performance and choose what level of possible user-change loss will be possible.
Use the Session, but don't rely on it.
Simply, let the user "name" the dataset, and make a point of actively persisting it for the user, either automatically, or through something as simple as a "save" button.
You can not rely on the session simply because it is (typically) tied to the users browser instance. If they accidentally close the browser (click the X button, their PC crashes, etc.), then they lose all of their work. Which would be nasty.
Once the user has that kind of control over the "persistent" state of the data, you can rely on the Session to keep it in memory and leverage that as a cache.
I think you've pretty much just answered your question with the pros/cons. But if you are looking for some peer validation, my vote is for the Session. Although the performance is slower (do you know by how much slower?), your processing is going to take a long time regardless. Do you think the user will know the difference between 15 seconds and 17 seconds? Both are "forever" in web terms, so go with the one that seems easiest to implement.
perhaps a bit off topic. I'd recommend putting those long processing calls in asynchronous (not to be confused with AJAX's asynchronous) pages.
Take a look at this article and ping me back if it doesn't make sense.
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx
I suggest to create a copy of the data in a new database table (let's call it EDIT) as you send the initial results to the user. If performance is an issue, do this in a background thread.
As the user edits the data, update the table (also in a background thread if performance becomes an issue). If you have to use threads, you must make sure that the first thread is finished before you start updating the rows.
This allows a user to walk away, come back, even restart the browser and commit whenever she feels satisfied with the result.
One possible alternative to what the others mentioned, is to store the data on the client.
Assuming the dataset is not too large, and the code that manipulates it can be handled client side. You could store the data as an XML data island or JSON object. This data could then be manipulated/processed and handled all client side with no round trips to the server. If you need to persist this data back to the server the end resulting data could be posted via an AJAX or standard postback.
If this does not work with your requirements I'd go with just storing it on the SQL server as the other comment suggested.