Importing a large amount of data using CSV/XLS - ruby-on-rails

I have an import feature where I upload a CSV/XLS file to import data. I have a CSV file with 30,000 records (size: 3.4 MB).
This file takes 40 to 50 minutes to import.
For each record I store data in 3 tables.
I want to reduce the time it takes to import the data.
What should I do? Please help me.
Thanks in advance.

I'm going to assume that you are using tables in a database. Importing 3.4 MB shouldn't take that long. In one of my recent projects, I had to import and parse 100 MB+ of dictionary files and it only took a minute (Python-based).
The time really depends on the code you have written, but there are some things to look for that will help reduce the import time.
First, don't print any values inside loops; it uses up a good amount of time in any language.
Also, only open the database connection once; there is no need to close it while you're in the same function or problem space.
Use bulk-insert functionality (such as executemany) when it is available, and when you are ready to commit all of your changes, commit them all at once.
It would also help to see how you structured your import function; then I might be able to provide more details.
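In Rails terms, the advice above usually comes down to wrapping the work in a bulk insert instead of issuing one INSERT per record. A rough sketch, assuming Rails 6+'s insert_all and made-up model/column names (Product, name, price):

require "csv"

batch = []
CSV.foreach("import.csv", headers: true) do |row|
  batch << { name: row["name"], price: row["price"] }
  if batch.size >= 1000
    Product.insert_all(batch)   # one INSERT per 1000 rows, skips per-row callbacks
    batch.clear
  end
end
Product.insert_all(batch) unless batch.empty?

Repeat the same pattern for the other two tables, or use the activerecord-import gem on Rails versions before 6.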
Edit:
See Improve INSERT-per-second performance of SQLite?

Related

Rails importing large excel sheets/google sheets

I am currently working on two projects.
One has an Excel file of about 130 MB, and you can imagine how many records it contains.
The other uses Google Sheets with 25k+ records, and these will increase over time.
So how should I go about handling such huge uploads in Rails?
I have not found a detailed tutorial addressing this issue; if someone has one, please share it with me.
Kindly advise on a strategy/gems that I should prefer.
Thanks.
Have you considered converting to CSV and then importing?
There's a tutorial and Gem for that: RailsCasts 396
First, export to CSV. Then split it into smaller files, for example with
split data.csv
(OS X/Linux)
I'd implement the import as a rake task. You could also just generate seed.rb with a bit of string manipulation.
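A bare-bones sketch of such a rake task (the Data model comes from the snippet further down; the file name and column names are placeholders):

# lib/tasks/import.rake
require "csv"

namespace :import do
  desc "Import records from data.csv"
  task data: :environment do
    CSV.foreach(Rails.root.join("data.csv"), headers: true) do |row|
      Data.create!(name: row["name"], value: row["value"])
    end
  end
end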
Rails shouldn't have a problem with a 170MB file, but it's often annoying to wait for long-running tasks to finish, especially if you're still debugging.
Alternatively, you can probably import it much faster if you talk directly to MySQL. But you'll lose the convenience of Rails and should at least do a
Data.all.each do |datum|
  datum.touch
  datum.save!
end
to verify.

How to load large amount of data into CoreData

I have a Core Data database that gets initialized from a local file.
The Core Data schema looks like this:
Category -->> Objections -->> Responses -->> Evidence
("-->>" means, has many)
Each entity also has a description that can be anywhere from 2 to thousands of characters long, stored in an NSString.
Question: How can I store this data so that it would be easy for someone to edit without having to know a lot about programming? (But also follow best practices)
Currently, I am thinking of these as possible approaches:
1) Store everything in 1 big plist file. This would be about 25 pages long.
2) Separate each entity into its own plist file and relate the values with an ID#, like a relational database. This would make the files a more manageable size, but you have to keep track of the ID#s.
3) Same as above, but with JSON
Create a dead simple desktop application that uses Core Data. Let the people edit the file in that desktop application and keep it stored in Core Data. Then when you ship your application you can embed that SQLite file into your iOS application so there is no start up parsing required.
Standing up an OS X app for this that doesn't need to be pretty is dead simple. Takes an afternoon or two at most and saves you a ton of headaches.
Run the app in the simulator, do the parsing of the plist OR CSV there (CSV might be easier, since that's an Excel-compatible format), and then copy the resulting DB into your Xcode project and ship it.
=> your 'content provider' can work with Excel
=> you don't have to ship a CSV (or whatever file you use), since you have a filled DB after parsing it in the simulator
There is a nice post from Mattt Thompson about Core Data Libraries & Utilities: http://nshipster.com/core-data-libraries-and-utilities/; maybe something there fits your needs. =]

Performance Issue with OPENXML SDK to read & write

According to my client's requirements, we are replacing Aspose with the Open XML SDK to read and write Excel 2007 (.xlsm macro-enabled) files, and in fact we are able to accomplish the job using the Open XML SDK.
However, the problem starts when we compare execution time: if the data to be read or written is large (reporting data of, e.g., 18k rows), Aspose is much faster than the Open XML SDK.
We took the approach given in the link below and it works perfectly, but the real issue is that the execution time is too long, during which we hit the transaction timeout.
http://msdn.microsoft.com/en-us/library/office/hh180830.aspx
Only if the Open XML SDK delivers results in less time than Aspose will it be accepted; otherwise our effort spent finding and writing Open XML SDK code is in vain.
Basically, we are loading the entire huge data set into a DataSet and writing it with either of the above technologies.
Any help to improve performance in terms of coding will be highly appreciated.
The problem is one of working set, not one of IO time. Allocation of too much memory will slow down your application more than just about any other factor. It is best if you use a streaming approach. Some time ago I recorded a screen-cast that uses a streaming approach - it generates a worksheet at around the rate of 10,000 rows per second (for rows that have 20 columns of data). This is near to being IO bound, so it would be difficult to speed up more than this. You would need to use faster disks.
You can find the screen-cast, and the example code here:
http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2012/01/10/screen-cast-using-open-xml-and-linq-to-xml-in-a-streaming-fashion-to-create-huge-spreadsheets.aspx
-Eric

Xls (csv) or xml for rails importing data

I need to import data into my app. Right now I do it via XLS spreadsheets, but in my case the file has about 80,000 rows and it is slow, so maybe it is better to choose another format? For example, will XML data import faster?
XML is unlikely to be any faster - it still needs to be parsed as strings and converted.
80,000 rows is quite a lot. How long does it take you?
Edit:
You can make what's happening more visible by dropping puts statements into your code, with timestamps. It's crude, but you can then time between various parts of your code to see which part takes the longest.
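For example (parse_file and import_row are placeholders for whatever your actual steps are):

t = Time.now
rows = parse_file("data.xls")          # your parsing step
puts "parsing:   #{Time.now - t} seconds"

t = Time.now
rows.each { |row| import_row(row) }    # your import step
puts "importing: #{Time.now - t} seconds"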
Or better yet, have a go at using ruby-prof to profile your code and see where it is spending the most time.
Either way, getting a more detailed picture of the slow-points is a Good Idea.
You may find there's just one or two bottlenecks that can be easily fixed.

Bulk time entry into timesprite / fogbugz

Is there a good way to import time data into either TimeSprite or FogBugz? Both seem to have very clunky interfaces for adding single items at a time. What I want is a spreadsheet-style format where I can enter a bunch of rows and suck them in. I noticed TimeSprite has an import feature, but it seems to only accept TimeSprite-formatted XML.
You can use the FogBugz API to bulk-add time records into FogBugz. I don't know of an existing spreadsheet-entry interface that uses the API to load data into FogBugz, but one could easily be written for the purpose.
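As a rough sketch of what that could look like in Ruby: the newInterval command and its parameters are assumptions on my part, so verify them against the FogBugz API documentation, and the server URL, credentials, and CSV columns are placeholders.

require "csv"
require "net/http"
require "rexml/document"
require "uri"

API = URI("https://example.fogbugz.com/api.asp")   # your FogBugz server

def api(params)
  Net::HTTP.post_form(API, params).body
end

# Log on once to get an API token.
logon = REXML::Document.new(api("cmd" => "logon",
                                "email" => "you@example.com",
                                "password" => "secret"))
token = logon.elements["//token"].text

# Post one time record per spreadsheet row.
CSV.foreach("time.csv", headers: true) do |row|
  api("cmd"     => "newInterval",   # assumed command name; check the API docs
      "token"   => token,
      "ixBug"   => row["case"],
      "dtStart" => row["start"],
      "dtEnd"   => row["end"])
end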
