Rails importing large excel sheets/google sheets - ruby-on-rails

I am currently working on two projects.
One has an Excel file of about 130 MB, so you can imagine how many records it contains.
The other uses Google Sheets with 25k+ records, and that number will grow over time.
How should I handle such large imports in Rails?
I cannot find a detailed tutorial that addresses this issue. If someone knows of one, please share it with me.
Please suggest a strategy and any gems I should prefer.
Thanks.

Have you considered converting to CSV and then importing?
There's a tutorial and gem for that: RailsCasts 396

First, export to CSV. Then split it into smaller files, for example with
split data.csv
(OS X/Linux)
I'd implement the import as a rake task. You could also just generate seed.rb with a bit of string manipulation.
Rails shouldn't have a problem with a 170MB file, but it's often annoying to wait for long-running tasks to finish, especially if you're still debugging.
Alternatively, you can probably import it much faster if you talk directly to MySQL. But you'll lose the convenience of Rails, and you should at least run
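# find_each loads records in batches instead of all at once
Data.find_each do |datum|
  datum.touch
  # save! raises if the record fails validation
  datum.save!
end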
to verify.
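
The "export to CSV, then import in pieces" approach can also be done without physically splitting the file. The following is only a sketch, using the standard csv library; each_csv_batch and its batch_size parameter are hypothetical names, not part of Rails or any gem. It streams the file and hands rows to the caller in fixed-size batches, so the whole file never sits in memory.

```ruby
require "csv"

# Hypothetical helper (not a Rails API): streams a CSV file with a header row
# and yields its rows to the block in batches of batch_size hashes.
def each_csv_batch(path, batch_size: 1_000)
  # CSV.foreach without a block returns an enumerator, so each_slice
  # only ever holds batch_size rows at a time.
  CSV.foreach(path, headers: true).each_slice(batch_size) do |rows|
    yield rows.map(&:to_h)
  end
end
```

Inside a rake task, the block could pass each batch to a bulk insert (for example insert_all on Rails 6 or later, which skips validations and callbacks), assuming the CSV headers match the column names.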

Related

Need local SDK tool for parsing native pdf file with large tables

The user needs to parse native PDFs (selectable text, not scanned, no OCR required) locally. The PDF files may be over 400 pages long and contain large tables. Some tables may not have clear borders. Is there any API I could use?
Thanks!
Now that I know you don't want an API, I'd recommend checking out iTextSharp, from NuGet. I have used it several times in the past, and there are many Stack Overflow threads on how to use it. https://www.nuget.org/packages/iTextSharp/5.5.13.1
EDIT: I apologize, it looks like iTextSharp has been replaced by iText 7 https://itextpdf.com/en/products/itext-7
It seems there are several PDF parser APIs out there you could use. PDFTron looks promising, and they offer a free trial: https://www.pdftron.com/pdf-sdk/parsing-library/
DocParser may also be helpful: https://docparser.com/features.
I found all of these through a simple Google search, so it may benefit you to do some research yourself, as we can only make broad suggestions based on the information in your question.

Import large number of data using CSV/XLS

I have an import feature where I upload a CSV/XLS file to import data. I have a CSV file with 30,000 records (size: 3.4 MB).
This file takes 40 to 50 minutes to import.
For each record, I store data in 3 tables.
I want to reduce the import time.
What should I do? Please help.
Thanks in advance.
I'm going to assume that you are using tables in a database. Importing 3.4 MB shouldn't take that long. In one of my recent projects, I had to import and parse 100MB+ of dictionary files and it only took a minute (Python-based).
The time really depends on the code you have written, although there are some things to look for that will help reduce the import time.
The first is: don't print any values in loops. Printing generally uses up a good amount of time in any language.
Also, open the database connection only once; there is no need to close it while you're in the same function or problem space.
Use the executemany functionality when it is available. When you are ready to commit all of the changes, commit them all at once.
It would also be nice to see how you structured your import function, then I might be able to provide more details.
Edit:
See Improve INSERT-per-second performance of SQLite?
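
The batching advice above (many rows per statement, one commit at the end) can be sketched in Ruby as a small buffer that collects rows per table and hands them to a writer in bulk. BatchBuffer and its writer block are hypothetical names for illustration only; the actual database call is left to the caller.

```ruby
# Hypothetical buffer: collects rows per table and passes them to the
# writer block in bulk instead of issuing one write per row.
class BatchBuffer
  def initialize(batch_size: 500, &writer)
    @batch_size = batch_size
    @writer = writer # called as writer.call(table, rows)
    @rows = Hash.new { |hash, table| hash[table] = [] }
  end

  # Queues one row and flushes that table once its batch is full.
  def add(table, row)
    @rows[table] << row
    flush(table) if @rows[table].size >= @batch_size
  end

  # Writes whatever is still buffered; call once after the import loop.
  def flush(table = nil)
    tables = table ? [table] : @rows.keys
    tables.each do |name|
      next if @rows[name].empty?

      @writer.call(name, @rows[name])
      @rows[name] = []
    end
  end
end
```

With three tables per record, the import loop would call add three times per input row, and the writer block would issue one multi-row insert per call, ideally inside a single transaction that is committed after the final flush.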

iPad app as wrapper for Excel Model

I've got an Excel file that takes ~10 inputs and outputs ~5 numbers. The problem is, the calculations run involve lots of assumptions, are rather complex, and laid out over 5 excel sheets with lots of lookup tables, etc.
I'd like to wrap the Excel model in an iPad app, so that it's easy to solicit user input and show the outputs without users having to see the dirty work beneath.
It's important for me to encapsulate the Excel model since it's still being tweaked and adjusted, so setting up a wrapper rather than reproducing the logic of the Excel file would probably save me 2 orders of magnitude of time.
Have looked around and not found a way to do this yet... any thoughts?
Thanks
Two options come to mind.
One is that you can use an Excel wrapper on iOS. Details can be found here: How can i create excel sheet and file in iPhone sdk?
The second option is to set up a server and pass the task on to it. I'm familiar with Ruby, and creating/modifying Excel files in Ruby is a breeze. I'd expect PHP, Python, etc. to have similar facilities.
Either option is going to depend on your use case, whether you're charging for the app or not, and your familiarity with server side programming.

Xls (csv) or xml for rails importing data

I need to import data into my app. Currently I do it via XLS spreadsheets, but with about 80,000 rows it is slow. Would it be better to choose another format? For example, would XML be faster to import?
XML is unlikely to be any faster - it still needs to be parsed as strings and converted.
80,000 rows is quite a lot. How long does it take you?
Edit:
You can make what's happening more visible by dropping puts statements into your code, with timestamps. It's crude, but you can then time between various parts of your code to see which part takes the longest.
Or better yet, have a go at using ruby-prof to profile your code and see where it spends the most time.
Either way, getting a more detailed picture of the slow-points is a Good Idea.
You may find there's just one or two bottlenecks that can be easily fixed.
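
The crude timing approach described above might look like the sketch below. The timed helper and the two stand-in phases are hypothetical; in a real import the blocks would wrap the spreadsheet parsing and the database writes.

```ruby
# Hypothetical helper: runs a block and adds its elapsed time to timings[label].
def timed(label, timings)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  timings[label] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  result
end

timings = Hash.new(0.0)

# Stand-in for reading the spreadsheet.
rows = timed(:parse, timings) { Array.new(1_000) { |i| { id: i } } }
# Stand-in for the database writes.
timed(:save, timings) { rows.each { |row| row[:id].to_s } }

# Print the slowest phase first.
timings.sort_by { |_, seconds| -seconds }.each do |label, seconds|
  puts format("%-6s %.4fs", label, seconds)
end
```

Because the helper accumulates per label, it can also be called inside a loop to see how much total time each step takes across all rows.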

Attach 1 or more (non image) files to rails application, without having to install an image-processing library

I'm currently learning Rails by creating a simple project management app. I've gotten to the point where I would like to allow a user to upload multiple files: PDFs, docs, XLS, etc. The user only needs to be able to attach one file at a time, but the possibility of having multiple documents associated with a project is a must.
I've spent quite a lot of time researching my options, and it appears the two main plugins are attachment_fu and paperclip. From what I've read, though, these appear to concentrate specifically on the upload and subsequent resizing of images, something I couldn't care less about. Is there a simpler way to achieve what I'm trying to do?
Thank you all in advance.
You might still consider using attachment_fu or paperclip, as those are the "standard" libraries for such tasks, and they work fine for any kind of file.
Multi-file upload can't currently be done without JS or Flash, so you need to add a workaround in your view to manage it.
