XLS (CSV) or XML for importing data into Rails - ruby-on-rails

I need to import data into my app. At the moment I do it via XLS spreadsheets, but in my case the file has about 80,000 rows and the import is slow. Would it be better to choose another format? For example, would XML data import faster?

XML is unlikely to be any faster - it still needs to be parsed as strings and converted.
80,000 rows is quite a lot. How long does it take you?
Edit:
You can make what's happening more visible by dropping puts statements into your code, with timestamps. It's crude, but it lets you time the stretches between various parts of your code and see which part takes the longest (see the sketch below).
Or better yet, have a go at using ruby-prof to profile your code and see where it is spending the most time.
Either way, getting a more detailed picture of the slow points is a Good Idea.
You may find there are just one or two bottlenecks that can be easily fixed.
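As a minimal sketch of the crude-timestamps approach, assuming the import reads a CSV file into an ActiveRecord model (the file name, model, and column names here are made up):

    require 'csv'

    t0 = Time.now
    rows = CSV.read('import.csv', headers: true)   # hypothetical input file
    puts "parse:  #{Time.now - t0}s"

    t1 = Time.now
    rows.each do |row|
      # hypothetical model and columns
      Product.create!(name: row['name'], price: row['price'])
    end
    puts "insert: #{Time.now - t1}s"

At 80,000 rows the insert loop is the usual suspect: each create! issues its own INSERT in its own transaction, so wrapping the loop in a single ActiveRecord::Base.transaction block, or using a bulk-insert approach, will often help far more than switching file formats.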

Related

Looking for a stable way to store a large static constant in Xcode 13/Swift 5

The Situation: I have a simple word game that uses an English language dictionary (the language dictionary) to provide definitions (held in a dictionary data structure, the code dictionary). Apple's APIs won't do what I need and I could not find a suitable third-party solution. The game has worked fine with a makeshift dictionary that was missing words and definitions. I took the Wiktionary database and parsed it way down, but hit diminishing returns around 46MB. The dictionary is a simple text file, one word's definition(s) per line. There's also a separate wordlist file with one word per line. I do the parsing with a Python script so I can adjust the file(s) and formats.
The Problem: Reading the language dictionary into the code dictionary takes too long to do on load. Making the language dictionary a static const data structure causes Xcode to use all my real memory and an equivalent amount of swap space, eventually rendering my system unusable at around 15GB of swap. This happens whether I attempt a build or not, and seems to be Xcode scanning the file for auto-completion.
The Question: Is there some way to work around this situation? Some kind of directive to not scan the static dictionary file? A simple precompiled library that won't be scanned? An alternative data structure?
What I've tried: I did a bunch of profiling and sticking in print statements to determine that loading the language dictionary was the slow part. The code was a simple get-next-line, add-to-code-dictionary loop. The static dictionary was my best attempt so far at fixing the loading time.
What I haven't tried: A static array and splitting up the dictionary both require almost complete rewrites. I'm not against massive rewrites if they're likely to resolve the problem. I looked at a plist, but it appears I would need to read that in as well, and it would be much larger because of the XML. I considered learning Core Data, but it seemed like overkill for a basic, if large, lookup table.
This would have been a great excuse to learn Core Data if I were already there, but I'm struggling with Swift forcing advanced concepts into areas of basic code.
I resolved the situation by changing the language-dictionary builder to create a plist, which I then loaded directly into a [String: String] dictionary. It was surprisingly fast in comparison (a sketch of the builder step follows below).
Things I'm still considering as optimizations include breaking it into smaller pieces, say based on the first letter or two. If they're small enough, say a half second or less to load on the slowest device, they can be loaded into the dictionary as needed during play. Eventually, a background load might be better, but that's a whole 'nother thing.
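The author's builder is a Python script; purely to illustrate the builder step, here is the same idea in Ruby using the plist gem (the input format is an assumption):

    require 'plist'   # gem install plist

    # dictionary.txt (assumed format): one "word<TAB>definition(s)" pair per line
    entries = File.readlines('dictionary.txt').map do |line|
      word, defs = line.chomp.split("\t", 2)
      [word, defs.to_s]
    end

    File.write('dictionary.plist', Plist::Emit.dump(entries.to_h))

On the app side, a plist whose top level is a dictionary of strings deserializes straight into [String: String], which matches the fast load described above.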

Slow Batch Upload of Records to Google Sheets through Power Apps

I am fairly new to Power Apps, and am trying to make a batch data entry form.
I am prototyping this now, and while I think it should be working in theory, I keep running into technical errors.
The data source I'm using is Google Sheets. For prototyping purposes, there are three columns: item_id, item, and recorded_value.
For this app, it will be pulling a list of standard values into a gallery, where the input values can then be selected.
The approach I have taken is to create a gallery, which is added to a collection using the code below:
    ClearCollect(
        collection,
        ForAll(
            Filter(Gallery1.AllItems, true),
            {
                item: t_item.Text,
                item_id: t_item_id.Text,
                recorded_value: t_recorded_value.Text
            }
        )
    )
This is then uploaded to Google Sheets; I have found "success" using the two methods below:
    ForAll(
        collection,
        Patch(
            records,
            Defaults(records),
            { item: item, item_id: item_id, recorded_value: recorded_value }
        )
    )
or
    Collect(records, collection)
Overall I am seeing two main issues in testing:
The initial collect seems to fail to capture items on occasion. I don't know if it is cache related or what, but unless I scroll all the way down, it leaves some fields blank (maybe not an issue in real use, but it seems odd).
Uploading records can take excruciatingly long. While initially it was just straight up crashing due to the problems in issue 1, I have found that it will sometimes get to, say, item 85, sit for a minute or so, and then go through the rest of the list. For just 99 items it is taking several minutes to upload.
Ultimately I am looking to know if there is a better approach for what I am doing. I basically just want to take a maximum of 99 rows and paste them onto the table, but it feels really inefficient right now due to the looping nature of the function. I am not sure if this is more of a Power Apps or Google Sheets issue, but any advice would be appreciated.
From everything I could research, batch upload of records like this is going to be time consuming nearly any way you approach it.
However, I was able to come up with a workaround that more or less eliminates the problem.
Instead of uploading each individual record, I concatenate all records in the collection into a single cell through a variable, using delimiters to differentiate the rows/columns (set a variable with the Concat function, then Patch the variable to the data source).
This method allows all of the data to be stored nearly instantaneously.
After that I just perform some basic ETL in Python to transform the data into a more standard format and load it into SQL Server, which is fairly trivial to do.
I recommend that others looking to take a batch-insert approach try something similar, as it now takes users essentially a second to load records rather than several minutes.
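The parsing side of that workaround is simple in any language; here is a sketch in Ruby (the author used Python, and the ';' and '|' delimiters here are invented):

    # The single uploaded cell, pulled back out of the sheet
    # (shown as a literal here; the delimiters are assumptions).
    blob = "1;widget;42|2;gadget;17"

    rows = blob.split('|').map { |record| record.split(';') }
    rows.each do |item_id, item, recorded_value|
      # stage each row for the SQL Server load here
      puts [item_id, item, recorded_value].inspect
    end

The one thing to watch with this scheme is picking delimiter characters that can never appear in the recorded values themselves.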

Is Marshal.ReleaseComObject really necessary when using Microsoft.Office.Interop.Excel?

I have a mid-sized code library (several thousand lines) that uses Excel Interop (Microsoft.Office.Interop.Excel).
The program keeps a workbook open for hours at a time and does manipulations like adding/editing text and shapes and calling macros.
I have not once seen a call to Marshal.ReleaseComObject. Yet the users don't report any problems.
In all cases, the objects go out of scope within several seconds.
So, is this a problem? How? If yes, how do I justify to management that it needs cleanup? If not, why recommend it in the first place?
It's been a while, but I did a lot of Excel automation from .NET and never used Marshal.ReleaseComObject either. Never saw a problem. The reason it works: the runtime wraps each COM object in a runtime-callable wrapper, and that wrapper releases its COM reference when the garbage collector finalizes it, so objects that genuinely go out of scope do get cleaned up eventually. Explicit ReleaseComObject mainly buys you deterministic release, e.g. making the Excel process exit promptly instead of lingering until the next collection.

iPad app as wrapper for Excel Model

I've got an Excel file that takes ~10 inputs and outputs ~5 numbers. The problem is, the calculations involved rely on lots of assumptions, are rather complex, and are laid out over 5 Excel sheets with lots of lookup tables, etc.
I'd like to wrap the Excel model in an iPad app -- so that it's easy to solicit user input and show the outputs without having users see the dirty work beneath.
It's important for me to encapsulate the Excel model since it's still getting tweaked and adjusted... so having a wrapper set up, as opposed to reproducing the logic from the Excel file, would probably save me two orders of magnitude of time.
Have looked around and not found a way to do this yet... any thoughts?
Thanks
Two options come to mind.
One is that you can use an Excel wrapper on iOS. Details can be found here: How can i create excel sheet and file in iPhone sdk?
The second option is to set up a server and pass the task on to the server. I'm familiar with Ruby, and creating/modifying Excel files in Ruby is a breeze (see the sketch below). I'd expect PHP, Python, etc. to have similar facilities.
Either option is going to depend on your use case, whether you're charging for the app or not, and your familiarity with server side programming.
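As a minimal sketch of the server route in Ruby, assuming the spreadsheet gem and a made-up workbook layout (the file name, sheet index, and cell address are placeholders):

    require 'spreadsheet'   # gem install spreadsheet -- reads and writes .xls

    # Write the ~10 inputs into the model and save a copy.
    book  = Spreadsheet.open('model.xls')
    sheet = book.worksheet(0)
    sheet[0, 1] = 42.0               # e.g. the first input cell
    book.write('model_run.xls')

One caveat either way: a pure-Ruby library won't recalculate the model's formulas, so the server would also need something that can re-run the workbook (Excel itself, or LibreOffice run headless) before reading the five outputs back.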

How do I keep text-type data in my app using some form of data persistence

I need to feed some substantial text into my first app, and it will change as the date changes (i.e. every day) for a whole month.
I currently have all of this data in a .doc file, with each day's content properly separated.
What method can I use to store this data in my app, and how can I format it so that the app takes the date portion of the text and uses it to determine whether or not to show that particular portion of text on a given day?
The app will update the content every month.
I've looked everywhere, and the posts I've seen about persistent data just confuse me more. Would this be possible using just plists? The data will ALWAYS be text, never images or anything else.
If you post a sample of the text, we can probably give you a better answer (it's not entirely clear what you're asking), but this sounds like a perfect candidate for Core Data or SQLite. Once you have the text separated into date/text pairs, it becomes pretty simple to search by the appropriate date and show only the relevant text.
By the way, having it in *.doc format is just going to add unnecessary overhead. If you really want to keep it in a flat file, use something like JSON.
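For example, the flat-file idea as a one-off conversion script in Ruby (the file names and the input layout are assumptions): export the .doc to plain text with one day per paragraph, then emit a date-keyed JSON file the app can load and index by today's date.

    require 'json'

    # days.txt (assumed layout): blank-line-separated blocks,
    # each starting with its date, e.g. "2012-06-01\nText for that day..."
    entries = File.read('days.txt').split(/\n{2,}/).map do |block|
      date, text = block.split("\n", 2)
      [date.strip, text.to_s.strip]
    end

    File.write('month.json', JSON.pretty_generate(entries.to_h))

The app then just looks up the current date as a key and shows that entry's text.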
That looks like a bit much for plists. You could use Core Data or SQLite, but there is also the easier option of having separate plain-text files and reading them in, as is demonstrated in this question.
