What is the most efficient way of loading data into the db? - ruby-on-rails

I must say that I currently use fixtures to populate my database. In an app I'm making, I will need to pre-populate the database with lots of data. I find that fixtures are a very nice way to describe this data, but there are some efficiency problems.
One important problem is managing the big YAML files. I think it can get a bit overwhelming once I have something like 200 entries in there.
Using something like factories is not really to my liking, because it mixes data with code, and I just want the data representation to be available on its own for easy changes.
Thus, I'm thinking of writing a small program to convert from CSV to YAML and vice versa, in order to manage my entries through Excel (I know such a script already exists); a rough sketch of the idea is below.
Do you know of a better way to handle this sort of management? Note that my entries are not related to one another, so populating with a collection.each is out of the question. Each entry is truly individual, with lots of different attributes.
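Here is the sort of conversion script I have in mind, as a rough sketch (assuming a headered CSV on a reasonably modern Ruby; the file names are placeholders):

    require 'csv'
    require 'yaml'

    # CSV -> YAML: read a headered CSV and dump it as an array of hashes
    rows = CSV.read('entries.csv', headers: true).map(&:to_h)
    File.write('entries.yml', rows.to_yaml)

    # YAML -> CSV: write the hash keys as the header row, then the values
    data = YAML.load_file('entries.yml')
    CSV.open('entries.csv', 'w') do |csv|
      csv << data.first.keys
      data.each { |row| csv << row.values }
    end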

You can put any code you want directly in your db/seeds.rb file; any script you need can go in there. You can load a YAML file and save what it returns, or build your objects in whatever format you want. After that, you just need to call the rake task rake db:seed to run it.
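For instance, a minimal seeds.rb sketch (the wines.yml file and Wine model are hypothetical, and find_or_create_by! assumes a reasonably recent Rails):

    # db/seeds.rb
    require 'yaml'

    # Load pre-written YAML data and save each record.
    # find_or_create_by! keeps the task idempotent if run more than once.
    records = YAML.load_file(Rails.root.join('db', 'data', 'wines.yml'))
    records.each do |attrs|
      Wine.find_or_create_by!(name: attrs['name']) do |wine|
        wine.assign_attributes(attrs)
      end
    end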

Related

How to dynamically generate CoreData objects

I haven't found an explicit "no" in the documentation or discussion, but I suspect it is not possible to generate Core Data objects programmatically, at runtime.
What I want to do is similar to executing DDL commands (e.g. "CREATE TABLE", "DROP TABLE", etc.) from inside running code, because I don't know how many columns the user's table needs, or what data types they need to be, until I ask him. Maybe he needs multiple tables, at that.
Does anyone know whether this is possible? Would appreciate a pointer to something to read.
(Would also appreciate learning the negative, so I can stop wondering.)
If not doable in CoreData, would this be a reason to switch to SQLite?
You can create the entire Core Data model at run time-- there's no requirement to use Xcode's data modeler at all, and there's API support for creating and configuring every detail of the model. But it's probably not as flexible as it sounds like you want it to be. Although you can create new entity descriptions or modify existing ones, you can only do so before loading a data store file. Once you're reading and writing data, you must consider the data model as being fixed. Changing it at that point will generate an exception.
It's not quite the same as typical SQLite usage. It's sort of like the SQLite tables being defined in one file and the data being stored in another file-- and you can modify the tables on the fly, but only before loading the actual data. (I know that's not how SQLite really works, but that's basically the approach Core Data enforces.)
If you expect to need to modify your model / schema as you describe, you'll probably be better off going with direct SQLite access. There are a couple of Objective-C SQLite wrappers that allow an ObjC-style approach while still supporting SQLite-style access:
PLDatabase
FMDB

What's the best practice for handling mostly static data I want to use in all of my environments with Rails?

Let's say, for example, that I'm managing a Rails application that has static content that's relevant in all of my environments, but that I still want to be able to modify if needed. Examples: states, questions for a quiz, wine varietals, etc. There are relations between the user content and this static data, and I want to be able to modify it live if need be, so it has to be stored in the database.
I've always managed that with migrations, in order to keep my team and all of my environments in sync.
I've had people tell me dogmatically that migrations should only be for structural changes to the database. I see the point.
My counterargument is that this mostly "static" data is essential for the app to function and if I don't keep it up to date automatically (everyone's already trained to run migrations), someone's going to have failures and search around for what the problem is, before they figure out that a new mandatory field has been added to a table and that they need to import something. So I just do it in the migration. This also makes deployments much simpler and safer.
The way I've concretely been doing it is to keep my test fixture files up to date with the good data (which has the side effect of letting me write more realistic tests) and to re-import them whenever necessary. I do it with connection.execute "some SQL" rather than through the models, because I've found that Model.reset_column_information plus a bunch of Model.create calls sometimes worked if everyone updated immediately, but would eventually explode in my face when I pushed to production, say, a few weeks later, because by then the model had newer validations that conflicted with the two-week-old migration.
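For reference, here is a minimal sketch of that pattern: seeding reference data inside a migration with raw SQL, so the migration never depends on the model class and whatever validations it grows later (the states table and its values are hypothetical):

    class ImportStates < ActiveRecord::Migration
      def up
        execute <<-SQL
          INSERT INTO states (name, abbreviation)
          VALUES ('Alabama', 'AL'), ('Alaska', 'AK');
        SQL
      end

      def down
        execute "DELETE FROM states WHERE abbreviation IN ('AL', 'AK')"
      end
    end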
Anyway, I think this YAML + SQL process explodes a little less, but I also find it pretty kludgey. I was wondering how other people manage that kind of data. Are there other tricks available right in Rails? Are there gems to help manage static data?
In an app I work with, we use a concept we call "DictionaryTerms" that works as a set of lookup values. Every term has a category that it belongs to. In our case, they're demographic terms, including terms having to do with gender, race, and location (e.g. state), among others.
You can then use the typical CRUD actions to add/remove/edit dictionary terms. If you need to migrate terms between environments, you could write a rake task to export/import the data from one database to another via a CSV file.
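Something along these lines, as a hedged sketch (the DictionaryTerm model and its columns are assumptions for illustration):

    # lib/tasks/dictionary_terms.rake
    require 'csv'

    namespace :dictionary_terms do
      desc 'Export all dictionary terms to CSV'
      task export: :environment do
        CSV.open('dictionary_terms.csv', 'w') do |csv|
          csv << %w[category name]
          DictionaryTerm.find_each { |t| csv << [t.category, t.name] }
        end
      end

      desc 'Import dictionary terms from CSV, skipping existing rows'
      task import: :environment do
        CSV.foreach('dictionary_terms.csv', headers: true) do |row|
          DictionaryTerm.find_or_create_by!(category: row['category'],
                                            name: row['name'])
        end
      end
    end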
If you don't want to have to import/export, then you might want to host that data separately from the app itself, accessible via something like a JSON request, and have your app pull the terms from that request. That seems like a lot of extra work if your case is a simple one.
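If you did go that route, the pulling side could be as simple as this sketch (the URL and payload shape are hypothetical):

    require 'net/http'
    require 'json'

    # Fetch the terms from a remote endpoint and upsert them locally.
    terms = JSON.parse(Net::HTTP.get(URI('https://example.com/dictionary_terms.json')))
    terms.each do |attrs|
      DictionaryTerm.find_or_create_by!(category: attrs['category'],
                                        name: attrs['name'])
    end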

Formatting, organizing, and filtering data from text files

I'm looking to go through a bunch of text files in a bunch of folders. I'd like to go through each file line by line and do some basic statistics, like grabbing timestamps and counting repeating values. Is there any tool or scripting solution that someone could recommend for doing this?
Another possibility is to have a script/tool that could just parse these files and add them to a database like SQLite or Access, for easy filtering; a sketch of that idea is below.
So far I've tried using AIR, but it looks like there might be too much data for it to process, and it hangs, though that could be because of some inefficient filtering.
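To illustrate the script route sketched above, here is a minimal Ruby version (the log layout, timestamp regex, and file names are all assumptions):

    require 'sqlite3'
    require 'find'

    db = SQLite3::Database.new('logs.db')
    db.execute 'CREATE TABLE IF NOT EXISTS entries (timestamp TEXT, value TEXT)'

    # Walk the folder tree, pulling a timestamp and value out of each line.
    Find.find('logs') do |path|
      next unless path.end_with?('.txt')
      File.foreach(path) do |line|
        # assumes lines like "2010-01-02 03:04:05 some_value"
        if line =~ /\A(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\S+)/
          db.execute('INSERT INTO entries VALUES (?, ?)', [$1, $2])
        end
      end
    end

    # Counting repeating values is then a simple GROUP BY:
    db.execute('SELECT value, COUNT(*) FROM entries GROUP BY value ORDER BY 2 DESC')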
I have used QuickMacros for things like this. It can do just about anything to a text file (some of it illegal in 7 states), as well as connect to databases and perform SQL tasks like creating and modifying tables.
I routinely used it to extract data, parse it, and then load it into another database. It's especially useful with Scheduled Tasks.
Here's the website
I recommend Perl and CPAN

Smartest way to import massive datasets into a Rails application?

I've got multiple massive (multi gigabyte) datasets I need to import into a Rails app. The datasets are currently each in their own database on my development machine, and I need to read from them and create rows in tables in my Rails database based on the information they contain. The tables in my Rails database will not be exactly the same as the tables in the source databases.
What's the smartest way to go about this?
I was thinking of using migrations, but I'm not exactly sure how to connect a migration to the other databases, and even if that's possible, is it going to be ridiculously slow?
Without seeing the schemas or knowing the logic you want to apply to each row, I would say the fastest way to import this data is to create a view of the table you want to export, in the column order you want (processing it using SQL), and then do a SELECT ... INTO OUTFILE on that view. You can then take the resulting file and import it into the target DB.
This will not allow you to use any rails model validations on the imported data, though.
Otherwise, you have to go the slow way and create a model for each source db/table to extract the data (http://programmerassist.com/article/302 explains how to connect a given model to a different database) and import it that way, as sketched below. This is going to be quite slow, but you could set up an EC2 monster instance and run it as fast as possible.
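A hedged sketch of that model-per-source-table approach (connection details, column mapping, and model names are all hypothetical, and the syntax assumes a reasonably recent Rails):

    # Point an ActiveRecord model at the legacy database...
    class LegacyUser < ActiveRecord::Base
      establish_connection(
        adapter:  'mysql2',
        host:     'localhost',
        database: 'legacy_db',
        username: 'root'
      )
      self.table_name = 'users'
    end

    # ...then copy rows across, mapping old columns onto the new schema.
    # Validations on the target model run here, which is the point.
    LegacyUser.find_each do |legacy|
      User.create!(name: legacy.full_name, email: legacy.email_address)
    end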
Migrations would work for this, but I wouldn't recommend it for something like this.
Since georgian suggested it, I'll post my comment as an answer:
If the changes are superficial (column names changed, columns removed, etc), then I would just manually export them from the old database and into the new, and then run a migration to change the columns.

Rails: Script to import data

I have a couple of scripts that generate and gather large amounts of data, which I will need both to seed my database with initially and to add to it in bulk later on. What is the best way to import lots of relational data into a Rails database, both as seed data and intermittently in production?
I haven't settled on an output format for my script yet, but the data's structure largely mirrors my Rails models' and contains has_many associations that I would like the import to preserve.
I've googled a fair bit and seen ar-extensions and seed_fu as well as the idea of using fixtures.
With ar-extensions, all the examples seem to be straightforward CSV imports (likely from table dumps, which seems to be its primary use case), with no mention of handling associations or avoiding duplicate updates. In my case I have no IDs, foreign keys, or join tables in my script, so this seems like it wouldn't work for me unless I was prepared to handle that complexity myself.
With seed_fu, it looks like it could handle the relational aspects of the data creation, but it would still require me to specify IDs (how can you know which ones are available in production?) and to mix code with data.
Fixtures have the same ID problem, and on top of that they require objects to be named (I'd probably just end up using numbers for names), and I am not sure how I would avoid accidental duplication of records.
Or would I be better off just putting my data into a local SQLite DB first and then using the straight table-dumping techniques?
I've done this before with CSV files. I have a cron job that collects the data and puts it in accessible CSV files. Then I have a rake task (also in cron, but on another box) that looks for CSVs, and if there are any, calls model methods to create objects out of the CSV rows.
The models have a csv_create_or_update method that takes a CSV row. This technique sidesteps the ID issue and also allows validations to run (or not) on the incoming data.
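In outline, the method looks something like this sketch (model, columns, and lookup key are assumptions; the syntax assumes a reasonably recent Rails):

    require 'csv'

    class Product < ActiveRecord::Base
      # Takes one CSV::Row; updates the matching record or builds a new one.
      def self.csv_create_or_update(row)
        record = find_or_initialize_by(name: row['name'])
        record.assign_attributes(row.to_h)
        record.save!  # validations run here
      end
    end

    CSV.foreach('products.csv', headers: true) do |row|
      Product.csv_create_or_update(row)
    end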
I was handed a Rails app recently that required some imported data, and it was all handled in a rake task. All the data came in the form of CSV files and was handled per class/model as necessary. This worked out relatively well for me, being new to the system, since it was very easy to see where data was going and how it was being applied. As you import the data, you can check for ID conflicts/collisions and handle them accordingly.
