Turn flat file into relational models - ruby-on-rails

I get flat files once a week that have a few 100k rows. I would like to turn them into a relational model system in Rails since there are about five columns that are fairly static and would make sense for a different model that would then be linked back into the main table as a foreign key.
Is there a quick way to check whether an entry already exists and, if so, look it up and put its foreign key in the main model, and if not, create a new entry in the second model and then reference it in the main model?
I can turn the above paragraph into code but wanted to know if there is a simple 'few lines' Ruby or Rails implementation.

Just for the sake of merging my comment and #anton-z's answer:
You can use activerecord-import for doing bulk operations, and ActiveRecord's find_or_create_by to do the checking.
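A minimal sketch of that combination (the model and column names here are made up for illustration, and the find_or_create_by hash syntax assumes Rails 4+):

require 'csv'

# Assumed models: Entry (the main table) and Category (the ~5 static columns),
# with Entry belongs_to :category.
category_cache = {} # look up or create each distinct value only once

entries = CSV.foreach('weekly_dump.csv', headers: true).map do |row|
  category = category_cache[row['category_name']] ||=
    Category.find_or_create_by(name: row['category_name'])

  Entry.new(amount: row['amount'], recorded_on: row['recorded_on'], category: category)
end

# activerecord-import writes the whole batch in a few INSERT statements
Entry.import(entries)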

Related

Rails database setup Polymorphism

We have to create a request system which will have roughly 10 different types of requests. All of these requests will belong to the 'accounting' aspect of our application. Therefore we've called them "Accounting requests".
All requests share maybe only a few columns and each has up to 20 columns individually.
We started to wonder if having separate tables for each request type would be practical in terms of speed when we start to have to do very complicated joins or queries, for example, fetching ALL request types into a single table and then sorting it.
Maybe it would be easier to just use Single Table Inheritance since it will have a type column and we'd be using one table to store all 10 accounting request types.
What do you think regarding using STI for this many polymorphic associations and requirements?
Essentially, it would have models like so:
AccountingRequest
BillingRequest < AccountingRequest
CheckRequest < AccountingRequest
CancellationRequest < AccountingRequest
Each subclass has roughly 10+ fields.
Currently reading about Multiple Table Inheritance here. This seems like the solution that fits my requirements in this case. Not sure yet though.
STI is a good fit if your models all share the same attributes.
However, if your subclasses start having attributes specific to them and not applicable to others, then STI can result in a lot of null columns. In that case, I usually prefer to go with a polymorphic association.
This RailsCasts episode is a great example of the difference between the two.
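For a rough sense of the two shapes (a hedged sketch; BillingDetail and the requestable naming are my assumptions, not from the question):

# STI: one requests table with a `type` column, subclasses share every column.
class AccountingRequest < ActiveRecord::Base; end
class BillingRequest < AccountingRequest; end
class CheckRequest < AccountingRequest; end

# Polymorphic alternative: shared columns live on Request, type-specific
# columns live on their own tables, linked via requestable_id/requestable_type.
class Request < ActiveRecord::Base
  belongs_to :requestable, polymorphic: true
end

class BillingDetail < ActiveRecord::Base
  has_one :request, as: :requestable
end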
You can use STI in that situation, but STI requires putting all the columns into one single table, and that's not a good thing: the table will grow very large in the number of fields.
I think you should divide it into two tables, as below...
Request: the request table will be the polymorphic table which saves the shared information and the type of the request.
RequestItem: the request item table will hold the up-to-20 type-specific fields and will have a foreign key to the request table. The request item table will have two columns in the database, called key and value.
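A rough sketch of that layout (a hedged example; the kind column and the exact names are assumptions):

class Request < ActiveRecord::Base
  # shared columns (requester, status, kind, ...) live here
  has_many :request_items
end

class RequestItem < ActiveRecord::Base
  # each row holds one type-specific field as a key/value pair
  belongs_to :request
  validates :key, presence: true
end

# request = Request.create!(kind: 'billing')
# request.request_items.create!(key: 'invoice_number', value: 'INV-42')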
It sounds do-able.
When I looked into this, I found that making extensive use of value objects helped to control the non-applicability of some attributes to some of the types.
In my case I had types of products, some of which would not have particular measurements for example. In those cases I used a Null Object to indicate "Not applicable" where appropriate.
Edit: I also found the composed_of syntax very convenient: https://apidock.com/rails/ActiveRecord/Aggregations/ClassMethods/composed_of
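A hedged sketch of that idea, with an assumed Measurement value object and a NullMeasurement standing in for "not applicable":

class Measurement
  attr_reader :amount, :unit
  def initialize(amount, unit)
    @amount, @unit = amount, unit
  end
  def applicable?; true; end
  def to_s; "#{amount} #{unit}"; end
end

class NullMeasurement
  def applicable?; false; end
  def to_s; 'Not applicable'; end
end

class Product < ActiveRecord::Base
  # maps two plain columns onto the value object
  composed_of :measurement,
              class_name: 'Measurement',
              mapping: [%w(measurement_amount amount), %w(measurement_unit unit)],
              allow_nil: true

  def measurement_or_null
    measurement || NullMeasurement.new
  end
end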
For now I'm using a bit of NoSQL for such cases. PostgreSQL's JSONB type lets you store a multilevel Ruby hash. It also provides rich functionality: DB-level constraints, indexes and query operators.
So common attributes are stored the standard way and child-specific ones in jsonb. Then you can use whatever you need on top of this: STI, the Value Objects pattern, serialization, or just create scopes for each child. I prefer the last one: my models are thin, most constraints are DB-level and all business logic is in service classes.
Pros:
Avoiding ALTER TABLE on big tables when you need to add one more child type
Keeping my queries efficient
Not storing or selecting unnecessary columns
Serialization out of the box for JSON APIs
Cons:
It's a bit schemaless
Vendor lock-in
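A sketch of what that can look like (PostgreSQL only; the properties column, the kind values and the scope names are assumptions, and on Rails 5+ the migration superclass takes a version tag):

class AddPropertiesToRequests < ActiveRecord::Migration
  def change
    add_column :requests, :properties, :jsonb, null: false, default: {}
    add_index  :requests, :properties, using: :gin
  end
end

class Request < ActiveRecord::Base
  # one scope per child type instead of an STI subclass
  scope :billing,      -> { where(kind: 'billing') }
  scope :cancellation, -> { where(kind: 'cancellation') }

  # query inside the jsonb payload with Postgres operators
  scope :with_invoice, ->(number) { where("properties ->> 'invoice_number' = ?", number) }
end

# Request.billing.with_invoice('INV-42')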

Rails - Good way to support creation of drafts for several models

I want to allow users to create drafts of several models (such as article, blog post etc). I am thinking of implementing this by creating a draft model for each of my current models (such as articleDraft, blogpostDraft etc.). Is there a better way to do this? Creating a new model for every existing model that should support drafts seems messy and is a lot of work.
I think the better way is to have a flag in the table (e.g. an int column called draft) to identify whether the record is a draft or not.
Advantages of having such a column without a separate table, as I can see (a small sketch follows this list):
It's easy to make your record non-draft (just change the flag)
you will not duplicate data (because practically you will have the same columns in draft and non-draft records)
coding will be easy, no complex logic
all the data will be in one place and hence less room for error
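A minimal sketch of the flag approach, assuming a blog-style Article model and a boolean draft column (the answer above mentions an int flag; a boolean works the same way):

class AddDraftToArticles < ActiveRecord::Migration
  def change
    add_column :articles, :draft, :boolean, null: false, default: false
  end
end

class Article < ActiveRecord::Base
  scope :drafts,    -> { where(draft: true) }
  scope :published, -> { where(draft: false) }

  # only enforce the full validations once the record stops being a draft
  validates :body, presence: true, unless: :draft?
end

# article.update(draft: false)   # "publish" by flipping the flag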
I've been working on Draftsman, a Ruby gem for creating a draft state of your ActiveRecord data.
Draftsman's default approach is to store draft data for all drafted models in a single drafts table via a polymorphic relationship. It stores the object state as JSON in an object column and optionally stores JSON data representing changes in an object_changes column.
Draftsman allows for you to create a separate draft model for each model (e.g., article_drafts, blog_post_drafts) if you want. I agree that this approach is fairly cumbersome and error-prone.
The real advantage to splitting the draft data out into separate models (or to just use a boolean draft flag on the main table, per sameera207's answer) is that you don't end up with a gigantic drafts table with tons of records. I'd offer that this only becomes a real problem when your application has a ton of usage, though.
All that to say that my ultimate recommendation is to store all of your draft data in the main model (blog) or a single drafts table, then separate out as needed if your application needs to scale up.
Check out the Active Record Versioning category at The Ruby Toolbox. The current leader is Paper Trail.
I'd go down the state machine route. You can validate each attribute only when the model is in a certain state. That's far easier than multiple checkboxes, and each state change can have an action (or actions) associated with it.
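A rough illustration of the state machine route, assuming the aasm gem and a string state column (both are my assumptions, not from the answer):

class Article < ActiveRecord::Base
  include AASM

  aasm column: :state do
    state :draft, initial: true
    state :published

    event :publish do
      transitions from: :draft, to: :published
    end
  end

  # these validations only kick in once the article leaves the draft state
  validates :title, :body, presence: true, if: :published?
end

# article = Article.create!   # starts in the draft state
# article.publish!            # transition, with the full validations enforced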
Having a flag in the model has some disadvantages:
You cannot save as a draft unless the data is valid. Sure, you can skip validations in the Rails model, but think about the NOT NULL columns defined in the database.
To find the "real" records, you have to use a filter (like "WHERE draft = FALSE"). This can slow down query performance.
As an alternative, check out my gem drafting. It stores drafts for different models in a separate table.

Rails: associations in app with one model

I've read so many Rails books/tutorials, but when it comes time to actually make the app, I keep tripping over myself, so I'm trying this little exercise to help me get it better.
I have an app with 3 classes (Link, Url, Visit) that have associations defined between them, such as has_one, belongs_to etc. This allows me to call methods like Link.url
If I were to convert this into an app with a single model, is it possible to create the convenience methods (such as Link.url) even though there are no relationships between models, because there is only one model?
This might not be the 'Rails way' but if you answer this question it'll help me get it more.
I guess another way to ask this is, do the Rails associations only exist because the data is split across separate tables?
Thanks
Models exist to represent tables in a database. If you have 3 different conceptual objects, then you need 3 different models. Keeping those objects separate and in different tables/models is essential to good programming in any language. The relations are there to help you understand the correlation of each object to the others.
If you think all of the data from each of the models can be represented in one table sensibly, then combine them into one model with a name that encompasses all of the data. If you choose this option, you'll use columns for that one table which represent each of the pieces of data you need. Those column names come for free in the model when you create the migration. A table with a column named "url" on a model named "Hit" could be used like this:
Hit.first.url
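For example, a single combined model might look roughly like this (the column names beyond url are invented for illustration):

class CreateHits < ActiveRecord::Migration
  def change
    create_table :hits do |t|
      t.string   :url
      t.string   :link_text
      t.datetime :visited_at
      t.timestamps
    end
  end
end

class Hit < ActiveRecord::Base
end

# Hit.first.url          # each column becomes a reader/writer automatically
# Hit.first.visited_at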

Naming conventions for Rails migrations

Is there a best practice naming convention for Rails migrations, particularly when editing a model?
e.g. if I'm adding a column bar to the Foo model, should I name it edit_foo or add_bar_to_foo?
I'm assuming that if I'm editing multiple models then I should create multiple migrations, but what if I'm making multiple modifications to a single model? Do I name it add_bar_remove_x_edit_y_to_foo?
I agree with the previous poster. The naming should focus on readability. But also keep in mind that you can't (nor should you) have two migrations with the same name.
So, generic names like edit_foo_model are generally not a good idea (what happens when you want to add more columns to that model later?); it is better to group the columns by purpose, like update_foo_for_bar_support. You can usually skip adding model, since everyone knows that migrations handle models, so there is no need to mention that in the name (that is, update_foo instead of update_foo_model).
Also, what I usually do is to keep different changes separated. So, if there are multiple different changes in a model, I'd separate them into different migration files, one for adding columns and one for removing columns for instance.
I would split up multiple schema changes into multiple migrations so that you can easily name the individual migrations!
The point is readability: finding out fast what the migration is responsible for. If you write too much "data" into the name, it becomes harder to scan and you shoot yourself in the foot.
So, if it's 1-2 changes, write them in the name; if there are too many changes, write update_foo_model (or edit_foo_model).
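As a concrete example of the add_bar_to_foo style, recent Rails versions will even fill in the migration body when the name follows the convention (the names here are just for illustration):

# rails generate migration AddBarToFoos bar:string
class AddBarToFoos < ActiveRecord::Migration
  def change
    add_column :foos, :bar, :string
  end
end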

Add fields to ActiveRecord model dynamically in Rails 2.2.2?

Say I wanted to allow an administrative user to add a field to an ActiveRecord Model via an interface in the Rails app. I believe the normal ActiveRecord::Migration code would be adequate for modifying the AR Model's table structure (something that would not be wise for many applications - I know). Of course, only certain types of fields could be added...in theory.
Obviously, the forms that add (or edit) records to this newly modified ActiveRecord Model would need to be built dynamically at run-time. A common form_for approach won't do. This discussion suggests this can only be accomplished with JavaScript.
http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/fc0b55fd4b2438a5
I've used Ruby in the past to query an object for its available methods. I seem to remember it was insanely slow. I'm too green with Ruby and Rails to know an elegant way to approach this. I hope someone here may. I'm also open to entirely different approaches to this problem that don't involve modifying the database.
To access the columns which are currently defined for a model, use the columns method - it will give you, for each column, its name, type and other information (such as whether it is a primary key, etc.)
However, modifying the schema at runtime is delicate.
The schema is pre-loaded (and cached, from the DB driver) by each model class when it is first loaded. In production mode, Rails only does this once per model, around startup.
In order to force Rails to refresh its cached schema following your modification, you should force Ruby to reload the affected model's class (pretty much what Rails does for you automatically, after each request, when running in development mode - see how to reload a class using remove_const followed by load.)
If you have a Mongrel cluster, you also have to inform the other processes in the cluster, which run in their own separate memory space, to also reload their model's classes (some clusters will allow you to create a 'restart.txt' file, which will cause an automatic soft-restart of all processes in your cluster with no additional work required on your behalf.)
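For example, after altering the table you could refresh a single model roughly like this (Entry is an assumed model name, not from the question):

ActiveRecord::Migration.add_column(:entries, :col1, :string)

Entry.reset_column_information      # drop the cached schema for this class
# or, more drastically, reload the class itself:
Object.send(:remove_const, :Entry)
load 'app/models/entry.rb'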
Now, these having been said, depending on the actual problem that you need to solve you may not need to dynamically alter the schema after all. Instead of adding, say, columns col1, col2 and col3 to some table entries (model Entry), you can use a table called dyn_attribs, where Entry has_many :dyn_attribs, and where dyn_attribs has both a key column (which in this case can have values col1, col2 or col3) and a value column (which lists the corresponding values for col1, col2 etc.)
Thus, instead of:
my_entry = Entry.find(123)
col1 = my_entry.col1
#do something with col1
you would use:
my_entry = Entry.find(123, :include => :dyn_attribs)
dyn_attribs = my_entry.dyn_attribs.inject(HashWithIndifferentAccess.new) { |s,a|
s[a.key] = a.value ; s
}
col1 = dyn_attribs[:col1]
#do something with col1
The above inject call can be factored away into the model, or even into a base class inherited from by all models that may require additional, dynamic columns/attributes (see Polymorphic associations on how to make several models share the same dyn_attribs table for dynamic attributes.)
UPDATE
Adding or renaming a column via a regular HTML form.
Assume that you have a DynAttrTable model representing a table with dynamic attributes, as well as a DynAttrDef defining the dynamic attribute names for a given table.
Run:
script/generate scaffold_resource DynAttrTable name:string
script/generate scaffold_resource DynAttrDef name:string dyn_attr_table_id:integer
rake db:migrate
Then edit the generated models:
class DynAttrTable < ActiveRecord::Base
has_many :dyn_attr_defs
end
class DynAttrDef < ActiveRecord::Base
belongs_to :dyn_attr_table
end
You may continue to edit the controllers and the views like in this tutorial, replacing Recipe with DynAttrTable, and Ingredient with DynAttrDef.
Alternatively, use one of the plugins reviewed here to automatically put the dyn_attr_tables and dyn_attr_defs tables under management by an automated interface (with all its bells and whistles), with virtually zero implementation effort on your behalf.
This should get you going.
Say I wanted to allow an administrative user to add a field to an ActiveRecord Model via an interface in the Rails app.
I've solved this sort of problem before by having an extra model called AdminAdditions. The table includes an id, an admin user id, a model name string, a type string, and a default value string.
I override the model's find and save methods to add attributes from its admin_additions, and save them appropriately when changed. The model table has a large text field, initially empty, where I save nondefault values of the added attributes.
Essentially the views and controllers can pretend that every attribute of the model has its own column. This means form_for and so on all work.
ActiveRecord::Migration.add_column(:users, :email, :string)
You could use Flex Attributes for this, though if you want to be able to search or order by these new columns you'll have to write (a lot of) custom SQL.
I have seen the dynamic alteration/migration of tables offered as a solution many times but I have never actually seen it implemented. There are many reasons why this solution is rarely implemented.
If the table is large, then it may (or will) be locked for extended periods of what is supposed to be up-time.
Why is your model changing dynamically? It is quite rare for a model's structure to need to change dynamically; it is more often an indication that you are trying to model something specific in a generalised way.
This is often an attempt at producing a "categorised" model that could be better solved by another approach.
DDL statements are often not allowed by the same database user that is used for day-to-day DML requirements. Whilst this could be the case, and often is in the RoR arena, it is not always the "right" way to do it.
What are you trying to achieve here? A better understanding of the problem would probably reveal a more natural solution.
If you were doing this with PostgreSQL now you could probably get away with a JSON type field and then just store whatever in the json hash.
