I'm creating an app that will import products from several XML feeds. In the XML there is a category specified, like T-Shirt for instance. The problem is that different resellers specify the categories differently. For instance, what one reseller calls "T-Shirts" another may call "T-Shirt", a third "short sleeved shirts" and so on.
I want to somehow map these categories to the categories I have myself. So I need some tips on how I should structure my database.
The idea I have is to create a "raw_categories" table which contains the name of the resellers category and a "category_id" which has a belongs_to relationship to my own "categories" table. Then when I import I simply try to find a raw_category which has a matching name and if there is one, pick it, otherwise add a new one. This new one I can then manually relate to one of my own categories.
Do you understand how I mean, and is it a good approach? Is there a better/more efficient way?
If this is a good idea. How do I do it in Rails? Should I use something like this (I think I've seen something like this in the API doc):
# products model
has_one :category, :through => :raw_categories
I estimate that there will be about 40k to 100k products in the database.
Regards
Linus
Yes, that is the typical design. I would usually call your base table category, and call this alias table categoryAlias. I'm picky about verbage, but raw_categories has nothing to do with "rawness", it's just the categories you want to use.
The other thing I'd suggest is that when you create a category you also create a categoryAlias.
Considering the amount of data you will have in these categories, one practical suggestion I can offer is that when you create a category, also create a categoryAlias row for it, with the same name.
This will make your import code easier as you will only need to query categoryAlias to determine if a category already exists, or there is an alias for it.
Related
I'm relatively new to the Rails framework and I'm not sure if the approach I am taking is the most efficient/effective way or if I am following Rails conventions well.
The basic issue I have is that my application will have a Company model and various set Categories (not editable by the user). Each Company can be part of multiple Categories. My understanding, from other examples, is that I should set the relationships as something like:
Company has_many_belongs_to_many Categories
Category has_many_belongs_to_many Companies
However, since there will not be that many categories (<10), and since they will not change/be editable/be added/be removed by users, I'm not sure I need to create a whole new table for categories then join them onto Companies? Is there a better way to do this in Rails that I'm missing? Thanks in advance!
Even though you may only have 10 categories or so, I would say this is still fine to have it in its own table. Setting up the relationships give you programmatic power to retrieve companies related to a single category and vice versa when you need it without having to reconstruct queries yourself.
An example of the simplicity for keeping those in the database:
# Get all companies under a specific Category
#category = Category.find(1)
#companies = #category.companies
That's pretty simple if you ask me. And if you add another category to the table in the future, you won't need to write any new code to get it work.
Another thing, I would check out using has_many :through instead of has_and_belongs_to_many (habtm), as habtm can cause unforeseen problems as your application gets bigger. Here is a great article that goes into that problem a lot deeper: Why You Don’t Need Has_and_belongs_to_many Relationships. Not saying you can't use it (if the shoe fits), but generally it's good to be aware of potential problems so you can make the right decision for you and your app.
I'm working on a Rails 5 app for Guild Wars 2, and I'm trying to figure out a way to serialize and store all of the items in the game without duplicating code or table columns. The game has a public API to get the items from, documented
here: https://wiki.guildwars2.com/wiki/API:2/items
As you can see, all of the items share several pieces of data like ID, value, rarity, etc. but then also branch off into specific details based on their type.
I've been searching around for a solution, and I've found a few answers, but none that work for this specific situation.
Single Table Inheritance: There's way too much variance between items. STI would likely end up with a table over 100 columns wide, with most of them null.
Polymorphic Associations: Really doesn't seem to be the proper way to use these. I'm not trying to create a type of model that gets included multiple other places, I just want to extend the data of my "Item" model.
Multiple Table Inheritance: This looks to me like the perfect solution. It would do exactly what I'm wanting. Unfortunately, ActiveRecord does not support this, and all of the "workarounds" I've found seem hacky and weird.
Basically, what I'm wanting is a single "Item" model with the common columns, then a "details" attribute that will fetch the type-specific data from the relevant table.
What's the best way to create this schema?
One possible solution:
Use #serialize on the details (text) column
class Item
serialize :details, Hash
end
One huge downside is that this is very inefficient if you need to query on the details data. This essentially bypasses the native abstractions of the database.
I was in a similar situation recently. I solved by using Sequel instead of ActiveRecords.
You can find it here:
https://github.com/TalentBox/sequel-rails
And an implmentation example:
http://www.matchingnotes.com/class-table-inheritance-in-rails.html
Good luck
I want to categorize objects in multiple trees to reflect their characteristics and to build a navigation on.
So, given the following trees:
Category1
-Category-1-1
-Category-1-2
Category2
-Category-2-1
-Category-2-2
--Category-2-2-1
An object could e.g. belong to both Category-1-2 and to Category-2-2-1.
The goal is to be able to fetch all objects from the database
that belong to a certain category
that belong to a certain category or its decendants
A more practical example:
A category might have a hierarchy of 'Tools > Gardening Tools > Cutters'.
A second category: 'Hard objects > Metal objects > Small metal objects'
An object 'Pruners' would be categorized as belonging to 'Cutters' as well as 'Small metal objects'.
I want to be able to
retrieve all 'Gardening Tools' -> 'Pruners'
retrieve all Category children of 'Gardening Tools' -> 'Cutters'
retrieve all 'Hard objects' -> 'Pruners'
retrieve all 'Hard objects' that are also 'Cutters' -> 'Pruners'
retrieve all 'Soft objects' that are also 'Cutters' -> []
Any pointers? I have briefly looked at closure_tree, awesome_nested_sets etc., but I am not sure they are a good match.
Please note that the code here is all pseudo code.
I would use ancestry gem and would model your data with three model classes.
This way your data is normalized and it's a good base to build on.
Category - ancestry tree
has_may Memberships
has_may Products through Memberships
Membership
belongs_to Category
belongs_to Products
Products
has_may Memberships
has_may Categories through Memberships
From there on you need to figure out how to perform the equerries efficiently.
My way of doing this is to understand how to do it with SQL and then figure out how to express the queries with activercord's DSL.
Some resources:
http://railsantipatterns.com/ This book has some examples of complex SQL queries turned into reusable scopes and helpers
http://guides.rubyonrails.org/active_record_querying.html#joining-tables Rails's documentation, see section on joins and includes
http://stackoverflow.com/questions/38549/difference-between-inner-and-outer-join A great explanation of SQL joins
Queries examples:
Find a category.
Category.find(category_id)
Find a category and include it's products inside the specified category.
Category.find(category_id).join(:memberships => :products)
Find a category's sub-tree ind include products
Category.subtree_of(category_id).join(:memberships => :products)
Find all categories a products belongs to.
Product.find(product_id).categories
I just did this and I chose not to use ancestry, but closure_tree because the author says it is faster and I agree with him. Know you need a `has_and_belongs_to_many' between Categories (which I like to call tags whenever I add multiple to a single object) and Objects.
Now the finders, the bad news is that without your own custom query you might not be able to do it with one. Using the gems methods you will do something like:
Item.joins(:tags).where(tags: {id: self_and_descendant_ids })
The code is clean and it executes two queries, one for the descendant_ids and another one in Objects. Slight variations of this, should give you what you need for all except the last. That one is tough and I haven't implemented it (I'm in the process).
For now, you will have to call tag.self_and_ancestor_ids on both (Query count: 2), all items in those tags (Query count: 4) and intersect. After this, some serious refactoring is needed. I think we need to write SQL to reduce the number of queries, I don't think Rails query interface will be enough.
Another reason I chose *closure_tree* was the use of parent_id, all siblings share it (just like any other Rails association) so it made it easier to interface with other gems (for example RankedModel to sort).
I think you could go for one of the tree gems, personally I like Ancestry. Then make an association for each category to have many objects and each object can belong to many categories.
Have you stumbled on any problems already or are you just researching your options?
I have this app I am writing in Rails 3.1, I am wondering the best way to model this.
Would it be best if I created a model called "Movie" with a "title" and then create a new model for each "movie asset" such as "poster, trailer, screener" etc and relate it to the "Movie" by associations? Or would it be best if I just created this as one and do-away with the of associations of each asset to "Movie"?
My assumption is to just make it as one as it will remove all the overhead of the FK's and joins to get retrieve the data related to the movie but I am looking for opinions/suggestions. Thanks
There can be three types of attributes(columns) for movies.
Which have exactly one value, and are present in every movie e.g. title, year, official trailer etc.
Keep them in the movie table.
Which have exactly one value, but are present in few of the movies e.g. total Academy Awards.
Keep them in separate table, and use has_one+belongs_to association.
Which have multiple values e.g. trailers
Keep them in separate table, and use has_many+belongs_to association.
More suggestions:
For many key-value attributes, it is easier to use one json/yaml column using serialize instead of creating one column for each key.
Do not store images in DB, keep them in file-system or cloud storage.
You can create your models in the order you want since you have to fill in both models to create a unique association (such as belongs_to and has_many) so I think it doesn't really matter !
I have an app with an Account model. Each Account belongs_to a sport, which I would usually have as a Sport model and in the DB. But as this isn't really something that will change and is not administered by the end users I thought that it might be better to put it as an integer column in the Account model and map to a hash using a class variable.
However, I need each sport to have many player_positions (which are specific to each sport). So I thought maybe I could do something like:
##player_positions = {:rugby => [position_1, ..., ...]}
Is this good practice for static data like this or should I stick to putting it in the DB as it is relational?
I also thought maybe I could use a yaml file but not sure how I could set that up.
I would stick in the database because you are relating more than one thing to the sport (positions in addition to Account). It makes life easier if you want to use belongs_to and it will allow you to easily use SQL for counts and such (e.g., you can group accounts by sport and get a count of accounts by sport).
I have used an enumerated list of constants in some cases as opposed to tables. For example, I had a simple app with a user type of ADMIN, EDITOR and READER. Rather than put that in a roles table in the DB, I just made those constants on the User class and added a string column to the users table for role.
You don't have to use the database. http://railscasts.com/episodes/189-embedded-association might give you ideas.