Program design when combining external sources with local

Program design when combining external sources with local - ruby-on-rails

I have a Rails app that is basically designed this way:
It has a Book model, that has an external_id (all saved Book records have an external_id). The external_id links to an external source about books that doesn't allow for the data to be stored. We use a Presenter to handle some of the differences in the Book model and the external library's class to smooth things over for the view.
We let users do things like "Favorite" their books, regardless of source, so we have a join table and model with a book_id and a user_id to record favorites.
However, in some of the queries, there will be a list of results displayed to the user from the external source, even though we might have Book records with those external_ids. We want to be able to display information like who that the user is friends with that has favorited that book.
It seems there are a couple of ways to handle this:
1) Always load the canonical Book record (if it exists) in the presenter based on the external_id, and override the Book#friends_who_favorited method to return false if no external_id was found
2) Overload the presenter to either call Book#friends_who_favorited or if not a Book record, create its own join query based on external_id (since we wouldn't know the book id yet).
3) Denormalize the database a little, and make sure that we always store the external_id everywhere -- Basically treat external_id like the primary key since every Book record has an external_id. Then the queries can be done more directly, not require a join query, and we wouldn't need multiple queries written. But, this ties us even more to that external source since now our database design will be based on external_id.
It seems like #1 might be the best way to do it, even though it would introduce an extra query to Book (Book.where(external_id: x).first), since #2 would require writing a whole set of additional queries to handle the external_id case. But, I'm open to suggestions as I'm not fully comfortable with any of these methods.

Based on the discussions, if I do that I might consider this solution:
Setup
Uniform the identifier of all books to an id instead of ActiveRecord default id. This is the current field external_id, though I would prefer to rename it without underscore, say rid represents resource id.
Use a format for internal books on rid different from external books.
For example, suppose the format of external id like "abcde12345", then you name the internal books rid as "int_123" according to actual id so all of them are guaranteed to be unique.
Use a model callback to update rid after creating. If it's internal, copy its id and add "int_" prefix. If it's external, save its external id to that field.
Usage
Now usage would be simpler. For every action, use rid instead of original id. When an user favouring the book, the association would be the rid.
In the join table, you can also keep the original id there, so that when one day you changed implementation, there would still be original ids available.
Now the join table will have 4 fields: id, user_id, book_id(the original id), book_rid.
To display the users who liked this book, no matter the book is external or not, you can now query based on the rid in join table and fulfil the job.
Refacoring
Actually refacoring on this solution should not be hard and do no harm.
Add a field rid in the join table
Build a query task to fill rid of all books. Actually it's for internal books only which has blank external_id at this moment.
Build a query to fill the rid field in join table.
Refacor associating method to specify association id, and other related methods if needed.

Related

How to create an Order Model with a type field that dictates other fields

I'm building a Ruby on Rails App for a business and will be utilizing an ActiveRecord database. My question really has to do with Database Architecture and really the best way I should organize all the different tables and models within my app. So the App I'm building is going to have a database of orders for an ECommerce Business that sells products through 2 different channels, a subscription service where they pick the products and sell it for a fixed monthly fee and a traditional ECommerce channel, where customers pay for their products directly. So essentially while all of these would be classified as the Order model, there are two types of Orders: Subscription Order and Regular Order.
So initially I thought I would classify all this activity in my Orders Table and include a field 'Type' that would indicate whether it is a subscription order or a regular order. My issue is that there are a bunch of fields that I would need that would be specific to each type. For instance, transaction_id, batch_id and sub_id are all fields that would only be present if that order type was a subscription, and conversely would be absent if the order type was regular.
My question is, would it be in my best interest to just create two separate tables, one for subscription orders and one for regular orders? Or is there a way that fields could only appear conditional on what the Type field is? I would hate to see so many Nil values, for instance, if the order type was a regular order.
Sorry this question isn't as technical as it is just pertaining to best practice and organization.
Thanks,
Sunny

What you've described is a pattern called Single Table Inheritance — aka, having one table store data for different types of objects with different behavior.
Generally, people will tell you not to do it, since it leads to a lot of empty fields in your database which will hurt performance long term. It also just looks gross.
You should probably instead store the data in separate tables. If you want to get fancy, you can try to implement Class Table Inheritance, in which there are actually separate but connected table for each of the child classes. This isn't supported natively by ActiveRecord. This gem and this gem might be able to help you, but I've never used either, so I can't give you a firm recommendation.

I would keep all of my orders in one table. You could create a second table for "subscription order information" that would only contain the columns transaction_id, batch_id and sub_id as well as a primary key to link it back to the main orders table. You would still want to include an order type column in the main database though to make it a little easier when debugging.

Assuming you're using Postgres, I might lean towards an Hstore for that.
Some reading:
http://www.devmynd.com/blog/2013-3-single-table-inheritance-hstore-lovely-combination
https://github.com/devmynd/hstore_accessor

Make an integer column called order_type.
In the model do:
SUBSCRIPTION = 0
ONLINE = 1
...
It'll query better than strings and whenever you want to call one you do Order:SUBSCRIPTION.
Make two+ other tables with a foreign key equal to whatever the ID of the corresponding row in orders.
Now you can keep all shared data in the orders table, for easy querying, and all unique data in the other tables so you don't have bloated models.

Grails how to set _idx field when INSERTing data from outside of the Grails application?

I have a scaffolded Grails application with two domains, Person and Course. Person belongs to Course, and Course hasMany Persons. I have modified show.gsp for Course to list all of the Persons associated with the selected Course.
To achieve this, Course.groovy contains the following line:
List persons = new ArrayList()
And, as a result, the "person" database table contains a persons_idx field. I frequently will be adding new data to the "person" table outside of my Grails application, from an external website.
When INSERTing new data, how to I figure out what to set persons_idx as?
I had originally used a SortedSet instead of an ArrayList for persons, since I care about sorting. But since I am sorting on Person.lastName, and there will always be multiple people with the same last name, then the list will exclude those persons who have the same last names as others. I wish there was another way...
Thanks.

Having two applications manipulate the same Database is a thing to avoid, when possible. Can your 2nd application instead call an action on the controlling app to add a Person to the Course with parameters passed to specify each? That way, only one app is writing to the DB, reducing caching, index, and sequence headaches.
You also state that Person belongsTo Course... so you create a new Person for "Bob Jenkins" for each course that he's in? This seems excessive. You should probably look into a ManyToMany for this.
Without moving to a service, unfortunately, you'd want to change the indices on some if not many of the rows for the children of the Course you're trying to add a Person to, as that index is the sorted index for all the Persons in the Course.
I would suggest going back to a "Set", and do your sorting in the app. Your other question about sorting already told you not to override compareTo to just check the last name. If I were you, I'd forget about overriding compareTo at all (except to check IDs, if you want), and just use the sort() method, passing in a closure that correctly sorts the objects.

Loading all the data but not from all the tables

I watched this rails cast http://railscasts.com/episodes/22-eager-loading but still I have some confusions about what is the best way of writing an efficient GET REST service for a scenario like this:
Let's say we have an Organization table and there are like twenty other tables that there is a belongs_to and has_many relations between them. (so all those tables have a organization_id field).
Now I want to write a GET and INDEX request in form of a Rails REST service that based on the organization id being passed to the request in URL, it can go and read those tables and fill the JSON BUT NOT for ALL of those table, only for a few of them, for example let's say for a Patients, Orders and Visits table, not all of those twenty tables.
So still I have trouble with getting my head around how to write such a
.find( :all )
sort of query ?
Can someone show some example so I can understand how to do this sort of queries?

You can include all of those tables in one SQL query:
#organization = Organization.includes(:patients, :orders, :visits).find(1)
Now when you do something like:
#organization.patients
It will load the patients in-memory, since it already fetched them in the original query. Without includes, #organization.patients would trigger another database query. This is why it's called "eager loading", because you are loading the patients of the organization before you actually reference them (eagerly), because you know you will need that data later.
You can use includes anytime, whether using all or not. Personally I find it to be more explicit and clear when I chain the includes method onto the model, instead of including it as some sort of hash option (as in the Railscast episode).

Querying Mongodb collection based on parent's attribute

I've got a Posts document that belong to Users, and Users have an :approved attribute. How can I query my Posts using Mongodb s.t. I only get those for where User has :approved => true ?
I could write a loop that creates a new array, but that seems inefficient.

MongoDB does not have any notion of joins.
You've stated in the comments that Posts and Users are separate collections, but your query clearly involves data from both collections, which would imply a join.
I could write a loop that creates a new array, but that seems inefficient.
A join operation in SQL is basically a loop that happens on the server. With no join support on the server side, you'll have to make your own.
Note that many of the libraries (like Morphia) actually have some of this functionality built-in. You are using Mongoid which may have some of this support, but you'll have to do some hunting.

The easiest way to think about it would be to query for unique user ids of users who are approved and then query for post documents where the poster's user_id is in that set.
As Rubish said, you could de-normalize by adding an approved field to the post document. When a user's approval status is toggled (they become approved or unapproved) do an update on the posts collection where, for all of that user's posts, you toggle the denormalized approval field.
Using the denormalized method lets you do one query instead of two (simplifying the logic for the most common case) and isn't too much of a pain to maintain.
Let me know if that makes sense.

Single Inheritance or Polymorphic?

I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (eg. Cars, Computers, Cameras) and each category of ads have their own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?

Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done which worked pretty well was to have a table that is only used for search which gets a copy of whatever fields that need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
EDIT 2
This seems to be a little bigger discussion that we can fin in these comment boxes. But, anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than 1 way of doing it with pro's and cons.
Good luck.

I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart