Embedded or referenced relations - ruby-on-rails

I use MongoDB with the Mongoid gem and I'd like some advice.
I have an app where a User has many Markets and a Market has many Products.
I need to search for products, say in a specific price range, in all (or any) of the markets that belong to the user.
Which relation fits better for this, embedded or referenced?
I currently use referenced relations, and it looks like this:
class User
  has_many :markets
end
class Market
  belongs_to :user
  has_many :products
end
class Product
  belongs_to :market
  belongs_to :user
end
And for search, I use this query:
Product.where(user_id: current_user.id).
  in(market_id: market_ids).
  where(:price.gte => price)
I'm curious: since MongoDB is a document-oriented database, would I benefit in performance or design if I used embedded documents in this situation?

In your case I would advise using referenced data, because I suppose you need to manipulate each of those collections on its own: you need to be able to edit, delete, and update products by _id and run other complicated queries, which is much easier and more efficient when each has a separate collection.
At the same time I would store some fully embedded data in the Users collection, just to speed up display in the visitor's browser. Say you have a user page where you want to show the user's profile, their top 5 markets, and their top 20 products. You can embed those newest top 5 and top 20 in the User document and update the embedded objects when new markets or products are added. Then showing the user page takes just one query to MongoDB, so the embedded copy works as a cache. If a visitor needs to view more products, they go to the next page, "Products", which queries the separate Products collection in MongoDB.
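The caching idea can be sketched in plain Ruby, with hashes standing in for MongoDB documents. The `record_product` helper, the `top_products` key, and the limit of 20 are assumptions made for illustration; none of this is Mongoid API.

```ruby
# Plain-Ruby sketch: a hash stands in for a User document that embeds a
# small, capped copy of its newest products (hypothetical structure).
def record_product(user_doc, products_collection, product, limit: 20)
  # The separate products collection stays the source of truth.
  products_collection << product

  # The embedded list is only a cache: newest first, trimmed to the limit.
  summary = { id: product[:id], name: product[:name], price: product[:price] }
  user_doc[:top_products] = [summary, *user_doc[:top_products]].first(limit)
  user_doc
end

user_doc = { id: 1, name: "alice", top_products: [] }
products = []

25.times do |i|
  record_product(user_doc, products, { id: i, user_id: 1, name: "p#{i}", price: 10 + i })
end
```

Rendering the user page then needs only `user_doc`, while the full listing still comes from the separate collection.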

Use embedded documents if you only need to access the item through the parent class. If you need to query it directly or from multiple objects, use a reference.
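To make the rule of thumb concrete, here is a rough plain-Ruby comparison of the two layouts, with arrays and hashes standing in for collections and documents. The helper names `search_referenced` and `search_embedded` are hypothetical; the point is only that the embedded layout forces every search to start from the parent.

```ruby
# Referenced layout: products live in their own collection and carry foreign keys.
products = [
  { id: 1, user_id: 1, market_id: 10, price: 5 },
  { id: 2, user_id: 1, market_id: 11, price: 50 },
  { id: 3, user_id: 2, market_id: 12, price: 70 },
]

def search_referenced(products, user_id:, market_ids:, min_price:)
  products.select do |product|
    product[:user_id] == user_id &&
      market_ids.include?(product[:market_id]) &&
      product[:price] >= min_price
  end
end

# Embedded layout: products are reachable only by walking user -> markets.
user = {
  id: 1,
  markets: [
    { id: 10, products: [{ id: 1, price: 5 }] },
    { id: 11, products: [{ id: 2, price: 50 }] },
  ],
}

def search_embedded(user, market_ids:, min_price:)
  user[:markets]
    .select { |market| market_ids.include?(market[:id]) }
    .flat_map { |market| market[:products] }
    .select { |product| product[:price] >= min_price }
end

referenced_hits = search_referenced(products, user_id: 1, market_ids: [10, 11], min_price: 20)
embedded_hits   = search_embedded(user, market_ids: [10, 11], min_price: 20)
```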

Related

Rails includes method when to add

So I've read a lot about the rails includes method but I'm still a bit confused about what's the best situation to use it.
I have a situation where I have a user record and then this user is related to multiple models like client, player, game, team_player, team, server and server_center.
I need to display specific attributes from the related models in a view. I only need around 1-2 attributes from a specific model and I don't use the others.
I have already added delegates; for example, to get server.name from a player I can call server_name. In this situation, do I include all of the tables I need attributes from, or should I do something else, given that I only need a couple of attributes from each model?
My query is as follows at the moment:
@user_profile = User
  .includes({:client => [:player, :team_player => [:team]]},
            :game,
            {:server_center => :server})
  .where(game_id: @master.admin.games)
Includes ensures that all of the specified associations are loaded using the minimum possible number of queries.
Let's say we have two models named User and Profile:
class User < ActiveRecord::Base
  has_one :profile
end
class Profile < ActiveRecord::Base
  belongs_to :user
end
If we iterate over the users and display each user's name, where the name field lives in the Profile model associated with User, we would normally have to retrieve the name with a separate database query each time. With the includes method, the associated profiles have already been eagerly loaded, so the block needs only two queries in total: one for the users and one for their profiles.
without includes:
users = User.all
users.each do |user|
  puts user.profile.name # triggers an extra database query for every user
end
with includes:
# 1st query loads all users; 2nd loads all their profiles into memory
users = User.includes(:profile).all
users.each do |user|
  puts user.profile.name # no extra query: the profile is read from memory
end
Eager loading is used to prevent N+1 query problems. By default includes loads the association with a separate query (it falls back to a LEFT OUTER JOIN when the query's conditions reference the associated table), and this plays an important role in speeding up responses. For example, if we have a huge number of users and iterate through them and their corresponding profiles, the number of extra database hits equals the number of users. With includes, all the profiles are kept in memory, so when we iterate through the users the profiles are read from memory instead of being queried.
Eager loading may not always be the best cure for N+1 queries. For example, if you are dealing with complex queries, consider a caching solution such as Russian doll caching. Both methods have their own pros and cons, and in the end it's up to you to determine the best approach.
One useful gem that helps detect N+1 queries is bullet.
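The difference in query counts can be illustrated without a database. The sketch below uses a hypothetical `FakeDb` class that simply counts how many lookups are issued; it imitates the two-query preloading strategy and is not how ActiveRecord is implemented.

```ruby
# Plain-Ruby sketch: FakeDb (hypothetical) counts each simulated query.
class FakeDb
  attr_reader :query_count

  def initialize(users, profiles)
    @users = users
    @profiles = profiles
    @query_count = 0
  end

  def all_users
    @query_count += 1
    @users
  end

  def profile_for(user_id)
    @query_count += 1
    @profiles.find { |profile| profile[:user_id] == user_id }
  end

  def profiles_for(user_ids)
    @query_count += 1
    @profiles.select { |profile| user_ids.include?(profile[:user_id]) }
  end
end

users    = (1..5).map { |id| { id: id } }
profiles = (1..5).map { |id| { user_id: id, name: "user#{id}" } }

# N+1: one query for the users, then one more per user.
lazy_db = FakeDb.new(users, profiles)
lazy_names = lazy_db.all_users.map { |user| lazy_db.profile_for(user[:id])[:name] }

# Preloading: one query for the users, one for all of their profiles.
eager_db = FakeDb.new(users, profiles)
loaded_users = eager_db.all_users
profiles_by_user = eager_db.profiles_for(loaded_users.map { |user| user[:id] })
                           .map { |profile| [profile[:user_id], profile] }
                           .to_h
eager_names = loaded_users.map { |user| profiles_by_user[user[:id]][:name] }
```

Both strategies produce the same names; only the number of simulated queries differs (`lazy_db.query_count` versus `eager_db.query_count`).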

Designing multi-tenant app in Rails

I'm implementing a multi-tenant app with Rails. My approach is not to use the Postgres built-in multi-tenant feature but to add a column that records the subdomain. That is where the question is :)
Take this example:
class Organisation < ActiveRecord::Base
  has_many :users
end
class User < ActiveRecord::Base
  belongs_to :organisation
end
I'm thinking about two approaches here:
Approach 1
add a subdomain column only to organisations
pros - This is how relational databases should work \0/
cons - More complex queries will make my code slow
Approach 2
add a subdomain column to both organisations and users
pros - This will make queries faster
cons - I'm going against relational database design
So the question is: which of the two approaches should I follow, or is there a different approach that I haven't thought of?
We run a multi-tenant Rails app with slightly fewer than 500 table-backed classes, and if I had to guess I'd say around 400 of them relate to client data.
Client-specific attributes are held in the Client model, but we add the client_id to every client table with a not-null constraint in the database. Only a minority of them are indexed, though, because they are generally only accessed through a parent record.
We do not have to worry about setting the client id because the model will generally have:
class Child
  after_initialize do
    self.client ||= parent.client
  end
end
We add the client_id to many tables because we have a lot of code that does:
@books = current_user.client.books
... so in those cases we'll have an association directly from Client to the model, and client_id is indexed.
We add the client_id to all tables, though, because we very often, for operational or unusual reasons, want to be able to find all of the relevant records for a client ...
MarketingText.where(client: Client.snowbooks).group(:type).count
... and having to go through a parent record is just inconvenient.
Also, because we made the decision to do this on all client-specific tables, we do not have to make the decision on each one.
So to get to your question, what I would do is add the subdomain to the Organisation only. However, I would add the organisation_id column to every table holding Organisation-specific data.
If you have a lot of clients but will generally be familiar with their subdomains, then I would write a metaprogrammed method on Organisation that lets you use:
Organisation.some_subdomain
... to get the required organisation, then find the child records (in any table) with an association directly from the Organisation model ...
Organisation.some_subdomain.users
Organisation.some_subdomain.prices
Organisation.some_subdomain.whatevers
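One possible way to get `Organisation.some_subdomain` is a class-level `method_missing`. The sketch below is plain Ruby under stated assumptions: the in-memory `REGISTRY` hash is hypothetical and stands in for a lookup against the organisations table, and a real ActiveRecord model would need extra care so that it does not shadow the framework's own dynamic methods.

```ruby
# Plain-Ruby sketch; REGISTRY (hypothetical) replaces a database lookup.
class Organisation
  REGISTRY = {}

  attr_reader :subdomain, :name

  def initialize(subdomain, name)
    @subdomain = subdomain
    @name = name
    REGISTRY[subdomain] = self
  end

  # Treat an unknown class method as a subdomain lookup; fall back to the
  # normal NoMethodError when no organisation has that subdomain.
  def self.method_missing(method_name, *args)
    REGISTRY.fetch(method_name.to_s) { super }
  end

  def self.respond_to_missing?(method_name, include_private = false)
    REGISTRY.key?(method_name.to_s) || super
  end
end

Organisation.new("snowbooks", "Snowbooks Ltd")
found = Organisation.snowbooks
```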
My opinion is to go with approach number one, for a couple of reasons.
Using a relational database with ActiveRecord and scopes makes writing software easier, especially if you later have more objects under an organisation, for example transactions and items (besides users).
I have a project with multi-tenant capabilities, and below is a sample of the design in my project:
class Company < ApplicationRecord
  has_many :users
  # transaction
  has_many :transactions
  has_many :journals, :through => :transactions
  # item
  has_many :items
  # other has_many ...
end
and in the controller you can use eager loading to minimize queries (includes / joins):
@company.includes(:users).scope_filter_here.search(params[:q])
Approach number 1 is also more user-friendly than approach number 2, as the URL is simpler for users to type; the less there is to type, the better (personal opinion).

Should I keep a value in the child association AND parent to avoid an extra query?

Let's say I have a Book, which has_many Photos. When a user visits /books/1 I want to display each of the book's photos in a gallery, but on the index page (/books) I only want to display one photo for each book. Note that I've added a column to the Photos table called hero_image which will contain a boolean value.
Out of a book's 50 photos, only 1 will ever be a hero_image. In order to get all of the hero images for the index page, would it not be more efficient to simply include a column on the Book table such as hero_image that contains the same value (image-url) as the Photo that is a hero_image? Doing so would save me an extra query to the database which would look up Photos based on the hero_image field and the id of the books being displayed.
Another important consideration is that I frequently update every Photo through a rake task, so if I were to include an after_update callback that updates a Photo's parent if hero_image_changed? (using ActiveModel::Dirty), then that would also be somewhat expensive (there are a LOT of photos and each of them would trigger the aforementioned callback).
You can define a simple has_one that picks out the principal_photo among the related photos:
class Book < ActiveRecord::Base
  has_many :photos
  has_one :principal_photo, -> { where(principal: true) }, class_name: 'Photo'
end
Eager Loading:
@books = Book.where(id: 1..999).includes(:principal_photo)
This code allows you to eager-load only the most relevant image and not all photos from a Book scope. This will produce only 2 SQL requests, one for retrieving the Book records and another one to retrieve their related Photo record defined as principal. This will save you from the pain of caching data and the callback hell behind it.
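What the two requests amount to can be sketched in plain Ruby, with arrays of hashes standing in for the books and photos tables. The sample data and variable names are hypothetical; the sketch only shows that fetching the principal photos once and indexing them by book id avoids both a per-book lookup and a denormalised column.

```ruby
# Plain-Ruby sketch: arrays of hashes stand in for the two tables.
books = [{ id: 1, title: "A" }, { id: 2, title: "B" }]
photos = [
  { id: 1, book_id: 1, url: "a1.jpg", principal: false },
  { id: 2, book_id: 1, url: "a2.jpg", principal: true },
  { id: 3, book_id: 2, url: "b1.jpg", principal: true },
]

# "Query" 2: fetch only the principal photos for the listed books, once,
# and index them by book id.
book_ids = books.map { |book| book[:id] }
principal_by_book = photos
  .select { |photo| photo[:principal] && book_ids.include?(photo[:book_id]) }
  .map { |photo| [photo[:book_id], photo] }
  .to_h

# The index page pairs each book with its single principal photo.
index_rows = books.map { |book| [book[:title], principal_by_book[book[:id]][:url]] }
```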

Sort by timestamp on has_many after scope

I have a User object, a Package object (User has_many packages) and then a LocationTracker (User has_many location_trackers), which acts as a join table between User and Package, but just tracks details such as the most recent package delivery.
I'd like to sort my Users based on the most recent package they sent. The LocationTracker has an attribute last_received_from_user.
I can easily sort the users from a certain location by ordering by the last_received_from_user attribute, however I'd also like to have a global index page that shows all of the Users, sorted by the last package they delivered.
I'm having trouble grouping the users. I'm attempting to use a DISTINCT ON(last_received_from_user), but then it complains that the attribute isn't in the group, and when I add it to the group, it groups by that timestamp, which is obviously pretty unique, so I get duplicate users showing up.
My current code is as follows:
User.includes(:location_trackers)
  .group("location_trackers.user_id, users.id")
  .order("location_trackers.last_received_from_user #{order} NULLS LAST")
Any help is greatly appreciated!
EDIT:
I've got the last_received_from_user attribute, which lets me sort users from a SINGLE location well. However, I need to be able to scope based on what could be a number of different options. For example, only show users in a certain area (which could be composed of a few locations), or order ALL users across ALL locations. The attribute works great for a single user-location relationship, but fails when I attempt to perform the search on more than one location.
I'd like to sort my Users based on the most recent package they sent
Wouldn't it be easier (and way more efficient) to have an attribute like latest_delivery_location and use a callback on the User model, like:
class User < ApplicationRecord
  after_update :update_latest_delivery_location

  private

  def update_latest_delivery_location
    # update_column skips callbacks, so this does not re-trigger after_update
    update_column(
      :latest_delivery_location,
      location_trackers.maximum(:last_received_from_user)
    )
  end
end
Or update that attribute after an order has been placed or dispatched. I'd go for this approach because it is easier to maintain and, if you want better performance, you can always add an index on users.latest_delivery_location for sorting.
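For comparison, the grouping the question is after (one row per user, ordered by that user's most recent timestamp across whichever locations are in scope) can be sketched in plain Ruby. The helper `users_by_latest_delivery` and the sample data are hypothetical; in SQL terms this corresponds roughly to grouping by user and ordering by the maximum of last_received_from_user with NULLS LAST.

```ruby
# Plain-Ruby sketch: hashes stand in for User and LocationTracker rows.
users = [{ id: 1 }, { id: 2 }, { id: 3 }]
trackers = [
  { user_id: 1, location_id: 10, last_received_from_user: Time.utc(2020, 1, 1) },
  { user_id: 1, location_id: 11, last_received_from_user: Time.utc(2020, 3, 1) },
  { user_id: 2, location_id: 10, last_received_from_user: Time.utc(2020, 2, 1) },
]

def users_by_latest_delivery(users, trackers, location_ids: nil)
  # Optionally narrow to an area made up of several locations.
  scoped = location_ids ? trackers.select { |t| location_ids.include?(t[:location_id]) } : trackers

  # One timestamp per user: the most recent across all scoped locations.
  latest = scoped.group_by { |t| t[:user_id] }
                 .transform_values { |rows| rows.map { |t| t[:last_received_from_user] }.max }

  # Newest first; users with no delivery in scope go last.
  with_delivery, without_delivery = users.partition { |user| latest.key?(user[:id]) }
  with_delivery.sort_by { |user| latest[user[:id]] }.reverse + without_delivery
end

all_locations = users_by_latest_delivery(users, trackers).map { |user| user[:id] }
one_area      = users_by_latest_delivery(users, trackers, location_ids: [10]).map { |user| user[:id] }
```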

Mongo Design question with Rails/Mongoid for a bill tracking app

I'm writing a quick app for a user to track their daily bills (for money tracking purposes). I want the user to be able to define their own categories that a bill can be applicable for. I'm trying however to decide the best way to model this and also validate categories as unique.
My initial thought was this:
class User
  include Mongoid::Document
  embeds_many :bills
  field :categories, :type => Array
end
class Bill
  include Mongoid::Document
  embedded_in :user, :inverse_of => :bills
  field :category
  index :category
end
So a user can add categories, just as strings, and when they add a bill, they'll choose from their available categories for the bill.
So, a couple questions:
Does this seem like the proper design? I don't think it's necessary to define an actual category model, as it's literally just a string used to index bills on, but I'm not sure if there are other benefits to a separate model.
How do I validate_uniqueness_of :categories in my user model? I don't think it works on array items like this, but I could be wrong. I don't want a user to create categories with the same name. I suppose this might be the advantage of a separate model, embedded in the User, but again it seems like more work than necessary.
Can someone tell me my best options here to validate that a user has unique categories? (Different users can have the same categories; I obviously don't care about that, just uniqueness within the scope of a single user.)
[Update]
The design seems proper. In a Rails specific way how would you validate the uniqueness? When adding a category pull the list and do an indexOf check to ensure it doesn't exist. If it does just bounce back an error.
I'm not a Rails guy, let me know if I'm off track or something.
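The "pull the list and check" idea might look like the following in plain Ruby. The `add_category` helper is hypothetical, and comparing names case-insensitively after trimming whitespace is an assumption about what should count as a duplicate; in a Mongoid model the same check could live in a custom validation method.

```ruby
# Plain-Ruby sketch: returns the (possibly extended) list plus an error
# message, or nil when the category was added.
def add_category(categories, new_category)
  name = new_category.to_s.strip
  return [categories, "Category can't be blank"] if name.empty?

  # Assumption: "Rent" and "rent" count as the same category.
  if categories.any? { |existing| existing.casecmp?(name) }
    return [categories, "Category '#{name}' already exists"]
  end

  [categories + [name], nil]
end

categories, error = add_category(["Rent"], "Food")
_unchanged, duplicate_error = add_category(categories, " rent ")
```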
I'm not sure MongoDB would be the best choice of storage engine for that. You would be better off using MySQL with a categories table.
Knocks against MongoDB:
No multi-document ACID transactions (before MongoDB 4.0)
No single-server durability (before journaling was added in MongoDB 1.8)
Not relational (you want relational for a bill tracking application)
