Wrapping my head around MongoDB, mongomapper and joins

Wrapping my head around MongoDB, mongomapper and joins - ruby-on-rails

I'm new to MongoDB and I've used RDBMS for years.
Anyway, let's say I have the following collections:
Realtors
many :bookmarks
key :name
Houses
key :address, String
key :bathrooms, Integer
Properties
key :address, String
key :landtype, String
Bookmark
key :notes
I want a Realtor to be able to bookmark a House and/or a Property. Notice that Houses and Properties are stand-alone and have no idea about Realtors or Bookmarks. I want the Bookmark to be sort of like a "join table" in MySQL.
The Houses/Properties come from a different source so they can't be modified.
I would like to be able to do this in Rails:
r = Realtor.first
r.bookmarks would give me:
House1
House2
PropertyABC
PropertyOO1
etc...
There will be thousands of Houses and Properties.
I realize that this is what RDBMS were made for. But there are several reasons why I am using MongoDB so I would like to make this work.
Any suggestions on how to do something like this would be appreciated.
Thanks!

OK, first things first. You've structured your data as if this were an RDBMS. You've even run off and created a "join table" as if such a thing were useful in Mongo.
The short answer to your question is that you're probably going to have re-define "first" to load the given "Bookmarks". Either "server-side" with an $in clause or "client-side" with a big for loop.
So two Big Questions about the data:
If Bookmarks completely belong to a Realtor, why are they in their own collection?
If Realtors can Bookmark Houses and Property, then why are these in different collections? Isn't this needless complication? If you want something like Realtor.first on bookmarks why put them in different collections?
The Realtors collection should probably be composed of items that look like this:
{"name":"John", "bookmarks": [
{"h":"House1","notes":[{"Nice location","High Ask"}] },
{"p":"PropertyABC","notes":[{"Haunted"}] }
] }
Note how I've differentiated "h" and "p" for ID of the house and ID of the property? If you take my next suggestion you won't need even that.
Taking this one step further, you probably want Houses and Properties in the same collection, say "Locations". In the "Locations" collection, you're just going to stuff all Houses and Properties and mark them with "type":"house" or "type":"property". Then you'll index on the "type" field.
Why? Because now when you write the "first" method, your query is pretty easy. All you do is loop through "bookmarks" and grab the appropriate key ("House1", "PropertyABC") from the "Locations" collection. Paging is straight forward, you query for 10 items and then return.
I know that at some level it seems kind of lame."Why am I writing a for loop to grab data? I tried to stop doing that 15 years ago!" But Mongo is a "document-oriented" store, so it's optimized for loading individual documents. You're trying to load a bunch of documents, so you have to jump through this little hoop.
Fortunately, it's not all bad. Mongo is really fast at loading individual docs. Running a query to fetch 10 items at once is still going to be very quick.

Related

ActiveRecord schema for multiple types of objects with common columns?

I'm working on a Rails 5 app for Guild Wars 2, and I'm trying to figure out a way to serialize and store all of the items in the game without duplicating code or table columns. The game has a public API to get the items from, documented
here: https://wiki.guildwars2.com/wiki/API:2/items
As you can see, all of the items share several pieces of data like ID, value, rarity, etc. but then also branch off into specific details based on their type.
I've been searching around for a solution, and I've found a few answers, but none that work for this specific situation.
Single Table Inheritance: There's way too much variance between items. STI would likely end up with a table over 100 columns wide, with most of them null.
Polymorphic Associations: Really doesn't seem to be the proper way to use these. I'm not trying to create a type of model that gets included multiple other places, I just want to extend the data of my "Item" model.
Multiple Table Inheritance: This looks to me like the perfect solution. It would do exactly what I'm wanting. Unfortunately, ActiveRecord does not support this, and all of the "workarounds" I've found seem hacky and weird.
Basically, what I'm wanting is a single "Item" model with the common columns, then a "details" attribute that will fetch the type-specific data from the relevant table.
What's the best way to create this schema?

One possible solution:
Use #serialize on the details (text) column
class Item
serialize :details, Hash
end
One huge downside is that this is very inefficient if you need to query on the details data. This essentially bypasses the native abstractions of the database.

I was in a similar situation recently. I solved by using Sequel instead of ActiveRecords.
You can find it here:
https://github.com/TalentBox/sequel-rails
And an implmentation example:
http://www.matchingnotes.com/class-table-inheritance-in-rails.html
Good luck

Can I have a one way HABTM relationship?

Say I have the model Item which has one Foo and many Bars.
Foo and Bar can be used as parameters when searching for Items and so Items can be searched like so:
www.example.com/search?foo=foovalue&bar[]=barvalue1&bar[]=barvalue2
I need to generate a Query object that is able to save these search parameters. I need the following relationships:
Query needs to access one Foo and many Bars.
One Foo can be accessed by many different Queries.
One Bar can be accessed by many different Queries.
Neither Bar nor Foo need to know anything about Query.
I have this relationship set up currently like so:
class Query < ActiveRecord::Base
belongs_to :foo
has_and_belongs_to_many :bars
...
end
Query also has a method which returns a hash like this: { foo: 'foovalue', bars: [ 'barvalue1', 'barvalue2' } which easily allows me to pass these values into a url helper and generate the search query.
This all works fine.
My question is whether this is the best way to set up this relationship. I haven't seen any other examples of one-way HABTM relationships so I think I may be doing something wrong here.
Is this an acceptable use of HABTM?

Functionally yes, but semantically no. Using HABTM in a "one-sided" fashion will achieve exactly what you want. The name HABTM does unfortunately insinuate a reciprocal relationship that isn't always the case. Similarly, belongs_to :foo makes little intuitive sense here.
Don't get caught up in the semantics of HABTM and the other association, instead just consider where your IDs need to sit in order to query the data appropriately and efficiently. Remember, efficiency considerations should above all account for your productivity.
I'll take the liberty to create a more concrete example than your foos and bars... say we have an engine that allows us to query whether certain ducks are present in a given pond, and we want to keep track of these queries.
Possibilities
You have three choices for storing the ducks in your Query records:
Join table
Native array of duck ids
Serialized array of duck ids
You've answered the join table use case yourself, and if it's true that "neither [Duck] nor [Pond] need to know anything about Query", using one-sided associations should cause you no problems. All you need to do is create a ducks_queries table and ActiveRecord will provide the rest. You could even opt to use has_many :through relationship if you need to do anything fancy.
At times arrays are more convenient than using join tables. You could store the data as a serialized integer array and add handlers for accessing the data similar to the following:
class Query
serialize :duck_ids
def ducks
transaction do
Duck.where(id: duck_ids)
end
end
end
If you have native array support in your database, you can do the same from within your DB. similar.
With Postgres' native array support, you could make a query as follows:
SELECT * FROM ducks WHERE id=ANY(
(SELECT duck_ids FROM queries WHERE id=1 LIMIT 1)::int[]
)
You can play with the above example on SQL Fiddle
Trade Offs
Join table:
Pros: Convention over configuration; You get all the Rails goodies (e.g. query.bars, query.bars=, query.bars.where()) out of the box
Cons: You've added complexity to your data layer (i.e. another table, more dense queries); makes little intuitive sense
Native array:
Pros: Semantically nice; you get all the DB's array-related goodies out of the box; potentially more performant
Cons: You'll have to roll your own Ruby/SQL or use an ActiveRecord extension such as postgres_ext; not DB agnostic; goodbye Rails goodies
Serialized array:
Pros: Semantically nice; DB agnostic
Cons: You'll have to roll your own Ruby; you'll loose the ability to make certain queries directly through your DB; serialization is icky; goodbye Rails goodies
At the end of the day, your use case makes all the difference. That aside, I'd say you should stick with your "one-sided" HABTM implementation: you'll lose a lot of Rails-given gifts otherwise.

A model with a handful of id's

I have a model which has many of another model but this model only needs to have 10 or less id's in it.
Let's say it has, Bathroom, Kitchen, LivingRoom for arguments sake and the new records will probably never need to change.
What is the best way of making a model like this that doesn't use a database table?

This may not be best practices, but to solve the same problem I just specified a collection in my model, like this:
ROOM_TYPES = [ "Bathroom", "Living Room", "Kitchen" ]
Then in the view:
<%= f.select(:room_type, Project::ROOM_TYPES, {:prompt => '...'}) %>
(replace Project with your actual model name.)
Super-straightforward, almost no setup. I can see how it would be difficult to maintain though, since there's no way to add items without accessing the Rails code, but it does get the job done quickly.

Even though the collection of rows never changes, it's still useful to have this as a table in your database in order to leverage ActiveRecord relations. A less contrived example would be a database table that has a list of US states. This is highly unlikely to change, and would likely have only a couple of columns (state name and state abbreviation). However, by storing these in the database and supporting them with ActiveRecord, you preserve the ability to do handy things like searching for all users in a state using conventional rails semantics.
That being said, an alternative would be to simply create a class that you stick in your models directory that does not inherit from ActiveRecord, and then either populate it once from the database when the application loads, or simply populate it by hand.
A similar question was asked yesterday, and one of the answers proposes a mechanism for supporting something similar to what you want to do:
How to create a static class that represents what is in the database?

Single Inheritance or Polymorphic?

I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (eg. Cars, Computers, Cameras) and each category of ads have their own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?

Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done which worked pretty well was to have a table that is only used for search which gets a copy of whatever fields that need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
EDIT 2
This seems to be a little bigger discussion that we can fin in these comment boxes. But, anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than 1 way of doing it with pro's and cons.
Good luck.

I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).

Rails model: "has many" simple attribute

Let's assume this model:
Movie
- Title: String
- Has many:
- Alternative Title: String
My questions is, how should I store the alt. title attribute? I am deciding between three approaches:
Separate AR model: probably an overkill
CSV in a signle DB column
Serialized array in single DB column
The latter two seems logically equivilent. I am leaning towards the CSV approach. Can anyone give some advise on this? What would be the implications on speed and searchability?

If a movie can have many titles, it makes most sense to have a Title model and give the Movie model a has_many :titles relation, especially if you later on decide to add more metadata about titles. It may seem like overkill, but I think it will be the least hassle in the long run. Furthermore, I think that a movie's "main" title should be a Title object as well, perhaps with an is_main_title or similar attribute to distinguish it from the others.

If most of the time you only use the primary title, I'll go with your CSV option.
If most of the time you use all the titles, I'll put all the titles (primary and secondary) inside a single CSV column (named "titles") and just get the first when the primary is needed (with a helper function).
Why?
Because it makes things simple- and if the time has come, like Jordan said, that you need another attribute you can always migrate to a separate model.
Until then, YAGNI.

I would also vote for a separate model even though it seems like overkill it will allow you to basically follow the Rails way the easiest. However, if you choose not to reap the benefits of all the baked in magic associated with associations, then I would recommend YAML or JSON over CSV. CSV is quite simple, but Rails has baked in support for YAML serialization and would probably be the easiest solution. Check out RDoc on #serialize. For the given example this would basically amount to:
class Movie < ActiveRecord::Base
serialize :alternate_titles
end
With that, Rails would handle a lot of the drudgery for you and you'll have a nice array of alternate titles always available.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart