Indexing a MongoDB on Heroku, with Rails and Mongoid - ruby-on-rails

I have a Rails app running Mongoid on Heroku and I need to set up some indexes to make the database faster. I've tried to read about it in several places and I know Mongoid has some built-in functions for indexing, but I'm not sure what to apply them to or how often to index.
It is mostly my Design model I want to index:
scope :full_member_and_show, where(full_member: true).and(show: true).desc(:created_at)
scope :not_full_member_and_show, where(full_member: false).and(show: true).desc(:created_at)
embeds_many :comments
belongs_to :designer
search_in :tags_array
attr_accessible :image, :tags, :description, :title, :featured, :project_number, :show
field :width
field :height
field :description
field :title
field :tags, type: Array
field :featured, :type => Boolean, :default => false
field :project_number, :type => Integer, :default => 0
field :show, :type => Boolean, :default => true
field :full_member, :type => Boolean, :default => false
field :first_design, :type => Boolean, :default => false
What do I need to index, how exactly do I do it with Mongoid and how often should I do it?
ERROR UPDATE
If I try to index the below:
index({ full_member: 1, show: 1 }, { unique: true })
It throws me this error:
Invalid index specification {:full_member=>1, :show=>1}; should be either a string, symbol, or an array of arrays.

You don't need to index periodically: once you've added an index, Mongo keeps it up to date as the collection changes. This is the same as an index in MySQL or Postgres (you may have been thinking of something like Solr).
What to index depends on what queries you'll be making against your dataset. Indexes carry some overhead on updates and consume disk space, so you don't want to add them when you don't need them.
You tell Mongoid which indexes you want with index, for example:
class Person
  include Mongoid::Document

  index :city
end
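Given the error in your update, you appear to be on Mongoid 2.x, where a compound index is specified as an array of [field, direction] pairs rather than the Mongoid 3 hash form. A sketch against your Design model (the field choice mirrors your scopes; this is the general shape, not a definitive spec):
class Design
  include Mongoid::Document

  # Compound index on the fields your scopes filter and sort by;
  # 1 = ascending, -1 = descending.
  # Leave off unique: true - many designs share these boolean flags,
  # so a unique index would reject valid documents.
  index [[:full_member, 1], [:show, 1], [:created_at, -1]]
end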
There are loads of examples in the mongoid docs for the various kinds of indexes mongo supports.
Then you run
rake db:mongoid:create_indexes
This determines what indexes you want (based on the calls to index in your models) and then ensures that they exist in the db, creating them if necessary. In development you'd run this after adding indexes to your models. In production it makes sense to run this as part of your deploy (you only need to if you've added indexes since the last deploy, but it's way easier to just do it systematically).
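On Heroku (as in your setup), that typically means running the same task against the deployed app:
heroku run rake db:mongoid:create_indexes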
There's a lot of information about how MongoDB uses indexes in the documentation.

Related

Postgres hstore and Rails sunspot solr

I have an application which relies heavily on the hstore type in Postgres. The issue I can't seem to get past is making the hstore searchable in Sunspot. Here is some code I am working on:
class Post < ActiveRecord::Base
  # properties is type hstore
  %w[price condition website].each do |key|
    store_accessor :properties, key
  end

  ...

  searchable :auto_index => false, :auto_remove => false do
    text :title, :boost => 5.0
    integer :category
    integer :subcategory

    # this is what's giving me the problem
    string :properties["price"]
  end
end
I have tried adding different types but nothing seems to work. Is this a feature not yet supported?
Hstore is basically a hash: it stores keys and values, so all you have to do is iterate over the keys and look them up.
Here is the working code:
searchable :auto_index => false, :auto_remove => false do
  text :title, :boost => 5.0
  integer :category
  integer :subcategory

  %w[price condition website].each do |key|
    string key.to_sym do
      properties[key]
    end
  end
end
Hopefully in the future they'll have support for
hstore :properties
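For completeness, a search against one of those dynamically defined fields might look like the following (the value "100" is illustrative, and reindexing is needed because :auto_index => false disables automatic indexing):
Post.reindex
Post.search do
  with :price, "100"
end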

Using crc32 tweak on has_many relations in Thinking Sphinx

It's weird, actually. I have two models that are associated with each other; here are my models:
# model city
class City < ActiveRecord::Base
  belongs_to :state
end

# model state
class State < ActiveRecord::Base
  has_many :cities
end
and I have a state index:
ThinkingSphinx::Index.define 'state', :with => :active_record do
  indexes state_name, :sortable => true

  # here is the problem
  has "CRC32(cities.city_name)", :as => :city_name, :type => :integer
end
I want to use city_name as a filter. My code above doesn't work, and I get an error message when I run
rake ts:index
Here is the error message:
ERROR: index 'state_core': sql_range_query: Unknown column 'cities.city_name' in 'field list'
But when I put city_name in the indexes block like below, the indexer runs fine!
ThinkingSphinx::Index.define 'state', :with => :active_record do
  indexes state_name, :sortable => true
  indexes cities.city_name

  has "CRC32(cities.city_name)", :as => :city_name, :type => :integer
end
Any suggestions?
Thinking Sphinx can't tell if you're referring to association tables within SQL snippets - so in your first example, there's nothing indicating that it needs to join on cities.
The join method within an index definition exists for this very purpose - so, try the following:
ThinkingSphinx::Index.define 'state', :with => :active_record do
  indexes state_name, :sortable => true

  has "CRC32(cities.city_name)", :as => :city_name, :type => :integer

  join cities
end
However, it's worth noting a few things: firstly, you may also need to add cities.city_name to the GROUP BY clause, since it's not part of any aggregate values:
# within index definition
group_by 'cities.city_name'
But also: your State model has many cities, not just one, so it should actually be aggregated into a set of integer values, not just one. This means you don't need the group_by call, but you do need to add the aggregate behaviour yourself. This is done differently depending on whether you're using PostgreSQL or MySQL:
# PostgreSQL
has "array_to_string(array_agg(crc32(cities.name)), ',')",
    :as => :city_names, :type => :integer, :multi => true

# MySQL
has "GROUP_CONCAT(CRC32(cities.name) SEPARATOR ',')",
    :as => :city_names, :type => :integer, :multi => true
CRC32 is not a native function in PostgreSQL, so you may need to add it yourself. Thinking Sphinx prior to v3 did this for you, but I've rewritten it so the CRC32 function is no longer required. This is largely because CRC32 can result in collisions and can't be reversed, making it an inelegant and imperfect solution. Hence, I think using fields for string comparison is better, but it's up to you whether that is preferred in your app.
I would recommend this approach instead:
ThinkingSphinx::Index.define :state, :with => :active_record do
  indexes state_name, :sortable => true

  has cities.id, :as => :city_ids
end
city = City.find_by_name('Melbourne')
State.search :with => {:city_ids => city.id}
It's accurate and elegant.

Match two or more objects from different models with Mongoid

I have 2 models:
class User
  include Mongoid::Document

  field :email, :type => String, :null => false, :default => ""
  ...
end

class Admin
  include Mongoid::Document

  field :email, :type => String, :null => false, :default => ""
  ...
end
With a Mongoid query, I want to find all Users whose email matches an email in the Admin model, something like:
User.where(:email => {exist_admin_class?})
Is this possible? Or do I have to make a relationship between the two models, with a has_one User and belongs_to Admin?
What is the best way to do this?
Thank you very much!
Indeed, MongoDB doesn't support cross-collection queries. But that isn't necessary, especially not for this requirement. I would suggest using inheritance for that:
mongoid HowTo
Why: Just because admins are a special kind of user.
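A minimal sketch of that approach, using the fields from the question (the sample email and query are illustrative):
class User
  include Mongoid::Document

  field :email, :type => String, :default => ""
end

# Admin inherits from User: Mongoid stores both classes in the users
# collection and adds a _type discriminator field automatically.
class Admin < User
end

Admin.create!(:email => "boss@example.com")

# A query through User now matches admins too, since they share a collection:
User.where(:email => "boss@example.com").count # => 1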

Mongoid create new Users

I'm trying to write an example app using Ruby on Rails and the Mongoid Mapper.
For testing, I want to write 1000 test users into MongoDB...
With the code below, Mongoid is not able to write unique uids. In my Ruby console I get the right number for the counter, but not for the uid.
Does anybody know what I forgot?
class User
  include Mongoid::Document
  include Mongoid::Timestamps

  def self.create_users
    (1..1000).each do |f|
      user = User.new(uid: f.to_s, first_name: "first_name", last_name: "last_name", e_mail: "e_mail")
      user.save!
      puts f
      puts user.uid
    end
  end

  field :uid, :type => String
  field :first_name, :type => String
  field :last_name, :type => String
  field :e_mail, :type => String
  field :messages, :type => String

  attr_accessible :first_name, :last_name, :e_mail

  validates_presence_of :uid, :first_name, :last_name, :e_mail
  validates_uniqueness_of :uid

  has_many :messages
end
You don't have to provide the uid field in your models. Mongoid adds the id field for you and manages its value during the create operation.
Simply remove field :uid, :type => String from the model.
If you want to use your own ids you can change the name of the uid field to _id and it should work just fine. However, the default generated mongo _id will make it easier to scale and using it removes one of the more difficult aspects of sharding if you ever need that feature.
If you want to use the ones that are generated by default, they are included automatically unless overridden explicitly (behavior which you have seen) so just remove your custom field and you should be all set.
You can read more about ObjectIds here.
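For illustration, a sketch of the "rename uid to _id" suggestion; the exact declaration varies between Mongoid versions, so treat this as the general shape rather than the precise API:
class User
  include Mongoid::Document

  # Use your own value as the primary key instead of a generated ObjectId
  field :_id, :type => String

  field :first_name, :type => String
end

user = User.create!(:_id => "42", :first_name => "Test")
user.id # => "42"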

In a join table, what's the best workaround for Rails' absence of a composite key?

create_table :categories_posts, :id => false do |t|
  t.column :category_id, :integer, :null => false
  t.column :post_id, :integer, :null => false
end
I have a join table (as above) with columns that refer to a corresponding categories table and a posts table. I wanted to enforce a unique constraint on the composite key category_id, post_id in the categories_posts join table. But Rails does not support this (I believe).
To avoid the potential for duplicate rows in my data having the same combination of category_id and post_id, what's the best workaround for the absence of a composite key in Rails?
My assumptions here are:
1) The default auto-number column (id:integer) would do nothing to protect my data in this situation.
2) ActiveScaffold may provide a solution, but I'm not sure if it's overkill to include it in my project simply for this single feature, especially if there is a more elegant answer.
Add a unique index that includes both columns. That will prevent you from inserting a record that contains a duplicate category_id/post_id pair.
add_index :categories_posts, [ :category_id, :post_id ], :unique => true, :name => 'by_category_and_post'
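For context, that call lives in a migration (the class name here is assumed; use up/down instead of change on Rails versions before 3.1):
class AddUniqueIndexToCategoriesPosts < ActiveRecord::Migration
  def change
    add_index :categories_posts, [ :category_id, :post_id ], :unique => true, :name => 'by_category_and_post'
  end
end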
It's very hard to recommend the "right" approach.
1) The pragmatic approach
Use a validator and do not add a unique composite index. This gives you nice messages in the UI, and it just works.
class CategoryPost < ActiveRecord::Base
  belongs_to :category
  belongs_to :post

  validates_uniqueness_of :category_id, :scope => :post_id, :message => "can only have one post assigned"
end
You can also add two separate indexes in your join tables to speed up searches:
add_index :categories_posts, :category_id
add_index :categories_posts, :post_id
Please note (according to the book The Rails 3 Way) that the validation is not foolproof, because of a potential race condition between the SELECT and INSERT/UPDATE queries. It is recommended to use a unique constraint if you must be absolutely sure there are no duplicate records.
2) The bulletproof approach
In this approach we want to put a constraint on the database level. So it means to create a composite index:
add_index :categories_posts, [ :category_id, :post_id ], :unique => true, :name => 'by_category_and_post'
The big advantage is strong database integrity; the disadvantage is less useful error reporting to the user. Please note that when creating a composite index, the order of the columns is important.
If you put less selective columns first in the index and the most selective columns at the end, other queries that have conditions on the non-leading index columns may also take advantage of an INDEX SKIP SCAN. You may need to add one more index to benefit from this, but it is highly database-dependent.
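To soften that error-reporting disadvantage, you can rescue the exception the unique index raises and report it yourself. A sketch, assuming a Rails version that raises ActiveRecord::RecordNotUnique on duplicate-key errors:
begin
  CategoryPost.create!(:category_id => category.id, :post_id => post.id)
rescue ActiveRecord::RecordNotUnique
  # The unique index rejected a concurrent duplicate;
  # handle it like a failed uniqueness validation.
end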
3) Combination of both
You can combine both approaches, but I tend to prefer number one on its own.
I think you'll find it easier to validate uniqueness of one of the fields with the other as a scope:
FROM THE API:
validates_uniqueness_of(*attr_names)
Validates whether the value of the specified attributes are unique across the system. Useful for making sure that only one user can be named "davidhh".
class Person < ActiveRecord::Base
  validates_uniqueness_of :user_name, :scope => :account_id
end
It can also validate whether the value of the specified attributes are unique based on multiple scope parameters. For example, making sure that a teacher can only be on the schedule once per semester for a particular class.
class TeacherSchedule < ActiveRecord::Base
  validates_uniqueness_of :teacher_id, :scope => [:semester_id, :class_id]
end
When the record is created, a check is performed to make sure that no record exists in the database with the given value for the specified attribute (that maps to a column). When the record is updated, the same check is made but disregarding the record itself.
Configuration options:
* message - Specifies a custom error message (default is: "has already been taken")
* scope - One or more columns by which to limit the scope of the uniqueness constraint.
* case_sensitive - Looks for an exact match. Ignored by non-text columns (true by default).
* allow_nil - If set to true, skips this validation if the attribute is null (default is: false)
* if - Specifies a method, proc or string to call to determine if the validation should occur (e.g. :if => :allow_validation, or :if => Proc.new { |user| user.signup_step > 2 }). The method, proc or string should return or evaluate to a true or false value.
I implement both of the following when I have this issue in Rails:
1) You should have a unique composite index declared at the database level to ensure that the DBMS won't let a duplicate record get created.
2) To provide smoother error messages than the above alone, add a validation to the Rails model:
validates_each :category_id, :on => :create do |record, attr, value|
  c = value; p = record.post_id
  # If either value is missing, that problem will be caught by another validator
  if c && p && CategoryPost.find_by_category_id_and_post_id(c, p)
    record.errors.add :base, 'This post already has this category'
  end
end
A solution can be to add both the index and a model validation.
So in the migration you have:
add_index :categories_posts, [:category_id, :post_id], :unique => true
And in the model:
validates_uniqueness_of :category_id, :scope => :post_id
