Using crc32 tweak on has_many relations in Thinking Sphinx - ruby-on-rails

It's weird, actually. I have two models related through a has_many association. Here are my models:
#model city
class City < ActiveRecord::Base
belongs_to :state
end
#model state
class State < ActiveRecord::Base
has_many :cities
end
and I have a state index:
ThinkingSphinx::Index.define 'state', :with => :active_record do
indexes state_name, :sortable => true
#here is the problem
has "CRC32(cities.city_name)", :as => :city_name, :type => :integer
end
I want to use city_name as a filter. The code above doesn't work, and I get an error message when I run
rake ts:index
here is the error message
ERROR: index 'state_core': sql_range_query: Unknown column 'cities.city_name' in 'field list'
But when I also put city_name in the indexes block, as below, the indexer runs fine:
ThinkingSphinx::Index.define 'state', :with => :active_record do
indexes state_name, :sortable => true
indexes cities.city_name
has "CRC32(cities.city_name)", :as => :city_name, :type => :integer
end
Any suggestions?

Thinking Sphinx can't tell if you're referring to association tables within SQL snippets - so in your first example, there's nothing indicating that it needs to join on cities.
The join method within an index definition exists for this very purpose - so, try the following:
ThinkingSphinx::Index.define 'state', :with => :active_record do
indexes state_name, :sortable => true
has "CRC32(cities.city_name)", :as => :city_name, :type => :integer
join cities
end
However, it's worth noting a few things: firstly, you may also need to add cities.city_name to the GROUP BY clause, since it's not part of any aggregate values:
# within index definition
group_by 'cities.city_name'
But also: your State model has many cities, not just one, so it should actually be aggregated into a set of integer values, not just one. This means you don't need the group_by call, but you do need to add the aggregate behaviour yourself. This is done differently depending on whether you're using PostgreSQL or MySQL:
# PostgreSQL
has "array_to_string(array_agg(crc32(cities.name)), ',')",
:as => :city_names, :type => :integer, :multi => true
# MySQL
has "GROUP_CONCAT(CRC32(cities.name) SEPARATOR ',')",
:as => :city_names, :type => :integer, :multi => true
CRC32 is not a native function in PostgreSQL, and so you may need to add it yourself. Thinking Sphinx prior to v3 did this for you, but I've rewritten it so the CRC32 function is no longer required. This is largely due to the fact that CRC32 can result in collisions, and it can't be reversed, and so it's an inelegant and imperfect solution. Hence, I think using fields for string comparison is better, but it's up to you for whether this is preferred in your app.
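If you do stay with the CRC32 attribute, the search side has to send the same checksum rather than the string. A minimal sketch in plain Ruby, assuming Zlib's standard CRC-32 yields the same value as the SQL function used in the index (the helper name crc32_filter_value is made up for illustration, and the search call is shown only as a comment):

```ruby
require 'zlib'

# Zlib.crc32 returns an unsigned 32-bit integer, which is why the
# attribute is declared with :type => :integer.
# crc32_filter_value is a hypothetical helper, not part of Thinking Sphinx.
def crc32_filter_value(string)
  Zlib.crc32(string)
end

melbourne = crc32_filter_value('Melbourne')

# Assumed usage against the :city_names attribute defined above:
#   State.search :with => {:city_names => melbourne}

# The checksum is case-sensitive and cannot be turned back into the string.
puts melbourne
puts crc32_filter_value('melbourne') == melbourne # prints false
```

Because different strings can share a checksum, a match on this attribute is a strong hint rather than a guarantee, which is the collision caveat mentioned above.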
I would recommend this approach instead:
ThinkingSphinx::Index.define :state, :with => :active_record do
indexes state_name, :sortable => true
has cities.id, :as => :city_ids
end
city = City.find_by_name('Melbourne')
State.search :with => {:city_ids => city.id}
It's accurate and elegant.

Related

match two or more object from differents models with mongoid

I have 2 model:
class User
include Mongoid::Document
field :email, :type => String, :null => false, :default => ""
.
.
end
class Admin
include Mongoid::Document
field :email, :type => String, :null => false, :default => ""
.
.
end
I want a Mongoid query that finds all users whose email also exists in the Admin model, something like:
User.where(:email => {exist_admin_class?})
Is this possible? Or do I have to create a relationship between the two models, with User has_one and Admin belongs_to?
What is the best way to do this?
Thank you very much!
Indeed, MongoDB doesn't support cross-collection queries. But they aren't necessary, especially not for this requirement. I would suggest using inheritance instead:
mongoid HowTo
Why: simply because admins are a special kind of user.

tag synonyms in rails using thinking sphinx

What is a proper way to handle tag synonyms in rails?
Model is called Situation, I use acts_as_taggable_on for tags and ThinkingSphinx for search.
Situation.search :conditions => { :tag_name => '(synonym11 | synonym12) | (synonym21 | synonym22)' }, :match_mode => :boolean
but with proper ranking
Something like this helps:
define_index :main_index do
# fields
indexes :title
indexes :description
indexes user.name, :as => :author
# attributes
has id, user_id, created_at, updated_at
end
define_index :tag_index do
# fields
indexes taggings.tag.name, :as => :tag_name
# attributes
has id, user_id, created_at, updated_at
set_property :wordforms => 'db/synonyms.txt'
end
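The wordforms file referenced above is plain text with one mapping per line, in the form "source > destination". A small sketch of generating it from a Ruby hash; the synonym groups and the helper name wordforms_lines are made-up examples, not anything acts_as_taggable_on provides:

```ruby
# Hypothetical synonym groups: each canonical tag maps to the tags
# that should be treated as equivalent to it.
SYNONYMS = {
  'car'   => ['automobile', 'auto'],
  'photo' => ['picture', 'image']
}

# Build the lines of a wordforms file, one "source > destination" per line.
def wordforms_lines(groups)
  groups.flat_map do |canonical, synonyms|
    synonyms.map { |synonym| "#{synonym} > #{canonical}" }
  end
end

wordforms = wordforms_lines(SYNONYMS).join("\n") + "\n"
puts wordforms

# To use it with the index above, write it to the configured path, e.g.:
#   File.write('db/synonyms.txt', wordforms)
```

The index needs to be rebuilt after the file changes, since wordforms are applied at indexing time as well as at query time.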

Indexing a MongoDB on Heroku, with Rails and Mongoid

I have a Rails app running Mongoid on Heroku and I need to set up some indexes to make the database faster. I've tried to read about it in several places and I know Mongoid has some built-in functions for indexing, but I'm not sure what to apply them to or how often to index.
It is mostly my Design-model I want to index:
scope :full_member_and_show, where(full_member: true).and(show: true).desc(:created_at)
scope :not_full_member_and_show, where(full_member: false).and(show: true).desc(:created_at)
embeds_many :comments
belongs_to :designer
search_in :tags_array
attr_accessible :image, :tags, :description, :title, :featured, :project_number, :show
field :width
field :height
field :description
field :title
field :tags, type: Array
field :featured, :type => Boolean, :default => false
field :project_number, :type => Integer, :default => 0
field :show, :type => Boolean, :default => true
field :full_member, :type => Boolean, :default => false
field :first_design, :type => Boolean, :default => false
What do I need to index, how exactly do I do it with Mongoid and how often should I do it?
ERROR UPDATE
If I try to add the index below:
index({ full_member: 1, show: 1 }, { unique: true })
It throws me this error:
Invalid index specification {:full_member=>1, :show=>1}; should be either a string, symbol, or an array of arrays.
You don't need to index periodically: once you've added an index, Mongo keeps it up to date as the collection changes. This is the same as an index in MySQL or Postgres (you may have been thinking of something like Solr).
What to index depends on which queries you'll be making against your dataset. Indexes carry some overhead on updates and consume disk space, so you don't want to add them when you don't need them.
You tell Mongoid which indexes you want with the index method, for example:
class Person
include Mongoid::Document
index :city
end
There are loads of examples in the mongoid docs for the various kinds of indexes mongo supports.
Then you run
rake db:mongoid:create_indexes
This determines which indexes you want (based on the calls to index in your models) and then ensures that they exist in the database, creating them if necessary. In development you'd run this after adding indexes to your models. In production it makes sense to run it as part of your deploy (you only need to if you've added indexes since the last deploy, but it's far easier to just do it systematically).
There's a lot of information about how Mongo uses indexes in the documentation.

Thinking Sphinx - Showing the right record from the association

I have successfully got Thinking Sphinx working with Geolocation on an associated model. Happy days!
But I now need it to show the right associated record on a Google map.
The scenario is a Company with has_many offices. Offices have got the lng,lat values. I am searching on the Company and associating the offices to it.
E.g.
define_index do
indexes :name, :sortable => true
indexes offices(:city), :as => :city
indexes offices(:postal_code), :as => :postal_code
has "RADIANS(offices.lat)", :as => :lat, :type => :float
has "RADIANS(offices.lng)", :as => :lng, :type => :float
has created_at
has updated_at
set_property :latitude_attr => 'lat'
set_property :longitude_attr => 'lng'
set_property :field_weights => { 'name' => 10,
'service_name' => 9,
'city' => 8 }
end
Searching for x company in y location / postcode works perfectly, showing the correct companies that have got offices in the desired location within the #geodist radius.
E.g.
{:geo=>[0.9283660690549609, -0.050527407508941975], :sort_mode=>:expr, :sort_by=>"#weight * #weight / #geodist", :with=>{"#geodist"=>0.0..120700.8}, :conditions=>{:service_name=>"Business strategies"}, :page=>1, :per_page=>12, :star=>true}
The resulting records are company object, not the offices, which is fine for the list view but I want to show icons on a google map of the relevant associated office.
What is the best way to find the relevant associated office record to show within the radius bounds?
Sphinx only reliably handles single-value float attributes - and it has no concept of paired lat/lng values. This means that you can't have a solid search across objects with more than one lat/lng pair.
The best workaround is to actually search on Office instead - and perhaps pull in each office's company information:
define_index do
indexes company.name, :as => :name, :sortable => true
indexes city, postal_code
has "RADIANS(offices.lat)", :as => :lat, :type => :float
has "RADIANS(offices.lng)", :as => :lng, :type => :float
has company.created_at, :as => :created_at
has company.updated_at, :as => :updated_at
has company_id
set_property :field_weights => {
'name' => 10,
'service_name' => 9,
'city' => 8
}
end
And then when searching, you can group by company_id to ensure only one result for any company (if that's what you'd prefer):
Office.search 'foo',
:geo => [lat, lng],
:with => {'#geodist' => 0.0..120700.8},
:group_function => :attr,
:group_by => 'company_id'
If it's important as to which Office gets returned for a given company_id, then you'll probably want to use the :group_clause option as well. The docs cover this.
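To make the idea of "one office per company, chosen by distance" concrete, here is a plain-Ruby sketch of the same selection done on a list of results. It is only an illustration: the Office struct, the sample coordinates, and the helper names are invented, and the haversine formula stands in for Sphinx's own distance calculation rather than reproducing it.

```ruby
# Hypothetical stand-in for search results; lat/lng are in radians,
# matching the attributes in the index.
Office = Struct.new(:id, :company_id, :lat, :lng)

EARTH_RADIUS_M = 6_371_000.0

# Great-circle distance in metres using the haversine formula.
def distance_m(lat1, lng1, lat2, lng2)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Keep, for each company, only the office closest to the search origin.
def nearest_office_per_company(offices, origin_lat, origin_lng)
  offices.group_by(&:company_id).map do |_company_id, group|
    group.min_by { |o| distance_m(origin_lat, origin_lng, o.lat, o.lng) }
  end
end

origin = [0.9283660690549609, -0.050527407508941975]
offices = [
  Office.new(1, 10, 0.9284, -0.0505),
  Office.new(2, 10, 0.9400, -0.0600),
  Office.new(3, 20, 0.9290, -0.0510)
]

nearest = nearest_office_per_company(offices, *origin)
puts nearest.map(&:id).inspect # prints [1, 3]
```

Doing the selection inside Sphinx is preferable when paginating, since grouping in Ruby only sees the page of results that was returned.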

In a join table, what's the best workaround for Rails' absence of a composite key?

create_table :categories_posts, :id => false do |t|
t.column :category_id, :integer, :null => false
t.column :post_id, :integer, :null => false
end
I have a join table (as above) with columns that refer to a corresponding categories table and a posts table. I wanted to enforce a unique constraint on the composite key category_id, post_id in the categories_posts join table. But Rails does not support this (I believe).
To avoid the potential for duplicate rows in my data having the same combination of category_id and post_id, what's the best workaround for the absence of a composite key in Rails?
My assumptions here are:
The default auto-number column (id:integer) would do nothing to protect my data in this situation.
ActiveScaffold may provide a solution but I'm not sure if it's overkill to include it in my project simply for this single feature, especially if there is a more elegant answer.
Add a unique index that includes both columns. That will prevent you from inserting a record that contains a duplicate category_id/post_id pair.
add_index :categories_posts, [ :category_id, :post_id ], :unique => true, :name => 'by_category_and_post'
It's very hard to recommend the "right" approach.
1) The pragmatic approach
Use a validator and do not add a unique composite index. This gives you nice messages in the UI and it just works.
class CategoryPost < ActiveRecord::Base
belongs_to :category
belongs_to :post
validates_uniqueness_of :category_id, :scope => :post_id, :message => "can only have one post assigned"
end
You can also add two separate indexes in your join tables to speed up searches:
add_index :categories_posts, :category_id
add_index :categories_posts, :post_id
Please note that, according to the book The Rails 3 Way, the validation is not foolproof because of a potential race condition between the SELECT and INSERT/UPDATE queries. Use a unique constraint if you must be absolutely sure there are no duplicate records.
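The race is easier to see when the two steps are spelled out. The following is a deliberately simplified in-memory sketch, with a plain Array and a Set standing in for the table without and with a unique index; it does not use ActiveRecord, and the category_id/post_id values are made up:

```ruby
require 'set'

# Stand-in for the categories_posts table WITHOUT a unique index.
rows = []
pair = [1, 42] # hypothetical category_id, post_id

# Two requests run the validation's SELECT before either has inserted.
request_a_valid = !rows.include?(pair)
request_b_valid = !rows.include?(pair)

# Both passed validation, so both go on to INSERT.
rows << pair if request_a_valid
rows << pair if request_b_valid
puts rows.count(pair) # prints 2

# Stand-in for the table WITH a unique index: the check and the insert
# happen as one step, so the second insert is rejected.
unique_rows = Set.new
first_insert  = unique_rows.add?(pair) # returns the set: accepted
second_insert = unique_rows.add?(pair) # returns nil: rejected
puts unique_rows.size # prints 1
```

In a real application the interleaving comes from concurrent requests rather than from statement order, but the outcome is the same: only the database-level constraint can refuse the second row.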
2) The bulletproof approach
In this approach we put the constraint at the database level, which means creating a unique composite index:
add_index :categories_posts, [ :category_id, :post_id ], :unique => true, :name => 'by_category_and_post'
The big advantage is strong database integrity; the disadvantage is that error reporting to the user is not very helpful. Please note that when creating a composite index, the order of the columns is important.
If you put less selective columns first in the index and the most selective columns at the end, other queries that have conditions on non-leading index columns may also be able to take advantage of INDEX SKIP SCAN. You may need to add one more index to benefit from this, but it is highly database dependent.
3) Combination of both
One can also combine both approaches, but I tend to prefer option one alone.
I think you may find it easier to validate the uniqueness of one of the fields with the other as a scope:
FROM THE API:
validates_uniqueness_of(*attr_names)
Validates whether the value of the specified attributes are unique across the system. Useful for making sure that only one user can be named "davidhh".
class Person < ActiveRecord::Base
validates_uniqueness_of :user_name, :scope => :account_id
end
It can also validate whether the value of the specified attributes are unique based on multiple scope parameters. For example, making sure that a teacher can only be on the schedule once per semester for a particular class.
class TeacherSchedule < ActiveRecord::Base
validates_uniqueness_of :teacher_id, :scope => [:semester_id, :class_id]
end
When the record is created, a check is performed to make sure that no record exists in the database with the given value for the specified attribute (that maps to a column). When the record is updated, the same check is made but disregarding the record itself.
Configuration options:
* message - Specifies a custom error message (default is: "has already been taken")
* scope - One or more columns by which to limit the scope of the uniqueness constraint.
* case_sensitive - Looks for an exact match. Ignored by non-text columns (true by default).
* allow_nil - If set to true, skips this validation if the attribute is null (default is: false)
* if - Specifies a method, proc or string to call to determine if the validation should occur (e.g. :if => :allow_validation, or :if => Proc.new { |user| user.signup_step > 2 }). The method, proc or string should return or evaluate to a true or false value.
I implement both of the following when I have this issue in Rails:
1) Declare a unique composite index at the database level to ensure that the DBMS won't let a duplicate record be created.
2) To provide smoother error messages than the above gives on its own, add a validation to the Rails model:
validates_each :category_id, :on => :create do |record, attr, value|
c = value; p = record.post_id
if c && p && # If no values, then that problem
# will be caught by another validator
CategoryPost.find_by_category_id_and_post_id(c, p)
record.errors.add :base, 'This post already has this category'
end
end
One solution is to add both the index in the database and the validation in the model.
So in the migration you have:
add_index :categories_posts, [:category_id, :post_id], :unique => true
And in the model:
validates_uniqueness_of :category_id, :scope => :post_id
