Thinking Sphinx attribute from polymorphic association's datetime field - ruby-on-rails

I have a model A associated with model B via an INNER JOIN:
class A < ActiveRecord::Base
  has_many :bees, as: :bable

  scope :bees, -> {
    joins("INNER JOIN bees AS b ON id = b.bable_id .......")
  }
end

class B < ActiveRecord::Base
  self.table_name = "bees"
  belongs_to :bable, polymorphic: true
end
I need to filter using B's datetime field (created_at), so I declared a new attribute thus:
has bees.created_at, as: :b_created_at
The generated Sphinx query now includes:
GROUP_CONCAT(DISTINCT UNIX_TIMESTAMP(bees.`created_at`) SEPARATOR ',') AS `b_created_at`
After indexing, my Sphinx index file size exploded.
How much is the "GROUP_CONCAT" part of the query causing the problem, and is there a better way to filter by this attribute?
How can I debug the indexer and find other causes of the large index file being generated?
Thanks

It appears that the indexer is storing, within the index file, a comma-separated list of the created_at timestamps of all bees. Since created timestamps are generally unique, that list gets one entry per bee, so if you have a lot of bees it is going to be big.
I would look for some way to bypass Sphinx for this part of the query, if that is possible, and have it apply a direct SQL BETWEEN LowDateTs AND HighDateTs against the underlying created_at column instead. I hope this is possible - it will definitely be better than using a text index to find it.
Hope this is of some help.
Edit:
Speed-reading Sphinx's docs:
[...] WHERE clause. This clause will map both to fulltext query and filters. Comparison operators (=, !=, <, >, <=, >=), IN, AND, NOT, and BETWEEN are all supported and map directly to filters [...]
So the key is to stop it treating the timestamp as text to be searched and instead use a BETWEEN, which will be vastly more efficient and should stop it trying to use text indexing on this field.
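For reference, a minimal sketch of how the existing attribute could be filtered as a range in Thinking Sphinx (this assumes the b_created_at attribute from the question is kept; :with filters map to Sphinx attribute filters, not full-text matching):

# Hedged sketch: Thinking Sphinx turns a Ruby Range passed via :with
# into a Sphinx range filter on the attribute, so no text search is involved.
low  = 1.week.ago.to_i
high = Time.zone.now.to_i
A.search with: { b_created_at: low..high }

This does not shrink the index (every bee timestamp is still stored as a multi-value attribute), but it keeps the created_at comparison out of the full-text query.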

Related

Query in a string column for one of the values in an array, like multiple ORs (using full text search)

In a Rails 4 app, one of my models has a column containing multiple ids as a string of comma-separated values.
"123,4568,12"
I have a "search" engine that I use to retrieve the records with one or many values using the full text search of postgresql I can do something like this which is very useful:
records = MyModel.where("my_models.col_name ## ?", ["12","234"])
This return all the records that have both 12 and 234 in the targeted column. The array comes from a form with a multiple select.
Now I'm trying to make a query that will find all the records that have either 12 or 234 in there string.
I was hopping to be able to do something like:
records = MyModel.where("my_models.col_name IN (?)", ["12","234"])
But it's not working.
Should I iterate through all the values in the array to build a query with multiple ORs? Is there a more appropriate way to do this?
EDIT / TL;DR
@BoraMa's answer is a good way to achieve this.
To find all the records containing one or more ids referenced in the request use:
records = MyModel.where("my_models.col_name @@ to_tsquery(?)", ["12","234"].join('|'))
You need the to_tsquery(?) and the join with a single pipe | to do an OR-like query.
To find all the records containing exactly all the ids in the query use:
records = MyModel.where("my_models.col_name @@ ?", ["12","234"])
And of course replace ["12","234"] with something like params[:params_from_my_form]
Postgres documentation for full text search
If you already started to use the fulltext search in Postgres in the first place, I'd try to leverage it again. I think you can use a fulltext OR query, which can be constructed like this:
records = MyModel.where("my_models.col_name @@ to_tsquery(?)", ["12","234"].join(" | "))
This uses the | operator for ORing fulltext queries in Postgres. I have not tested this and maybe you'll need to do to_tsvector('my_models.col_name') for this to work.
See the documentation for more info.
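Sketching the untested to_tsvector variant mentioned above (a hedged example; it assumes col_name is a plain string column, and the table/column names come from the question):

# Hedged sketch: cast the plain text column to a tsvector before matching.
# '12 | 234' matches rows containing either id; '12 & 234' would require both.
ids = ["12", "234"]
records = MyModel.where(
  "to_tsvector(my_models.col_name) @@ to_tsquery(?)",
  ids.join(" | ")
)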
Suppose your ids are:
a="1,2,3,4"
You can simply use:
ModelName.find(a.split(','))
This will give you all the records of that model whose id is present in a.
A super simple solution: sort the ids in a save callback on MyModel, then the query becomes much simpler:
class MyModel < ActiveRecord::Base
  before_save :sort_ids_in_col_name, if: :col_name_changed?

  private

  def sort_ids_in_col_name
    self.col_name = self.col_name.to_s.split(',').sort.join(',')
  end
end
Then the query will be easy:
ids = ["12","234"]
records = MyModel.where(col_name: ids.sort.join(','))

Not able to index multiple timestamps for an object in Thinking Sphinx

I have two models, User and Order.
The User model has has_many :orders.
I am creating an index in Thinking Sphinx like:
has association(:created_at), as: :order_time, type: :timestamp
Now I want to search for users who have created any order in some time range, using the above index as:
User.search with: {:order_time => t1..t2}
But this is not giving accurate results. Any idea what I am doing wrong here?
I also tried writing a SQL query, something like:
user_order_time = <<-SQL
SELECT orders.created_at
FROM orders
WHERE (orders.creator_id = users.id)
SQL
and added the index this way:
has "#{user_order_time}", as: :order_time, type: :timestamp
and tried to use this index, but even this isn't working.
Can anyone tell me the problem with each approach?
Firstly, this answer is written presuming you're using SQL-backed indices (using the :with => :active_record option in your index definition) rather than real-time indices, and you're using Thinking Sphinx v3.
To cover your second approach first:
user_order_time = <<-SQL
  SELECT orders.created_at
  FROM orders
  WHERE (orders.creator_id = users.id)
SQL
has "#{user_order_time}", as: :order_time, type: :timestamp
This will not work. You can refer to SQL snippets in attributes and fields, but only the sections that go in the SELECT clause. You cannot use full SQL queries.
However, with this approach, you're on the right track:
has association(:created_at), as: :order_time, type: :timestamp
Are you using association there just when writing this question, not in your actual code? Because it should be something like this:
has orders.created_at, as: :order_time
I've not specified the type - Thinking Sphinx will automatically detect this from the database.
If that doesn't work, it's worth looking at the generated SQL query in the Sphinx configuration file for clues as to why it's not returning the values you're expecting (locally, that's config/development.sphinx.conf, and you're looking for the sql_query setting in source user_core_0).
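To make that concrete, here is a minimal sketch of the kind of SQL-backed index definition and query being suggested (Thinking Sphinx v3; the extra indexed field is a placeholder assumption, not from the question):

# app/indices/user_index.rb - hedged sketch, not the asker's actual index
ThinkingSphinx::Index.define :user, with: :active_record do
  indexes name                            # placeholder full-text field
  has orders.created_at, as: :order_time  # the attribute discussed above
end

# After rebuilding the index, filter on the attribute with a range:
User.search with: { order_time: t1..t2 }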

Comparing one attribute to another with Ransack

Ransack allows me to build conditions with an attribute, a predicate and a value. I haven't been able to find any documentation on how to compare one attribute to another however. For instance, how could I create a condition for:
WHERE column_a < column_b
I've been using Ransack for quite a while, but I don't see any possibility to do what you are looking for. What you want is a "case -> when" statement, which can be produced in Rails or as SQL with ActiveRecord.
Ransack gives you the ability to create a custom SQL condition by defining an attribute, a predicate and a value, which then translates into the WHERE statement you already mentioned. I don't see any way to tell Ransack directly to filter for what you want. However:
What you could do is create a scope like:
scope :column_b_gt_column_a, -> { where('column_b > column_a') }
And then you can build your search like this:
Object.ransack(column_b_gt_column_a: true)
Probably not really what you were looking for, but I think that's the best you're gonna get...
And if you want to do it in plain Rails, you could compare the values in Ruby like below, or use the where statement I used above.
records.each do |i|
  case
  when i.variable_a == i.variable_b
    # do something when they are equal
  when i.variable_a > i.variable_b
    # do something when it's greater
  end
end
For an example of an SQL statement look here
How do I compare two columns for equality in SQL Server?
Hope this helps a bit!
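As a hedged sketch of how that scope approach can be wired up (assuming a current Ransack version, where scopes must be whitelisted before they can be used in a search, and using a hypothetical Record model):

class Record < ActiveRecord::Base
  scope :column_b_gt_column_a, -> { where('column_b > column_a') }

  # Ransack only exposes scopes that are explicitly whitelisted here.
  def self.ransackable_scopes(_auth_object = nil)
    [:column_b_gt_column_a]
  end
end

# Usage: a truthy value switches the scope on in the search.
Record.ransack(column_b_gt_column_a: true).result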

How can I include more fields to where condition using ransack?

I'm building a query using Ransack predicates, which means I need a lot of conditions in the WHERE clause, SQL-wise. Here's what I have as a Hash of conditions for Ransack's search method:
This is the console output of params[:q]
{"price_gteq"=>"0", "rooms_gteq"=>"1", "baths_gteq"=>"1", "has_photos_true"=>"0", "zone_id"=>"1", "deal_type_eq"=>"1", "s"=>{"0"=>{"name"=>"price", "dir"=>"asc"}}, "price_lteq"=>"2500000", "rooms_lteq"=>"2", "baths_lteq"=>"7", "sector_id_in"=>nil}
And this is the SQL query it generates:
Property.search(params[:q]).result.to_sql
Which gives:
SELECT "properties".* FROM "properties" LEFT OUTER JOIN "sectors" ON "sectors"."id" = "properties"."sector_id" WHERE (("properties"."price" >= 0.0 AND "properties"."price" <= 2500000.0)) ORDER BY "properties"."price" ASC
How can I include ALL the fields inside the where clause?
As you can see, only the :price field remains inside the where clause. I need :baths and :rooms to be included in the where clause too.
How can I fix it?
Fortunately I've solved my own problem. What happened was that I was using an UNRANSACKABLE_ATTRS array in my model to exclude the aforementioned fields from sorting (the default sorting mechanism provided by Ransack), but I didn't know this would affect searching as well. Anyway, I removed the UNRANSACKABLE_ATTRS array and every condition in the search started to work as expected. :)
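For reference, a hedged sketch of the whitelisting pattern that causes this behaviour (the constant name comes from the question; its contents here are hypothetical). Attributes excluded this way are invisible to Ransack for both sorting and filtering, which is why they dropped out of the WHERE clause:

class Property < ActiveRecord::Base
  UNRANSACKABLE_ATTRS = %w[rooms baths].freeze  # hypothetical contents

  # Ransack consults this whitelist for search and sort conditions alike.
  def self.ransackable_attributes(auth_object = nil)
    column_names - UNRANSACKABLE_ATTRS
  end
end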

Custom query with Squeel on equality of two attributes to ordered pairs

I'm having difficulty writing a query, whether in SQL or not, but I'd like to be able to write it in Ruby using the Squeel gem. Basically, I have a class:
class Article < ActiveRecord::Base
  belongs_to :genre
  belongs_to :publisher
  ...
end
I want a scope that takes in an array of ordered pairs of the format (genre_id, publisher_id) and outputs an ActiveRecord::Relation that contains all of the records with genre, publisher pairs equal to the pairs passed in.
Basically, I want Article.where{ genre_id.eq_any [1, 5, 76] }, except instead of a single attribute, I want it on a pair of attributes:
Article.where{ genre_id_publisher_id.eq_any [[1, 2], [1, 4], [2, 4]] }
The only way I can think of doing it is making a scope with a select call that adds a column which would be the concatenation of the two ids with some delimiter, then searching based on that:
Article.select("articles.*, CONCAT(genre_id, '-', publisher_id) AS genre_publisher_pair")
       .where("CONCAT(genre_id, '-', publisher_id) IN (?)", ["1-2", "1-4", "2-4"])
However, that seems like too much overhead. I also already have a compound index on [genre_id, publisher_id] and I'm afraid this won't use that index, which is going to be a requirement for me.
Is there an easier way to write these compound scopes? Thanks!
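The thread ends without an answer; as a hedged aside (an assumption, not something from the thread), the pair match can also be expressed directly in SQL with a row-value constructor, which avoids the concatenation column and, depending on the database and version, may still be able to use the compound index on [genre_id, publisher_id]:

# Hedged sketch: row-value IN list, supported by both MySQL and PostgreSQL.
pairs = [[1, 2], [1, 4], [2, 4]]
placeholders = pairs.map { "(?, ?)" }.join(", ")
Article.where("(genre_id, publisher_id) IN (#{placeholders})", *pairs.flatten)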
