Sunspot Solr index time boost - ruby-on-rails

I'm trying to use document boost at index time, but it seems to have no effect. I've set up my model for Sunspot like this:
Spree::Product.class_eval do
  searchable :auto_index => true, :auto_remove => true do
    text :name, :boost => 2.0, :stored => true
    text :description, :boost => 1.2, :stored => false
    boost { boost_value }
  end
end
The boost_value field is a database column that a user can change in the frontend. It gets applied at index time (either when I first build the index, or when a product is updated). I have about 3600 products in my database, with a default boost_value of 1.0. Two of the products have different boost_values: one 5.0 and the other 2.0.
However, if I simply retrieve all products from Solr, the document boost seems to have no effect on the order or the score:
solr = ::Sunspot.new_search(Spree::Product) do |query|
  query.order_by("score", "desc")
  query.paginate(page: 1, per_page: Spree::Product.count)
end
solr.execute
solr.results.first
The Solr query itself looks like this:
http://localhost:8982/solr/collection1/select?sort=score+desc&start=0&q=*:*&wt=xml&fq=type:Spree\:\:Product&rows=3600&debugQuery=true
I've appended debugQuery=true at the end to see what the scores are, but no scores are shown.
The same thing happens when I search for a term. For example, I have two products that contain the unique string testtest in the name field. When I search for this term, the document boost has no effect on the order.
So my questions are:
Can per document index time boosting be used based on a database field?
Does the document boost have any effect for q=*:*?
How can I debug this?
Or do I have to tell Solr explicitly to take the document boost into account?

In Solr, boosts only affect scoring for text searches, so a document boost only matters when you run a fulltext search.
Something like this:
solr = ::Sunspot.new_search(Spree::Product) do |query|
  query.fulltext 'somesearch'
  query.order_by("score", "desc") # I think this isn't necessary
  query.paginate(page: 1, per_page: Spree::Product.count)
end
If you want to boost certain products more than others:
solr = ::Sunspot.new_search(Spree::Product) do |query|
  query.fulltext 'somesearch' do
    boost(2.0) { with(:featured, true) }
  end
  query.paginate(page: 1, per_page: Spree::Product.count)
end
As you can see, this is much more powerful than boosting at index time: you can apply different boosts under different conditions, all at query time, with no need to reindex when you want to change the boost or the conditions.
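If you still want to verify what the index-time boost is doing, you can inspect the scores Sunspot gets back from Solr (Sunspot populates hit.score on fulltext searches). A minimal sketch; the helper name and the testtest term are just illustrations:

```ruby
# Turn a Sunspot search into [primary_key, score] pairs so you can
# eyeball how the document boost shifts the ranking.
def score_table(search)
  search.hits.map { |hit| [hit.primary_key, hit.score] }
end

# Usage (assumes a running Solr):
#   search = Spree::Product.search { fulltext 'testtest' }
#   score_table(search).each { |id, score| puts "#{id}: #{score}" }
```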

Related

how to rank records dynamically in a result set in rails using sunspot solr

I have users, cafes, and their food_items (which have some ingredients listed). Until now I have used Solr to search for food_items by ingredients that a user likes, implemented with sunspot-solr following the Sunspot docs.
I am also able to compute a relative like-ness of a user to different cafes (based on how many times he has visited one, searched its menu, etc.); this is a dynamic value generated on the fly.
Problem:
I want to show the same results (food_items) fetched via Solr, re-ranked by cafe (based on the like-ness of the user to each cafe), using Sunspot Solr for Rails.
The app is hosted on Heroku and uses websolr.
I have found these:
https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking
https://cwiki.apache.org/confluence/display/solr/RankQuery+API
but I have no idea how to create a QParserPlugin or generate a rank query in Sunspot.
Sunspot provides a way to write custom queries, so if I could get help constructing a query that fetches the like-ness and ranks each record, or any other way to implement such logic, that would be great. Thanks!
You can do something like this:
def build_query(where_conditions)
  condition_procs = where_conditions.map { |c| build_condition(c) }
  Sunspot.search(table_clazz) do
    condition_procs.each { |c| instance_eval(&c) }
    paginate(:page => page, :per_page => per_page)
  end
end
def build_condition(condition)
  Proc.new do
    # written as if it were inside the Sunspot search block
    keywords condition[:words], :fields => condition[:field].to_sym
  end
end
conditions = [{ words: "tasty pizza", field: "title" },
              { words: "cheap", field: "description" }]
build_query(conditions)
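For the re-ranking part of the question specifically: rather than writing a QParserPlugin, you may be able to pass Solr's ReRank parameters straight through Sunspot's adjust_solr_params hook. A sketch under assumptions: the cafe id lives in a Solr field named cafe_id_i, your Solr version supports the {!rerank} parser, and the reRankDocs/reRankWeight numbers are made-up starting points:

```ruby
# Build ReRank parameters from app-computed cafe affinity scores,
# e.g. { 123 => 5.0, 456 => 2.0 } boosts cafe 123's items the most.
def rerank_params(cafe_scores)
  boosted = cafe_scores.map { |id, weight| "#{id}^#{weight}" }.join(' ')
  { :rq  => '{!rerank reRankQuery=$rqq reRankDocs=200 reRankWeight=3.0}',
    :rqq => "cafe_id_i:(#{boosted})" }
end

# Compute the extra params outside the search block (the block is
# instance_eval'd, so locals are captured via the closure).
def search_food_items(ingredients, cafe_scores)
  extra = rerank_params(cafe_scores)
  Sunspot.search(FoodItem) do
    fulltext ingredients
    adjust_solr_params { |params| params.merge!(extra) }
  end
end
```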

Sunspot Solr isn't indexing all models

I have a model Post, and an STI model Review. Here is the search config for Post:
searchable do
  text :title, :content
  text :username do
    user.try(:username)
  end
  text :user_full_name do
    user.try(:full_name_with_username)
  end
  text :user_full_name_with_username do
    user.try(:full_name_with_username)
  end
end
The problem is that not all records show up, even if I add a specific record via Sunspot.index(Review.find(id)) and Sunspot.commit.
After indexing, I try to find some reviews by username:
reviews_ids = Review.search do
  fulltext params[:titles_search] do
    fields(:username)
  end
end.results.map(&:id)
and not all reviews appear in the results.
What could it be? How can I debug it?
The search results from Solr are always paginated, so I suspect you're simply not seeing all the results of your search. By default, I believe results are paginated at 20 per page.
To see all the search results at once you could do something like:
search = Review.search do
  fulltext params[:titles_search] do
    fields(:username)
  end
  paginate page: 1, per_page: Review.count
end
This sets the number of elements per page equal to the total number of Reviews, so you should see all the search results without paging. Obviously, if you have a large number of Review objects, this will be a huge memory hog.
A better way to do it is to work with the pagination behavior, so if you rewrite your search like this:
def search_reviews(opts = {})
  options = {
    page: 1,
    per_page: 20,
  }.merge(opts)
  Review.search do
    fulltext params[:titles_search] do
      fields(:username)
    end
    # one paginate call: a second call would overwrite the first
    paginate page: options[:page], per_page: options[:per_page]
  end.results
end
You can call it successively with different page numbers to get all your results.
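The page-walking itself can be wrapped in a small helper. A sketch that assumes the search_reviews method above; the page size is arbitrary:

```ruby
# Yield each page of results until an empty page comes back, so no
# single request has to load Review.count records at once.
def each_review_page(per_page: 50)
  page = 1
  loop do
    results = search_reviews(page: page, per_page: per_page)
    break if results.empty?
    yield results
    page += 1
  end
end

# Usage:
#   all_ids = []
#   each_review_page { |results| all_ids.concat(results.map(&:id)) }
```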

Elasticsearch with Tire, how can I get global facets to take queries and other facets into account when reporting the count?

The problem can best be shown with a working example to get an idea of what I'm trying to do.
Example on Newegg's faceted search page...
Here's a link to just internal hard drives:
http://www.newegg.com/All-Desktop-Hard-Drives/SubCategory/ID-14
Here's the same link but with the $25-$50 facet applied:
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=100007603%204025&IsNodeId=1&name=%2425%20-%20%2450
Notice how when the $25-$50 checkbox was checked the other price check boxes didn't disappear and they still retain the correct count value. This leads me to believe they are using global facets.
You can now click the $50-$75 facet and you get this:
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=100007603%204025%204026&IsNodeId=1&name=%2450%20-%20%2475
In this case it still shows the count for each price range on the left but you get the results of both facets shown.
If you do a search for let's say "Western":
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=100007603%204025%204026&IsNodeId=1&bop=And&SrchInDesc=western&Page=1&PageSize=20
Now it's all filtered correctly by the search terms and facets. The counts are correct in the faceted navigation and so are the search results.
I'm trying to replicate that behavior and I almost have it working (maybe).
Here is my search method:
def self.search(params = {}, items_per_page: 25, sort_by: :published_at, sort_direction: "desc")
  tire.search(page: (params[:page] || 1), per_page: items_per_page) do
    query do
      filtered do
        query { string params[:search] } if params[:search].present?
        terms = []
        terms << { term: { category_id: params[:category_id] } } if params[:category_id].present?
        terms << { term: { published: params[:published] } } if params[:published].present?
        terms << { range: { word_count: { gte: params[:word_count].to_i } } } if params[:word_count].present?
        filter :and, terms unless terms.empty?
      end
    end
    facet "categories" do
      terms :category_id
    end
    facet "word_count" do
      histogram :word_count, interval: 200
    end
    facet "visible" do
      terms :published
    end
  end
end
With that setup everything works except that, for example, I can't facet on 2 categories at once: as soon as I check category 1, the checkboxes for categories 2, 3, and so on disappear.
This makes me think I need global: true on my facets, but then the facet counts are never correct: they don't take the search query or the other facets into account. How can I get the best of both worlds? It seems crippled/unusable otherwise.
If doing global: true isn't the answer here then what else can I do?
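One way to get there is the "multi-select facet" pattern: make each facet global, but give it a facet_filter built from the query plus every other facet's selections, so its counts respect everything except its own choices. This sketch assumes Tire merges extra facet options like :global and :facet_filter into the facet body (it did in the versions I've seen), and the param names are illustrative:

```ruby
# Build the filter for one facet: the fulltext query plus the *other*
# facets' selections, never the facet's own (pass its name as `except`).
# Returns an Elasticsearch 0.90/1.x-style filter hash.
def facet_filter_for(params, except)
  clauses = []
  clauses << { query: { query_string: { query: params[:search] } } } if params[:search]
  clauses << { terms: { category_id: params[:category_ids] } } if except != :categories && params[:category_ids]
  clauses << { term: { published: params[:published] } } if except != :visible && !params[:published].nil?
  clauses.empty? ? { match_all: {} } : { and: clauses }
end

# Inside the search method, compute the filters before the tire.search
# block (the block is instance_eval'd, so locals come in via closure):
#   cat_filter = facet_filter_for(params, :categories)
#   vis_filter = facet_filter_for(params, :visible)
#   tire.search(...) do
#     ...
#     facet "categories", global: true, facet_filter: cat_filter do
#       terms :category_id
#     end
#     facet "visible", global: true, facet_filter: vis_filter do
#       terms :published
#     end
#   end
```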

thinking_sphinx results order

I have no experience with thinking_sphinx (I take pride in the fact that I even got it working).
I would like to sort my results by a mix of relevance to the search and how recent they are. Maybe 5x for relevance and 1x for time (I'd have to play with the ratio to get it right). Obviously, if there's no search criteria, I'd like it to sort by time alone.
I know I need to add the created_at column to the search model, but not with indexes (what term do I use?).
Report controller:
def index
  @reports = Report.search params[:search]
  # unknown sorting code here
end
Report model:
define_index do
  indexes apparatus
  indexes body
  indexes comments.body, :as => :comment_body
  ????? created_at
end
You would just do:
define_index do
  indexes apparatus
  indexes body
  indexes comments.body, :as => :comment_body
  has created_at
end
Using has, you declare attributes: fields you need for sorting or filtering but aren't full-text indexing.
For sorting the search results, read the Sphinx docs for how you'd want them weighted and sorted:
http://freelancing-god.github.com/ts/en/searching.html#sorting
http://freelancing-god.github.com/ts/en/searching.html#fieldweights
By default, Sphinx sorts based on how relevant it thinks the results are to the given inputs.
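To mix the two orderings, Sphinx's expression sort mode is the usual tool. A sketch assuming thinking_sphinx v1/v2 options (:sort_mode => :expr with :order as the expression); the 5 and 0.0001 weights are made-up starting points to tune:

```ruby
# @weight is Sphinx's relevance score; created_at is the attribute
# declared with `has`. Fall back to plain recency with no search term.
def search_reports(term)
  if term && !term.strip.empty?
    Report.search term,
      :sort_mode => :expr,
      :order     => '@weight * 5 + created_at * 0.0001'
  else
    Report.search :order => 'created_at DESC'
  end
end
```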

Filtering Sphinx search results by date range

I have Widget.title, Widget.publish_at, and Widget.unpublish_at. It's a Rails app with thinking_sphinx running, indexing once a night. I want to find all Widgets that have 'foo' in the title and are published (publish_at < Time.now, unpublish_at > Time.now).
To get pagination to work properly, I really want to do this in a single Sphinx query. I have 'has :publish_at, :unpublish_at' to get the attributes, but what's the syntax for something like Widget.search("foo @publish_at > #{Time.now}", :match_mode => :extended)? Is this even possible?
Yep, easily possible, just make sure you're covering the times in your indexes:
class Widget < ActiveRecord::Base
  define_index do
    indexes title
    has publish_at
    has unpublish_at
    ...
  end
end
To filter purely on the dates, a small amount of trickery is required because Sphinx needs a bounded range (x..y as opposed to x >= y). The use of min/max values is inelegant, but I'm not aware of a good way around it at the moment.
min_time = Time.now.advance(:years => -10)
max_time = Time.now.advance(:years => 10)
title = "foo"
Widget.search title, :with => {:publish_at => min_time..Time.now, :unpublish_at => Time.now..max_time}
I haven't used Sphinx with Rails yet, but this is possible with the plain Sphinx API.
What you need to do is define a datetime attribute in your sphinx.conf.
And don't forget to use UNIX_TIMESTAMP(publish_at) and UNIX_TIMESTAMP(unpublish_at) in your index's SELECT.
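For reference, the plain-Sphinx configuration that thinking_sphinx's `has publish_at` generates for you looks roughly like this (table and field names are illustrative):

```
source widgets
{
  # expose the timestamps as sortable/filterable attributes
  sql_query = SELECT id, title, \
      UNIX_TIMESTAMP(publish_at)   AS publish_at, \
      UNIX_TIMESTAMP(unpublish_at) AS unpublish_at \
    FROM widgets
  sql_attr_timestamp = publish_at
  sql_attr_timestamp = unpublish_at
}
```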