What should I do to speed up Sphinx indexing (using MySQL)?
Should I use a different database, such as a NoSQL database?
Note that delta indexing is fast; only a full reindex is slow.
Please explain in detail. Thanks!
UPDATE:
I'm reindexing over 100,000 items, and my thinking-sphinx index definition looks like this:
define_index do
  indexes [text, user(:nickname), user(:full_name)]
  has rewrites(:id), :as => :rewrite_id
  has rewrites(:user_id), :as => :rewrite_user_id
  has [rewrites(:user_id), user_id], :as => :user_id_or_rewrites_user_id
  has comments(:user_id), :as => :comments_user_id
  has simbols(:id), :as => :simbol_ids
  has followings(:follower_id), :as => :follower_id
  has follows(:followable_id), :as => :followable_id
  has created_at, :sortable => true
  has rewrites_count, :sortable => true
  has relevance, :sortable => true
  has user_id
  set_property :delta => :datetime
end
Building a full index is slow. How slow?
Building a delta index is fast.
This sounds normal in my experience.
NoSQL databases were, last I heard (around Rails 2.3.5), rather difficult to integrate with Rails, and NoSQL speeds depend on your data sets and relations.
Without more information this sounds normal.
== Edit ==
Make sure you have SQL indexes on
created_at
rewrites_count
relevance
in addition to your primary keys naturally.
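A minimal sketch of the indexes listed above. It only builds MySQL CREATE INDEX statements as strings; the table name "posts" is an assumption (the question never names the table), and the index names simply follow the Rails index_<table>_on_<column> naming habit.

```ruby
# Columns from the list above that the Sphinx source query sorts or filters on.
COLUMNS = %w[created_at rewrites_count relevance].freeze

# Build one CREATE INDEX statement per column.
# `table` is a placeholder; substitute the table behind the indexed model.
def index_statements(table, columns = COLUMNS)
  columns.map do |column|
    "CREATE INDEX index_#{table}_on_#{column} ON #{table} (#{column});"
  end
end

# "posts" is a hypothetical table name used only for illustration.
puts index_statements("posts")
```

In a Rails project the same thing would normally live in a migration; the statements here are just a quick way to see what needs to exist.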
When working with thinking_sphinx always look at the SQL it generates in the real sphinx configuration file. Run a query analyzer against all the queries it will run. I have found you can also manipulate the queries quite a bit.
Also, for one-to-many relationships you may need to add this to the attribute:
:source => :ranged_query
It causes Sphinx to use a separate query to gather the children rather than an outer join, which is much faster in many cases.
How about using real-time indexes (with the memory limit adjusted appropriately)?
UPDATE:
Ok so I am no longer getting an error, but I am not getting any results back (even when there is only one search option).
I added the has clause to my list of indexes:
define_index do
  has bar_profiles(:day), :as => :days
  indexes bar_profiles.budget, :as => :budget_tags
  .
  .
end
So my search is:
bars = Bar.search(search_options)
with
search_options = {:conditions=>{:budget_tags=>"LOW BUDGET"}, :with=>{:days=>"thursday"}, :page=>1, :per_page=>20}
bar_profiles has rows for budget, experience, tags, day, etc.
Bar has many bar_profiles (potentially 1 for each day)
What I am trying to do is use the thinking sphinx search (in the bar model) to match the user's selected criteria for budget, experience, and tags against the bar_profile whose day matches "today" (the current day).
This is the last thing I have to do to finish this app, and I'm pulling my hair out because I can't find any examples of how to set this up right...
If you have any insight please post it, anything helps. Thanks.
At first I thought my question was similar to this with an extra layer of abstraction, but I think my problem is with the search options, not the indexing.
First off, let me state that I have been having the worst time trying to fix a previous group's implementation using thinking sphinx. I have finally got the project 90% working, and the last 10% deals with getting the right filters to check against.
Here is a brief overview. The application has bars and bar_profiles (amongst many other tables that connect to these 2, and users, but they are not necessary to understand this issue.) There can be a bar_profile for each day of the week, for each bar.
So in the bar_profile model there is:
belongs_to :bar
and in bar there is:
has_many :bar_profiles
followed by the indexes in bar (written by the previous developer):
define_index do
  # name is a reserved keyword so we need to specify it as a symbol to work
  indexes :name
  indexes tags
  indexes bar_profiles.day, :as => :day
  indexes bar_profiles.tags, :as => :daily_tags
  indexes bar_profiles.budget, :as => :budget_tags
  indexes bar_profiles.experience, :as => :experience_tags
  set_property :delta => true
end
The issue I am having is that the current implementation does not constrain the search properly to the current day. Instead of checking the current day's profile for the bar, it seems to be checking against ALL of the bar's profiles.
So I set the current day at the start of the method:
today = (Time.now + Time.zone_offset('EST')).strftime("%A")
Then I think it needs to be something like below. I referenced this post by pat about using 'with', but I am not sure if I am messing up the syntax (because I am getting an error):
search_options = {:conditions => {}, :with => {:day=>today}, :page => 1, :per_page => algorithm.results_per_page}
Then I use these search options:
search_options[:conditions][:experience_tags] = options[:experience] unless options[:experience].blank?
budget = combine_budgets(options[:budget])
search_options[:conditions][:budget_tags] = budget unless budget.blank?
But when I try to run the search I get this in my development log:
^^^^ ERROR! Reason: index bar_core,bar_delta: no such filter attribute 'day'
Now I am pretty confused by this, since the index for :day was set up as shown above... I'm not sure if a 'filter attribute' is different than an index attribute. If someone could please offer some insight into this, it would be greatly appreciated (looking at you, @pat).
This is the final issue in this app, so if anyone can help me I would be very grateful.
Thanks,
Alan
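A hedged note on the error above: in Sphinx a field (declared with indexes) holds full-text content and is matched through :conditions, while an attribute (declared with has) is what :with filters on, which is why a day declared with indexes is reported as a missing filter attribute. Sphinx versions of that era also could not filter on string attributes; the workaround described in the thinking-sphinx documentation is to store a CRC32 of the string as an integer attribute and filter on the CRC32 of the value being searched for. The sketch below shows only the plain-Ruby side. The helper names day_filter_value and weekday_in_est, the attribute name :day_crc, and the assumption that days are stored lowercase are mine, not part of the original code.

```ruby
require "zlib"

# Hypothetical helper: turn a day name into the integer that an attribute
# holding CRC32(day) would contain. Assumes days are stored lowercase.
def day_filter_value(day_name)
  Zlib.crc32(day_name.downcase)
end

# Weekday name at a fixed UTC-5 offset (EST), independent of the server's zone.
def weekday_in_est(time = Time.now)
  time.getlocal("-05:00").strftime("%A")
end

today = weekday_in_est

# :day_crc is an assumed attribute name; the index would need a matching
# integer attribute built from CRC32 of the day column.
search_options = {
  :conditions => {},                                  # full-text fields
  :with       => { :day_crc => day_filter_value(today) }, # attribute filter
  :page       => 1,
  :per_page   => 20
}
p search_options
```

Note that this only addresses the filter error. Whether the budget and experience fields are matched against the same day's profile is a separate question that depends on how the index joins bar_profiles.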
I want my search engine to be able to order Lawyers by their number of cases of a certain case type. The more finalized cases of a certain type a lawyer has, the higher he will be ranked.
lawyer.rb
has_many :cases
has_many :case_types, :through => :cases
define_index do
  indexes case_types.name, :as => :case_types
  has case_types(:id), :as => :case_types_id
  has "SUM(case_types)", :as => :case_type_count # this line gives an error, as my lawyer table doesn't have a case_type column; also, I need to count DISTINCT case_types
end
In my search_controller.rb, I would like to do something like this, where suggestion is the name of a case type:
@lawyers = Lawyer.search params[:suggestion], :order => "@case_type_count DESC"
Am I going the wrong way? Should I think of a less Sphinx-oriented method? The problem is that I need to do an each_with_geodist on @lawyers, so I would need to get my lawyers through a Sphinx search.
Add the following to your define_index:
has "COUNT(case_types.id)", :as => :case_type_count, :type => :integer
join case_types
Then sort by case_type_count:
Lawyer.search("", :order => "case_type_count desc")
I have found it useful to read the SQL in development.sphinx.conf, which lets me see the column names being generated.
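As a rough sketch of that habit, the snippet below pulls the main sql_query out of each source block in a Sphinx config so it can be pasted after EXPLAIN. The helper name sphinx_sql_queries and the sample config text are made up for illustration; real thinking-sphinx output will be longer and may differ in layout.

```ruby
# Collect the main sql_query of each source block in a Sphinx config.
# Returns a hash of { source_name => query }.
def sphinx_sql_queries(conf_text)
  queries = {}
  current_source = nil
  # Join lines that end in a backslash continuation before scanning.
  joined = conf_text.gsub(/\s*\\\r?\n\s*/, " ")
  joined.each_line do |line|
    if (m = line.match(/^\s*source\s+(\S+)/))
      current_source = m[1]
    elsif current_source && (m = line.match(/^\s*sql_query\s*=\s*(.+?)\s*$/))
      # sql_query_pre and sql_query_range do not match this pattern.
      queries[current_source] = m[1]
    end
  end
  queries
end

# Illustrative config text, not real thinking-sphinx output.
sample = <<~'CONF'
  source lawyer_core_0
  {
    type = mysql
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT lawyers.id, lawyers.name \
      FROM lawyers WHERE lawyers.id >= $start AND lawyers.id <= $end
    sql_query_range = SELECT MIN(id), MAX(id) FROM lawyers
  }
CONF

sphinx_sql_queries(sample).each do |source, query|
  puts "#{source}: EXPLAIN #{query}"
end
```

To use it on a real project, read config/development.sphinx.conf with File.read and pass the contents in; the $start and $end placeholders need real ids before the query will run in a SQL console.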
I'm working with thinking sphinx:
define_index do
  indexes to
  indexes created_on
  has created_on
end
Now, while searching on the console:
Emaildumps.search 5.day.ago,
  :group_by => 'created_on',
  :group_function => :day
I get this error:
Sphinx Daemon returned error: index emaildumps_core: INTERNAL ERROR: incoming-schema mismatch (in=timestamp created_on:32#160, my=timestamp created_on:32#0)
It may be a dumb question, but I'm a newbie at Sphinx and I don't understand the fundamentals of indexing and searching in it.
What am I doing wrong?
Please help me out.
It's perhaps related, but you can't have fields and attributes with the same name. So, I'd recommend aliasing one of those (the field is better):
define_index do
  indexes to
  indexes created_on, :as => :created_on_field
  has created_on
end
That said, I'm not sure there's much value in having created_on as a field, but that's up to you.
I've got indexes on a few different models, and sometimes the user might search for a value which exists in multiple models. Now, if the user is really only interested in data from one of the models I'd like the user to be able to pre/postfix the query with something to limit the scope.
For instance, if I only want to find a match in my Municipality model, I've set up an index in that model so that the user now can query "xyz municipality" (in quotes):
define_index do
  indexes :name, :sortable => true
  indexes "name || ' municipality' name", :as => :extended_name, :type => :string
end
This works just fine. Now I also have a Person model, with a relation to Municipality. I'd like, when searching only on the Person model, to have the same functionality available, so that I can say Person.search("xyz municipality") and get all people connected to that municipality. This is my current definition in the Person model:
has_many :municipalities, :through => :people_municipalities
define_index do
  indexes [lastname, firstname], :as => :name, :sortable => true
  indexes municipalities.name, :as => :municipality_name, :sortable => true
end
But is there any way I can create an index on this model, referencing municipalities, like the one I have on the Municipality model itself?
If you look at the generated SQL in the sql_query setting of config/development.sphinx.conf for source person_core_0, you'll see how municipalities.name is being concatenated together (I'd post an example, but it depends on your database - MySQL and PostgreSQL handle this completely differently).
I would recommend duplicating the field and inserting something like this (the SQL is pseudo-code):
indexes "GROUP_CONCAT(' municipality ' + municipalities.name)",
  :as => :extended_municipality_names
Also: there's not much point adding :sortable => true to either this or the original field from the association. Are you going to sort by all of the municipality names concat'd together? I'm guessing not :)
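Since the SQL above is pseudo-code, here is one way it might be spelled out per database. This is a sketch, not what thinking-sphinx generates: the helper name extended_municipality_sql is made up, it assumes the municipalities table is already joined in the source query, and it puts the name before the word "municipality" to match the "xyz municipality" phrase from the question.

```ruby
# Build the SQL expression for a field that appends " municipality" to every
# associated municipality name and concatenates the results with spaces.
# `adapter` is :mysql or :postgresql; anything else raises ArgumentError.
def extended_municipality_sql(adapter)
  case adapter
  when :mysql
    "GROUP_CONCAT(CONCAT(municipalities.name, ' municipality') SEPARATOR ' ')"
  when :postgresql
    "array_to_string(array_agg(municipalities.name || ' municipality'), ' ')"
  else
    raise ArgumentError, "unsupported adapter: #{adapter.inspect}"
  end
end

puts extended_municipality_sql(:mysql)
puts extended_municipality_sql(:postgresql)
```

The resulting string would take the place of the pseudo-code passed to indexes, with :as => :extended_municipality_names as shown above.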
I got this error:
SQLite3::SQLException: no such column: apis.name: SELECT * FROM examples WHERE ("apis"."name" = 'deep')
My code:
Api.find :all, :from => params[:table_name], :conditions => {:name => 'deep' }
I need to make a back-end Rails application which will be used by a Silverlight application. One of the requirements is to fetch simple data from the database. I need to be able to query different tables with the same code (my app has 2000 tables!).
I think it does not make sense for Rails to put "apis" in the WHERE clause. Is there any specific reason for this?
It does that so when joins are performed, the where clauses will line up with the right tables' columns. This is handy most of the time, but in your particular case causes issues.
What you could do is use the other conditions syntax, which will not add the table name to the attributes but will still sanitize the inputs properly.
Api.find :all, :from => params[:table_name], :conditions => ['name = ?','deep']
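To illustrate the difference, here is a simplified imitation of how the two condition styles end up in the WHERE clause. It is not Rails' actual implementation: the helper names quote, hash_conditions_sql and array_conditions_sql are made up, and the quoting is deliberately naive.

```ruby
# Naive SQL string quoting: wrap in single quotes and double embedded quotes.
def quote(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Hash conditions: every column is qualified with the model's table name,
# which is what produces "apis"."name" even when :from points elsewhere.
def hash_conditions_sql(model_table, conditions)
  clauses = conditions.map do |column, value|
    %("#{model_table}"."#{column}" = #{quote(value)})
  end
  clauses.join(" AND ")
end

# Array conditions: the fragment is used as written; only the ? placeholders
# are replaced, in order, with quoted values.
def array_conditions_sql(fragment, *values)
  remaining = values.dup
  fragment.gsub("?") { quote(remaining.shift) }
end

puts hash_conditions_sql("apis", :name => "deep")   # "apis"."name" = 'deep'
puts array_conditions_sql("name = ?", "deep")       # name = 'deep'
```

The second form leaves the column unqualified, so it resolves against whatever table :from names.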