Even when none of the fields passed to update_attributes actually changes, Thinking Sphinx sets delta = 1, which results in a large number of unwanted queries being fired. Can we do something to let Sphinx know that there was no actual update?
Related
From my searching and my understanding of delta indexing, when we add new records or change existing ones we need to re-index Sphinx for that data to show up; otherwise it will not appear.
But I have checked, and the data is updating without re-indexing. So what is the purpose of re-indexing the delta?
With Thinking Sphinx, there's a distinction between a full re-index, where all the indices are reprocessed (via rake ts:index and rake ts:rebuild), and processing a single index.
When you have delta indexing enabled, it means that the delta index for a given model is automatically processed straight after the change to a record, or adding a new record. This is either done as part of the standard callback process (when using :delta => true) or via a background worker (Sidekiq, DelayedJob, etc) if you're using the appropriate delta gem for those.
All of this means that you don't need to run a full reprocessing of all indices for the change to be present - the delta index is reprocessed automatically, and the record's changes are reflected in Sphinx.
One catch worth noting is that the more changes that happen, the larger the delta index gets, and thus the slower it is to process. So, a full re-index is still required on a regular basis (hourly? daily? depends on your app) to keep delta processing times fast.
I'm performing a query using an sqlite db where I pull out a quite large data set of call records from a database. On the same page I want to show the breakdown of counts per day on the call records, so I perform about 30 count queries on the database.
Is there a way I can filter the set that I retrieve initially and perform the counts on the in memory set, so I don't have to run those continuous queries? I need those counts for graphing and display purposes but even with an index on date, it takes about 10 seconds to run the initial query plus all of the count queries.
What I'm basically asking is there a way to perform the counts on the records returned or perform analysis on it, or is there a smarter way to cache this data?
@set = Record.get_records_for_range(date1, date2)
while date1 < date2
  @count = Record.count_records_for_date(date1)
  date1 = date1 + 1
end
is basically what I'm doing. Surely there's a simpler and faster way?
Using @set.length will get you the count of the in-memory set without querying the database, because it is performed by Ruby, not Active Record (as .count is).
Read about it here https://batsov.com/articles/2014/02/17/the-elements-of-style-in-ruby-number-13-length-vs-size-vs-count/
Here is a quote pulled out of that article
length is a method that’s not part of Enumerable - it’s part of a concrete class (like String or Array) and it’s usually running in O(1) (constant) time. That’s as fast as it gets, which means that using it is probably a good idea.
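To get the per-day breakdown from the set that is already loaded, one option is to group the records in memory instead of issuing a count query per day. The following is only a minimal sketch: CallRecord, called_on and counts_per_day are made-up names standing in for whatever the real model and its date column look like.

```ruby
require "date"

# Hypothetical stand-in for a call record row; only the date matters here.
CallRecord = Struct.new(:id, :called_on)

# Pretend this array came back from the single range query.
records = [
  CallRecord.new(1, Date.new(2014, 3, 1)),
  CallRecord.new(2, Date.new(2014, 3, 1)),
  CallRecord.new(3, Date.new(2014, 3, 3))
]

# Count records per day in memory. Days without any calls get a zero,
# which is convenient when the result feeds a graph.
def counts_per_day(records, first_day, last_day)
  grouped = records.group_by(&:called_on)
  (first_day..last_day).each_with_object({}) do |day, counts|
    counts[day] = grouped.fetch(day, []).length
  end
end

daily = counts_per_day(records, Date.new(2014, 3, 1), Date.new(2014, 3, 3))
```

If the rows themselves are not needed on the page, a single grouped count query (Active Record's group(...).count) may be able to produce the same hash in the database instead.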
I'm building a Rails project, and I have a database with a set of tables, each holding between 500k and 1M rows, and I am constantly creating new rows.
By the nature of the project, before each creation I have to search through the table for duplicates (on one field), so I don't create the same row twice. Unfortunately, as my table grows, this is taking longer and longer.
I was thinking that I could optimize the search by adding indexes to the specific String fields I am searching on, but I have heard that adding indexes increases the creation time.
So my question is as follows:
What is the trade-off between finding and creating rows that contain indexed fields? I know adding indexes to the fields will make my program faster with Model.find_by_name, but how much slower will it make my row creation?
Indexing slows down the insertion of entries, because each entry also has to be added to the index and that takes some resources; but once added, the index speeds up your select queries, as you said. BUT maybe the B-tree isn't the right choice for you, because a B-tree index works from the first X units of the indexed value. That's great when you have integers, but text search is tricky. When you do queries like
Model.where("name LIKE ?", "#{params[:name]}%")
it will speed up selection but when you use queries like this:
Model.where("name LIKE ?", "%#{params[:name]}%")
it won't help you, because the whole string has to be searched, and it can be several hundred characters long; having only the first 8 units of a 250-character string indexed is no improvement then! So that's one thing. But there's another...
You should add a UNIQUE INDEX, because the database is better at finding duplicates than Ruby is! It's optimized for sorting, and it's definitely the shorter and cleaner way to deal with this problem! Of course you should also add a validation to the relevant model, but that's no reason to let things slide with the database.
// about index speed
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
You don't have a large set of options. I don't think the insert speed loss will be that great when you only need one index! But the select speed will increase proportionally!
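The difference between the two LIKE queries above can be illustrated in plain Ruby. This is only a rough sketch under a simplifying assumption: a sorted array stands in for the index, and NAMES, prefix_matches and substring_matches are names made up for the illustration, not anything the database exposes.

```ruby
# A sorted array stands in for an index on the name column.
NAMES = %w[alice alfred bob bobby carol robert].sort.freeze

# Prefix match (like name LIKE 'bob%'): binary search jumps to the first
# candidate, then we read forward only while the prefix still matches.
def prefix_matches(index, prefix)
  start = index.bsearch_index { |name| name >= prefix }
  return [] if start.nil?
  index.drop(start).take_while { |name| name.start_with?(prefix) }
end

# Substring match (like name LIKE '%ob%'): the sort order gives no
# starting point, so every entry has to be inspected.
def substring_matches(index, fragment)
  index.select { |name| name.include?(fragment) }
end

by_prefix = prefix_matches(NAMES, "bob")
by_fragment = substring_matches(NAMES, "ob")
```

The prefix lookup touches only a handful of entries however long the list gets, while the substring lookup always walks all of them, which is the reason a leading wildcard gets no help from an ordinary index.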
I have delta indexing set up for Thinking Sphinx on one of my models. Whenever a record gets updated, the delta is set to true, but I don't see the index getting updated with the changes made to the record. I have updated my Sphinx configuration files with the delta property changes. Any idea why the delta indexing is not getting triggered?
According to the documentation after you update the database and the model, you should do this:
rake thinking_sphinx:rebuild
Maybe you've omitted that step.
We are running Thinking Sphinx on a utility instance in our server cluster. It re-runs the index every minute. But if you make a change to a record, it disappears from search results until the index is updated (up to 1 minute).
Is Thinking Sphinx only returning rows whose updated_at times are earlier than their last index?
If so, how can I get db changes to update the TS on the utility instance?
Instead of re-indexing every minute try using the Delayed Deltas approach. It is designed to tide over your search results until you fully re-index.
See:
http://freelancing-god.github.com/ts/en/deltas.html
Update:
Looks like the team at Sphinx is trying to solve these problems with real-time indexes:
http://sphinxsearch.com/docs/current.html#rt-indexes