I have a Rails application that uses Mongoid as a MongoDB wrapper. Given a batch of 1,000 Mongo BSON ids, I run a simple query:
Case.where(:_id.in => case_ids).to_a
Here, case_ids is the array of 1,000 BSON ids.
This fires a Slow query warning:
{"t":{"$date":"2021-10-31T10:25:23.555-04:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn16","msg":"Slow query","attr":{"type":"command","ns":"app_development.audita
bles","command":{"getMore":1879838943203028675,"collection":"cases","$db":"app_development","lsid":{"id":{"$uuid":"cc709ec6-788d-4908-a8d8-5c55c678dea9"}}},"originatingC
ommand":{"find":"cases","$db":"app_development","filter":{"_id":{"$in":[{"$oid":"60e49287e55e88201c22637d"},{"$oid":"60e49287e55e88201c22637f"},{"$oid":"60e49287e55e8820
1c226381"},{"$oid":"60e49287e55e88201c226383"},{"$oid":"60e49287e55e88201c226385"},{"$oid":"60e49287e55e88201c226387"},{"$oid":"60e49287e55e88201c226389"}
The query uses the Mongo _id, and I've confirmed that it has an index (as it should). There are only 85,000 records in the collection. Any idea why this fires so many getMores and touches so many records? When the query finishes, the log shows:
{"$oid":"60e49287e55e88201c2265ad"},{"$oid":"60e49287e55e88201c2265af"},{"$oid":"60e49287e55e88201c2265b1"},{"$oid":"60e49287e55e88201c2265b3"},{"$oid":"60e49287e55e88201c2265b5"}]}}},"planSummary":"IXSCAN { _id: 1 }","cursorid":1879838943203028675,"keysExamined":38102,"docsExamined":36514,"cursorExhausted":true,"numYields":38,"nreturned":36515,"reslen":13165126,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":39}},"Global":{"acquireCount":{"r":39}},"Database":{"acquireCount":{"r":39}},"Collection":{"acquireCount":{"r":39}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":101},"truncated":{"originatingCommand":{"filter":{"_id":{"$in":{"282":{"type":"objectId","size":12}}}}}},"size":{"originatingCommand":1600124}}
The most striking parts are:
"keysExamined":38102,"docsExamined":36514,"cursorExhausted":true,"numYields":38,"nreturned":36515,
And
"locks":{"ReplicationStateTransition":{"acquireCount":{"w":39}},"Global":{"acquireCount":{"r":39}}
Is there something wrong with my indexes or query? Is there a better index I can add to support faster .in queries, or some precomputation I can do so that pulling out a subset of docs is faster?
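For reference, I can ask the server for its plan for the same criteria from the Rails console (a minimal sketch; as far as I know Mongoid delegates explain to the underlying driver):

# Same criteria as above, but asking MongoDB for the execution plan instead of the documents.
Case.where(:_id.in => case_ids).explain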
Thanks for any help,
Kevin
I am trying to cache a table t_abc using memcached for my Rails application. The table has 50,000 entries, and I ultimately want to have 20,000 keys in the cache (which will be of the form "abc_" + id). When the 20,001st entry is to be inserted into the cache, I want the least recently used key out of these 20,000 (of the above form, and not some other key in memcached) to be evicted from the cache. How do I achieve that?
NOTE: I am keeping an expiry = 0 for all the entries.
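For context, the writes look roughly like this (a sketch; Abc is the model backing t_abc, and Rails.cache is backed by memcached):

record = Abc.find(id)                   # one row of t_abc
Rails.cache.write("abc_#{id}", record)  # no :expires_in, i.e. TTL 0 / never expire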
No, unfortunately you cannot efficiently do what you want to do with memcached.
It has a single LRU which works across all the data you are storing, so if you are storing data from multiple tables, the fraction of memcached entries taken up by each table depends on the relative access patterns of the different tables.
So to control how many rows of the table are cached, all you can really do is adjust how big your memcached is and vary what other data gets stored.
(Also, to evaluate the memcached configuration you might consider using a different metric, such as the hit rate or the median response time, rather than simply the number of rows cached.)
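For what it's worth, the only real knobs are the daemon's memory cap and which instance your app points at; a rough sketch (host, port, and size are placeholders):

# Start memcached with a 512MB cap; once full, its global LRU decides what gets evicted.
#   memcached -m 512 -p 11211
# config/environments/production.rb -- point the Rails cache at that instance.
config.cache_store = :mem_cache_store, "127.0.0.1:11211"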
I have a MySQL table with ~1 million records.
I will soon need to add search to my Rails 3.x app, and I want the search to be fuzzy.
I currently use a plugin (rails-fuzzy-search) for another table, but that one has only 3,000 records.
This plugin creates trigrams in another table (25,000 trigrams for the 3,000-record table).
I can't use this method for my 1-million-record table, otherwise my trigrams table might reach 100 million records!
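(For scale: a trigram index stores one row per 3-character substring, so it grows with the total length of the indexed text. A rough sketch of the decomposition, with a hypothetical trigrams_of helper, not the plugin's actual code:)

def trigrams_of(value)
  padded = "**#{value.downcase}*"                  # pad the ends so short prefixes still match
  (0..padded.length - 3).map { |i| padded[i, 3] }
end

trigrams_of("Paris")  # => ["**p", "*pa", "par", "ari", "ris", "is*"]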
I see some gems:
https://github.com/seamusabshere/fuzzy_match
https://github.com/kiyoka/fuzzy-string-match
Or the use of Sphinx and Thinking Sphinx + addons.
I don't know which solution gives the best performance.
The search will cover two fields of my table.
Some searching around revealed the fuzzily gem:
Anecdotical benchmark: against our whole Geonames-derived table of locations (3.2M records, about 1GB of data), on my development machine (a 2011 MacBook Pro):
searching for the top 10 matching records takes 6ms ±1
preparing the index for all records takes about 10min
the DB query overhead when changing a record is at 3ms ±2
the memory overhead (footprint of the trigrams table index) is about 300MB
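If I remember its README correctly, wiring it up looks roughly like this (the Location model and name field are placeholders, and it also needs a migration for its trigrams table):

# Gemfile
gem 'fuzzily'

# app/models/location.rb -- declares the searchable field; fuzzily keeps trigrams
# in its own table and adds a finder per declared field.
class Location < ActiveRecord::Base
  fuzzily_searchable :name
end

Location.bulk_update_fuzzy_name                  # build the index for existing rows
Location.find_by_fuzzy_name('Paris', :limit => 10)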
Also, check out Solr and Sunspot
I have delta indexing set up for Thinking Sphinx on one of my models. Whenever a record gets updated, the delta flag is set to true, but I don't see the index being updated with the changes made to the record. I have updated my Sphinx configuration files with the delta property changes. Any idea why the delta indexing is not being triggered?
According to the documentation, after you update the database and the model you should run:
rake thinking_sphinx:rebuild
Maybe you've omitted that step.
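It's also worth double-checking that the delta is declared on the index itself. With the classic define_index syntax that looks something like this (model and field names are placeholders, and the model's table needs a boolean delta column):

# app/models/article.rb (Thinking Sphinx 1.x/2.x style)
define_index do
  indexes title, content

  set_property :delta => true   # tells TS to maintain a delta index for recent changes
end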
We are running Thinking Sphinx on a utility instance in our server cluster. It re-runs the index every minute, but if you make a change to a record, it disappears from search results until the index is updated (up to 1 minute later).
Is Thinking Sphinx only returning rows whose updated_at times are earlier than their last index run?
If so, how can I get DB changes to update the Thinking Sphinx index on the utility instance?
Instead of re-indexing every minute, try using the Delayed Deltas approach. It is designed to tide your search results over until you fully re-index.
See:
http://freelancing-god.github.com/ts/en/deltas.html
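Roughly, that means swapping the plain delta for the delayed one and keeping a worker running alongside; a sketch based on the ts-delayed-delta setup as I remember it (model and field names are placeholders):

# Gemfile
gem 'ts-delayed-delta'

# app/models/article.rb
define_index do
  indexes title, content
  set_property :delta => :delayed   # queue delta reindexing through Delayed Job
end

# Keep a Delayed Job worker running (e.g. rake jobs:work) so the queued
# reindex jobs actually get processed between full re-indexes.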
Update:
Looks like the Sphinx team is trying to solve these problems with real-time indexes:
http://sphinxsearch.com/docs/current.html#rt-indexes