I have a Rails application that uses Mongoid as a MongoDB wrapper. Given a batch of 1,000 Mongo BSON ids, I run a simple query:
Case.where(:_id.in => case_ids).to_a
Here, case_ids is the array of 1,000 BSON ids.
This fires a Slow query warning:
{"t":{"$date":"2021-10-31T10:25:23.555-04:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn16","msg":"Slow query","attr":{"type":"command","ns":"app_development.audita
bles","command":{"getMore":1879838943203028675,"collection":"cases","$db":"app_development","lsid":{"id":{"$uuid":"cc709ec6-788d-4908-a8d8-5c55c678dea9"}}},"originatingC
ommand":{"find":"cases","$db":"app_development","filter":{"_id":{"$in":[{"$oid":"60e49287e55e88201c22637d"},{"$oid":"60e49287e55e88201c22637f"},{"$oid":"60e49287e55e8820
1c226381"},{"$oid":"60e49287e55e88201c226383"},{"$oid":"60e49287e55e88201c226385"},{"$oid":"60e49287e55e88201c226387"},{"$oid":"60e49287e55e88201c226389"}
The query uses the Mongo _id, and I've confirmed that it has an index (as it should). There are only 85,000 records in the collection. Any idea why this fires so many getMores and touches so many records? When the query finishes, the log shows:
{"$oid":"60e49287e55e88201c2265ad"},{"$oid":"60e49287e55e88201c2265af"},{"$oid":"60e49287e55e88201c2265b1"},{"$oid":"60e49287e55e88201c2265b3"},{"$oid":"60e49287e55e88201c2265b5"}]}}},"planSummary":"IXSCAN { _id: 1 }","cursorid":1879838943203028675,"keysExamined":38102,"docsExamined":36514,"cursorExhausted":true,"numYields":38,"nreturned":36515,"reslen":13165126,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":39}},"Global":{"acquireCount":{"r":39}},"Database":{"acquireCount":{"r":39}},"Collection":{"acquireCount":{"r":39}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":101},"truncated":{"originatingCommand":{"filter":{"_id":{"$in":{"282":{"type":"objectId","size":12}}}}}},"size":{"originatingCommand":1600124}}
The most striking parts are:
"keysExamined":38102,"docsExamined":36514,"cursorExhausted":true,"numYields":38,"nreturned":36515,
And
"locks":{"ReplicationStateTransition":{"acquireCount":{"w":39}},"Global":{"acquireCount":{"r":39}}
Is there something wrong with my indexes or query? Is there a better index I can add to support faster .in queries, or some precomputation I can do so that pulling out a subset of docs is faster?
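For reference, I can ask the server for its plan for the same criteria from the Rails console (a minimal sketch; as far as I know Mongoid delegates explain to the underlying driver):

# Same criteria as above, but asking MongoDB for the execution plan instead of the documents.
Case.where(:_id.in => case_ids).explain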
Thanks for any help,
Kevin
I am trying to cache a table t_abc using memcached for my Rails application. The table has 50,000 entries, and I ultimately want to have 20,000 keys in the cache (which will be of the form "abc_" + id). When the 20,001st entry is to be inserted into the cache, I want the least recently used key out of these 20,000 (of the above form, and not some other key in memcached) to be evicted from the cache. How do I achieve that?
NOTE: I am keeping an expiry = 0 for all the entries.
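For context, the writes look roughly like this (a sketch; Abc is the model backing t_abc, and Rails.cache is backed by memcached):

record = Abc.find(id)                   # one row of t_abc
Rails.cache.write("abc_#{id}", record)  # no :expires_in, i.e. TTL 0 / never expire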
No, unfortunately you cannot efficiently do what you want to do with memcached.
It has a single LRU which works across all the data you are storing, so if you are storing data from multiple tables, the fraction of memcached entries taken up by each table depends on the relative access patterns of the different tables.
So to control how many rows of the table are cached, all you can really do is adjust how big your memcached is and vary what other data gets stored.
(Also, to evaluate the memcached configuration you might consider using a different metric, such as the hit rate or the median response time, rather than simply the number of rows cached.)
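For what it's worth, the only real knobs are the daemon's memory cap and which instance your app points at; a rough sketch (host, port, and size are placeholders):

# Start memcached with a 512MB cap; once full, its global LRU decides what gets evicted.
#   memcached -m 512 -p 11211
# config/environments/production.rb -- point the Rails cache at that instance.
config.cache_store = :mem_cache_store, "127.0.0.1:11211"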
I have a MySQL table with ~1 million records.
I will soon need to add search to my Rails 3.x app, and I want the search to be fuzzy.
I currently use a plugin (rails-fuzzy-search) for another table, but that one has only 3,000 records.
This plugin creates trigrams in another table (25,000 trigrams for the 3,000-record table).
I can't use this method for my 1-million-record table, otherwise my trigrams table might reach 100 million records!
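(For scale: a trigram index stores one row per 3-character substring, so it grows with the total length of the indexed text. A rough sketch of the decomposition, with a hypothetical trigrams_of helper, not the plugin's actual code:)

def trigrams_of(value)
  padded = "**#{value.downcase}*"                  # pad the ends so short prefixes still match
  (0..padded.length - 3).map { |i| padded[i, 3] }
end

trigrams_of("Paris")  # => ["**p", "*pa", "par", "ari", "ris", "is*"]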
I see some gems:
https://github.com/seamusabshere/fuzzy_match
https://github.com/kiyoka/fuzzy-string-match
Or the use of Sphinx and Thinking Sphinx + addons.
I don't know which solution gives the best performance.
The search will cover two fields of my table.
Some searching around revealed the fuzzily gem:
Anecdotical benchmark: against our whole Geonames-derived table of locations (3.2M records, about 1GB of data), on my development machine (a 2011 MacBook Pro):
searching for the top 10 matching records takes 6ms ±1
preparing the index for all records takes about 10min
the DB query overhead when changing a record is at 3ms ±2
the memory overhead (footprint of the trigrams table index) is about 300MB
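If I remember its README correctly, wiring it up looks roughly like this (the Location model and name field are placeholders, and it also needs a migration for its trigrams table):

# Gemfile
gem 'fuzzily'

# app/models/location.rb -- declares the searchable field; fuzzily keeps trigrams
# in its own table and adds a finder per declared field.
class Location < ActiveRecord::Base
  fuzzily_searchable :name
end

Location.bulk_update_fuzzy_name                  # build the index for existing rows
Location.find_by_fuzzy_name('Paris', :limit => 10)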
Also, check out Solr and Sunspot
I have delta indexing set up for Thinking Sphinx on one of my models. Whenever a record gets updated, the delta flag is set to true, but I don't see the index being updated with the changes made to the record. I have updated my Sphinx configuration files with the delta property changes. Any idea why the delta indexing is not being triggered?
According to the documentation, after you update the database and the model you should run:
rake thinking_sphinx:rebuild
Maybe you've omitted that step.
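It's also worth double-checking that the delta is declared on the index itself. With the classic define_index syntax that looks something like this (model and field names are placeholders, and the model's table needs a boolean delta column):

# app/models/article.rb (Thinking Sphinx 1.x/2.x style)
define_index do
  indexes title, content

  set_property :delta => true   # tells TS to maintain a delta index for recent changes
end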
We are running Thinking Sphinx on a utility instance in our server cluster. It re-runs the index every minute, but if you make a change to a record, it disappears from search results until the index is updated (up to 1 minute later).
Is Thinking Sphinx only returning rows whose updated_at times are earlier than their last index run?
If so, how can I get DB changes to update the Thinking Sphinx index on the utility instance?
Instead of re-indexing every minute, try using the Delayed Deltas approach. It is designed to tide your search results over until you fully re-index.
See:
http://freelancing-god.github.com/ts/en/deltas.html
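Roughly, that means swapping the plain delta for the delayed one and keeping a worker running alongside; a sketch based on the ts-delayed-delta setup as I remember it (model and field names are placeholders):

# Gemfile
gem 'ts-delayed-delta'

# app/models/article.rb
define_index do
  indexes title, content
  set_property :delta => :delayed   # queue delta reindexing through Delayed Job
end

# Keep a Delayed Job worker running (e.g. rake jobs:work) so the queued
# reindex jobs actually get processed between full re-indexes.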
Update:
Looks like the Sphinx team is trying to solve these problems with real-time indexes:
http://sphinxsearch.com/docs/current.html#rt-indexes