I am using Thinking Sphinx version 2.0.10 in Rails for full-text search, and I am dealing with millions of records in the database. It takes a huge amount of time to return results, so is there any way to keep the indexes on a swap device so it will work faster?
Thank you for your help.
Thinking Sphinx configures Sphinx to store attributes in memory - but as far as I know there's no such setting that applies to field data. Sphinx index files can be stored on any disk you like though, instead of just RAILS_ROOT/db/sphinx/RAILS_ENV - this is configured using the searchd_file_path setting in config/sphinx.yml.
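If you do want to point the index files at a faster disk, a minimal sketch of config/sphinx.yml might look like this (the mount path here is made up):

production:
  searchd_file_path: "/mnt/fast_disk/sphinx"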
Perhaps you could elaborate on how you're using Sphinx and Thinking Sphinx - what kinds of queries you're running that are slow, and what the relevant index structures look like. There may be other ways of improving the speed of this.
I understand the basics of how indexing helps boost performance, and how to index my DB, but I'm confused about how often to re-index. Also, when I re-index my database, do I need to first remove the initial index, or can I just re-index as if I am indexing for the first time?
This is not a Rails question, it's a DBMS question. What, where, when, and how you re-index depends on your DBMS, but as a general rule re-indexing is rarely needed unless there is database corruption of some description or you have a massive amount of changes to data that is included in the indexes. By changes I mean updates and deletes.
For example, if you use Postgres, then this link might help: http://www.postgresql.org/docs/9.1/static/routine-reindex.html
Also have a look through Stack Exchange; questions and answers like https://dba.stackexchange.com/questions/1937/is-reindex-dangerous may enlighten you.
If you use MySQL, then this is a very good explanation: http://dev.mysql.com/doc/refman/5.0/en/rebuilding-tables.html
Look up whatever DBMS you are using and check the official documentation on how and when to re-index. The requirement to re-index is also likely to differ depending on the table types used: InnoDB and MyISAM in MySQL may well have different requirements, and the CSV engine may well not have any indexing at all. A couple of concrete examples are sketched below.
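For illustration, here is roughly what a manual rebuild looks like in each system; the table name is hypothetical:

-- Postgres: rebuild all indexes on one table
REINDEX TABLE words;

-- MySQL (InnoDB/MyISAM): rebuild the table and its indexes
OPTIMIZE TABLE words;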
I am using thinking_sphinx in Rails. As far as I know, Sphinx is used for full-text search. Let's say I search on these parameters:
keyword
country
sort order
I use Sphinx for all the searches above. However, when I am querying without a keyword, just by country and sort order, is it better to use a normal MySQL query instead of Sphinx?
In other words, should Sphinx be used only when a keyword is searched?
I'm looking at overall performance and speed.
Not to sound snarky, but does performance really matter?
If you're building an application which will only be used by a handful of users within an organization, then you can probably dismiss the performance benefits of using one method over the other and focus instead on simplicity in your code.
On the other hand, if your application is accessed by a large number of users on the interwebz and you really need to focus on being performant, then you should follow #barryhunter's advice above and benchmark to determine the best approach in a given circumstance.
P.S. Don't optimize before you need to. Fight with all your heart to keep code out of your code.
Benchmark! Benchmark! Benchmark!
I.e., test it yourself. The exact performance will vary depending on the exact data, and perhaps even on the relative performance of your Sphinx and MySQL servers.
Sphinx will offer killer speeds over MySQL when searching by a text string, and MySQL will probably be faster when searching by a numerical key.
So, assuming that both "country" and "sort order" can be indexed using a numerical index in MySQL, it will be better to use Sphinx only for "keyword" searches, and a normal MySQL query for the other two.
However, benchmarks won't hurt, as barryhunter suggested ;)
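To make that split concrete, here is a minimal sketch; the model, parameter, and attribute names are all made up:

# Use Sphinx only when there is a keyword; otherwise hit MySQL directly.
def search_products(params)
  if params[:keyword].present?
    # Thinking Sphinx: text query plus attribute filter and sort.
    Product.search params[:keyword],
                   :with  => { :country_id => params[:country_id] },
                   :order => params[:sort_order]
  else
    # No keyword: plain MySQL does the filtering and sorting.
    # (Sanitize :sort_order before interpolating it in real code.)
    Product.where( :country_id => params[:country_id] ).order( params[:sort_order] )
  end
end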
I have a database (Postgres) that contains about 16,000 records: the titles of movies. I am trying to figure out the optimal way to search them (currently they are searched via the web on a Heroku-hosted Ruby on Rails site). However, some queries, such as a search for the word 'a', can take up to 20 seconds. I was thinking of using Sphinx; however, such packages are advertised for full-text searching, so I am wondering if that is appropriate for my problem. Any advice would be appreciated.
16,000 records is too few, in both number and size (as you said, just titles), to call for a dedicated search engine. Try the normal full-text search of your database, and set up indexes to make it faster.
However, this does not stop you from trying out a search engine like Sphinx or Solr. Both are open source, and Sphinx is pretty easy to set up too. But to reiterate: there is no need for this, as the data size is too small and falls squarely in the domain of database full-text search.
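Since the question mentions Postgres, here is a rough sketch of what that database-level full-text search could look like; the movies/title names are assumed from the question:

-- Index the titles once (GIN index over the tsvector expression)
CREATE INDEX movies_title_fts_idx ON movies USING gin(to_tsvector('english', title));

-- Query with the same expression so the planner can use the index
SELECT * FROM movies
WHERE to_tsvector('english', title) @@ plainto_tsquery('english', 'godfather');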
If your database is Postgres on Heroku, then Sphinx is not an option: as of now, Heroku Postgres is not supported to work with Sphinx. The remaining choice so far is Solr, which is also good for full-text search and takes only a few simple steps to implement.
I've been looking into searching plugins/gems for Rails. Most of the articles compare Ferret (Lucene) to Ultrasphinx or possibly Thinking Sphinx, but none that talk about SearchLogic. Does anyone have any clues as to how that one compares? What do you use, and how does it perform?
thinking_sphinx and Sphinx work beautifully: no indexing, query, or install problems ever (5 or 6 installs, including production on Slicehost).
Why doesn't everybody use Sphinx, like, say, Craigslist? Read here about its limitations (the articles are a year and a half old; the Sphinx developer, Aksyonoff, is working on these, putting in features and reliability and stamping out bugs at an amazing pace):
http://codemonkey.ravelry.com/2008/01/09/sphinx-for-search/
http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/
Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?
Ferret: easy install, doesn't stem properly, very slow indexing (one MySQL DB: Sphinx: 3 seconds, Ferret: 50 minutes). Well-documented problems (index corruption) in DRb servers in production under load. Having said that, I have used it in development since acts_as_ferret came out 3 years ago, and it has served me well. Not adhering to Porter stemming is an advantage in some contexts.
Lucene and Solr are the gorilla/Mack truck/heavyweight champ of open source search. The teams have added an impressive number of new features in the Solr 1.4 release:
acts_as_solr: works well once Tomcat or Jetty is in place, but those can sometimes be a pain. The fork by mattmatt is the main one, but the project is relatively unmaintained.
Re the Tomcat install: Solr/Lucene has unquestionably the best knowledge base/support search engine of any software package I've seen (I guess I'm not that surprised); see the search box here:
http://www.lucidimagination.com/
Sunspot is the new Ruby wrapper, built on solr-ruby. It looks promising, but I couldn't get it to install on OS X. It indexes all Ruby objects, not just databases through ActiveRecord.
One thing that's really instructive is to install two search plugins, e.g. Sphinx and Solr, or Sphinx and Ferret, and see what different results they return. It's as easy as #sphinx_results - #ferret_results; a rough sketch follows.
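For instance, something like this; the model name and plugin calls are from memory and may need adjusting for your versions:

# Run the same query through both engines and diff the matched IDs.
sphinx_ids = Article.search("rails").map(&:id)           # Thinking Sphinx
ferret_ids = Article.find_with_ferret("rails").map(&:id) # acts_as_ferret
puts "only in Sphinx: #{(sphinx_ids - ferret_ids).inspect}"
puts "only in Ferret: #{(ferret_ids - sphinx_ids).inspect}"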
Just saw this post and responses:
http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/
http://www.jroller.com/otis/entry/open_source_search_engine_benchmark
http://www.flax.co.uk/blog/2009/07/07/xapian-compared/
First off, my obvious bias: I created and maintain Thinking Sphinx.
As it so happens, I actually saw Ben Johnson (creator of SearchLogic) present about it at the NYC Ruby meetup last night. SearchLogic is SQL-only, so if you're not dealing with massive tables and don't need relevance rankings, then it could be exactly what you're looking for. The syntax is pretty clean, too.
However, if you want all the query intelligence handled by code that is not your own, then Sphinx or Solr (which is Lucene under the hood, I think) is probably going to work out better.
SearchLogic is a good plugin, but it is really meant to make your search code more readable; it doesn't provide the automatic indexing that Sphinx does. I haven't used Ferret, but Sphinx is incredibly powerful.
http://railscasts.com/episodes/120-thinking-sphinx
Great introduction to see how flexible it is.
I have not used SearchLogic, but I can tell you that Lucene is a very mature project with implementations in many languages. It is fast and flexible, and the API is fun to work with. It's a good bet.
Given that this question still ranks highly on Google for full-text search, I'd really like to say that Sunspot is even stronger today if you're interested in adding full-text search capabilities to your Rails application (and would like to have Solr behind you for that). You can check a full tutorial on this here.
And while we're at it, another contender that has arrived in the field is ElasticSearch, which aims to be a real-time full-text search engine built on top of Lucene (but doing things differently compared to Solr). ElasticSearch includes out-of-the-box sharding and replication across multiple nodes, faster real-time search, and "percolators" that let you receive notifications when something matching your criteria becomes available, and it's moving really fast with many more features on the way. It's easy to build something on top of it, since the API is dead simple and completely based on REST, using JSON as the format. One could say you don't even need a plugin to use it.
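To illustrate how plugin-free that can be, here is a minimal sketch using nothing but Ruby's standard library; the index and field names are invented, and a local node on the default port 9200 is assumed:

require "net/http"
require "json"

# Full-text query against a local ElasticSearch node via plain REST/JSON.
uri   = URI("http://localhost:9200/movies/_search")
query = { query: { match: { title: "godfather" } } }
response = Net::HTTP.post(uri, query.to_json, "Content-Type" => "application/json")
puts JSON.parse(response.body)["hits"]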
Personally, I don't bother with database agnosticism for web applications and am quite happy using the full-text search in Postgres 8.3. The benefit is that if and when you change your framework/language, you will still have full-text search.
Full Text Indexing and MATCH() AGAINST().
If you're just looking to do a fast search against a few text columns in your table, you can simply use a full text index of those columns and use MATCH() AGAINST() in your queries.
Create the full text index in a migration file:
add_index :table, :column, type: :fulltext
Query using that index:
where( "MATCH( column ) AGAINST( ? )", term )
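Putting the two together, a hypothetical model might wrap this in a scope; Article and its columns are made-up names, and MATCH() over two columns requires a fulltext index covering both:

class Article < ApplicationRecord
  # Full-text match against the columns covered by the fulltext index.
  scope :matching, ->(term) { where( "MATCH( title, body ) AGAINST( ? )", term ) }
end

Article.matching( "rails search" )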
ElasticSearch and Searchkick
If you're looking for a full-blown search indexing solution that allows you to search any column in any of your records while still being lightning quick, take a look at ElasticSearch and Searchkick.
ElasticSearch is the indexing and search engine.
Searchkick is the integration library with Rails that makes it very easy to index your records and search them.
Searchkick's README does a fantastic job of explaining how to get up and running and how to fine-tune your setup, but here is a little snippet:
Install and start ElasticSearch.
brew install elasticsearch
brew services start elasticsearch
Add the searchkick gem to your bundle:
bundle add searchkick --strict
The --strict option just tells Bundler to use an exact version in your Gemfile, which I highly recommend.
Add searchkick to a model you want to index:
class MyModel < ApplicationRecord
searchkick
end
Index your records.
MyModel.reindex
Search your index.
matching_records = MyModel.search( "term" )
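Searchkick also accepts structured filters and sorting alongside the text query (the field names here are hypothetical):

MyModel.search( "term", where: { country: "US" }, order: { created_at: :desc } )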
For anyone looking for a simple search gem without any dependencies, check out acts_as_indexed
I am constructing an anagram generator as a coding exercise, and it uses a word list that's about 633,000 lines long (one word per line). I wrote the program in plain Ruby originally, and I would like to modify it to deploy it online.
My hosting service supports Ruby on Rails as about the only Ruby-based solution. I thought of hosting on my own machine, and using a smaller framework, but I don't want to deal with the security issues at this moment.
I have only used RoR for database-driven (CRUD) apps. However, I have never populated an SQLite database this way, so this is a two-part question:
1) Should I import this to a database? If so, what's the best method to do so? I would like to stick with sqlite to keep things simple if that's the case.
2) Is a 'flat file' better? I won't be doing any creating or updating, just checking against the list of words.
Thank you.
How about keeping it in memory? Storing that many words would take just a few megabytes of RAM, and otherwise you'd be accessing the file frequently so it'd probably be cached anyway. The advantage of keeping the word list in memory is that you can organize it in whatever data structure suits your needs best (I'm thinking a trie). If you can't spare that much memory, it might be to your advantage to use a database so you can efficiently load only the parts of the word list you need for any given query - of course, in that case you'd want to create some index columns (well at least one) so you can take advantage of the indexing capabilities of SQL.
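As a concrete sketch of the in-memory idea, here is a deliberately minimal trie; words.txt is the one-word-per-line file from the question:

class Trie
  def initialize
    @root = {}
  end

  # Walk/create one hash per character; mark the end of a complete word.
  def insert(word)
    node = @root
    word.each_char { |c| node = (node[c] ||= {}) }
    node[:end] = true
  end

  def include?(word)
    node = @root
    word.each_char { |c| node = node[c] or return false }
    !!node[:end]
  end
end

trie = Trie.new
File.foreach("words.txt") { |line| trie.insert(line.chomp.downcase) }
trie.include?("anagram") # => true if the word is in the list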
Assuming that what you're doing is looking up whether a word exists in your list, I would say that SQLite with an indexed column will likely be faster than scanning through the word list linearly. Now, if your current approach is fast enough for your purposes, then I see no reason to bother porting it over to a database; it's just an added headache for no gain as far as you're concerned. If you're seeing the search times become a burden, then dumping it into an indexed database would be a good idea.
You can create the table with the following schema:
CREATE TABLE words (
  word text primary key
);
-- No separate CREATE INDEX needed: a text primary key already gets its own index in SQLite.
And import your data with:
sqlite3 words.db < schema.sql
# Spawning one sqlite3 process per INSERT is very slow for 633,000 words
# (and words containing quotes break the statement); the CLI's bulk .import is safer:
sqlite3 words.db ".import words.txt words"
I would skip the database for the reasons listed above. A simple hash in memory will perform about as fast as a lookup in the database.
Even if the database was a bit faster for the lookup, you're still wasting time with the DB having to parse the query and create a plan for the lookup, then assemble the results and send them back to your program. Plus you can save yourself a dependency.
If you plan on moving other parts of your program to a persistent store, then go for it. But a hashmap should be sufficient for your use.
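The hash-in-memory approach is only a few lines in Ruby; again, words.txt is the file from the question:

require "set"

# Load once at boot; membership checks are then effectively constant time.
WORDS = Set.new(File.foreach("words.txt").map { |l| l.chomp.downcase })
WORDS.include?("anagram")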