I've been looking into search plugins/gems for Rails. Most of the articles compare Ferret (Lucene) to Ultrasphinx or possibly Thinking Sphinx, but none of them covers SearchLogic. Does anyone have any clues as to how that one compares? What do you use, and how does it perform?
thinking_sphinx and Sphinx work beautifully: no indexing, query, or install problems ever (5 or 6 installs, including production on Slicehost).
Why doesn't everybody use Sphinx, like, say, Craigslist? Read here about its limitations (the articles are a year and a half old; the Sphinx developer, Aksyonoff, is working on these, and he's adding features, improving reliability, and stamping out bugs at an amazing pace):
http://codemonkey.ravelry.com/2008/01/09/sphinx-for-search/
http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/
Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?
Ferret: easy install, doesn't stem properly, very slow indexing (one MySQL db: Sphinx: 3 seconds, Ferret: 50 minutes). Well-documented problems (index corruption) in DRb servers in production under load. Having said that, I have used it in development since acts_as_ferret came out 3 years ago, and it has served me well. Not adhering to Porter stemming is an advantage in some contexts.
Lucene and Solr are the gorilla/Mack truck/heavyweight champ of open source search. The teams have shipped an impressive number of new features in the Solr 1.4 release.
acts_as_solr: works well once Tomcat or Jetty is in place, but those can sometimes be a pain. The acts_as_solr fork by mattmatt is the main fork, but the project is relatively unmaintained.
Re the Tomcat install: Solr/Lucene unquestionably has the best knowledge base/support search engine of any software package I've seen (I guess I'm not that surprised); try the search box here:
http://www.lucidimagination.com/
Sunspot is the new Ruby wrapper, built on solr-ruby. It looks promising, but I couldn't get it to install on OS X. It indexes all Ruby objects, not just database records through ActiveRecord.
One thing that's really instructive is to install two search plugins, e.g. Sphinx and Solr, or Sphinx and Ferret, and see what different results they return. It's as easy as diffing #sphinx_results - #ferret_results.
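For instance, a minimal sketch assuming a model indexed with both Thinking Sphinx and acts_as_ferret (the model and query are illustrative):

# Article.search comes from Thinking Sphinx; find_with_ferret
# comes from acts_as_ferret - both assumed to be set up on the model.
sphinx_ids = Article.search("rails").map(&:id)
ferret_ids = Article.find_with_ferret("rails").map(&:id)

puts "Sphinx only: #{(sphinx_ids - ferret_ids).inspect}"
puts "Ferret only: #{(ferret_ids - sphinx_ids).inspect}"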
Just saw this post and its responses:
http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/
http://www.jroller.com/otis/entry/open_source_search_engine_benchmark
http://www.flax.co.uk/blog/2009/07/07/xapian-compared/
First off, my obvious bias: I created and maintain Thinking Sphinx.
As it happens, I actually saw Ben Johnson (creator of SearchLogic) present on it at the NYC Ruby meetup last night. SearchLogic is SQL-only, so if you're not dealing with massive tables and don't need relevance rankings, it could be exactly what you're looking for. The syntax is pretty clean, too.
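For a flavour of that syntax, here's a minimal sketch based on SearchLogic's documented scope-naming conventions (the model and columns are illustrative):

# SearchLogic generates named scopes from column names plus a
# condition suffix, which translate directly into SQL:
User.first_name_like("Ben")   # WHERE first_name LIKE '%Ben%'
User.age_gte(21)              # WHERE age >= 21
User.ascend_by_last_name      # ORDER BY last_name ASC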
However, if you want all the query intelligence handled by code that is not your own, then Sphinx or Solr (which is Lucene under the hood, I think) is probably going to work out better.
SearchLogic is a good plugin, but it's really meant to make your search code more readable; it doesn't provide the automatic indexing that Sphinx does. I haven't used Ferret, but Sphinx is incredibly powerful.
http://railscasts.com/episodes/120-thinking-sphinx
Great introduction to see how flexible it is.
I have not used SearchLogic, but I can tell you that Lucene is a very mature project that has implementations in many languages. It is fast and flexible, and the API is fun to work with. It's a good bet.
Given that this question still ranks highly on Google for full text search, I'd really like to say that Sunspot is even stronger today if you're interested in adding full text search capabilities to your Rails application (and would like to have Solr behind you for that). You can check a full tutorial on this here.
And while we're at it, another contender that has arrived in the field is ElasticSearch, which aims to be a real time full text search engine built on top of Lucene (but doing things differently than Solr). ElasticSearch includes out-of-the-box sharding and replication across multiple nodes, faster real time search, and "percolators" that let you receive notifications when something matching your criteria becomes available, and it's moving really fast with many more features on the way. It's easy to build something on top of it, since the API is dead simple and completely based on REST, with JSON as the format. One could say you don't even need a plugin to use it.
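To illustrate the plugin-free angle, here's a minimal sketch that talks to a local ElasticSearch node over plain HTTP (the index and field names are made up; assumes the default port 9200):

require "net/http"
require "json"

http = Net::HTTP.new("localhost", 9200)

# Index a document:
put = Net::HTTP::Put.new("/movies/movie/1", "Content-Type" => "application/json")
put.body = { title: "The Matrix", year: 1999 }.to_json
http.request(put)

# Search it back:
response = http.get("/movies/_search?q=title:matrix")
puts JSON.parse(response.body)["hits"]["hits"]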
Personally, I don't bother with database agnosticism for web applications and am quite happy using the built-in full text search in PostgreSQL 8.3. The benefit is that if and when you change your framework/language, you will still have full text search.
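For example, a query along these lines (a sketch; the model and column names are illustrative) lives entirely in the database and survives a framework switch:

# PostgreSQL full text search via a plain SQL condition:
Movie.where("to_tsvector('english', title) @@ plainto_tsquery('english', ?)", "matrix")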
Full Text Indexing and MATCH() AGAINST().
If you're just looking to do a fast search against a few text columns in your table, you can simply use a full text index of those columns and use MATCH() AGAINST() in your queries.
Create the full text index in a migration file:
add_index :table, :column, type: :fulltext
Query using that index:
where( "MATCH( column ) AGAINST( ? )", term )
ElasticSearch and Searchkick
If you're looking for a full blown search indexing solution that allows you to search for any column in any of your records while still being lightning quick, take a look at ElasticSearch and Searchkick.
ElasticSearch is the indexing and search engine.
Searchkick is the integration library with Rails that makes it very easy to index your records and search them.
Searchkick's README does a fantastic job at explaining how to get up and running and to fine tune your setup, but here is a little snippet:
Install and start ElasticSearch.
brew install elasticsearch
brew services start elasticsearch
Add searchkick gem to your bundle:
bundle add searchkick --strict
The --strict option just tells Bundler to use an exact version in your Gemfile, which I highly recommend.
Add searchkick to a model you want to index:
class MyModel < ApplicationRecord
searchkick
end
Index your records.
MyModel.reindex
Search your index.
matching_records = MyModel.search("term")
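Searchkick's search method also accepts options for narrowing and ordering results (these come from its README; the field names here are illustrative):

MyModel.search("term", fields: [:name], where: { active: true }, order: { created_at: :desc }, limit: 20)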
For anyone looking for a simple search gem without any dependencies, check out acts_as_indexed.
Related
I am trying to implement synonym searching in the Examine search engine that comes with Umbraco 8 out of the box.
Does anyone have any experience with implementing synonym searching in Examine/Umbraco 8? The options that I have been considering after looking around are:
A package that can be installed in Umbraco 8 that offers this extended functionality (if one exists).
Implementing a custom index (currently just using the out of the box 'ExternalIndex') that somehow implements synonym searching in the analysis (via custom analyzer implementation etc - If that is even possible).
Manually formatting multiple search terms by checking for synonyms in the string beforehand, running all searches and consolidating the results after (really a nasty, last resort option - you don't have to tell me how bad this is, I already know).
I have been trawling around the forums for a definitive answer on this and cannot really find one. Essentially I want to stick with the Examine engine for simplicity, however I am starting to think that the best way to achieve what I am after would be to move to a new engine completely (elastic search for example).
Many thanks in advance.
Have you tried Algolia? It's free and will do what you need easily: https://www.algolia.com/
Examine is based on the Lucene search index. Lucene is known to not really do synonyms, I'm afraid (read why here, along with a potential solution).
Your thinking is probably correct. Examine is good at what it does; if you want more advanced searching, then you will be better off using a more advanced search provider. There are loads of options. Algolia is SaaS and comes with a free plan depending on your usage. It's easy to install and you can target your data from the front-end.
You could also look into Azure Cognitive Search or Solr. These are probably harder to implement, but will also do the job.
This is the first time I'm modelling a hierarchy within the same model (product categories).
I found a great post on this topic. Since I use Rails 4 and Postgres, which according to the article supports recursive querying (this is the first time I've heard this term), the "Adjacency List With Recursive Query" approach seems to be the way to go, because it's both easy to model and fast to query.
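For reference, here is roughly what such a recursive query looks like when run through ActiveRecord (a sketch; the table and column names are illustrative):

# WITH RECURSIVE walks the parent_id adjacency list in a single
# statement, collecting the whole subtree under category 1.
subtree = ActiveRecord::Base.connection.select_all(<<-SQL)
  WITH RECURSIVE subtree AS (
    SELECT id, parent_id, name FROM categories WHERE id = 1
    UNION ALL
    SELECT c.id, c.parent_id, c.name
    FROM categories c
    INNER JOIN subtree s ON c.parent_id = s.id
  )
  SELECT * FROM subtree;
SQL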
The article suggests the acts_as_sane_tree gem, which supports recursive querying. That repo hasn't been updated in two years, and I'm not sure whether it supports Rails 4. The project is a fork of the acts_as_tree gem, which supports Rails 4 and is well maintained.
Which gem should I use? And does the acts_as_tree gem support recursive querying to avoid expensive queries?
If you are in doubt about which gem to use, I always suggest taking a look at the Ruby Toolbox. It helps you evaluate whether a gem project is still active, how many developers use the gem, and a lot more. Why do you need to do that? You do not want to choose a gem that is no longer maintained. You want to use the tools that the community uses and stay as close to the mainstream as possible. If you do not follow the community, you will run into problems when you need a bug fixed, need further documentation, or want to update your Rails version.
In this case, for nested ActiveRecord models, awesome_nested_set and ancestry are good candidates. I would not choose the recursive query implementation, because most databases do not support it. Unless you have a very good reason, it is not worth binding your app to a specific database management system.
Have you considered the ancestry gem?
"It exposes all the standard tree structure relations (ancestors, parent, root, children, siblings, descendants) and all of them can be fetched in a single SQL query."
I'd agree with the accepted answer on one point - it's good to go for gems that are well maintained.
On two points I disagree:
Firstly, just because a gem is popular doesn't make it the right choice, or even a good choice. Take the ancestry gem as an example. It's been around a long time and is popular, but it requires you to add a special column to your tables which it fills with magic voodoo (I'm very uncomfortable with that sort of thing). Whereas a gem like acts_as_recursive_tree does all the same things as ancestry, also using single queries, but it only requires a parent_id column that holds the ID of the parent - probably what you already have before even hunting for a gem.
Another example - there was a gem for linking uploaded files to records. I chose it because it seemed the popular choice. But I ditched it as soon as I discovered it was actually modelling a many-to-many relationship, not with a join table, but by putting a comma-separated list of IDs into a single field (can you believe it?).
Secondly, if the database you have chosen has cool features like a recursive query implementation, then by all means use them - that's part of the reason you chose the superior database in the first place, isn't it? Unless you have a need for your application to be database-agnostic, don't be scared of using the features your database provides. Mitigating against the very unlikely possibility that sometime in the future you'll want to switch to a database with fewer features than your current one is certainly not worth the cost of avoiding the more powerful gems that use the features of your database.
Anyway, my recommendation is acts_as_recursive_tree. It's very easy to use, powerful, and actively maintained.
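A rough sketch of its use (from memory of the gem's README, so treat the exact macro and method names as assumptions):

# acts_as_recursive_tree works off a plain parent_id column and
# fetches relations with recursive CTEs under the hood.
class Category < ActiveRecord::Base
  acts_as_tree
end

node = Category.find(42)
node.ancestors     # single recursive query
node.descendants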
I am using thinking_sphinx in Rails. As far as I know, Sphinx is used for full text search. Let's say I have these queries:
keyword
country
sort order
I use Sphinx for all the searches above. However, when I am querying without a keyword, just by country and sort order, is it better to use a normal MySQL query instead of Sphinx?
In other words, should Sphinx be used only when keyword is searched?
I'm looking at overall performance and speed.
Not to sound snarky, but does performance really matter?
If you're building an application which will only be used by a handful of users within an organization, then you can probably dismiss the performance benefits of using one method over the other and focus instead on simplicity in your code.
On the other hand, if your application is accessed by a large number of users on the interwebz and you really need to focus on being performant, then you should follow #barryhunter's advice above and benchmark to determine the best approach in a given circumstance.
P.S. Don't optimize before you need to. Fight with all your heart to keep code out of your code.
Benchmark! Benchmark! Benchmark!
I.e. test it yourself. The exact performance will vary depending on the exact data, and perhaps even on the relative performance of your Sphinx and MySQL servers.
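A quick-and-dirty sketch with Ruby's stdlib Benchmark (the model and attribute names are illustrative; the Sphinx call assumes a Thinking Sphinx attribute filter):

require "benchmark"

Benchmark.bm(8) do |x|
  x.report("sphinx") do
    # .to_a forces the lazy Thinking Sphinx search to actually run.
    100.times { Product.search(with: { country_id: 1 }, order: "created_at DESC").to_a }
  end
  x.report("mysql") do
    100.times { Product.where(country_id: 1).order("created_at DESC").to_a }
  end
end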
Sphinx will offer killer-speeds over MySQL when searching by a text string and MySQL will probably be faster when searching by a numerical key.
So, assuming that both "country" and "sort order" can be indexed with a numeric index in MySQL, it would be better to use Sphinx only with "keyword", and a normal MySQL query for the other two.
However, benchmarks won't hurt, as barryhunter suggested ;)
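In code, that split might look like this (a sketch; the names are illustrative):

# Use Sphinx only when there is a keyword; otherwise hit MySQL directly.
if params[:keyword].present?
  @products = Product.search(params[:keyword],
                             with:  { country_id: params[:country_id] },
                             order: "created_at DESC")
else
  @products = Product.where(country_id: params[:country_id])
                     .order("created_at DESC")
end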
I have a database (psql) that contains about 16,000 records; they are the titles of movies. I am trying to figure out the most optimal way to go about searching them (currently they are searched via the web on a Heroku-hosted Ruby on Rails website). However, some queries, such as searching for the word 'a', can take up to 20 seconds. I was thinking of using Sphinx; however, such packages are advertised for full text searching, so I am wondering if that is appropriate for my problem. Any advice would be appreciated.
16,000 records is too few, in both number and size (you said they are titles), to call for a search engine. Try the normal full text search of your database, and set up indexes to make it faster.
However, this does not stop you from trying out a search engine like Sphinx or Solr. Both are open source, and Sphinx is pretty easy to set up too. But to reiterate, there is no need for this, as the data size is too small and falls squarely in the domain of database full text search.
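Setting up such an index in Postgres is a short migration (a sketch; the table and column names are illustrative):

class AddTitleSearchIndex < ActiveRecord::Migration
  def up
    # A GIN index over the tsvector expression makes full text
    # queries against movies.title fast.
    execute <<-SQL
      CREATE INDEX index_movies_on_title_tsv
      ON movies USING gin (to_tsvector('english', title));
    SQL
  end

  def down
    execute "DROP INDEX index_movies_on_title_tsv;"
  end
end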
If your database is on Heroku Postgres, then Sphinx is not possible, as Heroku Postgres is not yet supported to work with Sphinx. The remaining choice so far is Solr, which is also good for full text search and takes only a few simple steps to implement.
I am considering switching a quite big application (Rails 3.0.10) from our SQL databases (SQLite and Postgres) to MongoDB. I plan to put everything in it: mainly UTF-8 strings, binary files, and user data (maybe also a little full text search). I have complex relationships (web structure: categories, tags, translations..., polymorphic ones as well), and I feel that MongoDB's philosophy is to avoid that and to put everything into big documents. Am I right?
Does anyone have experience with MongoDB in Rails? Particularly with switching an app from ActiveRecord to Mongoid? Do you think it's a good idea? Do you know a guide/article to learn the MongoDB way of organizing complex data?
P.S.: In MongoDB, I particularly like the freedom offered by its architecture and its performance orientation. Those are my main personal motivations for considering the switch.
I have been using MongoDB with Mongoid for 5-6 months. I have also worked with Postgres + AR and MySQL + AR, but I have no experience with switching AR to Mongoid.
Are you facing any performance issues, or do you expect to face them soon? If not, I would advise against the switch, as the decision seems to be based purely on the coolness factor of MongoDB.
They both have their pros and cons. I like the speed of MongoDB, but there are many restrictions on what you can do to achieve it (like no joins, no transaction support, and slow field vs. field (updated_at > created_at) queries).
If there are performance issues, I would still recommend sticking with your current system, as the switch might be a big task, and it would be better to spend half that time optimizing the current system. After reading the question, I get the feeling that you have never worked with MongoDB before; there are many things which can bite you, and you would not be fully aware of how to solve them.
However, if you still insist on switching, you need to carefully evaluate your data structure and the way you query it. In a relational database you have the normal forms, which have the advantage that whatever structure you start with, you will roughly reach the same end result once you normalize. In MongoDB there are virtually unlimited ways in which you can model your documents, and you need to model them carefully to reap the benefits of MongoDB. The queries you need to run play a very important role in your structuring, along with the actual data you want to store.
Keep in mind that you do not have joins in MongoDB (this can be mitigated with good modeling). As of now you cannot have queries like field1 = field2, i.e. you can't compare fields; you need to provide a literal to query against.
Take a look at this question: Efficient way to store data in MongoDB: embedded documents vs individual documents. Somebody points the OP to a discussion where embedded documents are recommended, but in a pretty similar scenario the OP chooses to go with standalone documents because of the nature of the queries he will be using to fetch the data.
All I want to say is that it should be an informed decision, taken after you have completely modeled your system with MongoDB and run some performance tests with real data to see if MongoDB will solve your problem - not one based on the coolness factor.
UPDATE:
You can do field1 = field2 using the $where clause, but it's slow and is advised to be avoided.
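For completeness, a sketch of that through Mongoid, which turns a JavaScript string passed to where into a $where clause (slow, because the JavaScript runs against each document):

# Finds posts edited after creation - works, but avoid it on large collections.
Post.where("this.updated_at > this.created_at")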
We are currently switching to MongoDB from PostgreSQL, tsearch, and PostGIS in a production application. It has been a challenging process, to say the least. Our data model is a better fit for MongoDB because we don't need to do complex joins; we can model our data very easily into the nested document structure MongoDB provides.
We have started a mirror site with the mongodb changes in it so we can leave the production site alone, while we stumble through the process. I don't want to scare you, because in the end, we will be happy we made the switch - but it is a lot of work. I would agree with the answer from rubish: be informed, and make the decision you feel is best. Don't base it on the 'coolness' factor.
If you must change, here are some tips from our experience:
ElasticSearch fits well with mongo's document structure to replace PostgreSQL's tsearch full text search extensions.
It also has great support for point based geo indexing. (Points of interest closest to, or within x miles/kilometers)
We are using Mongo's built in GridFS to store files, which works great. It simplifies the sharing of user contributed images, and files across our cluster of servers.
We are using rake tasks to dump data out of postgresql into yaml format. Then, we have another rake task in the mirror site which imports and converts the data into models stored in mongodb.
The data export/import might also work using a shared Redis database, Resque on both sides, and an observer in the production application to log changes as they happen.
We are using Mongoid as our ODM, and there are a lot of scopes within our models that needed to be rewritten to work with Mongoid vs ActiveRecord.
Overall, we are very happy with MongoDB. It offers us much more flexibility in the way we model our data. I just wish we had discovered it before the project was started.
Skip ActiveRecord. When generating a new app, pass --skip-active-record (rails new myapp --skip-active-record).
Alternatively, if you’ve already created your app, have a look at config/application.rb
and change the first lines from this:
require "rails/all"
to this:
require "action_controller/railtie"
require "action_mailer/railtie"
require "active_resource/railtie"
require "rails/test_unit/railtie"
It’s also important to make sure that the reference to active_record in the generator block is commented out:
# Configure generators values. Many other options are available, be sure to check the documentation.
# config.generators do |g|
# g.orm :active_record
# g.template_engine :erb
# g.test_framework :test_unit, :fixture => true
# end
As of this writing, it's commented out by default, so you probably won't have to change anything here.
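With ActiveRecord out of the picture, your models include Mongoid instead of inheriting from ActiveRecord::Base. A minimal sketch (the field names are illustrative):

class Article
  include Mongoid::Document

  field :title, type: String
  field :body,  type: String
end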
I hope this will be helpful to you while switching your app from AR to Mongo.
Thanks.