Couldn't find any comparison questions on Sunspot (Solr) versus ElasticSearch (Lucene).
What would be the pros and cons of each search engine?
I saw other "vs" questions that gave a better insight into the comparison of two gems, so I hope it is allowed to ask for a better insight into both engines for newbies (like me). I have looked at Sunspot already but have some issues with it, so I searched further:
http://www.elasticsearch.org/guide/reference/api/
vs
http://sunspot.github.com/
I started working on a project that needed full text search in Ruby, so naturally I started with Solr + Sunspot, but I couldn't get it to work. It was a pain just to get them connected. Then I had to figure out whether the documents were indexed correctly, figure out the runtime classpath so I could add additional analyzer/tokenizer classes, edit config.xml/schema.xml, etc. Solr's numDocs clearly said it had received and indexed them, but I couldn't get any query results. I gave up after a couple of days; it was configuration hell.
ElasticSearch + Tire was a breeze to get up and running; I got it working in an hour.
Lucene is just a Java search library, hence Solr was developed to be a full-service search app. But Solr still has all the trappings of a typical Java webapp: overly complicated XML configuration, a heavy schema, XML docs expected for indexing, and a required Java servlet container (Jetty or Tomcat), which together became too many points of failure for me.
ElasticSearch is based on Lucene too. It embeds its own HTTP server, so it just runs like a daemon, and it uses a very straightforward JSON + REST API, so it's great for testing and a more natural fit for Ruby. It's schemaless, and it worked for me without even editing a config file. Everything worked beautifully.
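To make the JSON + REST point concrete, here is a minimal Ruby sketch that builds (but does not send) a search request using only the standard library. The host localhost:9200, the index name articles, and the field title are assumptions for illustration, and the helper name build_search_request is hypothetical; the body uses the shape of a simple match query.

```ruby
require "json"
require "net/http"
require "uri"

# Build a POST request for the _search endpoint of a hypothetical index.
# Nothing is sent over the network here.
def build_search_request(base_url, index, field, text)
  uri = URI.join(base_url, "/#{index}/_search")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ query: { match: { field => text } } })
  [uri, request]
end

uri, request = build_search_request("http://localhost:9200", "articles", "title", "ruby")
puts uri
puts request.body
```

With a server running, the request could be sent with Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }. A gem like Tire wraps this kind of call for you.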
What I really needed was Chinese search, and ElasticSearch already packages Lucene's SmartChineseAnalyzer as a plugin. I'm not sure how difficult it would be to customize the analyzer/tokenizer chain if you need that level of customization. Documentation for both ElasticSearch and Tire is top-notch.
Tire (Ruby library for ElasticSearch)
https://github.com/karmi/tire
You can try out the demo: it installs a Rails searchapp, downloads the ElasticSearch binary and runs it, then starts WEBrick automatically.
$ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb
On my system it complained about not having a JavaScript engine (Rails 3.2 apparently no longer includes the therubyracer gem by default), so I had to:
$ wget https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb
$ nano rails-application-template.rb
Add gem 'therubyracer' to the file (look for gem 'tire' and gem 'will_paginate'), then...
$ rails new searchapp -m rails-application-template.rb
For developing my own app, I just downloaded the ElasticSearch tarball and ran it in the foreground with the -f switch (so I can easily stop it with Ctrl-C):
$ bin/elasticsearch -f
You can install the elasticsearch-head plugin to get a web admin interface:
https://github.com/mobz/elasticsearch-head
Also something I found out: if you have one-to-many relationship models, Tire won't resolve them for you in the search results, it just returns a flat collection. Your has_many and belongs_to relationships will just be object ids in the collection rather than full objects.
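One way to work around the flat results is to collect the ids from the hits and look up the related objects in a second step. The sketch below uses plain hashes in place of real search hits and records; the names AUTHORS, resolve_authors, and author_id are hypothetical, and in a Rails app the lookup would be a query by ids rather than a hash.

```ruby
# Stand-in for the related table; hypothetical data.
AUTHORS = {
  1 => { id: 1, name: "Ann" },
  2 => { id: 2, name: "Bob" }
}.freeze

# Attach the full author record to each hit, looked up by :author_id.
def resolve_authors(hits, authors_by_id)
  hits.map do |hit|
    hit.merge(author: authors_by_id[hit[:author_id]])
  end
end

# Flat hits, as a search result might return them (ids only).
hits = [
  { title: "Intro to search", author_id: 2 },
  { title: "Analyzers", author_id: 1 }
]

resolve_authors(hits, AUTHORS).each do |hit|
  puts "#{hit[:title]} by #{hit[:author][:name]}"
end
```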
I think you should search for a comparison between Solr and elastic search.
In fact sunspot is based on Solr, and both Solr and elastic search are based on Lucene. They are two different projects with similar goals, both built on top of Lucene.
Here are two comparisons:
ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?
http://www.findbestopensource.com/article-detail/solr-vs-elasticsearch
Here is the most complete up-to-date post on the topic: http://solr-vs-elasticsearch.com/
My recommendations as of May 2018
Here are some simple guidelines if the crazy long grid of features
above did not help.
Choose Solr if any of the following are true...
Your team consists mainly of Java programmers
You're already using ZooKeeper in your stack
You're already using Java in your stack
You are building a search application that has specific and nuanced relevancy requirements
You are building an ecommerce, job, or product search engine
Search is a central part of your product and user experience and there is the organizational mandate for search to be a core strength
Choose Elasticsearch if any of the following are true...
Your team consists mainly of Ruby/PHP/Python/full stack programmers (and your application does not have specific and nuanced relevancy
requirements)
You live and breathe JSON
You already use Kibana/ELK for managing your logs
Your application is analytics-heavy
If in doubt...
Every serious search application I have worked on has required
in-depth customization of the search workflow and relevancy tweaks,
and at the time of writing, this is simply not possible in
Elasticsearch without major hacking. If in doubt, go Solr.
Related
I've added the sunspot gem to my application and deployed it to production on Heroku, but when I try to reindex my database I get an error. I did some more digging and I think I have to add websolr as an add-on? This costs $20/month. Is this the only option?
Thanks
Founder of Websolr + Bonsai here (Heroku addons for Solr and Elasticsearch).
Rich's answer is pretty solid, with the exception of the SQL LIKE operator, which I do not recommend. The performance does not scale, and you're going to sink in a lot more time than you might expect in order to eke out baseline search functionality. End result: a lot of time spent, and unhappy users.
Postgres full text search is a reasonable alternative, though the term analysis and result ranking will be lacking compared to Solr/Elasticsearch as your search traffic starts to grow in production.
You might also consider our sister service, Bonsai, which does offer a free Starter plan. It uses Elasticsearch, which means you'd want to use the official Ruby bindings for Elasticsearch rather than Sunspot.
Lastly, if you already have a production app on Heroku, you are welcome to create more than one index in your account, and share those indexes with your staging/qa and other apps.
I've done some more research and found out that there are other options if you don't want to take the websolr path. The other answers are good for some insights, but they don't give an alternative that can be used.
For anyone who is still looking, I suggest taking a look at Elastic Search.
RailsCasts has a good tutorial on this as well.
To use it with Heroku, look into Bonsai, which gives users a free option.
Hopefully this answer will help those who are also seeking options other than the sunspot gem with Solr.
Solr on Heroku uses its own add-on, which starts at $20 per month.
I don't know why it costs money up front and doesn't have a "trial" option like many of the other Heroku add-ons, but there are certain ways around it.
Full Text Search
Full text search is what you're performing, and Solr is a tool to make the process much more efficient. Despite being quite DB-expensive, you can use full text searching with Heroku, depending on your DB:
MySQL
To perform basic text searching in MySQL, you can simply use the LIKE operator with %variable% as your search phrase, like this:
SELECT * FROM `table` WHERE `name` LIKE '%benjamin%'
This finds all the records where the name column contains "benjamin" somewhere inside it. Note that the pattern is a string literal, so it takes single quotes rather than backticks. This is quite slow.
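If you do go the LIKE route, the search term should be passed as a bound parameter, and any % or _ in the user's input should be escaped so they are not treated as wildcards. Here is a minimal sketch in plain Ruby; the helper name like_pattern is hypothetical, and it assumes backslash is the escape character (the MySQL default).

```ruby
# Escape LIKE wildcards (and the escape character itself) in user input,
# then wrap the term in % so it matches anywhere in the column.
def like_pattern(term)
  escaped = term.gsub(/[\\%_]/) { |char| "\\#{char}" }
  "%#{escaped}%"
end

puts like_pattern("benjamin")
puts like_pattern("100%_done")
```

In a Rails model this could be used as where("name LIKE ?", like_pattern(term)). ActiveRecord also ships a sanitize_sql_like helper that does the escaping part.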
PostgreSQL
PostgreSQL offers more power in its full text searching, but is nonetheless still quite slow and expensive. You can read more about it here, but with Rails you can use one of several gems that do the task for you.
We recently used a gem called textacular here: http://firststop.herokuapp.com
Here is the code we used for it:
# Search
def self.search(search)
  basic_search(name: search, description: search)
end
Further Reading
You can see how full text searching works here: Any reason not use PostgreSQL's built-in full text search on Heroku?
I would recommend it if you're just getting the foundations established for your app. Afterwards, you can upgrade to a more dedicated solution in the form of Solr et al.
Here are
If you want to use the Heroku platform it starts for free, but you have to pay for almost every add-on, extra workers, extra storage, search engine, background tasks, you name it.
For $20/month you could also get a decent VPS, but you would have to install and manage that server by yourself.
As for sunspot/solr on Heroku, I don't think you can do that for free.
I have a Mongoid embedded one-to-many model on Rails 3.1 that I want to run full text search within. I need something very light and simple to deploy on Heroku too, without having to pay for add-ons initially.
All Heroku full-text search add-ons currently seem to have only paid plans (which is no good to start with); see Flying Sphinx and Websolr.
I need advice on a good solution (a Ruby gem deployable on Heroku) to start with and then eventually scale to other cloud services.
Maybe MongoDB's core functionalities are enough for your needs:
http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo
There are two more possibilities that come into my mind:
1) you can use this gem:
https://github.com/mauriciozaffari/mongoid_search
2) you can use elasticsearch (http://www.elasticsearch.org/) and use the following gem:
https://github.com/karmi/tire
However, you couldn't use this solution with Heroku alone; you would have to set up your own server. If you want to stay with Heroku, EC2 would be a good choice for that server.
We were using the sunspot_mongo gem with solr on Mongoid 2.4.
But after upgrading to Mongoid 3, support for sunspot seems to not be there. So we're investigating a move to elasticsearch with the tire gem. There are some new offerings in the "search as a service space" for elasticsearch, but they don't seem quite production ready yet, so hoping that changes quickly.
Hope it helps!
I am wondering on how to implement a search functionality like Github.
Just one search box on the top header right and when searched for a keyword, displays the results for Repository, Code and User.
Is there any tutorial or example to implement this on Rails 3?
Odds are really good they're doing separate searches across the tables for the same value, then combining the results afterwards.
Use Rails to create a small form containing a text field. When it's submitted take the value of the field and do a query using that as the search term.
If you're not sure how to do queries using ActiveRecord, see "Active Record Query Interface" for a nice overview.
You will have to do several queries, one per model, and put the results together on the same view.
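The "one query per model, then combine" approach can be sketched without any database. Here each searcher is a lambda standing in for a model query such as a where clause; the category names, sample data, and the search_all helper are hypothetical.

```ruby
# Hypothetical sample data standing in for two tables.
REPOSITORIES = ["rails/rails", "sunspot/sunspot"].freeze
USERS = ["railsfan", "alice"].freeze

# One searcher per category; each stands in for a separate model query.
SEARCHERS = {
  repositories: ->(term) { REPOSITORIES.select { |name| name.include?(term) } },
  users:        ->(term) { USERS.select { |name| name.include?(term) } }
}.freeze

# Run every searcher with the same term and group results by category.
def search_all(term, searchers)
  searchers.transform_values { |searcher| searcher.call(term) }
end

results = search_all("rails", SEARCHERS)
results.each do |category, matches|
  puts "#{category}: #{matches.join(', ')}"
end
```

In a controller, the resulting hash could be passed to a single view that renders one section per category.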
If your question is "how do I do full text searches on several activerecord models in a DRY way" then there are basically two paths:
The common solution, though a bit complex, is to run a dedicated daemon on your machine, like Sphinx. Sphinx is a service (like Apache or MySQL) that indexes your content and allows you to do searches. You can use the Thinking Sphinx gem to communicate with it easily from Rails. An alternative to Sphinx is Solr (there's also a gem for it called Sunspot).
If you are using PostgreSQL, there's a simpler alternative that doesn't require external services running on your server. PostgreSQL has some full-text search capabilities built in. There's a gem called texticle that helps you use them from Rails. You can have that working very quickly.
Want to build a web app using SOLR as the only backend. Most of the data will be stored in SOLR via offline jobs although there is some need for CRUD.
Looking at popular web frameworks today like Rails, Django, web2py etc. despite NoSQL the sweet spot for productivity still seems to be around active record implementations sitting on top of a RDBMS.
What's the best framework, in terms of productivity, for building web apps with SOLR as the backend?
All three of the above answers are great recommendations for development frameworks. I would flip around your question and ask "Which is best web app framework for me", not "which is best with Solr" and make a decision based on your skills, the community that you have around you, and other soft factors. Especially if you are completely agnostic on which way to go.
If you have friends who love Grails and can help you get started, then Grails might be the way to go. Have a Python group that meets regularly? Then Django has a lot to offer. I personally love Rails, and so I would recommend rails. But that is only a recommendation of "What I like" versus "what is best".
The wonderful thing about Solr is how agnostic it is to the front end. It plays nice in so many environments!
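Part of why Solr is front-end agnostic is that queries are plain HTTP requests, so any framework that can build a URL can talk to it. A minimal Ruby sketch follows; the host, port, and core name mycore are assumptions, the helper name solr_select_url is hypothetical, and the exact path depends on how your Solr instance is set up.

```ruby
require "uri"

# Build a Solr select URL. The host, port, and core name are assumptions.
def solr_select_url(base_url, core, query, rows: 10)
  params = URI.encode_www_form({ q: query, wt: "json", rows: rows })
  "#{base_url}/solr/#{core}/select?#{params}"
end

puts solr_select_url("http://localhost:8983", "mycore", "body:hello")
```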
The web2py Database Abstraction Layer does not support SOLR at this time, which means you cannot use the DAL syntax for accessing SOLR and you cannot use automatically generated forms from a SOLR DB schema. Yet you can generate forms using SQLFORM.factory as if you had a normal relational database and perform the insert/select/update into SOLR manually. web2py includes libraries for parsing/writing both JSON and XML, so it will be easy to implement SOLR APIs in a few lines of code. If you bring this up on the web2py mailing list we can help with some examples.
EDIT (copied from the answer on the web2py mailing list):
Normally in web2py you define a model
db.define_table('message',Field('body'))
and then web2py generates and processes forms for you:
form=SQLFORM(db.message)
if form.accepts(request.vars):
    do_something
In your case you would not use define_table because web2py DAL does
not support SOLR and you cannot generate forms from the schema but you
can install this: http://code.google.com/p/solrpy/
and you can do
# in model
import solr
s = solr.SolrConnection('http://example.org:8083/solr')
# in controller
form=SQLFORM.factory(Field('body'))
if form.accepts(request.vars):
    # store the submitted field in SOLR under the same name
    s.add(body=request.vars.body)
    s.commit()
    do_something
So the difference is SQLFORM.factory instead of SQLFORM and the extra
lines after accepts. That is it.
I would use Sunspot 1.2 and Rails 3.
Sunspot is commonly used as an ActiveRecord extension, but is also designed to be ORM-agnostic. Rails 3 has decoupled ActiveRecord from the framework, making it easy to go entirely ORM-free.
http://outoftime.github.com/sunspot/
By the way, SphinxSearch is a lot faster than Solr/Lucene and has many unique features. Search accuracy is also a lot better, judging from my experience and independent benchmarks.
It has a native, very easy Python API and it integrates well with web2py.
It does need an RDBMS, though. I am using web2py + SphinxSearch to build a search engine for office files.
You can give it a try too.
www.sphinxsearch.com
I would like to do full-text searching of data in my Ruby on Rails application. What options exist?
There are several options available and each have different strengths and weaknesses. If you would like to add full-text searching, it would be prudent to investigate each a little bit and try them out to see how well it works for you in your environment.
MySQL has built-in support for full-text searching. It has online support meaning that when new records are added to the database, they are automatically indexed and will be available in the search results. The documentation has more details.
acts_as_tsearch offers a wrapper for similar built-in functionality for recent versions of PostgreSQL
For other databases you will have to use other software.
Lucene is a popular search provider written in Java. You can use Lucene through its search server Solr with Rails using acts_as_solr.
If you don't want to use Java, there is a port of Lucene to Ruby called Ferret. Support for Rails is added using the acts_as_ferret plugin.
Xapian is another good option and is supported in Rails using the acts_as_xapian plugin.
Finally, my preferred choice is Sphinx using the Ultrasphinx plugin. It is extremely fast and has many options on how to index and search your databases, but is no longer being actively maintained.
Another plugin for Sphinx is Thinking Sphinx which has a lot of positive feedback. It is a little easier to get started using Thinking Sphinx than Ultrasphinx. I would suggest investigating both plugins to determine which fits better with your project.
I can recommend Sphinx. Ryan Bates has a great screencast on using the Thinking Sphinx plugin to create a full-text search solution.
You can use Ferret (which is Lucene written in Ruby). It integrates seamlessly with Rails using the acts_as_ferret mixin. Take a look at "How to Integrate Ferret With Rails". An alternative is Sphinx.
Two main options, depending on what you're after.
1) Full Text Indexing and MATCH() AGAINST().
If you're just looking to do a fast search against a few text columns in your table, you can simply use a full text index of those columns and use MATCH() AGAINST() in your queries.
Create the full text index in a migration file:
add_index :table, :column, type: :fulltext
Query using that index:
where( "MATCH( column ) AGAINST( ? )", term )
2) ElasticSearch and Searchkick
If you're looking for a full blown search indexing solution that allows you to search for any column in any of your records while still being lightning quick, take a look at ElasticSearch and Searchkick.
ElasticSearch is the indexing and search engine.
Searchkick is the integration library with Rails that makes it very easy to index your records and search them.
Searchkick's README does a fantastic job at explaining how to get up and running and to fine tune your setup, but here is a little snippet:
Install and start ElasticSearch.
brew install elasticsearch
brew services start elasticsearch
Add searchkick gem to your bundle:
bundle add searchkick --strict
The --strict option just tells Bundler to use an exact version in your Gemfile, which I highly recommend.
Add searchkick to a model you want to index:
class MyModel < ApplicationRecord
  searchkick
end
Index your records.
MyModel.reindex
Search your index.
matching_records = MyModel.search( "term" )
I've been compiling a list of the various Ruby on Rails search options in this other question. I'm not sure how, or if to combine our questions.
It depends on what database you are using. I would recommend using Solr as it offers up a lot of nice options. The downside is you have to run a separate process for it. I have used Ferret as well, but found it to be less stable in terms of multi-threaded access to the index. I haven't tried Sphinx because it only works with MySQL and Postgres.
Just a note for future reference: Ultrasphinx is no longer being maintained. Thinking Sphinx is its replacement. Although it currently lacks several features that Ultrasphinx had, such as excerpting, it makes up for it with other features.
I would recommend acts_as_ferret, as I am using it for the Scrumpad project at work. The indexing can be done as a separate process, which ensures that we can still use our application while re-indexing. This can reduce the downtime of the website. The searching is also much faster. You can search through multiple models at a time and have your results sorted by the fields you prefer.