Solr sunspot in different environments - ruby-on-rails

So I have been using Solr Sunspot for a couple of days and have been going a little crazy over an issue with it. I have searched many different sites for the answer, but a lot of people seem to have different ideas. I am trying to figure out how Sunspot manages indexing/reindexing of models in different environments with auto indexing on. This is what I have concluded:
Dev/Prod: Saving/updating an object automatically updates the indexing of that object in solr
Test: Saving/updating an object does not automatically update the index and you need to call object.reindex! in order for it to take effect.
Console: Same as testing. Reindex is required to properly update solr.
So does this look correct? For a long time I couldn't tell whether something was wrong with my Solr Sunspot setup or whether it just doesn't work the same way in these different environments. Any help would be greatly appreciated!

After updating a document in Solr, you must issue a "commit" to tell Solr to write the changes to disk and have them start appearing in search results. Sunspot::Rails takes care of this automatically in the course of a Rails request, but outside of that (in tests, from the console), you need to do it explicitly. It's a simple Sunspot.commit.
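For example, from the console or a test (Post here is a hypothetical searchable model, not something from your app):
post = Post.create!(title: 'Hello world')
Sunspot.commit  # flush pending updates so they show up in search results
Post.search { fulltext 'hello' }.results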

There are three main ways to update an index:
object.index: marks the object for indexing, but might not be immediately indexed
object.index!: indexes the object immediately
Class.reindex: indexes all objects on the model immediately
It sounds like you should be using the .index! method instead of the .index method in the console.
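To illustrate the difference (again with a hypothetical Post model):
post.index    # queued for indexing; not searchable until the next commit
post.index!   # indexes and commits immediately
Post.reindex  # rebuilds the index for every Post record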

Related

Heroku Rails Websolr, sunspot is not free in production?

I've added the sunspot gem to my application and tried to deploy it to production on Heroku, but when I try to reindex my database I get an error. I did some more digging and I think I have to add Websolr as an add-on? This costs $20/month. Is this the only option?
Thanks
Founder of Websolr + Bonsai here (Heroku addons for Solr and Elasticsearch).
Rich's answer is pretty solid, with the exception of the SQL LIKE operator, which I do not recommend. Its performance does not scale, and you're going to sink far more time than you might expect into eking out baseline search functionality. End result: a lot of time spent, and unhappy users.
Postgres full text search is a reasonable alternative, though the term analysis and result ranking will be lacking compared to Solr/Elasticsearch as your search traffic starts to grow in production.
You might also consider our sister service, Bonsai, which does offer a free Starter plan. It uses Elasticsearch, which means you'd want to use the official Ruby bindings for Elasticsearch rather than Sunspot.
Lastly, if you already have a production app on Heroku, you are welcome to create more than one index in your account, and share those indexes with your staging/qa and other apps.
I've done some more research and found that there are other options if you don't want to take the Websolr path. The other answers offer good insights, but they don't give an alternative you can actually use.
For those still looking, I suggest taking a look at Elasticsearch.
RailsCasts has a good tutorial on this as well.
And to use it with Heroku, look into Bonsai, which gives users a free option.
Hopefully this answer will help those who are also seeking options other than the sunspot gem with Solr.
Solr on Heroku uses its own add-on, which starts at $20/month:
I don't know why it charges up front and doesn't have a "trial" option like many of the other Heroku add-ons, but there are certain ways around it.
Full Text Search
Full text search is what you're performing, and Solr is a tool to make the process much more efficient. Although it is quite DB-expensive, you can use full text searching on Heroku, depending on your DB:
MySQL
To perform full-text searching in MySQL, you can simply use the LIKE operator with %variable% as your search phrase, like this:
SELECT * FROM `table` WHERE `name` LIKE '%benjamin%'
This finds all the records where the name column contains "benjamin" somewhere inside it. It is quite slow.
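The ActiveRecord equivalent (with a hypothetical Person model) would be:
Person.where('name LIKE ?', '%benjamin%')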
PostgreSQL
PostgreSQL offers more power in its full text searching, but it is nonetheless still quite slow and expensive. You can read more about it here, but with Rails you can use any of several gems that do the task for you.
We recently used a gem called textacular here: http://firststop.herokuapp.com
Here is the code we used for it:
# Search: uses textacular's basic_search against the name and description columns
def self.search(search)
  basic_search(name: search, description: search)
end
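It can then be called from a controller action like this (the model name is illustrative):
@results = Product.search(params[:search])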
Further Reading
You can see how full text searching works here: Any reason not use PostgreSQL's built-in full text search on Heroku?
I would recommend this approach if you're just getting the foundations of your app established. Afterwards, you can upgrade to a more dedicated solution in the form of Solr et al.
If you want to use the Heroku platform, it starts for free, but you have to pay for almost every add-on: extra workers, extra storage, search engine, background tasks, you name it.
For $20/month you could also get a decent VPS, but you would have to install and manage that server by yourself.
As for sunspot/solr on Heroku, I don't think you can do that for free.

Caching large numbers of ActiveRecord objects

There's an oft-called method in my Rails application that retrieves ~200 items from the database. Rather than do this again and again, I store the results using Rails.cache.write. However, when I retrieve the results using Rails.cache.read, it's still very slow: about 400ms. Is there any way to speed this up?
This is happening in a controller action, and I'd prefer users not have to wait so long to load the page.
FYI regarding Rails caching, from the Rails Guides, "...It’s important to note that query caches are created at the start of an action and destroyed at the end of that action and thus persist only for the duration of the action."
If you can share the method, I may be able to help more quickly. Otherwise, a couple of performance best practices:
Use .includes to avoid N+1 queries. Define this in the model and call it in the controller (see the sketch below).
How are your indexes set up (if any)?
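To sketch both points together (model, association, and cache-key names are all illustrative):
class Item < ActiveRecord::Base
  # eager-load associations so rendering the cached records doesn't trigger N+1 queries
  scope :with_details, -> { includes(:category, :tags) }
end

# In the controller: cache the materialized array, not the lazy relation,
# otherwise Rails.cache stores an unevaluated scope.
@items = Rails.cache.fetch('items/all', expires_in: 10.minutes) do
  Item.with_details.limit(200).to_a
end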

reloading tire/elasticsearch mappings for a model that already has data stored

I am using Tire and elasticsearch to provide search functionality on a MongoMapper model, which is part of a Rails App. I just stumbled across a problem where the mappings for this model were not being updated when I redeployed to an environment that uses the following configuration (in config/environments/env_name.rb):
config.cache_classes = true
Reloading the class alone didn't seem to fix the issue (perhaps understandably; the new mappings might not be compatible with the existing data, I guess?). Instead I had to do the following:
MyModel.index.delete
<restart the app or reload the class>
MyModel.index.import MyModel.all
I just wondered if there's a better way of a) ensuring the latest mappings defined in my model code are used by elasticsearch after each deployment, while b) avoiding unnecessarily repopulating the index with the complete dataset?
We normally deploy using Chef, so I could automate the three steps I used successfully without too much trouble. But I'm new to elasticsearch and tire so I thought it's highly likely I'm misusing both or making things unnecessarily difficult.
Couple of points here:
Tire tries to create the index with correct mapping when the class loads
but Tire does not attempt to create the index for the model when it already exists
So, your question is really more about the proper workflow? When you deploy a new version of the application, you shouldn't re-populate the index, in the same way you don't re-populate the database from some kind of backup.
Automatically checking that the index mapping conforms to the current definition in the model is certainly possible (compare MyModel.tire.index.mapping with MyModel.tire.mapping, re-populate if different, etc.), but it's something I'd be wary of doing.
The developer usually knows when she changed the mapping and should re-index the data. Dropping the index and re-populating it also means search downtime, and isn't even feasible for large applications.
A nicer solution is to use a specific index name such as my-index-2012-12 when importing the data, and point a my-index alias to this index. Then you can freely re-populate the index, and flip the alias when you're done, without downtime. Tire tries hard to support you in this kind of workflow (the Rake import task, etc).
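A rough sketch of that aliasing workflow (the index and alias names are illustrative, and the exact API may differ between Tire versions):
new_index = "my-index-#{Time.now.strftime('%Y-%m')}"

# Build the new index in the background, leaving the live one untouched.
index = Tire.index(new_index)
index.create
index.import MyModel.all

# Flip the alias so searches hit the freshly built index, with no downtime.
Tire::Alias.new(name: 'my-index', indices: [new_index]).save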

Preserve external changes in CouchDB with CouchRest Model

I'm using couchrest_model to manage some DBs in Rails. So far it has worked like a charm, but I noticed that if I PUT some data via an HTTP request, CouchRest Model doesn't seem to realise that changes were made, so it wipes out the whole record. Of course, I can see the changes in Futon, but not in Rails. When I enter the console, the previously saved instance is just not there.
Of course, I could use HTTP all the way, but I'd really like to make use of validations and the other goodies available in an ActiveRecord-style class.
Is there any chance that I can make these two guys work together?
P.S.
If you think/know that this approach will work with any other CouchDB Ruby/Rails gem, please, do tell! =)
I've mentioned CouchRest Model because IMO it's the most up-to-date and advanced gem out there.
I realised that this one was so damn easy; it's just that I was using the wrong tool (apart from being a proper n00b). AFAICT, it's not possible to use CouchRest Model alone to carry out persistent operations on a CouchDB backend. Any external call that alters the database record(s) in a certain way will somehow "remove" that record from the ActiveRecord-like layer. Instead, you'd probably like to use CouchPotato, since it supports persistent operations.
I'll be glad to give the checkmark if anyone comes up with a vaguely better idea than this one.
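For anyone landing here, a minimal sketch of CouchPotato's persistence API (the class and property names are illustrative; check the gem's README for your version):
class Comment
  include CouchPotato::Persistence
  property :title
end

db = CouchPotato.database
comment = Comment.new(title: 'hello')
db.save_document(comment)     # persists the document to CouchDB
db.load_document(comment.id)  # reads the current revision back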

Count the amount of queries in Rails

I'm trying to optimize my site, and in order to do that I want to know which actions are making more queries than others. Is there any way to know the number of DB hits made by one action?
I found these gems very helpful for inspecting issues and optimizing queries:
https://github.com/noahd1/oink
https://github.com/flyerhzm/bullet
This gem will do exactly what you need: show the number of db queries issued per action:
https://github.com/makandra/query_diet
You could look at the Rails log to see what queries are fired, and how many, for each request. But it's usually a pain to go through the log each time to see which requests are taking time.
Usually I use the newrelic gem in development to see which actions are taking more time, and then try to optimize the queries. Refer to http://newrelic.com/docs/ruby/developer-mode for more info on New Relic in development mode.
Also, depending on your database and Rails version, there are other gems ( https://github.com/flyerhzm/bullet ) which tell you how queries are performed in a particular request.
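If you'd rather not add a gem, you can also count queries yourself with Rails' instrumentation API; a minimal sketch (the log format is just an example, and on older Rails you'd use around_filter instead of around_action):
class ApplicationController < ActionController::Base
  around_action :count_queries

  private

  def count_queries
    count = 0
    # 'sql.active_record' fires for every SQL statement ActiveRecord issues
    subscriber = ActiveSupport::Notifications.subscribe('sql.active_record') do |*_args|
      count += 1
    end
    yield
  ensure
    ActiveSupport::Notifications.unsubscribe(subscriber)
    Rails.logger.info "#{controller_name}##{action_name}: #{count} SQL queries"
  end
end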
