I'm looking into migrating from Solr to AWS CloudSearch. Before I start investigating on my own, I thought I'd ask the community.
My current application is written in ASP.NET MVC (C#) and uses SolrNet.dll to talk to the Solr service.
If anybody has done this migration, please share your experience. Are there any changes in the returned JSON results, the query parameters, or the APIs?
I appreciate your help on this.
Amazon CloudSearch is based on Solr, so conceptually they work the same way, but Amazon has written its own wrapper on top of the Solr API.
Amazon CloudSearch has two endpoints, one for searching and the other for indexing documents.
Amazon CloudSearch has four different query parsers. If you place queries using lucene as the query parser, the query syntax and functionality are the same.
The only functional difference I observed is that CloudSearch doesn't support hierarchy in field data; it only provides flat field types such as text and literal, plus their array variants for multiple values.
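To make the two-endpoint point concrete, here is a minimal sketch of how the request URLs might be built. The endpoint host names are hypothetical placeholders (the real ones are listed on the domain dashboard), and the `2013-01-01` API version is an assumption about which CloudSearch API the domain uses. `q.parser=lucene` selects the Lucene-style parser mentioned above.

```python
from urllib.parse import urlencode

API_VERSION = "2013-01-01"  # assumed CloudSearch API version


def build_search_url(search_endpoint, query, parser="lucene", size=10):
    """Return a search URL for the domain's *search* endpoint."""
    params = urlencode({"q": query, "q.parser": parser, "size": size})
    return f"https://{search_endpoint}/{API_VERSION}/search?{params}"


def build_upload_url(doc_endpoint):
    """Return the batch upload URL for the domain's *document* endpoint."""
    return f"https://{doc_endpoint}/{API_VERSION}/documents/batch"


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    print(build_search_url("search-example.us-east-1.cloudsearch.amazonaws.com",
                           "title:solr"))
    print(build_upload_url("doc-example.us-east-1.cloudsearch.amazonaws.com"))
```

In a SolrNet-based application, this is roughly the layer that would need replacing: the query string can stay Lucene-style, but the URL shape and response envelope differ from Solr's.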
I would like to implement a graph database in our company and would like your expert help. I have a data model and want to put it into AWS Neptune, but I don't know if I want TinkerPop or a property graph. I am a newbie. How do I go about getting this data into Neptune? Can someone tell me how to start, or explain the basics of getting data loaded?
I'm not quite sure I understand the question. Apache TinkerPop is a graph computing framework that includes the Gremlin query language. It is designed for use with property graphs.
Amazon Neptune supports Apache TinkerPop/Gremlin. Loading data into Amazon Neptune can be done via the supported query languages or via the Neptune bulk load API.
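As a rough illustration of the bulk load route, the sketch below builds the HTTP request that starts a load job from files already uploaded to S3. The cluster endpoint, bucket, and IAM role ARN are hypothetical placeholders, and port 8182 is assumed to be the cluster's port. The `csv` format value corresponds to Gremlin-style vertex and edge CSV files.

```python
import json
import urllib.request


def build_loader_request(neptune_endpoint, s3_source, iam_role_arn, region,
                         data_format="csv"):
    """Build (but do not send) a POST request for the Neptune loader endpoint."""
    payload = {
        "source": s3_source,         # S3 prefix holding the data files
        "format": data_format,       # "csv" is the Gremlin CSV format
        "iamRoleArn": iam_role_arn,  # role that lets Neptune read from S3
        "region": region,
        "failOnError": "FALSE",      # keep loading past bad rows
    }
    return urllib.request.Request(
        url=f"https://{neptune_endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; sending only works from inside the cluster's VPC.
    req = build_loader_request(
        "my-cluster.example.neptune.amazonaws.com",
        "s3://my-bucket/graph-data/",
        "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
        "us-east-1",
    )
    print(req.full_url)
    # urllib.request.urlopen(req) would submit the load job.
```

For small data sets, issuing Gremlin `addV`/`addE` queries directly is the other option mentioned above; the bulk loader tends to be the better fit once the data already exists as files.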
I'm looking for a solution that combines the power of Google Site Search with Elasticsearch for a Rails 4 app preferably as a Rails gem.
As far as I can tell, the most popular search solutions (Thinking Sphinx, Elasticsearch with the tire gem, Sunspot Solr and, lastly, PostgreSQL's search functionality) all seem to handle only database searching and do not offer page/HTML template searching.
If I have this wrong then please correct me and I will happily pick one of the above.
The site contains mainly static HTML so the Google Site Search api is the obvious solution but there are some ActiveRecord results that should also be included in the results of a search.
If there really isn't a simple combined solution, then I would appreciate any pointers on how to merge Elasticsearch results (my preferred AR search solution) with Google Site Search results, where both may include the same pages, so that page results are not duplicated.
To rephrase your question, you want to combine results of Google Site Search with custom search results provided by Sphinx, Solr or Elasticsearch?
First, you cannot really customize the content of Google Site Search results easily. You can customize the design, and you could employ some JavaScript tricks to "merge" its results with another data source, but I'd say that approach is neither maintainable nor, more importantly, usable.
Note that you can display the search results from Elasticsearch with Tire in the same way as ActiveRecord instances; all the usual Rails helpers such as url_for work. The easiest way to evaluate the integration is to generate the example application with the Rails template.
If you want to combine the results from ActiveRecord data and the results from any static pages you might have on your site/application, it wouldn't be hard to write a simple crawler which would retrieve, parse and index the content of static pages and store it as ActiveModel-compatible documents in Elasticsearch.
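The parsing half of such a crawler can be sketched in a few lines. The example below is a language-agnostic illustration written in Python rather than Ruby; the `PageTextExtractor` class, the `page_to_document` helper, and the document fields (`url`, `title`, `body`) are names invented for this sketch, not part of Tire or Elasticsearch. Fetching the pages and sending the documents to the index are left out.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Collect the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def page_to_document(url, html):
    """Turn one static page into a flat document ready for indexing."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title, "body": " ".join(parser.chunks)}


if __name__ == "__main__":
    sample = "<html><head><title>About us</title></head><body><p>We build things.</p></body></html>"
    print(page_to_document("/about", sample))
```

Indexing the static pages and the ActiveRecord data into the same index, keyed by URL, also sidesteps the duplicate-results problem from the question, since each page exists only once.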
I have a site-wide custom written search controller for my Rails 3 app and would like to include results from the site's WordPress blog. What is the best way for me to perform a keyword search on posts from within my Rails app?
If both apps share a database, then just run an SQL query against it. This solution gives you the speed of a direct DB query, but you'll need to construct that query properly in order to get all the relevant data.
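As an illustration of what "constructing the query properly" might involve, the sketch below searches published posts only, by title and content. It assumes the default `wp_` table prefix and the standard `wp_posts` columns. To stay self-contained it runs against an in-memory SQLite table with made-up rows; a real WordPress install uses MySQL, where the parameter placeholder syntax depends on the driver.

```python
import sqlite3

SEARCH_SQL = """
    SELECT ID, post_title
    FROM wp_posts
    WHERE post_status = 'publish'   -- skip drafts and revisions
      AND post_type = 'post'        -- skip pages and attachments
      AND (post_title LIKE ? OR post_content LIKE ?)
    ORDER BY post_date DESC
"""


def search_posts(conn, keyword):
    """Return (ID, title) rows for published posts matching the keyword."""
    pattern = f"%{keyword}%"
    return conn.execute(SEARCH_SQL, (pattern, pattern)).fetchall()


def make_demo_db():
    """Build a tiny stand-in for wp_posts with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, "
        "post_content TEXT, post_status TEXT, post_type TEXT, post_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO wp_posts VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "Hello Rails", "Intro post", "publish", "post", "2013-01-01"),
            (2, "Draft on rails", "wip", "draft", "post", "2013-02-01"),
            (3, "Search tips", "Using rails with search", "publish", "post", "2013-03-01"),
            (4, "About", "rails page", "publish", "page", "2013-04-01"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(search_posts(make_demo_db(), "rails"))
```

The status and type filters are the part that is easy to forget: without them, drafts, revisions, and pages all show up in the results.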
If you don't have access to the WP database from your Rails app then the best way will be to use curl, httparty, RestClient or any other file retrieval library.
To do that, create a WordPress page with a custom template that outputs search results in whichever format is easiest for you to parse in the Rails app (JSON, XML, CSV, URL-encoded, whatever).
Then request that WP page from your Ruby app using curl/RestClient/httparty…
This solution gives you the power of WP template tags and functions to get the results.
Also, instead of creating a custom template from scratch, you can simply copy and tweak search.php from the core template so that it provides the results in the format required by your Rails app.
With this solution you lose the speed of direct DB access, because all search results have to be transferred over HTTP and you have to process the data twice (encode it to the proper format in WP and decode it in the Rails app).
Interesting problem. I think I would approach it like this:
Use RSS as the text transport from the blog to your rails app. This allows the flexibility to add more blogs in the future, change your blog engine, database host, etc. It also protects you from Wordpress code updates. Given the security history of Wordpress, I like to host them in a protected sandbox anyway. RSS is the native language for blogs, so it seems a natural fit for this kind of content integration.
Use the feedzirra gem to import RSS entries into a Rails model.
Use Elasticsearch and tire for fuzzy text searching across both your Rails app and your blog entries. See this Railscast for details.
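The import step above boils down to turning feed items into records the app can store and index. Here is a minimal sketch of that step in Python, using the standard library XML parser in place of feedzirra; the `parse_rss_items` helper and the output field names are invented for illustration, and an RSS 2.0 feed is assumed.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml):
    """Extract title, link and summary from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    sample_feed = """<?xml version="1.0"?>
    <rss version="2.0"><channel><title>Blog</title>
      <item><title>First post</title><link>http://blog.example/1</link>
            <description>Hello world</description></item>
    </channel></rss>"""
    for entry in parse_rss_items(sample_feed):
        print(entry["title"], entry["link"])
```

Using the item link as the unique key when saving makes repeated imports idempotent, so the import job can simply run on a schedule.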
Option 1: use one search engine for both sites, such as Elasticsearch or Solr, and populate the index from both Rails and WordPress.
Option 2: write a script that periodically reads your WordPress RSS feed and saves the data in your Rails app.
In the end, you should avoid searching across different sources; gather the data in one place and then search it.
You don't have to be stuck with WordPress. You can use Google's search APIs. The Web Search API has been deprecated but still works. Its replacement is the Custom Search API. You may need to pay if you query over the limit.
Alternatively you can leverage other search engine APIs like Bing Search API.
I'd suggest using the WordPress JSON API and plugging that into your search using Solr or something similar. You can index posts as they are created and then fetch the articles via the same JSON interface.
Use Tire and wp-elasticsearch with ElasticSearch.
I am wondering how to implement search functionality like GitHub's.
There is just one search box at the top right of the header, and searching for a keyword displays results for Repository, Code and User.
Is there any tutorial or example to implement this on Rails 3?
Odds are really good they're doing separate searches across the tables for the same value, then combining the results afterwards.
Use Rails to create a small form containing a text field. When it's submitted take the value of the field and do a query using that as the search term.
If you're not sure how to do queries using ActiveRecord, see "Active Record Query Interface" for a nice overview.
You will have to do several queries, one per model, and put the results together on the same view.
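The "one query per model, then combine" idea can be sketched independently of Rails. In the Python illustration below, plain lists of dictionaries stand in for the models, and the `search_all` helper, the section names, and the sample records are all invented for this sketch; in a real application each loop iteration would be a database query on the corresponding model.

```python
def search_all(keyword, sources):
    """Run the same keyword against every source and group hits by section.

    `sources` maps a section name to a (records, field_name) pair, where
    field_name is the attribute to match the keyword against.
    """
    needle = keyword.lower()
    results = {}
    for section, (records, field) in sources.items():
        # Stand-in for one per-model query, e.g. a case-insensitive LIKE.
        results[section] = [r for r in records if needle in r[field].lower()]
    return results


if __name__ == "__main__":
    sources = {
        "repositories": ([{"name": "rails"}, {"name": "django"}], "name"),
        "users": ([{"login": "railsfan"}, {"login": "alice"}], "login"),
    }
    for section, hits in search_all("rails", sources).items():
        print(section, hits)
```

Keeping the results grouped by section, rather than flattening them into one list, maps directly onto a view with one block per result type.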
If your question is "how do I do full text searches on several activerecord models in a DRY way" then there are basically two paths:
The common solution, though a bit complex, is to use a dedicated daemon on your machine, like Sphinx. Sphinx is a service (like Apache or MySQL) that indexes your content and allows you to run searches. You can use the Thinking Sphinx gem to communicate with it easily from Rails. An alternative to Sphinx is Solr (there's also a gem for it called Sunspot).
If you are using PostgreSQL, there's a simpler alternative that doesn't require external services running on your server. PostgreSQL has some full-text search capabilities built in. There's a gem called texticle that helps you use them from Rails. You can have that working very quickly.
Want to build a web app using SOLR as the only backend. Most of the data will be stored in SOLR via offline jobs although there is some need for CRUD.
Looking at popular web frameworks today like Rails, Django, web2py etc. despite NoSQL the sweet spot for productivity still seems to be around active record implementations sitting on top of a RDBMS.
What's the best framework, in terms of productivity, for building web apps with SOLR as the backend?
All three of the above answers are great recommendations for development frameworks. I would flip around your question and ask "Which is best web app framework for me", not "which is best with Solr" and make a decision based on your skills, the community that you have around you, and other soft factors. Especially if you are completely agnostic on which way to go.
If you have friends who love Grails and can help you get started, then Grails might be the way to go. Have a Python group that meets regularly? Then Django has a lot to offer. I personally love Rails, and so I would recommend rails. But that is only a recommendation of "What I like" versus "what is best".
The wonderful thing about Solr is how agnostic it is to the front end. It plays nice in so many environments!
The web2py Database Abstraction Layer does not support SOLR at this time, which means you cannot use the DAL syntax for accessing SOLR and you cannot use automatically generated forms from a SOLR DB schema. Yet you can generate forms using SQLFORM.factory as if you had a normal relational database and perform the insert/update/select/delete into SOLR manually. web2py includes libraries for parsing/writing both JSON and XML, so it will be easy to implement SOLR APIs in a few lines of code. If you bring this up on the web2py mailing list we can help with some examples.
EDIT (copied from the answer on the web2py mailing list):
Normally in web2py you define a model
    db.define_table('message', Field('body'))
and then web2py generates and processes forms for you:
    form = SQLFORM(db.message)
    if form.accepts(request.vars):
        do_something
In your case you would not use define_table because web2py DAL does
not support SOLR and you cannot generate forms from the schema but you
can install this: http://code.google.com/p/solrpy/
and you can do
    # in model
    import solr
    s = solr.SolrConnection('http://example.org:8083/solr')

    # in controller
    form = SQLFORM.factory(Field('body'))
    if form.accepts(request.vars):
        # send the submitted field to SOLR, then make it searchable
        s.add(body=request.vars.body)
        s.commit()
        do_something
So the difference is SQLFORM.factory instead of SQLFORM and the extra
line after accepts. That is it.
I would use Sunspot 1.2 and Rails 3.
Sunspot is commonly used as an ActiveRecord extension, but is also designed to be ORM-agnostic. Rails 3 has decoupled ActiveRecord from the framework, making it easy to go entirely ORM-free.
http://outoftime.github.com/sunspot/
By the way, in my experience Sphinx is a lot faster than Solr/Lucene and has many unique features. Its search accuracy has also been a lot better, both in my experience and in independent benchmarks.
It has a native, very easy Python API and it integrates well with web2py.
It does need an RDBMS, though. I am using web2py + Sphinx to build a search engine for office files.
You can give it a try too.
www.sphinxsearch.com