I am trying to implement a web/smartphone app that allows users to search for places based on keywords and location, and here are the requirements:
Users shall be able to search by typing in keywords and a location; locations can be a zip code, city/state, or the current location from the mobile app (lat and long).
We would like to be able to customize the relevance score; we need to be able to define our own relevance algorithm based on keyword matching, location matching, and some other parameters.
We use ASP.NET MVC as our web development framework and MongoDB as a data store. We also maintain a list of all zip codes and city/states as well as their centroids (lat/long) in our database. Our thought is to override the scoring that the full-text system provides (like Lucene scoring) with our own algorithm. I am trying to find the best solution to address this. I am wondering whether we should use MongoDB full-text search, Lucene.NET, or perhaps Solr? Any help/pointer/comment is always appreciated!
So as a starting point, MongoDB does not have support for full-text search.
It has some regex capabilities and you can index on arrays. So you can do some things here, like building an array of keywords to make basic text search possible.
However, this is a long way from what Solr and Sphinx offer.
The other big problem you'll have is with relevance scoring. It's going to be very difficult to perform any type of server-side relevance scoring with MongoDB. There's no really efficient version of a server-side stored procedure. You'll likely have to pull the results to a client or server dedicated to that scoring.
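To make the "pull the results and score them yourself" idea concrete, here is a minimal sketch of client-side relevance scoring over documents fetched from the data store. The document fields, query shape, and weights are all illustrative assumptions, not a real schema; the point is only that keyword matching and location matching can be combined into one custom score once the candidate documents are in hand.

```ruby
# Hypothetical client-side relevance scoring over documents fetched
# from MongoDB. Field names and weights are illustrative only.

# Great-circle distance in km between two lat/lng points (haversine).
def distance_km(lat1, lng1, lat2, lng2)
  rad = Math::PI / 180
  dlat = (lat2 - lat1) * rad
  dlng = (lng2 - lng1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlng / 2)**2
  6371.0 * 2 * Math.asin(Math.sqrt(a))
end

# Custom relevance: keyword overlap, minus a penalty per km of distance.
def relevance(doc, query_keywords, user_lat, user_lng)
  keyword_score = (doc[:keywords] & query_keywords).size
  dist = distance_km(doc[:lat], doc[:lng], user_lat, user_lng)
  keyword_score * 10.0 - dist * 0.1   # illustrative weights
end

docs = [
  { name: "Joe's Pizza",  keywords: %w[pizza italian], lat: 40.71, lng: -74.00 },
  { name: "Pizza Palace", keywords: %w[pizza],         lat: 41.88, lng: -87.63 }
]

ranked = docs.sort_by { |d| -relevance(d, %w[pizza], 40.70, -74.00) }
puts ranked.first[:name]   # the nearby match wins
```

The keyword arrays here are the same trick mentioned above for making basic text search possible in MongoDB; the scoring step is what you would bolt on afterwards.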
I'm starting a new rails project that integrates closely with Google Maps. When a user searches for a city, I'm trying to decide whether to geocode the address on the fly (using Google's Geocoding API) or to look up the city in a database pre-populated with lat/long. After I have the lat/long I will plot it on Google Maps.
Which do you think would perform better? With the database lookup, the table would have to be pretty large to account for all the cities I would need, and I would have to fall back on the geocoding API anyway for any cities that aren't in my database.
I wasn't sure if there is a common practice to this or not. I don't need a user's specific location, but just a city they are searching for.
The size of the table is no problem, as long as you index on the city name.
Indexed database queries outperform web API access by far.
Another point is that you have better control over the matched data. For example, if you find more than one matching city, you can offer a choice of your DB entries, while Google sometimes reports none, or some random (or at least unexpected) search result.
This is why I had to switch to a DB-search-first strategy in one of my projects: Google sometimes didn't find my customers' addresses but something totally different (e.g. small villages with the same name as the expected bigger one).
Why not do both?
Have the address's geocoded information in your database as an "Address Cache" and then call the Google Maps Geocoding API only if the address doesn't already exist in your database. That's the approach I used in my Google Maps to SugarCRM integration. It works well. BTW, the Google Maps Geocoding API is impressively fast, so users rarely notice. Still, there is a 2,500 requests/day limit and it's also throttled to about 10 requests per second. So, considering those limits, I think a combined database/geocode approach is much better in the long run.
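The cache-then-fallback pattern is simple enough to sketch in a few lines. This is a stand-in, not the real integration: the cache here is an in-memory hash where a real app would use a DB table, and the remote geocoder is a stub lambda where a real app would call the Google Geocoding API.

```ruby
# Sketch of the "address cache" pattern: look up coordinates locally
# first, and only fall back to a remote geocoding call on a miss.
# The hash cache and the stub lambda are stand-ins for a DB table and
# the Google Geocoding API.

class GeocodeCache
  def initialize(remote_geocoder)
    @remote = remote_geocoder        # callable: address -> [lat, lng]
    @cache = {}                      # stand-in for a DB table
  end

  def lookup(address)
    key = address.strip.downcase
    @cache[key] ||= @remote.call(address)  # geocode and cache on miss
  end
end

# Stub remote geocoder that counts how often it is actually called.
calls = 0
remote = lambda do |addr|
  calls += 1
  { "chicago, il" => [41.88, -87.63] }.fetch(addr.strip.downcase)
end

cache = GeocodeCache.new(remote)
cache.lookup("Chicago, IL")
cache.lookup("chicago, il")          # second lookup served from the cache
puts calls   # => 1
```

Normalizing the address before using it as a cache key (as `strip.downcase` does here) is what keeps trivially different spellings from triggering duplicate API calls.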
https://github.com/jjwdesign/JJWDesign-Google-Maps
I am using RavenDB for my intranet website, and I need to implement full-text search across the whole site. I can use RavenDB's LINQ search queries for documents, which are Lucene-based in the background.
The other approach is to use the Lucene.Net library to implement full-text search independently.
Whatever approach I choose, it should be able to search through attachments stored in blob format in RavenDB.
Any ideas or suggestions, please?
RavenDB is fully integrated with Lucene. There would be little point to using it independently.
But by definition, attachments are not searchable. You can certainly store very large documents that are fully searchable, but they wouldn't be attachments. The whole point of attachments is for things that you wouldn't want to search. Example: videos, photos, music, etc.
Review:
http://ravendb.net/docs/client-api/attachments
http://ravendb.net/docs/client-api/querying/linq-extensions/search
http://ravendb.net/docs/appendixes/lucene-indexes-usage
Revised Answer
I have written a bundle that uses IFilters to have RavenDB automatically extract the contents of attachments and index them with Lucene. It is available here.
Enjoy!
What proximity search options are there for Rails? (Perhaps with pros and cons of each?)
Is a postcode database the way to go?
or using Geocoding with a gem such as Geocoder?
Are there any best practices or gotchas to be aware of?
(Example usage, A Yellow Pages type app where businesses can list, and users can enter their postcode and find businesses that are close to them, or within a radius of specified miles.)
Update: The app is pretty much exactly like the example app above. Businesses can list in the app, which is essentially a directory (with categories for each type of business), and users can go to a category and sort the results by distance (after entering their postcode, so the businesses closest to them get shown first). There will be no searching by name or other criteria; it is a simple "go to the 'Boxer Dog Breeders' page and sort by distance".
Well, the answer depends on what you want to do. If you want simple proximity searches (within a radius and such) then the Geocoder gem is more than fine! If you want more advanced search capabilities (polygon and multi-polygon searches, etc.) I would suggest going with a PostgreSQL database and the wonderful PostGIS extension. For use with Rails you should definitely check out the PostGIS adapter for ActiveRecord, which comes from the author of a really nice gem called RGeo that uses the superfast GEOS and Proj libraries for geospatial calculations.
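For the directory use case in the question ("businesses within a radius, nearest first"), the whole proximity query boils down to a distance function plus a filter and sort. Here is a minimal plain-Ruby sketch of that, using an equirectangular approximation that is fine for small distances; Geocoder's `near` scope or PostGIS would do the same thing properly at scale. All the data and names below are made up for illustration.

```ruby
# Minimal sketch of "businesses within N km, sorted nearest first".
# Uses an equirectangular approximation (good enough for short
# distances); Geocoder or PostGIS handle this properly in production.

EARTH_KM = 6371.0

def approx_km(lat1, lng1, lat2, lng2)
  rad = Math::PI / 180
  x = (lng2 - lng1) * rad * Math.cos(((lat1 + lat2) / 2) * rad)
  y = (lat2 - lat1) * rad
  EARTH_KM * Math.sqrt(x * x + y * y)
end

def near(businesses, lat, lng, radius_km)
  businesses
    .map    { |b| b.merge(dist: approx_km(lat, lng, b[:lat], b[:lng])) }
    .select { |b| b[:dist] <= radius_km }
    .sort_by { |b| b[:dist] }
end

breeders = [
  { name: "A", lat: 51.50, lng: -0.12 },  # central London
  { name: "B", lat: 51.48, lng: -0.10 },  # a couple of km away
  { name: "C", lat: 53.48, lng: -2.24 }   # Manchester, far outside radius
]

result = near(breeders, 51.50, -0.12, 25)
puts result.map { |b| b[:name] }.join(",")   # => "A,B"
```

In a real Rails app you would push this filter into the database (Geocoder generates the equivalent SQL for you) rather than loading every row into Ruby.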
Otherwise if you find that you need a dedicated search server that has GIS capabilities then you should definitely use ElasticSearch and Ruby has a great gem to aid you with ES called Tire.
Hope I helped!
If you plan to add additional search criteria (e.g. firm name, category) or complex sorting, and your DB will grow beyond 10,000 records, I'd recommend using an external search server, for example Sphinx together with Thinking Sphinx.
Here is an example of geosearching: http://freelancing-god.github.com/ts/en/geosearching.html
RoR model solutions (like Geocoder) are not very efficient at full-text searching.
I don't have experience with Geocoding myself, but Alex Reisner's Geocoder gem looks like the best option, by a mile.
Geocoder is a complete geocoding solution for Ruby. With Rails it adds geocoding (by street or IP address), reverse geocoding (find street address based on given coordinates), and distance queries. It’s as simple as calling geocode on your objects, and then using a scope like Venue.near("Billings, MT").
Checkout the README for a full example on how to use it.
I have a mediawiki installation that I've customized with some of my own extensions. Here is the basic platform, pretty standard LAMP install.
Ubuntu Server
Apache 2
Mediawiki 1.15
PHP 5.2.6
MySQL 5.0.67
For the actual MW search I use Lucene (EzMwLucene). I also have custom extension that displays tabular data from a separate database within a MW page. Lucene doesn't index this info (which, in my case is actually good because it would clutter your expected search results). For this installation I didn't do anything to Lucene other than install it and wouldn't know how to customize it for my needs and it may be "too powerful".
At any rate, I need to create a search for the data in my other database. I have a master table that is updated daily based on data stored in other (normalized) tables. At the moment the search is one of those advanced-search forms that builds a SQL query from the criteria you enter. That is a lot of work for the user, though. I would like it to be more of a "type and submit" search.
I don't think I need a comprehensive "cut & paste" type answer, but if anybody has something that I can google I would be very appreciative. I don't need to recreate the wheel, which is what I would be doing if I followed what I see in google.
If you would like to see my master database, let me know, I would want to sanitize it to make me more anonymous (whatever that means). Also, if you're familiar with MW and would like to see any of my extension code, again, let me know.
TL;DR: need to make a custom search feature with LAMP (displayed in Mediawiki). Any guidance appreciated.
Thanks SO!
Why do you need to add custom search? This will relate to the best answer.
For simplicity, you could use the Google Custom Search Engine - http://www.mediawiki.org/wiki/Extension:Google_Custom_Search_Engine
Otherwise it sounds like you need to write a full-text query for the database.
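Since the stack above is MySQL, a full-text query would use `MATCH ... AGAINST`. A sketch of what that could look like follows; the table and column names (`master_data`, `title`, `description`) are hypothetical, and note that on MySQL 5.0 a FULLTEXT index is only available on MyISAM tables.

```sql
-- Hypothetical table/column names. On MySQL 5.0 (as in this stack),
-- a FULLTEXT index requires a MyISAM table.
ALTER TABLE master_data ADD FULLTEXT idx_ft (title, description);

SELECT id, title,
       MATCH(title, description) AGAINST ('search terms') AS score
FROM master_data
WHERE MATCH(title, description) AGAINST ('search terms')
ORDER BY score DESC;
```

That gives you the "type and submit" behaviour: one text box, one query, results ranked by relevance instead of a form per column.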
I'm looking into implementing full text search on our Firebird database. Our requirements are:
Every field in several tables should be indexed. When a result is found we should be able to find out the originating table.
The index can be stored in the database or in the file system.
The results of the search (BigInt primary keys) must be used to join with the original records in the database to display the records in a table.
Can anybody recommend a decent way to achieve what we need? I've looked at somehow integrating DotLucene into Delphi, but can't really find much information on how to go about it.
Here are a few resources for you to consider:
Sphinx, a very powerful and popular free open-source full-text search engine.
Textolution, full-text search for InterBase and Firebird.
IBObjects Full Text Search ("Fuzzy Search"), a fully working module that can be used to set up your search indexes or as a model for your own custom implementation.
Rubicon, a Delphi add-on that lets you put full-text search capabilities into your applications.
Fulltext Search for Firebird SQL by Dan Letecky on CodeProject, using the DotLucene full-text search engine.
Mutis, a Delphi port of the Lucene search engine. It provides a flexible API for indexing, cataloging and searching text-based information with great performance. Excellent for implementing custom search engines, research, text retrieval, data mining and more.
There is a fork of Firebird code made by a company called Red Soft. It's licensed under the same license as Firebird, so you can take a look at their version which can support full-text searches using Lucene engine via JavaVM interfaces.
You can also read a paper titled "Full text search in Firebird without a full text search engine" by Bjoern Reimer and Dirk Baumeister, presented at the 4th Firebird Conference.
I think you will have a problem with requirement 2 (the index can be stored in the database or in the file system). Most indexing services create their own index files, which store data in a highly optimized way. If you really need it, it may be possible to load and save an index into a single blob field, but I don't really see a reason to do so.