Umbraco (Examine) Search - Synonyms

I am trying to implement synonym searching in the Examine search engine that comes with Umbraco 8 out of the box.
Does anyone have any experience with implementing synonym searching in Examine/Umbraco 8? The options I have been considering after looking around are:
A package that can be installed in Umbraco 8 that offers this extended functionality (if one exists).
Implementing a custom index (currently just using the out-of-the-box 'ExternalIndex') that somehow implements synonym searching during analysis (via a custom analyzer implementation, etc. - if that is even possible).
Manually building multiple search terms by checking for synonyms in the string beforehand, running all the searches and consolidating the results afterwards (really a nasty, last-resort option - you don't have to tell me how bad this is, I already know; a rough sketch of what I mean is below).
I have been trawling the forums for a definitive answer on this and cannot really find one. Essentially I want to stick with the Examine engine for simplicity; however, I am starting to think that the best way to achieve what I am after would be to move to a new engine completely (Elasticsearch, for example).
Many thanks in advance.
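For concreteness, here is a minimal sketch of that last-resort expansion approach (option 3) in Java, with a hand-rolled synonym map. All names are illustrative only, not an Examine API:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of option 3: expand the raw term into one query per
// synonym before searching, then run each query and de-duplicate the results.
public class SynonymExpander {
    private static final Map<String, List<String>> SYNONYMS = Map.of(
            "car", List.of("automobile", "vehicle"));

    public static Set<String> expand(String term) {
        Set<String> queries = new LinkedHashSet<>();
        queries.add(term);
        queries.addAll(SYNONYMS.getOrDefault(term.toLowerCase(), List.of()));
        return queries;
    }

    public static void main(String[] args) {
        // Each expanded term would be run as a separate search, with the
        // hit lists merged and de-duplicated afterwards.
        System.out.println(expand("car")); // [car, automobile, vehicle]
    }
}
```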

Use Algolia? It's free and will do what you need easily: https://www.algolia.com/

Examine is based on the Lucene search index, and Lucene is known to not really do synonyms, I'm afraid (read why, and a potential solution, here).
Your thinking is probably correct. Examine is good at what it does; if you want more advanced searching then you will be better off using a more advanced search provider. There are loads of options: Algolia is SaaS and comes with a free plan depending on your usage. It's easy to install and you can target data from the front-end.
You could also look into Azure Cognitive Search or Solr. These are probably harder to implement but will also do the job.
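That said, for option 2 in the question it is worth noting that Lucene proper (the Java library that Examine's Lucene.Net is ported from) does ship a SynonymGraphFilter that can be wired into a custom analyzer; whether the same types are exposed in the Lucene.Net version Umbraco 8 bundles is something you would need to verify. A minimal sketch against a recent Java Lucene:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

// Custom analyzer that injects synonyms into the token stream at analysis time.
public class SynonymAnalyzer extends Analyzer {
    private final SynonymMap synonyms;

    public SynonymAnalyzer() throws Exception {
        SynonymMap.Builder builder = new SynonymMap.Builder(true);
        // Single-token synonyms; multi-word entries need SynonymMap.Builder.join().
        builder.add(new CharsRef("car"), new CharsRef("automobile"), true);
        builder.add(new CharsRef("automobile"), new CharsRef("car"), true);
        this.synonyms = builder.build();
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream stream = new LowerCaseFilter(source);
        stream = new SynonymGraphFilter(stream, synonyms, true);
        return new TokenStreamComponents(source, stream);
    }
}
```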

Related

Geodata Querying Optimisations

I am planning to write a Node.js-powered RESTful web service that I will use for a mobile application which provides some sort of location based features. The most basic use case is going to look something like this:
the user can create a resource by sending a request to the web service containing the resource's name and the user's current location (latitude and longitude)
the web service will store the metadata about this resource internally in some sort of collection
the user can query the web service for a list of resources within 5km of his current location
One of the first problems that came to mind was scalability. Let's suppose that at some point in the future the server holds metadata for 1 million resources. When a user queries for nearby results, looping through 1 million entries to compute distances will take forever.
There are many services out there that have the same flow, so I thought implementing something like this is not going to take me a lot of time. I might have been wrong.
I am now two days into researching proven methods and algorithms. By now I have read everything I could get my hands on about quadtrees, geohashes, databases with spatial indexing support, formulas and so on. However, I still can't get the whole picture of how everything is going to work.
I was hoping that maybe someone who has worked on something similar could share his insight on what approach might be the most suitable considering this use case and the technologies that I am planning to use. Also, a short description of how it can be implemented would help me a lot!
For those who are looking for more information on this topic out of curiosity, my answer might not provide much clarity. However, some of the answers here might help you understand how you could achieve proximity searches using geohashes.
My approach, after doing a little research on Redis, will be not to overcomplicate things and just use the tools that are already out there. It has out-of-the-box support for spatial indexing and will most probably meet all my persistence requirements for this project.
Apparently MongoDB also comes with built-in support for geodata. In fact, even RDBMSs like MySQL or SQLite come with such capabilities.
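To make the Redis route concrete, here is a minimal sketch using the Jedis client (Jedis 3.x-style imports; the key name, coordinates and member ids are made up). GEOADD stores members in a geohash-backed sorted set, and GEORADIUS does the 5 km lookup:

```java
import java.util.List;
import redis.clients.jedis.GeoRadiusResponse;
import redis.clients.jedis.GeoUnit;
import redis.clients.jedis.Jedis;

// Minimal sketch of Redis geo indexing; assumes a Redis server on localhost.
public class GeoDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // GEOADD: longitude first, then latitude. Redis stores each point
            // as a geohash-derived score inside a sorted set.
            jedis.geoadd("resources", 13.361389, 38.115556, "resource:1");
            jedis.geoadd("resources", 13.400000, 38.120000, "resource:2");

            // GEORADIUS: everything within 5 km of the user's position.
            List<GeoRadiusResponse> nearby =
                    jedis.georadius("resources", 13.361389, 38.115556, 5, GeoUnit.KM);
            nearby.forEach(r -> System.out.println(r.getMemberByString()));
        }
    }
}
```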

Implement text matching like solr in ruby

I am working on a test-based Q&A application. The questions in the application mainly have two or three words as the answer.
Example: Q. Who founded Google?
A. Larry Page, Sergey Brin
There are no options for the answer, the user has to actually type it in. Plus in some cases there might be a synonym for the answer.
Example: USB Drive, Universal Serial Bus Drive, Pen Drive are all correct answers for the question: What is meant by nerd bling?
I have worked with Solr before, and its full-text search is powerful enough to do a match, consider synonyms and give a score for the match. However, I need to match the answers in my RoR application. Instead of writing my own regex to handle the task, I am wondering if there are libraries within RoR that I could look at for this.
Also, if I were to look under the hood of Solr and take inspiration from the code there to create a library of my own, please suggest files/modules I should be looking at (since I barely have any idea about Java).
Try using Elasticsearch, which offers RESTful search. Refer to www.elasticsearch.org/
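If the answers really are just two or three words, a hand-rolled matcher with an explicit synonym table is also a reasonable baseline before pulling in Solr or Elasticsearch. A sketch (in Java rather than Ruby; the synonym table and names are hypothetical):

```java
import java.util.Map;
import java.util.Set;

// Naive baseline: normalise the typed answer and check it against an
// explicit set of accepted variants for each question.
public class AnswerMatcher {
    private static final Map<String, Set<String>> ACCEPTED = Map.of(
            "usb drive", Set.of("usb drive", "universal serial bus drive", "pen drive"));

    public static boolean matches(String canonicalAnswer, String userInput) {
        // Collapse whitespace and case so "  Pen   Drive " matches "pen drive".
        String normalised = userInput.trim().toLowerCase().replaceAll("\\s+", " ");
        return ACCEPTED.getOrDefault(canonicalAnswer, Set.of()).contains(normalised);
    }

    public static void main(String[] args) {
        System.out.println(matches("usb drive", "  Pen   Drive ")); // true
    }
}
```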

Ruby on Rails object reporting

I am currently developing a ruby application that has a large number of different objects. As part of this application, I would like to add a reporting engine that allows a user to create custom reports on virtually any variable within the application - for example, they could create a report that shows what percentage of customers have a telephone number, or the absolute number of suppliers whose street name starts with an E. The point is, they should be able to create any report on the data in the app, regardless of how obscure, without needing to rely on it having been created in the application already.
My question is: how do I start creating a structure that allows this to happen? Will it be necessary to specify all possible variables that could be used as part of a report (e.g. I would need to specify that customers.count, customers.email_address and suppliers.addresses.street_name are all variables available to the reporting engine for the example above), or could these somehow be made available automatically?
If it is necessary to specify the variables, what would be the best way to do this?
I have searched for some resources on this, but have not yet found any - if anyone can recommend a source, it would also be appreciated.
Thanks!
Consider yourself warned that this likely violates YAGNI. I would highly recommend building reports first for the most common types of reports your users will want, so that you can make them usable and pretty. Doing this at the abstract level is an order of magnitude more complex, is error-prone, may lead to some security issues if you're not careful, and will make it difficult to produce pretty reports rather than generic-looking ones.
That said, take a look at something like Active Admin, which provides custom filters and data exports. You should be able to add custom scopes to have it do what you want, but if it still doesn't, then looking at the implementation should give you a good idea of what's involved.
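If you do end up specifying the variables explicitly, the shape of the solution is a registry mapping each exposed field name to an extractor function, which is roughly what a custom-scope approach boils down to. A rough, language-agnostic sketch (shown here in Java; all names hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: each reportable field is declared once as a
// name -> extractor pair, rather than discovered automatically.
public class ReportRegistry {
    record Customer(String email, String phone) {}

    private static final Map<String, Function<Customer, Object>> FIELDS = Map.of(
            "customers.email_address", Customer::email,
            "customers.has_phone", c -> c.phone() != null);

    static List<Object> column(String field, List<Customer> rows) {
        return rows.stream().map(FIELDS.get(field)).toList();
    }

    public static void main(String[] args) {
        List<Customer> rows = List.of(
                new Customer("a@example.com", "555-0100"),
                new Customer("b@example.com", null));
        System.out.println(column("customers.has_phone", rows)); // [true, false]
    }
}
```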

Rails versioning with point-in-time query

Is there a Rails 3 model versioning plugin that supports point-in-time queries (not just recovery) in an SQL database?
To be concrete: I have a table, documents. I want to be able to say, "as of 9/17/2010, which documents contained the text 'foo'?". This requirement seems to rule out all of the single-table versioning solutions like vestal_versions, and none of the other ones seem to have this feature either. All of the plugins I've looked at are documented as black boxes, so perhaps they store enough data internally to do this sort of query but you would never know it from the docs. In terms of Slowly Changing Dimensions, Type 2 is probably the sort of solution I'm looking for, although ultimately I'll use whatever works.
I also need to keep track of which user made changes, although that's probably possible to do outside of the versioning system too.
Is there one that I'm missing? Or am I using the wrong search term? Or do I get to roll my own?
I think you get to roll your own. If you really need to be able to perform queries on the versions, then the versions should probably be their own model, and making a versioning gem bend to fit your needs will be more painful than doing it yourself.
Sorry, and have fun :/
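If you do roll your own, the Type 2 layout the question mentions makes the point-in-time query itself straightforward: keep one row per version with valid_from/valid_to timestamps, where valid_to is NULL for the current row. A sketch over a hypothetical document_versions table, using plain JDBC against SQLite:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical Type 2 schema: document_versions(document_id, body, valid_from, valid_to).
// Requires the sqlite-jdbc driver on the classpath.
public class PointInTimeQuery {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:versions.db")) {
            // "As of 2010-09-17, which documents contained the text 'foo'?"
            String sql = """
                SELECT document_id, body
                FROM document_versions
                WHERE valid_from <= ?
                  AND (valid_to IS NULL OR valid_to > ?)
                  AND body LIKE ?
                """;
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "2010-09-17");
                ps.setString(2, "2010-09-17");
                ps.setString(3, "%foo%");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getLong("document_id") + ": " + rs.getString("body"));
                    }
                }
            }
        }
    }
}
```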

MVC Implementation where a Search Engine is the Model

Maybe I am misstating the problems and conflating the answer with the questions, but please hear me out. I would like to think (communally, with you) about a site based on any of the MVC frameworks (something PHP or ASP.NET MVC, whatever) that would use a search engine (Lucene/Solr, FAST ESP, whatever) as the back end of the Model. That is to say, there is no database per se in the project - just a giant index of documents that are semi-structured content.
I am looking to understand - and keep in mind the site is primarily read-only - where I am likely to run into trouble. What are the things that make you think this is a bad idea from the get-go? Also, please assume that there will be a robust infrastructure with caching surrounding the search engine - so while performance comments are welcome, we feel they are not the major problem.
Thanks!
In general, I'd use a tool like Lucene for searching content, and a database for retrieving it. That doesn't mean that it won't work. It's more a question of why you don't want to use a database. Yes, it can work, and it probably will work (depending on the functional requirements of the site, read on), but that still doesn't make a tool like Lucene the right tool for the job per se.
That being said, it does depend on the kind of site, however. Is it really a site with just a whole bunch of searchable data and nothing else, or is it something much more than that? If the answer is the first, then good! If it is the latter, there are some issues I can think of:
Updates to the data can be troublesome. "Instant updates" are usually a no-go, as Lucene would have to rebuild its index, which is time-consuming. If there aren't many updates to the data, that's fine. You can just recreate the index a couple of times per day, or nightly, if that works.
Trying to stuff data into an index when it is not really suited to being indexed is usually not a good idea. If the site lets users register, then that user data should really go in a database. It's not impossible to store it in a Lucene index; it's just not the right tool for the job. Use the index as a bunch of indexed documents, but don't use it as a database as well.
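For a feel of what the read path looks like when Lucene is the model, here is a minimal index-and-query sketch in Java (in-memory directory; the field names are made up):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Minimal "search engine as the model" sketch: index a few semi-structured
// documents, then serve reads straight from the index.
public class SearchModel {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Write path: rebuild or append to the index in a batch job, not per request.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new StringField("id", "42", Field.Store.YES));
            doc.add(new TextField("body", "semi-structured content about search", Field.Store.YES));
            writer.addDocument(doc);
        }

        // Read path: the controller queries the index directly instead of a database.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new QueryParser("body", analyzer).parse("search");
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("id"));
            }
        }
    }
}
```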
