Lucene term boosting with sunspot-rails

I'm having an issue with Lucene's Term [Boosting][1] query syntax, specifically in Ruby on Rails via the sunspot_rails gem. Term boosting lets you specify the weight of a specific term during a query; it is not related to the weighting of a particular field.
The HTTP query generated by Sunspot uses the qf parameter to specify the fields to be searched as configured, and the q parameter for the query itself. When a caret is added to a search term to specify a boost (i.e. q=searchterm^5), no results are returned, even though results would be returned without the boost.
If, on the other hand, I build the HTTP query by hand and explicitly specify the field to search (q=title_texts:searchterm^5), results are returned and scores appear to be affected by the boost.
In short, it appears as though query term boosting doesn't work in conjunction with fields specified with qf.
My application calls for search across several fields, each with its own associated field boost, combined when needed with boosting on individual terms of a query.
Any insight?
[1]: http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Boosting%20a%20Term

Sunspot uses the dismax parser for fulltext search, which eschews the usual Lucene query syntax in favor of a limited (but user-input-friendly) query syntax combined with a set of additional parameters (such as qf) that can be constructed by the client application to tune how search works. Sunspot provides support for per-field boost using the boost_fields method in the fulltext DSL:
http://outoftime.github.com/sunspot/docs/classes/Sunspot/DSL/Fulltext.html#M000129
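For example, a minimal sketch with the Sunspot DSL (the Post model and the field names here are assumptions):

```ruby
# Sketch: per-field boosts via boost_fields; the model and field names
# are placeholders for your own setup.
Post.search do
  fulltext 'searchterm' do
    boost_fields title: 5.0, body: 1.0  # roughly qf=title_text^5 body_text
  end
end
```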

The solution I have found is to keep using DisMax, but to add the bq parameter containing a boolean query string with the boosted terms in it.
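In Sunspot terms, the bq parameter can be injected without building the request by hand, e.g. with adjust_solr_params (a sketch; the model, field name and boost are assumptions):

```ruby
# Sketch: add a boost query (bq) to the Solr request; title_texts and
# the boosted term are placeholders.
Post.search do
  fulltext 'searchterm'
  adjust_solr_params do |params|
    params[:bq] = 'title_texts:searchterm^5'
  end
end
```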

Related

What is the difference between the filter and search query parameters in Microsoft Graph Mail API?

While I was looking at the documentation for query parameters here, I noticed that there were two query parameters that seemingly did the exact same thing: filter and search.
I'm just wondering what the difference is between them and when is one used over the other.
While they're similar, they operate a little differently.
$search uses Keyword Query Language (KQL) and is only supported by message and person collections (i.e. you can't use $search on most endpoints). By default, it searches multiple properties. Most importantly, $search is a "contains" search, meaning it will look for your search word/phrase anywhere within a string.
For example, /messages?$search="bacon" will search for the word "bacon" anywhere in the from, subject, or body properties.
Unlike $search, the $filter parameter only searches the specified property and does not support "contains" search. It also works with just about every endpoint. In most places, it supports the following operators: equals (eq), not equals (ne), greater than (gt), greater than or equals (ge), less than (lt), less than or equals (le), and (and), or (or), not (not), and (on some endpoints) starts with (startsWith).
For example, /messages?$filter=subject eq 'bacon' will return only messages where the subject is "bacon".
Both search and filter reduce the result set that you ultimately receive; however, they operate in different ways.
Search operates against the entire graph and reduces the amount of information a query returns. It is optimized for the kinds of queries search is good at, e.g. finding items that can be indexed.
Filter operates on the much smaller result set returned by search to provide more fine-grained filtering. Separating the two allows filtering to perform tasks that would not be performant against the full collection.
This is indicated in Microsoft's documentation:
Search: Returns results based on search criteria.
Filter: Filters results (rows). (results that could be returned by search)
For performance purposes, it's good to use both if you can: search to narrow the results (e.g. using search indexes), then fine-grained filtering on what comes back.
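As a rough sketch of what those two parameters look like from code (Ruby here; the token handling is an assumption, and you should verify against the Graph docs whether a given endpoint accepts both parameters in a single request):

```ruby
require 'net/http'
require 'uri'
require 'json'

# Assumes a valid OAuth access token is available in the environment.
def graph_get(path, params, token)
  uri = URI("https://graph.microsoft.com/v1.0#{path}")
  uri.query = URI.encode_www_form(params)
  req = Net::HTTP::Get.new(uri)
  req['Authorization'] = "Bearer #{token}"
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
  JSON.parse(res.body)
end

token = ENV.fetch('GRAPH_ACCESS_TOKEN')

# $search: "contains"-style keyword search across several properties.
searched = graph_get('/me/messages', { '$search' => '"bacon"' }, token)

# $filter: exact predicate against a single named property.
filtered = graph_get('/me/messages', { '$filter' => "subject eq 'bacon'" }, token)

puts searched['value']&.size, filtered['value']&.size
```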

Prefix queries (*) in Azure Search don't return expected results

While searching on Azure using the REST API provided by Microsoft, the Search API does not behave correctly when the search string contains '#'.
Example: I've 3 rows in Azure Document
CES
CES#123
CES#1234
When my search string was CES*, all 3 were returned.
When my search string was CES#123*, only the one exactly matching record was returned.
When my search string was CES#*, there were no results.
Per my requirement, in the case of the "CES#*" search string all 3 records should be part of the result set.
I've tried a space in place of #, and that works, but my data contains # and I need to keep it searchable.
I'm using SearchMode:Any.
This behavior is expected.
Query terms of prefix queries are not analyzed. Therefore, in your example with "CES#*" you are searching for the term CES#, while the # sign was stripped from the terms in the index: CES, 123, 1234.
Here is an excerpt from the How full text search works in Azure Search article:
Exceptions to lexical analysis
Lexical analysis applies only to query types that require complete terms – either a term query or a phrase query. It doesn’t apply to query types with incomplete terms – prefix query, wildcard query, regex query – or to a fuzzy query. Those query types, including the prefix query with the term air-condition* in our example, are added directly to the query tree, bypassing the analysis stage. The only transformation performed on query terms of those types is lowercasing.
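For reference, the REST call in question looks roughly like this (a sketch; the service name, index name, api-version and key handling are placeholders):

```ruby
require 'net/http'
require 'uri'
require 'json'

# Service name, index name, api-version and key handling are placeholders
# for your own deployment.
uri = URI('https://my-service.search.windows.net/indexes/my-index/docs/search?api-version=2020-06-30')

req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json',
                               'api-key' => ENV.fetch('AZURE_SEARCH_KEY'))
req.body = {
  search: 'CES#*',    # the prefix term is not analyzed, so '#' survives here...
  queryType: 'full',  # full Lucene syntax enables the trailing wildcard
  searchMode: 'any'
}.to_json

res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
# ...but the default analyzer stripped '#' at indexing time (terms: ces, 123,
# 1234), so the prefix never matches unless the field's analyzer keeps '#'.
puts JSON.parse(res.body)['value']
```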

How to best use Solr parser syntax in a specific business requirement

Just starting to learn Solr for a project at work and was wondering how to go about this issue. Our application allows a user to search based on a business name. The business name is comprised of 3 different categories (English, French and Combined Name). Based on a single query entered by the user, how would one go about using Solr to provide the most relevant search results? I have looked into fuzzy and proximity searches, which seem reasonable enough, although fuzzy search only applies to a single term. That makes me believe I would need to split the query into single terms, apply fuzzy search to each, and merge the results if I were to use it. My question is how best to approach the problem. Thanks!
To provide relevancy to your documents, you need a combination of proper boosting queries and a clear sense of what relevance means for your use case. If regex-based search is part of the use case, you may go for NGrams; if exact search is what you're seeking, boosting is important. You can use parameters like phrase slop (ps), mm, and the other edismax parameters to your advantage. You may use a combination of title and text content search, with a good combination of boosts. Solr also allows you to pass your query in parentheses, which functions like an SQL IN query and further boosts relevancy by sticking to the keywords mentioned in the query. And, at last, if all this doesn't suffice, you may use custom function queries to meet your needs. While doing all this, just make sure the analyzers in the schema.xml file are right and serve the purpose of executing the queries mentioned above.
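A sketch of what such an edismax request might look like (the host, core name and field names are all assumptions):

```ruby
require 'net/http'
require 'uri'
require 'json'

# Sketch: an edismax request across the three name fields with per-field
# boosts, phrase boosting and a minimum-match rule. The host, core name
# (businesses) and field names (name_en, name_fr, name_combined) are
# placeholders for your own schema.
params = {
  'defType' => 'edismax',
  'q'       => 'acme bakery',
  'qf'      => 'name_en^3 name_fr^3 name_combined',
  'pf'      => 'name_en^5 name_fr^5',  # extra boost when the terms appear as a phrase
  'ps'      => 2,                      # phrase slop
  'mm'      => '2<75%',                # with 3+ terms, require 75% of them to match
  'wt'      => 'json'
}

uri = URI('http://localhost:8983/solr/businesses/select')
uri.query = URI.encode_www_form(params)

docs = JSON.parse(Net::HTTP.get(uri)).dig('response', 'docs')
docs&.each { |doc| puts doc['name_combined'] }
```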
You can go as far down this rabbit-hole as you have time for wrt Business Name search (fuzzy, sound-alike, language-specific analysis, weird compounded terms used as a domain name; e.g. getting "EZBake" to match "easy bake", or "1-to-1" to match "one to one", is non-trivial).
Since this sounds like a pre-existing application, I typically look to query logs (when available) to sample the frequency of different types of mismatches (dig out the zero-result search terms and start manually categorizing the high-level issues behind the more common mismatches).
That will provide you with a backlog of "matching use cases to research how to implement" (in the order of maximal benefit, as determined by your sample).
Then you're ready to start burning them down, and asking much more specific questions about how to get Solr to jump through your domain-specific hoops.

How do I do a general search across string properties in my nodes?

Working with Neo4j in a Rails app.
I have nodes with several string properties containing long strings of user generated content. For example in my nodes of type: "Book", I might have properties, "review", and "summary", which would contain long-form string values.
I was trying to design queries that returned nodes which match those properties to general language search terms provided by a user in a search box. As my query got increasingly complicated, it occurred to me that I was trying to resolve natural language search.
I looked into some of the popular search gems in Rails, but they all seem to depend on ActiveRecord. What search solutions exist for Neo4j.rb?
There are a few ways that you could go about this!
As FrobberOfBits said, Neo4j has what are called "legacy indexes" which use Lucene in the background to provide indexing of generic things. Neo4j also supports the newer schema indexes. Unfortunately those are based on exact matches (though I'm pretty sure that will change somewhat in Neo4j 2.3.x).
Neo4j does support pattern matching on strings via the =~ operator, but those queries aren't indexed. So the performance depends on the size of your database.
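For example, an unindexed regex match from Neo4j.rb might look like this (a sketch; the label, property and the older Neo4j::Session API are assumptions that depend on your gem version):

```ruby
# Sketch: Cypher's =~ operator, executed through Neo4j.rb. The match is
# not backed by an index, so it scans every :Book node.
Neo4j::Session.current.query(
  "MATCH (b:Book) WHERE b.review =~ '(?i).*dragon.*' RETURN b LIMIT 25"
)
```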
We often recommend a gem called searchkick which lets you define indexes for Elasticsearch in your models. Then you can just call a Model.search method to do your searches and it will first query elasticsearch to get the node IDs and then load those nodes via Neo4j.rb. You can use that via the neo4j-searchkick gem: https://github.com/neo4jrb/neo4j-searchkick
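A rough sketch of that setup (the exact macro and module names should be checked against the neo4j-searchkick README for your versions):

```ruby
# Sketch only: verify the integration details against the
# neo4j-searchkick README; the macro name here is an assumption.
class Book
  include Neo4j::ActiveNode
  searchkick  # assumption: searchkick's usual class-level macro

  property :review,  type: String
  property :summary, type: String
end

Book.reindex                   # build the Elasticsearch index
books = Book.search('dragon')  # ES returns IDs; nodes load via Neo4j.rb
```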
Lastly, if you're doing NLP and are trying to extract important words from your text, you could create a Tag/Word label and create relationships from your nodes to these NLP extracted nodes so that you can search based on those nodes in the future. You could even build recommendations from one text node to another based on the number/type of common tag nodes.
I don't know if anything specific exists for neo4j.rb and activerecord. What I can say is that generally this stuff is handled through the use of legacy indexes that are implemented by Lucene.
The premise is that you create a Lucene-managed index on certain properties, which then gives you access to the Lucene query language via Cypher to get data from those indices. Relative to neo4j.rb, it doesn't look any different from running Cypher queries, like this:
```cypher
START item=node:node_auto_index("(title:'foo bar' AND body:baz*) OR title:'bat'")
RETURN item
```
Note that Lucene indexes and that query language can only be used in a START block, not in a MATCH block. Refer to the Lucene Query Syntax docs to discover more about what you can do with that query syntax (fuzzy matching, wildcards, etc.), which is quite a bit more extensive than what regex would give you.

Neo4j schema indexes for fuzzy search

Right now I'm thinking on possibility to create fuzzy search in my application over my Neo4j database.
The main criteria are: fuzzy search and performance.
What is the best way to achieve these goals with the latest version of Neo4j community edition?
Fuzzy search is a tricky thing. Even in plain Lucene (where you can do fuzzy search with Lucene query strings) it is not recommended, because it is quite expensive.
You can use that query syntax in Neo4j too, as long as you have indexed your data with a manual (legacy) index.
The solution most people suggest is to go with auto-suggestion instead, i.e. match on the first few characters, present the options in an auto-complete box, and then search using the user-selected strings.
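If you still want Lucene's fuzzy syntax against a legacy index, the query looks roughly like this (a sketch; the index name, field and session API are assumptions):

```ruby
# Sketch: Lucene fuzzy operator (~) against a manual/legacy index.
# node_auto_index, the field name and the session API are placeholders.
Neo4j::Session.current.query(
  "START n=node:node_auto_index('name:neo4j~0.8') RETURN n LIMIT 10"
)
```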
