Prefix queries (*) in Azure Search don't return expected results - odata

When searching an Azure Search index through the REST API provided by Microsoft, the service does not behave as I expect when the search string contains '#'.
Example: I have 3 documents in my Azure Search index:
CES
CES#123
CES#1234
When my search string was CES*, all 3 were returned.
When my search string was CES#123*, only the one exactly matching record was returned.
When my search string was CES#*, there were no results.
My requirement is that for the search string "CES#*", all 3 records should be part of the result set.
I've tried replacing # with " " (a space) and that works, but my data contains #, so I need to keep it in the search string.
I'm using SearchMode:Any.

This behavior is expected.
Query terms of prefix queries are not analyzed. Therefore, in your example with "CES#*" you are searching for term CES# while the # sign was stripped from the terms in the index: CES, 123, 1234.
Here is an excerpt from the How full text search works in Azure Search article:
Exceptions to lexical analysis
Lexical analysis applies only to query types that require complete
terms – either a term query or a phrase query. It doesn’t apply to
query types with incomplete terms – prefix query, wildcard query,
regex query – or to a fuzzy query. Those query types, including the
prefix query with term air-condition* in our example, are added
directly to the query tree, bypassing the analysis stage. The only
transformation performed on query terms of those types is lowercasing.
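
A toy model of the behavior described above may help. This is only a sketch, not Azure Search's actual analyzer: it assumes the indexing analyzer splits on non-alphanumeric characters and lowercases, and that prefix query terms are only lowercased. The helpers `analyze`, `build_index`, and `prefix_search` are hypothetical names invented for this illustration.

```python
import re

def analyze(text):
    """Toy stand-in for an indexing analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def prefix_search(index, query):
    """Prefix query: the query term is only lowercased, never analyzed."""
    prefix = query.rstrip("*").lower()
    hits = set()
    for term, ids in index.items():
        if term.startswith(prefix):
            hits |= ids
    return sorted(hits)

docs = ["CES", "CES#123", "CES#1234"]
index = build_index(docs)          # terms: ces, 123, 1234

print(prefix_search(index, "CES*"))   # [0, 1, 2]
print(prefix_search(index, "CES#*"))  # []
```

In this model no indexed term starts with "ces#", because the # was removed at indexing time, so the second query matches nothing.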

Related

Odata wildcard at the beginning instead of the end of query

I know that you can do an odata query like this for wildcards
$filter=search.ismatch('lux*', 'Description')
What I would like to do is this
$filter=search.ismatch('*lux', 'Description')
I tried the query above and it returned nothing, even though I know there are matches for '*lux'.
Ideally I would like to have 2 different fields in the query like this
search=&$filter=Hotel eq 'Southern' and search.ismatch('*lux', 'Description')
That syntax does not return anything either
Ideal result set:
Hotel: Description:
Southern Ultra lux
Southern Mega lux
Also, I did not know how to tag this as I don't work with it a lot, so I am sorry if it is mistagged.
I found this answer: You cannot use a * or ? symbol as the first character of a search. No text analysis is performed on wildcard search queries. At query time, wildcard query terms are compared against analyzed terms in the search index and expanded.
So it looks like this is not possible when the wildcard is at the beginning of the search query. If the wildcard is at the end or in the middle of the search query, it works fine.
You can do prefix, suffix and infix queries in Azure Cognitive Search if your queryType is set to "full" (invokes the full Lucene parser) and if you're using regular full text search ("search=") instead of a $filter. Prefix, suffix, and infix are variations of wildcard search forms. There are some examples in the docs if you want to take a look.
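
As a rough sketch of what such a request could look like, the snippet below builds a GET URL that uses the full Lucene parser with a regular expression (`/.*lux/`) for the suffix match and keeps the Hotel restriction in $filter. The service name, index name, and api-version value are placeholders assumed for illustration; a real request also needs an api-key header, which is omitted here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_suffix_query_url(service, index, api_version="2020-06-30"):
    """Build a search URL; service, index and api_version are placeholder values."""
    params = {
        "api-version": api_version,
        "queryType": "full",             # invoke the full Lucene parser
        "search": "/.*lux/",             # regex form of a suffix match
        "searchFields": "Description",   # restrict the text search to one field
        "$filter": "Hotel eq 'Southern'",
    }
    base = f"https://{service}.search.windows.net/indexes/{index}/docs"
    return f"{base}?{urlencode(params)}"

url = build_suffix_query_url("my-service", "hotels")
print(url)
```

The text match goes through search= while the exact-match condition stays in $filter, which mirrors the split described in the answer above.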

What is the difference between the filter and search query parameters in Microsoft Graph Mail API?

While I was looking at the documentation for query parameters here, I noticed that there were two query parameters that seemingly did the exact same thing: filter and search.
I'm just wondering what the difference is between them and when is one used over the other.
While they're similar, they operate a little differently.
$search uses Keyword Query Language (KQL) and is only supported by message and person collections (i.e. you can't use $search on most endpoints). By default, it searches multiple properties. Most importantly, $search is a "contains" search, meaning it will look for your search word/phrase anywhere within a string.
For example, /messages?$search="bacon" will search for the word "bacon" anywhere in the from, subject, or body properties.
Unlike $search, the $filter parameter only searches the specified property and does not support "contains" search. It also works with just about every endpoint. In most places, it supports the following operators: equals (eq), not equals (ne), greater than (gt), greater than or equals (ge), less than (lt), less than or equals (le), and (and), or (or), not (not), and (on some endpoints) starts with (startsWith).
For example, /messages?$filter=subject eq 'bacon' will return only messages where the subject is "bacon".
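
The two request shapes can be put side by side in a small sketch. It only builds URLs; authentication is left out, and the `/v1.0/me` prefix plus the helper names `search_url` and `filter_url` are assumptions made for this illustration.

```python
from urllib.parse import quote, unquote

MESSAGES = "https://graph.microsoft.com/v1.0/me/messages"  # assumed base path

def search_url(phrase):
    """$search: keyword match anywhere in the default searched properties."""
    kql = f'"{phrase}"'
    return f"{MESSAGES}?$search={quote(kql)}"

def filter_url(prop, value):
    """$filter: exact comparison against one named property."""
    escaped = value.replace("'", "''")  # OData escapes ' by doubling it
    expr = f"{prop} eq '{escaped}'"
    return f"{MESSAGES}?$filter={quote(expr)}"

print(unquote(search_url("bacon")))
print(unquote(filter_url("subject", "bacon")))
```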
Both search and filter reduce the result set that you ultimately receive, however they operate in different ways.
Search operates on the query against the entire graph and reduces the amount of information a search query returns. This is often optimized for queries that search is good at, e.g. performing searches for items that can be indexed.
Filter operates on the much smaller result set returned by the search to provide more fine grain filtering. Separating this out allows filtering to perform tasks that would not be performant against the full collection.
This is indicated in Microsoft's documentation:
Search: Returns results based on search criteria.
Filter: Filters results (rows). (results that could be returned by search)
For performance purposes, it's good to use both if you can, search to narrow the results (e.g. using search indexes) and then do fine grain filtering on the returned results.

exclude term in YouTube Data API without including term

I'm using the YouTube Data API's search.list method to return a list of videos by date. I'm interested in filtering out certain content without having to specify a search term. The documentation specifies that you can use the - operator as a Boolean NOT, but this only seems to work if I precede that with a search term, meaning I can do this:
q:'food -pizza'
which will return results for the query term 'food' but not 'pizza'. Now say I want it to return any result excluding pizza. You'd think this would work:
q:'-pizza'
but this returns an empty Array (no results). Am I doing this wrong? Is there a way to exclude certain terms without having to specify a search term to include beforehand?

How do I do a general search across string properties in my nodes?

Working with Neo4j in a Rails app.
I have nodes with several string properties containing long strings of user generated content. For example in my nodes of type: "Book", I might have properties, "review", and "summary", which would contain long-form string values.
I was trying to design queries that returned nodes which match those properties to general language search terms provided by a user in a search box. As my query got increasingly complicated, it occurred to me that I was trying to resolve natural language search.
I looked into some of the popular search gems in Rails, but they all seem to depend on ActiveRecord. What search solutions exist for Neo4j.rb?
There are a few ways that you could go about this!
As FrobberOfBits said, Neo4j has what are called "legacy indexes" which use Lucene in the background to provide indexing of generic things. It also supports the newer schema indexes. Unfortunately, those are based on exact matches (though I'm pretty sure that will change somewhat in Neo4j 2.3.x).
Neo4j does support pattern matching on strings via the =~ operator, but those queries aren't indexed. So the performance depends on the size of your database.
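
A minimal sketch of the =~ approach, assuming the Book nodes and the review and summary properties from the question. Because =~ has to match the whole string, the search term is wrapped in `.*`; the `(?is)` flags make the match case-insensitive and let `.` span line breaks. The helper `contains_pattern` is a hypothetical name, and the `$pattern` parameter syntax is the one used by newer Neo4j releases (older ones used `{pattern}`).

```python
import re

def contains_pattern(term):
    """Build a case-insensitive 'contains' regex for Cypher's =~ operator."""
    # Escape the user's input so regex metacharacters are matched literally.
    return "(?is).*" + re.escape(term) + ".*"

# Hypothetical query; it is not indexed, so it scans every Book node.
QUERY = (
    "MATCH (b:Book) "
    "WHERE b.review =~ $pattern OR b.summary =~ $pattern "
    "RETURN b"
)

params = {"pattern": contains_pattern("time travel")}
print(QUERY)
print(params)
```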
We often recommend a gem called searchkick which lets you define indexes for Elasticsearch in your models. Then you can just call a Model.search method to do your searches and it will first query elasticsearch to get the node IDs and then load those nodes via Neo4j.rb. You can use that via the neo4j-searchkick gem: https://github.com/neo4jrb/neo4j-searchkick
Lastly, if you're doing NLP and are trying to extract important words from your text, you could create a Tag/Word label and create relationships from your nodes to these NLP extracted nodes so that you can search based on those nodes in the future. You could even build recommendations from one text node to another based on the number/type of common tag nodes.
I don't know if anything specific exists for neo4j.rb and activerecord. What I can say is that generally this stuff is handled through the use of legacy indexes that are implemented by Lucene.
The premise is that you create a lucene-managed index on certain properties, and that then gives you access to use the Lucene query language via cypher to get data from those indices. Relative to neo4j.rb, it doesn't look any different than running cypher queries, like this:
START item=node:node_auto_index("(title:'foo bar' AND body:baz*) OR title:'bat'")
RETURN item
Note that lucene indexes and that query language can only be used in a START block, not a MATCH block. Refer to the Lucene Query Syntax to discover more about what you can do with that query syntax (fuzzy matching, wildcards, etc -- quite a bit more extensive than what regex would give you).

Lucene term boosting with sunspot-rails

I'm having an issue with Lucene's Term [Boosting][1] query syntax, specifically in Ruby on Rails via the sunspot_rails gem. Term boosting lets you specify the weight of a specific term in a query; it is unrelated to the weighting of a particular field.
The HTTP query generated by sunspot uses the qf parameter to specify the fields to be searched as configured, and the q parameter for the query itself. When a caret is added to a search term to specify a boost (i.e. q=searchterm^5), it returns no results, even though results are returned without the boost.
If, on the other hand, I create an HTTP query manually and manually specify the field to search (q=title_texts:searchterm^5), results are returned and scores seem affected by the boost.
In short, it appears as though query term boosting doesn't work in conjunction with fields specified with qf.
My application calls for search across several fields, using the boosts associated with those fields, combined where needed with boosting on individual terms of a query.
Any insight?
[1]: http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Boosting a Term
Sunspot uses the dismax parser for fulltext search, which eschews the usual Lucene query syntax in favor of a limited (but user-input-friendly) query syntax combined with a set of additional parameters (such as qf) that can be constructed by the client application to tune how search works. Sunspot provides support for per-field boost using the boost_fields method in the fulltext DSL:
http://outoftime.github.com/sunspot/docs/classes/Sunspot/DSL/Fulltext.html#M000129
The solution I found is to keep using DisMax but add the bq parameter, a boolean query string containing the boosted terms.
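
To make that concrete, here is a sketch of the request parameters such a query might carry. It only assembles the Solr parameters and does not go through Sunspot; the helper `dismax_params`, the field `body_texts`, and the boost values are assumptions for illustration (`title_texts` is the field named in the question).

```python
from urllib.parse import urlencode

def dismax_params(user_query, field_boosts, term_boosts):
    """Assemble DisMax parameters: qf for field boosts, bq for term boosts."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    # Each bq clause names its field explicitly, like the manual query that worked.
    bq = " OR ".join(f"{field}:{term}^{boost}" for field, term, boost in term_boosts)
    return {"defType": "dismax", "q": user_query, "qf": qf, "bq": bq}

params = dismax_params(
    "searchterm",
    {"title_texts": 2, "body_texts": 1},
    [("title_texts", "searchterm", 5)],
)
print(urlencode(params))
```

The user's input stays in q without any caret syntax, while the per-term boost travels separately in bq.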
