SOLR exact match results not matching - parsing

Hi, I have a text_exact fieldType (the field is text_ex) that uses KeywordTokenizerFactory for exact matching. For example, searching for sale should return only results that contain the term sale specifically. When I run the query directly as text_ex:sale, I get 28 results, whereas when I run the same search through a switch query parser (defined in the request handler), I get 18 results, even though the parsed filter query is the same text_ex:sale. Can anyone help me debug this? I suspect an inconsistency in the way the query is parsed when exact is true versus false, but I am not sure.
Here is my schema definition:
<fieldType name="text_exact" class="solr.TextField" omitNorms="false">
  <analyzer type="index" omitTermFreqAndPositions="false">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="1"
            splitOnCaseChange="0"
            splitOnNumerics="0"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="1"
            splitOnCaseChange="0"
            splitOnNumerics="0"/>
  </analyzer>
</fieldType>
And here are the relevant parts of my solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
  -->
  <lst name="defaults">
    <str name="exact">false</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="qf">
      displayValue^20 description^5 connectorName_txt zenDescription_txt^5 zenBusinessOwner_txt^2
      categoryName^8 reportOwner^2 reportDetailsNameColumn^5
    </str>
    <str name="pf2">
      displayValue^20 description^5 connectorName_txt zenDescription_txt^5 zenBusinessOwner_txt^2
      categoryName^8 reportOwner^2 reportDetailsNameColumn^5
    </str>
    <str name="pf3">
      displayValue^20 description^5 connectorName_txt zenDescription_txt^5 zenBusinessOwner_txt^2
      categoryName^8 reportOwner^2 reportDetailsNameColumn^5
    </str>
    <str name="tie">1</str>
    <str name="mm">100%</str>
    <int name="ps2">3</int>
    <int name="ps3">9</int>
    <int name="qs">0</int>
    <str name="df">text</str>
    <str name="q.alt">*:*</str>
    <str name="sort">score desc, averageRating desc, lastOneWeekCount desc</str>
    <str name="bq">
      query({!boost b=20}approved:"yes")
    </str>
  </lst>
  <lst name="appends">
    <str name="fq">{!switch case.false=*:* case.true=text_ex:$${q} v=$exact}</str>
  </lst>
</requestHandler>
And here is the parsed query when I did exact search using the request handler:
"filter_queries":["{!switch case.false=\"*:*\" case.true=\"text_ex:sale\" v=$exact}"],
"parsed_filter_queries":["text_ex:sale"]
And here is the parsed query when the query text_ex:sale is run:
"filter_queries": [
  "{!switch case.false=\"*:*\" case.true=\"text_ex:text_ex:sale\" v=$exact}"
],
"parsed_filter_queries": [
  "MatchAllDocsQuery(*:*)"
]
I have also noticed that enclosing the query in double quotes throws a syntax error. Any suggestions on how to resolve this? Thanks in advance.
Here is the error message:
"error": {
  "msg": "org.apache.solr.search.SyntaxError: Expected identifier at pos 53 str='{!switch case.false=\"*:*\" case.true=\"text_ex:text_ex\":sale\"\" v=$exact}'",
  "code": 400
}
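One possible direction, not confirmed anywhere in this thread: the fq above splices the raw text of q into the switch parser's local params. That would explain why a q of text_ex:sale turns into text_ex:text_ex:sale, and why a quoted q breaks the local-params syntax. The following is a minimal sketch, assuming q carries only the bare search term and that Solr's stock field query parser ({!field}) is acceptable for this field. It passes q by reference ($q) instead of pasting it into the string.

```xml
<lst name="appends">
  <!-- Sketch only, untested against this schema. The nested {!field} parser
       reads q through the $q reference, so quotes or colons in q are never
       pasted into the local-params string of the switch parser. -->
  <str name="fq">{!switch case.false='*:*' case.true='{!field f=text_ex v=$q}' v=$exact}</str>
</lst>
```

Note also that with exact=true the filter is intersected with the edismax main query (mm=100% over the qf fields), whereas running text_ex:sale directly as q is not restricted by those fields. That may account for the differing counts (18 versus 28), though this is an assumption that debugQuery output would need to confirm.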

Related

Rails - How to use solr SuggestComponent with sunspot

I am using the sunspot_rails gem to use the Solr search library with Rails. I am trying to show suggestions to users as they enter search terms in my application.
But I can't get the SuggestComponent working with Sunspot. I followed this guide for the suggest component and added the following to solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">name</str>
    <str name="weightField">price</str>
    <str name="contextField">cat</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
When I try the context filtering suggest query
http://localhost:8982/solr/development/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&wt=json&suggest.q=c&suggest.cfq=memory
I am getting an empty response:
{
  "responseHeader":{
    "status":0,
    "QTime":43
  },
  "command":"build",
  "suggest":{
    "mySuggester":{
      "c":{
        "numFound":0,
        "suggestions":[
        ]
      }
    }
  }
}
Any idea what I am doing wrong? Can anyone help me use the SuggestComponent with the sunspot gem? Thanks in advance.

You don't need to use Solr's suggest component. You just need to ensure that Solr returns results for partial keyword searches, which can be done by adding the EdgeNGram or NGram filter factories (there are plenty of tutorials for that). While you do so, make sure you start Solr with bundle exec rake sunspot:solr:start, since that uses the configuration saved in your codebase. Then you can use Twitter typeahead to implement autocomplete.
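As a rough illustration of the edge n-gram approach described above, a field type along these lines could go into the schema.xml that Sunspot uses. The type name text_autocomplete and the gram sizes are assumptions chosen for the example, not values from the question.

```xml
<!-- Sketch only: "text_autocomplete" is a hypothetical type name. -->
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: store every prefix of each token (2 to 15 characters),
       so a partial keyword such as "me" can match "memory". -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <!-- Query time: no n-grams, so the typed prefix is matched as-is. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Applying the n-gram filter only at index time keeps the user's typed prefix intact at query time, which is usually what an autocomplete box needs.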

Solr ShingleFilterFactory in query analysis not working

I have a field with the definition below. It works perfectly on the Analysis page, but when I query it this way, query analysis behaves differently. What am I missing?
Data: thd_keyphrase: Privately held companies based in California,Social media,Privately held companies
Query: q=thd_keyphrase:find any social media
On the Analysis page the query is processed this way: |find any|any social|social media
and it matches Social media.
The output from the debug query is different:
"rawquerystring": "thd_keyphrase:find any social media",
"querystring": "thd_keyphrase:find any social media",
"parsedquery": "thd_keyphrase:find text:ani text:social text:media",
"parsedquery_toString": "thd_keyphrase:find text:ani text:social text:media",
Or, when I remove the default field text: "msg": "no field name specified in query and no default specified via 'df' param",
<fieldType name="keyphrase" class="solr.TextField" omitNorms="false" termVectors="false" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
            outputUnigrams="false" outputUnigramsIfNoShingles="true" tokenSeparator=" "/>
    <!-- <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true" enablePositionIncrements="false"/>-->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
</types>

Since you have spaces in the text string make sure to surround it with double quotes like so:
q=thd_keyphrase:"find any social media"
Also, do you mean to tokenize the field on comma?

Solr: After reloading core config, admin panel doesn't respond

I'm running Solr 4.2 with sunspot in Rails, and have indexed about 6000 items.
Currently I'm playing around with the spellcheck feature. But every time I make a change in solrconfig.xml (like turning collation on and off), the Solr admin panel stops responding.
When I try to execute a query, the loading spinner shows up, and nothing happens. The same happens in other parts of the panel, like Core Admin or Statistics.
Restarting Solr doesn't help. Reindexing the items doesn't help either. Only deleting all the index files, restarting Solr, and re-indexing all items works, but that's a painful way to work.
So does anybody have a clue, what's happening here? Where could I start to debug? Is it related to the SpellChecker Component? Maybe I missed something here.
This is the part from the solrconfig.xml that I'm playing with:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name_texts</str>
    <str name="field">taxon_permalinks_sms</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
    <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">name_texts</str>
    <str name="field">taxon_permalinks_sms</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
And btw: How can I check if my config for the different components is valid?
EDIT
When I try to open the search directly, e.g. curl or in my browser, it also hangs loading (e.g. calling http://localhost:8982/solr/core1/select?q=*%3A*&wt=xml&indent=true)

I found the solution here, if anyone is interested: solr - spellcheck causing Core Reload to hang
It turns out to be a Solr bug, but there is an easy workaround. You have to delete the <str name="spellcheck.maxCollationTries">10</str> line inside your request handler; this line causes the problem. If you really need this parameter, just append it to your URL and you're safe.
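As a sketch of that workaround, the request handler defaults would look roughly like this with the problematic line removed. Only a subset of the original spellcheck defaults is repeated here for brevity; the rest are assumed to stay unchanged.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollations">5</str>
    <!-- spellcheck.maxCollationTries is deliberately left out of the defaults.
         If it is needed, pass it per request instead, for example by adding
         spellcheck.maxCollationTries=10 to the query string. -->
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```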

How to read facet facet_ranges in grails?

A.
<lst name="Age">
  <int name="0">2</int>
  <int name="2">1</int>
  <int name="6">1</int>
  <int name="9">1</int>
</lst>
</lst>
B.
<lst name="facet_ranges">
  <lst name="Age">
    <lst name="counts">
      <int name="0">3</int>
      <int name="6">1</int>
      <int name="9">1</int>
    </lst>
    <long name="gap">3</long>
    <long name="start">0</long>
    <long name="end">51</long>
  </lst>
</lst>
I ran a range facet query against Solr, and the response contains the two parts above, A and B.
I am not able to read part B (facet_ranges), which is the part I want to display on the page.
I am using Grails.
Any help is appreciated.
Thank you!

Check QueryResponse.getFacetRanges(), which should return the range facet responses.
getFacetRanges().get(0) should return a RangeFacet.
RangeFacet.getCounts() returns a list of RangeFacet.Count objects, which can be used to get each range value and its count.

Finding singular versions of a word in Sunspot/Solr

I have a Rails+Sunspot application and I'm working on configuring it so that searching also returns the singular version of the query. For instance:
I want a search for "cookies" to return something named "cookie". Currently my Sunspot search returns "cookies" but not "cookie" (singular).
I've made some customizations to Solr's schema.xml, adding solr.EdgeNGramFilterFactory to provide more flexibility, but EdgeNGramFilterFactory doesn't suit this case, as it only allows matches when the query is a prefix of the result's name. My understanding is that EdgeNGramFilterFactory will return "cookie" when the user searches for "co", "coo", "cook" or "cooki", but not for a superstring of "cookie" (i.e. "cookies"). Simply put, this is because "cookies" is not a substring of "cookie".
I've tried adding all three of Solr's built-in stemming factories, but to no avail. You can see one commented out in my schema.
In schema.xml, the relevant field looks as follows:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    <!-- <filter class="solr.EnglishMinimalStemFilterFactory"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
I suppose I could singularize the user's query, but I would rather not touch their query before it hits Solr.
You can play with this here: http://staging.zisboombah.com/parent/food_guide/?search=cookie. Try changing the query between "cookie" and "cookies".
Any tips on how to do this in Solr would be greatly appreciated!

The filters in a Solr analyzer are applied in the order they are listed in the XML. You want the stemmer to come before the ngram filter, so that you ngram-ize cooki, rather than stemming c, co, etc.
Combining filters in this way may lead to some odd results, mostly depending on how aggressive your stemmer is. You should definitely add the stemmer to the query analyzer, but that will mess with your autocomplete.
A better solution: use a copyField to make independent text_stemmed and text_autocomplete fields. Then search using an OR query over both fields.
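A minimal sketch of that copyField layout might look like the following. The field and type names (name, name_stemmed, name_autocomplete, text_stemmed, text_autocomplete) are assumptions made up for the example, and the stemmer choice is only one of the options mentioned in the question.

```xml
<!-- Sketch only: all field and type names here are hypothetical. -->
<fieldType name="text_stemmed" class="solr.TextField" omitNorms="false">
  <!-- Same analyzer at index and query time, so "cookies" and "cookie"
       are reduced to the same stem on both sides. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_autocomplete" class="solr.TextField" omitNorms="false">
  <!-- Prefix n-grams only at index time; no stemming in this field. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_stemmed" type="text_stemmed" indexed="true" stored="false"/>
<field name="name_autocomplete" type="text_autocomplete" indexed="true" stored="false"/>

<!-- Feed both derived fields from the same source field. -->
<copyField source="name" dest="name_stemmed"/>
<copyField source="name" dest="name_autocomplete"/>
```

A query such as name_stemmed:cookies OR name_autocomplete:cookies would then search both fields, keeping stemming and autocomplete behavior independent of each other.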

Like Kyle mentions, you probably want to use more text field types for each of these different use cases.
Here's an example of mine:
schema.xml
<schema>
  <types>
    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_en" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_stopwords" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>
    <!-- ... -->
  </types>
  <fields>
    <!-- ... -->
  </fields>
  <copyField source="*_text" dest="text"/>
  <copyField source="*_texts" dest="text"/>
  <copyField source="*_textsv" dest="text"/>
  <copyField source="*_textv" dest="text"/>
</schema>
Sunspot modeling
Using the copyField directive can save some setup work in the model. However, Sunspot uses those text declarations to decide which fields to keywords-search by default, so I like to include distinct text invocations that use :as to specify the full Solr document field name.
searchable do
  text :name, stored: true, default_boost: 10
  text :name, as: 'name_text_en'
  text :description, stored: true
end
