Rails - How to use solr SuggestComponent with sunspot

I am using the sunspot_rails gem to use the Solr search library with Rails. I am trying to show suggestions to users as they type search terms in my application.
But I can't get the SuggestComponent working with sunspot. I referred to this guide for the suggest component and added the following to solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">name</str>
<str name="weightField">price</str>
<str name="contextField">cat</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
When I try the context filtering suggest query
http://localhost:8982/solr/development/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&wt=json&suggest.q=c&suggest.cfq=memory
I am getting an empty response
{
"responseHeader":{
"status":0,
"QTime":43
},
"command":"build",
"suggest":{
"mySuggester":{
"c":{
"numFound":0,
"suggestions":[
]
}
}
}
}
Any idea what I am doing wrong? Can anyone tell me how to use the SuggestComponent with the sunspot gem? Thanks in advance.

You don't need to use Solr's suggest component. You just need to ensure that your Solr returns results for partial keyword searches, which can be done by adding an EdgeNGram or NGram filter factory to your field's analyzer (there are loads of tutorials for that). While you do it, make sure you start Solr with bundle exec rake sunspot:solr:start, since that uses the configuration saved in your codebase. Then you can use Twitter typeahead on the client side to implement autocomplete; a sketch of such a field type is below.
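As a minimal sketch (the type name, field name, and gram sizes here are illustrative, not taken from the question), an autocomplete field type in schema.xml could look like this, with the EdgeNGram filter applied only at index time:
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<!-- an example field using that type; sunspot users would typically map it via a *_text dynamicField -->
<field name="name_ac" type="text_autocomplete" indexed="true" stored="true"/>
A regular fulltext query against such a field then matches partial terms like "c" or "ca", which is all the typeahead widget needs.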

Related

SOLR exact match results not matching

Hi, I have a text_exact fieldType (the field is text_ex) that uses KeywordTokenizerFactory for matching exact queries. For example, searching for sale should give results that contain exactly the term sale. When I run the query directly as text_ex:sale, 28 results are found, whereas when I run the same query through a switch query parser (defined in the request handler), I get 18 results, even though the parsed query in the switch fq is the same text_ex:sale. Can anyone help me debug this issue? I suppose there is some inconsistency in the way the query is parsed when exact is true versus false, but I am not sure.
Here is my schema definition:
<fieldType name="text_exact" class="solr.TextField" omitNorms="false">
<analyzer type="index" omitTermFreqAndPositions="false">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.WordDelimiterGraphFilterFactory"
generateWordParts="1"
generateNumberParts="0"
catenateWords="0"
catenateNumbers="0"
catenateAll="1"
splitOnCaseChange="0"
splitOnNumerics="0"/>
</analyzer>
<analyzer type="query">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.WordDelimiterGraphFilterFactory"
generateWordParts="0"
generateNumberParts="0"
catenateWords="0"
catenateNumbers="0"
catenateAll="1"
splitOnCaseChange="0"
splitOnNumerics="0"/>
</analyzer>
</fieldType>
And here is my solrconfig.xml details:
<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<str name="exact">false</str>
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="defType">edismax</str>
<str name="qf">
displayValue^20 description^5 connectorName_txt zenDescription_txt^5 zenBusinessOwner_txt^2
categoryName^8 reportOwner^2 reportDetailsNameColumn^5
</str>
<str name="pf2">
displayValue^20 description^5 connectorName_txt zenDescription_txt^5 zenBusinessOwner_txt^2
categoryName^8 reportOwner^2 reportDetailsNameColumn^5
</str>
<str name="pf3">
displayValue^20 description^5 connectorName_txt zenDescription_txt^5 zenBusinessOwner_txt^2
categoryName^8 reportOwner^2 reportDetailsNameColumn^5
</str>
<str name="tie">1</str>
<str name="mm">100%</str>
<int name="ps2">3</int>
<int name="ps3">9</int>
<int name="qs">0</int>
<str name="df">text</str>
<str name="q.alt">*:*</str>
<str name="sort">score desc, averageRating desc, lastOneWeekCount desc</str>
<str name="bq">
query({!boost b=20}approved:"yes")
</str>
</lst>
<lst name="appends">
<str name="fq">{!switch case.false=*:* case.true=text_ex:$${q} v=$exact}</str>
</lst>
</requestHandler>
And here is the parsed query when I did exact search using the request handler:
"filter_queries":["{!switch case.false=\"*:*\" case.true=\"text_ex:sale\" v=$exact}"],
"parsed_filter_queries":["text_ex:sale"]
And here is the parsed query when the query text_ex:sale is run:
"filter_queries": [
"{!switch case.false=\"*:*\" case.true=\"text_ex:text_ex:sale\" v=$exact}"
],
"parsed_filter_queries": [
"MatchAllDocsQuery(*:*)"
]
Also, I have noticed that enclosing the query within double quotes throws a syntax error. Any suggestions on how to resolve this issue? Thanks in advance.
Here is the error message:
"error": {
"msg": "org.apache.solr.search.SyntaxError: Expected identifier at pos 53 str='{!switch case.false=\"*:*\" case.true=\"text_ex:text_ex\":sale\"\" v=$exact}'",
"code": 400
}

Sunspot/rails configuration for multi-core (for different language docs) Solr 5 in one environment

I created two cores for English and Japanese documents with Solr 5.1, and I am wondering how to set up Sunspot/Rails to choose a core depending on the locale selected in my Rails app.
The default sunspot.yml shows a setting of one core for each of the production, development, and test environments, but in my case there are two cores in one environment.
Is it possible to handle multiple cores under one environment with Sunspot?
Using the URLs below I can query these cores in the different languages, so I am still looking for a configuration that selects the core based on the user's locale.
server:port/solr/#/EN_core/query?q=text
server:port/solr/#/JP_core/query?q='テキスト'
I figured out how to index multilingual documents in a single Solr instance and search the indexed documents by a specified language from sunspot/rails. This method uses different fields instead of different cores for each language, so it is not a direct answer to my question, but it is a working example of handling multilingual documents with sunspot/solr/rails.
In this example the indexed/searched field is “description” on the Entry model. Some entries have descriptions in English and others in Japanese. I use language detection during indexing in Solr (https://cwiki.apache.org/confluence/display/solr/Detecting+Languages+During+Indexing) and copyField to deal with sunspot's behavior of appending “_text” to searchable field names.
Add empty string fields “description_en” and “description_ja” to the Entry model with a Rails migration. It may sound strange, but these empty columns let sunspot search the documents in either English or Japanese. The commands look like the ones below, but they took quite a lot of time for more than 10 million records; I should consider other methods here - https://www.onehub.com/blog/2009/09/15/adding-columns-to-large-mysql-tables-quickly/
rails generate migration AddLanguageHolderToEntry description_en:string description_ja:string
rake db:migrate
Add searchable to the Entry model
class Entry < ActiveRecord::Base
searchable do
text :description, :description_en, :description_ja
end
end
Configure solrconfig.xml to enable Solr's language detection during indexing.
Add the following updateRequestProcessorChain. Use “description_text” in langid.fl instead of “description”, because Sunspot appends “_text” to the field name.
<updateRequestProcessorChain name="langid">
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
<bool name="langid">true</bool>
<str name="langid.fl">description_text</str>
<str name="langid.whitelist">en,ja</str>
<bool name="langid.map">true</bool>
<str name="langid.langField">language</str>
<str name="langid.fallback">en</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I also added langid to the requestHandlers of “/update” and "/update/extract" as follows.
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">langid</str>
</lst>
</requestHandler>
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
<str name="update.chain">langid</str>
</lst>
</requestHandler>
Check paths to the libraries
<lib dir="/path to/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="/path to/dist/" regex="solr-langid-\d.*\.jar" />
Configure schema.xml
Add fields for “description”. The “_text_en” and “_text_ja” fields hold the outputs from Solr's language detection; the “_en_text” and “_ja_text” fields are for indexing/searching via sunspot.
<field name="name_text_en" type="text_en" indexed="false" stored="true"/>
<field name="name_en_text" type="text_en" indexed="true" stored="false"/>
<field name="name_text_ja" type="text_ja" indexed="false" stored="true"/>
<field name="name_ja_text" type="text_ja" indexed="true" stored="false"/>
Also add a field for the detected language (the target of langid.langField), for example:
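<!-- illustrative declaration with assumed attributes; the original post omitted the exact definition -->
<field name="language" type="string" indexed="true" stored="true"/>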
These copyfields are set for searching.
<copyField source="description_text_en" dest="description_en_text" />
<copyField source="description_text_ja" dest="description_ja_text" />
The “text_en” and “text_ja” fieldTypes are needed in schema.xml. I omit their detailed configuration here and use the standard analyzers.
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">.....
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">.....
Reindex from sunspot:
bundle exec rake sunspot:reindex
Search the documents from the Rails console, for testing.
rails console
For English documents -
search = Entry.search do
fulltext 'keyword_en' do
fields(:description_en)
end
end
For Japanese documents -
search = Entry.search do
fulltext 'キーワード' do
fields(:description_ja)
end
end
search.results
As you can see, this is an ad-hoc method, and I welcome any comments on it.

How do I query (via url) the solr.admin.LukeRequestHandler to get collection index data

I want to use the Luke handler as suggested in "Solr schema, how to get dynamic fields in a collection", i.e. http://solr:8983/solr/admin/luke?numTerms=0
but the 4.10.3 solrconfig.xml has the following entry, which indicates Luke has been rolled into /admin/ and that I should be able to use the http://localhost:8983/solr/admin path; that gives me a 404 error.
<requestHandler name="/admin/"
class="solr.admin.AdminHandlers" />
<!-- This single handler is equivalent to the following... -->
<!--
<requestHandler name="/admin/luke" class="solr.admin.LukeRequestHandler" />
<requestHandler name="/admin/system" class="solr.admin.SystemInfoHandler" />
<requestHandler name="/admin/plugins" class="solr.admin.PluginInfoHandler" />
<requestHandler name="/admin/threads" class="solr.admin.ThreadDumpHandler" />
<requestHandler name="/admin/properties" class="solr.admin.PropertiesRequestHandler" />
<requestHandler name="/admin/file" class="solr.admin.ShowFileRequestHandler" >
-->
When I look for LukeRequestHandler documentation I find http://lucene.apache.org/solr/4_4_0/solr-core/org/apache/solr/handler/admin/LukeRequestHandler.html, which expects that I am building a Java app, which I am not.
I attempted to use several methods found there in a URL, all of which return 404.
So, in addition to "how do I query the Luke handler to get index data":
"is this the correct documentation for what I am trying to figure out?"
Any help in understanding how these Java docs relate to working with Solr from a URL would be greatly appreciated.
I spent some days scratching my head over the same issue. Apparently, you need to include the name of your core in every request.
Testing with the "gettingstarted" core:
http://localhost:8983/solr/admin/luke/ gives 404.
http://localhost:8983/solr/gettingstarted/admin/luke/ gives an XML with the information of the index.
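You can also pass the usual Luke parameters on that per-core path; for example (the "gettingstarted" core name and wt=json are just for illustration):
http://localhost:8983/solr/gettingstarted/admin/luke?numTerms=0&wt=json
http://localhost:8983/solr/gettingstarted/admin/luke?show=schema&wt=json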
Check if this solves your problem.

Solr: After reloading core config, admin panel doesn't respond

I'm running Solr 4.2 with sunspot in Rails and have indexed about 6000 items.
Currently I'm playing around with the spellcheck feature. But every time I make a change in solrconfig.xml (like turning collation on and off), the Solr admin panel stops responding.
When I try to execute a query, the loading spinner shows up and nothing happens. The same goes for other parts of the panel, like Core Admin or Statistics.
Restarting Solr doesn't help. Reindexing the items doesn't help either. Only deleting the whole index files, restarting Solr, and re-indexing all items works, but that's a painful way to work.
So does anybody have a clue what's happening here? Where could I start to debug? Is it related to the SpellCheck component? Maybe I missed something here.
This is the part from the solrconfig.xml that I'm playing with:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_general</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">name_texts</str>
<str name="field">taxon_permalinks_sms</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<!-- the spellcheck distance measure used, the default is the internal levenshtein -->
<str name="distanceMeasure">internal</str>
<!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
<float name="accuracy">0.5</float>
<!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
<int name="maxEdits">2</int>
<!-- the minimum shared prefix when enumerating terms -->
<int name="minPrefix">1</int>
<!-- maximum number of inspections per result. -->
<int name="maxInspections">5</int>
<!-- minimum length of a query term to be considered for correction -->
<int name="minQueryLength">4</int>
<!-- maximum threshold of documents a query term can appear to be considered for correction -->
<float name="maxQueryFrequency">0.01</float>
<!-- uncomment this to require suggestions to occur in 1% of the documents
<float name="thresholdTokenFrequency">.01</float>
-->
</lst>
<!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">solr.WordBreakSolrSpellChecker</str>
<str name="field">name_texts</str>
<str name="field">taxon_permalinks_sms</str>
<str name="combineWords">true</str>
<str name="breakWords">true</str>
<int name="maxChanges">5</int>
</lst>
</searchComponent>
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
And by the way: how can I check whether my config for the different components is valid?
EDIT
When I try to open the search directly, e.g. with curl or in my browser, it also hangs loading (e.g. calling http://localhost:8982/solr/core1/select?q=*%3A*&wt=xml&indent=true).
I found the solution here, if anyone is interested: solr - spellcheck causing Core Reload to hang
Turns out it's a Solr bug, but there is an easy workaround: delete the <str name="spellcheck.maxCollationTries">10</str> line inside your request handler. This line causes the problem. If you really need this parameter, just append it to your URL and you're safe.
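For example, something along these lines (the query itself is just illustrative, using the same core as above):
http://localhost:8982/solr/core1/select?q=*%3A*&spellcheck=true&spellcheck.maxCollationTries=10&wt=xml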

How to read facet_ranges in Grails?

A.<lst name="Age">
<int name="0">2</int>
<int name="2">1</int>
<int name="6">1</int>
<int name="9">1</int>
</lst>
</lst>
B.<lst name="facet_ranges">
<lst name="Age">
<lst name="counts">
<int name="0">3</int>
<int name="6">1</int>
<int name="9">1</int>
</lst>
<long name="gap">3</long>
<long name="start">0</long>
<long name="end">51</long>
</lst>
</lst>
I fired one range query at the Solr database and it gives answers A and B above.
But I am not able to read part B of the answer, which I want to display on a page.
I am using Grails.
Help me out, thank you!
Check QueryResponse.getFacetRanges(), which should return the range facet responses.
getFacetRanges().get(0) should return a RangeFacet.
RangeFacet.getCounts() returns a list of RangeFacet.Count objects, which give you each range value and its count.
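A minimal Groovy sketch of how reading that might look in a Grails controller or service, assuming queryResponse is the SolrJ QueryResponse you already have (the variable names are illustrative):
// queryResponse: org.apache.solr.client.solrj.response.QueryResponse (assumed to exist already)
def ranges = queryResponse.getFacetRanges()
// pick the "Age" range facet out of the list
def ageFacet = ranges.find { it.getName() == "Age" }
ageFacet.getCounts().each { count ->
    // count.getValue() is the start of the bucket, count.getCount() the number of docs in it
    println "${count.getValue()} -> ${count.getCount()}"
}
println "gap: ${ageFacet.getGap()}, start: ${ageFacet.getStart()}, end: ${ageFacet.getEnd()}"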
