Partial Solr search Rails 4 - ruby-on-rails

How can I do a Solr search with a partial string?
For example, I have shopping sites named 'abcdef', 'abcokp', and 'abc'.
If I search for 'abc', it should show all three sites, but it shows only the last one, 'abc'.
Any help?

Change solr/conf/schema.xml to use the following snippet:
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
Restart Solr, reindex, and you're done. Is there any alternative, since schema.xml is not committed?
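To see why the n-gram filter makes 'abc' match all three sites, here is a minimal Ruby sketch of the idea. The `ngrams` helper is a hypothetical stand-in for solr.NGramFilterFactory (the real filter may emit grams in a different order), and the site names are the ones from the question.

```ruby
# Simplified stand-in for solr.NGramFilterFactory: every substring of
# `token` whose length is between min_size and max_size.
def ngrams(token, min_size: 2, max_size: 15)
  grams = []
  (min_size..[max_size, token.length].min).each do |size|
    (0..token.length - size).each do |start|
      grams << token[start, size]
    end
  end
  grams
end

sites = %w[abcdef abcokp abc]
query = 'abc' # the query analyzer has no n-gram filter, so the term stays whole

# Index-time terms per site (lowercased first, as in the analyzer chain).
index = sites.map { |site| [site, ngrams(site.downcase)] }.to_h

# A site matches when the whole query term is one of its indexed grams.
matches = sites.select { |site| index[site].include?(query.downcase) }
puts matches.inspect
```

Because n-grams are produced only at index time, the query term is compared as a whole against the stored grams, which is why the query analyzer in the snippet above leaves the n-gram filter out.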

Related

solr sunspot unicode? characters

How do I use the mapping-ISOLatin1Accent.txt file provided with Sunspot (or is it even the one I need)? I need to be able to search for "las pinas" and get results like "las piñas" from my database, meaning that n should match ñ. My schema.xml currently looks like this:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" tokenizerFactory="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
I have tried moving the <charFilter> setting around as well.
I have also searched and found various solutions, mostly pointing to the same couple of articles, but those don't seem to work either.
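For reference, the effect being asked for can be sketched in plain Ruby. This is only an approximation of what a character-mapping filter does (the real filter reads explicit character pairs from the mapping file); `fold_accents` is a hypothetical helper name.

```ruby
# Fold accented Latin characters to their base letters: decompose the
# string (NFD), then drop the combining marks that were split off.
def fold_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# The same folding has to happen to the indexed text and to the query,
# otherwise the two sides never meet.
indexed = fold_accents('las piñas').downcase
query   = fold_accents('las pinas').downcase
puts indexed == query
```

The point of the sketch is that folding must apply on both sides, which is why the charFilter sits in an analyzer that is used for indexing and querying alike, and why existing documents need to be reindexed after the change.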

give importance to documents which contain the words in proximity + solr + sunspot

I am working on a Rails application that is based on the Apache Solr search engine, and we are using the Sunspot gem. I am facing one problem: if I search for house rent, I get thousands of results using an AND query, but the results are not relevant.
I expect documents that contain the words house and rent near each other to come out on top. For now, the documents that contain the most occurrences of house and rent come out on top, with no regard for word proximity.
My schema.xml contains the following definition:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,\.;\(\)]+"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
What changes do I need to make to achieve this? Do I need to add any filters?
You can try this:
<fieldType name="shingleString" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" outputUnigrams="true" outputUnigramIfNoNgram="true" maxShingleSize="99"/>
    <filter class="solr.PositionFilterFactory" />
  </analyzer>
</fieldType>
Use phrase fields and boost them, or try a proximity (phrase slop) query such as "house rent"~5.
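As a rough illustration of the phrase-field suggestion, the sketch below builds the raw Solr request parameters in plain Ruby. `qf`, `pf` and `ps` are standard dismax/edismax parameters; the field name `text`, the boost of 10 and both helper names are assumptions made up for this example. Sunspot's fulltext block has `phrase_fields` and `phrase_slop` options that, as far as I know, map onto `pf` and `ps`.

```ruby
# Build Solr request parameters that rank documents higher when the query
# terms occur close together. The field name and boost are illustrative.
def proximity_params(query, field: 'text', boost: 10, slop: 5)
  {
    'defType' => 'edismax',
    'q'       => query,
    'qf'      => field,               # fields the individual terms are matched against
    'pf'      => "#{field}^#{boost}", # extra score when the terms occur as a phrase
    'ps'      => slop                 # slop allowed when matching that phrase
  }
end

# The explicit proximity syntax from the answer: "house rent"~5
def proximity_query(terms, slop)
  %("#{terms.join(' ')}"~#{slop})
end

puts proximity_params('house rent').inspect
puts proximity_query(%w[house rent], 5)
```

With `pf` the query still matches every document containing both words, but those where the words sit within the slop of each other get the additional boost and move to the top.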

Rails sunspot-solr - words with hyphen

I'm using the sunspot_rails gem and everything has worked perfectly so far, with one exception: I'm not getting any search results for words with a hyphen.
Example:
The string "tron" returns a lot of results (the word mentioned in all articles is e-tron).
The string "e-tron" returns 0 results, even though this is the correct word mentioned in all my articles.
My current schema.xml config:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
What I want: the behaviour for the search string tron is fine, of course, but I also want correct matches for the search string e-tron.
The problem is that solr.StandardTokenizerFactory splits words on hyphens, so "e-tron" generates the tokens "e" and "tron". Presumably "e" is lost at index time because the EdgeNGramFilterFactory is configured with a minimum gram size of 2.
Here is one example configuration that should address your specific problem:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
solr.WhitespaceTokenizerFactory generates tokens on whitespace only: ["e-tron"]
solr.WordDelimiterFilterFactory splits on hyphens but also preserves the original word: ["e", "tron", "e-tron"]
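The two chains can be simulated roughly in Ruby to check which terms end up matching. This sketch only handles the hyphen case (the real WordDelimiterFilterFactory has many more rules), and all helper names and the sample text 'Audi e-tron' are made up for illustration.

```ruby
# Split a token on hyphens, keeping the original token as well
# (the preserveOriginal="1" behaviour, hyphen case only).
def split_preserving_original(token)
  parts = token.split('-').reject(&:empty?)
  parts.length > 1 ? [token] + parts : [token]
end

# Front edge n-grams; tokens shorter than min_size produce no grams at all.
def edge_ngrams(token, min_size: 2, max_size: 15)
  (min_size..[max_size, token.length].min).map { |size| token[0, size] }
end

# Index chain: whitespace tokens -> word delimiter -> lowercase -> edge n-grams.
def index_terms(text)
  text.split
      .flat_map { |t| split_preserving_original(t) }
      .map(&:downcase)
      .flat_map { |t| edge_ngrams(t) }
      .uniq
end

# Query chain: the same steps without the n-gram filter.
def query_terms(text)
  text.split.flat_map { |t| split_preserving_original(t) }.map(&:downcase).uniq
end

indexed = index_terms('Audi e-tron')
p(indexed)
p(query_terms('e-tron') & indexed) # query terms that find an indexed term
p(query_terms('tron') & indexed)
```

In this simulation the single-character token "e" still produces no indexed term, but the preserved "e-tron" does, so the hyphenated query now has something to match.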

Finding singular versions of a word in Sunspot/Solr

I have a Rails + Sunspot application and I'm working on configuring it so that a search also matches the singular version of the query. For instance:
I want a search for "cookies" to return something named "cookie". Currently my Sunspot search returns "cookies" but not "cookie" (singular).
I've made some customizations to Solr's schema.xml, adding solr.EdgeNGramFilterFactory to provide more flexibility, but EdgeNGramFilterFactory doesn't suit this case, as it only allows matches when the query is a leading substring of the result's name. My understanding is that EdgeNGramFilterFactory will return "cookie" when the user searches for "co", "coo", "cook" or "cooki", but not for a superstring of "cookie" (i.e. "cookies"). Simply put, this is because "cookies" is not a substring of "cookie".
I've tried adding all three of Solr's built-in stemming factories, but to no avail. You can see one commented out in my schema.
In schema.xml, the relevant field looks as follows:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    <!-- <filter class="solr.EnglishMinimalStemFilterFactory"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
I suppose I could singularize the user's query, but I would rather not touch their query before it hits Solr.
You can play with this here: http://staging.zisboombah.com/parent/food_guide/?search=cookie. Try changing the query between "cookie" and "cookies".
Any tips on how to do this in Solr would be greatly appreciated!
The filters in a Solr analyzer are applied in the order they are listed. You want the stemmer to come before the n-gram filter, so that you n-gram-ize cooki rather than stemming c, co, etc.
Combining filters in this way may lead to some odd results, mostly depending on how aggressive your stemmer is. You should definitely add the stemmer to the query analyzer as well, but that will mess with your autocomplete.
A better solution: use copyField to create independent text_stemmed and text_autocomplete fields, then search using an OR query over both fields.
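The ordering point can be sketched in Ruby. The stemmer here is a toy that only strips a trailing "s"; it is not the Porter algorithm and merely stands in for whichever stemmer you pick, and all helper names are made up for the example.

```ruby
# Toy stemmer for illustration only: strips a trailing "s".
def toy_stem(token)
  token.length > 3 && token.end_with?('s') ? token[0...-1] : token
end

# Front edge n-grams, as produced by an edge n-gram filter.
def edge_ngrams(token, min_size: 2, max_size: 15)
  (min_size..[max_size, token.length].min).map { |size| token[0, size] }
end

# Index chain: lowercase -> stem -> edge n-grams (stemmer before n-grams).
def index_terms(name)
  name.downcase.split.map { |t| toy_stem(t) }.flat_map { |t| edge_ngrams(t) }.uniq
end

# Query chain with the stemmer added: lowercase -> stem.
def stemmed_query(text)
  text.downcase.split.map { |t| toy_stem(t) }
end

# Query chain without the stemmer, as in the schema from the question.
def plain_query(text)
  text.downcase.split
end

indexed = index_terms('Cookie')
p(stemmed_query('cookies') & indexed) # the stemmed query term is found
p(plain_query('cookies') & indexed)   # the unstemmed query term is not
p(stemmed_query('coo') & indexed)     # short prefixes still match
```

With a real stemmer the query side can rewrite partially typed words in less predictable ways, which is the autocomplete problem mentioned above and the reason separate fields are the cleaner option.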
As Kyle mentions, you probably want a separate text field type for each of these use cases.
Here's an example of mine:
schema.xml
<schema>
  <types>
    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_en" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_stopwords" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>
    <!-- ... -->
  </types>
  <fields>
    <!-- ... -->
  </fields>
  <copyField source="*_text" dest="text"/>
  <copyField source="*_texts" dest="text"/>
  <copyField source="*_textsv" dest="text"/>
  <copyField source="*_textv" dest="text"/>
</schema>
Sunspot modeling
Using the copyField directive can save some setup work in the model. However, Sunspot uses those text declarations to decide which fields a keywords search covers by default, so I like to include distinct text invocations that use :as to specify the full Solr document field name.
searchable do
  text :name, stored: true, default_boost: 10
  text :name, as: 'name_text_en'
  text :description, stored: true
end

Adding stemming to my schema.xml file doesn't work

I'm trying to set up Websolr on my Heroku app. I'm following the instructions in the Heroku docs. I've got the initial setup working fine.
In development:
ruby-1.9.2-p0 > Note.search { keywords 'grit' }.results.length
=> 3
I am trying to add stemming. I updated the relevant part of my schema.xml file to this:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
I then reindexed:
$ rake sunspot:reindex
But it doesn't seem to work at all:
ruby-1.9.2-p0 > Note.search { keywords 'gri' }.results.length
=> 0
What am I doing wrong?
I have two ideas for you here:
First, you didn't mention whether you restarted Solr after changing your schema.xml. So: are you restarting Solr so your changes can take effect? :)
Next, I am wondering whether the term grit would even qualify to have its t removed under the Porter stemming algorithm. You would need a close read of the Porter stemmer algorithm to be sure. Keep in mind that stemming strips suffixes such as -ing or -s; it does not do prefix matching. You may also try some more obvious examples (say, writing to write).
