Thinking Sphinx and Norwegian characters (æ, ø, å) - ruby-on-rails

I've set up Thinking Sphinx for wildcard searches, but I'm having trouble searching for words containing Norwegian characters, as the automatic starring seems to mess up the query. For instance, my search for "ål" will end up with:
Sphinx Query (2.8ms) å*l*
Sphinx Found 0 results
If I manually enter the stars in the search term, "*ål*", the expected results are returned:
Sphinx Query (3.7ms) *ål*
Sphinx Found 8 results
It seems somehow the å (as well as æ, ø) gets misinterpreted when automatically adding the stars.
Is anyone here familiar with this problem?
My config/sphinx.yml looks as follows:
development:
  enable_star: 1
  min_infix_len: 2
  charset_table: "U+FF10..U+FF19->0..9, U+FF21..U+FF3A->a..z, U+FF41..U+FF5A->a..z, 0..9, A..Z->a..z, a..z,
    U+C5->U+E5, U+E5, U+D8->U+F8, U+F8, U+C6->U+E6, U+E6,
    U+C4->U+E4, U+E4, U+D6->U+F6, U+F6"
And a couple of examples of searches performed in the console:
ruby-1.9.2-p290 :014 > ThinkingSphinx.search("ål", :star => true).count
=> 0
ruby-1.9.2-p290 :015 > ThinkingSphinx.search("*ål*", :star => true).count
=> 8

This has been fixed in recent commits; for the moment, you'll need to install the gem from the Git repository:
gem 'thinking-sphinx',
  :git => 'git://github.com/freelancing-god/thinking-sphinx.git'
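A plausible explanation for the "å*l*" query, offered as an assumption rather than a reading of Thinking Sphinx's source: if the automatic starring locates words with Ruby's \w, it skips Norwegian letters, because in Ruby 1.9+ \w only matches [a-zA-Z0-9_]. The sketch below uses a hypothetical star_query method (not the gem's API) to compare \w with the Unicode-aware [[:word:]] class.

```ruby
# encoding: utf-8
# Hypothetical re-creation of automatic starring; NOT Thinking Sphinx's actual code.
ASCII_WORD   = /\w+/          # \w matches only [a-zA-Z0-9_] in Ruby
UNICODE_WORD = /[[:word:]]+/  # Unicode-aware: letters, marks, numbers, connectors

# Wrap every word found by `token` in stars, e.g. "foo" -> "*foo*".
def star_query(query, token = UNICODE_WORD)
  query.gsub(token) { |word| "*#{word}*" }
end

puts star_query("ål", ASCII_WORD)  # only "l" counts as a word: prints å*l*
puts star_query("ål")              # "ål" stays one word: prints *ål*
```

With the ASCII-only pattern only the "l" is treated as a word, which reproduces the query from the log; the Unicode-aware class keeps "ål" together.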

Related

Rails Digest::UUID v5 (vs) Postgresql uuid-ossp v5

I'm getting different v5 UUIDs when generating them with Rails' Digest::UUID and PostgreSQL's uuid-ossp.
Rails:
[58] pry(main)> Digest::UUID.uuid_v5('e90bf6ab-f698-4faa-9d0f-810917dea53a', 'e90bf6ab-f698-4faa-9d0f-810917dea53a')
=> "db68e7ad-332a-57a7-9638-a507f76ded93"
PostgreSQL uuid-ossp:
select uuid_generate_v5('e90bf6ab-f698-4faa-9d0f-810917dea53a', 'e90bf6ab-f698-4faa-9d0f-810917dea53a');
uuid_generate_v5
--------------------------------------
6c569b95-a6fe-5553-a6f5-cd871ab30178
What is the reason? I expected both to generate the same UUID for the same input, but the results differ.
This doesn't answer why Rails produces a different result, but if you want to produce v5 UUIDs in your Ruby code, you could use uuidtools. It returns the same result as PostgreSQL:
~ pry
[1] pry(main)> require 'uuidtools'
=> true
[2] pry(main)> UUIDTools::UUID.sha1_create(UUIDTools::UUID.parse('e90bf6ab-f698-4faa-9d0f-810917dea53a'), 'e90bf6ab-f698-4faa-9d0f-810917dea53a')
=> #<UUID:0x3fe09ea60dd8 UUID:6c569b95-a6fe-5553-a6f5-cd871ab30178>
[3] pry(main)>
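It seems that a patch has been proposed so that RFC 4122-compliant handling of string-represented namespaces can be enabled explicitly:
"The new behavior will be enabled by setting the config.active_support.use_rfc4122_namespaced_uuids option to true."
However, the patch is very recent and may still be under review; people may be afraid it will break existing code. See:
https://github.com/rails/rails/issues/37681
https://github.com/rails/rails/pull/37682/files
The underlying cause is that Digest::UUID.uuid_v5 (without that option) hashes the namespace string exactly as given, i.e. its 36 text characters, whereas RFC 4122 and uuid-ossp hash the 16 raw bytes of the namespace UUID.
Meanwhile, a workaround is to pack the namespace string into those 16 bytes yourself: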
ns=n.scan(/(\h{8})-(\h{4})-(\h{4})-(\h{4})-(\h{4})(\h{8})/).flatten.map { |s| s.to_i(16) }.pack("NnnnnN")
In your example:
irb(main):037:0> n='e90bf6ab-f698-4faa-9d0f-810917dea53a'
=> "e90bf6ab-f698-4faa-9d0f-810917dea53a"
irb(main):038:0> ns=n.scan(/(\h{8})-(\h{4})-(\h{4})-(\h{4})-(\h{4})(\h{8})/).flatten.map { |s| s.to_i(16) }.pack("NnnnnN")
=> "\xE9\v\xF6\xAB\xF6\x98O\xAA\x9D\x0F\x81\t\x17\xDE\xA5:"
irb(main):039:0> puts Digest::UUID.uuid_v5(ns, 'e90bf6ab-f698-4faa-9d0f-810917dea53a')
6c569b95-a6fe-5553-a6f5-cd871ab30178
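To make the difference concrete, here is a minimal sketch of RFC 4122 version-5 generation in plain Ruby using only Digest::SHA1 (rfc4122_uuid_v5 is a hypothetical name, not a Rails or uuidtools method). It converts the namespace to its 16 raw bytes before hashing, which is what the pack("NnnnnN") workaround achieves.

```ruby
require 'digest/sha1'

# Sketch of RFC 4122 name-based (version 5) UUID generation.
# namespace: canonical UUID string such as "e90bf6ab-f698-4faa-9d0f-810917dea53a"
def rfc4122_uuid_v5(namespace, name)
  # Hash the 16 raw bytes of the namespace, not its 36-character text form.
  ns_bytes = [namespace.delete('-')].pack('H*')

  sha1 = Digest::SHA1.new
  sha1.update(ns_bytes)
  sha1.update(name)
  bytes = sha1.digest.bytes.first(16)  # SHA-1 yields 20 bytes; keep the first 16

  bytes[6] = (bytes[6] & 0x0f) | 0x50  # set version 5 in the high nibble
  bytes[8] = (bytes[8] & 0x3f) | 0x80  # set the RFC 4122 variant bits (10xx)

  hex = bytes.pack('C*').unpack('H*').first
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end
```

Called with the namespace and name from the question, it should return the same value as uuid-ossp and uuidtools.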

Putting ~ after id fetches record

Suppose we have an Active Record model, say User.
User.find(id) works as expected.
But so does User.find('id~')
And so does User.find('id~gibberish')
Is this a vulnerability or a flaw in ActiveRecord?
How do I handle such requests appropriately?
This should help clear things up: what you're seeing is not ActiveRecord itself but the behavior of Ruby's String#to_i, which parses the leading digits and ignores the rest of the string.
2.2.1 :001 > '11'.to_i
=> 11
2.2.1 :002 > '11~'.to_i
=> 11
2.2.1 :003 > '11~gibberish'.to_i
=> 11
This is neither a vulnerability nor a flaw. If you're worried about input like this, I'd ask for an example of where you think it could cause you harm.
Additionally, if you'd like to be super defensive, use Integer():
2.2.1 :004 > Integer('11~gibberish')
ArgumentError: invalid value for Integer(): "11~gibberish"
2.2.1 :005 > Integer('11')
=> 11
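Building on Integer(), a minimal sketch of a defensive id parser (strict_id is a hypothetical helper, not part of Rails). Passing base 10 explicitly matters because Integer('010') on its own would be read as octal and return 8.

```ruby
# Hypothetical helper: strictly parse an id parameter, returning nil on bad input.
def strict_id(param)
  Integer(param.to_s, 10)  # base 10: no octal/hex/binary prefixes, no trailing junk
rescue ArgumentError
  nil                      # "11~", "11~gibberish", "" and nil all end up here
end

strict_id('11')            # => 11
strict_id('11~gibberish')  # => nil
strict_id('010')           # => 10
```

In a controller you could, for example, respond with a 404 when strict_id returns nil instead of passing the raw parameter to find.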

What is wrong with this Sunspot Solr setup?

I am using sunspot to search my local db. After adding the gems, running the generate command, and booting up the solr server I do the following:
class Style < ActiveRecord::Base
  attr_accessible :full_name, :brand_name

  searchable do
    text :full_name
    text :brand_name
  end
end
I added the above to my Style model and reindexed (I had already indexed before writing this post; I reindexed again to show the output here):
funkdified#vizio ~/rails_projects/goodsounds.org $ rake sunspot:solr:reindex
[RailsAdmin] RailsAdmin initialization disabled by default. Pass SKIP_RAILS_ADMIN_INITIALIZER=false if you need it.
*Note: the reindex task will remove your current indexes and start from scratch.
If you have a large dataset, reindexing can take a very long time, possibly weeks.
This is not encouraged if you have anywhere near or over 1 million rows.
Are you sure you want to drop your indexes and completely reindex? (y/n)
y
[#######################################] [14/14] [100.00%] [00:00] [00:00] [53.19/s]
Then I try a search and get nothing:
1.9.3p392 :003 > Style.search { fulltext 'Monkey' }.results
SOLR Request (10.4ms) [ path=#<RSolr::Client:0x0000000685ab28> parameters={data: fq=type%3AStyle&q=Monkey&fl=%2A+score&qf=full_name_text+brand_name_text&defType=dismax&start=0&rows=30, method: post, params: {:wt=>:ruby}, query: wt=ruby, headers: {"Content-Type"=>"application/x-www-form-urlencoded; charset=UTF-8"}, path: select, uri: http://localhost:8982/solr/select?wt=ruby, open_timeout: , read_timeout: , retry_503: , retry_after_limit: } ]
=> []
But wait, shouldn't it have picked this up?
Style.first
Style Load (1.3ms) SELECT "styles".* FROM "styles" LIMIT 1
=> #<Style id: 54, brand_name: "Monkey", full_name: "Monkey Chicken", created_at: "2013-02-01 23:25:58", updated_at: "2013-02-16 03:02:16">
Here is one more clue: I am seeing "unknown field" for brand_name (set up in Style.rb).
If you change the schema (the "searchable" block), you have to either reindex all models:
rake sunspot:solr:reindex
or reindex that specific model with a given batch size (here 500):
rake sunspot:solr:reindex[500,Style]
as described in the Sunspot documentation on GitHub (search for "Reindexing Objects").
FYI, to use Style.reindex for non-schema changes, you will have to call Sunspot.commit to save changes.

Full-text search on Heroku using pg_search gem

I've implemented full-text search using the pg_search gem in my Rails application.
My migration to create the index looks like this:
execute(<<-'eosql'.strip)
  CREATE index mytable_fts_idx
  ON mytable
  USING gin(
    (setweight(to_tsvector('english', coalesce("mytable"."name", '')), 'A') ||
     ' ' ||
     setweight(to_tsvector('english', coalesce("mytable"."description",'')), 'B')
    )
  )
eosql
And my model code looks like this:
pg_search_scope :full_text_search,
  :against => [:name, :description],
  :using => {
    :tsearch => {
      :prefix => true,
      :dictionary => "english",
      :any_word => true
    }
  }
This works fine locally on Postgres 9.0.4. However, when I deploy the same code to Heroku and search for the sample query 'test', it throws an error:
PGError: ERROR: syntax error in tsquery: "' test ':*"
SELECT COUNT(count_column) FROM (SELECT 1 AS count_column FROM "mytable" WHERE (((to_tsvector('english', coalesce("mytable"."name", '')) || to_tsvector('english', coalesce("mytable"."description", ''))) @@ (to_tsquery('english', ''' ' || 'test' || ' ''' || ':*')))) LIMIT 12 OFFSET 0) subquery_for_count ):
Any suggestions on where I'm wrong and what I should be looking at to fix this error? Thanks.
I'm the main developer of pg_search. Sorry that you ran into that problem! Right now there is a pg_search bug when using :prefix searches against PostgreSQL 8.3 (the default for Heroku).
https://github.com/Casecommons/pg_search/issues/10
It's my top priority right now. I'm still figuring out the best way to get the test suite to run against both 8.x and 9.x.
Update: Unfortunately, :prefix searches don't work against PostgreSQL 8.3 at all. The functionality was introduced in 8.4. I've released pg_search 0.3.3 which improves the error message. Hopefully Heroku will upgrade to 9.0 across the board soon. I believe they want to do so, but they obviously can't just upgrade everyone wholesale without warning.
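If the same code has to run against both 8.3 and newer servers, one option is to enable :prefix only when the server supports it. The helper below is a hypothetical sketch (prefix_search_supported? is not part of pg_search); it simply encodes the 8.4 cut-off stated above and compares versions with RubyGems' Gem::Version.

```ruby
require 'rubygems' # provides Gem::Version (already loaded in most Ruby setups)

# Hypothetical helper: tsquery prefix matching (the ':*' suffix that
# pg_search's :prefix option generates) requires PostgreSQL 8.4 or later.
def prefix_search_supported?(server_version)
  numeric = server_version.to_s[/\A\d+(\.\d+)*/]  # "9.0.4 (Ubuntu ...)" -> "9.0.4"
  return false if numeric.nil?
  Gem::Version.new(numeric) >= Gem::Version.new('8.4')
end
```

The version string could come from ActiveRecord::Base.connection.select_value('SHOW server_version'), and the result could then be passed as the :prefix value in pg_search_scope.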

PostgreSQL and Heroku, find and group

I'm trying to get an app to run properly on Heroku. (Heroku uses a PostgreSQL database, right?)
In development I'm using SQLite, and this is the code in my controller:
@productsort = Products.find(:all,
  :select => 'count(*) count, color',
  :group => 'color',
  :order => 'count DESC',
  :conditions => 'size = "Small"')
As you can see, I'm trying to group products by color and order the groups from largest count to smallest.
Also, the products must be "Small" (the conditions).
In SQLite, it works fine,
but not in PostgreSQL (Heroku).
This is from running "heroku logs":
2011-06-20T18:20:33+00:00 app[web.1]: ActiveRecord::StatementInvalid (PGError: ERROR: column "Small" does not exist
2011-06-20T18:20:33+00:00 app[web.1]: LINE 1: ...ducts".* FROM "products" WHERE (size = "Smal...
Hm... I've searched around and I couldn't find anything similar to what I have.
Any help would be appreciated. Thank you.
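You need to use single quotes around string values in your conditions. In standard SQL, double quotes delimit identifiers such as column names, which is why PostgreSQL looks for a column named "Small"; SQLite may accept double-quoted strings, but PostgreSQL definitely does not.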
So replace your conditions with this:
:conditions => "size = 'Small'"
It will still work in SQLite too.
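To illustrate the quoting rule itself, here is a small sketch with two hypothetical helpers (not ActiveRecord methods): single quotes build a string literal, with any embedded quote doubled, while double quotes build an identifier. In Rails code you would normally let ActiveRecord quote the value instead, e.g. :conditions => ['size = ?', 'Small'].

```ruby
# Hypothetical helpers illustrating standard SQL quoting rules.
# String literal: wrap in single quotes and escape ' by doubling it.
def quote_string_literal(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Identifier (column/table name): wrap in double quotes, escape " by doubling it.
def quote_identifier(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

condition = "#{quote_identifier('size')} = #{quote_string_literal('Small')}"
puts condition  # prints "size" = 'Small'
```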
