How to show excerpts from pg-search multisearch results - ruby-on-rails

I've set up pg_search in my Rails app on Heroku:
@query = 'fast'
PgSearch.multisearch(@query) #=>
[#<PgSearch::Document searchable: ferrari, :content: 'this car is really fast'>,
#<PgSearch::Document searchable: viper, :content: 'a very fast car'>]
I'd like to display these results with excerpts from content to show where the match occurs. I can call excerpt(content, @query) to get exactly what I want when @query is only one word, but excerpt() only handles exact matches, so if:
@query = 'car fast'
PgSearch.multisearch(@query) #=>
[#<PgSearch::Document searchable: ferrari, :content: 'this car is really fast'>,
#<PgSearch::Document searchable: viper, :content: 'a very fast car'>]
then excerpt(content, @query) is nil because nowhere in content does the exact phrase 'car fast' appear.
I considered excerpt(content, @query.split(' ').first) to at least show something for multi-word queries, but there are still cases such as this:
@query = 'car?'
@results = PgSearch.multisearch(@query) #=>
[#<PgSearch::Document searchable: ferrari, :content: 'this car is really fast'>,
#<PgSearch::Document searchable: viper, :content: 'a very fast car'>]
excerpt(@results.first.content, @query) #=> nil
So, how do folks show excerpts from search results when using pg_search?

I'm the author and maintainer of pg_search.
Right now there isn't a built-in way to get excerpts alongside your results in pg_search, but there easily could be if I or someone else finds the time to build it in.
PostgreSQL has a function ts_headline that you can call which returns a string excerpt as a column.
It might be possible to call something like this (I haven't tested it yet):
PgSearch.multisearch(@query).select(["ts_headline(pg_search_documents.content, plainto_tsquery(?)) AS excerpt", @query])
Then each of your results should have an excerpt method that returns something like what you want.
By the way, this is something that I eventually want to make automatic in pg_search. I just haven't had the time to delve too deeply into it yet.

FWIW, following nertzy's example above, I was able to get this to work with the following:
PgSearch.multisearch(@query).select("ts_headline(pg_search_documents.content, plainto_tsquery('english', ''' ' || unaccent('#{@query}') || ' ''' || ':*')) AS excerpt")
I was having trouble getting plainto_tsquery(?) to work, as it was throwing a syntax error. My solution above was simply the result of calling
PgSearch.multisearch(@query).select(["ts_headline(pg_search_documents.content, plainto_tsquery(?)) AS excerpt", @query]).to_sql
and then plugging the to_tsquery arguments into the new plainto_tsquery call, something I'm sure is not entirely sound, but it seems to work.

If you interpolate the string, you will be subject to SQL injection attacks.
Since .select won't accept a parameterized statement the way .where does (User.where("id = ?", params[:id])), you will need to quote the value explicitly.
# quote returns the value as an escaped SQL string literal, including the surrounding quotes
quoted = ActiveRecord::Base.connection.quote(params[:q])
@results = PgSearch.multisearch(params[:q])
  .select(["ts_headline(pg_search_documents.content, plainto_tsquery('english', ''' ' || #{quoted} || ' ''' || ':*')) AS excerpt"])

There's a much easier way if you don't feel like digging through SQL: you can use the pg_search gem's built-in highlighting to display excerpts in a simple and straightforward way.
In your controller:
@articles = Article.search(params[:search]).with_pg_search_highlight
In your view:
= raw(article.pg_search_highlight)
That should do it.
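If neither ts_headline nor with_pg_search_highlight is an option, the split-the-query idea from the question can be pushed a little further in plain Ruby. The sketch below is not pg_search or Rails functionality; excerpt_any and its radius parameter are hypothetical names. It tries each query word in turn and ignores punctuation, so 'car fast' and 'car?' both produce a snippet. Because it matches substrings rather than PostgreSQL's stemmed lexemes, it can miss rows that the database matched by stem.

```ruby
# Hypothetical helper: returns a snippet around the first query word found in
# content, or nil when none of the words occur. Matching is case-insensitive.
def excerpt_any(content, query, radius: 20)
  words = query.scan(/\w+/) # keeps word characters only, so "car?" becomes ["car"]
  words.each do |word|
    index = content.downcase.index(word.downcase)
    next unless index

    start  = [index - radius, 0].max
    finish = [index + word.length + radius, content.length].min
    prefix = start.zero? ? "" : "..."
    suffix = finish == content.length ? "" : "..."
    return "#{prefix}#{content[start...finish]}#{suffix}"
  end
  nil
end
```

In a view this could be called as excerpt_any(result.content, @query), falling back to a truncated content string when it returns nil.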

Related

Using column_names.include? to protect against SQL Injection?

I have a rails API that handles requests from my front end. These requests include query parameters in the url for refining and sorting results from the database. An example URL query looks like this:
http://localhost:8000/clients?_sort=name&_order=DESC&_start=0&_end=10
My index method in my controller grabs these params and uses them for filtering and sorting:
def index
  @all_clients = Client.all
  response.headers['X-Total-Count'] = @all_clients.count
  if (Client.column_names.include?(params[:_sort]))
    if (params[:_order] == 'ASC')
      @clients = Client.filtered(params[:_start].to_i, params[:_end].to_i).order("#{params[:_sort]} asc")
    else
      @clients = Client.filtered(params[:_start].to_i, params[:_end].to_i).order("#{params[:_sort]} desc")
    end
  end
  json_response(@clients || @all_clients)
end
the filtered method is a scope which looks like this:
scope :filtered, -> (_start, _end) { limit(_end-_start).offset(_start) }
My question is this: by using Client.column_names.include? to check if params[:_sort] is a valid attribute to sort by, am I effectively whitelisting against SQL Injection? If not, how could I alter this code to protect against SQL Injection?
The important thing to consider here is not "the whitelisting of params" (since you're already cherry-picking which parameters to use anyway, rather than blindly using the whole params hash for something), but rather how you are constructing the SQL.
There are two potential injection areas in the code:
limit(_end-_start)
Is this vulnerable? No. The controller calls .to_i on both parameters, so the scope only ever receives integers. Even without that conversion, anything besides an integer would just fail with an error message, such as:
NoMethodError: undefined method `-' for "DROP_TABLE":String
or
ArgumentError: invalid value for Integer(): 3.14159
order("#{params[:_sort]} desc")
Is this vulnerable? Yes. (But not easily.) This page gives a concrete example:
params[:_sort] = "(CASE SUBSTR(password, 1, 1) WHEN 's' THEN 0 else 1 END)"
You should never use direct string interpolation in SQL, unless you are absolutely 100% sure that the string is "safe". In this case, you could just write it as:
order(params[:_sort] => :asc)
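One way to combine the column check from the question with the hash form of order is sketched below in plain Ruby. The helper name safe_order_clause is hypothetical; in the controller, allowed_columns would be Client.column_names and the returned hash would be handed to order.

```ruby
# Hypothetical helper: maps the untrusted _sort/_order params to a hash that
# can be passed to ActiveRecord's order, or nil when the column is not allowed.
def safe_order_clause(sort, order, allowed_columns)
  # Only accept a column name that appears in the whitelist.
  return nil unless allowed_columns.include?(sort)

  # Only two directions exist; anything other than "ASC" (case-insensitive)
  # falls back to descending, as in the question's if/else.
  direction = order.to_s.casecmp?("ASC") ? :asc : :desc
  { sort => direction }
end
```

Usage would be along the lines of clause = safe_order_clause(params[:_sort], params[:_order], Client.column_names), followed by scope.order(clause) only when clause is not nil. No user-supplied text is ever interpolated into SQL this way.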

Exact Term Search In Rails

I'm trying to build a basic search where only the entire exact search term shows results. Currently it is showing results based on individual words.
Here's the code from the model:
def search
find(:all, :conditions => ['term' == "%#{search}%"])
end
Sorry in advance. I'm very new to rails!
Thank you.
Remove the % from "%#{search}%" so it's "#{search}".
% is a wildcard that matches every result containing the word. So "%tea%" for example would match tear, nestea, and steam, when that's not what you want.
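This should yield an exact match, provided the comparison is done in SQL with a bound value and the search term is passed in as an argument:
def self.search(search)
  find(:all, :conditions => ['term = ?', search])
end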
Your code doesn't work for several reasons.
You do not pass any value to that method. Therefore search will always be nil.
The ['term' == "%#{search}%"] condition doesn't make much sense because, as I said before, search is undefined and therefore the condition is the same as ['term' == "%%"]. The string 'term' is not equal to "%%", therefore the whole condition is basically [false].
Rails 5.0 uses a different syntax for queries. The syntax you used is very old and doesn't work anymore.
I would do something like this:
# in your model
scope :search, ->(q) {
  q.present? ? where("column_name LIKE :query", query: "%#{q}%") : none
}
# in your controller
def set_index
  @b = Best.search(params[:search]).order(:cached_weighted_score => :desc)
end
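Note that a LIKE pattern also treats any % or _ inside the search term itself as wildcards. Rails provides sanitize_sql_like for escaping them; the plain-Ruby sketch below only illustrates the idea, and escape_like is a hypothetical helper name. Without surrounding % the escaped term has to match the whole column value; with them it matches the literal phrase anywhere in the column.

```ruby
# Hypothetical stand-in for the idea behind Rails' sanitize_sql_like:
# prefix every LIKE metacharacter with the escape character.
def escape_like(term, escape_char = "\\")
  pattern = Regexp.union(escape_char, "%", "_")
  term.gsub(pattern) { |match| escape_char + match }
end

exact_pattern    = escape_like("100%_tea")         # must match the whole value
contains_pattern = "%#{escape_like("100%_tea")}%"  # literal phrase anywhere
```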

rails dynamic where sql query

I have an object with a bunch of attributes that represent searchable model attributes, and I would like to dynamically create an sql query using only the attributes that are set. I created the method below, but I believe it is susceptible to sql injection attacks. I did some research and read over the rails active record query interface guide, but it seems like the where condition always needs a statically defined string as the first parameter. I also tried to find a way to sanitize the sql string produced by my method, but it doesn't seem like there is a good way to do that either.
How can I do this better? Should I use a where condition or just somehow sanitize this sql string? Thanks.
def query_string
to_return = ""
self.instance_values.symbolize_keys.each do |attr_name, attr_value|
if defined?(attr_value) and !attr_value.blank?
to_return << "#{attr_name} LIKE '%#{attr_value}%' and "
end
end
to_return.chomp(" and ")
end
Your approach is a little off as you're trying to solve the wrong problem. You're trying to build a string to hand to ActiveRecord so that it can build a query when you should simply be trying to build a query.
When you say something like:
Model.where('a and b')
that's the same as saying:
Model.where('a').where('b')
and you can say:
Model.where('c like ?', pattern)
instead of:
Model.where("c like '#{pattern}'")
Combining those two ideas with your self.instance_values you could get something like:
def query
  self.instance_values.select { |_, v| v.present? }.inject(YourModel) do |q, (name, value)|
    q.where("#{name} like ?", "%#{value}%")
  end
end
or even:
def query
  empties = ->(_, v) { v.blank? }
  add_to_query = ->(q, (n, v)) { q.where("#{n} like ?", "%#{v}%") }
  instance_values.reject(&empties)
                 .inject(YourModel, &add_to_query)
end
Those assume that you've properly whitelisted all your instance variables. If you haven't then you should.
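The whitelisting mentioned above could look something like the following. It is a plain-Ruby sketch: SEARCHABLE_COLUMNS and like_conditions are hypothetical names, and the returned array uses the [sql, bind, bind, ...] form that where accepts as an array condition.

```ruby
# Hypothetical whitelist of columns that may appear in the query.
SEARCHABLE_COLUMNS = %w[name email city].freeze

# Builds ["name LIKE ? AND city LIKE ?", "%bob%", "%paris%"] from a hash of
# attribute values, skipping blank values and any key not in the whitelist.
# Returns nil when nothing is left to filter on.
def like_conditions(attrs, allowed: SEARCHABLE_COLUMNS)
  usable = attrs.select do |name, value|
    allowed.include?(name.to_s) && !value.to_s.strip.empty?
  end
  return nil if usable.empty?

  sql   = usable.keys.map { |name| "#{name} LIKE ?" }.join(" AND ")
  binds = usable.values.map { |value| "%#{value}%" }
  [sql, *binds]
end
```

Only whitelisted column names ever reach the SQL string, and every value travels as a bind parameter. The result could be used as conditions = like_conditions(instance_values), followed by YourModel.where(conditions) when conditions is not nil.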

Rails/Postgres query Hstore for presence

I have some HStore columns in my Rails app. Looking around the PostgreSQL documentation and some blog posts, I've discovered that, given an HStore value that is a hash like this:
{ some_key: 'value' }
I can query the columns like so:
Model.where("the_hstore_column -> 'some_key' = 'value'")
There are a bunch of issues with this as a sufficient querying tool, however:
It really only makes sense for super simple values. If the value is itself a hash or something, I have no idea how to search for it effectively, if I don't know all its contents. (Even if I did, I'd have to turn them all to stringified JSON or something).
It isn't helpful (at least, I can't make it helpful) for doing queries for the presence or non-presence of the column's content (or the content of any key under the column).
In other words (in pseudo-sql), I'd love to be able to do this:
Model.where("the_hstore_column = {}")
Model.where.not("the_hstore_column = {}")
Or:
Model.where("the_hstore_column -> 'some_key' = NULL")
Model.where.not("the_hstore_column -> 'some_key' = NULL")
Or best yet, given an HStore of value { some_key: { sub_key: 'value' } }:
Model.where("the_hstore_column -> 'some_key' INCLUDES (?)", 'sub_key')
Model.where.not("the_hstore_column -> 'some_key' INCLUDES (?)", 'sub_key')
These appear not to be working, but for the life of me, I can't find great information on how to conduct these queries. Does anyone know how to write them, or where I could look for better information?
UPDATE After more looking around, I found this post, which looked promising, but I can't get the code to actually work. For example Model.where("the_hstore_column ? :key", key: 'some_key') is returning an empty relation, even if there are many Model objects with some_key in the_hstore_column.
As requested by Andrius Buivydas, I'm pasting the relevant portion of my model below. It's relatively brief, because I decided to abstract out the Hstore-accessing into a module I wrote, essentially turning it into a hash store (which I thought, apparently incorrectly, was its whole purpose). I tried using the built-in store_accessor, but it didn't work to my liking (it didn't seem to help parse saved hashes, for now obvious-seeming reasons), thus the ActiveRecord::MetaExt::HstoreAccessor module below. (ParsedHstore just takes an HStore string and parses it into a hash).
class Place < ActiveRecord::Base
  include ActiveRecord::MetaExt::HstoreAccessor
  hstore_accessor :hours, :extra
end
(In a separate file):
module ActiveRecord
  module MetaExt
    module HstoreAccessor
      module ClassMethods
        def hstore_accessor(*symbols)
          symbols.each do |field|
            class_eval do
              define_method(field) do
                ParsedHstore.new(self[field.to_sym]).hash_value
              end
              define_method("add_to_#{field}!") do |arg|
                self[field.to_sym] = self[field.to_sym].merge(arg)
                send("#{field}_will_change!")
                save
                arg
              end
              define_method("add_to_#{field}") do |arg|
                self[field.to_sym] = self[field.to_sym].merge(arg)
                send("#{field}_will_change!")
                arg
              end
              define_method("remove_from_#{field}") do |arg|
                other = self[field].dup
                other.delete(arg.to_s); other.delete(arg.to_sym)
                self[field.to_sym] = other
                send("#{field}_will_change!")
                self[field.to_sym]
              end
              define_method("remove_from_#{field}!") do |arg|
                other = self[field].dup
                other.delete(arg.to_s); other.delete(arg.to_sym)
                self[field.to_sym] = other
                send("#{field}_will_change!")
                save
                self[field.to_sym]
              end
              define_method("set_#{field}") do |arg|
                self[field.to_sym] = arg
                send("#{field}_will_change!")
                self[field.to_sym]
              end
              define_method("set_#{field}!") do |arg|
                self[field.to_sym] = arg
                send("#{field}_will_change!")
                self[field.to_sym]
              end
            end
          end
        end
      end
      def self.included(base)
        base.extend ClassMethods
      end
    end
  end
end
The idea is that this lets me easily add/remove values to an HStore field, without having to think about the merging and _will_change! logic every time. So I could do this, for example: Place.first.add_to_extra({ key: 'value'}).
Should I have made these fields json or jsonb? I feel like I'm reinventing the wheel here, or, more aptly, trying to turn a horse into a wheel or something.
Also, I may be misunderstanding the query example. I literally tried this query in my database (which has many places with a non-empty ranking key under the extra field), and turned up with an empty relation:
Place.where("extra ? :key", key: 'ranking')
I wouldn't be surprised if I messed this syntax up, as it seems really strange. Wouldn't that replace the ? with 'ranking', turning the query into this?: Place.where("extra ranking :key")? Seems weird and emphatically different from any other SQL I've run. Or is it turning to Place.where("extra ? ranking")? But ? is usually for safe injection of variables, no?
Please let me know if something else in my model or elsewhere would be more relevant.
It really only makes sense for super simple values. If the value is itself a hash or something, I have no idea how to search for it effectively, if I don't know all its contents.
PostgreSQL hstore stores key/value pairs in which both the key and the value are strings. So you can't store a hash or any other object as a value: it will be converted to a string. (A value may be NULL, but a key may not.)
Model.where("the_hstore_column ? :key", key: 'some_key')
That should work if everything is defined correctly. Could you paste an extract of the model file with the definition of the hstore column values?
Another way to find empty hstore columns is
Model.where(hstore_column: ['',nil])
OR
Model.where("hstore_column='' OR hstore_column IS NULL")
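Since hstore only holds strings, nested data has to be flattened or serialized before it is written. One possible approach, hinted at in the question, is to store nested values as JSON strings. This is a plain-Ruby sketch with a hypothetical helper name, to_hstore_hash; it is not how Rails itself serializes hstore values, and for genuinely nested data a json or jsonb column may be a better fit.

```ruby
require "json"

# Hypothetical helper: produces a flat hash with string keys whose values are
# strings. Nested hashes and arrays are stored as JSON text; nil is left as
# nil so the caller can decide how to treat it.
def to_hstore_hash(hash)
  hash.each_with_object({}) do |(key, value), flat|
    flat[key.to_s] =
      case value
      when nil         then nil
      when Hash, Array then JSON.generate(value)
      else                  value.to_s
      end
  end
end

row    = to_hstore_hash({ some_key: { sub_key: "value" }, ranking: 3 })
nested = JSON.parse(row["some_key"]) # back to a Ruby hash when reading
```

The trade-off is that the database then sees some_key as an opaque string, so a query for sub_key inside it cannot use the hstore operators.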

Rails - Conditional Query, with ActiveRecord?

Given a query like:
current_user.conversations.where("params[:projectid] = ?", projectid).limit(10).find(:all)
params[:projectid] is being sent from jQuery ajax. Sometimes that is an integer and the above works fine. But if the user selects "All Projects", that's a value of '', which Rails turns into 0, which yields an invalid query.
How, with Rails, do you say "search on params[:projectid] = ? only if it is defined"?
Thanks
I think you may have mistyped the query a bit. "params[:projectid] = ?" shouldn't be a valid query condition under any circumstances.
In any case, you could do some sort of conditional statement:
if params[:project_id].blank?
  @conversations = current_user.conversations.limit(10)
else
  @conversations = current_user.conversations.where("project_id = ?", params[:project_id]).limit(10)
end
Although, I'd probably prefer something like this:
@conversations = current_user.conversations.limit(10)
@conversations = @conversations.where("project_id = ?", params[:project_id]) unless params[:project_id].blank?
Sidenotes:
You don't have to use .find(:all). Rails will automatically execute the query when the resultset is required (such as when you do @conversations.each).
Wherever possible, try to adhere to Rails' snakecasing naming scheme (eg. project_id as opposed to projectid). You'll save yourself and collaborators a lot of headaches in the long run.
Thanks, but if the where query has, let's say, 3 params (project_id, project_status, ...), then the unless idea won't work. I'm shocked that Rails doesn't have a better way to handle conditional query params.
EDIT: If you have multiple params that could be a part of the query, consider the fact that where takes a hash as its argument. With that, you can easily build a parameter hash dynamically, and pass it to where. Something like this, maybe:
conditions = [:project_id, :project_status, :something_else].inject({}) do |hsh, field|
  hsh[field] = params[field] unless params[field].blank?
  hsh
end
@conversations = current_user.conversations.where(conditions).limit(10)
In the above case, you'd loop over all fields in the array, and add each one of them to the resulting hash unless it's blank. Then, you pass the hash to the where function, and everything's fine and dandy.
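The same idea as a small standalone sketch, using each_with_object and a plain hash in place of params so that it runs outside Rails. The helper name filter_conditions is hypothetical, and blank? is replaced by a string check because it comes from ActiveSupport.

```ruby
# Hypothetical helper: keeps only the listed fields that have a non-blank
# value, producing a hash that can be passed to where.
def filter_conditions(params, fields)
  fields.each_with_object({}) do |field, conditions|
    value = params[field]
    conditions[field] = value unless value.to_s.strip.empty?
  end
end

params     = { project_id: "42", project_status: "", other: "ignored" }
conditions = filter_conditions(params, [:project_id, :project_status, :something_else])
```

Here conditions ends up containing only project_id. When every field is blank the hash is empty, and where with an empty hash adds no conditions, so the "All Projects" case needs no special branch.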
I don't understand why you wrote:
where("params[:projectid] = ?", projectid)
If you receive params[:projectid] from the ajax request, shouldn't the condition be:
where("projectid = ?", params[:projectid])
instead?
And if you are receiving an empty string ('') as the parameter, you can always test for it:
unless params[:projectid].blank?
I'm not sure I understood your question, but I hope this helps.
