I have been building an app lately. The functionality is nothing fancy: I have to connect to a client's SOAP web service, fetch some data, save it into my Postgres database, and build a search feature on top of that data.
The search has to run over two tables that together hold about 80K rows. It needs to look for every word of the input text in several fields of these two tables, which have a classic one-to-many association.
Before getting my hands dirty I looked at the options I had (ransack, searchkick, scoped_search, etc.), but I ended up trying plain Active Record first. I was surprised to find that I could build the feature far more easily than I expected and with an acceptable response time: about 400ms of Active Record time for the most expensive queries locally.
The problem is that the app performs much worse on Heroku than locally (I'm developing in a Vagrant box, by the way). On average, queries take 2-3 times longer than they do locally, so the user experience goes from acceptable to poor. I was wondering if someone could help me improve my query. I'm also worried that the background job that fetches the data is much less performant than it is locally, and about some memory issues, but that's a different story.
The relevant snippets are these:
part_master.rb where the search method is implemented:
class PartMaster < ApplicationRecord
  has_many :part_variants, foreign_key: 'sap_cod', primary_key: 'sap_cod'
  has_many :locations, foreign_key: 'sap_cod', primary_key: 'sap_cod'

  scope :con_stock, -> { where("stock > 0") }
  scope :planta, ->(planta) { where(planta_cod: planta) }

  def self.search(params)
    recordset = PartMaster.joins(:part_variants).all
    recordset = recordset.con_stock if params[:stock].present?
    recordset = recordset.planta(params[:planta]) if params[:planta].present?
    search_keywords(params[:search], recordset)
  end

  # Narrows the relation once per keyword, so every keyword must match.
  def self.search_keywords(query, recordset)
    keywords = query.to_s.strip.split
    keywords.each do |keyword|
      recordset = recordset.where('part_masters.sap_cod ILIKE :q OR
        unaccent(descripcion_maestro) ILIKE unaccent(:q)
        OR fabricante ILIKE :q OR ref_fabricante ILIKE :q
        OR fabricante_prov ILIKE :q OR ref_prov ILIKE :q',
        q: "%#{keyword}%")
    end
    recordset.distinct.order(:sap_cod)
  end
  # `private` has no effect on `def self.` methods, so hide it explicitly.
  private_class_method :search_keywords
end
And this is the call to the method from the controller:
def index
  parts = params[:search].present? ? PartMaster.search(params) : PartMaster.none
  @parts = parts.page(params[:page]).per(50)
end
I have an index on every searchable field.
EDIT: In the end I tried a mix of the proposals in the answers. I created one field in each table that concatenates the fields relevant to the search, so there are now 2 OR conditions instead of 5, and I also put trigram GIN indexes on both new fields. I haven't seen any improvement, though: the ActiveRecord times are very similar, perhaps marginally better.
The thing is, the EXPLAIN output for the query doesn't show any of the indexes being used.
Hash Join  (cost=2243.29..6067.41 rows=2697 width=132)
  Hash Cond: ((part_variants.sap_cod)::text = (part_masters.sap_cod)::text)
  Join Filter: ((part_masters.combinada_maestro ~~* '%rodamiento%'::text) OR (part_variants.combinada_info ~~* '%rodamiento%'::text))
  ->  Seq Scan on part_variants  (cost=0.00..1264.96 rows=54896 width=18)
  ->  Hash  (cost=1128.13..1128.13 rows=34813 width=132)
        ->  Seq Scan on part_masters  (cost=0.00..1128.13 rows=34813 width=132)
(6 rows)
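One possible reading of this plan, offered as a guess rather than a diagnosis: the OR compares a column from part_masters with a column from part_variants, so Postgres can only evaluate it as a Join Filter after joining the two tables, and neither trigram index can drive a scan. A minimal sketch of one way around that is to give each table its own ILIKE and combine the matching sap_cod values with UNION. The helper name split_or_search is made up for illustration; the table and column names are taken from the EXPLAIN output above.

```ruby
# Builds [sql, binds]: one IN (...) condition per keyword, AND'ed together.
# Each ILIKE touches a single table, so each one has a chance to use that
# table's own trigram index.
def split_or_search(keywords)
  raise ArgumentError, "no keywords given" if keywords.empty?

  binds = {}
  conditions = keywords.each_with_index.map do |keyword, i|
    key = :"q#{i}"
    binds[key] = "%#{keyword}%"
    <<~SQL.strip
      part_masters.sap_cod IN (
        SELECT sap_cod FROM part_masters WHERE combinada_maestro ILIKE :#{key}
        UNION
        SELECT sap_cod FROM part_variants WHERE combinada_info ILIKE :#{key}
      )
    SQL
  end
  [conditions.join(" AND "), binds]
end

sql, binds = split_or_search(%w[rodamiento skf])
puts sql
p binds
```

The result could then be used as PartMaster.where(sql, binds).order(:sap_cod). The semantics differ slightly from the joined query: part masters without any variant are no longer excluded, and with several keywords each one may match a different variant, so it is worth checking that this is acceptable before comparing timings.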
Suggestion: to improve the AR query speed, use a direct PostgreSQL query in your model.
Example for use in your keywords loop:
query = "SELECT * FROM part_masters WHERE......"
PartMaster.connection.execute(query, :skip_logging)
I agree with spikermann. Also, the multiple ORs in a loop are not helping either.
If you only want to work on a vanilla solution rather than adding SOLR or another engine, you could have one separate field that holds copies of the strings you would like to search (e.g. name, description, ...). Then search this field only. You will need some method to keep the field updated when the name, description or other values change.
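A minimal sketch of such an updating method, written in plain Ruby so it can be tried outside Rails. The helper name combined_search_text and the field list are assumptions for illustration; the real list depends on which columns live in which table.

```ruby
# Hypothetical list of columns to fold into the single search field.
SEARCH_FIELDS = %i[sap_cod descripcion_maestro fabricante ref_fabricante].freeze

# Joins the non-blank values of the given fields into one lowercase string.
def combined_search_text(attributes, fields = SEARCH_FIELDS)
  fields.map { |field| attributes[field].to_s.strip }
        .reject(&:empty?)
        .join(" ")
        .downcase
end

part = { sap_cod: "A-100", descripcion_maestro: "Rodamiento de bolas",
         fabricante: "SKF", ref_fabricante: nil }
puts combined_search_text(part)
```

In a model this could run from a before_save callback that assigns the result to the combined column, so the copy is refreshed every time the record is saved.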
In our application, the Recipe model has many ingredients (many-to-many relationship implemented using :through). There is a query to return all the recipes where at least one ingredient from the list is contained (using ILIKE or SIMILAR TO clause). I would like to pose two questions:
What is the cleanest way to write a query that returns this in Rails 6 with ActiveRecord? Here is what we ended up with:
ingredients_clause = '%(' + params[:ingredients].map { |i| i.downcase }.join("|") + ')%'
recipes = recipes.where("LOWER(ingredients.name) SIMILAR TO ?", ingredients_clause)
Note that recipes is already created before this point.
However, this is a somewhat dirty solution.
I also tried to use ILIKE ANY(ARRAY['ing1', 'ing2', ..]) with the following:
ingredients_clause = params[:ingredients].map { |i| "'%#{i}%'" }.join(", ")
recipes = recipes.where("ingredients.name ILIKE ANY(ARRAY[?])", ingredients_clause)
This won't work since ? automatically adds single quotes so it would be
ILIKE ANY (ARRAY[''ing1', 'ing2', 'ing3'']) which is of course wrong.
Here, ? is used to sanitise the parameters of the SQL query and so avoid possible SQL injection attacks. That is why I don't want to write a plain query built from params.
Is there any better way to do this?
What is the best approach to order the results by the number of ingredients that are matched? For example, if I search for all recipes that contain ingredients ing1 and ing2, it should return those that contain both before those that contain only one of them.
Thanks in advance
For #1, a possible solution would be something like the following (assuming the ingredients table is already joined):
recipes = recipes.where(Ingredient.arel_table[:name].lower.matches_any(params[:ingredients]))
You can find more discussion on this kind of topic here: Case-insensitive search in Rails model
You can access a lot of great SQL query features via #arel_table.
#2 If we assume all the where clauses are already applied to recipes:
recipes = recipes
  .group("recipes.id")
  # Lets Rails know you meant to put a raw SQL expression here
  .order(Arel.sql("count(*) DESC"))
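On the ILIKE ANY attempt from the question: as far as I know, when ? is bound to a Ruby array, ActiveRecord quotes each element separately and joins them with commas, so the array of patterns can be passed as is instead of a pre-joined string. A small sketch of building those patterns, with LIKE wildcards in the user input escaped; contains_patterns is a hypothetical helper name.

```ruby
# Wraps each term in %...% for substring matching. Backslash, % and _ in the
# input are escaped so they are matched literally instead of as wildcards.
def contains_patterns(terms)
  terms.map do |term|
    escaped = term.to_s.gsub(/[\\%_]/) { |char| "\\#{char}" }
    "%#{escaped}%"
  end
end

p contains_patterns(["salt", "50%_off"])
```

The call would then look like recipes.where("ingredients.name ILIKE ANY (ARRAY[?])", contains_patterns(params[:ingredients])). An empty list should be handled before this call, since it would not produce a usable array literal.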
In a rails 4 app, in one model I have a column containing multiple ids as a string with comma separated values.
"123,4568,12"
I have a "search" engine that I use to retrieve the records matching one or many values. Using PostgreSQL full-text search I can do something like this, which is very useful:
records = MyModel.where("my_models.col_name @@ ?", ["12","234"])
This returns all the records that have both 12 and 234 in the targeted column. The array comes from a form with a multiple select.
Now I'm trying to write a query that finds all the records that have either 12 or 234 in their string.
I was hoping to be able to do something like:
records = MyModel.where("my_models.col_name IN (?)", ["12","234"])
But it's not working.
Should I iterate through all the values in the array to build a query with multiple ORs? Is there a more appropriate way to do this?
EDIT / TL;DR
@BoraMa's answer is a good way to achieve this.
To find all the records containing one or more of the ids referenced in the request, use:
records = MyModel.where("my_models.col_name @@ to_tsquery(?)", ["12","234"].join('|'))
You need to_tsquery(?) and a join with a single pipe | to get an OR-like query.
To find all the records containing all the ids in the query, use:
records = MyModel.where("my_models.col_name @@ ?", ["12","234"])
And of course replace ["12","234"] with something like params[:params_from_my_form]
Postgres documentation for full text search
If you already use full-text search in Postgres, I'd try to leverage it again. I think you can use a full-text OR query, which can be constructed like this:
records = MyModel.where("my_models.col_name @@ to_tsquery(?)", ["12","234"].join(" | "))
This uses the | operator for ORing full-text queries in Postgres. I have not tested this, and you may need to wrap the column as to_tsvector(my_models.col_name) for it to work.
See the documentation for more info.
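Since to_tsquery parses operators inside its argument, raw form values containing characters such as | or & would change the query or raise a syntax error. A sketch of building the OR expression from digits-only ids; or_tsquery is a hypothetical helper, and silently dropping non-numeric values is an assumption about what the form should accept.

```ruby
# Keeps only purely numeric ids, removes duplicates, and joins them with the
# tsquery OR operator.
def or_tsquery(ids)
  ids.map { |id| id.to_s.strip }
     .grep(/\A\d+\z/)
     .uniq
     .join(" | ")
end

puts or_tsquery(["12", " 234 ", "12", "5|6"])
```

The result would be passed as the bound value for to_tsquery(?) in the where call above. If it comes back empty, it is probably better to skip the condition than to send an empty query.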
Suppose your ids are:
a = "1,2,3,4"
You can simply use:
ModelName.find(a.split(","))
This will give you all the records of that model whose id is present in a.
I just thought of a super simple solution: sort the ids in a save callback of MyModel, and then the query becomes easier:
class MyModel < ActiveRecord::Base
  before_save :sort_ids_in_col_name, if: :col_name_changed?

  private

  def sort_ids_in_col_name
    self.col_name = self.col_name.to_s.split(',').sort.join(',')
  end
end
Then the query will be easy:
ids = ["12","234"]
records = MyModel.where(col_name: ids.sort.join(','))
I have two models, User and Order.
The User model has_many orders.
I am creating an index in Thinking Sphinx like this:
has association(:created_at), as: :order_time, type: :timestamp
Now I want to search for users who have created any order in some time range, using the above index as:
User.search with: {:order_time => t1..t2}
But this is not giving accurate results. Any idea what I am doing wrong here?
I also tried writing an SQL query, something like:
user_order_time = <<-SQL
SELECT orders.created_at
FROM orders
WHERE (orders.creator_id = users.id)
SQL
and added the index this way:
has "#{user_order_time}", as: :order_time, type: :timestamp
and tried to use this index, but even this isn't working.
Can anyone tell me the problem with each approach?
Firstly, this answer is written presuming you're using SQL-backed indices (using the :with => :active_record option in your index definition) rather than real-time indices, and you're using Thinking Sphinx v3.
To cover your second approach first:
user_order_time = <<-SQL
  SELECT orders.created_at
  FROM orders
  WHERE (orders.creator_id = users.id)
SQL
has "#{user_order_time}", as: :order_time, type: :timestamp
This will not work. You can refer to SQL snippets in attributes and fields, but only the sections that go in the SELECT clause. You cannot use full SQL queries.
However, with this approach, you're on the right track:
has association(:created_at), as: :order_time, type: :timestamp
Are you using association there just when writing this question, not in your actual code? Because it should be something like this:
has orders.created_at, as: :order_time
I've not specified the type - Thinking Sphinx will automatically detect this from the database.
If that doesn't work, it's worth looking at the generated SQL query in the Sphinx configuration file for clues as to why it's not returning the values you're expecting (locally, that's config/development.sphinx.conf, and you're looking for the sql_query setting in source user_core_0).
I have the following methods in a model named CashTransaction.
def is_refundable?
  self.amount > self.total_refunded_amount
end
def total_refunded_amount
  self.refunds.sum(:amount)
end
Now I need to extract all the records that satisfy the above method, i.e. records for which it returns true.
I got that working by using following statement:
CashTransaction.all.map { |x| x if x.is_refundable? }
But the result is an Array. I am looking for an ActiveRecord_Relation object, as I need to perform a join on the result.
I feel I am missing something here, as it doesn't look that difficult. Anyway, it got me stuck. Constructive suggestions would be great.
Note: only amount is a CashTransaction column.
EDIT
The following SQL does the job. If I can translate it to the ORM, that will do the job too.
SELECT `cash_transactions`.* FROM `cash_transactions` INNER JOIN `refunds` ON `refunds`.`cash_transaction_id` = `cash_transactions`.`id` WHERE (cash_transactions.amount > (SELECT SUM(`amount`) FROM `refunds` WHERE refunds.cash_transaction_id = cash_transactions.id GROUP BY `cash_transaction_id`));
Sharing Progress
I managed to get it working with the following ORM query:
CashTransaction
  .joins(:refunds)
  .group('cash_transactions.id')
  .having('cash_transactions.amount > sum(refunds.amount)')
But what I was actually looking for was something like:
CashTransaction.joins(:refunds).where(is_refundable? : true)
with is_refundable? being a model method. Initially I thought declaring is_refundable? with attr_accessor would work, but I was wrong.
Just a thought: can the problem be solved elegantly using Arel?
There are two options.
1) Finish what you have started (which is extremely inefficient for larger amounts of data, since everything is loaded into memory before processing):
CashTransaction.all.select(&:is_refundable?) # like what you've written, but shorter and without nil entries
So get the ids:
ids = CashTransaction.all.select(&:is_refundable?).map(&:id)
And now, to get an ActiveRecord relation:
CashTransaction.where(id: ids) # will return a relation
2) Move the calculation to SQL:
CashTransaction.where('amount > total_refunded_amount')
The second option is faster and more efficient in every way.
When you deal with a database, try to do the processing at the database level, with as little Ruby involvement as possible.
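For option 1, it may help to see how map and select differ on a small in-memory example. Txn is a stand-in Struct made up for this illustration, not the real model.

```ruby
# Stand-in for a record with an id, an amount and an already-summed refund total.
Txn = Struct.new(:id, :amount, :refunded) do
  def is_refundable?
    amount > refunded
  end
end

txns = [Txn.new(1, 100, 20), Txn.new(2, 50, 50), Txn.new(3, 80, 0)]

mapped = txns.map { |t| t if t.is_refundable? } # leaves a nil for each non-match
flags  = txns.map(&:is_refundable?)             # only booleans, the records are lost
ids    = txns.select(&:is_refundable?).map(&:id) # ids of the matching records only

p mapped.count(&:nil?)
p flags
p ids
```

Only the select version yields a list that can be fed straight into where(id: ids).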
EDIT
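According to the edited question, here is how you would achieve the desired result (the aggregate has to go in HAVING, since SQL does not allow SUM in a WHERE clause):
CashTransaction.joins(:refunds).group('cash_transactions.id').having('cash_transactions.amount > SUM(refunds.amount)')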
EDIT #2
As to the updates in your question: I don't really understand why you are set on using is_refundable? as an instance method inside a query, which is basically not possible in AR, but..
My suggestion is to create a scope is_refundable:
scope :is_refundable, -> {
  CashTransaction
    .joins(:refunds)
    .group('cash_transactions.id')
    .having('cash_transactions.amount > sum(refunds.amount)')
}
Now it is available with a notation as short as
CashTransaction.is_refundable
which is shorter and clearer than the intended
CashTransaction.where('is_refundable = ?', true)
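One thing worth checking with the join-based versions: an inner join on refunds only returns transactions that have at least one refund, while the original is_refundable? method also returns true for a transaction with no refunds, because its refunded total is 0. A plain-Ruby model of the intended rule, using made-up hashes in place of records:

```ruby
# Returns the ids of transactions whose amount exceeds their refunded total.
# Transactions without refunds count as having refunded 0.
def refundable_ids(transactions, refunds)
  refunded = Hash.new(0)
  refunds.each { |r| refunded[r[:cash_transaction_id]] += r[:amount] }
  transactions.select { |t| t[:amount] > refunded[t[:id]] }.map { |t| t[:id] }
end

transactions = [{ id: 1, amount: 100 }, { id: 2, amount: 50 }, { id: 3, amount: 80 }]
refunds = [
  { cash_transaction_id: 1, amount: 30 },
  { cash_transaction_id: 1, amount: 20 },
  { cash_transaction_id: 2, amount: 50 }
]

p refundable_ids(transactions, refunds)
```

Transaction 3 has no refunds and is still reported here, whereas an inner join would drop it. In SQL the closest equivalent is probably a LEFT JOIN with COALESCE(SUM(refunds.amount), 0) in the HAVING clause; that is an assumption and has not been run against the asker's schema.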
You can do it this way:
cash_transactions = CashTransaction.all.select { |x| x.is_refundable? } # Array
CashTransaction.where(id: cash_transactions.map(&:id)) # ActiveRecord_Relation
But this is an inefficient way of doing it, as the other answerers also mentioned.
If amount and total_refunded_amount are both columns of the cash_transactions table, you can do it in SQL, which will be much more efficient:
CashTransaction.where('amount > total_refunded_amount')
But if amount or total_refunded_amount is not an actual column in the database, you can't do it this way. Then I guess you have to do it the other way, which is less efficient than using raw SQL.
I think you should pre-compute the is_refundable result (in a new column) whenever a CashTransaction or its refunds (presumably a has_many?) are updated, using callbacks:
class CashTransaction
  before_save :update_is_refundable
  def update_is_refundable
    # self. is required here; a bare assignment would only set a local variable
    self.is_refundable = amount > total_refunded_amount
  end
  def total_refunded_amount
    self.refunds.sum(:amount)
  end
end
class Refund
  belongs_to :cash_transaction
  after_save :update_cash_transaction_is_refundable
  def update_cash_transaction_is_refundable
    cash_transaction.update_is_refundable
    cash_transaction.save!
  end
end
Note: the above code certainly needs to be optimized to avoid some queries.
Then you can query the is_refundable column:
CashTransaction.where(is_refundable: true)
I think it's fine to do this in two queries instead of a join, something like this:
def refundable
where('amount < ?', total_refunded_amount)
end
This will do a single sum query and then use the sum in the second query. When the tables grow larger, you might find that this is faster than doing a join in the database.
I'm experimenting with a few concepts (actually playing and learning by building a RoR version of the 1978 database WHATSIT?).
It is basically a has_many :through structure with Subject -> Tags <- Value. I've tried to replicate a little of the command-line feel by using a query text field to enter the commands, basically things like: What's steve's phone.
Anyhow, with that interface most of the searches use ILIKE. I thought about enhancing it by allowing OR conditions using some form of array, something like What's steve's [son,daughter]. I got it working by building the ILIKE clause directly, but not with string replacement.
def bracket_to_ilike(arrel,name,bracket)
  bracket_array = bracket.match(/\[([^\]]+)\]/)[1].split(',')
  like_clause = bracket_array.map {|i| "#{name} ILiKE '#{i}' "}.join(" OR ")
  arrel.where(like_clause)
end
bracket_to_ilike(tags,'tags.name','[son,daughter]') produces the clause tags.name ILiKE 'son' OR tags.name ILiKE 'daughter'
And it gets the relations, but with all the talk about using the form ("tags.name ILiKE ? OR tags.name ILiKE ? ",v1,v2,vN..), I thought I'd ask if anyone has any ideas on how to do that.
Creating variables on the fly is doable from what I've read, but frowned upon. I just wondered if anyone has tried creating a method that adds a where clause with a variable number of parameters. I tried sending the where clause to the relation, but it didn't like that.
Steve
Couple of things to watch out for in your code...
What will happen when one of the elements of bracket_array contains a single quote?
What will happen if I take it a step further and set an element to, say, "'; drop tables..."?
My first stab at refactoring your code would be to see if Arel can do it. Or Squeel, or whatever they call the "metawhere" gem these days. My second stab would be something like this:
arrel.where( [ bracket_array.size.times.map{"#{name} ILIKE ?"}.join(' OR '), *bracket_array ])
I didn't test it, but the idea is to use the size of bracket_array to generate a string of OR'd conditions, then use the splat operator to pass in all the values.
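A sketch of that idea as a standalone helper. The values go through ? placeholders, but the column name is still interpolated into the SQL string, so this version checks it against a whitelist. ilike_any and ALLOWED_COLUMNS are names invented for this example.

```ruby
# Hypothetical whitelist of columns that may appear in the generated SQL.
ALLOWED_COLUMNS = %w[tags.name subjects.name].freeze

# Returns a conditions array [sql, *values] suitable for passing to where.
def ilike_any(column, values)
  raise ArgumentError, "unknown column: #{column}" unless ALLOWED_COLUMNS.include?(column)
  raise ArgumentError, "no values given" if values.empty?

  sql = Array.new(values.size, "#{column} ILIKE ?").join(" OR ")
  [sql, *values]
end

p ilike_any("tags.name", %w[son daughter])
```

With the relation from the question this would be used as arrel.where(ilike_any('tags.name', bracket_array)), and a value such as "'; drop tables..." stays a bound value instead of becoming part of the SQL text.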
Thanks to Phillip for pointing me in the right direction.
I didn't know you could pass an array to a where clause; that opened up some options.
I had used the splat operator a few times, but it hadn't hit me that it actually creates an object (variable).
The [son,daughter] stuff was just a console exercise to see what I could do, but I wasn't sure what I was going to do with it. I ended up taking the model association and the array creation out of the picture and implementing OR searches.
def array_to_ilike(col_name,keys)
  ilike = [keys.map {|i| "#{col_name} ILiKE ? "}.join(" OR "), *keys ]
  #ilike = [keys.size.times.map{"#{col_name} ILIKE ?"}.join(' OR '), *keys ]
  #both work, guess it's just what you are used to.
end
I then allowed a pipe(|) character in my subject,tag,values searches, so a WHATSIT style question
What's Steve's Phone Home|Work => displays home and work phone
steve phone home|work The 's stuff is just for show
steve son|daughter => displays children
phone james%|lori% => displays phone number for anyone whose name starts with james or lori
james%|lori% => dumps all information on anyone whose name starts with james or lori
The query then parses the command and if it encounters a | in any of the words, it will do things like:
t_ilike = array_to_ilike('tags.name',name.split("|"))
# or I actually stored it off on the initial parse
t_ilike = @tuple[:tag][:ilike] ||= ['tags.name ilike ?',tag]
Again this is just a learning exercise in creating a non-CRUD class to deal with the parsing and searching.
Steve