Rails 4 with Postgres hstore, can you query keys with wildcards?

I want to be able to query keys in hstore with wildcards.
For example, I have a preferences model that has an hstore column called 'skills'.
An example of skills might be
{'Ruby' => {'checked' => true } }
Now I want to query this like so
Preference.where("skills LIKE :key", key: "%ruby%")
{"Angular.js"=>"{\"checked\"=>true}"}
SELECT user_id FROM preferences WHERE EXISTS( SELECT 1 FROM skeys(skills) AS k WHERE k LIKE '%angular%');
user_id
---------
(0 rows)
However,
SELECT user_id FROM preferences WHERE EXISTS( SELECT 1 FROM skeys(skills) AS k WHERE k LIKE '%a%');
user_id
---------
1
(1 row)

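As Craig pointed out in the comments, this is possible but not efficient, since skeys() has to be evaluated for every row. Here is an example query:
SELECT * FROM some_hstore WHERE EXISTS( SELECT 1 FROM skeys(blah) AS k WHERE k ILIKE '%a%');
Note that LIKE is case-sensitive, which is why '%angular%' above does not match the key Angular.js. ILIKE (or the case-insensitive regular-expression operator ~*) ignores case. The pattern-matching operators are described in the Postgres documentation: http://www.postgresql.org/docs/current/static/functions-matching.html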
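To make the case-sensitivity point concrete, here is a minimal plain-Ruby sketch of how LIKE and ILIKE treat hstore keys. It does not touch the database; the helper names like_to_regexp and any_key_like? are made up for illustration, and backslash escapes in LIKE patterns are deliberately ignored.

```ruby
# Convert a SQL LIKE pattern into an anchored Ruby Regexp.
# % matches any run of characters, _ matches exactly one character.
# Assumption: backslash escapes in the pattern are not handled.
def like_to_regexp(pattern, case_insensitive: false)
  body = pattern.each_char.map do |ch|
    case ch
    when "%" then ".*"
    when "_" then "."
    else Regexp.escape(ch)
    end
  end.join
  flags = Regexp::MULTILINE
  flags |= Regexp::IGNORECASE if case_insensitive
  Regexp.new("\\A#{body}\\z", flags)
end

# True if any key of the hash matches the pattern, which is what the
# EXISTS (SELECT 1 FROM skeys(...) ...) subquery checks for each row.
def any_key_like?(hash, pattern, case_insensitive: false)
  re = like_to_regexp(pattern, case_insensitive: case_insensitive)
  hash.keys.any? { |key| re.match?(key) }
end

skills = { "Angular.js" => '{"checked"=>true}' }

p any_key_like?(skills, "%angular%")                         # LIKE: capital A does not match
p any_key_like?(skills, "%a%")                               # LIKE: the lowercase a in "Angular" matches
p any_key_like?(skills, "%angular%", case_insensitive: true) # ILIKE
```

This mirrors the two result sets shown in the question: the first pattern finds nothing, the second finds the row.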

Related

Find n most referenced records by foreign_key in related table

I have a table skills and a table programs_skills, which references skill_id as a foreign key. I want to retrieve the 10 most frequent skills in programs_skills, that is, count the occurrences of each skill_id and order them in descending order.
I wrote this in my skill model:
def self.most_used(limit)
Skill.find(
ActiveRecord::Base.connection.execute(
'SELECT programs_skills.skill_id, count(*) FROM programs_skills GROUP BY skill_id ORDER BY count DESC'
).to_a.first(limit).map { |record| record['skill_id'] }
)
end
This is working but I would like to find a way to perform this query in a more elegant, performant, "activerecord like" way.
Could you help me rewrite this query?
Just replace your query with:
WITH
T AS
(
SELECT skill_id, COUNT(*) AS NB, RANK() OVER(ORDER BY COUNT(*) DESC) AS RNK
FROM programs_skills
GROUP BY skill_id
)
SELECT skill_id, NB
FROM T
WHERE RNK <= 10
This uses a CTE and a window function. Because RANK() gives tied counts the same rank, it can return more than 10 rows.
ProgramsSkills.select("skill_id, COUNT(*) AS nb_skills")
  .group(:skill_id).order("nb_skills DESC").limit(limit)
  .map(&:skill_id)
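The two answers differ in how they treat ties. The following plain-Ruby sketch works on an in-memory list of skill_id values (hypothetical data, no database) and shows what LIMIT and RANK() each keep; the method names are invented for this illustration.

```ruby
# Equivalent of GROUP BY skill_id ORDER BY COUNT(*) DESC LIMIT n.
# Ties are broken by skill_id here; in SQL the order of ties is unspecified.
def top_by_limit(skill_ids, limit)
  skill_ids.tally
           .sort_by { |id, count| [-count, id] }
           .first(limit)
           .map(&:first)
end

# Equivalent of RANK() OVER (ORDER BY COUNT(*) DESC) with WHERE rnk <= n.
# A row's rank is 1 plus the number of rows with a strictly larger count.
def top_by_rank(skill_ids, max_rank)
  counts = skill_ids.tally
  ranked = counts.select do |_id, count|
    rank = counts.values.count { |other| other > count } + 1
    rank <= max_rank
  end
  ranked.sort_by { |id, count| [-count, id] }.map(&:first)
end

skill_ids = [1, 1, 1, 2, 2, 3, 3, 4]

p top_by_limit(skill_ids, 2) # => [1, 2]
p top_by_rank(skill_ids, 2)  # => [1, 2, 3] because skills 2 and 3 tie for rank 2
```

Which behaviour is wanted depends on whether "the 10 most used skills" should ever exceed 10 entries.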

Re-write a query to avoid PG::GroupingError: ERROR: in the GROUP BY clause or be used in an aggregate function

I tried many alternatives before posting this question.
I have a query on a table A with columns: id, num, user_id.
id is PK, user_id can be duplicate.
I need all the rows such that, for each unique user_id, only the row with the highest num value is chosen. For this, I came up with the SQL below, which works in an Oracle database. I am on the Ruby on Rails platform with a Postgres database.
select stats.* from stats as A
where A.num > (
select B.num
from stats as B
where A.user_id == B.user_id
group by B.user_id
having B.num> min(B.num) )
I tried writing this query via active record method but still ran into
PG::GroupingError: ERROR: column "b.num" must appear in the GROUP BY
clause or be used in an aggregate function
Stat.where("stats.num > ( select B.nums from stats as B where stats.user_id = B.user_id group by B.user_id having B.num < max(B.num) )")
Can someone tell me alternative way of writing this query
The SELECT clause of your subquery in Rails doesn't match that of your example (B.nums instead of B.num). More importantly, once you GROUP BY B.user_id, every other column referenced in the SELECT or HAVING clause has to be wrapped in an aggregate function; the bare B.num is what triggers PG::GroupingError. Aggregating inside the subquery avoids the error and also guarantees that the subquery returns a single row:
Stat.where("stats.num = ( select max(B.num) from stats as B where stats.user_id = B.user_id )")
This keeps, for each user_id, the row (or rows, if there is a tie) holding the highest num.
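The stated goal, keeping for each user_id the row with the highest num, can be sketched in plain Ruby over in-memory rows. This is only an illustration of the intended result, not the query itself; the Row struct and the sample data are assumptions.

```ruby
# Hypothetical in-memory stand-in for rows of the stats table.
Row = Struct.new(:id, :num, :user_id)

# For every user_id keep the row(s) whose num equals that user's maximum.
# Ties are kept, so a user can contribute more than one row.
def top_rows_per_user(rows)
  rows.group_by(&:user_id).flat_map do |_user_id, group|
    best = group.map(&:num).max
    group.select { |row| row.num == best }
  end
end

rows = [
  Row.new(1, 5, 10),
  Row.new(2, 9, 10),
  Row.new(3, 4, 20),
  Row.new(4, 4, 20)
]

p top_rows_per_user(rows).map(&:id) # => [2, 3, 4]
```

If exactly one row per user is required, a tie-breaker (for example the lowest id) has to be chosen explicitly.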

Query and order by number of matches in JSON array

Using JSON arrays in a jsonb column in Postgres 9.4 and Rails, I can set up a scope that returns all rows containing any elements from an array passed to the scope method - like so:
scope :tagged, ->(tags) {
where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])
}
I'd also like to order the results based on the number of matched elements in the array.
I appreciate I might need to step outside the confines of ActiveRecord to do this, so a vanilla Postgres SQL answer is helpful too, but bonus points if it can be wrapped up in ActiveRecord so it can be a chain-able scope.
As requested, here's an example table. (Actual schema is far more complicated but this is all I'm concerned about.)
id | data
----+-----------------------------------
1 | {"tags": ["foo", "bar", "baz"]}
2 | {"tags": ["bish", "bash", "baz"]}
3 |
4 | {"tags": ["foo", "foo", "foo"]}
The use case is to find related content based on tags. More matching tags are more relevant, hence results should be ordered by the number of matches. In Ruby I'd have a simple method like this:
Page.tagged(['foo', 'bish', 'bash', 'baz']).all
Which should return the pages in the following order: 2, 1, 4.
Your arrays contain only primitive values; nested documents would be more complicated.
Query
Unnest the JSON arrays of found rows with jsonb_array_elements_text() in a LATERAL join and count matches:
SELECT *
FROM (
SELECT *
FROM tbl
WHERE data->'tags' ?| ARRAY['foo', 'bar']
) t
, LATERAL (
SELECT count(*) AS ct
FROM jsonb_array_elements_text(t.data->'tags') a(elem)
WHERE elem = ANY (ARRAY['foo', 'bar']) -- same array parameter
) ct
ORDER BY ct.ct DESC; -- more expressions to break ties?
Alternative with INTERSECT. It's one of the rare occasions that we can make use of this basic SQL feature:
SELECT *
FROM (
SELECT *
FROM tbl
WHERE data->'tags' ?| '{foo, bar}'::text[] -- alt. syntax w. array
) t
, LATERAL (
SELECT count(*) AS ct
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT ALL
SELECT * FROM unnest('{foo, bar}'::text[]) -- same array literal
) i
) ct
ORDER BY ct.ct DESC;
Note a subtle difference: This consumes each element when matched, so it does not count unmatched duplicates in data->'tags' like the first variant does. For details see the demo below.
Also demonstrating an alternative way to pass the array parameter: as array literal: '{foo, bar}'. This may be simpler to handle for some clients:
PostgreSQL: Issue with passing array to procedure
Or you could create a server side search function taking a VARIADIC parameter and pass a variable number of plain text values:
Passing multiple values in single parameter
Related:
Check if key exists in a JSON with PL/pgSQL?
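To see how the two counting variants above rank the sample table from the question, here is a plain-Ruby sketch that mimics them on in-memory arrays. It does not run SQL; the method names are invented, and ties are broken by id, which the SQL queries leave unspecified.

```ruby
# Sample data copied from the question's table (row 3 has no data).
PAGES = {
  1 => %w[foo bar baz],
  2 => %w[bish bash baz],
  3 => nil,
  4 => %w[foo foo foo]
}.freeze

# First variant: count every array element that equals ANY search tag.
def count_any(tags, search)
  tags.count { |tag| search.include?(tag) }
end

# Second variant: INTERSECT ALL consumes one search tag per match,
# so each value counts at most as often as it appears in the search array.
def count_intersect_all(tags, search)
  search_tally = search.tally
  tags.tally.sum { |value, n| [n, search_tally.fetch(value, 0)].min }
end

# Keep pages sharing at least one tag (the ?| filter), score them with the
# given block, and order by score descending, then by id.
def ranked(pages, search)
  pages.reject { |_id, tags| tags.nil? || (tags & search).empty? }
       .map { |id, tags| [id, yield(tags, search)] }
       .sort_by { |id, ct| [-ct, id] }
end

search = %w[foo bish bash baz]

p ranked(PAGES, search) { |t, s| count_any(t, s) }
# => [[2, 3], [4, 3], [1, 2]]
p ranked(PAGES, search) { |t, s| count_intersect_all(t, s) }
# => [[2, 3], [1, 2], [4, 1]]
```

Only the INTERSECT ALL style of counting reproduces the order 2, 1, 4 that the question expects, because the repeated "foo" in row 4 is counted once.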
Index
Be sure to have a functional GIN index to support the jsonb existence operator ?|:
CREATE INDEX tbl_dat_gin ON tbl USING gin ((data->'tags'));
Index for finding an element in a JSON array
What's the proper index for querying structures in arrays in Postgres jsonb?
Nuances with duplicates
Clarification as per request in the comment. Say, we have a JSON array with two duplicated tags (4 total):
jsonb '{"tags": ["foo", "bar", "foo", "bar"]}'
And search with an SQL array parameter including both tags, one of them duplicated (3 total):
'{foo, bar, foo}'::text[]
Consider the results of this demo:
SELECT *
FROM (SELECT jsonb '{"tags":["foo", "bar", "foo", "bar"]}') t(data)
, LATERAL (
SELECT count(*) AS ct
FROM jsonb_array_elements_text(t.data->'tags') e
WHERE e = ANY ('{foo, bar, foo}'::text[])
) ct
, LATERAL (
SELECT count(*) AS ct_intsct_all
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT ALL
SELECT * FROM unnest('{foo, bar, foo}'::text[])
) i
) ct_intsct_all
, LATERAL (
SELECT count(DISTINCT e) AS ct_dist
FROM jsonb_array_elements_text(t.data->'tags') e
WHERE e = ANY ('{foo, bar, foo}'::text[])
) ct_dist
, LATERAL (
SELECT count(*) AS ct_intsct
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT
SELECT * FROM unnest('{foo, bar, foo}'::text[])
) i
) ct_intsct;
Result:
data | ct | ct_intsct_all | ct_dist | ct_intsct
-----------------------------------------+----+---------------+---------+----------
'{"tags": ["foo", "bar", "foo", "bar"]}' | 4 | 3 | 2 | 2
Comparing elements in the JSON array to elements in the array parameter:
4 tags match any of the search elements: ct.
3 tags in the set intersect (can be matched element-to-element): ct_intsct_all.
2 distinct matching tags can be identified: ct_dist or ct_intsct.
If you don't have dupes or if you don't care to exclude them, use one of the first two techniques. The other two are a bit slower (besides the different result), because they have to check for dupes.
I'm posting details of my solution in Ruby, in case it's useful to anyone tackling the same issue.
In the end I decided a scope isn't appropriate, as the method will return an array of objects (not a chainable ActiveRecord::Relation), so I've written a class method and have provided a way to pass a chained scope to it through a block:
def self.with_any_tags(tags, &block)
composed_scope = (
block_given? ? yield : all
).where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])
t = Arel::Table.new('t', ActiveRecord::Base)
ct = Arel::Table.new('ct', ActiveRecord::Base)
arr_sql = Arel.sql "ARRAY[#{ tags.map { |t| Arel::Nodes::Quoted.new(t).to_sql }.join(', ') }]"
any_tags_func = Arel::Nodes::NamedFunction.new('ANY', [arr_sql])
lateral = ct
.project(Arel.sql('e').count(true).as('ct'))
.from(Arel.sql "jsonb_array_elements_text(t.data->'tags') e")
.where(Arel::Nodes::Equality.new Arel.sql('e'), any_tags_func)
query = t
.project(t[Arel.star])
.from(composed_scope.as('t'))
.join(Arel.sql ", LATERAL (#{ lateral.to_sql }) ct")
.order(ct[:ct].desc)
find_by_sql query.to_sql
end
This can be used like so:
Page.with_any_tags(['foo', 'bar'])
# SELECT "t".*
# FROM (
# SELECT "pages".* FROM "pages"
# WHERE data->'tags' ?| ARRAY['foo','bar']
# ) t,
# LATERAL (
# SELECT COUNT(DISTINCT e) AS ct
# FROM jsonb_array_elements_text(t.data->'tags') e
# WHERE e = ANY(ARRAY['foo', 'bar'])
# ) ct
# ORDER BY "ct"."ct" DESC
Page.with_any_tags(['foo', 'bar']) do
Page.published
end
# SELECT "t".*
# FROM (
# SELECT "pages".* FROM "pages"
# WHERE pages.published_at <= '2015-07-19 15:11:59.997134'
# AND pages.deleted_at IS NULL
# AND data->'tags' ?| ARRAY['foo','bar']
# ) t,
# LATERAL (
# SELECT COUNT(DISTINCT e) AS ct
# FROM jsonb_array_elements_text(t.data->'tags') e
# WHERE e = ANY(ARRAY['foo', 'bar'])
# ) ct
# ORDER BY "ct"."ct" DESC

Sequel -- How To Construct This Query?

I have a users table, which has a one-to-many relationship with a user_purchases table via the foreign key user_id. That is, each user can make many purchases (or may have none, in which case he will have no entries in the user_purchases table).
user_purchases has only one other field that is of interest here, which is purchase_date.
I am trying to write a Sequel ORM statement that will return a dataset with the following columns:
user_id
date of the user's SECOND purchase, if it exists
So users who have not made at least 2 purchases will not appear in this dataset. What is the best way to write this Sequel statement?
Please note I am looking for a dataset with ALL users returned who have >= 2 purchases
Thanks!
EDIT FOR CLARITY
Here is a similar statement I wrote to get users and their first purchase date (as opposed to 2nd purchase date, which I am asking for help with in the current post):
DB[:users].join(:user_purchases, :user_id => :id)
.select{[:user_id, min(:purchase_date)]}
.group(:user_id)
You don't seem to be worried about the dates, just the counts so
DB[:user_purchases].group_and_count(:user_id).having(:count > 1).all
will return a list of user_ids and counts where the count (of purchases) is >= 2. Something like
[{:count=>2, :user_id=>1}, {:count=>7, :user_id=>2}, {:count=>2, :user_id=>3}, ...]
If you want to get the users with that, the easiest way with Sequel is probably to extract just the list of user_ids and feed that back into another query:
DB[:users].where(:id => DB[:user_purchases].group_and_count(:user_id).
having(:count > 1).all.map{|row| row[:user_id]}).all
Edit:
I felt like there should be a more succinct way and then I saw this answer (from Sequel author Jeremy Evans) to another question using select_group and select_more : https://stackoverflow.com/a/10886982/131226
This should do it without the subselect:
DB[:users].
left_join(:user_purchases, :user_id=>:id).
select_group(:id).
select_more{count(:purchase_date).as(:purchase_count)}.
having(:purchase_count > 1)
It generates this SQL
SELECT `id`, count(`purchase_date`) AS 'purchase_count'
FROM `users` LEFT JOIN `user_purchases`
ON (`user_purchases`.`user_id` = `users`.`id`)
GROUP BY `id` HAVING (`purchase_count` > 1)
Generally, this could be the SQL query that you need:
SELECT u.id, up1.purchase_date FROM users u
LEFT JOIN user_purchases up1 ON u.id = up1.user_id
LEFT JOIN user_purchases up2 ON u.id = up2.user_id AND up2.purchase_date < up1.purchase_date
GROUP BY u.id, up1.purchase_date
HAVING COUNT(up2.purchase_date) = 1;
Try converting that to sequel, if you don't get any better answers.
The date of the user's second purchase would be the second row retrieved if you do an order_by(:purchase_date) as part of your query.
To access that, do a limit(2) to constrain the query to two results, load them with all, and take the second element ([1]), which is nil when the user has fewer than two purchases. So, if you're not using models and are working with datasets only, and know the user_id you're interested in, your (untested) query would be:
DB[:user_purchases].where(:user_id => user_id).order_by(:user_purchases__purchase_date).limit(2).all[1]
Here's some output from Sequel's console:
DB[:user_purchases].where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT * FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"
Add the appropriate select clause:
.select(:user_id, :purchase_date)
and you should be done:
DB[:user_purchases].select(:user_id, :purchase_date).where(:user_id => 1).order_by(:purchase_date).limit(2).sql
=> "SELECT user_id, purchase_date FROM user_purchases WHERE (user_id = 1) ORDER BY purchase_date LIMIT 2"
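For reference, the result the question asks for (every user with at least two purchases, together with the date of the second one) can be sketched in plain Ruby over in-memory rows. The sample purchases and the method name are assumptions; this is not a Sequel query, only a description of the expected output.

```ruby
require "date"

# For each user, sort the purchase dates and take the second one.
# Users with fewer than two purchases yield nil and are dropped by compact.
def second_purchase_dates(purchases)
  purchases.group_by { |row| row[:user_id] }
           .transform_values { |rows| rows.map { |r| r[:purchase_date] }.sort[1] }
           .compact
end

purchases = [
  { user_id: 1, purchase_date: Date.new(2020, 1, 5) },
  { user_id: 1, purchase_date: Date.new(2020, 1, 1) },
  { user_id: 1, purchase_date: Date.new(2020, 2, 1) },
  { user_id: 2, purchase_date: Date.new(2020, 3, 1) },
  { user_id: 3, purchase_date: Date.new(2020, 4, 1) },
  { user_id: 3, purchase_date: Date.new(2020, 4, 2) }
]

second_purchase_dates(purchases).each do |user_id, date|
  puts "#{user_id}: #{date}"
end
```

User 2 has a single purchase and therefore does not appear in the result, which matches the requirement in the question.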

How to get data from two different fields in sample table whether anyone of the field is NULL ?

I am using Ruby 1.8.6 and Rails 2.3.8.
I have a problem with dependent combo boxes in Rails:
Product drop-down list
SKUs drop-down list (depends on the product selection)
The products table has the fields
id name
In Sku's tables fields are
id name product_id alias_id
Alias tables fields are
id name
For example I have Sku's tables data like below
id name product_id alias_id
1. 100-m 1 10
2. 10-ml 1 NULL
3. 150 1 2
4. 200-m 1 10
5. 300-m 1 10
In the controller I wrote a query like this:
@skus = Sku.all(:conditions => ["product_id = ? ",
params[:id]],:select=>"skus.id,
CASE when skus.alias_id IS NOT NULL then (SELECT alias.name FROM alias WHERE
alias.id = skus.alias_id group by alias.name) END AS 'skus_name'",
:order=>"skus_name" ,:include=>[:alias])
This query produces output like:
id skus_name
1. 100gms
2. 10-ml
3. 150-ml
4. 100gms
5. 100gms
Can anyone help me get distinct results?
Thanks in advance
You can either call uniq on the @skus variable that is returned:
@skus = Sku.all(:conditions => ["product_id = ? ",
params[:id]],:select=>"skus.id,
CASE when skus.alias_id IS NOT NULL then (SELECT alias.name FROM alias WHERE
alias.id = skus.alias_id group by alias.name) END AS 'skus_name'",
:order=>"skus_name" ,:include=>[:alias]).uniq
This will perform the same DB select but get unique results in Ruby.
The alternative is to use DISTINCT in the select:
@skus = Sku.all(:conditions => ["product_id = ? ",
params[:id]],:select=>"skus.id,
CASE when skus.alias_id IS NOT NULL then (SELECT DISTINCT alias.name FROM alias WHERE
alias.id = skus.alias_id group by alias.name) END AS 'skus_name'",
:order=>"skus_name" ,:include=>[:alias])
This will only get unique results in the database.
I'd go with the second option as it should be quicker than doing uniq in ruby :)
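As a rough illustration of the result the question is after, the plain-Ruby sketch below falls back to the SKU's own name when alias_id is NULL and then removes duplicate names. The alias names are inferred from the sample output in the question and are assumptions, as is the method name; it targets a current Ruby rather than 1.8.6.

```ruby
# Alias names inferred from the sample output (assumption).
ALIASES = { 10 => "100gms", 2 => "150-ml" }.freeze

# Rows copied from the question's skus table; nil stands for NULL.
SKUS = [
  { id: 1, name: "100-m", product_id: 1, alias_id: 10 },
  { id: 2, name: "10-ml", product_id: 1, alias_id: nil },
  { id: 3, name: "150",   product_id: 1, alias_id: 2 },
  { id: 4, name: "200-m", product_id: 1, alias_id: 10 },
  { id: 5, name: "300-m", product_id: 1, alias_id: 10 }
].freeze

# Use the alias name when there is one, otherwise the SKU name
# (the role COALESCE would play in SQL), then drop duplicates.
def display_names(skus, aliases)
  skus.map { |sku| aliases.fetch(sku[:alias_id], sku[:name]) }
      .uniq
      .sort
end

p display_names(SKUS, ALIASES) # => ["10-ml", "100gms", "150-ml"]
```

Note that deduplication has to happen on the name: the rows themselves all have different ids, so comparing whole rows would not remove anything.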
