Query and order by number of matches in JSON array - ruby-on-rails

Using JSON arrays in a jsonb column in Postgres 9.4 and Rails, I can set up a scope that returns all rows containing any elements from an array passed to the scope method - like so:
scope :tagged, ->(tags) {
where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])
}
I'd also like to order the results based on the number of matched elements in the array.
I appreciate I might need to step outside the confines of ActiveRecord to do this, so a vanilla Postgres SQL answer is helpful too, but bonus points if it can be wrapped up in ActiveRecord so it can be a chain-able scope.
As requested, here's an example table. (Actual schema is far more complicated but this is all I'm concerned about.)
id | data
----+-----------------------------------
1 | {"tags": ["foo", "bar", "baz"]}
2 | {"tags": ["bish", "bash", "baz"]}
3 |
4 | {"tags": ["foo", "foo", "foo"]}
The use case is to find related content based on tags. More matching tags are more relevant, hence results should be ordered by the number of matches. In Ruby I'd have a simple method like this:
Page.tagged(['foo', 'bish', 'bash', 'baz']).all
Which should return the pages in the following order: 2, 1, 4.
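For reference, the intended ranking can be stated in plain Ruby, independent of the database. This is only a sketch under assumptions: ROWS is a hypothetical in-memory stand-in for the example table, rank_by_tag_matches is an illustrative name, and each tag is counted once per row, which is what the expected order 2, 1, 4 implies (page 4 repeats foo three times but matches only one distinct tag).

```ruby
# Hypothetical in-memory copy of the example table: id => tags (nil = no data).
ROWS = {
  1 => %w[foo bar baz],
  2 => %w[bish bash baz],
  3 => nil,
  4 => %w[foo foo foo]
}.freeze

# Returns the ids of rows sharing at least one tag with `search`,
# ordered by the number of distinct matching tags, highest first.
def rank_by_tag_matches(rows, search)
  rows
    .map { |id, tags| [id, (Array(tags) & search).size] } # Array#& de-duplicates
    .reject { |_id, matches| matches.zero? }
    .sort_by { |_id, matches| -matches }
    .map(&:first)
end

p rank_by_tag_matches(ROWS, %w[foo bish bash baz]) # => [2, 1, 4]
```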

Your arrays contain only primitive values; nested documents would be more complicated.
Query
Unnest the JSON arrays of found rows with jsonb_array_elements_text() in a LATERAL join and count matches:
SELECT *
FROM (
SELECT *
FROM tbl
WHERE data->'tags' ?| ARRAY['foo', 'bar']
) t
, LATERAL (
SELECT count(*) AS ct
FROM jsonb_array_elements_text(t.data->'tags') a(elem)
WHERE elem = ANY (ARRAY['foo', 'bar']) -- same array parameter
) ct
ORDER BY ct.ct DESC; -- more expressions to break ties?
Alternative with INTERSECT. It's one of the rare occasions that we can make use of this basic SQL feature:
SELECT *
FROM (
SELECT *
FROM tbl
WHERE data->'tags' ?| '{foo, bar}'::text[] -- alt. syntax w. array
) t
, LATERAL (
SELECT count(*) AS ct
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT ALL
SELECT * FROM unnest('{foo, bar}'::text[]) -- same array literal
) i
) ct
ORDER BY ct.ct DESC;
Note a subtle difference: This consumes each element when matched, so it does not count unmatched duplicates in data->'tags' like the first variant does. For details see the demo below.
Also demonstrating an alternative way to pass the array parameter: as array literal: '{foo, bar}'. This may be simpler to handle for some clients:
PostgreSQL: Issue with passing array to procedure
Or you could create a server side search function taking a VARIADIC parameter and pass a variable number of plain text values:
Passing multiple values in single parameter
Related:
Check if key exists in a JSON with PL/pgSQL?
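If the array-literal form mentioned above is built on the Ruby side, the elements need quoting. The following is a minimal sketch, not part of the original answer: pg_text_array_literal is a hypothetical helper name, and it assumes all elements are non-nil strings. The resulting string would be bound as a single parameter and cast with ::text[] on the SQL side.

```ruby
# Builds a PostgreSQL array literal such as {"foo","bar"} from Ruby strings.
# Every element is double-quoted; backslashes and double quotes inside an
# element are backslash-escaped, as the array text input format expects.
# Assumption: no nil elements (a nil would become an empty string here).
def pg_text_array_literal(values)
  quoted = values.map do |value|
    escaped = value.to_s.gsub(/["\\]/) { |char| "\\#{char}" }
    %("#{escaped}")
  end
  "{#{quoted.join(',')}}"
end

puts pg_text_array_literal(%w[foo bar]) # prints {"foo","bar"}
```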
Index
Be sure to have a functional GIN index to support the jsonb existence operator ?|:
CREATE INDEX tbl_dat_gin ON tbl USING gin ((data->'tags'));
Index for finding an element in a JSON array
What's the proper index for querying structures in arrays in Postgres jsonb?
Nuances with duplicates
Clarification as per request in the comment. Say, we have a JSON array with two duplicated tags (4 total):
jsonb '{"tags": ["foo", "bar", "foo", "bar"]}'
And search with an SQL array parameter including both tags, one of them duplicated (3 total):
'{foo, bar, foo}'::text[]
Consider the results of this demo:
SELECT *
FROM (SELECT jsonb '{"tags":["foo", "bar", "foo", "bar"]}') t(data)
, LATERAL (
SELECT count(*) AS ct
FROM jsonb_array_elements_text(t.data->'tags') e
WHERE e = ANY ('{foo, bar, foo}'::text[])
) ct
, LATERAL (
SELECT count(*) AS ct_intsct_all
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT ALL
SELECT * FROM unnest('{foo, bar, foo}'::text[])
) i
) ct_intsct_all
, LATERAL (
SELECT count(DISTINCT e) AS ct_dist
FROM jsonb_array_elements_text(t.data->'tags') e
WHERE e = ANY ('{foo, bar, foo}'::text[])
) ct_dist
, LATERAL (
SELECT count(*) AS ct_intsct
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT
SELECT * FROM unnest('{foo, bar, foo}'::text[])
) i
) ct_intsct;
Result:
data | ct | ct_intsct_all | ct_dist | ct_intsct
-----------------------------------------+----+---------------+---------+----------
'{"tags": ["foo", "bar", "foo", "bar"]}' | 4 | 3 | 2 | 2
Comparing elements in the JSON array to elements in the array parameter:
4 tags match any of the search elements: ct.
3 tags in the set intersect (can be matched element-to-element): ct_intsct_all.
2 distinct matching tags can be identified: ct_dist or ct_intsct.
If you don't have dupes or if you don't care to exclude them, use one of the first two techniques. The other two are a bit slower (besides the different result), because they have to check for dupes.
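The same four counts can be reproduced in plain Ruby, which may make the difference between the variants easier to see. This is an illustrative sketch only: match_counts is a hypothetical name, and Enumerable#tally assumes Ruby 2.7 or later.

```ruby
# Mirrors the four SQL counts for one JSON array (`tags`) and one
# search array (`search`).
def match_counts(tags, search)
  tag_counts    = tags.tally
  search_counts = search.tally

  {
    # every tag equal to any search element, duplicates included
    ct: tags.count { |tag| search.include?(tag) },
    # multiset intersection: each search element consumes at most one equal
    # tag, like INTERSECT ALL
    ct_intsct_all: tag_counts.sum { |tag, n| [n, search_counts.fetch(tag, 0)].min },
    # distinct matching tags, like count(DISTINCT e) or plain INTERSECT
    ct_dist: (tags & search).size
  }
end

p match_counts(%w[foo bar foo bar], %w[foo bar foo])
# => {:ct=>4, :ct_intsct_all=>3, :ct_dist=>2} (Ruby 3.4+ prints {ct: 4, ...})
```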

I'm posting details of my solution in Ruby, in case it's useful to anyone tackling the same issue.
In the end I decided a scope isn't appropriate, as the method returns an array of objects (not a chainable ActiveRecord::Relation), so I've written a class method and provided a way to pass a chained scope to it through a block:
def self.with_any_tags(tags, &block)
composed_scope = (
block_given? ? yield : all
).where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])
t = Arel::Table.new('t', ActiveRecord::Base)
ct = Arel::Table.new('ct', ActiveRecord::Base)
arr_sql = Arel.sql "ARRAY[#{ tags.map { |tag| Arel::Nodes::Quoted.new(tag).to_sql }.join(', ') }]"
any_tags_func = Arel::Nodes::NamedFunction.new('ANY', [arr_sql])
lateral = ct
.project(Arel.sql('e').count(true).as('ct'))
.from(Arel.sql "jsonb_array_elements_text(t.data->'tags') e")
.where(Arel::Nodes::Equality.new Arel.sql('e'), any_tags_func)
query = t
.project(t[Arel.star])
.from(composed_scope.as('t'))
.join(Arel.sql ", LATERAL (#{ lateral.to_sql }) ct")
.order(ct[:ct].desc)
find_by_sql query.to_sql
end
This can be used like so:
Page.with_any_tags(['foo', 'bar'])
# SELECT "t".*
# FROM (
# SELECT "pages".* FROM "pages"
# WHERE data->'tags' ?| ARRAY['foo','bar']
# ) t,
# LATERAL (
# SELECT COUNT(DISTINCT e) AS ct
# FROM jsonb_array_elements_text(t.data->'tags') e
# WHERE e = ANY(ARRAY['foo', 'bar'])
# ) ct
# ORDER BY "ct"."ct" DESC
Page.with_any_tags(['foo', 'bar']) do
Page.published
end
# SELECT "t".*
# FROM (
# SELECT "pages".* FROM "pages"
# WHERE pages.published_at <= '2015-07-19 15:11:59.997134'
# AND pages.deleted_at IS NULL
# AND data->'tags' ?| ARRAY['foo','bar']
# ) t,
# LATERAL (
# SELECT COUNT(DISTINCT e) AS ct
# FROM jsonb_array_elements_text(t.data->'tags') e
# WHERE e = ANY(ARRAY['foo', 'bar'])
# ) ct
# ORDER BY "ct"."ct" DESC

Related

How to query nested JSONB postgres in Rails

I've a table called Order with a jsonb column type called line_items. The line_items column can contain nested values like this:
[
{
"id":9994857545813,
"sku":"CLIPPING-PATH_C2_24H"
},
{
"id":9994857578581,
"sku":"NATURAL-SHADOW_C1_24H"
}
]
The above example has two line items in it but it can vary from 1 to any number of line items.
I need to query all orders that contain only 1 line item where sku equals a particular value, such as CLIPPING-PATH_C2_24H in the example above.
So the query should not match the example above, but it should match the following, which has only 1 line item with sku = CLIPPING-PATH_C2_24H:
[
{
"id":9994857545813,
"sku":"CLIPPING-PATH_C2_24H"
}
]
Can anyone help me write the query using Rails Active Record?
demo
You can call PL/pgSQL from Ruby (see: How to call plpgsql functions from Ruby on rails?).
SQL query: select jsonb_path_query(order_json,'$[*] ? (@.sku == "CLIPPING-PATH_C2_24H")') from orders ;
It's not easy to get right, since parts of the PL/pgSQL function string include four consecutive single quotes. It's better to use RAISE NOTICE to test it step by step.
CREATE OR REPLACE FUNCTION get_sku_CLIPPING_path (_sku text)
RETURNS json
AS $$
DECLARE
_sql text;
_returnjson jsonb;
BEGIN
RAISE NOTICE '%: _sku', $1;
RAISE NOTICE '%', '$[*] ? (@.sku == ' || $1 || ')';
RAISE NOTICE '%', $s$
SELECT
jsonb_path_query(order_json, $s$ || '''' || '$[*] ? (@.sku == ' || $1 || ')''' || ' )from orders';
_sql := $s$
SELECT
jsonb_path_query(order_json, $s$ || '''' || '$[*] ? (@.sku == ' || $1 || ')''' || ' )from orders';
EXECUTE _sql
USING _sku INTO _returnjson;
RETURN (_returnjson::json);
END
$$
LANGUAGE plpgsql;
call it: select * from get_sku_CLIPPING_path('"CLIPPING-PATH_C2_24H"');
First things first: you don't have nested JSON. You just have a JSON array of objects. Moreover, your JSON objects can be represented in tabular form. It's better to store those objects in another table and set up a one-to-many relationship.
You can have an "orders" table and an "order_details" table.
To get the data you need, we first find the records that contain "sku":"CLIPPING-PATH_C2_24H", then parse the JSON and extract that object from the line_items field.
SELECT
t.*
FROM orders o,
-- extract the matching object from the array of JSON objects
LATERAL jsonb_path_query(o.line_items, '$[*] ? (@.sku == $value)', '{"value" : "CLIPPING-PATH_C2_24H"}') order_line,
-- convert {"id": ..., "sku": "CLIPPING-PATH_C2_24H"} into columns
LATERAL jsonb_to_record(order_line) as t(id bigint, sku text)
WHERE
-- find records that contain "sku":"CLIPPING-PATH_C2_24H"
o.line_items @> '[{"sku":"CLIPPING-PATH_C2_24H"}]';
The result will look like this:
id | sku
------------- | --------------------
9994857545813 | CLIPPING-PATH_C2_24H
fiddle is here
I was able to figure this out using the jsonb_array_length function:
Order
.where("line_items @> ?", [{sku: sku}].to_json)
.where("jsonb_array_length(line_items) = 1")
.count
The following resource was extremely helpful:
https://gist.github.com/mankind/1802dbb64fc24be33d434d593afd6221
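To sanity-check what the combination of the containment test and jsonb_array_length selects, the same predicate can be written in plain Ruby over the parsed JSON. This is a sketch for illustration only: single_line_item_with_sku? is a hypothetical name, and it assumes line_items is always a JSON array of objects.

```ruby
require 'json'

# True when the order has exactly one line item and that item has the sku.
# Assumes the JSON text is an array of objects, as in the question.
def single_line_item_with_sku?(line_items_json, sku)
  items = JSON.parse(line_items_json)
  items.length == 1 && items.first['sku'] == sku
end

one_item  = '[{"id":9994857545813,"sku":"CLIPPING-PATH_C2_24H"}]'
two_items = '[{"id":9994857545813,"sku":"CLIPPING-PATH_C2_24H"},' \
            '{"id":9994857578581,"sku":"NATURAL-SHADOW_C1_24H"}]'

p single_line_item_with_sku?(one_item, 'CLIPPING-PATH_C2_24H')  # => true
p single_line_item_with_sku?(two_items, 'CLIPPING-PATH_C2_24H') # => false
```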

Ruby/Rails - Chain unknown number of method calls

I would like to dynamically create (potentially complex) Active Record queries from a 2D array passed into a method as an argument. In other words, I'd like to take this:
arr = [
['join', :comments],
['where', :author => 'Bob']
]
And create the equivalent of this:
Articles.join(:comments).where(:author => 'Bob')
One way to do this is:
Articles.send(*arr[0]).send(*arr[1])
But what if arr contains 3 nested arrays, or 4, or 5? A very unrefined way would be to do this:
case arr.length
when 1
Articles.send(*arr[0])
when 2
Articles.send(*arr[0]).send(*arr[1])
when 3
Articles.send(*arr[0]).send(*arr[1]).send(*arr[2])
# etc.
end
But is there a cleaner, more succinct way (without having to hit the database multiple times)? Perhaps some way to construct a chain of method calls before executing them?
One convenient way would be to use a hash instead of a 2D array.
Something like this
query = {
join: [:comments],
where: {:author => 'Bob'}
}
This approach is not very complex, and you don't need to worry if a key is missing or empty:
Article.joins(query[:join]).where(query[:where])
#=> "SELECT `articles`.* FROM `articles` INNER JOIN `comments` ON `comments`.`article_id` = `articles`.`id` WHERE `articles`.`author` = 'Bob'"
If the keys are empty or not present at all
query = {
join: []
}
Article.joins(query[:join]).where(query[:where])
#=> "SELECT `articles`.* FROM `articles`"
Or nested
query = {
join: [:comments],
where: {:author => 'Bob', comments: {author: 'Joe'}}
}
#=> "SELECT `articles`.* FROM `articles` INNER JOIN `comments` ON `comments`.`article_id` = `articles`.`id` WHERE `articles`.`author` = 'Bob' AND `comments`.`author` = 'Joe'"
I created the following method, which works on any model with an array of chained query calls.
def chain_queries_on(klass, arr)
arr.inject(klass) do |relation, query|
begin
relation.send(query[0], *query[1..-1])
rescue
break;
end
end
end
I tested it locally with the following:
arr = [['where', {id: [1,2]}], ['where', {first_name: 'Shobiz'}]]
chain_queries_on(Article, arr)
The query fired looks like this and returns the proper output:
Article Load (0.9ms) SELECT `article`.* FROM `article` WHERE `article`.`id` IN (1, 2) AND `article`.`first_name` = 'Shobiz' ORDER BY created_at desc
Note-1: a few noteworthy cases
For an empty arr, it returns the class passed as the first argument.
It returns nil in case of an error. An error can occur if we use pluck, which returns an array (output that is not chainable), or if we do not pass a class as the first parameter, etc.
Further modifications could improve the above and avoid edge cases.
Note-2: improvements
You can also define this as a class method on the Object class, taking one argument (the array), and call it directly on a class, like:
# renamed to make concise
Article.chain_queries(arr)
User.chain_queries(arr)
Inside the method, use self instead of klass.
arr.inject(Articles){|articles, args| articles.send(*args)}
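The inject-based chaining is not specific to ActiveRecord, so it can be tried on any object. Below is a small self-contained sketch using a plain Array as the receiver; chain_calls is a hypothetical name, and public_send is a swapped-in choice (instead of send) so that private methods cannot be called through the array. With an ActiveRecord relation as the receiver, each step would only build up the relation, so the database is not queried until records are actually loaded.

```ruby
# Applies each [method_name, *args] entry in `calls` to the result of the
# previous call, starting from `receiver`.
def chain_calls(receiver, calls)
  calls.inject(receiver) do |result, (method_name, *args)|
    result.public_send(method_name, *args)
  end
end

steps = [[:first, 3], [:reverse], [:take, 2]]
p chain_calls([1, 2, 3, 4, 5], steps) # => [3, 2]
```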

Rails query interface: selecting rows in database where any of the JSON array's values match a certain criteria

Database info:
Database: PostgreSQL
Table name: publishing_rules
Column name: menu_items
Column format: JSON
Example column value: {"items":[{"id":1,"title":"dfgdfg"},{"id":2,"title":"sdf"}]}
I need to gather all rows which have at least one item with an id equal to my value. So far I've come up with this:
id = 1
items = PublishingRule.where("menu_items #> '{items,0}' ->> 'id' = ?", id.to_s)
However, this code only returns rows where the first element of the items array matches my criteria. I need to modify my code to something similar to:
items = PublishingRule.where("menu_items #> '{items, ANY}' ->> 'id' = ?", id.to_s)
or
id = 1
items = PublishingRule.where("menu_items #> '{items.map}' ->> 'id' = ?", id.to_s)
How do I do that?
Since items is an array in the given example, you can't do this with the path operators alone. You need jsonb_array_elements() to look into it (or json_array_elements() if the column type is json rather than jsonb).
Here's an SQL query example that meets your requirement:
SELECT *
FROM publishing_rules
WHERE EXISTS (
SELECT 1
FROM jsonb_array_elements( menu_items -> 'items' )
WHERE value ->> 'id' = '2'
LIMIT 1
);
So a WHERE EXISTS lookup into the unnested array does the trick.
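To use this from the Rails query interface, the EXISTS condition can be passed to where as a SQL fragment with one bind value. The helper below is a hedged sketch: menu_item_id_condition is a hypothetical name, and the ::jsonb cast is an assumption based on the column being declared as JSON in the question (drop the cast if the column is already jsonb).

```ruby
# Returns [sql_fragment, bind_value] for a `where` call that matches rows
# whose menu_items->'items' array has at least one element with the given id.
def menu_item_id_condition(id)
  sql = <<~SQL
    EXISTS (
      SELECT 1
      FROM jsonb_array_elements(menu_items::jsonb -> 'items') AS item
      WHERE item ->> 'id' = ?
    )
  SQL
  [sql, id.to_s] # ->> yields text, so the id is compared as a string
end

# Assumed usage with the model from the question:
#   PublishingRule.where(*menu_item_id_condition(1))
```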

Re-write a query to avoid PG::GroupingError: ERROR: in the GROUP BY clause or be used in an aggregate function

I tried many alternatives before posting this question.
I have a query on a table A with columns: id, num, user_id.
id is PK, user_id can be duplicate.
I need one row per unique user_id: the row with the highest num value. For this, I came up with the SQL below, which will work in an Oracle database. I am on the Ruby on Rails platform with a Postgres database.
select stats.* from stats as A
where A.num > (
select B.num
from stats as B
where A.user_id == B.user_id
group by B.user_id
having B.num> min(B.num) )
I tried writing this query via active record method but still ran into
PG::GroupingError: ERROR: column "b.num" must appear in the GROUP BY
clause or be used in an aggregate function
Stat.where("stats.num > ( select B.nums from stats as B where stats.user_id = B.user_id group by B.user_id having B.num < max(B.num) )")
Can someone tell me an alternative way of writing this query?
The SELECT clause of your subquery in Rails doesn't match that of your example. Note that since you're performing an aggregate function min(B.num) in your HAVING clause, you'll have to also include it in your SELECT clause:
Stat.where("stats.num > ( select B.num from stats as B where stats.user_id = B.user_id group by B.user_id having B.num < max(B.num) )")
You may also need a condition to handle the case where select B.num from stats as B where stats.user_id = B.user_id group by B.user_id having B.num < max(B.num) returns more than one row.
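One way to sidestep the grouping error altogether is to compare each row against the maximum num of its own user in a correlated scalar subquery, which needs neither GROUP BY nor HAVING. The sketch below is an assumption-laden illustration, not the answerer's method: StatRow is a hypothetical in-memory stand-in for the stats table, top_row_per_user states the intended result in plain Ruby, and MAX_NUM_PER_USER is a condition string that could be passed to Stat.where. If two rows of one user tie for the highest num, the SQL condition returns both.

```ruby
# Hypothetical in-memory stand-in for rows of the stats table.
StatRow = Struct.new(:id, :num, :user_id)

# Keeps, for every user_id, the row with the highest num.
def top_row_per_user(rows)
  rows.group_by(&:user_id).map { |_user_id, group| group.max_by(&:num) }
end

rows = [
  StatRow.new(1, 10, 7),
  StatRow.new(2, 30, 7),
  StatRow.new(3, 20, 8)
]
p top_row_per_user(rows).map(&:id) # => [2, 3]

# The same idea as a SQL condition (assumed usage: Stat.where(MAX_NUM_PER_USER)).
MAX_NUM_PER_USER = <<~SQL
  stats.num = (SELECT MAX(b.num) FROM stats b WHERE b.user_id = stats.user_id)
SQL
```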

Rails 4 with Postgres hstore, can you query keys with wildcards?

I want to be able to query keys in hstore with wildcards.
For example, I have a preferences model that has an hstore column called 'skills'.
An example of skills might be
{'Ruby' => {'checked' => true } }
Now I want to query this like so
Preference.where("skills LIKE :key", key: "%ruby%")
{"Angular.js"=>"{\"checked\"=>true}"}
SELECT user_id FROM preferences WHERE EXISTS( SELECT 1 FROM skeys(skills) AS k WHERE k LIKE '%angular%');
user_id
---------
(0 rows)
However,
SELECT user_id FROM preferences WHERE EXISTS( SELECT 1 FROM skeys(skills) AS k WHERE k LIKE '%a%');
user_id
---------
1
(1 row)
Per what Craig was saying in the comments, this is possible but not efficient. Here is an example query:
SELECT * FROM some_hstore WHERE EXISTS( SELECT 1 FROM skeys(blah) AS k WHERE k ILIKE 'a%');
See the PostgreSQL documentation on pattern matching: http://www.postgresql.org/docs/current/static/functions-matching.html
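Note that LIKE is case-sensitive, which explains why '%angular%' found nothing for the key "Angular.js" while '%a%' did. A hedged sketch of how the condition might be built for where follows: skills_key_condition is a hypothetical name, ILIKE is used for case-insensitive matching, and LIKE metacharacters in the search term are escaped so they match literally.

```ruby
# Returns [sql_fragment, pattern] matching rows whose hstore column `skills`
# has at least one key containing `term`, ignoring case.
def skills_key_condition(term)
  # Escape LIKE metacharacters (\, %, _) so user input is matched literally.
  escaped = term.gsub(/[\\%_]/) { |char| "\\#{char}" }
  sql = 'EXISTS (SELECT 1 FROM skeys(skills) AS k WHERE k ILIKE ?)'
  [sql, "%#{escaped}%"]
end

# Assumed usage with the model from the question:
#   Preference.where(*skills_key_condition('angular'))
```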
