How to query nested JSONB in Postgres with Rails

I have a table called Order with a jsonb column called line_items. The line_items column can contain nested values like this:
[
{
"id":9994857545813,
"sku":"CLIPPING-PATH_C2_24H"
},
{
"id":9994857578581,
"sku":"NATURAL-SHADOW_C1_24H"
}
]
The above example has two line items, but the number can vary from one upward.
I need to query all orders that contain exactly one line item whose sku equals a particular value, such as CLIPPING-PATH_C2_24H in the example above.
So the query should not match the example above, but it should match the following, which has only one line item with sku = CLIPPING-PATH_C2_24H:
[
{
"id":9994857545813,
"sku":"CLIPPING-PATH_C2_24H"
}
]
Can anyone help me write this query using Rails Active Record?

You can call PL/pgSQL from Ruby (see "How to call plpgsql functions from Ruby on rails?").
SQL query: select jsonb_path_query(order_json,'$[*] ? (@.sku == "CLIPPING-PATH_C2_24H")') from orders ;
It's not easy to get the quoting right, since part of the PL/pgSQL function string includes four consecutive single quotes. It's better to use RAISE NOTICE to test it step by step.
CREATE OR REPLACE FUNCTION get_sku_CLIPPING_path (_sku text)
RETURNS json
AS $$
DECLARE
_sql text;
_returnjson jsonb;
BEGIN
RAISE NOTICE '%: _sku', $1;
RAISE NOTICE '%', '$[*] ? (@.sku == ' || $1 || ')';
RAISE NOTICE '%', $s$
SELECT
jsonb_path_query(order_json, $s$ || '''' || '$[*] ? (@.sku == ' || $1 || ')''' || ' ) from orders';
_sql := $s$
SELECT
jsonb_path_query(order_json, $s$ || '''' || '$[*] ? (@.sku == ' || $1 || ')''' || ' ) from orders';
EXECUTE _sql
USING _sku INTO _returnjson;
RETURN (_returnjson::json);
END
$$
LANGUAGE plpgsql;
call it: select * from get_sku_CLIPPING_path('"CLIPPING-PATH_C2_24H"');

First things first: you don't have nested JSON. You just have a JSON array of objects. Moreover, your JSON objects can be represented in tabular form, so it's better to store them in another table and set up a one-to-many relationship.
You can have an "orders" table and an "order_details" table.
To get the data you need, first find the records that contain "sku":"CLIPPING-PATH_C2_24H", then parse the JSON and extract that object from the line_items field.
SELECT
t.*
FROM orders o,
-- extract the matching object from the array of JSON objects
LATERAL jsonb_path_query(o.line_items, '$[*] ? (@.sku == $value)', '{"value" : "CLIPPING-PATH_C2_24H"}') order_line,
-- convert the matched object into columns
LATERAL jsonb_to_record(order_line) as t(id bigint, sku text)
WHERE
-- find records that contain "sku":"CLIPPING-PATH_C2_24H"
o.line_items @> '[{"sku":"CLIPPING-PATH_C2_24H"}]';
The result looks like this:
id | sku
------------- | --------------------
9994857545813 | CLIPPING-PATH_C2_24H

I was able to figure this out using the jsonb_array_length function:
Order
.where("line_items @> ?", [{sku: sku}].to_json)
.where("jsonb_array_length(line_items) = 1")
.count
The following resource was extremely helpful:
https://gist.github.com/mankind/1802dbb64fc24be33d434d593afd6221
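As a rough cross-check of the query above, the two conditions (the array contains an object with the given sku, and the array has exactly one element) can be mirrored in plain Ruby. The helper names `containment_param` and `single_item_with_sku?` are made up for this sketch, and the sample data is the question's; it only illustrates what gets bound to the placeholder and which rows should match, not how Postgres evaluates it.

```ruby
require "json"

# Hypothetical helper: the JSON text bound to the `?` placeholder
# of the containment condition (`line_items @> ?`).
def containment_param(sku)
  JSON.generate([{ sku: sku }])
end

# In-memory equivalent of the two SQL conditions: the array contains
# an object with this sku, and the array has exactly one element.
def single_item_with_sku?(line_items, sku)
  line_items.any? { |item| item["sku"] == sku } && line_items.length == 1
end

# Sample data taken from the question.
TWO_ITEMS = JSON.parse(
  '[{"id": 9994857545813, "sku": "CLIPPING-PATH_C2_24H"},' \
  ' {"id": 9994857578581, "sku": "NATURAL-SHADOW_C1_24H"}]'
)
ONE_ITEM = JSON.parse('[{"id": 9994857545813, "sku": "CLIPPING-PATH_C2_24H"}]')

puts containment_param("CLIPPING-PATH_C2_24H")
puts single_item_with_sku?(ONE_ITEM, "CLIPPING-PATH_C2_24H")
puts single_item_with_sku?(TWO_ITEMS, "CLIPPING-PATH_C2_24H")
```

A predicate like this can be handy in a unit test that compares the database result against the same rule applied to fixture data.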

Related

ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list using order by with uniq

I have this scope on my product model:
scope :all_products, lambda {
joins(:prices)
.where('prices.start_date <= ? and products.available = ?', Time.current, true)
.uniq
.order('CASE WHEN products.quantity >= products.min_quantity and (prices.finish_date IS NULL OR prices.finish_date >= now()) THEN 0 ELSE 1 END, prices.finish_date asc')
}
I get the following error when I try to run it: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
How can I keep my ORDER BY and still make the query unique? I'm using Rails 4.
You need to select the columns first so that you can order by them later in .order.
Even then, the result will still contain duplicate records despite .uniq or .distinct, because the generated query, SELECT DISTINCT products.*, prices.finish_date, ..., looks for unique combinations of products.*, prices.finish_date and the computed CASE column, whereas you only want products.id to be unique.
DISTINCT ON is the solution, but it is a little tricky in Postgres because SELECT DISTINCT ON expressions must match the initial ORDER BY expressions.
Please try:
sub_query = Product.joins(:prices)
.select("DISTINCT ON (products.id) products.*, CASE WHEN (products.quantity >= products.min_quantity) AND (prices.finish_date IS NULL OR prices.finish_date >= now()) THEN 0 ELSE 1 END AS t, prices.finish_date AS date")
query = Product.from("(#{sub_query.to_sql}) as tmp").select("tmp.*").order("tmp.t, tmp.date ASC")
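The difference between de-duplicating whole rows and de-duplicating on the product id can be seen with plain Ruby hashes. This is only an in-memory analogy with made-up rows (the column names `t` and `date` follow the aliases in the query above). Note that `Array#uniq` keeps the first row per id, while DISTINCT ON without a matching ORDER BY keeps an unspecified one.

```ruby
# Joined product/price rows as plain hashes; the data is invented for illustration.
ROWS = [
  { id: 1, name: "A", t: 1, date: "2015-02-01" },
  { id: 1, name: "A", t: 1, date: "2015-03-01" },
  { id: 2, name: "B", t: 0, date: "2015-01-15" }
].freeze

# Like SELECT DISTINCT products.*, t, date: whole rows are compared,
# so product 1 survives twice because its two rows differ in `date`.
DISTINCT_ROWS = ROWS.uniq

# Like DISTINCT ON (products.id): one row per product id.
DISTINCT_ON_ID = ROWS.uniq { |row| row[:id] }

# Like the outer query: order the de-duplicated rows by t, then date.
ORDERED = DISTINCT_ON_ID.sort_by { |row| [row[:t], row[:date]] }

puts DISTINCT_ROWS.size
puts ORDERED.map { |row| row[:id] }.inspect
```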

Rails/Postgres Concat JSONB keys together in select

How can I obtain an aggregate of different keys within a JSONB column? Some of the data I want to show in the aggregated form would be a count & total (easy) but the other would be an array of a concatenation of two keys. I keep getting an error when trying to concat them within the inline sql.
context.request_groups
.joins(:request_items)
.select("request_groups.id as id,
count(*) as count,
array_agg(request_items.data->>'contact_first_name' || ' ' || request_items.data->>'contact_last_name') as names,
sum(cast(request_items.data->>'amount' as float)) as total")
.group(:id)
I'm currently trying to use the ARRAY_AGG function (example: http://www.postgresqltutorial.com/postgresql-aggregate-functions/postgresql-array_agg-function/) and getting the following error:
ActiveRecord::StatementInvalid (PG::UndefinedFunction: ERROR: operator does not exist: text ->> unknown)
LINE 1: ...t_name' || ' ' || request_items.data->>'contac...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
I'm looking for results like:
id, count, total, ['first last', 'first last']
Here's the table layout:
----------------------
Request Group
----------------------
id | UUID
created_at | Timestamp
----------------------
----------------------
Request Items
----------------------
id | UUID
group_id | UUID
data | JSONB
----------------------
JSONB layout:
id: 3,
group_id: 2
data:
{
"name"=>"Fun Event",
"amount"=>200,
"contact_first_name"=>"Jeff",
"contact_last_name"=>"Person"
}
The following should work. The problem is in how the expression inside array_agg is constructed: in Postgres, ->> and || have the same precedence and associate left to right, so without parentheses everything before the second ->> is concatenated first and ->> is then applied to the resulting text value. That is what the error "operator does not exist: text ->> unknown" reports. Wrapping each extraction in parentheses solves the issue.
Old one: request_items.data->>'contact_first_name'
Updated one: (request_items.data ->> 'contact_first_name')
context.request_groups
.joins(:request_items)
.select("request_groups.id as id,
count(*) as count,
array_agg( (request_items.data ->> 'contact_first_name') || ' ' || (request_items.data ->> 'contact_last_name') ) as names,
sum(cast(request_items.data->>'amount' as float)) as total")
.group(:id)
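To sanity-check what the grouped query should return, the same aggregation can be written over plain Ruby hashes. This is a sketch only: the `summarize` helper is hypothetical, the first item mirrors the question's sample row, and the second item is invented so that the group has more than one member.

```ruby
# Request items as plain hashes. The first mirrors the question's sample;
# the second is made up for illustration.
ITEMS = [
  { group_id: 2, data: { "amount" => 200, "contact_first_name" => "Jeff", "contact_last_name" => "Person" } },
  { group_id: 2, data: { "amount" => 50,  "contact_first_name" => "Ann",  "contact_last_name" => "Example" } }
].freeze

# Hypothetical helper: count, concatenated names and total per group,
# i.e. the shape the SQL aggregate is expected to produce.
def summarize(items)
  items.group_by { |item| item[:group_id] }.map do |group_id, rows|
    {
      id: group_id,
      count: rows.size,
      names: rows.map { |r| "#{r[:data]['contact_first_name']} #{r[:data]['contact_last_name']}" },
      total: rows.sum { |r| r[:data]["amount"].to_f }
    }
  end
end

puts summarize(ITEMS).inspect
```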

Rewhere or unscope a query containing an array condition on Rails

I'm trying to rewhere or unscope a query, where the original condition cannot be written using hash condition:
Reservation.where('block_id IS NULL OR block_id != ?', 'something')
> SELECT `reservations`.* FROM `reservations` WHERE (block_id IS NULL OR block_id != 'something')
Trying to rewhere doesn't work:
Reservation.where('block_id IS NULL OR block_id != ?', 'something').rewhere(block_id: 'anything')
> SELECT `reservations`.* FROM `reservations` WHERE (block_id IS NULL OR block_id != 'something') AND `reservations`.`block_id` = 'anything'
But this example with hash condition would work:
Reservation.where.not(block_id: 'something').rewhere(block_id: 'anything')
> SELECT `reservations`.* FROM `reservations` WHERE `reservations`.`block_id` = 'anything'
I understand that this is probably because, with an array condition, Rails doesn't know which column the where applies to, and therefore rewhere finds nothing to replace.
Is there any way to explicitly tell Rails which column I'm filtering on in an array condition? Or to rewrite the first query (IS NULL OR != value) with a hash condition?
Note: Please don't suggest unscoped, as I'm trying to unscope/rewhere only this specific condition, not the whole query.
Thanks!
Sorry, it wasn't clear that you had other where clauses that you wanted to keep. You could access the array of where clauses using relation.values[:where] and manipulate it, something like:
Reservation.where('block_id IS NULL OR block_id != ?', 'something')
.tap do |relation|
# Depending on your version of Rails you can do
where_values = relation.where_values
# Or
where_values = relation.values[:where]
# With the first probably being better
where_values.delete_if { |where| ... }
end
.where(block_id: 'anything')
aka hacking

Query and order by number of matches in JSON array

Using JSON arrays in a jsonb column in Postgres 9.4 and Rails, I can set up a scope that returns all rows containing any elements from an array passed to the scope method - like so:
scope :tagged, ->(tags) {
where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])
}
I'd also like to order the results based on the number of matched elements in the array.
I appreciate I might need to step outside the confines of ActiveRecord to do this, so a vanilla Postgres SQL answer is helpful too, but bonus points if it can be wrapped up in ActiveRecord so it can be a chain-able scope.
As requested, here's an example table. (Actual schema is far more complicated but this is all I'm concerned about.)
id | data
----+-----------------------------------
1 | {"tags": ["foo", "bar", "baz"]}
2 | {"tags": ["bish", "bash", "baz"]}
3 |
4 | {"tags": ["foo", "foo", "foo"]}
The use case is to find related content based on tags. More matching tags are more relevant, hence results should be ordered by the number of matches. In Ruby I'd have a simple method like this:
Page.tagged(['foo', 'bish', 'bash', 'baz']).all
Which should return the pages in the following order: 2, 1, 4.
Your arrays contain only primitive values; nested documents would be more complicated.
Query
Unnest the JSON arrays of found rows with jsonb_array_elements_text() in a LATERAL join and count matches:
SELECT *
FROM (
SELECT *
FROM tbl
WHERE data->'tags' ?| ARRAY['foo', 'bar']
) t
, LATERAL (
SELECT count(*) AS ct
FROM jsonb_array_elements_text(t.data->'tags') a(elem)
WHERE elem = ANY (ARRAY['foo', 'bar']) -- same array parameter
) ct
ORDER BY ct.ct DESC; -- more expressions to break ties?
Alternative with INTERSECT. It's one of the rare occasions that we can make use of this basic SQL feature:
SELECT *
FROM (
SELECT *
FROM tbl
WHERE data->'tags' ?| '{foo, bar}'::text[] -- alt. syntax w. array
) t
, LATERAL (
SELECT count(*) AS ct
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT ALL
SELECT * FROM unnest('{foo, bar}'::text[]) -- same array literal
) i
) ct
ORDER BY ct.ct DESC;
Note a subtle difference: This consumes each element when matched, so it does not count unmatched duplicates in data->'tags' like the first variant does. For details see the demo below.
This also demonstrates an alternative way to pass the array parameter, as an array literal: '{foo, bar}'. This may be simpler to handle for some clients:
PostgreSQL: Issue with passing array to procedure
Or you could create a server side search function taking a VARIADIC parameter and pass a variable number of plain text values:
Passing multiple values in single parameter
Related:
Check if key exists in a JSON with PL/pgSQL?
Index
Be sure to have a functional GIN index to support the jsonb existence operator ?|:
CREATE INDEX tbl_dat_gin ON tbl USING gin ((data->'tags'));
Index for finding an element in a JSON array
What's the proper index for querying structures in arrays in Postgres jsonb?
Nuances with duplicates
Clarification as per request in the comment. Say, we have a JSON array with two duplicated tags (4 total):
jsonb '{"tags": ["foo", "bar", "foo", "bar"]}'
And search with an SQL array parameter including both tags, one of them duplicated (3 total):
'{foo, bar, foo}'::text[]
Consider the results of this demo:
SELECT *
FROM (SELECT jsonb '{"tags":["foo", "bar", "foo", "bar"]}') t(data)
, LATERAL (
SELECT count(*) AS ct
FROM jsonb_array_elements_text(t.data->'tags') e
WHERE e = ANY ('{foo, bar, foo}'::text[])
) ct
, LATERAL (
SELECT count(*) AS ct_intsct_all
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT ALL
SELECT * FROM unnest('{foo, bar, foo}'::text[])
) i
) ct_intsct_all
, LATERAL (
SELECT count(DISTINCT e) AS ct_dist
FROM jsonb_array_elements_text(t.data->'tags') e
WHERE e = ANY ('{foo, bar, foo}'::text[])
) ct_dist
, LATERAL (
SELECT count(*) AS ct_intsct
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT
SELECT * FROM unnest('{foo, bar, foo}'::text[])
) i
) ct_intsct;
Result:
data                                     | ct | ct_intsct_all | ct_dist | ct_intsct
-----------------------------------------+----+---------------+---------+----------
'{"tags": ["foo", "bar", "foo", "bar"]}' |  4 |             3 |       2 |         2
Comparing elements in the JSON array to elements in the array parameter:
4 tags match any of the search elements: ct.
3 tags in the set intersect (can be matched element-to-element): ct_intsct_all.
2 distinct matching tags can be identified: ct_dist or ct_intsct.
If you don't have dupes or if you don't care to exclude them, use one of the first two techniques. The other two are a bit slower (besides the different result), because they have to check for dupes.
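The four counts from the demo can also be reproduced in plain Ruby, which may make the difference between them easier to see. This is a sketch with hypothetical method names. It uses the same tag array and search array as the demo, and `Array#tally` requires Ruby 2.7 or later.

```ruby
TAGS   = %w[foo bar foo bar].freeze  # the JSON array from the demo
SEARCH = %w[foo bar foo].freeze      # the SQL array parameter from the demo

# ct: every tag that equals any search element (duplicates counted).
def ct(tags, search)
  tags.count { |tag| search.include?(tag) }
end

# ct_intsct_all: multiset intersection, like INTERSECT ALL.
# Each search element can "consume" at most one matching tag.
def ct_intersect_all(tags, search)
  search_counts = search.tally
  tags.tally.sum { |tag, n| [n, search_counts.fetch(tag, 0)].min }
end

# ct_dist: distinct matching tags, like count(DISTINCT e).
def ct_distinct(tags, search)
  tags.select { |tag| search.include?(tag) }.uniq.size
end

# ct_intsct: set intersection, like plain INTERSECT.
def ct_intersect(tags, search)
  (tags & search).size
end

puts [ct(TAGS, SEARCH), ct_intersect_all(TAGS, SEARCH),
      ct_distinct(TAGS, SEARCH), ct_intersect(TAGS, SEARCH)].inspect
```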
I'm posting details of my solution in Ruby, in case it's useful to anyone tackling the same issue.
In the end I decided a scope isn't appropriate, as the method will return an array of objects (not a chainable ActiveRecord::Relation), so I've written a class method and have provided a way to pass a chained scope to it through a block:
def self.with_any_tags(tags, &block)
composed_scope = (
block_given? ? yield : all
).where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])
t = Arel::Table.new('t', ActiveRecord::Base)
ct = Arel::Table.new('ct', ActiveRecord::Base)
arr_sql = Arel.sql "ARRAY[#{ tags.map { |t| Arel::Nodes::Quoted.new(t).to_sql }.join(', ') }]"
any_tags_func = Arel::Nodes::NamedFunction.new('ANY', [arr_sql])
lateral = ct
.project(Arel.sql('e').count(true).as('ct'))
.from(Arel.sql "jsonb_array_elements_text(t.data->'tags') e")
.where(Arel::Nodes::Equality.new Arel.sql('e'), any_tags_func)
query = t
.project(t[Arel.star])
.from(composed_scope.as('t'))
.join(Arel.sql ", LATERAL (#{ lateral.to_sql }) ct")
.order(ct[:ct].desc)
find_by_sql query.to_sql
end
This can be used like so:
Page.with_any_tags(['foo', 'bar'])
# SELECT "t".*
# FROM (
# SELECT "pages".* FROM "pages"
# WHERE data->'tags' ?| ARRAY['foo','bar']
# ) t,
# LATERAL (
# SELECT COUNT(DISTINCT e) AS ct
# FROM jsonb_array_elements_text(t.data->'tags') e
# WHERE e = ANY(ARRAY['foo', 'bar'])
# ) ct
# ORDER BY "ct"."ct" DESC
Page.with_any_tags(['foo', 'bar']) do
Page.published
end
# SELECT "t".*
# FROM (
# SELECT "pages".* FROM "pages"
# WHERE pages.published_at <= '2015-07-19 15:11:59.997134'
# AND pages.deleted_at IS NULL
# AND data->'tags' ?| ARRAY['foo','bar']
# ) t,
# LATERAL (
# SELECT COUNT(DISTINCT e) AS ct
# FROM jsonb_array_elements_text(t.data->'tags') e
# WHERE e = ANY(ARRAY['foo', 'bar'])
# ) ct
# ORDER BY "ct"."ct" DESC
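To see why the lateral subquery counts distinct elements, the sample rows from the question can be ranked in memory. This is a minimal sketch with hypothetical method names; breaking ties by id is an assumption made here so the output is deterministic, since the SQL above does not specify a tie-breaker.

```ruby
# Sample rows from the question: page id => tags (page 3 has no data).
PAGES = {
  1 => %w[foo bar baz],
  2 => %w[bish bash baz],
  3 => nil,
  4 => %w[foo foo foo]
}.freeze

SEARCH_TAGS = %w[foo bish bash baz].freeze

# Shared ranking step: drop pages without matches, sort by match count
# descending, then by id ascending (tie-breaker assumed for this sketch).
def rank(pages)
  pages
    .map { |id, tags| [id, yield(Array(tags))] }
    .reject { |_id, matches| matches.zero? }
    .sort_by { |id, matches| [-matches, id] }
    .map(&:first)
end

# Counts distinct matching tags, like COUNT(DISTINCT e).
def rank_by_distinct_matches(pages, search)
  rank(pages) { |tags| (tags & search).size }
end

# Counts every matching element, like count(*).
def rank_by_all_matches(pages, search)
  rank(pages) { |tags| tags.count { |tag| search.include?(tag) } }
end

puts rank_by_distinct_matches(PAGES, SEARCH_TAGS).inspect
puts rank_by_all_matches(PAGES, SEARCH_TAGS).inspect
```

With distinct counting, page 4 (three copies of "foo") scores 1 and lands last, which matches the order 2, 1, 4 requested in the question. Counting every element would score it 3 and move it ahead of page 1.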

Can you add clauses in a where block conditionally when using Squeel?

To start, I'm using Rails v3.2.9 with Squeel 1.0.13 and here's what I'm trying to do:
I want to search for a client using any of three pieces of identifying information - name, date of birth (dob), and social insurance number (sin). The result set must include any record that has any of the identifier - an OR of the conditions. I have done this in Squeel before and it would look something like:
scope :by_any, ->(sin, name, dob){ where{(client.sin == "#{sin}") | (client.name =~ "%#{name}%") | (client.dob == "#{dob}")} }
This works fine as long as I provide all of the identifiers. But what if I only have a name? The above scope results in:
SELECT "clients".* FROM "clients" WHERE ((("clients"."sin" IS NULL OR "clients"."name" ILIKE '%John Doe%') OR "clients"."dob" IS NULL))
This includes the set of clients where sin is null and the set of clients where dob is null along with the requested set of clients with a name like 'John Doe'.
So enter my attempt to conditionally add clauses to the where block. At first, I tried to check the values using the nil? method:
def self.by_any (sin, name, dob)
where do
(clients.sin == "#{sin}" unless sin.nil?) |
(clients.name =~ "%#{name}" unless name.nil?) |
(clients.dob == "#{dob}" unless dob.nil?)
end
end
which results in:
SELECT "clients".* FROM "clients" WHERE ('t')
raising many other questions, like what's the deal with that 't', but that's a tangent.
Short of writing the where clause for each permutation, is there a way I can conditionally add clauses?
So, this isn't the prettiest thing ever, but it does what you're after.
def self.by_any(sin, name, dob)
where do
[
sin.presence && clients.sin == "#{sin}",
name.presence && clients.name =~ "%#{name}",
dob.presence && clients.dob == "#{dob}"
].compact.reduce(:|)
# compact to remove the nils, reduce to combine the cases with |
end
end
Basically, [a, b, c].reduce(:f) returns (a.f(b)).f(c). In this case f, the method invoked, is the pipe, so we get (a.|(b)).|(c) which, in less confusing notation, is (a | b) | c.
It works because, in Squeel, the predicate operators (==, =~, and so on) return a Predicate node, so we can construct them independently before joining them with |.
In the case where all three are nil, it returns all records.
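The compact-then-reduce pattern does not depend on Squeel and can be tried in plain Ruby. In the sketch below, `Predicate` is a made-up stand-in for a Squeel predicate node that only knows how to OR itself with another node, `present?` is a plain-Ruby substitute for ActiveSupport's `presence` check, and the SQL fragments with named placeholders are purely illustrative.

```ruby
# Hypothetical stand-in for a Squeel predicate node; it only supports `|`.
Predicate = Struct.new(:sql) do
  def |(other)
    Predicate.new("(#{sql} OR #{other.sql})")
  end
end

# Plain-Ruby substitute for ActiveSupport's blank/presence check.
def present?(value)
  !value.nil? && !value.to_s.strip.empty?
end

# Builds one predicate per supplied identifier, drops the nils,
# and folds the rest together with `|`. Returns nil when nothing is given.
def by_any_predicate(sin, name, dob)
  [
    (Predicate.new("clients.sin = :sin") if present?(sin)),
    (Predicate.new("clients.name ILIKE :name") if present?(name)),
    (Predicate.new("clients.dob = :dob") if present?(dob))
  ].compact.reduce(:|)
end

puts by_any_predicate("123", "John Doe", "1990-01-01").sql
puts by_any_predicate(nil, "John Doe", nil).sql
puts by_any_predicate(nil, nil, nil).inspect
```

With a single element, reduce returns that element unchanged, and with an empty array it returns nil, which is why the all-nil case ends up with no condition at all.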
After eventually finding this related post, I cannibalized @bradgonesurfing's alternate pattern to come to this solution:
def self.by_any (sin, name, dob)
queries = Array.new
queries << self.by_sin(sin) unless sin.nil?
queries << self.by_name(name) unless name.nil?
queries << self.by_dob(dob) unless dob.nil?
self.where do
queries = queries.map { |q| id.in q.select{id} }
queries.inject { |s, i| s | i }
end
end
where self.by_sin, self.by_name, and self.by_dob are simple scopes with filters. This produces something along the lines of:
SELECT *
FROM clients
WHERE clients.id IN (<subquery for sin>)
OR clients.id IN (<subquery for name>)
OR clients.id IN (<subquery for dob>)
where the subqueries are only included if their associated value is not nil.
This effectively allows me to union the appropriate scopes together as an ActiveRecord::Relation.
