Query jsonb array for integer member - ruby-on-rails

Background: We use PaperTrail to keep the history of our changing models. Now I want to query for an Item which belonged to a certain customer. PaperTrail optionally stores object_changes, and I need to query this field to find out when something was created with this ID or changed to this ID.
A simplified version of my table looks like this:
item_type | object_changes
----------|----------------------------------------------------------
"Item" | {"customer_id": [null, 5], "other": [null, "change"]}
"Item" | {"customer_id": [4, 5], "other": ["unrelated", "change"]}
"Item" | {"customer_id": [5, 6], "other": ["asht", "asht"]}
How do I query for elements changed from or to ID 5 (so all rows above)? I tried:
SELECT * FROM versions WHERE object_changes->'customer_id' ? 5;
Which got me:
ERROR: operator does not exist: jsonb ? integer
LINE 1: ...T * FROM versions WHERE object_changes->'customer_id' ? 5;
^
HINT: No operator matches the given name and argument type(s).
You might need to add explicit type casts.

For jsonb, the contains operator @> does what you ask for:
Get all rows where the number 5 is an element of the "customer_id" array:
SELECT *
FROM versions
WHERE object_changes->'customer_id' @> '5';
The @> operator expects jsonb as its right operand, or a string literal that is valid for jsonb (while ? expects text). The numeric literal without single quotes you provided in your example (5) cannot be coerced to jsonb (nor to text); it defaults to integer. Hence the error message. Related:
No function matches the given name and argument types
PostgreSQL ERROR: function to_tsvector(character varying, unknown) does not exist
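Outside the database, the containment test above boils down to "does the customer_id array hold an element equal to 5?". The plain-Ruby sketch below models that on the three sample rows. It is only an illustration built on the stdlib `json` library, not how PostgreSQL evaluates `@>`, and `rows_with_customer` is a hypothetical helper. It also shows that `5.to_json` produces the text `5`, which is a valid jsonb literal; binding that string to a `?` placeholder from Rails is one plausible way to pass it in, but that part is an assumption and is not exercised here.

```ruby
require "json"

# The three sample object_changes values from the table above.
ROWS = [
  '{"customer_id": [null, 5], "other": [null, "change"]}',
  '{"customer_id": [4, 5], "other": ["unrelated", "change"]}',
  '{"customer_id": [5, 6], "other": ["asht", "asht"]}'
].map { |text| JSON.parse(text) }

# Emulates object_changes->'customer_id' @> '<id>' for a scalar id:
# an array contains a primitive when one of its elements equals it.
def rows_with_customer(rows, id)
  rows.select { |row| row.fetch("customer_id", []).include?(id) }
end

puts 5.to_json                          # => 5 (also a valid jsonb literal)
puts rows_with_customer(ROWS, 5).size   # => 3
puts rows_with_customer(ROWS, 6).size   # => 1
```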
This can be supported with different index styles. For my query suggested above, use an expression index (specialized, small and fast):
CREATE INDEX versions_object_changes_customer_id_gin_idx ON versions
USING gin ((object_changes->'customer_id'));
This alternative query works, too:
SELECT * FROM versions WHERE object_changes @> '{"customer_id": [5]}';
And can be supported with a general index (more versatile, bigger, slower):
CREATE INDEX versions_object_changes_gin_idx ON versions
USING gin (object_changes jsonb_path_ops);
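Both queries rely on jsonb containment. As a rough mental model of the rules described in the PostgreSQL manual (not its actual implementation), containment can be written as a recursive Ruby function: an object must contain every key of the right side with a contained value, an array must contain every element of the right side somewhere, scalars must be equal, and, as a top-level special case only, an array may contain a bare primitive. Both method names are hypothetical.

```ruby
require "json"

# Recursive containment rules (no top-level special case).
def deep_contains?(left, right)
  case right
  when Hash
    left.is_a?(Hash) &&
      right.all? { |key, value| left.key?(key) && deep_contains?(left[key], value) }
  when Array
    left.is_a?(Array) &&
      right.all? { |r| left.any? { |l| deep_contains?(l, r) } }
  else
    # Scalars: equal values only; a container never equals a scalar.
    !left.is_a?(Hash) && !left.is_a?(Array) && left == right
  end
end

# Top level only: an array may also contain a bare primitive value.
def jsonb_contains?(left, right)
  primitive = !right.is_a?(Hash) && !right.is_a?(Array)
  return left.include?(right) if left.is_a?(Array) && primitive

  deep_contains?(left, right)
end

row = JSON.parse('{"customer_id": [4, 5], "other": ["unrelated", "change"]}')

puts jsonb_contains?(row, JSON.parse('{"customer_id": [5]}'))  # => true
puts jsonb_contains?(row["customer_id"], 5)                   # => true
puts jsonb_contains?(row, JSON.parse('{"customer_id": 5}'))    # => false
```

The last line shows why the general query wraps 5 in an array: inside an object, the special case for bare primitives does not apply.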
Related:
Index for finding an element in a JSON array
Query for array elements inside JSON type
According to the manual, the operator ? searches for any top-level key within the JSON value. Testing indicates that strings in arrays are considered "top-level keys", but numbers are not (keys have to be strings after all). So while this query would work:
SELECT * FROM versions WHERE object_changes->'other' ? 'asht';
Your query looking for a number in an array will not work (even when you quote the input string literal properly). It would only find the (quoted!) string "5", classified as a key, but not the (unquoted) number 5, classified as a value.
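The key-versus-value distinction can be modeled in a few lines of Ruby. This sketch only mimics the behavior described above for objects and arrays (object keys match, string array elements match, numbers never do); scalars are not modeled, it is not PostgreSQL's code, and `jsonb_has_key?` is a made-up name.

```ruby
require "json"

# Mimics jsonb_value ? key for objects and arrays: objects match on their
# keys; arrays match only on string elements, because keys are always strings.
def jsonb_has_key?(value, key)
  case value
  when Hash  then value.key?(key)
  when Array then value.any? { |element| element.is_a?(String) && element == key }
  else false # scalars are not modeled in this sketch
  end
end

changes = JSON.parse('{"customer_id": [5, 6], "other": ["asht", "asht"]}')

puts jsonb_has_key?(changes["other"], "asht")      # => true
puts jsonb_has_key?(changes["customer_id"], "5")   # => false (5 is a number)
puts jsonb_has_key?(changes, "customer_id")        # => true (top-level key)
```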
Aside: Standard JSON only knows four primitives: string, number, boolean and null. There is no integer primitive (even if I have heard of software adding one); integer is just a subset of number, which is implemented as numeric in Postgres:
https://www.postgresql.org/docs/current/static/datatype-json.html#JSON-TYPE-MAPPING-TABLE
So your question title is slightly misleading as there are no "integer" members, strictly speaking.

Use a lateral join and the jsonb_array_elements_text function to process each row's object_changes:
SELECT DISTINCT v.* FROM versions v
JOIN LATERAL jsonb_array_elements_text(v.object_changes->'customer_id') ids ON TRUE
WHERE ids.value::int = 5;
The DISTINCT is only necessary if the customer_id you're looking for could appear multiple times in the array (if a different field changed but customer_id is tracked anyway).
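The lateral join can be pictured as "one joined row per array element, then filter". The Ruby sketch below imitates that on sample data where rows 4 and 5 are hypothetical additions (row 4 holds 5 twice, the situation the DISTINCT note describes). It models `jsonb_array_elements_text` as turning each element into text, treats a JSON null as SQL NULL (which never equals 5), and uses `uniq` in place of `DISTINCT`.

```ruby
require "json"

# Sample versions rows; ids 4 and 5 are hypothetical additions.
versions = [
  { id: 1, object_changes: JSON.parse('{"customer_id": [null, 5]}') },
  { id: 2, object_changes: JSON.parse('{"customer_id": [4, 5]}') },
  { id: 3, object_changes: JSON.parse('{"customer_id": [5, 6]}') },
  { id: 4, object_changes: JSON.parse('{"customer_id": [5, 5]}') },
  { id: 5, object_changes: JSON.parse('{"customer_id": [7, 8]}') }
]

# JOIN LATERAL jsonb_array_elements_text(...): one joined row per element,
# each element rendered as text; a JSON null becomes SQL NULL (nil here).
joined = versions.flat_map do |v|
  v[:object_changes].fetch("customer_id", []).map { |e| [v, e.nil? ? nil : e.to_s] }
end

# WHERE ids.value::int = 5: NULL never compares equal, so those rows drop out.
matches = joined.select { |_v, text| !text.nil? && Integer(text) == 5 }.map(&:first)

p matches.map { |v| v[:id] }       # => [1, 2, 3, 4, 4]
p matches.uniq.map { |v| v[:id] }  # => [1, 2, 3, 4] (what DISTINCT gives you)
```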

Related

Rails PostgreSql store multidimensional array

Is it possible to store a multidimensional array in a column?
I have tried the following and received the error below, which comes from creating the records column.
migration_file.rb
create_table :balance_sheets_details do |t|
  t.string :headers, array: true, default: []
  t.string :records, array: true, default: [[]]
  t.timestamps
end
Raised error
PG::InvalidTextRepresentation: ERROR: malformed array literal: "{{}}"
From the docs on arrays (emphasis added):
The syntax for CREATE TABLE allows the exact size of arrays to be specified, for example:
CREATE TABLE tictactoe (
squares integer[3][3]
);
However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.
The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.
Thus, there isn't really a multidimensional array type. To fix your issue, just change the default from [[]] (sent to PostgreSQL as the malformed literal {{}}) to [] (the empty literal {}).
This means a varchar[][] is the same type as a varchar[]:
db=# select pg_typeof(a), pg_typeof(b) from (values ('{{hello},{world}}'::varchar[][], '{foo}'::varchar[])) x(a, b);
pg_typeof | pg_typeof
---------------------+---------------------
character varying[] | character varying[]
(1 row)
You will still be able to store multidimensional data, though.
A one-dimensional and a two-dimensional array are not the same:
db=# select '{{foo}}'::varchar[] = '{foo}'::varchar[];
?column?
----------
f
(1 row)
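To see where the rejected `{{}}` comes from, here is a toy encoder that maps nested Ruby arrays to PostgreSQL array-literal text: `[]` becomes the valid empty literal `{}`, while `[[]]` becomes `{{}}`. `pg_array_literal` is hypothetical and much simpler than the serializer Rails actually uses; it double-quotes every element and does not handle NULLs.

```ruby
# Hypothetical, simplified encoder from nested Ruby arrays of strings to
# PostgreSQL array-literal text. Rails' real serializer differs.
def pg_array_literal(array)
  items = array.map do |item|
    if item.is_a?(Array)
      pg_array_literal(item)
    else
      # Double-quote every element, escaping backslashes and double quotes.
      '"' + item.to_s.gsub(/["\\]/) { |c| "\\#{c}" } + '"'
    end
  end
  "{#{items.join(',')}}"
end

puts pg_array_literal([])                  # => {}
puts pg_array_literal([[]])                # => {{}}
puts pg_array_literal([%w[a b], %w[c d]])  # => {{"a","b"},{"c","d"}}
```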

Query against a Postgres array column type

TL;DR I'm wondering what the pros and cons are of @> {as_champion, whatever} versus IN ('as_champion', 'whatever'), or whether they are even equivalent. Details below:
I'm working with Rails and using Postgres' array column type, but I have to use raw SQL for my query because the Rails finder methods don't play nicely with it. I found a way that works, but I'm wondering what the preferred method is:
The roles column on the memberships table is my array column. It was added via Rails like so:
add_column :memberships, :roles, :text, array: true
When I examine the table, it shows the type as text[] (not sure if that is truly how Postgres represents an array column or if that is Rails shenanigans).
To query against it I do something like:
Membership.where("roles @> ?", '{as_champion, whatever}')
From the fine Array Operators manual:
Operator: @>
Description: contains
Example: ARRAY[1,4,3] @> ARRAY[3,1]
Result: t (AKA true)
So @> treats its operand arrays as sets and checks whether the right side is a subset of the left side.
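Treating both arrays as sets, the containment check is a subset test. A minimal Ruby sketch, assuming arrays without NULLs (PostgreSQL's NULL handling is not modeled) and using a made-up method name:

```ruby
# Sketch of array @>: every element of the right array must appear
# somewhere in the left array; order and duplicates are ignored.
def pg_array_contains?(left, right)
  (right - left).empty?
end

puts pg_array_contains?([1, 4, 3], [3, 1])  # => true  (the manual's example)
puts pg_array_contains?([1, 2], [2, 4])     # => false (4 is missing)
puts pg_array_contains?([1], [1, 1])        # => true  (duplicates don't matter)
```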
IN is a little different and is used with subqueries:
9.22.2. IN
expression IN (subquery)
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of IN is "true" if any equal subquery row is found. The result is "false" if no equal row is found (including the case where the subquery returns no rows).
or with literal lists:
9.23.1. IN
expression IN (value [, ...])
The right-hand side is a parenthesized list of scalar expressions. The result is "true" if the left-hand expression's result is equal to any of the right-hand expressions. This is a shorthand notation for
expression = value1
OR
expression = value2
OR
...
So a IN b more or less means:
Is the value a equal to any of the values in the list b (which can be a query producing single element rows or a literal list).
Of course, you can say things like:
array[1] in (select some_array from ...)
array[1] in (array[1], array[2,3])
but the arrays in those cases are still treated like single values (that just happen to have some internal structure).
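The shorthand expansion of IN can be mirrored with `Enumerable#any?`. In this sketch `sql_in?` is a hypothetical helper, SQL's three-valued NULL logic is ignored, and arrays are compared as whole values, just as in the `array[1] in (...)` examples:

```ruby
# a IN (v1, v2, ...) is shorthand for a = v1 OR a = v2 OR ...
def sql_in?(value, list)
  list.any? { |candidate| candidate == value }
end

puts sql_in?("as_champion", ["as_champion", "whatever"])  # => true
# Arrays are single values here, compared as a whole:
puts sql_in?([1], [[1], [2, 3]])                          # => true
puts sql_in?([2], [[1], [2, 3]])                          # => false ([2] != [2, 3])
```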
If you want to check whether an array contains any of a list of values, then @> isn't what you want. Consider this:
array[1,2] @> array[2,4]
4 isn't in array[1,2] so array[2,4] is not a subset of array[1,2].
If you want to check if someone has both roles then:
roles @> array['as_champion', 'whatever']
is the right expression but if you want to check if roles is any of those values then you want the overlaps operator (&&):
roles && array['as_champion', 'whatever']
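The difference between "has all of these roles" (@>) and "has any of these roles" (&&) can be sketched with Ruby's array difference and intersection. The method names are hypothetical and NULL elements are not modeled:

```ruby
# roles @> array[...]: every wanted role is present.
def has_all_roles?(roles, wanted)
  (wanted - roles).empty?
end

# roles && array[...]: at least one wanted role is present.
def has_any_role?(roles, wanted)
  !(roles & wanted).empty?
end

roles  = %w[as_champion admin]
wanted = %w[as_champion whatever]

puts has_all_roles?(roles, wanted)  # => false ("whatever" is missing)
puts has_any_role?(roles, wanted)   # => true (both contain "as_champion")
```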
Note that I'm using the "array constructor" syntax for the arrays everywhere. That's because it is much more convenient when working with a tool (such as ActiveRecord) that knows to expand an array into a comma-delimited list when replacing a placeholder but doesn't fully understand SQL arrays.
Given all that, we can say things like:
Membership.where('roles @> array[?]', %w[as_champion whatever])
Membership.where('roles @> array[:roles]', :roles => some_ruby_array_of_strings)
and everything will work as expected. You're still working with little SQL snippets (as ActiveRecord doesn't have a full understanding of SQL arrays or any way of representing the @> operator), but at least you won't have to worry about quoting problems. You could probably go through AREL to manually add @> support, but I find that AREL quickly devolves into an incomprehensible and unreadable mess for all but the most trivial uses.
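The placeholder trick works because a Ruby array bound to `?` is expanded into a comma-separated list of quoted values, which is exactly what an `array[...]` constructor needs. The sketch below imitates that substitution; `quote_value` and `expand_placeholder` are hypothetical stand-ins, not ActiveRecord's implementation, so in real code keep passing the array to `where` and let Rails do the quoting.

```ruby
# Hypothetical quoting helper: wrap in single quotes, double embedded quotes.
def quote_value(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

# Replace the first "?" with the bound value; an Array becomes a quoted,
# comma-separated list, so "array[?]" turns into a valid array constructor.
def expand_placeholder(fragment, value)
  replacement =
    if value.is_a?(Array)
      value.map { |v| quote_value(v) }.join(",")
    else
      quote_value(value)
    end
  fragment.sub("?") { replacement } # block form: no backslash interpretation
end

puts expand_placeholder("roles @> array[?]", %w[as_champion whatever])
# => roles @> array['as_champion','whatever']
```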

JSONB query in Rails for a key that contains an array of hashes

I have a Rails 5 project with a Page model that has a JSONB column content. So the structure looks like this (reduced to the bare minimum for the question):
#<Page id: 46, content: {..., "media" => [{ "resource_id" => 143, "other_key" => "value", ...}, ...], ...}>
How would I write a query to find all pages that have a resource_id of some desired number under the media key of the content JSONB column? This was an attempt that I made which doesn't work (I think because there are other key/value pairs in each item of the array):
Page.where("content -> 'media' @> ?", {resource_id: '143'}.to_json)
EDIT: This works, but will only check the first hash in the media array: Page.where("content -> 'media' -> 0 ->> 'resource_id' = ?", '143')
Using sql, this should give you all pages which have resource id 143:
select * from pages p where '{"resource_id": 143}' <@ ANY ( ARRAY(select jsonb_array_elements ( content -> 'media' ) from pages where id=p.id ) );
PostgreSQL has an ANY construct (see the PostgreSQL docs on array comparisons) of the form expression operator ANY (array). The left-hand expression is evaluated and compared to each element of the array using the given operator.
Since the right-hand side of ANY has to be a PostgreSQL array (not a JSON array), we use the jsonb_array_elements function to expand the content->'media' JSON array into a set of rows, which ARRAY() then collects into an array.
The <@ operator checks whether the expression on the right contains the expression on the left. For example, '{"a": 1}'::jsonb <@ '{"b": 2, "a": 1}'::jsonb returns true.
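To see why this finds a match anywhere in the media array (unlike the `-> 0` attempt in the question's edit), here is a plain-Ruby model of `needle <@ ANY (...)` over hypothetical page data. `object_contained_in?` only handles flat objects, which is enough for `{"resource_id": 143}` but is a simplification of real jsonb containment.

```ruby
require "json"

# Hypothetical pages shaped like the question's content column.
pages = [
  { id: 46, content: JSON.parse('{"media": [{"resource_id": 143, "other_key": "value"}, {"resource_id": 7}]}') },
  { id: 47, content: JSON.parse('{"media": [{"resource_id": 7}, {"resource_id": 143}]}') },
  { id: 48, content: JSON.parse('{"media": [{"resource_id": 8}]}') }
]

# small <@ big for flat objects: every key/value pair of small appears in big.
def object_contained_in?(small, big)
  big.is_a?(Hash) && small.all? { |key, value| big.key?(key) && big[key] == value }
end

needle = { "resource_id" => 143 }

# needle <@ ANY (ARRAY(...)): true if any media element contains the needle.
hits = pages.select do |page|
  page[:content].fetch("media", []).any? { |element| object_contained_in?(needle, element) }
end

p hits.map { |page| page[:id] }  # => [46, 47]
```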

Mongoid order by length of array

How can I sort a Mongoid model by the length of an array field inside the model?
Mongo documentation says:
You cannot use $size to find a range of sizes (for example: arrays
with more than 1 element). If you need to query for a range, create an
extra size field that you increment when you add elements. Indexes
cannot be used for the $size portion of a query, although if other
query expressions are included indexes may be used to search for
matches on that portion of the query expression.
So we cannot order by Mongo's $size.
You can solve your task by adding new field, which will store array size.
class Post
  include Mongoid::Document
  field :likes, type: Array, default: []
  field :likes_size, type: Integer
  before_save do
    self.likes_size = likes.size
  end
end
Sort posts by likes_size:
Post.order_by(likes_size: :desc)
The documentation says that you can't order by size.
Try adding a new field containing the array's size and sort by that instead; it will work like ordering by size.
In Ruby, you can sort an array like this:
my_array.sort_by(&:my_attr)
It will sort the array my_array by the attribute my_attr of each element inside the array.
You can also write it like this:
my_array.sort_by{|element| element.my_attr }
This is exactly the same: it sorts by the my_attr attribute of each element. The second syntax is for when you want a more complex sort condition than just the result of calling a method on each element.
Documentation: http://ruby-doc.org/core-2.3.1/Enumerable.html#method-i-sort_by
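Applied to the question, `sort_by` can order records by array length in memory. The sketch uses a plain `Struct` with hypothetical data in place of the Mongoid model. Because this loads and sorts everything in Ruby, the stored likes_size field from the first answer is likely the better fit for large collections, where the database can do the sorting.

```ruby
# Plain Struct standing in for the Mongoid model (hypothetical data).
PostRecord = Struct.new(:title, :likes)

posts = [
  PostRecord.new("a", %w[u1]),
  PostRecord.new("b", %w[u1 u2 u3]),
  PostRecord.new("c", [])
]

# Block form: most likes first (negating the size gives descending order).
by_likes = posts.sort_by { |post| -post.likes.size }
p by_likes.map(&:title)                # => ["b", "a", "c"]

# Symbol-to-proc form, same as posts.sort_by { |post| post.title }.
p posts.sort_by(&:title).map(&:title)  # => ["a", "b", "c"]
```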

sanitize_sql_array is adding extra, unnecessary quotes to query

This is the first time I've seen this issue. I'm building up an SQL array to run through sanitize_sql_array and Rails is adding extra, unnecessary single quotes in the return value. So instead of returning:
SELECT DISTINCT data -> 'Foo' from products
it returns:
SELECT DISTINCT data -> ''Foo'' from products
which of course Postgres doesn't like.
Here is the code:
sql_array = ["SELECT DISTINCT %s from products", "data -> 'Foo'"]
sql_array = sanitize_sql_array(sql_array)
connection.select_values(sql_array)
Note the same thing happens when I use the shorter and more usual:
sql_array = ["SELECT DISTINCT %s from products", "data -> 'Foo'"]
connection.select_values(send(:sanitize_sql_array, sql_array))
Ever seen this before? Does it have something to do with using HStore? I definitely need that string sanitized since the string Foo is actually coming from a user-entered variable.
Thanks!
You're giving sanitize_sql_array a string that contains an hstore expression and expecting it to understand that the string contains some hstore syntax. That's asking far too much: sanitize_sql_array only knows about simple things like strings and numbers; it doesn't know how to parse PostgreSQL's SQL extensions or even standard SQL. How would you expect sanitize_sql_array to tell the difference between, for example, a string that happens to contain '11 * 23' and a string that is supposed to represent the arithmetic expression 11 * 23?
You should split your data -> 'Foo' into two pieces so that sanitize_sql_array only sees the string part when it is sanitizing things:
sql_array = [ 'select distinct data -> ? from products', 'Foo' ]
sql = sanitize_sql_array(sql_array)
That will give you the SQL you're looking for:
select distinct data -> 'Foo' from products
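The doubled quotes come from the two placeholder styles being filled in differently: with a `%s` statement, each value is escaped with the connection's `quote_string` but not wrapped in quotes, while `?` placeholders receive fully quoted values. The sketch below mimics both paths; `quote_string` here is a simplified stand-in that only doubles single quotes, and the two `fill_*` helpers are hypothetical.

```ruby
# Simplified stand-in for the adapter's quote_string: double single quotes.
def quote_string(text)
  text.gsub("'", "''")
end

# "%s" style: values are escaped but NOT wrapped in quotes.
def fill_percent_s(statement, *values)
  statement % values.map { |v| quote_string(v.to_s) }
end

# "?" style: each value is escaped AND wrapped in single quotes (one pass).
def fill_question_marks(statement, *values)
  pending = values.dup
  statement.gsub("?") { "'#{quote_string(pending.shift.to_s)}'" }
end

puts fill_percent_s("SELECT DISTINCT %s from products", "data -> 'Foo'")
# => SELECT DISTINCT data -> ''Foo'' from products
puts fill_question_marks("select distinct data -> ? from products", "Foo")
# => select distinct data -> 'Foo' from products
```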
