Building an ActiveRecord / SQL query for jsonb value search - ruby-on-rails

Currently, for a recurring search with different parameters, I have this ActiveRecord query built:
current_user.documents.order(:updated_at).reverse_order.includes(:groups,:rules)
Now, usually I tack a where clause onto this to perform the search. However, I now need to search through the jsonb field for all rows that have a certain value in a key:value pair. I've been able to do something similar to that in SQL with this syntax (the data field will only ever be exactly two levels nested):
SELECT *
FROM (
    SELECT *
    FROM (
        SELECT * FROM documents
    ) A,
    jsonb_each(A.data)
) B,
jsonb_each_text(B.value) AS C
WHERE C.value = '30';
However, I want to use the current ActiveRecord search to make this query (which includes the groups/rules eager loading).
I'm struggling with the use of the comma, which I understand is an implicit join that is executed before explicit joins, so when I try something like this:
select * from documents B join (select * from jsonb_each(B.data)) as A on true;
ERROR: invalid reference to FROM-clause entry for table "b"
LINE 1: ...* from documents B join (select * from jsonb_each(B.data)) a...
^
HINT: There is an entry for table "b", but it cannot be referenced from this part of the query.
But I don't understand how to reference the complete "table" that my ActiveRecord query creates before I make a joins call, nor how to make use of the comma syntax for implicit joins.
Also, I'm an SQL amateur, so if you see some improvements or other ways to do this, please do tell.
EDIT: Description of documents table:
Table "public.documents"
Column | Type | Modifiers | Storage | Stats target | Description
------------+-----------------------------+--------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('documents_id_seq'::regclass) | plain | |
document_id | character varying | | extended | |
name | character varying | | extended | |
size | integer | | plain | |
last_updated| timestamp without time zone | | plain | |
user_id | integer | | plain | |
created_at | timestamp without time zone | | plain | |
updated_at | timestamp without time zone | | plain | |
kind | character varying | | extended | |
uid | character varying | | extended | |
access_token_id | integer | | plain | |
data | jsonb | not null default '{}'::jsonb | extended | |
Indexes:
"documents_pkey" PRIMARY KEY, btree (id)
Sample rows, first would match a search for '30' (data is the last field):
2104 | 24419693037 | LsitHandsBackwards.jpg | | | 1 | 2017-06-25 21:45:49.121686 | 2017-07-01 21:32:37.624184 | box | 221607127 | 15 | {"owner": {"born": "to make history", "price": 30}}
2177 | /all-drive/uml flows/typicaluseractivity.svg | TypicalUserActivity.svg | 12375 | 2014-08-11 02:21:14 | 1 | 2017-07-07 14:00:11.487455 | 2017-07-07 14:00:11.487455 | dropbox | 325694961 | 20 | {"owner": {}}

You can use a query similar to the one you already showed:
SELECT
    d.id, d.data
FROM
    documents AS d
    INNER JOIN jsonb_each(d.data) AS x ON TRUE
    INNER JOIN jsonb_each(x.value) AS y ON TRUE
WHERE
    cast(y.value AS text) = '30';
Assuming your data is the following:
INSERT INTO documents
(data)
VALUES
('{"owner": {"born": "to make history", "price": 30}}'),
('{"owner": {}}'),
('{"owner": {"born": "to make history", "price": 50}, "seller": {"worth": 30}}')
;
The result you'd get is:
id | data
-: | :---------------------------------------------------------------------------
1 | {"owner": {"born": "to make history", "price": 30}}
3 | {"owner": {"born": "to make history", "price": 50}, "seller": {"worth": 30}}
You can check it (together with some step-by-step looks at the data) at dbfiddle here
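A note on the error from the question: a plain sub-select inside a JOIN cannot reference columns of tables to its left, which is why wrapping jsonb_each(B.data) in a parenthesised SELECT failed, while a set-returning function written directly in the FROM list can (PostgreSQL treats it as LATERAL automatically). The same search can also be written with explicit LATERAL joins; the following is only a sketch of that variant, with DISTINCT added in case several nested values in one row match:
-- Explicit LATERAL form: each function call may reference columns
-- produced by the FROM items before it (d.data, then x.value).
SELECT DISTINCT d.id, d.data
FROM documents AS d
CROSS JOIN LATERAL jsonb_each(d.data) AS x
CROSS JOIN LATERAL jsonb_each_text(x.value) AS y
WHERE y.value = '30';
On the Rails side, a fragment like this can in principle be attached to the existing relation by passing the join as a raw SQL string to joins and the y.value = '30' condition to where, since ActiveRecord accepts SQL fragments in both; that integration is untested here.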

Related

Influx: doing math on the same fields in different groups

I have an InfluxDB measurement currently set up with the following "schema":
+----+-------------+-----------+
| ts | cost(field) | type(tag) |
+----+-------------+-----------+
| 1 | 10 | 'a' |
| 1 | 20 | 'b' |
| 2 | 12 | 'a' |
| 2 | 18 | 'b' |
| 2 | 22 | 'c' |
+----+-------------+-----------+
I am trying to write a query that will group my table by timestamp and get the delta between field values of two different tags. If I want to get the delta between tag 'a' and tag 'b', it will give me the following result (please note that I ignore tag 'c'):
+----+-----------+------------+
| ts | type(tag) | delta_cost |
+----+-----------+------------+
| 1 | 'a' | 10 |
| 2 | 'b' | 6 |
+----+-----------+------------+
Is it something Influx can do or am I using the wrong tool?
Just managed to answer my own question. While one of the obvious ways would be performing a self-join, Influx does not support joins anymore. We can, however, use nested selects in the following format:
SELECT MEAN(cost_a) - MEAN(cost_b) AS delta_cost
FROM
    (SELECT cost AS cost_a FROM tablename WHERE type = 'a'),
    (SELECT cost AS cost_b FROM tablename WHERE type = 'b')
GROUP BY time(60s)
Since I am getting my data every 60 seconds anyway, and I have a guarantee of just one point per tag per 60 seconds, I can use GROUP BY and take MEAN without any problems.

Formatting JSON table from Postgresql request

I'm trying to create a JSON format from a PostgreSQL request.
First I used Rails to query my database in the format.json block of my controller and then used a json.builder file to format the JSON view. It worked until my requests returned hundreds of thousands of rows, so I looked into how to optimize the JSON creation, avoiding the whole ActiveRecord stack.
To do this I am using PostgreSQL 9.6 JSON functions to get my data directly in the right format, which is for example:
SELECT array_to_json('{{1157241840,-1.95},{1157241960,-1.96}}'::float[]);
[[1157241840, -1.95], [1157241960, -1.96]]
But using data from this kind of request:
SELECT date,value FROM measures;
The best I could obtain was something like this:
SELECT array_to_json(array_agg(t)) FROM (SELECT date,value FROM measures) t;
Resulting in :
[
{"date":"1997-06-13T19:12:00","value":1608.4},
{"date":"1997-06-13T19:12:00","value":-0.6}
]
which is quite different... How would you build this SQL request?
Thanks for your help!
My measures table looks like this:
id | value | created_at | updated_at | parameter_id | quality_id | station_id | date | campain_id | elevation | sensor_id | comment_id
--------+-------+----------------------------+----------------------------+--------------+------------+------------+---------------------+------------+-----------+-----------+------------
799634 | -1.99 | 2017-02-21 09:41:09.062795 | 2017-02-21 09:41:09.118807 | 2 | | 1 | 2006-06-26 23:24:00 | 1 | -5.0 | |
1227314 | -1.59 | 2017-02-21 09:44:12.032576 | 2017-02-21 09:44:12.088311 | 2 | | 1 | 2006-11-30 19:48:00 | 1 | -5.0 | |
1227315 | 26.65 | 2017-02-21 09:44:12.032576 | 2017-02-21 09:44:12.088311 | 3 | | 1 | 2006-11-30 19:48:00 | 1 | -5.0 | |
If you need an array of arrays, you need to use json_build_array:
SELECT json_agg(json_build_array(date,value)) FROM measures;
If you want to convert the timestamp to epoch:
SELECT json_agg(json_build_array(extract(epoch FROM date)::int8, value)) FROM measures;
For a test:
WITH measures AS (
SELECT 1157241840 as date, -1.95 as value
UNION SELECT 1157241960, -1.96
UNION SELECT 1157241980, NULL
)
SELECT json_agg(json_build_array(date,value)) FROM measures;
json_agg
----------------------------------------------------------------
[[1157241840, -1.95], [1157241960, -1.96], [1157241980, null]]
create table measures (date timestamp, value float);
insert into measures (date, value) values
(to_timestamp(1157241840),-1.95),
(to_timestamp(1157241960),-1.96);
select array_to_json(array_agg(array[extract(epoch from date), value]::float[]))
from measures
;
array_to_json
-----------------------------------------
[[1157241840,-1.95],[1157241960,-1.96]]
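One detail neither variant mentions: the element order inside json_agg (or array_agg) is not guaranteed unless you ask for it, which usually matters for a time series. A sketch of the same aggregation with an ORDER BY inside the aggregate:
-- Aggregate in timestamp order so the resulting JSON array is sorted by date.
SELECT json_agg(
    json_build_array(extract(epoch FROM date)::int8, value)
    ORDER BY date
)
FROM measures;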

Query completed with an empty output

https://docs.google.com/spreadsheets/d/1033hNIUutMjjdwiZZ40u59Q8DvxBXYr7pcWyRRHAdXk
That's a link to the file in which it is not working! If you open it, go to the sheet named "My query stinks".
The sheet called deposits has data like this in columns A (date), B (description), and C (amount):
+---+-----------+-----------------+---------+
| | A | B | C |
+---+-----------+-----------------+---------+
| 1 | 6/29/2016 | 1000000044 | 480 |
| 2 | 6/24/2016 | 1000000045 | 359.61 |
| 3 | 8/8/2016 | 201631212301237 | 11.11 |
+---+-----------+-----------------+---------+
The sheet "My Query Stinks" has data in columns A (check number), B (failing query) and C (amount):
+---+-----------------+------+--------+
| | A | B | C |
+---+-----------------+------+--------+
| 1 | 1000000044 | #N/A | 480 |
| 2 | 1000000045 | #N/A | 359.61 |
| 3 | 201631212301237 | #N/A | 11.11 |
+---+-----------------+------+--------+
In Column B on My Query Stinks, I want to enter a query. Here's what I'm trying:
=query(Deposits!A:C,"select A where A =" & A2)
For some reason, it returns "#N/A Error Query completed with an empty output." I want it to find that 1000000044 (the value in C4) matches 1000000044 over on Deposits and return the date.
Try
=query(Deposits!A:C,"select A where B ='" &A2&"'")
Explanation
Values like 1000000044 in Column B of the Deposits sheet and Column A of the My Query Stinks sheet are stored as text (string) values, so they should be enclosed in single quotes (apostrophes); otherwise QUERY thinks these values are numbers or variable names.
Try this:
=query(Deposits!A:C,"select A where B = '"&A2&"' LIMIT 1")
You'll need LIMIT 1 as you have multiple deposits for the same value in your second column.
Another solution for this problem could be to replace '=' with 'contains':
=query(Deposits!A:C,"select A where B contains '" &A2&"'")
Simple, but this error cost me half a morning.

Designing a Core Data managed object model for an iOS app that creates dynamic databases

I'm working on an iPhone app for users to create mini databases. The user can create a custom database schema and add columns with the standard data types (e.g. string, number, boolean) as well as other complex types such as objects and collections of a data type (e.g. an array of numbers).
For example, the user can create a database to record his meals.
Meal database:
[
{
"timestamp": "2013-03-01T13:00:00",
"foods": [1, 2],
"location": {
"lat": 47.253603,
"lon": -122.442537
}
}
]
Meal-Food database:
[
{
"id": 1,
"name": "Taco",
"healthRating": 0.5
},{
"id": 2,
"name": "Salad",
"healthRating": 0.8
}
]
What is the best way to implement a database for an app like this?
My current solution is to create the following database schema for the app:
When the user creates a new database schema as in the example above, the definition table will look like this:
+----+-----------+--------------+------------+-----------------+
| id | parent_id | name | data_type | collection_type |
+----+-----------+--------------+------------+-----------------+
| 1 | | meal | object | |
| 2 | 1 | timestamp | timestamp | |
| 3 | 1 | foods | collection | list |
| 4 | 1 | location | location | |
| 5 | | food | object | |
| 6 | 5 | name | string | |
| 7 | 5 | healthRating | number | |
+----+-----------+--------------+------------+-----------------+
When the user populates the database, the record table will look like this:
+----+-----------+---------------+------------------------+-----------+-----+
| id | parent_id | definition_id | string_value | int_value | ... |
+----+-----------+---------------+------------------------+-----------+-----+
| 1 | | 1 | | | |
| 2 | 1 | | 2013-03-01T13:00:00 | | |
| 3 | 1 | 2 | | 1 | |
| 4 | 1 | 2 | | 2 | |
| 5 | 1 | 4 | 47.253603, -122.442537 | | |
+----+-----------+---------------+------------------------+-----------+-----+
More details about this approach:
Values for different data types are stored in different columns in the record table. It is up to the app to parse values correctly (e.g. converting timestamp int_value into a date object).
Constraints and validation must be performed in the app, as they are not possible at the database level.
What are other drawbacks with this approach and are there better solutions?
First of all, your Record table is very inefficient and somewhat hard to work with. Instead you can have separate record tables for each record type you need to support. That will simplify everything a lot and add some flexibility, because it will not be a problem to introduce support for a new record type.
With that said, we can conclude it will be enough to have basic table management to make your system functional. Naturally, there is the ALTER TABLE command:
but in some cases it might be very expensive and some engines have various limitations. For example:
SQLite supports a limited subset of ALTER TABLE. The ALTER TABLE
command in SQLite allows the user to rename a table or to add a new
column to an existing table.
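As a rough illustration of the per-record-type idea (only a sketch; the table and column names below are invented for the meal example, not taken from the answer), each user-defined type could get its own SQLite table generated from the definition rows, and the limited ALTER TABLE quoted above is still enough to add a column to a type later:
-- Hypothetical per-type tables generated from the meal/food definitions.
CREATE TABLE meal (
    id        INTEGER PRIMARY KEY,
    timestamp TEXT,   -- ISO-8601 string, parsed by the app
    lat       REAL,
    lon       REAL
);
CREATE TABLE food (
    id            INTEGER PRIMARY KEY,
    name          TEXT,
    health_rating REAL
);
-- Collection columns (e.g. "foods") become a join table.
CREATE TABLE meal_food (
    meal_id INTEGER REFERENCES meal(id),
    food_id INTEGER REFERENCES food(id)
);
-- Adding a field to a type later only needs the ALTER TABLE subset SQLite supports.
ALTER TABLE meal ADD COLUMN notes TEXT;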
Another approach might be to use BLOBs with some type tags in order to store record values.
This approach will reduce the need to support separate tables. It leads us to a schemaless approach.
Do you absolutely have to use CoreData for this?
It might make more sense to use a schema-less solution, such as http://developer.couchbase.com/mobile/develop/references/couchbase-lite/release-notes/iOS/index.html

[AnyDac][DApt]-400 But my tables do have a PK

[anydac][DApt]-400.Fetch command fetched[0] instead of [1] record,
Possible reasons:update table does not have PK or row identifier,record has been changed/deleted by another user,
when executing
SingleTestRunADQuery.Append();
SingleTestRunADQuery.FieldByName('run_id').Value := StartRecordingButton.Tag;
SingleTestRunADQuery.FieldByName('ph_value').Value := FloatToStr(ph_reading);
SingleTestRunADQuery.FieldByName('conductivity_value').Value := conductivity_reading;
SingleTestRunADQuery.FieldByName('cod_value').Value := cod_reading;
SingleTestRunADQuery.Post();
on
mysql> describe measurements;
+------------------------+-----------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+-----------+------+-----+-------------------+-------+
| run_id | int(11) | NO | MUL | NULL | |
| measurement_time_stamp | timestamp | NO | PRI | CURRENT_TIMESTAMP | |
| ph | float | NO | | NULL | |
| conductivity | float | NO | | NULL | |
| cod | float | NO | | NULL | |
+------------------------+-----------+------+-----+-------------------+-------+
5 rows in set (0.03 sec)
as you can see, the table does have a PK. Also, the program is single-threaded and only one copy is running, so no one else is updating.
I set SingleTestRunADQuery.MasterFields=run_id and IndexFieldNames=run_id, as that is the PK of the table which holds a summary of all test runs. The second table holds the measurements taken during tests, with run_id giving all the measurements for one test run (I only added a PK on timestamp to get rid of this error, but it didn't work and can be removed, I guess).
In case it helps, here's the master data source:
mysql> describe test_runs;
+------------------+-------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-------------+------+-----+-------------------+----------------+
| run_id | int(11) | NO | PRI | NULL | auto_increment |
| start_time_stamp | timestamp | NO | | CURRENT_TIMESTAMP | |
| end_time_stamp | timestamp | YES | | NULL | |
| description | varchar(64) | YES | | NULL | |
+------------------+-------------+------+-----+-------------------+----------------+
4 rows in set (0.05 sec)
Any idea what's wrong?
[Update] @mj2008 points out that some fields have different names. This is for historical reasons (I am still trying something out and don't want to change yet); however, these are adapted by the query:
SELECT run_id,
measurement_time_stamp,
ph as ph_value,
conductivity as conductivity_value,
cod as cod_value
FROM photo_catalytic.measurements
ORDER BY measurement_time_stamp DESC
I'm not sure it is correct to have a TIMESTAMP field as PRIMARY KEY. It will automatically change on every UPDATE.
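If the table really needs a stable row identifier, one option (a sketch only, not part of the original answer) is to add a surrogate auto-increment key and drop the timestamp primary key:
-- Hypothetical MySQL change: give measurements a surrogate primary key
-- so the data access layer can identify individual rows.
ALTER TABLE measurements
    DROP PRIMARY KEY,
    ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;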
Check the TFields property of the Query component through the "Fields Editor" option, and check that the key fields have the ProviderFlags.pfInKey property set to true.
This also applies to FireDAC components today.
You should change the connection properties as follows: Try setting UpdateOptions.RefreshMode to rmManual.
Check whether the timestamp in the database is missing its seconds and milliseconds.
