accessing list from json and iterating in postgres - postgresql-9.6

I have a json as mentioned below,
{
"list": [{
"notificationId": 123,
"userId": 444
},
{
"notificationId": 456,
"userId": 789
}
]
}
I need to write a postgres procedure which interates through the list and perform either update/insert based on notification id is already present or not in DB.
I have a notification table which has notificationid and userID as columns.
Can anyone please tell me on how to perform this using postgres json operators.

Try this query:
SELECT *
FROM yourTable
WHERE col->'list'#>'[{"notificationId":123}]';
You may replace the value 123 with whatever notificationId you want to search. Follow the link below for a demo showing that this logic works:
Demo

Assuming you have a unique constraint on notificationid (e.g. because it's the primary key, there is no need for stored function or loop:
with data (j) as (
values ('
{
"list": [{
"notificationId": 123,
"userId": 444
},
{
"notificationId": 456,
"userId": 789
}
]
}'::jsonb)
)
insert into notification (notificationid, userid)
select (e.r ->> 'notificationId')::int, (e.r ->> 'userId')::int
from data d, jsonb_array_elements(d.j -> 'list') as e(r)
on conflict (notificationid) do update
set userid = excluded.userid;
The first step in that statement is to turn the array into a list of rows, this is what:
select e.*
from data d, jsonb_array_elements(d.j -> 'list') as e(r)
does. Given your sample JSON, this returns two rows with a JSON value in each:
r
--------------------------------------
{"userId": 444, "notificationId": 123}
{"userId": 789, "notificationId": 456}
This is then split into two integer columns:
select (e.r ->> 'notificationId')::int, (e.r ->> 'userId')::int
from data d, jsonb_array_elements(d.j -> 'list') as e(r)
So we get:
int4 | int4
-----+-----
123 | 444
456 | 789
And this result is used as the input for an INSERT statement.
The on conflict clause then does an insert or update depending on the presence of the row identified by the column notificationid which has to have a unique index.

Meanwhile i tried this,
CREATE OR REPLACE FUNCTION insert_update_notifications(notification_ids jsonb) RETURNS void AS
$$
DECLARE
allNotificationIds text[];
indJson jsonb;
notIdCount int;
i json;
BEGIN
FOR i IN SELECT * FROM jsonb_array_elements(notification_ids)
LOOP
select into notIdCount count(notification_id) from notification_table where notification_id = i->>'notificationId' ;
IF(notIdCount = 0 ) THEN
insert into notification_table(notification_id,userid) values(i->>'notificationId',i->>'userId');
ELSE
update notification_table set userid = i->>'userId' where notification_id = i->>'notificationId';
END IF;
END LOOP;
END;
$$
language plpgsql;
select * from insert_update_notifications('[{
"notificationId": "123",
"userId": "444"
},
{
"notificationId": "456",
"userId": "789"
}
]');
It works.. Please review this.

Related

How to do CosmosDB JOIN to get an array of Column values

I have following data in cosmosDb
[
{
accountId : "12453"
tag : "abc"
},
{
accountId : "12453"
tag : "bcd"
},
{
accountId : "34567"
tag : "qwe"
},
{
accountId : "34567"
tag : "xcx"
}
]
And desired output is something like this
[
{
accountId : "123453"
tag : {
"abc",
"bcd"
},
{
accountId : "34567"
tag : {
"qwe",
"xcx"
}
]
I tried Join, array and group by in multiple ways but didn't work.
Below query gives similar output but rather than count I am looking for an array of tags
select c.accountId, count(c.tag) from c where c.accountId = "12453" group BY c.accountId
Also tried below query
select c.accountId, ARRAY(select c.tag from c IN f where f.accountId = "12453") as tags FROM f
This doesn't return any tag values and I get below output but I need distinct accountId and when I try DISTINCT on accountId, it gives an error.
[
{
accountId : "12,
tag : []
}
........
]
Can someone please help with correct query syntax
As M0 B said, this is not supported. Cosmos DB can't combine arrays across documents. You need to handle this on your client side.

How to show AVRO data from one field in two columns?

Let's say I have an AVRO file that was written as a record with one column named c1.
When I read this file with the schema it was written with I get all the data from c1.
Now, is it possible to read exactly that file with another schema, that has for example three columns? One column named c0 with a default value of null, one column named c1 that returns all the values as with the first schema. And one column named c2 with an alias of c1 that also returns all values of c1?
This is probably implementation dependent. If using JavaScript is an option, avsc will let you do what you want. For example, if you write an Avro file using your first schema...
const writerSchema = {
type: 'record',
name: 'Foo',
fields: [{name: 'c1', type: 'int'}]
};
const encoder = avro.createFileEncoder('data.avro', writerSchema);
// Write a little data.
encoder.write({c1: 123});
encoder.write({c1: 45});
encoder.end({c1: 6789});
...then you can read it with the second schema simply as...
const readerSchema = {
type: 'record',
name: 'Foo',
fields: [
{name: 'c0', type: ['null', 'int'], 'default': null},
{name: 'c1', type: 'int'},
{name: 'c2', aliases: ['c1'], type: 'int'},
]
};
// Decode the file and print out its data.
avro.createFileDecoder('data.avro', {readerSchema})
.on('data', (val) => { console.log(val); });
...and the output will have the form you expect:
Foo { c0: null, c1: 123, c2: 123 }
Foo { c0: null, c1: 45, c2: 45 }
Foo { c0: null, c1: 6789, c2: 6789 }

Temporary table creation takes long time RDS Server

Below query takes long time to create temporary table, its only have "228000" distinct record.
DECLARE todate,fromdate DATETIME;
SET fromdate=DATE_SUB(UTC_TIMESTAMP(),INTERVAL 2 DAY);
SET todate=DATE_ADD(UTC_TIMESTAMP(),INTERVAL 14 DAY);
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
DROP TEMPORARY TABLE IF EXISTS tempabc;
SET max_heap_table_size = 1024*1024*1024;
CREATE TEMPORARY TABLE IF NOT EXISTS tempabc
-- (index using BTREE(id))
ENGINE=MEMORY
AS
(
SELECT SQL_NO_CACHE DISTINCT id
FROM abc
WHERE StartTime BETWEEN fromdate AND todate
);
I already created index on 'startTime' coulmn, still it tooks 20 sec to create table. Kindly help me out to reduce the creation time.
More Info:-
I changed my query earlier I was using "tempabc" temporary table to get my output, now I am using IN clause instead of temporary table and now it is taking 12 sec to execute, but still more than expected time..
Earlier(taking 20-30 sec)
DECLARE todate,fromdate DATETIME;
SET fromdate=DATE_SUB(UTC_TIMESTAMP(),INTERVAL 2 DAY);
SET todate=DATE_ADD(UTC_TIMESTAMP(),INTERVAL 14 DAY);
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
DROP TEMPORARY TABLE IF EXISTS tempabc;
SET max_heap_table_size = 1024*1024*1024;
CREATE TEMPORARY TABLE IF NOT EXISTS tempabc
-- (index using BTREE(id))
ENGINE=MEMORY
AS
(
SELECT SQL_NO_CACHE DISTINCT id
FROM abc
WHERE StartTime BETWEEN fromdate AND todate
);
SELECT DISTINCT p.xyzID
FROM tempabc s
JOIN xyz_tab p ON p.xyzID=s.ID AND IFNULL(IsGeneric,0)=0;
Now(taking 12-14 sec)
DECLARE todate,fromdate Timestamp;
SET fromdate=DATE_SUB(UTC_TIMESTAMP(),INTERVAL 2 DAY);
SET todate=DATE_ADD(UTC_TIMESTAMP(),INTERVAL 14 DAY);
SELECT p.xyzID FROM xyz_tab p
WHERE id IN (
SELECT DISTINCT id FROM abc
WHERE StartTime BETWEEN fromdate AND todate )
AND IFNULL(IsGeneric,0)=0 GROUP BY p.xyxID;
But we need to achieve 3-5 sec of execution time.
This is my explain output.
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: abc
partitions: NULL
type: index
possible_keys: ix_starttime_id,IDX_Start_time,IX_id_starttime,IX_id_starttime_prgsvcid
key: IX_id_starttime
key_len: 163
ref: NULL
rows: 18779876
filtered: 1.27
Extra: Using where; Using index; Using temporary; Using filesort; LooseScan
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: p
partitions: NULL
type: eq_ref
possible_keys: PRIMARY,IX_seriesid
key: PRIMARY
key_len: 152
ref: onconnectdb.abc.ID
rows: 1
filtered: 100.00
Extra: Using where
Explain in JSON format
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "10139148.44"
},
"grouping_operation": {
"using_temporary_table": true,
"using_filesort": true,
"cost_info": {
"sort_cost": "1.00"
},
"nested_loop": [
{
"table": {
"table_name": "abc",
"access_type": "index",
"possible_keys": [
"ix_starttime_tmsid",
"IDX_Start_time",
"IX_id_starttime",
"IX_id_starttime_prgsvcid"
],
"key": "IX_id_starttime",
"used_key_parts": [
"ID",
"StartTime",
"EndTime"
],
"key_length": "163",
"rows_examined_per_scan": 19280092,
"rows_produced_per_join": 264059,
"filtered": "1.37",
"using_index": true,
"loosescan": true,
"cost_info": {
"read_cost": "393472.45",
"eval_cost": "52812.00",
"prefix_cost": "446284.45",
"data_read_per_join": "2G"
},
"used_columns": [
"ID",
"StartTime"
],
"attached_condition": "(`onconnectdb`.`abc`.`StartTime` between <cache>(fromdate#1) and <cache>(todate#0))"
}
},
{
"table": {
"table_name": "p",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY",
"IX_seriesid"
],
"key": "PRIMARY",
"used_key_parts": [
"ID"
],
"key_length": "152",
"ref": [
"onconnectdb.abc.ID"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"cost_info": {
"read_cost": "9640051.00",
"eval_cost": "0.20",
"prefix_cost": "10139147.44",
"data_read_per_join": "2K"
},
"used_columns": [
"ID",
"xyzID",
"IsGeneric"
],
"attached_condition": "(ifnull(`onconnectdb`.`p`.`IsGeneric`,0) = 0)"
}
}
]
}
}
}
Please suggest.

Rails PSQL query JSON for nested array and objects

So I have a json (in text field) and I'm using postgresql and I need to query the field but it's nested a bit deep. Here's the format:
[
{
"name":"First Things",
"items":[
{
"name":"Foo Bar Item 1",
"price":"10.00"
},
{
"name":"Foo Item 2",
"price":"20.00"
}
]
},
{
"name":"Second Things",
"items": [
{
"name":"Bar Item 3",
"price":"15.00"
}
]
}
]
And I need to query the name INSIDE the items node. I have tried some queries but to no avail, like:
.where('this_json::JSON #> [{"items": [{"name": ?}]}]', "%#{name}%"). How should I go about here?
I can query normal JSON format like this_json::JSON -> 'key' = ? but need help with this bit.
Here you need to use json_array_elements() twice, as your top level document contains array of json, than items key has array of sub documents. Sample psql query may be the following:
SELECT
item->>'name' AS item_name,
item->>'price' AS item_price
FROM t,
json_array_elements(t.v) js_val,
json_array_elements(js_val->'items') item;
where t - is the name of your table, v - name of your JSON column.

Aerospike: how to bulk load a list of integers into a bin?

I'm trying to use the Aerospike bulk loader to seed a cluster with data from a tab-separated file.
The source data looks like this:
set key segments
segment 123 10,20,30,40,50
segment 234 40,50,60,70
The third column, 'segments', contains a comma separated list of integers.
I created a JSON template:
{
"version" : "1.0",
"input_type" : "csv",
"csv_style": { "delimiter": " " , "n_columns_datafile": 3, "ignore_first_line": true}
"key": {"column_name":"key", "type": "integer"},
"set": { "column_name":"set" , "type": "string"},
"binlist": [
{"name": "segments",
"value": {"column_name": "segments", "type": "list"}
}
]
}
... and ran the loader:
java -cp aerospike-load-1.1-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c template.json data.tsv
When I query the records in aql, they seem to be a list of strings:
aql> select * from test
+--------------------------------+
| segments |
+--------------------------------+
| ["10", "20", "30", "40", "50"] |
| ["40", "50", "60", "70"] |
+--------------------------------+
The data I'm trying to store is a list of integers. Is there an easy way to convert the objects stored in this bin to a list of integers (possibly a Lua UDF) or perhaps there's a tweak that can be made to the bulk loader template?
Update:
I attempted to solve this by creating a Lua UDF to convert the list from strings to integers:
function convert_segment_list_to_integers(rec)
for i=1, table.maxn(rec['segments']) do
rec['segments'][i] = math.floor(tonumber(rec['segments'][i]))
end
aerospike:update(rec)
end
... registered it:
aql> register module 'convert_segment_list_to_integers.lua'
... and then tried executing against my set:
aql> execute convert_segment_list_to_integers.convert_segment_list_to_integers() on test.segment
I enabled some more verbose logging and notice that the UDF is throwing an error. Apparently, it's expecting a table and it was passed userdata:
Dec 04 2015 23:23:34 GMT: DEBUG (udf): (udf_rw.c:send_result:527) FAILURE when calling convert_segment_list_to_integers convert_segment_list_to_integers ...rospike/usr/udf/lua/convert_segment_list_to_integers.lua:2: bad argument #1 to 'maxn' (table expected, got userdata)
Dec 04 2015 23:23:34 GMT: DEBUG (udf): (udf_rw.c:send_udf_failure:407) Non-special LDT or General UDF Error(...rospike/usr/udf/lua/convert_segment_list_to_integers.lua:2: bad argument #1 to 'maxn' (table expected, got userdata))
It seems that maxn isn't an applicable method to a userdata object.
Can you see what needs to be done to fix this?
To convert your lists with string values to lists of integer values you can run the following record udf:
function convert_segment_list_to_integers(rec)
local list_with_ints = list()
for value in list.iterator(rec['segments']) do
local int_value = math.floor(tonumber(value))
list.append(list_with_ints, int_value)
end
rec['segments'] = list_with_ints
aerospike:update(rec)
end
When you edit your existing lua module, make sure to re-run register module 'convert_segment_list_to_integers.lua'.
The cause of this issue is within the aerospike-loader tool: it will always assume/enforce strings as you can see in the following java code:
case LIST:
/*
* Assumptions
* 1. Items are separated by a colon ','
* 2. Item value will be a string
* 3. List will be in double quotes
*
* No support for nested maps or nested lists
*
*/
List<String> list = new ArrayList<String>();
String[] listValues = binRawText.split(Constants.LIST_DELEMITER, -1);
if (listValues.length > 0) {
for (String value : listValues) {
list.add(value.trim());
}
bin = Bin.asList(binColumn.getBinNameHeader(), list);
} else {
bin = null;
log.error("Error: Cannot parse to a list: " + binRawText);
}
break;
Source on Github: http://git.io/vRAQW
If you prefer, you can modify this code and re-compile to always assume integer list values. Change line 266 and 270 to something like this (untested):
List<Integer> list = new ArrayList<Integer>();
list.add(Integer.parseInt(value.trim());

Resources