KSQL - Select Columns from Array of Struct as Arrays - ksqldb

Similar to KSQL streams - Get data from Array of Struct, my input JSON looks like:
{
  "Obj1": {
    "a": "abc",
    "b": "def",
    "c": "ghi"
  },
  "ArrayObj": [
    {
      "key1": "1",
      "key2": "2",
      "key3": "3"
    },
    {
      "key1": "4",
      "key2": "5",
      "key3": "6"
    },
    {
      "key1": "7",
      "key2": "8",
      "key3": "9"
    }
  ]
}
I have created a stream with:
CREATE STREAM Example1 (
    Obj1 STRUCT<a VARCHAR, b VARCHAR, c VARCHAR>,
    ArrayObj ARRAY<STRUCT<key1 VARCHAR, key2 VARCHAR, key3 VARCHAR>>
) WITH (kafka_topic='sample_topic', value_format='JSON', partitions=1);
However, I would like only a single row of output from each input JSON document, with the data from each column in the array flattened into arrays, like:
a    b    key1       key2       key3
abc  def  [1, 4, 7]  [2, 5, 8]  [3, 6, 9]
Is this possible with KSQL?

At present you can only flatten ArrayObj in the way you want if you know up front how many elements it will have:
CREATE STREAM flatten AS
SELECT
    Obj1->a AS a,
    Obj1->b AS b,
    ARRAY[ArrayObj[1]->key1, ArrayObj[2]->key1, ArrayObj[3]->key1] AS key1,
    ARRAY[ArrayObj[1]->key2, ArrayObj[2]->key2, ArrayObj[3]->key2] AS key2,
    ARRAY[ArrayObj[1]->key3, ArrayObj[2]->key3, ArrayObj[3]->key3] AS key3
FROM Example1;
I guess if you knew the array was only ever going to be up to a certain size, you could use a CASE statement to selectively extract the elements, e.g.
-- handles arrays of 2 or 3 elements, i.e. the third element is optional.
CREATE STREAM flatten AS
SELECT
    Obj1->a AS a,
    Obj1->b AS b,
    ARRAY[ArrayObj[1]->key1, ArrayObj[2]->key1, ArrayObj[3]->key1] AS key1,
    ARRAY[ArrayObj[1]->key2, ArrayObj[2]->key2, ArrayObj[3]->key2] AS key2,
    CASE
        WHEN ARRAY_LENGTH(ArrayObj) >= 3
            THEN ARRAY[ArrayObj[1]->key3, ArrayObj[2]->key3, ArrayObj[3]->key3]
        ELSE null
    END AS key3
FROM Example1;
If that doesn't suit your needs then the design discussion going on at the moment around lambda function support in ksqlDB may be of interest: https://github.com/confluentinc/ksql/pull/5661
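For what it's worth, if lambda support lands in the shape being discussed there, a query along these lines could handle arrays of any length. This is only a sketch against the proposed TRANSFORM(array, lambda) function, not confirmed syntax:
-- sketch only: TRANSFORM with a lambda is proposed in that design discussion, not yet released
CREATE STREAM flatten AS
SELECT
    Obj1->a AS a,
    Obj1->b AS b,
    TRANSFORM(ArrayObj, o => o->key1) AS key1,
    TRANSFORM(ArrayObj, o => o->key2) AS key2,
    TRANSFORM(ArrayObj, o => o->key3) AS key3
FROM Example1;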

Related

OPA masking a dynamic array field

I'm trying to apply masking to input and result fields that are part of an array, where the size of the array is dynamic. The documentation instructs you to provide an absolute array index, which is not possible in this use case. Do we have any alternative?
E.g. what if one needs to mask the age field of all the students in the input document?
Input:
"students" : [
{
"name": "Student 1",
"major": "Math",
"age": "18"
},
{
"name": "Student 2",
"major": "Science",
"age": "20"
},
{
"name": "Student 3",
"major": "Entrepreneurship",
"age": "25"
}
]
If you want to just generate a copy of input that has a field (or set of fields) removed from the input, you can use json.remove. The trick is to use a comprehension to compute the list of paths to remove. For example:
paths_to_remove := [sprintf("/students/%v/age", [x]) | some x; input.students[x]]
result := json.remove(input, paths_to_remove)
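With the sample input above, the comprehension evaluates to (Rego array indices are zero-based):
paths_to_remove == ["/students/0/age", "/students/1/age", "/students/2/age"]
and result is a copy of input with every age field dropped.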
If you are trying to mask fields from the input document in the decision log using the Decision Log Masking feature then you would write something like:
package system.log

mask[x] {
    some i
    input.input.students[i]
    x := sprintf("/input/students/%v/age", [i])
}

Boost documents in search results which are matched to array

I have this relatively complex search query that's already built and working, with perfect sorting. But I think the search is slow purely because of the script, so all I want is to remove the script and write the query accordingly.
Current code:
"sort": [
{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "double pscore = 0;for(id in params.boost_ids){if(params._source.midoffice_master_id == id){pscore = -999999999;}}return pscore;",
"params": {
"boost_ids": [
3,
4,
5
]
}
}
}
}]
Explanation of the above code:
For example, if a match query would give a result like:
[{m_id: 1, name: A}, {m_id: 2, name: B}, {m_id: 3, name: C}, {m_id: 4, name: D}, ...]
So I want to boost document with m_id array [3, 4, 5] which would then transform the result into:
[{m_id: 3, name: C}, {m_id: 4, name: D}, {m_id: 1, name: A}, {m_id: 2, name: B}, ...]
You can make use of the below query, using a Function Score Query (for boosting) and a Terms Query (used to query an array of values).
Note that the logic I've mentioned is in the should clause of the bool query.
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}              // just a sample must clause to retrieve all docs
        }
      ],
      "should": [
        {
          "function_score": {          <---- Function Score Query
            "query": {
              "terms": {               <---- Terms Query
                "m_id": [3, 4, 5]
              }
            },
            "boost": 100               <---- Boosting value
          }
        }
      ]
    }
  }
}
So basically, you can remove the sort logic completely and add the above function_score query to your should clause, which would give you the results in the order you are looking for.
Note that you'd have to find a way to add the logic correctly if you have a much more complex query; if you are struggling with anything, do let me know. I'd be happy to help!
Hope this helps!

Same key, different values: nested dicts of dicts

Borrowing an MWE from this question, I have a set of nested dicts of dicts:
{
  "type": "A",
  "a": "aaa",
  "payload": {"another": {"dict": "value", "login": "user1"}},
  "actor": {"dict": "value", "login": "user2"}
}
{
  "type": "B",
  "a": "aaa",
  "payload": {"another": {"dict": "value", "login": "user3"}},
  "actor": {"dict": "value", "login": "user4"}
}
{
  "type": "A",
  "a": "aaa",
  "b": "bbb",
  "payload": {"another": {"dict": "value", "login": "user5"}},
  "actor": {"dict": "value", "login": "user6"}
}
{
  "type": "A",
  "a": "aaa",
  "b": "bbb",
  "payload": {"login": "user5"},
  "actor": {"login": "user6"}
}
For dictionaries that have "type": "A", I want to get the username from the payload dict and the username from the actor dict. The same username can appear multiple times. I would like to store a txt file with the actor logins (ID1) and the payload logins (ID2), like this:
ID1 ID2
user2 user1
user6 user5
user6 user5
Right now, I have a start:
zgrep "A" | zgrep -o 'login":"[^"]*"' | zgrep -o 'payload":"[^"]*" > usernames_list.txt
But of course this won't work, because I need to find login within the payload dict and login within the actor dict for each dict of type A.
Any thoughts?
I am assuming you have the payload and actor dictionaries for all entries of type A.
1. Parse out the user name from the payload entries and redirect them to a file named payload.txt
2. Parse out the user name from the actor entries and redirect them to a different file named actor.txt
3. Use the paste command to join the entries and output them the way you want (see the sketch after this list)
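A minimal sketch of those three steps, assuming the data sits as one JSON object per line in a gzipped file named events.json.gz and that jq is available (both the filename and the use of jq are assumptions; the original attempt only showed zgrep):
# sketch: one JSON object per line in events.json.gz is assumed; jq does the parsing
zcat events.json.gz | jq -r 'select(.type == "A") | .actor.login' > actor.txt
# fall back to .payload.login for entries without a nested "another" dict
zcat events.json.gz | jq -r 'select(.type == "A") | .payload.another.login // .payload.login' > payload.txt
# join the two column files side by side, with a header row
{ printf 'ID1 ID2\n'; paste -d' ' actor.txt payload.txt; } > usernames_list.txt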

JSON (not jsonb column) column merge for multiple objects in rails

Initially I have a JSON hash value like:
a = { "1": 1, "2": 2 } (initial JSON hash)
Now I need to add a new key-value pair to the JSON hash:
{ "3": 3 } (new key-value pair)
After merging the new value, my hash should look like:
a = { "1": 1, "2": 2, "3": 3 } (resulting JSON hash)
Can you please share your logic for satisfying the above conditions for multiple objects?
Note:
1. My column is not jsonb; it's a json column.
2. I am using the Postgres database.
3. I need to merge the key-value pair into this column for multiple objects.
That's simple. We can make use of Ruby's Hash#merge to get the expected result.
a = { "1": 1, "2": 2 }
b = { "3": 3 }
result = a.merge(b) # will give you { "1": 1, "2": 2, "3": 3 }
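To apply that merge to the json column across multiple records in Rails, a sketch along these lines could work (YourModel and json_col are placeholder names for your model and column):
# sketch: YourModel and json_col are hypothetical names
YourModel.find_each do |record|
  record.update(json_col: record.json_col.merge("3" => 3))
end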
If you are using Postgres 9.5+, you can convert it to jsonb, concatenate using the || operator, and then cast back to the json type.
UPDATE t
SET json_col = (json_col ::jsonb || '{ "3": 3 }' ::jsonb)::json;
For older versions, you may have to convert it to text, combine, and do some manipulation to convert back to the json type.
UPDATE t
SET json_col = ( replace(json_col :: text, '}', ',')
|| replace('{ "3": 3 }', '{', '' ) ) :: json ;
Demo

Can Neo4j return additional property maps

I am using the Cypher REST API manually for this.
I would like to return the data in a way that is easy to parse.
Here's an example of some data with the same sort of relationships I'm dealing with:
(me:Person:Normal),
(dad:Person:Geezer),
(brother:Person:Punk),
(niece:Person:Tolerable),
(daughter:Person:Awesome),
(candy:Rule),
(tv:Rule),
(dad)-[:HAS_CHILD {Num:1}]->(brother),
(dad)-[:HAS_CHILD {Num:2}]->(me),
(me)-[:HAS_CHILD {Num:1}]->(daughter),
(brother)-[:HAS_CHILD {Num:1}]->(niece),
(me)-[:ALLOWS]->(candy),
(me)-[:ALLOWS]->(tv)
I want to get all of the HAS_CHILD relationships, and if any of those child nodes have :ALLOWS relationships I want the IDs of those too.
So if I were to do something like...
START n=node({idofdad}) MATCH n-[r?:HAS_CHILD]->h-[?:ALLOWS]->allowed
WITH n, r, h, collect(ID(allowed)) as allowedIds
WITH n, r,
CASE
WHEN h IS NOT NULL THEN [ID(h), LABELS(h), r.Num?, allowedIds]
ELSE NULL
END as has
RETURN
LABELS(n) as labels,
ID(n) as id,
n as node,
COLLECT(has) as children;
The :HAS_CHILD may not exist, so I have to do this weird CASE thing.
The data that comes back is 'ok', but the JSON mapper that I have (Newtonsoft) doesn't make it easy to map an array to an object (meaning I have to know that array index [0] is the ID within each children entry).
The results of the above look like this:
{
  "columns": ["labels", "id", "node", "children"],
  "data": [
    [
      ["Person", "Geezer"],
      6,
      {},
      [
        [7, ["Person", "Normal"], 2, [2, 1]],
        [5, ["Person", "Punk"], 1, []]
      ]
    ]
  ]
}
Since this is more or less a document and it'd be easier to map the 'children' column, I'd like to get something that looks like this:
{
  "columns": ["labels", "id", "node", "children"],
  "data": [
    [
      ["Person", "Geezer"],
      6,
      {},
      [
        {
          "id": 7,
          "labels": ["Person", "Normal"],
          "ChildNumber": 2,
          "AllowedIds": [2, 1]
        },
        {
          "id": 5,
          "labels": ["Person", "Punk"],
          "ChildNumber": 1,
          "AllowedIds": []
        }
      ]
    ]
  ]
}
I would expect the query to look something like:
START n=node({idofdad}) MATCH n-[r?:HAS_CHILD]->h-[?:ALLOWS]->allowed
WITH n, r, h, collect(ID(allowed)) as allowedIds
WITH n, r,
CASE
WHEN h IS NOT NULL THEN
{ id: ID(h), labels: LABELS(h), ChildNumber: r.Num?, AllowedIds: allowedIds }
ELSE NULL
END as has
RETURN
LABELS(n) as labels,
ID(n) as id,
n as node,
COLLECT(has) as children;
Is this even remotely possible?
