Filtering by comparing two streams one-on-one in jq

I have streams
{
"key": "a",
"value": 1
}
{
"key": "b",
"value": 1
}
{
"key": "c",
"value": 1
}
{
"key": "d",
"value": 1
}
{
"key": "e",
"value": 1
}
And
(true,true,false,false,true)
I want to compare the two one-on-one and only print the object if the corresponding boolean is true.
So I want to output
{
"key": "a",
"value": 1
}
{
"key": "b",
"value": 1
}
{
"key": "e",
"value": 1
}
I tried (https://jqplay.org/s/GGTHEfQ9s3)
filter:
. as $input | foreach (true,true,false,false,true) as $dict ($input; select($dict))
input:
{
"key": "a",
"value": 1
}
{
"key": "b",
"value": 1
}
{
"key": "c",
"value": 1
}
{
"key": "d",
"value": 1
}
{
"key": "e",
"value": 1
}
But I get output:
{"key":"a","value":1}
{"key":"a","value":1}
null
{"key":"b","value":1}
{"key":"b","value":1}
null
{"key":"c","value":1}
{"key":"c","value":1}
null
{"key":"d","value":1}
{"key":"d","value":1}
null
{"key":"e","value":1}
{"key":"e","value":1}
null
Help will be appreciated.

One way would be to read in the streams as arrays, use transpose to match their items, and select by one and output the other:
jq -s '[.,[(true,true,false,false,true)]] | transpose[] | select(.[1])[0]' objects.json
Another approach would be to read in the streams as arrays, convert the booleans array into those indices where conditions match, and use them to reference into the objects array:
jq -s '.[[(true,true,false,false,true)] | indices(true)[]]' objects.json
The same approach but using nth to reference into the inputs stream requires more precaution, as the successive consumption of stream inputs demands the provision of relative distances, not absolute positions to nth. A conversion can be implemented by successively checking the position of the next true value using index and a while loop:
jq -n 'nth([true,true,false,false,true] | while(. != []; .[index(true) + 1:]) | index(true) | values; inputs)' objects.json
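To make the "relative distances, not absolute positions" point concrete, here is a minimal Python sketch of the same idea (not jq itself). The helper names relative_gaps and pick are made up for this illustration; the stream is consumed only once, so each pick is expressed as "skip this many items, then take one".

```python
from itertools import islice

_MISSING = object()  # sentinel meaning "the stream ran out early"

def relative_gaps(mask):
    """For each true flag, yield how many items to skip since the previous pick."""
    gap = 0
    for flag in mask:
        if flag:
            yield gap
            gap = 0
        else:
            gap += 1

def pick(stream, mask):
    """Consume `stream` once, yielding only the items whose flag is true."""
    it = iter(stream)
    for gap in relative_gaps(mask):
        # Skip `gap` items, then take the next one (roughly nth(gap; inputs)).
        item = next(islice(it, gap, None), _MISSING)
        if item is _MISSING:
            return
        yield item

mask = [True, True, False, False, True]
print(list(relative_gaps(mask)))  # [0, 0, 2]
print(list(pick("abcde", mask)))  # ['a', 'b', 'e']
```

The gaps 0, 0, 2 are the same values the while/index loop feeds to nth in the jq filter above.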
One could also use reduce to directly iterate over the boolean values, and just select any appropriate input:
jq -n 'reduce (true,true,false,false,true) as $dict ([]; . + [input | select($dict)]) | .[]' objects.json
A solution using foreach, like you intended, also would need the -n option to not miss the first item:
jq -n 'foreach (true,true,false,false,true) as $dict (null; input | select($dict))' objects.json

Unfortunately, each invocation of jq can currently handle at most one external JSON stream. This is not usually an issue unless both streams are very large, so in this answer I'll focus on a solution that scales. In fact, the amount of computer memory required is minuscule no matter how large the streams may be.
For simplicity, let's assume that:
demon.json is a file consisting of a stream of JSON boolean values (i.e., not comma-separated);
objects.json is your stream of JSON objects;
the streams have the same length;
we are working in a bash or bash-like environment.
Then we could go with:
paste -d '\t' demon.json <(jq -c . objects.json) | jq -n '
foreach inputs as $boolean (null; input; select($boolean))'
So apart from the startup costs of paste and jq, we basically only need enough memory to hold one of the objects in objects.json at a time. This solution is also very fast.
Of course, if objects.json were already in JSONL (JSON-lines) format, then the first call to jq above would not be necessary.
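The same constant-memory pairing can be sketched in Python, purely as an illustration of why only one object needs to be held at a time. It assumes both inputs are line-delimited (one JSON value per line); the function name select_by_flags and the in-memory StringIO inputs are stand-ins invented for the example.

```python
import io
import json

def select_by_flags(flag_lines, object_lines):
    """Lazily pair line i of the flags with line i of the objects.

    Only one flag and one object are held in memory at a time.
    zip stops at the shorter input, so equal lengths are assumed.
    """
    for flag_line, object_line in zip(flag_lines, object_lines):
        if json.loads(flag_line):
            yield json.loads(object_line)

# Stand-ins for demon.json and a JSONL version of objects.json.
flags = io.StringIO("true\ntrue\nfalse\nfalse\ntrue\n")
objects = io.StringIO(
    "".join('{"key": "%s", "value": 1}\n' % key for key in "abcde")
)

kept = [obj["key"] for obj in select_by_flags(flags, objects)]
print(kept)  # ['a', 'b', 'e']
```

With real files, passing the two open file handles instead of the StringIO objects would keep the same lazy behaviour.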


Elasticsearch saves document as string of array, not array of strings

I am trying to store an array as a document value.
I succeeded with the "tags" field, as shown below.
This document contains an array of strings.
curl -XGET localhost:9200/MY_INDEX/_doc/132328908
#=> {
"_index":"MY_INDEX",
"_type":"_doc",
"_id":"132328908",
"found":true,
"_source": {
"tags": ["food"]
}
}
However, when I index items in the same way as above,
the document SOMETIMES looks like this:
curl -XGET localhost:9200/MY_INDEX/_doc/328098989
#=> {
"_index":"MY_INDEX",
"_type":"_doc",
"_id":"328098989",
"found":true,
"_source": {
"tags": "[\"food\"]"
}
}
This is a string containing an array, not the array of strings I expected.
"tags": "[\"food\"]"
It seems that this situation happens randomly and I could not predict it.
How could it happen?
Note:
・I use the elasticsearch-ruby client to index a document.
This is my actual code:
es_client = Elasticsearch::Client.new url: MY_ENDPOINT
es_client.index(
index: MY_INDEX,
id: random_id, # defined elsewhere
body: {
doc: {
"tags": ["food"]
},
}
)
Thank you in advance.
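To make the difference between the two shapes concrete, here is a small Python sketch (not the Ruby client from the question). It does not explain why the problem occurs in this setup. It only shows that a value which has already been serialized to JSON once ends up as a string when the surrounding document is serialized again, which is an assumption about one possible cause, not something the question confirms. The helper normalize_tags is hypothetical.

```python
import json

# The shape the question expects: an array of strings.
array_of_strings = {"tags": ["food"]}

# The shape sometimes observed: the array was turned into a JSON string
# before being placed in the document (assumed here for illustration).
string_of_array = {"tags": json.dumps(["food"])}

print(json.dumps(array_of_strings))  # {"tags": ["food"]}
print(json.dumps(string_of_array))   # {"tags": "[\"food\"]"}

def normalize_tags(source):
    """Return a copy of `source` whose "tags" value is a real list."""
    tags = source["tags"]
    if isinstance(tags, str):
        tags = json.loads(tags)  # undo the extra layer of encoding
    return {**source, "tags": tags}

print(normalize_tags(string_of_array) == array_of_strings)  # True
```

Checking the type of the value right before the index call would show whether the extra encoding happens on the client side.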

Parsing stops on null or []

I have this filter working well, but in a new use case the property "p" can be null or an empty array [], and jq then stops evaluating the expression.
The issue is in ".p[]?.product.productId" when p is null or an empty array [].
When the p property looks like [{}] or [{"id":123}], it works well.
I've broken the filter into lines to make it easier to understand.
.p as $p
| .p[]?.product.productId as $pa
| .io[]
| select(.product.productId == ($pa) or .description == "product description x")
| .product.productId as $pid
| {"offerId": .offerId,
"description": .description,
"required":
"($p[] | select(.product.productId == $pid) | .required)",
"applied": false,
"amount": (if .prices | length == 0
then 0
elif .prices[0].amount != null
then .prices[0].amount
else .prices[0].amountPercentage
end)}
Input:
{
"p": null,
"io": [{
"offerId": 5593,
"description": "product description x",
"product": {
"productId": 393,
"description": "product description x 2",
"type": "Insurance"
},
"prices": [
{
"amount": null,
"amountPercentage": 4.13999987,
"status": "On"
}
]
}]
}
All I want is to be able to ignore p when it is null or [].
*I'm aware of the literal expression "($p[] | select(.product.productId == $pid) | .required)"
jqplay.org/s/wYwKUFM2XR
Regards
E? is like try E catch empty, whereas what you seem to want is either try E catch null or perhaps E? // null
.p[]? is not the same as .p?[] or .p?[]?:
$ jq -n '[] | .p[]?'
jq: error (at <unknown>): Cannot index array with string "p"
$ jq -n '[] | .p?[]'
$
$ jq -n '[] | .p?[]?'
$
Specifically, .p[]? is like .p | (try .[] catch empty), so there is nothing to stop the .p from raising an exception.
You might like to consider using try explicitly:
$ jq -n '[] | try .p[] catch null'
null
$
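Why an empty result makes the whole filter go quiet can be sketched in Python, as a rough analogy rather than a description of jq's internals. Binding a variable with "as" behaves like a loop over the values produced, so zero values means zero iterations of everything downstream. The helper names iter_optional and iter_or_null are invented for this sketch, and iter_or_null only approximates "E? // null" (it does not filter out false or null items the way // does).

```python
def iter_optional(value):
    """Rough analogue of `.[]?`: iterate, producing nothing on error."""
    try:
        yield from value
    except TypeError:  # e.g. value is None
        return

def iter_or_null(value):
    """Rough analogue of `(.[]?) // null`: fall back to a single None."""
    produced = False
    for item in iter_optional(value):
        produced = True
        yield item
    if not produced:
        yield None

offers = ["offer-5593"]

# With p = None the outer loop runs zero times, so nothing is emitted.
print([(pa, o) for pa in iter_optional(None) for o in offers])
# []

# With the fallback, the outer loop runs once with pa = None.
print([(pa, o) for pa in iter_or_null(None) for o in offers])
# [(None, 'offer-5593')]
```

The same reasoning applies when p is [] instead of null: the loop body never runs unless a fallback value is supplied.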

JMESPath to extract key where values match

Given source which looks like:
{
"Name": "sandbox-config",
"VersionList": {
"version-2": [ "STAGING" ],
"version-1": [ "CURRENT", "NEXT" ],
"version-0": [ "ANCIENT" ]
}
}
I'm looking for a jmespath query which would give me:
{
"Name": "sandbox-config",
"Version": "version-1"
}
where version-1 is the first key where the value array contains "CURRENT".
So, a query like,
{ Name:Name, Version:VersionList.*[?@==`CURRENT`] | [] | [0]}
gives me:
{
"Name": "sandbox-config",
"Version": "CURRENT"
}
which isn't what I'm after. Similarly:
{Name:Name, Version:VersionList.keys(@)}
which gives me:
{
"Name": "sandbox-config",
"Version": [
"version-2",
"version-1",
"version-0"
]
}
Any suggestions? I feel like I'm circling around a solution and not quite getting there.
(Context for this: I'm trying to process the output of aws secretsmanager list-secrets, which has SecretVersionsToStages with ARN values as keys with an array containing "AWSCURRENT".)
https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_DescribeSecret.html#API_DescribeSecret_ResponseSyntax
If you just want to get the version number for one secret with stage [AWSCURRENT], I recommend using describe-secret rather than list-secrets.
One thing I need to call out is that SecretVersionsToStages has version numbers as keys, each with an array that may contain "AWSCURRENT".
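For reference, the transformation the question asks for can be written as a few lines of plain Python post-processing. This is a sketch of the intended result, not a JMESPath query, and the function name current_version is made up for the example.

```python
def current_version(secret, stage="CURRENT"):
    """Return the name plus the first version key whose stages include `stage`."""
    versions = secret.get("VersionList", {})
    version = next(
        (key for key, stages in versions.items() if stage in stages),
        None,  # no version carries the requested stage
    )
    return {"Name": secret["Name"], "Version": version}

source = {
    "Name": "sandbox-config",
    "VersionList": {
        "version-2": ["STAGING"],
        "version-1": ["CURRENT", "NEXT"],
        "version-0": ["ANCIENT"],
    },
}

print(current_version(source))
# {'Name': 'sandbox-config', 'Version': 'version-1'}
```

"First" here means first in the order the keys appear in the parsed document, which Python dictionaries preserve.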

JSON-LD normalization - ignore JSON nesting

I'm working on JSON-LD serialization, and ideally I would like to have a @context which I can add to the existing GeoJSON output (together with some @ids and @types), so that both the Turtle output and the JSON-LD output will normalize to the same triples.
Data is organized as follows: each object/feature has an ID and a name, and data on one or more layers. Per layer, there is a data field, which contains a JSON object.
Example GeoJSON output:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"id": "admr.nl.appingedam",
"name": "Appingedam",
"layers": {
"cbs": {
"data": {
"name": "Appingedam",
"population": 1092
}
},
"admr": {
"data": {
"name": "Appingedam",
"gme_code": 4654,
"admn_level": 3
}
}
}
},
"geometry": {…}
}
]
}
Example Turtle output:
<admr.nl.appingedam>
a :Node ;
dc:title "Appingedam" ;
:createdOnLayer <layer/admr> ;
:layerData <admr.nl.appingedam/admr> ;
:layerData <admr.nl.appingedam/cbs> .
<admr.nl.appingedam/admr>
a :LayerData ;
:definedOnLayer <layer/admr> ;
<layer/admr/name> "Appingedam" ;
<layer/admr/gme_code> "4654" ;
<layer/admr/admn_level> "3" .
<admr.nl.appingedam/cbs>
a :LayerData ;
:definedOnLayer <layer/cbs> ;
<layer/cbs/name> "Appingedam" ;
<layer/cbs/population> "1092" .
The properties object does not have its own URI. Is there a way to create a JSON-LD context which takes the contents of the properties into account, but otherwise 'ignores' its presence?
Answered by Gregg Kellogg on JSON-LD mailing list:
This is something that keeps coming up: having a transparent layer,
that basically folds properties up a level. This was discussed during
the development of JSON-LD, but ultimately it was rejected.
I don't see any prospects for doing something in the short-term, but
it could be revisited in a possible future WG chartered with revising
the spec. Feedback like this is quite useful.
In the meantime, you can play with different JSON-LD encodings that
match your RDF through tools like http://json-ld.org/playground and my
own http://rdf.greggkellogg.net/distiller.
Gregg

Emit Tuples From Erlang Views In CouchDB

CouchDB, version 0.10.0, using native erlang views.
I have a simple document of the form:
{
"_id": "user-1",
"_rev": "1-9ccf63b66b62d15d75daa211c5a7fb0d",
"type": "user",
"identifiers": [
"ABC",
"DEF",
"123"
],
"username": "monkey",
"name": "Monkey Man"
}
And a basic javascript design document:
{
"_id": "_design/user",
"_rev": "1-94bd8a0dbce5e2efd699d17acea1db0b",
"language": "javascript",
"views": {
"find_by_identifier": {
"map": "function(doc) {
if (doc.type == 'user') {
doc.identifiers.forEach(function(identifier) {
emit(identifier, {\"username\":doc.username,\"name\":doc.name});
});
}
}"
}
}
}
which emits:
{"total_rows":3,"offset":0,"rows":[
{"id":"user-1","key":"ABC","value":{"username":"monkey","name":"Monkey Man"}},
{"id":"user-1","key":"DEF","value":{"username":"monkey","name":"Monkey Man"}},
{"id":"user-1","key":"123","value":{"username":"monkey","name":"Monkey Man"}}
]}
I'm looking into building an Erlang view that does the same thing. Best attempt so far is:
%% Map Function
fun({Doc}) ->
case proplists:get_value(<<"type">>, Doc) of
undefined ->
ok;
Type ->
Identifiers = proplists:get_value(<<"identifiers">>, Doc),
ID = proplists:get_value(<<"_id">>, Doc),
Username = proplists:get_value(<<"username">>, Doc),
Name = proplists:get_value(<<"name">>, Doc),
lists:foreach(fun(Identifier) -> Emit(Identifier, [ID, Username, Name]) end, Identifiers);
_ ->
ok
end
end.
which emits:
{"total_rows":3,"offset":0,"rows":[
{"id":"user-1","key":"ABC","value":["monkey","Monkey Man"]},
{"id":"user-1","key":"DEF","value":["monkey","Monkey Man"]},
{"id":"user-1","key":"123","value":["monkey","Monkey Man"]}
]}
The question is - how can I get those values out as tuples, instead of as arrays? I don't imagine I can (or would want to) use records, but using atoms in a tuple doesn't seem to work.
lists:foreach(fun(Identifier) -> Emit(Identifier, {id, ID, username, Username, name, Name}) end, Identifiers);
Fails with the following error:
{"error":"json_encode","reason":"{bad_term,{<<\"user-1\">>,<<\"monkey\">>,<<\"Monkey Man\">>}}"}
Thoughts? I know that Erlang sucks for this specific kind of thing (named access) and that I can do it by convention (id at first position, username next, real name last), but that makes the client side code pretty ugly.
The JSON object {"foo":"bar","baz":1} is {[{<<"foo">>,<<"bar">>},{<<"baz">>,1}]}
In Erlang lingua it is a proplist wrapped in a tuple.
It's not pretty, but very efficient :)
To get a feel for it you can play with the JSON lib that ships with CouchDB:
Start CouchDB with the -i (interactive) flag
On the resulting erlang shell, type: couch_util:json_decode(<<"{\"foo\":\"bar\"}">>).
Profit
// in later versions of CouchDB, this is ejson:decode()
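To see the mapping for other values without starting an Erlang shell, here is a small Python sketch that prints the Erlang term corresponding to a JSON-like value, following the "proplist wrapped in a tuple" rule described above. The function name to_erlang_term is invented for this illustration, and it only aims at plain ASCII strings and basic JSON types.

```python
def to_erlang_term(value):
    """Render a JSON-like Python value as the Erlang term text CouchDB expects."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before the int check
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '<<"%s">>' % escaped  # strings become binaries
    if isinstance(value, list):
        return "[" + ",".join(to_erlang_term(v) for v in value) + "]"
    if isinstance(value, dict):
        pairs = ",".join(
            "{%s,%s}" % (to_erlang_term(k), to_erlang_term(v))
            for k, v in value.items()
        )
        return "{[" + pairs + "]}"  # object: proplist wrapped in a tuple
    raise TypeError("unsupported type: %r" % type(value))

print(to_erlang_term({"foo": "bar", "baz": 1}))
# {[{<<"foo">>,<<"bar">>},{<<"baz">>,1}]}
print(to_erlang_term({"username": "monkey", "name": "Monkey Man"}))
# {[{<<"username">>,<<"monkey">>},{<<"name">>,<<"Monkey Man">>}]}
```

Following that shape, the value passed to Emit in the question's view would presumably look like {[{<<"username">>, Username}, {<<"name">>, Name}]}, with the variables in place of the literal binaries.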
For a test_suite_reports db that has a tests field:
[
{
"name": "basics",
"status": "success",
"duration": 21795
},
{
"name": "all_docs",
"status": "success",
"duration": 385
} ...
I wrote this to get the name and status:
fun({Doc}) ->
Name = fun(L) -> proplists:get_value(<<"name">>, L, null) end,
Status = fun(L) -> proplists:get_value(<<"status">>, L, null) end,
Tests = proplists:get_value(<<"tests">>, Doc, null),
lists:foreach(fun({L}) -> Emit(Name(L), Status(L)) end, Tests)
end.
If you like experimental features (that still work...), you might want to have a look at Erlang exprecs.
I found it extremely helpful in creating a sort of dynamic records for Erlang.
