Transform { key: [values] } to {value: [keys]} in rego - open-policy-agent

Please help me write a rule in Rego that returns a transformed object.
My object:
{
"read": [
"server1",
"server2"
],
"write": [
"server2",
"server3"
],
"create": [
"server1",
"server2",
"server3"
]
}
Expected:
{
"server1": [
"read",
"create"
],
"server2": [
"read",
"write",
"create"
],
"server3": [
"write",
"create"
]
}
I understand that I need to use comprehensions, but I don't understand how to apply them here.

TL;DR: you can use comprehensions to perform group-by operations. In this case, you want to group operations by server.
operations_by_server := {server: operations |
    server := input[_][_]
    operations := {op |
        some op
        server == input[op][_]
    }
}
Here's an interactive version: https://play.openpolicyagent.org/p/y5ldvjvmfK
More info on comprehensions: https://www.openpolicyagent.org/docs/latest/policy-language/#comprehensions
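One detail to note: the comprehension above produces sets of operations, while the expected output in the question uses arrays. A hedged sketch that returns sorted arrays instead, using the same grouping idea plus the sort built-in:
operations_by_server := {server: operations |
    server := input[_][_]
    operations := sort([op | some op; input[op][_] == server])
}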
As a side note, if you want to learn more about Rego, check out this online course. Disclaimer: I work for the company that made the course.

Related

Why do we have to pass the clients array to the modifyGroup command in the dynamic security plugin?

According to the mosquitto docs, we have to pass both roles and clients whenever we are updating a group. When we pass them, the client belongs to the group, but the client object is not updated, and the group it was attached to is not listed by the getGroup command.
I thought there was a hierarchy of MQTT entities (Clients->Groups->Roles) and that lower-level entities cannot contain higher-level entities, but groups somehow can.
Why do we have to pass the clients array to the modifyGroup command?
Only the entries that are present in the modifyGroup object are modified. So if you send:
{
"commands":[{
"command": "modifyGroup",
"groupname": "mygroup",
"clients": [ { "username": "client", "priority": 1 } ]
}]
}
Then mygroup will be modified to have a single client, the user client.
If instead you send:
{
"commands":[{
"command": "modifyGroup",
"groupname": "mygroup",
"clients": []
}]
}
Then mygroup will be modified to have no clients.
Finally, if you send:
{
"commands":[{
"command": "modifyGroup",
"groupname": "mygroup",
"textdescription": "text"
}]
}
Then the group members will not be modified.
If you don't have this behaviour, please update your question with examples of exactly what you are sending that produces an error.
Both roles and clients are marked as optional in the example for modifying a group:
Modify Group
Command:
{
"commands":[
{
"command": "modifyGroup",
"groupname": "group to modify",
"textname": "", # Optional
"textdescription": "", # Optional
"roles": [
{ "rolename": "role", "priority": 1 }
], # Optional
"clients": [
{ "username": "client", "priority": 1 }
] # Optional
}
]
}
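To verify the result of any of these commands, you can ask the plugin for the group afterwards. A hedged sketch, following the same command pattern as above (the response comes back on the plugin's response topic):
{
  "commands": [
    {
      "command": "getGroup",
      "groupname": "mygroup"
    }
  ]
}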

How to create dynamic node relationships in Neo4j for dynamic data?

I was able to create Author nodes directly from the JSON file, but the challenge is on what basis, or how, to link the data, e.g. linking "Author" to "organization". Since the data is dynamic we cannot generalize it. I have tried using a CSV file, but it fails when dynamic data comes in. For example, one JSON record contains 2 organizations and 3 authors; the next record will be different. Different JSON records have different authors and organizations to link. organization/1 represents organization 1 and organization/2 represents organization 2. Any help or hint will be great. Thank you. Please find the JSON file below.
"Author": [
{
"seq": "3",
"type": "abc",
"identifier": [
{
"idtype:auid": "10000000"
}
],
"familyName": "xyz",
"indexedName": "MI",
"givenName": "T",
"preferredName": {
"familyName": "xyz1",
"givenName": "a",
"initials": "T.",
"indexedName": "bT."
},
"emailAddressList": [],
"degrees": [],
"#id": "https:abc/2009127993/author/person/3",
"hasAffiliation": [
"https:abc/author/organization/1"
],
"organization": [
[
{
"identifier": [
{
"#type": "idtype:uuid",
"#subtype": "idsubtype:affiliationInstanceId",
"#value": "aff2"
},
{
"#type": "idtype:OrgDB",
"#subtype": "idsubtype:afid",
"#value": "12345"
},
{
"#type": "idtype:OrgDB",
"#subtype": "idsubtype:dptid"
}
],
"organizations": [],
"addressParts": [],
"sourceText": "",
"text": " Medical University School of Medicine",
"#id": "https:abc/author/organization/1"
}
],
[
{
"identifier": [
{
"#type": "idtype:uuid",
"#subtype": "idsubtype:affiliationInstanceId",
"#value": "aff1"
},
{
"#type": "idtype:OrgDB",
"#subtype": "idsubtype:afid",
"#value": "7890"
},
{
"#type": "idtype:OrgDB",
"#subtype": "idsubtype:dptid"
}
],
"organizations": [],
"addressParts": [],
"sourceText": "",
"text": "K University",
"#id": "https:efg/author/organization/2"
}
]
Hi, I see that Organisation is part of the Author data, so you have to model it likewise, for instance (Author)-[:AFFILIATED_WITH]->(Organisation).
When you use apoc.load.json, which supports a stream of author objects, you can load the data.
I did some checks on your JSON structure with this cypher query:
call apoc.load.json("file:///Users/keesv/work/check.json") yield value
unwind value as record
WITH record.Author as author
WITH author.identifier[0].`idtype:auid` as authorId, author, author.organization[0] as organizations
return authorId, author, organizations
To get this working you will need to put APOC in the plugins directory, and add the following two lines to the apoc.conf file (create one if it is not there) in the 'conf' directory:
apoc.import.file.enabled=true
apoc.import.file.use_neo4j_config=false
I also see a nested array for the organisations in the output; why is that, and what does it mean?
And finally, I also see in the JSON that an organisation can have a reference to other organisations.
Explanation
In my query I use UNWIND to unwind the base array of Author objects. This means you get a 'record' to work with for every author.
With a MERGE or CREATE statement you can now create an Author node with the correct properties. With the FOREACH construct you can walk over all the Organization entries, create/merge an Organization node, and create the relationship between the Author and the Organization.
Here is a 'pseudo' example:
call apoc.load.json("file:///Users/keesv/work/check.json") yield value
unwind value as record
WITH record.Author as author
WITH author.identifier[0].`idtype:auid` as authorId,author, author.organization[0] as organizations
// creating the Author node
MERGE (a:Author { id: authorId })
SET a.familyName = author.familyName
...
// walk over the organizations
// determine
FOREACH (org in organizations |
MERGE (o:Organization { id: ... })
SET o.name = org.text
...
MERGE (a)-[:AFFILIATED_WITH]->(o)
// if needed you can also do a nested FOREACH here to process the Org Org relationship
)
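Filling in the elided parts as a hedged sketch, following the same query structure as above (property names such as familyName and the organisation's #id are taken from the sample JSON; adjust them to your real data):
call apoc.load.json("file:///Users/keesv/work/check.json") yield value
unwind value as record
WITH record.Author as author
WITH author.identifier[0].`idtype:auid` as authorId, author, author.organization[0] as organizations
// create/update the Author node
MERGE (a:Author { id: authorId })
SET a.familyName = author.familyName,
    a.givenName = author.givenName,
    a.indexedName = author.indexedName
// create/update each Organization and link it to the Author
FOREACH (org IN organizations |
  MERGE (o:Organization { id: org.`#id` })
  SET o.name = org.text
  MERGE (a)-[:AFFILIATED_WITH]->(o)
)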
Here is the JSON file I used; I had to change something at the start and the end.
[
{
"Author":{
"seq":"3",
"type":"abc",
"identifier":[
{
"idtype:auid":"10000000"
}
],
"familyName":"xyz",
"indexedName":"MI",
"givenName":"T",
"preferredName":{
"familyName":"xyz1",
"givenName":"a",
"initials":"T.",
"indexedName":"bT."
},
"emailAddressList":[
],
"degrees":[
],
"#id":"https:abc/2009127993/author/person/3",
"hasAffiliation":[
"https:abc/author/organization/1"
],
"organization":[
[
{
"identifier":[
{
"#type":"idtype:uuid",
"#subtype":"idsubtype:affiliationInstanceId",
"#value":"aff2"
},
{
"#type":"idtype:OrgDB",
"#subtype":"idsubtype:afid",
"#value":"12345"
},
{
"#type":"idtype:OrgDB",
"#subtype":"idsubtype:dptid"
}
],
"organizations":[
],
"addressParts":[
],
"sourceText":"",
"text":" Medical University School of Medicine",
"#id":"https:abc/author/organization/1"
}
],
[
{
"identifier":[
{
"#type":"idtype:uuid",
"#subtype":"idsubtype:affiliationInstanceId",
"#value":"aff1"
},
{
"#type":"idtype:OrgDB",
"#subtype":"idsubtype:afid",
"#value":"7890"
},
{
"#type":"idtype:OrgDB",
"#subtype":"idsubtype:dptid"
}
],
"organizations":[
],
"addressParts":[
],
"sourceText":"",
"text":"K University",
"#id":"https:efg/author/organization/2"
}
]
]
}
}
]
IMPORTANT: create unique constraints for Author.id and Organization.id (see the sketch below)!
In this way you can process any JSON file with an unknown number of author elements and an unknown number of affiliated organisations.
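A hedged sketch of those constraints (Neo4j 3.x/4.x syntax; Neo4j 5 uses CREATE CONSTRAINT ... FOR (a:Author) REQUIRE a.id IS UNIQUE instead):
CREATE CONSTRAINT ON (a:Author) ASSERT a.id IS UNIQUE;
CREATE CONSTRAINT ON (o:Organization) ASSERT o.id IS UNIQUE;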

Renaming type for FSharp.Data JsonProvider

I have a JSON that looks something like this:
{
...
"names": [
{
"value": "Name",
"language": "en"
}
],
"descriptions": [
{
"value": "Sample description",
"language" "en"
}
],
...
}
When using JsonProvider from the FSharp.Data library, it maps both fields to the same type, MyJsonProvider.Name. This is a little confusing when working with the code. Is there any way to rename the type to MyJsonProvider.NameOrDescription? I have read that this is possible for the CsvProvider, but typing
JsonProvider<"./Resources/sample.json", Schema="Name->NameOrDescription">
results in an error.
Also, is it possible to define that the Description field is actually an Option<MyJsonProvider.NameOrDescription>? Or do I just have to define the JSON twice, once with all possible values and the second time just with mandatory values?
[
{
...
"names": [
{
"value": "Name",
"language": "en"
}
],
"descriptions": [
{
"value": "Sample description",
"language" "en"
}
],
...
},
{
...
"names": [
{
"value": "Name",
"language": "en"
}
],
...
}
]
To answer your first question, I do not think there is a way of specifying such a renaming. It would be a quite reasonable option, but the JSON provider could also be more clever when generating names here (it knows that the type can represent Name or Description, so it could generate a name with Or based on those).
As a hack, you could add an unused field with the right name:
type A = JsonProvider<"""{
"do not use": { "value_with_langauge": {"value":"A", "language":"A"} },
"names": [ {"value":"A", "language":"A"} ],
"descriptions": [ {"value":"A", "language":"A"} ]
}""">
To answer your second question - your names and descriptions fields are already arrays, i.e. ValueWithLanguage[]. For this, you do not need an optional value. If they are not present, the provider will simply give you an empty array.
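For illustration, a small hedged sketch of how the provider type A defined above might then be used (the inline sample string here is hypothetical):
// Hypothetical usage of the provider type A defined above.
// Names and Descriptions elements share the type generated from the
// "do not use" field, and both properties are arrays.
let doc = A.Parse("""{ "names": [ { "value": "Name", "language": "en" } ] }""")
// per the note above, the missing "descriptions" field comes back as an empty array
printfn "%d names, %d descriptions" doc.Names.Length doc.Descriptions.Length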

Azure Data Factory pipeline ForEach activity not working sequentially

I have an Azure Data Factory pipeline through which I need to pull all my CSV files from a blob storage container and store them in an Azure Data Lake container. Before storing those files in the data lake I need to apply some data manipulation to each file's data.
I need to do this process sequentially, not in parallel, so I use ForEach Activity -> Settings -> Sequential.
But it is not working sequentially; it runs as a parallel process.
Below is the pipeline code:
{
"name":"PN_obfuscate_and_move",
"properties":{
"description":"move PN blob csv to adlgen2(obfuscated)",
"activities":[
{
"name":"GetBlobFileName",
"type":"GetMetadata",
"dependsOn":[
],
"policy":{
"timeout":"7.00:00:00",
"retry":0,
"retryIntervalInSeconds":30,
"secureOutput":false,
"secureInput":false
},
"userProperties":[
],
"typeProperties":{
"dataset":{
"referenceName":"PN_Getblobfilename_Dataset",
"type":"DatasetReference"
},
"fieldList":[
"childItems"
],
"storeSettings":{
"type":"AzureBlobStorageReadSetting",
"recursive":true
},
"formatSettings":{
"type":"DelimitedTextReadSetting"
}
}
},
{
"name":"ForEachBlobFile",
"type":"ForEach",
"dependsOn":[
{
"activity":"GetBlobFileName",
"dependencyConditions":[
"Succeeded"
]
}
],
"userProperties":[
],
"typeProperties":{
"items":{
"value":"#activity('GetBlobFileName').output.childItems",
"type":"Expression"
},
"isSequential":true,
"activities":[
{
"name":"Blob_to_SQLServer",
"description":"Copy PN blob files to sql server table",
"type":"Copy",
"dependsOn":[
],
"policy":{
"timeout":"7.00:00:00",
"retry":0,
"retryIntervalInSeconds":30,
"secureOutput":false,
"secureInput":false
},
"userProperties":[
{
"name":"Source",
"value":"PNemailattachment//"
},
{
"name":"Destination",
"value":"[dbo].[PN]"
}
],
"typeProperties":{
"source":{
"type":"DelimitedTextSource",
"storeSettings":{
"type":"AzureBlobStorageReadSetting",
"recursive":false,
"wildcardFileName":"*.*",
"enablePartitionDiscovery":false
},
"formatSettings":{
"type":"DelimitedTextReadSetting"
}
},
"sink":{
"type":"AzureSqlSink"
},
"enableStaging":false
},
"inputs":[
{
"referenceName":"PNBlob",
"type":"DatasetReference"
}
],
"outputs":[
{
"referenceName":"PN_SQLServer",
"type":"DatasetReference"
}
]
},
{
"name":"Obfuscate_PN_SQLData",
"description":"mask specific columns",
"type":"SqlServerStoredProcedure",
"dependsOn":[
{
"activity":"Blob_to_SQLServer",
"dependencyConditions":[
"Succeeded"
]
}
],
"policy":{
"timeout":"7.00:00:00",
"retry":0,
"retryIntervalInSeconds":30,
"secureOutput":false,
"secureInput":false
},
"userProperties":[
],
"typeProperties":{
"storedProcedureName":"[dbo].[Obfuscate_PN_Data]"
},
"linkedServiceName":{
"referenceName":"PN_SQLServer",
"type":"LinkedServiceReference"
}
},
{
"name":"SQLServer_to_ADLSGen2",
"description":"move PN obfuscated data to azure data lake gen2",
"type":"Copy",
"dependsOn":[
{
"activity":"Obfuscate_PN_SQLData",
"dependencyConditions":[
"Succeeded"
]
}
],
"policy":{
"timeout":"7.00:00:00",
"retry":0,
"retryIntervalInSeconds":30,
"secureOutput":false,
"secureInput":false
},
"userProperties":[
],
"typeProperties":{
"source":{
"type":"AzureSqlSource"
},
"sink":{
"type":"DelimitedTextSink",
"storeSettings":{
"type":"AzureBlobFSWriteSetting"
},
"formatSettings":{
"type":"DelimitedTextWriteSetting",
"quoteAllText":true,
"fileExtension":".csv"
}
},
"enableStaging":false
},
"inputs":[
{
"referenceName":"PN_SQLServer",
"type":"DatasetReference"
}
],
"outputs":[
{
"referenceName":"PNADLSGen2",
"type":"DatasetReference"
}
]
},
{
"name":"Delete_PN_SQLData",
"description":"delete all data from table",
"type":"SqlServerStoredProcedure",
"dependsOn":[
{
"activity":"SQLServer_to_ADLSGen2",
"dependencyConditions":[
"Succeeded"
]
}
],
"policy":{
"timeout":"7.00:00:00",
"retry":0,
"retryIntervalInSeconds":30,
"secureOutput":false,
"secureInput":false
},
"userProperties":[
],
"typeProperties":{
"storedProcedureName":"[dbo].[Delete_PN_Data]"
},
"linkedServiceName":{
"referenceName":"PN_SQLServer",
"type":"LinkedServiceReference"
}
}
]
}
}
],
"folder":{
"name":"PN"
},
"annotations":[
]
},
"type":"Microsoft.DataFactory/factories/pipelines"
}
The ForEach activity in Azure Data Factory (ADF) by default runs up to 20 tasks in parallel. You can make it run up to 50. If you want to force it to run sequentially, i.e. one after the other, then you can either set the Sequential checkbox in the Settings section of the ForEach UI or set the isSequential property of the ForEach activity in the JSON to true, e.g.
{
"name": "<MyForEachPipeline>",
"properties": {
"activities": [
{
"name": "<MyForEachActivity>",
"type": "ForEach",
"typeProperties": {
"isSequential": "true",
"items": {
...
I would caution against the use of this setting though. Running things in serial, i.e. one after the other, will slow things down. Is there another way you can design your workflow to take advantage of this really powerful feature of Azure Data Factory? Then your job will only take as long as the longest task(s), rather than the cumulative total of all tasks together.
Let's say I have a job to run which has 10 tasks each taking 1 second. If I run this job in serial it will take 10 seconds, but if I run it in parallel it will take 1 second.
SSIS never really had this - you could either manually create multiple paths or maybe use third-party components but it wasn't built in. It's really a superb feature of ADF you should try and take advantage of. There may of course be occasions where you really do need to run in serial which is why this option is available.
I've had something similar.
I've only just started with ADF so I could be wrong, but I noticed that by default there is a batch size set on the ForEach activity. I made sure to set this to 1 as well as setting the Sequential checkbox, and now the inner activities run in the order I expected.
For anyone else facing this issue: uncheck the Sequential checkbox, type 1 for the batch count, then recheck the box. Note that I'm using the designer and can't see anything in the code view relating to batch size, so maybe it's hidden.
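For reference, a hedged sketch of what the relevant settings look like in the pipeline JSON, mirroring the ForEach activity from the question (inner activities omitted):
{
  "name": "ForEachBlobFile",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": true,
    "items": {
      "value": "@activity('GetBlobFileName').output.childItems",
      "type": "Expression"
    },
    "activities": []
  }
}
The Batch count box in the designer maps to a batchCount property in typeProperties; it controls how many iterations run in parallel when isSequential is false, which is presumably why setting it to 1, as in the workaround above, also produces sequential behaviour.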

How to use parameters correctly in Transactional Cypher HTTP

I'm a bit stuck. I'm trying to use parameters in my HTTP request in order to reduce the overhead, but I just can't figure out why this won't work:
{
"statements" : [ {
"statement" : "MATCH (n:Person) WHERE n.name = {name} SET n.dogs={dogs} RETURN n",
"parameters" : [{
"name" : "Andres",
"dogs":5
},{
"name" : "Michael",
"dogs":3
},{
"name" : "Someone",
"dogs":2
}
]
}]
}
I've tried just opening a transaction with a STATEMENT and feeding the separate 'rows' in as PARAMETERS on subsequent requests before I /commit, but no joy.
I know that multiple nodes can be created in a similar way from the examples in the manual.
What am I missing?
I've since modified an answer from this post, which seems to work by using a FOREACH statement to allow for multiple 'sets' of parameters.
{
"statements" : [
{
"parameters": {
"props": [
{
"userid": "177032492760",
"username": "John"
},
{
"userid": "177032492760",
"username": "Mike"
},
{
"userid": "100007496328",
"username": "Wilber"
}
]
},
"statement": "FOREACH (p in {props} | MERGE (user:People {id:p.userid}) ON CREATE SET user.name = p.username) "
}
]
}
You can also use UNWIND for this case.
The statement:
UNWIND {props} AS prop
MERGE (user:People {id: prop.id}) // and the rest of your query
The parameters:
{"props":[ {"id": 1234, "name": "John"},{"id": 4567, "name": "Chris"}]}
This is what is used on Graphgen to load the generated graphs in a local database from the webapp. http://graphgen.neoxygen.io
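Put together, a hedged sketch of the full request body for the transactional endpoint, using the old {param} parameter syntax shown in this question (current Neo4j versions use $param instead):
{
  "statements": [
    {
      "statement": "UNWIND {props} AS prop MERGE (user:People {id: prop.id}) ON CREATE SET user.name = prop.name",
      "parameters": {
        "props": [
          { "id": 1234, "name": "John" },
          { "id": 4567, "name": "Chris" }
        ]
      }
    }
  ]
}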
