How to find documents in CouchDB based on properties of other documents in a single query? - join

There is an existing CouchDB database that was created from records in a MySQL database.
I have a set of documents like this:
[
{
"_id": "lf_event_users_1247537_11434",
"_rev": "1-19e90d3f19e9da7cc5adab44ebbe3894",
"TS_create": "2018-12-17T10:29:20",
"emm_id": 204662,
"eu_user_id": 201848611,
"type": "lf_event_users",
"uid": 1247537,
"vendor_id": 11434
},
{
"_id": "lf_event_users_1247538_11434",
"_rev": "1-0d0d1e9f1fb5aad9bafd4c53a6cada17",
"TS_create": "2018-12-17T10:29:20",
"emm_id": 204661,
"eu_user_id": 201848611,
"type": "lf_event_users",
"uid": 1247538,
"vendor_id": 11434
},
{
"_id": "lf_event_users_1247539_11434",
"_rev": "1-09bc2bfc709ee9c6e6cac9cb34964ac4",
"TS_create": "2018-12-17T10:29:20",
"emm_id": 204660,
"eu_user_id": 201848611,
"type": "lf_event_users",
"uid": 1247539,
"vendor_id": 11434
}
]
As you can see, all of them are for the same "eu_user_id" = 201848611, and each one has a different "emm_id".
Now, I have another set of documents like this in the same CouchDB database:
[
{
"_id": "lf_event_management_master_204660_11434",
"_rev": "2-320111a3814a3efd6838baa0fb5412bb",
"emm_disabled": "n",
"emm_title": "Scanned for local delivery",
"settings": {
"event_view": "ScannedForLocalDeliveryEvent",
"sort_weight": 0
},
"type": "lf_event_management_master",
"uid": 204660,
"vendor_id": 11434
},
{
"_id": "lf_event_management_master_204661_11434",
"_rev": "2-e6d6ebbd4dc4ca473a376d3d16a58e93",
"emm_disabled": "n",
"emm_title": "Local Delivery Cancelled",
"settings": {
"event_view": "CancelDeliveryEvent",
"sort_weight": 4
},
"type": "lf_event_management_master",
"uid": 204661,
"vendor_id": 11434
},
{
"_id": "lf_event_management_master_204662_11434",
"_rev": "2-53cb3d3eba80704e87ea5ff8d5c269df",
"emm_disabled": "n",
"emm_title": "Local Delivery Exception",
"settings": {
"event_view": "DeliveryExceptionEvent",
"sort_weight": 3
},
"type": "lf_event_management_master",
"uid": 204662,
"vendor_id": 11434
}
]
As you can see, each document in this last set has a "uid" matching the "emm_id" in the previous set of documents. Basically this means:
A "user" has many allowed "events".
You can also see that the documents of type "lf_event_management_master" have no "eu_user_id" value, or any other key matching it.
My question is:
How can I get all documents of type "lf_event_management_master" allowed for user "201848611" in a single query?
In my case, I only have the User ID (201848611) available at the point where I need to get the allowed events. Currently what is happening is:
I get all the "lf_event_users" records for this user.
Loop over the results of the previous query and build a second query, this time to find all the "lf_event_management_master" documents whose "uid" matches any of the "emm_id" values found by the first query.
Thank you in advance.
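One CouchDB feature that can answer this in a single request is a "linked documents" view: when a map function emits a value containing an _id field, querying the view with include_docs=true makes CouchDB return the document that _id points to instead of the emitting document. A minimal sketch, assuming every master document follows the lf_event_management_master_<emm_id>_<vendor_id> id convention shown above (the design document and view names here are made up):
{
  "_id": "_design/events",
  "views": {
    "allowed_events": {
      "map": "function (doc) { if (doc.type === 'lf_event_users') { emit(doc.eu_user_id, { '_id': 'lf_event_management_master_' + doc.emm_id + '_' + doc.vendor_id }); } }"
    }
  }
}
GET /your_db/_design/events/_view/allowed_events?key=201848611&include_docs=true
Each row's doc field is then the linked "lf_event_management_master" document for that user, all fetched in one round trip.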

Related

How to get a hierarchical list of nodes and relations as json in Neo4j

I have a Neo4j DB in which user data and the relations between users are stored; the end user will interact with this data from a mobile app (the app is in Flutter, and we use a NestJS Neo4j connector in between). Now we have to enable offline access to the data. The idea was to export the user's data from Neo4j as JSON and use it when offline; when the device gets back online we will apply the changes to the DB. I have some problems getting the data as JSON.
This is a rough sample of what I am trying to do.
The Cypher commands to create these nodes:
CREATE (c:Computer {name: 'Andy',uid:'123'})
CREATE (d1:Drive {name: 'Drive1',capacity:"2gb",uid:'223'})
CREATE (d2:Drive {name: 'Drive2',capacity:"4gb",uid:'233'})
CREATE (f1:Folder {name: 'desktop',type:"special",uid:'323'})
CREATE (f2:Folder {name: 'mydocuments',type:"special",uid:'333'})
CREATE (f3:Folder {name: 'myprojects',type:"normal",uid:'343'})
CREATE (t1:File {name: 'text1',type:"txt",size:"1kb",uid:'423'})
CREATE (t2:File {name: 'text2',type:"txt",size:"1.5kb",uid:'433'})
CREATE (t3:File {name: 'text3',type:"txt",size:"2kb",uid:'443'})
CREATE (do1:File {name: 'doc1',type:"doc",size:"1mb",uid:'523'})
CREATE (do2:File {name: 'doc2',type:"doc",size:"1.5mb",uid:'533'})
CREATE (do3:File {name: 'doc3',type:"doc",size:"2mb",uid:'543'})
CREATE (c)-[r1:PARTITION{during: 'osinstall'}]->(d1)
CREATE (c)-[r2:PARTITION{during: 'setup'}]->(d2)
CREATE (d1)-[r3:AutoCreated{during: 'osinstall',type:"folder"}]->(f1)
CREATE (d1)-[r4:AutoCreated{during: 'osinstall',type:"folder"}]->(f2)
CREATE (f1)-[r5:Shortcut{type:"folder"}]->(c)
CREATE (f2)-[r6:Shortcut{type:"folder"}]->(c)
CREATE (d2)-[r7:UserCreated{type:"folder"}]->(f3)
CREATE (d2)-[r8:UserCreated{type:"file"}]->(t1)
CREATE (d2)-[r9:UserCreated{type:"file"}]->(t2)
CREATE (f3)-[r10:UserCreated{type:"file"}]->(t3)
CREATE (f3)-[r11:UserCreated{type:"file"}]->(do1)
CREATE (d2)-[r12:UserCreated{type:"file"}]->(do2)
CREATE (do2)-[r13:Shortcut{type:"file"}]->(f1)
CREATE (f3)-[r14:Shortcut{type:"folder"}]->(f1)
CREATE (f1)-[r15:UserCreated{type:"file"}]->(do3)
CREATE (do3)-[r16:Shortcut{type:"file"}]->(f3)
CREATE (c1:Computer {name: 'Randy',uid:'c1-123'})
CREATE (c1d1:Drive {name: 'Drive1',capacity:"1gb",uid:'c1-223'})
CREATE (c1t1:File {name: 'text1',type:"txt",size:"1kb",uid:'c1-423'})
CREATE (c1t2:File {name: 'text2',type:"txt",size:"1.5kb",uid:'c1-433'})
CREATE (c1sh1:SharedDrive {name:"SharedDrive",uid:'c1-s1'})
CREATE (c1)-[c1r1:PARTITION{during: 'osinstall'}]->(c1d1)
CREATE (c1d1)-[c1r2:UserCreated{type:"file"}]->(c1t1)
CREATE (c1d1)-[c1r3:UserCreated{type:"file"}]->(c1t2)
CREATE (c1t1)-[c1r4:Share{type:"file"}]->(c1sh1)
CREATE (c1)-[c1r5:SHAREDPARTITION{during: 'osinstall'}]->(c1sh1)
CREATE (c1)-[common:Network]->(c)
I want to query a root node (say Andy) with the user's uid and get the data in this format:
{
"name": "Andy",
"uid":"123",
"PARTITION":[
{
"name": "Drive1",
"capacity":"2gb",
"uid":"223",
"Folder":[
{ "name": "desktop","type":"special","uid":"323",
"File":[
"... Detail about file doc3 here.."
],
"Shortcut":[
"... Detail about file doc2 here.."
]
},
{"name": "mydocuments","type":"special","uid":"333"}
]
},
{
"name": "Drive2",
"capacity":"4gb",
"uid":"233",
"Folder":[
{ "name": "myprojects","type":"normal","uid":"343",
"File":[
"...Detail about Files doc1, text3 here..."
],
"Shortcut":[
"... Detail about file doc3 here.."
]
}
],
"File":[
"...Detail about Files text1,text2,doc 2 here..."
]
}
],
"Shortcut":[
{
"name": "desktop","type":"special","uid":"323"
},{
"name": "mydocuments","type":"special","uid":"333"
}
],
"Network":[
{
"name": "Randy",
"uid":"c1-123",
"SHAREDPARTITION":["...HERE ONLY NEED THE files and folders from shareddrive other drives should not show up..."]
}
]
}
I want to add the relations from and to the node as keys, and for the value add a list of the nodes (with their related properties) connected by that relation, then move on to the next node. I don't know how to do this. So far I have tried:
match (n:Computer{uid:"123"})-[r:PARTITION]->(x)
match b=(x)-[*]->(y)
with collect(b) as c
call apoc.convert.toTree(c) yield value
return value
but this does not return the shortcut file paths properly. I.e., if I add a shortcut from doc3 (at desktop) to myprojects, I don't find the file's details among the myprojects shortcuts; I need it in both places, at desktop (under files) and in the myprojects folder (under shortcut). Also, the shared computer's details are not fetched (all drives must not be fetched, just the shared partition). Apart from this, the returned data is not in the expected format, so I have to process it in the app after fetching it.
Can someone help me with this?
I am also open to different solutions for Neo4j/Flutter offline access.
You can match paths from the root node and then collect them all together. Just make sure you are using the relationships that you want to extract. The query below is not exactly what you described, but it is the closest to your JSON format.
match b=(n:Computer{uid:"123"})-[r:PARTITION]->(x:Drive)-[]-(y:Folder)-[]-(z:File)
match c=(n)-[:Shortcut]-()
match d=(n)-[:Network]-()-[:SHAREDPARTITION]-()
with collect(b) + collect(c) + collect(d) as t
call apoc.convert.toTree(t) yield value
return value
Result:
{
"name": "Andy",
"uid": "123",
"_type": "Computer",
"_id": 298,
"partition": [
{
"uid": "233",
"_type": "Drive",
"name": "Drive2",
"_id": 300,
"partition.during": "setup",
"capacity": "4gb",
"usercreated": [
{
"uid": "343",
"shortcut": [
{
"uid": "543",
"size": "2mb",
"shortcut.type": "file",
"_type": "File",
"name": "doc3",
"_id": 309,
"type": "doc"
}
],
"_type": "Folder",
"name": "myprojects",
"usercreated": [
{
"uid": "443",
"size": "2kb",
"_type": "File",
"name": "text3",
"_id": 306,
"type": "txt",
"usercreated.type": "file"
},
{
"uid": "523",
"size": "1mb",
"_type": "File",
"name": "doc1",
"_id": 307,
"type": "doc",
"usercreated.type": "file"
}
],
"_id": 303,
"type": "normal",
"usercreated.type": "folder"
}
]
},
{
"uid": "223",
"_type": "Drive",
"name": "Drive1",
"_id": 299,
"partition.during": "osinstall",
"capacity": "2gb",
"autocreated": [
{
"autocreated.during": "osinstall",
"autocreated.type": "folder",
"uid": "323",
"shortcut": [
{
"uid": "533",
"size": "1.5mb",
"shortcut.type": "file",
"_type": "File",
"name": "doc2",
"_id": 308,
"type": "doc"
}
],
"_type": "Folder",
"name": "desktop",
"usercreated": [
{
"uid": "543",
"size": "2mb",
"_type": "File",
"name": "doc3",
"_id": 309,
"type": "doc",
"usercreated.type": "file"
}
],
"_id": 301,
"type": "special"
}
]
}
],
"shortcut": [
{
"uid": "333",
"shortcut.type": "folder",
"_type": "Folder",
"name": "mydocuments",
"_id": 302,
"type": "special"
},
{
"uid": "323",
"shortcut.type": "folder",
"_type": "Folder",
"name": "desktop",
"_id": 301,
"type": "special"
}
],
"network": [
{
"_type": "Computer",
"name": "Randy",
"uid": "c1-123",
"_id": 310,
"sharedpartition": [
{
"_type": "SharedDrive",
"name": "SharedDrive",
"uid": "c1-s1",
"_id": 473,
"sharedpartition.during": "osinstall"
}
]
}
]
}
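A note on the shared drive: the query above pulls in the SharedDrive node itself but not the files shared into it. If those are needed as well (the question asks for just the shared partition's contents), one untested tweak, given that the Share relationships point into the SharedDrive in the sample data, is to widen the third pattern:
match d=(n)-[:Network]-()-[:SHAREDPARTITION]-()-[:Share*0..1]-()
so that both the SharedDrive node and anything shared into it end up nested under sharedpartition in the tree.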

Storing a List in a Vertex using Gremlin and Azure Cosmos Graph

I am trying to store a list of changes made to a Vertex in the Vertex itself. Ideally I would want something like this:
{
"id": "95fcfa87-1c03-436d-b3ca-340cea926ee9",
"label": "person",
"type": "vertex",
"log": [{
"user": "user#user.dk",
"action": "update",
"timestamp": "22-03-2017",
"field": "firstName",
"oldValue": "Marco"
}
]
}
Using this method chain I am able to achieve the following structure:
graph.addV('person')
.property('firstName', 'Thomas')
.property(list, 'log', '22-03-2017')
.properties('log')
.hasValue('22-03-2017', '21-03-2017')
.property('user','user#user.dk')
.property('action', 'update')
.property('field', 'firstName')
.property('oldValue', 'Marco')
{
"id": "95fcfa87-1c03-436d-b3ca-340cea926ee9",
"label": "person",
"type": "vertex",
"properties": {
"firstName": [{
"id": "f23482a9-48bc-44e0-b783-3b74a2439a11",
"value": "Thomas"
}
],
"log": [{
"id": "5cfa35e1-e453-42e2-99b1-eb64cd853f22",
"value": "22-03-2017",
"properties": {
"user": "user#user.dk",
"action": "update",
"field": "firstName",
"oldValue": "Marco"
}
}
]
}
}
However this seems overly complex, as I will have to store a value and add properties to it.
Is it possible to add anonymous objects (i.e. without id and value) holding the above-mentioned data?
Not an actual solution to storing proper objects in a history log, but if you just use it as a log and don't have to access or query it by its properties, you could just put the serialised JSON in the value?
Something along these lines should approximate the structure you're requesting:
// Build the log entry as JSON (requires Newtonsoft.Json: using Newtonsoft.Json.Linq;)
dynamic entry = new JObject();
entry.user = "user#user.dk";
entry.action = "update";
entry.timestamp = "22-03-2017 12:34:56";
entry.field = "firstName";
entry.oldValue = "Marco";
// Store the serialised entry as one value of the multi-valued 'log' property:
graph.addV('person')
.property('firstName', 'Thomas')
.property(list, 'log', entry.ToString());
{
"id": "95fcfa87-1c03-436d-b3ca-340cea926ee9",
"label": "person",
"type": "vertex",
"properties": {
"firstName": [{
"id": "f23482a9-48bc-44e0-b783-3b74a2439a11",
"value": "Thomas"
}
],
"log": [{
"id": "5cfa35e1-e453-42e2-99b1-eb64cd853f22",
"value": "{\"user\":\"user#user.dk\",\"action\":\"update\",\"timestamp\":\"22-03-2017\",\"field\":\"firstName\",\"oldValue\":\"Marco\"}"
}
]
}
}
These log entries can easily be read, deserialised, used, and presented, but they will not do much for queryability.
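If you do go this route, reading the log back is plain string parsing; a small sketch (assumes Newtonsoft.Json, and that the raw 'log' values have already been fetched with a Gremlin query):
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

// Hypothetical raw values of the 'log' property, as returned by a Gremlin query.
var logValues = new List<string>
{
    "{\"user\":\"user#user.dk\",\"action\":\"update\",\"timestamp\":\"22-03-2017\",\"field\":\"firstName\",\"oldValue\":\"Marco\"}"
};
foreach (string json in logValues)
{
    JObject entry = JObject.Parse(json);
    Console.WriteLine($"{entry["timestamp"]}: {entry["user"]} {entry["action"]} {entry["field"]}");
}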

spring-data-elasticsearch: How can I use routing to delete a document in Spring Data Elasticsearch?

My child document looks like this:
{
"_index": "test-index",
"_type": "test_type",
"_id": "AVznf5cOTLguhbQOC8aV",
"_version": 1,
"_score": null,
"_routing": "1b973ddd-0aa9-4578-9bf9-74125a3c7r4d",
"_parent": "1b973ddd-0aa9-4578-9bf9-74125a3c7r4d",
"_source": {
"id": null,
"email": "test#hempel.com",
"actionDate": "2017-06-20T08:43:52.000Z",
"actionStatus": "SENT_SUCCESS",
"description": "",
"ip": "0.0.0.0",
"address": "",
"browser": null,
"os": "",
"taskId": "1b973ddd-0aa9-4578-9bf9-74125a3c7f4d",
"taskName": "007",
"actionStatusName": "SENT_SUCCESS",
"new": true
},
"sort": [
"test#hempel.com"
]
}
As you can see, it's a child document, so every time I query the document it looks like this:
GET test_index/test_type/AVznWID-TLguhbQOC2Zt?routing=89293986-7d08-4e73-be1e-1ec9e136b440
and the delete looks like this:
DELETE test_index/test_type/AVznWID-TLguhbQOC2Zt?routing=89293986-7d08-4e73-be1e-1ec9e136b440
But the problem is: how can I query and delete the document with its routing value using Spring Data Elasticsearch's ElasticsearchTemplate?
Well, I have found a way to resolve this problem: just use DeleteRequest from org.elasticsearch.action.delete. In some cases, we have come to rely too much on the tool.
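For reference, a minimal untested sketch of that approach (the index, type, id and routing values are the ones from the question; it assumes you can reach the underlying Client that ElasticsearchTemplate wraps):
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.client.Client;

public class RoutedDelete {
    public static void deleteChildDocument(Client client) {
        // Use the same routing value that was used when indexing the child document,
        // otherwise the delete is sent to the wrong shard and misses the document.
        DeleteRequest request = new DeleteRequest("test_index", "test_type", "AVznWID-TLguhbQOC2Zt")
                .routing("89293986-7d08-4e73-be1e-1ec9e136b440");
        client.delete(request).actionGet();
    }
}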

JSON API questions. Included vs relationships

I am reading this before building my API endpoints. I read this quote about compound documents:
To reduce the number of HTTP requests, servers MAY allow responses
that include related resources along with the requested primary
resources. Such responses are called "compound documents".
Here is a sample JSON response using the JSON API specification:
{
"data": [{
"type": "articles",
"id": "1",
"attributes": {
"title": "JSON API paints my bikeshed!"
},
"links": {
"self": "http://example.com/articles/1"
},
"relationships": {
"author": {
"links": {
"self": "http://example.com/articles/1/relationships/author",
"related": "http://example.com/articles/1/author"
},
"data": { "type": "people", "id": "9" }
},
"comments": {
"links": {
"self": "http://example.com/articles/1/relationships/comments",
"related": "http://example.com/articles/1/comments"
},
"data": [
{ "type": "comments", "id": "5" },
{ "type": "comments", "id": "12" }
]
}
}
}],
"included": [{
"type": "people",
"id": "9",
"attributes": {
"first-name": "Dan",
"last-name": "Gebhardt",
"twitter": "dgeb"
},
"links": {
"self": "http://example.com/people/9"
}
}, {
"type": "comments",
"id": "5",
"attributes": {
"body": "First!"
},
"relationships": {
"author": {
"data": { "type": "people", "id": "2" }
}
},
"links": {
"self": "http://example.com/comments/5"
}
}, {
"type": "comments",
"id": "12",
"attributes": {
"body": "I like XML better"
},
"relationships": {
"author": {
"data": { "type": "people", "id": "9" }
}
},
"links": {
"self": "http://example.com/comments/12"
}
}]
}
So from what I can see, the relationships sections give basic/sparse information about the associations between the articles table and other tables. It looks like an article belongs_to an author and has_many comments.
What will the links be used for? Will the API consumer have to use the link in order to receive more detailed JSON about the relationship? Doesn't this require an additional API call? Is this efficient?
The "included" section seems like it contains more detailed information about the relationships/associations?
Are both "included" and "relationships" necessary? What's the intuition behind needing both of these sections?
The idea is that a relationship in a resource simply gives linkage data (that is basic data to uniquely identify the related resource – these data are the id and the type), in order to keep it to a minimum.
On the other hand, the included section is here in case you want to send along detailed information about some related resources (for instance to minimise the number of HTTP requests). Note that the included section is expected to contain only resources that are related to either a primary resource (i.e. within the data section), or an included resource (this constraint is called full linkage in the spec).
To put it simply, the relationships section of a resource tells you which resources are related to a given resource, and the included section tells you what those resources are.
As far as links are concerned, they may come in handy when you have a has_many relationship, for which the linkage data itself might contain several thousands of id/type records, thus making your response document quite big. In case those are not necessarily needed by your client when they request the base resource, you might decide to make them available through a link.
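For example, the spec also allows a server to support an include query parameter, so the client decides when a compound document is worth it:
GET /articles/1?include=author,comments
would return article 1 with the author and both comments in the included section, whereas a plain GET /articles/1 could return only the linkage data, leaving the client to follow the related links when it actually needs the details.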

Avro schema evolution

I have two questions:
Is it possible to use the same reader and parse records that were written with two schemas that are compatible, e.g. Schema V2 only has an additional optional field compared to Schema V1 and I want the reader to understand both? I think the answer here is no, but if yes, how do I do that?
I have tried writing a record with Schema V1 and reading it with Schema V2, but I get the following error:
org.apache.avro.AvroTypeException: Found foo, expecting foo
I used avro-1.7.3 and:
writer = new GenericDatumWriter<GenericData.Record>(SchemaV1);
reader = new GenericDatumReader<GenericData.Record>(SchemaV2, SchemaV1);
Here are examples of the two schemas (I have tried adding a namespace as well, but no luck).
Schema V1:
{
"name": "foo",
"type": "record",
"fields": [{
"name": "products",
"type": {
"type": "array",
"items": {
"name": "product",
"type": "record",
"fields": [{
"name": "a1",
"type": "string"
}, {
"name": "a2",
"type": {"type": "fixed", "name": "a3", "size": 1}
}, {
"name": "a4",
"type": "int"
}, {
"name": "a5",
"type": "int"
}]
}
}
}]
}
Schema V2:
{
"name": "foo",
"type": "record",
"fields": [{
"name": "products",
"type": {
"type": "array",
"items": {
"name": "product",
"type": "record",
"fields": [{
"name": "a1",
"type": "string"
}, {
"name": "a2",
"type": {"type": "fixed", "name": "a3", "size": 1}
}, {
"name": "a4",
"type": "int"
}, {
"name": "a5",
"type": "int"
}]
}
}
},
{
"name": "purchases",
"type": ["null",{
"type": "array",
"items": {
"name": "purchase",
"type": "record",
"fields": [{
"name": "a1",
"type": "int"
}, {
"name": "a2",
"type": "int"
}]
}
}]
}]
}
Thanks in advance.
I encountered the same issue. That might be a bug in Avro, but you can probably work around it by adding "default": null to the "purchases" field.
Check my blog for details: http://ben-tech.blogspot.com/2013/05/avro-schema-evolution.html
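To make that concrete, here is a minimal untested sketch of the resolving read. It assumes "purchases" in Schema V2 carries "default": null as suggested above, and that the schemas live in hypothetical foo-v1.avsc/foo-v2.avsc files. Note that GenericDatumReader's two-argument constructor takes the writer schema first and the reader schema second, so for data written with V1 and read with V2 that is (SchemaV1, SchemaV2):
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.util.Collections;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaEvolutionSketch {
    public static void main(String[] args) throws Exception {
        Schema v1 = new Schema.Parser().parse(new File("foo-v1.avsc"));
        Schema v2 = new Schema.Parser().parse(new File("foo-v2.avsc"));

        // Write a record with the old (writer) schema V1.
        GenericRecord rec = new GenericData.Record(v1);
        rec.put("products", new GenericData.Array<GenericRecord>(
                v1.getField("products").schema(), Collections.<GenericRecord>emptyList()));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(v1).write(rec, enc);
        enc.flush();

        // Read it back resolving to V2: writer schema first, reader schema second.
        BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord result = new GenericDatumReader<GenericRecord>(v1, v2).read(null, dec);
        System.out.println(result.get("purchases")); // null, filled in from the reader schema's default
    }
}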
You can also do the opposite, meaning you can write data with Schema V2 and read it with Schema V1. At write time the data is written out in full, and if we don't read some of the written fields at reading time it is fine. But if we write fewer fields than the reader expects, the reader cannot resolve its extra fields at reading time, so it will give an error.
The best way is to maintain a schema mapping, like the Confluent Avro Schema Registry.
Key takeaways:
1. Unlike Thrift, Avro serialized objects do not hold any schema.
2. As there is no schema stored in the serialized byte array, one has to provide the schema with which it was written.
3. The Confluent Schema Registry provides a service to maintain schema versions.
4. Confluent provides a cached schema client, which checks the cache first before sending a request over the network.
5. The JSON schema in the "avsc" file is different from the schema held in the Avro object.
6. All Avro objects extend GenericRecord.
7. During serialization: based on the schema of the Avro object, a schema ID is requested from the Confluent Schema Registry.
8. The schema ID, which is an INTEGER, is converted to bytes and prepended to the serialized Avro object.
9. During deserialization: the first 4 bytes are removed from the byte array and converted back to an INTEGER (the schema ID).
10. The schema is requested from the Confluent Schema Registry, and the byte array is deserialized using this schema.
http://bytepadding.com/big-data/spark/avro/avro-serialization-de-serialization-using-confluent-schema-registry/
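As a hedged illustration of steps 7 and 8 on the producer side (the serializer class and property names are Confluent's; the topic name and URLs are placeholders):
import java.util.Properties;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RegistryProducerSketch {
    public static void send(GenericRecord record) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // KafkaAvroSerializer looks up (or registers) the schema in the registry
        // and prepends the returned schema id to the serialized bytes.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        Producer<String, GenericRecord> producer = new KafkaProducer<String, GenericRecord>(props);
        producer.send(new ProducerRecord<String, GenericRecord>("avro-topic", record));
        producer.close();
    }
}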
