How do I update a Neo4j geospatial index? - neo4j

Quite simply, I could not find this question or an answer to it anywhere I looked.
The objective is to reindex a node to update its latitude and longitude properties.
The plugin I'm using for geospatial operations in Neo4j is called Spatial.
Here is my setup.
I create a point layer:
POST http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/addSimplePointLayer
{
  "layer": "geom",
  "lat": "geolocation.lat",
  "lon": "geolocation.lon"
}
I then create a geom spatial index:
POST http://localhost:7474/db/data/index/node/
{
  "name": "geom",
  "config": {
    "provider": "spatial",
    "geometry_type": "point",
    "lat": "geolocation.lat",
    "lon": "geolocation.lon"
  }
}
I finally add the node to the index:
POST http://localhost:7474/db/data/index/node/geom
{
  "value": "dummy",
  "key": "dummy",
  "uri": "http://localhost:7474/db/data/node/5734"
}
I have a theory about how reindexing might be accomplished. First, I would remove the node from the geospatial index and then re-add it. But I'm concerned this may mess something up. I've read elsewhere that removing indexes and then adding them can create problems.
What is the proper way to reindex a node?

It looks as if you can simply POST to the index again. I don't yet know what the implications of this are, or whether it creates a new index entry or node, but it does seem to update correctly in my limited testing.
Example:
POST http://localhost:7474/db/data/index/node/geom
{
  "value": "dummy",
  "key": "dummy",
  "uri": "http://localhost:7474/db/data/node/5734"
}
You can verify that the index node was not duplicated by running the following Cypher query.
MATCH (node { id: 5734 })
RETURN node
Important: The above id is NOT to be confused with the actual ID of the geospatial node itself. It is a property for referring to the node that you indexed.
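If you do want to remove the entry first and then re-add it (the approach described in the question), the legacy node index REST API also exposes a DELETE endpoint; a minimal sketch, assuming the same key, value, and node as above (whether this cleanly updates the Spatial layer's underlying index is exactly the open question here, so test it before relying on it):
DELETE http://localhost:7474/db/data/index/node/geom/dummy/dummy/5734
After the DELETE, re-add the node with the same POST shown above.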

Related

How does one parse nested Avro records correctly in NiFi?

I have incoming Avro records that roughly follow the format below. I am able to read them and convert them in existing NiFi flows. However, a recent change requires me to read from these files and parse the nested record, employers in this example. I read the Apache NiFi blog post, Record-Oriented Data with NiFi, but was unable to figure out how to get the AvroRecordReader to parse nested records.
{
  "name": "recordFormatName",
  "namespace": "nifi.examples",
  "type": "record",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "firstName", "type": "string" },
    { "name": "lastName", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "gender", "type": "string" },
    { "name": "employers",
      "type": "record",
      "fields": [
        { "name": "company", "type": "string" },
        { "name": "guid", "type": "string" },
        { "name": "streetaddress", "type": "string" },
        { "name": "city", "type": "string" }
      ]
    }
  ]
}
What I hope to achieve is a flow that reads the employers record for each recordFormatName record and uses the PutDatabaseRecord processor to keep track of the employers values seen. The current plan is to insert the records into a MySQL database. As suggested in an answer below, I plan on using PartitionRecord to sort the records based on a value in the employers subrecord. I do not need the top-level details for this particular flow.
I have tried to parse with the AvroRecordReader but cannot figure out how to specify the nested records. Is this something that can be accomplished with the AvroRecordReader alone, or does preprocessing, say a JOLT transform, need to happen first?
EDIT: Added further details about the database after receiving a response.
What is your target DB and what does your target table look like? PutDatabaseRecord may not be able to handle nested records unless your DB, driver, and target table support them.
Alternatively you may need to use UpdateRecord to flatten the "employers" object into fields at the top level of the record. This is a manual process (until NIFI-4398 is implemented), but you only have 4 fields. After flattening the records, you could use PartitionRecord to get all records with a specific value for, say, employers.company. The outgoing flow files from PartitionRecord would technically constitute the distinct values for the partition field(s). I'm not sure what you're doing with the distinct values, but if you can elaborate I'd be happy to help.
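As a rough illustration of that flattening step (the property semantics follow standard UpdateRecord usage; the flattened field names are only examples), UpdateRecord would be configured with Replacement Value Strategy set to Record Path Value and one dynamic property per flattened field, with the target path as the property name and the source path as the value:
/company = /employers/company
/guid = /employers/guid
/streetaddress = /employers/streetaddress
/city = /employers/city
The Record Writer's schema would also need the new top-level fields, and PartitionRecord can then partition on /company (or whichever flattened field you need the distinct values of).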

Join two nodes in Firebase

I'm working on an app which is supposed to show data from two nodes (Firebase). The Firebase DB is structured as:
{
  "College": {
    "4F2EAB65": {
      "id": "4F2EAB65",
      "name": "SomeCollege"
    },
    "A3C2ED31": {
      "id": "A3C2ED31",
      "name": "OtherCollege"
    },
    "F967B5A0": {
      "id": "F967B5A0",
      "name": "CoolCollege"
    }
  },
  "Student": {
    "3E20545B": {
      "college-ID": "4F2EAB65",
      "id": "3E20545B",
      "name": "A"
    },
    "6FDEE194": {
      "college-ID": "F967B5A0",
      "id": "6FDEE194",
      "name": "B"
    }
  }
}
I want to fetch student details containing "id", "name", "college-ID", and "college-Name" (I need to fetch "college-Name" by "college-ID").
I've achieved this using a for loop on the front end. Is there any way to achieve this on the Firebase server, i.e. can we do something like a join (SQL)?
Thanks.
There is no support for server-side joins in the Firebase Realtime Database. Client-side joins are quite normal.
The alternative is to duplicate the data upon writing, so that you don't have to read from two locations.
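For example (purely illustrative), duplicating the data would mean writing the college name onto each student record at write time:
"Student": {
  "3E20545B": {
    "college-ID": "4F2EAB65",
    "college-Name": "SomeCollege",
    "id": "3E20545B",
    "name": "A"
  }
}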
What's best for your application is a matter of personal preference, your comfort level with the code involved vs data duplication, and the use-cases of your app.
Client-side joins are likely not as slow as you may think. See http://stackoverflow.com/questions/35931526/speed-up-fetching-posts-for-my-social-network-app-by-using-query-instead-of-obse/35932786#35932786
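A minimal sketch of such a client-side join, assuming the Firebase Web SDK (v8-style namespaced API, already initialized) and the data layout shown in the question:
// Client-side join: read the student, then look up the college by its ID.
import firebase from "firebase/app";
import "firebase/database";

async function getStudentWithCollege(studentId: string) {
  const db = firebase.database();
  const studentSnap = await db.ref("Student/" + studentId).once("value");
  const student = studentSnap.val();
  if (!student) return null;
  const collegeSnap = await db.ref("College/" + student["college-ID"]).once("value");
  const college = collegeSnap.val();
  // Merge the college name into the student record.
  return { ...student, "college-Name": college ? college.name : null };
}
Two reads per student is usually acceptable; for lists, you can fetch the colleges once and cache them locally.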

neo4jClient create node with dynamic label using parameters

I am building an app that gives users the ability to construct their own graphs. I have been using parameters for all queries and creates. But now I want to give users the ability to create a node that they can label anything they want (respecting Neo4j's restrictions on empty-string labels). How would I parameterize this type of transaction?
I tried this:
.CREATE("(a:{dynamicLabel})").WithParams(new {dynamicLabel = dlabel})...
But this yields a syntax error from Neo4j. I am tempted to concatenate, but am worried that this may expose my application to an injection risk.
I am also tempted to build my own class that inspects the intended string and rejects any Neo4j syntax, but this would limit my users a bit and I would rather not.
There is an open Neo4j issue 4334, which is a feature request for adding the ability to parameterize labels during CREATE. So, this is not yet possible.
That issue contains a comment that suggests generating CREATE statements with hardcoded labels, which will work. It is, unfortunately, not as performant as using parameters (should it ever be supported in this case).
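A sketch of what that can look like with Neo4jClient, assuming dlabel has already been validated (say, restricted to letters, digits, and underscores) so that concatenating it is safe, while the node's properties remain parameterized:
.Create("(a:`" + dlabel + "` {newNode})")
.WithParam("newNode", nodeProperties)
The backticks allow unusual characters in the label, but it is the validation step that actually protects against injection.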
I searched like hell and finally found it. You can do it like this:
// create or update nodes with dynamic label from import data
WITH "file:///query.json" AS url
CALL apoc.load.json(url) YIELD value AS u
UNWIND u.cis AS ci
CALL apoc.merge.node([ci.label], {Id: ci.Id}, {}, {}) YIELD node
RETURN node;
The JSON looks like this:
{
  "cis": [
    {
      "label": "Computer",
      "Id": "1"
    },
    {
      "label": "Service",
      "Id": "2"
    },
    {
      "label": "Person",
      "Id": "3"
    }
  ],
  "relations": [
    {
      "end1Id": "1",
      "Id": "4",
      "end2Id": "2",
      "label": "USES"
    },
    {
      "end1Id": "3",
      "Id": "5",
      "end2Id": "1",
      "label": "MANAGED_BY"
    }
  ]
}
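The relations array in the same file can be handled in much the same way; a sketch, assuming the nodes created above carry the Id property and that APOC's merge procedures are available in your installation:
WITH "file:///query.json" AS url
CALL apoc.load.json(url) YIELD value AS u
UNWIND u.relations AS r
// look up both endpoints by the Id property set when the nodes were merged
MATCH (a {Id: r.end1Id}), (b {Id: r.end2Id})
CALL apoc.merge.relationship(a, r.label, {Id: r.Id}, {}, b) YIELD rel
RETURN rel;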
If you are using a Java client, then you can do it like this.
// graphDb is your GraphDatabaseService instance
Node node = graphDb.createNode();
Label label = new Label() {
    @Override
    public String name() {
        return dynamicLabelVal;
    }
};
node.addLabel(label);
You can then have a LabelCache which will avoid Label object creation for every node.
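A minimal sketch of such a cache (names are illustrative; it assumes a Neo4j version that provides the static factory Label.label(), otherwise substitute the anonymous-class construction shown above):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.neo4j.graphdb.Label;

// Caches Label instances so a new object is not created for every node.
class LabelCache {
    private final Map<String, Label> cache = new ConcurrentHashMap<>();

    Label forName(String name) {
        return cache.computeIfAbsent(name, Label::label);
    }
}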

Solr CollapsingQParserPlugin with group.facet=on style facet counts

I have a Solr index of about 5 million documents at 8GB using Solr 4.7.0. I require grouping in Solr, but find it to be too slow. Here is the group configuration:
group=on
group.facet=on
group.field=workId
group.ngroups=on
The machine has ample memory at 24GB and 4GB is allocated to Solr itself. Queries are generally taking about 1200ms compared to 90ms when grouping is turned off.
I ran across a plugin called CollapsingQParserPlugin, which uses a filter query to remove all but one member of each group.
fq={!collapse field=workId}
It's designed for indexes that have a lot of unique groups; I have about 3.8 million. This approach is much, much faster, at about 120ms. It's a beautiful solution for me except for one thing: because it filters out the other members of the group, only facets from the representative document are counted. For instance, if I have the following three documents:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book"
},
{
"id": "2",
"workId": "abc",
"type": "ebook"
},
{
"id": "3",
"workId": "abc",
"type": "ebook"
}
]
once collapsed, only the top one shows up in the results. Because the other two get filtered out, the facet counts look like
"type": ["book":1]
instead of
"type": ["book":1, "ebook":1]
Is there a way to get group.facet counts using the collapse filter query?
According to Yonik Seeley, the correct group facet counts can be gathered using the JSON Facet API. His comments can be found at:
https://issues.apache.org/jira/browse/SOLR-7036?focusedCommentId=15601789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15601789
I tested out his method and it works great. I still use the CollapsingQParserPlugin to collapse the results, but I exclude the filter when counting up the facets like so:
fq={!tag=workId}{!collapse field=workId}
json.facet={
  type: {
    type: terms,
    field: type,
    facet: {
      workCount: "unique(workId)"
    },
    domain: {
      excludeTags: [workId]
    }
  }
}
And the result:
{
  "facets": {
    "count": 3,
    "type": {
      "buckets": [
        {
          "val": "ebook",
          "count": 2,
          "workCount": 1
        },
        {
          "val": "book",
          "count": 1,
          "workCount": 1
        }
      ]
    }
  }
}
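For reference, one way to send that request (the collection name is a placeholder, and this assumes a Solr version that supports the JSON Facet API; posting the parameters as form data avoids hand-encoding the local-params and JSON syntax):
curl http://localhost:8983/solr/mycollection/select \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq={!tag=workId}{!collapse field=workId}' \
  --data-urlencode 'json.facet={type:{type:terms,field:type,facet:{workCount:"unique(workId)"},domain:{excludeTags:[workId]}}}'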
I was unable to find a way to do this with Solr or plugin configurations, so I developed a workaround to effectively create group facet counts while still using the CollapsingQParserPlugin.
I do this by duplicating the fields I'll be faceting on and making sure all facet values for the entire group are present in each document, like so:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book",
"facetType": [
"book",
"ebook"
]
},
{
"id": "2",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
},
{
"id": "3",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
}
]
When I ask Solr to generate facet counts, I use the new field:
facet.field=facetType
This ensures that all facet values are accounted for and that the counts represent groups. But when I use a filter query, I revert back to using the old field:
fq=type:book
This way the correct document is chosen to represent the group.
I know this is a dirty, complex way to make it work, but it does work and that's what I needed. Also it requires the ability to query your documents before insertion into Solr, which calls for some development. If anyone has a simpler solution I would still love to hear it.

Set based updates in RavenDB

I am working with RavenDB documents. I need to change a field in all the documents at once. I read that there is something called set-based updates in the RavenDB documentation. I need a little help to point me in the right direction here.
A Patron document looks something like this:
{
  "Privilege": [
    {
      "Level": "Gold",
      "Code": "12312",
      "EndDate": "12/12/2012"
    }
  ],
  "Phones": [
    {
      "Cell": "123123",
      "Home": "9783041284",
      "Office": "1234123412"
    }
  ]
}
In the Patrons document collection, there is a Privilege.Level field in each doc. I need to write a query to update it to "Gold" for all documents in the Patrons collection. This is what I know so far: I need to create an index (ChangePrivilegeIndex) first:
from Patrons in docs.patrons
select new {Patrons.Privilege.Level}
and then write a curl statement to patch all the documents at once, something like this:
PATCH http://localhost:8080/bulk_docs/ChangePrivilegeIndex
[
{ "Type": "Set", "Name": "Privilege.Level", "Value": "Gold"}
]
I need help getting this to work, thank you. I know there are lots of loose ends in the scripts above; that's why it's not working. Can someone look at the scenario and the script above and point me in the right direction?

Resources