I have a data structure composed of two sorts of nodes: item and claim. A claim represents some information about an item and may refer to other items, such as one entity being located in another, e.g. "Germany is in Europe". Example structure:
CREATE
(v1:item {id: "Q1", name: "Europe"}),
(v2:item {id: "Q2", name: "France"}),
(v3:item {id: "Q3", name: "Germany"}),
(v4:item {id: "Q4", name: "Bavaria"}),
(v5:item {id: "Q5", name: "Munich"}),
(c1:claim:located),
(c2:claim:located),
(c3:claim:located),
(c4:claim:located),
(v5)-[:claim]->(c4),
(c4)-[:located]->(v4),
(v4)-[:claim]->(c3),
(c3)-[:located]->(v3),
(v3)-[:claim]->(c2),
(c2)-[:located]->(v1),
(v2)-[:claim]->(c1),
(c1)-[:located]->(v1);
The same data is also at http://console.neo4j.org/?id=ncbom6. Now, if I wanted to traverse it, e.g. to find all items in Germany or in Europe, how can I do this? Is it possible with Cypher in this model? I know there is variable-length syntax like (v1)-[r*]->(v2), but that matches either one specific relationship type or any type, and I need a repeating pattern of claim-located pairs.
If you want to find, for example, all the items in Europe, using the data in your console:
MATCH (v:item { name: "Europe" })<-[:claim|located*]-(x:item)
RETURN x;
If you also want to ensure that the path traversed strictly alternates between claim and located relationships, here is a somewhat tricky way to do that:
MATCH (v:item { name: "Europe" })<-[rel:claim|located*]-(x:item)
WHERE REDUCE(s = 0, r IN rel | CASE
    WHEN (s = 0 AND TYPE(r) = 'located') THEN 1
    WHEN (s = 1 AND TYPE(r) = 'claim') THEN 0
    ELSE NULL END) = 0
RETURN x;
Note that rel is ordered from the Europe end of the path, so a strictly alternating path starts with a located relationship and ends with a claim, leaving the accumulator at 0; any other sequence falls through to NULL and fails the test. You can modify the WHEN tests if you need additional checking.
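As an aside, if you are on Neo4j 5.9 or later, a quantified path pattern can express the alternation directly, without the REDUCE trick. This is a sketch of the newer syntax, not something from the answer above:
MATCH (x:item) ((:item)-[:claim]->(:claim)-[:located]->(:item)){1,} (v:item { name: "Europe" })
RETURN x;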
I am loading simple CSV data into Neo4j. The data looks like this:
uniqueId     compound     value  category
ACT12_M_609  mesulfen     21     carbon
ACT12_M_609  MNAF         23     carbon
ACT12_M_609  nifluridide  20     suphate
ACT12_M_609  sulfur       23     carbon
I am loading the data from the URL with the following query:
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE (t:Transaction { transactionId: row.uniqueId })
MERGE (c:Compound { name: row.compound })
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category = row.category
ON CREATE SET r.price = row.value
Next I aggregate to count the total orders per compound and set that as a property on the node, like this:
MATCH (c:Compound)<-[:CONTAINS]-(t:Transaction)
WITH c, count(DISTINCT t.transactionId) AS ord
SET c.orders = ord
So far so good. I can accomplish what I want, but I have the following two questions:
How can I create the orders property for the Compound node in the first step itself, i.e. perform the aggregation straight away while loading the data?
For a Compound node I am also setting the category property. Theoretically, this could instead be modelled as (category)-[:CONTAINS]->(compound) by creating a Category node. But what advantage would that give me? I can already execute my queries and get the expected output without creating this additional node.
Thank you for your answer.
I don't think that's possible: LOAD CSV goes over one row at a time, so at row 1 it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships, aggregate those, and then use them to create the real nodes, but that would be way more complicated. See APOC's Virtual Nodes/Rels.
That depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often run queries where the category is a criterion (e.g. MATCH (c:Category { category_id: 12 })-[r]-(:Compound)), it might be more performant to model the category as a node.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.
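If you later decide to promote the category to its own node, a one-off refactoring pass along these lines should do it (a sketch; Category and the CONTAINS relationship type are hypothetical names, not from the question):
MATCH (c:Compound)
// MERGE fails on null property values, so this assumes every compound has a category
MERGE (cat:Category { name: c.category })
MERGE (cat)-[:CONTAINS]->(c);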
I have a node:
Database: {
    name: 'example',
    description: 'this is the example database',
    type: 'relational'
}
I want type to be an enum like:
DB_TYPE enum {
relational
document
graph
other
}
1st Question: How can I define this enum type so that all "database" nodes can have a type property that is one of these 4 values?
Should I just leave it as a string and forget about making an enum?
I considered using labels for these nodes like: :Relational, :Document.
2nd Question: If I should use labels, what is the cypher syntax to determine if a given database node is either relational, document, graph, or other?
AFAIK, there is no way to define an enum property for a node. From what you described, I think you'd be better off using labels. If you really don't want to use labels, another alternative is to have one node per type and connect database nodes to these type nodes (sketched below). But depending on the size of your graph, those type nodes could become supernodes with huge numbers of relationships, so I would not suggest this approach. Again, to me the best solution in such use cases is labels.
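For concreteness, the type-node alternative could look something like this (a sketch; the DbType label and HAS_TYPE relationship type are names I made up):
// one shared node per type, linked to each database
MERGE (t:DbType { name: 'relational' })
WITH t
MATCH (d:Database { name: 'example' })
MERGE (d)-[:HAS_TYPE]->(t);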
The easiest approach is to check against labels(a), where a is your node. For example:
MATCH (a) WHERE 'Relational' IN labels(a) OR 'Document' IN labels(a) ...
There is also an APOC function, apoc.label.exists, that you can use:
MATCH (a) WHERE apoc.label.exists(a, 'Relational') OR apoc.label.exists(a, 'Document') ...
If you are using Python with neomodel as an ORM tool, you can define your model in the following way:
from neomodel import StructuredNode, StringProperty

class Database(StructuredNode):
    # (stored value, display value) pairs accepted for the type property
    TYPES = (
        ('RELATIONAL', 'relational'),
        ('DOCUMENT', 'document'),  # ...
    )
    name = StringProperty(unique_index=True)
    description = StringProperty()
    type = StringProperty(choices=TYPES)
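Hypothetical usage of the model above; note that the stored value is the first element of each TYPES pair, and (as I understand neomodel's choices handling) values outside the declared choices are rejected when the node is saved:
# 'RELATIONAL' is a declared choice; an unknown value should raise an error on save
db = Database(name='example', type='RELATIONAL').save()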
More info:
https://neomodel.readthedocs.io/en/latest/module_documentation.html#neomodel.properties.StringProperty
https://github.com/neo4j-contrib/neomodel/commit/dee7ca0b83cecf0156dc164052701ce7b8ebe14a
I am currently investigating how to model a bitemporal graph in Neo4j. Unfortunately, no one seems to have publicly undertaken this before.
One particular thing I am looking at is whether I can store, in a new node, only the values that have changed, and then express a query that merges all those values ordered by a given timestamp:
This creates the data I am playing with:
CREATE (:P1 {id: '1'})<-[:EXPANDS {date:5200, recorded:5100}]-(:P1Data {name:'Joe', wage: 3000})
// New data, recorded 2014-10-1 for 2015-1-1
MATCH (p:P1 {id: '1'}) CREATE (:P1Data { wage:3100 })-[:EXPANDS { date:5479, recorded: 5387}]->(p)
Now, I can already get a history for a given point in time, e.g.:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WHERE x.recorded < 6000
WITH { date: x.date, data: d } AS data
RETURN data
ORDER BY data.date DESC
What I would like to achieve is to merge the name and wage values such that I get a whole view of the data at a given point in time. The answer may also be that this is not really possible.
(PS: I say "only in a query" because I found a refactoring procedure in APOC that does merge nodes, but it merges and persists the node, whereas I just want to compute the merged view in a query.)
As with most things, you can do it using REDUCE like so:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
WITH REDUCE(s = {}, y IN datas |
    { name: COALESCE(y.name, s.name),
      wage: COALESCE(y.wage, s.wage) }) AS most_recent_fields
RETURN most_recent_fields.name AS name, most_recent_fields.wage AS wage
You can do it in descending order instead (swap s and y inside the COALESCE statements if so), but there isn't really a way to shortcut processing the entire set of results from your queried time back to the start.
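If you want the state as of a particular recording time, you can combine this fold with the filter from your own query (a sketch reusing your cutoff of 6000):
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WHERE x.recorded < 6000
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
RETURN REDUCE(s = {}, y IN datas |
    { name: COALESCE(y.name, s.name),
      wage: COALESCE(y.wage, s.wage) }) AS state_at_time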
UPDATE: This will, of course, generate a Map and not a Node, but if you only want the properties and don't want to create a permanent record, a Map is actually better suited to your needs.
EXTENDED: If you don't want to specify which keys to use, you can do it without REDUCE like this instead:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
CREATE (t:Temp)
FOREACH (data IN datas |
    SET t += data)
DELETE t
RETURN t
This approach does create a node, but because you DELETE it right before you RETURN it, it never persists. += ensures that properties already accumulated aren't removed, only overwritten when a later data node has a value for the same key.
I've read a lot of posts about finding the highest-valued objects in arrays using max and max_by, but my situation is another level deeper, and I can't find any references on how to do it.
I have an experimental Rails app in which I am attempting to convert a legacy .NET/SQL application. The (simplified) model looks like Overlay -> Calibration <- Parameter. In a single data set, I will have, say, 20K Calibrations, but about 3,000-4,000 of these are versioned duplicates by Parameter name, and I need only the highest-versioned Parameter by each name. Further complicating matters is that the version lives on the Overlay. (I know this seems crazy, but this models our reality.)
In pure SQL, we add the following to a query to create a virtual table:
n = ROW_NUMBER() OVER (PARTITION BY Parameters.Designation ORDER BY Overlays.Version DESC)
And then select the entries where n = 1.
I can order the array like this:
ordered_calibrations = mainline_calibrations.sort do |e, f|
[f.parameter.Designation, f.overlay.Version] <=> [e.parameter.Designation, e.overlay.Version] || 1
end
I get this kind of result:
C_SCR_trc_NH3SensCln_SCRT1_Thd 160
C_SCR_trc_NH3SensCln_SCRT1_Thd 87
C_SCR_trc_NH3Sen_DewPtHiThd_Tbl 310
C_SCR_trc_NH3Sen_DewPtHiThd_Tbl 160
C_SCR_trc_NH3Sen_DewPtHiThd_Tbl 87
So I'm wondering if there is a way, using Ruby's Enumerable built-in methods, to loop over the sorted array, and only return the highest-versioned elements per name. HUGE bonus points if I could feed an integer to this method's block, and only return the highest-versioned elements UP TO that version number ("160" would return just the second and fourth entries, above).
The alternative to this is that I could somehow implement the ROW_NUMBER() OVER in ActiveRecord, but that seems much more difficult to try. And, of course, I could write code to deal with this, but I'm quite certain it would be orders of magnitude slower than figuring out the right Enumerable function, if it exists.
(Also, to be clear, it's trivial to do .find_by_sql() and create the same result set as in the legacy application -- it's even fast -- but I'm trying to drag all the related objects along for the ride, which you really can't do with that method.)
I'm not convinced that doing this in the database isn't a better option, but since I'm unfamiliar with SQL Server I'll give you a Ruby answer.
I'm assuming that when you say "Parameter name" you're talking about the Parameters.Designation column, since that's the one in your examples.
One straightforward way you can do this is with Enumerable#slice_when, which is available in Ruby 2.2+. slice_when is good when you want to slice an array "between" values that are different in some way. For example:
[ { id: 1, name: "foo" }, { id: 2, name: "foo" }, { id: 3, name: "bar" } ]
  .slice_when {|a, b| a[:name] != b[:name] }.to_a
# => [ [ { id: 1, name: "foo" }, { id: 2, name: "foo" } ],
#      [ { id: 3, name: "bar" } ]
#    ]
You've already sorted your collection, so to slice it you just need to do this:
calibrations_by_designation = ordered_calibrations.slice_when do |a, b|
a.parameter.Designation != b.parameter.Designation
end
Now calibrations_by_designation is an array of arrays, each of which is sorted from greatest Overlay.Version to least. The final step, then, is to get the first element in each of those arrays:
highest_version_calibrations = calibrations_by_designation.map(&:first)
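For the bonus points: since each group is sorted from highest version to lowest, the first element at or below your cap is the highest version up to it. A sketch (highest_calibrations_up_to is a hypothetical helper of mine):
def highest_calibrations_up_to(groups, max_version)
  groups.map { |group|
    # each group is sorted descending by version, so find returns
    # the highest version not exceeding the cap (nil if none qualify)
    group.find { |cal| cal.overlay.Version <= max_version }
  }.compact
end

highest_calibrations_up_to(calibrations_by_designation, 160)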
Essentially, I'm storing a directed graph of entities in CouchDB, and need to be able to find edges going IN and OUT of the nodes in the graph.
SETUP:
The way the data is being stored right now is as follows. Each document represents a RELATION between two entities:
doc: {
entity1: { name: '' ... },
entity2: { name: '' ... }
...
}
I have a view which does a bunch of emits, two of which emit documents keyed on their entity1 component and on their entity2 component, so something like:
function(doc) {
  emit(['entity1', doc.entity1.name]);
  emit(['entity2', doc.entity2.name]);
}
Edges are directed, and go from entity1 to entity2. So if I want to find edges going out of an entity, I just query the first emit; if I want edges going into an entity, I query the second emit.
PROBLEM:
The problem is that I also need to capture edges both going INTO and OUT OF entities. Is there a way I can group or reduce these two emits into a single bi-directional set of UNIQUE pairs?
Is there a better way of organizing my view to promote this action?
It might be preferable to just create a second view. But there's nothing stopping you from cramming all sorts of different data into the same view like so:
function(doc) {
  if (doc.entity1.name == doc.entity2.name) {
    emit(['self-ref', doc.entity1.name], 1);
  }
  emit(['both', [doc.entity1.name, doc.entity2.name]], 1);
  emit(['either', [doc.entity1.name, "out"]], 1);
  emit(['either', [doc.entity2.name, "in"]], 1);
  emit(['out', doc.entity1.name], 1);
  emit(['in', doc.entity2.name], 1);
}
Then you could easily do the following:
find all the self-ref's:
startkey=["self-ref"]&endkey=["self-ref", {}].
find all of the edges (incoming or outgoing) for a particular node:
startkey=["either", [nodeName]]&endkey=["either", [nodeName, {}]]
If you don't reduce this, you'll still be preserving "in" vs "out" in the key. If you never need to query for all nodes with incoming (or all with outgoing) edges at once, you can drop the last two emits, since the "either" emits carry the same information.
find all of the edges from node1 -> node2:
key=["both", [node1, node2]]
as well as your original queries for incoming or outgoing for a particular node.
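If you also want the UNIQUE-pairs view from your question, pairing this map with a built-in reduce (my assumption, not something specified above) collapses duplicate edges when queried with group=true:
// design document sketch: the map function above plus a _count reduce
{
  "views": {
    "edges": {
      "map": "function(doc) { /* emits as above */ }",
      "reduce": "_count"
    }
  }
}
// unique keys with edge counts, e.g. all distinct node pairs:
// GET /db/_design/graph/_view/edges?group=true&startkey=["both"]&endkey=["both", {}]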
I'd recommend benchmarking your application's typical use cases before choosing between this combined view approach and a multi-view approach.