I loaded data from a CSV file into Neo4j. One column in the file is an array of arrays, which Neo4j now treats as one large string. How can I convert it back into an array?
My file looks like this:
Id, name, reviews
1, item1, "[[date1, User1, Rating1],
[date2, User2, Rating2],
[date3, User3, Rating3]] "
Into Neo4J:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///data/file.csv" AS line
CREATE (:Product {
    Id: toInteger(line.Id),
    name: toString(line.name),
    reviews: line.reviews})
RETURN line
Now the reviews column is loaded, but it is treated as one large string:
"[date1, User1, Rating1], [date2, User2, Rating2], [date3, User3, Rating3]"
Is there any way to split it into two arrays like this:
First Array:
[date1, User1, Rating1], //0
[date2, User2, Rating2], //1
[date3, User3, Rating3] //2
Second Array example:
// 0 , 1 , 2
[date1, User1, Rating1]
I'd like to be able to access my data like this:
MATCH (n) RETURN n.reviews[2] (output: date3, User3, Rating3)
MATCH (n) RETURN n.reviews[2][0] (output: date3)
MATCH (n) RETURN n.reviews[1][1] (output: User2)
Is there any way to do this?
Using APOC Procedures, you can use the apoc.convert.fromJsonList() function to convert the list, though you'll need to make sure each of the subitems in the arrays is quoted so they'll be interpreted as strings.
WITH "[['date1', 'User1', 'Rating1'], ['date2', 'User2', 'Rating2'], ['date3', 'User3', 'Rating3']]" as inputString
WITH apoc.convert.fromJsonList(inputString) as input
RETURN input[2][0] //returns 'date3'
Just a note: the conversion functions are currently missing from the APOC docs, but you can look up their signatures by entering this in the Neo4j browser:
CALL apoc.help('fromjson')
And now for the bad news.
Although you can do this with literal input and parameters, and you can convert from properties that are JSON strings, you cannot store a nested list as a property of a node or relationship; that's just a current limitation of the properties implementation.
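So if you keep the raw string as a property, you can convert it at read time. A minimal sketch, assuming the stored reviews string is valid JSON (i.e. the sub-items are double-quoted):
MATCH (p:Product {Id: 1})
WITH apoc.convert.fromJsonList(p.reviews) AS reviews
RETURN reviews[2], reviews[2][0], reviews[1][1]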
That said, it feels like your modeling may need some improvement. We'd recommend using separate nodes for your reviews rather than this nested structure, so something like:
(:Product)-[:HAS_REVIEW]->(:Review)
Where the :Review node has the date and rating, and either has the user ID, or has a relationship to the user node who rated the product.
Usage would look something like:
MATCH (p:Product {id:12345})-[:HAS_REVIEW]->(r)
WITH p, collect(r) as reviews
...
At that point you have an (unordered) list of the review nodes, and you can use index access to get a review at a particular index, then dot notation to access the property or properties you want. If you want ordering, you'll need an explicit ORDER BY before the collect(), and you'll need something to order by, probably the date of the review.
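For example, a minimal sketch of the ordered collect (the Review property names date, user and rating are assumptions):
MATCH (p:Product {id: 12345})-[:HAS_REVIEW]->(r:Review)
WITH p, r
ORDER BY r.date
WITH p, collect(r) AS reviews
RETURN p.name, reviews[2].date, reviews[2].user, reviews[2].rating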
I am loading simple csv data into neo4j. The data is simple as follows :-
uniqueId compound value category
ACT12_M_609 mesulfen 21 carbon
ACT12_M_609 MNAF 23 carbon
ACT12_M_609 nifluridide 20 suphate
ACT12_M_609 sulfur 23 carbon
I am loading the data from the URL using the following query -
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE (t:Transaction { transactionId: row.uniqueId })
MERGE (c:Compound { name: row.compound })
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category = row.category
ON CREATE SET r.price = row.value
Next I do the aggregation to count total orders for a compound and create property for a node in the following way -
MATCH (c:Compound) <-[:CONTAINS]- (t:Transaction)
with c.name as name, count( distinct t.transactionId) as ord
set c.orders = ord
So far so good. I can accomplish what I want but I have the following 2 questions -
How can I create the orders property for the Compound node in the first step itself? I.e., when I am loading the data, I would like to perform the aggregation straight away.
For a Compound node I am also setting the category property. Theoretically, it could also be modelled as (category)-[:CONTAINS]->(compound) by creating a Category node. But what advantage would I gain by doing that? I can already execute the queries and get the expected output without creating this additional node.
Thank you for your answer.
I don't think that's possible; LOAD CSV goes over one row at a time, so at row 1 it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships, aggregate those and then use those to create the real nodes, but that would be way more complicated (see Virtual Nodes/Rels).
That depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often run a query where the category is a criterion (e.g. MATCH (c:Category {category_id: 12})-[r]-(:Compound)), it might be more performant to model it as a node with its own label.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.
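For illustration, here is a sketch of what that alternative model could look like at load time; the Category label and the IN_CATEGORY relationship type are assumptions, not part of your current model:
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE (t:Transaction { transactionId: row.uniqueId })
MERGE (c:Compound { name: row.compound })
MERGE (cat:Category { name: row.category })
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET r.price = row.value
MERGE (c)-[:IN_CATEGORY]->(cat)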
I am creating an app kind of like Facebook. It is an app where people can share products and collections of products. In the "create a post" popup, people can either select a product or a collection (group of products but consider it as a single object) or just text to create a post. I need to fetch the posts created by my followers.
Each post will have a type property with the value PRODUCT, COLLECTION, or TEXT to indicate what type of post it is.
In my Neo4j DB there are Post, Product, Collection and User nodes.
When you create a post, relationships are created between them:
(post)-[:CREATED_BY]->(USER)
(post{type:"PRODUCT"})-[:INCLUDES]->(product)
(post{type:"COLLECTION})-[:INCLUDES]->(collection)
This is what I tried in order to get the posts of type "PRODUCT". It shows an error, but it should give a basic idea of our properties.
MATCH (user:User {lastName: "mylastname"})-[:FOLLOWS {status: "accepted"}]->(following) WITH following
OPTIONAL MATCH (post:Post {type: "PRODUCT"})-[r:CREATED_BY]->(following) WITH post, user, r
OPTIONAL MATCH (post)-[:INCLUDES]->(product:Product)
WITH COLLECT({post: post, datetime: r.datetime, type: "PRODUCT", product: product user: following}) AS productPosts
UNWIND productPosts AS row
RETURN row
ORDER BY row.datetime DESC
SKIP 0
LIMIT 10
Your WITH clauses are not specifying all the variables that need to be carried forward to the remainder of the query. Also, there is at least one typo (a missing comma).
In fact, your query does not even need any WITH clauses. Nor does it need to COLLECT a list only to immediately UNWIND it.
This query should work better:
MATCH (user:User{lastName: "mylastname"})-[:FOLLOWS {status: "accepted"}]->(following)
OPTIONAL MATCH (post:Post {type: "PRODUCT"})-[r:CREATED_BY]->(following)
OPTIONAL MATCH (post)-[:INCLUDES]->(product:Product)
RETURN {post:post, datetime: r.datetime, type:"PRODUCT", product:product, user: following} AS row
ORDER BY row.datetime DESC
LIMIT 10
I want to find method-pairs that read or write the same field. For this I wrote this query:
match (c:Class)-[:DECLARES]->(m1:Method), (c)-[:DECLARES]-(m2:Method), (c)-[:DECLARES]-(f:Field), (m1)-[:WRITES|READS]->(f), (m2)-[:WRITES|READS]->(f)
return m1.name, m2.name, f.name
Now I have the problem that there are several duplicates in the results.
I want every "m1.name" and "m2.name" pair to be unique. Is there a way to filter out results that are swapped versions of other results?
If you enforce a specific ordering of the Method nodes' native IDs, that will produce distinct Method name pairs (assuming method names are unique):
MATCH
(c:Class)-[:DECLARES]->(m1:Method),
(c)-[:DECLARES]-(m2:Method),
(c)-[:DECLARES]-(f:Field),
(m1)-[:WRITES|READS]->(f),
(m2)-[:WRITES|READS]->(f)
WHERE ID(m1) < ID(m2)
RETURN m1.name, m2.name, f.name
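If you'd rather not rely on native IDs, a sketch of the same idea comparing the names directly (again assuming method names are unique):
MATCH
(c:Class)-[:DECLARES]->(m1:Method),
(c)-[:DECLARES]-(m2:Method),
(c)-[:DECLARES]-(f:Field),
(m1)-[:WRITES|READS]->(f),
(m2)-[:WRITES|READS]->(f)
WHERE m1.name < m2.name
RETURN m1.name, m2.name, f.name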
I am currently investigating how to model a bitemporal graph in Neo4j. Unfortunately, no one seems to have publicly undertaken this before.
One particular thing I am looking at is whether I can store, in a new node, only those values that have changed, and then express a query that merges all those values ordered by a given timestamp.
This creates the data I am playing with:
CREATE (:P1 {id: '1'})<-[:EXPANDS {date:5200, recorded:5100}]-(:P1Data {name:'Joe', wage: 3000})
// New data, recorded 2014-10-1 for 2015-1-1
MATCH (p:P1 {id: '1'}) CREATE (:P1Data { wage:3100 })-[:EXPANDS { date:5479, recorded: 5387}]->(p)
Now, I can get a history for a given point in time so far, e.g. like
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WHERE x.recorded < 6000
WITH {date: x.date, data:d} as data
RETURN data
ORDER BY data.date DESC
What I would like to achieve is to merge the name and wage values such that I get a whole view of the data at a given point in time. The answer may also be that this is not really possible.
(PS: I say "only in a query" because I found a refactoring procedure in APOC which does merge nodes, but that procedure actually merges and persists the nodes, while I just want to compute the merged view in a query.)
As with most things, you can do it using REDUCE like so:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
WITH REDUCE(s = {}, y IN datas|
{name: COALESCE(y.name, s.name),
wage: COALESCE(y.wage, s.wage)})
AS most_recent_fields
RETURN most_recent_fields.name AS name, most_recent_fields.wage AS wage
You can do it in descending order instead (swap s and y inside the COALESCE statements if so), but there isn't really a way to shortcut processing the entire set of results from your queried time back to the start.
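For instance, a sketch of the descending variant, where earlier rows only fill in fields the more recent rows left unset:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date DESC
WITH COLLECT(data) AS datas
WITH REDUCE(s = {}, y IN datas|
    {name: COALESCE(s.name, y.name),
     wage: COALESCE(s.wage, y.wage)})
    AS most_recent_fields
RETURN most_recent_fields.name AS name, most_recent_fields.wage AS wage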
UPDATE: This will, of course, generate a Map and not a Node, but if you only want the properties and don't want to create a permanent record, a Map is actually better suited to your needs.
EXTENDED: If you don't want to specify which keys to use, you can do it without REDUCE like this instead:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
CREATE (t:Temp)
FOREACH(data IN datas|
SET t += data)
DELETE t
RETURN t
This approach does create a node, but if you DELETE it right before you RETURN it, it won't persist at all. += ensures that pre-existing properties aren't removed, only overwritten if the data node has existing values.
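A tiny illustration of the += semantics, with hypothetical property values:
// t starts with two properties; += only overwrites the keys present in the map
CREATE (t:Temp {name: 'Joe', wage: 3000})
SET t += {wage: 3100}
RETURN t.name, t.wage // 'Joe', 3100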
Nodes returned in Neo4j seem to be special, in that they are output as JSON objects, and they don't appear at all if they're null.
An example:
I have a :Person object, and they can have 0 or more :Friend relationships to another :Person.
Let's say that a :Person has the following properties: ID, firstName, lastName, sensitiveThing.
sensitiveThing is a property that might be used by our system, or could be personally accessible to the user themselves, but we don't want to return it to any other user.
If I want a query to give me back data of my friends, and friends of those friends, I might use a query like this
MATCH (me:User{ID:777})-[:Friend]-(friend:User)
WITH me, friend
OPTIONAL MATCH (friend)-[:Friend]-(foaf:User)
WHERE me <> foaf
RETURN friend.ID, friend.firstName, friend.lastName, COLLECT(foaf) AS FriendOfAFriend
While this lets me nicely bundle up friends of friends as JSON objects within a JSON array, and correctly emits an empty array if a friend doesn't have any other friends besides me, I don't want to return sensitiveThing with this query.
If I try to replace COLLECT(foaf) with a custom object only including fields I care about, like this:
COLLECT({ID:(foaf.ID), firstName:(foaf.firstName), lastName:(foaf.lastName)})
then I get what I want...until I hit the case where there are no friends of friends. When I was working with nodes before, the object wouldn't even be emitted. But now, I would get something like this returned to me:
[{ID: (null), firstName: (null), lastName: (null)}]
This is obviously not what I want.
Ideally, I'm looking for a way to return a node as before, but whitelist or blacklist the properties I want to emit, so I can retain the correct null handling if the node is null (from an optional match).
If I can't have that, then I'd like a way to use a custom object, but not return the object at all if all its fields are null.
Any other workarounds or tips for dealing with optional matches are more than welcome.
You can use apoc.map.removeKeys:
WITH {p1: 1, p2: 2, p3: 3, p4: 4} as node
CALL apoc.map.removeKeys( node, ['p2', 'p4'] ) YIELD value
RETURN value
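Applied to the query from the question, a sketch could look like this (assuming apoc.map.removeKeys is also available as a function in your APOC version, and that sensitiveThing is the only property to hide):
MATCH (me:User {ID: 777})-[:Friend]-(friend:User)
OPTIONAL MATCH (friend)-[:Friend]-(foaf:User)
WHERE me <> foaf
RETURN friend.ID, friend.firstName, friend.lastName,
[x IN collect(foaf) | apoc.map.removeKeys(properties(x), ['sensitiveThing'])] AS FriendOfAFriend
// collect() skips nulls, so you still get an empty list when there are no friends of friends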
I've never seen a way to whitelist or blacklist properties in the documentation.
However, you can return your custom object by chaining collect with extract:
MATCH (me:User{ID:777})-[:Friend]-(friend:User)
WITH me, friend
OPTIONAL MATCH (friend)-[:Friend]-(foaf:User)
WHERE me <> foaf
WITH friend, collect(foaf) AS FriendOfAFriend
RETURN friend.ID, friend.firstName, friend.lastName,
extract(foaf in FriendOfAFriend | {ID:(foaf.ID), firstName:(foaf.firstName), lastName:(foaf.lastName)}) AS FriendOfAFriend
collect will return an empty list if there are no friends, and extract will keep it that way.
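If extract() is not available in your Neo4j version, a list comprehension gives the same result:
MATCH (me:User{ID:777})-[:Friend]-(friend:User)
WITH me, friend
OPTIONAL MATCH (friend)-[:Friend]-(foaf:User)
WHERE me <> foaf
WITH friend, collect(foaf) AS FriendOfAFriend
RETURN friend.ID, friend.firstName, friend.lastName,
[foaf IN FriendOfAFriend | {ID: foaf.ID, firstName: foaf.firstName, lastName: foaf.lastName}] AS FriendOfAFriend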