How to create a relationship using Cypher - Neo4j

I have been learning Neo4j/Cypher for the last week. I have finally been able to upload two CSV files and create a relationship, "captured". However, I am not fully confident in my understanding of the code, as I was following the tutorial on the Neo4j site. Could you please help me confirm that what I did is correct?
I have two CSV files, "cap.csv" and "survey.csv". The survey table contains data for each unique survey conducted at the survey sites. The cap table contains data for each unique organism captured. In the cap table I have a foreign key, "survey_id", which in the Postgres DB you would join to the primary key in the survey table.
I want to create a relationship, "captured", showing each unique organism that was captured, based on the "date" column in the survey table.
Survey table

| lake_id | date     | survey_id | duration |
|---------|----------|-----------|----------|
| 1       | 05/27/14 | 1         | 7        |
| 2       | 03/28/13 | 2         | 10       |
| 2       | 06/29/19 | 3         | 23       |
| 3       | 08/21/21 | 4         | 54       |
| 1       | 07/23/18 | 5         | 23       |
| 2       | 07/22/23 | 6         | 12       |
Capture table

| cap_id | species | capture_life_stage | weight | survey_id |
|--------|---------|--------------------|--------|-----------|
| 1      | a       | adult              | 10     | 1         |
| 2      | a       | adult              | 10     | 2         |
| 3      | b       | juv                | 23     | 3         |
| 4      | a       | adult              | 54     | 4         |
| 5      | b       | juv                | 23     | 5         |
| 6      | c       | juv                | 12     | 6         |
LOAD CSV WITH HEADERS FROM 'file:///cap.csv' AS row
WITH
row.id as id,
row.species as species,
row.capture_life_stage as capture_life_stage,
toInteger(row.weight) as weight,
row.survey_id as survey_id
MATCH (c:cap {id: id})
MERGE (s) - [rel:captured {survey_id: survey_id}] ->(c)
return count(rel)
I am struggling to understand the code I wrote above. I followed the Neo4j tutorial exactly but used my own data (https://neo4j.com/developer/desktop-csv-import/).
I am fairly confident from data checks that it worked, but did the above code create the "captured" relationship showing each unique organism captured on that unique survey date? Based on the visualization, I believe it did, but I don't fully understand each step in the code.
What is the purpose of the MATCH (c:cap {id: id}) in the code?

The code below
MATCH (c:cap {id: id})
is the same as
MATCH (c:cap)
WHERE c.id = id
It is a shorter way of finding the cap node by its id; you are then creating a relationship to it from a Survey node.
Question: s is not defined in your query. Where is it?
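To make that MERGE deterministic, s has to be bound to a Survey node first. A minimal sketch of the corrected import, assuming the survey nodes were loaded with a survey label and a survey_id property (adjust labels and property names to your actual schema):

```cypher
LOAD CSV WITH HEADERS FROM 'file:///cap.csv' AS row
WITH
    row.cap_id AS id,        // the CSV header shown is cap_id; row.id would be null
    row.survey_id AS survey_id
MATCH (c:cap {id: id})
MATCH (s:survey {survey_id: survey_id})  // binds s; the original query never did this
MERGE (s)-[rel:captured]->(c)
RETURN count(rel)
```

As written in the question, the unbound (s) in MERGE makes Cypher match-or-create the entire pattern, so it can create a fresh anonymous node for every row where the pattern does not already exist.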


Google Sheets Return Any(All) Row Value(s) For MAX-GROUP Query

I am looking to return non-grouped row values from a query of a table sorted by the MAX value of a column, within a group.
DATA TABLE
| NAME | ASSET | ACTION | DATE |
|--|--|--|--|
| JOE | CAR | BOUGHT | 1/1/2020 |
| JANE | HORSE | BOUGHT | 1/1/2021 |
| JOE | HORSE | BOUGHT | 2/1/2021 |
| JANE | HORSE | SOLD | 3/1/2021 |
| JOE | CAR | SOLD | 1/1/2022 |
| JOE | CAR | BOUGHT | 2/1/2022 |
For the table above, I tried the following query:
=QUERY(A1:D5,"SELECT A,B,C,D, MAX(D) GROUP BY A,B",TRUE)
The following TARGET TABLE is the output I'm looking for:
| NAME | ASSET | ACTION | DATE |
|--|--|--|--|
| JANE | HORSE | SOLD | 3/1/2021 |
| JOE | HORSE | BOUGHT | 2/1/2021 |
| JOE | CAR | BOUGHT | 2/1/2022 |
However, because 'C' is not included in the GROUP, the formula returns an error. "Unable to parse query string for Function QUERY parameter 2: ADD_COL_TO_GROUP_BY_OR_AGG: C"
If I were to omit COL C & D, "ACTION" & "DATE" from the SELECT: =QUERY(A1:D5,"SELECT A,B, MAX(D) GROUP BY A,B",TRUE) , I have the correct record rows, but am missing the STATUS.
MAX-DATE TABLE
| NAME | ASSET | max DATE |
|--|--|--|
| JANE | HORSE | 3/1/2021 |
| JOE | HORSE | 2/1/2021 |
| JOE | CAR | 2/1/2022 |
OR, when I add COL C as a "PIVOT": =QUERY(A1:D5,"SELECT A,B, MAX(D) GROUP BY A,B PIVOT C",TRUE) I have the correct record rows, but do not have the 'current' STATUS within the record row.
PIVOT ACTION TABLE
| NAME | ASSET | BOUGHT | SOLD |
|--|--|--|--|
| JANE | HORSE | 1/1/2021 | 3/1/2021 |
| JOE | HORSE | 2/1/2021 | |
| JOE | CAR | 2/1/2022 | 1/1/2022 |
Still I have not found a method to create my TARGET TABLE.
Am I overlooking a method to include a non-grouped field into a query using MAX()? Or is it impossible within Google Sheets Query without JOIN functions?
(I hope it is obvious that I desire to apply this to a large and dynamic dataset.)
Thank you for your insight. Cheers!
QUERY isn't very flexible to work with, given its aggregation requirements and so on.
You can create a filter, by comparing column D with a "fictional" column created with BYROW: = BYROW(A2:A,LAMBDA(each,MAXIFS($D$2:$D,$A$2:$A,each,$B$2:$B,OFFSET(each,,1))))
That would look like this (I highlighted the matches and added extra rows for reference):
Then, you can set this filter (don't create this column, it's just a visualization of what I did):
=FILTER(A2:D,D2:D = BYROW(A2:A,LAMBDA(each,MAXIFS($D$2:$D,$A$2:$A,each,$B$2:$B,OFFSET(each,,1)))))
This way, you're comparing each row's date with the maximum for its NAME+ASSET group.
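As a side note (untested against your sheet, so treat it as a sketch), SORTN can also return group-wise maxima, by sorting on DATE descending and then removing rows that duplicate the NAME and ASSET columns:

```
=SORTN(SORT(A2:D,4,FALSE),9^9,2,1,TRUE,2,TRUE)
```

Here 9^9 just means "all rows", and display_ties_mode 2 keeps only the first row for each distinct (A, B) pair, which after the inner SORT is the latest one.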

Rails: create unique auto-incremental id based on sibling records

I have three models in my Rails project: User, Game, and Match.
A user can create many matches for each game, so the table structure for matches looks like this:
table name: game_matches
+----+---------+---------+-------------+------------+
| id | user_id | game_id | match_type | match_name |
+----+---------+---------+-------------+------------+
| 1 | 1 | 1 | practice | |
| 2 | 3 | 2 | challenge | |
| 3 | 1 | 1 | practice | |
| 4 | 3 | 2 | challenge | |
| 5 | 1 | 1 | challenge | |
| 6 | 3 | 2 | practice | |
+----+---------+---------+-------------+------------+
I want to generate match_name based on the user_id, game_id and match_type values.
For example, match_name should be created like below:
+----+---------+---------+-------------+-------------+
| id | user_id | game_id | match_type | match_name |
+----+---------+---------+-------------+-------------+
| 1 | 1 | 1 | practice | Practice 1 |
| 2 | 3 | 2 | challenge | Challenge 1 |
| 3 | 1 | 1 | practice | Practice 2 |
| 4 | 3 | 2 | challenge | Challenge 2 |
| 5 | 1 | 1 | challenge | Challenge 1 |
| 6 | 3 | 2 | practice | Practice 1 |
+----+---------+---------+-------------+-------------+
How can I achieve this auto-incrementing value in my Rails model during new record creation?
Any help or suggestions appreciated.
Thanks in advance.
I see two ways you can solve this:
DB: trigger
Rails: callback
Trigger (assuming Postgres):
DROP TRIGGER IF EXISTS trigger_add_match_name ON game_matches;
DROP FUNCTION IF EXISTS function_add_match_name();
CREATE FUNCTION function_add_match_name()
RETURNS trigger AS $$
BEGIN
NEW.match_name := (
  SELECT
    CONCAT(INITCAP(NEW.match_type), ' ', count(*) + 1)
  FROM game_matches
  WHERE game_matches.user_id = NEW.user_id
    AND game_matches.game_id = NEW.game_id
    AND game_matches.match_type = NEW.match_type
);
RETURN NEW;
END
$$ LANGUAGE 'plpgsql';
CREATE TRIGGER trigger_add_match_name
BEFORE INSERT ON game_matches
FOR EACH ROW
EXECUTE PROCEDURE function_add_match_name();
Please note that this is not tested.
Rails
class GameMatch < ApplicationRecord
  before_create :assign_match_name

  private

  def assign_match_name
    number = GameMatch.where(user_id: user_id, game_id: game_id, match_type: match_type).count
    self.match_name = "#{match_type.capitalize} #{number + 1}"
  end
end
Again, untested.
I'd prefer the trigger solution, since callbacks can be skipped or omitted altogether when inserting via pure SQL.
Also, I'd add a "match_number" column instead of the full name and then construct the name within the model, a decorator, or a view helper (more flexible, plays well with I18n), but the underlying logic stays the same.
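That match_number idea can be sketched in plain Ruby (no Rails here; next_match_number and display_name are hypothetical helpers, and in a real app the count would be a scoped ActiveRecord query as above):

```ruby
# Count existing sibling matches for the same user, game and type;
# the next match gets that count + 1.
def next_match_number(existing_matches, user_id:, game_id:, match_type:)
  siblings = existing_matches.count { |m|
    m[:user_id] == user_id && m[:game_id] == game_id && m[:match_type] == match_type
  }
  siblings + 1
end

# Build the display name from the stored type and number only when rendering.
def display_name(match_type, match_number)
  "#{match_type.capitalize} #{match_number}"
end
```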
You could retrieve the last match_name for this user, game, and type, split it, increase the counter, and join it back with a space. Unfortunately, standard SQL does not provide a SPLIT function (Postgres has the non-standard split_part), so something like the below would be a good start:
SELECT match_name
FROM game_matches
WHERE user_id = 3
  AND game_id = 2
  AND match_type = 'challenge'
ORDER BY id DESC
LIMIT 1
I would actually rather create a match_number column of type INT to keep the number per type, and produce the name by concatenating the type with this number.
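In Postgres, that split-and-increment step could look roughly like this (a sketch: split_part is Postgres-specific, and the query returns no row at all for the first match of a group, which the application would still need to handle):

```sql
-- Take the latest name, e.g. 'Challenge 2', and produce 'Challenge 3'.
SELECT CONCAT(
         split_part(match_name, ' ', 1), ' ',
         split_part(match_name, ' ', 2)::int + 1
       ) AS next_match_name
FROM game_matches
WHERE user_id = 3 AND game_id = 2 AND match_type = 'challenge'
ORDER BY id DESC
LIMIT 1;
```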

Concatenating nodes from a query into a single line for export to csv in Neo4J using Cypher

I have a neo4J graph that represents a chess tournament.
Say I run this:
MATCH (c:ChessMatch {m_id: 1})-[:PLAYED]-(p:Player) RETURN *
This gives me the results of the two players who played in a chess match.
The graph looks like this:
And the properties are something like this:
|--------------|------------------|
| (ChessMatch) | |
| m_id | 1 |
| date | 1969-05-02 |
| comments | epic battle |
|--------------|------------------|
| (player) | |
| p_id | 1 |
| name | Roy Lopez |
|--------------|------------------|
| (player) | |
| p_id | 2 |
| name | Aron Nimzowitsch |
|--------------|------------------|
I'd like to export this data to a csv, which would look like this:
| m_id | date | comments | p_id_A | name_A | p_id_B | name_B |
|------|------------|-------------|--------|-----------|--------|------------------|
| 1 | 1969-05-02 | epic battle | 1 | Roy Lopez | 2 | Aron Nimzowitsch |
Googling around, surprisingly, I didn't find any solid answers. The best I could think of is to just use py2neo, pull down all the data as separate tables, and merge them in Pandas, but this seems uninspiring. Any ideas on how to do this in Cypher would be greatly illuminating.
APOC has a procedure for that:
apoc.export.csv.query
Check https://neo4j-contrib.github.io/neo4j-apoc-procedures/index32.html#_export_import for more details. Note that you'll have to add the following to neo4j.conf:
apoc.export.file.enabled=true
Hope this helps.
Regards,
Tom
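Putting it together, a sketch of the export (untested; it assumes exactly two :PLAYED players per match, and ordering within collect() is not guaranteed, so the A/B assignment is arbitrary):

```cypher
CALL apoc.export.csv.query(
  "MATCH (c:ChessMatch)-[:PLAYED]-(p:Player)
   WITH c, collect(p) AS players
   RETURN c.m_id AS m_id, c.date AS date, c.comments AS comments,
          players[0].p_id AS p_id_A, players[0].name AS name_A,
          players[1].p_id AS p_id_B, players[1].name AS name_B",
  "matches.csv",
  {}
)
```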

Neo4j CSV import query super slow, when setting relationships

I am trying to evaluate Neo4j (using the community version).
I am importing some data (1 million rows) using the LOAD CSV process. It needs to match previously imported nodes to create a relationship between them.
Here is my query:
//Query #3
//create edges between Tr and Ad nodes
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///1M.txt'
AS line
FIELDTERMINATOR '\t'
//find appropriate tx and ad
MATCH (tx:Tr { txid: TOINT(line.txid) }), (ad:Ad {p58: line.p58})
//create the edge (relationship)
CREATE (tx)-[out:OUT_TO]->(ad)
//set properties on the edge
SET out.id= TOINT(line.id)
SET out.n = TOINT(line.n)
SET out.v = TOINT(line.v)
I have indices on:
Indexes
ON :Ad(p58) ONLINE (for uniqueness constraint)
ON :Tr(txid) ONLINE
ON :Tr(h) ONLINE (for uniqueness constraint)
This query has been running for 5 days now, and it has so far created 270K relationships (out of 1M).
Java heap is 4g.
The machine has 32G of RAM and an SSD for a drive, only running Linux and Neo4j.
Any hints to speed this process up would be highly appreciated.
Should I try the enterprise edition?
Query Plan:
+--------------------------------------------+
| No data returned, and nothing was changed. |
+--------------------------------------------+
If a part of a query contains multiple disconnected patterns,
this will build a cartesian product between all those parts.
This may produce a large amount of data and slow down query processing.
While occasionally intended,
it may often be possible to reformulate the query that avoids the use of this cross product,
perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (ad))
20 ms
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+---------------------------------+----------------+---------------------+----------------------------+
| Operator | Estimated Rows | Variables | Other |
+---------------------------------+----------------+---------------------+----------------------------+
| +ProduceResults | 1 | | |
| | +----------------+---------------------+----------------------------+
| +EmptyResult | | | |
| | +----------------+---------------------+----------------------------+
| +Apply | 1 | line -- ad, out, tx | |
| |\ +----------------+---------------------+----------------------------+
| | +SetRelationshipProperty(4) | 1 | ad, out, tx | |
| | | +----------------+---------------------+----------------------------+
| | +CreateRelationship | 1 | out -- ad, tx | |
| | | +----------------+---------------------+----------------------------+
| | +ValueHashJoin | 1 | ad -- tx | ad.p58; line.p58 |
| | |\ +----------------+---------------------+----------------------------+
| | | +NodeIndexSeek | 1 | tx | :Tr(txid) |
| | | +----------------+---------------------+----------------------------+
| | +NodeUniqueIndexSeek(Locking) | 1 | ad | :Ad(p58) |
| | +----------------+---------------------+----------------------------+
| +LoadCSV | 1 | line | |
+---------------------------------+----------------+---------------------+----------------------------+
OKAY, so by splitting the MATCH statement into two, the query sped up immensely. Thanks @William Lyon for pointing me to the plan. I noticed the warning.
Old MATCH statement:
MATCH (tx:Tr { txid: TOINT(line.txid) }), (ad:Ad {p58: line.p58})
split into two:
MATCH (tx:Tr { txid: TOINT(line.txid) })
MATCH (ad:Ad {p58: line.p58})
On 750K relationships the query took 83 seconds.
Next up: the 9 million row CSV LOAD.
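For reference, the full revised import would look like this (the same query as in the question, with the single change of splitting the MATCH so each lookup becomes an independent index seek instead of a ValueHashJoin over a cartesian product):

```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///1M.txt' AS line
FIELDTERMINATOR '\t'
MATCH (tx:Tr { txid: TOINT(line.txid) })
MATCH (ad:Ad { p58: line.p58 })
CREATE (tx)-[out:OUT_TO]->(ad)
SET out.id = TOINT(line.id), out.n = TOINT(line.n), out.v = TOINT(line.v)
```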

Create on NOT MATCH command for Neo4j's CQL?

I have a non-unique node (:Neighborhood) that uniquely appears [:IN] a (:City) node. I would like to create a new neighborhood node and establish its relationship ONLY if that neighborhood does not already exist in that city. There can be multiple neighborhoods with the same name, but each neighborhood must appear uniquely in its city.
Following the advice from Gil's answer here: Return node if relationship is not present, how can I do something like:
MATCH a WHERE NOT (a:Neighborhood {name : line.Neighborhood})-[r:IN]->(c:City {name : line.City})
ON MATCH SET (a)-[r]-(c)
So then it would only create a new neighborhood node if it doesn't already exist in the city.
**UPDATE:** I upgraded and profiled it, and still can't take advantage of any optimizations...
PROFILE LOAD CSV WITH HEADERS FROM "file://THEFILE" as line
WITH line LIMIT 0
MATCH (c:City { name : line.City})
MERGE (n:Neighborhood {name : toInt(line.Neighborhood)})-[:IN]->(c)
;
+--------------+------+--------+---------------------------+------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+--------------+------+--------+---------------------------+------------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph | 5 | 16 | anon[340], b, neighborhood, line | MergePattern |
| SchemaIndex | 5 | 10 | b, line | line.City; :City(name) |
| ColumnFilter | 5 | 0 | line | keep columns line |
| Filter | 5 | 0 | anon[216], line | anon[216] |
| Extract | 5 | 0 | anon[216], line | anon[216] |
| Slice | 5 | 0 | line | { AUTOINT0} |
| LoadCSV | 5 | 0 | line | |
+--------------+------+--------+---------------------------+------------------------------+
I think you could simply use MERGE for this:
MATCH (c:City {name: line.City})
MERGE (c)<-[:IN]-(a:Neighborhood {name: line.Neighborhood})
If you haven't already imported all of the cities, you can create those with MERGE as well:
MERGE (c:City {name: line.City})
MERGE (c)<-[:IN]-(a:Neighborhood {name: line.Neighborhood})
But beware of the Eager operator:
http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
In short: You should run your LOAD CSV (I assume that's what you're doing here) twice, once to load the cities and once to load the neighborhoods.
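A sketch of that two-pass import (untested; file name and headers as shown in the question):

```cypher
// Pass 1: create the cities (only one label touched, so no Eager operator)
LOAD CSV WITH HEADERS FROM "file://THEFILE" AS line
MERGE (:City {name: line.City});

// Pass 2: create the neighborhoods and their :IN relationships
LOAD CSV WITH HEADERS FROM "file://THEFILE" AS line
MATCH (c:City {name: line.City})
MERGE (c)<-[:IN]-(:Neighborhood {name: line.Neighborhood});
```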
