Concatenating nodes from a query into a single line for export to CSV in Neo4j using Cypher

I have a Neo4j graph that represents a chess tournament.
Say I run this:
MATCH (c:ChessMatch {m_id: 1})-[:PLAYED]-(p:Player) RETURN *
This gives me the results of the two players who played in a chess match.
The graph is a ChessMatch node connected to two Player nodes, and the properties are something like this:
|--------------|------------------|
| (ChessMatch) | |
| m_id | 1 |
| date | 1969-05-02 |
| comments | epic battle |
|--------------|------------------|
| (player) | |
| p_id | 1 |
| name | Roy Lopez |
|--------------|------------------|
| (player) | |
| p_id | 2 |
| name | Aron Nimzowitsch |
|--------------|------------------|
I'd like to export this data to a csv, which would look like this:
| m_id | date | comments | p_id_A | name_A | p_id_B | name_B |
|------|------------|-------------|--------|-----------|--------|------------------|
| 1 | 1969-05-02 | epic battle | 1 | Roy Lopez | 2 | Aron Nimzowitsch |
Googling around, surprisingly, I didn't find any solid answers. The best I could think of is to use py2neo, pull down all the data as separate tables and merge them in Pandas, but this seems uninspiring. Any ideas on how to do this in Cypher would be greatly illuminating.

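APOC has a procedure for that:
apoc.export.csv.query
It takes a Cypher query, an output file name and a config map, and writes the rows the query returns to that file. To get one line per match, the query therefore has to return both players in a single row (for example by collecting them per match).
Check https://neo4j-contrib.github.io/neo4j-apoc-procedures/index32.html#_export_import for more details. Note that you'll have to add the following to neo4j.conf:
apoc.export.file.enabled=true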
Hope this helps.
Regards,
Tom

Related

how to create relationship using cypher

I have been learning Neo4j/Cypher for the last week. I have finally been able to upload two CSV files and create a relationship, "captured". However, I am not fully confident in my understanding of the code, as I was following the tutorial on the Neo4j site. Could you please help me confirm that what I did is correct?
I have two CSV files, "cap.csv" and "survey.csv". The survey table contains data for each unique survey conducted at the survey sites. The cap table contains data for each unique organism captured. In the cap table I have a foreign key, "survey_id", which in the Postgres db you would join to the primary key in the survey table.
I want to create a relationship, "captured", showing each unique organism that was captured based on the "date" column in the survey table.
Survey table
| lake_id | date     | survey_id | duration |
|---------|----------|-----------|----------|
| 1       | 05/27/14 | 1         | 7        |
| 2       | 03/28/13 | 2         | 10       |
| 2       | 06/29/19 | 3         | 23       |
| 3       | 08/21/21 | 4         | 54       |
| 1       | 07/23/18 | 5         | 23       |
| 2       | 07/22/23 | 6         | 12       |
Capture table
| cap_id | species | capture_life_stage | weight | survey_id |
|--------|---------|--------------------|--------|-----------|
| 1      | a       | adult              | 10     | 1         |
| 2      | a       | adult              | 10     | 2         |
| 3      | b       | juv                | 23     | 3         |
| 4      | a       | adult              | 54     | 4         |
| 5      | b       | juv                | 23     | 5         |
| 6      | c       | juv                | 12     | 6         |
LOAD CSV WITH HEADERS FROM 'file:///cap.csv' AS row
WITH
row.id as id,
row.species as species,
row.capture_life_stage as capture_life_stage,
toInteger(row.weight) as weight,
row.survey_id as survey_id
MATCH (c:cap {id: id})
MERGE (s) - [rel:captured {survey_id: survey_id}] ->(c)
return count(rel)
I am struggling to understand the code I wrote above. I followed the neo4j tutorial exactly but used my data (https://neo4j.com/developer/desktop-csv-import/).
I am fairly confident from data checks, but did the above code create the "captured" relationship showing each unique organism captured on that unique survey date? Based on the visual I can see I believe it did but I don't fully understand each step in the code.
What is the purpose of the MATCH (c:cap {id: id}) in the code?
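The code below
MATCH (c:cap {id: id})
is the same as
MATCH (c:cap)
WHERE c.id = id
It is a shorter way of finding the cap node by its id; the MERGE that follows then creates a relationship to that node.
Question: s is not defined in your query. Where is it? With s unbound, MERGE matches any node that already has such a relationship to c, or else creates a new, empty node, rather than linking to an existing survey node.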

Not able to populate db to run cucumber scenarios

I'm learning Cucumber, and I have to populate the db to run the scenarios.
These are the instructions:
(...) you will create a step definition that will match the step
Given the following movies exist in the Background section of both
sort_movie_list.feature and filter_movie_list.feature. (Later in
the course, we will show how to DRY out the repeated Background
sections in the two feature files.)
Add your code in the movie_steps.rb step definition file. You can
just use ActiveRecord calls to directly add movies to the database;
it's OK to bypass the GUI associated with creating new movies, since
that's not what these scenarios are testing.
This is one of the *.feature files:
Feature: display list of movies filtered by MPAA rating
As a concerned parent
So that I can quickly browse movies appropriate for my family
I want to see movies matching only certain MPAA ratings
Background: movies have been added to database
Given the following movies exist:
| title | rating | release_date |
| Aladdin | G | 25-Nov-1992 |
| The Terminator | R | 26-Oct-1984 |
| When Harry Met Sally | R | 21-Jul-1989 |
| The Help | PG-13 | 10-Aug-2011 |
| Chocolat | PG-13 | 5-Jan-2001 |
| Amelie | R | 25-Apr-2001 |
| 2001: A Space Odyssey | G | 6-Apr-1968 |
| The Incredibles | PG | 5-Nov-2004 |
| Raiders of the Lost Ark | PG | 12-Jun-1981 |
| Chicken Run | G | 21-Jun-2000 |
This is my code from *_steps.rb:
Given /the following movies exist/ do |movies_table|
  movies_table.hashes.each do |movie|
    Movie.create!(movie)
  end
  fail "Unimplemented"
end
And this is the error I get:
Background: movies have been added to database # features/sort_movie_list.feature:7
Given the following movies exist: # features/step_definitions/movie_steps.rb:3
| title | rating | release_date |
| Aladdin | G | 25-Nov-1992 |
| The Terminator | R | 26-Oct-1984 |
| When Harry Met Sally | R | 21-Jul-1989 |
| The Help | PG-13 | 10-Aug-2011 |
| Chocolat | PG-13 | 5-Jan-2001 |
| Amelie | R | 25-Apr-2001 |
| 2001: A Space Odyssey | G | 6-Apr-1968 |
| The Incredibles | PG | 5-Nov-2004 |
| Raiders of the Lost Ark | PG | 12-Jun-1981 |
| Chicken Run | G | 21-Jun-2000 |
Unimplemented (RuntimeError)
./features/step_definitions/movie_steps.rb:7:in `/the following movies exist/'
features/sort_movie_list.feature:9:in `Given the following movies exist:'
I have tried movie = Movie.create!, Movie.create!(movie), Movie.create! movie, movie = Movie.create! (this last one just for pure desperation)... What am I doing wrong?
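The loop itself looks good to me.
You iterate over the movies, and then just before the end you call fail "Unimplemented", which raises the RuntimeError you are seeing. What would you expect? Remove that line and the step should pass.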
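A minimal sketch of the step body with the fail line removed. The Movie class below is a hypothetical in-memory stand-in so the snippet runs without Rails or Cucumber (in the real project it would be the ActiveRecord model), and rows stands in for movies_table.hashes.

```ruby
# Hypothetical stand-in for the ActiveRecord model, so this runs on its own.
class Movie
  @records = []

  class << self
    attr_reader :records

    # Mimics Movie.create!(attrs): store the attributes and return them.
    def create!(attrs)
      raise ArgumentError, "title is required" if attrs["title"].to_s.empty?
      @records << attrs
      attrs
    end
  end
end

# Stand-in for movies_table.hashes: one hash per table row, keyed by header.
rows = [
  { "title" => "Aladdin", "rating" => "G", "release_date" => "25-Nov-1992" },
  { "title" => "The Terminator", "rating" => "R", "release_date" => "26-Oct-1984" }
]

# The body of the step definition, without the trailing fail "Unimplemented".
seed_movies = lambda do |hashes|
  hashes.each { |movie| Movie.create!(movie) }
end

seed_movies.call(rows)
puts Movie.records.size
```

In the actual movie_steps.rb the lambda body is simply what sits between Given ... do |movies_table| and end.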

Neo4j - count very slow

I am running this query (bisac_code is uniquely indexed).
Execution time is more than 2.5 minutes.
52 main codes are selected from almost 4000 in total.
The total number of wokas is very large, 19 million nodes.
Are there any possibilities to make it run faster?
neo4j-sh (?)$ MATCH (b:Bisac)-[r:INCLUDED_IN]-(w:Woka)
> WHERE (b.bisac_code =~ '.*000000')
> RETURN b.bisac_code as bisac_code, count(w) as wokas_count
> ORDER BY b.bisac_code
> ;
+---------------------------+
| bisac_code | wokas_count |
+---------------------------+
| "ANT000000" | 13865 |
| "ARC000000" | 32905 |
| "ART000000" | 79600 |
| "BIB000000" | 2043 |
| "BIO000000" | 256082 |
| "BUS000000" | 226173 |
| "CGN000000" | 16424 |
| "CKB000000" | 26410 |
| "COM000000" | 44922 |
| "CRA000000" | 18720 |
| "DES000000" | 2713 |
| "DRA000000" | 62610 |
| "EDU000000" | 228182 |
| "FAM000000" | 42951 |
| "FIC000000" | 474004 |
| "FOR000000" | 41999 |
| "GAM000000" | 8803 |
| "GAR000000" | 37844 |
| "HEA000000" | 36939 |
| "HIS000000" | 3908869 |
| "HOM000000" | 5123 |
| "HUM000000" | 29270 |
| "JNF000000" | 40396 |
| "JUV000000" | 200144 |
| "LAN000000" | 89059 |
| "LAW000000" | 153138 |
| "LCO000000" | 1528237 |
| "LIT000000" | 89611 |
| "MAT000000" | 58134 |
| "MED000000" | 80268 |
| "MUS000000" | 75997 |
| "NAT000000" | 35991 |
| "NON000000" | 107513 |
| "OCC000000" | 42134 |
| "PER000000" | 26989 |
| "PET000000" | 4980 |
| "PHI000000" | 72069 |
| "PHO000000" | 8546 |
| "POE000000" | 104609 |
| "POL000000" | 309153 |
| "PSY000000" | 55710 |
| "REF000000" | 96477 |
| "REL000000" | 133619 |
| "SCI000000" | 86017 |
| "SEL000000" | 40901 |
| "SOC000000" | 292713 |
| "SPO000000" | 172284 |
| "STU000000" | 10508 |
| "TEC000000" | 77459 |
| "TRA000000" | 9093 |
| "TRU000000" | 12041 |
| "TRV000000" | 27706 |
+---------------------------+
52 rows
198310 ms
And the response time is not consistent.
After a while it drops to less than half a minute.
52 rows
31207 ms
In Neo4j 2.3 there will be index support for prefix (LIKE-style) searches, but probably not for suffix ones.
There are two ways of making @user2194039's solution faster:
Use a path expression to count the Woka nodes per Bisac:
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000')
WITH b, size((b)-[:INCLUDED_IN]->()) as wokas_count
RETURN b.bisac_code as bisac_code, wokas_count
ORDER BY b.bisac_code
Mark the Bisac nodes that match that pattern with a label:
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000') SET b:Main;
MATCH (b:Main:Bisac)
WITH b, size((b)-[:INCLUDED_IN]->()) as wokas_count
RETURN b.bisac_code as bisac_code, wokas_count
ORDER BY b.bisac_code;
The slow speed is caused by your regular expression pattern matching (=~ ). Although your bisac_code is indexed, the regex match causes the index to be ineffective. The index only works when you are matching full bisac_code values.
Cypher does include some string manipulation facilities that might let you get by without using a regex =~, but I doubt it would make any difference, because the index will still be useless.
I might suggest considering if you can further categorize your bisac_codes so that you do not need to do a pattern match. Maybe an extra indexed property that somehow denotes those codes that end in 000000?
If you do not want to add properties, you may try matching only the Bisacs first, and then including the Wokas. Something like this:
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000')
WITH b
MATCH (b)-[r:INCLUDED_IN]-(w:Woka)
RETURN b.bisac_code as bisac_code, count(w) as wokas_count
ORDER BY b.bisac_code
This may help Cypher stick to the 4000 Bisac nodes while doing the pattern match, before getting involved with all 19 million Woka nodes, but I am not sure if this will make a material difference. Even slogging through 4000 nodes (effectively without an index) is a slow process.
Hash Tables in Database Indexing
The reason that your index is ineffective for regex pattern matching is that Neo4j likely uses a hash table for indexing properties. This is common to many databases. Wikipedia has an article on hash tables.
The basics though are that the index is not storing all of the properties that you want to search through. It is storing values that represent the properties you want to search through, and the representation is only valid for the whole property. If you are searching for only a part of the property value, the hashes stored in the index are useless, and the database must search through the properties the old-fashioned way -- one by one.
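As a rough illustration of that point, here is a toy model in Ruby. It is not Neo4j's actual index implementation; the plain Hash, the sample codes and the node ids are all assumptions made for the sketch. An exact lookup goes straight to the entry, while a suffix match has to inspect every key.

```ruby
# Toy hash index: full property value -> node id (hypothetical ids).
index = {
  "HIS000000" => 1,
  "HIS001000" => 2,
  "FIC000000" => 3,
  "FIC002000" => 4
}

# Exact match: a single hash lookup, no scan.
exact = index["HIS000000"]

# Suffix match: the hash of the full value says nothing about its ending,
# so every key has to be inspected one by one.
main_codes = index.keys.select { |code| code.end_with?("000000") }

puts exact
puts main_codes.inspect
```

With 4000 Bisac nodes the scan is still cheap compared with traversing 19 million Woka nodes, which is why narrowing to the Bisac nodes first is the suggestion above.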
Edit re: your edit
The improvement in response time after running this query multiple times is certainly due to caching. Neo4j is remembering that you access the Bisac nodes and bisac_code properties frequently, and is keeping them in memory. This makes future queries faster because the values do not need to be read off disk.
However, eventually, those nodes and properties will likely be dropped from the cache, as Neo4j finds you manipulating different nodes, which it will cache instead. There are only so many nodes Neo4j can cache before running out of memory, so it picks the most recent and/or frequently used data.

Can I have a cucumber example with several values in a single column x row position

Hi, here is what I have:
Scenario Outline: Searching for stuff
Given that the following simple things exists:
| id | title | description | temp |
| 1 | First title | First description | low |
| 2 | Second title | Second description with öl | Medium |
| 3 | Third title | Third description | High |
| 11 | A title with number 2 | can searching numbers find this 2 | Exreme |
When I search for <criteria>
Then I should get <result>
And I should not get <excluded>
Examples:
| criteria | result | excluded |
| 1 | 1 | 2,3,11 |
| 11 | 11 | 1,2,3 |
| title | 1,2,3 | 11 |
| öl | 2 | 1,3,11 |
| Fir* | 1 | 2,3,11 |
| third | 3 | 1,2,11 |
| High | 3 | 1,2,11 |
As you can see I'm trying to test a search field for a web-application using cucumber and the scenario outline structure in order to test several search criteria.
I'm not sure how to handle the input I would get as result and excluded in my steps.
Maybe this doesn't work at all?
Is there a workaround?
There's nothing wrong with what you're doing. Cucumber will just take that as a single string. The fact that it's actually comma-separated values means nothing to Cucumber.
Your step definition would still look like this:
Then /^I should not get ([^"]*)$/ do |excluded|
  # excluded will be a string, "2,3,11"
  values = excluded.split(",")
  # Do whatever you want with the values
end
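One way the "do whatever you want" part might look, as a hedged sketch in plain Ruby. The helper names parse_ids and excluded_absent? are hypothetical, and found stands in for whatever ids your search step collects from the page.

```ruby
# Split a comma-separated table cell into trimmed, non-empty values.
def parse_ids(cell)
  cell.split(",").map(&:strip).reject(&:empty?)
end

# Hypothetical check: none of the excluded ids may appear among the found ids.
def excluded_absent?(found_ids, excluded_cell)
  (found_ids & parse_ids(excluded_cell)).empty?
end

found = ["1"] # ids a search for "Fir*" might return
puts parse_ids("2, 3,11").inspect
puts excluded_absent?(found, "2,3,11")
```

Stripping whitespace makes the step tolerant of cells written as "2, 3, 11" as well as "2,3,11".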

SpecFlow - Repeat test X times with list?

Scenario: Change a member to ABC 60 days before anniversary date
Given Repeat When+Then for each of the following IDs:
| ID |
| 0047619101 |
| 0080762602 |
| 0186741901 |
| 0311285102 |
| 0570130101 |
| 0725968201 |
| 0780265749 |
| 0780265750 |
| 0780951340 |
| 0780962551 |
#-----------------------------------------------------------------------
When these events occur:
| WorkflowEventType | WorkflowEntryPoint |
| ABC | Status Change |
Then these commands are executed:
| command name |
| TerminateWorkflow |
And For Member, the following documents were queued:
| Name |
| ABC Packet |
In the above scenario I would like to:
GIVEN - Lookup 10 members from the DB
WHEN + THEN - Do these steps 10 times, once for each record.
Is this possible with SpecFlow?
If so, how would you set it up?
TIA
This is actually quite easy to do, although the documentation takes a bit of searching.
What you want is a scenario outline, like so:
Scenario Outline: Change a member to ABC 60 days before anniversary date
Given I have <memberId>
When these events occur:
| WorkflowEventType | WorkflowEntryPoint |
| ABC | Status Change |
Then these commands are executed:
| command name |
| TerminateWorkflow |
And For <memberId>, the following documents were queued:
| Name |
| ABC Packet |
Examples:
| memberId |
| 0047619101 |
| 0080762602 |
| 0186741901 |
| ...etc... |
This will execute your scenario once for each id in the examples table. You can extend the table to have multiple columns, if needed.
Or, more simply (if you really only have one row in each of your tables above):
Scenario Outline: Change a member to ABC 60 days before anniversary date
Given I have <memberId>
When A 'ABC' Event Occurs with EntryPoint 'Status Change'
Then a TerminateWorkflow command is executed
And For <memberId>, the 'ABC Packet' document was queued
Examples:
| memberId |
| ...etc... |
For more information, see the SpecFlow wiki on GitHub and the Cucumber language syntax for scenario outlines.

Resources