Store temp variables in Neo4j

I have some Cypher queries that I execute against my Neo4j database. The queries are of this form:
MATCH p=(j:JOB)-[r:HAS|STARTS]->(s:URL)-[r1:VISITED]->(t:URL)
WHERE j.job_id =5000 and r1.origin='iframe' and r1.job_id=5000 AND NOT (t.netloc =~ 'VERY_LONG_LIST')
RETURN count(r1) AS number_iframes;
In case it is not clear what I am doing, here is a much simpler query along the same lines:
MATCH (s:WORD)
WHERE NOT (s.text=~"badword1|badword2|badword3")
RETURN s
I am basically trying to match some words against a specific list.
The problem is that this list is very large. As you can see, my job_id is 5000 and I have more than 20,000 jobs, so if my whitelist is 1 MB long I end up with very large queries; with just 500 jobs I ended up with a 200 MB query file.
I tried executing these queries using transactions from py2neo, but that is not feasible because the POST request body becomes very large and the request times out. As a result, I thought of using
neo4j-shell -file <queries_file>
However, as you can see, the file is very large because of the long whitelist. So my question is: is there any way to store this whitelist in a variable in Neo4j using Cypher?
I wish there were something similar to this:
SAVE $whitelist="word1,word2,word3,word4,word5...."
MATCH p=(j:JOB)-[r:HAS|STARTS]->(s:URL)-[r1:VISITED]->(t:URL)
WHERE j.job_id =5000 and r1.origin='iframe' and r1.job_id=5000 AND NOT (t.netloc =~ $whitelist)
RETURN count(r1) AS number_iframes;

What datatype is your netloc?
If you have an index on netloc you can also use t.netloc IN {list} where {list} is a parameter provided from the outside.
Such large regular expressions will not be fast.
What exactly is your regexp and netloc format like? Perhaps you can change that into a split + index-list lookup?
In general also for regexps you can provide an outside parameter.
You can also use "IN" + index for job_ids.
You can also run a separate job that tags the jobs within your whitelist with a label and use that label for additional filtering e.g. in the match already.
Why do you have to check this twice? Isn't it enough that the job has id=5000?
j.job_id =5000 and r1.job_id=5000
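Putting the parameter suggestions above into practice: the whitelist and job id can travel as query parameters, so the query text stays small and the plan can be reused for every job. A minimal sketch with the official Neo4j Python driver (the connection URI, credentials, and sample whitelist values are assumptions; older Cypher versions write {whitelist} instead of $whitelist):

from neo4j import GraphDatabase

# Connection details are assumptions; adjust to your setup.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# The whitelist is sent as a parameter instead of being inlined,
# so the same small query text is reused for every job_id.
query = """
MATCH p=(j:JOB)-[r:HAS|STARTS]->(s:URL)-[r1:VISITED]->(t:URL)
WHERE j.job_id = $job_id AND r1.origin = 'iframe' AND r1.job_id = $job_id
  AND NOT t.netloc IN $whitelist
RETURN count(r1) AS number_iframes
"""

whitelist = ["example.com", "example.org"]  # the real 1 MB list goes here

with driver.session() as session:
    result = session.run(query, job_id=5000, whitelist=whitelist)
    print(result.single()["number_iframes"])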

Related

How to process a concatenated CSV file using cypher query in neo4j

I have a concatenated csv file with a certain delimiter. Example:
name,age
John,24
Alice,25
--------
parent,child
Node1,Node2
Node3,Node4
So I want to process the first part of the CSV in one query and the second part in a different query. Is there a way to process this CSV file according to the delimiter in Neo4j?
It is possible to do this in Cypher but it is cumbersome and inefficient.
Here's a snippet using APOC:
LOAD CSV FROM 'file:///my.csv' AS line
// Collect all rows so they can be addressed by position
WITH collect(line) AS lines
// range() is inclusive, so the last index is size(lines) - 1
WITH lines, range(0, size(lines) - 1) AS indexes
// Pair every row with its index: [[row, index], ...]
WITH apoc.coll.zip(lines, indexes) AS indexed
// Find the index of the "--------" delimiter row
WITH indexed, [item IN indexed WHERE item[0][0] = "--------"][0][1] AS separator
// Keep the rows before and after the delimiter
WITH
  [item IN indexed WHERE item[1] < separator] AS first,
  [item IN indexed WHERE item[1] > separator] AS second
RETURN *
This gives:
"first":  [[["name","age"],0],[["John","24"],1],[["Alice","25"],2]]
"second": [[["parent","child"],4],[["Node1","Node2"],5],[["Node3","Node4"],6]]
(One can probably also do this without APOC by e.g. using reduce.)
With these collections returned, it is possible to UNWIND first or second and CREATE the related nodes. Still, it is going to be hard to read/write these queries and definitely slow to execute. So it's best to perform the split before loading the CSV using a scripting language (Bash, Python, etc.)
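Along those lines, a small pre-processing sketch in Python (the file names are assumptions; the -------- delimiter is the one from the example above):

# Split a concatenated CSV into separate files at each delimiter row,
# so each part can then be loaded with its own LOAD CSV query.
parts = [[]]
with open("my.csv") as f:
    for line in f:
        if line.strip().startswith("--------"):
            parts.append([])  # a delimiter row starts a new section
        else:
            parts[-1].append(line)

for i, part in enumerate(parts):
    with open(f"my_part{i}.csv", "w") as out:
        out.writelines(part)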

Not able to create a node with special characters "-" in neo4j

I am trying to create a node in neo4j(version 3.2.3). Below is the cypher query,
MERGE (`source-real-address`:SOURCE {Source:{`source-real-address`}})
I found in forums that to create a node with special characters we should use backticks (`) in the query, but I wasn't able to create the node using backticks. No errors were thrown in the logs.
Could you please help me to resolve this?
Please correct me if I am doing anything wrong in the Cypher query; I am just starting to understand the Cypher query language.
Note: I am sending data to Neo4j from Graylog with the help of the Neo4j output plugin. I am able to create nodes for fields without special characters.
The syntax {Source:{`source-real-address`}} means that you are trying to use a parameter named source-real-address as the value of the property Source. If this is your goal, you can set a parameter in the Neo4j Browser for test purposes with :params {"source-real-address":"Some value"}. If not, you can remove the extra { and } around the value and use quotes instead of backticks, like this:
MERGE (`source-real-address`:SOURCE {Source:"source-real-address"})
Remember that the value of a property must be a Boolean, Integer, Float or String.
In Cypher, backticks are used to quote relationship types, labels and variable names containing special characters (not property values).
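If you do want the parameter route from code rather than the Browser, here is a sketch with the official Neo4j Python driver (the connection details and value are assumptions; the hyphenated name is swapped for an underscore one so no backticks are needed):

from neo4j import GraphDatabase

# Connection details are assumptions; adjust to your setup.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Bare parameter names cannot contain hyphens, so use an
# underscore name in the query and map your value onto it.
query = "MERGE (s:SOURCE {Source: $source_real_address}) RETURN s"

with driver.session() as session:
    session.run(query, source_real_address="some value")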
Use the CREATE command to create the node with special characters. See also: https://neo4j.com/docs/cypher-manual/current/syntax/naming/

Solr and Rails: [* TO *] value instead of nil (asterisk TO asterisk)

Inside my model's searchable block I index the time field added_at.
In the search block I added with(:added_at, nil), reindexed, and now inside the search object I have:
<Sunspot::Search:{:fq=>["-added_at_d:[* TO *]"]...}>
What is the meaning of this [* TO *]? Did something go wrong?
By adding with(:added_at, nil) you narrow down the search results to documents having no value in the field added_at, so we might expect the corresponding query filter to be defined as:
fq=>["added_at_d:null"] # not valid
The problem is that the Solr Standard Query Parser does not support searching a field for an empty/null value. In this situation the filter needs to be negated (excluding documents having any value in the field) so that the query remains valid.
The - operator can be used to exclude the field, and the wildcard character * can be used to match any value, so now we might expect the query filter to look like:
fq=>["-added_at_d:*"]
However, although the above is valid for the query parser, a range query should be preferred to prevent inconsistent behaviors when using wildcards within negative subqueries.
Range Queries allow one to match documents whose field(s) values are between the lower and upper bound specified by the Range Query. Range Queries can be inclusive or exclusive of the upper and lower bounds. A * may be used for either or both endpoints to specify an open-ended range query.
So there is nothing wrong with this filter, which ends up looking like:
fq=>["-added_at_d:[* TO *]"]
cf. Lucene Range Queries, Solr Standard Query Parser
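To see the filter in action outside Sunspot, the equivalent request can be sent straight to Solr's select handler. A sketch in Python (the Solr host and core name are assumptions):

import requests

# Host and core name are assumptions; adjust to your Solr setup.
solr_url = "http://localhost:8983/solr/mycore/select"

# Match everything, then exclude documents that have any value in
# added_at_d -- the same "-added_at_d:[* TO *]" filter Sunspot builds.
params = {
    "q": "*:*",
    "fq": "-added_at_d:[* TO *]",
    "wt": "json",
}

response = requests.get(solr_url, params=params)
print(response.json()["response"]["numFound"])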

How to set "resultDataContents" in Neo4jrb?

I want to visualise data from Neo4j with the frontend library D3.js in a Rails application, using Neo4jrb. For example, I could use the following query to get my graph data.
query = "MATCH path = (a)-[b]->(c) RETURN path"
result = Neo4j::Session.current.query(query)
But this query is not giving me exactly the data I want.
According to the Neo4j data visualisation guide, there is a possibility to set the parameter resultDataContents to "graph" (see the Neo4j documentation for "resultDataContents").
This is exactly what I need for my application. Is there any possibility to set this parameter in Neo4jrb, or another idea how to achieve such a result?
Unfortunately not currently. The neo4j-core gem (which the neo4j gem uses) was built to abstract away the REST format. The "graph" format returns data in a different way.
You have a couple of options. You could make the JSON queries yourself or you could retrieve the nodes and relationships from the queries that you perform and then build your own nodes/relationships structure which is returned. This might be more future-proof anyway if you ever want to switch to Bolt.
A way that you might do this in your case:
query = "MATCH path = (a)-[b]->(c) RETURN nodes(path) AS nodes, rels(path) AS rels"
result = Neo4j::Session.current.query(query)
response = {nodes: [], rels: []}
# Accumulate the nodes and relationships from every returned path
result.each do |row|
  response[:nodes].concat(row.nodes)
  response[:rels].concat(row.rels)
end
# Paths overlap, so drop the duplicates
response[:nodes].uniq!
response[:rels].uniq!
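If you go the make-the-JSON-queries-yourself route instead, resultDataContents can be set per statement on the HTTP transactional endpoint. A sketch in Python (the URL and credentials are assumptions; the endpoint path shown is the Neo4j 2.x/3.x one):

import requests

# Endpoint and credentials are assumptions; adjust to your setup.
url = "http://localhost:7474/db/data/transaction/commit"

payload = {
    "statements": [
        {
            "statement": "MATCH path = (a)-[b]->(c) RETURN path",
            # Ask the server for the graph projection of the results
            "resultDataContents": ["graph"],
        }
    ]
}

response = requests.post(url, json=payload, auth=("neo4j", "password"))

# Each row of the result carries its own graph section; here we
# just print the one for the first row.
graph = response.json()["results"][0]["data"][0]["graph"]
print(graph["nodes"], graph["relationships"])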

Neo4j: execute several statements

How is it possible to run a collection of queries like this (coming from a spreadsheet copy) directly, in one go? One by one is OK, but that would need 100 copy/pastes.
*******************************
MATCH (c:`alpha`)
where c.name = "a-01"
SET c.CP_PRI=1, c.TO_PRI=1, c.TA_PRI=2
return c ;
MATCH (c:`beta`)
where c.name = "a-02"
SET c.CP_PRI=1, c.TO_PRI=1, c.TA_PRI=0
return c ;
and 100 other lines ...
*********************************
You may try the UNION clause, which joins the results of queries into one big-honkin result set:
http://docs.neo4j.org/chunked/milestone/query-union.html
That said, the root behavior of what you are trying to do could use some details; maybe there's a better way to write the query. You could use Excel to 'build' the unified query via calculations/macros, or you could possibly write a unified query that combines the rules you are trying to follow. There are a lot of options, but it's hard to know a starting direction without context.
Talking about the REST API, you can use the transactional endpoint in Neo4j 2.0, or the batch endpoint in Neo4j 1.x.
If you want to use the shell, have a look at the import page, in particular neo4j-shell-tools, which imports massive quantities of data by batching multiple queries.
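To illustrate the transactional approach, here is a sketch in Python with the official Neo4j driver that reads the pasted statements from a file and runs them in one transaction (the file name and connection details are assumptions):

from neo4j import GraphDatabase

# Connection details and file name are assumptions; adjust to your setup.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# One statement per ';', exactly as in the spreadsheet paste above.
with open("statements.cql") as f:
    statements = [s.strip() for s in f.read().split(";") if s.strip()]

# A single transaction means one round of network setup and
# an all-or-nothing application of the updates.
with driver.session() as session:
    tx = session.begin_transaction()
    for statement in statements:
        tx.run(statement)
    tx.commit()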
