Cypher: How to optimize a search in a browser (maybe using parameters) - neo4j

Well, I have to run the following query (Neo4j Community Edition 3.0.12 on Docker).
The caveat is that the calendar name has an unknown format; it can be either:
1) firstname + " " + lastname + "-" + specialization
2) lastname + " " + firstname + "-" + specialization
:PARAM name: "Di Pietro Chiara - Gynecologist"
MERGE (_200:`person` {`lastname`: "Di Pietro", `firstname`: "Chiara", `birthdate`: "1984/03/25"})
MERGE (_cal_445:`calendar` { :`X-VR-CALNAME` = $name })-[:`belongs_to a`]-(_per_445:`person`)
WHERE $name = _per_445.firstname + " " + _per_445.lastname
OR $name = (_per_445.nome + " " + _per_445.cognome)
RETURN _cal_445, _per_445
The query, and several variants of it, doesn't run. Sometimes it returns an error, and sometimes it destroys the browser layout on the screen.
Surely there is something wrong, but I was unable to find and correct it.
How could the comparison against the two inverted name formats be optimized?
And why does the :PARAM declaration generate an error?
Any help will be greatly appreciated.

This part of your query is not valid:
MERGE (_cal_445:`calendar` { :`X-VR-CALNAME` = $name })
You should replace it with this:
MERGE (_cal_445:`calendar` { `:X-VR-CALNAME`: $name })
Moreover, you are attaching a WHERE clause to a MERGE that uses the value $name. That is simply not allowed: in Cypher, WHERE can only follow MATCH, OPTIONAL MATCH, or WITH, never MERGE.
If you replace the MERGE with a MATCH, your query will work:
MERGE (_200:`person` {`lastname`: "Di Pietro", `firstname`: "Chiara", `birthdate`: "1984/03/25"})
WITH _200
MATCH (_cal_445:`calendar` { `:X-VR-CALNAME`: $name })-[:`belongs_to a`]-(_per_445:`person`)
WHERE $name = _per_445.firstname + " " + _per_445.lastname
OR $name = (_per_445.nome + " " + _per_445.cognome)
RETURN _cal_445, _per_445
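About the :PARAM error: the browser command is lower-case, and in the 3.x browser it uses the arrow syntax, so it should look something like this:
:param name => "Di Pietro Chiara - Gynecologist"
As for the two inverted formats, one option (a sketch against the model above, not tested on your data) is to fold both orderings into a single list-membership test:
MATCH (_cal_445:`calendar` { `:X-VR-CALNAME`: $name })-[:`belongs_to a`]-(_per_445:`person`)
WHERE $name IN [_per_445.firstname + " " + _per_445.lastname,
                _per_445.lastname + " " + _per_445.firstname]
RETURN _cal_445, _per_445
Note that this still compares the whole calendar name, so if the value also carries the "- specialization" suffix, you would need a STARTS WITH test on each ordering instead of the equality.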

Related

Event-time Temporal Join in Apache Flink only works with small datasets

Background: I'm trying to get an event-time temporal join working with two 'large(r)' datasets/tables that are read from a CSV-file (16K+ rows in left table, somewhat less in right table). Both tables are append-only tables, i.e. their datasources are currently CSV-files, but will become CDC changelogs emitted by Debezium over Pulsar.
I am using the fairly new SYSTEM_TIME AS OF syntax.
The problem: join results are only partly correct, i.e. at the start (first 20% or so) of the execution of the query, rows of the left-side are not matched with rows from the right side, while in theory, they should. After a couple of seconds, there are more matches, and by the time the query ends, rows of the left side are getting matched/joined correctly with rows of the right side.
Every time I run the query, it shows different results in terms of which rows are (not) matched.
Neither dataset is ordered by its event time; they are ordered by their primary key. So it's really this case, only with more data.
In essence, the right side is a lookup-table that changes over time, and we're sure that for every left record there was a matching right record, as both were created in the originating database at +/- the same instant. Ultimately our goal is a dynamic materialized view that contains the same data as when we'd join the 2 tables in the CDC-enabled source database (SQL Server).
Obviously, I want to achieve a correct join over the complete dataset, as explained in the Flink docs.
Unlike simple examples and Flink test-code with a small dataset of only a few rows (like here), a join of larger datasets does not yield correct results.
I suspect that, when the probing/left table starts flowing, the build/right table is not yet 'in memory' which means that left rows don't find a matching right row, while they should -- if the right table would have started flowing somewhat earlier. That's why the left join returns null-values for the columns of the right table.
I've included my code:
// Imports assumed for this excerpt (JUnit 5, Lombok, Spring's ClassPathResource); adjust to your project.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import org.junit.jupiter.api.Test;
import org.springframework.core.io.ClassPathResource;

import lombok.extern.slf4j.Slf4j;

@Slf4j(topic = "TO_FILE")
public class CsvTemporalJoinTest {

    private final String emr01Ddl =
            "CREATE TABLE EMR01\n" +
            "(\n" +
            " SRC_NO STRING,\n" +
            " JRD_ETT_NO STRING,\n" +
            " STT_DT DATE,\n" +
            " MGT_SLT_DT DATE,\n" +
            " ATM_CRT_DT DATE,\n" +
            " LTD_MDT_IC STRING,\n" +
            " CPN_ORG_NO STRING,\n" +
            " PTY_NO STRING,\n" +
            " REG_USER_CD STRING,\n" +
            " REG_TS TIMESTAMP,\n" +
            " MUT_USER_CD STRING,\n" +
            " MUT_TS TIMESTAMP(3),\n" +
            " WATERMARK FOR MUT_TS AS MUT_TS,\n" +
            " PRIMARY KEY (CPN_ORG_NO) NOT ENFORCED\n" +
            ") WITH (\n" +
            " 'connector' = 'filesystem',\n" +
            " 'path' = '" + getCsv1() + "',\n" +
            " 'format' = 'csv'\n" +
            ")";

    private final String emr02Ddl =
            "CREATE TABLE EMR02\n" +
            "(\n" +
            " CPN_ORG_NO STRING,\n" +
            " DSB_TX STRING,\n" +
            " REG_USER_CD STRING,\n" +
            " REG_TS TIMESTAMP,\n" +
            " MUT_USER_CD STRING,\n" +
            " MUT_TS TIMESTAMP(3),\n" +
            " WATERMARK FOR MUT_TS AS MUT_TS,\n" +
            " PRIMARY KEY (CPN_ORG_NO) NOT ENFORCED\n" +
            ") WITH (\n" +
            " 'connector' = 'filesystem',\n" +
            " 'path' = '" + getCsv2() + "',\n" +
            " 'format' = 'csv'\n" +
            ")";

    @Test
    public void testEventTimeTemporalJoin() throws Exception {
        var env = StreamExecutionEnvironment.getExecutionEnvironment();
        var tableEnv = StreamTableEnvironment.create(env);

        tableEnv.executeSql(emr01Ddl);
        tableEnv.executeSql(emr02Ddl);

        // Event-time temporal join: each EMR01 row is matched against the
        // version of EMR02 that was valid as of that row's MUT_TS.
        Table result = tableEnv.sqlQuery("" +
                "SELECT *" +
                " FROM EMR01" +
                " LEFT JOIN EMR02 FOR SYSTEM_TIME AS OF EMR01.MUT_TS" +
                " ON EMR01.CPN_ORG_NO = EMR02.CPN_ORG_NO");

        tableEnv.toChangelogStream(result).addSink(new TestSink());
        env.execute();

        System.out.println("[Count]" + TestSink.values.size());
        //System.out.println("[Row 1]" + TestSink.values.get(0));
        //System.out.println("[Row 2]" + TestSink.values.get(1));
        AtomicInteger i = new AtomicInteger();
        TestSink.values.listIterator().forEachRemaining(
                value -> log.info("[Row " + i.incrementAndGet() + " ]=" + value));
    }

    private static class TestSink implements SinkFunction<Row> {
        // must be static, so the sink instances and the test thread
        // share the same list when running on the local mini-cluster
        public static final List<Row> values = Collections.synchronizedList(new ArrayList<>());

        @Override
        public void invoke(Row value, SinkFunction.Context context) {
            values.add(value);
        }
    }

    String getCsv1() {
        try {
            return new ClassPathResource("/GBTEMR01.csv").getFile().getAbsolutePath();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    String getCsv2() {
        try {
            return new ClassPathResource("/GBTEMR02.csv").getFile().getAbsolutePath();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
Is there a way to solve this? E.g., is there a way to FIRST load the right side into Flink state, and THEN start streaming the left side? And would that even be a good approach? It begs the questions: how much later should the left side start, and at what point can it safely start flowing?
We're using Flink 1.13.3.
This sort of temporal/versioned join depends on having accurate watermarks. Flink relies on the watermarks to know which rows can safely be dropped from the state being maintained (because they can no longer affect the results).
The watermarking you've used indicates that the rows are ordered by MUT_TS. Since this isn't true, the join isn't able to produce complete results.
To fix this, the watermark should be defined with something like this:
WATERMARK FOR MUT_TS AS MUT_TS - INTERVAL '2' MINUTE
where the interval indicates how much out-of-orderness needs to be accommodated.
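Concretely, that means changing the watermark line in both DDL strings from the question; for EMR02, for example, only this line changes:
" WATERMARK FOR MUT_TS AS MUT_TS - INTERVAL '2' MINUTE,\n" +
With a bounded-out-of-orderness watermark like this, Flink keeps right-side versions in state long enough for out-of-order left rows to still find their match. The two minutes here are only an example; the interval should reflect how disordered your event times actually are.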

How to access nested XML with Nokogiri [closed]

I am using Nokogiri to parse XML. I was told to use a CSS selector to search through the XML, but I can't chain it to get through the nested objects.
How do I access the inner elements?
2.6.3 :039 > pp a.css("interface").to_s
"<interface>\n" +
" <status>\n" +
" <__XML__OPT_Cmd_show_interface_status_down>\n" +
" <__XML__OPT_Cmd_show_interface_status___readonly__>\n" +
" <__readonly__>\n" +
" <TABLE_interface>\n" +
" <ROW_interface>\n" +
" <interface>mgmt0</interface>\n" +
" <state>connected</state>\n" +
" <vlan>routed</vlan>\n" +
" <duplex>full</duplex>\n" +
" <speed>a-1000</speed>\n" +
" <type>--</type>\n" +
" </ROW_interface>\n" +
" <ROW_interface>\n" +
" <interface>Vlan1</interface>\n" +
" <state>down</state>\n" +
" <vlan>routed</vlan>\n" +
" <duplex>auto</duplex>\n" +
" <speed>auto</speed>\n" +
" </ROW_interface>\n" +
" <ROW_interface>\n" +
" <interface>Vlan6</interface>\n" +
" <state>down</state>\n" +
" <vlan>routed</vlan>\n" +
" <duplex>auto</duplex>\n" +
" <speed>auto</speed>\n" +
" </ROW_interface>\n" +
" <ROW_interface>\n" +
" <interface>Vlan486</interface>\n" +
" <state>down</state>\n" +
" <vlan>routed</vlan>\n" +
" <duplex>auto</duplex>\n" +
" <speed>auto</speed>\n" +
" </ROW_interface>\n" +
" </TABLE_interface>\n" +
" </__readonly__>\n" +
" </__XML__OPT_Cmd_show_interface_status___readonly__>\n" +
" </__XML__OPT_Cmd_show_interface_status_down>\n" +
" </status>\n" +
" </interface><interface>mgmt0</interface><interface>Vlan1</interface><interface>Vlan6</interface><interface>Vlan486</interface>"
I end up with this tree. What is my XPath here? This is only part of the parsed XML:
2.6.3 :043 > pp parsed
#(DocumentFragment:0x3fce080cd300 {
name = "#document-fragment",
children = [
#(ProcessingInstruction:0x3fce080cce14 { name = "xml" }),
#(Text "\n"),
#(Element:0x3fce080cc7d4 {
name = "rpc-reply",
namespace = #(Namespace:0x3fce080cffb0 {
prefix = "nf",
href = "urn:ietf:params:xml:ns:netconf:base:1.0"
}),
children = [
#(Text "\n" + " "),
#(Element:0x3fce080cf22c {
name = "data",
namespace = #(Namespace:0x3fce080cffb0 {
prefix = "nf",
href = "urn:ietf:params:xml:ns:netconf:base:1.0"
}),
children = [
#(Text "\n" + " "),
#(Element:0x1903f98 {
name = "show",
namespace = #(Namespace:0x1903f20 {
href = "http://www.cisco.com/nxos:1.0:if_manager"
}),
children = [
#(Text "\n" + " "),
#(Element:0x1903700 {
name = "interface",
namespace = #(Namespace:0x1903f20 {
href = "http://www.cisco.com/nxos:1.0:if_manager"
}),
children = [
#(Text "\n" + " "),
#(Element:0x19030fc {
name = "status",
namespace = #(Namespace:0x1903f20 {
href = "http://www.cisco.com/nxos:1.0:if_manager"
}),
children = [
#(Text "\n" + " "),
#(Element:0x1902a1c {
name = "__XML__OPT_Cmd_show_interface_status_down",
namespace = #(Namespace:0x1903f20 {
href = "http://www.cisco.com/nxos:1.0:if_manager"
}),
Your question is really generic and poorly asked, so answering a specific question is not possible, but it looks like you need to understand how to access tags in a document using a CSS accessor, which Nokogiri makes very easy.
Meditate on this:
require 'nokogiri'
foo =<<EOT
<tag1>
<tag2>some text</tag2>
<tag3>some more text</tag3>
<tags>something</tags>
<tags>or</tags>
<tags>other</tags>
</tag1>
EOT
xml = Nokogiri::XML.parse(foo)
at finds the first matching occurrence in the document:
xml.at('tag2').content # => "some text"
at is pretty smart, in that it tries to determine whether the accessor is CSS or XPath, so it's a good first tool when you want the first match. If that doesn't work, you can try at_css, which specifies that the accessor is CSS, because sometimes you can come up with an expression that is valid as both CSS and XPath but returns different results:
xml.at_css('tag3').content # => "some more text"
xml.at_css('tag3').text # => "some more text"
Similar to at is search, which also tries to determine whether it's CSS or XPath, but finds all matching nodes throughout the document rather than just the first matching one. Because it returns all matching nodes, it returns a NodeSet, unlike at which returns a Node, so you have to be aware that NodeSets behave differently than Nodes when accessing their content or text:
xml.search('tags').text # => "somethingorother"
That's almost never what you want, but you'd be surprised how many people then ask how to split that resulting string into the desired three words. It's usually impossible to do accurately, so a different tactic is needed:
xml.search('tags').map { |t| t.content } # => ["something", "or", "other"]
xml.search('tags').map { |t| t.text } # => ["something", "or", "other"]
xml.search('tags').map(&:text) # => ["something", "or", "other"]
Both at and search have ..._css and ..._xpath variations to help you fine-tune your code's behavior, but I always recommend starting with the generic at and search until you're forced to define what the accessor is.
I also recommend starting with CSS accessors over XPath because they tend to be more readable, and more easily learned if you're working inside HTML with CSS. XPath is very powerful, probably still more so than CSS, but learning it takes longer and often results in less readable code, which affects maintainability.
This is all in the tutorials, cheat sheets, and documentation. Nokogiri is extremely powerful, but it takes time reading and trying things to learn it. You can also search SO for other things I've written about searching XML and HTML documents; in particular, "What are some examples of using Nokogiri?" helps give an idea of how to scrape a page. There's a lot of information covering many different topics related to this. I find it an interesting exercise to parse documents like this, as it was part of my professional life for years.
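One caveat specific to your document: it declares namespaces (the nf: prefix and a default Cisco namespace), and namespaced elements won't match plain CSS or XPath selectors. A quick, if blunt, sketch, assuming raw_xml holds your NETCONF reply as a string and that you can afford to throw the namespaces away:
require 'nokogiri'

doc = Nokogiri::XML(raw_xml)
doc.remove_namespaces! # blunt, but fine when you don't actually need the namespaces

doc.xpath('//ROW_interface').each do |row|
  puts "#{row.at('interface').text}: #{row.at('state').text}"
end
With your sample data this prints lines like mgmt0: connected and Vlan1: down.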
You could use XPath:
parsed = Nokogiri::XML::DocumentFragment.parse(xml)
states = parsed.xpath('.//interface/status/state')
Or just iterate through the XML (note that each on a node iterates attributes, so go through children):
parsed = Nokogiri::XML::DocumentFragment.parse(xml)
parsed.children.each do |element|
  # Some instructions
end

neo4j java cypher parameters not working

I am trying to create some dummy nodes in the graph:
private final static Driver driver = GraphDatabase.driver("bolt://localhost:7687",
AuthTokens.basic("neo4j", "password"));
static Session session = driver.session();
String cypher = "CREATE "
+ "(:GPPocEntity {id:'{gppeid}',gppe_out_prop_1:'{gppe_out_prop_1_val_id}',"
+ "gppe_out_prop_2:'{gppe_out_prop_2_val_id}',"
+ "gppe_out_prop_X:'{gppe_out_prop_X_val_id}'})"
+ "-[:has]->"
+ "(:PPocEntity {id:'{ppeid}',ppe_out_prop_1:'{ppe_out_prop_1_val_id}',"
+ "ppe_out_prop_2:'{ppe_out_prop_2_val_id}',"
+ "ppe_out_prop_X:'{ppe_out_prop_X_val_id}'})"
+ "-[:contains]->"
+ "(:PocEntity {id:'{peid}',pe_out_prop_1:'{pe_out_prop_1_val_id}',"
+ "pe_out_prop_2:'{pe_out_prop_2_val_id}',"
+ "pe_out_prop_X:'{pe_out_prop_X_val_id}'})";
Map<String, Object> params = new HashMap<String, Object>();
int id = 1111;
params.put("gppeid","gppe"+id);
params.put("ppeid","ppe"+id);
params.put("peid","pe"+id);
params.put("gppe_out_prop_1_val_id","gppe_out_prop_1_val_"+id);
params.put("gppe_out_prop_2_val_id","gppe_out_prop_2_val_"+id);
params.put("gppe_out_prop_X_val_id","gppe_out_prop_X_val_"+id);
params.put("ppe_out_prop_1_val_id","ppe_out_prop_1_val_"+id);
params.put("ppe_out_prop_2_val_id","ppe_out_prop_2_val_"+id);
params.put("ppe_out_prop_X_val_id","ppe_out_prop_X_val_"+id);
params.put("pe_out_prop_1_val_id","pe_out_prop_1_val_"+id);
params.put("pe_out_prop_2_val_id","pe_out_prop_2_val_"+id);
params.put("pe_out_prop_X_val_id","pe_out_prop_X_val_"+id);
session.run(cypher, params);
But this does not set those parameters in cypher. Why is this so?
The problem is that you wrap the parameters in the Cypher query in single quotes, so they are not interpreted as parameters. Try correcting the query by removing the single quotes:
String cypher = "CREATE "
+ "(:GPPocEntity {id:{gppeid}, gppe_out_prop_1: {gppe_out_prop_1_val_id}, "
+ " gppe_out_prop_2: {gppe_out_prop_2_val_id}, "
+ " gppe_out_prop_X: {gppe_out_prop_X_val_id}}) "
+ "-[:has]->"
...
A cypher parameter is $ + name, not { + name + }.
So for parameter called gppe_out_prop_1_val_id, you should put $gppe_out_prop_1_val_id into your query.
And you don't need to put quotes around them: parameters are typed, so Neo4j will handle that for you.
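Putting that together, a minimal sketch of the question's code using $-style parameters (properties trimmed to one per node for brevity; assumes a server/driver recent enough for the $ syntax, i.e. Neo4j 3.x or later):
String cypher = "CREATE "
        + "(:GPPocEntity {id: $gppeid, gppe_out_prop_1: $gppe_out_prop_1_val_id})"
        + "-[:has]->"
        + "(:PPocEntity {id: $ppeid, ppe_out_prop_1: $ppe_out_prop_1_val_id})";

Map<String, Object> params = new HashMap<>();
int id = 1111;
params.put("gppeid", "gppe" + id);
params.put("ppeid", "ppe" + id);
params.put("gppe_out_prop_1_val_id", "gppe_out_prop_1_val_" + id);
params.put("ppe_out_prop_1_val_id", "ppe_out_prop_1_val_" + id);

// No quotes around the placeholders: the driver substitutes typed values.
session.run(cypher, params);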

neo4j query extremely slow

I want to create a graph using the Neo4jClient. I have a dictionary that holds two kinds of objects (documents and lists):
Dictionary<Document, List<Concept>>
As you can see, each document holds a list of concepts, and a relationship must be created between them: (document)<-[:IN]-(concept)
public void LoadNode(Dictionary<Document, List<Concept>> dictionary, GraphClient _client)
{
    var d = dictionary.ToList();
    var t = dictionary.Values;

    string merge1 = string.Format
        ("MERGE (source:{0} {{ {1}:row.Key.Id_Doc }})", "Document", "Name");
    string strForEachDoc = " FOREACH( concept in row.Value | ";
    string merge2 = string.Format
        ("MERGE (target:{0} {{ {1} : concept.Name }})", "Concept", "Name");
    string merge3 = string.Format
        (" MERGE (source)-[ r:{0} ]->(target)", "Exist");

    _client.Cypher
        .WithParam("coll", d)
        .ForEach("(row in {coll} | " +
                 merge1 + " " +
                 strForEachDoc + " " +
                 merge2 + " " +
                 merge3 + "))")
        .ExecuteWithoutResults();
}
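For reference, the Cypher this builds up is effectively (reconstructed from the strings above):
FOREACH (row in {coll} |
  MERGE (source:Document { Name: row.Key.Id_Doc })
  FOREACH (concept in row.Value |
    MERGE (target:Concept { Name: concept.Name })
    MERGE (source)-[ r:Exist ]->(target)))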
It takes a long time, and Visual Studio runs into a bizarre error:
"A first-chance exception of type 'System.AggregateException'
occurred in mscorlib.dll"

Getting incremental changes from Neo4j DB

Is there any way in Neo4j 1.9 to get all the nodes/relationships that were modified (created/updated/deleted) within a certain span of time, like we do in a Solr delta import?
One crude way I can think of is to maintain a timestamp property on each node/relationship and index it, then fetch the changed nodes/relationships from the index:
START a=node:custom_index("timestamp:[{start_time} TO {end_time}]")
RETURN a;
But then the problem would be that if I modify the node via Cypher, the index will not be updated.
There's no built-in functionality like that in Neo4j, unfortunately.
To address the issues one by one: maintaining a timestamp is not possible, because you have nowhere to put it in the case of deleted nodes/relationships. You can't put a timestamp on a property either. So you would know a node had changed, but not how.
One possible solution is to log the changes somewhere as they happen, using TransactionEventHandlers. Then you can a) choose exactly what to record, and b) stop worrying about Cypher: the change will be logged no matter what method you used to update the database.
I've put together a small demo. It just logs every change to std out. It uses some GraphAware classes (disclaimer: I'm the author) for simplicity, but could be written without them, if you feel so inclined.
Here's the important part of the code, in case the link gets eventually broken or something:
@Test
public void demonstrateLoggingEveryChange() {
    GraphDatabaseService database = new TestGraphDatabaseFactory().newImpermanentDatabase();
    database.registerTransactionEventHandler(new ChangeLogger());

    //perform mutations here
}

private class ChangeLogger extends TransactionEventHandler.Adapter<Void> {

    @Override
    public void afterCommit(TransactionData data, Void state) {
        ImprovedTransactionData improvedData = new LazyTransactionData(data);

        for (Node createdNode : improvedData.getAllCreatedNodes()) {
            System.out.println("Created node " + createdNode.getId()
                    + " with properties: " + new SerializablePropertiesImpl(createdNode).toString());
        }

        for (Node deletedNode : improvedData.getAllDeletedNodes()) {
            System.out.println("Deleted node " + deletedNode.getId()
                    + " with properties: " + new SerializablePropertiesImpl(deletedNode).toString());
        }

        for (Change<Node> changedNode : improvedData.getAllChangedNodes()) {
            System.out.println("Changed node " + changedNode.getCurrent().getId()
                    + " from properties: " + new SerializablePropertiesImpl(changedNode.getPrevious()).toString()
                    + " to properties: " + new SerializablePropertiesImpl(changedNode.getCurrent()).toString());
        }

        for (Relationship createdRelationship : improvedData.getAllCreatedRelationships()) {
            System.out.println("Created relationship " + createdRelationship.getId()
                    + " between nodes " + createdRelationship.getStartNode().getId()
                    + " and " + createdRelationship.getEndNode().getId()
                    + " with properties: " + new SerializablePropertiesImpl(createdRelationship).toString());
        }

        for (Relationship deletedRelationship : improvedData.getAllDeletedRelationships()) {
            System.out.println("Deleted relationship " + deletedRelationship.getId()
                    + " between nodes " + deletedRelationship.getStartNode().getId()
                    + " and " + deletedRelationship.getEndNode().getId()
                    + " with properties: " + new SerializablePropertiesImpl(deletedRelationship).toString());
        }

        for (Change<Relationship> changedRelationship : improvedData.getAllChangedRelationships()) {
            System.out.println("Changed relationship " + changedRelationship.getCurrent().getId()
                    + " between nodes " + changedRelationship.getCurrent().getStartNode().getId()
                    + " and " + changedRelationship.getCurrent().getEndNode().getId()
                    + " from properties: " + new SerializablePropertiesImpl(changedRelationship.getPrevious()).toString()
                    + " to properties: " + new SerializablePropertiesImpl(changedRelationship.getCurrent()).toString());
        }
    }
}
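To turn this into the incremental feed you asked about, the same afterCommit hook can write timestamped records to a store of your choice instead of stdout, and you can then query that store by time span, Solr-delta style. A minimal sketch of the idea; appendToLog is a hypothetical helper standing in for whatever persistence you pick (file, queue, table):
@Override
public void afterCommit(TransactionData data, Void state) {
    long ts = System.currentTimeMillis(); // commit-time stamp to query deltas by
    ImprovedTransactionData improvedData = new LazyTransactionData(data);

    for (Node createdNode : improvedData.getAllCreatedNodes()) {
        appendToLog(ts, "CREATED_NODE", createdNode.getId()); // hypothetical helper
    }
    // ... same pattern for the other five change categories shown above
}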
