I am trying to create some dummy nodes in the graph:
private final static Driver driver = GraphDatabase.driver("bolt://localhost:7687",
        AuthTokens.basic("neo4j", "password"));
static Session session = driver.session();

String cypher = "CREATE "
        + "(:GPPocEntity {id:'{gppeid}',gppe_out_prop_1:'{gppe_out_prop_1_val_id}',"
        + "gppe_out_prop_2:'{gppe_out_prop_2_val_id}',"
        + "gppe_out_prop_X:'{gppe_out_prop_X_val_id}'})"
        + "-[:has]->"
        + "(:PPocEntity {id:'{ppeid}',ppe_out_prop_1:'{ppe_out_prop_1_val_id}',"
        + "ppe_out_prop_2:'{ppe_out_prop_2_val_id}',"
        + "ppe_out_prop_X:'{ppe_out_prop_X_val_id}'})"
        + "-[:contains]->"
        + "(:PocEntity {id:'{peid}',pe_out_prop_1:'{pe_out_prop_1_val_id}',"
        + "pe_out_prop_2:'{pe_out_prop_2_val_id}',"
        + "pe_out_prop_X:'{pe_out_prop_X_val_id}'})";
Map<String, Object> params = new HashMap<String, Object>();
int id = 1111;
params.put("gppeid","gppe"+id);
params.put("ppeid","ppe"+id);
params.put("peid","pe"+id);
params.put("gppe_out_prop_1_val_id","gppe_out_prop_1_val_"+id);
params.put("gppe_out_prop_2_val_id","gppe_out_prop_2_val_"+id);
params.put("gppe_out_prop_X_val_id","gppe_out_prop_X_val_"+id);
params.put("ppe_out_prop_1_val_id","ppe_out_prop_1_val_"+id);
params.put("ppe_out_prop_2_val_id","ppe_out_prop_2_val_"+id);
params.put("ppe_out_prop_X_val_id","ppe_out_prop_X_val_"+id);
params.put("pe_out_prop_1_val_id","pe_out_prop_1_val_"+id);
params.put("pe_out_prop_2_val_id","pe_out_prop_2_val_"+id);
params.put("pe_out_prop_X_val_id","pe_out_prop_X_val_"+id);
session.run(cypher, params);
But this does not set those parameters in the Cypher query. Why is that?
The problem is that you wrap the parameters in the Cypher query in single quotes, so they are not interpreted as parameters. Correct the query by removing the single quotes:
String cypher = "CREATE "
        + "(:GPPocEntity {id: {gppeid}, gppe_out_prop_1: {gppe_out_prop_1_val_id}, "
        + " gppe_out_prop_2: {gppe_out_prop_2_val_id}, "
        + " gppe_out_prop_X: {gppe_out_prop_X_val_id}}) "
        + "-[:has]->"
        ...
A Cypher parameter is $ + name, not { + name + }.
So for a parameter called gppe_out_prop_1_val_id, you should put $gppe_out_prop_1_val_id into your query.
And you don't need to put quotes around it; parameters are typed, so Neo4j handles that for you.
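For example, a sketch of the beginning of your query rewritten with $-parameters (keeping your params map unchanged, and with no quotes around the parameters):
String cypher = "CREATE "
        + "(:GPPocEntity {id: $gppeid, gppe_out_prop_1: $gppe_out_prop_1_val_id, "
        + "gppe_out_prop_2: $gppe_out_prop_2_val_id, "
        + "gppe_out_prop_X: $gppe_out_prop_X_val_id})"
        + "-[:has]->"
        ...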
Background: I'm trying to get an event-time temporal join working with two 'large(r)' datasets/tables that are read from CSV files (16K+ rows in the left table, somewhat fewer in the right table). Both tables are append-only tables, i.e. their data sources are currently CSV files, but they will become CDC changelogs emitted by Debezium over Pulsar.
I am using the fairly new SYSTEM_TIME AS OF syntax.
The problem: join results are only partly correct, i.e. at the start (the first 20% or so) of the query's execution, rows from the left side are not matched with rows from the right side, while in theory they should be. After a couple of seconds there are more matches, and by the time the query ends, rows from the left side are matched/joined correctly with rows from the right side.
Every time I run the query, it shows different results in terms of which rows are (not) matched.
Neither dataset is ordered by its event-time; they are ordered by their primary key. So it's really this case, only with more data.
In essence, the right side is a lookup-table that changes over time, and we're sure that for every left record there was a matching right record, as both were created in the originating database at +/- the same instant. Ultimately our goal is a dynamic materialized view that contains the same data as when we'd join the 2 tables in the CDC-enabled source database (SQL Server).
Obviously, I want to achieve a correct join over the complete dataset, as explained in the Flink docs.
Unlike simple examples and Flink test-code with a small dataset of only a few rows (like here), a join of larger datasets does not yield correct results.
I suspect that, when the probing/left table starts flowing, the build/right table is not yet 'in memory', which means that left rows don't find a matching right row even though they should have, if the right table had started flowing somewhat earlier. That's why the left join returns null values for the columns of the right table.
I've included my code:
@Slf4j(topic = "TO_FILE")
public class CsvTemporalJoinTest {

    private final String emr01Ddl =
            "CREATE TABLE EMR01\n" +
            "(\n" +
            " SRC_NO STRING,\n" +
            " JRD_ETT_NO STRING,\n" +
            " STT_DT DATE,\n" +
            " MGT_SLT_DT DATE,\n" +
            " ATM_CRT_DT DATE,\n" +
            " LTD_MDT_IC STRING,\n" +
            " CPN_ORG_NO STRING,\n" +
            " PTY_NO STRING,\n" +
            " REG_USER_CD STRING,\n" +
            " REG_TS TIMESTAMP,\n" +
            " MUT_USER_CD STRING,\n" +
            " MUT_TS TIMESTAMP(3),\n" +
            " WATERMARK FOR MUT_TS AS MUT_TS,\n" +
            " PRIMARY KEY (CPN_ORG_NO) NOT ENFORCED\n" +
            ") WITH (\n" +
            " 'connector' = 'filesystem',\n" +
            " 'path' = '" + getCsv1() + "',\n" +
            " 'format' = 'csv'\n" +
            ")";

    private final String emr02Ddl =
            "CREATE TABLE EMR02\n" +
            "(\n" +
            " CPN_ORG_NO STRING,\n" +
            " DSB_TX STRING,\n" +
            " REG_USER_CD STRING,\n" +
            " REG_TS TIMESTAMP,\n" +
            " MUT_USER_CD STRING,\n" +
            " MUT_TS TIMESTAMP(3),\n" +
            " WATERMARK FOR MUT_TS AS MUT_TS,\n" +
            " PRIMARY KEY (CPN_ORG_NO) NOT ENFORCED\n" +
            ") WITH (\n" +
            " 'connector' = 'filesystem',\n" +
            " 'path' = '" + getCsv2() + "',\n" +
            " 'format' = 'csv'\n" +
            ")";

    @Test
    public void testEventTimeTemporalJoin() throws Exception {
        var env = StreamExecutionEnvironment.getExecutionEnvironment();
        var tableEnv = StreamTableEnvironment.create(env);

        tableEnv.executeSql(emr01Ddl);
        tableEnv.executeSql(emr02Ddl);

        Table result = tableEnv.sqlQuery("" +
                "SELECT *" +
                " FROM EMR01" +
                " LEFT JOIN EMR02 FOR SYSTEM_TIME AS OF EMR01.MUT_TS" +
                " ON EMR01.CPN_ORG_NO = EMR02.CPN_ORG_NO");

        tableEnv.toChangelogStream(result).addSink(new TestSink());
        env.execute();

        System.out.println("[Count]" + TestSink.values.size());
        //System.out.println("[Row 1]" + TestSink.values.get(0));
        //System.out.println("[Row 2]" + TestSink.values.get(1));

        AtomicInteger i = new AtomicInteger();
        TestSink.values.listIterator().forEachRemaining(value -> log.info("[Row " + i.incrementAndGet() + " ]=" + value));
    }

    private static class TestSink implements SinkFunction<Row> {

        // must be static
        public static final List<Row> values = Collections.synchronizedList(new ArrayList<>());

        @Override
        public void invoke(Row value, SinkFunction.Context context) {
            values.add(value);
        }
    }

    String getCsv1() {
        try {
            return new ClassPathResource("/GBTEMR01.csv").getFile().getAbsolutePath();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    String getCsv2() {
        try {
            return new ClassPathResource("/GBTEMR02.csv").getFile().getAbsolutePath();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
Is there a way to solve this? E.g., is there a way to FIRST load the right side into Flink state, and THEN start loading/streaming the left side? Would that even be a good approach? It begs the question: how much later? At what point can the left side start flowing?
We're using Flink 1.13.3.
This sort of temporal/versioned join depends on having accurate watermarks. Flink relies on the watermarks to know which rows can safely be dropped from the state being maintained (because they can no longer affect the results).
The watermarking you've used indicates that the rows are ordered by MUT_TS. Since this isn't true, the join isn't able to produce complete results.
To fix this, the watermarks should be defined with something like this:
WATERMARK FOR MUT_TS AS MUT_TS - INTERVAL '2' MINUTE
where the interval indicates how much out-of-orderness needs to be accommodated.
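For example, the EMR01 DDL above could define its watermark like this (the two-minute bound is only an illustration; use whatever out-of-orderness actually fits your data):
CREATE TABLE EMR01
(
    -- ... other columns as in the original DDL ...
    MUT_TS TIMESTAMP(3),
    WATERMARK FOR MUT_TS AS MUT_TS - INTERVAL '2' MINUTE,
    PRIMARY KEY (CPN_ORG_NO) NOT ENFORCED
) WITH (
    'connector' = 'filesystem',
    'path' = '...',
    'format' = 'csv'
)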
Well, I have to run the following query (Neo4j Community Edition 3.0.12 on Docker).
The caveat is that the calendar name comes in an unknown format, either:
1) firstname + " " + lastname + "-" + specialization
2) lastname + " " + firstname + "-" + specialization
:PARAM name: "Di Pietro Chiara - Gynecologist"
MERGE (_200:`person` {`lastname`: "Di Pietro", `firstname`: "Chiara", `birthdate`: "1984/03/25"})
MERGE (_cal_445:`calendar` { :`X-VR-CALNAME` = $name })-[:`belongs_to a`]-(_per_445:`person`)
WHERE $name = _per_445.firstname + " " + _per_445.lastname
OR $name = (_per_445.nome + " " + _per_445.cognome)
RETURN _cal_445, _per_445
The query, and several variants of it, doesn't run. It sometimes returns an error, and sometimes breaks the browser layout on the screen.
There is surely something wrong, but I was unable to find and correct it.
How could the part that compares against the two inverted formats be optimized?
Why does the :PARAM declaration generate an error?
Any help will be greatly appreciated.
This part of your query is not valid:
MERGE (_cal_445:`calendar` { :`X-VR-CALNAME` = $name })
You should replace it with this:
MERGE (_cal_445:`calendar` { `:X-VR-CALNAME`:$name })
Moreover, you are doing a MERGE with the value $name, which is also used in the WHERE clause; that's just not allowed...
If you replace the MERGE with a MATCH, your query will work:
MERGE (_200:`person` {`lastname`: "Di Pietro", `firstname`: "Chiara", `birthdate`: "1984/03/25"})
WITH _200
MATCH (_cal_445:`calendar` { `:X-VR-CALNAME`: $name })-[:`belongs_to a`]-(_per_445:`person`)
WHERE $name = _per_445.firstname + " " + _per_445.lastname
OR $name = (_per_445.nome + " " + _per_445.cognome)
RETURN _cal_445, _per_445
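As for the :PARAM error: in the Neo4j browser the command is lowercase :param, and (depending on the browser version) the value is assigned with =>, so something along these lines should work:
:param name => "Di Pietro Chiara - Gynecologist"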
UPDATE: I thought I had to pass the parameters as a JSON string in the request body, but actually I need to put them on the URL (the endpoint string), so it's working now.
I'm new to Valence. I have some Salesforce Apex code (written by someone else) that creates a D2L user. The code is working fine.
I want to add an Apex method to retrieve info for an existing D2L user using the userName parameter. I've copied the existing method, changed it to a GET, set the query parameter to userName, and kept everything else the same.
When I call my method, I get a 403 Invalid Token error.
Do I need to use different authorization parameters for a GET? For example, do I still need to include a timestamp?
Here's a portion of the Salesforce Apex code:
public static final String USERS = '/d2l/api/lp/1.0/users/';
String TIMESTAMP_PARAM_VALUE = String.valueOf(Datetime.now().getTime()).substring(0,10);
String method = GETMETHOD;
String action = USERS;
String signData = method + '&' + action + '&' + TIMESTAMP_PARAM_VALUE;
String userSignature = sign(signData,USER_KEY);
String appSignature = sign(signData,APP_KEY);
String SIGNED_USER_PARAM_VALUE = userSignature;
String SIGNED_APP_PARAM_VALUE = appSignature;
String endPoint = DOMAIN + action + '?' +
        APP_ID_PARAM + '=' + APP_ID + '&' +
        USER_ID_PARAM + '=' + USER_ID + '&' +
        SIGNED_USER_PARAM + '=' + SIGNED_USER_PARAM_VALUE + '&' +
        SIGNED_APP_PARAM + '=' + SIGNED_APP_PARAM_VALUE + '&' +
        TIMESTAMP_PARAM + '=' + TIMESTAMP_PARAM_VALUE;
HttpRequest req = new HttpRequest();
req.setMethod(method);
req.setTimeout(30000);
req.setEndpoint(endPoint);
req.setBody('{ "orgDefinedId"' + ':' + '"' + person.Id + '" }');
I thought I had to pass the parameters as a JSON string in the request body, but actually I need to put them on the URL (the endpoint string), so it's working now.
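For reference, a rough sketch of what the working GET looks like (userName here is a placeholder variable holding the D2L username; the signed auth parameters stay on the URL exactly as built above, and no body is set):
String endPoint = DOMAIN + action + '?' +
        APP_ID_PARAM + '=' + APP_ID + '&' +
        USER_ID_PARAM + '=' + USER_ID + '&' +
        SIGNED_USER_PARAM + '=' + SIGNED_USER_PARAM_VALUE + '&' +
        SIGNED_APP_PARAM + '=' + SIGNED_APP_PARAM_VALUE + '&' +
        TIMESTAMP_PARAM + '=' + TIMESTAMP_PARAM_VALUE + '&' +
        'userName=' + EncodingUtil.urlEncode(userName, 'UTF-8');

HttpRequest req = new HttpRequest();
req.setMethod('GET');
req.setTimeout(30000);
req.setEndpoint(endPoint);
// no req.setBody(...) on a GET; the query parameters go on the endpoint
HttpResponse res = new Http().send(req);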
I want to create a graph using Neo4jClient. I have a dictionary that holds two kinds of objects (documents and lists):
Dictionary<Document, List<Concept>>
As you can see, each document holds a list of concepts, and there must be a relationship between them: (document)<-[:IN]-(concept)
public void LoadNode(Dictionary<Document, List<Concept>> dictionary, GraphClient _client)
{
    var d = dictionary.ToList();
    var t = dictionary.Values;

    string merge1 = string.Format
        ("MERGE (source:{0} {{ {1}:row.Key.Id_Doc }})", "Document", "Name");
    string strForEachDoc = " FOREACH( concept in row.Value | ";
    string merge2 = string.Format
        ("MERGE (target:{0} {{ {1} : concept.Name }})", "Concept", "Name");
    string merge3 = string.Format
        (" MERGE (source)-[ r:{0} ]->(target)", "Exist");

    _client.Cypher
        .WithParam("coll", d)
        .ForEach("(row in {coll} | " +
            merge1 + " " +
            strForEachDoc + " " +
            merge2 + " " +
            merge3 + "))")
        .ExecuteWithoutResults();
}
It takes a long time, and Visual Studio runs into a bizarre error:
"A first chance exception of type 'System.AggregateException'
occurred in mscorlib.dll"
Is there any way in Neo4j 1.9 to get all the nodes/relationships that were modified (created/updated/deleted) within a certain span of time, like we do with a Solr delta import?
One crude way I can think of is to maintain a timestamp property on each node/relationship and index it, so those nodes/relationships can be fetched:
START a=node:custom_index("timestamp:[{start_time} TO {end_time}]")
RETURN a;
But then the problem is that if I modify the node via Cypher, the index will not be updated.
There's no built-in functionality like that in Neo4j, unfortunately.
To address the issues one by one: maintaining a timestamp is not possible, because you have nowhere to put it in the case of deleted nodes/relationships. You can't put a timestamp on a property either. So you would know a node has been changed, but wouldn't know how.
One possible solution is to log the changes somewhere as they happen, using TransactionEventHandlers. Then you can a) choose exactly what to record, and b) not worry about Cypher: changes will be logged no matter what method you use to update the database.
I've put together a small demo. It just logs every change to standard output. It uses some GraphAware classes (disclaimer: I'm the author) for simplicity, but could be written without them, if you feel so inclined.
Here's the important part of the code, in case the link gets eventually broken or something:
@Test
public void demonstrateLoggingEveryChange() {
    GraphDatabaseService database = new TestGraphDatabaseFactory().newImpermanentDatabase();
    database.registerTransactionEventHandler(new ChangeLogger());

    //perform mutations here
}

private class ChangeLogger extends TransactionEventHandler.Adapter<Void> {

    @Override
    public void afterCommit(TransactionData data, Void state) {
        ImprovedTransactionData improvedData = new LazyTransactionData(data);

        for (Node createdNode : improvedData.getAllCreatedNodes()) {
            System.out.println("Created node " + createdNode.getId()
                    + " with properties: " + new SerializablePropertiesImpl(createdNode).toString());
        }

        for (Node deletedNode : improvedData.getAllDeletedNodes()) {
            System.out.println("Deleted node " + deletedNode.getId()
                    + " with properties: " + new SerializablePropertiesImpl(deletedNode).toString());
        }

        for (Change<Node> changedNode : improvedData.getAllChangedNodes()) {
            System.out.println("Changed node " + changedNode.getCurrent().getId()
                    + " from properties: " + new SerializablePropertiesImpl(changedNode.getPrevious()).toString()
                    + " to properties: " + new SerializablePropertiesImpl(changedNode.getCurrent()).toString());
        }

        for (Relationship createdRelationship : improvedData.getAllCreatedRelationships()) {
            System.out.println("Created relationship " + createdRelationship.getId()
                    + " between nodes " + createdRelationship.getStartNode().getId()
                    + " and " + createdRelationship.getEndNode().getId()
                    + " with properties: " + new SerializablePropertiesImpl(createdRelationship).toString());
        }

        for (Relationship deletedRelationship : improvedData.getAllDeletedRelationships()) {
            System.out.println("Deleted relationship " + deletedRelationship.getId()
                    + " between nodes " + deletedRelationship.getStartNode().getId()
                    + " and " + deletedRelationship.getEndNode().getId()
                    + " with properties: " + new SerializablePropertiesImpl(deletedRelationship).toString());
        }

        for (Change<Relationship> changedRelationship : improvedData.getAllChangedRelationships()) {
            System.out.println("Changed relationship " + changedRelationship.getCurrent().getId()
                    + " between nodes " + changedRelationship.getCurrent().getStartNode().getId()
                    + " and " + changedRelationship.getCurrent().getEndNode().getId()
                    + " from properties: " + new SerializablePropertiesImpl(changedRelationship.getPrevious()).toString()
                    + " to properties: " + new SerializablePropertiesImpl(changedRelationship.getCurrent()).toString());
        }
    }
}