gds.beta.pipeline.nodeClassification.train: Target property not found - neo4j

Using Neo4j v4.4 and GDS 2.0. I'm trying to train a model. When I type:
CALL gds.beta.pipeline.nodeClassification.train('individual-graph', {
pipeline: 'pipe',
nodeLabels: ['PERSON'],
modelName: 'xmen-model-fastRP',
targetProperty: 'is_risky',
metrics: ['F1_WEIGHTED','ACCURACY'],
randomSeed: 2
}) YIELD modelInfo
RETURN
modelInfo.bestParameters AS winningModel,
modelInfo.metrics.F1_WEIGHTED.outerTrain AS trainGraphScore,
modelInfo.metrics.F1_WEIGHTED.test AS testGraphScore
I get the following error message:
Failed to invoke procedure gds.beta.pipeline.nodeClassification.train: Caused by: java.lang.IllegalArgumentException: Target property is_risky not found in graph with node properties: [[embedding]]
What am I doing wrong? Can you help please?

It means that the property 'is_risky' is not present on the PERSON nodes of the projected graph 'individual-graph'. The only node property available there is embedding, so the target property has to be included when the graph is projected.
Going through the example in the Neo4j documentation (https://neo4j.com/docs/graph-data-science/current/machine-learning/nodeclassification-pipelines/) will give you an idea of what the error is. Below is an example of an error similar to the one you are getting.
Target property `my_class` not found in graph
with node properties: [[sizePerStory, class], [sizePerStory]]
As you can see, the error message lists the node properties that are available in the graph, so you can check whether your target property is among them.

Spatial querydsl - GeometryExpressions. Find entities that range contains given "Point"

Used tools and their versions:
I am using:
spring boot 2.2.6
hibernate/hibernate-spatial 5.3.10 with dialect set to: org.hibernate.spatial.dialect.mysql.MySQL56SpatialDialect
querydsl-spatial 4.2.1
com.vividsolutions.jts 1.13
jscience 4.3.1
Problem description:
I have an entity that represents medical-clinic:
import com.vividsolutions.jts.geom.Polygon;
@Entity
public class Clinic {
@Column(name = "range", columnDefinition = "Polygon")
private Polygon range;
}
The range is a circle calculated earlier based on the clinic's gps-location and radius. It represents the operating area for that clinic. That means that it treats only patients with their home address lying within that circle. Let's assume that above circle, is correct.
My goal (question):
I have a GPS point with a patient location: 45.7602322 4.8444941. I would like to find all clinics that are able to treat that patient, that is, all the clinics whose range field contains 45.7602322 4.8444941.
My solution (partially correct (I think))
To get it done, I have created a simple "Predicate"/"BooleanExpression":
GeometryExpressions.asGeometry(QClinic.clinic.range)
.contains(Wkt.fromWkt("Point(45.7602322 4.8444941)"))
and it actually works, because I can see the proper SQL query in the console:
select (...) where
ST_Contains(clinic0_.range, ?)=1 limit ?
first binding param: POINT(45.7602322 4.8444941)
But I have two problems with that:
QClinic.clinic.range is marked as a "warning" in IntelliJ: "Unchecked assignment: 'com.querydsl.spatial.jts.JTSPolygonPath' to 'com.querydsl.core.types.Expression<org.geolatte.geom.Geometry>'". Yes, in QClinic range is a com.querydsl.spatial.jts.JTSPolygonPath.
Using the debugger and IntelliJ's "evaluate" on the above line (the one that creates the expression) I can see an error message: "unknown operation with operator CONTAINS and args [clinic.range, POINT(45.7602322 4.8444941)]"
You can ignore the second warning. The spatial specific operations are simply not registered in the serializer used for the toString of the operation. They reside in their own module.
The first warning indicates that you're mixing up Geolatte and JTS expressions. From your mapping it seems you intend to use JTS. In that case you need to use com.querydsl.spatial.jts.JTSGeometryExpressions instead of com.querydsl.spatial.GeometryExpressions in order to get rid of that warning.

Error using DseGraphFrame with querying timestamp field

I have a Person label with a created property defined:
schema.propertyKey("created").Timestamp().single().create()
I get the error below when trying to use DseGraphFrame to filter for the Person label using the created property in dse spark:
scala> g.V().hasLabel("Person").has("created",
P.gt("2018-10-07T14:46:26.790Z")).count().next()
org.apache.spark.sql.AnalysisException: cannot resolve '(created >
1538923586790L)' due to data type mismatch: differing types in
'(created > 1538923586790L)' (timestamp and bigint).;; 'Filter
((~label#270 = Person) && (created#280 > 1538923586790))…
Any idea why?
This was a defect in earlier DSE versions; it was resolved in DSE 5.1.8 and DSE 6.0.0.
See the release notes at https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/releaseNotes/RNdse.html and look for DSP-15146.
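As a side note on where the bigint in the error comes from, here is a minimal Java sketch (not DSE code; the class and method names are made up for illustration). It only shows that the literal 1538923586790 in the error message is the epoch-millisecond form of the timestamp string from the query, which suggests the predicate value was turned into a long and then compared against a timestamp column.

```java
import java.time.Instant;

class TimestampLiteralCheck {
    // Converts an ISO-8601 instant string to milliseconds since the Unix epoch.
    static long toEpochMillis(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    public static void main(String[] args) {
        // The timestamp string passed to P.gt(...) in the question.
        System.out.println(toEpochMillis("2018-10-07T14:46:26.790Z"));
    }
}
```

Running it prints 1538923586790, the same number that appears in the AnalysisException.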

Cannot convert value to bool : InvalidArgumentException

I am using CakePHP 3.3 for my VOD application and I want to insert data using the following query:
$query=$notifications->query()->insert(['message' ,'status','user_id' ,'video_id' ,'notify_to' ,'notification_type'])
->values([
'message'=>'Congratulations! your video '.$video_name.' has been approved to be uploaded on MM2View by admin.',
'status'=>$status,
'user_id'=>$user_id[0]['users_id'],
'video_id'=>$id,
'notify_to'=>1,
'notification_type'=>3
])
->execute();
But I am getting this error message:
Cannot convert value to bool : InvalidArgumentException
I have searched Google for this problem but did not find a correct solution.
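An InvalidArgumentException like this is caused by a type mismatch: a value you pass does not match the type the column expects. Here the message says a value could not be converted to bool.
Check which field in your model/table schema is defined as boolean and compare it with the value your code inserts for it.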

Neo4j with spatial: NotFoundException: More than one relationship

What is the cause and how to fix this exception:
org.neo4j.graphdb.NotFoundException: More than one relationship[RTREE_CHILD, INCOMING] found for NodeImpl#105
at org.neo4j.kernel.impl.core.NodeImpl.getSingleRelationship(NodeImpl.java:344)
at org.neo4j.kernel.impl.core.NodeProxy.getSingleRelationship(NodeProxy.java:191)
at org.neo4j.collections.rtree.RTreeIndex.getIndexNodeParent(RTreeIndex.java:768)
at org.neo4j.collections.rtree.RTreeIndex.adjustPathBoundingBox(RTreeIndex.java:672)
at org.neo4j.collections.rtree.RTreeIndex.add(RTreeIndex.java:90)
at org.neo4j.gis.spatial.EditableLayerImpl.add(EditableLayerImpl.java:44)
at org.neo4j.gis.spatial.ShapefileImporter.importFile(ShapefileImporter.java:209)
at org.neo4j.gis.spatial.ShapefileImporter.importFile(ShapefileImporter.java:122)
I am using Neo4j 2.0.0 and spatial jars compiled from the GitHub project.
The exception is thrown when I try to import Shapefile (this is code in unmanaged extension):
GraphDatabaseService spatialDb = new GraphDatabaseFactory().newEmbeddedDatabase("/home/db/data/spatial.db");
Transaction tx = spatialDb.beginTx();
try {
ShapefileImporter importer = new ShapefileImporter(spatialDb, new NullListener());
importer.importFile("/home/bla/realshp/users_location.shp", "users_location");
tx.success();
} catch (Exception e) {
e.printStackTrace();
} finally {
tx.close();
return Response.status(200).entity("Done. ").build();
}
The shapefile is generated from a CSV file with ogr2ogr; it seems legit and is read without exceptions. In the original file there were around 30000 points defined as follows (ogr2ogr will pull longitude and latitude):
id,longitude,latitude,gender,updated
3,-122.1171925,37.4343361,1,2013-11-20 05:03:22
304,-122.0919000,37.3094000,1,2013-11-03 00:42:01
311,-122.0919000,37.3094000,1,2013-11-03 00:42:01
How can I get around it? I need to load millions of points into the db.
Side question: now I create new graph-spatial datastore - is it correct? Maybe I should load it to existing graph db?
UPDATE:
I tried to input coordinates "manually" using methods from TestSimplePointLayer. I got the same exception around the 450th coordinate. Many of them are identical, as you can see in the sample, but they are valid points. How can I get around it?
You are skipping a step here. You create a spatial index and then you add the users to the index.
So for example if you had a shape file of all the states or counties or zip codes in the US, you can create a spatial layer with those shapes and add the users to them.
You can use a simple point layer as well if you want. The points in the layer have to be unique, but the user nodes that reside at those locations don't have to be. See http://java.dzone.com/articles/running-along-graph-using-0 and http://www.markhneedham.com/blog/2013/03/10/neo4jcypher-finding-football-stadiums-near-a-city-using-spatial/ for a better idea.
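To illustrate the "unique points, many users" idea without touching the spatial API, here is a rough plain-Java sketch. It assumes rows shaped like the CSV sample in the question; PointGrouping, UserRow and groupByCoordinate are hypothetical names, and the actual insertion into a layer is left out. Each distinct coordinate would be added to the layer once, and the user ids grouped under it would be linked to that single point.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PointGrouping {
    // Hypothetical holder for one CSV row: user id plus coordinates.
    static final class UserRow {
        final long id;
        final double longitude;
        final double latitude;

        UserRow(long id, double longitude, double latitude) {
            this.id = id;
            this.longitude = longitude;
            this.latitude = latitude;
        }
    }

    // Groups user ids by their exact (longitude, latitude) pair,
    // keeping the order in which coordinates first appear.
    static Map<List<Double>, List<Long>> groupByCoordinate(List<UserRow> rows) {
        Map<List<Double>, List<Long>> groups = new LinkedHashMap<>();
        for (UserRow row : rows) {
            List<Double> key = List.of(row.longitude, row.latitude);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row.id);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The three sample rows from the question; two share a location.
        List<UserRow> rows = List.of(
                new UserRow(3, -122.1171925, 37.4343361),
                new UserRow(304, -122.0919000, 37.3094000),
                new UserRow(311, -122.0919000, 37.3094000));
        groupByCoordinate(rows).forEach(
                (point, ids) -> System.out.println(point + " -> " + ids));
    }
}
```

With the sample rows, users 304 and 311 end up under the same coordinate, so only two distinct points would need to go into the layer.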
I hit the same error when I try to add nodes with the same lon/lat (0,0) to the layer.
The exception appears once more than 100 RTREE_CHILD reference nodes have been inserted. It is a bug in the source code:
src/main/java/org/neo4j/gis/spatial/rtree/RTreeIndex.java
Try this forked plugin:
https://github.com/linkedin-inc/spatial

neo4j reference node obsolete but yet still returned from getAllNodes

According to Neo4j documentation the "reference node concept is obsolete - indexes are the canonical way of getting hold of entry points in the graph.".
However, if I use GlobalGraphOperations.getAllNodes() I still get back a node with id 0 which I didn't create and which has all the looks of a reference node.
I'm trying to implement a method getNode(String uuid)
public Node getNode(String uuid)
{
GlobalGraphOperations globalGraphOperations = GlobalGraphOperations.at(graphDb);
for(Node tmpNode : globalGraphOperations.getAllNodes())
{
if(tmpNode.equals(graphDb.getReferenceNode()))
{ continue;}
String tmpNodeUuid = (String)tmpNode.getProperty("uuid");
if (tmpNodeUuid.equals(uuid))
{
return tmpNode;
}
}
return null;
}
Why does getAllNodes return a reference node?
How can I implement getNode() programmatically without using the deprecated function getReferenceNode()?
The reference node concept is indeed deprecated and will be removed with Neo4j version 2.0. In 1.x the concept still exists and the reference node is created when the database is initially created. If you don't need it, you can just delete the reference node.
The method you're writing is going to get slow as the graph grows, because it traverses the entire graph on every call. You should create an index for the UUID property and use that to look up nodes in the graph, which is much faster, as well as being the 'canonical way of getting hold of entry points in the graph' :-)
